Can you tell us the average size of your records and how much heap is given to the region servers?
Thanks

On Aug 23, 2013, at 12:11 AM, Gautam Borah <[email protected]> wrote:

> Hello all,
>
> I have a use case where I need to write 1 million to 10 million records
> periodically (at intervals of 1 to 10 minutes) into an HBase table.
>
> Once the insert is completed, these records are queried immediately from
> another program - multiple reads.
>
> So, this is one massive write followed by many reads.
>
> I have two approaches to insert these records into the HBase table:
>
> use HTable or HTableMultiplexer to stream the data to the HBase table,
>
> or
>
> write the data to HDFS as a sequence file (Avro in my case), run a
> MapReduce job using HFileOutputFormat, and then load the output files
> into the HBase cluster. Something like:
>
> LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
> loader.doBulkLoad(new Path(outputDir), hTable);
>
> In my use case, which approach would be better?
>
> If I use the HTable interface, would the inserted data be in the HBase
> cache, before flushing to the files, for immediate read queries?
>
> If I use a MapReduce job to insert, would the data be loaded into the
> HBase cache immediately, or would only the output files be copied to the
> respective HBase table-specific directories?
>
> So, which approach is better for a write followed by immediate multiple
> read operations?
>
> Thanks,
> Gautam
