Assuming you are using 0.94, the default value for hbase.regionserver.global.memstore.lowerLimit is 0.35.
This means the memstores on each region server could hold roughly 3000M * 0.35 / 60 = 17.5 million records.

bq. If I use HTable interface, would the inserted data be in the HBase cache, before flushing to the files, for immediate read queries?

Yes.

Cheers

On Fri, Aug 23, 2013 at 12:01 PM, Gautam Borah <[email protected]> wrote:

> Hi,
>
> Average size of my records is 60 bytes - 20 bytes Key and 40 bytes value,
> table has one column family.
>
> I have setup a cluster for testing - 1 master and 3 region servers. Each
> have a heap size of 3 GB, single cpu.
>
> I have pre-split the table into 30 regions. I do not have to keep data
> forever, I could purge older records periodically.
>
> Thanks,
>
> Gautam
>
>
>
> On Fri, Aug 23, 2013 at 3:20 AM, Ted Yu <[email protected]> wrote:
>
> > Can you tell us the average size of your records and how much heap is
> > given to the region servers ?
> >
> > Thanks
> >
> > On Aug 23, 2013, at 12:11 AM, Gautam Borah <[email protected]> wrote:
> >
> > > Hello all,
> > >
> > > I have an use case where I need to write 1 million to 10 million records
> > > periodically (with intervals of 1 minutes to 10 minutes), into an HBase
> > > table.
> > >
> > > Once the insert is completed, these records are queried immediately from
> > > another program - multiple reads.
> > >
> > > So, this is one massive write followed by many reads.
> > >
> > > I have two approaches to insert these records into the HBase table -
> > >
> > > Use HTable or HTableMultiplexer to stream the data to HBase table.
> > >
> > > or
> > >
> > > Write the data to HDFS store as a sequence file (avro in my case) - run map
> > > reduce job using HFileOutputFormat and then load the output files into
> > > HBase cluster.
> > > Something like,
> > >
> > > LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
> > > loader.doBulkLoad(new Path(outputDir), hTable);
> > >
> > >
> > > In my use case which approach would be better?
> > >
> > > If I use HTable interface, would the inserted data be in the HBase cache,
> > > before flushing to the files, for immediate read queries?
> > >
> > > If I use map reduce job to insert, would the data be loaded into the HBase
> > > cache immediately? or only the output files would be copied to respective
> > > hbase table specific directories?
> > >
> > > So, which approach is better for write and then immediate multiple read
> > > operations?
> > >
> > > Thanks,
> > > Gautam
> > >
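
P.S. In case it helps to rerun the memstore estimate from the top of this reply with other heap sizes or record sizes, here is a minimal Java sketch of the same arithmetic. The class and method names are hypothetical (they are not part of HBase), the limit is passed as a whole percentage so the calculation stays in integer math, and the result ignores per-cell storage overhead, so treat it as an upper bound rather than a measured figure.

```java
// Hypothetical helper, not part of HBase: reproduces the back-of-envelope
// estimate heap * lowerLimit / recordSize for one region server.
class MemstoreEstimate {

    // heapBytes:       region server heap in bytes
    // memstorePercent: global memstore limit as a whole percentage (35 for 0.35)
    // recordBytes:     average record size in bytes (key + value)
    static long estimateRecords(long heapBytes, int memstorePercent, int recordBytes) {
        long memstoreBytes = heapBytes * memstorePercent / 100;
        return memstoreBytes / recordBytes;
    }

    public static void main(String[] args) {
        long heapBytes = 3000L * 1000 * 1000; // 3 GB heap, taken as 3000M as in the estimate
        int lowerLimitPercent = 35;           // 0.35, the 0.94 default quoted in this reply
        int recordBytes = 60;                 // 20-byte key + 40-byte value

        System.out.println(estimateRecords(heapBytes, lowerLimitPercent, recordBytes));
    }
}
```

With the numbers from this thread it prints 17500000, i.e. the 17.5 million records mentioned in the reply.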
