Can you tell us the average size of your records and how much heap is given to the region servers?
Thanks

On Aug 23, 2013, at 12:11 AM, Gautam Borah <[email protected]> wrote:

> Hello all,
>
> I have a use case where I need to write 1 million to 10 million records
> periodically (at intervals of 1 to 10 minutes) into an HBase table.
>
> Once the insert is completed, these records are queried immediately from
> another program - multiple reads.
>
> So, this is one massive write followed by many reads.
>
> I have two approaches to insert these records into the HBase table:
>
> use HTable or HTableMultiplexer to stream the data to the HBase table,
>
> or
>
> write the data to HDFS as a sequence file (Avro in my case), run a
> MapReduce job using HFileOutputFormat, and then load the output files
> into the HBase cluster. Something like:
>
> LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
> loader.doBulkLoad(new Path(outputDir), hTable);
>
> In my use case, which approach would be better?
>
> If I use the HTable interface, would the inserted data be in the HBase
> cache, before flushing to the files, for immediate read queries?
>
> If I use a MapReduce job to insert, would the data be loaded into the
> HBase cache immediately, or would only the output files be copied to the
> respective HBase table-specific directories?
>
> So, which approach is better for a write followed by immediate multiple
> read operations?
>
> Thanks,
> Gautam
