Hi,

The average size of my records is 60 bytes - a 20-byte key and a 40-byte value. The table has one column family.

I have set up a cluster for testing - 1 master and 3 region servers. Each has a heap size of 3 GB and a single CPU. I have pre-split the table into 30 regions. I do not have to keep data forever; I can purge older records periodically.

Thanks,
Gautam

On Fri, Aug 23, 2013 at 3:20 AM, Ted Yu <[email protected]> wrote:

> Can you tell us the average size of your records and how much heap is
> given to the region servers?
>
> Thanks
>
> On Aug 23, 2013, at 12:11 AM, Gautam Borah <[email protected]> wrote:
>
> > Hello all,
> >
> > I have a use case where I need to write 1 million to 10 million records
> > periodically (at intervals of 1 minute to 10 minutes) into an HBase
> > table.
> >
> > Once the insert is completed, these records are queried immediately from
> > another program - multiple reads.
> >
> > So, this is one massive write followed by many reads.
> >
> > I have two approaches for inserting these records into the HBase table -
> >
> > Use HTable or HTableMultiplexer to stream the data to the HBase table,
> >
> > or
> >
> > Write the data to HDFS as a sequence file (Avro in my case), run a
> > MapReduce job using HFileOutputFormat, and then load the output files
> > into the HBase cluster. Something like:
> >
> > LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
> > loader.doBulkLoad(new Path(outputDir), hTable);
> >
> > In my use case, which approach would be better?
> >
> > If I use the HTable interface, would the inserted data be in the HBase
> > cache, before flushing to the files, for immediate read queries?
> >
> > If I use a MapReduce job to insert, would the data be loaded into the
> > HBase cache immediately? Or would only the output files be copied into
> > the respective HBase table-specific directories?
> >
> > So, which approach is better for a write followed by immediate multiple
> > read operations?
> >
> > Thanks,
> > Gautam
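For the streaming (HTable) path, whether a freshly written batch is still readable from the memstores depends on whether it fits before they flush. A rough back-of-the-envelope sketch, using the numbers from this thread (10M records x 60 bytes, 3 region servers with 3 GB heap each, 30 regions) and assuming the HBase defaults hbase.regionserver.global.memstore.size = 0.4 and hbase.hregion.memstore.flush.size = 128 MB; the ~100-byte per-cell KeyValue overhead figure is an assumption for illustration, not a measured value:

```java
public class MemstoreEstimate {
    public static void main(String[] args) {
        long records = 10_000_000L;
        // Raw payload: 20-byte key + 40-byte value (from the thread).
        long rawBytesPerRecord = 60L;
        // Assumed per-cell overhead for KeyValue metadata (row, family,
        // qualifier, timestamp bookkeeping) - a rough guess, not measured.
        long overheadPerRecord = 100L;

        long batchBytes = records * (rawBytesPerRecord + overheadPerRecord);

        int regionServers = 3;
        long heapPerServer = 3L * 1024 * 1024 * 1024;   // 3 GB
        double globalMemstoreFraction = 0.4;            // default global limit
        long flushSize = 128L * 1024 * 1024;            // default per-region flush size
        int regions = 30;

        // Upper bound on total memstore memory across the cluster:
        long memstoreCapacity =
            (long) (regionServers * heapPerServer * globalMemstoreFraction);
        // Per-region flush threshold caps what stays in memory, assuming the
        // pre-split keeps writes evenly spread over the 30 regions:
        long flushCapacity = (long) regions * flushSize;

        System.out.printf("batch size        = %d MB%n", batchBytes >> 20);
        System.out.printf("memstore capacity = %d MB (global limit)%n",
                memstoreCapacity >> 20);
        System.out.printf("flush capacity    = %d MB (30 regions x 128 MB)%n",
                flushCapacity >> 20);
        System.out.println("fits before flush: " + (batchBytes < flushCapacity));
    }
}
```

Under these assumptions a 10M-record batch (~1.5 GB with overhead) stays under both limits, so recent puts would still be served from the memstores; the bulk-load path, by contrast, writes HFiles directly, so reads would go to the block cache only after the first read of each block.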
