Re: best approach for write and immediate read use case

Ted Yu Fri, 23 Aug 2013 14:44:33 -0700

Assuming you are using HBase 0.94, the default value
for hbase.regionserver.global.memstore.lowerLimit is 0.35.


This means the memstores on each region server could hold roughly
3000M * 0.35 / 60 = 17.5 million records.
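
As a rough sanity check of the arithmetic above, here is a minimal Java
sketch. The class and method names (MemstoreEstimate, estimateRecords) are
made up for illustration, "M" is taken as 10^6 bytes, and only raw key and
value bytes are counted. Per-KeyValue overhead in the memstore is ignored,
so the real capacity would likely be lower.

```java
public class MemstoreEstimate {
    // Hypothetical helper: rough number of records that fit in the memstore
    // share of the heap. Counts raw key + value bytes only (assumption);
    // in-memory KeyValue overhead is not modeled.
    static long estimateRecords(long heapBytes, double memstoreFraction, int recordBytes) {
        // Round to the nearest record to avoid floating-point truncation.
        return Math.round(heapBytes * memstoreFraction / recordBytes);
    }

    public static void main(String[] args) {
        long heapBytes = 3000L * 1000 * 1000; // 3000M, taking M as 10^6 bytes
        double lowerLimit = 0.35;             // hbase.regionserver.global.memstore.lowerLimit
        int recordBytes = 60;                 // 20-byte key + 40-byte value
        System.out.println(estimateRecords(heapBytes, lowerLimit, recordBytes));
    }
}
```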

bq. If I use HTable interface, would the inserted data be in the HBase
cache, before flushing to the files, for immediate read queries?

Yes.

Cheers


On Fri, Aug 23, 2013 at 12:01 PM, Gautam Borah <[email protected]> wrote:

> Hi,
>
> Average size of my records is 60 bytes - 20 bytes Key and 40 bytes value,
> table has one column family.
>
> I have set up a cluster for testing - 1 master and 3 region servers. Each
> has a heap size of 3 GB and a single CPU.
>
> I have pre-split the table into 30 regions. I do not have to keep data
> forever, I could purge older records periodically.
>
> Thanks,
>
> Gautam
>
>
>
> On Fri, Aug 23, 2013 at 3:20 AM, Ted Yu <[email protected]> wrote:
>
> > Can you tell us the average size of your records and how much heap is
> > given to the region servers ?
> >
> > Thanks
> >
> > On Aug 23, 2013, at 12:11 AM, Gautam Borah <[email protected]> wrote:
> >
> > > Hello all,
> > >
> > > I have a use case where I need to write 1 million to 10 million
> > > records periodically (at intervals of 1 minute to 10 minutes) into
> > > an HBase table.
> > >
> > > Once the insert is completed, these records are queried immediately
> > > from another program - multiple reads.
> > >
> > > So, this is one massive write followed by many reads.
> > >
> > > I have two approaches to insert these records into the HBase table -
> > >
> > > Use HTable or HTableMultiplexer to stream the data to HBase table.
> > >
> > > or
> > >
> > > Write the data to HDFS as a sequence file (Avro in my case), run a
> > > MapReduce job using HFileOutputFormat, and then load the output files
> > > into the HBase cluster.
> > > Something like,
> > >
> > >  LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
> > >  loader.doBulkLoad(new Path(outputDir), hTable);
> > >
> > >
> > > In my use case which approach would be better?
> > >
> > > If I use HTable interface, would the inserted data be in the HBase
> > > cache, before flushing to the files, for immediate read queries?
> > >
> > > If I use a MapReduce job to insert, would the data be loaded into the
> > > HBase cache immediately, or would only the output files be copied to
> > > the respective HBase table directories?
> > >
> > > So, which approach is better for write and then immediate multiple read
> > > operations?
> > >
> > > Thanks,
> > > Gautam
> >
>
