> <name>hfile.block.cache.size</name>
> <value>0.0</value>
Yikes. Don't do that. :) Even if your blocks are in the OS cache, upon each single Get HBase needs to re-allocate a new 64k block on the heap (including the index blocks).

If there is no chance that a working set of the data fits into the aggregate block cache across the region servers, and your reads are truly random, then you can try to reduce the BLOCKSIZE (currently '65536'), maybe to 8192 or even less (but that has other downsides). Also disable Nagle's algorithm (i.e. enable tcpnodelay at both the HBase and HDFS layers).

________________________________
From: "[email protected]" <[email protected]>
To: user <[email protected]>
Sent: Wednesday, August 13, 2014 2:23 AM
Subject: Re: Re: Any fast way to random access hbase data?

Haven't tried yet; only one thread. 10 region servers, 2555 regions in total. I am new to HBase and not sure what exactly the block cache means. Here's the configuration I can see from the CDH HBase master UI:

<name>hbase.rs.cacheblocksonwrite</name>
<value>false</value>
<source>hbase-default.xml</source>

<name>hbase.offheapcache.percentage</name>
<value>0</value>
<source>hbase-default.xml</source>

<name>hfile.block.cache.size</name>
<value>0.0</value>
<source>programatically</source>

Table description:

{NAME => 'userdigest', coprocessor$3 => 'hdfs://agrant/user/tracking/userdigest/coprocessor/endpoint_0.0.17.jar|com.agrantsem.data.userdigest.endpoint.UserdigestEndPoint|1001|', coprocessor$2 => '|org.apache.hadoop.hbase.coprocessor.AggregateImplementation||', FAMILIES => [{NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROWCOL', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'LZ4', MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}]}

[email protected]

From: Esteban Gutierrez
Date: 2014-08-13 15:59
To: [email protected]
Subject: Re: Any fast way to random access hbase data?

Hello Lei,

Have you tried a larger batch size?
How many threads or tasks are you using to fetch data? Could you describe your HBase cluster a little more? e.g. how many region servers, how many regions per RS? What's the hit ratio of the block cache? Any chance for you to share the table schema?

cheers,
esteban.

--
Cloudera, Inc.

On Wed, Aug 13, 2014 at 12:34 AM, [email protected] <[email protected]> wrote:
>
> I have an HBase table with more than 2G rows.
> Every hour there come 5M~10M row ids and I must get all the row info from
> the HBase table.
> But even when I use the batch call (1000 row ids as a list) as described here
>
> http://stackoverflow.com/questions/13310434/hbase-api-get-data-rows-information-by-list-of-row-ids
>
> it takes about 1 hour.
> Any other way to do this more quickly?
>
> Thanks,
> Lei
>
> [email protected]
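On the "larger batch size" suggestion above: the multi-get API (`Table#get(List<Get>)` in the HBase client) takes a list of Gets per RPC, so the tunable is how many row ids go into each list. A minimal sketch of the batching step, assuming the `BatchPartition` class name and the batch size of 1000 are illustrative (the fetch call itself is shown only as a comment, since it needs a live cluster):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BatchPartition {

    // Split a large list of row keys into fixed-size batches so each
    // multi-get RPC (Table#get(List<Get>) in the HBase client API) stays
    // bounded instead of becoming one huge request.
    public static <T> List<List<T>> partition(List<T> rowKeys, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < rowKeys.size(); i += batchSize) {
            batches.add(new ArrayList<>(
                rowKeys.subList(i, Math.min(i + batchSize, rowKeys.size()))));
        }
        return batches;
        // For each batch you would then build a List<Get> and call
        // table.get(gets), ideally from several threads so multiple
        // region servers are kept busy at once.
    }

    public static void main(String[] args) {
        // 10,500 placeholder row ids split into batches of 1,000:
        List<List<Integer>> batches =
            partition(Collections.nCopies(10_500, 0), 1000);
        System.out.println(batches.size());          // 11 batches
        System.out.println(batches.get(10).size());  // trailing batch of 500
    }
}
```

With 5M-10M ids per hour and a single thread, the per-RPC round trips dominate; larger batches and parallel workers attack that directly.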

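For reference, the fixes suggested at the top of the thread (re-enable the block cache, enable tcpnodelay on both layers) would land in the site configs. A sketch, assuming a CDH-era HBase; property names and defaults vary by version, so verify them against your release's hbase-default.xml and core-default.xml:

```xml
<!-- hbase-site.xml -->
<property>
  <name>hfile.block.cache.size</name>
  <!-- fraction of region-server heap for the block cache;
       0.0 (as in this thread) disables caching entirely -->
  <value>0.25</value>
</property>
<property>
  <name>hbase.ipc.client.tcpnodelay</name>
  <value>true</value>
</property>
<property>
  <name>hbase.ipc.server.tcpnodelay</name>
  <value>true</value>
</property>

<!-- core-site.xml on the HDFS side: disable Nagle's for Hadoop IPC -->
<property>
  <name>ipc.server.tcpnodelay</name>
  <value>true</value>
</property>
```

The BLOCKSIZE change, by contrast, is a per-column-family table attribute (set via alter in the HBase shell) rather than a site-config property, and only takes effect as files are rewritten by compaction.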