Re: Read thruput

Ted Mon, 01 Apr 2013 03:54:02 -0700

Can you increase the block cache size?

What version of HBase are you using?
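
One rough way to judge whether a larger block cache could help is to compare the cache capacity with the data size. The sketch below uses only figures quoted in this thread (8 GB region server heap, hfile.block.cache.size=0.35, 2 region servers, 60 million rows of less than 1 KB each). The class and method names are hypothetical, and treating every row as a full 1 KB is an assumption that overstates the data volume.

```java
public class BlockCacheEstimate {
    /** Heap bytes given to the block cache: heap size times hfile.block.cache.size. */
    public static long cacheBytes(long heapBytes, double cacheFraction) {
        return (long) (heapBytes * cacheFraction);
    }

    /** Upper bound on uncompressed row data, assuming every row is at the size cap. */
    public static long dataBytes(long rows, long maxRowBytes) {
        return rows * maxRowBytes;
    }

    public static void main(String[] args) {
        long heap = 8L * 1024 * 1024 * 1024; // 8 GB region server heap (from the thread)
        double fraction = 0.35;               // hfile.block.cache.size (from the thread)
        int servers = 2;                      // region server machines (from the thread)
        long rows = 60_000_000L;              // rows in the table (from the thread)
        long maxRowBytes = 1024;              // assumption: "less than 1KB" used as an upper bound

        long totalCache = cacheBytes(heap, fraction) * servers;
        long totalData = dataBytes(rows, maxRowBytes);
        double coverage = Math.min(1.0, (double) totalCache / totalData);

        System.out.printf("total block cache: %d bytes%n", totalCache);
        System.out.printf("upper-bound data:  %d bytes%n", totalData);
        System.out.printf("cacheable share:   %.1f%%%n", coverage * 100);
    }
}
```

With these inputs the cache covers roughly a tenth of the upper-bound data size, so uniformly random single-row reads would mostly miss the cache. The real ratio depends on actual row sizes and on index and bloom blocks sharing the same cache.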


Thanks

On Apr 1, 2013, at 3:47 AM, Vibhav Mundra <[email protected]> wrote:

> The typical size of each of my rows is less than 1 KB.
> 
> Regarding memory, I have allocated 8 GB to the HBase region servers and 4 GB to
> the datanodes, and I don't see either fully used, so I ruled out the GC aspect.
> 
> In case you still believe that GC is an issue, I will upload the GC logs.
> 
> -Vibhav
> 
> 
> On Mon, Apr 1, 2013 at 3:46 PM, ramkrishna vasudevan <
> [email protected]> wrote:
> 
>> Hi
>> 
>> How big are your rows? Are they wide rows, and what is the size of
>> each cell?
>> How many read threads are being used?
>> 
>> 
>> Were you able to take a thread dump when this was happening? Have you looked at
>> the GC log?
>> We may need some more info before we can diagnose the problem.
>> 
>> Regards
>> Ram
>> 
>> 
>> On Mon, Apr 1, 2013 at 3:39 PM, Vibhav Mundra <[email protected]> wrote:
>> 
>>> Hi All,
>>> 
>>> I am trying to use HBase for real-time data retrieval with a timeout of
>>> 50 ms.
>>> 
>>> I am using 2 machines as datanodes and region servers,
>>> and one machine as the master for Hadoop and HBase.
>>> 
>>> But I am able to fire only 3000 queries per second, and 10% of them are
>>> timing out.
>>> The database has 60 million rows.
>>> 
>>> Are these figures okay, or am I missing something?
>>> I have set the scanner caching to one, because each request
>>> fetches only a single row.
>>> 
>>> Here are the various configurations:
>>> 
>>> *Our schema*
>>> {NAME => 'mytable', FAMILIES => [{NAME => 'cf', DATA_BLOCK_ENCODING =>
>>> 'NONE', BLOOMFILTER => 'ROWCOL', REPLICATION_SCOPE => '0', COMPRESSION =>
>>> 'GZ', VERSIONS => '1', TTL => '2147483647', MIN_VERSIONS => '0',
>>> KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '8192', ENCODE_ON_DISK => 'true',
>>> IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
>>> 
>>> *Configuration*
>>> 1 machine running both the HBase and Hadoop masters
>>> 2 machines running both a region server and a datanode
>>> total 285 region servers
>>> 
>>> *Machine Level Optimizations:*
>>> a) Number of file descriptors is 1000000 (ulimit -n gives 1000000)
>>> b) Increased the read-ahead value to 4096
>>> c) Added noatime,nodiratime to the disks
>>> 
>>> *Hadoop Optimizations:*
>>> dfs.datanode.max.xcievers = 4096
>>> dfs.block.size = 33554432
>>> dfs.datanode.handler.count = 256
>>> io.file.buffer.size = 65536
>>> Hadoop data is split across 4 directories, so that different disks are
>>> accessed
>>> 
>>> *Hbase Optimizations*:
>>> 
>>> hbase.client.scanner.caching=1  #We have specifically added this, as we
>>> always return one row.
>>> hbase.regionserver.handler.count=3200
>>> hfile.block.cache.size=0.35
>>> hbase.hregion.memstore.mslab.enabled=true
>>> hfile.min.blocksize.size=16384
>>> hfile.min.blocksize.size=4
>>> hbase.hstore.blockingStoreFiles=200
>>> hbase.regionserver.optionallogflushinterval=60000
>>> hbase.hregion.majorcompaction=0
>>> hbase.hstore.compaction.max=100
>>> hbase.hstore.compactionThreshold=100
>>> 
>>> *Hbase-GC*
>>> -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled
>>> -XX:SurvivorRatio=20 -XX:ParallelGCThreads=16
>>> *Hadoop-GC*
>>> -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
>>> 
>>> -Vibhav
>>
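
The load figures in the original message can be related to the handler count with Little's law (average requests in flight = arrival rate times average latency). The sketch below is only a back-of-the-envelope check: the class and method names are hypothetical, and using the 50 ms timeout as the latency is an assumption that gives an upper bound for requests that finish in time.

```java
public class InFlightEstimate {
    /** Little's law: average requests in flight = arrival rate * average latency. */
    public static double inFlight(long requestsPerSecond, long latencyMillis) {
        return requestsPerSecond * latencyMillis / 1000.0;
    }

    public static void main(String[] args) {
        long qps = 3000;               // queries per second (from the thread)
        long timeoutMillis = 50;       // client timeout, used here as the latency (assumption)
        int handlersPerServer = 3200;  // hbase.regionserver.handler.count (from the thread)
        int servers = 2;               // region server machines (from the thread)

        double concurrent = inFlight(qps, timeoutMillis);
        long totalHandlers = (long) handlersPerServer * servers;

        System.out.printf("requests in flight: %.0f%n", concurrent);
        System.out.printf("handler threads:    %d%n", totalHandlers);
    }
}
```

Under these assumptions about 150 requests are in flight at once, far fewer than the configured handler threads, so the handler count is probably not what limits throughput here.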
