Can you increase the block cache size? What version of HBase are you using?
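
For context on the block cache question, here is a rough back-of-the-envelope estimate that uses only the figures from the quoted message (8 GB region server heap, hfile.block.cache.size=0.35, 60 million rows of under 1 KB each, 2 region server machines). It assumes rows are spread evenly across both machines, treats 1 KB as an upper bound on row size, and ignores compression, HFile index and Bloom filter overhead, so it is a sketch rather than a measurement. The function names are made up for illustration.

```python
GIB = 1024 ** 3


def block_cache_bytes(heap_bytes, cache_fraction):
    """Heap bytes reserved for the block cache (heap * hfile.block.cache.size)."""
    return int(heap_bytes * cache_fraction)


def cache_coverage(rows, row_bytes, servers, heap_bytes, cache_fraction):
    """Fraction of one server's share of the data that its block cache could hold.

    Assumption: rows are evenly distributed and every row is row_bytes long.
    """
    data_per_server = rows * row_bytes / servers
    return block_cache_bytes(heap_bytes, cache_fraction) / data_per_server


heap = 8 * GIB                       # region server heap from the thread
cache = block_cache_bytes(heap, 0.35)
coverage = cache_coverage(60_000_000, 1024, 2, heap, 0.35)

print(f"block cache per server: {cache / GIB:.2f} GiB")
print(f"share of per-server data that fits in cache: {coverage:.1%}")
```

Under these assumptions only around a tenth of the data fits in the cache, so with random single-row reads most requests would likely go to disk, which is why the cache size matters here.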
Thanks

On Apr 1, 2013, at 3:47 AM, Vibhav Mundra <[email protected]> wrote:

> The typical size of each of my rows is less than 1KB.
>
> Regarding the memory, I have used 8GB for HBase regionservers and 4 GB for
> datanodes and I don't see them completely used. So I ruled out the GC aspect.
>
> In case you still believe that GC is an issue, I will upload the gc logs.
>
> -Vibhav
>
>
> On Mon, Apr 1, 2013 at 3:46 PM, ramkrishna vasudevan <
> [email protected]> wrote:
>
>> Hi
>>
>> How big is your row? Are they wider rows and what would be the size of
>> every cell?
>> How many read threads are getting used?
>>
>>
>> Were you able to take a thread dump when this was happening? Have you seen
>> the GC log?
>> We may need some more info before we can think about the problem.
>>
>> Regards
>> Ram
>>
>>
>> On Mon, Apr 1, 2013 at 3:39 PM, Vibhav Mundra <[email protected]> wrote:
>>
>>> Hi All,
>>>
>>> I am trying to use HBase for real-time data retrieval with a timeout of
>>> 50 ms.
>>>
>>> I am using 2 machines as datanode and regionservers,
>>> and one machine as a master for hadoop and HBase.
>>>
>>> But I am able to fire only 3000 queries per sec and 10% of them are
>>> timing out.
>>> The database has 60 million rows.
>>>
>>> Are these figures okay, or am I missing something?
>>> I have set the scanner caching to one, because each time
>>> we are fetching a single row only.
>>>
>>> Here are the various configurations:
>>>
>>> *Our schema*
>>> {NAME => 'mytable', FAMILIES => [{NAME => 'cf', DATA_BLOCK_ENCODING =>
>>> 'NONE', BLOOMFILTER => 'ROWCOL', REPLICATION_SCOPE => '0', COMPRESSION =>
>>> 'GZ', VERSIONS => '1', TTL => '2147483647', MIN_VERSIONS => '0',
>>> KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '8192', ENCODE_ON_DISK => 'true',
>>> IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
>>>
>>> *Configuration*
>>> 1 Machine having both hbase and hadoop master
>>> 2 machines having both region server node and datanode
>>> total 285 region servers
>>>
>>> *Machine Level Optimizations:*
>>> a) No of file descriptors is 1000000 (ulimit -n gives 1000000)
>>> b) Increase the read-ahead value to 4096
>>> c) Added noatime,nodiratime to the disks
>>>
>>> *Hadoop Optimizations:*
>>> dfs.datanode.max.xcievers = 4096
>>> dfs.block.size = 33554432
>>> dfs.datanode.handler.count = 256
>>> io.file.buffer.size = 65536
>>> hadoop data is split on 4 directories, so that different disks are being
>>> accessed
>>>
>>> *Hbase Optimizations*:
>>>
>>> hbase.client.scanner.caching=1 #We have specifically added this, as we
>>> always return one row.
>>> hbase.regionserver.handler.count=3200
>>> hfile.block.cache.size=0.35
>>> hbase.hregion.memstore.mslab.enabled=true
>>> hfile.min.blocksize.size=16384
>>> hfile.min.blocksize.size=4
>>> hbase.hstore.blockingStoreFiles=200
>>> hbase.regionserver.optionallogflushinterval=60000
>>> hbase.hregion.majorcompaction=0
>>> hbase.hstore.compaction.max=100
>>> hbase.hstore.compactionThreshold=100
>>>
>>> *Hbase-GC*
>>> -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled
>>> -XX:SurvivorRatio=20 -XX:ParallelGCThreads=16
>>> *Hadoop-GC*
>>> -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
>>>
>>> -Vibhav
>>
