Each of my rows is typically less than 1 KB. As for memory, I have given 8 GB to the HBase regionservers and 4 GB to the datanodes, and I don't see either of them fully used. So I ruled out the GC aspect.
In case you still believe that GC is an issue, I will upload the GC logs.

-Vibhav

On Mon, Apr 1, 2013 at 3:46 PM, ramkrishna vasudevan <[email protected]> wrote:

> Hi
>
> How big is your row? Are they wider rows and what would be the size of
> every cell?
> How many read threads are getting used?
>
> Were you able to take a thread dump when this was happening? Have you seen
> the GC log?
> May be need some more info before we can think of the problem.
>
> Regards
> Ram
>
> On Mon, Apr 1, 2013 at 3:39 PM, Vibhav Mundra <[email protected]> wrote:
>
> > Hi All,
> >
> > I am trying to use Hbase for real-time data retrieval with a timeout of
> > 50 ms.
> >
> > I am using 2 machines as datanode and regionservers,
> > and one machine as a master for hadoop and Hbase.
> >
> > But I am able to fire only 3000 queries per sec and 10% of them are
> > timing out.
> > The database has 60 million rows.
> >
> > Are these figure okie, or I am missing something.
> > I have used the scanner caching to be equal to one, because for each time
> > we are fetching a single row only.
> >
> > Here are the various configurations:
> >
> > *Our schema*
> > {NAME => 'mytable', FAMILIES => [{NAME => 'cf', DATA_BLOCK_ENCODING =>
> > 'NONE', BLOOMFILTER => 'ROWCOL', REPLICATION_SCOPE => '0', COMPRESSION =>
> > 'GZ', VERSIONS => '1', TTL => '2147483647', MIN_VERSIONS => '0',
> > KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '8192', ENCODE_ON_DISK =>
> > 'true', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
> >
> > *Configuration*
> > 1 Machine having both hbase and hadoop master
> > 2 machines having both region server node and datanode
> > total 285 region servers
> >
> > *Machine Level Optimizations:*
> > a) No of file descriptors is 1000000 (ulimit -n gives 1000000)
> > b) Increase the read-ahead value to 4096
> > c) Added noatime,nodiratime to the disks
> >
> > *Hadoop Optimizations:*
> > dfs.datanode.max.xcievers = 4096
> > dfs.block.size = 33554432
> > dfs.datanode.handler.count = 256
> > io.file.buffer.size = 65536
> > hadoop data is split on 4 directories, so that different disks are being
> > accessed
> >
> > *Hbase Optimizations:*
> >
> > hbase.client.scanner.caching=1 #We have specifically added this, as we
> > return always one row.
> > hbase.regionserver.handler.count=3200
> > hfile.block.cache.size=0.35
> > hbase.hregion.memstore.mslab.enabled=true
> > hfile.min.blocksize.size=16384
> > hfile.min.blocksize.size=4
> > hbase.hstore.blockingStoreFiles=200
> > hbase.regionserver.optionallogflushinterval=60000
> > hbase.hregion.majorcompaction=0
> > hbase.hstore.compaction.max=100
> > hbase.hstore.compactionThreshold=100
> >
> > *Hbase-GC*
> > -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled
> > -XX:SurvivorRatio=20 -XX:ParallelGCThreads=16
> > *Hadoop-GC*
> > -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
> >
> > -Vibhav
> >
>
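
For scale, here is a rough back-of-the-envelope sketch in Python built only from the figures quoted in this thread. It is an estimate under stated assumptions, not a diagnosis: it treats 1 KB as an upper bound on row size, reads "8GB" as 8 GiB of regionserver heap, and ignores GZ compression, key/value overhead, and index/Bloom blocks. The function names are made up for illustration.

```python
# Back-of-the-envelope numbers taken from the figures quoted in this thread.
# Assumptions (not from the thread): 1 KB is used as an upper bound per row,
# "8GB" is read as 8 GiB, and compression and per-cell overhead are ignored.

ROWS = 60_000_000             # "The database has 60 million rows"
MAX_ROW_BYTES = 1024          # "less than 1KB" per row, taken as an upper bound
REGIONSERVER_HEAP_GIB = 8     # heap given to each regionserver
BLOCK_CACHE_FRACTION = 0.35   # hfile.block.cache.size
REGIONSERVER_MACHINES = 2     # machines running a regionserver
QPS = 3000                    # observed queries per second
TIMEOUT_S = 0.050             # 50 ms client timeout

GIB = 1024 ** 3


def total_block_cache_gib():
    """Block cache summed over both regionserver machines, in GiB."""
    return REGIONSERVER_HEAP_GIB * BLOCK_CACHE_FRACTION * REGIONSERVER_MACHINES


def max_uncompressed_data_gib():
    """Upper bound on raw row data, in GiB, at 1 KB per row."""
    return ROWS * MAX_ROW_BYTES / GIB


def max_requests_in_flight():
    """Little's law: concurrency = arrival rate x time in system.

    Uses the 50 ms timeout as the upper bound on time in system for
    requests that complete in time.
    """
    return QPS * TIMEOUT_S


if __name__ == "__main__":
    print(f"block cache (total):    {total_block_cache_gib():.1f} GiB")
    print(f"row data (upper bound): {max_uncompressed_data_gib():.1f} GiB")
    print(f"requests in flight:     {max_requests_in_flight():.0f}")
```

Under these assumptions the combined block cache is about 5.6 GiB against up to roughly 57 GiB of row data, and about 150 requests are in flight across the cluster at 3000 queries per second. Whether most random reads actually miss the cache depends on the real row sizes and access pattern, which this sketch does not know.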
