Re: Read thruput

Vibhav Mundra Mon, 01 Apr 2013 10:50:55 -0700

Yes, I have changed the BLOCK CACHE % to 0.35.

-Vibhav
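
For a rough sense of what a 0.35 block cache fraction buys, the sketch below estimates what share of the table could sit in the block cache. It uses figures from the thread (8 GB region server heap, 2 region servers, 60 million rows, rows under 1 KB). It assumes, for illustration only, that every row is a full 1 KB, that cached blocks are held uncompressed, and that per-cell key overhead and index/bloom blocks can be ignored. The function and variable names are made up for this example.

```python
GIB = 1024 ** 3  # bytes in one GiB


def block_cache_bytes(heap_bytes, cache_fraction):
    """Heap reserved for the block cache (hfile.block.cache.size is a heap fraction)."""
    return heap_bytes * cache_fraction


def cache_coverage(rows, row_bytes, servers, heap_bytes, cache_fraction):
    """Fraction of the raw data that fits in the combined block caches (capped at 1.0)."""
    data_bytes = rows * row_bytes
    total_cache = servers * block_cache_bytes(heap_bytes, cache_fraction)
    return min(1.0, total_cache / data_bytes)


# Figures from the thread; 1 KB per row is an assumed upper bound.
coverage = cache_coverage(
    rows=60_000_000,
    row_bytes=1024,
    servers=2,
    heap_bytes=8 * GIB,
    cache_fraction=0.35,
)
print(f"Estimated share of data that fits in block cache: {coverage:.1%}")
```

Under these assumptions only about a tenth of the data fits in cache, so most random gets would go to disk. Since 1 KB is an upper bound on row size, the real share is likely higher.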



On Mon, Apr 1, 2013 at 10:20 PM, Ted Yu <[email protected]> wrote:

> I was aware of that discussion which was about MAX_FILESIZE and BLOCKSIZE
>
> My suggestion was about block cache percentage.
>
> Cheers
>
>
> On Mon, Apr 1, 2013 at 4:57 AM, Vibhav Mundra <[email protected]> wrote:
>
> > I have used the following site:
> > http://grokbase.com/t/hbase/user/11bat80x7m/row-get-very-slow
> >
> > to lessen the value of block cache.
> >
> > -Vibhav
> >
> >
> > On Mon, Apr 1, 2013 at 4:23 PM, Ted <[email protected]> wrote:
> >
> > > Can you increase block cache size ?
> > >
> > > What version of hbase are you using ?
> > >
> > > Thanks
> > >
> > > On Apr 1, 2013, at 3:47 AM, Vibhav Mundra <[email protected]> wrote:
> > >
> > > > > The typical size of each of my rows is less than 1 KB.
> > > >
> > > > > Regarding the memory, I have used 8 GB for the HBase region servers and
> > > > > 4 GB for the datanodes, and I don't see them completely used. So I ruled
> > > > > out the GC aspect.
> > > >
> > > > > In case you still believe that GC is an issue, I will upload the GC logs.
> > > >
> > > > -Vibhav
> > > >
> > > >
> > > > On Mon, Apr 1, 2013 at 3:46 PM, ramkrishna vasudevan <
> > > > [email protected]> wrote:
> > > >
> > > >> Hi
> > > >>
> > > > >> How big is your row?  Are they wider rows and what would be the size of
> > > > >> every cell?
> > > >> How many read threads are getting used?
> > > >>
> > > >>
> > > > >> Were you able to take a thread dump when this was happening?  Have you seen
> > > > >> the GC log?
> > > > >> Maybe we need some more info before we can think about the problem.
> > > >>
> > > >> Regards
> > > >> Ram
> > > >>
> > > >>
> > > > >> On Mon, Apr 1, 2013 at 3:39 PM, Vibhav Mundra <[email protected]> wrote:
> > > >>
> > > >>> Hi All,
> > > >>>
> > > > >>> I am trying to use HBase for real-time data retrieval with a timeout of
> > > > >>> 50 ms.
> > > >>>
> > > >>> I am using 2 machines as datanode and regionservers,
> > > >>> and one machine as a master for hadoop and Hbase.
> > > >>>
> > > > >>> But I am able to fire only 3000 queries per second, and 10% of them are
> > > > >>> timing out.
> > > >>> The database has 60 million rows.
> > > >>>
> > > > >>> Are these figures okay, or am I missing something?
> > > > >>> I have set the scanner caching to one, because each time
> > > > >>> we are fetching a single row only.
> > > >>>
> > > >>> Here are the various configurations:
> > > >>>
> > > > >>> *Our schema*
> > > > >>> {NAME => 'mytable', FAMILIES => [{NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE',
> > > > >>> BLOOMFILTER => 'ROWCOL', REPLICATION_SCOPE => '0', COMPRESSION => 'GZ',
> > > > >>> VERSIONS => '1', TTL => '2147483647', MIN_VERSIONS => '0',
> > > > >>> KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '8192', ENCODE_ON_DISK => 'true',
> > > > >>> IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
> > > >>>
> > > >>> *Configuration*
> > > >>> 1 Machine having both hbase and hadoop master
> > > >>> 2 machines having both region server node and datanode
> > > >>> total 285 region servers
> > > >>>
> > > >>> *Machine Level Optimizations:*
> > > > >>> a) Number of file descriptors is 1000000 (ulimit -n gives 1000000)
> > > > >>> b) Increased the read-ahead value to 4096
> > > > >>> c) Added noatime,nodiratime to the disks
> > > >>>
> > > >>> *Hadoop Optimizations:*
> > > >>> dfs.datanode.max.xcievers = 4096
> > > >>> dfs.block.size = 33554432
> > > >>> dfs.datanode.handler.count = 256
> > > >>> io.file.buffer.size = 65536
> > > > >>> Hadoop data is split across 4 directories, so that different disks are
> > > > >>> being accessed
> > > >>>
> > > >>> *Hbase Optimizations*:
> > > >>>
> > > > >>> hbase.client.scanner.caching=1  # We have specifically added this, as we
> > > > >>> # always return one row.
> > > >>> hbase.regionserver.handler.count=3200
> > > >>> hfile.block.cache.size=0.35
> > > >>> hbase.hregion.memstore.mslab.enabled=true
> > > >>> hfile.min.blocksize.size=16384
> > > >>> hfile.min.blocksize.size=4
> > > >>> hbase.hstore.blockingStoreFiles=200
> > > >>> hbase.regionserver.optionallogflushinterval=60000
> > > >>> hbase.hregion.majorcompaction=0
> > > >>> hbase.hstore.compaction.max=100
> > > >>> hbase.hstore.compactionThreshold=100
> > > >>>
> > > > >>> *Hbase-GC*
> > > > >>> -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled
> > > > >>> -XX:SurvivorRatio=20 -XX:ParallelGCThreads=16
> > > >>> *Hadoop-GC*
> > > >>> -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
> > > >>>
> > > >>> -Vibhav
> > > >>
> > >
> >
>
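
As a sanity check on the handler count quoted above, Little's law (requests in flight = arrival rate x time in system) gives a rough upper bound on concurrent requests. The sketch treats the 50 ms timeout as the worst-case time a request spends in the system and assumes load is spread evenly across the 2 region servers. Both are simplifying assumptions, and the names are made up for this example.

```python
def in_flight_requests(queries_per_sec, latency_ms):
    """Little's law: average number of requests in the system at once."""
    return queries_per_sec * latency_ms / 1000


# Figures from the thread: 3000 queries/sec, 50 ms timeout, 2 region servers.
cluster_in_flight = in_flight_requests(queries_per_sec=3000, latency_ms=50)
per_server_in_flight = cluster_in_flight / 2  # assumes an even split

configured_handlers = 3200  # hbase.regionserver.handler.count from the thread
print(f"In flight cluster-wide: {cluster_in_flight:.0f}")
print(f"In flight per region server: {per_server_in_flight:.0f}")
print(f"Handlers per in-flight request: {configured_handlers / per_server_in_flight:.1f}")
```

With the thread's numbers, the estimated concurrency is far below hbase.regionserver.handler.count=3200, which suggests the handler pool is unlikely to be the limiting factor.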
