Try disabling block encoding - you will get better numbers.

>> I mean per region scan speed,
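To try the block-encoding suggestion, the encoding can be switched off per column family from the HBase shell. A sketch, assuming the table and family names shown in the HFile dump later in this thread; note that existing HFiles only pick up the new encoding after they are rewritten by a major compaction:

```
# Switch family F1 from FAST_DIFF to no block encoding
alter 'KYLIN_YMSGYYXO12', {NAME => 'F1', DATA_BLOCK_ENCODING => 'NONE'}

# Rewrite existing HFiles so the change takes effect on disk
major_compact 'KYLIN_YMSGYYXO12'
```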
Scan performance depends on the number of CPU cores: the more cores you
have, the more performance you will get. Your servers are pretty low end
(4 virtual CPU cores is just 2 hardware cores). With 32 cores per node you
will get close to an 8x speed-up.

-Vlad

On Thu, Apr 21, 2016 at 7:22 PM, hongbin ma <[email protected]> wrote:

> hi Thakrar
>
> Thanks for your reply.
>
> My settings for the RegionScanner Scan are:
>
> scan.setCaching(1024)
> scan.setMaxResultSize(5M)
>
> Even if I change the caching to 100000 I'm still not getting any
> improvement. I guess the caching works for remote scans through RPC, but
> does not help much for region-side scans?
>
> I also tried PREFETCH_BLOCKS_ON_OPEN for the whole table, but no
> improvement was observed.
>
> I'm pursuing pure scan-read performance optimization because our
> application is essentially read-only. I observed that even when I do
> nothing else (only scanning) in my coprocessor, the scan speed is not
> satisfying. The CPU seems to be fully utilized. Maybe the process of
> decoding FAST_DIFF rows is too heavy for the CPU? How many rows per
> second of scan speed would you expect on a normal setup? I mean per
> region scan speed, not the overall scan speed across all regions.
>
> thanks
>
> On Thu, Apr 21, 2016 at 10:24 PM, Thakrar, Jayesh <
> [email protected]> wrote:
>
> > Just curious - have you set the scanner caching to some high value - say
> > 1000 (or even higher in your small-value case)?
> >
> > The parameter is hbase.client.scanner.caching
> >
> > You can read up on it - https://hbase.apache.org/book.html
> >
> > Another thing, are you just looking for pure scan-read performance
> > optimization?
> > Depending upon the table size you can also look into caching the table
> > or not caching at all.
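For reference, `hbase.client.scanner.caching` can also be raised globally in the client-side hbase-site.xml instead of per Scan. A sketch using the value 1000 suggested above; a per-Scan `setCaching(...)` still overrides this default:

```xml
<!-- client-side hbase-site.xml: rows fetched per scanner RPC -->
<property>
  <name>hbase.client.scanner.caching</name>
  <value>1000</value>
</property>
```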
> >
> > -----Original Message-----
> > From: hongbin ma [mailto:[email protected]]
> > Sent: Thursday, April 21, 2016 5:04 AM
> > To: [email protected]
> > Subject: Rows per second for RegionScanner
> >
> > Hi, experts,
> >
> > I'm trying to figure out how fast HBase can scan. I'm setting up the
> > RegionScanner in an endpoint coprocessor so that no network overhead is
> > included. My average key length is 35 and my average value length is 5.
> >
> > My test result is that if I warm all the blocks I'm interested in into
> > the block cache, I'm only able to scan around 300,000 rows per second
> > per region (with an endpoint I guess it's one thread per region), so
> > it's like getting 15M of data per second. I'm not sure if this is
> > already an acceptable number for HBase. Answers from you experts might
> > help me decide whether it's worth digging further into tuning.
> >
> > thanks!
> >
> > other info:
> >
> > My HBase cluster is on 8 AWS m1.xlarge instances, with 4 CPU cores and
> > 16G RAM each. Each region server is configured with a 10G heap. The
> > test HTable has 23 regions, one HFile per region (just major
> > compacted). There was no other resource contention when I ran the
> > tests.
> >
> > Attached is the HFile output of one of the region HFiles:
> > =============================================
> > hbase org.apache.hadoop.hbase.io.hfile.HFile -m -s -v -f \
> > /apps/hbase/data/data/default/KYLIN_YMSGYYXO12/d42b9faf43eafcc9640aa256143d5be3/F1/30b8a8ff5a82458481846e364974bf06
> > 2016-04-21 09:16:04,091 INFO [main] Configuration.deprecation:
> > hadoop.native.lib is deprecated. Instead, use io.native.lib.available
> > 2016-04-21 09:16:04,292 INFO [main] util.ChecksumType: Checksum using
> > org.apache.hadoop.util.PureJavaCrc32
> > 2016-04-21 09:16:04,294 INFO [main] util.ChecksumType: Checksum can use
> > org.apache.hadoop.util.PureJavaCrc32C
> > SLF4J: Class path contains multiple SLF4J bindings.
> > SLF4J: Found binding in
> > [jar:file:/usr/hdp/2.2.9.0-3393/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > SLF4J: Found binding in
> > [jar:file:/usr/hdp/2.2.9.0-3393/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
> > explanation.
> > 2016-04-21 09:16:05,654 INFO [main] Configuration.deprecation:
> > fs.default.name is deprecated. Instead, use fs.defaultFS
> > Scanning ->
> > /apps/hbase/data/data/default/KYLIN_YMSGYYXO12/d42b9faf43eafcc9640aa256143d5be3/F1/30b8a8ff5a82458481846e364974bf06
> > Block index size as per heapsize: 3640
> > reader=/apps/hbase/data/data/default/KYLIN_YMSGYYXO12/d42b9faf43eafcc9640aa256143d5be3/F1/30b8a8ff5a82458481846e364974bf06,
> >     compression=none,
> >     cacheConf=CacheConfig:disabled,
> >     firstKey=\x00\x0B\x00\x00\x00\x00\x00\x00\x00\x09\x00\x00\x00\x00\x00\x01\xF4/F1:M/0/Put,
> >     lastKey=\x00\x0B\x00\x00\x00\x00\x00\x00\x00\x1F\x06-?\x0F"U\x00\x00\x03[^\xD9/F1:M/0/Put,
> >     avgKeyLen=35,
> >     avgValueLen=5,
> >     entries=160988965,
> >     length=1832309188
> > Trailer:
> >     fileinfoOffset=1832308623,
> >     loadOnOpenDataOffset=1832306641,
> >     dataIndexCount=43,
> >     metaIndexCount=0,
> >     totalUncomressedBytes=1831809883,
> >     entryCount=160988965,
> >     compressionCodec=NONE,
> >     uncompressedDataIndexSize=5558733,
> >     numDataIndexLevels=2,
> >     firstDataBlockOffset=0,
> >     lastDataBlockOffset=1832250057,
> >     comparatorClassName=org.apache.hadoop.hbase.KeyValue$KeyComparator,
> >     majorVersion=2,
> >     minorVersion=3
> > Fileinfo:
> >     DATA_BLOCK_ENCODING = FAST_DIFF
> >     DELETE_FAMILY_COUNT = \x00\x00\x00\x00\x00\x00\x00\x00
> >     EARLIEST_PUT_TS = \x00\x00\x00\x00\x00\x00\x00\x00
> >     MAJOR_COMPACTION_KEY = \xFF
> >     MAX_SEQ_ID_KEY = 4
> >     TIMERANGE = 0....0
> >     hfile.AVG_KEY_LEN = 35
> >     hfile.AVG_VALUE_LEN = 5
> >     hfile.LASTKEY =
> > \x00\x16\x00\x0B\x00\x00\x00\x00\x00\x00\x00\x1F\x06-?\x0F"U\x00\x00\x03[^\xD9\x02F1M\x00\x00\x00\x00\x00\x00\x00\x00\x04
> > Mid-key:
> > \x00\x12\x00\x0B\x00\x00\x00\x00\x00\x00\x00\x1D\x04_\x07\x89\x00\x00\x02l\x00\x7F\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\x00\x00\x00\x007|\xBE$\x00\x00;\x81
> > Bloom filter:
> >     Not present
> > Delete Family Bloom filter:
> >     Not present
> > Stats:
> > Key length:
> >     min = 32.00
> >     max = 37.00
> >     mean = 35.11
> >     stddev = 1.46
> >     median = 35.00
> >     75% <= 37.00
> >     95% <= 37.00
> >     98% <= 37.00
> >     99% <= 37.00
> >     99.9% <= 37.00
> >     count = 160988965
> > Row size (bytes):
> >     min = 44.00
> >     max = 55.00
> >     mean = 48.17
> >     stddev = 1.43
> >     median = 48.00
> >     75% <= 50.00
> >     95% <= 50.00
> >     98% <= 50.00
> >     99% <= 50.00
> >     99.9% <= 51.97
> >     count = 160988965
> > Row size (columns):
> >     min = 1.00
> >     max = 1.00
> >     mean = 1.00
> >     stddev = 0.00
> >     median = 1.00
> >     75% <= 1.00
> >     95% <= 1.00
> >     98% <= 1.00
> >     99% <= 1.00
> >     99.9% <= 1.00
> >     count = 160988965
> > Val length:
> >     min = 4.00
> >     max = 12.00
> >     mean = 5.06
> >     stddev = 0.33
> >     median = 5.00
> >     75% <= 5.00
> >     95% <= 5.00
> >     98% <= 6.00
> >     99% <= 8.00
> >     99.9% <= 9.00
> >     count = 160988965
> > Key of biggest row:
> > \x00\x0B\x00\x00\x00\x00\x00\x00\x00\x1F\x04\xDD:\x06\x00U\x00\x00\x00\x8DS\xD2
> > Scanned kv count -> 160988965
> >
> > This email and any files included with it may contain privileged,
> > proprietary and/or confidential information that is for the sole use of
> > the intended recipient(s). Any disclosure, copying, distribution,
> > posting, or use of the information contained in or attached to this
> > email is prohibited unless permitted by the sender. If you have
> > received this email in error, please immediately notify the sender via
> > return email, telephone, or fax and destroy this original transmission
> > and its included files without reading or saving it in any manner.
> > Thank you.
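As a sanity check on the "15M per second" figure quoted earlier in the thread, the per-region throughput follows directly from the observed scan rate and the mean row size in the stats above. A quick back-of-the-envelope sketch (taking "M" as decimal megabytes):

```java
public class ScanThroughput {
    public static void main(String[] args) {
        double rowsPerSecond = 300_000;  // observed per-region scan rate from the thread
        double meanRowBytes = 48.17;     // "Row size (bytes): mean" from the HFile stats
        double bytesPerSecond = rowsPerSecond * meanRowBytes;
        // ~14.45 MB/s, consistent with the "~15M per second" figure
        System.out.printf("%.2f MB/s%n", bytesPerSecond / 1e6);
    }
}
```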
> >
>
> --
> Regards,
>
> *Bin Mahone | 马洪宾*
> Apache Kylin: http://kylin.io
> Github: https://github.com/binmahone
>
