Try disabling block encoding - you will get better numbers.

>> I mean per region scan speed,
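To try the block-encoding suggestion, the encoding can be switched off per column family from the HBase shell. A sketch, assuming the table and family names shown in the HFile dump later in this thread; note that existing HFiles only pick up the new encoding after they are rewritten by a major compaction:

```
# Switch family F1 from FAST_DIFF to no block encoding
alter 'KYLIN_YMSGYYXO12', {NAME => 'F1', DATA_BLOCK_ENCODING => 'NONE'}

# Rewrite existing HFiles so the change takes effect on disk
major_compact 'KYLIN_YMSGYYXO12'
```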
Scan performance depends on the number of CPU cores: the more cores you
have, the more performance you will get. Your servers are pretty low end
(4 virtual CPU cores is just 2 hardware cores). With 32 cores per node you
will get close to an 8x speed-up.

-Vlad

On Thu, Apr 21, 2016 at 7:22 PM, hongbin ma <[email protected]> wrote:

> hi Thakrar
>
> Thanks for your reply.
>
> My settings for the RegionScanner Scan are:
>
> scan.setCaching(1024)
> scan.setMaxResultSize(5M)
>
> Even if I change the caching to 100000 I'm still not getting any
> improvement. I guess the caching works for remote scans through RPC, but
> does not help much for region-side scans?
>
> I also tried PREFETCH_BLOCKS_ON_OPEN for the whole table, but no
> improvement was observed.
>
> I'm pursuing pure scan-read performance optimization because our
> application is essentially read-only. I observed that even when I do
> nothing else (only scanning) in my coprocessor, the scan speed is not
> satisfying. The CPU seems to be fully utilized. Maybe the process of
> decoding FAST_DIFF rows is too heavy for the CPU? How many rows per
> second of scan speed would you expect on a normal setup? I mean per
> region scan speed, not the overall scan speed across all regions.
>
> thanks
>
> On Thu, Apr 21, 2016 at 10:24 PM, Thakrar, Jayesh <
> [email protected]> wrote:
>
> > Just curious - have you set the scanner caching to some high value - say
> > 1000 (or even higher in your small-value case)?
> >
> > The parameter is hbase.client.scanner.caching
> >
> > You can read up on it - https://hbase.apache.org/book.html
> >
> > Another thing, are you just looking for pure scan-read performance
> > optimization?
> > Depending upon the table size you can also look into caching the table
> > or not caching at all.
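For reference, `hbase.client.scanner.caching` can also be raised globally in the client-side hbase-site.xml instead of per Scan. A sketch using the value 1000 suggested above; a per-Scan `setCaching(...)` still overrides this default:

```xml
<!-- client-side hbase-site.xml: rows fetched per scanner RPC -->
<property>
  <name>hbase.client.scanner.caching</name>
  <value>1000</value>
</property>
```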
> >
> > -----Original Message-----
> > From: hongbin ma [mailto:[email protected]]
> > Sent: Thursday, April 21, 2016 5:04 AM
> > To: [email protected]
> > Subject: Rows per second for RegionScanner
> >
> > Hi, experts,
> >
> > I'm trying to figure out how fast HBase can scan. I'm setting up the
> > RegionScanner in an endpoint coprocessor so that no network overhead is
> > included. My average key length is 35 and my average value length is 5.
> >
> > My test result is that if I warm all the blocks I'm interested in into
> > the block cache, I'm only able to scan around 300,000 rows per second
> > per region (with an endpoint I guess it's one thread per region), so
> > it's like getting 15M of data per second. I'm not sure if this is
> > already an acceptable number for HBase. Answers from you experts might
> > help me decide whether it's worth digging further into tuning.
> >
> > thanks!
> >
> > other info:
> >
> > My HBase cluster is on 8 AWS m1.xlarge instances, with 4 CPU cores and
> > 16G RAM each. Each region server is configured with a 10G heap. The
> > test HTable has 23 regions, one HFile per region (just major
> > compacted). There was no other resource contention when I ran the
> > tests.
> >
> > Attached is the HFile output of one of the region HFiles:
> > =============================================
> > hbase org.apache.hadoop.hbase.io.hfile.HFile -m -s -v -f \
> > /apps/hbase/data/data/default/KYLIN_YMSGYYXO12/d42b9faf43eafcc9640aa256143d5be3/F1/30b8a8ff5a82458481846e364974bf06
> > 2016-04-21 09:16:04,091 INFO [main] Configuration.deprecation:
> > hadoop.native.lib is deprecated. Instead, use io.native.lib.available
> > 2016-04-21 09:16:04,292 INFO [main] util.ChecksumType: Checksum using
> > org.apache.hadoop.util.PureJavaCrc32
> > 2016-04-21 09:16:04,294 INFO [main] util.ChecksumType: Checksum can use
> > org.apache.hadoop.util.PureJavaCrc32C
> > SLF4J: Class path contains multiple SLF4J bindings.
> > SLF4J: Found binding in
> > [jar:file:/usr/hdp/2.2.9.0-3393/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > SLF4J: Found binding in
> > [jar:file:/usr/hdp/2.2.9.0-3393/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
> > explanation.
> > 2016-04-21 09:16:05,654 INFO [main] Configuration.deprecation:
> > fs.default.name is deprecated. Instead, use fs.defaultFS
> > Scanning ->
> > /apps/hbase/data/data/default/KYLIN_YMSGYYXO12/d42b9faf43eafcc9640aa256143d5be3/F1/30b8a8ff5a82458481846e364974bf06
> > Block index size as per heapsize: 3640
> > reader=/apps/hbase/data/data/default/KYLIN_YMSGYYXO12/d42b9faf43eafcc9640aa256143d5be3/F1/30b8a8ff5a82458481846e364974bf06,
> >     compression=none,
> >     cacheConf=CacheConfig:disabled,
> >     firstKey=\x00\x0B\x00\x00\x00\x00\x00\x00\x00\x09\x00\x00\x00\x00\x00\x01\xF4/F1:M/0/Put,
> >     lastKey=\x00\x0B\x00\x00\x00\x00\x00\x00\x00\x1F\x06-?\x0F"U\x00\x00\x03[^\xD9/F1:M/0/Put,
> >     avgKeyLen=35,
> >     avgValueLen=5,
> >     entries=160988965,
> >     length=1832309188
> > Trailer:
> >     fileinfoOffset=1832308623,
> >     loadOnOpenDataOffset=1832306641,
> >     dataIndexCount=43,
> >     metaIndexCount=0,
> >     totalUncomressedBytes=1831809883,
> >     entryCount=160988965,
> >     compressionCodec=NONE,
> >     uncompressedDataIndexSize=5558733,
> >     numDataIndexLevels=2,
> >     firstDataBlockOffset=0,
> >     lastDataBlockOffset=1832250057,
> >     comparatorClassName=org.apache.hadoop.hbase.KeyValue$KeyComparator,
> >     majorVersion=2,
> >     minorVersion=3
> > Fileinfo:
> >     DATA_BLOCK_ENCODING = FAST_DIFF
> >     DELETE_FAMILY_COUNT = \x00\x00\x00\x00\x00\x00\x00\x00
> >     EARLIEST_PUT_TS = \x00\x00\x00\x00\x00\x00\x00\x00
> >     MAJOR_COMPACTION_KEY = \xFF
> >     MAX_SEQ_ID_KEY = 4
> >     TIMERANGE = 0....0
> >     hfile.AVG_KEY_LEN = 35
> >     hfile.AVG_VALUE_LEN = 5
> >     hfile.LASTKEY =
> > \x00\x16\x00\x0B\x00\x00\x00\x00\x00\x00\x00\x1F\x06-?\x0F"U\x00\x00\x03[^\xD9\x02F1M\x00\x00\x00\x00\x00\x00\x00\x00\x04
> > Mid-key:
> > \x00\x12\x00\x0B\x00\x00\x00\x00\x00\x00\x00\x1D\x04_\x07\x89\x00\x00\x02l\x00\x7F\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\x00\x00\x00\x007|\xBE$\x00\x00;\x81
> > Bloom filter:
> >     Not present
> > Delete Family Bloom filter:
> >     Not present
> > Stats:
> > Key length:
> >     min = 32.00
> >     max = 37.00
> >     mean = 35.11
> >     stddev = 1.46
> >     median = 35.00
> >     75% <= 37.00
> >     95% <= 37.00
> >     98% <= 37.00
> >     99% <= 37.00
> >     99.9% <= 37.00
> >     count = 160988965
> > Row size (bytes):
> >     min = 44.00
> >     max = 55.00
> >     mean = 48.17
> >     stddev = 1.43
> >     median = 48.00
> >     75% <= 50.00
> >     95% <= 50.00
> >     98% <= 50.00
> >     99% <= 50.00
> >     99.9% <= 51.97
> >     count = 160988965
> > Row size (columns):
> >     min = 1.00
> >     max = 1.00
> >     mean = 1.00
> >     stddev = 0.00
> >     median = 1.00
> >     75% <= 1.00
> >     95% <= 1.00
> >     98% <= 1.00
> >     99% <= 1.00
> >     99.9% <= 1.00
> >     count = 160988965
> > Val length:
> >     min = 4.00
> >     max = 12.00
> >     mean = 5.06
> >     stddev = 0.33
> >     median = 5.00
> >     75% <= 5.00
> >     95% <= 5.00
> >     98% <= 6.00
> >     99% <= 8.00
> >     99.9% <= 9.00
> >     count = 160988965
> > Key of biggest row:
> > \x00\x0B\x00\x00\x00\x00\x00\x00\x00\x1F\x04\xDD:\x06\x00U\x00\x00\x00\x8DS\xD2
> > Scanned kv count -> 160988965
> >
> > This email and any files included with it may contain privileged,
> > proprietary and/or confidential information that is for the sole use of
> > the intended recipient(s). Any disclosure, copying, distribution,
> > posting, or use of the information contained in or attached to this
> > email is prohibited unless permitted by the sender. If you have
> > received this email in error, please immediately notify the sender via
> > return email, telephone, or fax and destroy this original transmission
> > and its included files without reading or saving it in any manner.
> > Thank you.
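As a sanity check on the "15M per second" figure quoted earlier in the thread, the per-region throughput follows directly from the observed scan rate and the mean row size in the stats above. A quick back-of-the-envelope sketch (taking "M" as decimal megabytes):

```java
public class ScanThroughput {
    public static void main(String[] args) {
        double rowsPerSecond = 300_000;  // observed per-region scan rate from the thread
        double meanRowBytes = 48.17;     // "Row size (bytes): mean" from the HFile stats
        double bytesPerSecond = rowsPerSecond * meanRowBytes;
        // ~14.45 MB/s, consistent with the "~15M per second" figure
        System.out.printf("%.2f MB/s%n", bytesPerSecond / 1e6);
    }
}
```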
> >
>
> --
> Regards,
>
> *Bin Mahone | 马洪宾*
> Apache Kylin: http://kylin.io
> Github: https://github.com/binmahone
>
