Re: Rows per second for RegionScanner
Try disabling block encoding - you will get better numbers.

>> I mean per region scan speed

Scan performance depends on the number of CPU cores: the more cores you have, the more performance you will get. Your servers are pretty low end (4 virtual CPU cores is just 2 hardware cores). With 32 cores per node you would get close to an 8x speedup.

-Vlad

On Thu, Apr 21, 2016 at 7:22 PM, hongbin ma wrote:
> hi Thakrar
>
> Thanks for your reply.
>
> My settings for the RegionScanner Scan are:
>
> scan.setCaching(1024)
> scan.setMaxResultSize(5M)
>
> even if I change the caching to 10 I'm still not getting any
> improvements. I guess the caching works for remote scans through RPC,
> but doesn't help much for region-side scans?
>
> I also tried PREFETCH_BLOCKS_ON_OPEN for the whole table, but no
> improvement was observed.
>
> I'm pursuing pure scan-read performance optimization because our
> application is essentially read-only. And I observed that even if I do
> nothing else (only scanning) in my coprocessor, the scan speed is not
> satisfying. The CPU seems to be fully utilized. Maybe the process of
> decoding FAST_DIFF rows is too heavy for the CPU? How many rows/second scan
> speed would you expect on a normal setup? I mean per-region scan speed,
> not the overall scan speed counting all regions.
>
> thanks
>
> On Thu, Apr 21, 2016 at 10:24 PM, Thakrar, Jayesh <
> jthak...@conversantmedia.com> wrote:
>
> > Just curious - have you set the scanner caching to some high value - say
> > 1000 (or even higher in your small-value case)?
> >
> > The parameter is hbase.client.scanner.caching
> >
> > You can read up on it - https://hbase.apache.org/book.html
> >
> > Another thing, are you just looking for pure scan-read performance
> > optimization?
> > Depending upon the table size you can also look into caching the table or
> > not caching at all.
> > -----Original Message-----
> > From: hongbin ma [mailto:mahong...@apache.org]
> > Sent: Thursday, April 21, 2016 5:04 AM
> > To: user@hbase.apache.org
> > Subject: Rows per second for RegionScanner
> >
> > Hi, experts,
> >
> > I'm trying to figure out how fast HBase can scan. I'm setting up the
> > RegionScan in an endpoint coprocessor so that no network overhead is
> > included. My average key length is 35 and average value length is 5.
> >
> > My test result is that if I warm all my interested blocks in the block
> > cache, I'm only able to scan around 300,000 rows per second per region
> > (with an endpoint I guess it's one thread per region), so it's like
> > getting 15M of data per second. I'm not sure if this is already an
> > acceptable number for HBase. The answers from you experts might help me
> > decide whether it's worth digging further into tuning it.
> >
> > thanks!
> >
> > other info:
> >
> > My HBase cluster is on 8 AWS m1.xlarge instances, with 4 CPU cores and
> > 16G RAM each. Each region server is configured with a 10G heap. The test
> > HTable has 23 regions, one hfile per region (just major compacted).
> > There was no other resource contention when I ran the tests.
> >
> > Attached is the HFile output of one of the region hfiles:
> > =
> > hbase org.apache.hadoop.hbase.io.hfile.HFile -m -s -v -f
> > /apps/hbase/data/data/default/KYLIN_YMSGYYXO12/d42b9faf43eafcc9640aa256143d5be3/F1/30b8a8ff5a82458481846e364974bf06
> > 2016-04-21 09:16:04,091 INFO [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
> > 2016-04-21 09:16:04,292 INFO [main] util.ChecksumType: Checksum using org.apache.hadoop.util.PureJavaCrc32
> > 2016-04-21 09:16:04,294 INFO [main] util.ChecksumType: Checksum can use org.apache.hadoop.util.PureJavaCrc32C
> > SLF4J: Class path contains multiple SLF4J bindings.
> > SLF4J: Found binding in
> > [jar:file:/usr/hdp/2.2.9.0-3393/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > SLF4J: Found binding in
> > [jar:file:/usr/hdp/2.2.9.0-3393/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> > 2016-04-21 09:16:05,654 INFO [main] Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS
> > Scanning ->
> > /apps/hbase/data/data/default/KYLIN_YMSGYYXO12/d42b9faf43eafcc9640aa256143d5be3/F1/30b8a8ff5a82458481846e364974bf06
> > Block index size as per heapsize: 3640
> > reader=/apps/hbase/data/data/default/KYLIN_YMSGYYXO12/d42b9faf43eafcc9640aa256143d5be3/F1/30b8a8ff5a82458481846e364974bf06,
> > compression=none,
> > cacheConf=CacheConfig:disabled,
> > firstKey=\x00\x0B\x00\x00\x00\x00\x00\x00\x00\x09\x00\x00\x00\x00\x00\x01\xF4/F1:M/0/Put,
> > lastKey=\x00\x0B\x00\x00\x00\x00\x00\x00\x00\x1F\x06-?\x0F"U\x00\x00\x03[^\xD9/F1:M/0/Put,
> > avgKeyLen=35,
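For what it's worth, the numbers in this thread are internally consistent. A quick sketch of the arithmetic (the 10-byte per-cell framing overhead is an assumed round figure, not something stated in the thread):

```java
// Back-of-the-envelope check of the per-region scan rate reported above.
public class ScanThroughput {
    public static void main(String[] args) {
        long rowsPerSec = 300_000L;   // observed per-region scan rate
        int avgKeyLen = 35;           // from the HFile stats (avgKeyLen=35)
        int avgValueLen = 5;          // from the HFile stats (avgValueLen=5)
        int framing = 10;             // assumption: KeyValue length fields,
                                      // timestamp and type, roughly 10 bytes
        long bytesPerSec = rowsPerSec * (avgKeyLen + avgValueLen + framing);
        System.out.printf("~%d MB/s per region%n", bytesPerSec / 1_000_000);
    }
}
```

With one scanning thread per region, 300k rows/s of ~50-byte cells works out to roughly 15 MB/s, matching the "15M of data per second" figure in the thread; at that rate the per-row CPU cost (FAST_DIFF decoding included) is what bounds throughput, which is why Vlad's advice centers on cores and encoding.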
Re: Can not connect local java client to a remote Hbase
Check these links out:
http://stackoverflow.com/questions/36377393/connecting-to-hbase-1-0-3-via-java-client-stuck-at-zookeeper-clientcnxn-session
http://mail-archives.apache.org/mod_mbox/hbase-user/201604.mbox/browser

First, what is your machine's IP address? If you specify only the IP address in the regionservers file and hbase-site.xml, and also remove "192.168.1.240 master-sigma" from hosts, then you can be sure everything is getting resolved via IP address only. Also enable trace logging to understand more, i.e. which call is failing and why. What I have found is that in HBase some servers are resolved differently, as pointed out in those links.

Hope it helps.

Sachin

On Thu, Apr 21, 2016 at 11:11 PM, SOUFIANI Mustapha | السفياني مصطفى < s.mustaph...@gmail.com> wrote:
> Hi all,
> I'm trying to connect my local java client (pentaho) to a remote HBase but
> every time I get a TimeOut error telling me that the connection couldn't
> be established.
>
> Here is the full error message:
>
> ***
>
> java.io.IOException:
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
> attempts=36, exceptions:
> Wed Apr 20 10:32:43 WEST 2016, null, java.net.SocketTimeoutException:
> callTimeout=6, callDuration=75181: row 'pentaho_mappings,,' on table
> 'hbase:meta' at region=hbase:meta,,1.1588230740,
> hostname=localhost,16020,1461071963695, seqNum=0
>
> at com.pentaho.big.data.bundles.impl.shim.hbase.table.HBaseTableImpl.exists(HBaseTableImpl.java:71)
> at org.pentaho.big.data.kettle.plugins.hbase.mapping.MappingAdmin.getMappedTables(MappingAdmin.java:502)
> at org.pentaho.big.data.kettle.plugins.hbase.output.HBaseOutputDialog.setupMappedTableNames(HBaseOutputDialog.java:818)
> at org.pentaho.big.data.kettle.plugins.hbase.output.HBaseOutputDialog.access$900(HBaseOutputDialog.java:88)
> at org.pentaho.big.data.kettle.plugins.hbase.output.HBaseOutputDialog$7.widgetSelected(HBaseOutputDialog.java:398)
> at
org.eclipse.swt.widgets.TypedListener.handleEvent(Unknown Source) > > at org.eclipse.swt.widgets.EventTable.sendEvent(Unknown Source) > > at org.eclipse.swt.widgets.Widget.sendEvent(Unknown Source) > > at org.eclipse.swt.widgets.Display.runDeferredEvents(Unknown Source) > > at org.eclipse.swt.widgets.Display.readAndDispatch(Unknown Source) > > at > > org.pentaho.big.data.kettle.plugins.hbase.output.HBaseOutputDialog.open(HBaseOutputDialog.java:603) > > at > > org.pentaho.di.ui.spoon.delegates.SpoonStepsDelegate.editStep(SpoonStepsDelegate.java:125) > > at org.pentaho.di.ui.spoon.Spoon.editStep(Spoon.java:8783) > > at > org.pentaho.di.ui.spoon.trans.TransGraph.editStep(TransGraph.java:3072) > > at > > org.pentaho.di.ui.spoon.trans.TransGraph.mouseDoubleClick(TransGraph.java:755) > > at org.eclipse.swt.widgets.TypedListener.handleEvent(Unknown Source) > > at org.eclipse.swt.widgets.EventTable.sendEvent(Unknown Source) > > at org.eclipse.swt.widgets.Widget.sendEvent(Unknown Source) > > at org.eclipse.swt.widgets.Display.runDeferredEvents(Unknown Source) > > at org.eclipse.swt.widgets.Display.readAndDispatch(Unknown Source) > > at org.pentaho.di.ui.spoon.Spoon.readAndDispatch(Spoon.java:1347) > > at org.pentaho.di.ui.spoon.Spoon.waitForDispose(Spoon.java:7989) > > at org.pentaho.di.ui.spoon.Spoon.start(Spoon.java:9269) > > at org.pentaho.di.ui.spoon.Spoon.main(Spoon.java:662) > > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > > at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) > > at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) > > at java.lang.reflect.Method.invoke(Unknown Source) > > at org.pentaho.commons.launcher.Launcher.main(Launcher.java:92) > > Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed > after attempts=36, exceptions: > Wed Apr 20 10:32:43 WEST 2016, null, java.net.SocketTimeoutException: > callTimeout=6, callDuration=75181: row 'pentaho_mappings,,' on table > 'hbase:meta' at 
region=hbase:meta,,1.1588230740, > hostname=localhost,16020,1461071963695, seqNum=0 > > > at > > org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:270) > > at > > org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:225) > > at > > org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:63) > > at > > org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200) > > at > org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:314) > > at > > org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:289) > > at > >
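The telling detail in the stack trace above is `hostname=localhost,16020`: the region server registered itself in hbase:meta under the name "localhost", so the remote client dutifully connects to its own machine and times out. A common cause is the server's own hostname resolving to the loopback address at startup; a sketch of what to look for in /etc/hosts on the HBase node (addresses beyond the 192.168.1.240 entry from the thread are illustrative):

```
# Problematic /etc/hosts on the HBase server: the machine's hostname is an
# alias of the loopback address, so HBase registers itself as "localhost":
#   127.0.0.1   localhost master-sigma
#
# Keep loopback and the real address separate instead:
127.0.0.1       localhost
192.168.1.240   master-sigma
```

Alternatively, as suggested above, putting plain IP addresses in hbase-site.xml and the regionservers file sidesteps name resolution entirely.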
Re: Rows per second for RegionScanner
hi Thakrar

Thanks for your reply.

My settings for the RegionScanner Scan are:

scan.setCaching(1024)
scan.setMaxResultSize(5M)

even if I change the caching to 10 I'm still not getting any improvements. I guess the caching works for remote scans through RPC, but doesn't help much for region-side scans?

I also tried PREFETCH_BLOCKS_ON_OPEN for the whole table, but no improvement was observed.

I'm pursuing pure scan-read performance optimization because our application is essentially read-only. And I observed that even if I do nothing else (only scanning) in my coprocessor, the scan speed is not satisfying. The CPU seems to be fully utilized. Maybe the process of decoding FAST_DIFF rows is too heavy for the CPU? How many rows/second scan speed would you expect on a normal setup? I mean per-region scan speed, not the overall scan speed counting all regions.

thanks

On Thu, Apr 21, 2016 at 10:24 PM, Thakrar, Jayesh < jthak...@conversantmedia.com> wrote:
> Just curious - have you set the scanner caching to some high value - say
> 1000 (or even higher in your small-value case)?
>
> The parameter is hbase.client.scanner.caching
>
> You can read up on it - https://hbase.apache.org/book.html
>
> Another thing, are you just looking for pure scan-read performance
> optimization?
> Depending upon the table size you can also look into caching the table or
> not caching at all.
>
> -----Original Message-----
> From: hongbin ma [mailto:mahong...@apache.org]
> Sent: Thursday, April 21, 2016 5:04 AM
> To: user@hbase.apache.org
> Subject: Rows per second for RegionScanner
>
> Hi, experts,
>
> I'm trying to figure out how fast HBase can scan. I'm setting up the
> RegionScan in an endpoint coprocessor so that no network overhead is
> included. My average key length is 35 and average value length is 5.
> My test result is that if I warm all my interested blocks in the block
> cache, I'm only able to scan around 300,000 rows per second per region
> (with an endpoint I guess it's one thread per region), so it's like
> getting 15M of data per second. I'm not sure if this is already an
> acceptable number for HBase. The answers from you experts might help me
> decide whether it's worth digging further into tuning it.
>
> thanks!
>
> other info:
>
> My HBase cluster is on 8 AWS m1.xlarge instances, with 4 CPU cores and
> 16G RAM each. Each region server is configured with a 10G heap. The test
> HTable has 23 regions, one hfile per region (just major compacted).
> There was no other resource contention when I ran the tests.
>
> Attached is the HFile output of one of the region hfiles:
> =
> hbase org.apache.hadoop.hbase.io.hfile.HFile -m -s -v -f
> /apps/hbase/data/data/default/KYLIN_YMSGYYXO12/d42b9faf43eafcc9640aa256143d5be3/F1/30b8a8ff5a82458481846e364974bf06
> 2016-04-21 09:16:04,091 INFO [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
> 2016-04-21 09:16:04,292 INFO [main] util.ChecksumType: Checksum using org.apache.hadoop.util.PureJavaCrc32
> 2016-04-21 09:16:04,294 INFO [main] util.ChecksumType: Checksum can use org.apache.hadoop.util.PureJavaCrc32C
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in
> [jar:file:/usr/hdp/2.2.9.0-3393/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in
> [jar:file:/usr/hdp/2.2.9.0-3393/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> 2016-04-21 09:16:05,654 INFO [main] Configuration.deprecation: fs.default.name is deprecated.
Instead, use fs.defaultFS
> Scanning ->
> /apps/hbase/data/data/default/KYLIN_YMSGYYXO12/d42b9faf43eafcc9640aa256143d5be3/F1/30b8a8ff5a82458481846e364974bf06
> Block index size as per heapsize: 3640
> reader=/apps/hbase/data/data/default/KYLIN_YMSGYYXO12/d42b9faf43eafcc9640aa256143d5be3/F1/30b8a8ff5a82458481846e364974bf06,
> compression=none,
> cacheConf=CacheConfig:disabled,
> firstKey=\x00\x0B\x00\x00\x00\x00\x00\x00\x00\x09\x00\x00\x00\x00\x00\x01\xF4/F1:M/0/Put,
> lastKey=\x00\x0B\x00\x00\x00\x00\x00\x00\x00\x1F\x06-?\x0F"U\x00\x00\x03[^\xD9/F1:M/0/Put,
> avgKeyLen=35,
> avgValueLen=5,
> entries=160988965,
> length=1832309188
> Trailer:
> fileinfoOffset=1832308623,
> loadOnOpenDataOffset=1832306641,
> dataIndexCount=43,
> metaIndexCount=0,
> totalUncomressedBytes=1831809883,
> entryCount=160988965,
> compressionCodec=NONE,
> uncompressedDataIndexSize=5558733,
> numDataIndexLevels=2,
> firstDataBlockOffset=0,
> lastDataBlockOffset=1832250057,
> comparatorClassName=org.apache.hadoop.hbase.KeyValue$KeyComparator,
> majorVersion=2,
> minorVersion=3
> Fileinfo:
> DATA_BLOCK_ENCODING = FAST_DIFF
> DELETE_FAMILY_COUNT =
Re: Best way to pass configuration properties to MRv2 jobs
How true!! ;-)

Thanks,
Henning

On 18.04.2016 19:53, Dima Spivak wrote:

Probably better off asking on the Hadoop user mailing list ( u...@hadoop.apache.org) than the HBase one… :)

-Dima

On Mon, Apr 18, 2016 at 2:57 AM, Henning Blohm wrote:

Hi,

in our Hadoop 2.6.0 cluster, we need to pass some properties to all Hadoop processes so they can be referenced using ${...} syntax in configuration files. This works reasonably well using HADOOP_NAMENODE_OPTS and the like.

For Map/Reduce jobs, however, it is not enough to specify mapred.child.java.opts to pass system properties; in addition we need to set yarn.app.mapreduce.am.command-opts for anything that is referenced in Hadoop configuration files. In the end, however, almost all the properties passed are available as environment variables as well. Hence my questions:

* Is it possible to reference environment variables in configuration files directly?
* Does anybody know of a simpler way to make sure some system properties are _always_ set for all Yarn processes?

Thanks,
Henning
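For reference, the two knobs Henning names land in different JVMs: mapred.child.java.opts reaches the map/reduce task processes, while yarn.app.mapreduce.am.command-opts reaches the MapReduce ApplicationMaster, which is why both must carry the -D flags. A minimal mapred-site.xml sketch (the property name my.prop and its value are made-up examples):

```xml
<!-- mapred-site.xml: pass -Dmy.prop to both the AM and the task JVMs
     so ${my.prop} can be referenced in configuration files either way. -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m -Dmy.prop=/data/shared</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.command-opts</name>
  <value>-Xmx1024m -Dmy.prop=/data/shared</value>
</property>
```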
Hbase row key to Hive mapping
Hi,

I'm storing the row key as the following combination of bytes, built using rowbuilder:

unused8 + timestamp(int) + accountid(long) + id(long)

When I try to map that key in a Hive table, I'm unable to convert back the actual values since the key is stored in bytes, even though I set the mapping as :key#b.

Please help.
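Hive's :key#b mapping hands the whole row key over as one binary blob; it does not split a composite key into fields, so the fields have to be carved out of the bytes yourself (for example in a UDF). A sketch of that decoding, under assumed field widths since the post doesn't state them (a 1-byte unused prefix here; adjust UNUSED_LEN if "unused8" actually means eight unused bytes):

```java
import java.nio.ByteBuffer;

// Sketch: decode the composite row key described above.
// Assumed layout: UNUSED_LEN prefix bytes, then a 4-byte big-endian int
// timestamp, an 8-byte long accountid, and an 8-byte long id.
public class RowKeyDecoder {
    static final int UNUSED_LEN = 1; // assumption about the "unused8" prefix

    /** Returns {timestamp, accountid, id} decoded from a composite row key. */
    public static long[] decode(byte[] rowKey) {
        ByteBuffer buf = ByteBuffer.wrap(rowKey); // big-endian by default
        buf.position(UNUSED_LEN);        // skip the unused prefix
        long timestamp = buf.getInt();   // 4-byte int
        long accountId = buf.getLong();  // 8-byte long
        long id = buf.getLong();         // 8-byte long
        return new long[] { timestamp, accountId, id };
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(UNUSED_LEN + 4 + 8 + 8);
        buf.put((byte) 0).putInt(1461200000).putLong(42L).putLong(7L);
        long[] parts = decode(buf.array());
        System.out.printf("timestamp=%d accountid=%d id=%d%n",
                parts[0], parts[1], parts[2]);
    }
}
```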
Re: new 4.x-HBase-1.1 branch
On Thu, Apr 21, 2016 at 11:09 AM, Enis Söztutar wrote:
>
> >> Also FWIW. I'd be curious to hear how many Phoenix users are using 0.98
> >> versus 1.0 and up, besides the folks at Salesforce, whom I know well
> >> (smile). And, more generally, who in greater HBase land is still on 0.98
> >> and won't move off this year.
> >>
>
> From our customers' perspective, most of the users are on 1.1+.
> The ones using 0.98-based HBase releases are not getting the latest Phoenix
> anyway through the vendor channels.

Thanks Enis, really appreciate the info.

> >> > On Apr 20, 2016, at 5:00 PM, James Taylor wrote:
> >> >
> >> > Due to some API changes in HBase, we need to have a separate branch for our
> >> > HBase 1.1 compatible branches. I've created a 4.x-HBase-1.1 branch for this
> >> > - please make sure to keep this branch in sync with the other 4.x and
> >> > master branches. We can release 4.x-HBase-1.2 compatible releases out of
> >> > master for 4.8. I think post 4.8 release we may not need to continue with
> >> > the 4.x-HBase-1.0 branch.
> >> >
> >> > Thanks,
> >> > James
Re: new 4.x-HBase-1.1 branch
> > Also FWIW. I'd be curious to hear how many Phoenix users are using 0.98 >> versus 1.0 and up, besides the folks at Salesforce, whom I know well >> (smile). And, more generally, who in greater HBase land is still on 0.98 >> and won't move off this year. >> > From our customers perspective, most of the users are 1.1+. The ones using 0.98-based HBase releases are not getting the latest Phoenix anyway through the vendor channels. Enis > >> > On Apr 20, 2016, at 5:00 PM, James Taylor>> wrote: >> > >> > Due to some API changes in HBase, we need to have a separate branch for >> our >> > HBase 1.1 compatible branches. I've created a 4.x-HBase-1.1 branch for >> this >> > - please make sure to keep this branch in sync with the other 4.x and >> > master branches. We can release 4.x-HBase-1.2 compatible releases out of >> > master for 4.8. I think post 4.8 release we may not need to continue >> with >> > the 4.x-HBase-1.0 branch. >> > >> > Thanks, >> > James >> > >
Re: new 4.x-HBase-1.1 branch
Makes sense to drop the branch for HBase-1.0.x. I had proposed it here before:
http://search-hadoop.com/m/9UY0h2XrnGW1d3OBF1=+DISCUSS+Drop+branch+for+HBase+1+0+

Enis

On Thu, Apr 21, 2016 at 8:06 AM, Andrew Purtell wrote:
> HBase announced at the last 1.0 release that it would be the last release
> in that line and I think we would recommend any 1.0 user move up to 1.1 or
> 1.2 at their earliest convenience. FWIW
>
> Also, as RM of the 0.98 code line I am considering ending its (mostly)
> regular release cadence at the end of this calendar year. I'd continue if
> there were expressed user or dev demand but otherwise place it into the
> same state as 1.0. Also FWIW. I'd be curious to hear how many Phoenix users
> are using 0.98 versus 1.0 and up, besides the folks at Salesforce, whom I
> know well (smile). And, more generally, who in greater HBase land is still
> on 0.98 and won't move off this year.
>
> > On Apr 20, 2016, at 5:00 PM, James Taylor wrote:
> >
> > Due to some API changes in HBase, we need to have a separate branch for our
> > HBase 1.1 compatible branches. I've created a 4.x-HBase-1.1 branch for this
> > - please make sure to keep this branch in sync with the other 4.x and
> > master branches. We can release 4.x-HBase-1.2 compatible releases out of
> > master for 4.8. I think post 4.8 release we may not need to continue with
> > the 4.x-HBase-1.0 branch.
> >
> > Thanks,
> > James
Re: Can not connect local java client to a remote Hbase
Are you using HBase 1.0 or 1.1?

I assume you have verified that the HBase master is running normally on master-sigma. Are you able to use the hbase shell on that node?

If you check the master log, you will see which node hosts hbase:meta. On that node, do you see anything interesting in the region server log?

Cheers

On Thu, Apr 21, 2016 at 10:41 AM, SOUFIANI Mustapha | السفياني مصطفى < s.mustaph...@gmail.com> wrote:
> Hi all,
> I'm trying to connect my local java client (pentaho) to a remote HBase but
> every time I get a TimeOut error telling me that the connection couldn't
> be established.
>
> Here is the full error message:
>
> ***
>
> java.io.IOException:
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
> attempts=36, exceptions:
> Wed Apr 20 10:32:43 WEST 2016, null, java.net.SocketTimeoutException:
> callTimeout=6, callDuration=75181: row 'pentaho_mappings,,' on table
> 'hbase:meta' at region=hbase:meta,,1.1588230740,
> hostname=localhost,16020,1461071963695, seqNum=0
>
> at com.pentaho.big.data.bundles.impl.shim.hbase.table.HBaseTableImpl.exists(HBaseTableImpl.java:71)
> at org.pentaho.big.data.kettle.plugins.hbase.mapping.MappingAdmin.getMappedTables(MappingAdmin.java:502)
> at org.pentaho.big.data.kettle.plugins.hbase.output.HBaseOutputDialog.setupMappedTableNames(HBaseOutputDialog.java:818)
> at org.pentaho.big.data.kettle.plugins.hbase.output.HBaseOutputDialog.access$900(HBaseOutputDialog.java:88)
> at org.pentaho.big.data.kettle.plugins.hbase.output.HBaseOutputDialog$7.widgetSelected(HBaseOutputDialog.java:398)
> at org.eclipse.swt.widgets.TypedListener.handleEvent(Unknown Source)
> at org.eclipse.swt.widgets.EventTable.sendEvent(Unknown Source)
> at org.eclipse.swt.widgets.Widget.sendEvent(Unknown Source)
> at org.eclipse.swt.widgets.Display.runDeferredEvents(Unknown Source)
> at org.eclipse.swt.widgets.Display.readAndDispatch(Unknown Source)
> at
org.pentaho.big.data.kettle.plugins.hbase.output.HBaseOutputDialog.open(HBaseOutputDialog.java:603) > > at > > org.pentaho.di.ui.spoon.delegates.SpoonStepsDelegate.editStep(SpoonStepsDelegate.java:125) > > at org.pentaho.di.ui.spoon.Spoon.editStep(Spoon.java:8783) > > at > org.pentaho.di.ui.spoon.trans.TransGraph.editStep(TransGraph.java:3072) > > at > > org.pentaho.di.ui.spoon.trans.TransGraph.mouseDoubleClick(TransGraph.java:755) > > at org.eclipse.swt.widgets.TypedListener.handleEvent(Unknown Source) > > at org.eclipse.swt.widgets.EventTable.sendEvent(Unknown Source) > > at org.eclipse.swt.widgets.Widget.sendEvent(Unknown Source) > > at org.eclipse.swt.widgets.Display.runDeferredEvents(Unknown Source) > > at org.eclipse.swt.widgets.Display.readAndDispatch(Unknown Source) > > at org.pentaho.di.ui.spoon.Spoon.readAndDispatch(Spoon.java:1347) > > at org.pentaho.di.ui.spoon.Spoon.waitForDispose(Spoon.java:7989) > > at org.pentaho.di.ui.spoon.Spoon.start(Spoon.java:9269) > > at org.pentaho.di.ui.spoon.Spoon.main(Spoon.java:662) > > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > > at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) > > at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) > > at java.lang.reflect.Method.invoke(Unknown Source) > > at org.pentaho.commons.launcher.Launcher.main(Launcher.java:92) > > Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed > after attempts=36, exceptions: > Wed Apr 20 10:32:43 WEST 2016, null, java.net.SocketTimeoutException: > callTimeout=6, callDuration=75181: row 'pentaho_mappings,,' on table > 'hbase:meta' at region=hbase:meta,,1.1588230740, > hostname=localhost,16020,1461071963695, seqNum=0 > > > at > > org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:270) > > at > > org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:225) > > at > > 
org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:63)
> at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
> at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:314)
> at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:289)
> at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:161)
> at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:156)
> at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:888)
> at org.apache.hadoop.hbase.MetaTableAccessor.fullScan(MetaTableAccessor.java:601)
> at
Can not connect local java client to a remote Hbase
Hi all,

I'm trying to connect my local java client (pentaho) to a remote HBase but every time I get a TimeOut error telling me that the connection couldn't be established.

Here is the full error message:

***

java.io.IOException:
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=36, exceptions:
Wed Apr 20 10:32:43 WEST 2016, null, java.net.SocketTimeoutException: callTimeout=6, callDuration=75181: row 'pentaho_mappings,,' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=localhost,16020,1461071963695, seqNum=0

at com.pentaho.big.data.bundles.impl.shim.hbase.table.HBaseTableImpl.exists(HBaseTableImpl.java:71)
at org.pentaho.big.data.kettle.plugins.hbase.mapping.MappingAdmin.getMappedTables(MappingAdmin.java:502)
at org.pentaho.big.data.kettle.plugins.hbase.output.HBaseOutputDialog.setupMappedTableNames(HBaseOutputDialog.java:818)
at org.pentaho.big.data.kettle.plugins.hbase.output.HBaseOutputDialog.access$900(HBaseOutputDialog.java:88)
at org.pentaho.big.data.kettle.plugins.hbase.output.HBaseOutputDialog$7.widgetSelected(HBaseOutputDialog.java:398)
at org.eclipse.swt.widgets.TypedListener.handleEvent(Unknown Source)
at org.eclipse.swt.widgets.EventTable.sendEvent(Unknown Source)
at
org.eclipse.swt.widgets.Widget.sendEvent(Unknown Source)
at org.eclipse.swt.widgets.Display.runDeferredEvents(Unknown Source)
at org.eclipse.swt.widgets.Display.readAndDispatch(Unknown Source)
at org.pentaho.di.ui.spoon.Spoon.readAndDispatch(Spoon.java:1347)
at org.pentaho.di.ui.spoon.Spoon.waitForDispose(Spoon.java:7989)
at org.pentaho.di.ui.spoon.Spoon.start(Spoon.java:9269)
at org.pentaho.di.ui.spoon.Spoon.main(Spoon.java:662)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.pentaho.commons.launcher.Launcher.main(Launcher.java:92)
Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=36, exceptions:
Wed Apr 20 10:32:43 WEST 2016, null, java.net.SocketTimeoutException: callTimeout=6, callDuration=75181: row 'pentaho_mappings,,' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=localhost,16020,1461071963695, seqNum=0

at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:270)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:225)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:63)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:314)
at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:289)
at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:161)
at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:156)
at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:888)
at
org.apache.hadoop.hbase.MetaTableAccessor.fullScan(MetaTableAccessor.java:601) at org.apache.hadoop.hbase.MetaTableAccessor.tableExists(MetaTableAccessor.java:365) at org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:310) at org.pentaho.hadoop.hbase.factory.HBase10Admin.tableExists(HBase10Admin.java:41) at org.pentaho.hbase.shim.common.CommonHBaseConnection.tableExists(CommonHBaseConnection.java:206) at org.pentaho.hbase.shim.common.HBaseConnectionImpl.access$801(HBaseConnectionImpl.java:35) at org.pentaho.hbase.shim.common.HBaseConnectionImpl$9.call(HBaseConnectionImpl.java:185) at org.pentaho.hbase.shim.common.HBaseConnectionImpl$9.call(HBaseConnectionImpl.java:181) at
Re: new 4.x-HBase-1.1 branch
HBase announced at the last 1.0 release that it would be the last release in that line, and I think we would recommend any 1.0 user move up to 1.1 or 1.2 at their earliest convenience. FWIW

Also, as RM of the 0.98 code line I am considering ending its (mostly) regular release cadence at the end of this calendar year. I'd continue if there were expressed user or dev demand, but otherwise place it into the same state as 1.0. Also FWIW. I'd be curious to hear how many Phoenix users are using 0.98 versus 1.0 and up, besides the folks at Salesforce, whom I know well (smile). And, more generally, who in greater HBase land is still on 0.98 and won't move off this year.

> On Apr 20, 2016, at 5:00 PM, James Taylor wrote:
>
> Due to some API changes in HBase, we need to have a separate branch for our
> HBase 1.1 compatible branches. I've created a 4.x-HBase-1.1 branch for this
> - please make sure to keep this branch in sync with the other 4.x and
> master branches. We can release 4.x-HBase-1.2 compatible releases out of
> master for 4.8. I think post 4.8 release we may not need to continue with
> the 4.x-HBase-1.0 branch.
>
> Thanks,
> James
Re: Processing rows in parallel with MapReduce jobs.
Thanks Ted,

Finally I found the real mistake: the class had to be declared static.

Best,
Iván.

----- Original Message -----
> From: "Ted Yu"
> To: user@hbase.apache.org
> Sent: Tuesday, April 19, 2016 15:56:56
> Subject: Re: Processing rows in parallel with MapReduce jobs.
>
> From the error, you need to provide an argumentless ctor for
> MyTableInputFormat.
>
> On Tue, Apr 19, 2016 at 12:12 AM, Ivan Cores gonzalez
> wrote:
> >
> > Hi Ted,
> >
> > Sorry, I forgot to write the error. At runtime I get the following exception:
> >
> > Exception in thread "main" java.lang.RuntimeException:
> > java.lang.NoSuchMethodException:
> > simplerowcounter.SimpleRowCounter$MyTableInputFormat.<init>()
> >
> > The program works fine if I don't use "MyTableInputFormat", modifying the
> > call to initTableMapperJob:
> >
> > TableMapReduceUtil.initTableMapperJob(tableName, scan,
> >     RowCounterMapper.class,
> >     ImmutableBytesWritable.class, Result.class, job);
> > // --> works fine without MyTableInputFormat
> >
> > That's why I asked if you see any problem in the code, because maybe I
> > forgot to override some method or something is missing.
> >
> > Best,
> > Iván.
> >
> > ----- Original Message -----
> > > From: "Ted Yu"
> > > To: user@hbase.apache.org
> > > Sent: Tuesday, April 19, 2016 0:22:05
> > > Subject: Re: Processing rows in parallel with MapReduce jobs.
> > >
> > > Did you see the "Message to log?" log?
> > >
> > > Can you pastebin the error / exception you got?
> > >
> > > On Mon, Apr 18, 2016 at 1:54 AM, Ivan Cores gonzalez < ivan.co...@inria.fr>
> > > wrote:
> > > >
> > > > Hi Ted,
> > > > So, if I understand the behaviour of getSplits(), I can create "virtual"
> > > > splits by overriding the getSplits function.
> > > > I was performing some tests, but my code crashes at runtime and I cannot
> > > > find the problem.
> > > > Any help? I didn't find examples.
> > > > > > > > > > public class SimpleRowCounter extends Configured implements Tool { > > > > > > > > static class RowCounterMapper extends > > > > TableMapper<ImmutableBytesWritable, Result> { > > > > public static enum Counters { ROWS } > > > > @Override > > > > public void map(ImmutableBytesWritable row, Result value, Context > > > > context) { > > > > context.getCounter(Counters.ROWS).increment(1); > > > > try { > > > > Thread.sleep(3000); // Simulates work > > > > } catch (InterruptedException name) { } > > > > } > > > > } > > > > > > > > public class MyTableInputFormat extends TableInputFormat { > > > > @Override > > > > public List<InputSplit> getSplits(JobContext context) throws > > > > IOException { > > > > // Just to detect if this method is being called ... > > > > List<InputSplit> splits = super.getSplits(context); > > > > System.out.printf("Message to log? \n" ); > > > > return splits; > > > > } > > > > } > > > > > > > > @Override > > > > public int run(String[] args) throws Exception { > > > > if (args.length != 1) { > > > > System.err.println("Usage: SimpleRowCounter <tableName>"); > > > > return -1; > > > > } > > > > String tableName = args[0]; > > > > > > > > Scan scan = new Scan(); > > > > scan.setFilter(new FirstKeyOnlyFilter()); > > > > scan.setCaching(500); > > > > scan.setCacheBlocks(false); > > > > > > > > Job job = new Job(getConf(), getClass().getSimpleName()); > > > > job.setJarByClass(getClass()); > > > > > > > > TableMapReduceUtil.initTableMapperJob(tableName, scan, > > > > RowCounterMapper.class, > > > > ImmutableBytesWritable.class, Result.class, job, true, > > > > MyTableInputFormat.class); > > > > > > > > job.setNumReduceTasks(0); > > > > job.setOutputFormatClass(NullOutputFormat.class); > > > > return job.waitForCompletion(true) ? 
0 : 1; > > > > } > > > > > > > > public static void main(String[] args) throws Exception { > > > > int exitCode = ToolRunner.run(HBaseConfiguration.create(), > > > > new SimpleRowCounter(), args); > > > > System.exit(exitCode); > > > > } > > > > } > > > > > > > > Thanks so much, > > > > Iván. > > > > > > > > > > > > > > > > > > > > - Original Message - > > > > > From: "Ted Yu" > > > > > To: user@hbase.apache.org > > > > > Sent: Tuesday, April 12, 2016 17:29:52 > > > > > Subject: Re: Processing rows in parallel with MapReduce jobs. > > > > > > > > > > Please take a look at TableInputFormatBase#getSplits() : > > > > > > > > > >* Calculates the splits that will serve as input for the map > > tasks. > > > > The > > > > > > > > > >* number of splits matches the number of regions in a table. > > > > > > > > > > Each mapper would be reading one of the regions. > > > > > > > > > > On Tue, Apr 12, 2016 at 8:18 AM, Ivan Cores gonzalez < > > > >
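[Editor's note] The NoSuchMethodException in this thread comes from Hadoop instantiating the input format reflectively, which requires a no-arg constructor. A non-static inner class's constructor implicitly takes the enclosing instance, so declaring MyTableInputFormat static fixes it, as Iván found. A minimal, HBase-free sketch of the difference (class names here are illustrative, not from the thread):

```java
// Demonstrates why a non-static inner class breaks reflective
// instantiation (as Hadoop's ReflectionUtils requires a no-arg ctor),
// while a static nested class works. Plain JDK, no HBase needed.
public class CtorDemo {
    class Inner {}          // implicit ctor takes an enclosing CtorDemo instance
    static class Nested {}  // true no-arg ctor, instantiable by reflection

    public static void main(String[] args) throws Exception {
        // Static nested class: reflection finds the no-arg constructor.
        Object ok = Nested.class.getDeclaredConstructor().newInstance();
        System.out.println("static nested: instantiated " + (ok != null));
        try {
            // Inner class: no Inner() exists, only Inner(CtorDemo).
            Inner.class.getDeclaredConstructor();
        } catch (NoSuchMethodException e) {
            // Same failure mode as SimpleRowCounter$MyTableInputFormat.<init>()
            System.out.println("inner: " + e.getClass().getSimpleName());
        }
    }
}
```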
RE: Rows per second for RegionScanner
Just curious - have you set the scanner caching to some high value - say 1000 (or even higher in your small-value case)? The parameter is hbase.client.scanner.caching. You can read up on it - https://hbase.apache.org/book.html Another thing: are you just looking for pure scan-read performance optimization? Depending upon the table size, you can also look into caching the table or not caching at all. -Original Message- From: hongbin ma [mailto:mahong...@apache.org] Sent: Thursday, April 21, 2016 5:04 AM To: user@hbase.apache.org Subject: Rows per second for RegionScanner Hi, experts, I'm trying to figure out how fast HBase can scan. I'm setting up the RegionScanner in an endpoint coprocessor so that no network overhead will be included. My average key length is 35 and average value length is 5. My test result is that if I warm all my interested blocks in the block cache, I'm only able to scan around 300,000 rows per second per region (with an endpoint I guess it's one thread per region), so it's like getting 15 MB of data per second. I'm not sure if this is already an acceptable number for HBase. The answers from you experts might help me decide if it's worth digging further into tuning it. thanks! other info: My HBase cluster is on 8 AWS m1.xlarge instances, with 4 CPU cores and 16G RAM each. Each region server is configured with a 10G heap. The test HTable has 23 regions, one hfile per region (just major compacted). There was no other resource contention when I ran the tests. Attached is the HFile output of one of the region hfiles: = hbase org.apache.hadoop.hbase.io.hfile.HFile -m -s -v -f /apps/hbase/data/data/default/KYLIN_YMSGYYXO12/d42b9faf43eafcc9640aa256143d5be3/F1/30b8a8ff5a82458481846e364974bf06 2016-04-21 09:16:04,091 INFO [main] Configuration.deprecation: hadoop.native.lib is deprecated. 
Instead, use io.native.lib.available 2016-04-21 09:16:04,292 INFO [main] util.ChecksumType: Checksum using org.apache.hadoop.util.PureJavaCrc32 2016-04-21 09:16:04,294 INFO [main] util.ChecksumType: Checksum can use org.apache.hadoop.util.PureJavaCrc32C SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/usr/hdp/2.2.9.0-3393/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/usr/hdp/2.2.9.0-3393/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. 2016-04-21 09:16:05,654 INFO [main] Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS Scanning -> /apps/hbase/data/data/default/KYLIN_YMSGYYXO12/d42b9faf43eafcc9640aa256143d5be3/F1/30b8a8ff5a82458481846e364974bf06 Block index size as per heapsize: 3640 reader=/apps/hbase/data/data/default/KYLIN_YMSGYYXO12/d42b9faf43eafcc9640aa256143d5be3/F1/30b8a8ff5a82458481846e364974bf06, compression=none, cacheConf=CacheConfig:disabled, firstKey=\x00\x0B\x00\x00\x00\x00\x00\x00\x00\x09\x00\x00\x00\x00\x00\x01\xF4/F1:M/0/Put, lastKey=\x00\x0B\x00\x00\x00\x00\x00\x00\x00\x1F\x06-?\x0F"U\x00\x00\x03[^\xD9/F1:M/0/Put, avgKeyLen=35, avgValueLen=5, entries=160988965, length=1832309188 Trailer: fileinfoOffset=1832308623, loadOnOpenDataOffset=1832306641, dataIndexCount=43, metaIndexCount=0, totalUncomressedBytes=1831809883, entryCount=160988965, compressionCodec=NONE, uncompressedDataIndexSize=5558733, numDataIndexLevels=2, firstDataBlockOffset=0, lastDataBlockOffset=1832250057, comparatorClassName=org.apache.hadoop.hbase.KeyValue$KeyComparator, majorVersion=2, minorVersion=3 Fileinfo: DATA_BLOCK_ENCODING = FAST_DIFF DELETE_FAMILY_COUNT = \x00\x00\x00\x00\x00\x00\x00\x00 EARLIEST_PUT_TS = \x00\x00\x00\x00\x00\x00\x00\x00 MAJOR_COMPACTION_KEY = \xFF MAX_SEQ_ID_KEY = 4 TIMERANGE = 00 hfile.AVG_KEY_LEN = 35 
hfile.AVG_VALUE_LEN = 5 hfile.LASTKEY = \x00\x16\x00\x0B\x00\x00\x00\x00\x00\x00\x00\x1F\x06-?\x0F"U\x00\x00\x03[^\xD9\x02F1M\x00\x00\x00\x00\x00\x00\x00\x00\x04 Mid-key: \x00\x12\x00\x0B\x00\x00\x00\x00\x00\x00\x00\x1D\x04_\x07\x89\x00\x00\x02l\x00\x7F\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\x00\x00\x00\x007|\xBE$\x00\x00;\x81 Bloom filter: Not present Delete Family Bloom filter: Not present Stats: Key length: min = 32.00 max = 37.00 mean = 35.11 stddev = 1.46 median = 35.00 75% <= 37.00 95% <= 37.00 98% <= 37.00 99% <= 37.00 99.9% <= 37.00 count = 160988965 Row size (bytes): min = 44.00 max = 55.00 mean = 48.17 stddev = 1.43 median = 48.00 75% <= 50.00 95% <= 50.00 98% <= 50.00 99% <= 50.00
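[Editor's note] Hongbin's observation that caching helps remote scans but not region-side scans follows from what hbase.client.scanner.caching actually controls: the number of rows returned per client scanner RPC. A back-of-the-envelope sketch (entry count taken from the HFile stats above; the one-row-per-entry RPC model is a simplification):

```java
// Rough count of scanner RPCs a remote client would issue to scan one
// region at different caching settings. An endpoint coprocessor scans
// in-process on the region server, so none of these RPCs happen there -
// which is why raising the caching value changed nothing in this test.
public class CachingRpcs {
    static long rpcs(long rows, int caching) {
        return (rows + caching - 1) / caching;  // ceil(rows / caching)
    }

    public static void main(String[] args) {
        long rows = 160_988_965L;  // entryCount from the HFile dump above
        System.out.println("caching=1:    " + rpcs(rows, 1) + " RPCs");
        System.out.println("caching=100:  " + rpcs(rows, 100) + " RPCs");
        System.out.println("caching=1024: " + rpcs(rows, 1024) + " RPCs");
    }
}
```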
Re: ERROR [main] client.ConnectionManager$HConnectionImplementation: The node /hbase is not in ZooKeeper.
Since the topic has shifted to making the ZooKeeper quorum work, I suggest sending the question to user@zookeeper. Thanks > On Apr 21, 2016, at 3:38 AM, Eric Gao wrote: > > Thanks a lot! > I deleted the /tmp/zookeeper line, and made sure there is only one > dataDir=/opt/zookeeper/data. > And I have changed the zoo.cfg: > server.1=192.168.1.2:2887:3887 > server.2=192.168.1.3:2888:3888 > server.3=192.168.1.4:2889:3889 > > Nothing is running on the IP:port. > > But I found the myid files' contents were changed to 0, 1 and 2, so I have changed > them back to 1, 2 and 3. > > Then the ZooKeepers were restarted, and here is the zookeeper.out: > > master's zookeeper.out: > 2016-04-21 11:30:06,645 [myid:] - INFO [main:QuorumPeerConfig@103] - Reading > configuration from: /opt/zookeeper/bin/../conf/zoo.cfg > 2016-04-21 11:30:06,724 [myid:] - INFO [main:QuorumPeer$QuorumServer@149] - > Resolved hostname: 192.168.1.4 to address: /192.168.1.4 > 2016-04-21 11:30:06,725 [myid:] - INFO [main:QuorumPeer$QuorumServer@149] - > Resolved hostname: 192.168.1.3 to address: /192.168.1.3 > 2016-04-21 11:30:06,726 [myid:] - INFO [main:QuorumPeer$QuorumServer@149] - > Resolved hostname: 192.168.1.2 to address: /192.168.1.2 > 2016-04-21 11:30:06,727 [myid:] - INFO [main:QuorumPeerConfig@331] - > Defaulting to majority quorums > 2016-04-21 11:30:06,737 [myid:1] - INFO [main:DatadirCleanupManager@78] - > autopurge.snapRetainCount set to 3 > 2016-04-21 11:30:06,738 [myid:1] - INFO [main:DatadirCleanupManager@79] - > autopurge.purgeInterval set to 0 > 2016-04-21 11:30:06,738 [myid:1] - INFO [main:DatadirCleanupManager@101] - > Purge task is not scheduled. 
> 2016-04-21 11:30:06,765 [myid:1] - INFO [main:QuorumPeerMain@127] - Starting > quorum peer > 2016-04-21 11:30:06,794 [myid:1] - INFO [main:NIOServerCnxnFactory@89] - > binding to port 0.0.0.0/0.0.0.0:2181 > 2016-04-21 11:30:06,815 [myid:1] - INFO [main:QuorumPeer@1019] - tickTime > set to 2000 > 2016-04-21 11:30:06,815 [myid:1] - INFO [main:QuorumPeer@1039] - > minSessionTimeout set to -1 > 2016-04-21 11:30:06,816 [myid:1] - INFO [main:QuorumPeer@1050] - > maxSessionTimeout set to -1 > 2016-04-21 11:30:06,816 [myid:1] - INFO [main:QuorumPeer@1065] - initLimit > set to 10 > 2016-04-21 11:30:06,851 [myid:1] - INFO [main:FileSnap@83] - Reading > snapshot /opt/zookeeper/data/version-2/snapshot.2 > 2016-04-21 11:30:06,882 [myid:1] - INFO > [ListenerThread:QuorumCnxManager$Listener@534] - My election bind port: > /192.168.1.2:3887 > 2016-04-21 11:30:06,989 [myid:1] - INFO > [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumPeer@774] - LOOKING > 00 > 0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 1 (n.sid), 0x2 (n.peerEpoch) > LOOKING (my state) > 2016-04-21 11:30:07,020 [myid:1] - INFO > [WorkerSender[myid=1]:QuorumCnxManager@199] - Have smaller server identifier, > so dropping the connection: (2, 1) > 2016-04-21 11:30:07,024 [myid:1] - INFO > [WorkerSender[myid=1]:QuorumCnxManager@199] - Have smaller server identifier, > so dropping the connection: (3, 1) > 2016-04-21 11:30:07,035 [myid:1] - INFO > [/192.168.1.2:3887:QuorumCnxManager$Listener@541] - Received connection > request /192.168.1.3:56474 > d), 0x1 (n.round), FOLLOWING (n.state), 2 (n.sid), 0x2 (n.peerEpoch) LOOKING > (my state) > d), 0x1 (n.round), FOLLOWING (n.state), 2 (n.sid), 0x2 (n.peerEpoch) LOOKING > (my state) > 2016-04-21 11:30:07,043 [myid:1] - INFO > [/192.168.1.2:3887:QuorumCnxManager$Listener@541] - Received connection > request /192.168.1.4:46486 > d), 0x1 (n.round), LEADING (n.state), 3 (n.sid), 0x2 (n.peerEpoch) LOOKING > (my state) > 2016-04-21 11:30:07,050 [myid:1] - INFO > 
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumPeer@844] - FOLLOWING > 2016-04-21 11:30:07,058 [myid:1] - INFO > [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Learner@86] - TCP NoDelay set to: > true > 02/06/2016 03:18 GMT > 2016-04-21 11:30:07,071 [myid:1] - INFO > [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Environment@100] - Server > environment:host.name=master > 2016-04-21 11:30:07,071 [myid:1] - INFO > [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Environment@100] - Server > environment:java.version=1.7.0_75 > 2016-04-21 11:30:07,072 [myid:1] - INFO > [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Environment@100] - Server > environment:java.vendor=Oracle Corporation > njdk-1.7.0.75-2.5.4.2.el7_0.x86_64/jre > lib > lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib > 2016-04-21 11:30:07,072 [myid:1] - INFO > [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Environment@100] - Server > environment:java.io.tmpdir=/tmp > 2016-04-21 11:30:07,073 [myid:1] - INFO > [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Environment@100] - Server > environment:java.compiler= > 2016-04-21 11:30:07,073 [myid:1] - INFO > [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Environment@100] - Server > environment:os.name=Linux > 2016-04-21 11:30:07,073 [myid:1] - INFO >
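[Editor's note] For reference, a consistent quorum setup ties each server.N line in zoo.cfg to the myid file on the corresponding host, which is exactly what went wrong above when the myid contents drifted to 0, 1 and 2. A sketch using the IPs and dataDir from this thread:

```
# zoo.cfg (identical on all three hosts)
dataDir=/opt/zookeeper/data
clientPort=2181
server.1=192.168.1.2:2887:3887
server.2=192.168.1.3:2888:3888
server.3=192.168.1.4:2889:3889

# /opt/zookeeper/data/myid must contain only the N of that host's
# server.N line: "1" on 192.168.1.2, "2" on 192.168.1.3, "3" on 192.168.1.4
```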
Rows per second for RegionScanner
Hi, experts, I'm trying to figure out how fast HBase can scan. I'm setting up the RegionScanner in an endpoint coprocessor so that no network overhead will be included. My average key length is 35 and average value length is 5. My test result is that if I warm all my interested blocks in the block cache, I'm only able to scan around 300,000 rows per second per region (with an endpoint I guess it's one thread per region), so it's like getting 15 MB of data per second. I'm not sure if this is already an acceptable number for HBase. The answers from you experts might help me decide if it's worth digging further into tuning it. thanks! other info: My HBase cluster is on 8 AWS m1.xlarge instances, with 4 CPU cores and 16G RAM each. Each region server is configured with a 10G heap. The test HTable has 23 regions, one hfile per region (just major compacted). There was no other resource contention when I ran the tests. Attached is the HFile output of one of the region hfiles: = hbase org.apache.hadoop.hbase.io.hfile.HFile -m -s -v -f /apps/hbase/data/data/default/KYLIN_YMSGYYXO12/d42b9faf43eafcc9640aa256143d5be3/F1/30b8a8ff5a82458481846e364974bf06 2016-04-21 09:16:04,091 INFO [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available 2016-04-21 09:16:04,292 INFO [main] util.ChecksumType: Checksum using org.apache.hadoop.util.PureJavaCrc32 2016-04-21 09:16:04,294 INFO [main] util.ChecksumType: Checksum can use org.apache.hadoop.util.PureJavaCrc32C SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/usr/hdp/2.2.9.0-3393/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/usr/hdp/2.2.9.0-3393/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. 2016-04-21 09:16:05,654 INFO [main] Configuration.deprecation: fs.default.name is deprecated. 
Instead, use fs.defaultFS Scanning -> /apps/hbase/data/data/default/KYLIN_YMSGYYXO12/d42b9faf43eafcc9640aa256143d5be3/F1/30b8a8ff5a82458481846e364974bf06 Block index size as per heapsize: 3640 reader=/apps/hbase/data/data/default/KYLIN_YMSGYYXO12/d42b9faf43eafcc9640aa256143d5be3/F1/30b8a8ff5a82458481846e364974bf06, compression=none, cacheConf=CacheConfig:disabled, firstKey=\x00\x0B\x00\x00\x00\x00\x00\x00\x00\x09\x00\x00\x00\x00\x00\x01\xF4/F1:M/0/Put, lastKey=\x00\x0B\x00\x00\x00\x00\x00\x00\x00\x1F\x06-?\x0F"U\x00\x00\x03[^\xD9/F1:M/0/Put, avgKeyLen=35, avgValueLen=5, entries=160988965, length=1832309188 Trailer: fileinfoOffset=1832308623, loadOnOpenDataOffset=1832306641, dataIndexCount=43, metaIndexCount=0, totalUncomressedBytes=1831809883, entryCount=160988965, compressionCodec=NONE, uncompressedDataIndexSize=5558733, numDataIndexLevels=2, firstDataBlockOffset=0, lastDataBlockOffset=1832250057, comparatorClassName=org.apache.hadoop.hbase.KeyValue$KeyComparator, majorVersion=2, minorVersion=3 Fileinfo: DATA_BLOCK_ENCODING = FAST_DIFF DELETE_FAMILY_COUNT = \x00\x00\x00\x00\x00\x00\x00\x00 EARLIEST_PUT_TS = \x00\x00\x00\x00\x00\x00\x00\x00 MAJOR_COMPACTION_KEY = \xFF MAX_SEQ_ID_KEY = 4 TIMERANGE = 00 hfile.AVG_KEY_LEN = 35 hfile.AVG_VALUE_LEN = 5 hfile.LASTKEY = \x00\x16\x00\x0B\x00\x00\x00\x00\x00\x00\x00\x1F\x06-?\x0F"U\x00\x00\x03[^\xD9\x02F1M\x00\x00\x00\x00\x00\x00\x00\x00\x04 Mid-key: \x00\x12\x00\x0B\x00\x00\x00\x00\x00\x00\x00\x1D\x04_\x07\x89\x00\x00\x02l\x00\x7F\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\x00\x00\x00\x007|\xBE$\x00\x00;\x81 Bloom filter: Not present Delete Family Bloom filter: Not present Stats: Key length: min = 32.00 max = 37.00 mean = 35.11 stddev = 1.46 median = 35.00 75% <= 37.00 95% <= 37.00 98% <= 37.00 99% <= 37.00 99.9% <= 37.00 count = 160988965 Row size (bytes): min = 44.00 max = 55.00 mean = 48.17 stddev = 1.43 median = 48.00 75% <= 50.00 95% <= 50.00 98% <= 50.00 99% <= 50.00 99.9% <= 51.97 count = 160988965 Row size (columns): min = 
1.00 max = 1.00 mean = 1.00 stddev = 0.00 median = 1.00 75% <= 1.00 95% <= 1.00 98% <= 1.00 99% <= 1.00 99.9% <= 1.00 count = 160988965 Val length: min = 4.00 max = 12.00 mean = 5.06 stddev = 0.33 median = 5.00 75% <= 5.00 95% <= 5.00 98% <= 6.00
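[Editor's note] To sanity-check the "15M per second" figure in this thread: at the measured 300,000 rows/s per region and the mean row size of 48.17 bytes reported in the stats above, the per-region byte throughput works out to roughly 14-15 MB/s, consistent with hongbin's estimate. A quick check:

```java
// Back-of-the-envelope throughput check using the numbers from the thread.
public class ScanThroughput {
    public static void main(String[] args) {
        double rowsPerSec = 300_000;   // measured per-region scan rate
        double meanRowBytes = 48.17;   // mean row size from the HFile -s stats
        double mbPerSec = rowsPerSec * meanRowBytes / 1_000_000;
        System.out.println("per-region throughput ~ " + Math.round(mbPerSec) + " MB/s");
    }
}
```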