Re: Rows per second for RegionScanner

2016-04-21 Thread Vladimir Rodionov
Try disabling block encoding - you will get better numbers.
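
Something along these lines should switch the family back to no encoding (just
a rough sketch against the 1.x Admin API; the table and family names are taken
from the HFile dump further down this thread, adjust to your setup):

import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;

public class DisableFastDiff {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection();
         Admin admin = conn.getAdmin()) {
      TableName table = TableName.valueOf("KYLIN_YMSGYYXO12");
      // note: this builds a fresh descriptor, so copy over any other
      // family settings (blocksize, bloom filter, ...) you rely on
      HColumnDescriptor family = new HColumnDescriptor("F1");
      family.setDataBlockEncoding(DataBlockEncoding.NONE); // was FAST_DIFF
      admin.disableTable(table);
      admin.modifyColumn(table, family);
      admin.enableTable(table);
      // a major compaction rewrites the existing HFiles without the encoding
      admin.majorCompact(table);
    }
  }
}

After that, re-run the scan test so you compare decoded vs. non-decoded blocks
with everything else unchanged.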

>>  I mean per region scan speed,

Scan performance depends on the number of CPU cores: the more cores you have,
the more throughput you will get. Your servers are pretty low end (4 virtual
CPU cores is just 2 hardware cores). With 32 cores per node you would get
close to an 8x speed-up.

-Vlad


On Thu, Apr 21, 2016 at 7:22 PM, hongbin ma  wrote:

> hi Thakrar
>
> Thanks for your reply.
>
> My settings for the RegionScanner Scan are:
>
> scan.setCaching(1024)
> scan.setMaxResultSize(5M)
>
> Even if I change the caching to 10 I'm still not getting any
> improvement. I guess the caching works for remote scans through RPC,
> but doesn't help much for a region-side scan?
>
> I also tried the PREFETCH_BLOCKS_ON_OPEN for the whole table, however no
> improvement was observed.
>
> I'm pursuing pure scan-read performance optimization because our
> application is sort of read-only. And I observed that even if I did
> nothing else (only scanning) in my coprocessor, the scan speed is not
> satisfying. The CPU seems to be fully utilized. Maybe the process of
> decoding FAST_DIFF rows is too heavy for the CPU? How many rows/second of
> scan speed would you expect on a normal setup? I mean per-region scan
> speed, not the overall scan speed counting all regions.
>
> thanks
>
> On Thu, Apr 21, 2016 at 10:24 PM, Thakrar, Jayesh <
> jthak...@conversantmedia.com> wrote:
>
> > Just curious - have you set the scanner caching to some high value - say
> > 1000 (or even higher in your small value case)?
> >
> > The parameter is hbase.client.scanner.caching
> >
> > You can read up on it - https://hbase.apache.org/book.html
> >
> > Another thing, are you just looking for pure scan-read performance
> > optimization?
> > Depending upon the table size you can also look into caching the table or
> > not caching at all.
> >
> > -Original Message-
> > From: hongbin ma [mailto:mahong...@apache.org]
> > Sent: Thursday, April 21, 2016 5:04 AM
> > To: user@hbase.apache.org
> > Subject: Rows per second for RegionScanner
> >
> > ​Hi, experts,
> >
> > I'm trying to figure out how fast hbase can scan. I'm setting up the
> > RegionScan in an endpoint coprocessor so that no network overhead will be
> > included. My average key length is 35 and average value length is 5.
> >
> > My test result is that if I warm all my interested blocks in the block
> > cache, I'm only able to scan around 300,000 rows per second per region
> > (with endpoint I guess it's one thread per region), so it's like
> > getting 15M
> > data per second. I'm not sure if this is already an acceptable number for
> > HBase. The answers from you experts might help me to decide if it's worth
> > to further dig into tuning it.
> >
> > thanks!
> >
> >
> >
> >
> >
> >
> > other info:
> >
> > My hbase cluster is on 8 AWS m1.xlarge instances, with 4 CPU cores and 16G
> > RAM. Each region server is configured 10G heap size. The test HTable has
> 23
> > regions, one hfile per region (just major compacted). There's no other
> > resource contention when I ran the tests.
> >
> > Attached is the HFile output of one of the region hfile:
> > =
> >  hbase  org.apache.hadoop.hbase.io.hfile.HFile -m -s -v -f
> > /apps/hbase/data/data/default/KYLIN_YMSGYYXO12/d42b9faf43eafcc9640aa256143d5be3/F1/30b8a8ff5a82458481846e364974bf06
> > 2016-04-21 09:16:04,091 INFO  [main] Configuration.deprecation:
> > hadoop.native.lib is deprecated. Instead, use io.native.lib.available
> > 2016-04-21 09:16:04,292 INFO  [main] util.ChecksumType: Checksum using
> > org.apache.hadoop.util.PureJavaCrc32
> > 2016-04-21 09:16:04,294 INFO  [main] util.ChecksumType: Checksum can use
> > org.apache.hadoop.util.PureJavaCrc32C
> > SLF4J: Class path contains multiple SLF4J bindings.
> > SLF4J: Found binding in
> > [jar:file:/usr/hdp/2.2.9.0-3393/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > SLF4J: Found binding in
> > [jar:file:/usr/hdp/2.2.9.0-3393/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
> > explanation.
> > 2016-04-21 09:16:05,654 INFO  [main] Configuration.deprecation:
> > fs.default.name is deprecated. Instead, use fs.defaultFS
> > Scanning ->
> > /apps/hbase/data/data/default/KYLIN_YMSGYYXO12/d42b9faf43eafcc9640aa256143d5be3/F1/30b8a8ff5a82458481846e364974bf06
> > Block index size as per heapsize: 3640
> > reader=/apps/hbase/data/data/default/KYLIN_YMSGYYXO12/d42b9faf43eafcc9640aa256143d5be3/F1/30b8a8ff5a82458481846e364974bf06,
> > compression=none,
> > cacheConf=CacheConfig:disabled,
> > firstKey=\x00\x0B\x00\x00\x00\x00\x00\x00\x00\x09\x00\x00\x00\x00\x00\x01\xF4/F1:M/0/Put,
> > lastKey=\x00\x0B\x00\x00\x00\x00\x00\x00\x00\x1F\x06-?\x0F"U\x00\x00\x03[^\xD9/F1:M/0/Put,
> > avgKeyLen=35,

Re: Can not connect local java client to a remote Hbase

2016-04-21 Thread Sachin Mittal
Check these links out
http://stackoverflow.com/questions/36377393/connecting-to-hbase-1-0-3-via-java-client-stuck-at-zookeeper-clientcnxn-session
http://mail-archives.apache.org/mod_mbox/hbase-user/201604.mbox/browser

First, what is your machine's IP address?

If you specify only the IP address in the regionservers file and in
hbase-site.xml, and also remove the "192.168.1.240   master-sigma" entry from
the hosts file, then you can be sure everything is getting resolved via IP
address only.
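
On the client side, the equivalent is to pin the quorum by IP in the
configuration you hand to the connection, e.g. (rough sketch, with the IP from
your hosts entry and the default client port):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class IpOnlyConnection {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // point the client at the quorum by IP so nothing depends on
    // hostname resolution on the client machine
    conf.set("hbase.zookeeper.quorum", "192.168.1.240");
    conf.set("hbase.zookeeper.property.clientPort", "2181");
    try (Connection connection = ConnectionFactory.createConnection(conf)) {
      System.out.println("connected: " + !connection.isClosed());
    }
  }
}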

Also enable trace logging to understand more, such as which call is failing
and why.

What I have found is that in HBase some servers are resolved differently, as
pointed out in those links.

Hope it helps.

Sachin


On Thu, Apr 21, 2016 at 11:11 PM, SOUFIANI Mustapha | السفياني مصطفى <
s.mustaph...@gmail.com> wrote:

> Hi all,
> I'm trying to connect my local java client (pentaho) to a remote HBase but
> every time I get a timeout error telling me that the connection couldn't
> be established.
>
> Here is the full error message:
>
>
> ***
>
> java.io.IOException:
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
> attempts=36, exceptions:
> Wed Apr 20 10:32:43 WEST 2016, null, java.net.SocketTimeoutException:
> callTimeout=6, callDuration=75181: row 'pentaho_mappings,,' on table
> 'hbase:meta' at region=hbase:meta,,1.1588230740,
> hostname=localhost,16020,1461071963695, seqNum=0
>
>
> at
>
> com.pentaho.big.data.bundles.impl.shim.hbase.table.HBaseTableImpl.exists(HBaseTableImpl.java:71)
>
> at
>
> org.pentaho.big.data.kettle.plugins.hbase.mapping.MappingAdmin.getMappedTables(MappingAdmin.java:502)
>
> at
>
> org.pentaho.big.data.kettle.plugins.hbase.output.HBaseOutputDialog.setupMappedTableNames(HBaseOutputDialog.java:818)
>
> at
>
> org.pentaho.big.data.kettle.plugins.hbase.output.HBaseOutputDialog.access$900(HBaseOutputDialog.java:88)
>
> at
>
> org.pentaho.big.data.kettle.plugins.hbase.output.HBaseOutputDialog$7.widgetSelected(HBaseOutputDialog.java:398)
>
> at org.eclipse.swt.widgets.TypedListener.handleEvent(Unknown Source)
>
> at org.eclipse.swt.widgets.EventTable.sendEvent(Unknown Source)
>
> at org.eclipse.swt.widgets.Widget.sendEvent(Unknown Source)
>
> at org.eclipse.swt.widgets.Display.runDeferredEvents(Unknown Source)
>
> at org.eclipse.swt.widgets.Display.readAndDispatch(Unknown Source)
>
> at
>
> org.pentaho.big.data.kettle.plugins.hbase.output.HBaseOutputDialog.open(HBaseOutputDialog.java:603)
>
> at
>
> org.pentaho.di.ui.spoon.delegates.SpoonStepsDelegate.editStep(SpoonStepsDelegate.java:125)
>
> at org.pentaho.di.ui.spoon.Spoon.editStep(Spoon.java:8783)
>
> at
> org.pentaho.di.ui.spoon.trans.TransGraph.editStep(TransGraph.java:3072)
>
> at
>
> org.pentaho.di.ui.spoon.trans.TransGraph.mouseDoubleClick(TransGraph.java:755)
>
> at org.eclipse.swt.widgets.TypedListener.handleEvent(Unknown Source)
>
> at org.eclipse.swt.widgets.EventTable.sendEvent(Unknown Source)
>
> at org.eclipse.swt.widgets.Widget.sendEvent(Unknown Source)
>
> at org.eclipse.swt.widgets.Display.runDeferredEvents(Unknown Source)
>
> at org.eclipse.swt.widgets.Display.readAndDispatch(Unknown Source)
>
> at org.pentaho.di.ui.spoon.Spoon.readAndDispatch(Spoon.java:1347)
>
> at org.pentaho.di.ui.spoon.Spoon.waitForDispose(Spoon.java:7989)
>
> at org.pentaho.di.ui.spoon.Spoon.start(Spoon.java:9269)
>
> at org.pentaho.di.ui.spoon.Spoon.main(Spoon.java:662)
>
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>
> at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
>
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>
> at java.lang.reflect.Method.invoke(Unknown Source)
>
> at org.pentaho.commons.launcher.Launcher.main(Launcher.java:92)
>
> Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
> after attempts=36, exceptions:
> Wed Apr 20 10:32:43 WEST 2016, null, java.net.SocketTimeoutException:
> callTimeout=6, callDuration=75181: row 'pentaho_mappings,,' on table
> 'hbase:meta' at region=hbase:meta,,1.1588230740,
> hostname=localhost,16020,1461071963695, seqNum=0
>
>
> at
>
> org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:270)
>
> at
>
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:225)
>
> at
>
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:63)
>
> at
>
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
>
> at
> org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:314)
>
> at
>
> org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:289)
>
> at
>
> 

Re: Rows per second for RegionScanner

2016-04-21 Thread hongbin ma
hi Thakrar

Thanks for your reply.

My settings for the RegionScanner Scan are:

scan.setCaching(1024)
scan.setMaxResultSize(5M)

Even if I change the caching to 10 I'm still not getting any
improvement. I guess the caching works for remote scans through RPC,
but doesn't help much for a region-side scan?

I also tried the PREFETCH_BLOCKS_ON_OPEN for the whole table, however no
improvement was observed.

I'm pursuing pure scan-read performance optimization because our
application is sort of read-only. And I observed that even if I did
nothing else (only scanning) in my coprocessor, the scan speed is not
satisfying. The CPU seems to be fully utilized. Maybe the process of
decoding FAST_DIFF rows is too heavy for the CPU? How many rows/second of
scan speed would you expect on a normal setup? I mean per-region scan
speed, not the overall scan speed counting all regions.

thanks

On Thu, Apr 21, 2016 at 10:24 PM, Thakrar, Jayesh <
jthak...@conversantmedia.com> wrote:

> Just curious - have you set the scanner caching to some high value - say
> 1000 (or even higher in your small value case)?
>
> The parameter is hbase.client.scanner.caching
>
> You can read up on it - https://hbase.apache.org/book.html
>
> Another thing, are you just looking for pure scan-read performance
> optimization?
> Depending upon the table size you can also look into caching the table or
> not caching at all.
>
> -Original Message-
> From: hongbin ma [mailto:mahong...@apache.org]
> Sent: Thursday, April 21, 2016 5:04 AM
> To: user@hbase.apache.org
> Subject: Rows per second for RegionScanner
>
> ​Hi, experts,
>
> I'm trying to figure out how fast hbase can scan. I'm setting up the
> RegionScan in an endpoint coprocessor so that no network overhead will be
> included. My average key length is 35 and average value length is 5.
>
> My test result is that if I warm all my interested blocks in the block
> cache, I'm only able to scan around 300,000 rows per second per region
> (with endpoint I guess it's one thread per region), so it's like getting 15M
> data per second. I'm not sure if this is already an acceptable number for
> HBase. The answers from you experts might help me to decide if it's worth
> to further dig into tuning it.
>
> thanks!
>
>
>
>
>
>
> other info:
>
> My hbase cluster is on 8 AWS m1.xlarge instances, with 4 CPU cores and 16G
> RAM. Each region server is configured 10G heap size. The test HTable has 23
> regions, one hfile per region (just major compacted). There's no other
> resource contention when I ran the tests.
>
> Attached is the HFile output of one of the region hfile:
> =
>  hbase  org.apache.hadoop.hbase.io.hfile.HFile -m -s -v -f
>
> /apps/hbase/data/data/default/KYLIN_YMSGYYXO12/d42b9faf43eafcc9640aa256143d5be3/F1/30b8a8ff5a82458481846e364974bf06
> 2016-04-21 09:16:04,091 INFO  [main] Configuration.deprecation:
> hadoop.native.lib is deprecated. Instead, use io.native.lib.available
> 2016-04-21 09:16:04,292 INFO  [main] util.ChecksumType: Checksum using
> org.apache.hadoop.util.PureJavaCrc32
> 2016-04-21 09:16:04,294 INFO  [main] util.ChecksumType: Checksum can use
> org.apache.hadoop.util.PureJavaCrc32C
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in
>
> [jar:file:/usr/hdp/2.2.9.0-3393/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in
>
> [jar:file:/usr/hdp/2.2.9.0-3393/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
> explanation.
> 2016-04-21 09:16:05,654 INFO  [main] Configuration.deprecation:
> fs.default.name is deprecated. Instead, use fs.defaultFS Scanning ->
>
> /apps/hbase/data/data/default/KYLIN_YMSGYYXO12/d42b9faf43eafcc9640aa256143d5be3/F1/30b8a8ff5a82458481846e364974bf06
> Block index size as per heapsize: 3640
>
> reader=/apps/hbase/data/data/default/KYLIN_YMSGYYXO12/d42b9faf43eafcc9640aa256143d5be3/F1/30b8a8ff5a82458481846e364974bf06,
> compression=none,
> cacheConf=CacheConfig:disabled,
>
>
> firstKey=\x00\x0B\x00\x00\x00\x00\x00\x00\x00\x09\x00\x00\x00\x00\x00\x01\xF4/F1:M/0/Put,
>
>
> lastKey=\x00\x0B\x00\x00\x00\x00\x00\x00\x00\x1F\x06-?\x0F"U\x00\x00\x03[^\xD9/F1:M/0/Put,
> avgKeyLen=35,
> avgValueLen=5,
> entries=160988965,
> length=1832309188
> Trailer:
> fileinfoOffset=1832308623,
> loadOnOpenDataOffset=1832306641,
> dataIndexCount=43,
> metaIndexCount=0,
> totalUncomressedBytes=1831809883,
> entryCount=160988965,
> compressionCodec=NONE,
> uncompressedDataIndexSize=5558733,
> numDataIndexLevels=2,
> firstDataBlockOffset=0,
> lastDataBlockOffset=1832250057,
> comparatorClassName=org.apache.hadoop.hbase.KeyValue$KeyComparator,
> majorVersion=2,
> minorVersion=3
> Fileinfo:
> DATA_BLOCK_ENCODING = FAST_DIFF
> DELETE_FAMILY_COUNT = 

Re: Best way to pass configuration properties to MRv2 jobs

2016-04-21 Thread Henning Blohm

How true!! ;-)

Thanks,
Henning

On 18.04.2016 19:53, Dima Spivak wrote:

Probably better off asking on the Hadoop user mailing list (
u...@hadoop.apache.org) than the HBase one… :)

-Dima

On Mon, Apr 18, 2016 at 2:57 AM, Henning Blohm 
wrote:


Hi,

in our Hadoop 2.6.0 cluster, we need to pass some properties to all Hadoop
processes so they can be referenced using ${...} syntax in configuration
files. This works reasonably well using HADOOP_NAMENODE_OPTS and the like.

For Map/Reduce jobs however, we need to specify not only

mapred.child.java.opts

to pass system properties, in addition we need to set

yarn.app.mapreduce.am.command-opts

for anything that is referenced in Hadoop configuration files.
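
Per job, in code, that currently amounts to something like this (just a
sketch; "my.prop" and its value are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class OptsExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // system properties for the map/reduce task JVMs
    conf.set("mapred.child.java.opts", "-Dmy.prop=some-value");
    // ...and separately for the MR ApplicationMaster JVM
    conf.set("yarn.app.mapreduce.am.command-opts", "-Dmy.prop=some-value");
    Job job = Job.getInstance(conf, "opts-example");
    // set mapper/reducer/input/output as usual, then submit the job
  }
}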

In the end however almost all the properties passed are available as
environment variables as well.

Hence my question:

* Is it possible to reference environment variables in configuration
files directly?
* Does anybody know of a simpler way to make sure some system properties
are _always_ set for all Yarn processes?

Thanks,
Henning





Hbase row key to Hive mapping

2016-04-21 Thread Viswanathan J
Hi,

I'm storing the row key as following combination in bytes.

Row key using rowbuilder: unused8+timestamp(int)+accountid(long)+id(long)

When I try to map that key in a Hive table, I'm unable to convert the actual
values back since they are stored as bytes, even when I set the mapping as :key#b.
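
For reference, on the HBase client side the decoding I need amounts to roughly
this (a sketch using org.apache.hadoop.hbase.util.Bytes, assuming "unused8" is
a single unused leading byte):

import org.apache.hadoop.hbase.util.Bytes;

public class RowKeyDecoder {
  // assumed layout: 1 unused byte + timestamp (int) + accountid (long) + id (long)
  public static void decode(byte[] rowKey) {
    int offset = 1;                                // skip the unused byte
    int timestamp = Bytes.toInt(rowKey, offset);   // 4 bytes
    offset += Bytes.SIZEOF_INT;
    long accountId = Bytes.toLong(rowKey, offset); // 8 bytes
    offset += Bytes.SIZEOF_LONG;
    long id = Bytes.toLong(rowKey, offset);        // 8 bytes
    System.out.printf("timestamp=%d accountid=%d id=%d%n", timestamp, accountId, id);
  }
}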

Please help.


Re: new 4.x-HBase-1.1 branch

2016-04-21 Thread Andrew Purtell
On Thu, Apr 21, 2016 at 11:09 AM, Enis Söztutar  wrote:

> >
> > Also FWIW. I'd be curious to hear how many Phoenix users are using 0.98
> >> versus 1.0 and up, besides the folks at Salesforce, whom I know well
> >> (smile). And, more generally, who in greater HBase land is still on 0.98
> >> and won't move off this year.
> >>
> >
>  From our customers' perspective, most of the users are on 1.1+.
> The ones using 0.98-based HBase releases are not getting the latest Phoenix
> anyway through the vendor channels.
>
>
​Thanks Enis, really appreciate the info.​



>
> >> > On Apr 20, 2016, at 5:00 PM, James Taylor 
> >> wrote:
> >> >
> >> > Due to some API changes in HBase, we need to have a separate branch
> for
> >> our
> >> > HBase 1.1 compatible branches. I've created a 4.x-HBase-1.1 branch for
> >> this
> >> > - please make sure to keep this branch in sync with the other 4.x and
> >> > master branches. We can release 4.x-HBase-1.2 compatible releases out
> of
> >> > master for 4.8. I think post 4.8 release we may not need to continue
> >> with
> >> > the 4.x-HBase-1.0 branch.
> >> >
> >> > Thanks,
> >> > James
> >>
> >
>


Re: new 4.x-HBase-1.1 branch

2016-04-21 Thread Enis Söztutar
>
> Also FWIW. I'd be curious to hear how many Phoenix users are using 0.98
>> versus 1.0 and up, besides the folks at Salesforce, whom I know well
>> (smile). And, more generally, who in greater HBase land is still on 0.98
>> and won't move off this year.
>>
>
 From our customers' perspective, most of the users are on 1.1+.
The ones using 0.98-based HBase releases are not getting the latest Phoenix
anyway through the vendor channels.

Enis


>
>> > On Apr 20, 2016, at 5:00 PM, James Taylor 
>> wrote:
>> >
>> > Due to some API changes in HBase, we need to have a separate branch for
>> our
>> > HBase 1.1 compatible branches. I've created a 4.x-HBase-1.1 branch for
>> this
>> > - please make sure to keep this branch in sync with the other 4.x and
>> > master branches. We can release 4.x-HBase-1.2 compatible releases out of
>> > master for 4.8. I think post 4.8 release we may not need to continue
>> with
>> > the 4.x-HBase-1.0 branch.
>> >
>> > Thanks,
>> > James
>>
>
>


Re: new 4.x-HBase-1.1 branch

2016-04-21 Thread Enis Söztutar
Makes sense to drop the branch for HBase-1.0.x.

I had proposed it here before:
http://search-hadoop.com/m/9UY0h2XrnGW1d3OBF1=+DISCUSS+Drop+branch+for+HBase+1+0+


Enis

On Thu, Apr 21, 2016 at 8:06 AM, Andrew Purtell 
wrote:

> HBase announced at the last 1.0 release that it would be the last release
> in that line and I think we would recommend any 1.0 user move up to 1.1 or
> 1.2 at their earliest convenience. FWIW
>
> Also, as RM of the 0.98 code line I am considering ending its (mostly)
> regular release cadence at the end of this calendar year. I'd continue if
> there were expressed user or dev demand but otherwise place it into the
> same state as 1.0. Also FWIW. I'd be curious to hear how many Phoenix users
> are using 0.98 versus 1.0 and up, besides the folks at Salesforce, whom I
> know well (smile). And, more generally, who in greater HBase land is still
> on 0.98 and won't move off this year.
>
> > On Apr 20, 2016, at 5:00 PM, James Taylor 
> wrote:
> >
> > Due to some API changes in HBase, we need to have a separate branch for
> our
> > HBase 1.1 compatible branches. I've created a 4.x-HBase-1.1 branch for
> this
> > - please make sure to keep this branch in sync with the other 4.x and
> > master branches. We can release 4.x-HBase-1.2 compatible releases out of
> > master for 4.8. I think post 4.8 release we may not need to continue with
> > the 4.x-HBase-1.0 branch.
> >
> > Thanks,
> > James
>


Re: Can not connect local java client to a remote Hbase

2016-04-21 Thread Ted Yu
Are you using hbase 1.0 or 1.1 ?

I assume you have verified that hbase master is running normally on
master-sigma.
Are you able to use hbase shell on that node ?

If you check the master log, you will see which node hosts hbase:meta.
On that node, do you see anything interesting in the region server log?

Cheers

On Thu, Apr 21, 2016 at 10:41 AM, SOUFIANI Mustapha | السفياني مصطفى <
s.mustaph...@gmail.com> wrote:

> Hi all,
> I'm trying to connect my local java client (pentaho) to a remote HBase but
> every time I get a timeout error telling me that the connection couldn't
> be established.
>
> Here is the full error message:
>
>
> ***
>
> java.io.IOException:
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
> attempts=36, exceptions:
> Wed Apr 20 10:32:43 WEST 2016, null, java.net.SocketTimeoutException:
> callTimeout=6, callDuration=75181: row 'pentaho_mappings,,' on table
> 'hbase:meta' at region=hbase:meta,,1.1588230740,
> hostname=localhost,16020,1461071963695, seqNum=0
>
>
> at
>
> com.pentaho.big.data.bundles.impl.shim.hbase.table.HBaseTableImpl.exists(HBaseTableImpl.java:71)
>
> at
>
> org.pentaho.big.data.kettle.plugins.hbase.mapping.MappingAdmin.getMappedTables(MappingAdmin.java:502)
>
> at
>
> org.pentaho.big.data.kettle.plugins.hbase.output.HBaseOutputDialog.setupMappedTableNames(HBaseOutputDialog.java:818)
>
> at
>
> org.pentaho.big.data.kettle.plugins.hbase.output.HBaseOutputDialog.access$900(HBaseOutputDialog.java:88)
>
> at
>
> org.pentaho.big.data.kettle.plugins.hbase.output.HBaseOutputDialog$7.widgetSelected(HBaseOutputDialog.java:398)
>
> at org.eclipse.swt.widgets.TypedListener.handleEvent(Unknown Source)
>
> at org.eclipse.swt.widgets.EventTable.sendEvent(Unknown Source)
>
> at org.eclipse.swt.widgets.Widget.sendEvent(Unknown Source)
>
> at org.eclipse.swt.widgets.Display.runDeferredEvents(Unknown Source)
>
> at org.eclipse.swt.widgets.Display.readAndDispatch(Unknown Source)
>
> at
>
> org.pentaho.big.data.kettle.plugins.hbase.output.HBaseOutputDialog.open(HBaseOutputDialog.java:603)
>
> at
>
> org.pentaho.di.ui.spoon.delegates.SpoonStepsDelegate.editStep(SpoonStepsDelegate.java:125)
>
> at org.pentaho.di.ui.spoon.Spoon.editStep(Spoon.java:8783)
>
> at
> org.pentaho.di.ui.spoon.trans.TransGraph.editStep(TransGraph.java:3072)
>
> at
>
> org.pentaho.di.ui.spoon.trans.TransGraph.mouseDoubleClick(TransGraph.java:755)
>
> at org.eclipse.swt.widgets.TypedListener.handleEvent(Unknown Source)
>
> at org.eclipse.swt.widgets.EventTable.sendEvent(Unknown Source)
>
> at org.eclipse.swt.widgets.Widget.sendEvent(Unknown Source)
>
> at org.eclipse.swt.widgets.Display.runDeferredEvents(Unknown Source)
>
> at org.eclipse.swt.widgets.Display.readAndDispatch(Unknown Source)
>
> at org.pentaho.di.ui.spoon.Spoon.readAndDispatch(Spoon.java:1347)
>
> at org.pentaho.di.ui.spoon.Spoon.waitForDispose(Spoon.java:7989)
>
> at org.pentaho.di.ui.spoon.Spoon.start(Spoon.java:9269)
>
> at org.pentaho.di.ui.spoon.Spoon.main(Spoon.java:662)
>
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>
> at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
>
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>
> at java.lang.reflect.Method.invoke(Unknown Source)
>
> at org.pentaho.commons.launcher.Launcher.main(Launcher.java:92)
>
> Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
> after attempts=36, exceptions:
> Wed Apr 20 10:32:43 WEST 2016, null, java.net.SocketTimeoutException:
> callTimeout=6, callDuration=75181: row 'pentaho_mappings,,' on table
> 'hbase:meta' at region=hbase:meta,,1.1588230740,
> hostname=localhost,16020,1461071963695, seqNum=0
>
>
> at
>
> org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:270)
>
> at
>
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:225)
>
> at
>
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:63)
>
> at
>
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
>
> at
> org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:314)
>
> at
>
> org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:289)
>
> at
>
> org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:161)
>
> at
> org.apache.hadoop.hbase.client.ClientScanner.(ClientScanner.java:156)
>
> at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:888)
>
> at
>
> org.apache.hadoop.hbase.MetaTableAccessor.fullScan(MetaTableAccessor.java:601)
>
> at
>
> 

Can not connect local java client to a remote Hbase

2016-04-21 Thread SOUFIANI Mustapha | السفياني مصطفى
Hi all,
I'm trying to connect my local java client (pentaho) to a remote HBase but
every time I get a timeout error telling me that the connection couldn't
be established.

Here is the full error message:

***

java.io.IOException:
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
attempts=36, exceptions:
Wed Apr 20 10:32:43 WEST 2016, null, java.net.SocketTimeoutException:
callTimeout=6, callDuration=75181: row 'pentaho_mappings,,' on table
'hbase:meta' at region=hbase:meta,,1.1588230740,
hostname=localhost,16020,1461071963695, seqNum=0


at
com.pentaho.big.data.bundles.impl.shim.hbase.table.HBaseTableImpl.exists(HBaseTableImpl.java:71)

at
org.pentaho.big.data.kettle.plugins.hbase.mapping.MappingAdmin.getMappedTables(MappingAdmin.java:502)

at
org.pentaho.big.data.kettle.plugins.hbase.output.HBaseOutputDialog.setupMappedTableNames(HBaseOutputDialog.java:818)

at
org.pentaho.big.data.kettle.plugins.hbase.output.HBaseOutputDialog.access$900(HBaseOutputDialog.java:88)

at
org.pentaho.big.data.kettle.plugins.hbase.output.HBaseOutputDialog$7.widgetSelected(HBaseOutputDialog.java:398)

at org.eclipse.swt.widgets.TypedListener.handleEvent(Unknown Source)

at org.eclipse.swt.widgets.EventTable.sendEvent(Unknown Source)

at org.eclipse.swt.widgets.Widget.sendEvent(Unknown Source)

at org.eclipse.swt.widgets.Display.runDeferredEvents(Unknown Source)

at org.eclipse.swt.widgets.Display.readAndDispatch(Unknown Source)

at
org.pentaho.big.data.kettle.plugins.hbase.output.HBaseOutputDialog.open(HBaseOutputDialog.java:603)

at
org.pentaho.di.ui.spoon.delegates.SpoonStepsDelegate.editStep(SpoonStepsDelegate.java:125)

at org.pentaho.di.ui.spoon.Spoon.editStep(Spoon.java:8783)

at
org.pentaho.di.ui.spoon.trans.TransGraph.editStep(TransGraph.java:3072)

at
org.pentaho.di.ui.spoon.trans.TransGraph.mouseDoubleClick(TransGraph.java:755)

at org.eclipse.swt.widgets.TypedListener.handleEvent(Unknown Source)

at org.eclipse.swt.widgets.EventTable.sendEvent(Unknown Source)

at org.eclipse.swt.widgets.Widget.sendEvent(Unknown Source)

at org.eclipse.swt.widgets.Display.runDeferredEvents(Unknown Source)

at org.eclipse.swt.widgets.Display.readAndDispatch(Unknown Source)

at org.pentaho.di.ui.spoon.Spoon.readAndDispatch(Spoon.java:1347)

at org.pentaho.di.ui.spoon.Spoon.waitForDispose(Spoon.java:7989)

at org.pentaho.di.ui.spoon.Spoon.start(Spoon.java:9269)

at org.pentaho.di.ui.spoon.Spoon.main(Spoon.java:662)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)

at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)

at java.lang.reflect.Method.invoke(Unknown Source)

at org.pentaho.commons.launcher.Launcher.main(Launcher.java:92)

Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
after attempts=36, exceptions:
Wed Apr 20 10:32:43 WEST 2016, null, java.net.SocketTimeoutException:
callTimeout=6, callDuration=75181: row 'pentaho_mappings,,' on table
'hbase:meta' at region=hbase:meta,,1.1588230740,
hostname=localhost,16020,1461071963695, seqNum=0


at
org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:270)

at
org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:225)

at
org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:63)

at
org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)

at
org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:314)

at
org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:289)

at
org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:161)

at
org.apache.hadoop.hbase.client.ClientScanner.(ClientScanner.java:156)

at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:888)

at
org.apache.hadoop.hbase.MetaTableAccessor.fullScan(MetaTableAccessor.java:601)

at
org.apache.hadoop.hbase.MetaTableAccessor.tableExists(MetaTableAccessor.java:365)

at
org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:310)

at
org.pentaho.hadoop.hbase.factory.HBase10Admin.tableExists(HBase10Admin.java:41)

at
org.pentaho.hbase.shim.common.CommonHBaseConnection.tableExists(CommonHBaseConnection.java:206)

at
org.pentaho.hbase.shim.common.HBaseConnectionImpl.access$801(HBaseConnectionImpl.java:35)

at
org.pentaho.hbase.shim.common.HBaseConnectionImpl$9.call(HBaseConnectionImpl.java:185)

at
org.pentaho.hbase.shim.common.HBaseConnectionImpl$9.call(HBaseConnectionImpl.java:181)

at

Re: new 4.x-HBase-1.1 branch

2016-04-21 Thread Andrew Purtell
HBase announced at the last 1.0 release that it would be the last release in 
that line and I think we would recommend any 1.0 user move up to 1.1 or 1.2 at 
their earliest convenience. FWIW

Also, as RM of the 0.98 code line I am considering ending its (mostly) regular 
release cadence at the end of this calendar year. I'd continue if there were 
expressed user or dev demand but otherwise place it into the same state as 1.0. 
Also FWIW. I'd be curious to hear how many Phoenix users are using 0.98 versus 
1.0 and up, besides the folks at Salesforce, whom I know well (smile). And, 
more generally, who in greater HBase land is still on 0.98 and won't move off 
this year. 

> On Apr 20, 2016, at 5:00 PM, James Taylor  wrote:
> 
> Due to some API changes in HBase, we need to have a separate branch for our
> HBase 1.1 compatible branches. I've created a 4.x-HBase-1.1 branch for this
> - please make sure to keep this branch in sync with the other 4.x and
> master branches. We can release 4.x-HBase-1.2 compatible releases out of
> master for 4.8. I think post 4.8 release we may not need to continue with
> the 4.x-HBase-1.0 branch.
> 
> Thanks,
> James


Re: Processing rows in parallel with MapReduce jobs.

2016-04-21 Thread Ivan Cores gonzalez
Thanks Ted, 
Finally I found the real mistake: the class had to be declared static.
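
For the record, the change was just turning the input format into a static
nested class, roughly:

// Hadoop instantiates the InputFormat via reflection through a no-arg
// constructor; a non-static inner class needs an enclosing instance, so it
// has to be a static nested class instead (same imports as before).
public static class MyTableInputFormat extends TableInputFormat {
  @Override
  public List<InputSplit> getSplits(JobContext context) throws IOException {
    // just to detect that this method is being called
    List<InputSplit> splits = super.getSplits(context);
    System.out.printf("Message to log? %n");
    return splits;
  }
}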

Best,
Iván.

- Original Message -
> From: "Ted Yu" 
> To: user@hbase.apache.org
> Sent: Tuesday, April 19, 2016 15:56:56
> Subject: Re: Processing rows in parallel with MapReduce jobs.
> 
> From the error, you need to provide an argumentless ctor for
> MyTableInputFormat.
> 
> On Tue, Apr 19, 2016 at 12:12 AM, Ivan Cores gonzalez 
> wrote:
> 
> >
> > Hi Ted,
> >
> > Sorry, I forgot to include the error. At runtime I get the following exception:
> >
> > Exception in thread "main" java.lang.RuntimeException:
> > java.lang.NoSuchMethodException:
> > simplerowcounter.SimpleRowCounter$MyTableInputFormat.<init>()
> >
> > The program works fine if I don't use "MyTableInputFormat", i.e. when I
> > change the call to initTableMapperJob to:
> >
> > TableMapReduceUtil.initTableMapperJob(tableName, scan,
> > RowCounterMapper.class,
> > ImmutableBytesWritable.class, Result.class, job);   // -->
> > works fine without MyTableInputFormat
> >
> > That's why I asked if you see any problem in the code, because maybe I
> > forgot to override some method or something is missing.
> >
> > Best,
> > Iván.
> >
> >
> > - Original Message -
> > > From: "Ted Yu" 
> > > To: user@hbase.apache.org
> > > Sent: Tuesday, April 19, 2016 0:22:05
> > > Subject: Re: Processing rows in parallel with MapReduce jobs.
> > >
> > > Did you see the "Message to log?" log ?
> > >
> > > Can you pastebin the error / exception you got ?
> > >
> > > On Mon, Apr 18, 2016 at 1:54 AM, Ivan Cores gonzalez <
> > ivan.co...@inria.fr>
> > > wrote:
> > >
> > > >
> > > >
> > > > Hi Ted,
> > > > So, if I understand the behaviour of getSplits(), I can create
> > > > "virtual" splits by overriding the getSplits function.
> > > > I was performing some tests, but my code crashes at runtime and I
> > > > cannot find the problem.
> > > > Any help? I didn't find examples.
> > > >
> > > >
> > > > public class SimpleRowCounter extends Configured implements Tool {
> > > >
> > > >   static class RowCounterMapper extends
> > > > TableMapper<ImmutableBytesWritable, Result> {
> > > > public static enum Counters { ROWS }
> > > > @Override
> > > > public void map(ImmutableBytesWritable row, Result value, Context
> > > > context) {
> > > >   context.getCounter(Counters.ROWS).increment(1);
> > > > try {
> > > > Thread.sleep(3000); //Simulates work
> > > > } catch (InterruptedException name) { }
> > > > }
> > > >   }
> > > >
> > > >   public class MyTableInputFormat extends TableInputFormat {
> > > > @Override
> > > > public List<InputSplit> getSplits(JobContext context) throws
> > > > IOException {
> > > > //Just to detect if this method is being called ...
> > > > List<InputSplit> splits = super.getSplits(context);
> > > > System.out.printf("Message to log? \n" );
> > > > return splits;
> > > > }
> > > >   }
> > > >
> > > >   @Override
> > > >   public int run(String[] args) throws Exception {
> > > > if (args.length != 1) {
> > > >   System.err.println("Usage: SimpleRowCounter ");
> > > >   return -1;
> > > > }
> > > > String tableName = args[0];
> > > >
> > > > Scan scan = new Scan();
> > > > scan.setFilter(new FirstKeyOnlyFilter());
> > > > scan.setCaching(500);
> > > > scan.setCacheBlocks(false);
> > > >
> > > > Job job = new Job(getConf(), getClass().getSimpleName());
> > > > job.setJarByClass(getClass());
> > > >
> > > > TableMapReduceUtil.initTableMapperJob(tableName, scan,
> > > > RowCounterMapper.class,
> > > > ImmutableBytesWritable.class, Result.class, job, true,
> > > > MyTableInputFormat.class);
> > > >
> > > > job.setNumReduceTasks(0);
> > > > job.setOutputFormatClass(NullOutputFormat.class);
> > > > return job.waitForCompletion(true) ? 0 : 1;
> > > >   }
> > > >
> > > >   public static void main(String[] args) throws Exception {
> > > > int exitCode = ToolRunner.run(HBaseConfiguration.create(),
> > > > new SimpleRowCounter(), args);
> > > > System.exit(exitCode);
> > > >   }
> > > > }
> > > >
> > > > Thanks so much,
> > > > Iván.
> > > >
> > > >
> > > >
> > > >
> > > > - Original Message -
> > > > > From: "Ted Yu" 
> > > > > To: user@hbase.apache.org
> > > > > Sent: Tuesday, April 12, 2016 17:29:52
> > > > > Subject: Re: Processing rows in parallel with MapReduce jobs.
> > > > >
> > > > > Please take a look at TableInputFormatBase#getSplits() :
> > > > >
> > > > >* Calculates the splits that will serve as input for the map
> > tasks.
> > > > The
> > > > >
> > > > >* number of splits matches the number of regions in a table.
> > > > >
> > > > > Each mapper would be reading one of the regions.
> > > > >
> > > > > On Tue, Apr 12, 2016 at 8:18 AM, Ivan Cores gonzalez <
> > > > 

RE: Rows per second for RegionScanner

2016-04-21 Thread Thakrar, Jayesh
Just curious - have you set the scanner caching to some high value - say 1000 
(or even higher in your small value case)?

The parameter is hbase.client.scanner.caching

You can read up on it - https://hbase.apache.org/book.html
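
For a quick check it can be set either globally on the configuration or per
scan, e.g. (sketch):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;

public class CachingExample {
  public static void main(String[] args) {
    // globally, for every scan created against this configuration
    Configuration conf = HBaseConfiguration.create();
    conf.setInt("hbase.client.scanner.caching", 1000);
    // or per scan
    Scan scan = new Scan();
    scan.setCaching(1000);
    System.out.println("caching=" + scan.getCaching());
  }
}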

Another thing, are you just looking for pure scan-read performance optimization?
Depending upon the table size you can also look into caching the table or not 
caching at all.

-Original Message-
From: hongbin ma [mailto:mahong...@apache.org] 
Sent: Thursday, April 21, 2016 5:04 AM
To: user@hbase.apache.org
Subject: Rows per second for RegionScanner

​Hi, experts,

I'm trying to figure out how fast hbase can scan. I'm setting up the RegionScan 
in an endpoint coprocessor so that no network overhead will be included. My 
average key length is 35 and average value length is 5.

My test result is that if I warm all my interested blocks in the block cache, 
I'm only able to scan around 300,000 rows per second per region (with endpoint 
I guess it's one thread per region), so it's like getting 15M data per second. 
I'm not sure if this is already an acceptable number for HBase. The answers 
from you experts might help me to decide if it's worth to further dig into 
tuning it.

thanks!






other info:

My hbase cluster is on 8 AWS m1.xlarge instances, with 4 CPU cores and 16G RAM. 
Each region server is configured 10G heap size. The test HTable has 23 regions, 
one hfile per region (just major compacted). There's no other resource 
contention when I ran the tests.

Attached is the HFile output of one of the region hfile:
=
 hbase  org.apache.hadoop.hbase.io.hfile.HFile -m -s -v -f
/apps/hbase/data/data/default/KYLIN_YMSGYYXO12/d42b9faf43eafcc9640aa256143d5be3/F1/30b8a8ff5a82458481846e364974bf06
2016-04-21 09:16:04,091 INFO  [main] Configuration.deprecation:
hadoop.native.lib is deprecated. Instead, use io.native.lib.available
2016-04-21 09:16:04,292 INFO  [main] util.ChecksumType: Checksum using
org.apache.hadoop.util.PureJavaCrc32
2016-04-21 09:16:04,294 INFO  [main] util.ChecksumType: Checksum can use 
org.apache.hadoop.util.PureJavaCrc32C
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
[jar:file:/usr/hdp/2.2.9.0-3393/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/usr/hdp/2.2.9.0-3393/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
2016-04-21 09:16:05,654 INFO  [main] Configuration.deprecation:
fs.default.name is deprecated. Instead, use fs.defaultFS Scanning ->
/apps/hbase/data/data/default/KYLIN_YMSGYYXO12/d42b9faf43eafcc9640aa256143d5be3/F1/30b8a8ff5a82458481846e364974bf06
Block index size as per heapsize: 3640
reader=/apps/hbase/data/data/default/KYLIN_YMSGYYXO12/d42b9faf43eafcc9640aa256143d5be3/F1/30b8a8ff5a82458481846e364974bf06,
compression=none,
cacheConf=CacheConfig:disabled,

firstKey=\x00\x0B\x00\x00\x00\x00\x00\x00\x00\x09\x00\x00\x00\x00\x00\x01\xF4/F1:M/0/Put,

lastKey=\x00\x0B\x00\x00\x00\x00\x00\x00\x00\x1F\x06-?\x0F"U\x00\x00\x03[^\xD9/F1:M/0/Put,
avgKeyLen=35,
avgValueLen=5,
entries=160988965,
length=1832309188
Trailer:
fileinfoOffset=1832308623,
loadOnOpenDataOffset=1832306641,
dataIndexCount=43,
metaIndexCount=0,
totalUncomressedBytes=1831809883,
entryCount=160988965,
compressionCodec=NONE,
uncompressedDataIndexSize=5558733,
numDataIndexLevels=2,
firstDataBlockOffset=0,
lastDataBlockOffset=1832250057,
comparatorClassName=org.apache.hadoop.hbase.KeyValue$KeyComparator,
majorVersion=2,
minorVersion=3
Fileinfo:
DATA_BLOCK_ENCODING = FAST_DIFF
DELETE_FAMILY_COUNT = \x00\x00\x00\x00\x00\x00\x00\x00
EARLIEST_PUT_TS = \x00\x00\x00\x00\x00\x00\x00\x00
MAJOR_COMPACTION_KEY = \xFF
MAX_SEQ_ID_KEY = 4
TIMERANGE = 00
hfile.AVG_KEY_LEN = 35
hfile.AVG_VALUE_LEN = 5
hfile.LASTKEY =
\x00\x16\x00\x0B\x00\x00\x00\x00\x00\x00\x00\x1F\x06-?\x0F"U\x00\x00\x03[^\xD9\x02F1M\x00\x00\x00\x00\x00\x00\x00\x00\x04
Mid-key:
\x00\x12\x00\x0B\x00\x00\x00\x00\x00\x00\x00\x1D\x04_\x07\x89\x00\x00\x02l\x00\x7F\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\x00\x00\x00\x007|\xBE$\x00\x00;\x81
Bloom filter:
Not present
Delete Family Bloom filter:
Not present
Stats:
   Key length:
   min = 32.00
   max = 37.00
  mean = 35.11
stddev = 1.46
median = 35.00
  75% <= 37.00
  95% <= 37.00
  98% <= 37.00
  99% <= 37.00
99.9% <= 37.00
 count = 160988965
   Row size (bytes):
   min = 44.00
   max = 55.00
  mean = 48.17
stddev = 1.43
median = 48.00
  75% <= 50.00
  95% <= 50.00
  98% <= 50.00
  99% <= 50.00
  

Re: ERROR [main] client.ConnectionManager$HConnectionImplementation: The node /hbase is not in ZooKeeper.

2016-04-21 Thread Ted Yu
Since the topic has shifted to making the ZooKeeper quorum work, I suggest
sending the question to user@zookeeper

Thanks

> On Apr 21, 2016, at 3:38 AM, Eric Gao  wrote:
> 
> Thanks a lot!
> I deleted the /tmp/zookeeper line and made sure there is only one 
> dataDir=/opt/zookeeper/data.
> And I have changed the zoo.cfg:
> server.1=192.168.1.2:2887:3887
> server.2=192.168.1.3:2888:3888
> server.3=192.168.1.4:2889:3889
> 
> Nothing is running on the IP:port.
> 
> But I found the myid files' contents had changed to 0, 1 and 2, so I changed 
> them back to 1, 2 and 3.
> 
> Then the zookeepers were restarted, and here is the zookeeper.out:
> 
> master's zookeeper.out:
> 2016-04-21 11:30:06,645 [myid:] - INFO  [main:QuorumPeerConfig@103] - Reading 
> configuration from: /opt/zookeeper/bin/../conf/zoo.cfg
> 2016-04-21 11:30:06,724 [myid:] - INFO  [main:QuorumPeer$QuorumServer@149] - 
> Resolved hostname: 192.168.1.4 to address: /192.168.1.4
> 2016-04-21 11:30:06,725 [myid:] - INFO  [main:QuorumPeer$QuorumServer@149] - 
> Resolved hostname: 192.168.1.3 to address: /192.168.1.3
> 2016-04-21 11:30:06,726 [myid:] - INFO  [main:QuorumPeer$QuorumServer@149] - 
> Resolved hostname: 192.168.1.2 to address: /192.168.1.2
> 2016-04-21 11:30:06,727 [myid:] - INFO  [main:QuorumPeerConfig@331] - 
> Defaulting to majority quorums
> 2016-04-21 11:30:06,737 [myid:1] - INFO  [main:DatadirCleanupManager@78] - 
> autopurge.snapRetainCount set to 3
> 2016-04-21 11:30:06,738 [myid:1] - INFO  [main:DatadirCleanupManager@79] - 
> autopurge.purgeInterval set to 0
> 2016-04-21 11:30:06,738 [myid:1] - INFO  [main:DatadirCleanupManager@101] - 
> Purge task is not scheduled.
> 2016-04-21 11:30:06,765 [myid:1] - INFO  [main:QuorumPeerMain@127] - Starting 
> quorum peer
> 2016-04-21 11:30:06,794 [myid:1] - INFO  [main:NIOServerCnxnFactory@89] - 
> binding to port 0.0.0.0/0.0.0.0:2181
> 2016-04-21 11:30:06,815 [myid:1] - INFO  [main:QuorumPeer@1019] - tickTime 
> set to 2000
> 2016-04-21 11:30:06,815 [myid:1] - INFO  [main:QuorumPeer@1039] - 
> minSessionTimeout set to -1
> 2016-04-21 11:30:06,816 [myid:1] - INFO  [main:QuorumPeer@1050] - 
> maxSessionTimeout set to -1
> 2016-04-21 11:30:06,816 [myid:1] - INFO  [main:QuorumPeer@1065] - initLimit 
> set to 10
> 2016-04-21 11:30:06,851 [myid:1] - INFO  [main:FileSnap@83] - Reading 
> snapshot /opt/zookeeper/data/version-2/snapshot.2
> 2016-04-21 11:30:06,882 [myid:1] - INFO  
> [ListenerThread:QuorumCnxManager$Listener@534] - My election bind port: 
> /192.168.1.2:3887
> 2016-04-21 11:30:06,989 [myid:1] - INFO  
> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumPeer@774] - LOOKING
> 00
> 0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 1 (n.sid), 0x2 (n.peerEpoch) 
> LOOKING (my state)
> 2016-04-21 11:30:07,020 [myid:1] - INFO  
> [WorkerSender[myid=1]:QuorumCnxManager@199] - Have smaller server identifier, 
> so dropping the connection: (2, 1)
> 2016-04-21 11:30:07,024 [myid:1] - INFO  
> [WorkerSender[myid=1]:QuorumCnxManager@199] - Have smaller server identifier, 
> so dropping the connection: (3, 1)
> 2016-04-21 11:30:07,035 [myid:1] - INFO  
> [/192.168.1.2:3887:QuorumCnxManager$Listener@541] - Received connection 
> request /192.168.1.3:56474
> d), 0x1 (n.round), FOLLOWING (n.state), 2 (n.sid), 0x2 (n.peerEpoch) LOOKING 
> (my state)
> d), 0x1 (n.round), FOLLOWING (n.state), 2 (n.sid), 0x2 (n.peerEpoch) LOOKING 
> (my state)
> 2016-04-21 11:30:07,043 [myid:1] - INFO  
> [/192.168.1.2:3887:QuorumCnxManager$Listener@541] - Received connection 
> request /192.168.1.4:46486
> d), 0x1 (n.round), LEADING (n.state), 3 (n.sid), 0x2 (n.peerEpoch) LOOKING 
> (my state)
> 2016-04-21 11:30:07,050 [myid:1] - INFO  
> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumPeer@844] - FOLLOWING
> 2016-04-21 11:30:07,058 [myid:1] - INFO  
> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Learner@86] - TCP NoDelay set to: 
> true
> 02/06/2016 03:18 GMT
> 2016-04-21 11:30:07,071 [myid:1] - INFO  
> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Environment@100] - Server 
> environment:host.name=master
> 2016-04-21 11:30:07,071 [myid:1] - INFO  
> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Environment@100] - Server 
> environment:java.version=1.7.0_75
> 2016-04-21 11:30:07,072 [myid:1] - INFO  
> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Environment@100] - Server 
> environment:java.vendor=Oracle Corporation
> njdk-1.7.0.75-2.5.4.2.el7_0.x86_64/jre
> lib
> lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
> 2016-04-21 11:30:07,072 [myid:1] - INFO  
> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Environment@100] - Server 
> environment:java.io.tmpdir=/tmp
> 2016-04-21 11:30:07,073 [myid:1] - INFO  
> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Environment@100] - Server 
> environment:java.compiler=
> 2016-04-21 11:30:07,073 [myid:1] - INFO  
> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Environment@100] - Server 
> environment:os.name=Linux
> 2016-04-21 11:30:07,073 [myid:1] - INFO  
> 

Rows per second for RegionScanner

2016-04-21 Thread hongbin ma
​Hi, experts,

I'm trying to figure out how fast hbase can scan. I'm setting up the
RegionScan in an endpoint coprocessor so that no network overhead will be
included. My average key length is 35 and average value length is 5.
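
The region-side loop I'm timing is essentially of this shape (a simplified
sketch, not the exact endpoint code; env is the RegionCoprocessorEnvironment
the coprocessor is given):

// counts rows of the region the endpoint is running in
private long scanRegion(RegionCoprocessorEnvironment env) throws IOException {
  Scan scan = new Scan();
  scan.setCaching(1024);
  scan.setMaxResultSize(5 * 1024 * 1024);
  RegionScanner scanner = env.getRegion().getScanner(scan);
  List<Cell> cells = new ArrayList<Cell>();
  long rows = 0;
  boolean hasMore;
  do {
    hasMore = scanner.next(cells);   // one row's cells per call
    if (!cells.isEmpty()) {
      rows++;
    }
    cells.clear();
  } while (hasMore);
  scanner.close();
  return rows;
}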

My test result is that if I warm all my interested blocks in the block
cache, I'm only able to scan around 300,000 rows per second per region
(with endpoint I guess it's one thread per region), so it's like getting 15M
data per second. I'm not sure if this is already an acceptable number for
HBase. The answers from you experts might help me to decide if it's worth
to further dig into tuning it.

thanks!






other info:

My hbase cluster is on 8 AWS m1.xlarge instances, with 4 CPU cores and 16G
RAM. Each region server is configured 10G heap size. The test HTable has 23
regions, one hfile per region (just major compacted). There's no other
resource contention when I ran the tests.

Attached is the HFile output of one of the region hfile:
=
 hbase  org.apache.hadoop.hbase.io.hfile.HFile -m -s -v -f
/apps/hbase/data/data/default/KYLIN_YMSGYYXO12/d42b9faf43eafcc9640aa256143d5be3/F1/30b8a8ff5a82458481846e364974bf06
2016-04-21 09:16:04,091 INFO  [main] Configuration.deprecation:
hadoop.native.lib is deprecated. Instead, use io.native.lib.available
2016-04-21 09:16:04,292 INFO  [main] util.ChecksumType: Checksum using
org.apache.hadoop.util.PureJavaCrc32
2016-04-21 09:16:04,294 INFO  [main] util.ChecksumType: Checksum can use
org.apache.hadoop.util.PureJavaCrc32C
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
[jar:file:/usr/hdp/2.2.9.0-3393/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/usr/hdp/2.2.9.0-3393/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
explanation.
2016-04-21 09:16:05,654 INFO  [main] Configuration.deprecation:
fs.default.name is deprecated. Instead, use fs.defaultFS
Scanning ->
/apps/hbase/data/data/default/KYLIN_YMSGYYXO12/d42b9faf43eafcc9640aa256143d5be3/F1/30b8a8ff5a82458481846e364974bf06
Block index size as per heapsize: 3640
reader=/apps/hbase/data/data/default/KYLIN_YMSGYYXO12/d42b9faf43eafcc9640aa256143d5be3/F1/30b8a8ff5a82458481846e364974bf06,
compression=none,
cacheConf=CacheConfig:disabled,

firstKey=\x00\x0B\x00\x00\x00\x00\x00\x00\x00\x09\x00\x00\x00\x00\x00\x01\xF4/F1:M/0/Put,

lastKey=\x00\x0B\x00\x00\x00\x00\x00\x00\x00\x1F\x06-?\x0F"U\x00\x00\x03[^\xD9/F1:M/0/Put,
avgKeyLen=35,
avgValueLen=5,
entries=160988965,
length=1832309188
Trailer:
fileinfoOffset=1832308623,
loadOnOpenDataOffset=1832306641,
dataIndexCount=43,
metaIndexCount=0,
totalUncomressedBytes=1831809883,
entryCount=160988965,
compressionCodec=NONE,
uncompressedDataIndexSize=5558733,
numDataIndexLevels=2,
firstDataBlockOffset=0,
lastDataBlockOffset=1832250057,
comparatorClassName=org.apache.hadoop.hbase.KeyValue$KeyComparator,
majorVersion=2,
minorVersion=3
Fileinfo:
DATA_BLOCK_ENCODING = FAST_DIFF
DELETE_FAMILY_COUNT = \x00\x00\x00\x00\x00\x00\x00\x00
EARLIEST_PUT_TS = \x00\x00\x00\x00\x00\x00\x00\x00
MAJOR_COMPACTION_KEY = \xFF
MAX_SEQ_ID_KEY = 4
TIMERANGE = 00
hfile.AVG_KEY_LEN = 35
hfile.AVG_VALUE_LEN = 5
hfile.LASTKEY =
\x00\x16\x00\x0B\x00\x00\x00\x00\x00\x00\x00\x1F\x06-?\x0F"U\x00\x00\x03[^\xD9\x02F1M\x00\x00\x00\x00\x00\x00\x00\x00\x04
Mid-key:
\x00\x12\x00\x0B\x00\x00\x00\x00\x00\x00\x00\x1D\x04_\x07\x89\x00\x00\x02l\x00\x7F\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\x00\x00\x00\x007|\xBE$\x00\x00;\x81
Bloom filter:
Not present
Delete Family Bloom filter:
Not present
Stats:
   Key length:
   min = 32.00
   max = 37.00
  mean = 35.11
stddev = 1.46
median = 35.00
  75% <= 37.00
  95% <= 37.00
  98% <= 37.00
  99% <= 37.00
99.9% <= 37.00
 count = 160988965
   Row size (bytes):
   min = 44.00
   max = 55.00
  mean = 48.17
stddev = 1.43
median = 48.00
  75% <= 50.00
  95% <= 50.00
  98% <= 50.00
  99% <= 50.00
99.9% <= 51.97
 count = 160988965
   Row size (columns):
   min = 1.00
   max = 1.00
  mean = 1.00
stddev = 0.00
median = 1.00
  75% <= 1.00
  95% <= 1.00
  98% <= 1.00
  99% <= 1.00
99.9% <= 1.00
 count = 160988965
   Val length:
   min = 4.00
   max = 12.00
  mean = 5.06
stddev = 0.33
median = 5.00
  75% <= 5.00
  95% <= 5.00
  98% <= 6.00