RE: How to make the client fast fail

2015-06-24 Thread Hariharan_Sethuraman
In our case (0.94.15), we used a timer to interrupt the hanging thread. 
Subsequently we were able to reconnect to HBase and it all worked fine, but we 
observed the old ZooKeeper client thread(s) still failing to connect, in 
addition to the new set of ZooKeeper client thread(s) that were serving 
responses.
So we ruled out the timer option.
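For the record, the fail-fast knobs Praneesh lists below can be lowered on the client side in hbase-site.xml. An illustrative fragment; the values are examples only, not tuned recommendations, and defaults differ by version:

```xml
<!-- hbase-site.xml (client side): example fail-fast settings.
     Lower retries/timeouts mean faster errors but less tolerance
     for transient blips. -->
<property>
  <name>hbase.client.retries.number</name>
  <value>3</value>
</property>
<property>
  <name>zookeeper.recovery.retry</name>
  <value>1</value>
</property>
<property>
  <name>zookeeper.recovery.retry.intervalmill</name>
  <value>200</value>
</property>
<property>
  <name>zookeeper.session.timeout</name>
  <value>30000</value>
</property>
<property>
  <name>hbase.rpc.timeout</name>
  <value>10000</value>
</property>
```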

Thanks,
Hari

-----Original Message-----
From: Michael Segel [mailto:michael_se...@hotmail.com]
Sent: Thursday, June 11, 2015 5:17 AM
To: user@hbase.apache.org
Subject: Re: How to make the client fast fail

Threads?

So, regardless of your Hadoop settings, if you want something faster, you can 
run a timer in one thread and the request in another. If you hit your timeout 
before you get a response, you can stop the request thread.
(YMMV depending on side effects... )
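A minimal JDK-only sketch of the timer/worker split described here. The class and method names are mine for illustration, not anything from the HBase client; and note Hari's caveat above that interrupting the caller does not clean up leaked ZooKeeper client threads:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class FastFailDemo {

    // Run a possibly-hanging call on a worker thread and give up after
    // timeoutMs, instead of blocking through the client's long retry loop.
    static String callWithDeadline(Callable<String> call, long timeoutMs)
            throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<String> f = pool.submit(call);
        try {
            return f.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            f.cancel(true); // interrupt the worker; it may leave threads behind
            return null;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // A fast call completes normally; a hanging one is cut off at 100 ms.
        System.out.println(callWithDeadline(() -> "ok", 1000));  // prints "ok"
        System.out.println(callWithDeadline(() -> {
            Thread.sleep(60_000);
            return "late";
        }, 100));                                                // prints "null"
    }
}
```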

 On Jun 10, 2015, at 12:55 AM, PRANEESH KUMAR wrote:

 Hi,

 I created the Connection object with the default configuration. If
 ZooKeeper, the HMaster, or a region server is down, the client doesn't
 fail fast; it took almost 20 minutes to throw an error.

 What is the best configuration to make the client fail fast?

 Also, what is the significance of changing the following parameters?

 hbase.client.retries.number
 zookeeper.recovery.retry
 zookeeper.session.timeout
 zookeeper.recovery.retry.intervalmill
 hbase.rpc.timeout

 Regards,
 Praneesh


HBase monitoring

2015-06-24 Thread Wojciech Indyk
hello!
I am trying to monitor the memory of my region servers. I have HBase 0.98 on CDH 5.3.1.
I can see an inconsistency between the HBase metrics (here is a metrics dump
from the HBase UI):
name : Hadoop:service=HBase,name=JvmMetrics,
modelerType : JvmMetrics,
tag.Context : jvm,
tag.ProcessName : IPC,
tag.SessionId : ,
tag.Hostname : sqhadoop04.gazeta.pl,
MemNonHeapUsedM : 93.26676,
MemNonHeapCommittedM : 94.89844,
MemNonHeapMaxM : -9.536743E-7,
MemHeapUsedM : 8513.122,
MemHeapCommittedM : 20330.25,
MemHeapMaxM : 20330.25,
MemMaxM : 20330.25,
and system top monitoring (all of my hbase processes on the regionserver):
PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
19700 hbase 20   0 22.3g  16g  23m S  1.3  8.7 479:47.53 java
19874 hbase 20   0  107m 2404  556 S  0.7  0.0   9:16.13 hbase.sh
19873 hbase 20   0  105m  908  556 S  0.0  0.0   0:00.00 hbase.sh

As we can see, top reports 16 GB while HBase reports ~8.5 GB. Why are these
values different?
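For anyone comparing these numbers: MemHeapUsedM is used heap only, while top's RES covers the whole process (committed heap pages that have been touched, non-heap areas, direct buffers, thread stacks), so RES is normally well above used heap. A JDK-only sketch of the java.lang.management values these metrics correspond to (the mapping to JvmMetrics names is my reading, so verify against your version):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class HeapVsRss {
    public static void main(String[] args) {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = mem.getHeapMemoryUsage();       // ~ MemHeapUsedM / MemHeapCommittedM
        MemoryUsage nonHeap = mem.getNonHeapMemoryUsage(); // ~ MemNonHeapUsedM

        // "used" counts live + not-yet-collected objects inside the heap;
        // "committed" is memory the OS has actually granted the JVM, which is
        // what top's RES approaches as the heap pages get touched.
        System.out.printf("heap: used=%dM committed=%dM max=%dM%n",
                heap.getUsed() >> 20, heap.getCommitted() >> 20, heap.getMax() >> 20);
        System.out.printf("non-heap: used=%dM committed=%dM%n",
                nonHeap.getUsed() >> 20, nonHeap.getCommitted() >> 20);
    }
}
```

In the dump above, RES (16 GB) sits between used heap (8.5 GB, roughly the live data) and committed heap (~20 GB, the pages the JVM has reserved and gradually touches), which is the expected relationship rather than an inconsistency.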

Kindly regards
Wojciech Indyk


Re: [VOTE] First release candidate for HBase 1.1.1 (RC0) is available

2015-06-24 Thread Ted Yu
+1

Ran test suite against Java 1.8.0_45
Checked signature
Practiced basic shell commands

On Tue, Jun 23, 2015 at 4:25 PM, Nick Dimiduk ndimi...@apache.org wrote:

 I'm happy to announce the first release candidate of HBase 1.1.1
 (HBase-1.1.1RC0) is available for download at
 https://dist.apache.org/repos/dist/dev/hbase/hbase-1.1.1RC0/

 Maven artifacts are also available in the staging repository
 https://repository.apache.org/content/repositories/orgapachehbase-1087/

 Artifacts are signed with my code signing subkey 0xAD9039071C3489BD,
 available in the Apache keys directory
 https://people.apache.org/keys/committer/ndimiduk.asc

 There's also a signed tag for this release at

 https://git-wip-us.apache.org/repos/asf?p=hbase.git;a=tag;h=af1934d826cab80f727e9a95c5b564f04da73259

 HBase 1.1.1 is the first patch release in the HBase 1.1 line, continuing on
 the theme of bringing a stable, reliable database to the Hadoop and NoSQL
 communities. This release includes over 100 bug fixes since the 1.1.0
 release, including an assignment manager bug that can lead to data loss in
 rare cases. Users of 1.1.0 are strongly encouraged to update to 1.1.1 as
 soon as possible.

 The full list of issues can be found at

 https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310753version=12332169

 Please try out this candidate and vote +/-1 by midnight Pacific time on
 Sunday, 2015-06-28 as to whether we should release these artifacts as HBase
 1.1.1.

 Thanks,
 Nick



Re: [VOTE] First release candidate for HBase 1.1.1 (RC0) is available

2015-06-24 Thread Jean-Marc Spaggiari
Checked signature for both bin and src = Passed.
Checked archive, documentation, .TXT files, etc. = Passed.
Ran it locally, tried create, drop, disable, alter, flush, put, scan =
Passed
Deployed on 9 nodes over a 0.98 /hbase folder, with JDK8 and Hadoop 2.7.0 =
Passed. Yes, I'm able to read my tables; I have not had a chance to run MR on
that yet.

However...

I'm unable to get a clean test run. I tried on 4 different servers with 2
JDKs; all failed.

With 1.7.0_45:
Tests in error:
  TestClockSkewDetection.testClockSkewDetection:110 » NoSuchMethod
java.util.con...
  TestProcedureManager.setupBeforeClass:53 » IO Shutting down
Tests run: 923, Failures: 0, Errors: 2, Skipped: 5

All failed with the exact same error.

With 1.8.0_45:
Failed tests:

TestFromClientSideWithCoprocessorTestFromClientSide.testCheckAndDeleteWithCompareOp:5031
expected:false but was:true
  TestMultiParallel.testActiveThreadsCount:160 expected:5 but was:4

TestReplicationEndpoint.testReplicationEndpointReturnsFalseOnReplicate:145
Waiting timed out after [60,000] msec

Tests in error:

TestSnapshotCloneIndependence.testOfflineSnapshotDeleteIndependent:177-runTestSnapshotDeleteIndependent:424
» RetriesExhausted
  TestTableLockManager.testReapAllTableLocks:283 » LockTimeout Timed out
acquiri...
Tests run: 2633, Failures: 3, Errors: 2, Skipped: 20

Failed tests:
  TestFromClientSide.testCheckAndDeleteWithCompareOp:5031 expected:false
but was:true
Tests run: 2637, Failures: 1, Errors: 0, Skipped: 20


[INFO] HBase - Server  FAILURE
[1:58:05.610s]



Also, I tried to run IntegrationTestBigLinkedList and it fails:
2015-06-24 19:06:11,644 ERROR [main]
test.IntegrationTestBigLinkedList$Verify: Expected referenced count does
not match with actual referenced count. expected referenced=100
,actual=0


Lastly, I ran IntegrationTestLoadAndVerify, but I have no idea how to
interpret the result ;)
org.apache.hadoop.hbase.test.IntegrationTestLoadAndVerify$Counters
REFERENCES_WRITTEN=1980
ROWS_WRITTEN=2000

org.apache.hadoop.hbase.test.IntegrationTestLoadAndVerify$Counters
REFERENCES_CHECKED=1036925998
ROWS_WRITTEN=0


So: it seems to be working on my cluster, but I have not been able to get a
fully successful test run. Therefore I'm a bit reluctant to say +1 and will
only say +/-0.

For perf tests, I still need to do some more work on my clusters... so not
for this release.

JM

2015-06-24 16:25 GMT-04:00 Ted Yu yuzhih...@gmail.com:

 +1

 Ran test suite against Java 1.8.0_45
 Checked signature
 Practiced basic shell commands

 On Tue, Jun 23, 2015 at 4:25 PM, Nick Dimiduk ndimi...@apache.org wrote:

  I'm happy to announce the first release candidate of HBase 1.1.1
  (HBase-1.1.1RC0) is available for download at
  https://dist.apache.org/repos/dist/dev/hbase/hbase-1.1.1RC0/
 
  Maven artifacts are also available in the staging repository
  https://repository.apache.org/content/repositories/orgapachehbase-1087/
 
  Artifacts are signed with my code signing subkey 0xAD9039071C3489BD,
  available in the Apache keys directory
  https://people.apache.org/keys/committer/ndimiduk.asc
 
  There's also a signed tag for this release at
 
 
 https://git-wip-us.apache.org/repos/asf?p=hbase.git;a=tag;h=af1934d826cab80f727e9a95c5b564f04da73259
 
  HBase 1.1.1 is the first patch release in the HBase 1.1 line, continuing
 on
  the theme of bringing a stable, reliable database to the Hadoop and NoSQL
  communities. This release includes over 100 bug fixes since the 1.1.0
  release, including an assignment manager bug that can lead to data loss
 in
  rare cases. Users of 1.1.0 are strongly encouraged to update to 1.1.1 as
  soon as possible.
 
  The full list of issues can be found at
 
 
 https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310753version=12332169
 
  Please try out this candidate and vote +/-1 by midnight Pacific time on
  Sunday, 2015-06-28 as to whether we should release these artifacts as
 HBase
  1.1.1.
 
  Thanks,
  Nick
 



Re: [VOTE] First release candidate for HBase 1.1.1 (RC0) is available

2015-06-24 Thread Enis Söztutar

 Also, I tried to run IntegrationTestBigLinkedList and it fails:
 2015-06-24 19:06:11,644 ERROR [main]
 test.IntegrationTestBigLinkedList$Verify: Expected referenced count does
 not match with actual referenced count. expected referenced=100
 ,actual=0


What are the command line arguments passed? Verify cannot find any
references?




 And last I ran IntegrationTestLoadAndVerify but I have no idea how to
 interpret the result ;)
 org.apache.hadoop.hbase.test.IntegrationTestLoadAndVerify$Counters
 REFERENCES_WRITTEN=1980
 ROWS_WRITTEN=2000

 org.apache.hadoop.hbase.test.IntegrationTestLoadAndVerify$Counters
 REFERENCES_CHECKED=1036925998
 ROWS_WRITTEN=0


This is a bit fishy. Again, what are the parameters passed? Did you run
with a clean cluster state?

For these two tests, I think there are at least 3 or so bugs already fixed,
in theory. Our tests and my 1.2B-row tests on a previous branch-1.1 code base
were ok.







Re: [VOTE] First release candidate for HBase 1.1.1 (RC0) is available

2015-06-24 Thread Nick Dimiduk
Here's my review of this RC:

- verified tarballs vs public key in p.a.o/keys/committers/ndimiduk.asc.
- extracted both tgz, structure looks good.
- examined book, pdf.
- ran Stack's hbase-downstreamer vs. the maven repo. tests pass.
- verified build of src tgz against hadoop versions (2.2.0/minikdc=2.3.0,
2.3.0, 2.4.0, 2.4.1, 2.5.0, 2.5.1, 2.5.2, 2.6.0, 2.7.0), with both
openjdk-1.7.0_79.jdk and openjdk-1.8.0_45.jdk.
- on 5-node cluster, verified rolling upgrade from
hadoop-2.4.0/hbase-0.98.0 while concurrently running LoadTestTool with LZ4
compression (0.98.0 client). No issues, logs look good.
- poked around with the shell on the same: list, status, snapshot, compact,
drop, clone, delete_snapshot, drop. no issues, logs look good.
- inspected compatibility report vs. 1.1.0 [0]. Looks good to me; a single
low-severity issue which I understand to be benign.

+1

[0]: http://people.apache.org/~ndimiduk/1.1.0_1.1.1RC0_compat_report.html


On Tue, Jun 23, 2015 at 4:25 PM, Nick Dimiduk ndimi...@apache.org wrote:

 I'm happy to announce the first release candidate of HBase 1.1.1
 (HBase-1.1.1RC0) is available for download at
 https://dist.apache.org/repos/dist/dev/hbase/hbase-1.1.1RC0/

 Maven artifacts are also available in the staging repository
 https://repository.apache.org/content/repositories/orgapachehbase-1087/

 Artifacts are signed with my code signing subkey 0xAD9039071C3489BD,
 available in the Apache keys directory
 https://people.apache.org/keys/committer/ndimiduk.asc

 There's also a signed tag for this release at
 https://git-wip-us.apache.org/repos/asf?p=hbase.git;a=tag;h=af1934d826cab80f727e9a95c5b564f04da73259

 HBase 1.1.1 is the first patch release in the HBase 1.1 line, continuing
 on the theme of bringing a stable, reliable database to the Hadoop and
 NoSQL communities. This release includes over 100 bug fixes since the 1.1.0
 release, including an assignment manager bug that can lead to data loss in
 rare cases. Users of 1.1.0 are strongly encouraged to update to 1.1.1 as
 soon as possible.

 The full list of issues can be found at
 https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310753version=12332169

 Please try out this candidate and vote +/-1 by midnight Pacific time on
 Sunday, 2015-06-28 as to whether we should release these artifacts as HBase
 1.1.1.

 Thanks,
 Nick



Re: [VOTE] First release candidate for HBase 1.1.1 (RC0) is available

2015-06-24 Thread Enis Söztutar
Here is my official +1.

- Checked sigs, crcs
- Checked dir layout
- Built src with Hadoop-2.3+
- Run local mode, smoke tests from shell
- Run LTT on local mode
- Checked compat report that Nick put up.
- Checked tag
- Checked src tarball contents against the tag. There are two extra files:
  hbase-shaded-client/pom.xml and hbase-shaded-server/pom.xml. Not sure where
  they are coming from. Create an issue? Not important for this RC, though.

Plus, we have been running (close to) the 1.1.1 bits against our test rig
with most of the ITs, and the results have never looked better.

Enis







Visibility Labels...

2015-06-24 Thread Sonny
Hey Guys,

Is anyone using HBase's visibility labels feature in their production
environments? If so, could you share your experience?





Re: question regarding HBASE-7351

2015-06-24 Thread Arun Mishra
I am guessing that HBASE-7351 won’t work for my case, since the process won’t 
be able to read the script from disk.
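For reference, HBASE-7351 adds a periodic health-check script run by the region server; when the script reports failure often enough, the server can take itself down, which targets exactly this stuck-disk scenario (Arun's point stands, though: the script must live on media that still works, e.g. the root disk or a tmpfs). A rough sketch, assuming the Hadoop-style convention that output beginning with "ERROR" marks the node unhealthy; verify that convention, and the placeholder paths, against your version:

```shell
#!/bin/sh
# Sketch of a health-check script: flag the node as unhealthy when any
# data directory is no longer writable (e.g. a dead disk controller).
# Assumed convention (as in Hadoop's node health checker): an output line
# starting with "ERROR" marks the node unhealthy.
check_dirs() {
  for d in "$@"; do
    probe="$d/.health_probe.$$"
    if ( touch "$probe" && rm -f "$probe" ) 2>/dev/null; then
      : # directory is writable
    else
      echo "ERROR: cannot write to $d"
      return 0
    fi
  done
  echo "OK"
}

# Placeholder path: substitute your dfs.datanode.data.dir entries.
check_dirs /tmp
```

It would be wired in through hbase.node.health.script.location, plus the related .timeout, .frequency, and hbase.node.health.failure.threshold properties; those names are as I recall them from the health-check docs, so double-check them in hbase-default.xml for your release.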

Regards,
Arun

 On Jun 23, 2015, at 9:48 PM, Arun Mishra arunmis...@me.com wrote:
 
 Hello,
 
 I am using HBase CDH version 0.98.6. I am facing a problem where a disk 
 controller fails on a host and all disk operations on that host more or less 
 hang. But the region server / data node processes don't die, and at the same 
 time the ZooKeeper session stays alive, resulting in all requests to that 
 region server failing. Currently, I use the ZooKeeper client to delete the 
 corresponding znode manually to initiate the recovery process. It will take 
 some time to figure out the hardware issue and fix it; meanwhile, I am 
 looking for a way to automate the recovery process. 
 
 I came across HBASE-7351. I am wondering if anyone has used this feature, 
 or if any other option is available to kill a region server in similar 
 partial hardware failure cases. Any insight would be very helpful to me. 
 Thanks - Arun.
 
 



Re: question regarding HBASE-7351

2015-06-24 Thread Ted Yu
bq. data node processes doesn’t die

Which hadoop version are you using ?

Have you read the following section in
http://hbase.apache.org/book.html#_hbase_and_hdfs ?
HDFS takes a while to mark a node as dead. You can configure HDFS to avoid
using stale DataNodes.
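The stale-DataNode settings referenced there live in hdfs-site.xml; an illustrative fragment (the interval value is an example, not a tuned recommendation):

```xml
<!-- hdfs-site.xml: steer reads/writes away from DataNodes that have not
     heartbeated recently, instead of waiting for the full dead-node timeout. -->
<property>
  <name>dfs.namenode.avoid.read.stale.datanode</name>
  <value>true</value>
</property>
<property>
  <name>dfs.namenode.avoid.write.stale.datanode</name>
  <value>true</value>
</property>
<property>
  <!-- milliseconds without a heartbeat before a DataNode is considered stale -->
  <name>dfs.namenode.stale.datanode.interval</name>
  <value>30000</value>
</property>
```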

Cheers
