RE: How to make the client fast fail
In our case (0.94.15), we had a timer to interrupt the hanging thread. We were then able to reconnect to HBase and everything worked fine, but we observed the old ZooKeeper client thread(s) still failing to connect, in addition to the new set of ZooKeeper client thread(s) that were serving responses. So we ruled out the timer option.

Thanks,
Hari

-----Original Message-----
From: Michael Segel [mailto:michael_se...@hotmail.com]
Sent: Thursday, June 11, 2015 5:17 AM
To: user@hbase.apache.org
Subject: Re: How to make the client fast fail

Threads? Regardless of your Hadoop settings, if you want something faster, you can use one thread for a timer while the request runs in another. If you hit your timeout before you get a response, you can stop your thread. (YMMV depending on side effects...)

> On Jun 10, 2015, at 12:55 AM, PRANEESH KUMAR wrote:
>
> Hi,
>
> I have got the Connection object with the default configuration. If ZooKeeper, the HMaster, or a region server is down, the client doesn't fail fast; it took almost 20 minutes to throw an error. What is the best configuration to make the client fail fast?
>
> Also, what is the significance of changing the following parameters?
>
> hbase.client.retries.number
> zookeeper.recovery.retry
> zookeeper.session.timeout
> zookeeper.recovery.retry.intervalmill
> hbase.rpc.timeout
>
> Regards,
> Praneesh
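For readers landing on this thread: a fail-fast client configuration along the lines being discussed might look like the hbase-site.xml fragment below. The values are illustrative only, not recommendations from anyone on this thread; sensible settings depend heavily on your HBase version and workload.

```xml
<!-- Illustrative fail-fast client settings; tune for your environment. -->
<property>
  <name>hbase.client.retries.number</name>
  <value>3</value>    <!-- fewer client-level retries -->
</property>
<property>
  <name>zookeeper.recovery.retry</name>
  <value>1</value>    <!-- fewer ZK connection-recovery retries -->
</property>
<property>
  <name>zookeeper.recovery.retry.intervalmill</name>
  <value>200</value>  <!-- shorter pause between ZK retries, in ms -->
</property>
<property>
  <name>zookeeper.session.timeout</name>
  <value>30000</value> <!-- ZK session timeout, in ms -->
</property>
<property>
  <name>hbase.rpc.timeout</name>
  <value>10000</value> <!-- per-RPC timeout, in ms -->
</property>
```

Roughly, the worst-case blocking time is a product of retry counts, pause intervals, and RPC timeouts, which is why the defaults can add up to the ~20 minutes observed.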
HBase monitoring
Hello!

I'm trying to monitor the memory of my regionservers. I have HBase 0.98 on CDH 5.3.1. I can see an inconsistency between the HBase metrics (here is the metrics dump from the HBase UI):

name : Hadoop:service=HBase,name=JvmMetrics,
modelerType : JvmMetrics,
tag.Context : jvm,
tag.ProcessName : IPC,
tag.SessionId : ,
tag.Hostname : sqhadoop04.gazeta.pl,
MemNonHeapUsedM : 93.26676,
MemNonHeapCommittedM : 94.89844,
MemNonHeapMaxM : -9.536743E-7,
MemHeapUsedM : 8513.122,
MemHeapCommittedM : 20330.25,
MemHeapMaxM : 20330.25,
MemMaxM : 20330.25,

and the system top monitoring (all of my HBase processes on the regionserver):

  PID USER  PR NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
19700 hbase 20  0 22.3g  16g 23m S  1.3  8.7 479:47.53 java
19874 hbase 20  0  107m 2404 556 S  0.7  0.0   9:16.13 hbase.sh
19873 hbase 20  0  105m  908 556 S  0.0  0.0   0:00.00 hbase.sh

As we can see, there is 16GB according to top and ~8.5GB according to HBase. Why are these values different?

Kind regards,
Wojciech Indyk
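As an aside for anyone comparing these numbers themselves: MemHeapUsedM comes from the JVM's MemoryMXBean and counts only used heap, while top's RES column counts resident pages for the entire process (touched heap pages, metaspace/permgen, thread stacks, direct buffers, JVM internals). A minimal sketch, using only the JDK, that prints the same JVM-side numbers for comparison:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Prints the JVM-level numbers that back HBase's JvmMetrics
// (MemHeapUsedM / MemHeapCommittedM / MemNonHeapUsedM), so they can
// be compared with what top reports for the same process.
public class JvmMemoryCheck {
    public static void main(String[] args) {
        MemoryMXBean mx = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = mx.getHeapMemoryUsage();
        MemoryUsage nonHeap = mx.getNonHeapMemoryUsage();
        System.out.printf("heap used      = %d MB%n", heap.getUsed() >> 20);
        System.out.printf("heap committed = %d MB%n", heap.getCommitted() >> 20);
        System.out.printf("non-heap used  = %d MB%n", nonHeap.getUsed() >> 20);
        // top's RES is different again: it counts resident pages for the
        // whole process, but only pages that have actually been touched.
    }
}
```

That explains the specific numbers above: committed heap (~20 GB) is larger than RES (16 GB) because committed-but-never-touched pages are not resident, while RES is larger than used heap (~8.5 GB) because it includes not-yet-collected garbage plus off-heap allocations.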
Re: [VOTE] First release candidate for HBase 1.1.1 (RC0) is available
+1

Ran test suite against Java 1.8.0_45
Checked signature
Practiced basic shell commands

On Tue, Jun 23, 2015 at 4:25 PM, Nick Dimiduk ndimi...@apache.org wrote:

> I'm happy to announce the first release candidate of HBase 1.1.1
> (HBase-1.1.1RC0) is available for download at
> https://dist.apache.org/repos/dist/dev/hbase/hbase-1.1.1RC0/
>
> Maven artifacts are also available in the staging repository
> https://repository.apache.org/content/repositories/orgapachehbase-1087/
>
> Artifacts are signed with my code signing subkey 0xAD9039071C3489BD,
> available in the Apache keys directory
> https://people.apache.org/keys/committer/ndimiduk.asc
>
> There's also a signed tag for this release at
> https://git-wip-us.apache.org/repos/asf?p=hbase.git;a=tag;h=af1934d826cab80f727e9a95c5b564f04da73259
>
> HBase 1.1.1 is the first patch release in the HBase 1.1 line, continuing on
> the theme of bringing a stable, reliable database to the Hadoop and NoSQL
> communities. This release includes over 100 bug fixes since the 1.1.0
> release, including an assignment manager bug that can lead to data loss in
> rare cases. Users of 1.1.0 are strongly encouraged to update to 1.1.1 as
> soon as possible. The full list of issues can be found at
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310753&version=12332169
>
> Please try out this candidate and vote +/-1 by midnight Pacific time on
> Sunday, 2015-06-28 as to whether we should release these artifacts as HBase
> 1.1.1.
>
> Thanks,
> Nick
Re: [VOTE] First release candidate for HBase 1.1.1 (RC0) is available
Checked signature for both bin and src = Passed.
Checked archive, documentation, .TXT files, etc. = Passed.
Ran it locally, tried create, drop, disable, alter, flush, put, scan = Passed.
Deployed on 9 nodes with a 0.98 /hbase folder, with JDK8 and Hadoop 2.7.0 = Passed. Yes, I'm able to read my tables; haven't had a chance to run MR on that yet.

However... I'm unable to get any clean test run. Tried on 4 different servers with 2 JDKs; all failed.

With 1.7.0_45, all four runs failed with the exact same errors:

Tests in error:
  TestClockSkewDetection.testClockSkewDetection:110 » NoSuchMethod java.util.con...
  TestProcedureManager.setupBeforeClass:53 » IO Shutting down

Tests run: 923, Failures: 0, Errors: 2, Skipped: 5

With 1.8.0_45:

Failed tests:
  TestFromClientSideWithCoprocessor>TestFromClientSide.testCheckAndDeleteWithCompareOp:5031 expected:<false> but was:<true>
  TestMultiParallel.testActiveThreadsCount:160 expected:<5> but was:<4>
  TestReplicationEndpoint.testReplicationEndpointReturnsFalseOnReplicate:145 Waiting timed out after [60,000] msec
Tests in error:
  TestSnapshotCloneIndependence.testOfflineSnapshotDeleteIndependent:177->runTestSnapshotDeleteIndependent:424 » RetriesExhausted
  TestTableLockManager.testReapAllTableLocks:283 » LockTimeout Timed out acquiri...

Tests run: 2633, Failures: 3, Errors: 2, Skipped: 20

Failed tests:
  TestFromClientSide.testCheckAndDeleteWithCompareOp:5031 expected:<false> but was:<true>

Tests run: 2637, Failures: 1, Errors: 0, Skipped: 20
[INFO] HBase - Server FAILURE [1:58:05.610s]

Also, I tried to run IntegrationTestBigLinkedList and it fails:
2015-06-24 19:06:11,644 ERROR [main] test.IntegrationTestBigLinkedList$Verify: Expected referenced count does not match with actual referenced count. expected referenced=100, actual=0

And last, I ran IntegrationTestLoadAndVerify but I have no idea how to interpret the result ;)
org.apache.hadoop.hbase.test.IntegrationTestLoadAndVerify$Counters
  REFERENCES_WRITTEN=1980
  ROWS_WRITTEN=2000
org.apache.hadoop.hbase.test.IntegrationTestLoadAndVerify$Counters
  REFERENCES_CHECKED=1036925998
  ROWS_WRITTEN=0

So: it seems to be working on my cluster, but I have not been able to get any successful test run. Therefore I'm a bit reluctant to say +1 and will only say +/-0.

For perf tests, I still need some more work on my clusters... So not for this release.
JM

2015-06-24 16:25 GMT-04:00 Ted Yu yuzhih...@gmail.com:

> +1
>
> Ran test suite against Java 1.8.0_45
> Checked signature
> Practiced basic shell commands
>
> On Tue, Jun 23, 2015 at 4:25 PM, Nick Dimiduk ndimi...@apache.org wrote:
>> I'm happy to announce the first release candidate of HBase 1.1.1
>> (HBase-1.1.1RC0) is available for download at
>> https://dist.apache.org/repos/dist/dev/hbase/hbase-1.1.1RC0/
>> [...]
Re: [VOTE] First release candidate for HBase 1.1.1 (RC0) is available
> Also, I tried to run IntegrationTestBigLinkedList and it fails:
> 2015-06-24 19:06:11,644 ERROR [main] test.IntegrationTestBigLinkedList$Verify: Expected referenced count does not match with actual referenced count. expected referenced=100, actual=0

What are the command line arguments passed? Verify cannot find any references?

> And last, I ran IntegrationTestLoadAndVerify but I have no idea how to interpret the result ;)
> org.apache.hadoop.hbase.test.IntegrationTestLoadAndVerify$Counters
>   REFERENCES_WRITTEN=1980
>   ROWS_WRITTEN=2000
> org.apache.hadoop.hbase.test.IntegrationTestLoadAndVerify$Counters
>   REFERENCES_CHECKED=1036925998
>   ROWS_WRITTEN=0

This is a bit fishy. Again, what are the parameters passed? Did you run with a clean cluster state? For these two tests, I think there are at least 3 or so bugs already fixed in theory. Our tests and my 1.2B-row tests on a previous branch-1.1 code base were ok.

> So: it seems to be working on my cluster, but I have not been able to get any successful test run. Therefore I'm a bit reluctant to say +1 and will only say +/-0.
>
> For perf tests, I still need some more work on my clusters... So not for this release.
> JM
>
> 2015-06-24 16:25 GMT-04:00 Ted Yu yuzhih...@gmail.com:
>> +1
>>
>> Ran test suite against Java 1.8.0_45
>> Checked signature
>> Practiced basic shell commands
>> [...]
Re: [VOTE] First release candidate for HBase 1.1.1 (RC0) is available
Here's my review of this RC:

- verified tarballs vs public key in p.a.o/keys/committers/ndimiduk.asc.
- extracted both tgz; structure looks good.
- examined book, pdf.
- ran Stack's hbase-downstreamer vs. the maven repo. tests pass.
- verified build of src tgz against hadoop versions (2.2.0/minikdc=2.3.0, 2.3.0, 2.4.0, 2.4.1, 2.5.0, 2.5.1, 2.5.2, 2.6.0, 2.7.0), with both openjdk-1.7.0_79.jdk and openjdk-1.8.0_45.jdk.
- on a 5-node cluster, verified rolling upgrade from hadoop-2.4.0/hbase-0.98.0 while concurrently running LoadTestTool with LZ4 compression (0.98.0 client). No issues, logs look good.
- poked around with the shell on the same: list, status, snapshot, compact, drop, clone, delete_snapshot, drop. No issues, logs look good.
- inspected compatibility report vs. 1.1.0 [0]. Looks good to me; a single low-severity issue, which I understand to be benign.

+1

[0]: http://people.apache.org/~ndimiduk/1.1.0_1.1.1RC0_compat_report.html

On Tue, Jun 23, 2015 at 4:25 PM, Nick Dimiduk ndimi...@apache.org wrote:
> I'm happy to announce the first release candidate of HBase 1.1.1
> (HBase-1.1.1RC0) is available for download at
> https://dist.apache.org/repos/dist/dev/hbase/hbase-1.1.1RC0/
> [...]
Re: [VOTE] First release candidate for HBase 1.1.1 (RC0) is available
Here is my official +1.

- Checked sigs, crcs
- Checked dir layout
- Built src with Hadoop-2.3+
- Ran local mode, smoke tests from shell
- Ran LTT on local mode
- Checked compat report that Nick put up
- Checked tag
- Checked src tarball contents against tag

There are two extra files: hbase-shaded-client/pom.xml and hbase-shaded-server/pom.xml. Not sure where they are coming from. Create an issue? But not important for the RC.

Plus, we have been running (close to) 1.1.1 bits against our test rig with most of the ITs, and the results have never looked better.

Enis

On Wed, Jun 24, 2015 at 7:29 PM, Enis Söztutar enis@gmail.com wrote:

>> Also, I tried to run IntegrationTestBigLinkedList and it fails:
>> 2015-06-24 19:06:11,644 ERROR [main] test.IntegrationTestBigLinkedList$Verify: Expected referenced count does not match with actual referenced count. expected referenced=100, actual=0
>
> What are the command line arguments passed? Verify cannot find any references?
>
>> And last, I ran IntegrationTestLoadAndVerify but I have no idea how to interpret the result ;)
>
> This is a bit fishy. Again, what are the parameters passed? Did you run with a clean cluster state?
> [...]
Visibility Labels...
Hey Guys,

Is anyone using HBase's visibility labels feature in their production environments? If so, could you share your experience?
Re: question regarding HBASE-7351
I am guessing that HBASE-7351 won't work for my case, since the process won't be able to read the script from disk.

Regards,
Arun

On Jun 23, 2015, at 9:48 PM, Arun Mishra arunmis...@me.com wrote:

> Hello,
>
> I am using HBase CDH version 0.98.6. I am facing a problem where a disk controller fails on a host and all disk operations kind of hang on that host, but the region server/DataNode processes don't die, and at the same time the ZooKeeper session stays alive. The result is that all requests to that region server fail. Currently, I use the ZooKeeper client to delete the corresponding znode manually to initiate the recovery process.
>
> It will take some time to figure out the hardware issue and fix it. Meanwhile, I am looking for some way to automate the recovery process. I came across HBASE-7351. I am wondering if anyone has used this feature, or if any other option is available to kill a region server in similar partial hardware failure cases. Any insight would be very helpful to me.
>
> Thanks,
> Arun
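For context on what HBASE-7351 provides: it adds a health-check chore that periodically runs an operator-supplied script and aborts the region server after repeated failures. A rough sketch of the hbase-site.xml wiring follows; the property names are from my reading of the feature and should be verified against hbase-default.xml for your version, the script path is hypothetical, and (as noted above) a script that itself lives on the failed disk may hang rather than fail:

```xml
<!-- Sketch of HBASE-7351 health-check configuration; verify property
     names against hbase-default.xml for your HBase version. -->
<property>
  <name>hbase.node.health.script.location</name>
  <value>/opt/hbase/bin/health_check.sh</value> <!-- hypothetical path -->
</property>
<property>
  <name>hbase.node.health.script.timeout</name>
  <value>60000</value> <!-- ms before a hung script counts as a failure -->
</property>
<property>
  <name>hbase.node.health.script.frequency</name>
  <value>10000</value> <!-- ms between health checks -->
</property>
<property>
  <name>hbase.node.health.failure.threshold</name>
  <value>3</value> <!-- consecutive failures before the server aborts -->
</property>
```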
Re: question regarding HBASE-7351
bq. data node processes doesn't die

Which Hadoop version are you using?

Have you read the following section in http://hbase.apache.org/book.html#_hbase_and_hdfs ?

  HDFS takes a while to mark a node as dead. You can configure HDFS to avoid using stale DataNodes

Cheers

On Wed, Jun 24, 2015 at 10:19 AM, Arun Mishra arunmis...@me.com wrote:

> I am guessing that HBASE-7351 won't work for my case, since the process won't be able to read the script from disk.
>
> Regards,
> Arun
> [...]
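The stale-DataNode behavior referenced from the book is configured on the HDFS side. A sketch of the relevant hdfs-site.xml settings; these are standard HDFS property names, but double-check defaults and availability for your Hadoop version:

```xml
<!-- Mark DataNodes stale after missed heartbeats and steer around them. -->
<property>
  <name>dfs.namenode.stale.datanode.interval</name>
  <value>30000</value> <!-- ms without a heartbeat before a node is "stale" -->
</property>
<property>
  <name>dfs.namenode.avoid.read.stale.datanode</name>
  <value>true</value>  <!-- deprioritize stale nodes for reads -->
</property>
<property>
  <name>dfs.namenode.avoid.write.stale.datanode</name>
  <value>true</value>  <!-- avoid stale nodes in new write pipelines -->
</property>
```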