[Impala-ASF-CR] IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh
Internal Jenkins has submitted this change and it was merged. Change subject: IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh .. IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh We used to include a step in run-hbase.sh for calling a python script that queried Zookeeper to see if the HBase master was up. The original script was problematic, so we stopped using it during our mini-cluster HBase start up procedure. HBase start up issues continue to plague us, however. This patch reintroduces a Zookeeper check, with the following updates: - replace the original script with check-hbase-nodes.py - query the correct node /hbase/master, not just /hbase/rs - use the python Zookeeper library kazoo, rather than calling out to the shell and parsing the return string - since we are moving toward testing on a remote cluster, also add the capability to pass in the address for the host that provides the Zookeeper and HBase services - add an additional check that the HDFS service is running, because of an edge case where the HBase master can briefly start without a cluster running. In addition to the expected tests, this script was also tested under the conditions of IMPALA-4088, whereby the HBase RegionServer is running, but the master fails because another listening process has already taken its TCP port (60010) during startup. Change-Id: I9b81f3cfb6ea0ba7b18ce5fcd5d268f515c8b0c3 Reviewed-on: http://gerrit.cloudera.org:8080/4348 Reviewed-by: Alex Behm Tested-by: Internal Jenkins --- M infra/python/deps/requirements.txt A testdata/bin/check-hbase-nodes.py M testdata/bin/run-hbase.sh D testdata/bin/wait-for-hbase-master.py 4 files changed, 176 insertions(+), 59 deletions(-) Approvals: Internal Jenkins: Verified Alex Behm: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/4348 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I9b81f3cfb6ea0ba7b18ce5fcd5d268f515c8b0c3 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David Knupp Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Harrison Sheinblatt Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Ishaan Joshi Gerrit-Reviewer: Michael Brown
[Impala-ASF-CR] IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh
Internal Jenkins has posted comments on this change. Change subject: IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh .. Patch Set 9: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/4348 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9b81f3cfb6ea0ba7b18ce5fcd5d268f515c8b0c3 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David Knupp Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Harrison Sheinblatt Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Ishaan Joshi Gerrit-Reviewer: Michael Brown Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh
Harrison Sheinblatt has posted comments on this change. Change subject: IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh .. Patch Set 9: (1 comment) http://gerrit.cloudera.org:8080/#/c/4348/9/testdata/bin/check-hbase-nodes.py File testdata/bin/check-hbase-nodes.py: Line 135: zk_client.stop() A problem is that zk client wants you to stop() and then close(). In this case, if there is an exception or something that gets through, the close() will be called without stop(). This may work without a 'graceful shutdown' of the client, and be OK. I've done a 2-phase context manager in the past to ensure stop() is called before close(). -- To view, visit http://gerrit.cloudera.org:8080/4348 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9b81f3cfb6ea0ba7b18ce5fcd5d268f515c8b0c3 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David Knupp Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Harrison Sheinblatt Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Ishaan Joshi Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh
Alex Behm has posted comments on this change. Change subject: IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh .. Patch Set 9: Code-Review+2 Thanks for the explanation. Harrison, thanks for reviewing! -- To view, visit http://gerrit.cloudera.org:8080/4348 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9b81f3cfb6ea0ba7b18ce5fcd5d268f515c8b0c3 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David Knupp Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Harrison Sheinblatt Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Ishaan Joshi Gerrit-Reviewer: Michael Brown Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh
David Knupp has posted comments on this change. Change subject: IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh .. Patch Set 9: Harrison and I looked at that too, but the hdfs lib binds to the WebHDFS/HttpFS API, and doesn't maintain active sessions. (The hdfs client doesn't provide any kind of __exit__() or close() interface.) -- To view, visit http://gerrit.cloudera.org:8080/4348 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9b81f3cfb6ea0ba7b18ce5fcd5d268f515c8b0c3 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David Knupp Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Harrison Sheinblatt Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Ishaan Joshi Gerrit-Reviewer: Michael Brown Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh
Alex Behm has posted comments on this change. Change subject: IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh .. Patch Set 9: What about the other clients like for HDFS? -- To view, visit http://gerrit.cloudera.org:8080/4348 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9b81f3cfb6ea0ba7b18ce5fcd5d268f515c8b0c3 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David Knupp Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Harrison Sheinblatt Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Ishaan Joshi Gerrit-Reviewer: Michael Brown Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh
David Knupp has posted comments on this change. Change subject: IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh .. Patch Set 9: Harrison very astutely pointed out in an out-of-band message that I had neglected to call close on the zk_client. Latest patch wraps the client in contextlib.closing() to avoid potentially leaking file handles. -- To view, visit http://gerrit.cloudera.org:8080/4348 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9b81f3cfb6ea0ba7b18ce5fcd5d268f515c8b0c3 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David Knupp Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Ishaan Joshi Gerrit-Reviewer: Michael Brown Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh
Hello Michael Brown, Internal Jenkins, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4348 to look at the new patch set (#9). Change subject: IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh .. IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh We used to include a step in run-hbase.sh for calling a python script that queried Zookeeper to see if the HBase master was up. The original script was problematic, so we stopped using it during our mini-cluster HBase start up procedure. HBase start up issues continue to plague us, however. This patch reintroduces a Zookeeper check, with the following updates: - replace the original script with check-hbase-nodes.py - query the correct node /hbase/master, not just /hbase/rs - use the python Zookeeper library kazoo, rather than calling out to the shell and parsing the return string - since we are moving toward testing on a remote cluster, also add the capability to pass in the address for the host that provides the Zookeeper and HBase services - add an additional check that the HDFS service is running, because of an edge case where the HBase master can briefly start without a cluster running. In addition to the expected tests, this script was also tested under the conditions of IMPALA-4088, whereby the HBase RegionServer is running, but the master fails because another listening process has already taken its TCP port (60010) during startup. Change-Id: I9b81f3cfb6ea0ba7b18ce5fcd5d268f515c8b0c3 --- M infra/python/deps/requirements.txt A testdata/bin/check-hbase-nodes.py M testdata/bin/run-hbase.sh D testdata/bin/wait-for-hbase-master.py 4 files changed, 176 insertions(+), 59 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/4348/9 -- To view, visit http://gerrit.cloudera.org:8080/4348 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9b81f3cfb6ea0ba7b18ce5fcd5d268f515c8b0c3 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David Knupp Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Ishaan Joshi Gerrit-Reviewer: Michael Brown
[Impala-ASF-CR] IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh
Alex Behm has posted comments on this change. Change subject: IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/4348/8/testdata/bin/check-hbase-nodes.py File testdata/bin/check-hbase-nodes.py: Line 88: LOGGER.debug("Success: " + str(zk_client)) > That's a good catch, but I think it might be OK as-is. This is the output o Wfm. -- To view, visit http://gerrit.cloudera.org:8080/4348 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9b81f3cfb6ea0ba7b18ce5fcd5d268f515c8b0c3 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David Knupp Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Ishaan Joshi Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh
David Knupp has posted comments on this change. Change subject: IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/4348/8/testdata/bin/check-hbase-nodes.py File testdata/bin/check-hbase-nodes.py: Line 88: LOGGER.debug("Success: " + str(zk_client)) > LOGGER.info? Otherwise it looks like we're trying to connect but never succ That's a good catch, but I think it might be OK as-is. This is the output of a typical run: Contents of HDFS root: [u'hbase', u'home', u'test-warehouse', u'tmp', u'user'] Connecting to Zookeeper host(s). Waiting for HBase node: /hbase/master Success: /hbase/master Waiting for HBase node: /hbase/rs Success: /hbase/rs [etc...] The missing line would have been: Success: If the connection attempt didn't succeed, I don't think it would be a mystery -- you'd get the exception msg, and the script would exit with an error. Is that acceptable? (Honestly, I think might have intended to take that line out.) -- To view, visit http://gerrit.cloudera.org:8080/4348 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9b81f3cfb6ea0ba7b18ce5fcd5d268f515c8b0c3 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David Knupp Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Ishaan Joshi Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh
Alex Behm has posted comments on this change. Change subject: IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh .. Patch Set 8: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/4348/8/testdata/bin/check-hbase-nodes.py File testdata/bin/check-hbase-nodes.py: Line 88: LOGGER.debug("Success: " + str(zk_client)) LOGGER.info? Otherwise it looks like we're trying to connect but never succeed -- To view, visit http://gerrit.cloudera.org:8080/4348 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9b81f3cfb6ea0ba7b18ce5fcd5d268f515c8b0c3 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David Knupp Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Ishaan Joshi Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh
Internal Jenkins has posted comments on this change. Change subject: IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh .. Patch Set 8: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/4348 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9b81f3cfb6ea0ba7b18ce5fcd5d268f515c8b0c3 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David Knupp Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Ishaan Joshi Gerrit-Reviewer: Michael Brown Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh
Michael Brown has posted comments on this change. Change subject: IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh .. Patch Set 8: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/4348 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9b81f3cfb6ea0ba7b18ce5fcd5d268f515c8b0c3 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David Knupp Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Ishaan Joshi Gerrit-Reviewer: Michael Brown Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh
David Knupp has uploaded a new patch set (#8). Change subject: IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh .. IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh We used to include a step in run-hbase.sh for calling a python script that queried Zookeeper to see if the HBase master was up. The original script was problematic, so we stopped using it during our mini-cluster HBase start up procedure. HBase start up issues continue to plague us, however. This patch reintroduces a Zookeeper check, with the following updates: - replace the original script with check-hbase-nodes.py - query the correct node /hbase/master, not just /hbase/rs - use the python Zookeeper library kazoo, rather than calling out to the shell and parsing the return string - since we are moving toward testing on a remote cluster, also add the capability to pass in the address for the host that provides the Zookeeper and HBase services - add an additional check that the HDFS service is running, because of an edge case where the HBase master can briefly start without a cluster running. In addition to the expected tests, this script was also tested under the conditions of IMPALA-4088, whereby the HBase RegionServer is running, but the master fails because another listening process has already taken its TCP port (60010) during startup. Change-Id: I9b81f3cfb6ea0ba7b18ce5fcd5d268f515c8b0c3 --- M infra/python/deps/requirements.txt A testdata/bin/check-hbase-nodes.py M testdata/bin/run-hbase.sh D testdata/bin/wait-for-hbase-master.py 4 files changed, 173 insertions(+), 59 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/4348/8 -- To view, visit http://gerrit.cloudera.org:8080/4348 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9b81f3cfb6ea0ba7b18ce5fcd5d268f515c8b0c3 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David Knupp Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Ishaan Joshi Gerrit-Reviewer: Michael Brown
[Impala-ASF-CR] IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh
David Knupp has uploaded a new patch set (#7). Change subject: IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh .. IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh We used to include a step in run-hbase.sh for calling a python script that queried Zookeeper to see if the HBase master was up. The original script was problematic, so we stopped using it during our mini-cluster HBase start up procedure. HBase start up issues continue to plague us, however. This patch reintroduces a Zookeeper check, with the following updates: - replace the original script with check-hbase-nodes.py - query the correct node /hbase/master, not just /hbase/rs - use the python Zookeeper library kazoo, rather than calling out to the shell and parsing the return string - since we are moving toward testing on a remote cluster, also add the capability to pass in the address for the host that provides the Zookeeper and HBase services - add an additional check that the HDFS service is running, because of an edge case where the HBase master can briefly start without a cluster running. In addition to the expected tests, this script was also tested under the conditions of IMPALA-4088, whereby the HBase RegionServer is running, but the master fails because another listening process has already taken its TCP port (60010) during startup. Change-Id: I9b81f3cfb6ea0ba7b18ce5fcd5d268f515c8b0c3 --- M infra/python/deps/requirements.txt A testdata/bin/check-hbase-nodes.py M testdata/bin/run-hbase.sh D testdata/bin/wait-for-hbase-master.py 4 files changed, 173 insertions(+), 59 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/4348/7 -- To view, visit http://gerrit.cloudera.org:8080/4348 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9b81f3cfb6ea0ba7b18ce5fcd5d268f515c8b0c3 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David Knupp Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Ishaan Joshi Gerrit-Reviewer: Michael Brown
[Impala-ASF-CR] IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh
David Knupp has posted comments on this change. Change subject: IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh .. Patch Set 6: (2 comments) http://gerrit.cloudera.org:8080/#/c/4348/6/testdata/bin/check-hbase-nodes.py File testdata/bin/check-hbase-nodes.py: PS6, Line 59: help=('Comma-delineated string of hosts in host:PORT format, ' > When is more than one zk host needed? Don't know, but that's the way the API is written. PS6, Line 135: # Confirm that HDFS is available. There is a pathological case where the HBase master : # can start up briefly if HDFS is not available, and then quit immediately, but that can : # be long enough to give a false positive that the HBase master is running. : # : try: : hdfs_client = InsecureClient('http://' + args.hdfs_host) : hdfs_client.list('/') : except requests.exceptions.ConnectionError as e: : msg = 'Could not connect to HDFS web host http://{0} - {1}'.format(args.hdfs_host, e) : LOGGER.error(msg) : sys.exit(1) : : errors = check_znodes_list_for_errors(args.nodes, args.zookeeper_hosts, args.timeout) > I meant for this section also to be a method, with errors returned. Done -- To view, visit http://gerrit.cloudera.org:8080/4348 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9b81f3cfb6ea0ba7b18ce5fcd5d268f515c8b0c3 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David Knupp Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Ishaan Joshi Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh
Michael Brown has posted comments on this change. Change subject: IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh .. Patch Set 6: (2 comments) http://gerrit.cloudera.org:8080/#/c/4348/6/testdata/bin/check-hbase-nodes.py File testdata/bin/check-hbase-nodes.py: PS6, Line 59: help=('Comma-delineated string of hosts in host:PORT format, ' When is more than one zk host needed? PS6, Line 135: # Confirm that HDFS is available. There is a pathological case where the HBase master : # can start up briefly if HDFS is not available, and then quit immediately, but that can : # be long enough to give a false positive that the HBase master is running. : # : try: : hdfs_client = InsecureClient('http://' + args.hdfs_host) : hdfs_client.list('/') : except requests.exceptions.ConnectionError as e: : msg = 'Could not connect to HDFS web host http://{0} - {1}'.format(args.hdfs_host, e) : LOGGER.error(msg) : sys.exit(1) : : errors = check_znodes_list_for_errors(args.nodes, args.zookeeper_hosts, args.timeout) I meant for this section also to be a method, with errors returned. -- To view, visit http://gerrit.cloudera.org:8080/4348 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9b81f3cfb6ea0ba7b18ce5fcd5d268f515c8b0c3 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David Knupp Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Ishaan Joshi Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh
David Knupp has uploaded a new patch set (#6). Change subject: IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh .. IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh We used to include a step in run-hbase.sh for calling a python script that queried Zookeeper to see if the HBase master was up. The original script was problematic, so we stopped using it during our mini-cluster HBase start up procedure. HBase start up issues continue to plague us, however. This patch reintroduces a Zookeeper check, with the following updates: - replace the original script with check-hbase-nodes.py - query the correct node /hbase/master, not just /hbase/rs - use the python Zookeeper library kazoo, rather than calling out to the shell and parsing the return string - since we are moving toward testing on a remote cluster, also add the capability to pass in the address for the host that provides the Zookeeper and HBase services - add an additional check that the HDFS service is running, because of an edge case where the HBase master can briefly start without a cluster running. In addition to the expected tests, this script was also tested under the conditions of IMPALA-4088, whereby the HBase RegionServer is running, but the master fails because another listening process has already taken its TCP port (60010) during startup. Change-Id: I9b81f3cfb6ea0ba7b18ce5fcd5d268f515c8b0c3 --- M infra/python/deps/requirements.txt A testdata/bin/check-hbase-nodes.py M testdata/bin/run-hbase.sh D testdata/bin/wait-for-hbase-master.py 4 files changed, 154 insertions(+), 59 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/4348/6 -- To view, visit http://gerrit.cloudera.org:8080/4348 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9b81f3cfb6ea0ba7b18ce5fcd5d268f515c8b0c3 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David Knupp Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Ishaan Joshi Gerrit-Reviewer: Michael Brown
[Impala-ASF-CR] IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh
David Knupp has posted comments on this change. Change subject: IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh .. Patch Set 5: (8 comments) Reworked the git commit message, and fixed all the issues. I'm trying to find a good way to check the remote health of the namenode on a remote cluster. http://gerrit.cloudera.org:8080/#/c/4348/5/testdata/bin/check-hbase-nodes.py File testdata/bin/check-hbase-nodes.py: PS5, Line 50: parser.add_argument('--timeout', '-t', action='store' > I find action='store' to be redundant since that's the default. Done PS5, Line 52: 'Default is 30 seconds.')) > Don't hardcode 30. Use a format string and read from TIMEOUT_SECONDS. Done PS5, Line 56: 'e.g, 0.0.0.0:5070. Default is localhost:5070.')) > Use a format string and read from HDFS_HOST. Done PS5, Line 60: 'e.g, 0.0.0.0:2181. Default is localhost:2181.')) > Use a format string and read from ZK_HOSTS. Done PS5, Line 66: '/hbase/master and /hbase/rs.') > Use a format string. Maybe use '-n '.join(HBASE_NODES) as part of it? Done PS5, Line 127: hdfs_client.list('/') > What does this do? Throws an exception if we can't connect to HDFS. I was re-thinking this weekend that maybe this whole bit should either moved or reworked. It doesn't seem to be working on remote hosts. PS5, Line 125: try: : hdfs_client = InsecureClient('http://' + args.hdfs_host) : hdfs_client.list('/') : except requests.exceptions.ConnectionError as e: : msg = 'Could not connect to HDFS web host http://{0} - {1}'.format(args.hdfs_host, e) : LOGGER.error(msg) : sys.exit(1) : : zk_client = connect_to_zookeeper(args.zookeeper_hosts, args.timeout) : errors = sum([check_hbase_node(zk_client, node, args.timeout) for node in args.nodes]) > I suggest you make this a method. The __main__ can get the arguments, the e Done PS5, Line 135: if errors: > It would be better to have the explicit numerical > 0 comparison. Done -- To view, visit http://gerrit.cloudera.org:8080/4348 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9b81f3cfb6ea0ba7b18ce5fcd5d268f515c8b0c3 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David Knupp Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Ishaan Joshi Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes