[Impala-ASF-CR] IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh
Michael Brown has posted comments on this change.
Change subject: IMPALA-2013: Reintroduce steps for checking HBase health in
run-hbase.sh
..
Patch Set 5:
(9 comments)
Under what circumstances is more than one zookeeper needed?
http://gerrit.cloudera.org:8080/#/c/4348/5//COMMIT_MSG
Commit Message:
PS5, Line 11: The original script was problematic for a couple of reasons (most
: notably because it queried the wrong Zookeeper node) so we stopped
: using it during our mini-cluster HBase start up procedure.
Note quite right. I didn't use it because it wasn't clear it was needed
anymore, the so-called "HBase race" had been fixed. I didn't notice the master
vs. rs discrepancy at that time.
http://gerrit.cloudera.org:8080/#/c/4348/5/testdata/bin/check-hbase-nodes.py
File testdata/bin/check-hbase-nodes.py:
PS5, Line 50: parser.add_argument('--timeout', '-t', action='store'
I find action='store' to be redundant since that's the default.
PS5, Line 52: 'Default is 30 seconds.'))
Don't hardcode 30. Use a format string and read from TIMEOUT_SECONDS.
PS5, Line 56: 'e.g, 0.0.0.0:5070. Default is
localhost:5070.'))
Use a format string and read from HDFS_HOST.
PS5, Line 60: 'e.g, 0.0.0.0:2181. Default is
localhost:2181.'))
Use a format string and read from ZK_HOSTS.
PS5, Line 66: '/hbase/master and /hbase/rs.')
Use a format string. Maybe use '-n '.join(HBASE_NODES) as part of it?
PS5, Line 127: hdfs_client.list('/')
What does this do?
PS5, Line 125: try:
: hdfs_client = InsecureClient('http://' + args.hdfs_host)
: hdfs_client.list('/')
: except requests.exceptions.ConnectionError as e:
: msg = 'Could not connect to HDFS web host http://{0} -
{1}'.format(args.hdfs_host, e)
: LOGGER.error(msg)
: sys.exit(1)
:
: zk_client = connect_to_zookeeper(args.zookeeper_hosts,
args.timeout)
: errors = sum([check_hbase_node(zk_client, node,
args.timeout) for node in args.nodes])
I suggest you make this a method. The __main__ can get the arguments, the
errors, and print the error and exit accordingly. It would be better to have a
separate method to make a connection and do the query work.
PS5, Line 135: if errors:
It would be better to have the explicit numerical > 0 comparison.
--
To view, visit http://gerrit.cloudera.org:8080/4348
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: I9b81f3cfb6ea0ba7b18ce5fcd5d268f515c8b0c3
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp
Gerrit-Reviewer: David Knupp
Gerrit-Reviewer: Ishaan Joshi
Gerrit-Reviewer: Michael Brown
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh
David Knupp has uploaded a new patch set (#5). Change subject: IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh .. IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh We used to include a step in run-hbase.sh for calling a python script that queried Zookeeper to see if the HBase master was up. The original script was problematic for a couple of reasons (most notably because it queried the wrong Zookeeper node) so we stopped using it during our mini-cluster HBase start up procedure. HBase start up issues continue to plague us, however. This patch reintroduces a Zookeeper check, with the following updates: - replace the original script with check-hbase-nodes.py - query the correct node /hbase/master, not just /hbase/rs - use the python Zookeeper library kazoo, rather than calling out to the shell and parsing the return string - since we are moving toward testing on a remote cluster, also add the capability to pass in the address for the host that provides the Zookeeper and HBase services - add an additional check that the HDFS service is running, because of an edge case where the HBase master can briefly start without a cluster running. In addition to the expected tests, this script was also tested under the conditions of IMPALA-4088, whereby the HBase RegionServer is running, but the master fails because another listening process has already taken its TCP port (60010) during startup. Change-Id: I9b81f3cfb6ea0ba7b18ce5fcd5d268f515c8b0c3 --- M infra/python/deps/requirements.txt A testdata/bin/check-hbase-nodes.py M testdata/bin/run-hbase.sh D testdata/bin/wait-for-hbase-master.py 4 files changed, 140 insertions(+), 59 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/4348/5 -- To view, visit http://gerrit.cloudera.org:8080/4348 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9b81f3cfb6ea0ba7b18ce5fcd5d268f515c8b0c3 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David Knupp Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Ishaan Joshi Gerrit-Reviewer: Michael Brown
[Impala-ASF-CR] IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh
David Knupp has posted comments on this change. Change subject: IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh .. Patch Set 3: (3 comments) Thanks for the feedback. It was helpful. Also, note that I'm adding a preliminary check for HDFS. http://gerrit.cloudera.org:8080/#/c/4348/3/testdata/bin/check-hbase-nodes.py File testdata/bin/check-hbase-nodes.py: Line 75: except Exception as e: > Does the message you tack on to the exception give you a lot more informati No, but the exception doesn't give me any addtional helpful information, and I prefer not to look at tracebacks if I don't need to. Purely an aesthetic choice. If you strongly prefer the traceback, I can remove. Line 92: while True: > More readable to put the timeout at the top of the loop and exit if you bug I think you mean "<", yes? ;-) I find the more verbose version more explicit and legible, but eh, not a big deal either way. WRT to 1/0 vs. T/F, I'm actually summing the number of errors at the end. Probably not necessary, but now you can pass in any number of znodes, and see which ones fail and which don't. (It's probably overkill.) Line 103: except Exception: > This is wrong, ignore. Nevertheless, thanks to David for looking into it. F Actually, this was a useful comment, because it prompted me to not be lazy and actually import and check for specific exceptions, which is the right thing to do. -- To view, visit http://gerrit.cloudera.org:8080/4348 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9b81f3cfb6ea0ba7b18ce5fcd5d268f515c8b0c3 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David Knupp Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Ishaan Joshi Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh
David Knupp has uploaded a new patch set (#4). Change subject: IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh .. IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh We used to include a step in run-hbase.sh for calling a python script that queried Zookeeper to see if the HBase master was up. The original script was problematic for a couple of reasons (most notably because it queried the wrong Zookeeper node) so we stopped using it during our mini-cluster HBase start up procedure. HBase start up issues continue to plague us, however. This patch reintroduces a Zookeeper check, with the following updates: - replace the original script with check-hbase-nodes.py - query the correct node /hbase/master, not just /hbase/rs - use the python Zookeeper library kazoo, rather than calling out to the shell and parsing the return string - since we are moving toward testing on a remote cluster, also add the capability to pass in the address for the host that provides the Zookeeper and HBase services - add an additional check that the HDFS service is running, because of an edge case where the HBase master can briefly start without a cluster running. In addition to the expected tests, this script was also tested under the conditions of IMPALA-4088, whereby the HBase RegionServer is running, but the master fails because another listening process has already taken its TCP port (60010) during startup. Change-Id: I9b81f3cfb6ea0ba7b18ce5fcd5d268f515c8b0c3 --- M infra/python/deps/requirements.txt A testdata/bin/check-hbase-nodes.py M testdata/bin/run-hbase.sh D testdata/bin/wait-for-hbase-master.py 4 files changed, 139 insertions(+), 59 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/4348/4 -- To view, visit http://gerrit.cloudera.org:8080/4348 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9b81f3cfb6ea0ba7b18ce5fcd5d268f515c8b0c3 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David Knupp Gerrit-Reviewer: Ishaan Joshi Gerrit-Reviewer: Michael Brown
[Impala-ASF-CR] IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh
Ishaan Joshi has posted comments on this change. Change subject: IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/4348/3/testdata/bin/check-hbase-nodes.py File testdata/bin/check-hbase-nodes.py: Line 103: except Exception: > Unless you want this to catch SIGINT(Ctrl-C), you want to do: This is wrong, ignore. Nevertheless, thanks to David for looking into it. FWIW, the problems occures with an except:, not an exception Exception: -- To view, visit http://gerrit.cloudera.org:8080/4348 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9b81f3cfb6ea0ba7b18ce5fcd5d268f515c8b0c3 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David Knupp Gerrit-Reviewer: Ishaan Joshi Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh
Ishaan Joshi has posted comments on this change. Change subject: IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh .. Patch Set 3: (3 comments) http://gerrit.cloudera.org:8080/#/c/4348/3/testdata/bin/check-hbase-nodes.py File testdata/bin/check-hbase-nodes.py: Line 75: except Exception as e: Does the message you tack on to the exception give you a lot more information than the exception? If not, then there's no need to catch it and then do an exit(). Line 92: while True: More readable to put the timeout at the top of the loop and exit if you bug out of it. while time.time() - start_time > TIMEOUT_SECONDS: ... if foo: return 0 LOGGER.error(..) return 1 On that note, why not return True/False instead of 1/0 ? Line 103: except Exception: Unless you want this to catch SIGINT(Ctrl-C), you want to do: except Exception as e: instead of except Exception -- To view, visit http://gerrit.cloudera.org:8080/4348 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9b81f3cfb6ea0ba7b18ce5fcd5d268f515c8b0c3 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David Knupp Gerrit-Reviewer: Ishaan Joshi Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh
David Knupp has uploaded a new change for review. http://gerrit.cloudera.org:8080/4348 Change subject: IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh .. IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh We used to include a step in run-hbase.sh for calling a python script that queried Zookeeper to see if the HBase master was up. The original script was problematic for a couple of reasons (most notably because it queried the wrong Zookeeper node) so we stopped using it during our mini-cluster HBase start up procedure. HBase start up issues continue to plague us, however. This patch reintroduces a Zookeeper check, with certain updates: - replace the original script with check-hbase-nodes.py - query the corect node /hbase/master, not just /hbase/rs - use the python Zookeeper library kazoo, rather than calling out to the shell and parsing the return string - since we are moving toward testing on a remote cluster, also add the capability to pass in the address for the host that provides the Zookeeper and HBase services In addition to the expected tests, this script was also tested under the conditions of IMPALA-4088, whereby the HBase RegionServer is running, but the master fails because another listening process has already taken its TCP port (60010) during startup. Change-Id: I9b81f3cfb6ea0ba7b18ce5fcd5d268f515c8b0c3 --- M infra/python/deps/requirements.txt A testdata/bin/check-hbase-nodes.py M testdata/bin/run-hbase.sh D testdata/bin/wait-for-hbase-master.py 4 files changed, 116 insertions(+), 59 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/4348/2 -- To view, visit http://gerrit.cloudera.org:8080/4348 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I9b81f3cfb6ea0ba7b18ce5fcd5d268f515c8b0c3 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David Knupp Gerrit-Reviewer: Ishaan Joshi Gerrit-Reviewer: Michael Brown
