[Impala-ASF-CR] IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh

2016-09-10 Thread Michael Brown (Code Review)
Michael Brown has posted comments on this change.

Change subject: IMPALA-2013: Reintroduce steps for checking HBase health in 
run-hbase.sh
..


Patch Set 5:

(9 comments)

Under what circumstances is more than one zookeeper needed?

http://gerrit.cloudera.org:8080/#/c/4348/5//COMMIT_MSG
Commit Message:

PS5, Line 11: The original script was problematic for a couple of reasons (most
: notably because it queried the wrong Zookeeper node) so we stopped
: using it during our mini-cluster HBase start up procedure.
Note quite right. I didn't use it because it wasn't clear it was needed 
anymore, the so-called "HBase race" had been fixed. I didn't notice the master 
vs. rs discrepancy at that time.


http://gerrit.cloudera.org:8080/#/c/4348/5/testdata/bin/check-hbase-nodes.py
File testdata/bin/check-hbase-nodes.py:

PS5, Line 50: parser.add_argument('--timeout', '-t', action='store'
I find action='store' to be redundant since that's the default.


PS5, Line 52:   'Default is 30 seconds.'))
Don't hardcode 30. Use a format string and read from TIMEOUT_SECONDS.


PS5, Line 56:   'e.g, 0.0.0.0:5070. Default is 
localhost:5070.'))
Use a format string and read from HDFS_HOST.


PS5, Line 60:   'e.g, 0.0.0.0:2181. Default is 
localhost:2181.'))
Use a format string and read from ZK_HOSTS.


PS5, Line 66:   '/hbase/master and /hbase/rs.')
Use a format string. Maybe use '-n '.join(HBASE_NODES)  as part of it?


PS5, Line 127: hdfs_client.list('/')
What does this do?


PS5, Line 125: try:
 : hdfs_client = InsecureClient('http://' + args.hdfs_host)
 : hdfs_client.list('/')
 : except requests.exceptions.ConnectionError as e:
 : msg = 'Could not connect to HDFS web host http://{0} - 
{1}'.format(args.hdfs_host, e)
 : LOGGER.error(msg)
 : sys.exit(1)
 : 
 : zk_client = connect_to_zookeeper(args.zookeeper_hosts, 
args.timeout)
 : errors = sum([check_hbase_node(zk_client, node, 
args.timeout) for node in args.nodes])
I suggest you make this a method. The __main__ can get the arguments, the 
errors, and print the error and exit accordingly. It would be better to have a 
separate method to make a connection and do the query work.


PS5, Line 135: if errors:
It would be better to have the explicit numerical > 0 comparison.


-- 
To view, visit http://gerrit.cloudera.org:8080/4348
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9b81f3cfb6ea0ba7b18ce5fcd5d268f515c8b0c3
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Ishaan Joshi 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh

2016-09-09 Thread David Knupp (Code Review)
David Knupp has uploaded a new patch set (#5).

Change subject: IMPALA-2013: Reintroduce steps for checking HBase health in 
run-hbase.sh
..

IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh

We used to include a step in run-hbase.sh for calling a python
script that queried Zookeeper to see if the HBase master was up.
The original script was problematic for a couple of reasons (most
notably because it queried the wrong Zookeeper node) so we stopped
using it during our mini-cluster HBase start up procedure.

HBase start up issues continue to plague us, however. This patch
reintroduces a Zookeeper check, with the following updates:

- replace the original script with check-hbase-nodes.py
- query the correct node /hbase/master, not just /hbase/rs
- use the python Zookeeper library kazoo, rather than calling
  out to the shell and parsing the return string
- since we are moving toward testing on a remote cluster, also
  add the capability to pass in the address for the host that
  provides the Zookeeper and HBase services
- add an additional check that the HDFS service is running,
  because of an edge case where the HBase master can briefly
  start without a cluster running.

In addition to the expected tests, this script was also tested
under the conditions of IMPALA-4088, whereby the HBase RegionServer
is running, but the master fails because another listening process
has already taken its TCP port (60010) during startup.

Change-Id: I9b81f3cfb6ea0ba7b18ce5fcd5d268f515c8b0c3
---
M infra/python/deps/requirements.txt
A testdata/bin/check-hbase-nodes.py
M testdata/bin/run-hbase.sh
D testdata/bin/wait-for-hbase-master.py
4 files changed, 140 insertions(+), 59 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/4348/5
-- 
To view, visit http://gerrit.cloudera.org:8080/4348
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9b81f3cfb6ea0ba7b18ce5fcd5d268f515c8b0c3
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Ishaan Joshi 
Gerrit-Reviewer: Michael Brown 


[Impala-ASF-CR] IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh

2016-09-09 Thread David Knupp (Code Review)
David Knupp has posted comments on this change.

Change subject: IMPALA-2013: Reintroduce steps for checking HBase health in 
run-hbase.sh
..


Patch Set 3:

(3 comments)

Thanks for the feedback. It was helpful.

Also, note that I'm adding a preliminary check for HDFS.

http://gerrit.cloudera.org:8080/#/c/4348/3/testdata/bin/check-hbase-nodes.py
File testdata/bin/check-hbase-nodes.py:

Line 75: except Exception as e:
> Does the message you tack on to the exception give you a lot more informati
No, but the exception doesn't give me any addtional helpful information, and I 
prefer not to look at tracebacks if I don't need to. Purely an aesthetic 
choice. If you strongly prefer the traceback, I can remove.


Line 92: while True:
> More readable to put the timeout at the top of the loop and exit if you bug
I think you mean "<", yes? ;-)

I find the more verbose version more explicit and legible, but eh, not a big 
deal either way.

WRT to 1/0 vs. T/F, I'm actually summing the number of errors at the end. 
Probably not necessary, but now you can pass in any number of znodes, and see 
which ones fail and which don't. (It's probably overkill.)


Line 103: except Exception:
> This is wrong, ignore. Nevertheless, thanks to David for looking into it. F
Actually, this was a useful comment, because it prompted me to not be lazy and 
actually import and check for specific exceptions, which is the right thing to 
do.


-- 
To view, visit http://gerrit.cloudera.org:8080/4348
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9b81f3cfb6ea0ba7b18ce5fcd5d268f515c8b0c3
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Ishaan Joshi 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh

2016-09-09 Thread David Knupp (Code Review)
David Knupp has uploaded a new patch set (#4).

Change subject: IMPALA-2013: Reintroduce steps for checking HBase health in 
run-hbase.sh
..

IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh

We used to include a step in run-hbase.sh for calling a python
script that queried Zookeeper to see if the HBase master was up.
The original script was problematic for a couple of reasons (most
notably because it queried the wrong Zookeeper node) so we stopped
using it during our mini-cluster HBase start up procedure.

HBase start up issues continue to plague us, however. This patch
reintroduces a Zookeeper check, with the following updates:

- replace the original script with check-hbase-nodes.py
- query the correct node /hbase/master, not just /hbase/rs
- use the python Zookeeper library kazoo, rather than calling
  out to the shell and parsing the return string
- since we are moving toward testing on a remote cluster, also
  add the capability to pass in the address for the host that
  provides the Zookeeper and HBase services
- add an additional check that the HDFS service is running,
  because of an edge case where the HBase master can briefly
  start without a cluster running.

In addition to the expected tests, this script was also tested
under the conditions of IMPALA-4088, whereby the HBase RegionServer
is running, but the master fails because another listening process
has already taken its TCP port (60010) during startup.

Change-Id: I9b81f3cfb6ea0ba7b18ce5fcd5d268f515c8b0c3
---
M infra/python/deps/requirements.txt
A testdata/bin/check-hbase-nodes.py
M testdata/bin/run-hbase.sh
D testdata/bin/wait-for-hbase-master.py
4 files changed, 139 insertions(+), 59 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/4348/4
-- 
To view, visit http://gerrit.cloudera.org:8080/4348
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9b81f3cfb6ea0ba7b18ce5fcd5d268f515c8b0c3
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp 
Gerrit-Reviewer: Ishaan Joshi 
Gerrit-Reviewer: Michael Brown 


[Impala-ASF-CR] IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh

2016-09-09 Thread Ishaan Joshi (Code Review)
Ishaan Joshi has posted comments on this change.

Change subject: IMPALA-2013: Reintroduce steps for checking HBase health in 
run-hbase.sh
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4348/3/testdata/bin/check-hbase-nodes.py
File testdata/bin/check-hbase-nodes.py:

Line 103: except Exception:
> Unless you want this to catch SIGINT(Ctrl-C), you want to do:
This is wrong, ignore. Nevertheless, thanks to David for looking into it. FWIW, 
the problems occures with an except:, not an exception Exception:


-- 
To view, visit http://gerrit.cloudera.org:8080/4348
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9b81f3cfb6ea0ba7b18ce5fcd5d268f515c8b0c3
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp 
Gerrit-Reviewer: Ishaan Joshi 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh

2016-09-09 Thread Ishaan Joshi (Code Review)
Ishaan Joshi has posted comments on this change.

Change subject: IMPALA-2013: Reintroduce steps for checking HBase health in 
run-hbase.sh
..


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/4348/3/testdata/bin/check-hbase-nodes.py
File testdata/bin/check-hbase-nodes.py:

Line 75: except Exception as e:
Does the message you tack on to the exception give you a lot more information 
than the exception? If not, then there's no need to catch it and then do an 
exit().


Line 92: while True:
More readable to put the timeout at the top of the loop and exit if you bug out 
of it.

while time.time() - start_time > TIMEOUT_SECONDS:
  ...
  if foo: return 0

LOGGER.error(..)
return 1

On that note, why not return True/False instead of 1/0 ?


Line 103: except Exception:
Unless you want this to catch SIGINT(Ctrl-C), you want to do:

except Exception as e: instead of except Exception


-- 
To view, visit http://gerrit.cloudera.org:8080/4348
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9b81f3cfb6ea0ba7b18ce5fcd5d268f515c8b0c3
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp 
Gerrit-Reviewer: Ishaan Joshi 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh

2016-09-09 Thread David Knupp (Code Review)
David Knupp has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/4348

Change subject: IMPALA-2013: Reintroduce steps for checking HBase health in 
run-hbase.sh
..

IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh

We used to include a step in run-hbase.sh for calling a python
script that queried Zookeeper to see if the HBase master was up.
The original script was problematic for a couple of reasons (most
notably because it queried the wrong Zookeeper node) so we stopped
using it during our mini-cluster HBase start up procedure.

HBase start up issues continue to plague us, however. This patch
reintroduces a Zookeeper check, with certain updates:

- replace the original script with check-hbase-nodes.py
- query the corect node /hbase/master, not just /hbase/rs
- use the python Zookeeper library kazoo, rather than calling
  out to the shell and parsing the return string
- since we are moving toward testing on a remote cluster, also
  add the capability to pass in the address for the host that
  provides the Zookeeper and HBase services

In addition to the expected tests, this script was also tested
under the conditions of IMPALA-4088, whereby the HBase RegionServer
is running, but the master fails because another listening process
has already taken its TCP port (60010) during startup.

Change-Id: I9b81f3cfb6ea0ba7b18ce5fcd5d268f515c8b0c3
---
M infra/python/deps/requirements.txt
A testdata/bin/check-hbase-nodes.py
M testdata/bin/run-hbase.sh
D testdata/bin/wait-for-hbase-master.py
4 files changed, 116 insertions(+), 59 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/4348/2
-- 
To view, visit http://gerrit.cloudera.org:8080/4348
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I9b81f3cfb6ea0ba7b18ce5fcd5d268f515c8b0c3
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp 
Gerrit-Reviewer: Ishaan Joshi 
Gerrit-Reviewer: Michael Brown