[Impala-ASF-CR] IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh

2016-09-14 Thread Internal Jenkins (Code Review)
Internal Jenkins has submitted this change and it was merged.

Change subject: IMPALA-2013: Reintroduce steps for checking HBase health in 
run-hbase.sh
..


IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh

We used to include a step in run-hbase.sh that called a Python
script to query ZooKeeper and check whether the HBase master was up.
The original script was problematic, so we stopped using it in
our mini-cluster HBase startup procedure.

HBase startup issues continue to plague us, however. This patch
reintroduces a ZooKeeper check, with the following updates:

- replace the original script with check-hbase-nodes.py
- query the correct node, /hbase/master, not just /hbase/rs
- use the Python ZooKeeper library kazoo, rather than shelling
  out and parsing the returned string
- since we are moving toward testing on a remote cluster, also
  add the ability to pass in the address of the host that
  provides the ZooKeeper and HBase services
- add an additional check that the HDFS service is running,
  because of an edge case where the HBase master can briefly
  start without a cluster running.
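The kazoo-based check described above boils down to waiting, with a timeout, for a znode such as /hbase/master to appear. The body of the script's own check is not shown in this thread, so the following is only a minimal polling sketch under stated assumptions: `wait_for_znode` is a hypothetical name, and `zk_client` is assumed to be a started kazoo `KazooClient`, whose `exists(path)` returns `None` when the znode is absent.

```python
import time


def wait_for_znode(zk_client, path, timeout_secs, poll_secs=1.0):
    """Poll until `path` exists in ZooKeeper or `timeout_secs` elapse.

    Hypothetical helper. `zk_client` is assumed to be a started kazoo
    KazooClient (or any object with a compatible exists(path) method that
    returns None for a missing znode). Returns True if the znode appeared,
    False on timeout.
    """
    deadline = time.time() + timeout_secs
    while True:
        if zk_client.exists(path) is not None:
            return True
        remaining = deadline - time.time()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        time.sleep(min(poll_secs, remaining))
```

Polling keeps the sketch simple; kazoo also supports watches, which would avoid the sleep loop at the cost of extra callback handling.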

In addition to the expected tests, this script was also tested
under the conditions of IMPALA-4088, in which the HBase RegionServer
is running but the master fails because another listening process
has already taken its TCP port (60010) during startup.
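The IMPALA-4088 condition above (another process already holding the master's port) can be detected before startup by trying to bind the port. This is a minimal standard-library sketch, not part of the patch; `port_in_use` is a hypothetical helper name.

```python
import errno
import socket


def port_in_use(port, host='0.0.0.0'):
    """Return True if (host, port) cannot be bound because it is taken.

    Hypothetical helper. Binding without SO_REUSEADDR is deliberately
    conservative: a port left in TIME_WAIT by a recently closed connection
    is also reported as in use.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
    except socket.error as e:
        if e.errno == errno.EADDRINUSE:
            return True
        raise  # e.g. EACCES on a privileged port: a different problem
    finally:
        sock.close()
    return False
```

For example, `port_in_use(60010)` returning True before HBase starts would point at the same port conflict described in IMPALA-4088.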

Change-Id: I9b81f3cfb6ea0ba7b18ce5fcd5d268f515c8b0c3
Reviewed-on: http://gerrit.cloudera.org:8080/4348
Reviewed-by: Alex Behm 
Tested-by: Internal Jenkins
---
M infra/python/deps/requirements.txt
A testdata/bin/check-hbase-nodes.py
M testdata/bin/run-hbase.sh
D testdata/bin/wait-for-hbase-master.py
4 files changed, 176 insertions(+), 59 deletions(-)

Approvals:
  Internal Jenkins: Verified
  Alex Behm: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/4348
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I9b81f3cfb6ea0ba7b18ce5fcd5d268f515c8b0c3
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Harrison Sheinblatt 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Ishaan Joshi 
Gerrit-Reviewer: Michael Brown 


[Impala-ASF-CR] IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh

2016-09-14 Thread Internal Jenkins (Code Review)
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-2013: Reintroduce steps for checking HBase health in 
run-hbase.sh
..


Patch Set 9: Verified+1

-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I9b81f3cfb6ea0ba7b18ce5fcd5d268f515c8b0c3
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Harrison Sheinblatt 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Ishaan Joshi 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh

2016-09-14 Thread Harrison Sheinblatt (Code Review)
Harrison Sheinblatt has posted comments on this change.

Change subject: IMPALA-2013: Reintroduce steps for checking HBase health in 
run-hbase.sh
..


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4348/9/testdata/bin/check-hbase-nodes.py
File testdata/bin/check-hbase-nodes.py:

Line 135: zk_client.stop()
One problem is that the ZooKeeper client wants you to call stop() and then
close(). Here, if an exception or something similar gets through, close() will
be called without stop(). That may work without a 'graceful shutdown' of the
client and still be OK. In the past I've used a two-phase context manager to
ensure stop() is called before close().
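The two-phase context manager suggested above might look like the following minimal sketch. It assumes a client with kazoo-style `start(timeout=...)`, `stop()` and `close()` methods, as `kazoo.client.KazooClient` has; `zk_session` is a hypothetical name, not something in the patch.

```python
from contextlib import contextmanager


@contextmanager
def zk_session(client, start_timeout=15):
    """Start a ZooKeeper client and always shut it down in two phases.

    Hypothetical helper. `client` is assumed to expose kazoo-style
    start(timeout=...), stop() and close() methods.
    """
    try:
        client.start(timeout=start_timeout)
        yield client
    finally:
        try:
            # Phase 1: gracefully end the ZooKeeper session.
            client.stop()
        finally:
            # Phase 2: free remaining resources, even if stop() raised.
            client.close()
```

Usage would be along the lines of `with zk_session(KazooClient(hosts='localhost:2181')) as zk: zk.exists('/hbase/master')`; the nested `finally` guarantees close() runs after stop() whether the body succeeds or raises.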


-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I9b81f3cfb6ea0ba7b18ce5fcd5d268f515c8b0c3
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Harrison Sheinblatt 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Ishaan Joshi 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh

2016-09-13 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-2013: Reintroduce steps for checking HBase health in 
run-hbase.sh
..


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4348/8/testdata/bin/check-hbase-nodes.py
File testdata/bin/check-hbase-nodes.py:

Line 88: LOGGER.debug("Success: " + str(zk_client))
> That's a good catch, but I think it might be OK as-is. This is the output o
Wfm.


-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I9b81f3cfb6ea0ba7b18ce5fcd5d268f515c8b0c3
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Ishaan Joshi 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh

2016-09-13 Thread David Knupp (Code Review)
David Knupp has posted comments on this change.

Change subject: IMPALA-2013: Reintroduce steps for checking HBase health in 
run-hbase.sh
..


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4348/8/testdata/bin/check-hbase-nodes.py
File testdata/bin/check-hbase-nodes.py:

Line 88: LOGGER.debug("Success: " + str(zk_client))
> LOGGER.info? Otherwise it looks like we're trying to connect but never succ
That's a good catch, but I think it might be OK as-is. This is the output of a 
typical run:


Contents of HDFS root: [u'hbase', u'home', u'test-warehouse', u'tmp', u'user']
Connecting to Zookeeper host(s).
Waiting for HBase node: /hbase/master
Success: /hbase/master
Waiting for HBase node: /hbase/rs
Success: /hbase/rs 
[etc...]


The missing line would have been:


Success: 


If the connection attempt didn't succeed, I don't think it would be a mystery:
you'd get the exception message, and the script would exit with an error. Is
that acceptable? (Honestly, I think I might have intended to take that line out.)
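The sample output above lacks the `LOGGER.debug("Success: ...")` line because debug records fall below the logger's level. The sketch below assumes, as the bare output suggests, that the script logs at INFO with a plain `%(message)s` format; the real logging setup is not shown in this thread, and `make_logger` is a hypothetical name.

```python
import logging


def make_logger(name, level=logging.INFO, stream=None):
    """Build a logger that prints bare messages at `level` and above.

    Hypothetical helper; stream=None means sys.stderr.
    """
    logger = logging.getLogger(name)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter('%(message)s'))
    logger.addHandler(handler)
    logger.setLevel(level)
    logger.propagate = False  # avoid duplicate output via the root logger
    return logger
```

With this setup, `logger.debug(...)` calls are dropped while `logger.info(...)` calls print, which matches the run shown above; switching the success message to `LOGGER.info` would make it visible.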


-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I9b81f3cfb6ea0ba7b18ce5fcd5d268f515c8b0c3
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Ishaan Joshi 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh

2016-09-13 Thread Internal Jenkins (Code Review)
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-2013: Reintroduce steps for checking HBase health in 
run-hbase.sh
..


Patch Set 8: Verified+1

-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I9b81f3cfb6ea0ba7b18ce5fcd5d268f515c8b0c3
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Ishaan Joshi 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh

2016-09-13 Thread Michael Brown (Code Review)
Michael Brown has posted comments on this change.

Change subject: IMPALA-2013: Reintroduce steps for checking HBase health in 
run-hbase.sh
..


Patch Set 8: Code-Review+1

-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I9b81f3cfb6ea0ba7b18ce5fcd5d268f515c8b0c3
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Ishaan Joshi 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh

2016-09-13 Thread David Knupp (Code Review)
David Knupp has uploaded a new patch set (#8).

Change subject: IMPALA-2013: Reintroduce steps for checking HBase health in 
run-hbase.sh
..

Change-Id: I9b81f3cfb6ea0ba7b18ce5fcd5d268f515c8b0c3
---
M infra/python/deps/requirements.txt
A testdata/bin/check-hbase-nodes.py
M testdata/bin/run-hbase.sh
D testdata/bin/wait-for-hbase-master.py
4 files changed, 173 insertions(+), 59 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/4348/8
-- 

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9b81f3cfb6ea0ba7b18ce5fcd5d268f515c8b0c3
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Ishaan Joshi 
Gerrit-Reviewer: Michael Brown 


[Impala-ASF-CR] IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh

2016-09-13 Thread David Knupp (Code Review)
David Knupp has uploaded a new patch set (#7).

Change subject: IMPALA-2013: Reintroduce steps for checking HBase health in 
run-hbase.sh
..

Change-Id: I9b81f3cfb6ea0ba7b18ce5fcd5d268f515c8b0c3
---
M infra/python/deps/requirements.txt
A testdata/bin/check-hbase-nodes.py
M testdata/bin/run-hbase.sh
D testdata/bin/wait-for-hbase-master.py
4 files changed, 173 insertions(+), 59 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/4348/7
-- 

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9b81f3cfb6ea0ba7b18ce5fcd5d268f515c8b0c3
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Ishaan Joshi 
Gerrit-Reviewer: Michael Brown 


[Impala-ASF-CR] IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh

2016-09-12 Thread David Knupp (Code Review)
David Knupp has uploaded a new patch set (#6).

Change subject: IMPALA-2013: Reintroduce steps for checking HBase health in 
run-hbase.sh
..

Change-Id: I9b81f3cfb6ea0ba7b18ce5fcd5d268f515c8b0c3
---
M infra/python/deps/requirements.txt
A testdata/bin/check-hbase-nodes.py
M testdata/bin/run-hbase.sh
D testdata/bin/wait-for-hbase-master.py
4 files changed, 154 insertions(+), 59 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/4348/6
-- 

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9b81f3cfb6ea0ba7b18ce5fcd5d268f515c8b0c3
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Ishaan Joshi 
Gerrit-Reviewer: Michael Brown 


[Impala-ASF-CR] IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh

2016-09-12 Thread David Knupp (Code Review)
David Knupp has posted comments on this change.

Change subject: IMPALA-2013: Reintroduce steps for checking HBase health in 
run-hbase.sh
..


Patch Set 5:

(8 comments)

Reworked the git commit message and fixed all the issues. I'm still trying to
find a good way to check the health of the NameNode on a remote cluster.

http://gerrit.cloudera.org:8080/#/c/4348/5/testdata/bin/check-hbase-nodes.py
File testdata/bin/check-hbase-nodes.py:

PS5, Line 50: parser.add_argument('--timeout', '-t', action='store'
> I find action='store' to be redundant since that's the default.
Done


PS5, Line 52:   'Default is 30 seconds.'))
> Don't hardcode 30. Use a format string and read from TIMEOUT_SECONDS.
Done


PS5, Line 56:   'e.g, 0.0.0.0:5070. Default is localhost:5070.'))
> Use a format string and read from HDFS_HOST.
Done


PS5, Line 60:   'e.g, 0.0.0.0:2181. Default is localhost:2181.'))
> Use a format string and read from ZK_HOSTS.
Done


PS5, Line 66:   '/hbase/master and /hbase/rs.')
> Use a format string. Maybe use '-n '.join(HBASE_NODES)  as part of it?
Done
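The argparse comments above (drop the redundant `action='store'`, and build help text from the constants with format strings) can be sketched as follows. The constant names come from the review; their values come from the quoted help strings. The flag spellings `--hdfs_host`, `--zookeeper_hosts` and `--nodes`, and the help wording beyond the quoted fragments, are assumptions inferred from `args.hdfs_host`, `args.zookeeper_hosts` and `args.nodes`.

```python
import argparse

# Names from the review comments; values from the quoted help strings.
TIMEOUT_SECONDS = 30
HDFS_HOST = 'localhost:5070'
ZK_HOSTS = 'localhost:2181'
HBASE_NODES = ['/hbase/master', '/hbase/rs']


def build_parser():
    parser = argparse.ArgumentParser(description='Check HBase znodes in ZooKeeper.')
    # action='store' is argparse's default, so it is omitted.
    parser.add_argument('--timeout', '-t', type=int, default=TIMEOUT_SECONDS,
                        help='Seconds to wait for each node. '
                             'Default is {0} seconds.'.format(TIMEOUT_SECONDS))
    parser.add_argument('--hdfs_host', default=HDFS_HOST,
                        help='HDFS web host:port, e.g, 0.0.0.0:5070. '
                             'Default is {0}.'.format(HDFS_HOST))
    parser.add_argument('--zookeeper_hosts', default=ZK_HOSTS,
                        help='ZooKeeper host:port, e.g, 0.0.0.0:2181. '
                             'Default is {0}.'.format(ZK_HOSTS))
    parser.add_argument('--nodes', '-n', action='append',
                        help='HBase znode to check; may be repeated. '
                             'Default is -n {0}.'.format(' -n '.join(HBASE_NODES)))
    return parser


def parse_args(argv=None):
    args = build_parser().parse_args(argv)
    # With action='append', a list default would be appended to rather than
    # replaced, so fill in the default only when no -n was given.
    if args.nodes is None:
        args.nodes = list(HBASE_NODES)
    return args
```

Filling in the `-n` default after parsing sidesteps a known argparse pitfall: `action='append'` extends a non-None default instead of replacing it.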


PS5, Line 127: hdfs_client.list('/')
> What does this do?
It throws an exception if we can't connect to HDFS. I was rethinking this over
the weekend: maybe this whole bit should be either moved or reworked. It doesn't
seem to work on remote hosts.
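`InsecureClient(...).list('/')` appears to come from the `hdfs` (HdfsCLI) package, which talks to the NameNode over the WebHDFS REST API. As a hedged, standard-library-only illustration of what that call does, the sketch below issues the WebHDFS `LISTSTATUS` request for the root directory and returns the entry names. `list_hdfs_root` is a hypothetical name, the sketch assumes Python 3 and that WebHDFS is enabled on the given host:port, and connection failures surface as `urllib.error.URLError`.

```python
import json
from urllib.request import urlopen


def list_hdfs_root(hdfs_host, timeout=10):
    """Return the names in the HDFS root directory via WebHDFS.

    Hypothetical helper. Raises urllib.error.URLError (an OSError) if the
    host is unreachable, which is what makes it usable as a health check.
    """
    url = 'http://{0}/webhdfs/v1/?op=LISTSTATUS'.format(hdfs_host)
    with urlopen(url, timeout=timeout) as resp:
        body = json.loads(resp.read().decode('utf-8'))
    return [status['pathSuffix'] for status in body['FileStatuses']['FileStatus']]
```

Against the mini-cluster this would be called as `list_hdfs_root('localhost:5070')`, matching the default host in the quoted help text.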


PS5, Line 125: try:
 :   hdfs_client = InsecureClient('http://' + args.hdfs_host)
 :   hdfs_client.list('/')
 : except requests.exceptions.ConnectionError as e:
 :   msg = 'Could not connect to HDFS web host http://{0} - {1}'.format(args.hdfs_host, e)
 :   LOGGER.error(msg)
 :   sys.exit(1)
 : 
 : zk_client = connect_to_zookeeper(args.zookeeper_hosts, args.timeout)
 : errors = sum([check_hbase_node(zk_client, node, args.timeout) for node in args.nodes])
> I suggest you make this a method. The __main__ can get the arguments, the e
Done


PS5, Line 135: if errors:
> It would be better to have the explicit numerical > 0 comparison.
Done
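The last two comments (move the checks into a method, and compare explicitly with `> 0`) suggest a split like the sketch below. `count_node_errors` and `exit_status` are hypothetical names; `check_node` stands in for the script's `check_hbase_node()`, whose body is not shown here and which is assumed to return 1 on failure and 0 on success, so that summing the results counts the errors, as the quoted `sum([...])` implies.

```python
def count_node_errors(check_node, zk_client, nodes, timeout):
    """Return how many znodes in `nodes` failed `check_node`.

    `check_node(zk_client, node, timeout)` is assumed to return 1 on
    failure and 0 on success.
    """
    return sum(check_node(zk_client, node, timeout) for node in nodes)


def exit_status(errors):
    # Explicit numerical comparison, as requested in the review.
    if errors > 0:
        return 1
    return 0
```

The `__main__` block would then only parse arguments, build the clients, and call `sys.exit(exit_status(count_node_errors(check_hbase_node, zk_client, args.nodes, args.timeout)))`, which keeps the checking logic testable without real HDFS or ZooKeeper services.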


-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I9b81f3cfb6ea0ba7b18ce5fcd5d268f515c8b0c3
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Ishaan Joshi 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: Yes