[Impala-ASF-CR] Enabling end-to-end tests on a remote cluster
Harrison Sheinblatt has posted comments on this change.

Change subject: Enabling end-to-end tests on a remote cluster

Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4769/1/bin/remote_data_load.py
File bin/remote_data_load.py:

PS1, Line 365: main
> I'm having a bit of trouble parsing this sentence. Can you clarify?
With the parser options defined directly in main(), it is difficult to invoke the main() logic from another Python script without shelling out to execute the script as a subprocess. If you instead define the parser options in a separate method, and create a method that does all the logic in main() but takes the args as a parameter, then another Python program could build an arg dictionary and invoke the main logic directly, with no need to shell out.

--
To view, visit http://gerrit.cloudera.org:8080/4769
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1f443a1728a1d28168090c6f54e82dec2cb073e9
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp
Gerrit-Reviewer: David Knupp
Gerrit-Reviewer: Harrison Sheinblatt
Gerrit-Reviewer: Martin Grund
Gerrit-Reviewer: Michael Brown
Gerrit-HasComments: Yes
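
The split suggested above could look roughly like the following. This is only a minimal sketch, not the code in the patch: the option subset, the "admin" default, and the body of run() are placeholders assumed for illustration. It uses optparse because the usage text in the commit message has that shape.

```python
import sys
from optparse import OptionParser


def parse_options(argv):
    """Parse command-line arguments and return (cm_host, options_dict)."""
    parser = OptionParser(usage="usage: %prog [options] cm_host")
    # Only two of the script's options are reproduced here; the default
    # value "admin" is an assumption made for this sketch.
    parser.add_option("--cm-user", dest="cm_user", default="admin",
                      help="Cloudera Manager admin user")
    parser.add_option("--no-load", dest="load", action="store_false",
                      default=True, help="Do not try to load the snapshot")
    options, args = parser.parse_args(argv)
    if len(args) != 1:
        parser.error("expected exactly one argument: cm_host")
    # vars() turns the optparse Values object into a plain dict, so callers
    # that never touch the command line can build the same structure by hand.
    return args[0], vars(options)


def run(cm_host, options):
    """Hypothetical stand-in for the logic that currently lives in main()."""
    steps = []
    if options.get("load", True):
        steps.append("load snapshot via %s as %s"
                     % (cm_host, options.get("cm_user", "admin")))
    return steps


def main(argv=None):
    """Thin command-line wrapper: parse, then delegate to run()."""
    cm_host, options = parse_options(sys.argv[1:] if argv is None else argv)
    for step in run(cm_host, options):
        print(step)
```

Another Python program can then call run("cm.example.com", {"load": False}) directly, with no subprocess involved; the host name here is made up.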
[Impala-ASF-CR] Enabling end-to-end tests on a remote cluster
David Knupp has posted comments on this change.

Change subject: Enabling end-to-end tests on a remote cluster

Patch Set 1:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/4769/1/bin/remote_data_load.py
File bin/remote_data_load.py:

PS1, Line 142: fe
> Does this mean we're still using the 3rd party client libraries with these
Nope, this is the path to where we keep our config files. We're literally just overwriting some of these files on the client with the same files downloaded from the cluster.

$ ls fe/src/test/resources/*.xml -l
lrwxrwxrwx 1 dknupp dknupp    78 Oct 25 18:22 fe/src/test/resources/core-site.xml -> /home/dknupp/Impala/testdata/cluster/cdh5/node-1/etc/hadoop/conf/core-site.xml
-rw-rw-r-- 1 dknupp dknupp  1985 Oct 25 18:22 fe/src/test/resources/hbase-site.xml
lrwxrwxrwx 1 dknupp dknupp    78 Oct 25 18:22 fe/src/test/resources/hdfs-site.xml -> /home/dknupp/Impala/testdata/cluster/cdh5/node-1/etc/hadoop/conf/hdfs-site.xml
-rw-rw-r-- 1 dknupp dknupp 67730 Oct 18 18:18 fe/src/test/resources/hive-default.xml
-rw-rw-r-- 1 dknupp dknupp  4728 Oct 25 18:22 fe/src/test/resources/hive-site.xml
-rw-rw-r-- 1 dknupp dknupp  1976 Oct 25 18:22 fe/src/test/resources/sentry-site.xml

PS1, Line 149: service
> I believe the Cluster object in comparisons/cluster.py has helper methods f
Going to leave this for a later investigation.

PS1, Line 160: settings required for data loading
> It would be good to document here exactly what is returned, and an explanat
Done

PS1, Line 224: environment
> Is there a reason to update the current environment rather than create an e
My presumption is that we set environment variables here because "that's how it's done" under our current model. That said, I don't think the current environment really gets updated, right? Python runs as a child process of the shell, and the environment is only set for the life span of the script. I agree that it seems a bit hacky, but it shouldn't have a persistent effect on one's environment.
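
To illustrate the point about process environments, here is a small sketch. It is not taken from the patch; the variable name DEMO_REMOTE_LOAD and both helper functions are made up for the example. It shows that a change a child Python process makes to os.environ is not visible to its parent, and that a caller can pass an explicit environment copy to a subprocess without touching its own.

```python
import os
import subprocess
import sys

# Code run in the child: it sets a variable in its own environment and
# prints it. DEMO_REMOTE_LOAD is a hypothetical name used only here.
CHILD_SETS_VAR = (
    "import os\n"
    "os.environ['DEMO_REMOTE_LOAD'] = 'set-in-child'\n"
    "print(os.environ['DEMO_REMOTE_LOAD'])\n"
)
CHILD_READS_VAR = "import os; print(os.environ['DEMO_REMOTE_LOAD'])"


def run_child_that_sets_env():
    """Run a child that modifies its own os.environ; return what it printed."""
    out = subprocess.check_output([sys.executable, "-c", CHILD_SETS_VAR])
    return out.decode().strip()


def run_child_with_explicit_env(extra):
    """Run a child with a copied, extended environment; return its output."""
    env = dict(os.environ)   # copy, so the parent's environment is untouched
    env.update(extra)
    out = subprocess.check_output([sys.executable, "-c", CHILD_READS_VAR],
                                  env=env)
    return out.decode().strip()
```

In both cases the variable exists only inside the child process; after either call returns, the parent's os.environ is unchanged.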
PS1, Line 266: load
> Might be good to time this at least overall. Even if we just log the total
I added a decorator that we can use on various functions. It might be handy when/if this script gets refactored, to time various parts or stages of it. For right now, it just logs the time as you requested, but we can change the decorator to do something more intelligent at any time, e.g., record the time in a DB for eventual trending.

PS1, Line 278: INFO A
> What does this mean?
You know, I'm not sure. I think Martin may have just been marking when certain phases completed, or testing the logger setup. I'll remove it.

PS1, Line 281: logger
> Two blank lines before this line, probably remove at least one.
Done

PS1, Line 296: INFO B
> This must relate to INFO A above, but what does it mean?
Removed.

PS1, Line 297: chmod
> Are we re-setting these permissions at the end, or do we know that tests do
I'm not sure, but as elsewhere, I've filed a JIRA to investigate at a later time.

PS1, Line 315: Re-load
> Does this mean it was already loaded and now it's being loaded again? Why?
I'm not sure, but I can't actually get this far into the script now, owing to the breakages introduced by the latest Kudu changes. I'll have to make a note to look into this once we fix IMPALA-4365.

PS1, Line 335: test
> This seems to not belong in this class; it doesn't do any data load.
This may be here because, running as part of the forked child Python process, it can make use of the environment changes from before. I'm going to leave this in place for now, with the idea that we can refactor it out at a later time. JIRA has been filed.

PS1, Line 365: main
> If we have a parse_options() method a run(parsed_options) method, then you
I'm having a bit of trouble parsing this sentence. Can you clarify?

PS1, Line 393: test
> This seems to belong elsewhere. Why does it go here?
See the reply from above.
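
A timing decorator of the kind described in the reply to Line 266 might be sketched as below. This is an assumption about its general shape, not the decorator from the patch: the logger name, the name timed, and the stand-in load() function are all hypothetical.

```python
import functools
import logging
import time

# Hypothetical logger name; the real script configures its own logger.
LOG = logging.getLogger("remote_data_load_sketch")


def timed(func):
    """Log how long each call to func takes, even if it raises."""
    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        start = time.time()
        try:
            return func(*args, **kwargs)
        finally:
            LOG.info("%s took %.2f seconds",
                     func.__name__, time.time() - start)
    return wrapper


@timed
def load(tables):
    """Stand-in for a data-loading step; returns the number of tables."""
    return len(tables)
```

Because the logging happens in one place, swapping LOG.info for something that records durations elsewhere would not require touching the decorated functions.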

http://gerrit.cloudera.org:8080/#/c/4769/1/testdata/bin/compute-table-stats.sh
File testdata/bin/compute-table-stats.sh:

PS1, Line 27: IMPALAD
> Can you reference the Jira in a comment?
Yup, a comment was added. I think you may have been looking at an older patch.

http://gerrit.cloudera.org:8080/#/c/4769/1/testdata/bin/create-load-data.sh
File testdata/bin/create-load-data.sh:

PS1, Line 38: HS2_HOST_PORT
> Is it reasonable to add a comment referencing the Jira here?
Possible you were looking at an older patch. A comment has been added to the code.

http://gerrit.cloudera.org:8080/#/c/4769/1/testdata/bin/setup-hdfs-env.sh
File testdata/bin/setup-hdfs-env.sh:

PS1, Line 53: CACHEADMIN_ARGS
> If the is_kerberized block is executed above, then the CACHEADMIN_ARGS would
I feel like some of these comments might be outside of the scope of this review, esp. with regard to factoring out the existing is_kerberized block. Since I'm not an
[Impala-ASF-CR] Enabling end-to-end tests on a remote cluster
Harrison Sheinblatt has posted comments on this change.

Change subject: Enabling end-to-end tests on a remote cluster

Patch Set 1:

(3 comments)

Responded to comments.

http://gerrit.cloudera.org:8080/#/c/4769/1/testdata/bin/compute-table-stats.sh
File testdata/bin/compute-table-stats.sh:

PS1, Line 27: IMPALAD
> IMPALA-4346 has been filed.
Can you reference the Jira in a comment?

http://gerrit.cloudera.org:8080/#/c/4769/1/testdata/bin/create-load-data.sh
File testdata/bin/create-load-data.sh:

PS1, Line 38: HS2_HOST_PORT
> I think the latter is preferable -- corral all the required configs in one
Is it reasonable to add a comment referencing the Jira here?

http://gerrit.cloudera.org:8080/#/c/4769/1/testdata/bin/setup-hdfs-env.sh
File testdata/bin/setup-hdfs-env.sh:

PS1, Line 53: CACHEADMIN_ARGS
> Clarification: can you be more explicit about the check you want? Something
If the is_kerberized block is executed above, then CACHEADMIN_ARGS would include '-owner ${PREVIOUS_USER}'. If HADOOP_USER_NAME is also set, then we add another '-owner ${USER}' to this, which probably breaks it. I think there are probably 4 bugs:

1) The is_kerberized block above probably isn't supported and should be removed.

2) The CACHEADMIN_ARGS definition logic needs a clear conditional, ideally in a single location, that sets the user/group/owner information properly, in a way that makes it easy to tell it's always well-defined. Here it looks like the intended logic is: if it's kerberized, the owner is set one way; if it's not kerberized and the hadoop user is defined, it's set another way; and if it's not kerberized and the hadoop user is not defined, it stays undefined. If we want to keep the is_kerberized logic in one place, then we can have it set another parameter for the owner fields, and here only update it if it's already set.

3) CACHEADMIN_ARGS is probably the wrong name, as it is for the -addPool command and sets a subset of the args.

4) We should explicitly set all arguments to cacheadmin -addPool if possible (e.g. mode, maxTtl).

--
To view, visit http://gerrit.cloudera.org:8080/4769
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1f443a1728a1d28168090c6f54e82dec2cb073e9
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp
Gerrit-Reviewer: David Knupp
Gerrit-Reviewer: Harrison Sheinblatt
Gerrit-Reviewer: Martin Grund
Gerrit-Reviewer: Michael Brown
Gerrit-HasComments: Yes
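
The single, explicit conditional asked for in the setup-hdfs-env.sh comment could be sketched as follows. The real script is bash; this Python version only illustrates the three-way decision as the reviewer reads it. The function name and parameters are hypothetical, and the choice that the kerberized case takes precedence is an assumption drawn from that reading.

```python
def add_pool_owner_args(is_kerberized, previous_user=None,
                        hadoop_user_name=None, user=None):
    """Return the '-owner' arguments for the pool, decided in one place.

    Mirrors the three cases described in the review comment:
      1. kerberized                        -> owner is previous_user
      2. not kerberized, hadoop user set   -> owner is user
      3. not kerberized, no hadoop user    -> no owner argument
    Exactly one branch runs, so '-owner' can never be added twice.
    """
    if is_kerberized:
        return ["-owner", previous_user]
    if hadoop_user_name:
        return ["-owner", user]
    return []
```

With every case handled in one function, it is easy to check by inspection that the owner argument is always well-defined.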
[Impala-ASF-CR] Enabling end-to-end tests on a remote cluster
David Knupp has posted comments on this change.

Change subject: Enabling end-to-end tests on a remote cluster

Patch Set 4:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/4769/1/bin/remote_data_load.py
File bin/remote_data_load.py:

PS1, Line 88: RemoteDataLoad
> I'd separate out the common functionality needed for dealing with remote cl
IMPALA-4367 has been filed.

PS1, Line 132: v10
> Hardcoding v10. Is this necessary? I think URL may be missing with later
IMPALA-4367 has been filed.

PS1, Line 155: get_service_client_configurations
> A lot of this seems like it could be in comparisons/cluster.py, or at least
IMPALA-4367 has been filed.

PS1, Line 212: find_snapshot_file
> It would be good to start converting the snapshot file management into pyth
IMPALA-4367 has been filed.

--
To view, visit http://gerrit.cloudera.org:8080/4769
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1f443a1728a1d28168090c6f54e82dec2cb073e9
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp
Gerrit-Reviewer: David Knupp
Gerrit-Reviewer: Harrison Sheinblatt
Gerrit-Reviewer: Martin Grund
Gerrit-Reviewer: Michael Brown
Gerrit-HasComments: Yes
[Impala-ASF-CR] Enabling end-to-end tests on a remote cluster
David Knupp has uploaded a new patch set (#4).

Change subject: Enabling end-to-end tests on a remote cluster

Enabling end-to-end tests on a remote cluster

This patch enables data loading and running end-to-end tests on a remote cluster. The requirements to run the tests on a remote cluster are:

- CDH cluster that is CM managed
- KMS and KeyTrustee installed and available as service
- Hive warehouse dir points to /test-warehouse

The new remote_data_load.py script takes a CM host as argument and will load the test warehouse snapshot on the first cluster managed by this instance of CM. It will automatically pick up the configuration needed to perform the data load process.

Usage: remote_data_load.py [options] cm_host

Options:
  -h, --help            show this help message and exit
  --cm-user=CM_USER     Cloudera Manager admin user
  --cm-pass=CM_PASS     Cloudera Manager admin user password
  --gateway=GATEWAY     Gateway host to upload the data from. If not set, uses
                        the CM host as gateway.
  --ssh-user=SSH_USER   System user on the remote machine with passwordless SSH
                        configured.
  --no-load             Do not try to load the snapshot
  --exploration-strategy=EXPLORATION_STRATEGY
  --test                Run end-to-end tests against cluster

Change-Id: I1f443a1728a1d28168090c6f54e82dec2cb073e9
---
M bin/load-data.py
A bin/remote_data_load.py
M testdata/bin/compute-table-stats.sh
M testdata/bin/create-load-data.sh
M testdata/bin/create-table-many-blocks.sh
M testdata/bin/generate-schema-statements.py
M testdata/bin/load-test-warehouse-snapshot.sh
M testdata/bin/load_nested.py
M testdata/bin/run-step.sh
M testdata/bin/setup-hdfs-env.sh
10 files changed, 575 insertions(+), 60 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/4769/4
--
To view, visit http://gerrit.cloudera.org:8080/4769
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1f443a1728a1d28168090c6f54e82dec2cb073e9
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp
Gerrit-Reviewer: David Knupp
Gerrit-Reviewer: Harrison Sheinblatt
Gerrit-Reviewer: Martin Grund
Gerrit-Reviewer: Michael Brown
[Impala-ASF-CR] Enabling end-to-end tests on a remote cluster
David Knupp has uploaded a new patch set (#3).

Change subject: Enabling end-to-end tests on a remote cluster

Enabling end-to-end tests on a remote cluster

This patch enables data loading and running end-to-end tests on a remote cluster. The requirements to run the tests on a remote cluster are:

- CDH cluster that is CM managed
- KMS and KeyTrustee installed and available as service
- Hive warehouse dir points to /test-warehouse

The new remote_data_load.py script takes a CM host as argument and will load the test warehouse snapshot on the first cluster managed by this instance of CM. It will automatically pick up the configuration needed to perform the data load process.

Usage: remote_data_load.py [options] cm_host

Options:
  -h, --help            show this help message and exit
  --cm-user=CM_USER     Cloudera Manager admin user
  --cm-pass=CM_PASS     Cloudera Manager admin user password
  --gateway=GATEWAY     Gateway host to upload the data from. If not set, uses
                        the CM host as gateway.
  --ssh-user=SSH_USER   System user on the remote machine with passwordless SSH
                        configured.
  --no-load             Do not try to load the snapshot
  --exploration-strategy=EXPLORATION_STRATEGY
  --test                Run end-to-end tests against cluster

Change-Id: I1f443a1728a1d28168090c6f54e82dec2cb073e9
---
M bin/load-data.py
A bin/remote_data_load.py
M testdata/bin/compute-table-stats.sh
M testdata/bin/create-load-data.sh
M testdata/bin/create-table-many-blocks.sh
M testdata/bin/generate-schema-statements.py
M testdata/bin/load-test-warehouse-snapshot.sh
M testdata/bin/load_nested.py
M testdata/bin/run-step.sh
M testdata/bin/setup-hdfs-env.sh
10 files changed, 574 insertions(+), 58 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/4769/3
--
To view, visit http://gerrit.cloudera.org:8080/4769
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1f443a1728a1d28168090c6f54e82dec2cb073e9
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp
Gerrit-Reviewer: David Knupp
Gerrit-Reviewer: Harrison Sheinblatt
Gerrit-Reviewer: Martin Grund
Gerrit-Reviewer: Michael Brown