Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-4365: Enabling end-to-end tests on a remote cluster ......................................................................
Patch Set 7: (20 comments) http://gerrit.cloudera.org:8080/#/c/4769/7/bin/remote_data_load.py File bin/remote_data_load.py: PS7, Line 17: This script is a setup script How about simply "this is a setup script" PS7, Line 21: The script expects a setup Impala development environment. How about: This script should be executed from a machine that has the Impala development environment set up. PS7, Line 22: appropriately It's not clear what this means. Maybe expand on this a little? PS7, Line 47: import logging import statements should be above from x import y PS7, Line 111: , remove PS7, Line 116: that, remove PS7, Line 127: not having a not here makes it harder to reason about, how about: options.gateway if options.gateway else cm_host PS7, Line 130: password=options.cm_pass) Is this indent correct according to the style guide? I thought it was supposed to be double the regular indent (so 8 spaces)? PS7, Line 204: Only pick the first cluster configured This comment does not really add anything, it's obvious what the code is doing. Maybe explain WHY we are getting only the first? PS7, Line 209: for service in cluster.get_all_services(): Just an observation: it looks like we don't check that all services we need are present. For example if IMPALA is not in the list, this will not get caught in this method. If Impala is not present, there will be an exception on line 275. Maybe instead of iterating way, we can first construct a dictionary with service.type as key and service as value. Then pull out one service after another to populate result. If a service is missing it will be caught earlier this way. What do you think? Line 281: # See https://issues.cloudera.org/browse/IMPALA-4365 It looks like the issue is resolved, is this still necessary? PS7, Line 284: self.options.snapshot_file is not None: How about removing the not and switching the if and else clause? PS7, Line 338: "database_name" Why would it ever be called "database_name"? Where is this constant coming from? PS7, Line 432: wmay may PS7, Line 451: # "--skip_local_tests", why is this commented out? http://gerrit.cloudera.org:8080/#/c/4769/7/testdata/bin/create-load-data.sh File testdata/bin/create-load-data.sh: Line 163: set -e Don't we already set -eou at the top of the script? Either remove or add a comment why we need to set -e again. Line 492: run-step "Loading nested data" load-nested.log \ The command is very similar in both cases. The only difference is --cm-host. How about factoring that out? http://gerrit.cloudera.org:8080/#/c/4769/7/testdata/bin/generate-schema-statements.py File testdata/bin/generate-schema-statements.py: Line 411: if os.getenv("REMOTE_LOAD"): base_load_dir = os.getenv("REMOTE_LOAD") or os.getenv("IMPALA_HOME") http://gerrit.cloudera.org:8080/#/c/4769/7/testdata/bin/load_nested.py File testdata/bin/load_nested.py: Line 301: parser.add_argument("-s", "--source-db", default="tpch_parquet") don't we also need cm-host parameter here? http://gerrit.cloudera.org:8080/#/c/4769/7/testdata/bin/setup-hdfs-env.sh File testdata/bin/setup-hdfs-env.sh: Line 20: set -e we are already setting -euo pipefail below -- To view, visit http://gerrit.cloudera.org:8080/4769 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1f443a1728a1d28168090c6f54e82dec2cb073e9 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David Knupp <[email protected]> Gerrit-Reviewer: David Knupp <[email protected]> Gerrit-Reviewer: Harrison Sheinblatt <[email protected]> Gerrit-Reviewer: Martin Grund <[email protected]> Gerrit-Reviewer: Michael Brown <[email protected]> Gerrit-Reviewer: Taras Bobrovytsky <[email protected]> Gerrit-HasComments: Yes
