Michael Brown has posted comments on this change. Change subject: IMPALA-3980: Re-enable Hive as a target database for the Random Query Generator ......................................................................
Patch Set 2: (6 comments) http://gerrit.cloudera.org:8080/#/c/4011/2//COMMIT_MSG Commit Message: PS2, Line 7: IMPALA-3980: Re-enable Hive as a target database for the Random Query Generator I prefer commit messages that, after the bug, speak of the code section changed. So something like: "IMPALA-3890: qgen: reenable Hive as a target database" PS2, Line 19: * Hive integration tested locally by invoking the data generator (./data-generator.py : --db-name=functional --use-hive --min-row-count=50 --max-row-count=100 : --storage-file-formats textfile --use-postgresql --postgresql-user stakiar) and the : discrepancy checker (./discrepancy-checker.py --db-name=functional --use-hive : --use-postgresql --postgresql-user stakiar --test-db-type HIVE --timeout 300 : --query-count 50 --profile hive) While it makes the commit message have more lines, it is more readable to write the commands with breaks across options, like this: ./data_generator.py \ --db-name=functional \ --use-hive \ --min-row-count=50 \ --max-row-count=100 \ --storage-file-formats textfile \ --use-postgresql \ --postgresql-user stakiar http://gerrit.cloudera.org:8080/#/c/4011/2/tests/comparison/cli_options.py File tests/comparison/cli_options.py: PS2, Line 150: group.add_argument('--hive-host', default='localhost', : help="The name of the host running the HS2") : group.add_argument("--hive-port", default=10000, type=int, : help="The port of HiveServer2") Why not derive these defaults from symbols imported from cluster.py? (The constants I've asked you to create in that file) http://gerrit.cloudera.org:8080/#/c/4011/2/tests/comparison/cluster.py File tests/comparison/cluster.py: PS2, Line 166: def __init__(self, hive_host, hive_port, num_impalads=3): This changes the MiniCluster constructor interface for all callers. It also implicitly changes the cli_options.create_cluster() call, which has several callers. What do you think about defining 127.0.0.1 and 11050 (from the old code L179) as default hive host and port as constants at the top of this module? Then make hive_host and hive_port on this line optional, defaulting to those constants. PS2, Line 178: node_conf_dir = os.environ["HADOOP_CONF_DIR"] Thanks for doing the cleanups like this one. http://gerrit.cloudera.org:8080/#/c/4011/2/tests/comparison/data_generator.py File tests/comparison/data_generator.py: PS2, Line 101: def populate_db(self, table_count, postgresql_conn=None, isHive=False): We don't use variable names of the format "isHive" typically, so can you change this to is_hive? -- To view, visit http://gerrit.cloudera.org:8080/4011 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ifb1199b50a5b65c21de7876fb70cc03bda1a9b46 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: stak...@cloudera.com Gerrit-Reviewer: David Knupp <dkn...@cloudera.com> Gerrit-Reviewer: Henry Robinson <he...@cloudera.com> Gerrit-Reviewer: Michael Brown <mi...@cloudera.com> Gerrit-Reviewer: Taras Bobrovytsky <tbobrovyt...@cloudera.com> Gerrit-Reviewer: stak...@cloudera.com Gerrit-HasComments: Yes