Michael Brown has posted comments on this change.

Change subject: IMPALA-3980: Re-enable Hive as a target database for the Random 
Query Generator

Patch Set 2:


Commit Message:

PS2, Line 7: IMPALA-3980: Re-enable Hive as a target database for the Random 
Query Generator
I prefer commit messages that, after the bug, speak of the code section 
changed. So something like:

"IMPALA-3890: qgen: reenable Hive as a target database"

PS2, Line 19: * Hive integration tested locally by invoking the data generator 
            : --db-name=functional --use-hive --min-row-count=50 
            : --storage-file-formats textfile --use-postgresql 
--postgresql-user stakiar) and the
            : discrepancy checker (./discrepancy-checker.py 
--db-name=functional --use-hive
            : --use-postgresql --postgresql-user stakiar --test-db-type HIVE 
--timeout 300
            : --query-count 50 --profile hive)
While it makes the commit message have more lines, it is more readable to write 
the commands with breaks across options, like this:

  ./data_generator.py \
    --db-name=functional \
    --use-hive \
    --min-row-count=50 \
    --max-row-count=100 \
    --storage-file-formats textfile \
    --use-postgresql \
    --postgresql-user stakiar

File tests/comparison/cli_options.py:

PS2, Line 150:   group.add_argument('--hive-host', default='localhost',
             :       help="The name of the host running the HS2")
             :   group.add_argument("--hive-port", default=10000, type=int,
             :       help="The port of HiveServer2")
Why not derive these defaults from symbols imported from cluster.py? (The 
constants I've asked you to create in that file)

File tests/comparison/cluster.py:

PS2, Line 166:   def __init__(self, hive_host, hive_port, num_impalads=3):
This changes the MiniCluster constructor interface for all callers. It also 
implicitly changes the cli_options.create_cluster() call, which has several 

What do you think about defining and 11050 (from the old code L179) 
as default hive host and port as constants at the top of this module? Then make 
hive_host and hive_port on this line optional, defaulting to those constants.

PS2, Line 178:     node_conf_dir = os.environ["HADOOP_CONF_DIR"]
Thanks for doing the cleanups like this one.

File tests/comparison/data_generator.py:

PS2, Line 101:   def populate_db(self, table_count, postgresql_conn=None, 
We don't use variable names of the format "isHive" typically, so can you change 
this to is_hive?

To view, visit http://gerrit.cloudera.org:8080/4011
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb1199b50a5b65c21de7876fb70cc03bda1a9b46
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: stak...@cloudera.com
Gerrit-Reviewer: David Knupp <dkn...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tbobrovyt...@cloudera.com>
Gerrit-Reviewer: stak...@cloudera.com
Gerrit-HasComments: Yes

Reply via email to