Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15966 )

Change subject: IMPALA-9793: Impala quickstart cluster with docker-compose
......................................................................


Patch Set 9:

(7 comments)

I haven't tried running this, but a lot of this makes sense to me.

http://gerrit.cloudera.org:8080/#/c/15966/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15966/9//COMMIT_MSG@26
PS9, Line 26: Instructions for running the quickstart cluster are in
            : docker/quickstart.yml.
One thing I saw over in Kudu is that they have the docker compose instructions 
in the README.adoc for the docker directory: 
https://github.com/apache/kudu/tree/master/docker
I think that is convenient. We should consider putting the documentation there.


http://gerrit.cloudera.org:8080/#/c/15966/9/docker/quickstart.yml
File docker/quickstart.yml:

http://gerrit.cloudera.org:8080/#/c/15966/9/docker/quickstart.yml@20
PS9, Line 20: # All filesystem data is stored in Docker volumes. The default 
storage location for tables
            : # is in the impala-quickstart-warehouse volume, i.e. if you 
create a table in Impala, it
            : # will be stored in that volume by default.
This could be a follow-on change, but we may want to have some documentation 
about ingesting user data to this volume.


http://gerrit.cloudera.org:8080/#/c/15966/9/docker/quickstart.yml@51
PS9, Line 51: # To load data in background into Parquet and Kudu formats:
            : #
            : #  docker-compose -f docker/quickstart.yml -f 
docker/quickstart-kudu-minimal.yml \
            : #                     -f docker/quickstart-load-data.yml up -d
Can the dataload part run as a separate docker-compose command? i.e.

# Start the cluster with Kudu
docker-compose -f docker/quickstart.yml -f docker/quickstart-kudu-minimal.yml 
up -d

# Load data for Parquet/Kudu
docker-compose -f docker/quickstart-load-data.yml up -d

Is splitting that off a bit clearer?


http://gerrit.cloudera.org:8080/#/c/15966/9/docker/quickstart_client/Dockerfile
File docker/quickstart_client/Dockerfile:

http://gerrit.cloudera.org:8080/#/c/15966/9/docker/quickstart_client/Dockerfile@52
PS9, Line 52: # Copy the Hive install.
Nit: Not really Hive related. Maybe change this to "Copy the client entrypoint 
and dataload files" or something like that?


http://gerrit.cloudera.org:8080/#/c/15966/9/docker/quickstart_client/data-load-entrypoint.sh
File docker/quickstart_client/data-load-entrypoint.sh:

http://gerrit.cloudera.org:8080/#/c/15966/9/docker/quickstart_client/data-load-entrypoint.sh@32
PS9, Line 32: LOading
Nit: capitalization typo


http://gerrit.cloudera.org:8080/#/c/15966/9/docker/quickstart_client/data-load-entrypoint.sh@35
PS9, Line 35:   
TPCDS_TARBALL=tpc-ds-${TPCDS_VERSION}-gcc-4.9.2-ec2-package-ubuntu-18-04.tar.gz
Nit: The client base image is using Ubuntu 16 by default, and we are 
downloading Ubuntu 18 things here. I doubt there is any problem as tpc-ds is C 
code, so it might be fine as-is. Since we are not really using anything built 
from the Impala dev env, we could just always have the client on Ubuntu 18. It 
should still be buildable from Ubuntu 16.


http://gerrit.cloudera.org:8080/#/c/15966/9/docker/quickstart_conf/hive-site.xml
File docker/quickstart_conf/hive-site.xml:

http://gerrit.cloudera.org:8080/#/c/15966/9/docker/quickstart_conf/hive-site.xml@28
PS9, Line 28:      <property>
            :         <!-- Required for automatic metadata sync. -->
            :         
<name>hive.metastore.notifications.add.thrift.objects</name>
            :         <value>true</value>
            :       </property>
            :       <property>
            :         <!-- Required for automatic metadata sync. -->
            :         <name>hive.metastore.alter.notifications.basic</name>
            :         <value>false</value>
            :       </property>
            :       <property>
            :         <!-- User impala is not authorized to consume 
notifications by default, disable
            :              authentication to work around this. -->
            :          
<name>hive.metastore.event.db.notification.api.auth</name>
            :         <value>false</value>
            :       </property>
When I'm looking at 
https://github.com/apache/impala/blob/master/fe/src/test/resources/hive-site.xml.py
 , some of these settings are no longer used when we run with Hive 3+. This may 
be a bit stale. Should the hive-site.xml be basically what we use for our tests 
with some modifications for the derby database and some hostnames? (And also 
minus things needed only for Hive Server and not HMS.)

This is an area where things may be difficult to evaluate without tests.



--
To view, visit http://gerrit.cloudera.org:8080/15966
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifc0b862af40a368381ada7ec2a355fe4b0aa778c
Gerrit-Change-Number: 15966
Gerrit-PatchSet: 9
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Grant Henke <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Thu, 14 Jan 2021 23:28:02 +0000
Gerrit-HasComments: Yes

Reply via email to