Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/15966 )
Change subject: IMPALA-9793: Impala quickstart cluster with docker-compose ...................................................................... Patch Set 9: (7 comments) I haven't tried running this, but a lot of this makes sense to me. http://gerrit.cloudera.org:8080/#/c/15966/9//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15966/9//COMMIT_MSG@26 PS9, Line 26: Instructions for running the quickstart cluster are in : docker/quickstart.yml. One thing I saw over in Kudu is that they have the docker compose instructions in the README.adoc for the docker directory: https://github.com/apache/kudu/tree/master/docker I think that is convenient. We should consider putting the documentation there. http://gerrit.cloudera.org:8080/#/c/15966/9/docker/quickstart.yml File docker/quickstart.yml: http://gerrit.cloudera.org:8080/#/c/15966/9/docker/quickstart.yml@20 PS9, Line 20: # All filesystem data is stored in Docker volumes. The default storage location for tables : # is in the impala-quickstart-warehouse volume, i.e. if you create a table in Impala, it : # will be stored in that volume by default. This could be a follow-on change, but we may want to have some documentation about ingesting user data to this volume. http://gerrit.cloudera.org:8080/#/c/15966/9/docker/quickstart.yml@51 PS9, Line 51: # To load data in background into Parquet and Kudu formats: : # : # docker-compose -f docker/quickstart.yml -f docker/quickstart-kudu-minimal.yml \ : # -f docker/quickstart-load-data.yml up -d Can the dataload part run as a separate docker-compose command? i.e. # Start the cluster with Kudu docker-compose -f docker/quickstart.yml -f docker/quickstart-kudu-minimal.yml up -d # Load data for Parquet/Kudu docker-compose -f docker/quickstart-load-data.yml up -d Is splitting that off a bit clearer? http://gerrit.cloudera.org:8080/#/c/15966/9/docker/quickstart_client/Dockerfile File docker/quickstart_client/Dockerfile: http://gerrit.cloudera.org:8080/#/c/15966/9/docker/quickstart_client/Dockerfile@52 PS9, Line 52: # Copy the Hive install. Nit: Not really Hive related. Maybe change this to "Copy the client entrypoint and dataload files" or something like that? http://gerrit.cloudera.org:8080/#/c/15966/9/docker/quickstart_client/data-load-entrypoint.sh File docker/quickstart_client/data-load-entrypoint.sh: http://gerrit.cloudera.org:8080/#/c/15966/9/docker/quickstart_client/data-load-entrypoint.sh@32 PS9, Line 32: LOading Nit: capitalization typo http://gerrit.cloudera.org:8080/#/c/15966/9/docker/quickstart_client/data-load-entrypoint.sh@35 PS9, Line 35: TPCDS_TARBALL=tpc-ds-${TPCDS_VERSION}-gcc-4.9.2-ec2-package-ubuntu-18-04.tar.gz Nit: The client base image is using Ubuntu 16 by default, and we are downloading Ubuntu 18 things here. I doubt there is any problem as tpc-ds is C code, so it might be fine as-is. Since we are not really using anything built from the Impala dev env, we could just always have the client on Ubuntu 18. It should still be buildable from Ubuntu 16. http://gerrit.cloudera.org:8080/#/c/15966/9/docker/quickstart_conf/hive-site.xml File docker/quickstart_conf/hive-site.xml: http://gerrit.cloudera.org:8080/#/c/15966/9/docker/quickstart_conf/hive-site.xml@28 PS9, Line 28: <property> : <!-- Required for automatic metadata sync. --> : <name>hive.metastore.notifications.add.thrift.objects</name> : <value>true</value> : </property> : <property> : <!-- Required for automatic metadata sync. --> : <name>hive.metastore.alter.notifications.basic</name> : <value>false</value> : </property> : <property> : <!-- User impala is not authorized to consume notifications by default, disable : authentication to work around this. --> : <name>hive.metastore.event.db.notification.api.auth</name> : <value>false</value> : </property> When I'm looking at https://github.com/apache/impala/blob/master/fe/src/test/resources/hive-site.xml.py , some of these settings are no longer used when we run with Hive 3+. This may be a bit stale. Should the hive-site.xml be basically what we use for our tests with some modifications for the derby database and some hostnames? (And also minus things needed only for Hive Server and not HMS.) This is an area where things may be difficult to evaluate without tests. -- To view, visit http://gerrit.cloudera.org:8080/15966 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifc0b862af40a368381ada7ec2a355fe4b0aa778c Gerrit-Change-Number: 15966 Gerrit-PatchSet: 9 Gerrit-Owner: Tim Armstrong <[email protected]> Gerrit-Reviewer: Grant Henke <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Comment-Date: Thu, 14 Jan 2021 23:28:02 +0000 Gerrit-HasComments: Yes
