Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/15966 )
Change subject: WIP - IMPALA-9793: Impala quickstart cluster with docker-compose ...................................................................... Patch Set 6: (14 comments) http://gerrit.cloudera.org:8080/#/c/15966/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15966/6//COMMIT_MSG@26 PS6, Line 26: docker network create -d bridge impala-quickstart-network I can add this step to the Kudu quickstart so we align. http://gerrit.cloudera.org:8080/#/c/15966/6//COMMIT_MSG@35 PS6, Line 35: -f docker/quickstart-load-data.yml up -d Could the data loader just be run as a standalone image without compose? Similar to how we run the nifi image here: https://github.com/apache/kudu/blob/master/examples/quickstart/nifi/README.adoc#run-apache-nifi http://gerrit.cloudera.org:8080/#/c/15966/6//COMMIT_MSG@40 PS6, Line 40: pip install impala-shell The Kudu quickstart did this in an image because impala-shell won't work/install on all environments. http://gerrit.cloudera.org:8080/#/c/15966/6//COMMIT_MSG@49 PS6, Line 49: * KUDU_QUICKSTART_VERSION - defaults to 1.12.0, can be overridden to a This could probably default to latest too. http://gerrit.cloudera.org:8080/#/c/15966/6//COMMIT_MSG@57 PS6, Line 57: * Factor out some of the CMake into a docker build I am working on changing the kudu script to be python and much more user friendly. We have found our tagging policy works well so far: https://github.com/apache/kudu/blob/master/docker/docker-build.sh#L25-L49 http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart-kudu.yml File docker/quickstart-kudu.yml: http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart-kudu.yml@40 PS6, Line 40: --memory_limit_hard_bytes=1073741824 Why did you need to set this? (here and below) http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart-kudu.yml@42 PS6, Line 42: - impala-quickstart-network It would be great if we could use the actual kudu quickstart here and coordinate on the network. We could use an environment variable with a default so users can set it. If we use generic and unified variables things would just work for the end user: ${QUICKSTART_IP:?Please set QUICKSTART_IP environment variable} ${QUICKSTART_NETWORK:-quickstart-network} We can also update the guide to simply curl the raw yml file instead of requiring users to checkout the repository. Recent updates to the docker images also make it so we need fewer flags. I will make an adjustment Kudu side too. http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart_conf/hive-site.xml File docker/quickstart_conf/hive-site.xml: http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart_conf/hive-site.xml@22 PS6, Line 22: <configuration> There is a bit more here than Kudu needed for the quickstart. Maybe adding a comment on why each one is needed or what it is for would be useful. https://github.com/apache/kudu/blob/master/docker/impala/etc/hive/conf/hive-site.xml http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart_data_loader/Dockerfile File docker/quickstart_data_loader/Dockerfile: http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart_data_loader/Dockerfile@30 PS6, Line 30: LABEL name="Apache Impala Quickstart Data Loader" \ For docker caching purposes its a good idea to be fast/cheap and frequently changing things like this at the bottom. Otherwise the commands below will need to run everytime the git hash changes. http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart_data_loader/Dockerfile@42 PS6, Line 42: RUN apt-get update && \ Switching to a multi-stage build at some point would help avoid this duplication across images. http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart_data_loader/Dockerfile@49 PS6, Line 49: RUN pip install impala-shell Would it make sense to use the source based impala-shell? http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart_data_loader/load_tpcds_kudu.sql File docker/quickstart_data_loader/load_tpcds_kudu.sql: http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart_data_loader/load_tpcds_kudu.sql@18 PS6, Line 18: ---- Template SQL statements to create and load TPCDS tables in Kudu. Don't these live somewhere else in the impala repo already? http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart_data_loader/load_tpcds_parquet.sql File docker/quickstart_data_loader/load_tpcds_parquet.sql: http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart_data_loader/load_tpcds_parquet.sql@18 PS6, Line 18: -- Create text tables on top of raw text data. Don't these live somewhere else in the impala repo already? http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart_hms/Dockerfile File docker/quickstart_hms/Dockerfile: http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart_hms/Dockerfile@65 PS6, Line 65: EXPOSE 9083 You shouldn't need this. The docker compose or docker run commands can specify it. -- To view, visit http://gerrit.cloudera.org:8080/15966 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifc0b862af40a368381ada7ec2a355fe4b0aa778c Gerrit-Change-Number: 15966 Gerrit-PatchSet: 6 Gerrit-Owner: Tim Armstrong <[email protected]> Gerrit-Reviewer: Grant Henke <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Comment-Date: Fri, 29 May 2020 14:45:48 +0000 Gerrit-HasComments: Yes
