Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15966 )

Change subject: WIP - IMPALA-9793: Impala quickstart cluster with docker-compose
......................................................................


Patch Set 6:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/15966/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15966/6//COMMIT_MSG@26
PS6, Line 26:   docker network create -d bridge impala-quickstart-network
I can add this step to the Kudu quickstart so we align.


http://gerrit.cloudera.org:8080/#/c/15966/6//COMMIT_MSG@35
PS6, Line 35:                  -f docker/quickstart-load-data.yml up -d
Could the data loader just be run as a standalone image without compose? 
Similar to how we run the nifi image here:
https://github.com/apache/kudu/blob/master/examples/quickstart/nifi/README.adoc#run-apache-nifi


http://gerrit.cloudera.org:8080/#/c/15966/6//COMMIT_MSG@40
PS6, Line 40:   pip install impala-shell
The Kudu quickstart did this in an image because impala-shell won't 
work/install on all environments.


http://gerrit.cloudera.org:8080/#/c/15966/6//COMMIT_MSG@49
PS6, Line 49: * KUDU_QUICKSTART_VERSION - defaults to 1.12.0, can be overridden 
to a
This could probably default to latest too.


http://gerrit.cloudera.org:8080/#/c/15966/6//COMMIT_MSG@57
PS6, Line 57: * Factor out some of the CMake into a docker build
I am working on changing the kudu script to be python and much more user 
friendly. We have found our tagging policy works well so far: 
https://github.com/apache/kudu/blob/master/docker/docker-build.sh#L25-L49


http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart-kudu.yml
File docker/quickstart-kudu.yml:

http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart-kudu.yml@40
PS6, Line 40:         --memory_limit_hard_bytes=1073741824
Why did you need to set this? (here and below)


http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart-kudu.yml@42
PS6, Line 42:       - impala-quickstart-network
It would be great if we could use the actual kudu quickstart here and 
coordinate on the network. We could use an environment variable with a default 
so users can set it. If we use generic and unified variables things would just 
work for the end user:

   ${QUICKSTART_IP:?Please set QUICKSTART_IP environment variable}
   ${QUICKSTART_NETWORK:-quickstart-network}

We can also update the guide to simply curl the raw yml file instead of 
requiring users to checkout the repository.

Recent updates to the docker images also make it so we need fewer flags. I will 
make an adjustment Kudu side too.


http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart_conf/hive-site.xml
File docker/quickstart_conf/hive-site.xml:

http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart_conf/hive-site.xml@22
PS6, Line 22: <configuration>
There is a bit more here than Kudu needed for the quickstart. Maybe adding a 
comment on why each one is needed or what it is for would be useful.
https://github.com/apache/kudu/blob/master/docker/impala/etc/hive/conf/hive-site.xml


http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart_data_loader/Dockerfile
File docker/quickstart_data_loader/Dockerfile:

http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart_data_loader/Dockerfile@30
PS6, Line 30: LABEL name="Apache Impala Quickstart Data Loader" \
For docker caching purposes its a good idea to be fast/cheap and frequently 
changing things like this at the bottom. Otherwise the commands below will need 
to run everytime the git hash changes.


http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart_data_loader/Dockerfile@42
PS6, Line 42: RUN apt-get update && \
Switching to a multi-stage build at some point would help avoid this 
duplication across images.


http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart_data_loader/Dockerfile@49
PS6, Line 49: RUN pip install impala-shell
Would it make sense to use the source based impala-shell?


http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart_data_loader/load_tpcds_kudu.sql
File docker/quickstart_data_loader/load_tpcds_kudu.sql:

http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart_data_loader/load_tpcds_kudu.sql@18
PS6, Line 18: ---- Template SQL statements to create and load TPCDS tables in 
Kudu.
Don't these live somewhere else in the impala repo already?


http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart_data_loader/load_tpcds_parquet.sql
File docker/quickstart_data_loader/load_tpcds_parquet.sql:

http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart_data_loader/load_tpcds_parquet.sql@18
PS6, Line 18: -- Create text tables on top of raw text data.
Don't these live somewhere else in the impala repo already?


http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart_hms/Dockerfile
File docker/quickstart_hms/Dockerfile:

http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart_hms/Dockerfile@65
PS6, Line 65: EXPOSE 9083
You shouldn't need this. The docker compose or docker run commands can specify 
it.



--
To view, visit http://gerrit.cloudera.org:8080/15966
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifc0b862af40a368381ada7ec2a355fe4b0aa778c
Gerrit-Change-Number: 15966
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Grant Henke <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Fri, 29 May 2020 14:45:48 +0000
Gerrit-HasComments: Yes

Reply via email to