Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15966 )

Change subject: WIP - IMPALA-9793: Impala quickstart cluster with docker-compose
......................................................................


Patch Set 6:

(17 comments)

Didn't quite address all of the issues, but wanted to push out another revision 
since I had a bunch of changes queued up.

http://gerrit.cloudera.org:8080/#/c/15966/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15966/6//COMMIT_MSG@26
PS6, Line 26:   docker network create -d bridge impala-quickstart-network
> I can add this step to the Kudu quickstart so we align.
Sounds good. What network name do we want to use?


http://gerrit.cloudera.org:8080/#/c/15966/6//COMMIT_MSG@35
PS6, Line 35:                  -f docker/quickstart-load-data.yml up -d
> Could the data loader just be run as a standalone image without compose? Si
You can definitely do that, I think the nice thing about compose is that it 
makes sense that the volumes and network are configured right.


http://gerrit.cloudera.org:8080/#/c/15966/6//COMMIT_MSG@40
PS6, Line 40:   pip install impala-shell
> The Kudu quickstart did this in an image because impala-shell won't work/in
Yeah, I had someone else test this out and it was a headache to pip install 
impala-shell even on Ubuntu 20.04.


http://gerrit.cloudera.org:8080/#/c/15966/6//COMMIT_MSG@49
PS6, Line 49: * KUDU_QUICKSTART_VERSION - defaults to 1.12.0, can be overridden 
to a
> This could probably default to latest too.
Done


http://gerrit.cloudera.org:8080/#/c/15966/6//COMMIT_MSG@61
PS6, Line 61: * Upload latest version of containers before merging.
I should add some better instructions on how to do things like copy data into 
the warehouse with 'docker cp'


http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart-kudu.yml
File docker/quickstart-kudu.yml:

http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart-kudu.yml@40
PS6, Line 40:         --memory_limit_hard_bytes=1073741824
> Why did you need to set this? (here and below)
I was seeing the tservers using quite a lot of memory without this after 
loading tpc-ds (~1.5GB - 2GB each). I used "docker stats". It seemed like that 
might be a problem on more memory-constrained systems.


http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart-kudu.yml@42
PS6, Line 42:       - impala-quickstart-network
> It would be great if we could use the actual kudu quickstart here and coord
Yeah I think this makes sense. We'd have to be pickier about the network name 
cause of the sensitivity to underscores, but I think your proposal would work 
fine.


http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart-load-data.yml
File docker/quickstart-load-data.yml:

http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart-load-data.yml@27
PS6, Line 27:       - ./quickstart_conf:/opt/impala/conf:ro
Not sure that the config is needed


http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart.yml
File docker/quickstart.yml:

http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart.yml@39
PS6, Line 39:       - "25010:25010"
I locked down the default config so that it doesn't automatically listen on all 
interfaces. Ideally it would default to listening on IMPALA_QUICKSTART_IP, but 
I couldn't figure out a way to do that with docker-compose's limited variable 
substitution.


http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart_conf/hive-site.xml
File docker/quickstart_conf/hive-site.xml:

http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart_conf/hive-site.xml@22
PS6, Line 22: <configuration>
> There is a bit more here than Kudu needed for the quickstart. Maybe adding
Done. I think I actually was missing some of the required HMS notification 
configs, so it was good to double check


http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart_data_loader/Dockerfile
File docker/quickstart_data_loader/Dockerfile:

PS6:
I changed this to also be usable as an impala-shell container.


http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart_data_loader/Dockerfile@30
PS6, Line 30: LABEL name="Apache Impala Quickstart Data Loader" \
> For docker caching purposes its a good idea to be fast/cheap and frequently
Done


http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart_data_loader/Dockerfile@42
PS6, Line 42: RUN apt-get update && \
> Switching to a multi-stage build at some point would help avoid this duplic
Yeah I think it would be worthwhile combining this with the other 
quickstart-specific image. I'll probably keep the daemon images separate so 
they can have a more minimal set of dependencies.


http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart_data_loader/Dockerfile@49
PS6, Line 49: RUN pip install impala-shell
> Would it make sense to use the source based impala-shell?
This was kinda a shortcut, would probably make sense to build and include the 
latest impala-shell but I think I'd like to punt on that.


http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart_data_loader/load_tpcds_kudu.sql
File docker/quickstart_data_loader/load_tpcds_kudu.sql:

http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart_data_loader/load_tpcds_kudu.sql@18
PS6, Line 18: ---- Template SQL statements to create and load TPCDS tables in 
Kudu.
> Don't these live somewhere else in the impala repo already?
Yeah, this was copied and modified from 
testdata/datasets/tpcds/tpcds_kudu_template.sql. The non-Kudu schemas are 
defined here - testdata/datasets/tpcds/tpcds_schema_template.sql


The duplication was preferable, for me, over trying to bring the test data 
loading framework into it to process those files. It's also nice that loading 
the data is just running a SQL file, without indirection or templating.


http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart_data_loader/load_tpcds_parquet.sql
File docker/quickstart_data_loader/load_tpcds_parquet.sql:

http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart_data_loader/load_tpcds_parquet.sql@18
PS6, Line 18: -- Create text tables on top of raw text data.
> Don't these live somewhere else in the impala repo already?
Same comment as for Kudu.


http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart_hms/Dockerfile
File docker/quickstart_hms/Dockerfile:

http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart_hms/Dockerfile@65
PS6, Line 65: EXPOSE 9083
> You shouldn't need this. The docker compose or docker run commands can spec
Done



--
To view, visit http://gerrit.cloudera.org:8080/15966
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifc0b862af40a368381ada7ec2a355fe4b0aa778c
Gerrit-Change-Number: 15966
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Grant Henke <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Fri, 12 Jun 2020 04:07:23 +0000
Gerrit-HasComments: Yes

Reply via email to