Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15966 )
Change subject: WIP - IMPALA-9793: Impala quickstart cluster with docker-compose ...................................................................... Patch Set 6: (17 comments) Didn't quite address all of the issues, but wanted to push out another revision since I had a bunch of changes queued up. http://gerrit.cloudera.org:8080/#/c/15966/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15966/6//COMMIT_MSG@26 PS6, Line 26: docker network create -d bridge impala-quickstart-network > I can add this step to the Kudu quickstart so we align. Sounds good. What network name do we want to use? http://gerrit.cloudera.org:8080/#/c/15966/6//COMMIT_MSG@35 PS6, Line 35: -f docker/quickstart-load-data.yml up -d > Could the data loader just be run as a standalone image without compose? Si You can definitely do that, I think the nice thing about compose is that it makes sense that the volumes and network are configured right. http://gerrit.cloudera.org:8080/#/c/15966/6//COMMIT_MSG@40 PS6, Line 40: pip install impala-shell > The Kudu quickstart did this in an image because impala-shell won't work/in Yeah, I had someone else test this out and it was a headache to pip install impala-shell even on Ubuntu 20.04. http://gerrit.cloudera.org:8080/#/c/15966/6//COMMIT_MSG@49 PS6, Line 49: * KUDU_QUICKSTART_VERSION - defaults to 1.12.0, can be overridden to a > This could probably default to latest too. Done http://gerrit.cloudera.org:8080/#/c/15966/6//COMMIT_MSG@61 PS6, Line 61: * Upload latest version of containers before merging. I should add some better instructions on how to do things like copy data into the warehouse with 'docker cp' http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart-kudu.yml File docker/quickstart-kudu.yml: http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart-kudu.yml@40 PS6, Line 40: --memory_limit_hard_bytes=1073741824 > Why did you need to set this? (here and below) I was seeing the tservers using quite a lot of memory without this after loading tpc-ds (~1.5GB - 2GB each). I used "docker stats". It seemed like that might be a problem on more memory-constrained systems. http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart-kudu.yml@42 PS6, Line 42: - impala-quickstart-network > It would be great if we could use the actual kudu quickstart here and coord Yeah I think this makes sense. We'd have to be pickier about the network name cause of the sensitivity to underscores, but I think your proposal would work fine. http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart-load-data.yml File docker/quickstart-load-data.yml: http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart-load-data.yml@27 PS6, Line 27: - ./quickstart_conf:/opt/impala/conf:ro Not sure that the config is needed http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart.yml File docker/quickstart.yml: http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart.yml@39 PS6, Line 39: - "25010:25010" I locked down the default config so that it doesn't automatically listen on all interfaces. Ideally it would default to listening on IMPALA_QUICKSTART_IP, but I couldn't figure out a way to do that with docker-compose's limited variable substitution. http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart_conf/hive-site.xml File docker/quickstart_conf/hive-site.xml: http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart_conf/hive-site.xml@22 PS6, Line 22: <configuration> > There is a bit more here than Kudu needed for the quickstart. Maybe adding Done. I think I actually was missing some of the required HMS notification configs, so it was good to double check http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart_data_loader/Dockerfile File docker/quickstart_data_loader/Dockerfile: PS6: I changed this to also be usable as an impala-shell container. http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart_data_loader/Dockerfile@30 PS6, Line 30: LABEL name="Apache Impala Quickstart Data Loader" \ > For docker caching purposes its a good idea to be fast/cheap and frequently Done http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart_data_loader/Dockerfile@42 PS6, Line 42: RUN apt-get update && \ > Switching to a multi-stage build at some point would help avoid this duplic Yeah I think it would be worthwhile combining this with the other quickstart-specific image. I'll probably keep the daemon images separate so they can have a more minimal set of dependencies. http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart_data_loader/Dockerfile@49 PS6, Line 49: RUN pip install impala-shell > Would it make sense to use the source based impala-shell? This was kinda a shortcut, would probably make sense to build and include the latest impala-shell but I think I'd like to punt on that. http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart_data_loader/load_tpcds_kudu.sql File docker/quickstart_data_loader/load_tpcds_kudu.sql: http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart_data_loader/load_tpcds_kudu.sql@18 PS6, Line 18: ---- Template SQL statements to create and load TPCDS tables in Kudu. > Don't these live somewhere else in the impala repo already? Yeah, this was copied and modified from testdata/datasets/tpcds/tpcds_kudu_template.sql. The non-Kudu schemas are defined here - testdata/datasets/tpcds/tpcds_schema_template.sql The duplication was preferable, for me, over trying to bring the test data loading framework into it to process those files. It's also nice that loading the data is just running a SQL file, without indirection or templating. http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart_data_loader/load_tpcds_parquet.sql File docker/quickstart_data_loader/load_tpcds_parquet.sql: http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart_data_loader/load_tpcds_parquet.sql@18 PS6, Line 18: -- Create text tables on top of raw text data. > Don't these live somewhere else in the impala repo already? Same comment as for Kudu. http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart_hms/Dockerfile File docker/quickstart_hms/Dockerfile: http://gerrit.cloudera.org:8080/#/c/15966/6/docker/quickstart_hms/Dockerfile@65 PS6, Line 65: EXPOSE 9083 > You shouldn't need this. The docker compose or docker run commands can spec Done -- To view, visit http://gerrit.cloudera.org:8080/15966 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifc0b862af40a368381ada7ec2a355fe4b0aa778c Gerrit-Change-Number: 15966 Gerrit-PatchSet: 6 Gerrit-Owner: Tim Armstrong <[email protected]> Gerrit-Reviewer: Grant Henke <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Comment-Date: Fri, 12 Jun 2020 04:07:23 +0000 Gerrit-HasComments: Yes
