Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8320 )
Change subject: IMPALA-6070: Parallel data load.
......................................................................

Patch Set 2:

(9 comments)

Thanks for the reviews!

I monitored memory while this was running, and on my 32GB machine I always had ~20GB available.

I agree with Alex on adding more things: there are similar changes that can continue to help here, but I'm doing them one at a time.

http://gerrit.cloudera.org:8080/#/c/8320/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8320/1//COMMIT_MSG@9
PS1, Line 9: This commit loads functional-query, TPC-H data, and TPC-DS data in
> nit: Can you wrap this at the red line provided by gerrit? I think it is 72
Done. "gqip" does it in vi. It looks like it's 72 chars.

http://gerrit.cloudera.org:8080/#/c/8320/1//COMMIT_MSG@12
PS1, Line 12: 13 minut
> nit: minutes
Done

http://gerrit.cloudera.org:8080/#/c/8320/1//COMMIT_MSG@33
PS1, Line 33: 16:14:33 Loading workload 'tpcds' using exploration strategy 'core' OK (Took: 16 min 29 sec)
> What testing did you do? Does the data load still run on a non-beefy local
Define non-beefy? My desktop has 32 GB and 8 cores. This ran fine.

http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/create-load-data.sh
File testdata/bin/create-load-data.sh:

http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/create-load-data.sh@480
PS1, Line 480: # Run some steps in parallel, with run-step-backgroundable / run-step-wait-all.
> Could add a comment about what you decided to background and what you decid
Done.

http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/create-load-data.sh@492
PS1, Line 492: LOAD_NESTED_ARGS="--cm-host $CM_HOST"
> I don't see any reason this also couldn't run in parallel.
Yes, but I haven't tested this one.

http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/create-load-data.sh@505
PS1, Line 505: load-data "functional-query" "core" "hbase/none"
             : fi
             :
             : if $KUDU_IS_SUPPORTED; then
             : # Tests depend on the kudu data being clean, so load
> It should be possible to do the same thing for these. That will only save a
Yes. I am testing this one, but I'll do a separate patch for it.

http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/run-hive-server.sh
File testdata/bin/run-hive-server.sh:

http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/run-hive-server.sh@75
PS1, Line 75: HADOOP_HEAPSIZE="512" hive --service hiveserver2 > ${LOGDIR}/hive-server2.out 2>&1 &
> :). I'm still using that good-old machine, mem should be fine (fingers cros
512 works, so that's what I've changed it to. I'm not investigating using -Xms and -Xmx to give this more flexibility (but even less predictability).

http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/run-step.sh
File testdata/bin/run-step.sh:

http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/run-step.sh@53
PS1, Line 53:
> nit: only one empty line, to match context
Done

http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/run-step.sh@84
PS1, Line 84: RUN_STEP_MSGS=()
> Do you want to reset MSGS, too?
Good catch. Done.

--
To view, visit http://gerrit.cloudera.org:8080/8320

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I836c4e1586f229621c102c4f4ba22ce7224ab9ac
Gerrit-Change-Number: 8320
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger
Gerrit-Reviewer: Alex Behm
Gerrit-Reviewer: Jim Apple
Gerrit-Reviewer: Joe McDonnell
Gerrit-Reviewer: Philip Zeyliger
Gerrit-Reviewer: Zach Amsden
Gerrit-Comment-Date: Sat, 21 Oct 2017 21:32:51 +0000
Gerrit-HasComments: Yes
