Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8320 )

Change subject: IMPALA-6070: Parallel data load.
......................................................................


Patch Set 2:

(9 comments)

Thanks for the reviews!

I watched memory usage while this ran, and on my 32GB machine I always had 
~20GB available.

I agree with Alex that more steps could be added: similar changes can 
continue to help here, but I'm doing them one at a time.

http://gerrit.cloudera.org:8080/#/c/8320/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8320/1//COMMIT_MSG@9
PS1, Line 9: This commit loads functional-query, TPC-H data, and TPC-DS data in
> nit: Can you wrap this at the red line provided by gerrit? I think it is 72
Done. "gqip" does it in vi. It looks like it's 72 chars.


http://gerrit.cloudera.org:8080/#/c/8320/1//COMMIT_MSG@12
PS1, Line 12: 13 minut
> nit: minutes
Done


http://gerrit.cloudera.org:8080/#/c/8320/1//COMMIT_MSG@33
PS1, Line 33: 16:14:33    Loading workload 'tpcds' using exploration strategy 'core' OK (Took: 16 min 29 sec)
> What testing did you do? Does the data load still run on a non-beefy local
Define non-beefy?

My desktop is 32 GB and 8 cores. This ran fine.


http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/create-load-data.sh
File testdata/bin/create-load-data.sh:

http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/create-load-data.sh@480
PS1, Line 480:   # Run some steps in parallel, with run-step-backgroundable / run-step-wait-all.
> Could add a comment about what you decided to background and what you decid
Done.


http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/create-load-data.sh@492
PS1, Line 492:     LOAD_NESTED_ARGS="--cm-host $CM_HOST"
> I don't see any reason this also couldn't run in parallel.
Yes, but I've not tested this one.


http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/create-load-data.sh@505
PS1, Line 505:       load-data "functional-query" "core" "hbase/none"
             : fi
             :
             : if $KUDU_IS_SUPPORTED; then
             :   # Tests depend on the kudu data being clean, so load
> It should be possible to do the same thing for these. That will only save a
Yes. I am testing this one, but I'll do a separate patch for it.


http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/run-hive-server.sh
File testdata/bin/run-hive-server.sh:

http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/run-hive-server.sh@75
PS1, Line 75:   HADOOP_HEAPSIZE="512" hive --service hiveserver2 > ${LOGDIR}/hive-server2.out 2>&1 &
> :). I'm still using that good-old machine, mem should be fine (fingers cros
512 works, so that's what I've changed it to. I'm not investigating using -Xms 
-Xmx to give this more flexibility (but even less predictability).


http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/run-step.sh
File testdata/bin/run-step.sh:

http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/run-step.sh@53
PS1, Line 53:
> nit: only one empty line, to match context
Done


http://gerrit.cloudera.org:8080/#/c/8320/1/testdata/bin/run-step.sh@84
PS1, Line 84:   RUN_STEP_MSGS=()
> Do you want to reset MSGS, too?
Good catch. Done.
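
For readers following along, here is a minimal sketch of the 
background-then-wait pattern and of why both arrays need resetting. It is 
not the code in run-step.sh: the function names bg_step and bg_wait_all and 
the RUN_STEP_PIDS array are assumptions made for illustration; only 
RUN_STEP_MSGS appears in the patch.

```bash
#!/usr/bin/env bash
# Illustrative sketch only; not the actual contents of run-step.sh.
# RUN_STEP_PIDS, bg_step and bg_wait_all are hypothetical names.
RUN_STEP_PIDS=()
RUN_STEP_MSGS=()

# Start a command in the background and record its PID and description.
bg_step() {
  local msg="$1"
  shift
  "$@" &
  RUN_STEP_PIDS+=("$!")
  RUN_STEP_MSGS+=("$msg")
}

# Wait for every recorded step; return non-zero if any of them failed.
bg_wait_all() {
  local failed=0
  local i
  for i in "${!RUN_STEP_PIDS[@]}"; do
    if ! wait "${RUN_STEP_PIDS[$i]}"; then
      echo "FAILED: ${RUN_STEP_MSGS[$i]}" >&2
      failed=1
    fi
  done
  # Reset both arrays together. Clearing only one would leave the indexes
  # of the next batch pointing at stale PIDs or stale messages.
  RUN_STEP_PIDS=()
  RUN_STEP_MSGS=()
  return "$failed"
}
```

A caller would issue several bg_step calls for independent loads and then a 
single bg_wait_all before any step that depends on them.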



--
To view, visit http://gerrit.cloudera.org:8080/8320
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I836c4e1586f229621c102c4f4ba22ce7224ab9ac
Gerrit-Change-Number: 8320
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Jim Apple <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Philip Zeyliger <[email protected]>
Gerrit-Reviewer: Zach Amsden <[email protected]>
Gerrit-Comment-Date: Sat, 21 Oct 2017 21:32:51 +0000
Gerrit-HasComments: Yes
