Michael Smith has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/23627 )
Change subject: Run schema eval concurrently
......................................................................
Run schema eval concurrently
The majority of time spent in generate-schema-statements.py is in
eval_section for schema operations that shell out, often uploading files
via the hadoop CLI or generating data files. These operations should be
independent.
Runs eval_section once at the beginning, so it is not repeated for each
row in test_vectors, and executes the evaluations in parallel via a
ThreadPool.
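A minimal sketch of the evaluate-once-then-parallelize idea, not the code in this patch. The helper names (eval_section_stub, eval_sections_concurrently), the section names, and the thread count are all hypothetical; the stub only simulates a slow shell-out with a sleep. It assumes ThreadPool refers to multiprocessing.pool.ThreadPool.

```python
import time
from multiprocessing.pool import ThreadPool


def eval_section_stub(section):
    """Hypothetical stand-in for eval_section: simulates a slow shell-out."""
    time.sleep(0.05)  # pretend to run an external command
    return "evaluated:" + section


def eval_sections_concurrently(sections, num_threads=4):
    """Evaluate every section exactly once, in parallel.

    `sections` maps a section name to its raw text. Returns a dict mapping
    the same names to the evaluated results.
    """
    names = list(sections)
    pool = ThreadPool(processes=num_threads)
    try:
        # map() preserves input order, so results line up with names.
        results = pool.map(eval_section_stub, [sections[n] for n in names])
    finally:
        pool.close()
        pool.join()
    return dict(zip(names, results))


# Hypothetical section names and contents, for illustration only.
sections = {"LOAD": "load-cmd", "CREATE": "create-stmt", "ALTER": "alter-stmt"}
evaluated = eval_sections_concurrently(sections)

# Later loops (for example, one per row in test_vectors) reuse `evaluated`
# instead of re-running the expensive evaluation.
```

Threads fit here on the assumption that each evaluation spends most of its time waiting on an external process rather than running Python code.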
Also collects existing tables into a set to optimize lookup.
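As an illustration of the lookup optimization, the sketch below builds the set once and then tests membership against it. The function name and table names are hypothetical and are not taken from the patch.

```python
def tables_missing(requested_tables, existing_tables):
    """Return the requested tables that do not exist yet.

    Building the set costs one pass over existing_tables; each membership
    test afterwards is O(1) on average, versus O(n) for a list scan.
    """
    existing = set(existing_tables)
    return [t for t in requested_tables if t not in existing]


# Hypothetical table names, for illustration only.
existing_tables = ["functional.alltypes", "functional.alltypestiny"]
requested_tables = ["functional.alltypes", "functional.newtable"]
missing = tables_missing(requested_tables, existing_tables)
```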
Speeds up functional-query devdata load by ~20%. As a side effect, it
slightly slows down the parallel TPC-H and TPC-DS data loads because the
functional-query load starts sooner, but the overall load is still faster.
Before
Loading TPC-H data OK (Took: 0 min 12 sec)
Loading TPC-DS data OK (Took: 1 min 34 sec)
Loading functional-query data OK (Took: 10 min 20 sec)
After
Loading TPC-H data OK (Took: 0 min 16 sec)
Loading TPC-DS data OK (Took: 1 min 51 sec)
Loading functional-query data OK (Took: 8 min 1 sec)
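The ~20% figure can be checked against the functional-query timings listed above. This is only arithmetic on those reported numbers; the helper name is hypothetical.

```python
def to_seconds(minutes, seconds):
    """Convert a 'Took: M min S sec' reading to seconds."""
    return minutes * 60 + seconds


before = to_seconds(10, 20)  # functional-query load before the change
after = to_seconds(8, 1)     # functional-query load after the change

# Fraction of the original load time that was saved.
reduction = (before - after) / float(before)
print("functional-query load: %ds -> %ds (%.1f%% less time)"
      % (before, after, reduction * 100))
```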
Change-Id: I2a78d05fd6a0005c83561978713237da2dde6af2
---
M testdata/bin/generate-schema-statements.py
1 file changed, 104 insertions(+), 44 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/23627/5
--
To view, visit http://gerrit.cloudera.org:8080/23627
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2a78d05fd6a0005c83561978713237da2dde6af2
Gerrit-Change-Number: 23627
Gerrit-PatchSet: 5
Gerrit-Owner: Michael Smith <[email protected]>