Michael Smith has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/23627 )
Change subject: IMPALA-14553: Run schema eval concurrently
......................................................................

IMPALA-14553: Run schema eval concurrently

The majority of time spent in generate-schema-statements.py is in
eval_section for schema operations that shell out, often uploading files
via the hadoop CLI or generating data files. These operations should be
independent.

Runs eval_section at the beginning so we don't repeat it for each row in
test_vectors, and executes them in parallel via a ThreadPool. Defaults
to NUM_CONCURRENT_TESTS threads because the underlying operations have
some concurrency to them (such as HDFS mirroring writes).

Also collects existing tables into a set to optimize lookup.

Reduces generate-schema-statements by ~60%, from 2m30s to 1m. Confirmed
that contents of logs/data_loading/sql/functional are identical.

Change-Id: I2a78d05fd6a0005c83561978713237da2dde6af2
Reviewed-on: http://gerrit.cloudera.org:8080/23627
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Michael Smith <[email protected]>
---
M testdata/bin/generate-schema-statements.py
1 file changed, 136 insertions(+), 49 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved
  Michael Smith: Verified

--
To view, visit http://gerrit.cloudera.org:8080/23627
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I2a78d05fd6a0005c83561978713237da2dde6af2
Gerrit-Change-Number: 23627
Gerrit-PatchSet: 18
Gerrit-Owner: Michael Smith <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Jason Fehr <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
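
For readers without the diff at hand, the approach in the commit message (evaluate each section once up front, run the evaluations in a ThreadPool, and keep existing tables in a set) might look roughly like the sketch below. This is not the merged code. The body of eval_section is a simplified stand-in, and eval_sections_concurrently, raw_sections, and the sample table names are hypothetical. Reading NUM_CONCURRENT_TESTS from the environment with a fallback of 4 is also an assumption.

```python
import os
import subprocess
from multiprocessing.pool import ThreadPool


def eval_section(section):
    # Simplified stand-in (assumption): a section that starts with a backtick
    # is run as a shell command and replaced by its stdout; anything else is
    # returned unchanged.
    if not section.startswith("`"):
        return section
    return subprocess.check_output(
        section[1:], shell=True, universal_newlines=True)


def eval_sections_concurrently(sections, num_threads):
    # Hypothetical helper: evaluate every section exactly once, in parallel.
    # Threads are sufficient because the work is dominated by waiting on
    # child processes rather than by Python bytecode.
    names = list(sections)
    pool = ThreadPool(processes=num_threads)
    try:
        results = pool.map(eval_section, [sections[n] for n in names])
    finally:
        pool.close()
        pool.join()
    return dict(zip(names, results))


# Assumption: NUM_CONCURRENT_TESTS is an environment variable; 4 is an
# arbitrary fallback chosen for this sketch.
num_threads = int(os.environ.get("NUM_CONCURRENT_TESTS", "4"))

raw_sections = {
    "LOAD": "`echo loaded",
    "CREATE": "CREATE TABLE t (id INT)",
}
evaluated = eval_sections_concurrently(raw_sections, num_threads)

# A set gives constant-time membership checks, unlike scanning a list.
existing_tables = {"functional.alltypes", "functional.alltypestiny"}

test_vectors = [("text", "none"), ("parquet", "snap")]
statements = []
for file_format, codec in test_vectors:
    # Reuse the precomputed result instead of shelling out once per row.
    table_exists = "functional.alltypes" in existing_tables
    statements.append(
        (file_format, codec, evaluated["LOAD"].strip(), table_exists))
```

Because the shell commands run once before the loop over test_vectors, the cost of shelling out no longer scales with the number of rows.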
