Norbert Luksa has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/15051 )
Change subject: IMPALA-9226: Improve string allocations of the ORC scanner
......................................................................

IMPALA-9226: Improve string allocations of the ORC scanner

Currently the OrcColumnReader copies values from the
orc::StringVectorBatch one by one. Since ORC 1.6, the blob that holds
the pointed-to values is moved to the StringVectorBatch, so we can copy
the blob as a whole instead.

Besides the above improvement, this commit also enables the
LazyEncoding option for the ORC reader. This way, for stripes with
DICTIONARY_ENCODING[_V2], EncodedStringVectorBatch contains the data in
a dictionaryBlob, from which the values can be read using the given
indices and lengths.

Tests:
* Ran the ORC scanner tests (query_tests/test_scanners.py::TestOrc)
  and the tpch query tests.
* Tested performance on the tpch.lineitem table with scale=25, running
  queries that select the min of string columns. Some results:

  col_name       | encoding   | before | after  | speedup
  =========================================================
  l_comment      | DIRECT     | 16.42s | 14.38s | 14%
  l_shipinstruct | DICTIONARY |  5.26s |  3.80s | 32%
  l_commitdate   | DICTIONARY |  5.46s |  5.19s |  5%
  all string col | BOTH       | 39.06s | 32.18s | 21%

  The queries were run on a desktop PC with MT_DOP and NUM_NODES set
  to 1.
* Also ran the TPC-H queries on the TPC-H benchmark, where some
  queries' runtime improved by around 10-15%, while there was no
  regression for the others.
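The dictionary-based lookup described in the commit message can be illustrated with a minimal, self-contained sketch. It is not the code in this patch and does not use the ORC library: DictBatch, its fields and MaterializeRows are hypothetical names, and value lengths are assumed to be derived from consecutive offsets into the blob. The point is only that the blob is copied once and each row then becomes a pointer/length pair into that copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for a dictionary-encoded string batch (not the ORC
// type): distinct values are stored back to back in one blob and every row
// only stores an index into the dictionary.
struct DictBatch {
  std::string blob;              // concatenated dictionary entries
  std::vector<int64_t> offsets;  // offsets[i] is the start of entry i;
                                 // the last element equals blob.size()
  std::vector<int64_t> indices;  // one dictionary index per row
};

// Copies the whole blob into 'pool' with a single bulk copy, then builds one
// view per row that points into the copy. No per-value copy is made.
// The caller must keep 'pool' alive and unmodified while the views are used.
std::vector<std::string_view> MaterializeRows(const DictBatch& batch,
                                              std::vector<char>* pool) {
  pool->assign(batch.blob.begin(), batch.blob.end());
  std::vector<std::string_view> rows;
  rows.reserve(batch.indices.size());
  for (int64_t idx : batch.indices) {
    assert(idx >= 0 && idx + 1 < static_cast<int64_t>(batch.offsets.size()));
    const int64_t start = batch.offsets[idx];
    const int64_t len = batch.offsets[idx + 1] - start;
    assert(start >= 0 && len >= 0 &&
           start + len <= static_cast<int64_t>(pool->size()));
    rows.emplace_back(pool->data() + start, static_cast<std::size_t>(len));
  }
  return rows;
}
```

With a blob of "AIRMAILTRUCK", offsets {0, 3, 7, 12} and indices {2, 0, 0, 1}, the rows resolve to TRUCK, AIR, AIR and MAIL, all backed by the same copied buffer.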
Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f
---
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/orc-column-readers.cc
M be/src/exec/orc-column-readers.h
4 files changed, 135 insertions(+), 42 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/15051/7
--
To view, visit http://gerrit.cloudera.org:8080/15051
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f
Gerrit-Change-Number: 15051
Gerrit-PatchSet: 7
Gerrit-Owner: Norbert Luksa <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Norbert Luksa <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
