Norbert Luksa has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/15051 )

Change subject: IMPALA-9226: Improve string allocations of the ORC scanner
......................................................................

IMPALA-9226: Improve string allocations of the ORC scanner

Currently the OrcColumnReader copies values from the
orc::StringVectorBatch one by one. Since ORC 1.6, the blob that
holds the pointed-to values is moved into the StringVectorBatch,
so we can copy the whole blob at once.
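
As a rough illustration of the idea (not the code in this patch), the
sketch below copies a batch's blob with a single memcpy and then rebases
each per-row pointer by its offset. FakeStringBatch, StringRef, FillBatch
and CopyBlobOnce are hypothetical stand-ins invented for this example;
they are not the orc:: or Impala types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for a string batch: per-row pointers and lengths
// that point into one contiguous blob owned by the batch.
struct FakeStringBatch {
  std::vector<char> blob;
  std::vector<const char*> data;  // data[i] points into blob
  std::vector<int64_t> length;    // length[i] is the size of row i
};

// Hypothetical (pointer, length) view of one copied value.
struct StringRef {
  const char* ptr;
  int64_t len;
};

// Test helper: packs the values back to back into b->blob and points
// every row at its slice of the blob.
void FillBatch(const std::vector<std::string>& values, FakeStringBatch* b) {
  for (const std::string& v : values) {
    b->blob.insert(b->blob.end(), v.begin(), v.end());
  }
  size_t off = 0;
  for (const std::string& v : values) {
    b->data.push_back(b->blob.data() + off);
    b->length.push_back(static_cast<int64_t>(v.size()));
    off += v.size();
  }
}

// Copies the whole blob into dst with one memcpy, then rebases each row
// pointer by its offset within the source blob.
std::vector<StringRef> CopyBlobOnce(const FakeStringBatch& batch,
                                    std::vector<char>* dst) {
  assert(batch.data.size() == batch.length.size());
  dst->resize(batch.blob.size());
  if (!batch.blob.empty()) {
    std::memcpy(dst->data(), batch.blob.data(), batch.blob.size());
  }
  std::vector<StringRef> out;
  out.reserve(batch.data.size());
  for (size_t i = 0; i < batch.data.size(); ++i) {
    std::ptrdiff_t offset = batch.data[i] - batch.blob.data();
    out.push_back({dst->data() + offset, batch.length[i]});
  }
  return out;
}
```

The per-row work shrinks to pointer arithmetic; the bytes themselves
are moved by one bulk copy instead of one copy per value.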

Besides the above improvement, this commit also enables the
LazyEncoding option for the ORC reader. This way, for stripes
with DICTIONARY_ENCODING[_V2], EncodedStringVectorBatch contains
the data in a dictionaryBlob, from which the values can be acquired
with the given indices and lengths.
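
A minimal sketch of that lookup, under stated assumptions: FakeDictionary,
DictStringRef and DecodeRows are hypothetical names made up for this
example, and the layout (one blob plus an offset array with one extra
trailing entry) is an assumption for illustration, not a statement of the
ORC API. Each row stores only a dictionary index; the value is resolved
as a pointer into the blob plus a length, with no per-row copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical dictionary: all entries packed into one blob; entry k
// occupies bytes [offset[k], offset[k + 1]).
struct FakeDictionary {
  std::vector<char> blob;
  std::vector<int64_t> offset;  // number of entries + 1 elements
};

// Hypothetical (pointer, length) view into the dictionary blob.
struct DictStringRef {
  const char* ptr;
  int64_t len;
};

// Resolves each row's dictionary index to a view into the blob.
// Rows that share an index share the same bytes.
std::vector<DictStringRef> DecodeRows(const FakeDictionary& dict,
                                      const std::vector<int64_t>& index) {
  std::vector<DictStringRef> out;
  out.reserve(index.size());
  for (int64_t k : index) {
    assert(k >= 0 && static_cast<size_t>(k) + 1 < dict.offset.size());
    int64_t begin = dict.offset[k];
    int64_t len = dict.offset[k + 1] - begin;
    out.push_back({dict.blob.data() + begin, len});
  }
  return out;
}
```

With a low-cardinality column, the blob is small relative to the number
of rows, so copying the blob once and pointing rows into it avoids most
of the per-value copying.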

Tests:
 * Run ORC scanner tests (query_tests/test_scanners.py::TestOrc)
 * Tested performance on the tpch.lineitem table with scale=25,
   running queries that select the min of string columns.
   Some results:
   col_name       | encoding   | before       | after
   ==========================================================
   l_comment      | DIRECT     | 14.05s (97s) | 4.11s (30s)
   l_shipinstruct | DICTIONARY | 12.36s (86s) | 2.16s (18s)
   l_commitdate   | DICTIONARY | 12.56s (85s) | 2.54s (20s)

   The queries were run on my machine with 8 cores; the
   durations in brackets were measured with MT_DOP and NUM_NODES
   set to 1.

Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f
---
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/orc-column-readers.cc
M be/src/exec/orc-column-readers.h
3 files changed, 95 insertions(+), 44 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/15051/2
--
To view, visit http://gerrit.cloudera.org:8080/15051
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If2d975946fb6f4104d8dc98895285b3a0c6bef7f
Gerrit-Change-Number: 15051
Gerrit-PatchSet: 2
Gerrit-Owner: Norbert Luksa <[email protected]>
Gerrit-Reviewer: Norbert Luksa <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
