Quanlong Huang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/9134 )


Change subject: IMPALA-5717: Support for ORC data files
......................................................................

IMPALA-5717: Support for ORC data files

This patch integrates the orc-reader into Impala and implements
HdfsOrcScanner as a middle layer between them. The HdfsOrcScanner
supplies the input needed by the orc-reader, tracks the reader's memory
consumption, and transfers the reader's output (orc::ColumnVectorBatch)
into impala::RowBatch.
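
For illustration only, the two duties described above (tracking the
reader's memory and turning columnar output into rows) can be sketched
as follows. This is not the code in this patch: SketchMemoryPool,
TrackingPool, SketchColumnBatch and TransferToRows are hypothetical
stand-ins that do not depend on the orc-reader or Impala headers, and
the sketch handles a single 64-bit integer type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <unordered_map>
#include <vector>

// Hypothetical allocator interface handed to the reader (an assumption,
// not the real orc-reader API).
class SketchMemoryPool {
 public:
  virtual ~SketchMemoryPool() = default;
  virtual char* Malloc(uint64_t size) = 0;
  virtual void Free(char* p) = 0;
};

// Records the size of every live allocation so the scanner can report
// how much memory the reader currently holds.
class TrackingPool : public SketchMemoryPool {
 public:
  char* Malloc(uint64_t size) override {
    char* p = static_cast<char*>(std::malloc(size));
    if (p != nullptr) {
      sizes_[p] = size;
      bytes_in_use_ += size;
    }
    return p;
  }
  void Free(char* p) override {
    auto it = sizes_.find(p);
    if (it == sizes_.end()) return;  // Not allocated by this pool.
    bytes_in_use_ -= it->second;
    sizes_.erase(it);
    std::free(p);
  }
  uint64_t bytes_in_use() const { return bytes_in_use_; }

 private:
  std::unordered_map<char*, uint64_t> sizes_;
  uint64_t bytes_in_use_ = 0;
};

// Hypothetical columnar batch: one value vector and one null mask per column.
struct SketchColumn {
  std::vector<int64_t> data;
  std::vector<bool> not_null;
};
struct SketchColumnBatch {
  size_t num_rows = 0;
  std::vector<SketchColumn> columns;
};

// Hypothetical row-oriented slot; a row is one slot per column.
struct SketchValue {
  bool is_null = true;
  int64_t value = 0;
};
using SketchRow = std::vector<SketchValue>;

// Copies a columnar batch into rows, walking column by column so that
// each column's vectors are read sequentially.
std::vector<SketchRow> TransferToRows(const SketchColumnBatch& batch) {
  std::vector<SketchRow> rows(batch.num_rows, SketchRow(batch.columns.size()));
  for (size_t c = 0; c < batch.columns.size(); ++c) {
    const SketchColumn& col = batch.columns[c];
    assert(col.data.size() == batch.num_rows);
    assert(col.not_null.size() == batch.num_rows);
    for (size_t r = 0; r < batch.num_rows; ++r) {
      if (col.not_null[r]) {
        rows[r][c].is_null = false;
        rows[r][c].value = col.data[r];
      }
    }
  }
  return rows;
}
```

In this sketch the scanner would pass a TrackingPool to the reader and
call TransferToRows once per batch; how the actual HdfsOrcScanner does
either is defined by the patch itself, not by this example.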

Instead of being linked as a third-party library, the orc-reader is
integrated at the source level, leaving room for further optimization,
e.g. predicate pushdown and code generation. Currently, we haven't
changed any of the orc-reader code, which lives in be/src/orc.

Currently, only reading primitive types is supported. Writing to ORC
tables is not yet supported.

Tests
Most of the end-to-end tests can run on the ORC format. All tests have
passed.

Change-Id: Ia7b6ae4ce3b9ee8125b21993702faa87537790a4
---
M be/CMakeLists.txt
M be/src/exec/CMakeLists.txt
A be/src/exec/hdfs-orc-scanner-test.cc
A be/src/exec/hdfs-orc-scanner.cc
A be/src/exec/hdfs-orc-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-mt.cc
A be/src/orc/Adaptor.hh
A be/src/orc/Adaptor.hh.in
A be/src/orc/ByteRLE.cc
A be/src/orc/ByteRLE.hh
A be/src/orc/C09Adapter.cc
A be/src/orc/CMakeLists.txt
A be/src/orc/ColumnPrinter.cc
A be/src/orc/ColumnPrinter.hh
A be/src/orc/ColumnReader.cc
A be/src/orc/ColumnReader.hh
A be/src/orc/Compression.cc
A be/src/orc/Compression.hh
A be/src/orc/Exceptions.cc
A be/src/orc/Exceptions.hh
A be/src/orc/Int128.cc
A be/src/orc/Int128.hh
A be/src/orc/LzoDecompressor.cc
A be/src/orc/LzoDecompressor.hh
A be/src/orc/MemoryPool.cc
A be/src/orc/MemoryPool.hh
A be/src/orc/OrcFile.cc
A be/src/orc/OrcFile.hh
A be/src/orc/RLE.cc
A be/src/orc/RLE.hh
A be/src/orc/RLEv1.cc
A be/src/orc/RLEv1.hh
A be/src/orc/RLEv2.cc
A be/src/orc/RLEv2.hh
A be/src/orc/Reader.cc
A be/src/orc/Reader.hh
A be/src/orc/Timezone.cc
A be/src/orc/Timezone.hh
A be/src/orc/Type.hh
A be/src/orc/TypeImpl.cc
A be/src/orc/TypeImpl.hh
A be/src/orc/Vector.cc
A be/src/orc/Vector.hh
A be/src/orc/orc-config.hh
A be/src/orc/orc-config.hh.in
A be/src/orc/orc_proto.proto
A be/src/orc/wrap/coded-stream-wrapper.h
A be/src/orc/wrap/gmock.h
A be/src/orc/wrap/gtest-wrapper.h
A be/src/orc/wrap/orc-proto-wrapper.cc
A be/src/orc/wrap/orc-proto-wrapper.hh
A be/src/orc/wrap/snappy-wrapper.h
A be/src/orc/wrap/zero-copy-stream-wrapper.h
M common/thrift/CatalogObjects.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java
M fe/src/main/java/org/apache/impala/catalog/HdfsStorageDescriptor.java
M fe/src/main/jflex/sql-scanner.flex
M testdata/bin/generate-schema-statements.py
M testdata/bin/run-hive-server.sh
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-query/functional-query_core.csv
M testdata/workloads/functional-query/functional-query_dimensions.csv
M testdata/workloads/functional-query/functional-query_exhaustive.csv
M testdata/workloads/functional-query/functional-query_pairwise.csv
M tests/common/test_dimensions.py
M tests/comparison/cli_options.py
M tests/query_test/test_decimal_queries.py
M tests/query_test/test_scanners.py
70 files changed, 15,389 insertions(+), 8 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/34/9134/2
--
To view, visit http://gerrit.cloudera.org:8080/9134
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia7b6ae4ce3b9ee8125b21993702faa87537790a4
Gerrit-Change-Number: 9134
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
