Quanlong Huang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/9134 )

Change subject: IMPALA-5717: Support for ORC data files
......................................................................

IMPALA-5717: Support for ORC data files

This patch integrates the orc-reader into Impala and implements
HdfsOrcScanner as a middle layer between them. The HdfsOrcScanner
supplies the input needed by the orc-reader, tracks the reader's memory
consumption, and transfers the reader's output
(orc::ColumnVectorBatch) into impala::RowBatch.

Instead of linking the orc-reader as a third-party library, it is
integrated at the source level, which leaves room for further
optimizations, e.g. predicate pushdown and code generation. The
orc-reader code itself is unchanged so far; it lives in be/src/orc.

Currently, only reading primitive types is supported. Writing into ORC
tables is not supported either.

Tests
Most of the end-to-end tests can run on the ORC format. All of the
tests have passed.

Change-Id: Ia7b6ae4ce3b9ee8125b21993702faa87537790a4
---
M be/CMakeLists.txt
M be/src/exec/CMakeLists.txt
A be/src/exec/hdfs-orc-scanner-test.cc
A be/src/exec/hdfs-orc-scanner.cc
A be/src/exec/hdfs-orc-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-mt.cc
A be/src/orc/Adaptor.hh
A be/src/orc/Adaptor.hh.in
A be/src/orc/ByteRLE.cc
A be/src/orc/ByteRLE.hh
A be/src/orc/C09Adapter.cc
A be/src/orc/CMakeLists.txt
A be/src/orc/ColumnPrinter.cc
A be/src/orc/ColumnPrinter.hh
A be/src/orc/ColumnReader.cc
A be/src/orc/ColumnReader.hh
A be/src/orc/Compression.cc
A be/src/orc/Compression.hh
A be/src/orc/Exceptions.cc
A be/src/orc/Exceptions.hh
A be/src/orc/Int128.cc
A be/src/orc/Int128.hh
A be/src/orc/LzoDecompressor.cc
A be/src/orc/LzoDecompressor.hh
A be/src/orc/MemoryPool.cc
A be/src/orc/MemoryPool.hh
A be/src/orc/OrcFile.cc
A be/src/orc/OrcFile.hh
A be/src/orc/RLE.cc
A be/src/orc/RLE.hh
A be/src/orc/RLEv1.cc
A be/src/orc/RLEv1.hh
A be/src/orc/RLEv2.cc
A be/src/orc/RLEv2.hh
A be/src/orc/Reader.cc
A be/src/orc/Reader.hh
A be/src/orc/Timezone.cc
A be/src/orc/Timezone.hh
A be/src/orc/Type.hh
A be/src/orc/TypeImpl.cc
A be/src/orc/TypeImpl.hh
A be/src/orc/Vector.cc
A be/src/orc/Vector.hh
A be/src/orc/orc-config.hh
A be/src/orc/orc-config.hh.in
A be/src/orc/orc_proto.proto
A be/src/orc/wrap/coded-stream-wrapper.h
A be/src/orc/wrap/gmock.h
A be/src/orc/wrap/gtest-wrapper.h
A be/src/orc/wrap/orc-proto-wrapper.cc
A be/src/orc/wrap/orc-proto-wrapper.hh
A be/src/orc/wrap/snappy-wrapper.h
A be/src/orc/wrap/zero-copy-stream-wrapper.h
M common/thrift/CatalogObjects.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java
M fe/src/main/java/org/apache/impala/catalog/HdfsStorageDescriptor.java
M fe/src/main/jflex/sql-scanner.flex
M testdata/bin/generate-schema-statements.py
M testdata/bin/run-hive-server.sh
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-query/functional-query_core.csv
M testdata/workloads/functional-query/functional-query_dimensions.csv
M testdata/workloads/functional-query/functional-query_exhaustive.csv
M testdata/workloads/functional-query/functional-query_pairwise.csv
M tests/common/test_dimensions.py
M tests/comparison/cli_options.py
M tests/query_test/test_decimal_queries.py
M tests/query_test/test_scanners.py
70 files changed, 15,389 insertions(+), 8 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/34/9134/2
--
To view, visit http://gerrit.cloudera.org:8080/9134
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia7b6ae4ce3b9ee8125b21993702faa87537790a4
Gerrit-Change-Number: 9134
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
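
The commit message describes a transfer step: the reader's columnar output (orc::ColumnVectorBatch) is copied into a row-oriented impala::RowBatch. The sketch below only illustrates that general column-to-row idea and is not the code in this patch. It uses neither the real ORC nor the real Impala classes; Int64Column, ColumnarBatch, Cell, Row, TransferBatch and MakeExampleBatch are hypothetical names invented for illustration, and it assumes every column holds 64-bit integers with a per-row NULL flag.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for one column of a columnar batch.
// not_null[i] is false when row i of this column is NULL.
struct Int64Column {
  std::vector<int64_t> values;
  std::vector<bool> not_null;
};

// Hypothetical stand-in for a columnar batch: all columns have num_rows entries.
struct ColumnarBatch {
  std::size_t num_rows;
  std::vector<Int64Column> columns;
};

// Hypothetical row-oriented cell: a value plus a NULL indicator.
struct Cell {
  bool is_null;
  int64_t value;
};

typedef std::vector<Cell> Row;

// Walks the batch row by row and gathers one cell from every column,
// turning column-major storage into a list of rows.
std::vector<Row> TransferBatch(const ColumnarBatch& batch) {
  std::vector<Row> rows;
  rows.reserve(batch.num_rows);
  for (std::size_t r = 0; r < batch.num_rows; ++r) {
    Row row;
    row.reserve(batch.columns.size());
    for (std::size_t c = 0; c < batch.columns.size(); ++c) {
      const Int64Column& col = batch.columns[c];
      // Every column must cover every row of the batch.
      assert(col.values.size() == batch.num_rows);
      assert(col.not_null.size() == batch.num_rows);
      Cell cell;
      cell.is_null = !col.not_null[r];
      cell.value = cell.is_null ? 0 : col.values[r];
      row.push_back(cell);
    }
    rows.push_back(std::move(row));
  }
  return rows;
}

// Builds a small 3-row, 2-column batch; the second column is NULL in row 1.
ColumnarBatch MakeExampleBatch() {
  Int64Column c0;
  c0.values = {1, 2, 3};
  c0.not_null = {true, true, true};

  Int64Column c1;
  c1.values = {10, 0, 30};
  c1.not_null = {true, false, true};

  ColumnarBatch batch;
  batch.num_rows = 3;
  batch.columns.push_back(c0);
  batch.columns.push_back(c1);
  return batch;
}
```

Calling TransferBatch(MakeExampleBatch()) yields three rows of two cells each. A real scanner would presumably also handle other types, memory accounting and batch-size limits, none of which this sketch attempts.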
