Hello Tim Armstrong, Joe McDonnell, Dan Hecht, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9134

to look at the new patch set (#18).

Change subject: IMPALA-5717: Support for reading ORC data files
......................................................................

IMPALA-5717: Support for reading ORC data files

This patch integrates the orc library into Impala and implements
HdfsOrcScanner as a middle layer between them. The HdfsOrcScanner
supplies input needed from the orc-reader, tracks memory consumption of
the reader and transfers the reader's output (orc::ColumnVectorBatch)
into impala::RowBatch. The ORC version we used is release-1.4.3.

A startup option --enable_orc_scanner is added for this feature. It's
set to true by default. Setting it to false will fail queries on ORC
tables.

Currently, we only support reading primitive types. Writing into ORC
table has not been supported neither.

Tests
 - Most of the end-to-end tests can run on ORC format.
 - Add tpcds, tpch tests for ORC.
 - Add some ORC specific tests.
 - Haven't enabled test_scanner_fuzz for ORC yet, since the ORC library
   is not robust for corrupt files (ORC-315).

Change-Id: Ia7b6ae4ce3b9ee8125b21993702faa87537790a4
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/CMakeLists.txt
A be/src/exec/hdfs-orc-scanner.cc
A be/src/exec/hdfs-orc-scanner.h
M be/src/exec/hdfs-parquet-scanner-ir.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-mt.cc
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/util/backend-gflag-util.cc
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
A cmake_modules/FindOrc.cmake
M common/thrift/BackendGflags.thrift
M common/thrift/CatalogObjects.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java
M fe/src/main/java/org/apache/impala/catalog/HdfsStorageDescriptor.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/jflex/sql-scanner.flex
M testdata/LineItemMultiBlock/README.dox
A testdata/LineItemMultiBlock/lineitem_orc_multiblock_one_stripe.orc
A testdata/LineItemMultiBlock/lineitem_sixblocks.orc
A testdata/LineItemMultiBlock/lineitem_threeblocks.orc
M testdata/bin/create-load-data.sh
M testdata/bin/generate-schema-statements.py
M testdata/bin/run-hive-server.sh
M testdata/cluster/node_templates/common/etc/hadoop/conf/hdfs-site.xml.tmpl
A testdata/data/chars-formats.orc
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M 
testdata/workloads/functional-planner/queries/PlannerTest/complex-types-file-formats.test
M testdata/workloads/functional-query/functional-query_core.csv
M testdata/workloads/functional-query/functional-query_dimensions.csv
M testdata/workloads/functional-query/functional-query_exhaustive.csv
M testdata/workloads/functional-query/functional-query_pairwise.csv
A 
testdata/workloads/functional-query/queries/DataErrorsTest/orc-type-checks.test
M testdata/workloads/tpcds/tpcds_core.csv
M testdata/workloads/tpcds/tpcds_dimensions.csv
M testdata/workloads/tpcds/tpcds_exhaustive.csv
M testdata/workloads/tpcds/tpcds_pairwise.csv
M testdata/workloads/tpch/tpch_core.csv
M testdata/workloads/tpch/tpch_dimensions.csv
M testdata/workloads/tpch/tpch_exhaustive.csv
M testdata/workloads/tpch/tpch_pairwise.csv
M tests/common/impala_test_suite.py
M tests/common/test_dimensions.py
M tests/common/test_vector.py
M tests/comparison/cli_options.py
M tests/query_test/test_chars.py
M tests/query_test/test_decimal_queries.py
M tests/query_test/test_scanners.py
M tests/query_test/test_scanners_fuzz.py
M tests/query_test/test_tpch_queries.py
62 files changed, 1,747 insertions(+), 307 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/34/9134/18
--
To view, visit http://gerrit.cloudera.org:8080/9134
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia7b6ae4ce3b9ee8125b21993702faa87537790a4
Gerrit-Change-Number: 9134
Gerrit-PatchSet: 18
Gerrit-Owner: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>

Reply via email to