Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/9134 )
Change subject: IMPALA-5717: Support for reading ORC data files
......................................................................

IMPALA-5717: Support for reading ORC data files

This patch integrates the orc library into Impala and implements
HdfsOrcScanner as a middle layer between them. The HdfsOrcScanner
supplies the input needed by the orc-reader, tracks memory consumption
of the reader and transfers the reader's output
(orc::ColumnVectorBatch) into impala::RowBatch. The ORC version we
used is release-1.4.3.

A startup option --enable_orc_scanner is added for this feature. It's
set to true by default. Setting it to false will fail queries on ORC
tables.

Currently, we only support reading primitive types. Writing into ORC
tables is not supported either.

Tests
 - Most of the end-to-end tests can run on ORC format.
 - Add tpcds, tpch tests for ORC.
 - Add some ORC specific tests.
 - Haven't enabled test_scanner_fuzz for ORC yet, since the ORC library
   is not robust for corrupt files (ORC-315).

Change-Id: Ia7b6ae4ce3b9ee8125b21993702faa87537790a4
Reviewed-on: http://gerrit.cloudera.org:8080/9134
Reviewed-by: Quanlong Huang <[email protected]>
Reviewed-by: Tim Armstrong <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
---
M CMakeLists.txt
M be/CMakeLists.txt
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/CMakeLists.txt
A be/src/exec/hdfs-orc-scanner.cc
A be/src/exec/hdfs-orc-scanner.h
M be/src/exec/hdfs-parquet-scanner-ir.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-mt.cc
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/util/backend-gflag-util.cc
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
A cmake_modules/FindOrc.cmake
M common/thrift/BackendGflags.thrift
M common/thrift/CatalogObjects.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java
M fe/src/main/java/org/apache/impala/catalog/HdfsStorageDescriptor.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/jflex/sql-scanner.flex
M testdata/LineItemMultiBlock/README.dox
A testdata/LineItemMultiBlock/lineitem_orc_multiblock_one_stripe.orc
A testdata/LineItemMultiBlock/lineitem_sixblocks.orc
A testdata/LineItemMultiBlock/lineitem_threeblocks.orc
M testdata/bin/create-load-data.sh
M testdata/bin/generate-schema-statements.py
M testdata/bin/run-hive-server.sh
M testdata/cluster/node_templates/common/etc/hadoop/conf/hdfs-site.xml.tmpl
A testdata/data/chars-formats.orc
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-planner/queries/PlannerTest/complex-types-file-formats.test
M testdata/workloads/functional-query/functional-query_core.csv
M testdata/workloads/functional-query/functional-query_dimensions.csv
M testdata/workloads/functional-query/functional-query_exhaustive.csv
M testdata/workloads/functional-query/functional-query_pairwise.csv
A testdata/workloads/functional-query/queries/DataErrorsTest/orc-type-checks.test
M testdata/workloads/tpcds/tpcds_core.csv
M testdata/workloads/tpcds/tpcds_dimensions.csv
M testdata/workloads/tpcds/tpcds_exhaustive.csv
M testdata/workloads/tpcds/tpcds_pairwise.csv
M testdata/workloads/tpch/tpch_core.csv
M testdata/workloads/tpch/tpch_dimensions.csv
M testdata/workloads/tpch/tpch_exhaustive.csv
M testdata/workloads/tpch/tpch_pairwise.csv
M tests/common/impala_test_suite.py
M tests/common/test_dimensions.py
M tests/common/test_vector.py
M tests/comparison/cli_options.py
M tests/query_test/test_chars.py
M tests/query_test/test_decimal_queries.py
M tests/query_test/test_scanners.py
M tests/query_test/test_scanners_fuzz.py
M tests/query_test/test_tpch_queries.py
62 files changed, 1,745 insertions(+), 307 deletions(-)

Approvals:
  Quanlong Huang: Looks good to me, but someone else must approve
  Tim Armstrong: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/9134
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ia7b6ae4ce3b9ee8125b21993702faa87537790a4
Gerrit-Change-Number: 9134
Gerrit-PatchSet: 21
Gerrit-Owner: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
