Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/12168 )
Change subject: IMPALA-6503: Support reading complex types from ORC
......................................................................

IMPALA-6503: Support reading complex types from ORC

We already support reading primitive types from ORC files (IMPALA-5717).
This patch adds support for complex types (struct/array/map).

In IMPALA-5717, we leverage the ORC lib to parse ORC binaries (data in
io buffers read from DiskIoMgr). The ORC lib materializes ORC column
binaries into its own representation (orc::ColumnVectorBatch). We then
transform the values in orc::ColumnVectorBatch into impala::Tuples in
hdfs-orc-scanner. We don't need to do anything about
decoding/decompression since both are handled by the ORC lib.
Fortunately, the ORC lib already supports complex types, so we can keep
leveraging it for them.

What we need to add in IMPALA-6503 are two things:
1. Specify which nested columns we need in the form required by the ORC
   lib (get the list of ORC type ids from tuple descriptors)
2. Transform the outputs of the ORC lib (nested orc::ColumnVectorBatch)
   into Impala's representation (Slots/Tuples/RowBatches)

To perform the materialization, we implement several ORC column readers
in hdfs-orc-scanner. Each kind of reader handles one column type and
transforms the outputs of the ORC lib into tuple/slot values.

Tests:
* Enable existing tests for complex types (test_nested_types.py,
  test_tpch_nested_queries.py) for ORC.
* Run exhaustive tests in DEBUG and RELEASE builds.

Change-Id: I244dc9d2b3e425393f90e45632cb8cdbea6cf790
Reviewed-on: http://gerrit.cloudera.org:8080/12168
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
---
M be/src/exec/CMakeLists.txt
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-orc-scanner.h
A be/src/exec/orc-column-readers.cc
A be/src/exec/orc-column-readers.h
A be/src/exec/orc-metadata-utils.cc
A be/src/exec/orc-metadata-utils.h
M be/src/runtime/descriptors.cc
M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java
A testdata/ComplexTypesTbl/README
A testdata/ComplexTypesTbl/nonnullable.orc
A testdata/ComplexTypesTbl/nullable.orc
M testdata/bin/create-load-data.sh
M testdata/bin/load_nested.py
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-planner/queries/PlannerTest/complex-types-file-formats.test
M testdata/workloads/functional-query/queries/DataErrorsTest/orc-type-checks.test
M testdata/workloads/functional-query/queries/QueryTest/max-nesting-depth.test
M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M testdata/workloads/functional-query/queries/QueryTest/nested-types-tpch-limit.test
M testdata/workloads/functional-query/queries/QueryTest/nested-types-tpch-mem-limit-single-node.test
M testdata/workloads/functional-query/queries/QueryTest/nested-types-tpch-mem-limit.test
M testdata/workloads/functional-query/queries/QueryTest/nested-types-tpch.test
M testdata/workloads/tpch_nested/tpch_nested_core.csv
M testdata/workloads/tpch_nested/tpch_nested_dimensions.csv
M testdata/workloads/tpch_nested/tpch_nested_exhaustive.csv
M testdata/workloads/tpch_nested/tpch_nested_pairwise.csv
M tests/query_test/test_nested_types.py
M tests/query_test/test_tpch_nested_queries.py
30 files changed, 1,943 insertions(+), 466 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/12168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I244dc9d2b3e425393f90e45632cb8cdbea6cf790
Gerrit-Change-Number: 12168
Gerrit-PatchSet: 25
Gerrit-Owner: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
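
Illustrative note on the two steps named in the commit message: the sketch below is not code from this change. It is a minimal Python model, assuming that ORC numbers the nodes of its schema tree in pre-order (root is 0) and that a list column arrives as a flat element array plus per-row offsets. `OrcType`, `assign_type_ids`, `selected_type_ids`, and `materialize_lists` are hypothetical names chosen for illustration; the actual patch works on tuple descriptors and orc::ColumnVectorBatch in C++.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class OrcType:
    """Hypothetical stand-in for one node of an ORC schema tree."""
    kind: str
    children: List["OrcType"] = field(default_factory=list)
    type_id: int = -1


def assign_type_ids(node: OrcType, next_id: int = 0) -> int:
    """Number nodes in pre-order (root is 0); returns the next unused id."""
    node.type_id = next_id
    next_id += 1
    for child in node.children:
        next_id = assign_type_ids(child, next_id)
    return next_id


def selected_type_ids(root: OrcType, paths: List[List[int]]) -> List[int]:
    """Step 1: map each path of child positions to the type id it reaches."""
    ids = []
    for path in paths:
        node = root
        for pos in path:
            node = node.children[pos]
        ids.append(node.type_id)
    return sorted(set(ids))


def materialize_lists(offsets, elements, not_null) -> List[Optional[list]]:
    """Step 2: turn a flattened list batch into one Python list per row.

    Row i owns elements[offsets[i]:offsets[i + 1]]; a null row becomes None.
    """
    rows = []
    for i, present in enumerate(not_null):
        rows.append(elements[offsets[i]:offsets[i + 1]] if present else None)
    return rows


# Assumed example schema:
# struct<id:bigint, int_array:array<int>, int_map:map<string,int>>
schema = OrcType("struct", [
    OrcType("bigint"),
    OrcType("array", [OrcType("int")]),
    OrcType("map", [OrcType("string"), OrcType("int")]),
])
assign_type_ids(schema)

# Select the "id" column and the element of "int_array".
wanted = selected_type_ids(schema, [[0], [1, 0]])

# Three rows of an array column; the second row is null.
rows = materialize_lists([0, 2, 2, 5], [1, 2, 3, 4, 5], [True, False, True])
```

Here the paths are plain child positions, which is a simplification; how Impala's tuple descriptors map onto ORC type ids is defined by the patch itself (see be/src/exec/orc-metadata-utils.cc in the file list).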
