Zoltan Borok-Nagy has uploaded this change for review.
( http://gerrit.cloudera.org:8080/15818 )
Change subject: IMPALA-9512: Full ACID Milestone 2: Validate each row against
the valid write id list
......................................................................
IMPALA-9512: Full ACID Milestone 2: Validate each row against the valid write
id list
Minor compactions can merge several delta directories into a single
delta directory. The directory filtering algorithm is modified to
handle minor compacted directories and to prefer them over the plain
delta directories they cover. This happens in the Frontend, mostly in
AcidUtils.java.
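
To illustrate the idea of preferring minor compacted directories, here is
a minimal Python sketch. It is not the algorithm in AcidUtils.java: the
helper names (write_id_range, prefer_compacted) are hypothetical, the
directory names are assumed to start with delta_<minWriteId>_<maxWriteId>,
and base directories, delete deltas and write id validity are ignored. It
only drops a delta whose write id range is covered by a wider delta.

```python
import re
from typing import List, Tuple

# Assumed naming: delta_<minWriteId>_<maxWriteId>, optionally followed by a suffix.
_DELTA_RE = re.compile(r"^delta_(\d+)_(\d+)")


def write_id_range(dir_name: str) -> Tuple[int, int]:
    """Parse (minWriteId, maxWriteId) from a delta directory name."""
    match = _DELTA_RE.match(dir_name)
    if match is None:
        raise ValueError(f"not a delta directory: {dir_name}")
    return int(match.group(1)), int(match.group(2))


def prefer_compacted(dir_names: List[str]) -> List[str]:
    """Keep only deltas that are not covered by a wider delta (hypothetical helper)."""
    ranges = {name: write_id_range(name) for name in dir_names}
    kept = []
    for name, (lo, hi) in ranges.items():
        covered = any(
            other != name
            and o_lo <= lo
            and hi <= o_hi
            and (o_lo, o_hi) != (lo, hi)  # identical ranges do not hide each other
            for other, (o_lo, o_hi) in ranges.items()
        )
        if not covered:
            kept.append(name)
    return kept


dirs = [
    "delta_0000001_0000001_0000",
    "delta_0000002_0000002_0000",
    "delta_0000001_0000010",       # assumed result of a minor compaction
    "delta_0000011_0000011_0000",
]
selected = prefer_compacted(dirs)
```

In this sketch the two single-write-id deltas inside the range 1..10 are
dropped, while the wider delta and the later delta 11 are kept.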
In minor compacted directories we need to filter out rows we cannot see.
E.g. we can have the following delta directory:

  full_acid/delta_0000001_0000010_0000/0000   # minWriteId: 1
                                              # maxWriteId: 10

This delta dir contains rows with write ids between 1 and 10, but the
reader might only be allowed to see write ids less than 5. Therefore we
need to check the ACID write id column (named originalTransaction) of
each row to decide whether the row is valid or not.
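
The per-row check described above can be sketched in Python as follows.
This is a simplified illustration, not the ValidWriteIdList class added
by this change: WriteIdSnapshot and visible_rows are hypothetical names,
and the snapshot is assumed to consist of a high watermark plus a set of
write ids below it that must not be seen (for example open or aborted
ones).

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class WriteIdSnapshot:
    """Hypothetical, simplified stand-in for a valid write id list."""
    high_watermark: int                       # largest write id that may be visible
    invalid_ids: FrozenSet[int] = frozenset()  # ids at or below the watermark to skip

    def is_valid(self, write_id: int) -> bool:
        return write_id <= self.high_watermark and write_id not in self.invalid_ids


def visible_rows(rows: List[Dict[str, int]], snapshot: WriteIdSnapshot) -> List[Dict[str, int]]:
    """Keep rows whose originalTransaction value is valid for the snapshot."""
    return [row for row in rows if snapshot.is_valid(row["originalTransaction"])]


# Rows of a delta directory covering write ids 1..10, one row per write id.
rows = [{"originalTransaction": w, "value": w * 100} for w in range(1, 11)]

# The reader may only see write ids less than 5, and write id 3 is assumed invalid.
snapshot = WriteIdSnapshot(high_watermark=4, invalid_ids=frozenset({3}))
seen = [row["originalTransaction"] for row in visible_rows(rows, snapshot)]
```

With this snapshot only the rows written by write ids 1, 2 and 4 remain.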
The row validation is implemented in the scanner and the column readers.
New utility classes (ValidWriteIdList and OrcRowValidator) are added to
keep the validation logic separate.
Most of the complexity comes from the fact that the tuple row index and
the ORC file batch index are out of sync, and row validation is based on
the batch index. A helper method (GetTopLevelIndex()) is added to the
column readers. It maps tuple indexes to file batch indexes.
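
One way such an index mismatch can arise is sketched below in Python. The
scenario is an assumption and is not stated in this message: tuples are
taken to be materialized from the items of a nested collection, so one
top-level row of the batch corresponds to zero or more tuples. The
function top_level_index is hypothetical and is not the GetTopLevelIndex()
implementation. It assumes an offsets array with one more entry than
there are top-level rows, where row i owns items offsets[i] up to
offsets[i + 1] - 1.

```python
from bisect import bisect_right
from typing import List


def top_level_index(offsets: List[int], item_index: int) -> int:
    """Map an item (tuple) index to the top-level batch row that owns it."""
    if not 0 <= item_index < offsets[-1]:
        raise IndexError(f"item index out of range: {item_index}")
    # The owning row is the last one whose start offset is <= item_index.
    return bisect_right(offsets, item_index) - 1


# Three top-level rows: row 0 owns items 0-1, row 1 is empty, row 2 owns items 2-4.
offsets = [0, 2, 2, 5]

# Per-row validity computed on batch indexes (assumed result of row validation).
row_is_valid = [True, False, True]

owners = [top_level_index(offsets, i) for i in range(offsets[-1])]
item_is_valid = [row_is_valid[row] for row in owners]
```

The validity of each tuple is then looked up through the batch index of
its owning row instead of the tuple index itself.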
Testing
* the frontend logic is tested in AcidUtilsTest
* the backend row validation is tested in test_acid_row_validation
Change-Id: I5ed74585a2d73ebbcee763b0545be4412926299d
---
M be/src/exec/CMakeLists.txt
A be/src/exec/acid-metadata-utils.cc
A be/src/exec/acid-metadata-utils.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/orc-column-readers.cc
M be/src/exec/orc-column-readers.h
M be/src/exec/orc-metadata-utils.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java
M testdata/bin/generate-schema-statements.py
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-query/queries/QueryTest/acid-row-validation.test
A tests/common/acid_txn.py
M tests/common/impala_test_suite.py
A tests/query_test/test_acid_row_validation.py
23 files changed, 2,599 insertions(+), 149 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/18/15818/1
--
To view, visit http://gerrit.cloudera.org:8080/15818
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I5ed74585a2d73ebbcee763b0545be4412926299d
Gerrit-Change-Number: 15818
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>