Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/17870 )

Change subject: IMPALA-10894: Pushing down predicates in reading "original 
files" of ACID tables
......................................................................

IMPALA-10894: Pushing down predicates in reading "original files" of ACID tables

ACID tables can have "original files" that don't have full ACID schema.
For instance, if we upgrade a non-ACID table to full ACID, the original
files won't be changed so they don't have ACID columns, i.e. operation,
originalTransaction, bucket, rowid, and currentTransaction.

Apart from rowid, the other four columns can be derived from the file
path. We calculate rowid as the row index inside the file: the scanner
sets the first row id for the split, and the OrcStructReader then fills
the rowid slot with values that auto-increment by one.

However, if we push down predicates into the ORC reader, some rows may
be skipped. The ORC lib guarantees that rows within a returned batch are
consecutive, but rows may be skipped between consecutive batches. So we
can't simply keep auto-incrementing from the split's first row id to
calculate the row index. Instead, we should use
orc::RowReader::getRowNumber() to update the first row index of each
returned batch.
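
The difference between the two approaches can be illustrated with a
small standalone sketch. It does not use the ORC library or Impala code:
MockBatch, RowIdsByAutoIncrement and RowIdsByBatchRowNumber are
hypothetical names, and the first_row_number field stands in for the
value that orc::RowReader::getRowNumber() would report for the batch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one batch returned by a row reader. With the
// real ORC library, the first row number of the batch would come from
// orc::RowReader::getRowNumber(); here it is simply stored in the struct.
struct MockBatch {
  uint64_t first_row_number;  // file-level index of the batch's first row
  uint64_t num_rows;          // rows in the batch, consecutive in the file
};

// Old approach: seed once with the split's first row id and keep counting.
// Only correct if no rows are skipped between batches.
std::vector<uint64_t> RowIdsByAutoIncrement(
    uint64_t split_first_row, const std::vector<MockBatch>& batches) {
  std::vector<uint64_t> ids;
  uint64_t next = split_first_row;
  for (const MockBatch& b : batches) {
    for (uint64_t i = 0; i < b.num_rows; ++i) ids.push_back(next++);
  }
  return ids;
}

// New approach: re-seed at every batch from the reader-reported row number,
// then auto-increment only within the batch.
std::vector<uint64_t> RowIdsByBatchRowNumber(
    const std::vector<MockBatch>& batches) {
  std::vector<uint64_t> ids;
  for (const MockBatch& b : batches) {
    // Sanity check for this sketch: batches arrive in ascending file order.
    assert(ids.empty() || b.first_row_number > ids.back());
    for (uint64_t i = 0; i < b.num_rows; ++i) {
      ids.push_back(b.first_row_number + i);
    }
  }
  return ids;
}
```

For example, with batches {0, 3} and {10, 2} (rows 3..9 filtered out by
a pushed-down predicate), the first function yields 0,1,2,3,4 while the
second yields 0,1,2,10,11, which matches the rows' positions in the file.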

This patch changes the row index initialization logic to use
orc::RowReader::getRowNumber(), and removes the branch that skipped
predicate pushdown in this case.

Tests:
 - Ran test_full_acid_original_files

Change-Id: I5bfdb624fcaf62ffa22f53025761b9dee3fe58a2
Reviewed-on: http://gerrit.cloudera.org:8080/17870
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
---
M be/src/exec/hdfs-orc-scanner.cc
1 file changed, 5 insertions(+), 17 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/17870

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I5bfdb624fcaf62ffa22f53025761b9dee3fe58a2
Gerrit-Change-Number: 17870
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
