Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17870 )
Change subject: IMPALA-10894: Pushing down predicates in reading "original files" of ACID tables
......................................................................

IMPALA-10894: Pushing down predicates in reading "original files" of ACID tables

ACID tables can have "original files" that don't have the full ACID schema. For instance, if we upgrade a non-ACID table to full ACID, the original files are not rewritten, so they lack the ACID columns, i.e. operation, originalTransaction, bucket, rowid, and currentTransaction. Except for rowid, the other four columns can be calculated from the file path. We calculate the rowid as the row index inside the file. This is done by setting a first row id for the split; the OrcStructReader then fills the rowid slot with values auto-incremented by one.

However, if we push down predicates into the ORC reader, some rows may be skipped. The ORC lib guarantees that rows in a returned batch are consecutive, but rows may be skipped between consecutive batches. So we can't simply auto-increment the first row id by 1 to calculate the row index. Instead, we should use orc::RowReader::getRowNumber() to update the first row index of each returned batch.

This patch changes the row index initialization logic to use orc::RowReader::getRowNumber(), and removes the branch that skipped predicate pushdown in such cases.
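
The difference between the two row id strategies described above can be illustrated with a minimal sketch. It is not the code from hdfs-orc-scanner.cc: FakeBatch, FakeRowReader, RowIdsByAutoIncrement and RowIdsByRowNumber are hypothetical names invented here, and FakeRowReader::GetRowNumber() only stands in for orc::RowReader::getRowNumber(), assumed to report the file-level row number of the first row in the most recently read batch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a batch returned by the ORC reader. Rows inside a
// batch are consecutive, so (first row number, row count) describes it fully.
struct FakeBatch {
  uint64_t first_row_number;  // what getRowNumber() is assumed to report
  uint64_t num_rows;
};

// Hypothetical stand-in for a row reader with predicates pushed down: the
// batches it yields may leave gaps between them where rows were skipped.
class FakeRowReader {
 public:
  explicit FakeRowReader(std::vector<FakeBatch> batches)
      : batches_(std::move(batches)) {}

  // Advances to the next batch; returns false once all batches are consumed.
  bool Next() {
    if (next_ >= batches_.size()) return false;
    current_ = batches_[next_++];
    assert(current_.num_rows > 0);  // this sketch never yields empty batches
    return true;
  }

  uint64_t GetRowNumber() const { return current_.first_row_number; }
  uint64_t NumRows() const { return current_.num_rows; }

 private:
  std::vector<FakeBatch> batches_;
  std::size_t next_ = 0;
  FakeBatch current_{0, 0};
};

// Strategy 1: keep auto-incrementing from the split's first row id.
// This ignores any rows skipped between batches.
std::vector<uint64_t> RowIdsByAutoIncrement(FakeRowReader reader,
                                            uint64_t first_row_id) {
  std::vector<uint64_t> ids;
  uint64_t next_id = first_row_id;
  while (reader.Next()) {
    for (uint64_t i = 0; i < reader.NumRows(); ++i) ids.push_back(next_id++);
  }
  return ids;
}

// Strategy 2: re-read the first row number from the reader for every batch,
// then count up only within that batch.
std::vector<uint64_t> RowIdsByRowNumber(FakeRowReader reader) {
  std::vector<uint64_t> ids;
  while (reader.Next()) {
    const uint64_t base = reader.GetRowNumber();
    for (uint64_t i = 0; i < reader.NumRows(); ++i) ids.push_back(base + i);
  }
  return ids;
}
```

With two batches starting at rows 0 and 1000, the first strategy keeps counting from 2 after the gap, while the second resumes at 1000, which is why the auto-increment scheme breaks once predicates are pushed down.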

Tests:
- Ran test_full_acid_original_files

Change-Id: I5bfdb624fcaf62ffa22f53025761b9dee3fe58a2
Reviewed-on: http://gerrit.cloudera.org:8080/17870
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
---
M be/src/exec/hdfs-orc-scanner.cc
1 file changed, 5 insertions(+), 17 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/17870
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I5bfdb624fcaf62ffa22f53025761b9dee3fe58a2
Gerrit-Change-Number: 17870
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
