Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans. ......................................................................
IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans. Implements HdfsScanner::GetNext() for the Avro, RC File, and Sequence File scanners. Changes ProcessSplit() to repeatedly call GetNext() to share the core scanning code between the legacy ProcessSplit() interface (ProcessSplit()) and the new GetNext() interface. Summary of changes: - Slightly change code flow for initial scan range that only parses the file header. The new code sets 'only_parsing_header_' in Open() and then honors that flag in GetNextInternal(). Before, all the logic was inside ProcessSpit(). - Replace 'finished_' with 'eos_'. - Add a RowBatch parameter to various functions. - Change Close() to free all resources when a nullptr RowBatch is passed. Testing: - Exhaustive tests passed on debug - Core tests passed on asan - TODO: Perf testing on cluster Change-Id: Ie18f57b0d3fe0052a8ccd361b6a5fcdf979d0669 Reviewed-on: http://gerrit.cloudera.org:8080/6527 Reviewed-by: Alex Behm <[email protected]> Tested-by: Impala Public Jenkins --- M be/src/exec/base-sequence-scanner.cc M be/src/exec/base-sequence-scanner.h M be/src/exec/exec-node.cc M be/src/exec/exec-node.h M be/src/exec/hdfs-avro-scanner-ir.cc M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-avro-scanner.h M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-rcfile-scanner.cc M be/src/exec/hdfs-rcfile-scanner.h M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scan-node-mt.h M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scan-node.h M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/hdfs-sequence-scanner.cc M be/src/exec/hdfs-sequence-scanner.h M be/src/exec/hdfs-text-scanner.cc M be/src/exec/hdfs-text-scanner.h M be/src/exec/kudu-scan-node.cc M be/src/exec/scan-node.h M be/src/util/blocking-queue.h M testdata/workloads/functional-query/queries/DataErrorsTest/avro-errors.test 26 files changed, 709 insertions(+), 824 deletions(-) Approvals: Impala Public Jenkins: Verified Alex Behm: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/6527 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Ie18f57b0d3fe0052a8ccd361b6a5fcdf979d0669 Gerrit-PatchSet: 12 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Henry Robinson <[email protected]> Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker <[email protected]> Gerrit-Reviewer: Sailesh Mukil <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Reviewer: anujphadke <[email protected]>
