Alex Behm has uploaded a new change for review. http://gerrit.cloudera.org:8080/6000
Change subject: IMPALA-3905: Implements HdfsScanner::GetNext() for text scans.
......................................................................

IMPALA-3905: Implements HdfsScanner::GetNext() for text scans.

Implements HdfsTextScanner::GetNext() and changes ProcessSplit() to
repeatedly call GetNext(), so that the core scanning code is shared
between the legacy ProcessSplit() interface and the new GetNext()
interface.

These changes were tricky:
- The scanner used to rely on the ability to attach a batch to the
  row-batch queue for freeing resources.
- This patch attempts to preserve the resource-freeing behavior by
  clearing resources as soon as they are complete.
- In particular, the scanner attempts to skip corrupt/invalid data
  blocks, and we should avoid accumulating memory unnecessarily.

The other changes are mostly straightforward:
- Add a RowBatch parameter to various functions.
- Add a MemPool parameter to various functions for attaching the memory
  of completed resources that may still be referenced by returned batches.
- Change Close() to free all resources when a nullptr RowBatch is passed.

Testing:
- Exhaustive tests passed on debug
- Core tests passed on ASAN
- TODO: Perf testing on cluster

Change-Id: Id193aa223434d7cc40061a42f81bbb29dcd0404b
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node-mt.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
M be/src/exec/scanner-context.cc
M be/src/exec/scanner-context.h
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
13 files changed, 404 insertions(+), 320 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/6000/1
--
To view, visit http://gerrit.cloudera.org:8080/6000

Gerrit-MessageType: newchange
Gerrit-Change-Id: Id193aa223434d7cc40061a42f81bbb29dcd0404b
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <[email protected]>
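The C++ below is a minimal, self-contained sketch of the control flow the commit message describes. It is not code from this change. ProcessSplit() becomes a plain loop over GetNext(), a fully consumed block is handed to the returned batch's MemPool because rows still point into it, a corrupt block is dropped immediately, and Close(nullptr) frees everything. RowBatch, MemPool and Buffer here are toy stand-ins, not Impala's real classes. TextScannerSketch, IsCorrupt() with its '#' marker, scratch_ and holds_resources() are hypothetical names chosen for illustration. It also assumes that Close() with a non-null batch attaches the remaining resources to that batch.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical stand-in for a scanner-owned resource, e.g. a decompressed data block.
struct Buffer {
  std::string data;
};

// Hypothetical, much-simplified MemPool: owns buffers until the pool itself is destroyed.
class MemPool {
 public:
  void AcquireData(std::unique_ptr<Buffer> buf) {
    if (buf != nullptr) owned_.push_back(std::move(buf));
  }
  std::size_t num_buffers() const { return owned_.size(); }

 private:
  std::vector<std::unique_ptr<Buffer>> owned_;
};

// Hypothetical, much-simplified RowBatch: rows point into buffers kept alive by 'pool'.
struct RowBatch {
  std::vector<std::string_view> rows;
  MemPool pool;
};

// Toy text scanner; every name here is illustrative, not Impala's actual API.
class TextScannerSketch {
 public:
  explicit TextScannerSketch(std::vector<std::string> blocks)
      : blocks_(std::move(blocks)), scratch_(std::make_unique<Buffer>()) {}

  // New-style interface: scans one block into 'batch' and sets '*eos' when done.
  void GetNext(RowBatch* batch, bool* eos) {
    assert(!closed_ && "GetNext() called after Close()");
    if (next_block_ < blocks_.size()) {
      auto buf = std::make_unique<Buffer>(Buffer{blocks_[next_block_++]});
      if (IsCorrupt(*buf)) {
        // Skip the block; 'buf' is freed when it goes out of scope below,
        // so corrupt data does not accumulate.
        ++blocks_skipped_;
      } else {
        std::string_view data(buf->data);
        std::size_t start = 0;
        while (start < data.size()) {
          std::size_t end = data.find('\n', start);
          if (end == std::string_view::npos) end = data.size();
          batch->rows.push_back(data.substr(start, end - start));
          start = end + 1;
        }
        // The block is fully consumed, but the returned rows still reference it,
        // so ownership moves to the batch's pool instead of being freed here.
        batch->pool.AcquireData(std::move(buf));
      }
    }
    *eos = next_block_ >= blocks_.size();
  }

  // Legacy-style interface: a loop over GetNext() that queues non-empty batches.
  void ProcessSplit(std::vector<std::unique_ptr<RowBatch>>* queue) {
    bool eos = false;
    while (!eos) {
      auto batch = std::make_unique<RowBatch>();
      GetNext(batch.get(), &eos);
      if (!batch->rows.empty()) queue->push_back(std::move(batch));
    }
  }

  // Non-null 'batch': remaining resources are attached to it. nullptr: free everything.
  void Close(RowBatch* batch) {
    if (batch != nullptr) {
      batch->pool.AcquireData(std::move(scratch_));
    } else {
      scratch_.reset();
    }
    closed_ = true;
  }

  bool holds_resources() const { return scratch_ != nullptr; }
  int blocks_skipped() const { return blocks_skipped_; }

 private:
  // Hypothetical corruption marker, used only to exercise the skip path.
  static bool IsCorrupt(const Buffer& buf) { return !buf.data.empty() && buf.data[0] == '#'; }

  std::vector<std::string> blocks_;
  std::size_t next_block_ = 0;
  std::unique_ptr<Buffer> scratch_;  // e.g. a reusable decompression buffer
  int blocks_skipped_ = 0;
  bool closed_ = false;
};
```

Because ProcessSplit() only drives GetNext(), both interfaces share a single scanning path. Batches remain valid after being queued, since each batch's own pool owns the blocks its rows point into.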
