Marcel Kornacker has posted comments on this change. Change subject: IMPALA-3905: Implements HdfsScanner::GetNext() for text scans. ......................................................................
Patch Set 2: (12 comments) http://gerrit.cloudera.org:8080/#/c/6000/2/be/src/exec/hdfs-text-scanner.cc File be/src/exec/hdfs-text-scanner.cc: Line 64: batch_start_ptr_(NULL), use nullptr throughout Line 273: MemPool* pool = row_batch->tuple_data_pool(); please consider getting rid of this variable and referring to the rowbatch pool directly. there are too many mempools flying around in this code, it's nice to have a reminder which one is being used. Line 440: eos_ = true; advance scan_state_? also, i don't see anything in hdfs-scanner.h that says you can't call getnext() when eos() returns true. Line 445: << "FE should have generated SNAPPY_BLOCKED instead."; why not check this when creating the stream? Line 446: RETURN_IF_ERROR(UpdateDecompressor(stream_->file_desc()->file_compression)); why is this needed, given that we just started (ie, why not move it into initialization)? Line 459: eos_ = true; scan_state_ transition? Line 469: Status HdfsTextScanner::CommitRows(RowBatch* row_batch, int num_rows) { > This loosely corresponds to the previous HdfsScanNode::CommitRows() how about coalescing that with hdfsscannode::commitrows? at the very least move this into hdfsscannode. Line 485: ExprContext::FreeLocalAllocations(entry.second); why call this here? http://gerrit.cloudera.org:8080/#/c/6000/2/be/src/exec/hdfs-text-scanner.h File be/src/exec/hdfs-text-scanner.h: Line 119: /// regardless if whether it passed the conjuncts. regardless of whether Line 124: /// Only valid to call in scan state FIRST_TUPLE_FOUND or PAST_SCAN_RANGE. thanks for introducing that state variable, it's much clearer than a bunch of bools that get passed around. Line 135: Status CommitRows(RowBatch* row_batch, int num_rows); in/out params go last Line 140: /// otherwise it will just read num_bytes. If we are reading compressed text, num_bytes if we are reading compressed text, should this even be called (or why is there a 'compressed stream' variant)? -- To view, visit http://gerrit.cloudera.org:8080/6000 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id193aa223434d7cc40061a42f81bbb29dcd0404b Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Marcel Kornacker <[email protected]> Gerrit-HasComments: Yes
