Alex Behm has posted comments on this change.

Change subject: IMPALA-4923: reduce memory transfer for selective scans
......................................................................

Patch Set 2:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/6949/2/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 1031: DCHECK_EQ(0, scratch_batch_->total_allocated_bytes());
Where do the decompression buffers get freed?


http://gerrit.cloudera.org:8080/#/c/6949/2/be/src/exec/parquet-scratch-tuple-batch.h
File be/src/exec/parquet-scratch-tuple-batch.h:

PS2, Line 48: MemPool
It's not clear from the var names and comments where the var-len data goes. That's important to point out explicitly.


Line 50: // Pool used to accumulate other memory such as decompression buffers that may be
may be referenced


Line 109: dst_batch->tuple_data_pool()->AcquireData(&aux_mem_pool, false);
I would have thought that the var-len data like strings or collections can make up the bulk of the memory that needs to be transferred, so why not deep-copy those out as well and avoid this transfer? What's the rationale behind only avoiding the transfer of memory for the fixed-len portion?


Line 130: if (num_output_batches > 1) return false;
This new compaction has non-obvious caveats like this one, and I find the flow of memory difficult to follow now. I wonder if this process could be simplified if we did something along these lines:

1. Evaluate conjuncts over all tuples in the scratch batch. Keep a bitmap of survivors.
2. Decide whether to compact the scratch batch or not.
3. Transfer rows to the output batch.

When AtEnd() of the scratch batch, have a function TransferResources() or similar to transfer whatever the output batch needs. This may be the original memory or memory from compaction.

Let's discuss before you make any changes obviously :)


Line 139: for (int i = dst_batch->num_rows(); i < end_row; ++i) {
Don't we have a CopyRows() for this in RowBatch?

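
To make the three-step flow suggested in the Line 130 comment easier to discuss, here is a rough, standalone C++ sketch. It is not Impala code: ScratchBatch, OutputBatch, EvalConjuncts(), ShouldCompact(), TransferRows() and the max_selectivity threshold are all hypothetical names and assumptions made up for illustration, and tuples are modeled as plain int64_t values with no var-len data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for the scratch batch: fixed-length tuples only.
struct ScratchBatch {
  std::vector<int64_t> tuples;
};

// Hypothetical stand-in for the output row batch.
struct OutputBatch {
  std::vector<int64_t> rows;
  // True if the original scratch memory would be attached to this batch,
  // false if only compacted copies would be handed over.
  bool owns_scratch_memory = false;
};

// Step 1: evaluate the conjunct over every tuple and keep a bitmap of survivors.
std::vector<bool> EvalConjuncts(const ScratchBatch& scratch,
                                const std::function<bool(int64_t)>& conjunct) {
  std::vector<bool> survivors(scratch.tuples.size());
  for (std::size_t i = 0; i < scratch.tuples.size(); ++i) {
    survivors[i] = conjunct(scratch.tuples[i]);
  }
  return survivors;
}

// Step 2: decide whether to compact. The selectivity threshold is an assumed
// policy, not something taken from the patch.
bool ShouldCompact(const std::vector<bool>& survivors, double max_selectivity) {
  if (survivors.empty()) return false;
  std::size_t num_survivors = 0;
  for (bool s : survivors) num_survivors += s ? 1 : 0;
  return static_cast<double>(num_survivors) / survivors.size() <= max_selectivity;
}

// Step 3: transfer surviving rows to the output batch, then record which memory
// the output batch would need (the TransferResources() idea). In this toy model
// the values are copied either way; the flag only records the decision.
void TransferRows(const ScratchBatch& scratch, const std::vector<bool>& survivors,
                  bool compact, OutputBatch* dst) {
  assert(survivors.size() == scratch.tuples.size());
  for (std::size_t i = 0; i < scratch.tuples.size(); ++i) {
    if (survivors[i]) dst->rows.push_back(scratch.tuples[i]);
  }
  dst->owns_scratch_memory = !compact;
}
```

A caller would run the three functions in order for each scratch batch. The point of the sketch is only that the compaction decision is made once, after all conjuncts have been evaluated, so the memory hand-off happens in a single place.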
--
To view, visit http://gerrit.cloudera.org:8080/6949
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3773dc63c498e295a2c1386a15c5e69205e747ea
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-HasComments: Yes
