Tim Armstrong has posted comments on this change.

Change subject: MPALA-5776: Write partial tuple to the correct mempool
......................................................................


Patch Set 2:

(10 comments)

This is very subtle. I think the solution works but maybe there are some ways 
we can make it clearer why it works.

http://gerrit.cloudera.org:8080/#/c/7639/2//COMMIT_MSG
Commit Message:

Line 7: MPALA-5776: Write partial tuple to the correct mempool
Missing I.


http://gerrit.cloudera.org:8080/#/c/7639/2/be/src/exec/hdfs-text-scanner.cc
File be/src/exec/hdfs-text-scanner.cc:

Line 169:     row_batch->tuple_data_pool()->AcquireData(boundary_pool_.get(), 
false);
Does this even make sense? are there any cases where it's valid for the row 
batch to reference data in the boundary pool directly?

It seems like either this is unnecessary or we should be transferring the 
contents of boundary_pool_ below instead of clearing it.


Line 299:         boundary_column_.Clear();
This is subtle because the contents of boundary_column_ are needed until 
WriteFields() below, I think. Maybe it would be clearer to document this or 
even move it below to the point where the data is no longer needed.


PS2, Line 765: batches
I'm not sure that I understand the reference to batches. Do they mean blocks?


Line 806:       partial_tuple_->DeepCopy(tuple_, *scan_node_->tuple_desc(), 
pool);
Can you add a column clarifying what the intent is. E.g. "Copy over all 
materialized slots in the partial tuple".


Line 808:       boundary_row_.Reset();
It might be clearer if we factored out this reset logic for boundary_ and 
partial_tuple_ into a separate function and documented the pre-conditions for 
it. I.e. the row has been fully materialized and all memory has been copied 
into the RowBatch pool (I think that's the precondition but not sure).


PS2, Line 810: emptied
cleared? The comment makes me thing that all the memory has been freed, but 
Clear() just enables recycling of the chunks.


Line 814:       partial_tuple_ = Tuple::Create(tuple_byte_size_, 
boundary_pool_.get());
Is it possible to just initialize partial_tuple_ when it's needed? I.e. before 
we call InitTuple(template_tuple_, partial_tuple_). Then maybe we can get rid 
of partial_tuple_empty_ and represent that as partition_tuple_ == nullptr.


http://gerrit.cloudera.org:8080/#/c/7639/2/be/src/exec/hdfs-text-scanner.h
File be/src/exec/hdfs-text-scanner.h:

Line 188:   /// the boundary pool.
Hah! If only the code had matched the comment...


Line 194:   /// Mem pool for boundary_row_ and boundary_column_.
Needs updating since it also backs partial_tuple_.


-- 
To view, visit http://gerrit.cloudera.org:8080/7639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tbobrovyt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to