Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10550 )

Change subject: PREVIEW: IMPALA-7078: improve memory consumption of wide Avro 
scans
......................................................................


Patch Set 2:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/10550/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10550/2//COMMIT_MSG@18
PS2, Line 18: Cap the maximum row batch queue size at 5 * the number of active
            : scanner threads.
adding this heuristic on top of the current one seems hard to understand how 
the system will behave. what's the reasoning behind the new heuristic combined 
with the old heuristic? does it make sense intuitive?


http://gerrit.cloudera.org:8080/#/c/10550/2//COMMIT_MSG@32
PS2, Line 32: Track the row batch queue memory consumption against a sub-tracker
what does that look like in terms of the memtracker hierarchy?


http://gerrit.cloudera.org:8080/#/c/10550/2/be/src/exec/hdfs-avro-scanner.cc
File be/src/exec/hdfs-avro-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/10550/2/be/src/exec/hdfs-avro-scanner.cc@496
PS2, Line 496: Check
             :   // AtCapacity() at the end of the loop to guarantee that we 
process at least one row
             :   // so that we mke progress even if the batch starts off with 
AtCapacity() == true.
that's kinda confusing because it's counter to our usual which is that once 
AtCapaticy() is true you are not allowed to add more rows (it might have hit 
the hard limit of tuple ptrs). Maybe reword a bit and add a dcheck that the 
hard limit isn't reached yet. (Though this do-while pattern for AtCapacity() is 
preexisting.)


http://gerrit.cloudera.org:8080/#/c/10550/2/be/src/exec/hdfs-avro-scanner.cc@568
PS2, Line 568: keep_current_chunk = true to
but line 573 passes false. i think you meant false here.


http://gerrit.cloudera.org:8080/#/c/10550/2/be/src/exec/hdfs-scan-node.cc
File be/src/exec/hdfs-scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/10550/2/be/src/exec/hdfs-scan-node.cc@238
PS2, Line 238:   row_batch->SetMemTracker(row_batches_mem_tracker_);
which mem-tracker are these accounted to before this line?
are they transfered back to another mem-tracker when being dequeued? when does 
that happen?


http://gerrit.cloudera.org:8080/#/c/10550/2/be/src/runtime/mem-tracker.cc
File be/src/runtime/mem-tracker.cc:

http://gerrit.cloudera.org:8080/#/c/10550/2/be/src/runtime/mem-tracker.cc@243
PS2, Line 243: }
why is it that we need this new interface now but didn't need it in other cases 
where we wanted to transfer between trackers?



--
To view, visit http://gerrit.cloudera.org:8080/10550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iebd2600b4784fd19696c9b92eefb7d7e9db0c80b
Gerrit-Change-Number: 10550
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Thu, 31 May 2018 00:30:35 +0000
Gerrit-HasComments: Yes

Reply via email to