Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/10550 )
Change subject: PREVIEW: IMPALA-7078: improve memory consumption of wide Avro scans ...................................................................... Patch Set 2: (6 comments) http://gerrit.cloudera.org:8080/#/c/10550/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/10550/2//COMMIT_MSG@18 PS2, Line 18: Cap the maximum row batch queue size at 5 * the number of active : scanner threads. adding this heuristic on top of the current one seems hard to understand how the system will behave. what's the reasoning behind the new heuristic combined with the old heuristic? does it make sense intuitive? http://gerrit.cloudera.org:8080/#/c/10550/2//COMMIT_MSG@32 PS2, Line 32: Track the row batch queue memory consumption against a sub-tracker what does that look like in terms of the memtracker hierarchy? http://gerrit.cloudera.org:8080/#/c/10550/2/be/src/exec/hdfs-avro-scanner.cc File be/src/exec/hdfs-avro-scanner.cc: http://gerrit.cloudera.org:8080/#/c/10550/2/be/src/exec/hdfs-avro-scanner.cc@496 PS2, Line 496: Check : // AtCapacity() at the end of the loop to guarantee that we process at least one row : // so that we mke progress even if the batch starts off with AtCapacity() == true. that's kinda confusing because it's counter to our usual which is that once AtCapaticy() is true you are not allowed to add more rows (it might have hit the hard limit of tuple ptrs). Maybe reword a bit and add a dcheck that the hard limit isn't reached yet. (Though this do-while pattern for AtCapacity() is preexisting.) http://gerrit.cloudera.org:8080/#/c/10550/2/be/src/exec/hdfs-avro-scanner.cc@568 PS2, Line 568: keep_current_chunk = true to but line 573 passes false. i think you meant false here. http://gerrit.cloudera.org:8080/#/c/10550/2/be/src/exec/hdfs-scan-node.cc File be/src/exec/hdfs-scan-node.cc: http://gerrit.cloudera.org:8080/#/c/10550/2/be/src/exec/hdfs-scan-node.cc@238 PS2, Line 238: row_batch->SetMemTracker(row_batches_mem_tracker_); which mem-tracker are these accounted to before this line? are they transfered back to another mem-tracker when being dequeued? when does that happen? http://gerrit.cloudera.org:8080/#/c/10550/2/be/src/runtime/mem-tracker.cc File be/src/runtime/mem-tracker.cc: http://gerrit.cloudera.org:8080/#/c/10550/2/be/src/runtime/mem-tracker.cc@243 PS2, Line 243: } why is it that we need this new interface now but didn't need it in other cases where we wanted to transfer between trackers? -- To view, visit http://gerrit.cloudera.org:8080/10550 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iebd2600b4784fd19696c9b92eefb7d7e9db0c80b Gerrit-Change-Number: 10550 Gerrit-PatchSet: 2 Gerrit-Owner: Tim Armstrong <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Comment-Date: Thu, 31 May 2018 00:30:35 +0000 Gerrit-HasComments: Yes
