Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/15963 )
Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
......................................................................


Patch Set 12:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc
File be/src/exec/sort-node.cc:

http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc@90
PS12, Line 90: GetRowSize
> I didn't dig too deep, but row_descriptor_->GetRowSize() seems to contain t
I will look into whether that average size data can be accessed in the backend. But just to make sure I get it right: for a row that contains varlen data, GetRowSize() will most likely underestimate the size, since it only accounts for the pointer and not the string length itself? That, in turn, will cause the return value of ComputeInputSizeEstimate() to be an underestimate as well.

But isn't underestimating the input size better than overestimating it? With underestimation, the worst case is that we don't enforce sort_run_bytes_limit for the first run (hoping that everything will fit in memory), that turns out to be wrong and we spill, but we then enforce sort_run_bytes_limit for the subsequent runs. Overestimation is worse, because we spill unnecessarily from the beginning even when the input could fit in memory.


--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 12
Gerrit-Owner: Riza Suminto <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: David Rorke <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Thu, 25 Jun 2020 20:09:22 +0000
Gerrit-HasComments: Yes
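The underestimate-versus-overestimate argument in the reply can be made concrete with a small model. This is a minimal C++ sketch under stated assumptions, not Impala's SortNode code: `Row`, `EstimateInputBytes`, `ActualInputBytes`, `EnforceRunLimitForFirstRun`, `Classify` and `Outcome` are all hypothetical names. The estimate models a GetRowSize()-style size that counts only the fixed-size part of each row (ignoring varlen payloads), and a single `mem_budget` stands in for whatever memory the sorter actually has available.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of one input row: a fixed-size part plus the actual
// bytes of any variable-length (string) payload.
struct Row {
  int64_t fixed_bytes;   // what a GetRowSize()-style estimate would count
  int64_t varlen_bytes;  // string payload, NOT counted by the estimate
};

// Hypothetical stand-in for ComputeInputSizeEstimate(): sums only the fixed
// part of each row, so it underestimates inputs that carry varlen data.
int64_t EstimateInputBytes(const std::vector<Row>& rows) {
  int64_t total = 0;
  for (const Row& r : rows) total += r.fixed_bytes;
  return total;
}

// True input size, including varlen payloads.
int64_t ActualInputBytes(const std::vector<Row>& rows) {
  int64_t total = 0;
  for (const Row& r : rows) total += r.fixed_bytes + r.varlen_bytes;
  return total;
}

// Assumed policy from the discussion: apply sort_run_bytes_limit from the
// first run only if the estimated input does not fit in the memory budget.
bool EnforceRunLimitForFirstRun(int64_t estimated_bytes, int64_t mem_budget) {
  return estimated_bytes > mem_budget;
}

enum class Outcome {
  kFitsInMemory,        // limit not enforced and input fits: no spill
  kSpillsAsExpected,    // limit enforced and input really is too large
  kUnnecessarySpill,    // limit enforced but input would have fit (overestimate)
  kFirstRunSpillsLate,  // limit not enforced, input too large: the first run
                        // spills at the memory limit, later runs use the limit
                        // (underestimate)
};

Outcome Classify(int64_t estimated_bytes, int64_t actual_bytes,
                 int64_t mem_budget) {
  assert(mem_budget > 0);
  bool enforce = EnforceRunLimitForFirstRun(estimated_bytes, mem_budget);
  bool fits = actual_bytes <= mem_budget;
  if (!enforce) {
    return fits ? Outcome::kFitsInMemory : Outcome::kFirstRunSpillsLate;
  }
  return fits ? Outcome::kUnnecessarySpill : Outcome::kSpillsAsExpected;
}
```

For example, 1000 rows with 12 fixed bytes and 100 varlen bytes each give an estimate of 12,000 bytes against an actual 112,000. Under a 50,000-byte budget the limit is not enforced and the result is kFirstRunSpillsLate, the recoverable case described in the reply. The opposite mistake (estimate 80,000, actual 40,000) yields kUnnecessarySpill, where the sort spills even though the input would have fit.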
