Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
......................................................................


Patch Set 12:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc
File be/src/exec/sort-node.cc:

http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc@90
PS12, Line 90: GetRowSize
> I didn't dig too deep, but row_descriptor_->GetRowSize() seems to contain t
I will look into whether that average size data can be accessed in the backend.

But just to make sure I understand correctly:
for a row that contains varlen data, GetRowSize() will most likely 
underestimate the size, since it only accounts for the pointer and not 
the string length itself?
That, in turn, will cause the return value of ComputeInputSizeEstimate() to 
be an underestimate as well.
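
To make the concern concrete, here is a minimal sketch. It assumes a
hypothetical row layout (Row and StringSlot are illustrative names, not
Impala's actual tuple or string classes) in which the fixed part of a row
stores only a pointer and a length for each string column:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a var-len slot: the fixed part of the row holds
// only a pointer and a length, not the characters themselves.
struct StringSlot {
  const char* ptr;
  int32_t len;
};

// Hypothetical row layout: one integer column and one string column.
struct Row {
  int64_t id;
  StringSlot name;
};

// Estimate that counts only the fixed-size slot bytes of each row, i.e. it
// ignores the var-len payload that the slots point to.
int64_t FixedSizeEstimate(int64_t num_rows) {
  assert(num_rows >= 0);
  return num_rows * static_cast<int64_t>(sizeof(Row));
}

// Actual footprint: fixed bytes plus the string payload of every row.
int64_t ActualSize(const std::vector<Row>& rows) {
  int64_t total = 0;
  for (const Row& r : rows) {
    total += static_cast<int64_t>(sizeof(Row)) + r.name.len;
  }
  return total;
}
```

In this sketch the gap between ActualSize() and FixedSizeEstimate() is
exactly the sum of the string lengths, so the longer the strings, the larger
the underestimate.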

But isn't underestimating the input size better than overestimating it? With 
underestimation, the worst case is that we don't enforce 
sort_run_bytes_limit for the first run (hoping that everything will fit in 
memory), the estimate turns out to be wrong and we spill, but we then enforce 
sort_run_bytes_limit for the next runs. Overestimation is worse, because we 
spill unnecessarily from the beginning even when the input could have fit in 
memory.
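
The trade-off can be sketched as follows. This is a simplified model and not
the actual SortNode code: RunLimitPolicy, mem_available, NextRunCap() and
OnSpill() are hypothetical names, and the rule "enforce the limit once the
estimate exceeds available memory or after the first spill" is my assumption
of how the behavior could be expressed:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of when the run-size limit is enforced.
struct RunLimitPolicy {
  int64_t mem_available;         // bytes the sorter can hold before spilling
  int64_t sort_run_bytes_limit;  // cap on the size of each sorted run
  bool spilled;                  // set once the sorter has spilled

  // Returns the byte cap for the next run, or -1 for "no cap".
  int64_t NextRunCap(int64_t input_size_estimate) const {
    assert(sort_run_bytes_limit > 0);
    // Enforce the cap if we already spilled, or if the estimate says the
    // input will not fit in memory anyway.
    if (spilled || input_size_estimate > mem_available) {
      return sort_run_bytes_limit;
    }
    return -1;
  }

  // Called when the sorter runs out of memory and has to spill.
  void OnSpill() { spilled = true; }
};
```

With an underestimate, NextRunCap() returns -1 for the first run and the cap
only takes effect after OnSpill(). With an overestimate, the cap applies from
the first run even if the input would have fit.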



--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 12
Gerrit-Owner: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: David Rorke <dro...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Thu, 25 Jun 2020 20:09:22 +0000
Gerrit-HasComments: Yes
