Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
......................................................................


Patch Set 12:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc
File be/src/exec/sort-node.cc:

http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc@90
PS12, Line 90: GetRowSize
> So what is the nature of varlen column? Is each row possibly will have diff
I didn't dig too deep, but row_descriptor_->GetRowSize() seems to return only 
the fixed size of the tuples that hold a row. For string and varchar slots the 
tuple contains just a pointer (+length), so there is additional data in some 
buffer that is not included in that size.
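
To illustrate the distinction, here is a minimal sketch. It is not Impala code: 
StringSlot, Tuple, FixedRowSize and TotalRowSize are made-up stand-ins, and the 
layout is only an assumption about how a pointer (+length) slot behaves.

```cpp
#include <cstdint>

// Hypothetical stand-in for a string/varchar slot: the tuple stores only a
// pointer and a length, the characters live in some other buffer.
struct StringSlot {
  const char* ptr;
  int32_t len;
};

// Hypothetical fixed-length tuple with one INT slot and one STRING slot.
struct Tuple {
  int32_t id;
  StringSlot name;
};

// Fixed-length part: the same for every row, whatever the string contains.
inline int64_t FixedRowSize() {
  return static_cast<int64_t>(sizeof(Tuple));
}

// Full footprint of one row: fixed part plus the bytes the slot points to.
inline int64_t TotalRowSize(const Tuple& t) {
  return static_cast<int64_t>(sizeof(Tuple)) + t.name.len;
}
```

Two rows with strings of different lengths get the same FixedRowSize() but a 
different TotalRowSize(), which is the part a fixed-size estimate misses.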

The column stats contain AvgSize and MaxSize - these are constants for 
fixed-size types, but we calculate them for strings during COMPUTE STATS, so we 
can get a reasonably accurate estimate of the total amount of memory consumed.
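
As a rough sketch of how such an estimate could be put together: add the 
average varlen size of each string column to the fixed row size. The names 
below (VarlenColumnStats, EstimateRowBytes, RowsThatFit) are hypothetical and 
do not correspond to existing Impala APIs; how the stats would reach the 
backend is left open.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical per-column stat, mirroring the AvgSize value that
// COMPUTE STATS calculates for string columns.
struct VarlenColumnStats {
  double avg_size;  // average bytes of string data per value
};

// Estimated bytes per row = fixed tuple bytes + sum of average varlen bytes,
// rounded to the nearest byte.
inline int64_t EstimateRowBytes(int64_t fixed_row_size,
                                const std::vector<VarlenColumnStats>& cols) {
  double total = static_cast<double>(fixed_row_size);
  for (const VarlenColumnStats& c : cols) total += c.avg_size;
  return static_cast<int64_t>(total + 0.5);
}

// Number of rows expected to fit into a memory budget under that estimate.
inline int64_t RowsThatFit(int64_t mem_budget_bytes, int64_t est_row_bytes) {
  return est_row_bytes > 0 ? mem_budget_bytes / est_row_bytes : 0;
}
```

Since AvgSize is only an average, an estimate like this can still be off for 
skewed data, so it would be a heuristic rather than a guarantee.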

I don't know off the top of my head how to access this data in the backend.

Strings are very common, so many queries contain varlen slots. I am not sure if 
it is a good idea to create an optimization specifically for queries without 
strings.



--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 12
Gerrit-Owner: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: David Rorke <dro...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Thu, 25 Jun 2020 18:11:01 +0000
Gerrit-HasComments: Yes
