Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
......................................................................


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG@23
PS9, Line 23: This patch speedup the decision to start the sort without waiting 
it
            : to hit memory limit first by capping the intermediary quicksort 
run to
            : lower memory limit,
> Since the patch can have immediate benefit in some well known cases like 
> partitioned inserts, we might consider putting it in with the query option 
> but without additional safeguards against regressions (but documenting the 
> cases where it's safe to use) and then explore some of these other approaches 
> in a follow up phase.

I'm fine with this, I just want to make sure we're set up with some freedom to 
change the behaviour in future.

As far as estimates go, there's a lot of room for improvement. The approach to 
estimates for SortNode is an outlier, since it doesn't actually estimate the 
ideal memory to keep it in-memory, unlike everything else. I'd be a little 
concerned that the estimate of the memory required to keep the full sort input 
in memory could be too high, since it depends on getting the cardinality and 
row size right for the full output of the plan tree.

Agree that the estimates would likely be very accurate for plain ETL cases 
where the source table has stats computed.

We could probably finesse the estimate logic a bit, e.g. use the full data size 
estimate up to a few GB, then switch to the two-phase estimate.

David, do you have some examples of numbers for the estimates vs actual that 
you mentioned?



--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 9
Gerrit-Owner: Riza Suminto <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: David Rorke <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Fri, 05 Jun 2020 16:46:58 +0000
Gerrit-HasComments: Yes

Reply via email to