Riza Suminto has posted comments on this change. (http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
......................................................................


Patch Set 11:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG@7
PS9, Line 7: IMPALA-6692
> This is one step towards solving the problem.  The back pressure problem st
Resolving this and continuing the discussion in the latest patch.


http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG@23
PS9, Line 23: This patch speedup the decision to start the sort without waiting it
            : to hit memory limit first by capping the intermediary quicksort run to
            : lower memory limit,
> Patch set 10 add flag to either enforce sort_run_bytes_limit or not. It wil
Resolving this and continuing the discussion in the latest patch.
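The capping idea in the quoted commit message can be sketched as a small predicate that decides when to stop accumulating rows into the current in-memory run and sort it. This is a minimal sketch, not the patch's code: the function name ShouldSortCurrentRun and the parameter enforce_run_bytes_limit (standing in for the flag added in patch set 10) are hypothetical, and treating a non-positive sort_run_bytes_limit as "no cap" is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch: returns true when the sort node should sort the
// current in-memory run now instead of accumulating more rows into it.
bool ShouldSortCurrentRun(int64_t run_bytes, int64_t sort_run_bytes_limit,
                          bool enforce_run_bytes_limit,
                          bool reservation_exhausted) {
  // Without more memory the run must be sorted (and possibly spilled)
  // regardless of any cap.
  if (reservation_exhausted) return true;
  // Assumption: a non-positive limit means the cap is disabled.
  if (!enforce_run_bytes_limit || sort_run_bytes_limit <= 0) return false;
  // Close the run early, before the memory limit is reached.
  return run_bytes >= sort_run_bytes_limit;
}
```

With a 256 MB cap enforced, a run is sorted once it reaches 256 MB even if more memory is available; with the flag off, only running out of reservation triggers the sort.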


http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG@40
PS9, Line 40: intermediary sort.
> I later did 256 MB limit and it achieved a little faster query time. Loweri
Done


http://gerrit.cloudera.org:8080/#/c/15963/11/be/src/exec/sort-node.cc
File be/src/exec/sort-node.cc:

http://gerrit.cloudera.org:8080/#/c/15963/11/be/src/exec/sort-node.cc@76
PS11, Line 76:   int64_t estimated_input_size = children_node->tnode_->estimated_stats.cardinality
             :       * children_node->row_descriptor_->GetRowSize();
I'm not sure my calculation is right here. My assumption was that
estimated_stats.cardinality here is per backend, but it seems to be
query-wide.

In my experiment today, I ran an insert into tpcds_300_parquet.web_sales using
5 executor backends. Each SORT_NODE in each backend is expected to process
5.67GB of row batches, so it would benefit from fitting everything in memory.
Setting buffer_pool_limit anywhere from 8GB to 28GB still caps and spills the
first sort run, while setting it to 29GB or above successfully waives
sort_run_bytes_limit for the first sort run (allowing it to use the full
memory). Maybe I should divide by the number of executor backends involved
here.
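A minimal C++ sketch of the division suggested above, assuming the planner's cardinality is query-wide and that -1 marks an unknown estimate. The names EstimatePerBackendInputBytes and ShouldWaiveRunBytesLimit are hypothetical, not existing Impala functions, and the waive rule (estimate <= buffer_pool_limit) is an assumed simplification. The observed numbers fit the query-wide reading: 5 x 5.67GB is about 28.35GB, which falls between the 28GB setting that still capped the first run and the 29GB setting that waived the cap.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: turns a query-wide cardinality estimate into an
// approximate per-backend input size. Returns -1 when the estimate is unknown.
int64_t EstimatePerBackendInputBytes(int64_t query_wide_cardinality,
                                     int64_t row_size_bytes,
                                     int num_executor_backends) {
  if (query_wide_cardinality < 0 || num_executor_backends <= 0) return -1;
  int64_t total_bytes = query_wide_cardinality * row_size_bytes;
  // Assumes rows are spread evenly across executor backends.
  return total_bytes / num_executor_backends;
}

// Hypothetical policy: waive sort_run_bytes_limit for the first run only when
// the estimated input is known and fits within the buffer pool limit.
bool ShouldWaiveRunBytesLimit(int64_t estimated_input_bytes,
                              int64_t buffer_pool_limit) {
  return estimated_input_bytes >= 0 &&
         estimated_input_bytes <= buffer_pool_limit;
}
```

An even split is optimistic: with skewed partitioning, one backend may receive more than the per-backend estimate.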



--
To view, visit http://gerrit.cloudera.org:8080/15963

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 11
Gerrit-Owner: Riza Suminto <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: David Rorke <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Sat, 13 Jun 2020 00:58:41 +0000
Gerrit-HasComments: Yes
