Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18929 )

Change subject: IMPALA-11539: Mitigate intra-node skew of file scans with MT_DOP
......................................................................


Patch Set 4: Code-Review+1

(3 comments)

http://gerrit.cloudera.org:8080/#/c/18929/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18929/3//COMMIT_MSG@25
PS3, Line 25: Ranges that are marked to use the hdfs cache are still handled with
            : higher priority.
> File handle cache: it would require a more invasive change plus testing, so
Data cache:
If we could get the number of bytes from a file in data cache, it could be 
subtracted from the size during the ordering.
This would probably also help with the file handle cache, as files with a lot of 
chunks in the data cache are more likely to also be in the file handle cache.

Note that this is just a brain dump, I wouldn't complicate this change with it.
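
To make the data cache idea above concrete, here is a minimal sketch, not the 
patch's actual code. It assumes the queue orders files largest-first and that a 
per-file count of cached bytes is available; FileToScan, cached_bytes, 
effective_size and order_for_scan are hypothetical names used only for 
illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileToScan:
    path: str
    size_bytes: int
    # Hypothetical: number of bytes of this file already in the data cache.
    cached_bytes: int = 0


def effective_size(f: FileToScan) -> int:
    """Bytes that would still have to be read from remote storage."""
    return max(f.size_bytes - f.cached_bytes, 0)


def order_for_scan(files):
    """Order files by effective size, largest first (assumed ordering)."""
    return sorted(files, key=effective_size, reverse=True)
```

With this ordering, a large file that is mostly cached is treated like a small 
file and moves towards the end of the queue.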

I also realized that while a fixed order for files can potentially cause a 
regression in full scans that do not fit in the cache, it can also improve 
things for other queries: if not all files are read in the end (the query has 
LIMIT), then it is better for caching not to randomize the order.
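
The LIMIT case can be illustrated with a toy simulation. This is only a sketch 
under simplifying assumptions: the cache is an unbounded set of file paths, 
every run reads just the first limit_files files of the ordering, and 
count_cache_hits is a hypothetical helper, not anything in Impala.

```python
import random


def count_cache_hits(files, limit_files, runs, shuffle, seed=0):
    """Count cache hits over repeated runs that each read only a prefix."""
    rng = random.Random(seed)
    cache = set()  # toy cache: unbounded, keyed by file path
    hits = 0
    for _ in range(runs):
        order = list(files)
        if shuffle:
            rng.shuffle(order)  # randomized file order for this run
        for path in order[:limit_files]:
            if path in cache:
                hits += 1
            else:
                cache.add(path)
    return hits
```

With a fixed order every run after the first reads the same prefix, so all of 
its reads hit the cache; a shuffled order can never do better than that in this 
model.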


http://gerrit.cloudera.org:8080/#/c/18929/3/be/src/exec/scan-range-queue-mt.h
File be/src/exec/scan-range-queue-mt.h:

http://gerrit.cloudera.org:8080/#/c/18929/3/be/src/exec/scan-range-queue-mt.h@30
PS3, Line 30: Only used for MT scans where the scan ranges are dynamically assigned
            : /// to the fragment instances using this queue.
> The MT_DOP=0 uses different methods and I didn't want to change it in this
ok


http://gerrit.cloudera.org:8080/#/c/18929/3/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

http://gerrit.cloudera.org:8080/#/c/18929/3/tests/query_test/test_scanners.py@378
PS3, Line 378: 'text', 'parquet
> Interestingly I got similar min/max ratios with text and parquet. I was thi
It is ok like this; maybe we can look into it more deeply if the test turns out 
to be flaky.



--
To view, visit http://gerrit.cloudera.org:8080/18929
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib7dc1f1665565da6c0e155c1e585f7089b18a180
Gerrit-Change-Number: 18929
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Gergely Fürnstáhl <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Wed, 31 Aug 2022 11:45:13 +0000
Gerrit-HasComments: Yes
