Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/18929 )
Change subject: IMPALA-11539: Mitigate intra-node skew of file scans with MT_DOP
......................................................................


Patch Set 4: Code-Review+1

(3 comments)

http://gerrit.cloudera.org:8080/#/c/18929/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18929/3//COMMIT_MSG@25
PS3, Line 25: Ranges that are marked to use the hdfs cache are still handled with
            : higher priority.
> File handle cache: it would require a more invasive change plus testing, so

Data cache: If we could get the number of bytes of a file that are already in the data cache, that number could be subtracted from the file size during the ordering. This would probably also help with the file handle cache, as files with a lot of chunks in the data cache are more likely to also be in the file handle cache. Note that this is just a brain dump; I wouldn't complicate this change with it.

I also realized that while a fixed order for files can potentially cause a regression in full scans that don't fit in the cache, it can also improve things for other queries: if not all files are read in the end (e.g. the query has a LIMIT), then it is better for caching not to randomize the order.


http://gerrit.cloudera.org:8080/#/c/18929/3/be/src/exec/scan-range-queue-mt.h
File be/src/exec/scan-range-queue-mt.h:

http://gerrit.cloudera.org:8080/#/c/18929/3/be/src/exec/scan-range-queue-mt.h@30
PS3, Line 30: Only used for MT scans where the scan ranges are dynamically assigned
            : /// to the fragment instances using this queue.
> The MT_DOP=0 uses different methods and I didn't want to change it in this

ok


http://gerrit.cloudera.org:8080/#/c/18929/3/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

http://gerrit.cloudera.org:8080/#/c/18929/3/tests/query_test/test_scanners.py@378
PS3, Line 378: 'text', 'parquet
> Interestingly I got similar min/max ratios with text and parquet. I was thi

It is OK like this; maybe we can look into it more deeply if the test turns out to be flaky.
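To make the data-cache idea from the COMMIT_MSG comment more concrete, here is a minimal Python sketch. It assumes that the ordering in this change hands out larger files first and that the number of cached bytes per file could somehow be looked up; `FileToScan`, `uncached_bytes` and `order_for_scan` are hypothetical names for illustration, not Impala APIs. Ties are broken by path so that the order stays fixed between runs, which is what helps caching in the LIMIT case.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FileToScan:
    """Hypothetical per-file descriptor; not an actual Impala type."""
    path: str
    file_size: int     # total bytes in the file
    cached_bytes: int  # bytes already in the data cache (assumed to be queryable)


def uncached_bytes(f: FileToScan) -> int:
    """Bytes that would still have to be read from storage."""
    return max(0, f.file_size - f.cached_bytes)


def order_for_scan(files: List[FileToScan]) -> List[FileToScan]:
    """Most uncached bytes first; ties broken by path for a deterministic order."""
    return sorted(files, key=lambda f: (-uncached_bytes(f), f.path))


if __name__ == "__main__":
    files = [
        FileToScan("a.parq", 100, 0),
        FileToScan("b.parq", 300, 250),  # large, but mostly cached
        FileToScan("c.parq", 200, 0),
    ]
    for f in order_for_scan(files):
        print(f.path, uncached_bytes(f))
```

With these inputs `b.parq` moves from first to last, because only 50 of its 300 bytes would actually have to be fetched.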
-- 
To view, visit http://gerrit.cloudera.org:8080/18929
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib7dc1f1665565da6c0e155c1e585f7089b18a180
Gerrit-Change-Number: 18929
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Gergely Fürnstáhl <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Wed, 31 Aug 2022 11:45:13 +0000
Gerrit-HasComments: Yes
