Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/18219 )
Change subject: IMPALA-11064 Optimizing Temporary File Structure for Batch Reading ...................................................................... Patch Set 10: Code-Review+1 (8 comments) Thanks for the clarification! http://gerrit.cloudera.org:8080/#/c/18219/9//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/18219/9//COMMIT_MSG@52 PS9, Line 52: ew start option: > Changed to remote_batch_read_block_size_level. Keep the level to indicate i Done http://gerrit.cloudera.org:8080/#/c/18219/9//COMMIT_MSG@63 PS9, Line 63: ew query options: > The idea for the perfetch here is to set the step instead of the size to be Done http://gerrit.cloudera.org:8080/#/c/18219/9//COMMIT_MSG@74 PS9, Line 74: > That is a good point. In fact remote_batch_buffer_file_limit limits the max Done http://gerrit.cloudera.org:8080/#/c/18219/9/be/src/exec/partitioned-hash-join-builder.h File be/src/exec/partitioned-hash-join-builder.h: http://gerrit.cloudera.org:8080/#/c/18219/9/be/src/exec/partitioned-hash-join-builder.h@82 PS9, Line 82: int64_t level, uint64_t default_id > I think in that case, multiple instances but with the same partition id cou Assume uniform data distributions, then each of the <n> instances has the same chance of spill, or generates equal amount of spilled data. When a skew exists, then one of the <n> instances may have high amount of spilled data. I agree that it may be useful to run some more tests and find out a good strategy for each case. http://gerrit.cloudera.org:8080/#/c/18219/9/be/src/exec/partitioned-hash-join-builder.h@84 PS9, Line 84: 16; > If my understanding is correct, the partition number here relies on the PAR okay. Thanks for pointing out that the partition is actually the fanout. May be add a comment in comment message to clarify? See my comment about about more testing. http://gerrit.cloudera.org:8080/#/c/18219/9/be/src/runtime/io/local-file-writer.cc File be/src/runtime/io/local-file-writer.cc: http://gerrit.cloudera.org:8080/#/c/18219/9/be/src/runtime/io/local-file-writer.cc@66 PS9, Line 66: WriteOne > The name has been there for a while, and the rename requires changes in qui Done http://gerrit.cloudera.org:8080/#/c/18219/9/be/src/runtime/tmp-file-mgr-internal.h File be/src/runtime/tmp-file-mgr-internal.h: http://gerrit.cloudera.org:8080/#/c/18219/9/be/src/runtime/tmp-file-mgr-internal.h@427 PS9, Line 427: int > Uses int because it leaves a room for negative value to indicate special me Done http://gerrit.cloudera.org:8080/#/c/18219/9/be/src/runtime/tmp-file-mgr-internal.h@890 PS9, Line 890: gro > changed to resource_val, because it may not be bytes. Done -- To view, visit http://gerrit.cloudera.org:8080/18219 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If913785cac9e2dafa20013b6600c87fcaf3e2018 Gerrit-Change-Number: 18219 Gerrit-PatchSet: 10 Gerrit-Owner: Yida Wu <[email protected]> Gerrit-Reviewer: Abhishek Rawat <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Michael Smith <[email protected]> Gerrit-Reviewer: Qifan Chen <[email protected]> Gerrit-Reviewer: Yida Wu <[email protected]> Gerrit-Comment-Date: Fri, 18 Nov 2022 20:13:21 +0000 Gerrit-HasComments: Yes
