Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18219 )

Change subject: IMPALA-11064 Optimizing Temporary File Structure for Batch 
Reading
......................................................................


Patch Set 10: Code-Review+1

(8 comments)

Thanks for the clarification!

http://gerrit.cloudera.org:8080/#/c/18219/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18219/9//COMMIT_MSG@52
PS9, Line 52: ew start option:
> Changed to remote_batch_read_block_size_level. Keep the level to indicate i
Done


http://gerrit.cloudera.org:8080/#/c/18219/9//COMMIT_MSG@63
PS9, Line 63: ew query options:
> The idea for the perfetch here is to set the step instead of the size to be
Done


http://gerrit.cloudera.org:8080/#/c/18219/9//COMMIT_MSG@74
PS9, Line 74:
> That is a good point. In fact remote_batch_buffer_file_limit limits the max
Done


http://gerrit.cloudera.org:8080/#/c/18219/9/be/src/exec/partitioned-hash-join-builder.h
File be/src/exec/partitioned-hash-join-builder.h:

http://gerrit.cloudera.org:8080/#/c/18219/9/be/src/exec/partitioned-hash-join-builder.h@82
PS9, Line 82: int64_t level, uint64_t default_id
> I think in that case, multiple instances but with the same partition id cou
Assume uniform data distributions, then each of the <n> instances has the same 
chance of spill, or generates equal amount of spilled data.

When a skew exists, then one of the <n> instances may have high amount of 
spilled data.

I agree that it may be useful to run some more tests and find out a good 
strategy for each case.


http://gerrit.cloudera.org:8080/#/c/18219/9/be/src/exec/partitioned-hash-join-builder.h@84
PS9, Line 84: 16;
> If my understanding is correct, the partition number here relies on the PAR
okay. Thanks for pointing out that the partition is actually the fanout. May be 
add a comment in comment message to clarify?

See my comment about about more testing.


http://gerrit.cloudera.org:8080/#/c/18219/9/be/src/runtime/io/local-file-writer.cc
File be/src/runtime/io/local-file-writer.cc:

http://gerrit.cloudera.org:8080/#/c/18219/9/be/src/runtime/io/local-file-writer.cc@66
PS9, Line 66: WriteOne
> The name has been there for a while, and the rename requires changes in qui
Done


http://gerrit.cloudera.org:8080/#/c/18219/9/be/src/runtime/tmp-file-mgr-internal.h
File be/src/runtime/tmp-file-mgr-internal.h:

http://gerrit.cloudera.org:8080/#/c/18219/9/be/src/runtime/tmp-file-mgr-internal.h@427
PS9, Line 427: int
> Uses int because it leaves a room for negative value to indicate special me
Done


http://gerrit.cloudera.org:8080/#/c/18219/9/be/src/runtime/tmp-file-mgr-internal.h@890
PS9, Line 890: gro
> changed to resource_val, because it may not be bytes.
Done



--
To view, visit http://gerrit.cloudera.org:8080/18219
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If913785cac9e2dafa20013b6600c87fcaf3e2018
Gerrit-Change-Number: 18219
Gerrit-PatchSet: 10
Gerrit-Owner: Yida Wu <[email protected]>
Gerrit-Reviewer: Abhishek Rawat <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Yida Wu <[email protected]>
Gerrit-Comment-Date: Fri, 18 Nov 2022 20:13:21 +0000
Gerrit-HasComments: Yes

Reply via email to