Yida Wu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18219 )

Change subject: IMPALA-11064 Optimizing Temporary File Structure for Batch 
Reading
......................................................................


Patch Set 6:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/18219/5/be/src/exec/partitioned-hash-join-builder.h
File be/src/exec/partitioned-hash-join-builder.h:

http://gerrit.cloudera.org:8080/#/c/18219/5/be/src/exec/partitioned-hash-join-builder.h@80
PS5, Line 80:   /// left shift 16 bits for the later specified partition id.
> PartitionId is an int, which can be 32-bits. Should we left shift 32 bits h
My understanding is that the partition number depends on the PARTITION_FANOUT, 
which is 16 by default, to increase the parallelism of processing the 
streaming, should not be a very large number because the limit of the core 
number. 2^16=65536 should be enough.


http://gerrit.cloudera.org:8080/#/c/18219/5/be/src/exec/partitioned-hash-join-builder.h@82
PS5, Line 82:   static uint64_t GenerateBaseSpillId(int64_t level, uint64_t 
default_id) {
> Can you add a small unit test for this?
Will add.


http://gerrit.cloudera.org:8080/#/c/18219/5/be/src/runtime/tmp-file-mgr.h
File be/src/runtime/tmp-file-mgr.h:

http://gerrit.cloudera.org:8080/#/c/18219/5/be/src/runtime/tmp-file-mgr.h@617
PS5, Line 617:     if (it != tmp_files_remote_by_spill_id_.end()) {
> This violates the comments at 507 that
Moved the not thread-safe public functions to protected and added some comments.


http://gerrit.cloudera.org:8080/#/c/18219/5/be/src/runtime/tmp-file-mgr.cc
File be/src/runtime/tmp-file-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/18219/5/be/src/runtime/tmp-file-mgr.cc@130
PS5, Line 130:     "If set to 0, means to use the same size as 
remote_tmp_file_size. "
> Maybe "The size should be a power of 2, and an integer multiple of the bloc
The file and block size are power of 2 following the norm of the page size, 
because the size of page could be varied according to queries, for example, a 
10MB file is a good fit for 2MB page, but the page can be set to 4MB, so all 
power of 2 sizes seem easier to be compatible with different pages. The total 
size of buffer pool or local disk buffer do not require power of 2.
I prefer the level because it may reduce the chance of setting an invalid size 
and guarantees power of 2, but the old norm is using the string for the size 
and it is more human-readable.


http://gerrit.cloudera.org:8080/#/c/18219/5/be/src/runtime/tmp-file-mgr.cc@1639
PS5, Line 1639:   DCHECK(tmp_blocks != nullptr);
> Why is it minus one instead of plus one?
During the test, the upper layer could always pin these pages in a reverse 
order among partitions, but it could be something to investigate later to see 
if there is a better heuristic. Added some comments, i think it is more likely 
in a reverse order from my observation.


http://gerrit.cloudera.org:8080/#/c/18219/5/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/18219/5/be/src/service/query-options.cc@1259
PS5, Line 1259:         if (result != StringParser::PARSE_SUCCESS || level < -1 
|| level > 16) {
> Why 12 instead of 16 (or 32)?
2^12=4096, partition FANOUT by default is 16, if each block using 8MB, the max 
local disk buffer using batch reading for a query could be 4096*16*8MB=512GB, 
looks enough for one query, because currently cdw is using a 300GB disk. 
Changed to 16, could be less confusion.



--
To view, visit http://gerrit.cloudera.org:8080/18219
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If913785cac9e2dafa20013b6600c87fcaf3e2018
Gerrit-Change-Number: 18219
Gerrit-PatchSet: 6
Gerrit-Owner: Yida Wu <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Yida Wu <[email protected]>
Gerrit-Comment-Date: Wed, 29 Jun 2022 17:24:54 +0000
Gerrit-HasComments: Yes

Reply via email to