Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/18219 )
Change subject: IMPALA-11064 Optimizing Temporary File Structure for Batch Reading ...................................................................... Patch Set 9: (29 comments) Looks good! It seems there is still room to simplify the new query options. That is the interface to the new temp file scheme. That to be able to handle many many partitions is also a key. http://gerrit.cloudera.org:8080/#/c/18219/9//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/18219/9//COMMIT_MSG@12 PS9, Line 12: allocate nit. allocating http://gerrit.cloudera.org:8080/#/c/18219/9//COMMIT_MSG@12 PS9, Line 12: There will be two types of structures, one is the original, allocate nit with the patch http://gerrit.cloudera.org:8080/#/c/18219/9//COMMIT_MSG@52 PS9, Line 52: remote_batch_read_max_block_size_level I wonder why not spell out the block size directly, as remote_batch_read_block_size. Simplicity is better. http://gerrit.cloudera.org:8080/#/c/18219/9//COMMIT_MSG@63 PS9, Line 63: remote_batch_read_prefetch_step A slight better name would be remote_batch_read_prefetch_size. The option specifies the number of subsequent blocks to prefetch. http://gerrit.cloudera.org:8080/#/c/18219/9//COMMIT_MSG@74 PS9, Line 74: base_spill_id_level If I understand correctly, max number of temp files = 2^level times # of partitions. When there are many partitions, this number could still be large. Would it be possible to provide an option on max number of temp files directly? http://gerrit.cloudera.org:8080/#/c/18219/9//COMMIT_MSG@103 PS9, Line 103: May need to say a few sentences with performance numbers here to indicate the new scheme is better. http://gerrit.cloudera.org:8080/#/c/18219/9/be/src/exec/partitioned-hash-join-builder.h File be/src/exec/partitioned-hash-join-builder.h: http://gerrit.cloudera.org:8080/#/c/18219/9/be/src/exec/partitioned-hash-join-builder.h@82 PS9, Line 82: int64_t level, uint64_t default_id I wonder if fragment instance should play a role here. In the mt_dop > 0 case, multiple instances can be running on a single node. In this case, it may be useful to place spills from fragment instances on different temp files. http://gerrit.cloudera.org:8080/#/c/18219/9/be/src/exec/partitioned-hash-join-builder.h@84 PS9, Line 84: 16; This limits max # of partitions to be 2^16 - 1, which could be very limiting for tables with range partitions on weeks or days. Would it be possible to use the following existing code directly? The inputs are a sequence of uints: default_id, partition_id, and possibly fragment instance id. And then mod on the max number of temp files (see my comment in commit message). 91 /// CrcHash() specialized for 8-byte data 92 static inline uint32_t CrcHash8(const void* v, uint32_t hash) { 93 DCHECK(CpuInfo::IsSupported(CpuInfo::SSE4_2) || base::IsAarch64()); 94 const uint64_t* p = reinterpret_cast<const uint64_t*>(v); 95 hash = SSE4_crc32_u64(hash, *p); 96 hash = (hash << 16) | (hash >> 16); 97 return hash; 98 } util/hash-util.h http://gerrit.cloudera.org:8080/#/c/18219/9/be/src/exec/partitioned-hash-join-builder.h@86 PS9, Line 86: DCHECK_GE(level, 0); nit. this probably is nor necessary because of line 83. http://gerrit.cloudera.org:8080/#/c/18219/9/be/src/exec/partitioned-hash-join-builder.cc File be/src/exec/partitioned-hash-join-builder.cc: http://gerrit.cloudera.org:8080/#/c/18219/9/be/src/exec/partitioned-hash-join-builder.cc@194 PS9, Line 194: base_spill_id_ = PhjBuilderConfig::GenerateBaseSpillId( : state->query_options().base_spill_id_level, (uint64_t)this); nit. This line of code probably should be part of the member initialization section at line 180. http://gerrit.cloudera.org:8080/#/c/18219/9/be/src/exec/partitioned-hash-join-builder.cc@225 PS9, Line 225: base_spill_id_ = PhjBuilderConfig::GenerateBaseSpillId( : state->query_options().base_spill_id_level, (uint64_t)this); nit. same comment as above. http://gerrit.cloudera.org:8080/#/c/18219/9/be/src/exec/partitioned-hash-join-node.cc File be/src/exec/partitioned-hash-join-node.cc: http://gerrit.cloudera.org:8080/#/c/18219/9/be/src/exec/partitioned-hash-join-node.cc@123 PS9, Line 123: base_spill_id_ = PhjBuilderConfig::GenerateBaseSpillId( : state->query_options().base_spill_id_level, (uint64_t)this); nit. Move to member initialization section. http://gerrit.cloudera.org:8080/#/c/18219/9/be/src/runtime/io/local-file-writer.cc File be/src/runtime/io/local-file-writer.cc: http://gerrit.cloudera.org:8080/#/c/18219/9/be/src/runtime/io/local-file-writer.cc@66 PS9, Line 66: WriteOne nit. Probably renamed as WriteOneRange(). http://gerrit.cloudera.org:8080/#/c/18219/9/be/src/runtime/io/local-file-writer.cc@77 PS9, Line 77: if (ret_status.ok() && !close_status.ok()) ret_status = close_status; nit. This line can be moved inside the next block to avoid check ret_status.ok() twice. http://gerrit.cloudera.org:8080/#/c/18219/9/be/src/runtime/tmp-file-mgr-internal.h File be/src/runtime/tmp-file-mgr-internal.h: http://gerrit.cloudera.org:8080/#/c/18219/9/be/src/runtime/tmp-file-mgr-internal.h@81 PS9, Line 81: set_full nt. set_is_ful(). http://gerrit.cloudera.org:8080/#/c/18219/9/be/src/runtime/tmp-file-mgr-internal.h@109 PS9, Line 109: with : /// the specific nit. "among all blocks with the same spill id". http://gerrit.cloudera.org:8080/#/c/18219/9/be/src/runtime/tmp-file-mgr-internal.h@124 PS9, Line 124: offset_ nit. offset_within_block_ http://gerrit.cloudera.org:8080/#/c/18219/9/be/src/runtime/tmp-file-mgr-internal.h@129 PS9, Line 129: /// The id of the block in a temporary file. For example, if the block is the first one nit. 0-based http://gerrit.cloudera.org:8080/#/c/18219/9/be/src/runtime/tmp-file-mgr-internal.h@427 PS9, Line 427: int nit uint_32 http://gerrit.cloudera.org:8080/#/c/18219/9/be/src/runtime/tmp-file-mgr-internal.h@429 PS9, Line 429: DCHECK_LT(block_id, file_mgr_->GetNumReadBuffersPerFile()); nit. may augment with an error message. DCHECK_LT(block_id, file_mgr_->GetNumReadBuffersPerFile()) << "block_id is out of range."; http://gerrit.cloudera.org:8080/#/c/18219/9/be/src/runtime/tmp-file-mgr-internal.h@438 PS9, Line 438: IncFullBlockStat nit. May rename as IncreaseFullBlockCounter() http://gerrit.cloudera.org:8080/#/c/18219/9/be/src/runtime/tmp-file-mgr-internal.h@450 PS9, Line 450: (full_block_cnt_ < file_mgr_->GetNumReadBuffersPerFile() nit LIKELY() http://gerrit.cloudera.org:8080/#/c/18219/9/be/src/runtime/tmp-file-mgr-internal.h@485 PS9, Line 485: should be allocated nit. Do we return false also when the allocation fails? May need to state that fact in the comment. http://gerrit.cloudera.org:8080/#/c/18219/9/be/src/runtime/tmp-file-mgr-internal.h@485 PS9, Line 485: would nit remove http://gerrit.cloudera.org:8080/#/c/18219/9/be/src/runtime/tmp-file-mgr-internal.h@591 PS9, Line 591: buffer_idx nt. Do we need to DCHECK_LE() first? http://gerrit.cloudera.org:8080/#/c/18219/9/be/src/runtime/tmp-file-mgr-internal.h@867 PS9, Line 867: destroying nit being destroyed. http://gerrit.cloudera.org:8080/#/c/18219/9/be/src/runtime/tmp-file-mgr-internal.h@890 PS9, Line 890: val nit. may renamed as resource_bytes http://gerrit.cloudera.org:8080/#/c/18219/9/common/thrift/ImpalaService.thrift File common/thrift/ImpalaService.thrift: http://gerrit.cloudera.org:8080/#/c/18219/9/common/thrift/ImpalaService.thrift@744 PS9, Line 744: REMOTE_BATCH_READ_PREFETCH_STEP = 148 : REMOTE_BATCH_BUFFER_FILE_LIMIT = 149 : BASE_SPILL_ID_LEVEL = 150 Need to provide a specific comment for each new option. http://gerrit.cloudera.org:8080/#/c/18219/9/common/thrift/Query.thrift File common/thrift/Query.thrift: http://gerrit.cloudera.org:8080/#/c/18219/9/common/thrift/Query.thrift@601 PS9, Line 601: // Control the batch reading of temporary files. : 149: optional i32 remote_batch_buffer_file_limit = 0; : 150: optional i32 remote_batch_read_prefetch_step = 1; : 151: optional i32 base_spill_id_level = 6; better to follow the existing convention above. -- To view, visit http://gerrit.cloudera.org:8080/18219 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If913785cac9e2dafa20013b6600c87fcaf3e2018 Gerrit-Change-Number: 18219 Gerrit-PatchSet: 9 Gerrit-Owner: Yida Wu <[email protected]> Gerrit-Reviewer: Abhishek Rawat <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Michael Smith <[email protected]> Gerrit-Reviewer: Qifan Chen <[email protected]> Gerrit-Reviewer: Yida Wu <[email protected]> Gerrit-Comment-Date: Fri, 21 Oct 2022 15:35:03 +0000 Gerrit-HasComments: Yes
