Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/18219 )
Change subject: IMPALA-11064 Optimizing Temporary File Structure for Batch Reading ...................................................................... Patch Set 9: (18 comments) http://gerrit.cloudera.org:8080/#/c/18219/8//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/18219/8//COMMIT_MSG@67 PS8, Line 67: a block with spill id 1 and sequence number 1. Instead, if step > It's not really clear to me what would happen if step number is other than Added examples. Simply, the idea is to specify which future block is going to fetch. For example, if we are reading the data of block 1-1, then will try to fetch block 1-2 if step is 1, and will try to get block 1-3 if step is 2. Block 1-3 means the spill id is 1 and sequence id is 3. We expect the data would be read sequentially by the sequence id within the same spill id. The main purpose is that it may have chances that not fast enough to fetch the closest block before we move forward to read the next block, so we can set a larger step to prefetch a block a little far away. http://gerrit.cloudera.org:8080/#/c/18219/8/be/src/runtime/io/disk-file.h File be/src/runtime/io/disk-file.h: http://gerrit.cloudera.org:8080/#/c/18219/8/be/src/runtime/io/disk-file.h@256 PS8, Line 256: // written_bytes() in FileWriter is thread-safe. > Any chance this should also be under lock_guard? Looks like probably not be The life cycle of file_writer_ should be the same as the DiskFile, nobody else is able to set it to nullptr. The written_bytes() is protected inside the FileWriter's lock_. So it should be fine. Added some comments. http://gerrit.cloudera.org:8080/#/c/18219/8/be/src/runtime/io/disk-io-mgr.cc File be/src/runtime/io/disk-io-mgr.cc: http://gerrit.cloudera.org:8080/#/c/18219/8/be/src/runtime/io/disk-io-mgr.cc@291 PS8, Line 291: DCHECK_LE(written_bytes, actual_file_size); > Why call actual_content_bytes() again (which uses a lock) rather than compa Done http://gerrit.cloudera.org:8080/#/c/18219/8/be/src/runtime/io/disk-io-mgr.cc@413 PS8, Line 413: Status RemoteOperRange::DoFetch() { > Why is io_ctx added? It doesn't seem to be used at all. Thanks for pointing this out. I think it was used within the function, but no longer needed. Removed. http://gerrit.cloudera.org:8080/#/c/18219/8/be/src/runtime/io/disk-io-mgr.cc@435 PS8, Line 435: // Return okay because the remote file is deleted only when the query is going to > nit: "the the", and I'd remove "probably" unless there's some sort of race That makes sense. Changed. http://gerrit.cloudera.org:8080/#/c/18219/8/be/src/runtime/io/request-context.h File be/src/runtime/io/request-context.h: http://gerrit.cloudera.org:8080/#/c/18219/8/be/src/runtime/io/request-context.h@195 PS8, Line 195: void set_read_use_batch_counter(RuntimeProfile::Counter* read_use_batch_counter) { > nit: it'd be nice if order of these functions added order of the counter va Done http://gerrit.cloudera.org:8080/#/c/18219/8/be/src/runtime/tmp-file-mgr-internal.h File be/src/runtime/tmp-file-mgr-internal.h: http://gerrit.cloudera.org:8080/#/c/18219/8/be/src/runtime/tmp-file-mgr-internal.h@67 PS8, Line 67: is_full_ = true; > Why does this set is_full_ when it doesn't actually allocate the space? If Yeah, that is reason to make it easier to manage the blocks and have a fixed sequence. Also, in most cases I can see, the page size (num_bytes) spilled in the same partition should be the same unless the last few pages if my understanding is correct. http://gerrit.cloudera.org:8080/#/c/18219/8/be/src/runtime/tmp-file-mgr-test.cc File be/src/runtime/tmp-file-mgr-test.cc: http://gerrit.cloudera.org:8080/#/c/18219/8/be/src/runtime/tmp-file-mgr-test.cc@334 PS8, Line 334: TmpFileGroup& file_group, int idx, bool batch_read = false) { > Planning to add tests where batch_read is false? Added one old testcase to use it, also changed the new testcase to use both batch_read true and false. http://gerrit.cloudera.org:8080/#/c/18219/8/be/src/runtime/tmp-file-mgr-test.cc@2148 PS8, Line 2148: WaitForCallbacks(1); > I wonder if this whole test should be duplicated, so we can have both batch That is a good idea. Added. http://gerrit.cloudera.org:8080/#/c/18219/8/be/src/runtime/tmp-file-mgr.h File be/src/runtime/tmp-file-mgr.h: http://gerrit.cloudera.org:8080/#/c/18219/8/be/src/runtime/tmp-file-mgr.h@521 PS8, Line 521: /// compression). The caller should not modify the data in 'buffer' until the write > These default values are only used in testing, and remote_batch_read_prefet Changed to default value. http://gerrit.cloudera.org:8080/#/c/18219/8/be/src/runtime/tmp-file-mgr.cc File be/src/runtime/tmp-file-mgr.cc: http://gerrit.cloudera.org:8080/#/c/18219/8/be/src/runtime/tmp-file-mgr.cc@457 PS8, Line 457: Status TmpFileMgr::CreateUploadTmpFileThread() { > I'm not really sure why this is needed. Reading the function comments doesn I guess the function is helping the turn the buffer size into a form of files in the pool during initialization, which is the form that will eventually happen, because all the released buffer files should be returned to the pool. However it may not be necessary, removed for now. http://gerrit.cloudera.org:8080/#/c/18219/8/be/src/runtime/tmp-file-mgr.cc@545 PS8, Line 545: DCHECK(file_group != nullptr); > This function is only called one place, where we know tmp_file is a TempFil On second thought, I moved the function to TmpFileRemote, and name it as ReleaseReadBuffers(). http://gerrit.cloudera.org:8080/#/c/18219/8/be/src/runtime/tmp-file-mgr.cc@549 PS8, Line 549: > This check redundant because we called it in TempFileRemote right before th Removed the check of the caller, and keep the check inside the implementation. http://gerrit.cloudera.org:8080/#/c/18219/8/be/src/runtime/tmp-file-mgr.cc@608 PS8, Line 608: > I'm not clear what this pattern is doing. I see it here and somewhat simila There is a context that we can reserve the local disk space for a new file from two ways, one is from the metrics local_buff_dir_bytes_high_water_mark before it reaches the dir bytes_limit, the other is from the pool, an evictable file will be returned to the pool without changing the metrics. For long running, the latter way could be more common. Because this patch introduces a new file size for using batch read, it could be smaller than the default value. Also, previously the files in the pool are with default file size. To reduce complexity using different sizes in the pool, it would be good to remain the previous pattern, so this patch uses a way when returning the buffer space, which changes the metrics, we will check the remaining size, if larger than a default file size, then return a dummy file (with default file size) to the pool instead, otherwise, it could be a case that the thread in the pool might be starving waiting for the evictable file. Added some comments. http://gerrit.cloudera.org:8080/#/c/18219/8/be/src/runtime/tmp-file-mgr.cc@831 PS8, Line 831: *path = parsed_raw_path; > nit: "by default" seems misleading here. I don't see any way to change this Done. http://gerrit.cloudera.org:8080/#/c/18219/8/be/src/runtime/tmp-file-mgr.cc@1362 PS8, Line 1362: } > Personally I'd rather call actual_content_bytes once and store it locally f Done http://gerrit.cloudera.org:8080/#/c/18219/8/be/src/service/query-options.h File be/src/service/query-options.h: http://gerrit.cloudera.org:8080/#/c/18219/8/be/src/service/query-options.h@279 PS8, Line 279: QUERY_OPT_FN(remote_batch_read_prefetch_step, REMOTE_BATCH_READ_PREFETCH_STEP, \ > When will docs for these new query options be added? That is a good point. Filed a Jira IMPALA-11656 to follow up. http://gerrit.cloudera.org:8080/#/c/18219/8/be/src/service/query-options.cc File be/src/service/query-options.cc: http://gerrit.cloudera.org:8080/#/c/18219/8/be/src/service/query-options.cc@1255 PS8, Line 1255: "Only integer value larger than or equal to 0 is allowed.", > nit: "equal" instead of "equals" Done -- To view, visit http://gerrit.cloudera.org:8080/18219 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If913785cac9e2dafa20013b6600c87fcaf3e2018 Gerrit-Change-Number: 18219 Gerrit-PatchSet: 9 Gerrit-Owner: Yida Wu <[email protected]> Gerrit-Reviewer: Abhishek Rawat <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Michael Smith <[email protected]> Gerrit-Reviewer: Qifan Chen <[email protected]> Gerrit-Reviewer: Yida Wu <[email protected]> Gerrit-Comment-Date: Mon, 17 Oct 2022 03:00:10 +0000 Gerrit-HasComments: Yes
