Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17979 )
Change subject: IMPALA-10791 Add batching reading for remote temporary files ...................................................................... Patch Set 4: (4 comments) Looks great. I have only the following two remaining concerns. 1. Whether the memory block lock can be integrated better with the disk file lock; 2. Once a memory block is disabled and its memory is returned to system, can it be put into use again? Let us say there are two HJs and both spill. Assume a memory block B can serve both overspills. Ideally B can serve HJ1 first followed by HJ2 or vice versa. http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.h File be/src/runtime/io/disk-file.h: http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.h@70 PS1, Line 70: DCHECK_EQ(static_cast<int>(status_), static_cast<int>(MemBlockStatus::RESERVED)); : // Use malloc, could be better to alloc from a buffer pool. : data_ = static_cast<uint8_t*>(malloc(size)); : DCHECK(data_ != nullptr); : SetStatusLocked(lock, MemBlockStatus::ALLOC); : } : : uint8_t* data() { return data_; } : > Yeah, maybe fewer functions could be better. Sounds good. http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.h@193 PS1, Line 193: > I think the problem is even we use shared_mutex for the disk file lock, we Some design may be needed in this area. For example, if the disk file level lock is acquired for writing, then there is no point to get the memory block lock. If the disk file level lock is not acquired (say due to read case), then the memory block lock is needed. http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.h@195 PS1, Line 195: > Because the naming in DiskFile is "file_status_", maybe just keep them the Okay. http://gerrit.cloudera.org:8080/#/c/17979/2/be/src/runtime/tmp-file-mgr.cc File be/src/runtime/tmp-file-mgr.cc: http://gerrit.cloudera.org:8080/#/c/17979/2/be/src/runtime/tmp-file-mgr.cc@1039 PS2, Line 1039: read_buffer_block->NotifyAllWaits(); > Have a simple test today (15x tpcds, q67, c5d.4xlarge 16u32g, 1G read buffe Good test! Once we disable the memory block here, when does it get used again? My original concern is whether it can be used again hopefully (i.e. when the memory threshold test on dofetch becomes true again). -- To view, visit http://gerrit.cloudera.org:8080/17979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b Gerrit-Change-Number: 17979 Gerrit-PatchSet: 4 Gerrit-Owner: Yida Wu <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Qifan Chen <[email protected]> Gerrit-Reviewer: Yida Wu <[email protected]> Gerrit-Comment-Date: Tue, 02 Nov 2021 20:19:28 +0000 Gerrit-HasComments: Yes
