Yida Wu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16318 )

Change subject: IMPALA-9867: Add Support for Spilling to S3: Milestone 1
......................................................................


Patch Set 32:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/16318/32/be/src/runtime/tmp-file-mgr-internal.h
File be/src/runtime/tmp-file-mgr-internal.h:

http://gerrit.cloudera.org:8080/#/c/16318/32/be/src/runtime/tmp-file-mgr-internal.h@206
PS32, Line 206:   TmpFileRemote(TmpFileGroup* file_group, TmpFileMgr::DeviceId 
device_id,
> Can we add a destructor with a DCHECK that will ensure that the resources h
Added DCHECK for is_buffer_returned in destructor. If the buffer is reserved 
for the file, the file needs to be returned to the pool and the pool set the 
is_buffer_returned flag. The shared_ptr of a remote TmpFile could be kept by 
FileGroup or TmpFileBufferPool, the file would be destructed if the FileGroup 
is closed and the file is not in the TmpFileBufferPool.


http://gerrit.cloudera.org:8080/#/c/16318/32/be/src/runtime/tmp-file-mgr.h
File be/src/runtime/tmp-file-mgr.h:

http://gerrit.cloudera.org:8080/#/c/16318/32/be/src/runtime/tmp-file-mgr.h@184
PS32, Line 184:     /// A map to keep the TmpFile shared pointer alive by 
keeping a reference of the
> Why do we need to do this again? Can't we enqueue the shared_ptr into the t
Moved the map of raw pointer and its shared_ptr of TmpFiles from TmpFileMgr to 
TmpFileGroup to avoid potential memory leaks. The map is used to locate the 
shared_ptr from raw pointer of a TmpFile, because we pass the raw pointer for 
most cases, only one special case we need the shared_ptr of a TmpFile is that 
when we want to enqueue the file to the pool and may need the file to live 
longer than the file group (the caller may dequeue a file from the pool after 
its file group closed), in this case, we look for the shared_ptr by its raw 
pointer and enqueue the share_ptr of the file to the pool.


http://gerrit.cloudera.org:8080/#/c/16318/32/be/src/runtime/tmp-file-mgr.cc
File be/src/runtime/tmp-file-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/16318/32/be/src/runtime/tmp-file-mgr.cc@576
PS32, Line 576:   if 
(tmp_dirs_remote_ctrl_.local_buff_dir_bytes_high_water_mark_.Add(file_size)
> I think this code might be vulnerable to a race where a thread ends up bloc
Added some comments. The high water mark would keep increasing before it 
reaches the dir limit. The caller, if gains the space, would return the space 
to the pool instead of decreasing the high water mark. Then when the high water 
mark reaches the dir limit, the high water mark would be of no use for 
assigning the space, because we can assume that all of the buffers are in the 
pool, we would reserve space from the pool.


http://gerrit.cloudera.org:8080/#/c/16318/32/be/src/runtime/tmp-file-mgr.cc@1835
PS32, Line 1835: void TmpFileBufferPool::EnqueueTmpFilesPool(TmpFile* tmp_file, 
bool front) {
> I'm suspicious about using the raw pointer here
Changed to use shared_ptr. The shared_ptr of a remote TmpFile could be kept by 
FileGroup or TmpFileBufferPool, the file would be destructed if the FileGroup 
is closed and the file is not in the TmpFileBufferPool. The reason of using 
shared_ptr instead of unique_ptr here is that, the file may still be needed 
after dequeued from the pool (the remote file still exists even the buffer is 
evicted after dequeuing), so using shared_ptr can keep the file alive in both 
FileGroup and TmpFileBufferPool.



--
To view, visit http://gerrit.cloudera.org:8080/16318
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I419b1d5dbbfe35334d9f964c4b65e553579fdc89
Gerrit-Change-Number: 16318
Gerrit-PatchSet: 32
Gerrit-Owner: Yida Wu <[email protected]>
Gerrit-Reviewer: Abhishek Rawat <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Sahil Takiar <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Yida Wu <[email protected]>
Gerrit-Comment-Date: Wed, 03 Feb 2021 02:31:04 +0000
Gerrit-HasComments: Yes

Reply via email to