Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17979 )

Change subject: IMPALA-10791 Add batching reading for remote temporary files
......................................................................


Patch Set 4:

(4 comments)

Looks great.

I have only the following two remaining concerns.

1. Whether the memory block lock can be integrated better with the disk file 
lock;
2. Once a memory block is disabled and its memory is returned to system, can it 
be put into use again?  Let us say there are two HJs and both spill. Assume a 
memory block B can serve both overspills. Ideally B can serve HJ1 first 
followed by HJ2 or vice versa.

http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.h
File be/src/runtime/io/disk-file.h:

http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.h@70
PS1, Line 70:    DCHECK_EQ(static_cast<int>(status_), 
static_cast<int>(MemBlockStatus::RESERVED));
            :     // Use malloc, could be better to alloc from a buffer pool.
            :     data_ = static_cast<uint8_t*>(malloc(size));
            :     DCHECK(data_ != nullptr);
            :     SetStatusLocked(lock, MemBlockStatus::ALLOC);
            :   }
            :
            :   uint8_t* data() { return data_; }
            :
> Yeah, maybe fewer functions could be better.
Sounds good.


http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.h@193
PS1, Line 193:
> I think the problem is even we use shared_mutex for the disk file lock, we
Some design may be needed in this area.

For example, if the disk file level lock is acquired for writing, then there is 
no point to get the memory block lock.

If the disk file level lock is not acquired (say due to read case), then the 
memory block lock is needed.


http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.h@195
PS1, Line 195:
> Because the naming in DiskFile is "file_status_", maybe just keep them the
Okay.


http://gerrit.cloudera.org:8080/#/c/17979/2/be/src/runtime/tmp-file-mgr.cc
File be/src/runtime/tmp-file-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/17979/2/be/src/runtime/tmp-file-mgr.cc@1039
PS2, Line 1039:       read_buffer_block->NotifyAllWaits();
> Have a simple test today (15x tpcds, q67, c5d.4xlarge 16u32g, 1G read buffe
Good test!

Once we disable the memory block here, when does it get used again?

My original concern is whether it can be used again hopefully (i.e. when the 
memory threshold test on dofetch becomes true again).



--
To view, visit http://gerrit.cloudera.org:8080/17979
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b
Gerrit-Change-Number: 17979
Gerrit-PatchSet: 4
Gerrit-Owner: Yida Wu <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Yida Wu <[email protected]>
Gerrit-Comment-Date: Tue, 02 Nov 2021 20:19:28 +0000
Gerrit-HasComments: Yes

Reply via email to