Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17979 )
Change subject: IMPALA-10791 Add batching reading for remote temporary files ...................................................................... Patch Set 1: (18 comments) Just try to understand this piece of work. Sounds promising. http://gerrit.cloudera.org:8080/#/c/17979/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17979/1//COMMIT_MSG@38 PS1, Line 38: now the system You mean Impala or OS?. Is it a guarantee? http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.h File be/src/runtime/io/disk-file.h: http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.h@47 PS1, Line 47: DONE nit. WRITTEN probably is better. http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.h@55 PS1, Line 55: DISABLED nit. Could be UNINIT too? http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.h@62 PS1, Line 62: poool nit. http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.h@70 PS1, Line 70: bool IsDisabled() { : std::unique_lock<SpinLock> l(lock_); : return IsDisabledLocked(l); : } : : bool IsDisabledLocked(const std::unique_lock<SpinLock>& lock) { : DCHECK(lock.mutex() == &lock_ && lock.owns_lock()); : return status_ == MemBlockStatus::DISABLED; : } Can this pair of methods to be simplified as follows? bool IsDisabled() { std::unique_lock<SpinLock> l(lock_); return IsInStateLocked(l, MemBlockStatus::DISABLED); } bool IsInStateLocked(const std::unique_lock<SpinLock>& lock, enum x) { DCHECK(lock.mutex() == &lock_ && lock.owns_lock()); return status_ == x; } In this way, other similar methods can call IsInStateLocked() directly. http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.h@193 PS1, Line 193: lock_ A little bit confusing when read code which applies lock_. since there are two lock_ in two classes. Suggest to name the lock_ here to memory_block_lock_. I also wonder if we can just use the lock in disk file instead. http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.h@292 PS1, Line 292: std::lock_guard<SpinLock> l(lock_); : return actual_file_size_; actual_file_size_ is AtomicInt64. Do we need to lock with lock_ here first? http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.h@379 PS1, Line 379: num_of_read_buffer nit. num_of_read_buffers. http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.h@381 PS1, Line 381: equals nit equal http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.h@391 PS1, Line 391: DCHECK_LT(buffer_idx, num_of_read_buffer()); : DCHECK_GE(buffer_idx, 0); Should call DCheckReadBufferIdx(). http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.h@393 PS1, Line 393: std::lock_guard<SpinLock> lock(lock_); should we lock it first before check? http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.h@415 PS1, Line 415: DCheckReadBufferIdx(buffer_idx); : std::lock_guard<SpinLock> lock(lock_); Lock before check? http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.h@509 PS1, Line 509: lock_ Suggest to rename as disk_file_lock_. http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.cc File be/src/runtime/io/disk-file.cc: http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/io/disk-file.cc@123 PS1, Line 123: *allocated = true; If the memory is deleted, the state in its control block should become reserved at most. Suggest to also update its control block in this method. http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/tmp-file-mgr-internal.h File be/src/runtime/tmp-file-mgr-internal.h: http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/tmp-file-mgr-internal.h@233 PS1, Line 233: of nit off. http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/tmp-file-mgr-internal.h@234 PS1, Line 234: is nit if. http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/tmp-file-mgr-internal.h@283 PS1, Line 283: SetToDelete nit. May name as SetToDeleteFlag() since the argument can be false which indicates the files are not deleted. http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/tmp-file-mgr.h File be/src/runtime/tmp-file-mgr.h: http://gerrit.cloudera.org:8080/#/c/17979/1/be/src/runtime/tmp-file-mgr.h@245 PS1, Line 245: Return if batching read for remote temporary files is enabled. nit. Indicate ... May rename as CanRemoteBatchingRead() -- To view, visit http://gerrit.cloudera.org:8080/17979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b Gerrit-Change-Number: 17979 Gerrit-PatchSet: 1 Gerrit-Owner: Yida Wu <wydbaggio...@gmail.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com> Gerrit-Comment-Date: Wed, 27 Oct 2021 18:34:58 +0000 Gerrit-HasComments: Yes