Yida Wu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21754 )

Change subject: IMPALA-12908: Add correctness check for tuple cache
......................................................................


Patch Set 3:

(13 comments)

Thanks Joe for the review.

http://gerrit.cloudera.org:8080/#/c/21754/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21754/2//COMMIT_MSG@20
PS2, Line 20: The subdirectory is using the cache key as
            : its name. Additionally, data from the child i
> I think this is out of date. We dropped mt_dop/instance index and now incor
Done


http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/exec/tuple-cache-node.h
File be/src/exec/tuple-cache-node.h:

http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/exec/tuple-cache-node.h@99
PS2, Line 99: }
> I think this can be const.
Done


http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/exec/tuple-cache-node.cc
File be/src/exec/tuple-cache-node.cc:

http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/exec/tuple-cache-node.cc@356
PS2, Line 356:     debug_dump_text_writer_ref_->Delete();
> Here's another way of doing this: At open, write out the entire reference c
Done. It seems some buffer_pool_client issue when it tries to read in Open(), I 
put the logic to read from the cache and write the ref to the end of the 
GetNext().


http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/exec/tuple-cache-node.cc@468
PS2, Line 468:
             :
             :
             :
> I think we should structure the code so that this function does not need to
After the eviction function removes the metadata entry, it's possible that the 
id might not be found if the cache was evicted. In this case, returning an 
empty string seems reasonable, allowing the caller to skip the correctness 
check for this cache. I think the idea is that if there's an issue and we can't 
confirm the path name, the correctness check is skipped. Added some comments


http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/exec/tuple-text-file-reader.h
File be/src/exec/tuple-text-file-reader.h:

http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/exec/tuple-text-file-reader.h@45
PS2, Line 45:   int GetFileSize()
> Mark this function as const
The reader_.tellg() doesn't allow the const, I changed the logic a little bit.


http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/exec/tuple-text-file-writer.h
File be/src/exec/tuple-text-file-writer.h:

http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/exec/tuple-text-file-writer.h@52
PS2, Line 52:   bool IsEmpty()
> Mark this function as const
Done


http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/exec/tuple-text-file-writer.h@55
PS2, Line 55:   size_t BytesWritten() const { return bytes_writte
> Mark this function as const
Done. writer_.tellp() doesn't allow const, changed it another way to mark the 
bytes written.


http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/runtime/tuple-cache-mgr.h
File be/src/runtime/tuple-cache-mgr.h:

http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/runtime/tuple-cache-mgr.h@72
PS2, Line 72:     /// The fragment id of the tuple cache for debug purpose.
            :     std::string fragment_id;
> One other thing that we might want to put in this struct is the value of th
Will file a separate JIRA


http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/runtime/tuple-cache-mgr.h@140
PS2, Line 140:   /// Create the subdirectory for debug dumping the tuple cache 
if needed and return
             :   /// the full file path.
             :   string GetDebugDumpPath(const string& sub_dir, const string& 
file_name)
> Mark this function as const
Done


http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/runtime/tuple-cache-mgr.h@144
PS2, Line 144:   /// Return whether debug dumping is enabled.
             :   bool DebugDumpEnabled() const { return cache_debug_dump_dir_ !=
> Mark this function as const
Done


http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/runtime/tuple-cache-mgr.h@152
PS2, Line 152:
             :  private:
             :   // Disallow copy and assign
             :   TupleCacheMgr(const TupleCacheMgr&) = delete;
             :   TupleCacheMgr& operator=(const TupleCacheMgr&) = delete;
             :
             :   friend class TupleCacheMgrTest;
             :
             :   // Constructor for tests
> From what I can tell, this function doesn't make use of any of the fields o
Moved the verify functions to the tuple-text-file-util.h/.cc.


http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/runtime/tuple-cache-mgr.cc
File be/src/runtime/tuple-cache-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/runtime/tuple-cache-mgr.cc@384
PS2, Line 384:   Status status = FileSystemUtil::PathExists(full_path.string(), 
&path_exists);
             :   if (!status.ok()) return string();
             :   i
> We need to be careful about boost throwing exceptions. Let's switch this to
Done


http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/runtime/tuple-cache-mgr.cc@407
PS2, Line 407:   return iter->second.fragment_id;
> When we evict an entry from the cache, I think we should remove the entry f
Done



--
To view, visit http://gerrit.cloudera.org:8080/21754
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied074e274ebf99fb57e3ee41a13148725775b77c
Gerrit-Change-Number: 21754
Gerrit-PatchSet: 3
Gerrit-Owner: Yida Wu <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Yida Wu <[email protected]>
Gerrit-Comment-Date: Mon, 23 Sep 2024 17:40:05 +0000
Gerrit-HasComments: Yes

Reply via email to