Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/21754 )
Change subject: IMPALA-12908: Add correctness check for tuple cache ...................................................................... Patch Set 3: (13 comments) Thanks Joe for the review. http://gerrit.cloudera.org:8080/#/c/21754/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21754/2//COMMIT_MSG@20 PS2, Line 20: The subdirectory is using the cache key as : its name. Additionally, data from the child i > I think this is out of date. We dropped mt_dop/instance index and now incor Done http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/exec/tuple-cache-node.h File be/src/exec/tuple-cache-node.h: http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/exec/tuple-cache-node.h@99 PS2, Line 99: } > I think this can be const. Done http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/exec/tuple-cache-node.cc File be/src/exec/tuple-cache-node.cc: http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/exec/tuple-cache-node.cc@356 PS2, Line 356: debug_dump_text_writer_ref_->Delete(); > Here's another way of doing this: At open, write out the entire reference c Done. It seems some buffer_pool_client issue when it tries to read in Open(), I put the logic to read from the cache and write the ref to the end of the GetNext(). http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/exec/tuple-cache-node.cc@468 PS2, Line 468: : : : > I think we should structure the code so that this function does not need to After the eviction function removes the metadata entry, it's possible that the id might not be found if the cache was evicted. In this case, returning an empty string seems reasonable, allowing the caller to skip the correctness check for this cache. I think the idea is that if there's an issue and we can't confirm the path name, the correctness check is skipped. Added some comments http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/exec/tuple-text-file-reader.h File be/src/exec/tuple-text-file-reader.h: http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/exec/tuple-text-file-reader.h@45 PS2, Line 45: int GetFileSize() > Mark this function as const The reader_.tellg() doesn't allow the const, I changed the logic a little bit. http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/exec/tuple-text-file-writer.h File be/src/exec/tuple-text-file-writer.h: http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/exec/tuple-text-file-writer.h@52 PS2, Line 52: bool IsEmpty() > Mark this function as const Done http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/exec/tuple-text-file-writer.h@55 PS2, Line 55: size_t BytesWritten() const { return bytes_writte > Mark this function as const Done. writer_.tellp() doesn't allow const, changed it another way to mark the bytes written. http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/runtime/tuple-cache-mgr.h File be/src/runtime/tuple-cache-mgr.h: http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/runtime/tuple-cache-mgr.h@72 PS2, Line 72: /// The fragment id of the tuple cache for debug purpose. : std::string fragment_id; > One other thing that we might want to put in this struct is the value of th Will file a separate JIRA http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/runtime/tuple-cache-mgr.h@140 PS2, Line 140: /// Create the subdirectory for debug dumping the tuple cache if needed and return : /// the full file path. : string GetDebugDumpPath(const string& sub_dir, const string& file_name) > Mark this function as const Done http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/runtime/tuple-cache-mgr.h@144 PS2, Line 144: /// Return whether debug dumping is enabled. : bool DebugDumpEnabled() const { return cache_debug_dump_dir_ != > Mark this function as const Done http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/runtime/tuple-cache-mgr.h@152 PS2, Line 152: : private: : // Disallow copy and assign : TupleCacheMgr(const TupleCacheMgr&) = delete; : TupleCacheMgr& operator=(const TupleCacheMgr&) = delete; : : friend class TupleCacheMgrTest; : : // Constructor for tests > From what I can tell, this function doesn't make use of any of the fields o Moved the verify functions to the tuple-text-file-util.h/.cc. http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/runtime/tuple-cache-mgr.cc File be/src/runtime/tuple-cache-mgr.cc: http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/runtime/tuple-cache-mgr.cc@384 PS2, Line 384: Status status = FileSystemUtil::PathExists(full_path.string(), &path_exists); : if (!status.ok()) return string(); : i > We need to be careful about boost throwing exceptions. Let's switch this to Done http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/runtime/tuple-cache-mgr.cc@407 PS2, Line 407: return iter->second.fragment_id; > When we evict an entry from the cache, I think we should remove the entry f Done -- To view, visit http://gerrit.cloudera.org:8080/21754 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ied074e274ebf99fb57e3ee41a13148725775b77c Gerrit-Change-Number: 21754 Gerrit-PatchSet: 3 Gerrit-Owner: Yida Wu <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Reviewer: Kurt Deschler <[email protected]> Gerrit-Reviewer: Michael Smith <[email protected]> Gerrit-Reviewer: Yida Wu <[email protected]> Gerrit-Comment-Date: Mon, 23 Sep 2024 17:40:05 +0000 Gerrit-HasComments: Yes
