Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/21754 )
Change subject: IMPALA-12908: Add correctness check for tuple cache ...................................................................... Patch Set 2: (13 comments) http://gerrit.cloudera.org:8080/#/c/21754/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21754/2//COMMIT_MSG@20 PS2, Line 20: The subdirectory is named using a combination : of the cache key, mt_dop, and instance index. I think this is out of date. We dropped mt_dop/instance index and now incorporate fragment instance id. http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/exec/tuple-cache-node.h File be/src/exec/tuple-cache-node.h: http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/exec/tuple-cache-node.h@99 PS2, Line 99: string GetDebugDumpPath(const RuntimeState* state, bool is_ref); I think this can be const. http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/exec/tuple-cache-node.cc File be/src/exec/tuple-cache-node.cc: http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/exec/tuple-cache-node.cc@356 PS2, Line 356: if (debug_dump_text_writer_ && !debug_dump_rowbatch_done_reading_) { Here's another way of doing this: At open, write out the entire reference cache file immediately. Then, set up the query to write its output to the debug location and let it run normally. It would behave as if there was no cache hit except that it writes the cache file to a different location (and writes the debugging text file). i.e. in Open() we would end up with code like: if (IsAvailableForRead() && correctness_checking_enabled) { // Read the cache file and write out the reference file. // Open the debug_dump_text_writer_ } else if (IsAvailableForRead()) { // Stays the same } else if (IsAvailableForWrite()) { // Stays the same } The GetNext() would have code on the non-cache-hit path to write out to the debug_dump_text_writer_. The cache hit path would not be set. http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/exec/tuple-cache-node.cc@468 PS2, Line 468: string org_fragment_id = : ExecEnv::GetInstance()->tuple_cache_mgr()->GetFragmentIdForTupleCache( : combined_key_); : if (org_fragment_id.empty()) return org_fragment_id; I think we should structure the code so that this function does not need to do an existence check. I don't want the debug_dump_caches_metadata_ to serve as an additional source of truth. If we know there is a cache hit when we call this, we can DCHECK that this entry exists. http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/exec/tuple-text-file-reader.h File be/src/exec/tuple-text-file-reader.h: http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/exec/tuple-text-file-reader.h@45 PS2, Line 45: int GetFileSize(); Mark this function as const http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/exec/tuple-text-file-writer.h File be/src/exec/tuple-text-file-writer.h: http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/exec/tuple-text-file-writer.h@52 PS2, Line 52: bool IsEmpty(); Mark this function as const http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/exec/tuple-text-file-writer.h@55 PS2, Line 55: size_t BytesWritten() { return writer_.tellp(); } Mark this function as const http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/runtime/tuple-cache-mgr.h File be/src/runtime/tuple-cache-mgr.h: http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/runtime/tuple-cache-mgr.h@72 PS2, Line 72: /// The fragment id of the tuple cache for debug purpose. : std::string fragment_id; One other thing that we might want to put in this struct is the value of the client_identifier query option. In our test infrastructure, we set that to the pytest test string, so that would let us say "this cache entry was written by this pytest". We would need to write that out to a separate file somehow. This could be done as a separate JIRA. http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/runtime/tuple-cache-mgr.h@140 PS2, Line 140: /// Create the subdirectory for debug dumping the tuple cache if needed and return : /// the full file path. : string GetDebugDumpPath(const string& sub_dir, const string& file_name); Mark this function as const http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/runtime/tuple-cache-mgr.h@144 PS2, Line 144: /// Return whether debug dumping is enabled. : bool DebugDumpEnabled() { return cache_debug_dump_dir_ != ""; } Mark this function as const http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/runtime/tuple-cache-mgr.h@152 PS2, Line 152: /// Verify whether two debug dump caches are the same. : /// If the files are with different sizes or not able to open the files, will return : /// an error status to stop the query. : /// If the files content are different, return status Ok but set the passed as false : /// to allow further comparison. : /// If files are the same or there is no need for a comparison, like verification is : /// off or cache doesn't exist, return status Ok and set the passed as true. : Status VerifyDebugDumpCache( : const string& file_name, const string& ref_file_name, bool* passed); >From what I can tell, this function doesn't make use of any of the fields on >TupleCacheMgr. We have the comparison logic in a couple different places (here >and tuple-cache-node.cc). I'm thinking it would be cleaner to consolidate it >into one place. One option is to have a tuple-text-file-util.h/.cc that >handles both the fast and slow comparison. We could also think about consolidating the tuple-text-reader*/tuple-text-writer* files into that file, as they are both quite small. http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/runtime/tuple-cache-mgr.cc File be/src/runtime/tuple-cache-mgr.cc: http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/runtime/tuple-cache-mgr.cc@384 PS2, Line 384: if (!filesystem::exists(full_path)) { : filesystem::create_directories(full_path); : } We need to be careful about boost throwing exceptions. Let's switch this to use be/src/util/filesystem-util.h, which is careful about avoiding exceptions. http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/runtime/tuple-cache-mgr.cc@407 PS2, Line 407: void TupleCacheMgr::EvictedEntry(Slice key, Slice value) { When we evict an entry from the cache, I think we should remove the entry from debug_dump_caches_metadata_. -- To view, visit http://gerrit.cloudera.org:8080/21754 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ied074e274ebf99fb57e3ee41a13148725775b77c Gerrit-Change-Number: 21754 Gerrit-PatchSet: 2 Gerrit-Owner: Yida Wu <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Reviewer: Kurt Deschler <[email protected]> Gerrit-Reviewer: Michael Smith <[email protected]> Gerrit-Reviewer: Yida Wu <[email protected]> Gerrit-Comment-Date: Thu, 19 Sep 2024 22:06:33 +0000 Gerrit-HasComments: Yes
