Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21754 )

Change subject: IMPALA-12908: Add correctness check for tuple cache
......................................................................


Patch Set 2:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/21754/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21754/2//COMMIT_MSG@20
PS2, Line 20: The subdirectory is named using a combination
            : of the cache key, mt_dop, and instance index.
I think this is out of date. We dropped mt_dop/instance index and now 
incorporate fragment instance id.


http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/exec/tuple-cache-node.h
File be/src/exec/tuple-cache-node.h:

http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/exec/tuple-cache-node.h@99
PS2, Line 99:   string GetDebugDumpPath(const RuntimeState* state, bool is_ref);
I think this can be const.


http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/exec/tuple-cache-node.cc
File be/src/exec/tuple-cache-node.cc:

http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/exec/tuple-cache-node.cc@356
PS2, Line 356:     if (debug_dump_text_writer_ && 
!debug_dump_rowbatch_done_reading_) {
Here's another way of doing this: At open, write out the entire reference cache 
file immediately. Then, set up the query to write its output to the debug 
location and let it run normally. It would behave as if there was no cache hit 
except that it writes the cache file to a different location (and writes the 
debugging text file).

i.e. in Open() we would end up with code like:

if (IsAvailableForRead() && correctness_checking_enabled) {
  // Read the cache file and write out the reference file.
  // Open the debug_dump_text_writer_
} else if (IsAvailableForRead()) {
  // Stays the same
} else if (IsAvailableForWrite()) {
  // Stays the same
}

The GetNext() would have code on the non-cache-hit path to write out to the 
debug_dump_text_writer_. The cache hit path would not be set.


http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/exec/tuple-cache-node.cc@468
PS2, Line 468:     string org_fragment_id =
             :         
ExecEnv::GetInstance()->tuple_cache_mgr()->GetFragmentIdForTupleCache(
             :             combined_key_);
             :     if (org_fragment_id.empty()) return org_fragment_id;
I think we should structure the code so that this function does not need to do 
an existence check. I don't want the debug_dump_caches_metadata_ to serve as an 
additional source of truth. If we know there is a cache hit when we call this, 
we can DCHECK that this entry exists.


http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/exec/tuple-text-file-reader.h
File be/src/exec/tuple-text-file-reader.h:

http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/exec/tuple-text-file-reader.h@45
PS2, Line 45:   int GetFileSize();
Mark this function as const


http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/exec/tuple-text-file-writer.h
File be/src/exec/tuple-text-file-writer.h:

http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/exec/tuple-text-file-writer.h@52
PS2, Line 52:   bool IsEmpty();
Mark this function as const


http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/exec/tuple-text-file-writer.h@55
PS2, Line 55:   size_t BytesWritten() { return writer_.tellp(); }
Mark this function as const


http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/runtime/tuple-cache-mgr.h
File be/src/runtime/tuple-cache-mgr.h:

http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/runtime/tuple-cache-mgr.h@72
PS2, Line 72:     /// The fragment id of the tuple cache for debug purpose.
            :     std::string fragment_id;
One other thing that we might want to put in this struct is the value of the 
client_identifier query option. In our test infrastructure, we set that to the 
pytest test string, so that would let us say "this cache entry was written by 
this pytest". We would need to write that out to a separate file somehow.

This could be done as a separate JIRA.


http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/runtime/tuple-cache-mgr.h@140
PS2, Line 140:   /// Create the subdirectory for debug dumping the tuple cache 
if needed and return
             :   /// the full file path.
             :   string GetDebugDumpPath(const string& sub_dir, const string& 
file_name);
Mark this function as const


http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/runtime/tuple-cache-mgr.h@144
PS2, Line 144:   /// Return whether debug dumping is enabled.
             :   bool DebugDumpEnabled() { return cache_debug_dump_dir_ != ""; }
Mark this function as const


http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/runtime/tuple-cache-mgr.h@152
PS2, Line 152:   /// Verify whether two debug dump caches are the same.
             :   /// If the files are with different sizes or not able to open 
the files, will return
             :   /// an error status to stop the query.
             :   /// If the files content are different, return status Ok but 
set the passed as false
             :   /// to allow further comparison.
             :   /// If files are the same or there is no need for a 
comparison, like verification is
             :   /// off or cache doesn't exist, return status Ok and set the 
passed as true.
             :   Status VerifyDebugDumpCache(
             :       const string& file_name, const string& ref_file_name, 
bool* passed);
>From what I can tell, this function doesn't make use of any of the fields on 
>TupleCacheMgr. We have the comparison logic in a couple different places (here 
>and tuple-cache-node.cc). I'm thinking it would be cleaner to consolidate it 
>into one place. One option is to have a tuple-text-file-util.h/.cc that 
>handles both the fast and slow comparison.

We could also think about consolidating the 
tuple-text-reader*/tuple-text-writer* files into that file, as they are both 
quite small.


http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/runtime/tuple-cache-mgr.cc
File be/src/runtime/tuple-cache-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/runtime/tuple-cache-mgr.cc@384
PS2, Line 384:   if (!filesystem::exists(full_path)) {
             :     filesystem::create_directories(full_path);
             :   }
We need to be careful about boost throwing exceptions. Let's switch this to use 
be/src/util/filesystem-util.h, which is careful about avoiding exceptions.


http://gerrit.cloudera.org:8080/#/c/21754/2/be/src/runtime/tuple-cache-mgr.cc@407
PS2, Line 407: void TupleCacheMgr::EvictedEntry(Slice key, Slice value) {
When we evict an entry from the cache, I think we should remove the entry from 
debug_dump_caches_metadata_.



--
To view, visit http://gerrit.cloudera.org:8080/21754
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied074e274ebf99fb57e3ee41a13148725775b77c
Gerrit-Change-Number: 21754
Gerrit-PatchSet: 2
Gerrit-Owner: Yida Wu <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Yida Wu <[email protected]>
Gerrit-Comment-Date: Thu, 19 Sep 2024 22:06:33 +0000
Gerrit-HasComments: Yes

Reply via email to