Zihao Ye has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19532 )

Change subject: IMPALA-11904: Data cache support dumping for reloading
......................................................................


Patch Set 9:

(4 comments)

Hi Joe, thanks for the code review. I've fixed some issues, and I still work 
about the custom cluster test.
In the meantime, it would be better if we could solve IMPALA-10971 first, as 
this issue affects the correctness of some metrics (cache size and number of 
entries) and could cause tests to fail. If you don't have time to deal with it, 
you could assign it to me, and I'll fix it as soon as possible, thanks.

http://gerrit.cloudera.org:8080/#/c/19532/9/be/src/runtime/io/data-cache.cc
File be/src/runtime/io/data-cache.cc:

http://gerrit.cloudera.org:8080/#/c/19532/9/be/src/runtime/io/data-cache.cc@131
PS9, Line 131: DEFINE_bool(data_cache_enable_loading, false,
             :     "(Experimental) With loading enabled, data cache will try to 
load the previously dump"
             :     " file (created by the dumping feature enabled by the 
'data_cache_enable_dumping') "
             :     "before create a empty cache. if loading fails, it will 
proceed with regular "
             :     "initialization.");
> I'm thinking that we can combine data_cache_enable_dumping and data_cache_e
Done


http://gerrit.cloudera.org:8080/#/c/19532/9/be/src/runtime/io/data-cache.cc@684
PS9, Line 684:   // Check if there is enough space available at this point in 
time.
             :   uint64_t available_bytes;
             :   RETURN_IF_ERROR(FileSystemUtil::GetSpaceAvailable(path_, 
&available_bytes));
             :   if (available_bytes < capacity_) {
             :     const string& err = Substitute("Insufficient space for $0. 
Required $1. Only $2 is "
             :         "available", path_, PrettyPrinter::PrintBytes(capacity_),
             :         PrettyPrinter::PrintBytes(available_bytes));
             :     LOG(ERROR) << err;
             :     return Status(err);
             :   }
> I think this check can fire if we have the cache on its own device with the
Done. The current check now takes into account the size of existing cached 
data, but it relies on TOTAL_BYTES, as mentioned earlier. It is better to first 
solve IMPALA-10971 to prevent any potential issues.


http://gerrit.cloudera.org:8080/#/c/19532/9/be/src/runtime/io/data-cache.cc@1056
PS9, Line 1056: Status DataCache::Partition::Load() {
> I think we want to delete files that are not listed in the metadata. It sho
Done


http://gerrit.cloudera.org:8080/#/c/19532/9/be/src/util/cache/cache.h
File be/src/util/cache/cache.h:

http://gerrit.cloudera.org:8080/#/c/19532/9/be/src/util/cache/cache.h@288
PS9, Line 288:  Traverse
> Nit: Since we are exporting all entries as a single vector, let's call this
Done



--
To view, visit http://gerrit.cloudera.org:8080/19532
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
Gerrit-Change-Number: 19532
Gerrit-PatchSet: 9
Gerrit-Owner: Zihao Ye <[email protected]>
Gerrit-Reviewer: David Rorke <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Zihao Ye <[email protected]>
Gerrit-Comment-Date: Wed, 10 May 2023 07:59:50 +0000
Gerrit-HasComments: Yes

Reply via email to