Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12987 )

Change subject: IMPALA-8341: Data cache for remote reads
......................................................................


Patch Set 7:

(41 comments)

http://gerrit.cloudera.org:8080/#/c/12987/5/be/src/runtime/io/data-cache-test.cc
File be/src/runtime/io/data-cache-test.cc:

http://gerrit.cloudera.org:8080/#/c/12987/5/be/src/runtime/io/data-cache-test.cc@273
PS5, Line 273:   auto s =
> I just found out we have ScopedFlagSetter in scoped-flag-setter.h, I think
Done


http://gerrit.cloudera.org:8080/#/c/12987/6/be/src/runtime/io/data-cache.h
File be/src/runtime/io/data-cache.h:

http://gerrit.cloudera.org:8080/#/c/12987/6/be/src/runtime/io/data-cache.h@112
PS6, Line 112:
             :   /// 'config' is the configurati
> per the commit message, we've moved to a single quota rather than per-direc
Yes, comments were stale. Fixed in new patch.


http://gerrit.cloudera.org:8080/#/c/12987/6/be/src/runtime/io/data-cache.h@261
PS6, Line 261:
> perhaps init to -1?
Done. Switched to initializing it to 0 in CreateCacheFile() instead.


http://gerrit.cloudera.org:8080/#/c/12987/5/be/src/runtime/io/data-cache.h
File be/src/runtime/io/data-cache.h:

http://gerrit.cloudera.org:8080/#/c/12987/5/be/src/runtime/io/data-cache.h@215
PS5, Line 215:
> 'start_reclaim'?
Done


http://gerrit.cloudera.org:8080/#/c/12987/5/be/src/runtime/io/data-cache.h@337
PS5, Line 337:   };
> Can you mention in the comment that the pool has only 1 thread and why you'
The reason for using a pool is that we want to be able to queue the work for 
deferred processing by another thread and ThreadPool seems to provide the right 
abstraction. May be there are other classes in utility/ which are also 
applicable ?


http://gerrit.cloudera.org:8080/#/c/12987/5/be/src/runtime/io/data-cache.h@341
PS5, Line 341:
> Some functions around deleting files are called "Close...". We should point
Done


http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.cc
File be/src/runtime/io/data-cache.cc:

http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.cc@306
PS4, Line 306:   cache_files_.emplace_back(std::mo
> This check is racy. More allocation could have happened since 'insertion_of
Done


http://gerrit.cloudera.org:8080/#/c/12987/5/be/src/runtime/io/data-cache.cc
File be/src/runtime/io/data-cache.cc:

http://gerrit.cloudera.org:8080/#/c/12987/5/be/src/runtime/io/data-cache.cc@95
PS5, Line 95: ng files opened, ol
> switch to single thread, or mention pool here
Done


http://gerrit.cloudera.org:8080/#/c/12987/5/be/src/runtime/io/data-cache.cc@112
PS5, Line 112: DeleteFile
> Can we call this DeleteFile? Otherwise there's a third thing to keep track
Done


http://gerrit.cloudera.org:8080/#/c/12987/5/be/src/runtime/io/data-cache.cc@125
PS5, Line 125: d the lock in
> It's not obvious to me why we only need a percpu_rwlock here. Can you add a
Done. Please also see explanation in comments above.


http://gerrit.cloudera.org:8080/#/c/12987/5/be/src/runtime/io/data-cache.cc@208
PS5, Line 208:
> nit: singular
Done


http://gerrit.cloudera.org:8080/#/c/12987/5/be/src/runtime/io/data-cache.cc@335
PS5, Line 335: ing files left over fro
> Similar to other comments, I'd call this "VerifySizeAndDeleteFiles", I thin
Done


http://gerrit.cloudera.org:8080/#/c/12987/5/be/src/runtime/io/data-cache.cc@395
PS5, Line 395: }
> Will this handle hole punching through the eviction logic?
Yes.


http://gerrit.cloudera.org:8080/#/c/12987/5/be/src/runtime/io/data-cache.cc@436
PS5, Line 436:   // Try verifying the checksum of the new buffer matches that 
of the existing entry.
> nit: only append the "checksum $3" part if checksumming is enabled? I don't
Keeping it for simplicity.


http://gerrit.cloudera.org:8080/#/c/12987/5/be/src/runtime/io/data-cache.cc@457
PS5, Line 457: == nullptr)) r
> start_reclaim?
Done


http://gerrit.cloudera.org:8080/#/c/12987/5/be/src/runtime/io/data-cache.cc@633
PS5, Line 633: URN_IF_ERROR(p
> start_reclaim?
Done


http://gerrit.cloudera.org:8080/#/c/12987/6/be/src/runtime/io/data-cache.cc
File be/src/runtime/io/data-cache.cc:

http://gerrit.cloudera.org:8080/#/c/12987/6/be/src/runtime/io/data-cache.cc@64
PS6, Line 64: DEFINE_int64(data_cache_file_max_size_bytes, 1L << 40 /* 1TB */,
> - can you add a comment here like /* 4TB */?
Done.

Yes, according to https://access.redhat.com/solutions/1532, ext4 should support 
up to 16TB. Apparently, there is still need to support ext3 which has a limit 
of 2TB. So, I will set it to 1TB to be safe.


http://gerrit.cloudera.org:8080/#/c/12987/6/be/src/runtime/io/data-cache.cc@68
PS6, Line 68:     "(Advanced) The maximum number of allowed opened files. This 
must be at least the "
> Setting this per-partition creates a dependency between this and the number
Makes sense. Done.


http://gerrit.cloudera.org:8080/#/c/12987/6/be/src/runtime/io/data-cache.cc@70
PS6, Line 70: DEFINE_int32(data_cache_write_concurrency, 1,
> is this per-partition? should be, right?
Done


http://gerrit.cloudera.org:8080/#/c/12987/6/be/src/runtime/io/data-cache.cc@117
PS6, Line 117:     KUDU_RETURN_IF_ERROR(kudu::Env::Default()->NewRWFile(path, 
&cache_file->file_),
> why not pass the RWFile into the CacheFile constructor vs creating an empty
One advantage is to hide the implementation of CacheFile from the caller (e.g. 
if we somehow can switch to using Impala's own implementation of RWFile).

I forgot to move the constructor to private. Should have done that so there is 
no way to create an empty CacheFile.


http://gerrit.cloudera.org:8080/#/c/12987/6/be/src/runtime/io/data-cache.cc@141
PS6, Line 141:     Close();
> WARN_NOT_OK could be used here
Done


http://gerrit.cloudera.org:8080/#/c/12987/6/be/src/runtime/io/data-cache.cc@153
PS6, Line 153: int64_
> 'inline' here and elsewhere isn't necessary since you've defined them inlin
Done


http://gerrit.cloudera.org:8080/#/c/12987/6/be/src/runtime/io/data-cache.cc@174
PS6, Line 174:     DCHECK_EQ(offset % PAGE_SIZE, 0);
> worth a DCHECK that offset + bytes_to_read <= current_offset_
Done


http://gerrit.cloudera.org:8080/#/c/12987/6/be/src/runtime/io/data-cache.cc@192
PS6, Line 192:     DCHECK_EQ(offset % PAGE_SIZE, 0);
> same DCHECK suggested above
Done


http://gerrit.cloudera.org:8080/#/c/12987/6/be/src/runtime/io/data-cache.cc@205
PS6, Line 205:   }
> same
Done


http://gerrit.cloudera.org:8080/#/c/12987/6/be/src/runtime/io/data-cache.cc@228
PS6, Line 228:
> instead of just saying used for synchronozation" I think best to say "taken
Done


http://gerrit.cloudera.org:8080/#/c/12987/6/be/src/runtime/io/data-cache.cc@281
PS6, Line 281:     return key_;
> can you DCHECK that lock_ is held by the current thread here? (same elsewhe
Done


http://gerrit.cloudera.org:8080/#/c/12987/6/be/src/runtime/io/data-cache.cc@347
PS6, Line 347:   }
             :
> is this the right method call? seems the same as above
Oops..mean to call GetFileSize().


http://gerrit.cloudera.org:8080/#/c/12987/6/be/src/runtime/io/data-cache.cc@352
PS6, Line 352: acking file for the partition.
> .clear()
Done


http://gerrit.cloudera.org:8080/#/c/12987/6/be/src/runtime/io/data-cache.cc@360
PS6, Line 360:   for (auto& file : cache_files_) {
> dcheck the lock is held?
Done


http://gerrit.cloudera.org:8080/#/c/12987/6/be/src/runtime/io/data-cache.cc@380
PS6, Line 380:
> I think this pattern is used pretty widely in Impala but it's moderately sk
Done


http://gerrit.cloudera.org:8080/#/c/12987/6/be/src/runtime/io/data-cache.cc@395
PS6, Line 395: }
> should LOG(WARNING) an error here on bad checksum, so people know their dri
There is already a LOG(DFATAL) in VerifyChecksum() in case there is any 
checksum mismatch.


http://gerrit.cloudera.org:8080/#/c/12987/6/be/src/runtime/io/data-cache.cc@436
PS6, Line 436:   // Try verifying the checksum of the new buffer matches that 
of the existing entry.
> maybe VLOG(3)?
Done


http://gerrit.cloudera.org:8080/#/c/12987/6/be/src/runtime/io/data-cache.cc@444
PS6, Line 444:   }
> similar to comment above, I think it's better to create a struct, set the f
Done


http://gerrit.cloudera.org:8080/#/c/12987/6/be/src/runtime/io/data-cache.cc@508
PS6, Line 508:  path
> nit: emplace
Done


http://gerrit.cloudera.org:8080/#/c/12987/6/be/src/runtime/io/data-cache.cc@588
PS6, Line 588:
> CHECK here (not a hot path and we'd crash later)
Done


http://gerrit.cloudera.org:8080/#/c/12987/6/be/src/runtime/io/data-cache.cc@616
PS6, Line 616:     return Status(Substitute("Configured data cache capacity $0 
is too small",
> 3 or 4 maybe? this will be quite noisy
Done


http://gerrit.cloudera.org:8080/#/c/12987/5/be/src/runtime/io/hdfs-file-reader.cc
File be/src/runtime/io/hdfs-file-reader.cc:

http://gerrit.cloudera.org:8080/#/c/12987/5/be/src/runtime/io/hdfs-file-reader.cc@37
PS5, Line 37: "
> nit: trailing space
Done


http://gerrit.cloudera.org:8080/#/c/12987/6/be/src/util/filesystem-util.cc
File be/src/util/filesystem-util.cc:

http://gerrit.cloudera.org:8080/#/c/12987/6/be/src/util/filesystem-util.cc@305
PS6, Line 305: }
> we should probably test for known buggy el6 versions and ext4 here and disa
Done


http://gerrit.cloudera.org:8080/#/c/12987/5/tests/custom_cluster/test_data_cache.py
File tests/custom_cluster/test_data_cache.py:

http://gerrit.cloudera.org:8080/#/c/12987/5/tests/custom_cluster/test_data_cache.py@22
PS5, Line 22:
> flake8: E302 expected 2 blank lines, found 1
Done


http://gerrit.cloudera.org:8080/#/c/12987/5/tests/custom_cluster/test_data_cache.py@23
PS5, Line 23:
            :   """ This test enables the data cache an
> It actually seems to check the metrics, not the profile counters.
It checks both. It checks the metrics in the python code but also checks the 
profile's counters in "QueryTest/data-cache.test"



--
To view, visit http://gerrit.cloudera.org:8080/12987
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc
Gerrit-Change-Number: 12987
Gerrit-PatchSet: 7
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: David Rorke <dro...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <stak...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <tmarsh...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-Comment-Date: Wed, 01 May 2019 01:46:49 +0000
Gerrit-HasComments: Yes

Reply via email to