Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/12987 )
Change subject: IMPALA-8341: Data cache for remote reads
......................................................................

Patch Set 1:

(1 comment)

> I think the _IO_ apportioned relative to capacity may be fine as that's a
> reasonable expectation if a user specifies more capacity on a slower device.
> One thing to consider may be to use multiple backing files for larger
> partition to avoid per-file lock problem.

I think, unfortunately, the lower-capacity device is most likely to be the faster one (e.g. a small SSD plus big HDDs). Maybe we can simplify this for now by just requiring that all partitions of the cache be allocated the same amount of space? The most immediate scenario I can see is on Amazon instances like r5d.4xlarge (two local SSDs with the same size), where you'd want to give the same amount of capacity to each cache drive anyway.

Put another way, I think we should constrain the configuration space in such a way that only good configs are possible, rather than hope that users don't do something like:

  /ssd:100G,/hdd1:1TB,/hdd2:1TB,/hdd3:1TB

and find that their SSD's fast IO performance is basically ignored.

http://gerrit.cloudera.org:8080/#/c/12987/1/be/src/runtime/io/data-cache.cc
File be/src/runtime/io/data-cache.cc:

http://gerrit.cloudera.org:8080/#/c/12987/1/be/src/runtime/io/data-cache.cc@79
PS1, Line 79: // Unlink the file so the disk space is recycled once the process exits.

> Thanks for the feedback. I also considered the approach of cleaning up the

In terms of preventing the "wiping" of a directory while a previous Impala process is still running, we can always use advisory locks; kudu::Env::Default()->LockFile() can do this for you pretty easily.

One question I don't know the answer to: there might be a performance advantage to operating on a deleted inode. It may be that XFS or ext4 has optimizations that kick in once a file is unlinked.
For example, normally every write to a file will update the file's mtime, which requires writing to the filesystem journal, etc. (I've often seen stacks blocked in file_update_time() waiting on a jbd2 lock in the kernel under heavy IO.) I'm not sure if this is optimized or not, but you could certainly imagine that, for an unlinked file, journaling would be skipped, since the goal is that on a crash we don't need to restore that file. It may also be that this optimization isn't really implemented in practice :) I looked through the kernel source for a bit to see if I could find such an optimization, but none was obviously present.

--
To view, visit http://gerrit.cloudera.org:8080/12987
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc
Gerrit-Change-Number: 12987
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: David Rorke <dro...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <stak...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <tmarsh...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-Comment-Date: Fri, 12 Apr 2019 16:00:21 +0000
Gerrit-HasComments: Yes