Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12987 )

Change subject: IMPALA-8341: Data cache for remote reads
......................................................................


Patch Set 1:

(1 comment)

> I think the _IO_ apportioned relative to capacity may be fine as that's a 
> reasonable expectation if a user specifies more capacity on a slower device. 
> One thing to consider may be to use multiple backing files for larger 
> partition to avoid per-file lock problem.

I think, unfortunately, the lower-capacity device is most likely to be the 
faster one (e.g. a small SSD plus big HDDs).

Maybe we can simplify this for now by just requiring that all partitions of the 
cache be allocated the same amount of space? The most immediate scenario I can 
see is on Amazon instances like r5d.4xlarge (two local SSDs with the same size) 
where you'd want to give the same amount of capacity to each cache drive anyway.

Put another way, I think we should constrain the configuration space in such a 
way that only good configs are possible, rather than hope that users don't do 
something like: /ssd/:100G,/hdd1:1TB,/hdd2:1TB,/hdd3:1TB and find that their 
SSD's fast IO performance is basically ignored.
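
To make the "only good configs are possible" idea concrete, here is a rough 
sketch of a startup check that rejects unequal partition sizes. The 
CachePartitionSpec struct and ValidateUniformCapacity() are hypothetical names 
for illustration only, not anything in the patch, and parsing of the 
"/dir:size" strings is assumed to have happened already.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical description of one cache partition (not Impala's actual type).
struct CachePartitionSpec {
  std::string dir;
  int64_t capacity_bytes;
};

// Returns true iff at least one partition is configured and every partition
// has the same positive capacity. On failure, writes a message to *error.
bool ValidateUniformCapacity(const std::vector<CachePartitionSpec>& parts,
                             std::string* error) {
  assert(error != nullptr);
  if (parts.empty()) {
    *error = "no cache partitions configured";
    return false;
  }
  const int64_t expected = parts.front().capacity_bytes;
  if (expected <= 0) {
    *error = "partition " + parts.front().dir + " has non-positive capacity";
    return false;
  }
  for (const CachePartitionSpec& p : parts) {
    if (p.capacity_bytes != expected) {
      *error = "partition " + p.dir + " has capacity " +
               std::to_string(p.capacity_bytes) + " but " +
               std::to_string(expected) + " was expected";
      return false;
    }
  }
  return true;
}
```

With a check like this, the mixed SSD/HDD config above would fail fast at 
startup instead of silently under-using the SSD.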

http://gerrit.cloudera.org:8080/#/c/12987/1/be/src/runtime/io/data-cache.cc
File be/src/runtime/io/data-cache.cc:

http://gerrit.cloudera.org:8080/#/c/12987/1/be/src/runtime/io/data-cache.cc@79
PS1, Line 79:   // Unlink the file so the disk space is recycled once the process exits.
> Thanks for the feedback. I also considered the approach of cleaning up the
In terms of preventing the "wiping" of a directory while an earlier Impala 
process is still running, we can always use advisory locks. 
kudu::Env::Default()->LockFile() can do this for you pretty easily.
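
As a rough illustration of the advisory-lock idea, here is a sketch that uses 
POSIX flock() directly instead of the Kudu Env wrapper (I'm not reproducing 
the LockFile() signature here). TryLockCacheDir() and the lock file path are 
hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>
#include <string>

// Tries to take an exclusive, non-blocking advisory lock on 'lock_path'
// (a hypothetical lock file living in the cache directory).
// Returns an fd that holds the lock, or -1 if the lock is already held
// elsewhere or the file could not be opened. Closing the fd releases the lock.
int TryLockCacheDir(const std::string& lock_path) {
  assert(!lock_path.empty());
  int fd = open(lock_path.c_str(), O_RDWR | O_CREAT | O_CLOEXEC, 0600);
  if (fd < 0) return -1;
  if (flock(fd, LOCK_EX | LOCK_NB) != 0) {
    close(fd);
    return -1;
  }
  return fd;
}
```

A starting process would only clean up the directory if it gets the lock, and 
would keep the fd open for its lifetime so the kernel drops the lock when the 
process exits or crashes.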

One question I don't know the answer to: there might be a performance 
advantage to operating on a deleted inode. It may be that XFS or ext4 has 
optimizations that kick in when a file is unlinked. For example, normally, 
every write to a file will update the file's mtime, which requires writing to 
the filesystem journal, etc. (I've often seen stacks blocked in 
file_update_time() waiting on a jbd2 lock in the kernel under heavy IO). I'm 
not sure if this is optimized or not, but you could certainly imagine that, for 
an unlinked file, journaling would be skipped, etc., since on a crash we don't 
need to restore that file. It may also be that this optimization isn't really 
implemented in practice :) I looked through the kernel source for a bit to see 
if I could find such an optimization, but didn't see one that was obviously 
present.
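
For reference, the open-then-unlink pattern under discussion looks roughly like 
the sketch below. OpenUnlinkedBackingFile() is a hypothetical helper, not the 
code in the patch, and it says nothing about whether the kernel actually skips 
any journaling for such a file.

```cpp
#include <cassert>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>
#include <string>

// Creates a new backing file at 'path' (hypothetical name) and immediately
// unlinks it. The returned fd stays usable for reads and writes, and the
// disk space is reclaimed once the last fd is closed or the process exits.
// Returns -1 if the file already exists or on any other error.
int OpenUnlinkedBackingFile(const std::string& path) {
  assert(!path.empty());
  int fd = open(path.c_str(), O_RDWR | O_CREAT | O_EXCL | O_CLOEXEC, 0600);
  if (fd < 0) return -1;
  if (unlink(path.c_str()) != 0) {
    close(fd);
    return -1;
  }
  return fd;
}
```

After this call, fstat() on the fd reports a link count of zero, which is the 
state any such filesystem optimization would have to key off.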



--
To view, visit http://gerrit.cloudera.org:8080/12987
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc
Gerrit-Change-Number: 12987
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: David Rorke <dro...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <stak...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <tmarsh...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-Comment-Date: Fri, 12 Apr 2019 16:00:21 +0000
Gerrit-HasComments: Yes
