Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-02-20 Thread Nicolai Stange
Linus Torvalds writes: > So in order to use it as a signal, first you have to first scrub the > cache (because if the page was already there, there's no signal at > all), and then for the signal to be as useful as possible, you're also > going to want to try to get out more than one bit of info

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-30 Thread Jiri Kosina
On Wed, 30 Jan 2019, Michal Hocko wrote: > > > I'm not sure why I'm the main recipient of that mail but answering > > > because I am -- let's get these patches in through the regular -mm > > > tree though > > > > *prod to mm maintainers* (at least for an opinion) > > Could you repost those pat

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-30 Thread Michal Hocko
On Wed 30-01-19 00:52:02, Jiri Kosina wrote: > On Mon, 28 Jan 2019, Dominique Martinet wrote: > > > > So, any objections to aproaching it this way? > > > > I'm not sure why I'm the main recipient of that mail but answering > > because I am -- let's get these patches in through the regular -mm tre

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-29 Thread Jiri Kosina
On Mon, 28 Jan 2019, Dominique Martinet wrote: > > So, any objections to aproaching it this way? > > I'm not sure why I'm the main recipient of that mail but answering > because I am -- let's get these patches in through the regular -mm tree > though *prod to mm maintainers* (at least for an opi

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-28 Thread Cyril Hrubis
Hi! > > Of course, there aren't any tests for RWF_NOWAIT in xfstests. Are there > > any in LTP? Just FYI I've send a patch with basic RWF_NOWAIT test for review to LTP ML and also CCed mailing lists from this thread. https://lkml.org/lkml/2019/1/28/416 -- Cyril Hrubis chru...@suse.cz

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-27 Thread Dominique Martinet
Jiri Kosina wrote on Sun, Jan 27, 2019: > So, any objections to aproaching it this way? I'm not sure why I'm the main recipient of that mail but answering because I am -- let's get these patches in through the regular -mm tree though -- Dominique

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-27 Thread Jiri Kosina
On Thu, 24 Jan 2019, Jiri Kosina wrote: > > Jiri, you've offered resubmitting the last two patches properly, can you > > incorporate this change or should I just send this directly? (I'd take > > most of your commit message and add your name somewhere) > > I've been running some basic smoke tes

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-24 Thread Jiri Kosina
On Thu, 24 Jan 2019, Dominique Martinet wrote: > Jiri, you've offered resubmitting the last two patches properly, can you > incorporate this change or should I just send this directly? (I'd take > most of your commit message and add your name somewhere) I've been running some basic smoke testin

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-24 Thread Dominique Martinet
Dominique Martinet wrote on Thu, Jan 24, 2019: > I was thinking of something along the lines of: > return vma_is_anonymous(vma) || (vma->vm_file && > (inode_owner_or_capable(file_inode(vma->vm_file)) >|| inode_permission(file_inode(vma->vm_file),

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-23 Thread Dominique Martinet
Linus Torvalds wrote on Thu, Jan 24, 2019: > I've reverted the 'let's try to just remove the code' part in my tree. > But I didn't apply the two other patches yet. Any final comments > before that should happen? I mentionned when sending the updated version that just checking file permission might

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-23 Thread Linus Torvalds
On Thu, Jan 24, 2019 at 12:12 PM Jiri Kosina wrote: > > > > > I think the "test vm_file" thing may be unnecessary, because a > > non-anonymous mapping should always have a file pointer and an inode. > > But I could imagine some odd case (vdso mapping, anyone?) that > > doesn't have a vm_file, but

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-23 Thread Jiri Kosina
On Thu, 24 Jan 2019, Linus Torvalds wrote: > Side note: the inode_permission() addition to can_do_mincore() in that > patch 0002, seems to be questionable. We do > > +static inline bool can_do_mincore(struct vm_area_struct *vma) > +{ > + return vma_is_anonymous(vma) > + || (vm

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-23 Thread Linus Torvalds
On Thu, Jan 24, 2019 at 9:27 AM Linus Torvalds wrote: > > I've reverted the 'let's try to just remove the code' part in my tree. > But I didn't apply the two other patches yet. Any final comments > before that should happen? Side note: the inode_permission() addition to can_do_mincore() in that p

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-23 Thread Linus Torvalds
On Thu, Jan 17, 2019 at 9:23 AM Jiri Kosina wrote: > > So I've done some basic smoke testing (~2 hours of LTP+xfstests) on the > kernel with the three topmost patches from > > > https://git.kernel.org/pub/scm/linux/kernel/git/jikos/jikos.git/log/?h=pagecache-sidechannel > > applied (also

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-18 Thread Vlastimil Babka
On 1/18/19 5:49 AM, Linus Torvalds wrote: > On Fri, Jan 18, 2019 at 9:45 AM Vlastimil Babka wrote: >> >> Or maybe we could resort to the 5.0-rc1 page table check (that is now being >> reverted) but only in cases when we are not allowed the page cache residency >> check? Or would that be needlessly

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-18 Thread Tejun Heo
Hello, Linus. On Sat, Jan 05, 2019 at 01:54:03PM -0800, Linus Torvalds wrote: > And the first hit is 'fincore', which probably nobody cares about > anyway, but it does > > fd = open (name, O_RDONLY) > .. > mmap(window, len, PROT_NONE, MAP_PRIVATE, .. So, folks here have been using fi

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-17 Thread Linus Torvalds
On Thu, Jan 17, 2019 at 4:51 PM Linus Torvalds wrote: > > On Thu, Jan 17, 2019 at 9:37 AM Matthew Wilcox wrote: > > > > Your patch 3/3 just removes the test. Am I right in thinking that it > > doesn't need to be *moved* because the existing test after !PageUptodate > > catches it? > > That's the

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-17 Thread Linus Torvalds
On Fri, Jan 18, 2019 at 9:45 AM Vlastimil Babka wrote: > > Or maybe we could resort to the 5.0-rc1 page table check (that is now being > reverted) but only in cases when we are not allowed the page cache residency > check? Or would that be needlessly complicated? I think it would be good fallbac

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-17 Thread Vlastimil Babka
On 1/16/2019 8:52 AM, Josh Snyder wrote: > On Tue, Jan 15, 2019 at 10:34 PM Dominique Martinet > wrote: >> >> There is a difference with your previous patch though, that used to list no >> page in core when it didn't know; this patch lists pages as in core when it >> refuses to tell. I don't think

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-17 Thread Dave Chinner
On Thu, Jan 17, 2019 at 09:18:41AM +0100, Jiri Kosina wrote: > On Thu, 17 Jan 2019, Dave Chinner wrote: > > > > > commit e837eac23662afae603aaaef7c94bc839c1b8f67 > > > > Author: Steve Lord > > > > Date: Mon Mar 5 16:47:52 2001 + > > > > > > > > Add bounds checking for direct I/O, do th

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-17 Thread Cyril Hrubis
Hi! > > Your patch 3/3 just removes the test. Am I right in thinking that it > > doesn't need to be *moved* because the existing test after !PageUptodate > > catches it? > > Exactly. It just initiates read-ahead for IOCB_NOWAIT cases as well, and > if it's actually set, it'll be handled by the !

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-17 Thread Jiri Kosina
On Thu, 17 Jan 2019, Dave Chinner wrote: > > > commit e837eac23662afae603aaaef7c94bc839c1b8f67 > > > Author: Steve Lord > > > Date: Mon Mar 5 16:47:52 2001 + > > > > > > Add bounds checking for direct I/O, do the cache invalidation for > > > data coherency on direct I/O. > > > > O

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-16 Thread Linus Torvalds
On Thu, Jan 17, 2019 at 9:37 AM Matthew Wilcox wrote: > > Your patch 3/3 just removes the test. Am I right in thinking that it > doesn't need to be *moved* because the existing test after !PageUptodate > catches it? That's the _hope_. That's the simplest patch I can come up with as a potential

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-16 Thread Dave Chinner
On Fri, Jan 11, 2019 at 08:36:55AM +0100, Jiri Kosina wrote: > On Thu, 10 Jan 2019, Dave Chinner wrote: > > > Sounds nice from a theoretical POV, but reality has taught us very > > different lessons. > > > > FWIW, a quick check of XFS's history so you understand how long this > > behaviour has

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-16 Thread Dominique Martinet
Jiri Kosina wrote on Wed, Jan 16, 2019: > So if noone sees any principal problem there, I'll happily submit it with > proper attribution etc. I'm not convinced just the write permission check is enough for mincore(), as Josh also seems to share the concern I raised (e.g. map a git directory "hot"

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-16 Thread Dave Chinner
On Wed, Jan 16, 2019 at 04:54:49PM +1200, Linus Torvalds wrote: > On Wed, Jan 16, 2019 at 11:45 AM Dave Chinner wrote: > > > > I'm assuming that you can invalidate the page cache reliably by a > > means that does not repeated require probing to detect invalidation > > has occurred. I've mentioned

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-16 Thread Jiri Kosina
On Wed, 16 Jan 2019, Matthew Wilcox wrote: > > On Thu, 17 Jan 2019, Linus Torvalds wrote: > > > As I suggested earlier in the thread, the fix for RWF_NOWAIT might be > > > to just move the test down to after readahead. > > Your patch 3/3 just removes the test. Am I right in thinking that it > do

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-16 Thread Matthew Wilcox
On Wed, Jan 16, 2019 at 09:23:04PM +0100, Jiri Kosina wrote: > On Thu, 17 Jan 2019, Linus Torvalds wrote: > > As I suggested earlier in the thread, the fix for RWF_NOWAIT might be > > to just move the test down to after readahead. Your patch 3/3 just removes the test. Am I right in thinking that

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-16 Thread Jiri Kosina
On Thu, 17 Jan 2019, Linus Torvalds wrote: > > So that seems to deal with mincore() in a reasonable way indeed. > > > > It doesn't unfortunately really solve the preadv2(RWF_NOWAIT), nor does it > > provide any good answer what to do about it, does it? > > As I suggested earlier in the thread, th

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-16 Thread Linus Torvalds
On Thu, Jan 17, 2019 at 4:12 AM Jiri Kosina wrote: > > So that seems to deal with mincore() in a reasonable way indeed. > > It doesn't unfortunately really solve the preadv2(RWF_NOWAIT), nor does it > provide any good answer what to do about it, does it? As I suggested earlier in the thread, the

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-16 Thread Jiri Kosina
On Wed, 16 Jan 2019, Linus Torvalds wrote: > > "Being owner or has cap" (whichever cap) is probably OK. On the other > > hand, writeability check makes more sense in general - could we > > somehow check if the user has write access to the file instead of > > checking if it currently is opened r

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-16 Thread Matthew Wilcox
On Wed, Jan 16, 2019 at 05:00:25PM +1200, Linus Torvalds wrote: > And if you're not the owner of the file, do you have another > suggestion for that "Yes, I have the right to see what's in-core for > this file". Because the problem is literally that if it's some random > read-only system file, the

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-16 Thread Kevin Easton
On Tue, Jan 15, 2019 at 11:52:25PM -0800, Josh Snyder wrote: > On Tue, Jan 15, 2019 at 10:34 PM Dominique Martinet > wrote: > > > > There is a difference with your previous patch though, that used to list no > > page in core when it didn't know; this patch lists pages as in core when it > > refuse

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-15 Thread Josh Snyder
On Tue, Jan 15, 2019 at 10:34 PM Dominique Martinet wrote: > > There is a difference with your previous patch though, that used to list no > page in core when it didn't know; this patch lists pages as in core when it > refuses to tell. I don't think that's very important, though. Is there a reaso

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-15 Thread Dominique Martinet
Linus Torvalds wrote on Wed, Jan 16, 2019: > Anybody willing to test the above patch instead? And replace the > >|| capable(CAP_SYS_ADMIN) > > check with something like > >|| inode_permission(inode, MAY_WRITE) == 0 > > instead? > > (This is obviously after you've reverted the "only che

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-15 Thread Linus Torvalds
On Wed, Jan 16, 2019 at 5:46 PM Dominique Martinet wrote: > > "Being owner or has cap" (whichever cap) is probably OK. > On the other hand, writeability check makes more sense in general - > could we somehow check if the user has write access to the file instead > of checking if it currently is op

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-15 Thread Linus Torvalds
On Wed, Jan 16, 2019 at 4:54 PM Linus Torvalds wrote: > > On Wed, Jan 16, 2019 at 11:45 AM Dave Chinner wrote: > > > > I'm assuming that you can invalidate the page cache reliably by a > > means that does not repeated require probing to detect invalidation > > has occurred. I've mentioned one met

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-15 Thread Dominique Martinet
Linus Torvalds wrote on Wed, Jan 16, 2019: > *Very* few people want to run their databases as root. In the case of happycache, this isn't the database doing the dump/restore, but a separate process that could have the cap - it's better if we can do without though, and from his readme he runs as us

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-15 Thread Linus Torvalds
On Wed, Jan 16, 2019 at 5:25 PM Andy Lutomirski wrote: > > Something like CAP_DAC_READ_SEARCH might not be crazy. I agree that it would work. In fact' it's what Jiri's patch basically did. Except Jiri used CAP_SYS_ADMIN instead. But that then basically limits it to root (or root-like with capabi

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-15 Thread Andy Lutomirski
> On Jan 15, 2019, at 9:00 PM, Linus Torvalds > wrote: > >> On Wed, Jan 16, 2019 at 12:42 PM Josh Snyder wrote: >> >> For Netflix, losing accurate information from the mincore syscall would >> lengthen database cluster maintenance operations from days to months. We >> rely on cross-process

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-15 Thread Linus Torvalds
On Wed, Jan 16, 2019 at 12:42 PM Josh Snyder wrote: > > For Netflix, losing accurate information from the mincore syscall would > lengthen database cluster maintenance operations from days to months. We > rely on cross-process mincore to migrate the contents of a page cache from > machine to mach

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-15 Thread Linus Torvalds
On Wed, Jan 16, 2019 at 11:45 AM Dave Chinner wrote: > > I'm assuming that you can invalidate the page cache reliably by a > means that does not repeated require probing to detect invalidation > has occurred. I've mentioned one method in this discussion > already... Yes. And it was made clear to

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-15 Thread Josh Snyder
Linus Torvalds wrote on Thu, Jan 10, 2019: > So right now, I consider the mincore change to be a "try to probe the > state of mincore users", and we haven't really gotten a lot of > information back yet. For Netflix, losing accurate information from the mincore syscall would lengthen database cl

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-15 Thread Dave Chinner
On Fri, Jan 11, 2019 at 08:26:14AM -0800, Linus Torvalds wrote: > On Thu, Jan 10, 2019 at 11:36 PM Dave Chinner wrote: > > > > > It's only that single page that *matters*. That's the page that the > > > probe reveals the status of - but it's also the page that the probe > > > then *changes* the st

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-11 Thread Linus Torvalds
On Thu, Jan 10, 2019 at 11:36 PM Dave Chinner wrote: > > > It's only that single page that *matters*. That's the page that the > > probe reveals the status of - but it's also the page that the probe > > then *changes* the status of. > > It changes the state of it /after/ we've already got the info

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-10 Thread Jiri Kosina
On Thu, 10 Jan 2019, Dave Chinner wrote: > Sounds nice from a theoretical POV, but reality has taught us very > different lessons. > > FWIW, a quick check of XFS's history so you understand how long this > behaviour has been around. It was introduced in the linux port in 2001 > as direct IO su

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-10 Thread Dave Chinner
On Thu, Jan 10, 2019 at 11:08:07PM -0800, Linus Torvalds wrote: > On Thu, Jan 10, 2019 at 8:04 PM Dave Chinner wrote: > > > > So it will only read the single page we tried to access and won't > > perturb the rest of the message encoded into subsequent pages in > > file. > > Dave, you're being int

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-10 Thread Dominique Martinet
Linus Torvalds wrote on Thu, Jan 10, 2019: > But those numbers aren't about the mincore() change. That's just from > dropping caches. > > Now, what's the difference with the mincore change, and without? Is it > actually measurable? > > Because that's all that matters: is the mincore change someth

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-10 Thread Dave Chinner
On Thu, Jan 10, 2019 at 08:08:37PM -0800, Andy Lutomirski wrote: > > On Jan 10, 2019, at 8:04 PM, Dave Chinner > > wrote: > > > >> On Thu, Jan 10, 2019 at 06:18:16PM -0800, Linus Torvalds > >> wrote: > >>> On Thu, Jan 10, 2019 at 6:03 PM Dave Chinner > >>> wrote: > >>> > On Thu, Jan 10, 20

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-10 Thread Linus Torvalds
On Thu, Jan 10, 2019 at 8:58 PM Dominique Martinet wrote: > > I get on average over a few queries approximately a real time of 350ms, > 230ms and 220ms immediately after drop cache and service restart, and > 150ms, 60ms and 60ms after a prefetch (hand-wavy average over 3 runs, I > didn't have the

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-10 Thread Linus Torvalds
On Thu, Jan 10, 2019 at 8:04 PM Dave Chinner wrote: > > So it will only read the single page we tried to access and won't > perturb the rest of the message encoded into subsequent pages in > file. Dave, you're being intentionally obtuse, aren't you? It's only that single page that *matters*. Tha

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-10 Thread Dominique Martinet
Linus Torvalds wrote on Thu, Jan 10, 2019: > On Thu, Jan 10, 2019 at 4:25 AM Dominique Martinet > wrote: > > Linus Torvalds wrote on Thu, Jan 10, 2019: > > > (Except, of course, if somebody actually notices outside of tests. > > > Which may well happen and just force us to revert that commit. But

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-10 Thread Andy Lutomirski
> On Jan 10, 2019, at 8:04 PM, Dave Chinner wrote: > >> On Thu, Jan 10, 2019 at 06:18:16PM -0800, Linus Torvalds wrote: >>> On Thu, Jan 10, 2019 at 6:03 PM Dave Chinner wrote: >>> On Thu, Jan 10, 2019 at 02:11:01PM -0800, Linus Torvalds wrote: And we *can* do sane things about RWF_

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-10 Thread Dave Chinner
On Thu, Jan 10, 2019 at 06:18:16PM -0800, Linus Torvalds wrote: > On Thu, Jan 10, 2019 at 6:03 PM Dave Chinner wrote: > > > > On Thu, Jan 10, 2019 at 02:11:01PM -0800, Linus Torvalds wrote: > > > And we *can* do sane things about RWF_NOWAIT. For example, we could > > > start async IO on RWF_NOWAIT

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-10 Thread Linus Torvalds
On Thu, Jan 10, 2019 at 6:03 PM Dave Chinner wrote: > > On Thu, Jan 10, 2019 at 02:11:01PM -0800, Linus Torvalds wrote: > > And we *can* do sane things about RWF_NOWAIT. For example, we could > > start async IO on RWF_NOWAIT, and suddenly it would go from "probe the > > page cache" to "probe and f

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-10 Thread Dave Chinner
On Thu, Jan 10, 2019 at 02:11:01PM -0800, Linus Torvalds wrote: > And we *can* do sane things about RWF_NOWAIT. For example, we could > start async IO on RWF_NOWAIT, and suddenly it would go from "probe the > page cache" to "probe and fill", and be much harder to use as an > attack vector.. We can

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-10 Thread Dave Chinner
On Wed, Jan 09, 2019 at 09:26:41PM -0800, Andy Lutomirski wrote: > Since direct IO has been brought up, I have a question. I've wondered > for years why direct IO works the way it does. If I were implementing > it from scratch, my first inclination would be to use the page cache > instead of figh

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-10 Thread Linus Torvalds
On Thu, Jan 10, 2019 at 4:25 AM Dominique Martinet wrote: > > Linus Torvalds wrote on Thu, Jan 10, 2019: > > (Except, of course, if somebody actually notices outside of tests. > > Which may well happen and just force us to revert that commit. But > > that's a separate issue entirely). > > Both Dav

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-10 Thread Linus Torvalds
On Thu, Jan 10, 2019 at 1:44 PM Dave Chinner wrote: > > GUP does page fault on user buffer which is a mmapped region of same > file. page fault sets up for buffered IO, tries to take rwsem for > write, deadlocks. > > Most of the schemes we come up with fall down at this point - you > can't hold a

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-10 Thread Dave Chinner
On Thu, Jan 10, 2019 at 06:47:11AM -0800, Matthew Wilcox wrote: > On Wed, Jan 09, 2019 at 09:26:41PM -0800, Andy Lutomirski wrote: > > Since direct IO has been brought up, I have a question. I've wondered > > for years why direct IO works the way it does. If I were implementing > > it from scratc

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-10 Thread Matthew Wilcox
On Thu, Jan 10, 2019 at 11:44:24AM +1100, Dave Chinner wrote: > And, really, this would be just another band-aid over a symptom of > the information leak - it doesn't prevent users from being able to > control page cache invalidation. It just removes one method, just > like hacking mincore only rem

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-10 Thread Matthew Wilcox
On Wed, Jan 09, 2019 at 09:26:41PM -0800, Andy Lutomirski wrote: > Since direct IO has been brought up, I have a question. I've wondered > for years why direct IO works the way it does. If I were implementing > it from scratch, my first inclination would be to use the page cache > instead of figh

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-10 Thread Dominique Martinet
Linus Torvalds wrote on Thu, Jan 10, 2019: > (Except, of course, if somebody actually notices outside of tests. > Which may well happen and just force us to revert that commit. But > that's a separate issue entirely). Both Dave and I pointed at a couple of utilities that break with this. nocache c

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-10 Thread Linus Torvalds
On Wed, Jan 9, 2019 at 11:04 PM Dave Chinner wrote: > > Sorry, what hacks did I just admit to making? This O_DIRECT > behaviour long predates me - I'm just the messenger and you are > shooting from the hip. Sure, sorry. I find this whole thing annoying. > Linus, the point I was making is that th

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-09 Thread Jiri Kosina
On Thu, 10 Jan 2019, Dave Chinner wrote: > > Yeah, preadv2(RWF_NOWAIT) is in the same teritory as mincore(), it has > > "just" been overlooked. I can't speak for Daniel, but I believe he might > > be ok with rephrasing the above as "Restricting mincore() and RWF_NOWAIT > > is sufficient ...". >

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-09 Thread Dave Chinner
On Wed, Jan 09, 2019 at 05:18:21PM -0800, Linus Torvalds wrote: > On Wed, Jan 9, 2019 at 4:44 PM Dave Chinner wrote: > > > > I wouldn't look at ext4 as an example of a reliable, problem free > > direct IO implementation because, historically speaking, it's been a > > series of nasty hacks (*cough*

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-09 Thread Andy Lutomirski
On Wed, Jan 9, 2019 at 5:18 PM Linus Torvalds wrote: > > On Wed, Jan 9, 2019 at 4:44 PM Dave Chinner wrote: > > > > I wouldn't look at ext4 as an example of a reliable, problem free > > direct IO implementation because, historically speaking, it's been a > > series of nasty hacks (*cough* mount -

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-09 Thread Linus Torvalds
On Wed, Jan 9, 2019 at 4:44 PM Dave Chinner wrote: > > I wouldn't look at ext4 as an example of a reliable, problem free > direct IO implementation because, historically speaking, it's been a > series of nasty hacks (*cough* mount -o dioread_nolock *cough*) and > been far worse than XFS from data

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-09 Thread Dave Chinner
On Wed, Jan 09, 2019 at 11:08:57AM +0100, Jiri Kosina wrote: > On Wed, 9 Jan 2019, Dave Chinner wrote: > > > FWIW, I just realised that the easiest, most reliable way to invalidate > > the page cache over a file range is simply to do a O_DIRECT read on it. > > Neat, good catch indeed. Still, it

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-09 Thread Dave Chinner
On Wed, Jan 09, 2019 at 10:25:43AM -0800, Linus Torvalds wrote: > On Tue, Jan 8, 2019 at 8:39 PM Dave Chinner wrote: > > > > FWIW, I just realised that the easiest, most reliable way to > > invalidate the page cache over a file range is simply to do a > > O_DIRECT read on it. > > If that's the ca

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-09 Thread Linus Torvalds
On Tue, Jan 8, 2019 at 8:39 PM Dave Chinner wrote: > > FWIW, I just realised that the easiest, most reliable way to > invalidate the page cache over a file range is simply to do a > O_DIRECT read on it. If that's the case, that's actually an O_DIRECT bug. It should only invalidate the caches on

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-09 Thread Jiri Kosina
On Wed, 9 Jan 2019, Dave Chinner wrote: > FWIW, I just realised that the easiest, most reliable way to invalidate > the page cache over a file range is simply to do a O_DIRECT read on it. Neat, good catch indeed. Still, it's only the invalidation part, but the residency check is the crucial on

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-08 Thread Dave Chinner
On Wed, Jan 09, 2019 at 03:31:35AM +0100, Jiri Kosina wrote: > On Wed, 9 Jan 2019, Dave Chinner wrote: > > > > But mincore is certainly the easiest interface, and the one that > > > doesn't require much effort or setup. > > > > Off the top of my head, here's a few vectors for reading the page > >

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-08 Thread Jiri Kosina
On Wed, 9 Jan 2019, Dave Chinner wrote: > > But mincore is certainly the easiest interface, and the one that > > doesn't require much effort or setup. > > Off the top of my head, here's a few vectors for reading the page > cache residency state without perturbing the page cache residency > patter

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-08 Thread Dave Chinner
On Tue, Jan 08, 2019 at 09:57:49AM -0800, Linus Torvalds wrote: > On Mon, Jan 7, 2019 at 8:43 PM Dave Chinner wrote: > > > > So, I read the paper and before I was half way through it I figured > > there are a bunch of other similar page cache invalidation attacks > > we can perform without needing

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-08 Thread Linus Torvalds
On Mon, Jan 7, 2019 at 8:43 PM Dave Chinner wrote: > > So, I read the paper and before I was half way through it I figured > there are a bunch of other similar page cache invalidation attacks > we can perform without needing mincore. i.e. Focussing on mmap() and > mincore() misses the wider issues

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-08 Thread Kirill A. Shutemov
On Tue, Jan 08, 2019 at 02:53:00PM +0100, Bernd Petrovitsch wrote: > On 08/01/2019 12:37, Jiri Kosina wrote: > > On Tue, 8 Jan 2019, Bernd Petrovitsch wrote: > > > >> Shouldn't the application use e.g. mlock()/ to guarantee no page > >> faults in the first place? > > > > Calling mincore() on

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-08 Thread Bernd Petrovitsch
On 08/01/2019 12:37, Jiri Kosina wrote: > On Tue, 8 Jan 2019, Bernd Petrovitsch wrote: > >> Shouldn't the application use e.g. mlock()/ to guarantee no page >> faults in the first place? > > Calling mincore() on pages you've just mlock()ed is sort of pointless > though. Obviously;-) Sorry

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-08 Thread Jiri Kosina
On Tue, 8 Jan 2019, Bernd Petrovitsch wrote: > Shouldn't the application use e.g. mlock()/ to guarantee no page > faults in the first place? Calling mincore() on pages you've just mlock()ed is sort of pointless though. -- Jiri Kosina SUSE Labs

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-08 Thread Bernd Petrovitsch
On 05/01/2019 20:38, Vlastimil Babka wrote: [...] > I was thinking about "return true" here, assuming that userspace generally > wants > to ensure itself there won't be page faults when it starts doing something > critical, and if it sees a "false" it will try to do some kind of prefaulting, > pos

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-08 Thread Kevin Easton
On Sat, Jan 05, 2019 at 01:54:03PM -0800, Linus Torvalds wrote: > On Sat, Jan 5, 2019 at 12:43 PM Jiri Kosina wrote: > > > > > Who actually _uses_ mincore()? That's probably the best guide to what > > > we should do. Maybe they open the file read-only even if they are the > > > owner, and we reall

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-07 Thread Dave Chinner
On Sun, Jan 06, 2019 at 01:46:37PM -0800, Linus Torvalds wrote: > On Sat, Jan 5, 2019 at 5:50 PM Linus Torvalds > wrote: > > > > Slightly updated patch in case somebody wants to try things out. > > I decided to just apply that patch. It is *not* marked for stable, > very intentionally, because I

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-07 Thread Daniel Gruss
On 1/7/19 12:08 PM, Dominique Martinet wrote: >> That's my bigger concern here. In [1] there's described a remote attack >> (on webserver) using the page fault timing differences for present/not >> present page cache pages. Noisy but works, and I expect locally it to be >> much less noisy. Yet the

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-07 Thread Vlastimil Babka
On 1/7/19 12:08 PM, Dominique Martinet wrote: > > With the current mincore change, it will think everything was "in core" > and not flush anything unless my option to just fadvise dontneed > everything is passed though ; so even if we can make it work it is a > change of behaviour that is breaking

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-07 Thread Dominique Martinet
Vlastimil Babka wrote on Mon, Jan 07, 2019: > On 1/7/19 5:32 AM, Dominique Martinet wrote: > > Linus Torvalds wrote on Sat, Jan 05, 2019: > >> But I think my patch to just rip out all that page lookup, and just > >> base it on the page table state has the fundamental advantage that it > >> gets rid

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-07 Thread Vlastimil Babka
On 1/7/19 5:32 AM, Dominique Martinet wrote: > Linus Torvalds wrote on Sat, Jan 05, 2019: >> But I think my patch to just rip out all that page lookup, and just >> base it on the page table state has the fundamental advantage that it >> gets rid of code. Maybe I should jst commit it, and see if any

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-07 Thread Michael Ellerman
Linus Torvalds writes: > On Sat, Jan 5, 2019 at 3:16 PM Linus Torvalds > wrote: >> >> It goes back to forever, it looks like. I can't find a reason. > > mincore() was originally added in 2.3.52pre3, it looks like. Around > 2000 or so. But sadly before the BK history. Yeah, it's here in the comm

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-06 Thread Dominique Martinet
Linus Torvalds wrote on Sat, Jan 05, 2019: > But I think my patch to just rip out all that page lookup, and just > base it on the page table state has the fundamental advantage that it > gets rid of code. Maybe I should jst commit it, and see if anything > breaks? We do have options in case things

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-06 Thread Linus Torvalds
On Sat, Jan 5, 2019 at 5:50 PM Linus Torvalds wrote: > > Slightly updated patch in case somebody wants to try things out. I decided to just apply that patch. It is *not* marked for stable, very intentionally, because I expect that we will need to wait and see if there are issues with it, and whet

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-06 Thread Kevin Easton
On Sat, Jan 05, 2019 at 01:54:03PM -0800, Linus Torvalds wrote: > On Sat, Jan 5, 2019 at 12:43 PM Jiri Kosina wrote: > > > > > Who actually _uses_ mincore()? That's probably the best guide to what > > > we should do. Maybe they open the file read-only even if they are the > > > owner, and we reall

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-05 Thread Linus Torvalds
On Sat, Jan 5, 2019 at 4:22 PM Linus Torvalds wrote: > > But I think my patch to just rip out all that page lookup, and just > base it on the page table state has the fundamental advantage that it > gets rid of code. Maybe I should jst commit it, and see if anything > breaks? We do have options in

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-05 Thread Linus Torvalds
On Sat, Jan 5, 2019 at 4:11 PM Matthew Wilcox wrote: > > FreeBSD claims to have a manpage from SunOS 4.1.3 with mincore (!) > > https://www.freebsd.org/cgi/man.cgi?query=mincore&apropos=0&sektion=0&manpath=SunOS+4.1.3&arch=default&format=html > > DESCRIPTION >mincore() returns the primar

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-05 Thread Matthew Wilcox
On Sat, Jan 05, 2019 at 03:39:10PM -0800, Linus Torvalds wrote: > On Sat, Jan 5, 2019 at 3:16 PM Linus Torvalds > wrote: > > > > It goes back to forever, it looks like. I can't find a reason. > > mincore() was originally added in 2.3.52pre3, it looks like. Around > 2000 or so. But sadly before th

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-05 Thread Linus Torvalds
On Sat, Jan 5, 2019 at 3:16 PM Linus Torvalds wrote: > > It goes back to forever, it looks like. I can't find a reason. mincore() was originally added in 2.3.52pre3, it looks like. Around 2000 or so. But sadly before the BK history. And that comment about "Later we can get more picky about wh

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-05 Thread Linus Torvalds
On Sat, Jan 5, 2019 at 3:16 PM Linus Torvalds wrote: > > It goes back to forever, it looks like. I can't find a reason. Our man-pages talk abouit the "without doing IO" part. That may be the result of our code, though, not the reason for it. The BSD man-page has other flags, but doesn't describe

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-05 Thread Linus Torvalds
On Sat, Jan 5, 2019 at 3:05 PM Linus Torvalds wrote: > > That would be nicer than my patch, simply because removing code is > always nice. And arguably it's a better semantic anyway. Yeah, I wonder why we did that thing where mincore() walks the page tables, but if they are empty it looks in the

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-05 Thread Jiri Kosina
On Sat, 5 Jan 2019, Jann Horn wrote: > > Provide vm.mincore_privileged sysctl, which makes it possible to mincore() > > start returning -EPERM in case it's invoked by a process lacking > > CAP_SYS_ADMIN. > > > > The default behavior stays "mincore() can be used by anybody" in order to > > be conse

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-05 Thread Linus Torvalds
On Sat, Jan 5, 2019 at 2:55 PM Jann Horn wrote: > > Just checking: I guess /proc/$pid/pagemap (iow, the pagemap_read() > handler) is less problematic because it only returns data about the > state of page tables, and doesn't query the address_space? In other > words, it permits monitoring eviction

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-05 Thread Jann Horn
On Sat, Jan 5, 2019 at 6:27 PM Jiri Kosina wrote: > There are possibilities [1] how mincore() could be used as a converyor of > a sidechannel information about pagecache metadata. > > Provide vm.mincore_privileged sysctl, which makes it possible to mincore() > start returning -EPERM in case it's i

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-05 Thread Linus Torvalds
On Sat, Jan 5, 2019 at 12:43 PM Jiri Kosina wrote: > > > Who actually _uses_ mincore()? That's probably the best guide to what > > we should do. Maybe they open the file read-only even if they are the > > owner, and we really should look at file ownership instead. > > Yeah, well > > https:

  1   2   >