Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-02-20 Thread Nicolai Stange
Linus Torvalds writes: > So in order to use it as a signal, first you have to first scrub the > cache (because if the page was already there, there's no signal at > all), and then for the signal to be as useful as possible, you're also > going to want to try to get out more than one bit of

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-30 Thread Jiri Kosina
On Wed, 30 Jan 2019, Michal Hocko wrote: > > > I'm not sure why I'm the main recipient of that mail but answering > > > because I am -- let's get these patches in through the regular -mm > > > tree though > > > > *prod to mm maintainers* (at least for an opinion) > > Could you repost those

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-30 Thread Michal Hocko
On Wed 30-01-19 00:52:02, Jiri Kosina wrote: > On Mon, 28 Jan 2019, Dominique Martinet wrote: > > > > So, any objections to aproaching it this way? > > > > I'm not sure why I'm the main recipient of that mail but answering > > because I am -- let's get these patches in through the regular -mm

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-29 Thread Jiri Kosina
On Mon, 28 Jan 2019, Dominique Martinet wrote: > > So, any objections to aproaching it this way? > > I'm not sure why I'm the main recipient of that mail but answering > because I am -- let's get these patches in through the regular -mm tree > though *prod to mm maintainers* (at least for an

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-28 Thread Cyril Hrubis
Hi! > > Of course, there aren't any tests for RWF_NOWAIT in xfstests. Are there > > any in LTP? Just FYI I've send a patch with basic RWF_NOWAIT test for review to LTP ML and also CCed mailing lists from this thread. https://lkml.org/lkml/2019/1/28/416 -- Cyril Hrubis chru...@suse.cz

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-27 Thread Dominique Martinet
Jiri Kosina wrote on Sun, Jan 27, 2019: > So, any objections to aproaching it this way? I'm not sure why I'm the main recipient of that mail but answering because I am -- let's get these patches in through the regular -mm tree though -- Dominique

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-27 Thread Jiri Kosina
On Thu, 24 Jan 2019, Jiri Kosina wrote: > > Jiri, you've offered resubmitting the last two patches properly, can you > > incorporate this change or should I just send this directly? (I'd take > > most of your commit message and add your name somewhere) > > I've been running some basic smoke

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-24 Thread Jiri Kosina
On Thu, 24 Jan 2019, Dominique Martinet wrote: > Jiri, you've offered resubmitting the last two patches properly, can you > incorporate this change or should I just send this directly? (I'd take > most of your commit message and add your name somewhere) I've been running some basic smoke

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-24 Thread Dominique Martinet
Dominique Martinet wrote on Thu, Jan 24, 2019: > I was thinking of something along the lines of: > return vma_is_anonymous(vma) || (vma->vm_file && > (inode_owner_or_capable(file_inode(vma->vm_file)) >|| inode_permission(file_inode(vma->vm_file),

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-23 Thread Dominique Martinet
Linus Torvalds wrote on Thu, Jan 24, 2019: > I've reverted the 'let's try to just remove the code' part in my tree. > But I didn't apply the two other patches yet. Any final comments > before that should happen? I mentionned when sending the updated version that just checking file permission

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-23 Thread Linus Torvalds
On Thu, Jan 24, 2019 at 12:12 PM Jiri Kosina wrote: > > > > > I think the "test vm_file" thing may be unnecessary, because a > > non-anonymous mapping should always have a file pointer and an inode. > > But I could imagine some odd case (vdso mapping, anyone?) that > > doesn't have a vm_file,

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-23 Thread Jiri Kosina
On Thu, 24 Jan 2019, Linus Torvalds wrote: > Side note: the inode_permission() addition to can_do_mincore() in that > patch 0002, seems to be questionable. We do > > +static inline bool can_do_mincore(struct vm_area_struct *vma) > +{ > + return vma_is_anonymous(vma) > + ||

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-23 Thread Linus Torvalds
On Thu, Jan 24, 2019 at 9:27 AM Linus Torvalds wrote: > > I've reverted the 'let's try to just remove the code' part in my tree. > But I didn't apply the two other patches yet. Any final comments > before that should happen? Side note: the inode_permission() addition to can_do_mincore() in that

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-23 Thread Linus Torvalds
On Thu, Jan 17, 2019 at 9:23 AM Jiri Kosina wrote: > > So I've done some basic smoke testing (~2 hours of LTP+xfstests) on the > kernel with the three topmost patches from > > > https://git.kernel.org/pub/scm/linux/kernel/git/jikos/jikos.git/log/?h=pagecache-sidechannel > > applied (also

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-18 Thread Vlastimil Babka
On 1/18/19 5:49 AM, Linus Torvalds wrote: > On Fri, Jan 18, 2019 at 9:45 AM Vlastimil Babka wrote: >> >> Or maybe we could resort to the 5.0-rc1 page table check (that is now being >> reverted) but only in cases when we are not allowed the page cache residency >> check? Or would that be

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-18 Thread Tejun Heo
Hello, Linus. On Sat, Jan 05, 2019 at 01:54:03PM -0800, Linus Torvalds wrote: > And the first hit is 'fincore', which probably nobody cares about > anyway, but it does > > fd = open (name, O_RDONLY) > .. > mmap(window, len, PROT_NONE, MAP_PRIVATE, .. So, folks here have been using

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-17 Thread Linus Torvalds
On Thu, Jan 17, 2019 at 4:51 PM Linus Torvalds wrote: > > On Thu, Jan 17, 2019 at 9:37 AM Matthew Wilcox wrote: > > > > Your patch 3/3 just removes the test. Am I right in thinking that it > > doesn't need to be *moved* because the existing test after !PageUptodate > > catches it? > > That's

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-17 Thread Linus Torvalds
On Fri, Jan 18, 2019 at 9:45 AM Vlastimil Babka wrote: > > Or maybe we could resort to the 5.0-rc1 page table check (that is now being > reverted) but only in cases when we are not allowed the page cache residency > check? Or would that be needlessly complicated? I think it would be good

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-17 Thread Vlastimil Babka
On 1/16/2019 8:52 AM, Josh Snyder wrote: > On Tue, Jan 15, 2019 at 10:34 PM Dominique Martinet > wrote: >> >> There is a difference with your previous patch though, that used to list no >> page in core when it didn't know; this patch lists pages as in core when it >> refuses to tell. I don't

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-17 Thread Dave Chinner
On Thu, Jan 17, 2019 at 09:18:41AM +0100, Jiri Kosina wrote: > On Thu, 17 Jan 2019, Dave Chinner wrote: > > > > > commit e837eac23662afae603aaaef7c94bc839c1b8f67 > > > > Author: Steve Lord > > > > Date: Mon Mar 5 16:47:52 2001 + > > > > > > > > Add bounds checking for direct I/O, do

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-17 Thread Cyril Hrubis
Hi! > > Your patch 3/3 just removes the test. Am I right in thinking that it > > doesn't need to be *moved* because the existing test after !PageUptodate > > catches it? > > Exactly. It just initiates read-ahead for IOCB_NOWAIT cases as well, and > if it's actually set, it'll be handled by the

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-17 Thread Jiri Kosina
On Thu, 17 Jan 2019, Dave Chinner wrote: > > > commit e837eac23662afae603aaaef7c94bc839c1b8f67 > > > Author: Steve Lord > > > Date: Mon Mar 5 16:47:52 2001 + > > > > > > Add bounds checking for direct I/O, do the cache invalidation for > > > data coherency on direct I/O. > > > >

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-16 Thread Linus Torvalds
On Thu, Jan 17, 2019 at 9:37 AM Matthew Wilcox wrote: > > Your patch 3/3 just removes the test. Am I right in thinking that it > doesn't need to be *moved* because the existing test after !PageUptodate > catches it? That's the _hope_. That's the simplest patch I can come up with as a potential

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-16 Thread Dave Chinner
On Fri, Jan 11, 2019 at 08:36:55AM +0100, Jiri Kosina wrote: > On Thu, 10 Jan 2019, Dave Chinner wrote: > > > Sounds nice from a theoretical POV, but reality has taught us very > > different lessons. > > > > FWIW, a quick check of XFS's history so you understand how long this > > behaviour has

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-16 Thread Dominique Martinet
Jiri Kosina wrote on Wed, Jan 16, 2019: > So if noone sees any principal problem there, I'll happily submit it with > proper attribution etc. I'm not convinced just the write permission check is enough for mincore(), as Josh also seems to share the concern I raised (e.g. map a git directory

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-16 Thread Dave Chinner
On Wed, Jan 16, 2019 at 04:54:49PM +1200, Linus Torvalds wrote: > On Wed, Jan 16, 2019 at 11:45 AM Dave Chinner wrote: > > > > I'm assuming that you can invalidate the page cache reliably by a > > means that does not repeated require probing to detect invalidation > > has occurred. I've mentioned

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-16 Thread Jiri Kosina
On Wed, 16 Jan 2019, Matthew Wilcox wrote: > > On Thu, 17 Jan 2019, Linus Torvalds wrote: > > > As I suggested earlier in the thread, the fix for RWF_NOWAIT might be > > > to just move the test down to after readahead. > > Your patch 3/3 just removes the test. Am I right in thinking that it >

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-16 Thread Matthew Wilcox
On Wed, Jan 16, 2019 at 09:23:04PM +0100, Jiri Kosina wrote: > On Thu, 17 Jan 2019, Linus Torvalds wrote: > > As I suggested earlier in the thread, the fix for RWF_NOWAIT might be > > to just move the test down to after readahead. Your patch 3/3 just removes the test. Am I right in thinking that

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-16 Thread Jiri Kosina
On Thu, 17 Jan 2019, Linus Torvalds wrote: > > So that seems to deal with mincore() in a reasonable way indeed. > > > > It doesn't unfortunately really solve the preadv2(RWF_NOWAIT), nor does it > > provide any good answer what to do about it, does it? > > As I suggested earlier in the thread,

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-16 Thread Linus Torvalds
On Thu, Jan 17, 2019 at 4:12 AM Jiri Kosina wrote: > > So that seems to deal with mincore() in a reasonable way indeed. > > It doesn't unfortunately really solve the preadv2(RWF_NOWAIT), nor does it > provide any good answer what to do about it, does it? As I suggested earlier in the thread, the

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-16 Thread Jiri Kosina
On Wed, 16 Jan 2019, Linus Torvalds wrote: > > "Being owner or has cap" (whichever cap) is probably OK. On the other > > hand, writeability check makes more sense in general - could we > > somehow check if the user has write access to the file instead of > > checking if it currently is opened

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-16 Thread Matthew Wilcox
On Wed, Jan 16, 2019 at 05:00:25PM +1200, Linus Torvalds wrote: > And if you're not the owner of the file, do you have another > suggestion for that "Yes, I have the right to see what's in-core for > this file". Because the problem is literally that if it's some random > read-only system file, the

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-16 Thread Kevin Easton
On Tue, Jan 15, 2019 at 11:52:25PM -0800, Josh Snyder wrote: > On Tue, Jan 15, 2019 at 10:34 PM Dominique Martinet > wrote: > > > > There is a difference with your previous patch though, that used to list no > > page in core when it didn't know; this patch lists pages as in core when it > >

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-15 Thread Josh Snyder
On Tue, Jan 15, 2019 at 10:34 PM Dominique Martinet wrote: > > There is a difference with your previous patch though, that used to list no > page in core when it didn't know; this patch lists pages as in core when it > refuses to tell. I don't think that's very important, though. Is there a

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-15 Thread Dominique Martinet
Linus Torvalds wrote on Wed, Jan 16, 2019: > Anybody willing to test the above patch instead? And replace the > >|| capable(CAP_SYS_ADMIN) > > check with something like > >|| inode_permission(inode, MAY_WRITE) == 0 > > instead? > > (This is obviously after you've reverted the "only

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-15 Thread Linus Torvalds
On Wed, Jan 16, 2019 at 5:46 PM Dominique Martinet wrote: > > "Being owner or has cap" (whichever cap) is probably OK. > On the other hand, writeability check makes more sense in general - > could we somehow check if the user has write access to the file instead > of checking if it currently is

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-15 Thread Linus Torvalds
On Wed, Jan 16, 2019 at 4:54 PM Linus Torvalds wrote: > > On Wed, Jan 16, 2019 at 11:45 AM Dave Chinner wrote: > > > > I'm assuming that you can invalidate the page cache reliably by a > > means that does not repeated require probing to detect invalidation > > has occurred. I've mentioned one

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-15 Thread Dominique Martinet
Linus Torvalds wrote on Wed, Jan 16, 2019: > *Very* few people want to run their databases as root. In the case of happycache, this isn't the database doing the dump/restore, but a separate process that could have the cap - it's better if we can do without though, and from his readme he runs as

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-15 Thread Linus Torvalds
On Wed, Jan 16, 2019 at 5:25 PM Andy Lutomirski wrote: > > Something like CAP_DAC_READ_SEARCH might not be crazy. I agree that it would work. In fact' it's what Jiri's patch basically did. Except Jiri used CAP_SYS_ADMIN instead. But that then basically limits it to root (or root-like with

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-15 Thread Andy Lutomirski
> On Jan 15, 2019, at 9:00 PM, Linus Torvalds > wrote: > >> On Wed, Jan 16, 2019 at 12:42 PM Josh Snyder wrote: >> >> For Netflix, losing accurate information from the mincore syscall would >> lengthen database cluster maintenance operations from days to months. We >> rely on

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-15 Thread Linus Torvalds
On Wed, Jan 16, 2019 at 12:42 PM Josh Snyder wrote: > > For Netflix, losing accurate information from the mincore syscall would > lengthen database cluster maintenance operations from days to months. We > rely on cross-process mincore to migrate the contents of a page cache from > machine to

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-15 Thread Linus Torvalds
On Wed, Jan 16, 2019 at 11:45 AM Dave Chinner wrote: > > I'm assuming that you can invalidate the page cache reliably by a > means that does not repeated require probing to detect invalidation > has occurred. I've mentioned one method in this discussion > already... Yes. And it was made clear to

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-15 Thread Josh Snyder
Linus Torvalds wrote on Thu, Jan 10, 2019: > So right now, I consider the mincore change to be a "try to probe the > state of mincore users", and we haven't really gotten a lot of > information back yet. For Netflix, losing accurate information from the mincore syscall would lengthen database

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-15 Thread Dave Chinner
On Fri, Jan 11, 2019 at 08:26:14AM -0800, Linus Torvalds wrote: > On Thu, Jan 10, 2019 at 11:36 PM Dave Chinner wrote: > > > > > It's only that single page that *matters*. That's the page that the > > > probe reveals the status of - but it's also the page that the probe > > > then *changes* the

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-11 Thread Linus Torvalds
On Thu, Jan 10, 2019 at 11:36 PM Dave Chinner wrote: > > > It's only that single page that *matters*. That's the page that the > > probe reveals the status of - but it's also the page that the probe > > then *changes* the status of. > > It changes the state of it /after/ we've already got the

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-10 Thread Jiri Kosina
On Thu, 10 Jan 2019, Dave Chinner wrote: > Sounds nice from a theoretical POV, but reality has taught us very > different lessons. > > FWIW, a quick check of XFS's history so you understand how long this > behaviour has been around. It was introduced in the linux port in 2001 > as direct IO

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-10 Thread Dave Chinner
On Thu, Jan 10, 2019 at 11:08:07PM -0800, Linus Torvalds wrote: > On Thu, Jan 10, 2019 at 8:04 PM Dave Chinner wrote: > > > > So it will only read the single page we tried to access and won't > > perturb the rest of the message encoded into subsequent pages in > > file. > > Dave, you're being

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-10 Thread Dominique Martinet
Linus Torvalds wrote on Thu, Jan 10, 2019: > But those numbers aren't about the mincore() change. That's just from > dropping caches. > > Now, what's the difference with the mincore change, and without? Is it > actually measurable? > > Because that's all that matters: is the mincore change

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-10 Thread Dave Chinner
On Thu, Jan 10, 2019 at 08:08:37PM -0800, Andy Lutomirski wrote: > > On Jan 10, 2019, at 8:04 PM, Dave Chinner > > wrote: > > > >> On Thu, Jan 10, 2019 at 06:18:16PM -0800, Linus Torvalds > >> wrote: > >>> On Thu, Jan 10, 2019 at 6:03 PM Dave Chinner > >>> wrote: > >>> > On Thu, Jan 10,

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-10 Thread Linus Torvalds
On Thu, Jan 10, 2019 at 8:58 PM Dominique Martinet wrote: > > I get on average over a few queries approximately a real time of 350ms, > 230ms and 220ms immediately after drop cache and service restart, and > 150ms, 60ms and 60ms after a prefetch (hand-wavy average over 3 runs, I > didn't have the

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-10 Thread Linus Torvalds
On Thu, Jan 10, 2019 at 8:04 PM Dave Chinner wrote: > > So it will only read the single page we tried to access and won't > perturb the rest of the message encoded into subsequent pages in > file. Dave, you're being intentionally obtuse, aren't you? It's only that single page that *matters*.

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-10 Thread Dominique Martinet
Linus Torvalds wrote on Thu, Jan 10, 2019: > On Thu, Jan 10, 2019 at 4:25 AM Dominique Martinet > wrote: > > Linus Torvalds wrote on Thu, Jan 10, 2019: > > > (Except, of course, if somebody actually notices outside of tests. > > > Which may well happen and just force us to revert that commit. But

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-10 Thread Andy Lutomirski
> On Jan 10, 2019, at 8:04 PM, Dave Chinner wrote: > >> On Thu, Jan 10, 2019 at 06:18:16PM -0800, Linus Torvalds wrote: >>> On Thu, Jan 10, 2019 at 6:03 PM Dave Chinner wrote: >>> On Thu, Jan 10, 2019 at 02:11:01PM -0800, Linus Torvalds wrote: And we *can* do sane things about

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-10 Thread Dave Chinner
On Thu, Jan 10, 2019 at 06:18:16PM -0800, Linus Torvalds wrote: > On Thu, Jan 10, 2019 at 6:03 PM Dave Chinner wrote: > > > > On Thu, Jan 10, 2019 at 02:11:01PM -0800, Linus Torvalds wrote: > > > And we *can* do sane things about RWF_NOWAIT. For example, we could > > > start async IO on

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-10 Thread Linus Torvalds
On Thu, Jan 10, 2019 at 6:03 PM Dave Chinner wrote: > > On Thu, Jan 10, 2019 at 02:11:01PM -0800, Linus Torvalds wrote: > > And we *can* do sane things about RWF_NOWAIT. For example, we could > > start async IO on RWF_NOWAIT, and suddenly it would go from "probe the > > page cache" to "probe and

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-10 Thread Dave Chinner
On Thu, Jan 10, 2019 at 02:11:01PM -0800, Linus Torvalds wrote: > And we *can* do sane things about RWF_NOWAIT. For example, we could > start async IO on RWF_NOWAIT, and suddenly it would go from "probe the > page cache" to "probe and fill", and be much harder to use as an > attack vector.. We

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-10 Thread Dave Chinner
On Wed, Jan 09, 2019 at 09:26:41PM -0800, Andy Lutomirski wrote: > Since direct IO has been brought up, I have a question. I've wondered > for years why direct IO works the way it does. If I were implementing > it from scratch, my first inclination would be to use the page cache > instead of

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-10 Thread Linus Torvalds
On Thu, Jan 10, 2019 at 4:25 AM Dominique Martinet wrote: > > Linus Torvalds wrote on Thu, Jan 10, 2019: > > (Except, of course, if somebody actually notices outside of tests. > > Which may well happen and just force us to revert that commit. But > > that's a separate issue entirely). > > Both

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-10 Thread Linus Torvalds
On Thu, Jan 10, 2019 at 1:44 PM Dave Chinner wrote: > > GUP does page fault on user buffer which is a mmapped region of same > file. page fault sets up for buffered IO, tries to take rwsem for > write, deadlocks. > > Most of the schemes we come up with fall down at this point - you > can't hold a

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-10 Thread Dave Chinner
On Thu, Jan 10, 2019 at 06:47:11AM -0800, Matthew Wilcox wrote: > On Wed, Jan 09, 2019 at 09:26:41PM -0800, Andy Lutomirski wrote: > > Since direct IO has been brought up, I have a question. I've wondered > > for years why direct IO works the way it does. If I were implementing > > it from

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-10 Thread Matthew Wilcox
On Thu, Jan 10, 2019 at 11:44:24AM +1100, Dave Chinner wrote: > And, really, this would be just another band-aid over a symptom of > the information leak - it doesn't prevent users from being able to > control page cache invalidation. It just removes one method, just > like hacking mincore only

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-10 Thread Matthew Wilcox
On Wed, Jan 09, 2019 at 09:26:41PM -0800, Andy Lutomirski wrote: > Since direct IO has been brought up, I have a question. I've wondered > for years why direct IO works the way it does. If I were implementing > it from scratch, my first inclination would be to use the page cache > instead of

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-10 Thread Dominique Martinet
Linus Torvalds wrote on Thu, Jan 10, 2019: > (Except, of course, if somebody actually notices outside of tests. > Which may well happen and just force us to revert that commit. But > that's a separate issue entirely). Both Dave and I pointed at a couple of utilities that break with this. nocache

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-10 Thread Linus Torvalds
On Wed, Jan 9, 2019 at 11:04 PM Dave Chinner wrote: > > Sorry, what hacks did I just admit to making? This O_DIRECT > behaviour long predates me - I'm just the messenger and you are > shooting from the hip. Sure, sorry. I find this whole thing annoying. > Linus, the point I was making is that

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-09 Thread Jiri Kosina
On Thu, 10 Jan 2019, Dave Chinner wrote: > > Yeah, preadv2(RWF_NOWAIT) is in the same teritory as mincore(), it has > > "just" been overlooked. I can't speak for Daniel, but I believe he might > > be ok with rephrasing the above as "Restricting mincore() and RWF_NOWAIT > > is sufficient ...".

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-09 Thread Dave Chinner
On Wed, Jan 09, 2019 at 05:18:21PM -0800, Linus Torvalds wrote: > On Wed, Jan 9, 2019 at 4:44 PM Dave Chinner wrote: > > > > I wouldn't look at ext4 as an example of a reliable, problem free > > direct IO implementation because, historically speaking, it's been a > > series of nasty hacks

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-09 Thread Andy Lutomirski
On Wed, Jan 9, 2019 at 5:18 PM Linus Torvalds wrote: > > On Wed, Jan 9, 2019 at 4:44 PM Dave Chinner wrote: > > > > I wouldn't look at ext4 as an example of a reliable, problem free > > direct IO implementation because, historically speaking, it's been a > > series of nasty hacks (*cough* mount

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-09 Thread Linus Torvalds
On Wed, Jan 9, 2019 at 4:44 PM Dave Chinner wrote: > > I wouldn't look at ext4 as an example of a reliable, problem free > direct IO implementation because, historically speaking, it's been a > series of nasty hacks (*cough* mount -o dioread_nolock *cough*) and > been far worse than XFS from data

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-09 Thread Dave Chinner
On Wed, Jan 09, 2019 at 11:08:57AM +0100, Jiri Kosina wrote: > On Wed, 9 Jan 2019, Dave Chinner wrote: > > > FWIW, I just realised that the easiest, most reliable way to invalidate > > the page cache over a file range is simply to do a O_DIRECT read on it. > > Neat, good catch indeed. Still,

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-09 Thread Dave Chinner
On Wed, Jan 09, 2019 at 10:25:43AM -0800, Linus Torvalds wrote: > On Tue, Jan 8, 2019 at 8:39 PM Dave Chinner wrote: > > > > FWIW, I just realised that the easiest, most reliable way to > > invalidate the page cache over a file range is simply to do a > > O_DIRECT read on it. > > If that's the

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-09 Thread Linus Torvalds
On Tue, Jan 8, 2019 at 8:39 PM Dave Chinner wrote: > > FWIW, I just realised that the easiest, most reliable way to > invalidate the page cache over a file range is simply to do a > O_DIRECT read on it. If that's the case, that's actually an O_DIRECT bug. It should only invalidate the caches on

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-09 Thread Jiri Kosina
On Wed, 9 Jan 2019, Dave Chinner wrote: > FWIW, I just realised that the easiest, most reliable way to invalidate > the page cache over a file range is simply to do a O_DIRECT read on it. Neat, good catch indeed. Still, it's only the invalidation part, but the residency check is the crucial

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-08 Thread Dave Chinner
On Wed, Jan 09, 2019 at 03:31:35AM +0100, Jiri Kosina wrote: > On Wed, 9 Jan 2019, Dave Chinner wrote: > > > > But mincore is certainly the easiest interface, and the one that > > > doesn't require much effort or setup. > > > > Off the top of my head, here's a few vectors for reading the page >

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-08 Thread Jiri Kosina
On Wed, 9 Jan 2019, Dave Chinner wrote: > > But mincore is certainly the easiest interface, and the one that > > doesn't require much effort or setup. > > Off the top of my head, here's a few vectors for reading the page > cache residency state without perturbing the page cache residency >

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-08 Thread Dave Chinner
On Tue, Jan 08, 2019 at 09:57:49AM -0800, Linus Torvalds wrote: > On Mon, Jan 7, 2019 at 8:43 PM Dave Chinner wrote: > > > > So, I read the paper and before I was half way through it I figured > > there are a bunch of other similar page cache invalidation attacks > > we can perform without

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-08 Thread Linus Torvalds
On Mon, Jan 7, 2019 at 8:43 PM Dave Chinner wrote: > > So, I read the paper and before I was half way through it I figured > there are a bunch of other similar page cache invalidation attacks > we can perform without needing mincore. i.e. Focussing on mmap() and > mincore() misses the wider

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-08 Thread Kirill A. Shutemov
On Tue, Jan 08, 2019 at 02:53:00PM +0100, Bernd Petrovitsch wrote: > On 08/01/2019 12:37, Jiri Kosina wrote: > > On Tue, 8 Jan 2019, Bernd Petrovitsch wrote: > > > >> Shouldn't the application use e.g. mlock()/ to guarantee no page > >> faults in the first place? > > > > Calling mincore()

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-08 Thread Bernd Petrovitsch
On 08/01/2019 12:37, Jiri Kosina wrote: > On Tue, 8 Jan 2019, Bernd Petrovitsch wrote: > >> Shouldn't the application use e.g. mlock()/ to guarantee no page >> faults in the first place? > > Calling mincore() on pages you've just mlock()ed is sort of pointless > though. Obviously;-)

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-08 Thread Jiri Kosina
On Tue, 8 Jan 2019, Bernd Petrovitsch wrote: > Shouldn't the application use e.g. mlock()/ to guarantee no page > faults in the first place? Calling mincore() on pages you've just mlock()ed is sort of pointless though. -- Jiri Kosina SUSE Labs

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-08 Thread Bernd Petrovitsch
On 05/01/2019 20:38, Vlastimil Babka wrote: [...] > I was thinking about "return true" here, assuming that userspace generally > wants > to ensure itself there won't be page faults when it starts doing something > critical, and if it sees a "false" it will try to do some kind of prefaulting, >

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-08 Thread Kevin Easton
On Sat, Jan 05, 2019 at 01:54:03PM -0800, Linus Torvalds wrote: > On Sat, Jan 5, 2019 at 12:43 PM Jiri Kosina wrote: > > > > > Who actually _uses_ mincore()? That's probably the best guide to what > > > we should do. Maybe they open the file read-only even if they are the > > > owner, and we

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-07 Thread Dave Chinner
On Sun, Jan 06, 2019 at 01:46:37PM -0800, Linus Torvalds wrote: > On Sat, Jan 5, 2019 at 5:50 PM Linus Torvalds > wrote: > > > > Slightly updated patch in case somebody wants to try things out. > > I decided to just apply that patch. It is *not* marked for stable, > very intentionally, because I

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-07 Thread Daniel Gruss
On 1/7/19 12:08 PM, Dominique Martinet wrote: >> That's my bigger concern here. In [1] there's described a remote attack >> (on webserver) using the page fault timing differences for present/not >> present page cache pages. Noisy but works, and I expect locally it to be >> much less noisy. Yet the

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-07 Thread Vlastimil Babka
On 1/7/19 12:08 PM, Dominique Martinet wrote: > > With the current mincore change, it will think everything was "in core" > and not flush anything unless my option to just fadvise dontneed > everything is passed though ; so even if we can make it work it is a > change of behaviour that is breaking

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-07 Thread Dominique Martinet
Vlastimil Babka wrote on Mon, Jan 07, 2019: > On 1/7/19 5:32 AM, Dominique Martinet wrote: > > Linus Torvalds wrote on Sat, Jan 05, 2019: > >> But I think my patch to just rip out all that page lookup, and just > >> base it on the page table state has the fundamental advantage that it > >> gets

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-07 Thread Vlastimil Babka
On 1/7/19 5:32 AM, Dominique Martinet wrote: > Linus Torvalds wrote on Sat, Jan 05, 2019: >> But I think my patch to just rip out all that page lookup, and just >> base it on the page table state has the fundamental advantage that it >> gets rid of code. Maybe I should jst commit it, and see if

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-07 Thread Michael Ellerman
Linus Torvalds writes: > On Sat, Jan 5, 2019 at 3:16 PM Linus Torvalds > wrote: >> >> It goes back to forever, it looks like. I can't find a reason. > > mincore() was originally added in 2.3.52pre3, it looks like. Around > 2000 or so. But sadly before the BK history. Yeah, it's here in the

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-06 Thread Dominique Martinet
Linus Torvalds wrote on Sat, Jan 05, 2019: > But I think my patch to just rip out all that page lookup, and just > base it on the page table state has the fundamental advantage that it > gets rid of code. Maybe I should jst commit it, and see if anything > breaks? We do have options in case things

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-06 Thread Linus Torvalds
On Sat, Jan 5, 2019 at 5:50 PM Linus Torvalds wrote: > > Slightly updated patch in case somebody wants to try things out. I decided to just apply that patch. It is *not* marked for stable, very intentionally, because I expect that we will need to wait and see if there are issues with it, and

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-06 Thread Kevin Easton
On Sat, Jan 05, 2019 at 01:54:03PM -0800, Linus Torvalds wrote: > On Sat, Jan 5, 2019 at 12:43 PM Jiri Kosina wrote: > > > > > Who actually _uses_ mincore()? That's probably the best guide to what > > > we should do. Maybe they open the file read-only even if they are the > > > owner, and we

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-05 Thread Linus Torvalds
On Sat, Jan 5, 2019 at 4:22 PM Linus Torvalds wrote: > > But I think my patch to just rip out all that page lookup, and just > base it on the page table state has the fundamental advantage that it > gets rid of code. Maybe I should jst commit it, and see if anything > breaks? We do have options

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-05 Thread Linus Torvalds
On Sat, Jan 5, 2019 at 4:11 PM Matthew Wilcox wrote: > > FreeBSD claims to have a manpage from SunOS 4.1.3 with mincore (!) > > https://www.freebsd.org/cgi/man.cgi?query=mincore=0=0=SunOS+4.1.3=default=html > > DESCRIPTION >mincore() returns the primary memory residency status of pages

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-05 Thread Matthew Wilcox
On Sat, Jan 05, 2019 at 03:39:10PM -0800, Linus Torvalds wrote: > On Sat, Jan 5, 2019 at 3:16 PM Linus Torvalds > wrote: > > > > It goes back to forever, it looks like. I can't find a reason. > > mincore() was originally added in 2.3.52pre3, it looks like. Around > 2000 or so. But sadly before

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-05 Thread Linus Torvalds
On Sat, Jan 5, 2019 at 3:16 PM Linus Torvalds wrote: > > It goes back to forever, it looks like. I can't find a reason. mincore() was originally added in 2.3.52pre3, it looks like. Around 2000 or so. But sadly before the BK history. And that comment about "Later we can get more picky about

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-05 Thread Linus Torvalds
On Sat, Jan 5, 2019 at 3:16 PM Linus Torvalds wrote: > > It goes back to forever, it looks like. I can't find a reason. Our man-pages talk abouit the "without doing IO" part. That may be the result of our code, though, not the reason for it. The BSD man-page has other flags, but doesn't

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-05 Thread Linus Torvalds
On Sat, Jan 5, 2019 at 3:05 PM Linus Torvalds wrote: > > That would be nicer than my patch, simply because removing code is > always nice. And arguably it's a better semantic anyway. Yeah, I wonder why we did that thing where mincore() walks the page tables, but if they are empty it looks in the

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-05 Thread Jiri Kosina
On Sat, 5 Jan 2019, Jann Horn wrote: > > Provide vm.mincore_privileged sysctl, which makes it possible to mincore() > > start returning -EPERM in case it's invoked by a process lacking > > CAP_SYS_ADMIN. > > > > The default behavior stays "mincore() can be used by anybody" in order to > > be

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-05 Thread Linus Torvalds
On Sat, Jan 5, 2019 at 2:55 PM Jann Horn wrote: > > Just checking: I guess /proc/$pid/pagemap (iow, the pagemap_read() > handler) is less problematic because it only returns data about the > state of page tables, and doesn't query the address_space? In other > words, it permits monitoring

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-05 Thread Jann Horn
On Sat, Jan 5, 2019 at 6:27 PM Jiri Kosina wrote: > There are possibilities [1] how mincore() could be used as a converyor of > a sidechannel information about pagecache metadata. > > Provide vm.mincore_privileged sysctl, which makes it possible to mincore() > start returning -EPERM in case it's

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-05 Thread Linus Torvalds
On Sat, Jan 5, 2019 at 12:43 PM Jiri Kosina wrote: > > > Who actually _uses_ mincore()? That's probably the best guide to what > > we should do. Maybe they open the file read-only even if they are the > > owner, and we really should look at file ownership instead. > > Yeah, well > >

  1   2   >