Re: [PATCH v3 0/7] File Sealing & memfd_create()

2014-07-09 Thread Hugh Dickins
On Tue, 8 Jul 2014, David Herrmann wrote:
> 
> Hugh, any comments on patch 5, 6 and 7? Those are the last outstanding
> issues with memfd+sealing. Patch 7 (isolating pages) is still my
> favorite and has been running just fine on my machine for the last
> months. I think it'd be nice if we could give it a try in -next. We
> can always fall back to Patch 5 or Patch 5+6. Those will detect any
> racing AIO and just fail or wait for the IO to finish for a short
> period.

It's distressing for both of us how slow I am to review these, sorry.
We have just too many bugs in mm (and yes, some of them mine) for me
to set aside time to get deep enough into new features.

I've been trying for days and weeks to get there, made some progress
today, and hope to continue tomorrow.  I'll send my comments on 1/7
(thumb up) and 7/7 (thumb down) in a moment: 2-6 not tonight.

> 
> Are there any other blockers for this?

Trivia only, I haven't noticed any blocker; though I'm still not quite
convinced by memfd_create() - but happy enough with it if others are.

Hugh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v3 0/7] File Sealing & memfd_create()

2014-07-08 Thread David Herrmann
Hi

On Fri, Jun 13, 2014 at 12:36 PM, David Herrmann  wrote:
> Hi
>
> This is v3 of the File-Sealing and memfd_create() patches. You can find v1
> with a longer introduction at gmane:
>   http://thread.gmane.org/gmane.comp.video.dri.devel/102241
> An LWN article about memfd+sealing is available, too:
>   https://lwn.net/Articles/593918/
> v2 with some more discussions can be found here:
>   http://thread.gmane.org/gmane.linux.kernel.mm/115713
>
> This series introduces two new APIs:
>   memfd_create(): Think of this syscall as malloc() but it returns a
>                   file-descriptor instead of a pointer. That file-descriptor
>                   is backed by anon-memory and can be memory-mapped for
>                   access.
>   sealing: The sealing API can be used to prevent a specific set of operations
>            on a file-descriptor. You 'seal' the file and give thus the
>            guarantee, that it cannot be modified in the specific ways.
>
> A short high-level introduction is also available here:
>   http://dvdhrm.wordpress.com/2014/06/10/memfd_create2/
>
>
> Changed in v3:
>  - fcntl() now returns EINVAL if the FD does not support sealing. We used to
>    return EBADF like pipe_fcntl() does, but that is really weird and I don't
>    like repeating that.
>  - seals are now saved as "unsigned int" instead of "u32".
>  - i_mmap_writable is now an atomic so we can deny writable mappings just like
>    i_writecount does.
>  - SHMEM_ALLOW_SEALING is dropped. We initialize all objects with F_SEAL_SEAL
>    and only unset it for memfds that shall support sealing.
>  - memfd_create() no longer has a size argument. It was redundant, use
>    ftruncate() or fallocate().
>  - memfd_create() flags are "unsigned int" now, instead of "u64".
>  - NAME_MAX off-by-one fix
>  - several cosmetic changes
>  - Added AIO/Direct-IO page-pinning protection
>
> The last point is the most important change in this version: We now bail out
> if any page-refcount is elevated while setting SEAL_WRITE. This prevents
> parallel GUP users from writing to sealed files _after_ they were sealed.
> There is also a new FUSE-based test-case to trigger such situations.
>
> The last 2 patches try to improve the page-pinning handling. I included both
> in this series, but obviously only one of them is needed (or we could stack
> them):
>  - 6/7: This waits for up to 150ms for pages to be unpinned
>  - 7/7: This isolates pinned pages and replaces them with a fresh copy
>
> Hugh, patch 6 is basically your code. In case that gets merged, can I put your
> Signed-off-by on it?
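
For readers who have not used the two APIs from the cover letter, here is a minimal userspace sketch. It assumes a glibc that exposes memfd_create(), MFD_ALLOW_SEALING and the F_SEAL_* constants under _GNU_SOURCE, which is not something this thread states. The helper names make_sealed_memfd() and supports_sealing() are hypothetical and not part of the series.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy buf into a fresh memfd and seal it.
 * Returns the file descriptor, or -1 with errno set. */
int make_sealed_memfd(const char *name, const void *buf, size_t len)
{
    assert(name != NULL && buf != NULL);

    int fd = memfd_create(name, MFD_CLOEXEC | MFD_ALLOW_SEALING);
    if (fd < 0)
        return -1;

    /* memfd_create() takes no size argument; size the object here. */
    if (ftruncate(fd, (off_t)len) < 0)
        goto fail;

    /* Writing the payload also allocates the pages that back it. */
    size_t off = 0;
    while (off < len) {
        ssize_t n = write(fd, (const char *)buf + off, len - off);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            goto fail;
        }
        off += (size_t)n;
    }

    /* After this call the content and size can no longer change. */
    if (fcntl(fd, F_ADD_SEALS,
              F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_WRITE | F_SEAL_SEAL) < 0)
        goto fail;

    return fd;

fail:
    {
        int saved = errno;
        close(fd);
        errno = saved;
    }
    return -1;
}

/* Hypothetical helper: 1 if fd answers F_GET_SEALS, 0 if the kernel
 * reports EINVAL (no sealing support), -1 on any other error. */
int supports_sealing(int fd)
{
    if (fcntl(fd, F_GET_SEALS) >= 0)
        return 1;
    return errno == EINVAL ? 0 : -1;
}
```

A receiver would typically call fcntl(fd, F_GET_SEALS) on the descriptor it was handed before trusting it; supports_sealing() illustrates the EINVAL behaviour listed under "Changed in v3".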

Hugh, any comments on patch 5, 6 and 7? Those are the last outstanding
issues with memfd+sealing. Patch 7 (isolating pages) is still my
favorite and has been running just fine on my machine for the last
months. I think it'd be nice if we could give it a try in -next. We
can always fall back to Patch 5 or Patch 5+6. Those will detect any
racing AIO and just fail or wait for the IO to finish for a short
period.
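
Seen from userspace, the practical consequence is that adding SEAL_WRITE can fail and the caller has to cope with that. The sketch below does not reproduce the AIO page-pinning race discussed here. It shows the simpler, related case from the i_mmap_writable change, where a writable shared mapping blocks the seal. The EBUSY result is taken from my reading of the fcntl(2) man page, not from this thread, and try_seal_write() and seal_write_demo() are hypothetical names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: try to add F_SEAL_WRITE.
 * Returns 0 on success, otherwise the errno value of the failure. */
int try_seal_write(int fd)
{
    assert(fd >= 0);
    if (fcntl(fd, F_ADD_SEALS, F_SEAL_WRITE) == 0)
        return 0;
    return errno;
}

/* Returns 0 if sealing is refused with EBUSY while a writable shared
 * mapping exists and succeeds once it is gone, 1 if the kernel behaved
 * differently, -1 if the setup itself failed. */
int seal_write_demo(void)
{
    const size_t len = 4096;

    int fd = memfd_create("seal-demo", MFD_CLOEXEC | MFD_ALLOW_SEALING);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        return -1;
    }

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    int while_mapped = try_seal_write(fd);  /* writable mapping alive */
    munmap(p, len);
    int after_unmap = try_seal_write(fd);   /* no writers left */

    close(fd);
    return (while_mapped == EBUSY && after_unmap == 0) ? 0 : 1;
}
```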

Are there any other blockers for this?

Thanks
David


Re: [PATCH v3 0/7] File Sealing & memfd_create()

2014-06-17 Thread Andy Lutomirski
On Tue, Jun 17, 2014 at 1:31 PM, Hugh Dickins  wrote:
> On Tue, 17 Jun 2014, Andy Lutomirski wrote:
>> On Tue, Jun 17, 2014 at 9:51 AM, David Herrmann wrote:
>> > On Tue, Jun 17, 2014 at 6:41 PM, Andy Lutomirski wrote:
>> >> On Tue, Jun 17, 2014 at 9:36 AM, David Herrmann wrote:
>> >>> On Tue, Jun 17, 2014 at 6:20 PM, Andy Lutomirski wrote:
>> >>>> Can you summarize why holes can't be reliably backed by the zero page?
>> >>>
>> >>> To answer this, I will quote Hugh from "PATCH v2 1/3":
>> >>>
>> >>>> We do already use the ZERO_PAGE instead of allocating when it's a
>> >>>> simple read; and on the face of it, we could extend that to mmap
>> >>>> once the file is sealed.  But I am rather afraid to do so - for
>> >>>> many years there was an mmap /dev/zero case which did that, but
>> >>>> it was an easily forgotten case which caught us out at least
>> >>>> once, so I'm reluctant to reintroduce it now for sealing.
>> >>>>
>> >>>> Anyway, I don't expect you to resolve the issue of sealed holes:
>> >>>> that's very much my territory, to give you support on.
>> >>>
>> >>> Holes can be avoided with a simple fallocate(). I don't understand why
>> >>> I should make SEAL_WRITE do the fallocate for the caller. During the
>> >>> discussion of memfd_create() I was told to drop the "size" parameter,
>> >>> because it is redundant. I don't see how this implicit fallocate()
>> >>> does not fall into the same category?
>> >>>
>> >>
>> >> I'm really confused now.
>> >>
>> >> If I SEAL_WRITE a file, and then I mmap it PROT_READ, and then I read
>> >> it, is that a "simple read"?  If so, doesn't that mean that there's no
>> >> problem?
>> >
>> > I assumed Hugh was talking about read(). So no, this is not about
>> > memory-reads on mmap()ed regions.
>> >
>> > Looking at shmem_file_read_iter() I can see a ZERO_PAGE(0) call in
>> > case shmem_getpage_gfp(SGP_READ) tells us there's a hole. I cannot see
>> > anything like that in the mmap_region() and shmem_fault() paths.
>>
>> Would it be easy to fix this just for SEAL_WRITE files?  Hugh?
>>
>> This would make the interface much nicer, IMO.
>
> I do agree with you, Andy.
>
> I agree with David that a fallocate (of the fill-in-holes variety)
> does not have to be prohibited on a sealed file, that detection of
> holes is not an issue with respect to sealing, and that fallocate
> by the recipient could be used to "post-seal" the object to safety.
>
> But it doesn't feel right, and we shall be re-explaining and apologizing
> for it for months to come, until we just fix it.  I suspect David didn't
> want to add a dependency upon me to fix it, and I didn't want to be
> rushed into fixing it (nor is it a job I'd be comfortable to delegate).

I suppose it would be possible to merge memfd_create as is, and then
to fix the zero page thing and make fallocate on a SEAL_WRITEd file be
a no-op.  It would be silly for code to fallocate actual
sealed-with-holes files and allocate fresh pages that are guaranteed
to only ever contain zeros.

--Andy


Re: [PATCH v3 0/7] File Sealing & memfd_create()

2014-06-17 Thread Hugh Dickins
On Tue, 17 Jun 2014, Andy Lutomirski wrote:
> On Tue, Jun 17, 2014 at 9:51 AM, David Herrmann  wrote:
> > On Tue, Jun 17, 2014 at 6:41 PM, Andy Lutomirski wrote:
> >> On Tue, Jun 17, 2014 at 9:36 AM, David Herrmann wrote:
> >>> On Tue, Jun 17, 2014 at 6:20 PM, Andy Lutomirski wrote:
> >>>> Can you summarize why holes can't be reliably backed by the zero page?
> >>>
> >>> To answer this, I will quote Hugh from "PATCH v2 1/3":
> >>>
> >>>> We do already use the ZERO_PAGE instead of allocating when it's a
> >>>> simple read; and on the face of it, we could extend that to mmap
> >>>> once the file is sealed.  But I am rather afraid to do so - for
> >>>> many years there was an mmap /dev/zero case which did that, but
> >>>> it was an easily forgotten case which caught us out at least
> >>>> once, so I'm reluctant to reintroduce it now for sealing.
> >>>>
> >>>> Anyway, I don't expect you to resolve the issue of sealed holes:
> >>>> that's very much my territory, to give you support on.
> >>>
> >>> Holes can be avoided with a simple fallocate(). I don't understand why
> >>> I should make SEAL_WRITE do the fallocate for the caller. During the
> >>> discussion of memfd_create() I was told to drop the "size" parameter,
> >>> because it is redundant. I don't see how this implicit fallocate()
> >>> does not fall into the same category?
> >>>
> >>
> >> I'm really confused now.
> >>
> >> If I SEAL_WRITE a file, and then I mmap it PROT_READ, and then I read
> >> it, is that a "simple read"?  If so, doesn't that mean that there's no
> >> problem?
> >
> > I assumed Hugh was talking about read(). So no, this is not about
> > memory-reads on mmap()ed regions.
> >
> > Looking at shmem_file_read_iter() I can see a ZERO_PAGE(0) call in
> > case shmem_getpage_gfp(SGP_READ) tells us there's a hole. I cannot see
> > anything like that in the mmap_region() and shmem_fault() paths.
> 
> Would it be easy to fix this just for SEAL_WRITE files?  Hugh?
> 
> This would make the interface much nicer, IMO.

I do agree with you, Andy.

I agree with David that a fallocate (of the fill-in-holes variety)
does not have to be prohibited on a sealed file, that detection of
holes is not an issue with respect to sealing, and that fallocate
by the recipient could be used to "post-seal" the object to safety.

But it doesn't feel right, and we shall be re-explaining and apologizing
for it for months to come, until we just fix it.  I suspect David didn't
want to add a dependency upon me to fix it, and I didn't want to be
rushed into fixing it (nor is it a job I'd be comfortable to delegate).

I'll give it more thought.  The problem is that there may be a variety
of codepaths, in mm/shmem.c but more seriously outside it, which expect
an appropriate page->mapping and page->index on any page of a shared
mapping, and will be buggily surprised to find a ZERO_PAGE instead.
I'll have to go through carefully.  Splice may be more difficult to
audit than fault, I don't very often have to think about it.

And though I'd prefer to do the same for non-sealed as for sealed, it
may make more sense in the short term just to address the sealed case,
as you suggest.  In the unsealed case, first write to a page entails
locating all the places where the ZERO_PAGE had previously been mapped,
and replacing it there by the newly allocated page; might depend on
VM_NONLINEAR removal, and might entail page_mkwrite().  Doing just
the sealed is easier, though the half-complete job will annoy me.

I did refresh my memory of the /dev/zero case that had particularly
worried me: it was stranger than I'd thought, that reading from
/dev/zero could insert ZERO_PAGEs into mappings of other files.
Nick put an end to that in 2.6.24, but perhaps its prior existence
helps give assurance that ZERO_PAGE in surprising places is less
trouble than I fear (it did force XIP into having its own zero_page,
but I don't remember other complications).

Hugh


Re: [PATCH v3 0/7] File Sealing & memfd_create()

2014-06-17 Thread Andy Lutomirski
On Tue, Jun 17, 2014 at 9:51 AM, David Herrmann  wrote:
> Hi
>
> On Tue, Jun 17, 2014 at 6:41 PM, Andy Lutomirski  wrote:
>> On Tue, Jun 17, 2014 at 9:36 AM, David Herrmann wrote:
>>> Hi
>>>
>>> On Tue, Jun 17, 2014 at 6:20 PM, Andy Lutomirski wrote:
>>>> Can you summarize why holes can't be reliably backed by the zero page?
>>>
>>> To answer this, I will quote Hugh from "PATCH v2 1/3":
>>>
>>>> We do already use the ZERO_PAGE instead of allocating when it's a
>>>> simple read; and on the face of it, we could extend that to mmap
>>>> once the file is sealed.  But I am rather afraid to do so - for
>>>> many years there was an mmap /dev/zero case which did that, but
>>>> it was an easily forgotten case which caught us out at least
>>>> once, so I'm reluctant to reintroduce it now for sealing.
>>>>
>>>> Anyway, I don't expect you to resolve the issue of sealed holes:
>>>> that's very much my territory, to give you support on.
>>>
>>> Holes can be avoided with a simple fallocate(). I don't understand why
>>> I should make SEAL_WRITE do the fallocate for the caller. During the
>>> discussion of memfd_create() I was told to drop the "size" parameter,
>>> because it is redundant. I don't see how this implicit fallocate()
>>> does not fall into the same category?
>>>
>>
>> I'm really confused now.
>>
>> If I SEAL_WRITE a file, and then I mmap it PROT_READ, and then I read
>> it, is that a "simple read"?  If so, doesn't that mean that there's no
>> problem?
>
> I assumed Hugh was talking about read(). So no, this is not about
> memory-reads on mmap()ed regions.
>
> Looking at shmem_file_read_iter() I can see a ZERO_PAGE(0) call in
> case shmem_getpage_gfp(SGP_READ) tells us there's a hole. I cannot see
> anything like that in the mmap_region() and shmem_fault() paths.

Would it be easy to fix this just for SEAL_WRITE files?  Hugh?

This would make the interface much nicer, IMO.

--Andy

>
> Thanks
> David



-- 
Andy Lutomirski
AMA Capital Management, LLC


Re: [PATCH v3 0/7] File Sealing & memfd_create()

2014-06-17 Thread David Herrmann
Hi

On Tue, Jun 17, 2014 at 6:41 PM, Andy Lutomirski  wrote:
> On Tue, Jun 17, 2014 at 9:36 AM, David Herrmann  wrote:
>> Hi
>>
>> On Tue, Jun 17, 2014 at 6:20 PM, Andy Lutomirski  wrote:
>>> Can you summarize why holes can't be reliably backed by the zero page?
>>
>> To answer this, I will quote Hugh from "PATCH v2 1/3":
>>
>>> We do already use the ZERO_PAGE instead of allocating when it's a
>>> simple read; and on the face of it, we could extend that to mmap
>>> once the file is sealed.  But I am rather afraid to do so - for
>>> many years there was an mmap /dev/zero case which did that, but
>>> it was an easily forgotten case which caught us out at least
>>> once, so I'm reluctant to reintroduce it now for sealing.
>>>
>>> Anyway, I don't expect you to resolve the issue of sealed holes:
>>> that's very much my territory, to give you support on.
>>
>> Holes can be avoided with a simple fallocate(). I don't understand why
>> I should make SEAL_WRITE do the fallocate for the caller. During the
>> discussion of memfd_create() I was told to drop the "size" parameter,
>> because it is redundant. I don't see how this implicit fallocate()
>> does not fall into the same category?
>>
>
> I'm really confused now.
>
> If I SEAL_WRITE a file, and then I mmap it PROT_READ, and then I read
> it, is that a "simple read"?  If so, doesn't that mean that there's no
> problem?

I assumed Hugh was talking about read(). So no, this is not about
memory-reads on mmap()ed regions.

Looking at shmem_file_read_iter() I can see a ZERO_PAGE(0) call in
case shmem_getpage_gfp(SGP_READ) tells us there's a hole. I cannot see
anything like that in the mmap_region() and shmem_fault() paths.
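
The read() side of this can be observed from userspace. The sketch below creates a memfd that consists of a single hole and checks that pread() returns only zero bytes. It shows the visible result only. Whether the kernel serves those bytes from ZERO_PAGE(0) is the internal detail described above and is not something the code can verify. hole_reads_as_zeros() is a hypothetical name, and memfd_create() is assumed to be available through glibc.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a memfd of len bytes without writing to
 * it, then read it back. Returns 1 if every byte read is zero, 0 if a
 * non-zero byte was seen, -1 on error. */
int hole_reads_as_zeros(size_t len)
{
    assert(len > 0);

    int fd = memfd_create("hole-demo", MFD_CLOEXEC);
    if (fd < 0)
        return -1;

    int result = -1;
    if (ftruncate(fd, (off_t)len) == 0) {
        unsigned char buf[4096];
        size_t off = 0;

        result = 1;
        while (off < len) {
            size_t want = len - off < sizeof(buf) ? len - off : sizeof(buf);
            ssize_t n = pread(fd, buf, want, (off_t)off);
            if (n < 0 && errno == EINTR)
                continue;
            if (n <= 0) {           /* error or unexpected end of file */
                result = -1;
                break;
            }
            for (ssize_t i = 0; i < n; i++)
                if (buf[i] != 0)
                    result = 0;
            off += (size_t)n;
        }
    }

    close(fd);
    return result;
}
```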

Thanks
David


Re: [PATCH v3 0/7] File Sealing & memfd_create()

2014-06-17 Thread Andy Lutomirski
On Tue, Jun 17, 2014 at 9:36 AM, David Herrmann  wrote:
> Hi
>
> On Tue, Jun 17, 2014 at 6:20 PM, Andy Lutomirski  wrote:
>> Can you summarize why holes can't be reliably backed by the zero page?
>
> To answer this, I will quote Hugh from "PATCH v2 1/3":
>
>> We do already use the ZERO_PAGE instead of allocating when it's a
>> simple read; and on the face of it, we could extend that to mmap
>> once the file is sealed.  But I am rather afraid to do so - for
>> many years there was an mmap /dev/zero case which did that, but
>> it was an easily forgotten case which caught us out at least
>> once, so I'm reluctant to reintroduce it now for sealing.
>>
>> Anyway, I don't expect you to resolve the issue of sealed holes:
>> that's very much my territory, to give you support on.
>
> Holes can be avoided with a simple fallocate(). I don't understand why
> I should make SEAL_WRITE do the fallocate for the caller. During the
> discussion of memfd_create() I was told to drop the "size" parameter,
> because it is redundant. I don't see how this implicit fallocate()
> does not fall into the same category?
>

I'm really confused now.

If I SEAL_WRITE a file, and then I mmap it PROT_READ, and then I read
it, is that a "simple read"?  If so, doesn't that mean that there's no
problem?

--Andy


Re: [PATCH v3 0/7] File Sealing & memfd_create()

2014-06-17 Thread David Herrmann
Hi

On Tue, Jun 17, 2014 at 6:20 PM, Andy Lutomirski  wrote:
> Can you summarize why holes can't be reliably backed by the zero page?

To answer this, I will quote Hugh from "PATCH v2 1/3":

> We do already use the ZERO_PAGE instead of allocating when it's a
> simple read; and on the face of it, we could extend that to mmap
> once the file is sealed.  But I am rather afraid to do so - for
> many years there was an mmap /dev/zero case which did that, but
> it was an easily forgotten case which caught us out at least
> once, so I'm reluctant to reintroduce it now for sealing.
>
> Anyway, I don't expect you to resolve the issue of sealed holes:
> that's very much my territory, to give you support on.

Holes can be avoided with a simple fallocate(). I don't understand why
I should make SEAL_WRITE do the fallocate for the caller. During the
discussion of memfd_create() I was told to drop the "size" parameter,
because it is redundant. I don't see how this implicit fallocate()
does not fall into the same category?

Thanks
David


Re: [PATCH v3 0/7] File Sealing & memfd_create()

2014-06-17 Thread David Herrmann
Hi

On Tue, Jun 17, 2014 at 2:13 PM, Florian Weimer  wrote:
> On 06/17/2014 12:10 PM, David Herrmann wrote:
>
>>>> The file might have holes, therefore, you'd have to allocate backing
>>>> pages. This might hit a soft-limit and fail. To avoid this, use
>>>> fallocate() to allocate pages prior to mmap()
>>>
>>>
>>> This does not work because the consuming side does not know how the
>>> descriptor was set up if sealing does not imply that.
>>
>>
>> The consuming side has to verify seals via F_GET_SEALS. After that, it
>> shall do a simple fallocate() on the whole file if it wants to make sure
>> that all pages are allocated. Why shouldn't that be possible? Please
>> elaborate.
>
>
> Hmm.  You permit general fallocate even for WRITE seals.  That's really
> unexpected.

SEAL_WRITE prevents modifications of file-content. fallocate() does
not modify file-contents, so I think it's not unexpected that
fallocate() is still allowed.

> The inode_newsize_ok check in shmem_fallocate can result in SIGXFSZ, which
> doesn't seem to be what's intended here.

It can only result in SIGXFSZ if you _increase_ the file-size with
fallocate(). You shouldn't do that if you only verify that holes are
allocated. Hence, a simple fallocate(st.st_size) cannot result in
SIGXFSZ. Obviously, this requires SEAL_SHRINK to prevent the remote
side from shrinking the file while you call fallocate(). But SEAL_WRITE
usually goes together with SEAL_SHRINK for obvious reasons.

> Will the new pages be attributed to the process calling fallocate, or to the
> process calling memfd_create?

Pages are always allocated by the caller and charged on current->mm
(current process).
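
The receiver-side procedure described in this mail (check the seals, then fallocate() exactly the current size) could look roughly like the sketch below. verify_and_populate() is a hypothetical name, and reporting missing seals as EPERM is my own choice for the example, not something the patches define. It assumes the descriptor is a memfd opened read-write, since fallocate() needs a writable descriptor.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper for the consuming side. Requires SEAL_SHRINK and
 * SEAL_WRITE, then fills any holes so that later page faults do not
 * need to allocate. Returns 0 and stores the size, or -1 with errno. */
int verify_and_populate(int fd, off_t *size_out)
{
    assert(size_out != NULL);

    int seals = fcntl(fd, F_GET_SEALS);
    if (seals < 0)
        return -1;              /* EINVAL: fd does not support sealing */

    const int need = F_SEAL_SHRINK | F_SEAL_WRITE;
    if ((seals & need) != need) {
        errno = EPERM;          /* example choice: sender did not seal */
        return -1;
    }

    struct stat st;
    if (fstat(fd, &st) < 0)
        return -1;

    /* Never pass more than st_size: with SEAL_SHRINK set the file
     * cannot get smaller, so this call cannot grow it. */
    if (st.st_size > 0 && fallocate(fd, 0, 0, st.st_size) < 0)
        return -1;              /* e.g. ENOSPC or ENOMEM, reported here */

    *size_out = st.st_size;
    return 0;
}
```

The pages allocated by this call are charged to the process that calls it, as stated above, so a receiver that does this takes on the memory cost of the sender's holes.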

Thanks
David


Re: [PATCH v3 0/7] File Sealing & memfd_create()

2014-06-17 Thread Florian Weimer

On 06/17/2014 12:10 PM, David Herrmann wrote:


>>> The file might have holes, therefore, you'd have to allocate backing
>>> pages. This might hit a soft-limit and fail. To avoid this, use
>>> fallocate() to allocate pages prior to mmap()
>>
>>
>> This does not work because the consuming side does not know how the
>> descriptor was set up if sealing does not imply that.
>
>
> The consuming side has to verify seals via F_GET_SEALS. After that, it
> shall do a simple fallocate() on the whole file if it wants to make sure
> that all pages are allocated. Why shouldn't that be possible? Please
> elaborate.


Hmm.  You permit general fallocate even for WRITE seals.  That's really 
unexpected.


The inode_newsize_ok check in shmem_fallocate can result in SIGXFSZ, 
which doesn't seem to be what's intended here.


Will the new pages be attributed to the process calling fallocate, or to
the process calling memfd_create?


--
Florian Weimer / Red Hat Product Security Team


Re: [PATCH v3 0/7] File Sealing & memfd_create()

2014-06-17 Thread David Herrmann
Hi

On Tue, Jun 17, 2014 at 12:04 PM, Florian Weimer  wrote:
> On 06/17/2014 12:01 PM, David Herrmann wrote:
>
>>> I don't think this is what potential users expect because mlock requires
>>> capabilities which are not available to them.
>>>
>>> A couple of weeks ago, sealing was to be applied to anonymous shared
>>> memory.
>>> Has this changed?  Why should *reading* it trigger OOM?
>>
>> The file might have holes, therefore, you'd have to allocate backing
>> pages. This might hit a soft-limit and fail. To avoid this, use
>> fallocate() to allocate pages prior to mmap()
>
> This does not work because the consuming side does not know how the
> descriptor was set up if sealing does not imply that.

The consuming side has to verify seals via F_GET_SEALS. After that, it
shall do a simple fallocate() on the whole file if it wants to make sure
that all pages are allocated. Why shouldn't that be possible? Please
elaborate.

Thanks
David


Re: [PATCH v3 0/7] File Sealing & memfd_create()

2014-06-17 Thread Florian Weimer

On 06/17/2014 12:01 PM, David Herrmann wrote:


>> I don't think this is what potential users expect because mlock requires
>> capabilities which are not available to them.
>>
>> A couple of weeks ago, sealing was to be applied to anonymous shared memory.
>> Has this changed?  Why should *reading* it trigger OOM?
>
>
> The file might have holes, therefore, you'd have to allocate backing
> pages. This might hit a soft-limit and fail. To avoid this, use
> fallocate() to allocate pages prior to mmap()


This does not work because the consuming side does not know how the
descriptor was set up if sealing does not imply that.



> or mlock() to make the kernel lock them in memory.


See above for why that does not work.

I think you should eliminate the holes on sealing and report ENOMEM 
there if necessary.


--
Florian Weimer / Red Hat Product Security Team


Re: [PATCH v3 0/7] File Sealing & memfd_create()

2014-06-17 Thread David Herrmann
Hi

On Tue, Jun 17, 2014 at 11:54 AM, Florian Weimer  wrote:
> On 06/13/2014 05:33 PM, David Herrmann wrote:
>>
>> On Fri, Jun 13, 2014 at 5:17 PM, Andy Lutomirski 
>> wrote:
>>>
>>> Isn't the point of SEAL_SHRINK to allow servers to mmap and read
>>> safely without worrying about SIGBUS?
>>
>>
>> No, I don't think so.
>> The point of SEAL_SHRINK is to prevent a file from shrinking. SIGBUS
>> is an effect, not a cause. It's only a coincidence that "OOM during
>> reads" and "reading beyond file-boundaries" have the same effect:
>> SIGBUS.
>> We only protect against reading beyond file-boundaries due to
>> shrinking. Therefore, OOM-SIGBUS is unrelated to SEAL_SHRINK.
>>
>> Anyone dealing with mmap() _has_ to use mlock() to protect against
>> OOM-SIGBUS. Making SEAL_SHRINK protect against OOM-SIGBUS would be
>> redundant, because you can achieve the same with SEAL_SHRINK+mlock().
>
>
> I don't think this is what potential users expect because mlock requires
> capabilities which are not available to them.
>
> A couple of weeks ago, sealing was to be applied to anonymous shared memory.
> Has this changed?  Why should *reading* it trigger OOM?

The file might have holes, therefore, you'd have to allocate backing
pages. This might hit a soft-limit and fail. To avoid this, use
fallocate() to allocate pages prior to mmap() or mlock() to make the
kernel lock them in memory.
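
As a rough illustration of that advice, the sketch below allocates the backing pages with fallocate() before mapping, so that an allocation failure shows up as an error return instead of a SIGBUS during a later read. mlock() is treated as best effort, because it can fail when the caller's RLIMIT_MEMLOCK is too small. map_populated() is a hypothetical name. Note that fallocate() in mode 0 grows the file if len exceeds the current size, so this variant is meant for the side that owns the file.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: allocate pages for [0, len), map them read-only
 * and try to lock them. Returns the mapping or MAP_FAILED with errno
 * set. *locked is 1 if mlock() succeeded, 0 otherwise. */
void *map_populated(int fd, size_t len, int *locked)
{
    assert(locked != NULL);
    *locked = 0;

    if (len == 0) {
        errno = EINVAL;
        return MAP_FAILED;
    }

    /* Fill holes now; failures such as ENOSPC or ENOMEM surface here. */
    if (fallocate(fd, 0, 0, (off_t)len) < 0)
        return MAP_FAILED;

    void *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return MAP_FAILED;

    /* Best effort: unprivileged callers may exceed RLIMIT_MEMLOCK. */
    if (mlock(p, len) == 0)
        *locked = 1;

    return p;
}
```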

Thanks
David


Re: [PATCH v3 0/7] File Sealing & memfd_create()

2014-06-17 Thread Florian Weimer

On 06/13/2014 05:33 PM, David Herrmann wrote:

> On Fri, Jun 13, 2014 at 5:17 PM, Andy Lutomirski wrote:
>>
>> Isn't the point of SEAL_SHRINK to allow servers to mmap and read
>> safely without worrying about SIGBUS?
>
>
> No, I don't think so.
> The point of SEAL_SHRINK is to prevent a file from shrinking. SIGBUS
> is an effect, not a cause. It's only a coincidence that "OOM during
> reads" and "reading beyond file-boundaries" have the same effect:
> SIGBUS.
> We only protect against reading beyond file-boundaries due to
> shrinking. Therefore, OOM-SIGBUS is unrelated to SEAL_SHRINK.
>
> Anyone dealing with mmap() _has_ to use mlock() to protect against
> OOM-SIGBUS. Making SEAL_SHRINK protect against OOM-SIGBUS would be
> redundant, because you can achieve the same with SEAL_SHRINK+mlock().


I don't think this is what potential users expect because mlock requires 
capabilities which are not available to them.


A couple of weeks ago, sealing was to be applied to anonymous shared 
memory.  Has this changed?  Why should *reading* it trigger OOM?


--
Florian Weimer / Red Hat Product Security Team


Re: [PATCH v3 0/7] File Sealing & memfd_create()

2014-06-17 Thread Florian Weimer

On 06/17/2014 12:01 PM, David Herrmann wrote:
>> I don't think this is what potential users expect because mlock requires
>> capabilities which are not available to them.
>>
>> A couple of weeks ago, sealing was to be applied to anonymous shared memory.
>> Has this changed?  Why should *reading* it trigger OOM?
>
> The file might have holes, therefore, you'd have to allocate backing
> pages. This might hit a soft-limit and fail. To avoid this, use
> fallocate() to allocate pages prior to mmap()

This does not work because the consuming side does not know how the
descriptor was set up if sealing does not imply that.

> or mlock() to make the kernel lock them in memory.

See above for why that does not work.

I think you should eliminate the holes on sealing and report ENOMEM
there if necessary.


--
Florian Weimer / Red Hat Product Security Team


Re: [PATCH v3 0/7] File Sealing & memfd_create()

2014-06-17 Thread David Herrmann
Hi

On Tue, Jun 17, 2014 at 12:04 PM, Florian Weimer <fwei...@redhat.com> wrote:
> On 06/17/2014 12:01 PM, David Herrmann wrote:
>>> I don't think this is what potential users expect because mlock requires
>>> capabilities which are not available to them.
>>>
>>> A couple of weeks ago, sealing was to be applied to anonymous shared
>>> memory.
>>> Has this changed?  Why should *reading* it trigger OOM?
>>
>> The file might have holes, therefore, you'd have to allocate backing
>> pages. This might hit a soft-limit and fail. To avoid this, use
>> fallocate() to allocate pages prior to mmap()
>
> This does not work because the consuming side does not know how the
> descriptor was set up if sealing does not imply that.

The consuming side has to verify seals via F_GET_SEALS. After that, it
shall do a simple fallocate() on the whole file if it wants to make sure
that all pages are allocated. Why shouldn't that be possible? Please
elaborate.

Thanks
David


Re: [PATCH v3 0/7] File Sealing & memfd_create()

2014-06-17 Thread Florian Weimer

On 06/17/2014 12:10 PM, David Herrmann wrote:
>>> The file might have holes, therefore, you'd have to allocate backing
>>> pages. This might hit a soft-limit and fail. To avoid this, use
>>> fallocate() to allocate pages prior to mmap()
>>
>> This does not work because the consuming side does not know how the
>> descriptor was set up if sealing does not imply that.
>
> The consuming side has to verify seals via F_GET_SEALS. After that, it
> shall do a simple fallocate() on the whole file if it wants to make sure
> that all pages are allocated. Why shouldn't that be possible? Please
> elaborate.

Hmm.  You permit general fallocate even for WRITE seals.  That's really
unexpected.

The inode_newsize_ok check in shmem_fallocate can result in SIGXFSZ,
which doesn't seem to be what's intended here.

Will the new pages be attributed to the process calling fallocate, or to
the process calling memfd_create?


--
Florian Weimer / Red Hat Product Security Team


Re: [PATCH v3 0/7] File Sealing & memfd_create()

2014-06-17 Thread David Herrmann
Hi

On Tue, Jun 17, 2014 at 2:13 PM, Florian Weimer <fwei...@redhat.com> wrote:
> On 06/17/2014 12:10 PM, David Herrmann wrote:
>>>> The file might have holes, therefore, you'd have to allocate backing
>>>> pages. This might hit a soft-limit and fail. To avoid this, use
>>>> fallocate() to allocate pages prior to mmap()
>>>
>>> This does not work because the consuming side does not know how the
>>> descriptor was set up if sealing does not imply that.
>>
>> The consuming side has to verify seals via F_GET_SEALS. After that, it
>> shall do a simple fallocate() on the whole file if it wants to make sure
>> that all pages are allocated. Why shouldn't that be possible? Please
>> elaborate.
>
> Hmm.  You permit general fallocate even for WRITE seals.  That's really
> unexpected.

SEAL_WRITE prevents modifications of file-content. fallocate() does
not modify file-contents, so I think it's not unexpected that
fallocate() is still allowed.

> The inode_newsize_ok check in shmem_fallocate can result in SIGXFSZ, which
> doesn't seem to be what's intended here.

It can only result in SIGXFSZ if you _increase_ the file-size with
fallocate(). You shouldn't do that if you only verify that holes are
allocated. Hence, a simple fallocate(st.st_size) cannot result in
SIGXFSZ. Obviously, this requires SEAL_SHRINK to prevent the remote
side from shrinking the file while you call fallocate(). But SEAL_WRITE
usually goes together with SEAL_SHRINK for obvious reasons.

> Will the new pages be attributed to the process calling fallocate, or to the
> process calling memfd_create?

Pages are always allocated by the caller and charged on current-mm
(current process).

Thanks
David


Re: [PATCH v3 0/7] File Sealing & memfd_create()

2014-06-17 Thread David Herrmann
Hi

On Tue, Jun 17, 2014 at 6:20 PM, Andy Lutomirski <l...@amacapital.net> wrote:
> Can you summarize why holes can't be reliably backed by the zero page?

To answer this, I will quote Hugh from PATCH v2 1/3:

> We do already use the ZERO_PAGE instead of allocating when it's a
> simple read; and on the face of it, we could extend that to mmap
> once the file is sealed.  But I am rather afraid to do so - for
> many years there was an mmap /dev/zero case which did that, but
> it was an easily forgotten case which caught us out at least
> once, so I'm reluctant to reintroduce it now for sealing.
>
> Anyway, I don't expect you to resolve the issue of sealed holes:
> that's very much my territory, to give you support on.

Holes can be avoided with a simple fallocate(). I don't understand why
I should make SEAL_WRITE do the fallocate for the caller. During the
discussion of memfd_create() I was told to drop the size parameter,
because it is redundant. I don't see how this implicit fallocate()
does not fall into the same category?

Thanks
David


Re: [PATCH v3 0/7] File Sealing & memfd_create()

2014-06-17 Thread Andy Lutomirski
On Tue, Jun 17, 2014 at 9:36 AM, David Herrmann <dh.herrm...@gmail.com> wrote:
> Hi
>
> On Tue, Jun 17, 2014 at 6:20 PM, Andy Lutomirski <l...@amacapital.net> wrote:
>> Can you summarize why holes can't be reliably backed by the zero page?
>
> To answer this, I will quote Hugh from PATCH v2 1/3:
>
>> We do already use the ZERO_PAGE instead of allocating when it's a
>> simple read; and on the face of it, we could extend that to mmap
>> once the file is sealed.  But I am rather afraid to do so - for
>> many years there was an mmap /dev/zero case which did that, but
>> it was an easily forgotten case which caught us out at least
>> once, so I'm reluctant to reintroduce it now for sealing.
>>
>> Anyway, I don't expect you to resolve the issue of sealed holes:
>> that's very much my territory, to give you support on.
>
> Holes can be avoided with a simple fallocate(). I don't understand why
> I should make SEAL_WRITE do the fallocate for the caller. During the
> discussion of memfd_create() I was told to drop the size parameter,
> because it is redundant. I don't see how this implicit fallocate()
> does not fall into the same category?


I'm really confused now.

If I SEAL_WRITE a file, and then I mmap it PROT_READ, and then I read
it, is that a simple read?  If so, doesn't that mean that there's no
problem?

--Andy


Re: [PATCH v3 0/7] File Sealing & memfd_create()

2014-06-17 Thread David Herrmann
Hi

On Tue, Jun 17, 2014 at 6:41 PM, Andy Lutomirski <l...@amacapital.net> wrote:
> On Tue, Jun 17, 2014 at 9:36 AM, David Herrmann <dh.herrm...@gmail.com> wrote:
>> Hi
>>
>> On Tue, Jun 17, 2014 at 6:20 PM, Andy Lutomirski <l...@amacapital.net> wrote:
>>> Can you summarize why holes can't be reliably backed by the zero page?
>>
>> To answer this, I will quote Hugh from PATCH v2 1/3:
>>
>>> We do already use the ZERO_PAGE instead of allocating when it's a
>>> simple read; and on the face of it, we could extend that to mmap
>>> once the file is sealed.  But I am rather afraid to do so - for
>>> many years there was an mmap /dev/zero case which did that, but
>>> it was an easily forgotten case which caught us out at least
>>> once, so I'm reluctant to reintroduce it now for sealing.
>>>
>>> Anyway, I don't expect you to resolve the issue of sealed holes:
>>> that's very much my territory, to give you support on.
>>
>> Holes can be avoided with a simple fallocate(). I don't understand why
>> I should make SEAL_WRITE do the fallocate for the caller. During the
>> discussion of memfd_create() I was told to drop the size parameter,
>> because it is redundant. I don't see how this implicit fallocate()
>> does not fall into the same category?
>
> I'm really confused now.
>
> If I SEAL_WRITE a file, and then I mmap it PROT_READ, and then I read
> it, is that a simple read?  If so, doesn't that mean that there's no
> problem?

I assumed Hugh was talking about read(). So no, this is not about
memory-reads on mmap()ed regions.

Looking at shmem_file_read_iter() I can see a ZERO_PAGE(0) call in
case shmem_getpage_gfp(SGP_READ) tells us there's a hole. I cannot see
anything like that in the mmap_region() and shmem_fault() paths.

Thanks
David


Re: [PATCH v3 0/7] File Sealing & memfd_create()

2014-06-17 Thread Andy Lutomirski
On Tue, Jun 17, 2014 at 9:51 AM, David Herrmann <dh.herrm...@gmail.com> wrote:
> Hi
>
> On Tue, Jun 17, 2014 at 6:41 PM, Andy Lutomirski <l...@amacapital.net> wrote:
>> On Tue, Jun 17, 2014 at 9:36 AM, David Herrmann <dh.herrm...@gmail.com>
>> wrote:
>>> Hi
>>>
>>> On Tue, Jun 17, 2014 at 6:20 PM, Andy Lutomirski <l...@amacapital.net>
>>> wrote:
>>>> Can you summarize why holes can't be reliably backed by the zero page?
>>>
>>> To answer this, I will quote Hugh from PATCH v2 1/3:
>>>
>>>> We do already use the ZERO_PAGE instead of allocating when it's a
>>>> simple read; and on the face of it, we could extend that to mmap
>>>> once the file is sealed.  But I am rather afraid to do so - for
>>>> many years there was an mmap /dev/zero case which did that, but
>>>> it was an easily forgotten case which caught us out at least
>>>> once, so I'm reluctant to reintroduce it now for sealing.
>>>>
>>>> Anyway, I don't expect you to resolve the issue of sealed holes:
>>>> that's very much my territory, to give you support on.
>>>
>>> Holes can be avoided with a simple fallocate(). I don't understand why
>>> I should make SEAL_WRITE do the fallocate for the caller. During the
>>> discussion of memfd_create() I was told to drop the size parameter,
>>> because it is redundant. I don't see how this implicit fallocate()
>>> does not fall into the same category?
>>
>> I'm really confused now.
>>
>> If I SEAL_WRITE a file, and then I mmap it PROT_READ, and then I read
>> it, is that a simple read?  If so, doesn't that mean that there's no
>> problem?
>
> I assumed Hugh was talking about read(). So no, this is not about
> memory-reads on mmap()ed regions.
>
> Looking at shmem_file_read_iter() I can see a ZERO_PAGE(0) call in
> case shmem_getpage_gfp(SGP_READ) tells us there's a hole. I cannot see
> anything like that in the mmap_region() and shmem_fault() paths.

Would it be easy to fix this just for SEAL_WRITE files?  Hugh?

This would make the interface much nicer, IMO.

--Andy


> Thanks
> David



-- 
Andy Lutomirski
AMA Capital Management, LLC


Re: [PATCH v3 0/7] File Sealing & memfd_create()

2014-06-17 Thread Hugh Dickins
On Tue, 17 Jun 2014, Andy Lutomirski wrote:
> On Tue, Jun 17, 2014 at 9:51 AM, David Herrmann <dh.herrm...@gmail.com> wrote:
>> On Tue, Jun 17, 2014 at 6:41 PM, Andy Lutomirski <l...@amacapital.net>
>> wrote:
>>> On Tue, Jun 17, 2014 at 9:36 AM, David Herrmann <dh.herrm...@gmail.com>
>>> wrote:
>>>> On Tue, Jun 17, 2014 at 6:20 PM, Andy Lutomirski <l...@amacapital.net>
>>>> wrote:
>>>>> Can you summarize why holes can't be reliably backed by the zero page?
>>>>
>>>> To answer this, I will quote Hugh from PATCH v2 1/3:
>>>>
>>>>> We do already use the ZERO_PAGE instead of allocating when it's a
>>>>> simple read; and on the face of it, we could extend that to mmap
>>>>> once the file is sealed.  But I am rather afraid to do so - for
>>>>> many years there was an mmap /dev/zero case which did that, but
>>>>> it was an easily forgotten case which caught us out at least
>>>>> once, so I'm reluctant to reintroduce it now for sealing.
>>>>>
>>>>> Anyway, I don't expect you to resolve the issue of sealed holes:
>>>>> that's very much my territory, to give you support on.
>>>>
>>>> Holes can be avoided with a simple fallocate(). I don't understand why
>>>> I should make SEAL_WRITE do the fallocate for the caller. During the
>>>> discussion of memfd_create() I was told to drop the size parameter,
>>>> because it is redundant. I don't see how this implicit fallocate()
>>>> does not fall into the same category?
>>>
>>> I'm really confused now.
>>>
>>> If I SEAL_WRITE a file, and then I mmap it PROT_READ, and then I read
>>> it, is that a simple read?  If so, doesn't that mean that there's no
>>> problem?
>>
>> I assumed Hugh was talking about read(). So no, this is not about
>> memory-reads on mmap()ed regions.
>>
>> Looking at shmem_file_read_iter() I can see a ZERO_PAGE(0) call in
>> case shmem_getpage_gfp(SGP_READ) tells us there's a hole. I cannot see
>> anything like that in the mmap_region() and shmem_fault() paths.
>
> Would it be easy to fix this just for SEAL_WRITE files?  Hugh?
>
> This would make the interface much nicer, IMO.

I do agree with you, Andy.

I agree with David that a fallocate (of the fill-in-holes variety)
does not have to be prohibited on a sealed file, that detection of
holes is not an issue with respect to sealing, and that fallocate
by the recipient could be used to post-seal the object to safety.

But it doesn't feel right, and we shall be re-explaining and apologizing
for it for months to come, until we just fix it.  I suspect David didn't
want to add a dependency upon me to fix it, and I didn't want to be
rushed into fixing it (nor is it a job I'd be comfortable to delegate).

I'll give it more thought.  The problem is that there may be a variety
of codepaths, in mm/shmem.c but more seriously outside it, which expect
an appropriate page-mapping and page-index on any page of a shared
mapping, and will be buggily surprised to find a ZERO_PAGE instead.
I'll have to go through carefully.  Splice may be more difficult to
audit than fault, I don't very often have to think about it.

And though I'd prefer to do the same for non-sealed as for sealed, it
may make more sense in the short term just to address the sealed case,
as you suggest.  In the unsealed case, first write to a page entails
locating all the places where the ZERO_PAGE had previously been mapped,
and replacing it there by the newly allocated page; might depend on
VM_NONLINEAR removal, and might entail page_mkwrite().  Doing just
the sealed is easier, though the half-complete job will annoy me.

I did refresh my memory of the /dev/zero case that had particularly
worried me: it was stranger than I'd thought, that reading from
/dev/zero could insert ZERO_PAGEs into mappings of other files.
Nick put an end to that in 2.6.24, but perhaps its prior existence
helps give assurance that ZERO_PAGE in surprising places is less
trouble than I fear (it did force XIP into having its own zero_page,
but I don't remember other complications).

Hugh


Re: [PATCH v3 0/7] File Sealing & memfd_create()

2014-06-17 Thread Andy Lutomirski
On Tue, Jun 17, 2014 at 1:31 PM, Hugh Dickins <hu...@google.com> wrote:
> On Tue, 17 Jun 2014, Andy Lutomirski wrote:
>> On Tue, Jun 17, 2014 at 9:51 AM, David Herrmann <dh.herrm...@gmail.com>
>> wrote:
>>> On Tue, Jun 17, 2014 at 6:41 PM, Andy Lutomirski <l...@amacapital.net>
>>> wrote:
>>>> On Tue, Jun 17, 2014 at 9:36 AM, David Herrmann <dh.herrm...@gmail.com>
>>>> wrote:
>>>>> On Tue, Jun 17, 2014 at 6:20 PM, Andy Lutomirski <l...@amacapital.net>
>>>>> wrote:
>>>>>> Can you summarize why holes can't be reliably backed by the zero page?
>>>>>
>>>>> To answer this, I will quote Hugh from PATCH v2 1/3:
>>>>>
>>>>>> We do already use the ZERO_PAGE instead of allocating when it's a
>>>>>> simple read; and on the face of it, we could extend that to mmap
>>>>>> once the file is sealed.  But I am rather afraid to do so - for
>>>>>> many years there was an mmap /dev/zero case which did that, but
>>>>>> it was an easily forgotten case which caught us out at least
>>>>>> once, so I'm reluctant to reintroduce it now for sealing.
>>>>>>
>>>>>> Anyway, I don't expect you to resolve the issue of sealed holes:
>>>>>> that's very much my territory, to give you support on.
>>>>>
>>>>> Holes can be avoided with a simple fallocate(). I don't understand why
>>>>> I should make SEAL_WRITE do the fallocate for the caller. During the
>>>>> discussion of memfd_create() I was told to drop the size parameter,
>>>>> because it is redundant. I don't see how this implicit fallocate()
>>>>> does not fall into the same category?
>>>>
>>>> I'm really confused now.
>>>>
>>>> If I SEAL_WRITE a file, and then I mmap it PROT_READ, and then I read
>>>> it, is that a simple read?  If so, doesn't that mean that there's no
>>>> problem?
>>>
>>> I assumed Hugh was talking about read(). So no, this is not about
>>> memory-reads on mmap()ed regions.
>>>
>>> Looking at shmem_file_read_iter() I can see a ZERO_PAGE(0) call in
>>> case shmem_getpage_gfp(SGP_READ) tells us there's a hole. I cannot see
>>> anything like that in the mmap_region() and shmem_fault() paths.
>>
>> Would it be easy to fix this just for SEAL_WRITE files?  Hugh?
>>
>> This would make the interface much nicer, IMO.
>
> I do agree with you, Andy.
>
> I agree with David that a fallocate (of the fill-in-holes variety)
> does not have to be prohibited on a sealed file, that detection of
> holes is not an issue with respect to sealing, and that fallocate
> by the recipient could be used to post-seal the object to safety.
>
> But it doesn't feel right, and we shall be re-explaining and apologizing
> for it for months to come, until we just fix it.  I suspect David didn't
> want to add a dependency upon me to fix it, and I didn't want to be
> rushed into fixing it (nor is it a job I'd be comfortable to delegate).

I suppose it would be possible to merge memfd_create as is, and then
to fix the zero page thing and make fallocate on a SEAL_WRITEd file be
a no-op.  It would be silly for code to fallocate actual
sealed-with-holes files and allocate fresh pages that are guaranteed
to only ever contain zeros.

--Andy


Re: [PATCH v3 0/7] File Sealing & memfd_create()

2014-06-13 Thread David Herrmann
Hi

On Fri, Jun 13, 2014 at 5:17 PM, Andy Lutomirski  wrote:
> On Fri, Jun 13, 2014 at 8:15 AM, David Herrmann  wrote:
>> Hi
>>
>> On Fri, Jun 13, 2014 at 5:10 PM, Andy Lutomirski  wrote:
>>> On Fri, Jun 13, 2014 at 3:36 AM, David Herrmann wrote:
>>>> Hi
>>>>
>>>> This is v3 of the File-Sealing and memfd_create() patches. You can find v1 with
>>>> a longer introduction at gmane:
>>>>   http://thread.gmane.org/gmane.comp.video.dri.devel/102241
>>>> An LWN article about memfd+sealing is available, too:
>>>>   https://lwn.net/Articles/593918/
>>>> v2 with some more discussions can be found here:
>>>>   http://thread.gmane.org/gmane.linux.kernel.mm/115713
>>>>
>>>> This series introduces two new APIs:
>>>>   memfd_create(): Think of this syscall as malloc() but it returns a
>>>>                   file-descriptor instead of a pointer. That file-descriptor is
>>>>                   backed by anon-memory and can be memory-mapped for access.
>>>>   sealing: The sealing API can be used to prevent a specific set of operations
>>>>            on a file-descriptor. You 'seal' the file and give thus the
>>>>            guarantee, that it cannot be modified in the specific ways.
>>>>
>>>> A short high-level introduction is also available here:
>>>>   http://dvdhrm.wordpress.com/2014/06/10/memfd_create2/
>>>
>>> Potentially silly question: is it guaranteed that mmapping and reading
>>> a SEAL_SHRINKed fd within size bounds will not SIGBUS?  If so, should
>>> this be documented?  (The particular issue here would be reading
>>> holes.  It should work by using the zero page, but, if so, we should
>>> probably make it a real documented guarantee.)
>>
>> No, this is not guaranteed. See the previous discussion in v2 on Patch
>> 2/4 between Hugh and me.
>>
>> Summary is: If you want mmap-reads to not fail, use mlock(). There are
>> many situations where a fault might fail (think: OOM) and sealing is
>> not meant to protect against that. Btw., holes are automatically
>> filled with fresh pages by shmem. So a read only fails in OOM
>> situations (or memcg limits, etc.).
>>
>
> Isn't the point of SEAL_SHRINK to allow servers to mmap and read
> safely without worrying about SIGBUS?

No, I don't think so.
The point of SEAL_SHRINK is to prevent a file from shrinking. SIGBUS
is an effect, not a cause. It's only a coincidence that "OOM during
reads" and "reading beyond file-boundaries" have the same effect:
SIGBUS.
We only protect against reading beyond file-boundaries due to
shrinking. Therefore, OOM-SIGBUS is unrelated to SEAL_SHRINK.

Anyone dealing with mmap() _has_ to use mlock() to protect against
OOM-SIGBUS. Making SEAL_SHRINK protect against OOM-SIGBUS would be
redundant, because you can achieve the same with SEAL_SHRINK+mlock().

Thanks
David


Re: [PATCH v3 0/7] File Sealing & memfd_create()

2014-06-13 Thread Andy Lutomirski
On Fri, Jun 13, 2014 at 8:15 AM, David Herrmann  wrote:
> Hi
>
> On Fri, Jun 13, 2014 at 5:10 PM, Andy Lutomirski  wrote:
>> On Fri, Jun 13, 2014 at 3:36 AM, David Herrmann wrote:
>>> Hi
>>>
>>> This is v3 of the File-Sealing and memfd_create() patches. You can find v1 with
>>> a longer introduction at gmane:
>>>   http://thread.gmane.org/gmane.comp.video.dri.devel/102241
>>> An LWN article about memfd+sealing is available, too:
>>>   https://lwn.net/Articles/593918/
>>> v2 with some more discussions can be found here:
>>>   http://thread.gmane.org/gmane.linux.kernel.mm/115713
>>>
>>> This series introduces two new APIs:
>>>   memfd_create(): Think of this syscall as malloc() but it returns a
>>>                   file-descriptor instead of a pointer. That file-descriptor is
>>>                   backed by anon-memory and can be memory-mapped for access.
>>>   sealing: The sealing API can be used to prevent a specific set of operations
>>>            on a file-descriptor. You 'seal' the file and give thus the
>>>            guarantee, that it cannot be modified in the specific ways.
>>>
>>> A short high-level introduction is also available here:
>>>   http://dvdhrm.wordpress.com/2014/06/10/memfd_create2/
>>
>> Potentially silly question: is it guaranteed that mmapping and reading
>> a SEAL_SHRINKed fd within size bounds will not SIGBUS?  If so, should
>> this be documented?  (The particular issue here would be reading
>> holes.  It should work by using the zero page, but, if so, we should
>> probably make it a real documented guarantee.)
>
> No, this is not guaranteed. See the previous discussion in v2 on Patch
> 2/4 between Hugh and me.
>
> Summary is: If you want mmap-reads to not fail, use mlock(). There are
> many situations where a fault might fail (think: OOM) and sealing is
> not meant to protect against that. Btw., holes are automatically
> filled with fresh pages by shmem. So a read only fails in OOM
> situations (or memcg limits, etc.).
>

Isn't the point of SEAL_SHRINK to allow servers to mmap and read
safely without worrying about SIGBUS?

--Andy

> Thanks
> David



-- 
Andy Lutomirski
AMA Capital Management, LLC


Re: [PATCH v3 0/7] File Sealing & memfd_create()

2014-06-13 Thread David Herrmann
Hi

On Fri, Jun 13, 2014 at 5:10 PM, Andy Lutomirski  wrote:
> On Fri, Jun 13, 2014 at 3:36 AM, David Herrmann  wrote:
>> Hi
>>
>> This is v3 of the File-Sealing and memfd_create() patches. You can find v1 with
>> a longer introduction at gmane:
>>   http://thread.gmane.org/gmane.comp.video.dri.devel/102241
>> An LWN article about memfd+sealing is available, too:
>>   https://lwn.net/Articles/593918/
>> v2 with some more discussions can be found here:
>>   http://thread.gmane.org/gmane.linux.kernel.mm/115713
>>
>> This series introduces two new APIs:
>>   memfd_create(): Think of this syscall as malloc() but it returns a
>>                   file-descriptor instead of a pointer. That file-descriptor is
>>                   backed by anon-memory and can be memory-mapped for access.
>>   sealing: The sealing API can be used to prevent a specific set of operations
>>            on a file-descriptor. You 'seal' the file and give thus the
>>            guarantee, that it cannot be modified in the specific ways.
>>
>> A short high-level introduction is also available here:
>>   http://dvdhrm.wordpress.com/2014/06/10/memfd_create2/
>
> Potentially silly question: is it guaranteed that mmapping and reading
> a SEAL_SHRINKed fd within size bounds will not SIGBUS?  If so, should
> this be documented?  (The particular issue here would be reading
> holes.  It should work by using the zero page, but, if so, we should
> probably make it a real documented guarantee.)

No, this is not guaranteed. See the previous discussion in v2 on Patch
2/4 between Hugh and me.

Summary is: If you want mmap-reads to not fail, use mlock(). There are
many situations where a fault might fail (think: OOM) and sealing is
not meant to protect against that. Btw., holes are automatically
filled with fresh pages by shmem. So a read only fails in OOM
situations (or memcg limits, etc.).

Thanks
David


Re: [PATCH v3 0/7] File Sealing & memfd_create()

2014-06-13 Thread Andy Lutomirski
On Fri, Jun 13, 2014 at 3:36 AM, David Herrmann wrote:
> Hi
>
> This is v3 of the File-Sealing and memfd_create() patches. You can find v1 with
> a longer introduction at gmane:
>   http://thread.gmane.org/gmane.comp.video.dri.devel/102241
> An LWN article about memfd+sealing is available, too:
>   https://lwn.net/Articles/593918/
> v2 with some more discussions can be found here:
>   http://thread.gmane.org/gmane.linux.kernel.mm/115713
>
> This series introduces two new APIs:
>   memfd_create(): Think of this syscall as malloc() but it returns a
>                   file-descriptor instead of a pointer. That file-descriptor is
>                   backed by anon-memory and can be memory-mapped for access.
>   sealing: The sealing API can be used to prevent a specific set of operations
>            on a file-descriptor. You 'seal' the file and give thus the
>            guarantee, that it cannot be modified in the specific ways.
>
> A short high-level introduction is also available here:
>   http://dvdhrm.wordpress.com/2014/06/10/memfd_create2/

Potentially silly question: is it guaranteed that mmapping and reading
a SEAL_SHRINKed fd within size bounds will not SIGBUS?  If so, should
this be documented?  (The particular issue here would be reading
holes.  It should work by using the zero page, but, if so, we should
probably make it a real documented guarantee.)

--Andy


[PATCH v3 0/7] File Sealing & memfd_create()

2014-06-13 Thread David Herrmann
Hi

This is v3 of the File-Sealing and memfd_create() patches. You can find v1 with
a longer introduction at gmane:
  http://thread.gmane.org/gmane.comp.video.dri.devel/102241
An LWN article about memfd+sealing is available, too:
  https://lwn.net/Articles/593918/
v2 with some more discussions can be found here:
  http://thread.gmane.org/gmane.linux.kernel.mm/115713

This series introduces two new APIs:
  memfd_create(): Think of this syscall as malloc() but it returns a
                  file-descriptor instead of a pointer. That file-descriptor is
                  backed by anon-memory and can be memory-mapped for access.
  sealing: The sealing API can be used to prevent a specific set of operations
           on a file-descriptor. You 'seal' the file and give thus the
           guarantee, that it cannot be modified in the specific ways.

A short high-level introduction is also available here:
  http://dvdhrm.wordpress.com/2014/06/10/memfd_create2/
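
For readers who want to see the two APIs side by side, here is a minimal sketch. It assumes the interface as it was eventually merged in Linux 3.17 and wrapped by glibc 2.27 or newer (memfd_create(), MFD_ALLOW_SEALING, F_ADD_SEALS, F_SEAL_SHRINK); the constant names in this v3 posting may differ. create_unshrinkable_memfd() is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create an anonymous memory file of `size` bytes and
 * forbid shrinking it. Returns the file descriptor, or -1 with errno set.
 */
int create_unshrinkable_memfd(const char *name, off_t size)
{
    /* MFD_ALLOW_SEALING opts in to sealing for this file. */
    int fd = memfd_create(name, MFD_CLOEXEC | MFD_ALLOW_SEALING);
    if (fd < 0)
        return -1;

    /* memfd_create() takes no size argument, so size the file explicitly,
     * then add the seal that forbids making it smaller again. */
    if (ftruncate(fd, size) < 0 ||
        fcntl(fd, F_ADD_SEALS, F_SEAL_SHRINK) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

With F_SEAL_SHRINK set, a later ftruncate() to a smaller size is expected to fail with EPERM, while growing the file is still allowed because F_SEAL_GROW was not requested.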


Changed in v3:
 - fcntl() now returns EINVAL if the FD does not support sealing. We used to
   return EBADF like pipe_fcntl() does, but that is really weird and I don't
   like repeating that.
 - seals are now saved as "unsigned int" instead of "u32".
 - i_mmap_writable is now an atomic so we can deny writable mappings just like
   i_writecount does.
 - SHMEM_ALLOW_SEALING is dropped. We initialize all objects with F_SEAL_SEAL
   and only unset it for memfds that shall support sealing.
 - memfd_create() no longer has a size argument. It was redundant, use
   ftruncate() or fallocate().
 - memfd_create() flags are "unsigned int" now, instead of "u64".
 - NAME_MAX off-by-one fix
 - several cosmetic changes
 - Added AIO/Direct-IO page-pinning protection
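
Two of the items above are visible from userspace: the EINVAL return for file descriptors that do not support sealing, and the fact that objects start out with F_SEAL_SEAL unless sealing is explicitly allowed. The sketch below assumes the merged interface (F_GET_SEALS, F_ADD_SEALS and MFD_ALLOW_SEALING as exposed by glibc 2.27 or newer) and a kernel with default memfd settings; both function names are hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Query the seals of a pipe, which does not support sealing.
 * Returns the errno of the failed F_GET_SEALS (EINVAL is expected),
 * 0 if the call unexpectedly succeeded, -1 on setup failure.
 */
int get_seals_errno_on_pipe(void)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;
    int err = fcntl(p[1], F_GET_SEALS) < 0 ? errno : 0;
    close(p[0]);
    close(p[1]);
    return err;
}

/*
 * Try to seal a memfd created WITHOUT MFD_ALLOW_SEALING.
 * Such a file should already carry F_SEAL_SEAL, so adding any further seal
 * is refused. Returns the errno of the refused F_ADD_SEALS (EPERM is
 * expected), 0 if the seal was accepted, -1 if F_SEAL_SEAL was not set or
 * setup failed.
 */
int plain_memfd_seal_errno(void)
{
    int fd = memfd_create("plain", MFD_CLOEXEC);
    if (fd < 0)
        return -1;

    int err = -1;
    int seals = fcntl(fd, F_GET_SEALS);
    if (seals >= 0 && (seals & F_SEAL_SEAL))
        err = fcntl(fd, F_ADD_SEALS, F_SEAL_SHRINK) < 0 ? errno : 0;

    close(fd);
    return err;
}
```

The second function is the userspace view of dropping SHMEM_ALLOW_SEALING: nothing is sealable unless the creator asked for it.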

The last point is the most important change in this version: We now bail out if
any page-refcount is elevated while setting SEAL_WRITE. This prevents parallel
GUP users from writing to sealed files _after_ they were sealed. There is also a
new FUSE-based test-case to trigger such situations.
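
Elevated page refcounts from AIO or Direct-IO are hard to provoke in a few lines, which is why the series uses a FUSE-based test. A closely related rule is easy to show, though: in the interface as merged, setting the write seal is refused while a shared writable mapping exists. The sketch below assumes that merged interface (F_SEAL_WRITE, EBUSY for live writable mappings, EPERM for writes after sealing); struct seal_write_result and try_seal_write() are hypothetical names.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

struct seal_write_result {
    int while_mapped; /* errno of F_SEAL_WRITE while a shared writable mapping exists */
    int after_unmap;  /* errno of F_SEAL_WRITE once the mapping is gone, 0 on success */
    int write_after;  /* errno of write() after sealing, 0 if the write succeeded */
};

/* Hypothetical helper: returns 0 if all three steps ran, -1 on setup failure. */
int try_seal_write(struct seal_write_result *r)
{
    int fd = memfd_create("seal-write", MFD_CLOEXEC | MFD_ALLOW_SEALING);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, 4096) < 0) {
        close(fd);
        return -1;
    }

    void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* A live writable mapping could modify the file after sealing. */
    r->while_mapped = fcntl(fd, F_ADD_SEALS, F_SEAL_WRITE) < 0 ? errno : 0;

    munmap(p, 4096);

    /* With no writable mapping left, the seal can be set. */
    r->after_unmap = fcntl(fd, F_ADD_SEALS, F_SEAL_WRITE) < 0 ? errno : 0;

    /* Once sealed, ordinary writes are rejected as well. */
    r->write_after = write(fd, "x", 1) < 0 ? errno : 0;

    close(fd);
    return 0;
}
```

On a kernel with the merged sealing support, the three fields are expected to come back as EBUSY, 0 and EPERM respectively.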

The last 2 patches try to improve the page-pinning handling. I included both in
this series, but obviously only one of them is needed (or we could stack them):
 - 6/7: This waits for up to 150ms for pages to be unpinned
 - 7/7: This isolates pinned pages and replaces them with a fresh copy

Hugh, patch 6 is basically your code. In case that gets merged, can I put your
Signed-off-by on it?

I hope I didn't miss anything. Further comments welcome!

Thanks
David

David Herrmann (7):
  mm: allow drivers to prevent new writable mappings
  shm: add sealing API
  shm: add memfd_create() syscall
  selftests: add memfd_create() + sealing tests
  selftests: add memfd/sealing page-pinning tests
  shm: wait for pins to be released when sealing
  shm: isolate pinned pages when sealing files

 arch/x86/syscalls/syscall_32.tbl               |   1 +
 arch/x86/syscalls/syscall_64.tbl               |   1 +
 fs/fcntl.c                                     |   5 +
 fs/inode.c                                     |   1 +
 include/linux/fs.h                             |  29 +-
 include/linux/shmem_fs.h                       |  17 +
 include/linux/syscalls.h                       |   1 +
 include/uapi/linux/fcntl.h                     |  15 +
 include/uapi/linux/memfd.h                     |   8 +
 kernel/fork.c                                  |   2 +-
 kernel/sys_ni.c                                |   1 +
 mm/mmap.c                                      |  24 +-
 mm/shmem.c                                     | 320 -
 mm/swap_state.c                                |   1 +
 tools/testing/selftests/Makefile               |   1 +
 tools/testing/selftests/memfd/.gitignore       |   4 +
 tools/testing/selftests/memfd/Makefile         |  40 ++
 tools/testing/selftests/memfd/fuse_mnt.c       | 110 +++
 tools/testing/selftests/memfd/fuse_test.c      | 311 +
 tools/testing/selftests/memfd/memfd_test.c     | 913 +
 tools/testing/selftests/memfd/run_fuse_test.sh |  14 +
 21 files changed, 1807 insertions(+), 12 deletions(-)
 create mode 100644 include/uapi/linux/memfd.h
 create mode 100644 tools/testing/selftests/memfd/.gitignore
 create mode 100644 tools/testing/selftests/memfd/Makefile
 create mode 100755 tools/testing/selftests/memfd/fuse_mnt.c
 create mode 100644 tools/testing/selftests/memfd/fuse_test.c
 create mode 100644 tools/testing/selftests/memfd/memfd_test.c
 create mode 100755 tools/testing/selftests/memfd/run_fuse_test.sh

-- 
2.0.0



Re: [PATCH v3 0/7] File Sealing & memfd_create()

2014-06-13 Thread Andy Lutomirski
On Fri, Jun 13, 2014 at 8:15 AM, David Herrmann <dh.herrm...@gmail.com> wrote:
> Hi
>
> On Fri, Jun 13, 2014 at 5:10 PM, Andy Lutomirski <l...@amacapital.net> wrote:
>> On Fri, Jun 13, 2014 at 3:36 AM, David Herrmann <dh.herrm...@gmail.com> wrote:
>>> Hi
>>>
>>> This is v3 of the File-Sealing and memfd_create() patches. You can find v1 with
>>> a longer introduction at gmane:
>>>   http://thread.gmane.org/gmane.comp.video.dri.devel/102241
>>> An LWN article about memfd+sealing is available, too:
>>>   https://lwn.net/Articles/593918/
>>> v2 with some more discussions can be found here:
>>>   http://thread.gmane.org/gmane.linux.kernel.mm/115713
>>>
>>> This series introduces two new APIs:
>>>   memfd_create(): Think of this syscall as malloc() but it returns a
>>>                   file-descriptor instead of a pointer. That file-descriptor is
>>>                   backed by anon-memory and can be memory-mapped for access.
>>>   sealing: The sealing API can be used to prevent a specific set of operations
>>>            on a file-descriptor. You 'seal' the file and give thus the
>>>            guarantee, that it cannot be modified in the specific ways.
>>>
>>> A short high-level introduction is also available here:
>>>   http://dvdhrm.wordpress.com/2014/06/10/memfd_create2/
>>
>> Potentially silly question: is it guaranteed that mmapping and reading
>> a SEAL_SHRINKed fd within size bounds will not SIGBUS?  If so, should
>> this be documented?  (The particular issue here would be reading
>> holes.  It should work by using the zero page, but, if so, we should
>> probably make it a real documented guarantee.)
>
> No, this is not guaranteed. See the previous discussion in v2 on Patch
> 2/4 between Hugh and me.
>
> Summary is: If you want mmap-reads to not fail, use mlock(). There are
> many situations where a fault might fail (think: OOM) and sealing is
> not meant to protect against that. Btw., holes are automatically
> filled with fresh pages by shmem. So a read only fails in OOM
> situations (or memcg limits, etc.).
>

Isn't the point of SEAL_SHRINK to allow servers to mmap and read
safely without worrying about SIGBUS?

--Andy

> Thanks
> David



-- 
Andy Lutomirski
AMA Capital Management, LLC


Re: [PATCH v3 0/7] File Sealing & memfd_create()

2014-06-13 Thread David Herrmann
Hi

On Fri, Jun 13, 2014 at 5:17 PM, Andy Lutomirski <l...@amacapital.net> wrote:
> On Fri, Jun 13, 2014 at 8:15 AM, David Herrmann <dh.herrm...@gmail.com> wrote:
>> Hi
>>
>> On Fri, Jun 13, 2014 at 5:10 PM, Andy Lutomirski <l...@amacapital.net> wrote:
>>> On Fri, Jun 13, 2014 at 3:36 AM, David Herrmann <dh.herrm...@gmail.com> wrote:
>>>> Hi
>>>>
>>>> This is v3 of the File-Sealing and memfd_create() patches. You can find v1 with
>>>> a longer introduction at gmane:
>>>>   http://thread.gmane.org/gmane.comp.video.dri.devel/102241
>>>> An LWN article about memfd+sealing is available, too:
>>>>   https://lwn.net/Articles/593918/
>>>> v2 with some more discussions can be found here:
>>>>   http://thread.gmane.org/gmane.linux.kernel.mm/115713
>>>>
>>>> This series introduces two new APIs:
>>>>   memfd_create(): Think of this syscall as malloc() but it returns a
>>>>                   file-descriptor instead of a pointer. That file-descriptor is
>>>>                   backed by anon-memory and can be memory-mapped for access.
>>>>   sealing: The sealing API can be used to prevent a specific set of operations
>>>>            on a file-descriptor. You 'seal' the file and give thus the
>>>>            guarantee, that it cannot be modified in the specific ways.
>>>>
>>>> A short high-level introduction is also available here:
>>>>   http://dvdhrm.wordpress.com/2014/06/10/memfd_create2/
>>>
>>> Potentially silly question: is it guaranteed that mmapping and reading
>>> a SEAL_SHRINKed fd within size bounds will not SIGBUS?  If so, should
>>> this be documented?  (The particular issue here would be reading
>>> holes.  It should work by using the zero page, but, if so, we should
>>> probably make it a real documented guarantee.)
>>
>> No, this is not guaranteed. See the previous discussion in v2 on Patch
>> 2/4 between Hugh and me.
>>
>> Summary is: If you want mmap-reads to not fail, use mlock(). There are
>> many situations where a fault might fail (think: OOM) and sealing is
>> not meant to protect against that. Btw., holes are automatically
>> filled with fresh pages by shmem. So a read only fails in OOM
>> situations (or memcg limits, etc.).
>>
>
> Isn't the point of SEAL_SHRINK to allow servers to mmap and read
> safely without worrying about SIGBUS?

No, I don't think so.
The point of SEAL_SHRINK is to prevent a file from shrinking. SIGBUS
is an effect, not a cause. It's only a coincidence that OOM during
reads and reading beyond file-boundaries has the same effect:
SIGBUS.
We only protect against reading beyond file-boundaries due to
shrinking. Therefore, OOM-SIGBUS is unrelated to SEAL_SHRINK.

Anyone dealing with mmap() _has_ to use mlock() to protect against
OOM-SIGBUS. Making SEAL_SHRINK protect against OOM-SIGBUS would be
redundant, because you can achieve the same with SEAL_SHRINK+mlock().
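
The SEAL_SHRINK+mlock() combination could look roughly like this on the receiving side. This is a sketch under assumptions: it uses the merged interface (F_GET_SEALS, F_SEAL_SHRINK as exposed by glibc 2.27 or newer), the helper name map_sealed_and_locked() is hypothetical, and the errno values it sets for its own checks are arbitrary choices. The seal rules out SIGBUS from a later truncation, and mlock() faults the pages in up front, so an allocation failure shows up as an error return here instead of a signal during a later read.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: map the first `len` bytes of a received fd for
 * reading, but only if the sender can no longer shrink the file, and lock
 * the pages in memory. Returns the mapping, or NULL with errno set.
 */
const void *map_sealed_and_locked(int fd, size_t len)
{
    int seals = fcntl(fd, F_GET_SEALS);
    if (seals < 0)
        return NULL;              /* e.g. EINVAL: fd does not support sealing */
    if (!(seals & F_SEAL_SHRINK)) {
        errno = EPERM;            /* arbitrary choice: file could still shrink */
        return NULL;
    }

    /* The size cannot decrease any more, so this check stays valid. */
    struct stat st;
    if (fstat(fd, &st) < 0)
        return NULL;
    if ((off_t)len > st.st_size) {
        errno = EINVAL;           /* arbitrary choice: request exceeds file size */
        return NULL;
    }

    void *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return NULL;

    /* Fault in and pin the pages now; fails if memory or RLIMIT_MEMLOCK runs out. */
    if (mlock(p, len) < 0) {
        int saved = errno;
        munmap(p, len);
        errno = saved;
        return NULL;
    }
    return p;
}
```

The caller releases the mapping with munmap(), which also drops the lock on those pages.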

Thanks
David