subject:"\[PATCH v3 resend 1\/2\] mm\: Add an F_SEAL_FUTURE

Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-11 Thread Joel Fernandes

On Sun, Nov 11, 2018 at 07:14:33AM -0800, Andy Lutomirski wrote:
[...]
> >> I see two reasonable solutions:
> >> 
> >> 1. Don’t fiddle with the struct file at all. Instead make the 
> >> inode flag
> >> work by itself.
> > 
> > Currently, the various VFS paths check only the struct file's 
> > f_mode to deny
> > writes of already opened files. This would mean more checking in 
> > all those
> > paths (and modification of all those paths).
> > 
> > Anyway going with that idea, we could
> > 1. call deny_write_access(file) from the memfd's seal path which 
> > decrements
> > the inode::i_writecount.
> > 2. call get_write_access(inode) in the various VFS paths in 
> > addition to
> > checking for FMODE_*WRITE and deny the write (incase i_writecount 
> > is negative)
> > 
> > That will prevent both reopens, and writes from succeeding. However 
> > I worry a
> > bit about 2 not being too familiar with VFS internals, about what 
> > the
> > consequences of doing that may be.
>  
>  IMHO, modifying both the inode and the struct file separately is 
>  fine,
>  since they mean different things. In regular filesystems, it's fine 
>  to
>  have a read-write open file description for a file whose inode grants
>  write permission to nobody. Speaking of which: is fchmod enough to
>  prevent this attack?
> >>> 
> >>> Well, yes and no. fchmod does prevent reopening the file RW, but
> >>> anyone with permissions (owner, CAP_FOWNER) can just fchmod it back. A
> >>> seal is supposed to be irrevocable, so fchmod-as-inode-seal probably
> >>> isn't sufficient by itself. While it might be good enough for Android
> >>> (in the sense that it'll prevent RW-reopens from other security
> >>> contexts to which we send an open memfd file), it's still conceptually
> >>> ugly, IMHO. Let's go with the original approach of just tweaking the
> >>> inode so that open-for-write is permanently blocked.
> >> 
> >> Agreed with the idea of modifying both file and inode flags. I was 
> >> thinking
> >> modifying i_mode may do the trick but as you pointed it probably could 
> >> be
> >> reverted by chmod or some other attribute setting calls.
> >> 
> >> OTOH, I don't think deny_write_access(file) can be reverted from any
> >> user-facing path so we could do that from the seal to prevent the 
> >> future
> >> opens in write mode. I'll double check and test that out tomorrow.
> >> 
> >> 
> > 
> > This seems considerably more complicated and more fragile than needed. 
> > Just
> > add a new F_SEAL_WRITE_FUTURE.  Grep for F_SEAL_WRITE and make the 
> > _FUTURE
> > variant work exactly like it with two exceptions:
> > 
> > - shmem_mmap and maybe its hugetlbfs equivalent should check for it and 
> > act
> > accordingly.
>  
>  There's more to it than that, we also need to block future writes through
>  write syscall, so we have to hook into the write path too once the seal 
>  is
>  set, not just the mmap. That means we have to add code in mm/shmem.c to 
>  do
>  that in all those handlers, to check for the seal (and hope we didn't 
>  miss a
>  file_operations handler). Is that what you are proposing?
> >>> 
> >>> The existing code already does this. That’s why I suggested grepping :)
> >>> 
>  
>  Also, it means we have to keep CONFIG_TMPFS enabled so that the
>  shmem_file_operations write handlers like write_iter are hooked up. 
>  Currently
>  memfd works even with !CONFIG_TMPFS.
> >>> 
> >>> If so, that sounds like it may already be a bug.
> > 
> > Why shouldn't memfd work independently of CONFIG_TMPFS? In particular,
> > write(2) on tmpfs FDs shouldn't work differently. If it does, that's a
> > kernel implementation detail leaking out into userspace.
> > 
> > - add_seals won’t need the wait_for_pins and mapping_deny_write logic.
> > 
> > That really should be all that’s needed.
>  
>  It seems a fair idea what you're saying. But I don't see how its less
>  complex.. IMO its far more simple to have VFS do the denial of the 
>  operations
>  based on the flags of its datastructures.. and if it works (which I will 
>  test
>  to be sure it will), then we should be good.
> >>> 
> >>> I agree it’s complicated, but the code is already written.  You should 
> >>> just
> >>> need to adjust some masks.
> >>> 
> >> 
> >> Its actually not that bad and a great idea, I did something like the
> >> following and it works pretty well. I would say its cleaner than the old
> >> approach for sure (and I also added a /proc/pid/fd/N reopen test to the
> >> selftest and made sure

Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-11 Thread Joel Fernandes

On Sun, Nov 11, 2018 at 07:14:33AM -0800, Andy Lutomirski wrote:
[...]
> >> I see two reasonable solutions:
> >> 
> >> 1. Don’t fiddle with the struct file at all. Instead make the 
> >> inode flag
> >> work by itself.
> > 
> > Currently, the various VFS paths check only the struct file's 
> > f_mode to deny
> > writes of already opened files. This would mean more checking in 
> > all those
> > paths (and modification of all those paths).
> > 
> > Anyway going with that idea, we could
> > 1. call deny_write_access(file) from the memfd's seal path which 
> > decrements
> > the inode::i_writecount.
> > 2. call get_write_access(inode) in the various VFS paths in 
> > addition to
> > checking for FMODE_*WRITE and deny the write (incase i_writecount 
> > is negative)
> > 
> > That will prevent both reopens, and writes from succeeding. However 
> > I worry a
> > bit about 2 not being too familiar with VFS internals, about what 
> > the
> > consequences of doing that may be.
>  
>  IMHO, modifying both the inode and the struct file separately is 
>  fine,
>  since they mean different things. In regular filesystems, it's fine 
>  to
>  have a read-write open file description for a file whose inode grants
>  write permission to nobody. Speaking of which: is fchmod enough to
>  prevent this attack?
> >>> 
> >>> Well, yes and no. fchmod does prevent reopening the file RW, but
> >>> anyone with permissions (owner, CAP_FOWNER) can just fchmod it back. A
> >>> seal is supposed to be irrevocable, so fchmod-as-inode-seal probably
> >>> isn't sufficient by itself. While it might be good enough for Android
> >>> (in the sense that it'll prevent RW-reopens from other security
> >>> contexts to which we send an open memfd file), it's still conceptually
> >>> ugly, IMHO. Let's go with the original approach of just tweaking the
> >>> inode so that open-for-write is permanently blocked.
> >> 
> >> Agreed with the idea of modifying both file and inode flags. I was 
> >> thinking
> >> modifying i_mode may do the trick but as you pointed it probably could 
> >> be
> >> reverted by chmod or some other attribute setting calls.
> >> 
> >> OTOH, I don't think deny_write_access(file) can be reverted from any
> >> user-facing path so we could do that from the seal to prevent the 
> >> future
> >> opens in write mode. I'll double check and test that out tomorrow.
> >> 
> >> 
> > 
> > This seems considerably more complicated and more fragile than needed. 
> > Just
> > add a new F_SEAL_WRITE_FUTURE.  Grep for F_SEAL_WRITE and make the 
> > _FUTURE
> > variant work exactly like it with two exceptions:
> > 
> > - shmem_mmap and maybe its hugetlbfs equivalent should check for it and 
> > act
> > accordingly.
>  
>  There's more to it than that, we also need to block future writes through
>  write syscall, so we have to hook into the write path too once the seal 
>  is
>  set, not just the mmap. That means we have to add code in mm/shmem.c to 
>  do
>  that in all those handlers, to check for the seal (and hope we didn't 
>  miss a
>  file_operations handler). Is that what you are proposing?
> >>> 
> >>> The existing code already does this. That’s why I suggested grepping :)
> >>> 
>  
>  Also, it means we have to keep CONFIG_TMPFS enabled so that the
>  shmem_file_operations write handlers like write_iter are hooked up. 
>  Currently
>  memfd works even with !CONFIG_TMPFS.
> >>> 
> >>> If so, that sounds like it may already be a bug.
> > 
> > Why shouldn't memfd work independently of CONFIG_TMPFS? In particular,
> > write(2) on tmpfs FDs shouldn't work differently. If it does, that's a
> > kernel implementation detail leaking out into userspace.
> > 
> > - add_seals won’t need the wait_for_pins and mapping_deny_write logic.
> > 
> > That really should be all that’s needed.
>  
>  It seems a fair idea what you're saying. But I don't see how its less
>  complex.. IMO its far more simple to have VFS do the denial of the 
>  operations
>  based on the flags of its datastructures.. and if it works (which I will 
>  test
>  to be sure it will), then we should be good.
> >>> 
> >>> I agree it’s complicated, but the code is already written.  You should 
> >>> just
> >>> need to adjust some masks.
> >>> 
> >> 
> >> Its actually not that bad and a great idea, I did something like the
> >> following and it works pretty well. I would say its cleaner than the old
> >> approach for sure (and I also added a /proc/pid/fd/N reopen test to the
> >> selftest and made sure

Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-11 Thread Andy Lutomirski





> On Nov 11, 2018, at 12:30 AM, Daniel Colascione  wrote:
> 
>> On Sun, Nov 11, 2018 at 12:09 AM, Joel Fernandes  
>> wrote:
>> On Sat, Nov 10, 2018 at 07:40:10PM -0800, Andy Lutomirski wrote:
>> [...]
>> I see two reasonable solutions:
>> 
>> 1. Don’t fiddle with the struct file at all. Instead make the inode 
>> flag
>> work by itself.
> 
> Currently, the various VFS paths check only the struct file's f_mode 
> to deny
> writes of already opened files. This would mean more checking in all 
> those
> paths (and modification of all those paths).
> 
> Anyway going with that idea, we could
> 1. call deny_write_access(file) from the memfd's seal path which 
> decrements
> the inode::i_writecount.
> 2. call get_write_access(inode) in the various VFS paths in addition 
> to
> checking for FMODE_*WRITE and deny the write (incase i_writecount is 
> negative)
> 
> That will prevent both reopens, and writes from succeeding. However I 
> worry a
> bit about 2 not being too familiar with VFS internals, about what the
> consequences of doing that may be.
 
 IMHO, modifying both the inode and the struct file separately is fine,
 since they mean different things. In regular filesystems, it's fine to
 have a read-write open file description for a file whose inode grants
 write permission to nobody. Speaking of which: is fchmod enough to
 prevent this attack?
>>> 
>>> Well, yes and no. fchmod does prevent reopening the file RW, but
>>> anyone with permissions (owner, CAP_FOWNER) can just fchmod it back. A
>>> seal is supposed to be irrevocable, so fchmod-as-inode-seal probably
>>> isn't sufficient by itself. While it might be good enough for Android
>>> (in the sense that it'll prevent RW-reopens from other security
>>> contexts to which we send an open memfd file), it's still conceptually
>>> ugly, IMHO. Let's go with the original approach of just tweaking the
>>> inode so that open-for-write is permanently blocked.
>> 
>> Agreed with the idea of modifying both file and inode flags. I was 
>> thinking
>> modifying i_mode may do the trick but as you pointed it probably could be
>> reverted by chmod or some other attribute setting calls.
>> 
>> OTOH, I don't think deny_write_access(file) can be reverted from any
>> user-facing path so we could do that from the seal to prevent the future
>> opens in write mode. I'll double check and test that out tomorrow.
>> 
>> 
> 
> This seems considerably more complicated and more fragile than needed. 
> Just
> add a new F_SEAL_WRITE_FUTURE.  Grep for F_SEAL_WRITE and make the _FUTURE
> variant work exactly like it with two exceptions:
> 
> - shmem_mmap and maybe its hugetlbfs equivalent should check for it and 
> act
> accordingly.
 
 There's more to it than that, we also need to block future writes through
 write syscall, so we have to hook into the write path too once the seal is
 set, not just the mmap. That means we have to add code in mm/shmem.c to do
 that in all those handlers, to check for the seal (and hope we didn't miss 
 a
 file_operations handler). Is that what you are proposing?
>>> 
>>> The existing code already does this. That’s why I suggested grepping :)
>>> 
 
 Also, it means we have to keep CONFIG_TMPFS enabled so that the
 shmem_file_operations write handlers like write_iter are hooked up. 
 Currently
 memfd works even with !CONFIG_TMPFS.
>>> 
>>> If so, that sounds like it may already be a bug.
> 
> Why shouldn't memfd work independently of CONFIG_TMPFS? In particular,
> write(2) on tmpfs FDs shouldn't work differently. If it does, that's a
> kernel implementation detail leaking out into userspace.
> 
> - add_seals won’t need the wait_for_pins and mapping_deny_write logic.
> 
> That really should be all that’s needed.
 
 It seems a fair idea what you're saying. But I don't see how its less
 complex.. IMO its far more simple to have VFS do the denial of the 
 operations
 based on the flags of its datastructures.. and if it works (which I will 
 test
 to be sure it will), then we should be good.
>>> 
>>> I agree it’s complicated, but the code is already written.  You should just
>>> need to adjust some masks.
>>> 
>> 
>> Its actually not that bad and a great idea, I did something like the
>> following and it works pretty well. I would say its cleaner than the old
>> approach for sure (and I also added a /proc/pid/fd/N reopen test to the
>> selftest and made sure that issue goes away).
>> 
>> Side note: One subtelty I discovered from the existing selftests is once the
>> F_SEAL_WRITE are active, an mmap of PROT_READ

Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-11 Thread Andy Lutomirski





> On Nov 11, 2018, at 12:30 AM, Daniel Colascione  wrote:
> 
>> On Sun, Nov 11, 2018 at 12:09 AM, Joel Fernandes  
>> wrote:
>> On Sat, Nov 10, 2018 at 07:40:10PM -0800, Andy Lutomirski wrote:
>> [...]
>> I see two reasonable solutions:
>> 
>> 1. Don’t fiddle with the struct file at all. Instead make the inode 
>> flag
>> work by itself.
> 
> Currently, the various VFS paths check only the struct file's f_mode 
> to deny
> writes of already opened files. This would mean more checking in all 
> those
> paths (and modification of all those paths).
> 
> Anyway going with that idea, we could
> 1. call deny_write_access(file) from the memfd's seal path which 
> decrements
> the inode::i_writecount.
> 2. call get_write_access(inode) in the various VFS paths in addition 
> to
> checking for FMODE_*WRITE and deny the write (incase i_writecount is 
> negative)
> 
> That will prevent both reopens, and writes from succeeding. However I 
> worry a
> bit about 2 not being too familiar with VFS internals, about what the
> consequences of doing that may be.
 
 IMHO, modifying both the inode and the struct file separately is fine,
 since they mean different things. In regular filesystems, it's fine to
 have a read-write open file description for a file whose inode grants
 write permission to nobody. Speaking of which: is fchmod enough to
 prevent this attack?
>>> 
>>> Well, yes and no. fchmod does prevent reopening the file RW, but
>>> anyone with permissions (owner, CAP_FOWNER) can just fchmod it back. A
>>> seal is supposed to be irrevocable, so fchmod-as-inode-seal probably
>>> isn't sufficient by itself. While it might be good enough for Android
>>> (in the sense that it'll prevent RW-reopens from other security
>>> contexts to which we send an open memfd file), it's still conceptually
>>> ugly, IMHO. Let's go with the original approach of just tweaking the
>>> inode so that open-for-write is permanently blocked.
>> 
>> Agreed with the idea of modifying both file and inode flags. I was 
>> thinking
>> modifying i_mode may do the trick but as you pointed it probably could be
>> reverted by chmod or some other attribute setting calls.
>> 
>> OTOH, I don't think deny_write_access(file) can be reverted from any
>> user-facing path so we could do that from the seal to prevent the future
>> opens in write mode. I'll double check and test that out tomorrow.
>> 
>> 
> 
> This seems considerably more complicated and more fragile than needed. 
> Just
> add a new F_SEAL_WRITE_FUTURE.  Grep for F_SEAL_WRITE and make the _FUTURE
> variant work exactly like it with two exceptions:
> 
> - shmem_mmap and maybe its hugetlbfs equivalent should check for it and 
> act
> accordingly.
 
 There's more to it than that, we also need to block future writes through
 write syscall, so we have to hook into the write path too once the seal is
 set, not just the mmap. That means we have to add code in mm/shmem.c to do
 that in all those handlers, to check for the seal (and hope we didn't miss 
 a
 file_operations handler). Is that what you are proposing?
>>> 
>>> The existing code already does this. That’s why I suggested grepping :)
>>> 
 
 Also, it means we have to keep CONFIG_TMPFS enabled so that the
 shmem_file_operations write handlers like write_iter are hooked up. 
 Currently
 memfd works even with !CONFIG_TMPFS.
>>> 
>>> If so, that sounds like it may already be a bug.
> 
> Why shouldn't memfd work independently of CONFIG_TMPFS? In particular,
> write(2) on tmpfs FDs shouldn't work differently. If it does, that's a
> kernel implementation detail leaking out into userspace.
> 
> - add_seals won’t need the wait_for_pins and mapping_deny_write logic.
> 
> That really should be all that’s needed.
 
 It seems a fair idea what you're saying. But I don't see how its less
 complex.. IMO its far more simple to have VFS do the denial of the 
 operations
 based on the flags of its datastructures.. and if it works (which I will 
 test
 to be sure it will), then we should be good.
>>> 
>>> I agree it’s complicated, but the code is already written.  You should just
>>> need to adjust some masks.
>>> 
>> 
>> Its actually not that bad and a great idea, I did something like the
>> following and it works pretty well. I would say its cleaner than the old
>> approach for sure (and I also added a /proc/pid/fd/N reopen test to the
>> selftest and made sure that issue goes away).
>> 
>> Side note: One subtelty I discovered from the existing selftests is once the
>> F_SEAL_WRITE are active, an mmap of PROT_READ

Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-11 Thread Daniel Colascione

On Sun, Nov 11, 2018 at 12:09 AM, Joel Fernandes  wrote:
> On Sat, Nov 10, 2018 at 07:40:10PM -0800, Andy Lutomirski wrote:
> [...]
>> >>> I see two reasonable solutions:
>> >>>
>> >>> 1. Don’t fiddle with the struct file at all. Instead make the inode 
>> >>> flag
>> >>> work by itself.
>> >>
>> >> Currently, the various VFS paths check only the struct file's f_mode 
>> >> to deny
>> >> writes of already opened files. This would mean more checking in all 
>> >> those
>> >> paths (and modification of all those paths).
>> >>
>> >> Anyway going with that idea, we could
>> >> 1. call deny_write_access(file) from the memfd's seal path which 
>> >> decrements
>> >> the inode::i_writecount.
>> >> 2. call get_write_access(inode) in the various VFS paths in addition 
>> >> to
>> >> checking for FMODE_*WRITE and deny the write (incase i_writecount is 
>> >> negative)
>> >>
>> >> That will prevent both reopens, and writes from succeeding. However I 
>> >> worry a
>> >> bit about 2 not being too familiar with VFS internals, about what the
>> >> consequences of doing that may be.
>> >
>> > IMHO, modifying both the inode and the struct file separately is fine,
>> > since they mean different things. In regular filesystems, it's fine to
>> > have a read-write open file description for a file whose inode grants
>> > write permission to nobody. Speaking of which: is fchmod enough to
>> > prevent this attack?
>> 
>>  Well, yes and no. fchmod does prevent reopening the file RW, but
>>  anyone with permissions (owner, CAP_FOWNER) can just fchmod it back. A
>>  seal is supposed to be irrevocable, so fchmod-as-inode-seal probably
>>  isn't sufficient by itself. While it might be good enough for Android
>>  (in the sense that it'll prevent RW-reopens from other security
>>  contexts to which we send an open memfd file), it's still conceptually
>>  ugly, IMHO. Let's go with the original approach of just tweaking the
>>  inode so that open-for-write is permanently blocked.
>> >>>
>> >>> Agreed with the idea of modifying both file and inode flags. I was 
>> >>> thinking
>> >>> modifying i_mode may do the trick but as you pointed it probably could be
>> >>> reverted by chmod or some other attribute setting calls.
>> >>>
>> >>> OTOH, I don't think deny_write_access(file) can be reverted from any
>> >>> user-facing path so we could do that from the seal to prevent the future
>> >>> opens in write mode. I'll double check and test that out tomorrow.
>> >>>
>> >>>
>> >>
>> >> This seems considerably more complicated and more fragile than needed. 
>> >> Just
>> >> add a new F_SEAL_WRITE_FUTURE.  Grep for F_SEAL_WRITE and make the _FUTURE
>> >> variant work exactly like it with two exceptions:
>> >>
>> >> - shmem_mmap and maybe its hugetlbfs equivalent should check for it and 
>> >> act
>> >> accordingly.
>> >
>> > There's more to it than that, we also need to block future writes through
>> > write syscall, so we have to hook into the write path too once the seal is
>> > set, not just the mmap. That means we have to add code in mm/shmem.c to do
>> > that in all those handlers, to check for the seal (and hope we didn't miss 
>> > a
>> > file_operations handler). Is that what you are proposing?
>>
>> The existing code already does this. That’s why I suggested grepping :)
>>
>> >
>> > Also, it means we have to keep CONFIG_TMPFS enabled so that the
>> > shmem_file_operations write handlers like write_iter are hooked up. 
>> > Currently
>> > memfd works even with !CONFIG_TMPFS.
>>
>> If so, that sounds like it may already be a bug.

Why shouldn't memfd work independently of CONFIG_TMPFS? In particular,
write(2) on tmpfs FDs shouldn't work differently. If it does, that's a
kernel implementation detail leaking out into userspace.

>> >> - add_seals won’t need the wait_for_pins and mapping_deny_write logic.
>> >>
>> >> That really should be all that’s needed.
>> >
>> > It seems a fair idea what you're saying. But I don't see how its less
>> > complex.. IMO its far more simple to have VFS do the denial of the 
>> > operations
>> > based on the flags of its datastructures.. and if it works (which I will 
>> > test
>> > to be sure it will), then we should be good.
>>
>> I agree it’s complicated, but the code is already written.  You should just
>> need to adjust some masks.
>>
>
> Its actually not that bad and a great idea, I did something like the
> following and it works pretty well. I would say its cleaner than the old
> approach for sure (and I also added a /proc/pid/fd/N reopen test to the
> selftest and made sure that issue goes away).
>
> Side note: One subtelty I discovered from the existing selftests is once the
> F_SEAL_WRITE are active, an mmap of PROT_READ and MAP_SHARED region is
> expected to fail. This is also evident from this code in mmap_region:
> if

Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-11 Thread Daniel Colascione

On Sun, Nov 11, 2018 at 12:09 AM, Joel Fernandes  wrote:
> On Sat, Nov 10, 2018 at 07:40:10PM -0800, Andy Lutomirski wrote:
> [...]
>> >>> I see two reasonable solutions:
>> >>>
>> >>> 1. Don’t fiddle with the struct file at all. Instead make the inode 
>> >>> flag
>> >>> work by itself.
>> >>
>> >> Currently, the various VFS paths check only the struct file's f_mode 
>> >> to deny
>> >> writes of already opened files. This would mean more checking in all 
>> >> those
>> >> paths (and modification of all those paths).
>> >>
>> >> Anyway going with that idea, we could
>> >> 1. call deny_write_access(file) from the memfd's seal path which 
>> >> decrements
>> >> the inode::i_writecount.
>> >> 2. call get_write_access(inode) in the various VFS paths in addition 
>> >> to
>> >> checking for FMODE_*WRITE and deny the write (incase i_writecount is 
>> >> negative)
>> >>
>> >> That will prevent both reopens, and writes from succeeding. However I 
>> >> worry a
>> >> bit about 2 not being too familiar with VFS internals, about what the
>> >> consequences of doing that may be.
>> >
>> > IMHO, modifying both the inode and the struct file separately is fine,
>> > since they mean different things. In regular filesystems, it's fine to
>> > have a read-write open file description for a file whose inode grants
>> > write permission to nobody. Speaking of which: is fchmod enough to
>> > prevent this attack?
>> 
>>  Well, yes and no. fchmod does prevent reopening the file RW, but
>>  anyone with permissions (owner, CAP_FOWNER) can just fchmod it back. A
>>  seal is supposed to be irrevocable, so fchmod-as-inode-seal probably
>>  isn't sufficient by itself. While it might be good enough for Android
>>  (in the sense that it'll prevent RW-reopens from other security
>>  contexts to which we send an open memfd file), it's still conceptually
>>  ugly, IMHO. Let's go with the original approach of just tweaking the
>>  inode so that open-for-write is permanently blocked.
>> >>>
>> >>> Agreed with the idea of modifying both file and inode flags. I was 
>> >>> thinking
>> >>> modifying i_mode may do the trick but as you pointed it probably could be
>> >>> reverted by chmod or some other attribute setting calls.
>> >>>
>> >>> OTOH, I don't think deny_write_access(file) can be reverted from any
>> >>> user-facing path so we could do that from the seal to prevent the future
>> >>> opens in write mode. I'll double check and test that out tomorrow.
>> >>>
>> >>>
>> >>
>> >> This seems considerably more complicated and more fragile than needed. 
>> >> Just
>> >> add a new F_SEAL_WRITE_FUTURE.  Grep for F_SEAL_WRITE and make the _FUTURE
>> >> variant work exactly like it with two exceptions:
>> >>
>> >> - shmem_mmap and maybe its hugetlbfs equivalent should check for it and 
>> >> act
>> >> accordingly.
>> >
>> > There's more to it than that, we also need to block future writes through
>> > write syscall, so we have to hook into the write path too once the seal is
>> > set, not just the mmap. That means we have to add code in mm/shmem.c to do
>> > that in all those handlers, to check for the seal (and hope we didn't miss 
>> > a
>> > file_operations handler). Is that what you are proposing?
>>
>> The existing code already does this. That’s why I suggested grepping :)
>>
>> >
>> > Also, it means we have to keep CONFIG_TMPFS enabled so that the
>> > shmem_file_operations write handlers like write_iter are hooked up. 
>> > Currently
>> > memfd works even with !CONFIG_TMPFS.
>>
>> If so, that sounds like it may already be a bug.

Why shouldn't memfd work independently of CONFIG_TMPFS? In particular,
write(2) on tmpfs FDs shouldn't work differently. If it does, that's a
kernel implementation detail leaking out into userspace.

>> >> - add_seals won’t need the wait_for_pins and mapping_deny_write logic.
>> >>
>> >> That really should be all that’s needed.
>> >
>> > It seems a fair idea what you're saying. But I don't see how its less
>> > complex.. IMO its far more simple to have VFS do the denial of the 
>> > operations
>> > based on the flags of its datastructures.. and if it works (which I will 
>> > test
>> > to be sure it will), then we should be good.
>>
>> I agree it’s complicated, but the code is already written.  You should just
>> need to adjust some masks.
>>
>
> Its actually not that bad and a great idea, I did something like the
> following and it works pretty well. I would say its cleaner than the old
> approach for sure (and I also added a /proc/pid/fd/N reopen test to the
> selftest and made sure that issue goes away).
>
> Side note: One subtelty I discovered from the existing selftests is once the
> F_SEAL_WRITE are active, an mmap of PROT_READ and MAP_SHARED region is
> expected to fail. This is also evident from this code in mmap_region:
> if

Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-11 Thread Joel Fernandes

On Sat, Nov 10, 2018 at 07:40:10PM -0800, Andy Lutomirski wrote:
[...]
> >>> I see two reasonable solutions:
> >>> 
> >>> 1. Don’t fiddle with the struct file at all. Instead make the inode 
> >>> flag
> >>> work by itself.
> >> 
> >> Currently, the various VFS paths check only the struct file's f_mode 
> >> to deny
> >> writes of already opened files. This would mean more checking in all 
> >> those
> >> paths (and modification of all those paths).
> >> 
> >> Anyway going with that idea, we could
> >> 1. call deny_write_access(file) from the memfd's seal path which 
> >> decrements
> >> the inode::i_writecount.
> >> 2. call get_write_access(inode) in the various VFS paths in addition to
> >> checking for FMODE_*WRITE and deny the write (incase i_writecount is 
> >> negative)
> >> 
> >> That will prevent both reopens, and writes from succeeding. However I 
> >> worry a
> >> bit about 2 not being too familiar with VFS internals, about what the
> >> consequences of doing that may be.
> > 
> > IMHO, modifying both the inode and the struct file separately is fine,
> > since they mean different things. In regular filesystems, it's fine to
> > have a read-write open file description for a file whose inode grants
> > write permission to nobody. Speaking of which: is fchmod enough to
> > prevent this attack?
>  
>  Well, yes and no. fchmod does prevent reopening the file RW, but
>  anyone with permissions (owner, CAP_FOWNER) can just fchmod it back. A
>  seal is supposed to be irrevocable, so fchmod-as-inode-seal probably
>  isn't sufficient by itself. While it might be good enough for Android
>  (in the sense that it'll prevent RW-reopens from other security
>  contexts to which we send an open memfd file), it's still conceptually
>  ugly, IMHO. Let's go with the original approach of just tweaking the
>  inode so that open-for-write is permanently blocked.
> >>> 
> >>> Agreed with the idea of modifying both file and inode flags. I was 
> >>> thinking
> >>> modifying i_mode may do the trick but as you pointed it probably could be
> >>> reverted by chmod or some other attribute setting calls.
> >>> 
> >>> OTOH, I don't think deny_write_access(file) can be reverted from any
> >>> user-facing path so we could do that from the seal to prevent the future
> >>> opens in write mode. I'll double check and test that out tomorrow.
> >>> 
> >>> 
> >> 
> >> This seems considerably more complicated and more fragile than needed. Just
> >> add a new F_SEAL_WRITE_FUTURE.  Grep for F_SEAL_WRITE and make the _FUTURE
> >> variant work exactly like it with two exceptions:
> >> 
> >> - shmem_mmap and maybe its hugetlbfs equivalent should check for it and act
> >> accordingly.
> > 
> > There's more to it than that, we also need to block future writes through
> > write syscall, so we have to hook into the write path too once the seal is
> > set, not just the mmap. That means we have to add code in mm/shmem.c to do
> > that in all those handlers, to check for the seal (and hope we didn't miss a
> > file_operations handler). Is that what you are proposing?
> 
> The existing code already does this. That’s why I suggested grepping :)
> 
> > 
> > Also, it means we have to keep CONFIG_TMPFS enabled so that the
> > shmem_file_operations write handlers like write_iter are hooked up. 
> > Currently
> > memfd works even with !CONFIG_TMPFS.
> 
> If so, that sounds like it may already be a bug.
> 
> > 
> >> - add_seals won’t need the wait_for_pins and mapping_deny_write logic.
> >> 
> >> That really should be all that’s needed.
> > 
> > It seems a fair idea what you're saying. But I don't see how its less
> > complex.. IMO its far more simple to have VFS do the denial of the 
> > operations
> > based on the flags of its datastructures.. and if it works (which I will 
> > test
> > to be sure it will), then we should be good.
> 
> I agree it’s complicated, but the code is already written.  You should just
> need to adjust some masks.
> 

Its actually not that bad and a great idea, I did something like the
following and it works pretty well. I would say its cleaner than the old
approach for sure (and I also added a /proc/pid/fd/N reopen test to the
selftest and made sure that issue goes away).

Side note: One subtelty I discovered from the existing selftests is once the
F_SEAL_WRITE are active, an mmap of PROT_READ and MAP_SHARED region is
expected to fail. This is also evident from this code in mmap_region:
if (vm_flags & VM_SHARED) {
error = mapping_map_writable(file->f_mapping);
if (error)
goto allow_write_and_free_vma;
}

---8<---

From: "Joel Fernandes (Google)" 
Subject: [PATCH] mm/memfd: implement future write seal using shmem ops

Signed-off-by: Joel Fernandes

Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-11 Thread Joel Fernandes

On Sat, Nov 10, 2018 at 07:40:10PM -0800, Andy Lutomirski wrote:
[...]
> >>> I see two reasonable solutions:
> >>> 
> >>> 1. Don’t fiddle with the struct file at all. Instead make the inode 
> >>> flag
> >>> work by itself.
> >> 
> >> Currently, the various VFS paths check only the struct file's f_mode 
> >> to deny
> >> writes of already opened files. This would mean more checking in all 
> >> those
> >> paths (and modification of all those paths).
> >> 
> >> Anyway going with that idea, we could
> >> 1. call deny_write_access(file) from the memfd's seal path which 
> >> decrements
> >> the inode::i_writecount.
> >> 2. call get_write_access(inode) in the various VFS paths in addition to
> >> checking for FMODE_*WRITE and deny the write (incase i_writecount is 
> >> negative)
> >> 
> >> That will prevent both reopens, and writes from succeeding. However I 
> >> worry a
> >> bit about 2 not being too familiar with VFS internals, about what the
> >> consequences of doing that may be.
> > 
> > IMHO, modifying both the inode and the struct file separately is fine,
> > since they mean different things. In regular filesystems, it's fine to
> > have a read-write open file description for a file whose inode grants
> > write permission to nobody. Speaking of which: is fchmod enough to
> > prevent this attack?
>  
>  Well, yes and no. fchmod does prevent reopening the file RW, but
>  anyone with permissions (owner, CAP_FOWNER) can just fchmod it back. A
>  seal is supposed to be irrevocable, so fchmod-as-inode-seal probably
>  isn't sufficient by itself. While it might be good enough for Android
>  (in the sense that it'll prevent RW-reopens from other security
>  contexts to which we send an open memfd file), it's still conceptually
>  ugly, IMHO. Let's go with the original approach of just tweaking the
>  inode so that open-for-write is permanently blocked.
> >>> 
> >>> Agreed with the idea of modifying both file and inode flags. I was 
> >>> thinking
> >>> modifying i_mode may do the trick but as you pointed it probably could be
> >>> reverted by chmod or some other attribute setting calls.
> >>> 
> >>> OTOH, I don't think deny_write_access(file) can be reverted from any
> >>> user-facing path so we could do that from the seal to prevent the future
> >>> opens in write mode. I'll double check and test that out tomorrow.
> >>> 
> >>> 
> >> 
> >> This seems considerably more complicated and more fragile than needed. Just
> >> add a new F_SEAL_WRITE_FUTURE.  Grep for F_SEAL_WRITE and make the _FUTURE
> >> variant work exactly like it with two exceptions:
> >> 
> >> - shmem_mmap and maybe its hugetlbfs equivalent should check for it and act
> >> accordingly.
> > 
> > There's more to it than that, we also need to block future writes through
> > write syscall, so we have to hook into the write path too once the seal is
> > set, not just the mmap. That means we have to add code in mm/shmem.c to do
> > that in all those handlers, to check for the seal (and hope we didn't miss a
> > file_operations handler). Is that what you are proposing?
> 
> The existing code already does this. That’s why I suggested grepping :)
> 
> > 
> > Also, it means we have to keep CONFIG_TMPFS enabled so that the
> > shmem_file_operations write handlers like write_iter are hooked up. 
> > Currently
> > memfd works even with !CONFIG_TMPFS.
> 
> If so, that sounds like it may already be a bug.
> 
> > 
> >> - add_seals won’t need the wait_for_pins and mapping_deny_write logic.
> >> 
> >> That really should be all that’s needed.
> > 
> > It seems a fair idea what you're saying. But I don't see how its less
> > complex.. IMO its far more simple to have VFS do the denial of the 
> > operations
> > based on the flags of its datastructures.. and if it works (which I will 
> > test
> > to be sure it will), then we should be good.
> 
> I agree it’s complicated, but the code is already written.  You should just
> need to adjust some masks.
> 

Its actually not that bad and a great idea, I did something like the
following and it works pretty well. I would say its cleaner than the old
approach for sure (and I also added a /proc/pid/fd/N reopen test to the
selftest and made sure that issue goes away).

Side note: One subtelty I discovered from the existing selftests is once the
F_SEAL_WRITE are active, an mmap of PROT_READ and MAP_SHARED region is
expected to fail. This is also evident from this code in mmap_region:
if (vm_flags & VM_SHARED) {
error = mapping_map_writable(file->f_mapping);
if (error)
goto allow_write_and_free_vma;
}

---8<---

From: "Joel Fernandes (Google)" 
Subject: [PATCH] mm/memfd: implement future write seal using shmem ops

Signed-off-by: Joel Fernandes

Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-10 Thread Joel Fernandes

On Sat, Nov 10, 2018 at 07:40:10PM -0800, Andy Lutomirski wrote:
> 
> 
> > On Nov 10, 2018, at 6:38 PM, Joel Fernandes  wrote:
> > 
> >> On Sat, Nov 10, 2018 at 02:18:23PM -0800, Andy Lutomirski wrote:
> >> 
>  On Nov 10, 2018, at 2:09 PM, Joel Fernandes  
>  wrote:
>  
> > On Sat, Nov 10, 2018 at 11:11:27AM -0800, Daniel Colascione wrote:
> >> On Sat, Nov 10, 2018 at 10:45 AM, Daniel Colascione 
> >>  wrote:
> >> On Sat, Nov 10, 2018 at 10:24 AM, Joel Fernandes 
> >>  wrote:
> >> Thanks Andy for your thoughts, my comments below:
>  [snip]
> >> I don't see it as warty, different seals will work differently. It 
> >> works
> >> quite well for our usecase, and since Linux is all about solving real
> >> problems in the real work, it would be useful to have it.
> >> 
> >>> - causes a probably-observable effect in the file mode in F_GETFL.
> >> 
> >> Wouldn't that be the right thing to observe anyway?
> >> 
> >>> - causes reopen to fail.
> >> 
> >> So this concern isn't true anymore if we make reopen fail only for 
> >> WRITE
> >> opens as Daniel suggested. I will make this change so that the 
> >> security fix
> >> is a clean one.
> >> 
> >>> - does *not* affect other struct files that may already exist on the 
> >>> same inode.
> >> 
> >> TBH if you really want to block all writes to the file, then you want
> >> F_SEAL_WRITE, not this seal. The usecase we have is the fd is sent 
> >> over IPC
> >> to another process and we want to prevent any new writes in the 
> >> receiver
> >> side. There is no way this other receiving process can have an 
> >> existing fd
> >> unless it was already sent one without the seal applied.  The proposed 
> >> seal
> >> could be renamed to F_SEAL_FD_WRITE if that is preferred.
> >> 
> >>> - mysteriously malfunctions if you try to set it again on another 
> >>> struct
> >>> file that already exists
> >>> 
> >> 
> >> I didn't follow this, could you explain more?
> >> 
> >>> - probably is insecure when used on hugetlbfs.
> >> 
> >> The usecase is not expected to prevent all writes, indeed the usecase
> >> requires existing mmaps to continue to be able to write into the 
> >> memory map.
> >> So would you call that a security issue too? The use of the seal wants 
> >> to
> >> allow existing mmap regions to be continue to be written into (I 
> >> mentioned
> >> more details in the cover letter).
> >> 
> >>> I see two reasonable solutions:
> >>> 
> >>> 1. Don’t fiddle with the struct file at all. Instead make the inode 
> >>> flag
> >>> work by itself.
> >> 
> >> Currently, the various VFS paths check only the struct file's f_mode 
> >> to deny
> >> writes of already opened files. This would mean more checking in all 
> >> those
> >> paths (and modification of all those paths).
> >> 
> >> Anyway going with that idea, we could
> >> 1. call deny_write_access(file) from the memfd's seal path which 
> >> decrements
> >> the inode::i_writecount.
> >> 2. call get_write_access(inode) in the various VFS paths in addition to
> >> checking for FMODE_*WRITE and deny the write (incase i_writecount is 
> >> negative)
> >> 
> >> That will prevent both reopens, and writes from succeeding. However I 
> >> worry a
> >> bit about 2 not being too familiar with VFS internals, about what the
> >> consequences of doing that may be.
> > 
> > IMHO, modifying both the inode and the struct file separately is fine,
> > since they mean different things. In regular filesystems, it's fine to
> > have a read-write open file description for a file whose inode grants
> > write permission to nobody. Speaking of which: is fchmod enough to
> > prevent this attack?
>  
>  Well, yes and no. fchmod does prevent reopening the file RW, but
>  anyone with permissions (owner, CAP_FOWNER) can just fchmod it back. A
>  seal is supposed to be irrevocable, so fchmod-as-inode-seal probably
>  isn't sufficient by itself. While it might be good enough for Android
>  (in the sense that it'll prevent RW-reopens from other security
>  contexts to which we send an open memfd file), it's still conceptually
>  ugly, IMHO. Let's go with the original approach of just tweaking the
>  inode so that open-for-write is permanently blocked.
> >>> 
> >>> Agreed with the idea of modifying both file and inode flags. I was 
> >>> thinking
> >>> modifying i_mode may do the trick but as you pointed it probably could be
> >>> reverted by chmod or some other attribute setting calls.
> >>> 
> >>> OTOH, I don't think deny_write_access(file) can be reverted from any
> >>> user-facing path so we could do that from the seal to prevent the future
> >>> opens in write mode.

Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-10 Thread Joel Fernandes

On Sat, Nov 10, 2018 at 07:40:10PM -0800, Andy Lutomirski wrote:
> 
> 
> > On Nov 10, 2018, at 6:38 PM, Joel Fernandes  wrote:
> > 
> >> On Sat, Nov 10, 2018 at 02:18:23PM -0800, Andy Lutomirski wrote:
> >> 
>  On Nov 10, 2018, at 2:09 PM, Joel Fernandes  
>  wrote:
>  
> > On Sat, Nov 10, 2018 at 11:11:27AM -0800, Daniel Colascione wrote:
> >> On Sat, Nov 10, 2018 at 10:45 AM, Daniel Colascione 
> >>  wrote:
> >> On Sat, Nov 10, 2018 at 10:24 AM, Joel Fernandes 
> >>  wrote:
> >> Thanks Andy for your thoughts, my comments below:
>  [snip]
> >> I don't see it as warty, different seals will work differently. It 
> >> works
> >> quite well for our usecase, and since Linux is all about solving real
> >> problems in the real work, it would be useful to have it.
> >> 
> >>> - causes a probably-observable effect in the file mode in F_GETFL.
> >> 
> >> Wouldn't that be the right thing to observe anyway?
> >> 
> >>> - causes reopen to fail.
> >> 
> >> So this concern isn't true anymore if we make reopen fail only for 
> >> WRITE
> >> opens as Daniel suggested. I will make this change so that the 
> >> security fix
> >> is a clean one.
> >> 
> >>> - does *not* affect other struct files that may already exist on the 
> >>> same inode.
> >> 
> >> TBH if you really want to block all writes to the file, then you want
> >> F_SEAL_WRITE, not this seal. The usecase we have is the fd is sent 
> >> over IPC
> >> to another process and we want to prevent any new writes in the 
> >> receiver
> >> side. There is no way this other receiving process can have an 
> >> existing fd
> >> unless it was already sent one without the seal applied.  The proposed 
> >> seal
> >> could be renamed to F_SEAL_FD_WRITE if that is preferred.
> >> 
> >>> - mysteriously malfunctions if you try to set it again on another 
> >>> struct
> >>> file that already exists
> >>> 
> >> 
> >> I didn't follow this, could you explain more?
> >> 
> >>> - probably is insecure when used on hugetlbfs.
> >> 
> >> The usecase is not expected to prevent all writes, indeed the usecase
> >> requires existing mmaps to continue to be able to write into the 
> >> memory map.
> >> So would you call that a security issue too? The use of the seal wants 
> >> to
> >> allow existing mmap regions to be continue to be written into (I 
> >> mentioned
> >> more details in the cover letter).
> >> 
> >>> I see two reasonable solutions:
> >>> 
> >>> 1. Don’t fiddle with the struct file at all. Instead make the inode 
> >>> flag
> >>> work by itself.
> >> 
> >> Currently, the various VFS paths check only the struct file's f_mode 
> >> to deny
> >> writes of already opened files. This would mean more checking in all 
> >> those
> >> paths (and modification of all those paths).
> >> 
> >> Anyway going with that idea, we could
> >> 1. call deny_write_access(file) from the memfd's seal path which 
> >> decrements
> >> the inode::i_writecount.
> >> 2. call get_write_access(inode) in the various VFS paths in addition to
> >> checking for FMODE_*WRITE and deny the write (incase i_writecount is 
> >> negative)
> >> 
> >> That will prevent both reopens, and writes from succeeding. However I 
> >> worry a
> >> bit about 2 not being too familiar with VFS internals, about what the
> >> consequences of doing that may be.
> > 
> > IMHO, modifying both the inode and the struct file separately is fine,
> > since they mean different things. In regular filesystems, it's fine to
> > have a read-write open file description for a file whose inode grants
> > write permission to nobody. Speaking of which: is fchmod enough to
> > prevent this attack?
>  
>  Well, yes and no. fchmod does prevent reopening the file RW, but
>  anyone with permissions (owner, CAP_FOWNER) can just fchmod it back. A
>  seal is supposed to be irrevocable, so fchmod-as-inode-seal probably
>  isn't sufficient by itself. While it might be good enough for Android
>  (in the sense that it'll prevent RW-reopens from other security
>  contexts to which we send an open memfd file), it's still conceptually
>  ugly, IMHO. Let's go with the original approach of just tweaking the
>  inode so that open-for-write is permanently blocked.
> >>> 
> >>> Agreed with the idea of modifying both file and inode flags. I was 
> >>> thinking
> >>> modifying i_mode may do the trick but as you pointed it probably could be
> >>> reverted by chmod or some other attribute setting calls.
> >>> 
> >>> OTOH, I don't think deny_write_access(file) can be reverted from any
> >>> user-facing path so we could do that from the seal to prevent the future
> >>> opens in write mode.

Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-10 Thread Andy Lutomirski




> On Nov 10, 2018, at 6:38 PM, Joel Fernandes  wrote:
> 
>> On Sat, Nov 10, 2018 at 02:18:23PM -0800, Andy Lutomirski wrote:
>> 
 On Nov 10, 2018, at 2:09 PM, Joel Fernandes  wrote:
 
> On Sat, Nov 10, 2018 at 11:11:27AM -0800, Daniel Colascione wrote:
>> On Sat, Nov 10, 2018 at 10:45 AM, Daniel Colascione  
>> wrote:
>> On Sat, Nov 10, 2018 at 10:24 AM, Joel Fernandes 
>>  wrote:
>> Thanks Andy for your thoughts, my comments below:
 [snip]
>> I don't see it as warty, different seals will work differently. It works
>> quite well for our usecase, and since Linux is all about solving real
>> problems in the real work, it would be useful to have it.
>> 
>>> - causes a probably-observable effect in the file mode in F_GETFL.
>> 
>> Wouldn't that be the right thing to observe anyway?
>> 
>>> - causes reopen to fail.
>> 
>> So this concern isn't true anymore if we make reopen fail only for WRITE
>> opens as Daniel suggested. I will make this change so that the security 
>> fix
>> is a clean one.
>> 
>>> - does *not* affect other struct files that may already exist on the 
>>> same inode.
>> 
>> TBH if you really want to block all writes to the file, then you want
>> F_SEAL_WRITE, not this seal. The usecase we have is the fd is sent over 
>> IPC
>> to another process and we want to prevent any new writes in the receiver
>> side. There is no way this other receiving process can have an existing 
>> fd
>> unless it was already sent one without the seal applied.  The proposed 
>> seal
>> could be renamed to F_SEAL_FD_WRITE if that is preferred.
>> 
>>> - mysteriously malfunctions if you try to set it again on another struct
>>> file that already exists
>>> 
>> 
>> I didn't follow this, could you explain more?
>> 
>>> - probably is insecure when used on hugetlbfs.
>> 
>> The usecase is not expected to prevent all writes, indeed the usecase
>> requires existing mmaps to continue to be able to write into the memory 
>> map.
>> So would you call that a security issue too? The use of the seal wants to
>> allow existing mmap regions to be continue to be written into (I 
>> mentioned
>> more details in the cover letter).
>> 
>>> I see two reasonable solutions:
>>> 
>>> 1. Don’t fiddle with the struct file at all. Instead make the inode flag
>>> work by itself.
>> 
>> Currently, the various VFS paths check only the struct file's f_mode to 
>> deny
>> writes of already opened files. This would mean more checking in all 
>> those
>> paths (and modification of all those paths).
>> 
>> Anyway going with that idea, we could
>> 1. call deny_write_access(file) from the memfd's seal path which 
>> decrements
>> the inode::i_writecount.
>> 2. call get_write_access(inode) in the various VFS paths in addition to
>> checking for FMODE_*WRITE and deny the write (incase i_writecount is 
>> negative)
>> 
>> That will prevent both reopens, and writes from succeeding. However I 
>> worry a
>> bit about 2 not being too familiar with VFS internals, about what the
>> consequences of doing that may be.
> 
> IMHO, modifying both the inode and the struct file separately is fine,
> since they mean different things. In regular filesystems, it's fine to
> have a read-write open file description for a file whose inode grants
> write permission to nobody. Speaking of which: is fchmod enough to
> prevent this attack?
 
 Well, yes and no. fchmod does prevent reopening the file RW, but
 anyone with permissions (owner, CAP_FOWNER) can just fchmod it back. A
 seal is supposed to be irrevocable, so fchmod-as-inode-seal probably
 isn't sufficient by itself. While it might be good enough for Android
 (in the sense that it'll prevent RW-reopens from other security
 contexts to which we send an open memfd file), it's still conceptually
 ugly, IMHO. Let's go with the original approach of just tweaking the
 inode so that open-for-write is permanently blocked.
>>> 
>>> Agreed with the idea of modifying both file and inode flags. I was thinking
>>> modifying i_mode may do the trick but as you pointed it probably could be
>>> reverted by chmod or some other attribute setting calls.
>>> 
>>> OTOH, I don't think deny_write_access(file) can be reverted from any
>>> user-facing path so we could do that from the seal to prevent the future
>>> opens in write mode. I'll double check and test that out tomorrow.
>>> 
>>> 
>> 
>> This seems considerably more complicated and more fragile than needed. Just
>> add a new F_SEAL_WRITE_FUTURE.  Grep for F_SEAL_WRITE and make the _FUTURE
>> variant work exactly like it with two exceptions:
>> 
>> - shmem_mmap and maybe its hugetlbfs equivalent should check for it and act

Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-10 Thread Andy Lutomirski




> On Nov 10, 2018, at 6:38 PM, Joel Fernandes  wrote:
> 
>> On Sat, Nov 10, 2018 at 02:18:23PM -0800, Andy Lutomirski wrote:
>> 
 On Nov 10, 2018, at 2:09 PM, Joel Fernandes  wrote:
 
> On Sat, Nov 10, 2018 at 11:11:27AM -0800, Daniel Colascione wrote:
>> On Sat, Nov 10, 2018 at 10:45 AM, Daniel Colascione  
>> wrote:
>> On Sat, Nov 10, 2018 at 10:24 AM, Joel Fernandes 
>>  wrote:
>> Thanks Andy for your thoughts, my comments below:
 [snip]
>> I don't see it as warty, different seals will work differently. It works
>> quite well for our usecase, and since Linux is all about solving real
>> problems in the real work, it would be useful to have it.
>> 
>>> - causes a probably-observable effect in the file mode in F_GETFL.
>> 
>> Wouldn't that be the right thing to observe anyway?
>> 
>>> - causes reopen to fail.
>> 
>> So this concern isn't true anymore if we make reopen fail only for WRITE
>> opens as Daniel suggested. I will make this change so that the security 
>> fix
>> is a clean one.
>> 
>>> - does *not* affect other struct files that may already exist on the 
>>> same inode.
>> 
>> TBH if you really want to block all writes to the file, then you want
>> F_SEAL_WRITE, not this seal. The usecase we have is the fd is sent over 
>> IPC
>> to another process and we want to prevent any new writes in the receiver
>> side. There is no way this other receiving process can have an existing 
>> fd
>> unless it was already sent one without the seal applied.  The proposed 
>> seal
>> could be renamed to F_SEAL_FD_WRITE if that is preferred.
>> 
>>> - mysteriously malfunctions if you try to set it again on another struct
>>> file that already exists
>>> 
>> 
>> I didn't follow this, could you explain more?
>> 
>>> - probably is insecure when used on hugetlbfs.
>> 
>> The usecase is not expected to prevent all writes, indeed the usecase
>> requires existing mmaps to continue to be able to write into the memory 
>> map.
>> So would you call that a security issue too? The use of the seal wants to
>> allow existing mmap regions to be continue to be written into (I 
>> mentioned
>> more details in the cover letter).
>> 
>>> I see two reasonable solutions:
>>> 
>>> 1. Don’t fiddle with the struct file at all. Instead make the inode flag
>>> work by itself.
>> 
>> Currently, the various VFS paths check only the struct file's f_mode to 
>> deny
>> writes of already opened files. This would mean more checking in all 
>> those
>> paths (and modification of all those paths).
>> 
>> Anyway going with that idea, we could
>> 1. call deny_write_access(file) from the memfd's seal path which 
>> decrements
>> the inode::i_writecount.
>> 2. call get_write_access(inode) in the various VFS paths in addition to
>> checking for FMODE_*WRITE and deny the write (incase i_writecount is 
>> negative)
>> 
>> That will prevent both reopens, and writes from succeeding. However I 
>> worry a
>> bit about 2 not being too familiar with VFS internals, about what the
>> consequences of doing that may be.
> 
> IMHO, modifying both the inode and the struct file separately is fine,
> since they mean different things. In regular filesystems, it's fine to
> have a read-write open file description for a file whose inode grants
> write permission to nobody. Speaking of which: is fchmod enough to
> prevent this attack?
 
 Well, yes and no. fchmod does prevent reopening the file RW, but
 anyone with permissions (owner, CAP_FOWNER) can just fchmod it back. A
 seal is supposed to be irrevocable, so fchmod-as-inode-seal probably
 isn't sufficient by itself. While it might be good enough for Android
 (in the sense that it'll prevent RW-reopens from other security
 contexts to which we send an open memfd file), it's still conceptually
 ugly, IMHO. Let's go with the original approach of just tweaking the
 inode so that open-for-write is permanently blocked.
>>> 
>>> Agreed with the idea of modifying both file and inode flags. I was thinking
>>> modifying i_mode may do the trick but as you pointed it probably could be
>>> reverted by chmod or some other attribute setting calls.
>>> 
>>> OTOH, I don't think deny_write_access(file) can be reverted from any
>>> user-facing path so we could do that from the seal to prevent the future
>>> opens in write mode. I'll double check and test that out tomorrow.
>>> 
>>> 
>> 
>> This seems considerably more complicated and more fragile than needed. Just
>> add a new F_SEAL_WRITE_FUTURE.  Grep for F_SEAL_WRITE and make the _FUTURE
>> variant work exactly like it with two exceptions:
>> 
>> - shmem_mmap and maybe its hugetlbfs equivalent should check for it and act

Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-10 Thread Joel Fernandes

On Sat, Nov 10, 2018 at 02:18:23PM -0800, Andy Lutomirski wrote:
> 
> > On Nov 10, 2018, at 2:09 PM, Joel Fernandes  wrote:
> > 
> >> On Sat, Nov 10, 2018 at 11:11:27AM -0800, Daniel Colascione wrote:
> >>> On Sat, Nov 10, 2018 at 10:45 AM, Daniel Colascione  
> >>> wrote:
>  On Sat, Nov 10, 2018 at 10:24 AM, Joel Fernandes 
>   wrote:
>  Thanks Andy for your thoughts, my comments below:
> >> [snip]
>  I don't see it as warty, different seals will work differently. It works
>  quite well for our usecase, and since Linux is all about solving real
>  problems in the real work, it would be useful to have it.
>  
> > - causes a probably-observable effect in the file mode in F_GETFL.
>  
>  Wouldn't that be the right thing to observe anyway?
>  
> > - causes reopen to fail.
>  
>  So this concern isn't true anymore if we make reopen fail only for WRITE
>  opens as Daniel suggested. I will make this change so that the security 
>  fix
>  is a clean one.
>  
> > - does *not* affect other struct files that may already exist on the 
> > same inode.
>  
>  TBH if you really want to block all writes to the file, then you want
>  F_SEAL_WRITE, not this seal. The usecase we have is the fd is sent over 
>  IPC
>  to another process and we want to prevent any new writes in the receiver
>  side. There is no way this other receiving process can have an existing 
>  fd
>  unless it was already sent one without the seal applied.  The proposed 
>  seal
>  could be renamed to F_SEAL_FD_WRITE if that is preferred.
>  
> > - mysteriously malfunctions if you try to set it again on another struct
> > file that already exists
> > 
>  
>  I didn't follow this, could you explain more?
>  
> > - probably is insecure when used on hugetlbfs.
>  
>  The usecase is not expected to prevent all writes, indeed the usecase
>  requires existing mmaps to continue to be able to write into the memory 
>  map.
>  So would you call that a security issue too? The use of the seal wants to
>  allow existing mmap regions to be continue to be written into (I 
>  mentioned
>  more details in the cover letter).
>  
> > I see two reasonable solutions:
> > 
> > 1. Don’t fiddle with the struct file at all. Instead make the inode flag
> > work by itself.
>  
>  Currently, the various VFS paths check only the struct file's f_mode to 
>  deny
>  writes of already opened files. This would mean more checking in all 
>  those
>  paths (and modification of all those paths).
>  
>  Anyway going with that idea, we could
>  1. call deny_write_access(file) from the memfd's seal path which 
>  decrements
>  the inode::i_writecount.
>  2. call get_write_access(inode) in the various VFS paths in addition to
>  checking for FMODE_*WRITE and deny the write (incase i_writecount is 
>  negative)
>  
>  That will prevent both reopens, and writes from succeeding. However I 
>  worry a
>  bit about 2 not being too familiar with VFS internals, about what the
>  consequences of doing that may be.
> >>> 
> >>> IMHO, modifying both the inode and the struct file separately is fine,
> >>> since they mean different things. In regular filesystems, it's fine to
> >>> have a read-write open file description for a file whose inode grants
> >>> write permission to nobody. Speaking of which: is fchmod enough to
> >>> prevent this attack?
> >> 
> >> Well, yes and no. fchmod does prevent reopening the file RW, but
> >> anyone with permissions (owner, CAP_FOWNER) can just fchmod it back. A
> >> seal is supposed to be irrevocable, so fchmod-as-inode-seal probably
> >> isn't sufficient by itself. While it might be good enough for Android
> >> (in the sense that it'll prevent RW-reopens from other security
> >> contexts to which we send an open memfd file), it's still conceptually
> >> ugly, IMHO. Let's go with the original approach of just tweaking the
> >> inode so that open-for-write is permanently blocked.
> > 
> > Agreed with the idea of modifying both file and inode flags. I was thinking
> > modifying i_mode may do the trick but as you pointed it probably could be
> > reverted by chmod or some other attribute setting calls.
> > 
> > OTOH, I don't think deny_write_access(file) can be reverted from any
> > user-facing path so we could do that from the seal to prevent the future
> > opens in write mode. I'll double check and test that out tomorrow.
> > 
> > 
> 
> This seems considerably more complicated and more fragile than needed. Just
> add a new F_SEAL_WRITE_FUTURE.  Grep for F_SEAL_WRITE and make the _FUTURE
> variant work exactly like it with two exceptions:
> 
> - shmem_mmap and maybe its hugetlbfs equivalent should check for it and act
> accordingly.

There's more to it than that, we also need to block future

Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-10 Thread Joel Fernandes

On Sat, Nov 10, 2018 at 02:18:23PM -0800, Andy Lutomirski wrote:
> 
> > On Nov 10, 2018, at 2:09 PM, Joel Fernandes  wrote:
> > 
> >> On Sat, Nov 10, 2018 at 11:11:27AM -0800, Daniel Colascione wrote:
> >>> On Sat, Nov 10, 2018 at 10:45 AM, Daniel Colascione  
> >>> wrote:
>  On Sat, Nov 10, 2018 at 10:24 AM, Joel Fernandes 
>   wrote:
>  Thanks Andy for your thoughts, my comments below:
> >> [snip]
>  I don't see it as warty, different seals will work differently. It works
>  quite well for our usecase, and since Linux is all about solving real
>  problems in the real work, it would be useful to have it.
>  
> > - causes a probably-observable effect in the file mode in F_GETFL.
>  
>  Wouldn't that be the right thing to observe anyway?
>  
> > - causes reopen to fail.
>  
>  So this concern isn't true anymore if we make reopen fail only for WRITE
>  opens as Daniel suggested. I will make this change so that the security 
>  fix
>  is a clean one.
>  
> > - does *not* affect other struct files that may already exist on the 
> > same inode.
>  
>  TBH if you really want to block all writes to the file, then you want
>  F_SEAL_WRITE, not this seal. The usecase we have is the fd is sent over 
>  IPC
>  to another process and we want to prevent any new writes in the receiver
>  side. There is no way this other receiving process can have an existing 
>  fd
>  unless it was already sent one without the seal applied.  The proposed 
>  seal
>  could be renamed to F_SEAL_FD_WRITE if that is preferred.
>  
> > - mysteriously malfunctions if you try to set it again on another struct
> > file that already exists
> > 
>  
>  I didn't follow this, could you explain more?
>  
> > - probably is insecure when used on hugetlbfs.
>  
>  The usecase is not expected to prevent all writes, indeed the usecase
>  requires existing mmaps to continue to be able to write into the memory 
>  map.
>  So would you call that a security issue too? The use of the seal wants to
>  allow existing mmap regions to be continue to be written into (I 
>  mentioned
>  more details in the cover letter).
>  
> > I see two reasonable solutions:
> > 
> > 1. Don’t fiddle with the struct file at all. Instead make the inode flag
> > work by itself.
>  
>  Currently, the various VFS paths check only the struct file's f_mode to 
>  deny
>  writes of already opened files. This would mean more checking in all 
>  those
>  paths (and modification of all those paths).
>  
>  Anyway going with that idea, we could
>  1. call deny_write_access(file) from the memfd's seal path which 
>  decrements
>  the inode::i_writecount.
>  2. call get_write_access(inode) in the various VFS paths in addition to
>  checking for FMODE_*WRITE and deny the write (incase i_writecount is 
>  negative)
>  
>  That will prevent both reopens, and writes from succeeding. However I 
>  worry a
>  bit about 2 not being too familiar with VFS internals, about what the
>  consequences of doing that may be.
> >>> 
> >>> IMHO, modifying both the inode and the struct file separately is fine,
> >>> since they mean different things. In regular filesystems, it's fine to
> >>> have a read-write open file description for a file whose inode grants
> >>> write permission to nobody. Speaking of which: is fchmod enough to
> >>> prevent this attack?
> >> 
> >> Well, yes and no. fchmod does prevent reopening the file RW, but
> >> anyone with permissions (owner, CAP_FOWNER) can just fchmod it back. A
> >> seal is supposed to be irrevocable, so fchmod-as-inode-seal probably
> >> isn't sufficient by itself. While it might be good enough for Android
> >> (in the sense that it'll prevent RW-reopens from other security
> >> contexts to which we send an open memfd file), it's still conceptually
> >> ugly, IMHO. Let's go with the original approach of just tweaking the
> >> inode so that open-for-write is permanently blocked.
> > 
> > Agreed with the idea of modifying both file and inode flags. I was thinking
> > modifying i_mode may do the trick but as you pointed it probably could be
> > reverted by chmod or some other attribute setting calls.
> > 
> > OTOH, I don't think deny_write_access(file) can be reverted from any
> > user-facing path so we could do that from the seal to prevent the future
> > opens in write mode. I'll double check and test that out tomorrow.
> > 
> > 
> 
> This seems considerably more complicated and more fragile than needed. Just
> add a new F_SEAL_WRITE_FUTURE.  Grep for F_SEAL_WRITE and make the _FUTURE
> variant work exactly like it with two exceptions:
> 
> - shmem_mmap and maybe its hugetlbfs equivalent should check for it and act
> accordingly.

There's more to it than that, we also need to block future

Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-10 Thread Andy Lutomirski

> On Nov 10, 2018, at 2:09 PM, Joel Fernandes  wrote:
> 
>> On Sat, Nov 10, 2018 at 11:11:27AM -0800, Daniel Colascione wrote:
>>> On Sat, Nov 10, 2018 at 10:45 AM, Daniel Colascione  
>>> wrote:
 On Sat, Nov 10, 2018 at 10:24 AM, Joel Fernandes  
 wrote:
 Thanks Andy for your thoughts, my comments below:
>> [snip]
 I don't see it as warty, different seals will work differently. It works
 quite well for our usecase, and since Linux is all about solving real
 problems in the real work, it would be useful to have it.

> - causes a probably-observable effect in the file mode in F_GETFL.

 Wouldn't that be the right thing to observe anyway?

> - causes reopen to fail.

 So this concern isn't true anymore if we make reopen fail only for WRITE
 opens as Daniel suggested. I will make this change so that the security fix
 is a clean one.

> - does *not* affect other struct files that may already exist on the same 
> inode.

 TBH if you really want to block all writes to the file, then you want
 F_SEAL_WRITE, not this seal. The usecase we have is the fd is sent over IPC
 to another process and we want to prevent any new writes in the receiver
 side. There is no way this other receiving process can have an existing fd
 unless it was already sent one without the seal applied.  The proposed seal
 could be renamed to F_SEAL_FD_WRITE if that is preferred.

> - mysteriously malfunctions if you try to set it again on another struct
> file that already exists
> 

 I didn't follow this, could you explain more?

> - probably is insecure when used on hugetlbfs.

 The usecase is not expected to prevent all writes, indeed the usecase
 requires existing mmaps to continue to be able to write into the memory 
 map.
 So would you call that a security issue too? The use of the seal wants to
 allow existing mmap regions to be continue to be written into (I mentioned
 more details in the cover letter).

> I see two reasonable solutions:
> 
> 1. Don’t fiddle with the struct file at all. Instead make the inode flag
> work by itself.

 Currently, the various VFS paths check only the struct file's f_mode to 
 deny
 writes of already opened files. This would mean more checking in all those
 paths (and modification of all those paths).

 Anyway going with that idea, we could
 1. call deny_write_access(file) from the memfd's seal path which decrements
 the inode::i_writecount.
 2. call get_write_access(inode) in the various VFS paths in addition to
 checking for FMODE_*WRITE and deny the write (incase i_writecount is 
 negative)

 That will prevent both reopens, and writes from succeeding. However I 
 worry a
 bit about 2 not being too familiar with VFS internals, about what the
 consequences of doing that may be.
>>> 
>>> IMHO, modifying both the inode and the struct file separately is fine,
>>> since they mean different things. In regular filesystems, it's fine to
>>> have a read-write open file description for a file whose inode grants
>>> write permission to nobody. Speaking of which: is fchmod enough to
>>> prevent this attack?
>> 
>> Well, yes and no. fchmod does prevent reopening the file RW, but
>> anyone with permissions (owner, CAP_FOWNER) can just fchmod it back. A
>> seal is supposed to be irrevocable, so fchmod-as-inode-seal probably
>> isn't sufficient by itself. While it might be good enough for Android
>> (in the sense that it'll prevent RW-reopens from other security
>> contexts to which we send an open memfd file), it's still conceptually
>> ugly, IMHO. Let's go with the original approach of just tweaking the
>> inode so that open-for-write is permanently blocked.
> 
> Agreed with the idea of modifying both file and inode flags. I was thinking
> modifying i_mode may do the trick but as you pointed it probably could be
> reverted by chmod or some other attribute setting calls.
> 
> OTOH, I don't think deny_write_access(file) can be reverted from any
> user-facing path so we could do that from the seal to prevent the future
> opens in write mode. I'll double check and test that out tomorrow.
> 
> 

This seems considerably more complicated and more fragile than needed. Just add 
a new F_SEAL_WRITE_FUTURE.  Grep for F_SEAL_WRITE and make the _FUTURE variant 
work exactly like it with two exceptions:

- shmem_mmap and maybe its hugetlbfs equivalent should check for it and act 
accordingly.

- add_seals won’t need the wait_for_pins and mapping_deny_write logic.

That really should be all that’s needed.

Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-10 Thread Andy Lutomirski

> On Nov 10, 2018, at 2:09 PM, Joel Fernandes  wrote:
> 
>> On Sat, Nov 10, 2018 at 11:11:27AM -0800, Daniel Colascione wrote:
>>> On Sat, Nov 10, 2018 at 10:45 AM, Daniel Colascione  
>>> wrote:
 On Sat, Nov 10, 2018 at 10:24 AM, Joel Fernandes  
 wrote:
 Thanks Andy for your thoughts, my comments below:
>> [snip]
 I don't see it as warty, different seals will work differently. It works
 quite well for our usecase, and since Linux is all about solving real
 problems in the real work, it would be useful to have it.

> - causes a probably-observable effect in the file mode in F_GETFL.

 Wouldn't that be the right thing to observe anyway?

> - causes reopen to fail.

 So this concern isn't true anymore if we make reopen fail only for WRITE
 opens as Daniel suggested. I will make this change so that the security fix
 is a clean one.

> - does *not* affect other struct files that may already exist on the same 
> inode.

 TBH if you really want to block all writes to the file, then you want
 F_SEAL_WRITE, not this seal. The usecase we have is the fd is sent over IPC
 to another process and we want to prevent any new writes in the receiver
 side. There is no way this other receiving process can have an existing fd
 unless it was already sent one without the seal applied.  The proposed seal
 could be renamed to F_SEAL_FD_WRITE if that is preferred.

> - mysteriously malfunctions if you try to set it again on another struct
> file that already exists
> 

 I didn't follow this, could you explain more?

> - probably is insecure when used on hugetlbfs.

 The usecase is not expected to prevent all writes, indeed the usecase
 requires existing mmaps to continue to be able to write into the memory 
 map.
 So would you call that a security issue too? The use of the seal wants to
 allow existing mmap regions to be continue to be written into (I mentioned
 more details in the cover letter).

> I see two reasonable solutions:
> 
> 1. Don’t fiddle with the struct file at all. Instead make the inode flag
> work by itself.

 Currently, the various VFS paths check only the struct file's f_mode to 
 deny
 writes of already opened files. This would mean more checking in all those
 paths (and modification of all those paths).

 Anyway going with that idea, we could
 1. call deny_write_access(file) from the memfd's seal path which decrements
 the inode::i_writecount.
 2. call get_write_access(inode) in the various VFS paths in addition to
 checking for FMODE_*WRITE and deny the write (incase i_writecount is 
 negative)

 That will prevent both reopens, and writes from succeeding. However I 
 worry a
 bit about 2 not being too familiar with VFS internals, about what the
 consequences of doing that may be.
>>> 
>>> IMHO, modifying both the inode and the struct file separately is fine,
>>> since they mean different things. In regular filesystems, it's fine to
>>> have a read-write open file description for a file whose inode grants
>>> write permission to nobody. Speaking of which: is fchmod enough to
>>> prevent this attack?
>> 
>> Well, yes and no. fchmod does prevent reopening the file RW, but
>> anyone with permissions (owner, CAP_FOWNER) can just fchmod it back. A
>> seal is supposed to be irrevocable, so fchmod-as-inode-seal probably
>> isn't sufficient by itself. While it might be good enough for Android
>> (in the sense that it'll prevent RW-reopens from other security
>> contexts to which we send an open memfd file), it's still conceptually
>> ugly, IMHO. Let's go with the original approach of just tweaking the
>> inode so that open-for-write is permanently blocked.
> 
> Agreed with the idea of modifying both file and inode flags. I was thinking
> modifying i_mode may do the trick but as you pointed it probably could be
> reverted by chmod or some other attribute setting calls.
> 
> OTOH, I don't think deny_write_access(file) can be reverted from any
> user-facing path so we could do that from the seal to prevent the future
> opens in write mode. I'll double check and test that out tomorrow.
> 
> 

This seems considerably more complicated and more fragile than needed. Just add 
a new F_SEAL_WRITE_FUTURE.  Grep for F_SEAL_WRITE and make the _FUTURE variant 
work exactly like it with two exceptions:

- shmem_mmap and maybe its hugetlbfs equivalent should check for it and act 
accordingly.

- add_seals won’t need the wait_for_pins and mapping_deny_write logic.

That really should be all that’s needed.

Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-10 Thread Joel Fernandes

On Sat, Nov 10, 2018 at 11:11:27AM -0800, Daniel Colascione wrote:
> On Sat, Nov 10, 2018 at 10:45 AM, Daniel Colascione  wrote:
> > On Sat, Nov 10, 2018 at 10:24 AM, Joel Fernandes  
> > wrote:
> >> Thanks Andy for your thoughts, my comments below:
> [snip]
> >> I don't see it as warty, different seals will work differently. It works
> >> quite well for our usecase, and since Linux is all about solving real
> >> problems in the real work, it would be useful to have it.
> >>
> >>> - causes a probably-observable effect in the file mode in F_GETFL.
> >>
> >> Wouldn't that be the right thing to observe anyway?
> >>
> >>> - causes reopen to fail.
> >>
> >> So this concern isn't true anymore if we make reopen fail only for WRITE
> >> opens as Daniel suggested. I will make this change so that the security fix
> >> is a clean one.
> >>
> >>> - does *not* affect other struct files that may already exist on the same 
> >>> inode.
> >>
> >> TBH if you really want to block all writes to the file, then you want
> >> F_SEAL_WRITE, not this seal. The usecase we have is the fd is sent over IPC
> >> to another process and we want to prevent any new writes in the receiver
> >> side. There is no way this other receiving process can have an existing fd
> >> unless it was already sent one without the seal applied.  The proposed seal
> >> could be renamed to F_SEAL_FD_WRITE if that is preferred.
> >>
> >>> - mysteriously malfunctions if you try to set it again on another struct
> >>> file that already exists
> >>>
> >>
> >> I didn't follow this, could you explain more?
> >>
> >>> - probably is insecure when used on hugetlbfs.
> >>
> >> The usecase is not expected to prevent all writes, indeed the usecase
> >> requires existing mmaps to continue to be able to write into the memory 
> >> map.
> >> So would you call that a security issue too? The use of the seal wants to
> >> allow existing mmap regions to be continue to be written into (I mentioned
> >> more details in the cover letter).
> >>
> >>> I see two reasonable solutions:
> >>>
> >>> 1. Don’t fiddle with the struct file at all. Instead make the inode flag
> >>> work by itself.
> >>
> >> Currently, the various VFS paths check only the struct file's f_mode to 
> >> deny
> >> writes of already opened files. This would mean more checking in all those
> >> paths (and modification of all those paths).
> >>
> >> Anyway going with that idea, we could
> >> 1. call deny_write_access(file) from the memfd's seal path which decrements
> >> the inode::i_writecount.
> >> 2. call get_write_access(inode) in the various VFS paths in addition to
> >> checking for FMODE_*WRITE and deny the write (incase i_writecount is 
> >> negative)
> >>
> >> That will prevent both reopens, and writes from succeeding. However I 
> >> worry a
> >> bit about 2 not being too familiar with VFS internals, about what the
> >> consequences of doing that may be.
> >
> > IMHO, modifying both the inode and the struct file separately is fine,
> > since they mean different things. In regular filesystems, it's fine to
> > have a read-write open file description for a file whose inode grants
> > write permission to nobody. Speaking of which: is fchmod enough to
> > prevent this attack?
> 
> Well, yes and no. fchmod does prevent reopening the file RW, but
> anyone with permissions (owner, CAP_FOWNER) can just fchmod it back. A
> seal is supposed to be irrevocable, so fchmod-as-inode-seal probably
> isn't sufficient by itself. While it might be good enough for Android
> (in the sense that it'll prevent RW-reopens from other security
> contexts to which we send an open memfd file), it's still conceptually
> ugly, IMHO. Let's go with the original approach of just tweaking the
> inode so that open-for-write is permanently blocked.

Agreed with the idea of modifying both file and inode flags. I was thinking
modifying i_mode may do the trick but as you pointed it probably could be
reverted by chmod or some other attribute setting calls.

OTOH, I don't think deny_write_access(file) can be reverted from any
user-facing path so we could do that from the seal to prevent the future
opens in write mode. I'll double check and test that out tomorrow.

thanks,

 - Joel

Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-10 Thread Joel Fernandes

On Sat, Nov 10, 2018 at 11:11:27AM -0800, Daniel Colascione wrote:
> On Sat, Nov 10, 2018 at 10:45 AM, Daniel Colascione  wrote:
> > On Sat, Nov 10, 2018 at 10:24 AM, Joel Fernandes  
> > wrote:
> >> Thanks Andy for your thoughts, my comments below:
> [snip]
> >> I don't see it as warty, different seals will work differently. It works
> >> quite well for our usecase, and since Linux is all about solving real
> >> problems in the real work, it would be useful to have it.
> >>
> >>> - causes a probably-observable effect in the file mode in F_GETFL.
> >>
> >> Wouldn't that be the right thing to observe anyway?
> >>
> >>> - causes reopen to fail.
> >>
> >> So this concern isn't true anymore if we make reopen fail only for WRITE
> >> opens as Daniel suggested. I will make this change so that the security fix
> >> is a clean one.
> >>
> >>> - does *not* affect other struct files that may already exist on the same 
> >>> inode.
> >>
> >> TBH if you really want to block all writes to the file, then you want
> >> F_SEAL_WRITE, not this seal. The usecase we have is the fd is sent over IPC
> >> to another process and we want to prevent any new writes in the receiver
> >> side. There is no way this other receiving process can have an existing fd
> >> unless it was already sent one without the seal applied.  The proposed seal
> >> could be renamed to F_SEAL_FD_WRITE if that is preferred.
> >>
> >>> - mysteriously malfunctions if you try to set it again on another struct
> >>> file that already exists
> >>>
> >>
> >> I didn't follow this, could you explain more?
> >>
> >>> - probably is insecure when used on hugetlbfs.
> >>
> >> The usecase is not expected to prevent all writes, indeed the usecase
> >> requires existing mmaps to continue to be able to write into the memory 
> >> map.
> >> So would you call that a security issue too? The use of the seal wants to
> >> allow existing mmap regions to be continue to be written into (I mentioned
> >> more details in the cover letter).
> >>
> >>> I see two reasonable solutions:
> >>>
> >>> 1. Don’t fiddle with the struct file at all. Instead make the inode flag
> >>> work by itself.
> >>
> >> Currently, the various VFS paths check only the struct file's f_mode to 
> >> deny
> >> writes of already opened files. This would mean more checking in all those
> >> paths (and modification of all those paths).
> >>
> >> Anyway going with that idea, we could
> >> 1. call deny_write_access(file) from the memfd's seal path which decrements
> >> the inode::i_writecount.
> >> 2. call get_write_access(inode) in the various VFS paths in addition to
> >> checking for FMODE_*WRITE and deny the write (incase i_writecount is 
> >> negative)
> >>
> >> That will prevent both reopens, and writes from succeeding. However I 
> >> worry a
> >> bit about 2 not being too familiar with VFS internals, about what the
> >> consequences of doing that may be.
> >
> > IMHO, modifying both the inode and the struct file separately is fine,
> > since they mean different things. In regular filesystems, it's fine to
> > have a read-write open file description for a file whose inode grants
> > write permission to nobody. Speaking of which: is fchmod enough to
> > prevent this attack?
> 
> Well, yes and no. fchmod does prevent reopening the file RW, but
> anyone with permissions (owner, CAP_FOWNER) can just fchmod it back. A
> seal is supposed to be irrevocable, so fchmod-as-inode-seal probably
> isn't sufficient by itself. While it might be good enough for Android
> (in the sense that it'll prevent RW-reopens from other security
> contexts to which we send an open memfd file), it's still conceptually
> ugly, IMHO. Let's go with the original approach of just tweaking the
> inode so that open-for-write is permanently blocked.

Agreed with the idea of modifying both file and inode flags. I was thinking
modifying i_mode may do the trick but as you pointed it probably could be
reverted by chmod or some other attribute setting calls.

OTOH, I don't think deny_write_access(file) can be reverted from any
user-facing path so we could do that from the seal to prevent the future
opens in write mode. I'll double check and test that out tomorrow.

thanks,

 - Joel

Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-10 Thread Andy Lutomirski



> On Nov 10, 2018, at 11:11 AM, Daniel Colascione  wrote:
> 
>> On Sat, Nov 10, 2018 at 10:45 AM, Daniel Colascione  
>> wrote:
>>> On Sat, Nov 10, 2018 at 10:24 AM, Joel Fernandes  
>>> wrote:
>>> Thanks Andy for your thoughts, my comments below:
> [snip]
>>> I don't see it as warty, different seals will work differently. It works
>>> quite well for our usecase, and since Linux is all about solving real
>>> problems in the real work, it would be useful to have it.
>>> 
 - causes a probably-observable effect in the file mode in F_GETFL.
>>> 
>>> Wouldn't that be the right thing to observe anyway?
>>> 
 - causes reopen to fail.
>>> 
>>> So this concern isn't true anymore if we make reopen fail only for WRITE
>>> opens as Daniel suggested. I will make this change so that the security fix
>>> is a clean one.
>>> 
 - does *not* affect other struct files that may already exist on the same 
 inode.
>>> 
>>> TBH if you really want to block all writes to the file, then you want
>>> F_SEAL_WRITE, not this seal. The usecase we have is the fd is sent over IPC
>>> to another process and we want to prevent any new writes in the receiver
>>> side. There is no way this other receiving process can have an existing fd
>>> unless it was already sent one without the seal applied.  The proposed seal
>>> could be renamed to F_SEAL_FD_WRITE if that is preferred.
>>> 
 - mysteriously malfunctions if you try to set it again on another struct
 file that already exists
 
>>> 
>>> I didn't follow this, could you explain more?
>>> 
 - probably is insecure when used on hugetlbfs.
>>> 
>>> The usecase is not expected to prevent all writes, indeed the usecase
>>> requires existing mmaps to continue to be able to write into the memory map.
>>> So would you call that a security issue too? The use of the seal wants to
>>> allow existing mmap regions to be continue to be written into (I mentioned
>>> more details in the cover letter).
>>> 
 I see two reasonable solutions:
 
 1. Don’t fiddle with the struct file at all. Instead make the inode flag
 work by itself.
>>> 
>>> Currently, the various VFS paths check only the struct file's f_mode to deny
>>> writes of already opened files. This would mean more checking in all those
>>> paths (and modification of all those paths).
>>> 
>>> Anyway going with that idea, we could
>>> 1. call deny_write_access(file) from the memfd's seal path which decrements
>>> the inode::i_writecount.
>>> 2. call get_write_access(inode) in the various VFS paths in addition to
>>> checking for FMODE_*WRITE and deny the write (incase i_writecount is 
>>> negative)
>>> 
>>> That will prevent both reopens, and writes from succeeding. However I worry 
>>> a
>>> bit about 2 not being too familiar with VFS internals, about what the
>>> consequences of doing that may be.
>> 
>> IMHO, modifying both the inode and the struct file separately is fine,
>> since they mean different things. In regular filesystems, it's fine to
>> have a read-write open file description for a file whose inode grants
>> write permission to nobody. Speaking of which: is fchmod enough to
>> prevent this attack?
> 
> Well, yes and no. fchmod does prevent reopening the file RW, but
> anyone with permissions (owner, CAP_FOWNER) can just fchmod it back. A
> seal is supposed to be irrevocable, so fchmod-as-inode-seal probably
> isn't sufficient by itself. While it might be good enough for Android
> (in the sense that it'll prevent RW-reopens from other security
> contexts to which we send an open memfd file), it's still conceptually
> ugly, IMHO. Let's go with the original approach of just tweaking the
> inode so that open-for-write is permanently blocked.

This should be straightforward. Just add a new seal type and wire it up. It 
should be considerably simpler than SEAL_WRITE.

Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-10 Thread Andy Lutomirski



> On Nov 10, 2018, at 11:11 AM, Daniel Colascione  wrote:
> 
>> On Sat, Nov 10, 2018 at 10:45 AM, Daniel Colascione  
>> wrote:
>>> On Sat, Nov 10, 2018 at 10:24 AM, Joel Fernandes  
>>> wrote:
>>> Thanks Andy for your thoughts, my comments below:
> [snip]
>>> I don't see it as warty, different seals will work differently. It works
>>> quite well for our usecase, and since Linux is all about solving real
>>> problems in the real work, it would be useful to have it.
>>> 
 - causes a probably-observable effect in the file mode in F_GETFL.
>>> 
>>> Wouldn't that be the right thing to observe anyway?
>>> 
 - causes reopen to fail.
>>> 
>>> So this concern isn't true anymore if we make reopen fail only for WRITE
>>> opens as Daniel suggested. I will make this change so that the security fix
>>> is a clean one.
>>> 
 - does *not* affect other struct files that may already exist on the same 
 inode.
>>> 
>>> TBH if you really want to block all writes to the file, then you want
>>> F_SEAL_WRITE, not this seal. The usecase we have is the fd is sent over IPC
>>> to another process and we want to prevent any new writes in the receiver
>>> side. There is no way this other receiving process can have an existing fd
>>> unless it was already sent one without the seal applied.  The proposed seal
>>> could be renamed to F_SEAL_FD_WRITE if that is preferred.
>>> 
 - mysteriously malfunctions if you try to set it again on another struct
 file that already exists
 
>>> 
>>> I didn't follow this, could you explain more?
>>> 
 - probably is insecure when used on hugetlbfs.
>>> 
>>> The usecase is not expected to prevent all writes, indeed the usecase
>>> requires existing mmaps to continue to be able to write into the memory map.
>>> So would you call that a security issue too? The use of the seal wants to
>>> allow existing mmap regions to be continue to be written into (I mentioned
>>> more details in the cover letter).
>>> 
 I see two reasonable solutions:
 
 1. Don’t fiddle with the struct file at all. Instead make the inode flag
 work by itself.
>>> 
>>> Currently, the various VFS paths check only the struct file's f_mode to deny
>>> writes of already opened files. This would mean more checking in all those
>>> paths (and modification of all those paths).
>>> 
>>> Anyway going with that idea, we could
>>> 1. call deny_write_access(file) from the memfd's seal path which decrements
>>> the inode::i_writecount.
>>> 2. call get_write_access(inode) in the various VFS paths in addition to
>>> checking for FMODE_*WRITE and deny the write (incase i_writecount is 
>>> negative)
>>> 
>>> That will prevent both reopens, and writes from succeeding. However I worry 
>>> a
>>> bit about 2 not being too familiar with VFS internals, about what the
>>> consequences of doing that may be.
>> 
>> IMHO, modifying both the inode and the struct file separately is fine,
>> since they mean different things. In regular filesystems, it's fine to
>> have a read-write open file description for a file whose inode grants
>> write permission to nobody. Speaking of which: is fchmod enough to
>> prevent this attack?
> 
> Well, yes and no. fchmod does prevent reopening the file RW, but
> anyone with permissions (owner, CAP_FOWNER) can just fchmod it back. A
> seal is supposed to be irrevocable, so fchmod-as-inode-seal probably
> isn't sufficient by itself. While it might be good enough for Android
> (in the sense that it'll prevent RW-reopens from other security
> contexts to which we send an open memfd file), it's still conceptually
> ugly, IMHO. Let's go with the original approach of just tweaking the
> inode so that open-for-write is permanently blocked.

This should be straightforward. Just add a new seal type and wire it up. It 
should be considerably simpler than SEAL_WRITE.

Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-10 Thread Daniel Colascione

On Sat, Nov 10, 2018 at 10:45 AM, Daniel Colascione  wrote:
> On Sat, Nov 10, 2018 at 10:24 AM, Joel Fernandes  
> wrote:
>> Thanks Andy for your thoughts, my comments below:
[snip]
>> I don't see it as warty, different seals will work differently. It works
>> quite well for our usecase, and since Linux is all about solving real
>> problems in the real work, it would be useful to have it.
>>
>>> - causes a probably-observable effect in the file mode in F_GETFL.
>>
>> Wouldn't that be the right thing to observe anyway?
>>
>>> - causes reopen to fail.
>>
>> So this concern isn't true anymore if we make reopen fail only for WRITE
>> opens as Daniel suggested. I will make this change so that the security fix
>> is a clean one.
>>
>>> - does *not* affect other struct files that may already exist on the same 
>>> inode.
>>
>> TBH if you really want to block all writes to the file, then you want
>> F_SEAL_WRITE, not this seal. The usecase we have is the fd is sent over IPC
>> to another process and we want to prevent any new writes in the receiver
>> side. There is no way this other receiving process can have an existing fd
>> unless it was already sent one without the seal applied.  The proposed seal
>> could be renamed to F_SEAL_FD_WRITE if that is preferred.
>>
>>> - mysteriously malfunctions if you try to set it again on another struct
>>> file that already exists
>>>
>>
>> I didn't follow this, could you explain more?
>>
>>> - probably is insecure when used on hugetlbfs.
>>
>> The usecase is not expected to prevent all writes, indeed the usecase
>> requires existing mmaps to continue to be able to write into the memory map.
>> So would you call that a security issue too? The use of the seal wants to
>> allow existing mmap regions to be continue to be written into (I mentioned
>> more details in the cover letter).
>>
>>> I see two reasonable solutions:
>>>
>>> 1. Don’t fiddle with the struct file at all. Instead make the inode flag
>>> work by itself.
>>
>> Currently, the various VFS paths check only the struct file's f_mode to deny
>> writes of already opened files. This would mean more checking in all those
>> paths (and modification of all those paths).
>>
>> Anyway going with that idea, we could
>> 1. call deny_write_access(file) from the memfd's seal path which decrements
>> the inode::i_writecount.
>> 2. call get_write_access(inode) in the various VFS paths in addition to
>> checking for FMODE_*WRITE and deny the write (incase i_writecount is 
>> negative)
>>
>> That will prevent both reopens, and writes from succeeding. However I worry a
>> bit about 2 not being too familiar with VFS internals, about what the
>> consequences of doing that may be.
>
> IMHO, modifying both the inode and the struct file separately is fine,
> since they mean different things. In regular filesystems, it's fine to
> have a read-write open file description for a file whose inode grants
> write permission to nobody. Speaking of which: is fchmod enough to
> prevent this attack?

Well, yes and no. fchmod does prevent reopening the file RW, but
anyone with permissions (owner, CAP_FOWNER) can just fchmod it back. A
seal is supposed to be irrevocable, so fchmod-as-inode-seal probably
isn't sufficient by itself. While it might be good enough for Android
(in the sense that it'll prevent RW-reopens from other security
contexts to which we send an open memfd file), it's still conceptually
ugly, IMHO. Let's go with the original approach of just tweaking the
inode so that open-for-write is permanently blocked.

Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-10 Thread Daniel Colascione

On Sat, Nov 10, 2018 at 10:45 AM, Daniel Colascione  wrote:
> On Sat, Nov 10, 2018 at 10:24 AM, Joel Fernandes  
> wrote:
>> Thanks Andy for your thoughts, my comments below:
[snip]
>> I don't see it as warty, different seals will work differently. It works
>> quite well for our usecase, and since Linux is all about solving real
>> problems in the real work, it would be useful to have it.
>>
>>> - causes a probably-observable effect in the file mode in F_GETFL.
>>
>> Wouldn't that be the right thing to observe anyway?
>>
>>> - causes reopen to fail.
>>
>> So this concern isn't true anymore if we make reopen fail only for WRITE
>> opens as Daniel suggested. I will make this change so that the security fix
>> is a clean one.
>>
>>> - does *not* affect other struct files that may already exist on the same 
>>> inode.
>>
>> TBH if you really want to block all writes to the file, then you want
>> F_SEAL_WRITE, not this seal. The usecase we have is the fd is sent over IPC
>> to another process and we want to prevent any new writes in the receiver
>> side. There is no way this other receiving process can have an existing fd
>> unless it was already sent one without the seal applied.  The proposed seal
>> could be renamed to F_SEAL_FD_WRITE if that is preferred.
>>
>>> - mysteriously malfunctions if you try to set it again on another struct
>>> file that already exists
>>>
>>
>> I didn't follow this, could you explain more?
>>
>>> - probably is insecure when used on hugetlbfs.
>>
>> The usecase is not expected to prevent all writes, indeed the usecase
>> requires existing mmaps to continue to be able to write into the memory map.
>> So would you call that a security issue too? The use of the seal wants to
>> allow existing mmap regions to be continue to be written into (I mentioned
>> more details in the cover letter).
>>
>>> I see two reasonable solutions:
>>>
>>> 1. Don’t fiddle with the struct file at all. Instead make the inode flag
>>> work by itself.
>>
>> Currently, the various VFS paths check only the struct file's f_mode to deny
>> writes of already opened files. This would mean more checking in all those
>> paths (and modification of all those paths).
>>
>> Anyway going with that idea, we could
>> 1. call deny_write_access(file) from the memfd's seal path which decrements
>> the inode::i_writecount.
>> 2. call get_write_access(inode) in the various VFS paths in addition to
>> checking for FMODE_*WRITE and deny the write (incase i_writecount is 
>> negative)
>>
>> That will prevent both reopens, and writes from succeeding. However I worry a
>> bit about 2 not being too familiar with VFS internals, about what the
>> consequences of doing that may be.
>
> IMHO, modifying both the inode and the struct file separately is fine,
> since they mean different things. In regular filesystems, it's fine to
> have a read-write open file description for a file whose inode grants
> write permission to nobody. Speaking of which: is fchmod enough to
> prevent this attack?

Well, yes and no. fchmod does prevent reopening the file RW, but
anyone with permissions (owner, CAP_FOWNER) can just fchmod it back. A
seal is supposed to be irrevocable, so fchmod-as-inode-seal probably
isn't sufficient by itself. While it might be good enough for Android
(in the sense that it'll prevent RW-reopens from other security
contexts to which we send an open memfd file), it's still conceptually
ugly, IMHO. Let's go with the original approach of just tweaking the
inode so that open-for-write is permanently blocked.

Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-10 Thread Daniel Colascione

On Sat, Nov 10, 2018 at 10:24 AM, Joel Fernandes  wrote:
> Thanks Andy for your thoughts, my comments below:
>
> On Fri, Nov 09, 2018 at 10:05:14PM -0800, Andy Lutomirski wrote:
>>
>>
>> > On Nov 9, 2018, at 7:20 PM, Joel Fernandes  wrote:
>> >
>> >> On Fri, Nov 09, 2018 at 10:19:03PM +0100, Jann Horn wrote:
>> >>> On Fri, Nov 9, 2018 at 10:06 PM Jann Horn  wrote:
>> >>> On Fri, Nov 9, 2018 at 9:46 PM Joel Fernandes (Google)
>> >>>  wrote:
>>  Android uses ashmem for sharing memory regions. We are looking forward
>>  to migrating all usecases of ashmem to memfd so that we can possibly
>>  remove the ashmem driver in the future from staging while also
>>  benefiting from using memfd and contributing to it. Note staging drivers
>>  are also not ABI and generally can be removed at anytime.
>> 
>>  One of the main usecases Android has is the ability to create a region
>>  and mmap it as writeable, then add protection against making any
>>  "future" writes while keeping the existing already mmap'ed
>>  writeable-region active.  This allows us to implement a usecase where
>>  receivers of the shared memory buffer can get a read-only view, while
>>  the sender continues to write to the buffer.
>>  See CursorWindow documentation in Android for more details:
>>  https://developer.android.com/reference/android/database/CursorWindow
>> 
>>  This usecase cannot be implemented with the existing F_SEAL_WRITE seal.
>>  To support the usecase, this patch adds a new F_SEAL_FUTURE_WRITE seal
>>  which prevents any future mmap and write syscalls from succeeding while
>>  keeping the existing mmap active.
>> >>>
>> >>> Please CC linux-api@ on patches like this. If you had done that, I
>> >>> might have criticized your v1 patch instead of your v3 patch...
>> >>>
>>  The following program shows the seal
>>  working in action:
>> >>> [...]
>>  Cc: jr...@google.com
>>  Cc: john.stu...@linaro.org
>>  Cc: tk...@google.com
>>  Cc: gre...@linuxfoundation.org
>>  Cc: h...@infradead.org
>>  Reviewed-by: John Stultz 
>>  Signed-off-by: Joel Fernandes (Google) 
>>  ---
>> >>> [...]
>>  diff --git a/mm/memfd.c b/mm/memfd.c
>>  index 2bb5e257080e..5ba9804e9515 100644
>>  --- a/mm/memfd.c
>>  +++ b/mm/memfd.c
>> >>> [...]
>>  @@ -219,6 +220,25 @@ static int memfd_add_seals(struct file *file, 
>>  unsigned int seals)
>> }
>> }
>> 
>>  +   if ((seals & F_SEAL_FUTURE_WRITE) &&
>>  +   !(*file_seals & F_SEAL_FUTURE_WRITE)) {
>>  +   /*
>>  +* The FUTURE_WRITE seal also prevents growing and 
>>  shrinking
>>  +* so we need them to be already set, or requested now.
>>  +*/
>>  +   int test_seals = (seals | *file_seals) &
>>  +(F_SEAL_GROW | F_SEAL_SHRINK);
>>  +
>>  +   if (test_seals != (F_SEAL_GROW | F_SEAL_SHRINK)) {
>>  +   error = -EINVAL;
>>  +   goto unlock;
>>  +   }
>>  +
>>  +   spin_lock(>f_lock);
>>  +   file->f_mode &= ~(FMODE_WRITE | FMODE_PWRITE);
>>  +   spin_unlock(>f_lock);
>>  +   }
>> >>>
>> >>> So you're fiddling around with the file, but not the inode? How are
>> >>> you preventing code like the following from re-opening the file as
>> >>> writable?
>> >>>
>> >>> $ cat memfd.c
>> >>> #define _GNU_SOURCE
>> >>> #include 
>> >>> #include 
>> >>> #include 
>> >>> #include 
>> >>> #include 
>> >>> #include 
>> >>>
>> >>> int main(void) {
>> >>>  int fd = syscall(__NR_memfd_create, "testfd", 0);
>> >>>  if (fd == -1) err(1, "memfd");
>> >>>  char path[100];
>> >>>  sprintf(path, "/proc/self/fd/%d", fd);
>> >>>  int fd2 = open(path, O_RDWR);
>> >>>  if (fd2 == -1) err(1, "reopen");
>> >>>  printf("reopen successful: %d\n", fd2);
>> >>> }
>> >>> $ gcc -o memfd memfd.c
>> >>> $ ./memfd
>> >>> reopen successful: 4
>> >>> $
>> >>>
>> >>> That aside: I wonder whether a better API would be something that
>> >>> allows you to create a new readonly file descriptor, instead of
>> >>> fiddling with the writability of an existing fd.
>> >>
>> >> My favorite approach would be to forbid open() on memfds, hope that
>> >> nobody notices the tiny API break, and then add an ioctl for "reopen
>> >> this memfd with reduced permissions" - but that's just my personal
>> >> opinion.
>> >
>> > I did something along these lines and it fixes the issue, but I forbid open
>> > of memfd only when the F_SEAL_FUTURE_WRITE seal is in place. So then its 
>> > not
>> > an ABI break because this is a brand new seal. That seems the least 
>> > intrusive
>> > solution and it works. Do you mind testing it and I'll add your and 
>> > Tested-by
>> > to the new fix? The patch is based on top of this

Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-10 Thread Daniel Colascione

On Sat, Nov 10, 2018 at 10:24 AM, Joel Fernandes  wrote:
> Thanks Andy for your thoughts, my comments below:
>
> On Fri, Nov 09, 2018 at 10:05:14PM -0800, Andy Lutomirski wrote:
>>
>>
>> > On Nov 9, 2018, at 7:20 PM, Joel Fernandes  wrote:
>> >
>> >> On Fri, Nov 09, 2018 at 10:19:03PM +0100, Jann Horn wrote:
>> >>> On Fri, Nov 9, 2018 at 10:06 PM Jann Horn  wrote:
>> >>> On Fri, Nov 9, 2018 at 9:46 PM Joel Fernandes (Google)
>> >>>  wrote:
>>  Android uses ashmem for sharing memory regions. We are looking forward
>>  to migrating all usecases of ashmem to memfd so that we can possibly
>>  remove the ashmem driver in the future from staging while also
>>  benefiting from using memfd and contributing to it. Note staging drivers
>>  are also not ABI and generally can be removed at anytime.
>> 
>>  One of the main usecases Android has is the ability to create a region
>>  and mmap it as writeable, then add protection against making any
>>  "future" writes while keeping the existing already mmap'ed
>>  writeable-region active.  This allows us to implement a usecase where
>>  receivers of the shared memory buffer can get a read-only view, while
>>  the sender continues to write to the buffer.
>>  See CursorWindow documentation in Android for more details:
>>  https://developer.android.com/reference/android/database/CursorWindow
>> 
>>  This usecase cannot be implemented with the existing F_SEAL_WRITE seal.
>>  To support the usecase, this patch adds a new F_SEAL_FUTURE_WRITE seal
>>  which prevents any future mmap and write syscalls from succeeding while
>>  keeping the existing mmap active.
>> >>>
>> >>> Please CC linux-api@ on patches like this. If you had done that, I
>> >>> might have criticized your v1 patch instead of your v3 patch...
>> >>>
>>  The following program shows the seal
>>  working in action:
>> >>> [...]
>>  Cc: jr...@google.com
>>  Cc: john.stu...@linaro.org
>>  Cc: tk...@google.com
>>  Cc: gre...@linuxfoundation.org
>>  Cc: h...@infradead.org
>>  Reviewed-by: John Stultz 
>>  Signed-off-by: Joel Fernandes (Google) 
>>  ---
>> >>> [...]
>>  diff --git a/mm/memfd.c b/mm/memfd.c
>>  index 2bb5e257080e..5ba9804e9515 100644
>>  --- a/mm/memfd.c
>>  +++ b/mm/memfd.c
>> >>> [...]
>>  @@ -219,6 +220,25 @@ static int memfd_add_seals(struct file *file, 
>>  unsigned int seals)
>> }
>> }
>> 
>>  +   if ((seals & F_SEAL_FUTURE_WRITE) &&
>>  +   !(*file_seals & F_SEAL_FUTURE_WRITE)) {
>>  +   /*
>>  +* The FUTURE_WRITE seal also prevents growing and 
>>  shrinking
>>  +* so we need them to be already set, or requested now.
>>  +*/
>>  +   int test_seals = (seals | *file_seals) &
>>  +(F_SEAL_GROW | F_SEAL_SHRINK);
>>  +
>>  +   if (test_seals != (F_SEAL_GROW | F_SEAL_SHRINK)) {
>>  +   error = -EINVAL;
>>  +   goto unlock;
>>  +   }
>>  +
>>  +   spin_lock(>f_lock);
>>  +   file->f_mode &= ~(FMODE_WRITE | FMODE_PWRITE);
>>  +   spin_unlock(>f_lock);
>>  +   }
>> >>>
>> >>> So you're fiddling around with the file, but not the inode? How are
>> >>> you preventing code like the following from re-opening the file as
>> >>> writable?
>> >>>
>> >>> $ cat memfd.c
>> >>> #define _GNU_SOURCE
>> >>> #include 
>> >>> #include 
>> >>> #include 
>> >>> #include 
>> >>> #include 
>> >>> #include 
>> >>>
>> >>> int main(void) {
>> >>>  int fd = syscall(__NR_memfd_create, "testfd", 0);
>> >>>  if (fd == -1) err(1, "memfd");
>> >>>  char path[100];
>> >>>  sprintf(path, "/proc/self/fd/%d", fd);
>> >>>  int fd2 = open(path, O_RDWR);
>> >>>  if (fd2 == -1) err(1, "reopen");
>> >>>  printf("reopen successful: %d\n", fd2);
>> >>> }
>> >>> $ gcc -o memfd memfd.c
>> >>> $ ./memfd
>> >>> reopen successful: 4
>> >>> $
>> >>>
>> >>> That aside: I wonder whether a better API would be something that
>> >>> allows you to create a new readonly file descriptor, instead of
>> >>> fiddling with the writability of an existing fd.
>> >>
>> >> My favorite approach would be to forbid open() on memfds, hope that
>> >> nobody notices the tiny API break, and then add an ioctl for "reopen
>> >> this memfd with reduced permissions" - but that's just my personal
>> >> opinion.
>> >
>> > I did something along these lines and it fixes the issue, but I forbid open
>> > of memfd only when the F_SEAL_FUTURE_WRITE seal is in place. So then its 
>> > not
>> > an ABI break because this is a brand new seal. That seems the least 
>> > intrusive
>> > solution and it works. Do you mind testing it and I'll add your and 
>> > Tested-by
>> > to the new fix? The patch is based on top of this

Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-10 Thread Joel Fernandes

Thanks Andy for your thoughts, my comments below:

On Fri, Nov 09, 2018 at 10:05:14PM -0800, Andy Lutomirski wrote:
> 
> 
> > On Nov 9, 2018, at 7:20 PM, Joel Fernandes  wrote:
> > 
> >> On Fri, Nov 09, 2018 at 10:19:03PM +0100, Jann Horn wrote:
> >>> On Fri, Nov 9, 2018 at 10:06 PM Jann Horn  wrote:
> >>> On Fri, Nov 9, 2018 at 9:46 PM Joel Fernandes (Google)
> >>>  wrote:
>  Android uses ashmem for sharing memory regions. We are looking forward
>  to migrating all usecases of ashmem to memfd so that we can possibly
>  remove the ashmem driver in the future from staging while also
>  benefiting from using memfd and contributing to it. Note staging drivers
>  are also not ABI and generally can be removed at anytime.
>  
>  One of the main usecases Android has is the ability to create a region
>  and mmap it as writeable, then add protection against making any
>  "future" writes while keeping the existing already mmap'ed
>  writeable-region active.  This allows us to implement a usecase where
>  receivers of the shared memory buffer can get a read-only view, while
>  the sender continues to write to the buffer.
>  See CursorWindow documentation in Android for more details:
>  https://developer.android.com/reference/android/database/CursorWindow
>  
>  This usecase cannot be implemented with the existing F_SEAL_WRITE seal.
>  To support the usecase, this patch adds a new F_SEAL_FUTURE_WRITE seal
>  which prevents any future mmap and write syscalls from succeeding while
>  keeping the existing mmap active.
> >>> 
> >>> Please CC linux-api@ on patches like this. If you had done that, I
> >>> might have criticized your v1 patch instead of your v3 patch...
> >>> 
>  The following program shows the seal
>  working in action:
> >>> [...]
>  Cc: jr...@google.com
>  Cc: john.stu...@linaro.org
>  Cc: tk...@google.com
>  Cc: gre...@linuxfoundation.org
>  Cc: h...@infradead.org
>  Reviewed-by: John Stultz 
>  Signed-off-by: Joel Fernandes (Google) 
>  ---
> >>> [...]
>  diff --git a/mm/memfd.c b/mm/memfd.c
>  index 2bb5e257080e..5ba9804e9515 100644
>  --- a/mm/memfd.c
>  +++ b/mm/memfd.c
> >>> [...]
>  @@ -219,6 +220,25 @@ static int memfd_add_seals(struct file *file, 
>  unsigned int seals)
> }
> }
>  
>  +   if ((seals & F_SEAL_FUTURE_WRITE) &&
>  +   !(*file_seals & F_SEAL_FUTURE_WRITE)) {
>  +   /*
>  +* The FUTURE_WRITE seal also prevents growing and 
>  shrinking
>  +* so we need them to be already set, or requested now.
>  +*/
>  +   int test_seals = (seals | *file_seals) &
>  +(F_SEAL_GROW | F_SEAL_SHRINK);
>  +
>  +   if (test_seals != (F_SEAL_GROW | F_SEAL_SHRINK)) {
>  +   error = -EINVAL;
>  +   goto unlock;
>  +   }
>  +
>  +   spin_lock(>f_lock);
>  +   file->f_mode &= ~(FMODE_WRITE | FMODE_PWRITE);
>  +   spin_unlock(>f_lock);
>  +   }
> >>> 
> >>> So you're fiddling around with the file, but not the inode? How are
> >>> you preventing code like the following from re-opening the file as
> >>> writable?
> >>> 
> >>> $ cat memfd.c
> >>> #define _GNU_SOURCE
> >>> #include 
> >>> #include 
> >>> #include 
> >>> #include 
> >>> #include 
> >>> #include 
> >>> 
> >>> int main(void) {
> >>>  int fd = syscall(__NR_memfd_create, "testfd", 0);
> >>>  if (fd == -1) err(1, "memfd");
> >>>  char path[100];
> >>>  sprintf(path, "/proc/self/fd/%d", fd);
> >>>  int fd2 = open(path, O_RDWR);
> >>>  if (fd2 == -1) err(1, "reopen");
> >>>  printf("reopen successful: %d\n", fd2);
> >>> }
> >>> $ gcc -o memfd memfd.c
> >>> $ ./memfd
> >>> reopen successful: 4
> >>> $
> >>> 
> >>> That aside: I wonder whether a better API would be something that
> >>> allows you to create a new readonly file descriptor, instead of
> >>> fiddling with the writability of an existing fd.
> >> 
> >> My favorite approach would be to forbid open() on memfds, hope that
> >> nobody notices the tiny API break, and then add an ioctl for "reopen
> >> this memfd with reduced permissions" - but that's just my personal
> >> opinion.
> > 
> > I did something along these lines and it fixes the issue, but I forbid open
> > of memfd only when the F_SEAL_FUTURE_WRITE seal is in place. So then its not
> > an ABI break because this is a brand new seal. That seems the least 
> > intrusive
> > solution and it works. Do you mind testing it and I'll add your and 
> > Tested-by
> > to the new fix? The patch is based on top of this series.
> > 
> > ---8<---
> > From: "Joel Fernandes (Google)" 
> > Subject: [PATCH] mm/memfd: Fix possible promotion to writeable of sealed 
> > memfd
> > 
> > Jann

Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-10 Thread Joel Fernandes

Thanks Andy for your thoughts, my comments below:

On Fri, Nov 09, 2018 at 10:05:14PM -0800, Andy Lutomirski wrote:
> 
> 
> > On Nov 9, 2018, at 7:20 PM, Joel Fernandes  wrote:
> > 
> >> On Fri, Nov 09, 2018 at 10:19:03PM +0100, Jann Horn wrote:
> >>> On Fri, Nov 9, 2018 at 10:06 PM Jann Horn  wrote:
> >>> On Fri, Nov 9, 2018 at 9:46 PM Joel Fernandes (Google)
> >>>  wrote:
>  Android uses ashmem for sharing memory regions. We are looking forward
>  to migrating all usecases of ashmem to memfd so that we can possibly
>  remove the ashmem driver in the future from staging while also
>  benefiting from using memfd and contributing to it. Note staging drivers
>  are also not ABI and generally can be removed at anytime.
>  
>  One of the main usecases Android has is the ability to create a region
>  and mmap it as writeable, then add protection against making any
>  "future" writes while keeping the existing already mmap'ed
>  writeable-region active.  This allows us to implement a usecase where
>  receivers of the shared memory buffer can get a read-only view, while
>  the sender continues to write to the buffer.
>  See CursorWindow documentation in Android for more details:
>  https://developer.android.com/reference/android/database/CursorWindow
>  
>  This usecase cannot be implemented with the existing F_SEAL_WRITE seal.
>  To support the usecase, this patch adds a new F_SEAL_FUTURE_WRITE seal
>  which prevents any future mmap and write syscalls from succeeding while
>  keeping the existing mmap active.
> >>> 
> >>> Please CC linux-api@ on patches like this. If you had done that, I
> >>> might have criticized your v1 patch instead of your v3 patch...
> >>> 
>  The following program shows the seal
>  working in action:
> >>> [...]
>  Cc: jr...@google.com
>  Cc: john.stu...@linaro.org
>  Cc: tk...@google.com
>  Cc: gre...@linuxfoundation.org
>  Cc: h...@infradead.org
>  Reviewed-by: John Stultz 
>  Signed-off-by: Joel Fernandes (Google) 
>  ---
> >>> [...]
>  diff --git a/mm/memfd.c b/mm/memfd.c
>  index 2bb5e257080e..5ba9804e9515 100644
>  --- a/mm/memfd.c
>  +++ b/mm/memfd.c
> >>> [...]
>  @@ -219,6 +220,25 @@ static int memfd_add_seals(struct file *file, 
>  unsigned int seals)
> }
> }
>  
>  +   if ((seals & F_SEAL_FUTURE_WRITE) &&
>  +   !(*file_seals & F_SEAL_FUTURE_WRITE)) {
>  +   /*
>  +* The FUTURE_WRITE seal also prevents growing and 
>  shrinking
>  +* so we need them to be already set, or requested now.
>  +*/
>  +   int test_seals = (seals | *file_seals) &
>  +(F_SEAL_GROW | F_SEAL_SHRINK);
>  +
>  +   if (test_seals != (F_SEAL_GROW | F_SEAL_SHRINK)) {
>  +   error = -EINVAL;
>  +   goto unlock;
>  +   }
>  +
>  +   spin_lock(>f_lock);
>  +   file->f_mode &= ~(FMODE_WRITE | FMODE_PWRITE);
>  +   spin_unlock(>f_lock);
>  +   }
> >>> 
> >>> So you're fiddling around with the file, but not the inode? How are
> >>> you preventing code like the following from re-opening the file as
> >>> writable?
> >>> 
> >>> $ cat memfd.c
> >>> #define _GNU_SOURCE
> >>> #include 
> >>> #include 
> >>> #include 
> >>> #include 
> >>> #include 
> >>> #include 
> >>> 
> >>> int main(void) {
> >>>  int fd = syscall(__NR_memfd_create, "testfd", 0);
> >>>  if (fd == -1) err(1, "memfd");
> >>>  char path[100];
> >>>  sprintf(path, "/proc/self/fd/%d", fd);
> >>>  int fd2 = open(path, O_RDWR);
> >>>  if (fd2 == -1) err(1, "reopen");
> >>>  printf("reopen successful: %d\n", fd2);
> >>> }
> >>> $ gcc -o memfd memfd.c
> >>> $ ./memfd
> >>> reopen successful: 4
> >>> $
> >>> 
> >>> That aside: I wonder whether a better API would be something that
> >>> allows you to create a new readonly file descriptor, instead of
> >>> fiddling with the writability of an existing fd.
> >> 
> >> My favorite approach would be to forbid open() on memfds, hope that
> >> nobody notices the tiny API break, and then add an ioctl for "reopen
> >> this memfd with reduced permissions" - but that's just my personal
> >> opinion.
> > 
> > I did something along these lines and it fixes the issue, but I forbid open
> > of memfd only when the F_SEAL_FUTURE_WRITE seal is in place. So then its not
> > an ABI break because this is a brand new seal. That seems the least 
> > intrusive
> > solution and it works. Do you mind testing it and I'll add your and 
> > Tested-by
> > to the new fix? The patch is based on top of this series.
> > 
> > ---8<---
> > From: "Joel Fernandes (Google)" 
> > Subject: [PATCH] mm/memfd: Fix possible promotion to writeable of sealed 
> > memfd
> > 
> > Jann

Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-10 Thread Joel Fernandes

On Sat, Nov 10, 2018 at 04:26:46AM -0800, Daniel Colascione wrote:
> On Friday, November 9, 2018, Joel Fernandes  wrote:
> 
> > On Fri, Nov 09, 2018 at 10:19:03PM +0100, Jann Horn wrote:
> > > On Fri, Nov 9, 2018 at 10:06 PM Jann Horn  wrote:
> > > > On Fri, Nov 9, 2018 at 9:46 PM Joel Fernandes (Google)
> > > >  wrote:
> > > > > Android uses ashmem for sharing memory regions. We are looking
> > forward
> > > > > to migrating all usecases of ashmem to memfd so that we can possibly
> > > > > remove the ashmem driver in the future from staging while also
> > > > > benefiting from using memfd and contributing to it. Note staging
> > drivers
> > > > > are also not ABI and generally can be removed at anytime.
> > > > >
> > > > > One of the main usecases Android has is the ability to create a
> > region
> > > > > and mmap it as writeable, then add protection against making any
> > > > > "future" writes while keeping the existing already mmap'ed
> > > > > writeable-region active.  This allows us to implement a usecase where
> > > > > receivers of the shared memory buffer can get a read-only view, while
> > > > > the sender continues to write to the buffer.
> > > > > See CursorWindow documentation in Android for more details:
> > > > > https://developer.android.com/reference/android/database/
> > CursorWindow
> > > > >
> > > > > This usecase cannot be implemented with the existing F_SEAL_WRITE
> > seal.
> > > > > To support the usecase, this patch adds a new F_SEAL_FUTURE_WRITE
> > seal
> > > > > which prevents any future mmap and write syscalls from succeeding
> > while
> > > > > keeping the existing mmap active.
> > > >
> > > > Please CC linux-api@ on patches like this. If you had done that, I
> > > > might have criticized your v1 patch instead of your v3 patch...
> > > >
> > > > > The following program shows the seal
> > > > > working in action:
> > > > [...]
> > > > > Cc: jr...@google.com
> > > > > Cc: john.stu...@linaro.org
> > > > > Cc: tk...@google.com
> > > > > Cc: gre...@linuxfoundation.org
> > > > > Cc: h...@infradead.org
> > > > > Reviewed-by: John Stultz 
> > > > > Signed-off-by: Joel Fernandes (Google) 
> > > > > ---
> > > > [...]
> > > > > diff --git a/mm/memfd.c b/mm/memfd.c
> > > > > index 2bb5e257080e..5ba9804e9515 100644
> > > > > --- a/mm/memfd.c
> > > > > +++ b/mm/memfd.c
> > > > [...]
> > > > > @@ -219,6 +220,25 @@ static int memfd_add_seals(struct file *file,
> > unsigned int seals)
> > > > > }
> > > > > }
> > > > >
> > > > > +   if ((seals & F_SEAL_FUTURE_WRITE) &&
> > > > > +   !(*file_seals & F_SEAL_FUTURE_WRITE)) {
> > > > > +   /*
> > > > > +* The FUTURE_WRITE seal also prevents growing and
> > shrinking
> > > > > +* so we need them to be already set, or requested
> > now.
> > > > > +*/
> > > > > +   int test_seals = (seals | *file_seals) &
> > > > > +(F_SEAL_GROW | F_SEAL_SHRINK);
> > > > > +
> > > > > +   if (test_seals != (F_SEAL_GROW | F_SEAL_SHRINK)) {
> > > > > +   error = -EINVAL;
> > > > > +   goto unlock;
> > > > > +   }
> > > > > +
> > > > > +   spin_lock(>f_lock);
> > > > > +   file->f_mode &= ~(FMODE_WRITE | FMODE_PWRITE);
> > > > > +   spin_unlock(>f_lock);
> > > > > +   }
> > > >
> > > > So you're fiddling around with the file, but not the inode? How are
> > > > you preventing code like the following from re-opening the file as
> > > > writable?
> > > >
> > > > $ cat memfd.c
> > > > #define _GNU_SOURCE
> > > > #include 
> > > > #include 
> > > > #include 
> > > > #include 
> > > > #include 
> > > > #include 
> > > >
> > > > int main(void) {
> > > >   int fd = syscall(__NR_memfd_create, "testfd", 0);
> > > >   if (fd == -1) err(1, "memfd");
> > > >   char path[100];
> > > >   sprintf(path, "/proc/self/fd/%d", fd);
> > > >   int fd2 = open(path, O_RDWR);
> > > >   if (fd2 == -1) err(1, "reopen");
> > > >   printf("reopen successful: %d\n", fd2);
> > > > }
> > > > $ gcc -o memfd memfd.c
> > > > $ ./memfd
> > > > reopen successful: 4
> > > > $
> > > >
> > > > That aside: I wonder whether a better API would be something that
> > > > allows you to create a new readonly file descriptor, instead of
> > > > fiddling with the writability of an existing fd.
> > >
> > > My favorite approach would be to forbid open() on memfds, hope that
> > > nobody notices the tiny API break, and then add an ioctl for "reopen
> > > this memfd with reduced permissions" - but that's just my personal
> > > opinion.
> >
> > I did something along these lines and it fixes the issue, but I forbid open
> > of memfd only when the F_SEAL_FUTURE_WRITE seal is in place. So then its
> > not
> > an ABI break because this is a brand new seal. That seems the least
> > intrusive
> > solution and it works. Do you mind testing it and I'll add your and
> > Tested-by
> > to the new

Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-10 Thread Joel Fernandes

On Sat, Nov 10, 2018 at 04:26:46AM -0800, Daniel Colascione wrote:
> On Friday, November 9, 2018, Joel Fernandes  wrote:
> 
> > On Fri, Nov 09, 2018 at 10:19:03PM +0100, Jann Horn wrote:
> > > On Fri, Nov 9, 2018 at 10:06 PM Jann Horn  wrote:
> > > > On Fri, Nov 9, 2018 at 9:46 PM Joel Fernandes (Google)
> > > >  wrote:
> > > > > Android uses ashmem for sharing memory regions. We are looking
> > forward
> > > > > to migrating all usecases of ashmem to memfd so that we can possibly
> > > > > remove the ashmem driver in the future from staging while also
> > > > > benefiting from using memfd and contributing to it. Note staging
> > drivers
> > > > > are also not ABI and generally can be removed at anytime.
> > > > >
> > > > > One of the main usecases Android has is the ability to create a
> > region
> > > > > and mmap it as writeable, then add protection against making any
> > > > > "future" writes while keeping the existing already mmap'ed
> > > > > writeable-region active.  This allows us to implement a usecase where
> > > > > receivers of the shared memory buffer can get a read-only view, while
> > > > > the sender continues to write to the buffer.
> > > > > See CursorWindow documentation in Android for more details:
> > > > > https://developer.android.com/reference/android/database/
> > CursorWindow
> > > > >
> > > > > This usecase cannot be implemented with the existing F_SEAL_WRITE
> > seal.
> > > > > To support the usecase, this patch adds a new F_SEAL_FUTURE_WRITE
> > seal
> > > > > which prevents any future mmap and write syscalls from succeeding
> > while
> > > > > keeping the existing mmap active.
> > > >
> > > > Please CC linux-api@ on patches like this. If you had done that, I
> > > > might have criticized your v1 patch instead of your v3 patch...
> > > >
> > > > > The following program shows the seal
> > > > > working in action:
> > > > [...]
> > > > > Cc: jr...@google.com
> > > > > Cc: john.stu...@linaro.org
> > > > > Cc: tk...@google.com
> > > > > Cc: gre...@linuxfoundation.org
> > > > > Cc: h...@infradead.org
> > > > > Reviewed-by: John Stultz 
> > > > > Signed-off-by: Joel Fernandes (Google) 
> > > > > ---
> > > > [...]
> > > > > diff --git a/mm/memfd.c b/mm/memfd.c
> > > > > index 2bb5e257080e..5ba9804e9515 100644
> > > > > --- a/mm/memfd.c
> > > > > +++ b/mm/memfd.c
> > > > [...]
> > > > > @@ -219,6 +220,25 @@ static int memfd_add_seals(struct file *file,
> > unsigned int seals)
> > > > > }
> > > > > }
> > > > >
> > > > > +   if ((seals & F_SEAL_FUTURE_WRITE) &&
> > > > > +   !(*file_seals & F_SEAL_FUTURE_WRITE)) {
> > > > > +   /*
> > > > > +* The FUTURE_WRITE seal also prevents growing and
> > shrinking
> > > > > +* so we need them to be already set, or requested
> > now.
> > > > > +*/
> > > > > +   int test_seals = (seals | *file_seals) &
> > > > > +(F_SEAL_GROW | F_SEAL_SHRINK);
> > > > > +
> > > > > +   if (test_seals != (F_SEAL_GROW | F_SEAL_SHRINK)) {
> > > > > +   error = -EINVAL;
> > > > > +   goto unlock;
> > > > > +   }
> > > > > +
> > > > > +   spin_lock(>f_lock);
> > > > > +   file->f_mode &= ~(FMODE_WRITE | FMODE_PWRITE);
> > > > > +   spin_unlock(>f_lock);
> > > > > +   }
> > > >
> > > > So you're fiddling around with the file, but not the inode? How are
> > > > you preventing code like the following from re-opening the file as
> > > > writable?
> > > >
> > > > $ cat memfd.c
> > > > #define _GNU_SOURCE
> > > > #include 
> > > > #include 
> > > > #include 
> > > > #include 
> > > > #include 
> > > > #include 
> > > >
> > > > int main(void) {
> > > >   int fd = syscall(__NR_memfd_create, "testfd", 0);
> > > >   if (fd == -1) err(1, "memfd");
> > > >   char path[100];
> > > >   sprintf(path, "/proc/self/fd/%d", fd);
> > > >   int fd2 = open(path, O_RDWR);
> > > >   if (fd2 == -1) err(1, "reopen");
> > > >   printf("reopen successful: %d\n", fd2);
> > > > }
> > > > $ gcc -o memfd memfd.c
> > > > $ ./memfd
> > > > reopen successful: 4
> > > > $
> > > >
> > > > That aside: I wonder whether a better API would be something that
> > > > allows you to create a new readonly file descriptor, instead of
> > > > fiddling with the writability of an existing fd.
> > >
> > > My favorite approach would be to forbid open() on memfds, hope that
> > > nobody notices the tiny API break, and then add an ioctl for "reopen
> > > this memfd with reduced permissions" - but that's just my personal
> > > opinion.
> >
> > I did something along these lines and it fixes the issue, but I forbid open
> > of memfd only when the F_SEAL_FUTURE_WRITE seal is in place. So then its
> > not
> > an ABI break because this is a brand new seal. That seems the least
> > intrusive
> > solution and it works. Do you mind testing it and I'll add your and
> > Tested-by
> > to the new

Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-09 Thread Andy Lutomirski




> On Nov 9, 2018, at 7:20 PM, Joel Fernandes  wrote:
> 
>> On Fri, Nov 09, 2018 at 10:19:03PM +0100, Jann Horn wrote:
>>> On Fri, Nov 9, 2018 at 10:06 PM Jann Horn  wrote:
>>> On Fri, Nov 9, 2018 at 9:46 PM Joel Fernandes (Google)
>>>  wrote:
 Android uses ashmem for sharing memory regions. We are looking forward
 to migrating all usecases of ashmem to memfd so that we can possibly
 remove the ashmem driver in the future from staging while also
 benefiting from using memfd and contributing to it. Note staging drivers
 are also not ABI and generally can be removed at anytime.
 
 One of the main usecases Android has is the ability to create a region
 and mmap it as writeable, then add protection against making any
 "future" writes while keeping the existing already mmap'ed
 writeable-region active.  This allows us to implement a usecase where
 receivers of the shared memory buffer can get a read-only view, while
 the sender continues to write to the buffer.
 See CursorWindow documentation in Android for more details:
 https://developer.android.com/reference/android/database/CursorWindow
 
 This usecase cannot be implemented with the existing F_SEAL_WRITE seal.
 To support the usecase, this patch adds a new F_SEAL_FUTURE_WRITE seal
 which prevents any future mmap and write syscalls from succeeding while
 keeping the existing mmap active.
>>> 
>>> Please CC linux-api@ on patches like this. If you had done that, I
>>> might have criticized your v1 patch instead of your v3 patch...
>>> 
 The following program shows the seal
 working in action:
>>> [...]
 Cc: jr...@google.com
 Cc: john.stu...@linaro.org
 Cc: tk...@google.com
 Cc: gre...@linuxfoundation.org
 Cc: h...@infradead.org
 Reviewed-by: John Stultz 
 Signed-off-by: Joel Fernandes (Google) 
 ---
>>> [...]
 diff --git a/mm/memfd.c b/mm/memfd.c
 index 2bb5e257080e..5ba9804e9515 100644
 --- a/mm/memfd.c
 +++ b/mm/memfd.c
>>> [...]
 @@ -219,6 +220,25 @@ static int memfd_add_seals(struct file *file, 
 unsigned int seals)
}
}
 
 +   if ((seals & F_SEAL_FUTURE_WRITE) &&
 +   !(*file_seals & F_SEAL_FUTURE_WRITE)) {
 +   /*
 +* The FUTURE_WRITE seal also prevents growing and 
 shrinking
 +* so we need them to be already set, or requested now.
 +*/
 +   int test_seals = (seals | *file_seals) &
 +(F_SEAL_GROW | F_SEAL_SHRINK);
 +
 +   if (test_seals != (F_SEAL_GROW | F_SEAL_SHRINK)) {
 +   error = -EINVAL;
 +   goto unlock;
 +   }
 +
 +   spin_lock(>f_lock);
 +   file->f_mode &= ~(FMODE_WRITE | FMODE_PWRITE);
 +   spin_unlock(>f_lock);
 +   }
>>> 
>>> So you're fiddling around with the file, but not the inode? How are
>>> you preventing code like the following from re-opening the file as
>>> writable?
>>> 
>>> $ cat memfd.c
>>> #define _GNU_SOURCE
>>> #include 
>>> #include 
>>> #include 
>>> #include 
>>> #include 
>>> #include 
>>> 
>>> int main(void) {
>>>  int fd = syscall(__NR_memfd_create, "testfd", 0);
>>>  if (fd == -1) err(1, "memfd");
>>>  char path[100];
>>>  sprintf(path, "/proc/self/fd/%d", fd);
>>>  int fd2 = open(path, O_RDWR);
>>>  if (fd2 == -1) err(1, "reopen");
>>>  printf("reopen successful: %d\n", fd2);
>>> }
>>> $ gcc -o memfd memfd.c
>>> $ ./memfd
>>> reopen successful: 4
>>> $
>>> 
>>> That aside: I wonder whether a better API would be something that
>>> allows you to create a new readonly file descriptor, instead of
>>> fiddling with the writability of an existing fd.
>> 
>> My favorite approach would be to forbid open() on memfds, hope that
>> nobody notices the tiny API break, and then add an ioctl for "reopen
>> this memfd with reduced permissions" - but that's just my personal
>> opinion.
> 
> I did something along these lines and it fixes the issue, but I forbid open
> of memfd only when the F_SEAL_FUTURE_WRITE seal is in place. So then its not
> an ABI break because this is a brand new seal. That seems the least intrusive
> solution and it works. Do you mind testing it and I'll add your and Tested-by
> to the new fix? The patch is based on top of this series.
> 
> ---8<---
> From: "Joel Fernandes (Google)" 
> Subject: [PATCH] mm/memfd: Fix possible promotion to writeable of sealed memfd
> 
> Jann Horn found that reopening an F_SEAL_FUTURE_WRITE sealed memfd
> through /proc/self/fd/N symlink as writeable succeeds. The simplest fix
> without causing ABI breakages and ugly VFS hacks is to simply deny all
> opens on F_SEAL_FUTURE_WRITE sealed fds.
> 
> Reported-by: Jann Horn 
> Signed-off-by: Joel Fernandes (Google) 
> ---
> mm/shmem.c | 18 ++
> 1

Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-09 Thread Andy Lutomirski




> On Nov 9, 2018, at 7:20 PM, Joel Fernandes  wrote:
> 
>> On Fri, Nov 09, 2018 at 10:19:03PM +0100, Jann Horn wrote:
>>> On Fri, Nov 9, 2018 at 10:06 PM Jann Horn  wrote:
>>> On Fri, Nov 9, 2018 at 9:46 PM Joel Fernandes (Google)
>>>  wrote:
 Android uses ashmem for sharing memory regions. We are looking forward
 to migrating all usecases of ashmem to memfd so that we can possibly
 remove the ashmem driver in the future from staging while also
 benefiting from using memfd and contributing to it. Note staging drivers
 are also not ABI and generally can be removed at anytime.
 
 One of the main usecases Android has is the ability to create a region
 and mmap it as writeable, then add protection against making any
 "future" writes while keeping the existing already mmap'ed
 writeable-region active.  This allows us to implement a usecase where
 receivers of the shared memory buffer can get a read-only view, while
 the sender continues to write to the buffer.
 See CursorWindow documentation in Android for more details:
 https://developer.android.com/reference/android/database/CursorWindow
 
 This usecase cannot be implemented with the existing F_SEAL_WRITE seal.
 To support the usecase, this patch adds a new F_SEAL_FUTURE_WRITE seal
 which prevents any future mmap and write syscalls from succeeding while
 keeping the existing mmap active.
>>> 
>>> Please CC linux-api@ on patches like this. If you had done that, I
>>> might have criticized your v1 patch instead of your v3 patch...
>>> 
 The following program shows the seal
 working in action:
>>> [...]
 Cc: jr...@google.com
 Cc: john.stu...@linaro.org
 Cc: tk...@google.com
 Cc: gre...@linuxfoundation.org
 Cc: h...@infradead.org
 Reviewed-by: John Stultz 
 Signed-off-by: Joel Fernandes (Google) 
 ---
>>> [...]
 diff --git a/mm/memfd.c b/mm/memfd.c
 index 2bb5e257080e..5ba9804e9515 100644
 --- a/mm/memfd.c
 +++ b/mm/memfd.c
>>> [...]
 @@ -219,6 +220,25 @@ static int memfd_add_seals(struct file *file, 
 unsigned int seals)
}
}
 
 +   if ((seals & F_SEAL_FUTURE_WRITE) &&
 +   !(*file_seals & F_SEAL_FUTURE_WRITE)) {
 +   /*
 +* The FUTURE_WRITE seal also prevents growing and 
 shrinking
 +* so we need them to be already set, or requested now.
 +*/
 +   int test_seals = (seals | *file_seals) &
 +(F_SEAL_GROW | F_SEAL_SHRINK);
 +
 +   if (test_seals != (F_SEAL_GROW | F_SEAL_SHRINK)) {
 +   error = -EINVAL;
 +   goto unlock;
 +   }
 +
 +   spin_lock(>f_lock);
 +   file->f_mode &= ~(FMODE_WRITE | FMODE_PWRITE);
 +   spin_unlock(>f_lock);
 +   }
>>> 
>>> So you're fiddling around with the file, but not the inode? How are
>>> you preventing code like the following from re-opening the file as
>>> writable?
>>> 
>>> $ cat memfd.c
>>> #define _GNU_SOURCE
>>> #include 
>>> #include 
>>> #include 
>>> #include 
>>> #include 
>>> #include 
>>> 
>>> int main(void) {
>>>  int fd = syscall(__NR_memfd_create, "testfd", 0);
>>>  if (fd == -1) err(1, "memfd");
>>>  char path[100];
>>>  sprintf(path, "/proc/self/fd/%d", fd);
>>>  int fd2 = open(path, O_RDWR);
>>>  if (fd2 == -1) err(1, "reopen");
>>>  printf("reopen successful: %d\n", fd2);
>>> }
>>> $ gcc -o memfd memfd.c
>>> $ ./memfd
>>> reopen successful: 4
>>> $
>>> 
>>> That aside: I wonder whether a better API would be something that
>>> allows you to create a new readonly file descriptor, instead of
>>> fiddling with the writability of an existing fd.
>> 
>> My favorite approach would be to forbid open() on memfds, hope that
>> nobody notices the tiny API break, and then add an ioctl for "reopen
>> this memfd with reduced permissions" - but that's just my personal
>> opinion.
> 
> I did something along these lines and it fixes the issue, but I forbid open
> of memfd only when the F_SEAL_FUTURE_WRITE seal is in place. So then its not
> an ABI break because this is a brand new seal. That seems the least intrusive
> solution and it works. Do you mind testing it and I'll add your and Tested-by
> to the new fix? The patch is based on top of this series.
> 
> ---8<---
> From: "Joel Fernandes (Google)" 
> Subject: [PATCH] mm/memfd: Fix possible promotion to writeable of sealed memfd
> 
> Jann Horn found that reopening an F_SEAL_FUTURE_WRITE sealed memfd
> through /proc/self/fd/N symlink as writeable succeeds. The simplest fix
> without causing ABI breakages and ugly VFS hacks is to simply deny all
> opens on F_SEAL_FUTURE_WRITE sealed fds.
> 
> Reported-by: Jann Horn 
> Signed-off-by: Joel Fernandes (Google) 
> ---
> mm/shmem.c | 18 ++
> 1

Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-09 Thread Joel Fernandes

On Fri, Nov 09, 2018 at 12:36:34PM -0800, Andrew Morton wrote:
> On Wed,  7 Nov 2018 20:15:36 -0800 "Joel Fernandes (Google)" 
>  wrote:
> 
> > Android uses ashmem for sharing memory regions. We are looking forward
> > to migrating all usecases of ashmem to memfd so that we can possibly
> > remove the ashmem driver in the future from staging while also
> > benefiting from using memfd and contributing to it. Note staging drivers
> > are also not ABI and generally can be removed at anytime.
> > 
> > One of the main usecases Android has is the ability to create a region
> > and mmap it as writeable, then add protection against making any
> > "future" writes while keeping the existing already mmap'ed
> > writeable-region active.  This allows us to implement a usecase where
> > receivers of the shared memory buffer can get a read-only view, while
> > the sender continues to write to the buffer.
> > See CursorWindow documentation in Android for more details:
> > https://developer.android.com/reference/android/database/CursorWindow
> 
> It appears that the memfd_create and fcntl manpages will require
> updating.  Please attend to this at the appropriate time?

Yes, I am planning to send those out shortly. I finished working on them.

Also just to let you know, I posted a fix for the security issue Jann Horn
reported and requested him to test it:
https://lore.kernel.org/lkml/20181109234636.ga136...@google.com/T/#m8d9d185e6480d095f0ab8f84bcb103892181f77d

This fix along with the 2 other patches I posted in v3 are all that's needed. 
thanks!

- Joel

Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-09 Thread Joel Fernandes

On Fri, Nov 09, 2018 at 12:36:34PM -0800, Andrew Morton wrote:
> On Wed,  7 Nov 2018 20:15:36 -0800 "Joel Fernandes (Google)" 
>  wrote:
> 
> > Android uses ashmem for sharing memory regions. We are looking forward
> > to migrating all usecases of ashmem to memfd so that we can possibly
> > remove the ashmem driver in the future from staging while also
> > benefiting from using memfd and contributing to it. Note staging drivers
> > are also not ABI and generally can be removed at anytime.
> > 
> > One of the main usecases Android has is the ability to create a region
> > and mmap it as writeable, then add protection against making any
> > "future" writes while keeping the existing already mmap'ed
> > writeable-region active.  This allows us to implement a usecase where
> > receivers of the shared memory buffer can get a read-only view, while
> > the sender continues to write to the buffer.
> > See CursorWindow documentation in Android for more details:
> > https://developer.android.com/reference/android/database/CursorWindow
> 
> It appears that the memfd_create and fcntl manpages will require
> updating.  Please attend to this at the appropriate time?

Yes, I am planning to send those out shortly. I finished working on them.

Also just to let you know, I posted a fix for the security issue Jann Horn
reported and requested him to test it:
https://lore.kernel.org/lkml/20181109234636.ga136...@google.com/T/#m8d9d185e6480d095f0ab8f84bcb103892181f77d

This fix along with the 2 other patches I posted in v3 are all that's needed. 
thanks!

- Joel

Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-09 Thread Joel Fernandes

On Fri, Nov 09, 2018 at 10:19:03PM +0100, Jann Horn wrote:
> On Fri, Nov 9, 2018 at 10:06 PM Jann Horn  wrote:
> > On Fri, Nov 9, 2018 at 9:46 PM Joel Fernandes (Google)
> >  wrote:
> > > Android uses ashmem for sharing memory regions. We are looking forward
> > > to migrating all usecases of ashmem to memfd so that we can possibly
> > > remove the ashmem driver in the future from staging while also
> > > benefiting from using memfd and contributing to it. Note staging drivers
> > > are also not ABI and generally can be removed at anytime.
> > >
> > > One of the main usecases Android has is the ability to create a region
> > > and mmap it as writeable, then add protection against making any
> > > "future" writes while keeping the existing already mmap'ed
> > > writeable-region active.  This allows us to implement a usecase where
> > > receivers of the shared memory buffer can get a read-only view, while
> > > the sender continues to write to the buffer.
> > > See CursorWindow documentation in Android for more details:
> > > https://developer.android.com/reference/android/database/CursorWindow
> > >
> > > This usecase cannot be implemented with the existing F_SEAL_WRITE seal.
> > > To support the usecase, this patch adds a new F_SEAL_FUTURE_WRITE seal
> > > which prevents any future mmap and write syscalls from succeeding while
> > > keeping the existing mmap active.
> >
> > Please CC linux-api@ on patches like this. If you had done that, I
> > might have criticized your v1 patch instead of your v3 patch...
> >
> > > The following program shows the seal
> > > working in action:
> > [...]
> > > Cc: jr...@google.com
> > > Cc: john.stu...@linaro.org
> > > Cc: tk...@google.com
> > > Cc: gre...@linuxfoundation.org
> > > Cc: h...@infradead.org
> > > Reviewed-by: John Stultz 
> > > Signed-off-by: Joel Fernandes (Google) 
> > > ---
> > [...]
> > > diff --git a/mm/memfd.c b/mm/memfd.c
> > > index 2bb5e257080e..5ba9804e9515 100644
> > > --- a/mm/memfd.c
> > > +++ b/mm/memfd.c
> > [...]
> > > @@ -219,6 +220,25 @@ static int memfd_add_seals(struct file *file, 
> > > unsigned int seals)
> > > }
> > > }
> > >
> > > +   if ((seals & F_SEAL_FUTURE_WRITE) &&
> > > +   !(*file_seals & F_SEAL_FUTURE_WRITE)) {
> > > +   /*
> > > +* The FUTURE_WRITE seal also prevents growing and 
> > > shrinking
> > > +* so we need them to be already set, or requested now.
> > > +*/
> > > +   int test_seals = (seals | *file_seals) &
> > > +(F_SEAL_GROW | F_SEAL_SHRINK);
> > > +
> > > +   if (test_seals != (F_SEAL_GROW | F_SEAL_SHRINK)) {
> > > +   error = -EINVAL;
> > > +   goto unlock;
> > > +   }
> > > +
> > > +   spin_lock(>f_lock);
> > > +   file->f_mode &= ~(FMODE_WRITE | FMODE_PWRITE);
> > > +   spin_unlock(>f_lock);
> > > +   }
> >
> > So you're fiddling around with the file, but not the inode? How are
> > you preventing code like the following from re-opening the file as
> > writable?
> >
> > $ cat memfd.c
> > #define _GNU_SOURCE
> > #include 
> > #include 
> > #include 
> > #include 
> > #include 
> > #include 
> >
> > int main(void) {
> >   int fd = syscall(__NR_memfd_create, "testfd", 0);
> >   if (fd == -1) err(1, "memfd");
> >   char path[100];
> >   sprintf(path, "/proc/self/fd/%d", fd);
> >   int fd2 = open(path, O_RDWR);
> >   if (fd2 == -1) err(1, "reopen");
> >   printf("reopen successful: %d\n", fd2);
> > }
> > $ gcc -o memfd memfd.c
> > $ ./memfd
> > reopen successful: 4
> > $
> >
> > That aside: I wonder whether a better API would be something that
> > allows you to create a new readonly file descriptor, instead of
> > fiddling with the writability of an existing fd.
> 
> My favorite approach would be to forbid open() on memfds, hope that
> nobody notices the tiny API break, and then add an ioctl for "reopen
> this memfd with reduced permissions" - but that's just my personal
> opinion.

I did something along these lines and it fixes the issue, but I forbid open
of memfd only when the F_SEAL_FUTURE_WRITE seal is in place. So then its not
an ABI break because this is a brand new seal. That seems the least intrusive
solution and it works. Do you mind testing it and I'll add your and Tested-by
to the new fix? The patch is based on top of this series.

---8<---
From: "Joel Fernandes (Google)" 
Subject: [PATCH] mm/memfd: Fix possible promotion to writeable of sealed memfd

Jann Horn found that reopening an F_SEAL_FUTURE_WRITE sealed memfd
through /proc/self/fd/N symlink as writeable succeeds. The simplest fix
without causing ABI breakages and ugly VFS hacks is to simply deny all
opens on F_SEAL_FUTURE_WRITE sealed fds.

Reported-by: Jann Horn 
Signed-off-by: Joel Fernandes (Google) 
---
 mm/shmem.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/mm/shmem.c

Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-09 Thread Joel Fernandes

On Fri, Nov 09, 2018 at 10:19:03PM +0100, Jann Horn wrote:
> On Fri, Nov 9, 2018 at 10:06 PM Jann Horn  wrote:
> > On Fri, Nov 9, 2018 at 9:46 PM Joel Fernandes (Google)
> >  wrote:
> > > Android uses ashmem for sharing memory regions. We are looking forward
> > > to migrating all usecases of ashmem to memfd so that we can possibly
> > > remove the ashmem driver in the future from staging while also
> > > benefiting from using memfd and contributing to it. Note staging drivers
> > > are also not ABI and generally can be removed at anytime.
> > >
> > > One of the main usecases Android has is the ability to create a region
> > > and mmap it as writeable, then add protection against making any
> > > "future" writes while keeping the existing already mmap'ed
> > > writeable-region active.  This allows us to implement a usecase where
> > > receivers of the shared memory buffer can get a read-only view, while
> > > the sender continues to write to the buffer.
> > > See CursorWindow documentation in Android for more details:
> > > https://developer.android.com/reference/android/database/CursorWindow
> > >
> > > This usecase cannot be implemented with the existing F_SEAL_WRITE seal.
> > > To support the usecase, this patch adds a new F_SEAL_FUTURE_WRITE seal
> > > which prevents any future mmap and write syscalls from succeeding while
> > > keeping the existing mmap active.
> >
> > Please CC linux-api@ on patches like this. If you had done that, I
> > might have criticized your v1 patch instead of your v3 patch...
> >
> > > The following program shows the seal
> > > working in action:
> > [...]
> > > Cc: jr...@google.com
> > > Cc: john.stu...@linaro.org
> > > Cc: tk...@google.com
> > > Cc: gre...@linuxfoundation.org
> > > Cc: h...@infradead.org
> > > Reviewed-by: John Stultz 
> > > Signed-off-by: Joel Fernandes (Google) 
> > > ---
> > [...]
> > > diff --git a/mm/memfd.c b/mm/memfd.c
> > > index 2bb5e257080e..5ba9804e9515 100644
> > > --- a/mm/memfd.c
> > > +++ b/mm/memfd.c
> > [...]
> > > @@ -219,6 +220,25 @@ static int memfd_add_seals(struct file *file, 
> > > unsigned int seals)
> > > }
> > > }
> > >
> > > +   if ((seals & F_SEAL_FUTURE_WRITE) &&
> > > +   !(*file_seals & F_SEAL_FUTURE_WRITE)) {
> > > +   /*
> > > +* The FUTURE_WRITE seal also prevents growing and 
> > > shrinking
> > > +* so we need them to be already set, or requested now.
> > > +*/
> > > +   int test_seals = (seals | *file_seals) &
> > > +(F_SEAL_GROW | F_SEAL_SHRINK);
> > > +
> > > +   if (test_seals != (F_SEAL_GROW | F_SEAL_SHRINK)) {
> > > +   error = -EINVAL;
> > > +   goto unlock;
> > > +   }
> > > +
> > > +   spin_lock(>f_lock);
> > > +   file->f_mode &= ~(FMODE_WRITE | FMODE_PWRITE);
> > > +   spin_unlock(>f_lock);
> > > +   }
> >
> > So you're fiddling around with the file, but not the inode? How are
> > you preventing code like the following from re-opening the file as
> > writable?
> >
> > $ cat memfd.c
> > #define _GNU_SOURCE
> > #include 
> > #include 
> > #include 
> > #include 
> > #include 
> > #include 
> >
> > int main(void) {
> >   int fd = syscall(__NR_memfd_create, "testfd", 0);
> >   if (fd == -1) err(1, "memfd");
> >   char path[100];
> >   sprintf(path, "/proc/self/fd/%d", fd);
> >   int fd2 = open(path, O_RDWR);
> >   if (fd2 == -1) err(1, "reopen");
> >   printf("reopen successful: %d\n", fd2);
> > }
> > $ gcc -o memfd memfd.c
> > $ ./memfd
> > reopen successful: 4
> > $
> >
> > That aside: I wonder whether a better API would be something that
> > allows you to create a new readonly file descriptor, instead of
> > fiddling with the writability of an existing fd.
> 
> My favorite approach would be to forbid open() on memfds, hope that
> nobody notices the tiny API break, and then add an ioctl for "reopen
> this memfd with reduced permissions" - but that's just my personal
> opinion.

I did something along these lines and it fixes the issue, but I forbid open
of memfd only when the F_SEAL_FUTURE_WRITE seal is in place. So then its not
an ABI break because this is a brand new seal. That seems the least intrusive
solution and it works. Do you mind testing it and I'll add your and Tested-by
to the new fix? The patch is based on top of this series.

---8<---
From: "Joel Fernandes (Google)" 
Subject: [PATCH] mm/memfd: Fix possible promotion to writeable of sealed memfd

Jann Horn found that reopening an F_SEAL_FUTURE_WRITE sealed memfd
through /proc/self/fd/N symlink as writeable succeeds. The simplest fix
without causing ABI breakages and ugly VFS hacks is to simply deny all
opens on F_SEAL_FUTURE_WRITE sealed fds.

Reported-by: Jann Horn 
Signed-off-by: Joel Fernandes (Google) 
---
 mm/shmem.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/mm/shmem.c

Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-09 Thread Joel Fernandes

On Fri, Nov 09, 2018 at 08:02:14PM +, Michael Tirado wrote:
[...]
> > > That aside: I wonder whether a better API would be something that
> > > allows you to create a new readonly file descriptor, instead of
> > > fiddling with the writability of an existing fd.
> >
> > Every now and then I try to write a patch to prevent using proc to reopen
> > a file with greater permission than the original open.
> >
> > I like your idea to have a clean way to reopen a a memfd with reduced
> > permissions. But I would make it a syscall instead and maybe make it only
> > work for memfd at first.  And the proc issue would need to be fixed, too.
> 
> IMO the best solution would handle the issue at memfd creation time by
> removing the race condition.

I agree, this is another idea I'm exploring. We could add a new .open
callback to shmem_file_operations and check for seals there.

thanks,

 - Joel

Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-09 Thread Joel Fernandes

On Fri, Nov 09, 2018 at 08:02:14PM +, Michael Tirado wrote:
[...]
> > > That aside: I wonder whether a better API would be something that
> > > allows you to create a new readonly file descriptor, instead of
> > > fiddling with the writability of an existing fd.
> >
> > Every now and then I try to write a patch to prevent using proc to reopen
> > a file with greater permission than the original open.
> >
> > I like your idea to have a clean way to reopen a a memfd with reduced
> > permissions. But I would make it a syscall instead and maybe make it only
> > work for memfd at first.  And the proc issue would need to be fixed, too.
> 
> IMO the best solution would handle the issue at memfd creation time by
> removing the race condition.

I agree, this is another idea I'm exploring. We could add a new .open
callback to shmem_file_operations and check for seals there.

thanks,

 - Joel

Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-09 Thread Joel Fernandes

On Fri, Nov 09, 2018 at 03:14:02PM -0800, Andy Lutomirski wrote:
>  That aside: I wonder whether a better API would be something that
>  allows you to create a new readonly file descriptor, instead of
>  fiddling with the writability of an existing fd.
> >>> 
> >>> That doesn't work, unfortunately. The ashmem API we're replacing with
> >>> memfd requires file descriptor continuity. I also looked into opening
> >>> a new FD and dup2(2)ing atop the old one, but this approach doesn't
> >>> work in the case that the old FD has already leaked to some other
> >>> context (e.g., another dup, SCM_RIGHTS). See
> >>> https://developer.android.com/ndk/reference/group/memory. We can't
> >>> break ASharedMemory_setProt.
> >> 
> >> 
> >> Hmm.  If we fix the general reopen bug, a way to drop write access from
> >> an existing struct file would do what Android needs, right?  I don’t
> >> know if there are general VFS issues with that.
> > 

I don't think there is a way to fix this in /proc/pid/fd. At the proc
level, the /proc/pid/fd/N files are just soft symlinks that follow through to
the actual file. The open is actually done on that inode/file. I think
changing it the way being discussed here means changing the way symlinks work
in Linux.

I think the right way to fix this is at the memfd inode level. I am working
on a follow up patch on top of this patch, and will send that out in a few
days (along with the man page updates).

thanks!

 - Joel

Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-09 Thread Joel Fernandes

On Fri, Nov 09, 2018 at 03:14:02PM -0800, Andy Lutomirski wrote:
>  That aside: I wonder whether a better API would be something that
>  allows you to create a new readonly file descriptor, instead of
>  fiddling with the writability of an existing fd.
> >>> 
> >>> That doesn't work, unfortunately. The ashmem API we're replacing with
> >>> memfd requires file descriptor continuity. I also looked into opening
> >>> a new FD and dup2(2)ing atop the old one, but this approach doesn't
> >>> work in the case that the old FD has already leaked to some other
> >>> context (e.g., another dup, SCM_RIGHTS). See
> >>> https://developer.android.com/ndk/reference/group/memory. We can't
> >>> break ASharedMemory_setProt.
> >> 
> >> 
> >> Hmm.  If we fix the general reopen bug, a way to drop write access from
> >> an existing struct file would do what Android needs, right?  I don’t
> >> know if there are general VFS issues with that.
> > 

I don't think there is a way to fix this in /proc/pid/fd. At the proc
level, the /proc/pid/fd/N files are just soft symlinks that follow through to
the actual file. The open is actually done on that inode/file. I think
changing it the way being discussed here means changing the way symlinks work
in Linux.

I think the right way to fix this is at the memfd inode level. I am working
on a follow up patch on top of this patch, and will send that out in a few
days (along with the man page updates).

thanks!

 - Joel

Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-09 Thread Michael Tirado

On Fri, Nov 9, 2018 at 9:41 PM Andy Lutomirski  wrote:
>
>
>
> > On Nov 9, 2018, at 1:06 PM, Jann Horn  wrote:
> >
> > +linux-api for API addition
> > +hughd as FYI since this is somewhat related to mm/shmem
> >
> > On Fri, Nov 9, 2018 at 9:46 PM Joel Fernandes (Google)
> >  wrote:
> >> Android uses ashmem for sharing memory regions. We are looking forward
> >> to migrating all usecases of ashmem to memfd so that we can possibly
> >> remove the ashmem driver in the future from staging while also
> >> benefiting from using memfd and contributing to it. Note staging drivers
> >> are also not ABI and generally can be removed at anytime.
> >>
> >> One of the main usecases Android has is the ability to create a region
> >> and mmap it as writeable, then add protection against making any
> >> "future" writes while keeping the existing already mmap'ed
> >> writeable-region active.  This allows us to implement a usecase where
> >> receivers of the shared memory buffer can get a read-only view, while
> >> the sender continues to write to the buffer.

Oh I remember trying this years ago with a new seal, F_SEAL_WRITE_PEER,
or something like that.

> >
> > So you're fiddling around with the file, but not the inode? How are
> > you preventing code like the following from re-opening the file as
> > writable?
> >
> > $ cat memfd.c
> > #define _GNU_SOURCE
> > #include 
> > #include 
> > #include 
> > #include 
> > #include 
> > #include 
> >
> > int main(void) {
> >  int fd = syscall(__NR_memfd_create, "testfd", 0);
> >  if (fd == -1) err(1, "memfd");
> >  char path[100];
> >  sprintf(path, "/proc/self/fd/%d", fd);
> >  int fd2 = open(path, O_RDWR);
> >  if (fd2 == -1) err(1, "reopen");
> >  printf("reopen successful: %d\n", fd2);
> > }
> > $ gcc -o memfd memfd.c
> > $ ./memfd
> > reopen successful: 4
> > $
> >

The race condition between memfd_create and applying seals in fcntl?
I think it would be possible to block new write mappings from peer processes if
there is a new memfd_create api that accepts seals. Allowing caller to
set a seal
like the one I proposed years ago, though in a race-free manner. Then
also consider
how to properly handle blocking inherited +W mapping through
clone/fork. Maybe I'm
forgetting some other pitfalls?



> > That aside: I wonder whether a better API would be something that
> > allows you to create a new readonly file descriptor, instead of
> > fiddling with the writability of an existing fd.
>
> Every now and then I try to write a patch to prevent using proc to reopen a 
> file with greater permission than the original open.
>
> I like your idea to have a clean way to reopen a a memfd with reduced 
> permissions. But I would make it a syscall instead and maybe make it only 
> work for memfd at first.  And the proc issue would need to be fixed, too.

IMO the best solution would handle the issue at memfd creation time by
removing the race condition.

Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-09 Thread Michael Tirado

On Fri, Nov 9, 2018 at 9:41 PM Andy Lutomirski  wrote:
>
>
>
> > On Nov 9, 2018, at 1:06 PM, Jann Horn  wrote:
> >
> > +linux-api for API addition
> > +hughd as FYI since this is somewhat related to mm/shmem
> >
> > On Fri, Nov 9, 2018 at 9:46 PM Joel Fernandes (Google)
> >  wrote:
> >> Android uses ashmem for sharing memory regions. We are looking forward
> >> to migrating all usecases of ashmem to memfd so that we can possibly
> >> remove the ashmem driver in the future from staging while also
> >> benefiting from using memfd and contributing to it. Note staging drivers
> >> are also not ABI and generally can be removed at anytime.
> >>
> >> One of the main usecases Android has is the ability to create a region
> >> and mmap it as writeable, then add protection against making any
> >> "future" writes while keeping the existing already mmap'ed
> >> writeable-region active.  This allows us to implement a usecase where
> >> receivers of the shared memory buffer can get a read-only view, while
> >> the sender continues to write to the buffer.

Oh I remember trying this years ago with a new seal, F_SEAL_WRITE_PEER,
or something like that.

> >
> > So you're fiddling around with the file, but not the inode? How are
> > you preventing code like the following from re-opening the file as
> > writable?
> >
> > $ cat memfd.c
> > #define _GNU_SOURCE
> > #include 
> > #include 
> > #include 
> > #include 
> > #include 
> > #include 
> >
> > int main(void) {
> >  int fd = syscall(__NR_memfd_create, "testfd", 0);
> >  if (fd == -1) err(1, "memfd");
> >  char path[100];
> >  sprintf(path, "/proc/self/fd/%d", fd);
> >  int fd2 = open(path, O_RDWR);
> >  if (fd2 == -1) err(1, "reopen");
> >  printf("reopen successful: %d\n", fd2);
> > }
> > $ gcc -o memfd memfd.c
> > $ ./memfd
> > reopen successful: 4
> > $
> >

The race condition between memfd_create and applying seals in fcntl?
I think it would be possible to block new write mappings from peer processes if
there is a new memfd_create api that accepts seals. Allowing caller to
set a seal
like the one I proposed years ago, though in a race-free manner. Then
also consider
how to properly handle blocking inherited +W mapping through
clone/fork. Maybe I'm
forgetting some other pitfalls?



> > That aside: I wonder whether a better API would be something that
> > allows you to create a new readonly file descriptor, instead of
> > fiddling with the writability of an existing fd.
>
> Every now and then I try to write a patch to prevent using proc to reopen a 
> file with greater permission than the original open.
>
> I like your idea to have a clean way to reopen a a memfd with reduced 
> permissions. But I would make it a syscall instead and maybe make it only 
> work for memfd at first.  And the proc issue would need to be fixed, too.

IMO the best solution would handle the issue at memfd creation time by
removing the race condition.

Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-09 Thread Joel Fernandes

On Fri, Nov 09, 2018 at 10:06:31PM +0100, Jann Horn wrote:
> +linux-api for API addition
> +hughd as FYI since this is somewhat related to mm/shmem
> 
> On Fri, Nov 9, 2018 at 9:46 PM Joel Fernandes (Google)
>  wrote:
> > Android uses ashmem for sharing memory regions. We are looking forward
> > to migrating all usecases of ashmem to memfd so that we can possibly
> > remove the ashmem driver in the future from staging while also
> > benefiting from using memfd and contributing to it. Note staging drivers
> > are also not ABI and generally can be removed at anytime.
> >
> > One of the main usecases Android has is the ability to create a region
> > and mmap it as writeable, then add protection against making any
> > "future" writes while keeping the existing already mmap'ed
> > writeable-region active.  This allows us to implement a usecase where
> > receivers of the shared memory buffer can get a read-only view, while
> > the sender continues to write to the buffer.
> > See CursorWindow documentation in Android for more details:
> > https://developer.android.com/reference/android/database/CursorWindow
> >
> > This usecase cannot be implemented with the existing F_SEAL_WRITE seal.
> > To support the usecase, this patch adds a new F_SEAL_FUTURE_WRITE seal
> > which prevents any future mmap and write syscalls from succeeding while
> > keeping the existing mmap active.
> 
> Please CC linux-api@ on patches like this. If you had done that, I
> might have criticized your v1 patch instead of your v3 patch...

Ok, will do from next time.

> > The following program shows the seal
> > working in action:
> [...]
> > Cc: jr...@google.com
> > Cc: john.stu...@linaro.org
> > Cc: tk...@google.com
> > Cc: gre...@linuxfoundation.org
> > Cc: h...@infradead.org
> > Reviewed-by: John Stultz 
> > Signed-off-by: Joel Fernandes (Google) 
> > ---
> [...]
> > diff --git a/mm/memfd.c b/mm/memfd.c
> > index 2bb5e257080e..5ba9804e9515 100644
> > --- a/mm/memfd.c
> > +++ b/mm/memfd.c
> [...]
> > @@ -219,6 +220,25 @@ static int memfd_add_seals(struct file *file, unsigned 
> > int seals)
> > }
> > }
> >
> > +   if ((seals & F_SEAL_FUTURE_WRITE) &&
> > +   !(*file_seals & F_SEAL_FUTURE_WRITE)) {
> > +   /*
> > +* The FUTURE_WRITE seal also prevents growing and shrinking
> > +* so we need them to be already set, or requested now.
> > +*/
> > +   int test_seals = (seals | *file_seals) &
> > +(F_SEAL_GROW | F_SEAL_SHRINK);
> > +
> > +   if (test_seals != (F_SEAL_GROW | F_SEAL_SHRINK)) {
> > +   error = -EINVAL;
> > +   goto unlock;
> > +   }
> > +
> > +   spin_lock(>f_lock);
> > +   file->f_mode &= ~(FMODE_WRITE | FMODE_PWRITE);
> > +   spin_unlock(>f_lock);
> > +   }
> 
> So you're fiddling around with the file, but not the inode? How are
> you preventing code like the following from re-opening the file as
> writable?
> 
> $ cat memfd.c
> #define _GNU_SOURCE
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
> 
> int main(void) {
>   int fd = syscall(__NR_memfd_create, "testfd", 0);
>   if (fd == -1) err(1, "memfd");
>   char path[100];
>   sprintf(path, "/proc/self/fd/%d", fd);
>   int fd2 = open(path, O_RDWR);
>   if (fd2 == -1) err(1, "reopen");
>   printf("reopen successful: %d\n", fd2);
> }
> $ gcc -o memfd memfd.c
> $ ./memfd
> reopen successful: 4

Great catch and this is indeed an issue :-(. I verified it too.

> That aside: I wonder whether a better API would be something that
> allows you to create a new readonly file descriptor, instead of
> fiddling with the writability of an existing fd.

Android usecases cannot deal with a new fd number because it breaks the
continuity of having the same old fd, as Dan also pointed out.

Also such API will have the same issues you brought up?

thanks,

 - Joel

Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-09 Thread Joel Fernandes

On Fri, Nov 09, 2018 at 10:06:31PM +0100, Jann Horn wrote:
> +linux-api for API addition
> +hughd as FYI since this is somewhat related to mm/shmem
> 
> On Fri, Nov 9, 2018 at 9:46 PM Joel Fernandes (Google)
>  wrote:
> > Android uses ashmem for sharing memory regions. We are looking forward
> > to migrating all usecases of ashmem to memfd so that we can possibly
> > remove the ashmem driver in the future from staging while also
> > benefiting from using memfd and contributing to it. Note staging drivers
> > are also not ABI and generally can be removed at anytime.
> >
> > One of the main usecases Android has is the ability to create a region
> > and mmap it as writeable, then add protection against making any
> > "future" writes while keeping the existing already mmap'ed
> > writeable-region active.  This allows us to implement a usecase where
> > receivers of the shared memory buffer can get a read-only view, while
> > the sender continues to write to the buffer.
> > See CursorWindow documentation in Android for more details:
> > https://developer.android.com/reference/android/database/CursorWindow
> >
> > This usecase cannot be implemented with the existing F_SEAL_WRITE seal.
> > To support the usecase, this patch adds a new F_SEAL_FUTURE_WRITE seal
> > which prevents any future mmap and write syscalls from succeeding while
> > keeping the existing mmap active.
> 
> Please CC linux-api@ on patches like this. If you had done that, I
> might have criticized your v1 patch instead of your v3 patch...

Ok, will do from next time.

> > The following program shows the seal
> > working in action:
> [...]
> > Cc: jr...@google.com
> > Cc: john.stu...@linaro.org
> > Cc: tk...@google.com
> > Cc: gre...@linuxfoundation.org
> > Cc: h...@infradead.org
> > Reviewed-by: John Stultz 
> > Signed-off-by: Joel Fernandes (Google) 
> > ---
> [...]
> > diff --git a/mm/memfd.c b/mm/memfd.c
> > index 2bb5e257080e..5ba9804e9515 100644
> > --- a/mm/memfd.c
> > +++ b/mm/memfd.c
> [...]
> > @@ -219,6 +220,25 @@ static int memfd_add_seals(struct file *file, unsigned 
> > int seals)
> > }
> > }
> >
> > +   if ((seals & F_SEAL_FUTURE_WRITE) &&
> > +   !(*file_seals & F_SEAL_FUTURE_WRITE)) {
> > +   /*
> > +* The FUTURE_WRITE seal also prevents growing and shrinking
> > +* so we need them to be already set, or requested now.
> > +*/
> > +   int test_seals = (seals | *file_seals) &
> > +(F_SEAL_GROW | F_SEAL_SHRINK);
> > +
> > +   if (test_seals != (F_SEAL_GROW | F_SEAL_SHRINK)) {
> > +   error = -EINVAL;
> > +   goto unlock;
> > +   }
> > +
> > +   spin_lock(>f_lock);
> > +   file->f_mode &= ~(FMODE_WRITE | FMODE_PWRITE);
> > +   spin_unlock(>f_lock);
> > +   }
> 
> So you're fiddling around with the file, but not the inode? How are
> you preventing code like the following from re-opening the file as
> writable?
> 
> $ cat memfd.c
> #define _GNU_SOURCE
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
> 
> int main(void) {
>   int fd = syscall(__NR_memfd_create, "testfd", 0);
>   if (fd == -1) err(1, "memfd");
>   char path[100];
>   sprintf(path, "/proc/self/fd/%d", fd);
>   int fd2 = open(path, O_RDWR);
>   if (fd2 == -1) err(1, "reopen");
>   printf("reopen successful: %d\n", fd2);
> }
> $ gcc -o memfd memfd.c
> $ ./memfd
> reopen successful: 4

Great catch and this is indeed an issue :-(. I verified it too.

> That aside: I wonder whether a better API would be something that
> allows you to create a new readonly file descriptor, instead of
> fiddling with the writability of an existing fd.

Android usecases cannot deal with a new fd number because it breaks the
continuity of having the same old fd, as Dan also pointed out.

Also such API will have the same issues you brought up?

thanks,

 - Joel

Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-09 Thread Andy Lutomirski




> On Nov 9, 2018, at 2:42 PM, Daniel Colascione  wrote:
> 
> On Fri, Nov 9, 2018 at 2:37 PM, Andy Lutomirski  wrote:
>>> Another, more general fix might be to prevent /proc/pid/fd/N opens
>>> from "upgrading" access modes. But that'd be a bigger ABI break.
>> 
>> I think we should fix that, too.  I consider it a bug fix, not an ABI break, 
>> personally.
> 
> Someone, somewhere is probably relying on it though, and that means
> that we probably can't change it unless it's actually causing
> problems.
> 
> spacebar heating

I think it has caused problems in the past. It’s certainly extremely surprising 
behavior.  I’d say it should be fixed and, if needed, a sysctl to unfix it 
might be okay.

> 
 That aside: I wonder whether a better API would be something that
 allows you to create a new readonly file descriptor, instead of
 fiddling with the writability of an existing fd.
>>> 
>>> That doesn't work, unfortunately. The ashmem API we're replacing with
>>> memfd requires file descriptor continuity. I also looked into opening
>>> a new FD and dup2(2)ing atop the old one, but this approach doesn't
>>> work in the case that the old FD has already leaked to some other
>>> context (e.g., another dup, SCM_RIGHTS). See
>>> https://developer.android.com/ndk/reference/group/memory. We can't
>>> break ASharedMemory_setProt.
>> 
>> 
>> Hmm.  If we fix the general reopen bug, a way to drop write access from an 
>> existing struct file would do what Android needs, right?  I don’t know if 
>> there are general VFS issues with that.
> 
> I also proposed that. :-) Maybe it'd work best as a special case of
> the perennial revoke(2) that people keep proposing. You'd be able to
> selectively revoke all access or just write access.

Sounds good to me, modulo possible races, but that shouldn’t be too hard to 
deal with.

Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-09 Thread Andy Lutomirski




> On Nov 9, 2018, at 2:42 PM, Daniel Colascione  wrote:
> 
> On Fri, Nov 9, 2018 at 2:37 PM, Andy Lutomirski  wrote:
>>> Another, more general fix might be to prevent /proc/pid/fd/N opens
>>> from "upgrading" access modes. But that'd be a bigger ABI break.
>> 
>> I think we should fix that, too.  I consider it a bug fix, not an ABI break, 
>> personally.
> 
> Someone, somewhere is probably relying on it though, and that means
> that we probably can't change it unless it's actually causing
> problems.
> 
> spacebar heating

I think it has caused problems in the past. It’s certainly extremely surprising 
behavior.  I’d say it should be fixed and, if needed, a sysctl to unfix it 
might be okay.

> 
 That aside: I wonder whether a better API would be something that
 allows you to create a new readonly file descriptor, instead of
 fiddling with the writability of an existing fd.
>>> 
>>> That doesn't work, unfortunately. The ashmem API we're replacing with
>>> memfd requires file descriptor continuity. I also looked into opening
>>> a new FD and dup2(2)ing atop the old one, but this approach doesn't
>>> work in the case that the old FD has already leaked to some other
>>> context (e.g., another dup, SCM_RIGHTS). See
>>> https://developer.android.com/ndk/reference/group/memory. We can't
>>> break ASharedMemory_setProt.
>> 
>> 
>> Hmm.  If we fix the general reopen bug, a way to drop write access from an 
>> existing struct file would do what Android needs, right?  I don’t know if 
>> there are general VFS issues with that.
> 
> I also proposed that. :-) Maybe it'd work best as a special case of
> the perennial revoke(2) that people keep proposing. You'd be able to
> selectively revoke all access or just write access.

Sounds good to me, modulo possible races, but that shouldn’t be too hard to 
deal with.

Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-09 Thread Daniel Colascione

On Fri, Nov 9, 2018 at 2:37 PM, Andy Lutomirski  wrote:
>> Another, more general fix might be to prevent /proc/pid/fd/N opens
>> from "upgrading" access modes. But that'd be a bigger ABI break.
>
> I think we should fix that, too.  I consider it a bug fix, not an ABI break, 
> personally.

Someone, somewhere is probably relying on it though, and that means
that we probably can't change it unless it's actually causing
problems.

spacebar heating

>>> That aside: I wonder whether a better API would be something that
>>> allows you to create a new readonly file descriptor, instead of
>>> fiddling with the writability of an existing fd.
>>
>> That doesn't work, unfortunately. The ashmem API we're replacing with
>> memfd requires file descriptor continuity. I also looked into opening
>> a new FD and dup2(2)ing atop the old one, but this approach doesn't
>> work in the case that the old FD has already leaked to some other
>> context (e.g., another dup, SCM_RIGHTS). See
>> https://developer.android.com/ndk/reference/group/memory. We can't
>> break ASharedMemory_setProt.
>
>
> Hmm.  If we fix the general reopen bug, a way to drop write access from an 
> existing struct file would do what Android needs, right?  I don’t know if 
> there are general VFS issues with that.

I also proposed that. :-) Maybe it'd work best as a special case of
the perennial revoke(2) that people keep proposing. You'd be able to
selectively revoke all access or just write access.

Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-09 Thread Daniel Colascione

On Fri, Nov 9, 2018 at 2:37 PM, Andy Lutomirski  wrote:
>> Another, more general fix might be to prevent /proc/pid/fd/N opens
>> from "upgrading" access modes. But that'd be a bigger ABI break.
>
> I think we should fix that, too.  I consider it a bug fix, not an ABI break, 
> personally.

Someone, somewhere is probably relying on it though, and that means
that we probably can't change it unless it's actually causing
problems.

spacebar heating

>>> That aside: I wonder whether a better API would be something that
>>> allows you to create a new readonly file descriptor, instead of
>>> fiddling with the writability of an existing fd.
>>
>> That doesn't work, unfortunately. The ashmem API we're replacing with
>> memfd requires file descriptor continuity. I also looked into opening
>> a new FD and dup2(2)ing atop the old one, but this approach doesn't
>> work in the case that the old FD has already leaked to some other
>> context (e.g., another dup, SCM_RIGHTS). See
>> https://developer.android.com/ndk/reference/group/memory. We can't
>> break ASharedMemory_setProt.
>
>
> Hmm.  If we fix the general reopen bug, a way to drop write access from an 
> existing struct file would do what Android needs, right?  I don’t know if 
> there are general VFS issues with that.

I also proposed that. :-) Maybe it'd work best as a special case of
the perennial revoke(2) that people keep proposing. You'd be able to
selectively revoke all access or just write access.

Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-09 Thread Andy Lutomirski




> On Nov 9, 2018, at 2:20 PM, Daniel Colascione  wrote:
> 
>> On Fri, Nov 9, 2018 at 1:06 PM, Jann Horn  wrote:
>> 
>> +linux-api for API addition
>> +hughd as FYI since this is somewhat related to mm/shmem
>> 
>> On Fri, Nov 9, 2018 at 9:46 PM Joel Fernandes (Google)
>>  wrote:
>>> Android uses ashmem for sharing memory regions. We are looking forward
>>> to migrating all usecases of ashmem to memfd so that we can possibly
>>> remove the ashmem driver in the future from staging while also
>>> benefiting from using memfd and contributing to it. Note staging drivers
>>> are also not ABI and generally can be removed at anytime.
>>> 
>>> One of the main usecases Android has is the ability to create a region
>>> and mmap it as writeable, then add protection against making any
>>> "future" writes while keeping the existing already mmap'ed
>>> writeable-region active.  This allows us to implement a usecase where
>>> receivers of the shared memory buffer can get a read-only view, while
>>> the sender continues to write to the buffer.
>>> See CursorWindow documentation in Android for more details:
>>> https://developer.android.com/reference/android/database/CursorWindow
>>> 
>>> This usecase cannot be implemented with the existing F_SEAL_WRITE seal.
>>> To support the usecase, this patch adds a new F_SEAL_FUTURE_WRITE seal
>>> which prevents any future mmap and write syscalls from succeeding while
>>> keeping the existing mmap active.
>> 
>> Please CC linux-api@ on patches like this. If you had done that, I
>> might have criticized your v1 patch instead of your v3 patch...
>> 
>>> The following program shows the seal
>>> working in action:
>> [...]
>>> Cc: jr...@google.com
>>> Cc: john.stu...@linaro.org
>>> Cc: tk...@google.com
>>> Cc: gre...@linuxfoundation.org
>>> Cc: h...@infradead.org
>>> Reviewed-by: John Stultz 
>>> Signed-off-by: Joel Fernandes (Google) 
>>> ---
>> [...]
>>> diff --git a/mm/memfd.c b/mm/memfd.c
>>> index 2bb5e257080e..5ba9804e9515 100644
>>> --- a/mm/memfd.c
>>> +++ b/mm/memfd.c
>> [...]
>>> @@ -219,6 +220,25 @@ static int memfd_add_seals(struct file *file, unsigned 
>>> int seals)
>>>}
>>>}
>>> 
>>> +   if ((seals & F_SEAL_FUTURE_WRITE) &&
>>> +   !(*file_seals & F_SEAL_FUTURE_WRITE)) {
>>> +   /*
>>> +* The FUTURE_WRITE seal also prevents growing and shrinking
>>> +* so we need them to be already set, or requested now.
>>> +*/
>>> +   int test_seals = (seals | *file_seals) &
>>> +(F_SEAL_GROW | F_SEAL_SHRINK);
>>> +
>>> +   if (test_seals != (F_SEAL_GROW | F_SEAL_SHRINK)) {
>>> +   error = -EINVAL;
>>> +   goto unlock;
>>> +   }
>>> +
>>> +   spin_lock(>f_lock);
>>> +   file->f_mode &= ~(FMODE_WRITE | FMODE_PWRITE);
>>> +   spin_unlock(>f_lock);
>>> +   }
>> 
>> So you're fiddling around with the file, but not the inode? How are
>> you preventing code like the following from re-opening the file as
>> writable?
> 
> Good catch. That's fixable too though, isn't it, just by fiddling with
> the inode, right?

True.

> 
> Another, more general fix might be to prevent /proc/pid/fd/N opens
> from "upgrading" access modes. But that'd be a bigger ABI break.

I think we should fix that, too.  I consider it a bug fix, not an ABI break, 
personally.

> 
>> That aside: I wonder whether a better API would be something that
>> allows you to create a new readonly file descriptor, instead of
>> fiddling with the writability of an existing fd.
> 
> That doesn't work, unfortunately. The ashmem API we're replacing with
> memfd requires file descriptor continuity. I also looked into opening
> a new FD and dup2(2)ing atop the old one, but this approach doesn't
> work in the case that the old FD has already leaked to some other
> context (e.g., another dup, SCM_RIGHTS). See
> https://developer.android.com/ndk/reference/group/memory. We can't
> break ASharedMemory_setProt.


Hmm.  If we fix the general reopen bug, a way to drop write access from an 
existing struct file would do what Android needs, right?  I don’t know if there 
are general VFS issues with that.

Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-09 Thread Andy Lutomirski




> On Nov 9, 2018, at 2:20 PM, Daniel Colascione  wrote:
> 
>> On Fri, Nov 9, 2018 at 1:06 PM, Jann Horn  wrote:
>> 
>> +linux-api for API addition
>> +hughd as FYI since this is somewhat related to mm/shmem
>> 
>> On Fri, Nov 9, 2018 at 9:46 PM Joel Fernandes (Google)
>>  wrote:
>>> Android uses ashmem for sharing memory regions. We are looking forward
>>> to migrating all usecases of ashmem to memfd so that we can possibly
>>> remove the ashmem driver in the future from staging while also
>>> benefiting from using memfd and contributing to it. Note staging drivers
>>> are also not ABI and generally can be removed at anytime.
>>> 
>>> One of the main usecases Android has is the ability to create a region
>>> and mmap it as writeable, then add protection against making any
>>> "future" writes while keeping the existing already mmap'ed
>>> writeable-region active.  This allows us to implement a usecase where
>>> receivers of the shared memory buffer can get a read-only view, while
>>> the sender continues to write to the buffer.
>>> See CursorWindow documentation in Android for more details:
>>> https://developer.android.com/reference/android/database/CursorWindow
>>> 
>>> This usecase cannot be implemented with the existing F_SEAL_WRITE seal.
>>> To support the usecase, this patch adds a new F_SEAL_FUTURE_WRITE seal
>>> which prevents any future mmap and write syscalls from succeeding while
>>> keeping the existing mmap active.
>> 
>> Please CC linux-api@ on patches like this. If you had done that, I
>> might have criticized your v1 patch instead of your v3 patch...
>> 
>>> The following program shows the seal
>>> working in action:
>> [...]
>>> Cc: jr...@google.com
>>> Cc: john.stu...@linaro.org
>>> Cc: tk...@google.com
>>> Cc: gre...@linuxfoundation.org
>>> Cc: h...@infradead.org
>>> Reviewed-by: John Stultz 
>>> Signed-off-by: Joel Fernandes (Google) 
>>> ---
>> [...]
>>> diff --git a/mm/memfd.c b/mm/memfd.c
>>> index 2bb5e257080e..5ba9804e9515 100644
>>> --- a/mm/memfd.c
>>> +++ b/mm/memfd.c
>> [...]
>>> @@ -219,6 +220,25 @@ static int memfd_add_seals(struct file *file, unsigned 
>>> int seals)
>>>}
>>>}
>>> 
>>> +   if ((seals & F_SEAL_FUTURE_WRITE) &&
>>> +   !(*file_seals & F_SEAL_FUTURE_WRITE)) {
>>> +   /*
>>> +* The FUTURE_WRITE seal also prevents growing and shrinking
>>> +* so we need them to be already set, or requested now.
>>> +*/
>>> +   int test_seals = (seals | *file_seals) &
>>> +(F_SEAL_GROW | F_SEAL_SHRINK);
>>> +
>>> +   if (test_seals != (F_SEAL_GROW | F_SEAL_SHRINK)) {
>>> +   error = -EINVAL;
>>> +   goto unlock;
>>> +   }
>>> +
>>> +   spin_lock(>f_lock);
>>> +   file->f_mode &= ~(FMODE_WRITE | FMODE_PWRITE);
>>> +   spin_unlock(>f_lock);
>>> +   }
>> 
>> So you're fiddling around with the file, but not the inode? How are
>> you preventing code like the following from re-opening the file as
>> writable?
> 
> Good catch. That's fixable too though, isn't it, just by fiddling with
> the inode, right?

True.

> 
> Another, more general fix might be to prevent /proc/pid/fd/N opens
> from "upgrading" access modes. But that'd be a bigger ABI break.

I think we should fix that, too.  I consider it a bug fix, not an ABI break, 
personally.

> 
>> That aside: I wonder whether a better API would be something that
>> allows you to create a new readonly file descriptor, instead of
>> fiddling with the writability of an existing fd.
> 
> That doesn't work, unfortunately. The ashmem API we're replacing with
> memfd requires file descriptor continuity. I also looked into opening
> a new FD and dup2(2)ing atop the old one, but this approach doesn't
> work in the case that the old FD has already leaked to some other
> context (e.g., another dup, SCM_RIGHTS). See
> https://developer.android.com/ndk/reference/group/memory. We can't
> break ASharedMemory_setProt.


Hmm.  If we fix the general reopen bug, a way to drop write access from an 
existing struct file would do what Android needs, right?  I don’t know if there 
are general VFS issues with that.

Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-09 Thread Daniel Colascione

On Fri, Nov 9, 2018 at 1:06 PM, Jann Horn  wrote:
>
> +linux-api for API addition
> +hughd as FYI since this is somewhat related to mm/shmem
>
> On Fri, Nov 9, 2018 at 9:46 PM Joel Fernandes (Google)
>  wrote:
> > Android uses ashmem for sharing memory regions. We are looking forward
> > to migrating all usecases of ashmem to memfd so that we can possibly
> > remove the ashmem driver in the future from staging while also
> > benefiting from using memfd and contributing to it. Note staging drivers
> > are also not ABI and generally can be removed at anytime.
> >
> > One of the main usecases Android has is the ability to create a region
> > and mmap it as writeable, then add protection against making any
> > "future" writes while keeping the existing already mmap'ed
> > writeable-region active.  This allows us to implement a usecase where
> > receivers of the shared memory buffer can get a read-only view, while
> > the sender continues to write to the buffer.
> > See CursorWindow documentation in Android for more details:
> > https://developer.android.com/reference/android/database/CursorWindow
> >
> > This usecase cannot be implemented with the existing F_SEAL_WRITE seal.
> > To support the usecase, this patch adds a new F_SEAL_FUTURE_WRITE seal
> > which prevents any future mmap and write syscalls from succeeding while
> > keeping the existing mmap active.
>
> Please CC linux-api@ on patches like this. If you had done that, I
> might have criticized your v1 patch instead of your v3 patch...
>
> > The following program shows the seal
> > working in action:
> [...]
> > Cc: jr...@google.com
> > Cc: john.stu...@linaro.org
> > Cc: tk...@google.com
> > Cc: gre...@linuxfoundation.org
> > Cc: h...@infradead.org
> > Reviewed-by: John Stultz 
> > Signed-off-by: Joel Fernandes (Google) 
> > ---
> [...]
> > diff --git a/mm/memfd.c b/mm/memfd.c
> > index 2bb5e257080e..5ba9804e9515 100644
> > --- a/mm/memfd.c
> > +++ b/mm/memfd.c
> [...]
> > @@ -219,6 +220,25 @@ static int memfd_add_seals(struct file *file, unsigned 
> > int seals)
> > }
> > }
> >
> > +   if ((seals & F_SEAL_FUTURE_WRITE) &&
> > +   !(*file_seals & F_SEAL_FUTURE_WRITE)) {
> > +   /*
> > +* The FUTURE_WRITE seal also prevents growing and shrinking
> > +* so we need them to be already set, or requested now.
> > +*/
> > +   int test_seals = (seals | *file_seals) &
> > +(F_SEAL_GROW | F_SEAL_SHRINK);
> > +
> > +   if (test_seals != (F_SEAL_GROW | F_SEAL_SHRINK)) {
> > +   error = -EINVAL;
> > +   goto unlock;
> > +   }
> > +
> > +   spin_lock(>f_lock);
> > +   file->f_mode &= ~(FMODE_WRITE | FMODE_PWRITE);
> > +   spin_unlock(>f_lock);
> > +   }
>
> So you're fiddling around with the file, but not the inode? How are
> you preventing code like the following from re-opening the file as
> writable?

Good catch. That's fixable too though, isn't it, just by fiddling with
the inode, right?

Another, more general fix might be to prevent /proc/pid/fd/N opens
from "upgrading" access modes. But that'd be a bigger ABI break.

> That aside: I wonder whether a better API would be something that
> allows you to create a new readonly file descriptor, instead of
> fiddling with the writability of an existing fd.

That doesn't work, unfortunately. The ashmem API we're replacing with
memfd requires file descriptor continuity. I also looked into opening
a new FD and dup2(2)ing atop the old one, but this approach doesn't
work in the case that the old FD has already leaked to some other
context (e.g., another dup, SCM_RIGHTS). See
https://developer.android.com/ndk/reference/group/memory. We can't
break ASharedMemory_setProt.

Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-09 Thread Daniel Colascione

On Fri, Nov 9, 2018 at 1:06 PM, Jann Horn  wrote:
>
> +linux-api for API addition
> +hughd as FYI since this is somewhat related to mm/shmem
>
> On Fri, Nov 9, 2018 at 9:46 PM Joel Fernandes (Google)
>  wrote:
> > Android uses ashmem for sharing memory regions. We are looking forward
> > to migrating all usecases of ashmem to memfd so that we can possibly
> > remove the ashmem driver in the future from staging while also
> > benefiting from using memfd and contributing to it. Note staging drivers
> > are also not ABI and generally can be removed at anytime.
> >
> > One of the main usecases Android has is the ability to create a region
> > and mmap it as writeable, then add protection against making any
> > "future" writes while keeping the existing already mmap'ed
> > writeable-region active.  This allows us to implement a usecase where
> > receivers of the shared memory buffer can get a read-only view, while
> > the sender continues to write to the buffer.
> > See CursorWindow documentation in Android for more details:
> > https://developer.android.com/reference/android/database/CursorWindow
> >
> > This usecase cannot be implemented with the existing F_SEAL_WRITE seal.
> > To support the usecase, this patch adds a new F_SEAL_FUTURE_WRITE seal
> > which prevents any future mmap and write syscalls from succeeding while
> > keeping the existing mmap active.
>
> Please CC linux-api@ on patches like this. If you had done that, I
> might have criticized your v1 patch instead of your v3 patch...
>
> > The following program shows the seal
> > working in action:
> [...]
> > Cc: jr...@google.com
> > Cc: john.stu...@linaro.org
> > Cc: tk...@google.com
> > Cc: gre...@linuxfoundation.org
> > Cc: h...@infradead.org
> > Reviewed-by: John Stultz 
> > Signed-off-by: Joel Fernandes (Google) 
> > ---
> [...]
> > diff --git a/mm/memfd.c b/mm/memfd.c
> > index 2bb5e257080e..5ba9804e9515 100644
> > --- a/mm/memfd.c
> > +++ b/mm/memfd.c
> [...]
> > @@ -219,6 +220,25 @@ static int memfd_add_seals(struct file *file, unsigned 
> > int seals)
> > }
> > }
> >
> > +   if ((seals & F_SEAL_FUTURE_WRITE) &&
> > +   !(*file_seals & F_SEAL_FUTURE_WRITE)) {
> > +   /*
> > +* The FUTURE_WRITE seal also prevents growing and shrinking
> > +* so we need them to be already set, or requested now.
> > +*/
> > +   int test_seals = (seals | *file_seals) &
> > +(F_SEAL_GROW | F_SEAL_SHRINK);
> > +
> > +   if (test_seals != (F_SEAL_GROW | F_SEAL_SHRINK)) {
> > +   error = -EINVAL;
> > +   goto unlock;
> > +   }
> > +
> > +   spin_lock(>f_lock);
> > +   file->f_mode &= ~(FMODE_WRITE | FMODE_PWRITE);
> > +   spin_unlock(>f_lock);
> > +   }
>
> So you're fiddling around with the file, but not the inode? How are
> you preventing code like the following from re-opening the file as
> writable?

Good catch. That's fixable too though, isn't it, just by fiddling with
the inode, right?

Another, more general fix might be to prevent /proc/pid/fd/N opens
from "upgrading" access modes. But that'd be a bigger ABI break.

> That aside: I wonder whether a better API would be something that
> allows you to create a new readonly file descriptor, instead of
> fiddling with the writability of an existing fd.

That doesn't work, unfortunately. The ashmem API we're replacing with
memfd requires file descriptor continuity. I also looked into opening
a new FD and dup2(2)ing atop the old one, but this approach doesn't
work in the case that the old FD has already leaked to some other
context (e.g., another dup, SCM_RIGHTS). See
https://developer.android.com/ndk/reference/group/memory. We can't
break ASharedMemory_setProt.

Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-09 Thread Andy Lutomirski




> On Nov 9, 2018, at 1:06 PM, Jann Horn  wrote:
> 
> +linux-api for API addition
> +hughd as FYI since this is somewhat related to mm/shmem
> 
> On Fri, Nov 9, 2018 at 9:46 PM Joel Fernandes (Google)
>  wrote:
>> Android uses ashmem for sharing memory regions. We are looking forward
>> to migrating all usecases of ashmem to memfd so that we can possibly
>> remove the ashmem driver in the future from staging while also
>> benefiting from using memfd and contributing to it. Note staging drivers
>> are also not ABI and generally can be removed at anytime.
>> 
>> One of the main usecases Android has is the ability to create a region
>> and mmap it as writeable, then add protection against making any
>> "future" writes while keeping the existing already mmap'ed
>> writeable-region active.  This allows us to implement a usecase where
>> receivers of the shared memory buffer can get a read-only view, while
>> the sender continues to write to the buffer.
>> See CursorWindow documentation in Android for more details:
>> https://developer.android.com/reference/android/database/CursorWindow
>> 
>> This usecase cannot be implemented with the existing F_SEAL_WRITE seal.
>> To support the usecase, this patch adds a new F_SEAL_FUTURE_WRITE seal
>> which prevents any future mmap and write syscalls from succeeding while
>> keeping the existing mmap active.
> 
> Please CC linux-api@ on patches like this. If you had done that, I
> might have criticized your v1 patch instead of your v3 patch...
> 
>> The following program shows the seal
>> working in action:
> [...]
>> Cc: jr...@google.com
>> Cc: john.stu...@linaro.org
>> Cc: tk...@google.com
>> Cc: gre...@linuxfoundation.org
>> Cc: h...@infradead.org
>> Reviewed-by: John Stultz 
>> Signed-off-by: Joel Fernandes (Google) 
>> ---
> [...]
>> diff --git a/mm/memfd.c b/mm/memfd.c
>> index 2bb5e257080e..5ba9804e9515 100644
>> --- a/mm/memfd.c
>> +++ b/mm/memfd.c
> [...]
>> @@ -219,6 +220,25 @@ static int memfd_add_seals(struct file *file, unsigned 
>> int seals)
>>}
>>}
>> 
>> +   if ((seals & F_SEAL_FUTURE_WRITE) &&
>> +   !(*file_seals & F_SEAL_FUTURE_WRITE)) {
>> +   /*
>> +* The FUTURE_WRITE seal also prevents growing and shrinking
>> +* so we need them to be already set, or requested now.
>> +*/
>> +   int test_seals = (seals | *file_seals) &
>> +(F_SEAL_GROW | F_SEAL_SHRINK);
>> +
>> +   if (test_seals != (F_SEAL_GROW | F_SEAL_SHRINK)) {
>> +   error = -EINVAL;
>> +   goto unlock;
>> +   }
>> +
>> +   spin_lock(>f_lock);
>> +   file->f_mode &= ~(FMODE_WRITE | FMODE_PWRITE);
>> +   spin_unlock(>f_lock);
>> +   }
> 
> So you're fiddling around with the file, but not the inode? How are
> you preventing code like the following from re-opening the file as
> writable?
> 
> $ cat memfd.c
> #define _GNU_SOURCE
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
> 
> int main(void) {
>  int fd = syscall(__NR_memfd_create, "testfd", 0);
>  if (fd == -1) err(1, "memfd");
>  char path[100];
>  sprintf(path, "/proc/self/fd/%d", fd);
>  int fd2 = open(path, O_RDWR);
>  if (fd2 == -1) err(1, "reopen");
>  printf("reopen successful: %d\n", fd2);
> }
> $ gcc -o memfd memfd.c
> $ ./memfd
> reopen successful: 4
> $
> 
> That aside: I wonder whether a better API would be something that
> allows you to create a new readonly file descriptor, instead of
> fiddling with the writability of an existing fd.

Every now and then I try to write a patch to prevent using proc to reopen a 
file with greater permission than the original open.

I like your idea to have a clean way to reopen a a memfd with reduced 
permissions. But I would make it a syscall instead and maybe make it only work 
for memfd at first.  And the proc issue would need to be fixed, too.

Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-09 Thread Andy Lutomirski




> On Nov 9, 2018, at 1:06 PM, Jann Horn  wrote:
> 
> +linux-api for API addition
> +hughd as FYI since this is somewhat related to mm/shmem
> 
> On Fri, Nov 9, 2018 at 9:46 PM Joel Fernandes (Google)
>  wrote:
>> Android uses ashmem for sharing memory regions. We are looking forward
>> to migrating all usecases of ashmem to memfd so that we can possibly
>> remove the ashmem driver in the future from staging while also
>> benefiting from using memfd and contributing to it. Note staging drivers
>> are also not ABI and generally can be removed at anytime.
>> 
>> One of the main usecases Android has is the ability to create a region
>> and mmap it as writeable, then add protection against making any
>> "future" writes while keeping the existing already mmap'ed
>> writeable-region active.  This allows us to implement a usecase where
>> receivers of the shared memory buffer can get a read-only view, while
>> the sender continues to write to the buffer.
>> See CursorWindow documentation in Android for more details:
>> https://developer.android.com/reference/android/database/CursorWindow
>> 
>> This usecase cannot be implemented with the existing F_SEAL_WRITE seal.
>> To support the usecase, this patch adds a new F_SEAL_FUTURE_WRITE seal
>> which prevents any future mmap and write syscalls from succeeding while
>> keeping the existing mmap active.
> 
> Please CC linux-api@ on patches like this. If you had done that, I
> might have criticized your v1 patch instead of your v3 patch...
> 
>> The following program shows the seal
>> working in action:
> [...]
>> Cc: jr...@google.com
>> Cc: john.stu...@linaro.org
>> Cc: tk...@google.com
>> Cc: gre...@linuxfoundation.org
>> Cc: h...@infradead.org
>> Reviewed-by: John Stultz 
>> Signed-off-by: Joel Fernandes (Google) 
>> ---
> [...]
>> diff --git a/mm/memfd.c b/mm/memfd.c
>> index 2bb5e257080e..5ba9804e9515 100644
>> --- a/mm/memfd.c
>> +++ b/mm/memfd.c
> [...]
>> @@ -219,6 +220,25 @@ static int memfd_add_seals(struct file *file, unsigned 
>> int seals)
>>}
>>}
>> 
>> +   if ((seals & F_SEAL_FUTURE_WRITE) &&
>> +   !(*file_seals & F_SEAL_FUTURE_WRITE)) {
>> +   /*
>> +* The FUTURE_WRITE seal also prevents growing and shrinking
>> +* so we need them to be already set, or requested now.
>> +*/
>> +   int test_seals = (seals | *file_seals) &
>> +(F_SEAL_GROW | F_SEAL_SHRINK);
>> +
>> +   if (test_seals != (F_SEAL_GROW | F_SEAL_SHRINK)) {
>> +   error = -EINVAL;
>> +   goto unlock;
>> +   }
>> +
>> +   spin_lock(>f_lock);
>> +   file->f_mode &= ~(FMODE_WRITE | FMODE_PWRITE);
>> +   spin_unlock(>f_lock);
>> +   }
> 
> So you're fiddling around with the file, but not the inode? How are
> you preventing code like the following from re-opening the file as
> writable?
> 
> $ cat memfd.c
> #define _GNU_SOURCE
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
> 
> int main(void) {
>  int fd = syscall(__NR_memfd_create, "testfd", 0);
>  if (fd == -1) err(1, "memfd");
>  char path[100];
>  sprintf(path, "/proc/self/fd/%d", fd);
>  int fd2 = open(path, O_RDWR);
>  if (fd2 == -1) err(1, "reopen");
>  printf("reopen successful: %d\n", fd2);
> }
> $ gcc -o memfd memfd.c
> $ ./memfd
> reopen successful: 4
> $
> 
> That aside: I wonder whether a better API would be something that
> allows you to create a new readonly file descriptor, instead of
> fiddling with the writability of an existing fd.

Every now and then I try to write a patch to prevent using proc to reopen a 
file with greater permission than the original open.

I like your idea to have a clean way to reopen a a memfd with reduced 
permissions. But I would make it a syscall instead and maybe make it only work 
for memfd at first.  And the proc issue would need to be fixed, too.

Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-09 Thread Jann Horn

On Fri, Nov 9, 2018 at 10:06 PM Jann Horn  wrote:
> On Fri, Nov 9, 2018 at 9:46 PM Joel Fernandes (Google)
>  wrote:
> > Android uses ashmem for sharing memory regions. We are looking forward
> > to migrating all usecases of ashmem to memfd so that we can possibly
> > remove the ashmem driver in the future from staging while also
> > benefiting from using memfd and contributing to it. Note staging drivers
> > are also not ABI and generally can be removed at anytime.
> >
> > One of the main usecases Android has is the ability to create a region
> > and mmap it as writeable, then add protection against making any
> > "future" writes while keeping the existing already mmap'ed
> > writeable-region active.  This allows us to implement a usecase where
> > receivers of the shared memory buffer can get a read-only view, while
> > the sender continues to write to the buffer.
> > See CursorWindow documentation in Android for more details:
> > https://developer.android.com/reference/android/database/CursorWindow
> >
> > This usecase cannot be implemented with the existing F_SEAL_WRITE seal.
> > To support the usecase, this patch adds a new F_SEAL_FUTURE_WRITE seal
> > which prevents any future mmap and write syscalls from succeeding while
> > keeping the existing mmap active.
>
> Please CC linux-api@ on patches like this. If you had done that, I
> might have criticized your v1 patch instead of your v3 patch...
>
> > The following program shows the seal
> > working in action:
> [...]
> > Cc: jr...@google.com
> > Cc: john.stu...@linaro.org
> > Cc: tk...@google.com
> > Cc: gre...@linuxfoundation.org
> > Cc: h...@infradead.org
> > Reviewed-by: John Stultz 
> > Signed-off-by: Joel Fernandes (Google) 
> > ---
> [...]
> > diff --git a/mm/memfd.c b/mm/memfd.c
> > index 2bb5e257080e..5ba9804e9515 100644
> > --- a/mm/memfd.c
> > +++ b/mm/memfd.c
> [...]
> > @@ -219,6 +220,25 @@ static int memfd_add_seals(struct file *file, unsigned 
> > int seals)
> > }
> > }
> >
> > +   if ((seals & F_SEAL_FUTURE_WRITE) &&
> > +   !(*file_seals & F_SEAL_FUTURE_WRITE)) {
> > +   /*
> > +* The FUTURE_WRITE seal also prevents growing and shrinking
> > +* so we need them to be already set, or requested now.
> > +*/
> > +   int test_seals = (seals | *file_seals) &
> > +(F_SEAL_GROW | F_SEAL_SHRINK);
> > +
> > +   if (test_seals != (F_SEAL_GROW | F_SEAL_SHRINK)) {
> > +   error = -EINVAL;
> > +   goto unlock;
> > +   }
> > +
> > +   spin_lock(>f_lock);
> > +   file->f_mode &= ~(FMODE_WRITE | FMODE_PWRITE);
> > +   spin_unlock(>f_lock);
> > +   }
>
> So you're fiddling around with the file, but not the inode? How are
> you preventing code like the following from re-opening the file as
> writable?
>
> $ cat memfd.c
> #define _GNU_SOURCE
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
>
> int main(void) {
>   int fd = syscall(__NR_memfd_create, "testfd", 0);
>   if (fd == -1) err(1, "memfd");
>   char path[100];
>   sprintf(path, "/proc/self/fd/%d", fd);
>   int fd2 = open(path, O_RDWR);
>   if (fd2 == -1) err(1, "reopen");
>   printf("reopen successful: %d\n", fd2);
> }
> $ gcc -o memfd memfd.c
> $ ./memfd
> reopen successful: 4
> $
>
> That aside: I wonder whether a better API would be something that
> allows you to create a new readonly file descriptor, instead of
> fiddling with the writability of an existing fd.

My favorite approach would be to forbid open() on memfds, hope that
nobody notices the tiny API break, and then add an ioctl for "reopen
this memfd with reduced permissions" - but that's just my personal
opinion.

Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-09 Thread Jann Horn

On Fri, Nov 9, 2018 at 10:06 PM Jann Horn  wrote:
> On Fri, Nov 9, 2018 at 9:46 PM Joel Fernandes (Google)
>  wrote:
> > Android uses ashmem for sharing memory regions. We are looking forward
> > to migrating all usecases of ashmem to memfd so that we can possibly
> > remove the ashmem driver in the future from staging while also
> > benefiting from using memfd and contributing to it. Note staging drivers
> > are also not ABI and generally can be removed at anytime.
> >
> > One of the main usecases Android has is the ability to create a region
> > and mmap it as writeable, then add protection against making any
> > "future" writes while keeping the existing already mmap'ed
> > writeable-region active.  This allows us to implement a usecase where
> > receivers of the shared memory buffer can get a read-only view, while
> > the sender continues to write to the buffer.
> > See CursorWindow documentation in Android for more details:
> > https://developer.android.com/reference/android/database/CursorWindow
> >
> > This usecase cannot be implemented with the existing F_SEAL_WRITE seal.
> > To support the usecase, this patch adds a new F_SEAL_FUTURE_WRITE seal
> > which prevents any future mmap and write syscalls from succeeding while
> > keeping the existing mmap active.
>
> Please CC linux-api@ on patches like this. If you had done that, I
> might have criticized your v1 patch instead of your v3 patch...
>
> > The following program shows the seal
> > working in action:
> [...]
> > Cc: jr...@google.com
> > Cc: john.stu...@linaro.org
> > Cc: tk...@google.com
> > Cc: gre...@linuxfoundation.org
> > Cc: h...@infradead.org
> > Reviewed-by: John Stultz 
> > Signed-off-by: Joel Fernandes (Google) 
> > ---
> [...]
> > diff --git a/mm/memfd.c b/mm/memfd.c
> > index 2bb5e257080e..5ba9804e9515 100644
> > --- a/mm/memfd.c
> > +++ b/mm/memfd.c
> [...]
> > @@ -219,6 +220,25 @@ static int memfd_add_seals(struct file *file, unsigned 
> > int seals)
> > }
> > }
> >
> > +   if ((seals & F_SEAL_FUTURE_WRITE) &&
> > +   !(*file_seals & F_SEAL_FUTURE_WRITE)) {
> > +   /*
> > +* The FUTURE_WRITE seal also prevents growing and shrinking
> > +* so we need them to be already set, or requested now.
> > +*/
> > +   int test_seals = (seals | *file_seals) &
> > +(F_SEAL_GROW | F_SEAL_SHRINK);
> > +
> > +   if (test_seals != (F_SEAL_GROW | F_SEAL_SHRINK)) {
> > +   error = -EINVAL;
> > +   goto unlock;
> > +   }
> > +
> > +   spin_lock(>f_lock);
> > +   file->f_mode &= ~(FMODE_WRITE | FMODE_PWRITE);
> > +   spin_unlock(>f_lock);
> > +   }
>
> So you're fiddling around with the file, but not the inode? How are
> you preventing code like the following from re-opening the file as
> writable?
>
> $ cat memfd.c
> #define _GNU_SOURCE
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
>
> int main(void) {
>   int fd = syscall(__NR_memfd_create, "testfd", 0);
>   if (fd == -1) err(1, "memfd");
>   char path[100];
>   sprintf(path, "/proc/self/fd/%d", fd);
>   int fd2 = open(path, O_RDWR);
>   if (fd2 == -1) err(1, "reopen");
>   printf("reopen successful: %d\n", fd2);
> }
> $ gcc -o memfd memfd.c
> $ ./memfd
> reopen successful: 4
> $
>
> That aside: I wonder whether a better API would be something that
> allows you to create a new readonly file descriptor, instead of
> fiddling with the writability of an existing fd.

My favorite approach would be to forbid open() on memfds, hope that
nobody notices the tiny API break, and then add an ioctl for "reopen
this memfd with reduced permissions" - but that's just my personal
opinion.

Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-09 Thread Jann Horn

+linux-api for API addition
+hughd as FYI since this is somewhat related to mm/shmem

On Fri, Nov 9, 2018 at 9:46 PM Joel Fernandes (Google)
 wrote:
> Android uses ashmem for sharing memory regions. We are looking forward
> to migrating all usecases of ashmem to memfd so that we can possibly
> remove the ashmem driver in the future from staging while also
> benefiting from using memfd and contributing to it. Note staging drivers
> are also not ABI and generally can be removed at anytime.
>
> One of the main usecases Android has is the ability to create a region
> and mmap it as writeable, then add protection against making any
> "future" writes while keeping the existing already mmap'ed
> writeable-region active.  This allows us to implement a usecase where
> receivers of the shared memory buffer can get a read-only view, while
> the sender continues to write to the buffer.
> See CursorWindow documentation in Android for more details:
> https://developer.android.com/reference/android/database/CursorWindow
>
> This usecase cannot be implemented with the existing F_SEAL_WRITE seal.
> To support the usecase, this patch adds a new F_SEAL_FUTURE_WRITE seal
> which prevents any future mmap and write syscalls from succeeding while
> keeping the existing mmap active.

Please CC linux-api@ on patches like this. If you had done that, I
might have criticized your v1 patch instead of your v3 patch...

> The following program shows the seal
> working in action:
[...]
> Cc: jr...@google.com
> Cc: john.stu...@linaro.org
> Cc: tk...@google.com
> Cc: gre...@linuxfoundation.org
> Cc: h...@infradead.org
> Reviewed-by: John Stultz 
> Signed-off-by: Joel Fernandes (Google) 
> ---
[...]
> diff --git a/mm/memfd.c b/mm/memfd.c
> index 2bb5e257080e..5ba9804e9515 100644
> --- a/mm/memfd.c
> +++ b/mm/memfd.c
[...]
> @@ -219,6 +220,25 @@ static int memfd_add_seals(struct file *file, unsigned 
> int seals)
> }
> }
>
> +   if ((seals & F_SEAL_FUTURE_WRITE) &&
> +   !(*file_seals & F_SEAL_FUTURE_WRITE)) {
> +   /*
> +* The FUTURE_WRITE seal also prevents growing and shrinking
> +* so we need them to be already set, or requested now.
> +*/
> +   int test_seals = (seals | *file_seals) &
> +(F_SEAL_GROW | F_SEAL_SHRINK);
> +
> +   if (test_seals != (F_SEAL_GROW | F_SEAL_SHRINK)) {
> +   error = -EINVAL;
> +   goto unlock;
> +   }
> +
> +   spin_lock(>f_lock);
> +   file->f_mode &= ~(FMODE_WRITE | FMODE_PWRITE);
> +   spin_unlock(>f_lock);
> +   }

So you're fiddling around with the file, but not the inode? How are
you preventing code like the following from re-opening the file as
writable?

$ cat memfd.c
#define _GNU_SOURCE
#include 
#include 
#include 
#include 
#include 
#include 

int main(void) {
  int fd = syscall(__NR_memfd_create, "testfd", 0);
  if (fd == -1) err(1, "memfd");
  char path[100];
  sprintf(path, "/proc/self/fd/%d", fd);
  int fd2 = open(path, O_RDWR);
  if (fd2 == -1) err(1, "reopen");
  printf("reopen successful: %d\n", fd2);
}
$ gcc -o memfd memfd.c
$ ./memfd
reopen successful: 4
$

That aside: I wonder whether a better API would be something that
allows you to create a new readonly file descriptor, instead of
fiddling with the writability of an existing fd.

Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-09 Thread Jann Horn

+linux-api for API addition
+hughd as FYI since this is somewhat related to mm/shmem

On Fri, Nov 9, 2018 at 9:46 PM Joel Fernandes (Google)
 wrote:
> Android uses ashmem for sharing memory regions. We are looking forward
> to migrating all usecases of ashmem to memfd so that we can possibly
> remove the ashmem driver in the future from staging while also
> benefiting from using memfd and contributing to it. Note staging drivers
> are also not ABI and generally can be removed at anytime.
>
> One of the main usecases Android has is the ability to create a region
> and mmap it as writeable, then add protection against making any
> "future" writes while keeping the existing already mmap'ed
> writeable-region active.  This allows us to implement a usecase where
> receivers of the shared memory buffer can get a read-only view, while
> the sender continues to write to the buffer.
> See CursorWindow documentation in Android for more details:
> https://developer.android.com/reference/android/database/CursorWindow
>
> This usecase cannot be implemented with the existing F_SEAL_WRITE seal.
> To support the usecase, this patch adds a new F_SEAL_FUTURE_WRITE seal
> which prevents any future mmap and write syscalls from succeeding while
> keeping the existing mmap active.

Please CC linux-api@ on patches like this. If you had done that, I
might have criticized your v1 patch instead of your v3 patch...

> The following program shows the seal
> working in action:
[...]
> Cc: jr...@google.com
> Cc: john.stu...@linaro.org
> Cc: tk...@google.com
> Cc: gre...@linuxfoundation.org
> Cc: h...@infradead.org
> Reviewed-by: John Stultz 
> Signed-off-by: Joel Fernandes (Google) 
> ---
[...]
> diff --git a/mm/memfd.c b/mm/memfd.c
> index 2bb5e257080e..5ba9804e9515 100644
> --- a/mm/memfd.c
> +++ b/mm/memfd.c
[...]
> @@ -219,6 +220,25 @@ static int memfd_add_seals(struct file *file, unsigned 
> int seals)
> }
> }
>
> +   if ((seals & F_SEAL_FUTURE_WRITE) &&
> +   !(*file_seals & F_SEAL_FUTURE_WRITE)) {
> +   /*
> +* The FUTURE_WRITE seal also prevents growing and shrinking
> +* so we need them to be already set, or requested now.
> +*/
> +   int test_seals = (seals | *file_seals) &
> +(F_SEAL_GROW | F_SEAL_SHRINK);
> +
> +   if (test_seals != (F_SEAL_GROW | F_SEAL_SHRINK)) {
> +   error = -EINVAL;
> +   goto unlock;
> +   }
> +
> +   spin_lock(>f_lock);
> +   file->f_mode &= ~(FMODE_WRITE | FMODE_PWRITE);
> +   spin_unlock(>f_lock);
> +   }

So you're fiddling around with the file, but not the inode? How are
you preventing code like the following from re-opening the file as
writable?

$ cat memfd.c
#define _GNU_SOURCE
#include 
#include 
#include 
#include 
#include 
#include 

int main(void) {
  int fd = syscall(__NR_memfd_create, "testfd", 0);
  if (fd == -1) err(1, "memfd");
  char path[100];
  sprintf(path, "/proc/self/fd/%d", fd);
  int fd2 = open(path, O_RDWR);
  if (fd2 == -1) err(1, "reopen");
  printf("reopen successful: %d\n", fd2);
}
$ gcc -o memfd memfd.c
$ ./memfd
reopen successful: 4
$

That aside: I wonder whether a better API would be something that
allows you to create a new readonly file descriptor, instead of
fiddling with the writability of an existing fd.

Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-09 Thread Andrew Morton

On Wed,  7 Nov 2018 20:15:36 -0800 "Joel Fernandes (Google)" 
 wrote:

> Android uses ashmem for sharing memory regions. We are looking forward
> to migrating all usecases of ashmem to memfd so that we can possibly
> remove the ashmem driver in the future from staging while also
> benefiting from using memfd and contributing to it. Note staging drivers
> are also not ABI and generally can be removed at anytime.
> 
> One of the main usecases Android has is the ability to create a region
> and mmap it as writeable, then add protection against making any
> "future" writes while keeping the existing already mmap'ed
> writeable-region active.  This allows us to implement a usecase where
> receivers of the shared memory buffer can get a read-only view, while
> the sender continues to write to the buffer.
> See CursorWindow documentation in Android for more details:
> https://developer.android.com/reference/android/database/CursorWindow

It appears that the memfd_create and fcntl manpages will require
updating.  Please attend to this at the appropriate time?

Actually, it would help the review process if those updates were
available now.

Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-09 Thread Andrew Morton

On Wed,  7 Nov 2018 20:15:36 -0800 "Joel Fernandes (Google)" 
 wrote:

> Android uses ashmem for sharing memory regions. We are looking forward
> to migrating all usecases of ashmem to memfd so that we can possibly
> remove the ashmem driver in the future from staging while also
> benefiting from using memfd and contributing to it. Note staging drivers
> are also not ABI and generally can be removed at anytime.
> 
> One of the main usecases Android has is the ability to create a region
> and mmap it as writeable, then add protection against making any
> "future" writes while keeping the existing already mmap'ed
> writeable-region active.  This allows us to implement a usecase where
> receivers of the shared memory buffer can get a read-only view, while
> the sender continues to write to the buffer.
> See CursorWindow documentation in Android for more details:
> https://developer.android.com/reference/android/database/CursorWindow

It appears that the memfd_create and fcntl manpages will require
updating.  Please attend to this at the appropriate time?

Actually, it would help the review process if those updates were
available now.

Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-09 Thread Joel Fernandes

On Wed, Nov 07, 2018 at 08:15:36PM -0800, Joel Fernandes (Google) wrote:
> Android uses ashmem for sharing memory regions. We are looking forward
> to migrating all usecases of ashmem to memfd so that we can possibly
> remove the ashmem driver in the future from staging while also
> benefiting from using memfd and contributing to it. Note staging drivers
> are also not ABI and generally can be removed at anytime.
> 
> One of the main usecases Android has is the ability to create a region
> and mmap it as writeable, then add protection against making any
> "future" writes while keeping the existing already mmap'ed
> writeable-region active.  This allows us to implement a usecase where
> receivers of the shared memory buffer can get a read-only view, while
> the sender continues to write to the buffer.
> See CursorWindow documentation in Android for more details:
> https://developer.android.com/reference/android/database/CursorWindow
> 
> This usecase cannot be implemented with the existing F_SEAL_WRITE seal.
> To support the usecase, this patch adds a new F_SEAL_FUTURE_WRITE seal
> which prevents any future mmap and write syscalls from succeeding while
> keeping the existing mmap active. The following program shows the seal
> working in action:
> 
[...] 
> The output of running this program is as follows:
> ret=3
> map 0 passed
> write passed
> map 1 prot-write passed as expected
> future-write seal now active
> write failed as expected due to future-write seal
> map 2 prot-write failed as expected due to seal
> : Permission denied
> map 3 prot-read passed as expected
> 
> Cc: jr...@google.com
> Cc: john.stu...@linaro.org
> Cc: tk...@google.com
> Cc: gre...@linuxfoundation.org
> Cc: h...@infradead.org
> Reviewed-by: John Stultz 
> Signed-off-by: Joel Fernandes (Google) 
> ---
> v1->v2: No change, just added selftests to the series. manpages are
> ready and I'll submit them once the patches are accepted.
> 
> v2->v3: Updated commit message to have more support code (John Stultz)
>   Renamed seal from F_SEAL_FS_WRITE to F_SEAL_FUTURE_WRITE
>   (Christoph Hellwig)
>   Allow for this seal only if grow/shrink seals are also
>   either previous set, or are requested along with this seal.
>   (Christoph Hellwig)
>   Added locking to synchronize access to file->f_mode.
>   (Christoph Hellwig)


Christoph, do the patches look Ok to you now? If so, then could you give an
Acked-by or Reviewed-by tag?

Thanks a lot,

 - Joel


>  include/uapi/linux/fcntl.h |  1 +
>  mm/memfd.c | 22 +-
>  2 files changed, 22 insertions(+), 1 deletion(-)
> 
> diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
> index 6448cdd9a350..a2f8658f1c55 100644
> --- a/include/uapi/linux/fcntl.h
> +++ b/include/uapi/linux/fcntl.h
> @@ -41,6 +41,7 @@
>  #define F_SEAL_SHRINK0x0002  /* prevent file from shrinking */
>  #define F_SEAL_GROW  0x0004  /* prevent file from growing */
>  #define F_SEAL_WRITE 0x0008  /* prevent writes */
> +#define F_SEAL_FUTURE_WRITE  0x0010  /* prevent future writes while mapped */
>  /* (1U << 31) is reserved for signed error codes */
>  
>  /*
> diff --git a/mm/memfd.c b/mm/memfd.c
> index 2bb5e257080e..5ba9804e9515 100644
> --- a/mm/memfd.c
> +++ b/mm/memfd.c
> @@ -150,7 +150,8 @@ static unsigned int *memfd_file_seals_ptr(struct file 
> *file)
>  #define F_ALL_SEALS (F_SEAL_SEAL | \
>F_SEAL_SHRINK | \
>F_SEAL_GROW | \
> -  F_SEAL_WRITE)
> +  F_SEAL_WRITE | \
> +  F_SEAL_FUTURE_WRITE)
>  
>  static int memfd_add_seals(struct file *file, unsigned int seals)
>  {
> @@ -219,6 +220,25 @@ static int memfd_add_seals(struct file *file, unsigned 
> int seals)
>   }
>   }
>  
> + if ((seals & F_SEAL_FUTURE_WRITE) &&
> + !(*file_seals & F_SEAL_FUTURE_WRITE)) {
> + /*
> +  * The FUTURE_WRITE seal also prevents growing and shrinking
> +  * so we need them to be already set, or requested now.
> +  */
> + int test_seals = (seals | *file_seals) &
> +  (F_SEAL_GROW | F_SEAL_SHRINK);
> +
> + if (test_seals != (F_SEAL_GROW | F_SEAL_SHRINK)) {
> + error = -EINVAL;
> + goto unlock;
> + }
> +
> + spin_lock(>f_lock);
> + file->f_mode &= ~(FMODE_WRITE | FMODE_PWRITE);
> + spin_unlock(>f_lock);
> + }
> +
>   *file_seals |= seals;
>   error = 0;
>  
> -- 
> 2.19.1.930.g4563a0d9d0-goog

Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-09 Thread Joel Fernandes

On Wed, Nov 07, 2018 at 08:15:36PM -0800, Joel Fernandes (Google) wrote:
> Android uses ashmem for sharing memory regions. We are looking forward
> to migrating all usecases of ashmem to memfd so that we can possibly
> remove the ashmem driver in the future from staging while also
> benefiting from using memfd and contributing to it. Note staging drivers
> are also not ABI and generally can be removed at anytime.
> 
> One of the main usecases Android has is the ability to create a region
> and mmap it as writeable, then add protection against making any
> "future" writes while keeping the existing already mmap'ed
> writeable-region active.  This allows us to implement a usecase where
> receivers of the shared memory buffer can get a read-only view, while
> the sender continues to write to the buffer.
> See CursorWindow documentation in Android for more details:
> https://developer.android.com/reference/android/database/CursorWindow
> 
> This usecase cannot be implemented with the existing F_SEAL_WRITE seal.
> To support the usecase, this patch adds a new F_SEAL_FUTURE_WRITE seal
> which prevents any future mmap and write syscalls from succeeding while
> keeping the existing mmap active. The following program shows the seal
> working in action:
> 
[...] 
> The output of running this program is as follows:
> ret=3
> map 0 passed
> write passed
> map 1 prot-write passed as expected
> future-write seal now active
> write failed as expected due to future-write seal
> map 2 prot-write failed as expected due to seal
> : Permission denied
> map 3 prot-read passed as expected
> 
> Cc: jr...@google.com
> Cc: john.stu...@linaro.org
> Cc: tk...@google.com
> Cc: gre...@linuxfoundation.org
> Cc: h...@infradead.org
> Reviewed-by: John Stultz 
> Signed-off-by: Joel Fernandes (Google) 
> ---
> v1->v2: No change, just added selftests to the series. manpages are
> ready and I'll submit them once the patches are accepted.
> 
> v2->v3: Updated commit message to have more support code (John Stultz)
>   Renamed seal from F_SEAL_FS_WRITE to F_SEAL_FUTURE_WRITE
>   (Christoph Hellwig)
>   Allow for this seal only if grow/shrink seals are also
>   either previous set, or are requested along with this seal.
>   (Christoph Hellwig)
>   Added locking to synchronize access to file->f_mode.
>   (Christoph Hellwig)


Christoph, do the patches look Ok to you now? If so, then could you give an
Acked-by or Reviewed-by tag?

Thanks a lot,

 - Joel


>  include/uapi/linux/fcntl.h |  1 +
>  mm/memfd.c | 22 +-
>  2 files changed, 22 insertions(+), 1 deletion(-)
> 
> diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
> index 6448cdd9a350..a2f8658f1c55 100644
> --- a/include/uapi/linux/fcntl.h
> +++ b/include/uapi/linux/fcntl.h
> @@ -41,6 +41,7 @@
>  #define F_SEAL_SHRINK0x0002  /* prevent file from shrinking */
>  #define F_SEAL_GROW  0x0004  /* prevent file from growing */
>  #define F_SEAL_WRITE 0x0008  /* prevent writes */
> +#define F_SEAL_FUTURE_WRITE  0x0010  /* prevent future writes while mapped */
>  /* (1U << 31) is reserved for signed error codes */
>  
>  /*
> diff --git a/mm/memfd.c b/mm/memfd.c
> index 2bb5e257080e..5ba9804e9515 100644
> --- a/mm/memfd.c
> +++ b/mm/memfd.c
> @@ -150,7 +150,8 @@ static unsigned int *memfd_file_seals_ptr(struct file 
> *file)
>  #define F_ALL_SEALS (F_SEAL_SEAL | \
>F_SEAL_SHRINK | \
>F_SEAL_GROW | \
> -  F_SEAL_WRITE)
> +  F_SEAL_WRITE | \
> +  F_SEAL_FUTURE_WRITE)
>  
>  static int memfd_add_seals(struct file *file, unsigned int seals)
>  {
> @@ -219,6 +220,25 @@ static int memfd_add_seals(struct file *file, unsigned 
> int seals)
>   }
>   }
>  
> + if ((seals & F_SEAL_FUTURE_WRITE) &&
> + !(*file_seals & F_SEAL_FUTURE_WRITE)) {
> + /*
> +  * The FUTURE_WRITE seal also prevents growing and shrinking
> +  * so we need them to be already set, or requested now.
> +  */
> + int test_seals = (seals | *file_seals) &
> +  (F_SEAL_GROW | F_SEAL_SHRINK);
> +
> + if (test_seals != (F_SEAL_GROW | F_SEAL_SHRINK)) {
> + error = -EINVAL;
> + goto unlock;
> + }
> +
> + spin_lock(>f_lock);
> + file->f_mode &= ~(FMODE_WRITE | FMODE_PWRITE);
> + spin_unlock(>f_lock);
> + }
> +
>   *file_seals |= seals;
>   error = 0;
>  
> -- 
> 2.19.1.930.g4563a0d9d0-goog

[PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-07 Thread Joel Fernandes (Google)

Android uses ashmem for sharing memory regions. We are looking forward
to migrating all usecases of ashmem to memfd so that we can possibly
remove the ashmem driver in the future from staging while also
benefiting from using memfd and contributing to it. Note staging drivers
are also not ABI and generally can be removed at anytime.

One of the main usecases Android has is the ability to create a region
and mmap it as writeable, then add protection against making any
"future" writes while keeping the existing already mmap'ed
writeable-region active.  This allows us to implement a usecase where
receivers of the shared memory buffer can get a read-only view, while
the sender continues to write to the buffer.
See CursorWindow documentation in Android for more details:
https://developer.android.com/reference/android/database/CursorWindow

This usecase cannot be implemented with the existing F_SEAL_WRITE seal.
To support the usecase, this patch adds a new F_SEAL_FUTURE_WRITE seal
which prevents any future mmap and write syscalls from succeeding while
keeping the existing mmap active. The following program shows the seal
working in action:

 #include 
 #include 
 #include 
 #include 
 #include 
 #include 
 #include 
 #define F_SEAL_FUTURE_WRITE 0x0010
 #define REGION_SIZE (5 * 1024 * 1024)

int memfd_create_region(const char *name, size_t size)
{
int ret;
int fd = syscall(__NR_memfd_create, name, MFD_ALLOW_SEALING);
if (fd < 0) return fd;
ret = ftruncate(fd, size);
if (ret < 0) { close(fd); return ret; }
return fd;
}

int main() {
int ret, fd;
void *addr, *addr2, *addr3, *addr1;
ret = memfd_create_region("test_region", REGION_SIZE);
printf("ret=%d\n", ret);
fd = ret;

// Create map
addr = mmap(0, REGION_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
if (addr == MAP_FAILED)
printf("map 0 failed\n");
else
printf("map 0 passed\n");

if ((ret = write(fd, "test", 4)) != 4)
printf("write failed even though no future-write seal "
   "(ret=%d errno =%d)\n", ret, errno);
else
printf("write passed\n");

addr1 = mmap(0, REGION_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
if (addr1 == MAP_FAILED)
perror("map 1 prot-write failed even though no seal\n");
else
printf("map 1 prot-write passed as expected\n");

ret = fcntl(fd, F_ADD_SEALS, F_SEAL_FUTURE_WRITE |
 F_SEAL_GROW |
 F_SEAL_SHRINK);
if (ret == -1)
printf("fcntl failed, errno: %d\n", errno);
else
printf("future-write seal now active\n");

if ((ret = write(fd, "test", 4)) != 4)
printf("write failed as expected due to future-write seal\n");
else
printf("write passed (unexpected)\n");

addr2 = mmap(0, REGION_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
if (addr2 == MAP_FAILED)
perror("map 2 prot-write failed as expected due to seal\n");
else
printf("map 2 passed\n");

addr3 = mmap(0, REGION_SIZE, PROT_READ, MAP_SHARED, fd, 0);
if (addr3 == MAP_FAILED)
perror("map 3 failed\n");
else
printf("map 3 prot-read passed as expected\n");
}

The output of running this program is as follows:
ret=3
map 0 passed
write passed
map 1 prot-write passed as expected
future-write seal now active
write failed as expected due to future-write seal
map 2 prot-write failed as expected due to seal
: Permission denied
map 3 prot-read passed as expected

Cc: jr...@google.com
Cc: john.stu...@linaro.org
Cc: tk...@google.com
Cc: gre...@linuxfoundation.org
Cc: h...@infradead.org
Reviewed-by: John Stultz 
Signed-off-by: Joel Fernandes (Google) 
---
v1->v2: No change, just added selftests to the series. manpages are
ready and I'll submit them once the patches are accepted.

v2->v3: Updated commit message to have more support code (John Stultz)
Renamed seal from F_SEAL_FS_WRITE to F_SEAL_FUTURE_WRITE
(Christoph Hellwig)
Allow for this seal only if grow/shrink seals are also
either previous set, or are requested along with this seal.
(Christoph Hellwig)
Added locking to synchronize access to file->f_mode.
(Christoph Hellwig)

 include/uapi/linux/fcntl.h |  1 +
 mm/memfd.c | 22 +-
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
index 6448cdd9a350..a2f8658f1c55 100644
--- a/include/uapi/linux/fcntl.h
+++ b/include/uapi/linux/fcntl.h
@@ -41,6 +41,7 @@
 #define F_SEAL_SHRINK  0x0002  /* prevent file from shrinking */
 #define F_SEAL_GROW0x0004  /* prevent file from growing */
 #define F_SEAL_WRITE   0x0008  /* prevent writes */
+#define F_SEAL_FUTURE_WRITE

[PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-07 Thread Joel Fernandes (Google)

Android uses ashmem for sharing memory regions. We are looking forward
to migrating all usecases of ashmem to memfd so that we can possibly
remove the ashmem driver in the future from staging while also
benefiting from using memfd and contributing to it. Note staging drivers
are also not ABI and generally can be removed at anytime.

One of the main usecases Android has is the ability to create a region
and mmap it as writeable, then add protection against making any
"future" writes while keeping the existing already mmap'ed
writeable-region active.  This allows us to implement a usecase where
receivers of the shared memory buffer can get a read-only view, while
the sender continues to write to the buffer.
See CursorWindow documentation in Android for more details:
https://developer.android.com/reference/android/database/CursorWindow

This usecase cannot be implemented with the existing F_SEAL_WRITE seal.
To support the usecase, this patch adds a new F_SEAL_FUTURE_WRITE seal
which prevents any future mmap and write syscalls from succeeding while
keeping the existing mmap active. The following program shows the seal
working in action:

 #include 
 #include 
 #include 
 #include 
 #include 
 #include 
 #include 
 #define F_SEAL_FUTURE_WRITE 0x0010
 #define REGION_SIZE (5 * 1024 * 1024)

int memfd_create_region(const char *name, size_t size)
{
int ret;
int fd = syscall(__NR_memfd_create, name, MFD_ALLOW_SEALING);
if (fd < 0) return fd;
ret = ftruncate(fd, size);
if (ret < 0) { close(fd); return ret; }
return fd;
}

int main() {
int ret, fd;
void *addr, *addr2, *addr3, *addr1;
ret = memfd_create_region("test_region", REGION_SIZE);
printf("ret=%d\n", ret);
fd = ret;

// Create map
addr = mmap(0, REGION_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
if (addr == MAP_FAILED)
printf("map 0 failed\n");
else
printf("map 0 passed\n");

if ((ret = write(fd, "test", 4)) != 4)
printf("write failed even though no future-write seal "
   "(ret=%d errno =%d)\n", ret, errno);
else
printf("write passed\n");

addr1 = mmap(0, REGION_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
if (addr1 == MAP_FAILED)
perror("map 1 prot-write failed even though no seal\n");
else
printf("map 1 prot-write passed as expected\n");

ret = fcntl(fd, F_ADD_SEALS, F_SEAL_FUTURE_WRITE |
 F_SEAL_GROW |
 F_SEAL_SHRINK);
if (ret == -1)
printf("fcntl failed, errno: %d\n", errno);
else
printf("future-write seal now active\n");

if ((ret = write(fd, "test", 4)) != 4)
printf("write failed as expected due to future-write seal\n");
else
printf("write passed (unexpected)\n");

addr2 = mmap(0, REGION_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
if (addr2 == MAP_FAILED)
perror("map 2 prot-write failed as expected due to seal\n");
else
printf("map 2 passed\n");

addr3 = mmap(0, REGION_SIZE, PROT_READ, MAP_SHARED, fd, 0);
if (addr3 == MAP_FAILED)
perror("map 3 failed\n");
else
printf("map 3 prot-read passed as expected\n");
}

The output of running this program is as follows:
ret=3
map 0 passed
write passed
map 1 prot-write passed as expected
future-write seal now active
write failed as expected due to future-write seal
map 2 prot-write failed as expected due to seal
: Permission denied
map 3 prot-read passed as expected

Cc: jr...@google.com
Cc: john.stu...@linaro.org
Cc: tk...@google.com
Cc: gre...@linuxfoundation.org
Cc: h...@infradead.org
Reviewed-by: John Stultz 
Signed-off-by: Joel Fernandes (Google) 
---
v1->v2: No change, just added selftests to the series. manpages are
ready and I'll submit them once the patches are accepted.

v2->v3: Updated commit message to have more support code (John Stultz)
Renamed seal from F_SEAL_FS_WRITE to F_SEAL_FUTURE_WRITE
(Christoph Hellwig)
Allow for this seal only if grow/shrink seals are also
either previous set, or are requested along with this seal.
(Christoph Hellwig)
Added locking to synchronize access to file->f_mode.
(Christoph Hellwig)

 include/uapi/linux/fcntl.h |  1 +
 mm/memfd.c | 22 +-
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
index 6448cdd9a350..a2f8658f1c55 100644
--- a/include/uapi/linux/fcntl.h
+++ b/include/uapi/linux/fcntl.h
@@ -41,6 +41,7 @@
 #define F_SEAL_SHRINK  0x0002  /* prevent file from shrinking */
 #define F_SEAL_GROW0x0004  /* prevent file from growing */
 #define F_SEAL_WRITE   0x0008  /* prevent writes */
+#define F_SEAL_FUTURE_WRITE

62 matches

Mail list logo