Re: [RFC 2/2] fanotify: emit FAN_MODIFY_DIR on filesystem changes

2017-03-21 Thread J. Bruce Fields
On Tue, Mar 21, 2017 at 05:41:22PM +0100, Jan Kara wrote:
> On Tue 21-03-17 11:38:49, J. Bruce Fields wrote:
> > On Sun, Mar 19, 2017 at 11:19:43AM +0100, Jan Kara wrote:
> > > On Tue 14-03-17 13:18:01, Amir Goldstein wrote:
> > > > > On Tue, Mar 14, 2017 at 1:03 AM, Filip Štědronský wrote:
> > > > > An alternative might be to create wrapper functions like
> > > > > vfs_path_(rename|unlink|...). They could also take care of calling
> > > > > security_path_(rename|unlink|...), which is currently also up to
> > > > > > the individual callers (possibly with a flag because it might not
> > > > > be always desired).
> > > > 
> > > > > That's an interesting idea. There is some duplication between the
> > > > > security/audit hooks and the fsnotify hooks. It should be interesting
> > > > > to try and deduplicate some of this code.
> > > 
> > > > Yeah, but ecryptfs or nfsd don't actually call these security hooks AFAICT.
> > 
> > We don't?  E.g. nfsd_unlink calls vfs_unlink which calls
> > security_inode_unlink().
> 
> OK, I have not been specific enough :). ecryptfs or nfsd don't call *path*
> security hooks AFAICT - e.g. security_path_unlink() from nfsd_unlink().

Oh, got it, thanks.

But, no, nfsd is definitely not meant to be invisible to security
modules, so that's just a bug.

--b.
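
The inode-hook versus path-hook distinction above can be shown with a small
userspace model. This is a minimal sketch, not kernel code: every toy_*
function and the HOOK_* bits are hypothetical stand-ins that only mimic the
call ordering described in this exchange (vfs_unlink() reaching
security_inode_unlink(), with the path hook left to each caller).

```c
#include <assert.h>

/* Toy model only: each "hook" sets a bit so the call chain can be inspected. */
enum {
	HOOK_PATH_UNLINK  = 1 << 0,	/* stands in for security_path_unlink() */
	HOOK_INODE_UNLINK = 1 << 1	/* stands in for security_inode_unlink() */
};

static unsigned int trace;

static void toy_security_path_unlink(void)  { trace |= HOOK_PATH_UNLINK; }
static void toy_security_inode_unlink(void) { trace |= HOOK_INODE_UNLINK; }

/* The VFS helper calls the inode hook itself, so every caller gets it. */
static void toy_vfs_unlink(void)
{
	toy_security_inode_unlink();
}

/* Syscall-style caller: calls the path hook explicitly, then the helper. */
static unsigned int toy_syscall_unlink(void)
{
	trace = 0;
	toy_security_path_unlink();
	toy_vfs_unlink();
	return trace;
}

/* nfsd-style caller: goes straight to the helper, so no path hook runs. */
static unsigned int toy_nfsd_unlink(void)
{
	trace = 0;
	toy_vfs_unlink();
	return trace;
}

/* Report which of the two hooks a given call chain did not reach. */
static unsigned int toy_missing_hooks(unsigned int seen)
{
	const unsigned int all = HOOK_PATH_UNLINK | HOOK_INODE_UNLINK;

	assert((seen & ~all) == 0);	/* only known hook bits may be set */
	return all & ~seen;
}
```

Under these assumptions the syscall-style caller records both hooks, while the
nfsd-style caller records only the inode hook, so toy_missing_hooks() reports
the path hook as the one that was skipped.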


Re: [RFC 2/2] fanotify: emit FAN_MODIFY_DIR on filesystem changes

2017-03-21 Thread Jan Kara
On Tue 21-03-17 11:38:49, J. Bruce Fields wrote:
> On Sun, Mar 19, 2017 at 11:19:43AM +0100, Jan Kara wrote:
> > On Tue 14-03-17 13:18:01, Amir Goldstein wrote:
> > > > On Tue, Mar 14, 2017 at 1:03 AM, Filip Štědronský wrote:
> > > > > Because fanotify requires `struct path`, the event cannot be generated
> > > > > directly in `fsnotify_move` and friends because they only get the inode
> > > > > (and their callers, such as `vfs_rename`, cannot supply any better info).
> > > > > So instead it needs to be generated higher in the call chain, i.e. in
> > > > > the callers of functions like `vfs_rename`.
> > > >
> > > > This leads to some code duplication. Currently, there are several places
> > > > whence functions like `vfs_rename` or `vfs_unlink` are called:
> > > >
> > > > >   * syscall handlers (done)
> > > > >   * NFS server (done)
> > > > >   * stacked filesystems
> > > > >       - ecryptfs (done)
> > > > >       - overlayfs
> > > > >         (Currently doesn't report even ordinary fanotify events, because
> > > > >          it internally clones the upper mount; not sure about the
> > > > >          rationale.  One can always watch the overlay mount instead.)
> > > > >   * few rather minor things
> > > > >       - devtmpfs
> > > > >         (its internal changes are not tied to any vfsmount so it cannot
> > > > >          emit mount-scoped events)
> > > > >       - cachefiles (done)
> > > > >       - ipc/mqueue.c (done)
> > > > >       - fs/nfsd/nfs4recover.c (done)
> > > > >       - kernel/bpf/inode.c (done)
> > > > >       - net/unix/af_unix.c (done)
> > > > >
> > > > > (grep -rE '\bvfs_(rename|unlink|mknod|whiteout|create|mkdir|rmdir|symlink|link)\(')
> > > >
> > > > Signed-off-by: Filip Štědronský 
> > > >
> > > > ---
> > > >
> > > > An alternative might be to create wrapper functions like
> > > > vfs_path_(rename|unlink|...). They could also take care of calling
> > > > security_path_(rename|unlink|...), which is currently also up to
> > > > > the individual callers (possibly with a flag because it might not
> > > > be always desired).
> > > 
> > > > That's an interesting idea. There is some duplication between security/audit
> > > > hooks and fsnotify hooks. It should be interesting to try and deduplicate
> > > some of this code.
> > 
> > Yeah, but ecryptfs or nfsd don't actually call these security hooks AFAICT.
> 
> We don't?  E.g. nfsd_unlink calls vfs_unlink which calls
> security_inode_unlink().

OK, I have not been specific enough :). ecryptfs or nfsd don't call *path*
security hooks AFAICT - e.g. security_path_unlink() from nfsd_unlink().

Honza
-- 
Jan Kara 
SUSE Labs, CR


Re: [RFC 2/2] fanotify: emit FAN_MODIFY_DIR on filesystem changes

2017-03-21 Thread J. Bruce Fields
On Sun, Mar 19, 2017 at 11:19:43AM +0100, Jan Kara wrote:
> On Tue 14-03-17 13:18:01, Amir Goldstein wrote:
> > On Tue, Mar 14, 2017 at 1:03 AM, Filip Štědronský wrote:
> > > Because fanotify requires `struct path`, the event cannot be generated
> > > directly in `fsnotify_move` and friends because they only get the inode
> > > (and their callers, such as `vfs_rename`, cannot supply any better info).
> > > So instead it needs to be generated higher in the call chain, i.e. in
> > > the callers of functions like `vfs_rename`.
> > >
> > > This leads to some code duplication. Currently, there are several places
> > > whence functions like `vfs_rename` or `vfs_unlink` are called:
> > >
> > >   * syscall handlers (done)
> > >   * NFS server (done)
> > >   * stacked filesystems
> > >       - ecryptfs (done)
> > >       - overlayfs
> > >         (Currently doesn't report even ordinary fanotify events, because
> > >          it internally clones the upper mount; not sure about the
> > >          rationale.  One can always watch the overlay mount instead.)
> > >   * few rather minor things
> > >       - devtmpfs
> > >         (its internal changes are not tied to any vfsmount so it cannot
> > >          emit mount-scoped events)
> > >       - cachefiles (done)
> > >       - ipc/mqueue.c (done)
> > >       - fs/nfsd/nfs4recover.c (done)
> > >       - kernel/bpf/inode.c (done)
> > >       - net/unix/af_unix.c (done)
> > >
> > > (grep -rE '\bvfs_(rename|unlink|mknod|whiteout|create|mkdir|rmdir|symlink|link)\(')
> > >
> > > Signed-off-by: Filip Štědronský 
> > >
> > > ---
> > >
> > > An alternative might be to create wrapper functions like
> > > vfs_path_(rename|unlink|...). They could also take care of calling
> > > security_path_(rename|unlink|...), which is currently also up to
> > > the individual callers (possibly with a flag because it might not
> > > be always desired).
> > 
> > That's an interesting idea. There is some duplication between security/audit
> > hooks and fsnotify hooks. It should be interesting to try and deduplicate
> > some of this code.
> 
> Yeah, but ecryptfs or nfsd don't actually call these security hooks AFAICT.

We don't?  E.g. nfsd_unlink calls vfs_unlink which calls
security_inode_unlink().

> And at least for NFSD it seems correct they don't call them since you are
> running in a context of an NFSD server process and don't have security
> context of the process actually issuing the IO.

# ps -eo comm,label|grep nfsd
nfsd            system_u:system_r:kernel_t:s0
...

So I guess that's the process context they see.

There is actually a way for the protocol to pass the security context,
in the case it's running over kerberos.  So some day this might
optionally start using the context of the client process, for what it's
worth.

--b.

> So I'm not sure
> "deduplication" is really possible.
> 
> However if you can really call fsnotify hooks with 'path' available in all
> the places, it should be equally hard to just pass 'path' to
> vfs_(create|mkdir|...) and that way we don't have to sprinkle fsnotify
> calls into several call sites but keep them local to vfs_(create|mkdir|...)
> helpers. Hmm?
> 
> > 
> > > ---
> > >  fs/cachefiles/namei.c |  9 +++
> > >  fs/ecryptfs/inode.c   | 67 +++
> > >  fs/namei.c            | 23 +-
> > >  fs/nfsd/nfs4recover.c |  7 ++
> > >  fs/nfsd/vfs.c         | 24 --
> > >  ipc/mqueue.c          |  9 +++
> > >  kernel/bpf/inode.c    |  3 +++
> > >  net/unix/af_unix.c    |  2 ++
> > >  8 files changed, 141 insertions(+), 3 deletions(-)
> > >
> > 
> > OK, just for comparison, I am going to put here the diff of the sub set of
> > my patches that are needed to support fanotify filename events.
> > 
> > 
> > $ git diff --stat fsnotify_sb..fanotify_dentry
> >  fs/notify/fanotify/fanotify.c      | 94 --
> >  fs/notify/fanotify/fanotify.h      | 25 -
> >  fs/notify/fanotify/fanotify_user.c | 92
> >  fs/notify/fdinfo.c                 | 25 +++--
> >  fs/notify/inode_mark.c             |  1 +
> >  fs/notify/mark.c                   | 15 ---
> >  include/linux/fsnotify_backend.h   | 21 -
> >  include/uapi/linux/fanotify.h      | 41 +++--
> >  8 files changed, 255 insertions(+), 59 deletions(-)
> > 
> > Yes, it is a bit more code, mostly because it adds more functionality
> > (optionally reporting the filename).
> > But most of the code is contained within the fsnotify/fanotify subsystem.
> 
> So I'm not that concerned about actual line numbers. They play some role
> but they don't tell much about how maintainable the result is in the 


Re: [RFC 2/2] fanotify: emit FAN_MODIFY_DIR on filesystem changes

2017-03-20 Thread Filip Štědronský
Hi,

On Sun, Mar 19, 2017 at 07:04:13PM +0100, Jan Kara wrote:
> How come? In current kernel the call looks like:
> 
> vfs_mknod(d_inode(path.dentry), dentry, mode, dev->devt);
> 
> so the path is available there... I've actually quickly checked all
> vfs_mknod() callers and they all seem to have path available.

terribly sorry, must have misremembered something. Been staring at the
code long into the night. You are quite right.

But it is an internal mount so userspace never gets the notifications.
The same goes for the cloned upper mount in overlayfs. This might be
considered ok, because the change is semantically "internal" and does
not originate through any userspace-visible mountpoint. Superblock
watches would solve this case.

Otherwise it seems feasible to pass a path to all VFS functions.

Filip


Re: [RFC 2/2] fanotify: emit FAN_MODIFY_DIR on filesystem changes

2017-03-20 Thread Amir Goldstein
On Sun, Mar 19, 2017 at 2:04 PM, Jan Kara  wrote:
> On Sun 19-03-17 11:37:39, Filip Štědronský wrote:
>> On Sun, Mar 19, 2017 at 11:19:43AM +0100, Jan Kara wrote:
>> > However if you can really call fsnotify hooks with 'path' available in all
>> > the places, it should be equally hard to just pass 'path' to
>> > vfs_(create|mkdir|...) and that way we don't have to sprinkle fsnotify
>> > calls into several call sites but keep them local to vfs_(create|mkdir|...)
>> > helpers. Hmm?
>>
>> the problem is: not absolutely all. One illuminating example is the use
>> of vfs_mknod in devtmpfs. There a struct path is not only unavailable
>> but makes no semantic sense: the changes do not go thru any mountpoint.
>
> How come? In current kernel the call looks like:
>
> vfs_mknod(d_inode(path.dentry), dentry, mode, dev->devt);
>
> so the path is available there... I've actually quickly checked all
> vfs_mknod() callers and they all seem to have path available.
>
>> And in general I think there will be situations where you would need
>> to call VFS functions without paths.
>>
>> Thus I suggested either
>> (a) wrapping the VFS functions with path variants, or
>> (b) giving them an optional vfsmount argument that can be set to NULL
>> when it does not make sense
>
> So my first take is that fsnotify calls happen still relatively high in the
> call stack where we should mostly have mount point available from the path
> lookup. That being said there may be places where we've lost that
> information and it will not be easy to propagate it there and that would
> have to be dealt with on case-by-case basis. But mountpoint is needed for
> other stuff like security checks these days as well so we should have it
> available in principle.
>

I agree that propagating the path to fsnotify seems like the right thing to do.
fsnotify_inoderemove() is an example (the only one I know of) where the path
is not available (when called down from dput()), but frankly, I can't
think of any use case that really needs the FS_DELETE_SELF event in that
case.

d_delete() which also calls fsnotify_inoderemove() already calls
fsnotify_nameremove() hook with the exact same dentry, so
the FS_DELETE_SELF event can be generated by that hook as well
as the FS_DELETE event.
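
The suggestion above, one hook raising both events because it already holds
the dentry, can be sketched in userspace C. This is only an illustration
under stated assumptions: the toy_* types, the TOY_FS_* values and
toy_nameremove() are hypothetical and are not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Toy event bits; the values are chosen for this sketch only. */
#define TOY_FS_DELETE		0x1u	/* an entry vanished from a directory */
#define TOY_FS_DELETE_SELF	0x2u	/* the watched object itself vanished */

/* Events delivered so far to watchers of one inode. */
struct toy_inode {
	unsigned int events;
};

/* A name in a directory: the parent directory plus the named object. */
struct toy_dentry {
	struct toy_inode *parent;
	struct toy_inode *inode;	/* NULL for a name with no object behind it */
};

/* Hypothetical combined hook: a single call site reports both events. */
static void toy_nameremove(struct toy_dentry *d)
{
	assert(d != NULL && d->parent != NULL);

	d->parent->events |= TOY_FS_DELETE;
	if (d->inode != NULL)
		d->inode->events |= TOY_FS_DELETE_SELF;
}
```

With this shape the parent's watchers see the deletion of the entry and the
object's own watchers see the self-deletion from one call, so no second hook
without access to the path would be needed in this model.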


Re: [RFC 2/2] fanotify: emit FAN_MODIFY_DIR on filesystem changes

2017-03-20 Thread Jan Kara
On Sun 19-03-17 11:37:39, Filip Štědronský wrote:
> On Sun, Mar 19, 2017 at 11:19:43AM +0100, Jan Kara wrote:
> > However if you can really call fsnotify hooks with 'path' available in all
> > the places, it should be equally hard to just pass 'path' to
> > vfs_(create|mkdir|...) and that way we don't have to sprinkle fsnotify
> > calls into several call sites but keep them local to vfs_(create|mkdir|...)
> > helpers. Hmm?
> 
> the problem is: not absolutely all. One illuminating example is the use
> of vfs_mknod in devtmpfs. There a struct path is not only unavailable
> but makes no semantic sense: the changes do not go thru any mountpoint.

How come? In current kernel the call looks like:

vfs_mknod(d_inode(path.dentry), dentry, mode, dev->devt);

so the path is available there... I've actually quickly checked all
vfs_mknod() callers and they all seem to have path available.

> And in general I think there will be situations where you would need
> to call VFS functions without paths.
> 
> Thus I suggested either
> (a) wrapping the VFS functions with path variants, or
> (b) giving them an optional vfsmount argument that can be set to NULL
> when it does not make sense

So my first take is that fsnotify calls happen still relatively high in the
call stack where we should mostly have mount point available from the path
lookup. That being said there may be places where we've lost that
information and it will not be easy to propagate it there and that would
have to be dealt with on case-by-case basis. But mountpoint is needed for
other stuff like security checks these days as well so we should have it
available in principle.

Honza
-- 
Jan Kara 
SUSE Labs, CR


Re: [RFC 2/2] fanotify: emit FAN_MODIFY_DIR on filesystem changes

2017-03-19 Thread Filip Štědronský
On Sun, Mar 19, 2017 at 11:19:43AM +0100, Jan Kara wrote:
> However if you can really call fsnotify hooks with 'path' available in all
> the places, it should be equally hard to just pass 'path' to
> vfs_(create|mkdir|...) and that way we don't have to sprinkle fsnotify
> calls into several call sites but keep them local to vfs_(create|mkdir|...)
> helpers. Hmm?

the problem is: not absolutely all. One illuminating example is the use
of vfs_mknod in devtmpfs. There a struct path is not only unavailable
but makes no semantic sense: the changes do not go thru any mountpoint.
And in general I think there will be situations where you would need
to call VFS functions without paths.

Thus I suggested either
(a) wrapping the VFS functions with path variants, or
(b) giving them an optional vfsmount argument that can be set to NULL
when it does not make sense

Filip
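
Options (a) and (b) above can be compared side by side in a small userspace
model. This is a rough sketch rather than a proposed kernel interface: the
toy_* types and functions are hypothetical, and a counter on the mount stands
in for delivering a mount-scoped event.

```c
#include <assert.h>
#include <stddef.h>

/* Counts the mount-scoped events delivered to one mount. */
struct toy_mount {
	int events;
};

/* A mount plus a name, loosely mirroring what `struct path` carries. */
struct toy_path {
	struct toy_mount *mnt;
	const char *dentry;
};

/*
 * Option (b): the helper takes an optional mount. Callers with no
 * meaningful mount pass NULL and no mount-scoped event is raised.
 */
static int toy_vfs_unlink(const char *dentry, struct toy_mount *mnt)
{
	assert(dentry != NULL);
	/* ... the actual unlink work would happen here ... */
	if (mnt != NULL)
		mnt->events++;
	return 0;
}

/*
 * Option (a): a path-taking wrapper. The inner helper is called without a
 * mount and the wrapper raises the event itself on success.
 */
static int toy_vfs_path_unlink(const struct toy_path *path)
{
	int err = toy_vfs_unlink(path->dentry, NULL);

	if (err == 0 && path->mnt != NULL)
		path->mnt->events++;
	return err;
}
```

In both variants a caller without a mount still reaches the same helper; the
difference in this model is only whether the event is raised inside the
helper or in the wrapper around it.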


Re: [RFC 2/2] fanotify: emit FAN_MODIFY_DIR on filesystem changes

2017-03-19 Thread Jan Kara
On Tue 14-03-17 13:18:01, Amir Goldstein wrote:
> On Tue, Mar 14, 2017 at 1:03 AM, Filip Štědronský wrote:
> > Because fanotify requires `struct path`, the event cannot be generated
> > directly in `fsnotify_move` and friends because they only get the inode
> > (and their callers, such as `vfs_rename`, cannot supply any better info).
> > So instead it needs to be generated higher in the call chain, i.e. in
> > the callers of functions like `vfs_rename`.
> >
> > This leads to some code duplication. Currently, there are several places
> > whence functions like `vfs_rename` or `vfs_unlink` are called:
> >
> >   * syscall handlers (done)
> >   * NFS server (done)
> >   * stacked filesystems
> >       - ecryptfs (done)
> >       - overlayfs
> >         (Currently doesn't report even ordinary fanotify events, because
> >          it internally clones the upper mount; not sure about the
> >          rationale.  One can always watch the overlay mount instead.)
> >   * few rather minor things
> >       - devtmpfs
> >         (its internal changes are not tied to any vfsmount so it cannot
> >          emit mount-scoped events)
> >       - cachefiles (done)
> >       - ipc/mqueue.c (done)
> >       - fs/nfsd/nfs4recover.c (done)
> >       - kernel/bpf/inode.c (done)
> >       - net/unix/af_unix.c (done)
> >
> > (grep -rE '\bvfs_(rename|unlink|mknod|whiteout|create|mkdir|rmdir|symlink|link)\(')
> >
> > Signed-off-by: Filip Štědronský 
> >
> > ---
> >
> > An alternative might be to create wrapper functions like
> > vfs_path_(rename|unlink|...). They could also take care of calling
> > security_path_(rename|unlink|...), which is currently also up to
> > the individual callers (possibly with a flag because it might not
> > be always desired).
> 
> That's an interesting idea. There is some duplication between security/audit
> hooks and fsnotify hooks. It should be interesting to try and deduplicate
> some of this code.

Yeah, but ecryptfs or nfsd don't actually call these security hooks AFAICT.
And at least for NFSD it seems correct they don't call them since you are
running in a context of an NFSD server process and don't have security
context of the process actually issuing the IO. So I'm not sure
"deduplication" is really possible.

However if you can really call fsnotify hooks with 'path' available in all
the places, it should be equally hard to just pass 'path' to
vfs_(create|mkdir|...) and that way we don't have to sprinkle fsnotify
calls into several call sites but keep them local to vfs_(create|mkdir|...)
helpers. Hmm?

> 
> > ---
> >  fs/cachefiles/namei.c |  9 +++
> >  fs/ecryptfs/inode.c   | 67 +++
> >  fs/namei.c            | 23 +-
> >  fs/nfsd/nfs4recover.c |  7 ++
> >  fs/nfsd/vfs.c         | 24 --
> >  ipc/mqueue.c          |  9 +++
> >  kernel/bpf/inode.c    |  3 +++
> >  net/unix/af_unix.c    |  2 ++
> >  8 files changed, 141 insertions(+), 3 deletions(-)
> >
> 
> OK, just for comparison, I am going to put here the diff of the sub set of
> my patches that are needed to support fanotify filename events.
> 
> 
> $ git diff --stat fsnotify_sb..fanotify_dentry
>  fs/notify/fanotify/fanotify.c      | 94 --
>  fs/notify/fanotify/fanotify.h      | 25 -
>  fs/notify/fanotify/fanotify_user.c | 92
>  fs/notify/fdinfo.c                 | 25 +++--
>  fs/notify/inode_mark.c             |  1 +
>  fs/notify/mark.c                   | 15 ---
>  include/linux/fsnotify_backend.h   | 21 -
>  include/uapi/linux/fanotify.h      | 41 +++--
>  8 files changed, 255 insertions(+), 59 deletions(-)
> 
> Yes, it is a bit more code, mostly because it adds more functionality
> (optionally reporting the filename).
> But most of the code is contained within the fsnotify/fanotify subsystem.

So I'm not that concerned about actual line counts. They play some role
but they don't tell much about how maintainable the result is in the end.

> The alternative of sprinkling fsnotify_modify_dir() hooks is much less
> maintainable IMO.

Agreed on this.

> I managed to stay away from cross-subsystem changes
> by allowing some information about the event to be lost.
> 
> When a filename event is generated (rename|delete|create)
> the path of the parent fd that will be reported to user is NOT
> the actual path that the process executing the operation used,
> but the path from which the watching process has added the mark.

Yeah, and frankly I'm not yet convinced this is a sane thing to do. I'd
much rather propagate path to vfs_(create|mkdir|...) helpers.

> So for example, if you have a bind mount:
> mount -o bind /a/b/c/d/e/f/g /tmp/g
> 
> And you add a 

Re: [RFC 2/2] fanotify: emit FAN_MODIFY_DIR on filesystem changes

2017-03-15 Thread Amir Goldstein
On Wed, Mar 15, 2017 at 3:39 PM, Jan Kara  wrote:
> On Wed 15-03-17 10:19:52, Marko Rauhamaa wrote:
>> Filip Štědronský :
>>
>> > there are basically two classes of uses for a fanotify-like
>> > interface:
>> >
>> > (1) Keeping an up-to-date representation of the file system. For this,
>> > superblock watches are clearly what you want.
>> >
>> > [...]
>> >
>> > All those factors speak greatly in favour of superblock
>> > watches.
>> >
>> > (2) Tracking filesystem *activity*. Now you are not building
>> > an image of current filesystem state but rather a log of what
>> > happened. Perhaps you are also interested in who
>> > (user/process/...) did what. Permission events also fit mostly in
>> > this category.
>> >
>> > For those it *might* make sense to have mount-scoped watches, for
>> > example if you want to monitor only one container or a subset of
>> > processes.
>> >
>> > We both concentrate on the first but we shouldn't forget about the
>> > second, which was one of the original motivations for fanotify.
>>
>> My (employer's) needs are centered around (2). We definitely crave
>> permission events with a filesystem scope. At the moment, you can avoid
>> permission checks with a simple unshare command
>> (<https://lkml.org/lkml/2016/12/21/144>).
>
> Yes, that is bad.
>
>> So I must be able to see everything that is happening in my universe. It
>> might also be useful to monitor a subuniverse of mine, but the former
>> need is critical at the moment.
>
> So I understand your need. However with superblock watches I'm still
> concerned that the process would be able to see too much. E.g. if it is
> restricted to see only some subtree of a filesystem (by bind mounts &
> namespaces), it should not be able to see events on the same filesystem
> outside of that subtree. I have not found a good solution for that yet.
>

See the last patch in my series. The cherry on the top ;-)

commit 5e3b5bd943991cdf5b72745c1e24833bc998b7ed
Author: Amir Goldstein 
Date:   Sun Dec 18 11:25:55 2016 +0200

fanotify: filter events by root mark mount point

When adding a super block root watch from a mount point that is not mounted
on the root of the file system, filter out events on file system objects
that happen outside this mount point directory (on non-descendant objects).

This is not like FAN_MARK_MOUNT which filters only events that happened
on the mount of the mark. All events on file system objects are reported
as long as these objects are accessible from the mark mount point.

Signed-off-by: Amir Goldstein 

 fs/notify/fanotify/fanotify.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

Our use case is monitoring a large directory tree, not the entire file system.

I used a simple check in should_send_event()

+	/*
+	 * Only interested in dentry events visible from the mount
+	 * from which the root watch was added
+	 */
+	if (mark_mnt && mark_mnt->mnt_root != dentry &&
+	    d_ancestor(mark_mnt->mnt_root, dentry) == NULL)
+		return false;
+

This does not cover the case of events on objects that are hidden
under another mount in my mount namespace, but it covers the
simple case of bind mount.

Note that 'mark_mnt' does NOT stand for the vfs_mark mount,
because root watch is an inode_mark (on the root inode).
It stands for the mount from which the root watch was added.
Same mount that is used to construct the event->fd for all
the dentry events.

This scheme does NOT allow multiple root watches with
different mount point filter on the same group.
Every group can have just one root watch per sb with a single
mount filter.
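
For readers who want to play with the filter outside the kernel, here is a
minimal user-space sketch of the same decision. It is not the patch itself:
`struct node`, `has_ancestor()` and `event_visible()` are hypothetical
stand-ins for a dentry, d_ancestor() and the check in should_send_event().
The real d_ancestor() returns a dentry pointer or NULL, while this model
only returns a boolean.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel dentry: only the parent link matters. */
struct node {
	const struct node *parent;	/* NULL for the filesystem root */
};

/* Model of d_ancestor(): true if 'anc' is a proper ancestor of 'n'. */
static bool has_ancestor(const struct node *anc, const struct node *n)
{
	for (n = n->parent; n != NULL; n = n->parent)
		if (n == anc)
			return true;
	return false;
}

/*
 * Model of the filter: with no mount filter every event passes;
 * otherwise the object must be the mount root itself or lie below it.
 */
static bool event_visible(const struct node *mnt_root, const struct node *obj)
{
	if (mnt_root && mnt_root != obj && !has_ancestor(mnt_root, obj))
		return false;
	return true;
}

/* Sample tree for experiments: / -> a -> b, and / -> c */
static const struct node fs_root = { NULL };
static const struct node dir_a = { &fs_root };
static const struct node dir_b = { &dir_a };
static const struct node dir_c = { &fs_root };
```

With a watch added through a bind mount of "a", events on "a" and "b" pass,
while events on "c" and on the filesystem root are dropped. Objects hidden
under another mount are not modelled here, matching the limitation noted
above.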

Amir.


Re: [RFC 2/2] fanotify: emit FAN_MODIFY_DIR on filesystem changes

2017-03-15 Thread Marko Rauhamaa
Jan Kara :

> On Wed 15-03-17 10:19:52, Marko Rauhamaa wrote:
>> As for "who (user/process/...) did what", the fanotify API is flawed
>> in that we don't have a CLOSE_WRITE_PERM event. The hit-and-run
>> process is long gone by the time we receive the event. That's more of
>> a rule than an exception.
>
> Adding CLOSE_WRITE_PERM would not be that difficult I assume. What do you
> need it for?

Mainly to hold the process hostage until I have verified the content
change. If I disqualify the content change, I will need to report on the
process. CLOSE_WRITE only gives me a pid that is often stale as it
doesn't block the process.

(Another possibility would be to keep the process around as a zombie as
long as the CLOSE_WRITE event's file descriptor is open. That sounds
more complicated and questionable, though.)
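
For context, this is roughly how the existing permission events are
answered from user space. It is a minimal sketch using only the documented
fanotify API (FAN_OPEN_PERM, FAN_ACCESS_PERM, struct fanotify_response);
the helper names are made up for illustration, and no CLOSE_WRITE_PERM
event is assumed to exist.

```c
#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/fanotify.h>

/* True if the event blocks the triggering process until we answer. */
static bool is_perm_event(uint64_t mask)
{
	return (mask & (FAN_OPEN_PERM | FAN_ACCESS_PERM)) != 0;
}

/*
 * Answer one permission event read from the fanotify fd.
 * Returns 0 on success, -1 if the response could not be written.
 */
static int answer_perm_event(int fan_fd,
			     const struct fanotify_event_metadata *ev,
			     bool allow)
{
	struct fanotify_response resp = {
		.fd = ev->fd,
		.response = allow ? FAN_ALLOW : FAN_DENY,
	};
	ssize_t n = write(fan_fd, &resp, sizeof(resp));

	return n == (ssize_t)sizeof(resp) ? 0 : -1;
}
```

The group has to be created with FAN_CLASS_CONTENT or FAN_CLASS_PRE_CONTENT
for permission events to be delivered. A plain FAN_CLOSE_WRITE event fails
the is_perm_event() test, so the writer is not held while the listener
inspects the file, which is the gap described above.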


Marko

-- 
+358 44 990 4795
Skype: marko.rauhamaa_f-secure


Re: [RFC 2/2] fanotify: emit FAN_MODIFY_DIR on filesystem changes

2017-03-15 Thread Jan Kara
On Wed 15-03-17 10:19:52, Marko Rauhamaa wrote:
> Filip Štědronský :
> 
> > there are basically two classes of uses for a fanotify-like
> > interface:
> >
> > (1) Keeping an up-to-date representation of the file system. For this,
> > superblock watches are clearly what you want.
> >
> > [...]
> >
> > All those factors speak greatly in favour of superblock
> > watches.
> >
> > (2) Tracking filesystem *activity*. Now you are not building
> > an image of current filesystem state but rather a log of what
> > happened. Perhaps you are also interested in who
> > (user/process/...) did what. Permission events also fit mostly in
> > this category.
> >
> > For those it *might* make sense to have mount-scoped watches, for
> > example if you want to monitor only one container or a subset of
> > processes.
> >
> > We both concentrate on the first but we shouldn't forget about the
> > second, which was one of the original motivations for fanotify.
> 
> My (employer's) needs are centered around (2). We definitely crave
> permission events with a filesystem scope. At the moment, you can avoid
> permission checks with a simple unshare command
> (<https://lkml.org/lkml/2016/12/21/144>).

Yes, that is bad.

> So I must be able to see everything that is happening in my universe. It
> might also be useful to monitor a subuniverse of mine, but the former
> need is critical at the moment.

So I understand your need. However with superblock watches I'm still
concerned that the process would be able to see too much. E.g. if it is
restricted to see only some subtree of a filesystem (by bind mounts &
namespaces), it should not be able to see events on the same filesystem
outside of that subtree. I have not found a good solution for that yet.

> As for "who (user/process/...) did what", the fanotify API is flawed in
> that we don't have a CLOSE_WRITE_PERM event. The hit-and-run process is
> long gone by the time we receive the event. That's more of a rule than
> an exception.

Adding CLOSE_WRITE_PERM would not be that difficult I assume. What do you
need it for?

Honza
-- 
Jan Kara 
SUSE Labs, CR


Re: [RFC 2/2] fanotify: emit FAN_MODIFY_DIR on filesystem changes

2017-03-15 Thread Marko Rauhamaa
Filip Štědronský :

> there are basically two classes of uses for a fanotify-like
> interface:
>
> (1) Keeping an up-to-date representation of the file system. For this,
> superblock watches are clearly what you want.
>
> [...]
>
> All those factors speak greatly in favour of superblock
> watches.
>
> (2) Tracking filesystem *activity*. Now you are not building
> an image of current filesystem state but rather a log of what
> happened. Perhaps you are also interested in who
> (user/process/...) did what. Permission events also fit mostly in
> this category.
>
> For those it *might* make sense to have mount-scoped watches, for
> example if you want to monitor only one container or a subset of
> processes.
>
> We both concentrate on the first but we shouldn't forget about the
> second, which was one of the original motivations for fanotify.

My (employer's) needs are centered around (2). We definitely crave
permission events with a filesystem scope. At the moment, you can avoid
permission checks with a simple unshare command
(<https://lkml.org/lkml/2016/12/21/144>).

So I must be able to see everything that is happening in my universe. It
might also be useful to monitor a subuniverse of mine, but the former
need is critical at the moment.

As for "who (user/process/...) did what", the fanotify API is flawed in
that we don't have a CLOSE_WRITE_PERM event. The hit-and-run process is
long gone by the time we receive the event. That's more of a rule than
an exception.


Marko


Re: [RFC 2/2] fanotify: emit FAN_MODIFY_DIR on filesystem changes

2017-03-14 Thread Amir Goldstein
On Tue, Mar 14, 2017 at 4:58 PM, Filip Štědronský  wrote:
> Hi,
>
> On Tue, Mar 14, 2017 at 01:18:01PM +0200, Amir Goldstein wrote:
>> I claim that fanotify filters events by mount not because it
>> was a requirement, but because it was an implementation challenge
>> to do otherwise.
>>
>> And I claim that what mount watchers are really interested in is
>> "all the changes that happen in the file system in the area
>>  that is visible to me through this mount point".
>>
>> In other words, an indexer that sits in the container host namespace
>> needs to know if files were modified/created/deleted, regardless of
>> whether those files were modified from within a container
>> namespace.
>>
>> It's not a matter of security/isolation. It's a matter of functionality.
>> I agree that for some events (e.g. permission events) it is possible
>> to argue both ways (i.e. that the namespace context should be used
>> as a filter for events).
>> But for the new proposed events (FS_MODIFY_DIR), I really don't
>> see the point in isolation by mount/namespace.
>
> there are basically two classes of uses for a fanotify-like
> interface:
>
> (1) Keeping an up-to-date representation of the file system.
> For this, superblock watches are clearly what you want.
>
>   * You are interested to know the current state of the
>     filesystem so you need to know about every change,
>     regardless of where it came from.
>   * As I mentioned earlier, in case of remote, distributed
>     and virtual filesystems, the change might come from
>     within the filesystem itself (if the protocol supports
>     reporting such changes). This can probably be
>     implemented only with superblock-scoped watches because
>     the change is fundamentally not related to any mount.
>   * Some filesystems might also support change journalling
>     and it might be conceivable to extend the API in the
>     future to report "past" events (for example by passing
>     the sequence number of the last seen event or similar).
>   * The argument about containers escaping change notification
>     you mentioned earlier.
>
> All those factors speak greatly in favour of superblock
> watches.
>
> (2) Tracking filesystem *activity*. Now you are not building
> an image of current filesystem state but rather a log of
> what happened. Perhaps you are also interested in who
> (user/process/...) did what. Permission events also fit
> mostly in this category.
>
> For those it *might* make sense to have mount-scoped
> watches, for example if you want to monitor only one
> container or a subset of processes.
>
> We both concentrate on the first but we shouldn't forget about
> the second, which was one of the original motivations for
> fanotify.
>
> Thus I conclude that it might be desirable to implement
> mount-scoped filename events in the long run. Even though
> I agree that the sb-scoped events are more important because
> they cover more use cases and you can do additional filtering
> (e.g. by pid) if deemed necessary.
>
> This would require:
>
> (a) Sprinkling the callers of vfs_* with fanotify calls
>     as I did, or
> (b) Creating wrapper functions like vfs_path_unlink & co.
>     that would make the necessary fanotify call (and probably
>     tell the lower function not to generate another
>     notification), as I suggested earlier, or
> (c) Giving the vfs_* functions an *optional* vfsmount argument.
>
> In the end I probably find (c) the most elegant but this
> can be discussed later, even after your changes are merged.
>

Agreed. That is an independent question.
Thanks for the thorough summary.

Amir.


Re: [RFC 2/2] fanotify: emit FAN_MODIFY_DIR on filesystem changes

2017-03-14 Thread Filip Štědronský
Hi,

On Tue, Mar 14, 2017 at 01:18:01PM +0200, Amir Goldstein wrote:
> I claim that fanotify filters events by mount not because it
> was a requirement, but because it was an implementation challenge
> to do otherwise.
>
> And I claim that what mount watchers are really interested in is
> "all the changes that happen in the file system in the area
>  that is visible to me through this mount point".
>
> In other words, an indexer that sits in the container host namespace
> needs to know if files were modified/created/deleted, regardless of
> whether those files were modified from within a container
> namespace.
> 
> It's not a matter of security/isolation. It's a matter of functionality.
> I agree that for some events (e.g. permission events) it is possible
> to argue both ways (i.e. that the namespace context should be used
> as a filter for events).
> But for the new proposed events (FS_MODIFY_DIR), I really don't
> see the point in isolation by mount/namespace.

there are basically two classes of uses for a fanotify-like
interface:

(1) Keeping an up-to-date representation of the file system.
For this, superblock watches are clearly what you want.

  * You are interested to know the current state of the
    filesystem so you need to know about every change,
    regardless of where it came from.
  * As I mentioned earlier, in case of remote, distributed
    and virtual filesystems, the change might come from
    within the filesystem itself (if the protocol supports
    reporting such changes). This can probably be
    implemented only with superblock-scoped watches because
    the change is fundamentally not related to any mount.
  * Some filesystems might also support change journalling
    and it might be conceivable to extend the API in the
    future to report "past" events (for example by passing
    the sequence number of the last seen event or similar).
  * The argument about containers escaping change notification
    you mentioned earlier.

All those factors speak strongly in favour of superblock
watches.

(2) Tracking filesystem *activity*. Now you are not building
an image of current filesystem state but rather a log of
what happened. Perhaps you are also interested in who
(user/process/...) did what. Permission events also fit
mostly in this category.

For those it *might* make sense to have mount-scoped
watches, for example if you want to monitor only one
container or a subset of processes.

We both concentrate on the first but we shouldn't forget about
the second, which was one of the original motivations for
fanotify.

Thus I conclude that it might be desirable to implement
mount-scoped filename events in the long run, even though
I agree that the sb-scoped events are more important because
they cover more use cases and you can do additional filtering
(e.g. by pid) if deemed necessary.
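
To illustrate the "additional filtering (e.g. by pid)" point, here is a
minimal userspace sketch. It assumes a buffer that was filled by read()
on a fanotify file descriptor and uses only `struct fanotify_event_metadata`
and the `FAN_EVENT_OK`/`FAN_EVENT_NEXT` macros from <sys/fanotify.h>.
The helper name `count_events_from_pid` is made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/fanotify.h>

/*
 * Count the events in buf (as returned by read() on a fanotify fd)
 * that were caused by the given pid.  The caller still owns the
 * buffer and must close each meta->fd it received.
 */
static size_t count_events_from_pid(const void *buf, size_t len, pid_t pid)
{
    const struct fanotify_event_metadata *meta = buf;
    size_t matches = 0;

    assert(buf != NULL || len == 0);

    /* FAN_EVENT_NEXT advances meta and decrements len. */
    while (FAN_EVENT_OK(meta, len)) {
        if (meta->pid == pid)
            matches++;
        meta = FAN_EVENT_NEXT(meta, len);
    }
    return matches;
}
```

A watcher that receives events for a whole filesystem could apply such a
check in userspace to narrow them down to one process, which is the kind
of filtering meant above.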

This would require one of:

(a) Sprinkling the callers of vfs_* with fanotify calls
    as I did, or
(b) Creating wrapper functions like vfs_path_unlink & co.
    that would make the necessary fanotify call (and probably
    tell the lower function not to generate another
    notification), as I suggested earlier, or
(c) Giving the vfs_* functions an *optional* vfsmount argument.

In the end I probably find (c) the most elegant but this
can be discussed later, even after your changes are merged.
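
For illustration only, options (b) and (c) can be modelled in a small
userspace toy. None of this is kernel code: the `toy_*` types and
functions are hypothetical stand-ins, and the "notification" is just an
entry appended to an in-memory log.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for kernel objects; these are NOT the real definitions. */
struct toy_mount  { int id; };
struct toy_dentry { const char *name; };
struct toy_path   { struct toy_mount *mnt; struct toy_dentry *dentry; };

/* One recorded notification: mnt is NULL when no mount is known. */
struct toy_event { const struct toy_mount *mnt; const char *name; };

static struct toy_event toy_log[16];
static size_t toy_log_len;

static void toy_notify(const struct toy_mount *mnt, const char *name)
{
    assert(toy_log_len < sizeof(toy_log) / sizeof(toy_log[0]));
    toy_log[toy_log_len].mnt = mnt;
    toy_log[toy_log_len].name = name;
    toy_log_len++;
}

/* Lower-level helper: sees only dentries, so it cannot name a mount. */
static int toy_vfs_unlink(struct toy_dentry *victim, int notify)
{
    /* ... the actual unlink work would happen here ... */
    if (notify)
        toy_notify(NULL, victim->name);
    return 0;
}

/* Option (b): a path-aware wrapper that emits the event itself. */
static int toy_vfs_path_unlink(struct toy_path *dir, struct toy_dentry *victim)
{
    int err = toy_vfs_unlink(victim, 0);  /* suppress the lower event */

    if (!err)
        toy_notify(dir->mnt, victim->name);
    return err;
}

/* Option (c): one function, with an optional mount argument. */
static int toy_vfs_unlink_mnt(struct toy_mount *mnt, struct toy_dentry *victim)
{
    /* ... the actual unlink work would happen here ... */
    toy_notify(mnt, victim->name);  /* mnt may be NULL */
    return 0;
}
```

In this model, callers that know the mount use the wrapper or pass a
non-NULL mount, while callers without one (the devtmpfs case) still
produce an event, just without mount information.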

Filip


Re: [RFC 2/2] fanotify: emit FAN_MODIFY_DIR on filesystem changes

2017-03-14 Thread Amir Goldstein
On Tue, Mar 14, 2017 at 1:03 AM, Filip Štědronský  wrote:
> Because fanotify requires `struct path`, the event cannot be generated
> directly in `fsnotify_move` and friends because they only get the inode
> (and their callers, such as `vfs_rename`, cannot supply any better info).
> So instead it needs to be generated higher in the call chain, i.e. in
> the callers of functions like `vfs_rename`.
>
> This leads to some code duplication. Currently, there are several places
> whence functions like `vfs_rename` or `vfs_unlink` are called:
>
>   * syscall handlers (done)
>   * NFS server (done)
>   * stacked filesystems
>       - ecryptfs (done)
>       - overlayfs
>         (Currently doesn't report even ordinary fanotify events, because
>          it internally clones the upper mount; not sure about the
>          rationale.  One can always watch the overlay mount instead.)
>   * few rather minor things
>       - devtmpfs
>         (its internal changes are not tied to any vfsmount so it cannot
>          emit mount-scoped events)
>       - cachefiles (done)
>       - ipc/mqueue.c (done)
>       - fs/nfsd/nfs4recover.c (done)
>       - kernel/bpf/inode.c (done)
>       - net/unix/af_unix.c (done)
>
> (grep -rE '\bvfs_(rename|unlink|mknod|whiteout|create|mkdir|rmdir|symlink|link)\(')
>
> Signed-off-by: Filip Štědronský 
>
> ---
>
> An alternative might be to create wrapper functions like
> vfs_path_(rename|unlink|...). They could also take care of calling
> security_path_(rename|unlink|...), which is currently also up to
> the individual callers (possibly with a flag because it might not
> always be desired).

That's an interesting idea. There is some duplication between the security/audit
hooks and the fsnotify hooks. It would be interesting to try to deduplicate
some of this code.

> ---
>  fs/cachefiles/namei.c |  9 +++
>  fs/ecryptfs/inode.c   | 67 +++
>  fs/namei.c            | 23 +-
>  fs/nfsd/nfs4recover.c |  7 ++
>  fs/nfsd/vfs.c         | 24 --
>  ipc/mqueue.c          |  9 +++
>  kernel/bpf/inode.c    |  3 +++
>  net/unix/af_unix.c    |  2 ++
>  8 files changed, 141 insertions(+), 3 deletions(-)
>

OK, just for comparison, I am going to put here the diffstat of the subset of
my patches that are needed to support fanotify filename events.


$ git diff --stat fsnotify_sb..fanotify_dentry
 fs/notify/fanotify/fanotify.c      | 94 --
 fs/notify/fanotify/fanotify.h      | 25 -
 fs/notify/fanotify/fanotify_user.c | 92
 fs/notify/fdinfo.c                 | 25 +++--
 fs/notify/inode_mark.c             |  1 +
 fs/notify/mark.c                   | 15 ---
 include/linux/fsnotify_backend.h   | 21 -
 include/uapi/linux/fanotify.h      | 41 +++--
 8 files changed, 255 insertions(+), 59 deletions(-)

Yes, it is a bit more code, mostly because it adds more functionality
(optionally reporting the filename).
But most of the code is contained within the fsnotify/fanotify subsystem.

The alternative of sprinkling fsnotify_modify_dir() hooks is much less
maintainable IMO.

Of course I am presenting biased information :-)
so for full disclosure, these patches also depend on a previous
cleanup series, with the following diffstat.
But as I claimed before and am going to claim again,
the cleanup series improves the code IMO regardless of
the additional functionality that it enables:

$ git diff --stat base..fsnotify_dentry
 arch/powerpc/platforms/cell/spufs/inode.c |  2 +-
 fs/btrfs/ioctl.c                          |  2 +-
 fs/debugfs/inode.c                        |  8
 fs/devpts/inode.c                         |  2 +-
 fs/namei.c                                | 23 +--
 fs/notify/fsnotify.c                      |  2 +-
 fs/ocfs2/refcounttree.c                   |  2 +-
 fs/overlayfs/inode.c                      | 15 ---
 fs/tracefs/inode.c                        |  4 ++--
 include/linux/fsnotify.h                  | 78 +-
 include/linux/fsnotify_backend.h          |  3 ++-
 net/sunrpc/rpc_pipe.c                     |  6 +++---
 12 files changed, 90 insertions(+), 57 deletions(-)

But even when taking the cleanup series into account,
the changes outside of the fsnotify subsystem and include files
are still a lot smaller than in your counter proposal.

This does not come without a price though.
I managed to stay away from cross-subsystem changes
by allowing some information about the event to be lost.

When a filename event is generated (rename|delete|create)
the path of the parent fd that will be reported to user is NOT
the actual path