subject:"Re\: \[PATCH 14\/39\] ovl\: stack file ops"

Re: [PATCH 14/39] ovl: stack file ops

2018-06-18 Thread Miklos Szeredi

On Fri, Jun 15, 2018 at 06:47:17AM +0100, Al Viro wrote:
> On Wed, Jun 13, 2018 at 11:21:30AM +0200, Miklos Szeredi wrote:

> > Looked at some other options...  What coda mmap does looks very
> > dubious.  It only sets f_mapping, not vm_file.  That's going to get
> > into all sorts of trouble when underlying fs tries to look at
> > file_inode() or worse, ->private_data.  Looks like that should be
> > converted to what overlayfs does, to have a remote chance of actually
> > not crashing on most filesystems.  Does anybody actually use coda
> > still?
> 
> Keep in mind that coda is using the local fs only as cache; IOW, its needs
> are much more limited than those of overlayfs - local r/w filesystem,
> disk-backed or tmpfs, used pretty much as a scratch space.

Look:

coda_file_mmap(struct file *coda_file, struct vm_area_struct *vma)
{
[...]
coda_file->f_mapping = host_file->f_mapping;
[...]
return call_mmap(host_file, vma);
}

So that'll end up with vma->vm_file pointing to coda file, coda_file->f_mapping
pointing to host mapping.  Hence vm_ops and a_ops are going to come from host
file, but they'll be getting a "foreign" file with ->private_data and ->f_inode
pointing to coda structures.

For example:

int ext4_filemap_fault(struct vm_fault *vmf)
{
struct inode *inode = file_inode(vmf->vma->vm_file)
int err;

down_read(&EXT4_I(inode)->i_mmap_sem);
[...]

There you have it: coda inode being interpreted as ext4 inode.  How is that
supposed to work?  How is it not blowing up?  What am I missing?

Thanks,
Miklos

Re: [PATCH 14/39] ovl: stack file ops

2018-06-14 Thread Al Viro

On Wed, Jun 13, 2018 at 11:21:30AM +0200, Miklos Szeredi wrote:
> On Tue, Jun 12, 2018 at 8:31 PM, Al Viro  wrote:
> > On Tue, Jun 12, 2018 at 07:24:23PM +0100, Al Viro wrote:
> 
> >> I hate it, but... consider path_open() objections withdrawn for now.
> 
> Is that an ACK for the pull if I follow up with fixes for mmap botch, etc?

Yes.

> >> Uses of ->vm_file (and rules for those) are too convoluted to untangle
> >> at the moment.  I still would love to get that straightened out, but
> >> it's not this cycle fodder, more's the pity...
> 
> Looked at some other options...  What coda mmap does looks very
> dubious.  It only sets f_mapping, not vm_file.  That's going to get
> into all sorts of trouble when underlying fs tries to look at
> file_inode() or worse, ->private_data.  Looks like that should be
> converted to what overlayfs does, to have a remote chance of actually
> not crashing on most filesystems.  Does anybody actually use coda
> still?

Keep in mind that coda is using the local fs only as cache; IOW, its needs
are much more limited than those of overlayfs - local r/w filesystem,
disk-backed or tmpfs, used pretty much as a scratch space.

> > PS: conversion of ->f_path.dentry is easy and that can probably go this
> > cycle - it's a fairly trivial change, with no functional changes unless
> > overlayfs is used with , fixing really bad shit if it ever
> > gets used thus.  I'm not asking to put that into overlayfs pull *and*
> > it's independent from the "want to kill that fucking kludge" stuff.
> > The latter is too hard for this cycle, unfortunately.
> 
> So this is about adding a file_dentry_check() (or whatever we want to
> call it) helper to be used by all filesystems when dereferecing
> f_path.dentry, right?

file_dentry(), and some of the users should be converted to file_inode().
There's also a missing helper for debugfs uses - more or less a combination
of file_dentry() and debugfs_file_get() (if not a conversion of
debugfs_file_get() to taking struct file - almost all users are of that
form, if not entirely all of them).  I've some of that done in local
branch...

Re: [PATCH 14/39] ovl: stack file ops

2018-06-13 Thread J. R. Okajima

Al Viro:
> I'd managed to push that particular nest of horrors out of mind ;-/
> Having dug out my notes from back then and grepped around...  The real
> mess is not even /proc/*/maps - it's /proc/*/map_files/* and yes, the
> reasons for that kludge are still valid ;-/
:::
> Uses of ->vm_file (and rules for those) are too convoluted to untangle
> at the moment.  I still would love to get that straightened out, but
> it's not this cycle fodder, more's the pity...

I don't fully read this thread, but the discussion is related to the
file path printed in /proc/$$/maps?  If so, as just for your
information, here is an approach that aufs took.

In linux-v2.6 era, aufs tried implementing mmap by customzing
address_space ops, but it was not good and failed completing the
implementation.
As wel as overlayfs, aufs has two struct file objects for a single
a regular file.  One is for a virtual aufs' entry, and the other is for
a real layer's entry.  When a user issues mmap(2) for the virtual file,
aufs redirects the request to the real file on the layer internally.  So
the vm_file points to the real file.  It means /proc/$$/maps prints the
unexpected file path.

Aufs added another struct file* vm_prfile in struct vma.  It points to
the virtual aufs file, and /proc/$$/maps prints vm_prfile instead of
vm_file. Of cource, maintaining vm_prfile is important since vma may be
merged or splitted.
Still I don't like this approach, but I don't have another better idea,
also it works for many years.  You can get the patch in
aufs4-standalone.git on sourceforge if you want.


J. R. Okajima

Re: [PATCH 14/39] ovl: stack file ops

2018-06-13 Thread Miklos Szeredi

On Tue, Jun 12, 2018 at 8:31 PM, Al Viro  wrote:
> On Tue, Jun 12, 2018 at 07:24:23PM +0100, Al Viro wrote:

>> I hate it, but... consider path_open() objections withdrawn for now.

Is that an ACK for the pull if I follow up with fixes for mmap botch, etc?

>> Uses of ->vm_file (and rules for those) are too convoluted to untangle
>> at the moment.  I still would love to get that straightened out, but
>> it's not this cycle fodder, more's the pity...

Looked at some other options...  What coda mmap does looks very
dubious.  It only sets f_mapping, not vm_file.  That's going to get
into all sorts of trouble when underlying fs tries to look at
file_inode() or worse, ->private_data.  Looks like that should be
converted to what overlayfs does, to have a remote chance of actually
not crashing on most filesystems.  Does anybody actually use coda
still?

> PS: conversion of ->f_path.dentry is easy and that can probably go this
> cycle - it's a fairly trivial change, with no functional changes unless
> overlayfs is used with , fixing really bad shit if it ever
> gets used thus.  I'm not asking to put that into overlayfs pull *and*
> it's independent from the "want to kill that fucking kludge" stuff.
> The latter is too hard for this cycle, unfortunately.

So this is about adding a file_dentry_check() (or whatever we want to
call it) helper to be used by all filesystems when dereferecing
f_path.dentry, right?

Thanks,
Miklos

Re: [PATCH 14/39] ovl: stack file ops

2018-06-12 Thread Al Viro

On Tue, Jun 12, 2018 at 07:24:23PM +0100, Al Viro wrote:
> On Tue, Jun 12, 2018 at 11:24:39AM +0200, Miklos Szeredi wrote:
> 
> > > Note that anything that uses file_dentry() anywhere near ->open(),
> > > ->read_iter() or ->write_iter() is an instant trouble with your scheme.
> > > Such as
> > > int nfs_open(struct inode *inode, struct file *filp)
> > > {
> > > struct nfs_open_context *ctx;
> > >
> > > ctx = alloc_nfs_open_context(file_dentry(filp), filp->f_mode, 
> > > filp);
> > > if (IS_ERR(ctx))
> > > return PTR_ERR(ctx);
> > > nfs_file_set_open_context(filp, ctx);
> > > put_nfs_open_context(ctx);
> > > nfs_fscache_open_file(inode, filp);
> > > return 0;
> > > }
> > >
> > > You do want to support NFS for lower layers, right?
> > 
> > There's no change regarding how file_dentry() works.  We've just
> > pushed these weird files (f_path points to overlay, f_inode points to
> > underlay) down into the guts of overlayfs and are not directly
> > referenced from the file table anymore.  That shouldn't make *any*
> > difference from the lower fs's pov.
> 
> *o*
> I'd managed to push that particular nest of horrors out of mind ;-/
> Having dug out my notes from back then and grepped around...  The real
> mess is not even /proc/*/maps - it's /proc/*/map_files/* and yes, the
> reasons for that kludge are still valid ;-/
> 
> Fuck.  OK, so we want to get rid of ->f_path.dentry accesses and see
> that they don't come back.  Leaving them around due to "it won't come
> anywhere near overlayfs" was a mistake of the same kind as leaving
> d_add() in ->lookup() instances where we'd been certain that filesystem
> would never get exported over NFS.  Just as we'd got open-by-handle for
> e.g. NFS, we'd got nothing to prevent ecryptfs as lower layer in
> overlayfs...
> 
> I hate it, but... consider path_open() objections withdrawn for now.
> Uses of ->vm_file (and rules for those) are too convoluted to untangle
> at the moment.  I still would love to get that straightened out, but
> it's not this cycle fodder, more's the pity...

PS: conversion of ->f_path.dentry is easy and that can probably go this
cycle - it's a fairly trivial change, with no functional changes unless
overlayfs is used with , fixing really bad shit if it ever
gets used thus.  I'm not asking to put that into overlayfs pull *and*
it's independent from the "want to kill that fucking kludge" stuff.
The latter is too hard for this cycle, unfortunately.

Re: [PATCH 14/39] ovl: stack file ops

2018-06-12 Thread Al Viro

On Tue, Jun 12, 2018 at 11:24:39AM +0200, Miklos Szeredi wrote:

> > Note that anything that uses file_dentry() anywhere near ->open(),
> > ->read_iter() or ->write_iter() is an instant trouble with your scheme.
> > Such as
> > int nfs_open(struct inode *inode, struct file *filp)
> > {
> > struct nfs_open_context *ctx;
> >
> > ctx = alloc_nfs_open_context(file_dentry(filp), filp->f_mode, filp);
> > if (IS_ERR(ctx))
> > return PTR_ERR(ctx);
> > nfs_file_set_open_context(filp, ctx);
> > put_nfs_open_context(ctx);
> > nfs_fscache_open_file(inode, filp);
> > return 0;
> > }
> >
> > You do want to support NFS for lower layers, right?
> 
> There's no change regarding how file_dentry() works.  We've just
> pushed these weird files (f_path points to overlay, f_inode points to
> underlay) down into the guts of overlayfs and are not directly
> referenced from the file table anymore.  That shouldn't make *any*
> difference from the lower fs's pov.

*o*
I'd managed to push that particular nest of horrors out of mind ;-/
Having dug out my notes from back then and grepped around...  The real
mess is not even /proc/*/maps - it's /proc/*/map_files/* and yes, the
reasons for that kludge are still valid ;-/

Fuck.  OK, so we want to get rid of ->f_path.dentry accesses and see
that they don't come back.  Leaving them around due to "it won't come
anywhere near overlayfs" was a mistake of the same kind as leaving
d_add() in ->lookup() instances where we'd been certain that filesystem
would never get exported over NFS.  Just as we'd got open-by-handle for
e.g. NFS, we'd got nothing to prevent ecryptfs as lower layer in
overlayfs...

I hate it, but... consider path_open() objections withdrawn for now.
Uses of ->vm_file (and rules for those) are too convoluted to untangle
at the moment.  I still would love to get that straightened out, but
it's not this cycle fodder, more's the pity...

Re: [PATCH 14/39] ovl: stack file ops

2018-06-12 Thread Miklos Szeredi

On Tue, Jun 12, 2018 at 4:40 AM, Al Viro  wrote:
> On Tue, Jun 12, 2018 at 03:29:26AM +0100, Al Viro wrote:
>
>> It might (or might not) work for the filesystems you'd been testing
>> on, but it's a lot of trouble waiting to happen.  Hell, try and use
>> ecryptfs as lower layer, see how fast it'll blow up.  Sure, it's
>> a dumb testcase, but I don't see how to check if something more
>> realistic is trouble-free.

That's funny, because when dhowells added the patch to make f_path
point to the overlay, I was fighting tooth and claw against that
change on the grounds of being unsafe, but it went through regardless
(and was in fact one of the biggest headaches in overlay/vfs
interaction).

So you might be right that there are bugs in the handling of ecryptfs,
etc, however the patchset is guaranteed not to cause regressions in
this area.

And yes, it would be best to get rid of that kludge once and for all.

>>
>> I'd been trying to come up with some way to salvage that kludge of yours,
>> but I don't see any solutions.  We don't have good proxies for "this
>> filesystem might be unsafe as lower layer" ;-/
>
> Note that anything that uses file_dentry() anywhere near ->open(),
> ->read_iter() or ->write_iter() is an instant trouble with your scheme.
> Such as
> int nfs_open(struct inode *inode, struct file *filp)
> {
> struct nfs_open_context *ctx;
>
> ctx = alloc_nfs_open_context(file_dentry(filp), filp->f_mode, filp);
> if (IS_ERR(ctx))
> return PTR_ERR(ctx);
> nfs_file_set_open_context(filp, ctx);
> put_nfs_open_context(ctx);
> nfs_fscache_open_file(inode, filp);
> return 0;
> }
>
> You do want to support NFS for lower layers, right?

There's no change regarding how file_dentry() works.  We've just
pushed these weird files (f_path points to overlay, f_inode points to
underlay) down into the guts of overlayfs and are not directly
referenced from the file table anymore.  That shouldn't make *any*
difference from the lower fs's pov.

The only difference is that now the real file has creds inherited from
mounter task.  If lower filesystem's a_ops did some permission
checking based on that, then that might make a difference in behavior.
But I guess that difference would be in the positive direction, making
behavior more consistent.

Thanks,
Miklos

Re: [PATCH 14/39] ovl: stack file ops

2018-06-11 Thread Al Viro

On Tue, Jun 12, 2018 at 03:29:26AM +0100, Al Viro wrote:

> It might (or might not) work for the filesystems you'd been testing
> on, but it's a lot of trouble waiting to happen.  Hell, try and use
> ecryptfs as lower layer, see how fast it'll blow up.  Sure, it's
> a dumb testcase, but I don't see how to check if something more
> realistic is trouble-free.
> 
> I'd been trying to come up with some way to salvage that kludge of yours,
> but I don't see any solutions.  We don't have good proxies for "this
> filesystem might be unsafe as lower layer" ;-/

Note that anything that uses file_dentry() anywhere near ->open(),
->read_iter() or ->write_iter() is an instant trouble with your scheme.
Such as
int nfs_open(struct inode *inode, struct file *filp)
{
struct nfs_open_context *ctx;

ctx = alloc_nfs_open_context(file_dentry(filp), filp->f_mode, filp);
if (IS_ERR(ctx)) 
return PTR_ERR(ctx);
nfs_file_set_open_context(filp, ctx);
put_nfs_open_context(ctx);
nfs_fscache_open_file(inode, filp);
return 0;
}

You do want to support NFS for lower layers, right?

Re: [PATCH 14/39] ovl: stack file ops

2018-06-11 Thread Al Viro

On Mon, Jun 11, 2018 at 09:09:04AM +0200, Miklos Szeredi wrote:

[context: opening files in layers with unholy mix of overlayfs
->f_path and layer's ->f_inode/->f_op]

> I'd really like to get there some time but...
> 
> List of basic requirements:
> 
>  - Private mmap of overlay file shares page cache with lower file (and
> hence with all other overlays using the same lower file).
> 
>  - /proc/PID/maps shows correct path.
> 
> Thought about setting f_mapping/i_mapping of overlay file to that of
> underlying file.  But that breaks when doing a copy-up.  We can't just
> go and change those mapping pointers, assumption is that those remain
> constant (we'd need READ_ONCE() for all cases where we use the mapping
> more than once).  It's probably doable, but it's a large and fragile
> change.

We are really asking for trouble here - anything with e.g. ->read_iter()
using dentry will get in trouble with that kind of games.  Consider something
like

foo_read_iter(struct kiocb *iocb, struct iov_iter *to)
{
struct file *file = iocb->ki_filp;
struct foo_data *p = file->f_path.dentry->d_fsdata;
...
}

which will work just fine for files on foofs, where we have ->d_fsdata set
on lookup.  Now, try to use foofs as a layer; suddenly, you get foofs
files with ->f_path.dentry being *overlayfs* dentry, with ->d_fsdata
being nothing like struct foo_data *.

Better yet, consider

foo_open(struct inode *inode, struct file *file)
{
struct dentry *dentry = file->f_path.dentry;
...
foo_add_splat(dentry, splat);
...
}
where foo_add_splat() inserts struct foo_splat into an hlist starting
in dentry->d_fsdata.  That's not a pure theory - we *do* have ->open()
instances doing things of that sort.  That'll bugger overlayfs quite
badly, not to mention that foofs methods won't be happy with overlayfs
dentries.

It might (or might not) work for the filesystems you'd been testing
on, but it's a lot of trouble waiting to happen.  Hell, try and use
ecryptfs as lower layer, see how fast it'll blow up.  Sure, it's
a dumb testcase, but I don't see how to check if something more
realistic is trouble-free.

I'd been trying to come up with some way to salvage that kludge of yours,
but I don't see any solutions.  We don't have good proxies for "this
filesystem might be unsafe as lower layer" ;-/

Frankly, it might be saner and safer to teach procfs (and similar
places) to do more than just use ->vm_file->f_path.  _That_ at least
is much more local in impact.

Re: [PATCH 14/39] ovl: stack file ops

2018-06-11 Thread Miklos Szeredi

On Sun, Jun 10, 2018 at 6:13 AM, Al Viro  wrote:
> On Tue, May 29, 2018 at 04:43:14PM +0200, Miklos Szeredi wrote:
>> Implement file operations on a regular overlay file.  The underlying file
>> is opened separately and cached in ->private_data.
>>
>> It might be worth making an exception for such files when accounting in
>> nr_file to confirm to userspace expectations.  We are only adding a small
>> overhead (248bytes for the struct file) since the real inode and dentry are
>> pinned by overlayfs anyway.
>>
>> This patch doesn't have any effect, since the vfs will use d_real() to find
>> the real underlying file to open.  The patch at the end of the series will
>> actually enable this functionality.
>
>> +static struct file *ovl_open_realfile(const struct file *file)
>> +{
>> + struct inode *inode = file_inode(file);
>> + struct inode *upperinode = ovl_inode_upper(inode);
>> + struct inode *realinode = upperinode ?: ovl_inode_lower(inode);
>> + struct file *realfile;
>> + const struct cred *old_cred;
>> +
>> + old_cred = ovl_override_creds(inode->i_sb);
>> + realfile = path_open(&file->f_path, file->f_flags | O_NOATIME,
>> +  realinode, current_cred(), false);
>> + revert_creds(old_cred);
>> +
>> + pr_debug("open(%p[%pD2/%c], 0%o) -> (%p, 0%o)\n",
>> +  file, file, upperinode ? 'u' : 'l', file->f_flags,
>> +  realfile, IS_ERR(realfile) ? 0 : realfile->f_flags);
>> +
>> + return realfile;
>> +}
>
> IDGI.  OK, you open a file in the layer you want; good, but why the hell do 
> you
> *not* use the dentry/vfsmount from the same layer?
>
> IOW, why does your path_open() get an explicit inode argument at all?  With 
> the
> rest of the work done in that series it looks like you should be able to use
> vfs_open() instead...  Sure, for ovlfs file you want ->f_path on overlayfs and
> not in a layer, but why do the same for those?

I'd really like to get there some time but...

List of basic requirements:

 - Private mmap of overlay file shares page cache with lower file (and
hence with all other overlays using the same lower file).

 - /proc/PID/maps shows correct path.

Thought about setting f_mapping/i_mapping of overlay file to that of
underlying file.  But that breaks when doing a copy-up.  We can't just
go and change those mapping pointers, assumption is that those remain
constant (we'd need READ_ONCE() for all cases where we use the mapping
more than once).  It's probably doable, but it's a large and fragile
change.

Thanks,
Miklos

Re: [PATCH 14/39] ovl: stack file ops

2018-06-09 Thread Al Viro

On Tue, May 29, 2018 at 04:43:14PM +0200, Miklos Szeredi wrote:
> Implement file operations on a regular overlay file.  The underlying file
> is opened separately and cached in ->private_data.
> 
> It might be worth making an exception for such files when accounting in
> nr_file to confirm to userspace expectations.  We are only adding a small
> overhead (248bytes for the struct file) since the real inode and dentry are
> pinned by overlayfs anyway.
> 
> This patch doesn't have any effect, since the vfs will use d_real() to find
> the real underlying file to open.  The patch at the end of the series will
> actually enable this functionality.

> +static struct file *ovl_open_realfile(const struct file *file)
> +{
> + struct inode *inode = file_inode(file);
> + struct inode *upperinode = ovl_inode_upper(inode);
> + struct inode *realinode = upperinode ?: ovl_inode_lower(inode);
> + struct file *realfile;
> + const struct cred *old_cred;
> +
> + old_cred = ovl_override_creds(inode->i_sb);
> + realfile = path_open(&file->f_path, file->f_flags | O_NOATIME,
> +  realinode, current_cred(), false);
> + revert_creds(old_cred);
> +
> + pr_debug("open(%p[%pD2/%c], 0%o) -> (%p, 0%o)\n",
> +  file, file, upperinode ? 'u' : 'l', file->f_flags,
> +  realfile, IS_ERR(realfile) ? 0 : realfile->f_flags);
> +
> + return realfile;
> +}

IDGI.  OK, you open a file in the layer you want; good, but why the hell do you
*not* use the dentry/vfsmount from the same layer?

IOW, why does your path_open() get an explicit inode argument at all?  With the
rest of the work done in that series it looks like you should be able to use
vfs_open() instead...  Sure, for ovlfs file you want ->f_path on overlayfs and
not in a layer, but why do the same for those?

And why bother with override_creds at all?  What's wrong with simply passing
->creator_cred to path_open()/vfs_open()/whatnot?

Re: [PATCH 14/39] ovl: stack file ops

Re: [PATCH 14/39] ovl: stack file ops

Re: [PATCH 14/39] ovl: stack file ops

Re: [PATCH 14/39] ovl: stack file ops

Re: [PATCH 14/39] ovl: stack file ops

Re: [PATCH 14/39] ovl: stack file ops

Re: [PATCH 14/39] ovl: stack file ops

Re: [PATCH 14/39] ovl: stack file ops

Re: [PATCH 14/39] ovl: stack file ops

Re: [PATCH 14/39] ovl: stack file ops

Re: [PATCH 14/39] ovl: stack file ops

11 matches

Site Navigation

Mail list logo

Footer information