Re: [PATCH 14/39] ovl: stack file ops
On Fri, Jun 15, 2018 at 06:47:17AM +0100, Al Viro wrote: > On Wed, Jun 13, 2018 at 11:21:30AM +0200, Miklos Szeredi wrote: > > Looked at some other options... What coda mmap does looks very > > dubious. It only sets f_mapping, not vm_file. That's going to get > > into all sorts of trouble when underlying fs tries to look at > > file_inode() or worse, ->private_data. Looks like that should be > > converted to what overlayfs does, to have a remote chance of actually > > not crashing on most filesystems. Does anybody actually use coda > > still? > > Keep in mind that coda is using the local fs only as cache; IOW, its needs > are much more limited than those of overlayfs - local r/w filesystem, > disk-backed or tmpfs, used pretty much as a scratch space. Look: coda_file_mmap(struct file *coda_file, struct vm_area_struct *vma) { [...] coda_file->f_mapping = host_file->f_mapping; [...] return call_mmap(host_file, vma); } So that'll end up with vma->vm_file pointing to coda file, coda_file->f_mapping pointing to host mapping. Hence vm_ops and a_ops are going to come from host file, but they'll be getting a "foreign" file with ->private_data and ->f_inode pointing to coda structures. For example: int ext4_filemap_fault(struct vm_fault *vmf) { struct inode *inode = file_inode(vmf->vma->vm_file) int err; down_read(&EXT4_I(inode)->i_mmap_sem); [...] There you have it: coda inode being interpreted as ext4 inode. How is that supposed to work? How is it not blowing up? What am I missing? Thanks, Miklos
Re: [PATCH 14/39] ovl: stack file ops
On Wed, Jun 13, 2018 at 11:21:30AM +0200, Miklos Szeredi wrote: > On Tue, Jun 12, 2018 at 8:31 PM, Al Viro wrote: > > On Tue, Jun 12, 2018 at 07:24:23PM +0100, Al Viro wrote: > > >> I hate it, but... consider path_open() objections withdrawn for now. > > Is that an ACK for the pull if I follow up with fixes for mmap botch, etc? Yes. > >> Uses of ->vm_file (and rules for those) are too convoluted to untangle > >> at the moment. I still would love to get that straightened out, but > >> it's not this cycle fodder, more's the pity... > > Looked at some other options... What coda mmap does looks very > dubious. It only sets f_mapping, not vm_file. That's going to get > into all sorts of trouble when underlying fs tries to look at > file_inode() or worse, ->private_data. Looks like that should be > converted to what overlayfs does, to have a remote chance of actually > not crashing on most filesystems. Does anybody actually use coda > still? Keep in mind that coda is using the local fs only as cache; IOW, its needs are much more limited than those of overlayfs - local r/w filesystem, disk-backed or tmpfs, used pretty much as a scratch space. > > PS: conversion of ->f_path.dentry is easy and that can probably go this > > cycle - it's a fairly trivial change, with no functional changes unless > > overlayfs is used with , fixing really bad shit if it ever > > gets used thus. I'm not asking to put that into overlayfs pull *and* > > it's independent from the "want to kill that fucking kludge" stuff. > > The latter is too hard for this cycle, unfortunately. > > So this is about adding a file_dentry_check() (or whatever we want to > call it) helper to be used by all filesystems when dereferecing > f_path.dentry, right? file_dentry(), and some of the users should be converted to file_inode(). There's also a missing helper for debugfs uses - more or less a combination of file_dentry() and debugfs_file_get() (if not a conversion of debugfs_file_get() to taking struct file - almost all users are of that form, if not entirely all of them). I've some of that done in local branch...
Re: [PATCH 14/39] ovl: stack file ops
Al Viro: > I'd managed to push that particular nest of horrors out of mind ;-/ > Having dug out my notes from back then and grepped around... The real > mess is not even /proc/*/maps - it's /proc/*/map_files/* and yes, the > reasons for that kludge are still valid ;-/ ::: > Uses of ->vm_file (and rules for those) are too convoluted to untangle > at the moment. I still would love to get that straightened out, but > it's not this cycle fodder, more's the pity... I don't fully read this thread, but the discussion is related to the file path printed in /proc/$$/maps? If so, as just for your information, here is an approach that aufs took. In linux-v2.6 era, aufs tried implementing mmap by customzing address_space ops, but it was not good and failed completing the implementation. As wel as overlayfs, aufs has two struct file objects for a single a regular file. One is for a virtual aufs' entry, and the other is for a real layer's entry. When a user issues mmap(2) for the virtual file, aufs redirects the request to the real file on the layer internally. So the vm_file points to the real file. It means /proc/$$/maps prints the unexpected file path. Aufs added another struct file* vm_prfile in struct vma. It points to the virtual aufs file, and /proc/$$/maps prints vm_prfile instead of vm_file. Of cource, maintaining vm_prfile is important since vma may be merged or splitted. Still I don't like this approach, but I don't have another better idea, also it works for many years. You can get the patch in aufs4-standalone.git on sourceforge if you want. J. R. Okajima
Re: [PATCH 14/39] ovl: stack file ops
On Tue, Jun 12, 2018 at 8:31 PM, Al Viro wrote: > On Tue, Jun 12, 2018 at 07:24:23PM +0100, Al Viro wrote: >> I hate it, but... consider path_open() objections withdrawn for now. Is that an ACK for the pull if I follow up with fixes for mmap botch, etc? >> Uses of ->vm_file (and rules for those) are too convoluted to untangle >> at the moment. I still would love to get that straightened out, but >> it's not this cycle fodder, more's the pity... Looked at some other options... What coda mmap does looks very dubious. It only sets f_mapping, not vm_file. That's going to get into all sorts of trouble when underlying fs tries to look at file_inode() or worse, ->private_data. Looks like that should be converted to what overlayfs does, to have a remote chance of actually not crashing on most filesystems. Does anybody actually use coda still? > PS: conversion of ->f_path.dentry is easy and that can probably go this > cycle - it's a fairly trivial change, with no functional changes unless > overlayfs is used with , fixing really bad shit if it ever > gets used thus. I'm not asking to put that into overlayfs pull *and* > it's independent from the "want to kill that fucking kludge" stuff. > The latter is too hard for this cycle, unfortunately. So this is about adding a file_dentry_check() (or whatever we want to call it) helper to be used by all filesystems when dereferecing f_path.dentry, right? Thanks, Miklos
Re: [PATCH 14/39] ovl: stack file ops
On Tue, Jun 12, 2018 at 07:24:23PM +0100, Al Viro wrote: > On Tue, Jun 12, 2018 at 11:24:39AM +0200, Miklos Szeredi wrote: > > > > Note that anything that uses file_dentry() anywhere near ->open(), > > > ->read_iter() or ->write_iter() is an instant trouble with your scheme. > > > Such as > > > int nfs_open(struct inode *inode, struct file *filp) > > > { > > > struct nfs_open_context *ctx; > > > > > > ctx = alloc_nfs_open_context(file_dentry(filp), filp->f_mode, > > > filp); > > > if (IS_ERR(ctx)) > > > return PTR_ERR(ctx); > > > nfs_file_set_open_context(filp, ctx); > > > put_nfs_open_context(ctx); > > > nfs_fscache_open_file(inode, filp); > > > return 0; > > > } > > > > > > You do want to support NFS for lower layers, right? > > > > There's no change regarding how file_dentry() works. We've just > > pushed these weird files (f_path points to overlay, f_inode points to > > underlay) down into the guts of overlayfs and are not directly > > referenced from the file table anymore. That shouldn't make *any* > > difference from the lower fs's pov. > > *o* > I'd managed to push that particular nest of horrors out of mind ;-/ > Having dug out my notes from back then and grepped around... The real > mess is not even /proc/*/maps - it's /proc/*/map_files/* and yes, the > reasons for that kludge are still valid ;-/ > > Fuck. OK, so we want to get rid of ->f_path.dentry accesses and see > that they don't come back. Leaving them around due to "it won't come > anywhere near overlayfs" was a mistake of the same kind as leaving > d_add() in ->lookup() instances where we'd been certain that filesystem > would never get exported over NFS. Just as we'd got open-by-handle for > e.g. NFS, we'd got nothing to prevent ecryptfs as lower layer in > overlayfs... > > I hate it, but... consider path_open() objections withdrawn for now. > Uses of ->vm_file (and rules for those) are too convoluted to untangle > at the moment. I still would love to get that straightened out, but > it's not this cycle fodder, more's the pity... PS: conversion of ->f_path.dentry is easy and that can probably go this cycle - it's a fairly trivial change, with no functional changes unless overlayfs is used with , fixing really bad shit if it ever gets used thus. I'm not asking to put that into overlayfs pull *and* it's independent from the "want to kill that fucking kludge" stuff. The latter is too hard for this cycle, unfortunately.
Re: [PATCH 14/39] ovl: stack file ops
On Tue, Jun 12, 2018 at 11:24:39AM +0200, Miklos Szeredi wrote: > > Note that anything that uses file_dentry() anywhere near ->open(), > > ->read_iter() or ->write_iter() is an instant trouble with your scheme. > > Such as > > int nfs_open(struct inode *inode, struct file *filp) > > { > > struct nfs_open_context *ctx; > > > > ctx = alloc_nfs_open_context(file_dentry(filp), filp->f_mode, filp); > > if (IS_ERR(ctx)) > > return PTR_ERR(ctx); > > nfs_file_set_open_context(filp, ctx); > > put_nfs_open_context(ctx); > > nfs_fscache_open_file(inode, filp); > > return 0; > > } > > > > You do want to support NFS for lower layers, right? > > There's no change regarding how file_dentry() works. We've just > pushed these weird files (f_path points to overlay, f_inode points to > underlay) down into the guts of overlayfs and are not directly > referenced from the file table anymore. That shouldn't make *any* > difference from the lower fs's pov. *o* I'd managed to push that particular nest of horrors out of mind ;-/ Having dug out my notes from back then and grepped around... The real mess is not even /proc/*/maps - it's /proc/*/map_files/* and yes, the reasons for that kludge are still valid ;-/ Fuck. OK, so we want to get rid of ->f_path.dentry accesses and see that they don't come back. Leaving them around due to "it won't come anywhere near overlayfs" was a mistake of the same kind as leaving d_add() in ->lookup() instances where we'd been certain that filesystem would never get exported over NFS. Just as we'd got open-by-handle for e.g. NFS, we'd got nothing to prevent ecryptfs as lower layer in overlayfs... I hate it, but... consider path_open() objections withdrawn for now. Uses of ->vm_file (and rules for those) are too convoluted to untangle at the moment. I still would love to get that straightened out, but it's not this cycle fodder, more's the pity...
Re: [PATCH 14/39] ovl: stack file ops
On Tue, Jun 12, 2018 at 4:40 AM, Al Viro wrote: > On Tue, Jun 12, 2018 at 03:29:26AM +0100, Al Viro wrote: > >> It might (or might not) work for the filesystems you'd been testing >> on, but it's a lot of trouble waiting to happen. Hell, try and use >> ecryptfs as lower layer, see how fast it'll blow up. Sure, it's >> a dumb testcase, but I don't see how to check if something more >> realistic is trouble-free. That's funny, because when dhowells added the patch to make f_path point to the overlay, I was fighting tooth and claw against that change on the grounds of being unsafe, but it went through regardless (and was in fact one of the biggest headaches in overlay/vfs interaction). So you might be right that there are bugs in the handling of ecryptfs, etc, however the patchset is guaranteed not to cause regressions in this area. And yes, it would be best to get rid of that kludge once and for all. >> >> I'd been trying to come up with some way to salvage that kludge of yours, >> but I don't see any solutions. We don't have good proxies for "this >> filesystem might be unsafe as lower layer" ;-/ > > Note that anything that uses file_dentry() anywhere near ->open(), > ->read_iter() or ->write_iter() is an instant trouble with your scheme. > Such as > int nfs_open(struct inode *inode, struct file *filp) > { > struct nfs_open_context *ctx; > > ctx = alloc_nfs_open_context(file_dentry(filp), filp->f_mode, filp); > if (IS_ERR(ctx)) > return PTR_ERR(ctx); > nfs_file_set_open_context(filp, ctx); > put_nfs_open_context(ctx); > nfs_fscache_open_file(inode, filp); > return 0; > } > > You do want to support NFS for lower layers, right? There's no change regarding how file_dentry() works. We've just pushed these weird files (f_path points to overlay, f_inode points to underlay) down into the guts of overlayfs and are not directly referenced from the file table anymore. That shouldn't make *any* difference from the lower fs's pov. The only difference is that now the real file has creds inherited from mounter task. If lower filesystem's a_ops did some permission checking based on that, then that might make a difference in behavior. But I guess that difference would be in the positive direction, making behavior more consistent. Thanks, Miklos
Re: [PATCH 14/39] ovl: stack file ops
On Tue, Jun 12, 2018 at 03:29:26AM +0100, Al Viro wrote: > It might (or might not) work for the filesystems you'd been testing > on, but it's a lot of trouble waiting to happen. Hell, try and use > ecryptfs as lower layer, see how fast it'll blow up. Sure, it's > a dumb testcase, but I don't see how to check if something more > realistic is trouble-free. > > I'd been trying to come up with some way to salvage that kludge of yours, > but I don't see any solutions. We don't have good proxies for "this > filesystem might be unsafe as lower layer" ;-/ Note that anything that uses file_dentry() anywhere near ->open(), ->read_iter() or ->write_iter() is an instant trouble with your scheme. Such as int nfs_open(struct inode *inode, struct file *filp) { struct nfs_open_context *ctx; ctx = alloc_nfs_open_context(file_dentry(filp), filp->f_mode, filp); if (IS_ERR(ctx)) return PTR_ERR(ctx); nfs_file_set_open_context(filp, ctx); put_nfs_open_context(ctx); nfs_fscache_open_file(inode, filp); return 0; } You do want to support NFS for lower layers, right?
Re: [PATCH 14/39] ovl: stack file ops
On Mon, Jun 11, 2018 at 09:09:04AM +0200, Miklos Szeredi wrote: [context: opening files in layers with unholy mix of overlayfs ->f_path and layer's ->f_inode/->f_op] > I'd really like to get there some time but... > > List of basic requirements: > > - Private mmap of overlay file shares page cache with lower file (and > hence with all other overlays using the same lower file). > > - /proc/PID/maps shows correct path. > > Thought about setting f_mapping/i_mapping of overlay file to that of > underlying file. But that breaks when doing a copy-up. We can't just > go and change those mapping pointers, assumption is that those remain > constant (we'd need READ_ONCE() for all cases where we use the mapping > more than once). It's probably doable, but it's a large and fragile > change. We are really asking for trouble here - anything with e.g. ->read_iter() using dentry will get in trouble with that kind of games. Consider something like foo_read_iter(struct kiocb *iocb, struct iov_iter *to) { struct file *file = iocb->ki_filp; struct foo_data *p = file->f_path.dentry->d_fsdata; ... } which will work just fine for files on foofs, where we have ->d_fsdata set on lookup. Now, try to use foofs as a layer; suddenly, you get foofs files with ->f_path.dentry being *overlayfs* dentry, with ->d_fsdata being nothing like struct foo_data *. Better yet, consider foo_open(struct inode *inode, struct file *file) { struct dentry *dentry = file->f_path.dentry; ... foo_add_splat(dentry, splat); ... } where foo_add_splat() inserts struct foo_splat into an hlist starting in dentry->d_fsdata. That's not a pure theory - we *do* have ->open() instances doing things of that sort. That'll bugger overlayfs quite badly, not to mention that foofs methods won't be happy with overlayfs dentries. It might (or might not) work for the filesystems you'd been testing on, but it's a lot of trouble waiting to happen. Hell, try and use ecryptfs as lower layer, see how fast it'll blow up. Sure, it's a dumb testcase, but I don't see how to check if something more realistic is trouble-free. I'd been trying to come up with some way to salvage that kludge of yours, but I don't see any solutions. We don't have good proxies for "this filesystem might be unsafe as lower layer" ;-/ Frankly, it might be saner and safer to teach procfs (and similar places) to do more than just use ->vm_file->f_path. _That_ at least is much more local in impact.
Re: [PATCH 14/39] ovl: stack file ops
On Sun, Jun 10, 2018 at 6:13 AM, Al Viro wrote: > On Tue, May 29, 2018 at 04:43:14PM +0200, Miklos Szeredi wrote: >> Implement file operations on a regular overlay file. The underlying file >> is opened separately and cached in ->private_data. >> >> It might be worth making an exception for such files when accounting in >> nr_file to confirm to userspace expectations. We are only adding a small >> overhead (248bytes for the struct file) since the real inode and dentry are >> pinned by overlayfs anyway. >> >> This patch doesn't have any effect, since the vfs will use d_real() to find >> the real underlying file to open. The patch at the end of the series will >> actually enable this functionality. > >> +static struct file *ovl_open_realfile(const struct file *file) >> +{ >> + struct inode *inode = file_inode(file); >> + struct inode *upperinode = ovl_inode_upper(inode); >> + struct inode *realinode = upperinode ?: ovl_inode_lower(inode); >> + struct file *realfile; >> + const struct cred *old_cred; >> + >> + old_cred = ovl_override_creds(inode->i_sb); >> + realfile = path_open(&file->f_path, file->f_flags | O_NOATIME, >> + realinode, current_cred(), false); >> + revert_creds(old_cred); >> + >> + pr_debug("open(%p[%pD2/%c], 0%o) -> (%p, 0%o)\n", >> + file, file, upperinode ? 'u' : 'l', file->f_flags, >> + realfile, IS_ERR(realfile) ? 0 : realfile->f_flags); >> + >> + return realfile; >> +} > > IDGI. OK, you open a file in the layer you want; good, but why the hell do > you > *not* use the dentry/vfsmount from the same layer? > > IOW, why does your path_open() get an explicit inode argument at all? With > the > rest of the work done in that series it looks like you should be able to use > vfs_open() instead... Sure, for ovlfs file you want ->f_path on overlayfs and > not in a layer, but why do the same for those? I'd really like to get there some time but... List of basic requirements: - Private mmap of overlay file shares page cache with lower file (and hence with all other overlays using the same lower file). - /proc/PID/maps shows correct path. Thought about setting f_mapping/i_mapping of overlay file to that of underlying file. But that breaks when doing a copy-up. We can't just go and change those mapping pointers, assumption is that those remain constant (we'd need READ_ONCE() for all cases where we use the mapping more than once). It's probably doable, but it's a large and fragile change. Thanks, Miklos
Re: [PATCH 14/39] ovl: stack file ops
On Tue, May 29, 2018 at 04:43:14PM +0200, Miklos Szeredi wrote: > Implement file operations on a regular overlay file. The underlying file > is opened separately and cached in ->private_data. > > It might be worth making an exception for such files when accounting in > nr_file to confirm to userspace expectations. We are only adding a small > overhead (248bytes for the struct file) since the real inode and dentry are > pinned by overlayfs anyway. > > This patch doesn't have any effect, since the vfs will use d_real() to find > the real underlying file to open. The patch at the end of the series will > actually enable this functionality. > +static struct file *ovl_open_realfile(const struct file *file) > +{ > + struct inode *inode = file_inode(file); > + struct inode *upperinode = ovl_inode_upper(inode); > + struct inode *realinode = upperinode ?: ovl_inode_lower(inode); > + struct file *realfile; > + const struct cred *old_cred; > + > + old_cred = ovl_override_creds(inode->i_sb); > + realfile = path_open(&file->f_path, file->f_flags | O_NOATIME, > + realinode, current_cred(), false); > + revert_creds(old_cred); > + > + pr_debug("open(%p[%pD2/%c], 0%o) -> (%p, 0%o)\n", > + file, file, upperinode ? 'u' : 'l', file->f_flags, > + realfile, IS_ERR(realfile) ? 0 : realfile->f_flags); > + > + return realfile; > +} IDGI. OK, you open a file in the layer you want; good, but why the hell do you *not* use the dentry/vfsmount from the same layer? IOW, why does your path_open() get an explicit inode argument at all? With the rest of the work done in that series it looks like you should be able to use vfs_open() instead... Sure, for ovlfs file you want ->f_path on overlayfs and not in a layer, but why do the same for those? And why bother with override_creds at all? What's wrong with simply passing ->creator_cred to path_open()/vfs_open()/whatnot?