Re: Linux 3.15 .. and continuation of merge window
Al Viro: > The former guarantees that the address we are doing trylock on would be that > of a live dentry. The latter makes sure that anything assigned to > dentry->d_parent after we drop ->d_lock will not be freed until we drop > rcu_read_lock. Although I don't know how to reproduce the problem for the latter case, I think the patch is good. J. R. Okajima -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 3.15 .. and continuation of merge window
On Wed, Jun 11, 2014 at 06:32:58PM +0900, J. R. Okajima wrote: > > Al Viro: > > So I suspect that the right fix is a bit trickier - in addition to check > > on the fast path (i.e. when trylock gets us the lock on parent), we need > > to > > * get rcu_read_lock() before dropping ->d_lock. > > * check if dentry is already doomed right after taking rcu_read_lock(); > > if not, any value we might see in ->d_parent afterwards will point to object > > not freed until we drop rcu_read_lock. > > > > IOW, something like the delta below. Comments? > > I will try testing later. > For now, as a comment before testing, the patch looks weird for me. It > checks d_lockref.count twice during d_lockref.lock held. It must be the > same result, isn't it? Right you are. So what we need is * check that thing once, as in your variant (I'd still prefer to check ->d_lockref.count instead of ->d_flags, but it's the same thing being tested) * ... and get rcu_read_lock() *before* dropping ->d_lock. The former guarantees that the address we are doing trylock on would be that of a live dentry. The latter makes sure that anything assigned to dentry->d_parent after we drop ->d_lock will not be freed until we drop rcu_read_lock. Signed-off-by: Al Viro --- diff --git a/fs/dcache.c b/fs/dcache.c index be2bea8..e99c6f5 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -532,10 +532,12 @@ static inline struct dentry *lock_parent(struct dentry *dentry) struct dentry *parent = dentry->d_parent; if (IS_ROOT(dentry)) return NULL; + if (unlikely((int)dentry->d_lockref.count < 0)) + return NULL; if (likely(spin_trylock(&parent->d_lock))) return parent; - spin_unlock(&dentry->d_lock); rcu_read_lock(); + spin_unlock(&dentry->d_lock); again: parent = ACCESS_ONCE(dentry->d_parent); spin_lock(&parent->d_lock); -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 3.15 .. and continuation of merge window
Al Viro: > So I suspect that the right fix is a bit trickier - in addition to check > on the fast path (i.e. when trylock gets us the lock on parent), we need > to > * get rcu_read_lock() before dropping ->d_lock. > * check if dentry is already doomed right after taking rcu_read_lock(); > if not, any value we might see in ->d_parent afterwards will point to object > not freed until we drop rcu_read_lock. > > IOW, something like the delta below. Comments? I will try testing later. For now, as a comment before testing, the patch looks weird for me. It checks d_lockref.count twice during d_lockref.lock held. It must be the same result, isn't it? Or does it mean that denty can be handled by lockref_mark_dead() even if d_lockref.lock is held? J. R. Okajima -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 3.15 .. and continuation of merge window
On Mon, Jun 09, 2014 at 12:30:34PM +0900, J. R. Okajima wrote: > > Linus Torvalds: > > So I ended up doing an rc8 because I was a bit worried about some > > last-minute dcache fixes, but it turns out that nobody seemed to even > > notice those. We did have other issues during the week, though, so it > ::: > > I am afraid there is a problem in dcache. Please read > http://marc.info/?l=linux-fsdevel&m=140214911608925&w=2 There is a problem, all right, but your fix doesn't really fix it - just narrows the race window ;-/ I would really like to detach the bugger as soon as __dentry_kill() removes it from the list of children, but unfortunately NFS wants it to be still valid in ->d_iput() (BTW, nfs_can_unlink() is doing something very odd - parent = dget_parent(dentry); if (parent == NULL) goto out_free; is pointless, since dget_parent() never returns NULL; what was that check trying to accomplish?) Your scenario isn't feasible as described, but something similar can happen with *two* shrinkers racing - dirB might've ended up on one list, with fileC looked up a bit later, ending up on another list right after that. So the problem is real; unfortunately, DCACHE_DENTRY_KILLED might have appeared right after we'd dropped ->d_lock. So I suspect that the right fix is a bit trickier - in addition to check on the fast path (i.e. when trylock gets us the lock on parent), we need to * get rcu_read_lock() before dropping ->d_lock. * check if dentry is already doomed right after taking rcu_read_lock(); if not, any value we might see in ->d_parent afterwards will point to object not freed until we drop rcu_read_lock. IOW, something like the delta below. Comments? PS: apologies for being MIA; caught some crap, spent the last week being very unhappy ;-/ I'll send a pull request tomorrow morning. diff --git a/fs/dcache.c b/fs/dcache.c index be2bea8..65ec10f 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -532,10 +532,16 @@ static inline struct dentry *lock_parent(struct dentry *dentry) struct dentry *parent = dentry->d_parent; if (IS_ROOT(dentry)) return NULL; + if (unlikely((int)dentry->d_lockref.count < 0)) + return NULL; if (likely(spin_trylock(&parent->d_lock))) return parent; - spin_unlock(&dentry->d_lock); rcu_read_lock(); + if (unlikely((int)dentry->d_lockref.count < 0)) { + rcu_read_unlock(); + return NULL; + } + spin_unlock(&dentry->d_lock); again: parent = ACCESS_ONCE(dentry->d_parent); spin_lock(&parent->d_lock); -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 3.15 .. and continuation of merge window
Linus Torvalds: > So I ended up doing an rc8 because I was a bit worried about some > last-minute dcache fixes, but it turns out that nobody seemed to even > notice those. We did have other issues during the week, though, so it ::: I am afraid there is a problem in dcache. Please read http://marc.info/?l=linux-fsdevel&m=140214911608925&w=2 For 3.16, please consider these patches. - for vfs [PATCH v2] vfs: get_next_ino(), never inum=0 http://marc.info/?l=linux-fsdevel&m=140128600801771&w=2 - for tmpfs [RFC PATCH v4 0/2] tmpfs: manage the inode-number by IDR (performance measure) http://marc.info/?l=linux-fsdevel&m=140197128021025&w=2 [RFC PATCH v4 1/2] tmpfs: manage the inode-number by IDR, signed int inum http://marc.info/?l=linux-fsdevel&m=140197128321029&w=2 [RFC PATCH v4 2/2] tmpfs: refine a file handle for NFS-exporting http://marc.info/?l=linux-fsdevel&m=140197128121026&w=2 J. R. Okajima -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Linux 3.15 .. and continuation of merge window
So I ended up doing an rc8 because I was a bit worried about some last-minute dcache fixes, but it turns out that nobody seemed to even notice those. We did have other issues during the week, though, so it was just as well. The futex fixes and cleanups may stand out, but as usual there's various other random fixes since rc8 in there too: mainly drivers (drm, networking, sound, usb etc), networking, scheduling and perf tooling. But it's all been fairly small and quiet, which *may* of course be due to the fact that last week was also the first week of the merge window for 3.16. That might have distracted some developers. I'm not entirely convinced I liked the overlap, but it seemed to work ok, and unless people scream really loudly ("Please don't _ever_ do that again") and give good reasons for doing so, I might end up doing that overlapping merge window in the future too if it ends up helping out with some particular timing issue. That said, I also don't think it was such a wonderful experience that I'd want to necessarily do the overlap every time, without a good specific reason for doing so. It was kind of nice being productive during the last week or rc (which is usually quite boring and dead), but I think it might be a distraction when people should be worrying about the stability of the rc. Of course, maybe the overlap ends up meaning that we get less noise during the last week of stabilization, and it actually helps. It could go either way. I'd be interested to hear what people thought, although I _suspect_ most people don't feel strongly either way. Anyway, with 3.15 released, my "master" branch has already merged the work in my "next" branch on my local machine, and I'll be decommissioning the "next" branch once I push that all out. After that, any future merge window work will happen on "master", and we'll be back to the normal single-branch model for my tree. Linus --- Alan Stern (1): USB: Avoid runtime suspend loops for HCDs that can't handle suspend/resume Aleksander Morgado (3): net: qmi_wwan: add Netgear AirCard 341U net: qmi_wwan: add additional Sierra Wireless QMI devices net: qmi_wwan: interface #11 in Sierra Wireless MC73xx is not QMI Alex Deucher (1): drm/radeon/dpm: resume fixes for some systems Alexej Starschenko (1): USB: serial: option: add support for Novatel E371 PCIe card Andrey Ryabinin (1): mm: rmap: fix use-after-free in __put_anon_vma Andy Lutomirski (1): x86, vdso: Fix an OOPS accessing the HPET mapping w/o an HPET Bart De Schuymer (1): ebtables: Update MAINTAINERS entry. Ben Hutchings (2): Staging: speakup: Move pasting into a work item Staging: speakup: Update __speakup_paste_selection() tty (ab)usage to match vt Bjørn Mork (1): usb: cdc-wdm: export cdc-wdm uapi header Christian König (3): drm/radeon: fix vm buffer size estimation drm/radeon: sync page table updates drm/radeon: use the CP DMA on CIK Dan Carpenter (1): qlcnic: info leak in qlcnic_dcb_peer_app_info() Dave Young (2): x86/efi: earlyprintk=efi,keep fix x86/efi: Do not export efi runtime map in case old map Dirk Brandewie (3): intel_pstate: Remove C0 tracking intel_pstate: Correct rounding in busy calculation intel_pstate: add sample time scaling Doug Smythies (1): intel_pstate: Improve initial busy calculation Emmanuel Grumbach (1): iwlwifi: mvm: disable beacon filtering Eric Dumazet (1): net: fix inet_getid() and ipv6_select_ident() bugs Eric W. Biederman (1): netlink: Only check file credentials for implicit destinations Filipe Manana (1): Btrfs: send, fix corrupted path strings for long paths George McCollister (1): USB: ftdi_sio: add NovaTech OrionLXm product ID Greg Kroah-Hartman (1): USB: cdc-wdm: properly include types.h Ian Abbott (1): staging: comedi: ni_daq_700: add mux settling delay Igor Mammedov (3): x86: Fix list/memory corruption on CPU hotplug x86/smpboot: Log error on secondary CPU wakeup failure at ERR level x86/smpboot: Initialize secondary CPU only if master CPU will wait for it Ivan Mikhaylov (2): emac: add missing support of 10mbit in emac/rgmii emac: aggregation of v1-2 PLB errors for IER register Jack Morgenstein (1): net/mlx4_core: Reset RoCE VF gids when guest driver goes down Jean Delvare (1): net: ec_bhf: Add runtime dependencies Jianyu Zhan (1): kernfs: move the last knowledge of sysfs out from kernfs Jiri Pirko (1): team: fix mtu setting Johan Hovold (1): USB: io_ti: fix firmware download on big-endian machines (part 2) Jon Maxwell (1): bridge: notify user space after fdb update Kirill Tkhai (1): sched/dl: Fix race in dl_task_timer() Kristian Evensen (1): ipheth: Add support for iPad 2 and iPad 3 Leon Yu (1): net: filter: fix possible memory leak in __sk_prepare_filter() Lin