Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Wed, 13 Mar 2013, Jeff Layton wrote:
> Of course, this is all a lot of work, and not something we can shove
> into the kernel for 3.9 at this point. In the meantime, while Mandeep's
> warning is correctly pointing out a problem, I think we ought to back
> it out until we can fix this properly.

Revert posted at: http://marc.info/?l=linux-kernel&m=136468829626184&w=2

Andrew or Linus, care to pick this up before 3.9-rc?

- Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Wed, 6 Mar 2013 13:40:16 -0800 Tejun Heo wrote:
> On Wed, Mar 06, 2013 at 01:36:36PM -0800, Tejun Heo wrote:
> > On Wed, Mar 06, 2013 at 01:31:10PM -0800, Linus Torvalds wrote:
> > > So I do agree that we probably have *too* many of the stupid "let's
> > > check if we can freeze", and I suspect that the NFS code should get
> > > rid of the "freezable_schedule()" that is causing this warning
> > > (because I also agree that you should *not* freeze while holding
> > > locks, because it really can cause deadlocks), but I do suspect that
> > > network filesystems do need to have a few places where they check for
> > > freezing on their own... Exactly because freezing isn't *quite* like a
> > > signal.
> >
> > Well, I don't really know much about nfs so I can't really tell, but
> > for most other cases, dealing with freezing like a signal should work
> > fine from what I've seen although I can't be sure before actually
> > trying. Trond, Bruce, can you guys please chime in?
>
> So, I think the question here would be, in nfs, how many of the
> current freezer check points would be difficult to convert to the
> signal handling model after excluding the ones which are performed
> while holding some locks which we need to get rid of anyway?

I think we can do this, but it isn't trivial. Here's what I'd envision,
but there are still many details that would need to be worked out...

Basically what we'd need is a way to go into TASK_KILLABLE sleep, but
still allow the freezer to wake these processes up. I guess that likely
means we need a new sleep state (TASK_FREEZABLE?). We'd also need a way
to return an -ERESTARTSYS-like error (-EFREEZING?) that tells the upper
syscall handling layers to freeze the task and then restart the syscall
after waking back up. Maybe we don't need a new error at all and
-ERESTARTSYS is fine here? We also need to consider the effects vs. the
audit code here, but that may be OK with the overhaul that Al merged a
few months ago.
Assuming we have those, then we need to fix up the NFS and RPC layers to
use this stuff:

1/ Prior to submitting a new RPC, we'd look and see whether "current" is
being frozen. If it is, then return -EFREEZING immediately without doing
anything.

2/ We might also need to retrofit certain stages in the RPC FSM to
return -EFREEZING too if it's a synchronous RPC and the task is being
frozen.

3/ If a task is waiting for an RPC reply on an async RPC, we'd need to
use this new sleep state. If the process wakes up because something
wants it to freeze, then have it go back to sleep for a short period of
time to try and wait for the reply (up to 15s or so?). If we get the
reply, great -- return to userland and freeze the task there. If the
reply never comes in, give up on it and just return -EFREEZING and hope
for the best.

We might have to make this latter behavior contingent on a new mount
option (maybe "-o slushy" like Trond recommended). The current "hard"
and "soft" semantics don't really fit this situation correctly.

Of course, this is all a lot of work, and not something we can shove
into the kernel for 3.9 at this point. In the meantime, while Mandeep's
warning is correctly pointing out a problem, I think we ought to back it
out until we can fix this properly. We're already getting a ton of
reports on the mailing lists and in the fedora bug tracker for this
warning. Part of the problem is the verbiage -- "BUG" makes people think
"Oops", but this is really just a warning.

We should also note that this is a problem too in the CIFS code since it
uses a similar mechanism for allowing the kernel to suspend while
waiting on SMB replies.

--
Jeff Layton
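The restart model sketched above can be mocked up in userspace C. To be
clear, everything here is hypothetical: EFREEZING, the helper names, and
the flag standing in for a pending-freeze check are invented for
illustration and are not kernel API.

```c
/* Toy model of the proposed -EFREEZING restart flow: the NFS layer
 * notices a pending freeze before submitting a new RPC and bails out,
 * and the syscall layer "freezes" and then restarts the call, much as
 * it does today for -ERESTARTSYS.  All names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

#define EFREEZING 600        /* hypothetical private errno */

bool freeze_pending;         /* stand-in for freezing(current) */
int rpcs_sent;

/* Step 1/ of the plan: check for a pending freeze before submitting. */
int nfs_submit_rpc(void)
{
    if (freeze_pending)
        return -EFREEZING;   /* return without sending anything */
    rpcs_sent++;
    return 0;                /* pretend the reply came back fine */
}

/* Upper syscall layer: on -EFREEZING, freeze the task, then restart. */
int do_syscall(void)
{
    for (;;) {
        int ret = nfs_submit_rpc();
        if (ret != -EFREEZING)
            return ret;
        /* here the real kernel would enter the freezer ... */
        freeze_pending = false;  /* ... and resume after thawing */
    }
}
```

As with -ERESTARTSYS, the private errno never reaches userspace; the
syscall layer consumes it, freezes, and retries transparently.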
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
* Linus Torvalds wrote:

> - the "freezer for suspend/resume on a laptop"
>
> [...]
>
> The second one is unlikely to really use NFS anyway. [...]

/me raises a hand

Incidentally I use NFS to a file server on my laptop, over wifi, and I
close the lid for the night. It's NFS mounted soft.

If it was mounted hard I wouldn't expect the lid close event to result
in a successful suspend if wifi went down - but with soft I would be
pretty sad in the morning if the batteries drained because I closed the
lid after a power outage which took down both wifi and the laptop power
supply. Closing the lid while on battery is a pretty strong command from
the user.

Arguably this is a special case on several levels, as on desktop distros
it's not at all easy to mount NFS shares (all point & click automation
goes to Samba shares), just wanted to mention it.

Thanks,

	Ingo
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Thu, 7 Mar 2013 17:16:12 +0000 "Myklebust, Trond" wrote:
> On Thu, 2013-03-07 at 09:03 -0800, Linus Torvalds wrote:
> > On Thu, Mar 7, 2013 at 8:45 AM, Myklebust, Trond wrote:
> > >
> > > The problem there is that we get into the whole 'hard' vs 'soft'
> > > mount problem. We're supposed to guarantee data integrity for
> > > 'hard' mounts, so no funny business is allowed. OTOH, 'soft'
> > > mounts time out and return EIO to the application anyway, and so
> > > shouldn't be a problem.
> > >
> > > Perhaps we could add a '-oslushy' mount option :-) that guarantees
> > > data integrity for all situations _except_ ENETDOWN/ENETUNREACH?
> >
> > I do think we are probably over-analyzing this. It's not like people
> > who want freezing to work usually use flaky NFS. There's really two
> > main groups:
> >
> > - the "freezer as a snapshot mechanism" that might use NFS because
> >   they are in a server environment.
> >
> > - the "freezer for suspend/resume on a laptop"
> >
> > The first one does use NFS, and cares about it, and probably would
> > prefer the freeze event to take longer and finish for all ongoing IO
> > operations. End result: just ditch the "freezable_schedule()"
> > entirely.
> >
> > The second one is unlikely to really use NFS anyway. End result:
> > ditching the freezable_schedule() is probably perfectly fine, even
> > if it would cause suspend failures if the network is being
> > troublesome.
> >
> > So for now, why not just replace freezable_schedule() with plain
> > schedule() in the NFS code, and ignore it until somebody actually
> > complains about it, and then aim to try to do something more
> > targeted for that particular complaint?
>
> We _have_ had complaints about the laptop suspension problem; that was
> why Jeff introduced freezable_schedule() in the first place. We've
> never had complaints about any problems involving cgroup_freeze.
> This is why our focus tends to be on the former, and why I'm more
> worried about laptop suspend regressions for any short term fixes.

Right. I don't know of anyone using the cgroup_freeze stuff. OTOH, I've
seen *many* reports of people who complained because their machine
didn't suspend because of a busy NFS mount.

In fact, the problem is still not fully "solved" with the
freezable_schedule stuff we have so far anyway. Even after this
(admittedly crappy) solution went in, we still had reports of machines
that failed to suspend because other tasks were stuck in
wait_on_page_writeback() for NFS pages.

So, in principle I think we need to abandon the current approach. The
question is what should replace it.

--
Jeff Layton
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Thursday, March 07, 2013 08:25:10 AM Linus Torvalds wrote:
> On Thu, Mar 7, 2013 at 7:59 AM, Myklebust, Trond wrote:
> >
> > It _shouldn't_ be an interruption unless the filesystem can't make
> > progress.
>
> So how can we tell? Calling "freezable_schedule()" if you're not ready
> to be frozen is not good. And nobody but the NFS code can know.
>
> You might want to introduce some counter that counts the number of
> outstanding non-interruptible events, and only call the "freezable"
> version if that counter is zero.
>
> A better alternative might be to *never* call the freezable version.
> Because those freezable_*() things are really quite disgusting, and
> are wrong - they don't actually freeze the process, they say "I don't
> care if you freeze me while I sleep", and you might actually wake up
> *while* the system is being frozen. I think the whole concept is
> broken. Rafael - comments?

Well, the only comment I have is that the source of these functions was
commit d310310c (Freezer / sunrpc / NFS: don't allow TASK_KILLABLE
sleeps to block the freezer) and they were added because people were
complaining that they couldn't suspend while their NFS servers were not
responding, IIRC.

I agree that they are not really nice.

Thanks,
Rafael

--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Thu, 2013-03-07 at 09:03 -0800, Linus Torvalds wrote:
> On Thu, Mar 7, 2013 at 8:45 AM, Myklebust, Trond wrote:
> >
> > The problem there is that we get into the whole 'hard' vs 'soft'
> > mount problem. We're supposed to guarantee data integrity for 'hard'
> > mounts, so no funny business is allowed. OTOH, 'soft' mounts time
> > out and return EIO to the application anyway, and so shouldn't be a
> > problem.
> >
> > Perhaps we could add a '-oslushy' mount option :-) that guarantees
> > data integrity for all situations _except_ ENETDOWN/ENETUNREACH?
>
> I do think we are probably over-analyzing this. It's not like people
> who want freezing to work usually use flaky NFS. There's really two
> main groups:
>
> - the "freezer as a snapshot mechanism" that might use NFS because
>   they are in a server environment.
>
> - the "freezer for suspend/resume on a laptop"
>
> The first one does use NFS, and cares about it, and probably would
> prefer the freeze event to take longer and finish for all ongoing IO
> operations. End result: just ditch the "freezable_schedule()"
> entirely.
>
> The second one is unlikely to really use NFS anyway. End result:
> ditching the freezable_schedule() is probably perfectly fine, even if
> it would cause suspend failures if the network is being troublesome.
>
> So for now, why not just replace freezable_schedule() with plain
> schedule() in the NFS code, and ignore it until somebody actually
> complains about it, and then aim to try to do something more targeted
> for that particular complaint?

We _have_ had complaints about the laptop suspension problem; that was
why Jeff introduced freezable_schedule() in the first place. We've never
had complaints about any problems involving cgroup_freeze. This is why
our focus tends to be on the former, and why I'm more worried about
laptop suspend regressions for any short term fixes.
--
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Thu, Mar 7, 2013 at 8:45 AM, Myklebust, Trond wrote:
>
> The problem there is that we get into the whole 'hard' vs 'soft' mount
> problem. We're supposed to guarantee data integrity for 'hard' mounts,
> so no funny business is allowed. OTOH, 'soft' mounts time out and
> return EIO to the application anyway, and so shouldn't be a problem.
>
> Perhaps we could add a '-oslushy' mount option :-) that guarantees
> data integrity for all situations _except_ ENETDOWN/ENETUNREACH?

I do think we are probably over-analyzing this. It's not like people who
want freezing to work usually use flaky NFS. There's really two main
groups:

 - the "freezer as a snapshot mechanism" that might use NFS because
   they are in a server environment.

 - the "freezer for suspend/resume on a laptop"

The first one does use NFS, and cares about it, and probably would
prefer the freeze event to take longer and finish for all ongoing IO
operations. End result: just ditch the "freezable_schedule()" entirely.

The second one is unlikely to really use NFS anyway. End result:
ditching the freezable_schedule() is probably perfectly fine, even if it
would cause suspend failures if the network is being troublesome.

So for now, why not just replace freezable_schedule() with plain
schedule() in the NFS code, and ignore it until somebody actually
complains about it, and then aim to try to do something more targeted
for that particular complaint?

                Linus
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Thu, 2013-03-07 at 08:25 -0800, Linus Torvalds wrote:
> On Thu, Mar 7, 2013 at 7:59 AM, Myklebust, Trond wrote:
> >
> > It _shouldn't_ be an interruption unless the filesystem can't make
> > progress.
>
> So how can we tell? Calling "freezable_schedule()" if you're not ready
> to be frozen is not good. And nobody but the NFS code can know.
>
> You might want to introduce some counter that counts the number of
> outstanding non-interruptible events, and only call the "freezable"
> version if that counter is zero.
>
> A better alternative might be to *never* call the freezable version.
> Because those freezable_*() things are really quite disgusting, and
> are wrong - they don't actually freeze the process, they say "I don't
> care if you freeze me while I sleep", and you might actually wake up
> *while* the system is being frozen. I think the whole concept is
> broken. Rafael - comments? The function really is crap, regardless of
> any unrelated NFS problems.
>
> So what NFS could do instead is actually check the "do I need to
> freeze" flag, and if you need to freeze you consider it an abort - and
> do *not* try to continue. Just freeze, and then act as if the machine
> got rebooted as far as NFS was concerned. That should work anyway, no?
>
> That does sound a lot more complex, though.

The problem there is that we get into the whole 'hard' vs 'soft' mount
problem. We're supposed to guarantee data integrity for 'hard' mounts,
so no funny business is allowed. OTOH, 'soft' mounts time out and return
EIO to the application anyway, and so shouldn't be a problem.

Perhaps we could add a '-oslushy' mount option :-) that guarantees data
integrity for all situations _except_ ENETDOWN/ENETUNREACH?
--
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
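Trond's hypothetical '-o slushy' option reads as a third retry policy
sitting between 'hard' and 'soft'. A toy sketch of that decision in
userspace C; the enum, function name, and mode values are all invented
for illustration (no such mount option exists):

```c
/* Sketch of the proposed semantics: behave like a 'hard' mount (retry
 * forever) except when the network itself is gone, in which case give
 * up so a suspend can proceed.  All identifiers are hypothetical. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum nfs_mount_mode { NFS_HARD, NFS_SOFT, NFS_SLUSHY };

bool nfs_should_retry(enum nfs_mount_mode mode, int err)
{
    switch (mode) {
    case NFS_HARD:
        return true;       /* data integrity: never give up */
    case NFS_SOFT:
        return false;      /* time out and return EIO to the app */
    case NFS_SLUSHY:
        /* hard semantics, _except_ when the network is down */
        return err != ENETDOWN && err != ENETUNREACH;
    }
    return false;
}
```

The interesting case is NFS_SLUSHY: an ordinary server timeout still
retries forever, but a dead network aborts the wait.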
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Thu, Mar 7, 2013 at 7:59 AM, Myklebust, Trond wrote:
>
> It _shouldn't_ be an interruption unless the filesystem can't make
> progress.

So how can we tell? Calling "freezable_schedule()" if you're not ready
to be frozen is not good. And nobody but the NFS code can know.

You might want to introduce some counter that counts the number of
outstanding non-interruptible events, and only call the "freezable"
version if that counter is zero.

A better alternative might be to *never* call the freezable version.
Because those freezable_*() things are really quite disgusting, and are
wrong - they don't actually freeze the process, they say "I don't care
if you freeze me while I sleep", and you might actually wake up *while*
the system is being frozen. I think the whole concept is broken. Rafael
- comments? The function really is crap, regardless of any unrelated NFS
problems.

So what NFS could do instead is actually check the "do I need to freeze"
flag, and if you need to freeze you consider it an abort - and do *not*
try to continue. Just freeze, and then act as if the machine got
rebooted as far as NFS was concerned. That should work anyway, no?

That does sound a lot more complex, though.

                Linus
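The counter suggestion above is easy to model: a count of outstanding
non-interruptible operations gates which sleep primitive gets used. A
userspace toy, with all identifiers invented for illustration (this is
not existing kernel code):

```c
/* Sketch of a counter-gated freezable sleep: only take the freezable
 * variant when no non-interruptible operation is in flight.  The
 * begin/end helpers would bracket each such operation in the NFS code. */
#include <assert.h>

int nfs_noninterruptible;   /* outstanding non-interruptible events */

void nfs_begin_noninterruptible(void) { nfs_noninterruptible++; }
void nfs_end_noninterruptible(void)   { nfs_noninterruptible--; }

/* Returns 1 if it may sleep freezably, 0 if it must use a plain sleep. */
int nfs_wait_freezable_ok(void)
{
    if (nfs_noninterruptible == 0) {
        /* freezable_schedule() would be safe: nothing in flight */
        return 1;
    }
    /* plain schedule(): a freeze mid-operation could deadlock */
    return 0;
}
```

The obvious caveat, which the thread goes on to discuss, is that picking
the right places to bracket with begin/end is exactly the hard part.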
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
Hello, Linus.

On Thu, Mar 07, 2013 at 07:55:39AM -0800, Linus Torvalds wrote:
> In other words, that suggestion not only introduces new problems (a),
> it's fundamentally broken anyway (b) *AND* it doesn't even solve
> anything, it just moves it around.

I don't think it's gonna solve the problems we're talking about but the
"moving around" part could be useful nonetheless - e.g. we don't want to
shut down network before nfs while suspending.

Thanks.

--
tejun
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Thu, 2013-03-07 at 07:55 -0800, Linus Torvalds wrote:
> On Thu, Mar 7, 2013 at 3:41 AM, Jeff Layton wrote:
> >
> > I think Trond may be on the right track. We probably need some
> > mechanism to quiesce the filesystem ahead of any sort of freezer
> > event.
>
> No, guys. That cannot work. It's a completely moronic idea. Let me
> count the ways:
>
> (a) it's just another form of saying "lock". But since other things
> are (by definition) going on when it happens, it will just cause
> deadlocks.
>
> (b) the freeze event might not even be system-global. So *some*
> processes (a cgroup) might freeze, others would not. You can't shut
> off the filesystem just because some processes might freeze.

That's the whole bloody problem in a nutshell. We only want to freeze
the filesystem when the network goes down, and in that case we want time
to clean up first. That's why it needs to be initiated by something like
NetworkManager _before_ the network is shut down.

> (c) it just moves the same issue somewhere else. If you have some
> operation that must be done under the lock, then such an operation
> must be completed before you've quiesced the filesystem, which is your
> whole point of that "quiesce" event. BUT THAT'S THE EXACT SAME ISSUE
> AS NOT ALLOWING THE FREEZE TO HAPPEN DURING THAT TIME.
>
> In other words, that suggestion not only introduces new problems (a),
> it's fundamentally broken anyway (b) *AND* it doesn't even solve
> anything, it just moves it around.
>
> The solution is damn simple: if you're in some kind of "atomic
> region", then you cannot freeze. Seriously. SO DON'T CALL
> "freezable_schedule()", FOR CHRISSAKE! You clearly aren't freezable!
>
> Which is exactly what the new lockdep warning was all about. Don't try
> to move the problem around, when it's quite clear where the problem
> is. If you need to do something uninterruptible, you do not say "oh,
> I'm freezable". Because freezing is by definition an interruption.
> Seriously, it's that simple.
It _shouldn't_ be an interruption unless the filesystem can't make
progress.

--
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Thu, Mar 7, 2013 at 3:41 AM, Jeff Layton wrote:
>
> I think Trond may be on the right track. We probably need some
> mechanism to quiesce the filesystem ahead of any sort of freezer
> event.

No, guys. That cannot work. It's a completely moronic idea. Let me count
the ways:

 (a) it's just another form of saying "lock". But since other things are
(by definition) going on when it happens, it will just cause deadlocks.

 (b) the freeze event might not even be system-global. So *some*
processes (a cgroup) might freeze, others would not. You can't shut off
the filesystem just because some processes might freeze.

 (c) it just moves the same issue somewhere else. If you have some
operation that must be done under the lock, then such an operation must
be completed before you've quiesced the filesystem, which is your whole
point of that "quiesce" event. BUT THAT'S THE EXACT SAME ISSUE AS NOT
ALLOWING THE FREEZE TO HAPPEN DURING THAT TIME.

In other words, that suggestion not only introduces new problems (a),
it's fundamentally broken anyway (b) *AND* it doesn't even solve
anything, it just moves it around.

The solution is damn simple: if you're in some kind of "atomic region",
then you cannot freeze. Seriously. SO DON'T CALL "freezable_schedule()",
FOR CHRISSAKE! You clearly aren't freezable!

Which is exactly what the new lockdep warning was all about. Don't try
to move the problem around, when it's quite clear where the problem is.
If you need to do something uninterruptible, you do not say "oh, I'm
freezable". Because freezing is by definition an interruption.
Seriously, it's that simple.

                Linus
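The "atomic region" rule above is essentially what the new lockdep check
enforces: no freezable sleep while locks are held. A userspace toy model
of that guard; the lock counter and function names are invented for
illustration (the real check lives in lockdep, not in code like this):

```c
/* Toy model of "you cannot freeze inside an atomic region": track held
 * locks and refuse the freezable sleep while any are outstanding --
 * the situation that produces the "still has locks held!" warning. */
#include <assert.h>

int locks_held;   /* crude stand-in for current->lockdep_depth */

void toy_lock(void)   { locks_held++; }
void toy_unlock(void) { locks_held--; }

/* Returns 1 if sleeping freezably is permitted, 0 if doing so would be
 * exactly the bug the lockdep warning complains about. */
int may_freezable_schedule(void)
{
    return locks_held == 0;
}
```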
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
Hello, Jeff.

On Thu, Mar 07, 2013 at 06:41:40AM -0500, Jeff Layton wrote:
> Suppose I call unlink("somefile"); on an NFS mount. We take all of the
> VFS locks, go down into the NFS layer. That marshals up the UNLINK
> call, sends it off to the server, and waits for the reply. While we're
> waiting, a freeze event comes in and we start returning from the
> kernel with our new -EFREEZE return code that works sort of like
> -ERESTARTSYS. Meanwhile, the server is processing the UNLINK call and
> removes the file. A little while later we wake up the machine and it
> goes to try and pick up where it left off.

But you can't freeze regardless of the freezing mechanism in such cases,
right? The current code which allows freezing while such operations are
in progress is broken as it can lead to freezer deadlocks. They should
go away no matter how we implement the freezer, so the question is not
whether we can move all the existing freezing points to the signal
mechanism but, after removing the deadlock-prone ones, how many would be
difficult to convert. I'm fully speculating, but my suspicion is not too
many if you remove (or update somehow) the ones which are being done
with some locks held.

> The catch here is that it's quite possible that when we need to
> quiesce that we've lost communications with the server. We don't want
> to hold up the freezer at that point so the wait for replies has to be
> bounded in time somehow. If that times out, we probably just have to
> return all calls with our new -EFREEZE return and hope for the best
> when the machine wakes back up.

Sure, a separate prep step may be helpful, but assuming a user
nfs-mounting stuff on a laptop, I'm not sure how reliable that can be
made. People move around with laptops, wifi can be iffy and the lid can
be shut at any moment. I don't think it's possible for nfs to be laptop
friendly while staying completely correct.
Designing such a network filesystem probably is possible with
transactions and whatnot, but AFAIU nfs isn't designed that way. If such
a use case is something nfs wants to support, I think it just should
make do with some middle ground - ie. just implement a mount switch
which says "retry operations across network / power problems" and
explain the implications.

Thanks.

--
tejun
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Wed, 6 Mar 2013 13:36:36 -0800 Tejun Heo wrote:
> On Wed, Mar 06, 2013 at 01:31:10PM -0800, Linus Torvalds wrote:
> > So I do agree that we probably have *too* many of the stupid "let's
> > check if we can freeze", and I suspect that the NFS code should get
> > rid of the "freezable_schedule()" that is causing this warning
> > (because I also agree that you should *not* freeze while holding
> > locks, because it really can cause deadlocks), but I do suspect that
> > network filesystems do need to have a few places where they check for
> > freezing on their own... Exactly because freezing isn't *quite* like a
> > signal.
>
> Well, I don't really know much about nfs so I can't really tell, but
> for most other cases, dealing with freezing like a signal should work
> fine from what I've seen although I can't be sure before actually
> trying. Trond, Bruce, can you guys please chime in?
>
> Thanks.

(hopefully this isn't tl;dr)

It's not quite that simple... The problem (as Trond already mentioned)
is non-idempotent operations. You can't just restart certain operations
from scratch once you reach a certain point. Here's an example:

Suppose I call unlink("somefile"); on an NFS mount. We take all of the
VFS locks, go down into the NFS layer. That marshals up the UNLINK call,
sends it off to the server, and waits for the reply. While we're
waiting, a freeze event comes in and we start returning from the kernel
with our new -EFREEZE return code that works sort of like -ERESTARTSYS.
Meanwhile, the server is processing the UNLINK call and removes the
file. A little while later we wake up the machine and it goes to try and
pick up where it left off.

What do we do now? Suppose we pretend we never sent the call in the
first place, marshal up a new RPC and send it again. This is problematic
-- the server will probably send back the equivalent of ENOENT.
How do we know whether the file never existed in the first place, or whether the server processed the original call and removed the file then? Do we instead try and keep track of whether the RPC has been sent and just wait for the reply on the original call? That's tricky too -- it means adding an extra codepath to check for these sorts of restarts in a bunch of different ops vectors into the filesystem. We also have to somehow keep track of this state too (I guess by hanging something off the task_struct). Note too that the above is the simple case. We're dropping the parent's i_mutex during the freeze. Suppose when we restart the call that the parent directory has changed in such a way that the original lookup we did to do the original RPC is no longer valid? I think Trond may be on the right track. We probably need some mechanism to quiesce the filesystem ahead of any sort of freezer event. That quiesce could simply wait on any in flight RPCs to come back, and not allow any new ones to go out. On syscalls where the RPC didn't go out, we'd just return -EFREEZE or whatever and let the upper layers restart the call after waking back up. Writeback would be tricky, but that can be handled too. The catch here is that it's quite possible that when we need to quiesce that we've lost communications with the server. We don't want to hold up the freezer at that point so the wait for replies has to be bounded in time somehow. If that times out, we probably just have to return all calls with our new -EFREEZE return and hope for the best when the machine wakes back up. -- Jeff Layton -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
Hello, Jeff.

On Thu, Mar 07, 2013 at 06:41:40AM -0500, Jeff Layton wrote:
> Suppose I call unlink("somefile"); on an NFS mount. We take all of the VFS locks, go down into the NFS layer. That marshals up the UNLINK call, sends it off to the server, and waits for the reply. While we're waiting, a freeze event comes in and we start returning from the kernel with our new -EFREEZE return code that works sort of like -ERESTARTSYS. Meanwhile, the server is processing the UNLINK call and removes the file. A little while later we wake up the machine and it goes to try and pick up where it left off.

But you can't freeze regardless of the freezing mechanism in such cases, right? The current code, which allows freezing while such operations are in progress, is broken, as it can lead to freezer deadlocks. Those freeze points have to go away no matter how we implement the freezer, so the question is not whether we can move all the existing freezing points to the signal mechanism, but how many would be difficult to convert after removing the deadlock-prone ones. I'm fully speculating, but my suspicion is not too many once you remove (or somehow update) the ones which are being done with locks held.

> The catch here is that it's quite possible that when we need to quiesce that we've lost communications with the server. We don't want to hold up the freezer at that point so the wait for replies has to be bounded in time somehow. If that times out, we probably just have to return all calls with our new -EFREEZE return and hope for the best when the machine wakes back up.

Sure, a separate prep step may be helpful, but assuming a user nfs-mounting stuff on a laptop, I'm not sure how reliable that can be made. People move around with laptops, wifi can be iffy, and the lid can be shut at any moment. I don't think it's possible for nfs to be laptop-friendly while staying completely correct. Designing such a network filesystem probably is possible with transactions and whatnot, but AFAIU nfs isn't designed that way.

If such a use case is something nfs wants to support, I think it just has to make do with some middle ground - i.e. implement a mount switch which says "retry operations across network / power problems" and explain the implications.

Thanks.

-- 
tejun
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Thu, Mar 7, 2013 at 3:41 AM, Jeff Layton wrote:
> I think Trond may be on the right track. We probably need some mechanism to quiesce the filesystem ahead of any sort of freezer event.

No, guys. That cannot work. It's a completely moronic idea. Let me count the ways:

(a) it's just another form of saying "lock". But since other things are (by definition) going on when it happens, it will just cause deadlocks.

(b) the freeze event might not even be system-global. So *some* processes (a cgroup) might freeze, others would not. You can't shut off the filesystem just because some processes might freeze.

(c) it just moves the same issue somewhere else. If you have some operation that must be done under the lock, then such an operation must be completed before you've quiesced the filesystem, which is your whole point of that quiesce event. BUT THAT'S THE EXACT SAME ISSUE AS NOT ALLOWING THE FREEZE TO HAPPEN DURING THAT TIME.

In other words, that suggestion not only introduces new problems (a), it's fundamentally broken anyway (b) *AND* it doesn't even solve anything, it just moves it around.

The solution is damn simple: if you're in some kind of atomic region, then you cannot freeze. Seriously. SO DON'T CALL freezable_schedule(), FOR CHRISSAKE! You clearly aren't freezable! Which is exactly what the new lockdep warning was all about.

Don't try to move the problem around, when it's quite clear where the problem is. If you need to do something uninterruptible, you do not say "oh, I'm freezable". Because freezing is by definition an interruption.

Seriously, it's that simple.

Linus
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Thu, 2013-03-07 at 07:55 -0800, Linus Torvalds wrote:
> On Thu, Mar 7, 2013 at 3:41 AM, Jeff Layton wrote:
> > I think Trond may be on the right track. We probably need some mechanism to quiesce the filesystem ahead of any sort of freezer event.
>
> No, guys. That cannot work. It's a completely moronic idea. Let me count the ways:
>
> (a) it's just another form of saying "lock". But since other things are (by definition) going on when it happens, it will just cause deadlocks.
>
> (b) the freeze event might not even be system-global. So *some* processes (a cgroup) might freeze, others would not. You can't shut off the filesystem just because some processes might freeze.

That's the whole bloody problem in a nutshell. We only want to freeze the filesystem when the network goes down, and in that case we want time to clean up first. That's why it needs to be initiated by something like NetworkManager _before_ the network is shut down.

> (c) it just moves the same issue somewhere else. If you have some operation that must be done under the lock, then such an operation must be completed before you've quiesced the filesystem, which is your whole point of that quiesce event. BUT THAT'S THE EXACT SAME ISSUE AS NOT ALLOWING THE FREEZE TO HAPPEN DURING THAT TIME.
>
> In other words, that suggestion not only introduces new problems (a), it's fundamentally broken anyway (b) *AND* it doesn't even solve anything, it just moves it around.
>
> The solution is damn simple: if you're in some kind of atomic region, then you cannot freeze. Seriously. SO DON'T CALL freezable_schedule(), FOR CHRISSAKE! You clearly aren't freezable! Which is exactly what the new lockdep warning was all about.
>
> Don't try to move the problem around, when it's quite clear where the problem is. If you need to do something uninterruptible, you do not say "oh, I'm freezable". Because freezing is by definition an interruption.
>
> Seriously, it's that simple.

It _shouldn't_ be an interruption unless the filesystem can't make progress.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
Hello, Linus.

On Thu, Mar 07, 2013 at 07:55:39AM -0800, Linus Torvalds wrote:
> In other words, that suggestion not only introduces new problems (a), it's fundamentally broken anyway (b) *AND* it doesn't even solve anything, it just moves it around.

I don't think it's gonna solve the problems we're talking about, but the moving-around part could be useful nonetheless - e.g. we don't want to shut down the network before nfs while suspending.

Thanks.

-- 
tejun
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Thu, Mar 7, 2013 at 7:59 AM, Myklebust, Trond wrote:
> It _shouldn't_ be an interruption unless the filesystem can't make progress.

So how can we tell? Calling freezable_schedule() if you're not ready to be frozen is not good. And nobody but the NFS code can know. You might want to introduce some counter that counts the number of outstanding non-interruptible events, and only call the freezable version if that counter is zero.

A better alternative might be to *never* call the freezable version. Because those freezable_*() things are really quite disgusting, and are wrong - they don't actually freeze the process, they say "I don't care if you freeze me while I sleep", and you might actually wake up *while* the system is being frozen. I think the whole concept is broken. Rafael - comments? The function really is crap, regardless of any unrelated NFS problems.

So what NFS could do instead is actually check the "do I need to freeze" flag, and if you need to freeze, consider it an abort - and do *not* try to continue. Just freeze, and then act as if the machine got rebooted as far as NFS was concerned. That should work anyway, no? That does sound a lot more complex, though.

Linus
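A minimal sketch of the abort-on-freeze pattern Linus describes (editor's pseudocode, not actual NFS code; freezing() is the real helper from include/linux/freezer.h, while rpc_reply_arrived() and -EFREEZE are hypothetical names from earlier in this thread):

```c
/* Pseudocode sketch only. Instead of freezable_schedule(), treat a
 * pending freeze the way a fatal signal is treated: abort the wait
 * so the caller unwinds, drops its locks, and returns an error. */
for (;;) {
        if (rpc_reply_arrived(task))    /* hypothetical helper */
                break;
        if (freezing(current))
                return -EFREEZE;        /* hypothetical errno; caller
                                           freezes only after unwinding */
        schedule();                     /* plain schedule(), no
                                           freezable_schedule() */
}
```

The key property is that the task never sleeps in a state where the freezer can claim it while it still holds locks; freezing happens only after the error has propagated back up and the locks are released.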
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Thu, 2013-03-07 at 08:25 -0800, Linus Torvalds wrote:
> On Thu, Mar 7, 2013 at 7:59 AM, Myklebust, Trond wrote:
> > It _shouldn't_ be an interruption unless the filesystem can't make progress.
>
> So how can we tell? Calling freezable_schedule() if you're not ready to be frozen is not good. And nobody but the NFS code can know. You might want to introduce some counter that counts the number of outstanding non-interruptible events, and only call the freezable version if that counter is zero.
>
> A better alternative might be to *never* call the freezable version. Because those freezable_*() things are really quite disgusting, and are wrong - they don't actually freeze the process, they say "I don't care if you freeze me while I sleep", and you might actually wake up *while* the system is being frozen. I think the whole concept is broken. Rafael - comments? The function really is crap, regardless of any unrelated NFS problems.
>
> So what NFS could do instead is actually check the "do I need to freeze" flag, and if you need to freeze, consider it an abort - and do *not* try to continue. Just freeze, and then act as if the machine got rebooted as far as NFS was concerned. That should work anyway, no? That does sound a lot more complex, though.

The problem there is that we get into the whole 'hard' vs 'soft' mount problem. We're supposed to guarantee data integrity for 'hard' mounts, so no funny business is allowed. OTOH, 'soft' mounts time out and return EIO to the application anyway, and so shouldn't be a problem.

Perhaps we could add a '-oslushy' mount option :-) that guarantees data integrity in all situations _except_ ENETDOWN/ENETUNREACH?

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Thu, Mar 7, 2013 at 8:45 AM, Myklebust, Trond wrote:
> The problem there is that we get into the whole 'hard' vs 'soft' mount problem. We're supposed to guarantee data integrity for 'hard' mounts, so no funny business is allowed. OTOH, 'soft' mounts time out and return EIO to the application anyway, and so shouldn't be a problem.
>
> Perhaps we could add a '-oslushy' mount option :-) that guarantees data integrity for all situations _except_ ENETDOWN/ENETUNREACH?

I do think we are probably over-analyzing this. It's not like people who want freezing to work usually use flaky NFS. There's really two main groups:

 - the freezer as a snapshot mechanism, used by people who might use NFS because they are in a server environment

 - the freezer for suspend/resume on a laptop

The first one does use NFS, and cares about it, and probably would prefer the freeze event to take longer and finish all ongoing IO operations. End result: just ditch the freezable_schedule() entirely.

The second one is unlikely to really use NFS anyway. End result: ditching the freezable_schedule() is probably perfectly fine, even if it would cause suspend failures if the network is being troublesome.

So for now, why not just replace freezable_schedule() with plain schedule() in the NFS code, ignore it until somebody actually complains about it, and then aim to do something more targeted for that particular complaint?

Linus
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Thu, 2013-03-07 at 09:03 -0800, Linus Torvalds wrote:
> On Thu, Mar 7, 2013 at 8:45 AM, Myklebust, Trond wrote:
> > The problem there is that we get into the whole 'hard' vs 'soft' mount problem. We're supposed to guarantee data integrity for 'hard' mounts, so no funny business is allowed. OTOH, 'soft' mounts time out and return EIO to the application anyway, and so shouldn't be a problem. Perhaps we could add a '-oslushy' mount option :-) that guarantees data integrity for all situations _except_ ENETDOWN/ENETUNREACH?
>
> I do think we are probably over-analyzing this. It's not like people who want freezing to work usually use flaky NFS. There's really two main groups:
>
>  - the freezer as a snapshot mechanism that might use NFS because they are in a server environment
>
>  - the freezer for suspend/resume on a laptop
>
> The first one does use NFS, and cares about it, and probably would prefer the freeze event to take longer and finish all ongoing IO operations. End result: just ditch the freezable_schedule() entirely.
>
> The second one is unlikely to really use NFS anyway. End result: ditching the freezable_schedule() is probably perfectly fine, even if it would cause suspend failures if the network is being troublesome.
>
> So for now, why not just replace freezable_schedule() with plain schedule() in the NFS code, and ignore it until somebody actually complains about it, and then aim to try to do something more targeted for that particular complaint?

We _have_ had complaints about the laptop suspension problem; that was why Jeff introduced freezable_schedule() in the first place. We've never had complaints about any problems involving cgroup_freeze. This is why our focus tends to be on the former, and why I'm more worried about laptop suspend regressions from any short-term fixes.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Thursday, March 07, 2013 08:25:10 AM Linus Torvalds wrote:
> On Thu, Mar 7, 2013 at 7:59 AM, Myklebust, Trond wrote:
> > It _shouldn't_ be an interruption unless the filesystem can't make progress.
>
> So how can we tell? Calling freezable_schedule() if you're not ready to be frozen is not good. And nobody but the NFS code can know. You might want to introduce some counter that counts the number of outstanding non-interruptible events, and only call the freezable version if that counter is zero.
>
> A better alternative might be to *never* call the freezable version. Because those freezable_*() things are really quite disgusting, and are wrong - they don't actually freeze the process, they say "I don't care if you freeze me while I sleep", and you might actually wake up *while* the system is being frozen. I think the whole concept is broken. Rafael - comments?

Well, the only comment I have is that the source of these functions was commit d310310c ("Freezer / sunrpc / NFS: don't allow TASK_KILLABLE sleeps to block the freezer") and they were added because people were complaining that they couldn't suspend while their NFS servers were not responding, IIRC. I agree that they are not really nice.

Thanks,
Rafael

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
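For context, the helpers added by that commit look roughly like this in the v3.9-era tree (abridged by the editor from include/linux/freezer.h; check the actual source for the exact form):

```c
/* Abridged sketch of include/linux/freezer.h circa v3.9. The sleep
 * itself is skipped by the freezer (PF_FREEZER_SKIP), and the task
 * only actually freezes in freezer_count() after it wakes up. */
static inline void freezer_do_not_count(void)
{
	current->flags |= PF_FREEZER_SKIP;
}

static inline void freezer_count(void)
{
	current->flags &= ~PF_FREEZER_SKIP;
	try_to_freeze();
}

#define freezable_schedule()						\
({									\
	freezer_do_not_count();						\
	schedule();							\
	freezer_count();						\
})
```

This is exactly the behavior Linus objects to above: the freezer proceeds as if the sleeping task were already frozen, even though it may still hold locks and may wake while the system is freezing.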
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Thu, 7 Mar 2013 17:16:12 +0000, Myklebust, Trond wrote:
> On Thu, 2013-03-07 at 09:03 -0800, Linus Torvalds wrote:
> > On Thu, Mar 7, 2013 at 8:45 AM, Myklebust, Trond wrote:
> > > The problem there is that we get into the whole 'hard' vs 'soft' mount problem. We're supposed to guarantee data integrity for 'hard' mounts, so no funny business is allowed. OTOH, 'soft' mounts time out and return EIO to the application anyway, and so shouldn't be a problem. Perhaps we could add a '-oslushy' mount option :-) that guarantees data integrity for all situations _except_ ENETDOWN/ENETUNREACH?
> >
> > I do think we are probably over-analyzing this. It's not like people who want freezing to work usually use flaky NFS. There's really two main groups:
> >
> >  - the freezer as a snapshot mechanism that might use NFS because they are in a server environment
> >
> >  - the freezer for suspend/resume on a laptop
> >
> > The first one does use NFS, and cares about it, and probably would prefer the freeze event to take longer and finish all ongoing IO operations. End result: just ditch the freezable_schedule() entirely.
> >
> > The second one is unlikely to really use NFS anyway. End result: ditching the freezable_schedule() is probably perfectly fine, even if it would cause suspend failures if the network is being troublesome.
> >
> > So for now, why not just replace freezable_schedule() with plain schedule() in the NFS code, and ignore it until somebody actually complains about it, and then aim to try to do something more targeted for that particular complaint?
>
> We _have_ had complaints about the laptop suspension problem; that was why Jeff introduced freezable_schedule() in the first place. We've never had complaints about any problems involving cgroup_freeze. This is why our focus tends to be on the former, and why I'm more worried about laptop suspend regressions from any short-term fixes.

Right. I don't know of anyone using the cgroup_freeze stuff. OTOH, I've seen *many* reports from people who complained because their machine didn't suspend because of a busy NFS mount.

In fact, the problem is still not fully solved with the freezable_schedule stuff we have so far anyway. Even after this (admittedly crappy) solution went in, we still had reports of machines that failed to suspend because other tasks were stuck in wait_on_page_writeback() for NFS pages.

So, in principle I think we need to abandon the current approach. The question is what should replace it.

-- 
Jeff Layton
jlay...@redhat.com
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Wed, Mar 06, 2013 at 01:36:36PM -0800, Tejun Heo wrote:
> On Wed, Mar 06, 2013 at 01:31:10PM -0800, Linus Torvalds wrote:
> > So I do agree that we probably have *too* many of the stupid "let's check if we can freeze", and I suspect that the NFS code should get rid of the "freezable_schedule()" that is causing this warning (because I also agree that you should *not* freeze while holding locks, because it really can cause deadlocks), but I do suspect that network filesystems do need to have a few places where they check for freezing on their own... Exactly because freezing isn't *quite* like a signal.
>
> Well, I don't really know much about nfs so I can't really tell, but for most other cases, dealing with freezing like a signal should work fine from what I've seen although I can't be sure before actually trying. Trond, Bruce, can you guys please chime in?

So, I think the question here would be: in nfs, how many of the current freezer check points would be difficult to convert to the signal handling model, after excluding the ones which are performed while holding locks and which we need to get rid of anyway?

Thanks.

-- 
tejun
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Wed, Mar 06, 2013 at 01:31:10PM -0800, Linus Torvalds wrote:
> So I do agree that we probably have *too* many of the stupid "let's check if we can freeze", and I suspect that the NFS code should get rid of the "freezable_schedule()" that is causing this warning (because I also agree that you should *not* freeze while holding locks, because it really can cause deadlocks), but I do suspect that network filesystems do need to have a few places where they check for freezing on their own... Exactly because freezing isn't *quite* like a signal.

Well, I don't really know much about nfs so I can't really tell, but for most other cases, dealing with freezing like a signal should work fine from what I've seen, although I can't be sure before actually trying. Trond, Bruce, can you guys please chime in?

Thanks.

-- 
tejun
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Wed, Mar 6, 2013 at 1:24 PM, Tejun Heo wrote:
>
> With syscall paths out of the way, the surface is reduced a lot.

So the issue is syscalls that don't react to signals, and that can potentially wait a long time. Like NFS with a network hiccup. Which is not at all unlikely. Think wireless network, somebody trying to get on a network share, things not working, and closing the damn lid because you give up.

So I do agree that we probably have *too* many of the stupid "let's check if we can freeze", and I suspect that the NFS code should get rid of the "freezable_schedule()" that is causing this warning (because I also agree that you should *not* freeze while holding locks, because it really can cause deadlocks), but I do suspect that network filesystems do need to have a few places where they check for freezing on their own... Exactly because freezing isn't *quite* like a signal.

Linus
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
Hello, Linus.

On Wed, Mar 06, 2013 at 01:00:02PM -0800, Linus Torvalds wrote:
> > Oh yeah, we don't need another signal. We just need sigpending state and a wakeup. I wasn't really going into details. The important point is that for code paths outside signal/ptrace, freezing could look and behave about the same as signal delivery.
>
> Don't we already do that? The whole "try_to_freeze()" in get_signal_to_deliver() is about exactly this. See fake_signal_wake_up().

Yeap, that was what I had in mind too. Maybe we'll need to modify it slightly but we already have most of the basic stuff.

> You still have kernel threads (that don't do signals) to worry about, so it doesn't make things go away. And you still have issues with latency of disk wait, which is, I think, the reason for that "freezable_schedule()" in the NFS code to begin with.

I haven't thought about it for quite some time so things are hazy, but here's what I can recall now. With syscall paths out of the way, the surface is reduced a lot. Another part is converting most freezable kthread users to freezable workqueues, which provide natural resource boundaries (the duration of work item execution). kthreads are already difficult to get the synchronization completely right for, and a significant number of freezable + should_stop users were subtly broken the last time I went over the freezer users. I think we would be much better off converting most of them over to freezable workqueues, which are easier to get right and likely to be less expensive. Freezing happens at the work item boundary, which in most cases could be made to coincide with the original freezer check point.

There could be kthreads which can't be converted to workqueues for whatever reason (there shouldn't be many at this point), but most freezer usages in kthreads are pretty simple. It's usually a single freezer check point, or a couple, in the main loop. While we may still need special handling for them, I don't think they're likely to have implications on issues like this. We probably would want to handle restart for freezable kthreads calling syscalls. Haven't thought about this one too much yet. Maybe freezable kthreads doing syscalls just need to be ready for -ERESTARTSYS?

I'm not sure I follow the disk wait latency part. Are you saying that switching to a jobctl-trap-based freezer implementation wouldn't help there? If so, right, it doesn't in itself. It's just changing the infrastructure used for freezing and can't make the underlying synchronization issues disappear, but at least it becomes the same problem as being responsive to SIGKILL rather than a completely separate problem.

Thanks.

-- 
tejun
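A sketch of the kind of kthread-to-workqueue conversion Tejun describes (editor's example, not from the thread; alloc_workqueue(), WQ_FREEZABLE, and the delayed-work API are real kernel interfaces, while my_wq, my_work_fn(), and do_periodic_work() are hypothetical names):

```c
/* Sketch only: a freezable kthread loop rewritten as delayed work on
 * a WQ_FREEZABLE workqueue. A freezable workqueue stops executing new
 * work items while the system freezes, so the work function needs no
 * explicit try_to_freeze() check points -- the freeze point falls
 * naturally between work item executions. */
static struct workqueue_struct *my_wq;
static struct delayed_work my_work;

static void my_work_fn(struct work_struct *work)
{
	do_periodic_work();	/* hypothetical stand-in for the old
				   kthread's main-loop body */
	/* implicit freeze point here, between executions */
	queue_delayed_work(my_wq, &my_work, HZ);
}

static int __init my_init(void)
{
	my_wq = alloc_workqueue("my_wq", WQ_FREEZABLE, 0);
	if (!my_wq)
		return -ENOMEM;
	INIT_DELAYED_WORK(&my_work, my_work_fn);
	queue_delayed_work(my_wq, &my_work, HZ);
	return 0;
}
```

Compared with a kthread that must interleave kthread_should_stop() and try_to_freeze() checks correctly, the work item's boundaries give the freezer (and shutdown) a well-defined synchronization point for free.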
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Wed, Mar 6, 2013 at 10:53 AM, Tejun Heo wrote: > Hello, Oleg. > > On Wed, Mar 06, 2013 at 07:16:08PM +0100, Oleg Nesterov wrote: >> And how SIGFREEZE can help? If we want to interrupt the sleeps in NFS/RPC >> layer we can simply add TASK_WAKEFREEZE (can be used with TASK_KILLABLE) >> and change freeze_task() to do signal_wake_up_state(TASK_WAKEFREEZE). > > Oh yeah, we don't need another signal. We just need sigpending state > and a wakeup. I wasn't really going into details. The important > point is that for code paths outside signal/ptrace, freezing could > look and behave about the same as signal delivery. Don't we already do that? The whole "try_to_freeze()" in get_signal_to_deliver() is about exactly this. See fake_signal_wake_up(). You still have kernel threads (that don't do signals) to worry about, so it doesn't make things go away. And you still have issues with latency of disk wait, which is, I think, the reason for that "freezable_schedule()" in the NFS code to begin with. Linus
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Wed, Mar 6, 2013 at 10:37 AM, Myklebust, Trond wrote: > On Wed, 2013-03-06 at 13:23 -0500, Jeff Layton wrote: >> On Wed, 6 Mar 2013 07:59:01 -0800 >> Mandeep Singh Baines wrote: >> > In general, holding a lock and freezing can cause a deadlock if: >> > >> > 1) you froze via the cgroup_freezer subsystem and a task in another >> > cgroup tried to acquire the same lock >> > 2) the lock was needed later in suspend/hibernate. For example, if the >> > lock was needed in dpm_suspend by one of the device callbacks. For >> > hibernate, you also need to worry about any locks that need to be >> > acquired in order to write to the swap device. >> > 3) another freezing task blocked on this lock and held other locks >> > needed later in suspend. If that task were skipped by the freezer, you >> > would deadlock >> > >> > You will block/prevent suspend if: >> > >> > 4) another freezing task blocked on this lock and was unable to freeze >> > >> > I think 1) and 4) can happen for the NFS/RPC case. Case 1) requires >> > cgroup freezer. Case 4) while not causing a deadlock could prevent >> > your laptop/phone from sleeping and end up burning all your battery. >> > If suspend is initiated via lid close you won't even know about the >> > failure. >> > >> >> We're aware of #4. That was the intent of adding try_to_freeze() into >> this codepath in the first place. It's not a great solution for obvious >> reasons, but we don't have another at the moment. >> >> For #1 I'm not sure what to do. I'm not that familiar with cgroups or how >> the freezer works. >> >> The bottom line is that we have a choice -- we can either rip out this >> new lockdep warning, or rip out the code that causes it to fire. >> >> If we rip out the warning we may miss some legit cases where we might >> possibly have hit a deadlock. >> >> If we rip out the code that causes it to fire, then we exacerbate the >> #4 problem above. 
That will effectively make it so that you can't >> suspend the host whenever NFS is doing anything moderately active. > > #4 is probably the only case where we might want to freeze. > > Unless we're in a situation where the network is going down, we can > usually make progress with completing the RPC call and finishing > the system call. So in the case of cgroup_freezer, we only care if the > freezing cgroup also owns the network device (or net namespace) that NFS > is using to talk to the server. > > As I said, the alternative is to notify NFS that the device is going > down, and to give it a chance to quiesce itself before that happens. > This is also the only way to ensure that processes which own locks on > the server (e.g. posix file locks) have a chance to release them before > being suspended. > So if I'm understanding correctly, the locks are held for a bounded time, so under normal circumstances you don't really need to freeze here. But if the task that is going to wake you were to freeze, then this code path could block suspend if PF_FREEZER_SKIP were not set. For this particular case (not in general), instead of calling try_to_freeze(), what if you only cleared PF_FREEZER_SKIP? For case #1, you wouldn't block suspend if you're stuck waiting. If you received the event while processes were freezing, you'd just try_to_freeze() some place else, probably on the way back up from the system call. For case #4, you'd eventually complete the RPC and then freeze some place else. If you're stuck waiting for the bit because the process that was going to wake you also got frozen, then the container would be stuck in the THAWING state. Not catastrophic, and something the administrator could work around by sequencing things right. Does this sound like a reasonable workaround? 
Regards, Mandeep > > -- > Trond Myklebust > Linux NFS client maintainer > > NetApp > trond.mykleb...@netapp.com > www.netapp.com
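[Editor's note] For context on Mandeep's PF_FREEZER_SKIP suggestion: it is relative to what freezable_schedule() does today. Roughly, paraphrasing the 3.9-era include/linux/freezer.h from memory (consult the actual tree for exact details), the sketch looks like:

```c
/* PF_FREEZER_SKIP tells the freezer not to count this task while it
 * sleeps; freezer_count() drops the flag and then immediately offers
 * the task to the freezer on wakeup. */
static inline void freezer_do_not_count(void)
{
	current->flags |= PF_FREEZER_SKIP;
}

static inline void freezer_count(void)
{
	current->flags &= ~PF_FREEZER_SKIP;
	try_to_freeze();
}

#define freezable_schedule()		\
({					\
	freezer_do_not_count();		\
	schedule();			\
	freezer_count();		\
})
```

Mandeep's workaround amounts to clearing PF_FREEZER_SKIP on wakeup but skipping the try_to_freeze() call while locks are still held, deferring the actual freeze to a later point on the way out of the syscall.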
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
Hello, Oleg. On Wed, Mar 06, 2013 at 07:16:08PM +0100, Oleg Nesterov wrote: > And how SIGFREEZE can help? If we want to interrupt the sleeps in NFS/RPC > layer we can simply add TASK_WAKEFREEZE (can be used with TASK_KILLABLE) > and change freeze_task() to do signal_wake_up_state(TASK_WAKEFREEZE). Oh yeah, we don't need another signal. We just need sigpending state and a wakeup. I wasn't really going into details. The important point is that for code paths outside signal/ptrace, freezing could look and behave about the same as signal delivery. > But if we can do this, then it should be possible to simply make these > sleeps TASK_INTERRUPTIBLE? But it seems that we can't just because we > can't always restart, so I am totally confused. Of course, switching to another freezing mechanism doesn't make issues like this go away in itself, but it would at least allow handling such issues in an easier-to-grasp way rather than dealing with this giant pseudo lock, and allows code paths outside jobctl to simply deal with one issue - signal delivery - rather than having to deal with freezing separately. Also, while not completely identical, frozen state would at least be an extension of jobctl trap and it would be possible to allow things like killing frozen tasks or even ptracing them in a well-defined manner for cgroup_freezer use cases. As it currently stands, there is no well-defined abstraction for frozen state which we can expose in any way, which sucks. Thanks. -- tejun
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
Hello, On Wed, Mar 06, 2013 at 01:40:04PM -0500, Jeff Layton wrote: > Though when I said that, it was before Tejun mentioned hooking this up > to ptrace. I'll confess that I don't fully understand what he's > proposing either though... Oh, they're all just pretty closely related. All signal and ptrace traps run off get_signal_to_deliver(). It is still a bit hairy but in much better shape after cleanups which accompanied PTRACE_SEIZE and I don't really think it'll be too difficult to extend it so that freezing is implemented as a type of jobctl trap. Thanks. -- tejun
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Wed, 6 Mar 2013 19:17:35 +0100 Oleg Nesterov wrote: > On 03/05, Jeff Layton wrote: > > > > Anyone up for working out how to handle a freeze event on a process > > that already has a pending signal, while it's being ptraced? > > Could you explain the problem? > Not very well. I was just saying that the signal/ptrace stuff is very complicated already. Now we're looking at mixing in the freezer too, which further increases the complexity. Though when I said that, it was before Tejun mentioned hooking this up to ptrace. I'll confess that I don't fully understand what he's proposing either though... -- Jeff Layton
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Wed, 2013-03-06 at 13:23 -0500, Jeff Layton wrote: > On Wed, 6 Mar 2013 07:59:01 -0800 > Mandeep Singh Baines wrote: > > In general, holding a lock and freezing can cause a deadlock if: > > > > 1) you froze via the cgroup_freezer subsystem and a task in another > > cgroup tried to acquire the same lock > > 2) the lock was needed later in suspend/hibernate. For example, if the > > lock was needed in dpm_suspend by one of the device callbacks. For > > hibernate, you also need to worry about any locks that need to be > > acquired in order to write to the swap device. > > 3) another freezing task blocked on this lock and held other locks > > needed later in suspend. If that task were skipped by the freezer, you > > would deadlock > > > > You will block/prevent suspend if: > > > > 4) another freezing task blocked on this lock and was unable to freeze > > > > I think 1) and 4) can happen for the NFS/RPC case. Case 1) requires > > cgroup freezer. Case 4) while not causing a deadlock could prevent > > your laptop/phone from sleeping and end up burning all your battery. > > If suspend is initiated via lid close you won't even know about the > > failure. > > > > We're aware of #4. That was the intent of adding try_to_freeze() into > this codepath in the first place. It's not a great solution for obvious > reasons, but we don't have another at the moment. > > For #1 I'm not sure what to do. I'm not that familiar with cgroups or how > the freezer works. > > The bottom line is that we have a choice -- we can either rip out this > new lockdep warning, or rip out the code that causes it to fire. > > If we rip out the warning we may miss some legit cases where we might > possibly have hit a deadlock. > > If we rip out the code that causes it to fire, then we exacerbate the > #4 problem above. That will effectively make it so that you can't > suspend the host whenever NFS is doing anything moderately active. #4 is probably the only case where we might want to freeze. 
Unless we're in a situation where the network is going down, we can usually make progress with completing the RPC call and finishing the system call. So in the case of cgroup_freezer, we only care if the freezing cgroup also owns the network device (or net namespace) that NFS is using to talk to the server. As I said, the alternative is to notify NFS that the device is going down, and to give it a chance to quiesce itself before that happens. This is also the only way to ensure that processes which own locks on the server (e.g. posix file locks) have a chance to release them before being suspended. -- Trond Myklebust Linux NFS client maintainer NetApp trond.mykleb...@netapp.com www.netapp.com
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Wed, 6 Mar 2013 07:59:01 -0800 Mandeep Singh Baines wrote: > On Wed, Mar 6, 2013 at 4:06 AM, Jeff Layton wrote: > > On Wed, 6 Mar 2013 10:09:14 +0100 > > Ingo Molnar wrote: > > > >> > >> * Mandeep Singh Baines wrote: > >> > >> > On Tue, Mar 5, 2013 at 5:16 PM, Tejun Heo wrote: > >> > > On Tue, Mar 05, 2013 at 08:05:07PM -0500, J. Bruce Fields wrote: > >> > >> If it's really just a 2-line patch to try_to_freeze(), could it just > >> > >> be > >> > >> carried out-of-tree by people that are specifically working on > >> > >> tracking > >> > >> down these problems? > >> > >> > >> > >> But I don't have strong feelings about it--as long as it doesn't > >> > >> result > >> > >> in the same known issues getting reported again and again > >> > > > >> > > Agreed, I don't think a Kconfig option is justified for this. If this > >> > > is really important, annotate broken paths so that it doesn't trigger > >> > > spuriously; otherwise, please just remove it. > >> > > > >> > > >> > Fair enough. Let's revert then. I'll rework to use a lockdep annotation. > >> > > >> > Maybe, add a new lockdep API: > >> > > >> > lockdep_set_held_during_freeze(lock); > >> > > >> > Then when we do the check, ignore any locks that set this bit. > >> > > >> > Ingo, does this seem like a reasonable design to you? > >> > >> Am I reading the discussion correctly that the new warnings show REAL > >> potential > >> deadlock scenarios, which can hit real users and can lock their box up in > >> entirely > >> real usage scenarios? > >> > >> If yes then guys we _really_ don't want to use lockdep annotation to > >> _HIDE_ bugs. > >> We typically use them to teach lockdep about things it does not know about. > >> > >> How about fixing the deadlocks instead? > >> > > > > I do see how the freezer might fail to suspend certain tasks, but I > > don't see the deadlock scenario here in the NFS/RPC case. Can someone > > outline a situation where this might end up deadlocking? 
If not, then > > I'd be inclined to say that while this may be a problem, the warning is > > excessive... > > > In general, holding a lock and freezing can cause a deadlock if: > > 1) you froze via the cgroup_freezer subsystem and a task in another > cgroup tried to acquire the same lock > 2) the lock was needed later in suspend/hibernate. For example, if the > lock was needed in dpm_suspend by one of the device callbacks. For > hibernate, you also need to worry about any locks that need to be > acquired in order to write to the swap device. > 3) another freezing task blocked on this lock and held other locks > needed later in suspend. If that task were skipped by the freezer, you > would deadlock > > You will block/prevent suspend if: > > 4) another freezing task blocked on this lock and was unable to freeze > > I think 1) and 4) can happen for the NFS/RPC case. Case 1) requires > cgroup freezer. Case 4) while not causing a deadlock could prevent > your laptop/phone from sleeping and end up burning all your battery. > If suspend is initiated via lid close you won't even know about the > failure. > We're aware of #4. That was the intent of adding try_to_freeze() into this codepath in the first place. It's not a great solution for obvious reasons, but we don't have another at the moment. For #1 I'm not sure what to do. I'm not that familiar with cgroups or how the freezer works. The bottom line is that we have a choice -- we can either rip out this new lockdep warning, or rip out the code that causes it to fire. If we rip out the warning we may miss some legit cases where we might possibly have hit a deadlock. If we rip out the code that causes it to fire, then we exacerbate the #4 problem above. That will effectively make it so that you can't suspend the host whenever NFS is doing anything moderately active. Ripping out the warning seems like the best course of action in the near term, but it's not my call... 
-- Jeff Layton
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On 03/05, Jeff Layton wrote: > > Anyone up for working out how to handle a freeze event on a process > that already has a pending signal, while it's being ptraced? Could you explain the problem? Oleg.
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On 03/05, Tejun Heo wrote: > > Oleg, are you still opposed to the idea of making freezer share trap > points with ptrace? My memory can fool me, but iirc I wasn't actually opposed... I guess you mean the previous discussion about vfork/ptrace/etc which I forgot completely. But I can recall the main problem with your idea (with me): I simply wasn't able to understand it ;) Likewise, I can't really understand the ideas discussed in this thread. At least when it comes to this particular problem, rpc_wait_bit_killable() is not interruptible... And how SIGFREEZE can help? If we want to interrupt the sleeps in NFS/RPC layer we can simply add TASK_WAKEFREEZE (can be used with TASK_KILLABLE) and change freeze_task() to do signal_wake_up_state(TASK_WAKEFREEZE). But if we can do this, then it should be possible to simply make these sleeps TASK_INTERRUPTIBLE? But it seems that we can't just because we can't always restart, so I am totally confused. Oleg.
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Wed, Mar 6, 2013 at 4:06 AM, Jeff Layton wrote: > On Wed, 6 Mar 2013 10:09:14 +0100 > Ingo Molnar wrote: > >> >> * Mandeep Singh Baines wrote: >> >> > On Tue, Mar 5, 2013 at 5:16 PM, Tejun Heo wrote: >> > > On Tue, Mar 05, 2013 at 08:05:07PM -0500, J. Bruce Fields wrote: >> > >> If it's really just a 2-line patch to try_to_freeze(), could it just be >> > >> carried out-of-tree by people that are specifically working on tracking >> > >> down these problems? >> > >> >> > >> But I don't have strong feelings about it--as long as it doesn't result >> > >> in the same known issues getting reported again and again >> > > >> > > Agreed, I don't think a Kconfig option is justified for this. If this >> > > is really important, annotate broken paths so that it doesn't trigger >> > > spuriously; otherwise, please just remove it. >> > > >> > >> > Fair enough. Let's revert then. I'll rework to use a lockdep annotation. >> > >> > Maybe, add a new lockdep API: >> > >> > lockdep_set_held_during_freeze(lock); >> > >> > Then when we do the check, ignore any locks that set this bit. >> > >> > Ingo, does this seem like a reasonable design to you? >> >> Am I reading the discussion correctly that the new warnings show REAL >> potential >> deadlock scenarios, which can hit real users and can lock their box up in >> entirely >> real usage scenarios? >> >> If yes then guys we _really_ don't want to use lockdep annotation to _HIDE_ >> bugs. >> We typically use them to teach lockdep about things it does not know about. >> >> How about fixing the deadlocks instead? >> > > I do see how the freezer might fail to suspend certain tasks, but I > don't see the deadlock scenario here in the NFS/RPC case. Can someone > outline a situation where this might end up deadlocking? If not, then > I'd be inclined to say that while this may be a problem, the warning is > excessive... 
> In general, holding a lock and freezing can cause a deadlock if: 1) you froze via the cgroup_freezer subsystem and a task in another cgroup tried to acquire the same lock 2) the lock was needed later in suspend/hibernate. For example, if the lock was needed in dpm_suspend by one of the device callbacks. For hibernate, you also need to worry about any locks that need to be acquired in order to write to the swap device. 3) another freezing task blocked on this lock and held other locks needed later in suspend. If that task were skipped by the freezer, you would deadlock You will block/prevent suspend if: 4) another freezing task blocked on this lock and was unable to freeze I think 1) and 4) can happen for the NFS/RPC case. Case 1) requires cgroup freezer. Case 4) while not causing a deadlock could prevent your laptop/phone from sleeping and end up burning all your battery. If suspend is initiated via lid close you won't even know about the failure. Regards, Mandeep > -- > Jeff Layton
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Wed, 6 Mar 2013 10:09:14 +0100 Ingo Molnar wrote: > > * Mandeep Singh Baines wrote: > > > On Tue, Mar 5, 2013 at 5:16 PM, Tejun Heo wrote: > > > On Tue, Mar 05, 2013 at 08:05:07PM -0500, J. Bruce Fields wrote: > > >> If it's really just a 2-line patch to try_to_freeze(), could it just be > > >> carried out-of-tree by people that are specifically working on tracking > > >> down these problems? > > >> > > >> But I don't have strong feelings about it--as long as it doesn't result > > >> in the same known issues getting reported again and again > > > > > > Agreed, I don't think a Kconfig option is justified for this. If this > > > is really important, annotate broken paths so that it doesn't trigger > > > spuriously; otherwise, please just remove it. > > > > > > > Fair enough. Let's revert then. I'll rework to use a lockdep annotation. > > > > Maybe, add a new lockdep API: > > > > lockdep_set_held_during_freeze(lock); > > > > Then when we do the check, ignore any locks that set this bit. > > > > Ingo, does this seem like a reasonable design to you? > > Am I reading the discussion correctly that the new warnings show REAL > potential > deadlock scenarios, which can hit real users and can lock their box up in > entirely > real usage scenarios? > > If yes then guys we _really_ don't want to use lockdep annotation to _HIDE_ > bugs. > We typically use them to teach lockdep about things it does not know about. > > How about fixing the deadlocks instead? > I do see how the freezer might fail to suspend certain tasks, but I don't see the deadlock scenario here in the NFS/RPC case. Can someone outline a situation where this might end up deadlocking? If not, then I'd be inclined to say that while this may be a problem, the warning is excessive... 
-- Jeff Layton
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Wed, 6 Mar 2013 01:10:07 + "Myklebust, Trond" wrote: > On Tue, 2013-03-05 at 14:03 -0500, Jeff Layton wrote: > > On Tue, 5 Mar 2013 09:49:54 -0800 > > Tejun Heo wrote: > > > > > On Tue, Mar 05, 2013 at 09:46:48AM -0800, Tejun Heo wrote: > > > > So, I think this is why implementing freezer as a separate blocking > > > > mechanism isn't such a good idea. We're effectively introducing a > > > > completely new waiting state to a lot of unsuspecting paths which > > > > generates a lot of risks and eventually extra complexity to work > > > > around those. I think we really should update freezer to re-use the > > > > blocking points we already have - the ones used for signal delivery > > > > and ptracing. That way, other code paths don't have to worry about an > > > > extra stop state and we can confine most complexities to freezer > > > > proper. > > > > > > Also, consolidating those wait states means that we can solve the > > > event-to-response latency problem for all three cases - signal, ptrace > > > and freezer, rather than adding separate backing-out strategy for > > > freezer. > > > > > > > Sounds intriguing... > > > > I'm not sure what this really means for something like NFS though. How > > would you envision this working when we have long running syscalls that > > might sit waiting in the kernel indefinitely? > > > > Here's my blue-sky, poorly-thought-out idea... > > > > We could add a signal (e.g. SIGFREEZE) that allows the sleeps in > > NFS/RPC layer to be interrupted. Those would return back toward > > userland with a particular type of error (sort of like ERESTARTSYS). > > > > Before returning from the kernel though, we could freeze the process. > > When it wakes up, then we could go back down and retry the call again > > (much like an ERESTARTSYS kind of thing). > > Two (three?) show stopper words for you: "non-idempotent operations". > > Not all RPC calls can just be interrupted and restarted. 
Something like > an exclusive file create, unlink, file locking attempt, etc may give > rise to different results when replayed in the above scenario. > Interrupting an RPC call is not equivalent to cancelling its effects... > Right -- that's the part where we have to take great care to save the state of the syscall at the time we returned back up toward userland on a freeze event. I suppose we'd need to hang something off the task_struct to keep track of that. In most cases, it would be sufficient to keep track of whether an RPC had been sent during the call for non-idempotent operations. If it was sent, then we'd just re-enter the wait for the reply. If it wasn't then we'd go ahead and send the call. Still, I'm sure there are details I'm overlooking here. The whole point of holding these mutexes is to ensure that the directories don't change while we're doing these operations. If we release the locks in order to go to sleep, then there's no guarantee that things haven't changed when we reacquire them. Maybe it's best to give up and just tell people that suspending your laptop with a NFS mount is not allowed :P -- Jeff Layton
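[Editor's note] Jeff's restart idea can be illustrated with a small userspace analogy. Every name here - EFREEZING, op_state, nfs_op_wait(), nfs_op_restartable() - is invented for illustration; this is not kernel or NFS code. The point is that per-task saved state lets a non-idempotent RPC be put on the wire at most once, no matter how many times a freeze event forces the syscall to back out and restart:

```c
#include <stdbool.h>

/* Invented error code meaning "back out, freeze, then restart". */
#define EFREEZING 512

/* State that would hang off task_struct across restarts. */
struct op_state {
    bool rpc_sent;     /* non-idempotent RPC already on the wire? */
    int sends;         /* how many times we actually transmitted */
    int freezes_left;  /* simulated pending freeze events */
};

/* One attempt: send the RPC once, then wait for the reply; the wait
 * may be interrupted by a freeze event. */
static int nfs_op_wait(struct op_state *st)
{
    if (!st->rpc_sent) {
        st->rpc_sent = true;   /* send exactly once across restarts */
        st->sends++;
    }
    if (st->freezes_left > 0) {
        st->freezes_left--;
        return -EFREEZING;     /* syscall exit path should freeze us */
    }
    return 0;                  /* reply arrived */
}

/* Syscall-exit-style restart loop: on -EFREEZING a real kernel would
 * park the task in the freezer, then re-enter the wait for the reply
 * rather than re-send the RPC. */
static int nfs_op_restartable(struct op_state *st)
{
    int ret;
    while ((ret = nfs_op_wait(st)) == -EFREEZING)
        ;  /* freeze here in a real kernel */
    return ret;
}
```

The directory-mutex problem Jeff raises is exactly what this sketch glosses over: backing out of the wait is easy, but backing out of the locks the operation already holds is not.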
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
* Mandeep Singh Baines wrote: > On Tue, Mar 5, 2013 at 5:16 PM, Tejun Heo wrote: > > On Tue, Mar 05, 2013 at 08:05:07PM -0500, J. Bruce Fields wrote: > >> If it's really just a 2-line patch to try_to_freeze(), could it just be > >> carried out-of-tree by people that are specifically working on tracking > >> down these problems? > >> > >> But I don't have strong feelings about it--as long as it doesn't result > >> in the same known issues getting reported again and again > > > > Agreed, I don't think a Kconfig option is justified for this. If this > > is really important, annotate broken paths so that it doesn't trigger > > spuriously; otherwise, please just remove it. > > > > Fair enough. Let's revert then. I'll rework to use a lockdep annotation. > > Maybe, add a new lockdep API: > > lockdep_set_held_during_freeze(lock); > > Then when we do the check, ignore any locks that set this bit. > > Ingo, does this seem like a reasonable design to you? Am I reading the discussion correctly that the new warnings show REAL potential deadlock scenarios, which can hit real users and can lock their box up in entirely real usage scenarios? If yes then guys we _really_ don't want to use lockdep annotation to _HIDE_ bugs. We typically use them to teach lockdep about things it does not know about. How about fixing the deadlocks instead? Thanks, Ingo
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
* Mandeep Singh Baines m...@chromium.org wrote: On Tue, Mar 5, 2013 at 5:16 PM, Tejun Heo t...@kernel.org wrote: On Tue, Mar 05, 2013 at 08:05:07PM -0500, J. Bruce Fields wrote: If it's really just a 2-line patch to try_to_freeze(), could it just be carried out-of-tree by people that are specifically working on tracking down these problems? But I don't have strong feelings about it--as long as it doesn't result in the same known issues getting reported again and again Agreed, I don't think a Kconfig option is justified for this. If this is really important, annotate broken paths so that it doesn't trigger spuriously; otherwise, please just remove it. Fair enough. Let's revert then. I'll rework to use a lockdep annotation. Maybe, add a new lockdep API: lockdep_set_held_during_freeze(lock); Then when we do the check, ignore any locks that set this bit. Ingo, does this seem like a reasonable design to you? Am I reading the discussion correctly that the new warnings show REAL potential deadlock scenarios, which can hit real users and can lock their box up in entirely real usage scenarios? If yes then guys we _really_ don't want to use lockdep annotation to _HIDE_ bugs. We typically use them to teach lockdep about things it does not know about. How about fixing the deadlocks instead? Thanks, Ingo -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Wed, 6 Mar 2013 01:10:07 + Myklebust, Trond trond.mykleb...@netapp.com wrote: On Tue, 2013-03-05 at 14:03 -0500, Jeff Layton wrote: On Tue, 5 Mar 2013 09:49:54 -0800 Tejun Heo t...@kernel.org wrote: On Tue, Mar 05, 2013 at 09:46:48AM -0800, Tejun Heo wrote: So, I think this is why implementing freezer as a separate blocking mechanism isn't such a good idea. We're effectively introducing a completely new waiting state to a lot of unsuspecting paths which generates a lot of risks and eventually extra complexity to work around those. I think we really should update freezer to re-use the blocking points we already have - the ones used for signal delivery and ptracing. That way, other code paths don't have to worry about an extra stop state and we can confine most complexities to freezer proper. Also, consolidating those wait states means that we can solve the event-to-response latency problem for all three cases - signal, ptrace and freezer, rather than adding separate backing-out strategy for freezer. Sounds intriguing... I'm not sure what this really means for something like NFS though. How would you envision this working when we have long running syscalls that might sit waiting in the kernel indefinitely? Here's my blue-sky, poorly-thought-out idea... We could add a signal (e.g. SIGFREEZE) that allows the sleeps in NFS/RPC layer to be interrupted. Those would return back toward userland with a particular type of error (sort of like ERESTARTSYS). Before returning from the kernel though, we could freeze the process. When it wakes up, then we could go back down and retry the call again (much like an ERESTARTSYS kind of thing). Two (three?) show stopper words for you: non-idempotent operations. Not all RPC calls can just be interrupted and restarted. Something like an exclusive file create, unlink, file locking attempt, etc may give rise to different results when replayed in the above scenario. Interrupting an RPC call is not equivalent to cancelling its effects... 
Right -- that's the part where we have to take great care to save the state of the syscall at the time we returned back up toward userland on a freeze event. I suppose we'd need to hang something off the task_struct to keep track of that. In most cases, it would be sufficient to keep track of whether an RPC had been sent during the call for non-idempotent operations. If it was sent, then we'd just re-enter the wait for the reply. If it wasn't then we'd go ahead and send the call. Still, I'm sure there are details I'm overlooking here. The whole point of holding these mutexes is to ensure that operations that the directories don't change while we're doing these operations. If we release the locks in order to go to sleep, then there's no guarantee that things haven't changed when we reacquire them. Maybe it's best to give up and just tell people that suspending your laptop with a NFS mount is not allowed :P -- Jeff Layton jlay...@redhat.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Wed, 6 Mar 2013 10:09:14 +0100 Ingo Molnar mi...@kernel.org wrote: * Mandeep Singh Baines m...@chromium.org wrote: On Tue, Mar 5, 2013 at 5:16 PM, Tejun Heo t...@kernel.org wrote: On Tue, Mar 05, 2013 at 08:05:07PM -0500, J. Bruce Fields wrote: If it's really just a 2-line patch to try_to_freeze(), could it just be carried out-of-tree by people that are specifically working on tracking down these problems? But I don't have strong feelings about it--as long as it doesn't result in the same known issues getting reported again and again Agreed, I don't think a Kconfig option is justified for this. If this is really important, annotate broken paths so that it doesn't trigger spuriously; otherwise, please just remove it. Fair enough. Let's revert then. I'll rework to use a lockdep annotation. Maybe, add a new lockdep API: lockdep_set_held_during_freeze(lock); Then when we do the check, ignore any locks that set this bit. Ingo, does this seem like a reasonable design to you? Am I reading the discussion correctly that the new warnings show REAL potential deadlock scenarios, which can hit real users and can lock their box up in entirely real usage scenarios? If yes then guys we _really_ don't want to use lockdep annotation to _HIDE_ bugs. We typically use them to teach lockdep about things it does not know about. How about fixing the deadlocks instead? I do see how the freezer might fail to suspend certain tasks, but I don't see the deadlock scenario here in the NFS/RPC case. Can someone outline a situation where this might end up deadlocking? If not, then I'd be inclined to say that while this may be a problem, the warning is excessive... -- Jeff Layton jlay...@redhat.com
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Wed, Mar 6, 2013 at 4:06 AM, Jeff Layton jlay...@redhat.com wrote: On Wed, 6 Mar 2013 10:09:14 +0100 Ingo Molnar mi...@kernel.org wrote: * Mandeep Singh Baines m...@chromium.org wrote: On Tue, Mar 5, 2013 at 5:16 PM, Tejun Heo t...@kernel.org wrote: On Tue, Mar 05, 2013 at 08:05:07PM -0500, J. Bruce Fields wrote: If it's really just a 2-line patch to try_to_freeze(), could it just be carried out-of-tree by people that are specifically working on tracking down these problems? But I don't have strong feelings about it--as long as it doesn't result in the same known issues getting reported again and again Agreed, I don't think a Kconfig option is justified for this. If this is really important, annotate broken paths so that it doesn't trigger spuriously; otherwise, please just remove it. Fair enough. Let's revert then. I'll rework to use a lockdep annotation. Maybe, add a new lockdep API: lockdep_set_held_during_freeze(lock); Then when we do the check, ignore any locks that set this bit. Ingo, does this seem like a reasonable design to you? Am I reading the discussion correctly that the new warnings show REAL potential deadlock scenarios, which can hit real users and can lock their box up in entirely real usage scenarios? If yes then guys we _really_ don't want to use lockdep annotation to _HIDE_ bugs. We typically use them to teach lockdep about things it does not know about. How about fixing the deadlocks instead? I do see how the freezer might fail to suspend certain tasks, but I don't see the deadlock scenario here in the NFS/RPC case. Can someone outline a situation where this might end up deadlocking? If not, then I'd be inclined to say that while this may be a problem, the warning is excessive... In general, holding a lock and freezing can cause a deadlock if: 1) you froze via the cgroup_freezer subsystem and a task in another cgroup tried to acquire the same lock 2) the lock was needed later in suspend/hibernate.
For example, if the lock was needed in dpm_suspend by one of the device callbacks. For hibernate, you also need to worry about any locks that need to be acquired in order to write to the swap device. 3) another freezing task blocked on this lock and held other locks needed later in suspend. If that task were skipped by the freezer, you would deadlock. You will block/prevent suspend if: 4) another freezing task blocked on this lock and was unable to freeze. I think 1) and 4) can happen for the NFS/RPC case. Case 1) requires cgroup freezer. Case 4), while not causing a deadlock, could prevent your laptop/phone from sleeping and end up burning all your battery. If suspend is initiated via lid close you won't even know about the failure. Regards, Mandeep -- Jeff Layton jlay...@redhat.com
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On 03/05, Tejun Heo wrote: Oleg, are you still opposed to the idea of making freezer share trap points with ptrace? My memory can fool me, but iirc I wasn't actually opposed... I guess you mean the previous discussion about vfork/ptrace/etc which I forgot completely. But I can recall the main problem with your idea (with me): I simply wasn't able to understand it ;) Likewise, I can't really understand the ideas discussed in this thread. At least when it comes to this particular problem, rpc_wait_bit_killable() is not interruptible... And how SIGFREEZE can help? If we want to interrupt the sleeps in NFS/RPC layer we can simply add TASK_WAKEFREEZE (can be used with TASK_KILLABLE) and change freeze_task() to do signal_wake_up_state(TASK_WAKEFREEZE). But if we can do this, then it should be possible to simply make these sleeps TASK_INTERRUPTIBLE? But it seems that we can't just because we can't always restart, so I am totally confused. Oleg.
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On 03/05, Jeff Layton wrote: Anyone up for working out how to handle a freeze event on a process that already has a pending signal, while it's being ptraced? Could you explain the problem? Oleg.
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Wed, 6 Mar 2013 07:59:01 -0800 Mandeep Singh Baines m...@chromium.org wrote: On Wed, Mar 6, 2013 at 4:06 AM, Jeff Layton jlay...@redhat.com wrote: On Wed, 6 Mar 2013 10:09:14 +0100 Ingo Molnar mi...@kernel.org wrote: * Mandeep Singh Baines m...@chromium.org wrote: On Tue, Mar 5, 2013 at 5:16 PM, Tejun Heo t...@kernel.org wrote: On Tue, Mar 05, 2013 at 08:05:07PM -0500, J. Bruce Fields wrote: If it's really just a 2-line patch to try_to_freeze(), could it just be carried out-of-tree by people that are specifically working on tracking down these problems? But I don't have strong feelings about it--as long as it doesn't result in the same known issues getting reported again and again Agreed, I don't think a Kconfig option is justified for this. If this is really important, annotate broken paths so that it doesn't trigger spuriously; otherwise, please just remove it. Fair enough. Let's revert then. I'll rework to use a lockdep annotation. Maybe, add a new lockdep API: lockdep_set_held_during_freeze(lock); Then when we do the check, ignore any locks that set this bit. Ingo, does this seem like a reasonable design to you? Am I reading the discussion correctly that the new warnings show REAL potential deadlock scenarios, which can hit real users and can lock their box up in entirely real usage scenarios? If yes then guys we _really_ don't want to use lockdep annotation to _HIDE_ bugs. We typically use them to teach lockdep about things it does not know about. How about fixing the deadlocks instead? I do see how the freezer might fail to suspend certain tasks, but I don't see the deadlock scenario here in the NFS/RPC case. Can someone outline a situation where this might end up deadlocking? If not, then I'd be inclined to say that while this may be a problem, the warning is excessive... 
In general, holding a lock and freezing can cause a deadlock if: 1) you froze via the cgroup_freezer subsystem and a task in another cgroup tried to acquire the same lock 2) the lock was needed later is suspend/hibernate. For example, if the lock was needed in dpm_suspend by one of the device callbacks. For hibernate, you also need to worry about any locks that need to be acquired in order to write to the swap device. 3) another freezing task blocked on this lock and held other locks needed later in suspend. If that task were skipped by the freezer, you would deadlock You will block/prevent suspend if: 4) another freezing task blocked on this lock and was unable to freeze I think 1) and 4) can happen for the NFS/RPC case. Case 1) requires cgroup freezer. Case 4) while not causing a deadlock could prevent your laptop/phone from sleeping and end up burning all your battery. If suspend is initiated via lid close you won't even know about the failure. We're aware of #4. That was the intent of adding try_to_freeze() into this codepath in the first place. It's not a great solution for obvious reasons, but we don't have another at the moment. For #1 I'm not sure what to do. I'm not that familiar with cgroups or how the freezer works. The bottom line is that we have a choice -- we can either rip out this new lockdep warning, or rip out the code that causes it to fire. If we rip out the warning we may miss some legit cases where we might possibly have hit a deadlock. If we rip out the code that causes it to fire, then we exacerbate the #4 problem above. That will effectively make it so that you can't suspend the host whenever NFS is doing anything moderately active. Ripping out the warning seems like the best course of action in the near term, but it's not my call... 
-- Jeff Layton jlay...@redhat.com
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Wed, 2013-03-06 at 13:23 -0500, Jeff Layton wrote: On Wed, 6 Mar 2013 07:59:01 -0800 Mandeep Singh Baines m...@chromium.org wrote: In general, holding a lock and freezing can cause a deadlock if: 1) you froze via the cgroup_freezer subsystem and a task in another cgroup tried to acquire the same lock 2) the lock was needed later is suspend/hibernate. For example, if the lock was needed in dpm_suspend by one of the device callbacks. For hibernate, you also need to worry about any locks that need to be acquired in order to write to the swap device. 3) another freezing task blocked on this lock and held other locks needed later in suspend. If that task were skipped by the freezer, you would deadlock You will block/prevent suspend if: 4) another freezing task blocked on this lock and was unable to freeze I think 1) and 4) can happen for the NFS/RPC case. Case 1) requires cgroup freezer. Case 4) while not causing a deadlock could prevent your laptop/phone from sleeping and end up burning all your battery. If suspend is initiated via lid close you won't even know about the failure. We're aware of #4. That was the intent of adding try_to_freeze() into this codepath in the first place. It's not a great solution for obvious reasons, but we don't have another at the moment. For #1 I'm not sure what to do. I'm that familiar with cgroups or how the freezer works. The bottom line is that we have a choice -- we can either rip out this new lockdep warning, or rip out the code that causes it to fire. If we rip out the warning we may miss some legit cases where we might possibly have hit a deadlock. If we rip out the code that causes it to fire, then we exacerbate the #4 problem above. That will effectively make it so that you can't suspend the host whenever NFS is doing anything moderately active. #4 is probably the only case where we might want to freeze. 
Unless we're in a situation where the network is going down, we can usually always make progress with completing the RPC call and finishing the system call. So in the case of cgroup_freezer, we only care if the freezing cgroup also owns the network device (or net namespace) that NFS is using to talk to the server. As I said, the alternative is to notify NFS that the device is going down, and to give it a chance to quiesce itself before that happens. This is also the only way to ensure that processes which own locks on the server (e.g. posix file locks) have a chance to release them before being suspended. -- Trond Myklebust Linux NFS client maintainer NetApp trond.mykleb...@netapp.com www.netapp.com
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Wed, 6 Mar 2013 19:17:35 +0100 Oleg Nesterov o...@redhat.com wrote: On 03/05, Jeff Layton wrote: Anyone up for working out how to handle a freeze event on a process that already has a pending signal, while it's being ptraced? Could you explain the problem? Not very well. I was just saying that the signal/ptrace stuff is very complicated already. Now we're looking at mixing in the freezer too, which further increases the complexity. Though when I said that, it was before Tejun mentioned hooking this up to ptrace. I'll confess that I don't fully understand what he's proposing either though... -- Jeff Layton jlay...@redhat.com
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
Hello, On Wed, Mar 06, 2013 at 01:40:04PM -0500, Jeff Layton wrote: Though when I said that, it was before Tejun mentioned hooking this up to ptrace. I'll confess that I don't fully understand what he's proposing either though... Oh, they're all just pretty closely related. All signal and ptrace traps run off get_signal_to_deliver(). It is still a bit hairy but in much better shape after cleanups which accompanied PTRACE_SEIZE and I don't really think it'll be too difficult to extend it so that freezing is implemented as a type of jobctl trap. Thanks. -- tejun
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
Hello, Oleg. On Wed, Mar 06, 2013 at 07:16:08PM +0100, Oleg Nesterov wrote: And how SIGFREEZE can help? If we want to interrupt the sleeps in NFS/RPC layer we can simply add TASK_WAKEFREEZE (can be used with TASK_KILLABLE) and change freeze_task() to do signal_wake_up_state(TASK_WAKEFREEZE). Oh yeah, we don't need another signal. We just need sigpending state and a wakeup. I wasn't really going into details. The important point is that for code paths outside signal/ptrace, freezing could look and behave about the same as signal delivery. But if we can do this, then it should be possible so simply make these sleeps TASK_INTERRUPTIBLE ? But it seems that we can't just because we can't always restart, so I am totally confused. Of course, switching to another freezing mechanism doesn't make issues like this go away in itself, but it would at least allow handling such issues in easier to grasp way rather than dealing with this giant pseudo lock and allows code paths outside jobctl to simply deal with one issue - signal delivery, rather than having to deal with freezing separately. Also, while not completely identical, frozen state would at least be an extension of jobctl trap and it would be possible to allow things like killing frozen tasks or even ptracing them in well-defined manner for cgroup_freezer use cases. As it currently stands, there is no well-defined abstraction for frozen state which we can expose in any way, which sucks. Thanks. -- tejun
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Wed, Mar 6, 2013 at 10:37 AM, Myklebust, Trond trond.mykleb...@netapp.com wrote: On Wed, 2013-03-06 at 13:23 -0500, Jeff Layton wrote: On Wed, 6 Mar 2013 07:59:01 -0800 Mandeep Singh Baines m...@chromium.org wrote: In general, holding a lock and freezing can cause a deadlock if: 1) you froze via the cgroup_freezer subsystem and a task in another cgroup tried to acquire the same lock 2) the lock was needed later is suspend/hibernate. For example, if the lock was needed in dpm_suspend by one of the device callbacks. For hibernate, you also need to worry about any locks that need to be acquired in order to write to the swap device. 3) another freezing task blocked on this lock and held other locks needed later in suspend. If that task were skipped by the freezer, you would deadlock You will block/prevent suspend if: 4) another freezing task blocked on this lock and was unable to freeze I think 1) and 4) can happen for the NFS/RPC case. Case 1) requires cgroup freezer. Case 4) while not causing a deadlock could prevent your laptop/phone from sleeping and end up burning all your battery. If suspend is initiated via lid close you won't even know about the failure. We're aware of #4. That was the intent of adding try_to_freeze() into this codepath in the first place. It's not a great solution for obvious reasons, but we don't have another at the moment. For #1 I'm not sure what to do. I'm that familiar with cgroups or how the freezer works. The bottom line is that we have a choice -- we can either rip out this new lockdep warning, or rip out the code that causes it to fire. If we rip out the warning we may miss some legit cases where we might possibly have hit a deadlock. If we rip out the code that causes it to fire, then we exacerbate the #4 problem above. That will effectively make it so that you can't suspend the host whenever NFS is doing anything moderately active. #4 is probably the only case where we might want to freeze. 
Unless we're in a situation where the network is going down, we can usually always make progress with completing the RPC call and finishing the system call. So in the case of cgroup_freezer, we only care if the freezing cgroup also owns the network device (or net namespace) that NFS is using to talk to the server. As I said, the alternative is to notify NFS that the device is going down, and to give it a chance to quiesce itself before that happens. This is also the only way to ensure that processes which own locks on the server (e.g. posix file locks) have a chance to release them before being suspended. So if I'm understanding correctly, the locks are held for bounded time so under normal circumstances you don't really need to freeze here. But if the task that is going to wake you were to freeze then this code path could block suspend if PF_FREEZER_SKIP were not set. For this particular case (not in general), instead of calling try_to_freeze(), what if you only cleared PF_FREEZER_SKIP? For case #1, you wouldn't block suspend if you're stuck waiting. If you received the event while processes were freezing, you'd just try_to_freeze() some place else. Probably on the way back up from the system call. For case #4, you'd eventually complete the rpc and then freeze some place else. If you're stuck waiting for the bit because the process that was going to wake you also got frozen, then the container would be stuck in the THAWING state. Not catastrophic and something the administrator could work around by sequencing things right. Does this sound like a reasonable workaround? Regards, Mandeep -- Trond Myklebust Linux NFS client maintainer NetApp trond.mykleb...@netapp.com www.netapp.com
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Wed, Mar 6, 2013 at 10:53 AM, Tejun Heo t...@kernel.org wrote: Hello, Oleg. On Wed, Mar 06, 2013 at 07:16:08PM +0100, Oleg Nesterov wrote: And how SIGFREEZE can help? If we want to interrupt the sleeps in NFS/RPC layer we can simply add TASK_WAKEFREEZE (can be used with TASK_KILLABLE) and change freeze_task() to do signal_wake_up_state(TASK_WAKEFREEZE). Oh yeah, we don't need another signal. We just need sigpending state and a wakeup. I wasn't really going into details. The important point is that for code paths outside signal/ptrace, freezing could look and behave about the same as signal delivery. Don't we already do that? The whole try_to_freeze() in get_signal_to_deliver() is about exactly this. See fake_signal_wake_up(). You still have kernel threads (that don't do signals) to worry about, so it doesn't make things go away. And you still have issues with latency of disk wait, which is, I think, the reason for that freezable_schedule() in the NFS code to begin with. Linus
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
Hello, Linus. On Wed, Mar 06, 2013 at 01:00:02PM -0800, Linus Torvalds wrote: Oh yeah, we don't need another signal. We just need sigpending state and a wakeup. I wasn't really going into details. The important point is that for code paths outside signal/ptrace, freezing could look and behave about the same as signal delivery. Don't we already do that? The whole try_to_freeze() in get_signal_to_deliver() is about exactly this. See fake_signal_wake_up(). Yeap, that was what I had in mind too. Maybe we'll need to modify it slightly but we already have most of the basic stuff. You still have kernel threads (that don't do signals) to worry about, so it doesn't make things go away. And you still have issues with latency of disk wait, which is, I think, the reason for that freezable_schedule() in the NFS code to begin with. I haven't thought about it for quite some time so things are hazy, but here's what I can recall now. With syscall paths out of the way, the surface is reduced a lot. Another part is converting most freezable kthread users to freezable workqueues, which provide natural resource boundaries (the duration of work item execution). kthreads are already difficult to get completely right in terms of synchronization, and a significant number of freezable + should_stop users were subtly broken the last time I went over the freezer users. I think we would be much better off converting most over to freezable workqueues, which are easier to get right and likely to be less expensive. Freezing happens at work item boundaries, which in most cases can be made to coincide with the original freezer check points. There could be kthreads which can't be converted to workqueues for whatever reason (there shouldn't be many at this point) but most freezer usages in kthreads are pretty simple. It's usually a single or a couple freezer check points in the main loop. While we may still need special handling for them, I don't think they're likely to have implications on issues like this. 
We probably would want to handle restart for freezable kthreads calling syscalls. Haven't thought about this one too much yet. Maybe freezable kthreads doing syscalls just need to be ready for -ERESTARTSYS? I'm not sure I follow the disk wait latency part. Are you saying that switching to jobctl trap based freezer implementation wouldn't help them? If so, right, it doesn't in itself. It's just changing the infrastructure used for freezing and can't make the underlying synchronization issues just disappear but at least it becomes the same problem as being responsive to SIGKILL rather than a completely separate problem. Thanks. -- tejun
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Wed, Mar 6, 2013 at 1:24 PM, Tejun Heo t...@kernel.org wrote: With syscall paths out of the way, the surface is reduced a lot. So the issue is syscalls that don't react to signals, and that can potentially wait a long time. Like NFS with a network hickup. Which is not at all unlikely. Think wireless network, somebody trying to get on a network share, things not working, and closing the damn lid because you give up. So I do agree that we probably have *too* many of the stupid "let's check if we can freeze", and I suspect that the NFS code should get rid of the freezable_schedule() that is causing this warning (because I also agree that you should *not* freeze while holding locks, because it really can cause deadlocks), but I do suspect that network filesystems do need to have a few places where they check for freezing on their own... Exactly because freezing isn't *quite* like a signal. Linus
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Wed, Mar 06, 2013 at 01:31:10PM -0800, Linus Torvalds wrote: So I do agree that we probably have *too* many of the stupid let's check if we can freeze, and I suspect that the NFS code should get rid of the freezable_schedule() that is causing this warning (because I also agree that you should *not* freeze while holding locks, because it really can cause deadlocks), but I do suspect that network filesystems do need to have a few places where they check for freezing on their own... Exactly because freezing isn't *quite* like a signal. Well, I don't really know much about nfs so I can't really tell, but for most other cases, dealing with freezing like a signal should work fine from what I've seen although I can't be sure before actually trying. Trond, Bruce, can you guys please chime in? Thanks. -- tejun
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Wed, Mar 06, 2013 at 01:36:36PM -0800, Tejun Heo wrote: On Wed, Mar 06, 2013 at 01:31:10PM -0800, Linus Torvalds wrote: So I do agree that we probably have *too* many of the stupid let's check if we can freeze, and I suspect that the NFS code should get rid of the freezable_schedule() that is causing this warning (because I also agree that you should *not* freeze while holding locks, because it really can cause deadlocks), but I do suspect that network filesystems do need to have a few places where they check for freezing on their own... Exactly because freezing isn't *quite* like a signal. Well, I don't really know much about nfs so I can't really tell, but for most other cases, dealing with freezing like a signal should work fine from what I've seen although I can't be sure before actually trying. Trond, Bruce, can you guys please chime in? So, I think the question here would be, in nfs, how many of the current freezer check points would be difficult to convert to the signal handling model after excluding the ones which are performed while holding some locks which we need to get rid of anyway? Thanks. -- tejun
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Tue, Mar 5, 2013 at 5:16 PM, Tejun Heo wrote: > On Tue, Mar 05, 2013 at 08:05:07PM -0500, J. Bruce Fields wrote: >> If it's really just a 2-line patch to try_to_freeze(), could it just be >> carried out-of-tree by people that are specifically working on tracking >> down these problems? >> >> But I don't have strong feelings about it--as long as it doesn't result >> in the same known issues getting reported again and again > > Agreed, I don't think a Kconfig option is justified for this. If this > is really important, annotate broken paths so that it doesn't trigger > spuriously; otherwise, please just remove it. > Fair enough. Let's revert then. I'll rework to use a lockdep annotation. Maybe, add a new lockdep API: lockdep_set_held_during_freeze(lock); Then when we do the check, ignore any locks that set this bit. Ingo, does this seem like a reasonable design to you? Regards, Mandeep > Thanks. > > -- > tejun
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Tue, Mar 05, 2013 at 05:14:23PM -0800, Tejun Heo wrote: > Then, the operation simply isn't freezable while in progress and > should be on the receiving end of failed-to-freeze error message and > users who depend on suspend/hibernation working properly should be > advised away from using nfs. > > It doesn't really changes anything. The current code is buggy. We're > just not enforcing the rules strictly and people aren't hitting the > bug often enough. Just one more thing. Also, not being able to do retries without side-effect doesn't mean it can't do retries at all. There are syscalls which need to do things differently on re-entry (forgot which one but there are several). They record the necessary state in the restart block and resume from where they left off on re-entry. It sure is hairy but is doable and we already have supporting infrastructure. Not sure whether that would be worthwhile tho. Thanks. -- tejun
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Tue, Mar 05, 2013 at 08:05:07PM -0500, J. Bruce Fields wrote: > If it's really just a 2-line patch to try_to_freeze(), could it just be > carried out-of-tree by people that are specifically working on tracking > down these problems? > > But I don't have strong feelings about it--as long as it doesn't result > in the same known issues getting reported again and again Agreed, I don't think a Kconfig option is justified for this. If this is really important, annotate broken paths so that it doesn't trigger spuriously; otherwise, please just remove it. Thanks. -- tejun
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
Hello, Trond. On Wed, Mar 06, 2013 at 01:10:07AM +, Myklebust, Trond wrote: > Not all RPC calls can just be interrupted and restarted. Something like > an exclusive file create, unlink, file locking attempt, etc may give > rise to different results when replayed in the above scenario. > Interrupting an RPC call is not equivalent to cancelling its effects... Then, the operation simply isn't freezable while in progress and should be on the receiving end of failed-to-freeze error message and users who depend on suspend/hibernation working properly should be advised away from using nfs. It doesn't really change anything. The current code is buggy. We're just not enforcing the rules strictly and people aren't hitting the bug often enough. Thanks. -- tejun
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Tue, 2013-03-05 at 14:03 -0500, Jeff Layton wrote: > On Tue, 5 Mar 2013 09:49:54 -0800 > Tejun Heo wrote: > > > On Tue, Mar 05, 2013 at 09:46:48AM -0800, Tejun Heo wrote: > > > So, I think this is why implementing freezer as a separate blocking > > > mechanism isn't such a good idea. We're effectively introducing a > > > completely new waiting state to a lot of unsuspecting paths which > > > generates a lot of risks and eventually extra complexity to work > > > around those. I think we really should update freezer to re-use the > > > blocking points we already have - the ones used for signal delivery > > > and ptracing. That way, other code paths don't have to worry about an > > > extra stop state and we can confine most complexities to freezer > > > proper. > > > > Also, consolidating those wait states means that we can solve the > > event-to-response latency problem for all three cases - signal, ptrace > > and freezer, rather than adding separate backing-out strategy for > > freezer. > > > > Sounds intriguing... > > I'm not sure what this really means for something like NFS though. How > would you envision this working when we have long running syscalls that > might sit waiting in the kernel indefinitely? > > Here's my blue-sky, poorly-thought-out idea... > > We could add a signal (e.g. SIGFREEZE) that allows the sleeps in > NFS/RPC layer to be interrupted. Those would return back toward > userland with a particular type of error (sort of like ERESTARTSYS). > > Before returning from the kernel though, we could freeze the process. > When it wakes up, then we could go back down and retry the call again > (much like an ERESTARTSYS kind of thing). Two (three?) show stopper words for you: "non-idempotent operations". Not all RPC calls can just be interrupted and restarted. Something like an exclusive file create, unlink, file locking attempt, etc may give rise to different results when replayed in the above scenario. 
Interrupting an RPC call is not equivalent to cancelling its effects... -- Trond Myklebust Linux NFS client maintainer NetApp trond.mykleb...@netapp.com www.netapp.com
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Tue, Mar 05, 2013 at 04:59:00PM -0800, Mandeep Singh Baines wrote: > On Tue, Mar 5, 2013 at 3:11 PM, J. Bruce Fields wrote: > > On Tue, Mar 05, 2013 at 09:49:54AM -0800, Tejun Heo wrote: > >> On Tue, Mar 05, 2013 at 09:46:48AM -0800, Tejun Heo wrote: > >> > So, I think this is why implementing freezer as a separate blocking > >> > mechanism isn't such a good idea. We're effectively introducing a > >> > completely new waiting state to a lot of unsuspecting paths which > >> > generates a lot of risks and eventually extra complexity to work > >> > around those. I think we really should update freezer to re-use the > >> > blocking points we already have - the ones used for signal delivery > >> > and ptracing. That way, other code paths don't have to worry about an > >> > extra stop state and we can confine most complexities to freezer > >> > proper. > >> > >> Also, consolidating those wait states means that we can solve the > >> event-to-response latency problem for all three cases - signal, ptrace > >> and freezer, rather than adding separate backing-out strategy for > >> freezer. > > > > Meanwhile, as none of this sounds likely to be done this time > > around--are we backing out the new lockdep warnings? > > > > --b. > > What if we hide it behind a Kconfig? It's finding real bugs. > > http://lkml.org/lkml/2013/3/5/583 If it's really just a 2-line patch to try_to_freeze(), could it just be carried out-of-tree by people that are specifically working on tracking down these problems? But I don't have strong feelings about it--as long as it doesn't result in the same known issues getting reported again and again. --b.
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Tue, Mar 5, 2013 at 3:11 PM, J. Bruce Fields wrote: > On Tue, Mar 05, 2013 at 09:49:54AM -0800, Tejun Heo wrote: >> On Tue, Mar 05, 2013 at 09:46:48AM -0800, Tejun Heo wrote: >> > So, I think this is why implementing freezer as a separate blocking >> > mechanism isn't such a good idea. We're effectively introducing a >> > completely new waiting state to a lot of unsuspecting paths which >> > generates a lot of risks and eventually extra complexity to work >> > around those. I think we really should update freezer to re-use the >> > blocking points we already have - the ones used for signal delivery >> > and ptracing. That way, other code paths don't have to worry about an >> > extra stop state and we can confine most complexities to freezer >> > proper. >> >> Also, consolidating those wait states means that we can solve the >> event-to-response latency problem for all three cases - signal, ptrace >> and freezer, rather than adding separate backing-out strategy for >> freezer. > > Meanwhile, as none of this sounds likely to be done this time > around--are we backing out the new lockdep warnings? > > --b. What if we hide it behind a Kconfig? It's finding real bugs. http://lkml.org/lkml/2013/3/5/583
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Tuesday, March 05, 2013 06:11:10 PM J. Bruce Fields wrote: > On Tue, Mar 05, 2013 at 09:49:54AM -0800, Tejun Heo wrote: > > On Tue, Mar 05, 2013 at 09:46:48AM -0800, Tejun Heo wrote: > > > So, I think this is why implementing freezer as a separate blocking > > > mechanism isn't such a good idea. We're effectively introducing a > > > completely new waiting state to a lot of unsuspecting paths which > > > generates a lot of risks and eventually extra complexity to work > > > around those. I think we really should update freezer to re-use the > > > blocking points we already have - the ones used for signal delivery > > > and ptracing. That way, other code paths don't have to worry about an > > > extra stop state and we can confine most complexities to freezer > > > proper. > > > > Also, consolidating those wait states means that we can solve the > > event-to-response latency problem for all three cases - signal, ptrace > > and freezer, rather than adding separate backing-out strategy for > > freezer. > > Meanwhile, as none of this sounds likely to be done this time > around--are we backing out the new lockdep warnings? I think we should do that. Thanks, Rafael -- I speak only for myself. Rafael J. Wysocki, Intel Open Source Technology Center.
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
Hello, Jeff. On Tue, Mar 05, 2013 at 06:39:41PM -0500, Jeff Layton wrote: > Al was in the middle of his signal handling/execve rework though and I > ran the idea past him. He pointedly told me that I was crazy for even > considering it. This is rather non-trivial to handle since it means > mucking around in a bunch of arch-specific code and dealing with all of > the weirdo corner cases. Hmmm... I tried it a couple years ago while I was working on ptrace improvements to support checkpoint-restart in userland (PTRACE_SEIZE and friends) and had a half-working code. At the time, Oleg was against the idea and something else came up so I didn't push it all the way but IIRC it didn't need to touch any arch code. It should be able to just share the existing trap path with ptrace. > Anyone up for working out how to handle a freeze event on a process > that already has a pending signal, while it's being ptraced? It's > probably best to chat with Al (cc'ed here) before you embark on this > plan since he was just in that code recently. Maybe Al sees something that I don't but AFAICS it should be doable in generic code proper and I don't think it's gonna be that hard. Oleg, are you still opposed to the idea of making freezer share trap points with ptrace? Thanks. -- tejun
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Tue, 5 Mar 2013 11:09:23 -0800 Tejun Heo wrote: > Hello, Jeff. > > On Tue, Mar 05, 2013 at 02:03:12PM -0500, Jeff Layton wrote: > > Sounds intriguing... > > > > I'm not sure what this really means for something like NFS though. How > > would you envision this working when we have long running syscalls that > > might sit waiting in the kernel indefinitely? > > I think it is the same problem as being able to handle SIGKILL in > responsive manner. It could be tricky to implement for nfs but it at > least doesn't have to solve the problem twice. > > > Here's my blue-sky, poorly-thought-out idea... > > > > We could add a signal (e.g. SIGFREEZE) that allows the sleeps in > > NFS/RPC layer to be interrupted. Those would return back toward > > userland with a particular type of error (sort of like ERESTARTSYS). > > > > Before returning from the kernel though, we could freeze the process. > > When it wakes up, then we could go back down and retry the call again > > (much like an ERESTARTSYS kind of thing). > > > > The tricky part here is that we'd need to distinguish between the case > > where we caught SIGFREEZE before sending an RPC vs. after. If we sent > > the call before freezing, then we don't want to resend it again. It > > might be a non-idempotent operation. > > So, yeah, you are thinking pretty much the same as I'm. > > > Sounds horrific to code up though... :) > > I don't know the details of nfs but those events could essentially be > signaling that the system is gonna lose power. I think it would be a > good idea to solve it. > > Thanks. > It would be... So, I briefly considered a similar approach when I was working on the retry-on-ESTALE error patches. It occurred to me that it was somewhat similar to the ERESTARTSYS case, so handling it at a higher level than in the syscall handlers themselves might make sense... Al was in the middle of his signal handling/execve rework though and I ran the idea past him. 
He pointedly told me that I was crazy for even considering it. This is rather non-trivial to handle since it means mucking around in a bunch of arch-specific code and dealing with all of the weirdo corner cases. Anyone up for working out how to handle a freeze event on a process that already has a pending signal, while it's being ptraced? It's probably best to chat with Al (cc'ed here) before you embark on this plan since he was just in that code recently. In any case, maybe there's also some code consolidation opportunity here too. I suspect at least some of this logic is in arch-specific code when it really needn't be. -- Jeff Layton
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Tue, Mar 05, 2013 at 09:49:54AM -0800, Tejun Heo wrote: > On Tue, Mar 05, 2013 at 09:46:48AM -0800, Tejun Heo wrote: > > So, I think this is why implementing freezer as a separate blocking > > mechanism isn't such a good idea. We're effectively introducing a > > completely new waiting state to a lot of unsuspecting paths which > > generates a lot of risks and eventually extra complexity to work > > around those. I think we really should update freezer to re-use the > > blocking points we already have - the ones used for signal delivery > > and ptracing. That way, other code paths don't have to worry about an > > extra stop state and we can confine most complexities to freezer > > proper. > > Also, consolidating those wait states means that we can solve the > event-to-response latency problem for all three cases - signal, ptrace > and freezer, rather than adding separate backing-out strategy for > freezer. Meanwhile, as none of this sounds likely to be done this time around--are we backing out the new lockdep warnings? --b.
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
Hello, Jeff. On Tue, Mar 05, 2013 at 02:03:12PM -0500, Jeff Layton wrote: > Sounds intriguing... > > I'm not sure what this really means for something like NFS though. How > would you envision this working when we have long running syscalls that > might sit waiting in the kernel indefinitely? I think it is the same problem as being able to handle SIGKILL in responsive manner. It could be tricky to implement for nfs but it at least doesn't have to solve the problem twice. > Here's my blue-sky, poorly-thought-out idea... > > We could add a signal (e.g. SIGFREEZE) that allows the sleeps in > NFS/RPC layer to be interrupted. Those would return back toward > userland with a particular type of error (sort of like ERESTARTSYS). > > Before returning from the kernel though, we could freeze the process. > When it wakes up, then we could go back down and retry the call again > (much like an ERESTARTSYS kind of thing). > > The tricky part here is that we'd need to distinguish between the case > where we caught SIGFREEZE before sending an RPC vs. after. If we sent > the call before freezing, then we don't want to resend it again. It > might be a non-idempotent operation. So, yeah, you are thinking pretty much the same as I'm. > Sounds horrific to code up though... :) I don't know the details of nfs but those events could essentially be signaling that the system is gonna lose power. I think it would be a good idea to solve it. Thanks. -- tejun
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Tue, 5 Mar 2013 09:49:54 -0800 Tejun Heo wrote: > On Tue, Mar 05, 2013 at 09:46:48AM -0800, Tejun Heo wrote: > > So, I think this is why implementing freezer as a separate blocking > > mechanism isn't such a good idea. We're effectively introducing a > > completely new waiting state to a lot of unsuspecting paths which > > generates a lot of risks and eventually extra complexity to work > > around those. I think we really should update freezer to re-use the > > blocking points we already have - the ones used for signal delivery > > and ptracing. That way, other code paths don't have to worry about an > > extra stop state and we can confine most complexities to freezer > > proper. > > Also, consolidating those wait states means that we can solve the > event-to-response latency problem for all three cases - signal, ptrace > and freezer, rather than adding separate backing-out strategy for > freezer. > Sounds intriguing... I'm not sure what this really means for something like NFS though. How would you envision this working when we have long running syscalls that might sit waiting in the kernel indefinitely? Here's my blue-sky, poorly-thought-out idea... We could add a signal (e.g. SIGFREEZE) that allows the sleeps in NFS/RPC layer to be interrupted. Those would return back toward userland with a particular type of error (sort of like ERESTARTSYS). Before returning from the kernel though, we could freeze the process. When it wakes up, then we could go back down and retry the call again (much like an ERESTARTSYS kind of thing). The tricky part here is that we'd need to distinguish between the case where we caught SIGFREEZE before sending an RPC vs. after. If we sent the call before freezing, then we don't want to resend it again. It might be a non-idempotent operation. Sounds horrific to code up though... 
:) -- Jeff Layton
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Tue, Mar 05, 2013 at 09:46:48AM -0800, Tejun Heo wrote: > So, I think this is why implementing freezer as a separate blocking > mechanism isn't such a good idea. We're effectively introducing a > completely new waiting state to a lot of unsuspecting paths which > generates a lot of risks and eventually extra complexity to work > around those. I think we really should update freezer to re-use the > blocking points we already have - the ones used for signal delivery > and ptracing. That way, other code paths don't have to worry about an > extra stop state and we can confine most complexities to freezer > proper. Also, consolidating those wait states means that we can solve the event-to-response latency problem for all three cases - signal, ptrace and freezer, rather than adding separate backing-out strategy for freezer. -- tejun
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
Hello, guys. On Tue, Mar 05, 2013 at 08:23:08AM -0500, Jeff Layton wrote: > So, not a deadlock per-se in this case but it does prevent the freezer > from running to completion. I don't see any way to solve it though w/o > making all mutexes freezable. Note that I don't think this is really > limited to NFS either -- a lot of other filesystems will have similar > problems: CIFS, some FUSE variants, etc... So, I think this is why implementing freezer as a separate blocking mechanism isn't such a good idea. We're effectively introducing a completely new waiting state to a lot of unsuspecting paths which generates a lot of risks and eventually extra complexity to work around those. I think we really should update freezer to re-use the blocking points we already have - the ones used for signal delivery and ptracing. That way, other code paths don't have to worry about an extra stop state and we can confine most complexities to freezer proper. Thanks. -- tejun
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Mon, 4 Mar 2013 22:08:34 + "Myklebust, Trond" wrote: > On Mon, 2013-03-04 at 21:53 +0100, Oleg Nesterov wrote: > > On 03/04, Mandeep Singh Baines wrote: > > > > > > The problem is that freezer_count() calls try_to_freeze(). In this > > > case, try_to_freeze() is not really adding any value. > > > > Well, I tend to agree. > > > > If a task calls __refrigerator() holding a lock which another freezable > > task can wait for, this is not freezer-friendly. > > > > freezable_schedule/freezer_do_not_count/etc not only means "I won't be > > active if freezing()", it should also mean "I won't block suspend/etc". > > If suspend for some reason requires a re-entrant mount, then yes, I can > see how it might be a problem that we're holding a mount-related lock. > The question is why is that necessary? > > > OTOH, I understand that probably it is not trivial to change this code > > to make it freezer-friendly. But at least I disagree with "push your > > problems onto others". > > That code can't be made freezer-friendly if it isn't allowed to hold > basic filesystem-related locks across RPC calls. A number of those RPC > calls do things that need to be protected by VFS or MM-level locks. > i.e.: lookups, file creation/deletion, page fault in/out, ... > > IOW: the problem would need to be solved differently, possibly by adding > a new FIFREEZE-like call to allow the filesystem to quiesce itself > _before_ NetworkManager pulls the rug out from underneath it. There > would still be plenty of corner cases to keep people entertained (e.g. > the server goes down before the quiesce call is invoked) but at least > the top 90% of cases would be solved. > Ok, I think I'm starting to get it. It doesn't necessarily need a reentrant mount or anything like that. Consider this case (which is not even that unlikely): Suppose there are two tasks calling unlink() on files in the same NFS directory. 
First task takes the i_mutex on the parent directory and goes to ask the server to remove the file. Second task calls unlink just afterward and blocks on the parent's i_mutex. Now, a suspend event comes in and freezes the first task while it's waiting on the response. It still holds the parent's i_mutex. Freezer now gets to the second task and can't freeze it because the sleep on that mutex isn't freezable. So, not a deadlock per-se in this case but it does prevent the freezer from running to completion. I don't see any way to solve it though w/o making all mutexes freezable. Note that I don't think this is really limited to NFS either -- a lot of other filesystems will have similar problems: CIFS, some FUSE variants, etc... -- Jeff Layton
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Mon, 4 Mar 2013 22:08:34 + Myklebust, Trond trond.mykleb...@netapp.com wrote: On Mon, 2013-03-04 at 21:53 +0100, Oleg Nesterov wrote: On 03/04, Mandeep Singh Baines wrote: The problem is that freezer_count() calls try_to_freeze(). In this case, try_to_freeze() is not really adding any value. Well, I tend to agree. If a task calls __refrigerator() holding a lock which another freezable task can wait for, this is not freezer-friendly. freezable_schedule/freezer_do_not_count/etc not only means I won't be active if freezing(), it should also mean I won't block suspend/etc. If suspend for some reason requires a re-entrant mount, then yes, I can see how it might be a problem that we're holding a mount-related lock. The question is why is that necessary? OTOH, I understand that probably it is not trivial to change this code to make it freezer-friendly. But at least I disagree with push your problems onto others. That code can't be made freezer-friendly if it isn't allowed to hold basic filesystem-related locks across RPC calls. A number of those RPC calls do things that need to be protected by VFS or MM-level locks. i.e.: lookups, file creation/deletion, page fault in/out, ... IOW: the problem would need to be solved differently, possibly by adding a new FIFREEZE-like call to allow the filesystem to quiesce itself _before_ NetworkManager pulls the rug out from underneath it. There would still be plenty of corner cases to keep people entertained (e.g. the server goes down before the quiesce call is invoked) but at least the top 90% of cases would be solved. Ok, I think I'm starting to get it. It doesn't necessarily need a reentrant mount or anything like that. Consider this case (which is not even that unlikely): Suppose there are two tasks calling unlink() on files in the same NFS directory. First task takes the i_mutex on the parent directory and goes to ask the server to remove the file. Second task calls unlink just afterward and blocks on the parent's i_mutex. 
Now, a suspend event comes in and freezes the first task while it's waiting on the response. It still holds the parent's i_mutex. Freezer now gets to the second task and can't freeze it because the sleep on that mutex isn't freezable. So, not a deadlock per-se in this case but it does prevent the freezer from running to completion. I don't see any way to solve it though w/o making all mutexes freezable. Note that I don't think this is really limited to NFS either -- a lot of other filesystems will have similar problems: CIFS, some FUSE variants, etc... -- Jeff Layton jlay...@redhat.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
Hello, guys. On Tue, Mar 05, 2013 at 08:23:08AM -0500, Jeff Layton wrote: So, not a deadlock per-se in this case but it does prevent the freezer from running to completion. I don't see any way to solve it though w/o making all mutexes freezable. Note that I don't think this is really limited to NFS either -- a lot of other filesystems will have similar problems: CIFS, some FUSE variants, etc... So, I think this is why implementing freezer as a separate blocking mechanism isn't such a good idea. We're effectively introducing a completely new waiting state to a lot of unsuspecting paths which generates a lot of risks and eventually extra complexity to work around those. I think we really should update freezer to re-use the blocking points we already have - the ones used for signal delivery and ptracing. That way, other code paths don't have to worry about an extra stop state and we can confine most complexities to freezer proper. Thanks. -- tejun -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Tue, Mar 05, 2013 at 09:46:48AM -0800, Tejun Heo wrote: So, I think this is why implementing freezer as a separate blocking mechanism isn't such a good idea. We're effectively introducing a completely new waiting state to a lot of unsuspecting paths which generates a lot of risks and eventually extra complexity to work around those. I think we really should update freezer to re-use the blocking points we already have - the ones used for signal delivery and ptracing. That way, other code paths don't have to worry about an extra stop state and we can confine most complexities to freezer proper. Also, consolidating those wait states means that we can solve the event-to-response latency problem for all three cases - signal, ptrace and freezer, rather than adding separate backing-out strategy for freezer. -- tejun -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Tue, 5 Mar 2013 09:49:54 -0800 Tejun Heo t...@kernel.org wrote: On Tue, Mar 05, 2013 at 09:46:48AM -0800, Tejun Heo wrote: So, I think this is why implementing freezer as a separate blocking mechanism isn't such a good idea. We're effectively introducing a completely new waiting state to a lot of unsuspecting paths which generates a lot of risks and eventually extra complexity to work around those. I think we really should update freezer to re-use the blocking points we already have - the ones used for signal delivery and ptracing. That way, other code paths don't have to worry about an extra stop state and we can confine most complexities to freezer proper. Also, consolidating those wait states means that we can solve the event-to-response latency problem for all three cases - signal, ptrace and freezer, rather than adding separate backing-out strategy for freezer. Sounds intriguing... I'm not sure what this really means for something like NFS though. How would you envision this working when we have long running syscalls that might sit waiting in the kernel indefinitely? Here's my blue-sky, poorly-thought-out idea... We could add a signal (e.g. SIGFREEZE) that allows the sleeps in NFS/RPC layer to be interrupted. Those would return back toward userland with a particular type of error (sort of like ERESTARTSYS). Before returning from the kernel though, we could freeze the process. When it wakes up, then we could go back down and retry the call again (much like an ERESTARTSYS kind of thing). The tricky part here is that we'd need to distinguish between the case where we caught SIGFREEZE before sending an RPC vs. after. If we sent the call before freezing, then we don't want to resend it again. It might be a non-idempotent operation. Sounds horrific to code up though... 
:) -- Jeff Layton jlay...@redhat.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
Hello, Jeff. On Tue, Mar 05, 2013 at 02:03:12PM -0500, Jeff Layton wrote: Sounds intriguing... I'm not sure what this really means for something like NFS though. How would you envision this working when we have long running syscalls that might sit waiting in the kernel indefinitely? I think it is the same problem as being able to handle SIGKILL in responsive manner. It could be tricky to implement for nfs but it at least doesn't have to solve the problem twice. Here's my blue-sky, poorly-thought-out idea... We could add a signal (e.g. SIGFREEZE) that allows the sleeps in NFS/RPC layer to be interrupted. Those would return back toward userland with a particular type of error (sort of like ERESTARTSYS). Before returning from the kernel though, we could freeze the process. When it wakes up, then we could go back down and retry the call again (much like an ERESTARTSYS kind of thing). The tricky part here is that we'd need to distinguish between the case where we caught SIGFREEZE before sending an RPC vs. after. If we sent the call before freezing, then we don't want to resend it again. It might be a non-idempotent operation. So, yeah, you are thinking pretty much the same as I'm. Sounds horrific to code up though... :) I don't know the details of nfs but those events could essentially be signaling that the system is gonna lose power. I think it would be a good idea to solve it. Thanks. -- tejun -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Tue, Mar 05, 2013 at 09:49:54AM -0800, Tejun Heo wrote: On Tue, Mar 05, 2013 at 09:46:48AM -0800, Tejun Heo wrote: So, I think this is why implementing freezer as a separate blocking mechanism isn't such a good idea. We're effectively introducing a completely new waiting state to a lot of unsuspecting paths which generates a lot of risks and eventually extra complexity to work around those. I think we really should update freezer to re-use the blocking points we already have - the ones used for signal delivery and ptracing. That way, other code paths don't have to worry about an extra stop state and we can confine most complexities to freezer proper. Also, consolidating those wait states means that we can solve the event-to-response latency problem for all three cases - signal, ptrace and freezer, rather than adding separate backing-out strategy for freezer. Meanwhile, as none of this sounds likely to be done this time around--are we backing out the new lockdep warnings? --b.
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Tue, 5 Mar 2013 11:09:23 -0800 Tejun Heo t...@kernel.org wrote: Hello, Jeff. On Tue, Mar 05, 2013 at 02:03:12PM -0500, Jeff Layton wrote: Sounds intriguing... I'm not sure what this really means for something like NFS though. How would you envision this working when we have long running syscalls that might sit waiting in the kernel indefinitely? I think it is the same problem as being able to handle SIGKILL in responsive manner. It could be tricky to implement for nfs but it at least doesn't have to solve the problem twice. Here's my blue-sky, poorly-thought-out idea... We could add a signal (e.g. SIGFREEZE) that allows the sleeps in NFS/RPC layer to be interrupted. Those would return back toward userland with a particular type of error (sort of like ERESTARTSYS). Before returning from the kernel though, we could freeze the process. When it wakes up, then we could go back down and retry the call again (much like an ERESTARTSYS kind of thing). The tricky part here is that we'd need to distinguish between the case where we caught SIGFREEZE before sending an RPC vs. after. If we sent the call before freezing, then we don't want to resend it again. It might be a non-idempotent operation. So, yeah, you are thinking pretty much the same as I'm. Sounds horrific to code up though... :) I don't know the details of nfs but those events could essentially be signaling that the system is gonna lose power. I think it would be a good idea to solve it. Thanks. It would be... So, I briefly considered a similar approach when I was working on the retry-on-ESTALE error patches. It occurred to me that it was somewhat similar to the ERESTARTSYS case, so handling it at a higher level than in the syscall handlers themselves might make sense... Al was in the middle of his signal handling/execve rework though and I ran the idea past him. He pointedly told me that I was crazy for even considering it. 
This is rather non-trivial to handle since it means mucking around in a bunch of arch-specific code and dealing with all of the weirdo corner cases. Anyone up for working out how to handle a freeze event on a process that already has a pending signal, while it's being ptraced? It's probably best to chat with Al (cc'ed here) before you embark on this plan since he was just in that code recently. In any case, maybe there's also some code consolidation opportunity here too. I suspect at least some of this logic is in arch-specific code when it really needn't be. -- Jeff Layton jlay...@redhat.com
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
Hello, Jeff. On Tue, Mar 05, 2013 at 06:39:41PM -0500, Jeff Layton wrote: Al was in the middle of his signal handling/execve rework though and I ran the idea past him. He pointedly told me that I was crazy for even considering it. This is rather non-trivial to handle since it means mucking around in a bunch of arch-specific code and dealing with all of the weirdo corner cases. Hmmm... I tried it a couple years ago while I was working on ptrace improvements to support checkpoint-restart in userland (PTRACE_SEIZE and friends) and had a half-working code. At the time, Oleg was against the idea and something else came up so I didn't push it all the way but IIRC it didn't need to touch any arch code. It should be able to just share the existing trap path with ptrace. Anyone up for working out how to handle a freeze event on a process that already has a pending signal, while it's being ptraced? It's probably best to chat with Al (cc'ed here) before you embark on this plan since he was just in that code recently. Maybe Al sees something that I don't but AFAICS it should be doable in generic code proper and I don't think it's gonna be that hard. Oleg, are you still opposed to the idea of making freezer share trap points with ptrace? Thanks. -- tejun
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Tuesday, March 05, 2013 06:11:10 PM J. Bruce Fields wrote: On Tue, Mar 05, 2013 at 09:49:54AM -0800, Tejun Heo wrote: On Tue, Mar 05, 2013 at 09:46:48AM -0800, Tejun Heo wrote: So, I think this is why implementing freezer as a separate blocking mechanism isn't such a good idea. We're effectively introducing a completely new waiting state to a lot of unsuspecting paths which generates a lot of risks and eventually extra complexity to work around those. I think we really should update freezer to re-use the blocking points we already have - the ones used for signal delivery and ptracing. That way, other code paths don't have to worry about an extra stop state and we can confine most complexities to freezer proper. Also, consolidating those wait states means that we can solve the event-to-response latency problem for all three cases - signal, ptrace and freezer, rather than adding separate backing-out strategy for freezer. Meanwhile, as none of this sounds likely to be done this time around--are we backing out the new lockdep warnings? I think we should do that. Thanks, Rafael -- I speak only for myself. Rafael J. Wysocki, Intel Open Source Technology Center.
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Tue, Mar 5, 2013 at 3:11 PM, J. Bruce Fields bfie...@fieldses.org wrote: On Tue, Mar 05, 2013 at 09:49:54AM -0800, Tejun Heo wrote: On Tue, Mar 05, 2013 at 09:46:48AM -0800, Tejun Heo wrote: So, I think this is why implementing freezer as a separate blocking mechanism isn't such a good idea. We're effectively introducing a completely new waiting state to a lot of unsuspecting paths which generates a lot of risks and eventually extra complexity to work around those. I think we really should update freezer to re-use the blocking points we already have - the ones used for signal delivery and ptracing. That way, other code paths don't have to worry about an extra stop state and we can confine most complexities to freezer proper. Also, consolidating those wait states means that we can solve the event-to-response latency problem for all three cases - signal, ptrace and freezer, rather than adding separate backing-out strategy for freezer. Meanwhile, as none of this sounds likely to be done this time around--are we backing out the new lockdep warnings? --b. What if we hide it behind a Kconfig? It's finding real bugs. http://lkml.org/lkml/2013/3/5/583
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Tue, Mar 05, 2013 at 04:59:00PM -0800, Mandeep Singh Baines wrote: On Tue, Mar 5, 2013 at 3:11 PM, J. Bruce Fields bfie...@fieldses.org wrote: On Tue, Mar 05, 2013 at 09:49:54AM -0800, Tejun Heo wrote: On Tue, Mar 05, 2013 at 09:46:48AM -0800, Tejun Heo wrote: So, I think this is why implementing freezer as a separate blocking mechanism isn't such a good idea. We're effectively introducing a completely new waiting state to a lot of unsuspecting paths which generates a lot of risks and eventually extra complexity to work around those. I think we really should update freezer to re-use the blocking points we already have - the ones used for signal delivery and ptracing. That way, other code paths don't have to worry about an extra stop state and we can confine most complexities to freezer proper. Also, consolidating those wait states means that we can solve the event-to-response latency problem for all three cases - signal, ptrace and freezer, rather than adding separate backing-out strategy for freezer. Meanwhile, as none of this sounds likely to be done this time around--are we backing out the new lockdep warnings? --b. What if we hide it behind a Kconfig? It's finding real bugs. http://lkml.org/lkml/2013/3/5/583 If it's really just a 2-line patch to try_to_freeze(), could it just be carried out-of-tree by people that are specifically working on tracking down these problems? But I don't have strong feelings about it--as long as it doesn't result in the same known issues getting reported again and again. --b.
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Tue, 2013-03-05 at 14:03 -0500, Jeff Layton wrote: On Tue, 5 Mar 2013 09:49:54 -0800 Tejun Heo t...@kernel.org wrote: On Tue, Mar 05, 2013 at 09:46:48AM -0800, Tejun Heo wrote: So, I think this is why implementing freezer as a separate blocking mechanism isn't such a good idea. We're effectively introducing a completely new waiting state to a lot of unsuspecting paths which generates a lot of risks and eventually extra complexity to work around those. I think we really should update freezer to re-use the blocking points we already have - the ones used for signal delivery and ptracing. That way, other code paths don't have to worry about an extra stop state and we can confine most complexities to freezer proper. Also, consolidating those wait states means that we can solve the event-to-response latency problem for all three cases - signal, ptrace and freezer, rather than adding separate backing-out strategy for freezer. Sounds intriguing... I'm not sure what this really means for something like NFS though. How would you envision this working when we have long running syscalls that might sit waiting in the kernel indefinitely? Here's my blue-sky, poorly-thought-out idea... We could add a signal (e.g. SIGFREEZE) that allows the sleeps in NFS/RPC layer to be interrupted. Those would return back toward userland with a particular type of error (sort of like ERESTARTSYS). Before returning from the kernel though, we could freeze the process. When it wakes up, then we could go back down and retry the call again (much like an ERESTARTSYS kind of thing). Two (three?) show stopper words for you: non-idempotent operations. Not all RPC calls can just be interrupted and restarted. Something like an exclusive file create, unlink, file locking attempt, etc may give rise to different results when replayed in the above scenario. Interrupting an RPC call is not equivalent to cancelling its effects... 
-- Trond Myklebust Linux NFS client maintainer NetApp trond.mykleb...@netapp.com www.netapp.com
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
Hello, Trond. On Wed, Mar 06, 2013 at 01:10:07AM +, Myklebust, Trond wrote: Not all RPC calls can just be interrupted and restarted. Something like an exclusive file create, unlink, file locking attempt, etc may give rise to different results when replayed in the above scenario. Interrupting an RPC call is not equivalent to cancelling its effects... Then, the operation simply isn't freezable while in progress and should be on the receiving end of a failed-to-freeze error message and users who depend on suspend/hibernation working properly should be advised away from using nfs. It doesn't really change anything. The current code is buggy. We're just not enforcing the rules strictly and people aren't hitting the bug often enough. Thanks. -- tejun
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Tue, Mar 05, 2013 at 08:05:07PM -0500, J. Bruce Fields wrote: If it's really just a 2-line patch to try_to_freeze(), could it just be carried out-of-tree by people that are specifically working on tracking down these problems? But I don't have strong feelings about it--as long as it doesn't result in the same known issues getting reported again and again Agreed, I don't think a Kconfig option is justified for this. If this is really important, annotate broken paths so that it doesn't trigger spuriously; otherwise, please just remove it. Thanks. -- tejun
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Tue, Mar 05, 2013 at 05:14:23PM -0800, Tejun Heo wrote: Then, the operation simply isn't freezable while in progress and should be on the receiving end of a failed-to-freeze error message and users who depend on suspend/hibernation working properly should be advised away from using nfs. It doesn't really change anything. The current code is buggy. We're just not enforcing the rules strictly and people aren't hitting the bug often enough. Just one more thing. Also, not being able to do retries without side-effect doesn't mean it can't do retries at all. There are syscalls which need to do things differently on re-entry (forgot which one but there are several). They record the necessary state in the restart block and resume from where they left off on re-entry. It sure is hairy but is doable and we already have supporting infrastructure. Not sure whether that would be worthwhile tho. Thanks. -- tejun
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Tue, Mar 5, 2013 at 5:16 PM, Tejun Heo t...@kernel.org wrote: On Tue, Mar 05, 2013 at 08:05:07PM -0500, J. Bruce Fields wrote: If it's really just a 2-line patch to try_to_freeze(), could it just be carried out-of-tree by people that are specifically working on tracking down these problems? But I don't have strong feelings about it--as long as it doesn't result in the same known issues getting reported again and again Agreed, I don't think a Kconfig option is justified for this. If this is really important, annotate broken paths so that it doesn't trigger spuriously; otherwise, please just remove it. Fair enough. Let's revert then. I'll rework to use a lockdep annotation. Maybe add a new lockdep API: lockdep_set_held_during_freeze(lock); Then when we do the check, ignore any locks that set this bit. Ingo, does this seem like a reasonable design to you? Regards, Mandeep Thanks. -- tejun
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Mon, 2013-03-04 at 21:53 +0100, Oleg Nesterov wrote: > On 03/04, Mandeep Singh Baines wrote: > > > > The problem is that freezer_count() calls try_to_freeze(). In this > > case, try_to_freeze() is not really adding any value. > > Well, I tend to agree. > > If a task calls __refrigerator() holding a lock which another freezable > task can wait for, this is not freezer-friendly. > > freezable_schedule/freezer_do_not_count/etc not only means "I won't be > active if freezing()", it should also mean "I won't block suspend/etc". If suspend for some reason requires a re-entrant mount, then yes, I can see how it might be a problem that we're holding a mount-related lock. The question is why is that necessary? > OTOH, I understand that probably it is not trivial to change this code > to make it freezer-friendly. But at least I disagree with "push your > problems onto others". That code can't be made freezer-friendly if it isn't allowed to hold basic filesystem-related locks across RPC calls. A number of those RPC calls do things that need to be protected by VFS or MM-level locks. i.e.: lookups, file creation/deletion, page fault in/out, ... IOW: the problem would need to be solved differently, possibly by adding a new FIFREEZE-like call to allow the filesystem to quiesce itself _before_ NetworkManager pulls the rug out from underneath it. There would still be plenty of corner cases to keep people entertained (e.g. the server goes down before the quiesce call is invoked) but at least the top 90% of cases would be solved. -- Trond Myklebust Linux NFS client maintainer NetApp trond.mykleb...@netapp.com www.netapp.com
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On 03/04, Mandeep Singh Baines wrote: > > The problem is that freezer_count() calls try_to_freeze(). In this > case, try_to_freeze() is not really adding any value. Well, I tend to agree. If a task calls __refrigerator() holding a lock which another freezable task can wait for, this is not freezer-friendly. freezable_schedule/freezer_do_not_count/etc not only means "I won't be active if freezing()", it should also mean "I won't block suspend/etc". OTOH, I understand that probably it is not trivial to change this code to make it freezer-friendly. But at least I disagree with "push your problems onto others". Oleg.
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Mon, Mar 4, 2013 at 12:09 PM, Mandeep Singh Baines wrote: > On Mon, Mar 4, 2013 at 7:53 AM, Myklebust, Trond > wrote: >> On Mon, 2013-03-04 at 23:33 +0800, Ming Lei wrote: >>> Hi, >>> >>> CC guys who introduced the lockdep change. >>> >>> On Mon, Mar 4, 2013 at 11:04 PM, Jeff Layton wrote: >>> >>> > >>> > I don't get it -- why is it bad to hold a lock across a freeze event? >>> >>> At least this may deadlock another mount.nfs during freezing, :-) >>> >>> See detailed explanation in the commit log: >>> >>> commit 6aa9707099c4b25700940eb3d016f16c4434360d >>> Author: Mandeep Singh Baines >>> Date: Wed Feb 27 17:03:18 2013 -0800 >>> >>> lockdep: check that no locks held at freeze time >>> >>> We shouldn't try_to_freeze if locks are held. Holding a lock can cause >>> a >>> deadlock if the lock is later acquired in the suspend or hibernate path >>> (e.g. by dpm). Holding a lock can also cause a deadlock in the case of >>> cgroup_freezer if a lock is held inside a frozen cgroup that is later >>> acquired by a process outside that group. >>> >> >> This is bloody ridiculous... If you want to add functionality to >> implement cgroup or per-process freezing, then do it through some other >> api instead of trying to push your problems onto others by adding new >> global locking rules. >> >> Filesystems are a shared resource that have _nothing_ to do with process >> cgroups. They need to be suspended when the network goes down or other >> resources that they depend on are suspended. At that point, there is no >> "what if I launch a new mount command?" scenario. >> > > Hi Trond, > > My intention was to introduce new rules. My change simply introduces a D'oh. s/was/was not/ Regards, Mandeep > check for a deadlock case that can already happen. > > I think a deadlock could happen under the following scenario: > > 1) An administrator wants to freeze a container. Perhaps to checkpoint > it and it migrate it some place else. 
> 2) An nfs mount was in progress so we hit this code path and freeze > with a lock held. > 3) Another container tries to nfs mount. > 4) Deadlock. > > Regards, > Mandeep > >> Trond >> -- >> Trond Myklebust >> Linux NFS client maintainer >> >> NetApp >> trond.mykleb...@netapp.com >> www.netapp.com
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Mon, Mar 4, 2013 at 7:53 AM, Myklebust, Trond wrote: > On Mon, 2013-03-04 at 23:33 +0800, Ming Lei wrote: >> Hi, >> >> CC guys who introduced the lockdep change. >> >> On Mon, Mar 4, 2013 at 11:04 PM, Jeff Layton wrote: >> >> > >> > I don't get it -- why is it bad to hold a lock across a freeze event? >> >> At least this may deadlock another mount.nfs during freezing, :-) >> >> See detailed explanation in the commit log: >> >> commit 6aa9707099c4b25700940eb3d016f16c4434360d >> Author: Mandeep Singh Baines >> Date: Wed Feb 27 17:03:18 2013 -0800 >> >> lockdep: check that no locks held at freeze time >> >> We shouldn't try_to_freeze if locks are held. Holding a lock can cause a >> deadlock if the lock is later acquired in the suspend or hibernate path >> (e.g. by dpm). Holding a lock can also cause a deadlock in the case of >> cgroup_freezer if a lock is held inside a frozen cgroup that is later >> acquired by a process outside that group. >> > > This is bloody ridiculous... If you want to add functionality to > implement cgroup or per-process freezing, then do it through some other > api instead of trying to push your problems onto others by adding new > global locking rules. > > Filesystems are a shared resource that have _nothing_ to do with process > cgroups. They need to be suspended when the network goes down or other > resources that they depend on are suspended. At that point, there is no > "what if I launch a new mount command?" scenario. > Hi Trond, My intention was to introduce new rules. My change simply introduces a check for a deadlock case that can already happen. I think a deadlock could happen under the following scenario: 1) An administrator wants to freeze a container. Perhaps to checkpoint it and migrate it some place else. 2) An nfs mount was in progress so we hit this code path and freeze with a lock held. 3) Another container tries to nfs mount. 4) Deadlock. 
Regards, Mandeep > Trond > -- > Trond Myklebust > Linux NFS client maintainer > > NetApp > trond.mykleb...@netapp.com > www.netapp.com
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
+ rjw, akpm, tejun, mingo, oleg On Mon, Mar 4, 2013 at 6:23 AM, Jeff Layton wrote: > On Mon, 4 Mar 2013 14:14:23 + > "Myklebust, Trond" wrote: > >> On Mon, 2013-03-04 at 21:57 +0800, Ming Lei wrote: >> > Hi, >> > >> > The below warning can be triggered each time when mount.nfs is >> > running on 3.9-rc1. >> > >> > Not sure if freezable_schedule() inside rpc_wait_bit_killable should >> > be changed to schedule() since nfs_clid_init_mutex is held in the path. >> >> Cc:ing Jeff, who added freezable_schedule(), and applied it to >> rpc_wait_bit_killable. >> >> So this is occurring when the kernel enters the freeze state? >> Why does it occur only with nfs_clid_init_mutex, and not with all the >> other mutexes that we hold across RPC calls? We hold inode->i_mutex >> across RPC calls all the time when doing renames, unlinks, file >> creation,... >> > > cc'ing Mandeep as well since his patch caused this to start popping up... > > We've also gotten some similar lockdep pops in the nfsd code recently. > I responded to an email about it here on Friday, and I'll re-post what > I said there below: > > -[snip]-- > Ok, I see... > > rpc_wait_bit_killable() calls freezable_schedule(). That calls > freezer_count() which calls try_to_freeze(). try_to_freeze does this > lockdep check now as of commit 6aa9707099. > > The assumption seems to be that freezing a thread while holding any > sort of lock is bad. The rationale in that patch seems a bit sketchy to > me though. We can be fairly certain that we're not going to deadlock by > holding these locks, but I guess there could be something I've missed. > > Mandeep, can you elaborate on whether there's really a deadlock > scenario here? If not, then is there some way to annotate these locks > so this lockdep pop goes away? We were seeing deadlocks on suspend that were root caused to locks held at freeze time. In this case, the locks were device locks. I wanted a way to detect if a process tried to freeze with a device_lock held. 
Then I thought about the cgroup_freezer subsystem (which I recently turned on in our system). If you were to freeze a cgroup and a process were holding a lock, you could deadlock. Seems reasonable that this code path could cause a deadlock that way. The problem is that freezer_count() calls try_to_freeze(). In this case, try_to_freeze() is not really adding any value. If we didn't try_to_freeze() in this instance of freezer_count() you'd avoid the potential deadlock and not block suspend. I think a lot of call sites are like that. They want to avoid blocking suspend but aren't really interested in trying to freeze at the moment they call freezer_count(). Regards, Mandeep > -[snip]-- > > >> > [ 41.387939] = >> > [ 41.392913] [ BUG: mount.nfs/643 still has locks held! ] >> > [ 41.398559] 3.9.0-rc1+ #1740 Not tainted >> > [ 41.402709] - >> > [ 41.407714] 1 lock held by mount.nfs/643: >> > [ 41.411956] #0: (nfs_clid_init_mutex){+.+...}, at: [] >> > nfs4_discover_server_trunking+0x60/0x1d4 >> > [ 41.422363] >> > [ 41.422363] stack backtrace: >> > [ 41.427032] [] (unwind_backtrace+0x0/0xe0) from >> > [] (rpc_wait_bit_killable+0x38/0xc8) >> > [ 41.437103] [] (rpc_wait_bit_killable+0x38/0xc8) from >> > [] (__wait_on_bit+0x54/0x9c) >> > [ 41.446990] [] (__wait_on_bit+0x54/0x9c) from >> > [] (out_of_line_wait_on_bit+0x78/0x84) >> > [ 41.457061] [] (out_of_line_wait_on_bit+0x78/0x84) from >> > [] (__rpc_execute+0x170/0x348) >> > [ 41.467407] [] (__rpc_execute+0x170/0x348) from >> > [] (rpc_run_task+0x9c/0xa4) >> > [ 41.476715] [] (rpc_run_task+0x9c/0xa4) from [] >> > (rpc_call_sync+0x70/0xb0) >> > [ 41.485778] [] (rpc_call_sync+0x70/0xb0) from >> > [] (nfs4_proc_setclientid+0x1a0/0x1c8) >> > [ 41.495819] [] (nfs4_proc_setclientid+0x1a0/0x1c8) from >> > [] (nfs40_discover_server_trunki >> > ng+0xec/0x148) >> > [ 41.507507] [] >> > (nfs40_discover_server_trunking+0xec/0x148) from [] >> > (nfs4_discover_server >> > _trunking+0x94/0x1d4) >> > [ 41.519866] [] 
(nfs4_discover_server_trunking+0x94/0x1d4) >> > from [] (nfs4_init_client+0x15 >> > 0/0x1b0) >> > [ 41.531036] [] (nfs4_init_client+0x150/0x1b0) from >> > [] (nfs_get_client+0x2cc/0x320) >> > [ 41.540863] [] (nfs_get_client+0x2cc/0x320) from >> > [] (nfs4_set_client+0x80/0xb0) >> > [ 41.550476] [] (nfs4_set_client+0x80/0xb0) from >> > [] (nfs4_create_server+0xb0/0x21c) >> > [ 41.560333] [] (nfs4_create_server+0xb0/0x21c) from >> > [] (nfs4_remote_mount+0x28/0x54) >> > [ 41.570373] [] (nfs4_remote_mount+0x28/0x54) from >> > [] (mount_fs+0x6c/0x160) >> > [ 41.579498] [] (mount_fs+0x6c/0x160) from [] >> > (vfs_kern_mount+0x4c/0xc0) >> > [ 41.588378] [] (vfs_kern_mount+0x4c/0xc0) from >> > [] (nfs_do_root_mount+0x74/0x90) >> > [ 41.597961] [] (nfs_do_root_mount+0x74/0x90) from >> > []
Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!
On Mon, 2013-03-04 at 23:33 +0800, Ming Lei wrote: > Hi, > > CC guys who introduced the lockdep change. > > On Mon, Mar 4, 2013 at 11:04 PM, Jeff Layton wrote: > > > > > I don't get it -- why is it bad to hold a lock across a freeze event? > > At least this may deadlock another mount.nfs during freezing, :-) > > See detailed explanation in the commit log: > > commit 6aa9707099c4b25700940eb3d016f16c4434360d > Author: Mandeep Singh Baines > Date: Wed Feb 27 17:03:18 2013 -0800 > > lockdep: check that no locks held at freeze time > > We shouldn't try_to_freeze if locks are held. Holding a lock can cause a > deadlock if the lock is later acquired in the suspend or hibernate path > (e.g. by dpm). Holding a lock can also cause a deadlock in the case of > cgroup_freezer if a lock is held inside a frozen cgroup that is later > acquired by a process outside that group. > This is bloody ridiculous... If you want to add functionality to implement cgroup or per-process freezing, then do it through some other api instead of trying to push your problems onto others by adding new global locking rules. Filesystems are a shared resource that have _nothing_ to do with process cgroups. They need to be suspended when the network goes down or other resources that they depend on are suspended. At that point, there is no "what if I launch a new mount command?" scenario. Trond -- Trond Myklebust Linux NFS client maintainer NetApp trond.mykleb...@netapp.com www.netapp.com