RE: [RFC][PATCH] pstore: Skip spinlock when just one cpu is online

2012-12-10 Thread Seiji Aguchi
> A boot argument might help - so we can force use of pstore in cases where
> kdump is failing (or prevent use of pstore in cases where it
> seems to be preventing us getting to kdump ... I don't have a preference).
> BUT this would only be useful if we had a repeatable
> problem so that we could

RE: [RFC][PATCH] pstore: Skip spinlock when just one cpu is online

2012-12-10 Thread Seiji Aguchi
> > If we can fix it with a small patch in advance, it is really helpful for us.
>
> As I said in my email I just sent, it may not help you without testing it.
> As there are probably other problems in that un-tested theoretical scenario.
OK. I understood.
> >
> > 2)
> > In the long term, I plan to add a

RE: [RFC][PATCH] pstore: Skip spinlock when just one cpu is online

2012-12-10 Thread Seiji Aguchi
> Now my first reaction would be, if that is the scenario, why couldn't cpuA
> release the lock within one second. Because if cpuA is stuck
> talking with firmware, then your patch to force the unlock is probably going
> to trip over the same problems.
> (those problems include dealing with

RE: [RFC][PATCH] pstore: Skip spinlock when just one cpu is online

2012-12-10 Thread Luck, Tony
> But you are assuming that kmsg_dump is perfect and it isn't, in which case
> by putting kmsg_dump in the kdump path, you actually may be blocking kdump
> from working.
I think the concern is that kdump isn't perfect, so sometimes we don't get a good dump from it. In those cases it would have been

Re: [RFC][PATCH] pstore: Skip spinlock when just one cpu is online

2012-12-10 Thread Don Zickus
On Fri, Dec 07, 2012 at 11:43:03PM +, Seiji Aguchi wrote:
> > Can all these things really happen (did you run into this problem on a real
> > system?). Or is this just a theoretical problem. Ugly (but
> > practical) hacks might be OK to solve real problems.
>
> It is a theoretical problem right

Re: [RFC][PATCH] pstore: Skip spinlock when just one cpu is online

2012-12-10 Thread Don Zickus
On Fri, Dec 07, 2012 at 09:41:13PM +, Seiji Aguchi wrote:
> [Issue]
>
> If one cpu, which is taking a psinfo->buf_lock,
> receives an NMI from a panicked cpu via smp_send_stop(),
> the panicked cpu hangs up in pstore_dump() called by
> kmsg_dump(KMSG_DUMP_PANIC)
> because the psinfo->buf_lock is taken
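
The hang described in the quoted [Issue] text can be modelled in userspace C. This is only a sketch under stated assumptions: `toy_lock`, `toy_trylock` and `toy_spin` are hypothetical names standing in for psinfo->buf_lock and the kernel spinlock primitives, and the attempt limit exists only so the demo terminates (a real spin loop has no such limit).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for psinfo->buf_lock; not the kernel's spinlock_t. */
struct toy_lock {
    bool held;
    int owner_cpu;
};

/* Try once to take the lock; returns true on success. */
static bool toy_trylock(struct toy_lock *lock, int cpu)
{
    if (lock->held)
        return false;
    lock->held = true;
    lock->owner_cpu = cpu;
    return true;
}

/*
 * Model of the panicking CPU spinning on the lock.
 * max_attempts is an assumption of this demo so that it terminates.
 * Returns the number of failed attempts before success,
 * or -1 if the lock was never acquired.
 */
static int toy_spin(struct toy_lock *lock, int cpu, int max_attempts)
{
    assert(max_attempts > 0);
    for (int i = 0; i < max_attempts; i++) {
        if (toy_trylock(lock, cpu))
            return i;
    }
    return -1;
}
```

If the lock starts out held by a CPU that has since been stopped, nothing ever clears `held`, so `toy_spin` fails no matter how large `max_attempts` is. That is the situation the panicking CPU would be in.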

RE: [RFC][PATCH] pstore: Skip spinlock when just one cpu is online

2012-12-07 Thread Seiji Aguchi
> Can all these things really happen (did you run into this problem on a real
> system?). Or is this just a theoretical problem. Ugly (but
> practical) hacks might be OK to solve real problems.
It is a theoretical problem right now. But it is a timing issue and there is a possibility to happen

RE: [RFC][PATCH] pstore: Skip spinlock when just one cpu is online

2012-12-07 Thread Luck, Tony
> This patch skips taking a psinfo->buf_lock when just one cpu is online
> because stopped cpus turn to offline via smp_send_stop()
> in some architectures like x86, powerpc or arm64.
That seems an impressive list of preconditions. So for this to help we need to have taken all but one cpu offline,
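
The idea in the quoted patch description can be sketched as a small userspace model. It is not the actual patch: `toy_store`, `toy_dump` and the `online_cpus` parameter are hypothetical, and the real code would query the kernel for the number of online CPUs and spin instead of returning early.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a pstore backend: one lock flag and a record count. */
struct toy_store {
    bool lock_held;
    int records;
};

/*
 * Write one record. With more than one CPU online the lock is required;
 * with exactly one CPU online the lock is skipped, as the patch proposes.
 * Returns true if a record was written.
 */
static bool toy_dump(struct toy_store *s, int online_cpus)
{
    bool took_lock = false;

    assert(online_cpus >= 1);
    if (online_cpus > 1) {
        if (s->lock_held)
            return false;   /* a real spinlock would spin here forever */
        s->lock_held = true;
        took_lock = true;
    }

    s->records++;           /* stands in for writing the kmsg record */

    if (took_lock)
        s->lock_held = false;
    return true;
}
```

With a stale lock left behind by a stopped CPU, `toy_dump(&s, 2)` cannot write, while `toy_dump(&s, 1)` writes the record and leaves the stale lock untouched. The sketch says nothing about whether the buffer is in a consistent state when the lock is skipped, which is part of the concern raised in the reply.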
