> A boot argument might help, so we can force use of pstore in cases where
> kdump is failing (or prevent use of pstore in cases where it
> seems to be preventing us from getting to kdump ... I don't have a preference).
> BUT this would only be useful if we had a repeatable problem so that we could
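The boot-argument idea above could take the shape of a small policy switch. This is a user-space sketch only, not a proposed patch: the parameter name `pstore_panic=`, its `force`/`skip` values, and every identifier below are hypothetical. A real kernel version would register the parameter through the kernel's own boot-parameter helpers instead of a hand-written parser.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* HYPOTHETICAL values of an imagined "pstore_panic=" boot argument. */
enum pstore_panic_policy {
    PSTORE_PANIC_AUTO,  /* no override: keep the kernel's existing behaviour */
    PSTORE_PANIC_FORCE, /* always dump through pstore, e.g. when kdump keeps failing */
    PSTORE_PANIC_SKIP,  /* skip pstore when kdump is loaded, so it cannot block kdump */
};

/* Map the text after '=' to a policy; missing or unknown values mean AUTO. */
static enum pstore_panic_policy parse_pstore_panic(const char *val)
{
    if (val == NULL)
        return PSTORE_PANIC_AUTO;
    if (strcmp(val, "force") == 0)
        return PSTORE_PANIC_FORCE;
    if (strcmp(val, "skip") == 0)
        return PSTORE_PANIC_SKIP;
    return PSTORE_PANIC_AUTO;
}

/* Should the panic path call into pstore?  auto_default is whatever the
 * kernel would do without the boot argument (an input here, not modelled). */
static bool use_pstore_on_panic(enum pstore_panic_policy policy,
                                bool kdump_loaded, bool auto_default)
{
    switch (policy) {
    case PSTORE_PANIC_FORCE:
        return true;
    case PSTORE_PANIC_SKIP:
        /* With no kdump kernel loaded there is nothing for pstore to block. */
        return !kdump_loaded;
    case PSTORE_PANIC_AUTO:
        return auto_default;
    }
    assert(!"unhandled pstore_panic policy");
    return auto_default;
}
```

As the message says, such a switch is mainly useful once there is a repeatable failure to choose between the two paths.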
> > If we can fix it with a small patch in advance, it would be really helpful for us.
>
> As I said in the email I just sent, it may not help you without testing it,
> as there are probably other problems in that untested theoretical scenario.
OK, I understand.
> >
> > 2)
> > In the long term, I plan to add a
> Now my first reaction would be: if that is the scenario, why couldn't cpuA
> release the lock within one second? Because if cpuA is stuck
> talking with firmware, then your patch to force the unlock is probably going
> to trip over the same problems.
> (those problems include dealing with
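The forced-unlock approach discussed above (wait up to about one second for cpuA to release the lock, then carry on anyway) can be modelled in user space roughly as follows. This is an illustrative sketch, not the patch under discussion: `toy_lock`, `lock_or_force()` and the nanosecond timeout are hypothetical, and a C11 `atomic_flag` stands in for the kernel spinlock.

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime() and CLOCK_MONOTONIC */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* HYPOTHETICAL stand-in for psinfo->buf_lock: a C11 atomic_flag spinlock. */
typedef struct {
    atomic_flag held;
} toy_lock;

static bool toy_trylock(toy_lock *l)
{
    /* test_and_set returns the old value, so false means we now own it. */
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

static void toy_unlock(toy_lock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

static long long now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/*
 * Spin on the lock for at most timeout_ns (one second would be
 * 1000000000).  Returns true if the holder released it in time.
 * Returns false if the deadline passed: the flag is still set, and the
 * caller carries on as if it owned the lock and clears it with
 * toy_unlock() when done, which is the "forced unlock".
 */
static bool lock_or_force(toy_lock *l, long long timeout_ns)
{
    assert(timeout_ns >= 0);
    long long deadline = now_ns() + timeout_ns;
    do {
        if (toy_trylock(l))
            return true;
    } while (now_ns() < deadline);
    return false;
}
```

The timeout only gets the panicking CPU past the lock; it does nothing about whatever kept cpuA from releasing it, which is the concern raised above.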
> But you are assuming that kmsg_dump is perfect, and it isn't, in which case,
> by putting kmsg_dump in the kdump path, you may actually be blocking kdump
> from working.
I think the concern is that kdump isn't perfect, so sometimes we don't get a
good dump from it. In those cases it would have been
On Fri, Dec 07, 2012 at 11:43:03PM +, Seiji Aguchi wrote:
> > Can all these things really happen (did you run into this problem on a real
> > system?). Or is this just a theoretical problem? Ugly (but
> > practical) hacks might be OK to solve real problems.
>
> It is a theoretical problem
On Fri, Dec 07, 2012 at 09:41:13PM +, Seiji Aguchi wrote:
> [Issue]
>
> If a CPU that is holding psinfo->buf_lock
> receives an NMI from a panicked CPU via smp_send_stop(),
> the panicked CPU hangs in pstore_dump(), called by
> kmsg_dump(KMSG_DUMP_PANIC),
> because psinfo->buf_lock is taken
> Can all these things really happen (did you run into this problem on a real
> system?). Or is this just a theoretical problem? Ugly (but
> practical) hacks might be OK to solve real problems.
It is a theoretical problem right now.
But it is a timing issue, so there is a possibility that it could happen
> This patch skips taking psinfo->buf_lock when only one CPU is online,
> because stopped CPUs go offline via smp_send_stop()
> on some architectures, such as x86, powerpc and arm64.
That seems an impressive list of preconditions. So for this to
help, we need to have taken all but one CPU offline,
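The rule in the patch description (skip psinfo->buf_lock when only one CPU is online) can be sketched as below. This is a user-space model under stated assumptions, not the patch itself: `toy_lock`, `dump_with_optional_lock()` and the `online_cpus` parameter are hypothetical stand-ins for psinfo->buf_lock, pstore_dump() and the kernel's online-CPU count, and incrementing a counter stands in for copying the log into the backend.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* HYPOTHETICAL stand-in for psinfo->buf_lock: a C11 atomic_flag spinlock. */
typedef struct {
    atomic_flag held;
} toy_lock;

static bool toy_trylock(toy_lock *l)
{
    /* test_and_set returns the old value, so false means we now own it. */
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* Plain spin_lock() analogue: never returns if the holder has been stopped. */
static void toy_lock_acquire(toy_lock *l)
{
    while (!toy_trylock(l))
        ;
}

static void toy_unlock(toy_lock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/*
 * HYPOTHETICAL model of the dump path with the patch's rule applied.
 * online_cpus stands in for the number of CPUs still online after
 * smp_send_stop(); *records counts dumps instead of copying the log.
 * Returns true if the lock was taken, false if it was skipped.
 */
static bool dump_with_optional_lock(toy_lock *buf_lock, int online_cpus,
                                    int *records)
{
    assert(online_cpus >= 1);
    /* With a single CPU online nothing can race on the buffer, and a lock
     * left held by a stopped CPU would never be released, so skip it. */
    bool take_lock = online_cpus > 1;

    if (take_lock)
        toy_lock_acquire(buf_lock);
    (*records)++;
    if (take_lock)
        toy_unlock(buf_lock);
    return take_lock;
}
```

The lock is bypassed only when `online_cpus` is 1, so the sketch helps only under the precondition questioned in the reply: every other CPU must already have gone offline.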