Re: [Openipmi-developer] [PATCH 24/30] panic: Refactor the panic path

2022-06-14 Thread Petr Mladek via Openipmi-developer
On Thu 2022-05-26 13:25:57, Guilherme G. Piccoli wrote:
> OK, so it seems we have some points in which agreement exists, and some
> points that there is no agreement and instead, we have antagonistic /
> opposite views and needs. Let's start with the easier part heh
>
> It seems everybody agrees that *we shouldn't over-engineer things*, and
> as per Eric good words: making the panic path more feature-full or
> increasing flexibility isn't a good idea. So, as a "corollary": the
> panic level approach I'm proposing is not a good fit, I'll drop it and
> let's go with something simpler.

Makes sense.

> Another point of agreement seems to be that _notifier lists in the panic
> path are dangerous_, for *2 different reasons*:
> 
> (a) We cannot guarantee that people won't add crazy callbacks there, we
> can plan and document things the best as possible - it'll never be
> enough, somebody eventually would slip a nonsense callback that would
> break things and defeat the planned purpose of such a list;

It is true that notifier lists might allow to add crazy stuff
without proper review more easily. Things added into the core
code would most likely get better review.

But nothing is error-proof. And bugs will happen with any approach.


> (b) As per Eric point, in a panic/crash situation we might have memory
> corruption exactly in the list code / pointers, etc, so the notifier
> lists are, by nature, a bit fragile. But I think we shouldn't consider
> it completely "bollocks", since this approach has been used for a while
> with a good success rate. So, lists aren't perfect at all, but at the
> same time, they aren't completely useless.

I am not able to judge this. Of course, any extra step increases
the risk. I am just not sure how much more complicated it would
be to hardcode the calls. Most of them are architecture
and/or feature specific. And such code is often hard to
review and maintain.

> To avoid using a 4th list,

4th or 5th? We already have "hypervisor", "info", "pre-reboot", and "pre-loop".
The 5th might be pre-crash-exec.

> especially given the list nature is a bit
> fragile, I'd suggest one of the 3 following approaches - I *really
> appreciate feedbacks* on that so I can implement the best solution and
> avoid wasting time in some poor/disliked solution:

Honestly, I am not able to decide what might be better without seeing
the code.

Most things fits pretty well into the 4 proposed lists:
"hypervisor", "info", "pre-reboot", and "pre-loop". IMHO, the
only question is the code that needs to be always called
even before crash_dump.

I suggest that you solve the crash_dump callbacks the way that
looks best to you. Ideally do it in a separate patch so it can be
reviewed and reworked more easily.

I believe that a fresh code with an updated split and simplified
logic would help us to move forward.

Best Regards,
Petr


___
Openipmi-developer mailing list
Openipmi-developer@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openipmi-developer


Re: [Openipmi-developer] [PATCH 19/30] panic: Add the panic hypervisor notifier list

2022-05-24 Thread Petr Mladek via Openipmi-developer
On Mon 2022-05-23 11:56:12, Guilherme G. Piccoli wrote:
> On 19/05/2022 16:20, Scott Branden wrote:
> > [...] 
> >> Hi Scott / Desmond, thanks for the detailed answer! Is this adapter
> >> designed to run in x86 only or you have other architectures' use cases?
> > The adapter may be used in any PCIe design that supports DMA.
> > So it may be possible to run in arm64 servers.
> >>
> >> [...]
> >> With that said, and given this is a lightweight notifier that ideally
> >> should run ASAP, I'd keep this one in the hypervisor list. We can
> >> "adjust" the semantic of this list to include lightweight notifiers that
> >> reset adapters.
> > Sounds the best to keep system operating as tested today.
> >>
> >> With that said, Petr has a point - not always such list is going to be
> >> called before kdump. So, that makes me think in another idea: what if we
> >> have another list, but not on panic path, but instead in the custom
> >> crash_shutdown()? Drivers could add callbacks there that must execute
> >> before kexec/kdump, no matter what.
> > It may be beneficial for some other drivers but for our use we would 
> > then need to register for the panic path and the crash_shutdown path. 
> > We notify the VK card for 2 purposes: one to stop DMA so memory stop 
> > changing during a kdump.  And also to get the card into a good state so 
> > resets happen cleanly.
> 
> Thanks Scott! With that, I guess it's really better to keep this
> notifier in this hypervisor/early list - I'm planning to do that for V2.
> Unless Petr or somebody has strong feelings against that, of course.

I am fine with it because we do not have a better solution at the
moment.

It might be a good candidate for the 5th notifier list mentioned
in the thread https://lore.kernel.org/r/YoyQyHHfhIIXSX0U@alley .
But I am not sure if the 5th list is worth the complexity.

Best Regards,
Petr


___
Openipmi-developer mailing list
Openipmi-developer@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openipmi-developer


Re: [Openipmi-developer] [PATCH 24/30] panic: Refactor the panic path

2022-05-24 Thread Petr Mladek via Openipmi-developer
On Fri 2022-05-20 08:23:33, Guilherme G. Piccoli wrote:
> On 19/05/2022 20:45, Baoquan He wrote:
> > [...]
> >> I really appreciate the summary skill you have, to convert complex
> >> problems in very clear and concise ideas. Thanks for that, very useful!
> >> I agree with what was summarized above.
> > 
> > I want to say the similar words to Petr's reviewing comment when I went
> > through the patches and traced each reviewing sub-thread to try to
> > catch up. Petr has reivewed this series so carefully and given many
> > comments I want to ack immediately.
> > 
> > I agree with most of the suggestions from Petr to this patch, except of
> > one tiny concern, please see below inline comment.
> 
> Hi Baoquan, thanks! I'm glad you're also reviewing that =)
> 
> 
> > [...]
> > 
> > I like the proposed skeleton of panic() and code style suggested by
> > Petr very much. About panic_prefer_crash_dump which might need be added,
> > I hope it has a default value true. This makes crash_dump execute at
> > first by default just as before, unless people specify
> > panic_prefer_crash_dump=0|n|off to disable it. Otherwise we need add
> > panic_prefer_crash_dump=1 in kernel and in our distros to enable kdump,
> > this is inconsistent with the old behaviour.
> 
> I'd like to understand better why the crash_kexec() must always be the
> first thing in your use case. If we keep that behavior, we'll see all
> sorts of workarounds - see the last patches of this series, Hyper-V and
> PowerPC folks hardcoded "crash_kexec_post_notifiers" in order to force
> execution of their relevant notifiers (like the vmbus disconnect,
> specially in arm64 that has no custom machine_crash_shutdown, or the
> fadump case in ppc). This led to more risk in kdump.
> 
> The thing is: with the notifiers' split, we tried to keep only the most
> relevant/necessary stuff in this first list, things that ultimately
> should improve kdump reliability or if not, at least not break it. My
> feeling is that, with this series, we should change the idea/concept
> that kdump must run first nevertheless, not matter what. We're here
> trying to accommodate the antagonistic goals of hypervisors that need
> some clean-up (even for kdump to work) VS. kdump users, that wish a
> "pristine" system reboot ASAP after the crash.

Good question. I wonder if Baoquan knows about problems caused by the
particular notifiers that will end up in the hypervisor list. Note
that there will be some shuffles and the list will be slightly
different in V2.

Anyway, I see four possible solutions:

  1. The most conservative approach is to keep the current behavior
 and call kdump first by default.

  2. A medium conservative approach to change the default default
 behavior and call hypervisor and eventually the info notifiers
 before kdump. There still would be the possibility to call kdump
 first by the command line parameter.

  3. Remove the possibility to call kdump first completely. It would
 assume that all the notifiers in the info list are super safe
 or that they make kdump actually more safe.

  4. Create one more notifier list for operations that always should
 be called before crash_dump.

Regarding the extra notifier list (4th solution). It is not clear to
me whether it would be always called even before hypervisor list or
when kdump is not enabled. We must not over-engineer it.

2nd proposal looks like a good compromise. But maybe we could do
this change few releases later. The notifiers split is a big
change on its own.

Best Regards,
Petr


___
Openipmi-developer mailing list
Openipmi-developer@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openipmi-developer


Re: [Openipmi-developer] [PATCH 19/30] panic: Add the panic hypervisor notifier list

2022-05-19 Thread Petr Mladek via Openipmi-developer
On Wed 2022-05-18 10:16:20, Guilherme G. Piccoli wrote:
> On 18/05/2022 04:58, Petr Mladek wrote:
> > [...]
> >> I does similar things like kmsg_dump() so it should be called in
> >> the same location (after info notifier list and before kdump).
> >>
> >> A solution might be to put it at these notifiers at the very
> >> end of the "info" list or make extra "dump" notifier list.
> > 
> > I just want to point out that the above idea has problems.
> > Notifiers storing kernel log need to be treated as kmsg_dump().
> > In particular, we would  need to know if there are any.
> > We do not need to call "info" notifier list before kdump
> > when there is no kernel log dumper registered.
> > 
> 
> Notifiers respect the priority concept, which is just a number that
> orders the list addition (and the list is called in order).
> 
> I've used the last position to panic_print() [in patch 25] - one idea
> here is to "reserve" the last position (represented by INT_MIN) for
> notifiers that act like kmsg_dump(). I couldn't find any IIRC, but that
> doesn't prevent us to save this position and comment about that.

I would ignore it for now. If anyone would want to safe the log
then they would need to read it. They will most likely use
the existing kmsg_dump() infastructure. In fact, they should
use it to avoid a code duplication.

Best Regards,
Petr


___
Openipmi-developer mailing list
Openipmi-developer@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openipmi-developer


Re: [Openipmi-developer] [PATCH 19/30] panic: Add the panic hypervisor notifier list

2022-05-18 Thread Petr Mladek via Openipmi-developer
On Tue 2022-05-17 15:57:34, Petr Mladek wrote:
> On Mon 2022-05-16 12:06:17, Guilherme G. Piccoli wrote:
> > >> --- a/drivers/soc/bcm/brcmstb/pm/pm-arm.c
> > >> +++ b/drivers/soc/bcm/brcmstb/pm/pm-arm.c
> > >> @@ -814,7 +814,7 @@ static int brcmstb_pm_probe(struct platform_device 
> > >> *pdev)
> > >>  goto out;
> > >>  }
> > >>  
> > >> -atomic_notifier_chain_register(_notifier_list,
> > >> +atomic_notifier_chain_register(_hypervisor_list,
> > >> _pm_panic_nb);
> > > 
> > > I am not sure about this one. It instruct some HW to preserve DRAM.
> > > IMHO, it better fits into pre_reboot category but I do not have
> > > strong opinion.
> > 
> > Disagree here, I'm CCing Florian for information.
> > 
> > This notifier preserves RAM so it's *very interesting* if we have
> > kmsg_dump() for example, but maybe might be also relevant in case kdump
> > kernel is configured to store something in a persistent RAM (then,
> > without this notifier, after kdump reboots the system data would be lost).
> 
> I see. It is actually similar problem as with
> drivers/firmware/google/gsmi.c.

As discussed in the other other reply, it seems that both affected
notifiers do not store kernel logs and should stay in the "hypervisor".

> I does similar things like kmsg_dump() so it should be called in
> the same location (after info notifier list and before kdump).
>
> A solution might be to put it at these notifiers at the very
> end of the "info" list or make extra "dump" notifier list.

I just want to point out that the above idea has problems.
Notifiers storing kernel log need to be treated as kmsg_dump().
In particular, we would  need to know if there are any.
We do not need to call "info" notifier list before kdump
when there is no kernel log dumper registered.

Best Regards,
Petr


___
Openipmi-developer mailing list
Openipmi-developer@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openipmi-developer


Re: [Openipmi-developer] [PATCH 19/30] panic: Add the panic hypervisor notifier list

2022-05-18 Thread Petr Mladek via Openipmi-developer
On Tue 2022-05-17 13:42:06, Guilherme G. Piccoli wrote:
> On 17/05/2022 10:57, Petr Mladek wrote:
> >> Disagree here, I'm CCing Florian for information.
> >>
> >> This notifier preserves RAM so it's *very interesting* if we have
> >> kmsg_dump() for example, but maybe might be also relevant in case kdump
> >> kernel is configured to store something in a persistent RAM (then,
> >> without this notifier, after kdump reboots the system data would be lost).
> > 
> > I see. It is actually similar problem as with
> > drivers/firmware/google/gsmi.c.
> > 
> > I does similar things like kmsg_dump() so it should be called in
> > the same location (after info notifier list and before kdump).
> > 
> > A solution might be to put it at these notifiers at the very
> > end of the "info" list or make extra "dump" notifier list.
> 
> Here I still disagree. I've commented in the other response thread
> (about Google gsmi) about the semantics of the hypervisor list, but
> again: this list should contain callbacks that
> 
> (a) Should run early, _by default_ before a kdump;
> (b) Communicate with the firmware/hypervisor in a "low-risk" way;
> 
> Imagine a scenario where users configure kdump kernel to save something
> in a persistent form in DRAM - it'd be like a late pstore, in the next
> kernel. This callback enables that, it's meant to inform FW "hey, panic
> happened, please from now on don't clear the RAM in the next FW-reboot".
> I don't see a reason to postpone that - let's see if the maintainers
> have an opinion.

I have answered this in more detail in the other reply, see
https://lore.kernel.org/r/YoShZVYNAdvvjb7z@alley

I agree that both notifiers in

drivers/soc/bcm/brcmstb/pm/pm-arm.c
drivers/firmware/google/gsmi.c

better fit into the hypervisor list after all.

Best Regards,
Petr


___
Openipmi-developer mailing list
Openipmi-developer@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openipmi-developer


Re: [Openipmi-developer] [PATCH 19/30] panic: Add the panic hypervisor notifier list

2022-05-18 Thread Petr Mladek via Openipmi-developer
On Tue 2022-05-17 13:37:58, Guilherme G. Piccoli wrote:
> On 17/05/2022 10:28, Petr Mladek wrote:
> > [...]
> >>> Disagree here. I'm looping Google maintainers, so they can comment.
> >>> (CCed Evan, David, Julius)
> >>>
> >>> This notifier is clearly a hypervisor notification mechanism. I've fixed
> >>> a locking stuff there (in previous patch), I feel it's low-risk but even
> >>> if it's mid-risk, the class of such callback remains a perfect fit with
> >>> the hypervisor list IMHO.
> >>
> >> This logs a panic to our "eventlog", a tiny logging area in SPI flash
> >> for critical and power-related events. In some cases this ends up
 > >> being the only clue we get in a Chromebook feedback report that a
> >> panic occurred, so from my perspective moving it to the front of the
> >> line seems like a good idea.
> > 
> > IMHO, this would really better fit into the pre-reboot notifier list:
> > 
> >+ the callback stores the log so it is similar to kmsg_dump()
> >  or console_flush_on_panic()
> > 
> >+ the callback should be proceed after "info" notifiers
> >  that might add some other useful information.
> > 
> > Honestly, I am not sure what exactly hypervisor callbacks do. But I
> > think that they do not try to extract the kernel log because they
> > would need to handle the internal format.
> > 
> 
> I guess the main point in your response is : "I am not sure what exactly
> hypervisor callbacks do". We need to be sure about the semantics of such
> list, and agree on that.
> 
> So, my opinion about this first list, that we call "hypervisor list",
> is: it contains callbacks that
> 
> (1) should run early, preferably before kdump (or even if kdump isn't
> set, should run ASAP);
> 
> (2) these callbacks perform some communication with an abstraction that
> runs "below" the kernel, like a firmware or hypervisor. Classic example:
> pvpanic, that communicates with VMM (usually qemu) and allow such VMM to
> snapshot the full guest memory, for example.
> 
> (3) Should be low-risk. What defines risk is the level of reliability of
> subsequent operations - if the callback have 50% of chance of "bricking"
> the system totally and prevent kdump / kmsg_dump() / reboot , this is
> high risk one for example.
> 
> Some good fits IMO: pvpanic, sstate_panic_event() [sparc], fadump in
> powerpc, etc.
> 
> So, this is a good case for the Google notifier as well - it's not
> collecting data like the dmesg (hence your second bullet seems to not
> apply here, info notifiers won't add info to be collected by gsmi). It
> is a firmware/hypervisor/whatever-gsmi-is notification mechanism, that
> tells such "lower" abstraction a panic occurred. It seems low risk and
> we want it to run ASAP, if possible.

" 
> >> This logs a panic to our "eventlog", a tiny logging area in SPI flash
> >> for critical and power-related events. In some cases this ends up

I see. I somehow assumed that it was about the kernel log because
Evans wrote:

  "This logs a panic to our "eventlog", a tiny logging area in SPI flash
   for critical and power-related events. In some cases this ends up"


Anyway, I would distinguish it the following way.

  + If the notifier is preserving kernel log then it should be ideally
treated as kmsg_dump().

  + It the notifier is saving another debugging data then it better
fits into the "hypervisor" notifier list.


Regarding the reliability. From my POV, any panic notifier enabled
in a generic kernel should be reliable with more than 99,9%.
Otherwise, they should not be in the notifier list at all.

An exception would be a platform-specific notifier that is
called only on some specific platform and developers maintaining
this platform agree on this.

The value "99,9%" is arbitrary. I am not sure if it is realistic
even in the other code, for example, console_flush_on_panic()
or emergency_restart(). I just want to point out that the border
should be rather high. Otherwise we would back in the situation
where people would want to disable particular notifiers.

Best Regards,
Petr


___
Openipmi-developer mailing list
Openipmi-developer@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openipmi-developer


Re: [Openipmi-developer] [PATCH 21/30] panic: Introduce the panic pre-reboot notifier list

2022-05-17 Thread Petr Mladek via Openipmi-developer
On Mon 2022-05-16 13:33:51, Guilherme G. Piccoli wrote:
> On 16/05/2022 13:18, Luck, Tony wrote:
> >> [...]
> > Would it be possible to have some global "kdump is configured + enabled" 
> > flag?
> > 
> > Then notifiers could make an informed choice on whether to deep dive to
> > get all the possible details (when there is no kdump) or just skim the high
> > level stuff (to maximize chance of getting a successful kdump).
> > 
> > -Tony
> 
> Good idea Tony! What if I wire a kexec_crash_loaded() in the notifier?

I like this idea.

One small problem is that kexec_crash_loaded() has valid result
only under kexec_mutex. On the other hand, it should stay true
once loaded so that the small race window should be innocent.

> With that, are you/Petr/Dinh OK in moving it for the info list?

Sounds good to me.

Best Regards,
Petr


___
Openipmi-developer mailing list
Openipmi-developer@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openipmi-developer


Re: [Openipmi-developer] [PATCH 19/30] panic: Add the panic hypervisor notifier list

2022-05-17 Thread Petr Mladek via Openipmi-developer
On Mon 2022-05-16 12:06:17, Guilherme G. Piccoli wrote:
> Thanks for the review!
> 
> I agree with the blinking stuff, I can rework and add all LED/blinking
> stuff into the loop list, it does make sense. I'll comment a bit in the
> others below...
> 
> On 16/05/2022 11:01, Petr Mladek wrote:
> >> --- a/drivers/firmware/google/gsmi.c
> >> +++ b/drivers/firmware/google/gsmi.c
> >> @@ -1034,7 +1034,7 @@ static __init int gsmi_init(void)
> >>  
> >>register_reboot_notifier(_reboot_notifier);
> >>register_die_notifier(_die_notifier);
> >> -  atomic_notifier_chain_register(_notifier_list,
> >> +  atomic_notifier_chain_register(_hypervisor_list,
> >>   _panic_notifier);
> > 
> > I am not sure about this one. It looks like some logging or
> > pre_reboot stuff.
> > 
> 
> Disagree here. I'm looping Google maintainers, so they can comment.
> (CCed Evan, David, Julius)
> 
> This notifier is clearly a hypervisor notification mechanism. I've fixed
> a locking stuff there (in previous patch), I feel it's low-risk but even
> if it's mid-risk, the class of such callback remains a perfect fit with
> the hypervisor list IMHO.

It is similar to drivers/soc/bcm/brcmstb/pm/pm-arm.c.
See below for another idea.

> >> --- a/drivers/misc/bcm-vk/bcm_vk_dev.c
> >> +++ b/drivers/misc/bcm-vk/bcm_vk_dev.c
> >> @@ -1446,7 +1446,7 @@ static int bcm_vk_probe(struct pci_dev *pdev, const 
> >> struct pci_device_id *ent)
> >>  
> >>/* register for panic notifier */
> >>vk->panic_nb.notifier_call = bcm_vk_on_panic;
> >> -  err = atomic_notifier_chain_register(_notifier_list,
> >> +  err = atomic_notifier_chain_register(_hypervisor_list,
> >> >panic_nb);
> > 
> > It seems to reset some hardware or so. IMHO, it should go into the
> > pre-reboot list.
> 
> Mixed feelings here, I'm looping Broadcom maintainers to comment.
> (CC Scott and Broadcom list)
> 
> I'm afraid it breaks kdump if this device is not reset beforehand - it's
> a doorbell write, so not high risk I think...
> 
> But in case the not-reset device can be probed normally in kdump kernel,
> then I'm fine in moving this to the reboot list! I don't have the HW to
> test myself.

Good question. Well, it if has to be called before kdump then
even "hypervisor" list is a wrong place because is not always
called before kdump.


> >> --- a/drivers/power/reset/ltc2952-poweroff.c
> >> +++ b/drivers/power/reset/ltc2952-poweroff.c
> >> @@ -279,7 +279,7 @@ static int ltc2952_poweroff_probe(struct 
> >> platform_device *pdev)
> >>pm_power_off = ltc2952_poweroff_kill;
> >>  
> >>data->panic_notifier.notifier_call = ltc2952_poweroff_notify_panic;
> >> -  atomic_notifier_chain_register(_notifier_list,
> >> +  atomic_notifier_chain_register(_hypervisor_list,
> >>   >panic_notifier);
> > 
> > I looks like this somehow triggers the reboot. IMHO, it should go
> > into the pre_reboot list.
> 
> Mixed feeling again here - CCing the maintainers for comments (Sebastian
> / PM folks).
> 
> This is setting a variable only, and once it's set (data->kernel_panic
> is the bool's name), it just bails out the IRQ handler and a timer
> setting - this timer seems kinda tricky, so bailing out ASAP makes sense
> IMHO.

IMHO, the timer informs the hardware that the system is still alive
in the middle of panic(). If the timer is not working then the
hardware (chip) will think that the system frozen in panic()
and will power off the system. See the comments in
drivers/power/reset/ltc2952-poweroff.c:

 * The following GPIOs are used:
 * - trigger (input)
 * A level change indicates the shut-down trigger. If it's state reverts
 * within the time-out defined by trigger_delay, the shut down is not
 * executed. If no pin is assigned to this input, the driver will start the
 * watchdog toggle immediately. The chip will only power off the system if
 * it is requested to do so through the kill line.
 *
 * - watchdog (output)
 * Once a shut down is triggered, the driver will toggle this signal,
 * with an internal (wde_interval) to stall the hardware shut down.

IMHO, we really have to keep it alive until we reach the reboot stage.

Another question is how it actually works when the interrupts are
disabled during panic() and the timer callbacks are not handled.


> > [...]
> >> --- a/drivers/soc/bcm/brcmstb/pm/pm-arm.c
> >> +++ b/drivers/soc/bcm/brcmstb/pm/pm-arm.c
> >> @@ -814,7 +814,7 @@ static int brcmstb_pm_probe(struct platform_device 
> >> *pdev)
> >>goto out;
> >>}
> >>  
> >> -  atomic_notifier_chain_register(_notifier_list,
> >> +  atomic_notifier_chain_register(_hypervisor_list,
> >>   _pm_panic_nb);
> > 
> > I am not sure about this one. It instruct some HW to preserve DRAM.
> > IMHO, it better fits into pre_reboot category but I do not have
> > strong opinion.
> 
> Disagree here, I'm CCing Florian for information.
> 

Re: [Openipmi-developer] [PATCH 19/30] panic: Add the panic hypervisor notifier list

2022-05-17 Thread Petr Mladek via Openipmi-developer
On Mon 2022-05-16 09:02:10, Evan Green wrote:
> On Mon, May 16, 2022 at 8:07 AM Guilherme G. Piccoli
>  wrote:
> >
> > Thanks for the review!
> >
> > I agree with the blinking stuff, I can rework and add all LED/blinking
> > stuff into the loop list, it does make sense. I'll comment a bit in the
> > others below...
> >
> > On 16/05/2022 11:01, Petr Mladek wrote:
> > > [...]
> > >> --- a/arch/mips/sgi-ip22/ip22-reset.c
> > >> +++ b/arch/mips/sgi-ip22/ip22-reset.c
> > >> @@ -195,7 +195,7 @@ static int __init reboot_setup(void)
> > >>  }
> > >>
> > >>  timer_setup(_timer, blink_timeout, 0);
> > >> -atomic_notifier_chain_register(_notifier_list, _block);
> > >> +atomic_notifier_chain_register(_hypervisor_list, 
> > >> _block);
> > >
> > > This notifier enables blinking. It is not much safe. It calls
> > > mod_timer() that takes a lock internally.
> > >
> > > This kind of functionality should go into the last list called
> > > before panic() enters the infinite loop. IMHO, all the blinking
> > > stuff should go there.
> > > [...]
> > >> --- a/arch/mips/sgi-ip32/ip32-reset.c
> > >> +++ b/arch/mips/sgi-ip32/ip32-reset.c
> > >> @@ -145,7 +144,7 @@ static __init int ip32_reboot_setup(void)
> > >>  pm_power_off = ip32_machine_halt;
> > >>
> > >>  timer_setup(_timer, blink_timeout, 0);
> > >> -atomic_notifier_chain_register(_notifier_list, _block);
> > >> +atomic_notifier_chain_register(_hypervisor_list, 
> > >> _block);
> > >
> > > Same here. Should be done only before the "loop".
> > > [...]
> >
> > Ack.
> >
> >
> > >> --- a/drivers/firmware/google/gsmi.c
> > >> +++ b/drivers/firmware/google/gsmi.c
> > >> @@ -1034,7 +1034,7 @@ static __init int gsmi_init(void)
> > >>
> > >>  register_reboot_notifier(_reboot_notifier);
> > >>  register_die_notifier(_die_notifier);
> > >> -atomic_notifier_chain_register(_notifier_list,
> > >> +atomic_notifier_chain_register(_hypervisor_list,
> > >> _panic_notifier);
> > >
> > > I am not sure about this one. It looks like some logging or
> > > pre_reboot stuff.
> > >
> >
> > Disagree here. I'm looping Google maintainers, so they can comment.
> > (CCed Evan, David, Julius)
> >
> > This notifier is clearly a hypervisor notification mechanism. I've fixed
> > a locking stuff there (in previous patch), I feel it's low-risk but even
> > if it's mid-risk, the class of such callback remains a perfect fit with
> > the hypervisor list IMHO.
>
> This logs a panic to our "eventlog", a tiny logging area in SPI flash
> for critical and power-related events. In some cases this ends up
> being the only clue we get in a Chromebook feedback report that a
> panic occurred, so from my perspective moving it to the front of the
> line seems like a good idea.

IMHO, this would really better fit into the pre-reboot notifier list:

   + the callback stores the log so it is similar to kmsg_dump()
 or console_flush_on_panic()

   + the callback should be proceed after "info" notifiers
 that might add some other useful information.

Honestly, I am not sure what exactly hypervisor callbacks do. But I
think that they do not try to extract the kernel log because they
would need to handle the internal format.

Best Regards,
Petr


___
Openipmi-developer mailing list
Openipmi-developer@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openipmi-developer


Re: [Openipmi-developer] [PATCH 14/30] panic: Properly identify the panic event to the notifiers' callbacks

2022-05-17 Thread Petr Mladek via Openipmi-developer
On Tue 2022-05-10 13:16:54, Guilherme G. Piccoli wrote:
> On 10/05/2022 12:16, Petr Mladek wrote:
> > [...]
> > Hmm, this looks like a hack. PANIC_UNUSED will never be used.
> > All notifiers will be always called with PANIC_NOTIFIER.
> > 
> > The @val parameter is normally used when the same notifier_list
> > is used in different situations.
> > 
> > But you are going to use it when the same notifier is used
> > in more lists. This is normally distinguished by the @nh
> > (atomic_notifier_head) parameter.
> > 
> > IMHO, it is a bad idea. First, it would confuse people because
> > it does not follow the original design of the parameters.
> > Second, the related code must be touched anyway when
> > the notifier is moved into another list so it does not
> > help much.
> > 
> > Or do I miss anything, please?
> > 
> > Best Regards,
> > Petr
> 
> Hi Petr, thanks for the review.
> 
> I'm not strong attached to this patch, so we could drop it and refactor
> the code of next patches to use the @nh as identification - but
> personally, I feel this parameter could be used to identify the list
> that called such function, in other words, what is the event that
> triggered the callback. Some notifiers are even declared with this
> parameter called "ev", like the event that triggers the notifier.
> 
> 
> You mentioned 2 cases:
> 
> (a) Same notifier_list used in different situations;
> 
> (b) Same *notifier callback* used in different lists;
> 
> Mine is case (b), right? Can you show me an example of case (a)?

There are many examples of case (a):

   + module_notify_list:
MODULE_STATE_LIVE,  /* Normal state. */
MODULE_STATE_COMING,/* Full formed, running module_init. */
MODULE_STATE_GOING, /* Going away. */
MODULE_STATE_UNFORMED,  /* Still setting it up. */


   + netdev_chain:

NETDEV_UP   = 1,/* For now you can't veto a device up/down */
NETDEV_DOWN,
NETDEV_REBOOT,  /* Tell a protocol stack a network interface
   detected a hardware crash and restarted
   - we can use this eg to kick tcp sessions
   once done */
NETDEV_CHANGE,  /* Notify device state change */
NETDEV_REGISTER,
NETDEV_UNREGISTER,
NETDEV_CHANGEMTU,   /* notify after mtu change happened */
NETDEV_CHANGEADDR,  /* notify after the address change */
NETDEV_PRE_CHANGEADDR,  /* notify before the address change */
NETDEV_GOING_DOWN,
...

+ vt_notifier_list:

#define VT_ALLOCATE 0x0001 /* Console got allocated */
#define VT_DEALLOCATE   0x0002 /* Console will be deallocated */
#define VT_WRITE0x0003 /* A char got output */
#define VT_UPDATE   0x0004 /* A bigger update occurred */
#define VT_PREWRITE 0x0005 /* A char is about to be written 
to the console */

+ die_chain:

DIE_OOPS = 1,
DIE_INT3,
DIE_DEBUG,
DIE_PANIC,
DIE_NMI,
DIE_DIE,
DIE_KERNELDEBUG,
...

These all call the same list/chain in different situations.
The situation is distinguished by @val.


> You can see in the following patches (or grep the kernel) that people are 
> using
> this identification parameter to determine which kind of OOPS trigger
> the callback to condition the execution of the function to specific
> cases.

Could you please show me some existing code for case (b)?
I am not able to find any except in your patches.

Anyway, the solution in 16th patch is bad, definitely.
hv_die_panic_notify_crash() uses "val" to disinguish
both:

 + "panic_notifier_list" vs "die_chain"
 + die_val when callen via "die_chain"

The API around "die_chain" API is not aware of enum panic_notifier_val
and the API using "panic_notifier_list" is not aware of enum die_val.
As I said, it is mixing apples and oranges and it is error prone.

Best Regards,
Petr


___
Openipmi-developer mailing list
Openipmi-developer@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openipmi-developer


Re: [Openipmi-developer] [PATCH 05/30] misc/pvpanic: Convert regular spinlock into trylock on panic path

2022-05-17 Thread Petr Mladek via Openipmi-developer
On Tue 2022-05-10 10:00:58, Guilherme G. Piccoli wrote:
> On 10/05/2022 09:14, Petr Mladek wrote:
> > [...]
> >> With that said, it's dangerous to use regular spinlocks in such path,
> >> as introduced by commit b3c0f8774668 ("misc/pvpanic: probe multiple 
> >> instances").
> >> This patch fixes that by replacing regular spinlocks with the trylock
> >> safer approach.
> > 
> > It seems that the lock is used just to manipulating a list. A super
> > safe solution would be to use the rcu API: rcu_add_rcu() and
> > list_del_rcu() under rcu_read_lock(). The spin lock will not be
> > needed and the list will always be valid.
> > 
> > The advantage would be that it will always call members that
> > were successfully added earlier. That said, I am not familiar
> > with pvpanic and am not sure if it is worth it.
> > 
> >> It also fixes an old comment (about a long gone framebuffer code) and
> >> the notifier priority - we should execute hypervisor notifiers early,
> >> deferring this way the panic action to the hypervisor, as expected by
> >> the users that are setting up pvpanic.
> > 
> > This should be done in a separate patch. It changes the behavior.
> > Also there might be a discussion whether it really should be
> > the maximal priority.
> > 
> > Best Regards,
> > Petr
> 
> Thanks for the review Petr. Patch was already merged - my goal was to be
> concise, i.e., a patch per driver / module, so the patch kinda fixes
> whatever I think is wrong with the driver with regards panic handling.
> 
> Do you think it worth to remove this patch from Greg's branch just to
> split it in 2? Personally I think it's not worth, but opinions are welcome.

No problem. It is not worth the effort.


> About the RCU part, this one really could be a new patch, a good
> improvement patch - it makes sense to me, we can think about that after
> the fixes I guess.

Yup.

Best Regards,
Petr


___
Openipmi-developer mailing list
Openipmi-developer@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openipmi-developer


Re: [Openipmi-developer] [PATCH 25/30] panic, printk: Add console flush parameter and convert panic_print to a notifier

2022-05-16 Thread Petr Mladek via Openipmi-developer
On Wed 2022-04-27 19:49:19, Guilherme G. Piccoli wrote:
> Currently the parameter "panic_print" relies in a function called
> directly on panic path; one of the flags the users can set for
> panic_print triggers a console replay mechanism, to show the
> entire kernel log buffer (from the beginning) in a panic event.
> 
> Two problems with that: the dual nature of the panic_print
> isn't really appropriate, the function was originally meant
> to allow users dumping system information on panic events,
> and was "overridden" to also force a console flush of the full
> kernel log buffer. It also turns the code a bit more complex
> and duplicate than it needs to be.
> 
> This patch proposes 2 changes: first, we decouple panic_print
> from the console flushing mechanism, in the form of a new kernel
> core parameter (panic_console_replay); we kept the functionality
> on panic_print to avoid userspace breakage, although we comment
> in both code and documentation that this panic_print usage is
> deprecated.
> 
> We converted panic_print function to a panic notifier too, adding
> it on the panic informational notifier list, executed as the final
> callback. This allows a more clear code and makes sense, as
> panic_print_sys_info() is really a panic-time only function.
> We also moved its code to kernel/printk.c, it seems to make more
> sense given it's related to printing stuff.

I really like both changes. Just please split it them into two
patchset. I mean one patch for the new "panic_console_replay"
parameter and 2nd that moves "printk_info" into the notifier.

Best Regards,
Petr


___
Openipmi-developer mailing list
Openipmi-developer@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openipmi-developer


Re: [Openipmi-developer] [PATCH 23/30] printk: kmsg_dump: Introduce helper to inform number of dumpers

2022-05-16 Thread Petr Mladek via Openipmi-developer
On Wed 2022-05-11 17:03:51, Guilherme G. Piccoli wrote:
> On 10/05/2022 14:40, Steven Rostedt wrote:
> > On Wed, 27 Apr 2022 19:49:17 -0300
> > "Guilherme G. Piccoli"  wrote:
> > 
> >> Currently we don't have a way to check if there are dumpers set,
> >> except counting the list members maybe. This patch introduces a very
> >> simple helper to provide this information, by just keeping track of
> >> registered/unregistered kmsg dumpers. It's going to be used on the
> >> panic path in the subsequent patch.
> > 
> > FYI, it is considered "bad form" to reference in the change log "this
> > patch". We know this is a patch. The change log should just talk about what
> > is being done. So can you reword your change logs (you do this is almost
> > every patch). Here's what I would reword the above to be:
> > 
> >  Currently we don't have a way to check if there are dumpers set, except
> >  perhaps by counting the list members. Introduce a very simple helper to
> >  provide this information, by just keeping track of registered/unregistered
> >  kmsg dumpers. This will simplify the refactoring of the panic path.
> 
> Thanks for the hint, you're right - it's almost in all of my patches.
> I'll reword all of them (except the ones already merged) to remove this
> "bad form".

Shame on me that I do not care that much about the style of the commit
message :-)

Anyway, the code looks good to me. With the better commit message:

Reviewed-by: Petr Mladek 

Best Regards,
Petr


___
Openipmi-developer mailing list
Openipmi-developer@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openipmi-developer


Re: [Openipmi-developer] [PATCH 22/30] panic: Introduce the panic post-reboot notifier list

2022-05-16 Thread Petr Mladek via Openipmi-developer
On Wed 2022-04-27 19:49:16, Guilherme G. Piccoli wrote:
> Currently we have 3 notifier lists in the panic path, which will
> be wired in a way to allow the notifier callbacks to run in
> different moments at panic time, in a subsequent patch.
> 
> But there is also an odd set of architecture calls hardcoded in
> the end of panic path, after the restart machinery. They're
> responsible for late time tunings / events, like enabling a stop
> button (Sparc) or effectively stopping the machine (s390).
> 
> This patch introduces yet another notifier list to offer the
> architectures a way to add callbacks in such late moment on
> panic path without the need of ifdefs / hardcoded approaches.

The patch looks good to me. I would just suggest two changes.

1. I would rename the list to "panic_loop_list" instead of
   "panic_post_reboot_list".

   It will be more clear that it includes things that are
   needed before panic() enters the infinite loop.


2. I would move all the notifiers that enable blinking here.

   The blinking should be done only during the infinite
   loop when there is nothing else to do. If we enable
   earlier then it might disturb/break more important
   functionality (dumping information, reboot).

Best Regards,
Petr


___
Openipmi-developer mailing list
Openipmi-developer@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openipmi-developer


Re: [Openipmi-developer] [PATCH 21/30] panic: Introduce the panic pre-reboot notifier list

2022-05-16 Thread Petr Mladek via Openipmi-developer
On Wed 2022-04-27 19:49:15, Guilherme G. Piccoli wrote:
> This patch renames the panic_notifier_list to panic_pre_reboot_list;
> the idea is that a subsequent patch will refactor the panic path
> in order to better split the notifiers, running some of them very
> early, some of them not so early [but still before kmsg_dump()] and
> finally, the rest should execute late, after kdump. The latter ones
> are now in the panic pre-reboot list - the name comes from the idea
> that these notifiers execute before panic() attempts rebooting the
> machine (if that option is set).
> 
> We also took the opportunity to clean-up useless header inclusions,
> improve some notifier block declarations (e.g. in ibmasm/heartbeat.c)
> and more important, change some priorities - we hereby set 2 notifiers
> to run late in the list [iss_panic_event() and the IPMI panic_event()]
> due to the risks they offer (may not return, for example).
> Proper documentation is going to be provided in a subsequent patch,
> that effectively refactors the panic path.
> 
> --- a/drivers/edac/altera_edac.c
> +++ b/drivers/edac/altera_edac.c
> @@ -2163,7 +2162,7 @@ static int altr_edac_a10_probe(struct platform_device 
> *pdev)
>   int dberror, err_addr;
>  
>   edac->panic_notifier.notifier_call = s10_edac_dberr_handler;
> - atomic_notifier_chain_register(_notifier_list,
> + atomic_notifier_chain_register(_pre_reboot_list,

My understanding is that this notifier first prints info about ECC
errors and then triggers reboot. It might make sense to split it
into two notifiers.


>  >panic_notifier);
>  
>   /* Printout a message if uncorrectable error previously. */
> --- a/drivers/leds/trigger/ledtrig-panic.c
> +++ b/drivers/leds/trigger/ledtrig-panic.c
> @@ -64,7 +63,7 @@ static long led_panic_blink(int state)
>  
>  static int __init ledtrig_panic_init(void)
>  {
> - atomic_notifier_chain_register(_notifier_list,
> + atomic_notifier_chain_register(_pre_reboot_list,
>  _trigger_panic_nb);

Blinking => should go to the last "post_reboot/loop" list.


>  
>   led_trigger_register_simple("panic", );
> --- a/drivers/misc/ibmasm/heartbeat.c
> +++ b/drivers/misc/ibmasm/heartbeat.c
> @@ -32,20 +31,23 @@ static int suspend_heartbeats = 0;
>  static int panic_happened(struct notifier_block *n, unsigned long val, void 
> *v)
>  {
>   suspend_heartbeats = 1;
> - return 0;
> + return NOTIFY_DONE;
>  }
>  
> -static struct notifier_block panic_notifier = { panic_happened, NULL, 1 };
> +static struct notifier_block panic_notifier = {
> + .notifier_call = panic_happened,
> +};
>  
>  void ibmasm_register_panic_notifier(void)
>  {
> - atomic_notifier_chain_register(_notifier_list, _notifier);
> + atomic_notifier_chain_register(_pre_reboot_list,
> + _notifier);

Same here. Blinking => should go to the last "post_reboot/loop" list.


>  }
>  
>  void ibmasm_unregister_panic_notifier(void)
>  {
> - atomic_notifier_chain_unregister(_notifier_list,
> - _notifier);
> + atomic_notifier_chain_unregister(_pre_reboot_list,
> + _notifier);
>  }


The rest of the moved notifiers seem to fit well this "pre_reboot"
list.

Best Regards,
Petr


___
Openipmi-developer mailing list
Openipmi-developer@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openipmi-developer


Re: [Openipmi-developer] [PATCH 20/30] panic: Add the panic informational notifier list

2022-05-16 Thread Petr Mladek via Openipmi-developer
On Wed 2022-04-27 19:49:14, Guilherme G. Piccoli wrote:
> The goal of this new panic notifier is to allow its users to
> register callbacks to run earlier in the panic path than they
> currently do. This aims at informational mechanisms, like dumping
> kernel offsets and showing device error data (in case it's simple
> registers reading, for example) as well as mechanisms to disable
> log flooding (like hung_task detector / RCU warnings) and the
> tracing dump_on_oops (when enabled).
> 
> Any (non-invasive) information that should be provided before
> kmsg_dump() as well as log flooding preventing code should fit
> here, as long it offers relatively low risk for kdump.
> 
> For now, the patch is almost a no-op, although it changes a bit
> the ordering in which some panic notifiers are executed - specially
> affected by this are the notifiers responsible for disabling the
> hung_task detector / RCU warnings, which now run first. In a
> subsequent patch, the panic path will be refactored, then the
> panic informational notifiers will effectively run earlier,
> before ksmg_dump() (and usually before kdump as well).
> 
> We also defer documenting it all properly in the subsequent
> refactor patch. Finally, while at it, we removed some useless
> header inclusions too.
> 
> Signed-off-by: Guilherme G. Piccoli 

All notifiers moved in this patch seems to fit well the "info"
notifier list. The patch looks good from this POV.

I still have to review the rest of the patches to see if it
is complete.

Best Regards,
Petr


___
Openipmi-developer mailing list
Openipmi-developer@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openipmi-developer


Re: [Openipmi-developer] [PATCH 19/30] panic: Add the panic hypervisor notifier list

2022-05-16 Thread Petr Mladek via Openipmi-developer
On Wed 2022-04-27 19:49:13, Guilherme G. Piccoli wrote:
> The goal of this new panic notifier is to allow its users to register
> callbacks to run very early in the panic path. This aims hypervisor/FW
> notification mechanisms as well as simple LED functions, and any other
> simple and safe mechanism that should run early in the panic path; more
> dangerous callbacks should execute later.
> 
> For now, the patch is almost a no-op (although it changes a bit the
> ordering in which some panic notifiers are executed). In a subsequent
> patch, the panic path will be refactored, then the panic hypervisor
> notifiers will effectively run very early in the panic path.
> 
> We also defer documenting it all properly in the subsequent refactor
> patch. While at it, we removed some useless header inclusions and
> fixed some notifiers return too (by using the standard NOTIFY_DONE).

> --- a/arch/mips/sgi-ip22/ip22-reset.c
> +++ b/arch/mips/sgi-ip22/ip22-reset.c
> @@ -195,7 +195,7 @@ static int __init reboot_setup(void)
>   }
>  
>   timer_setup(_timer, blink_timeout, 0);
> - atomic_notifier_chain_register(_notifier_list, _block);
> + atomic_notifier_chain_register(_hypervisor_list, _block);

This notifier enables blinking. It is not much safe. It calls
mod_timer() that takes a lock internally.

This kind of functionality should go into the last list called
before panic() enters the infinite loop. IMHO, all the blinking
stuff should go there.

>  
>   return 0;
>  }
> diff --git a/arch/mips/sgi-ip32/ip32-reset.c b/arch/mips/sgi-ip32/ip32-reset.c
> index 18d1c115cd53..9ee1302c9d13 100644
> --- a/arch/mips/sgi-ip32/ip32-reset.c
> +++ b/arch/mips/sgi-ip32/ip32-reset.c
> @@ -145,7 +144,7 @@ static __init int ip32_reboot_setup(void)
>   pm_power_off = ip32_machine_halt;
>  
>   timer_setup(_timer, blink_timeout, 0);
> - atomic_notifier_chain_register(_notifier_list, _block);
> + atomic_notifier_chain_register(_hypervisor_list, _block);

Same here. Should be done only before the "loop".

>  
>   return 0;
>  }
> --- a/drivers/firmware/google/gsmi.c
> +++ b/drivers/firmware/google/gsmi.c
> @@ -1034,7 +1034,7 @@ static __init int gsmi_init(void)
>  
>   register_reboot_notifier(_reboot_notifier);
>   register_die_notifier(_die_notifier);
> - atomic_notifier_chain_register(_notifier_list,
> + atomic_notifier_chain_register(_hypervisor_list,
>  _panic_notifier);

I am not sure about this one. It looks like some logging or
pre_reboot stuff.


>  
>   printk(KERN_INFO "gsmi version " DRIVER_VERSION " loaded\n");
> --- a/drivers/leds/trigger/ledtrig-activity.c
> +++ b/drivers/leds/trigger/ledtrig-activity.c
> @@ -247,7 +247,7 @@ static int __init activity_init(void)
>   int rc = led_trigger_register(_led_trigger);
>  
>   if (!rc) {
> - atomic_notifier_chain_register(_notifier_list,
> + atomic_notifier_chain_register(_hypervisor_list,
>  _panic_nb);

The notifier is trivial. It just sets a variable.

But still, it is about blinking and should be done
in the last "loop" list.


>   register_reboot_notifier(_reboot_nb);
>   }
> --- a/drivers/leds/trigger/ledtrig-heartbeat.c
> +++ b/drivers/leds/trigger/ledtrig-heartbeat.c
> @@ -190,7 +190,7 @@ static int __init heartbeat_trig_init(void)
>   int rc = led_trigger_register(_led_trigger);
>  
>   if (!rc) {
> - atomic_notifier_chain_register(_notifier_list,
> + atomic_notifier_chain_register(_hypervisor_list,
>  _panic_nb);

Same here. Blinking => loop list.

>   register_reboot_notifier(_reboot_nb);
>   }
> diff --git a/drivers/misc/bcm-vk/bcm_vk_dev.c 
> b/drivers/misc/bcm-vk/bcm_vk_dev.c
> index a16b99bdaa13..d9d5199cdb2b 100644
> --- a/drivers/misc/bcm-vk/bcm_vk_dev.c
> +++ b/drivers/misc/bcm-vk/bcm_vk_dev.c
> @@ -1446,7 +1446,7 @@ static int bcm_vk_probe(struct pci_dev *pdev, const 
> struct pci_device_id *ent)
>  
>   /* register for panic notifier */
>   vk->panic_nb.notifier_call = bcm_vk_on_panic;
> - err = atomic_notifier_chain_register(_notifier_list,
> + err = atomic_notifier_chain_register(_hypervisor_list,
>>panic_nb);

It seems to reset some hardware or so. IMHO, it should go into the
pre-reboot list.


>   if (err) {
>   dev_err(dev, "Fail to register panic notifier\n");
> --- a/drivers/power/reset/ltc2952-poweroff.c
> +++ b/drivers/power/reset/ltc2952-poweroff.c
> @@ -279,7 +279,7 @@ static int ltc2952_poweroff_probe(struct platform_device 
> *pdev)
>   pm_power_off = ltc2952_poweroff_kill;
>  
>   data->panic_notifier.notifier_call = ltc2952_poweroff_notify_panic;
> - atomic_notifier_chain_register(_notifier_list,
> + atomic_notifier_chain_register(_hypervisor_list,
>  

Re: [Openipmi-developer] [PATCH 24/30] panic: Refactor the panic path

2022-05-16 Thread Petr Mladek via Openipmi-developer
On Sun 2022-05-15 19:47:39, Guilherme G. Piccoli wrote:
> On 12/05/2022 11:03, Petr Mladek wrote:
> > This talks only about kdump. The reality is much more complicated.
> > The level affect the order of:
> > 
> > + notifiers vs. kdump
> > + notifiers vs. crash_dump
> > + crash_dump vs. kdump
> 
> First of all, I'd like to ask you please to clarify to me *exactly* what
> are the differences between "crash_dump" and "kdump". I'm sorry if
> that's a silly question, I need to be 100% sure I understand the
> concepts the same way you do.

Ah, it should have been:

 + notifiers vs. kmsg_dump
 + notifiers vs. crash_dump
 + crash_dump vs. kmsg_dump

I am sorry for the confusion. Even "crash_dump" is slightly
misleading because there is no function with this name.
But it seems to be easier to understand than __crash_kexec().


> > There might theoretically many variants of the ordering of kdump,
> > crash_dump, and the 4 notifier list. Some variants do not make
> > much sense. You choose 5 variants and tried to select them by
> > a level number.
> > 
> > The question is if we really could easily describe the meaning this
> > way. It is not only about a "level" of notifiers before kdump. It is
> > also about the ordering of crash_dump vs. kdump. IMHO, "level"
> > semantic does not fit there.
> > 
> > Maybe more parameters might be easier to understand the effect.
> > Anyway, we first need to agree on the chosen variants.
> > I am going to discuss it more in the code, see below.
> > 
> > 
> > [...] 
> > Here is the code using the above functions. It helps to discuss
> > the design and logic.
> > 
> > I have to say that the logic is very unclear. Almost all
> > functions are called twice:
> > 
> > The really used code path is defined by order_panic_notifiers_and_kdump()
> > that encodes "level" into "bits". The bits are then flipped in
> > panic_notifier_*_once() calls that either do something or not.
> > kmsg_dump() is called according to the bit flip.
> > 
> > Also I guess that it is good proof that "level" abstraction does
> > not fit here. Normal levels would not need this kind of magic.
> 
> Heheh OK, I appreciate your opinion, but I guess we'll need to agree in
> disagree here - I'm much more fond to this kind of code than a bunch of
> if/else blocks that almost give headaches. Encoding such "level" logic
> in the if/else scheme is very convoluted, generates a very big code. And
> the functions aren't so black magic - they map a level in bits, and the
> functions _once() are called...once! Although we switch the position in
> the code, so there are 2 calls, one of them is called and the other not.

I see. Well, I would consider this as a warning that the approach is
too complex. If the code, using if/then/else, would cause headaches
then also understanding of the behavior would cause headaches for
both users and programmers.


> But that's totally fine to change - especially if we're moving away from
> the "level" logic. I see below you propose a much simpler approach - if
> we follow that, definitely we won't need the "black magic" approach heheh

I do not say that my proposal is fully correct. But we really need
this kind of simpler approach.


> > OK, the question is how to make it better.

> > One option "panic_prefer_crash_dump" should be enough.
> > And the code might look like:
> > 
> > void panic()
> > {
> > [...]
> > dump_stack();
> > kgdb_panic(buf);
> > 
> > < ---  here starts the reworked code --- >
> > 
> > /* crash dump is enough when enabled and preferred. */
> > if (panic_prefer_crash_dump)
> > __crash_kexec(NULL);
> > 
> > /* Stop other CPUs and focus on handling the panic state. */
> > if (has_kexec_crash_image)
> > crash_smp_send_stop();
> > else
> > smp_send_stop()
> > 
> 
> Here we have a very important point. Why do we need 2 variants of SMP
> CPU stopping functions? I disagree with that - my understanding of this
> after some study in architectures is that the crash_() variant is
> "stronger", should work in all cases and if not, we should fix that -
> that'd be a bug.
> 
> Such variant either maps to smp_send_stop() (in various architectures,
> including XEN/x86) or overrides the basic function with more proper
> handling for panic() case...I don't see why we still need such
> distinction, if you / others have some insight about that, I'd like to
> hear =)

The two variants were introduced by the commit 0ee59413c967c35a6dd
("x86/panic: replace smp_send_stop() with kdump friendly version in
panic path")

It points to https://lkml.org/lkml/2015/6/24/44 that talks about
still running watchdogs.

It is possible that the problem could be fixed another way. It is
even possible that it has already been fixed by the notifiers
that disable the watchdogs.

Anyway, any change of the smp_send_stop() behavior should be done
in a separate patch. It will help with bisection of possible
regression. Also it would 

Re: [Openipmi-developer] [PATCH 24/30] panic: Refactor the panic path

2022-05-12 Thread Petr Mladek via Openipmi-developer
Hello,

first, I am sorry for stepping into the discussion so late.
I was busy with some other stuff and this patchset is far
from trivial.

Second, thanks a lot for putting so much effort into it.
Most of the changes look pretty good, especially all
the fixes of particular notifiers and split into
four lists.

Though this patch will need some more love. See below
for more details.


On Wed 2022-04-27 19:49:18, Guilherme G. Piccoli wrote:
> The panic() function is somewhat convoluted - a lot of changes were
> made over the years, adding comments that might be misleading/outdated
> now, it has a code structure that is a bit complex to follow, with
> lots of conditionals, for example. The panic notifier list is something
> else - a single list, with multiple callbacks of different purposes,
> that run in a non-deterministic order and may affect hardly kdump
> reliability - see the "crash_kexec_post_notifiers" workaround-ish flag.
> 
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -3784,6 +3791,33 @@
>   timeout < 0: reboot immediately
>   Format: 
>  
> + panic_notifiers_level=
> + [KNL] Set the panic notifiers execution order.
> + Format: 
> + We currently have 4 lists of panic notifiers; based
> + on the functionality and risk (for panic success) the
> + callbacks are added in a given list. The lists are:
> + - hypervisor/FW notification list (low risk);
> + - informational list (low/medium risk);
> + - pre_reboot list (higher risk);
> + - post_reboot list (only run late in panic and after
> + kdump, not configurable for now).
> + This parameter defines the ordering of the first 3
> + lists with regards to kdump; the levels determine
> + which set of notifiers execute before kdump. The
> + accepted levels are:

This talks only about kdump. The reality is much more complicated.
The level affect the order of:

+ notifiers vs. kdump
+ notifiers vs. crash_dump
+ crash_dump vs. kdump

There might theoretically many variants of the ordering of kdump,
crash_dump, and the 4 notifier list. Some variants do not make
much sense. You choose 5 variants and tried to select them by
a level number.

The question is if we really could easily describe the meaning this
way. It is not only about a "level" of notifiers before kdump. It is
also about the ordering of crash_dump vs. kdump. IMHO, "level"
semantic does not fit there.

Maybe more parameters might be easier to understand the effect.
Anyway, we first need to agree on the chosen variants.
I am going to discuss it more in the code, see below.



> + 0: kdump is the first thing to run, NO list is
> + executed before kdump.
> + 1: only the hypervisor list is executed before kdump.
> + 2 (default level): the hypervisor list and (*if*
> + there's any kmsg_dumper defined) the informational
> + list are executed before kdump.
> + 3: both the hypervisor and the informational lists
> + (always) execute before kdump.
> + 4: the 3 lists (hypervisor, info and pre_reboot)
> + execute before kdump - this behavior is analog to the
> + deprecated parameter "crash_kexec_post_notifiers".
> +
>   panic_print=Bitmask for printing system info when panic happens.
>   User can chose combination of the following bits:
>   bit 0: print all tasks info
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -183,6 +195,112 @@ static void panic_print_sys_info(bool console_flush)
>   ftrace_dump(DUMP_ALL);
>  }
>  
> +/*
> + * Helper that accumulates all console flushing routines executed on panic.
> + */
> +static void console_flushing(void)
> +{
> +#ifdef CONFIG_VT
> + unblank_screen();
> +#endif
> + console_unblank();
> +
> + /*
> +  * In this point, we may have disabled other CPUs, hence stopping the
> +  * CPU holding the lock while still having some valuable data in the
> +  * console buffer.
> +  *
> +  * Try to acquire the lock then release it regardless of the result.
> +  * The release will also print the buffers out. Locks debug should
> +  * be disabled to avoid reporting bad unlock balance when panic()
> +  * is not being called from OOPS.
> +  */
> + debug_locks_off();
> + console_flush_on_panic(CONSOLE_FLUSH_PENDING);
> +
> + panic_print_sys_info(true);
> +}
> +
> +#define PN_HYPERVISOR_BIT0
> +#define PN_INFO_BIT  1
> +#define 

Re: [Openipmi-developer] [PATCH 17/30] tracing: Improve panic/die notifiers

2022-05-11 Thread Petr Mladek via Openipmi-developer
On Wed 2022-04-27 19:49:11, Guilherme G. Piccoli wrote:
> Currently the tracing dump_on_oops feature is implemented
> through separate notifiers, one for die/oops and the other
> for panic. With the addition of panic notifier "id", this
> patch makes use of such "id" to unify both functions.
> 
> It also comments the function and changes the priority of the
> notifier blocks, in order they run early compared to other
> notifiers, to prevent useless trace data (like the callback
> names for the other notifiers). Finally, we also removed an
> unnecessary header inclusion.
> 
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -9767,38 +9766,46 @@ static __init int tracer_init_tracefs(void)
>  
>  fs_initcall(tracer_init_tracefs);
>  
> -static int trace_panic_handler(struct notifier_block *this,
> -unsigned long event, void *unused)
> +/*
> + * The idea is to execute the following die/panic callback early, in order
> + * to avoid showing irrelevant information in the trace (like other panic
> + * notifier functions); we are the 2nd to run, after hung_task/rcu_stall
> + * warnings get disabled (to prevent potential log flooding).
> + */
> +static int trace_die_panic_handler(struct notifier_block *self,
> + unsigned long ev, void *unused)
>  {
> - if (ftrace_dump_on_oops)
> + int do_dump;
> +
> + if (!ftrace_dump_on_oops)
> + return NOTIFY_DONE;
> +
> + switch (ev) {
> + case DIE_OOPS:
> + do_dump = 1;
> + break;
> + case PANIC_NOTIFIER:
> + do_dump = 1;
> + break;

DIE_OOPS and PANIC_NOTIFIER are from different enum.
It feels like comparing apples with oranges here.

IMHO, the proper way to unify the two notifiers is
a check of the @self parameter. Something like:

static int trace_die_panic_handler(struct notifier_block *self,
unsigned long ev, void *unused)
{
if (self == trace_die_notifier && val != DIE_OOPS)
goto out;

ftrace_dump(ftrace_dump_on_oops);
out:
return NOTIFY_DONE;
}

Best Regards,
Petr


___
Openipmi-developer mailing list
Openipmi-developer@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openipmi-developer


Re: [Openipmi-developer] [PATCH 04/30] firmware: google: Convert regular spinlock into trylock on panic path

2022-05-11 Thread Petr Mladek via Openipmi-developer
On Tue 2022-05-10 21:46:38, John Ogness wrote:
> On 2022-05-10, Steven Rostedt  wrote:
> >> As already mentioned in the other reply, panic() sometimes stops the
> >> other CPUs using NMI, for example, see kdump_nmi_shootdown_cpus().
> >> 
> >> Another situation is when the CPU using the lock ends in some
> >> infinite loop because something went wrong. The system is in
> >> an unpredictable state during panic().
> >> 
> >> I am not sure if this is possible with the code under gsmi_dev.lock
> >> but such things really happen during panic() in other subsystems.
> >> Using trylock in the panic() code path is a good practice.
> >
> > I believe that Peter Zijlstra had a special spin lock for NMIs or
> > early printk, where it would not block if the lock was held on the
> > same CPU. That is, if an NMI happened and paniced while this lock was
> > held on the same CPU, it would not deadlock. But it would block if the
> > lock was held on another CPU.
> 
> Yes. And starting with 5.19 it will be carrying the name that _you_ came
> up with (cpu_sync):
> 
> printk_cpu_sync_get_irqsave()
> printk_cpu_sync_put_irqrestore()

There is a risk that this lock might become a big kernel lock.

This special lock would need to be used even during normal
system operation. It does not make sense to suddenly start using
another lock during panic.

So I think that we should think twice before using it.
I would prefer using trylock of the original lock when
possible during panic.

It is possible that I miss something.

Best Regards,
Petr


___
Openipmi-developer mailing list
Openipmi-developer@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openipmi-developer


Re: [Openipmi-developer] [PATCH 15/30] bus: brcmstb_gisb: Clean-up panic/die notifiers

2022-05-10 Thread Petr Mladek via Openipmi-developer
On Wed 2022-04-27 19:49:09, Guilherme G. Piccoli wrote:
> This patch improves the panic/die notifiers in this driver by
> making use of a passed "id" instead of comparing pointer
> address; also, it removes an useless prototype declaration
> and unnecessary header inclusion.
> 
> This is part of a panic notifiers refactor - this notifier in
> the future will be moved to a new list, that encompass the
> information notifiers only.
> 
> --- a/drivers/bus/brcmstb_gisb.c
> +++ b/drivers/bus/brcmstb_gisb.c
> @@ -347,25 +346,14 @@ static irqreturn_t brcmstb_gisb_bp_handler(int irq, 
> void *dev_id)
>  /*
>   * Dump out gisb errors on die or panic.
>   */
> -static int dump_gisb_error(struct notifier_block *self, unsigned long v,
> -void *p);
> -
> -static struct notifier_block gisb_die_notifier = {
> - .notifier_call = dump_gisb_error,
> -};
> -
> -static struct notifier_block gisb_panic_notifier = {
> - .notifier_call = dump_gisb_error,
> -};
> -
>  static int dump_gisb_error(struct notifier_block *self, unsigned long v,
>  void *p)
>  {
>   struct brcmstb_gisb_arb_device *gdev;
> - const char *reason = "panic";
> + const char *reason = "die";
>  
> - if (self == _die_notifier)
> - reason = "die";
> + if (v == PANIC_NOTIFIER)
> + reason = "panic";

IMHO, the check of the @self parameter was the proper solution.

"gisb_die_notifier" list uses @val from enum die_val.
"gisb_panic_notifier" list uses @val from enum panic_notifier_val.

These are unrelated types. It might easily break when
someone defines the same constant also in enum die_val.

Best Regards,
Petr


___
Openipmi-developer mailing list
Openipmi-developer@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openipmi-developer


Re: [Openipmi-developer] [PATCH 14/30] panic: Properly identify the panic event to the notifiers' callbacks

2022-05-10 Thread Petr Mladek via Openipmi-developer
On Wed 2022-04-27 19:49:08, Guilherme G. Piccoli wrote:
> The notifiers infrastructure provides a way to pass an "id" to the
> callbacks to determine what kind of event happened, i.e., what is
> the reason behind they getting called.
> 
> The panic notifier currently pass 0, but this is soon to be
> used in a multi-targeted notifier, so let's pass a meaningful
> "id" over there.
>
> Signed-off-by: Guilherme G. Piccoli 
> ---
>  include/linux/panic_notifier.h | 5 +
>  kernel/panic.c | 2 +-
>  2 files changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/panic_notifier.h b/include/linux/panic_notifier.h
> index 41e32483d7a7..07dced83a783 100644
> --- a/include/linux/panic_notifier.h
> +++ b/include/linux/panic_notifier.h
> @@ -9,4 +9,9 @@ extern struct atomic_notifier_head panic_notifier_list;
>  
>  extern bool crash_kexec_post_notifiers;
>  
> +enum panic_notifier_val {
> + PANIC_UNUSED,
> + PANIC_NOTIFIER = 0xDEAD,
> +};

Hmm, this looks like a hack. PANIC_UNUSED will never be used.
All notifiers will be always called with PANIC_NOTIFIER.

The @val parameter is normally used when the same notifier_list
is used in different situations.

But you are going to use it when the same notifier is used
in more lists. This is normally distinguished by the @nh
(atomic_notifier_head) parameter.

IMHO, it is a bad idea. First, it would confuse people because
it does not follow the original design of the parameters.
Second, the related code must be touched anyway when
the notifier is moved into another list so it does not
help much.

Or do I miss anything, please?

Best Regards,
Petr


___
Openipmi-developer mailing list
Openipmi-developer@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openipmi-developer


Re: [Openipmi-developer] [PATCH 11/30] um: Improve panic notifiers consistency and ordering

2022-05-10 Thread Petr Mladek via Openipmi-developer
On Wed 2022-04-27 19:49:05, Guilherme G. Piccoli wrote:
> Currently the panic notifiers from user mode linux don't follow
> the convention for most of the other notifiers present in the
> kernel (indentation, priority setting, numeric return).
> More important, the priorities could be improved, since it's a
> special case (userspace), hence we could run the notifiers earlier;
> user mode linux shouldn't care much with other panic notifiers but
> the ordering among the mconsole and arch notifier is important,
> given that the arch one effectively triggers a core dump.

It is not clear to me why user mode linux should not care about
the other notifiers. It might be because I do not know much
about the user mode linux.

Is the because they always create core dump or are never running
in a hypervisor or ...?

AFAIK, the notifiers do many different things. For example, there
is a notifier that disables RCU watchdog, print some extra
information. Why none of them make sense here?

> This patch fixes that by running the mconsole notifier as the first
> panic notifier, followed by the architecture one (that coredumps).
> Also, we remove a useless header inclusion.


Best Regards,
Petr


___
Openipmi-developer mailing list
Openipmi-developer@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openipmi-developer


Re: [Openipmi-developer] [PATCH 10/30] alpha: Clean-up the panic notifier code

2022-05-10 Thread Petr Mladek via Openipmi-developer
On Mon 2022-05-09 11:13:17, Guilherme G. Piccoli wrote:
> On 27/04/2022 19:49, Guilherme G. Piccoli wrote:
> > The alpha panic notifier has some code issues, not following
> > the conventions of other notifiers. Also, it might halt the
> > machine but still it is set to run as early as possible, which
> > doesn't seem to be a good idea.

Yeah, it is pretty strange behavior.

I looked into the history. This notifier was added into the alpha code
in 2.4.0-test2pre2. In this historic code, the default panic() code
either rebooted after a timeout or ended in a infinite loop. There
was not crasdump at that times.

The notifier allowed to change the behavior. There were 3 notifiers:

   + mips and mips64 ended with blinking in panic()
   + alpha did __halt() in this srm case

They both still do this. I guess that it is some historic behavior
that people using these architectures are used to.

Anyway, it makes sense to do this as the last notifier after
dumping other information.

Reviewed-by: Petr Mladek 

Best Regards,
Petr


___
Openipmi-developer mailing list
Openipmi-developer@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openipmi-developer


Re: [Openipmi-developer] [PATCH 05/30] misc/pvpanic: Convert regular spinlock into trylock on panic path

2022-05-10 Thread Petr Mladek via Openipmi-developer
On Wed 2022-04-27 19:48:59, Guilherme G. Piccoli wrote:
> The pvpanic driver relies on panic notifiers to execute a callback
> on panic event. Such function is executed in atomic context - the
> panic function disables local IRQs, preemption and all other CPUs
> that aren't running the panic code.
> 
> With that said, it's dangerous to use regular spinlocks in such path,
> as introduced by commit b3c0f8774668 ("misc/pvpanic: probe multiple 
> instances").
> This patch fixes that by replacing regular spinlocks with the trylock
> safer approach.

It seems that the lock is used just to manipulating a list. A super
safe solution would be to use the rcu API: rcu_add_rcu() and
list_del_rcu() under rcu_read_lock(). The spin lock will not be
needed and the list will always be valid.

The advantage would be that it will always call members that
were successfully added earlier. That said, I am not familiar
with pvpanic and am not sure if it is worth it.

> It also fixes an old comment (about a long gone framebuffer code) and
> the notifier priority - we should execute hypervisor notifiers early,
> deferring this way the panic action to the hypervisor, as expected by
> the users that are setting up pvpanic.

This should be done in a separate patch. It changes the behavior.
Also there might be a discussion whether it really should be
the maximal priority.

Best Regards,
Petr


___
Openipmi-developer mailing list
Openipmi-developer@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openipmi-developer


Re: [Openipmi-developer] [PATCH 04/30] firmware: google: Convert regular spinlock into trylock on panic path

2022-05-10 Thread Petr Mladek via Openipmi-developer
On Tue 2022-05-03 16:12:09, Guilherme G. Piccoli wrote:
> On 03/05/2022 15:03, Evan Green wrote:
> > [...]
> > gsmi_shutdown_reason() is a common function called in other scenarios
> > as well, like reboot and thermal trip, where it may still make sense
> > to wait to acquire a spinlock. Maybe we should add a parameter to
> > gsmi_shutdown_reason() so that you can get your change on panic, but
> > we don't convert other callbacks into try-fail scenarios causing us to
> > miss logs.
> > 
> 
> Hi Evan, thanks for your feedback, much appreciated!
> What I've done in other cases like this was to have a helper checking
> the spinlock in the panic notifier - if we can acquire that, go ahead
> but if not, bail out. For a proper example of an implementation, check
> patch 13 of the series:
> https://lore.kernel.org/lkml/20220427224924.592546-14-gpicc...@igalia.com/ .
> 
> Do you agree with that, or prefer really a parameter in
> gsmi_shutdown_reason() ? I'll follow your choice =)

I see two more alternative solutions:

1st variant is a trick already used in console write() callbacks.
They do trylock() when oops_in_progress is set. They remember
the result to prevent double unlock when printing Oops messages and
the system will try to continue working. For example:

pl011_console_write(struct console *co, const char *s, unsigned int count)
{
[...]
int locked = 1;
[...]
if (uap->port.sysrq)
locked = 0;
else if (oops_in_progress)
locked = spin_trylock(>port.lock);
else
spin_lock(>port.lock);

[...]

if (locked)
spin_unlock(>port.lock);
}


2nd variant is to check panic_cpu variable. It is used in printk.c.
We might move the function to panic.h:

static bool panic_in_progress(void)
{
return unlikely(atomic_read(_cpu) != PANIC_CPU_INVALID);
}

and then do:

if (panic_in_progress()) {
...


> > Though thinking more about it, is this really a Good Change (TM)? The
> > spinlock itself already disables interrupts, meaning the only case
> > where this change makes a difference is if the panic happens from
> > within the function that grabbed the spinlock (in which case the
> > callback is also likely to panic), or in an NMI that panics within
> > that window.

As already mentioned in the other reply, panic() sometimes stops
the other CPUs using NMI, for example, see kdump_nmi_shootdown_cpus().

Another situation is when the CPU using the lock ends in some
infinite loop because something went wrong. The system is in
an unpredictable state during panic().

I am not sure if this is possible with the code under gsmi_dev.lock
but such things really happen during panic() in other subsystems.
Using trylock in the panic() code path is a good practice.

Best Regards,
Petr


___
Openipmi-developer mailing list
Openipmi-developer@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openipmi-developer