Re: [patch V4 00/37] cpu/hotplug, x86: Reworked parallel CPU bringup

2023-05-14 Thread Guilherme G. Piccoli
On 12/05/2023 18:06, Thomas Gleixner wrote:
> Hi!
> 
> This is version 4 of the reworked parallel bringup series. Version 3 can be
> found here:
> 
>https://lore.kernel.org/lkml/20230508181633.089804...@linutronix.de


Hi Thomas, thanks for series! I was able to test it on the Steam Deck
(on top of 6.4-rc2), and everything is working fine; also tested S3
suspend/resume, working as expected.

Some logs from boot time:


Parallel boot
[0.239764] smp: Bringing up secondary CPUs ...
[...]
[0.253130] smp: Brought up 1 node, 8 CPUs


Regular boot (with cpuhp.parallel=0)
[0.240093] smp: Bringing up secondary CPUs ...
[...]
[0.253475] smp: Brought up 1 node, 8 CPUs


Feel free to add (to the series):

Tested-by: Guilherme G. Piccoli  # Steam Deck

Cheers,


Guilherme



Re: [PATCH 24/30] panic: Refactor the panic path

2022-06-15 Thread Guilherme G. Piccoli
Perfect Petr, thanks for your feedback!

I'll be out for some weeks, but after that what I'm doing is to split
the series in 2 parts:

(a) The general fixes, which should be reviewed by subsystem maintainers
and even merged individually by them.

(b) The proper panic refactor, which includes the notifiers list split,
etc. I'll think about what I consider the best solution for the
crash_dump required ones, and will try to split in very simple patches
to make it easier to review.

Cheers,


Guilherme



Re: [PATCH 24/30] panic: Refactor the panic path

2022-05-26 Thread Guilherme G. Piccoli
tifier into wrong list.
>> 
>> I would like to note again that the panic notifiers are optional to run,
>> while kdump is expectd once loaded, from the original purpose. I guess
>> people I know will still have this thought, e.g Hatayama, Masa, they are
>> truly often use panic notifiers like this on their company's system.


On 24/05/2022 11:44, Eric W. Biederman wrote:
> [...]
> Unfortunately I am also very grouchy.
> 
> Notifiers before kexec on panic are fundamentally broken.  So please
> just remove crash_kexec_post notifiers and be done with it.  Part of the
> deep issue is that firmware always has a common and broken
> implementation for anything that is not mission critical to
> motherboards.
> 
> Notifiers in any sense on these paths are just bollocks.  Any kind of
> notifier list is fundamentally fragile in the face of memory corruption
> and very very difficult to review.
> 
> So I am going to refresh my ancient NACK on this.
> 
> I can certainly appreciate that there are pieces of the reboot paths
> that can be improved.  I don't think making anything more feature full
> or flexible is any kind of real improvement.


Now, from the thread "Should arm64 have a custom crash shutdown
handler?" (link:
https://lore.kernel.org/lkml/427a8277-49f0-4317-d6c3-4a15d7070...@igalia.com/),
we have:

On 05/05/2022 08:10, Mark Rutland wrote:
>> On Wed, May 04, 2022 at 05:00:42PM -0300, Guilherme G. Piccoli wrote:
>>> [...]
>>> Currently, when we kexec in arm64, the function machine_crash_shutdown()
>>> is called as a handler to disable CPUs and (potentially) do extra
>>> quiesce work. In the aforementioned architectures, there's a way to
>>> override this function, if for example an hypervisor wish to have its
>>> guests running their own custom shutdown machinery.
>> 
>> What exactly do you need to do in this custom shutdown machinery?
>> 
>> The general expectation for arm64 is that any hypervisor can implement PSCI,
>> and as long as you have that, CPUs (and the VM as a whole) can be shutdown 
>> in a
>> standard way.
>> 
>> I suspect what you're actually after is a mechanism to notify the hypervisor
>> when the guest crashes, rather than changing the way the shutdown itself
>> occurs? If so, we already have panic notifiers, and QEMU has a "pvpanic"
>> device using that. See drivers/misc/pvpanic/.


OK, so it seems we have some points in which agreement exists, and some
points that there is no agreement and instead, we have antagonistic /
opposite views and needs. Let's start with the easier part heh


It seems everybody agrees that *we shouldn't over-engineer things*, and
as per Eric good words: making the panic path more feature-full or
increasing flexibility isn't a good idea. So, as a "corollary": the
panic level approach I'm proposing is not a good fit, I'll drop it and
let's go with something simpler.

Another point of agreement seems to be that _notifier lists in the panic
path are dangerous_, for *2 different reasons*:

(a) We cannot guarantee that people won't add crazy callbacks there, we
can plan and document things the best as possible - it'll never be
enough, somebody eventually would slip a nonsense callback that would
break things and defeat the planned purpose of such a list;

(b) As per Eric point, in a panic/crash situation we might have memory
corruption exactly in the list code / pointers, etc, so the notifier
lists are, by nature, a bit fragile. But I think we shouldn't consider
it completely "bollocks", since this approach has been used for a while
with a good success rate. So, lists aren't perfect at all, but at the
same time, they aren't completely useless.


Now, to the points in which there are conflicting / antagonistic
needs/views:

(I) Kdump should be the first thing to run, as it's been like that since
forever. But...notice that "crash_kexec_post_notifiers" was created
exactly as a way to circumvent that, so we can see this is not an
absolute truth. Some users really *require to execute* some special code
*before kdump*.
Worth noticing here that regular kexec invokes the drivers .shutdown()
handlers, while kdump [aka crash_kexec()] does not, so we must have a
way to run code before kdump in a crash situation.

(II) If *we need* to have some code always running before kdump/reboot
on panic path (like the Hyper-V vmbus connection unload), *where to add
such code*? Again, conflicting views. Some would say we should hardcode
this in the panic() function. Others, that we should use the custom
machine_crash_shutdown() infrastructure - but notice that this isn't
available in all architectures, like arm64. Finally, others suggest
to...use notifier lists! Which was more or less the approach we took in
this patch.

How can we reach consen

Re: [PATCH 12/30] parisc: Replace regular spinlock with spin_trylock on panic path

2022-05-24 Thread Guilherme G. Piccoli
On 28/04/2022 13:55, Helge Deller wrote:
> [...]
> You may add:
> Acked-by: Helge Deller  # parisc
> 
> Helge

Hi Helge, do you think would be possible to still pick this one for
v5.19 or do you prefer to hold for the next release?

I'm working on V2, so if it's merged for 5.19 I won't send it again.
Thanks,


Guilherme



Re: [PATCH 19/30] panic: Add the panic hypervisor notifier list

2022-05-23 Thread Guilherme G. Piccoli
On 19/05/2022 16:20, Scott Branden wrote:
> [...] 
>> Hi Scott / Desmond, thanks for the detailed answer! Is this adapter
>> designed to run in x86 only or you have other architectures' use cases?
> The adapter may be used in any PCIe design that supports DMA.
> So it may be possible to run in arm64 servers.
>>
>> [...]
>> With that said, and given this is a lightweight notifier that ideally
>> should run ASAP, I'd keep this one in the hypervisor list. We can
>> "adjust" the semantic of this list to include lightweight notifiers that
>> reset adapters.
> Sounds the best to keep system operating as tested today.
>>
>> With that said, Petr has a point - not always such list is going to be
>> called before kdump. So, that makes me think in another idea: what if we
>> have another list, but not on panic path, but instead in the custom
>> crash_shutdown()? Drivers could add callbacks there that must execute
>> before kexec/kdump, no matter what.
> It may be beneficial for some other drivers but for our use we would 
> then need to register for the panic path and the crash_shutdown path. 
> We notify the VK card for 2 purposes: one to stop DMA so memory stop 
> changing during a kdump.  And also to get the card into a good state so 
> resets happen cleanly.

Thanks Scott! With that, I guess it's really better to keep this
notifier in this hypervisor/early list - I'm planning to do that for V2.
Unless Petr or somebody has strong feelings against that, of course.

Cheers,


Guilherme



Re: [PATCH 24/30] panic: Refactor the panic path

2022-05-20 Thread Guilherme G. Piccoli
On 19/05/2022 20:45, Baoquan He wrote:
> [...]
>> I really appreciate the summary skill you have, to convert complex
>> problems in very clear and concise ideas. Thanks for that, very useful!
>> I agree with what was summarized above.
> 
> I want to say the similar words to Petr's reviewing comment when I went
> through the patches and traced each reviewing sub-thread to try to
> catch up. Petr has reivewed this series so carefully and given many
> comments I want to ack immediately.
> 
> I agree with most of the suggestions from Petr to this patch, except of
> one tiny concern, please see below inline comment.

Hi Baoquan, thanks! I'm glad you're also reviewing that =)


> [...]
> 
> I like the proposed skeleton of panic() and code style suggested by
> Petr very much. About panic_prefer_crash_dump which might need be added,
> I hope it has a default value true. This makes crash_dump execute at
> first by default just as before, unless people specify
> panic_prefer_crash_dump=0|n|off to disable it. Otherwise we need add
> panic_prefer_crash_dump=1 in kernel and in our distros to enable kdump,
> this is inconsistent with the old behaviour.

I'd like to understand better why the crash_kexec() must always be the
first thing in your use case. If we keep that behavior, we'll see all
sorts of workarounds - see the last patches of this series, Hyper-V and
PowerPC folks hardcoded "crash_kexec_post_notifiers" in order to force
execution of their relevant notifiers (like the vmbus disconnect,
specially in arm64 that has no custom machine_crash_shutdown, or the
fadump case in ppc). This led to more risk in kdump.

The thing is: with the notifiers' split, we tried to keep only the most
relevant/necessary stuff in this first list, things that ultimately
should improve kdump reliability or if not, at least not break it. My
feeling is that, with this series, we should change the idea/concept
that kdump must run first nevertheless, not matter what. We're here
trying to accommodate the antagonistic goals of hypervisors that need
some clean-up (even for kdump to work) VS. kdump users, that wish a
"pristine" system reboot ASAP after the crash.

Cheers,


Guilherme



Re: [PATCH 19/30] panic: Add the panic hypervisor notifier list

2022-05-19 Thread Guilherme G. Piccoli
On 18/05/2022 19:17, Scott Branden wrote:
> Hi Guilherme,
> 
> +Desmond
> [...] 
 I'm afraid it breaks kdump if this device is not reset beforehand - it's
 a doorbell write, so not high risk I think...

 But in case the not-reset device can be probed normally in kdump kernel,
 then I'm fine in moving this to the reboot list! I don't have the HW to
 test myself.
>>>
>>> Good question. Well, it if has to be called before kdump then
>>> even "hypervisor" list is a wrong place because is not always
>>> called before kdump.
>> [...]
> We register to the panic notifier so that we can kill the VK card ASAP
> to stop DMAing things over to the host side.  If it is not notified then
> memory may not be frozen when kdump is occurring.
> Notifying the card on panic is also needed to allow for any type of 
> reset to occur.
> 
> So, the only thing preventing moving the notifier later is the chance
> that memory is modified while kdump is occurring.  Or, if DMA is 
> disabled before kdump already then this wouldn't be an issue and the 
> notification to the card (to allow for clean resets) can be done later.

Hi Scott / Desmond, thanks for the detailed answer! Is this adapter
designed to run in x86 only or you have other architectures' use cases?

I'm not expert on that, but I guess whether DMA is "kept" or not depends
a bit if IOMMU is used. IIRC, there was a copy of the DMAR table in
kdump (at least for Intel IOMMU). Also, devices are not properly
quiesced on kdump IIUC, we don't call shutdown/reset handlers, they're
skip due to the crash nature - so there is a risk of devices doing bad
things in the new kernel.

With that said, and given this is a lightweight notifier that ideally
should run ASAP, I'd keep this one in the hypervisor list. We can
"adjust" the semantic of this list to include lightweight notifiers that
reset adapters.

With that said, Petr has a point - not always such list is going to be
called before kdump. So, that makes me think in another idea: what if we
have another list, but not on panic path, but instead in the custom
crash_shutdown()? Drivers could add callbacks there that must execute
before kexec/kdump, no matter what.

Let me know your thoughts Scott / Desmond / Petr and all interested parties.
Cheers,


Guilherme



Re: [PATCH 19/30] panic: Add the panic hypervisor notifier list

2022-05-19 Thread Guilherme G. Piccoli
On 19/05/2022 04:03, Petr Mladek wrote:
> [...]
> I would ignore it for now. If anyone would want to safe the log
> then they would need to read it. They will most likely use
> the existing kmsg_dump() infastructure. In fact, they should
> use it to avoid a code duplication.
> 
> Best Regards,
> Petr

Cool, thanks! I agree, let's expect people use kmsg_dump() as they should =)



Re: [PATCH 19/30] panic: Add the panic hypervisor notifier list

2022-05-18 Thread Guilherme G. Piccoli
On 18/05/2022 04:33, Petr Mladek wrote:
> [...]
> Anyway, I would distinguish it the following way.
> 
>   + If the notifier is preserving kernel log then it should be ideally
> treated as kmsg_dump().
> 
>   + It the notifier is saving another debugging data then it better
> fits into the "hypervisor" notifier list.
> 
>

Definitely, I agree - it's logical, since we want more info in the logs,
and happens some notifiers running in the informational list do that,
like ftrace_on_oops for example.


> Regarding the reliability. From my POV, any panic notifier enabled
> in a generic kernel should be reliable with more than 99,9%.
> Otherwise, they should not be in the notifier list at all.
> 
> An exception would be a platform-specific notifier that is
> called only on some specific platform and developers maintaining
> this platform agree on this.
> 
> The value "99,9%" is arbitrary. I am not sure if it is realistic
> even in the other code, for example, console_flush_on_panic()
> or emergency_restart(). I just want to point out that the border
> should be rather high. Otherwise we would back in the situation
> where people would want to disable particular notifiers.
> 

Totally agree, these percentages are just an example, 50% is ridiculous
low reliability in my example heheh

But some notifiers deep dive in abstraction layers (like regmap or GPIO
stuff) and it's hard to determine the probability of a lock issue (take
a spinlock already taken inside regmap code and live-lock forever, for
example). These are better to run, if possible, later than kdump or even
info list.

Thanks again for the good analysis Petr!
Cheers,


Guilherme





Re: [PATCH 19/30] panic: Add the panic hypervisor notifier list

2022-05-18 Thread Guilherme G. Piccoli
On 18/05/2022 04:58, Petr Mladek wrote:
> [...]
>> I does similar things like kmsg_dump() so it should be called in
>> the same location (after info notifier list and before kdump).
>>
>> A solution might be to put it at these notifiers at the very
>> end of the "info" list or make extra "dump" notifier list.
> 
> I just want to point out that the above idea has problems.
> Notifiers storing kernel log need to be treated as kmsg_dump().
> In particular, we would  need to know if there are any.
> We do not need to call "info" notifier list before kdump
> when there is no kernel log dumper registered.
> 

Notifiers respect the priority concept, which is just a number that
orders the list addition (and the list is called in order).

I've used the last position to panic_print() [in patch 25] - one idea
here is to "reserve" the last position (represented by INT_MIN) for
notifiers that act like kmsg_dump(). I couldn't find any IIRC, but that
doesn't prevent us to save this position and comment about that.

Makes sense to you ?
Cheers!



Re: [PATCH 19/30] panic: Add the panic hypervisor notifier list

2022-05-18 Thread Guilherme G. Piccoli
On 18/05/2022 04:38, Petr Mladek wrote:
> [...]
> I have answered this in more detail in the other reply, see
> https://lore.kernel.org/r/YoShZVYNAdvvjb7z@alley
> 
> I agree that both notifiers in
> 
> drivers/soc/bcm/brcmstb/pm/pm-arm.c
> drivers/firmware/google/gsmi.c
> 
> better fit into the hypervisor list after all.
> 
> Best Regards,
> Petr

Perfect, thanks - will keep both in such list for V2.



Re: [PATCH 21/30] panic: Introduce the panic pre-reboot notifier list

2022-05-17 Thread Guilherme G. Piccoli
On 17/05/2022 14:02, Luck, Tony wrote:
>> Tony / Dinh - can I just *skip* this notifier *if kdump* is set or else
>> we run the code as-is? Does that make sense to you?
> 
> The "skip" option sounds like it needs some special flag associated with
> an entry on the notifier chain. But there are other notifier chains ... so 
> that
> sounds messy to me.
> 
> Just all the notifiers in priority order. If any want to take different 
> actions
> based on kdump status, change the code. That seems more flexible than
> an "all or nothing" approach by skipping.
> 
> -Tony

I guess I've expressed myself in a poor way - sorry!

What I'm planning to do in the altera_edac notifier is:

if (kdump_is_set)
 return;

/* regular code */

In other words: if the kdump is set, this notifier will be effectively a
nop (although it's gonna be called).

Lemme know your thoughts Tony, if that makes sense.
Thanks,


Guilherme



Re: [PATCH 21/30] panic: Introduce the panic pre-reboot notifier list

2022-05-17 Thread Guilherme G. Piccoli
On 17/05/2022 11:11, Petr Mladek wrote:
> [...]
>>> Then notifiers could make an informed choice on whether to deep dive to
>>> get all the possible details (when there is no kdump) or just skim the high
>>> level stuff (to maximize chance of getting a successful kdump).
>>>
>>> -Tony
>>
>> Good idea Tony! What if I wire a kexec_crash_loaded() in the notifier?
> 
> I like this idea.
> 
> One small problem is that kexec_crash_loaded() has valid result
> only under kexec_mutex. On the other hand, it should stay true
> once loaded so that the small race window should be innocent.
> 
>> With that, are you/Petr/Dinh OK in moving it for the info list?
> 
> Sounds good to me.
> 
> Best Regards,
> Petr

Perfect, I'll do that for V2 then =)

Tony / Dinh - can I just *skip* this notifier *if kdump* is set or else
we run the code as-is? Does that make sense to you?

I'll postpone it to run almost in the end of info list (last position is
for panic_print).

Thanks,


Guilherme



Re: [PATCH 19/30] panic: Add the panic hypervisor notifier list

2022-05-17 Thread Guilherme G. Piccoli
On 17/05/2022 10:57, Petr Mladek wrote:
> [...]
 --- a/drivers/misc/bcm-vk/bcm_vk_dev.c
 +++ b/drivers/misc/bcm-vk/bcm_vk_dev.c
 @@ -1446,7 +1446,7 @@ static int bcm_vk_probe(struct pci_dev *pdev, const 
 struct pci_device_id *ent)
>>> [... snip ...]
>>> It seems to reset some hardware or so. IMHO, it should go into the
>>> pre-reboot list.
>>
>> Mixed feelings here, I'm looping Broadcom maintainers to comment.
>> (CC Scott and Broadcom list)
>>
>> I'm afraid it breaks kdump if this device is not reset beforehand - it's
>> a doorbell write, so not high risk I think...
>>
>> But in case the not-reset device can be probed normally in kdump kernel,
>> then I'm fine in moving this to the reboot list! I don't have the HW to
>> test myself.
> 
> Good question. Well, it if has to be called before kdump then
> even "hypervisor" list is a wrong place because is not always
> called before kdump.

Agreed! I'll defer that to Scott and Broadcom folks to comment.
If it's not strictly necessary, I'll happily move it to the reboot list.

If necessary, we could use the machine_crash_kexec() approach, but we'll
fall into the case arm64 doesn't support it and I'm not sure if this
device is available for arm - again a question for the maintainers.


>  [...]
 --- a/drivers/power/reset/ltc2952-poweroff.c
 +++ b/drivers/power/reset/ltc2952-poweroff.c
>> [...]
>> This is setting a variable only, and once it's set (data->kernel_panic
>> is the bool's name), it just bails out the IRQ handler and a timer
>> setting - this timer seems kinda tricky, so bailing out ASAP makes sense
>> IMHO.
> 
> IMHO, the timer informs the hardware that the system is still alive
> in the middle of panic(). If the timer is not working then the
> hardware (chip) will think that the system frozen in panic()
> and will power off the system. See the comments in
> drivers/power/reset/ltc2952-poweroff.c:
> [ snip ...]
> IMHO, we really have to keep it alive until we reach the reboot stage.
> 
> Another question is how it actually works when the interrupts are
> disabled during panic() and the timer callbacks are not handled.

Agreed here! Guess I can move this one the reboot list, fine by me.
Unless PM folks think otherwise.


> [...]
>> Disagree here, I'm CCing Florian for information.
>>
>> This notifier preserves RAM so it's *very interesting* if we have
>> kmsg_dump() for example, but maybe might be also relevant in case kdump
>> kernel is configured to store something in a persistent RAM (then,
>> without this notifier, after kdump reboots the system data would be lost).
> 
> I see. It is actually similar problem as with
> drivers/firmware/google/gsmi.c.
> 
> I does similar things like kmsg_dump() so it should be called in
> the same location (after info notifier list and before kdump).
> 
> A solution might be to put it at these notifiers at the very
> end of the "info" list or make extra "dump" notifier list.

Here I still disagree. I've commented in the other response thread
(about Google gsmi) about the semantics of the hypervisor list, but
again: this list should contain callbacks that

(a) Should run early, _by default_ before a kdump;
(b) Communicate with the firmware/hypervisor in a "low-risk" way;

Imagine a scenario where users configure kdump kernel to save something
in a persistent form in DRAM - it'd be like a late pstore, in the next
kernel. This callback enables that, it's meant to inform FW "hey, panic
happened, please from now on don't clear the RAM in the next FW-reboot".
I don't see a reason to postpone that - let's see if the maintainers
have an opinion.

Cheers,


Guilherme



Re: [PATCH 19/30] panic: Add the panic hypervisor notifier list

2022-05-17 Thread Guilherme G. Piccoli
On 17/05/2022 10:28, Petr Mladek wrote:
> [...]
>>> Disagree here. I'm looping Google maintainers, so they can comment.
>>> (CCed Evan, David, Julius)
>>>
>>> This notifier is clearly a hypervisor notification mechanism. I've fixed
>>> a locking stuff there (in previous patch), I feel it's low-risk but even
>>> if it's mid-risk, the class of such callback remains a perfect fit with
>>> the hypervisor list IMHO.
>>
>> This logs a panic to our "eventlog", a tiny logging area in SPI flash
>> for critical and power-related events. In some cases this ends up
>> being the only clue we get in a Chromebook feedback report that a
>> panic occurred, so from my perspective moving it to the front of the
>> line seems like a good idea.
> 
> IMHO, this would really better fit into the pre-reboot notifier list:
> 
>+ the callback stores the log so it is similar to kmsg_dump()
>  or console_flush_on_panic()
> 
>+ the callback should be proceed after "info" notifiers
>  that might add some other useful information.
> 
> Honestly, I am not sure what exactly hypervisor callbacks do. But I
> think that they do not try to extract the kernel log because they
> would need to handle the internal format.
> 

I guess the main point in your response is : "I am not sure what exactly
hypervisor callbacks do". We need to be sure about the semantics of such
list, and agree on that.

So, my opinion about this first list, that we call "hypervisor list",
is: it contains callbacks that

(1) should run early, preferably before kdump (or even if kdump isn't
set, should run ASAP);

(2) these callbacks perform some communication with an abstraction that
runs "below" the kernel, like a firmware or hypervisor. Classic example:
pvpanic, that communicates with VMM (usually qemu) and allow such VMM to
snapshot the full guest memory, for example.

(3) Should be low-risk. What defines risk is the level of reliability of
subsequent operations - if the callback have 50% of chance of "bricking"
the system totally and prevent kdump / kmsg_dump() / reboot , this is
high risk one for example.

Some good fits IMO: pvpanic, sstate_panic_event() [sparc], fadump in
powerpc, etc.

So, this is a good case for the Google notifier as well - it's not
collecting data like the dmesg (hence your second bullet seems to not
apply here, info notifiers won't add info to be collected by gsmi). It
is a firmware/hypervisor/whatever-gsmi-is notification mechanism, that
tells such "lower" abstraction a panic occurred. It seems low risk and
we want it to run ASAP, if possible.

So, I'd like to keep it here, unless gsmi maintainers disagree or I'm
perhaps misunderstanding the meaning of this first list.
Cheers,


Guilherme



Re: [PATCH 17/30] tracing: Improve panic/die notifiers

2022-05-17 Thread Guilherme G. Piccoli
On 11/05/2022 08:45, Petr Mladek wrote:
> [...]
> DIE_OOPS and PANIC_NOTIFIER are from different enum.
> It feels like comparing apples with oranges here.
> 
> IMHO, the proper way to unify the two notifiers is
> a check of the @self parameter. Something like:
> 
> static int trace_die_panic_handler(struct notifier_block *self,
>   unsigned long ev, void *unused)
> {
>   if (self == trace_die_notifier && val != DIE_OOPS)
>   goto out;
> 
>   ftrace_dump(ftrace_dump_on_oops);
> out:
>   return NOTIFY_DONE;
> }
> 
> Best Regards,
> Petr

OK Petr, thanks - will implement your suggestion in V2 (CC Steven)

Cheers!



Re: [PATCH 15/30] bus: brcmstb_gisb: Clean-up panic/die notifiers

2022-05-17 Thread Guilherme G. Piccoli
On 10/05/2022 12:28, Petr Mladek wrote:
> [...]
> IMHO, the check of the @self parameter was the proper solution.
> 
> "gisb_die_notifier" list uses @val from enum die_val.
> "gisb_panic_notifier" list uses @val from enum panic_notifier_val.
> 
> These are unrelated types. It might easily break when
> someone defines the same constant also in enum die_val.
> 
> Best Regards,
> Petr

OK Petr, I'll drop this idea for V2 - will just remove the useless
header / prototype then. (CC Florian)


Cheers,


Guilherme



Re: [PATCH 14/30] panic: Properly identify the panic event to the notifiers' callbacks

2022-05-17 Thread Guilherme G. Piccoli
On 17/05/2022 10:11, Petr Mladek wrote:
> [...]
>> You mentioned 2 cases:
>>
>> (a) Same notifier_list used in different situations;
>>
>> (b) Same *notifier callback* used in different lists;
>>
>> Mine is case (b), right? Can you show me an example of case (a)?
> 
> There are many examples of case (a):
> 
> [... snip ...] 
> These all call the same list/chain in different situations.
> The situation is distinguished by @val.
> 
> 
>> You can see in the following patches (or grep the kernel) that people are 
>> using
>> this identification parameter to determine which kind of OOPS trigger
>> the callback to condition the execution of the function to specific
>> cases.
> 
> Could you please show me some existing code for case (b)?
> I am not able to find any except in your patches.
> 

Hi Petr, thanks for the examples - I agree with you. In the end, seems
I'm kind of abusing the API. This id is used to distinguish different
situations in which the callback is called, but in the same
"realm"/notifier list.

In my case I have different list calling the same callback and
(ab-)using the id to make distinction. I can rework the patches using
pointer comparison, it's fine =)

So, I'll drop this patch in V2.

> Anyway, the solution in 16th patch is bad, definitely.
> hv_die_panic_notify_crash() uses "val" to disinguish
> both:
> 
>  + "panic_notifier_list" vs "die_chain"
>  + die_val when callen via "die_chain"
> 
> The API around "die_chain" API is not aware of enum panic_notifier_val
> and the API using "panic_notifier_list" is not aware of enum die_val.
> As I said, it is mixing apples and oranges and it is error prone.
> 

OK, I'll re-work that patch - there's more there to be changed, that one
is complex heheh

Cheers!



Re: [PATCH 05/30] misc/pvpanic: Convert regular spinlock into trylock on panic path

2022-05-17 Thread Guilherme G. Piccoli
On 17/05/2022 07:58, Petr Mladek wrote:
> [...]
>> Thanks for the review Petr. Patch was already merged - my goal was to be
>> concise, i.e., a patch per driver / module, so the patch kinda fixes
>> whatever I think is wrong with the driver with regards panic handling.
>>
>> Do you think it worth to remove this patch from Greg's branch just to
>> split it in 2? Personally I think it's not worth, but opinions are welcome.
> 
> No problem. It is not worth the effort.
> 

OK, perfect!

Cheers,


Guilherme



Re: [PATCH 21/30] panic: Introduce the panic pre-reboot notifier list

2022-05-16 Thread Guilherme G. Piccoli
On 16/05/2022 13:18, Luck, Tony wrote:
>> [...]
> Would it be possible to have some global "kdump is configured + enabled" flag?
> 
> Then notifiers could make an informed choice on whether to deep dive to
> get all the possible details (when there is no kdump) or just skim the high
> level stuff (to maximize chance of getting a successful kdump).
> 
> -Tony

Good idea Tony! What if I wire a kexec_crash_loaded() in the notifier?

With that, are you/Petr/Dinh OK in moving it for the info list?
Cheers,


Guilherme



Re: [PATCH 24/30] panic: Refactor the panic path

2022-05-16 Thread Guilherme G. Piccoli
On 16/05/2022 07:21, Petr Mladek wrote:
> [...]
> Ah, it should have been:
> 
>  + notifiers vs. kmsg_dump
>  + notifiers vs. crash_dump
>  + crash_dump vs. kmsg_dump
> 
> I am sorry for the confusion. Even "crash_dump" is slightly
> misleading because there is no function with this name.
> But it seems to be easier to understand than __crash_kexec().

Cool, thanks! Now it's totally clear for me =)
I feel crash dump is the proper term, but I personally prefer kdump to
avoid mess-up with user space "core dump" concept heheh
Also, KDUMP is an entry on MAINTAINERS file.


> [...]
>> Heheh OK, I appreciate your opinion, but I guess we'll need to agree in
>> disagree here - I'm much more fond to this kind of code than a bunch of
>> if/else blocks that almost give headaches. Encoding such "level" logic
>> in the if/else scheme is very convoluted, generates a very big code. And
>> the functions aren't so black magic - they map a level in bits, and the
>> functions _once() are called...once! Although we switch the position in
>> the code, so there are 2 calls, one of them is called and the other not.
> 
> I see. Well, I would consider this as a warning that the approach is
> too complex. If the code, using if/then/else, would cause headaches
> then also understanding of the behavior would cause headaches for
> both users and programmers.
> 
> 
>> But that's totally fine to change - especially if we're moving away from
>> the "level" logic. I see below you propose a much simpler approach - if
>> we follow that, definitely we won't need the "black magic" approach heheh
> 
> I do not say that my proposal is fully correct. But we really need
> this kind of simpler approach.

It's cool, I agree that your idea is much simpler and makes sense - mine
seems to be an over-engineering effort. Let's see the opinions of the
interested parties, I'm curious to see if everybody agrees here, that'd
would be ideal (and kind of "wishful thinking" I guess heheh - panic
path is polemic).


> [...] 
>> Here we have a very important point. Why do we need 2 variants of SMP
>> CPU stopping functions? I disagree with that - my understanding of this
>> after some study in architectures is that the crash_() variant is
>> "stronger", should work in all cases and if not, we should fix that -
>> that'd be a bug.
>>
>> Such variant either maps to smp_send_stop() (in various architectures,
>> including XEN/x86) or overrides the basic function with more proper
>> handling for panic() case...I don't see why we still need such
>> distinction, if you / others have some insight about that, I'd like to
>> hear =)
> 
> The two variants were introduced by the commit 0ee59413c967c35a6dd
> ("x86/panic: replace smp_send_stop() with kdump friendly version in
> panic path")
> 
> It points to https://lkml.org/lkml/2015/6/24/44 that talks about
> still running watchdogs.
> 
> It is possible that the problem could be fixed another way. It is
> even possible that it has already been fixed by the notifiers
> that disable the watchdogs.
> 
> Anyway, any change of the smp_send_stop() behavior should be done
> in a separate patch. It will help with bisection of possible
> regression. Also it would require a good explanation in
> the commit message. I would personally do it in a separate
> patch(set).

Thanks for the archeology and interesting findings. I agree that is
better to split in smaller patches. I'm planning a split in 3 patches
for V2: clean-up (comment, console flushing idea, useless header), the
refactor itself and finally, this SMP change.


> [...] 
>> You had the order of panic_reboot_list VS. consoles flushing inverted.
>> It might make sense, although I didn't do that in V1...
> 
> IMHO, it makes sense:
> 
>   1. panic_reboot_list contains notifiers that do the reboot
>  immediately, for example, xen_panic_event, alpha_panic_event.
>  The consoles have to be flushed earlier.
> 
>   2. console_flush_on_panic() ignores the result of console_trylock()
>  and always calls console_unlock(). As a result the lock should
>  be unlocked at the end. And any further printk() should be able
>  to printk the messages to the console immediately. It means
>  that any messages printed by the reboot notifiers should appear
>  on the console as well.
> [...] 
>> OK, I agree with you! It's indeed simpler and if others agree, I can
>> happily change the logic to what you proposed. Although...currently the
>> "crash_kexec_post_notifiers" allows to call _all_ panic_reboot_list
>> callbacks _before kdump_.
>>
>> We need to mention this change in the commit messages, but I really
>> would like to hear the opinions of heavy users of notifiers (as
>> Michael/Hyper-V) and the kdump interested parties (like Baoquan / Dave
>> Young / Hayatama). If we all agree on such approach, will change that
>> for V2 =)
> 
> Sure, we need to make sure that we call everything that is needed.
> And it should be documented.
> 
> I believe that this is the 

Re: [PATCH 18/30] notifier: Show function names on notifier routines if DEBUG_NOTIFIERS is set

2022-05-16 Thread Guilherme G. Piccoli
On 10/05/2022 14:29, Steven Rostedt wrote:
> [...]
> Also, don't sprinkle #ifdef in C code. Instead:
> 
>   if (IS_ENABLED(CONFIG_DEBUG_NOTIFIERS))
>   pr_info("notifers: regsiter %ps()\n",
>   n->notifer_call);
> 
> 
> Or define a print macro at the start of the C file that is a nop if it is
> not defined, and use the macro.

Thanks, I'll go with the IS_ENABLED() idea in V2 - appreciate the hint.
Cheers,


Guilherme



Re: [PATCH 25/30] panic, printk: Add console flush parameter and convert panic_print to a notifier

2022-05-16 Thread Guilherme G. Piccoli
On 16/05/2022 11:56, Petr Mladek wrote:
> [...]
> I really like both changes. Just please split it them into two
> patchset. I mean one patch for the new "panic_console_replay"
> parameter and 2nd that moves "printk_info" into the notifier.
> 

OK sure, will do that in V2.
Thanks,


Guilherme



Re: [PATCH 23/30] printk: kmsg_dump: Introduce helper to inform number of dumpers

2022-05-16 Thread Guilherme G. Piccoli
On 16/05/2022 11:50, Petr Mladek wrote:
> [...]
> Shame on me that I do not care that much about the style of the commit
> message :-)
> 
> Anyway, the code looks good to me. With the better commit message:
> 
> Reviewed-by: Petr Mladek 
> 

Heheh, cool - I'll add your tag and improve the message in V2.
Thanks,


Guilherme



Re: [PATCH 22/30] panic: Introduce the panic post-reboot notifier list

2022-05-16 Thread Guilherme G. Piccoli
On 16/05/2022 11:45, Petr Mladek wrote:
> [...]
> 
> The patch looks good to me. I would just suggest two changes.
> 
> 1. I would rename the list to "panic_loop_list" instead of
>"panic_post_reboot_list".
> 
>It will be more clear that it includes things that are
>needed before panic() enters the infinite loop.
> 
> 
> 2. I would move all the notifiers that enable blinking here.
> 
>The blinking should be done only during the infinite
>loop when there is nothing else to do. If we enable
>earlier then it might disturb/break more important
>functionality (dumping information, reboot).
> 

Perfect, I agree with you. I'll change both points in V2 =)



Re: [PATCH 21/30] panic: Introduce the panic pre-reboot notifier list

2022-05-16 Thread Guilherme G. Piccoli
Thanks again for the review! Comments inline below:


On 16/05/2022 11:33, Petr Mladek wrote:
> [...]
>> --- a/drivers/edac/altera_edac.c
>> +++ b/drivers/edac/altera_edac.c
>> @@ -2163,7 +2162,7 @@ static int altr_edac_a10_probe(struct platform_device 
>> *pdev)
>>  int dberror, err_addr;
>>  
>>  edac->panic_notifier.notifier_call = s10_edac_dberr_handler;
>> -atomic_notifier_chain_register(_notifier_list,
>> +atomic_notifier_chain_register(_pre_reboot_list,
> 
> My understanding is that this notifier first prints info about ECC
> errors and then triggers reboot. It might make sense to split it
> into two notifiers.

I disagree here - looping the maintainers for comments (CCing Dinh /
Tony). BTW, sorry for not having you on CC already Dinh, it was my mistake.

So, my reasoning here is: this notifier should fit the info list,
definitely! But...it's very high risk for kdump. It deep dives into the
regmap API (there are locks in such code) plus there is an (MM)IO write
to the device and an ARM firmware call. So, despite the nature of this
notifier _fits the informational list_, the _code is risky_ so we should
avoid running it before a kdump.

Now, we indeed have a chicken/egg problem: want to avoid it before
kdump, BUT in case kdump is not set, kmsg_dump() (and console flushing,
after your suggestion Petr) will run before it and not save collected
information from EDAC PoV.

My idea: I could call a second kmsg_dump() or at least a panic console
flush for within such notifier. Let me know what you think Petr (also
Dinh / Tony and all interested parties).


> [...] 
>> --- a/drivers/leds/trigger/ledtrig-panic.c
>> +++ b/drivers/leds/trigger/ledtrig-panic.c
>> @@ -64,7 +63,7 @@ static long led_panic_blink(int state)
>>  
>>  static int __init ledtrig_panic_init(void)
>>  {
>> -atomic_notifier_chain_register(_notifier_list,
>> +atomic_notifier_chain_register(_pre_reboot_list,
>> _trigger_panic_nb);
> 
> Blinking => should go to the last "post_reboot/loop" list.
> [...] 
>> --- a/drivers/misc/ibmasm/heartbeat.c
>> +++ b/drivers/misc/ibmasm/heartbeat.c
>> @@ -32,20 +31,23 @@ static int suspend_heartbeats = 0;
>>  static int panic_happened(struct notifier_block *n, unsigned long val, void 
>> *v)
>>  {
>>  suspend_heartbeats = 1;
>> -return 0;
>> +return NOTIFY_DONE;
>>  }
>>  
>> -static struct notifier_block panic_notifier = { panic_happened, NULL, 1 };
>> +static struct notifier_block panic_notifier = {
>> +.notifier_call = panic_happened,
>> +};
>>  
>>  void ibmasm_register_panic_notifier(void)
>>  {
>> -atomic_notifier_chain_register(_notifier_list, _notifier);
>> +atomic_notifier_chain_register(_pre_reboot_list,
>> +_notifier);
> 
> Same here. Blinking => should go to the last "post_reboot/loop" list.

Ack on both.

IBMasm is not blinking IIUC, but still fits properly the loop list. This
notifier would make a heartbeat mechanism stop, and once it's stopped,
service processor is allowed to reboot - that's my understanding.


Cheers,


Guilherme



Re: [PATCH 19/30] panic: Add the panic hypervisor notifier list

2022-05-16 Thread Guilherme G. Piccoli
Thanks for the review!

I agree with the blinking stuff, I can rework and add all LED/blinking
stuff into the loop list, it does make sense. I'll comment a bit in the
others below...

On 16/05/2022 11:01, Petr Mladek wrote:
> [...]
>> --- a/arch/mips/sgi-ip22/ip22-reset.c
>> +++ b/arch/mips/sgi-ip22/ip22-reset.c
>> @@ -195,7 +195,7 @@ static int __init reboot_setup(void)
>>  }
>>  
>>  timer_setup(_timer, blink_timeout, 0);
>> -atomic_notifier_chain_register(_notifier_list, _block);
>> +atomic_notifier_chain_register(_hypervisor_list, _block);
> 
> This notifier enables blinking. It is not much safe. It calls
> mod_timer() that takes a lock internally.
> 
> This kind of functionality should go into the last list called
> before panic() enters the infinite loop. IMHO, all the blinking
> stuff should go there.
> [...] 
>> --- a/arch/mips/sgi-ip32/ip32-reset.c
>> +++ b/arch/mips/sgi-ip32/ip32-reset.c
>> @@ -145,7 +144,7 @@ static __init int ip32_reboot_setup(void)
>>  pm_power_off = ip32_machine_halt;
>>  
>>  timer_setup(_timer, blink_timeout, 0);
>> -atomic_notifier_chain_register(_notifier_list, _block);
>> +atomic_notifier_chain_register(_hypervisor_list, _block);
> 
> Same here. Should be done only before the "loop".
> [...] 

Ack.


>> --- a/drivers/firmware/google/gsmi.c
>> +++ b/drivers/firmware/google/gsmi.c
>> @@ -1034,7 +1034,7 @@ static __init int gsmi_init(void)
>>  
>>  register_reboot_notifier(_reboot_notifier);
>>  register_die_notifier(_die_notifier);
>> -atomic_notifier_chain_register(_notifier_list,
>> +atomic_notifier_chain_register(_hypervisor_list,
>> _panic_notifier);
> 
> I am not sure about this one. It looks like some logging or
> pre_reboot stuff.
> 

Disagree here. I'm looping Google maintainers, so they can comment.
(CCed Evan, David, Julius)

This notifier is clearly a hypervisor notification mechanism. I've fixed
a locking stuff there (in previous patch), I feel it's low-risk but even
if it's mid-risk, the class of such callback remains a perfect fit with
the hypervisor list IMHO.


> [...] 
>> --- a/drivers/leds/trigger/ledtrig-activity.c
>> +++ b/drivers/leds/trigger/ledtrig-activity.c
>> @@ -247,7 +247,7 @@ static int __init activity_init(void)
>>  int rc = led_trigger_register(_led_trigger);
>>  
>>  if (!rc) {
>> -atomic_notifier_chain_register(_notifier_list,
>> +atomic_notifier_chain_register(_hypervisor_list,
>> _panic_nb);
> 
> The notifier is trivial. It just sets a variable.
> 
> But still, it is about blinking and should be done
> in the last "loop" list.
> 
> 
>>  register_reboot_notifier(_reboot_nb);
>>  }
>> --- a/drivers/leds/trigger/ledtrig-heartbeat.c
>> +++ b/drivers/leds/trigger/ledtrig-heartbeat.c
>> @@ -190,7 +190,7 @@ static int __init heartbeat_trig_init(void)
>>  int rc = led_trigger_register(_led_trigger);
>>  
>>  if (!rc) {
>> -atomic_notifier_chain_register(_notifier_list,
>> +atomic_notifier_chain_register(_hypervisor_list,
>> _panic_nb);
> 
> Same here. Blinking => loop list.

Ack.


>> [...]
>> diff --git a/drivers/misc/bcm-vk/bcm_vk_dev.c 
>> b/drivers/misc/bcm-vk/bcm_vk_dev.c
>> index a16b99bdaa13..d9d5199cdb2b 100644
>> --- a/drivers/misc/bcm-vk/bcm_vk_dev.c
>> +++ b/drivers/misc/bcm-vk/bcm_vk_dev.c
>> @@ -1446,7 +1446,7 @@ static int bcm_vk_probe(struct pci_dev *pdev, const 
>> struct pci_device_id *ent)
>>  
>>  /* register for panic notifier */
>>  vk->panic_nb.notifier_call = bcm_vk_on_panic;
>> -err = atomic_notifier_chain_register(_notifier_list,
>> +err = atomic_notifier_chain_register(_hypervisor_list,
>>   >panic_nb);
> 
> It seems to reset some hardware or so. IMHO, it should go into the
> pre-reboot list.

Mixed feelings here, I'm looping Broadcom maintainers to comment.
(CC Scott and Broadcom list)

I'm afraid it breaks kdump if this device is not reset beforehand - it's
a doorbell write, so not high risk I think...

But in case the not-reset device can be probed normally in kdump kernel,
then I'm fine in moving this to the reboot list! I don't have the HW to
test myself.


> [...]
>> --- a/drivers/power/reset/ltc2952-poweroff.c
>> +++ b/drivers/power/reset/ltc2952-poweroff.c
>> @@ -279,7 +279,7 @@ static int ltc2952_poweroff_probe(struct platform_device 
>> *pdev)
>>  pm_power_off = ltc2952_poweroff_kill;
>>  
>>  data->panic_notifier.notifier_call = ltc2952_poweroff_notify_panic;
>> -atomic_notifier_chain_register(_notifier_list,
>> +atomic_notifier_chain_register(_hypervisor_list,
>> >panic_notifier);
> 
> I looks like this somehow triggers the reboot. IMHO, it should go
> into the pre_reboot list.

Mixed feeling again here - CCing the maintainers for comments (Sebastian
/ PM 

Re: [PATCH 20/30] panic: Add the panic informational notifier list

2022-05-16 Thread Guilherme G. Piccoli
On 16/05/2022 11:11, Petr Mladek wrote:
> [...]
> 
> All notifiers moved in this patch seems to fit well the "info"
> notifier list. The patch looks good from this POV.
> 
> I still have to review the rest of the patches to see if it
> is complete.
> 
> Best Regards,
> Petr

Thanks a bunch for the review =)



Re: [PATCH 24/30] panic: Refactor the panic path

2022-05-15 Thread Guilherme G. Piccoli
On 12/05/2022 11:03, Petr Mladek wrote:
> Hello,
> 
> first, I am sorry for stepping into the discussion so late.
> I was busy with some other stuff and this patchset is far
> from trivial.
> 
> Second, thanks a lot for putting so much effort into it.
> Most of the changes look pretty good, especially all
> the fixes of particular notifiers and split into
> four lists.
> 
> Though this patch will need some more love. See below
> for more details.

Thanks a lot for your review Petr, it is much appreciated! No need for
apologies, there is no urgency here =)


> [...] 
> This talks only about kdump. The reality is much more complicated.
> The level affect the order of:
> 
> + notifiers vs. kdump
> + notifiers vs. crash_dump
> + crash_dump vs. kdump

First of all, I'd like to ask you please to clarify to me *exactly* what
are the differences between "crash_dump" and "kdump". I'm sorry if
that's a silly question, I need to be 100% sure I understand the
concepts the same way you do.


> There might theoretically many variants of the ordering of kdump,
> crash_dump, and the 4 notifier list. Some variants do not make
> much sense. You choose 5 variants and tried to select them by
> a level number.
> 
> The question is if we really could easily describe the meaning this
> way. It is not only about a "level" of notifiers before kdump. It is
> also about the ordering of crash_dump vs. kdump. IMHO, "level"
> semantic does not fit there.
> 
> Maybe more parameters might be easier to understand the effect.
> Anyway, we first need to agree on the chosen variants.
> I am going to discuss it more in the code, see below.
> 
> 
> [...] 
> Here is the code using the above functions. It helps to discuss
> the design and logic.
> 
> 
>   order_panic_notifiers_and_kdump();
> 
>   /* If no level, we should kdump ASAP. */
>   if (!panic_notifiers_level)
>   __crash_kexec(NULL);
> 
>   crash_smp_send_stop();
>   panic_notifier_hypervisor_once(buf);
> 
>   if (panic_notifier_info_once(buf))
>   kmsg_dump(KMSG_DUMP_PANIC);
> 
>   panic_notifier_pre_reboot_once(buf);
> 
>   __crash_kexec(NULL);
> 
>   panic_notifier_hypervisor_once(buf);
> 
>   if (panic_notifier_info_once(buf))
>   kmsg_dump(KMSG_DUMP_PANIC);
> 
>   panic_notifier_pre_reboot_once(buf);
> 
> 
> I have to say that the logic is very unclear. Almost all
> functions are called twice:
> 
>+ __crash_kexec()
>+ kmsg_dump()
>+ panic_notifier_hypervisor_once()
>+ panic_notifier_pre_reboot_once()
>+ panic_notifier_info_once()
> 
> It is pretty hard to find what functions are always called in the same
> order and where the order can be inverted.
> 
> The really used code path is defined by order_panic_notifiers_and_kdump()
> that encodes "level" into "bits". The bits are then flipped in
> panic_notifier_*_once() calls that either do something or not.
> kmsg_dump() is called according to the bit flip.
> 
> It is an interesting approach. I guess that you wanted to avoid too
> many if/then/else levels in panic(). But honestly, it looks like
> a black magic to me.
> 
> IMHO, it is always easier to follow if/then/else logic than using
> a translation table that requires additional bit flips when
> a value is used more times.
> 
> Also I guess that it is good proof that "level" abstraction does
> not fit here. Normal levels would not need this kind of magic.

Heheh OK, I appreciate your opinion, but I guess we'll need to agree in
disagree here - I'm much more fond to this kind of code than a bunch of
if/else blocks that almost give headaches. Encoding such "level" logic
in the if/else scheme is very convoluted, generates a very big code. And
the functions aren't so black magic - they map a level in bits, and the
functions _once() are called...once! Although we switch the position in
the code, so there are 2 calls, one of them is called and the other not.

But that's totally fine to change - especially if we're moving away from
the "level" logic. I see below you propose a much simpler approach - if
we follow that, definitely we won't need the "black magic" approach heheh


> 
> OK, the question is how to make it better. Let's start with
> a clear picture of the problem:
> 
> 1. panic() has basically two funtions:
> 
>   + show/store debug information (optional ways and amount)
>   + do something with the system (reboot, stay hanged)
> 
> 
> 2. There are 4 ways how to show/store the information:
> 
>   + tell hypervisor to store what it is interested about
>   + crash_dump
>   + kmsg_dump()
>   + consoles
> 
>   , where crash_dump and consoles are special:
> 
>  + crash_dump does not return. Instead it ends up with reboot.
> 
>  + Consoles work transparently. They just need an extra flush
>before reboot or staying hanged.
> 
> 
> 3. The various notifiers do things like:
> 
>  + tell hypervisor about the crash
>  + print more 

Re: [PATCH 11/30] um: Improve panic notifiers consistency and ordering

2022-05-15 Thread Guilherme G. Piccoli
On 13/05/2022 11:44, Johannes Berg wrote:
> [...]
>> Maybe Anton / Johannes / Richard could give their opinions - appreciate
>> that, I'm not attached to the priority here, it's more about users'
>> common usage of UML I can think of...
> 
> It's hard to say ... In a sense I'm not sure it matters?
> 
> OTOH something like the ftrace dump notifier (kernel/trace/trace.c)
> might still be useful to run before the mconsole and coredump ones, even
> if you could probably use gdb to figure out the information.
> 
> Personally, I don't have a scenario where I'd care about the trace
> buffers though, and most of the others I found would seem irrelevant
> (drivers that aren't even compiled, hung tasks won't really happen since
> we exit immediately, and similar.)
> 
> johannes

Thanks Johannes, I agree with you.

We don't have great ordering now, one thing we need to enforce is the
order between the 2 UML notifiers, and this patch is doing that..trying
to order against other callbacks like the ftrace dumper is messy in the
current code.

OTOH if this patch set is accepted at some point, we'll likely have 3
lists, and with that we can improve ordering a lot - this notifier for
instance would run in the pre-reboot list, *after* the ftrace dumper (if
a kmsg dumper is set).

So, my intention is to keep this patch as is for V2 (with some changes
Johannes suggested before), unless Petr or the other maintainers want
something different.
Cheers,


Guilherme



Re: [PATCH 11/30] um: Improve panic notifiers consistency and ordering

2022-05-11 Thread Guilherme G. Piccoli
On 10/05/2022 11:28, Petr Mladek wrote:
> [...]
> It is not clear to me why user mode linux should not care about
> the other notifiers. It might be because I do not know much
> about the user mode linux.
> 
> Is the because they always create core dump or are never running
> in a hypervisor or ...?
> 
> AFAIK, the notifiers do many different things. For example, there
> is a notifier that disables RCU watchdog, print some extra
> information. Why none of them make sense here?
>

Hi Petr, my understanding is that UML is a form of running Linux as a
regular userspace process for testing purposes. With that said, as soon
as we exit in the error path, less "pollution" would happen, so users
can use GDB to debug the core dump for example.

In later patches of this series (when we split the panic notifiers in 3
lists) these UML notifiers run in the pre-reboot list, so they run after
the informational notifiers for example (in the default level).
But without the list split we cannot order properly, so my gut feeling
is that makes sense to run them rather earlier than later in the panic
process...

Maybe Anton / Johannes / Richard could give their opinions - appreciate
that, I'm not attached to the priority here, it's more about users'
common usage of UML I can think of...

Cheers,


Guilherme



Re: [PATCH 10/30] alpha: Clean-up the panic notifier code

2022-05-11 Thread Guilherme G. Piccoli
On 10/05/2022 11:16, Petr Mladek wrote:
> [...]
> Yeah, it is pretty strange behavior.
> 
> I looked into the history. This notifier was added into the alpha code
> in 2.4.0-test2pre2. In this historic code, the default panic() code
> either rebooted after a timeout or ended in a infinite loop. There
> was not crasdump at that times.
> 
> The notifier allowed to change the behavior. There were 3 notifiers:
> 
>+ mips and mips64 ended with blinking in panic()
>+ alpha did __halt() in this srm case
> 
> They both still do this. I guess that it is some historic behavior
> that people using these architectures are used to.
> 
> Anyway, it makes sense to do this as the last notifier after
> dumping other information.
> 
> Reviewed-by: Petr Mladek 
> 
> Best Regards,
> Petr

Thanks a bunch for the review - added your tag for V2 =)



Re: [PATCH 23/30] printk: kmsg_dump: Introduce helper to inform number of dumpers

2022-05-11 Thread Guilherme G. Piccoli
On 10/05/2022 14:40, Steven Rostedt wrote:
> On Wed, 27 Apr 2022 19:49:17 -0300
> "Guilherme G. Piccoli"  wrote:
> 
>> Currently we don't have a way to check if there are dumpers set,
>> except counting the list members maybe. This patch introduces a very
>> simple helper to provide this information, by just keeping track of
>> registered/unregistered kmsg dumpers. It's going to be used on the
>> panic path in the subsequent patch.
> 
> FYI, it is considered "bad form" to reference in the change log "this
> patch". We know this is a patch. The change log should just talk about what
> is being done. So can you reword your change logs (you do this is almost
> every patch). Here's what I would reword the above to be:
> 
>  Currently we don't have a way to check if there are dumpers set, except
>  perhaps by counting the list members. Introduce a very simple helper to
>  provide this information, by just keeping track of registered/unregistered
>  kmsg dumpers. This will simplify the refactoring of the panic path.

Thanks for the hint, you're right - it's almost in all of my patches.
I'll reword all of them (except the ones already merged) to remove this
"bad form".

Cheers,


Guilherme



Re: [PATCH 22/30] panic: Introduce the panic post-reboot notifier list

2022-05-11 Thread Guilherme G. Piccoli
On 11/05/2022 13:45, Heiko Carstens wrote:
> [...]
>>
>> Hey S390/SPARC folks, sorry for the ping!
>>
>> Any reviews on this V1 would be greatly appreciated, I'm working on V2
>> and seeking feedback in the non-reviewed patches.
> 
> Sorry, missed that this is quite s390 specific. So, yes, this looks
> good to me and nice to see that one of the remaining CONFIG_S390 in
> common code will be removed!
> 
> For the s390 bits:
> Acked-by: Heiko Carstens 

No need for apologies, I really appreciate your review!
Will add your Ack for V2 =)

Cheers,


Guilherme



Re: [PATCH 01/30] x86/crash,reboot: Avoid re-disabling VMX in all CPUs on crash/restart

2022-05-10 Thread Guilherme G. Piccoli
On 09/05/2022 12:52, Sean Christopherson wrote:
> I find the shortlog to be very confusing, the bug has nothing to do with 
> disabling
> VMX and I distinctly remember wrapping VMXOFF with exception fixup to prevent 
> doom
> if VMX is already disabled :-).  The issue is really that 
> nmi_shootdown_cpus() doesn't
> play nice with being called twice.
> 

Hey Sean, OK - I agree with you, the issue is really about the double
list addition.

> [...]
> 
> Call stacks for the two callers would be very, very helpful.
> [...]

> This feels like were just adding more duct tape to the mess.  nmi_shootdown() 
> is
> still unsafe for more than one caller, and it takes a _lot_ of staring and 
> searching
> to understand that crash_smp_send_stop() is invoked iff CONFIG_KEXEC_CORE=y, 
> i.e.
> that it will call smp_ops.crash_stop_other_cpus() and not just 
> smp_send_stop().
> 
> Rather than shared a flag between two relatively unrelated functions, what if 
> we
> instead disabling virtualization in crash_nmi_callback() and then turn the 
> reboot
> call into a nop if an NMI shootdown has already occurred?  That will also add 
> a
> bit of documentation about multiple shootdowns not working.
> 
> And I believe there's also a lurking bug in 
> native_machine_emergency_restart() that
> can be fixed with cleanup.  SVM can also block INIT and so should be disabled 
> during
> an emergency reboot.
> 
> The attached patches are compile tested only.  If they seem sane, I'll post an
> official mini series.

Thanks Sean, it makes sense - my patch is more a "band-aid" whereas
yours fixes it in a more generic way. Confess I found the logic of your
patch complex, but as you said, it requires a *lot* of code analysis to
understand these multiple shutdown patches, the problem is complicated
by nature heh

I've tested your patch 0001 and it works well for all cases [0], so go
ahead and submit the miniseries, feel free to add:

Reported-and-tested-by: Guilherme G. Piccoli 


I've read patch 0002 and it makes sense to me as well, a good proactive
bug fix =)

With that said, I'll of course drop this one from V2 of this series.
Cheers,


Guilherme




[0]
A summary of my tests and the code paths that the panic shutdown take
depending on some conditions:

New function that disables VMX/SVM: cpu_crash_disable_virtualization()
[should be executed in every online CPU on shutdown)

The panic path triggers the following call stacks depending on kdump and
post_notifiers:


(1) kexec/kdump + !crash_kexec_post_notifiers
->machine_crash_shutdown()
.crash_shutdown() 
--native_machine_crash_shutdown() [all custom handlers except Xen PV
call the native generic function]
crash_smp_send_stop()
--kdump_nmi_shootdown_cpus()
nmi_shootdown_cpus(kdump_nmi_callback)
--crash_nmi_callback()
kdump_nmi_callback()
--cpu_crash_disable_virtualization()


(2) kexec/kdump + crash_kexec_post_notifiers
->crash_smp_send_stop()
kdump_nmi_shootdown_cpus()
--nmi_shootdown_cpus(kdump_nmi_callback)
crash_nmi_callback()
--kdump_nmi_callback()
cpu_crash_disable_virtualization()

After this path, will execute machine_crash_shutdown() but
crash_smp_send_stop()
is guarded against double execution. Also, emergency restart calls
emergency_vmx_disable_all() .


(3) !kexec/kdump + crash_kexec_post_notifiers

Same as (2)


(4) !kexec/kdump + !crash_kexec_post_notifiers
-> smp_send_stop()
native_stop_other_cpus()
--apic_send_IPI_allbutself(REBOOT_VECTOR)
sysvec_reboot
--cpu_emergency_vmxoff() 

If not:
--register_stop_handler()
apic_send_IPI_allbutself(NMI_VECTOR)
--smp_stop_nmi_callback()
cpu_emergency_vmxoff()

After that, emergency_vmx_disable_all() gets called in the emergency
restart path as well.



Re: [PATCH 14/30] panic: Properly identify the panic event to the notifiers' callbacks

2022-05-10 Thread Guilherme G. Piccoli
On 10/05/2022 12:16, Petr Mladek wrote:
> [...]
> Hmm, this looks like a hack. PANIC_UNUSED will never be used.
> All notifiers will be always called with PANIC_NOTIFIER.
> 
> The @val parameter is normally used when the same notifier_list
> is used in different situations.
> 
> But you are going to use it when the same notifier is used
> in more lists. This is normally distinguished by the @nh
> (atomic_notifier_head) parameter.
> 
> IMHO, it is a bad idea. First, it would confuse people because
> it does not follow the original design of the parameters.
> Second, the related code must be touched anyway when
> the notifier is moved into another list so it does not
> help much.
> 
> Or do I miss anything, please?
> 
> Best Regards,
> Petr

Hi Petr, thanks for the review.

I'm not strong attached to this patch, so we could drop it and refactor
the code of next patches to use the @nh as identification - but
personally, I feel this parameter could be used to identify the list
that called such function, in other words, what is the event that
triggered the callback. Some notifiers are even declared with this
parameter called "ev", like the event that triggers the notifier.


You mentioned 2 cases:

(a) Same notifier_list used in different situations;

(b) Same *notifier callback* used in different lists;

Mine is case (b), right? Can you show me an example of case (a)? You can
see in the following patches (or grep the kernel) that people are using
this identification parameter to determine which kind of OOPS trigger
the callback to condition the execution of the function to specific
cases. IIUIC, this is more or less what I'm doing, but extending the
idea for panic notifiers.

Again, as a personal preference, it makes sense to me using id's VS
comparing pointers to differentiate events/callers.

Cheers,


Guilherme



Re: [PATCH 08/30] powerpc/setup: Refactor/untangle panic notifiers

2022-05-10 Thread Guilherme G. Piccoli
On 10/05/2022 10:53, Michael Ellerman wrote:
> "Guilherme G. Piccoli"  writes:
>> On 05/05/2022 15:55, Hari Bathini wrote:
>>> [...] 
>>> The change looks good. I have tested it on an LPAR (ppc64).
>>>
>>> Reviewed-by: Hari Bathini 
>>>
>>
>> Hi Michael. do you think it's possible to add this one to powerpc/next
>> (or something like that), or do you prefer a V2 with his tag?
> 
> Ah sorry, I assumed it was going in as part of the whole series. I guess
> I misread the cover letter.
> 
> So you want me to take this patch on its own via the powerpc tree?
> 
> cheers

Hi Michael, thanks for the prompt response!

You didn't misread, that was the plan heh
But some maintainers start to take patches and merge in their trees, and
in the end, it seems to make sense - almost half of this series are
fixes or clean-ups, that are not really necessary to get merged altogether.

So, if you can take this one, I'd appreciate - it'll make V2 a bit
smaller =)

Cheers,


Guilherme



Re: [PATCH 04/30] firmware: google: Convert regular spinlock into trylock on panic path

2022-05-10 Thread Guilherme G. Piccoli
On 10/05/2022 08:38, Petr Mladek wrote:
> [...]
> I see two more alternative solutions:
> 
> 1st variant is a trick already used in console write() callbacks.
> They do trylock() when oops_in_progress is set. They remember
> the result to prevent double unlock when printing Oops messages and
> the system will try to continue working. For example:
> 
> pl011_console_write(struct console *co, const char *s, unsigned int count)
> {
> [...]
>   int locked = 1;
> [...]
>   if (uap->port.sysrq)
>   locked = 0;
>   else if (oops_in_progress)
>   locked = spin_trylock(>port.lock);
>   else
>   spin_lock(>port.lock);
> 
> [...]
> 
>   if (locked)
>   spin_unlock(>port.lock);
> }
> 
> 
> 2nd variant is to check panic_cpu variable. It is used in printk.c.
> We might move the function to panic.h:
> 
> static bool panic_in_progress(void)
> {
>   return unlikely(atomic_read(_cpu) != PANIC_CPU_INVALID);
> }
> 
> and then do:
> 
>   if (panic_in_progress()) {
>   ...

Thanks for the review Petr! I feel alternative two is way better, it
checks for panic - the oops_in_progress isn't really enough, since we
can call panic() directly, not necessarily through an oops path, correct?

For me, we could stick with the lock check, but I'll defer to Evan - I
didn't work the V2 patch yet, what do you prefer Evan?


> [...]
> As already mentioned in the other reply, panic() sometimes stops
> the other CPUs using NMI, for example, see kdump_nmi_shootdown_cpus().
> 
> Another situation is when the CPU using the lock ends in some
> infinite loop because something went wrong. The system is in
> an unpredictable state during panic().
> 
> I am not sure if this is possible with the code under gsmi_dev.lock
> but such things really happen during panic() in other subsystems.
> Using trylock in the panic() code path is a good practice.
> 
> Best Regards,
> Petr

Makes total sense, thanks for confirming!
Cheers,


Guilherme



Re: [PATCH 05/30] misc/pvpanic: Convert regular spinlock into trylock on panic path

2022-05-10 Thread Guilherme G. Piccoli
On 10/05/2022 09:14, Petr Mladek wrote:
> [...]
>> With that said, it's dangerous to use regular spinlocks in such path,
>> as introduced by commit b3c0f8774668 ("misc/pvpanic: probe multiple 
>> instances").
>> This patch fixes that by replacing regular spinlocks with the trylock
>> safer approach.
> 
> It seems that the lock is used just to manipulating a list. A super
> safe solution would be to use the rcu API: rcu_add_rcu() and
> list_del_rcu() under rcu_read_lock(). The spin lock will not be
> needed and the list will always be valid.
> 
> The advantage would be that it will always call members that
> were successfully added earlier. That said, I am not familiar
> with pvpanic and am not sure if it is worth it.
> 
>> It also fixes an old comment (about a long gone framebuffer code) and
>> the notifier priority - we should execute hypervisor notifiers early,
>> deferring this way the panic action to the hypervisor, as expected by
>> the users that are setting up pvpanic.
> 
> This should be done in a separate patch. It changes the behavior.
> Also there might be a discussion whether it really should be
> the maximal priority.
> 
> Best Regards,
> Petr

Thanks for the review Petr. Patch was already merged - my goal was to be
concise, i.e., a patch per driver / module, so the patch kinda fixes
whatever I think is wrong with the driver with regards panic handling.

Do you think it worth to remove this patch from Greg's branch just to
split it in 2? Personally I think it's not worth, but opinions are welcome.

About the RCU part, this one really could be a new patch, a good
improvement patch - it makes sense to me, we can think about that after
the fixes I guess.

Cheers,


Guilherme



Re: [PATCH 24/30] panic: Refactor the panic path

2022-05-09 Thread Guilherme G. Piccoli
Hey Hatayma, thanks for your great analysis and no need for apologies!

I'll comment/respond properly inline below, just noticing here that I've
CCed Mark and Marc (from the ARM64 perspective), Michael (Hyper-V
perspective) and Hari (PowerPC perspective), besides the usual suspects
as Petr, Baoquan, etc.

On 09/05/2022 12:16, d.hatay...@fujitsu.com wrote:
> Sorry for the delayed response. Unfortunately, I had 10 days holidays
> until yesterday...
> [...] 
>> +   We currently have 4 lists of panic notifiers; based
>> +   on the functionality and risk (for panic success) the
>> +   callbacks are added in a given list. The lists are:
>> +   - hypervisor/FW notification list (low risk);
>> +   - informational list (low/medium risk);
>> +   - pre_reboot list (higher risk);
>> +   - post_reboot list (only run late in panic and after
>> +   kdump, not configurable for now).
>> +   This parameter defines the ordering of the first 3
>> +   lists with regards to kdump; the levels determine
>> +   which set of notifiers execute before kdump. The
>> +   accepted levels are:
>> +   0: kdump is the first thing to run, NO list is
>> +   executed before kdump.
>> +   1: only the hypervisor list is executed before kdump.
>> +   2 (default level): the hypervisor list and (*if*
> 
> Hmmm, why are you trying to change default setting?
> 
> Based on the current design of kdump, it's natural to put what the
> handlers for these level 1 and level 2 handlers do in
> machine_crash_shutdown(), as these are necessary by default, right?
> 
> Or have you already tried that and figured out it's difficult in some
> reason and reached the current design? If so, why is that difficult?
> Could you point to if there is already such discussion online?
> 
> kdump is designed to perform as little things as possible before
> transferring the execution to the 2nd kernel in order to increase
> reliability. Just detour to panic() increases risks of kdump failure
> in the sense of increasing the executed codes in the abnormal
> situation, which is very the note in the explanation of
> crash_kexec_post_notifiers.
> 
> Also, the current implementation of crash_kexec_post_notifiers uses
> the panic notifier, but this is not from the technical
> reason. Ideally, it should have been implemented in the context of
> crash_kexec() independently of panic().
> 
> That is, it looks to me that, in addition to changing design of panic
> notifier, you are trying to integrate shutdown code of the crash kexec
> and the panic paths. If so, this is a big design change for kdump.
> I'm concerned about increase of reliability. I'd like you to discuss
> them carefully.

>From my understanding (specially based on both these threads [0] and
[1]), 3 facts are clear and quite complex in nature:

(a) Currently, the panic notifier list is a "no man's land" - it's a
mess, all sort of callbacks are added there, some of them are extremely
risk for breaking kdump, others are quite safe (like setting a
variable). Petr's details in thread [0] are really clear and express in
great way how confusing and conflicting the panic notifiers goals are.

(b) In order to "address" problems in the antagonistic goals of
notifiers (see point (a) above and thread [0]), we have this
quirk/workaround called "crash_kexec_post_notifiers". This is useful,
but (almost as for attesting how this is working as band-aid over
complex and fundamental issues) see the following commits:

a11589563e96 ("x86/Hyper-V: Report crash register data or kmsg before
running crash kernel")

06e629c25daa ("powerpc/fadump: Fix inaccurate CPU state info in vmcore
generated with panic")

They hardcode such workaround, because they *need* some notifiers'
callbacks. But notice they *don't need all of them*, only some important
ones (that usually are good considering the risk, it's a good
cost/benefit). Since we currently have an all-or-nothing behavior for
the panic notifiers, both PowerPC and Hyper-V end-up bringing all of
them to run before kdump due to the lack of flexibility, increasing a
lot the risk of failure for kdump.

(c) To add on top of all such complexity, we don't have a custom
machine_crash_shutdown() handler for some architectures like ARM64, and
the feeling is that's not right to add a bunch of callbacks / complexity
in such architecture code, specially since we have the notifiers
infrastructure in the kernel. I've recently started a discussion about
that with ARM64 community, please take a look in [1].

With that said, we can use (a) + (b) + (c) to justify our argument here:
the panic notifiers should be refactored! We need to try to encompass
the antagonistic goals of kdump (wants to be the 

Re: [PATCH 09/30] coresight: cpu-debug: Replace mutex with mutex_trylock on panic notifier

2022-05-09 Thread Guilherme G. Piccoli
On 09/05/2022 13:14, Suzuki K Poulose wrote:
> [...]> 
> I have queued this to coresight/next.
> 
> Thanks
> Suzuki


Thanks a lot Suzuki!



Re: [PATCH 24/30] panic: Refactor the panic path

2022-05-09 Thread Guilherme G. Piccoli
On 29/04/2022 13:04, Guilherme G. Piccoli wrote:
> On 27/04/2022 21:28, Randy Dunlap wrote:
>>
>>
>> On 4/27/22 15:49, Guilherme G. Piccoli wrote:
>>> +   crash_kexec_post_notifiers
>>> +   This was DEPRECATED - users should always prefer the
>>
>>  This is DEPRECATED - users should always prefer the
>>
>>> +   parameter "panic_notifiers_level" - check its entry
>>> +   in this documentation for details on how it works.
>>> +   Setting this parameter is exactly the same as setting
>>> +   "panic_notifiers_level=4".
>>
> 
> Thanks Randy, for your suggestion - but I confess I couldn't understand
> it properly. It's related to spaces/tabs, right? What you suggest me to
> change in this formatting? Just by looking the email I can't parse.
> 
> Cheers,
> 
> 
> Guilherme

Complete lack of attention from me, apologies!
The suggestions was s/was/is - already fixed for V2, thanks Randy.



Re: [PATCH 22/30] panic: Introduce the panic post-reboot notifier list

2022-05-09 Thread Guilherme G. Piccoli
On 27/04/2022 19:49, Guilherme G. Piccoli wrote:
> Currently we have 3 notifier lists in the panic path, which will
> be wired in a way to allow the notifier callbacks to run in
> different moments at panic time, in a subsequent patch.
> 
> But there is also an odd set of architecture calls hardcoded in
> the end of panic path, after the restart machinery. They're
> responsible for late time tunings / events, like enabling a stop
> button (Sparc) or effectively stopping the machine (s390).
> 
> This patch introduces yet another notifier list to offer the
> architectures a way to add callbacks in such late moment on
> panic path without the need of ifdefs / hardcoded approaches.
> 
> Cc: Alexander Gordeev 
> Cc: Christian Borntraeger 
> Cc: "David S. Miller" 
> Cc: Heiko Carstens 
> Cc: Sven Schnelle 
> Cc: Vasily Gorbik 
> Signed-off-by: Guilherme G. Piccoli 

Hey S390/SPARC folks, sorry for the ping!

Any reviews on this V1 would be greatly appreciated, I'm working on V2
and seeking feedback in the non-reviewed patches.

Thanks in advance,


Guilherme



Re: [PATCH 10/30] alpha: Clean-up the panic notifier code

2022-05-09 Thread Guilherme G. Piccoli
On 27/04/2022 19:49, Guilherme G. Piccoli wrote:
> The alpha panic notifier has some code issues, not following
> the conventions of other notifiers. Also, it might halt the
> machine but still it is set to run as early as possible, which
> doesn't seem to be a good idea.
> 
> This patch cleans the code, and set the notifier to run as the
> latest, following the same approach other architectures are doing.
> Also, we remove the unnecessary include of a header already
> included indirectly.
> 
> Cc: Ivan Kokshaysky 
> Cc: Matt Turner 
> Cc: Richard Henderson 
> Signed-off-by: Guilherme G. Piccoli 
> ---
>  arch/alpha/kernel/setup.c | 36 +++-
>  1 file changed, 15 insertions(+), 21 deletions(-)
> 
> diff --git a/arch/alpha/kernel/setup.c b/arch/alpha/kernel/setup.c
> index b4fbbba30aa2..d88bdf852753 100644
> --- a/arch/alpha/kernel/setup.c
> +++ b/arch/alpha/kernel/setup.c
> @@ -41,19 +41,11 @@
>  #include 
>  #include 
>  #endif
> -#include 
>  #include 
>  #include 
>  #include 
>  #include 
>  
> -static int alpha_panic_event(struct notifier_block *, unsigned long, void *);
> -static struct notifier_block alpha_panic_block = {
> - alpha_panic_event,
> -NULL,
> -INT_MAX /* try to do it first */
> -};
> -
>  #include 
>  #include 
>  #include 
> @@ -435,6 +427,21 @@ static const struct sysrq_key_op srm_sysrq_reboot_op = {
>  };
>  #endif
>  
> +static int alpha_panic_event(struct notifier_block *this,
> +  unsigned long event, void *ptr)
> +{
> + /* If we are using SRM and serial console, just hard halt here. */
> + if (alpha_using_srm && srmcons_output)
> + __halt();
> +
> + return NOTIFY_DONE;
> +}
> +
> +static struct notifier_block alpha_panic_block = {
> + .notifier_call = alpha_panic_event,
> + .priority = INT_MIN, /* may not return, do it last */
> +};
> +
>  void __init
>  setup_arch(char **cmdline_p)
>  {
> @@ -1427,19 +1434,6 @@ const struct seq_operations cpuinfo_op = {
>   .show   = show_cpuinfo,
>  };
>  
> -
> -static int
> -alpha_panic_event(struct notifier_block *this, unsigned long event, void 
> *ptr)
> -{
> -#if 1
> - /* FIXME FIXME FIXME */
> - /* If we are using SRM and serial console, just hard halt here. */
> - if (alpha_using_srm && srmcons_output)
> - __halt();
> -#endif
> -return NOTIFY_DONE;
> -}
> -
>  static __init int add_pcspkr(void)
>  {
>   struct platform_device *pd;


Hi folks, I'm updating Richard's email and re-sending the V1, any
reviews are greatly appreciated!

Cheers,


Guilherme



Re: [PATCH 09/30] coresight: cpu-debug: Replace mutex with mutex_trylock on panic notifier

2022-05-09 Thread Guilherme G. Piccoli
On 28/04/2022 05:11, Suzuki K Poulose wrote:
> Hi Guilherme,
> 
> On 27/04/2022 23:49, Guilherme G. Piccoli wrote:
>> The panic notifier infrastructure executes registered callbacks when
>> a panic event happens - such callbacks are executed in atomic context,
>> with interrupts and preemption disabled in the running CPU and all other
>> CPUs disabled. That said, mutexes in such context are not a good idea.
>>
>> This patch replaces a regular mutex with a mutex_trylock safer approach;
>> given the nature of the mutex used in the driver, it should be pretty
>> uncommon being unable to acquire such mutex in the panic path, hence
>> no functional change should be observed (and if it is, that would be
>> likely a deadlock with the regular mutex).
>>
>> Fixes: 2227b7c74634 ("coresight: add support for CPU debug module")
>> Cc: Leo Yan 
>> Cc: Mathieu Poirier 
>> Cc: Mike Leach 
>> Cc: Suzuki K Poulose 
>> Signed-off-by: Guilherme G. Piccoli 
> 
> How would you like to proceed with queuing this ? I am happy
> either way. In case you plan to push this as part of this
> series (I don't see any potential conflicts) :
> 
> Reviewed-by: Suzuki K Poulose 

Hi Suzuki, some other maintainers are taking the patches to their next
branches for example. I'm working on V2, and I guess in the end would be
nice to reduce the size of the series a bit.

So, do you think you could pick this one for your coresight/next branch
(or even for rc cycle, your call - this is really a fix)?
This way, I won't re-submit this one in V2, since it's gonna be merged
already in your branch.

Thanks in advance,


Guilherme



Re: [PATCH 08/30] powerpc/setup: Refactor/untangle panic notifiers

2022-05-09 Thread Guilherme G. Piccoli
On 05/05/2022 15:55, Hari Bathini wrote:
> [...] 
> The change looks good. I have tested it on an LPAR (ppc64).
> 
> Reviewed-by: Hari Bathini 
> 

Hi Michael. do you think it's possible to add this one to powerpc/next
(or something like that), or do you prefer a V2 with his tag?
Thanks,


Guilherme



Re: [PATCH 01/30] x86/crash,reboot: Avoid re-disabling VMX in all CPUs on crash/restart

2022-05-09 Thread Guilherme G. Piccoli
On 27/04/2022 19:48, Guilherme G. Piccoli wrote:
> In the panic path we have a list of functions to be called, the panic
> notifiers - such callbacks perform various actions in the machine's
> last breath, and sometimes users want them to run before kdump. We
> have the parameter "crash_kexec_post_notifiers" for that. When such
> parameter is used, the function "crash_smp_send_stop()" is executed
> to poweroff all secondary CPUs through the NMI-shootdown mechanism;
> part of this process involves disabling virtualization features in
> all CPUs (except the main one).
> 
> Now, in the emergency restart procedure we have also a way of
> disabling VMX in all CPUs, using the same NMI-shootdown mechanism;
> what happens though is that in case we already NMI-disabled all CPUs,
> the emergency restart fails due to a second addition of the same items
> in the NMI list, as per the following log output:
> 
> sysrq: Trigger a crash
> Kernel panic - not syncing: sysrq triggered crash
> [...]
> Rebooting in 2 seconds..
> list_add double add: new=, prev=, next=.
> [ cut here ]
> kernel BUG at lib/list_debug.c:29!
> invalid opcode:  [#1] PREEMPT SMP PTI
> 
> In order to reproduce the problem, users just need to set the kernel
> parameter "crash_kexec_post_notifiers" *without* kdump set in any
> system with the VMX feature present.
> 
> Since there is no benefit in re-disabling VMX in all CPUs in case
> it was already done, this patch prevents that by guarding the restart
> routine against doubly issuing NMIs unnecessarily. Notice we still
> need to disable VMX locally in the emergency restart.
> 
> Fixes: ed72736183c4 ("x86/reboot: Force all cpus to exit VMX root if VMX is 
> supported)
> Fixes: 0ee59413c967 ("x86/panic: replace smp_send_stop() with kdump friendly 
> version in panic path")
> Cc: David P. Reed 
> Cc: Hidehiro Kawai 
> Cc: Paolo Bonzini 
> Cc: Sean Christopherson 
> Signed-off-by: Guilherme G. Piccoli 
> ---
>  arch/x86/include/asm/cpu.h |  1 +
>  arch/x86/kernel/crash.c|  8 
>  arch/x86/kernel/reboot.c   | 14 --
>  3 files changed, 17 insertions(+), 6 deletions(-)
> 

Hi Paolo / Sean / Vitaly, sorry for the ping.
But do you think this fix is OK from the VMX point-of-view?

I'd like to send a V2 of this set soon, so any review here is highly
appreciated!

Cheers,


Guilherme




Re: [PATCH 08/30] powerpc/setup: Refactor/untangle panic notifiers

2022-05-05 Thread Guilherme G. Piccoli
On 05/05/2022 15:55, Hari Bathini wrote:
> [...] 
> 
> The change looks good. I have tested it on an LPAR (ppc64).
> 
> Reviewed-by: Hari Bathini 

Thanks a bunch Hari, much appreciated!



Re: [PATCH 07/30] mips: ip22: Reword PANICED to PANICKED and remove useless header

2022-05-04 Thread Guilherme G. Piccoli
On 04/05/2022 17:32, Thomas Bogendoerfer wrote:
> [...]
> 
> applied to mips-next.
> 
> Thomas.
> 

Thanks a bunch Thomas =)



Re: [PATCH 04/30] firmware: google: Convert regular spinlock into trylock on panic path

2022-05-04 Thread Guilherme G. Piccoli
On 03/05/2022 18:56, Evan Green wrote:
> Hi Guilherme,
> [...] 
>> Do you agree with that, or prefer really a parameter in
>> gsmi_shutdown_reason() ? I'll follow your choice =)
> 
> I'm fine with either, thanks for the link. Mostly I want to make sure
> other paths to gsmi_shutdown_reason() aren't also converted to a try.

Hi Evan, thanks for the prompt response! So, I'll proceed like I did in
s390, for consistency.

> [...]
>> Reasoning: the problem with your example is that, by default, secondary
>> CPUs are disabled in the panic path, through an IPI mechanism. IPIs take
>> precedence and interrupt the work in these CPUs, effectively
>> interrupting the "polite work" with the lock held heh
> 
> The IPI can only interrupt a CPU with irqs disabled if the IPI is an
> NMI. I haven't looked before to see if we use NMI IPIs to corral the
> other CPUs on panic. On x86, I grepped my way down to
> native_stop_other_cpus(), which looks like it does a normal IPI, waits
> 1 second, then does an NMI IPI. So, if a secondary CPU has the lock
> held, on x86 it has roughly 1s to finish what it's doing and re-enable
> interrupts before smp_send_stop() brings the NMI hammer down. I think
> this should be more than enough time for the secondary CPU to get out
> and release the lock.
> 
> So then it makes sense to me that you're fixing cases where we
> panicked with the lock held, or hung with the lock held. Given the 1
> second grace period x86 gives us, I'm on board, as that helps mitigate
> the risk that we bailed out early with the try and should have spun a
> bit longer instead. Thanks.
> 
> -Evan

Well, in the old path without "crash_kexec_post_notifiers", we indeed
end-up relying on native_stop_other_cpus() for x86 as you said, and the
"1s rule" makes sense. But after this series (or even before, if the
kernel parameter "crash_kexec_post_notifiers" was used) the function
used to stop CPUs in the panic path is crash_smp_send_stop(), and the
call chain is like:

Main CPU:
crash_smp_send_stop()
--kdump_nmi_shootdown_cpus()
nmi_shootdown_cpus()

Then, in each CPU (except the main one, running panic() path),
we execute kdump_nmi_callback() in NMI context.

So, we seem to indeed interrupt any context (even with IRQs disabled),
increasing the likelihood of the potential lockups due to stopped CPUs
holding the locks heheh

Thanks again for the good discussion, let me know if anything I'm saying
doesn't make sense - this crash path is a bit convoluted, specially in
x86, I might have understood something wrongly =)
Cheers,


Guilherme



Re: [PATCH 04/30] firmware: google: Convert regular spinlock into trylock on panic path

2022-05-03 Thread Guilherme G. Piccoli
On 03/05/2022 15:03, Evan Green wrote:
> [...]
> gsmi_shutdown_reason() is a common function called in other scenarios
> as well, like reboot and thermal trip, where it may still make sense
> to wait to acquire a spinlock. Maybe we should add a parameter to
> gsmi_shutdown_reason() so that you can get your change on panic, but
> we don't convert other callbacks into try-fail scenarios causing us to
> miss logs.
> 

Hi Evan, thanks for your feedback, much appreciated!
What I've done in other cases like this was to have a helper checking
the spinlock in the panic notifier - if we can acquire that, go ahead
but if not, bail out. For a proper example of an implementation, check
patch 13 of the series:
https://lore.kernel.org/lkml/20220427224924.592546-14-gpicc...@igalia.com/ .

Do you agree with that, or prefer really a parameter in
gsmi_shutdown_reason() ? I'll follow your choice =)


> Though thinking more about it, is this really a Good Change (TM)? The
> spinlock itself already disables interrupts, meaning the only case
> where this change makes a difference is if the panic happens from
> within the function that grabbed the spinlock (in which case the
> callback is also likely to panic), or in an NMI that panics within
> that window. The downside of this change is that if one core was
> politely working through an event with the lock held, and another core
> panics, we now might lose the panic log, even though it probably would
> have gone through fine assuming the other core has a chance to
> continue.

My feeling is that this is a good change, indeed - a lot of places are
getting changed like this, in this series.

Reasoning: the problem with your example is that, by default, secondary
CPUs are disabled in the panic path, through an IPI mechanism. IPIs take
precedence and interrupt the work in these CPUs, effectively
interrupting the "polite work" with the lock held heh

Then, such CPU is put to sleep and we finally reach the panic notifier
hereby discussed, in the main CPU. If the other CPU was shut-off *with
the lock held*, it's never finishing such work, so the lock is never to
be released. Conclusion: the spinlock can't be acquired, hence we broke
the machine (which is already broken, given it's panic) in the path of
this notifier.
This should be really rare, but..possible. So I think we should protect
against this scenario.

We can grab others' feedback if you prefer, and of course you have the
rights to refuse this change in the gsmi code, but from my
point-of-view, I don't see any advantage in just assume the risk,
specially since the change is very very simple.

Cheers,


Guilherme



Re: [PATCH 16/30] drivers/hv/vmbus, video/hyperv_fb: Untangle and refactor Hyper-V panic notifiers

2022-05-03 Thread Guilherme G. Piccoli
On 03/05/2022 15:13, Michael Kelley (LINUX) wrote:
> [...]
>> (a) We could forget about this change, and always do the clean-up here,
>> not relying in machine_crash_shutdown().
>> Pro: really simple, behaves the same as it is doing currently.
>> Con: less elegant/concise, doesn't allow arm64 customization.
>>
>> (b) Add a way to allow ARM64 customization of shutdown crash handler.
>> Pro: matches x86, more customizable, improves arm64 arch code.
>> Con: A tad more complex.
>>
>> Also, a question that came-up: if ARM64 has no way of calling special
>> crash shutdown handler, how can you execute hv_stimer_cleanup() and
>> hv_synic_disable_regs() there? Or are they not required in ARM64?
>>
> 
> My suggestion is to do (a) for now.  I suspect (b) could be a more
> extended discussion and I wouldn't want your patch set to get held
> up on that discussion.  I don't know what the sense of the ARM64
> maintainers would be toward (b).  They have tried to avoid picking
> up code warts like have accumulated on the x86/x64 side over the
> years, and I agree with that effort.  But as more and varied
> hypervisors become available for ARM64, it seems like a framework
> for supporting a custom shutdown handler may become necessary.
> But that could take a little time.
> 
> You are right about hv_stimer_cleanup() and hv_synic_disable_regs().
> We are not running these when a panic occurs on ARM64, and we
> should be, though the risk is small.   We will pursue (b) and add
> these additional cleanups as part of that.  But again, I would suggest
> doing (a) for now, and we will switch back to your solution once
> (b) is in place.
> 

Thanks again Michael, I'll stick with (a) for now. I'll check with ARM64
community about that, and I might even try to implement something in
parallel (if you are not already working on that - lemme know please),
so we don't get stuck here. As you said, I feel that this is more and
more relevant as the number of panic/crash/kexec scenarios tend to
increase in ARM64.


>> [...]
>> Some ideas of what we can do here:
>>
>> I) we could change the framebuffer notifier to rely on trylocks, instead
>> of risking a lockup scenario, and with that, we can execute it before
>> the vmbus disconnect in the hypervisor list;
> 
> I think we have to do this approach for now.
> 
>>
>> II) we ignore the hypervisor notifier in case of kdump _by default_, and
>> if the users don't want that, they can always set the panic notifier
>> level to 4 and run all notifiers prior to kdump; would that be terrible
>> you think? Kdump users might don't care about the framebuffer...
>>
>> III) we go with approach (b) above and refactor arm64 code to allow the
>> custom crash handler on kdump time, then [with point (I) above] the
>> logic proposed in this series is still valid - seems more and more the
>> most correct/complete solution.
> 
> But even when/if we get approach (b) implemented, having the
> framebuffer notifier on the pre_reboot list is still too late with the
> default of panic_notifier_level = 2.  The kdump path will reset the
> VMbus connection and then the framebuffer notifier won't work.
> 

OK, perfect! I'll work something along these lines in V2, allowing the
FB notifier to always run in the hypervisor list before the vmbus unload
mechanism.


>> [...]
 +static int hv_panic_vmbus_unload(struct notifier_block *nb, unsigned long 
 val,
  void *args)
 +{
 +  if (!kexec_crash_loaded())
>>>
>>> I'm not clear on the purpose of this condition.  I think it means
>>> we will skip the vmbus_initiate_unload() if a panic occurs in the
>>> kdump kernel.  Is there a reason a panic in the kdump kernel
>>> should be treated differently?  Or am I misunderstanding?
>>
>> This is really related with the point discussed in the top of this
>> response - I assumed both ARM64/x86_64 would behave the same and
>> disconnect the vmbus through the custom crash handler when kdump is set,
>> so worth skipping it here in the notifier. But that's not true for ARM64
>> as you pointed, so this guard against kexec is really part of the
>> decision/discussion on what to do with ARM64 heh
> 
> But note that vmbus_initiate_unload() already has a guard built-in.
> If the intent of this test is just as a guard against running twice,
> then it isn't needed.

Since we're going to avoid relying in the custom crash_shutdown(), due
to the lack of ARM64 support for now, this check will be removed in V2.

Its purpose was to skip the notifier *proactively* in case kexec is set,
given that...once kexec happens, the custom crash_shutdown() would run
the same function (wrong assumption for ARM64, my bad).

Postponing that slightly would maybe gain us some time while the
hypervisor finish its work, so we'd delay less in the vmbus unload path
- that was the rationale behind this check.


Cheers!



Re: [PATCH 24/30] panic: Refactor the panic path

2022-05-03 Thread Guilherme G. Piccoli
On 03/05/2022 14:31, Michael Kelley (LINUX) wrote:
> [...]
> 
> To me, it's a weak correlation between having a kmsg dumper, and
> wanting or not wanting the info level output to come before kdump.
> Hyper-V is one of only a few places that register a kmsg dumper, so most
> Linux instances outside of Hyper-V guest (and PowerPC systems?) will have
> the info level output after kdump.  It seems like anyone who cared strongly
> about the info level output would set the panic_notifier_level to 1 or to 3
> so that the result is more deterministic.  But that's just my opinion, and
> it's probably an opinion that is not as well informed on the topic as some
> others in the discussion. So keeping things as in your patch set is not a
> show-stopper for me.
> 
> However, I would request a clarification in the documentation.   The
> panic_notifier_level affects not only the hypervisor, informational,
> and pre_reboot lists, but it also affects panic_print_sys_info() and
> kmsg_dump().  Specifically, at level 1, panic_print_sys_info() and
> kmsg_dump() will not be run before kdump.  At level 3, they will
> always be run before kdump.  Your documentation above mentions
> "informational lists" (plural), which I take to vaguely include
> kmsg_dump() and panic_print_sys_info(), but being explicit about
> the effect would be better.
> 
> Michael

Thanks again Michael, to express your points and concerns - great idea
of documentation improvement here, I'll do that for V2, for sure.

The idea of "defaulting" to skip the info list on kdump (if no
kmsg_dump() is set) is again a mechanism that aims at accommodating all
users and concerns of antagonistic goals, kdump vs notifier lists.

Before this patch set, by default no notifier executed before kdump. So,
the "pendulum"  was strongly on kdump side, and clearly this was a
sub-optimal decision - proof of that is that both Hyper-V / PowerPC code
forcibly set the "crash_kexec_post_notifiers". The goal here is to have
a more lightweight list that by default runs before kdump, a secondary
list that only runs before kdump if there's usage for that (either user
sets that or kmsg_dumper set is considered a valid user), and the
remaining notifiers run by default only after kdump, all of that very
customizable through the levels idea.

Now, one thing we could do to improve consistency for the hyper-v case:
having a kmsg_dump_once() helper, and *for Hyper-V only*, call it on the
hypervisor list, within the info notifier (that would be moved to
hypervisor list, ofc).
Let's wait for more feedback on that, just throwing some ideas in order
we can have everyone happy with the end-result!

Cheers,


Guilherme



Re: [PATCH 19/30] panic: Add the panic hypervisor notifier list

2022-05-03 Thread Guilherme G. Piccoli
On 03/05/2022 14:44, Michael Kelley (LINUX) wrote:
> [...]
>>
>> Hi Michael, thanks for your feedback! I agree that your idea could work,
>> but...there is one downside: imagine the kmsg_dump() approach is not set
>> in some Hyper-V guest, then we would rely in the regular notification
>> mechanism [hv_die_panic_notify_crash()], right?
>> But...you want then to run this notifier in the informational list,
>> which...won't execute *by default* before kdump if no kmsg_dump() is
>> set. So, this logic is convoluted when you mix it with the default level
>> concept + kdump.
> 
> Yes, you are right.  But to me that speaks as much to the linkage
> between the informational list and kmsg_dump() being the core
> problem.  But as I described in my reply to Patch 24, I can live with
> the linkage as-is.

Thanks for the feedback Michael!

> [...] 
>> I feel the panic notification mechanism does really fit with a
>> hypervisor list, it's a good match with the nature of the list, which
>> aims at informing the panic notification to the hypervisor/FW.
>> Of course we can modify it if you prefer...but please take into account
>> the kdump case and how it complicates the logic.
> 
> I agree that the runtime effect of one list vs. the other is nil.  The
> code works and can stay as you written it.
> 
> I was trying to align from a conceptual standpoint.  It was a bit
> unexpected that one path would be on the hypervisor list, and the
> other path effectively on the informational list.  When I see
> conceptual mismatches like that, I tend to want to understand why,
> and if there is something more fundamental that is out-of-whack.
> 

Totally agree with you here, I am like that as well - try to really
understand the details, this is very important specially in this patch
set, since it's a refactor and affects every user of the notifiers
infrastructure.

Again, just to double-say it: feel free to suggest any change for the
Hyper-V portion (might as well for any patch in the series, indeed) -
you and the other Hyper-V maintainers own this code and I'd be glad to
align with your needs, you are honor citizens in the panic notifiers
area, being one the most heavy users for that =)

Cheers,


Guilherme



Re: [PATCH 15/30] bus: brcmstb_gisb: Clean-up panic/die notifiers

2022-05-02 Thread Guilherme G. Piccoli
On 02/05/2022 12:38, Florian Fainelli wrote:
> [...] 
> 
> Acked-by: Florian Fainelli 
> 
> Not sure if the Fixes tag is warranted however as this is a clean up, 
> and not really fixing a bug.

Perfect, thanks Florian. I'll add your ACK and remove the fixes tag in V2.
Cheers,


Guilherme



Re: [PATCH 06/30] soc: bcm: brcmstb: Document panic notifier action and remove useless header

2022-05-02 Thread Guilherme G. Piccoli
On 02/05/2022 12:38, Florian Fainelli wrote:
> [...] 
> Acked-by: Florian Fainelli 
> 
> Likewise, I am not sure if the Fixes tag is necessary here.

Perfect Florian, thanks!  I'll add your Acked-by tag and remove the
fixes for V2 =)
Cheers,


Guilherme



Re: [PATCH 16/30] drivers/hv/vmbus, video/hyperv_fb: Untangle and refactor Hyper-V panic notifiers

2022-04-29 Thread Guilherme G. Piccoli
Hi Michael, first of all thanks for the great review, much appreciated.
Some comments inline below:

On 29/04/2022 14:16, Michael Kelley (LINUX) wrote:
> [...]
>> hypervisor I/O completion), so we postpone that to run late. But more
>> relevant: this *same* vmbus unloading happens in the crash_shutdown()
>> handler, so if kdump is set, we can safely skip this panic notifier and
>> defer such clean-up to the kexec crash handler.
> 
> While the last sentence is true for Hyper-V on x86/x64, it's not true for
> Hyper-V on ARM64.  x86/x64 has the 'machine_ops' data structure
> with the ability to provide a custom crash_shutdown() function, which
> Hyper-V does in the form of hv_machine_crash_shutdown().  But ARM64
> has no mechanism to provide such a custom function that will eventually
> do the needed vmbus_initiate_unload() before running kdump.
> 
> I'm not immediately sure what the best solution is for ARM64.  At this
> point, I'm just pointing out the problem and will think about the tradeoffs
> for various possible solutions.  Please do the same yourself. :-)
> 

Oh, you're totally right! I just assumed ARM64 would the the same, my
bad. Just to propose some alternatives, so you/others can also discuss
here and we can reach a consensus about the trade-offs:

(a) We could forget about this change, and always do the clean-up here,
not relying in machine_crash_shutdown().
Pro: really simple, behaves the same as it is doing currently.
Con: less elegant/concise, doesn't allow arm64 customization.

(b) Add a way to allow ARM64 customization of shutdown crash handler.
Pro: matches x86, more customizable, improves arm64 arch code.
Con: A tad more complex.

Also, a question that came-up: if ARM64 has no way of calling special
crash shutdown handler, how can you execute hv_stimer_cleanup() and
hv_synic_disable_regs() there? Or are they not required in ARM64?


>>
>> (c) There is also a Hyper-V framebuffer panic notifier, which relies in
>> doing a vmbus operation that demands a valid connection. So, we must
>> order this notifier with the panic notifier from vmbus_drv.c, in order to
>> guarantee that the framebuffer code executes before the vmbus connection
>> is unloaded.
> 
> Patch 21 of this set puts the Hyper-V FB panic notifier on the pre_reboot
> notifier list, which means it won't execute before the VMbus connection
> unload in the case of kdump.   This notifier is making sure that Hyper-V
> is notified about the last updates made to the frame buffer before the
> panic, so maybe it needs to be put on the hypervisor notifier list.  It
> sends a message to Hyper-V over its existing VMbus channel, but it
> does not wait for a reply.  It does, however, obtain a spin lock on the
> ring buffer used to communicate with Hyper-V.   Unless someone has
> a better suggestion, I'm inclined to take the risk of blocking on that
> spin lock.

The logic behind that was: when kdump is set, we'd skip the vmbus
disconnect on notifiers, deferring that to crash_shutdown(), logic this
one refuted in the above discussion on ARM64 (one more Pro argument to
the idea of refactoring aarch64 code to allow a custom crash shutdown
handler heh). But you're right, for the default level 2, we skip the
pre_reboot notifiers on kdump, effectively skipping this notifier.

Some ideas of what we can do here:

I) we could change the framebuffer notifier to rely on trylocks, instead
of risking a lockup scenario, and with that, we can execute it before
the vmbus disconnect in the hypervisor list;

II) we ignore the hypervisor notifier in case of kdump _by default_, and
if the users don't want that, they can always set the panic notifier
level to 4 and run all notifiers prior to kdump; would that be terrible
you think? Kdump users might don't care about the framebuffer...

III) we go with approach (b) above and refactor arm64 code to allow the
custom crash handler on kdump time, then [with point (I) above] the
logic proposed in this series is still valid - seems more and more the
most correct/complete solution.

In any case, I guess we should avoid workarounds if possible and do the
things the best way we can, to encompass all (or almost all) the
possible scenarios and don't force things on users (like enforcing panic
notifier level 4 for Hyper-V or something like this...)

More feedback from you / Hyper-V folks is pretty welcome about this.


> 
>> [...]
> The "Fixes:" tags imply that these changes should be backported to older
> longterm kernel versions, which I don't think is the case.  There is a
> dependency on Patch 14 of your series where PANIC_NOTIFIER is
> introduced.
> 

Oh, this was more related with archeology of the kernel. When I'm
investigating stuff, I really want to understand why code was added and
that usually require some time git blaming stuff, so having that pronto
in the commit message is a bonus.

But of course we don't need to use the Fixes tag for that, easy to only
mention it in the text. A secondary benefit by using this 

Re: [PATCH 02/30] ARM: kexec: Disable IRQs/FIQs also on crash CPUs shutdown path

2022-04-29 Thread Guilherme G. Piccoli
On 29/04/2022 18:45, Russell King (Oracle) wrote:
> [...]
>> Marc, I did some investigation in the code (and tried/failed in the ARM
>> documentation as well heh), but this is still not 100% clear for me.
>>
>> You're saying IPI calls disable IRQs/FIQs by default in the the target
>> CPUs? Where does it happen? I'm a bit confused if this a processor
>> mechanism, or it's in code.
> 
> When we taken an IRQ, IRQs will be masked, FIQs will not. IPIs are
> themselves interrupts, so IRQs will be masked while the IPI is being
> processed. Therefore, there should be no need to re-disable the
> already disabled interrupts.
> 
>> But crash_smp_send_stop() is different, it seems to IPI the other CPUs
>> with the flag IPI_CALL_FUNC, which leads to calling
>> generic_smp_call_function_interrupt() - does it disable interrupts/FIQs
>> as well? I couldn't find it.
> 
> It's buried in the architecture behaviour. When the CPU takes an
> interrupt and jumps to the interrupt vector in the vectors page, it is
> architecturally defined that interrupts will be disabled. If they
> weren't architecturally disabled at this point, then as soon as the
> first instruction is processed (at the interrupt vector, likely a
> branch) the CPU would immediately take another jump to the interrupt
> vector, and this process would continue indefinitely, making interrupt
> handling utterly useless.
> 
> So, you won't find an explicit instruction in the code path from the
> vectors to the IPI handler that disables interrupts - because it's
> written into the architecture that this is what must happen.
> 
> IRQs are a lower priority than FIQs, so FIQs remain unmasked.
> 

Thanks a lot for the *great* explanation Russell, much appreciated.
So, this leads to the both following questions:

a) Shall we then change the patch to only disable FIQs, since it's panic
path and we don't want secondary CPUs getting interrupted, but only
spinning quietly "forever"?

b) How about cleaning ipi_cpu_stop() then, by dropping the call to
local_irq_disable() there, to avoid the double IRQ disabling?

Thanks,


Guilherme



Re: [PATCH 02/30] ARM: kexec: Disable IRQs/FIQs also on crash CPUs shutdown path

2022-04-29 Thread Guilherme G. Piccoli
Thanks Marc and Michael for the review/discussion.

On 29/04/2022 15:20, Marc Zyngier wrote:
> [...]

> My expectations would be that, since we're getting here using an IPI,
> interrupts are already masked. So what reenabled them the first place?
> 
> Thanks,
> 
>   M.
> 

Marc, I did some investigation in the code (and tried/failed in the ARM
documentation as well heh), but this is still not 100% clear for me.

You're saying IPI calls disable IRQs/FIQs by default in the the target
CPUs? Where does it happen? I'm a bit confused if this a processor
mechanism, or it's in code.

Looking the smp_send_stop() in arch/arm/, it does IPI the CPUs, with the
flag IPI_CPU_STOP, eventually calling ipi_cpu_stop(), and the latter
does disable IRQ/FIQ in code - that's where I stole my code from.

But crash_smp_send_stop() is different, it seems to IPI the other CPUs
with the flag IPI_CALL_FUNC, which leads to calling
generic_smp_call_function_interrupt() - does it disable interrupts/FIQs
as well? I couldn't find it.

Appreciate your clarifications about that, thanks again.
Cheers,


Guilherme



Re: [PATCH 24/30] panic: Refactor the panic path

2022-04-29 Thread Guilherme G. Piccoli
On 29/04/2022 14:53, Michael Kelley (LINUX) wrote:
> From: Guilherme G. Piccoli  Sent: Wednesday, April 27, 
> 2022 3:49 PM
>> [...]
>> +panic_notifiers_level=
>> +[KNL] Set the panic notifiers execution order.
>> +Format: 
>> +We currently have 4 lists of panic notifiers; based
>> +on the functionality and risk (for panic success) the
>> +callbacks are added in a given list. The lists are:
>> +- hypervisor/FW notification list (low risk);
>> +- informational list (low/medium risk);
>> +- pre_reboot list (higher risk);
>> +- post_reboot list (only run late in panic and after
>> +kdump, not configurable for now).
>> +This parameter defines the ordering of the first 3
>> +lists with regards to kdump; the levels determine
>> +which set of notifiers execute before kdump. The
>> +accepted levels are:
>> +0: kdump is the first thing to run, NO list is
>> +executed before kdump.
>> +1: only the hypervisor list is executed before kdump.
>> +2 (default level): the hypervisor list and (*if*
>> +there's any kmsg_dumper defined) the informational
>> +list are executed before kdump.
>> +3: both the hypervisor and the informational lists
>> +(always) execute before kdump.
> 
> I'm not clear on why level 2 exists.  What is the scenario where
> execution of the info list before kdump should be conditional on the
> existence of a kmsg_dumper?   Maybe the scenario is described
> somewhere in the patch set and I just missed it.
> 

Hi Michael, thanks for your review/consideration. So, this idea started
kind of some time ago. It all started with a need of exposing more
information on kernel log *before* kdump and *before* pstore -
specifically, we're talking about panic_print. But this cause some
reactions, Baoquan was very concerned with that [0]. Soon after, I've
proposed a panic notifiers filter (orthogonal) approach, to which Petr
suggested instead doing a major refactor [1] - it finally is alive in
the form of this series.

The theory behind the level 2 is to allow a scenario of kdump with the
minimum amount of notifiers - what is the point in printing more
information if the user doesn't care, since it's going to kdump? Now, if
there is a kmsg dumper, it means that there is likely some interest in
collecting information, and that might as well be required before the
potential kdump (which is my case, hence the proposal on [0]).

Instead of forcing one of the two behaviors (level 1 or level 3), we
have a middle-term/compromise: if there's interest in collecting such
data (in the form of a kmsg dumper), we then execute the informational
notifiers before kdump. If not, why to increase (even slightly) the risk
for kdump?

I'm OK in removing the level 2 if people prefer, but I don't feel it's a
burden, quite opposite - seems a good way to accommodate the somewhat
antagonistic ideas (jump to kdump ASAP vs collecting more info in the
panicked kernel log).

[0] https://lore.kernel.org/lkml/20220126052246.GC2086@MiWiFi-R3L-srv/

[1] https://lore.kernel.org/lkml/YfPxvzSzDLjO5ldp@alley/


>[...]
>> + * Based on the level configured (smaller than 4), we clear the
>> + * proper bits in "panic_notifiers_bits". Notice that this bitfield
>> + * is initialized with all notifiers set.
>> + */
>> +switch (panic_notifiers_level) {
>> +case 3:
>> +clear_bit(PN_PRE_REBOOT_BIT, _notifiers_bits);
>> +break;
>> +case 2:
>> +clear_bit(PN_PRE_REBOOT_BIT, _notifiers_bits);
>> +
>> +if (!kmsg_has_dumpers())
>> +clear_bit(PN_INFO_BIT, _notifiers_bits);
>> +break;
>> +case 1:
>> +clear_bit(PN_PRE_REBOOT_BIT, _notifiers_bits);
>> +clear_bit(PN_INFO_BIT, _notifiers_bits);
>> +break;
>> +case 0:
>> +clear_bit(PN_PRE_REBOOT_BIT, _notifiers_bits);
>> +clear_bit(PN_INFO_BIT, _notifiers_bits);
>> +clear_bit(PN_HYPERVISOR_BIT, _notifiers_bits);
>> +break;
>> +}
> 
> I think the above switch statement could be done as follows:
> 
> if (panic_notifiers_level <= 3)
>   clear_bit(PN_PRE_REBOOT_BIT, _notifiers_bits);
> if (panic_notifiers_level <= 2)
>   if (!kmsg_has

Re: [PATCH 18/30] notifier: Show function names on notifier routines if DEBUG_NOTIFIERS is set

2022-04-29 Thread Guilherme G. Piccoli
On 27/04/2022 22:01, Xiaoming Ni wrote:
> [...]
> Duplicate Code.
> 
> Is it better to use __func__ and %pS?
> 
> pr_info("%s: %pS\n", __func__, n->notifier_call);
> 
> 

This is a great suggestion Xiaoming, much appreciated!
I feel like reinventing the wheel here - with your idea, code was super
clear and concise, very nice suggestion!!

The only 2 things that diverge from your idea: I'm using '%ps' (not
showing offsets) and also, kept the wording "(un)registered/calling",
not using __func__ - I feel it's a bit odd in the output.
OK for you?

I'm definitely using your idea in V2 heh
Cheers,


Guilherme



Re: [PATCH 21/30] panic: Introduce the panic pre-reboot notifier list

2022-04-29 Thread Guilherme G. Piccoli
On 29/04/2022 13:04, Max Filippov wrote:
> [...]
>>  arch/xtensa/platforms/iss/setup.c |  4 ++--For xtensa:
> 
> For xtensa:
> Acked-by: Max Filippov 
> 

Perfect, thanks Max!
Cheers,


Guilherme



Re: [PATCH 13/30] s390/consoles: Improve panic notifiers reliability

2022-04-29 Thread Guilherme G. Piccoli
On 29/04/2022 15:46, Heiko Carstens wrote:
> [...]
> 
> Code looks good, and everything still seems to work. I applied this
> internally for the time being, and if it passes testing, I'll schedule
> it for the next merge window.
> 
> Thanks!

Perfect Heiko, thanks a bunch for your review and tests!
Let me know if anything breaks heh
Cheers,


Guilherme



Re: [PATCH 19/30] panic: Add the panic hypervisor notifier list

2022-04-29 Thread Guilherme G. Piccoli
On 29/04/2022 14:30, Michael Kelley (LINUX) wrote:
> From: Guilherme G. Piccoli  Sent: Wednesday, April 27, 
> 2022 3:49 PM
>> [...]
>>
>> @@ -2843,7 +2843,7 @@ static void __exit vmbus_exit(void)
>>  if (ms_hyperv.misc_features & HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE) {
>>  kmsg_dump_unregister(_kmsg_dumper);
>>  unregister_die_notifier(_die_report_block);
>> -atomic_notifier_chain_unregister(_notifier_list,
>> +atomic_notifier_chain_unregister(_hypervisor_list,
>>  _panic_report_block);
>>  }
>>
> 
> Using the hypervisor_list here produces a bit of a mismatch.  In many cases
> this notifier will do nothing, and will defer to the kmsg_dump() mechanism
> to notify the hypervisor about the panic.   Running the kmsg_dump()
> mechanism is linked to the info_list, so I'm thinking the Hyper-V panic report
> notifier should be on the info_list as well.  That way the reporting behavior
> is triggered at the same point in the panic path regardless of which
> reporting mechanism is used.
> 

Hi Michael, thanks for your feedback! I agree that your idea could work,
but...there is one downside: imagine the kmsg_dump() approach is not set
in some Hyper-V guest, then we would rely in the regular notification
mechanism [hv_die_panic_notify_crash()], right?
But...you want then to run this notifier in the informational list,
which...won't execute *by default* before kdump if no kmsg_dump() is
set. So, this logic is convoluted when you mix it with the default level
concept + kdump.

May I suggest something? If possible, take a run with this patch set +
DEBUG_NOTIFIER=y, in *both* cases (with and without the kmsg_dump()
set). I did that and they run almost at the same time...I've checked the
notifiers called, it's like almost nothing runs in-between.

I feel the panic notification mechanism does really fit with a
hypervisor list, it's a good match with the nature of the list, which
aims at informing the panic notification to the hypervisor/FW.
Of course we can modify it if you prefer...but please take into account
the kdump case and how it complicates the logic.

Let me know your considerations, in case you can experiment with the
patch set as-is.
Cheers,


Guilherme



Re: [PATCH 24/30] panic: Refactor the panic path

2022-04-29 Thread Guilherme G. Piccoli
On 27/04/2022 21:28, Randy Dunlap wrote:
> 
> 
> On 4/27/22 15:49, Guilherme G. Piccoli wrote:
>> +crash_kexec_post_notifiers
>> +This was DEPRECATED - users should always prefer the
> 
>   This is DEPRECATED - users should always prefer the
> 
>> +parameter "panic_notifiers_level" - check its entry
>> +in this documentation for details on how it works.
>> +Setting this parameter is exactly the same as setting
>> +"panic_notifiers_level=4".
> 

Thanks Randy, for your suggestion - but I confess I couldn't understand
it properly. It's related to spaces/tabs, right? What you suggest me to
change in this formatting? Just by looking the email I can't parse.

Cheers,


Guilherme



Re: [PATCH 21/30] panic: Introduce the panic pre-reboot notifier list

2022-04-29 Thread Guilherme G. Piccoli
On 28/04/2022 13:26, Corey Minyard wrote:
> [...]
> 
> For the IPMI portion:
> 
> Acked-by: Corey Minyard 

Thanks Alex and Corey for the ACKs!

> 
> Note that the IPMI panic_event() should always return, but it may take
> some time, especially if the IPMI controller is no longer functional.
> So the risk of a long delay is there and it makes sense to move it very
> late.
> 

Thanks, I agree - the patch moves it to the (latest - 1) position, since
some arch code might run as the latest and effectively stops the machine.
Cheers,


Guilherme



Re: [PATCH 20/30] panic: Add the panic informational notifier list

2022-04-29 Thread Guilherme G. Piccoli
Thanks Paul and Suzuki for the ACKs.
Cheers,


Guilherme



Re: [PATCH 17/30] tracing: Improve panic/die notifiers

2022-04-29 Thread Guilherme G. Piccoli
On 29/04/2022 10:56, Steven Rostedt wrote:
> [...]
> No. The fallthrough keyword is only needed when there's code between case
> labels. As it is very common to list multiple cases for the same code path.
> That is:
> 
>   case DIE_OOPS:
>   case PANIC_NOTIFIER:
>   do_dump = 1;
>   break;
> 
> Does not need a fall through label, as there's no code between the DIE_OOPS
> and the PANIC_NOTIFIER. But if you had:
> 
>   case DIE_OOPS:
>   x = true;
>   case PANIC_NOTIFIER:
>   do_dump = 1;
>   break;
> 
> Then you do.
> 
> -- Steve

Thanks a bunch for the clarification, changed that for V2 =)



Re: [PATCH 12/30] parisc: Replace regular spinlock with spin_trylock on panic path

2022-04-29 Thread Guilherme G. Piccoli
On 28/04/2022 13:55, Helge Deller wrote:
> [...]
> You may add:
> Acked-by: Helge Deller  # parisc
> 
> Helge

Thanks Helge, added!
Cheers,


Guilherme

> 
> 
>> ---
>>  arch/parisc/include/asm/pdc.h |  1 +
>>  arch/parisc/kernel/firmware.c | 27 +++
>>  drivers/parisc/power.c| 17 ++---
>>  3 files changed, 34 insertions(+), 11 deletions(-)
>>
>> diff --git a/arch/parisc/include/asm/pdc.h b/arch/parisc/include/asm/pdc.h
>> index b643092d4b98..7a106008e258 100644
>> --- a/arch/parisc/include/asm/pdc.h
>> +++ b/arch/parisc/include/asm/pdc.h
>> @@ -83,6 +83,7 @@ int pdc_do_firm_test_reset(unsigned long ftc_bitmap);
>>  int pdc_do_reset(void);
>>  int pdc_soft_power_info(unsigned long *power_reg);
>>  int pdc_soft_power_button(int sw_control);
>> +int pdc_soft_power_button_panic(int sw_control);
>>  void pdc_io_reset(void);
>>  void pdc_io_reset_devices(void);
>>  int pdc_iodc_getc(void);
>> diff --git a/arch/parisc/kernel/firmware.c b/arch/parisc/kernel/firmware.c
>> index 6a7e315bcc2e..0e2f70b592f4 100644
>> --- a/arch/parisc/kernel/firmware.c
>> +++ b/arch/parisc/kernel/firmware.c
>> @@ -1232,15 +1232,18 @@ int __init pdc_soft_power_info(unsigned long 
>> *power_reg)
>>  }
>>
>>  /*
>> - * pdc_soft_power_button - Control the soft power button behaviour
>> - * @sw_control: 0 for hardware control, 1 for software control
>> + * pdc_soft_power_button{_panic} - Control the soft power button behaviour
>> + * @sw_control: 0 for hardware control, 1 for software control
>>   *
>>   *
>>   * This PDC function places the soft power button under software or
>>   * hardware control.
>> - * Under software control the OS may control to when to allow to shut
>> - * down the system. Under hardware control pressing the power button
>> + * Under software control the OS may control to when to allow to shut
>> + * down the system. Under hardware control pressing the power button
>>   * powers off the system immediately.
>> + *
>> + * The _panic version relies in spin_trylock to prevent deadlock
>> + * on panic path.
>>   */
>>  int pdc_soft_power_button(int sw_control)
>>  {
>> @@ -1254,6 +1257,22 @@ int pdc_soft_power_button(int sw_control)
>>  return retval;
>>  }
>>
>> +int pdc_soft_power_button_panic(int sw_control)
>> +{
>> +int retval;
>> +unsigned long flags;
>> +
>> +if (!spin_trylock_irqsave(_lock, flags)) {
>> +pr_emerg("Couldn't enable soft power button\n");
>> +return -EBUSY; /* ignored by the panic notifier */
>> +}
>> +
>> +retval = mem_pdc_call(PDC_SOFT_POWER, PDC_SOFT_POWER_ENABLE, 
>> __pa(pdc_result), sw_control);
>> +spin_unlock_irqrestore(_lock, flags);
>> +
>> +return retval;
>> +}
>> +
>>  /*
>>   * pdc_io_reset - Hack to avoid overlapping range registers of Bridges 
>> devices.
>>   * Primarily a problem on T600 (which parisc-linux doesn't support) but
>> diff --git a/drivers/parisc/power.c b/drivers/parisc/power.c
>> index 456776bd8ee6..8512884de2cf 100644
>> --- a/drivers/parisc/power.c
>> +++ b/drivers/parisc/power.c
>> @@ -37,7 +37,6 @@
>>  #include 
>>  #include 
>>  #include 
>> -#include 
>>  #include 
>>  #include 
>>  #include 
>> @@ -175,16 +174,21 @@ static void powerfail_interrupt(int code, void *x)
>>
>>
>>
>> -/* parisc_panic_event() is called by the panic handler.
>> - * As soon as a panic occurs, our tasklets above will not be
>> - * executed any longer. This function then re-enables the
>> - * soft-power switch and allows the user to switch off the system
>> +/*
>> + * parisc_panic_event() is called by the panic handler.
>> + *
>> + * As soon as a panic occurs, our tasklets above will not
>> + * be executed any longer. This function then re-enables
>> + * the soft-power switch and allows the user to switch off
>> + * the system. We rely in pdc_soft_power_button_panic()
>> + * since this version spin_trylocks (instead of regular
>> + * spinlock), preventing deadlocks on panic path.
>>   */
>>  static int parisc_panic_event(struct notifier_block *this,
>>  unsigned long event, void *ptr)
>>  {
>>  /* re-enable the soft-power switch */
>> -pdc_soft_power_button(0);
>> +pdc_soft_power_button_panic(0);
>>  return NOTIFY_DONE;
>>  }
>>
>> @@ -193,7 +197,6 @@ static struct notifier_block parisc_panic_block = {
>>  .priority   = INT_MAX,
>>  };
>>
>> -
>>  static int __init power_init(void)
>>  {
>>  unsigned long ret;
> 



Re: [PATCH 09/30] coresight: cpu-debug: Replace mutex with mutex_trylock on panic notifier

2022-04-29 Thread Guilherme G. Piccoli
On 28/04/2022 05:11, Suzuki K Poulose wrote:
> Hi Guilherme,
> [...] 
> How would you like to proceed with queuing this ? I am happy
> either way. In case you plan to push this as part of this
> series (I don't see any potential conflicts) :
> 
> Reviewed-by: Suzuki K Poulose 

Thanks for your review Suzuki, much appreciated!

About your question, I'm not sure yet - in case the core changes would
take a while (like if community find them polemic, require many changes,
etc) I might split this series in 2 parts, the fixes part vs the
improvements per se. Either way, a V2 is going to happen for sure, and
in that moment, I'll let you know what I think it's best.

But either way, any choice you prefer is fine by me as well (like if you
want to merge it now or postpone to get merged in the future), this is
not an urgent fix I think =)
Cheers,


Guilherme



Re: [PATCH 17/30] tracing: Improve panic/die notifiers

2022-04-29 Thread Guilherme G. Piccoli
On 29/04/2022 10:23, Steven Rostedt wrote:
> On Fri, 29 Apr 2022 12:22:44 +0300
> Sergei Shtylyov  wrote:
> 
>>> +   switch (ev) {
>>> +   case DIE_OOPS:
>>> +   do_dump = 1;
>>> +   break;
>>> +   case PANIC_NOTIFIER:
>>> +   do_dump = 1;
>>> +   break;  
>>
>>Why not:
>>
>>  case DIE_OOPS:
>>  case PANIC_NOTIFIER:
>>  do_dump = 1;
>>  break;
> 
> Agreed.
> 
> Other than that.
> 
> Acked-by: Steven Rostedt (Google) 
> 
> -- Steve

Thanks Sergei and Steven, good idea! I thought about the switch change
you propose, but I confess I got a bit confused by the "fallthrough"
keyword - do I need to use it?

About the s/int/bool, for sure! Not sure why I didn't use bool at
first...heheh

Cheers,


Guilherme



[PATCH 29/30] powerpc: ps3, pseries: Avoid duplicate call to kmsg_dump() on panic

2022-04-27 Thread Guilherme G. Piccoli
Currently both pseries and ps3 are platforms that define special
panic notifiers that run as callbacks inside powerpc generic panic
notifier. In both cases kmsg_dump() is called, and the reason seems
to be that both of these callbacks aims to effectively stop the
machine, so nothing would execute after that - hence, both force
a series of console flushing related operations, after calling
the kmsg dumpers.

Happens that recently the panic path was refactored, and now
kmsg_dump() is *certainly* called before the pre_reboot panic
notifiers, category in which both pseries/ps3 callbacks belong.
In other words: kmsg_dump() will execute twice in both platforms,
on panic path.

This patch prevents that by disabling the kmsg_dump() for both
platform's notifiers. But worth to notice that PowerNV still
has a legit use for executing kmsg_dump() in its unrecoverable
error path, so we rely in parameter passing to differentiate
both cases. Also, since the pre_reboot notifiers still run
earlier than console flushing routines, we kept that for
both pseries and ps3 platforms, only skipping kmsg_dump().

Fixes: 35adacd6fc48 ("powerpc/pseries, ps3: panic flush kernel messages before 
halting system")
Cc: Benjamin Herrenschmidt 
Cc: Hari Bathini 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Paul Mackerras 
Signed-off-by: Guilherme G. Piccoli 
---

We'd like to thanks specially the MiniCloud infrastructure [0] maintainers,
that allow us to test PowerPC code in a very complete, functional and FREE
environment.

[0] https://openpower.ic.unicamp.br/minicloud

 arch/powerpc/include/asm/bug.h | 2 +-
 arch/powerpc/kernel/traps.c| 6 --
 arch/powerpc/platforms/powernv/opal.c  | 2 +-
 arch/powerpc/platforms/ps3/setup.c | 2 +-
 arch/powerpc/platforms/pseries/setup.c | 2 +-
 5 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
index ecbae1832de3..49e5f6f86869 100644
--- a/arch/powerpc/include/asm/bug.h
+++ b/arch/powerpc/include/asm/bug.h
@@ -166,7 +166,7 @@ extern void die(const char *, struct pt_regs *, long);
 void die_mce(const char *str, struct pt_regs *regs, long err);
 extern bool die_will_crash(void);
 extern void panic_flush_kmsg_start(void);
-extern void panic_flush_kmsg_end(void);
+extern void panic_flush_kmsg_end(bool dump);
 #endif /* !__ASSEMBLY__ */
 
 #endif /* __KERNEL__ */
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index a08bb7cefdc5..837a5ed98d62 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -169,9 +169,11 @@ extern void panic_flush_kmsg_start(void)
bust_spinlocks(1);
 }
 
-extern void panic_flush_kmsg_end(void)
+extern void panic_flush_kmsg_end(bool dump)
 {
-   kmsg_dump(KMSG_DUMP_PANIC);
+   if (dump)
+   kmsg_dump(KMSG_DUMP_PANIC);
+
bust_spinlocks(0);
debug_locks_off();
console_flush_on_panic(CONSOLE_FLUSH_PENDING);
diff --git a/arch/powerpc/platforms/powernv/opal.c 
b/arch/powerpc/platforms/powernv/opal.c
index 55a8fbfdb5b2..d172ceedece2 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -641,7 +641,7 @@ void __noreturn pnv_platform_error_reboot(struct pt_regs 
*regs, const char *msg)
show_regs(regs);
smp_send_stop();
 
-   panic_flush_kmsg_end();
+   panic_flush_kmsg_end(true);
 
/*
 * Don't bother to shut things down because this will
diff --git a/arch/powerpc/platforms/ps3/setup.c 
b/arch/powerpc/platforms/ps3/setup.c
index 3de9145c20bc..7cb78e508fb3 100644
--- a/arch/powerpc/platforms/ps3/setup.c
+++ b/arch/powerpc/platforms/ps3/setup.c
@@ -102,7 +102,7 @@ static void ps3_panic(char *str)
printk("   System does not reboot automatically.\n");
printk("   Please press POWER button.\n");
printk("\n");
-   panic_flush_kmsg_end();
+   panic_flush_kmsg_end(false);
 
while(1)
lv1_pause(1);
diff --git a/arch/powerpc/platforms/pseries/setup.c 
b/arch/powerpc/platforms/pseries/setup.c
index 955ff8aa1644..d6eea473eafd 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -856,7 +856,7 @@ static void __init pSeries_setup_arch(void)
 
 static void pseries_panic(char *str)
 {
-   panic_flush_kmsg_end();
+   panic_flush_kmsg_end(false);
rtas_os_term(str);
 }
 
-- 
2.36.0




[PATCH 30/30] um: Avoid duplicate call to kmsg_dump()

2022-04-27 Thread Guilherme G. Piccoli
Currently the panic notifier panic_exit() calls kmsg_dump() and
some console flushing routines - this makes sense since such
panic notifier exits UserMode Linux and never returns.

Happens that after a panic refactor, kmsg_dump() is now always
called *before* the pre_reboot list of panic notifiers, in which
panic_exit() belongs, leading to a double call situation.

This patch changes that by removing such call from the panic
notifier, but leaving the console flushing calls since the
pre_reboot list still runs before console flushing on panic().

Cc: Anton Ivanov 
Cc: Johannes Berg 
Cc: Richard Weinberger 
Signed-off-by: Guilherme G. Piccoli 
---
 arch/um/kernel/um_arch.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/um/kernel/um_arch.c b/arch/um/kernel/um_arch.c
index fc6e443299da..651310e3e86f 100644
--- a/arch/um/kernel/um_arch.c
+++ b/arch/um/kernel/um_arch.c
@@ -241,7 +241,6 @@ static void __init uml_postsetup(void)
 static int panic_exit(struct notifier_block *self, unsigned long unused1,
  void *unused2)
 {
-   kmsg_dump(KMSG_DUMP_PANIC);
bust_spinlocks(1);
bust_spinlocks(0);
uml_exitcode = 1;
-- 
2.36.0




[PATCH 28/30] panic: Unexport crash_kexec_post_notifiers

2022-04-27 Thread Guilherme G. Piccoli
There is no users anymore of this variable that requires
it to be "exported" in the headers; also, it was deprecated
by the kernel parameter "panic_notifiers_level".

Signed-off-by: Guilherme G. Piccoli 
---
 include/linux/panic.h  | 2 --
 include/linux/panic_notifier.h | 1 -
 2 files changed, 3 deletions(-)

diff --git a/include/linux/panic.h b/include/linux/panic.h
index 34175d0188d0..d301db07a8af 100644
--- a/include/linux/panic.h
+++ b/include/linux/panic.h
@@ -34,8 +34,6 @@ extern int sysctl_panic_on_rcu_stall;
 extern int sysctl_max_rcu_stall_to_panic;
 extern int sysctl_panic_on_stackoverflow;
 
-extern bool crash_kexec_post_notifiers;
-
 /*
  * panic_cpu is used for synchronizing panic() and crash_kexec() execution. It
  * holds a CPU number which is executing panic() currently. A value of
diff --git a/include/linux/panic_notifier.h b/include/linux/panic_notifier.h
index b5041132321d..8fda7045e2f7 100644
--- a/include/linux/panic_notifier.h
+++ b/include/linux/panic_notifier.h
@@ -11,7 +11,6 @@ extern struct atomic_notifier_head panic_pre_reboot_list;
 extern struct atomic_notifier_head panic_post_reboot_list;
 
 bool panic_notifiers_before_kdump(void);
-extern bool crash_kexec_post_notifiers;
 
 enum panic_notifier_val {
PANIC_UNUSED,
-- 
2.36.0




[PATCH 27/30] powerpc: Do not force all panic notifiers to execute before kdump

2022-04-27 Thread Guilherme G. Piccoli
Commit 06e629c25daa ("powerpc/fadump: Fix inaccurate CPU state info in
vmcore generated with panic") introduced a hardcoded setting of kernel
parameter "crash_kexec_post_notifiers", effectively forcing all the
panic notifiers to execute earlier in the panic path, before kdump.

The reason for that was a fadump issue on collecting data accurately,
due to smp_send_stop() setting all CPUs offline, so the net effect
desired with this change was to avoid calling the regular CPU
shutdown function, and instead rely on crash_smp_send_stop(), which
copes fine with fadump. The collateral effect was to increase the
risk for kdump if fadump is not used, since it forces all panic
notifiers to execute early, before kdump.

Happens that, after a panic refactor, crash_smp_send_stop() is
now used by default in the panic path, so there is no reason to
mess with the notifiers ordering (which was also improved in the
refactor) from within arch code.

Fixes: 06e629c25daa ("powerpc/fadump: Fix inaccurate CPU state info in vmcore 
generated with panic")
Cc: Benjamin Herrenschmidt 
Cc: Hari Bathini 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Paul Mackerras 
Signed-off-by: Guilherme G. Piccoli 
---

We'd like to thanks specially the MiniCloud infrastructure [0] maintainers,
that allow us to test PowerPC code in a very complete, functional and FREE
environment.

[0] https://openpower.ic.unicamp.br/minicloud

 arch/powerpc/kernel/fadump.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 65562c4a0a69..35ae8c09af66 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -1649,14 +1649,6 @@ int __init setup_fadump(void)
register_fadump();
}
 
-   /*
-* In case of panic, fadump is triggered via ppc_panic_event()
-* panic notifier. Setting crash_kexec_post_notifiers to 'true'
-* lets panic() function take crash friendly path before panic
-* notifiers are invoked.
-*/
-   crash_kexec_post_notifiers = true;
-
return 1;
 }
 /*
-- 
2.36.0




[PATCH 24/30] panic: Refactor the panic path

2022-04-27 Thread Guilherme G. Piccoli
The panic() function is somewhat convoluted - a lot of changes were
made over the years, adding comments that might be misleading/outdated
now, it has a code structure that is a bit complex to follow, with
lots of conditionals, for example. The panic notifier list is something
else - a single list, with multiple callbacks of different purposes,
that run in a non-deterministic order and may affect hardly kdump
reliability - see the "crash_kexec_post_notifiers" workaround-ish flag.

This patch proposes a major refactor on the panic path based on Petr's
idea [0] - basically we split the notifiers list in three, having a set
of different call points in the panic path. Below a list of changes
proposed in this patch, culminating in the panic notifiers level
concept:

(a) First of all, we improved comments all over the function
and removed useless variables / includes. Also, as part of this
clean-up we concentrate the console flushing functions in a helper.

(b) As mentioned before, there is a split of the panic notifier list
in three, based on the purpose of the callback. The code contains
good documentation in form of comments, but a summary of the three
lists follows:

- the hypervisor list aims low-risk procedures to inform hypervisors
or firmware about the panic event, also includes LED-related functions;

- the informational list contains callbacks that provide more details,
like kernel offset or trace dump (if enabled) and also includes the
callbacks aimed at reducing log pollution or warns, like the RCU and
hung task disable callbacks;

- finally, the pre_reboot list is the old notifier list renamed,
containing the more risky callbacks that didn't fit the previous
lists. There is also a 4th list (the post_reboot one), but it's not
related with the original list - it contains late time architecture
callbacks aimed at stopping the machine, for example.

The 3 notifiers lists execute in different moments, hypervisor being
the first, followed by informational and finally the pre_reboot list.

(c) But then, there is the ordering problem of the notifiers against
the crash_kernel() call - kdump must be as reliable as possible.
For that, a simple binary "switch" as "crash_kexec_post_notifiers"
is not enough, hence we introduce here concept of panic notifier
levels: there are 5 levels, from 0 (no notifier executes before
kdump) until 4 (all notifiers run before kdump); the default level
is 2, in which the hypervisor and (iff we have any kmsg dumper)
the informational notifiers execute before kdump.

The detailed documentation of the levels is present in code comments
and in the kernel-parameters.txt file; as an analogy with the previous
panic() implementation, the level 0 is exactly the same as the old
behavior of notifiers, running all after kdump, and the level 4 is
the same as "crash_kexec_post_notifiers=Y" (we kept this parameter as
a deprecated one).

(d) Finally, an important change made here: we now use only the
function "crash_smp_send_stop()" to shut all the secondary CPUs
in the panic path. Before, there was a case of using the regular
"smp_send_stop()", but the better approach is to simplify the
code and try to use the function which was created exclusively
for the panic path. Experiments showed that it works fine, and
code was very simplified with that.

Functional change is expected from this refactor, since now we
call some notifiers by default before kdump, but the goal here
besides code clean-up is to have a better panic path, more
reliable and deterministic, but also very customizable.

[0] https://lore.kernel.org/lkml/YfPxvzSzDLjO5ldp@alley/

Suggested-by: Petr Mladek 
Signed-off-by: Guilherme G. Piccoli 
---

Special thanks to Petr and Baoquan for the suggestion and feedback in a previous
email thread. There's some important design decisions that worth mentioning and
discussing:

* The default panic notifiers level is 2, based on Petr Mladek's suggestion,
which makes a lot of sense. Of course, this is customizable through the
parameter, but would be something worthwhile to have a KConfig option to set
the default level? It would help distros that want the old behavior
(no notifiers before kdump) as default.

* The implementation choice was to _avoid_ intricate if conditionals in the
panic path, which would _definitely_ be present with the panic notifiers levels
idea; so, instead of lots of if conditionals, the set/clear bits approach with
functions called in 2 points (but executing only in one of them) is much easier
to follow an was used here; the ordering helper function and the comments also
help a lot to avoid confusion (hopefully).

* Choice was to *always* use crash_smp_send_stop() instead of sometimes making
use of the regular smp_send_stop(); for most architectures they are the same,
including Xen (on x86). For the ones that override it, all should work fine,
in the powerpc case it's even more correct (see the subsequent patch

[PATCH 21/30] panic: Introduce the panic pre-reboot notifier list

2022-04-27 Thread Guilherme G. Piccoli
This patch renames the panic_notifier_list to panic_pre_reboot_list;
the idea is that a subsequent patch will refactor the panic path
in order to better split the notifiers, running some of them very
early, some of them not so early [but still before kmsg_dump()] and
finally, the rest should execute late, after kdump. The latter ones
are now in the panic pre-reboot list - the name comes from the idea
that these notifiers execute before panic() attempts rebooting the
machine (if that option is set).

We also took the opportunity to clean-up useless header inclusions,
improve some notifier block declarations (e.g. in ibmasm/heartbeat.c)
and more important, change some priorities - we hereby set 2 notifiers
to run late in the list [iss_panic_event() and the IPMI panic_event()]
due to the risks they offer (may not return, for example).
Proper documentation is going to be provided in a subsequent patch,
that effectively refactors the panic path.

Cc: Alex Elder 
Cc: Alexander Gordeev 
Cc: Anton Ivanov 
Cc: Benjamin Herrenschmidt 
Cc: Bjorn Andersson 
Cc: Boris Ostrovsky 
Cc: Chris Zankel 
Cc: Christian Borntraeger 
Cc: Corey Minyard 
Cc: Dexuan Cui 
Cc: "H. Peter Anvin" 
Cc: Haiyang Zhang 
Cc: Heiko Carstens 
Cc: Helge Deller 
Cc: Ivan Kokshaysky 
Cc: "James E.J. Bottomley" 
Cc: James Morse 
Cc: Johannes Berg 
Cc: Juergen Gross 
Cc: "K. Y. Srinivasan" 
Cc: Mathieu Poirier 
Cc: Matt Turner 
Cc: Mauro Carvalho Chehab 
Cc: Max Filippov 
Cc: Michael Ellerman 
Cc: Paul Mackerras 
Cc: Pavel Machek 
Cc: Richard Henderson 
Cc: Richard Weinberger 
Cc: Robert Richter 
Cc: Stefano Stabellini 
Cc: Stephen Hemminger 
Cc: Sven Schnelle 
Cc: Tony Luck 
Cc: Vasily Gorbik 
Cc: Wei Liu 
Signed-off-by: Guilherme G. Piccoli 
---

Notice that, with this name change, out-of-tree code that relies in the global
exported "panic_notifier_list" will fail to build. We could easily keep the
retro-compatibility by making the old symbol to still exist and point to the
pre_reboot list (or even, keep the old naming).

But our design choice was to allow the breakage, making users rethink their
notifiers, adding them in the list that fits best. If that wasn't a good
decision, we're open to change it, of course.
Thanks in advance for the review!

 arch/alpha/kernel/setup.c |  4 ++--
 arch/parisc/kernel/pdc_chassis.c  |  3 +--
 arch/powerpc/kernel/setup-common.c|  2 +-
 arch/s390/kernel/ipl.c|  4 ++--
 arch/um/drivers/mconsole_kern.c   |  2 +-
 arch/um/kernel/um_arch.c  |  2 +-
 arch/x86/xen/enlighten.c  |  2 +-
 arch/xtensa/platforms/iss/setup.c |  4 ++--
 drivers/char/ipmi/ipmi_msghandler.c   | 12 +++-
 drivers/edac/altera_edac.c|  3 +--
 drivers/hv/vmbus_drv.c|  4 ++--
 drivers/leds/trigger/ledtrig-panic.c  |  3 +--
 drivers/misc/ibmasm/heartbeat.c   | 16 +---
 drivers/net/ipa/ipa_smp2p.c   |  5 ++---
 drivers/parisc/power.c|  4 ++--
 drivers/remoteproc/remoteproc_core.c  |  6 --
 drivers/s390/char/con3215.c   |  2 +-
 drivers/s390/char/con3270.c   |  2 +-
 drivers/s390/char/sclp_con.c  |  2 +-
 drivers/s390/char/sclp_vt220.c|  2 +-
 drivers/staging/olpc_dcon/olpc_dcon.c |  6 --
 drivers/video/fbdev/hyperv_fb.c   |  4 ++--
 include/linux/panic_notifier.h|  2 +-
 kernel/panic.c|  9 -
 24 files changed, 54 insertions(+), 51 deletions(-)

diff --git a/arch/alpha/kernel/setup.c b/arch/alpha/kernel/setup.c
index d88bdf852753..8ace0d7113b6 100644
--- a/arch/alpha/kernel/setup.c
+++ b/arch/alpha/kernel/setup.c
@@ -472,8 +472,8 @@ setup_arch(char **cmdline_p)
}
 
/* Register a call for panic conditions. */
-   atomic_notifier_chain_register(_notifier_list,
-   _panic_block);
+   atomic_notifier_chain_register(_pre_reboot_list,
+   _panic_block);
 
 #ifndef alpha_using_srm
/* Assume that we've booted from SRM if we haven't booted from MILO.
diff --git a/arch/parisc/kernel/pdc_chassis.c b/arch/parisc/kernel/pdc_chassis.c
index da154406d368..0fd8d87fb4f9 100644
--- a/arch/parisc/kernel/pdc_chassis.c
+++ b/arch/parisc/kernel/pdc_chassis.c
@@ -22,7 +22,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -135,7 +134,7 @@ void __init parisc_pdc_chassis_init(void)
PDC_CHASSIS_VER);
 
/* initialize panic notifier chain */
-   atomic_notifier_chain_register(_notifier_list,
+   atomic_notifier_chain_register(_pre_reboot_list,
_chassis_panic_block);
 
/* initialize reboot notifier chain */
diff --git a/arch/powerpc/kernel/setup-common.c 
b/arch/powerpc/kernel/setup-common.c
index d04b8bf8dbc7..3518bebc10ad 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch

[PATCH 20/30] panic: Add the panic informational notifier list

2022-04-27 Thread Guilherme G. Piccoli
The goal of this new panic notifier is to allow its users to
register callbacks to run earlier in the panic path than they
currently do. This aims at informational mechanisms, like dumping
kernel offsets and showing device error data (in case it's simple
registers reading, for example) as well as mechanisms to disable
log flooding (like hung_task detector / RCU warnings) and the
tracing dump_on_oops (when enabled).

Any (non-invasive) information that should be provided before
kmsg_dump() as well as log flooding preventing code should fit
here, as long it offers relatively low risk for kdump.

For now, the patch is almost a no-op, although it changes a bit
the ordering in which some panic notifiers are executed - specially
affected by this are the notifiers responsible for disabling the
hung_task detector / RCU warnings, which now run first. In a
subsequent patch, the panic path will be refactored, then the
panic informational notifiers will effectively run earlier,
before ksmg_dump() (and usually before kdump as well).

We also defer documenting it all properly in the subsequent
refactor patch. Finally, while at it, we removed some useless
header inclusions too.

Cc: Benjamin Herrenschmidt 
Cc: Catalin Marinas 
Cc: Florian Fainelli 
Cc: Frederic Weisbecker 
Cc: "H. Peter Anvin" 
Cc: Hari Bathini 
Cc: Joel Fernandes 
Cc: Jonathan Hunter 
Cc: Josh Triplett 
Cc: Lai Jiangshan 
Cc: Leo Yan 
Cc: Mathieu Desnoyers 
Cc: Mathieu Poirier 
Cc: Michael Ellerman 
Cc: Mike Leach 
Cc: Mikko Perttunen 
Cc: Neeraj Upadhyay 
Cc: Nicholas Piggin 
Cc: Paul Mackerras 
Cc: Suzuki K Poulose 
Cc: Thierry Reding 
Cc: Thomas Bogendoerfer 
Signed-off-by: Guilherme G. Piccoli 
---
 arch/arm64/kernel/setup.c | 2 +-
 arch/mips/kernel/relocate.c   | 2 +-
 arch/powerpc/kernel/setup-common.c| 2 +-
 arch/x86/kernel/setup.c   | 2 +-
 drivers/bus/brcmstb_gisb.c| 2 +-
 drivers/hwtracing/coresight/coresight-cpu-debug.c | 4 ++--
 drivers/soc/tegra/ari-tegra186.c  | 3 ++-
 include/linux/panic_notifier.h| 1 +
 kernel/hung_task.c| 3 ++-
 kernel/panic.c| 4 
 kernel/rcu/tree.c | 1 -
 kernel/rcu/tree_stall.h   | 3 ++-
 kernel/trace/trace.c  | 2 +-
 13 files changed, 19 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 3505789cf4bd..ac2c7e8c9c6a 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -444,7 +444,7 @@ static struct notifier_block arm64_panic_block = {
 
 static int __init register_arm64_panic_block(void)
 {
-   atomic_notifier_chain_register(_notifier_list,
+   atomic_notifier_chain_register(_info_list,
   _panic_block);
return 0;
 }
diff --git a/arch/mips/kernel/relocate.c b/arch/mips/kernel/relocate.c
index 56b51de2dc51..650811f2436a 100644
--- a/arch/mips/kernel/relocate.c
+++ b/arch/mips/kernel/relocate.c
@@ -459,7 +459,7 @@ static struct notifier_block kernel_location_notifier = {
 
 static int __init register_kernel_offset_dumper(void)
 {
-   atomic_notifier_chain_register(_notifier_list,
+   atomic_notifier_chain_register(_info_list,
   _location_notifier);
return 0;
 }
diff --git a/arch/powerpc/kernel/setup-common.c 
b/arch/powerpc/kernel/setup-common.c
index 1468c3937bf4..d04b8bf8dbc7 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -757,7 +757,7 @@ void __init setup_panic(void)
   _fadump_block);
 
if (IS_ENABLED(CONFIG_RANDOMIZE_BASE) && kaslr_offset() > 0)
-   atomic_notifier_chain_register(_notifier_list,
+   atomic_notifier_chain_register(_info_list,
   _offset_notifier);
 
/* Low-level platform-specific routines that should run on panic */
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index c95b9ac5a457..599b25346964 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1266,7 +1266,7 @@ static struct notifier_block kernel_offset_notifier = {
 
 static int __init register_kernel_offset_dumper(void)
 {
-   atomic_notifier_chain_register(_notifier_list,
+   atomic_notifier_chain_register(_info_list,
_offset_notifier);
return 0;
 }
diff --git a/drivers/bus/brcmstb_gisb.c b/drivers/bus/brcmstb_gisb.c
index 1ea7b015e225..c64e087fba7a 100644
--- a/drivers/bus/brcmstb_gisb.c
+++ b/drivers/bus/brcmstb_gisb.c
@@ -486,7 +486,7 @@ static int __init brcmstb_gisb_arb_probe(struct 
platform_device *pdev)
 
if (list_is_singular(_gisb_arb_device_list)) {
register_die_n

[PATCH 19/30] panic: Add the panic hypervisor notifier list

2022-04-27 Thread Guilherme G. Piccoli
The goal of this new panic notifier is to allow its users to register
callbacks to run very early in the panic path. This aims hypervisor/FW
notification mechanisms as well as simple LED functions, and any other
simple and safe mechanism that should run early in the panic path; more
dangerous callbacks should execute later.

For now, the patch is almost a no-op (although it changes a bit the
ordering in which some panic notifiers are executed). In a subsequent
patch, the panic path will be refactored, then the panic hypervisor
notifiers will effectively run very early in the panic path.

We also defer documenting it all properly in the subsequent refactor
patch. While at it, we removed some useless header inclusions and
fixed some notifiers return too (by using the standard NOTIFY_DONE).

Cc: Alexander Gordeev 
Cc: Andrea Parri (Microsoft) 
Cc: Ard Biesheuvel 
Cc: Benjamin Herrenschmidt 
Cc: Brian Norris 
Cc: Christian Borntraeger 
Cc: Christophe JAILLET 
Cc: David Gow 
Cc: "David S. Miller" 
Cc: Dexuan Cui 
Cc: Doug Berger 
Cc: Evan Green 
Cc: Florian Fainelli 
Cc: Haiyang Zhang 
Cc: Hari Bathini 
Cc: Heiko Carstens 
Cc: Julius Werner 
Cc: Justin Chen 
Cc: "K. Y. Srinivasan" 
Cc: Lee Jones 
Cc: Markus Mayer 
Cc: Michael Ellerman 
Cc: Michael Kelley 
Cc: Mihai Carabas 
Cc: Nicholas Piggin 
Cc: Paul Mackerras 
Cc: Pavel Machek 
Cc: Scott Branden 
Cc: Sebastian Reichel 
Cc: Shile Zhang 
Cc: Stephen Hemminger 
Cc: Sven Schnelle 
Cc: Thomas Bogendoerfer 
Cc: Tianyu Lan 
Cc: Vasily Gorbik 
Cc: Wang ShaoBo 
Cc: Wei Liu 
Cc: zhenwei pi 
Signed-off-by: Guilherme G. Piccoli 
---
 arch/mips/sgi-ip22/ip22-reset.c  | 2 +-
 arch/mips/sgi-ip32/ip32-reset.c  | 3 +--
 arch/powerpc/kernel/setup-common.c   | 2 +-
 arch/sparc/kernel/sstate.c   | 3 +--
 drivers/firmware/google/gsmi.c   | 4 ++--
 drivers/hv/vmbus_drv.c   | 4 ++--
 drivers/leds/trigger/ledtrig-activity.c  | 4 ++--
 drivers/leds/trigger/ledtrig-heartbeat.c | 4 ++--
 drivers/misc/bcm-vk/bcm_vk_dev.c | 6 +++---
 drivers/misc/pvpanic/pvpanic.c   | 4 ++--
 drivers/power/reset/ltc2952-poweroff.c   | 4 ++--
 drivers/s390/char/zcore.c| 5 +++--
 drivers/soc/bcm/brcmstb/pm/pm-arm.c  | 2 +-
 include/linux/panic_notifier.h   | 1 +
 kernel/panic.c   | 4 
 15 files changed, 28 insertions(+), 24 deletions(-)

diff --git a/arch/mips/sgi-ip22/ip22-reset.c b/arch/mips/sgi-ip22/ip22-reset.c
index 8f0861c58080..3023848acbf1 100644
--- a/arch/mips/sgi-ip22/ip22-reset.c
+++ b/arch/mips/sgi-ip22/ip22-reset.c
@@ -195,7 +195,7 @@ static int __init reboot_setup(void)
}
 
timer_setup(_timer, blink_timeout, 0);
-   atomic_notifier_chain_register(_notifier_list, _block);
+   atomic_notifier_chain_register(_hypervisor_list, _block);
 
return 0;
 }
diff --git a/arch/mips/sgi-ip32/ip32-reset.c b/arch/mips/sgi-ip32/ip32-reset.c
index 18d1c115cd53..9ee1302c9d13 100644
--- a/arch/mips/sgi-ip32/ip32-reset.c
+++ b/arch/mips/sgi-ip32/ip32-reset.c
@@ -15,7 +15,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -145,7 +144,7 @@ static __init int ip32_reboot_setup(void)
pm_power_off = ip32_machine_halt;
 
timer_setup(_timer, blink_timeout, 0);
-   atomic_notifier_chain_register(_notifier_list, _block);
+   atomic_notifier_chain_register(_hypervisor_list, _block);
 
return 0;
 }
diff --git a/arch/powerpc/kernel/setup-common.c 
b/arch/powerpc/kernel/setup-common.c
index 52f96b209a96..1468c3937bf4 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -753,7 +753,7 @@ static struct notifier_block ppc_panic_block = {
 void __init setup_panic(void)
 {
/* Hard-disables IRQs + deal with FW-assisted dump (fadump) */
-   atomic_notifier_chain_register(_notifier_list,
+   atomic_notifier_chain_register(_hypervisor_list,
   _fadump_block);
 
if (IS_ENABLED(CONFIG_RANDOMIZE_BASE) && kaslr_offset() > 0)
diff --git a/arch/sparc/kernel/sstate.c b/arch/sparc/kernel/sstate.c
index 3bcc4ddc6911..82b7b68e0bdc 100644
--- a/arch/sparc/kernel/sstate.c
+++ b/arch/sparc/kernel/sstate.c
@@ -5,7 +5,6 @@
  */
 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -106,7 +105,7 @@ static int __init sstate_init(void)
 
do_set_sstate(HV_SOFT_STATE_TRANSITION, booting_msg);
 
-   atomic_notifier_chain_register(_notifier_list,
+   atomic_notifier_chain_register(_hypervisor_list,
   _panic_block);
register_reboot_notifier(_reboot_notifier);
 
diff --git a/drivers/firmware/google/gsmi.c b/drivers/firmware/google/gsmi.c
index b01ed02e4a87..ff0bebe2f444 100644
--- a/drivers/firmware/google/gsmi.c
+++ b/drivers/firmware/google/gsmi.c
@@ -1034,7 +1034,7 @@ static __init int gsmi_init(void)
 
register_r

[PATCH 15/30] bus: brcmstb_gisb: Clean-up panic/die notifiers

2022-04-27 Thread Guilherme G. Piccoli
This patch improves the panic/die notifiers in this driver by
making use of a passed "id" instead of comparing pointer
address; also, it removes an useless prototype declaration
and unnecessary header inclusion.

This is part of a panic notifiers refactor - this notifier in
the future will be moved to a new list, that encompass the
information notifiers only.

Fixes: 9eb60880d9a9 ("bus: brcmstb_gisb: add notifier handling")
Cc: Brian Norris 
Cc: Florian Fainelli 
Signed-off-by: Guilherme G. Piccoli 
---
 drivers/bus/brcmstb_gisb.c | 26 +++---
 1 file changed, 11 insertions(+), 15 deletions(-)

diff --git a/drivers/bus/brcmstb_gisb.c b/drivers/bus/brcmstb_gisb.c
index 183d5cc37d42..1ea7b015e225 100644
--- a/drivers/bus/brcmstb_gisb.c
+++ b/drivers/bus/brcmstb_gisb.c
@@ -19,7 +19,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #ifdef CONFIG_MIPS
 #include 
@@ -347,25 +346,14 @@ static irqreturn_t brcmstb_gisb_bp_handler(int irq, void 
*dev_id)
 /*
  * Dump out gisb errors on die or panic.
  */
-static int dump_gisb_error(struct notifier_block *self, unsigned long v,
-  void *p);
-
-static struct notifier_block gisb_die_notifier = {
-   .notifier_call = dump_gisb_error,
-};
-
-static struct notifier_block gisb_panic_notifier = {
-   .notifier_call = dump_gisb_error,
-};
-
 static int dump_gisb_error(struct notifier_block *self, unsigned long v,
   void *p)
 {
struct brcmstb_gisb_arb_device *gdev;
-   const char *reason = "panic";
+   const char *reason = "die";
 
-   if (self == _die_notifier)
-   reason = "die";
+   if (v == PANIC_NOTIFIER)
+   reason = "panic";
 
/* iterate over each GISB arb registered handlers */
list_for_each_entry(gdev, _gisb_arb_device_list, next)
@@ -374,6 +362,14 @@ static int dump_gisb_error(struct notifier_block *self, 
unsigned long v,
return NOTIFY_DONE;
 }
 
+static struct notifier_block gisb_die_notifier = {
+   .notifier_call = dump_gisb_error,
+};
+
+static struct notifier_block gisb_panic_notifier = {
+   .notifier_call = dump_gisb_error,
+};
+
 static DEVICE_ATTR(gisb_arb_timeout, S_IWUSR | S_IRUGO,
gisb_arb_get_timeout, gisb_arb_set_timeout);
 
-- 
2.36.0




[PATCH 16/30] drivers/hv/vmbus, video/hyperv_fb: Untangle and refactor Hyper-V panic notifiers

2022-04-27 Thread Guilherme G. Piccoli
Currently Hyper-V guests are among the most relevant users of the panic
infrastructure, like panic notifiers, kmsg dumpers, etc. The reasons rely
both in cleaning-up procedures (closing a hypervisor <-> guest connection,
disabling a paravirtualized timer) as well as to data collection (sending
panic information to the hypervisor) and framebuffer management.

The thing is: some notifiers are related to others, ordering matters, some
functionalities are duplicated and there are lots of conditionals behind
sending panic information to the hypervisor. This patch, as part of an
effort to clean-up the panic notifiers mechanism and better document
things, address some of the issues/complexities of Hyper-V panic handling
through the following changes:

(a) We have die and panic notifiers on vmbus_drv.c and both have goals of
sending panic information to the hypervisor, though the panic notifier is
also responsible for a cleaning-up procedure.

This commit clears the code by splitting the panic notifier in two, one
for closing the vmbus connection whereas the other is only for sending
panic info to hypervisor. With that, it was possible to merge the die and
panic notifiers in a single/well-documented function, and clear some
conditional complexities on sending such information to the hypervisor.

(b) The new panic notifier created after (a) is only doing a single thing:
cleaning the vmbus connection. This procedure might cause a delay (due to
hypervisor I/O completion), so we postpone that to run late. But more
relevant: this *same* vmbus unloading happens in the crash_shutdown()
handler, so if kdump is set, we can safely skip this panic notifier and
defer such clean-up to the kexec crash handler.

(c) There is also a Hyper-V framebuffer panic notifier, which relies in
doing a vmbus operation that demands a valid connection. So, we must
order this notifier with the panic notifier from vmbus_drv.c, in order to
guarantee that the framebuffer code executes before the vmbus connection
is unloaded.

Also, this commit removes a useless header.

Although there is code rework and re-ordering, we expect that this change
has no functional regressions but instead optimize the path and increase
panic reliability on Hyper-V. This was tested on Hyper-V with success.

Fixes: 792f232d57ff ("Drivers: hv: vmbus: Fix potential crash on module unload")
Fixes: 74347a99e73a ("x86/Hyper-V: Unload vmbus channel in hv panic callback")
Cc: Andrea Parri (Microsoft) 
Cc: Dexuan Cui 
Cc: Haiyang Zhang 
Cc: "K. Y. Srinivasan" 
Cc: Michael Kelley 
Cc: Stephen Hemminger 
Cc: Tianyu Lan 
Cc: Wei Liu 
Tested-by: Fabio A M Martins 
Signed-off-by: Guilherme G. Piccoli 
---

Special thanks to Michael Kelley for the good information about the Hyper-V
panic path in email threads some months ago, and to Fabio for the testing
performed.

Michael and all Microsoft folks: a careful analysis to double-check our changes
and assumptions here is really appreciated, this code is complex and intricate,
it is possible some corner case might have been overlooked.

Thanks in advance!

 drivers/hv/vmbus_drv.c  | 109 
 drivers/video/fbdev/hyperv_fb.c |   8 +++
 2 files changed, 76 insertions(+), 41 deletions(-)

diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 14de17087864..f37f12d48001 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -24,11 +24,11 @@
 #include 
 
 #include 
-#include 
 #include 
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -68,51 +68,75 @@ static int hyperv_report_reg(void)
return !sysctl_record_panic_msg || !hv_panic_page;
 }
 
-static int hyperv_panic_event(struct notifier_block *nb, unsigned long val,
+/*
+ * The panic notifier below is responsible solely for unloading the
+ * vmbus connection, which is necessary in a panic event. But notice
+ * that this same unloading procedure is executed in the Hyper-V
+ * crash_shutdown() handler [see hv_crash_handler()], which basically
+ * means that we can postpone its execution if we have kdump set,
+ * since it will run the crash_shutdown() handler anyway. Even more
+ * intrincated is the relation of this notifier with Hyper-V framebuffer
+ * panic notifier - we need vmbus connection alive there in order to
+ * succeed, so we need to order both with each other [for reference see
+ * hvfb_on_panic()] - this is done using notifiers' priorities.
+ */
+static int hv_panic_vmbus_unload(struct notifier_block *nb, unsigned long val,
  void *args)
+{
+   if (!kexec_crash_loaded())
+   vmbus_initiate_unload(true);
+
+   return NOTIFY_DONE;
+}
+static struct notifier_block hyperv_panic_vmbus_unload_block = {
+   .notifier_call  = hv_panic_vmbus_unload,
+   .priority   = INT_MIN + 1, /* almost the latest one to execute */
+};
+
+/*
+ * The following callback works both as die and panic notifi

[PATCH 11/30] um: Improve panic notifiers consistency and ordering

2022-04-27 Thread Guilherme G. Piccoli
Currently the panic notifiers from user mode linux don't follow
the convention for most of the other notifiers present in the
kernel (indentation, priority setting, numeric return).
More important, the priorities could be improved, since it's a
special case (userspace), hence we could run the notifiers earlier;
user mode linux shouldn't care much with other panic notifiers but
the ordering among the mconsole and arch notifier is important,
given that the arch one effectively triggers a core dump.

This patch fixes that by running the mconsole notifier as the first
panic notifier, followed by the architecture one (that coredumps).
Also, we remove a useless header inclusion.

Cc: Anton Ivanov 
Cc: Johannes Berg 
Cc: Richard Weinberger 
Signed-off-by: Guilherme G. Piccoli 
---
 arch/um/drivers/mconsole_kern.c | 8 +++-
 arch/um/kernel/um_arch.c| 8 
 2 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/arch/um/drivers/mconsole_kern.c b/arch/um/drivers/mconsole_kern.c
index 8ca67a692683..2ea0421bcc3f 100644
--- a/arch/um/drivers/mconsole_kern.c
+++ b/arch/um/drivers/mconsole_kern.c
@@ -11,7 +11,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -846,13 +845,12 @@ static int notify_panic(struct notifier_block *self, 
unsigned long unused1,
 
mconsole_notify(notify_socket, MCONSOLE_PANIC, message,
strlen(message) + 1);
-   return 0;
+   return NOTIFY_DONE;
 }
 
 static struct notifier_block panic_exit_notifier = {
-   .notifier_call  = notify_panic,
-   .next   = NULL,
-   .priority   = 1
+   .notifier_call  = notify_panic,
+   .priority   = INT_MAX, /* run as soon as possible */
 };
 
 static int add_notifier(void)
diff --git a/arch/um/kernel/um_arch.c b/arch/um/kernel/um_arch.c
index 0760e24f2eba..4485b1a7c8e4 100644
--- a/arch/um/kernel/um_arch.c
+++ b/arch/um/kernel/um_arch.c
@@ -246,13 +246,13 @@ static int panic_exit(struct notifier_block *self, 
unsigned long unused1,
bust_spinlocks(0);
uml_exitcode = 1;
os_dump_core();
-   return 0;
+
+   return NOTIFY_DONE;
 }
 
 static struct notifier_block panic_exit_notifier = {
-   .notifier_call  = panic_exit,
-   .next   = NULL,
-   .priority   = 0
+   .notifier_call  = panic_exit,
+   .priority   = INT_MAX - 1, /* run as 2nd notifier, won't return */
 };
 
 void uml_finishsetup(void)
-- 
2.36.0




[PATCH 26/30] Drivers: hv: Do not force all panic notifiers to execute before kdump

2022-04-27 Thread Guilherme G. Piccoli
Since commit a11589563e96 ("x86/Hyper-V: Report crash register
data or kmsg before running crash kernel") Hyper-V forcibly sets
the kernel parameter "crash_kexec_post_notifiers"; with that, it
did enforce the execution of *all* panic notifiers before kdump.
The main reason behind that is that Hyper-V has an hypervisor
notification mechanism that has the ability of warning the
hypervisor when the guest panics.

Happens that after the panic notifiers refactor, we now have 3 lists
and a level mechanism that defines the ordering of the notifiers
execution with regards to kdump. And for Hyper-V, the specific
notifier to inform the hypervisor about a panic lies in the first
list, which *by default* is set to execute before kdump. Hence,
this patch removes the hardcoded setting, effectively reverting
the aforementioned commit.

One of the problems with the forced approach was greatly exposed by
commit d57d6fe5bf34 ("drivers: hv: log when enabling 
crash_kexec_post_notifiers")
which ended-up confusing the user that didn't expect the notifiers
to execute before kdump, since it's a user setting and wasn't
enabled by such user. With the patch hereby proposed, that kind
of issue doesn't happen anymore, the panic notifiers level is
well-documented and users can expect a predictable behavior.

Fixes: a11589563e96 ("x86/Hyper-V: Report crash register data or kmsg before 
running crash kernel")
Fixes: d57d6fe5bf34 ("drivers: hv: log when enabling crash_kexec_post_notifiers"
Cc: Andrea Parri (Microsoft) 
Cc: Dexuan Cui 
Cc: Haiyang Zhang 
Cc: "K. Y. Srinivasan" 
Cc: Michael Kelley 
Cc: Stephen Brennan 
Cc: Stephen Hemminger 
Cc: Tianyu Lan 
Cc: Wei Liu 
Tested-by: Fabio A M Martins 
Signed-off-by: Guilherme G. Piccoli 
---

Special thanks to Michael Kelley for the good information about the Hyper-V
panic path in email threads some months ago, and to Fabio for the testing
performed.

 drivers/hv/hv_common.c | 12 
 1 file changed, 12 deletions(-)

diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index ae68298c0dca..af59793de523 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -73,18 +73,6 @@ int __init hv_common_init(void)
 {
int i;
 
-   /*
-* Hyper-V expects to get crash register data or kmsg when
-* crash enlightment is available and system crashes. Set
-* crash_kexec_post_notifiers to be true to make sure that
-* calling crash enlightment interface before running kdump
-* kernel.
-*/
-   if (ms_hyperv.misc_features & HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE) {
-   crash_kexec_post_notifiers = true;
-   pr_info("Hyper-V: enabling crash_kexec_post_notifiers\n");
-   }
-
/*
 * Allocate the per-CPU state for the hypercall input arg.
 * If this allocation fails, we will not be able to setup
-- 
2.36.0




[PATCH 10/30] alpha: Clean-up the panic notifier code

2022-04-27 Thread Guilherme G. Piccoli
The alpha panic notifier has some code issues, not following
the conventions of other notifiers. Also, it might halt the
machine but still it is set to run as early as possible, which
doesn't seem to be a good idea.

This patch cleans the code, and set the notifier to run as the
latest, following the same approach other architectures are doing.
Also, we remove the unnecessary include of a header already
included indirectly.

Cc: Ivan Kokshaysky 
Cc: Matt Turner 
Cc: Richard Henderson 
Signed-off-by: Guilherme G. Piccoli 
---
 arch/alpha/kernel/setup.c | 36 +++-
 1 file changed, 15 insertions(+), 21 deletions(-)

diff --git a/arch/alpha/kernel/setup.c b/arch/alpha/kernel/setup.c
index b4fbbba30aa2..d88bdf852753 100644
--- a/arch/alpha/kernel/setup.c
+++ b/arch/alpha/kernel/setup.c
@@ -41,19 +41,11 @@
 #include 
 #include 
 #endif
-#include 
 #include 
 #include 
 #include 
 #include 
 
-static int alpha_panic_event(struct notifier_block *, unsigned long, void *);
-static struct notifier_block alpha_panic_block = {
-   alpha_panic_event,
-NULL,
-INT_MAX /* try to do it first */
-};
-
 #include 
 #include 
 #include 
@@ -435,6 +427,21 @@ static const struct sysrq_key_op srm_sysrq_reboot_op = {
 };
 #endif
 
+static int alpha_panic_event(struct notifier_block *this,
+unsigned long event, void *ptr)
+{
+   /* If we are using SRM and serial console, just hard halt here. */
+   if (alpha_using_srm && srmcons_output)
+   __halt();
+
+   return NOTIFY_DONE;
+}
+
+static struct notifier_block alpha_panic_block = {
+   .notifier_call = alpha_panic_event,
+   .priority = INT_MIN, /* may not return, do it last */
+};
+
 void __init
 setup_arch(char **cmdline_p)
 {
@@ -1427,19 +1434,6 @@ const struct seq_operations cpuinfo_op = {
.show   = show_cpuinfo,
 };
 
-
-static int
-alpha_panic_event(struct notifier_block *this, unsigned long event, void *ptr)
-{
-#if 1
-   /* FIXME FIXME FIXME */
-   /* If we are using SRM and serial console, just hard halt here. */
-   if (alpha_using_srm && srmcons_output)
-   __halt();
-#endif
-return NOTIFY_DONE;
-}
-
 static __init int add_pcspkr(void)
 {
struct platform_device *pd;
-- 
2.36.0




[PATCH 09/30] coresight: cpu-debug: Replace mutex with mutex_trylock on panic notifier

2022-04-27 Thread Guilherme G. Piccoli
The panic notifier infrastructure executes registered callbacks when
a panic event happens - such callbacks are executed in atomic context,
with interrupts and preemption disabled in the running CPU and all other
CPUs disabled. That said, mutexes in such context are not a good idea.

This patch replaces a regular mutex with a mutex_trylock safer approach;
given the nature of the mutex used in the driver, it should be pretty
uncommon being unable to acquire such mutex in the panic path, hence
no functional change should be observed (and if it is, that would be
likely a deadlock with the regular mutex).

Fixes: 2227b7c74634 ("coresight: add support for CPU debug module")
Cc: Leo Yan 
Cc: Mathieu Poirier 
Cc: Mike Leach 
Cc: Suzuki K Poulose 
Signed-off-by: Guilherme G. Piccoli 
---
 drivers/hwtracing/coresight/coresight-cpu-debug.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-cpu-debug.c 
b/drivers/hwtracing/coresight/coresight-cpu-debug.c
index 8845ec4b4402..1874df7c6a73 100644
--- a/drivers/hwtracing/coresight/coresight-cpu-debug.c
+++ b/drivers/hwtracing/coresight/coresight-cpu-debug.c
@@ -380,9 +380,10 @@ static int debug_notifier_call(struct notifier_block *self,
int cpu;
struct debug_drvdata *drvdata;
 
-   mutex_lock(_lock);
+   /* Bail out if we can't acquire the mutex or the functionality is off */
+   if (!mutex_trylock(_lock))
+   return NOTIFY_DONE;
 
-   /* Bail out if the functionality is disabled */
if (!debug_enable)
goto skip_dump;
 
@@ -401,7 +402,7 @@ static int debug_notifier_call(struct notifier_block *self,
 
 skip_dump:
mutex_unlock(_lock);
-   return 0;
+   return NOTIFY_DONE;
 }
 
 static struct notifier_block debug_notifier = {
-- 
2.36.0




[PATCH 25/30] panic, printk: Add console flush parameter and convert panic_print to a notifier

2022-04-27 Thread Guilherme G. Piccoli
Currently the parameter "panic_print" relies in a function called
directly on panic path; one of the flags the users can set for
panic_print triggers a console replay mechanism, to show the
entire kernel log buffer (from the beginning) in a panic event.

Two problems with that: the dual nature of the panic_print
isn't really appropriate, the function was originally meant
to allow users dumping system information on panic events,
and was "overridden" to also force a console flush of the full
kernel log buffer. It also turns the code a bit more complex
and duplicate than it needs to be.

This patch proposes 2 changes: first, we decouple panic_print
from the console flushing mechanism, in the form of a new kernel
core parameter (panic_console_replay); we kept the functionality
on panic_print to avoid userspace breakage, although we comment
in both code and documentation that this panic_print usage is
deprecated.

We converted panic_print function to a panic notifier too, adding
it on the panic informational notifier list, executed as the final
callback. This allows a more clear code and makes sense, as
panic_print_sys_info() is really a panic-time only function.
We also moved its code to kernel/printk.c, it seems to make more
sense given it's related to printing stuff.

Suggested-by: Petr Mladek 
Signed-off-by: Guilherme G. Piccoli 
---
 .../admin-guide/kernel-parameters.txt | 12 +++-
 Documentation/admin-guide/sysctl/kernel.rst   |  5 +-
 include/linux/console.h   |  2 +
 include/linux/panic.h |  1 -
 include/linux/printk.h|  1 +
 kernel/panic.c| 51 +++
 kernel/printk/printk.c| 62 +++
 7 files changed, 87 insertions(+), 47 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 8d3524060ce3..c99da8b2b216 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3791,6 +3791,14 @@
timeout < 0: reboot immediately
Format: 
 
+   panic_console_replay
+   [KNL] Force a kernel log replay in the console on
+   panic event. Notice that there is already a flush
+   mechanism for pending messages; this option is meant
+   for users that wish to replay the *full* buffer.
+   It deprecates the bit 5 setting on "panic_print",
+   both having the same functionality.
+
panic_notifiers_level=
[KNL] Set the panic notifiers execution order.
Format: 
@@ -3825,12 +3833,14 @@
bit 2: print timer info
bit 3: print locks info if CONFIG_LOCKDEP is on
bit 4: print ftrace buffer
-   bit 5: print all printk messages in buffer
+   bit 5: print all printk messages in buffer (DEPRECATED)
bit 6: print all CPUs backtrace (if available in the 
arch)
*Be aware* that this option may print a _lot_ of lines,
so there are risks of losing older messages in the log.
Use this option carefully, maybe worth to setup a
bigger log buffer with "log_buf_len" along with this.
+   Also, notice that bit 5 was deprecated in favor of the
+   parameter "panic_console_replay".
 
panic_on_taint= Bitmask for conditionally calling panic() in add_taint()
Format: [,nousertaint]
diff --git a/Documentation/admin-guide/sysctl/kernel.rst 
b/Documentation/admin-guide/sysctl/kernel.rst
index 1144ea3229a3..17b293a0e566 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -763,10 +763,13 @@ bit 1  print system memory info
 bit 2  print timer info
 bit 3  print locks info if ``CONFIG_LOCKDEP`` is on
 bit 4  print ftrace buffer
-bit 5  print all printk messages in buffer
+bit 5  print all printk messages in buffer (DEPRECATED)
 bit 6  print all CPUs backtrace (if available in the arch)
 =  
 
+Notice that bit 5 was deprecated in favor of kernel core parameter
+"panic_console_replay" (see kernel-parameters.txt documentation).
+
 So for example to print tasks and memory info on panic, user can::
 
   echo 3 > /proc/sys/kernel/panic_print
diff --git a/include/linux/console.h b/include/linux/console.h
index 7cd758a4f44e..351c14f623ad 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -169,6 +169,8 @@ enum con_flush_mode {
CONSOLE_REPLAY_ALL,
 };
 
+extern bool panic_conso

[PATCH 23/30] printk: kmsg_dump: Introduce helper to inform number of dumpers

2022-04-27 Thread Guilherme G. Piccoli
Currently we don't have a way to check if there are dumpers set,
except counting the list members maybe. This patch introduces a very
simple helper to provide this information, by just keeping track of
registered/unregistered kmsg dumpers. It's going to be used on the
panic path in the subsequent patch.

Notice that the spinlock guarding kmsg_dumpers list also guards
increment/decrement of the dumper's counter, but there's no need
for that when reading the counter in the panic path, since that is
an atomic path and there's no other (planned) user.

Signed-off-by: Guilherme G. Piccoli 
---
 include/linux/kmsg_dump.h |  7 +++
 kernel/printk/printk.c| 14 ++
 2 files changed, 21 insertions(+)

diff --git a/include/linux/kmsg_dump.h b/include/linux/kmsg_dump.h
index 906521c2329c..abea1974bff8 100644
--- a/include/linux/kmsg_dump.h
+++ b/include/linux/kmsg_dump.h
@@ -65,6 +65,8 @@ bool kmsg_dump_get_buffer(struct kmsg_dump_iter *iter, bool 
syslog,
 
 void kmsg_dump_rewind(struct kmsg_dump_iter *iter);
 
+bool kmsg_has_dumpers(void);
+
 int kmsg_dump_register(struct kmsg_dumper *dumper);
 
 int kmsg_dump_unregister(struct kmsg_dumper *dumper);
@@ -91,6 +93,11 @@ static inline void kmsg_dump_rewind(struct kmsg_dump_iter 
*iter)
 {
 }
 
+static inline bool kmsg_has_dumpers(void)
+{
+   return false;
+}
+
 static inline int kmsg_dump_register(struct kmsg_dumper *dumper)
 {
return -EINVAL;
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index da03c15ecc89..e3a1c429fbbc 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -3399,6 +3399,18 @@ EXPORT_SYMBOL(printk_timed_ratelimit);
 
 static DEFINE_SPINLOCK(dump_list_lock);
 static LIST_HEAD(dump_list);
+static int num_dumpers;
+
+/**
+ * kmsg_has_dumpers - inform if there is any kmsg dumper registered.
+ *
+ * Returns true if there's at least one registered dumper, or false
+ * if otherwise.
+ */
+bool kmsg_has_dumpers(void)
+{
+   return num_dumpers ? true : false;
+}
 
 /**
  * kmsg_dump_register - register a kernel log dumper.
@@ -3423,6 +3435,7 @@ int kmsg_dump_register(struct kmsg_dumper *dumper)
dumper->registered = 1;
list_add_tail_rcu(>list, _list);
err = 0;
+   num_dumpers++;
}
spin_unlock_irqrestore(_list_lock, flags);
 
@@ -3447,6 +3460,7 @@ int kmsg_dump_unregister(struct kmsg_dumper *dumper)
dumper->registered = 0;
list_del_rcu(>list);
err = 0;
+   num_dumpers--;
}
spin_unlock_irqrestore(_list_lock, flags);
synchronize_rcu();
-- 
2.36.0




[PATCH 05/30] misc/pvpanic: Convert regular spinlock into trylock on panic path

2022-04-27 Thread Guilherme G. Piccoli
The pvpanic driver relies on panic notifiers to execute a callback
on panic event. Such function is executed in atomic context - the
panic function disables local IRQs, preemption and all other CPUs
that aren't running the panic code.

With that said, it's dangerous to use regular spinlocks in such path,
as introduced by commit b3c0f8774668 ("misc/pvpanic: probe multiple instances").
This patch fixes that by replacing regular spinlocks with the trylock
safer approach.

It also fixes an old comment (about a long gone framebuffer code) and
the notifier priority - we should execute hypervisor notifiers early,
deferring this way the panic action to the hypervisor, as expected by
the users that are setting up pvpanic.

Fixes: b3c0f8774668 ("misc/pvpanic: probe multiple instances")
Cc: Christophe JAILLET 
Cc: Mihai Carabas 
Cc: Shile Zhang 
Cc: Wang ShaoBo 
Cc: zhenwei pi 
Signed-off-by: Guilherme G. Piccoli 
---
 drivers/misc/pvpanic/pvpanic.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/misc/pvpanic/pvpanic.c b/drivers/misc/pvpanic/pvpanic.c
index 4b8f1c7d726d..049a12006348 100644
--- a/drivers/misc/pvpanic/pvpanic.c
+++ b/drivers/misc/pvpanic/pvpanic.c
@@ -34,7 +34,9 @@ pvpanic_send_event(unsigned int event)
 {
struct pvpanic_instance *pi_cur;
 
-   spin_lock(_lock);
+   if (!spin_trylock(_lock))
+   return;
+
list_for_each_entry(pi_cur, _list, list) {
if (event & pi_cur->capability & pi_cur->events)
iowrite8(event, pi_cur->base);
@@ -55,9 +57,13 @@ pvpanic_panic_notify(struct notifier_block *nb, unsigned 
long code, void *unused
return NOTIFY_DONE;
 }
 
+/*
+ * Call our notifier very early on panic, deferring the
+ * action taken to the hypervisor.
+ */
 static struct notifier_block pvpanic_panic_nb = {
.notifier_call = pvpanic_panic_notify,
-   .priority = 1, /* let this called before broken drm_fb_helper() */
+   .priority = INT_MAX,
 };
 
 static void pvpanic_remove(void *param)
-- 
2.36.0




[PATCH 04/30] firmware: google: Convert regular spinlock into trylock on panic path

2022-04-27 Thread Guilherme G. Piccoli
Currently the gsmi driver registers a panic notifier as well as
reboot and die notifiers. The callbacks registered are called in
atomic and very limited context - for instance, panic disables
preemption, local IRQs and all other CPUs that aren't running the
current panic function.

With that said, taking a spinlock in this scenario is a
dangerous invitation for a deadlock scenario. So, we fix
that in this commit by changing the regular spinlock with
a trylock, which is a safer approach.

Fixes: 74c5b31c6618 ("driver: Google EFI SMI")
Cc: Ard Biesheuvel 
Cc: David Gow 
Cc: Evan Green 
Cc: Julius Werner 
Signed-off-by: Guilherme G. Piccoli 
---
 drivers/firmware/google/gsmi.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/firmware/google/gsmi.c b/drivers/firmware/google/gsmi.c
index adaa492c3d2d..b01ed02e4a87 100644
--- a/drivers/firmware/google/gsmi.c
+++ b/drivers/firmware/google/gsmi.c
@@ -629,7 +629,10 @@ static int gsmi_shutdown_reason(int reason)
if (saved_reason & (1 << reason))
return 0;
 
-   spin_lock_irqsave(_dev.lock, flags);
+   if (!spin_trylock_irqsave(_dev.lock, flags)) {
+   rc = -EBUSY;
+   goto out;
+   }
 
saved_reason |= (1 << reason);
 
@@ -646,6 +649,7 @@ static int gsmi_shutdown_reason(int reason)
 
spin_unlock_irqrestore(_dev.lock, flags);
 
+out:
if (rc < 0)
printk(KERN_ERR "gsmi: Log Shutdown Reason failed\n");
else
-- 
2.36.0




[PATCH 22/30] panic: Introduce the panic post-reboot notifier list

2022-04-27 Thread Guilherme G. Piccoli
Currently we have 3 notifier lists in the panic path, which will
be wired in a way to allow the notifier callbacks to run in
different moments at panic time, in a subsequent patch.

But there is also an odd set of architecture calls hardcoded in
the end of panic path, after the restart machinery. They're
responsible for late time tunings / events, like enabling a stop
button (Sparc) or effectively stopping the machine (s390).

This patch introduces yet another notifier list to offer the
architectures a way to add callbacks in such late moment on
panic path without the need of ifdefs / hardcoded approaches.

Cc: Alexander Gordeev 
Cc: Christian Borntraeger 
Cc: "David S. Miller" 
Cc: Heiko Carstens 
Cc: Sven Schnelle 
Cc: Vasily Gorbik 
Signed-off-by: Guilherme G. Piccoli 
---
 arch/s390/kernel/setup.c   | 19 ++-
 arch/sparc/kernel/setup_32.c   | 27 +++
 arch/sparc/kernel/setup_64.c   | 29 -
 include/linux/panic_notifier.h |  1 +
 kernel/panic.c | 19 +++
 5 files changed, 73 insertions(+), 22 deletions(-)

diff --git a/arch/s390/kernel/setup.c b/arch/s390/kernel/setup.c
index d860ac300919..d816b2045f1e 100644
--- a/arch/s390/kernel/setup.c
+++ b/arch/s390/kernel/setup.c
@@ -39,7 +39,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -51,6 +50,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -943,6 +943,20 @@ static void __init log_component_list(void)
}
 }
 
+/*
+ * The following notifier executes as one of the latest things in the panic
+ * path, only if the restart routines weren't executed (or didn't succeed).
+ */
+static int panic_event(struct notifier_block *n, unsigned long ev, void 
*unused)
+{
+   disabled_wait();
+   return NOTIFY_DONE;
+}
+
+static struct notifier_block post_reboot_panic_block = {
+   .notifier_call = panic_event,
+};
+
 /*
  * Setup function called from init/main.c just after the banner
  * was printed.
@@ -1058,4 +1072,7 @@ void __init setup_arch(char **cmdline_p)
 
/* Add system specific data to the random pool */
setup_randomness();
+
+   atomic_notifier_chain_register(_post_reboot_list,
+  _reboot_panic_block);
 }
diff --git a/arch/sparc/kernel/setup_32.c b/arch/sparc/kernel/setup_32.c
index c8e0dd99f370..4e2428972f76 100644
--- a/arch/sparc/kernel/setup_32.c
+++ b/arch/sparc/kernel/setup_32.c
@@ -34,6 +34,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -51,6 +52,7 @@
 
 #include "kernel.h"
 
+int stop_a_enabled = 1;
 struct screen_info screen_info = {
0, 0,   /* orig-x, orig-y */
0,  /* unused */
@@ -293,6 +295,24 @@ void __init sparc32_start_kernel(struct linux_romvec *rp)
start_kernel();
 }
 
+/*
+ * The following notifier executes as one of the latest things in the panic
+ * path, only if the restart routines weren't executed (or didn't succeed).
+ */
+static int panic_event(struct notifier_block *n, unsigned long ev, void 
*unused)
+{
+   /* Make sure the user can actually press Stop-A (L1-A) */
+   stop_a_enabled = 1;
+   pr_emerg("Press Stop-A (L1-A) from sun keyboard or send break\n"
+   "twice on console to return to the boot prom\n");
+
+   return NOTIFY_DONE;
+}
+
+static struct notifier_block post_reboot_panic_block = {
+   .notifier_call = panic_event,
+};
+
 void __init setup_arch(char **cmdline_p)
 {
int i;
@@ -368,9 +388,10 @@ void __init setup_arch(char **cmdline_p)
paging_init();
 
smp_setup_cpu_possible_map();
-}
 
-extern int stop_a_enabled;
+   atomic_notifier_chain_register(_post_reboot_list,
+  _reboot_panic_block);
+}
 
 void sun_do_break(void)
 {
@@ -384,8 +405,6 @@ void sun_do_break(void)
 }
 EXPORT_SYMBOL(sun_do_break);
 
-int stop_a_enabled = 1;
-
 static int __init topology_init(void)
 {
int i, ncpus, err;
diff --git a/arch/sparc/kernel/setup_64.c b/arch/sparc/kernel/setup_64.c
index 48abee4eee29..9066c25ecc07 100644
--- a/arch/sparc/kernel/setup_64.c
+++ b/arch/sparc/kernel/setup_64.c
@@ -33,6 +33,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -62,6 +63,8 @@
 #include "entry.h"
 #include "kernel.h"
 
+int stop_a_enabled = 1;
+
 /* Used to synchronize accesses to NatSemi SUPER I/O chip configure
  * operations in asm/ns87303.h
  */
@@ -632,6 +635,24 @@ void __init alloc_irqstack_bootmem(void)
}
 }
 
+/*
+ * The following notifier executes as one of the latest things in the panic
+ * path, only if the restart routines weren't executed (or didn't succeed).
+ */
+static int panic_event(struct notifier_block *n, unsigned long ev, void 
*unused)
+{
+   /* Make sure the user can actually press Stop-A (L1-A) */
+   stop_a

[PATCH 03/30] notifier: Add panic notifiers info and purge trailing whitespaces

2022-04-27 Thread Guilherme G. Piccoli
Although many notifiers are mentioned in the comments, the panic
notifiers infrastructure is not. Also, the file contains some
trailing whitespaces. This commit fix both issues.

Cc: Arjan van de Ven 
Cc: Cong Wang 
Cc: Sebastian Andrzej Siewior 
Cc: Valentin Schneider 
Cc: Xiaoming Ni 
Signed-off-by: Guilherme G. Piccoli 
---
 include/linux/notifier.h | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/include/linux/notifier.h b/include/linux/notifier.h
index 87069b8459af..0589896fc7bd 100644
--- a/include/linux/notifier.h
+++ b/include/linux/notifier.h
@@ -201,12 +201,12 @@ static inline int notifier_to_errno(int ret)
 
 /*
  * Declared notifiers so far. I can imagine quite a few more chains
- * over time (eg laptop power reset chains, reboot chain (to clean 
+ * over time (eg laptop power reset chains, reboot chain (to clean
  * device units up), device [un]mount chain, module load/unload chain,
- * low memory chain, screenblank chain (for plug in modular 
screenblankers) 
+ * low memory chain, screenblank chain (for plug in modular screenblankers)
  * VC switch chains (for loadable kernel svgalib VC switch helpers) etc...
  */
- 
+
 /* CPU notfiers are defined in include/linux/cpu.h. */
 
 /* netdevice notifiers are defined in include/linux/netdevice.h */
@@ -217,6 +217,8 @@ static inline int notifier_to_errno(int ret)
 
 /* Virtual Terminal events are defined in include/linux/vt.h. */
 
+/* Panic notifiers are defined in include/linux/panic_notifier.h. */
+
 #define NETLINK_URELEASE   0x0001  /* Unicast netlink socket released */
 
 /* Console keyboard events.
-- 
2.36.0




[PATCH 18/30] notifier: Show function names on notifier routines if DEBUG_NOTIFIERS is set

2022-04-27 Thread Guilherme G. Piccoli
Currently we have a debug infrastructure in the notifiers file, but
it's very simple/limited. This patch extends it by:

(a) Showing all registered/unregistered notifiers' callback names;

(b) Adding a dynamic debug tuning to allow showing called notifiers'
function names. Notice that this should be guarded as a tunable since
it can flood the kernel log buffer.

Cc: Arjan van de Ven 
Cc: Cong Wang 
Cc: Sebastian Andrzej Siewior 
Cc: Valentin Schneider 
Cc: Xiaoming Ni 
Signed-off-by: Guilherme G. Piccoli 
---

We have some design decisions that worth discussing here:

(a) First of call, using C99 helps a lot to write clear and concise code, but
due to commit 4d94f910e79a ("Kbuild: use -Wdeclaration-after-statement") we
have a warning if mixing variable declarations with code. For this patch though,
doing that makes the code way clear, so decision was to add the debug code
inside brackets whenever this warning pops up. We can change that, but that'll
cause more ifdefs in the same function.

(b) In the symbol lookup helper function, we modify the parameter passed but
even more, we return it as well! This is unusual and seems unnecessary, but was
the strategy taken to allow embedding such function in the pr_debug() call.

Not doing that would likely requiring 3 symbol_name variables to avoid
concurrency (registering notifier A while calling notifier B) - we rely in
local variables as a serialization mechanism.

We're open for suggestions in case this design is not appropriate;
thanks in advance!

 kernel/notifier.c | 48 +--
 1 file changed, 46 insertions(+), 2 deletions(-)

diff --git a/kernel/notifier.c b/kernel/notifier.c
index ba005ebf4730..21032ebcde57 100644
--- a/kernel/notifier.c
+++ b/kernel/notifier.c
@@ -7,6 +7,22 @@
 #include 
 #include 
 
+#ifdef CONFIG_DEBUG_NOTIFIERS
+#include 
+
+/*
+ * Helper to get symbol names in case DEBUG_NOTIFIERS is set.
+ * Return the modified parameter is a strategy used to achieve
+ * the pr_debug() functionality - with this, function is only
+ * executed if the dynamic debug tuning is effectively set.
+ */
+static inline char *notifier_name(struct notifier_block *nb, char *sym_name)
+{
+   lookup_symbol_name((unsigned long)(nb->notifier_call), sym_name);
+   return sym_name;
+}
+#endif
+
 /*
  * Notifier list for kernel code which wants to be called
  * at shutdown. This is used to stop any idling DMA operations
@@ -34,20 +50,41 @@ static int notifier_chain_register(struct notifier_block 
**nl,
}
n->next = *nl;
rcu_assign_pointer(*nl, n);
+
+#ifdef CONFIG_DEBUG_NOTIFIERS
+   {
+   char sym_name[KSYM_NAME_LEN];
+
+   pr_info("notifiers: registered %s()\n",
+   notifier_name(n, sym_name));
+   }
+#endif
return 0;
 }
 
 static int notifier_chain_unregister(struct notifier_block **nl,
struct notifier_block *n)
 {
+   int ret = -ENOENT;
+
while ((*nl) != NULL) {
if ((*nl) == n) {
rcu_assign_pointer(*nl, n->next);
-   return 0;
+   ret = 0;
+   break;
}
nl = &((*nl)->next);
}
-   return -ENOENT;
+
+#ifdef CONFIG_DEBUG_NOTIFIERS
+   if (!ret) {
+   char sym_name[KSYM_NAME_LEN];
+
+   pr_info("notifiers: unregistered %s()\n",
+   notifier_name(n, sym_name));
+   }
+#endif
+   return ret;
 }
 
 /**
@@ -80,6 +117,13 @@ static int notifier_call_chain(struct notifier_block **nl,
nb = next_nb;
continue;
}
+
+   {
+   char sym_name[KSYM_NAME_LEN];
+
+   pr_debug("notifiers: calling %s()\n",
+notifier_name(nb, sym_name));
+   }
 #endif
ret = nb->notifier_call(nb, val, v);
 
-- 
2.36.0




[PATCH 17/30] tracing: Improve panic/die notifiers

2022-04-27 Thread Guilherme G. Piccoli
Currently the tracing dump_on_oops feature is implemented
through separate notifiers, one for die/oops and the other
for panic. With the addition of panic notifier "id", this
patch makes use of such "id" to unify both functions.

It also comments the function and changes the priority of the
notifier blocks, in order they run early compared to other
notifiers, to prevent useless trace data (like the callback
names for the other notifiers). Finally, we also removed an
unnecessary header inclusion.

Signed-off-by: Guilherme G. Piccoli 
---
 kernel/trace/trace.c | 57 +---
 1 file changed, 32 insertions(+), 25 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index f4de111fa18f..c1d8a3622ccc 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -19,7 +19,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -9767,38 +9766,46 @@ static __init int tracer_init_tracefs(void)
 
 fs_initcall(tracer_init_tracefs);
 
-static int trace_panic_handler(struct notifier_block *this,
-  unsigned long event, void *unused)
+/*
+ * The idea is to execute the following die/panic callback early, in order
+ * to avoid showing irrelevant information in the trace (like other panic
+ * notifier functions); we are the 2nd to run, after hung_task/rcu_stall
+ * warnings get disabled (to prevent potential log flooding).
+ */
+static int trace_die_panic_handler(struct notifier_block *self,
+   unsigned long ev, void *unused)
 {
-   if (ftrace_dump_on_oops)
+   int do_dump;
+
+   if (!ftrace_dump_on_oops)
+   return NOTIFY_DONE;
+
+   switch (ev) {
+   case DIE_OOPS:
+   do_dump = 1;
+   break;
+   case PANIC_NOTIFIER:
+   do_dump = 1;
+   break;
+   default:
+   do_dump = 0;
+   break;
+   }
+
+   if (do_dump)
ftrace_dump(ftrace_dump_on_oops);
-   return NOTIFY_OK;
+
+   return NOTIFY_DONE;
 }
 
 static struct notifier_block trace_panic_notifier = {
-   .notifier_call  = trace_panic_handler,
-   .next   = NULL,
-   .priority   = 150   /* priority: INT_MAX >= x >= 0 */
+   .notifier_call = trace_die_panic_handler,
+   .priority = INT_MAX - 1,
 };
 
-static int trace_die_handler(struct notifier_block *self,
-unsigned long val,
-void *data)
-{
-   switch (val) {
-   case DIE_OOPS:
-   if (ftrace_dump_on_oops)
-   ftrace_dump(ftrace_dump_on_oops);
-   break;
-   default:
-   break;
-   }
-   return NOTIFY_OK;
-}
-
 static struct notifier_block trace_die_notifier = {
-   .notifier_call = trace_die_handler,
-   .priority = 200
+   .notifier_call = trace_die_panic_handler,
+   .priority = INT_MAX - 1,
 };
 
 /*
-- 
2.36.0




[PATCH 02/30] ARM: kexec: Disable IRQs/FIQs also on crash CPUs shutdown path

2022-04-27 Thread Guilherme G. Piccoli
Currently the regular CPU shutdown path for ARM disables IRQs/FIQs
in the secondary CPUs - smp_send_stop() calls ipi_cpu_stop(), which
is responsible for that. This makes sense, since we're turning off
such CPUs, putting them in an endless busy-wait loop.

Problem is that there is an alternative path for disabling CPUs,
in the form of function crash_smp_send_stop(), used for kexec/panic
paths. This functions relies in a SMP call that also triggers a
busy-wait loop [at machine_crash_nonpanic_core()], but *without*
disabling interrupts. This might lead to odd scenarios, like early
interrupts in the boot of kexec'd kernel or even interrupts in
other CPUs while the main one still works in the panic path and
assumes all secondary CPUs are (really!) off.

This patch mimics the ipi_cpu_stop() interrupt disable mechanism
in the crash CPU shutdown path, hence disabling IRQs/FIQs in all
secondary CPUs in the kexec/panic path as well.

Cc: Marc Zyngier 
Cc: Russell King 
Signed-off-by: Guilherme G. Piccoli 
---
 arch/arm/kernel/machine_kexec.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c
index f567032a09c0..ef788ee00519 100644
--- a/arch/arm/kernel/machine_kexec.c
+++ b/arch/arm/kernel/machine_kexec.c
@@ -86,6 +86,9 @@ void machine_crash_nonpanic_core(void *unused)
set_cpu_online(smp_processor_id(), false);
atomic_dec(_for_crash_ipi);
 
+   local_fiq_disable();
+   local_irq_disable();
+
while (1) {
cpu_relax();
wfe();
-- 
2.36.0




[PATCH 14/30] panic: Properly identify the panic event to the notifiers' callbacks

2022-04-27 Thread Guilherme G. Piccoli
The notifiers infrastructure provides a way to pass an "id" to the
callbacks to determine what kind of event happened, i.e., what is
the reason behind they getting called.

The panic notifier currently pass 0, but this is soon to be
used in a multi-targeted notifier, so let's pass a meaningful
"id" over there.

Signed-off-by: Guilherme G. Piccoli 
---
 include/linux/panic_notifier.h | 5 +
 kernel/panic.c | 2 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/linux/panic_notifier.h b/include/linux/panic_notifier.h
index 41e32483d7a7..07dced83a783 100644
--- a/include/linux/panic_notifier.h
+++ b/include/linux/panic_notifier.h
@@ -9,4 +9,9 @@ extern struct atomic_notifier_head panic_notifier_list;
 
 extern bool crash_kexec_post_notifiers;
 
+enum panic_notifier_val {
+   PANIC_UNUSED,
+   PANIC_NOTIFIER = 0xDEAD,
+};
+
 #endif /* _LINUX_PANIC_NOTIFIERS_H */
diff --git a/kernel/panic.c b/kernel/panic.c
index eb4dfb932c85..523bc9ccd0e9 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -287,7 +287,7 @@ void panic(const char *fmt, ...)
 * Run any panic handlers, including those that might need to
 * add information to the kmsg dump output.
 */
-   atomic_notifier_call_chain(_notifier_list, 0, buf);
+   atomic_notifier_call_chain(_notifier_list, PANIC_NOTIFIER, buf);
 
panic_print_sys_info(false);
 
-- 
2.36.0




[PATCH 13/30] s390/consoles: Improve panic notifiers reliability

2022-04-27 Thread Guilherme G. Piccoli
Currently many console drivers for s390 rely on panic/reboot notifiers
to invoke callbacks on these events. The panic() function disables local
IRQs, secondary CPUs and preemption, so callbacks invoked on panic are
effectively running in atomic context.

Happens that most of these console callbacks from s390 doesn't take the
proper care with regards to atomic context, like taking spinlocks that
might be taken in other function/CPU and hence will cause a lockup
situation.

The goal for this patch is to improve the notifiers reliability, acting
on 4 console drivers, as detailed below:

(1) con3215: changed a regular spinlock to the trylock alternative.

(2) con3270: also changed a regular spinlock to its trylock counterpart,
but here we also have another problem: raw3270_activate_view() takes a
different spinlock. So, we worked a helper to validate if this other lock
is safe to acquire, and if so, raw3270_activate_view() should be safe.

Notice though that there is a functional change here: it's now possible
to continue the notifier code [reaching con3270_wait_write() and
con3270_rebuild_update()] without executing raw3270_activate_view().

(3) sclp: a global lock is used heavily in the functions called from
the notifier, so we added a check here - if the lock is taken already,
we just bail-out, preventing the lockup.

(4) sclp_vt220: same as (3), a lock validation was added to prevent the
potential lockup problem.

Besides (1)-(4), we also removed useless void functions, adding the
code called from the notifier inside its own body, and changed the
priority of such notifiers to execute late, since they are "heavyweight"
for the panic environment, so we aim to reduce risks here.
Changed return values to NOTIFY_DONE as well, the standard one.

Cc: Alexander Gordeev 
Cc: Christian Borntraeger 
Cc: Heiko Carstens 
Cc: Sven Schnelle 
Cc: Vasily Gorbik 
Signed-off-by: Guilherme G. Piccoli 
---

As a design choice, the option used here to verify a given spinlock is taken
was the function "spin_is_locked()" - but we noticed that it is not often used.
An alternative would to take the lock with a spin_trylock() and if it succeeds,
just release the spinlock and continue the code. But that seemed weird...

Also, we'd like to ask a good validation of case (2) potential functionality
change from the s390 console experts - far from expert here, and in our naive
code observation, that seems fine, but that analysis might be missing some
corner case.

Thanks in advance!

 drivers/s390/char/con3215.c| 36 +++--
 drivers/s390/char/con3270.c| 34 +++
 drivers/s390/char/raw3270.c| 18 +++
 drivers/s390/char/raw3270.h|  1 +
 drivers/s390/char/sclp_con.c   | 28 +--
 drivers/s390/char/sclp_vt220.c | 42 +++---
 6 files changed, 96 insertions(+), 63 deletions(-)

diff --git a/drivers/s390/char/con3215.c b/drivers/s390/char/con3215.c
index f356607835d8..192198bd3dc4 100644
--- a/drivers/s390/char/con3215.c
+++ b/drivers/s390/char/con3215.c
@@ -771,35 +771,37 @@ static struct tty_driver *con3215_device(struct console 
*c, int *index)
 }
 
 /*
- * panic() calls con3215_flush through a panic_notifier
- * before the system enters a disabled, endless loop.
+ * The below function is called as a panic/reboot notifier before the
+ * system enters a disabled, endless loop.
+ *
+ * Notice we must use the spin_trylock() alternative, to prevent lockups
+ * in atomic context (panic routine runs with secondary CPUs, local IRQs
+ * and preemption disabled).
  */
-static void con3215_flush(void)
-{
-   struct raw3215_info *raw;
-   unsigned long flags;
-
-   raw = raw3215[0];  /* console 3215 is the first one */
-   spin_lock_irqsave(get_ccwdev_lock(raw->cdev), flags);
-   raw3215_make_room(raw, RAW3215_BUFFER_SIZE);
-   spin_unlock_irqrestore(get_ccwdev_lock(raw->cdev), flags);
-}
-
 static int con3215_notify(struct notifier_block *self,
  unsigned long event, void *data)
 {
-   con3215_flush();
-   return NOTIFY_OK;
+   struct raw3215_info *raw;
+   unsigned long flags;
+
+   raw = raw3215[0];  /* console 3215 is the first one */
+   if (!spin_trylock_irqsave(get_ccwdev_lock(raw->cdev), flags))
+   return NOTIFY_DONE;
+
+   raw3215_make_room(raw, RAW3215_BUFFER_SIZE);
+   spin_unlock_irqrestore(get_ccwdev_lock(raw->cdev), flags);
+
+   return NOTIFY_DONE;
 }
 
 static struct notifier_block on_panic_nb = {
.notifier_call = con3215_notify,
-   .priority = 0,
+   .priority = INT_MIN + 1, /* run the callback late */
 };
 
 static struct notifier_block on_reboot_nb = {
.notifier_call = con3215_notify,
-   .priority = 0,
+   .priority = INT_MIN + 1, /* run the callback late */
 };
 
 /*
diff --git a/drivers/s390/char/con3270.c b/drivers/s390/char/con3270.c
index e4592890f20a..4

[PATCH 12/30] parisc: Replace regular spinlock with spin_trylock on panic path

2022-04-27 Thread Guilherme G. Piccoli
The panic notifiers' callbacks execute in an atomic context, with
interrupts/preemption disabled, and all CPUs not running the panic
function are off, so it's very dangerous to wait on a regular
spinlock, there's a risk of deadlock.

This patch refactors the panic notifier of parisc/power driver
to make use of spin_trylock - for that, we've added a second
version of the soft-power function. Also, some comments were
reorganized and trailing white spaces, useless header inclusion
and blank lines were removed.

Cc: Helge Deller 
Cc: "James E.J. Bottomley" 
Signed-off-by: Guilherme G. Piccoli 
---
 arch/parisc/include/asm/pdc.h |  1 +
 arch/parisc/kernel/firmware.c | 27 +++
 drivers/parisc/power.c| 17 ++---
 3 files changed, 34 insertions(+), 11 deletions(-)

diff --git a/arch/parisc/include/asm/pdc.h b/arch/parisc/include/asm/pdc.h
index b643092d4b98..7a106008e258 100644
--- a/arch/parisc/include/asm/pdc.h
+++ b/arch/parisc/include/asm/pdc.h
@@ -83,6 +83,7 @@ int pdc_do_firm_test_reset(unsigned long ftc_bitmap);
 int pdc_do_reset(void);
 int pdc_soft_power_info(unsigned long *power_reg);
 int pdc_soft_power_button(int sw_control);
+int pdc_soft_power_button_panic(int sw_control);
 void pdc_io_reset(void);
 void pdc_io_reset_devices(void);
 int pdc_iodc_getc(void);
diff --git a/arch/parisc/kernel/firmware.c b/arch/parisc/kernel/firmware.c
index 6a7e315bcc2e..0e2f70b592f4 100644
--- a/arch/parisc/kernel/firmware.c
+++ b/arch/parisc/kernel/firmware.c
@@ -1232,15 +1232,18 @@ int __init pdc_soft_power_info(unsigned long *power_reg)
 }
 
 /*
- * pdc_soft_power_button - Control the soft power button behaviour
- * @sw_control: 0 for hardware control, 1 for software control 
+ * pdc_soft_power_button{_panic} - Control the soft power button behaviour
+ * @sw_control: 0 for hardware control, 1 for software control
  *
  *
  * This PDC function places the soft power button under software or
  * hardware control.
- * Under software control the OS may control to when to allow to shut 
- * down the system. Under hardware control pressing the power button 
+ * Under software control the OS may control to when to allow to shut
+ * down the system. Under hardware control pressing the power button
  * powers off the system immediately.
+ *
+ * The _panic version relies in spin_trylock to prevent deadlock
+ * on panic path.
  */
 int pdc_soft_power_button(int sw_control)
 {
@@ -1254,6 +1257,22 @@ int pdc_soft_power_button(int sw_control)
return retval;
 }
 
+int pdc_soft_power_button_panic(int sw_control)
+{
+   int retval;
+   unsigned long flags;
+
+   if (!spin_trylock_irqsave(_lock, flags)) {
+   pr_emerg("Couldn't enable soft power button\n");
+   return -EBUSY; /* ignored by the panic notifier */
+   }
+
+   retval = mem_pdc_call(PDC_SOFT_POWER, PDC_SOFT_POWER_ENABLE, 
__pa(pdc_result), sw_control);
+   spin_unlock_irqrestore(_lock, flags);
+
+   return retval;
+}
+
 /*
  * pdc_io_reset - Hack to avoid overlapping range registers of Bridges devices.
  * Primarily a problem on T600 (which parisc-linux doesn't support) but
diff --git a/drivers/parisc/power.c b/drivers/parisc/power.c
index 456776bd8ee6..8512884de2cf 100644
--- a/drivers/parisc/power.c
+++ b/drivers/parisc/power.c
@@ -37,7 +37,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -175,16 +174,21 @@ static void powerfail_interrupt(int code, void *x)
 
 
 
-/* parisc_panic_event() is called by the panic handler.
- * As soon as a panic occurs, our tasklets above will not be
- * executed any longer. This function then re-enables the 
- * soft-power switch and allows the user to switch off the system
+/*
+ * parisc_panic_event() is called by the panic handler.
+ *
+ * As soon as a panic occurs, our tasklets above will not
+ * be executed any longer. This function then re-enables
+ * the soft-power switch and allows the user to switch off
+ * the system. We rely in pdc_soft_power_button_panic()
+ * since this version spin_trylocks (instead of regular
+ * spinlock), preventing deadlocks on panic path.
  */
 static int parisc_panic_event(struct notifier_block *this,
unsigned long event, void *ptr)
 {
/* re-enable the soft-power switch */
-   pdc_soft_power_button(0);
+   pdc_soft_power_button_panic(0);
return NOTIFY_DONE;
 }
 
@@ -193,7 +197,6 @@ static struct notifier_block parisc_panic_block = {
.priority   = INT_MAX,
 };
 
-
 static int __init power_init(void)
 {
unsigned long ret;
-- 
2.36.0




[PATCH 07/30] mips: ip22: Reword PANICED to PANICKED and remove useless header

2022-04-27 Thread Guilherme G. Piccoli
Many other place in the kernel prefer the latter, so let's keep
it consistent in MIPS code as well. Also, removes a useless header.

Cc: Thomas Bogendoerfer 
Signed-off-by: Guilherme G. Piccoli 
---
 arch/mips/sgi-ip22/ip22-reset.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/mips/sgi-ip22/ip22-reset.c b/arch/mips/sgi-ip22/ip22-reset.c
index 9028dbbb45dd..8f0861c58080 100644
--- a/arch/mips/sgi-ip22/ip22-reset.c
+++ b/arch/mips/sgi-ip22/ip22-reset.c
@@ -11,7 +11,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -41,7 +40,7 @@
 static struct timer_list power_timer, blink_timer, debounce_timer;
 static unsigned long blink_timer_timeout;
 
-#define MACHINE_PANICED1
+#define MACHINE_PANICKED   1
 #define MACHINE_SHUTTING_DOWN  2
 
 static int machine_state;
@@ -112,7 +111,7 @@ static void debounce(struct timer_list *unused)
return;
}
 
-   if (machine_state & MACHINE_PANICED)
+   if (machine_state & MACHINE_PANICKED)
sgimc->cpuctrl0 |= SGIMC_CCTRL0_SYSINIT;
 
enable_irq(SGI_PANEL_IRQ);
@@ -120,7 +119,7 @@ static void debounce(struct timer_list *unused)
 
 static inline void power_button(void)
 {
-   if (machine_state & MACHINE_PANICED)
+   if (machine_state & MACHINE_PANICKED)
return;
 
if ((machine_state & MACHINE_SHUTTING_DOWN) ||
@@ -167,9 +166,9 @@ static irqreturn_t panel_int(int irq, void *dev_id)
 static int panic_event(struct notifier_block *this, unsigned long event,
  void *ptr)
 {
-   if (machine_state & MACHINE_PANICED)
+   if (machine_state & MACHINE_PANICKED)
return NOTIFY_DONE;
-   machine_state |= MACHINE_PANICED;
+   machine_state |= MACHINE_PANICKED;
 
blink_timer_timeout = PANIC_FREQ;
blink_timeout(_timer);
-- 
2.36.0




[PATCH 08/30] powerpc/setup: Refactor/untangle panic notifiers

2022-04-27 Thread Guilherme G. Piccoli
The panic notifiers infrastructure is a bit limited in the scope of
the callbacks - basically every kind of functionality is dropped
in a list that runs in the same point during the kernel panic path.
This is not really on par with the complexities and particularities
of architecture / hypervisors' needs, and a refactor is ongoing.

As part of this refactor, it was observed that powerpc has 2 notifiers,
with mixed goals: one is just a KASLR offset dumper, whereas the other
aims to hard-disable IRQs (necessary on panic path), warn firmware of
the panic event (fadump) and run low-level platform-specific machinery
that might stop kernel execution and never come back.

Clearly, the 2nd notifier has opposed goals: disable IRQs / fadump
should run earlier while low-level platform actions should
run late since it might not even return. Hence, this patch decouples
the notifiers splitting them in three:

- First one is responsible for hard-disable IRQs and fadump,
should run early;

- The kernel KASLR offset dumper is really an informative notifier,
harmless and may run at any moment in the panic path;

- The last notifier should run last, since it aims to perform
low-level actions for specific platforms, and might never return.
It is also only registered for 2 platforms, pseries and ps3.

The patch better documents the notifiers and clears the code too,
also removing a useless header.

Currently no functionality change should be observed, but after
the planned panic refactor we should expect more panic reliability
with this patch.

Cc: Benjamin Herrenschmidt 
Cc: Hari Bathini 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Paul Mackerras 
Signed-off-by: Guilherme G. Piccoli 
---

We'd like to thanks specially the MiniCloud infrastructure [0] maintainers,
that allow us to test PowerPC code in a very complete, functional and FREE
environment (there's no need even for adding a credit card, like many "free"
clouds require ¬¬ ).

[0] https://openpower.ic.unicamp.br/minicloud

 arch/powerpc/kernel/setup-common.c | 74 ++
 1 file changed, 54 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/kernel/setup-common.c 
b/arch/powerpc/kernel/setup-common.c
index 518ae5aa9410..52f96b209a96 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -23,7 +23,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -680,8 +679,25 @@ int check_legacy_ioport(unsigned long base_port)
 }
 EXPORT_SYMBOL(check_legacy_ioport);
 
-static int ppc_panic_event(struct notifier_block *this,
- unsigned long event, void *ptr)
+/*
+ * Panic notifiers setup
+ *
+ * We have 3 notifiers for powerpc, each one from a different "nature":
+ *
+ * - ppc_panic_fadump_handler() is a hypervisor notifier, which hard-disables
+ *   IRQs and deal with the Firmware-Assisted dump, when it is configured;
+ *   should run early in the panic path.
+ *
+ * - dump_kernel_offset() is an informative notifier, just showing the KASLR
+ *   offset if we have RANDOMIZE_BASE set.
+ *
+ * - ppc_panic_platform_handler() is a low-level handler that's registered
+ *   only if the platform wishes to perform final actions in the panic path,
+ *   hence it should run late and might not even return. Currently, only
+ *   pseries and ps3 platforms register callbacks.
+ */
+static int ppc_panic_fadump_handler(struct notifier_block *this,
+   unsigned long event, void *ptr)
 {
/*
 * panic does a local_irq_disable, but we really
@@ -691,45 +707,63 @@ static int ppc_panic_event(struct notifier_block *this,
 
/*
 * If firmware-assisted dump has been registered then trigger
-* firmware-assisted dump and let firmware handle everything else.
+* its callback and let the firmware handles everything else.
 */
crash_fadump(NULL, ptr);
-   if (ppc_md.panic)
-   ppc_md.panic(ptr);  /* May not return */
+
return NOTIFY_DONE;
 }
 
-static struct notifier_block ppc_panic_block = {
-   .notifier_call = ppc_panic_event,
-   .priority = INT_MIN /* may not return; must be done last */
-};
-
-/*
- * Dump out kernel offset information on panic.
- */
 static int dump_kernel_offset(struct notifier_block *self, unsigned long v,
  void *p)
 {
pr_emerg("Kernel Offset: 0x%lx from 0x%lx\n",
 kaslr_offset(), KERNELBASE);
 
-   return 0;
+   return NOTIFY_DONE;
 }
 
+static int ppc_panic_platform_handler(struct notifier_block *this,
+ unsigned long event, void *ptr)
+{
+   /*
+* This handler is only registered if we have a panic callback
+* on ppc_md, hence NULL check is not needed.
+* Also, it may not return, so it runs really late on panic path.
+*/
+   ppc_md.panic(ptr);
+
+   return NO

[PATCH 06/30] soc: bcm: brcmstb: Document panic notifier action and remove useless header

2022-04-27 Thread Guilherme G. Piccoli
The panic notifier of this driver is very simple code-wise, just a memory
write to a special position with some numeric code. But this is not clear
from the semantic point-of-view, and there is no public documentation
about that either.

After discussing this in the mailing-lists [0] and having Florian explained
it very well, this patch just document that in the code for the future
generations asking the same questions. Also, it removes a useless header.

[0] https://lore.kernel.org/lkml/781cafb0-8d06-8b56-907a-5175c2da1...@gmail.com

Fixes: 0b741b8234c8 ("soc: bcm: brcmstb: Add support for S2/S3/S5 suspend 
states (ARM)")
Cc: Brian Norris 
Cc: Doug Berger 
Cc: Florian Fainelli 
Cc: Justin Chen 
Cc: Lee Jones 
Cc: Markus Mayer 
Signed-off-by: Guilherme G. Piccoli 
---
 drivers/soc/bcm/brcmstb/pm/pm-arm.c | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/soc/bcm/brcmstb/pm/pm-arm.c 
b/drivers/soc/bcm/brcmstb/pm/pm-arm.c
index 3cbb165d6e30..870686ae042b 100644
--- a/drivers/soc/bcm/brcmstb/pm/pm-arm.c
+++ b/drivers/soc/bcm/brcmstb/pm/pm-arm.c
@@ -25,7 +25,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -664,7 +663,20 @@ static void __iomem *brcmstb_ioremap_match(const struct 
of_device_id *matches,
 
return of_io_request_and_map(dn, index, dn->full_name);
 }
-
+/*
+ * The AON is a small domain in the SoC that can retain its state across
+ * various system wide sleep states and specific reset conditions; the
+ * AON DATA RAM is a small RAM of a few words (< 1KB) which can store
+ * persistent information across such events.
+ *
+ * The purpose of the below panic notifier is to help with notifying
+ * the bootloader that a panic occurred and so that it should try its
+ * best to preserve the DRAM contents holding that buffer for recovery
+ * by the kernel as opposed to wiping out DRAM clean again.
+ *
+ * Reference: comment from Florian Fainelli, at
+ * https://lore.kernel.org/lkml/781cafb0-8d06-8b56-907a-5175c2da1...@gmail.com
+ */
 static int brcmstb_pm_panic_notify(struct notifier_block *nb,
unsigned long action, void *data)
 {
-- 
2.36.0




  1   2   >