Re: [PATCH 1/2] powerpc/mce: Avoid using irq_work_queue() in realmode

2021-11-11 Thread Daniel Axtens
Hi Ganesh,

> In realmode mce handler we use irq_work_queue() to defer
> the processing of mce events, irq_work_queue() can only
> be called when translation is enabled because it touches
> memory outside RMA, hence we enable translation before
> calling irq_work_queue and disable on return, though it
> is not safe to do in realmode.
>
> To avoid this, program the decrementer and call the event
> processing functions from timer handler.
>

This is an interesting approach, and it would indeed be nice to clear up
the MCE handling a bit.

I have a few questions:

> diff --git a/arch/powerpc/include/asm/mce.h b/arch/powerpc/include/asm/mce.h
> index 331d944280b8..187810f13669 100644
> --- a/arch/powerpc/include/asm/mce.h
> +++ b/arch/powerpc/include/asm/mce.h
> @@ -235,8 +235,10 @@ extern void machine_check_print_event_info(struct 
> machine_check_event *evt,
>  unsigned long addr_to_pfn(struct pt_regs *regs, unsigned long addr);
>  extern void mce_common_process_ue(struct pt_regs *regs,
> struct mce_error_info *mce_err);
> +extern void machine_check_raise_dec_intr(void);

Does this need an extern? I think that's the default...?

>  int mce_register_notifier(struct notifier_block *nb);
>  int mce_unregister_notifier(struct notifier_block *nb);
> +void mce_run_late_handlers(void);
>  #ifdef CONFIG_PPC_BOOK3S_64
>  void flush_and_reload_slb(void);
>  void flush_erat(void);
> diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
> index dc05a862e72a..f49180f8c9be 100644
> --- a/arch/powerpc/include/asm/paca.h
> +++ b/arch/powerpc/include/asm/paca.h
> @@ -280,6 +280,7 @@ struct paca_struct {
>  #endif
>  #ifdef CONFIG_PPC_BOOK3S_64
>   struct mce_info *mce_info;
> + atomic_t mces_to_process;

This is in the PACA, which is supposed to be a per-cpu structure: hey
does it need to be atomic_t? Isn't there only one CPU accessing it?

Does this variable provide anything new compared to mce_nest_count or
mce_queue_count + mce_ue_count? It looks like
machine_check_process_queued_event will clear a queue based on value of
mce_queue_count and machine_check_process_ue_event will clear a queue
based on mce_ue_count...

I think (absent nested interrupts which I talk about below) it should be
the case that mces_to_process == mce_queue_count + mce_ue_count but I
might be wrong?

>  #endif /* CONFIG_PPC_BOOK3S_64 */
>  } cacheline_aligned;

  
>  /*
>   * Decode and save high level MCE information into per cpu buffer which
>   * is an array of machine_check_event structure.
> @@ -135,6 +131,8 @@ void save_mce_event(struct pt_regs *regs, long handled,
>   if (mce->error_type == MCE_ERROR_TYPE_UE)
>   mce->u.ue_error.ignore_event = mce_err->ignore_event;
>  
> + atomic_inc(&local_paca->mces_to_process);
> +

Is there any chance the decrementer will fire between when you do this
atomic_inc() and when you finish adding all the information to the mce
data structure in the remainder of save_mce_event? (e.g. filling in the
tlb_errror.effective_address field)?

(Or does save_mce_event get called with interrupts masked? I find it
very hard to remember what parts of the MCE code path happen under what
circumstances!)

>   if (!addr)
>   return;
>  


> @@ -338,7 +322,7 @@ static void machine_process_ue_event(struct work_struct 
> *work)
>   * process pending MCE event from the mce event queue. This function will be
>   * called during syscall exit.

Is this comment still accurate if this patch is applied?

>   */
> -static void machine_check_process_queued_event(struct irq_work *work)
> +static void machine_check_process_queued_event(void)
>  {
>   int index;
>   struct machine_check_event *evt;
> @@ -363,6 +347,17 @@ static void machine_check_process_queued_event(struct 
> irq_work *work)
>   }
>  }
>  
> +void mce_run_late_handlers(void)
> +{
> + if (unlikely(atomic_read(&local_paca->mces_to_process))) {
> + if (ppc_md.machine_check_log_err)
> + ppc_md.machine_check_log_err();
> + machine_check_process_queued_event();
> + machine_check_ue_work();
> + atomic_dec(&local_paca->mces_to_process);
> + }
> +}

What happens if you get a nested MCE between log_err() and
process_queued_event()? If my very foggy memory of the MCE handling is
correct, we enable nested MCEs very early in the process because the
alternative is that a nested MCE will checkstop the box. So I think this
might be possible, albeit probably unlikely.

It looks like process_queued_event clears the entire MCE queue as
determined by the mce_queue_count. So, consider the following sequence
of events:

1. Take MCE 1. Save to queue, increment mce_queue_count, increment
   mces_to_process, set decrementer to fire.

2. Decrementer fires. mce_run_late_handlers is called.

3. mces_to_process = 1, so we call machine_check_log_err(), which prints
   (on pseries) the info for MCE 1.

4. Take MCE 2

Re: [PATCH v1] powerpc/watchdog: help remote CPUs to flush NMI printk output

2021-11-11 Thread Daniel Axtens
Hi,

> The printk layer at the moment does not seem to have a good way to force
> flush printk messages that are created in NMI context, except in the
> panic path.
>
> NMI-context printk messages normally get to the console with irq_work,
> but that won't help if the CPU is stuck with irqs disabled, as can be
> the case for hard lockup watchdog messages.
>
> The watchdog currently flushes the printk buffers after detecting a
> lockup on remote CPUs, but they may not have processed their NMI IPI
> yet by that stage, or they may have self-detected a lockup in which
> case they won't go via this NMI IPI path.
>
> Improve the situation by having NMI-context mark a flag if it called
> printk, and have watchdog timer interrupts check if that flag was set
> and try to flush if it was. Latency is not a big problem because we
> were already stuck for a while, just need to try to make sure the
> messages eventually make it out.

Initially I was surprised that this doesn't affect the printk code
itself, just the powerpc code...

>
> Cc: Laurent Dufour 
> Signed-off-by: Nicholas Piggin 
> ---
> This patch is actually based on top of this one which is planned to go
> upstream in rc1/2. https://marc.info/?l=linux-kernel&m=163626070312052&w=2
>
> Prior to commit 93d102f094be that is fixed by the above, we had a printk
> flush function with a different name but basically does the same job, so
> this patch can be backported, just needs some care. I'm posting it for
> review now for feedback so it's ready to go when the printk patch is
> upstream.
>
> Thanks,
> Nick
>
>  arch/powerpc/kernel/watchdog.c | 29 +++--
>  1 file changed, 23 insertions(+), 6 deletions(-)
>
> diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
> index b6533539386b..a7b6b0691203 100644
> --- a/arch/powerpc/kernel/watchdog.c
> +++ b/arch/powerpc/kernel/watchdog.c
> @@ -86,6 +86,7 @@ static DEFINE_PER_CPU(u64, wd_timer_tb);
>  /* SMP checker bits */
>  static unsigned long __wd_smp_lock;
>  static unsigned long __wd_reporting;
> +static unsigned long __wd_nmi_output;
>  static cpumask_t wd_smp_cpus_pending;
>  static cpumask_t wd_smp_cpus_stuck;
>  static u64 wd_smp_last_reset_tb;
> @@ -154,6 +155,18 @@ static void wd_lockup_ipi(struct pt_regs *regs)
>   else
>   dump_stack();
>  
> + /*
> +  * We printk()ed from NMI context, the contents may not get flushed
> +  * if we return to a context with interrupts disabled because
> +  * printk uses irq_work to schedule flushes of NMI output.
> +  * __wd_nmi_output says there has been output from NMI context, so
> +  * other CPUs are recruited to help flush it.
> +  *
> +  * xchg is not needed here (it could be a simple atomic store), but
> +  * it gives the memory ordering and atomicity required.
> +  */
> + xchg(&__wd_nmi_output, 1);
> +
>   /* Do not panic from here because that can recurse into NMI IPI layer */
>  }

I think, looking at this and the other site where __wd_nmi_output is
set, that this works because you set the flag only when you are done
printing from the non-panic lockup context on this CPU. I was initially
worried that you set this flag part way through printing, and then it
might get cleared by another CPU while you're still trying to print.
However, in this function it's right at the end - there's nothing else
left to do, and ...

>  DEFINE_INTERRUPT_HANDLER_NMI(soft_nmi_interrupt)
> @@ -386,6 +401,8 @@ DEFINE_INTERRUPT_HANDLER_NMI(soft_nmi_interrupt)
>   print_irqtrace_events(current);
>   show_regs(regs);
>  
> + xchg(&__wd_nmi_output, 1); // see wd_lockup_ipi
> +
>   if (sysctl_hardlockup_all_cpu_backtrace)
>   trigger_allbutself_cpu_backtrace();

in this one, the only things that can happen afterwards are
 - a panic, which does its own flushing, and

- trigger_allbutself_cpu_backtrace(), which seems to just send IPIs, not
 do any printing of its own.

All of which is fine, but I wonder if we need a more descriptive
variable name or if the comment needs to specify that the flag should
only be set at certain times.

Kind regards,
Daniel


Re: [PATCH 0/3] KEXEC_SIG with appended signature

2021-11-11 Thread Nayna



On 11/8/21 07:05, Michal Suchánek wrote:

Hello,

On Mon, Nov 08, 2021 at 09:18:56AM +1100, Daniel Axtens wrote:

Michal Suchánek  writes:


On Fri, Nov 05, 2021 at 09:55:52PM +1100, Daniel Axtens wrote:

Michal Suchanek  writes:


S390 uses appended signature for kernel but implements the check
separately from module loader.

Support for secure boot on powerpc with appended signature is planned -
grub patches submitted upstream but not yet merged.

Power Non-Virtualised / OpenPower already supports secure boot via kexec
with signature verification via IMA. I think you have now sent a
follow-up series that merges some of the IMA implementation, I just
wanted to make sure it was clear that we actually already have support

So is IMA_KEXEC and KEXEC_SIG redundant?

I see some architectures have both. I also see there is a lot of overlap
between the IMA framework and the KEXEC_SIG and MODULE_SIg.


Mimi would be much better placed than me to answer this.

The limits of my knowledge are basically that signature verification for
modules and kexec kernels can be enforced by IMA policies.

For example a secure booted powerpc kernel with module support will have
the following IMA policy set at the arch level:

"appraise func=KEXEC_KERNEL_CHECK appraise_flag=check_blacklist 
appraise_type=imasig|modsig",
(in arch/powerpc/kernel/ima_arch.c)

Module signature enforcement can be set with either IMA (policy like
"appraise func=MODULE_CHECK appraise_flag=check_blacklist 
appraise_type=imasig|modsig" )
or with CONFIG_MODULE_SIG_FORCE/module.sig_enforce=1.

Sometimes this leads to arguably unexpected interactions - for example
commit fa4f3f56ccd2 ("powerpc/ima: Fix secure boot rules in ima arch
policy"), so it might be interesting to see if we can make things easier
to understand.

I suspect that is the root of the problem here. Until distributions pick
up IMA and properly document step by step in detail how to implement,
enable, and debug it the _SIG options are required for users to be able
to make use of signatures.


For secureboot, IMA appraisal policies are configured in kernel at boot 
time based on secureboot state of the system, refer 
arch/powerpc/kernel/ima_arch.c and security/integrity/ima/ima_efi.c. 
This doesn't require any user configuration. Yes, I agree it would be 
helpful to update kernel documentation specifying steps to sign the 
kernel image using sign-file.




The other part is that distributions apply 'lockdown' patches that change
the security policy depending on secure boot status which were rejected
by upstream which only hook into the _SIG options, and not into the IMA_
options. Of course, I expect this to change when the IMA options are
universally available across architectures and the support picked up by
distributions.

Which brings the third point: IMA features vary across architectures,
and KEXEC_SIG is more common than IMA_KEXEC.

config/arm64/default:CONFIG_HAVE_IMA_KEXEC=y
config/ppc64le/default:CONFIG_HAVE_IMA_KEXEC=y

config/arm64/default:CONFIG_KEXEC_SIG=y
config/s390x/default:CONFIG_KEXEC_SIG=y
config/x86_64/default:CONFIG_KEXEC_SIG=y

KEXEC_SIG makes it much easier to get uniform features across
architectures.


Architectures use KEXEC_SIG vs IMA_KEXEC based on their requirement. 
IMA_KEXEC is for the kernel images signed using sign-file (appended 
signatures, not PECOFF), provides measurement along with verification, 
and is tied to secureboot state of the system at boot time.


Thanks & Regards,

  - Nayna



Re: [PATCH 0/3] KEXEC_SIG with appended signature

2021-11-11 Thread Nayna



On 11/5/21 09:14, Michal Suchánek wrote:

On Fri, Nov 05, 2021 at 09:55:52PM +1100, Daniel Axtens wrote:

Michal Suchanek  writes:


S390 uses appended signature for kernel but implements the check
separately from module loader.

Support for secure boot on powerpc with appended signature is planned -
grub patches submitted upstream but not yet merged.

Power Non-Virtualised / OpenPower already supports secure boot via kexec
with signature verification via IMA. I think you have now sent a
follow-up series that merges some of the IMA implementation, I just
wanted to make sure it was clear that we actually already have support

So is IMA_KEXEC and KEXEC_SIG redundant?

I see some architectures have both. I also see there is a lot of overlap
between the IMA framework and the KEXEC_SIG and MODULE_SIg.


Originally, KEXEC_SIG was meant for PECOFF based signatures, while 
IMA_KEXEC mainly supported xattr based signatures.


Power (Non-virtualized/OpenPOWER) doesn't support PECOFF. Extended 
attributes based signature verification doesn't work with netboot. 
That's when appended signature support was added to IMA.


Using IMA_KEXEC has the benefit of being able to enable both signature 
verification and measurement of the kernel image.


Thanks & Regards,

 - Nayna



Re: [PASEMI] Nemo board doesn't recognize any ATA disks with the pci-v5.16 updates

2021-11-11 Thread Olof Johansson
On Thu, Nov 11, 2021 at 2:20 AM Marc Zyngier  wrote:
>
> On Thu, 11 Nov 2021 07:47:08 +,
> Christian Zigotzky  wrote:
> >
> > On 11 November 2021 at 08:13 am, Marc Zyngier wrote:
> > > On Thu, 11 Nov 2021 05:24:52 +,
> > > Christian Zigotzky  wrote:
> > >> Hello Marc,
> > >>
> > >> Here you are:
> > >> https://forum.hyperion-entertainment.com/viewtopic.php?p=54406#p54406
> > > This is not what I asked. I need the actual source file, or at the
> > > very least the compiled object (the /sys/firmware/fdt file, for
> > > example). Not an interpretation that I can't feed to the kernel.
> > >
> > > Without this, I can't debug your problem.
> > >
> > >> We are very happy to have the patch for reverting the bad commit
> > >> because we want to test the new PASEMI i2c driver with support for the
> > >> Apple M1 [1] on our Nemo boards.
> > > You can revert the patch on your own. At this stage, we're not blindly
> > > reverting things in the tree, but instead trying to understand what
> > > is happening on your particular system.
> > >
> > > Thanks,
> > >
> > > M.
> > >
> > I added a download link for the fdt file to the post [1]. Please read
> > also Darren's comments in this post.
>
> Thanks for that. The DT looks absolutely bonkers, no wonder that
> things break with something like that.
>
> But Darren's comments made me jump a bit, and I quote them here for
> everyone to see:
>
> 
> [...]
>
> The dtb passed by the CFE firmware has a number of issues, which up till
> now have been fixed by use of patches applied to the mainline kernel.
> This occasionally causes problems with changes made to mainline.
>
> [...]
> 
>
> Am I right in understanding that the upstream kernel does not support
> the machine out of the box, and that you actually have to apply out of
> tree patches to make it work? That these patches have to do with the
> IRQ routing?

To my knowledge that has never been needed -- that the base platform
support is all there.

This is an old platform, and just like with the power macs, the
devicetree is indeed supplied from firmware, and as such not easily
patchable like with ARM platforms.

Last time this was an issue (to my memory) was when they enabled
automatic little endian switching in the boot path, that caused some
issues with a (bad) phandle pointer that had gone undiscovered for 10+
years.

> If so, I wonder why upstream should revert a patch to work on a system
> that isn't supported upstream the first place. I will still try and
> come up with a solution for you. But asking for the revert of a patch
> on these grounds is not, IMHO, acceptable. Also, please provide these
> patches on the list so that I can help you to some extend (and I mean
> *on the list*, not on a random forum that collects my information).

Early fixups of DT is the way to go here, if needed -- we do it on
some other platforms. That can happen in-kernel, and keep the new
functionality. For that we'd need to figure out what's actually wrong
with the DT as such right now.



-Olof


Re: [PATCH v3] perf vendor events power10: Add metric events json file for power10 platform

2021-11-11 Thread Arnaldo Carvalho de Melo
Em Mon, Nov 08, 2021 at 08:47:02AM -0600, Paul A. Clarke escreveu:
> On Mon, Nov 08, 2021 at 11:30:10AM +0530, Kajol Jain wrote:
> > Add pmu metric json file for power10 platform.
> > 
> > Signed-off-by: Kajol Jain 
> > ---
> > Changelog
> > v2 -> v3:
> > - Did nit changes in BriefDescription as suggested
> >   by Paul A. Clarke and Michael Ellermen
> 
> LGTM.
> 
> Reviewed-by: Paul A. Clarke 

Thanks, applied.

- Arnaldo



Re: [PASEMI] Nemo board doesn't recognize any ATA disks with the pci-v5.16 updates

2021-11-11 Thread Marc Zyngier
On Wed, 10 Nov 2021 18:07:24 +,
Christian Zigotzky  wrote:
> 
> On 09 November 2021 at 03:45 pm, Christian Zigotzky wrote:
> > Hello,
> >
> > The Nemo board [1] doesn't recognize any ATA disks with the
> pci-v5.16 updates [2].
> >
> > Error messages:
> >
> > ata4.00: gc timeout cmd 0xec
> > ata4.00: failed to IDENTIFY (I/O error, error_mask=0x4)
> > ata1.00: gc timeout cmd 0xec
> > ata1.00: failed to IDENTIFY (I/O error, error_mask=0x4)
> > ata3.00: gc timeout cmd 0xec
> > ata3.00: failed to IDENTIFY (I/O error, error_mask=0x4)
> >
> > I was able to revert the new pci-v5.16 updates [2]. After a new
> compiling, the kernel recognize all ATA disks correctly.
> >
> > Could you please check the pci-v5.16 updates [2]?
> >
> > Please find attached the kernel config.
> >
> > Thanks,
> > Christian
> >
> > [1] https://en.wikipedia.org/wiki/AmigaOne_X1000
> > [2]
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0c5c62ddf88c34bc83b66e4ac9beb2bb0e1887d4
> 
> Hi All,
> 
> Many thanks for your nice responses.
> 
> I bisected today [1]. 0412841812265734c306ba5ef8088bcb64d5d3bd
> (of/irq: Allow matching of an interrupt-map local to an interrupt
> controller) [2] is the first bad commit.

Can you please give the following hack a go and post the result
(including the full dmesg)?

Thanks,

M.
diff --git a/drivers/of/irq.c b/drivers/of/irq.c
index 32be5a03951f..8cf0cc9b7caf 100644
--- a/drivers/of/irq.c
+++ b/drivers/of/irq.c
@@ -156,14 +156,15 @@ int of_irq_parse_raw(const __be32 *addr, struct 
of_phandle_args *out_irq)
 
/* Now start the actual "proper" walk of the interrupt tree */
while (ipar != NULL) {
+   bool intc = of_property_read_bool(ipar, "interrupt-controller");
+
/*
 * Now check if cursor is an interrupt-controller and
 * if it is then we are done, unless there is an
 * interrupt-map which takes precedence.
 */
imap = of_get_property(ipar, "interrupt-map", &imaplen);
-   if (imap == NULL &&
-   of_property_read_bool(ipar, "interrupt-controller")) {
+   if (imap == NULL && intc) {
pr_debug(" -> got it !\n");
return 0;
}
@@ -244,8 +245,14 @@ int of_irq_parse_raw(const __be32 *addr, struct 
of_phandle_args *out_irq)
 
pr_debug(" -> imaplen=%d\n", imaplen);
}
-   if (!match)
+   if (!match) {
+   if (intc) {
+   pr_info("%pOF interrupt-map failed, using 
interrupt-controller\n", ipar);
+   return 0;
+   }
+
goto fail;
+   }
 
/*
 * Successfully parsed an interrrupt-map translation; copy new

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH 11/11] powerpc/smp: Add a doorbell=off kernel parameter

2021-11-11 Thread Cédric Le Goater

On 11/11/21 11:41, Michael Ellerman wrote:

Cédric Le Goater  writes:

On processors with a XIVE interrupt controller (POWER9 and above), the
kernel can use either doorbells or XIVE to generate CPU IPIs. Sending
doorbell is generally preferred to using the XIVE IC because it is
faster. There are cases where we want to avoid doorbells and use XIVE
only, for debug or performance. Only useful on POWER9 and above.


How much do we want this?


Yes. Thanks for asking. It is a recent need.

Here is some background I should have added in the first place. May be
for a v2.

We have different ways of doing IPIs on POWER9 and above processors,
depending on the platform and the underlying hypervisor.

- PowerNV uses global doorbells

- pSeries/KVM uses XIVE only because local doorbells are not
  efficient, as there are emulated in the KVM hypervisor

- pSeries/PowerVM uses XIVE for remote cores and local doorbells for
  threads on same core (SMT4 or 8)

This recent commit 5b06d1679f2f ("powerpc/pseries: Use doorbells even
if XIVE is available") introduced the optimization for PowerVM and
commit 107c55005fbd ("powerpc/pseries: Add KVM guest doorbell
restrictions") restricted the optimization.

We would like a way to turn off the optimization.


Kernel command line args are a bit of a pain, they tend to be poorly
tested, because someone has to explicitly enable them at boot time,
and then reboot to test the other case.


True. The "xive=off" parameter was poorly tested initially.


When would we want to enable this?


For bring-up, for debug, for tests. I have been using a similar switch
to compare the XIVE interrupt controller performance with doorbells on
POWER9 and P0WER10.

A new need arises with PowerVM, some configurations will behave as KVM
(local doorbell are unsupported) and the doorbell=off parameter is a
simple way to handle this case today.


Can we make the kernel smarter about when to use doorbells and make
it automated?


I don't think we want to probe all IPI methods to detect how well
local doorbells are supported on the platform. Do we ?

A machine property/feature would be cleaner. It is a global CPU
property but I don't know where to put it. Ideas ?


Could we make it a runtime switch?


We can. See the patch below. It covers the need for test/performance
but it won't work on a PowerVM system not supporting local doorbells
since boot will fail as soon as secondaries are started. We need a way
to take a decision early on which method to activate.


Thanks

C.

From dcac8528c89b689217515032f3329ba5ea10085d Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= 
Date: Fri, 5 Nov 2021 12:23:48 +0100
Subject: [PATCH] powerpc/xive: Add a debugfs toggle to select xive for IPIs
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

For performance tests only.

Signed-off-by: Cédric Le Goater 
---
 arch/powerpc/sysdev/xive/common.c | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/arch/powerpc/sysdev/xive/common.c 
b/arch/powerpc/sysdev/xive/common.c
index 39142df828a018..9ee36b95f9c545 100644
--- a/arch/powerpc/sysdev/xive/common.c
+++ b/arch/powerpc/sysdev/xive/common.c
@@ -1826,6 +1826,30 @@ static int xive_eq_debug_show(struct seq_file *m, void 
*private)
 }
 DEFINE_SHOW_ATTRIBUTE(xive_eq_debug);
 
+static int xive_ipi_cause_debug_set(void *data, u64 val)

+{
+   static void (*do_ipi)(int cpu);
+
+   if (val) {
+   do_ipi = smp_ops->cause_ipi;
+   smp_ops->cause_ipi = xive_cause_ipi;
+   } else {
+   if (do_ipi)
+   smp_ops->cause_ipi = do_ipi;
+   }
+
+   return 0;
+}
+
+static int xive_ipi_cause_debug_get(void *data, u64 *val)
+{
+   *val = xive_cause_ipi == smp_ops->cause_ipi;
+   return 0;
+}
+
+DEFINE_DEBUGFS_ATTRIBUTE(xive_ipi_cause_debug_fops, xive_ipi_cause_debug_get,
+xive_ipi_cause_debug_set, "%llu\n");
+
 static void xive_core_debugfs_create(void)
 {
struct dentry *xive_dir;
@@ -1849,6 +1873,8 @@ static void xive_core_debugfs_create(void)
}
debugfs_create_bool("store-eoi", 0600, xive_dir, &xive_store_eoi);
debugfs_create_bool("save-restore", 0600, xive_dir, 
&xive_has_save_restore);
+   debugfs_create_file("ipi-cause", 0600, xive_dir,
+   NULL, &xive_ipi_cause_debug_fops);
 }
 
 #endif /* CONFIG_DEBUG_FS */


Re: [PATCH] ppc64/fadump: fix inaccurate CPU state info in vmcore generated with panic

2021-11-11 Thread Nicholas Piggin
Excerpts from Hari Bathini's message of November 11, 2021 10:06 pm:
> 
> 
> On 11/11/21 11:44 am, Michael Ellerman wrote:
>> Hari Bathini  writes:
>>> In panic path, fadump is triggered via a panic notifier function.
>>> Before calling panic notifier functions, smp_send_stop() gets called,
>>> which stops all CPUs except the panic'ing CPU. Commit 8389b37dffdc
>>> ("powerpc: stop_this_cpu: remove the cpu from the online map.") and
>>> again commit bab26238bbd4 ("powerpc: Offline CPU in stop_this_cpu()")
>>> started marking CPUs as offline while stopping them. So, if a kernel
>>> has either of the above commits, vmcore captured with fadump via panic
>>> path would show only the panic'ing CPU as online. Sample output of
>>> crash-utility with such vmcore:
>>>
>>># crash vmlinux vmcore
>>>...
>>>  KERNEL: vmlinux
>>>DUMPFILE: vmcore  [PARTIAL DUMP]
>>>CPUS: 1
>>>DATE: Wed Nov 10 09:56:34 EST 2021
>>>  UPTIME: 00:00:42
>>>LOAD AVERAGE: 2.27, 0.69, 0.24
>>>   TASKS: 183
>>>NODENAME: X
>>> RELEASE: 5.15.0+
>>> VERSION: #974 SMP Wed Nov 10 04:18:19 CST 2021
>>> MACHINE: ppc64le  (2500 Mhz)
>>>  MEMORY: 8 GB
>>>   PANIC: "Kernel panic - not syncing: sysrq triggered crash"
>>> PID: 3394
>>> COMMAND: "bash"
>>>TASK: c000150a5f80  [THREAD_INFO: c000150a5f80]
>>> CPU: 1
>>>   STATE: TASK_RUNNING (PANIC)
>>>
>>>crash> p -x __cpu_online_mask
>>>__cpu_online_mask = $1 = {
>>>  bits = {0x2, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}
>>>}
>>>crash>
>>>crash>
>>>crash> p -x __cpu_active_mask
>>>__cpu_active_mask = $2 = {
>>>  bits = {0xff, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}
>>>}
>>>crash>
>>>
>>> While this has been the case since fadump was introduced, the issue
>>> was not identified for two probable reasons:
>>>
>>>- In general, the bulk of the vmcores analyzed were from crash
>>>  due to exception.
>>>
>>>- The above did change since commit 8341f2f222d7 ("sysrq: Use
>>>  panic() to force a crash") started using panic() instead of
>>>  deferencing NULL pointer to force a kernel crash. But then
>>>  commit de6e5d38417e ("powerpc: smp_send_stop do not offline
>>>  stopped CPUs") stopped marking CPUs as offline till kernel
>>>  commit bab26238bbd4 ("powerpc: Offline CPU in stop_this_cpu()")
>>>  reverted that change.
>>>
>>> To avoid vmcore from showing only one CPU as online in panic path,
>>> skip marking non panic'ing CPUs as offline while stopping them
>>> during fadump crash.
>> 
>> Is this really worth the added complexity/bug surface?
>> 
>> Why does it matter if the vmcore shows only one CPU online?
> 
> We lose out on backtrace/register data of non-crashing CPUs as they
> are explicitly marked offline.
> 
> Actually, the state of CPU resources is explicitly changed after the
> panic though the aim of vmcore is to capture the system state at the
> time of panic...
> 
> Alternatively, how about moving crash_fadump() call from panic notifier
> into panic() function explicitly, just like __crash_kexec() - before the
> smp_send_stop() call, so as to remove dependency with smp_send_stop()
> implementation altogether...

Does the crash dump code snapshot the CPUs with send_debuggr_break? I
can't remember the exact flow. But if it sends NMI IPIs to all other
CPUs then maybe it could be moved before smp_send_stop, good idea.

> 
>> 
>>> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
>>> index c23ee842c4c3..20555d5d5966 100644
>>> --- a/arch/powerpc/kernel/smp.c
>>> +++ b/arch/powerpc/kernel/smp.c
>>> @@ -61,6 +61,7 @@
>>>   #include 
>>>   #include 
>>>   #include 
>>> +#include 
>>>   
>>>   #ifdef DEBUG
>>>   #include 
>>> @@ -626,7 +627,8 @@ static void nmi_stop_this_cpu(struct pt_regs *regs)
>>> /*
>>>  * IRQs are already hard disabled by the smp_handle_nmi_ipi.
>>>  */
>>> -   set_cpu_online(smp_processor_id(), false);
>>> +   if (!(oops_in_progress && should_fadump_crash()))
>>> +   set_cpu_online(smp_processor_id(), false);
>>>   
>>> spin_begin();
>>> while (1)
>>> @@ -650,7 +652,8 @@ static void stop_this_cpu(void *dummy)
>>>  * to know other CPUs are offline before it breaks locks to flush
>>>  * printk buffers, in case we panic()ed while holding the lock.
>>>  */
>>> -   set_cpu_online(smp_processor_id(), false);
>>> +   if (!(oops_in_progress && should_fadump_crash()))
>>> +   set_cpu_online(smp_processor_id(), false);
>> 
>> The comment talks about printk_safe_flush_on_panic(), and this change
>> would presumably break that. Except that printk_safe_flush_on_panic() no
>> longer exists.
>> 
>> So do we need to set the CPU online here at all?
>> 
>> ie. could we revert bab26238bbd4 ("powerpc: Offline CPU in stop_this_cpu()")
>> now that printk_safe_flush_on_panic() no longer exists?
> 

Re: [PATCH] ppc64/fadump: fix inaccurate CPU state info in vmcore generated with panic

2021-11-11 Thread Hari Bathini




On 11/11/21 11:44 am, Michael Ellerman wrote:

Hari Bathini  writes:

In panic path, fadump is triggered via a panic notifier function.
Before calling panic notifier functions, smp_send_stop() gets called,
which stops all CPUs except the panic'ing CPU. Commit 8389b37dffdc
("powerpc: stop_this_cpu: remove the cpu from the online map.") and
again commit bab26238bbd4 ("powerpc: Offline CPU in stop_this_cpu()")
started marking CPUs as offline while stopping them. So, if a kernel
has either of the above commits, vmcore captured with fadump via panic
path would show only the panic'ing CPU as online. Sample output of
crash-utility with such vmcore:

   # crash vmlinux vmcore
   ...
 KERNEL: vmlinux
   DUMPFILE: vmcore  [PARTIAL DUMP]
   CPUS: 1
   DATE: Wed Nov 10 09:56:34 EST 2021
 UPTIME: 00:00:42
   LOAD AVERAGE: 2.27, 0.69, 0.24
  TASKS: 183
   NODENAME: X
RELEASE: 5.15.0+
VERSION: #974 SMP Wed Nov 10 04:18:19 CST 2021
MACHINE: ppc64le  (2500 Mhz)
 MEMORY: 8 GB
  PANIC: "Kernel panic - not syncing: sysrq triggered crash"
PID: 3394
COMMAND: "bash"
   TASK: c000150a5f80  [THREAD_INFO: c000150a5f80]
CPU: 1
  STATE: TASK_RUNNING (PANIC)

   crash> p -x __cpu_online_mask
   __cpu_online_mask = $1 = {
 bits = {0x2, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}
   }
   crash>
   crash>
   crash> p -x __cpu_active_mask
   __cpu_active_mask = $2 = {
 bits = {0xff, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}
   }
   crash>

While this has been the case since fadump was introduced, the issue
was not identified for two probable reasons:

   - In general, the bulk of the vmcores analyzed were from crash
 due to exception.

   - The above did change since commit 8341f2f222d7 ("sysrq: Use
 panic() to force a crash") started using panic() instead of
 deferencing NULL pointer to force a kernel crash. But then
 commit de6e5d38417e ("powerpc: smp_send_stop do not offline
 stopped CPUs") stopped marking CPUs as offline till kernel
 commit bab26238bbd4 ("powerpc: Offline CPU in stop_this_cpu()")
 reverted that change.

To avoid vmcore from showing only one CPU as online in panic path,
skip marking non panic'ing CPUs as offline while stopping them
during fadump crash.


Is this really worth the added complexity/bug surface?

Why does it matter if the vmcore shows only one CPU online?


We lose out on backtrace/register data of non-crashing CPUs as they
are explicitly marked offline.

Actually, the state of CPU resources is explicitly changed after the
panic though the aim of vmcore is to capture the system state at the
time of panic...

Alternatively, how about moving crash_fadump() call from panic notifier
into panic() function explicitly, just like __crash_kexec() - before the
smp_send_stop() call, so as to remove dependency with smp_send_stop()
implementation altogether...




diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index c23ee842c4c3..20555d5d5966 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -61,6 +61,7 @@
  #include 
  #include 
  #include 
+#include 
  
  #ifdef DEBUG

  #include 
@@ -626,7 +627,8 @@ static void nmi_stop_this_cpu(struct pt_regs *regs)
/*
 * IRQs are already hard disabled by the smp_handle_nmi_ipi.
 */
-   set_cpu_online(smp_processor_id(), false);
+   if (!(oops_in_progress && should_fadump_crash()))
+   set_cpu_online(smp_processor_id(), false);
  
  	spin_begin();

while (1)
@@ -650,7 +652,8 @@ static void stop_this_cpu(void *dummy)
 * to know other CPUs are offline before it breaks locks to flush
 * printk buffers, in case we panic()ed while holding the lock.
 */
-   set_cpu_online(smp_processor_id(), false);
+   if (!(oops_in_progress && should_fadump_crash()))
+   set_cpu_online(smp_processor_id(), false);


The comment talks about printk_safe_flush_on_panic(), and this change
would presumably break that. Except that printk_safe_flush_on_panic() no
longer exists.

So do we need to set the CPU online here at all?

ie. could we revert bab26238bbd4 ("powerpc: Offline CPU in stop_this_cpu()")
now that printk_safe_flush_on_panic() no longer exists?


Yeah, sounds like the logical thing to do but I guess, Nick would be in
a better position to answer this..

Thanks,
Hari


Re: [PASEMI] Nemo board doesn't recognize any ATA disks with the pci-v5.16 updates

2021-11-11 Thread Christian Zigotzky

On 11 November 2021 at 12:24 pm, Marc Zyngier wrote:

On Thu, 11 Nov 2021 10:44:30 +,
Christian Zigotzky  wrote:

On 11 November 2021 at 11:20 am, Marc Zyngier wrote:

On Thu, 11 Nov 2021 07:47:08 +,
Christian Zigotzky  wrote:

On 11 November 2021 at 08:13 am, Marc Zyngier wrote:

On Thu, 11 Nov 2021 05:24:52 +,
Christian Zigotzky  wrote:

Hello Marc,

Here you are:
https://forum.hyperion-entertainment.com/viewtopic.php?p=54406#p54406

This is not what I asked. I need the actual source file, or at the
very least the compiled object (the /sys/firmware/fdt file, for
example). Not an interpretation that I can't feed to the kernel.

Without this, I can't debug your problem.


We are very happy to have the patch for reverting the bad commit
because we want to test the new PASEMI i2c driver with support for the
Apple M1 [1] on our Nemo boards.

You can revert the patch on your own. At this stage, we're not blindly
reverting things in the tree, but instead trying to understand what
is happening on your particular system.

Thanks,

M.


I added a download link for the fdt file to the post [1]. Please read
also Darren's comments in this post.

Am I right in understanding that the upstream kernel does not support
the machine out of the box, and that you actually have to apply out of
tree patches to make it work?

No, you aren't right. The upstream kernel supports the Nemo board out
of the box. [1]

That's not the way I interpret Darren's comments, but you surely know
better than I do.

I'll try to come up with something for you to test shortly.

M.


Great! Thanks a lot for your help!

- Christian


Re: [PATCH v2] powerpc/64s: Get LPID bit width from device tree

2021-11-11 Thread kernel test robot
Hi Nicholas,

I love your patch! Perhaps something to improve:

[auto build test WARNING on powerpc/next]
[also build test WARNING on v5.15 next-2021]
[cannot apply to scottwood/next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Nicholas-Piggin/powerpc-64s-Get-LPID-bit-width-from-device-tree/2020-160038
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc64-buildonly-randconfig-r001-2021 (attached as .config)
compiler: powerpc64le-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# 
https://github.com/0day-ci/linux/commit/b8fe7181506bf3011295456a9ee471ddf49d694d
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review 
Nicholas-Piggin/powerpc-64s-Get-LPID-bit-width-from-device-tree/2020-160038
git checkout b8fe7181506bf3011295456a9ee471ddf49d694d
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross 
ARCH=powerpc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All warnings (new ones prefixed by >>):

   arch/powerpc/mm/init_64.c: In function 'mmu_early_init_devtree':
>> arch/powerpc/mm/init_64.c:472:13: warning: variable 'rc' set but not used 
>> [-Wunused-but-set-variable]
 472 | int rc;
 | ^~


vim +/rc +472 arch/powerpc/mm/init_64.c

   469  
   470  void __init mmu_early_init_devtree(void)
   471  {
 > 472  int rc;

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


.config.gz
Description: application/gzip


Re: [PASEMI] Nemo board doesn't recognize any ATA disks with the pci-v5.16 updates

2021-11-11 Thread Marc Zyngier
On Thu, 11 Nov 2021 10:44:30 +,
Christian Zigotzky  wrote:
> 
> On 11 November 2021 at 11:20 am, Marc Zyngier wrote:
> > On Thu, 11 Nov 2021 07:47:08 +,
> > Christian Zigotzky  wrote:
> >> On 11 November 2021 at 08:13 am, Marc Zyngier wrote:
> >>> On Thu, 11 Nov 2021 05:24:52 +,
> >>> Christian Zigotzky  wrote:
>  Hello Marc,
>  
>  Here you are:
>  https://forum.hyperion-entertainment.com/viewtopic.php?p=54406#p54406
> >>> This is not what I asked. I need the actual source file, or at the
> >>> very least the compiled object (the /sys/firmware/fdt file, for
> >>> example). Not an interpretation that I can't feed to the kernel.
> >>> 
> >>> Without this, I can't debug your problem.
> >>> 
>  We are very happy to have the patch for reverting the bad commit
>  because we want to test the new PASEMI i2c driver with support for the
>  Apple M1 [1] on our Nemo boards.
> >>> You can revert the patch on your own. At this stage, we're not blindly
> >>> reverting things in the tree, but instead trying to understand what
> >>> is happening on your particular system.
> >>> 
> >>> Thanks,
> >>> 
> >>>   M.
> >>> 
> >> I added a download link for the fdt file to the post [1]. Please read
> >> also Darren's comments in this post.
> > 
> > Am I right in understanding that the upstream kernel does not support
> > the machine out of the box, and that you actually have to apply out of
> > tree patches to make it work?
> No, you aren't right. The upstream kernel supports the Nemo board out
> of the box. [1]

That's not the way I interpret Darren's comments, but you surely know
better than I do.

I'll try to come up with something for you to test shortly.

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH v2 2/5] preempt/dynamic: Introduce preempt mode accessors

2021-11-11 Thread Mike Galbraith
On Thu, 2021-11-11 at 10:56 +, Valentin Schneider wrote:
> On 11/11/21 11:32, Mike Galbraith wrote:
> > On Thu, 2021-11-11 at 10:36 +0100, Marco Elver wrote:
> > > I guess the question is if is_preempt_full() should be true also if
> > > is_preempt_rt() is true?
> >
> > That's what CONFIG_PREEMPTION is.  More could follow, but it was added
> > to allow multiple models to say "preemptible".
> >
>
> That's what I was gonna say, but you can have CONFIG_PREEMPTION while being
> is_preempt_none() due to PREEMPT_DYNAMIC...

Ah, right.. this is gonna take some getting used to.

-Mike


Re: [PATCH v2 4/5] kscan: Use preemption model accessors

2021-11-11 Thread Valentin Schneider
On 11/11/21 10:11, Marco Elver wrote:
> Subject s/kscan/kcsan/
>

Woops...

> On Wed, Nov 10, 2021 at 08:24PM +, Valentin Schneider wrote:
>> Per PREEMPT_DYNAMIC, checking CONFIG_PREEMPT doesn't tell you the actual
>> preemption model of the live kernel. Use the newly-introduced accessors
>> instead.
>>
>> Signed-off-by: Valentin Schneider 
>
> Reviewed-by: Marco Elver 
>
> Though it currently doesn't compile as a module due to missing
> EXPORT_SYMBOL of is_preempt*().
>
>> ---
>>  kernel/kcsan/kcsan_test.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/kcsan/kcsan_test.c b/kernel/kcsan/kcsan_test.c
>> index dc55fd5a36fc..14d811eb9a21 100644
>> --- a/kernel/kcsan/kcsan_test.c
>> +++ b/kernel/kcsan/kcsan_test.c
>> @@ -1005,13 +1005,13 @@ static const void *nthreads_gen_params(const void 
>> *prev, char *desc)
>>  else
>>  nthreads *= 2;
>>
>> -if (!IS_ENABLED(CONFIG_PREEMPT) || 
>> !IS_ENABLED(CONFIG_KCSAN_INTERRUPT_WATCHER)) {
>> +if (!is_preempt_full() || !IS_ENABLED(CONFIG_KCSAN_INTERRUPT_WATCHER)) {
>>  /*
>>   * Without any preemption, keep 2 CPUs free for other tasks, one
>>   * of which is the main test case function checking for
>>   * completion or failure.
>>   */
>> -const long min_unused_cpus = IS_ENABLED(CONFIG_PREEMPT_NONE) ? 
>> 2 : 0;
>> +const long min_unused_cpus = is_preempt_none() ? 2 : 0;
>>  const long min_required_cpus = 2 + min_unused_cpus;
>>
>>  if (num_online_cpus() < min_required_cpus) {
>> --
>> 2.25.1


Re: [PATCH v2 2/5] preempt/dynamic: Introduce preempt mode accessors

2021-11-11 Thread Valentin Schneider
On 11/11/21 11:32, Mike Galbraith wrote:
> On Thu, 2021-11-11 at 10:36 +0100, Marco Elver wrote:
>> I guess the question is if is_preempt_full() should be true also if
>> is_preempt_rt() is true?
>
> That's what CONFIG_PREEMPTION is.  More could follow, but it was added
> to allow multiple models to say "preemptible".
>

That's what I was gonna say, but you can have CONFIG_PREEMPTION while being
is_preempt_none() due to PREEMPT_DYNAMIC...

>> Not sure all cases are happy with that, e.g. the kernel/trace/trace.c
>> case, which wants to print the precise preemption level.
>
> Yeah, that's the "annoying" bit, needing one oddball model accessor
> that isn't about a particular model.
>
>> To avoid confusion, I'd introduce another helper that says true if the
>> preemption level is "at least full", currently that'd be "full or rt".
>> Something like is_preempt_full_or_rt() (but might as well write
>> "is_preempt_full() || is_preempt_rt()"), or is_preemption() (to match
>> that Kconfig variable, although it's slightly confusing). The
>> implementation of that helper can just be a static inline function
>> returning "is_preempt_full() || is_preempt_rt()".
>>
>> Would that help?
>
> Yeah, as it sits two accessors are needed, one that says PREEMPT the
> other PREEMPTION, spelling optional.
>

Per the above, I think we need the full || rt thingie.

>   -Mike


Re: [PATCH v2 2/5] preempt/dynamic: Introduce preempt mode accessors

2021-11-11 Thread Valentin Schneider
On 11/11/21 09:54, Marco Elver wrote:
> On Wed, Nov 10, 2021 at 08:24PM +, Valentin Schneider wrote:
> [...]
>> +#ifdef CONFIG_PREEMPT_DYNAMIC
>> +
>> +extern bool is_preempt_none(void);
>> +extern bool is_preempt_voluntary(void);
>> +extern bool is_preempt_full(void);
>> +
>> +#else
>> +
>> +#define is_preempt_none() IS_ENABLED(CONFIG_PREEMPT_NONE)
>> +#define is_preempt_voluntary() IS_ENABLED(CONFIG_PREEMPT_VOLUNTARY)
>> +#define is_preempt_full() IS_ENABLED(CONFIG_PREEMPT)
>> +
>> +#endif
>> +
>> +#define is_preempt_rt() IS_ENABLED(CONFIG_PREEMPT_RT)
>> +
>
> Can these callables be real functions in all configs, making the
> !DYNAMIC ones just static inline bool ones? That'd catch invalid use in
> #if etc. in all configs.
>

Ack

>>  /*
>>   * Does a critical section need to be broken due to another
>>   * task waiting?: (technically does not depend on CONFIG_PREEMPTION,
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 97047aa7b6c2..9db7f77e53c3 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -6638,6 +6638,17 @@ static void __init preempt_dynamic_init(void)
>>  }
>>  }
>>
>> +#define PREEMPT_MODE_ACCESSOR(mode) \
>> +bool is_preempt_##mode(void)
>>  \
>> +{   
>>  \
>> +WARN_ON_ONCE(preempt_dynamic_mode == 
>> preempt_dynamic_undefined); \
>> +return preempt_dynamic_mode == preempt_dynamic_##mode;  
>>  \
>> +}
>
> This needs an EXPORT_SYMBOL, so it's usable from modules like the
> kcsan_test module.

Ah, wasn't sure about that one, thanks!


Re: [PASEMI] Nemo board doesn't recognize any ATA disks with the pci-v5.16 updates

2021-11-11 Thread Christian Zigotzky

On 11 November 2021 at 11:20 am, Marc Zyngier wrote:

On Thu, 11 Nov 2021 07:47:08 +,
Christian Zigotzky  wrote:

On 11 November 2021 at 08:13 am, Marc Zyngier wrote:

On Thu, 11 Nov 2021 05:24:52 +,
Christian Zigotzky  wrote:

Hello Marc,

Here you are:
https://forum.hyperion-entertainment.com/viewtopic.php?p=54406#p54406

This is not what I asked. I need the actual source file, or at the
very least the compiled object (the /sys/firmware/fdt file, for
example). Not an interpretation that I can't feed to the kernel.

Without this, I can't debug your problem.


We are very happy to have the patch for reverting the bad commit
because we want to test the new PASEMI i2c driver with support for the
Apple M1 [1] on our Nemo boards.

You can revert the patch on your own. At this stage, we're not blindly
reverting things in the tree, but instead trying to understand what
is happening on your particular system.

Thanks,

M.


I added a download link for the fdt file to the post [1]. Please read
also Darren's comments in this post.


Am I right in understanding that the upstream kernel does not support
the machine out of the box, and that you actually have to apply out of
tree patches to make it work?
No, you aren't right. The upstream kernel supports the Nemo board out of 
the box. [1]


- Christian

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/?qt=grep&q=Darren+Stevens


Re: [PATCH 11/11] powerpc/smp: Add a doorbell=off kernel parameter

2021-11-11 Thread Michael Ellerman
Cédric Le Goater  writes:
> On processors with a XIVE interrupt controller (POWER9 and above), the
> kernel can use either doorbells or XIVE to generate CPU IPIs. Sending
> doorbell is generally preferred to using the XIVE IC because it is
> faster. There are cases where we want to avoid doorbells and use XIVE
> only, for debug or performance. Only useful on POWER9 and above.

How much do we want this?

Kernel command line args are a bit of a pain, they tend to be poorly
tested, because someone has to explicitly enable them at boot time, and
then reboot to test the other case.

When would we want to enable this? Can we make the kernel smarter about
when to use doorbells and make it automated?

Could we make it a runtime switch?

cheers

>
> Signed-off-by: Cédric Le Goater 
> ---
>  arch/powerpc/include/asm/dbell.h|  1 +
>  arch/powerpc/kernel/dbell.c | 17 +
>  arch/powerpc/platforms/powernv/smp.c|  7 +--
>  arch/powerpc/platforms/pseries/smp.c|  3 +++
>  Documentation/admin-guide/kernel-parameters.txt | 10 ++
>  5 files changed, 36 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/dbell.h 
> b/arch/powerpc/include/asm/dbell.h
> index 3e9da22a2779..07775aa3e81b 100644
> --- a/arch/powerpc/include/asm/dbell.h
> +++ b/arch/powerpc/include/asm/dbell.h
> @@ -90,6 +90,7 @@ static inline void ppc_msgsync(void)
>  #endif /* CONFIG_PPC_BOOK3S */
>  
>  extern void doorbell_exception(struct pt_regs *regs);
> +extern bool doorbell_disabled;
>  
>  static inline void ppc_msgsnd(enum ppc_dbell type, u32 flags, u32 tag)
>  {
> diff --git a/arch/powerpc/kernel/dbell.c b/arch/powerpc/kernel/dbell.c
> index 5545c9cd17c1..681ee4775629 100644
> --- a/arch/powerpc/kernel/dbell.c
> +++ b/arch/powerpc/kernel/dbell.c
> @@ -38,6 +38,23 @@ DEFINE_INTERRUPT_HANDLER_ASYNC(doorbell_exception)
>  
>   set_irq_regs(old_regs);
>  }
> +
> +bool doorbell_disabled;
> +
> +static int __init doorbell_cmdline(char *arg)
> +{
> + if (!arg)
> + return -EINVAL;
> +
> + if (strncmp(arg, "off", 3) == 0) {
> + pr_info("Doorbell disabled on kernel command line\n");
> + doorbell_disabled = true;
> + }
> +
> + return 0;
> +}
> +__setup("doorbell=", doorbell_cmdline);
> +
>  #else /* CONFIG_SMP */
>  DEFINE_INTERRUPT_HANDLER_ASYNC(doorbell_exception)
>  {
> diff --git a/arch/powerpc/platforms/powernv/smp.c 
> b/arch/powerpc/platforms/powernv/smp.c
> index cbb67813cd5d..1311bda9446a 100644
> --- a/arch/powerpc/platforms/powernv/smp.c
> +++ b/arch/powerpc/platforms/powernv/smp.c
> @@ -338,10 +338,13 @@ static void __init pnv_smp_probe(void)
>   ic_cause_ipi = smp_ops->cause_ipi;
>   WARN_ON(!ic_cause_ipi);
>  
> - if (cpu_has_feature(CPU_FTR_ARCH_300))
> + if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> + if (doorbell_disabled)
> + return;
>   smp_ops->cause_ipi = doorbell_global_ipi;
> - else
> + } else {
>   smp_ops->cause_ipi = pnv_cause_ipi;
> + }
>   }
>  }
>  
> diff --git a/arch/powerpc/platforms/pseries/smp.c 
> b/arch/powerpc/platforms/pseries/smp.c
> index f47429323eee..3bc9e6aaf645 100644
> --- a/arch/powerpc/platforms/pseries/smp.c
> +++ b/arch/powerpc/platforms/pseries/smp.c
> @@ -229,6 +229,9 @@ static __init void pSeries_smp_probe(void)
>   return;
>   }
>  
> + if (doorbell_disabled)
> + return;
> +
>   /*
>* Under PowerVM, FSCR[MSGP] is enabled as guest vCPU siblings are
>* gang scheduled on the same physical core, so doorbells are always
> diff --git a/Documentation/admin-guide/kernel-parameters.txt 
> b/Documentation/admin-guide/kernel-parameters.txt
> index 10fa093251e8..2e1284febe39 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -1041,6 +1041,16 @@
>   The filter can be disabled or changed to another
>   driver later using sysfs.
>  
> + doorbell=off[PPC]
> + On processors with a XIVE interrupt controller
> + (POWER9 and above), the kernel can use either
> + doorbells or XIVE to generate CPU IPIs. Sending
> + doorbell is generally preferred to using the XIVE
> + IC because it is faster. There are cases where
> + we want to avoid doorbells and use XIVE only,
> + for debug or performance. Only useful on
> + POWER9 and above.
> +
>   driver_async_probe=  [KNL]
>   List of driver names to be probed asynchronously.
>   Format: ,...
> -- 
> 2.31.1


Re: [PATCH v2 2/5] preempt/dynamic: Introduce preempt mode accessors

2021-11-11 Thread Mike Galbraith
On Thu, 2021-11-11 at 10:36 +0100, Marco Elver wrote:
> On Thu, 11 Nov 2021 at 04:47, Mike Galbraith  wrote:
> >
> > On Thu, 2021-11-11 at 04:35 +0100, Mike Galbraith wrote:
> > > On Thu, 2021-11-11 at 04:16 +0100, Mike Galbraith wrote:
> > > > On Wed, 2021-11-10 at 20:24 +, Valentin Schneider wrote:
> > > > >
> > > > > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > > > > index 5f8db54226af..0640d5622496 100644
> > > > > --- a/include/linux/sched.h
> > > > > +++ b/include/linux/sched.h
> > > > > @@ -2073,6 +2073,22 @@ static inline void cond_resched_rcu(void)
> > > > >  #endif
> > > > >  }
> > > > >
> > > > > +#ifdef CONFIG_PREEMPT_DYNAMIC
> > > > > +
> > > > > +extern bool is_preempt_none(void);
> > > > > +extern bool is_preempt_voluntary(void);
> > > > > +extern bool is_preempt_full(void);
> > > > > +
> > > > > +#else
> > > > > +
> > > > > +#define is_preempt_none() IS_ENABLED(CONFIG_PREEMPT_NONE)
> > > > > +#define is_preempt_voluntary()
> > > > > IS_ENABLED(CONFIG_PREEMPT_VOLUNTARY)
> > > > > +#define is_preempt_full() IS_ENABLED(CONFIG_PREEMPT)
> > > >
> > > > I think that should be IS_ENABLED(CONFIG_PREEMPTION), see
> > > > c1a280b68d4e.
> > > >
> > > > Noticed while applying the series to an RT tree, where tglx
> > > > has done that replacement to the powerpc spot your next patch
> > > > diddles.
> > >
> > > Damn, then comes patch 5 properly differentiating PREEMPT/PREEMPT_RT.
> >
> > So I suppose the powerpc spot should remain CONFIG_PREEMPT and become
> > CONFIG_PREEMPTION when the RT change gets merged, because that spot is
> > about full preemptibility, not a distinct preemption model.
> >
> > That's rather annoying :-/
>
> I guess the question is if is_preempt_full() should be true also if
> is_preempt_rt() is true?

That's what CONFIG_PREEMPTION is.  More could follow, but it was added
to allow multiple models to say "preemptible".

> Not sure all cases are happy with that, e.g. the kernel/trace/trace.c
> case, which wants to print the precise preemption level.

Yeah, that's the "annoying" bit, needing one oddball model accessor
that isn't about a particular model.

> To avoid confusion, I'd introduce another helper that says true if the
> preemption level is "at least full", currently that'd be "full or rt".
> Something like is_preempt_full_or_rt() (but might as well write
> "is_preempt_full() || is_preempt_rt()"), or is_preemption() (to match
> that Kconfig variable, although it's slightly confusing). The
> implementation of that helper can just be a static inline function
> returning "is_preempt_full() || is_preempt_rt()".
>
> Would that help?

Yeah, as it sits two accessors are needed, one that says PREEMPT the
other PREEMPTION, spelling optional.

-Mike


Re: [PATCH v2] powerpc/64s: introduce CONFIG_MAXSMP to test very large SMP

2021-11-11 Thread Michael Ellerman
Christophe Leroy  writes:
> Le 09/11/2021 à 07:51, Nicholas Piggin a écrit :
...
>> diff --git a/arch/powerpc/platforms/Kconfig.cputype 
>> b/arch/powerpc/platforms/Kconfig.cputype
>> index a208997ade88..14c275e0ff93 100644
>> --- a/arch/powerpc/platforms/Kconfig.cputype
>> +++ b/arch/powerpc/platforms/Kconfig.cputype
>> @@ -475,9 +475,14 @@ config SMP
>>   
>>If you don't know what to do here, say N.
>>   
>> +# MAXSMP sets 8192 if COMPILE_TEST because that's what x86 has flushed out.
>> +# Exceeding that will cause a lot of compile errors. Have to deal with those
>> +# first.
>>   config NR_CPUS
>> -int "Maximum number of CPUs (2-8192)" if SMP
>> -range 2 8192 if SMP
>> +int "Maximum number of CPUs (2-8192)" if SMP && !MAXSMP
>> +range 2 16384 if SMP
>> +default 16384 if MAXSMP && !COMPILE_TEST
>> +default 8192 if MAXSMP && COMPILE_TEST
>
> You can do less complex. First hit becomes the default, so you can do:
>
>   default 8192 if MAXSMP && COMPILE_TEST
>   default 16384 if MAXSMP

I did that when applying.

cheers


Re: [PASEMI] Nemo board doesn't recognize any ATA disks with the pci-v5.16 updates

2021-11-11 Thread Marc Zyngier
On Thu, 11 Nov 2021 07:47:08 +,
Christian Zigotzky  wrote:
> 
> On 11 November 2021 at 08:13 am, Marc Zyngier wrote:
> > On Thu, 11 Nov 2021 05:24:52 +,
> > Christian Zigotzky  wrote:
> >> Hello Marc,
> >> 
> >> Here you are:
> >> https://forum.hyperion-entertainment.com/viewtopic.php?p=54406#p54406
> > This is not what I asked. I need the actual source file, or at the
> > very least the compiled object (the /sys/firmware/fdt file, for
> > example). Not an interpretation that I can't feed to the kernel.
> > 
> > Without this, I can't debug your problem.
> > 
> >> We are very happy to have the patch for reverting the bad commit
> >> because we want to test the new PASEMI i2c driver with support for the
> >> Apple M1 [1] on our Nemo boards.
> > You can revert the patch on your own. At this stage, we're not blindly
> > reverting things in the tree, but instead trying to understand what
> > is happening on your particular system.
> > 
> > Thanks,
> > 
> > M.
> > 
> I added a download link for the fdt file to the post [1]. Please read
> also Darren's comments in this post.

Thanks for that. The DT looks absolutely bonkers, no wonder that
things break with something like that.

But Darren's comments made me jump a bit, and I quote them here for
everyone to see:


[...]

The dtb passed by the CFE firmware has a number of issues, which up till
now have been fixed by use of patches applied to the mainline kernel.
This occasionally causes problems with changes made to mainline.

[...]


Am I right in understanding that the upstream kernel does not support
the machine out of the box, and that you actually have to apply out of
tree patches to make it work? That these patches have to do with the
IRQ routing?

If so, I wonder why upstream should revert a patch to work on a system
that isn't supported upstream the first place. I will still try and
come up with a solution for you. But asking for the revert of a patch
on these grounds is not, IMHO, acceptable. Also, please provide these
patches on the list so that I can help you to some extend (and I mean
*on the list*, not on a random forum that collects my information).

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.