from:"Rafael J. Wysocki"

Re: [PATCH] acpi_idle: use raw_safe_halt() from acpi_idle_play_dead()

2023-11-20 Thread Rafael J. Wysocki

On Mon, Nov 20, 2023 at 1:20 PM David Woodhouse  wrote:
>
> On Fri, 2023-10-27 at 21:14 +0200, Peter Zijlstra wrote:
> > On Fri, Oct 27, 2023 at 07:36:51PM +0100, David Woodhouse wrote:
> > > From: David Woodhouse 
> > >
> > > Xen HVM guests were observed taking triple-faults when attempting to
> > > online a previously offlined vCPU.
> > >
> > > Investigation showed that the fault was coming from a failing call
> > > to lockdep_assert_irqs_disabled(), in load_current_idt() which was
> > > too early in the CPU bringup to actually catch the exception and
> > > report the failure cleanly.
> > >
> > > This was a false positive, caused by acpi_idle_play_dead() setting
> > > the per-cpu hardirqs_enabled flag by calling safe_halt(). Switch it
> > > to use raw_safe_halt() instead, which doesn't do so.
> > >
> > > Signed-off-by: David Woodhouse 
> > > ---
> > > We might {also,instead} explicitly set the hardirqs_enabled flag to
> > > zero when bringing up an AP?
> >
> > So I fixed up the idle paths the other day (see all that __cpuidle
> > stuff) but I've not yet gone through the whole hotplug thing :/
> >
> > This seems right, at this point everything, including RCU is very much
> > gone, any instrumentation is undesired.
> >
> > Acked-by: Peter Zijlstra (Intel) 
>
> Ping? Who's taking this?

I'm going to apply it.

> Needs a Cc:sta...@vger.kernel.org now too, to fix 6.6.x.

Sure.

Re: [PATCH v6] ACPI: processor: Fix evaluating _PDC method when running as Xen dom0

2023-03-22 Thread Rafael J. Wysocki

On Wed, Mar 22, 2023 at 12:13 PM Roger Pau Monne  wrote:
>
> In ACPI systems, the OS can direct power management, as opposed to the
> firmware.  This OS-directed Power Management is called OSPM.  Part of
> telling the firmware that the OS going to direct power management is
> making ACPI "_PDC" (Processor Driver Capabilities) calls.  These _PDC
> methods must be evaluated for every processor object.  If these _PDC
> calls are not completed for every processor it can lead to
> inconsistency and later failures in things like the CPU frequency
> driver.
>
> In a Xen system, the dom0 kernel is responsible for system-wide power
> management.  The dom0 kernel is in charge of OSPM.  However, the
> number of CPUs available to dom0 can be different than the number of
> CPUs physically present on the system.
>
> This leads to a problem: the dom0 kernel needs to evaluate _PDC for
> all the processors, but it can't always see them.
>
> In dom0 kernels, ignore the existing ACPI method for determining if a
> processor is physically present because it might not be accurate.
> Instead, ask the hypervisor for this information.
>
> Fix this by introducing a custom function to use when running as Xen
> dom0 in order to check whether a processor object matches a CPU that's
> online.  Such checking is done using the existing information fetched
> by the Xen pCPU subsystem, extending it to also store the ACPI ID.
>
> This ensures that _PDC method gets evaluated for all physically online
> CPUs, regardless of the number of CPUs made available to dom0.
>
> Fixes: 5d554a7bb064 ('ACPI: processor: add internal 
> processor_physically_present()')
> Signed-off-by: Roger Pau Monné 
> ---
> Changes since v5:
>  - Include bug.h header for the inline dummy helper.
>  - Constify pcpu local variable in xen_processor_present().
>
> Changes since v4:
>  - Move definition/declaration of xen_processor_present() to different
>header.
>  - Fold subject edit.
>
> Changes since v3:
>  - Protect xen_processor_present() definition with CONFIG_ACPI.
>
> Changes since v2:
>  - Extend and use the existing pcpu functionality.
>
> Changes since v1:
>  - Reword commit message.
> ---
>  drivers/acpi/processor_pdc.c | 11 +++
>  drivers/xen/pcpu.c   | 20 
>  include/xen/xen.h| 11 +++
>  3 files changed, 42 insertions(+)
>
> diff --git a/drivers/acpi/processor_pdc.c b/drivers/acpi/processor_pdc.c
> index 8c3f82c9fff3..18fb04523f93 100644
> --- a/drivers/acpi/processor_pdc.c
> +++ b/drivers/acpi/processor_pdc.c
> @@ -14,6 +14,8 @@
>  #include 
>  #include 
>
> +#include 
> +
>  #include "internal.h"
>
>  static bool __init processor_physically_present(acpi_handle handle)
> @@ -47,6 +49,15 @@ static bool __init 
> processor_physically_present(acpi_handle handle)
> return false;
> }
>
> +   if (xen_initial_domain())
> +   /*
> +* When running as a Xen dom0 the number of processors Linux
> +* sees can be different from the real number of processors on
> +* the system, and we still need to execute _PDC for all of
> +* them.
> +*/
> +   return xen_processor_present(acpi_id);
> +
> type = (acpi_type == ACPI_TYPE_DEVICE) ? 1 : 0;
> cpuid = acpi_get_cpuid(handle, type, acpi_id);
>
> diff --git a/drivers/xen/pcpu.c b/drivers/xen/pcpu.c
> index fd3a644b0855..b3e3d1bb37f3 100644
> --- a/drivers/xen/pcpu.c
> +++ b/drivers/xen/pcpu.c
> @@ -58,6 +58,7 @@ struct pcpu {
> struct list_head list;
> struct device dev;
> uint32_t cpu_id;
> +   uint32_t acpi_id;
> uint32_t flags;
>  };
>
> @@ -249,6 +250,7 @@ static struct pcpu *create_and_register_pcpu(struct 
> xenpf_pcpuinfo *info)
>
> INIT_LIST_HEAD(>list);
> pcpu->cpu_id = info->xen_cpuid;
> +   pcpu->acpi_id = info->acpi_id;
> pcpu->flags = info->flags;
>
> /* Need hold on xen_pcpu_lock before pcpu list manipulations */
> @@ -381,3 +383,21 @@ static int __init xen_pcpu_init(void)
> return ret;
>  }
>  arch_initcall(xen_pcpu_init);
> +
> +#ifdef CONFIG_ACPI
> +bool __init xen_processor_present(uint32_t acpi_id)
> +{
> +   const struct pcpu *pcpu;
> +   bool online = false;
> +
> +   mutex_lock(_pcpu_lock);
> +   list_for_each_entry(pcpu, _pcpus, list)
> +   if (pcpu->acpi_id == acpi_id) {
> +   online = pcpu->flags & XEN_PCPU_FLAGS_ONLINE;
> +   break;
> +   }
> +   mutex_unlock(_pcpu_lock);
> +
> +   return online;
> +}
> +#endif
> diff --git a/include/xen/xen.h b/include/xen/xen.h
> index 7adf59837c25..0efeb652f9b8 100644
> --- a/include/xen/xen.h
> +++ b/include/xen/xen.h
> @@ -71,4 +71,15 @@ static inline void xen_free_unpopulated_pages(unsigned int 
> nr_pages,
>  }
>  #endif
>
> +#if defined(CONFIG_XEN_DOM0) && defined(CONFIG_ACPI) &&

Re: [PATCH v4] acpi/processor: fix evaluating _PDC method when running as Xen dom0

2023-03-21 Thread Rafael J. Wysocki

On Tue, Mar 21, 2023 at 3:02 PM Roger Pau Monné  wrote:
>
> On Tue, Mar 21, 2023 at 02:47:46PM +0100, Rafael J. Wysocki wrote:
> > On Thu, Mar 16, 2023 at 5:43 PM Roger Pau Monne  
> > wrote:
> > >
> > > In ACPI systems, the OS can direct power management, as opposed to the
> > > firmware.  This OS-directed Power Management is called OSPM.  Part of
> > > telling the firmware that the OS going to direct power management is
> > > making ACPI "_PDC" (Processor Driver Capabilities) calls.  These _PDC
> > > methods must be evaluated for every processor object.  If these _PDC
> > > calls are not completed for every processor it can lead to
> > > inconsistency and later failures in things like the CPU frequency
> > > driver.
> > >
> > > In a Xen system, the dom0 kernel is responsible for system-wide power
> > > management.  The dom0 kernel is in charge of OSPM.  However, the
> > > number of CPUs available to dom0 can be different than the number of
> > > CPUs physically present on the system.
> > >
> > > This leads to a problem: the dom0 kernel needs to evaluate _PDC for
> > > all the processors, but it can't always see them.
> > >
> > > In dom0 kernels, ignore the existing ACPI method for determining if a
> > > processor is physically present because it might not be accurate.
> > > Instead, ask the hypervisor for this information.
> > >
> > > Fix this by introducing a custom function to use when running as Xen
> > > dom0 in order to check whether a processor object matches a CPU that's
> > > online.  Such checking is done using the existing information fetched
> > > by the Xen pCPU subsystem, extending it to also store the ACPI ID.
> > >
> > > This ensures that _PDC method gets evaluated for all physically online
> > > CPUs, regardless of the number of CPUs made available to dom0.
> > >
> > > Fixes: 5d554a7bb064 ('ACPI: processor: add internal 
> > > processor_physically_present()')
> > > Signed-off-by: Roger Pau Monné 
> > > ---
> > > Changes since v3:
> > >  - Protect xen_processor_present() definition with CONFIG_ACPI.
> > >
> > > Changes since v2:
> > >  - Extend and use the existing pcpu functionality.
> > >
> > > Changes since v1:
> > >  - Reword commit message.
> > > ---
> > >  arch/x86/include/asm/xen/hypervisor.h | 10 ++
> > >  drivers/acpi/processor_pdc.c  | 11 +++
> > >  drivers/xen/pcpu.c| 21 +
> > >  3 files changed, 42 insertions(+)
> > >
> > > diff --git a/arch/x86/include/asm/xen/hypervisor.h 
> > > b/arch/x86/include/asm/xen/hypervisor.h
> > > index 5fc35f889cd1..990a1609677e 100644
> > > --- a/arch/x86/include/asm/xen/hypervisor.h
> > > +++ b/arch/x86/include/asm/xen/hypervisor.h
> > > @@ -63,4 +63,14 @@ void __init xen_pvh_init(struct boot_params 
> > > *boot_params);
> > >  void __init mem_map_via_hcall(struct boot_params *boot_params_p);
> > >  #endif
> > >
> > > +#if defined(CONFIG_XEN_DOM0) && defined(CONFIG_ACPI)
> > > +bool __init xen_processor_present(uint32_t acpi_id);
> > > +#else
> > > +static inline bool xen_processor_present(uint32_t acpi_id)
> > > +{
> > > +   BUG();
> > > +   return false;
> > > +}
> > > +#endif
> > > +
> > >  #endif /* _ASM_X86_XEN_HYPERVISOR_H */
> > > diff --git a/drivers/acpi/processor_pdc.c b/drivers/acpi/processor_pdc.c
> > > index 8c3f82c9fff3..18fb04523f93 100644
> > > --- a/drivers/acpi/processor_pdc.c
> > > +++ b/drivers/acpi/processor_pdc.c
> > > @@ -14,6 +14,8 @@
> > >  #include 
> > >  #include 
> > >
> > > +#include 
> >
> > This along with the definition above is evidently insufficient for
> > xen_processor_present() to always be defined.  See
> > https://lore.kernel.org/linux-acpi/64198b60.bo+m9o5w+hd8hcf3%25...@intel.com/T/#u
> > for example.
> >
> > I'm dropping the patch now, please fix and resend.
>
> Hello,
>
> Sorry.  I've sent a followup fix:
>
> https://lore.kernel.org/xen-devel/20230321112522.46806-1-roger@citrix.com/T/#u
>
> Would you be fine with taking such followup, or would rather prefer
> for me to send the original fixed patch as v5?

Please fold the fix into the original patch and resend.

Re: [PATCH v4] acpi/processor: fix evaluating _PDC method when running as Xen dom0

2023-03-21 Thread Rafael J. Wysocki

On Thu, Mar 16, 2023 at 5:43 PM Roger Pau Monne  wrote:
>
> In ACPI systems, the OS can direct power management, as opposed to the
> firmware.  This OS-directed Power Management is called OSPM.  Part of
> telling the firmware that the OS going to direct power management is
> making ACPI "_PDC" (Processor Driver Capabilities) calls.  These _PDC
> methods must be evaluated for every processor object.  If these _PDC
> calls are not completed for every processor it can lead to
> inconsistency and later failures in things like the CPU frequency
> driver.
>
> In a Xen system, the dom0 kernel is responsible for system-wide power
> management.  The dom0 kernel is in charge of OSPM.  However, the
> number of CPUs available to dom0 can be different than the number of
> CPUs physically present on the system.
>
> This leads to a problem: the dom0 kernel needs to evaluate _PDC for
> all the processors, but it can't always see them.
>
> In dom0 kernels, ignore the existing ACPI method for determining if a
> processor is physically present because it might not be accurate.
> Instead, ask the hypervisor for this information.
>
> Fix this by introducing a custom function to use when running as Xen
> dom0 in order to check whether a processor object matches a CPU that's
> online.  Such checking is done using the existing information fetched
> by the Xen pCPU subsystem, extending it to also store the ACPI ID.
>
> This ensures that _PDC method gets evaluated for all physically online
> CPUs, regardless of the number of CPUs made available to dom0.
>
> Fixes: 5d554a7bb064 ('ACPI: processor: add internal 
> processor_physically_present()')
> Signed-off-by: Roger Pau Monné 
> ---
> Changes since v3:
>  - Protect xen_processor_present() definition with CONFIG_ACPI.
>
> Changes since v2:
>  - Extend and use the existing pcpu functionality.
>
> Changes since v1:
>  - Reword commit message.
> ---
>  arch/x86/include/asm/xen/hypervisor.h | 10 ++
>  drivers/acpi/processor_pdc.c  | 11 +++
>  drivers/xen/pcpu.c| 21 +
>  3 files changed, 42 insertions(+)
>
> diff --git a/arch/x86/include/asm/xen/hypervisor.h 
> b/arch/x86/include/asm/xen/hypervisor.h
> index 5fc35f889cd1..990a1609677e 100644
> --- a/arch/x86/include/asm/xen/hypervisor.h
> +++ b/arch/x86/include/asm/xen/hypervisor.h
> @@ -63,4 +63,14 @@ void __init xen_pvh_init(struct boot_params *boot_params);
>  void __init mem_map_via_hcall(struct boot_params *boot_params_p);
>  #endif
>
> +#if defined(CONFIG_XEN_DOM0) && defined(CONFIG_ACPI)
> +bool __init xen_processor_present(uint32_t acpi_id);
> +#else
> +static inline bool xen_processor_present(uint32_t acpi_id)
> +{
> +   BUG();
> +   return false;
> +}
> +#endif
> +
>  #endif /* _ASM_X86_XEN_HYPERVISOR_H */
> diff --git a/drivers/acpi/processor_pdc.c b/drivers/acpi/processor_pdc.c
> index 8c3f82c9fff3..18fb04523f93 100644
> --- a/drivers/acpi/processor_pdc.c
> +++ b/drivers/acpi/processor_pdc.c
> @@ -14,6 +14,8 @@
>  #include 
>  #include 
>
> +#include 

This along with the definition above is evidently insufficient for
xen_processor_present() to always be defined.  See
https://lore.kernel.org/linux-acpi/64198b60.bo+m9o5w+hd8hcf3%25...@intel.com/T/#u
for example.

I'm dropping the patch now, please fix and resend.

> +
>  #include "internal.h"
>
>  static bool __init processor_physically_present(acpi_handle handle)
> @@ -47,6 +49,15 @@ static bool __init 
> processor_physically_present(acpi_handle handle)
> return false;
> }
>
> +   if (xen_initial_domain())
> +   /*
> +* When running as a Xen dom0 the number of processors Linux
> +* sees can be different from the real number of processors on
> +* the system, and we still need to execute _PDC for all of
> +* them.
> +*/
> +   return xen_processor_present(acpi_id);
> +
> type = (acpi_type == ACPI_TYPE_DEVICE) ? 1 : 0;
> cpuid = acpi_get_cpuid(handle, type, acpi_id);
>
> diff --git a/drivers/xen/pcpu.c b/drivers/xen/pcpu.c
> index fd3a644b0855..034d05e56507 100644
> --- a/drivers/xen/pcpu.c
> +++ b/drivers/xen/pcpu.c
> @@ -58,6 +58,7 @@ struct pcpu {
> struct list_head list;
> struct device dev;
> uint32_t cpu_id;
> +   uint32_t acpi_id;
> uint32_t flags;
>  };
>
> @@ -249,6 +250,7 @@ static struct pcpu *create_and_register_pcpu(struct 
> xenpf_pcpuinfo *info)
>
> INIT_LIST_HEAD(>list);
> pcpu->cpu_id = info->xen_cpuid;
> +   pcpu->acpi_id = info->acpi_id;
> pcpu->flags = info->flags;
>
> /* Need hold on xen_pcpu_lock before pcpu list manipulations */
> @@ -381,3 +383,22 @@ static int __init xen_pcpu_init(void)
> return ret;
>  }
>  arch_initcall(xen_pcpu_init);
> +
> +#ifdef CONFIG_ACPI
> +bool __init xen_processor_present(uint32_t acpi_id)
> +{
> +   struct pcpu *pcpu;
> +

Re: [PATCH v4] acpi/processor: fix evaluating _PDC method when running as Xen dom0

2023-03-20 Thread Rafael J. Wysocki

On Fri, Mar 17, 2023 at 1:42 PM Juergen Gross  wrote:
>
> On 16.03.23 17:42, Roger Pau Monne wrote:
> > In ACPI systems, the OS can direct power management, as opposed to the
> > firmware.  This OS-directed Power Management is called OSPM.  Part of
> > telling the firmware that the OS going to direct power management is
> > making ACPI "_PDC" (Processor Driver Capabilities) calls.  These _PDC
> > methods must be evaluated for every processor object.  If these _PDC
> > calls are not completed for every processor it can lead to
> > inconsistency and later failures in things like the CPU frequency
> > driver.
> >
> > In a Xen system, the dom0 kernel is responsible for system-wide power
> > management.  The dom0 kernel is in charge of OSPM.  However, the
> > number of CPUs available to dom0 can be different than the number of
> > CPUs physically present on the system.
> >
> > This leads to a problem: the dom0 kernel needs to evaluate _PDC for
> > all the processors, but it can't always see them.
> >
> > In dom0 kernels, ignore the existing ACPI method for determining if a
> > processor is physically present because it might not be accurate.
> > Instead, ask the hypervisor for this information.
> >
> > Fix this by introducing a custom function to use when running as Xen
> > dom0 in order to check whether a processor object matches a CPU that's
> > online.  Such checking is done using the existing information fetched
> > by the Xen pCPU subsystem, extending it to also store the ACPI ID.
> >
> > This ensures that _PDC method gets evaluated for all physically online
> > CPUs, regardless of the number of CPUs made available to dom0.
> >
> > Fixes: 5d554a7bb064 ('ACPI: processor: add internal 
> > processor_physically_present()')
> > Signed-off-by: Roger Pau Monné 
>
> Reviewed-by: Juergen Gross 

Applied as 6.4 material under a slightly edited subject, thanks!

Re: [PATCH] x86/acpi: fix suspend with Xen

2023-01-17 Thread Rafael J. Wysocki

On Tue, Jan 17, 2023 at 4:32 PM Juergen Gross  wrote:
>
> On 17.01.23 15:09, Rafael J. Wysocki wrote:
> > On Mon, Jan 16, 2023 at 7:45 AM Juergen Gross  wrote:
> >>
> >> On 13.01.23 20:40, Rafael J. Wysocki wrote:
> >>> On Fri, Jan 13, 2023 at 3:06 PM Juergen Gross  wrote:
> >>>>
> >>>> Commit f1e525009493 ("x86/boot: Skip realmode init code when running as
> >>>> Xen PV guest") missed one code path accessing real_mode_header, leading
> >>>> to dereferencing NULL when suspending the system under Xen:
> >>>>
> >>>>   [  348.284004] PM: suspend entry (deep)
> >>>>   [  348.289532] Filesystems sync: 0.005 seconds
> >>>>   [  348.291545] Freezing user space processes ... (elapsed 0.000 
> >>>> seconds) done.
> >>>>   [  348.292457] OOM killer disabled.
> >>>>   [  348.292462] Freezing remaining freezable tasks ... (elapsed 
> >>>> 0.104 seconds) done.
> >>>>   [  348.396612] printk: Suspending console(s) (use 
> >>>> no_console_suspend to debug)
> >>>>   [  348.749228] PM: suspend devices took 0.352 seconds
> >>>>   [  348.769713] ACPI: EC: interrupt blocked
> >>>>   [  348.816077] BUG: kernel NULL pointer dereference, address: 
> >>>> 001c
> >>>>   [  348.816080] #PF: supervisor read access in kernel mode
> >>>>   [  348.816081] #PF: error_code(0x) - not-present page
> >>>>   [  348.816083] PGD 0 P4D 0
> >>>>   [  348.816086] Oops:  [#1] PREEMPT SMP NOPTI
> >>>>   [  348.816089] CPU: 0 PID: 6764 Comm: systemd-sleep Not tainted 
> >>>> 6.1.3-1.fc32.qubes.x86_64 #1
> >>>>   [  348.816092] Hardware name: Star Labs StarBook/StarBook, BIOS 
> >>>> 8.01 07/03/2022
> >>>>   [  348.816093] RIP: e030:acpi_get_wakeup_address+0xc/0x20
> >>>>
> >>>> Fix that by adding an indirection for acpi_get_wakeup_address() which
> >>>> Xen PV dom0 can use to return a dummy non-zero wakeup address (this
> >>>> address won't ever be used, as the real suspend handling is done by the
> >>>> hypervisor).
> >>>
> >>> How exactly does this help?
> >>
> >> I believed the first sentence of the commit message would make this
> >> clear enough.
> >
> > That was clear, but the fix part wasn't really.
> >
> >> I can expand the commit message to go more into detail if you think
> >> this is really needed.
> >
> > IMO calling acpi_set_waking_vector() with a known-invalid wakeup
> > vector address in dom0 is plain confusing.
> >
> > I'm not sure what to do about it yet, but IMV something needs to be done.
>
> Another possibility would be to modify acpi_sleep_prepare(), e.g. like the
> attached patch (compile tested only).

I prefer this to the previous version.  It is much more straightforward IMV.

Re: [PATCH] x86/acpi: fix suspend with Xen

2023-01-17 Thread Rafael J. Wysocki

On Mon, Jan 16, 2023 at 7:45 AM Juergen Gross  wrote:
>
> On 13.01.23 20:40, Rafael J. Wysocki wrote:
> > On Fri, Jan 13, 2023 at 3:06 PM Juergen Gross  wrote:
> >>
> >> Commit f1e525009493 ("x86/boot: Skip realmode init code when running as
> >> Xen PV guest") missed one code path accessing real_mode_header, leading
> >> to dereferencing NULL when suspending the system under Xen:
> >>
> >>  [  348.284004] PM: suspend entry (deep)
> >>  [  348.289532] Filesystems sync: 0.005 seconds
> >>  [  348.291545] Freezing user space processes ... (elapsed 0.000 
> >> seconds) done.
> >>  [  348.292457] OOM killer disabled.
> >>  [  348.292462] Freezing remaining freezable tasks ... (elapsed 0.104 
> >> seconds) done.
> >>  [  348.396612] printk: Suspending console(s) (use no_console_suspend 
> >> to debug)
> >>  [  348.749228] PM: suspend devices took 0.352 seconds
> >>  [  348.769713] ACPI: EC: interrupt blocked
> >>  [  348.816077] BUG: kernel NULL pointer dereference, address: 
> >> 001c
> >>  [  348.816080] #PF: supervisor read access in kernel mode
> >>  [  348.816081] #PF: error_code(0x) - not-present page
> >>  [  348.816083] PGD 0 P4D 0
> >>  [  348.816086] Oops:  [#1] PREEMPT SMP NOPTI
> >>  [  348.816089] CPU: 0 PID: 6764 Comm: systemd-sleep Not tainted 
> >> 6.1.3-1.fc32.qubes.x86_64 #1
> >>  [  348.816092] Hardware name: Star Labs StarBook/StarBook, BIOS 8.01 
> >> 07/03/2022
> >>  [  348.816093] RIP: e030:acpi_get_wakeup_address+0xc/0x20
> >>
> >> Fix that by adding an indirection for acpi_get_wakeup_address() which
> >> Xen PV dom0 can use to return a dummy non-zero wakeup address (this
> >> address won't ever be used, as the real suspend handling is done by the
> >> hypervisor).
> >
> > How exactly does this help?
>
> I believed the first sentence of the commit message would make this
> clear enough.

That was clear, but the fix part wasn't really.

> I can expand the commit message to go more into detail if you think
> this is really needed.

IMO calling acpi_set_waking_vector() with a known-invalid wakeup
vector address in dom0 is plain confusing.

I'm not sure what to do about it yet, but IMV something needs to be done.

Re: [PATCH] x86/acpi: fix suspend with Xen

2023-01-13 Thread Rafael J. Wysocki

On Fri, Jan 13, 2023 at 3:06 PM Juergen Gross  wrote:
>
> Commit f1e525009493 ("x86/boot: Skip realmode init code when running as
> Xen PV guest") missed one code path accessing real_mode_header, leading
> to dereferencing NULL when suspending the system under Xen:
>
> [  348.284004] PM: suspend entry (deep)
> [  348.289532] Filesystems sync: 0.005 seconds
> [  348.291545] Freezing user space processes ... (elapsed 0.000 seconds) 
> done.
> [  348.292457] OOM killer disabled.
> [  348.292462] Freezing remaining freezable tasks ... (elapsed 0.104 
> seconds) done.
> [  348.396612] printk: Suspending console(s) (use no_console_suspend to 
> debug)
> [  348.749228] PM: suspend devices took 0.352 seconds
> [  348.769713] ACPI: EC: interrupt blocked
> [  348.816077] BUG: kernel NULL pointer dereference, address: 
> 001c
> [  348.816080] #PF: supervisor read access in kernel mode
> [  348.816081] #PF: error_code(0x) - not-present page
> [  348.816083] PGD 0 P4D 0
> [  348.816086] Oops:  [#1] PREEMPT SMP NOPTI
> [  348.816089] CPU: 0 PID: 6764 Comm: systemd-sleep Not tainted 
> 6.1.3-1.fc32.qubes.x86_64 #1
> [  348.816092] Hardware name: Star Labs StarBook/StarBook, BIOS 8.01 
> 07/03/2022
> [  348.816093] RIP: e030:acpi_get_wakeup_address+0xc/0x20
>
> Fix that by adding an indirection for acpi_get_wakeup_address() which
> Xen PV dom0 can use to return a dummy non-zero wakeup address (this
> address won't ever be used, as the real suspend handling is done by the
> hypervisor).

How exactly does this help?

> Fixes: f1e525009493 ("x86/boot: Skip realmode init code when running as Xen 
> PV guest")
> Reported-by: Marek Marczykowski-Górecki 
> Signed-off-by: Juergen Gross 
> ---
>  arch/x86/include/asm/acpi.h  | 2 +-
>  arch/x86/kernel/acpi/sleep.c | 3 ++-
>  include/xen/acpi.h   | 9 +
>  3 files changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/include/asm/acpi.h b/arch/x86/include/asm/acpi.h
> index 65064d9f7fa6..137259ff8f03 100644
> --- a/arch/x86/include/asm/acpi.h
> +++ b/arch/x86/include/asm/acpi.h
> @@ -61,7 +61,7 @@ static inline void acpi_disable_pci(void)
>  extern int (*acpi_suspend_lowlevel)(void);
>
>  /* Physical address to resume after wakeup */
> -unsigned long acpi_get_wakeup_address(void);
> +extern unsigned long (*acpi_get_wakeup_address)(void);
>
>  /*
>   * Check if the CPU can handle C2 and deeper
> diff --git a/arch/x86/kernel/acpi/sleep.c b/arch/x86/kernel/acpi/sleep.c
> index 3b7f4cdbf2e0..1a3cd5e24cd0 100644
> --- a/arch/x86/kernel/acpi/sleep.c
> +++ b/arch/x86/kernel/acpi/sleep.c
> @@ -33,10 +33,11 @@ static char temp_stack[4096];
>   * Returns the physical address where the kernel should be resumed after the
>   * system awakes from S3, e.g. for programming into the firmware waking 
> vector.
>   */
> -unsigned long acpi_get_wakeup_address(void)
> +static unsigned long x86_acpi_get_wakeup_address(void)
>  {
> return ((unsigned long)(real_mode_header->wakeup_start));
>  }
> +unsigned long (*acpi_get_wakeup_address)(void) = x86_acpi_get_wakeup_address;
>
>  /**
>   * x86_acpi_enter_sleep_state - enter sleep state
> diff --git a/include/xen/acpi.h b/include/xen/acpi.h
> index b1e11863144d..7e1e5dbfb77c 100644
> --- a/include/xen/acpi.h
> +++ b/include/xen/acpi.h
> @@ -56,6 +56,12 @@ static inline int xen_acpi_suspend_lowlevel(void)
> return 0;
>  }
>
> +static inline unsigned long xen_acpi_get_wakeup_address(void)
> +{
> +   /* Just return a dummy non-zero value, it will never be used. */
> +   return 1;
> +}
> +
>  static inline void xen_acpi_sleep_register(void)
>  {
> if (xen_initial_domain()) {
> @@ -65,6 +71,9 @@ static inline void xen_acpi_sleep_register(void)
> _acpi_notify_hypervisor_extended_sleep);
>
> acpi_suspend_lowlevel = xen_acpi_suspend_lowlevel;
> +#ifdef CONFIG_ACPI_SLEEP
> +   acpi_get_wakeup_address = xen_acpi_get_wakeup_address;
> +#endif
> }
>  }
>  #else
> --
> 2.35.3
>

Re: [PATCH v8 2/3] freezer: refactor pm_freezing into a function.

2022-12-02 Thread Rafael J. Wysocki

On Thu, Dec 1, 2022 at 12:08 PM Ricardo Ribalda  wrote:
>
> Add a way to let the drivers know if the processes are frozen.
>
> This is needed by drivers that are waiting for processes to end on their
> shutdown path.
>
> Convert pm_freezing into a function and export it, so it can be used by
> drivers that are either built-in or modules.
>
> Cc: sta...@vger.kernel.org
> Fixes: 83bfc7e793b5 ("ASoC: SOF: core: unregister clients and machine drivers 
> in .shutdown")
> Signed-off-by: Ricardo Ribalda 

Why can't you export the original pm_freezing variable and why is this
fixing anything?

> ---
>  include/linux/freezer.h |  3 ++-
>  kernel/freezer.c|  3 +--
>  kernel/power/process.c  | 24 
>  3 files changed, 23 insertions(+), 7 deletions(-)
>
> diff --git a/include/linux/freezer.h b/include/linux/freezer.h
> index b303472255be..3413c869d68b 100644
> --- a/include/linux/freezer.h
> +++ b/include/linux/freezer.h
> @@ -13,7 +13,7 @@
>  #ifdef CONFIG_FREEZER
>  DECLARE_STATIC_KEY_FALSE(freezer_active);
>
> -extern bool pm_freezing;   /* PM freezing in effect */
> +bool pm_freezing(void);
>  extern bool pm_nosig_freezing; /* PM nosig freezing in effect */
>
>  /*
> @@ -80,6 +80,7 @@ static inline int freeze_processes(void) { return -ENOSYS; }
>  static inline int freeze_kernel_threads(void) { return -ENOSYS; }
>  static inline void thaw_processes(void) {}
>  static inline void thaw_kernel_threads(void) {}
> +static inline bool pm_freezing(void) { return false; }
>
>  static inline bool try_to_freeze(void) { return false; }
>
> diff --git a/kernel/freezer.c b/kernel/freezer.c
> index 4fad0e6fca64..2d3530ebdb7e 100644
> --- a/kernel/freezer.c
> +++ b/kernel/freezer.c
> @@ -20,7 +20,6 @@ EXPORT_SYMBOL(freezer_active);
>   * indicate whether PM freezing is in effect, protected by
>   * system_transition_mutex
>   */
> -bool pm_freezing;
>  bool pm_nosig_freezing;
>
>  /* protects freezing and frozen transitions */
> @@ -46,7 +45,7 @@ bool freezing_slow_path(struct task_struct *p)
> if (pm_nosig_freezing || cgroup_freezing(p))
> return true;
>
> -   if (pm_freezing && !(p->flags & PF_KTHREAD))
> +   if (pm_freezing() && !(p->flags & PF_KTHREAD))
> return true;
>
> return false;
> diff --git a/kernel/power/process.c b/kernel/power/process.c
> index ddd9988327fe..8a4d0e2c8c20 100644
> --- a/kernel/power/process.c
> +++ b/kernel/power/process.c
> @@ -108,6 +108,22 @@ static int try_to_freeze_tasks(bool user_only)
> return todo ? -EBUSY : 0;
>  }
>
> +/*
> + * Indicate whether PM freezing is in effect, protected by
> + * system_transition_mutex.
> + */
> +static bool pm_freezing_internal;
> +
> +/**
> + * pm_freezing - indicate whether PM freezing is in effect.
> + *
> + */
> +bool pm_freezing(void)
> +{
> +   return pm_freezing_internal;
> +}
> +EXPORT_SYMBOL(pm_freezing);

Use EXPORT_SYMBOL_GPL() instead, please.

> +
>  /**
>   * freeze_processes - Signal user space processes to enter the refrigerator.
>   * The current thread will not be frozen.  The same process that calls
> @@ -126,12 +142,12 @@ int freeze_processes(void)
> /* Make sure this task doesn't get frozen */
> current->flags |= PF_SUSPEND_TASK;
>
> -   if (!pm_freezing)
> +   if (!pm_freezing())
> static_branch_inc(_active);
>
> pm_wakeup_clear(0);
> pr_info("Freezing user space processes ... ");
> -   pm_freezing = true;
> +   pm_freezing_internal = true;
> error = try_to_freeze_tasks(true);
> if (!error) {
> __usermodehelper_set_disable_depth(UMH_DISABLED);
> @@ -187,9 +203,9 @@ void thaw_processes(void)
> struct task_struct *curr = current;
>
> trace_suspend_resume(TPS("thaw_processes"), 0, true);
> -   if (pm_freezing)
> +   if (pm_freezing())
> static_branch_dec(_active);
> -   pm_freezing = false;
> +   pm_freezing_internal = false;
> pm_nosig_freezing = false;
>
> oom_killer_enable();
>
> --

Re: [PATCH 1/3] acpi/processor: fix evaluating _PDC method when running as Xen dom0

2022-12-02 Thread Rafael J. Wysocki

On Fri, Dec 2, 2022 at 5:37 PM Roger Pau Monné  wrote:
>
> On Fri, Dec 02, 2022 at 08:17:56AM -0800, Dave Hansen wrote:
> > On 12/2/22 04:24, Roger Pau Monné wrote:
> > > On the implementation side, is the proposed approach acceptable?
> > > Mostly asking because it adds Xen conditionals to otherwise generic
> > > ACPI code.
> >
> > That's a good Rafael question.

Sorry for joining late, but first off _PDC has been deprecated since
ACPI 3.0 (2004) and it is not even present in ACPI 6.5 any more.

It follows from your description that _PDC is still used in the field,
though, after 18 years of deprecation.  Who uses it, if I may know?

> > But, how do other places in the ACPI code handle things like this?
>
> Hm, I don't know of other places in the Xen case, the only resource
> in ACPI AML tables managed by Xen are Processor objects/devices AFAIK.
> The rest of devices are fully managed by the dom0 guest.
>
> I think such special handling is very specific to Xen, but maybe I'm
> wrong and there are similar existing cases in ACPI code already.
>
> We could add some kind of hook (iow: a function pointer in some struct
> that could be filled on a implementation basis?) but I didn't want
> overengineering this if adding a conditional was deemed OK.

What _PDC capabilities specifically do you need to pass to the
firmware for things to work correctly?

What platforms are affected?

Re: [PATCH 04/36] cpuidle,intel_idle: Fix CPUIDLE_FLAG_IRQ_ENABLE

2022-07-30 Thread Rafael J. Wysocki

On Sat, Jul 30, 2022 at 11:48 AM Michel Lespinasse
 wrote:
>
> On Fri, Jul 29, 2022 at 04:59:50PM +0200, Rafael J. Wysocki wrote:
> > On Fri, Jul 29, 2022 at 12:25 PM Michel Lespinasse
> >  wrote:
> > >
> > > On Thu, Jul 28, 2022 at 10:20:53AM -0700, Paul E. McKenney wrote:
> > > > On Mon, Jul 25, 2022 at 12:43:06PM -0700, Michel Lespinasse wrote:
> > > > > On Wed, Jun 08, 2022 at 04:27:27PM +0200, Peter Zijlstra wrote:
> > > > > > Commit c227233ad64c ("intel_idle: enable interrupts before C1 on
> > > > > > Xeons") wrecked intel_idle in two ways:
> > > > > >
> > > > > >  - must not have tracing in idle functions
> > > > > >  - must return with IRQs disabled
> > > > > >
> > > > > > Additionally, it added a branch for no good reason.
> > > > > >
> > > > > > Fixes: c227233ad64c ("intel_idle: enable interrupts before C1 on 
> > > > > > Xeons")
> > > > > > Signed-off-by: Peter Zijlstra (Intel) 
> > > > >
> > > > > After this change was introduced, I am seeing "WARNING: suspicious RCU
> > > > > usage" when booting a kernel with debug options compiled in. Please
> > > > > see the attached dmesg output. The issue starts with commit 
> > > > > 32d4fd5751ea
> > > > > and is still present in v5.19-rc8.
> > > > >
> > > > > I'm not sure, is this too late to fix or revert in v5.19 final ?
> > > >
> > > > I finally got a chance to take a quick look at this.
> > > >
> > > > The rcu_eqs_exit() function is making a lockdep complaint about
> > > > being invoked with interrupts enabled.  This function is called from
> > > > rcu_idle_exit(), which is an expected code path from 
> > > > cpuidle_enter_state()
> > > > via its call to rcu_idle_exit().  Except that rcu_idle_exit() disables
> > > > interrupts before invoking rcu_eqs_exit().
> > > >
> > > > The only other call to rcu_idle_exit() does not disable interrupts,
> > > > but it is via rcu_user_exit(), which would be a very odd choice for
> > > > cpuidle_enter_state().
> > > >
> > > > It seems unlikely, but it might be that it is the use of 
> > > > local_irq_save()
> > > > instead of raw_local_irq_save() within rcu_idle_exit() that is causing
> > > > the trouble.  If this is the case, then the commit shown below would
> > > > help.  Note that this commit removes the warning from lockdep, so it
> > > > is necessary to build the kernel with CONFIG_RCU_EQS_DEBUG=y to enable
> > > > equivalent debugging.
> > > >
> > > > Could you please try your test with the -rce commit shown below applied?
> > >
> > > Thanks for looking into it.
> > >
> > > After checking out Peter's commit 32d4fd5751ea,
> > > cherry picking your commit ed4ae5eff4b3,
> > > and setting CONFIG_RCU_EQS_DEBUG=y in addition of my usual debug config,
> > > I am now seeing this a few seconds into the boot:
> > >
> > > [3.010650] [ cut here ]
> > > [3.010651] WARNING: CPU: 0 PID: 0 at kernel/sched/clock.c:397 
> > > sched_clock_tick+0x27/0x60
> > > [3.010657] Modules linked in:
> > > [3.010660] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 
> > > 5.19.0-rc1-test-5-g1be22fea0611 #1
> > > [3.010662] Hardware name: LENOVO 30BFS44D00/1036, BIOS S03KT51A 
> > > 01/17/2022
> > > [3.010663] RIP: 0010:sched_clock_tick+0x27/0x60
> > > [3.010665] Code: 1f 40 00 53 eb 02 5b c3 66 90 8b 05 2f c3 40 01 85 
> > > c0 74 18 65 8b 05 60 88 8f 4e 85 c0 75 0d 65 8b 05 a9 85 8f 4e 85 c0 74 
> > > 02 <0f> 0b e8 e2 6c 89 00 48 c7 c3 40 d5 02 00
> > >  89 c0 48 03 1c c5 c0 98
> > > [3.010667] RSP: :b2803e28 EFLAGS: 00010002
> > > [3.010670] RAX: 0001 RBX: c8ce7fa07060 RCX: 
> > > 0001
> > > [3.010671] RDX:  RSI: b268dd21 RDI: 
> > > b269ab13
> > > [3.010673] RBP: 0001 R08: ffc300d5 R09: 
> > > 0002be80
> > > [3.010674] R10: 03625b53183a R11: a012b802b7a4 R12: 
> > > b2aa9e80
> > > [3.010675] R13: b2aa9e00 R14: 0001 R15: 
> > > 
> > >

Re: [PATCH 04/36] cpuidle,intel_idle: Fix CPUIDLE_FLAG_IRQ_ENABLE

2022-07-29 Thread Rafael J. Wysocki

On Fri, Jul 29, 2022 at 12:25 PM Michel Lespinasse
 wrote:
>
> On Thu, Jul 28, 2022 at 10:20:53AM -0700, Paul E. McKenney wrote:
> > On Mon, Jul 25, 2022 at 12:43:06PM -0700, Michel Lespinasse wrote:
> > > On Wed, Jun 08, 2022 at 04:27:27PM +0200, Peter Zijlstra wrote:
> > > > Commit c227233ad64c ("intel_idle: enable interrupts before C1 on
> > > > Xeons") wrecked intel_idle in two ways:
> > > >
> > > >  - must not have tracing in idle functions
> > > >  - must return with IRQs disabled
> > > >
> > > > Additionally, it added a branch for no good reason.
> > > >
> > > > Fixes: c227233ad64c ("intel_idle: enable interrupts before C1 on Xeons")
> > > > Signed-off-by: Peter Zijlstra (Intel) 
> > >
> > > After this change was introduced, I am seeing "WARNING: suspicious RCU
> > > usage" when booting a kernel with debug options compiled in. Please
> > > see the attached dmesg output. The issue starts with commit 32d4fd5751ea
> > > and is still present in v5.19-rc8.
> > >
> > > I'm not sure, is this too late to fix or revert in v5.19 final ?
> >
> > I finally got a chance to take a quick look at this.
> >
> > The rcu_eqs_exit() function is making a lockdep complaint about
> > being invoked with interrupts enabled.  This function is called from
> > rcu_idle_exit(), which is an expected code path from cpuidle_enter_state()
> > via its call to rcu_idle_exit().  Except that rcu_idle_exit() disables
> > interrupts before invoking rcu_eqs_exit().
> >
> > The only other call to rcu_idle_exit() does not disable interrupts,
> > but it is via rcu_user_exit(), which would be a very odd choice for
> > cpuidle_enter_state().
> >
> > It seems unlikely, but it might be that it is the use of local_irq_save()
> > instead of raw_local_irq_save() within rcu_idle_exit() that is causing
> > the trouble.  If this is the case, then the commit shown below would
> > help.  Note that this commit removes the warning from lockdep, so it
> > is necessary to build the kernel with CONFIG_RCU_EQS_DEBUG=y to enable
> > equivalent debugging.
> >
> > Could you please try your test with the -rce commit shown below applied?
>
> Thanks for looking into it.
>
> After checking out Peter's commit 32d4fd5751ea,
> cherry picking your commit ed4ae5eff4b3,
> and setting CONFIG_RCU_EQS_DEBUG=y in addition of my usual debug config,
> I am now seeing this a few seconds into the boot:
>
> [3.010650] [ cut here ]
> [3.010651] WARNING: CPU: 0 PID: 0 at kernel/sched/clock.c:397 
> sched_clock_tick+0x27/0x60
> [3.010657] Modules linked in:
> [3.010660] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 
> 5.19.0-rc1-test-5-g1be22fea0611 #1
> [3.010662] Hardware name: LENOVO 30BFS44D00/1036, BIOS S03KT51A 01/17/2022
> [3.010663] RIP: 0010:sched_clock_tick+0x27/0x60
> [3.010665] Code: 1f 40 00 53 eb 02 5b c3 66 90 8b 05 2f c3 40 01 85 c0 74 
> 18 65 8b 05 60 88 8f 4e 85 c0 75 0d 65 8b 05 a9 85 8f 4e 85 c0 74 02 <0f> 0b 
> e8 e2 6c 89 00 48 c7 c3 40 d5 02 00
>  89 c0 48 03 1c c5 c0 98
> [3.010667] RSP: :b2803e28 EFLAGS: 00010002
> [3.010670] RAX: 0001 RBX: c8ce7fa07060 RCX: 
> 0001
> [3.010671] RDX:  RSI: b268dd21 RDI: 
> b269ab13
> [3.010673] RBP: 0001 R08: ffc300d5 R09: 
> 0002be80
> [3.010674] R10: 03625b53183a R11: a012b802b7a4 R12: 
> b2aa9e80
> [3.010675] R13: b2aa9e00 R14: 0001 R15: 
> 
> [3.010677] FS:  () GS:a012b800() 
> knlGS:
> [3.010678] CS:  0010 DS:  ES:  CR0: 80050033
> [3.010680] CR2: a012f81ff000 CR3: 000c99612001 CR4: 
> 003706f0
> [3.010681] DR0:  DR1:  DR2: 
> 
> [3.010682] DR3:  DR6: fffe0ff0 DR7: 
> 0400
> [3.010683] Call Trace:
> [3.010685]  
> [3.010688]  cpuidle_enter_state+0xb7/0x4b0
> [3.010694]  cpuidle_enter+0x29/0x40
> [3.010697]  do_idle+0x1d4/0x210
> [3.010702]  cpu_startup_entry+0x19/0x20
> [3.010704]  rest_init+0x117/0x1a0
> [3.010708]  arch_call_rest_init+0xa/0x10
> [3.010711]  start_kernel+0x6d8/0x6ff
> [3.010716]  secondary_startup_64_no_verify+0xce/0xdb
> [3.010728]  
> [3.010729] irq event stamp: 44179
> [3.010730] hardirqs last  enabled at (44179): [] 
> asm_sysvec_apic_timer_interrupt+0x1b/0x20
> [3.010734] hardirqs last disabled at (44177): [] 
> __do_softirq+0x3f0/0x498
> [3.010736] softirqs last  enabled at (44178): [] 
> __do_softirq+0x332/0x498
> [3.010738] softirqs last disabled at (44171): [] 
> irq_exit_rcu+0xab/0xf0
> [3.010741] ---[ end trace  ]---

Can you please give this patch a go:
https://patchwork.kernel.org/project/linux-pm/patch/Yt/axpfi88new...@e126311.manchester.arm.com/
?

Re: [PATCH 2/3] x86: add wrapper functions for mtrr functions handling also pat

2022-07-15 Thread Rafael J. Wysocki

On Fri, Jul 15, 2022 at 4:25 PM Juergen Gross  wrote:
>
> There are several MTRR functions which also do PAT handling. In order
> to support PAT handling without MTRR in the future, add some wrappers
> for those functions.
>
> Cc:  # 5.17
> Fixes: bdd8b6c98239 ("drm/i915: replace X86_FEATURE_PAT with pat_enabled()")
> Signed-off-by: Juergen Gross 

Do I understand correctly that this particular patch doesn't change
the behavior?

If so, it would be good to mention that in the changelog.

> ---
>  arch/x86/include/asm/mtrr.h  |  2 --
>  arch/x86/include/asm/processor.h |  7 +
>  arch/x86/kernel/cpu/common.c | 44 +++-
>  arch/x86/kernel/cpu/mtrr/mtrr.c  | 25 +++---
>  arch/x86/kernel/setup.c  |  5 +---
>  arch/x86/kernel/smpboot.c|  8 +++---
>  arch/x86/power/cpu.c |  2 +-
>  7 files changed, 59 insertions(+), 34 deletions(-)
>
> diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
> index 12a16caed395..900083ac9f60 100644
> --- a/arch/x86/include/asm/mtrr.h
> +++ b/arch/x86/include/asm/mtrr.h
> @@ -43,7 +43,6 @@ extern int mtrr_del(int reg, unsigned long base, unsigned 
> long size);
>  extern int mtrr_del_page(int reg, unsigned long base, unsigned long size);
>  extern void mtrr_centaur_report_mcr(int mcr, u32 lo, u32 hi);
>  extern void mtrr_ap_init(void);
> -extern void set_mtrr_aps_delayed_init(void);
>  extern void mtrr_aps_init(void);
>  extern void mtrr_bp_restore(void);
>  extern int mtrr_trim_uncached_memory(unsigned long end_pfn);
> @@ -86,7 +85,6 @@ static inline void mtrr_centaur_report_mcr(int mcr, u32 lo, 
> u32 hi)
>  {
>  }
>  #define mtrr_ap_init() do {} while (0)
> -#define set_mtrr_aps_delayed_init() do {} while (0)
>  #define mtrr_aps_init() do {} while (0)
>  #define mtrr_bp_restore() do {} while (0)
>  #define mtrr_disable() do {} while (0)
> diff --git a/arch/x86/include/asm/processor.h 
> b/arch/x86/include/asm/processor.h
> index 5c934b922450..e2140204fb7e 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -865,7 +865,14 @@ bool arch_is_platform_page(u64 paddr);
>  #define arch_is_platform_page arch_is_platform_page
>  #endif
>
> +extern bool cache_aps_delayed_init;
> +
>  void cache_disable(void);
>  void cache_enable(void);
> +void cache_bp_init(void);
> +void cache_ap_init(void);
> +void cache_set_aps_delayed_init(void);
> +void cache_aps_init(void);
> +void cache_bp_restore(void);
>
>  #endif /* _ASM_X86_PROCESSOR_H */
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index e43322f8a4ef..0a1bd14f7966 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -1929,7 +1929,7 @@ void identify_secondary_cpu(struct cpuinfo_x86 *c)
>  #ifdef CONFIG_X86_32
> enable_sep_cpu();
>  #endif
> -   mtrr_ap_init();
> +   cache_ap_init();
> validate_apic_and_package_id(c);
> x86_spec_ctrl_setup_ap();
> update_srbds_msr();
> @@ -2403,3 +2403,45 @@ void cache_enable(void) __releases(cache_disable_lock)
>
> raw_spin_unlock(_disable_lock);
>  }
> +
> +void __init cache_bp_init(void)
> +{
> +   if (IS_ENABLED(CONFIG_MTRR))
> +   mtrr_bp_init();
> +   else
> +   pat_disable("PAT support disabled because CONFIG_MTRR is 
> disabled in the kernel.");
> +}
> +
> +void cache_ap_init(void)
> +{
> +   if (cache_aps_delayed_init)
> +   return;
> +
> +   mtrr_ap_init();
> +}
> +
> +bool cache_aps_delayed_init;
> +
> +void cache_set_aps_delayed_init(void)
> +{
> +   cache_aps_delayed_init = true;
> +}
> +
> +void cache_aps_init(void)
> +{
> +   /*
> +* Check if someone has requested the delay of AP cache 
> initialization,
> +* by doing cache_set_aps_delayed_init(), prior to this point. If not,
> +* then we are done.
> +*/
> +   if (!cache_aps_delayed_init)
> +   return;
> +
> +   mtrr_aps_init();
> +   cache_aps_delayed_init = false;
> +}
> +
> +void cache_bp_restore(void)
> +{
> +   mtrr_bp_restore();
> +}
> diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.c b/arch/x86/kernel/cpu/mtrr/mtrr.c
> index 2746cac9d8a9..c1593cfae641 100644
> --- a/arch/x86/kernel/cpu/mtrr/mtrr.c
> +++ b/arch/x86/kernel/cpu/mtrr/mtrr.c
> @@ -69,7 +69,6 @@ unsigned int mtrr_usage_table[MTRR_MAX_VAR_RANGES];
>  static DEFINE_MUTEX(mtrr_mutex);
>
>  u64 size_or_mask, size_and_mask;
> -static bool mtrr_aps_delayed_init;
>
>  static const struct mtrr_ops *mtrr_ops[X86_VENDOR_NUM] __ro_after_init;
>
> @@ -176,7 +175,8 @@ static int mtrr_rendezvous_handler(void *info)
> if (data->smp_reg != ~0U) {
> mtrr_if->set(data->smp_reg, data->smp_base,
>  data->smp_size, data->smp_type);
> -   } else if (mtrr_aps_delayed_init || !cpu_online(smp_processor_id())) {
> +   } else if ((use_intel() && cache_aps_delayed_init) ||
> +

Re: [PATCH 31/36] cpuidle,acpi: Make noinstr clean

2022-07-06 Thread Rafael J. Wysocki

On Wed, Jun 8, 2022 at 4:47 PM Peter Zijlstra  wrote:
>
> vmlinux.o: warning: objtool: io_idle+0xc: call to __inb.isra.0() leaves 
> .noinstr.text section
> vmlinux.o: warning: objtool: acpi_idle_enter+0xfe: call to num_online_cpus() 
> leaves .noinstr.text section
> vmlinux.o: warning: objtool: acpi_idle_enter+0x115: call to 
> acpi_idle_fallback_to_c1.isra.0() leaves .noinstr.text section
>
> Signed-off-by: Peter Zijlstra (Intel) 

Acked-by: Rafael J. Wysocki 

> ---
>  arch/x86/include/asm/shared/io.h |4 ++--
>  drivers/acpi/processor_idle.c|2 +-
>  include/linux/cpumask.h  |4 ++--
>  3 files changed, 5 insertions(+), 5 deletions(-)
>
> --- a/arch/x86/include/asm/shared/io.h
> +++ b/arch/x86/include/asm/shared/io.h
> @@ -5,13 +5,13 @@
>  #include 
>
>  #define BUILDIO(bwl, bw, type) \
> -static inline void __out##bwl(type value, u16 port)\
> +static __always_inline void __out##bwl(type value, u16 port)   \
>  {  \
> asm volatile("out" #bwl " %" #bw "0, %w1"   \
>  : : "a"(value), "Nd"(port));   \
>  }  \
> \
> -static inline type __in##bwl(u16 port) \
> +static __always_inline type __in##bwl(u16 port)  
>   \
>  {  \
> type value; \
> asm volatile("in" #bwl " %w1, %" #bw "0"\
> --- a/drivers/acpi/processor_idle.c
> +++ b/drivers/acpi/processor_idle.c
> @@ -593,7 +593,7 @@ static int acpi_idle_play_dead(struct cp
> return 0;
>  }
>
> -static bool acpi_idle_fallback_to_c1(struct acpi_processor *pr)
> +static __always_inline bool acpi_idle_fallback_to_c1(struct acpi_processor 
> *pr)
>  {
> return IS_ENABLED(CONFIG_HOTPLUG_CPU) && !pr->flags.has_cst &&
> !(acpi_gbl_FADT.flags & ACPI_FADT_C2_MP_SUPPORTED);
> --- a/include/linux/cpumask.h
> +++ b/include/linux/cpumask.h
> @@ -908,9 +908,9 @@ static inline const struct cpumask *get_
>   * concurrent CPU hotplug operations unless invoked from a cpuhp_lock held
>   * region.
>   */
> -static inline unsigned int num_online_cpus(void)
> +static __always_inline unsigned int num_online_cpus(void)
>  {
> -   return atomic_read(&__num_online_cpus);
> +   return arch_atomic_read(&__num_online_cpus);
>  }
>  #define num_possible_cpus()cpumask_weight(cpu_possible_mask)
>  #define num_present_cpus() cpumask_weight(cpu_present_mask)
>
>

Re: [PATCH 20/36] arch/idle: Change arch_cpu_idle() IRQ behaviour

2022-07-06 Thread Rafael J. Wysocki

On Wed, Jun 8, 2022 at 4:46 PM Peter Zijlstra  wrote:
>
> Current arch_cpu_idle() is called with IRQs disabled, but will return
> with IRQs enabled.
>
> However, the very first thing the generic code does after calling
> arch_cpu_idle() is raw_local_irq_disable(). This means that
> architectures that can idle with IRQs disabled end up doing a
> pointless 'enable-disable' dance.
>
> Therefore, push this IRQ disabling into the idle function, meaning
> that those architectures can avoid the pointless IRQ state flipping.
>
> Signed-off-by: Peter Zijlstra (Intel) 

Acked-by: Rafael J. Wysocki 

> ---
>  arch/alpha/kernel/process.c  |1 -
>  arch/arc/kernel/process.c|3 +++
>  arch/arm/kernel/process.c|1 -
>  arch/arm/mach-gemini/board-dt.c  |3 ++-
>  arch/arm64/kernel/idle.c |1 -
>  arch/csky/kernel/process.c   |1 -
>  arch/csky/kernel/smp.c   |2 +-
>  arch/hexagon/kernel/process.c|1 -
>  arch/ia64/kernel/process.c   |1 +
>  arch/microblaze/kernel/process.c |1 -
>  arch/mips/kernel/idle.c  |8 +++-
>  arch/nios2/kernel/process.c  |1 -
>  arch/openrisc/kernel/process.c   |1 +
>  arch/parisc/kernel/process.c |2 --
>  arch/powerpc/kernel/idle.c   |5 ++---
>  arch/riscv/kernel/process.c  |1 -
>  arch/s390/kernel/idle.c  |1 -
>  arch/sh/kernel/idle.c|1 +
>  arch/sparc/kernel/leon_pmc.c |4 
>  arch/sparc/kernel/process_32.c   |1 -
>  arch/sparc/kernel/process_64.c   |3 ++-
>  arch/um/kernel/process.c |1 -
>  arch/x86/coco/tdx/tdx.c  |3 +++
>  arch/x86/kernel/process.c|   15 ---
>  arch/xtensa/kernel/process.c |1 +
>  kernel/sched/idle.c  |2 --
>  26 files changed, 28 insertions(+), 37 deletions(-)
>
> --- a/arch/alpha/kernel/process.c
> +++ b/arch/alpha/kernel/process.c
> @@ -57,7 +57,6 @@ EXPORT_SYMBOL(pm_power_off);
>  void arch_cpu_idle(void)
>  {
> wtint(0);
> -   raw_local_irq_enable();
>  }
>
>  void arch_cpu_idle_dead(void)
> --- a/arch/arc/kernel/process.c
> +++ b/arch/arc/kernel/process.c
> @@ -114,6 +114,8 @@ void arch_cpu_idle(void)
> "sleep %0   \n"
> :
> :"I"(arg)); /* can't be "r" has to be embedded const */
> +
> +   raw_local_irq_disable();
>  }
>
>  #else  /* ARC700 */
> @@ -122,6 +124,7 @@ void arch_cpu_idle(void)
>  {
> /* sleep, but enable both set E1/E2 (levels of interrupts) before 
> committing */
> __asm__ __volatile__("sleep 0x3 \n");
> +   raw_local_irq_disable();
>  }
>
>  #endif
> --- a/arch/arm/kernel/process.c
> +++ b/arch/arm/kernel/process.c
> @@ -78,7 +78,6 @@ void arch_cpu_idle(void)
> arm_pm_idle();
> else
> cpu_do_idle();
> -   raw_local_irq_enable();
>  }
>
>  void arch_cpu_idle_prepare(void)
> --- a/arch/arm/mach-gemini/board-dt.c
> +++ b/arch/arm/mach-gemini/board-dt.c
> @@ -42,8 +42,9 @@ static void gemini_idle(void)
>  */
>
> /* FIXME: Enabling interrupts here is racy! */
> -   local_irq_enable();
> +   raw_local_irq_enable();
> cpu_do_idle();
> +   raw_local_irq_disable();
>  }
>
>  static void __init gemini_init_machine(void)
> --- a/arch/arm64/kernel/idle.c
> +++ b/arch/arm64/kernel/idle.c
> @@ -42,5 +42,4 @@ void noinstr arch_cpu_idle(void)
>  * tricks
>  */
> cpu_do_idle();
> -   raw_local_irq_enable();
>  }
> --- a/arch/csky/kernel/process.c
> +++ b/arch/csky/kernel/process.c
> @@ -101,6 +101,5 @@ void arch_cpu_idle(void)
>  #ifdef CONFIG_CPU_PM_STOP
> asm volatile("stop\n");
>  #endif
> -   raw_local_irq_enable();
>  }
>  #endif
> --- a/arch/csky/kernel/smp.c
> +++ b/arch/csky/kernel/smp.c
> @@ -314,7 +314,7 @@ void arch_cpu_idle_dead(void)
> while (!secondary_stack)
> arch_cpu_idle();
>
> -   local_irq_disable();
> +   raw_local_irq_disable();
>
> asm volatile(
> "movsp, %0\n"
> --- a/arch/hexagon/kernel/process.c
> +++ b/arch/hexagon/kernel/process.c
> @@ -44,7 +44,6 @@ void arch_cpu_idle(void)
>  {
> __vmwait();
> /*  interrupts wake us up, but irqs are still disabled */
> -   raw_local_irq_enable();
>  }
>
>  /*
> --- a/arch/ia64/kernel/process.c
> +++ b/arch/ia64/kernel/process.c
> @@ -241,6 +241,7 @@ void arch_cpu_idle(void)
> (*mark_idle)(1

Re: [PATCH 18/36] cpuidle: Annotate poll_idle()

2022-07-06 Thread Rafael J. Wysocki

On Wed, Jun 8, 2022 at 4:46 PM Peter Zijlstra  wrote:
>
> The __cpuidle functions will become a noinstr class, as such they need
> explicit annotations.
>
> Signed-off-by: Peter Zijlstra (Intel) 

Reviewed-by: Rafael J. Wysocki 

> ---
>  drivers/cpuidle/poll_state.c |6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> --- a/drivers/cpuidle/poll_state.c
> +++ b/drivers/cpuidle/poll_state.c
> @@ -13,7 +13,10 @@
>  static int __cpuidle poll_idle(struct cpuidle_device *dev,
>struct cpuidle_driver *drv, int index)
>  {
> -   u64 time_start = local_clock();
> +   u64 time_start;
> +
> +   instrumentation_begin();
> +   time_start = local_clock();
>
> dev->poll_time_limit = false;
>
> @@ -39,6 +42,7 @@ static int __cpuidle poll_idle(struct cp
> raw_local_irq_disable();
>
> current_clr_polling();
> +   instrumentation_end();
>
> return index;
>  }
>
>

Re: [PATCH 17/36] acpi_idle: Remove tracing

2022-07-06 Thread Rafael J. Wysocki

On Wed, Jun 8, 2022 at 4:47 PM Peter Zijlstra  wrote:
>
> All the idle routines are called with RCU disabled, as such there must
> not be any tracing inside.
>
> Signed-off-by: Peter Zijlstra (Intel) 

This actually does some additional code duplication cleanup which
would be good to mention in the changelog.  Or even move to a separate
patch for that matter.

Otherwise LGTM.

> ---
>  drivers/acpi/processor_idle.c |   24 +---
>  1 file changed, 13 insertions(+), 11 deletions(-)
>
> --- a/drivers/acpi/processor_idle.c
> +++ b/drivers/acpi/processor_idle.c
> @@ -108,8 +108,8 @@ static const struct dmi_system_id proces
>  static void __cpuidle acpi_safe_halt(void)
>  {
> if (!tif_need_resched()) {
> -   safe_halt();
> -   local_irq_disable();
> +   raw_safe_halt();
> +   raw_local_irq_disable();
> }
>  }
>
> @@ -524,16 +524,21 @@ static int acpi_idle_bm_check(void)
> return bm_status;
>  }
>
> -static void wait_for_freeze(void)
> +static __cpuidle void io_idle(unsigned long addr)
>  {
> +   /* IO port based C-state */
> +   inb(addr);
> +
>  #ifdef CONFIG_X86
> /* No delay is needed if we are in guest */
> if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
> return;
>  #endif
> -   /* Dummy wait op - must do something useless after P_LVL2 read
> -  because chipsets cannot guarantee that STPCLK# signal
> -  gets asserted in time to freeze execution properly. */
> +   /*
> +* Dummy wait op - must do something useless after P_LVL2 read
> +* because chipsets cannot guarantee that STPCLK# signal
> +* gets asserted in time to freeze execution properly.
> +*/
> inl(acpi_gbl_FADT.xpm_timer_block.address);
>  }
>
> @@ -553,9 +558,7 @@ static void __cpuidle acpi_idle_do_entry
> } else if (cx->entry_method == ACPI_CSTATE_HALT) {
> acpi_safe_halt();
> } else {
> -   /* IO port based C-state */
> -   inb(cx->address);
> -   wait_for_freeze();
> +   io_idle(cx->address);
> }
>
> perf_lopwr_cb(false);
> @@ -577,8 +580,7 @@ static int acpi_idle_play_dead(struct cp
> if (cx->entry_method == ACPI_CSTATE_HALT)
> safe_halt();
> else if (cx->entry_method == ACPI_CSTATE_SYSTEMIO) {
> -   inb(cx->address);
> -   wait_for_freeze();
> +   io_idle(cx->address);
> } else
> return -ENODEV;
>
>
>

Re: [PATCH 03/36] cpuidle/poll: Ensure IRQ state is invariant

2022-07-06 Thread Rafael J. Wysocki

On Wed, Jun 8, 2022 at 4:47 PM Peter Zijlstra  wrote:
>
> cpuidle_state::enter() methods should be IRQ invariant
>
> Signed-off-by: Peter Zijlstra (Intel) 

Reviewed-by: Rafael J. Wysocki 

> ---
>  drivers/cpuidle/poll_state.c |4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> --- a/drivers/cpuidle/poll_state.c
> +++ b/drivers/cpuidle/poll_state.c
> @@ -17,7 +17,7 @@ static int __cpuidle poll_idle(struct cp
>
> dev->poll_time_limit = false;
>
> -   local_irq_enable();
> +   raw_local_irq_enable();
> if (!current_set_polling_and_test()) {
> unsigned int loop_count = 0;
> u64 limit;
> @@ -36,6 +36,8 @@ static int __cpuidle poll_idle(struct cp
> }
> }
> }
> +   raw_local_irq_disable();
> +
> current_clr_polling();
>
> return index;
>
>

Re: [PATCH 05/36] cpuidle: Move IRQ state validation

2022-07-06 Thread Rafael J. Wysocki

On Wed, Jun 8, 2022 at 4:47 PM Peter Zijlstra  wrote:
>
> Make cpuidle_enter_state() consistent with the s2idle variant and
> verify ->enter() always returns with interrupts disabled.
>
> Signed-off-by: Peter Zijlstra (Intel) 
> ---
>  drivers/cpuidle/cpuidle.c |   10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> --- a/drivers/cpuidle/cpuidle.c
> +++ b/drivers/cpuidle/cpuidle.c
> @@ -234,7 +234,11 @@ int cpuidle_enter_state(struct cpuidle_d
> stop_critical_timings();
> if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
> rcu_idle_enter();
> +
> entered_state = target_state->enter(dev, drv, index);
> +   if (WARN_ONCE(!irqs_disabled(), "%ps leaked IRQ state", 
> target_state->enter))

I'm not sure if dumping a call trace here is really useful and
WARN_ON() often gets converted to panic().

I would print an error message with pr_warn_once().

Otherwise LGTM.

> +   raw_local_irq_disable();
> +
> if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
> rcu_idle_exit();
> start_critical_timings();
> @@ -246,12 +250,8 @@ int cpuidle_enter_state(struct cpuidle_d
> /* The cpu is no longer idle or about to enter idle. */
> sched_idle_set_state(NULL);
>
> -   if (broadcast) {
> -   if (WARN_ON_ONCE(!irqs_disabled()))
> -   local_irq_disable();
> -
> +   if (broadcast)
> tick_broadcast_exit();
> -   }
>
> if (!cpuidle_state_is_coupled(drv, index))
> local_irq_enable();
>
>

Re: [PATCH 04/36] cpuidle,intel_idle: Fix CPUIDLE_FLAG_IRQ_ENABLE

2022-06-08 Thread Rafael J. Wysocki

On Wed, Jun 8, 2022 at 5:48 PM Peter Zijlstra  wrote:
>
> On Wed, Jun 08, 2022 at 05:01:05PM +0200, Rafael J. Wysocki wrote:
> > On Wed, Jun 8, 2022 at 4:47 PM Peter Zijlstra  wrote:
> > >
> > > Commit c227233ad64c ("intel_idle: enable interrupts before C1 on
> > > Xeons") wrecked intel_idle in two ways:
> > >
> > >  - must not have tracing in idle functions
> > >  - must return with IRQs disabled
> > >
> > > Additionally, it added a branch for no good reason.
> > >
> > > Fixes: c227233ad64c ("intel_idle: enable interrupts before C1 on Xeons")
> > > Signed-off-by: Peter Zijlstra (Intel) 
> >
> > Acked-by: Rafael J. Wysocki 
> >
> > And do I think correctly that this can be applied without the rest of
> > the series?
>
> Yeah, I don't think this relies on any of the preceding patches. If you
> want to route this through the pm/fixes tree that's fine.

OK, thanks, applied (and I moved the intel_idle() kerneldoc so it is
next to the function to avoid the docs build warning).

Re: [PATCH 02/36] x86/idle: Replace x86_idle with a static_call

2022-06-08 Thread Rafael J. Wysocki

On Wed, Jun 8, 2022 at 4:47 PM Peter Zijlstra  wrote:
>
> Typical boot time setup; no need to suffer an indirect call for that.
>
> Signed-off-by: Peter Zijlstra (Intel) 
> Reviewed-by: Frederic Weisbecker 

Reviewed-by: Rafael J. Wysocki 

> ---
>  arch/x86/kernel/process.c |   50 
> +-
>  1 file changed, 28 insertions(+), 22 deletions(-)
>
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -24,6 +24,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -692,7 +693,23 @@ void __switch_to_xtra(struct task_struct
>  unsigned long boot_option_idle_override = IDLE_NO_OVERRIDE;
>  EXPORT_SYMBOL(boot_option_idle_override);
>
> -static void (*x86_idle)(void);
> +/*
> + * We use this if we don't have any better idle routine..
> + */
> +void __cpuidle default_idle(void)
> +{
> +   raw_safe_halt();
> +}
> +#if defined(CONFIG_APM_MODULE) || defined(CONFIG_HALTPOLL_CPUIDLE_MODULE)
> +EXPORT_SYMBOL(default_idle);
> +#endif
> +
> +DEFINE_STATIC_CALL_NULL(x86_idle, default_idle);
> +
> +static bool x86_idle_set(void)
> +{
> +   return !!static_call_query(x86_idle);
> +}
>
>  #ifndef CONFIG_SMP
>  static inline void play_dead(void)
> @@ -715,28 +732,17 @@ void arch_cpu_idle_dead(void)
>  /*
>   * Called from the generic idle code.
>   */
> -void arch_cpu_idle(void)
> -{
> -   x86_idle();
> -}
> -
> -/*
> - * We use this if we don't have any better idle routine..
> - */
> -void __cpuidle default_idle(void)
> +void __cpuidle arch_cpu_idle(void)
>  {
> -   raw_safe_halt();
> +   static_call(x86_idle)();
>  }
> -#if defined(CONFIG_APM_MODULE) || defined(CONFIG_HALTPOLL_CPUIDLE_MODULE)
> -EXPORT_SYMBOL(default_idle);
> -#endif
>
>  #ifdef CONFIG_XEN
>  bool xen_set_default_idle(void)
>  {
> -   bool ret = !!x86_idle;
> +   bool ret = x86_idle_set();
>
> -   x86_idle = default_idle;
> +   static_call_update(x86_idle, default_idle);
>
> return ret;
>  }
> @@ -859,20 +865,20 @@ void select_idle_routine(const struct cp
> if (boot_option_idle_override == IDLE_POLL && smp_num_siblings > 1)
> pr_warn_once("WARNING: polling idle and HT enabled, 
> performance may degrade\n");
>  #endif
> -   if (x86_idle || boot_option_idle_override == IDLE_POLL)
> +   if (x86_idle_set() || boot_option_idle_override == IDLE_POLL)
> return;
>
> if (boot_cpu_has_bug(X86_BUG_AMD_E400)) {
> pr_info("using AMD E400 aware idle routine\n");
> -   x86_idle = amd_e400_idle;
> +   static_call_update(x86_idle, amd_e400_idle);
> } else if (prefer_mwait_c1_over_halt(c)) {
> pr_info("using mwait in idle threads\n");
> -   x86_idle = mwait_idle;
> +   static_call_update(x86_idle, mwait_idle);
> } else if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST)) {
> pr_info("using TDX aware idle routine\n");
> -   x86_idle = tdx_safe_halt;
> +   static_call_update(x86_idle, tdx_safe_halt);
> } else
> -   x86_idle = default_idle;
> +   static_call_update(x86_idle, default_idle);
>  }
>
>  void amd_e400_c1e_apic_setup(void)
> @@ -925,7 +931,7 @@ static int __init idle_setup(char *str)
>  * To continue to load the CPU idle driver, don't touch
>  * the boot_option_idle_override.
>  */
> -   x86_idle = default_idle;
> +   static_call_update(x86_idle, default_idle);
> boot_option_idle_override = IDLE_HALT;
> } else if (!strcmp(str, "nomwait")) {
> /*
>
>

Re: [PATCH 04/36] cpuidle,intel_idle: Fix CPUIDLE_FLAG_IRQ_ENABLE

2022-06-08 Thread Rafael J. Wysocki

On Wed, Jun 8, 2022 at 4:47 PM Peter Zijlstra  wrote:
>
> Commit c227233ad64c ("intel_idle: enable interrupts before C1 on
> Xeons") wrecked intel_idle in two ways:
>
>  - must not have tracing in idle functions
>  - must return with IRQs disabled
>
> Additionally, it added a branch for no good reason.
>
> Fixes: c227233ad64c ("intel_idle: enable interrupts before C1 on Xeons")
> Signed-off-by: Peter Zijlstra (Intel) 

Acked-by: Rafael J. Wysocki 

And do I think correctly that this can be applied without the rest of
the series?

> ---
>  drivers/idle/intel_idle.c |   48 
> +++---
>  1 file changed, 37 insertions(+), 11 deletions(-)
>
> --- a/drivers/idle/intel_idle.c
> +++ b/drivers/idle/intel_idle.c
> @@ -129,21 +137,37 @@ static unsigned int mwait_substates __in
>   *
>   * Must be called under local_irq_disable().
>   */
> +
> -static __cpuidle int intel_idle(struct cpuidle_device *dev,
> -   struct cpuidle_driver *drv, int index)
> +static __always_inline int __intel_idle(struct cpuidle_device *dev,
> +   struct cpuidle_driver *drv, int index)
>  {
> struct cpuidle_state *state = >states[index];
> unsigned long eax = flg2MWAIT(state->flags);
> unsigned long ecx = 1; /* break on interrupt flag */
>
> -   if (state->flags & CPUIDLE_FLAG_IRQ_ENABLE)
> -   local_irq_enable();
> -
> mwait_idle_with_hints(eax, ecx);
>
> return index;
>  }
>
> +static __cpuidle int intel_idle(struct cpuidle_device *dev,
> +   struct cpuidle_driver *drv, int index)
> +{
> +   return __intel_idle(dev, drv, index);
> +}
> +
> +static __cpuidle int intel_idle_irq(struct cpuidle_device *dev,
> +   struct cpuidle_driver *drv, int index)
> +{
> +   int ret;
> +
> +   raw_local_irq_enable();
> +   ret = __intel_idle(dev, drv, index);
> +   raw_local_irq_disable();
> +
> +   return ret;
> +}
> +
>  /**
>   * intel_idle_s2idle - Ask the processor to enter the given idle state.
>   * @dev: cpuidle device of the target CPU.
> @@ -1801,6 +1824,9 @@ static void __init intel_idle_init_cstat
> /* Structure copy. */
> drv->states[drv->state_count] = cpuidle_state_table[cstate];
>
> +   if (cpuidle_state_table[cstate].flags & 
> CPUIDLE_FLAG_IRQ_ENABLE)
> +   drv->states[drv->state_count].enter = intel_idle_irq;
> +
> if ((disabled_states_mask & BIT(drv->state_count)) ||
> ((icpu->use_acpi || force_use_acpi) &&
>  intel_idle_off_by_default(mwait_hint) &&
>
>

Re: [PATCH v8 00/27] Introduce power-off+restart call chain API

2022-05-24 Thread Rafael J. Wysocki

Hi Geert,

On Mon, May 23, 2022 at 8:08 PM Geert Uytterhoeven  wrote:
>
> Hi Rafael,
>
> On Wed, May 18, 2022 at 4:46 PM Rafael J. Wysocki  wrote:
> > On Tue, May 10, 2022 at 1:33 AM Dmitry Osipenko
> >  wrote:
>
> > >   m68k: Switch to new sys-off handler API
>
> Sorry, I didn't realize this was going to interact with the new m68k
> virtual machine support, which is included in the m68k pull request
> for v5.19.
>
> > However, I'm going to send a pull request with it in the second half
> > of the merge window, after the majority of the other changes in the
> > subsystems touched by it have been integrated.
>
> And presumably you will have to merge in v5.19-rc1, too?

I will merge this series on top of the Linus' merges of my pull
requests sent yesterday (assuming that he pulls them, that is).

> I've sent a fix.  It should appear at
> https://lore.kernel.org/r/20220523175520.949681-1-ge...@linux-m68k.org
> soon.
>
> Can you please include that in your PR?

I will.

Thanks!

Re: [PATCH v8 00/27] Introduce power-off+restart call chain API

2022-05-18 Thread Rafael J. Wysocki

On Tue, May 10, 2022 at 1:33 AM Dmitry Osipenko
 wrote:
>
> Problem
> ---
>
> SoC devices require power-off call chaining functionality from kernel.
> We have a widely used restart chaining provided by restart notifier API,
> but nothing for power-off.
>
> Solution
> 
>
> Introduce new API that provides call chains support for all restart and
> power-off modes. The new API is designed with simplicity and extensibility
> in mind.
>
> This is a third attempt to introduce the new API. First was made by
> Guenter Roeck back in 2014, second was made by Thierry Reding in 2017.
> In fact the work didn't stop and recently arm_pm_restart() was removed
> from v5.14 kernel, which was a part of preparatory work started by
> Guenter Roeck.
>
> Adoption plan
> -
>
> This patchset introduces the new API. It also converts multiple drivers
> and arch code to the new API to demonstrate how it all looks in practice,
> removing the pm_power_off_prepare global variable.
>
> The plan is:
>
> 1. Merge the new API and convert arch code to use do_kernel_power_off().
>For now the new API will co-exist with the older API.
>
> 2. Convert all drivers and platform code to the new API.
>
> 3. Remove obsoleted pm_power_off and pm_power_off_prepare variables.
>
> Results
> ---
>
> 1. Devices can be powered off properly.
>
> 2. Global variables are removed from drivers.
>
> 3. Global pm_power_off and pm_power_off_prepare callback variables are
> removed once all users are converted to the new API. The latter callback
> is removed by patch #24 of this series.
>
> 4. Ambiguous call chain ordering is prohibited for non-default priorities.
>
> Changelog:
>
> v8: - Reworked sys-off handler like was suggested by Rafael Wysocki in
>   the comments to v7.
>
> - The struct sys-off handler now is private to kernel/reboot.c and
>   new API is simplified.
>
> - There is a single sys-off API function for all handler types.
>   Users shall pass the required sys-off mode type (restart, power-off
>   and etc).
>
> - There is single struct sys_off_data callback argument for all
>   handler modes.
>
> - User's callback now must return NOTIFY_DONE or NOTIFY_STOP.
>
> - The default priority level is zero now.
>
> - Multiple handlers now allowed to be registered at the default
>   priority level.
>
> - Power-off call chain is atomic now, like the restart chain.
>
> - kernel/reboot.c changes are split up into several logical patches.
>
> - Added r-b from Michał Mirosław to unmodified patches from v7.
>
> - Added acks that were missing in v7 by accident.

The v8 looks much better than the previous versions to me.

I actually don't really have any comments on it except for the minor
remark regarding patch [1/27] sent separately.

Please just send an update of that one patch and I will queue up the
series for 5.19.

However, I'm going to send a pull request with it in the second half
of the merge window, after the majority of the other changes in the
subsystems touched by it have been integrated.

> v7: - Rebased on a recent linux-next. Dropped the recently removed
>   NDS32 architecture. Only SH and x86 arches left un-acked.
>
> - Added acks from Thomas Bogendoerfer and Krzysztof Kozlowski
>   to the MIPS and memory/emif patches respectively.
>
> - Made couple minor cosmetic improvements to the new API.
>
> - A month ago I joined Collabora and continuing to work on this series
>   on the company's time, so changed my email address to collabora.com
>
> v6: - Rebased on a recent linux-next.
>
> - Made minor couple cosmetic changes.
>
> v5: - Dropped patches which cleaned up notifier/reboot headers, as was
>   requested by Rafael Wysocki.
>
> - Dropped WARN_ON() from the code, as was requested by Rafael Wysocki.
>   Replaced it with pr_err() appropriately.
>
> - Dropped *_notifier_has_unique_priority() functions and added
>   *_notifier_chain_register_unique_prio() instead, as was suggested
>   by Michał Mirosław and Rafael Wysocki.
>
> - Dropped export of blocking_notifier_call_chain_is_empty() symbol,
>   as was suggested by Rafael Wysocki.
>
> - Michał Mirosław suggested that will be better to split up patch
>   that adds the new API to ease reviewing, but Rafael Wysocki asked
>   not add more patches, so I kept it as a single patch.
>
> - Added temporary "weak" stub for pm_power_off() which fixes linkage
>   failure once symbol is removed from arch/* code. Previously I missed
>   this problem because was only compile-testing object files.
>
> v4: - Made a very minor improvement to doc comments, clarifying couple
>   default values.
>
> - Corrected list of emails recipient by adding Linus, Sebastian,
>   Philipp and more NDS people. Removed bouncing emails.
>
> - Added acks that were given to v3.
>
> v3: - Renamed power_handler to sys_off_handler as was suggested by
>

Re: [PATCH v8 01/27] notifier: Add atomic_notifier_call_chain_is_empty()

2022-05-10 Thread Rafael J. Wysocki

On Tue, May 10, 2022 at 1:33 AM Dmitry Osipenko
 wrote:
>
> Add atomic_notifier_call_chain_is_empty() that returns true if given
> atomic call chain is empty.

It would be good to mention a use case for it.

> Reviewed-by: Michał Mirosław 
> Signed-off-by: Dmitry Osipenko 
> ---
>  include/linux/notifier.h |  2 ++
>  kernel/notifier.c| 13 +
>  2 files changed, 15 insertions(+)
>
> diff --git a/include/linux/notifier.h b/include/linux/notifier.h
> index 87069b8459af..95e2440037de 100644
> --- a/include/linux/notifier.h
> +++ b/include/linux/notifier.h
> @@ -173,6 +173,8 @@ extern int blocking_notifier_call_chain_robust(struct 
> blocking_notifier_head *nh
>  extern int raw_notifier_call_chain_robust(struct raw_notifier_head *nh,
> unsigned long val_up, unsigned long val_down, void *v);
>
> +extern bool atomic_notifier_call_chain_is_empty(struct atomic_notifier_head 
> *nh);
> +
>  #define NOTIFY_DONE0x  /* Don't care */
>  #define NOTIFY_OK  0x0001  /* Suits me */
>  #define NOTIFY_STOP_MASK   0x8000  /* Don't call further */
> diff --git a/kernel/notifier.c b/kernel/notifier.c
> index ba005ebf4730..aaf5b56452a6 100644
> --- a/kernel/notifier.c
> +++ b/kernel/notifier.c
> @@ -204,6 +204,19 @@ int atomic_notifier_call_chain(struct 
> atomic_notifier_head *nh,
>  EXPORT_SYMBOL_GPL(atomic_notifier_call_chain);
>  NOKPROBE_SYMBOL(atomic_notifier_call_chain);
>
> +/**
> + * atomicnotifier_call_chain_is_empty - Check whether notifier chain is 
> empty
> + * @nh: Pointer to head of the blocking notifier chain
> + *
> + * Checks whether notifier chain is empty.
> + *
> + * Returns true is notifier chain is empty, false otherwise.
> + */
> +bool atomic_notifier_call_chain_is_empty(struct atomic_notifier_head *nh)
> +{
> +   return !rcu_access_pointer(nh->head);
> +}
> +
>  /*
>   * Blocking notifier chain routines.  All access to the chain is
>   * synchronized by an rwsem.
> --
> 2.35.1
>

Re: [PATCH v7 04/20] kernel: Add combined power-off+restart handler call chain API

2022-04-20 Thread Rafael J. Wysocki

On Mon, Apr 18, 2022 at 3:44 AM Dmitry Osipenko
 wrote:
>
> On 4/15/22 21:14, Rafael J. Wysocki wrote:
> > Honestly, I would prefer this to be split so as to make it easier to
> > review if nothing else.
>
> I'll try to split it in v8.
>
> > On Tue, Apr 12, 2022 at 1:39 AM Dmitry Osipenko
> >  wrote:
> >>
> >> SoC platforms often have multiple ways of how to perform system's
> >> power-off and restart operations. Meanwhile today's kernel is limited to
> >> a single option. Add combined power-off+restart handler call chain API,
> >> which is inspired by the restart API. The new API provides both power-off
> >> and restart functionality.
> >>
> >> The old pm_power_off method will be kept around till all users are
> >> converted to the new API.
> >>
> >> Current restart API will be replaced by the new unified API since
> >> new API is its superset. The restart functionality of the sys-off handler
> >> API is built upon the existing restart-notifier APIs.
> >
> > Which means that the existing notifier chains for system restart are
> > used as they are without modifications.
> >
> > At least that's what follows from the code and it would be good to
> > mention it here.
>
> Will improve the commit message.
>
> > Moreover, a new notifier chain is introduced for the power-off case
> > and it appears to be the counterpart of the restart_handler_list
> > chain, but then why is it blocking and not atomic like the latter?
>
> Good catch, it probably indeed should be atomic because shutting down
> could run with a disabled interrupts. I'll invistigate this more for v8,
> at least right now I don't recall any particular reason for using the
> blocking notifier.
>
> >> In order to ease conversion to the new API, convenient helpers are added
> >> for the common use-cases. They will reduce amount of boilerplate code and
> >> remove global variables. These helpers preserve old behaviour for cases
> >> where only one power-off handler is expected, this is what all existing
> >> drivers want, and thus, they could be easily converted to the new API.
> >> Users of the new API should explicitly enable power-off chaining by
> >> setting corresponding flag of the power_handler structure.
> >
> > "the corresponding"
>
> Thanks
>
> >> Signed-off-by: Dmitry Osipenko 
> >> ---
> >>  include/linux/reboot.h   | 229 ++-
> >>  kernel/power/hibernate.c |   2 +-
> >>  kernel/reboot.c  | 604 ++-
> >>  3 files changed, 827 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/include/linux/reboot.h b/include/linux/reboot.h
> >> index a2429648d831..ba5e5dddcfcd 100644
> >> --- a/include/linux/reboot.h
> >> +++ b/include/linux/reboot.h
> >> @@ -8,10 +8,35 @@
> >>
> >>  struct device;
> >>
> >> -#define SYS_DOWN   0x0001  /* Notify of system down */
> >> -#define SYS_RESTARTSYS_DOWN
> >> -#define SYS_HALT   0x0002  /* Notify of system halt */
> >> -#define SYS_POWER_OFF  0x0003  /* Notify of system power off */
> >> +enum reboot_prepare_mode {
> >> +   SYS_DOWN = 1,   /* Notify of system down */
> >> +   SYS_RESTART = SYS_DOWN,
> >> +   SYS_HALT,   /* Notify of system halt */
> >> +   SYS_POWER_OFF,  /* Notify of system power off */
> >> +};
> >> +
> >> +/*
> >> + * Standard restart priority levels. Intended to be set in the
> >> + * sys_off_handler.restart_priority field.
> >> + *
> >> + * Use `RESTART_PRIO_ABC +- prio` style for additional levels.
> >> + *
> >> + * RESTART_PRIO_RESERVED:  Falls back to RESTART_PRIO_DEFAULT.
> >> + * Drivers may leave priority initialized
> >> + * to zero, to auto-set it to the default 
> >> level.
> >
> > What is the "default level" here?
>
> "default level" = RESTART_PRIO_DEFAULT = 128
>
> I'll remove the second sentence about the "default level", for clarity.
>
> >> + *
> >> + * RESTART_PRIO_LOW:   Use this for handler of last resort.
> >> + *
> >> + * RESTART_PRIO_DEFAULT:   Use this for default/generic handler.
> >> + *
> >> + * RESTART_PRIO_HIGH:  Use this if you have multiple handlers and
> >> + * this handler has higher pr

Re: [PATCH v7 03/20] reboot: Print error message if restart handler has duplicated priority

2022-04-20 Thread Rafael J. Wysocki

On Mon, Apr 18, 2022 at 3:29 AM Dmitry Osipenko
 wrote:
>
> On 4/14/22 14:19, Rafael J. Wysocki wrote:
> > On Thu, Apr 14, 2022 at 12:24 AM Dmitry Osipenko
> >  wrote:
> >>
> >> On 4/13/22 21:48, Rafael J. Wysocki wrote:
> >>> On Tue, Apr 12, 2022 at 1:39 AM Dmitry Osipenko
> >>>  wrote:
> >>>>
> >>>> Add sanity check which ensures that there are no two restart handlers
> >>>> registered using the same priority. This requirement will become 
> >>>> mandatory
> >>>> once all drivers will be converted to the new API and such errors will be
> >>>> fixed.
> >>>>
> >>>> Signed-off-by: Dmitry Osipenko 
> >>>
> >>> The first two patches in the series are fine with me and there's only
> >>> one minor nit regarding this one (below).
> >>>
> >>>> ---
> >>>>  kernel/reboot.c | 15 +++
> >>>>  1 file changed, 15 insertions(+)
> >>>>
> >>>> diff --git a/kernel/reboot.c b/kernel/reboot.c
> >>>> index ed4e6dfb7d44..acdae4e95061 100644
> >>>> --- a/kernel/reboot.c
> >>>> +++ b/kernel/reboot.c
> >>>> @@ -182,6 +182,21 @@ static ATOMIC_NOTIFIER_HEAD(restart_handler_list);
> >>>>   */
> >>>>  int register_restart_handler(struct notifier_block *nb)
> >>>>  {
> >>>> +   int ret;
> >>>> +
> >>>> +   ret = 
> >>>> atomic_notifier_chain_register_unique_prio(_handler_list, nb);
> >>>> +   if (ret != -EBUSY)
> >>>> +   return ret;
> >>>> +
> >>>> +   /*
> >>>> +* Handler must have unique priority. Otherwise call order is
> >>>> +* determined by registration order, which is unreliable.
> >>>> +*
> >>>> +* This requirement will become mandatory once all drivers
> >>>> +* will be converted to use new sys-off API.
> >>>> +*/
> >>>> +   pr_err("failed to register restart handler using unique 
> >>>> priority\n");
> >>>
> >>> I would use pr_info() here, because this is not a substantial error 
> >>> AFAICS.
> >>
> >> It's indeed not a substantial error so far, but it will become
> >> substantial later on once only unique priorities will be allowed. The
> >> pr_warn() could be a good compromise here, pr_info() is too mild, IMO.
> >
> > Well, I'm still unconvinced about requiring all of the users of this
> > interface to use unique priorities.
> >
> > Arguably, there are some of them who don't really care about the
> > ordering, so could there be an option for them to specify the lack of
> > care by, say, passing 0 as the priority that would be regarded as a
> > special case?
> >
> > IOW, if you pass 0, you'll be run along the others who've also passed
> > 0, but if you pass anything different from 0, it must be unique.  What
> > do you think?
>
> There are indeed cases where ordering is unimportant. Like a case of
> PMIC and watchdog restart handlers for example, both handlers will
> produce equal effect from a user's perspective. Perhaps indeed it's more
> practical to have at least one shared level.
>
> In this patchset the level 0 is specified as an alias to the default
> level 128. If one user registers handler using unique level 128 and the
> other user uses non-unique level 0, then we have ambiguity.
>
> One potential option is to make the whole default level 128 non-unique.
> This will allow users to not care about the uniqueness by default like
> they always did it previously, but it will hide potential problems for
> users who actually need unique level and don't know about it yet due to
> a lucky registration ordering that they have today. Are you okay with
> this option?

Yes, I am.

Re: [PATCH v7 04/20] kernel: Add combined power-off+restart handler call chain API

2022-04-15 Thread Rafael J. Wysocki

Honestly, I would prefer this to be split so as to make it easier to
review if nothing else.

On Tue, Apr 12, 2022 at 1:39 AM Dmitry Osipenko
 wrote:
>
> SoC platforms often have multiple ways of how to perform system's
> power-off and restart operations. Meanwhile today's kernel is limited to
> a single option. Add combined power-off+restart handler call chain API,
> which is inspired by the restart API. The new API provides both power-off
> and restart functionality.
>
> The old pm_power_off method will be kept around till all users are
> converted to the new API.
>
> Current restart API will be replaced by the new unified API since
> new API is its superset. The restart functionality of the sys-off handler
> API is built upon the existing restart-notifier APIs.

Which means that the existing notifier chains for system restart are
used as they are without modifications.

At least that's what follows from the code and it would be good to
mention it here.

Moreover, a new notifier chain is introduced for the power-off case
and it appears to be the counterpart of the restart_handler_list
chain, but then why is it blocking and not atomic like the latter?

> In order to ease conversion to the new API, convenient helpers are added
> for the common use-cases. They will reduce amount of boilerplate code and
> remove global variables. These helpers preserve old behaviour for cases
> where only one power-off handler is expected, this is what all existing
> drivers want, and thus, they could be easily converted to the new API.
> Users of the new API should explicitly enable power-off chaining by
> setting corresponding flag of the power_handler structure.

"the corresponding"

>
> Signed-off-by: Dmitry Osipenko 
> ---
>  include/linux/reboot.h   | 229 ++-
>  kernel/power/hibernate.c |   2 +-
>  kernel/reboot.c  | 604 ++-
>  3 files changed, 827 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/reboot.h b/include/linux/reboot.h
> index a2429648d831..ba5e5dddcfcd 100644
> --- a/include/linux/reboot.h
> +++ b/include/linux/reboot.h
> @@ -8,10 +8,35 @@
>
>  struct device;
>
> -#define SYS_DOWN   0x0001  /* Notify of system down */
> -#define SYS_RESTARTSYS_DOWN
> -#define SYS_HALT   0x0002  /* Notify of system halt */
> -#define SYS_POWER_OFF  0x0003  /* Notify of system power off */
> +enum reboot_prepare_mode {
> +   SYS_DOWN = 1,   /* Notify of system down */
> +   SYS_RESTART = SYS_DOWN,
> +   SYS_HALT,   /* Notify of system halt */
> +   SYS_POWER_OFF,  /* Notify of system power off */
> +};
> +
> +/*
> + * Standard restart priority levels. Intended to be set in the
> + * sys_off_handler.restart_priority field.
> + *
> + * Use `RESTART_PRIO_ABC +- prio` style for additional levels.
> + *
> + * RESTART_PRIO_RESERVED:  Falls back to RESTART_PRIO_DEFAULT.
> + * Drivers may leave priority initialized
> + * to zero, to auto-set it to the default level.

What is the "default level" here?

> + *
> + * RESTART_PRIO_LOW:   Use this for handler of last resort.
> + *
> + * RESTART_PRIO_DEFAULT:   Use this for default/generic handler.
> + *
> + * RESTART_PRIO_HIGH:  Use this if you have multiple handlers and
> + * this handler has higher priority than the
> + * default handler.
> + */
> +#define RESTART_PRIO_RESERVED  0
> +#define RESTART_PRIO_LOW   8
> +#define RESTART_PRIO_DEFAULT   128
> +#define RESTART_PRIO_HIGH  192
>
>  enum reboot_mode {
> REBOOT_UNDEFINED = -1,
> @@ -49,6 +74,201 @@ extern int register_restart_handler(struct notifier_block 
> *);
>  extern int unregister_restart_handler(struct notifier_block *);
>  extern void do_kernel_restart(char *cmd);
>
> +/*
> + * System power-off and restart API.
> + */
> +
> +/*
> + * Standard power-off priority levels. Intended to be set in the
> + * sys_off_handler.power_off_priority field.
> + *
> + * Use `POWEROFF_PRIO_ABC +- prio` style for additional levels.

What exactly does this mean?

> + *
> + * POWEROFF_PRIO_RESERVED: Falls back to POWEROFF_PRIO_DEFAULT.
> + * Drivers may leave priority initialized
> + * to zero, to auto-set it to the default level.
> + *
> + * POWEROFF_PRIO_PLATFORM: Intended to be used by platform-level handler.
> + * Has lowest priority since device drivers are
> + * expected to take over platform handler which
> + * doesn't allow further callback chaining.
> + *
> + * POWEROFF_PRIO_DEFAULT:  Use this for default/generic handler.
> + *
> + * POWEROFF_PRIO_FIRMWARE: Use this if handler uses firmware call.
> + * Has highest priority since firmware is 
> expected
>

Re: [PATCH v7 03/20] reboot: Print error message if restart handler has duplicated priority

2022-04-14 Thread Rafael J. Wysocki

On Thu, Apr 14, 2022 at 12:24 AM Dmitry Osipenko
 wrote:
>
> On 4/13/22 21:48, Rafael J. Wysocki wrote:
> > On Tue, Apr 12, 2022 at 1:39 AM Dmitry Osipenko
> >  wrote:
> >>
> >> Add sanity check which ensures that there are no two restart handlers
> >> registered using the same priority. This requirement will become mandatory
> >> once all drivers will be converted to the new API and such errors will be
> >> fixed.
> >>
> >> Signed-off-by: Dmitry Osipenko 
> >
> > The first two patches in the series are fine with me and there's only
> > one minor nit regarding this one (below).
> >
> >> ---
> >>  kernel/reboot.c | 15 +++
> >>  1 file changed, 15 insertions(+)
> >>
> >> diff --git a/kernel/reboot.c b/kernel/reboot.c
> >> index ed4e6dfb7d44..acdae4e95061 100644
> >> --- a/kernel/reboot.c
> >> +++ b/kernel/reboot.c
> >> @@ -182,6 +182,21 @@ static ATOMIC_NOTIFIER_HEAD(restart_handler_list);
> >>   */
> >>  int register_restart_handler(struct notifier_block *nb)
> >>  {
> >> +   int ret;
> >> +
> >> +   ret = 
> >> atomic_notifier_chain_register_unique_prio(_handler_list, nb);
> >> +   if (ret != -EBUSY)
> >> +   return ret;
> >> +
> >> +   /*
> >> +* Handler must have unique priority. Otherwise call order is
> >> +* determined by registration order, which is unreliable.
> >> +*
> >> +* This requirement will become mandatory once all drivers
> >> +* will be converted to use new sys-off API.
> >> +*/
> >> +   pr_err("failed to register restart handler using unique 
> >> priority\n");
> >
> > I would use pr_info() here, because this is not a substantial error AFAICS.
>
> It's indeed not a substantial error so far, but it will become
> substantial later on once only unique priorities will be allowed. The
> pr_warn() could be a good compromise here, pr_info() is too mild, IMO.

Well, I'm still unconvinced about requiring all of the users of this
interface to use unique priorities.

Arguably, there are some of them who don't really care about the
ordering, so could there be an option for them to specify the lack of
care by, say, passing 0 as the priority that would be regarded as a
special case?

IOW, if you pass 0, you'll be run along the others who've also passed
0, but if you pass anything different from 0, it must be unique.  What
do you think?

Re: [PATCH v7 03/20] reboot: Print error message if restart handler has duplicated priority

2022-04-13 Thread Rafael J. Wysocki

On Tue, Apr 12, 2022 at 1:39 AM Dmitry Osipenko
 wrote:
>
> Add sanity check which ensures that there are no two restart handlers
> registered using the same priority. This requirement will become mandatory
> once all drivers will be converted to the new API and such errors will be
> fixed.
>
> Signed-off-by: Dmitry Osipenko 

The first two patches in the series are fine with me and there's only
one minor nit regarding this one (below).

> ---
>  kernel/reboot.c | 15 +++
>  1 file changed, 15 insertions(+)
>
> diff --git a/kernel/reboot.c b/kernel/reboot.c
> index ed4e6dfb7d44..acdae4e95061 100644
> --- a/kernel/reboot.c
> +++ b/kernel/reboot.c
> @@ -182,6 +182,21 @@ static ATOMIC_NOTIFIER_HEAD(restart_handler_list);
>   */
>  int register_restart_handler(struct notifier_block *nb)
>  {
> +   int ret;
> +
> +   ret = 
> atomic_notifier_chain_register_unique_prio(_handler_list, nb);
> +   if (ret != -EBUSY)
> +   return ret;
> +
> +   /*
> +* Handler must have unique priority. Otherwise call order is
> +* determined by registration order, which is unreliable.
> +*
> +* This requirement will become mandatory once all drivers
> +* will be converted to use new sys-off API.
> +*/
> +   pr_err("failed to register restart handler using unique priority\n");

I would use pr_info() here, because this is not a substantial error AFAICS.

> +
> return atomic_notifier_chain_register(_handler_list, nb);
>  }
>  EXPORT_SYMBOL(register_restart_handler);
> --

Re: [PATCH v6 00/21] Introduce power-off+restart call chain API

2022-02-16 Thread Rafael J. Wysocki

On Tue, Feb 15, 2022 at 11:00 PM Dmitry Osipenko  wrote:
>
> 31.01.2022 02:36, Dmitry Osipenko пишет:
> > Problem
> > ---
> >
> > SoC devices require power-off call chaining functionality from kernel.
> > We have a widely used restart chaining provided by restart notifier API,
> > but nothing for power-off.
> >
> > Solution
> > 
> >
> > Introduce new API that provides both restart and power-off call chains.
> >
> > Why combine restart with power-off? Because drivers often do both.
> > More practical to have API that provides both under the same roof.
> >
> > The new API is designed with simplicity and extensibility in mind.
> > It's built upon the existing restart and reboot APIs. The simplicity
> > is in new helper functions that are convenient for drivers. The
> > extensibility is in the design that doesn't hardcode callback
> > arguments, making easy to add new parameters and remove old.
> >
> > This is a third attempt to introduce the new API. First was made by
> > Guenter Roeck back in 2014, second was made by Thierry Reding in 2017.
> > In fact the work didn't stop and recently arm_pm_restart() was removed
> > from v5.14 kernel, which was a part of preparatory work started by
> > Guenter Roeck. I took into account experience and ideas from the
> > previous attempts, extended and polished them.
>
>
> Rafael and all, do you see anything critical that needs to be improved
> in this v6?
>
> Will be great if you could take this patchset via the power tree if it
> looks okay, or give an ack.

I need some more time for this, sorry.

I'm a bit concerned about seeing no response to this set from anyone.

It looks like multiple platforms may be affected by it in principle,
so doesn't anyone care?

Re: [PATCH v4 05/25] reboot: Warn if restart handler has duplicated priority

2021-12-10 Thread Rafael J. Wysocki

On Fri, Dec 10, 2021 at 8:04 PM Dmitry Osipenko  wrote:
>
> 10.12.2021 21:27, Rafael J. Wysocki пишет:
> > On Mon, Nov 29, 2021 at 12:34 PM Dmitry Osipenko  wrote:
> >>
> >> 29.11.2021 03:26, Michał Mirosław пишет:
> >>> On Mon, Nov 29, 2021 at 12:06:19AM +0300, Dmitry Osipenko wrote:
> >>>> 28.11.2021 03:28, Michał Mirosław пишет:
> >>>>> On Fri, Nov 26, 2021 at 09:00:41PM +0300, Dmitry Osipenko wrote:
> >>>>>> Add sanity check which ensures that there are no two restart handlers
> >>>>>> registered with the same priority. Normally it's a direct sign of a
> >>>>>> problem if two handlers use the same priority.
> >>>>>
> >>>>> The patch doesn't ensure the property that there are no 
> >>>>> duplicated-priority
> >>>>> entries on the chain.
> >>>>
> >>>> It's not the exact point of this patch.
> >>>>
> >>>>> I'd rather see a atomic_notifier_chain_register_unique() that returns
> >>>>> -EBUSY or something istead of adding an entry with duplicate priority.
> >>>>> That way it would need only one list traversal unless you want to
> >>>>> register the duplicate anyway (then you would call the older
> >>>>> atomic_notifier_chain_register() after reporting the error).
> >>>>
> >>>> The point of this patch is to warn developers about the problem that
> >>>> needs to be fixed. We already have such troubling drivers in mainline.
> >>>>
> >>>> It's not critical to register different handlers with a duplicated
> >>>> priorities, but such cases really need to be corrected. We shouldn't
> >>>> break users' machines during transition to the new API, meanwhile
> >>>> developers should take action of fixing theirs drivers.
> >>>>
> >>>>> (Or you could return > 0 when a duplicate is registered in
> >>>>> atomic_notifier_chain_register() if the callers are prepared
> >>>>> for that. I don't really like this way, though.)
> >>>>
> >>>> I had a similar thought at some point before and decided that I'm not in
> >>>> favor of this approach. It's nicer to have a dedicated function that
> >>>> verifies the uniqueness, IMO.
> >>>
> >>> I don't like the part that it traverses the list second time to check
> >>> the uniqueness. But actually you could avoid that if
> >>> notifier_chain_register() would always add equal-priority entries in
> >>> reverse order:
> >>>
> >>>  static int notifier_chain_register(struct notifier_block **nl,
> >>>   struct notifier_block *n)
> >>>  {
> >>>   while ((*nl) != NULL) {
> >>>   if (unlikely((*nl) == n)) {
> >>>   WARN(1, "double register detected");
> >>>   return 0;
> >>>   }
> >>> - if (n->priority > (*nl)->priority)
> >>> + if (n->priority >= (*nl)->priority)
> >>>   break;
> >>>   nl = &((*nl)->next);
> >>>   }
> >>>   n->next = *nl;
> >>>   rcu_assign_pointer(*nl, n);
> >>>   return 0;
> >>>  }
> >>>
> >>> Then the check for uniqueness after adding would be:
> >>>
> >>>  WARN(nb->next && nb->priority == nb->next->priority);
> >>
> >> We can't just change the registration order because invocation order of
> >> the call chain depends on the registration order
> >
> > It doesn't if unique priorities are required and isn't that what you want?
> >
> >> and some of current
> >> users may rely on that order. I'm pretty sure that changing the order
> >> will have unfortunate consequences.
> >
> > Well, the WARN() doesn't help much then.
> >
> > Either you can make all of the users register with unique priorities,
> > and then you can make the registration reject non-unique ones, or you
> > cannot assume them to be unique.
>
> There is no strong requirement for priorities to be unique, the reboot.c
> code will work properly.

In which case adding the WARN() is not appropriate IMV.

Also I've looked at the existing code and at least in some cases the
order in which the notifiers run doesn't matter.  I'm not sure what
the purpose of this patch is TBH.

> The potential problem is on the user's side and the warning is intended
> to aid the user.

Unless somebody has the panic_on_warn mentioned previously set and
really the user need not understand what the WARN() is about.  IOW,
WARN() helps developers, not users.

> We can make it a strong requirement, but only after converting and
> testing all kernel drivers.

Right.

> I'll consider to add patches for that.

But can you avoid adding more patches to this series?

Re: [PATCH v4 06/25] reboot: Warn if unregister_restart_handler() fails

2021-12-10 Thread Rafael J. Wysocki

On Fri, Dec 10, 2021 at 7:54 PM Dmitry Osipenko  wrote:
>
> 10.12.2021 21:32, Rafael J. Wysocki пишет:
> > On Fri, Nov 26, 2021 at 7:02 PM Dmitry Osipenko  wrote:
> >>
> >> Emit warning if unregister_restart_handler() fails since it never should
> >> fail. This will ease further API development by catching mistakes early.
> >>
> >> Signed-off-by: Dmitry Osipenko 
> >> ---
> >>  kernel/reboot.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/kernel/reboot.c b/kernel/reboot.c
> >> index e6659ae329f1..f0e7b9c13f6b 100644
> >> --- a/kernel/reboot.c
> >> +++ b/kernel/reboot.c
> >> @@ -210,7 +210,7 @@ EXPORT_SYMBOL(register_restart_handler);
> >>   */
> >>  int unregister_restart_handler(struct notifier_block *nb)
> >>  {
> >> -   return atomic_notifier_chain_unregister(_handler_list, nb);
> >> +   return 
> >> WARN_ON(atomic_notifier_chain_unregister(_handler_list, nb));
> >
> > The only reason why it can fail is if the object pointed to by nb is
> > not in the chain.
>
> I had exactly this case where object wasn't in the chain due to a bug
> and this warning was very helpful.

During the development.  In production it would be rather annoying.

> >  Why WARN() about this?  And what about systems with
> > panic_on_warn set?
>
> That warning condition will never happen normally, only when something
> is seriously wrong.
>
> Those systems with panic_on_warn will get what was they asked for.

They may not be asking for panicking on bugs in the reboot notifier
code, though.  That's what your change is making them panic on.

Re: [PATCH v4 03/25] notifier: Add atomic/blocking_notifier_has_unique_priority()

2021-12-10 Thread Rafael J. Wysocki

On Fri, Dec 10, 2021 at 7:52 PM Dmitry Osipenko  wrote:
>
> 10.12.2021 21:19, Rafael J. Wysocki пишет:
> ...
> >> +bool atomic_notifier_has_unique_priority(struct atomic_notifier_head *nh,
> >> +   struct notifier_block *n)
> >> +{
> >> +   unsigned long flags;
> >> +   bool ret;
> >> +
> >> +   spin_lock_irqsave(>lock, flags);
> >> +   ret = notifier_has_unique_priority(>head, n);
> >> +   spin_unlock_irqrestore(>lock, flags);
> >
> > This only works if the caller can prevent new entries from being added
> > to the list at this point or if the caller knows that they cannot be
> > added for some reason, but the kerneldoc doesn't mention this
> > limitation.
>
> I'll update the comment.
>
> ..
> >> +bool blocking_notifier_has_unique_priority(struct blocking_notifier_head 
> >> *nh,
> >> +   struct notifier_block *n)
> >> +{
> >> +   bool ret;
> >> +
> >> +   /*
> >> +* This code gets used during boot-up, when task switching is
> >> +* not yet working and interrupts must remain disabled. At such
> >> +* times we must not call down_read().
> >> +*/
> >> +   if (system_state != SYSTEM_BOOTING)
> >
> > No, please don't do this, it makes the whole thing error-prone.
>
> What should I do then?

First of all, do you know of any users who may want to call this
during early initialization?  If so, then why may they want to do
that?

Depending on the above, I would consider adding a special mechanism for them.

> >> +   down_read(>rwsem);
> >> +
> >> +   ret = notifier_has_unique_priority(>head, n);
> >> +
> >> +   if (system_state != SYSTEM_BOOTING)
> >> +   up_read(>rwsem);
> >
> > And still what if a new entry with a non-unique priority is added to
> > the chain at this point?
>
> If entry with a non-unique priority is added after the check, then
> obviously it won't be detected.

Why isn't this a problem?

> I don't understand the question. These
> down/up_read() are the locks that prevent the race, if that's the question.

Not really, they only prevent the race from occurring while
notifier_has_unique_priority() is running.

If anyone depends on this check for correctness, they need to lock the
rwsem, do the check, do the thing depending on the check while holding
the rwsem and then release the rwsem.  Otherwise it is racy.

Re: [PATCH v4 07/25] reboot: Remove extern annotation from function prototypes

2021-12-10 Thread Rafael J. Wysocki

On Fri, Dec 10, 2021 at 7:16 PM Dmitry Osipenko  wrote:
>
> 10.12.2021 21:09, Rafael J. Wysocki пишет:
> > On Fri, Nov 26, 2021 at 7:02 PM Dmitry Osipenko  wrote:
> >>
> >> There is no need to annotate function prototypes with 'extern', it makes
> >> code less readable. Remove unnecessary annotations from .
> >>
> >> Signed-off-by: Dmitry Osipenko 
> >
> > I'm not sure that this is really useful.
> >
> > Personally, I tend to respect the existing conventions like this.
> >
> > Surely, this change is not required for the rest of the series to work.
>
> Problem that such things start to spread all over the kernel with a
> copy-paste approach if there is nobody to clean up the code.
>
> This is not a common convention and sometimes it's getting corrected [1].
>
> [1] https://git.kernel.org/linus/6d7434931

In separate patches outside of series adding new features, if one is
so inclined.

Re: [PATCH v4 06/25] reboot: Warn if unregister_restart_handler() fails

2021-12-10 Thread Rafael J. Wysocki

On Fri, Nov 26, 2021 at 7:02 PM Dmitry Osipenko  wrote:
>
> Emit warning if unregister_restart_handler() fails since it never should
> fail. This will ease further API development by catching mistakes early.
>
> Signed-off-by: Dmitry Osipenko 
> ---
>  kernel/reboot.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/reboot.c b/kernel/reboot.c
> index e6659ae329f1..f0e7b9c13f6b 100644
> --- a/kernel/reboot.c
> +++ b/kernel/reboot.c
> @@ -210,7 +210,7 @@ EXPORT_SYMBOL(register_restart_handler);
>   */
>  int unregister_restart_handler(struct notifier_block *nb)
>  {
> -   return atomic_notifier_chain_unregister(_handler_list, nb);
> +   return 
> WARN_ON(atomic_notifier_chain_unregister(_handler_list, nb));

The only reason why it can fail is if the object pointed to by nb is
not in the chain.  Why WARN() about this?  And what about systems with
panic_on_warn set?

>  }
>  EXPORT_SYMBOL(unregister_restart_handler);
>
> --
> 2.33.1
>

Re: [PATCH v4 05/25] reboot: Warn if restart handler has duplicated priority

2021-12-10 Thread Rafael J. Wysocki

On Mon, Nov 29, 2021 at 12:34 PM Dmitry Osipenko  wrote:
>
> 29.11.2021 03:26, Michał Mirosław пишет:
> > On Mon, Nov 29, 2021 at 12:06:19AM +0300, Dmitry Osipenko wrote:
> >> 28.11.2021 03:28, Michał Mirosław пишет:
> >>> On Fri, Nov 26, 2021 at 09:00:41PM +0300, Dmitry Osipenko wrote:
>  Add sanity check which ensures that there are no two restart handlers
>  registered with the same priority. Normally it's a direct sign of a
>  problem if two handlers use the same priority.
> >>>
> >>> The patch doesn't ensure the property that there are no 
> >>> duplicated-priority
> >>> entries on the chain.
> >>
> >> It's not the exact point of this patch.
> >>
> >>> I'd rather see a atomic_notifier_chain_register_unique() that returns
> >>> -EBUSY or something istead of adding an entry with duplicate priority.
> >>> That way it would need only one list traversal unless you want to
> >>> register the duplicate anyway (then you would call the older
> >>> atomic_notifier_chain_register() after reporting the error).
> >>
> >> The point of this patch is to warn developers about the problem that
> >> needs to be fixed. We already have such troubling drivers in mainline.
> >>
> >> It's not critical to register different handlers with a duplicated
> >> priorities, but such cases really need to be corrected. We shouldn't
> >> break users' machines during transition to the new API, meanwhile
> >> developers should take action of fixing theirs drivers.
> >>
> >>> (Or you could return > 0 when a duplicate is registered in
> >>> atomic_notifier_chain_register() if the callers are prepared
> >>> for that. I don't really like this way, though.)
> >>
> >> I had a similar thought at some point before and decided that I'm not in
> >> favor of this approach. It's nicer to have a dedicated function that
> >> verifies the uniqueness, IMO.
> >
> > I don't like the part that it traverses the list second time to check
> > the uniqueness. But actually you could avoid that if
> > notifier_chain_register() would always add equal-priority entries in
> > reverse order:
> >
> >  static int notifier_chain_register(struct notifier_block **nl,
> >   struct notifier_block *n)
> >  {
> >   while ((*nl) != NULL) {
> >   if (unlikely((*nl) == n)) {
> >   WARN(1, "double register detected");
> >   return 0;
> >   }
> > - if (n->priority > (*nl)->priority)
> > + if (n->priority >= (*nl)->priority)
> >   break;
> >   nl = &((*nl)->next);
> >   }
> >   n->next = *nl;
> >   rcu_assign_pointer(*nl, n);
> >   return 0;
> >  }
> >
> > Then the check for uniqueness after adding would be:
> >
> >  WARN(nb->next && nb->priority == nb->next->priority);
>
> We can't just change the registration order because invocation order of
> the call chain depends on the registration order

It doesn't if unique priorities are required and isn't that what you want?

> and some of current
> users may rely on that order. I'm pretty sure that changing the order
> will have unfortunate consequences.

Well, the WARN() doesn't help much then.

Either you can make all of the users register with unique priorities,
and then you can make the registration reject non-unique ones, or you
cannot assume them to be unique.

Re: [PATCH v4 04/25] reboot: Correct typo in a comment

2021-12-10 Thread Rafael J. Wysocki

On Fri, Nov 26, 2021 at 7:02 PM Dmitry Osipenko  wrote:
>
> Correct s/implemenations/implementations/ in .
>
> Signed-off-by: Dmitry Osipenko 

This patch clearly need not be part of this series.

> ---
>  include/linux/reboot.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/linux/reboot.h b/include/linux/reboot.h
> index af907a3d68d1..7c288013a3ca 100644
> --- a/include/linux/reboot.h
> +++ b/include/linux/reboot.h
> @@ -63,7 +63,7 @@ struct pt_regs;
>  extern void machine_crash_shutdown(struct pt_regs *);
>
>  /*
> - * Architecture independent implemenations of sys_reboot commands.
> + * Architecture independent implementations of sys_reboot commands.
>   */
>
>  extern void kernel_restart_prepare(char *cmd);
> --
> 2.33.1
>

Re: [PATCH v4 03/25] notifier: Add atomic/blocking_notifier_has_unique_priority()

2021-12-10 Thread Rafael J. Wysocki

On Fri, Nov 26, 2021 at 7:02 PM Dmitry Osipenko  wrote:
>
> Add atomic/blocking_notifier_has_unique_priority() helpers which return
> true if given handler has unique priority.
>
> Signed-off-by: Dmitry Osipenko 
> ---
>  include/linux/notifier.h |  5 +++
>  kernel/notifier.c| 69 
>  2 files changed, 74 insertions(+)
>
> diff --git a/include/linux/notifier.h b/include/linux/notifier.h
> index 924c9d7c8e73..2c4036f225e1 100644
> --- a/include/linux/notifier.h
> +++ b/include/linux/notifier.h
> @@ -175,6 +175,11 @@ int raw_notifier_call_chain_robust(struct 
> raw_notifier_head *nh,
>
>  bool blocking_notifier_call_chain_is_empty(struct blocking_notifier_head 
> *nh);
>
> +bool atomic_notifier_has_unique_priority(struct atomic_notifier_head *nh,
> +   struct notifier_block *nb);
> +bool blocking_notifier_has_unique_priority(struct blocking_notifier_head *nh,
> +   struct notifier_block *nb);
> +
>  #define NOTIFY_DONE0x  /* Don't care */
>  #define NOTIFY_OK  0x0001  /* Suits me */
>  #define NOTIFY_STOP_MASK   0x8000  /* Don't call further */
> diff --git a/kernel/notifier.c b/kernel/notifier.c
> index b20cb7b9b1f0..7a325b742104 100644
> --- a/kernel/notifier.c
> +++ b/kernel/notifier.c
> @@ -122,6 +122,19 @@ static int notifier_call_chain_robust(struct 
> notifier_block **nl,
> return ret;
>  }
>
> +static int notifier_has_unique_priority(struct notifier_block **nl,
> +   struct notifier_block *n)
> +{
> +   while (*nl && (*nl)->priority >= n->priority) {
> +   if ((*nl)->priority == n->priority && *nl != n)
> +   return false;
> +
> +   nl = &((*nl)->next);
> +   }
> +
> +   return true;
> +}
> +
>  /*
>   * Atomic notifier chain routines.  Registration and unregistration
>   * use a spinlock, and call_chain is synchronized by RCU (no locks).
> @@ -203,6 +216,30 @@ int atomic_notifier_call_chain(struct 
> atomic_notifier_head *nh,
>  EXPORT_SYMBOL_GPL(atomic_notifier_call_chain);
>  NOKPROBE_SYMBOL(atomic_notifier_call_chain);
>
> +/**
> + * atomic_notifier_has_unique_priority - Checks whether notifier's 
> priority is unique
> + * @nh: Pointer to head of the atomic notifier chain
> + * @n: Entry in notifier chain to check
> + *
> + * Checks whether there is another notifier in the chain with the same 
> priority.
> + * Must be called in process context.
> + *
> + * Returns true if priority is unique, false otherwise.
> + */
> +bool atomic_notifier_has_unique_priority(struct atomic_notifier_head *nh,
> +   struct notifier_block *n)
> +{
> +   unsigned long flags;
> +   bool ret;
> +
> +   spin_lock_irqsave(>lock, flags);
> +   ret = notifier_has_unique_priority(>head, n);
> +   spin_unlock_irqrestore(>lock, flags);

This only works if the caller can prevent new entries from being added
to the list at this point or if the caller knows that they cannot be
added for some reason, but the kerneldoc doesn't mention this
limitation.

> +
> +   return ret;
> +}
> +EXPORT_SYMBOL_GPL(atomic_notifier_has_unique_priority);
> +
>  /*
>   * Blocking notifier chain routines.  All access to the chain is
>   * synchronized by an rwsem.
> @@ -336,6 +373,38 @@ bool blocking_notifier_call_chain_is_empty(struct 
> blocking_notifier_head *nh)
>  }
>  EXPORT_SYMBOL_GPL(blocking_notifier_call_chain_is_empty);
>
> +/**
> + * blocking_notifier_has_unique_priority - Checks whether notifier's 
> priority is unique
> + * @nh: Pointer to head of the blocking notifier chain
> + * @n: Entry in notifier chain to check
> + *
> + * Checks whether there is another notifier in the chain with the same 
> priority.
> + * Must be called in process context.
> + *
> + * Returns true if priority is unique, false otherwise.
> + */
> +bool blocking_notifier_has_unique_priority(struct blocking_notifier_head *nh,
> +   struct notifier_block *n)
> +{
> +   bool ret;
> +
> +   /*
> +* This code gets used during boot-up, when task switching is
> +* not yet working and interrupts must remain disabled. At such
> +* times we must not call down_read().
> +*/
> +   if (system_state != SYSTEM_BOOTING)

No, please don't do this, it makes the whole thing error-prone.

> +   down_read(>rwsem);
> +
> +   ret = notifier_has_unique_priority(>head, n);
> +
> +   if (system_state != SYSTEM_BOOTING)
> +   up_read(>rwsem);

And still what if a new entry with a non-unique priority is added to
the chain at this point?

> +
> +   return ret;
> +}
> +EXPORT_SYMBOL_GPL(blocking_notifier_has_unique_priority);
> +
>  /*
>   * Raw notifier chain routines.  There is no protection;
>   * the caller must provide it.  Use at your own risk!
> --
> 2.33.1
>

Re: [PATCH v4 02/25] notifier: Add blocking_notifier_call_chain_is_empty()

2021-12-10 Thread Rafael J. Wysocki

On Fri, Nov 26, 2021 at 7:01 PM Dmitry Osipenko  wrote:
>
> Add blocking_notifier_call_chain_is_empty() that returns true if call
> chain is empty.
>
> Signed-off-by: Dmitry Osipenko 
> ---
>  include/linux/notifier.h |  2 ++
>  kernel/notifier.c| 14 ++
>  2 files changed, 16 insertions(+)
>
> diff --git a/include/linux/notifier.h b/include/linux/notifier.h
> index 4b80a815b666..924c9d7c8e73 100644
> --- a/include/linux/notifier.h
> +++ b/include/linux/notifier.h
> @@ -173,6 +173,8 @@ int blocking_notifier_call_chain_robust(struct 
> blocking_notifier_head *nh,
>  int raw_notifier_call_chain_robust(struct raw_notifier_head *nh,
> unsigned long val_up, unsigned long val_down, void *v);
>
> +bool blocking_notifier_call_chain_is_empty(struct blocking_notifier_head 
> *nh);
> +
>  #define NOTIFY_DONE0x  /* Don't care */
>  #define NOTIFY_OK  0x0001  /* Suits me */
>  #define NOTIFY_STOP_MASK   0x8000  /* Don't call further */
> diff --git a/kernel/notifier.c b/kernel/notifier.c
> index b8251dc0bc0f..b20cb7b9b1f0 100644
> --- a/kernel/notifier.c
> +++ b/kernel/notifier.c
> @@ -322,6 +322,20 @@ int blocking_notifier_call_chain(struct 
> blocking_notifier_head *nh,
>  }
>  EXPORT_SYMBOL_GPL(blocking_notifier_call_chain);
>
> +/**
> + * blocking_notifier_call_chain_is_empty - Check whether notifier chain 
> is empty
> + * @nh: Pointer to head of the blocking notifier chain
> + *
> + * Checks whether notifier chain is empty.
> + *
> + * Returns true is notifier chain is empty, false otherwise.
> + */
> +bool blocking_notifier_call_chain_is_empty(struct blocking_notifier_head *nh)
> +{
> +   return !rcu_access_pointer(nh->head);
> +}
> +EXPORT_SYMBOL_GPL(blocking_notifier_call_chain_is_empty);

The check is not reliable (racy) without locking, so I wouldn't export
anything like this to modules.

At least IMO it should be added along with a user.

Re: [PATCH v4 07/25] reboot: Remove extern annotation from function prototypes

2021-12-10 Thread Rafael J. Wysocki

On Fri, Nov 26, 2021 at 7:02 PM Dmitry Osipenko  wrote:
>
> There is no need to annotate function prototypes with 'extern', it makes
> code less readable. Remove unnecessary annotations from .
>
> Signed-off-by: Dmitry Osipenko 

I'm not sure that this is really useful.

Personally, I tend to respect the existing conventions like this.

Surely, this change is not required for the rest of the series to work.

> ---
>  include/linux/reboot.h | 38 +++---
>  1 file changed, 19 insertions(+), 19 deletions(-)
>
> diff --git a/include/linux/reboot.h b/include/linux/reboot.h
> index 7c288013a3ca..b7fa25726323 100644
> --- a/include/linux/reboot.h
> +++ b/include/linux/reboot.h
> @@ -40,36 +40,36 @@ extern int reboot_cpu;
>  extern int reboot_force;
>
>
> -extern int register_reboot_notifier(struct notifier_block *);
> -extern int unregister_reboot_notifier(struct notifier_block *);
> +int register_reboot_notifier(struct notifier_block *);
> +int unregister_reboot_notifier(struct notifier_block *);
>
> -extern int devm_register_reboot_notifier(struct device *, struct 
> notifier_block *);
> +int devm_register_reboot_notifier(struct device *, struct notifier_block *);
>
> -extern int register_restart_handler(struct notifier_block *);
> -extern int unregister_restart_handler(struct notifier_block *);
> -extern void do_kernel_restart(char *cmd);
> +int register_restart_handler(struct notifier_block *);
> +int unregister_restart_handler(struct notifier_block *);
> +void do_kernel_restart(char *cmd);
>
>  /*
>   * Architecture-specific implementations of sys_reboot commands.
>   */
>
> -extern void migrate_to_reboot_cpu(void);
> -extern void machine_restart(char *cmd);
> -extern void machine_halt(void);
> -extern void machine_power_off(void);
> +void migrate_to_reboot_cpu(void);
> +void machine_restart(char *cmd);
> +void machine_halt(void);
> +void machine_power_off(void);
>
> -extern void machine_shutdown(void);
> +void machine_shutdown(void);
>  struct pt_regs;
> -extern void machine_crash_shutdown(struct pt_regs *);
> +void machine_crash_shutdown(struct pt_regs *);
>
>  /*
>   * Architecture independent implementations of sys_reboot commands.
>   */
>
> -extern void kernel_restart_prepare(char *cmd);
> -extern void kernel_restart(char *cmd);
> -extern void kernel_halt(void);
> -extern void kernel_power_off(void);
> +void kernel_restart_prepare(char *cmd);
> +void kernel_restart(char *cmd);
> +void kernel_halt(void);
> +void kernel_power_off(void);
>
>  extern int C_A_D; /* for sysctl */
>  void ctrl_alt_del(void);
> @@ -77,15 +77,15 @@ void ctrl_alt_del(void);
>  #define POWEROFF_CMD_PATH_LEN  256
>  extern char poweroff_cmd[POWEROFF_CMD_PATH_LEN];
>
> -extern void orderly_poweroff(bool force);
> -extern void orderly_reboot(void);
> +void orderly_poweroff(bool force);
> +void orderly_reboot(void);
>  void hw_protection_shutdown(const char *reason, int ms_until_forced);
>
>  /*
>   * Emergency restart, callable from an interrupt handler.
>   */
>
> -extern void emergency_restart(void);
> +void emergency_restart(void);
>  #include 
>
>  #endif /* _LINUX_REBOOT_H */
> --
> 2.33.1
>

Re: [PATCH v2 08/45] kernel: Add combined power-off+restart handler call chain API

2021-10-28 Thread Rafael J. Wysocki

On Wed, Oct 27, 2021 at 11:18 PM Dmitry Osipenko  wrote:
>
> SoC platforms often have multiple options of how to perform system's
> power-off and restart operations. Meanwhile today's kernel is limited to
> a single option. Add combined power-off+restart handler call chain API,
> which is inspired by the restart API. The new API provides both power-off
> and restart functionality.
>
> The old pm_power_off method will be kept around till all users are
> converted to the new API.
>
> Current restart API will be replaced by the new unified API since
> new API is its superset. The restart functionality of the power-handler
> API is built upon the existing restart-notifier APIs.
>
> In order to ease conversion to the new API, convenient helpers are added
> for the common use-cases. They will reduce amount of boilerplate code and
> remove global variables. These helpers preserve old behaviour for cases
> where only one power-off handler is executed, this is what existing
> drivers want, and thus, they could be easily converted to the new API.
> Users of the new API should explicitly enable power-off chaining by
> setting corresponding flag of the power_handler structure.
>
> Signed-off-by: Dmitry Osipenko 
> ---
>  include/linux/reboot.h   | 176 +++-
>  kernel/power/hibernate.c |   2 +-
>  kernel/reboot.c  | 601 ++-
>  3 files changed, 768 insertions(+), 11 deletions(-)
>
> diff --git a/include/linux/reboot.h b/include/linux/reboot.h
> index b7fa25726323..0ec835338c27 100644
> --- a/include/linux/reboot.h
> +++ b/include/linux/reboot.h
> @@ -8,10 +8,16 @@
>
>  struct device;
>
> -#define SYS_DOWN   0x0001  /* Notify of system down */
> -#define SYS_RESTARTSYS_DOWN
> -#define SYS_HALT   0x0002  /* Notify of system halt */
> -#define SYS_POWER_OFF  0x0003  /* Notify of system power off */
> +enum reboot_prepare_mode {
> +   SYS_DOWN = 1,   /* Notify of system down */
> +   SYS_RESTART = SYS_DOWN,
> +   SYS_HALT,   /* Notify of system halt */
> +   SYS_POWER_OFF,  /* Notify of system power off */
> +};
> +
> +#define RESTART_PRIO_RESERVED  0
> +#define RESTART_PRIO_DEFAULT   128
> +#define RESTART_PRIO_HIGH  192
>
>  enum reboot_mode {
> REBOOT_UNDEFINED = -1,
> @@ -49,6 +55,167 @@ int register_restart_handler(struct notifier_block *);
>  int unregister_restart_handler(struct notifier_block *);
>  void do_kernel_restart(char *cmd);
>
> +/*
> + * Unified poweroff + restart API.
> + */
> +
> +#define POWEROFF_PRIO_RESERVED 0
> +#define POWEROFF_PRIO_PLATFORM 1
> +#define POWEROFF_PRIO_DEFAULT  128
> +#define POWEROFF_PRIO_HIGH 192
> +#define POWEROFF_PRIO_FIRMWARE 224

Also I'm wondering why these particular numbers were chosen, here and above?

> +
> +enum poweroff_mode {
> +   POWEROFF_NORMAL = 0,
> +   POWEROFF_PREPARE,
> +};
> +
> +struct power_off_data {
> +   void *cb_data;
> +};
> +
> +struct power_off_prep_data {
> +   void *cb_data;
> +};
> +
> +struct restart_data {
> +   void *cb_data;
> +   const char *cmd;
> +   enum reboot_mode mode;
> +};
> +
> +struct reboot_prep_data {
> +   void *cb_data;
> +   const char *cmd;
> +   enum reboot_prepare_mode mode;
> +};
> +
> +struct power_handler_private_data {
> +   struct notifier_block reboot_prep_nb;
> +   struct notifier_block power_off_nb;
> +   struct notifier_block restart_nb;
> +   void (*trivial_power_off_cb)(void);
> +   void (*simple_power_off_cb)(void *data);
> +   void *simple_power_off_cb_data;
> +   bool registered;
> +};
> +
> +/**
> + * struct power_handler - Machine power-off + restart handler
> + *
> + * Describes power-off and restart handlers which are invoked by kernel
> + * to power off or restart this machine.  Supports prioritized chaining for
> + * both restart and power-off handlers.  Callback's priority must be unique.
> + * Intended to be used by device drivers that are responsible for restarting
> + * and powering off hardware which kernel is running on.
> + *
> + * Struct power_handler can be static.  Members of this structure must not be
> + * altered while handler is registered.
> + *
> + * Fill the structure members and pass it to register_power_handler().
> + */
> +struct power_handler {
> +   /**
> +* @cb_data:
> +*
> +* User data included in callback's argument.
> +*/

And here I would document the structure fields in the main kerneldoc
comment above.

As is, it is a bit hard to grasp the whole definition.

> +   void *cb_data;
> +
> +   /**
> +* @power_off_cb:
> +*
> +* Callback that should turn off machine.  Inactive if NULL.
> +*/
> +   void (*power_off_cb)(struct power_off_data *data);
> +
> +   /**
> +* @power_off_prepare_cb:
> +*
> +* Power-off preparation

Re: [PATCH v2 08/45] kernel: Add combined power-off+restart handler call chain API

2021-10-28 Thread Rafael J. Wysocki

On Wed, Oct 27, 2021 at 11:18 PM Dmitry Osipenko  wrote:
>
> SoC platforms often have multiple options of how to perform system's
> power-off and restart operations. Meanwhile today's kernel is limited to
> a single option. Add combined power-off+restart handler call chain API,
> which is inspired by the restart API. The new API provides both power-off
> and restart functionality.
>
> The old pm_power_off method will be kept around till all users are
> converted to the new API.
>
> Current restart API will be replaced by the new unified API since
> new API is its superset. The restart functionality of the power-handler
> API is built upon the existing restart-notifier APIs.
>
> In order to ease conversion to the new API, convenient helpers are added
> for the common use-cases. They will reduce amount of boilerplate code and
> remove global variables. These helpers preserve old behaviour for cases
> where only one power-off handler is executed, this is what existing
> drivers want, and thus, they could be easily converted to the new API.
> Users of the new API should explicitly enable power-off chaining by
> setting corresponding flag of the power_handler structure.
>
> Signed-off-by: Dmitry Osipenko 
> ---
>  include/linux/reboot.h   | 176 +++-
>  kernel/power/hibernate.c |   2 +-
>  kernel/reboot.c  | 601 ++-
>  3 files changed, 768 insertions(+), 11 deletions(-)
>
> diff --git a/include/linux/reboot.h b/include/linux/reboot.h
> index b7fa25726323..0ec835338c27 100644
> --- a/include/linux/reboot.h
> +++ b/include/linux/reboot.h
> @@ -8,10 +8,16 @@
>
>  struct device;
>
> -#define SYS_DOWN   0x0001  /* Notify of system down */
> -#define SYS_RESTARTSYS_DOWN
> -#define SYS_HALT   0x0002  /* Notify of system halt */
> -#define SYS_POWER_OFF  0x0003  /* Notify of system power off */
> +enum reboot_prepare_mode {
> +   SYS_DOWN = 1,   /* Notify of system down */
> +   SYS_RESTART = SYS_DOWN,
> +   SYS_HALT,   /* Notify of system halt */
> +   SYS_POWER_OFF,  /* Notify of system power off */
> +};
> +
> +#define RESTART_PRIO_RESERVED  0
> +#define RESTART_PRIO_DEFAULT   128
> +#define RESTART_PRIO_HIGH  192
>
>  enum reboot_mode {
> REBOOT_UNDEFINED = -1,
> @@ -49,6 +55,167 @@ int register_restart_handler(struct notifier_block *);
>  int unregister_restart_handler(struct notifier_block *);
>  void do_kernel_restart(char *cmd);
>
> +/*
> + * Unified poweroff + restart API.
> + */
> +
> +#define POWEROFF_PRIO_RESERVED 0
> +#define POWEROFF_PRIO_PLATFORM 1
> +#define POWEROFF_PRIO_DEFAULT  128
> +#define POWEROFF_PRIO_HIGH 192
> +#define POWEROFF_PRIO_FIRMWARE 224
> +
> +enum poweroff_mode {
> +   POWEROFF_NORMAL = 0,
> +   POWEROFF_PREPARE,
> +};
> +
> +struct power_off_data {
> +   void *cb_data;
> +};
> +
> +struct power_off_prep_data {
> +   void *cb_data;
> +};
> +
> +struct restart_data {
> +   void *cb_data;
> +   const char *cmd;
> +   enum reboot_mode mode;
> +};
> +
> +struct reboot_prep_data {
> +   void *cb_data;
> +   const char *cmd;
> +   enum reboot_prepare_mode mode;
> +};
> +
> +struct power_handler_private_data {
> +   struct notifier_block reboot_prep_nb;
> +   struct notifier_block power_off_nb;
> +   struct notifier_block restart_nb;
> +   void (*trivial_power_off_cb)(void);
> +   void (*simple_power_off_cb)(void *data);
> +   void *simple_power_off_cb_data;
> +   bool registered;
> +};
> +
> +/**
> + * struct power_handler - Machine power-off + restart handler
> + *
> + * Describes power-off and restart handlers which are invoked by kernel
> + * to power off or restart this machine.  Supports prioritized chaining for
> + * both restart and power-off handlers.  Callback's priority must be unique.
> + * Intended to be used by device drivers that are responsible for restarting
> + * and powering off hardware which kernel is running on.
> + *
> + * Struct power_handler can be static.  Members of this structure must not be
> + * altered while handler is registered.
> + *
> + * Fill the structure members and pass it to register_power_handler().
> + */
> +struct power_handler {

The name of this structure is too generic IMV.  There are many things
that it might apply to in principle.

What about calling power_off_handler or sys_off_handler as it need not
be about power at all?

Re: [PATCH 1/2] PM: base: power: don't try to use non-existing RTC for storing data

2021-09-06 Thread Rafael J. Wysocki

On Fri, Sep 3, 2021 at 11:02 AM Juergen Gross  wrote:
>
> On 03.09.21 10:56, Greg Kroah-Hartman wrote:
> > On Fri, Sep 03, 2021 at 10:49:36AM +0200, Juergen Gross wrote:
> >> In there is no legacy RTC device, don't try to use it for storing trace
> >> data across suspend/resume.
> >>
> >> Cc: 
> >> Signed-off-by: Juergen Gross 
> >> ---
> >>   drivers/base/power/trace.c | 10 ++
> >>   1 file changed, 10 insertions(+)
> >>
> >> diff --git a/drivers/base/power/trace.c b/drivers/base/power/trace.c
> >> index a97f33d0c59f..b7c80849455c 100644
> >> --- a/drivers/base/power/trace.c
> >> +++ b/drivers/base/power/trace.c
> >> @@ -13,6 +13,7 @@
> >>   #include 
> >>   #include 
> >>   #include 
> >> +#include 
> >>
> >>   #include 
> >>
> >> @@ -165,6 +166,9 @@ void generate_pm_trace(const void *tracedata, unsigned 
> >> int user)
> >>  const char *file = *(const char **)(tracedata + 2);
> >>  unsigned int user_hash_value, file_hash_value;
> >>
> >> +if (!x86_platform.legacy.rtc)
> >> +return 0;
> >
> > Why does the driver core code here care about a platform/arch-specific
> > thing at all?  Did you just break all other arches?
>
> This file is only compiled for x86. It depends on CONFIG_PM_TRACE_RTC,
> which has a "depends on X86" attribute.

This feature uses the CMOS RTC memory to store data, so if that memory
is not present, it's better to avoid using it.

Please feel free to add

Reviewed-by: Rafael J. Wysocki 

to this patch or let me know if you want me to take it.

Re: [PATCH v2 4/4] bus: Make remove callback return void

2021-07-06 Thread Rafael J. Wysocki

On Tue, Jul 6, 2021 at 5:53 PM Uwe Kleine-König
 wrote:
>
> The driver core ignores the return value of this callback because there
> is only little it can do when a device disappears.
>
> This is the final bit of a long lasting cleanup quest where several
> buses were converted to also return void from their remove callback.
> Additionally some resource leaks were fixed that were caused by drivers
> returning an error code in the expectation that the driver won't go
> away.
>
> With struct bus_type::remove returning void it's prevented that newly
> implemented buses return an ignored error code and so don't anticipate
> wrong expectations for driver authors.
>
> Acked-by: Russell King (Oracle)  (For ARM, Amba 
> and related parts)
> Acked-by: Mark Brown 
> Acked-by: Chen-Yu Tsai  (for drivers/bus/sunxi-rsb.c)
> Acked-by: Pali Rohár 
> Acked-by: Mauro Carvalho Chehab  (for drivers/media)
> Acked-by: Hans de Goede  (For drivers/platform)
> Acked-by: Alexandre Belloni 
> Acked-By: Vinod Koul 
> Acked-by: Juergen Gross  (For Xen)
> Acked-by: Lee Jones  (For drivers/mfd)
> Acked-by: Johannes Thumshirn  (For drivers/mcb)
> Acked-by: Johan Hovold 
> Acked-by: Srinivas Kandagatla  (For 
> drivers/slimbus)
> Acked-by: Kirti Wankhede  (For drivers/vfio)
> Acked-by: Maximilian Luz 
> Acked-by: Heikki Krogerus  (For ulpi and 
> typec)
> Acked-by: Samuel Iglesias Gonsálvez  (For ipack)
> Reviewed-by: Tom Rix  (For fpga)
> Acked-by: Geoff Levand  (For ps3)
> Signed-off-by: Uwe Kleine-König 

For the ACPI part:

Acked-by: Rafael J. Wysocki 

> ---
>
>  arch/arm/common/locomo.c  | 3 +--
>  arch/arm/common/sa.c  | 4 +---
>  arch/arm/mach-rpc/ecard.c | 4 +---
>  arch/mips/sgi-ip22/ip22-gio.c | 3 +--
>  arch/parisc/kernel/drivers.c  | 5 ++---
>  arch/powerpc/platforms/ps3/system-bus.c   | 3 +--
>  arch/powerpc/platforms/pseries/ibmebus.c  | 3 +--
>  arch/powerpc/platforms/pseries/vio.c  | 3 +--
>  drivers/acpi/bus.c| 3 +--
>  drivers/amba/bus.c| 4 +---
>  drivers/base/auxiliary.c  | 4 +---
>  drivers/base/isa.c| 4 +---
>  drivers/base/platform.c   | 4 +---
>  drivers/bcma/main.c   | 6 ++
>  drivers/bus/sunxi-rsb.c   | 4 +---
>  drivers/cxl/core.c| 3 +--
>  drivers/dax/bus.c | 4 +---
>  drivers/dma/idxd/sysfs.c  | 4 +---
>  drivers/firewire/core-device.c| 4 +---
>  drivers/firmware/arm_scmi/bus.c   | 4 +---
>  drivers/firmware/google/coreboot_table.c  | 4 +---
>  drivers/fpga/dfl.c| 4 +---
>  drivers/hid/hid-core.c| 4 +---
>  drivers/hid/intel-ish-hid/ishtp/bus.c | 4 +---
>  drivers/hv/vmbus_drv.c| 5 +
>  drivers/hwtracing/intel_th/core.c | 4 +---
>  drivers/i2c/i2c-core-base.c   | 5 +
>  drivers/i3c/master.c  | 4 +---
>  drivers/input/gameport/gameport.c | 3 +--
>  drivers/input/serio/serio.c   | 3 +--
>  drivers/ipack/ipack.c | 4 +---
>  drivers/macintosh/macio_asic.c| 4 +---
>  drivers/mcb/mcb-core.c| 4 +---
>  drivers/media/pci/bt8xx/bttv-gpio.c   | 3 +--
>  drivers/memstick/core/memstick.c  | 3 +--
>  drivers/mfd/mcp-core.c| 3 +--
>  drivers/misc/mei/bus.c| 4 +---
>  drivers/misc/tifm_core.c  | 3 +--
>  drivers/mmc/core/bus.c| 4 +---
>  drivers/mmc/core/sdio_bus.c   | 4 +---
>  drivers/net/netdevsim/bus.c   | 3 +--
>  drivers/ntb/core.c| 4 +---
>  drivers/ntb/ntb_transport.c   | 4 +---
>  drivers/nvdimm/bus.c  | 3 +--
>  drivers/pci/endpoint/pci-epf-core.c   | 4 +---
>  drivers/pci/pci-driver.c  | 3 +--
>  drivers/pcmcia/ds.c   | 4 +---
>  drivers/platform/surface/aggregator/bus.c | 4 +---
>  drivers/platform/x86/wmi.c| 4 +---
>  drivers/pnp/driver.c  | 3 +--
>  drivers/rapidio/rio-driver.c  | 4 +---
>  drivers/rpmsg/rpmsg_core.c| 4 +---
>  drivers/s390/cio/ccwgroup.c   | 4 +---
>  drivers/s390/cio/css.c| 4 +---
>  drivers/s390/cio/device.c | 4 +---
>  drivers/s390/cio/scm.c| 4 +---
>  drivers/s390/crypto/ap_bus.c  | 4 +---
>  drivers/scsi/scsi_debug.c | 3 +--
>  drivers/siox/siox-co

Re: [PATCH] xen: Remove support for PV ACPI cpu/memory hotplug

2021-04-13 Thread Rafael J. Wysocki

On Tue, Apr 13, 2021 at 7:53 PM Boris Ostrovsky
 wrote:
>
> Commit 76fc253723ad ("xen/acpi-stub: Disable it b/c the acpi_processor_add
> is no longer called.") declared as BROKEN support for Xen ACPI stub (which
> is required for xen-acpi-{cpu|memory}-hotplug) and suggested that this
> is temporary and will be soon fixed. This was in March 2013.
>
> Further, commit cfafae940381 ("xen: rename dom0_op to platform_op")
> renamed an interface used by memory hotplug code without updating that
> code (as it was BROKEN and therefore not compiled). This was
> in November 2015 and has gone unnoticed for over 5 year.
>
> It is now clear that this code is of no interest to anyone and therefore
> should be removed.
>
> Signed-off-by: Boris Ostrovsky 

Acked-by: Rafael J. Wysocki 

Thanks for doing this!

> ---
>  drivers/xen/Kconfig   |  31 ---
>  drivers/xen/Makefile  |   3 -
>  drivers/xen/pcpu.c|  35 ---
>  drivers/xen/xen-acpi-cpuhotplug.c | 446 ---
>  drivers/xen/xen-acpi-memhotplug.c | 475 
> --
>  drivers/xen/xen-stub.c|  90 
>  include/xen/acpi.h|  35 ---
>  7 files changed, 1115 deletions(-)
>  delete mode 100644 drivers/xen/xen-acpi-cpuhotplug.c
>  delete mode 100644 drivers/xen/xen-acpi-memhotplug.c
>  delete mode 100644 drivers/xen/xen-stub.c
>
> diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
> index ea0efd290c37..5f1ce59b44b9 100644
> --- a/drivers/xen/Kconfig
> +++ b/drivers/xen/Kconfig
> @@ -238,37 +238,6 @@ config XEN_PRIVCMD
> depends on XEN
> default m
>
> -config XEN_STUB
> -   bool "Xen stub drivers"
> -   depends on XEN && X86_64 && BROKEN
> -   help
> - Allow kernel to install stub drivers, to reserve space for Xen 
> drivers,
> - i.e. memory hotplug and cpu hotplug, and to block native drivers 
> loaded,
> - so that real Xen drivers can be modular.
> -
> - To enable Xen features like cpu and memory hotplug, select Y here.
> -
> -config XEN_ACPI_HOTPLUG_MEMORY
> -   tristate "Xen ACPI memory hotplug"
> -   depends on XEN_DOM0 && XEN_STUB && ACPI
> -   help
> - This is Xen ACPI memory hotplug.
> -
> - Currently Xen only support ACPI memory hot-add. If you want
> - to hot-add memory at runtime (the hot-added memory cannot be
> - removed until machine stop), select Y/M here, otherwise select N.
> -
> -config XEN_ACPI_HOTPLUG_CPU
> -   tristate "Xen ACPI cpu hotplug"
> -   depends on XEN_DOM0 && XEN_STUB && ACPI
> -   select ACPI_CONTAINER
> -   help
> - Xen ACPI cpu enumerating and hotplugging
> -
> - For hotplugging, currently Xen only support ACPI cpu hotadd.
> - If you want to hotadd cpu at runtime (the hotadded cpu cannot
> - be removed until machine stop), select Y/M here.
> -
>  config XEN_ACPI_PROCESSOR
> tristate "Xen ACPI processor"
> depends on XEN && XEN_DOM0 && X86 && ACPI_PROCESSOR && CPU_FREQ
> diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile
> index c3621b9f4012..3434593455b2 100644
> --- a/drivers/xen/Makefile
> +++ b/drivers/xen/Makefile
> @@ -26,9 +26,6 @@ obj-$(CONFIG_SWIOTLB_XEN) += swiotlb-xen.o
>  obj-$(CONFIG_XEN_MCE_LOG)  += mcelog.o
>  obj-$(CONFIG_XEN_PCIDEV_BACKEND)   += xen-pciback/
>  obj-$(CONFIG_XEN_PRIVCMD)  += xen-privcmd.o
> -obj-$(CONFIG_XEN_STUB) += xen-stub.o
> -obj-$(CONFIG_XEN_ACPI_HOTPLUG_MEMORY)  += xen-acpi-memhotplug.o
> -obj-$(CONFIG_XEN_ACPI_HOTPLUG_CPU) += xen-acpi-cpuhotplug.o
>  obj-$(CONFIG_XEN_ACPI_PROCESSOR)   += xen-acpi-processor.o
>  obj-$(CONFIG_XEN_EFI)  += efi.o
>  obj-$(CONFIG_XEN_SCSI_BACKEND) += xen-scsiback.o
> diff --git a/drivers/xen/pcpu.c b/drivers/xen/pcpu.c
> index cdc6daa7a9f6..1bcdd5227771 100644
> --- a/drivers/xen/pcpu.c
> +++ b/drivers/xen/pcpu.c
> @@ -345,41 +345,6 @@ static irqreturn_t xen_pcpu_interrupt(int irq, void 
> *dev_id)
> return IRQ_HANDLED;
>  }
>
> -/* Sync with Xen hypervisor after cpu hotadded */
> -void xen_pcpu_hotplug_sync(void)
> -{
> -   schedule_work(_pcpu_work);
> -}
> -EXPORT_SYMBOL_GPL(xen_pcpu_hotplug_sync);
> -
> -/*
> - * For hypervisor presented cpu, return logic cpu id;
> - * For hypervisor non-presented cpu, return -ENODEV.
> - */
> -int xen_pcpu_id(uint32_t acpi_id)
> -{
> -   int cpu_id = 0, max_id = 0;

Re: [PATCH v1] xen: ACPI: Get rid of ACPICA message printing

2021-03-01 Thread Rafael J. Wysocki

On Sun, Feb 28, 2021 at 2:49 AM Boris Ostrovsky
 wrote:
>
>
> On 2/24/21 1:47 PM, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki 
> >
> > The ACPI_DEBUG_PRINT() macro is used in a few places in
> > xen-acpi-cpuhotplug.c and xen-acpi-memhotplug.c for printing debug
> > messages, but that is questionable, because that macro belongs to
> > ACPICA and it should not be used elsewhere.  In addition,
> > ACPI_DEBUG_PRINT() requires special enabling to allow it to actually
> > print the message and the _COMPONENT symbol generally needed for
> > that is not defined in any of the files in question.
> >
> > For this reason, replace all of the ACPI_DEBUG_PRINT() instances in
> > the Xen code with acpi_handle_debug() (with the additional benefit
> > that the source object can be identified more easily after this
> > change) and drop the ACPI_MODULE_NAME() definitions that are only
> > used by the ACPICA message printing macros from that code.
> >
> > Signed-off-by: Rafael J. Wysocki 
> > ---
> >  drivers/xen/xen-acpi-cpuhotplug.c |   12 +---
> >  drivers/xen/xen-acpi-memhotplug.c |   16 +++-
>
>
> As I was building with this patch I (re-)discovered that since 2013 it 
> depends on BROKEN (commit 76fc253723add). Despite commit message there saying 
> that it's a temporary patch it seems to me by now that it's more than that.
>
>
> And clearly noone tried to build these files since at least 2015 because 
> memhotplug file doesn't compile due to commit cfafae940381207.
>
>
> While this is easily fixable the question is whether we want to keep these 
> files. Obviously noone cares about this functionality.

Well, I would be for dropping them.

Do you want me to send a patch to do that?

[PATCH v1] xen: ACPI: Get rid of ACPICA message printing

2021-02-24 Thread Rafael J. Wysocki

From: Rafael J. Wysocki 

The ACPI_DEBUG_PRINT() macro is used in a few places in
xen-acpi-cpuhotplug.c and xen-acpi-memhotplug.c for printing debug
messages, but that is questionable, because that macro belongs to
ACPICA and it should not be used elsewhere.  In addition,
ACPI_DEBUG_PRINT() requires special enabling to allow it to actually
print the message and the _COMPONENT symbol generally needed for
that is not defined in any of the files in question.

For this reason, replace all of the ACPI_DEBUG_PRINT() instances in
the Xen code with acpi_handle_debug() (with the additional benefit
that the source object can be identified more easily after this
change) and drop the ACPI_MODULE_NAME() definitions that are only
used by the ACPICA message printing macros from that code.

Signed-off-by: Rafael J. Wysocki 
---
 drivers/xen/xen-acpi-cpuhotplug.c |   12 +---
 drivers/xen/xen-acpi-memhotplug.c |   16 +++-
 2 files changed, 12 insertions(+), 16 deletions(-)

Index: linux-pm/drivers/xen/xen-acpi-cpuhotplug.c
===
--- linux-pm.orig/drivers/xen/xen-acpi-cpuhotplug.c
+++ linux-pm/drivers/xen/xen-acpi-cpuhotplug.c
@@ -242,10 +242,10 @@ static void acpi_processor_hotplug_notif
switch (event) {
case ACPI_NOTIFY_BUS_CHECK:
case ACPI_NOTIFY_DEVICE_CHECK:
-   ACPI_DEBUG_PRINT((ACPI_DB_INFO,
+   acpi_handle_debug(handle,
"Processor driver received %s event\n",
(event == ACPI_NOTIFY_BUS_CHECK) ?
-   "ACPI_NOTIFY_BUS_CHECK" : "ACPI_NOTIFY_DEVICE_CHECK"));
+   "ACPI_NOTIFY_BUS_CHECK" : "ACPI_NOTIFY_DEVICE_CHECK");
 
if (!is_processor_present(handle))
break;
@@ -269,8 +269,8 @@ static void acpi_processor_hotplug_notif
break;
 
case ACPI_NOTIFY_EJECT_REQUEST:
-   ACPI_DEBUG_PRINT((ACPI_DB_INFO,
- "received ACPI_NOTIFY_EJECT_REQUEST\n"));
+   acpi_handle_debug(handle,
+ "received ACPI_NOTIFY_EJECT_REQUEST\n");
 
if (acpi_bus_get_device(handle, )) {
pr_err(PREFIX "Device don't exist, dropping EJECT\n");
@@ -290,8 +290,7 @@ static void acpi_processor_hotplug_notif
break;
 
default:
-   ACPI_DEBUG_PRINT((ACPI_DB_INFO,
- "Unsupported event [0x%x]\n", event));
+   acpi_handle_debug(handle, "Unsupported event [0x%x]\n", event);
 
/* non-hotplug event; possibly handled by other handler */
goto out;
@@ -440,7 +439,6 @@ static void __exit xen_acpi_processor_ex
 
 module_init(xen_acpi_processor_init);
 module_exit(xen_acpi_processor_exit);
-ACPI_MODULE_NAME("xen-acpi-cpuhotplug");
 MODULE_AUTHOR("Liu Jinsong ");
 MODULE_DESCRIPTION("Xen Hotplug CPU Driver");
 MODULE_LICENSE("GPL");
Index: linux-pm/drivers/xen/xen-acpi-memhotplug.c
===
--- linux-pm.orig/drivers/xen/xen-acpi-memhotplug.c
+++ linux-pm/drivers/xen/xen-acpi-memhotplug.c
@@ -227,13 +227,13 @@ static void acpi_memory_device_notify(ac
 
switch (event) {
case ACPI_NOTIFY_BUS_CHECK:
-   ACPI_DEBUG_PRINT((ACPI_DB_INFO,
-   "\nReceived BUS CHECK notification for device\n"));
+   acpi_handle_debug(handle,
+   "Received BUS CHECK notification for device\n");
fallthrough;
case ACPI_NOTIFY_DEVICE_CHECK:
if (event == ACPI_NOTIFY_DEVICE_CHECK)
-   ACPI_DEBUG_PRINT((ACPI_DB_INFO,
-   "\nReceived DEVICE CHECK notification for device\n"));
+   acpi_handle_debug(handle,
+   "Received DEVICE CHECK notification for device\n");
 
if (acpi_memory_get_device(handle, _device)) {
pr_err(PREFIX "Cannot find driver data\n");
@@ -244,8 +244,8 @@ static void acpi_memory_device_notify(ac
break;
 
case ACPI_NOTIFY_EJECT_REQUEST:
-   ACPI_DEBUG_PRINT((ACPI_DB_INFO,
-   "\nReceived EJECT REQUEST notification for device\n"));
+   acpi_handle_debug(handle,
+   "Received EJECT REQUEST notification for device\n");
 
acpi_scan_lock_acquire();
if (acpi_bus_get_device(handle, )) {
@@ -269,8 +269,7 @@ static void acpi_memory_device_notify(ac
break;
 
default:
-   ACPI_DEBUG_PRINT((ACPI_DB_INFO,
- "

Re: [RFC PATCH 29/34] power/swap: use bio_new in hib_submit_io

2021-01-28 Thread Rafael J. Wysocki

On Thu, Jan 28, 2021 at 8:21 AM Chaitanya Kulkarni
 wrote:
>

Please explain in the changelog why making this change is a good idea.

> Signed-off-by: Chaitanya Kulkarni 
> ---
>  kernel/power/swap.c | 7 +++
>  1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/power/swap.c b/kernel/power/swap.c
> index c73f2e295167..e92e36c053a6 100644
> --- a/kernel/power/swap.c
> +++ b/kernel/power/swap.c
> @@ -271,13 +271,12 @@ static int hib_submit_io(int op, int op_flags, pgoff_t 
> page_off, void *addr,
> struct hib_bio_batch *hb)
>  {
> struct page *page = virt_to_page(addr);
> +   sector_t sect = page_off * (PAGE_SIZE >> 9);
> struct bio *bio;
> int error = 0;
>
> -   bio = bio_alloc(GFP_NOIO | __GFP_HIGH, 1);
> -   bio->bi_iter.bi_sector = page_off * (PAGE_SIZE >> 9);
> -   bio_set_dev(bio, hib_resume_bdev);
> -   bio_set_op_attrs(bio, op, op_flags);
> +   bio = bio_new(hib_resume_bdev, sect, op, op_flags, 1,
> + GFP_NOIO | __GFP_HIGH);
>
> if (bio_add_page(bio, page, PAGE_SIZE, 0) < PAGE_SIZE) {
> pr_err("Adding page to bio failed at %llu\n",
> --
> 2.22.1
>

Re: [PATCH 000/141] Fix fall-through warnings for Clang

2020-11-23 Thread Rafael J. Wysocki

On Mon, Nov 23, 2020 at 4:58 PM James Bottomley
 wrote:
>
> On Mon, 2020-11-23 at 15:19 +0100, Miguel Ojeda wrote:
> > On Sun, Nov 22, 2020 at 11:36 PM James Bottomley
> >  wrote:

[cut]

> >
> > Maintainers routinely review 1-line trivial patches, not to mention
> > internal API changes, etc.
>
> We're also complaining about the inability to recruit maintainers:
>
> https://www.theregister.com/2020/06/30/hard_to_find_linux_maintainers_says_torvalds/
>
> And burn out:
>
> http://antirez.com/news/129

Right.

> The whole crux of your argument seems to be maintainers' time isn't
> important so we should accept all trivial patches ... I'm pushing back
> on that assumption in two places, firstly the valulessness of the time
> and secondly that all trivial patches are valuable.
>
> > If some company does not want to pay for that, that's fine, but they
> > don't get to be maintainers and claim `Supported`.
>
> What I'm actually trying to articulate is a way of measuring value of
> the patch vs cost ... it has nothing really to do with who foots the
> actual bill.
>
> One thesis I'm actually starting to formulate is that this continual
> devaluing of maintainers is why we have so much difficulty keeping and
> recruiting them.

Absolutely.

This is just one of the factors involved, but a significant one IMV.

Re: [PATCH v3 00/11] Fix PM hibernation in Xen guests

2020-08-28 Thread Rafael J. Wysocki

On Fri, Aug 28, 2020 at 8:26 PM Anchal Agarwal  wrote:
>
> On Fri, Aug 21, 2020 at 10:22:43PM +, Anchal Agarwal wrote:
> > Hello,
> > This series fixes PM hibernation for hvm guests running on xen hypervisor.
> > The running guest could now be hibernated and resumed successfully at a
> > later time. The fixes for PM hibernation are added to block and
> > network device drivers i.e xen-blkfront and xen-netfront. Any other driver
> > that needs to add S4 support if not already, can follow same method of
> > introducing freeze/thaw/restore callbacks.
> > The patches had been tested against upstream kernel and xen4.11. Large
> > scale testing is also done on Xen based Amazon EC2 instances. All this 
> > testing
> > involved running memory exhausting workload in the background.
> >
> > Doing guest hibernation does not involve any support from hypervisor and
> > this way guest has complete control over its state. Infrastructure
> > restrictions for saving up guest state can be overcome by guest initiated
> > hibernation.
> >
> > These patches were send out as RFC before and all the feedback had been
> > incorporated in the patches. The last v1 & v2 could be found here:
> >
> > [v1]: https://lkml.org/lkml/2020/5/19/1312
> > [v2]: https://lkml.org/lkml/2020/7/2/995
> > All comments and feedback from v2 had been incorporated in v3 series.
> >
> > Known issues:
> > 1.KASLR causes intermittent hibernation failures. VM fails to resumes and
> > has to be restarted. I will investigate this issue separately and shouldn't
> > be a blocker for this patch series.
> > 2. During hibernation, I observed sometimes that freezing of tasks fails due
> > to busy XFS workqueuei[xfs-cil/xfs-sync]. This is also intermittent may be 1
> > out of 200 runs and hibernation is aborted in this case. Re-trying 
> > hibernation
> > may work. Also, this is a known issue with hibernation and some
> > filesystems like XFS has been discussed by the community for years with not 
> > an
> > effectve resolution at this point.
> >
> > Testing How to:
> > ---
> > 1. Setup xen hypervisor on a physical machine[ I used Ubuntu 16.04 +upstream
> > xen-4.11]
> > 2. Bring up a HVM guest w/t kernel compiled with hibernation patches
> > [I used ubuntu18.04 netboot bionic images and also Amazon Linux on-prem 
> > images].
> > 3. Create a swap file size=RAM size
> > 4. Update grub parameters and reboot
> > 5. Trigger pm-hibernation from within the VM
> >
> > Example:
> > Set up a file-backed swap space. Swap file size>=Total memory on the system
> > sudo dd if=/dev/zero of=/swap bs=$(( 1024 * 1024 )) count=4096 # 4096MiB
> > sudo chmod 600 /swap
> > sudo mkswap /swap
> > sudo swapon /swap
> >
> > Update resume device/resume offset in grub if using swap file:
> > resume=/dev/xvda1 resume_offset=200704 no_console_suspend=1
> >
> > Execute:
> > 
> > sudo pm-hibernate
> > OR
> > echo disk > /sys/power/state && echo reboot > /sys/power/disk
> >
> > Compute resume offset code:
> > "
> > #!/usr/bin/env python
> > import sys
> > import array
> > import fcntl
> >
> > #swap file
> > f = open(sys.argv[1], 'r')
> > buf = array.array('L', [0])
> >
> > #FIBMAP
> > ret = fcntl.ioctl(f.fileno(), 0x01, buf)
> > print buf[0]
> > "
> >
> > Aleksei Besogonov (1):
> >   PM / hibernate: update the resume offset on SNAPSHOT_SET_SWAP_AREA
> >
> > Anchal Agarwal (4):
> >   x86/xen: Introduce new function to map HYPERVISOR_shared_info on
> > Resume
> >   x86/xen: save and restore steal clock during PM hibernation
> >   xen: Introduce wrapper for save/restore sched clock offset
> >   xen: Update sched clock offset to avoid system instability in
> > hibernation
> >
> > Munehisa Kamata (5):
> >   xen/manage: keep track of the on-going suspend mode
> >   xenbus: add freeze/thaw/restore callbacks support
> >   x86/xen: add system core suspend and resume callbacks
> >   xen-blkfront: add callbacks for PM suspend and hibernation
> >   xen-netfront: add callbacks for PM suspend and hibernation
> >
> > Thomas Gleixner (1):
> >   genirq: Shutdown irq chips in suspend/resume during hibernation
> >
> >  arch/x86/xen/enlighten_hvm.c  |   7 +++
> >  arch/x86/xen/suspend.c|  63 
> >  arch/x86/xen/time.c   |  15 -
> >  arch/x86/xen/xen-ops.h|   3 +
> >  drivers/block/xen-blkfront.c  | 122 
> > --
> >  drivers/net/xen-netfront.c|  96 +-
> >  drivers/xen/events/events_base.c  |   1 +
> >  drivers/xen/manage.c  |  46 ++
> >  drivers/xen/xenbus/xenbus_probe.c |  96 +-
> >  include/linux/irq.h   |   2 +
> >  include/xen/xen-ops.h |   3 +
> >  include/xen/xenbus.h  |   3 +
> >  kernel/irq/chip.c |   2 +-
> >  kernel/irq/internals.h|   1 +
> >  kernel/irq/pm.c   |  31 +++---
> >  kernel/power/user.c

Re: [PATCH v2 01/11] xen/manage: keep track of the on-going suspend mode

2020-07-31 Thread Rafael J. Wysocki

On Fri, Jul 31, 2020 at 4:14 PM Boris Ostrovsky
 wrote:
>
> On 7/30/20 7:06 PM, Anchal Agarwal wrote:
> > On Mon, Jul 27, 2020 at 06:08:29PM -0400, Boris Ostrovsky wrote:
> >> CAUTION: This email originated from outside of the organization. Do not 
> >> click links or open attachments unless you can confirm the sender and know 
> >> the content is safe.
> >>
> >>
> >>
> >> On 7/24/20 7:01 PM, Stefano Stabellini wrote:
> >>> Yes, it does, thank you. I'd rather not introduce unknown regressions so
> >>> I would recommend to add an arch-specific check on registering
> >>> freeze/thaw/restore handlers. Maybe something like the following:
> >>>
> >>> #ifdef CONFIG_X86
> >>> .freeze = blkfront_freeze,
> >>> .thaw = blkfront_restore,
> >>> .restore = blkfront_restore
> >>> #endif
> >>>
> >>>
> >>> maybe Boris has a better suggestion on how to do it
> >>
> >> An alternative might be to still install pm notifier in
> >> drivers/xen/manage.c (I think as result of latest discussions we decided
> >> we won't need it) and return -ENOTSUPP for ARM for
> >> PM_HIBERNATION_PREPARE and friends. Would that work?
> >>
> > I think the question here is for registering driver specific 
> > freeze/thaw/restore
> > callbacks for x86 only. I have dropped the pm_notifier in the v3 still 
> > pending
> > testing. So I think just registering driver specific callbacks for x86 only 
> > is a
> > good option. What do you think?
>
>
> I suggested using the notifier under assumption that if it returns an
> error then that will prevent callbacks to be called because hibernation
> will be effectively disabled.

That's correct.

Re: [Xen-devel] [RFC PATCH V2 11/11] x86: tsc: avoid system instability in hibernation

2020-01-13 Thread Rafael J. Wysocki

On Mon, Jan 13, 2020 at 10:50 PM Rafael J. Wysocki  wrote:
>
> On Mon, Jan 13, 2020 at 1:43 PM Peter Zijlstra  wrote:
> >
> > On Mon, Jan 13, 2020 at 11:43:18AM +, Singh, Balbir wrote:
> > > For your original comment, just wanted to clarify the following:
> > >
> > > 1. After hibernation, the machine can be resumed on a different but 
> > > compatible
> > > host (these are VM images hibernated)
> > > 2. This means the clock between host1 and host2 can/will be different
> > >
> > > In your comments are you making the assumption that the host(s) is/are the
> > > same? Just checking the assumptions being made and being on the same page 
> > > with
> > > them.
> >
> > I would expect this to be the same problem we have as regular suspend,
> > after power off the TSC will have been reset, so resume will have to
> > somehow bridge that gap. I've no idea if/how it does that.
>
> In general, this is done by timekeeping_resume() and the only special
> thing done for the TSC appears to be the tsc_verify_tsc_adjust(true)
> call in tsc_resume().

And I forgot about tsc_restore_sched_clock_state() that gets called
via restore_processor_state() on x86, before calling
timekeeping_resume().

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [RFC PATCH V2 11/11] x86: tsc: avoid system instability in hibernation

2020-01-13 Thread Rafael J. Wysocki

On Mon, Jan 13, 2020 at 1:43 PM Peter Zijlstra  wrote:
>
> On Mon, Jan 13, 2020 at 11:43:18AM +, Singh, Balbir wrote:
> > For your original comment, just wanted to clarify the following:
> >
> > 1. After hibernation, the machine can be resumed on a different but 
> > compatible
> > host (these are VM images hibernated)
> > 2. This means the clock between host1 and host2 can/will be different
> >
> > In your comments are you making the assumption that the host(s) is/are the
> > same? Just checking the assumptions being made and being on the same page 
> > with
> > them.
>
> I would expect this to be the same problem we have as regular suspend,
> after power off the TSC will have been reset, so resume will have to
> somehow bridge that gap. I've no idea if/how it does that.

In general, this is done by timekeeping_resume() and the only special
thing done for the TSC appears to be the tsc_verify_tsc_adjust(true)
call in tsc_resume().

> I remember some BIOSes had crazy TSC ideas for suspend2ram, and we grew
> tsc_restore_sched_clock_state() for it.
>
> Playing crazy games like what you're doing just isn't it though.

Right.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [RFC PATCH V2 11/11] x86: tsc: avoid system instability in hibernation

2020-01-13 Thread Rafael J. Wysocki

On Mon, Jan 13, 2020 at 12:43 PM Singh, Balbir  wrote:
>
> On Mon, 2020-01-13 at 11:16 +0100, Peter Zijlstra wrote:
> > On Fri, Jan 10, 2020 at 07:35:20AM -0800, Eduardo Valentin wrote:
> > > Hey Peter,
> > >
> > > On Wed, Jan 08, 2020 at 11:50:11AM +0100, Peter Zijlstra wrote:
> > > > On Tue, Jan 07, 2020 at 11:45:26PM +, Anchal Agarwal wrote:
> > > > > From: Eduardo Valentin 
> > > > >
> > > > > System instability are seen during resume from hibernation when system
> > > > > is under heavy CPU load. This is due to the lack of update of sched
> > > > > clock data, and the scheduler would then think that heavy CPU hog
> > > > > tasks need more time in CPU, causing the system to freeze
> > > > > during the unfreezing of tasks. For example, threaded irqs,
> > > > > and kernel processes servicing network interface may be delayed
> > > > > for several tens of seconds, causing the system to be unreachable.
> > > > > The fix for this situation is to mark the sched clock as unstable
> > > > > as early as possible in the resume path, leaving it unstable
> > > > > for the duration of the resume process. This will force the
> > > > > scheduler to attempt to align the sched clock across CPUs using
> > > > > the delta with time of day, updating sched clock data. In a post
> > > > > hibernation event, we can then mark the sched clock as stable
> > > > > again, avoiding unnecessary syncs with time of day on systems
> > > > > in which TSC is reliable.
> > > >
> > > > This makes no frigging sense what so bloody ever. If the clock is
> > > > stable, we don't care about sched_clock_data. When it is stable you get
> > > > a linear function of the TSC without complicated bits on.
> > > >
> > > > When it is unstable, only then do we care about the sched_clock_data.
> > > >
> > >
> > > Yeah, maybe what is not clear here is that we covering for situation
> > > where clock stability changes over time, e.g. at regular boot clock is
> > > stable, hibernation happens, then restore happens in a non-stable clock.
> >
> > Still confused, who marks the thing unstable? The patch seems to suggest
> > you do yourself, but it is not at all clear why.
> >
> > If TSC really is unstable, then it needs to remain unstable. If the TSC
> > really is stable then there is no point in marking is unstable.
> >
> > Either way something is off, and you're not telling me what.
> >
>
> Hi, Peter
>
> For your original comment, just wanted to clarify the following:
>
> 1. After hibernation, the machine can be resumed on a different but compatible
> host (these are VM images hibernated)
> 2. This means the clock between host1 and host2 can/will be different

So the problem is specific to this particular use case.

I'm not sure why to impose this hack on hibernation in all cases.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v9 01/28] linkage: Introduce new macros for assembler symbols

2019-10-14 Thread Rafael J. Wysocki

On Fri, Oct 11, 2019 at 1:53 PM Jiri Slaby  wrote:
>
> Introduce new C macros for annotations of functions and data in
> assembly. There is a long-standing mess in macros like ENTRY, END,
> ENDPROC and similar. They are used in different manners and sometimes
> incorrectly.
>
> So introduce macros with clear use to annotate assembly as follows:
>
> a) Support macros for the ones below
>SYM_T_FUNC -- type used by assembler to mark functions
>SYM_T_OBJECT -- type used by assembler to mark data
>SYM_T_NONE -- type used by assembler to mark entries of unknown type
>
>They are defined as STT_FUNC, STT_OBJECT, and STT_NOTYPE
>respectively. According to the gas manual, this is the most portable
>way. I am not sure about other assemblers, so this can be switched
>back to %function and %object if this turns into a problem.
>Architectures can also override them by something like ", @function"
>if they need.
>
>SYM_A_ALIGN, SYM_A_NONE -- align the symbol?
>SYM_L_GLOBAL, SYM_L_WEAK, SYM_L_LOCAL -- linkage of symbols
>
> b) Mostly internal annotations, used by the ones below
>SYM_ENTRY -- use only if you have to (for non-paired symbols)
>SYM_START -- use only if you have to (for paired symbols)
>SYM_END -- use only if you have to (for paired symbols)
>
> c) Annotations for code
>SYM_INNER_LABEL_ALIGN -- only for labels in the middle of code
>SYM_INNER_LABEL -- only for labels in the middle of code
>
>SYM_FUNC_START_LOCAL_ALIAS -- use where there are two local names for
> one function
>SYM_FUNC_START_ALIAS -- use where there are two global names for one
> function
>SYM_FUNC_END_ALIAS -- the end of LOCAL_ALIASed or ALIASed function
>
>SYM_FUNC_START -- use for global functions
>SYM_FUNC_START_NOALIGN -- use for global functions, w/o alignment
>SYM_FUNC_START_LOCAL -- use for local functions
>SYM_FUNC_START_LOCAL_NOALIGN -- use for local functions, w/o
> alignment
>SYM_FUNC_START_WEAK -- use for weak functions
>SYM_FUNC_START_WEAK_NOALIGN -- use for weak functions, w/o alignment
>SYM_FUNC_END -- the end of SYM_FUNC_START_LOCAL, SYM_FUNC_START,
> SYM_FUNC_START_WEAK, ...
>
>For functions with special (non-C) calling conventions:
>SYM_CODE_START -- use for non-C (special) functions
>SYM_CODE_START_NOALIGN -- use for non-C (special) functions, w/o
> alignment
>SYM_CODE_START_LOCAL -- use for local non-C (special) functions
>SYM_CODE_START_LOCAL_NOALIGN -- use for local non-C (special)
> functions, w/o alignment
>SYM_CODE_END -- the end of SYM_CODE_START_LOCAL or SYM_CODE_START
>
> d) For data
>SYM_DATA_START -- global data symbol
>SYM_DATA_START_LOCAL -- local data symbol
>SYM_DATA_END -- the end of the SYM_DATA_START symbol
>SYM_DATA_END_LABEL -- the labeled end of SYM_DATA_START symbol
>SYM_DATA -- start+end wrapper around simple global data
>SYM_DATA_LOCAL -- start+end wrapper around simple local data
>
> ==
>
> The macros allow to pair starts and ends of functions and mark functions
> correctly in the output ELF objects.
>
> All users of the old macros in x86 are converted to use these in further
> patches.
>
> Signed-off-by: Jiri Slaby 
> Cc: Andrew Morton 
> Cc: Boris Ostrovsky 
> Cc: h...@zytor.com
> Cc: Ingo Molnar 
> Cc: jpoim...@redhat.com
> Cc: Juergen Gross 
> Cc: Len Brown 
> Cc: Linus Torvalds 
> Cc: linux-ker...@vger.kernel.org
> Cc: linux...@vger.kernel.org
> Cc: Borislav Petkov 
> Cc: mi...@redhat.com
> Cc: Pavel Machek 
> Cc: Peter Zijlstra 
> Cc: "Rafael J. Wysocki" 
> Cc: Thomas Gleixner 
> Cc: xen-devel@lists.xenproject.org
> Cc: x...@kernel.org

Acked-by: Rafael J. Wysocki 

for all the changes in the series affecting the power management code
maintained by me.

Thanks!

> ---
>
> Notes:
> [v2]
> * use SYM_ prefix and sane names
> * add SYM_START and SYM_END and parametrize all the macros
>
> [v3]
> * add SYM_DATA, SYM_DATA_LOCAL, and SYM_DATA_END_LABEL
>
> [v4]
> * add _NOALIGN versions of some macros
> * add _CODE_ derivates of _FUNC_ macros
>
> [v5]
> * drop "SIMPLE" from data annotations
> * switch NOALIGN and ALIGN variants of inner labels
> * s/visibility/linkage/; s@SYM_V_@SYM_L_@
> * add Documentation
>
> [v6]
> * fixed typos found by Randy Dunlap
> * remove doubled INNER_LABEL macros, one pair was unused
>
> [v8]
> * use lkml.kernel.org for links
> * link the docs from index.rst (by Boris)
> * fixed typos on the docs
&

Re: [Xen-devel] [PATCH 10/10] docs: fix broken documentation links

2019-05-27 Thread Rafael J. Wysocki

On Mon, May 20, 2019 at 4:48 PM Mauro Carvalho Chehab
 wrote:
>
> Mostly due to x86 and acpi conversion, several documentation
> links are still pointing to the old file. Fix them.
>
> Signed-off-by: Mauro Carvalho Chehab 

For the ACPI part:

Acked-by: Rafael J. Wysocki 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 2/2] vsprintf: Remove support for %pF and %pf in favour of %pS and %ps

2019-03-25 Thread Rafael J. Wysocki

On Mon, Mar 25, 2019 at 10:30 AM Rafael J. Wysocki  wrote:
>
> On Fri, Mar 22, 2019 at 2:21 PM Sakari Ailus
>  wrote:
> >
> > %pS and %ps are now the preferred conversion specifiers to print function
> > %names. The functionality is equivalent; remove the old, deprecated %pF
> > %and %pf support.
>
> Are %pF and %pf really not used any more in the kernel?
>
> If that is not the case, you need to convert the remaining users of
> them to using %ps or %pS before making support for them go away
> completely.

Well, this is a [2/2] in a series, sorry for the noise (/me blames
gmail for the confusion).

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 1/2] treewide: Switch printk users from %pf and %pF to %ps and %pS, respectively

2019-03-25 Thread Rafael J. Wysocki

On Fri, Mar 22, 2019 at 2:21 PM Sakari Ailus
 wrote:
>
> %pF and %pf are functionally equivalent to %pS and %ps conversion
> specifiers. The former are deprecated, therefore switch the current users
> to use the preferred variant.
>
> The changes have been produced by the following command:
>
> git grep -l '%p[fF]' | grep -v '^\(tools\|Documentation\)/' | \
> while read i; do perl -i -pe 's/%pf/%ps/g; s/%pF/%pS/g;' $i; done
>
> And verifying the result.
>
> Signed-off-by: Sakari Ailus 

Acked-by: Rafael J. Wysocki 

> ---
>  arch/alpha/kernel/pci_iommu.c   | 20 ++--
>  arch/arm/mach-imx/pm-imx6.c |  2 +-
>  arch/arm/mm/alignment.c |  2 +-
>  arch/arm/nwfpe/fpmodule.c   |  2 +-
>  arch/microblaze/mm/pgtable.c|  2 +-
>  arch/sparc/kernel/ds.c  |  2 +-
>  arch/um/kernel/sysrq.c  |  2 +-
>  arch/x86/include/asm/trace/exceptions.h |  2 +-
>  arch/x86/kernel/irq_64.c|  2 +-
>  arch/x86/mm/extable.c   |  4 ++--
>  arch/x86/xen/multicalls.c   |  2 +-
>  drivers/acpi/device_pm.c|  2 +-
>  drivers/base/power/main.c   |  6 +++---
>  drivers/base/syscore.c  | 12 ++--
>  drivers/block/drbd/drbd_receiver.c  |  2 +-
>  drivers/block/floppy.c  | 10 +-
>  drivers/cpufreq/cpufreq.c   |  2 +-
>  drivers/mmc/core/quirks.h   |  2 +-
>  drivers/nvdimm/bus.c|  2 +-
>  drivers/nvdimm/dimm_devs.c  |  2 +-
>  drivers/pci/pci-driver.c| 14 +++---
>  drivers/pci/quirks.c|  4 ++--
>  drivers/pnp/quirks.c|  2 +-
>  drivers/scsi/esp_scsi.c |  2 +-
>  fs/btrfs/tests/free-space-tree-tests.c  |  4 ++--
>  fs/f2fs/f2fs.h  |  2 +-
>  fs/pstore/inode.c   |  2 +-
>  include/trace/events/btrfs.h|  2 +-
>  include/trace/events/cpuhp.h|  4 ++--
>  include/trace/events/preemptirq.h   |  2 +-
>  include/trace/events/rcu.h  |  4 ++--
>  include/trace/events/sunrpc.h   |  2 +-
>  include/trace/events/timer.h|  8 
>  include/trace/events/vmscan.h   |  4 ++--
>  include/trace/events/workqueue.h|  4 ++--
>  include/trace/events/xen.h  |  2 +-
>  init/main.c |  6 +++---
>  kernel/async.c  |  4 ++--
>  kernel/events/uprobes.c |  2 +-
>  kernel/fail_function.c  |  2 +-
>  kernel/irq/debugfs.c|  2 +-
>  kernel/irq/handle.c |  2 +-
>  kernel/irq/manage.c |  2 +-
>  kernel/irq/spurious.c   |  4 ++--
>  kernel/rcu/tree.c   |  2 +-
>  kernel/stop_machine.c   |  2 +-
>  kernel/time/sched_clock.c   |  2 +-
>  kernel/time/timer.c |  2 +-
>  kernel/workqueue.c  | 12 ++--
>  lib/error-inject.c  |  2 +-
>  lib/percpu-refcount.c   |  4 ++--
>  mm/memblock.c   | 12 ++--
>  mm/memory.c |  2 +-
>  mm/vmscan.c |  2 +-
>  net/ceph/osd_client.c   |  2 +-
>  net/core/net-procfs.c   |  2 +-
>  net/core/netpoll.c  |  4 ++--
>  57 files changed, 109 insertions(+), 109 deletions(-)
>
> diff --git a/arch/alpha/kernel/pci_iommu.c b/arch/alpha/kernel/pci_iommu.c
> index 3034d6d936d2..242108439f42 100644
> --- a/arch/alpha/kernel/pci_iommu.c
> +++ b/arch/alpha/kernel/pci_iommu.c
> @@ -249,7 +249,7 @@ static int pci_dac_dma_supported(struct pci_dev *dev, u64 
> mask)
> ok = 0;
>
> /* If both conditions above are met, we are fine. */
> -   DBGA("pci_dac_dma_supported %s from %pf\n",
> +   DBGA("pci_dac_dma_supported %s from %ps\n",
>  ok ? "yes" : "no", __builtin_return_address(0));
>
> return ok;
> @@ -281,7 +281,7 @@ pci_map_single_1(struct pci_dev *pdev, void *cpu_addr, 
> size_t size,
> && paddr + size <= __direct_map_size) {
> ret = paddr + __direct_map_base;
>
> -   DBGA2("pci_map_single: [%p,%zx] -> direct %llx from %pf\n",
> +   DBGA2("pci_map_single: [%p,%zx] -> direct %llx from %ps\n",
>   cpu_addr, size, ret, __builtin_return_address(0))

Re: [Xen-devel] [PATCH 2/2] vsprintf: Remove support for %pF and %pf in favour of %pS and %ps

2019-03-25 Thread Rafael J. Wysocki

On Fri, Mar 22, 2019 at 2:21 PM Sakari Ailus
 wrote:
>
> %pS and %ps are now the preferred conversion specifiers to print function
> %names. The functionality is equivalent; remove the old, deprecated %pF
> %and %pf support.

Are %pF and %pf really not used any more in the kernel?

If that is not the case, you need to convert the remaining users of
them to using %ps or %pS before making support for them go away
completely.

That said, checkpatch can be made treat %pf/F as invalid format right away IMO.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v1 7/8] PM / Hibernate: use pfn_to_online_page()

2018-11-20 Thread Rafael J. Wysocki

On Monday, November 19, 2018 11:16:15 AM CET David Hildenbrand wrote:
> Let's use pfn_to_online_page() instead of pfn_to_page() when checking
> for saveable pages to not save/restore offline memory sections.
> 
> Cc: "Rafael J. Wysocki" 
> Cc: Pavel Machek 
> Cc: Len Brown 
> Cc: Andrew Morton 
> Cc: Matthew Wilcox 
> Cc: Michal Hocko 
> Cc: "Michael S. Tsirkin" 
> Suggested-by: Michal Hocko 
> Signed-off-by: David Hildenbrand 

Acked-by: Rafael J. Wysocki 

> ---
>  kernel/power/snapshot.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c
> index 640b2034edd6..87e6dd57819f 100644
> --- a/kernel/power/snapshot.c
> +++ b/kernel/power/snapshot.c
> @@ -1215,8 +1215,8 @@ static struct page *saveable_highmem_page(struct zone 
> *zone, unsigned long pfn)
>   if (!pfn_valid(pfn))
>   return NULL;
>  
> - page = pfn_to_page(pfn);
> - if (page_zone(page) != zone)
> + page = pfn_to_online_page(pfn);
> + if (!page || page_zone(page) != zone)
>   return NULL;
>  
>   BUG_ON(!PageHighMem(page));
> @@ -1277,8 +1277,8 @@ static struct page *saveable_page(struct zone *zone, 
> unsigned long pfn)
>   if (!pfn_valid(pfn))
>   return NULL;
>  
> - page = pfn_to_page(pfn);
> - if (page_zone(page) != zone)
> + page = pfn_to_online_page(pfn);
> + if (!page || page_zone(page) != zone)
>   return NULL;
>  
>   BUG_ON(PageHighMem(page));
> 



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v1 8/8] PM / Hibernate: exclude all PageOffline() pages

2018-11-20 Thread Rafael J. Wysocki

On Monday, November 19, 2018 11:16:16 AM CET David Hildenbrand wrote:
> The content of pages that are marked PG_offline is not of interest
> (e.g. inflated by a balloon driver), let's skip these pages.
> 
> Cc: "Rafael J. Wysocki" 
> Cc: Pavel Machek 
> Cc: Len Brown 
> Cc: Andrew Morton 
> Cc: Matthew Wilcox 
> Cc: Michal Hocko 
> Cc: "Michael S. Tsirkin" 
> Acked-by: Pavel Machek 
> Signed-off-by: David Hildenbrand 

Acked-by: Rafael J. Wysocki 

> ---
>  kernel/power/snapshot.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c
> index 87e6dd57819f..8d7b4d458842 100644
> --- a/kernel/power/snapshot.c
> +++ b/kernel/power/snapshot.c
> @@ -1222,7 +1222,7 @@ static struct page *saveable_highmem_page(struct zone 
> *zone, unsigned long pfn)
>   BUG_ON(!PageHighMem(page));
>  
>   if (swsusp_page_is_forbidden(page) ||  swsusp_page_is_free(page) ||
> - PageReserved(page))
> + PageReserved(page) || PageOffline(page))
>   return NULL;
>  
>   if (page_is_guard(page))
> @@ -1286,6 +1286,9 @@ static struct page *saveable_page(struct zone *zone, 
> unsigned long pfn)
>   if (swsusp_page_is_forbidden(page) || swsusp_page_is_free(page))
>   return NULL;
>  
> + if (PageOffline(page))
> + return NULL;
> +
>   if (PageReserved(page)
>   && (!kernel_page_present(page) || pfn_is_nosave(pfn)))
>   return NULL;
> 



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v1 2/6] mm/memory_hotplug: make add_memory() take the device_hotplug_lock

2018-09-18 Thread Rafael J. Wysocki

On Tue, Sep 18, 2018 at 1:48 PM David Hildenbrand  wrote:
>
> add_memory() currently does not take the device_hotplug_lock, however
> is aleady called under the lock from
> arch/powerpc/platforms/pseries/hotplug-memory.c
> drivers/acpi/acpi_memhotplug.c
> to synchronize against CPU hot-remove and similar.
>
> In general, we should hold the device_hotplug_lock when adding memory
> to synchronize against online/offline request (e.g. from user space) -
> which already resulted in lock inversions due to device_lock() and
> mem_hotplug_lock - see 30467e0b3be ("mm, hotplug: fix concurrent memory
> hot-add deadlock"). add_memory()/add_memory_resource() will create memory
> block devices, so this really feels like the right thing to do.
>
> Holding the device_hotplug_lock makes sure that a memory block device
> can really only be accessed (e.g. via .online/.state) from user space,
> once the memory has been fully added to the system.
>
> The lock is not held yet in
> drivers/xen/balloon.c
> arch/powerpc/platforms/powernv/memtrace.c
> drivers/s390/char/sclp_cmd.c
> drivers/hv/hv_balloon.c
> So, let's either use the locked variants or take the lock.
>
> Don't export add_memory_resource(), as it once was exported to be used
> by XEN, which is never built as a module. If somebody requires it, we
> also have to export a locked variant (as device_hotplug_lock is never
> exported).
>
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: Michael Ellerman 
> Cc: "Rafael J. Wysocki" 
> Cc: Len Brown 
> Cc: Greg Kroah-Hartman 
> Cc: Boris Ostrovsky 
> Cc: Juergen Gross 
> Cc: Nathan Fontenot 
> Cc: John Allen 
> Cc: Andrew Morton 
> Cc: Michal Hocko 
> Cc: Dan Williams 
> Cc: Joonsoo Kim 
> Cc: Vlastimil Babka 
> Cc: Oscar Salvador 
> Cc: Mathieu Malaterre 
> Cc: Pavel Tatashin 
> Cc: YASUAKI ISHIMATSU 
> Reviewed-by: Pavel Tatashin 
> Signed-off-by: David Hildenbrand 
> ---
>  .../platforms/pseries/hotplug-memory.c|  2 +-
>  drivers/acpi/acpi_memhotplug.c|  2 +-
>  drivers/base/memory.c |  9 ++--
>  drivers/xen/balloon.c |  3 +++
>  include/linux/memory_hotplug.h|  1 +
>  mm/memory_hotplug.c   | 22 ---
>  6 files changed, 32 insertions(+), 7 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c 
> b/arch/powerpc/platforms/pseries/hotplug-memory.c
> index b3f54466e25f..2e6f41dc103a 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
> @@ -702,7 +702,7 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
> nid = memory_add_physaddr_to_nid(lmb->base_addr);
>
> /* Add the memory */
> -   rc = add_memory(nid, lmb->base_addr, block_sz);
> +   rc = __add_memory(nid, lmb->base_addr, block_sz);
> if (rc) {
> dlpar_remove_device_tree_lmb(lmb);
> return rc;
> diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
> index 811148415993..8fe0960ea572 100644
> --- a/drivers/acpi/acpi_memhotplug.c
> +++ b/drivers/acpi/acpi_memhotplug.c
> @@ -228,7 +228,7 @@ static int acpi_memory_enable_device(struct 
> acpi_memory_device *mem_device)
> if (node < 0)
> node = memory_add_physaddr_to_nid(info->start_addr);
>
> -   result = add_memory(node, info->start_addr, info->length);
> +   result = __add_memory(node, info->start_addr, info->length);
>
> /*
>  * If the memory block has been used by the kernel, 
> add_memory()
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index 817320c7c4c1..40cac122ec73 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -519,15 +519,20 @@ memory_probe_store(struct device *dev, struct 
> device_attribute *attr,
> if (phys_addr & ((pages_per_block << PAGE_SHIFT) - 1))
> return -EINVAL;
>
> +   ret = lock_device_hotplug_sysfs();
> +   if (ret)
> +   goto out;
> +
> nid = memory_add_physaddr_to_nid(phys_addr);
> -   ret = add_memory(nid, phys_addr,
> -MIN_MEMORY_BLOCK_SIZE * sections_per_block);
> +   ret = __add_memory(nid, phys_addr,
> +  MIN_MEMORY_BLOCK_SIZE * sections_per_block);
>
> if (ret)
> goto out;
>
> ret = count;
>  out:
> +   unlock_device_hotplug();
> return ret;
>  }
>
&

Re: [Xen-devel] [PATCH v1 1/6] mm/memory_hotplug: make remove_memory() take the device_hotplug_lock

2018-09-18 Thread Rafael J. Wysocki

On Tue, Sep 18, 2018 at 1:48 PM David Hildenbrand  wrote:
>
> remove_memory() is exported right now but requires the
> device_hotplug_lock, which is not exported. So let's provide a variant
> that takes the lock and only export that one.
>
> The lock is already held in
> arch/powerpc/platforms/pseries/hotplug-memory.c
> drivers/acpi/acpi_memhotplug.c
> So, let's use the locked variant.
>
> The lock is not held (but taken in)
> arch/powerpc/platforms/powernv/memtrace.c
> So let's keep using the (now) locked variant.
>
> Apart from that, there are not other users in the tree.
>
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: Michael Ellerman 
> Cc: "Rafael J. Wysocki" 
> Cc: Len Brown 
> Cc: Rashmica Gupta 
> Cc: Michael Neuling 
> Cc: Balbir Singh 
> Cc: Nathan Fontenot 
> Cc: John Allen 
> Cc: Andrew Morton 
> Cc: Michal Hocko 
> Cc: Dan Williams 
> Cc: Joonsoo Kim 
> Cc: Vlastimil Babka 
> Cc: Pavel Tatashin 
> Cc: Greg Kroah-Hartman 
> Cc: Oscar Salvador 
> Cc: YASUAKI ISHIMATSU 
> Cc: Mathieu Malaterre 
> Reviewed-by: Pavel Tatashin 
> Signed-off-by: David Hildenbrand 
> ---
>  arch/powerpc/platforms/powernv/memtrace.c   | 2 --
>  arch/powerpc/platforms/pseries/hotplug-memory.c | 6 +++---
>  drivers/acpi/acpi_memhotplug.c  | 2 +-
>  include/linux/memory_hotplug.h  | 3 ++-
>  mm/memory_hotplug.c | 9 -
>  5 files changed, 14 insertions(+), 8 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/memtrace.c 
> b/arch/powerpc/platforms/powernv/memtrace.c
> index 51dc398ae3f7..8f1cd4f3bfd5 100644
> --- a/arch/powerpc/platforms/powernv/memtrace.c
> +++ b/arch/powerpc/platforms/powernv/memtrace.c
> @@ -90,9 +90,7 @@ static bool memtrace_offline_pages(u32 nid, u64 start_pfn, 
> u64 nr_pages)
> walk_memory_range(start_pfn, end_pfn, (void *)MEM_OFFLINE,
>   change_memblock_state);
>
> -   lock_device_hotplug();
> remove_memory(nid, start_pfn << PAGE_SHIFT, nr_pages << PAGE_SHIFT);
> -   unlock_device_hotplug();
>
> return true;
>  }
> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c 
> b/arch/powerpc/platforms/pseries/hotplug-memory.c
> index c1578f54c626..b3f54466e25f 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
> @@ -334,7 +334,7 @@ static int pseries_remove_memblock(unsigned long base, 
> unsigned int memblock_siz
> nid = memory_add_physaddr_to_nid(base);
>
> for (i = 0; i < sections_per_block; i++) {
> -   remove_memory(nid, base, MIN_MEMORY_BLOCK_SIZE);
> +   __remove_memory(nid, base, MIN_MEMORY_BLOCK_SIZE);
> base += MIN_MEMORY_BLOCK_SIZE;
> }
>
> @@ -423,7 +423,7 @@ static int dlpar_remove_lmb(struct drmem_lmb *lmb)
> block_sz = pseries_memory_block_size();
> nid = memory_add_physaddr_to_nid(lmb->base_addr);
>
> -   remove_memory(nid, lmb->base_addr, block_sz);
> +   __remove_memory(nid, lmb->base_addr, block_sz);
>
> /* Update memory regions for memory remove */
> memblock_remove(lmb->base_addr, block_sz);
> @@ -710,7 +710,7 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
>
> rc = dlpar_online_lmb(lmb);
> if (rc) {
> -   remove_memory(nid, lmb->base_addr, block_sz);
> +   __remove_memory(nid, lmb->base_addr, block_sz);
> dlpar_remove_device_tree_lmb(lmb);
> } else {
> lmb->flags |= DRCONF_MEM_ASSIGNED;
> diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
> index 6b0d3ef7309c..811148415993 100644
> --- a/drivers/acpi/acpi_memhotplug.c
> +++ b/drivers/acpi/acpi_memhotplug.c
> @@ -282,7 +282,7 @@ static void acpi_memory_remove_memory(struct 
> acpi_memory_device *mem_device)
> nid = memory_add_physaddr_to_nid(info->start_addr);
>
> acpi_unbind_memory_blocks(info);
> -   remove_memory(nid, info->start_addr, info->length);
> +   __remove_memory(nid, info->start_addr, info->length);
> list_del(>list);
> kfree(info);
> }
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index 34a28227068d..1f096852f479 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -301,6 +301,7 @@ extern bool is_mem_section_removable(unsigned long pfn, 
> unsigned long nr_pages);
>  extern v

Re: [Xen-devel] [PATCH RFC 1/2] drivers/base: export lock_device_hotplug/unlock_device_hotplug

2018-08-17 Thread Rafael J. Wysocki

On Fri, Aug 17, 2018 at 10:56 AM David Hildenbrand  wrote:
>
> On 17.08.2018 10:41, Greg Kroah-Hartman wrote:
> > On Fri, Aug 17, 2018 at 09:59:00AM +0200, David Hildenbrand wrote:
> >> From: Vitaly Kuznetsov 
> >>
> >> Well require to call add_memory()/add_memory_resource() with
> >> device_hotplug_lock held, to avoid a lock inversion. Allow external modules
> >> (e.g. hv_balloon) that make use of add_memory()/add_memory_resource() to
> >> lock device hotplug.
> >>
> >> Signed-off-by: Vitaly Kuznetsov 
> >> [modify patch description]
> >> Signed-off-by: David Hildenbrand 
> >> ---
> >>  drivers/base/core.c | 2 ++
> >>  1 file changed, 2 insertions(+)
> >>
> >> diff --git a/drivers/base/core.c b/drivers/base/core.c
> >> index 04bbcd779e11..9010b9e942b5 100644
> >> --- a/drivers/base/core.c
> >> +++ b/drivers/base/core.c
> >> @@ -700,11 +700,13 @@ void lock_device_hotplug(void)
> >>  {
> >>  mutex_lock(_hotplug_lock);
> >>  }
> >> +EXPORT_SYMBOL_GPL(lock_device_hotplug);
> >>
> >>  void unlock_device_hotplug(void)
> >>  {
> >>  mutex_unlock(_hotplug_lock);
> >>  }
> >> +EXPORT_SYMBOL_GPL(unlock_device_hotplug);
> >
> > If these are going to be "global" symbols, let's properly name them.
> > device_hotplug_lock/unlock would be better.  But I am _really_ nervous
> > about letting stuff outside of the driver core mess with this, as people
> > better know what they are doing.
>
> The only "problem" is that we have kernel modules (for paravirtualized
> devices) that call add_memory(). This is Hyper-V right now, but we might
> have other ones in the future. Without them we would not have to export
> it. We might also get kernel modules that want to call remove_memory() -
> which will require the device_hotplug_lock as of now.
>
> What we could do is
>
> a) add_memory() -> _add_memory() and don't export it
> b) add_memory() takes the device_hotplug_lock and calls _add_memory() .
> We export that one.
> c) Use add_memory() in external modules only
>
> Similar wrapper would be needed e.g. for remove_memory() later on.

That would be safer IMO, as it would prevent developers from using
add_memory() without the lock, say.

If the lock is always going to be required for add_memory(), make it
hard (or event impossible) to use the latter without it.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH RFC 1/2] drivers/base: export lock_device_hotplug/unlock_device_hotplug

2018-08-17 Thread Rafael J. Wysocki

On Fri, Aug 17, 2018 at 10:41 AM Greg Kroah-Hartman
 wrote:
>
> On Fri, Aug 17, 2018 at 09:59:00AM +0200, David Hildenbrand wrote:
> > From: Vitaly Kuznetsov 
> >
> > Well require to call add_memory()/add_memory_resource() with
> > device_hotplug_lock held, to avoid a lock inversion. Allow external modules
> > (e.g. hv_balloon) that make use of add_memory()/add_memory_resource() to
> > lock device hotplug.
> >
> > Signed-off-by: Vitaly Kuznetsov 
> > [modify patch description]
> > Signed-off-by: David Hildenbrand 
> > ---
> >  drivers/base/core.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/base/core.c b/drivers/base/core.c
> > index 04bbcd779e11..9010b9e942b5 100644
> > --- a/drivers/base/core.c
> > +++ b/drivers/base/core.c
> > @@ -700,11 +700,13 @@ void lock_device_hotplug(void)
> >  {
> >   mutex_lock(_hotplug_lock);
> >  }
> > +EXPORT_SYMBOL_GPL(lock_device_hotplug);
> >
> >  void unlock_device_hotplug(void)
> >  {
> >   mutex_unlock(_hotplug_lock);
> >  }
> > +EXPORT_SYMBOL_GPL(unlock_device_hotplug);
>
> If these are going to be "global" symbols, let's properly name them.
> device_hotplug_lock/unlock would be better.

Well, device_hotplug_lock is the name of the lock itself. :-)

> But I am _really_ nervous about letting stuff outside of the driver core mess
> with this, as people better know what they are doing.
>
> Can't we just "lock" the memory stuff instead?  Why does the entirety of
> the driver core need to be messed with here?

Because, in general, memory hotplug and hotplug of devices are not
independent.  CPUs and memory may only be possible to take away
together and that may be the case for other devices too (say, it
wouldn't be a good idea to access a memory block that has just gone
away from a device, for DMA and the like).  That's why
device_hotplug_lock was introduced in the first place.

That said I agree that exporting this to drivers doesn't feel particularly safe.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH RFC 2/2] mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock

2018-08-17 Thread Rafael J. Wysocki

On Fri, Aug 17, 2018 at 9:59 AM David Hildenbrand  wrote:
>
> There seem to be some problems as result of 30467e0b3be ("mm, hotplug:
> fix concurrent memory hot-add deadlock"), which tried to fix a possible
> lock inversion reported and discussed in [1] due to the two locks
> a) device_lock()
> b) mem_hotplug_lock
>
> While add_memory() first takes b), followed by a) during
> bus_probe_device(), onlining of memory from user space first took b),
> followed by a), exposing a possible deadlock.
>
> In [1], and it was decided to not make use of device_hotplug_lock, but
> rather to enforce a locking order. Looking at 1., this order is not always
> satisfied when calling device_online() - essentially we simply don't take
> one of both locks anymore - and fixing this would require us to
> take the mem_hotplug_lock in core driver code (online_store()), which
> sounds wrong.
>
> The problems I spotted related to this:
>
> 1. Memory block device attributes: While .state first calls
>mem_hotplug_begin() and the calls device_online() - which takes
>device_lock() - .online does no longer call mem_hotplug_begin(), so
>effectively calls online_pages() without mem_hotplug_lock. onlining/
>offlining of pages is no longer serialised across different devices.
>
> 2. device_online() should be called under device_hotplug_lock, however
>onlining memory during add_memory() does not take care of that. (I
>didn't follow how strictly this is needed, but there seems to be a
>reason because it is documented at device_online() and
>device_offline()).
>
> In addition, I think there is also something wrong about the locking in
>
> 3. arch/powerpc/platforms/powernv/memtrace.c calls offline_pages()
>(and device_online()) without locks. This was introduced after
>30467e0b3be. And skimming over the code, I assume it could need some
>more care in regards to locking.
>
> ACPI code already holds the device_hotplug_lock, and as we are
> effectively hotplugging memory block devices, requiring to hold that
> lock does not sound too wrong, although not chosen in [1], as
> "I don't think resolving a locking dependency is appropriate by
>  just serializing them with another lock."
> I think this is the cleanest solution.
>
> Requiring add_memory()/add_memory_resource() to be called under
> device_hotplug_lock fixes 2., taking the mem_hotplug_lock in
> online_pages/offline_pages() fixes 1. and 3.
>
> Fixup all callers of add_memory/add_memory_resource to hold the lock if
> not already done.
>
> So this is essentially a revert of 30467e0b3be, implementation of what
> was suggested in [1] by Vitaly, applied to the current tree.
>
> [1] http://driverdev.linuxdriverproject.org/pipermail/ driverdev-devel/
> 2015-February/065324.html
>
> This patch is partly based on a patch by Vitaly Kuznetsov.
>
> Signed-off-by: David Hildenbrand 
> ---
>  arch/powerpc/platforms/powernv/memtrace.c |  3 ++
>  drivers/acpi/acpi_memhotplug.c|  1 +
>  drivers/base/memory.c | 18 +-
>  drivers/hv/hv_balloon.c   |  4 +++
>  drivers/s390/char/sclp_cmd.c  |  3 ++
>  drivers/xen/balloon.c |  3 ++
>  mm/memory_hotplug.c   | 42 ++-
>  7 files changed, 55 insertions(+), 19 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/memtrace.c 
> b/arch/powerpc/platforms/powernv/memtrace.c
> index 51dc398ae3f7..4c2737a33020 100644
> --- a/arch/powerpc/platforms/powernv/memtrace.c
> +++ b/arch/powerpc/platforms/powernv/memtrace.c
> @@ -206,6 +206,8 @@ static int memtrace_online(void)
> int i, ret = 0;
> struct memtrace_entry *ent;
>
> +   /* add_memory() requires device_hotplug_lock */
> +   lock_device_hotplug();
> for (i = memtrace_array_nr - 1; i >= 0; i--) {
> ent = _array[i];
>
> @@ -244,6 +246,7 @@ static int memtrace_online(void)
> pr_info("Added trace memory back to node %d\n", ent->nid);
> ent->size = ent->start = ent->nid = -1;
> }
> +   unlock_device_hotplug();
> if (ret)
> return ret;
>
> diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
> index 6b0d3ef7309c..e7a4c7900967 100644
> --- a/drivers/acpi/acpi_memhotplug.c
> +++ b/drivers/acpi/acpi_memhotplug.c
> @@ -228,6 +228,7 @@ static int acpi_memory_enable_device(struct 
> acpi_memory_device *mem_device)
> if (node < 0)
> node = memory_add_physaddr_to_nid(info->start_addr);
>
> +   /* we already hold the device_hotplug lock at this point */
> result = add_memory(node, info->start_addr, info->length);
>
> /*

A very minor nit here: I would say "device_hotplug_lock is already
held at this point" in the comment (I sort of don't like to say "we"
in code comments as it is not particularly clear what

Re: [Xen-devel] Ping: [PATCH] xen/ACPI: don't upload Px/Cx data for disabled processors

2018-08-15 Thread Rafael J. Wysocki

On Wed, Aug 15, 2018 at 8:44 AM Jan Beulich  wrote:
>
> >>> On 25.06.18 at 12:17,  wrote:
> > This is unnecessary and triggers a warning in the hypervisor.
> >
> > Often systems have more processor entries in their ACPI tables than are
> > actually installed/active. The ACPI_STA_DEVICE_PRESENT bit cannot be
> > reliably used, but the ACPI_MADT_ENABLED bit can. In order to not
> > introduce new functions in the main ACPI processor driver code, simply
> > use acpi_get_phys_id(), which does more than we need, but which checks
> > the MADT enabled bit in the process. Any CPU for which we can't
> > determine the APIC ID is unlikely to work properly anyway, so the extra
> > checks done by acpi_get_phys_id() should do no harm.
> >
> > Signed-off-by: Jan Beulich 
> > ---
> >  drivers/acpi/processor_core.c|1 +
> >  drivers/xen/xen-acpi-processor.c |6 ++
> >  2 files changed, 7 insertions(+)
>
> With Jürgen's R-b in place, may I ask for an ack for the processor_core.c
> change, or - in case you dislike the new export - an alternative suggestion?

It is fine to export it IMO.  If you need my ACK for that, please feel
free to add it to the patch.

Thanks,
Rafael

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3] xen/acpi: upload _PSD info for non Dom0 CPUs too

2018-03-20 Thread Rafael J. Wysocki

On Fri, Mar 16, 2018 at 2:57 PM, Joao Martins <joao.m.mart...@oracle.com> wrote:
> On 03/15/2018 03:45 PM, Boris Ostrovsky wrote:
>> On 03/15/2018 10:22 AM, Joao Martins wrote:
>>> All uploaded PM data from non-dom0 CPUs takes the info from vCPU 0 and
>>> changing only the acpi_id. For processors which P-state coordination type
>>> is HW_ALL (0xFD) it is OK to upload bogus P-state dependency information
>>> (_PSD), because Xen will ignore any cpufreq domains created for past CPUs.
>>>
>>> Albeit for platforms which expose coordination types as SW_ANY or SW_ALL,
>>> this will have some unintended side effects. Effectively, it will look at
>>> the P-state domain existence and *if it already exists* it will skip the
>>> acpi-cpufreq initialization and thus inherit the policy from the first CPU
>>> in the cpufreq domain. This will finally lead to the original cpu not
>>> changing target freq to P0 other than the first in the domain. Which will
>>> make turbo boost not getting enabled (e.g. for 'performance' governor) for
>>> all cpus.
>>>
>>> This patch fixes that, by also evaluating _PSD when we enumerate all ACPI
>>> processors and thus always uploading the correct info to Xen. We export
>>> acpi_processor_get_psd() for that this purpose, but change signature
>>> to not assume an existent of acpi_processor given that ACPI isn't creating
>>> an acpi_processor for non-dom0 CPUs.
>>>
>>> Signed-off-by: Joao Martins <joao.m.mart...@oracle.com>
>>
>> Reviewed-by: Boris Ostrovsky <boris.ostrov...@oracle.com>
>>
> Thanks!
>
> I suppose what's remaining is review (or ack) from ACPI folks on the interface
> changes made to acpi_processor_get_psd().

There you go:

Acked-by: Rafael J. Wysocki <rafael.j.wyso...@intel.com>

Do you want to route this via Xen?

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 0/3] xen: re-enable booting as Xen PVH guest

2018-02-19 Thread Rafael J. Wysocki


On 2/19/2018 11:09 AM, Juergen Gross wrote:

The Xen PVH boot protocol passes vital information to the kernel via
a start_info block. One of the data transferred is the physical address
of the RSDP table.

Unfortunately PVH support in the kernel didn't use that passed address
for RSDP, but relied on the legacy mechanism searching for the RSDP in
low memory. After a recent change in Xen putting the RSDP to a higher
address booting as PVH guest is now failing.

This small series repairs that by passing the RSDP address from the
start_info block to ACPI handling.

Changes in V3:
- instead of using a weak function add a function pointer to x86_init
   for obtaining the RSDP address

Juergen Gross (3):
   acpi: introduce acpi_arch_get_root_pointer() for getting rsdp address
   x86/acpi: add a new x86_init_acpi structure to x86_init_ops
   x86/xen: add pvh specific rsdp address retrieval function

  arch/x86/include/asm/acpi.h |  7 +++
  arch/x86/include/asm/x86_init.h |  9 +
  arch/x86/kernel/x86_init.c  |  5 +
  arch/x86/xen/enlighten_pvh.c| 14 +++---
  drivers/acpi/osl.c  |  5 -
  include/linux/acpi.h|  7 +++
  6 files changed, 43 insertions(+), 4 deletions(-)


The series is fine by me:

Acked-by: Rafael J. Wysocki <rafael.j.wyso...@intel.com>



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 1/2] x86/acpi: add retrieval function for rsdp address

2018-02-02 Thread Rafael J. Wysocki

On Thu, Feb 1, 2018 at 4:45 PM, Andy Shevchenko
<andy.shevche...@gmail.com> wrote:
> On Thu, Feb 1, 2018 at 9:57 AM, Rafael J. Wysocki <raf...@kernel.org> wrote:
>> On Wed, Jan 31, 2018 at 4:43 PM, Andy Shevchenko
>> <andy.shevche...@gmail.com> wrote:
>>> On Mon, Jan 29, 2018 at 5:02 AM, Rafael J. Wysocki <raf...@kernel.org> 
>>> wrote:
>>>> On Sun, Jan 28, 2018 at 4:04 PM, Andy Shevchenko
>>>> <andy.shevche...@gmail.com> wrote:
>
>>> Instead of declaring function as __weak, establish a new struct for
>>> ACPI related stubs and incorporate it into x86_init.
>>>
>>> That is my proposal. I think I would go this way in my case where I
>>> need to treat differently ACPI HW reduced initialization of legacy
>>> devices.
>>
>> IOW you'd like to have a set of ACPI init callbacks that could be
>> defined by an arch, right?
>
> Correct!

OK

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 1/2] x86/acpi: add retrieval function for rsdp address

2018-01-31 Thread Rafael J. Wysocki

On Wed, Jan 31, 2018 at 4:43 PM, Andy Shevchenko
<andy.shevche...@gmail.com> wrote:
> On Mon, Jan 29, 2018 at 5:02 AM, Rafael J. Wysocki <raf...@kernel.org> wrote:
>> On Sun, Jan 28, 2018 at 4:04 PM, Andy Shevchenko
>> <andy.shevche...@gmail.com> wrote:
>>> On Fri, Jan 26, 2018 at 8:21 PM, Juergen Gross <jgr...@suse.com> wrote:
>>>> On 26/01/18 19:08, Andy Shevchenko wrote:
>>>>> On Thu, Jan 25, 2018 at 4:36 PM, Juergen Gross <jgr...@suse.com> wrote:
>
>>>>> The problem with weak functions that we can't have more than one
>>>>> implementation per kernel while we would like to built several code
>>>>> paths.
>>>>>
>>>>> I have stumbled on the similar stuff and realize that.
>>>>>
>>>>> Perhaps, one of the solution is to have an additional struct under
>>>>> x86_init to alternate ACPI related stuff.
>>>>
>>>> I think we can go that route when another user of that interface is
>>>> appearing.
>>>
>>> Why not to establish the struct? At least this route I would like to
>>> go with [1].
>>>
>>> [1]: https://lkml.org/lkml/2018/1/17/834
>>
>> Maybe I'm a bit slow today, but care to explain what exactly you mean?
>
> Instead of declaring function as __weak, establish a new struct for
> ACPI related stubs and incorporate it into x86_init.
>
> That is my proposal. I think I would go this way in my case where I
> need to treat differently ACPI HW reduced initialization of legacy
> devices.

IOW you'd like to have a set of ACPI init callbacks that could be
defined by an arch, right?

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 1/2] x86/acpi: add retrieval function for rsdp address

2018-01-28 Thread Rafael J. Wysocki

On Sun, Jan 28, 2018 at 4:04 PM, Andy Shevchenko
 wrote:
> On Fri, Jan 26, 2018 at 8:21 PM, Juergen Gross  wrote:
>> On 26/01/18 19:08, Andy Shevchenko wrote:
>>> On Thu, Jan 25, 2018 at 4:36 PM, Juergen Gross  wrote:
 Add a function to get the address of the RSDP table. Per default use a
 __weak annotated function being a nop.
>>>
>>> The problem with weak functions that we can't have more than one
>>> implementation per kernel while we would like to built several code
>>> paths.
>>>
>>> I have stumbled on the similar stuff and realize that.
>>>
>>> Perhaps, one of the solution is to have an additional struct under
>>> x86_init to alternate ACPI related stuff.
>>
>> I think we can go that route when another user of that interface is
>> appearing.
>
> Why not to establish the struct? At least this route I would like to
> go with [1].
>
> [1]: https://lkml.org/lkml/2018/1/17/834

Maybe I'm a bit slow today, but care to explain what exactly you mean?

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 1/2] x86/acpi: add retrieval function for rsdp address

2018-01-28 Thread Rafael J. Wysocki

On Fri, Jan 26, 2018 at 7:08 PM, Andy Shevchenko
 wrote:
> On Thu, Jan 25, 2018 at 4:36 PM, Juergen Gross  wrote:
>> Add a function to get the address of the RSDP table. Per default use a
>> __weak annotated function being a nop.
>
> The problem with weak functions that we can't have more than one
> implementation per kernel while we would like to built several code
> paths.
>
> I have stumbled on the similar stuff and realize that.
>
> Perhaps, one of the solution is to have an additional struct under
> x86_init to alternate ACPI related stuff.

I'm not sure if that really is a problem in this particular case.

Why would you want to use different RSDP retrieval functions for one arch?

Thanks,
Rafael

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

74 matches

Mail list logo