Re: [PATCH 0/5] cpuidle-pseries: Parse extended CEDE information for idle.

2020-07-27 Thread Rafael J. Wysocki
On Tue, Jul 7, 2020 at 1:32 PM Gautham R Shenoy  wrote:
>
> Hi,
>
> On Tue, Jul 07, 2020 at 04:41:34PM +0530, Gautham R. Shenoy wrote:
> > From: "Gautham R. Shenoy" 
> >
> > Hi,
> >
> >
> >
> >
> > Gautham R. Shenoy (5):
> >   cpuidle-pseries: Set the latency-hint before entering CEDE
> >   cpuidle-pseries: Add function to parse extended CEDE records
> >   cpuidle-pseries : Fixup exit latency for CEDE(0)
> >   cpuidle-pseries : Include extended CEDE states in cpuidle framework
> >   cpuidle-pseries: Block Extended CEDE(1) which adds no additional
> > value.
>
> Forgot to mention that these patches are on top of Nathan's series to
> remove extended CEDE offline and bogus topology update code :
> https://lore.kernel.org/linuxppc-dev/20200612051238.1007764-1-nath...@linux.ibm.com/

OK, so this is targeted at the powerpc maintainers, isn't it?


Re: [PATCH v3 1/2] cpuidle: Trace IPI based and timer based wakeup latency from idle states

2020-07-27 Thread Rafael J. Wysocki
On Tue, Jul 21, 2020 at 2:43 PM Pratik Rajesh Sampat
 wrote:
>
> Fire directed smp_call_function_single IPIs from a specified source
> CPU to the specified target CPU to reduce the noise we have to wade
> through in the trace log.

And what's the purpose of it?

> The module is based on the idea written by Srivatsa Bhat and maintained
> by Vaidyanathan Srinivasan internally.
>
> Queue HR timer and measure jitter. Wakeup latency measurement for idle
> states using hrtimer.  Echo a value in ns to timer_test_function and
> watch trace. A HRtimer will be queued and when it fires the expected
> wakeup vs actual wakeup is computes and delay printed in ns.
>
> Implemented as a module which utilizes debugfs so that it can be
> integrated with selftests.
>
> To include the module, check option and include as module
> kernel hacking -> Cpuidle latency selftests
>
> [srivatsa.b...@linux.vnet.ibm.com: Initial implementation in
>  cpidle/sysfs]
>
> [sva...@linux.vnet.ibm.com: wakeup latency measurements using hrtimer
>  and fix some of the time calculation]
>
> [e...@linux.vnet.ibm.com: Fix some whitespace and tab errors and
>  increase the resolution of IPI wakeup]
>
> Signed-off-by: Pratik Rajesh Sampat 
> Reviewed-by: Gautham R. Shenoy 
> ---
>  drivers/cpuidle/Makefile   |   1 +
>  drivers/cpuidle/test-cpuidle_latency.c | 150 +
>  lib/Kconfig.debug  |  10 ++
>  3 files changed, 161 insertions(+)
>  create mode 100644 drivers/cpuidle/test-cpuidle_latency.c
>
> diff --git a/drivers/cpuidle/Makefile b/drivers/cpuidle/Makefile
> index f07800cbb43f..2ae05968078c 100644
> --- a/drivers/cpuidle/Makefile
> +++ b/drivers/cpuidle/Makefile
> @@ -8,6 +8,7 @@ obj-$(CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED) += coupled.o
>  obj-$(CONFIG_DT_IDLE_STATES) += dt_idle_states.o
>  obj-$(CONFIG_ARCH_HAS_CPU_RELAX) += poll_state.o
>  obj-$(CONFIG_HALTPOLL_CPUIDLE)   += cpuidle-haltpoll.o
> +obj-$(CONFIG_IDLE_LATENCY_SELFTEST)  += test-cpuidle_latency.o
>
>  
> ##
>  # ARM SoC drivers
> diff --git a/drivers/cpuidle/test-cpuidle_latency.c 
> b/drivers/cpuidle/test-cpuidle_latency.c
> new file mode 100644
> index ..61574665e972
> --- /dev/null
> +++ b/drivers/cpuidle/test-cpuidle_latency.c
> @@ -0,0 +1,150 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * Module-based API test facility for cpuidle latency using IPIs and timers

I'd like to see a more detailed description of what it does and how it
works here.

> + */
> +
> +#include 
> +#include 
> +#include 
> +
> +/* IPI based wakeup latencies */
> +struct latency {
> +   unsigned int src_cpu;
> +   unsigned int dest_cpu;
> +   ktime_t time_start;
> +   ktime_t time_end;
> +   u64 latency_ns;
> +} ipi_wakeup;
> +
> +static void measure_latency(void *info)
> +{
> +   struct latency *v;
> +   ktime_t time_diff;
> +
> +   v = (struct latency *)info;
> +   v->time_end = ktime_get();
> +   time_diff = ktime_sub(v->time_end, v->time_start);
> +   v->latency_ns = ktime_to_ns(time_diff);
> +}
> +
> +void run_smp_call_function_test(unsigned int cpu)
> +{
> +   ipi_wakeup.src_cpu = smp_processor_id();
> +   ipi_wakeup.dest_cpu = cpu;
> +   ipi_wakeup.time_start = ktime_get();
> +   smp_call_function_single(cpu, measure_latency, _wakeup, 1);
> +}
> +
> +/* Timer based wakeup latencies */
> +struct timer_data {
> +   unsigned int src_cpu;
> +   u64 timeout;
> +   ktime_t time_start;
> +   ktime_t time_end;
> +   struct hrtimer timer;
> +   u64 timeout_diff_ns;
> +} timer_wakeup;
> +
> +static enum hrtimer_restart timer_called(struct hrtimer *hrtimer)
> +{
> +   struct timer_data *w;
> +   ktime_t time_diff;
> +
> +   w = container_of(hrtimer, struct timer_data, timer);
> +   w->time_end = ktime_get();
> +
> +   time_diff = ktime_sub(w->time_end, w->time_start);
> +   time_diff = ktime_sub(time_diff, ns_to_ktime(w->timeout));
> +   w->timeout_diff_ns = ktime_to_ns(time_diff);
> +   return HRTIMER_NORESTART;
> +}
> +
> +static void run_timer_test(unsigned int ns)
> +{
> +   hrtimer_init(_wakeup.timer, CLOCK_MONOTONIC,
> +HRTIMER_MODE_REL);
> +   timer_wakeup.timer.function = timer_called;
> +   timer_wakeup.time_start = ktime_get();
> +   timer_wakeup.src_cpu = smp_processor_id();
> +   timer_wakeup.timeout = ns;
> +
> +   hrtimer_start(_wakeup.timer, ns_to_ktime(ns),
> + HRTIMER_MODE_REL_PINNED);
> +}
> +
> +static struct dentry *dir;
> +
> +static int cpu_read_op(void *data, u64 *value)
> +{
> +   *value = ipi_wakeup.dest_cpu;
> +   return 0;
> +}
> +
> +static int cpu_write_op(void *data, u64 value)
> +{
> +   run_smp_call_function_test(value);
> +   return 0;
> +}
> +DEFINE_SIMPLE_ATTRIBUTE(ipi_ops, cpu_read_op, 

Re: [PATCH -next] cpuidle/pseries: Make symbol 'pseries_idle_driver' static

2020-07-15 Thread Rafael J. Wysocki
On Tue, Jul 14, 2020 at 4:14 PM Wei Yongjun  wrote:
>
> The sparse tool complains as follows:
>
> drivers/cpuidle/cpuidle-pseries.c:25:23: warning:
>  symbol 'pseries_idle_driver' was not declared. Should it be static?
>
> 'pseries_idle_driver' is not used outside of this file, so marks
> it static.
>
> Reported-by: Hulk Robot 
> Signed-off-by: Wei Yongjun 
> ---
>  drivers/cpuidle/cpuidle-pseries.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/cpuidle/cpuidle-pseries.c 
> b/drivers/cpuidle/cpuidle-pseries.c
> index 6513ef2af66a..3e058ad2bb51 100644
> --- a/drivers/cpuidle/cpuidle-pseries.c
> +++ b/drivers/cpuidle/cpuidle-pseries.c
> @@ -22,7 +22,7 @@
>  #include 
>  #include 
>
> -struct cpuidle_driver pseries_idle_driver = {
> +static struct cpuidle_driver pseries_idle_driver = {
> .name = "pseries_idle",
> .owner= THIS_MODULE,
>  };

Applied as 5.9 material, thanks!


Re: [PATCH -next] cpufreq: powernv: Make some symbols static

2020-07-15 Thread Rafael J. Wysocki
On Tue, Jul 14, 2020 at 4:14 PM Wei Yongjun  wrote:
>
> The sparse tool complains as follows:
>
> drivers/cpufreq/powernv-cpufreq.c:88:1: warning:
>  symbol 'pstate_revmap' was not declared. Should it be static?
> drivers/cpufreq/powernv-cpufreq.c:383:18: warning:
>  symbol 'cpufreq_freq_attr_cpuinfo_nominal_freq' was not declared. Should it 
> be static?
> drivers/cpufreq/powernv-cpufreq.c:669:6: warning:
>  symbol 'gpstate_timer_handler' was not declared. Should it be static?
> drivers/cpufreq/powernv-cpufreq.c:902:6: warning:
>  symbol 'powernv_cpufreq_work_fn' was not declared. Should it be static?
>
> Those symbols are not used outside of this file, so mark
> them static.
>
> Reported-by: Hulk Robot 
> Signed-off-by: Wei Yongjun 
> ---
>  drivers/cpufreq/powernv-cpufreq.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/cpufreq/powernv-cpufreq.c 
> b/drivers/cpufreq/powernv-cpufreq.c
> index 8646eb197cd9..cf118263ec65 100644
> --- a/drivers/cpufreq/powernv-cpufreq.c
> +++ b/drivers/cpufreq/powernv-cpufreq.c
> @@ -85,7 +85,7 @@ struct global_pstate_info {
>
>  static struct cpufreq_frequency_table powernv_freqs[POWERNV_MAX_PSTATES+1];
>
> -DEFINE_HASHTABLE(pstate_revmap, POWERNV_MAX_PSTATES_ORDER);
> +static DEFINE_HASHTABLE(pstate_revmap, POWERNV_MAX_PSTATES_ORDER);
>  /**
>   * struct pstate_idx_revmap_data: Entry in the hashmap pstate_revmap
>   *   indexed by a function of pstate id.
> @@ -380,7 +380,7 @@ static ssize_t cpuinfo_nominal_freq_show(struct 
> cpufreq_policy *policy,
> powernv_freqs[powernv_pstate_info.nominal].frequency);
>  }
>
> -struct freq_attr cpufreq_freq_attr_cpuinfo_nominal_freq =
> +static struct freq_attr cpufreq_freq_attr_cpuinfo_nominal_freq =
> __ATTR_RO(cpuinfo_nominal_freq);
>
>  #define SCALING_BOOST_FREQS_ATTR_INDEX 2
> @@ -666,7 +666,7 @@ static inline void  queue_gpstate_timer(struct 
> global_pstate_info *gpstates)
>   * according quadratic equation. Queues a new timer if it is still not equal
>   * to local pstate
>   */
> -void gpstate_timer_handler(struct timer_list *t)
> +static void gpstate_timer_handler(struct timer_list *t)
>  {
> struct global_pstate_info *gpstates = from_timer(gpstates, t, timer);
> struct cpufreq_policy *policy = gpstates->policy;
> @@ -899,7 +899,7 @@ static struct notifier_block powernv_cpufreq_reboot_nb = {
> .notifier_call = powernv_cpufreq_reboot_notifier,
>  };
>
> -void powernv_cpufreq_work_fn(struct work_struct *work)
> +static void powernv_cpufreq_work_fn(struct work_struct *work)
>  {
> struct chip *chip = container_of(work, struct chip, throttle);
> struct cpufreq_policy *policy;
>

Applied as 5.9 material, thanks!


Re: [PATCH V4 0/3] cpufreq: Allow default governor on cmdline and fix locking issues

2020-06-30 Thread Rafael J. Wysocki
On Mon, Jun 29, 2020 at 10:58 PM Viresh Kumar  wrote:
>
> Hi,
>
> I have picked Quentin's series over my patch, modified both and tested.
>
> V3->V4:
> - Do __module_get() for cpufreq_default_governor() case as well and get
>   rid of an extra variable.
> - Use a single character array, default_governor, instead of two of them.
>
> V2->V3:
> - default_governor is a string now and we don't set it on governor
>   registration or unregistration anymore.
> - Fixed locking issues in cpufreq_init_policy().
>
> --
> Viresh
>
> Original cover letter fro Quentin:
>
> This series enables users of prebuilt kernels (e.g. distro kernels) to
> specify their CPUfreq governor of choice using the kernel command line,
> instead of having to wait for the system to fully boot to userspace to
> switch using the sysfs interface. This is helpful for 2 reasons:
>   1. users get to choose the governor that runs during the actual boot;
>   2. it simplifies the userspace boot procedure a bit (one less thing to
>  worry about).
>
> To enable this, the first patch moves all governor init calls to
> core_initcall, to make sure they are registered by the time the drivers
> probe. This should be relatively low impact as registering a governor
> is a simple procedure (it gets added to a llist), and all governors
> already load at core_initcall anyway when they're set as the default
> in Kconfig. This also allows to clean-up the governors' init/exit code,
> and reduces boilerplate.
>
> The second patch introduces the new command line parameter, inspired by
> its cpuidle counterpart. More details can be found in the respective
> patch headers.
>
> Changes in v2:
>  - added Viresh's ack to patch 01
>  - moved the assignment of 'default_governor' in patch 02 to the governor
>registration path instead of the driver registration (Viresh)
>
> Quentin Perret (2):
>   cpufreq: Register governors at core_initcall
>   cpufreq: Specify default governor on command line
>
> Viresh Kumar (1):
>   cpufreq: Fix locking issues with governors
>
>  .../admin-guide/kernel-parameters.txt |  5 ++
>  Documentation/admin-guide/pm/cpufreq.rst  |  6 +-
>  .../platforms/cell/cpufreq_spudemand.c| 26 +-
>  drivers/cpufreq/cpufreq.c | 87 ---
>  drivers/cpufreq/cpufreq_conservative.c| 22 ++---
>  drivers/cpufreq/cpufreq_ondemand.c| 24 ++---
>  drivers/cpufreq/cpufreq_performance.c | 14 +--
>  drivers/cpufreq/cpufreq_powersave.c   | 18 +---
>  drivers/cpufreq/cpufreq_userspace.c   | 18 +---
>  include/linux/cpufreq.h   | 14 +++
>  kernel/sched/cpufreq_schedutil.c  |  6 +-
>  11 files changed, 100 insertions(+), 140 deletions(-)
>
> --

All three patches applied as 5.9 material, thanks!


Re: [PATCH v2 2/2] cpufreq: Specify default governor on command line

2020-06-25 Thread Rafael J. Wysocki
On Thu, Jun 25, 2020 at 3:50 PM Quentin Perret  wrote:
>
> On Thursday 25 Jun 2020 at 15:28:43 (+0200), Rafael J. Wysocki wrote:
> > On Thu, Jun 25, 2020 at 1:53 PM Quentin Perret  wrote:
> > >
> > > On Thursday 25 Jun 2020 at 13:44:34 (+0200), Rafael J. Wysocki wrote:
> > > > On Thu, Jun 25, 2020 at 1:36 PM Viresh Kumar  
> > > > wrote:
> > > > > This change is not right IMO. This part handles the set-policy case,
> > > > > where there are no governors. Right now this code, for some reasons
> > > > > unknown to me, forcefully uses the default governor set to indicate
> > > > > the policy, which is not a great idea in my opinion TBH. This doesn't
> > > > > and shouldn't care about governor modules and should only be looking
> > > > > at strings instead of governor pointer.
> > > >
> > > > Sounds right.
> > > >
> > > > > Rafael, I even think we should remove this code completely and just
> > > > > rely on what the driver has sent to us. Using the selected governor
> > > > > for set policy drivers is very confusing and also we shouldn't be
> > > > > forced to compiling any governor for the set-policy case.
> > > >
> > > > Well, AFAICS the idea was to use the default governor as a kind of
> > > > default policy proxy, but I agree that strings should be sufficient
> > > > for that.
> > >
> > > I agree with all the above. I'd much rather not rely on the default
> > > governor name to populate the default policy, too, so +1 from me.
> >
> > So before this series the default governor was selected at the kernel
> > configuration time (pre-build) and was always built-in.  Because it
> > could not go away, its name could be used to indicate the default
> > policy for the "setpolicy" drivers.
> >
> > After this series, however, it cannot be used this way reliably, but
> > you can still pass cpufreq_param_governor to cpufreq_parse_policy()
> > instead of def_gov->name in cpufreq_init_policy(), can't you?
>
> Good point. I also need to fallback to the default builtin governor if
> the command line parameter isn't valid (or non-existent), so perhaps
> something like so?

Yes, that should work if I haven't missed anything.

> iff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index dad6b85f4c89..20a2020abf88 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -653,6 +653,23 @@ static unsigned int cpufreq_parse_policy(char 
> *str_governor)
> return CPUFREQ_POLICY_UNKNOWN;
>  }
>
> +static unsigned int cpufreq_default_policy(void)
> +{
> +   unsigned int pol;
> +
> +   pol = cpufreq_parse_policy(cpufreq_param_governor);
> +   if (pol != CPUFREQ_POLICY_UNKNOWN)
> +   return pol;
> +
> +   if (IS_BUILTIN(CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE))
> +   return CPUFREQ_POLICY_PERFORMANCE;
> +
> +   if (IS_BUILTIN(CONFIG_CPU_FREQ_DEFAULT_GOV_POWERSAVE))
> +   return CPUFREQ_POLICY_POWERSAVE;
> +
> +   return CPUFREQ_POLICY_UNKNOWN;
> +}
> +
>  /**
>   * cpufreq_parse_governor - parse a governor string only for has_target()
>   * @str_governor: Governor name.
> @@ -1085,8 +1102,8 @@ static int cpufreq_init_policy(struct cpufreq_policy 
> *policy)
> /* Use the default policy if there is no last_policy. */
> if (policy->last_policy) {
> pol = policy->last_policy;
> -   } else if (default_governor) {
> -   pol = cpufreq_parse_policy(default_governor->name);
> +   } else {
> +   pol = cpufreq_default_policy();
> /*
>  * In case the default governor is neiter 
> "performance"
>  * nor "powersave", fall back to the initial policy


Re: [PATCH v2 2/2] cpufreq: Specify default governor on command line

2020-06-25 Thread Rafael J. Wysocki
On Thu, Jun 25, 2020 at 1:53 PM Quentin Perret  wrote:
>
> On Thursday 25 Jun 2020 at 13:44:34 (+0200), Rafael J. Wysocki wrote:
> > On Thu, Jun 25, 2020 at 1:36 PM Viresh Kumar  
> > wrote:
> > > This change is not right IMO. This part handles the set-policy case,
> > > where there are no governors. Right now this code, for some reasons
> > > unknown to me, forcefully uses the default governor set to indicate
> > > the policy, which is not a great idea in my opinion TBH. This doesn't
> > > and shouldn't care about governor modules and should only be looking
> > > at strings instead of governor pointer.
> >
> > Sounds right.
> >
> > > Rafael, I even think we should remove this code completely and just
> > > rely on what the driver has sent to us. Using the selected governor
> > > for set policy drivers is very confusing and also we shouldn't be
> > > forced to compiling any governor for the set-policy case.
> >
> > Well, AFAICS the idea was to use the default governor as a kind of
> > default policy proxy, but I agree that strings should be sufficient
> > for that.
>
> I agree with all the above. I'd much rather not rely on the default
> governor name to populate the default policy, too, so +1 from me.

So before this series the default governor was selected at the kernel
configuration time (pre-build) and was always built-in.  Because it
could not go away, its name could be used to indicate the default
policy for the "setpolicy" drivers.

After this series, however, it cannot be used this way reliably, but
you can still pass cpufreq_param_governor to cpufreq_parse_policy()
instead of def_gov->name in cpufreq_init_policy(), can't you?


Re: [PATCH v2 2/2] cpufreq: Specify default governor on command line

2020-06-25 Thread Rafael J. Wysocki
On Thu, Jun 25, 2020 at 1:36 PM Viresh Kumar  wrote:
>
> After your last email (reply to my patch), I noticed a change which
> isn't required. :)
>
> On 23-06-20, 15:21, Quentin Perret wrote:
> > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> > index 0128de3603df..4b1a5c0173cf 100644
> > --- a/drivers/cpufreq/cpufreq.c
> > +++ b/drivers/cpufreq/cpufreq.c
> > @@ -50,6 +50,9 @@ static LIST_HEAD(cpufreq_governor_list);
> >  #define for_each_governor(__governor)\
> >   list_for_each_entry(__governor, _governor_list, governor_list)
> >
> > +static char cpufreq_param_governor[CPUFREQ_NAME_LEN];
> > +static struct cpufreq_governor *default_governor;
> > +
> >  /**
> >   * The "cpufreq driver" - the arch- or hardware-dependent low
> >   * level driver of CPUFreq support, and its spinlock. This lock
> > @@ -1055,7 +1058,6 @@ __weak struct cpufreq_governor 
> > *cpufreq_default_governor(void)
> >
> >  static int cpufreq_init_policy(struct cpufreq_policy *policy)
> >  {
> > - struct cpufreq_governor *def_gov = cpufreq_default_governor();
> >   struct cpufreq_governor *gov = NULL;
> >   unsigned int pol = CPUFREQ_POLICY_UNKNOWN;
> >
> > @@ -1065,8 +1067,8 @@ static int cpufreq_init_policy(struct cpufreq_policy 
> > *policy)
> >   if (gov) {
> >   pr_debug("Restoring governor %s for cpu %d\n",
> >policy->governor->name, policy->cpu);
> > - } else if (def_gov) {
> > - gov = def_gov;
> > + } else if (default_governor) {
> > + gov = default_governor;
> >   } else {
> >   return -ENODATA;
> >   }
>
>
> > @@ -1074,8 +1076,8 @@ static int cpufreq_init_policy(struct cpufreq_policy 
> > *policy)
> >   /* Use the default policy if there is no last_policy. */
> >   if (policy->last_policy) {
> >   pol = policy->last_policy;
> > - } else if (def_gov) {
> > - pol = cpufreq_parse_policy(def_gov->name);
> > + } else if (default_governor) {
> > + pol = cpufreq_parse_policy(default_governor->name);
>
> This change is not right IMO. This part handles the set-policy case,
> where there are no governors. Right now this code, for some reasons
> unknown to me, forcefully uses the default governor set to indicate
> the policy, which is not a great idea in my opinion TBH. This doesn't
> and shouldn't care about governor modules and should only be looking
> at strings instead of governor pointer.

Sounds right.

> Rafael, I even think we should remove this code completely and just
> rely on what the driver has sent to us. Using the selected governor
> for set policy drivers is very confusing and also we shouldn't be
> forced to compiling any governor for the set-policy case.

Well, AFAICS the idea was to use the default governor as a kind of
default policy proxy, but I agree that strings should be sufficient
for that.

I'll have a look at what to do with that code.


Re: [PATCH v2 2/2] cpufreq: Specify default governor on command line

2020-06-25 Thread Rafael J. Wysocki
On Thu, Jun 25, 2020 at 10:50 AM Viresh Kumar  wrote:
>
> On 24-06-20, 16:32, Quentin Perret wrote:
> > Right, but I must admit that, looking at this more, I'm getting a bit
> > confused with the overall locking for governors :/
> >
> > When in cpufreq_init_policy() we find a governor using
> > find_governor(policy->last_governor), what guarantees this governor is
> > not concurrently unregistered? That is, what guarantees this governor
> > doesn't go away between that find_governor() call, and the subsequent
> > call to try_module_get() in cpufreq_set_policy() down the line?
> >
> > Can we somewhat assume that whatever governor is referred to by
> > policy->last_governor will have a non-null refcount? Or are the
> > cpufreq_online() and cpufreq_unregister_governor() path mutually
> > exclusive? Or is there something else?
>
> This should be sufficient to fix pending issues I believe. Based over your
> patches.

LGTM, but can you post it in a new thread to let Patchwork pick it up?

> -8<-
> From: Viresh Kumar 
> Date: Thu, 25 Jun 2020 13:15:23 +0530
> Subject: [PATCH] cpufreq: Fix locking issues with governors
>
> The locking around governors handling isn't adequate currently. The list
> of governors should never be traversed without locking in place. Also we
> must make sure the governor isn't removed while it is still referenced
> by code.
>
> Reported-by: Quentin Perret 
> Signed-off-by: Viresh Kumar 
> ---
>  drivers/cpufreq/cpufreq.c | 59 ---
>  1 file changed, 36 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 4b1a5c0173cf..dad6b85f4c89 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -624,6 +624,24 @@ static struct cpufreq_governor *find_governor(const char 
> *str_governor)
> return NULL;
>  }
>
> +static struct cpufreq_governor *get_governor(const char *str_governor)
> +{
> +   struct cpufreq_governor *t;
> +
> +   mutex_lock(_governor_mutex);
> +   t = find_governor(str_governor);
> +   if (!t)
> +   goto unlock;
> +
> +   if (!try_module_get(t->owner))
> +   t = NULL;
> +
> +unlock:
> +   mutex_unlock(_governor_mutex);
> +
> +   return t;
> +}
> +
>  static unsigned int cpufreq_parse_policy(char *str_governor)
>  {
> if (!strncasecmp(str_governor, "performance", CPUFREQ_NAME_LEN))
> @@ -643,28 +661,14 @@ static struct cpufreq_governor 
> *cpufreq_parse_governor(char *str_governor)
>  {
> struct cpufreq_governor *t;
>
> -   mutex_lock(_governor_mutex);
> -
> -   t = find_governor(str_governor);
> -   if (!t) {
> -   int ret;
> -
> -   mutex_unlock(_governor_mutex);
> -
> -   ret = request_module("cpufreq_%s", str_governor);
> -   if (ret)
> -   return NULL;
> -
> -   mutex_lock(_governor_mutex);
> +   t = get_governor(str_governor);
> +   if (t)
> +   return t;
>
> -   t = find_governor(str_governor);
> -   }
> -   if (t && !try_module_get(t->owner))
> -   t = NULL;
> -
> -   mutex_unlock(_governor_mutex);
> +   if (request_module("cpufreq_%s", str_governor))
> +   return NULL;
>
> -   return t;
> +   return get_governor(str_governor);
>  }
>
>  /**
> @@ -818,12 +822,14 @@ static ssize_t show_scaling_available_governors(struct 
> cpufreq_policy *policy,
> goto out;
> }
>
> +   mutex_lock(_governor_mutex);
> for_each_governor(t) {
> if (i >= (ssize_t) ((PAGE_SIZE / sizeof(char))
> - (CPUFREQ_NAME_LEN + 2)))
> -   goto out;
> +   break;
> i += scnprintf([i], CPUFREQ_NAME_PLEN, "%s ", t->name);
> }
> +   mutex_unlock(_governor_mutex);
>  out:
> i += sprintf([i], "\n");
> return i;
> @@ -1060,11 +1066,14 @@ static int cpufreq_init_policy(struct cpufreq_policy 
> *policy)
>  {
> struct cpufreq_governor *gov = NULL;
> unsigned int pol = CPUFREQ_POLICY_UNKNOWN;
> +   bool put_governor = false;
> +   int ret;
>
> if (has_target()) {
> /* Update policy governor to the one used before hotplug. */
> -   gov = find_governor(policy->last_governor);
> +   gov = get_governor(policy->last_governor);
> if (gov) {
> +   put_governor = true;
> pr_debug("Restoring governor %s for cpu %d\n",
>  policy->governor->name, policy->cpu);
> } else if (default_governor) {
> @@ -1091,7 +1100,11 @@ static int cpufreq_init_policy(struct cpufreq_policy 
> *policy)
> return -ENODATA;
> }
>
> -   return cpufreq_set_policy(policy, gov, pol);
> +

Re: [PATCH v2 2/2] cpufreq: Specify default governor on command line

2020-06-24 Thread Rafael J. Wysocki
On Wed, Jun 24, 2020 at 7:50 AM Viresh Kumar  wrote:
>
> On 23-06-20, 15:21, Quentin Perret wrote:
> > Currently, the only way to specify the default CPUfreq governor is via
> > Kconfig options, which suits users who can build the kernel themselves
> > perfectly.
> >
> > However, for those who use a distro-like kernel (such as Android, with
> > the Generic Kernel Image project), the only way to use a different
> > default is to boot to userspace, and to then switch using the sysfs
> > interface. Being able to specify the default governor on the command
> > line, like is the case for cpuidle, would enable those users to specify
> > their governor of choice earlier on, and to simplify slighlty the
> > userspace boot procedure.
> >
> > To support this use-case, add a kernel command line parameter enabling
> > to specify a default governor for CPUfreq, which takes precedence over
> > the builtin default.
> >
> > This implementation has one notable limitation: the default governor
> > must be registered before the driver. This is solved for builtin
> > governors and drivers using appropriate *_initcall() functions. And in
> > the modular case, this must be reflected as a constraint on the module
> > loading order.
> >
> > Signed-off-by: Quentin Perret 
> > ---
> >  .../admin-guide/kernel-parameters.txt |  5 
> >  Documentation/admin-guide/pm/cpufreq.rst  |  6 ++---
> >  drivers/cpufreq/cpufreq.c | 23 +++
> >  3 files changed, 26 insertions(+), 8 deletions(-)
> >
> > diff --git a/Documentation/admin-guide/kernel-parameters.txt 
> > b/Documentation/admin-guide/kernel-parameters.txt
> > index fb95fad81c79..5fd3c9f187eb 100644
> > --- a/Documentation/admin-guide/kernel-parameters.txt
> > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > @@ -703,6 +703,11 @@
> >   cpufreq.off=1   [CPU_FREQ]
> >   disable the cpufreq sub-system
> >
> > + cpufreq.default_governor=
> > + [CPU_FREQ] Name of the default cpufreq governor to 
> > use.
> > + This governor must be registered in the kernel before
> > + the cpufreq driver probes.
> > +
> >   cpu_init_udelay=N
> >   [X86] Delay for N microsec between assert and 
> > de-assert
> >   of APIC INIT to start processors.  This delay occurs
> > diff --git a/Documentation/admin-guide/pm/cpufreq.rst 
> > b/Documentation/admin-guide/pm/cpufreq.rst
> > index 0c74a7784964..368e612145d2 100644
> > --- a/Documentation/admin-guide/pm/cpufreq.rst
> > +++ b/Documentation/admin-guide/pm/cpufreq.rst
> > @@ -147,9 +147,9 @@ CPUs in it.
> >
> >  The next major initialization step for a new policy object is to attach a
> >  scaling governor to it (to begin with, that is the default scaling governor
> > -determined by the kernel configuration, but it may be changed later
> > -via ``sysfs``).  First, a pointer to the new policy object is passed to the
> > -governor's ``->init()`` callback which is expected to initialize all of the
> > +determined by the kernel command line or configuration, but it may be 
> > changed
> > +later via ``sysfs``).  First, a pointer to the new policy object is passed 
> > to
> > +the governor's ``->init()`` callback which is expected to initialize all 
> > of the
> >  data structures necessary to handle the given policy and, possibly, to add
> >  a governor ``sysfs`` interface to it.  Next, the governor is started by
> >  invoking its ``->start()`` callback.
> > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> > index 0128de3603df..4b1a5c0173cf 100644
> > --- a/drivers/cpufreq/cpufreq.c
> > +++ b/drivers/cpufreq/cpufreq.c
> > @@ -50,6 +50,9 @@ static LIST_HEAD(cpufreq_governor_list);
> >  #define for_each_governor(__governor)\
> >   list_for_each_entry(__governor, _governor_list, governor_list)
> >
> > +static char cpufreq_param_governor[CPUFREQ_NAME_LEN];
> > +static struct cpufreq_governor *default_governor;
> > +
> >  /**
> >   * The "cpufreq driver" - the arch- or hardware-dependent low
> >   * level driver of CPUFreq support, and its spinlock. This lock
> > @@ -1055,7 +1058,6 @@ __weak struct cpufreq_governor 
> > *cpufreq_default_governor(void)
> >
> >  static int cpufreq_init_policy(struct cpufreq_policy *policy)
> >  {
> > - struct cpufreq_governor *def_gov = cpufreq_default_governor();
> >   struct cpufreq_governor *gov = NULL;
> >   unsigned int pol = CPUFREQ_POLICY_UNKNOWN;
> >
> > @@ -1065,8 +1067,8 @@ static int cpufreq_init_policy(struct cpufreq_policy 
> > *policy)
> >   if (gov) {
> >   pr_debug("Restoring governor %s for cpu %d\n",
> >policy->governor->name, policy->cpu);
> > - } else if (def_gov) {
> > - gov = def_gov;
> > + } else if (default_governor) {
> > + gov = 

Re: [PATCH] cpufreq: powernv: Fix frame-size-overflow in powernv_cpufreq_work_fn

2020-03-25 Thread Rafael J. Wysocki
On Tuesday, March 24, 2020 7:34:56 AM CET Michael Ellerman wrote:
> "Rafael J. Wysocki"  writes:
> > On Monday, March 16, 2020 2:57:43 PM CET Pratik Rajesh Sampat wrote:
> >> The patch avoids allocating cpufreq_policy on stack hence fixing frame
> >> size overflow in 'powernv_cpufreq_work_fn'
> >> 
> >> Fixes: 227942809b52 ("cpufreq: powernv: Restore cpu frequency to 
> >> policy->cur on unthrottling")
> >> Signed-off-by: Pratik Rajesh Sampat 
> >
> > Any objections or concerns here?
> >
> > If not, I'll queue it up.
> 
> I have it in my testing branch,

Great!

> but if you pick it up I can drop it.

Let it go in through your tree.

Cheers!





Re: [patch V3 05/20] acpi: Remove header dependency

2020-03-22 Thread Rafael J. Wysocki
On Sat, Mar 21, 2020 at 12:35 PM Thomas Gleixner  wrote:
>
> From: Peter Zijlstra 
>
> In order to avoid future header hell, remove the inclusion of
> proc_fs.h from acpi_bus.h. All it needs is a forward declaration of a
> struct.
>
> Signed-off-by: Peter Zijlstra (Intel) 
> Signed-off-by: Thomas Gleixner 
> Cc: Darren Hart 
> Cc: Andy Shevchenko 
> Cc: platform-driver-...@vger.kernel.org
> Cc: Greg Kroah-Hartman 
> Cc: Zhang Rui 
> Cc: "Rafael J. Wysocki" 
> Cc: linux...@vger.kernel.org
> Cc: Len Brown 
> Cc: linux-a...@vger.kernel.org

Acked-by: Rafael J. Wysocki 

> ---
>  drivers/platform/x86/dell-smo8800.c  |1 +
>  drivers/platform/x86/wmi.c   |1 +
>  drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.c |1 +
>  include/acpi/acpi_bus.h  |2 +-
>  4 files changed, 4 insertions(+), 1 deletion(-)
>
> --- a/drivers/platform/x86/dell-smo8800.c
> +++ b/drivers/platform/x86/dell-smo8800.c
> @@ -16,6 +16,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>  struct smo8800_device {
> u32 irq; /* acpi device irq */
> --- a/drivers/platform/x86/wmi.c
> +++ b/drivers/platform/x86/wmi.c
> @@ -29,6 +29,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>
>  ACPI_MODULE_NAME("wmi");
> --- a/drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.c
> +++ b/drivers/thermal/intel/int340x_thermal/acpi_thermal_rel.c
> @@ -19,6 +19,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include "acpi_thermal_rel.h"
>
>  static acpi_handle acpi_thermal_rel_handle;
> --- a/include/acpi/acpi_bus.h
> +++ b/include/acpi/acpi_bus.h
> @@ -80,7 +80,7 @@ bool acpi_dev_present(const char *hid, c
>
>  #ifdef CONFIG_ACPI
>
> -#include 
> +struct proc_dir_entry;
>
>  #define ACPI_BUS_FILE_ROOT "acpi"
>  extern struct proc_dir_entry *acpi_root_dir;
>
>


Re: [PATCH] cpufreq: powernv: Fix frame-size-overflow in powernv_cpufreq_work_fn

2020-03-19 Thread Rafael J. Wysocki
On Monday, March 16, 2020 2:57:43 PM CET Pratik Rajesh Sampat wrote:
> The patch avoids allocating cpufreq_policy on stack hence fixing frame
> size overflow in 'powernv_cpufreq_work_fn'
> 
> Fixes: 227942809b52 ("cpufreq: powernv: Restore cpu frequency to policy->cur 
> on unthrottling")
> Signed-off-by: Pratik Rajesh Sampat 

Any objections or concerns here?

If not, I'll queue it up.

> ---
>  drivers/cpufreq/powernv-cpufreq.c | 13 -
>  1 file changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/cpufreq/powernv-cpufreq.c 
> b/drivers/cpufreq/powernv-cpufreq.c
> index 56f4bc0d209e..20ee0661555a 100644
> --- a/drivers/cpufreq/powernv-cpufreq.c
> +++ b/drivers/cpufreq/powernv-cpufreq.c
> @@ -902,6 +902,7 @@ static struct notifier_block powernv_cpufreq_reboot_nb = {
>  void powernv_cpufreq_work_fn(struct work_struct *work)
>  {
>   struct chip *chip = container_of(work, struct chip, throttle);
> + struct cpufreq_policy *policy;
>   unsigned int cpu;
>   cpumask_t mask;
>  
> @@ -916,12 +917,14 @@ void powernv_cpufreq_work_fn(struct work_struct *work)
>   chip->restore = false;
>   for_each_cpu(cpu, ) {
>   int index;
> - struct cpufreq_policy policy;
>  
> - cpufreq_get_policy(, cpu);
> - index = cpufreq_table_find_index_c(, policy.cur);
> - powernv_cpufreq_target_index(, index);
> - cpumask_andnot(, , policy.cpus);
> + policy = cpufreq_cpu_get(cpu);
> + if (!policy)
> + continue;
> + index = cpufreq_table_find_index_c(policy, policy->cur);
> + powernv_cpufreq_target_index(policy, index);
> + cpumask_andnot(, , policy->cpus);
> + cpufreq_cpu_put(policy);
>   }
>  out:
>   put_online_cpus();
> 






Re: [PATCH v5] reboot: support offline CPUs before reboot

2020-01-15 Thread Rafael J. Wysocki
On Wed, Jan 15, 2020 at 7:35 AM Hsin-Yi Wang  wrote:
>
> Currently system reboots uses architecture specific codes (smp_send_stop)
> to offline non reboot CPUs. Most architecture's implementation is looping
> through all non reboot online CPUs and call ipi function to each of them. Some
> architecture like arm64, arm, and x86... would set offline masks to cpu 
> without
> really offline them. This causes some race condition and kernel warning comes
> out sometimes when system reboots.
>
> This patch adds a config ARCH_OFFLINE_CPUS_ON_REBOOT, which would offline 
> cpus in
> migrate_to_reboot_cpu(). If non reboot cpus are all offlined here, the loop 
> for
> checking online cpus would be an empty loop. If architecture don't enable this
> config, or some cpus somehow fails to offline, it would fallback to ipi
> function.
>
> Opt in this config for architectures that support CONFIG_HOTPLUG_CPU.
>
> Signed-off-by: Hsin-Yi Wang 
> ---
> Change from v4:
> * fix a few nits: naming, comments, remove Kconfig text...
>
> Change from v3:
> * Opt in config for architectures that support CONFIG_HOTPLUG_CPU
> * Merge function offline_secondary_cpus() and freeze_secondary_cpus()
>   with an additional flag.

This does not seem to be a very good idea, since
freeze_secondary_cpus() does much more than you need for reboot.

For reboot, you basically only need to do something like this AFAICS:

cpu_maps_update_begin();

for_each_online_cpu(i) {
if (i != cpu)
_cpu_down(i, 1, CPUHP_OFFLINE);
}
cpu_hotplug_disabled++;

cpu_maps_update_done();

And you may put this into a function defined outside of CONFIG_PM_SLEEP.

>
> Change from v2:
> * Add another config instead of configed by CONFIG_HOTPLUG_CPU

So why exactly is this new Kconfig option needed?

Everybody supporting CPU hotplug seems to opt in anyway.

[cut]

>
> -int freeze_secondary_cpus(int primary)
> +int freeze_secondary_cpus(int primary, bool reboot)
>  {
> int cpu, error = 0;
>
> @@ -1237,11 +1237,13 @@ int freeze_secondary_cpus(int primary)
> if (cpu == primary)
> continue;
>
> -   if (pm_wakeup_pending()) {
> +#ifdef CONFIG_PM_SLEEP
> +   if (!reboot && pm_wakeup_pending()) {
> pr_info("Wakeup pending. Abort CPU freeze\n");
> error = -EBUSY;
> break;
> }
> +#endif

Please avoid using #ifdefs in function bodies.  This makes the code
hard to maintain in the long term.

>
> trace_suspend_resume(TPS("CPU_OFF"), cpu, true);
> error = _cpu_down(cpu, 1, CPUHP_OFFLINE);
> @@ -1250,7 +1252,9 @@ int freeze_secondary_cpus(int primary)
> cpumask_set_cpu(cpu, frozen_cpus);
> else {
> pr_err("Error taking CPU%d down: %d\n", cpu, error);
> -   break;
> +   /* When rebooting, offline as many CPUs as possible. 
> */
> +   if (!reboot)
> +   break;
> }
> }
>
> diff --git a/kernel/reboot.c b/kernel/reboot.c
> index c4d472b7f1b4..12f643b66e57 100644
> --- a/kernel/reboot.c
> +++ b/kernel/reboot.c
> @@ -7,6 +7,7 @@
>
>  #define pr_fmt(fmt)"reboot: " fmt
>
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -220,7 +221,9 @@ void migrate_to_reboot_cpu(void)
> /* The boot cpu is always logical cpu 0 */
> int cpu = reboot_cpu;
>
> +#if !IS_ENABLED(CONFIG_ARCH_OFFLINE_CPUS_ON_REBOOT)
> cpu_hotplug_disable();
> +#endif

You can write this as

if (!IS_ENABLED(CONFIG_ARCH_OFFLINE_CPUS_ON_REBOOT))
cpu_hotplug_disable();

That's what IS_ENABLED() is there for.

>
> /* Make certain the cpu I'm about to reboot on is online */
> if (!cpu_online(cpu))
> @@ -231,6 +234,11 @@ void migrate_to_reboot_cpu(void)
>
> /* Make certain I only run on the appropriate processor */
> set_cpus_allowed_ptr(current, cpumask_of(cpu));
> +
> +#if IS_ENABLED(CONFIG_ARCH_OFFLINE_CPUS_ON_REBOOT)
> +   /* Offline other cpus if possible */
> +   freeze_secondary_cpus(cpu, true);
> +#endif

The above comment applies here too.

>  }
>
>  /**
> --


Re: [PATCH 12/23] y2038: syscalls: change remaining timeval to __kernel_old_timeval

2019-11-13 Thread Rafael J. Wysocki
On Friday, November 8, 2019 10:12:11 PM CET Arnd Bergmann wrote:
> All of the remaining syscalls that pass a timeval (gettimeofday, utime,
> futimesat) can trivially be changed to pass a __kernel_old_timeval
> instead, which has a compatible layout, but avoids ambiguity with
> the timeval type in user space.
> 
> Signed-off-by: Arnd Bergmann 

For the change in power/power.h

Acked-by: Rafael J. Wysocki 

> ---
>  arch/powerpc/include/asm/asm-prototypes.h |  3 ++-
>  arch/powerpc/kernel/syscalls.c|  4 ++--
>  fs/select.c   | 10 +-
>  fs/utimes.c   |  8 
>  include/linux/syscalls.h  | 10 +-
>  kernel/power/power.h  |  2 +-
>  kernel/time/time.c|  2 +-
>  7 files changed, 20 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/asm-prototypes.h 
> b/arch/powerpc/include/asm/asm-prototypes.h
> index 8561498e653c..2c25dc079cb9 100644
> --- a/arch/powerpc/include/asm/asm-prototypes.h
> +++ b/arch/powerpc/include/asm/asm-prototypes.h
> @@ -92,7 +92,8 @@ long sys_swapcontext(struct ucontext __user *old_ctx,
>  long sys_debug_setcontext(struct ucontext __user *ctx,
> int ndbg, struct sig_dbg_op __user *dbg);
>  int
> -ppc_select(int n, fd_set __user *inp, fd_set __user *outp, fd_set __user 
> *exp, struct timeval __user *tvp);
> +ppc_select(int n, fd_set __user *inp, fd_set __user *outp, fd_set __user 
> *exp,
> +struct __kernel_old_timeval __user *tvp);
>  unsigned long __init early_init(unsigned long dt_ptr);
>  void __init machine_init(u64 dt_ptr);
>  #endif
> diff --git a/arch/powerpc/kernel/syscalls.c b/arch/powerpc/kernel/syscalls.c
> index 3bfb3888e897..078608ec2e92 100644
> --- a/arch/powerpc/kernel/syscalls.c
> +++ b/arch/powerpc/kernel/syscalls.c
> @@ -79,7 +79,7 @@ SYSCALL_DEFINE6(mmap, unsigned long, addr, size_t, len,
>   * sys_select() with the appropriate args. -- Cort
>   */
>  int
> -ppc_select(int n, fd_set __user *inp, fd_set __user *outp, fd_set __user 
> *exp, struct timeval __user *tvp)
> +ppc_select(int n, fd_set __user *inp, fd_set __user *outp, fd_set __user 
> *exp, struct __kernel_old_timeval __user *tvp)
>  {
>   if ( (unsigned long)n >= 4096 )
>   {
> @@ -89,7 +89,7 @@ ppc_select(int n, fd_set __user *inp, fd_set __user *outp, 
> fd_set __user *exp, s
>   || __get_user(inp, ((fd_set __user * __user *)(buffer+1)))
>   || __get_user(outp, ((fd_set  __user * __user *)(buffer+2)))
>   || __get_user(exp, ((fd_set  __user * __user *)(buffer+3)))
> - || __get_user(tvp, ((struct timeval  __user * __user 
> *)(buffer+4
> + || __get_user(tvp, ((struct __kernel_old_timeval  __user * 
> __user *)(buffer+4
>   return -EFAULT;
>   }
>   return sys_select(n, inp, outp, exp, tvp);
> diff --git a/fs/select.c b/fs/select.c
> index 53a0c149f528..11d0285d46b7 100644
> --- a/fs/select.c
> +++ b/fs/select.c
> @@ -321,7 +321,7 @@ static int poll_select_finish(struct timespec64 *end_time,
>   switch (pt_type) {
>   case PT_TIMEVAL:
>   {
> - struct timeval rtv;
> + struct __kernel_old_timeval rtv;
>  
>   if (sizeof(rtv) > sizeof(rtv.tv_sec) + 
> sizeof(rtv.tv_usec))
>   memset(, 0, sizeof(rtv));
> @@ -698,10 +698,10 @@ int core_sys_select(int n, fd_set __user *inp, fd_set 
> __user *outp,
>  }
>  
>  static int kern_select(int n, fd_set __user *inp, fd_set __user *outp,
> -fd_set __user *exp, struct timeval __user *tvp)
> +fd_set __user *exp, struct __kernel_old_timeval __user 
> *tvp)
>  {
>   struct timespec64 end_time, *to = NULL;
> - struct timeval tv;
> + struct __kernel_old_timeval tv;
>   int ret;
>  
>   if (tvp) {
> @@ -720,7 +720,7 @@ static int kern_select(int n, fd_set __user *inp, fd_set 
> __user *outp,
>  }
>  
>  SYSCALL_DEFINE5(select, int, n, fd_set __user *, inp, fd_set __user *, outp,
> - fd_set __user *, exp, struct timeval __user *, tvp)
> + fd_set __user *, exp, struct __kernel_old_timeval __user *, tvp)
>  {
>   return kern_select(n, inp, outp, exp, tvp);
>  }
> @@ -810,7 +810,7 @@ SYSCALL_DEFINE6(pselect6_time32, int, n, fd_set __user *, 
> inp, fd_set __user *,
>  struct sel_arg_struct {
>   unsigned long n;
>   fd_set __user *inp, *outp, *exp;
> - struct timeval __user *tvp;
> + struct __kernel_old_timeval __user *tvp;
>  };
>

Re: [PATCH 4/5] power: avs: smartreflex: Remove superfluous cast in debugfs_create_file() call

2019-11-13 Thread Rafael J. Wysocki
On Monday, October 21, 2019 4:51:48 PM CET Geert Uytterhoeven wrote:
> There is no need to cast a typed pointer to a void pointer when calling
> a function that accepts the latter.  Remove it, as the cast prevents
> further compiler checks.
> 
> Signed-off-by: Geert Uytterhoeven 
> ---
>  drivers/power/avs/smartreflex.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/power/avs/smartreflex.c b/drivers/power/avs/smartreflex.c
> index 4684e7df833a81e9..5376f3d22f31eade 100644
> --- a/drivers/power/avs/smartreflex.c
> +++ b/drivers/power/avs/smartreflex.c
> @@ -905,7 +905,7 @@ static int omap_sr_probe(struct platform_device *pdev)
>   sr_info->dbg_dir = debugfs_create_dir(sr_info->name, sr_dbg_dir);
>  
>   debugfs_create_file("autocomp", S_IRUGO | S_IWUSR, sr_info->dbg_dir,
> - (void *)sr_info, _sr_fops);
> + sr_info, _sr_fops);
>   debugfs_create_x32("errweight", S_IRUGO, sr_info->dbg_dir,
>  _info->err_weight);
>   debugfs_create_x32("errmaxlimit", S_IRUGO, sr_info->dbg_dir,
> 

Applying as 5.5 material, thanks!






Re: [PATCH 4/5] power: avs: smartreflex: Remove superfluous cast in debugfs_create_file() call

2019-11-08 Thread Rafael J. Wysocki
On Monday, October 21, 2019 4:51:48 PM CET Geert Uytterhoeven wrote:
> There is no need to cast a typed pointer to a void pointer when calling
> a function that accepts the latter.  Remove it, as the cast prevents
> further compiler checks.
> 
> Signed-off-by: Geert Uytterhoeven 

Greg, have you taken this one by any chance?

> ---
>  drivers/power/avs/smartreflex.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/power/avs/smartreflex.c b/drivers/power/avs/smartreflex.c
> index 4684e7df833a81e9..5376f3d22f31eade 100644
> --- a/drivers/power/avs/smartreflex.c
> +++ b/drivers/power/avs/smartreflex.c
> @@ -905,7 +905,7 @@ static int omap_sr_probe(struct platform_device *pdev)
>   sr_info->dbg_dir = debugfs_create_dir(sr_info->name, sr_dbg_dir);
>  
>   debugfs_create_file("autocomp", S_IRUGO | S_IWUSR, sr_info->dbg_dir,
> - (void *)sr_info, _sr_fops);
> + sr_info, _sr_fops);
>   debugfs_create_x32("errweight", S_IRUGO, sr_info->dbg_dir,
>  _info->err_weight);
>   debugfs_create_x32("errmaxlimit", S_IRUGO, sr_info->dbg_dir,
> 






Re: [PATCH v3] cpufreq: powernv: fix stack bloat and hard limit on num cpus

2019-11-04 Thread Rafael J. Wysocki
On Thursday, October 31, 2019 6:21:59 AM CET John Hubbard wrote:
> The following build warning occurred on powerpc 64-bit builds:
> 
> drivers/cpufreq/powernv-cpufreq.c: In function 'init_chip_info':
> drivers/cpufreq/powernv-cpufreq.c:1070:1: warning: the frame size of
> 1040 bytes is larger than 1024 bytes [-Wframe-larger-than=]
> 
> This is with a cross-compiler based on gcc 8.1.0, which I got from:
>   https://mirrors.edge.kernel.org/pub/tools/crosstool/files/bin/x86_64/8.1.0/
> 
> The warning is due to putting 1024 bytes on the stack:
> 
> unsigned int chip[256];
> 
> ...and it's also undesirable to have a hard limit on the number of
> CPUs here.
> 
> Fix both problems by dynamically allocating based on num_possible_cpus,
> as recommended by Michael Ellerman.
> 
> Fixes: 053819e0bf840 ("cpufreq: powernv: Handle throttling due to Pmax 
> capping at chip level")
> Cc: Michael Ellerman 
> Cc: Shilpasri G Bhat 
> Cc: Preeti U Murthy 
> Cc: Viresh Kumar 
> Cc: Rafael J. Wysocki 
> Cc: linux...@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Signed-off-by: John Hubbard 
> Acked-by: Viresh Kumar 
> ---
> 
> Changes since v2: applied fixes from Michael Ellerman's review:
> 
> * Changed from CONFIG_NR_CPUS to num_possible_cpus()
> 
> * Fixed up commit description: added a note about exactly which
>   compiler generates the warning. And softened up wording about
>   the limitation on number of CPUs.
> 
> Changes since v1: includes Viresh's review commit fixes.

Applying as 5.5 material, thanks!





Re: [PATCH v2] cpufreq: powernv: fix stack bloat and NR_CPUS limitation

2019-10-28 Thread Rafael J. Wysocki
On Friday, October 18, 2019 7:07:12 AM CET Viresh Kumar wrote:
> On 17-10-19, 21:55, John Hubbard wrote:
> > The following build warning occurred on powerpc 64-bit builds:
> > 
> > drivers/cpufreq/powernv-cpufreq.c: In function 'init_chip_info':
> > drivers/cpufreq/powernv-cpufreq.c:1070:1: warning: the frame size of 1040 
> > bytes is larger than 1024 bytes [-Wframe-larger-than=]
> > 
> > This is due to putting 1024 bytes on the stack:
> > 
> > unsigned int chip[256];
> > 
> > ...and while looking at this, it also has a bug: it fails with a stack
> > overrun, if CONFIG_NR_CPUS > 256.
> > 
> > Fix both problems by dynamically allocating based on CONFIG_NR_CPUS.
> > 
> > Fixes: 053819e0bf840 ("cpufreq: powernv: Handle throttling due to Pmax 
> > capping at chip level")
> > Cc: Shilpasri G Bhat 
> > Cc: Preeti U Murthy 
> > Cc: Viresh Kumar 
> > Cc: Rafael J. Wysocki 
> > Cc: linux...@vger.kernel.org
> > Cc: linuxppc-dev@lists.ozlabs.org
> > Signed-off-by: John Hubbard 
> > ---
> > 
> > Changes since v1: includes Viresh's review commit fixes.
> > 
> >  drivers/cpufreq/powernv-cpufreq.c | 17 +
> >  1 file changed, 13 insertions(+), 4 deletions(-)
> 
> Acked-by: Viresh Kumar 
> 
> 

Applying as 5.5 material, thanks!






Re: [PATCH v9 1/3] PM: wakeup: Add routine to help fetch wakeup source object.

2019-10-23 Thread Rafael J. Wysocki
On Wed, Oct 23, 2019 at 10:24 AM Ran Wang  wrote:
>
> Some user might want to go through all registered wakeup sources
> and doing things accordingly. For example, SoC PM driver might need to
> do HW programming to prevent powering down specific IP which wakeup
> source depending on. So add this API to help walk through all registered
> wakeup source objects on that list and return them one by one.
>
> Signed-off-by: Ran Wang 
> Tested-by: Leonard Crestez 

OK, thanks for making all of the requested changes:

Reviewed-by: Rafael J. Wysocki 

and please feel free to push this through the appropriate
arch/platform tree.  Alternatively, please let me know if you want me
to take this series, but then I need an ACK from the appropriate
maintainer(s) on patch 3.

> ---
> Change in v9:
> - Supplement comments for wakeup_sources_read_lock(),
>   wakeup_sources_read_unlock, wakeup_sources_walk_start and
>   wakeup_sources_walk_next().
>
> Change in v8:
> - Rename wakeup_source_get_next() to wakeup_sources_walk_next().
> - Add wakeup_sources_read_lock() to take over locking job of
>   wakeup_source_get_star().
> - Rename wakeup_source_get_start() to wakeup_sources_walk_start().
> - Replace wakeup_source_get_stop() with wakeup_sources_read_unlock().
> - Define macro for_each_wakeup_source(ws).
>
> Change in v7:
> - Remove define of member *dev in wake_irq to fix conflict with commit
> c8377adfa781 ("PM / wakeup: Show wakeup sources stats in sysfs"), user
> will use ws->dev->parent instead.
> - Remove '#include ' because it is not used.
>
> Change in v6:
> - Add wakeup_source_get_star() and wakeup_source_get_stop() to aligned
> with wakeup_sources_stats_seq_start/nex/stop.
>
> Change in v5:
> - Update commit message, add decription of walk through all wakeup
> source objects.
> - Add SCU protection in function wakeup_source_get_next().
> - Rename wakeup_source member 'attached_dev' to 'dev' and move it up
> (before wakeirq).
>
> Change in v4:
> - None.
>
> Change in v3:
> - Adjust indentation of *attached_dev;.
>
> Change in v2:
> - None.
>
>  drivers/base/power/wakeup.c | 54 
> +
>  include/linux/pm_wakeup.h   |  9 
>  2 files changed, 63 insertions(+)
>
> diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c
> index 5817b51..70a9edb 100644
> --- a/drivers/base/power/wakeup.c
> +++ b/drivers/base/power/wakeup.c
> @@ -248,6 +248,60 @@ void wakeup_source_unregister(struct wakeup_source *ws)
>  EXPORT_SYMBOL_GPL(wakeup_source_unregister);
>
>  /**
> + * wakeup_sources_read_lock - Lock wakeup source list for read.
> + *
> + * Returns an index of srcu lock for struct wakeup_srcu.
> + * This index must be passed to the matching wakeup_sources_read_unlock().
> + */
> +int wakeup_sources_read_lock(void)
> +{
> +   return srcu_read_lock(_srcu);
> +}
> +EXPORT_SYMBOL_GPL(wakeup_sources_read_lock);
> +
> +/**
> + * wakeup_sources_read_unlock - Unlock wakeup source list.
> + * @idx: return value from corresponding wakeup_sources_read_lock()
> + */
> +void wakeup_sources_read_unlock(int idx)
> +{
> +   srcu_read_unlock(_srcu, idx);
> +}
> +EXPORT_SYMBOL_GPL(wakeup_sources_read_unlock);
> +
> +/**
> + * wakeup_sources_walk_start - Begin a walk on wakeup source list
> + *
> + * Returns first object of the list of wakeup sources.
> + *
> + * Note that to be safe, wakeup sources list needs to be locked by calling
> + * wakeup_source_read_lock() for this.
> + */
> +struct wakeup_source *wakeup_sources_walk_start(void)
> +{
> +   struct list_head *ws_head = _sources;
> +
> +   return list_entry_rcu(ws_head->next, struct wakeup_source, entry);
> +}
> +EXPORT_SYMBOL_GPL(wakeup_sources_walk_start);
> +
> +/**
> + * wakeup_sources_walk_next - Get next wakeup source from the list
> + * @ws: Previous wakeup source object
> + *
> + * Note that to be safe, wakeup sources list needs to be locked by calling
> + * wakeup_source_read_lock() for this.
> + */
> +struct wakeup_source *wakeup_sources_walk_next(struct wakeup_source *ws)
> +{
> +   struct list_head *ws_head = _sources;
> +
> +   return list_next_or_null_rcu(ws_head, >entry,
> +   struct wakeup_source, entry);
> +}
> +EXPORT_SYMBOL_GPL(wakeup_sources_walk_next);
> +
> +/**
>   * device_wakeup_attach - Attach a wakeup source object to a device object.
>   * @dev: Device to handle.
>   * @ws: Wakeup source object to 

Re: [PATCH v9 3/3] soc: fsl: add RCPM driver

2019-10-23 Thread Rafael J. Wysocki
gt; +   struct rcpm *rcpm;
> +   struct device_node  *np = dev->of_node;
> +   u32 value[RCPM_WAKEUP_CELL_MAX_SIZE + 1];
> +
> +   rcpm = dev_get_drvdata(dev);
> +   if (!rcpm)
> +   return -EINVAL;
> +
> +   base = rcpm->ippdexpcr_base;
> +   idx = wakeup_sources_read_lock();
> +
> +   /* Begin with first registered wakeup source */
> +   for_each_wakeup_source(ws) {
> +
> +   /* skip object which is not attached to device */
> +   if (!ws->dev || !ws->dev->parent)
> +   continue;
> +
> +   ret = device_property_read_u32_array(ws->dev->parent,
> +   "fsl,rcpm-wakeup", value,
> +   rcpm->wakeup_cells + 1);
> +
> +   /*  Wakeup source should refer to current rcpm device */
> +   if (ret || (np->phandle != value[0])) {
> +   pr_debug("%s doesn't refer to this rcpm\n", ws->name);

I'm still quite unsure why it is useful to print this message instead
of printing one when the wakeup source does match (there may be many
wakeup source objects you don't care about in principle), but
whatever.

> +   continue;
> +   }
> +
> +   /* Property "#fsl,rcpm-wakeup-cells" of rcpm node defines the
> +* number of IPPDEXPCR register cells, and "fsl,rcpm-wakeup"
> +* of wakeup source IP contains an integer array:  +* RCPM node, IPPDEXPCR0 setting, IPPDEXPCR1 setting,
> +* IPPDEXPCR2 setting, etc>.
> +*
> +* So we will go thought them and do programming accordngly.
> +*/
> +   for (i = 0; i < rcpm->wakeup_cells; i++) {
> +   u32 tmp = value[i + 1];
> +   void __iomem *address = base + i * 4;
> +
> +   if (!tmp)
> +   continue;
> +
> +   /* We can only OR related bits */
> +   if (rcpm->little_endian) {
> +   tmp |= ioread32(address);
> +   iowrite32(tmp, address);
> +   } else {
> +   tmp |= ioread32be(address);
> +   iowrite32be(tmp, address);
> +   }
> +   }
> +   }
> +
> +   wakeup_sources_read_unlock(idx);
> +
> +   return 0;
> +}
> +
> +static const struct dev_pm_ops rcpm_pm_ops = {
> +   .prepare =  rcpm_pm_prepare,
> +};

For the above:

Reviewed-by: Rafael J. Wysocki 


Re: [PATCH 1/3] PM: wakeup: Add routine to help fetch wakeup source object.

2019-10-22 Thread Rafael J. Wysocki
On Tue, Oct 22, 2019 at 9:51 AM Ran Wang  wrote:
>
> Some user might want to go through all registered wakeup sources
> and doing things accordingly. For example, SoC PM driver might need to
> do HW programming to prevent powering down specific IP which wakeup
> source depending on. So add this API to help walk through all registered
> wakeup source objects on that list and return them one by one.
>
> Signed-off-by: Ran Wang 
> Tested-by: Leonard Crestez 
> ---
> Change in v8
> - Rename wakeup_source_get_next() to wakeup_sources_walk_next().
> - Add wakeup_sources_read_lock() to take over locking job of
>   wakeup_source_get_star().
> - Rename wakeup_source_get_start() to wakeup_sources_walk_start().
> - Replace wakeup_source_get_stop() with wakeup_sources_read_unlock().
> - Define macro for_each_wakeup_source(ws).
>
> Change in v7:
> - Remove define of member *dev in wake_irq to fix conflict with commit
> c8377adfa781 ("PM / wakeup: Show wakeup sources stats in sysfs"), user
> will use ws->dev->parent instead.
> - Remove '#include ' because it is not used.
>
> Change in v6:
> - Add wakeup_source_get_star() and wakeup_source_get_stop() to aligned
> with wakeup_sources_stats_seq_start/nex/stop.
>
> Change in v5:
> - Update commit message, add decription of walk through all wakeup
> source objects.
> - Add SCU protection in function wakeup_source_get_next().
> - Rename wakeup_source member 'attached_dev' to 'dev' and move it up
> (before wakeirq).
>
> Change in v4:
> - None.
>
> Change in v3:
> - Adjust indentation of *attached_dev;.
>
> Change in v2:
> - None.
>
>  drivers/base/power/wakeup.c | 42 ++
>  include/linux/pm_wakeup.h   |  9 +
>  2 files changed, 51 insertions(+)
>
> diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c
> index 5817b51..8c7a5f9 100644
> --- a/drivers/base/power/wakeup.c
> +++ b/drivers/base/power/wakeup.c
> @@ -248,6 +248,48 @@ void wakeup_source_unregister(struct wakeup_source *ws)
>  EXPORT_SYMBOL_GPL(wakeup_source_unregister);
>
>  /**
> + * wakeup_sources_read_lock - Lock wakeup source list for read.

Please document the return value.

> + */
> +int wakeup_sources_read_lock(void)
> +{
> +   return srcu_read_lock(_srcu);
> +}
> +EXPORT_SYMBOL_GPL(wakeup_sources_read_lock);
> +
> +/**
> + * wakeup_sources_read_unlock - Unlock wakeup source list.

Please document the argument.

> + */
> +void wakeup_sources_read_unlock(int idx)
> +{
> +   srcu_read_unlock(_srcu, idx);
> +}
> +EXPORT_SYMBOL_GPL(wakeup_sources_read_unlock);
> +
> +/**
> + * wakeup_sources_walk_start - Begin a walk on wakeup source list

Please document the return value and add a note that the wakeup
sources list needs to be locked for reading for this to be safe.

> + */
> +struct wakeup_source *wakeup_sources_walk_start(void)
> +{
> +   struct list_head *ws_head = _sources;
> +
> +   return list_entry_rcu(ws_head->next, struct wakeup_source, entry);
> +}
> +EXPORT_SYMBOL_GPL(wakeup_sources_walk_start);
> +
> +/**
> + * wakeup_sources_walk_next - Get next wakeup source from the list
> + * @ws: Previous wakeup source object

Please add a note that the wakeup sources list needs to be locked for
reading for this to be safe.

> + */
> +struct wakeup_source *wakeup_sources_walk_next(struct wakeup_source *ws)
> +{
> +   struct list_head *ws_head = _sources;
> +
> +   return list_next_or_null_rcu(ws_head, >entry,
> +   struct wakeup_source, entry);
> +}
> +EXPORT_SYMBOL_GPL(wakeup_sources_walk_next);
> +
> +/**
>   * device_wakeup_attach - Attach a wakeup source object to a device object.
>   * @dev: Device to handle.
>   * @ws: Wakeup source object to attach to @dev.
> diff --git a/include/linux/pm_wakeup.h b/include/linux/pm_wakeup.h
> index 661efa0..aa3da66 100644
> --- a/include/linux/pm_wakeup.h
> +++ b/include/linux/pm_wakeup.h
> @@ -63,6 +63,11 @@ struct wakeup_source {
> boolautosleep_enabled:1;
>  };
>
> +#define for_each_wakeup_source(ws) \
> +   for ((ws) = wakeup_sources_walk_start();\
> +(ws);  \
> +(ws) = wakeup_sources_walk_next((ws)))
> +
>  #ifdef CONFIG_PM_SLEEP
>
>  /*
> @@ -92,6 +97,10 @@ extern void wakeup_source_remove(struct wakeup_source *ws);
>  extern struct wakeup_source *wakeup_source_register(struct device *dev,
> const char *name);
>  extern void wakeup_source_unregister(struct wakeup_source *ws);
> +extern int wakeup_sources_read_lock(void);
> +extern void wakeup_sources_read_unlock(int idx);
> +extern struct wakeup_source *wakeup_sources_walk_start(void);
> +extern struct wakeup_source *wakeup_sources_walk_next(struct wakeup_source 
> *ws);
>  extern int 

Re: [PATCH 3/3] soc: fsl: add RCPM driver

2019-10-22 Thread Rafael J. Wysocki
On Tue, Oct 22, 2019 at 9:52 AM Ran Wang  wrote:
>
> The NXP's QorIQ Processors based on ARM Core have RCPM module
> (Run Control and Power Management), which performs system level
> tasks associated with power management such as wakeup source control.
>
> This driver depends on PM wakeup source framework which help to
> collect wake information.
>
> Signed-off-by: Ran Wang 
> ---
> Change in v8:
> - Adjust related API usage to meet wakeup.c's update in patch 1/3.
> - Add sanity checking for the case of ws->dev or ws->dev->parent
>   is null.
>
> Change in v7:
> - Replace 'ws->dev' with 'ws->dev->parent' to get aligned with
> c8377adfa781 ("PM / wakeup: Show wakeup sources stats in sysfs")
> - Remove '+obj-y += ftm_alarm.o' since it is wrong.
> - Cosmetic work.
>
> Change in v6:
> - Adjust related API usage to meet wakeup.c's update in patch 1/3.
>
> Change in v5:
> - Fix v4 regression of the return value of wakeup_source_get_next()
> didn't pass to ws in while loop.
> - Rename wakeup_source member 'attached_dev' to 'dev'.
> - Rename property 'fsl,#rcpm-wakeup-cells' to 
> '#fsl,rcpm-wakeup-cells'.
> please see https://lore.kernel.org/patchwork/patch/1101022/
>
> Change in v4:
> - Remove extra ',' in author line of rcpm.c
> - Update usage of wakeup_source_get_next() to be less confusing to the
> reader, code logic remain the same.
>
> Change in v3:
> - Some whitespace ajdustment.
>
> Change in v2:
> - Rebase Kconfig and Makefile update to latest mainline.
>
>  drivers/soc/fsl/Kconfig  |   8 +++
>  drivers/soc/fsl/Makefile |   1 +
>  drivers/soc/fsl/rcpm.c   | 133 
> +++
>  3 files changed, 142 insertions(+)
>  create mode 100644 drivers/soc/fsl/rcpm.c
>
> diff --git a/drivers/soc/fsl/Kconfig b/drivers/soc/fsl/Kconfig
> index f9ad8ad..4918856 100644
> --- a/drivers/soc/fsl/Kconfig
> +++ b/drivers/soc/fsl/Kconfig
> @@ -40,4 +40,12 @@ config DPAA2_CONSOLE
>   /dev/dpaa2_mc_console and /dev/dpaa2_aiop_console,
>   which can be used to dump the Management Complex and AIOP
>   firmware logs.
> +
> +config FSL_RCPM
> +   bool "Freescale RCPM support"
> +   depends on PM_SLEEP
> +   help
> + The NXP QorIQ Processors based on ARM Core have RCPM module
> + (Run Control and Power Management), which performs all device-level
> + tasks associated with power management, such as wakeup source 
> control.
>  endmenu
> diff --git a/drivers/soc/fsl/Makefile b/drivers/soc/fsl/Makefile
> index 71dee8d..906f1cd 100644
> --- a/drivers/soc/fsl/Makefile
> +++ b/drivers/soc/fsl/Makefile
> @@ -6,6 +6,7 @@
>  obj-$(CONFIG_FSL_DPAA) += qbman/
>  obj-$(CONFIG_QUICC_ENGINE) += qe/
>  obj-$(CONFIG_CPM)  += qe/
> +obj-$(CONFIG_FSL_RCPM) += rcpm.o
>  obj-$(CONFIG_FSL_GUTS) += guts.o
>  obj-$(CONFIG_FSL_MC_DPIO)  += dpio/
>  obj-$(CONFIG_DPAA2_CONSOLE)+= dpaa2-console.o
> diff --git a/drivers/soc/fsl/rcpm.c b/drivers/soc/fsl/rcpm.c
> new file mode 100644
> index 000..3ed135e
> --- /dev/null
> +++ b/drivers/soc/fsl/rcpm.c
> @@ -0,0 +1,133 @@
> +// SPDX-License-Identifier: GPL-2.0
> +//
> +// rcpm.c - Freescale QorIQ RCPM driver
> +//
> +// Copyright 2019 NXP
> +//
> +// Author: Ran Wang 
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#define RCPM_WAKEUP_CELL_MAX_SIZE  7
> +
> +struct rcpm {
> +   unsigned intwakeup_cells;
> +   void __iomem*ippdexpcr_base;
> +   boollittle_endian;
> +};
> +

Please add a kerneldoc comment describing this routine.

> +static int rcpm_pm_prepare(struct device *dev)
> +{
> +   int i, ret, idx;
> +   void __iomem *base;
> +   struct wakeup_source*ws;
> +   struct rcpm *rcpm;
> +   struct device_node  *np = dev->of_node;
> +   u32 value[RCPM_WAKEUP_CELL_MAX_SIZE + 1], tmp;
> +
> +   rcpm = dev_get_drvdata(dev);
> +   if (!rcpm)
> +   return -EINVAL;
> +
> +   base = rcpm->ippdexpcr_base;
> +   idx = wakeup_sources_read_lock();
> +
> +   /* Begin with first registered wakeup source */
> +   for_each_wakeup_source(ws) {
> +
> +   /* skip object which is not attached to device */
> +   if (!ws->dev || !ws->dev->parent)
> +   continue;
> +
> +   ret = device_property_read_u32_array(ws->dev->parent,
> +   "fsl,rcpm-wakeup", value,
> +   rcpm->wakeup_cells + 1);
> +
> +   /*  Wakeup source should refer to current rcpm device */
> +   if (ret || (np->phandle != value[0])) {
> +   dev_info(dev, "%s doesn't refer to this rcpm\n",
> +  

Re: [PATCH v7 1/3] PM: wakeup: Add routine to help fetch wakeup source object.

2019-10-21 Thread Rafael J. Wysocki
On Mon, Oct 21, 2019 at 5:49 AM Ran Wang  wrote:
>
> Some user might want to go through all registered wakeup sources
> and doing things accordingly. For example, SoC PM driver might need to
> do HW programming to prevent powering down specific IP which wakeup
> source depending on. So add this API to help walk through all registered
> wakeup source objects on that list and return them one by one.
>
> Signed-off-by: Ran Wang 
> Tested-by: Leonard Crestez 
> ---
> Change in v7:
> - Remove define of member *dev in wake_irq to fix conflict with commit
> c8377adfa781 ("PM / wakeup: Show wakeup sources stats in sysfs"), user
> will use ws->dev->parent instead.
> - Remove '#include ' because it is not used.
>
> Change in v6:
> - Add wakeup_source_get_star() and wakeup_source_get_stop() to aligned
> with wakeup_sources_stats_seq_start/nex/stop.
>
> Change in v5:
> - Update commit message, add decription of walk through all wakeup
> source objects.
> - Add SCU protection in function wakeup_source_get_next().
> - Rename wakeup_source member 'attached_dev' to 'dev' and move it up
> (before wakeirq).
>
> Change in v4:
> - None.
>
> Change in v3:
> - Adjust indentation of *attached_dev;.
>
> Change in v2:
> - None.
>
>  drivers/base/power/wakeup.c | 37 +
>  include/linux/pm_wakeup.h   |  3 +++
>  2 files changed, 40 insertions(+)
>
> diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c
> index 5817b51..dee1b09 100644
> --- a/drivers/base/power/wakeup.c
> +++ b/drivers/base/power/wakeup.c
> @@ -248,6 +248,43 @@ void wakeup_source_unregister(struct wakeup_source *ws)
>  EXPORT_SYMBOL_GPL(wakeup_source_unregister);
>
>  /**
> + * wakeup_source_get_star - Begin a walk on wakeup source list

The "get" in the name suggests acquiring a reference of some kind
which doesn't happen here.

What about renaming it to wakeup_sources_walk_start()?

> + * @srcuidx: Lock index allocated for this caller.
> + */
> +struct wakeup_source *wakeup_source_get_start(int *srcuidx)

I don't quite like the calling convention here with passing an int
pointer to get the SRCU index back.

What about splitting this into, say, wakeup_sources_read_lock() (that
will return the SRCU index) and wakeup_sources_walk_start() (that will
return the first list entry)?

Then, you could do something like

idx = wakeup_sources_read_lock();

ws = wakeup_sources_walk_start();
while (ws) {

stuff

ws = wakeup_sources_walk_next();
}

wakeup_sources_read_unlock(idx);

Or even define for_each_wakeup_source(ws) as

for (ws = wakeup_sources_walk_start(); ws; ws = wakeup_sources_walk_next())

and use that under a _read_lock()/_read_unlock() pair?


Re: [PATCH v5 1/3] PM: wakeup: Add routine to help fetch wakeup source object.

2019-08-19 Thread Rafael J. Wysocki
On Monday, August 19, 2019 10:33:25 AM CEST Ran Wang wrote:
> Hi Rafael,
> 
> On Monday, August 19, 2019 16:20, Rafael J. Wysocki wrote:
> > 
> > On Mon, Aug 19, 2019 at 10:15 AM Ran Wang  wrote:
> > >
> > > Hi Rafael,
> > >
> > > On Monday, August 05, 2019 17:59, Rafael J. Wysocki wrote:
> > > >
> > > > On Wednesday, July 24, 2019 9:47:20 AM CEST Ran Wang wrote:
> > > > > Some user might want to go through all registered wakeup sources
> > > > > and doing things accordingly. For example, SoC PM driver might
> > > > > need to do HW programming to prevent powering down specific IP
> > > > > which wakeup source depending on. So add this API to help walk
> > > > > through all registered wakeup source objects on that list and return 
> > > > > them
> > one by one.
> > > > >
> > > > > Signed-off-by: Ran Wang 
> > > > > ---
> > > > > Change in v5:
> > > > > - Update commit message, add decription of walk through all wakeup
> > > > > source objects.
> > > > > - Add SCU protection in function wakeup_source_get_next().
> > > > > - Rename wakeup_source member 'attached_dev' to 'dev' and move
> > > > > it
> > > > up
> > > > > (before wakeirq).
> > > > >
> > > > > Change in v4:
> > > > > - None.
> > > > >
> > > > > Change in v3:
> > > > > - Adjust indentation of *attached_dev;.
> > > > >
> > > > > Change in v2:
> > > > > - None.
> > > > >
> > > > >  drivers/base/power/wakeup.c | 24 
> > > > >  include/linux/pm_wakeup.h   |  3 +++
> > > > >  2 files changed, 27 insertions(+)
> > > > >
> > > > > diff --git a/drivers/base/power/wakeup.c
> > > > > b/drivers/base/power/wakeup.c index ee31d4f..2fba891 100644
> > > > > --- a/drivers/base/power/wakeup.c
> > > > > +++ b/drivers/base/power/wakeup.c
> > > > > @@ -14,6 +14,7 @@
> > > > >  #include 
> > > > >  #include 
> > > > >  #include 
> > > > > +#include 
> > > > >  #include 
> > > > >  #include 
> > > > >
> > > > > @@ -226,6 +227,28 @@ void wakeup_source_unregister(struct
> > > > wakeup_source *ws)
> > > > > }
> > > > >  }
> > > > >  EXPORT_SYMBOL_GPL(wakeup_source_unregister);
> > > > > +/**
> > > > > + * wakeup_source_get_next - Get next wakeup source from the list
> > > > > + * @ws: Previous wakeup source object, null means caller want first 
> > > > > one.
> > > > > + */
> > > > > +struct wakeup_source *wakeup_source_get_next(struct wakeup_source
> > > > > +*ws) {
> > > > > +   struct list_head *ws_head = _sources;
> > > > > +   struct wakeup_source *next_ws = NULL;
> > > > > +   int idx;
> > > > > +
> > > > > +   idx = srcu_read_lock(_srcu);
> > > > > +   if (ws)
> > > > > +   next_ws = list_next_or_null_rcu(ws_head, >entry,
> > > > > +   struct wakeup_source, entry);
> > > > > +   else
> > > > > +   next_ws = list_entry_rcu(ws_head->next,
> > > > > +   struct wakeup_source, entry);
> > > > > +   srcu_read_unlock(_srcu, idx);
> > > > > +
> > > >
> > > > This is incorrect.
> > > >
> > > > The SRCU cannot be unlocked until the caller of this is done with
> > > > the object returned by it, or that object can be freed while it is 
> > > > still being
> > accessed.
> > >
> > > Thanks for the comment. Looks like I was not fully understanding your
> > > point on
> > > v4 discussion. So I will implement 3 APIs by referring
> > > wakeup_sources_stats_seq_start/next/stop()
> > >
> > > > Besides, this patch conflicts with some general wakeup sources
> > > > changes in the works, so it needs to be deferred and rebased on top of 
> > > > those
> > changes.
> > >
> > > Could you please tell me which is the right code base I should developing 
> > > on?
> > > I just tried applying v5 patch on latest
> > > git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git branch master
> > (d1abaeb Linux 5.3-rc5) and no conflict encountered.
> > 
> > It is better to use the most recent -rc from Linus (5.3-rc5 as of
> > today) as the base unless your patches depend on some changes that are not 
> > in
> > there.
> 
> OK, So I need to implement on latest 
> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git branch 
> master, am I right?
> 
> However, I just checked v5.3-rc5 code and found it has the same HEAD (d1abaeb 
> Linux 5.3-rc5
> on which I did not observe v5 patch apply conflict, did I miss something? 
> Thanks.

The conflict I mentioned earlier was with another patch series in the works
which is not in 5.3-rc5.  However, there are problems with that series and it
is not linux-next now even, so please just base your series on top of -rc5.






Re: [PATCH v5 1/3] PM: wakeup: Add routine to help fetch wakeup source object.

2019-08-19 Thread Rafael J. Wysocki
On Mon, Aug 19, 2019 at 10:15 AM Ran Wang  wrote:
>
> Hi Rafael,
>
> On Monday, August 05, 2019 17:59, Rafael J. Wysocki wrote:
> >
> > On Wednesday, July 24, 2019 9:47:20 AM CEST Ran Wang wrote:
> > > Some user might want to go through all registered wakeup sources and
> > > doing things accordingly. For example, SoC PM driver might need to do
> > > HW programming to prevent powering down specific IP which wakeup
> > > source depending on. So add this API to help walk through all
> > > registered wakeup source objects on that list and return them one by one.
> > >
> > > Signed-off-by: Ran Wang 
> > > ---
> > > Change in v5:
> > > - Update commit message, add decription of walk through all wakeup
> > > source objects.
> > > - Add SCU protection in function wakeup_source_get_next().
> > > - Rename wakeup_source member 'attached_dev' to 'dev' and move it
> > up
> > > (before wakeirq).
> > >
> > > Change in v4:
> > > - None.
> > >
> > > Change in v3:
> > > - Adjust indentation of *attached_dev;.
> > >
> > > Change in v2:
> > > - None.
> > >
> > >  drivers/base/power/wakeup.c | 24 
> > >  include/linux/pm_wakeup.h   |  3 +++
> > >  2 files changed, 27 insertions(+)
> > >
> > > diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c
> > > index ee31d4f..2fba891 100644
> > > --- a/drivers/base/power/wakeup.c
> > > +++ b/drivers/base/power/wakeup.c
> > > @@ -14,6 +14,7 @@
> > >  #include 
> > >  #include 
> > >  #include 
> > > +#include 
> > >  #include 
> > >  #include 
> > >
> > > @@ -226,6 +227,28 @@ void wakeup_source_unregister(struct
> > wakeup_source *ws)
> > > }
> > >  }
> > >  EXPORT_SYMBOL_GPL(wakeup_source_unregister);
> > > +/**
> > > + * wakeup_source_get_next - Get next wakeup source from the list
> > > + * @ws: Previous wakeup source object, null means caller want first one.
> > > + */
> > > +struct wakeup_source *wakeup_source_get_next(struct wakeup_source
> > > +*ws) {
> > > +   struct list_head *ws_head = _sources;
> > > +   struct wakeup_source *next_ws = NULL;
> > > +   int idx;
> > > +
> > > +   idx = srcu_read_lock(_srcu);
> > > +   if (ws)
> > > +   next_ws = list_next_or_null_rcu(ws_head, >entry,
> > > +   struct wakeup_source, entry);
> > > +   else
> > > +   next_ws = list_entry_rcu(ws_head->next,
> > > +   struct wakeup_source, entry);
> > > +   srcu_read_unlock(_srcu, idx);
> > > +
> >
> > This is incorrect.
> >
> > The SRCU cannot be unlocked until the caller of this is done with the object
> > returned by it, or that object can be freed while it is still being 
> > accessed.
>
> Thanks for the comment. Looks like I was not fully understanding your point on
> v4 discussion. So I will implement 3 APIs by referring 
> wakeup_sources_stats_seq_start/next/stop()
>
> > Besides, this patch conflicts with some general wakeup sources changes in 
> > the
> > works, so it needs to be deferred and rebased on top of those changes.
>
> Could you please tell me which is the right code base I should developing on?
> I just tried applying v5 patch on latest 
> git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git branch master 
> (d1abaeb Linux 5.3-rc5)
> and no conflict encountered.

It is better to use the most recent -rc from Linus (5.3-rc5 as of
today) as the base unless your patches depend on some changes that are
not in there.


Re: [PATCH v5 1/3] PM: wakeup: Add routine to help fetch wakeup source object.

2019-08-05 Thread Rafael J. Wysocki
On Wednesday, July 24, 2019 9:47:20 AM CEST Ran Wang wrote:
> Some user might want to go through all registered wakeup sources
> and doing things accordingly. For example, SoC PM driver might need to
> do HW programming to prevent powering down specific IP which wakeup
> source depending on. So add this API to help walk through all registered
> wakeup source objects on that list and return them one by one.
> 
> Signed-off-by: Ran Wang 
> ---
> Change in v5:
>   - Update commit message, add decription of walk through all wakeup
>   source objects.
>   - Add SCU protection in function wakeup_source_get_next().
>   - Rename wakeup_source member 'attached_dev' to 'dev' and move it up
>   (before wakeirq).
> 
> Change in v4:
>   - None.
> 
> Change in v3:
>   - Adjust indentation of *attached_dev;.
> 
> Change in v2:
>   - None.
> 
>  drivers/base/power/wakeup.c | 24 
>  include/linux/pm_wakeup.h   |  3 +++
>  2 files changed, 27 insertions(+)
> 
> diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c
> index ee31d4f..2fba891 100644
> --- a/drivers/base/power/wakeup.c
> +++ b/drivers/base/power/wakeup.c
> @@ -14,6 +14,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  
> @@ -226,6 +227,28 @@ void wakeup_source_unregister(struct wakeup_source *ws)
>   }
>  }
>  EXPORT_SYMBOL_GPL(wakeup_source_unregister);
> +/**
> + * wakeup_source_get_next - Get next wakeup source from the list
> + * @ws: Previous wakeup source object, null means caller want first one.
> + */
> +struct wakeup_source *wakeup_source_get_next(struct wakeup_source *ws)
> +{
> + struct list_head *ws_head = _sources;
> + struct wakeup_source *next_ws = NULL;
> + int idx;
> +
> + idx = srcu_read_lock(_srcu);
> + if (ws)
> + next_ws = list_next_or_null_rcu(ws_head, >entry,
> + struct wakeup_source, entry);
> + else
> + next_ws = list_entry_rcu(ws_head->next,
> + struct wakeup_source, entry);
> + srcu_read_unlock(_srcu, idx);
> +

This is incorrect.

The SRCU cannot be unlocked until the caller of this is done
with the object returned by it, or that object can be freed
while it is still being accessed.

Besides, this patch conflicts with some general wakeup sources
changes in the works, so it needs to be deferred and rebased on
top of those changes.

> + return next_ws;
> +}
> +EXPORT_SYMBOL_GPL(wakeup_source_get_next);
>  
>  /**
>   * device_wakeup_attach - Attach a wakeup source object to a device object.
> @@ -242,6 +265,7 @@ static int device_wakeup_attach(struct device *dev, 
> struct wakeup_source *ws)
>   return -EEXIST;
>   }
>   dev->power.wakeup = ws;
> + ws->dev = dev;
>   if (dev->power.wakeirq)
>   device_wakeup_attach_irq(dev, dev->power.wakeirq);
>   spin_unlock_irq(>power.lock);
> diff --git a/include/linux/pm_wakeup.h b/include/linux/pm_wakeup.h
> index 9102760..fc23c1a 100644
> --- a/include/linux/pm_wakeup.h
> +++ b/include/linux/pm_wakeup.h
> @@ -23,6 +23,7 @@ struct wake_irq;
>   * @name: Name of the wakeup source
>   * @entry: Wakeup source list entry
>   * @lock: Wakeup source lock
> + * @dev: The device it attached to
>   * @wakeirq: Optional device specific wakeirq
>   * @timer: Wakeup timer list
>   * @timer_expires: Wakeup timer expiration
> @@ -42,6 +43,7 @@ struct wakeup_source {
>   const char  *name;
>   struct list_headentry;
>   spinlock_t  lock;
> + struct device   *dev;
>   struct wake_irq *wakeirq;
>   struct timer_list   timer;
>   unsigned long   timer_expires;
> @@ -88,6 +90,7 @@ extern void wakeup_source_add(struct wakeup_source *ws);
>  extern void wakeup_source_remove(struct wakeup_source *ws);
>  extern struct wakeup_source *wakeup_source_register(const char *name);
>  extern void wakeup_source_unregister(struct wakeup_source *ws);
> +extern struct wakeup_source *wakeup_source_get_next(struct wakeup_source 
> *ws);
>  extern int device_wakeup_enable(struct device *dev);
>  extern int device_wakeup_disable(struct device *dev);
>  extern void device_set_wakeup_capable(struct device *dev, bool capable);
> 






Re: [PATCH V3] cpufreq: Make cpufreq_generic_init() return void

2019-07-18 Thread Rafael J. Wysocki
On Tuesday, July 16, 2019 6:06:08 AM CEST Viresh Kumar wrote:
> It always returns 0 (success) and its return type should really be void.
> Over that, many drivers have added error handling code based on its
> return value, which is not required at all.
> 
> change its return type to void and update all the callers.
> 
> Signed-off-by: Viresh Kumar 
> ---
> V2->V3:
> - Update bmips cpufreq driver to avoid "warning: 'ret' may be used
>   uninitialized".
> - Build bot reported this issue almost after 4 days of posting this
>   patch, I was expecting this a lot earlier :)
> 
>  drivers/cpufreq/bmips-cpufreq.c | 17 ++---
>  drivers/cpufreq/cpufreq.c   |  4 +---
>  drivers/cpufreq/davinci-cpufreq.c   |  3 ++-
>  drivers/cpufreq/imx6q-cpufreq.c |  6 ++
>  drivers/cpufreq/kirkwood-cpufreq.c  |  3 ++-
>  drivers/cpufreq/loongson1-cpufreq.c |  8 +++-
>  drivers/cpufreq/loongson2_cpufreq.c |  3 ++-
>  drivers/cpufreq/maple-cpufreq.c |  3 ++-
>  drivers/cpufreq/omap-cpufreq.c  | 15 +--
>  drivers/cpufreq/pasemi-cpufreq.c|  3 ++-
>  drivers/cpufreq/pmac32-cpufreq.c|  3 ++-
>  drivers/cpufreq/pmac64-cpufreq.c|  3 ++-
>  drivers/cpufreq/s3c2416-cpufreq.c   |  9 ++---
>  drivers/cpufreq/s3c64xx-cpufreq.c   | 15 +++
>  drivers/cpufreq/s5pv210-cpufreq.c   |  3 ++-
>  drivers/cpufreq/sa1100-cpufreq.c|  3 ++-
>  drivers/cpufreq/sa1110-cpufreq.c|  3 ++-
>  drivers/cpufreq/spear-cpufreq.c |  3 ++-
>  drivers/cpufreq/tegra20-cpufreq.c   |  8 +---
>  include/linux/cpufreq.h |  2 +-
>  20 files changed, 46 insertions(+), 71 deletions(-)
> 
> diff --git a/drivers/cpufreq/bmips-cpufreq.c b/drivers/cpufreq/bmips-cpufreq.c
> index 56a4ebbf00e0..f7c23fa468f0 100644
> --- a/drivers/cpufreq/bmips-cpufreq.c
> +++ b/drivers/cpufreq/bmips-cpufreq.c
> @@ -131,23 +131,18 @@ static int bmips_cpufreq_exit(struct cpufreq_policy 
> *policy)
>  static int bmips_cpufreq_init(struct cpufreq_policy *policy)
>  {
>   struct cpufreq_frequency_table *freq_table;
> - int ret;
>  
>   freq_table = bmips_cpufreq_get_freq_table(policy);
>   if (IS_ERR(freq_table)) {
> - ret = PTR_ERR(freq_table);
> - pr_err("%s: couldn't determine frequency table (%d).\n",
> - BMIPS_CPUFREQ_NAME, ret);
> - return ret;
> + pr_err("%s: couldn't determine frequency table (%ld).\n",
> + BMIPS_CPUFREQ_NAME, PTR_ERR(freq_table));
> + return PTR_ERR(freq_table);
>   }
>  
> - ret = cpufreq_generic_init(policy, freq_table, TRANSITION_LATENCY);
> - if (ret)
> - bmips_cpufreq_exit(policy);
> - else
> - pr_info("%s: registered\n", BMIPS_CPUFREQ_NAME);
> + cpufreq_generic_init(policy, freq_table, TRANSITION_LATENCY);
> + pr_info("%s: registered\n", BMIPS_CPUFREQ_NAME);
>  
> - return ret;
> + return 0;
>  }
>  
>  static struct cpufreq_driver bmips_cpufreq_driver = {
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 4d6043ee7834..8dda62367816 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -159,7 +159,7 @@ EXPORT_SYMBOL_GPL(arch_set_freq_scale);
>   * - set policies transition latency
>   * - policy->cpus with all possible CPUs
>   */
> -int cpufreq_generic_init(struct cpufreq_policy *policy,
> +void cpufreq_generic_init(struct cpufreq_policy *policy,
>   struct cpufreq_frequency_table *table,
>   unsigned int transition_latency)
>  {
> @@ -171,8 +171,6 @@ int cpufreq_generic_init(struct cpufreq_policy *policy,
>* share the clock and voltage and clock.
>*/
>   cpumask_setall(policy->cpus);
> -
> - return 0;
>  }
>  EXPORT_SYMBOL_GPL(cpufreq_generic_init);
>  
> diff --git a/drivers/cpufreq/davinci-cpufreq.c 
> b/drivers/cpufreq/davinci-cpufreq.c
> index 3de48ae60c29..297d23cad8b5 100644
> --- a/drivers/cpufreq/davinci-cpufreq.c
> +++ b/drivers/cpufreq/davinci-cpufreq.c
> @@ -90,7 +90,8 @@ static int davinci_cpu_init(struct cpufreq_policy *policy)
>* Setting the latency to 2000 us to accommodate addition of drivers
>* to pre/post change notification list.
>*/
> - return cpufreq_generic_init(policy, freq_table, 2000 * 1000);
> + cpufreq_generic_init(policy, freq_table, 2000 * 1000);
> + return 0;
>  }
>  
>  static struct cpufreq_driver davinci_driver = {
> diff --git a/drivers/cpufreq/imx6q-cpufreq.c b/drivers/cpufreq/imx6q-cpufreq.c
> index 47ccfa6b17b7..648a09a1778a 100644
> --- a/drivers/cpufreq/imx6q-cpufreq.c
> +++ b/drivers/cpufreq/imx6q-cpufreq.c
> @@ -190,14 +190,12 @@ static int imx6q_set_target(struct cpufreq_policy 
> *policy, unsigned int index)
>  
>  static int imx6q_cpufreq_init(struct cpufreq_policy *policy)
>  {
> - int ret;
> -
>   policy->clk = clks[ARM].clk;
> - ret = cpufreq_generic_init(policy, freq_table, transition_latency);
> +

Re: [PATCH 00/10] cpufreq: Migrate users of policy notifiers to QoS requests

2019-07-16 Thread Rafael J. Wysocki
On Tue, Jul 16, 2019 at 12:14 PM Viresh Kumar  wrote:
>
> On 16-07-19, 12:06, Rafael J. Wysocki wrote:
> > On Tue, Jul 16, 2019 at 11:49 AM Viresh Kumar  
> > wrote:
> > >
> > > Hello,
> > >
> > > Now that cpufreq core supports taking QoS requests for min/max cpu
> > > frequencies, lets migrate rest of the users to using them instead of the
> > > policy notifiers.
> >
> > Technically, this still is linux-next only. :-)
>
> True :)
>
> > > The CPUFREQ_NOTIFY and CPUFREQ_ADJUST events of the policy notifiers are
> > > removed as a result, but we have to add CPUFREQ_CREATE_POLICY and
> > > CPUFREQ_REMOVE_POLICY events to it for the acpi stuff specifically. So
> > > the policy notifiers aren't completely removed.
> >
> > That's not entirely accurate, because arch_topology is going to use
> > CPUFREQ_CREATE_POLICY now too.
>
> Yeah, I thought about that while writing this patchset and
> coverletter. But had it not been required for ACPI, I would have done
> it differently for the arch-topology code. Maybe direct calling of
> arch-topology routine from cpufreq core. I wanted to get rid of the
> policy notifiers completely but I couldn't find a better way of doing
> it for ACPI stuff.
>
> > > Boot tested on my x86 PC and ARM hikey board. Nothing looked broken :)
> > >
> > > This has already gone through build bot for a few days now.
> >
> > So I'd prefer patches [5-8] to go right after the first one and then
> > do the cleanups on top of that, as somebody may want to backport the
> > essential changes without the cleanups.
>
> In the exceptional case where nobody finds anything wrong with the
> patches (highly unlikely), do you want me to resend with reordering or
> you can reorder them while applying? There are no dependencies between
> those patches anyway.

Please resend the reordered set when the merge window closes.


Re: [PATCH 00/10] cpufreq: Migrate users of policy notifiers to QoS requests

2019-07-16 Thread Rafael J. Wysocki
On Tue, Jul 16, 2019 at 11:49 AM Viresh Kumar  wrote:
>
> Hello,
>
> Now that cpufreq core supports taking QoS requests for min/max cpu
> frequencies, lets migrate rest of the users to using them instead of the
> policy notifiers.

Technically, this still is linux-next only. :-)

> The CPUFREQ_NOTIFY and CPUFREQ_ADJUST events of the policy notifiers are
> removed as a result, but we have to add CPUFREQ_CREATE_POLICY and
> CPUFREQ_REMOVE_POLICY events to it for the acpi stuff specifically. So
> the policy notifiers aren't completely removed.

That's not entirely accurate, because arch_topology is going to use
CPUFREQ_CREATE_POLICY now too.

> Boot tested on my x86 PC and ARM hikey board. Nothing looked broken :)
>
> This has already gone through build bot for a few days now.

So I'd prefer patches [5-8] to go right after the first one and then
do the cleanups on top of that, as somebody may want to backport the
essential changes without the cleanups.


Re: [PATCH v5] cpufreq/pasemi: fix an use-after-free in pas_cpufreq_cpu_init()

2019-07-10 Thread Rafael J. Wysocki
On Tuesday, July 9, 2019 10:12:05 AM CEST Viresh Kumar wrote:
> On 09-07-19, 16:04, Wen Yang wrote:
> > The cpu variable is still being used in the of_get_property() call
> > after the of_node_put() call, which may result in use-after-free.
> > 
> > Fixes: a9acc26b75f ("cpufreq/pasemi: fix possible object reference leak")
> > Signed-off-by: Wen Yang 
> > Cc: "Rafael J. Wysocki" 
> > Cc: Viresh Kumar 
> > Cc: Michael Ellerman 
> > Cc: linuxppc-dev@lists.ozlabs.org
> > Cc: linux...@vger.kernel.org
> > Cc: linux-ker...@vger.kernel.org

Patch applied.

> > ---
> > v5: put together the code to get, use, and release cpu device_node.
> > v4: restore the blank line.
> > v3: fix a leaked reference.
> > v2: clean up the code according to the advice of viresh.
> > 
> >  drivers/cpufreq/pasemi-cpufreq.c | 21 +
> >  1 file changed, 9 insertions(+), 12 deletions(-)
> > 
> > diff --git a/drivers/cpufreq/pasemi-cpufreq.c 
> > b/drivers/cpufreq/pasemi-cpufreq.c
> > index 6b1e4ab..1f0beb7 100644
> > --- a/drivers/cpufreq/pasemi-cpufreq.c
> > +++ b/drivers/cpufreq/pasemi-cpufreq.c
> > @@ -131,10 +131,17 @@ static int pas_cpufreq_cpu_init(struct cpufreq_policy 
> > *policy)
> > int err = -ENODEV;
> >  
> > cpu = of_get_cpu_node(policy->cpu, NULL);
> > -
> > -   of_node_put(cpu);
> > if (!cpu)
> > goto out;
> 
> I would have loved a blank line here :)

And I added the blank line.

> > +   max_freqp = of_get_property(cpu, "clock-frequency", NULL);
> > +   of_node_put(cpu);
> > +   if (!max_freqp) {
> > +   err = -EINVAL;
> > +   goto out;
> > +   }
> > +
> > +   /* we need the freq in kHz */
> > +   max_freq = *max_freqp / 1000;
> >  
> > dn = of_find_compatible_node(NULL, NULL, "1682m-sdc");
> > if (!dn)
> > @@ -171,16 +178,6 @@ static int pas_cpufreq_cpu_init(struct cpufreq_policy 
> > *policy)
> > }
> >  
> > pr_debug("init cpufreq on CPU %d\n", policy->cpu);
> > -
> > -   max_freqp = of_get_property(cpu, "clock-frequency", NULL);
> > -   if (!max_freqp) {
> > -   err = -EINVAL;
> > -   goto out_unmap_sdcpwr;
> > -   }
> > -
> > -   /* we need the freq in kHz */
> > -   max_freq = *max_freqp / 1000;
> > -
> > pr_debug("max clock-frequency is at %u kHz\n", max_freq);
> > pr_debug("initializing frequency table\n");
> 
> Though, enough versions have happened now.
> 
> Acked-by: Viresh Kumar 
> 
> 

Thanks!





Re: [PATCH v2] powerpc/power: Expose pfn_is_nosave prototype

2019-06-27 Thread Rafael J. Wysocki
On Friday, May 24, 2019 12:44:18 PM CEST Mathieu Malaterre wrote:
> The declaration for pfn_is_nosave is only available in
> kernel/power/power.h. Since this function can be override in arch,
> expose it globally. Having a prototype will make sure to avoid warning
> (sometime treated as error with W=1) such as:
> 
>   arch/powerpc/kernel/suspend.c:18:5: error: no previous prototype for 
> 'pfn_is_nosave' [-Werror=missing-prototypes]
> 
> This moves the declaration into a globally visible header file and add
> missing include to avoid a warning on powerpc. Also remove the
> duplicated prototypes since not required anymore.
> 
> Cc: Christophe Leroy 
> Signed-off-by: Mathieu Malaterre 
> ---
> v2: As suggestion by christophe remove duplicates prototypes
> 
>  arch/powerpc/kernel/suspend.c | 1 +
>  arch/s390/kernel/entry.h  | 1 -
>  include/linux/suspend.h   | 1 +
>  kernel/power/power.h  | 2 --
>  4 files changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/suspend.c b/arch/powerpc/kernel/suspend.c
> index a531154cc0f3..9e1b6b894245 100644
> --- a/arch/powerpc/kernel/suspend.c
> +++ b/arch/powerpc/kernel/suspend.c
> @@ -8,6 +8,7 @@
>   */
>  
>  #include 
> +#include 
>  #include 
>  #include 
>  
> diff --git a/arch/s390/kernel/entry.h b/arch/s390/kernel/entry.h
> index 20420c2b8a14..b2956d49b6ad 100644
> --- a/arch/s390/kernel/entry.h
> +++ b/arch/s390/kernel/entry.h
> @@ -63,7 +63,6 @@ void __init startup_init(void);
>  void die(struct pt_regs *regs, const char *str);
>  int setup_profiling_timer(unsigned int multiplier);
>  void __init time_init(void);
> -int pfn_is_nosave(unsigned long);
>  void s390_early_resume(void);
>  unsigned long prepare_ftrace_return(unsigned long parent, unsigned long sp, 
> unsigned long ip);
>  
> diff --git a/include/linux/suspend.h b/include/linux/suspend.h
> index 6b3ea9ea6a9e..e8b8a7bede90 100644
> --- a/include/linux/suspend.h
> +++ b/include/linux/suspend.h
> @@ -395,6 +395,7 @@ extern bool system_entering_hibernation(void);
>  extern bool hibernation_available(void);
>  asmlinkage int swsusp_save(void);
>  extern struct pbe *restore_pblist;
> +int pfn_is_nosave(unsigned long pfn);
>  #else /* CONFIG_HIBERNATION */
>  static inline void register_nosave_region(unsigned long b, unsigned long e) 
> {}
>  static inline void register_nosave_region_late(unsigned long b, unsigned 
> long e) {}
> diff --git a/kernel/power/power.h b/kernel/power/power.h
> index 9e58bdc8a562..44bee462ff57 100644
> --- a/kernel/power/power.h
> +++ b/kernel/power/power.h
> @@ -75,8 +75,6 @@ static inline void hibernate_reserved_size_init(void) {}
>  static inline void hibernate_image_size_init(void) {}
>  #endif /* !CONFIG_HIBERNATION */
>  
> -extern int pfn_is_nosave(unsigned long);
> -
>  #define power_attr(_name) \
>  static struct kobj_attribute _name##_attr = {\
>   .attr   = { \
> 

Applied, thanks!





Re: [PATCH v4 1/3] PM: wakeup: Add routine to help fetch wakeup source object.

2019-06-18 Thread Rafael J. Wysocki
On Monday, May 20, 2019 11:52:36 AM CEST Ran Wang wrote:
> Some user might want to go through all registered wakeup sources
> and doing things accordingly. For example, SoC PM driver might need to
> do HW programming to prevent powering down specific IP which wakeup
> source depending on. And is user's responsibility to identify if this
> wakeup source he is interested in.

I guess the idea here is that you need to walk wakeup devices and you noticed
that there was a wakeup source object for each of them and those wakeup
source objects were on a list, so you could walk wakeup devices by walking
the list of wakeup source objects.

That is fair enough, but the changelog above doesn't even talk about that.
 
> Signed-off-by: Ran Wang 
> ---
> Change in v4:
>   - None.
> 
> Change in v3:
>   - Adjust indentation of *attached_dev;.
> 
> Change in v2:
>   - None.
> 
>  drivers/base/power/wakeup.c |   18 ++
>  include/linux/pm_wakeup.h   |3 +++
>  2 files changed, 21 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c
> index 5b2b6a0..6904485 100644
> --- a/drivers/base/power/wakeup.c
> +++ b/drivers/base/power/wakeup.c
> @@ -14,6 +14,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  
> @@ -226,6 +227,22 @@ void wakeup_source_unregister(struct wakeup_source *ws)
>   }
>  }
>  EXPORT_SYMBOL_GPL(wakeup_source_unregister);
> +/**
> + * wakeup_source_get_next - Get next wakeup source from the list
> + * @ws: Previous wakeup source object, null means caller want first one.
> + */
> +struct wakeup_source *wakeup_source_get_next(struct wakeup_source *ws)
> +{
> + struct list_head *ws_head = _sources;
> +
> + if (ws)
> + return list_next_or_null_rcu(ws_head, >entry,
> + struct wakeup_source, entry);
> + else
> + return list_entry_rcu(ws_head->next,
> + struct wakeup_source, entry);
> +}
> +EXPORT_SYMBOL_GPL(wakeup_source_get_next);

This needs to be arranged along the lines of 
wakeup_sources_stats_seq_start/next/stop()
because of the SRCU protection of the list.

>  
>  /**
>   * device_wakeup_attach - Attach a wakeup source object to a device object.
> @@ -242,6 +259,7 @@ static int device_wakeup_attach(struct device *dev, 
> struct wakeup_source *ws)
>   return -EEXIST;
>   }
>   dev->power.wakeup = ws;
> + ws->attached_dev = dev;
>   if (dev->power.wakeirq)
>   device_wakeup_attach_irq(dev, dev->power.wakeirq);
>   spin_unlock_irq(>power.lock);
> diff --git a/include/linux/pm_wakeup.h b/include/linux/pm_wakeup.h
> index 0ff134d..913b2fb 100644
> --- a/include/linux/pm_wakeup.h
> +++ b/include/linux/pm_wakeup.h
> @@ -50,6 +50,7 @@
>   * @wakeup_count: Number of times the wakeup source might abort suspend.
>   * @active: Status of the wakeup source.
>   * @has_timeout: The wakeup source has been activated with a timeout.
> + * @attached_dev: The device it attached to
>   */
>  struct wakeup_source {
>   const char  *name;
> @@ -70,6 +71,7 @@ struct wakeup_source {
>   unsigned long   wakeup_count;
>   boolactive:1;
>   boolautosleep_enabled:1;
> + struct device   *attached_dev;

Please (a) call it just dev and (b) move it up (before wakeirq, say).

>  };
>  
>  #ifdef CONFIG_PM_SLEEP
> @@ -101,6 +103,7 @@ static inline void device_set_wakeup_path(struct device 
> *dev)
>  extern void wakeup_source_remove(struct wakeup_source *ws);
>  extern struct wakeup_source *wakeup_source_register(const char *name);
>  extern void wakeup_source_unregister(struct wakeup_source *ws);
> +extern struct wakeup_source *wakeup_source_get_next(struct wakeup_source 
> *ws);
>  extern int device_wakeup_enable(struct device *dev);
>  extern int device_wakeup_disable(struct device *dev);
>  extern void device_set_wakeup_capable(struct device *dev, bool capable);
> 






Re: [PATCH 01/14] ABI: fix some syntax issues at the ABI database

2019-06-14 Thread Rafael J. Wysocki
On Fri, Jun 14, 2019 at 4:04 AM Mauro Carvalho Chehab
 wrote:
>
> From: Mauro Carvalho Chehab 
>
> On those three files, the ABI representation described at
> README are violated.
>
> - at sysfs-bus-iio-proximity-as3935:
> a ':' character is missing after "What"
>
> - at sysfs-class-devfreq:
> there's a typo at Description
>
> - at sysfs-class-cxl, it is using the ":" character at a
> file preamble, causing it to be misinterpreted as a
> tag.
>
> - On the other files, instead of "What", they use "Where".
>
> Signed-off-by: Mauro Carvalho Chehab 
> Signed-off-by: Mauro Carvalho Chehab 

Acked-by: Rafael J. Wysocki 

> ---
>  Documentation/ABI/testing/pstore  |  2 +-
>  .../sysfs-bus-event_source-devices-format |  2 +-
>  .../ABI/testing/sysfs-bus-i2c-devices-hm6352  |  6 ++---
>  .../ABI/testing/sysfs-bus-iio-distance-srf08  |  4 ++--
>  .../testing/sysfs-bus-iio-proximity-as3935|  4 ++--
>  .../ABI/testing/sysfs-bus-pci-devices-cciss   | 22 +--
>  .../testing/sysfs-bus-usb-devices-usbsevseg   | 12 +-
>  Documentation/ABI/testing/sysfs-class-cxl |  6 ++---
>  Documentation/ABI/testing/sysfs-class-devfreq |  2 +-
>  .../ABI/testing/sysfs-class-powercap  |  2 +-
>  Documentation/ABI/testing/sysfs-kernel-fscaps |  2 +-
>  .../ABI/testing/sysfs-kernel-vmcoreinfo   |  2 +-
>  12 files changed, 33 insertions(+), 33 deletions(-)
>
> diff --git a/Documentation/ABI/testing/pstore 
> b/Documentation/ABI/testing/pstore
> index 5fca9f5e10a3..8d6e48f4e8ef 100644
> --- a/Documentation/ABI/testing/pstore
> +++ b/Documentation/ABI/testing/pstore
> @@ -1,4 +1,4 @@
> -Where: /sys/fs/pstore/... (or /dev/pstore/...)
> +What:  /sys/fs/pstore/... (or /dev/pstore/...)
>  Date:  March 2011
>  Kernel Version: 2.6.39
>  Contact:   tony.l...@intel.com
> diff --git a/Documentation/ABI/testing/sysfs-bus-event_source-devices-format 
> b/Documentation/ABI/testing/sysfs-bus-event_source-devices-format
> index 77f47ff5ee02..b6f8748e0200 100644
> --- a/Documentation/ABI/testing/sysfs-bus-event_source-devices-format
> +++ b/Documentation/ABI/testing/sysfs-bus-event_source-devices-format
> @@ -1,4 +1,4 @@
> -Where: /sys/bus/event_source/devices//format
> +What:  /sys/bus/event_source/devices//format
>  Date:  January 2012
>  Kernel Version: 3.3
>  Contact:   Jiri Olsa 
> diff --git a/Documentation/ABI/testing/sysfs-bus-i2c-devices-hm6352 
> b/Documentation/ABI/testing/sysfs-bus-i2c-devices-hm6352
> index feb2e4a87075..29bd447e50a0 100644
> --- a/Documentation/ABI/testing/sysfs-bus-i2c-devices-hm6352
> +++ b/Documentation/ABI/testing/sysfs-bus-i2c-devices-hm6352
> @@ -1,18 +1,18 @@
> -Where: /sys/bus/i2c/devices/.../heading0_input
> +What:  /sys/bus/i2c/devices/.../heading0_input
>  Date:  April 2010
>  Kernel Version: 2.6.36?
>  Contact:   alan@intel.com
>  Description:   Reports the current heading from the compass as a floating
> point value in degrees.
>
> -Where: /sys/bus/i2c/devices/.../power_state
> +What:  /sys/bus/i2c/devices/.../power_state
>  Date:  April 2010
>  Kernel Version: 2.6.36?
>  Contact:   alan@intel.com
>  Description:   Sets the power state of the device. 0 sets the device into
> sleep mode, 1 wakes it up.
>
> -Where: /sys/bus/i2c/devices/.../calibration
> +What:  /sys/bus/i2c/devices/.../calibration
>  Date:  April 2010
>  Kernel Version: 2.6.36?
>  Contact:   alan@intel.com
> diff --git a/Documentation/ABI/testing/sysfs-bus-iio-distance-srf08 
> b/Documentation/ABI/testing/sysfs-bus-iio-distance-srf08
> index 0a1ca1487fa9..a133fd8d081a 100644
> --- a/Documentation/ABI/testing/sysfs-bus-iio-distance-srf08
> +++ b/Documentation/ABI/testing/sysfs-bus-iio-distance-srf08
> @@ -1,4 +1,4 @@
> -What   /sys/bus/iio/devices/iio:deviceX/sensor_sensitivity
> +What:  /sys/bus/iio/devices/iio:deviceX/sensor_sensitivity
>  Date:  January 2017
>  KernelVersion: 4.11
>  Contact:   linux-...@vger.kernel.org
> @@ -6,7 +6,7 @@ Description:
> Show or set the gain boost of the amp, from 0-31 range.
> default 31
>
> -What   /sys/bus/iio/devices/iio:deviceX/sensor_max_range
> +What:  /sys/bus/iio/devices/iio:deviceX/sensor_max_range
>  Date:  January 2017
>  KernelVersion: 4.11
>  Contact:   linux-...@vger.kernel.org
> diff --git a/Documentation/ABI/testing/sysfs-bus-iio-proximity-as3935 
> b/Documentation/ABI/testing/sysfs-bus-

Re: [PATCH v2] powerpc/power: Expose pfn_is_nosave prototype

2019-05-28 Thread Rafael J. Wysocki
On Tuesday, May 28, 2019 3:16:30 AM CEST Michael Ellerman wrote:
> "Rafael J. Wysocki"  writes:
> > On Friday, May 24, 2019 12:44:18 PM CEST Mathieu Malaterre wrote:
> >> The declaration for pfn_is_nosave is only available in
> >> kernel/power/power.h. Since this function can be override in arch,
> >> expose it globally. Having a prototype will make sure to avoid warning
> >> (sometime treated as error with W=1) such as:
> >> 
> >>   arch/powerpc/kernel/suspend.c:18:5: error: no previous prototype for 
> >> 'pfn_is_nosave' [-Werror=missing-prototypes]
> >> 
> >> This moves the declaration into a globally visible header file and add
> >> missing include to avoid a warning on powerpc. Also remove the
> >> duplicated prototypes since not required anymore.
> >> 
> >> Cc: Christophe Leroy 
> >> Signed-off-by: Mathieu Malaterre 
> >> ---
> >> v2: As suggestion by christophe remove duplicates prototypes
> >> 
> >>  arch/powerpc/kernel/suspend.c | 1 +
> >>  arch/s390/kernel/entry.h  | 1 -
> >>  include/linux/suspend.h   | 1 +
> >>  kernel/power/power.h  | 2 --
> >>  4 files changed, 2 insertions(+), 3 deletions(-)
> >> 
> >> diff --git a/kernel/power/power.h b/kernel/power/power.h
> >> index 9e58bdc8a562..44bee462ff57 100644
> >> --- a/kernel/power/power.h
> >> +++ b/kernel/power/power.h
> >> @@ -75,8 +75,6 @@ static inline void hibernate_reserved_size_init(void) {}
> >>  static inline void hibernate_image_size_init(void) {}
> >>  #endif /* !CONFIG_HIBERNATION */
> >>  
> >> -extern int pfn_is_nosave(unsigned long);
> >> -
> >>  #define power_attr(_name) \
> >>  static struct kobj_attribute _name##_attr = { \
> >>.attr   = { \
> >> 
> >
> > With an ACK from the powerpc maintainers, I could apply this one.
> 
> Sent.

Thanks!





Re: [PATCH v2] powerpc/power: Expose pfn_is_nosave prototype

2019-05-27 Thread Rafael J. Wysocki
On Friday, May 24, 2019 12:44:18 PM CEST Mathieu Malaterre wrote:
> The declaration for pfn_is_nosave is only available in
> kernel/power/power.h. Since this function can be override in arch,
> expose it globally. Having a prototype will make sure to avoid warning
> (sometime treated as error with W=1) such as:
> 
>   arch/powerpc/kernel/suspend.c:18:5: error: no previous prototype for 
> 'pfn_is_nosave' [-Werror=missing-prototypes]
> 
> This moves the declaration into a globally visible header file and add
> missing include to avoid a warning on powerpc. Also remove the
> duplicated prototypes since not required anymore.
> 
> Cc: Christophe Leroy 
> Signed-off-by: Mathieu Malaterre 
> ---
> v2: As suggestion by christophe remove duplicates prototypes
> 
>  arch/powerpc/kernel/suspend.c | 1 +
>  arch/s390/kernel/entry.h  | 1 -
>  include/linux/suspend.h   | 1 +
>  kernel/power/power.h  | 2 --
>  4 files changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/suspend.c b/arch/powerpc/kernel/suspend.c
> index a531154cc0f3..9e1b6b894245 100644
> --- a/arch/powerpc/kernel/suspend.c
> +++ b/arch/powerpc/kernel/suspend.c
> @@ -8,6 +8,7 @@
>   */
>  
>  #include 
> +#include 
>  #include 
>  #include 
>  
> diff --git a/arch/s390/kernel/entry.h b/arch/s390/kernel/entry.h
> index 20420c2b8a14..b2956d49b6ad 100644
> --- a/arch/s390/kernel/entry.h
> +++ b/arch/s390/kernel/entry.h
> @@ -63,7 +63,6 @@ void __init startup_init(void);
>  void die(struct pt_regs *regs, const char *str);
>  int setup_profiling_timer(unsigned int multiplier);
>  void __init time_init(void);
> -int pfn_is_nosave(unsigned long);
>  void s390_early_resume(void);
>  unsigned long prepare_ftrace_return(unsigned long parent, unsigned long sp, 
> unsigned long ip);
>  
> diff --git a/include/linux/suspend.h b/include/linux/suspend.h
> index 6b3ea9ea6a9e..e8b8a7bede90 100644
> --- a/include/linux/suspend.h
> +++ b/include/linux/suspend.h
> @@ -395,6 +395,7 @@ extern bool system_entering_hibernation(void);
>  extern bool hibernation_available(void);
>  asmlinkage int swsusp_save(void);
>  extern struct pbe *restore_pblist;
> +int pfn_is_nosave(unsigned long pfn);
>  #else /* CONFIG_HIBERNATION */
>  static inline void register_nosave_region(unsigned long b, unsigned long e) 
> {}
>  static inline void register_nosave_region_late(unsigned long b, unsigned 
> long e) {}
> diff --git a/kernel/power/power.h b/kernel/power/power.h
> index 9e58bdc8a562..44bee462ff57 100644
> --- a/kernel/power/power.h
> +++ b/kernel/power/power.h
> @@ -75,8 +75,6 @@ static inline void hibernate_reserved_size_init(void) {}
>  static inline void hibernate_image_size_init(void) {}
>  #endif /* !CONFIG_HIBERNATION */
>  
> -extern int pfn_is_nosave(unsigned long);
> -
>  #define power_attr(_name) \
>  static struct kobj_attribute _name##_attr = {\
>   .attr   = { \
> 

With an ACK from the powerpc maintainers, I could apply this one.






Re: [PATCH 10/10] docs: fix broken documentation links

2019-05-27 Thread Rafael J. Wysocki
On Mon, May 20, 2019 at 4:48 PM Mauro Carvalho Chehab
 wrote:
>
> Mostly due to x86 and acpi conversion, several documentation
> links are still pointing to the old file. Fix them.
>
> Signed-off-by: Mauro Carvalho Chehab 

For the ACPI part:

Acked-by: Rafael J. Wysocki 


Re: [PATCH v5 00/23] Include linux ACPI docs into Sphinx TOC tree

2019-05-01 Thread Rafael J. Wysocki
On Thursday, April 25, 2019 5:20:35 PM CEST Changbin Du wrote:
> On Thu, Apr 25, 2019 at 10:44:14AM +0200, Rafael J. Wysocki wrote:
> > .On Wed, Apr 24, 2019 at 7:54 PM Changbin Du  wrote:
> > >
> > > Hi All,
> > > The kernel now uses Sphinx to generate intelligent and beautiful 
> > > documentation
> > > from reStructuredText files. I converted all of the Linux ACPI/PCI/X86 
> > > docs to
> > > reST format in this serias.
> > >
> > > The hieararchy of ACPI docs are based on Corbet's suggestion:
> > > https://lkml.org/lkml/2019/4/3/1047
> > > I did some adjustment according to the content and finally they are 
> > > placed as:
> > > Documentation/firmware-guide/acpi/
> > 
> > I'd like to queue up this series, but it is missing a patch to create
> > Documentation/firmware-guide/acpi/index.rst.
> > 
> > Care to provide one?
> oops, the first patch is missed. Let me add it next.

I've picked up the first patch from the v6 and applied this series on top of it.

 Thanks!





Re: Why is suspend with s2idle available on POWER8 systems?

2019-04-29 Thread Rafael J. Wysocki
On Mon, Apr 29, 2019 at 10:50 AM Paul Menzel  wrote:
>
> Dear Rafael,
>
>
> On 04/29/2019 09:17 AM, Rafael J. Wysocki wrote:
> > On Sat, Apr 27, 2019 at 12:54 PM Paul Menzel  wrote:
>
> >> Updating an IBM S822LC from Ubuntu 18.10 to 19.04 some user space stuff
> >> seems to have changed, so that going into sleep/suspend is enabled.
> >>
> >> That raises two questions.
> >>
> >> 1.  Is suspend actually supported on a POWER8 processor?
> >
> > Suspend-to-idle is a special variant of system suspend that does not
> > depend on any special platform support.  It works by suspending
> > devices and letting all of the CPUs in the system go idle (hence the
> > name).
> >
> > Also see 
> > https://www.kernel.org/doc/html/latest/admin-guide/pm/sleep-states.html#suspend-to-idle
>
> Thanks. I guess I mixed it up with the new S0ix-states [1].

Those can be entered via suspend-to-idle, if supported and actually
reachable on a given platform, but suspend-to-idle is more general
than that.

> >>> Apr 27 10:18:13 power NetworkManager[7534]:   [1556353093.7224] 
> >>> manager: sleep: sleep requested (sleeping: no  e
> >>> Apr 27 10:18:13 power systemd[1]: Reached target Sleep.
> >>> Apr 27 10:18:13 power systemd[1]: Starting Suspend...
> >>> Apr 27 10:18:13 power systemd-sleep[82190]: Suspending system...
> >>> Apr 27 10:18:13 power kernel: PM: suspend entry (s2idle)
> >>> -- Reboot --
> >>
> >>> $ uname -m
> >>> ppc64le
> >>> $ more /proc/version
> >>> Linux version 5.1.0-rc6+ (joey@power) (gcc version 8.3.0 (Ubuntu 
> >>> 8.3.0-6ubuntu1)) #1 SMP Sat Apr 27 10:01:48 CEST 2019
> >>> $ more /sys/power/mem_sleep
> >>> [s2idle]
> >>> $ more /sys/power/state
> >>> freeze mem
> >>> $ grep _SUSPEND /boot/config-5.0.0-14-generic # also enabled in Ubuntu’s 
> >>> configuration
> >>> CONFIG_ARCH_SUSPEND_POSSIBLE=y
> >>> CONFIG_SUSPEND=y
> >>> CONFIG_SUSPEND_FREEZER=y
> >>> # CONFIG_SUSPEND_SKIP_SYNC is not set
> >>> # CONFIG_PM_TEST_SUSPEND is not set
> >>
> >> Should the Kconfig symbol `SUSPEND` be selectable? If yes, should their
> >> be some detection during runtime?
> >>
> >> 2.  If it is supported, what are the ways to getting it to resume? What
> >> would the IPMI command be?
> >
> > That would depend on the distribution.
> >
> > Generally, you need to set up at least one device to generate wakeup
> > interrupts.
> >
> > The interface to do that are the /sys/devices/.../power/wakeup files,
> > but that has to cause enble_irq_wake() to be called for the given IRQ,
> > so some support in the underlying drivers need to be present for it to
> > work.
> >
> > USB devices generally work as wakeup sources if the controllers reside
> > on a PCI bus, for example.
>
> ```
> $ find /sys/devices/ -name wakeup | xargs grep enabled
> /sys/devices/pci0021:00/0021:00:00.0/0021:01:00.0/0021:02:09.0/0021:0d:00.0/usb1/1-3/1-3.4/power/wakeup:enabled
> /sys/devices/pci0021:00/0021:00:00.0/0021:01:00.0/0021:02:09.0/0021:0d:00.0/power/wakeup:enabled
> $ lsusb -t
> /:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 5000M
> /:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 480M
> |__ Port 3: Dev 2, If 0, Class=Hub, Driver=hub/5p, 480M
> |__ Port 1: Dev 3, If 0, Class=Mass Storage, Driver=usb-storage, 480M
> |__ Port 2: Dev 4, If 0, Class=Mass Storage, Driver=usb-storage, 480M
> |__ Port 3: Dev 5, If 0, Class=Mass Storage, Driver=usb-storage, 480M
> |__ Port 4: Dev 6, If 0, Class=Human Interface Device, Driver=usbhid, 
> 1.5M
> |__ Port 4: Dev 6, If 1, Class=Human Interface Device, Driver=usbhid, 
> 1.5M
> $ lsusb
> Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
> Bus 001 Device 006: ID 046b:ff10 American Megatrends, Inc. Virtual Keyboard 
> and Mouse
> Bus 001 Device 005: ID 046b:ff31 American Megatrends, Inc.
> Bus 001 Device 004: ID 046b:ff40 American Megatrends, Inc.
> Bus 001 Device 003: ID 046b:ff20 American Megatrends, Inc.
> Bus 001 Device 002: ID 046b:ff01 American Megatrends, Inc.
> Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
> ```

I'm not really sure what you wanted to say here, but it looks like
system wakeup is not enabled for device 6 on bus 1 which is probably
what you want.


Re: Why is suspend with s2idle available on POWER8 systems?

2019-04-29 Thread Rafael J. Wysocki
On Sat, Apr 27, 2019 at 12:54 PM Paul Menzel  wrote:
>
> Dear Linux folks,
>
>
> Updating an IBM S822LC from Ubuntu 18.10 to 19.04 some user space stuff
> seems to have changed, so that going into sleep/suspend is enabled.
>
> That raises two questions.
>
> 1.  Is suspend actually supported on a POWER8 processor?

Suspend-to-idle is a special variant of system suspend that does not
depend on any special platform support.  It works by suspending
devices and letting all of the CPUs in the system go idle (hence the
name).

Also see 
https://www.kernel.org/doc/html/latest/admin-guide/pm/sleep-states.html#suspend-to-idle

>
> > Apr 27 10:18:13 power NetworkManager[7534]:   [1556353093.7224] 
> > manager: sleep: sleep requested (sleeping: no  e
> > Apr 27 10:18:13 power systemd[1]: Reached target Sleep.
> > Apr 27 10:18:13 power systemd[1]: Starting Suspend...
> > Apr 27 10:18:13 power systemd-sleep[82190]: Suspending system...
> > Apr 27 10:18:13 power kernel: PM: suspend entry (s2idle)
> > -- Reboot --
>
> > $ uname -m
> > ppc64le
> > $ more /proc/version
> > Linux version 5.1.0-rc6+ (joey@power) (gcc version 8.3.0 (Ubuntu 
> > 8.3.0-6ubuntu1)) #1 SMP Sat Apr 27 10:01:48 CEST 2019
> > $ more /sys/power/mem_sleep
> > [s2idle]
> > $ more /sys/power/state
> > freeze mem
> > $ grep _SUSPEND /boot/config-5.0.0-14-generic # also enabled in Ubuntu’s 
> > configuration
> > CONFIG_ARCH_SUSPEND_POSSIBLE=y
> > CONFIG_SUSPEND=y
> > CONFIG_SUSPEND_FREEZER=y
> > # CONFIG_SUSPEND_SKIP_SYNC is not set
> > # CONFIG_PM_TEST_SUSPEND is not set
>
> Should the Kconfig symbol `SUSPEND` be selectable? If yes, should their
> be some detection during runtime?
>
> 2.  If it is supported, what are the ways to getting it to resume? What
> would the IPMI command be?

That would depend on the distribution.

Generally, you need to set up at least one device to generate wakeup interrupts.

The interface to do that are the /sys/devices/.../power/wakeup files,
but that has to cause enble_irq_wake() to be called for the given IRQ,
so some support in the underlying drivers need to be present for it to
work.

USB devices generally work as wakeup sources if the controllers reside
on a PCI bus, for example.

Thanks,
Rafael


Re: [PATCH v5 00/23] Include linux ACPI docs into Sphinx TOC tree

2019-04-25 Thread Rafael J. Wysocki
.On Wed, Apr 24, 2019 at 7:54 PM Changbin Du  wrote:
>
> Hi All,
> The kernel now uses Sphinx to generate intelligent and beautiful documentation
> from reStructuredText files. I converted all of the Linux ACPI/PCI/X86 docs to
> reST format in this serias.
>
> The hieararchy of ACPI docs are based on Corbet's suggestion:
> https://lkml.org/lkml/2019/4/3/1047
> I did some adjustment according to the content and finally they are placed as:
> Documentation/firmware-guide/acpi/

I'd like to queue up this series, but it is missing a patch to create
Documentation/firmware-guide/acpi/index.rst.

Care to provide one?


Re: [PATCH v4 00/63] Include linux ACPI/PCI/X86 docs into Sphinx TOC tree

2019-04-23 Thread Rafael J. Wysocki
On Tue, Apr 23, 2019 at 6:30 PM Changbin Du  wrote:
>
> Hi Corbet and All,
> The kernel now uses Sphinx to generate intelligent and beautiful documentation
> from reStructuredText files. I converted all of the Linux ACPI/PCI/X86 docs to
> reST format in this serias.
>
> In this version I combined ACPI and PCI docs, and added new x86 docs 
> conversion.

I'm not sure if combining all three into one big patch series has been
a good idea, honestly.

It would have been easier to review and handle otherwise.

For one, I'd like to handle the ACPI part of it myself if Jon doesn't mind that.

Thanks,
Rafael


Re: [PATCH] drivers: cpuidle: This patch fix the following checkpatch warning.

2019-04-18 Thread Rafael J. Wysocki
On Wednesday, April 17, 2019 4:52:34 PM CEST Mohan Kumar wrote:
> Use pr_debug instead of printk
> 
> WARNING: Prefer [subsystem eg: netdev]_dbg([subsystem]dev, ... then
> dev_dbg(dev, ... then pr_debug(...  to printk(KERN_DEBUG ...
> 
> Signed-off-by: Mohan Kumar 
> ---
>  drivers/cpuidle/cpuidle-powernv.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/cpuidle/cpuidle-powernv.c 
> b/drivers/cpuidle/cpuidle-powernv.c
> index 84b1ebe..7cf9835 100644
> --- a/drivers/cpuidle/cpuidle-powernv.c
> +++ b/drivers/cpuidle/cpuidle-powernv.c
> @@ -401,7 +401,7 @@ static int __init powernv_processor_idle_init(void)
>   powernv_cpuidle_driver_init();
>   retval = cpuidle_register(_idle_driver, NULL);
>   if (retval) {
> - printk(KERN_DEBUG "Registration of powernv driver failed.\n");
> + pr_debug("Registration of powernv driver failed.\n");
>   return retval;
>   }
>  
> @@ -413,7 +413,7 @@ static int __init powernv_processor_idle_init(void)
>  "cpuidle/powernv:dead", NULL,
>  powernv_cpuidle_cpu_dead);
>   WARN_ON(retval < 0);
> - printk(KERN_DEBUG "powernv_idle_driver registered\n");
> + pr_debug("powernv_idle_driver registered\n");
>   return 0;
>  }
>  
> 

Recently, you've sent two different patches against two different drivers with 
the same subject.

IMO it is fair enough to call that "confusing".

Moreover, pr_debug() is not a direct replacement for printk(KERN_DEBUG ) as the 
latter does
not require dynamic debug to be enabled and I'm not really sure if you are 
aware of that
difference.






Re: [PATCH v2 1/2] cpuidle : auto-promotion for cpuidle states

2019-04-09 Thread Rafael J. Wysocki
On Fri, Apr 5, 2019 at 11:17 AM Abhishek Goel
 wrote:
>
> Currently, the cpuidle governors (menu /ladder) determine what idle state

There are three governors in 5.1-rc.

> an idling CPU should enter into based on heuristics that depend on the
> idle history on that CPU. Given that no predictive heuristic is perfect,
> there are cases where the governor predicts a shallow idle state, hoping
> that the CPU will be busy soon. However, if no new workload is scheduled
> on that CPU in the near future, the CPU will end up in the shallow state.
>
> In case of POWER, this is problematic, when the predicted state in the
> aforementioned scenario is a lite stop state, as such lite states will
> inhibit SMT folding, thereby depriving the other threads in the core from
> using the core resources.
>
> To address this, such lite states need to be autopromoted.

I don't quite agree with this statement and it doesn't even match what
the patch does AFAICS.  "Autopromotion" would be going from the given
state to a deeper one without running state selection in between, but
that's not what's going on here.

> The cpuidle-core can queue timer to correspond with the residency value of 
> the next
> available state. Thus leading to auto-promotion to a deeper idle state as
> soon as possible.

No, it doesn't automatically cause a deeper state to be used next
time.  It simply kicks the CPU out of the idle state and one more
iteration of the idle loop runs on it.  Whether or not a deeper state
will be selected in that iteration depends on the governor
computations carried out in it.

Now, this appears to be almost analogous to the "polling" state used
on x86 which uses the next idle state's target residency as a timeout.

While generally I'm not a big fan of setting up timers in the idle
loop (it sort of feels like pulling your own hair in order to get
yourself out of a swamp), if idle states like these are there in your
platform, setting up a timer to get out of them in the driver's
->enter() routine might not be particularly objectionable.  Doing that
in the core is a whole different story, though.

Generally, this adds quite a bit of complexity (on the "ugly" side of
things IMO) to the core to cover a corner case present in one
platform, while IMO it can be covered in the driver for that platform
directly.

> Signed-off-by: Abhishek Goel 
> ---
>
> v1->v2 : Removed timeout_needed and rebased to current upstream kernel
>
>  drivers/cpuidle/cpuidle.c  | 68 +-
>  drivers/cpuidle/governors/ladder.c |  3 +-
>  drivers/cpuidle/governors/menu.c   | 22 +-
>  include/linux/cpuidle.h| 10 -
>  4 files changed, 99 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
> index 7f108309e..11ce43f19 100644
> --- a/drivers/cpuidle/cpuidle.c
> +++ b/drivers/cpuidle/cpuidle.c
> @@ -36,6 +36,11 @@ static int enabled_devices;
>  static int off __read_mostly;
>  static int initialized __read_mostly;
>
> +struct auto_promotion {
> +   struct hrtimer  hrtimer;
> +   unsigned long   timeout_us;
> +};
> +
>  int cpuidle_disabled(void)
>  {
> return off;
> @@ -188,6 +193,54 @@ int cpuidle_enter_s2idle(struct cpuidle_driver *drv, 
> struct cpuidle_device *dev)
>  }
>  #endif /* CONFIG_SUSPEND */
>
> +enum hrtimer_restart auto_promotion_hrtimer_callback(struct hrtimer *hrtimer)
> +{
> +   return HRTIMER_NORESTART;
> +}
> +
> +#ifdef CONFIG_CPU_IDLE_AUTO_PROMOTION
> +DEFINE_PER_CPU(struct auto_promotion, ap);
> +
> +static void cpuidle_auto_promotion_start(int cpu, struct cpuidle_state 
> *state)
> +{
> +   struct auto_promotion *this_ap = _cpu(ap, cpu);
> +
> +   if (state->flags & CPUIDLE_FLAG_AUTO_PROMOTION)
> +   hrtimer_start(_ap->hrtimer, 
> ns_to_ktime(this_ap->timeout_us
> +   * 1000), HRTIMER_MODE_REL_PINNED);
> +}
> +
> +static void cpuidle_auto_promotion_cancel(int cpu)
> +{
> +   struct hrtimer *hrtimer;
> +
> +   hrtimer = _cpu(ap, cpu).hrtimer;
> +   if (hrtimer_is_queued(hrtimer))
> +   hrtimer_cancel(hrtimer);
> +}
> +
> +static void cpuidle_auto_promotion_update(int cpu, unsigned long timeout)
> +{
> +   per_cpu(ap, cpu).timeout_us = timeout;
> +}
> +
> +static void cpuidle_auto_promotion_init(int cpu, struct cpuidle_driver *drv)
> +{
> +   struct auto_promotion *this_ap = _cpu(ap, cpu);
> +
> +   hrtimer_init(_ap->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> +   this_ap->hrtimer.function = auto_promotion_hrtimer_callback;
> +}
> +#else
> +static inline void cpuidle_auto_promotion_start(int cpu, struct cpuidle_state
> +   *state) { }
> +static inline void cpuidle_auto_promotion_cancel(int cpu) { }
> +static inline void cpuidle_auto_promotion_update(int cpu, unsigned long
> +   timeout) { }
> +static inline void 

Re: [PATCH v2 1/2] cpuidle : auto-promotion for cpuidle states

2019-04-09 Thread Rafael J. Wysocki
On Tue, Apr 9, 2019 at 11:29 AM Abhishek  wrote:
>
> Hi Daniel,
>
> Thanks for such a descriptive review. I will include all the suggestions
> made in my next iteration.

Please give me some time to send comments before that.


Re: [PATCH 1/2] cpuidle : auto-promotion for cpuidle states

2019-03-22 Thread Rafael J. Wysocki
On Fri, Mar 22, 2019 at 8:31 AM Abhishek Goel
 wrote:
>
> Currently, the cpuidle governors (menu /ladder) determine what idle state
> an idling CPU should enter into based on heuristics that depend on the
> idle history on that CPU. Given that no predictive heuristic is perfect,
> there are cases where the governor predicts a shallow idle state, hoping
> that the CPU will be busy soon. However, if no new workload is scheduled
> on that CPU in the near future, the CPU will end up in the shallow state.
>
> In case of POWER, this is problematic, when the predicted state in the
> aforementioned scenario is a lite stop state, as such lite states will
> inhibit SMT folding, thereby depriving the other threads in the core from
> using the core resources.
>
> To address this, such lite states need to be autopromoted. The cpuidle-
> core can queue timer to correspond with the residency value of the next
> available state. Thus leading to auto-promotion to a deeper idle state as
> soon as possible.

Isn't the tick stopping avoidance sufficient for that?


Re: [PATCH v2] cpufreq: powernv: add of_node_put()

2018-12-11 Thread Rafael J. Wysocki
On Wednesday, November 21, 2018 5:02:04 AM CET Viresh Kumar wrote:
> On 20-11-18, 11:05, Yangtao Li wrote:
> > The of_find_node_by_path() returns a node pointer with refcount
> > incremented,but there is the lack of use of the of_node_put() when
> > done.Add the missing of_node_put() to release the refcount.
> > 
> > Signed-off-by: Yangtao Li 
> > ---
> > Changes in v2
> > -update changelog
> > ---
> >  drivers/cpufreq/powernv-cpufreq.c | 17 +++--
> >  1 file changed, 11 insertions(+), 6 deletions(-)
> 
> Acked-by: Viresh Kumar 

Patch applied, thanks!



Re: [PATCH v2] cpufreq: pmac64: add of_node_put()

2018-12-11 Thread Rafael J. Wysocki
On Monday, November 26, 2018 7:02:26 AM CET Viresh Kumar wrote:
> On 23-11-18, 08:33, Yangtao Li wrote:
> > of_find_node_by_path() acquires a reference to the node
> > returned by it and that reference needs to be dropped by its caller.
> > g5_neo2_cpufreq_init() doesn't do that, so fix it.
> > 
> > Signed-off-by: Yangtao Li 
> > Acked-by: Viresh Kumar 
> > ---
> > Changes in v2:
> > -update changelog
> > ---
> >  drivers/cpufreq/pmac64-cpufreq.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/drivers/cpufreq/pmac64-cpufreq.c 
> > b/drivers/cpufreq/pmac64-cpufreq.c
> > index be623dd7b9f2..1d32a863332d 100644
> > --- a/drivers/cpufreq/pmac64-cpufreq.c
> > +++ b/drivers/cpufreq/pmac64-cpufreq.c
> > @@ -411,6 +411,7 @@ static int __init g5_neo2_cpufreq_init(struct 
> > device_node *cpunode)
> > pfunc_set_vdnap0 = pmf_find_function(root, "set-vdnap0");
> > pfunc_vdnap0_complete =
> > pmf_find_function(root, "slewing-done");
> > +   of_node_put(root);
> > if (pfunc_set_vdnap0 == NULL ||
> > pfunc_vdnap0_complete == NULL) {
> > pr_err("Can't find required platform function\n");
> 
> Acked-by: Viresh Kumar 

Patch applied, thanks!



Re: [PATCH] cpufreq: powernv: add of_node_put()

2018-11-20 Thread Rafael J. Wysocki
On Tue, Nov 20, 2018 at 1:57 PM Yangtao Li  wrote:
>
> use of_node_put() to release the refcount.

Again, this changelog is as good as no changelog at all.

If you are adding a missing of_node_put(), please say that and explain
why it is necessary.

Thanks,
Rafael


Re: [PATCH v1 2/6] mm/memory_hotplug: make add_memory() take the device_hotplug_lock

2018-09-18 Thread Rafael J. Wysocki
On Tue, Sep 18, 2018 at 1:48 PM David Hildenbrand  wrote:
>
> add_memory() currently does not take the device_hotplug_lock, however
> is aleady called under the lock from
> arch/powerpc/platforms/pseries/hotplug-memory.c
> drivers/acpi/acpi_memhotplug.c
> to synchronize against CPU hot-remove and similar.
>
> In general, we should hold the device_hotplug_lock when adding memory
> to synchronize against online/offline request (e.g. from user space) -
> which already resulted in lock inversions due to device_lock() and
> mem_hotplug_lock - see 30467e0b3be ("mm, hotplug: fix concurrent memory
> hot-add deadlock"). add_memory()/add_memory_resource() will create memory
> block devices, so this really feels like the right thing to do.
>
> Holding the device_hotplug_lock makes sure that a memory block device
> can really only be accessed (e.g. via .online/.state) from user space,
> once the memory has been fully added to the system.
>
> The lock is not held yet in
> drivers/xen/balloon.c
> arch/powerpc/platforms/powernv/memtrace.c
> drivers/s390/char/sclp_cmd.c
> drivers/hv/hv_balloon.c
> So, let's either use the locked variants or take the lock.
>
> Don't export add_memory_resource(), as it once was exported to be used
> by XEN, which is never built as a module. If somebody requires it, we
> also have to export a locked variant (as device_hotplug_lock is never
> exported).
>
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: Michael Ellerman 
> Cc: "Rafael J. Wysocki" 
> Cc: Len Brown 
> Cc: Greg Kroah-Hartman 
> Cc: Boris Ostrovsky 
> Cc: Juergen Gross 
> Cc: Nathan Fontenot 
> Cc: John Allen 
> Cc: Andrew Morton 
> Cc: Michal Hocko 
> Cc: Dan Williams 
> Cc: Joonsoo Kim 
> Cc: Vlastimil Babka 
> Cc: Oscar Salvador 
> Cc: Mathieu Malaterre 
> Cc: Pavel Tatashin 
> Cc: YASUAKI ISHIMATSU 
> Reviewed-by: Pavel Tatashin 
> Signed-off-by: David Hildenbrand 
> ---
>  .../platforms/pseries/hotplug-memory.c|  2 +-
>  drivers/acpi/acpi_memhotplug.c|  2 +-
>  drivers/base/memory.c |  9 ++--
>  drivers/xen/balloon.c |  3 +++
>  include/linux/memory_hotplug.h|  1 +
>  mm/memory_hotplug.c   | 22 ---
>  6 files changed, 32 insertions(+), 7 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c 
> b/arch/powerpc/platforms/pseries/hotplug-memory.c
> index b3f54466e25f..2e6f41dc103a 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
> @@ -702,7 +702,7 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
> nid = memory_add_physaddr_to_nid(lmb->base_addr);
>
> /* Add the memory */
> -   rc = add_memory(nid, lmb->base_addr, block_sz);
> +   rc = __add_memory(nid, lmb->base_addr, block_sz);
> if (rc) {
> dlpar_remove_device_tree_lmb(lmb);
> return rc;
> diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
> index 811148415993..8fe0960ea572 100644
> --- a/drivers/acpi/acpi_memhotplug.c
> +++ b/drivers/acpi/acpi_memhotplug.c
> @@ -228,7 +228,7 @@ static int acpi_memory_enable_device(struct 
> acpi_memory_device *mem_device)
> if (node < 0)
> node = memory_add_physaddr_to_nid(info->start_addr);
>
> -   result = add_memory(node, info->start_addr, info->length);
> +   result = __add_memory(node, info->start_addr, info->length);
>
> /*
>  * If the memory block has been used by the kernel, 
> add_memory()
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index 817320c7c4c1..40cac122ec73 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -519,15 +519,20 @@ memory_probe_store(struct device *dev, struct 
> device_attribute *attr,
> if (phys_addr & ((pages_per_block << PAGE_SHIFT) - 1))
> return -EINVAL;
>
> +   ret = lock_device_hotplug_sysfs();
> +   if (ret)
> +   goto out;
> +
> nid = memory_add_physaddr_to_nid(phys_addr);
> -   ret = add_memory(nid, phys_addr,
> -MIN_MEMORY_BLOCK_SIZE * sections_per_block);
> +   ret = __add_memory(nid, phys_addr,
> +  MIN_MEMORY_BLOCK_SIZE * sections_per_block);
>
> if (ret)
> goto out;
>
> ret = count;
>  out:
> +   unlock_device_hotplug();
> return ret;
>  }
>
&

Re: [PATCH v1 1/6] mm/memory_hotplug: make remove_memory() take the device_hotplug_lock

2018-09-18 Thread Rafael J. Wysocki
On Tue, Sep 18, 2018 at 1:48 PM David Hildenbrand  wrote:
>
> remove_memory() is exported right now but requires the
> device_hotplug_lock, which is not exported. So let's provide a variant
> that takes the lock and only export that one.
>
> The lock is already held in
> arch/powerpc/platforms/pseries/hotplug-memory.c
> drivers/acpi/acpi_memhotplug.c
> So, let's use the locked variant.
>
> The lock is not held (but taken in)
> arch/powerpc/platforms/powernv/memtrace.c
> So let's keep using the (now) locked variant.
>
> Apart from that, there are not other users in the tree.
>
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: Michael Ellerman 
> Cc: "Rafael J. Wysocki" 
> Cc: Len Brown 
> Cc: Rashmica Gupta 
> Cc: Michael Neuling 
> Cc: Balbir Singh 
> Cc: Nathan Fontenot 
> Cc: John Allen 
> Cc: Andrew Morton 
> Cc: Michal Hocko 
> Cc: Dan Williams 
> Cc: Joonsoo Kim 
> Cc: Vlastimil Babka 
> Cc: Pavel Tatashin 
> Cc: Greg Kroah-Hartman 
> Cc: Oscar Salvador 
> Cc: YASUAKI ISHIMATSU 
> Cc: Mathieu Malaterre 
> Reviewed-by: Pavel Tatashin 
> Signed-off-by: David Hildenbrand 
> ---
>  arch/powerpc/platforms/powernv/memtrace.c   | 2 --
>  arch/powerpc/platforms/pseries/hotplug-memory.c | 6 +++---
>  drivers/acpi/acpi_memhotplug.c  | 2 +-
>  include/linux/memory_hotplug.h  | 3 ++-
>  mm/memory_hotplug.c | 9 -
>  5 files changed, 14 insertions(+), 8 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/memtrace.c 
> b/arch/powerpc/platforms/powernv/memtrace.c
> index 51dc398ae3f7..8f1cd4f3bfd5 100644
> --- a/arch/powerpc/platforms/powernv/memtrace.c
> +++ b/arch/powerpc/platforms/powernv/memtrace.c
> @@ -90,9 +90,7 @@ static bool memtrace_offline_pages(u32 nid, u64 start_pfn, 
> u64 nr_pages)
> walk_memory_range(start_pfn, end_pfn, (void *)MEM_OFFLINE,
>   change_memblock_state);
>
> -   lock_device_hotplug();
> remove_memory(nid, start_pfn << PAGE_SHIFT, nr_pages << PAGE_SHIFT);
> -   unlock_device_hotplug();
>
> return true;
>  }
> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c 
> b/arch/powerpc/platforms/pseries/hotplug-memory.c
> index c1578f54c626..b3f54466e25f 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
> @@ -334,7 +334,7 @@ static int pseries_remove_memblock(unsigned long base, 
> unsigned int memblock_siz
> nid = memory_add_physaddr_to_nid(base);
>
> for (i = 0; i < sections_per_block; i++) {
> -   remove_memory(nid, base, MIN_MEMORY_BLOCK_SIZE);
> +   __remove_memory(nid, base, MIN_MEMORY_BLOCK_SIZE);
> base += MIN_MEMORY_BLOCK_SIZE;
> }
>
> @@ -423,7 +423,7 @@ static int dlpar_remove_lmb(struct drmem_lmb *lmb)
> block_sz = pseries_memory_block_size();
> nid = memory_add_physaddr_to_nid(lmb->base_addr);
>
> -   remove_memory(nid, lmb->base_addr, block_sz);
> +   __remove_memory(nid, lmb->base_addr, block_sz);
>
> /* Update memory regions for memory remove */
> memblock_remove(lmb->base_addr, block_sz);
> @@ -710,7 +710,7 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
>
> rc = dlpar_online_lmb(lmb);
> if (rc) {
> -   remove_memory(nid, lmb->base_addr, block_sz);
> +   __remove_memory(nid, lmb->base_addr, block_sz);
> dlpar_remove_device_tree_lmb(lmb);
> } else {
> lmb->flags |= DRCONF_MEM_ASSIGNED;
> diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
> index 6b0d3ef7309c..811148415993 100644
> --- a/drivers/acpi/acpi_memhotplug.c
> +++ b/drivers/acpi/acpi_memhotplug.c
> @@ -282,7 +282,7 @@ static void acpi_memory_remove_memory(struct 
> acpi_memory_device *mem_device)
> nid = memory_add_physaddr_to_nid(info->start_addr);
>
> acpi_unbind_memory_blocks(info);
> -   remove_memory(nid, info->start_addr, info->length);
> +   __remove_memory(nid, info->start_addr, info->length);
> list_del(>list);
> kfree(info);
> }
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index 34a28227068d..1f096852f479 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -301,6 +301,7 @@ extern bool is_mem_section_removable(unsigned long pfn, 
> unsigned long nr_pages);
>  extern v

Re: [RFC 08/15] x86: PCI: clean up pcibios_scan_root()

2018-08-20 Thread Rafael J. Wysocki
On Mon, Aug 20, 2018 at 1:17 PM Arnd Bergmann  wrote:
>
> On Mon, Aug 20, 2018 at 10:31 AM Rafael J. Wysocki  wrote:
> > On Fri, Aug 17, 2018 at 12:32 PM Arnd Bergmann  wrote:
>
> > > -static struct pci_bus *pci_scan_root_bus(struct device *parent, int bus,
> > > -   struct pci_ops *ops, void *sysdata, struct list_head 
> > > *resources)
> > > +void pcibios_scan_root(int busnum)
> > >  {
> > > +   struct pci_sysdata *sd;
> > > struct pci_host_bridge *bridge;
> > > int error;
> > >
> > > -   bridge = pci_alloc_host_bridge(0);
> > > -   if (!bridge)
> > > -   return NULL;
> > > +   bridge = pci_alloc_host_bridge(sizeof(sd));
> > > +   if (!bridge) {
> > > +   printk(KERN_ERR "PCI: OOM, skipping PCI bus %02x\n", 
> > > busnum);
> > > +   return;
> > > +   }
> > > +   sd = pci_host_bridge_priv(bridge);
> >
> > This looks fishy, as bridge->private is not set at this point AFAICS,
> > unless one of the previous patches changes that.
>
> bridge->private what comes after the bridge structure, and it's allocated
> by pci_alloc_host_bridge() passing the size of the structure we want
> for this private area.

I see, sorry for the noise.


Re: [RFC 07/15] PCI/ACPI: clean up acpi_pci_root_create()

2018-08-20 Thread Rafael J. Wysocki
On Mon, Aug 20, 2018 at 1:20 PM Arnd Bergmann  wrote:
>
> On Mon, Aug 20, 2018 at 10:23 AM Rafael J. Wysocki  wrote:
> > On Fri, Aug 17, 2018 at 12:33 PM Arnd Bergmann  wrote:
> > > @@ -909,8 +881,7 @@ struct pci_bus *acpi_pci_root_create(struct 
> > > acpi_pci_root *root,
> > > int ret, busnum = root->secondary.start;
> > > struct acpi_device *device = root->device;
> > > int node = acpi_get_node(device->handle);
> > > -   struct pci_bus *bus;
> > > -   struct pci_host_bridge *host_bridge;
> > > +   struct pci_host_bridge *bridge;
> >
> > Why "bridge" and not "host" or even something to stand for "root complex"?
> >
> > Or maybe it can still be "host_bridge"?
>
> I did this for consistency with the naming in drivers/pci/probe.c,
> which always declares the local variable as 'struct pci_host_bridge *bridge'.
> It's easy to change here if you feel strongly about it (I don't).

I would leave host_bridge here.  It would make the patch smaller too I think.


Re: [RFC 08/15] x86: PCI: clean up pcibios_scan_root()

2018-08-20 Thread Rafael J. Wysocki
On Fri, Aug 17, 2018 at 12:32 PM Arnd Bergmann  wrote:
>
> pcibios_scan_root() is now just a wrapper around pci_scan_root_bus(),
> and merging the two into one makes it shorter and more readable.
>
> We can also take advantage of pci_alloc_host_bridge() doing the
> allocation of the sysdata for us, which helps if we ever want to
> allow hot-unplugging the host bridge itself.
>
> We might be able to simplify it further using pci_host_probe(),
> but I wasn't sure about the resource registration there.
>
> Signed-off-by: Arnd Bergmann 
> ---
>  arch/x86/pci/common.c | 53 ++-
>  1 file changed, 17 insertions(+), 36 deletions(-)
>
> diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
> index e740d9aa4024..920d0885434c 100644
> --- a/arch/x86/pci/common.c
> +++ b/arch/x86/pci/common.c
> @@ -453,54 +453,35 @@ void __init dmi_check_pciprobe(void)
> dmi_check_system(pciprobe_dmi_table);
>  }
>
> -static struct pci_bus *pci_scan_root_bus(struct device *parent, int bus,
> -   struct pci_ops *ops, void *sysdata, struct list_head 
> *resources)
> +void pcibios_scan_root(int busnum)
>  {
> +   struct pci_sysdata *sd;
> struct pci_host_bridge *bridge;
> int error;
>
> -   bridge = pci_alloc_host_bridge(0);
> -   if (!bridge)
> -   return NULL;
> +   bridge = pci_alloc_host_bridge(sizeof(sd));
> +   if (!bridge) {
> +   printk(KERN_ERR "PCI: OOM, skipping PCI bus %02x\n", busnum);
> +   return;
> +   }
> +   sd = pci_host_bridge_priv(bridge);

This looks fishy, as bridge->private is not set at this point AFAICS,
unless one of the previous patches changes that.

>
> -   list_splice_init(resources, >windows);
> -   bridge->dev.parent = parent;
> -   bridge->sysdata = sysdata;
> -   bridge->busnr = bus;
> -   bridge->ops = ops;
> +   sd->node = x86_pci_root_bus_node(busnum);
> +   x86_pci_root_bus_resources(busnum, >windows);
> +   bridge->sysdata = sd;
> +   bridge->busnr = busnum;
> +   bridge->ops = _root_ops;
>
> +   printk(KERN_DEBUG "PCI: Probing PCI hardware (bus %02x)\n", busnum);
> error = pci_scan_root_bus_bridge(bridge);
> if (error < 0)
> goto err_out;
>
> -   return bridge->bus;
> +   pci_bus_add_devices(bridge->bus);
> +   return;
>
>  err_out:
> -   kfree(bridge);
> -   return NULL;
> -}
> -
> -void pcibios_scan_root(int busnum)
> -{
> -   struct pci_bus *bus;
> -   struct pci_sysdata *sd;
> -   LIST_HEAD(resources);
> -
> -   sd = kzalloc(sizeof(*sd), GFP_KERNEL);
> -   if (!sd) {
> -   printk(KERN_ERR "PCI: OOM, skipping PCI bus %02x\n", busnum);
> -   return;
> -   }
> -   sd->node = x86_pci_root_bus_node(busnum);
> -   x86_pci_root_bus_resources(busnum, );
> -   printk(KERN_DEBUG "PCI: Probing PCI hardware (bus %02x)\n", busnum);
> -   bus = pci_scan_root_bus(NULL, busnum, _root_ops, sd, );
> -   if (!bus) {
> -   pci_free_resource_list();
> -   kfree(sd);
> -   return;
> -   }
> -   pci_bus_add_devices(bus);
> +   pci_free_host_bridge(bridge);
>  }
>
>  void __init pcibios_set_cache_line_size(void)
> --
> 2.18.0
>


Re: [RFC 07/15] PCI/ACPI: clean up acpi_pci_root_create()

2018-08-20 Thread Rafael J. Wysocki
On Fri, Aug 17, 2018 at 12:33 PM Arnd Bergmann  wrote:
>
> The acpi_pci_create_root_bus() can be fully integrated into
> acpi_pci_root_create(), improving a few things:
>
> * We can call pci_scan_root_bus_bridge(), which registers and
>   scans the bridge in one step.
> * After a failure in pci_register_host_bridge(), we correctly
>   clean up the resources.
> * The bridge settings (release function, flags, operations etc)
>   can get set up before registering the bridge.
> * Further cleanup would be possible, removing duplication between
>   pci_host_bridge and some ACPI structures.
>
> Signed-off-by: Arnd Bergmann 
> ---
>  drivers/acpi/pci_root.c | 68 +++--
>  1 file changed, 24 insertions(+), 44 deletions(-)
>
> diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
> index 85dbcf47015b..5f73de3b67c8 100644
> --- a/drivers/acpi/pci_root.c
> +++ b/drivers/acpi/pci_root.c
> @@ -873,34 +873,6 @@ static void acpi_pci_root_release_info(struct 
> pci_host_bridge *bridge)
> __acpi_pci_root_release_info(bridge->release_data);
>  }
>
> -static struct pci_bus *acpi_pci_create_root_bus(struct device *parent, int 
> bus,
> -   struct pci_ops *ops, void *sysdata, struct list_head 
> *resources)
> -{
> -   int error;
> -   struct pci_host_bridge *bridge;
> -
> -   bridge = pci_alloc_host_bridge(0);
> -   if (!bridge)
> -   return NULL;
> -
> -   bridge->dev.parent = parent;
> -
> -   list_splice_init(resources, >windows);
> -   bridge->sysdata = sysdata;
> -   bridge->busnr = bus;
> -   bridge->ops = ops;
> -
> -   error = pci_register_host_bridge(bridge);
> -   if (error < 0)
> -   goto err_out;
> -
> -   return bridge->bus;
> -
> -err_out:
> -   kfree(bridge);
> -   return NULL;
> -}
> -
>  struct pci_bus *acpi_pci_root_create(struct acpi_pci_root *root,
>  struct acpi_pci_root_ops *ops,
>  struct acpi_pci_root_info *info,
> @@ -909,8 +881,7 @@ struct pci_bus *acpi_pci_root_create(struct acpi_pci_root 
> *root,
> int ret, busnum = root->secondary.start;
> struct acpi_device *device = root->device;
> int node = acpi_get_node(device->handle);
> -   struct pci_bus *bus;
> -   struct pci_host_bridge *host_bridge;
> +   struct pci_host_bridge *bridge;

Why "bridge" and not "host" or even something to stand for "root complex"?

Or maybe it can still be "host_bridge"?

>
> info->root = root;
> info->bridge = device;
> @@ -930,30 +901,39 @@ struct pci_bus *acpi_pci_root_create(struct 
> acpi_pci_root *root,
>
> pci_acpi_root_add_resources(info);
> pci_add_resource(>resources, >secondary);
> -   bus = acpi_pci_create_root_bus(NULL, busnum, ops->pci_ops,
> - sysdata, >resources);
> -   if (!bus)
> +
> +   bridge = pci_alloc_host_bridge(0);
> +   if (!bridge)
> goto out_release_info;
>
> -   host_bridge = to_pci_host_bridge(bus->bridge);
> +   list_splice_init(>resources, >windows);
> +   bridge->sysdata = sysdata;
> +   bridge->busnr = busnum;
> +   bridge->ops = ops->pci_ops;
> +   pci_set_host_bridge_release(bridge, acpi_pci_root_release_info,
> +   info);
> +
> if (!(root->osc_control_set & OSC_PCI_EXPRESS_NATIVE_HP_CONTROL))
> -   host_bridge->native_pcie_hotplug = 0;
> +   bridge->native_pcie_hotplug = 0;
> if (!(root->osc_control_set & OSC_PCI_SHPC_NATIVE_HP_CONTROL))
> -   host_bridge->native_shpc_hotplug = 0;
> +   bridge->native_shpc_hotplug = 0;
> if (!(root->osc_control_set & OSC_PCI_EXPRESS_AER_CONTROL))
> -   host_bridge->native_aer = 0;
> +   bridge->native_aer = 0;
> if (!(root->osc_control_set & OSC_PCI_EXPRESS_PME_CONTROL))
> -   host_bridge->native_pme = 0;
> +   bridge->native_pme = 0;
> if (!(root->osc_control_set & OSC_PCI_EXPRESS_LTR_CONTROL))
> -   host_bridge->native_ltr = 0;
> +   bridge->native_ltr = 0;
> +
> +   ret = pci_scan_root_bus_bridge(bridge);
> +   if (ret < 0)
> +   goto out_release_bridge;
>
> -   pci_scan_child_bus(bus);
> -   pci_set_host_bridge_release(host_bridge, acpi_pci_root_release_info,
> -   info);
> if (node != NUMA_NO_NODE)
> -   dev_printk(KERN_DEBUG, >dev, "on NUMA node %d\n", node);
> -   return bus;
> +   dev_printk(KERN_DEBUG, >bus->dev, "on NUMA node 
> %d\n", node);
> +   return bridge->bus;
>
> +out_release_bridge:
> +   pci_free_host_bridge(bridge);
>  out_release_info:
> __acpi_pci_root_release_info(info);
> return NULL;
> --
> 2.18.0
>


Re: [PATCH 8/9] PCI: hotplug: Embed hotplug_slot

2018-08-20 Thread Rafael J. Wysocki
On Sun, Aug 19, 2018 at 4:46 PM Lukas Wunner  wrote:
>
> When the PCI hotplug core and its first user, cpqphp, were introduced in
> February 2002 with historic commit a8a2069f432c, cpqphp allocated a slot
> struct for its internal use plus a hotplug_slot struct to be registered
> with the hotplug core and linked the two with pointers:
> https://git.kernel.org/tglx/history/c/a8a2069f432c
>
> Nowadays, the predominant pattern in the tree is to embed ("subclass")
> such structures in one another and cast to the containing struct with
> container_of().  But it wasn't until July 2002 that container_of() was
> introduced with historic commit ec4f214232cf:
> https://git.kernel.org/tglx/history/c/ec4f214232cf
>
> pnv_php, introduced in 2016, did the right thing and embedded struct
> hotplug_slot in its internal struct pnv_php_slot, but all other drivers
> cargo-culted cpqphp's design and linked separate structs with pointers.
>
> Embedding structs is preferrable to linking them with pointers because
> it requires fewer allocations, thereby reducing overhead and simplifying
> error paths.  Casting an embedded struct to the containing struct
> becomes a cheap subtraction rather than a dereference.  And having fewer
> pointers reduces the risk of them pointing nowhere either accidentally
> or due to an attack.
>
> Convert all drivers to embed struct hotplug_slot in their internal slot
> struct.  The "private" pointer in struct hotplug_slot thereby becomes
> unused, so drop it.
>
> Signed-off-by: Lukas Wunner 
> Cc: Rafael J. Wysocki 
> Cc: Len Brown 
> Cc: Scott Murray 
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: Michael Ellerman 
> Cc: Gavin Shan 
> Cc: Sebastian Ott 
> Cc: Gerald Schaefer 
> Cc: Corentin Chary 
> Cc: Darren Hart 
> Cc: Andy Shevchenko 
> ---
>  drivers/pci/hotplug/acpiphp.h   |  9 ++-
>  drivers/pci/hotplug/acpiphp_core.c  | 28 +++-
>  drivers/pci/hotplug/acpiphp_ibm.c   |  2 +-
>  drivers/pci/hotplug/cpci_hotplug.h  |  9 ++-
>  drivers/pci/hotplug/cpci_hotplug_core.c | 37 --
>  drivers/pci/hotplug/cpci_hotplug_pci.c  |  6 +-
>  drivers/pci/hotplug/cpqphp.h|  9 ++-
>  drivers/pci/hotplug/cpqphp_core.c   | 37 --
>  drivers/pci/hotplug/cpqphp_ctrl.c   |  2 -
>  drivers/pci/hotplug/ibmphp.h|  7 +-
>  drivers/pci/hotplug/ibmphp_core.c   | 92 +++--
>  drivers/pci/hotplug/ibmphp_ebda.c   | 37 +++---
>  drivers/pci/hotplug/pciehp.h| 11 ++-
>  drivers/pci/hotplug/pciehp_core.c   | 37 --
>  drivers/pci/hotplug/pciehp_ctrl.c   |  4 +-
>  drivers/pci/hotplug/pciehp_hpc.c|  8 +--
>  drivers/pci/hotplug/pnv_php.c   |  9 ++-
>  drivers/pci/hotplug/rpaphp.h|  7 +-
>  drivers/pci/hotplug/rpaphp_core.c   | 14 ++--
>  drivers/pci/hotplug/rpaphp_slot.c   | 15 ++--
>  drivers/pci/hotplug/s390_pci_hpc.c  | 30 
>  drivers/pci/hotplug/sgi_hotplug.c   | 52 ++
>  drivers/pci/hotplug/shpchp.h|  6 +-
>  drivers/pci/hotplug/shpchp_core.c   | 17 ++---
>  drivers/platform/x86/asus-wmi.c     | 26 +++
>  drivers/platform/x86/eeepc-laptop.c | 30 
>  include/linux/pci_hotplug.h |  3 -
>  27 files changed, 223 insertions(+), 321 deletions(-)

Good cleanup.

Reviewed-by: Rafael J. Wysocki 


Re: [PATCH 7/9] PCI: hotplug: Drop hotplug_slot_info

2018-08-20 Thread Rafael J. Wysocki
On Sun, Aug 19, 2018 at 4:43 PM Lukas Wunner  wrote:
>
> Ever since the PCI hotplug core was introduced in 2002, drivers had to
> allocate and register a struct hotplug_slot_info for every slot:
> https://git.kernel.org/tglx/history/c/a8a2069f432c
>
> Apparently the idea was that drivers furnish the hotplug core with an
> up-to-date card presence status, power status, latch status and
> attention indicator status as well as notify the hotplug core of changes
> thereof.  However only 4 out of 12 hotplug drivers bother to notify the
> hotplug core with pci_hp_change_slot_info() and the hotplug core never
> made any use of the information:  There is just a single macro in
> pci_hotplug_core.c, GET_STATUS(), which uses the hotplug_slot_info if
> the driver lacks the corresponding callback in hotplug_slot_ops.  The
> macro is called when the user reads the attribute via sysfs.
>
> Now, if the callback isn't defined, the attribute isn't exposed in sysfs
> in the first place (see e.g. has_power_file()).  There are only two
> situations when the hotplug_slot_info would actually be accessed:
>
> * If the driver defines ->enable_slot or ->disable_slot but not
>   ->get_power_status.
>
> * If the driver defines ->set_attention_status but not
>   ->get_attention_status.
>
> There is no driver doing the former and just a single driver doing the
> latter, namely pnv_php.c.  Amend it with a ->get_attention_status
> callback.  With that, the hotplug_slot_info becomes completely unused by
> the PCI hotplug core.  But a few drivers use it internally as a cache:
>
> cpcihp uses it to cache the latch_status and adapter_status.
> cpqhp uses it to cache the adapter_status.
> pnv_php and rpaphp use it to cache the attention_status.
> shpchp uses it to cache all four values.
>
> Amend these drivers to cache the information in their private slot
> struct.  shpchp's slot struct already contains members to cache the
> power_status and adapter_status, so additional members are only needed
> for the other two values.  In the case of cpqphp, the cached value is
> only accessed in a single place, so instead of caching it, read the
> current value from the hardware.
>
> Caution:  acpiphp, cpci, cpqhp, shpchp, asus-wmi and eeepc-laptop
> populate the hotplug_slot_info with initial values on probe.  That code
> is herewith removed.  There is a theoretical chance that the code has
> side effects without which the driver fails to function, e.g. if the
> ACPI method to read the adapter status needs to be executed at least
> once on probe.  That seems unlikely to me, still maintainers should
> review the changes carefully for this possibility.

I'm not aware of any case in which it will break anything, so

Reviewed-by: Rafael J. Wysocki 

but if that happens, it may be necessary to add the execution of the
control methods in question directly to the initialization part.


Re: [PATCH 6/9] PCI: hotplug: Constify hotplug_slot_ops

2018-08-20 Thread Rafael J. Wysocki
On Sun, Aug 19, 2018 at 4:36 PM Lukas Wunner  wrote:
>
> Hotplug drivers cannot declare their hotplug_slot_ops const, making them
> attractive targets for attackers, because upon registration of a hotplug
> slot, __pci_hp_initialize() writes to the "owner" and "mod_name" members
> in that struct.
>
> Fix by moving these members to struct hotplug_slot and constify every
> driver's hotplug_slot_ops except for pciehp.
>
> pciehp constructs its hotplug_slot_ops at runtime based on the PCIe
> port's capabilities, hence cannot declare them const.  It can be
> converted to __write_rarely once that's mainlined:
> http://www.openwall.com/lists/kernel-hardening/2016/11/16/3
>
> Signed-off-by: Lukas Wunner 
> Cc: Rafael J. Wysocki 
> Cc: Len Brown 
> Cc: Scott Murray 
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: Michael Ellerman 
> Cc: Gavin Shan 
> Cc: Sebastian Ott 
> Cc: Gerald Schaefer 
> Cc: Corentin Chary 
> Cc: Darren Hart 
> Cc: Andy Shevchenko 
> ---
>  drivers/pci/hotplug/acpiphp_core.c  |  2 +-
>  drivers/pci/hotplug/cpci_hotplug_core.c |  2 +-
>  drivers/pci/hotplug/cpqphp_core.c   |  2 +-
>  drivers/pci/hotplug/ibmphp.h|  2 +-
>  drivers/pci/hotplug/ibmphp_core.c   |  2 +-
>  drivers/pci/hotplug/pci_hotplug_core.c  | 27 +
>  drivers/pci/hotplug/pnv_php.c   |  2 +-
>  drivers/pci/hotplug/rpaphp.h|  2 +-
>  drivers/pci/hotplug/rpaphp_core.c   |  2 +-
>  drivers/pci/hotplug/s390_pci_hpc.c  |  2 +-
>  drivers/pci/hotplug/sgi_hotplug.c   |  2 +-
>  drivers/pci/hotplug/shpchp_core.c   |  2 +-
>  drivers/pci/pci.c   |  4 ++--
>  drivers/pci/slot.c  |  2 +-
>  drivers/platform/x86/asus-wmi.c |  3 +--
>  drivers/platform/x86/eeepc-laptop.c     |  3 +--
>  include/linux/pci_hotplug.h | 10 -
>  17 files changed, 35 insertions(+), 36 deletions(-)

Nice!

Reviewed-by: Rafael J. Wysocki 


Re: [PATCH RFC 1/2] drivers/base: export lock_device_hotplug/unlock_device_hotplug

2018-08-17 Thread Rafael J. Wysocki
On Fri, Aug 17, 2018 at 10:56 AM David Hildenbrand  wrote:
>
> On 17.08.2018 10:41, Greg Kroah-Hartman wrote:
> > On Fri, Aug 17, 2018 at 09:59:00AM +0200, David Hildenbrand wrote:
> >> From: Vitaly Kuznetsov 
> >>
> >> Well require to call add_memory()/add_memory_resource() with
> >> device_hotplug_lock held, to avoid a lock inversion. Allow external modules
> >> (e.g. hv_balloon) that make use of add_memory()/add_memory_resource() to
> >> lock device hotplug.
> >>
> >> Signed-off-by: Vitaly Kuznetsov 
> >> [modify patch description]
> >> Signed-off-by: David Hildenbrand 
> >> ---
> >>  drivers/base/core.c | 2 ++
> >>  1 file changed, 2 insertions(+)
> >>
> >> diff --git a/drivers/base/core.c b/drivers/base/core.c
> >> index 04bbcd779e11..9010b9e942b5 100644
> >> --- a/drivers/base/core.c
> >> +++ b/drivers/base/core.c
> >> @@ -700,11 +700,13 @@ void lock_device_hotplug(void)
> >>  {
> >>  mutex_lock(_hotplug_lock);
> >>  }
> >> +EXPORT_SYMBOL_GPL(lock_device_hotplug);
> >>
> >>  void unlock_device_hotplug(void)
> >>  {
> >>  mutex_unlock(_hotplug_lock);
> >>  }
> >> +EXPORT_SYMBOL_GPL(unlock_device_hotplug);
> >
> > If these are going to be "global" symbols, let's properly name them.
> > device_hotplug_lock/unlock would be better.  But I am _really_ nervous
> > about letting stuff outside of the driver core mess with this, as people
> > better know what they are doing.
>
> The only "problem" is that we have kernel modules (for paravirtualized
> devices) that call add_memory(). This is Hyper-V right now, but we might
> have other ones in the future. Without them we would not have to export
> it. We might also get kernel modules that want to call remove_memory() -
> which will require the device_hotplug_lock as of now.
>
> What we could do is
>
> a) add_memory() -> _add_memory() and don't export it
> b) add_memory() takes the device_hotplug_lock and calls _add_memory() .
> We export that one.
> c) Use add_memory() in external modules only
>
> Similar wrapper would be needed e.g. for remove_memory() later on.

That would be safer IMO, as it would prevent developers from using
add_memory() without the lock, say.

If the lock is always going to be required for add_memory(), make it
hard (or event impossible) to use the latter without it.


Re: [PATCH RFC 1/2] drivers/base: export lock_device_hotplug/unlock_device_hotplug

2018-08-17 Thread Rafael J. Wysocki
On Fri, Aug 17, 2018 at 10:41 AM Greg Kroah-Hartman
 wrote:
>
> On Fri, Aug 17, 2018 at 09:59:00AM +0200, David Hildenbrand wrote:
> > From: Vitaly Kuznetsov 
> >
> > Well require to call add_memory()/add_memory_resource() with
> > device_hotplug_lock held, to avoid a lock inversion. Allow external modules
> > (e.g. hv_balloon) that make use of add_memory()/add_memory_resource() to
> > lock device hotplug.
> >
> > Signed-off-by: Vitaly Kuznetsov 
> > [modify patch description]
> > Signed-off-by: David Hildenbrand 
> > ---
> >  drivers/base/core.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/base/core.c b/drivers/base/core.c
> > index 04bbcd779e11..9010b9e942b5 100644
> > --- a/drivers/base/core.c
> > +++ b/drivers/base/core.c
> > @@ -700,11 +700,13 @@ void lock_device_hotplug(void)
> >  {
> >   mutex_lock(_hotplug_lock);
> >  }
> > +EXPORT_SYMBOL_GPL(lock_device_hotplug);
> >
> >  void unlock_device_hotplug(void)
> >  {
> >   mutex_unlock(_hotplug_lock);
> >  }
> > +EXPORT_SYMBOL_GPL(unlock_device_hotplug);
>
> If these are going to be "global" symbols, let's properly name them.
> device_hotplug_lock/unlock would be better.

Well, device_hotplug_lock is the name of the lock itself. :-)

> But I am _really_ nervous about letting stuff outside of the driver core mess
> with this, as people better know what they are doing.
>
> Can't we just "lock" the memory stuff instead?  Why does the entirety of
> the driver core need to be messed with here?

Because, in general, memory hotplug and hotplug of devices are not
independent.  CPUs and memory may only be possible to take away
together and that may be the case for other devices too (say, it
wouldn't be a good idea to access a memory block that has just gone
away from a device, for DMA and the like).  That's why
device_hotplug_lock was introduced in the first place.

That said I agree that exporting this to drivers doesn't feel particularly safe.


Re: [PATCH RFC 2/2] mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock

2018-08-17 Thread Rafael J. Wysocki
On Fri, Aug 17, 2018 at 9:59 AM David Hildenbrand  wrote:
>
> There seem to be some problems as result of 30467e0b3be ("mm, hotplug:
> fix concurrent memory hot-add deadlock"), which tried to fix a possible
> lock inversion reported and discussed in [1] due to the two locks
> a) device_lock()
> b) mem_hotplug_lock
>
> While add_memory() first takes b), followed by a) during
> bus_probe_device(), onlining of memory from user space first took b),
> followed by a), exposing a possible deadlock.
>
> In [1], and it was decided to not make use of device_hotplug_lock, but
> rather to enforce a locking order. Looking at 1., this order is not always
> satisfied when calling device_online() - essentially we simply don't take
> one of both locks anymore - and fixing this would require us to
> take the mem_hotplug_lock in core driver code (online_store()), which
> sounds wrong.
>
> The problems I spotted related to this:
>
> 1. Memory block device attributes: While .state first calls
>mem_hotplug_begin() and the calls device_online() - which takes
>device_lock() - .online does no longer call mem_hotplug_begin(), so
>effectively calls online_pages() without mem_hotplug_lock. onlining/
>offlining of pages is no longer serialised across different devices.
>
> 2. device_online() should be called under device_hotplug_lock, however
>onlining memory during add_memory() does not take care of that. (I
>didn't follow how strictly this is needed, but there seems to be a
>reason because it is documented at device_online() and
>device_offline()).
>
> In addition, I think there is also something wrong about the locking in
>
> 3. arch/powerpc/platforms/powernv/memtrace.c calls offline_pages()
>(and device_online()) without locks. This was introduced after
>30467e0b3be. And skimming over the code, I assume it could need some
>more care in regards to locking.
>
> ACPI code already holds the device_hotplug_lock, and as we are
> effectively hotplugging memory block devices, requiring to hold that
> lock does not sound too wrong, although not chosen in [1], as
> "I don't think resolving a locking dependency is appropriate by
>  just serializing them with another lock."
> I think this is the cleanest solution.
>
> Requiring add_memory()/add_memory_resource() to be called under
> device_hotplug_lock fixes 2., taking the mem_hotplug_lock in
> online_pages/offline_pages() fixes 1. and 3.
>
> Fixup all callers of add_memory/add_memory_resource to hold the lock if
> not already done.
>
> So this is essentially a revert of 30467e0b3be, implementation of what
> was suggested in [1] by Vitaly, applied to the current tree.
>
> [1] http://driverdev.linuxdriverproject.org/pipermail/ driverdev-devel/
> 2015-February/065324.html
>
> This patch is partly based on a patch by Vitaly Kuznetsov.
>
> Signed-off-by: David Hildenbrand 
> ---
>  arch/powerpc/platforms/powernv/memtrace.c |  3 ++
>  drivers/acpi/acpi_memhotplug.c|  1 +
>  drivers/base/memory.c | 18 +-
>  drivers/hv/hv_balloon.c   |  4 +++
>  drivers/s390/char/sclp_cmd.c  |  3 ++
>  drivers/xen/balloon.c |  3 ++
>  mm/memory_hotplug.c   | 42 ++-
>  7 files changed, 55 insertions(+), 19 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/memtrace.c 
> b/arch/powerpc/platforms/powernv/memtrace.c
> index 51dc398ae3f7..4c2737a33020 100644
> --- a/arch/powerpc/platforms/powernv/memtrace.c
> +++ b/arch/powerpc/platforms/powernv/memtrace.c
> @@ -206,6 +206,8 @@ static int memtrace_online(void)
> int i, ret = 0;
> struct memtrace_entry *ent;
>
> +   /* add_memory() requires device_hotplug_lock */
> +   lock_device_hotplug();
> for (i = memtrace_array_nr - 1; i >= 0; i--) {
> ent = _array[i];
>
> @@ -244,6 +246,7 @@ static int memtrace_online(void)
> pr_info("Added trace memory back to node %d\n", ent->nid);
> ent->size = ent->start = ent->nid = -1;
> }
> +   unlock_device_hotplug();
> if (ret)
> return ret;
>
> diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
> index 6b0d3ef7309c..e7a4c7900967 100644
> --- a/drivers/acpi/acpi_memhotplug.c
> +++ b/drivers/acpi/acpi_memhotplug.c
> @@ -228,6 +228,7 @@ static int acpi_memory_enable_device(struct 
> acpi_memory_device *mem_device)
> if (node < 0)
> node = memory_add_physaddr_to_nid(info->start_addr);
>
> +   /* we already hold the device_hotplug lock at this point */
> result = add_memory(node, info->start_addr, info->length);
>
> /*

A very minor nit here: I would say "device_hotplug_lock is already
held at this point" in the comment (I sort of don't like to say "we"
in code comments as it is not particularly clear what 

Re: [PATCHv3 0/4] drivers/base: bugfix for supplier<-consumer ordering in device_kset

2018-07-09 Thread Rafael J. Wysocki
On Mon, Jul 9, 2018 at 10:40 AM, Pingfan Liu  wrote:
> On Mon, Jul 9, 2018 at 3:48 PM Rafael J. Wysocki  wrote:
>>
>> On Mon, Jul 9, 2018 at 8:48 AM, Pingfan Liu  wrote:
>> > On Sun, Jul 8, 2018 at 4:25 PM Rafael J. Wysocki  wrote:

[cut]

>>
>> I simply think that there should be one way to iterate over devices
>> for both system-wide PM and shutdown.
>>
>> The reason why it is not like that today is because of the development
>> history, but if it doesn't work and we want to fix it, let's just
>> consolidate all of that.
>>
>> Now, system-wide suspend resume sometimes iterates the list in the
>> reverse order which would be hard without having a list, wouldn't it?
>>
> Yes, it would be hard without having a list. I just thought to use
> device tree info to build up a shadowed list, and rebuild the list
> until there is new device_link_add() operation. For
> device_add/_remove(), it can modify the shadowed list directly.

Right, and that's the idea of dpm_list, generally speaking: It
represents one of the (possibly many) orders in which devices can be
suspended (or shut down) based on the information coming from the
device hierarchy and device links.

So it appears straightforward (even though it may be complicated
because of the build-time dependencies) to start using dpm_list for
shutdown too - and to ensure that it is properly maintained
everywhere.

Thanks,
Rafael


Re: [PATCHv3 0/4] drivers/base: bugfix for supplier<-consumer ordering in device_kset

2018-07-09 Thread Rafael J. Wysocki
On Mon, Jul 9, 2018 at 8:48 AM, Pingfan Liu  wrote:
> On Sun, Jul 8, 2018 at 4:25 PM Rafael J. Wysocki  wrote:
>>
>> On Sat, Jul 7, 2018 at 6:24 AM, Pingfan Liu  wrote:
>> > On Fri, Jul 6, 2018 at 9:55 PM Pingfan Liu  wrote:
>> >>
>> >> On Fri, Jul 6, 2018 at 4:47 PM Rafael J. Wysocki  
>> >> wrote:
>> >> >
>> >> > On Fri, Jul 6, 2018 at 10:36 AM, Lukas Wunner  wrote:
>> >> > > [cc += Kishon Vijay Abraham]
>> >> > >
>> >> > > On Thu, Jul 05, 2018 at 11:18:28AM +0200, Rafael J. Wysocki wrote:
>> >> > >> OK, so calling devices_kset_move_last() from really_probe() clearly 
>> >> > >> is
>> >> > >> a mistake.
>> >> > >>
>> >> > >> I'm not really sure what the intention of it was as the changelog of
>> >> > >> commit 52cdbdd49853d doesn't really explain that (why would it be
>> >> > >> insufficient without that change?)
>> >> > >
>> >> > > It seems 52cdbdd49853d fixed an issue with boards which have an MMC
>> >> > > whose reset pin needs to be driven high on shutdown, lest the MMC
>> >> > > won't be found on the next boot.
>> >> > >
>> >> > > The boards' devicetrees use a kludge wherein the reset pin is modelled
>> >> > > as a regulator.  The regulator is enabled when the MMC probes and
>> >> > > disabled on driver unbind and shutdown.  As a result, the pin is 
>> >> > > driven
>> >> > > low on shutdown and the MMC is not found on the next boot.
>> >> > >
>> >> > > To fix this, another kludge was invented wherein the GPIO expander
>> >> > > driving the reset pin unconditionally drives all its pins high on
>> >> > > shutdown, see pcf857x_shutdown() in drivers/gpio/gpio-pcf857x.c
>> >> > > (commit adc284755055, "gpio: pcf857x: restore the initial line state
>> >> > > of all pcf lines").
>> >> > >
>> >> > > For this kludge to work, the GPIO expander's ->shutdown hook needs to
>> >> > > be executed after the MMC expander's ->shutdown hook.
>> >> > >
>> >> > > Commit 52cdbdd49853d achieved that by reordering devices_kset 
>> >> > > according
>> >> > > to the probe order.  Apparently the MMC probes after the GPIO 
>> >> > > expander,
>> >> > > possibly because it returns -EPROBE_DEFER if the vmmc regulator isn't
>> >> > > available yet, see mmc_regulator_get_supply().
>> >> > >
>> >> > > Note, I'm just piecing the information together from git history,
>> >> > > I'm not responsible for these kludges.  (I'm innocent!)
>> >> >
>> >> > Sure enough. :-)
>> >> >
>> >> > In any case, calling devices_kset_move_last() in really_probe() is
>> >> > plain broken and if its only purpose was to address a single, arguably
>> >> > kludgy, use case, let's just get rid of it in the first place IMO.
>> >> >
>> >> Yes, if it is only used for a single use case.
>> >>
>> > Think it again, I saw other potential issue with the current code.
>> > device_link_add->device_reorder_to_tail() can break the
>> > "supplier<-consumer" order. During moving children after parent's
>> > supplier, it ignores the order of child's consumer.
>>
>> What do you mean?
>>
> The drivers use device_link_add() to build "supplier<-consumer" order
> without knowing each other. Hence there is the following potential
> odds: (consumerX, child_a, ...) (consumer_a,..) (supplierX), where
> consumer_a consumes child_a.

Well, what's the initial state of the list?

> When device_link_add()->device_reorder_to_tail() moves all descendant of
> consumerX to the tail, it breaks the "supplier<-consumer" order by
> "consumer_a <- child_a".

That depends on what the initial ordering of the list is and please
note that circular dependencies are explicitly assumed to be not
present.

The assumption is that the initial ordering of the list reflects the
correct suspend (or shutdown) order without the new link.  Therefore
initially all children are located after their parents and all known
consumers are located after their suppliers.

If a new link is added, the new consumer goes to the end of the list
and all o

Re: [PATCHv3 0/4] drivers/base: bugfix for supplier<-consumer ordering in device_kset

2018-07-08 Thread Rafael J. Wysocki
On Sat, Jul 7, 2018 at 6:24 AM, Pingfan Liu  wrote:
> On Fri, Jul 6, 2018 at 9:55 PM Pingfan Liu  wrote:
>>
>> On Fri, Jul 6, 2018 at 4:47 PM Rafael J. Wysocki  wrote:
>> >
>> > On Fri, Jul 6, 2018 at 10:36 AM, Lukas Wunner  wrote:
>> > > [cc += Kishon Vijay Abraham]
>> > >
>> > > On Thu, Jul 05, 2018 at 11:18:28AM +0200, Rafael J. Wysocki wrote:
>> > >> OK, so calling devices_kset_move_last() from really_probe() clearly is
>> > >> a mistake.
>> > >>
>> > >> I'm not really sure what the intention of it was as the changelog of
>> > >> commit 52cdbdd49853d doesn't really explain that (why would it be
>> > >> insufficient without that change?)
>> > >
>> > > It seems 52cdbdd49853d fixed an issue with boards which have an MMC
>> > > whose reset pin needs to be driven high on shutdown, lest the MMC
>> > > won't be found on the next boot.
>> > >
>> > > The boards' devicetrees use a kludge wherein the reset pin is modelled
>> > > as a regulator.  The regulator is enabled when the MMC probes and
>> > > disabled on driver unbind and shutdown.  As a result, the pin is driven
>> > > low on shutdown and the MMC is not found on the next boot.
>> > >
>> > > To fix this, another kludge was invented wherein the GPIO expander
>> > > driving the reset pin unconditionally drives all its pins high on
>> > > shutdown, see pcf857x_shutdown() in drivers/gpio/gpio-pcf857x.c
>> > > (commit adc284755055, "gpio: pcf857x: restore the initial line state
>> > > of all pcf lines").
>> > >
>> > > For this kludge to work, the GPIO expander's ->shutdown hook needs to
>> > > be executed after the MMC expander's ->shutdown hook.
>> > >
>> > > Commit 52cdbdd49853d achieved that by reordering devices_kset according
>> > > to the probe order.  Apparently the MMC probes after the GPIO expander,
>> > > possibly because it returns -EPROBE_DEFER if the vmmc regulator isn't
>> > > available yet, see mmc_regulator_get_supply().
>> > >
>> > > Note, I'm just piecing the information together from git history,
>> > > I'm not responsible for these kludges.  (I'm innocent!)
>> >
>> > Sure enough. :-)
>> >
>> > In any case, calling devices_kset_move_last() in really_probe() is
>> > plain broken and if its only purpose was to address a single, arguably
>> > kludgy, use case, let's just get rid of it in the first place IMO.
>> >
>> Yes, if it is only used for a single use case.
>>
> Think it again, I saw other potential issue with the current code.
> device_link_add->device_reorder_to_tail() can break the
> "supplier<-consumer" order. During moving children after parent's
> supplier, it ignores the order of child's consumer.

What do you mean?

> Beside this, essentially both devices_kset_move_after/_before() and
> device_pm_move_after/_before() expose  the shutdown order to the
> indirect caller,  and we can not expect that the caller can not handle
> it correctly. It should be a job of drivers core.

Arguably so, but that's how those functions were designed and the
callers should be aware of the limitation.

If they aren't, there is a bug in the caller.

> It is hard to extract high dimension info and pack them into one dimension
> linked-list.

Well, yes and no.

We know it for a fact that there is a linear ordering that will work.
It is inefficient to figure it out every time during system suspend
and resume, for one and that's why we have dpm_list.

Now, if we have it for suspend and resume, it can also be used for shutdown.

> And in theory, it is warranted that the shutdown seq is
> correct by using device tree info. More important, it is cheap with
> the data structure in hand. So I think it is time to resolve the issue
> once for all.

Not the way you want to do that, though.

Thanks,
Rafael


Re: [PATCHv3 2/4] drivers/base: utilize device tree info to shutdown devices

2018-07-06 Thread Rafael J. Wysocki
On Friday, July 6, 2018 5:02:15 AM CEST Pingfan Liu wrote:
> On Thu, Jul 5, 2018 at 6:13 PM Rafael J. Wysocki  wrote:
> >
> > On Tuesday, July 3, 2018 8:50:40 AM CEST Pingfan Liu wrote:
> > > commit 52cdbdd49853 ("driver core: correct device's shutdown order")
> > > places an assumption of supplier<-consumer order on the process of probe.
> > > But it turns out to break down the parent <- child order in some scene.
> > > E.g in pci, a bridge is enabled by pci core, and behind it, the devices
> > > have been probed. Then comes the bridge's module, which enables extra
> > > feature(such as hotplug) on this bridge. This will break the
> > > parent<-children order and cause failure when "kexec -e" in some scenario.
> > >
> > > The detailed description of the scenario:
> > > An IBM Power9 machine on which, two drivers portdrv_pci and shpchp(a mod)
> > > match the PCI_CLASS_BRIDGE_PCI, but neither of them success to probe due
> > > to some issue. For this case, the bridge is moved after its children in
> > > devices_kset. Then, when "kexec -e", a ata-disk behind the bridge can not
> > > write back buffer in flight due to the former shutdown of the bridge which
> > > clears the BusMaster bit.
> > >
> > > It is a little hard to impose both "parent<-child" and 
> > > "supplier<-consumer"
> > > order on devices_kset. Take the following scene:
> > > step0: before a consumer's probing, (note child_a is supplier of 
> > > consumer_a)
> > >   [ consumer-X, child_a, , child_z] [... consumer_a, ..., consumer_z, 
> > > ...] supplier-X
> > >  ^^ affected range 
> > > ^^
> > > step1: when probing, moving consumer-X after supplier-X
> > >   [ child_a, , child_z] [ consumer_a, ..., consumer_z, ...] 
> > > supplier-X, consumer-X
> > > step2: the children of consumer-X should be re-ordered to maintain the seq
> > >   [... consumer_a, ..., consumer_z, ] supplier-X  [consumer-X, 
> > > child_a, , child_z]
> > > step3: the consumer_a should be re-ordered to maintain the seq
> > >   [... consumer_z, ...] supplier-X [ consumer-X, child_a, consumer_a ..., 
> > > child_z]
> > >
> > > It requires two nested recursion to drain out all out-of-order item in
> > > "affected range". To avoid such complicated code, this patch suggests
> > > to utilize the info in device tree, instead of using the order of
> > > devices_kset during shutdown. It iterates the device tree, and firstly
> > > shutdown a device's children and consumers. After this patch, the buggy
> > > commit is hollow and left to clean.
> > >
> > > Cc: Greg Kroah-Hartman 
> > > Cc: Rafael J. Wysocki 
> > > Cc: Grygorii Strashko 
> > > Cc: Christoph Hellwig 
> > > Cc: Bjorn Helgaas 
> > > Cc: Dave Young 
> > > Cc: linux-...@vger.kernel.org
> > > Cc: linuxppc-dev@lists.ozlabs.org
> > > Signed-off-by: Pingfan Liu 
> > > ---
> > >  drivers/base/core.c| 48 
> > > +++-
> > >  include/linux/device.h |  1 +
> > >  2 files changed, 44 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/drivers/base/core.c b/drivers/base/core.c
> > > index a48868f..684b994 100644
> > > --- a/drivers/base/core.c
> > > +++ b/drivers/base/core.c
> > > @@ -1446,6 +1446,7 @@ void device_initialize(struct device *dev)
> > >   INIT_LIST_HEAD(>links.consumers);
> > >   INIT_LIST_HEAD(>links.suppliers);
> > >   dev->links.status = DL_DEV_NO_DRIVER;
> > > + dev->shutdown = false;
> > >  }
> > >  EXPORT_SYMBOL_GPL(device_initialize);
> > >
> > > @@ -2811,7 +2812,6 @@ static void __device_shutdown(struct device *dev)
> > >* lock is to be held
> > >*/
> > >   parent = get_device(dev->parent);
> > > - get_device(dev);
> >
> > Why is the get_/put_device() not needed any more?
> >
> They are moved upper layer into device_for_each_child_shutdown().
> Since there is lock breakage in __device_shutdown(), resorting to
> ref++ to protect the ancestor.  And I think the
> get_device(dev->parent) can be deleted either.

Wouldn't that break USB?

> > >   /*
> > >* Make sure the device is off the kset list, in the
> > >* event that

Re: [PATCHv3 0/4] drivers/base: bugfix for supplier<-consumer ordering in device_kset

2018-07-06 Thread Rafael J. Wysocki
On Fri, Jul 6, 2018 at 10:36 AM, Lukas Wunner  wrote:
> [cc += Kishon Vijay Abraham]
>
> On Thu, Jul 05, 2018 at 11:18:28AM +0200, Rafael J. Wysocki wrote:
>> OK, so calling devices_kset_move_last() from really_probe() clearly is
>> a mistake.
>>
>> I'm not really sure what the intention of it was as the changelog of
>> commit 52cdbdd49853d doesn't really explain that (why would it be
>> insufficient without that change?)
>
> It seems 52cdbdd49853d fixed an issue with boards which have an MMC
> whose reset pin needs to be driven high on shutdown, lest the MMC
> won't be found on the next boot.
>
> The boards' devicetrees use a kludge wherein the reset pin is modelled
> as a regulator.  The regulator is enabled when the MMC probes and
> disabled on driver unbind and shutdown.  As a result, the pin is driven
> low on shutdown and the MMC is not found on the next boot.
>
> To fix this, another kludge was invented wherein the GPIO expander
> driving the reset pin unconditionally drives all its pins high on
> shutdown, see pcf857x_shutdown() in drivers/gpio/gpio-pcf857x.c
> (commit adc284755055, "gpio: pcf857x: restore the initial line state
> of all pcf lines").
>
> For this kludge to work, the GPIO expander's ->shutdown hook needs to
> be executed after the MMC expander's ->shutdown hook.
>
> Commit 52cdbdd49853d achieved that by reordering devices_kset according
> to the probe order.  Apparently the MMC probes after the GPIO expander,
> possibly because it returns -EPROBE_DEFER if the vmmc regulator isn't
> available yet, see mmc_regulator_get_supply().
>
> Note, I'm just piecing the information together from git history,
> I'm not responsible for these kludges.  (I'm innocent!)

Sure enough. :-)

In any case, calling devices_kset_move_last() in really_probe() is
plain broken and if its only purpose was to address a single, arguably
kludgy, use case, let's just get rid of it in the first place IMO.

> @Pingfan Liu, if you just remove the call to devices_kset_move_last()
> from really_probe(), does the issue go away?

I would think so from the description of the problem (elsewhere in this thread).

Thanks,
Rafael


Re: [PATCHv3 2/4] drivers/base: utilize device tree info to shutdown devices

2018-07-05 Thread Rafael J. Wysocki
On Tuesday, July 3, 2018 8:50:40 AM CEST Pingfan Liu wrote:
> commit 52cdbdd49853 ("driver core: correct device's shutdown order")
> places an assumption of supplier<-consumer order on the process of probe.
> But it turns out to break down the parent <- child order in some scene.
> E.g in pci, a bridge is enabled by pci core, and behind it, the devices
> have been probed. Then comes the bridge's module, which enables extra
> feature(such as hotplug) on this bridge. This will break the
> parent<-children order and cause failure when "kexec -e" in some scenario.
> 
> The detailed description of the scenario:
> An IBM Power9 machine on which, two drivers portdrv_pci and shpchp(a mod)
> match the PCI_CLASS_BRIDGE_PCI, but neither of them success to probe due
> to some issue. For this case, the bridge is moved after its children in
> devices_kset. Then, when "kexec -e", a ata-disk behind the bridge can not
> write back buffer in flight due to the former shutdown of the bridge which
> clears the BusMaster bit.
> 
> It is a little hard to impose both "parent<-child" and "supplier<-consumer"
> order on devices_kset. Take the following scene:
> step0: before a consumer's probing, (note child_a is supplier of consumer_a)
>   [ consumer-X, child_a, , child_z] [... consumer_a, ..., consumer_z, 
> ...] supplier-X
>  ^^ affected range ^^
> step1: when probing, moving consumer-X after supplier-X
>   [ child_a, , child_z] [ consumer_a, ..., consumer_z, ...] 
> supplier-X, consumer-X
> step2: the children of consumer-X should be re-ordered to maintain the seq
>   [... consumer_a, ..., consumer_z, ] supplier-X  [consumer-X, child_a, 
> , child_z]
> step3: the consumer_a should be re-ordered to maintain the seq
>   [... consumer_z, ...] supplier-X [ consumer-X, child_a, consumer_a ..., 
> child_z]
> 
> It requires two nested recursion to drain out all out-of-order item in
> "affected range". To avoid such complicated code, this patch suggests
> to utilize the info in device tree, instead of using the order of
> devices_kset during shutdown. It iterates the device tree, and firstly
> shutdown a device's children and consumers. After this patch, the buggy
> commit is hollow and left to clean.
> 
> Cc: Greg Kroah-Hartman 
> Cc: Rafael J. Wysocki 
> Cc: Grygorii Strashko 
> Cc: Christoph Hellwig 
> Cc: Bjorn Helgaas 
> Cc: Dave Young 
> Cc: linux-...@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Signed-off-by: Pingfan Liu 
> ---
>  drivers/base/core.c| 48 +++-
>  include/linux/device.h |  1 +
>  2 files changed, 44 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index a48868f..684b994 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -1446,6 +1446,7 @@ void device_initialize(struct device *dev)
>   INIT_LIST_HEAD(>links.consumers);
>   INIT_LIST_HEAD(>links.suppliers);
>   dev->links.status = DL_DEV_NO_DRIVER;
> + dev->shutdown = false;
>  }
>  EXPORT_SYMBOL_GPL(device_initialize);
>  
> @@ -2811,7 +2812,6 @@ static void __device_shutdown(struct device *dev)
>* lock is to be held
>*/
>   parent = get_device(dev->parent);
> - get_device(dev);

Why is the get_/put_device() not needed any more?

>   /*
>* Make sure the device is off the kset list, in the
>* event that dev->*->shutdown() doesn't remove it.
> @@ -2842,23 +2842,60 @@ static void __device_shutdown(struct device *dev)
>   dev_info(dev, "shutdown\n");
>   dev->driver->shutdown(dev);
>   }
> -
> + dev->shutdown = true;
>   device_unlock(dev);
>   if (parent)
>   device_unlock(parent);
>  
> - put_device(dev);
>   put_device(parent);
>   spin_lock(_kset->list_lock);
>  }
>  
> +/* shutdown dev's children and consumer firstly, then itself */
> +static int device_for_each_child_shutdown(struct device *dev)

Confusing name.

What about device_shutdown_subordinate()?

> +{
> + struct klist_iter i;
> + struct device *child;
> + struct device_link *link;
> +
> + /* already shutdown, then skip this sub tree */
> + if (dev->shutdown)
> + return 0;
> +
> + if (!dev->p)
> + goto check_consumers;
> +
> + /* there is breakage of lock in __device_shutdown(), and the redundant
> +  * ref++ on srcu protected consumer is harmless si

Re: [PATCHv3 0/4] drivers/base: bugfix for supplier<-consumer ordering in device_kset

2018-07-05 Thread Rafael J. Wysocki
On Thu, Jul 5, 2018 at 4:44 AM, Pingfan Liu  wrote:
> On Wed, Jul 4, 2018 at 6:23 PM Rafael J. Wysocki  wrote:
>>
>> On Wednesday, July 4, 2018 4:47:07 AM CEST Pingfan Liu wrote:
>> > On Tue, Jul 3, 2018 at 10:36 PM Rafael J. Wysocki  
>> > wrote:
>> > >
>> > > On Tuesday, July 3, 2018 8:50:38 AM CEST Pingfan Liu wrote:
>> > > > commit 52cdbdd49853 ("driver core: correct device's shutdown order")
>> > > > places an assumption of supplier<-consumer order on the process of 
>> > > > probe.
>> > > > But it turns out to break down the parent <- child order in some scene.
>> > > > E.g in pci, a bridge is enabled by pci core, and behind it, the devices
>> > > > have been probed. Then comes the bridge's module, which enables extra
>> > > > feature(such as hotplug) on this bridge.
>> > >
>> > > So what *exactly* does happen in that case?
>> > >
>> > I saw the  shpc_probe() is called on the bridge, although the probing
>> > failed on that bare-metal. But if it success, then it will enable the
>> > hotplug feature on the bridge.
>>
>> I don't understand what you are saying here, sorry.
>>
> On the system, I observe the following:
> [2.114986] devices_kset: Moving 0004:00:00.0 to end of list
> <---pcie port drive's probe, but it failed
> [2.115192] devices_kset: Moving 0004:01:00.0 to end of list
> [2.115591] devices_kset: Moving 0004:02:02.0 to end of list
> [2.115923] devices_kset: Moving 0004:02:0a.0 to end of list
> [2.116141] devices_kset: Moving 0004:02:0b.0 to end of list
> [2.116358] devices_kset: Moving 0004:02:0c.0 to end of list
> [3.181860] devices_kset: Moving 0004:03:00.0 to end of list
> <---the ata disk controller which sits behind the bridge
> [   10.267081] devices_kset: Moving 0004:00:00.0 to end of list
>  <---shpc_probe() on this bridge, failed too.
>
> As you can the the parent device "0004:00:00.0" is moved twice, and
> finally, it is after the "0004:03:00.0", this will break the
> "parent<-child" order in devices_kset. This is caused by the code
> really_probe()->devices_kset_move_last(). Apparently, it makes
> assumption that child device's probing comes after its parent's. But
> it does not stand up in the case.
>
>> device_reorder_to_tail() walks the entire device hierarchy below the target
>> and moves all of the children in there *after* their parents.
>>
> As described, the bug is not related with device_reorder_to_tail(), it
> is related with really_probe()->devices_kset_move_last().

OK, so calling devices_kset_move_last() from really_probe() clearly is
a mistake.

I'm not really sure what the intention of it was as the changelog of
commit 52cdbdd49853d doesn't really explain that (why would it be
insufficient without that change?) and I'm quite sure that in the
majority of cases it is unnecessary.

I *think* that it attempted to cover a use case similar to the device
links one, but it should have moved children along with the parent
every time like device_link_add() does.

> So [2/4] uses different method to achieve the "parent<-child" and
> "supplier<-consumer" order. The [3/4] clean up some code in
> device_reorder_to_tail(), since I need to revert the commit.

OK, let me look at that one.

Thanks,
Rafael


Re: [PATCH v3 0/2] powernv/cpuidle Device-tree parsing cleanup

2018-07-04 Thread Rafael J. Wysocki
On Tuesday, July 3, 2018 11:20:54 AM CEST Akshay Adiga wrote:
> 
> Device-tree parsed multiple time in powernv cpuidle and powernv
> hotplug code. 
> 
> First to identify supported flags. Second time, to identify deepest_state
> and first deep state. Third time, during cpuidle init to find the available
> idle states. Any change in device-tree format will lead to make changes in
> these 3 places. Errors in device-tree can be handled in a better manner.
> 
> This series adds code to parse device tree once and save in global structure.
> 
> Changes from v2 :
>  - Fix build error (moved a hunk from patch 1 to patch 2)
> Changes from v1 :
>  - fold first 2 patches into 1
>  - rename pm_ctrl_reg_* as psscr_*
>  - added comment stating removal of pmicr parsing code
>  - removed parsing code for pmicr
>  - add member valid in pnv_idle_states_t to indicate if the psscr-mask/val
> are valid combination,
>  - Change function description of pnv_parse_cpuidle_dt
>  - Added error handling code.
> 
> 
> Akshay Adiga (2):
>   powernv/cpuidle: Parse dt idle properties into global structure
>   powernv/cpuidle: Use parsed device tree values for cpuidle_init
> 
>  arch/powerpc/include/asm/cpuidle.h|  13 ++
>  arch/powerpc/platforms/powernv/idle.c | 216 --
>  drivers/cpuidle/cpuidle-powernv.c | 154 --
>  3 files changed, 177 insertions(+), 206 deletions(-)
> 
> 

I am assuming that this series will go in via the powerpc tree.

Thanks,
Rafael




Re: [PATCHv3 0/4] drivers/base: bugfix for supplier<-consumer ordering in device_kset

2018-07-04 Thread Rafael J. Wysocki
On Wednesday, July 4, 2018 4:47:07 AM CEST Pingfan Liu wrote:
> On Tue, Jul 3, 2018 at 10:36 PM Rafael J. Wysocki  wrote:
> >
> > On Tuesday, July 3, 2018 8:50:38 AM CEST Pingfan Liu wrote:
> > > commit 52cdbdd49853 ("driver core: correct device's shutdown order")
> > > places an assumption of supplier<-consumer order on the process of probe.
> > > But it turns out to break down the parent <- child order in some scene.
> > > E.g in pci, a bridge is enabled by pci core, and behind it, the devices
> > > have been probed. Then comes the bridge's module, which enables extra
> > > feature(such as hotplug) on this bridge.
> >
> > So what *exactly* does happen in that case?
> >
> I saw the  shpc_probe() is called on the bridge, although the probing
> failed on that bare-metal. But if it success, then it will enable the
> hotplug feature on the bridge.

I don't understand what you are saying here, sorry.

device_reorder_to_tail() walks the entire device hierarchy below the target
and moves all of the children in there *after* their parents.

How can it break "the parent <- child order" then?

Thanks,
Rafael



Re: [PATCHv3 3/4] drivers/base: clean up the usage of devices_kset_move_last()

2018-07-04 Thread Rafael J. Wysocki
On Wednesday, July 4, 2018 6:40:09 AM CEST Pingfan Liu wrote:
> On Tue, Jul 3, 2018 at 10:28 PM Rafael J. Wysocki  wrote:
> >
> > On Tuesday, July 3, 2018 8:50:41 AM CEST Pingfan Liu wrote:
> > > Clean up the referring to the code in commit 52cdbdd49853 ("driver core:
> > > correct device's shutdown order"). So later we can revert it safely.
> > >
> > > Cc: Greg Kroah-Hartman 
> > > Cc: Rafael J. Wysocki 
> > > Cc: Grygorii Strashko 
> > > Cc: Christoph Hellwig 
> > > Cc: Bjorn Helgaas 
> > > Cc: Dave Young 
> > > Cc: linux-...@vger.kernel.org
> > > Cc: linuxppc-dev@lists.ozlabs.org
> > > Signed-off-by: Pingfan Liu 
> > > ---
> > >  drivers/base/core.c | 7 ---
> > >  1 file changed, 7 deletions(-)
> > >
> > > diff --git a/drivers/base/core.c b/drivers/base/core.c
> > > index 684b994..db3deb8 100644
> > > --- a/drivers/base/core.c
> > > +++ b/drivers/base/core.c
> > > @@ -127,13 +127,6 @@ static int device_reorder_to_tail(struct device 
> > > *dev, void *not_used)
> > >  {
> > >   struct device_link *link;
> > >
> > > - /*
> > > -  * Devices that have not been registered yet will be put to the ends
> > > -  * of the lists during the registration, so skip them here.
> > > -  */
> > > - if (device_is_registered(dev))
> > > - devices_kset_move_last(dev);
> > > -
> > >   if (device_pm_initialized(dev))
> > >   device_pm_move_last(dev);
> >
> > You can't do this.
> >
> > If you do it, that will break power management in some situations.
> >
> Could you shed light on it? I had a quick browsing of pm code, but it
> is a big function, and I got lost in it.
> If the above code causes failure, then does it imply that the seq in
> devices_kset should be the same as dpm_list?

Generally, yes it should.

> But in device_shutdown(), it only intersect with pm by
> pm_runtime_get_noresume(dev) and pm_runtime_barrier(dev). How do these
> function affect the seq in dpm_list?

They are not related to dpm_list directly.

However, if you shut down a supplier device before its consumer and that
involves power management, then the consumer shutdown may fail and lock up
the system

I asked you elsewhere to clearly describe the problem you are trying to
address.  Please do that in the first place.

Thanks,
Rafael



Re: [PATCHv3 0/4] drivers/base: bugfix for supplier<-consumer ordering in device_kset

2018-07-03 Thread Rafael J. Wysocki
On Tuesday, July 3, 2018 8:50:38 AM CEST Pingfan Liu wrote:
> commit 52cdbdd49853 ("driver core: correct device's shutdown order")
> places an assumption of supplier<-consumer order on the process of probe.
> But it turns out to break down the parent <- child order in some scene.
> E.g in pci, a bridge is enabled by pci core, and behind it, the devices
> have been probed. Then comes the bridge's module, which enables extra
> feature(such as hotplug) on this bridge.

So what *exactly* does happen in that case?



Re: [PATCHv3 3/4] drivers/base: clean up the usage of devices_kset_move_last()

2018-07-03 Thread Rafael J. Wysocki
On Tuesday, July 3, 2018 8:50:41 AM CEST Pingfan Liu wrote:
> Clean up the referring to the code in commit 52cdbdd49853 ("driver core:
> correct device's shutdown order"). So later we can revert it safely.
> 
> Cc: Greg Kroah-Hartman 
> Cc: Rafael J. Wysocki 
> Cc: Grygorii Strashko 
> Cc: Christoph Hellwig 
> Cc: Bjorn Helgaas 
> Cc: Dave Young 
> Cc: linux-...@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Signed-off-by: Pingfan Liu 
> ---
>  drivers/base/core.c | 7 ---
>  1 file changed, 7 deletions(-)
> 
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 684b994..db3deb8 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -127,13 +127,6 @@ static int device_reorder_to_tail(struct device *dev, 
> void *not_used)
>  {
>   struct device_link *link;
>  
> - /*
> -  * Devices that have not been registered yet will be put to the ends
> -  * of the lists during the registration, so skip them here.
> -  */
> - if (device_is_registered(dev))
> - devices_kset_move_last(dev);
> -
>   if (device_pm_initialized(dev))
>   device_pm_move_last(dev);

You can't do this.

If you do it, that will break power management in some situations.

Thanks,
Rafael



Re: [trivial PATCH V2] treewide: Align function definition open/close braces

2018-03-21 Thread Rafael J. Wysocki
On Wed, Mar 21, 2018 at 11:09 PM, Joe Perches <j...@perches.com> wrote:
> Some functions definitions have either the initial open brace and/or
> the closing brace outside of column 1.
>
> Move those braces to column 1.
>
> This allows various function analyzers like gnu complexity to work
> properly for these modified functions.
>
> Signed-off-by: Joe Perches <j...@perches.com>
> Acked-by: Andy Shevchenko <andy.shevche...@gmail.com>
> Acked-by: Paul Moore <p...@paul-moore.com>
> Acked-by: Alex Deucher <alexander.deuc...@amd.com>
> Acked-by: Dave Chinner <dchin...@redhat.com>
> Reviewed-by: Darrick J. Wong <darrick.w...@oracle.com>
> Acked-by: Alexandre Belloni <alexandre.bell...@free-electrons.com>
> Acked-by: Martin K. Petersen <martin.peter...@oracle.com>
> Acked-by: Takashi Iwai <ti...@suse.de>
> Acked-by: Mauro Carvalho Chehab <mche...@s-opensource.com>
> ---
>
> git diff -w still shows no difference.
>
> This patch was sent but December and not applied.
>
> As the trivial maintainer seems not active, it'd be nice if
> Andrew Morton picks this up.
>
> V2: Remove fs/xfs/libxfs/xfs_alloc.c as it's updated and remerge the rest
>
>  arch/x86/include/asm/atomic64_32.h   |  2 +-
>  drivers/acpi/custom_method.c |  2 +-
>  drivers/acpi/fan.c   |  2 +-

For the ACPI changes:

Acked-by: Rafael J. Wysocki <rafael.j.wyso...@intel.com>


Re: Linux 4.16: Reported regressions as of Monday, 2018-02-26 (Was: Linux 4.16-rc3)

2018-02-27 Thread Rafael J. Wysocki
On Monday, February 26, 2018 11:25:09 AM CET Thorsten Leemhuis wrote:
> On 26.02.2018 04:05, Linus Torvalds wrote:
> > We're on the normal schedule for 4.16 and everything still looks very 
> > regular.
> 
> Hi! Find below my second regression report for Linux 4.16. It lists 8
> regressions I'm currently aware of.
> 
> To anyone reading this: Are you aware of any other regressions that got
> introduced this development cycle? Then please let me know by mail (a
> simple bounce or forward to the email address is enough!).
> 
> For details see http://bit.ly/lnxregtrackid And please tell me if there
> is anything in the report that shouldn't be there.

Please add https://bugzilla.kernel.org/show_bug.cgi?id=198763

It has a bisect result and seems to be readily reproducible.

No response from the author of the problematic commit so far.

Thanks,
Rafael



Re: [PATCH] cpufreq: powernv: Check negative value returned by cpufreq_table_find_index_dl()

2018-02-26 Thread Rafael J. Wysocki
On Mon, Feb 12, 2018 at 11:33 AM, Shilpasri G Bhat
 wrote:
> Hi,
>
> On 02/12/2018 03:59 PM, Viresh Kumar wrote:
>> On 12-02-18, 15:51, Shilpasri G Bhat wrote:
>>> This patch fixes the below Coverity warning:
>>>
>>> *** CID 182816:  Memory - illegal accesses  (NEGATIVE_RETURNS)
>>> /drivers/cpufreq/powernv-cpufreq.c: 1008 in powernv_fast_switch()
>>> 1002 unsigned int target_freq)
>>> 1003 {
>>> 1004 int index;
>>> 1005 struct powernv_smp_call_data freq_data;
>>> 1006
>>> 1007 index = cpufreq_table_find_index_dl(policy, target_freq);
>> CID 182816:  Memory - illegal accesses  (NEGATIVE_RETURNS)
>> Using variable "index" as an index to array "powernv_freqs".
>>> 1008 freq_data.pstate_id = powernv_freqs[index].driver_data;
>>> 1009 freq_data.gpstate_id = powernv_freqs[index].driver_data;
>>> 1010 set_pstate(_data);
>>> 1011
>>> 1012 return powernv_freqs[index].frequency;
>>> 1013 }
>>>
>>> Signed-off-by: Shilpasri G Bhat 
>>> ---
>>>  drivers/cpufreq/powernv-cpufreq.c | 3 +++
>>>  1 file changed, 3 insertions(+)
>>>
>>> diff --git a/drivers/cpufreq/powernv-cpufreq.c 
>>> b/drivers/cpufreq/powernv-cpufreq.c
>>> index 29cdec1..69edfe9 100644
>>> --- a/drivers/cpufreq/powernv-cpufreq.c
>>> +++ b/drivers/cpufreq/powernv-cpufreq.c
>>> @@ -1005,6 +1005,9 @@ static unsigned int powernv_fast_switch(struct 
>>> cpufreq_policy *policy,
>>>  struct powernv_smp_call_data freq_data;
>>>
>>>  index = cpufreq_table_find_index_dl(policy, target_freq);
>>> +if (unlikely(index < 0))
>>> +index = get_nominal_index();
>>> +
>>
>> AFAICT, you will get -1 here only if the freq table had no valid
>> frequencies (or the freq table is empty). Why would that happen ?
>
> I agree too. There is no way we can get -1 with initialized cpu frequency 
> table.
> We don't initialize powernv-cpufreq if we don't have valid CPU frequency
> entries. Is there any other way to suppress the Coverity tool warning apart 
> from
> ignoring it?

In principle you could use BUG_ON(something_impossible) to annotate
that kind of thing to the static analysis tools, but that would
generate extra code.


Re: [PATCH] cpufreq: powernv: Check negative value returned by cpufreq_table_find_index_dl()

2018-02-21 Thread Rafael J. Wysocki
On Wed, Feb 21, 2018 at 2:13 PM, Michael Ellerman <m...@ellerman.id.au> wrote:
> "Rafael J. Wysocki" <raf...@kernel.org> writes:
>
>> On Wed, Feb 21, 2018 at 6:54 AM, Viresh Kumar <viresh.ku...@linaro.org> 
>> wrote:
>>> On 21-02-18, 16:39, Michael Ellerman wrote:
>>>> Viresh Kumar <viresh.ku...@linaro.org> writes:
>>>
>>>> > AFAICT, you will get -1 here only if the freq table had no valid
>>>> > frequencies (or the freq table is empty). Why would that happen ?
>>>>
>>>> Bugs?
>>>
>>> The cupfreq driver shouldn't have registered itself in that case (i.e.
>>> if the cpufreq table is empty).
>>
>> To be precise, ->init() should fail as that's where the table is
>> created.  The registration fails as a result then.
>>
>> But what if the bug is that ->init() doesn't fail when it should?
>>
>> I guess the core could double check the frequency table after ->init()
>> if ->target_index is not NULL.
>>
>> The overall point here is that if you get a negative index in
>> ->fast_switch(), that's way too late anyway and we should be able to
>> catch that error much earlier.
>
> OK.
>
> Still it's one thing for the driver to print a warning and bail out,
> it's another to access off the front of an array and keep running using
> some junk values, or oops (though not in this case because the array
> happens to be static).

Well, let me rephrase.  If ->fast_switch() runs, then it must not be
possible to get a negative index in it.  That has to be guaranteed by
the core.


Re: [PATCH] cpufreq: powernv: Check negative value returned by cpufreq_table_find_index_dl()

2018-02-21 Thread Rafael J. Wysocki
On Wed, Feb 21, 2018 at 11:02 AM, Viresh Kumar <viresh.ku...@linaro.org> wrote:
> On 21-02-18, 10:27, Rafael J. Wysocki wrote:
>> To be precise, ->init() should fail as that's where the table is
>> created.  The registration fails as a result then.
>>
>> But what if the bug is that ->init() doesn't fail when it should?
>>
>> I guess the core could double check the frequency table after ->init()
>> if ->target_index is not NULL.
>>
>> The overall point here is that if you get a negative index in
>> ->fast_switch(), that's way too late anyway and we should be able to
>> catch that error much earlier.
>
> I don't want to end up doing double checking as some of it is already
> done at init, but let me check on what can be done.

The driver is expected to call cpufreq_table_validate_and_show() at
->init() time and fail ->init() if that fails.

That's kind of fragile, because it depends on the driver to do the right thing.


Re: [PATCH] cpufreq: powernv: Check negative value returned by cpufreq_table_find_index_dl()

2018-02-21 Thread Rafael J. Wysocki
On Wed, Feb 21, 2018 at 6:54 AM, Viresh Kumar  wrote:
> On 21-02-18, 16:39, Michael Ellerman wrote:
>> Viresh Kumar  writes:
>
>> > AFAICT, you will get -1 here only if the freq table had no valid
>> > frequencies (or the freq table is empty). Why would that happen ?
>>
>> Bugs?
>
> The cupfreq driver shouldn't have registered itself in that case (i.e.
> if the cpufreq table is empty).

To be precise, ->init() should fail as that's where the table is
created.  The registration fails as a result then.

But what if the bug is that ->init() doesn't fail when it should?

I guess the core could double check the frequency table after ->init()
if ->target_index is not NULL.

The overall point here is that if you get a negative index in
->fast_switch(), that's way too late anyway and we should be able to
catch that error much earlier.


Re: [PATCH v3] cpufreq: powernv: Add support of frequency domain

2018-02-09 Thread Rafael J. Wysocki
On Monday, January 22, 2018 9:17:34 AM CET Abhishek Goel wrote:
> Frequency-domain indicates group of CPUs that would share same frequency.
> It is detected using device-tree node "frequency-domain-indicator".
> frequency-domain-indicator is a bitmask which will have different value
> depending upon the generation of the processor.
> 
> CPUs of the same chip for which the result of a bitwise AND between
> their PIR and the frequency-domain-indicator is the same share the same
> frequency.
> 
> In this patch, we define hash-table indexed by the aforementioned
> bitwise ANDed value to store the cpumask of the CPUs sharing the same
> frequency domain. Further, the cpufreq policy will be created per
> frequency-domain
> 
> So for POWER9, a cpufreq policy is created per quad while for POWER8 it
> is created per core. Governor decides frequency for each policy but
> multiple cores may come under same policy. In such case frequency needs
> to be set on each core sharing that policy.
> 
> Signed-off-by: Abhishek Goel 
> ---
> 
> Skiboot patch required for the corresponding device-tree changes have been
> posted here : http://patchwork.ozlabs.org/patch/862256/
> 
>  drivers/cpufreq/powernv-cpufreq.c | 104 
> ++
>  1 file changed, 95 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/cpufreq/powernv-cpufreq.c 
> b/drivers/cpufreq/powernv-cpufreq.c
> index b6d7c4c..aab23a4 100644
> --- a/drivers/cpufreq/powernv-cpufreq.c
> +++ b/drivers/cpufreq/powernv-cpufreq.c
> @@ -37,6 +37,7 @@
>  #include  /* Required for cpu_sibling_mask() in UP configs */
>  #include 
>  #include 
> +#include 
>  
>  #define POWERNV_MAX_PSTATES  256
>  #define PMSR_PSAFE_ENABLE(1UL << 30)
> @@ -130,6 +131,9 @@ enum throttle_reason_type {
>  static int nr_chips;
>  static DEFINE_PER_CPU(struct chip *, chip_info);
>  
> +static u32 freq_domain_indicator;
> +static bool p9_occ_quirk;
> +
>  /*
>   * Note:
>   * The set of pstates consists of contiguous integers.
> @@ -194,6 +198,38 @@ static inline void reset_gpstates(struct cpufreq_policy 
> *policy)
>   gpstates->last_gpstate_idx = 0;
>  }
>  
> +#define SIZE NR_CPUS
> +#define ORDER_FREQ_MAP ilog2(SIZE)
> +
> +static DEFINE_HASHTABLE(freq_domain_map, ORDER_FREQ_MAP);
> +
> +struct hashmap {
> + cpumask_t mask;
> + int chip_id;
> + u32 pir_key;
> + struct hlist_node hash_node;
> +};
> +
> +static void insert(u32 key, int cpu)
> +{
> + struct hashmap *data;
> +
> + hash_for_each_possible(freq_domain_map, data, hash_node, key%SIZE) {
> + if (data->chip_id == cpu_to_chip_id(cpu) &&
> + data->pir_key == key) {
> + cpumask_set_cpu(cpu, >mask);
> + return;
> + }
> + }
> +
> + data = kzalloc(sizeof(*data), GFP_KERNEL);
> + hash_add(freq_domain_map, >hash_node, key%SIZE);
> + cpumask_set_cpu(cpu, >mask);
> + data->chip_id = cpu_to_chip_id(cpu);
> + data->pir_key = key;
> +
> +}
> +
>  /*
>   * Initialize the freq table based on data obtained
>   * from the firmware passed via device-tree
> @@ -206,6 +242,7 @@ static int init_powernv_pstates(void)
>   u32 len_ids, len_freqs;
>   u32 pstate_min, pstate_max, pstate_nominal;
>   u32 pstate_turbo, pstate_ultra_turbo;
> + u32 key;
>  
>   power_mgt = of_find_node_by_path("/ibm,opal/power-mgt");
>   if (!power_mgt) {
> @@ -246,9 +283,18 @@ static int init_powernv_pstates(void)
>   else
>   powernv_pstate_info.wof_enabled = true;
>  
> + if (of_device_is_compatible(power_mgt, "freq-domain-v1") &&
> + of_property_read_u32(power_mgt, "ibm,freq-domain-indicator",
> + _domain_indicator))
> + pr_warn("ibm,freq-domain-indicator not found\n");
> +
> +if (of_device_is_compatible(power_mgt, "p9-occ-quirk"))
> + p9_occ_quirk = true;
> +
>  next:
>   pr_info("cpufreq pstate min %d nominal %d max %d\n", pstate_min,
>   pstate_nominal, pstate_max);
> + pr_info("frequency domain indicator %d", freq_domain_indicator);
>   pr_info("Workload Optimized Frequency is %s in the platform\n",
>   (powernv_pstate_info.wof_enabled) ? "enabled" : "disabled");
>  
> @@ -276,6 +322,15 @@ static int init_powernv_pstates(void)
>   return -ENODEV;
>   }
>  
> + if (freq_domain_indicator) {
> + hash_init(freq_domain_map);
> + for_each_possible_cpu(i) {
> + key = ((u32) get_hard_smp_processor_id(i) &
> + freq_domain_indicator);
> + insert(key, i);
> + }
> + }
> +
>   powernv_pstate_info.nr_pstates = nr_pstates;
>   pr_debug("NR PStates %d\n", nr_pstates);
>   for (i = 0; i < nr_pstates; i++) {
> @@ -760,25 +815,56 @@ static int powernv_cpufreq_target_index(struct 
> cpufreq_policy *policy,
>  
>  

Re: [v3 PATCH 2/3] powernv-cpufreq: Fix pstate_to_idx() to handle non-continguous pstates

2018-01-10 Thread Rafael J. Wysocki
On Wednesday, January 10, 2018 9:55:45 AM CET Gautham R Shenoy wrote:
> Hi Rafael,
> 
> On Wed, Jan 03, 2018 at 11:47:58PM +1100, Balbir Singh wrote:
> > On Wed, Jan 3, 2018 at 11:07 PM, Rafael J. Wysocki <r...@rjwysocki.net> 
> > wrote:
> > > On Monday, December 18, 2017 9:38:20 AM CET Gautham R Shenoy wrote:
> > >> Hi Balbir,
> > >>
> > >> On Sun, Dec 17, 2017 at 02:15:25PM +1100, Balbir Singh wrote:
> > >> > On Wed, Dec 13, 2017 at 5:57 PM, Gautham R. Shenoy
> > >> > <e...@linux.vnet.ibm.com> wrote:
> > >> > > From: "Gautham R. Shenoy" <e...@linux.vnet.ibm.com>
> > >> > >
> > >> > > The code in powernv-cpufreq, makes the following two assumptions 
> > >> > > which
> > >> > > are not guaranteed by the device-tree bindings:
> > >> > >
> > >> > > 1) Pstate ids are continguous: This is used in pstate_to_idx() to
> > >> > >obtain the reverse map from a pstate to it's corresponding
> > >> > >entry into the cpufreq frequency table.
> > >> > >
> > >> > > 2) Every Pstate should always lie between the max and the min
> > >> > >pstates that are explicitly reported in the device tree: This
> > >> > >is used to determine whether a pstate reported by the PMSR is
> > >> > >out of bounds.
> > >> > >
> > >> > > Both these assumptions are unwarranted and can change on future
> > >> > > platforms.
> > >> >
> > >> > While this is a good thing, I wonder if it is worth the complexity. 
> > >> > Pstates
> > >> > are contiguous because they define transitions in incremental value
> > >> > of change in frequency and I can't see how this can be broken in the
> > >> > future?
> > >>
> > >> In the future, we can have the OPAL firmware give us a smaller set of
> > >> pstates instead of expose every one of them. As it stands today, for
> > >> most of the workloads, we will need at best 20-30 pstates and not
> > >> beyond that.
> > >
> > > I'm not sure about the status here.
> > >
> > > Is this good to go as is or is it going to be updated?
> > >
> > 
> > I have no major objections, except some of the added complexity, but
> > Gautham makes a point that this is refactoring for the future
> 
> I have tested this across POWER8 and POWER9. The additional complexity
> introduced by the second patch is required for the future when we are
> going to reduce the number of pstates.

I have applied the series, thanks!



Re: [v3 PATCH 2/3] powernv-cpufreq: Fix pstate_to_idx() to handle non-continguous pstates

2018-01-03 Thread Rafael J. Wysocki
On Monday, December 18, 2017 9:38:20 AM CET Gautham R Shenoy wrote:
> Hi Balbir,
> 
> On Sun, Dec 17, 2017 at 02:15:25PM +1100, Balbir Singh wrote:
> > On Wed, Dec 13, 2017 at 5:57 PM, Gautham R. Shenoy
> >  wrote:
> > > From: "Gautham R. Shenoy" 
> > >
> > > The code in powernv-cpufreq, makes the following two assumptions which
> > > are not guaranteed by the device-tree bindings:
> > >
> > > 1) Pstate ids are continguous: This is used in pstate_to_idx() to
> > >obtain the reverse map from a pstate to it's corresponding
> > >entry into the cpufreq frequency table.
> > >
> > > 2) Every Pstate should always lie between the max and the min
> > >pstates that are explicitly reported in the device tree: This
> > >is used to determine whether a pstate reported by the PMSR is
> > >out of bounds.
> > >
> > > Both these assumptions are unwarranted and can change on future
> > > platforms.
> > 
> > While this is a good thing, I wonder if it is worth the complexity. Pstates
> > are contiguous because they define transitions in incremental value
> > of change in frequency and I can't see how this can be broken in the
> > future?
> 
> In the future, we can have the OPAL firmware give us a smaller set of
> pstates instead of expose every one of them. As it stands today, for
> most of the workloads, we will need at best 20-30 pstates and not
> beyond that.

I'm not sure about the status here.

Is this good to go as is or is it going to be updated?

Thanks,
Rafael



Re: [PATCH v3 0/3] create sysfs representation of ACPI HMAT

2017-12-22 Thread Rafael J. Wysocki
On Sat, Dec 23, 2017 at 12:57 AM, Dan Williams  wrote:
> On Fri, Dec 22, 2017 at 3:22 PM, Ross Zwisler
>  wrote:
>> On Fri, Dec 22, 2017 at 02:53:42PM -0800, Dan Williams wrote:
>>> On Thu, Dec 21, 2017 at 12:31 PM, Brice Goglin  
>>> wrote:
>>> > Le 20/12/2017 à 23:41, Ross Zwisler a écrit :
>>> [..]
>>> > Hello
>>> >
>>> > I can confirm that HPC runtimes are going to use these patches (at least
>>> > all runtimes that use hwloc for topology discovery, but that's the vast
>>> > majority of HPC anyway).
>>> >
>>> > We really didn't like KNL exposing a hacky SLIT table [1]. We had to
>>> > explicitly detect that specific crazy table to find out which NUMA nodes
>>> > were local to which cores, and to find out which NUMA nodes were
>>> > HBM/MCDRAM or DDR. And then we had to hide the SLIT values to the
>>> > application because the reported latencies didn't match reality. Quite
>>> > annoying.
>>> >
>>> > With Ross' patches, we can easily get what we need:
>>> > * which NUMA nodes are local to which CPUs? /sys/devices/system/node/
>>> > can only report a single local node per CPU (doesn't work for KNL and
>>> > upcoming architectures with HBM+DDR+...)
>>> > * which NUMA nodes are slow/fast (for both bandwidth and latency)
>>> > And we can still look at SLIT under /sys/devices/system/node if really
>>> > needed.
>>> >
>>> > And of course having this in sysfs is much better than parsing ACPI
>>> > tables that are only accessible to root :)
>>>
>>> On this point, it's not clear to me that we should allow these sysfs
>>> entries to be world readable. Given /proc/iomem now hides physical
>>> address information from non-root we at least need to be careful not
>>> to undo that with new sysfs HMAT attributes.
>>
>> This enabling does not expose any physical addresses to userspace.  It only
>> provides performance numbers from the HMAT and associates them with existing
>> NUMA nodes.  Are you worried that exposing performance numbers to non-root
>> users via sysfs poses a security risk?
>
> It's an information disclosure that's not clear we need to make to
> non-root processes.
>
> I'm more worried about userspace growing dependencies on the absolute
> numbers when those numbers can change from platform to platform.
> Differentiated memory on one platform may be the common memory pool on
> another.
>
> To me this has parallels with storage device hinting where
> specifications like T10 have a complex enumeration of all the
> performance hints that can be passed to the device, but the Linux
> enabling effort aims for a sanitzed set of relative hints that make
> sense. It's more flexible if userspace specifies a relative intent
> rather than an absolute performance target. Putting all the HMAT
> information into sysfs gives userspace more information than it could
> possibly do anything reasonable, at least outside of specialized apps
> that are hand tuned for a given hardware platform.

That's a valid point IMO.

It is sort of tempting to expose everything to user space verbatim,
especially early in the enabling process when the kernel has not yet
found suitable ways to utilize the given information, but the very act
of exposing it may affect what can be done with it in the future.

User space interfaces need to stay around and be supported forever, at
least potentially, so adding every one of them is a serious
commitment.

Thanks,
Rafael


Re: [trivial PATCH] treewide: Align function definition open/close braces

2017-12-18 Thread Rafael J. Wysocki
(xfs_sb_version_hascrc(>m_sb)) {
> @@ -2449,8 +2449,7 @@ xfs_agf_verify(
>be32_to_cpu(agf->agf_refcount_level) > XFS_BTREE_MAXLEVELS))
>   return false;
>  
> - return true;;
> -
> + return true;
>  }
>  
>  static void
> diff --git a/fs/xfs/xfs_export.c b/fs/xfs/xfs_export.c
> index fe1bfee35898..7d5c355d78b5 100644
> --- a/fs/xfs/xfs_export.c
> +++ b/fs/xfs/xfs_export.c
> @@ -122,7 +122,7 @@ xfs_nfs_get_inode(
>   struct super_block  *sb,
>   u64 ino,
>   u32 generation)
> - {
> +{
>   xfs_mount_t *mp = XFS_M(sb);
>   xfs_inode_t *ip;
>   int error;
> diff --git a/kernel/audit.c b/kernel/audit.c
> index 227db99b0f19..d97e8f0f73ca 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -443,15 +443,15 @@ static int audit_set_failure(u32 state)
>   * Drop any references inside the auditd connection tracking struct and free
>   * the memory.
>   */
> - static void auditd_conn_free(struct rcu_head *rcu)
> - {
> +static void auditd_conn_free(struct rcu_head *rcu)
> +{
>   struct auditd_connection *ac;
>  
>   ac = container_of(rcu, struct auditd_connection, rcu);
>   put_pid(ac->pid);
>   put_net(ac->net);
>   kfree(ac);
> - }
> +}
>  
>  /**
>   * auditd_set - Set/Reset the auditd connection state
> diff --git a/kernel/trace/trace_printk.c b/kernel/trace/trace_printk.c
> index ad1d6164e946..50f44b7b2b32 100644
> --- a/kernel/trace/trace_printk.c
> +++ b/kernel/trace/trace_printk.c
> @@ -196,7 +196,7 @@ struct notifier_block module_trace_bprintk_format_nb = {
>  };
>  
>  int __trace_bprintk(unsigned long ip, const char *fmt, ...)
> - {
> +{
>   int ret;
>   va_list ap;
>  
> @@ -214,7 +214,7 @@ int __trace_bprintk(unsigned long ip, const char *fmt, 
> ...)
>  EXPORT_SYMBOL_GPL(__trace_bprintk);
>  
>  int __ftrace_vbprintk(unsigned long ip, const char *fmt, va_list ap)
> - {
> +{
>   if (unlikely(!fmt))
>   return 0;
>  
> diff --git a/lib/raid6/sse2.c b/lib/raid6/sse2.c
> index 1d2276b007ee..8191e1d0d2fb 100644
> --- a/lib/raid6/sse2.c
> +++ b/lib/raid6/sse2.c
> @@ -91,7 +91,7 @@ static void raid6_sse21_gen_syndrome(int disks, size_t 
> bytes, void **ptrs)
>  
>  static void raid6_sse21_xor_syndrome(int disks, int start, int stop,
>size_t bytes, void **ptrs)
> - {
> +{
>   u8 **dptr = (u8 **)ptrs;
>   u8 *p, *q;
>   int d, z, z0;
> @@ -200,9 +200,9 @@ static void raid6_sse22_gen_syndrome(int disks, size_t 
> bytes, void **ptrs)
>   kernel_fpu_end();
>  }
>  
> - static void raid6_sse22_xor_syndrome(int disks, int start, int stop,
> +static void raid6_sse22_xor_syndrome(int disks, int start, int stop,
>size_t bytes, void **ptrs)
> - {
> +{
>   u8 **dptr = (u8 **)ptrs;
>   u8 *p, *q;
>   int d, z, z0;
> @@ -265,7 +265,7 @@ static void raid6_sse22_gen_syndrome(int disks, size_t 
> bytes, void **ptrs)
>  
>   asm volatile("sfence" : : : "memory");
>   kernel_fpu_end();
> - }
> +}
>  
>  const struct raid6_calls raid6_sse2x2 = {
>   raid6_sse22_gen_syndrome,
> @@ -366,9 +366,9 @@ static void raid6_sse24_gen_syndrome(int disks, size_t 
> bytes, void **ptrs)
>   kernel_fpu_end();
>  }
>  
> - static void raid6_sse24_xor_syndrome(int disks, int start, int stop,
> +static void raid6_sse24_xor_syndrome(int disks, int start, int stop,
>size_t bytes, void **ptrs)
> - {
> +{
>   u8 **dptr = (u8 **)ptrs;
>   u8 *p, *q;
>   int d, z, z0;
> @@ -471,7 +471,7 @@ static void raid6_sse24_gen_syndrome(int disks, size_t 
> bytes, void **ptrs)
>   }
>   asm volatile("sfence" : : : "memory");
>   kernel_fpu_end();
> - }
> +}
>  
>  
>  const struct raid6_calls raid6_sse2x4 = {
> diff --git a/sound/soc/fsl/fsl_dma.c b/sound/soc/fsl/fsl_dma.c
> index 0c11f434a374..ec619f51d336 100644
> --- a/sound/soc/fsl/fsl_dma.c
> +++ b/sound/soc/fsl/fsl_dma.c
> @@ -879,7 +879,7 @@ static const struct snd_pcm_ops fsl_dma_ops = {
>  };
>  
>  static int fsl_soc_dma_probe(struct platform_device *pdev)
> - {
> +{
>   struct dma_object *dma;
>   struct device_node *np = pdev->dev.of_node;
>   struct device_node *ssi_np;
> 
> --

Acked-by: Rafael J. Wysocki <rafael.j.wyso...@intel.com>

for the ACPI part.

Thanks!



Re: [v2 PATCH] cpufreq: powernv: Correctly parse the sign of pstates on POWER8 vs POWER9

2017-12-08 Thread Rafael J. Wysocki
On Fri, Dec 8, 2017 at 12:47 PM, Michael Ellerman <m...@ellerman.id.au> wrote:
> "Rafael J. Wysocki" <raf...@kernel.org> writes:
>
>> On Thu, Dec 7, 2017 at 6:59 AM, Gautham R. Shenoy
>> <e...@linux.vnet.ibm.com> wrote:
>>> From: "Gautham R. Shenoy" <e...@linux.vnet.ibm.com>
>>>
>>> On POWERNV platform, Pstates are 8-bit values. On POWER8 they are
>>> negatively numbered while on POWER9 they are positively
>>> numbered. Thus, on POWER9, the maximum number of pstates could be as
>>> high as 256.
>>>
>>> The current code interprets pstates as a signed 8-bit value. This
>>> causes a problem on POWER9 platforms which have more than 128 pstates.
>>> On such systems, on a CPU that is in a lower pstate whose number is
>>> greater than 128, querying the current pstate returns a "pstate X is
>>> out of bound" error message and the current pstate is reported as the
>>> nominal pstate.
>>>
>>> This patch fixes the aforementioned issue by correctly differentiating
>>> the sign whenever a pstate value read, depending on whether the
>>> pstates are positively numbered or negatively numbered.
>>>
>>> Fixes: commit 09ca4c9b5958 ("cpufreq: powernv: Replacing pstate_id with 
>>> frequency table index")
>>> Cc: <sta...@vger.kernel.org> #v4.8
>>> Signed-off-by: Gautham R. Shenoy <e...@linux.vnet.ibm.com>
>>> Tested-and-reviewed-by: Shilpasri G Bhat <shilpa.b...@linux.vnet.ibm.com>
>>> Acked-by: Viresh Kumar <viresh.ku...@linaro.org>
>>
>> I'm going to apply this, or please let me know if you want to route it
>> differently.
>
> Do you mind waiting for now, we're still debating how to fix it.

No problem. :-)

Just please let me know when you're ready.


Re: [v2 PATCH] cpufreq: powernv: Correctly parse the sign of pstates on POWER8 vs POWER9

2017-12-07 Thread Rafael J. Wysocki
On Thu, Dec 7, 2017 at 6:59 AM, Gautham R. Shenoy
 wrote:
> From: "Gautham R. Shenoy" 
>
> On POWERNV platform, Pstates are 8-bit values. On POWER8 they are
> negatively numbered while on POWER9 they are positively
> numbered. Thus, on POWER9, the maximum number of pstates could be as
> high as 256.
>
> The current code interprets pstates as a signed 8-bit value. This
> causes a problem on POWER9 platforms which have more than 128 pstates.
> On such systems, on a CPU that is in a lower pstate whose number is
> greater than 128, querying the current pstate returns a "pstate X is
> out of bound" error message and the current pstate is reported as the
> nominal pstate.
>
> This patch fixes the aforementioned issue by correctly differentiating
> the sign whenever a pstate value read, depending on whether the
> pstates are positively numbered or negatively numbered.
>
> Fixes: commit 09ca4c9b5958 ("cpufreq: powernv: Replacing pstate_id with 
> frequency table index")
> Cc:  #v4.8
> Signed-off-by: Gautham R. Shenoy 
> Tested-and-reviewed-by: Shilpasri G Bhat 
> Acked-by: Viresh Kumar 

I'm going to apply this, or please let me know if you want to route it
differently.

> ---
>  drivers/cpufreq/powernv-cpufreq.c | 43 
> ++-
>  1 file changed, 33 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/cpufreq/powernv-cpufreq.c 
> b/drivers/cpufreq/powernv-cpufreq.c
> index b6d7c4c..bb7586e 100644
> --- a/drivers/cpufreq/powernv-cpufreq.c
> +++ b/drivers/cpufreq/powernv-cpufreq.c
> @@ -41,11 +41,14 @@
>  #define POWERNV_MAX_PSTATES256
>  #define PMSR_PSAFE_ENABLE  (1UL << 30)
>  #define PMSR_SPR_EM_DISABLE(1UL << 31)
> -#define PMSR_MAX(x)((x >> 32) & 0xFF)
> +#define EXTRACT_BYTE(x, shift) (((x) >> shift) & 0xFF)
> +#define MAX_SHIFT  32
>  #define LPSTATE_SHIFT  48
>  #define GPSTATE_SHIFT  56
> -#define GET_LPSTATE(x) (((x) >> LPSTATE_SHIFT) & 0xFF)
> -#define GET_GPSTATE(x) (((x) >> GPSTATE_SHIFT) & 0xFF)
> +#define GET_PMSR_MAX(x)EXTRACT_BYTE(x, MAX_SHIFT)
> +#define GET_LPSTATE(x) EXTRACT_BYTE(x, LPSTATE_SHIFT)
> +#define GET_GPSTATE(x) EXTRACT_BYTE(x, GPSTATE_SHIFT)
> +
>
>  #define MAX_RAMP_DOWN_TIME 5120
>  /*
> @@ -64,6 +67,12 @@
>
>  /* Interval after which the timer is queued to bring down global pstate */
>  #define GPSTATE_TIMER_INTERVAL 2000
> +/*
> + * On POWER8 the pstates are negatively numbered. On POWER9, they are
> + * positively numbered.  Use this flag to track whether we have
> + * positive or negative numbered pstates.
> + */
> +static bool pos_pstates;
>
>  /**
>   * struct global_pstate_info - Per policy data structure to maintain history 
> of
> @@ -164,7 +173,7 @@ static inline unsigned int pstate_to_idx(int pstate)
> int min = powernv_freqs[powernv_pstate_info.min].driver_data;
> int max = powernv_freqs[powernv_pstate_info.max].driver_data;
>
> -   if (min > 0) {
> +   if (pos_pstates) {
> if (unlikely((pstate < max) || (pstate > min))) {
> pr_warn_once("pstate %d is out of bound\n", pstate);
> return powernv_pstate_info.nominal;
> @@ -301,6 +310,9 @@ static int init_powernv_pstates(void)
> }
> }
>
> +   if ((int)pstate_min > 0)
> +   pos_pstates = true;
> +
> /* End of list marker entry */
> powernv_freqs[i].frequency = CPUFREQ_TABLE_END;
> return 0;
> @@ -438,7 +450,6 @@ struct powernv_smp_call_data {
>  static void powernv_read_cpu_freq(void *arg)
>  {
> unsigned long pmspr_val;
> -   s8 local_pstate_id;
> struct powernv_smp_call_data *freq_data = arg;
>
> pmspr_val = get_pmspr(SPRN_PMSR);
> @@ -447,8 +458,11 @@ static void powernv_read_cpu_freq(void *arg)
>  * The local pstate id corresponds bits 48..55 in the PMSR.
>  * Note: Watch out for the sign!
>  */
> -   local_pstate_id = (pmspr_val >> 48) & 0xFF;
> -   freq_data->pstate_id = local_pstate_id;
> +   if (pos_pstates)
> +   freq_data->pstate_id = (u8)GET_LPSTATE(pmspr_val);
> +   else
> +   freq_data->pstate_id = (s8)GET_LPSTATE(pmspr_val);
> +
> freq_data->freq = pstate_id_to_freq(freq_data->pstate_id);
>
> pr_debug("cpu %d pmsr %016lX pstate_id %d frequency %d kHz\n",
> @@ -522,7 +536,10 @@ static void powernv_cpufreq_throttle_check(void *data)
> chip = this_cpu_read(chip_info);
>
> /* Check for Pmax Capping */
> -   pmsr_pmax = (s8)PMSR_MAX(pmsr);
> +   if (pos_pstates)
> +   pmsr_pmax = (u8)GET_PMSR_MAX(pmsr);
> +   else
> +   pmsr_pmax = (s8)GET_PMSR_MAX(pmsr);
> 

Re: Linux 4.14: Reported regressions as of Sunday, 2017-10-08

2017-10-11 Thread Rafael J. Wysocki
On Sunday, October 8, 2017 2:37:41 PM CEST Thorsten Leemhuis wrote:
> Hi! Find below my second regression report for Linux 4.14. It lists 8
> regressions I'm currently aware of. One regression was fixed since last
> weeks report. One was in there that shouldn't have been there.
> 
> As always: Are you aware of any other regressions? Then please let me
> know by mail (a simple bounce in my direction is enough!). For details
> see http://bit.ly/lnxregtrackid And please tell me if there is anything
> in the report that shouldn't be there.
> 
> Ciao, Thorsten
> 
> P.S.: Thx to Adam and Igor for pointing me at two regressions they face.
> And thx to Yanko for pointing out a stupidity I did in last weeks report.
> 
> == Current regressions ==
> 
> "hangs when building e.g. perf" & "Random insta-reboots on AMD Phenom II"
> Status: "Mr. Luto better revert the new lazy TLB flushing fun'n'games"
> -> "Yeah, working on it.  It's not a straightforward revert."
> Note: TWIMC: Workaround: wrmsr -a 0xc0010015 0x118
> Reported: 2017-09-05
> https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1484723.html
> https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1501379.html
> https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1501570.html
> Cause: 94b1b03b519b81c494900cb112aa00ed205cc2d9
> 
> New default s2idle does not work on Dell XPS 13 9360
> Status: works fine on several other owners of this laptop; maybe
> specific to the variant or the particular machine the reporter owns;
> looks related to the storage device used
> Reported: 2017-09-11
> https://bugzilla.kernel.org/show_bug.cgi?id=196907
> Cause: e870c6c87cf9484090d28f2a68aa29e008960c93 (assumed)

This actually turns out to be an s2idle regression on the reporter's machine
that was introduced between 4.11-rc1 and 4.13 and appears to be related to NVMe
(and specifically to the particular Hynix 512G NVMe SSD in that machine).

Due to the lack of reproducibility, we need the reporter to bisect it to
make progress (on his system it appears to be reproducible 100% of the time).

Thanks,
Rafael



Re: [PATCH 03/13] timer: Remove init_timer_on_stack() in favor of timer_setup_on_stack()

2017-10-05 Thread Rafael J. Wysocki
On Thu, Oct 5, 2017 at 1:26 AM, Kees Cook <keesc...@chromium.org> wrote:
> Remove uses of init_timer_on_stack() with open-coded function and data
> assignments that could be expressed using timer_setup_on_stack(). Several
> were removed from the stack entirely since there was a one-to-one mapping
> of parent structure to timer, those are switched to using timer_setup()
> instead. All related callbacks were adjusted to use from_timer().
>
> Cc: "Rafael J. Wysocki" <r...@rjwysocki.net>
> Cc: Pavel Machek <pa...@ucw.cz>
> Cc: Len Brown <len.br...@intel.com>
> Cc: Greg Kroah-Hartman <gre...@linuxfoundation.org>
> Cc: Stefan Richter <stef...@s5r6.in-berlin.de>
> Cc: Sudip Mukherjee <sudipm.mukher...@gmail.com>
> Cc: Martin Schwidefsky <schwidef...@de.ibm.com>
> Cc: Heiko Carstens <heiko.carst...@de.ibm.com>
> Cc: Julian Wiedmann <j...@linux.vnet.ibm.com>
> Cc: Ursula Braun <ubr...@linux.vnet.ibm.com>
> Cc: Michael Reed <m...@sgi.com>
> Cc: "James E.J. Bottomley" <j...@linux.vnet.ibm.com>
> Cc: "Martin K. Petersen" <martin.peter...@oracle.com>
> Cc: Thomas Gleixner <t...@linutronix.de>
> Cc: linux...@vger.kernel.org
> Cc: linux1394-de...@lists.sourceforge.net
> Cc: linux-s...@vger.kernel.org
> Cc: linux-s...@vger.kernel.org
> Signed-off-by: Kees Cook <keesc...@chromium.org>
> ---
>  drivers/base/power/main.c   |  8 +++-
>  drivers/firewire/core-transaction.c | 10 +-
>  drivers/parport/ieee1284.c  | 21 +++--
>  drivers/s390/char/tape.h|  1 +
>  drivers/s390/char/tape_std.c| 18 ++
>  drivers/s390/net/lcs.c  | 16 ++--
>  drivers/s390/net/lcs.h  |  1 +
>  drivers/scsi/qla1280.c  | 14 +-
>  drivers/scsi/qla1280.h  |  1 +
>  include/linux/parport.h |  1 +
>  include/linux/timer.h   |  2 --
>  11 files changed, 36 insertions(+), 57 deletions(-)
>
> diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> index 770b1539a083..ae47b2ec84b4 100644
> --- a/drivers/base/power/main.c
> +++ b/drivers/base/power/main.c
> @@ -478,9 +478,9 @@ struct dpm_watchdog {
>   * There's not much we can do here to recover so panic() to
>   * capture a crash-dump in pstore.
>   */
> -static void dpm_watchdog_handler(unsigned long data)
> +static void dpm_watchdog_handler(struct timer_list *t)
>  {
> -   struct dpm_watchdog *wd = (void *)data;
> +   struct dpm_watchdog *wd = from_timer(wd, t, timer);
>
> dev_emerg(wd->dev, " DPM device timeout \n");
> show_stack(wd->tsk, NULL);
> @@ -500,11 +500,9 @@ static void dpm_watchdog_set(struct dpm_watchdog *wd, 
> struct device *dev)
> wd->dev = dev;
> wd->tsk = current;
>
> -   init_timer_on_stack(timer);
> +   timer_setup_on_stack(timer, dpm_watchdog_handler, 0);
> /* use same timeout value for both suspend and resume */
> timer->expires = jiffies + HZ * CONFIG_DPM_WATCHDOG_TIMEOUT;
> -   timer->function = dpm_watchdog_handler;
> -   timer->data = (unsigned long)wd;
> add_timer(timer);
>  }

For the above:

Acked-by: Rafael J. Wysocki <rafael.j.wyso...@intel.com>


Re: [PATCH V3 0/9] cpufreq: transition-latency cleanups

2017-07-19 Thread Rafael J. Wysocki
On Wednesday, July 19, 2017 03:42:40 PM Viresh Kumar wrote:
> Hi Rafael,
> 
> This series tries to cleanup the code around transition-latency and its
> users. Some of the old legacy code, which may not make much sense now,
> is dropped as well. And some code consolidation is also done across
> governors.
> 
> Based of: v4.13-rc1
> Tested on: ARM64 Hikey board.

>From the first quick look this version is fine by me.

Unless I find anything of concern later, this will be queued up for 4.14.

Thanks,
Rafael



Re: [1/3] cpuidle: powerpc: cpuidle set polling before enabling irqs

2017-06-30 Thread Rafael J. Wysocki
On Fri, Jun 30, 2017 at 5:45 AM, Michael Ellerman <m...@ellerman.id.au> wrote:
> "Rafael J. Wysocki" <raf...@kernel.org> writes:
>
>> On Thu, Jun 29, 2017 at 2:21 PM, Michael Ellerman
>> <patch-notificati...@ellerman.id.au> wrote:
>>> On Wed, 2017-06-14 at 13:02:39 UTC, Nicholas Piggin wrote:
>>>> local_irq_enable can cause interrupts to be taken which could
>>>> take significant amount of processing time. The idle process
>>>> should set its polling flag before this, so another process that
>>>> wakes it during this time will not have to send an IPI.
>>>>
>>>> Expand the TIF_POLLING_NRFLAG coverage to as large as possible.
>>>>
>>>> Reviewed-by: Gautham R. Shenoy <e...@linux.vnet.ibm.com>
>>>> Signed-off-by: Nicholas Piggin <npig...@gmail.com>
>>>
>>> Series applied to powerpc next, thanks.
>>>
>>> https://git.kernel.org/powerpc/c/3fc5ee927ff4ffed6aa2fcd44d2fbf
>>
>> OK
>>
>> I've applied it too, so I guess I should drop it?
>
> Erk sorry. I hadn't heard anything so I picked it up.
>
> If you can drop it that would be good, but if not git will probably work
> it out mostly :)

I've dropped it, no problem.

Thanks,
Rafael


Re: [1/3] cpuidle: powerpc: cpuidle set polling before enabling irqs

2017-06-29 Thread Rafael J. Wysocki
On Thu, Jun 29, 2017 at 2:21 PM, Michael Ellerman
 wrote:
> On Wed, 2017-06-14 at 13:02:39 UTC, Nicholas Piggin wrote:
>> local_irq_enable can cause interrupts to be taken which could
>> take significant amount of processing time. The idle process
>> should set its polling flag before this, so another process that
>> wakes it during this time will not have to send an IPI.
>>
>> Expand the TIF_POLLING_NRFLAG coverage to as large as possible.
>>
>> Reviewed-by: Gautham R. Shenoy 
>> Signed-off-by: Nicholas Piggin 
>
> Series applied to powerpc next, thanks.
>
> https://git.kernel.org/powerpc/c/3fc5ee927ff4ffed6aa2fcd44d2fbf

OK

I've applied it too, so I guess I should drop it?

Thanks,
Rafael


Re: [PATCH] kernel/power/suspend: use CONFIG_HAVE_SET_MEMORY for include condition

2017-06-26 Thread Rafael J. Wysocki
On Monday, June 26, 2017 01:34:52 PM Balbir Singh wrote:
> On Sat, Jun 3, 2017 at 11:27 PM, Pavel Machek <pa...@ucw.cz> wrote:
> > On Sat 2017-06-03 20:52:32, Balbir Singh wrote:
> >> Kbuild reported a build failure when CONFIG_STRICT_KERNEL_RWX was
> >> enabled on powerpc. We don't yet have ARCH_HAS_SET_MEMORY and ppc32
> >> saw a build failure.
> >>
> >> fixes(50327dd kernel/power/snapshot.c: use set_memory.h header)
> >>
> >> I've only done a basic compile test with a config that has
> >> hibernation enabled.
> >>
> >> Cc: "Rafael J. Wysocki" <r...@rjwysocki.net>
> >> Cc: Len Brown <len.br...@intel.com>
> > Acked-by: Pavel Machek <pa...@ucw.cz>
> 
> Ping. Could we please pick this up? it breaks any attempt to support
> STRICT_KERNEL_RWX on powerpc

Yes, I'm going to pick it up for 4.13.

Thanks,
Rafael



Re: [PATCH v2 1/2] powerpc/powernv/cpuidle: Pass correct drv->cpumask for registration

2017-03-23 Thread Rafael J. Wysocki
On Thu, Mar 23, 2017 at 4:22 PM, Vaidyanathan Srinivasan
 wrote:
> drv->cpumask defaults to cpu_possible_mask in __cpuidle_driver_init().
> On PowerNV platform cpu_present could be less than cpu_possible in cases
> where firmware detects the cpu, but it is not available to the OS.  When
> CONFIG_HOTPLUG_CPU=n, such cpus are not hotplugable at runtime and hence
> we skip creating cpu_device.
>
> This breaks cpuidle on powernv where register_cpu() is not called for
> cpus in cpu_possible_mask that cannot be hot-added at runtime.
>
> Trying cpuidle_register_device() on cpu without cpu_device will cause
> crash like this:
>
> cpu 0xf: Vector: 380 (Data SLB Access) at [c00ff1503490]
> pc: c022c8bc: string+0x34/0x60
> lr: c022ed78: vsnprintf+0x284/0x42c
> sp: c00ff1503710
>msr: 90009033
>dar: 60006000
>   current = 0xc00ff148
>   paca= 0xcfe82d00   softe: 0irq_happened: 0x01
> pid   = 1, comm = swapper/8
> Linux version 4.11.0-rc2 (sv@sagarika) (gcc version 4.9.4
> (Buildroot 2017.02-4-gc28573e) ) #15 SMP Fri Mar 17 19:32:02 IST 2017
> enter ? for help
> [link register   ] c022ed78 vsnprintf+0x284/0x42c
> [c00ff1503710] c022ebb8 vsnprintf+0xc4/0x42c (unreliable)
> [c00ff1503800] c022ef40 vscnprintf+0x20/0x44
> [c00ff1503830] c00ab61c vprintk_emit+0x94/0x2cc
> [c00ff15038a0] c00acc9c vprintk_func+0x60/0x74
> [c00ff15038c0] c0619694 printk+0x38/0x4c
> [c00ff15038e0] c0224950 kobject_get+0x40/0x60
> [c00ff1503950] c022507c kobject_add_internal+0x60/0x2c4
> [c00ff15039e0] c0225350 kobject_init_and_add+0x70/0x78
> [c00ff1503a60] c053c288 cpuidle_add_sysfs+0x9c/0xe0
> [c00ff1503ae0] c053aeac cpuidle_register_device+0xd4/0x12c
> [c00ff1503b30] c053b108 cpuidle_register+0x98/0xcc
> [c00ff1503bc0] c085eaf0 powernv_processor_idle_init+0x140/0x1e0
> [c00ff1503c60] c000cd60 do_one_initcall+0xc0/0x15c
> [c00ff1503d20] c0833e84 kernel_init_freeable+0x1a0/0x25c
> [c00ff1503dc0] c000d478 kernel_init+0x24/0x12c
> [c00ff1503e30] c000b564 ret_from_kernel_thread+0x5c/0x78
>
> This patch fixes the bug by passing correct cpumask from
> powernv-cpuidle driver.
>
> Signed-off-by: Vaidyanathan Srinivasan 

That needs to be ACKed by someone familiar with powernv.

> ---
>  drivers/cpuidle/cpuidle-powernv.c | 18 ++
>  1 file changed, 18 insertions(+)
>
> diff --git a/drivers/cpuidle/cpuidle-powernv.c 
> b/drivers/cpuidle/cpuidle-powernv.c
> index a06df51..82f7b33 100644
> --- a/drivers/cpuidle/cpuidle-powernv.c
> +++ b/drivers/cpuidle/cpuidle-powernv.c
> @@ -175,6 +175,24 @@ static int powernv_cpuidle_driver_init(void)
> drv->state_count += 1;
> }
>
> +   /*
> +* On PowerNV platform cpu_present may be less that cpu_possible in
> +* cases where firmware detects the cpu, but it is not available to 
> the
> +* OS.  If CONFIG_HOTPLUG_CPU=n then such CPUs are not hotplugable at
> +* runtime and hence cpu_devices are not created for those cpus by
> +* generic topology_init().
> +*
> +* drv->cpumask defaults to cpu_possible_mask in
> +* __cpuidle_driver_init().  This breaks cpuidle on powernv where
> +* cpu_devices are not created for cpus in cpu_possible_mask that
> +* cannot be hot-added later at runtime.
> +*
> +* Trying cpuidle_register_device() on a cpu without cpu_devices is
> +* incorrect. Hence pass correct cpu mask to generic cpuidle driver.
> +*/
> +
> +   drv->cpumask = (struct cpumask *)cpu_present_mask;
> +
> return 0;
>  }
>
> --
> 2.9.3
>


Re: [PATCH v2 2/2] cpuidle: Validate cpu_dev in cpuidle_add_sysfs

2017-03-23 Thread Rafael J. Wysocki
On Thu, Mar 23, 2017 at 4:22 PM, Vaidyanathan Srinivasan
 wrote:
> If a given cpu is not in cpu_present and cpu hotplug
> is disabled, arch can skip setting up the cpu_dev.
>
> Arch cpuidle driver should pass correct cpu mask
> for registration, but failing to do so by the driver
> causes error to propagate and crash like this:
>
> [   30.076045] Unable to handle kernel paging request for
> data at address 0x0048
> [   30.076100] Faulting instruction address: 0xc07b2f30
> cpu 0x4d: Vector: 300 (Data Access) at [c03feb18b670]
> pc: c07b2f30: kobject_get+0x20/0x70
> lr: c07b3c94: kobject_add_internal+0x54/0x3f0
> sp: c03feb18b8f0
>msr: 90009033
>dar: 48
>  dsisr: 4000
>   current = 0xc03fd2ed8300
>   paca= 0xcfbab500   softe: 0irq_happened: 0x01
> pid   = 1, comm = swapper/0
> Linux version 4.11.0-rc2-svaidy+ (sv@sagarika) (gcc version 6.2.0
> 20161005 (Ubuntu 6.2.0-5ubuntu12) ) #10 SMP Sun Mar 19 00:08:09 IST 2017
> enter ? for help
> [c03feb18b960] c07b3c94 kobject_add_internal+0x54/0x3f0
> [c03feb18b9f0] c07b43a4 kobject_init_and_add+0x64/0xa0
> [c03feb18ba70] c0e284f4 cpuidle_add_sysfs+0xb4/0x130
> [c03feb18baf0] c0e26038 cpuidle_register_device+0x118/0x1c0
> [c03feb18bb30] c0e26c48 cpuidle_register+0x78/0x120
> [c03feb18bbc0] c168fd9c powernv_processor_idle_init+0x110/0x1c4
> [c03feb18bc40] c000cff8 do_one_initcall+0x68/0x1d0
> [c03feb18bd00] c16242f4 kernel_init_freeable+0x280/0x360
> [c03feb18bdc0] c000d864 kernel_init+0x24/0x160
> [c03feb18be30] c000b4e8 ret_from_kernel_thread+0x5c/0x74
>
> Validating cpu_dev fixes the crash and reports correct error message like:
>
> [   30.163506] Failed to register cpuidle device for cpu136
> [   30.173329] Registration of powernv driver failed.
>
> Signed-off-by: Vaidyanathan Srinivasan 

The previous version is in linux-next already and I'm going to push it
for merging shortly.

> ---
>  drivers/cpuidle/sysfs.c | 12 
>  1 file changed, 12 insertions(+)
>
> diff --git a/drivers/cpuidle/sysfs.c b/drivers/cpuidle/sysfs.c
> index c5adc8c..f2c3bce 100644
> --- a/drivers/cpuidle/sysfs.c
> +++ b/drivers/cpuidle/sysfs.c
> @@ -615,6 +615,18 @@ int cpuidle_add_sysfs(struct cpuidle_device *dev)
> struct device *cpu_dev = get_cpu_device((unsigned long)dev->cpu);
> int error;
>
> +   /*
> +* Return error if cpu_device is not setup for this cpu.  This
> +* could happen if arch did not setup cpu_device since this
> +* cpu is not in cpu_present mask and the driver did not send
> +* correct cpu mask at registration.  Without this check we
> +* would end up passing bogus value for _dev->kobj in
> +* kobject_init_and_add().
> +*/
> +
> +   if (!cpu_dev)
> +   return -ENODEV;
> +
> kdev = kzalloc(sizeof(*kdev), GFP_KERNEL);
> if (!kdev)
> return -ENOMEM;
> --
> 2.9.3
>


Re: [PATCH 00/35] treewide trivial patches converting pr_warning to pr_warn

2017-02-17 Thread Rafael J. Wysocki
On Fri, Feb 17, 2017 at 8:11 AM, Joe Perches  wrote:
> There are ~4300 uses of pr_warn and ~250 uses of the older
> pr_warning in the kernel source tree.
>
> Make the use of pr_warn consistent across all kernel files.
>
> This excludes all files in tools/ as there is a separate
> define pr_warning for that directory tree and pr_warn is
> not used in tools/.
>
> Done with 'sed s/\bpr_warning\b/pr_warn/' and some emacsing.

Sorry about asking if that has been asked already.

Wouldn't it be slightly less intrusive to simply redefined
pr_warning() as a synonym for pr_warn()?

Thanks,
Rafael


Re: [PATCH v6 3/5] cpuidle:powernv: Add helper function to populate powernv idle states.

2017-01-29 Thread Rafael J. Wysocki
On Mon, Jan 30, 2017 at 4:47 AM, Michael Ellerman  wrote:
> "Gautham R. Shenoy"  writes:
>
>> From: "Gautham R. Shenoy" 
>>
>> In the current code for powernv_add_idle_states, there is a lot of code
>> duplication while initializing an idle state in powernv_states table.
>>
>> Add an inline helper function to populate the powernv_states[] table
>> for a given idle state. Invoke this for populating the "Nap",
>> "Fastsleep" and the stop states in powernv_add_idle_states.
>>
>> Acked-by: Balbir Singh 
>> Signed-off-by: Gautham R. Shenoy 
>> ---
>>  drivers/cpuidle/cpuidle-powernv.c | 89 
>> +++
>>  include/linux/cpuidle.h   |  1 +
>
> I was going to merge this, but I see you've touched cpuidle.h, so I feel
> like I should get an ACK from the cpuidle folks.
>
> It's a fairly uncontroversial change, but it's their API.

OK, please add an ACK from me to it then.

Thanks,
Rafael


Re: [PATCH] cpuidle/powernv: staticise powernv_idle_driver

2016-11-28 Thread Rafael J. Wysocki
On Tue, Nov 22, 2016 at 6:08 AM, Andrew Donnellan
 wrote:
> powernv_idle_driver isn't exported, it can be made static. Found by sparse.
>
> Signed-off-by: Andrew Donnellan 

Applied.

Thanks,
Rafael


Re: [PATCH v2] cpufreq, powernv: Disable preemption while checking CPU throttling state throttling state

2016-11-22 Thread Rafael J. Wysocki
On Tue, Nov 22, 2016 at 2:07 PM, Denis Kirjanov  wrote:
> With preemption turned on we can read incorrect throttling state while being 
> switched
> to CPU on a different chip.
> The following BUG_() was hit while running the 4.8-rc5 kernel compiled
> with CONFIG_PREEMPT_DEBUG on a POWER machine.
>
>
> [   67.700897] BUG: using smp_processor_id() in preemptible [] 
> code: cat/7343
> [   67.700988] caller is .powernv_cpufreq_throttle_check+0x2c/0x710
> [   67.700998] CPU: 13 PID: 7343 Comm: cat Not tainted 4.8.0-rc5-dirty #1
> [   67.701038] Call Trace:
> [   67.701066] [c007d25b75b0] [c0971378] 
> .dump_stack+0xe4/0x150 (unreliable)
> [   67.701153] [c007d25b7640] [c05162e4] 
> .check_preemption_disabled+0x134/0x150
> [   67.701238] [c007d25b76e0] [c07b63ac] 
> .powernv_cpufreq_throttle_check+0x2c/0x710
> [   67.701322] [c007d25b7790] [c07b6d18] 
> .powernv_cpufreq_target_index+0x288/0x360
> [   67.701407] [c007d25b7870] [c07acee4] 
> .__cpufreq_driver_target+0x394/0x8c0
> [   67.701491] [c007d25b7920] [c07b22ac] 
> .cpufreq_set+0x7c/0xd0
> [   67.701565] [c007d25b79b0] [c07adf50] 
> .store_scaling_setspeed+0x80/0xc0
> [   67.701650] [c007d25b7a40] [c07ae270] .store+0xa0/0x100
> [   67.701723] [c007d25b7ae0] [c03566e8] 
> .sysfs_kf_write+0x88/0xb0
> [   67.701796] [c007d25b7b70] [c03553b8] 
> .kernfs_fop_write+0x178/0x260
> [   67.701881] [c007d25b7c10] [c02ac3cc] 
> .__vfs_write+0x3c/0x1c0
> [   67.701954] [c007d25b7cf0] [c02ad584] .vfs_write+0xc4/0x230
> [   67.702027] [c007d25b7d90] [c02aeef8] .SyS_write+0x58/0x100
> [   67.702101] [c007d25b7e30] [c000bfec] system_call+0x38/0xfc
>
> The task which updates the cpu-frequency checks if the chip was
> throttled. However, it does so without protecting itself against
> preemption. Hence, if the task is preempted and woken up on a CPU on
> another chip, it can end up updating the throttle statistics of the
> incorrect chip.
>
> This patch fixes the issue by ensuring that the throttle_check is
> performed with preemption disabled.
>
> Fixes: 09a972d16209 ("cpufreq: powernv: Report cpu frequency throttling")
> Reviewed-by: Gautham R. Shenoy 
> Signed-off-by: Denis Kirjanov 

That's already queued up for v4.10, you don't have to resend it.

Thanks,
Rafael


Re: [PATCH -next] powernv: cpufreq: Fix uninitialized lpstate_idx in gpstates_timer_handler

2016-11-14 Thread Rafael J. Wysocki
On Monday, November 14, 2016 05:29:27 PM Akshay Adiga wrote:
> lpstate_idx remains uninitialized in the case when elapsed_time
> is greater than MAX_RAMP_DOWN_TIME. At the end of rampdown
> global pstate should be equal to local pstate.
> 
> Fixes: 20b15b766354 ("cpufreq: powernv: Use PMCR to verify global and
> localpstate")
> Reported-by: Stephen Rothwell 
> Signed-off-by: Akshay Adiga 
> ---
>  drivers/cpufreq/powernv-cpufreq.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/cpufreq/powernv-cpufreq.c 
> b/drivers/cpufreq/powernv-cpufreq.c
> index c82304b..c5c5bc3 100644
> --- a/drivers/cpufreq/powernv-cpufreq.c
> +++ b/drivers/cpufreq/powernv-cpufreq.c
> @@ -624,6 +624,7 @@ void gpstate_timer_handler(unsigned long data)
>  
>   if (gpstates->elapsed_time > MAX_RAMP_DOWN_TIME) {
>   gpstate_idx = pstate_to_idx(freq_data.pstate_id);
> + lpstate_idx = gpstate_idx;
>   reset_gpstates(policy);
>   gpstates->highest_lpstate_idx = gpstate_idx;
>   } else {
> 

Applied.

Thanks,
Rafael



Re: [PATCH v2 0/3] powernv:stop: Use psscr_val, mask provided by firmware

2016-10-31 Thread Rafael J. Wysocki
On Thursday, October 27, 2016 02:05:04 PM Gautham R. Shenoy wrote:
> From: "Gautham R. Shenoy" 
> 
> Hi,
> 
> This is the second iteration of the patchset to use the psscr_val and
> psscr_mask provided by the firmware for each of the stop states.
> 
> The previous version can be found here: 
> https://lkml.org/lkml/2016/9/29/45
> 
> The main changes in this version are:
> 
> 1) Add a helper function in powernv-cpuidle.c to help initialize the
>powernv_states cpuidle table.
> 
> 2) Handle the older firmware which populates only the Requested Level
>(RL) fields of the psscr and psscr_mask in the device tree. This
>patchset ensures that in the presence of the older firmware, the
>other fields of PSSCR are initalized to sane default values.
> 
> 
> Synopsis
> ==
> In the current implementation, the code for ISA v3.0 stop
> implementation has a couple of shortcomings.
> 
> a) The code hand-codes the values for ESL,EC,TR,MTL bits of PSSCR and
>uses only the RL field from the firmware. While this is not
>incorrect, since the hand-coded values are legitimate, it is not a
>very flexible design since the firmware has the capability to
>communicate these values via the "ibm,cpu-idle-state-psscr" and
>"ibm,cpu-idle-state-psscr-mask" properties. In case where the
>firmware provides values for these fields that is different from
>the hand-coded values, the current code will not work as intended.
> 
> b) Due to issue a), the current code assumes that ESL=EC=1 for all the
>stop states and hence the wakeup from the stop instruction will
>happen at 0x100, the system-reset vector. However, the ISA v3.0
>allows the ESL=EC=0 behaviour where the corresponding stop-state
>loses no state and wakes up from the subsequent instruction. The
>current code doesn't handle this case.
>
> This patch series addresses these issues.
> 
> The first patch in the series renames the existing
> IDLE_STATE_ENTER_SEQ macro to IDLE_STATE_ENTER_SEQ_NORET. It reuses
> the name IDLE_STATE_ENTER_SEQ for entering into stop-states which wake
> up at the subsequent instruction.
> 
> The second patch adds a helper function in cpuidle-powernv.c for
> initializing entries of the powernv_states[] table that is passed to
> the cpu-idle core. This eliminates some of the code duplication in the
> function that discovers and initializes the stop states.
> 
> The third patch in the series fixes issues a) and b) by ensuring that
> the psscr-value and the psscr-mask provided by the firmware are what
> will be used to set a particular stop state. It also adds support for
> handling wake-up from stop states which were entered with ESL=EC=0.
> 
> The third patch also handles the older firmware which sets only the
> Requested Level (RL) field in the psscr and psscr-mask exposed in the
> device tree. In the presence of such older firmware, this patch will
> set the default sane values for for remaining PSSCR fields (i.e PSLL,
> MTL, ESL, EC, and TR).
> 
> The skiboot patch populates all the relevant fields in the PSSCR
> values and the mask for all the stop states can be found here:
> https://lists.ozlabs.org/pipermail/skiboot/2016-September/004869.html
> 
> The patches are based on top of
> git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git fixes
> 
> Gautham R. Shenoy (3):
>   powernv:idle: Add IDLE_STATE_ENTER_SEQ_NORET macro
>   cpuidle:powernv: Add helper function to populate powernv idle states.
>   powernv: Pass PSSCR value and mask to power9_idle_stop

OK

I need someone to review this for me.

Thanks,
Rafael



  1   2   3   >