Re: [PATCH V7 5/7] cpufreq: Register notifiers with the PM QoS framework

2019-10-16 Thread Rafael J. Wysocki
On Wed, Oct 16, 2019 at 10:27 AM Viresh Kumar  wrote:
>
> On 15-10-19, 23:50, Rafael J. Wysocki wrote:
> > On Tue, Oct 15, 2019 at 5:53 PM Rafael J. Wysocki  wrote:
>
> > > > - Update QoS framework with the knowledge of related CPUs, this has 
> > > > been pending
> > > >   until now from my side. And this is the thing we really need to do. 
> > > > Eventually
> > > >   we shall have only a single notifier list for all CPUs of a policy, 
> > > > at least
> > > >   for MIN/MAX frequencies.
> > >
> > > - Move the PM QoS requests and notifiers to the new policy CPU on all
> > > changes of that.  That is, when cpufreq_offline() nominates the new
> > > "leader", all of the QoS stuff for the policy needs to go to this one.
> >
> > Alas, that still will not work, because things like
> > acpi_processor_ppc_init() only work accidentally for one-CPU policies.
>
> I am not sure what problem you see here ? Can you please explain a bit more.

Never mind, sorry.  This is called for policy->cpu too.

> > Generally, adding such a PM QoS request to a non-policy CPU simply has
> > no effect until it becomes a policy CPU which may be never.
>
> I was thinking maybe we can read the constraints for all CPUs in the
> policy->cpus mask in cpufreq_set_policy() and so this part of the problem will
> just go away. The only part that would be left is to remove the QoS 
> constraints
> properly.

That would be on the complicated side IMO.

> > It looks like using device PM QoS for cpufreq is a mistake in general
> > and what is needed is a struct pm_qos_constraints member in struct
> > cpufreq_policy and something like
> >
> > struct freq_pm_qos_request {
> > enum freq_pm_qos_req_type type; /* min or max */
> > struct plist_node pnode;
> > struct cpufreq_policy *policy;
> > };
> >
> > Then, pm_qos_update_target() can be used for adding, updating and
> > removing requests.

I have patches implementing this idea, more or less, almost ready,
stay tuned. :-)


Re: [PATCH V7 5/7] cpufreq: Register notifiers with the PM QoS framework

2019-10-16 Thread Viresh Kumar
On 15-10-19, 23:50, Rafael J. Wysocki wrote:
> On Tue, Oct 15, 2019 at 5:53 PM Rafael J. Wysocki  wrote:

> > > - Update QoS framework with the knowledge of related CPUs, this has been 
> > > pending
> > >   until now from my side. And this is the thing we really need to do. 
> > > Eventually
> > >   we shall have only a single notifier list for all CPUs of a policy, at 
> > > least
> > >   for MIN/MAX frequencies.
> >
> > - Move the PM QoS requests and notifiers to the new policy CPU on all
> > changes of that.  That is, when cpufreq_offline() nominates the new
> > "leader", all of the QoS stuff for the policy needs to go to this one.
> 
> Alas, that still will not work, because things like
> acpi_processor_ppc_init() only work accidentally for one-CPU policies.

I am not sure what problem you see here ? Can you please explain a bit more.

> Generally, adding such a PM QoS request to a non-policy CPU simply has
> no effect until it becomes a policy CPU which may be never.

I was thinking maybe we can read the constraints for all CPUs in the
policy->cpus mask in cpufreq_set_policy() and so this part of the problem will
just go away. The only part that would be left is to remove the QoS constraints
properly.

> It looks like using device PM QoS for cpufreq is a mistake in general
> and what is needed is a struct pm_qos_constraints member in struct
> cpufreq_policy and something like
> 
> struct freq_pm_qos_request {
> enum freq_pm_qos_req_type type; /* min or max */
> struct plist_node pnode;
> struct cpufreq_policy *policy;
> };
> 
> Then, pm_qos_update_target() can be used for adding, updating and
> removing requests.

-- 
viresh


Re: [PATCH V7 5/7] cpufreq: Register notifiers with the PM QoS framework

2019-10-15 Thread Rafael J. Wysocki
On Tue, Oct 15, 2019 at 5:53 PM Rafael J. Wysocki  wrote:
>
> On Tue, Oct 15, 2019 at 1:46 PM Viresh Kumar  wrote:
> >
> > On 22-09-19, 23:12, Dmitry Osipenko wrote:
> > > Hello Viresh,
> > >
> > > This patch causes use-after-free on a cpufreq driver module reload. 
> > > Please take a look, thanks in advance.
> > >
> > >
> > > [   87.952369] 
> > > ==
> > > [   87.953259] BUG: KASAN: use-after-free in 
> > > notifier_chain_register+0x4f/0x9c
> > > [   87.954031] Read of size 4 at addr e6abbd0c by task modprobe/243
> > >
> > > [   87.954901] CPU: 1 PID: 243 Comm: modprobe Tainted: GW
> > > 5.3.0-next-20190920-00185-gf61698eab956-dirty #2408
> > > [   87.956077] Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
> > > [   87.956807] [] (unwind_backtrace) from [] 
> > > (show_stack+0x11/0x14)
> > > [   87.957709] [] (show_stack) from [] 
> > > (dump_stack+0x89/0x98)
> > > [   87.958616] [] (dump_stack) from []
> > > (print_address_description.constprop.0+0x3d/0x340)
> > > [   87.959785] [] (print_address_description.constprop.0) from 
> > > []
> > > (__kasan_report+0xe3/0x12c)
> > > [   87.960907] [] (__kasan_report) from [] 
> > > (notifier_chain_register+0x4f/0x9c)
> > > [   87.962001] [] (notifier_chain_register) from []
> > > (blocking_notifier_chain_register+0x29/0x3c)
> > > [   87.963180] [] (blocking_notifier_chain_register) from 
> > > []
> > > (dev_pm_qos_add_notifier+0x79/0xf8)
> > > [   87.964339] [] (dev_pm_qos_add_notifier) from [] 
> > > (cpufreq_online+0x5e1/0x8a4)
> > > [   87.965351] [] (cpufreq_online) from [] 
> > > (cpufreq_add_dev+0x79/0x80)
> > > [   87.966247] [] (cpufreq_add_dev) from [] 
> > > (subsys_interface_register+0xc3/0x100)
> > > [   87.967297] [] (subsys_interface_register) from []
> > > (cpufreq_register_driver+0x13b/0x1ec)
> > > [   87.968476] [] (cpufreq_register_driver) from []
> > > (tegra20_cpufreq_probe+0x165/0x1a8 [tegra20_cpufreq])
> >
> > Hi Dmitry,
> >
> > Thanks for the bug report and I was finally able to reproduce it at my end 
> > and
> > this was quite an interesting debugging exercise :)
> >
> > When a cpufreq driver gets registered, we register with the subsys 
> > interface and
> > it calls cpufreq_add_dev() for each CPU, starting from CPU0. And so the QoS
> > notifiers get added to the first CPU of the policy, i.e. CPU0 in common 
> > cases.
> >
> > When the cpufreq driver gets unregistered, we unregister with the subsys
> > interface and it calls cpufreq_remove_dev() for each CPU, starting from CPU0
> > (should have been in reverse order I feel). We remove the QoS notifier only 
> > when
> > cpufreq_remove_dev() gets called for the last CPU of the policy, lets call 
> > it
> > CPUx. Now this has a different notifier list as compared to CPU0.
>
> The same problem will appear if the original policy CPU goes offline, won't 
> it?
>
> > In short, we are adding the cpufreq notifiers to CPU0 and removing them from
> > CPUx. When we try to add it again by inserting the module for second time, 
> > we
> > find a node in the notifier list which is already freed but still in the 
> > list as
> > we removed it from CPUx's list (which doesn't do anything as the node wasn't
> > there in the first place).
> >
> > @Rafael: How do you see we solve this problem ? Here are the options I could
> > think of:
> >
> > - Update subsys layer to reverse the order of devices while unregistering 
> > (this
> >   will fix the current problem, but we will still have corner cases hanging
> >   around, like if the CPU0 is hotplugged out, etc).
>
> This isn't sufficient for the offline case.
>
> > - Update QoS framework with the knowledge of related CPUs, this has been 
> > pending
> >   until now from my side. And this is the thing we really need to do. 
> > Eventually
> >   we shall have only a single notifier list for all CPUs of a policy, at 
> > least
> >   for MIN/MAX frequencies.
>
> - Move the PM QoS requests and notifiers to the new policy CPU on all
> changes of that.  That is, when cpufreq_offline() nominates the new
> "leader", all of the QoS stuff for the policy needs to go to this one.

Alas, that still will not work, because things like
acpi_processor_ppc_init() only work accidentally for one-CPU policies.
Generally, adding such a PM QoS request to a non-policy CPU simply has
no effect until it becomes a policy CPU which may be never.

It looks like using device PM QoS for cpufreq is a mistake in general
and what is needed is a struct pm_qos_constraints member in struct
cpufreq_policy and something like

struct freq_pm_qos_request {
enum freq_pm_qos_req_type type; /* min or max */
struct plist_node pnode;
struct cpufreq_policy *policy;
};

Then, pm_qos_update_target() can be used for adding, updating and
removing requests.


Re: [PATCH V7 5/7] cpufreq: Register notifiers with the PM QoS framework

2019-10-15 Thread Rafael J. Wysocki
On Tue, Oct 15, 2019 at 1:46 PM Viresh Kumar  wrote:
>
> On 22-09-19, 23:12, Dmitry Osipenko wrote:
> > Hello Viresh,
> >
> > This patch causes use-after-free on a cpufreq driver module reload. Please 
> > take a look, thanks in advance.
> >
> >
> > [   87.952369] 
> > ==
> > [   87.953259] BUG: KASAN: use-after-free in 
> > notifier_chain_register+0x4f/0x9c
> > [   87.954031] Read of size 4 at addr e6abbd0c by task modprobe/243
> >
> > [   87.954901] CPU: 1 PID: 243 Comm: modprobe Tainted: GW
> > 5.3.0-next-20190920-00185-gf61698eab956-dirty #2408
> > [   87.956077] Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
> > [   87.956807] [] (unwind_backtrace) from [] 
> > (show_stack+0x11/0x14)
> > [   87.957709] [] (show_stack) from [] 
> > (dump_stack+0x89/0x98)
> > [   87.958616] [] (dump_stack) from []
> > (print_address_description.constprop.0+0x3d/0x340)
> > [   87.959785] [] (print_address_description.constprop.0) from 
> > []
> > (__kasan_report+0xe3/0x12c)
> > [   87.960907] [] (__kasan_report) from [] 
> > (notifier_chain_register+0x4f/0x9c)
> > [   87.962001] [] (notifier_chain_register) from []
> > (blocking_notifier_chain_register+0x29/0x3c)
> > [   87.963180] [] (blocking_notifier_chain_register) from 
> > []
> > (dev_pm_qos_add_notifier+0x79/0xf8)
> > [   87.964339] [] (dev_pm_qos_add_notifier) from [] 
> > (cpufreq_online+0x5e1/0x8a4)
> > [   87.965351] [] (cpufreq_online) from [] 
> > (cpufreq_add_dev+0x79/0x80)
> > [   87.966247] [] (cpufreq_add_dev) from [] 
> > (subsys_interface_register+0xc3/0x100)
> > [   87.967297] [] (subsys_interface_register) from []
> > (cpufreq_register_driver+0x13b/0x1ec)
> > [   87.968476] [] (cpufreq_register_driver) from []
> > (tegra20_cpufreq_probe+0x165/0x1a8 [tegra20_cpufreq])
>
> Hi Dmitry,
>
> Thanks for the bug report and I was finally able to reproduce it at my end and
> this was quite an interesting debugging exercise :)
>
> When a cpufreq driver gets registered, we register with the subsys interface 
> and
> it calls cpufreq_add_dev() for each CPU, starting from CPU0. And so the QoS
> notifiers get added to the first CPU of the policy, i.e. CPU0 in common cases.
>
> When the cpufreq driver gets unregistered, we unregister with the subsys
> interface and it calls cpufreq_remove_dev() for each CPU, starting from CPU0
> (should have been in reverse order I feel). We remove the QoS notifier only 
> when
> cpufreq_remove_dev() gets called for the last CPU of the policy, lets call it
> CPUx. Now this has a different notifier list as compared to CPU0.

The same problem will appear if the original policy CPU goes offline, won't it?

> In short, we are adding the cpufreq notifiers to CPU0 and removing them from
> CPUx. When we try to add it again by inserting the module for second time, we
> find a node in the notifier list which is already freed but still in the list 
> as
> we removed it from CPUx's list (which doesn't do anything as the node wasn't
> there in the first place).
>
> @Rafael: How do you see we solve this problem ? Here are the options I could
> think of:
>
> - Update subsys layer to reverse the order of devices while unregistering 
> (this
>   will fix the current problem, but we will still have corner cases hanging
>   around, like if the CPU0 is hotplugged out, etc).

This isn't sufficient for the offline case.

> - Update QoS framework with the knowledge of related CPUs, this has been 
> pending
>   until now from my side. And this is the thing we really need to do. 
> Eventually
>   we shall have only a single notifier list for all CPUs of a policy, at least
>   for MIN/MAX frequencies.

- Move the PM QoS requests and notifiers to the new policy CPU on all
changes of that.  That is, when cpufreq_offline() nominates the new
"leader", all of the QoS stuff for the policy needs to go to this one.

Of course, the reverse order of unregistration in the subsys layer
would help to avoid some useless churn related to that.


Re: [PATCH V7 5/7] cpufreq: Register notifiers with the PM QoS framework

2019-10-15 Thread Dmitry Osipenko
15.10.2019 14:46, Viresh Kumar пишет:
> On 22-09-19, 23:12, Dmitry Osipenko wrote:
>> Hello Viresh,
>>
>> This patch causes use-after-free on a cpufreq driver module reload. Please 
>> take a look, thanks in advance.
>>
>>
>> [   87.952369] 
>> ==
>> [   87.953259] BUG: KASAN: use-after-free in 
>> notifier_chain_register+0x4f/0x9c
>> [   87.954031] Read of size 4 at addr e6abbd0c by task modprobe/243
>>
>> [   87.954901] CPU: 1 PID: 243 Comm: modprobe Tainted: GW
>> 5.3.0-next-20190920-00185-gf61698eab956-dirty #2408
>> [   87.956077] Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
>> [   87.956807] [] (unwind_backtrace) from [] 
>> (show_stack+0x11/0x14)
>> [   87.957709] [] (show_stack) from [] 
>> (dump_stack+0x89/0x98)
>> [   87.958616] [] (dump_stack) from []
>> (print_address_description.constprop.0+0x3d/0x340)
>> [   87.959785] [] (print_address_description.constprop.0) from 
>> []
>> (__kasan_report+0xe3/0x12c)
>> [   87.960907] [] (__kasan_report) from [] 
>> (notifier_chain_register+0x4f/0x9c)
>> [   87.962001] [] (notifier_chain_register) from []
>> (blocking_notifier_chain_register+0x29/0x3c)
>> [   87.963180] [] (blocking_notifier_chain_register) from 
>> []
>> (dev_pm_qos_add_notifier+0x79/0xf8)
>> [   87.964339] [] (dev_pm_qos_add_notifier) from [] 
>> (cpufreq_online+0x5e1/0x8a4)
>> [   87.965351] [] (cpufreq_online) from [] 
>> (cpufreq_add_dev+0x79/0x80)
>> [   87.966247] [] (cpufreq_add_dev) from [] 
>> (subsys_interface_register+0xc3/0x100)
>> [   87.967297] [] (subsys_interface_register) from []
>> (cpufreq_register_driver+0x13b/0x1ec)
>> [   87.968476] [] (cpufreq_register_driver) from []
>> (tegra20_cpufreq_probe+0x165/0x1a8 [tegra20_cpufreq])
> 
> Hi Dmitry,
> 
> Thanks for the bug report and I was finally able to reproduce it at my end and
> this was quite an interesting debugging exercise :)
> 
> When a cpufreq driver gets registered, we register with the subsys interface 
> and
> it calls cpufreq_add_dev() for each CPU, starting from CPU0. And so the QoS
> notifiers get added to the first CPU of the policy, i.e. CPU0 in common cases.
> 
> When the cpufreq driver gets unregistered, we unregister with the subsys
> interface and it calls cpufreq_remove_dev() for each CPU, starting from CPU0
> (should have been in reverse order I feel). We remove the QoS notifier only 
> when
> cpufreq_remove_dev() gets called for the last CPU of the policy, lets call it
> CPUx. Now this has a different notifier list as compared to CPU0.
> 
> In short, we are adding the cpufreq notifiers to CPU0 and removing them from
> CPUx. When we try to add it again by inserting the module for second time, we
> find a node in the notifier list which is already freed but still in the list 
> as
> we removed it from CPUx's list (which doesn't do anything as the node wasn't
> there in the first place).
> 
> @Rafael: How do you see we solve this problem ? Here are the options I could
> think of:
> 
> - Update subsys layer to reverse the order of devices while unregistering 
> (this
>   will fix the current problem, but we will still have corner cases hanging
>   around, like if the CPU0 is hotplugged out, etc).
> 
> - Update QoS framework with the knowledge of related CPUs, this has been 
> pending
>   until now from my side. And this is the thing we really need to do. 
> Eventually
>   we shall have only a single notifier list for all CPUs of a policy, at least
>   for MIN/MAX frequencies.
> 
> - ??
> 

Viresh, thank you very much! Looking forward to a fix :)


Re: [PATCH V7 5/7] cpufreq: Register notifiers with the PM QoS framework

2019-10-15 Thread Viresh Kumar
On 22-09-19, 23:12, Dmitry Osipenko wrote:
> Hello Viresh,
> 
> This patch causes use-after-free on a cpufreq driver module reload. Please 
> take a look, thanks in advance.
> 
> 
> [   87.952369] 
> ==
> [   87.953259] BUG: KASAN: use-after-free in notifier_chain_register+0x4f/0x9c
> [   87.954031] Read of size 4 at addr e6abbd0c by task modprobe/243
> 
> [   87.954901] CPU: 1 PID: 243 Comm: modprobe Tainted: GW
> 5.3.0-next-20190920-00185-gf61698eab956-dirty #2408
> [   87.956077] Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
> [   87.956807] [] (unwind_backtrace) from [] 
> (show_stack+0x11/0x14)
> [   87.957709] [] (show_stack) from [] 
> (dump_stack+0x89/0x98)
> [   87.958616] [] (dump_stack) from []
> (print_address_description.constprop.0+0x3d/0x340)
> [   87.959785] [] (print_address_description.constprop.0) from 
> []
> (__kasan_report+0xe3/0x12c)
> [   87.960907] [] (__kasan_report) from [] 
> (notifier_chain_register+0x4f/0x9c)
> [   87.962001] [] (notifier_chain_register) from []
> (blocking_notifier_chain_register+0x29/0x3c)
> [   87.963180] [] (blocking_notifier_chain_register) from 
> []
> (dev_pm_qos_add_notifier+0x79/0xf8)
> [   87.964339] [] (dev_pm_qos_add_notifier) from [] 
> (cpufreq_online+0x5e1/0x8a4)
> [   87.965351] [] (cpufreq_online) from [] 
> (cpufreq_add_dev+0x79/0x80)
> [   87.966247] [] (cpufreq_add_dev) from [] 
> (subsys_interface_register+0xc3/0x100)
> [   87.967297] [] (subsys_interface_register) from []
> (cpufreq_register_driver+0x13b/0x1ec)
> [   87.968476] [] (cpufreq_register_driver) from []
> (tegra20_cpufreq_probe+0x165/0x1a8 [tegra20_cpufreq])

Hi Dmitry,

Thanks for the bug report and I was finally able to reproduce it at my end and
this was quite an interesting debugging exercise :)

When a cpufreq driver gets registered, we register with the subsys interface and
it calls cpufreq_add_dev() for each CPU, starting from CPU0. And so the QoS
notifiers get added to the first CPU of the policy, i.e. CPU0 in common cases.

When the cpufreq driver gets unregistered, we unregister with the subsys
interface and it calls cpufreq_remove_dev() for each CPU, starting from CPU0
(should have been in reverse order I feel). We remove the QoS notifier only when
cpufreq_remove_dev() gets called for the last CPU of the policy, lets call it
CPUx. Now this has a different notifier list as compared to CPU0.

In short, we are adding the cpufreq notifiers to CPU0 and removing them from
CPUx. When we try to add it again by inserting the module for second time, we
find a node in the notifier list which is already freed but still in the list as
we removed it from CPUx's list (which doesn't do anything as the node wasn't
there in the first place).

@Rafael: How do you see we solve this problem ? Here are the options I could
think of:

- Update subsys layer to reverse the order of devices while unregistering (this
  will fix the current problem, but we will still have corner cases hanging
  around, like if the CPU0 is hotplugged out, etc).

- Update QoS framework with the knowledge of related CPUs, this has been pending
  until now from my side. And this is the thing we really need to do. Eventually
  we shall have only a single notifier list for all CPUs of a policy, at least
  for MIN/MAX frequencies.

- ??

-- 
viresh


Re: [PATCH V7 5/7] cpufreq: Register notifiers with the PM QoS framework

2019-10-14 Thread Dmitry Osipenko
14.10.2019 12:42, Viresh Kumar пишет:
> On 22-09-19, 23:12, Dmitry Osipenko wrote:
>> This patch causes use-after-free on a cpufreq driver module reload. Please 
>> take a look, thanks in advance.
>>
>>
>> [   87.952369] 
>> ==
>> [   87.953259] BUG: KASAN: use-after-free in 
>> notifier_chain_register+0x4f/0x9c
>> [   87.954031] Read of size 4 at addr e6abbd0c by task modprobe/243
> 
> Hi Dmitry,
> 
> I tried to reproduce it on my ubuntu on ARM64 setup and I couldn't hit
> these issues on v5.4-rc1 with Kasan built in.
> 
> I then enabled Kasan (tried both inline and outline instrumentation)
> but I couldn't get past the issues with module insertion. It fails
> like this for me:
> 
> root@linaro-developer:~/work# insmod cpufreq-dt.ko 
> [   72.985974] cpufreq_dt: Unknown symbol __asan_report_load1_noabort (err -2)
> [   72.993164] cpufreq_dt: Unknown symbol __asan_report_load4_noabort (err -2)
> [   73.000307] cpufreq_dt: Unknown symbol __asan_report_load8_noabort (err -2)
> [   73.007451] cpufreq_dt: Unknown symbol __asan_report_store1_noabort (err 
> -2)
> [   73.014643] cpufreq_dt: Unknown symbol __asan_register_globals (err -2)
> [   73.021409] cpufreq_dt: Unknown symbol __asan_unregister_globals (err -2)
> [   73.028349] cpufreq_dt: Unknown symbol __asan_report_store8_noabort (err 
> -2)
> [   73.035543] cpufreq_dt: Unknown symbol __asan_report_store4_noabort (err 
> -2)
> insmod: ERROR: could not insert module cpufreq-dt.ko: Unknown symbol in module
> 
> I tried to search for these errors but couldn't find why I am getting
> these and why the symbols are missing here. Can you suggest something
> here ?
> 

Sorry, I don't know what's wrong with ARM64. There is no KASAN on ARM32 in 
upstream yet, I'm using
the WIP patches [1].

[1] https://lkml.org/lkml/2019/6/17/1562

BTW, I moved tegra20-cpufreq to use cpufreq-dt recently and the problem 
presents with the cpufreq-dt:

# rmmod cpufreq_dt
# modprobe cpufreq_dt

[   31.259483] 
==
[   31.260321] BUG: KASAN: use-after-free in notifier_chain_register+0x2b/0x7c
[   31.261026] Read of size 4 at addr cc30250c by task modprobe/218

[   31.262067] CPU: 1 PID: 218 Comm: modprobe Tainted: GW
5.4.0-rc2-next-20191011-00194-g02f44e30b215-dirty #2645
[   31.263347] Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
[   31.264154] [] (unwind_backtrace) from [] 
(show_stack+0x11/0x14)
[   31.264960] [] (show_stack) from [] 
(dump_stack+0x89/0x98)
[   31.265804] [] (dump_stack) from []
(print_address_description.constprop.0+0x3d/0x340)
[   31.266830] [] (print_address_description.constprop.0) from 
[]
(__kasan_report+0xe3/0x12c)
[   31.267865] [] (__kasan_report) from [] 
(notifier_chain_register+0x2b/0x7c)
[   31.268755] [] (notifier_chain_register) from []
(blocking_notifier_chain_register+0x29/0x3c)
[   31.269842] [] (blocking_notifier_chain_register) from []
(dev_pm_qos_add_notifier+0x79/0xf8)
[   31.270948] [] (dev_pm_qos_add_notifier) from [] 
(cpufreq_online+0x5e1/0x8a4)
[   31.271922] [] (cpufreq_online) from [] 
(cpufreq_add_dev+0x79/0x80)
[   31.272889] [] (cpufreq_add_dev) from [] 
(subsys_interface_register+0xc3/0x100)
[   31.273894] [] (subsys_interface_register) from []
(cpufreq_register_driver+0x13b/0x1ec)
[   31.274912] [] (cpufreq_register_driver) from [] 
(dt_cpufreq_probe+0x89/0xe0
[cpufreq_dt])
[   31.275924] [] (dt_cpufreq_probe [cpufreq_dt]) from []
(platform_drv_probe+0x49/0x88)
[   31.276889] [] (platform_drv_probe) from [] 
(really_probe+0x109/0x378)
[   31.277715] [] (really_probe) from [] 
(driver_probe_device+0x57/0x15c)
[   31.278537] [] (driver_probe_device) from [] 
(device_driver_attach+0x61/0x64)
[   31.279425] [] (device_driver_attach) from [] 
(__driver_attach+0x49/0xa0)
[   31.280273] [] (__driver_attach) from [] 
(bus_for_each_dev+0x69/0x94)
[   31.281087] [] (bus_for_each_dev) from [] 
(bus_add_driver+0x179/0x1e8)
[   31.281909] [] (bus_add_driver) from [] 
(driver_register+0x8f/0x130)
[   31.282734] [] (driver_register) from [] 
(dt_cpufreq_platdrv_init+0x17/0x1000
[cpufreq_dt])
[   31.283761] [] (dt_cpufreq_platdrv_init [cpufreq_dt]) from 
[]
(do_one_initcall+0x4d/0x280)
[   31.284759] [] (do_one_initcall) from [] 
(do_init_module+0xb9/0x28c)
[   31.285561] [] (do_init_module) from [] 
(load_module+0x2895/0x2c04)
[   31.286347] [] (load_module) from [] 
(sys_finit_module+0x7b/0x8c)
[   31.287117] [] (sys_finit_module) from [] 
(ret_fast_syscall+0x1/0x26)
[   31.287901] Exception stack(0xcabb3fa8 to 0xcabb3ff0)
[   31.288406] 3fa0:   0003f348 0001 0003 0002b744 
 b6b31e74
[   31.289200] 3fc0: 0003f348 0001 94ccfd00 017b 0003f3f0  
0003f348 00040010
[   31.290029] 3fe0: b6b31df8 b6b31de8 00022534 aec752f0

[   31.290698] Allocated by task 181:
[   31.291065]  __kasan_kmalloc.constprop.0+0x7b/0x84
[   31.291565]  cpufreq_online+0x55f/0x8a4
[   

Re: [PATCH V7 5/7] cpufreq: Register notifiers with the PM QoS framework

2019-10-14 Thread Viresh Kumar
On 22-09-19, 23:12, Dmitry Osipenko wrote:
> This patch causes use-after-free on a cpufreq driver module reload. Please 
> take a look, thanks in advance.
> 
> 
> [   87.952369] 
> ==
> [   87.953259] BUG: KASAN: use-after-free in notifier_chain_register+0x4f/0x9c
> [   87.954031] Read of size 4 at addr e6abbd0c by task modprobe/243

Hi Dmitry,

I tried to reproduce it on my ubuntu on ARM64 setup and I couldn't hit
these issues on v5.4-rc1 with Kasan built in.

I then enabled Kasan (tried both inline and outline instrumentation)
but I couldn't get past the issues with module insertion. It fails
like this for me:

root@linaro-developer:~/work# insmod cpufreq-dt.ko 
[   72.985974] cpufreq_dt: Unknown symbol __asan_report_load1_noabort (err -2)
[   72.993164] cpufreq_dt: Unknown symbol __asan_report_load4_noabort (err -2)
[   73.000307] cpufreq_dt: Unknown symbol __asan_report_load8_noabort (err -2)
[   73.007451] cpufreq_dt: Unknown symbol __asan_report_store1_noabort (err -2)
[   73.014643] cpufreq_dt: Unknown symbol __asan_register_globals (err -2)
[   73.021409] cpufreq_dt: Unknown symbol __asan_unregister_globals (err -2)
[   73.028349] cpufreq_dt: Unknown symbol __asan_report_store8_noabort (err -2)
[   73.035543] cpufreq_dt: Unknown symbol __asan_report_store4_noabort (err -2)
insmod: ERROR: could not insert module cpufreq-dt.ko: Unknown symbol in module

I tried to search for these errors but couldn't find why I am getting
these and why the symbols are missing here. Can you suggest something
here ?

-- 
viresh


Re: [PATCH V7 5/7] cpufreq: Register notifiers with the PM QoS framework

2019-09-23 Thread Dmitry Osipenko
23.09.2019 16:56, Viresh Kumar пишет:
> On 22-09-19, 23:12, Dmitry Osipenko wrote:
>> This patch causes use-after-free on a cpufreq driver module reload. Please 
>> take a look, thanks in advance.
>>
>>
>> [   87.952369] 
>> ==
>> [   87.953259] BUG: KASAN: use-after-free in 
>> notifier_chain_register+0x4f/0x9c
>> [   87.954031] Read of size 4 at addr e6abbd0c by task modprobe/243
>>
>> [   87.954901] CPU: 1 PID: 243 Comm: modprobe Tainted: GW
>> 5.3.0-next-20190920-00185-gf61698eab956-dirty #2408
>> [   87.956077] Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
>> [   87.956807] [] (unwind_backtrace) from [] 
>> (show_stack+0x11/0x14)
>> [   87.957709] [] (show_stack) from [] 
>> (dump_stack+0x89/0x98)
>> [   87.958616] [] (dump_stack) from []
>> (print_address_description.constprop.0+0x3d/0x340)
>> [   87.959785] [] (print_address_description.constprop.0) from 
>> []
>> (__kasan_report+0xe3/0x12c)
>> [   87.960907] [] (__kasan_report) from [] 
>> (notifier_chain_register+0x4f/0x9c)
>> [   87.962001] [] (notifier_chain_register) from []
>> (blocking_notifier_chain_register+0x29/0x3c)
>> [   87.963180] [] (blocking_notifier_chain_register) from 
>> []
>> (dev_pm_qos_add_notifier+0x79/0xf8)
>> [   87.964339] [] (dev_pm_qos_add_notifier) from [] 
>> (cpufreq_online+0x5e1/0x8a4)
> 
> Hi Dmitry,
> 
> Unfortunately I am traveling right now and can't test this stuff, though I may
> have found the root cause here. Can you please test the below diff for me ?
> 
> diff --git a/drivers/base/power/qos.c b/drivers/base/power/qos.c
> index 6c90fd7e2ff8..9ac244ee05fe 100644
> --- a/drivers/base/power/qos.c
> +++ b/drivers/base/power/qos.c
> @@ -328,6 +328,8 @@ void dev_pm_qos_constraints_destroy(struct device *dev)
> spin_unlock_irq(>power.lock);
>  
> kfree(qos->resume_latency.notifiers);
> +   kfree(qos->min_frequency.notifiers);
> +   kfree(qos->max_frequency.notifiers);
> kfree(qos);
>  
>   out:
> 

Doesn't help. The use-after-free bugs are usually caused by a missing
NULL assignment after kfree(), like in this snippet:

..
if (!a)
a = kmalloc();
..
kfree(a);
// a = NULL<-- missing!
..

I briefly looked through the code and don't see anything obviously
wrong. The bug isn't critical since unlikely that somebody reloads
cpufreq module for a non-development purposes, so it's not a big deal
and can wait. Please take your time.

I also want to point out that kernel crashes after second module reload,
hence the KASAN report should be valid.


Re: [PATCH V7 5/7] cpufreq: Register notifiers with the PM QoS framework

2019-09-23 Thread Viresh Kumar
On 22-09-19, 23:12, Dmitry Osipenko wrote:
> This patch causes use-after-free on a cpufreq driver module reload. Please 
> take a look, thanks in advance.
> 
> 
> [   87.952369] 
> ==
> [   87.953259] BUG: KASAN: use-after-free in notifier_chain_register+0x4f/0x9c
> [   87.954031] Read of size 4 at addr e6abbd0c by task modprobe/243
> 
> [   87.954901] CPU: 1 PID: 243 Comm: modprobe Tainted: GW
> 5.3.0-next-20190920-00185-gf61698eab956-dirty #2408
> [   87.956077] Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
> [   87.956807] [] (unwind_backtrace) from [] 
> (show_stack+0x11/0x14)
> [   87.957709] [] (show_stack) from [] 
> (dump_stack+0x89/0x98)
> [   87.958616] [] (dump_stack) from []
> (print_address_description.constprop.0+0x3d/0x340)
> [   87.959785] [] (print_address_description.constprop.0) from 
> []
> (__kasan_report+0xe3/0x12c)
> [   87.960907] [] (__kasan_report) from [] 
> (notifier_chain_register+0x4f/0x9c)
> [   87.962001] [] (notifier_chain_register) from []
> (blocking_notifier_chain_register+0x29/0x3c)
> [   87.963180] [] (blocking_notifier_chain_register) from 
> []
> (dev_pm_qos_add_notifier+0x79/0xf8)
> [   87.964339] [] (dev_pm_qos_add_notifier) from [] 
> (cpufreq_online+0x5e1/0x8a4)

Hi Dmitry,

Unfortunately I am traveling right now and can't test this stuff, though I may
have found the root cause here. Can you please test the below diff for me ?

diff --git a/drivers/base/power/qos.c b/drivers/base/power/qos.c
index 6c90fd7e2ff8..9ac244ee05fe 100644
--- a/drivers/base/power/qos.c
+++ b/drivers/base/power/qos.c
@@ -328,6 +328,8 @@ void dev_pm_qos_constraints_destroy(struct device *dev)
spin_unlock_irq(>power.lock);
 
kfree(qos->resume_latency.notifiers);
+   kfree(qos->min_frequency.notifiers);
+   kfree(qos->max_frequency.notifiers);
kfree(qos);
 
  out:

-- 
viresh


Re: [PATCH V7 5/7] cpufreq: Register notifiers with the PM QoS framework

2019-09-22 Thread Dmitry Osipenko
08.07.2019 13:57, Viresh Kumar пишет:
> This registers the notifiers for min/max frequency constraints with the
> PM QoS framework. The constraints are also taken into consideration in
> cpufreq_set_policy().
> 
> This also relocates cpufreq_policy_put_kobj() as it is required to be
> called from cpufreq_policy_alloc() now.
> 
> refresh_frequency_limits() is updated to avoid calling
> cpufreq_set_policy() for inactive policies and handle_update() is
> updated to have proper locking in place.
> 
> No constraints are added until now though.
> 
> Reviewed-by: Matthias Kaehlcke 
> Reviewed-by: Ulf Hansson 
> Signed-off-by: Viresh Kumar 
> Signed-off-by: Rafael J. Wysocki 
> ---
> V6->V7:
> - All callers of refresh_frequency_limits(), except handle_update(),
>   take the policy->rwsem and result in deadlock as
>   refresh_frequency_limits() also takes the same lock again. Fix that
>   by taking the rwsem from handle_update() instead.
> 
> @Rafael: Sending it before Pavel has verified it as I would be offline
> later, in case you want to apply this today itself.
> 
>  drivers/cpufreq/cpufreq.c | 135 +-
>  include/linux/cpufreq.h   |   3 +
>  2 files changed, 108 insertions(+), 30 deletions(-)
> 
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index ceb57af15ca0..b96ef6db1bfe 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -26,6 +26,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -999,7 +1000,7 @@ static void add_cpu_dev_symlink(struct cpufreq_policy 
> *policy, unsigned int cpu)
>  {
>   struct device *dev = get_cpu_device(cpu);
>  
> - if (!dev)
> + if (unlikely(!dev))
>   return;
>  
>   if (cpumask_test_and_set_cpu(cpu, policy->real_cpus))
> @@ -1117,14 +1118,16 @@ static int cpufreq_add_policy_cpu(struct 
> cpufreq_policy *policy, unsigned int cp
>  
>  static void refresh_frequency_limits(struct cpufreq_policy *policy)
>  {
> - struct cpufreq_policy new_policy = *policy;
> -
> - pr_debug("updating policy for CPU %u\n", policy->cpu);
> + struct cpufreq_policy new_policy;
>  
> - new_policy.min = policy->user_policy.min;
> - new_policy.max = policy->user_policy.max;
> + if (!policy_is_inactive(policy)) {
> + new_policy = *policy;
> + pr_debug("updating policy for CPU %u\n", policy->cpu);
>  
> - cpufreq_set_policy(policy, _policy);
> + new_policy.min = policy->user_policy.min;
> + new_policy.max = policy->user_policy.max;
> + cpufreq_set_policy(policy, _policy);
> + }
>  }
>  
>  static void handle_update(struct work_struct *work)
> @@ -1133,14 +1136,60 @@ static void handle_update(struct work_struct *work)
>   container_of(work, struct cpufreq_policy, update);
>  
>   pr_debug("handle_update for cpu %u called\n", policy->cpu);
> + down_write(>rwsem);
>   refresh_frequency_limits(policy);
> + up_write(>rwsem);
> +}
> +
> +static int cpufreq_notifier_min(struct notifier_block *nb, unsigned long 
> freq,
> + void *data)
> +{
> + struct cpufreq_policy *policy = container_of(nb, struct cpufreq_policy, 
> nb_min);
> +
> + schedule_work(>update);
> + return 0;
> +}
> +
> +static int cpufreq_notifier_max(struct notifier_block *nb, unsigned long 
> freq,
> + void *data)
> +{
> + struct cpufreq_policy *policy = container_of(nb, struct cpufreq_policy, 
> nb_max);
> +
> + schedule_work(>update);
> + return 0;
> +}
> +
> +static void cpufreq_policy_put_kobj(struct cpufreq_policy *policy)
> +{
> + struct kobject *kobj;
> + struct completion *cmp;
> +
> + down_write(>rwsem);
> + cpufreq_stats_free_table(policy);
> + kobj = >kobj;
> + cmp = >kobj_unregister;
> + up_write(>rwsem);
> + kobject_put(kobj);
> +
> + /*
> +  * We need to make sure that the underlying kobj is
> +  * actually not referenced anymore by anybody before we
> +  * proceed with unloading.
> +  */
> + pr_debug("waiting for dropping of refcount\n");
> + wait_for_completion(cmp);
> + pr_debug("wait complete\n");
>  }
>  
>  static struct cpufreq_policy *cpufreq_policy_alloc(unsigned int cpu)
>  {
>   struct cpufreq_policy *policy;
> + struct device *dev = get_cpu_device(cpu);
>   int ret;
>  
> + if (!dev)
> + return NULL;
> +
>   policy = kzalloc(sizeof(*policy), GFP_KERNEL);
>   if (!policy)
>   return NULL;
> @@ -1157,7 +1206,7 @@ static struct cpufreq_policy 
> *cpufreq_policy_alloc(unsigned int cpu)
>   ret = kobject_init_and_add(>kobj, _cpufreq,
>  cpufreq_global_kobject, "policy%u", cpu);
>   if (ret) {
> - pr_err("%s: failed to init policy->kobj: %d\n", __func__, ret);
> + dev_err(dev, "%s: failed to init