Time stamp value in printk records

2019-09-30 Thread Sodagudi Prasad

Hi All,

From the Qualcomm side, we would like to check with the upstream 
community about adding a raw timestamp value to printk records. A 
Qualcomm SoC contains several DSP subsystems - for example audio, video 
and modem DSPs.
Adding the raw timer value (alongside the sched_clock() value) to each 
printk record helps with the following use cases -
1)	Finding out which subsystem crashed first - the application 
processor or a DSP subsystem.
2)	When there is a stability issue on the DSP side, determining what 
the APPS processor was doing at that time.


Initially, during device boot-up, the printk sched_clock() value can be 
matched with the raw timer value used by the DSP subsystems, but after 
the APPS processor has suspended several times we have no way to 
correlate the timestamp values on the DSP and APPS processors. On a 
Qualcomm SoC, all timers (both the APPS processor timer and the DSP 
timers) are derived from a globally always-on timer, so keeping the 
global timer's raw value in the printk records and the DSP logs makes 
it possible to correlate the activity of all the processors in the SoC.
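
(For illustration, on arm64 the raw value in question is the 
architected counter that sched_clock() itself is driven by on these 
platforms; a minimal sketch of reading it directly, assuming an ARMv8 
APPS processor, could be:)

#include <linux/types.h>

/*
 * Minimal sketch, not the proposed patch: read CNTVCT_EL0, the
 * architected counter fed by the SoC-wide always-on timer that both
 * the APPS processor and the DSPs derive their timebases from.
 */
static inline u64 raw_global_timer_value(void)
{
	u64 cval;

	/* The ISB keeps the counter read from being speculated early. */
	asm volatile("isb; mrs %0, cntvct_el0" : "=r" (cval));
	return cval;
}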


It would be great if the upstream community added a common solution to 
this problem, since all SoC vendors would benefit from having the raw 
timer value in printk records.


-Thanks, Prasad



Re: [PATCH] kernel/panic: Use SYSTEM_RESET2 command for warm reset

2019-05-17 Thread Sodagudi Prasad

On 2019-05-16 11:29, Aaro Koskinen wrote:

Hi,

On Wed, May 08, 2019 at 06:47:12PM -0700, Prasad Sodagudi wrote:

Some platforms may need warm-reboot support when the kernel crashes,
so that post-mortem analysis can be done, instead of a cold reboot.
So add the config option CONFIG_WARM_REBOOT_ON_PANIC, which uses the
PSCI SYSTEM_RESET2 command for warm reset.


Please see commit b287a25a7148 - you can now use kernel command
line option reboot=panic_warm to get this.
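
(For example, a kernel command line using that option would carry a
fragment like the following - illustrative only; normal reboots stay
cold, while a panic triggers a warm reset:)

    reboot=panic_warm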


Thanks Aaro. Yes, I can use this option. Thanks Sudeep and all for the 
discussion.




A.




Re: PSCI version 1.1 and SYSTEM_RESET2

2019-05-08 Thread Sodagudi Prasad

On 2019-05-02 02:05, Sudeep Holla wrote:

On Wed, May 01, 2019 at 11:43:00AM -0700, Sodagudi Prasad wrote:

On 2019-05-01 02:49, Sudeep Holla wrote:
> On Tue, Apr 30, 2019 at 05:07:31PM -0700, Sodagudi Prasad wrote:
> > On 2019-04-30 14:44, Sodagudi Prasad wrote:


[...]


> >
> > It would be nice if there were a config option to reboot the device
> > either warm or cold in the case of kernel panic.
>
> I presume you prefer to do a warm boot in case of panic to get a dump of
> the memory to inspect? If so, is kexec/kdump not the mechanism to
> achieve that?

Hi Sudeep,

Thanks for your response and sharing details about your patch.

> If so, is kexec/kdump not the mechanism to achieve that?
>
Qualcomm has a vendor-specific solution for capturing RAM contents for 
offline analysis.



Ah OK.


>
> I am just trying to understand the use case. Xilinx asked for the same
> but never got to understand their use case.

Here is the background -
Usually, poweroff drivers override the arm_pm_restart and pm_power_off 
callbacks and register a reboot notifier with some priority for the 
reboot operations. Here is the Qualcomm poweroff driver for reference:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/power/reset/msm-poweroff.c

Before the vendor chipset-specific poweroff driver is probed, the 
arm_pm_restart function pointer holds the psci_sys_reset function. Once 
the vendor poweroff driver is probed, it can override the 
arm_pm_restart function pointer:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/firmware/psci.c#n562

Once the vendor driver is probed, it can take care of the device's 
warm- or hard-reset configuration properly. But in the window between 
start_kernel() and the vendor-specific driver being probed, devices get 
a cold reset even if the kernel crashes, because arm_pm_restart points 
to the psci_sys_reset function by default. Is the problem clear now?
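
(For context, the override pattern described above looks roughly like 
this - a sketch modelled on the msm-poweroff driver linked earlier, 
with hypothetical names, not its exact code:)

#include <linux/platform_device.h>
#include <linux/reboot.h>
#include <asm/system_misc.h>	/* arm_pm_restart on arm/arm64 of this era */

/* Hypothetical vendor restart hook: program the PMIC/reset block for a
 * warm or hard reset (chipset-specific) and trigger it. */
static void vendor_restart(enum reboot_mode mode, const char *cmd)
{
	/* ... chipset-specific warm/hard reset programming ... */
}

static int vendor_poweroff_probe(struct platform_device *pdev)
{
	/* Until this probe runs, restarts go through psci_sys_reset(),
	 * which is the cold-reset window being described. */
	arm_pm_restart = vendor_restart;
	return 0;
}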



Too specific a use case IMO, and I am not sure we need a generic 
solution to deal with this. Anyway, I don't see any check in the 
arch/PSCI-specific code for what you want; just ensure reboot_mode is 
set appropriately. Post a patch and see what people have to say.


Hi Sudeep,

Yes. With your SYSTEM_RESET2 command support added, just configuring 
reboot_mode is good enough.


-Thanks, Prasad



The Qualcomm downstream kernel has a lot of use cases around the device 
reset sequence, and the downstream driver is quite different from the 
upstream drivers. I think the above-mentioned problem is common to all 
chipset vendors and is not specific to Qualcomm use cases. I have a 
downstream solution to this problem, but thought I would bring the 
problem to the upstream community for a common solution, so that all 
vendors can use it.



May be or may be not, post the patch and let's see.

I have modified the flow below to avoid a cold restart in the case of 
an early kernel panic:
panic() --> emergency_restart() --> machine_emergency_restart() -->
machine_restart(NULL);

-Thanks, Prasad

>
> --
> Regards,
> Sudeep



Re: PSCI version 1.1 and SYSTEM_RESET2

2019-05-01 Thread Sodagudi Prasad

On 2019-05-01 02:49, Sudeep Holla wrote:

On Tue, Apr 30, 2019 at 05:07:31PM -0700, Sodagudi Prasad wrote:

On 2019-04-30 14:44, Sodagudi Prasad wrote:
+Sudeep

> Hi Mark/Will,
>
> I would like to understand whether ARM linux community have plans to
> support PSCI version 1.1 or not.
> PSCI_1_1 specification introduced support for SYSTEM_RESET2 command
> and this new command helps mobile devices to SYSTEM_WARM_RESET
> support. Rebooting devices with warm reboot helps to capture the
> snapshot of the ram contents for post-mortem analysis.

I think there is a recent discussion from Sudeep about SYSTEM_RESET2 
support:
https://patchwork.kernel.org/patch/10884345/



This has landed in -next, and should hopefully appear in v5.2.



Hi Sudeep,

I was going through your discussion in the thread below -
https://lore.kernel.org/lkml/d73d3580-4ec1-a281-4585-5c776fc08...@xilinx.com/

There is no provision to set the reboot mode dynamically; there is only 
the kernel command line parameter. I am looking for a way to reboot the 
device with a warm reboot when the kernel crashes.

panic() --> emergency_restart() --> machine_emergency_restart() -->
machine_restart(NULL);

It would be nice if there were a config option to reboot the device 
either warm or cold in the case of a kernel panic.


I presume you prefer to do a warm boot in case of panic to get a dump of
the memory to inspect? If so, is kexec/kdump not the mechanism to
achieve that?


Hi Sudeep,

Thanks for your response and sharing details about your patch.
> If so, is kexec/kdump not the mechanism to achieve that?
Qualcomm has a vendor-specific solution for capturing RAM contents for 
offline analysis.




I am just trying to understand the use case. Xilinx asked for the same
but never got to understand their use case.


Here is the background -
Usually, poweroff drivers override the arm_pm_restart and pm_power_off 
callbacks and register a reboot notifier with some priority for the 
reboot operations. Here is the Qualcomm poweroff driver for reference:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/power/reset/msm-poweroff.c

Before the vendor chipset-specific poweroff driver is probed, the 
arm_pm_restart function pointer holds the psci_sys_reset function. Once 
the vendor poweroff driver is probed, it can override the 
arm_pm_restart function pointer:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/firmware/psci.c#n562

Once the vendor driver is probed, it can take care of the device's 
warm- or hard-reset configuration properly. But in the window between 
start_kernel() and the vendor-specific driver being probed, devices get 
a cold reset even if the kernel crashes, because arm_pm_restart points 
to the psci_sys_reset function by default. Is the problem clear now?


The Qualcomm downstream kernel has a lot of use cases around the device 
reset sequence, and the downstream driver is quite different from the 
upstream drivers. I think the above-mentioned problem is common to all 
chipset vendors and is not specific to Qualcomm use cases. I have a 
downstream solution to this problem, but thought I would bring the 
problem to the upstream community for a common solution, so that all 
vendors can use it.


I have modified the flow below to avoid a cold restart in the case of 
an early kernel panic:
panic() --> emergency_restart() --> machine_emergency_restart() --> 
machine_restart(NULL);


-Thanks, Prasad



--
Regards,
Sudeep




Re: PSCI version 1.1 and SYSTEM_RESET2

2019-04-30 Thread Sodagudi Prasad

On 2019-04-30 14:44, Sodagudi Prasad wrote:
+Sudeep


Hi Mark/Will,

I would like to understand whether the ARM Linux community has plans to
support PSCI version 1.1 or not.
The PSCI 1.1 specification introduced support for the SYSTEM_RESET2
command, and this new command gives mobile devices SYSTEM_WARM_RESET
support. Rebooting a device with a warm reboot makes it possible to
capture a snapshot of the RAM contents for post-mortem analysis.


I think there is a recent discussion from Sudeep about SYSTEM_RESET2 
support:

https://patchwork.kernel.org/patch/10884345/


Hi Sudeep,

I was going through your discussion in the thread below -
https://lore.kernel.org/lkml/d73d3580-4ec1-a281-4585-5c776fc08...@xilinx.com/

There is no provision to set the reboot mode dynamically; there is only 
the kernel command line parameter. I am looking for a way to reboot the 
device with a warm reboot when the kernel crashes.


panic() --> emergency_restart() --> machine_emergency_restart() --> 
machine_restart(NULL);


It would be nice if there were a config option to reboot the device 
either warm or cold in the case of a kernel panic. Calling 
machine_restart() with a NULL parameter on a kernel crash leads to a 
cold reboot of the device.


-Thanks, Prasad



-Thanks, Prasad




PSCI version 1.1 and SYSTEM_RESET2

2019-04-30 Thread Sodagudi Prasad

Hi Mark/Will,

I would like to understand whether the ARM Linux community has plans to 
support PSCI version 1.1 or not.
The PSCI 1.1 specification introduced support for the SYSTEM_RESET2 
command, and this new command gives mobile devices SYSTEM_WARM_RESET 
support. Rebooting a device with a warm reboot makes it possible to 
capture a snapshot of the RAM contents for post-mortem analysis.
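
(For reference, a sketch of the PSCI 1.1 call being requested; the 
SMC64 function ID is from the PSCI 1.1 specification, and 
invoke_psci_fn is the firmware-call helper in drivers/firmware/psci.c. 
An illustration, not a proposed patch:)

/* PSCI 1.1 SYSTEM_RESET2, SMC64 calling convention. */
#define PSCI_1_1_FN64_SYSTEM_RESET2	0xC4000012

static void psci_sys_warm_reset(void)
{
	/* reset_type 0 = architectural SYSTEM_WARM_RESET, cookie = 0 */
	invoke_psci_fn(PSCI_1_1_FN64_SYSTEM_RESET2, 0, 0, 0);
}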


-Thanks, Prasad


Re: [PATCH] genirq: call cancel_work_sync from irq_set_affinity_notifier

2019-03-21 Thread Sodagudi Prasad

On 2019-03-21 09:19, Thomas Gleixner wrote:

Prasad,

On Wed, 20 Mar 2019, Prasad Sodagudi wrote:

Subject: [PATCH] genirq: call cancel_work_sync from 
irq_set_affinity_notifier


Please do not describe WHAT the code change is. Give a concise 
explanation of WHY this change is done. The above is like '[PATCH] foo: 
Increment bar by 5'.


  [PATCH] genirq: Prevent UAF and work list corruption


Whenever notification of an IRQ affinity change occurs, call
cancel_work_sync() from irq_set_affinity_notifier() to cancel
all pending work and avoid work list corruption.


Again, you describe first WHAT you are doing instead of telling WHY.

  When irq_set_affinity_notifier() replaces the notifier, then the
  reference count on the old notifier is dropped which causes it to be
  freed. But nothing ensures that the old notifier is no longer queued
  in the work list. If it is queued this results in a use after free
  and possibly in work list corruption.

  Ensure that the work is canceled before the reference is dropped.

See?


Hi Tglx,

Thanks for suggesting commit text and modifications.



This gives precise context first and then describes the cure.

Also it is completely irrelevant whether this is achieved by calling
cancel_work_sync() or by something else. What matters is that it's
canceled. Changelogs describe context and concepts not implementation
details. The implementation details are in the patch itself.


Signed-off-by: Prasad Sodagudi 
---
 kernel/irq/manage.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 9ec34a2..da8b2ee 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -356,6 +356,9 @@ static void irq_affinity_notify(struct work_struct *work)
 	desc->affinity_notify = notify;
 	raw_spin_unlock_irqrestore(&desc->lock, flags);
 
+	if (!notify && old_notify)
+		cancel_work_sync(&old_notify->work);


That '!notify' doesn't make any sense.


Yes. I will remove this in the next patch set.  Thanks for reviewing.

-thanks, Prasad


Thanks,

tglx




Re: [PATCH] perf: Change PMCR write to read-modify-write

2019-03-21 Thread Sodagudi Prasad

On 2019-03-21 06:34, Julien Thierry wrote:

Hi Prasad,

On 21/03/2019 02:07, Prasad Sodagudi wrote:

Preserve the bitfields of PMCR_EL0 (AArch64) during PMU reset.
The reset routine should write a 1 only to the PMCR.C and PMCR.P
fields to reset the counters. Other fields should not be changed,
as they could have been set before PMU initialization and their
values must be preserved even across reset.



Is there any particular bit you are concerned about? Apart from the RO
ones and the RES0 ones (to which we are already writing 0), I see:

DP -> irrelevant for non-secure
X -> this one is the only potentially interesting one; however, it resets
to an architecturally unknown value, so unless we know for a fact it was
set beforehand, we probably want to clear it
D -> ignored when we have LC set (and we do)
E -> since this is the function we use to reset the PMU on the current
CPU, we probably want to set this bit to 0 regardless of its previous
value


So, is there any issue this patch is solving?


Hi Julien,

Thanks for taking a look at this patch. Yes - on our Qualcomm platforms 
we have observed that the X bit gets cleared by the kernel. This bit is 
set by firmware for Qualcomm use cases, and without this patch the 
non-secure world resets it. I think changing this register update to 
read-modify-write style makes sense.
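
(A narrower variant is also conceivable - a sketch, not the posted 
patch: carry over only the export-enable bit PMCR.X that firmware sets, 
while still writing P/C/LC and leaving the other control bits, 
including PMCR.E, cleared. It assumes the ARMV8_PMU_PMCR_* masks and 
pmcr accessors from arch/arm64/kernel/perf_event.c:)

static void armv8pmu_reset_preserving_x(void *info)
{
	/* Keep only PMCR.X from the firmware-initialized value. */
	u32 pmcr = armv8pmu_pmcr_read() & ARMV8_PMU_PMCR_X;

	armv8pmu_pmcr_write(pmcr | ARMV8_PMU_PMCR_P | ARMV8_PMU_PMCR_C |
			    ARMV8_PMU_PMCR_LC);
}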


-Thanks, Prasad



Thanks,


Signed-off-by: Prasad Sodagudi 
---
 arch/arm64/kernel/perf_event.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index 4addb38..0c1afdd 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -868,8 +868,8 @@ static void armv8pmu_reset(void *info)
 	 * Initialize & Reset PMNC. Request overflow interrupt for
 	 * 64 bit cycle counter but cheat in armv8pmu_write_counter().
 	 */
-	armv8pmu_pmcr_write(ARMV8_PMU_PMCR_P | ARMV8_PMU_PMCR_C |
-			    ARMV8_PMU_PMCR_LC);
+	armv8pmu_pmcr_write(armv8pmu_pmcr_read() | ARMV8_PMU_PMCR_P |
+			    ARMV8_PMU_PMCR_C | ARMV8_PMU_PMCR_LC);
 }

 static int __armv8_pmuv3_map_event(struct perf_event *event,





Re: protected pins and debugfs

2018-10-16 Thread Sodagudi Prasad

On 2018-10-10 12:40, Sodagudi Prasad wrote:

On 2018-10-07 23:04, Stephen Boyd wrote:

Quoting Sodagudi Prasad (2018-10-03 05:38:24)


 	for (i = 0; i < chip->ngpio; i++, gpio++) {
+		label = gpiochip_is_requested(chip, i);
+		if (!label)
+			continue;
 		msm_gpio_dbg_show_one(s, NULL, chip, i, gpio);
-		seq_puts(s, "\n");
 	}
 }



Does something not work with the following code in
msm_gpio_dbg_show_one()?


	if (!gpiochip_line_is_valid(chip, offset))
		return;


Hi Stephen,
I didn't realize that these changes were already merged upstream; I was
testing on a 4.14 kernel.

https://lore.kernel.org/patchwork/patch/878107/
https://lore.kernel.org/patchwork/patch/878106/
https://lore.kernel.org/patchwork/patch/878109/



Hi Stephen,

After checking this further, adding "gpio-reserved-ranges" is not a 
good option, for the following reasons:
1) The GPIO information changes from platform to platform, so the 
reserved ranges would have to be maintained correctly for each platform.
2) Some GPIOs can also be switched to secure/protected dynamically, 
based on the use case.


So adding the "gpio-reserved-ranges" property does not look like a good 
option for most platforms.


Can you please check the initial patch suggested in this thread? Please 
let me know if you have any other options for the above points.


-Thanks, Prasad



I will add "gpio-reserved-ranges" to internal platforms and this issue
should not be observed.

-thanks, Prasad


--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora 
Forum,

Linux Foundation Collaborative Project


Re: protected pins and debugfs

2018-10-16 Thread Sodagudi Prasad

On 2018-10-10 12:40, Sodagudi Prasad wrote:

On 2018-10-07 23:04, Stephen Boyd wrote:

Quoting Sodagudi Prasad (2018-10-03 05:38:24)


 for (i = 0; i < chip->ngpio; i++, gpio++) {
+   label = gpiochip_is_requested(chip, i);
+   if (!label)
+   continue;
 msm_gpio_dbg_show_one(s, NULL, chip, i, gpio);
-   seq_puts(s, "\n");
 }
  }



Does something not work with the following code in
msm_gpio_dbg_show_one()?


if (!gpiochip_line_is_valid(chip, offset))
return;


Hi Stephen,
I didnt realize that these changes are merged on tip. I was testing on
4.14 kernel.

https://lore.kernel.org/patchwork/patch/878107/
https://lore.kernel.org/patchwork/patch/878106/
https://lore.kernel.org/patchwork/patch/878109/



Hi Stephen,

After checking this further, adding "gpio-reserved-ranges" in not good 
option. Because of the following reasons.
1) These gpio information changes from platform to platform. So need to 
maintain reserved-range properly for each platform.
2) Also some of the gpio can be changed to secure/protected gpio 
dynamically based on the use case.


It looks adding the "gpio-reserved-ranges" ranges is not good option for 
most of the platforms.


Can you please check the initial patch suggested in this thread? Please 
let me know if you have any other options for the above points.


-Thanks, Prasad



I will add "gpio-reserved-ranges" to internal platforms and this issue
should not be observed.

-thanks, Prasad


--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora 
Forum,

Linux Foundation Collaborative Project


Re: livelock with hrtimer cpu_base->lock

2018-10-12 Thread Sodagudi Prasad

On 2018-10-10 09:49, Will Deacon wrote:

Hi Prasad,

On Tue, Oct 09, 2018 at 01:56:14PM -0700, Sodagudi Prasad wrote:
This is regarding the thread "try to fix contention between 
expire_timers and try_to_del_timer_sync":
https://lkml.org/lkml/2017/7/28/172

I think this live-lockup issue was discussed earlier, but the final set 
of changes was never concluded.


Well we basically need a way to pick a value for CPU_RELAX_WFE_THRESHOLD.
Do you have any ideas? It could be determined at runtime if necessary.


Hi Will,

Please share what values should be tried for CPU_RELAX_WFE_THRESHOLD.
It would be great if it could be determined at runtime. Please let me 
know if any testing needs to be done with a dynamic-detection patch.

-Thanks, Prasad

I would like to check whether you have any new updates on this issue.
This problem is observed with the 4.14.64 stable kernel too; we see it
two times in overnight testing.

I had to add the following code to avoid the livelock. I am thinking of
fixing this at the cpu_relax() level.

+++ b/kernel/time/hrtimer.c
@@ -52,6 +52,7 @@
 #include 
 #include 
 #include 
+#include <linux/delay.h>
 
 #include 
 
@@ -152,6 +153,7 @@ struct hrtimer_clock_base *lock_hrtimer_base(const struct hrtimer *timer,
 			raw_spin_unlock_irqrestore(&base->cpu_base->lock, *flags);
 		}
 		cpu_relax();
+		udelay(1);
 	}
 }
 
@@ -1067,6 +1069,7 @@ int hrtimer_cancel(struct hrtimer *timer)
 		if (ret >= 0)
 			return ret;
 		cpu_relax();
+		udelay(1);
 	}
 }
 EXPORT_SYMBOL_GPL(hrtimer_cancel);


This is just another bodge and likely to hurt in places where 1us is
excessive because there isn't contention.

Will




Re: protected pins and debugfs

2018-10-10 Thread Sodagudi Prasad

On 2018-10-07 23:04, Stephen Boyd wrote:

Quoting Sodagudi Prasad (2018-10-03 05:38:24)


 	for (i = 0; i < chip->ngpio; i++, gpio++) {
+		label = gpiochip_is_requested(chip, i);
+		if (!label)
+			continue;
 		msm_gpio_dbg_show_one(s, NULL, chip, i, gpio);
-		seq_puts(s, "\n");
 	}
 }



Does something not work with the following code in
msm_gpio_dbg_show_one()?


	if (!gpiochip_line_is_valid(chip, offset))
		return;


Hi Stephen,
I didn't realize that these changes were already merged upstream; I was 
testing on a 4.14 kernel.


https://lore.kernel.org/patchwork/patch/878107/
https://lore.kernel.org/patchwork/patch/878106/
https://lore.kernel.org/patchwork/patch/878109/

I will add "gpio-reserved-ranges" to internal platforms and this issue 
should not be observed.


-thanks, Prasad



livelock with hrtimer cpu_base->lock

2018-10-09 Thread Sodagudi Prasad

Hi Will,

This is regarding the thread "try to fix contention between 
expire_timers and try_to_del_timer_sync":

https://lkml.org/lkml/2017/7/28/172

I think this live-lockup issue was discussed earlier, but the final set 
of changes was never concluded.
I would like to check whether you have any new updates on this issue. 
This problem is observed with the 4.14.64 stable kernel too.

We see this problem two times in overnight testing.

I had to add the following code to avoid the livelock. I am thinking of 
fixing this at the cpu_relax() level.


+++ b/kernel/time/hrtimer.c
@@ -52,6 +52,7 @@
 #include 
 #include 
 #include 
+#include <linux/delay.h>
 
 #include 
 
@@ -152,6 +153,7 @@ struct hrtimer_clock_base *lock_hrtimer_base(const struct hrtimer *timer,
 			raw_spin_unlock_irqrestore(&base->cpu_base->lock, *flags);
 		}
 		cpu_relax();
+		udelay(1);
 	}
 }
 
@@ -1067,6 +1069,7 @@ int hrtimer_cancel(struct hrtimer *timer)
 		if (ret >= 0)
 			return ret;
 		cpu_relax();
+		udelay(1);
 	}
 }
 EXPORT_SYMBOL_GPL(hrtimer_cancel);

Note: the timer event stream is enabled, and the livelock is still observed.

-Thanks, Prasad



protected pins and debugfs

2018-10-03 Thread Sodagudi Prasad

Hi All,

This is regarding reading and printing the configuration of protected 
pins from a non-secure operating system.
The GPIO framework checks whether a pin is in use (the FLAG_REQUESTED 
flag) in gpiolib_dbg_show().


But GPIO chip drivers that override the dbg_show callback do not check 
whether a pin is really in use before printing its configuration 
details:

	if (chip->dbg_show)
		chip->dbg_show(s, chip);
	else
		gpiolib_dbg_show(s, gdev);

I think the gpio-reserved-ranges solution may just move the problem 
into DTS maintenance, since the settings change from platform to 
platform (see the fragment after the links below).

https://lore.kernel.org/patchwork/patch/878107/
https://lore.kernel.org/patchwork/patch/878106/
https://lore.kernel.org/patchwork/patch/878109/
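
(For reference, the property in question is a per-platform devicetree 
fragment along these lines; the node label and pin ranges here are 
hypothetical:)

&tlmm {
	/* Hide GPIOs 0-3 and 81-84 from the kernel and debugfs. */
	gpio-reserved-ranges = <0 4>, <81 4>;
};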

Can we use a simple/common solution like the one below? It checks 
whether a pin is in use, with the help of gpiochip_is_requested(), 
before printing the configuration data.


index fef0970..2ca1440 100644
--- a/drivers/pinctrl/qcom/pinctrl-msm.c
+++ b/drivers/pinctrl/qcom/pinctrl-msm.c
@@ -579,16 +579,20 @@ static void msm_gpio_dbg_show_one(struct seq_file *s,
 	seq_printf(s, " %-8s: %-3s %d", g->name, is_out ? "out" : "in", func);
 	seq_printf(s, " %dmA", msm_regval_to_drive(drive));
 	seq_printf(s, " %s", pulls[pull]);
+	seq_puts(s, "\n");
 }
 
 static void msm_gpio_dbg_show(struct seq_file *s, struct gpio_chip *chip)
 {
 	unsigned gpio = chip->base;
 	unsigned i;
+	const char *label;
 
 	for (i = 0; i < chip->ngpio; i++, gpio++) {
+		label = gpiochip_is_requested(chip, i);
+		if (!label)
+			continue;
 		msm_gpio_dbg_show_one(s, NULL, chip, i, gpio);
-		seq_puts(s, "\n");
 	}
 }

-Thanks, Prasad



EDAC polling

2018-09-11 Thread Sodagudi Prasad



Hi All,

While adding edac_device control info using edac_device_add_device(), 
our client driver changes poll_msec at the driver level.

For example:
https://source.codeaurora.org/quic/la/kernel/msm-4.9/tree/drivers/edac/qcom_llcc_edac.c?h=msm-4.9#n385


I see that the client driver's request is not honored and the default 
value (1 second) is always used: during initialization, 
edac_device_add_device() calls edac_device_workq_setup() with a 
hard-coded 1000 msec, which overwrites poll_msec.

int edac_device_add_device(struct edac_device_ctl_info *edac_dev)
{
...
...
/* If there IS a check routine, then we are running POLLED */
if (edac_dev->edac_check != NULL) {
/* This instance is NOW RUNNING */
edac_dev->op_state = OP_RUNNING_POLL;

/*
 * enable workq processing on this instance,
 * default = 1000 msec
 */
edac_device_workq_setup(edac_dev, 1000);
} else {
edac_dev->op_state = OP_RUNNING_INTERRUPT;
}


May I know why the client's edev_ctl->poll_msec setting is not honored?
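
(In other words, something along these lines - a sketch of the idea 
rather than a proposed patch - would let the driver's value through; 
poll_msec is the edac_device_ctl_info field the client driver fills in 
before registering:)

	if (edac_dev->edac_check != NULL) {
		edac_dev->op_state = OP_RUNNING_POLL;

		/* Honor a client-supplied interval, else the old default. */
		edac_device_workq_setup(edac_dev,
				edac_dev->poll_msec ? edac_dev->poll_msec
						    : 1000);
	} else {
		edac_dev->op_state = OP_RUNNING_INTERRUPT;
	}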


-Thanks, Prasad



KASAN errors from unwind_frame

2018-08-10 Thread Sodagudi Prasad

Hi All,

I have observed the following KASAN error with the 4.14.56 kernel.
Can you please pick change [1] ("kasan: add no_sanitize attribute for 
clang builds") into the stable kernels?


[1] - 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/include/linux/compiler-clang.h?h=v4.18-rc8=12c8f25a016dff69ee284aa3338bebfd2cfcba33
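
(For context, the heart of the referenced commit is roughly the 
following definition in include/linux/compiler-clang.h; without it, 
clang builds ignore the annotation that READ_ONCE_NOCHECK() relies on, 
so the unwinder's deliberately unchecked stack reads trip KASAN as 
shown below:)

#define __no_sanitize_address __attribute__((no_sanitize("address")))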



==
BUG: KASAN: out-of-bounds in __read_once_size_nocheck 
include/linux/compiler.h:196 [inline]
BUG: KASAN: out-of-bounds in unwind_frame+0xc4/0x324 
arch/arm64/kernel/stacktrace.c:56

Read of size 8 at addr ffe3123ff4b0 by task poc/15233

CPU: 7 PID: 15233 Comm: poc Tainted: G S  W  O4.14.56+ #3
Hardware name: Qualcomm Technologies, Inc.
Call trace:
dump_backtrace+0x0/0x388
show_stack+0x24/0x30
__dump_stack+0x24/0x2c
dump_stack+0x8c/0xd0
print_address_description+0x74/0x234
kasan_report+0x240/0x264
__asan_report_load8_noabort+0x2c/0x38
unwind_frame+0xc4/0x324
walk_stackframe+0x44/0x6c
__save_stack_trace+0x250/0x444
save_stack_trace_tsk+0x2c/0x38
proc_pid_stack+0x134/0x268
proc_single_show+0xdc/0x130
traverse+0x244/0x5b0
seq_lseek+0x10c/0x27c
vfs_llseek+0xb4/0xe4
SyS_lseek+0x54/0xa0
el0_svc_naked+0x34/0x38

The buggy address belongs to the page:
page:ffbf8c48ffc0 count:0 mapcount:0 mapping:  (null) 
index:0x0

flags: 0x0()
raw:    
raw:  dead0200  
page dumped because: kasan: bad access detected
page_owner info is not active (free page?)

Memory state around the buggy address:
ffe3123ff380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffe3123ff400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

ffe3123ff480: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

^
ffe3123ff500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffe3123ff580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==

-Thanks, Prasad


Re: [PATCH] dd: Invoke one probe retry cycle after every initcall level

2018-08-10 Thread Sodagudi Prasad

On 2018-08-10 00:10, Rafael J. Wysocki wrote:

On Fri, Aug 10, 2018 at 12:30 AM,   wrote:

On 2018-08-06 01:53, Rafael J. Wysocki wrote:


On Fri, Aug 3, 2018 at 12:20 AM, Sodagudi Prasad wrote:


From: RAFAEL J. WYSOCKI 
Date: Wed, Aug 1, 2018 at 2:21 PM
Subject: Re: [PATCH] dd: Invoke one probe retry cycle after every
initcall level
To: Rishabh Bhatnagar 
Cc: "Rafael J. Wysocki" , Greg Kroah-Hartman
, Linux Kernel Mailing List
, ckad...@codeaurora.org,
ts...@codeaurora.org, Vikram Mulukutla 


On Wed, Aug 1, 2018 at 11:18 PM,   wrote:



On 2018-07-24 01:24, Rafael J. Wysocki wrote:




On Mon, Jul 23, 2018 at 10:22 PM,   wrote:




On 2018-07-23 04:17, Rafael J. Wysocki wrote:





On Thu, Jul 19, 2018 at 11:24 PM, Rishabh Bhatnagar wrote:





Drivers that are registered at an initcall level may have to
wait until late_init before the probe deferral mechanism can
retry their probe functions. It is possible that their
dependencies were resolved much earlier, in some cases even
before the next initcall level. Invoke one probe retry cycle
at every _sync initcall level, allowing these drivers to be
probed earlier.






Can you please say something about the actual use case this is
expected to address?





We have a display driver that depends on 3 other devices being probed 
so that it can bring up the display. Because of dependencies not being 
met, the deferral mechanism defers the probes to a later time, even 
though the dependencies might be met earlier. With this change the 
display can be brought up much earlier.





OK

What runlevel brings up the display after the change?

Thanks,
Rafael



After the change the display can come up after the device_initcall 
level itself.
The above-mentioned 3 devices are probed at 0.503253, 0.505210 and 
0.523264 secs. Only the first device is probed successfully. With the 
current deferral mechanism the other devices get probed again after 
late_initcall, at 9.19 and 9.35 secs, so the display can only come up 
after 9.35 secs. With this change the devices are re-probed 
successfully at 0.60 and 0.613 secs, so the display can come up just 
after 0.613 secs.




OK, so why do you touch the initcall levels earlier than device_?



1) Re-probing the devices on the active list at every level helps to 
avoid a circular-dependency pending list.
2) Many devices get deferred at the earlier initcall levels, so we 
wanted to re-probe them at every successive initcall level.



Do you have specific examples of devices for which that helps?




This change helps overall Android bootup as well.




How exactly?



We have seen a smaller number of re-probes at late_init, and most 
drivers' dependencies are met earlier than the late_initcall level. It 
helped the display and a couple of other drivers by executing the 
re-probe work at every initcall level.



So I can believe that walking the deferred list on device_initcall and
maybe on device_initcall_sync may help, but I'm not quite convinced
that it matters for the earlier initcall levels.



Many of our drivers depend on the regulator and bus drivers. Both the 
regulator and the bus driver are probed at the subsys_initcall level, 
and the probe of the bus driver requires the regulator to be working. 
If the bus driver's probe happens before the regulator's, the bus 
driver's probe is deferred, and the probes of all other devices that 
depend on the bus driver are deferred as well.
This patch reduces the impact of that problem.


Fair enough, but this information should be there in the changelog of
your patch.

And why do you do that for arch_initcall_sync()?

You are right; we can remove arch_initcall_sync().

Added some logging and observed that none of the devices are re-probed 
at the arch_initcall_sync level.
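
(For reference, the mechanism under discussion amounts to something 
like the following inside drivers/base/dd.c - a sketch of the idea, not 
the posted patch; driver_deferred_probe_trigger() and 
deferred_probe_work are the existing statics there that the stock 
late_initcall hook already uses:)

static int __init deferred_probe_retry(void)
{
	/* Kick one deferred-probe pass now instead of waiting for
	 * late_initcall, then wait for it to finish. */
	driver_deferred_probe_trigger();
	flush_work(&deferred_probe_work);
	return 0;
}
subsys_initcall_sync(deferred_probe_retry);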


-Thanks, Prasad


Re: cpu stopper threads and setaffinity leads to deadlock

2018-08-03 Thread Sodagudi Prasad

On 2018-08-03 04:41, Thomas Gleixner wrote:

Prasad.

On Thu, 2 Aug 2018, Peter Zijlstra wrote:


So why didn't you do the 'obvious' parallel to what you did for
cpu_stop_queue_two_works(), namely:


Is that patch fixing the issue for you?

 Hi Thomas and Peter,

Yes. I tested both versions of the patch, and both variants work on 
Qualcomm devices under stress testing of setaffinity and cross-CPU task 
migration, which previously led to the deadlock.


-Thanks, Prasad



--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -81,6 +81,7 @@ static bool cpu_stop_queue_work(unsigned
 	unsigned long flags;
 	bool enabled;
 
+	preempt_disable();
 	raw_spin_lock_irqsave(&stopper->lock, flags);
 	enabled = stopper->enabled;
 	if (enabled)
@@ -90,6 +91,7 @@ static bool cpu_stop_queue_work(unsigned
 	raw_spin_unlock_irqrestore(&stopper->lock, flags);
 
 	wake_up_q(&wakeq);
+	preempt_enable();
 
 	return enabled;
 }





Re: [PATCH] dd: Invoke one probe retry cycle after every initcall level

2018-08-02 Thread Sodagudi Prasad

From: RAFAEL J. WYSOCKI 
Date: Wed, Aug 1, 2018 at 2:21 PM
Subject: Re: [PATCH] dd: Invoke one probe retry cycle after every
initcall level
To: Rishabh Bhatnagar 
Cc: "Rafael J. Wysocki" , Greg Kroah-Hartman
, Linux Kernel Mailing List
, ckad...@codeaurora.org,
ts...@codeaurora.org, Vikram Mulukutla 

On Wed, Aug 1, 2018 at 11:18 PM,   wrote:

On 2018-07-24 01:24, Rafael J. Wysocki wrote:


On Mon, Jul 23, 2018 at 10:22 PM,   wrote:


On 2018-07-23 04:17, Rafael J. Wysocki wrote:



On Thu, Jul 19, 2018 at 11:24 PM, Rishabh Bhatnagar wrote:



Drivers that are registered at an initcall level may have to
wait until late_init before the probe deferral mechanism can
retry their probe functions. It is possible that their
dependencies were resolved much earlier, in some cases even
before the next initcall level. Invoke one probe retry cycle
at every _sync initcall level, allowing these drivers to be
probed earlier.




Can you please say something about the actual use case this is
expected to address?



We have a display driver that depends on 3 other devices being probed
so that it can bring up the display. Because of dependencies not being
met, the deferral mechanism defers the probes to a later time, even
though the dependencies might be met earlier. With this change the
display can be brought up much earlier.



OK

What runlevel brings up the display after the change?

Thanks,
Rafael

After the change the display can come up after the device_initcall level
itself.
The above-mentioned 3 devices are probed at 0.503253, 0.505210 and
0.523264 secs. Only the first device is probed successfully. With the
current deferral mechanism the other devices get probed again after
late_initcall, at 9.19 and 9.35 secs, so the display can only come up
after 9.35 secs. With this change the devices are re-probed successfully
at 0.60 and 0.613 secs, so the display can come up just after 0.613 secs.


OK, so why do you touch the initcall levels earlier than device_?

1)	Re-probing the devices on the active list at every level helps to 
avoid a circular-dependency pending list.
2)	Many devices get deferred at the earlier initcall levels, so we 
wanted to re-probe them at every successive initcall level.





This change helps overall Android bootup as well.


How exactly?
We have seen a smaller number of re-probes at late_init, and most 
drivers' dependencies are met earlier than the late_initcall level. It 
helped the display and a couple of other drivers by executing the 
re-probe work at every initcall level.


-thanks, Prasad


Re: Remounting filesystem read-only

2018-08-01 Thread Sodagudi Prasad

On 2018-07-28 00:47, Darrick J. Wong wrote:

On Fri, Jul 27, 2018 at 08:18:23PM -0400, Theodore Y. Ts'o wrote:

On Fri, Jul 27, 2018 at 01:34:31PM -0700, Sodagudi Prasad wrote:
> > The error should be pretty clear: "Inode table for bg 0 marked as
> > needing zeroing".  That should never happen.
>
> Can you provide any debug patch to detect when this corruption is happening?
> What is the source of this corruption, and how is this partition getting corrupted?
> Or which file system operation led to this corruption?

Do you have a reliable repro?  If it's a one-off, it can be caused by
*anything*.  Crappy hardware, a bug in some proprietary, binary-only
GPU driver dereferencing some wild pointer that corrupts kernel
memory, etc.

Asking for a debug patch is like asking "can you create technology
that can detect when a cockroach enters my house?"


Well, ext4 *could* add metadata read and write verifiers to complain
loudly in dmesg about stuff that shouldn't be there, so at least we'd
know when we're writing cockroaches into the house... :)

--D


Hi Ted,

The change below fixed this issue. Thanks for your support.

https://github.com/torvalds/linux/commit/5012284700775a4e6e3fbe7eac4c543c4874b559

"ext4: fix check to prevent initializing reserved inodes"

-Thanks, Prasad


So if you have a reliable repro, then we know what operations might be
triggering the corruption, and then you work on creating a minimal
repro, and only *then* when we have a restricted set of possibilities
that might be the cause (for example, if removing a GPU call makes the
problem go away, then the patch would need to be in the proprietary
GPU driver)

> I am digging code a bit around this warning to understand more.

The warning means that a flag in block group descriptor #0 is set
that should never be set.  How did the flag get set?  There is any
number of things that could cause that.

You might want to look at the block group descriptor via dumpe2fs or
debugfs, to see if it's just a single bit getting flipped, or if the
entire block group descriptor is garbage.  Note that under normal code
paths, the flag *never* gets set by ext4 kernel code.  The flag will
get set on non-block group 0 block group descriptors by ext4, and the
ext4 kernel code will only clear the flag.

Of course, if there is a bug in some driver that dereferences a
pointer widely, all bets are off.

- Ted




cpu stopper threads and setaffinity leads to deadlock

2018-08-01 Thread Sodagudi Prasad

Hi Peter and Tglx,

We are observing another deadlock issue due to commit
0b26351b91 ("stop_machine, sched: Fix migrate_swap() vs. active_balance()
deadlock"), even after taking the following fix
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1740526.html
on the Linux 4.14.56 kernel.


Here is the scenario that leads to this deadlock.
We have used the stress-ng-64 --affinity test case to reproduce this 
issue in a controlled environment, while simultaneously running CPU hot 
plug and task migrations.


Stress-ng-affin (call stack shown below) is changing its own affinity
from cpu3 to cpu7. Stress-ng-affin is preempted in the
cpu_stop_queue_work() function as soon as the stopper lock for
migration/3 is released. At the same time, on CPU 7, cross migration of
tasks happens between cpu3 and cpu7.


===
Process: stress-ng-affin, cpu: 3 pid: 1748 start: 0xffd8817e4480
=
Task name: stress-ng-affin pid: 1748 cpu: 3 start: ffd8817e4480
state: 0x0 exit_state: 0x0 stack base: 0xff801c8e8000 Prio: 120
Stack:
[] __switch_to+0xb8
[] __schedule+0x690
[] preempt_schedule_common+0x100
[] preempt_schedule+0x24
[] _raw_spin_unlock_irqrestore+0x64
[] cpu_stop_queue_work+0x9c
[] stop_one_cpu+0x58
[] __set_cpus_allowed_ptr+0x234
[] sched_setaffinity+0x150
[] SyS_sched_setaffinity+0xcc
[] el0_svc_naked+0x34
[<0>] UNKNOWN+0x0

Due to cross migration of tasks between cpu7 and cpu3, migration/7 has
started executing and waits for the migration/3 task, so that they can
proceed through the multi-cpu-stop state machine together.
Unfortunately stress-ng-affin is affine to cpu7, and since migration/7
has started running and has monopolized cpu7's execution, stress-ng-affin
will never run on cpu7, and cpu3's migration task is never woken up.


Essentially:
Due to the nature of the wake_q interface,  a thread can only be in at 
most one wake queue at a time.
migration/3 is currently in stress-ng-affin’s wake_q. This means that no 
other thread can add migration/3 to their wake queue.
Thus, even if any attempt is made to stop CPU 3 (e.g. cross-migration, 
hot plugging, etc), no thread will wake up migration/3.
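A minimal, self-contained userspace model of that single-membership property
(illustrative only, not kernel source; it mirrors the cmpxchg() in the
kernel's wake_q_add()):

#include <stdatomic.h>
#include <stdio.h>

#define WAKE_Q_TAIL ((struct wake_q_node *)0x1)

struct wake_q_node { _Atomic(struct wake_q_node *) next; };
struct task { const char *comm; struct wake_q_node wake_q; };

/* Succeeds only if the task is not already on some wake queue, like the
 * kernel's cmpxchg(&node->next, NULL, WAKE_Q_TAIL) in wake_q_add(). */
static int wake_q_add(struct task *t)
{
	struct wake_q_node *expected = NULL;
	return atomic_compare_exchange_strong(&t->wake_q.next,
					      &expected, WAKE_Q_TAIL);
}

int main(void)
{
	struct task migration3 = { "migration/3", { NULL } };

	printf("stress-ng-affin queues %s: %s\n", migration3.comm,
	       wake_q_add(&migration3) ? "queued" : "dropped");
	printf("second waker queues %s:  %s\n", migration3.comm,
	       wake_q_add(&migration3) ? "queued" : "dropped");
	return 0;
}

The second add is silently dropped, which is the hole this deadlock falls
through.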


Below change helped to fix this deadlock.
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index e190d1e..f932e1e 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -87,9 +87,9 @@ static bool cpu_stop_queue_work(unsigned int cpu, struct cpu_stop_work *work)
 		__cpu_stop_queue_work(stopper, work, &wakeq);
 	else if (work->done)
 		cpu_stop_signal_done(work->done);
-	raw_spin_unlock_irqrestore(&stopper->lock, flags);
 
 	wake_up_q(&wakeq);
+	raw_spin_unlock_irqrestore(&stopper->lock, flags);


-Thanks, Prasad

--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora 
Forum,

Linux Foundation Collaborative Project


Re: [PATCH] stop_machine: Disable preemption after queueing stopper threads

2018-08-01 Thread Sodagudi Prasad

On 2018-07-30 14:07, Peter Zijlstra wrote:

On Mon, Jul 30, 2018 at 10:12:43AM -0700, Sodagudi Prasad wrote:
How about including the below change as well? Currently, there is no way to
identify whether thread migrations have completed or not. When we observe
this issue, the symptom was a workqueue lockup. It is better to have some
timeout here and induce the BUG_ON.


You'd trigger the soft-lockup or hung-task detector I think. And if not,
we ought to look at making it trigger at least one of those.


There is no way to identify whether the migration threads are stuck or not.


Should be pretty obvious from the splat generated by the above, no?

Hi Peter and Thomas,

Thanks for your support.
I have another question on this flow, about the retry mechanism used in the
cpu_stop_queue_two_works() function via the global variable
stop_cpus_in_progress.


This variable is used in various paths, such as task migration, setting task
affinity, and CPU hotplug.


For example, in the CPU hotplug path, the stop_cpus_in_progress variable gets
set to true without any check:

takedown_cpu()
--stop_machine_cpuslocked()
---stop_cpus()
----__stop_cpus()
-----queue_stop_cpus_work()
which sets stop_cpus_in_progress to true directly.

But only in the task migration path is the stop_cpus_in_progress variable
used for a retry, as the sketch below shows.
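Roughly, the two users look like this (a paraphrased sketch from my reading
of the 4.14 sources, not verbatim kernel code):

/* hotplug path: stop_cpus() -> __stop_cpus() -> queue_stop_cpus_work() */
	stop_cpus_in_progress = true;		/* set unconditionally */
	for_each_cpu(cpu, cpumask)
		cpu_stop_queue_work(cpu, work);
	stop_cpus_in_progress = false;

/* migration path: cpu_stop_queue_two_works() is the only place that checks */
retry:
	/* ...take stopper1->lock and stopper2->lock... */
	if (stop_cpus_in_progress) {
		/* ...drop both locks, then spin and retry... */
		while (stop_cpus_in_progress)
			cpu_relax();
		goto retry;
	}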


I am thinking that the stop_cpus_in_progress variable can lead to race
conditions when CPU hotplug and task migration happen simultaneously. Please
correct me if my understanding is wrong.


-Thanks, Prasad




--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -290,6 +290,7 @@ int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void *arg)
 	struct cpu_stop_done done;
 	struct cpu_stop_work work1, work2;
 	struct multi_stop_data msdata;
+	int ret;
 
 	msdata = (struct multi_stop_data){
 		.fn = fn,
@@ -312,7 +313,10 @@ int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void *arg)
 	if (cpu_stop_queue_two_works(cpu1, &work1, cpu2, &work2))
 		return -ENOENT;
 
-	wait_for_completion(&done.completion);
+	ret = wait_for_completion_timeout(&done.completion,
+					  msecs_to_jiffies(1000));
+	if (!ret)
+		BUG_ON(1);
+


That's a random timeout, which if you spuriously trigger it, will take
down your machine. That seems like a cure worse than the disease.
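If a diagnostic is still wanted at this spot, one softer variant (purely
illustrative, not a submitted patch) would be to warn and keep waiting, so a
slow but live stopper still completes:

	while (!wait_for_completion_timeout(&done.completion,
					    msecs_to_jiffies(1000)))
		WARN_ONCE(1, "stop_two_cpus: stopper threads stalled for >1s\n");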


--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora 
Forum,

Linux Foundation Collaborative Project


Re: [PATCH] stop_machine: Disable preemption after queueing stopper threads

2018-07-30 Thread Sodagudi Prasad

On 2018-07-30 05:41, Thomas Gleixner wrote:

On Mon, 30 Jul 2018, Peter Zijlstra wrote:


On Mon, Jul 30, 2018 at 12:20:57PM +0200, Thomas Gleixner wrote:
> On Tue, 24 Jul 2018, Sebastian Andrzej Siewior wrote:
> > On 2018-07-23 18:13:48 [-0700], isa...@codeaurora.org wrote:
> > > Hi all,
> > Hi,
> >
> > > Are there any comments about this patch?
> >
> > I haven't looked in detail at this but your new preempt_disable() makes
> > things unbalanced for the err != 0 case.
>
> It doesn't but that code is really an unreadable pile of ...

---
Subject: stop_machine: Reflow cpu_stop_queue_two_works()

The code flow in cpu_stop_queue_two_works() is a little arcane; fix
this by lifting the preempt_disable() to the top to create more natural
nesting wrt the spinlocks and make the wake_up_q() and preempt_enable()
unconditional at the end.

Furthermore, enable preemption in the -EDEADLK case, such that we
spin-wait with preemption enabled.

Suggested-by: Thomas Gleixner 
Signed-off-by: Peter Zijlstra (Intel) 



Hi Peter/Thomas,

How about including the below change as well? Currently, there is no way to
identify whether thread migrations have completed or not. When we observe
this issue, the symptom was a workqueue lockup. It is better to have some
timeout here and induce the BUG_ON.


There is no way to identify whether the migration threads are stuck or not.

--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -290,6 +290,7 @@ int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void *arg)
 	struct cpu_stop_done done;
 	struct cpu_stop_work work1, work2;
 	struct multi_stop_data msdata;
+	int ret;
 
 	msdata = (struct multi_stop_data){
 		.fn = fn,
@@ -312,7 +313,10 @@ int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void *arg)
 	if (cpu_stop_queue_two_works(cpu1, &work1, cpu2, &work2))
 		return -ENOENT;
 
-	wait_for_completion(&done.completion);
+	ret = wait_for_completion_timeout(&done.completion,
+					  msecs_to_jiffies(1000));
+	if (!ret)
+		BUG_ON(1);
+



Thanks for cleaning that up!

Reviewed-by: Thomas Gleixner 


--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora 
Forum,

Linux Foundation Collaborative Project


Re: Remounting filesystem read-only

2018-07-27 Thread Sodagudi Prasad

On 2018-07-27 12:52, Theodore Y. Ts'o wrote:

On Fri, Jul 27, 2018 at 12:26:25PM -0700, Sodagudi Prasad wrote:

On 2018-07-26 18:04, Sodagudi Prasad wrote:
> Hi All,
>

+linux-kernel@vger.kernel.org list.

Hi All,

Observing the following issue with one of the partitions on an Android device
with the 4.14.56 kernel.  When I try to remount this partition using the
command - mount -o rw,remount /vendor/dsp, it is remounted as read-only.

[  191.364358] EXT4-fs error (device sde9): ext4_has_uninit_itable:3108:
comm mount: Inode table for bg 0 marked as needing zeroing
[  191.364762] Aborting journal on device sde9-8.
[  191.365226] EXT4-fs (sde9): Remounting filesystem read-only
[  191.365232] EXT4-fs (sde9): re-mounted. Opts: data=ordere

If I revert this commit [1] - "ext4: only look at the bg_flags field if it is
valid", the issue is not observed. It just gives the following warning
message.
[1] - https://patchwork.ozlabs.org/patch/929239/

[  123.373456] EXT4-fs (sde9): warning: mounting fs with errors, running
e2fsck is recommended
[  123.389649] EXT4-fs (sde9): re-mounted. Opts: data=ordered

Can you provide some input on what could be the issue with this partition?


The error should be pretty clear: "Inode table for bg 0 marked as
needing zeroing".  That should never happen.


Hi Ted,

Can you provide any debug patch to detect when this corruption is happening?

What is the source of this corruption, and how is this partition getting
corrupted? Or which file system operation leads to this corruption?

I am digging into the code a bit around this warning to understand more.

Thanks in advance for your help.

-Thanks, Prasad


The warning "mounting fs
with errors" means the file system was corrupted.  The commit is now
telling more about how the file system was corrupted, and is
protecting you from further data loss.  (You should never, ever, ever,
allow a file system to be mounted read/write with corruptions.  The
file system should have been checked using fsck.ext4, aka e2fsck,
before the file system was allowed to be mounted.)

  - Ted


--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora 
Forum,

Linux Foundation Collaborative Project


Re: Remounting filesystem read-only

2018-07-27 Thread Sodagudi Prasad

On 2018-07-26 18:04, Sodagudi Prasad wrote:

Hi All,



+linux-kernel@vger.kernel.org list.

Hi All,

Observing the following issue with one of the partitions on an Android
device with the 4.14.56 kernel.  When I try to remount this partition using
the command - mount -o rw,remount /vendor/dsp, it is remounted as
read-only.


[  191.364358] EXT4-fs error (device sde9): ext4_has_uninit_itable:3108: 
comm mount: Inode table for bg 0 marked as needing zeroing

[  191.364762] Aborting journal on device sde9-8.
[  191.365226] EXT4-fs (sde9): Remounting filesystem read-only
[  191.365232] EXT4-fs (sde9): re-mounted. Opts: data=ordere

If I revert this commit [1] - "ext4: only look at the bg_flags field if it
is valid", the issue is not observed. It just gives the following warning
message.

[1] - https://patchwork.ozlabs.org/patch/929239/

[  123.373456] EXT4-fs (sde9): warning: mounting fs with errors, running 
e2fsck is recommended

[  123.389649] EXT4-fs (sde9): re-mounted. Opts: data=ordered

Can you provide some input on what could be the issue with this partition?




-Thanks, Prasad


--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora 
Forum,

Linux Foundation Collaborative Project


Re: [PATCH] stop_machine: Remove cpu swap from stop_two_cpus

2018-06-27 Thread Sodagudi Prasad

On 2018-06-27 00:15, Sebastian Andrzej Siewior wrote:

On 2018-06-26 14:28:26 [-0700], Isaac J. Manjarres wrote:

Remove CPU ID swapping in stop_two_cpus() so that the
source CPU's stopper thread is added to the wake queue last,
so that the source CPU's stopper thread is woken up last,
ensuring that all other threads that it depends on are woken
up before it runs.


You can't do that because you could deadlock while locking the stopper
lock.

Without this change, boot-up issues are observed with Linux 4.14.52.

One of the cores is executing the stopper thread after wake_up_q() in the
cpu_stop_queue_two_works() function, without waking up the other core's
stopper thread.

We see this issue 100% of the time on device boot-up with Linux 4.14.52.

Could you please explain a bit more how the deadlock occurs?

static int cpu_stop_queue_two_works(int cpu1, struct cpu_stop_work *work1,
				    int cpu2, struct cpu_stop_work *work2)
{
	struct cpu_stopper *stopper1 = per_cpu_ptr(&cpu_stopper, cpu1);
	struct cpu_stopper *stopper2 = per_cpu_ptr(&cpu_stopper, cpu2);
	DEFINE_WAKE_Q(wakeq);
	int err;
retry:
	raw_spin_lock_irq(&stopper1->lock);
	raw_spin_lock_nested(&stopper2->lock, SINGLE_DEPTH_NESTING);



I think you are suggesting to switch the locking sequence too, i.e. take
stopper2->lock before stopper1->lock.

Could you please share the test case to stress this code flow?


Couldn't you swap cpu1+cpu2 and work1+work2?

 Work1 and work2 have the same data contents, as the initialization below shows.
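For reference, stop_two_cpus() initializes both works identically (quoted
from memory from the 4.14-era kernel/stop_machine.c, so verify against the
tree), which is why swapping them would be a no-op:

	work1 = work2 = (struct cpu_stop_work){
		.fn = multi_cpu_stop,
		.arg = &msdata,
		.done = &done
	};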





diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index f89014a..d10d633 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -307,8 +307,6 @@ int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void *arg)
 	cpu_stop_init_done(&done, 2);
 	set_state(&msdata, MULTI_STOP_PREPARE);
 
-	if (cpu1 > cpu2)
-		swap(cpu1, cpu2);
 	if (cpu_stop_queue_two_works(cpu1, &work1, cpu2, &work2))
 		return -ENOENT;



Sebastian


--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora 
Forum,

Linux Foundation Collaborative Project


Re: write_lock_irq(&tasklist_lock)

2018-05-24 Thread Sodagudi Prasad

On 2018-05-24 06:51, Boqun Feng wrote:

On Thu, May 24, 2018 at 01:49:31PM +0100, Will Deacon wrote:

On Wed, May 23, 2018 at 09:26:35AM -0700, Linus Torvalds wrote:
> On Wed, May 23, 2018 at 8:35 AM Will Deacon  wrote:
>
> > In other words, qrwlock requires consistent locking order wrt spinlocks.
>
> I *thought* lockdep already tracked and detected this. Or is that only
> with the sleeping versions?

There are patches in-flight to detect this:

https://marc.info/?l=linux-kernel=152483640529740=2k

as part of Boqun's work into recursive read locking.




Hi Linus and Will,

Thank you for quick responses and providing the suggestions.

The kernel version is locked for a couple of products, and the same issue is
observed on both the 4.14.41 and 4.9.96 kernels. We can only accept stable
updates from upstream for these products. If QUEUED_RWLOCKS works on the
above-listed kernel versions without any issues, we can enable QUEUED_RWLOCKS.

Can we go ahead with Linus's suggestion for these kernel versions, so that
IRQs won't be disabled for quite a long time?

static void tasklist_write_lock(void)
{
	unsigned long flags;

	local_irq_save(flags);
	while (!write_trylock(&tasklist_lock)) {
		local_irq_restore(flags);
		do { cpu_relax(); } while (write_islocked(&tasklist_lock));
		local_irq_disable();
	}
}


-Thanks, Prasad


Yeah, lemme put some details here:

So we have three cases:

Case #1 (from Will)

	P0:			P1:			P2:

	spin_lock()		read_lock()
						write_lock()
	read_lock()		spin_lock()

, which is a deadlock, and could not be detected by lockdep yet. And
lockdep could detect this with the patch I attach at the end of the
mail.

Case #2

	P0:			P1:			P2:

	spin_lock()		read_lock()
						write_lock()
	read_lock()		spin_lock_irq()

, which is not a deadlock, as the read_lock() on P0 can use the unfair
fastpass.

Case #3

	P0:			P1:			P2:

	spin_lock_irq()		read_lock()
						write_lock_irq()
	read_lock()		spin_lock()

, which is a deadlock, as the read_lock() on P0 cannot use the
fastpass.

To detect this and not to make case #2 as a false positive, the
recursive deadlock detection patchset is needed, the WIP version is at:

	git://git.kernel.org/pub/scm/linux/kernel/git/boqun/linux.git 
arr-rfc-wip


The code is done, I'm just working on the rework for documentation stuff,
so if anyone is interested, you could try it out ;-)

Regards,
Boqun

--->8
Subject: [PATCH] locking: More accurate annotations for read_lock()

On the archs using QUEUED_RWLOCKS, read_lock() is not always a recursive
read lock, actually it's only recursive if in_interrupt() is true. So
change the annotation accordingly to catch more deadlocks.

Note we used to treat read_lock() as pure recursive read locks in
lib/locking-seftest.c, and this is useful, especially for the lockdep
development selftest, so we keep this via a variable to force switching
lock annotation for read_lock().

Signed-off-by: Boqun Feng 
---
 include/linux/lockdep.h | 35 ++-
 lib/locking-selftest.c  | 11 +++
 2 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 6fc77d4dbdcd..07793986c063 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -27,6 +27,7 @@ extern int lock_stat;
 #include 
 #include 
 #include 
+#include 

 /*
  * We'd rather not expose kernel/lockdep_states.h this wide, but we do need
@@ -540,6 +541,31 @@ static inline void print_irqtrace_events(struct task_struct *curr)
 }
 #endif

+/* Variable used to make lockdep treat read_lock() as recursive in selftests */

+#ifdef CONFIG_DEBUG_LOCKING_API_SELFTESTS
+extern unsigned int force_read_lock_recursive;
+#else /* CONFIG_DEBUG_LOCKING_API_SELFTESTS */
+#define force_read_lock_recursive 0
+#endif /* CONFIG_DEBUG_LOCKING_API_SELFTESTS */
+
+#ifdef CONFIG_LOCKDEP
+/*
+ * read_lock() is recursive if:
+ * 1. We force lockdep think this way in selftests or
+ * 2. The implementation is not queued read/write lock or
+ * 3. The locker is at an in_interrupt() context.
+ */
+static inline bool read_lock_is_recursive(void)
+{
+   return force_read_lock_recursive ||
+  !IS_ENABLED(CONFIG_QUEUED_RWLOCKS) ||
+  in_interrupt();
+}
+#else /* CONFIG_LOCKDEP */
+/* If !LOCKDEP, the value is meaningless */
+#define read_lock_is_recursive() 0
+#endif
+
 /*
  * For trivial one-depth nesting of a lock-class, the following
  * global define can be used. (Subsystems with multiple levels
@@ -561,7 +587,14 


write_lock_irq(&tasklist_lock)

2018-05-22 Thread Sodagudi Prasad

Hi All,

When the following test is executed on the 4.14.41 stable kernel, we observed
that one of the cores waits on tasklist_lock for a long time with IRQs
disabled.

./stress-ng-64 --get 8 -t 3h --times --metrics-brief

Every time the device crashed, I observed one task stuck in the fork system
call, waiting for tasklist_lock as a writer with IRQs disabled.

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/tree/kernel/fork.c?h=linux-4.14.y#n1843

Some other tasks are making getrlimit and prlimit system calls, so these
readers are continuously taking the tasklist_lock read lock.

The writer has disabled local IRQs for a long time and is waiting for the
readers to finish, but the readers keep tasklist_lock busy for quite a long
time.
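As a userspace analogue of this reader storm (illustrative only: pthread
rwlocks are not qrwlocks, and the result depends on the implementation's
fairness policy), a writer polling with trywrlock against eight tight
readers loses most of its attempts:

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_rwlock_t lock = PTHREAD_RWLOCK_INITIALIZER;
static volatile int stop;

static void *reader(void *arg)
{
	(void)arg;
	while (!stop) {			/* models the getrlimit/prlimit storm */
		pthread_rwlock_rdlock(&lock);
		usleep(1000);		/* overlapping read-side hold times */
		pthread_rwlock_unlock(&lock);
	}
	return NULL;
}

int main(void)
{
	pthread_t t[8];
	int i, fails = 0;

	for (i = 0; i < 8; i++)
		pthread_create(&t[i], NULL, reader, NULL);
	usleep(100 * 1000);		/* let the reader storm build up */

	for (i = 0; i < 2000; i++) {	/* poll as a writer for ~2 seconds */
		if (pthread_rwlock_trywrlock(&lock) == 0)
			pthread_rwlock_unlock(&lock);
		else
			fails++;
		usleep(1000);
	}
	printf("writer lost %d of 2000 attempts\n", fails);

	stop = 1;
	for (i = 0; i < 8; i++)
		pthread_join(t[i], NULL);
	return 0;
}

In the kernel case the writer additionally sits with IRQs off the whole
time, which is what turns this starvation into a watchdog reset.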

I think the --get N option creates N threads, and they make the following
system calls.


start N workers that call system calls that fetch data from the kernel, 
currently these are: getpid,
getppid, getcwd, getgid, getegid, getuid, getgroups, getpgrp, getpgid, 
getpriority, getresgid, getresuid,
getrlimit, prlimit, getrusage, getsid, gettid, getcpu, gettimeofday, 
uname, adjtimex, sysfs.

Some of these system calls are OS specific.


Have you observed this type of issue with tasklist_lock?
Do we need write_lock_irq(&tasklist_lock) in the below portion of code? Can
I use write_lock instead of write_lock_irq in this portion of code?

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/tree/kernel/fork.c?h=linux-4.14.y#n1843

-Thanks, Prasad

--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora 
Forum,

Linux Foundation Collaborative Project


bpf: mismatch between inode->i_private and inode->i_op

2018-02-28 Thread Sodagudi Prasad

Hi All,

Trying to boot the 4.14.19 kernel with Android, we observed a crash in the
bpf system.
From the call stack, the netd process is trying to access the
/sys/fs/bpf/traffic_cookie_uid_map file.
After checking the call stack further, we observed that inode->i_private is
pointing to a (struct bpf_map *) but inode->i_op is pointing to
bpf_prog_iops.


[ 4134.721483] Unable to handle kernel paging request at virtual address 
80001

[ 4134.820925] Mem abort info:
[ 4134.901283]   Exception class = DABT (current EL), IL = 32 bits
[ 4135.016736]   SET = 0, FnV = 0
[ 4135.119820]   EA = 0, S1PTW = 0
[ 4135.201431] Data abort info:
[ 4135.301388]   ISV = 0, ISS = 0x0021
[ 4135.359599]   CM = 0, WnR = 0
[ 4135.470873] user pgtable: 4k pages, 39-bit VAs, pgd = 
ffe39b946000
[ 4135.499757] [00080001] *pgd=, 
*pud=

[ 4135.660725] Internal error: Oops: 9621 [#1] PREEMPT SMP
[ 4135.674610] Modules linked in:
[ 4135.682883] CPU: 5 PID: 1260 Comm: netd Tainted: G S  W   
4.14.19+ #1

[ 4135.716188] task: ffe39f4aa380 task.stack: ff801d4e
[ 4135.731599] PC is at bpf_prog_add+0x20/0x68
[ 4135.741746] LR is at bpf_prog_inc+0x20/0x2c
[ 4135.751788] pc : [] lr : [] 
pstate: 60400145

[ 4135.769062] sp : ff801d4e3ce0
[ 4135.777241] x29: ff801d4e3cf0 x28: ffe39f4aa380
[ 4135.790125] x27: ffe39f4aa380 x26: 0118
[ 4135.803020] x25: 0030 x24: 0015
[ 4135.815799] x23:  x22: 01080020
[ 4135.828704] x21: ffe3a01bf8f8 x20: 0001
[ 4135.841494] x19: ffe383821280 x18: 
[ 4135.854346] x17:  x16: ff94acc92000
[ 4135.867083] x15: 00042e7f x14: ff94ace81d88
[ 4135.880041] x13: 0001e4a7 x12: 0110
[ 4135.892894] x11: bdf96acb38411b00 x10: 
[ 4135.905610] x9 : ff94ac24c400 x8 : 00080001
[ 4135.918578] x7 :  x6 : ff801d4e3c24
[ 4135.931357] x5 : ff801d4e3c10 x4 : 
[ 4135.944231] x3 :  x2 : 
[ 4135.957062] x1 : 0001 x0 : ffe383821280
[ 4135.969757]
[ 4135.969757] PC: 0xff94ab7ad544:
[ 4135.981567] d544  91074400 91278021 72a00103 aa1303e2 94036495 
a9417bfd f84207f3 d65f03c0
[ 4136.001588] d564  a9be4ff4 a9017bfd 910043fd 2a0103f4 aa0003f3 
d503201f f9400e68 f9800111
[ 4136.021504] d584  885f7d09 0b140129 880afd09 35aa d5033bbf 
7140213f 5400010d f9400e69
[ 4136.041346] d5a4  b27ceff3 f9800131 885f7d28 4b140108 880b7d28 
35ab a9417bfd aa1303e0

[ 4136.061210]
[ 4136.061210] LR: 0xff94ab7ad5f8:
[ 4136.073031] d5f8  35aa d5033bbf 3489 a9417bfd a8c24ff4 
d65f03c0 d421 17fc
[ 4136.093094] d618  f81e0ff3 a9017bfd 910043fd aa0003f3 d503201f 
320003e1 aa1303e0 97cc
[ 4136.112946] d638  a9417bfd f84207f3 d65f03c0 f81e0ff3 a9017bfd 
910043fd aa0003f3 d503201f
[ 4136.132694] d658  f9400e6a b9400149 2a0903e8 340002a8 f9400e6b 
1100050d 93407d0c 93407dae

[ 4136.152810]
[ 4136.152810] SP: 0xff801d4e3ca0:
[ 4136.164610] 3ca0  ab7ad584 ff94 60400145  a01bf8f8 
ffe3 0006 
[ 4136.184188] 3cc0   0080 ab844204 ff94 1d4e3cf0 
ff80 ab7ad584 ff94
[ 4136.204262] 3ce0    83821280 ffe3 1d4e3d10 
ff80 ab7ad638 ff94
[ 4136.224210] 3d00  9136a3c0 ffe3 9136a3c0 ffe3 1d4e3d70 
ff80 ab7b5d08 ff94

[ 4136.253946]
[ 4136.258315] Process netd (pid: 1260, stack limit = 
0xff801d4e)

[ 4136.273746] Call trace:
[ 4136.280146] Exception stack(0xff801d4e3ba0 to 0xff801d4e3ce0)
[ 4136.295378] 3ba0: ffe383821280 0001  

[ 4136.313715] 3bc0:  ff801d4e3c10 ff801d4e3c24 

[ 4136.332157] 3be0: 00080001 ff94ac24c400  
bdf96acb38411b00
[ 4136.350831] 3c00: 0110 0001e4a7 ff94ace81d88 
00042e7f
[ 4136.369115] 3c20: ff94acc92000   
ffe383821280
[ 4136.387357] 3c40: 0001 ffe3a01bf8f8 01080020 

[ 4136.405694] 3c60: 0015 0030 0118 
ffe39f4aa380
[ 4136.424136] 3c80: ffe39f4aa380 ff801d4e3cf0 ff94ab7ad638 
ff801d4e3ce0
[ 4136.442494] 3ca0: ff94ab7ad584 60400145 ffe3a01bf8f8 
0006
[ 4136.460936] 3cc0: 0080 ff94ab844204 ff801d4e3cf0 
ff94ab7ad584

[ 4136.479241] [] bpf_prog_add+0x20/0x68
[ 4136.491767] [] bpf_prog_inc+0x20/0x2c
[ 4136.504536] [] bpf_obj_get_user+0x204/0x22c
[ 4136.518746] [] SyS_bpf+0x5a8/0x1a88

User space is trying to access a bpf map, so inode->i_op and
inode->i_private should be pointing to bpf_map_iops and a struct bpf_map *
respectively.

The inode->i_private value is correct, but the inode->i_op is not.

Are there any race conditions assigning 
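For context on why this mismatch is fatal: bpf_obj_get_user() derives the
object type purely from inode->i_op, so a prog-typed i_op on a map-backed
inode sends a struct bpf_map pointer into bpf_prog_inc(), matching the
bpf_prog_add+0x20 fault above. The dispatch looks like this in the 4.14-era
kernel/bpf/inode.c (quoted from memory, so verify against the tree):

static int bpf_inode_type(const struct inode *inode, enum bpf_type *type)
{
	*type = BPF_TYPE_UNSPEC;
	if (inode->i_op == &bpf_prog_iops)
		*type = BPF_TYPE_PROG;
	else if (inode->i_op == &bpf_map_iops)
		*type = BPF_TYPE_MAP;
	else
		return -EACCES;

	return 0;
}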


Re: [PATCH] perf: Add support for creating offline events

2018-02-13 Thread Sodagudi Prasad

On 2018-02-13 10:23, Peter Zijlstra wrote:

On Fri, Feb 09, 2018 at 03:07:00PM -0800, Raghavendra Rao Ananta wrote:

Perf framework doesn't allow creation of hardware events if
the requested CPU is offline. However, creation of an event
is achievable if the event is attached to the PMU as soon
as the CPU is online again.

So, introducing a feature that could allow to create events
even when the CPU is offline and return a success to the caller.
If, during the time of event creation, the CPU is found offline,
the event is moved to a new state (PERF_EVENT_STATE_DORMANT). As
and when the CPU is known to be woken up (through hotplug notifiers),
all the dormant events would be attached to the PMU (by
perf_install_in_context()). If during the life time of the event,
the CPU hasn't come online, the dormant event would just be freed.


This is horrible.. and you seem to have forgotten to explain why you
care about offline CPUs.

Hi Peter,

Up to the 4.9 kernel, drivers could register CPU hotplug notifiers and were
able to create perf events dynamically from the CPU notifier callbacks.
Since CPU hotplug was converted from hotplug notifiers to the state machine
approach, every driver needs to define a cpuhp state and register with the
CPU hotplug state machine to create perf events dynamically, as the sketch
below shows.
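A minimal sketch of that per-driver boilerplate (the my_pmu_* names and the
state string are made up for illustration; cpuhp_setup_state() and
CPUHP_AP_ONLINE_DYN are the real API):

static int my_pmu_online(unsigned int cpu)
{
	/* (re)create or unpark this CPU's perf event here */
	return 0;
}

static int my_pmu_offline(unsigned int cpu)
{
	/* tear down or park this CPU's perf event here */
	return 0;
}

static int __init my_pmu_init(void)
{
	/* CPUHP_AP_ONLINE_DYN returns a dynamically allocated state (> 0) */
	int ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "soc/my_pmu:online",
				    my_pmu_online, my_pmu_offline);

	return ret < 0 ? ret : 0;
}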


Qualcomm has use cases that monitor CPU cycles and other hardware events
continuously on all CPUs, from the kernel and from profiling tools.
So we are thinking that there could be other SoC vendors who are interested
in preserving perf events across CPU hotplug and in creating perf events on
hotplugged cores.


-Thanks, Prasad

--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora 
Forum,

Linux Foundation Collaborative Project


Re: [PATCH] kbuild: clang: Disable -Wunused-const-variable warnings

2018-01-29 Thread Sodagudi Prasad

On 2018-01-28 08:22, Segher Boessenkool wrote:

On Fri, Jan 26, 2018 at 04:59:46PM -0800, Prasad Sodagudi wrote:

Disable -Wunused-const-variable warnings instead of
disabling -Wunused-variable warnings, so that in both
clang and GCC -Wunused-const-variable gets disabled.


Why would you disable -Wunused-const-variable on GCC?  You do not
explain why.


Hi Segher,

Please check below discussion with Greg and Masahiro about 
"unused-variable".

https://lkml.org/lkml/2017/12/18/697


Please check this link; the "unused-const-variable" warning is already
disabled for GCC:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Makefile?h=v4.15#n719
The below commit disables "unused-const-variable" for GCC:
commit c9c6837d39311b0c - "kbuild: move -Wunused-const-variable to W=1
warning level"


Currently "unused-variable" warnings are disabled for clang and not for 
the gcc.  Disabling of
"unused-variable" warning also disables "unused-const-variable" warning. 
To match same settings
between clang and gcc, disabling "unused-const-variable" warnings for 
clang as well.
So with this patch, keeping the same settings for GCC and CLANG with 
respect to "unused-const-variable".


-Thanks, Prasad




Segher


--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora 
Forum,

Linux Foundation Collaborative Project


Re: unused-variable warning is getting disabled with clang

2017-12-07 Thread Sodagudi Prasad

On 2017-12-06 22:26, Greg Kroah-Hartman wrote:

On Wed, Dec 06, 2017 at 01:24:51PM -0800, Sodagudi Prasad wrote:


Hi All,

When kernel compiled with clang, following line is disabling the
unused-variable warning. This is not the case with gcc.
KBUILD_CFLAGS += $(call cc-disable-warning, unused-variable)

Are there any specific reasons for disabling unused-variable with 
clang?


Try it and see why it is disabled :)


Hi Greg,

When I enabled -Wunused-variable warnings with clang, I observed both 
-Wunused-variable and -Wunused-const-variable warnings, as expected.
It appears that -Wunused-const-variable warnings are explicitly disabled 
for GCC as well.
commit -  c9c6837d39311b0c - "kbuild: move -Wunused-const-variable to 
W=1 warning level"


I could see the following warnings, along with a couple of 
-Wunused-variable warnings, in downstream code.
arch/arm64/crypto/sha1-ce-glue.c:118:1: warning: unused variable 
'cpu_feature_match_SHA1' [-Wunused-const-variable]
include/linux/cpufeature.h:48:33: note: expanded from macro 
'module_cpu_feature_match'
arch/arm64/crypto/sha2-ce-glue.c:148:1: warning: unused variable 
'cpu_feature_match_SHA2' [-Wunused-const-variable]
arch/arm64/crypto/ghash-ce-glue.c:597:33: warning: unused variable 
'ghash_cpu_feature' [-Wunused-const-variable]
arch/arm64/crypto/aes-ce-cipher.c:280:1: warning: unused variable 
'cpu_feature_match_AES' [-Wunused-const-variable]
arch/arm64/crypto/aes-glue.c:674:1: warning: unused variable 
'cpu_feature_match_AES' [-Wunused-const-variable]
kernel/trace/ftrace.c:1092:27: warning: unused variable 
'ftrace_swapper_pid' [-Wunused-const-variable]
drivers/usb/host/ehci-platform.c:406:36: warning: unused variable 
'ehci_acpi_match' [-Wunused-const-variable]
drivers/usb/host/xhci-plat.c:416:36: warning: unused variable 
'usb_xhci_acpi_match' [-Wunused-const-variable]


So I have made the following change and will share a patch for it.

diff --git a/Makefile b/Makefile
index 4e6da2f..8a6c14e 100644
--- a/Makefile
+++ b/Makefile
@@ -711,7 +711,7 @@ endif
 KBUILD_CFLAGS += $(CLANG_TARGET) $(CLANG_GCC_TC) -meabi gnu
 KBUILD_AFLAGS += $(CLANG_TARGET) $(CLANG_GCC_TC)
 KBUILD_CPPFLAGS += $(call cc-option,-Qunused-arguments,)
-KBUILD_CFLAGS += $(call cc-disable-warning, unused-variable)
+KBUILD_CFLAGS += $(call cc-disable-warning, unused-const-variable)
 KBUILD_CFLAGS += $(call cc-disable-warning, format-invalid-specifier)
 KBUILD_CFLAGS += $(call cc-disable-warning, gnu)
 KBUILD_CFLAGS += $(call cc-disable-warning, address-of-packed-member)

Please let me know if you have any concerns with this approach for 
identifying all unused local variables.


-Thanks, Prasad

--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora 
Forum,

Linux Foundation Collaborative Project


unused-variable warning is getting disabled with clang

2017-12-06 Thread Sodagudi Prasad


Hi All,

When kernel compiled with clang, following line is disabling the 
unused-variable warning. This is not the case with gcc.

KBUILD_CFLAGS += $(call cc-disable-warning, unused-variable)

Are there any specific reasons for disabling unused-variable with clang?

--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora 
Forum,

Linux Foundation Collaborative Project


Re: [PATCH] Clockevents: Always call clockevents_program_event

2017-10-24 Thread Sodagudi Prasad

On 2017-10-24 01:37, Thomas Gleixner wrote:

On Tue, 24 Oct 2017, Prasad Sodagudi wrote:


Currently tick_program_event function is not calling
clockevents_program_event when  'expires == KTIME_MAX',
it is just updating clockevent state to 
CLOCK_EVT_STATE_ONESHOT_STOPPED.

clockevents_program_event function updates clockevent
device next_event by checking clockevent device state,
so always call clockevents_program_event() from tick_program_event.


No. This is fundamentally wrong. If the clockevent is in oneshot stopped
mode then you cannot call clockevents_program_event(). There is even a
warning in that code which will trigger.


Yes, there is a warning; I overlooked that part of the code and thought 
it would return from the clockevents_program_event() function after the 
next_event update.

	dev->next_event = expires;

	if (clockevent_state_shutdown(dev))
		return 0;

Thanks tglx for reviewing patch.

How should the clockevent device's next_event be cleared in the 
ONESHOT_STOPPED state from tick_program_event()?


Shall I update the next patch set with the following condition, or do you 
have any other suggestions for fixing this path?


diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index 4237e07..21104b6 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -317,7 +317,8 @@ int clockevents_program_event(struct clock_event_device *dev, ktime_t expires,
 
 	dev->next_event = expires;
 
-	if (clockevent_state_shutdown(dev))
+	if (clockevent_state_shutdown(dev) ||
+	    clockevent_state_oneshot_stopped(dev))
 		return 0;



Thanks,

tglx


--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora 
Forum,

Linux Foundation Collaborative Project


Re: clock event device’s next_event

2017-10-24 Thread Sodagudi Prasad

On 2017-10-24 00:23, Thomas Gleixner wrote:

On Mon, 23 Oct 2017, Sodagudi Prasad wrote:


Hi Viresh and Thomas,

In the functions tick_nohz_stop_sched_tick(), when expires = KTIME_MAX 
we are
canceling the tick_sched_timer timer but we are not updating the clock 
event

device’s next_event to KTIME_MAX.
Due to that, the broadcast device’s next_event is not programmed 
properly, resulting in unnecessary wakeups for this cpu.

/*
 * If the expiration time == KTIME_MAX, then we simply stop
 * the tick timer.
 */
if (unlikely(expires == KTIME_MAX)) {
	if (ts->nohz_mode == NOHZ_MODE_HIGHRES)
		hrtimer_cancel(&ts->sched_timer);
	goto out;
}


Right, because this code does not have access to the broadcast device 
at

all. It doesn't even know and care about it.



Some of the drivers in the cpuidle framework use the 
tick_nohz_get_sleep_length() API to find the maximum idle/sleep time 
from ts->sleep_length.
Also, the estimated sleep length until the next timer is not getting 
updated at the end of tick_nohz_stop_sched_tick().


After digging further, I see that the following call flow updates the
tick_cpu_device state to shutdown, but the clock event device's
next_event is not updated to KTIME_MAX:
hrtimer_cancel -> __remove_hrtimer -> hrtimer_force_reprogram ->
tick_program_event

int tick_program_event(ktime_t expires, int force)
{
	struct clock_event_device *dev =
		__this_cpu_read(tick_cpu_device.evtdev);

	if (unlikely(expires == KTIME_MAX)) {
		/*
		 * We don't need the clock event device any more, stop it.
		 */
		clockevents_switch_state(dev,
					 CLOCK_EVT_STATE_ONESHOT_STOPPED);
		return 0;
	}
In the above tick_program_event() function, the clock event device's 
next_event is not getting updated, as the clockevents_program_event() 
function is not called after the state update.


If the device is shutdown, then next_event does not matter. But yes, 
for

consistency reasons we could set it to KTIME_MAX.

Thanks, tglx, for your quick reply; I will send a patch for this.
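
For reference, a rough sketch of what such a patch could look like 
(illustrative hunk only; the actual patch may differ):

diff --git a/kernel/time/tick-oneshot.c b/kernel/time/tick-oneshot.c
--- a/kernel/time/tick-oneshot.c
+++ b/kernel/time/tick-oneshot.c
@@ int tick_program_event(ktime_t expires, int force)
 	if (unlikely(expires == KTIME_MAX)) {
 		/*
 		 * We don't need the clock event device any more, stop it.
 		 */
 		clockevents_switch_state(dev,
 					 CLOCK_EVT_STATE_ONESHOT_STOPPED);
+		/* Keep next_event consistent while the device is stopped. */
+		dev->next_event = KTIME_MAX;
 		return 0;
 	}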



Thanks,

tglx


--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora 
Forum,

Linux Foundation Collaborative Project


clock event device’s next_event

2017-10-23 Thread Sodagudi Prasad

Hi Viresh and Thomas,

In the functions tick_nohz_stop_sched_tick(), when expires = KTIME_MAX 
we are canceling the tick_sched_timer timer but we are not updating the 
clock event device’s next_event to KTIME_MAX.
Due to that, the broadcast device’s next_event is not programmed 
properly, resulting in unnecessary wakeups for this cpu.


/*
 * If the expiration time == KTIME_MAX, then we simply stop
 * the tick timer.
 */
if (unlikely(expires == KTIME_MAX)) {
	if (ts->nohz_mode == NOHZ_MODE_HIGHRES)
		hrtimer_cancel(&ts->sched_timer);
	goto out;
}


After digging further, I see that the following call flow updates the 
tick_cpu_device state to shutdown, but the clock event device's 
next_event is not updated to KTIME_MAX:
hrtimer_cancel -> __remove_hrtimer -> hrtimer_force_reprogram -> 
tick_program_event


int tick_program_event(ktime_t expires, int force)
{
	struct clock_event_device *dev =
		__this_cpu_read(tick_cpu_device.evtdev);

	if (unlikely(expires == KTIME_MAX)) {
		/*
		 * We don't need the clock event device any more, stop it.
		 */
		clockevents_switch_state(dev,
					 CLOCK_EVT_STATE_ONESHOT_STOPPED);
		return 0;
	}
In the above tick_program_event() function, the clock event device's 
next_event is not getting updated, as the clockevents_program_event() 
function is not called after the state update.


-thanks, Prasad


--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora 
Forum,

Linux Foundation Collaborative Project


__ubsan_handle_type_mismatch converted to __ubsan_handle_type_mismatch_v1

2017-10-05 Thread Sodagudi Prasad

Hi All,

Based on the links below, __ubsan_handle_type_mismatch has been renamed 
to __ubsan_handle_type_mismatch_v1.


https://github.com/llvm-mirror/compiler-rt/commit/56faee71af1888ba12ab076b3d1f9bbe223493df#diff-21369cc6f3917b27df3ced8de89cf134
https://www.mail-archive.com/gcc-bugs@gcc.gnu.org/msg535130.html


When I tried to compile the kernel with LLVM, I saw the following errors.
arch/arm64/kernel/traps.o: In function `dump_backtrace':
kernel/arch/arm64/kernel/traps.c:192: undefined reference to 
`__ubsan_handle_type_mismatch_v1'
kernel/arch/arm64/kernel/traps.c:192: undefined reference to 
`__ubsan_handle_type_mismatch_v1'
kernel/arch/arm64/kernel/traps.c:192: undefined reference to 
`__ubsan_handle_type_mismatch_v1'
kernel/arch/arm64/kernel/traps.c:192: undefined reference to 
`__ubsan_handle_type_mismatch_v1'

arch/arm64/kernel/traps.o: In function `dump_mem':


Can I add a __ubsan_handle_type_mismatch_v1 API similar to 
__ubsan_handle_type_mismatch() (I will send a patch for review)?
If both APIs are present, then there will be backward compatibility. 
Let me know if you have any other suggestions for this issue; a rough 
sketch of the idea follows below.
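
A hedged sketch of the idea (added here for illustration). The struct 
layout follows the compiler-rt commit linked above, where the v1 ABI 
packs log2(alignment) and the check kind into single bytes; 
ubsan_report_type_mismatch() is a hypothetical stand-in for the common 
reporting path in lib/ubsan.c, not its verified name:

/* source_location and type_descriptor come from lib/ubsan.h. */
struct type_mismatch_data_v1 {
	struct source_location location;
	struct type_descriptor *type;
	unsigned char log_alignment;	/* v1 stores log2 of the alignment */
	unsigned char type_check_kind;
};

void __ubsan_handle_type_mismatch_v1(struct type_mismatch_data_v1 *data,
				     void *ptr)
{
	/* Recover the byte alignment, then reuse the existing report path. */
	unsigned long align = 1UL << data->log_alignment;

	ubsan_report_type_mismatch(&data->location, data->type, align,
				   data->type_check_kind,
				   (unsigned long)ptr);
}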


-Thanks, Prasad

--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora 
Forum,

Linux Foundation Collaborative Project


Re: [PATCH] arch_topology: Fix section miss match warning due to free_raw_capacity()

2017-10-05 Thread Sodagudi Prasad

On 2017-09-25 16:20, Viresh Kumar wrote:

On 25-09-17, 15:51, Prasad Sodagudi wrote:

Remove the __init annotation from free_raw_capacity() to avoid
the following warning.

The function init_cpu_capacity_callback() references the
function __init free_raw_capacity().
WARNING: vmlinux.o(.text+0x425cc0): Section mismatch in reference
from the function init_cpu_capacity_callback() to the function
.init.text:free_raw_capacity().

Signed-off-by: Prasad Sodagudi 
---
 drivers/base/arch_topology.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/base/arch_topology.c 
b/drivers/base/arch_topology.c

index 41be9ff..3da53cc 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -96,7 +96,7 @@ static int register_cpu_capacity_sysctl(void)
 static u32 capacity_scale;
 static u32 *raw_capacity;

-static int __init free_raw_capacity(void)
+static int free_raw_capacity(void)
 {
kfree(raw_capacity);
raw_capacity = NULL;


So we need the __init thing only when cpufreq support isn't there in 
the kernel
and I am not sure if we want to add more ifdef hackery in the 
declaration of

free_raw_capacity().

Acked-by: Viresh Kumar 

Hi Greg,

Can you please consider this patch?

-Thanks, Prasad
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora 
Forum,

Linux Foundation Collaborative Project


variable length array in structure

2017-08-07 Thread Sodagudi Prasad

Hi Arnd Bergmann,

The following commit has introduced a compilation issue.
commit 2df2c3402fc81918a888e1ec711369f6014471f2
Author: Arnd Bergmann 
Date:   Sat Aug 5 21:57:46 2017 -0400
ext4: fix warning about stack corruption

Observed the following error:
/fs/ext4/mballoc.c:2303:17: error: fields must have a constant size: 
'variable length array in structure' extension will never be supported

ext4_grpblk_t counters[blocksize_bits + 2];
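
For context (an illustration added here, not from the original mail): GCC 
accepts variable-length arrays inside structures (VLAIS) as an extension, 
while clang rejects them outright. A reduced example, plus the usual 
fixed-size workaround; MAX_BLOCKSIZE_BITS is an assumed illustrative 
bound, not necessarily the name ext4 ended up using:

/* vla.c -- `gcc -c vla.c` accepts this as the VLAIS extension;
 * `clang -c vla.c` emits exactly the error quoted above. */
void count_free(int blocksize_bits)
{
	struct {
		long dummy;
		int counters[blocksize_bits + 2];	/* VLAIS */
	} sg;

	(void)sg;
}

/* Common workaround: bound the array by the worst case supported. */
#define MAX_BLOCKSIZE_BITS 16

struct sg_fixed {
	long dummy;
	int counters[MAX_BLOCKSIZE_BITS + 2];
};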

-Thanks, Prasad
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora 
Forum,

Linux Foundation Collaborative Project


Re: [PATCH] llist: clang: introduce member_address_is_nonnull()

2017-07-19 Thread Sodagudi Prasad


Hi All,

Observed a boot-up crash with clang in the kernel/sched/core.c 
sched_ttwu_pending() function, because it uses 
llist_for_each_entry_safe().
After pulling the patch from https://lkml.org/lkml/2017/7/19/1169, no 
crash was observed.
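
For reference, the helper that patch introduces in include/linux/llist.h 
is essentially the following; container_of() on a NULL node pointer is 
undefined behaviour that clang is entitled to exploit, so the loop 
condition checks the computed member address instead of the struct 
pointer:

#define member_address_is_nonnull(ptr, member)	\
	((uintptr_t)(ptr) + offsetof(typeof(*(ptr)), member) != 0)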


Tested-by: Sodagudi Prasad <psoda...@codeaurora.org>

-Thanks, Prasad
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora 
Forum,

Linux Foundation Collaborative Project


Re: [patch resend 4.12] compiler, clang: always inline when CONFIG_OPTIMIZE_INLINING is disabled

2017-06-26 Thread Sodagudi Prasad

On 2017-06-26 16:03, David Rientjes wrote:
The motivation of commit abb2ea7dfd82 ("compiler, clang: suppress 
warning
for unused static inline functions") is to suppress clang's warnings 
about

unused static inline functions.

For configs without CONFIG_OPTIMIZE_INLINING enabled, such as any 
non-x86

architecture, `inline' in the kernel implies that
__attribute__((always_inline)) is used.

Some code depends on that behavior, see
https://lkml.org/lkml/2017/6/13/918:

net/built-in.o: In function `__xchg_mb':
arch/arm64/include/asm/cmpxchg.h:99: undefined reference to
`__compiletime_assert_99'
arch/arm64/include/asm/cmpxchg.h:99: undefined reference to
`__compiletime_assert_99

The full fix would be to identify these breakages and annotate the
functions with __always_inline instead of `inline'.  But since we are 
late

in the 4.12-rc cycle, simply carry forward the forced inlining behavior
and work toward moving arm64, and other architectures, toward
CONFIG_OPTIMIZE_INLINING behavior.

Reported-by: Sodagudi Prasad <psoda...@codeaurora.org>
Tested-by: Matthias Kaehlcke <m...@chromium.org>
Signed-off-by: David Rientjes <rient...@google.com>
---
 Resend of http://marc.info/?l=linux-kernel&m=149681501816319 for 4.12
 inclusion.

 Prasad, please add your Tested-by.



I have compile-tested this change with clang; please add:

Tested-by: Sodagudi Prasad <psoda...@codeaurora.org>

-Thanks, Prasad


 include/linux/compiler-clang.h |  8 
 include/linux/compiler-gcc.h   | 18 +++---
 2 files changed, 11 insertions(+), 15 deletions(-)

diff --git a/include/linux/compiler-clang.h 
b/include/linux/compiler-clang.h

--- a/include/linux/compiler-clang.h
+++ b/include/linux/compiler-clang.h
@@ -15,11 +15,3 @@
  * with any version that can compile the kernel
  */
 #define __UNIQUE_ID(prefix) __PASTE(__PASTE(__UNIQUE_ID_, prefix), 
__COUNTER__)

-
-/*
- * GCC does not warn about unused static inline functions for
- * -Wunused-function.  This turns out to avoid the need for complex 
#ifdef

- * directives.  Suppress the warning in clang as well.
- */
-#undef inline
-#define inline inline __attribute__((unused)) notrace
diff --git a/include/linux/compiler-gcc.h 
b/include/linux/compiler-gcc.h

--- a/include/linux/compiler-gcc.h
+++ b/include/linux/compiler-gcc.h
@@ -66,18 +66,22 @@

 /*
  * Force always-inline if the user requests it so via the .config,
- * or if gcc is too old:
+ * or if gcc is too old.
+ * GCC does not warn about unused static inline functions for
+ * -Wunused-function.  This turns out to avoid the need for complex 
#ifdef
+ * directives.  Suppress the warning in clang as well by using 
"unused"

+ * function attribute, which is redundant but not harmful for gcc.
  */
 #if !defined(CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING) ||   \
 !defined(CONFIG_OPTIMIZE_INLINING) || (__GNUC__ < 4)
-#define inline inline  __attribute__((always_inline)) notrace
-#define __inline__ __inline__  __attribute__((always_inline)) notrace
-#define __inline   __inline__attribute__((always_inline)) notrace
+#define inline inline  __attribute__((always_inline,unused)) notrace
+#define __inline__ __inline__	__attribute__((always_inline,unused)) 
notrace
+#define __inline __inline	__attribute__((always_inline,unused)) 
notrace

 #else
 /* A lot of inline functions can cause havoc with function tracing */
-#define inline inline  notrace
-#define __inline__ __inline__  notrace
-#define __inline   __inlinenotrace
+#define inline inline  __attribute__((unused)) notrace
+#define __inline__ __inline__  __attribute__((unused)) notrace
+#define __inline __inline  __attribute__((unused)) notrace
 #endif

 #define __always_inlineinline __attribute__((always_inline))


--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora 
Forum,

Linux Foundation Collaborative Project


Re: [PATCH] compiler, clang: Add always_inline attribute to inline

2017-06-23 Thread Sodagudi Prasad

On 2017-06-22 02:43, Mark Rutland wrote:

On Tue, Jun 20, 2017 at 04:12:32PM -0700, David Rientjes wrote:

On Tue, 20 Jun 2017, Mark Rutland wrote:

> As with my reply to David, my preference would be that we:
>
> 1) Align compiler-clang.h with the compiler-gcc.h inlining behaviour, so
>that things work by default.
>
> 2) Fix up the arm64 core code (and drivers for architected / common
>peripherals) to use __always_inline where we always require inlining.
>
> 3) Have arm64 select CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING, and have
>people test-build configurations with CONFIG_OPTIMIZE_INLINING, with
>both GCC and clang.
>
> 4) Fix up drivers, etc, as appropriate.
>
> 5) Once that's largely stable, and if there's a benefit, have arm64
>select CONFIG_OPTIMIZE_INLINING by default.
>
> That should avoid undue breakage, while enabling this ASAP.
>

Sounds good, but I think we should simply deal with the
__attribute__((unused)) needed for clang as part of compiler-gcc.h by
simply adding it to the inline override there to avoid duplicated 
code.


Agreed. That looks much better.

Thanks,
Mark.



diff --git a/include/linux/compiler-clang.h 
b/include/linux/compiler-clang.h

--- a/include/linux/compiler-clang.h
+++ b/include/linux/compiler-clang.h
@@ -15,11 +15,3 @@
  * with any version that can compile the kernel
  */
 #define __UNIQUE_ID(prefix) __PASTE(__PASTE(__UNIQUE_ID_, prefix), 
__COUNTER__)

-
-/*
- * GCC does not warn about unused static inline functions for
- * -Wunused-function.  This turns out to avoid the need for complex 
#ifdef

- * directives.  Suppress the warning in clang as well.
- */
-#undef inline
-#define inline inline __attribute__((unused)) notrace
diff --git a/include/linux/compiler-gcc.h 
b/include/linux/compiler-gcc.h

index 0efef9cf014f..71fe0994cf1a 100644
--- a/include/linux/compiler-gcc.h
+++ b/include/linux/compiler-gcc.h
@@ -66,18 +66,22 @@

 /*
  * Force always-inline if the user requests it so via the .config,
- * or if gcc is too old:
+ * or if gcc is too old.
+ * GCC does not warn about unused static inline functions for
+ * -Wunused-function.  This turns out to avoid the need for complex 
#ifdef
+ * directives.  Suppress the warning in clang as well by using 
"unused"

+ * function attribute, which is redundant but not harmful for gcc.
  */
 #if !defined(CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING) ||   \
 !defined(CONFIG_OPTIMIZE_INLINING) || (__GNUC__ < 4)
-#define inline inline  __attribute__((always_inline)) notrace
-#define __inline__ __inline__  __attribute__((always_inline)) notrace
-#define __inline   __inline__attribute__((always_inline)) notrace
+#define inline inline  __attribute__((always_inline,unused)) notrace
+#define __inline__ __inline__	__attribute__((always_inline,unused)) 
notrace
+#define __inline __inline	__attribute__((always_inline,unused)) 
notrace

 #else
 /* A lot of inline functions can cause havoc with function tracing */
-#define inline inline  notrace
-#define __inline__ __inline__  notrace
-#define __inline   __inlinenotrace
+#define inline inline  __attribute__((unused)) notrace
+#define __inline__ __inline__  __attribute__((unused)) notrace
+#define __inline __inline  __attribute__((unused)) notrace
 #endif

 #define __always_inlineinline __attribute__((always_inline))

Hi David,

I just want to cross-check with you:
do you want me to update this change in the next patch set, or are you 
going to add it?


-Thanks, Prasad
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora 
Forum,

Linux Foundation Collaborative Project


Re: [PATCH] compiler, clang: Add always_inline attribute to inline

2017-06-19 Thread Sodagudi Prasad

On 2017-06-19 14:42, David Rientjes wrote:

On Mon, 19 Jun 2017, Sodagudi Prasad wrote:


> > Commit abb2ea7dfd82 ("compiler, clang: suppress warning for unused
> > static inline functions") re-defining the 'inline' macro but
> > __attribute__((always_inline)) is missing. Some compilers may
> > not honor inline hint if always_iniline attribute not there.
> > So add always_inline attribute to inline as done by
> > compiler-gcc.h file.
> >
>
> IIUC, __attribute__((always_inline)) was only needed for gcc versions < 4
> and that the inlining decision making is improved in >= 4.  To make a
> change like this, I would think that we would need to show that clang is
> making suboptimal decisions.  I don't think there's a downside to making
> CONFIG_OPTIMIZE_INLINING specific only to gcc.
>
> If it is shown that __attribute__((always_inline)) is needed for clang as
> well, this should be done as part of compiler-gcc.h to avoid duplicated
> code.

Hi David,

Here is the discussion about this change -
https://lkml.org/lkml/2017/6/15/396
Please check mark and will's comments.



Yes, the arch/arm64/include/asm/cmpxchg.h instance appears to need
__always_inline as several other functions need __always_inline in
arch/arm64/include/*.  It's worth making that change as you suggested 
in

your original patch.

The concern, however, is inlining all "inline" functions forcefully.  
The

only reason this is done for gcc is because of suboptimal inlining
decisions in gcc < 4.

So the question is whether this is a single instance that can be fixed
where clang un-inlining causes problems or whether that instance 
suggests

all possible inline usage for clang absolutely requires __always_inline
due to a suboptimal compiler implementation.  I would suggest the 
former.


Hi David,

I am not 100% sure about the best approach for this problem. We may have 
to replace inline with always_inline for all inline functions where 
BUILD_BUG() is used.


So far, inline has meant always_inline for ARM64; if we do not keep the 
same settings, will there not be performance differences?

Hi Will and Mark,

Please suggest the best solution to this problem. Currently only 
__xchg_mb has an issue, depending on the compiler's -inline-threshold 
configuration, but there are many other instances in arch/arm64/* where 
BUILD_BUG() is used in inline functions, and those may fail later.


-Thanks, Prasad

--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora 
Forum,

Linux Foundation Collaborative Project


Re: [PATCH] compiler, clang: Add always_inline attribute to inline

2017-06-19 Thread Sodagudi Prasad

On 2017-06-19 13:25, David Rientjes wrote:

On Mon, 19 Jun 2017, Prasad Sodagudi wrote:


Commit abb2ea7dfd82 ("compiler, clang: suppress warning for unused
static inline functions") redefined the 'inline' macro, but
__attribute__((always_inline)) is missing. Some compilers may
not honor the inline hint if the always_inline attribute is not there.
So add always_inline attribute to inline as done by
compiler-gcc.h file.



IIUC, __attribute__((always_inline)) was only needed for gcc versions < 
4

and that the inlining decision making is improved in >= 4.  To make a
change like this, I would think that we would need to show that clang 
is
making suboptimal decisions.  I don't think there's a downside to 
making

CONFIG_OPTIMIZE_INLINING specific only to gcc.

If it is shown that __attribute__((always_inline)) is needed for clang 
as

well, this should be done as part of compiler-gcc.h to avoid duplicated
code.


Hi David,

Here is the discussion about this change: 
https://lkml.org/lkml/2017/6/15/396
Please check Mark's and Will's comments.

-Thanks, Prasad




Signed-off-by: Prasad Sodagudi 
---
 include/linux/compiler-clang.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/include/linux/compiler-clang.h 
b/include/linux/compiler-clang.h

index d614c5e..deb65b3 100644
--- a/include/linux/compiler-clang.h
+++ b/include/linux/compiler-clang.h
@@ -22,4 +22,9 @@
  * directives.  Suppress the warning in clang as well.
  */
 #undef inline
+#if !defined(CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING) ||   \
+!defined(CONFIG_OPTIMIZE_INLINING)
+#define inline inline __attribute__((always_inline)) 
__attribute__((unused)) notrace

+#else
 #define inline inline __attribute__((unused)) notrace
+#endif


--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora 
Forum,

Linux Foundation Collaborative Project


Re: Using __always_inline attribute

2017-06-14 Thread Sodagudi Prasad

On 2017-06-14 15:33, Sodagudi Prasad wrote:

On 2017-06-14 03:06, Will Deacon wrote:

Hi Prasad,

On Tue, Jun 13, 2017 at 03:39:37PM -0700, Sodagudi Prasad wrote:
With a variant of a CLANG(based on 4.0) following errors observed on 
Linux

4.12-rc5 tag.


How are you building the kernel with clang, btw? I'd be keen to try 
the same
thing, but I thought the LLVMLinux project was largely dead at this 
point. Do
you still need build system patches, or is this now integrated with 
Kbuild?

Hi Will,
Trying clang patches from below tree -
https://android.googlesource.com/kernel/common/+/experimental/android-4.4-llvm




net/built-in.o: In function `__xchg_mb':
arch/arm64/include/asm/cmpxchg.h:99: \
undefined reference to `__compiletime_assert_99'
arch/arm64/include/asm/cmpxchg.h:99: \
undefined reference to `__compiletime_assert_99

Clang does not seems to be marking this macro as inline and causing 
above

compilation issue due to BUILD_BUG().


The only BUILD_BUG I see around here is if the size parameter (which 
is
calculated using sizeof) is not known to be 1,2,4 or 8 at compile 
time. It
would be interesting to know which call site is confusing clang in 
this way

and what code is actually being emitted here.

If it's just that __xchg_mb isn't being inlined into the 
__xchg_wrapper

macro, then your patch should be ok, but I'd like to be sure it's not
something else. I'm also surprised you haven't see this crop up in 
other

places.

After digging further, we observed that the inline definition was changed
recently, causing this issue: the clang redefinition keeps
__attribute__((unused)) but drops always_inline.


Commit abb2ea7dfd82 ("compiler, clang: suppress warning for unused
static inline functions") redefined the inline macro as below:
#define inline inline __attribute__((unused))

But the actual definition of the inline macro in the compiler-gcc.h file
is as shown below:
#define inline  inline  __attribute__((always_inline)) 
notrace


As the always_inline attribute is missing from the inline definition, the
compiler may not inline __xchg_mb in the
arch/arm64/include/asm/cmpxchg.h file, leading to the error.

Some compilers may not honor the inline hint once the always_inline
attribute is removed, because of -inline-threshold compiler options.

Here is the change to fix this issue-
diff --git a/include/linux/compiler-clang.h 
b/include/linux/compiler-clang.h

index d614c5e..a0e6433 100644
--- a/include/linux/compiler-clang.h
+++ b/include/linux/compiler-clang.h
@@ -22,4 +22,4 @@
  * directives.  Suppress the warning in clang as well.
  */
 #undef inline
-#define inline inline __attribute__((unused)) notrace
+#define inline __attribute__((always_inline)) __attribute__((unused)) 
notrace


-Thanks, Prasad


Hi Will,

Here is the proper patch -
compiler, clang: Add always_inline attribute to inline

Commit abb2ea7dfd82 ("compiler, clang: suppress warning for unused
static inline functions") redefined the 'inline' macro, but
__attribute__((always_inline)) is missing. Some compilers may
not honor the inline hint if the always_inline attribute is not there.
So add always_inline attribute to inline as done by
compiler-gcc.h file.

diff --git a/include/linux/compiler-clang.h 
b/include/linux/compiler-clang.h

index d614c5e..400b0cd 100644
--- a/include/linux/compiler-clang.h
+++ b/include/linux/compiler-clang.h
@@ -22,4 +22,4 @@
  * directives.  Suppress the warning in clang as well.
  */
 #undef inline
-#define inline inline __attribute__((unused)) notrace
+#define inline inline __attribute__((always_inline)) 
__attribute__((unused)) notrace


-Thanks, Prasad



Will


--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora 
Forum,

Linux Foundation Collaborative Project


Re: Using __always_inline attribute

2017-06-14 Thread Sodagudi Prasad

On 2017-06-14 15:33, Sodagudi Prasad wrote:

On 2017-06-14 03:06, Will Deacon wrote:

Hi Prasad,

On Tue, Jun 13, 2017 at 03:39:37PM -0700, Sodagudi Prasad wrote:
With a variant of a CLANG(based on 4.0) following errors observed on 
Linux

4.12-rc5 tag.


How are you building the kernel with clang, btw? I'd be keen to try 
the same
thing, but I thought the LLVMLinux project was largely dead at his 
point. Do
you still need build system patches, or is this now integrated with 
Kbuild?

Hi Will,
Trying clang patches from below tree -
https://android.googlesource.com/kernel/common/+/experimental/android-4.4-llvm




net/built-in.o: In function `__xchg_mb':
arch/arm64/include/asm/cmpxchg.h:99: \
undefined reference to `__compiletime_assert_99'
arch/arm64/include/asm/cmpxchg.h:99: \
undefined reference to `__compiletime_assert_99

Clang does not seems to be marking this macro as inline and causing 
above

compilation issue due to BUILD_BUG().


The only BUILD_BUG I see around here is if the size parameter (which 
is
calculated using sizeof) is not known to be 1,2,4 or 8 at compile 
time. It
would be interesting to know which call site is confusing clang in 
this way

and what code is actually being emitted here.

If it's just that __xchg_mb isn't being inlined into the 
__xchg_wrapper

macro, then your patch should be ok, but I'd like to be sure it's not
something else. I'm also surprised you haven't see this crop up in 
other

places.

After digging further, we observed that inline definition was changed
recently and causing this issue.
Here is missing part of inline macro definition 
__attribute__((unused)).


Commit abb2ea7dfd82 ("compiler, clang: suppress warning for unused
static inline functions") have redefined the inline macro as below
#define inline inline __attribute__((unused))

But actual definition of inline macro defined compiler-gcc.h file as
shown below.
#define inline  inline  __attribute__((always_inline)) 
notrace


As always_inline attribute is missing in inline definition, compile
may not inline macros __xchg_mb in
arch/arm64/include/asm/cmpxchg.h file and leading to error.

Some compilers may not honor inline as inline if always_inline
attribute is removed because of
-inline-threshold compiler options.

Here is the change to fix this issue-
diff --git a/include/linux/compiler-clang.h 
b/include/linux/compiler-clang.h

index d614c5e..a0e6433 100644
--- a/include/linux/compiler-clang.h
+++ b/include/linux/compiler-clang.h
@@ -22,4 +22,4 @@
  * directives.  Suppress the warning in clang as well.
  */
 #undef inline
-#define inline inline __attribute__((unused)) notrace
+#define inline __attribute__((always_inline)) __attribute__((unused)) 
notrace


-Thanks, Prasad


Hi Will,

Here is the proper patch -
compiler, clang: Add always_inline attribute to inline

Commit abb2ea7dfd82 ("compiler, clang: suppress warning for unused
static inline functions") re-defining the 'inline' macro but
__attribute__((always_inline)) is missing. Some compilers may
not honor inline hint if always_iniline attribute is not there.
So add always_inline attribute to inline as done by
compiler-gcc.h file.

diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h
index d614c5e..400b0cd 100644
--- a/include/linux/compiler-clang.h
+++ b/include/linux/compiler-clang.h
@@ -22,4 +22,4 @@
  * directives.  Suppress the warning in clang as well.
  */
 #undef inline
-#define inline inline __attribute__((unused)) notrace
+#define inline inline __attribute__((always_inline)) __attribute__((unused)) notrace


-Thanks, Prasad



Will


--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora 
Forum,

Linux Foundation Collaborative Project


Using __always_inline attribute

2017-06-13 Thread Sodagudi Prasad


Hi All,

With a variant of CLANG (based on 4.0), the following errors are observed
on the Linux 4.12-rc5 tag.


net/built-in.o: In function `__xchg_mb':
arch/arm64/include/asm/cmpxchg.h:99: \
undefined reference to `__compiletime_assert_99'
arch/arm64/include/asm/cmpxchg.h:99: \
undefined reference to `__compiletime_assert_99

Clang does not seem to be marking this macro as inline, causing the above
compilation issue due to BUILD_BUG().


We added the __always_inline attribute to this macro as shown below, so
that clang is forced to always inline it. Based on the definition of
__xchg##sfx, it should always be inlined. Can we force this macro to be
__always_inline?



diff --git a/arch/arm64/include/asm/cmpxchg.h b/arch/arm64/include/asm/cmpxchg.h
index ae852ad..ce57cec 100644
--- a/arch/arm64/include/asm/cmpxchg.h
+++ b/arch/arm64/include/asm/cmpxchg.h
@@ -73,7 +73,7 @@
 #undef __XCHG_CASE
 
 #define __XCHG_GEN(sfx)						\
-static inline unsigned long __xchg##sfx(unsigned long x,	\
+static __always_inline unsigned long __xchg##sfx(unsigned long x,	\
 					volatile void *ptr,	\
 					int size)		\
 {								\
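
For readers unfamiliar with the pattern, here is a standalone sketch of
what __XCHG_GEN expands to (my own example with stand-in bodies, not the
actual cmpxchg code): the generated function must be fully inlined so
the switch on size folds and the BUILD_BUG() stand-in becomes dead code.

/* Standalone sketch of the __XCHG_GEN pattern; bodies are stand-ins. */
#define __always_inline inline __attribute__((always_inline))

#define __XCHG_GEN(sfx)							\
static __always_inline unsigned long __xchg##sfx(unsigned long x,	\
						 volatile void *ptr,	\
						 int size)		\
{									\
	(void)ptr;	/* the real code would access *ptr */		\
	switch (size) {							\
	case 1: return x & 0xffUL;	/* stand-in for 1-byte case */	\
	case 8: return x;		/* stand-in for 8-byte case */	\
	}								\
	__builtin_trap();		/* stand-in for BUILD_BUG() */	\
}

__XCHG_GEN(_mb)		/* expands to a __xchg_mb() definition */

unsigned long test_xchg(volatile void *p)
{
	long v = 0;
	/* size is sizeof(long) == 8 after inlining, the trap arm folds */
	return __xchg_mb(42UL, p, sizeof(v));
}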



-Thanks, Prasad

--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora 
Forum,

Linux Foundation Collaborative Project


deadlock due to circular dependency between threads

2017-05-03 Thread Sodagudi Prasad

Hi all,

I am working on a platform which is using Linux version 4.4. I have
observed a DEADLOCK between a couple of threads and am looking for
suggestions/comments.

Here is my understanding from the call stacks of these blocked tasks.

0) CPU3 is getting hot plugged from a kthread which is running on core 5.
1) The CPU hot plug flow needs to flush the work items on the hot-plugged
CPU3 with a high-priority worker from the corresponding CPU's (CPU3)
worker pool.
2) There are no high-priority workers in the CPU3 worker pool, so
create_worker was initiated to create a high-priority kernel
thread/worker.
3) This thread creation should be done by the kthreadd daemon, but
kthreadd has got stuck in some other thread creation: at this point
kthreadd is creating a thread, updating its cgroup settings, and waiting
on the rw semaphore of the cgroup subsystem.
4) The cgroup read-write semaphore is taken by the "init" thread, which
is waiting on the cpuset mutex; the init task is updating cgroups based
on a userspace request.
5) The cpuset mutex is taken by "kworker/5:1", which is waiting for the
cpu hotplug lock.

The cpu hotplug mutex is taken by the "ABC_XYZ" hotplugging thread.
DEADLOCK!

Circular dependency between threads:
"kthread_XYZ" ==> "kthreadd" ==> "init" ==> "kworker/5:1" ==> "kthread_XYZ"

(A minimal user-space sketch of this circular-wait pattern follows the
call stacks below.)



PID: 910    TASK: ffc0ee8dd780  CPU: 5   COMMAND: "ABC_XYZ"
 #0 [ffc0ee9cb900] __switch_to at ff800808553c
 #1 [ffc0ee9cb930] __schedule at ff8008d76aa0
 #2 [ffc0ee9cb990] schedule at ff8008d76e04
 #3 [ffc0ee9cb9b0] schedule_timeout at ff8008d7953c
 #4 [ffc0ee9cba60] wait_for_common at ff8008d77888
 #5 [ffc0ee9cbaf0] wait_for_completion at ff8008d778dc
 #6 [ffc0ee9cbb00] flush_work at ff80080b3850
 #7 [ffc0ee9cbb80] workqueue_cpu_down_callback at ff80080b5360
 #8 [ffc0ee9cbbc0] notifier_call_chain at ff80080b9c4c
 #9 [ffc0ee9cbc00] __raw_notifier_call_chain at ff80080b9cb8
#10 [ffc0ee9cbc10] __cpu_notify at ff800809eb50
#11 [ffc0ee9cbc20] _cpu_down at ff800809ee84
#12 [ffc0ee9cbca0] cpu_down at ff800809f124
#13 [ffc0ee9cbcd0] cpu_subsys_offline at ff800856b768
#14 [ffc0ee9cbce0] device_offline at ff8008567040
#15 [ffc0ee9cbd10] update_offline_cores at ff8008d74b54
#16 [ffc0ee9cbda0] do_hotplug at ff8008d75358
#17 [ffc0ee9cbe20] kthread at ff80080b8e3c



PID: 2  TASK: ffc0f9660c80  CPU: 4   COMMAND: "kthreadd"
 #0 [ffc0f9683bf0] __switch_to at ff800808553c
 #1 [ffc0f9683c20] __schedule at ff8008d76aa0
 #2 [ffc0f9683c80] schedule at ff8008d76e04
 #3 [ffc0f9683ca0] rwsem_down_read_failed at ff8008d79144
 #4 [ffc0f9683cf0] __percpu_down_read at ff80080edc4c
 #5 [ffc0f9683d10] copy_process at ff800809cecc
 #6 [ffc0f9683df0] _do_fork at ff800809d5a0
 #7 [ffc0f9683e50] kernel_thread at ff800809d89c
 #8 [ffc0f9683e60] kthreadd at ff80080b9714


PID: 898    TASK: ffc0ee91  CPU: 0   COMMAND: "init"
 #0 [ffc06fd93980] __switch_to at ff800808553c
 #1 [ffc06fd939b0] __schedule at ff8008d76aa0
 #2 [ffc06fd93a10] schedule at ff8008d76e04
 #3 [ffc06fd93a30] schedule_preempt_disabled at ff8008d7714c
 #4 [ffc06fd93a50] __mutex_lock_slowpath at ff8008d78684
 #5 [ffc06fd93ab0] mutex_lock at ff8008d78714
 #6 [ffc06fd93ad0] cpuset_can_attach at ff800812d490
 #7 [ffc06fd93b20] cgroup_taskset_migrate at ff8008129194
 #8 [ffc06fd93b70] cgroup_migrate at ff8008129454
 #9 [ffc06fd93bf0] cgroup_attach_task at ff800812950c
#10 [ffc06fd93c50] __cgroup_procs_write at ff8008129884
#11 [ffc06fd93d10] cgroup_tasks_write at ff800812993c
#12 [ffc06fd93d20] cgroup_file_write at ff8008125078
#13 [ffc06fd93d70] kernfs_fop_write at ff800820bef4
#14 [ffc06fd93db0] __vfs_write at ff80081ac6f4
#15 [ffc06fd93e30] vfs_write at ff80081acf28
#16 [ffc06fd93e70] sys_write at ff80081ad6d8
#17 [ffc06fd93ed0] el0_svc_naked at ff800808462



PID: 66 TASK: ffc020dc7080  CPU: 5   COMMAND: "kworker/5:1"
 #0 [ffc0f7ff3a90] __switch_to at ff800808553c
 #1 [ffc0f7ff3ac0] __schedule at ff8008d76aa0
 #2 [ffc0f7ff3b20] schedule at ff8008d76e04
 #3 [ffc0f7ff3b40] schedule_preempt_disabled at ff8008d7714c
 #4 [ffc0f7ff3b60] __mutex_lock_slowpath at ff8008d78684
 #5 [ffc0f7ff3bc0] mutex_lock at ff8008d78714
 #6 [ffc0f7ff3be0] get_online_cpus at ff800809e9bc
 #7 [ffc0f7ff3c00] rebuild_sched_domains_locked at ff800812c960
 #8 [ffc0f7ff3cb0] rebuild_sched_domains at ff800812e7bc
 #9 [ffc0f7ff3cd0] cpuset_hotplug_workfn at ff800812eca8
#10 [ffc0f7ff3d70] process_one_work at ff80080b3cec
#11 [ffc0f7ff3dc0] worker_thread at ff80080b4700
#12 [ffc0f7ff3e20] kthread at ff80080b8e3c

I think, we can avoid this DEADLOCK 
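
As noted above, here is a minimal user-space sketch of the same
circular-wait pattern (my own illustration, not from the crash report):
each thread takes its lock and then blocks on the lock the other holds;
the report is this cycle stretched across four tasks and four locks.
Build with: gcc -pthread demo.c

/* Illustrative AB-BA deadlock demo, not kernel code. */
#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;

static void *thread1(void *arg)
{
	pthread_mutex_lock(&lock_a);	/* holds A ... */
	sleep(1);			/* widen the race window */
	pthread_mutex_lock(&lock_b);	/* ... waits on B: deadlock */
	pthread_mutex_unlock(&lock_b);
	pthread_mutex_unlock(&lock_a);
	return NULL;
}

static void *thread2(void *arg)
{
	pthread_mutex_lock(&lock_b);	/* holds B ... */
	sleep(1);
	pthread_mutex_lock(&lock_a);	/* ... waits on A: deadlock */
	pthread_mutex_unlock(&lock_a);
	pthread_mutex_unlock(&lock_b);
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, thread1, NULL);
	pthread_create(&t2, NULL, thread2, NULL);
	pthread_join(t1, NULL);		/* never returns: AB-BA deadlock */
	pthread_join(t2, NULL);
	return 0;
}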


gic_write_grpen1 in hotplug flow

2017-05-03 Thread Sodagudi Prasad

Hi All,

This is regarding the usage of the gic_write_grpen1 API in the
irq-gic-v3 driver.


Here is my understanding about ICC_IGRPEN1_EL1: it is banked between
the secure and non-secure states. If two security states are
implemented, the secure-side group enable bit is set by the platform
firmware (PSCI), and the kernel needs to set it in the non-secure state.

1) Currently gic_write_grpen1(0) is getting called from
gic_cpu_pm_notifier() for CPU_PM_ENTER in the single security state
only, but enabling of group 1 non-secure interrupts is done in the
CPU_PM_EXIT path unconditionally. Why are we not disabling group 1
non-secure interrupts unconditionally in CPU_PM_ENTER, instead of only
in the single security state?


2) Why are group 1 non-secure interrupts not disabled by the kernel in
the cpu hotplug path? A spurious interrupt can still arrive if we don't
disable group 1 non-secure interrupts, right?
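
For reference, the notifier logic in question looks roughly like this
(paraphrased from memory from the irq-gic-v3 driver of that era, not a
verbatim copy): on CPU_PM_EXIT the group 1 enable is restored via
gic_cpu_sys_reg_init() unconditionally, while on CPU_PM_ENTER it is
cleared only when the distributor reports a single security state.

/* Paraphrased sketch of drivers/irqchip/irq-gic-v3.c, not verbatim. */
static int gic_cpu_pm_notifier(struct notifier_block *self,
			       unsigned long cmd, void *v)
{
	if (cmd == CPU_PM_EXIT) {
		if (gic_dist_security_disabled())
			gic_enable_redist(true);
		gic_cpu_sys_reg_init();		/* re-enables group 1 */
	} else if (cmd == CPU_PM_ENTER && gic_dist_security_disabled()) {
		gic_write_grpen1(0);		/* disable group 1 */
		gic_enable_redist(false);
	}
	return NOTIFY_OK;
}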


--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora 
Forum,

Linux Foundation Collaborative Project


Re: Schedule affinity_notify work while migrating IRQs during hot plug

2017-03-17 Thread Sodagudi Prasad

On 2017-03-13 13:19, Thomas Gleixner wrote:

On Mon, 13 Mar 2017, Sodagudi Prasad wrote:

On 2017-02-27 09:21, Thomas Gleixner wrote:
> On Mon, 27 Feb 2017, Sodagudi Prasad wrote:
> > So I am thinking that, adding following sched_work() would notify clients.
>
> And break the world and some more.
>
> > diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
> > index 6b66959..5e4766b 100644
> > --- a/kernel/irq/manage.c
> > +++ b/kernel/irq/manage.c
> > @@ -207,6 +207,7 @@ int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
> >  	case IRQ_SET_MASK_OK_DONE:
> >  		cpumask_copy(desc->irq_common_data.affinity, mask);
> >  	case IRQ_SET_MASK_OK_NOCOPY:
> > +		schedule_work(&desc->affinity_notify->work);
> >  		irq_set_thread_affinity(desc);
> >  		ret = 0;
>
> You cannot do that unconditionally and just slap that schedule_work() call
> into the code. Aside of that schedule_work() would be invoked twice for all
> calls which come via irq_set_affinity_locked() 
Hi Tglx,

Yes. I agree with you, schedule_work() gets invoked twice with the
previous change.

How about calling irq_set_affinity_locked() instead of
irq_do_set_affinity()?


Is this a quiz?

Can you actually see the difference between these functions? There is a
damned good reason WHY this calls irq_do_set_affinity().
The other option is adding an argument to irq_do_set_affinity() and
queueing the notify work when that new parameter is set. I have attached
a patch for the same.


I tested this change on an arm64 platform and observed that client
drivers are getting notified during cpu hot plug.


Thanks,

tglx


--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora 
Forum,

Linux Foundation Collaborative Project

From 54b8d5164126fbdf14d1a9586342b972a6eb5537 Mon Sep 17 00:00:00 2001
From: Prasad Sodagudi 
Date: Thu, 16 Mar 2017 23:44:44 -0700
Subject: [PATCH] genirq: Notify clients whenever there is change in affinity

During cpu hotplug, irqs are getting migrated from the
hotplugged core, but the client drivers are not getting notified.
So add a parameter to irq_do_set_affinity() to check and
notify client drivers during cpu hotplug.

Signed-off-by: Prasad Sodagudi 
---
 kernel/irq/cpuhotplug.c | 2 +-
 kernel/irq/internals.h  | 2 +-
 kernel/irq/manage.c | 9 ++---
 3 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/kernel/irq/cpuhotplug.c b/kernel/irq/cpuhotplug.c
index 011f8c4..e293d9b 100644
--- a/kernel/irq/cpuhotplug.c
+++ b/kernel/irq/cpuhotplug.c
@@ -38,7 +38,7 @@ static bool migrate_one_irq(struct irq_desc *desc)
 	if (!c->irq_set_affinity) {
 		pr_debug("IRQ%u: unable to set affinity\n", d->irq);
 	} else {
-		int r = irq_do_set_affinity(d, affinity, false);
+		int r = irq_do_set_affinity(d, affinity, false, true);
 		if (r)
 			pr_warn_ratelimited("IRQ%u: set affinity failed(%d).\n",
 	d->irq, r);
diff --git a/kernel/irq/internals.h b/kernel/irq/internals.h
index bc226e7..6abde48 100644
--- a/kernel/irq/internals.h
+++ b/kernel/irq/internals.h
@@ -114,7 +114,7 @@ static inline void unregister_handler_proc(unsigned int irq,
 extern void irq_set_thread_affinity(struct irq_desc *desc);
 
 extern int irq_do_set_affinity(struct irq_data *data,
-			   const struct cpumask *dest, bool force);
+		const struct cpumask *dest, bool force, bool notify);
 
 /* Inline functions for support of irq chips on slow busses */
 static inline void chip_bus_lock(struct irq_desc *desc)
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index a4afe5c..aef8a96 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -197,7 +197,7 @@ static inline bool irq_move_pending(struct irq_data *data)
 #endif
 
 int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
-			bool force)
+			bool force, bool notify)
 {
 	struct irq_desc *desc = irq_data_to_desc(data);
 	struct irq_chip *chip = irq_data_get_irq_chip(data);
@@ -209,6 +209,9 @@ int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
 	case IRQ_SET_MASK_OK_DONE:
 		cpumask_copy(desc->irq_common_data.affinity, mask);
 	case IRQ_SET_MASK_OK_NOCOPY:
+		if (notify)
+			schedule_work(&desc->affinity_notify->work);
+
 		irq_set_thread_affinity(desc);
 		ret = 0;
 	}
@@ -227,7 +230,7 @@ int irq_set_affinity_locked(struct irq_data *data, const struct cpumask *mask,
 		return -EINVAL;
 
 	if (irq_can_move_pcntxt(data)) {
-		ret = irq_do_set_affinity(data, mask, force);
+		ret = irq_do_set_affinity(data, mask, force, false);
 	} else {
 		irqd_set_move_pending(data);
 		irq_copy_pending(desc, mask);
@@ -375,7 +378,7 @@ static int setup_affinity(struct irq_desc *desc, struct cpumask *mask)
 		if (cpumask_intersects(mask, nodemask))
 			cpumask_and(mask, mask, nodemask);
 	}
-	irq_do_set_affinity(&desc->irq_data, m

Re: Schedule affinity_notify work while migrating IRQs during hot plug

2017-03-13 Thread Sodagudi Prasad

On 2017-02-27 09:21, Thomas Gleixner wrote:

On Mon, 27 Feb 2017, Sodagudi Prasad wrote:
So I am thinking that, adding following sched_work() would notify 
clients.


And break the world and some more.


diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 6b66959..5e4766b 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -207,6 +207,7 @@ int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
 	case IRQ_SET_MASK_OK_DONE:
 		cpumask_copy(desc->irq_common_data.affinity, mask);
 	case IRQ_SET_MASK_OK_NOCOPY:
+		schedule_work(&desc->affinity_notify->work);
 		irq_set_thread_affinity(desc);
 		ret = 0;


You cannot do that unconditionally and just slap that schedule_work() 
call
into the code. Aside of that schedule_work() would be invoked twice for 
all

calls which come via irq_set_affinity_locked() 

Hi Tglx,

Yes. I agree with you, schedule_work() gets invoked twice with the
previous change.

How about calling irq_set_affinity_locked() instead of
irq_do_set_affinity()?


diff --git a/kernel/irq/cpuhotplug.c b/kernel/irq/cpuhotplug.c
index 011f8c4..e8ce0db 100644
--- a/kernel/irq/cpuhotplug.c
+++ b/kernel/irq/cpuhotplug.c
@@ -18,8 +18,8 @@ static bool migrate_one_irq(struct irq_desc *desc)
 {
 	struct irq_data *d = irq_desc_get_irq_data(desc);
 	const struct cpumask *affinity = d->common->affinity;
-	struct irq_chip *c;
 	bool ret = false;
+	int r;
 
 	/*
 	 * If this is a per-CPU interrupt, or the affinity does not
@@ -34,15 +34,10 @@ static bool migrate_one_irq(struct irq_desc *desc)
 		ret = true;
 	}
 
-	c = irq_data_get_irq_chip(d);
-	if (!c->irq_set_affinity) {
-		pr_debug("IRQ%u: unable to set affinity\n", d->irq);
-	} else {
-		int r = irq_do_set_affinity(d, affinity, false);
-		if (r)
-			pr_warn_ratelimited("IRQ%u: set affinity failed(%d).\n",
-					d->irq, r);
-	}
+	r = irq_set_affinity_locked(d, affinity, false);
+	if (r)
+		pr_warn_ratelimited("IRQ%u: set affinity failed(%d).\n",
+				d->irq, r);
 
 	return ret;


-Thanks, Prasad


Thanks,

tglx


--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora 
Forum,

Linux Foundation Collaborative Project


opaque types instead of union epoll_data

2017-03-07 Thread Sodagudi Prasad

Hi All,

The UAPI struct epoll_event is more opaque than user space expects: the
kernel defines the data member as a plain __u64 instead of the union
epoll_data. Because of this, libc has redefined struct epoll_event with
a union data member.


https://android.googlesource.com/platform/bionic.git/+/master/libc/include/sys/epoll.h
typedef union epoll_data {
  void* ptr;
  int fd;
  uint32_t u32;
  uint64_t u64;
} epoll_data_t;
struct epoll_event {
  uint32_t events;
  epoll_data_t data;
};

The kernel UAPI header "include/uapi/linux/eventpoll.h" defines it as:
struct epoll_event {
	__u32 events;
	__u64 data;	/* opaque type instead of union epoll_data */
} EPOLL_PACKED;


Because of this, we are running into issues when we copy kernel headers.
Is this going to be addressed?
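
For illustration, here is a minimal user-space sketch (my own example;
it assumes EPOLL_PACKED expands to __attribute__((packed)), which is
the x86-64 case, and it needs C11 for static_assert) showing that the
libc union view and the kernel's __u64 view must agree in size and
offsets:

/* Sketch: assumes the packed (x86-64) layout of epoll_event. */
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

typedef union epoll_data {
	void	*ptr;
	int	 fd;
	uint32_t u32;
	uint64_t u64;
} epoll_data_t;

struct libc_epoll_event {
	uint32_t     events;
	epoll_data_t data;
} __attribute__((packed));

struct uapi_epoll_event {
	uint32_t events;
	uint64_t data;
} __attribute__((packed));

static_assert(sizeof(struct libc_epoll_event) ==
	      sizeof(struct uapi_epoll_event), "size mismatch");
static_assert(offsetof(struct libc_epoll_event, data) ==
	      offsetof(struct uapi_epoll_event, data), "offset mismatch");

int main(void) { return 0; }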


--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora 
Forum,

Linux Foundation Collaborative Project


Re: Schedule affinity_notify work while migrating IRQs during hot plug

2017-02-27 Thread Sodagudi Prasad

On 2017-02-21 12:59, Sodagudi Prasad wrote:

Hi Thomas,

Currently irq_set_affinity() is called to migrate irqs from
migrate_one_irq() during cpu hot plug, and clients which are interested
in the irq affinity change are not getting notified:

take_cpu_down() --> __cpu_disable() --> irq_migrate_all_off_this_cpu();

irq_set_affinity() is changing the IRQ affinity at the chip level,
but it is not scheduling the affinity_notify work.



Hi Thomas and All,

I could see that in the 3.10 kernel, irq affinity notifiers are getting
called when a core gets hot plugged. But in later kernel versions the
API used to change the affinity was changed from
irq_set_affinity_locked() to irq_do_set_affinity(), so irq notifiers
are not getting called when a core is hot plugged.

https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/kernel/irq/manage.c?id=refs/tags/v3.10.105#n201


In the latest kernel, the following path is executed to migrate IRQs:
irq_migrate_all_off_this_cpu() --> migrate_one_irq() -->
irq_do_set_affinity().

As I mentioned above, irq_set_affinity_locked() notifies all client
drivers which are registered for IRQ affinity changes, but the
irq_do_set_affinity() API just changes the affinity at the irq chip
level and does not notify the client drivers. I am not sure whether it
is just a miss during the IRQ framework refactor or intentionally done
like this. Can you please check whether the following code change makes
sense or not?


So I am thinking that adding the following schedule_work() call would
notify clients.


diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 6b66959..5e4766b 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -207,6 +207,7 @@ int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
 	case IRQ_SET_MASK_OK_DONE:
 		cpumask_copy(desc->irq_common_data.affinity, mask);
 	case IRQ_SET_MASK_OK_NOCOPY:
+		schedule_work(&desc->affinity_notify->work);
 		irq_set_thread_affinity(desc);
 		ret = 0;
 	}



How about the below change, so that client drivers get notified about
irq affinity changes?
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -207,6 +207,7 @@ int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
 	case IRQ_SET_MASK_OK_DONE:
 		cpumask_copy(desc->irq_common_data.affinity, mask);
 	case IRQ_SET_MASK_OK_NOCOPY:
+		schedule_work(&desc->affinity_notify->work);
 		irq_set_thread_affinity(desc);
 		ret = 0;

With this change, the IRQ affinity notification work gets executed and
client drivers are notified.


-Thanks, Prasad
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora 
Forum,

Linux Foundation Collaborative Project


Re: Free after use in fw_pm_notify()->kill_requests_without_uevent() due pending_fw_head

2017-02-21 Thread Sodagudi Prasad

On 2017-01-03 07:19, Greg KH wrote:

On Tue, Jan 03, 2017 at 06:44:03AM -0800, Sodagudi Prasad wrote:


Hi All,

The device has crashed due to a use-after-free memory access while the
pending_fw_head list was being accessed. The 4.4 stable kernel was used
to reproduce this use-after-free.
--
[ 9031.178428] Unable to handle kernel paging request at virtual address 6b6b6b6b6b6b6b6b
[ 9031.178508] pgd = ffc0de9d2000
[ 9031.185888] [6b6b6b6b6b6b6b6b] *pgd=, *pud=
[ 9031.253045] [ cut here ]
[ 9031.253100] Kernel BUG at ff800864c0a0 [verbose debug info unavailable]
[ 9031.256860] Internal error: Oops - BUG: 9604 [#1] PREEMPT SMP
[ 9031.263539] Modules linked in:
[ 9031.272708] CPU: 6 PID: 1373 Comm: system_server Tainted: G WL 4.4.16+ #1
[ 9031.280648] task: ffc0d1a1d700 ti: ffc0d1a2c000 task.ti: ffc0d1a2c000
[ 9031.287776] PC is at fw_pm_notify+0x84/0x19c
[ 9031.295215] LR is at fw_pm_notify+0x60/0x19c
[ 9031.511559] [] fw_pm_notify+0x84/0x19c
[ 9031.519355] [] notifier_call_chain+0x58/0x8c
[ 9031.524739] [] __blocking_notifier_call_chain+0x54/0x70
[ 9031.530387] [] blocking_notifier_call_chain+0x38/0x44
[ 9031.537243] [] pm_notifier_call_chain+0x28/0x48
[ 9031.543662] [] pm_suspend+0x278/0x674
[ 9031.549906] [] state_store+0x58/0x90
[ 9031.554942] [] kobj_attr_store+0x18/0x28
[ 9031.560154] [] sysfs_kf_write+0x5c/0x68
[ 9031.565620] [] kernfs_fop_write+0x114/0x16c
[ 9031.571092] [] __vfs_write+0x48/0xf0
[ 9031.576816] [] vfs_write+0xb8/0x150
[ 9031.581848] [] SyS_write+0x58/0x94
[ 9031.586973] [] el0_svc_naked+0x24/0x28
---

A kernel panic is observed in the device suspend/resume path, in
kill_requests_without_uevent() called from fw_pm_notify(): when the
pending_list of a firmware_buf is accessed, the 0x6b (slab free poison)
pattern is observed. Based on this, the firmware_buf was freed even
though it was still part of the pending_fw_head list.


What are you doing in userspace to trigger this problem?  What kernel
driver is this happening with?
Continuous device suspend and resume is happening here; I think "echo
mem > /sys/power/state" is being issued. It is not clear which driver
is involved, because after the firmware_buf is freed all of its memory
gets filled with the 0x6b pattern.




And 4.4.16 is pretty old, can you try 4.9?
We don't have a system which runs on newer kernels. We are looking for
possible reasons/paths by which a firmware_buf can get freed while it
is still on the pending_fw_head list.




thanks,

greg k-h


--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora 
Forum,

Linux Foundation Collaborative Project


Schedule affinity_notify work while migrating IRQs during hot plug

2017-02-21 Thread Sodagudi Prasad

Hi Thomas,

Currently irq_set_affinity() is called to migrate irqs from
migrate_one_irq() during cpu hot plug, and clients which are interested
in the irq affinity change are not getting notified:

take_cpu_down() --> __cpu_disable() --> irq_migrate_all_off_this_cpu();

irq_set_affinity() is changing the IRQ affinity at the chip level,
but it is not scheduling the affinity_notify work.

How about the below change, so that client drivers get notified about
irq affinity changes?

--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -207,6 +207,7 @@ int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
 	case IRQ_SET_MASK_OK_DONE:
 		cpumask_copy(desc->irq_common_data.affinity, mask);
 	case IRQ_SET_MASK_OK_NOCOPY:
+		schedule_work(&desc->affinity_notify->work);
 		irq_set_thread_affinity(desc);
 		ret = 0;

With this change, the IRQ affinity notification work gets executed and
client drivers are notified.
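
For context, this is how a client driver registers for those
notifications via the existing irq_set_affinity_notifier() API; a
minimal sketch with hypothetical driver names:

/* Sketch of a hypothetical client driver using the real
 * irq_set_affinity_notifier() API; the work item the patch above
 * schedules is what eventually invokes ->notify(). */
#include <linux/interrupt.h>
#include <linux/cpumask.h>
#include <linux/kref.h>
#include <linux/printk.h>

static void my_irq_affinity_changed(struct irq_affinity_notify *notify,
				    const cpumask_t *mask)
{
	pr_info("IRQ%u affinity is now %*pbl\n",
		notify->irq, cpumask_pr_args(mask));
}

static void my_irq_affinity_release(struct kref *ref)
{
	/* nothing dynamically allocated in this sketch */
}

static struct irq_affinity_notify my_notify = {
	.notify  = my_irq_affinity_changed,
	.release = my_irq_affinity_release,
};

static int my_driver_setup(unsigned int irq)
{
	/* pass NULL later to unregister */
	return irq_set_affinity_notifier(irq, &my_notify);
}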

--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora 
Forum,

Linux Foundation Collaborative Project

