Time stamp value in printk records
Hi All,

From the Qualcomm side, we would like to check with the upstream team about adding a raw timestamp value to printk records. Qualcomm SoCs contain various DSP subsystems - for example audio, video and modem DSPs. Adding the raw timer value (alongside sched_clock()) to each printk record helps in the following use cases:

1) Finding out which subsystem crashed first - did the application processor crash first, or a DSP subsystem?
2) If there are system stability issues on the DSP side, what was the activity on the apps processor during that time?

Initially, during device boot-up, the printk sched_clock value can be matched against the raw timer value used on the DSP subsystem, but after the apps processor has suspended several times, we have no way to correlate the timestamp values on the DSP and the apps processor. All timers (both the apps processor timer and the DSP timers) are derived from a globally always-on timer on Qualcomm SoCs, so keeping the global timer's raw value in printk records and in DSP logs makes it possible to correlate the activity of all the processors in the SoC.

It would be great if the upstream team added a common solution to this problem, as all SoC vendors would benefit from a raw timer value in printk records.

-Thanks, Prasad

-- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, Linux Foundation Collaborative Project
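The correlation step described above is plain arithmetic: once sched_clock() and the raw always-on counter have been sampled together at one point in time, any later raw value from a DSP log can be converted onto the apps-processor timeline. The function below is an illustrative userspace model (the names, and the 19.2 MHz frequency used here, are assumptions, not kernel API):

```c
#include <stdint.h>

/* Convert a raw always-on counter reading to nanoseconds on the
 * sched_clock() timeline, given the counter frequency and one pair of
 * (raw, ns) values captured at the same instant (e.g. at boot). */
static uint64_t raw_to_ns(uint64_t raw_ticks, uint64_t freq_hz,
                          uint64_t raw_at_sync, uint64_t ns_at_sync)
{
    uint64_t delta = raw_ticks - raw_at_sync;   /* ticks since sync point */

    /* scale ticks to nanoseconds: delta * NSEC_PER_SEC / freq */
    return ns_at_sync + (delta * 1000000000ULL) / freq_hz;
}
```

The thread's point is that after several suspend cycles no such sync pair can be reconstructed from sched_clock() alone, which is why logging the raw counter value directly in each printk record would help.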
Re: [PATCH] kernel/panic: Use SYSTEM_RESET2 command for warm reset
On 2019-05-16 11:29, Aaro Koskinen wrote:
> Hi,
>
> On Wed, May 08, 2019 at 06:47:12PM -0700, Prasad Sodagudi wrote:
> > Some platforms may need warm reboot support when the kernel crashes, for post-mortem analysis, instead of a cold reboot. So use the config CONFIG_WARM_REBOOT_ON_PANIC and SYSTEM_RESET2 PSCI command support for warm reset.
>
> Please see commit b287a25a7148 - you can now use the kernel command line option reboot=panic_warm to get this.
>
> A.

Thanks Aaro. Yes, I can use this option. Thanks Sudeep and all for the discussion.

-- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, Linux Foundation Collaborative Project
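For readers landing on this thread: if I read the commit Aaro references correctly, no kernel config or code change is needed; the warm-reset-on-panic behaviour is selected entirely from the kernel command line, for example:

```
reboot=panic_warm
```

with normal (non-panic) reboots keeping the default reset mode.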
Re: PSCI version 1.1 and SYSTEM_RESET2
On 2019-05-02 02:05, Sudeep Holla wrote:
On Wed, May 01, 2019 at 11:43:00AM -0700, Sodagudi Prasad wrote:
On 2019-05-01 02:49, Sudeep Holla wrote:
> On Tue, Apr 30, 2019 at 05:07:31PM -0700, Sodagudi Prasad wrote:
> > On 2019-04-30 14:44, Sodagudi Prasad wrote:
[...]
> > It would be nice if there were a config option to reboot the device either warm or cold in the case of a kernel panic.
>
> I presume you prefer to do a warm boot in case of panic to get a dump of the memory to inspect? If so, is kexec/kdump not the mechanism to achieve that?

Hi Sudeep,
Thanks for your response and for sharing details about your patch.

> If so, is kexec/kdump not the mechanism to achieve that?

Qualcomm has a vendor-specific solution to capture RAM contents for offline analysis.

Ah OK.

> I am just trying to understand the use case. Xilinx asked for the same but I never got to understand their use case.

Here is the background - usually, power-off drivers override the arm_pm_restart and pm_power_off callbacks and register with the reboot notifier at some priority for the reboot operations. Here is the Qualcomm poweroff driver for reference:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/power/reset/msm-poweroff.c

Before the vendor chipset-specific power-off driver is probed, the arm_pm_restart function pointer holds the psci_sys_reset function. Once the vendor power-off driver is probed, it can override the arm_pm_restart function pointer:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/firmware/psci.c#n562

Once the vendor driver is probed, it can take care of the device's warm or hard reset configuration properly. But in the window from start_kernel() until the vendor-specific driver is probed, the device gets a cold reset even if the kernel crashes, because arm_pm_restart points to psci_sys_reset by default. Is the problem clear now?
Too specific a use case IMO, and I am not sure we need a generic solution to deal with this. Anyway, I don't see any check in the arch/PSCI-specific code for what you want; just ensure reboot_mode is set appropriately. Post a patch and see what people have to say.

Hi Sudeep,
Yes. With your SYSTEM_RESET2 command support added, just configuring reboot_mode is good enough.
-Thanks, Prasad

The Qualcomm downstream kernel has a lot of use cases with respect to the device reset sequence, and the downstream driver is quite different from the upstream drivers. I think the above-mentioned problem is common to all chipset vendors and is not specific to Qualcomm use cases. I have a downstream solution to this problem but thought I would bring it to the upstream community for a common solution, so that all vendors can use it.

May be or may be not; post the patch and let's see.

I have modified the flow below to avoid a cold restart in the case of an early kernel panic:
panic() --> emergency_restart() --> machine_emergency_restart() --> machine_restart(NULL);

-Thanks, Prasad

> --
> Regards,
> Sudeep

-- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, Linux Foundation Collaborative Project
Re: PSCI version 1.1 and SYSTEM_RESET2
On 2019-05-01 02:49, Sudeep Holla wrote:
On Tue, Apr 30, 2019 at 05:07:31PM -0700, Sodagudi Prasad wrote:
On 2019-04-30 14:44, Sodagudi Prasad wrote:
+Sudeep

> Hi Mark/Will,
>
> I would like to understand whether the ARM Linux community has plans to support PSCI version 1.1 or not. The PSCI 1.1 specification introduced support for the SYSTEM_RESET2 command, and this new command brings SYSTEM_WARM_RESET support to mobile devices. Rebooting a device with a warm reboot helps to capture a snapshot of the RAM contents for post-mortem analysis.

I think there is a recent discussion from Sudeep on SYSTEM_RESET2 support:
https://patchwork.kernel.org/patch/10884345/

This has landed in -next, and hopefully should appear in v5.2.

Hi Sudeep,
I was going through your discussion in the thread below:
https://lore.kernel.org/lkml/d73d3580-4ec1-a281-4585-5c776fc08...@xilinx.com/

There is no provision to set up the reboot mode dynamically, only via a kernel command line parameter. I am looking for options to reboot the device with a warm reboot when the kernel crashes:
panic() --> emergency_restart() --> machine_emergency_restart() --> machine_restart(NULL);

It would be nice if there were a config option to reboot the device either warm or cold in the case of a kernel panic.

I presume you prefer to do a warm boot in case of panic to get a dump of the memory to inspect? If so, is kexec/kdump not the mechanism to achieve that?

Hi Sudeep,
Thanks for your response and for sharing details about your patch.

> If so, is kexec/kdump not the mechanism to achieve that?

Qualcomm has a vendor-specific solution to capture RAM contents for offline analysis.

> I am just trying to understand the use case. Xilinx asked for the same but I never got to understand their use case.

Here is the background - usually, power-off drivers override the arm_pm_restart and pm_power_off callbacks and register with the reboot notifier at some priority for the reboot operations. Here is the Qualcomm poweroff driver for reference:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/power/reset/msm-poweroff.c

Before the vendor chipset-specific power-off driver is probed, the arm_pm_restart function pointer holds the psci_sys_reset function. Once the vendor power-off driver is probed, it can override the arm_pm_restart function pointer:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/firmware/psci.c#n562

Once the vendor driver is probed, it can take care of the device's warm or hard reset configuration properly. But in the window from start_kernel() until the vendor-specific driver is probed, the device gets a cold reset even if the kernel crashes, because arm_pm_restart points to psci_sys_reset by default. Is the problem clear now?

The Qualcomm downstream kernel has a lot of use cases with respect to the device reset sequence, and the downstream driver is quite different from the upstream drivers. I think the above-mentioned problem is common to all chipset vendors and is not specific to Qualcomm use cases. I have a downstream solution to this problem but thought I would bring it to the upstream community for a common solution, so that all vendors can use it.

I have modified the flow below to avoid a cold restart in the case of an early kernel panic:
panic() --> emergency_restart() --> machine_emergency_restart() --> machine_restart(NULL);

-Thanks, Prasad

--
Regards,
Sudeep

-- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, Linux Foundation Collaborative Project
Re: PSCI version 1.1 and SYSTEM_RESET2
On 2019-04-30 14:44, Sodagudi Prasad wrote:
+Sudeep

Hi Mark/Will,

I would like to understand whether the ARM Linux community has plans to support PSCI version 1.1 or not. The PSCI 1.1 specification introduced support for the SYSTEM_RESET2 command, and this new command brings SYSTEM_WARM_RESET support to mobile devices. Rebooting a device with a warm reboot helps to capture a snapshot of the RAM contents for post-mortem analysis.

I think there is a recent discussion from Sudeep on SYSTEM_RESET2 support:
https://patchwork.kernel.org/patch/10884345/

Hi Sudeep,
I was going through your discussion in the thread below:
https://lore.kernel.org/lkml/d73d3580-4ec1-a281-4585-5c776fc08...@xilinx.com/

There is no provision to set up the reboot mode dynamically, only via a kernel command line parameter. I am looking for options to reboot the device with a warm reboot when the kernel crashes:
panic() --> emergency_restart() --> machine_emergency_restart() --> machine_restart(NULL);

It would be nice if there were a config option to reboot the device either warm or cold in the case of a kernel panic. Calling machine_restart() with a NULL parameter on a kernel crash leads to a cold reboot of the device.

-Thanks, Prasad

-- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, Linux Foundation Collaborative Project
PSCI version 1.1 and SYSTEM_RESET2
Hi Mark/Will,

I would like to understand whether the ARM Linux community has plans to support PSCI version 1.1 or not. The PSCI 1.1 specification introduced support for the SYSTEM_RESET2 command, and this new command brings SYSTEM_WARM_RESET support to mobile devices. Rebooting a device with a warm reboot helps to capture a snapshot of the RAM contents for post-mortem analysis.

-Thanks, Prasad

-- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, Linux Foundation Collaborative Project
Re: [PATCH] genirq: call cancel_work_sync from irq_set_affinity_notifier
On 2019-03-21 09:19, Thomas Gleixner wrote:
Prasad,

On Wed, 20 Mar 2019, Prasad Sodagudi wrote:
> Subject: [PATCH] genirq: call cancel_work_sync from irq_set_affinity_notifier

Please do not describe WHAT the code change is. Give a concise explanation WHY this change is done. The above is like '[PATCH] foo: Increment bar by 5'.

[PATCH] genirq: Prevent UAF and work list corruption

> Whenever notification of IRQ affinity changes, call cancel_work_sync from irq_set_affinity_notifier to cancel all pending works to avoid work list corruption.

Again, you describe first WHAT you are doing instead of telling WHY.

When irq_set_affinity_notifier() replaces the notifier, then the reference count on the old notifier is dropped which causes it to be freed. But nothing ensures that the old notifier is no longer queued in the work list. If it is queued this results in a use after free and possibly in work list corruption. Ensure that the work is canceled before the reference is dropped.

See?

Hi Tglx,
Thanks for suggesting the commit text and modifications.

This gives precise context first and then describes the cure. Also, it is completely irrelevant whether this is achieved by calling cancel_work_sync() or by something else. What matters is that it's canceled. Changelogs describe context and concepts, not implementation details. The implementation details are in the patch itself.

Signed-off-by: Prasad Sodagudi
---
 kernel/irq/manage.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 9ec34a2..da8b2ee 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -356,6 +356,9 @@ static void irq_affinity_notify(struct work_struct *work)
 	desc->affinity_notify = notify;
 	raw_spin_unlock_irqrestore(&desc->lock, flags);

+	if (!notify && old_notify)
+		cancel_work_sync(&old_notify->work);

That '!notify' doesn't make any sense.

Yes, I will remove this in the next patch set. Thanks for reviewing.
-thanks, Prasad

Thanks,
	tglx

-- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, Linux Foundation Collaborative Project
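The ordering requirement in the suggested commit text (cancel before the reference is dropped) can be modelled in userspace. This is an illustrative sketch with made-up types, not the genirq API:

```c
#include <stdlib.h>

/* When replacing a notifier that may still have work queued, cancel
 * the pending work *before* dropping the old reference; otherwise the
 * queued work can run against freed memory. */
struct notifier {
    int work_pending;              /* stands in for a queued work item */
    int refcount;
};

static void cancel_work(struct notifier *n)    /* mirrors cancel_work_sync() */
{
    n->work_pending = 0;
}

/* Install `new` in `slot`; return the old notifier's pending state
 * observed just before release (0 proves the cancel came first). */
static int replace_notifier(struct notifier **slot, struct notifier *new)
{
    struct notifier *old = *slot;
    int pending_at_release = 0;

    *slot = new;
    if (old) {
        cancel_work(old);                  /* must precede the final put */
        pending_at_release = old->work_pending;
        if (--old->refcount == 0)
            free(old);
    }
    return pending_at_release;
}
```

Reversing the two steps inside the `if (old)` block is exactly the bug the patch fixes: the reference count could hit zero and free the notifier while its work item is still queued.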
Re: [PATCH] perf: Change PMCR write to read-modify-write
On 2019-03-21 06:34, Julien Thierry wrote:
Hi Prasad,

On 21/03/2019 02:07, Prasad Sodagudi wrote:
> Preserve the bitfields of PMCR_EL0 (AArch64) during PMU reset. The reset routine should write a 1 only to the PMCR.C and PMCR.P fields to reset the counters. Other fields should not be changed, as they could be set before PMU initialization and their values must be preserved even after reset.

Are there any particular bits you are concerned about? Apart from the RO ones and the RES0 ones (to which we are already writing 0), I see:

DP -> irrelevant for non-secure
X -> This is the only potentially interesting one; however, it resets to an architecturally unknown value, so unless we know for a fact it was set beforehand, we probably want to clear it
D -> ignored when we have LC set (and we do)
E -> Since this is the function we use to reset the PMU on the current CPU, we probably want to set this bit to 0 regardless of its previous value

So, is there any issue this patch is solving?

Hi Julien,
Thanks for taking a look at this patch. Yes, on our Qualcomm platforms we observed that the X bit is getting cleared by the kernel. This bit is set by firmware for Qualcomm use cases, and the non-secure world resets it without this patch. I think changing this register's modification to a read-modify-write style makes sense.
-Thanks, Prasad

Thanks,

Signed-off-by: Prasad Sodagudi
---
 arch/arm64/kernel/perf_event.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index 4addb38..0c1afdd 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -868,8 +868,8 @@ static void armv8pmu_reset(void *info)
 	 * Initialize & Reset PMNC. Request overflow interrupt for
 	 * 64 bit cycle counter but cheat in armv8pmu_write_counter().
 	 */
-	armv8pmu_pmcr_write(ARMV8_PMU_PMCR_P | ARMV8_PMU_PMCR_C |
-			    ARMV8_PMU_PMCR_LC);
+	armv8pmu_pmcr_write(armv8pmu_pmcr_read() | ARMV8_PMU_PMCR_P |
+			    ARMV8_PMU_PMCR_C | ARMV8_PMU_PMCR_LC);
 }

 static int __armv8_pmuv3_map_event(struct perf_event *event,

-- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, Linux Foundation Collaborative Project
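The difference between the two write styles in the hunk above can be shown with plain bit arithmetic. The macro values below mirror the architectural PMCR_EL0 bit positions; treat them as an illustration rather than the kernel's ARMV8_PMU_* definitions:

```c
#include <stdint.h>

#define PMCR_P  (1u << 1)   /* reset event counters */
#define PMCR_C  (1u << 2)   /* reset cycle counter */
#define PMCR_X  (1u << 4)   /* export enable - the firmware-set bit in this report */
#define PMCR_LC (1u << 6)   /* long (64-bit) cycle counter */

/* Blind write, as before the patch: whatever was in the register is lost. */
static uint32_t pmcr_blind_write(void)
{
    return PMCR_P | PMCR_C | PMCR_LC;
}

/* Read-modify-write, as in the patch: previously set bits survive. */
static uint32_t pmcr_rmw(uint32_t current)
{
    return current | PMCR_P | PMCR_C | PMCR_LC;
}
```

Note that Julien's counterpoint still applies: preserving *all* bits also preserves ones (E, D, X on other platforms) that a reset routine may deliberately want cleared.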
Re: protected pins and debugfs
On 2018-10-10 12:40, Sodagudi Prasad wrote:
On 2018-10-07 23:04, Stephen Boyd wrote:
Quoting Sodagudi Prasad (2018-10-03 05:38:24)

	for (i = 0; i < chip->ngpio; i++, gpio++) {
+		label = gpiochip_is_requested(chip, i);
+		if (!label)
+			continue;
		msm_gpio_dbg_show_one(s, NULL, chip, i, gpio);
-		seq_puts(s, "\n");
	}

Does something not work with the following code in msm_gpio_dbg_show_one()?

	if (!gpiochip_line_is_valid(chip, offset))
		return;

Hi Stephen,
I didn't realize that these changes were merged upstream; I was testing on a 4.14 kernel.
https://lore.kernel.org/patchwork/patch/878107/
https://lore.kernel.org/patchwork/patch/878106/
https://lore.kernel.org/patchwork/patch/878109/

Hi Stephen,
After checking this further, adding "gpio-reserved-ranges" is not a good option, for the following reasons:
1) This GPIO information changes from platform to platform, so the reserved ranges would need to be maintained properly for each platform.
2) Some GPIOs can be changed to secure/protected GPIOs dynamically based on the use case.

So adding "gpio-reserved-ranges" does not look like a good option for most platforms. Can you please check the initial patch suggested in this thread? Please let me know if you have any other options for the above points.
-Thanks, Prasad

I will add "gpio-reserved-ranges" to internal platforms and this issue should not be observed.
-thanks, Prasad

-- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, Linux Foundation Collaborative Project
Re: livelock with hrtimer cpu_base->lock
On 2018-10-10 09:49, Will Deacon wrote:
Hi Prasad,

On Tue, Oct 09, 2018 at 01:56:14PM -0700, Sodagudi Prasad wrote:
> This is regarding the thread "try to fix contention between expire_timers and try_to_del_timer_sync".
> https://lkml.org/lkml/2017/7/28/172
> I think this livelock issue was discussed earlier, but the final set of changes was never concluded.

Well, we basically need a way to pick a value for CPU_RELAX_WFE_THRESHOLD. Do you have any ideas? It could be determined at runtime if necessary.

Hi Will,
Please share what values need to be tried for CPU_RELAX_WFE_THRESHOLD. It would be great if it could be determined at runtime. Please let me know if any testing needs to be done with a dynamic detection patch.
-Thanks, Prasad

> I would like to check whether you have new updates on this issue or not. This problem is observed with the 4.14.64 stable kernel too; we see it 2 times in overnight testing. I had to add the following code to avoid the livelock. I am thinking of fixing this at the cpu_relax() level.

+++ b/kernel/time/hrtimer.c
@@ -52,6 +52,7 @@
 #include <...>
+#include <linux/delay.h>

@@ -152,6 +153,7 @@ struct hrtimer_clock_base *lock_hrtimer_base(const struct hrtimer *timer,
 			raw_spin_unlock_irqrestore(&base->cpu_base->lock, *flags);
 		}
 		cpu_relax();
+		udelay(1);
 	}
 }

@@ -1067,6 +1069,7 @@ int hrtimer_cancel(struct hrtimer *timer)
 		if (ret >= 0)
 			return ret;
 		cpu_relax();
+		udelay(1);
 	}
 }
 EXPORT_SYMBOL_GPL(hrtimer_cancel);

This is just another bodge and likely to hurt in places where 1us is excessive because there isn't contention.

Will

-- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, Linux Foundation Collaborative Project
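The CPU_RELAX_WFE_THRESHOLD idea under discussion - spin cheaply for a bounded number of attempts, then back off with a short delay - can be modelled outside the kernel. The loop below simulates a lock that stays busy for a fixed number of tries and only counts which action each failed attempt would take (illustrative only; the right threshold value is exactly the open question in the thread):

```c
struct backoff_stats {
    unsigned relaxes;   /* attempts handled by cpu_relax() */
    unsigned delays;    /* attempts handled by udelay(1) */
};

/* Simulate the retry loop against a lock that frees up after
 * `busy_iters` failed attempts, spinning cheaply below `threshold`. */
static void retry_loop(unsigned busy_iters, unsigned threshold,
                       struct backoff_stats *st)
{
    unsigned spins = 0, tries = 0;

    while (tries++ < busy_iters) {   /* lock still busy: try failed */
        if (++spins < threshold) {
            st->relaxes++;           /* would be cpu_relax() */
        } else {
            st->delays++;            /* would be udelay(1) */
        }
    }
}
```

Will's objection to the unconditional udelay(1) is that lightly contended retries pay the full delay; a threshold keeps the cheap path for short waits and only escalates once the spin has clearly gone on too long.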
Re: protected pins and debugfs
On 2018-10-07 23:04, Stephen Boyd wrote:
Quoting Sodagudi Prasad (2018-10-03 05:38:24)

	for (i = 0; i < chip->ngpio; i++, gpio++) {
+		label = gpiochip_is_requested(chip, i);
+		if (!label)
+			continue;
		msm_gpio_dbg_show_one(s, NULL, chip, i, gpio);
-		seq_puts(s, "\n");
	}

Does something not work with the following code in msm_gpio_dbg_show_one()?

	if (!gpiochip_line_is_valid(chip, offset))
		return;

Hi Stephen,
I didn't realize that these changes were merged upstream; I was testing on a 4.14 kernel.
https://lore.kernel.org/patchwork/patch/878107/
https://lore.kernel.org/patchwork/patch/878106/
https://lore.kernel.org/patchwork/patch/878109/

I will add "gpio-reserved-ranges" to internal platforms and this issue should not be observed.
-thanks, Prasad

-- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, Linux Foundation Collaborative Project
livelock with hrtimer cpu_base->lock
Hi Will,

This is regarding the thread "try to fix contention between expire_timers and try_to_del_timer_sync".
https://lkml.org/lkml/2017/7/28/172

I think this livelock issue was discussed earlier, but the final set of changes was never concluded. I would like to check whether you have new updates on this issue or not. This problem is observed with the 4.14.64 stable kernel too; we see it 2 times in overnight testing. I had to add the following code to avoid the livelock. I am thinking of fixing this at the cpu_relax() level.

+++ b/kernel/time/hrtimer.c
@@ -52,6 +52,7 @@
 #include <...>
+#include <linux/delay.h>

@@ -152,6 +153,7 @@ struct hrtimer_clock_base *lock_hrtimer_base(const struct hrtimer *timer,
 			raw_spin_unlock_irqrestore(&base->cpu_base->lock, *flags);
 		}
 		cpu_relax();
+		udelay(1);
 	}
 }

@@ -1067,6 +1069,7 @@ int hrtimer_cancel(struct hrtimer *timer)
 		if (ret >= 0)
 			return ret;
 		cpu_relax();
+		udelay(1);
 	}
 }
 EXPORT_SYMBOL_GPL(hrtimer_cancel);

Note: timer event streaming is enabled and the livelock is still observed.

-Thanks, Prasad

-- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, Linux Foundation Collaborative Project
protected pins and debugfs
Hi All,

This is regarding reading and printing the configuration of protected pins from non-secure operating systems. The GPIO framework checks whether a pin is in use (flag FLAG_REQUESTED) or not in gpiolib_dbg_show(). But if a GPIO chip driver overrides the dbg_show callback, the driver does not check whether a pin is really in use before printing its configuration details.

	if (chip->dbg_show)
		chip->dbg_show(s, chip);
	else
		gpiolib_dbg_show(s, gdev);

I think the reserved-gpio-ranges solution may just move the problem to dts settings maintenance, as the ranges change from platform to platform.
https://lore.kernel.org/patchwork/patch/878107/
https://lore.kernel.org/patchwork/patch/878106/
https://lore.kernel.org/patchwork/patch/878109/

Can we use a simple/common solution like the one below? It checks whether a pin is in use, with the help of gpiochip_is_requested(), before printing its configuration data.

index fef0970..2ca1440 100644
--- a/drivers/pinctrl/qcom/pinctrl-msm.c
+++ b/drivers/pinctrl/qcom/pinctrl-msm.c
@@ -579,16 +579,20 @@ static void msm_gpio_dbg_show_one(struct seq_file *s,
 	seq_printf(s, " %-8s: %-3s %d", g->name, is_out ? "out" : "in", func);
 	seq_printf(s, " %dmA", msm_regval_to_drive(drive));
 	seq_printf(s, " %s", pulls[pull]);
+	seq_puts(s, "\n");
 }

 static void msm_gpio_dbg_show(struct seq_file *s, struct gpio_chip *chip)
 {
 	unsigned gpio = chip->base;
 	unsigned i;
+	const char *label;

 	for (i = 0; i < chip->ngpio; i++, gpio++) {
+		label = gpiochip_is_requested(chip, i);
+		if (!label)
+			continue;
 		msm_gpio_dbg_show_one(s, NULL, chip, i, gpio);
-		seq_puts(s, "\n");
 	}
 }

-Thanks, Prasad

-- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, Linux Foundation Collaborative Project
EDAC polling
Hi All,

While adding edac_device control info using edac_device_add_device(), we changed poll_msec at the client driver level. For example:
https://source.codeaurora.org/quic/la/kernel/msm-4.9/tree/drivers/edac/qcom_llcc_edac.c?h=msm-4.9#n385

I see that the client driver's request is not honored and the default value (1 second) is always used: during initialization, edac_device_workq_setup() sets poll_msec to 1 second.

int edac_device_add_device(struct edac_device_ctl_info *edac_dev)
{
	...
	/* If there IS a check routine, then we are running POLLED */
	if (edac_dev->edac_check != NULL) {
		/* This instance is NOW RUNNING */
		edac_dev->op_state = OP_RUNNING_POLL;

		/*
		 * enable workq processing on this instance,
		 * default = 1000 msec
		 */
		edac_device_workq_setup(edac_dev, 1000);
	} else {
		edac_dev->op_state = OP_RUNNING_INTERRUPT;
	}

May I know why the client's edev_ctl->poll_msec setting is not considered?

-Thanks, Prasad

-- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, Linux Foundation Collaborative Project
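The behaviour being asked for amounts to a one-line fallback choice. A sketch of the intended selection logic (an illustrative helper, not the edac_core API):

```c
/* Use the client driver's poll interval when it supplied one,
 * otherwise fall back to the historical 1000 ms default. */
static unsigned int effective_poll_msec(unsigned int requested_msec)
{
    return requested_msec ? requested_msec : 1000;
}
```

i.e. edac_device_add_device() could pass effective_poll_msec(edac_dev->poll_msec) to the workqueue setup instead of the hard-coded 1000.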
KASAN errors from unwind_frame
Hi All,

I have observed the following KASAN error with the 4.14.56 kernel. Can you please pick change [1] ("kasan: add no_sanitize attribute for clang builds") into the stable kernels?

[1] - https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/include/linux/compiler-clang.h?h=v4.18-rc8&id=12c8f25a016dff69ee284aa3338bebfd2cfcba33

==
BUG: KASAN: out-of-bounds in __read_once_size_nocheck include/linux/compiler.h:196 [inline]
BUG: KASAN: out-of-bounds in unwind_frame+0xc4/0x324 arch/arm64/kernel/stacktrace.c:56
Read of size 8 at addr ffe3123ff4b0 by task poc/15233

CPU: 7 PID: 15233 Comm: poc Tainted: G S W O 4.14.56+ #3
Hardware name: Qualcomm Technologies, Inc.
Call trace:
 dump_backtrace+0x0/0x388
 show_stack+0x24/0x30
 __dump_stack+0x24/0x2c
 dump_stack+0x8c/0xd0
 print_address_description+0x74/0x234
 kasan_report+0x240/0x264
 __asan_report_load8_noabort+0x2c/0x38
 unwind_frame+0xc4/0x324
 walk_stackframe+0x44/0x6c
 __save_stack_trace+0x250/0x444
 save_stack_trace_tsk+0x2c/0x38
 proc_pid_stack+0x134/0x268
 proc_single_show+0xdc/0x130
 traverse+0x244/0x5b0
 seq_lseek+0x10c/0x27c
 vfs_llseek+0xb4/0xe4
 SyS_lseek+0x54/0xa0
 el0_svc_naked+0x34/0x38

The buggy address belongs to the page:
page:ffbf8c48ffc0 count:0 mapcount:0 mapping: (null) index:0x0
flags: 0x0()
raw: raw: dead0200
page dumped because: kasan: bad access detected
page_owner info is not active (free page?)

Memory state around the buggy address:
 ffe3123ff380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffe3123ff400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffe3123ff480: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ^
 ffe3123ff500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffe3123ff580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==

-Thanks, Prasad

-- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, Linux Foundation Collaborative Project
Re: [PATCH] dd: Invoke one probe retry cycle after every initcall level
On 2018-08-10 00:10, Rafael J. Wysocki wrote: On Fri, Aug 10, 2018 at 12:30 AM, wrote: On 2018-08-06 01:53, Rafael J. Wysocki wrote: On Fri, Aug 3, 2018 at 12:20 AM, Sodagudi Prasad wrote:

From: RAFAEL J. WYSOCKI
Date: Wed, Aug 1, 2018 at 2:21 PM
Subject: Re: [PATCH] dd: Invoke one probe retry cycle after every initcall level
To: Rishabh Bhatnagar
Cc: "Rafael J. Wysocki", Greg Kroah-Hartman, Linux Kernel Mailing List, ckad...@codeaurora.org, ts...@codeaurora.org, Vikram Mulukutla

On Wed, Aug 1, 2018 at 11:18 PM, wrote: On 2018-07-24 01:24, Rafael J. Wysocki wrote: On Mon, Jul 23, 2018 at 10:22 PM, wrote: On 2018-07-23 04:17, Rafael J. Wysocki wrote: On Thu, Jul 19, 2018 at 11:24 PM, Rishabh Bhatnagar wrote:

Drivers that are registered at an initcall level may have to wait until late_init before the probe deferral mechanism can retry their probe functions. It is possible that their dependencies were resolved much earlier, in some cases even before the next initcall level. Invoke one probe retry cycle at every _sync initcall level, allowing these drivers to be probed earlier.

Can you please say something about the actual use case this is expected to address?

We have a display driver that depends on 3 other devices being probed before it can bring up the display. Because the dependencies are not met, the deferral mechanism defers the probes to a later time, even though the dependencies might be met earlier. With this change the display can be brought up much earlier.

OK What runlevel brings up the display after the change? Thanks, Rafael

After the change the display can come up after the device_initcall level itself. The above mentioned 3 devices are probed at 0.503253, 0.505210 and 0.523264 secs. Only the first device is probed successfully. With the current deferral mechanism the devices get probed again after late_initcall at 9.19 and 9.35 secs, so the display can only come up after 9.35 secs. With this change the devices are re-probed successfully at 0.60 and 0.613 secs.
Therefore the display can come up just after 0.613 secs.

OK, so why do you touch the initcall levels earlier than device_?

1) Re-probing devices in the active list at every level helps avoid a circular dependency in the pending list. 2) Many devices get deferred in the earlier initcall levels, so we wanted to re-probe them at every successive initcall level.

Do you have specific examples of devices for which that helps?

This change helps in overall android bootup as well.

How exactly?

We have seen fewer re-probes at late_init, and most drivers' dependencies are met earlier than the late_initcall level. It helped the display and a couple of other drivers by executing the re-probe work at every initcall level.

So I can believe that walking the deferred list on device_initcall and maybe on device_initcall_sync may help, but I'm not quite convinced that it matters for the earlier initcall levels.

Many of our drivers depend on the regulator and bus drivers. Both the regulator and the bus driver are probed at the subsys_initcall level. The probe of the bus driver requires the regulator to be working. If the probe of the bus driver happens before the regulator, then the bus driver's probe will be deferred, and all other devices' probes which depend on the bus driver will also be deferred. The impact of this problem is reduced if we have this patch.

Fair enough, but this information should be there in the changelog of your patch. And why do you do that for arch_initcall_sync()?

You are right and we can remove arch_initcall_sync(). I added some logging and observed that none of the devices are re-probed at the arch_initcall_sync level.

-Thanks, Prasad

-- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, Linux Foundation Collaborative Project
Re: cpu stopper threads and setaffinity leads to deadlock
On 2018-08-03 04:41, Thomas Gleixner wrote: Prasad. On Thu, 2 Aug 2018, Peter Zijlstra wrote: So why didn't you do the 'obvious' parallel to what you did for cpu_stop_queue_two_works(), namely: Is that patch fixing the issue for you?

Hi Thomas and Peter, Yes. I tested both versions of the patches, and both variants are working on Qualcomm devices under stress testing of set-affinity and task cross-migration, which were previously leading to the deadlock.

-Thanks, Prasad

--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -81,6 +81,7 @@ static bool cpu_stop_queue_work(unsigned
 	unsigned long flags;
 	bool enabled;
 
+	preempt_disable();
 	raw_spin_lock_irqsave(&stopper->lock, flags);
 	enabled = stopper->enabled;
 	if (enabled)
@@ -90,6 +91,7 @@ static bool cpu_stop_queue_work(unsigned
 	raw_spin_unlock_irqrestore(&stopper->lock, flags);
 
 	wake_up_q(&wakeq);
+	preempt_enable();
 
 	return enabled;
 }

-- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, Linux Foundation Collaborative Project
Re: [PATCH] dd: Invoke one probe retry cycle after every initcall level
From: RAFAEL J. WYSOCKI
Date: Wed, Aug 1, 2018 at 2:21 PM
Subject: Re: [PATCH] dd: Invoke one probe retry cycle after every initcall level
To: Rishabh Bhatnagar
Cc: "Rafael J. Wysocki", Greg Kroah-Hartman, Linux Kernel Mailing List, ckad...@codeaurora.org, ts...@codeaurora.org, Vikram Mulukutla

On Wed, Aug 1, 2018 at 11:18 PM, wrote: On 2018-07-24 01:24, Rafael J. Wysocki wrote: On Mon, Jul 23, 2018 at 10:22 PM, wrote: On 2018-07-23 04:17, Rafael J. Wysocki wrote: On Thu, Jul 19, 2018 at 11:24 PM, Rishabh Bhatnagar wrote:

Drivers that are registered at an initcall level may have to wait until late_init before the probe deferral mechanism can retry their probe functions. It is possible that their dependencies were resolved much earlier, in some cases even before the next initcall level. Invoke one probe retry cycle at every _sync initcall level, allowing these drivers to be probed earlier.

Can you please say something about the actual use case this is expected to address?

We have a display driver that depends on 3 other devices being probed before it can bring up the display. Because the dependencies are not met, the deferral mechanism defers the probes to a later time, even though the dependencies might be met earlier. With this change the display can be brought up much earlier.

OK What runlevel brings up the display after the change? Thanks, Rafael

After the change the display can come up after the device_initcall level itself. The above mentioned 3 devices are probed at 0.503253, 0.505210 and 0.523264 secs. Only the first device is probed successfully. With the current deferral mechanism the devices get probed again after late_initcall at 9.19 and 9.35 secs, so the display can only come up after 9.35 secs. With this change the devices are re-probed successfully at 0.60 and 0.613 secs. Therefore the display can come up just after 0.613 secs.

OK, so why do you touch the initcall levels earlier than device_?
1) Re-probing devices in the active list at every level helps avoid a circular dependency in the pending list. 2) Many devices get deferred in the earlier initcall levels, so we wanted to re-probe them at every successive initcall level.

This change helps in overall android bootup as well.

How exactly?

We have seen fewer re-probes at late_init, and most drivers' dependencies are met earlier than the late_initcall level. It helped the display and a couple of other drivers by executing the re-probe work at every initcall level.

-thanks, Prasad

-- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, Linux Foundation Collaborative Project
Re: Remounting filesystem read-only
On 2018-07-28 00:47, Darrick J. Wong wrote: On Fri, Jul 27, 2018 at 08:18:23PM -0400, Theodore Y. Ts'o wrote: On Fri, Jul 27, 2018 at 01:34:31PM -0700, Sodagudi Prasad wrote:

> > The error should be pretty clear: "Inode table for bg 0 marked as
> > needing zeroing". That should never happen.
>
> Can you provide any debug patch to detect when this corruption is happening?
> What is the source of this corruption, and how is this partition getting corrupted?
> Or which file system operation led to this corruption?

Do you have a reliable repro? If it's a one-off, it can be caused by *anything*. Crappy hardware, a bug in some proprietary, binary-only GPU driver dereferencing some wild pointer that corrupts kernel memory, etc. Asking for a debug patch is like asking "can you create technology that can detect when a cockroach enters my house?"

Well, ext4 *could* add metadata read and write verifiers to complain loudly in dmesg about stuff that shouldn't be there, so at least we'd know when we're writing cockroaches into the house... :) --D

Hi Ted, The change below fixed this issue. Thanks for your support. https://github.com/torvalds/linux/commit/5012284700775a4e6e3fbe7eac4c543c4874b559 "ext4: fix check to prevent initializing reserved inodes" -Thanks, Prasad

So if you have a reliable repro, then we know what operations might be triggering the corruption, and then you work on creating a minimal repro, and only *then*, when we have a restricted set of possibilities that might be the cause (for example, if removing a GPU call makes the problem go away, then the patch would need to be in the proprietary GPU driver)...

> I am digging around this warning in the code to understand more.

The warning means that a flag in block group descriptor #0 is set that should never be set. How did the flag get set? There are any number of things that could cause that.
You might want to look at the block group descriptor via dumpe2fs or debugfs, to see if it's just a single bit getting flipped, or if the entire block group descriptor is garbage. Note that under normal code paths, the flag *never* gets set by the ext4 kernel code. The flag will get set on non-block-group-0 block group descriptors by ext4, and the ext4 kernel code will only clear the flag. Of course, if there is a bug in some driver that dereferences a pointer wildly, all bets are off. - Ted

-- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, Linux Foundation Collaborative Project
cpu stopper threads and setaffinity leads to deadlock
Hi Peter and Tglx, We are observing another deadlock issue due to commit 0b26351b91 (stop_machine, sched: Fix migrate_swap() vs. active_balance() deadlock), even after taking the following fix https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1740526.html on the Linux-4.14.56 kernel.

Here is the scenario that leads to this deadlock. We used the stress-ng-64 --affinity test case to reproduce this issue in a controlled environment, while simultaneously running CPU hotplug and task migrations. stress-ng-affin (call stack shown below) is changing its own affinity from cpu3 to cpu7. stress-ng-affin is preempted in the cpu_stop_queue_work() function as soon as the stopper lock for migration/3 is released. At the same time, on CPU 7, cross-migration of tasks happens between cpu3 and cpu7.

===
Process: stress-ng-affin, cpu: 3, pid: 1748, start: 0xffd8817e4480
Task name: stress-ng-affin pid: 1748 cpu: 3 start: ffd8817e4480
state: 0x0 exit_state: 0x0 stack base: 0xff801c8e8000 Prio: 120
Stack:
[] __switch_to+0xb8
[] __schedule+0x690
[] preempt_schedule_common+0x100
[] preempt_schedule+0x24
[] _raw_spin_unlock_irqrestore+0x64
[] cpu_stop_queue_work+0x9c
[] stop_one_cpu+0x58
[] __set_cpus_allowed_ptr+0x234
[] sched_setaffinity+0x150
[] SyS_sched_setaffinity+0xcc
[] el0_svc_naked+0x34
[<0>] UNKNOWN+0x0

Due to cross-migration of tasks between cpu7 and cpu3, migration/7 has started executing and waits for the migration/3 task, so that they can proceed through the multi-cpu-stop state machine together. Unfortunately stress-ng-affin is affine to cpu7, and since migration/7 has started running and has monopolized cpu7's execution, stress-ng will never run on cpu7, and cpu3's migration task is never woken up. Essentially: due to the nature of the wake_q interface, a thread can be in at most one wake queue at a time. migration/3 is currently in stress-ng-affin's wake_q. This means that no other thread can add migration/3 to their wake queue.
Thus, even if any attempt is made to stop CPU 3 (e.g. cross-migration, hotplugging, etc.), no thread will wake up migration/3. The change below helped to fix this deadlock.

diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index e190d1e..f932e1e 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -87,9 +87,9 @@ static bool cpu_stop_queue_work(unsigned int cpu, struct cpu_stop_work *work)
 		__cpu_stop_queue_work(stopper, work, &wakeq);
 	else if (work->done)
 		cpu_stop_signal_done(work->done);
-	raw_spin_unlock_irqrestore(&stopper->lock, flags);
 
 	wake_up_q(&wakeq);
+	raw_spin_unlock_irqrestore(&stopper->lock, flags);

-Thanks, Prasad

-- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, Linux Foundation Collaborative Project
Re: [PATCH] stop_machine: Disable preemption after queueing stopper threads
On 2018-07-30 14:07, Peter Zijlstra wrote: On Mon, Jul 30, 2018 at 10:12:43AM -0700, Sodagudi Prasad wrote:

How about including the change below as well? Currently, there is no way to identify whether thread migrations completed or not. When we observed this issue, the symptom was a workqueue lockup. It is better to have some timeout here and induce a BUG_ON().

You'd trigger the soft-lockup or hung-task detector I think. And if not, we ought to look at making it trigger at least one of those.

There is no way to identify whether the migration threads are stuck or not.

Should be pretty obvious from the splat generated by the above, no?

Hi Peter and Thomas, Thanks for your support. I have another question on this flow and on the retry mechanism used in the cpu_stop_queue_two_works() function with the global variable stop_cpus_in_progress. This variable is used in various paths, such as task migration, set-affinity, and CPU hotplug. For example, in the CPU hotplug path, stop_cpus_in_progress is set to true directly, without checking:

takedown_cpu()
  -> stop_machine_cpuslocked()
    -> stop_cpus()
      -> __stop_cpus()
        -> queue_stop_cpus_work()

queue_stop_cpus_work() sets stop_cpus_in_progress to true directly. But only in the task migration path is stop_cpus_in_progress used for retry. I suspect the stop_cpus_in_progress variable could lead to race conditions when CPU hotplug and task migration happen simultaneously. Please correct me if my understanding is wrong.
-Thanks, Prasad

--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -290,6 +290,7 @@ int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void *
 	struct cpu_stop_done done;
 	struct cpu_stop_work work1, work2;
 	struct multi_stop_data msdata;
+	int ret;
 
 	msdata = (struct multi_stop_data){
 		.fn = fn,
@@ -312,7 +313,10 @@ int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void *
 	if (cpu_stop_queue_two_works(cpu1, &work1, cpu2, &work2))
 		return -ENOENT;
 
-	wait_for_completion(&done.completion);
+	ret = wait_for_completion_timeout(&done.completion, msecs_to_jiffies(1000));
+	if (!ret)
+		BUG_ON(1);
+

That's a random timeout, which if you spuriously trigger it, will take down your machine. That seems like a cure worse than the disease.

-- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, Linux Foundation Collaborative Project
Re: [PATCH] stop_machine: Disable preemption after queueing stopper threads
On 2018-07-30 05:41, Thomas Gleixner wrote: On Mon, 30 Jul 2018, Peter Zijlstra wrote: On Mon, Jul 30, 2018 at 12:20:57PM +0200, Thomas Gleixner wrote: > On Tue, 24 Jul 2018, Sebastian Andrzej Siewior wrote: > > On 2018-07-23 18:13:48 [-0700], isa...@codeaurora.org wrote: > > > Hi all, > > Hi, > > > > > Are there any comments about this patch? > > > > I haven't looked in detail at this but your new preempt_disable() makes > > things unbalanced for the err != 0 case. > > It doesn't, but that code is really an unreadable pile of ...

---
Subject: stop_machine: Reflow cpu_stop_queue_two_works()

The code flow in cpu_stop_queue_two_works() is a little arcane; fix this by lifting the preempt_disable() to the top to create more natural nesting wrt the spinlocks and make the wake_up_q() and preempt_enable() unconditional at the end. Furthermore, enable preemption in the -EDEADLK case, such that we spin-wait with preemption enabled. Suggested-by: Thomas Gleixner Signed-off-by: Peter Zijlstra (Intel)

Hi Peter/Thomas, How about including the change below as well? Currently, there is no way to identify whether the thread migrations completed or not. When we observed this issue, the symptom was a workqueue lockup. It is better to have some timeout here and induce the BUG_ON(). There is no way to identify whether the migration threads are stuck or not.

--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -290,6 +290,7 @@ int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void *arg)
 	struct cpu_stop_done done;
 	struct cpu_stop_work work1, work2;
 	struct multi_stop_data msdata;
+	int ret;

 	msdata = (struct multi_stop_data){
 		.fn = fn,
@@ -312,7 +313,10 @@ int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void *arg)
 	if (cpu_stop_queue_two_works(cpu1, &work1, cpu2, &work2))
 		return -ENOENT;

-	wait_for_completion(&done.completion);
+	ret = wait_for_completion_timeout(&done.completion, msecs_to_jiffies(1000));
+	if (!ret)
+		BUG_ON(1);
+

Thanks for cleaning that up!
Reviewed-by: Thomas Gleixner -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, Linux Foundation Collaborative Project
Re: Remounting filesystem read-only
On 2018-07-27 12:52, Theodore Y. Ts'o wrote: On Fri, Jul 27, 2018 at 12:26:25PM -0700, Sodagudi Prasad wrote: On 2018-07-26 18:04, Sodagudi Prasad wrote: > Hi All, > +linux-kernel@vger.kernel.org list. Hi All, Observing the following issue with one of the partitions on an Android device with the 4.14.56 kernel. When I try to remount this partition using the command mount -o rw,remount /vendor/dsp, it is remounted as read-only.

[ 191.364358] EXT4-fs error (device sde9): ext4_has_uninit_itable:3108: comm mount: Inode table for bg 0 marked as needing zeroing
[ 191.364762] Aborting journal on device sde9-8.
[ 191.365226] EXT4-fs (sde9): Remounting filesystem read-only
[ 191.365232] EXT4-fs (sde9): re-mounted. Opts: data=ordered

If I revert this commit [1], "ext4: only look at the bg_flags field if it is valid", the issue is not observed; it just gives the following warning message. [1] - https://patchwork.ozlabs.org/patch/929239/.

[ 123.373456] EXT4-fs (sde9): warning: mounting fs with errors, running e2fsck is recommended
[ 123.389649] EXT4-fs (sde9): re-mounted. Opts: data=ordered

Can you provide some inputs on what could be the issue with this partition? The error should be pretty clear: "Inode table for bg 0 marked as needing zeroing". That should never happen. Hi Ted, Can you provide any debug patch to detect when this corruption is happening? What is the source of this corruption, how is this partition getting corrupted, and which file system operation led to it? I am digging into the code a bit around this warning to understand more. Thanks in advance for your help. -Thanks, Prasad The warning "mounting fs with errors" means the file system was corrupted. The commit is now telling more about how the file system was corrupted, and is protecting you from further data loss. (You should never, ever, ever, allow a file system to be mounted read/write with corruptions.
The file system should have been checked using fsck.ext4, aka e2fsck, before the file system was allowed to be mounted.) - Ted -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, Linux Foundation Collaborative Project
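For completeness, the check Ted recommends looks roughly like the following, run while the partition is unmounted. The device and mount point are taken from the logs above; adjust for your system:

```shell
# Unmount the affected partition first; e2fsck must not run on a
# mounted read/write filesystem.
umount /vendor/dsp

# -f forces a full check even if the fs looks clean; -y answers yes
# to all repair prompts.
e2fsck -f -y /dev/sde9

# Inspect the recorded filesystem state and error counters afterwards.
dumpe2fs -h /dev/sde9 | grep -iE 'state|error'
```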
Re: Remounting filesystem read-only
On 2018-07-26 18:04, Sodagudi Prasad wrote: Hi All, +linux-kernel@vger.kernel.org list. Observing the following issue with one of the partitions on an Android device with the 4.14.56 kernel. When I try to remount this partition using the command mount -o rw,remount /vendor/dsp, it is remounted as read-only.

[ 191.364358] EXT4-fs error (device sde9): ext4_has_uninit_itable:3108: comm mount: Inode table for bg 0 marked as needing zeroing
[ 191.364762] Aborting journal on device sde9-8.
[ 191.365226] EXT4-fs (sde9): Remounting filesystem read-only
[ 191.365232] EXT4-fs (sde9): re-mounted. Opts: data=ordered

If I revert this commit [1], "ext4: only look at the bg_flags field if it is valid", the issue is not observed; it just gives the following warning message. [1] - https://patchwork.ozlabs.org/patch/929239/.

[ 123.373456] EXT4-fs (sde9): warning: mounting fs with errors, running e2fsck is recommended
[ 123.389649] EXT4-fs (sde9): re-mounted. Opts: data=ordered

Can you provide some inputs on what could be the issue with this partition? -Thanks, Prasad -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, Linux Foundation Collaborative Project
Re: [PATCH] stop_machine: Remove cpu swap from stop_two_cpus
On 2018-06-27 00:15, Sebastian Andrzej Siewior wrote: On 2018-06-26 14:28:26 [-0700], Isaac J. Manjarres wrote: Remove CPU ID swapping in stop_two_cpus() so that the source CPU's stopper thread is added to the wake queue last, so that the source CPU's stopper thread is woken up last, ensuring that all other threads that it depends on are woken up before it runs. You can't do that because you could deadlock while locking the stopper lock. Without this change, boot-up issues are observed with Linux 4.14.52. One of the cores is executing the stopper thread after wake_up_q() in the cpu_stop_queue_two_works() function, without waking up the other core's stopper thread. We see this issue 100% of the time on device boot-up with Linux 4.14.52. Could you please explain a bit more how the deadlock occurs?

static int cpu_stop_queue_two_works(int cpu1, struct cpu_stop_work *work1,
				    int cpu2, struct cpu_stop_work *work2)
{
	struct cpu_stopper *stopper1 = per_cpu_ptr(&cpu_stopper, cpu1);
	struct cpu_stopper *stopper2 = per_cpu_ptr(&cpu_stopper, cpu2);
	DEFINE_WAKE_Q(wakeq);
	int err;
retry:
	raw_spin_lock_irq(&stopper1->lock);
	raw_spin_lock_nested(&stopper2->lock, SINGLE_DEPTH_NESTING);

I think you are suggesting to switch the locking sequence too: stopper2->lock and then stopper1->lock. Could you please share the test case to stress this code flow? Couldn't you swap cpu1+cpu2 and work1+work2? work1 and work2 have the same data contents.

diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index f89014a..d10d633 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -307,8 +307,6 @@ int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void *arg)
 	cpu_stop_init_done(&done, 2);
 	set_state(&msdata, MULTI_STOP_PREPARE);

-	if (cpu1 > cpu2)
-		swap(cpu1, cpu2);
 	if (cpu_stop_queue_two_works(cpu1, &work1, cpu2, &work2))
 		return -ENOENT;

Sebastian -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, Linux Foundation Collaborative Project
Re: write_lock_irq(&tasklist_lock)
On 2018-05-24 06:51, Boqun Feng wrote: On Thu, May 24, 2018 at 01:49:31PM +0100, Will Deacon wrote: On Wed, May 23, 2018 at 09:26:35AM -0700, Linus Torvalds wrote: > On Wed, May 23, 2018 at 8:35 AM Will Deacon wrote: > > > In other words, qrwlock requires consistent locking order wrt spinlocks. > > I *thought* lockdep already tracked and detected this. Or is that only with > with the sleeping versions? There are patches in-flight to detect this: https://marc.info/?l=linux-kernel=152483640529740=2k as part of Boqun's work into recursive read locking. Hi Linus and Will, Thank you for the quick responses and for providing the suggestions. The kernel version is locked for a couple of products, and the same issue is observed on both the 4.14.41 and 4.9.96 kernels. We can only accept stable updates from upstream for these products. If QUEUED_RWLOCKS works on the above kernel versions without any issues, we can enable QUEUED_RWLOCKS. Can we go ahead with Linus's suggestion for these kernel versions, so that IRQs won't be disabled for quite a long time?

static void tasklist_write_lock(void)
{
	unsigned long flags;
	local_irq_save(flags);
	while (!write_trylock(&tasklist_lock)) {
		local_irq_restore(flags);
		do { cpu_relax(); } while (write_is_locked(&tasklist_lock));
		local_irq_disable();
	}
}

-Thanks, Prasad

Yeah, lemme put some details here. So we have three cases:

Case #1 (from Will):

	P0:		P1:		P2:
	spin_lock()	read_lock()	write_lock()
	read_lock()	spin_lock()

, which is a deadlock, and couldn't be detected by lockdep yet. And lockdep could detect this with the patch I attach at the end of the mail.

Case #2:

	P0:		P1:		P2:
	spin_lock()	read_lock()	write_lock()
	read_lock()	spin_lock_irq()

, which is not a deadlock, as the read_lock() on P0 can use the unfair fastpath.

Case #3:

	P0:		P1:		P2:
	spin_lock_irq()	read_lock()	write_lock_irq()
	read_lock()	spin_lock()

, which is a deadlock, as the read_lock() on P0 cannot use the fastpath.
To detect this without making case #2 a false positive, the recursive deadlock detection patchset is needed; the WIP version is at: git://git.kernel.org/pub/scm/linux/kernel/git/boqun/linux.git arr-rfc-wip The code is done, I'm just working on the rework for the documentation stuff, so if anyone is interested, could try it out ;-) Regards, Boqun

--->8
Subject: [PATCH] locking: More accurate annotations for read_lock()

On the archs using QUEUED_RWLOCKS, read_lock() is not always a recursive read lock, actually it's only recursive if in_interrupt() is true. So change the annotation accordingly to catch more deadlocks. Note we used to treat read_lock() as pure recursive read locks in lib/locking-selftest.c, and this is useful, especially for the lockdep development selftest, so we keep this via a variable to force switching lock annotation for read_lock(). Signed-off-by: Boqun Feng

---
 include/linux/lockdep.h | 35 ++-
 lib/locking-selftest.c  | 11 +++
 2 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 6fc77d4dbdcd..07793986c063 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -27,6 +27,7 @@ extern int lock_stat;
 #include
 #include
 #include
+#include

 /*
  * We'd rather not expose kernel/lockdep_states.h this wide, but we do need
@@ -540,6 +541,31 @@ static inline void print_irqtrace_events(struct task_struct *curr)
 }
 #endif

+/* Variable used to make lockdep treat read_lock() as recursive in selftests */
+#ifdef CONFIG_DEBUG_LOCKING_API_SELFTESTS
+extern unsigned int force_read_lock_recursive;
+#else /* CONFIG_DEBUG_LOCKING_API_SELFTESTS */
+#define force_read_lock_recursive 0
+#endif /* CONFIG_DEBUG_LOCKING_API_SELFTESTS */
+
+#ifdef CONFIG_LOCKDEP
+/*
+ * read_lock() is recursive if:
+ * 1. We force lockdep think this way in selftests or
+ * 2. The implementation is not queued read/write lock or
+ * 3. The locker is at an in_interrupt() context.
+ */
+static inline bool read_lock_is_recursive(void)
+{
+	return force_read_lock_recursive ||
+	       !IS_ENABLED(CONFIG_QUEUED_RWLOCKS) ||
+	       in_interrupt();
+}
+#else /* CONFIG_LOCKDEP */
+/* If !LOCKDEP, the value is meaningless */
+#define read_lock_is_recursive() 0
+#endif
+
 /*
  * For trivial one-depth nesting of a lock-class, the following
  * global define can be used. (Subsystems with multiple levels
@@ -561,7 +587,14
write_lock_irq(&tasklist_lock)
Hi All, When the following test is executed on the 4.14.41 stable kernel, I observed that one of the cores is waiting for the tasklist_lock for a long time with IRQs disabled. ./stress-ng-64 --get 8 -t 3h --times --metrics-brief Every time the device crashed, I observed that one of the tasks was stuck in the fork system call, waiting for the tasklist_lock as a writer with IRQs disabled. https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/tree/kernel/fork.c?h=linux-4.14.y#n1843 Some other tasks are making getrlimit and prlimit system calls, so these readers are continuously taking the tasklist_lock read lock. The writer has disabled local IRQs for a long time and is waiting for the readers to finish, but the readers keep the tasklist_lock busy for quite a long time. I think the --get N option creates N workers that call system calls that fetch data from the kernel; currently these are: getpid, getppid, getcwd, getgid, getegid, getuid, getgroups, getpgrp, getpgid, getpriority, getresgid, getresuid, getrlimit, prlimit, getrusage, getsid, gettid, getcpu, gettimeofday, uname, adjtimex, sysfs. Some of these system calls are OS specific. Have you observed this type of issue with the tasklist_lock? Do we need write_lock_irq(&tasklist_lock) in the portion of code below? Can I use write_lock instead of write_lock_irq in this portion of code? https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/tree/kernel/fork.c?h=linux-4.14.y#n1843 -Thanks, Prasad -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, Linux Foundation Collaborative Project
bpf: mismatch between inode->i_private and inode->i_op
Hi All, Trying to boot the 4.14.19 kernel with Android, and observed a crash in the bpf system. From the call stack, the netd process is trying to access the /sys/fs/bpf/traffic_cookie_uid_map file. After checking the call stack further, observed that inode->i_private is pointing to a (struct bpf_map *) but inode->i_op is pointing to bpf_prog_fops.

[ 4134.721483] Unable to handle kernel paging request at virtual address 80001
[ 4134.820925] Mem abort info:
[ 4134.901283] Exception class = DABT (current EL), IL = 32 bits
[ 4135.016736] SET = 0, FnV = 0
[ 4135.119820] EA = 0, S1PTW = 0
[ 4135.201431] Data abort info:
[ 4135.301388] ISV = 0, ISS = 0x0021
[ 4135.359599] CM = 0, WnR = 0
[ 4135.470873] user pgtable: 4k pages, 39-bit VAs, pgd = ffe39b946000
[ 4135.660725] Internal error: Oops: 9621 [#1] PREEMPT SMP
[ 4135.674610] Modules linked in:
[ 4135.682883] CPU: 5 PID: 1260 Comm: netd Tainted: G S W 4.14.19+ #1
[ 4135.716188] task: ffe39f4aa380 task.stack: ff801d4e
[ 4135.731599] PC is at bpf_prog_add+0x20/0x68
[ 4135.741746] LR is at bpf_prog_inc+0x20/0x2c
[... register and raw stack dump elided ...]
[ 4136.258315] Process netd (pid: 1260, stack limit = 0xff801d4e)
[ 4136.273746] Call trace:
[ 4136.479241] [] bpf_prog_add+0x20/0x68
[ 4136.491767] [] bpf_prog_inc+0x20/0x2c
[ 4136.504536] [] bpf_obj_get_user+0x204/0x22c
[ 4136.518746] [] SyS_bpf+0x5a8/0x1a88

User space is trying to access a bpf map, so inode->i_op and inode->i_private should be pointing to bpf_map_iops and a struct bpf_map * respectively.
But inode ->I_private value is correct but the inode->i_op is not. Are there any race conditions assigning
bpf: mismatch between inode->i_private and inode->i_op
Hi All, Trying to boot a 4.14.19 kernel with Android, I observed a crash in the bpf subsystem. From the call stack, the netd process is trying to access the /sys/fs/bpf/traffic_cookie_uid_map file. Checking the call stack further shows that inode->i_private points to a (struct bpf_map *) but inode->i_op points to bpf_prog_fops.

[ 4134.721483] Unable to handle kernel paging request at virtual address 80001
[ 4134.820925] Mem abort info:
[ 4134.901283] Exception class = DABT (current EL), IL = 32 bits
[ 4135.016736] SET = 0, FnV = 0
[ 4135.119820] EA = 0, S1PTW = 0
[ 4135.201431] Data abort info:
[ 4135.301388] ISV = 0, ISS = 0x0021
[ 4135.359599] CM = 0, WnR = 0
[ 4135.470873] user pgtable: 4k pages, 39-bit VAs, pgd = ffe39b946000
[ 4135.499757] [00080001] *pgd=, *pud=
[ 4135.660725] Internal error: Oops: 9621 [#1] PREEMPT SMP
[ 4135.674610] Modules linked in:
[ 4135.682883] CPU: 5 PID: 1260 Comm: netd Tainted: G S W 4.14.19+ #1
[ 4135.716188] task: ffe39f4aa380 task.stack: ff801d4e
[ 4135.731599] PC is at bpf_prog_add+0x20/0x68
[ 4135.741746] LR is at bpf_prog_inc+0x20/0x2c
[ 4135.751788] pc : [] lr : [] pstate: 60400145
[ 4135.769062] sp : ff801d4e3ce0
[ 4135.777241] x29: ff801d4e3cf0 x28: ffe39f4aa380
[ 4135.790125] x27: ffe39f4aa380 x26: 0118
[ 4135.803020] x25: 0030 x24: 0015
[ 4135.815799] x23: x22: 01080020
[ 4135.828704] x21: ffe3a01bf8f8 x20: 0001
[ 4135.841494] x19: ffe383821280 x18:
[ 4135.854346] x17: x16: ff94acc92000
[ 4135.867083] x15: 00042e7f x14: ff94ace81d88
[ 4135.880041] x13: 0001e4a7 x12: 0110
[ 4135.892894] x11: bdf96acb38411b00 x10:
[ 4135.905610] x9 : ff94ac24c400 x8 : 00080001
[ 4135.918578] x7 : x6 : ff801d4e3c24
[ 4135.931357] x5 : ff801d4e3c10 x4 :
[ 4135.944231] x3 : x2 :
[ 4135.957062] x1 : 0001 x0 : ffe383821280
[ 4135.969757] PC: 0xff94ab7ad544:
[ 4135.981567] d544 91074400 91278021 72a00103 aa1303e2 94036495 a9417bfd f84207f3 d65f03c0
[ 4136.001588] d564 a9be4ff4 a9017bfd 910043fd 2a0103f4 aa0003f3 d503201f f9400e68 f9800111
[ 4136.021504] d584 885f7d09 0b140129 880afd09 35aa d5033bbf 7140213f 5400010d f9400e69
[ 4136.041346] d5a4 b27ceff3 f9800131 885f7d28 4b140108 880b7d28 35ab a9417bfd aa1303e0
[ 4136.061210] LR: 0xff94ab7ad5f8:
[ 4136.073031] d5f8 35aa d5033bbf 3489 a9417bfd a8c24ff4 d65f03c0 d421 17fc
[ 4136.093094] d618 f81e0ff3 a9017bfd 910043fd aa0003f3 d503201f 320003e1 aa1303e0 97cc
[ 4136.112946] d638 a9417bfd f84207f3 d65f03c0 f81e0ff3 a9017bfd 910043fd aa0003f3 d503201f
[ 4136.132694] d658 f9400e6a b9400149 2a0903e8 340002a8 f9400e6b 1100050d 93407d0c 93407dae
[ 4136.152810] SP: 0xff801d4e3ca0:
[ 4136.164610] 3ca0 ab7ad584 ff94 60400145 a01bf8f8 ffe3 0006
[ 4136.184188] 3cc0 0080 ab844204 ff94 1d4e3cf0 ff80 ab7ad584 ff94
[ 4136.204262] 3ce0 83821280 ffe3 1d4e3d10 ff80 ab7ad638 ff94
[ 4136.224210] 3d00 9136a3c0 ffe3 9136a3c0 ffe3 1d4e3d70 ff80 ab7b5d08 ff94
[ 4136.258315] Process netd (pid: 1260, stack limit = 0xff801d4e)
[ 4136.273746] Call trace:
[ 4136.280146] Exception stack(0xff801d4e3ba0 to 0xff801d4e3ce0)
[ 4136.295378] 3ba0: ffe383821280 0001
[ 4136.313715] 3bc0: ff801d4e3c10 ff801d4e3c24
[ 4136.332157] 3be0: 00080001 ff94ac24c400 bdf96acb38411b00
[ 4136.350831] 3c00: 0110 0001e4a7 ff94ace81d88 00042e7f
[ 4136.369115] 3c20: ff94acc92000 ffe383821280
[ 4136.387357] 3c40: 0001 ffe3a01bf8f8 01080020
[ 4136.405694] 3c60: 0015 0030 0118 ffe39f4aa380
[ 4136.424136] 3c80: ffe39f4aa380 ff801d4e3cf0 ff94ab7ad638 ff801d4e3ce0
[ 4136.442494] 3ca0: ff94ab7ad584 60400145 ffe3a01bf8f8 0006
[ 4136.460936] 3cc0: 0080 ff94ab844204 ff801d4e3cf0 ff94ab7ad584
[ 4136.479241] [] bpf_prog_add+0x20/0x68
[ 4136.491767] [] bpf_prog_inc+0x20/0x2c
[ 4136.504536] [] bpf_obj_get_user+0x204/0x22c
[ 4136.518746] [] SyS_bpf+0x5a8/0x1a88

User space is trying to access a bpf map, so inode->i_op and inode->i_private should point to bpf_map_iops and a struct bpf_map * respectively. The inode->i_private value is correct, but inode->i_op is not. Are there any known race conditions when assigning inode->i_op and inode->i_private?
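For what it's worth, the invariant being violated can be stated as a tiny check. The sketch below is a userspace mock, not kernel code: mock_inode, the *_stub markers, and inode_type_consistent() are hypothetical names standing in for struct inode, the bpf_map_iops/bpf_prog_fops tables, and the dispatch done in bpf_obj_get_user().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's operations tables; only the
 * identity of the pointers matters for the invariant being checked. */
static const int bpf_map_iops_stub;
static const int bpf_prog_fops_stub;

/* Minimal mock of the two struct inode fields involved in the crash. */
struct mock_inode {
	const void *i_op;	/* identifies the object type */
	void *i_private;	/* points at the matching object */
};

/* The invariant the reported crash violates: the object type implied by
 * i_op must match what i_private actually points to. The lookup path
 * dispatches on the ops pointer, so a mismatch makes it treat a
 * struct bpf_map * as a struct bpf_prog * and dereference bad memory. */
static int inode_type_consistent(const struct mock_inode *in,
				 const void *expected_ops)
{
	return in->i_op == expected_ops && in->i_private != NULL;
}
```

A race that publishes i_private before i_op (or vice versa) is exactly a window where this check would fail for a concurrent reader.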
Re: [PATCH] perf: Add support for creating offline events
On 2018-02-13 10:23, Peter Zijlstra wrote: On Fri, Feb 09, 2018 at 03:07:00PM -0800, Raghavendra Rao Ananta wrote: The perf framework doesn't allow creation of hardware events if the requested CPU is offline. However, creation of an event is achievable if the event is attached to the PMU as soon as the CPU is online again. So, introduce a feature that allows creating events even when the CPU is offline and returns success to the caller. If, during event creation, the CPU is found offline, the event is moved to a new state (PERF_EVENT_STATE_DORMANT). As and when the CPU is known to be woken up (through hotplug notifiers), all the dormant events are attached to the PMU (by perf_install_in_context()). If the CPU never comes online during the lifetime of the event, the dormant event is simply freed. This is horrible.. and you seem to have forgotten to explain why you care about offline CPUs. Hi Peter, Up to the 4.9 kernel, drivers could register CPU hotplug notifiers and create perf events dynamically from the notifier callbacks. Since CPU hotplug was converted from notifiers to the state machine approach, every driver needs to define a cpuhp state and register with the CPU hotplug state machine to create perf events dynamically. Qualcomm has use cases that monitor CPU cycles and other hardware events continuously on all CPUs, from both the kernel and profiling tools. So we think other SoC vendors may also be interested in preserving perf events across CPU hotplug and in creating events on hotplugged cores. -Thanks, Prasad -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, Linux Foundation Collaborative Project
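The bookkeeping the patch proposes can be sketched in userspace. Everything below is illustrative, not kernel code: the event struct, dormant_list, event_create() and cpu_online_callback() are hypothetical stand-ins for perf_event, the dormant-event list, perf event creation, and the hotplug-online path (perf_install_in_context()).

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace sketch of "dormant event" bookkeeping: events created for
 * an offline CPU succeed but are parked, then attached when the CPU
 * comes online. Names here are illustrative, not the kernel's. */
enum ev_state { EV_DORMANT, EV_ACTIVE };

struct event {
	int cpu;
	enum ev_state state;
	struct event *next;
};

static struct event *dormant_list;

/* Creating an event for an offline CPU succeeds but parks it. */
static struct event *event_create(int cpu, int cpu_online)
{
	struct event *ev = calloc(1, sizeof(*ev));

	if (!ev)
		return NULL;
	ev->cpu = cpu;
	if (cpu_online) {
		ev->state = EV_ACTIVE;	/* would be perf_install_in_context() */
	} else {
		ev->state = EV_DORMANT;
		ev->next = dormant_list;
		dormant_list = ev;
	}
	return ev;
}

/* Hotplug-online callback: attach every dormant event for this CPU. */
static void cpu_online_callback(int cpu)
{
	struct event **pp = &dormant_list;

	while (*pp) {
		if ((*pp)->cpu == cpu) {
			struct event *ev = *pp;

			*pp = ev->next;
			ev->state = EV_ACTIVE;
			ev->next = NULL;
		} else {
			pp = &(*pp)->next;
		}
	}
}
```

The objection in the thread is not to this bookkeeping as such but to where it lives; the cpuhp state machine already provides the online callback that would drain the dormant list.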
Re: [PATCH] kbuild: clang: Disable -Wunused-const-variable warnings
On 2018-01-28 08:22, Segher Boessenkool wrote: On Fri, Jan 26, 2018 at 04:59:46PM -0800, Prasad Sodagudi wrote: Disable -Wunused-const-variable warnings instead of disabling -Wunused-variable warnings, so that -Wunused-const-variable gets disabled on both clang and GCC. Why would you disable -Wunused-const-variable on GCC? You do not explain why. Hi Segher, Please check the discussion below with Greg and Masahiro about "unused-variable": https://lkml.org/lkml/2017/12/18/697 Also see this link: the "unused-const-variable" warning is already disabled for GCC. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Makefile?h=v4.15#n719 The commit below disables "unused-const-variable" for GCC: commit c9c6837d39311b0c - "kbuild: move -Wunused-const-variable to W=1 warning level". Currently "unused-variable" warnings are disabled for clang but not for GCC, and disabling the "unused-variable" warning also disables "unused-const-variable". To match settings between clang and GCC, this patch disables only "unused-const-variable" for clang, keeping GCC and clang consistent with respect to "unused-const-variable". -Thanks, Prasad Segher -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, Linux Foundation Collaborative Project
Re: unused-variable warning is getting disabled with clang
On 2017-12-06 22:26, Greg Kroah-Hartman wrote: On Wed, Dec 06, 2017 at 01:24:51PM -0800, Sodagudi Prasad wrote: Hi All, When the kernel is compiled with clang, the following line disables the unused-variable warning. This is not the case with gcc. KBUILD_CFLAGS += $(call cc-disable-warning, unused-variable) Are there any specific reasons for disabling unused-variable with clang? Try it and see why it is disabled :) Hi Greg, When I enabled -Wunused-variable warnings with clang, I observed both -Wunused-variable and -Wunused-const-variable warnings, as expected. It looks like -Wunused-const-variable warnings are explicitly disabled with GCC as well: commit c9c6837d39311b0c - "kbuild: move -Wunused-const-variable to W=1 warning level". I could see the following warnings, along with a couple of -Wunused-variable warnings, with downstream code:

arch/arm64/crypto/sha1-ce-glue.c:118:1: warning: unused variable 'cpu_feature_match_SHA1' [-Wunused-const-variable]
include/linux/cpufeature.h:48:33: note: expanded from macro 'module_cpu_feature_match'
arch/arm64/crypto/sha2-ce-glue.c:148:1: warning: unused variable 'cpu_feature_match_SHA2' [-Wunused-const-variable]
arch/arm64/crypto/ghash-ce-glue.c:597:33: warning: unused variable 'ghash_cpu_feature' [-Wunused-const-variable]
arch/arm64/crypto/aes-ce-cipher.c:280:1: warning: unused variable 'cpu_feature_match_AES' [-Wunused-const-variable]
arch/arm64/crypto/aes-glue.c:674:1: warning: unused variable 'cpu_feature_match_AES' [-Wunused-const-variable]
kernel/trace/ftrace.c:1092:27: warning: unused variable 'ftrace_swapper_pid' [-Wunused-const-variable]
drivers/usb/host/ehci-platform.c:406:36: warning: unused variable 'ehci_acpi_match' [-Wunused-const-variable]
drivers/usb/host/xhci-plat.c:416:36: warning: unused variable 'usb_xhci_acpi_match' [-Wunused-const-variable]

So I have made the following change and will send a patch for the same.
diff --git a/Makefile b/Makefile
index 4e6da2f..8a6c14e 100644
--- a/Makefile
+++ b/Makefile
@@ -711,7 +711,7 @@ endif
 KBUILD_CFLAGS += $(CLANG_TARGET) $(CLANG_GCC_TC) -meabi gnu
 KBUILD_AFLAGS += $(CLANG_TARGET) $(CLANG_GCC_TC)
 KBUILD_CPPFLAGS += $(call cc-option,-Qunused-arguments,)
-KBUILD_CFLAGS += $(call cc-disable-warning, unused-variable)
+KBUILD_CFLAGS += $(call cc-disable-warning, unused-const-variable)
 KBUILD_CFLAGS += $(call cc-disable-warning, format-invalid-specifier)
 KBUILD_CFLAGS += $(call cc-disable-warning, gnu)
 KBUILD_CFLAGS += $(call cc-disable-warning, address-of-packed-member)

Please let me know if you have any concerns with this approach to identify all unused local variables. -Thanks, Prasad -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, Linux Foundation Collaborative Project
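Assuming gcc is available, the effect of the change can be checked directly: with only unused-const-variable disabled, an unused file-scope const is silent while an unused local variable still warns, which is the behavior the Makefile change above is after.

```shell
# Demonstrates (with GCC; clang behaves the same for these flags) that
# disabling only unused-const-variable keeps unused-local warnings.
cat > demo.c <<'EOF'
static const int lookup_table[] = { 1, 2, 3 };	/* unused file-scope const */

int f(void)
{
	int unused_local;	/* should still warn */
	return 0;
}
EOF

gcc -c -Wall -Wno-unused-const-variable -o /dev/null demo.c 2> warnings.txt
cat warnings.txt
```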
unused-variable warning is getting disabled with clang
Hi All, When the kernel is compiled with clang, the following line disables the unused-variable warning. This is not the case with gcc. KBUILD_CFLAGS += $(call cc-disable-warning, unused-variable) Are there any specific reasons for disabling unused-variable with clang? -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, Linux Foundation Collaborative Project
Re: [PATCH] Clockevents: Always call clockevents_program_event
On 2017-10-24 01:37, Thomas Gleixner wrote: On Tue, 24 Oct 2017, Prasad Sodagudi wrote: Currently the tick_program_event function does not call clockevents_program_event when 'expires == KTIME_MAX'; it just updates the clockevent state to CLOCK_EVT_STATE_ONESHOT_STOPPED. The clockevents_program_event function updates the clockevent device's next_event after checking the device state, so always call clockevents_program_event() from tick_program_event. No. This is fundamentally wrong. If the clockevent is in oneshot stopped mode then you cannot call clockevents_program_event(). There is even a warning in that code which will trigger. Yes, there is a warning; I overlooked that part of the code and thought it would return from clockevents_program_event after the next_event update:

	dev->next_event = expires;

	if (clockevent_state_shutdown(dev))
		return 0;

Thanks tglx for reviewing the patch. How should next_event be cleared on a clockevent device in the ONESHOT_STOPPED state from tick_program_event()? Shall I update the next patch set with the following condition, or do you have any other suggestions to fix this path?

diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index 4237e07..21104b6 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -317,7 +317,8 @@ int clockevents_program_event(struct clock_event_device *dev, ktime_t expires,
 	dev->next_event = expires;

-	if (clockevent_state_shutdown(dev))
+	if (clockevent_state_shutdown(dev) ||
+	    clockevent_state_oneshot_stopped(dev))
 		return 0;

Thanks, tglx -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, Linux Foundation Collaborative Project
Re: clock event device’s next_event
On 2017-10-24 00:23, Thomas Gleixner wrote: On Mon, 23 Oct 2017, Sodagudi Prasad wrote: Hi Viresh and Thomas, In tick_nohz_stop_sched_tick(), when expires == KTIME_MAX we cancel the tick_sched_timer but do not update the clock event device's next_event to KTIME_MAX. Because of that, the broadcast device's next_event is not programmed properly, resulting in unnecessary wakeups for this cpu.

	/*
	 * If the expiration time == KTIME_MAX, then we simply stop
	 * the tick timer.
	 */
	if (unlikely(expires == KTIME_MAX)) {
		if (ts->nohz_mode == NOHZ_MODE_HIGHRES)
			hrtimer_cancel(&ts->sched_timer);
		goto out;
	}

Right, because this code does not have access to the broadcast device at all. It doesn't even know and care about it. Some of the drivers in the cpuidle framework use the tick_nohz_get_sleep_length API to find the maximum idle/sleep time from ts->sleep_length, and the estimated sleep length until the next timer is not updated at the end of tick_nohz_stop_sched_tick. After digging further, I see that the following call flow updates the tick_cpu_device state to shutdown, but the clock event device's next_event is not updated to KTIME_MAX: hrtimer_cancel -> __remove_hrtimer -> hrtimer_force_reprogram -> tick_program_event.

int tick_program_event(ktime_t expires, int force)
{
	struct clock_event_device *dev = __this_cpu_read(tick_cpu_device.evtdev);

	if (unlikely(expires == KTIME_MAX)) {
		/*
		 * We don't need the clock event device any more, stop it.
		 */
		clockevents_switch_state(dev, CLOCK_EVT_STATE_ONESHOT_STOPPED);
		return 0;
	}

In the tick_program_event() function above, the clock event device's next_event is not updated because clockevents_program_event() is not called after the state update. If the device is shutdown, then next_event does not matter. But yes, for consistency reasons we could set it to KTIME_MAX. Thanks tglx for your quick reply on this; I will send a patch for the same. Thanks, tglx -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, Linux Foundation Collaborative Project
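The consistency fix agreed on above can be modeled in a few lines of userspace C. The struct and function below are simplified mocks of the kernel's clock_event_device and tick_program_event(), not the actual patch: stopping the device also advances next_event to KTIME_MAX so readers such as the broadcast code never see a stale expiry.

```c
#include <assert.h>
#include <stdint.h>

#define KTIME_MAX INT64_MAX

/* Simplified mock: two of the clock_event_device fields under discussion. */
enum clock_state { STATE_ONESHOT, STATE_ONESHOT_STOPPED };

struct clock_event_device {
	enum clock_state state;
	int64_t next_event;
};

static int tick_program_event(struct clock_event_device *dev, int64_t expires)
{
	if (expires == KTIME_MAX) {
		/* Stop the device, and keep next_event consistent with
		 * the fact that nothing is pending (the missing update). */
		dev->state = STATE_ONESHOT_STOPPED;
		dev->next_event = KTIME_MAX;
		return 0;
	}
	dev->state = STATE_ONESHOT;
	dev->next_event = expires;
	return 0;
}
```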
clock event device’s next_event
Hi Viresh and Thomas, In tick_nohz_stop_sched_tick(), when expires == KTIME_MAX we cancel the tick_sched_timer but do not update the clock event device's next_event to KTIME_MAX. Because of that, the broadcast device's next_event is not programmed properly, resulting in unnecessary wakeups for this cpu.

	/*
	 * If the expiration time == KTIME_MAX, then we simply stop
	 * the tick timer.
	 */
	if (unlikely(expires == KTIME_MAX)) {
		if (ts->nohz_mode == NOHZ_MODE_HIGHRES)
			hrtimer_cancel(&ts->sched_timer);
		goto out;
	}

After digging further, I see that the following call flow updates the tick_cpu_device state to shutdown, but the clock event device's next_event is not updated to KTIME_MAX: hrtimer_cancel -> __remove_hrtimer -> hrtimer_force_reprogram -> tick_program_event.

int tick_program_event(ktime_t expires, int force)
{
	struct clock_event_device *dev = __this_cpu_read(tick_cpu_device.evtdev);

	if (unlikely(expires == KTIME_MAX)) {
		/*
		 * We don't need the clock event device any more, stop it.
		 */
		clockevents_switch_state(dev, CLOCK_EVT_STATE_ONESHOT_STOPPED);
		return 0;
	}

In the tick_program_event() function above, the clock event device's next_event is not updated because clockevents_program_event() is not called after the state update. -thanks, Prasad -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, Linux Foundation Collaborative Project
__ubsan_handle_type_mismatch converted to __ubsan_handle_type_mismatch_v1
Hi All, Based on the links below, __ubsan_handle_type_mismatch has been renamed to __ubsan_handle_type_mismatch_v1: https://github.com/llvm-mirror/compiler-rt/commit/56faee71af1888ba12ab076b3d1f9bbe223493df#diff-21369cc6f3917b27df3ced8de89cf134 https://www.mail-archive.com/gcc-bugs@gcc.gnu.org/msg535130.html When I tried to compile the kernel with LLVM, I saw the following errors:

arch/arm64/kernel/traps.o: In function `dump_backtrace':
kernel/arch/arm64/kernel/traps.c:192: undefined reference to `__ubsan_handle_type_mismatch_v1'
kernel/arch/arm64/kernel/traps.c:192: undefined reference to `__ubsan_handle_type_mismatch_v1'
kernel/arch/arm64/kernel/traps.c:192: undefined reference to `__ubsan_handle_type_mismatch_v1'
kernel/arch/arm64/kernel/traps.c:192: undefined reference to `__ubsan_handle_type_mismatch_v1'
arch/arm64/kernel/traps.o: In function `dump_mem':

Can I add a __ubsan_handle_type_mismatch_v1 API similar to __ubsan_handle_type_mismatch() (I will send a patch for review)? If both APIs are present, there will be backward compatibility. Let me know if you have any other suggestions for this issue. -Thanks, Prasad -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, Linux Foundation Collaborative Project
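A hedged sketch of the backward-compatible idea: keep the old entry point and add a _v1 one that converts the new ABI to the old. The struct layouts below are simplified for illustration (the real ones also carry source-location and type descriptors; check the compiler-rt commit linked above); the key difference in the v1 ABI is that alignment is encoded as a log2 value in a single byte.

```c
#include <assert.h>

/* Simplified mock of the old ABI's data record. */
struct type_mismatch_data {
	unsigned long alignment;
	unsigned char type_check_kind;
};

/* Simplified mock of the v1 ABI: alignment stored as log2 in one byte. */
struct type_mismatch_data_v1 {
	unsigned char log_alignment;
	unsigned char type_check_kind;
};

static unsigned long last_alignment;	/* captured by the old handler */

/* Stand-in for the existing __ubsan_handle_type_mismatch() report path. */
static void ubsan_type_mismatch(struct type_mismatch_data *d, unsigned long ptr)
{
	last_alignment = d->alignment;
	(void)ptr;
}

/* New _v1 entry point: decode the new layout, reuse the old report path. */
static void ubsan_type_mismatch_v1(struct type_mismatch_data_v1 *d,
				   unsigned long ptr)
{
	struct type_mismatch_data common = {
		.alignment = 1UL << d->log_alignment,
		.type_check_kind = d->type_check_kind,
	};

	ubsan_type_mismatch(&common, ptr);
}
```

Keeping both symbols means objects built by either compiler generation link cleanly, which is the backward compatibility argued for above.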
Re: [PATCH] arch_topology: Fix section mismatch warning due to free_raw_capacity()
On 2017-09-25 16:20, Viresh Kumar wrote: On 25-09-17, 15:51, Prasad Sodagudi wrote: Remove the __init annotation from free_raw_capacity() to avoid the following warning: the function init_cpu_capacity_callback() references the function __init free_raw_capacity(). WARNING: vmlinux.o(.text+0x425cc0): Section mismatch in reference from the function init_cpu_capacity_callback() to the function .init.text:free_raw_capacity(). Signed-off-by: Prasad Sodagudi

---
 drivers/base/arch_topology.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index 41be9ff..3da53cc 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -96,7 +96,7 @@ static int register_cpu_capacity_sysctl(void)
 static u32 capacity_scale;
 static u32 *raw_capacity;

-static int __init free_raw_capacity(void)
+static int free_raw_capacity(void)
 {
 	kfree(raw_capacity);
 	raw_capacity = NULL;

So we need the __init thing only when cpufreq support isn't there in the kernel, and I am not sure we want to add more ifdef hackery in the declaration of free_raw_capacity(). Acked-by: Viresh Kumar

Hi Greg, Can you please consider this patch? -Thanks, Prasad -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, Linux Foundation Collaborative Project
variable length array in structure
Hi Arnd Bergmann, The following commit has introduced a compilation issue. commit 2df2c3402fc81918a888e1ec711369f6014471f2 Author: Arnd Bergmann Date: Sat Aug 5 21:57:46 2017 -0400 ext4: fix warning about stack corruption Observed the following error - /fs/ext4/mballoc.c:2303:17: error: fields must have a constant size: 'variable length array in structure' extension will never be supported ext4_grpblk_t counters[blocksize_bits + 2]; -Thanks, Prasad -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, Linux Foundation Collaborative Project
Re: [PATCH] llist: clang: introduce member_address_is_nonnull()
Hi All, Observed a boot-up crash with clang in the kernel/sched/core.c sched_ttwu_pending() function, because it uses llist_for_each_entry_safe(). After pulling the patch from https://lkml.org/lkml/2017/7/19/1169, no crash was observed. Tested-by: Sodagudi Prasad <psoda...@codeaurora.org> -Thanks, Prasad -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, Linux Foundation Collaborative Project
Re: [patch resend 4.12] compiler, clang: always inline when CONFIG_OPTIMIZE_INLINING is disabled
On 2017-06-26 16:03, David Rientjes wrote: The motivation of commit abb2ea7dfd82 ("compiler, clang: suppress warning for unused static inline functions") is to suppress clang's warnings about unused static inline functions. For configs without CONFIG_OPTIMIZE_INLINING enabled, such as any non-x86 architecture, `inline' in the kernel implies that __attribute__((always_inline)) is used. Some code depends on that behavior, see https://lkml.org/lkml/2017/6/13/918: net/built-in.o: In function `__xchg_mb': arch/arm64/include/asm/cmpxchg.h:99: undefined reference to `__compiletime_assert_99' arch/arm64/include/asm/cmpxchg.h:99: undefined reference to `__compiletime_assert_99 The full fix would be to identify these breakages and annotate the functions with __always_inline instead of `inline'. But since we are late in the 4.12-rc cycle, simply carry forward the forced inlining behavior and work toward moving arm64, and other architectures, toward CONFIG_OPTIMIZE_INLINING behavior. Reported-by: Sodagudi Prasad <psoda...@codeaurora.org> Tested-by: Matthias Kaehlcke <m...@chromium.org> Signed-off-by: David Rientjes <rient...@google.com> --- Resend of http://marc.info/?l=linux-kernel=149681501816319 for 4.12 inclusion. Prasad, please add your Tested-by. I have compile tested this change with clang, please add - Tested-by: Sodagudi Prasad <psoda...@codeaurora.org> -Thanks, Prasad include/linux/compiler-clang.h | 8 include/linux/compiler-gcc.h | 18 +++--- 2 files changed, 11 insertions(+), 15 deletions(-) diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h --- a/include/linux/compiler-clang.h +++ b/include/linux/compiler-clang.h @@ -15,11 +15,3 @@ * with any version that can compile the kernel */ #define __UNIQUE_ID(prefix) __PASTE(__PASTE(__UNIQUE_ID_, prefix), __COUNTER__) - -/* - * GCC does not warn about unused static inline functions for - * -Wunused-function. This turns out to avoid the need for complex #ifdef - * directives. 
Suppress the warning in clang as well. - */ -#undef inline -#define inline inline __attribute__((unused)) notrace diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h --- a/include/linux/compiler-gcc.h +++ b/include/linux/compiler-gcc.h @@ -66,18 +66,22 @@ /* * Force always-inline if the user requests it so via the .config, - * or if gcc is too old: + * or if gcc is too old. + * GCC does not warn about unused static inline functions for + * -Wunused-function. This turns out to avoid the need for complex #ifdef + * directives. Suppress the warning in clang as well by using "unused" + * function attribute, which is redundant but not harmful for gcc. */ #if !defined(CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING) || \ !defined(CONFIG_OPTIMIZE_INLINING) || (__GNUC__ < 4) -#define inline inline __attribute__((always_inline)) notrace -#define __inline__ __inline__ __attribute__((always_inline)) notrace -#define __inline __inline __attribute__((always_inline)) notrace +#define inline inline __attribute__((always_inline,unused)) notrace +#define __inline__ __inline__ __attribute__((always_inline,unused)) notrace +#define __inline __inline __attribute__((always_inline,unused)) notrace #else /* A lot of inline functions can cause havoc with function tracing */ -#define inline inline notrace -#define __inline__ __inline__ notrace -#define __inline __inline notrace +#define inline inline __attribute__((unused)) notrace +#define __inline__ __inline__ __attribute__((unused)) notrace +#define __inline __inline __attribute__((unused)) notrace #endif #define __always_inline inline __attribute__((always_inline)) -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, Linux Foundation Collaborative Project
Re: [PATCH] compiler, clang: Add always_inline attribute to inline
On 2017-06-22 02:43, Mark Rutland wrote: On Tue, Jun 20, 2017 at 04:12:32PM -0700, David Rientjes wrote: On Tue, 20 Jun 2017, Mark Rutland wrote: > As with my reply to David, my preference would be that we: > > 1) Align compiler-clang.h with the compiler-gcc.h inlining behaviour, so >that things work by default. > > 2) Fix up the arm64 core code (and drivers for architected / common >peripherals) to use __always_inline where we always require inlining. > > 3) Have arm64 select CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING, and have >people test-build configurations with CONFIG_OPTIMIZE_INLINING, with >both GCC and clang. > > 4) Fix up drivers, etc, as appropriate. > > 5) Once that's largely stable, and if there's a benefit, have arm64 >select CONFIG_OPTIMIZE_INLINING by default. > > That should avoid undue breakage, while enabling this ASAP. > Sounds good, but I think we should simply deal with the __attribute__((unused)) needed for clang as part of compiler-gcc.h by simply adding it to the inline override there to avoid duplicated code. Agreed. That looks much better. Thanks, Mark. diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h --- a/include/linux/compiler-clang.h +++ b/include/linux/compiler-clang.h @@ -15,11 +15,3 @@ * with any version that can compile the kernel */ #define __UNIQUE_ID(prefix) __PASTE(__PASTE(__UNIQUE_ID_, prefix), __COUNTER__) - -/* - * GCC does not warn about unused static inline functions for - * -Wunused-function. This turns out to avoid the need for complex #ifdef - * directives. Suppress the warning in clang as well. - */ -#undef inline -#define inline inline __attribute__((unused)) notrace diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h index 0efef9cf014f..71fe0994cf1a 100644 --- a/include/linux/compiler-gcc.h +++ b/include/linux/compiler-gcc.h @@ -66,18 +66,22 @@ /* * Force always-inline if the user requests it so via the .config, - * or if gcc is too old: + * or if gcc is too old. 
+ * GCC does not warn about unused static inline functions for + * -Wunused-function. This turns out to avoid the need for complex #ifdef + * directives. Suppress the warning in clang as well by using "unused" + * function attribute, which is redundant but not harmful for gcc. */ #if !defined(CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING) || \ !defined(CONFIG_OPTIMIZE_INLINING) || (__GNUC__ < 4) -#define inline inline __attribute__((always_inline)) notrace -#define __inline__ __inline__ __attribute__((always_inline)) notrace -#define __inline __inline __attribute__((always_inline)) notrace +#define inline inline __attribute__((always_inline,unused)) notrace +#define __inline__ __inline__ __attribute__((always_inline,unused)) notrace +#define __inline __inline __attribute__((always_inline,unused)) notrace #else /* A lot of inline functions can cause havoc with function tracing */ -#define inline inline notrace -#define __inline__ __inline__ notrace -#define __inline __inline notrace +#define inline inline __attribute__((unused)) notrace +#define __inline__ __inline__ __attribute__((unused)) notrace +#define __inline __inline __attribute__((unused)) notrace #endif #define __always_inline inline __attribute__((always_inline)) Hi David, I just want to cross-check with you: do you want me to update this change in the next patch set, or are you going to add it? -Thanks, Prasad -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, Linux Foundation Collaborative Project
Re: [PATCH] compiler, clang: Add always_inline attribute to inline
On 2017-06-19 14:42, David Rientjes wrote: On Mon, 19 Jun 2017, Sodagudi Prasad wrote: > > Commit abb2ea7dfd82 ("compiler, clang: suppress warning for unused > > static inline functions") re-defining the 'inline' macro but > > __attribute__((always_inline)) is missing. Some compilers may > > not honor inline hint if always_iniline attribute not there. > > So add always_inline attribute to inline as done by > > compiler-gcc.h file. > > > > IIUC, __attribute__((always_inline)) was only needed for gcc versions < 4 > and that the inlining decision making is improved in >= 4. To make a > change like this, I would think that we would need to show that clang is > making suboptimal decisions. I don't think there's a downside to making > CONFIG_OPTIMIZE_INLINING specific only to gcc. > > If it is shown that __attribute__((always_inline)) is needed for clang as > well, this should be done as part of compiler-gcc.h to avoid duplicated > code. Hi David, Here is the discussion about this change - https://lkml.org/lkml/2017/6/15/396 Please check mark and will's comments. Yes, the arch/arm64/include/asm/cmpxchg.h instance appears to need __always_inline as several other functions need __always_inline in arch/arm64/include/*. It's worth making that change as you suggested in your original patch. The concern, however, is inlining all "inline" functions forcefully. The only reason this is done for gcc is because of suboptimal inlining decisions in gcc < 4. So the question is whether this is a single instance that can be fixed where clang un-inlining causes problems or whether that instance suggests all possible inline usage for clang absolutely requires __always_inline due to a suboptimal compiler implementation. I would suggest the former. Hi David, I am not 100% sure about the best approach for this problem. We may have to replace inline with always_inline for all inline functions where BUILD_BUG() used. 
So far, inline has effectively been always_inline for ARM64; if we do not keep the same settings, won't there be performance differences? Hi Will and Mark, please suggest the best solution to this problem. Currently only __xchg_mb has an issue, depending on the compiler's -inline-threshold configuration, but there are many other instances in arch/arm64/* where BUILD_BUG() is used in inline functions, which may fail later. -Thanks, Prasad -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, Linux Foundation Collaborative Project
Re: [PATCH] compiler, clang: Add always_inline attribute to inline
On 2017-06-19 13:25, David Rientjes wrote: On Mon, 19 Jun 2017, Prasad Sodagudi wrote: Commit abb2ea7dfd82 ("compiler, clang: suppress warning for unused static inline functions") redefined the 'inline' macro but __attribute__((always_inline)) is missing. Some compilers may not honor the inline hint if the always_inline attribute is not there. So add the always_inline attribute to inline as done by the compiler-gcc.h file. IIUC, __attribute__((always_inline)) was only needed for gcc versions < 4 and that the inlining decision making is improved in >= 4. To make a change like this, I would think that we would need to show that clang is making suboptimal decisions. I don't think there's a downside to making CONFIG_OPTIMIZE_INLINING specific only to gcc. If it is shown that __attribute__((always_inline)) is needed for clang as well, this should be done as part of compiler-gcc.h to avoid duplicated code. Hi David, Here is the discussion about this change - https://lkml.org/lkml/2017/6/15/396 Please check Mark's and Will's comments. -Thanks, Prasad Signed-off-by: Prasad Sodagudi --- include/linux/compiler-clang.h | 5 + 1 file changed, 5 insertions(+) diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h index d614c5e..deb65b3 100644 --- a/include/linux/compiler-clang.h +++ b/include/linux/compiler-clang.h @@ -22,4 +22,9 @@ * directives. Suppress the warning in clang as well. */ #undef inline +#if !defined(CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING) || \ +!defined(CONFIG_OPTIMIZE_INLINING) +#define inline inline __attribute__((always_inline)) __attribute__((unused)) notrace +#else #define inline inline __attribute__((unused)) notrace +#endif -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, Linux Foundation Collaborative Project
Re: Using __always_inline attribute
On 2017-06-14 15:33, Sodagudi Prasad wrote: On 2017-06-14 03:06, Will Deacon wrote: Hi Prasad, On Tue, Jun 13, 2017 at 03:39:37PM -0700, Sodagudi Prasad wrote: With a variant of CLANG (based on 4.0), the following errors were observed on the Linux 4.12-rc5 tag. How are you building the kernel with clang, btw? I'd be keen to try the same thing, but I thought the LLVMLinux project was largely dead at this point. Do you still need build system patches, or is this now integrated with Kbuild? Hi Will, Trying clang patches from the below tree - https://android.googlesource.com/kernel/common/+/experimental/android-4.4-llvm net/built-in.o: In function `__xchg_mb': arch/arm64/include/asm/cmpxchg.h:99: \ undefined reference to `__compiletime_assert_99' arch/arm64/include/asm/cmpxchg.h:99: \ undefined reference to `__compiletime_assert_99 Clang does not seem to be inlining this function, causing the above compilation issue due to BUILD_BUG(). The only BUILD_BUG I see around here is if the size parameter (which is calculated using sizeof) is not known to be 1,2,4 or 8 at compile time. It would be interesting to know which call site is confusing clang in this way and what code is actually being emitted here. If it's just that __xchg_mb isn't being inlined into the __xchg_wrapper macro, then your patch should be ok, but I'd like to be sure it's not something else. I'm also surprised you haven't seen this crop up in other places. After digging further, we observed that the inline definition was changed recently, causing this issue: the redefinition carries only __attribute__((unused)). Commit abb2ea7dfd82 ("compiler, clang: suppress warning for unused static inline functions") has redefined the inline macro as below #define inline inline __attribute__((unused)) But the actual definition of the inline macro in the compiler-gcc.h file is shown below. 
#define inline inline __attribute__((always_inline)) notrace As the always_inline attribute is missing in the inline definition, the compiler may not inline __xchg_mb in the arch/arm64/include/asm/cmpxchg.h file, leading to the error. Some compilers may not honor inline if the always_inline attribute is removed, because of -inline-threshold compiler options. Here is the change to fix this issue- diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h index d614c5e..a0e6433 100644 --- a/include/linux/compiler-clang.h +++ b/include/linux/compiler-clang.h @@ -22,4 +22,4 @@ * directives. Suppress the warning in clang as well. */ #undef inline -#define inline inline __attribute__((unused)) notrace +#define inline __attribute__((always_inline)) __attribute__((unused)) notrace -Thanks, Prasad Hi Will, Here is the proper patch - compiler, clang: Add always_inline attribute to inline Commit abb2ea7dfd82 ("compiler, clang: suppress warning for unused static inline functions") redefined the 'inline' macro but __attribute__((always_inline)) is missing. Some compilers may not honor the inline hint if the always_inline attribute is not there. So add the always_inline attribute to inline as done by the compiler-gcc.h file. diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h index d614c5e..400b0cd 100644 --- a/include/linux/compiler-clang.h +++ b/include/linux/compiler-clang.h @@ -22,4 +22,4 @@ * directives. Suppress the warning in clang as well. */ #undef inline -#define inline inline __attribute__((unused)) notrace +#define inline inline __attribute__((always_inline)) __attribute__((unused)) notrace -Thanks, Prasad Will -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, Linux Foundation Collaborative Project
Using __always_inline attribute
Hi All,

With a variant of clang (based on 4.0), the following errors were observed on the Linux 4.12-rc5 tag.

net/built-in.o: In function `__xchg_mb':
arch/arm64/include/asm/cmpxchg.h:99: undefined reference to `__compiletime_assert_99'
arch/arm64/include/asm/cmpxchg.h:99: undefined reference to `__compiletime_assert_99'

Clang does not seem to be inlining this function, which causes the compilation issue above via BUILD_BUG(). We added the __always_inline attribute as shown below, so that clang is forced to always inline it. Based on the definition of __xchg##sfx, it should always be inlined. Can we force it to be __always_inline?

diff --git a/arch/arm64/include/asm/cmpxchg.h b/arch/arm64/include/asm/cmpxchg.h
index ae852ad..ce57cec 100644
--- a/arch/arm64/include/asm/cmpxchg.h
+++ b/arch/arm64/include/asm/cmpxchg.h
@@ -73,7 +73,7 @@
 #undef __XCHG_CASE

 #define __XCHG_GEN(sfx) \
-static inline unsigned long __xchg##sfx(unsigned long x, \
+static __always_inline unsigned long __xchg##sfx(unsigned long x, \
 volatile void *ptr, \
 int size) \
 { \

-Thanks, Prasad

--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project
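For reference, the token-pasting pattern used by __XCHG_GEN can be sketched in plain user-space C. The names below (DEFINE_ADD_GEN, add_u32, add_u64) are illustrative, not the kernel's; the point is that each generated function carries the always_inline attribute at its definition, so every suffix variant is forced inline regardless of how the compiler treats the bare inline hint.

```c
#include <assert.h>

/* Generate a family of size-suffixed helpers, each forced inline, in the
 * same style as __XCHG_GEN(sfx) in arch/arm64/include/asm/cmpxchg.h. */
#define DEFINE_ADD_GEN(sfx, type)                                   \
static __attribute__((always_inline)) inline type add_##sfx(type a, \
                                                            type b) \
{                                                                   \
    return a + b;                                                   \
}

/* One expansion per suffix, mirroring the kernel's per-size variants. */
DEFINE_ADD_GEN(u32, unsigned int)
DEFINE_ADD_GEN(u64, unsigned long long)
```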
deadlock due to circular dependency between threads
Hi all,

I am working on a platform which uses Linux version 4.4. I have observed a deadlock between a couple of threads and am looking for suggestions/comments. Here is my understanding from the call stacks of these blocked tasks.

0) CPU3 is getting hot-plugged from a kthread which is running on core 5.
1) The CPU hotplug flow needs to flush the work items of the CPU being hot-plugged (CPU3), with a high-priority worker from that CPU's (CPU3's) worker pool.
2) There are no high-priority workers in the CPU3 worker pool, so create_worker was initiated to create a high-priority kernel thread/worker.
3) This thread creation should be done by the kthreadd daemon, but kthreadd has got stuck in some other thread creation. At this point kthreadd is creating a thread, updating cgroup settings, and waiting on the rw semaphore of the cgroup subsystem.
4) The cgroup read-write semaphore is taken by the "init" thread, which is waiting on the cpuset mutex. The init task is updating cgroups based on a userspace request.
5) The cpuset mutex is taken by "kworker/5:1", which is waiting for the cpu hotplug lock. The cpu hotplug mutex is taken by the "ABC_XYZ" hotplugging thread. DEADLOCK!
Circular dependency between threads: "kthread_XYZ" ==> "kthreadd" ==> "init" ==> "kworker/5:1" ==> "kthread_XYZ"

PID: 910  TASK: ffc0ee8dd780  CPU: 5  COMMAND: "ABC_XYZ"
 #0 [ffc0ee9cb900] __switch_to at ff800808553c
 #1 [ffc0ee9cb930] __schedule at ff8008d76aa0
 #2 [ffc0ee9cb990] schedule at ff8008d76e04
 #3 [ffc0ee9cb9b0] schedule_timeout at ff8008d7953c
 #4 [ffc0ee9cba60] wait_for_common at ff8008d77888
 #5 [ffc0ee9cbaf0] wait_for_completion at ff8008d778dc
 #6 [ffc0ee9cbb00] flush_work at ff80080b3850
 #7 [ffc0ee9cbb80] workqueue_cpu_down_callback at ff80080b5360
 #8 [ffc0ee9cbbc0] notifier_call_chain at ff80080b9c4c
 #9 [ffc0ee9cbc00] __raw_notifier_call_chain at ff80080b9cb8
#10 [ffc0ee9cbc10] __cpu_notify at ff800809eb50
#11 [ffc0ee9cbc20] _cpu_down at ff800809ee84
#12 [ffc0ee9cbca0] cpu_down at ff800809f124
#13 [ffc0ee9cbcd0] cpu_subsys_offline at ff800856b768
#14 [ffc0ee9cbce0] device_offline at ff8008567040
#15 [ffc0ee9cbd10] update_offline_cores at ff8008d74b54
#16 [ffc0ee9cbda0] do_hotplug at ff8008d75358
#17 [ffc0ee9cbe20] kthread at ff80080b8e3c

PID: 2  TASK: ffc0f9660c80  CPU: 4  COMMAND: "kthreadd"
 #0 [ffc0f9683bf0] __switch_to at ff800808553c
 #1 [ffc0f9683c20] __schedule at ff8008d76aa0
 #2 [ffc0f9683c80] schedule at ff8008d76e04
 #3 [ffc0f9683ca0] rwsem_down_read_failed at ff8008d79144
 #4 [ffc0f9683cf0] __percpu_down_read at ff80080edc4c
 #5 [ffc0f9683d10] copy_process at ff800809cecc
 #6 [ffc0f9683df0] _do_fork at ff800809d5a0
 #7 [ffc0f9683e50] kernel_thread at ff800809d89c
 #8 [ffc0f9683e60] kthreadd at ff80080b9714

PID: 898  TASK: ffc0ee91  CPU: 0  COMMAND: "init"
 #0 [ffc06fd93980] __switch_to at ff800808553c
 #1 [ffc06fd939b0] __schedule at ff8008d76aa0
 #2 [ffc06fd93a10] schedule at ff8008d76e04
 #3 [ffc06fd93a30] schedule_preempt_disabled at ff8008d7714c
 #4 [ffc06fd93a50] __mutex_lock_slowpath at ff8008d78684
 #5 [ffc06fd93ab0] mutex_lock at ff8008d78714
 #6 [ffc06fd93ad0] cpuset_can_attach at ff800812d490
 #7 [ffc06fd93b20] cgroup_taskset_migrate at ff8008129194
 #8 [ffc06fd93b70] cgroup_migrate at ff8008129454
 #9 [ffc06fd93bf0] cgroup_attach_task at ff800812950c
#10 [ffc06fd93c50] __cgroup_procs_write at ff8008129884
#11 [ffc06fd93d10] cgroup_tasks_write at ff800812993c
#12 [ffc06fd93d20] cgroup_file_write at ff8008125078
#13 [ffc06fd93d70] kernfs_fop_write at ff800820bef4
#14 [ffc06fd93db0] __vfs_write at ff80081ac6f4
#15 [ffc06fd93e30] vfs_write at ff80081acf28
#16 [ffc06fd93e70] sys_write at ff80081ad6d8
#17 [ffc06fd93ed0] el0_svc_naked at ff800808462

PID: 66  TASK: ffc020dc7080  CPU: 5  COMMAND: "kworker/5:1"
 #0 [ffc0f7ff3a90] __switch_to at ff800808553c
 #1 [ffc0f7ff3ac0] __schedule at ff8008d76aa0
 #2 [ffc0f7ff3b20] schedule at ff8008d76e04
 #3 [ffc0f7ff3b40] schedule_preempt_disabled at ff8008d7714c
 #4 [ffc0f7ff3b60] __mutex_lock_slowpath at ff8008d78684
 #5 [ffc0f7ff3bc0] mutex_lock at ff8008d78714
 #6 [ffc0f7ff3be0] get_online_cpus at ff800809e9bc
 #7 [ffc0f7ff3c00] rebuild_sched_domains_locked at ff800812c960
 #8 [ffc0f7ff3cb0] rebuild_sched_domains at ff800812e7bc
 #9 [ffc0f7ff3cd0] cpuset_hotplug_workfn at ff800812eca8
#10 [ffc0f7ff3d70] process_one_work at ff80080b3cec
#11 [ffc0f7ff3dc0] worker_thread at ff80080b4700
#12 [ffc0f7ff3e20] kthread at ff80080b8e3c

I think we can avoid this DEADLOCK
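The circular dependency described above is a cycle in the tasks' wait-for graph. As a sketch (illustrative user-space code, not kernel code; the task names and the deadlocked() helper are made up for this example), the four blocked tasks can be modeled as nodes that each wait on one other task, and the deadlock detected as a cycle:

```c
#include <assert.h>

/* Each task waits on at most one other task; -1 means not blocked.
 * A cycle in this wait-for graph is a deadlock. */
enum { KTHREAD_XYZ, KTHREADD, INIT, KWORKER_5_1, NTASKS };

/* Classic tortoise-and-hare walk along the wait-for chain: if the slow
 * and fast cursors ever meet, the chain loops back on itself. */
static int deadlocked(const int waits_on[], int start)
{
    int slow = start, fast = start;

    for (;;) {
        if (fast < 0 || waits_on[fast] < 0)
            return 0;                     /* chain ends: no cycle */
        slow = waits_on[slow];            /* one hop */
        fast = waits_on[waits_on[fast]];  /* two hops */
        if (slow == fast)
            return 1;                     /* chain loops: deadlock */
    }
}
```

With the dependencies from the report (kthread_XYZ waits on kthreadd, kthreadd on init, init on kworker/5:1, kworker/5:1 on kthread_XYZ), the walk comes back to its starting chain and reports a deadlock; breaking any single edge makes the chain terminate.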
gic_write_grpen1 in hotplug flow
Hi All,

This is regarding the usage of the gic_write_grpen1 API in the irq-gic-v3 driver. Here is my understanding of ICC_IGRPEN1_EL1: it is banked between the secure and non-secure states. If two security states are implemented, the secure Group enable bit is set by the platform firmware (PSCI), and the kernel needs to set it in the non-secure state.

1) Currently gic_write_grpen1(0) is called from gic_cpu_pm_notifier() for CPU_PM_ENTER only in the single-security-state case, but the enabling of Group 1 non-secure interrupts in the CPU_PM_EXIT path is done unconditionally. Why are we not disabling Group 1 non-secure interrupts unconditionally in CPU_PM_ENTER (instead of disabling them only in the single-security-state case)?

2) Why are Group 1 non-secure interrupts not disabled by the kernel during the cpu hotplug path? A spurious interrupt can still arrive if we don't disable Group 1 non-secure interrupts, right?

--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project
Re: Schedule affinity_notify work while migrating IRQs during hot plug
On 2017-03-13 13:19, Thomas Gleixner wrote:
On Mon, 13 Mar 2017, Sodagudi Prasad wrote:
On 2017-02-27 09:21, Thomas Gleixner wrote:
> On Mon, 27 Feb 2017, Sodagudi Prasad wrote:
> > So I am thinking that adding the following schedule_work() would notify clients.
>
> And break the world and some more.
>
> > diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
> > index 6b66959..5e4766b 100644
> > --- a/kernel/irq/manage.c
> > +++ b/kernel/irq/manage.c
> > @@ -207,6 +207,7 @@ int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
> >  case IRQ_SET_MASK_OK_DONE:
> >  cpumask_copy(desc->irq_common_data.affinity, mask);
> >  case IRQ_SET_MASK_OK_NOCOPY:
> > + schedule_work(&desc->affinity_notify->work);
> >  irq_set_thread_affinity(desc);
> >  ret = 0;
>
> You cannot do that unconditionally and just slap that schedule_work() call into the code. Aside of that, schedule_work() would be invoked twice for all calls which come via irq_set_affinity_locked().

Hi Tglx,

Yes, I agree with you; schedule_work() gets invoked twice with the previous change. How about calling irq_set_affinity_locked() instead of irq_do_set_affinity()?

Is this a quiz? Can you actually see the difference between these functions? There is a damned good reason WHY this calls irq_do_set_affinity().

Another option is adding an argument to irq_do_set_affinity() and queueing the notify work when that new parameter is set. I have attached a patch for the same. I tested this change on an arm64 platform and observed that client drivers are getting notified during cpu hotplug.

Thanks,
tglx

--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project

From 54b8d5164126fbdf14d1a9586342b972a6eb5537 Mon Sep 17 00:00:00 2001
From: Prasad Sodagudi <psoda...@codeaurora.org>
Date: Thu, 16 Mar 2017 23:44:44 -0700
Subject: [PATCH] genirq: Notify clients whenever there is a change in affinity

During cpu hotplug, irqs are getting migrated from the hotplugged core, but client drivers are not getting notified. So add a parameter to irq_do_set_affinity(), to check and notify client drivers during cpu hotplug.

Signed-off-by: Prasad Sodagudi <psoda...@codeaurora.org>
---
 kernel/irq/cpuhotplug.c | 2 +-
 kernel/irq/internals.h  | 2 +-
 kernel/irq/manage.c     | 9 ++---
 3 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/kernel/irq/cpuhotplug.c b/kernel/irq/cpuhotplug.c
index 011f8c4..e293d9b 100644
--- a/kernel/irq/cpuhotplug.c
+++ b/kernel/irq/cpuhotplug.c
@@ -38,7 +38,7 @@ static bool migrate_one_irq(struct irq_desc *desc)
 if (!c->irq_set_affinity) {
 pr_debug("IRQ%u: unable to set affinity\n", d->irq);
 } else {
- int r = irq_do_set_affinity(d, affinity, false);
+ int r = irq_do_set_affinity(d, affinity, false, true);
 if (r)
 pr_warn_ratelimited("IRQ%u: set affinity failed(%d).\n", d->irq, r);

diff --git a/kernel/irq/internals.h b/kernel/irq/internals.h
index bc226e7..6abde48 100644
--- a/kernel/irq/internals.h
+++ b/kernel/irq/internals.h
@@ -114,7 +114,7 @@ static inline void unregister_handler_proc(unsigned int irq,
 extern void irq_set_thread_affinity(struct irq_desc *desc);
 extern int irq_do_set_affinity(struct irq_data *data,
- const struct cpumask *dest, bool force);
+ const struct cpumask *dest, bool force, bool notify);
 /* Inline functions for support of irq chips on slow busses */
 static inline void chip_bus_lock(struct irq_desc *desc)

diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index a4afe5c..aef8a96 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -197,7 +197,7 @@ static inline bool irq_move_pending(struct irq_data *data)
 #endif
 int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
- bool force)
+ bool force, bool notify)
 {
 struct irq_desc *desc = irq_data_to_desc(data);
 struct irq_chip *chip = irq_data_get_irq_chip(data);
@@ -209,6 +209,9 @@ int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
 case IRQ_SET_MASK_OK_DONE:
 cpumask_copy(desc->irq_common_data.affinity, mask);
 case IRQ_SET_MASK_OK_NOCOPY:
+ if (notify)
+ schedule_work(&desc->affinity_notify->work);
+
 irq_set_thread_affinity(desc);
 ret = 0;
 }
@@ -227,7 +230,7 @@ int irq_set_affinity_locked(struct irq_data *data, const struct cpumask *mask,
 return -EINVAL;
 if (irq_can_move_pcntxt(data)) {
- ret = irq_do_set_affinity(data, mask, force);
+ ret = irq_do_set_affinity(data, mask, force, false);
 } else {
 irqd_set_move_pending(data);
 irq_copy_pending(desc, mask);
@@ -375,7 +378,7 @@ static int setup_affinity(struct irq_desc *desc, struct cpumask *mask)
 if (cpumask_intersects(mask, nodemask))
 cpumask_and(mask
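The shape of the attached patch, a low-level setter that takes an extra notify flag so only the hotplug path triggers the notifier while irq_set_affinity_locked() passes false, can be sketched in user-space C. All names below are simplified stand-ins for the kernel's types, not the real genirq API:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model of an irq descriptor with an optional affinity
 * notifier callback (stand-in for the affinity_notify work item). */
struct irq_desc {
    unsigned int affinity;             /* stand-in for the cpumask */
    void (*notify)(struct irq_desc *); /* called when notify is requested */
    int notified;                      /* how many notifications fired */
};

/* Mirrors the patched irq_do_set_affinity(): apply the new mask, and
 * fire the notifier only when the caller asked for it, so paths that
 * already notify on their own (irq_set_affinity_locked) pass 0. */
static void do_set_affinity(struct irq_desc *desc, unsigned int mask,
                            int notify)
{
    desc->affinity = mask;
    if (notify && desc->notify)
        desc->notify(desc);  /* kernel: schedule_work(&...->work) */
}

static void count_notify(struct irq_desc *desc)
{
    desc->notified++;
}
```

The design point the flag captures is avoiding double notification: the hotplug migration path calls the setter directly and needs the callback, while the normal affinity path already schedules the notifier itself.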
Re: Schedule affinity_notify work while migrating IRQs during hot plug
On 2017-02-27 09:21, Thomas Gleixner wrote:
On Mon, 27 Feb 2017, Sodagudi Prasad wrote:
So I am thinking that adding the following schedule_work() would notify clients.

And break the world and some more.

diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 6b66959..5e4766b 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -207,6 +207,7 @@ int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
 case IRQ_SET_MASK_OK_DONE:
 cpumask_copy(desc->irq_common_data.affinity, mask);
 case IRQ_SET_MASK_OK_NOCOPY:
+ schedule_work(&desc->affinity_notify->work);
 irq_set_thread_affinity(desc);
 ret = 0;

You cannot do that unconditionally and just slap that schedule_work() call into the code. Aside of that, schedule_work() would be invoked twice for all calls which come via irq_set_affinity_locked().

Hi Tglx,

Yes, I agree with you; schedule_work() gets invoked twice with the previous change. How about calling irq_set_affinity_locked() instead of irq_do_set_affinity()?

diff --git a/kernel/irq/cpuhotplug.c b/kernel/irq/cpuhotplug.c
index 011f8c4..e8ce0db 100644
--- a/kernel/irq/cpuhotplug.c
+++ b/kernel/irq/cpuhotplug.c
@@ -18,8 +18,8 @@ static bool migrate_one_irq(struct irq_desc *desc)
 {
 struct irq_data *d = irq_desc_get_irq_data(desc);
 const struct cpumask *affinity = d->common->affinity;
- struct irq_chip *c;
 bool ret = false;
+ int r;
 /*
 * If this is a per-CPU interrupt, or the affinity does not
@@ -34,15 +34,10 @@ static bool migrate_one_irq(struct irq_desc *desc)
 ret = true;
 }
- c = irq_data_get_irq_chip(d);
- if (!c->irq_set_affinity) {
- pr_debug("IRQ%u: unable to set affinity\n", d->irq);
- } else {
- int r = irq_do_set_affinity(d, affinity, false);
- if (r)
- pr_warn_ratelimited("IRQ%u: set affinity failed(%d).\n",
+ r = irq_set_affinity_locked(d, affinity, false);
+ if (r)
+ pr_warn_ratelimited("IRQ%u: set affinity failed(%d).\n",
 d->irq, r);
- }
 return ret;

-Thanks, Prasad

Thanks,
tglx

--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project
opaque types instead of union epoll_data
Hi All,

The uapi struct epoll_event is more opaque than user space expects: the
kernel defines the data member as a plain __u64 instead of union
epoll_data. Because of this, libc has redefined struct epoll_event with
a union data member:

https://android.googlesource.com/platform/bionic.git/+/master/libc/include/sys/epoll.h

typedef union epoll_data {
	void* ptr;
	int fd;
	uint32_t u32;
	uint64_t u64;
} epoll_data_t;

struct epoll_event {
	uint32_t events;
	epoll_data_t data;
};

The kernel UAPI header include/uapi/linux/eventpoll.h defines it as:

struct epoll_event {
	__u32 events;
	__u64 data;	/* opaque type instead of union epoll_data */
} EPOLL_PACKED;

Because of this we are running into issues when we copy the kernel
headers. Is this going to be addressed?

-- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, Linux Foundation Collaborative Project
Re: Schedule affinity_notify work while migrating IRQs during hot plug
On 2017-02-21 12:59, Sodagudi Prasad wrote:
> Hi Thomas,
>
> Currently irq_set_affinity() is called to migrate irqs from
> migrate_one_irq() during cpu hot plug, and clients which are interested
> in the irq affinity change are not getting notified:
>
> take_cpu_down() --> __cpu_disable() --> irq_migrate_all_off_this_cpu();
>
> irq_set_affinity() is changing the IRQ affinity at chip level but it is
> not notifying the affinity_notify work.

Hi Thomas and All,

I could see that in the 3.10 kernel the irq affinity notifiers are
getting called when a core gets hot plugged. But in later kernel
versions the API used to change affinity was changed from
irq_set_affinity_locked() to irq_do_set_affinity(), so the irq notifiers
are not getting called when a core is hot plugged.

https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/kernel/irq/manage.c?id=refs/tags/v3.10.105#n201

In the latest kernel the following path is executed to migrate IRQs:

irq_migrate_all_off_this_cpu() --> migrate_one_irq() --> irq_do_set_affinity()

As I mentioned above, irq_set_affinity_locked() notifies all client
drivers which are registered for IRQ affinity changes, but the
irq_do_set_affinity() API just changes the affinity at irq chip level
and does not notify the client drivers. I am not sure whether this is
just a miss during the IRQ framework refactor or was done intentionally.
Can you please check whether the following code change makes sense?
I am thinking that adding the following schedule_work() would notify
clients.
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 6b66959..5e4766b 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -207,6 +207,7 @@ int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
 	case IRQ_SET_MASK_OK_DONE:
 		cpumask_copy(desc->irq_common_data.affinity, mask);
 	case IRQ_SET_MASK_OK_NOCOPY:
+		schedule_work(&desc->affinity_notify->work);
 		irq_set_thread_affinity(desc);
 		ret = 0;
 	}

With this change, the IRQ affinity notifications get executed and client
drivers are notified about irq affinity changes.

-Thanks, Prasad

-- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, Linux Foundation Collaborative Project
Re: Free after use in fw_pm_notify()->kill_requests_without_uevent() due pending_fw_head
On 2017-01-03 07:19, Greg KH wrote:
> On Tue, Jan 03, 2017 at 06:44:03AM -0800, Sodagudi Prasad wrote:
>> Hi All,
>>
>> The device has crashed due to a memory access after free while the
>> pending_fw_head list was accessed. A 4.4 stable kernel was used to
>> reproduce this use after free.
>>
>> [ 9031.178428] Unable to handle kernel paging request at virtual address 6b6b6b6b6b6b6b6b
>> [ 9031.178508] pgd = ffffffc0de9d2000
>> [ 9031.185888] [6b6b6b6b6b6b6b6b] *pgd=, *pud=
>> [ 9031.253045] ------------[ cut here ]------------
>> [ 9031.253100] Kernel BUG at ffffff800864c0a0 [verbose debug info unavailable]
>> [ 9031.256860] Internal error: Oops - BUG: 9604 [#1] PREEMPT SMP
>> [ 9031.263539] Modules linked in:
>> [ 9031.272708] CPU: 6 PID: 1373 Comm: system_server Tainted: G WL 4.4.16+ #1
>> [ 9031.280648] task: ffffffc0d1a1d700 ti: ffffffc0d1a2c000 task.ti: ffffffc0d1a2c000
>> [ 9031.287776] PC is at fw_pm_notify+0x84/0x19c
>> [ 9031.295215] LR is at fw_pm_notify+0x60/0x19c
>> [ 9031.511559] [] fw_pm_notify+0x84/0x19c
>> [ 9031.519355] [] notifier_call_chain+0x58/0x8c
>> [ 9031.524739] [] __blocking_notifier_call_chain+0x54/0x70
>> [ 9031.530387] [] blocking_notifier_call_chain+0x38/0x44
>> [ 9031.537243] [] pm_notifier_call_chain+0x28/0x48
>> [ 9031.543662] [] pm_suspend+0x278/0x674
>> [ 9031.549906] [] state_store+0x58/0x90
>> [ 9031.554942] [] kobj_attr_store+0x18/0x28
>> [ 9031.560154] [] sysfs_kf_write+0x5c/0x68
>> [ 9031.565620] [] kernfs_fop_write+0x114/0x16c
>> [ 9031.571092] [] __vfs_write+0x48/0xf0
>> [ 9031.576816] [] vfs_write+0xb8/0x150
>> [ 9031.581848] [] SyS_write+0x58/0x94
>> [ 9031.586973] [] el0_svc_naked+0x24/0x28
>>
>> The kernel panic is observed during the device suspend/resume path, in
>> kill_requests_without_uevent() called from fw_pm_notify(). When the
>> pending_list of a firmware_buf is accessed, the 0x6b (slab free
>> poison) pattern is observed. Based on this, the firmware_buf was freed
>> even though it was still on the pending_fw_head list.
>
> What are you doing in userspace to trigger this problem? What kernel
> driver is this happening with?

Continuous device suspend and resume is happening here; I think
"echo mem > /sys/power/state" is being issued. It is not clear which
driver is involved, because after the firmware_buf is freed all of its
memory gets filled with the 0x6b pattern.

> And 4.4.16 is pretty old, can you try 4.9?
>
> thanks,
>
> greg k-h

We don't have a system which runs on newer kernels. I am looking for
possible reasons/paths by which a firmware_buf can get freed while it is
still on the pending_fw_head list.

-- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, Linux Foundation Collaborative Project
Schedule affinity_notify work while migrating IRQs during hot plug
Hi Thomas,

Currently irq_set_affinity() is called to migrate irqs from
migrate_one_irq() during cpu hot plug, and clients which are interested
in the irq affinity change are not getting notified:

take_cpu_down() --> __cpu_disable() --> irq_migrate_all_off_this_cpu();

irq_set_affinity() changes the IRQ affinity at chip level but it does
not notify the affinity_notify work. How about the change below, so that
client drivers get notified about irq affinity changes?

--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -207,6 +207,7 @@ int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
 	case IRQ_SET_MASK_OK_DONE:
 		cpumask_copy(desc->irq_common_data.affinity, mask);
 	case IRQ_SET_MASK_OK_NOCOPY:
+		schedule_work(&desc->affinity_notify->work);
 		irq_set_thread_affinity(desc);
 		ret = 0;

With this change, the IRQ affinity notifications get executed and client
drivers are notified.

-- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, Linux Foundation Collaborative Project