Re: [PATCH v2 2/3] arm64: dts: renesas: draak: Describe CVBS input
Hi Jacopo,

I love your patch! Yet something to improve:

[auto build test ERROR on linuxtv-media/master]
[cannot apply to renesas/next v4.17-rc5]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Jacopo-Mondi/arm64-dts-Draak-Enable-video-inputs-and-VIN4/20180517-102013
base:   git://linuxtv.org/media_tree.git master
config: arm64-defconfig (attached as .config)
compiler: aarch64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=arm64

All errors (new ones prefixed by >>):

>> Error: arch/arm64/boot/dts/renesas/r8a77995-draak.dts:266.1-6 Label or path vin4 not found
   FATAL ERROR: Syntax error parsing input tree

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

.config.gz
Description: application/gzip
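The dtc error above means the board file extends a node by a label that the base tree does not define. A hedged sketch of the construct involved (the label name comes from the error message; the node name and address are illustrative assumptions, not taken from the patch):

```dts
/* r8a77995-draak.dts extends the VIN4 node by label.  This only compiles
 * when the included SoC .dtsi already defines the label, e.g.:
 *
 *   vin4: video@e6ef4000 { ... };    <-- must exist in the base tree
 */
&vin4 {
	status = "okay";
};
```

Against linuxtv media_tree master the `vin4` label does not exist yet, hence "Label or path vin4 not found"; the series needs a base tree that already adds the VIN4 node.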
Re: [PATCH] cpuidle/powernv : init all present cpus for deep states
Yes, this needs to be sent to stable.

Fixes: d405a98c ("powerpc/powernv: Move cpuidle related code from setup.c to new file")
Re: [PATCH 1/2] x86/boot/KASLR: Add two functions for 1GB huge pages handling
On Thu, May 17, 2018 at 12:03:43PM +0800, Baoquan He wrote:
> Hi Chao,
>
> On 05/17/18 at 11:27am, Chao Fan wrote:
> > > +/* Store the number of 1GB huge pages which user specified.*/
> > > +static unsigned long max_gb_huge_pages;
> > > +
> > > +static int parse_gb_huge_pages(char *param, char* val)
> > > +{
> > > +	char *p;
> > > +	u64 mem_size;
> > > +	static bool gbpage_sz = false;
> > > +
> > > +	if (!strcmp(param, "hugepagesz")) {
> > > +		p = val;
> > > +		mem_size = memparse(p, );
> > > +		if (mem_size == PUD_SIZE) {
> > > +			if (gbpage_sz)
> > > +				warn("Repeadly set hugeTLB page size of 1G!\n");
> > > +			gbpage_sz = true;
> > > +		} else
> > > +			gbpage_sz = false;
> > > +	} else if (!strcmp(param, "hugepages") && gbpage_sz) {
> > > +		p = val;
> > > +		max_gb_huge_pages = simple_strtoull(p, , 0);
> > > +		debug_putaddr(max_gb_huge_pages);
> > > +	}
> > > +}
> > > +
> > > +
> > >  static int handle_mem_memmap(void)
> > >  {
> > >  	char *args = (char *)get_cmd_line_ptr();
> > > @@ -466,6 +492,51 @@ static void store_slot_info(struct mem_vector *region, unsigned long image_size)
> > >  	}
> > >  }
> > >
> > > +/* Skip as many 1GB huge pages as possible in the passed region. */
> > > +static void process_gb_huge_page(struct mem_vector *region, unsigned long image_size)
> > > +{
> > > +	int i = 0;
> > > +	unsigned long addr, size;
> > > +	struct mem_vector tmp;
> > > +
> > > +	if (!max_gb_huge_pages) {
> > > +		store_slot_info(region, image_size);
> > > +		return;
> > > +	}
> > > +
> > > +	addr = ALIGN(region->start, PUD_SIZE);
> > > +	/* If Did we raise the address above the passed in memory entry? */
> > > +	if (addr < region->start + region->size)
> > > +		size = region->size - (addr - region->start);
> > > +
> > > +	/* Check how many 1GB huge pages can be filtered out*/
> > > +	while (size > PUD_SIZE && max_gb_huge_pages) {
> > > +		size -= PUD_SIZE;
> > > +		max_gb_huge_pages--;
> >
> > The global variable 'max_gb_huge_pages' means how many huge pages
> > the user specified when you get it from the command line.
> > But here, every time we find a position which is good for huge page
> > allocation, 'max_gb_huge_pages' is decreased. So in my understanding,
> > it is used to store how many huge pages we still need to search memory
> > for good slots to filter out, right?
> > If that's right, maybe the name 'max_gb_huge_pages' is not very suitable.
> > If my understanding is wrong, please tell me.
>
> No, you have understood it very right. I finished the draft patch last
> week, but changed this variable name and the function names several
> times, and still I feel they are not good. However I can't come up with
> a better name.
>
> Yes, 'max_gb_huge_pages' stores how many 1GB huge pages are expected
> from the kernel command-line. And in this function it will be decreased.
> But we can't define another global variable only for decreasing in this
> place.
>
> And you can see that in this patchset I only take care of 1GB huge
> pages. While on x86 we have two kinds of huge pages, 2MB and 1GB, why
> 1GB only? Because 2MB is not impacted by KASLR; please check the code in
> hugetlb_nrpages_setup() of mm/hugetlb.c. Only 1GB huge pages need to be
> pre-allocated in hugetlb_nrpages_setup(), and if you look into
> hugetlb_nrpages_setup(), you will find that it calls
> alloc_bootmem_huge_page() to allocate huge pages one by one, not all at
> one time. That is why I always add 'gb' in the middle of the global
> variable and the newly added functions.
>
> And this answers your question below: when we have walked over all
> memory regions and 'max_gb_huge_pages' is still not 0, what should we
> do? It's normal and works as expected. Here hugetlb only tries its best
> to allocate as many pages as possible according to 'max_gb_huge_pages'.
> If that can't be fully satisfied, it's fine.
> E.g. on a bare-metal machine with 16GB RAM, you add the following to
> the command-line:
>
> default_hugepagesz=1G hugepagesz=1G hugepages=20
>
> Then it will get 14 good 1GB huge pages with kaslr disabled, since
> [0,1G) and [3G,4G) are touched by BIOS reservation and pci/firmware
> reservation. These 14 huge pages are the maximal value which is
> expected; it's not a bug in huge page handling. But with kaslr enabled,
> it sometimes only gets 13 1GB huge pages, because KASLR put the kernel
> into one of those good 1GB huge pages. This is a bug.

Thanks for your explanation, I got it.

> I am not very familiar with huge page handling, I just read the code
> recently because of this kaslr bug. I hope Luiz and people from his team
> can help correct and clarify if anything is not right. Especially the
> function names: I feel they are not good, and if anyone has a better
> idea, I will really appreciate it.
>
> > > +		i++;
> > > +	}
> > > +
> > > +	if (!i) {
> > > +		store_slot_info(region, image_size);
> > > +		return;
> > > +	}
> > > +
> > > +	/* Process the remaining regions after filtering out. */
> > > +
> >
> > This line
[PATCH v7 0/6] provide power off support for iMX6 with external PMIC
2018.05.17: update patches to version v7

This patch series provides power off support for Freescale/NXP iMX6 based
boards with an external power management integrated circuit (PMIC). As a
first step the PMIC is configured to turn off the system if the standby
pin is asserted. As a second step we assert the standby pin. For this
reason we need to use pm_power_off_prepare. Usage of the standby pin for
power off is described in the official iMX6 documentation.

2018.03.05: As this patch set touches multiple subsystems I think it would
make sense for Shawn Guo to take the whole patch set. The only part which
didn't receive an ACK is the regulator stuff, so I would hope that Mark
Brown can ACK it.

Kind regards,
Oleksij Rempel

2017.12.06: Adding Linus. Probably there is no maintainer for this patch
set. No changes are made, tested on v4.15-rc1.

2017.10.27: The last version of this patch set was sent on 20 Jun 2017;
this is a rebase against kernel v4.14-rc6. Probably this set got lost. If
I forgot to address some comments, please point me to them.

changes:
v7:
 - use EXPORT_SYMBOL_GPL(pm_power_off_prepare) instead of EXPORT_SYMBOL
 - call imx6q_suspend_finish() directly without cpu_suspend()
v6:
 - rename imx6_pm_poweroff to imx6_pm_stby_poweroff
 - fix "MPIC_STBY_REQ" typo in the comment
v5:
 - remove useless includes from the pm-imx6.c patch
 - add Acked-by to "regulator: pfuze100: add fsl,pmic-stby-poweroff property" patch
v4:
 - update comment in "regulator: pfuze100: add fsl,pmic-stby-poweroff ..." patch
 - add Acked-by to "ARM: imx6q: provide documentation for new ..." patch
v3:
 - set pm_power_off_prepare = NULL on .remove
 - documentation and spelling fixes
 - use %pf instead of lookup_symbol_name
Oleksij Rempel (6):
  ARM: imx6q: provide documentation for new fsl,pmic-stby-poweroff property
  ARM: imx6: register pm_power_off handler if "fsl,pmic-stby-poweroff" is set
  kernel/reboot.c: export pm_power_off_prepare
  regulator: pfuze100: add fsl,pmic-stby-poweroff property
  regulator: pfuze100-regulator: provide pm_power_off_prepare handler
  ARM: dts: imx6: RIoTboard provide standby on power off option

 .../devicetree/bindings/clock/imx6q-clock.txt | 8 ++
 .../bindings/regulator/pfuze100.txt           | 7 ++
 arch/arm/boot/dts/imx6dl-riotboard.dts        | 5 +
 arch/arm/mach-imx/pm-imx6.c                   | 25 +
 drivers/regulator/pfuze100-regulator.c        | 92 +++
 kernel/reboot.c                               | 1 +
 6 files changed, 138 insertions(+)

-- 
2.17.0
[PATCH v7 6/6] ARM: dts: imx6: RIoTboard provide standby on power off option
This board, as well as some other boards with i.MX6 and a PMIC, uses a
"PMIC_STBY_REQ" line to notify the PMIC about a state change. The PMIC is
programmed for a specific state change before triggering the line. In this
case, PMIC_STBY_REQ can be used for standby, sleep and power off modes.

Signed-off-by: Oleksij Rempel
---
 arch/arm/boot/dts/imx6dl-riotboard.dts | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/arm/boot/dts/imx6dl-riotboard.dts b/arch/arm/boot/dts/imx6dl-riotboard.dts
index 2e98c92adff7..a0e9753ee767 100644
--- a/arch/arm/boot/dts/imx6dl-riotboard.dts
+++ b/arch/arm/boot/dts/imx6dl-riotboard.dts
@@ -90,6 +90,10 @@
 	status = "okay";
 };
 
+ {
+	fsl,pmic-stby-poweroff;
+};
+
 {
 	pinctrl-names = "default";
 	pinctrl-0 = <_enet>;
@@ -170,6 +174,7 @@
 	reg = <0x08>;
 	interrupt-parent = <>;
 	interrupts = <16 8>;
+	fsl,pmic-stby-poweroff;
 
 	regulators {
 		reg_vddcore: sw1ab {	/* VDDARM_IN */
-- 
2.17.0
[PATCH v7 2/6] ARM: imx6: register pm_power_off handler if "fsl,pmic-stby-poweroff" is set
One of the Freescale recommended sequences for power off with external
PMIC is the following:
...
3. SoC is programming PMIC for power off when standby is asserted.
4. In CCM STOP mode, Standby is asserted, PMIC gates SoC supplies.

See: http://www.nxp.com/assets/documents/data/en/reference-manuals/IMX6DQRM.pdf
page 5083

This patch implements step 4 of this sequence.

Signed-off-by: Oleksij Rempel
---
 arch/arm/mach-imx/pm-imx6.c | 25 +
 1 file changed, 25 insertions(+)

diff --git a/arch/arm/mach-imx/pm-imx6.c b/arch/arm/mach-imx/pm-imx6.c
index 017539dd712b..2f5c643f62fb 100644
--- a/arch/arm/mach-imx/pm-imx6.c
+++ b/arch/arm/mach-imx/pm-imx6.c
@@ -601,6 +601,28 @@ static void __init imx6_pm_common_init(const struct imx6_pm_socdata
 			   IMX6Q_GPR1_GINT);
 }
 
+static void imx6_pm_stby_poweroff(void)
+{
+	imx6_set_lpm(STOP_POWER_OFF);
+	imx6q_suspend_finish(0);
+
+	mdelay(1000);
+
+	pr_emerg("Unable to poweroff system\n");
+}
+
+static int imx6_pm_stby_poweroff_probe(void)
+{
+	if (pm_power_off) {
+		pr_warn("%s: pm_power_off already claimed %p %pf!\n",
+			__func__, pm_power_off, pm_power_off);
+		return -EBUSY;
+	}
+
+	pm_power_off = imx6_pm_stby_poweroff;
+	return 0;
+}
+
 void __init imx6_pm_ccm_init(const char *ccm_compat)
 {
 	struct device_node *np;
@@ -617,6 +639,9 @@ void __init imx6_pm_ccm_init(const char *ccm_compat)
 	val = readl_relaxed(ccm_base + CLPCR);
 	val &= ~BM_CLPCR_LPM;
 	writel_relaxed(val, ccm_base + CLPCR);
+
+	if (of_property_read_bool(np, "fsl,pmic-stby-poweroff"))
+		imx6_pm_stby_poweroff_probe();
 }
 
 void __init imx6q_pm_init(void)
-- 
2.17.0
[PATCH v7 1/6] ARM: imx6q: provide documentation for new fsl,pmic-stby-poweroff property
Signed-off-by: Oleksij Rempel
Acked-by: Rob Herring
---
 Documentation/devicetree/bindings/clock/imx6q-clock.txt | 8 ++
 1 file changed, 8 insertions(+)

diff --git a/Documentation/devicetree/bindings/clock/imx6q-clock.txt b/Documentation/devicetree/bindings/clock/imx6q-clock.txt
index a45ca67a9d5f..e1308346e00d 100644
--- a/Documentation/devicetree/bindings/clock/imx6q-clock.txt
+++ b/Documentation/devicetree/bindings/clock/imx6q-clock.txt
@@ -6,6 +6,14 @@ Required properties:
 - interrupts: Should contain CCM interrupt
 - #clock-cells: Should be <1>
 
+Optional properties:
+- fsl,pmic-stby-poweroff: Configure CCM to assert PMIC_STBY_REQ signal
+  on power off.
+  Use this property if the SoC should be powered off by external power
+  management IC (PMIC) triggered via PMIC_STBY_REQ signal.
+  Boards that are designed to initiate poweroff on PMIC_ON_REQ signal should
+  be using "syscon-poweroff" driver instead.
+
 The clock consumer should specify the desired clock by having the clock ID
 in its "clocks" phandle cell. See include/dt-bindings/clock/imx6qdl-clock.h
 for the full list of i.MX6 Quad and DualLite clock IDs.
-- 
2.17.0
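For illustration, a hedged sketch of a CCM node using the new binding. The register address and interrupt cells are assumptions modelled on typical i.MX6 device trees, not taken from this patch:

```dts
clks: clock-controller@20c4000 {
	compatible = "fsl,imx6q-ccm";
	reg = <0x020c4000 0x4000>;
	interrupts = <0 87 4>, <0 88 4>;
	#clock-cells = <1>;
	/* new optional property: power off via PMIC_STBY_REQ */
	fsl,pmic-stby-poweroff;
};
```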
[PATCH v7 5/6] regulator: pfuze100-regulator: provide pm_power_off_prepare handler
On some boards the SoC can use one pin, "PMIC_STBY_REQ", to notify the
PMIC about state changes. In this case the internal state of the PMIC must
be preconfigured for the upcoming state change. It works fine with the
current regulator framework, except for the power-off case.

This patch provides an optional pm_power_off_prepare handler which will
configure the standby state of the PMIC to disable all power lines.

In my power consumption test on RIoTBoard, I got the following results:
power off without this patch: 320 mA
power off with this patch:      2 mA
suspend to ram:                40 mA

Signed-off-by: Oleksij Rempel
---
 drivers/regulator/pfuze100-regulator.c | 92 ++
 1 file changed, 92 insertions(+)

diff --git a/drivers/regulator/pfuze100-regulator.c b/drivers/regulator/pfuze100-regulator.c
index 63922a2167e5..f6c276ed91d8 100644
--- a/drivers/regulator/pfuze100-regulator.c
+++ b/drivers/regulator/pfuze100-regulator.c
@@ -28,6 +28,7 @@
 #include
 #include
 #include
+#include
 #include
 
 #define PFUZE_NUMREGS	128
@@ -42,11 +43,17 @@
 #define PFUZE100_COINVOL	0x1a
 #define PFUZE100_SW1ABVOL	0x20
+#define PFUZE100_SW1ABMODE	0x23
 #define PFUZE100_SW1CVOL	0x2e
+#define PFUZE100_SW1CMODE	0x31
 #define PFUZE100_SW2VOL		0x35
+#define PFUZE100_SW2MODE	0x38
 #define PFUZE100_SW3AVOL	0x3c
+#define PFUZE100_SW3AMODE	0x3f
 #define PFUZE100_SW3BVOL	0x43
+#define PFUZE100_SW3BMODE	0x46
 #define PFUZE100_SW4VOL		0x4a
+#define PFUZE100_SW4MODE	0x4d
 #define PFUZE100_SWBSTCON1	0x66
 #define PFUZE100_VREFDDRCON	0x6a
 #define PFUZE100_VSNVSVOL	0x6b
@@ -57,6 +64,13 @@
 #define PFUZE100_VGEN5VOL	0x70
 #define PFUZE100_VGEN6VOL	0x71
 
+#define PFUZE100_SWxMODE_MASK	0xf
+#define PFUZE100_SWxMODE_APS_APS	0x8
+#define PFUZE100_SWxMODE_APS_OFF	0x4
+
+#define PFUZE100_VGENxLPWR	BIT(6)
+#define PFUZE100_VGENxSTBY	BIT(5)
+
 enum chips { PFUZE100, PFUZE200, PFUZE3000 = 3 };
 
 struct pfuze_regulator {
@@ -489,6 +503,69 @@ static inline struct device_node *match_of_node(int index)
 }
 #endif
 
+static struct pfuze_chip *syspm_pfuze_chip;
+
+static void pfuze_power_off_prepare(void)
+{
+	dev_info(syspm_pfuze_chip->dev, "Configure standby mode for power off");
+
+	/* Switch from default mode: APS/APS to APS/Off */
+	regmap_update_bits(syspm_pfuze_chip->regmap, PFUZE100_SW1ABMODE,
+			   PFUZE100_SWxMODE_MASK, PFUZE100_SWxMODE_APS_OFF);
+	regmap_update_bits(syspm_pfuze_chip->regmap, PFUZE100_SW1CMODE,
+			   PFUZE100_SWxMODE_MASK, PFUZE100_SWxMODE_APS_OFF);
+	regmap_update_bits(syspm_pfuze_chip->regmap, PFUZE100_SW2MODE,
+			   PFUZE100_SWxMODE_MASK, PFUZE100_SWxMODE_APS_OFF);
+	regmap_update_bits(syspm_pfuze_chip->regmap, PFUZE100_SW3AMODE,
+			   PFUZE100_SWxMODE_MASK, PFUZE100_SWxMODE_APS_OFF);
+	regmap_update_bits(syspm_pfuze_chip->regmap, PFUZE100_SW3BMODE,
+			   PFUZE100_SWxMODE_MASK, PFUZE100_SWxMODE_APS_OFF);
+	regmap_update_bits(syspm_pfuze_chip->regmap, PFUZE100_SW4MODE,
+			   PFUZE100_SWxMODE_MASK, PFUZE100_SWxMODE_APS_OFF);
+
+	regmap_update_bits(syspm_pfuze_chip->regmap, PFUZE100_VGEN1VOL,
+			   PFUZE100_VGENxLPWR | PFUZE100_VGENxSTBY,
+			   PFUZE100_VGENxSTBY);
+	regmap_update_bits(syspm_pfuze_chip->regmap, PFUZE100_VGEN2VOL,
+			   PFUZE100_VGENxLPWR | PFUZE100_VGENxSTBY,
+			   PFUZE100_VGENxSTBY);
+	regmap_update_bits(syspm_pfuze_chip->regmap, PFUZE100_VGEN3VOL,
+			   PFUZE100_VGENxLPWR | PFUZE100_VGENxSTBY,
+			   PFUZE100_VGENxSTBY);
+	regmap_update_bits(syspm_pfuze_chip->regmap, PFUZE100_VGEN4VOL,
+			   PFUZE100_VGENxLPWR | PFUZE100_VGENxSTBY,
+			   PFUZE100_VGENxSTBY);
+	regmap_update_bits(syspm_pfuze_chip->regmap, PFUZE100_VGEN5VOL,
+			   PFUZE100_VGENxLPWR | PFUZE100_VGENxSTBY,
+			   PFUZE100_VGENxSTBY);
+	regmap_update_bits(syspm_pfuze_chip->regmap, PFUZE100_VGEN6VOL,
+			   PFUZE100_VGENxLPWR | PFUZE100_VGENxSTBY,
+			   PFUZE100_VGENxSTBY);
+}
+
+static int pfuze_power_off_prepare_init(struct pfuze_chip *pfuze_chip)
+{
+	if (pfuze_chip->chip_id != PFUZE100) {
+		dev_warn(pfuze_chip->dev, "Requested pm_power_off_prepare handler for not supported chip\n");
+		return -ENODEV;
+	}
+
+	if (pm_power_off_prepare) {
+		dev_warn(pfuze_chip->dev, "pm_power_off_prepare is already registered.\n");
+		return -EBUSY;
+	}
+
+	if (syspm_pfuze_chip) {
+		dev_warn(pfuze_chip->dev,
[PATCH v7 3/6] kernel/reboot.c: export pm_power_off_prepare
Export pm_power_off_prepare. It is needed to implement power off on Freescale/NXP i.MX6 based boards with an external power management integrated circuit (PMIC).

Signed-off-by: Oleksij Rempel
---
 kernel/reboot.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/reboot.c b/kernel/reboot.c
index e4ced883d8de..83810d726f3e 100644
--- a/kernel/reboot.c
+++ b/kernel/reboot.c
@@ -49,6 +49,7 @@ int reboot_force;
  */
 void (*pm_power_off_prepare)(void);
+EXPORT_SYMBOL_GPL(pm_power_off_prepare);

 /**
  * emergency_restart - reboot the system
--
2.17.0
[PATCH v7 4/6] regulator: pfuze100: add fsl,pmic-stby-poweroff property
Document the new optional "fsl,pmic-stby-poweroff" property.

Signed-off-by: Oleksij Rempel
Acked-by: Rob Herring
---
 Documentation/devicetree/bindings/regulator/pfuze100.txt | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/Documentation/devicetree/bindings/regulator/pfuze100.txt b/Documentation/devicetree/bindings/regulator/pfuze100.txt
index c6dd3f5e485b..91fec1a0785d 100644
--- a/Documentation/devicetree/bindings/regulator/pfuze100.txt
+++ b/Documentation/devicetree/bindings/regulator/pfuze100.txt
@@ -4,6 +4,13 @@ Required properties:
 - compatible: "fsl,pfuze100", "fsl,pfuze200", "fsl,pfuze3000"
 - reg: I2C slave address

+Optional properties:
+- fsl,pmic-stby-poweroff: if present, configure the PMIC to shut down all
+  power rails when the PMIC_STBY_REQ line is asserted during the power off
+  sequence. Use this option if the SoC should be powered off by an external
+  power management IC (PMIC) on the PMIC_STBY_REQ signal.
+  As opposed to PMIC_STBY_REQ, boards can implement the PMIC_ON_REQ signal.
+
 Required child node:
 - regulators: This is the list of child nodes that specify the regulator
   initialization data for defined regulators. Please refer to below doc
--
2.17.0
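As an illustration (not taken from the patch), a board DTS might use the property as below; the bus label, node names, unit address, and regulator values are hypothetical:

```
&i2c1 {
	pmic: pfuze100@8 {
		compatible = "fsl,pfuze100";
		reg = <0x08>;
		fsl,pmic-stby-poweroff;

		regulators {
			sw1a_reg: sw1ab {
				regulator-min-microvolt = <300000>;
				regulator-max-microvolt = <1875000>;
			};
		};
	};
};
```

Being a boolean DT property, its mere presence opts the board into the standby-based power-off behaviour; boards wired for PMIC_ON_REQ simply omit it.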
Re: [PATCH 2/4] pid: Export find_task_by_vpid for use in external modules
Russell King - ARM Linux writes:

> On Thu, May 10, 2018 at 01:39:18PM -0600, Mathieu Poirier wrote:
>> Hi Russell,
>>
>> On 10 May 2018 at 02:40, Russell King - ARM Linux wrote:
>> > This does not leak information from other namespaces because of the
>> > uniqueness of the global PID. However, what it does leak is the value
>> > of the global PID which is meaningless in the namespace. So, before
>> > the event stream is delivered to userspace, this value needs to be
>> > re-written to the namespace's PID value.
>>
>> Unfortunately that can't be done. The trace stream is compressed and
>> needs to be decompressed using an external library. I think the only
>> option is to return an error if a user is trying to use this feature
>> from a namespace.
>
> That sounds like a sensible approach, and that should get rid of the
> vpid stuff too.
>
> Eric, would this solve all your concerns?

It does, and I have given my ack to the respin.

I am moderately concerned about using the global pid with hardware. But pids are a core abstraction that aren't going anywhere. As long as hardware does not impose constraints on pids that software already does not, we should be fine.

Eric
Re: [PATCH 1/2] x86/boot/KASLR: Add two functions for 1GB huge pages handling
On 05/17/18 at 07:12am, damian wrote:
> On Wed, 16. May 18:05, Baoquan He wrote:
> > Functions parse_gb_huge_pages() and process_gb_huge_page() are introduced to
> > handle conflict between KASLR and huge pages, will be used in the next
> > patch.
> >
> > Function parse_gb_huge_pages() is used to parse kernel command-line to get
> > how many 1GB huge pages have been specified. A static global variable
> > 'max_gb_huge_pages' is added to store the number.
> >
> > And process_gb_huge_page() is used to skip as many 1GB huge pages as
> > possible from the passed in memory region according to the specified number.
> >
> > Signed-off-by: Baoquan He
> > ---
> >  arch/x86/boot/compressed/kaslr.c | 71 
> >  1 file changed, 71 insertions(+)
> >
> > diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
> > index a0a50b91ecef..13bd879cdc5d 100644
> > --- a/arch/x86/boot/compressed/kaslr.c
> > +++ b/arch/x86/boot/compressed/kaslr.c
> > @@ -215,6 +215,32 @@ static void mem_avoid_memmap(char *str)
> >  	memmap_too_large = true;
> >  }
> >
> > +/* Store the number of 1GB huge pages which user specified.*/
> > +static unsigned long max_gb_huge_pages;
> > +
> > +static int parse_gb_huge_pages(char *param, char* val)
> > +{
> > +	char *p;
> > +	u64 mem_size;
> > +	static bool gbpage_sz = false;
> > +
> > +	if (!strcmp(param, "hugepagesz")) {
> > +		p = val;
> > +		mem_size = memparse(p, &p);
> > +		if (mem_size == PUD_SIZE) {
> > +			if (gbpage_sz)
> > +				warn("Repeadly set hugeTLB page size of 1G!\n");
> > +			gbpage_sz = true;
> > +		} else
> > +			gbpage_sz = false;
> > +	} else if (!strcmp(param, "hugepages") && gbpage_sz) {
> > +		p = val;
> > +		max_gb_huge_pages = simple_strtoull(p, &p, 0);
> > +		debug_putaddr(max_gb_huge_pages);
> > +	}
> > +}
> Hello,
>
> the return value is missing for the function, or not?

Thanks, very good catch. It should be 'static void', no return value since we didn't check it. Will update when repost.

> > +
> > +
> >  static int handle_mem_memmap(void)
> >  {
> >  	char *args = (char *)get_cmd_line_ptr();
> > @@ -466,6 +492,51 @@ static void store_slot_info(struct mem_vector *region, unsigned long image_size)
> >  	}
> >  }
> >
> > +/* Skip as many 1GB huge pages as possible in the passed region. */
> > +static void process_gb_huge_page(struct mem_vector *region, unsigned long image_size)
> > +{
> > +	int i = 0;
> > +	unsigned long addr, size;
> > +	struct mem_vector tmp;
> > +
> > +	if (!max_gb_huge_pages) {
> > +		store_slot_info(region, image_size);
> > +		return;
> > +	}
> > +
> > +	addr = ALIGN(region->start, PUD_SIZE);
> > +	/* Did we raise the address above the passed in memory entry? */
> > +	if (addr < region->start + region->size)
> > +		size = region->size - (addr - region->start);
> > +
> > +	/* Check how many 1GB huge pages can be filtered out */
> > +	while (size > PUD_SIZE && max_gb_huge_pages) {
> > +		size -= PUD_SIZE;
> > +		max_gb_huge_pages--;
> > +		i++;
> > +	}
> > +
> > +	if (!i) {
> > +		store_slot_info(region, image_size);
> > +		return;
> > +	}
> > +
> > +	/* Process the remaining regions after filtering out. */
> > +
> > +	if (addr >= region->start + image_size) {
> > +		tmp.start = region->start;
> > +		tmp.size = addr - region->start;
> > +		store_slot_info(&tmp, image_size);
> > +	}
> > +
> > +	size = region->size - (addr - region->start) - i * PUD_SIZE;
> > +	if (size >= image_size) {
> > +		tmp.start = addr + i * PUD_SIZE;
> > +		tmp.size = size;
> > +		store_slot_info(&tmp, image_size);
> > +	}
> > +}
> > +
> >  static unsigned long slots_fetch_random(void)
> >  {
> >  	unsigned long slot;
> > --
> > 2.13.6
> >
Re: [PATCH v3 2/2] Input: xen-kbdfront - allow better run-time configuration
On 05/17/2018 12:08 AM, Dmitry Torokhov wrote:

On Wed, May 16, 2018 at 08:47:30PM +0300, Oleksandr Andrushchenko wrote:

On 05/16/2018 08:15 PM, Dmitry Torokhov wrote:

Hi Oleksandr,

On Mon, May 14, 2018 at 05:40:29PM +0300, Oleksandr Andrushchenko wrote:

@@ -211,93 +220,114 @@ static int xenkbd_probe(struct xenbus_device *dev,
 	if (!info->page)
 		goto error_nomem;

-	/* Set input abs params to match backend screen res */
-	abs = xenbus_read_unsigned(dev->otherend,
-				   XENKBD_FIELD_FEAT_ABS_POINTER, 0);
-	ptr_size[KPARAM_X] = xenbus_read_unsigned(dev->otherend,
-						  XENKBD_FIELD_WIDTH,
-						  ptr_size[KPARAM_X]);
-	ptr_size[KPARAM_Y] = xenbus_read_unsigned(dev->otherend,
-						  XENKBD_FIELD_HEIGHT,
-						  ptr_size[KPARAM_Y]);
-	if (abs) {
-		ret = xenbus_write(XBT_NIL, dev->nodename,
-				   XENKBD_FIELD_REQ_ABS_POINTER, "1");
-		if (ret) {
-			pr_warn("xenkbd: can't request abs-pointer\n");
-			abs = 0;
-		}
-	}
+	/*
+	 * The below are reverse logic, e.g. if the feature is set, then
+	 * do not expose the corresponding virtual device.
+	 */
+	with_kbd = !xenbus_read_unsigned(dev->nodename,
+					 XENKBD_FIELD_FEAT_DSBL_KEYBRD, 0);

-	touch = xenbus_read_unsigned(dev->nodename,
-				     XENKBD_FIELD_FEAT_MTOUCH, 0);
-	if (touch) {
+	with_ptr = !xenbus_read_unsigned(dev->nodename,
+					 XENKBD_FIELD_FEAT_DSBL_POINTER, 0);
+
+	/* Direct logic: if set, then create multi-touch device. */
+	with_mtouch = xenbus_read_unsigned(dev->nodename,
+					   XENKBD_FIELD_FEAT_MTOUCH, 0);
+	if (with_mtouch) {
 		ret = xenbus_write(XBT_NIL, dev->nodename,
 				   XENKBD_FIELD_REQ_MTOUCH, "1");
 		if (ret) {
 			pr_warn("xenkbd: can't request multi-touch");
-			touch = 0;
+			with_mtouch = 0;
 		}
 	}

Does it make sense to still end up calling xenkbd_connect_backend() when all interfaces (keyboard, pointer, and multitouch) are disabled? Should we do:

	if (!(with_kbd || with_ptr || with_mtouch))
		return -ENXIO;

?

It does make sense.

Then we probably need to move all xenbus_read_unsigned calls to the very beginning of the .probe, so no memory allocations are made which will be useless if we return -ENXIO, e.g. something like

static int xenkbd_probe(struct xenbus_device *dev,
			const struct xenbus_device_id *id)
{
	int ret, i;
	bool with_mtouch, with_kbd, with_ptr;
	struct xenkbd_info *info;
	struct input_dev *kbd, *ptr, *mtouch;

	if (!(with_kbd | with_ptr | with_mtouch))
		return -ENXIO;

Does the above look ok?

Yes. Another option is to keep the check where I suggested and do

	if (...) {
		ret = -ENXIO;
		goto error;
	}

Whichever you prefer is fine with me.

I will go with the change you suggested and I'll send v4 tomorrow then. Thanks.

Thank you,
Oleksandr
Re: [PATCH 11/40] ipv6/flowlabel: simplify pid namespace lookup
Christoph Hellwig writes:

> On Sat, May 05, 2018 at 07:37:33AM -0500, Eric W. Biederman wrote:
>> Christoph Hellwig writes:
>>
>> > The whole seq_file sequence already operates under a single RCU lock pair,
>> > so move the pid namespace lookup into it, and stop grabbing a reference
>> > and remove all kinds of boilerplate code.
>>
>> This is wrong.
>>
>> Moving task_active_pid_ns(current) from open to seq_start means that
>> the results will change if you pass this proc file between callers.
>> So this breaks file descriptor passing.
>>
>> Open is a bad place to access current. In the middle of read/write is
>> broken.
>>
>> In this particular instance looking up the pid namespace with
>> task_active_pid_ns was a personal brain fart. What the code should be
>> doing (with an appropriate helper) is:
>>
>> 	struct pid_namespace *pid_ns = inode->i_sb->s_fs_info;
>>
>> Because each mount of proc is bound to a pid namespace. Looking up the
>> pid namespace from the super_block is a much better way to go.
>
> What do you have in mind for the helper? For now I've thrown it in
> open-coded into my working tree, but I'd be glad to add a helper.
>
> struct pid_namespace *proc_pid_namespace(struct inode *inode)
> {
> 	// maybe warn on for s_magic not on procfs??
> 	return inode->i_sb->s_fs_info;
> }

That should work. Ideally out of line for the proc_fs.h version. Basically it should be a cousin of PDE_DATA.

Eric
Re: [linux-firmware] Version in WHENCE file
On Mon, 2018-05-07 at 09:47 +0200, Sedat Dilek wrote:
> On Sun, May 6, 2018 at 3:07 PM, Luciano Coelho wrote:
> > On Sun, 2018-05-06 at 14:46 +0200, Sedat Dilek wrote:
> > > Hi Luca,
> > >
> > > I hope I catched the correct MLs (not sure if linux-firmware has a
> > > ML, I did not found any in the MAINTAINERS file).
> > >
> > > I have seen that in the WHENCE file there is "Version" with and
> > > without ":", mostly iwlwifi ucodes.
> > >
> > > As an example:
> > >
> > > File: iwlwifi-8265-36.ucode
> > > -Version 36.e91976c0.0
> > > +Version: 36.e91976c0.0
> > >
> > > I don't know the workflow: Do you want to fix it in your tree or
> > > directly in linux-firmware.git upstream?
> > > My attached patch is against upstream.
> >
> > Thanks, Sedat!
> >
> > I'm going to send a new pull-request this week, so I can include your
> > patch in my tree and as part of the pull-request.
> >
> > --
> > Cheers,
> > Luca.
>
> OK, Thanks.
> Attached Patch v2 is against your tree [1].
> It differs from v1 against upstream.

You need to write a proper commit message and sign it off so I can apply it.

--
Luca.
Re: [PATCH] dmaengine: qcom: bam_dma: check if the runtime pm enabled
On 14-05-18, 17:18, Srinivas Kandagatla wrote:
> Disabling pm runtime at probe is not sufficient to get BAM working
> on remotely controlled instances. pm_runtime_get_sync() would return
> -EACCES in such cases.
> So check if runtime pm is enabled before returning error from bam functions.
>
> Fixes: 5b4a68952a89 ("dmaengine: qcom: bam_dma: disable runtime pm on remote controlled")
> Signed-off-by: Srinivas Kandagatla
> ---
>  drivers/dma/qcom/bam_dma.c | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/dma/qcom/bam_dma.c b/drivers/dma/qcom/bam_dma.c
> index d29275b97e84..5f4babebc508 100644
> --- a/drivers/dma/qcom/bam_dma.c
> +++ b/drivers/dma/qcom/bam_dma.c
> @@ -540,7 +540,7 @@ static void bam_free_chan(struct dma_chan *chan)
>  	int ret;
>
>  	ret = pm_runtime_get_sync(bdev->dev);
> -	if (ret < 0)
> +	if (pm_runtime_enabled(bdev->dev) && ret < 0)

Would it make sense to first check enabled and then do _get_sync():

	if (pm_runtime_enabled()) {
		ret = pm_runtime_get_sync();
		if (ret < 0) {
			...
		}
	}

thus making clear in the code that we do the calls only when it is enabled. Also you can add a local macro for this code and use that rather than copy-pasting :)

--
~Vinod
Re: [PATCH v3 2/2] dmaengine: sprd: Add Spreadtrum DMA configuration
On 11-05-18, 21:06, Baolin Wang wrote:
> +struct sprd_dma_config {
> +	struct dma_slave_config cfg;
> +	u32 block_len;
> +	u32 transcation_len;

s/transcation/transaction

Now in the code I see block_len, and this is filled by len, which is sg_dma_len(). So why two variables when you are using only one?

Second, you are storing parameters programmed through the _prep call into _dma_config. That is not correct. We use these to store channel parameters and NOT transaction parameters, which would differ with each txn. No wonder you can only support only 1 txn :)

Lastly, if these are the same values, why not use src/dstn_max_burst?

> +static enum sprd_dma_datawidth
> +sprd_dma_get_datawidth(enum dma_slave_buswidth buswidth)
> +{
> +	switch (buswidth) {
> +	case DMA_SLAVE_BUSWIDTH_1_BYTE:
> +	case DMA_SLAVE_BUSWIDTH_2_BYTES:
> +	case DMA_SLAVE_BUSWIDTH_4_BYTES:
> +	case DMA_SLAVE_BUSWIDTH_8_BYTES:
> +		return ffs(buswidth) - 1;
> +	default:
> +		return SPRD_DMA_DATAWIDTH_4_BYTES;

Does the default make sense, should we error out?

> +static u32 sprd_dma_get_step(enum dma_slave_buswidth buswidth)
> +{
> +	switch (buswidth) {
> +	case DMA_SLAVE_BUSWIDTH_1_BYTE:
> +	case DMA_SLAVE_BUSWIDTH_2_BYTES:
> +	case DMA_SLAVE_BUSWIDTH_4_BYTES:
> +	case DMA_SLAVE_BUSWIDTH_8_BYTES:
> +		return buswidth;
> +
> +	default:
> +		return SPRD_DMA_DWORD_STEP;

Does the default make sense, should we error out?

> +static int sprd_dma_config(struct dma_chan *chan, struct sprd_dma_desc *sdesc,
> +			   dma_addr_t src, dma_addr_t dst,
> +			   struct sprd_dma_config *slave_cfg)
> +{
> +	struct sprd_dma_dev *sdev = to_sprd_dma_dev(chan);
> +	struct sprd_dma_chn *schan = to_sprd_dma_chan(chan);
> +	struct sprd_dma_chn_hw *hw = &sdesc->chn_hw;
> +	u32 fix_mode = 0, fix_en = 0, wrap_en = 0, wrap_mode = 0;
> +	u32 src_datawidth, dst_datawidth, temp;
> +
> +	if (slave_cfg->cfg.slave_id)
> +		schan->dev_id = slave_cfg->cfg.slave_id;
> +
> +	hw->cfg = SPRD_DMA_DONOT_WAIT_BDONE << SPRD_DMA_WAIT_BDONE_OFFSET;
> +
> +	temp = slave_cfg->wrap_ptr & SPRD_DMA_LOW_ADDR_MASK;
> +	temp |= (src >> SPRD_DMA_HIGH_ADDR_OFFSET) & SPRD_DMA_HIGH_ADDR_MASK;
> +	hw->wrap_ptr = temp;
> +
> +	temp = slave_cfg->wrap_to & SPRD_DMA_LOW_ADDR_MASK;
> +	temp |= (dst >> SPRD_DMA_HIGH_ADDR_OFFSET) & SPRD_DMA_HIGH_ADDR_MASK;
> +	hw->wrap_to = temp;
> +
> +	hw->src_addr = src & SPRD_DMA_LOW_ADDR_MASK;
> +	hw->des_addr = dst & SPRD_DMA_LOW_ADDR_MASK;
> +
> +	if ((slave_cfg->src_step != 0 && slave_cfg->dst_step != 0)
> +	    || (slave_cfg->src_step | slave_cfg->dst_step) == 0) {

This check doesn't seem right to me, what are you trying here?

> +		fix_en = 0;
> +	} else {
> +		fix_en = 1;
> +		if (slave_cfg->src_step)
> +			fix_mode = 1;
> +		else
> +			fix_mode = 0;
> +	}
> +
> +	if (slave_cfg->wrap_ptr && slave_cfg->wrap_to) {
> +		wrap_en = 1;
> +		if (slave_cfg->wrap_to == src) {
> +			wrap_mode = 0;
> +		} else if (slave_cfg->wrap_to == dst) {
> +			wrap_mode = 1;
> +		} else {
> +			dev_err(sdev->dma_dev.dev, "invalid wrap mode\n");
> +			return -EINVAL;
> +		}
> +	}
> +
> +	hw->intc = slave_cfg->int_mode | SPRD_DMA_CFG_ERR_INT_EN;
> +
> +	src_datawidth = sprd_dma_get_datawidth(slave_cfg->cfg.src_addr_width);
> +	dst_datawidth = sprd_dma_get_datawidth(slave_cfg->cfg.dst_addr_width);
> +
> +	temp = src_datawidth << SPRD_DMA_SRC_DATAWIDTH_OFFSET;
> +	temp |= dst_datawidth << SPRD_DMA_DES_DATAWIDTH_OFFSET;
> +	temp |= slave_cfg->req_mode << SPRD_DMA_REQ_MODE_OFFSET;
> +	temp |= wrap_mode << SPRD_DMA_WRAP_SEL_OFFSET;
> +	temp |= wrap_en << SPRD_DMA_WRAP_EN_OFFSET;
> +	temp |= fix_mode << SPRD_DMA_FIX_SEL_OFFSET;
> +	temp |= fix_en << SPRD_DMA_FIX_EN_OFFSET;
> +	temp |= slave_cfg->cfg.src_maxburst & SPRD_DMA_FRG_LEN_MASK;
> +	hw->frg_len = temp;
> +
> +	hw->blk_len = slave_cfg->block_len & SPRD_DMA_BLK_LEN_MASK;
> +	hw->trsc_len = slave_cfg->transcation_len & SPRD_DMA_TRSC_LEN_MASK;
> +
> +	temp = (slave_cfg->dst_step & SPRD_DMA_TRSF_STEP_MASK) <<
> +		SPRD_DMA_DEST_TRSF_STEP_OFFSET;
> +	temp |= (slave_cfg->src_step & SPRD_DMA_TRSF_STEP_MASK) <<
> +		SPRD_DMA_SRC_TRSF_STEP_OFFSET;
> +	hw->trsf_step = temp;
> +
> +	hw->frg_step = 0;
> +	hw->src_blk_step = 0;
> +	hw->des_blk_step = 0;
> +	return 0;
> +}

> +static struct dma_async_tx_descriptor *
> +sprd_dma_prep_slave_sg(struct dma_chan *chan, struct scatterlist *sgl,
> +		       unsigned int sglen, enum dma_transfer_direction dir,
> +		       unsigned long flags, void *context)
> +{
> +
Re: [PATCH v3 2/2] dmaengine: sprd: Add Spreadtrum DMA configuration
On 11-05-18, 21:06, Baolin Wang wrote: > +struct sprd_dma_config { > + struct dma_slave_config cfg; > + u32 block_len; > + u32 transcation_len; /s/transcation/transaction now in code I see block_len and this filled by len which is sg_dma_len()? So why two varibales when you are using only one. Second, you are storing parameters programmed thru _prep call into _dma_config. That is not correct. We use these to store channel parameters and NOT transaction parameters which would differ with each txn. No wonder you can only support only 1 txn :) Lastly, if these are same values why not use src/dstn_max_burst? > +static enum sprd_dma_datawidth > +sprd_dma_get_datawidth(enum dma_slave_buswidth buswidth) > +{ > + switch (buswidth) { > + case DMA_SLAVE_BUSWIDTH_1_BYTE: > + case DMA_SLAVE_BUSWIDTH_2_BYTES: > + case DMA_SLAVE_BUSWIDTH_4_BYTES: > + case DMA_SLAVE_BUSWIDTH_8_BYTES: > + return ffs(buswidth) - 1; > + default: > + return SPRD_DMA_DATAWIDTH_4_BYTES; Does default make sense, should we error out? > +static u32 sprd_dma_get_step(enum dma_slave_buswidth buswidth) > +{ > + switch (buswidth) { > + case DMA_SLAVE_BUSWIDTH_1_BYTE: > + case DMA_SLAVE_BUSWIDTH_2_BYTES: > + case DMA_SLAVE_BUSWIDTH_4_BYTES: > + case DMA_SLAVE_BUSWIDTH_8_BYTES: > + return buswidth; > + > + default: > + return SPRD_DMA_DWORD_STEP; Does default make sense, should we error out? 
> +static int sprd_dma_config(struct dma_chan *chan, struct sprd_dma_desc > *sdesc, > +dma_addr_t src, dma_addr_t dst, > +struct sprd_dma_config *slave_cfg) > +{ > + struct sprd_dma_dev *sdev = to_sprd_dma_dev(chan); > + struct sprd_dma_chn *schan = to_sprd_dma_chan(chan); > + struct sprd_dma_chn_hw *hw = &schan->chn_hw; > + u32 fix_mode = 0, fix_en = 0, wrap_en = 0, wrap_mode = 0; > + u32 src_datawidth, dst_datawidth, temp; > + > + if (slave_cfg->cfg.slave_id) > + schan->dev_id = slave_cfg->cfg.slave_id; > + > + hw->cfg = SPRD_DMA_DONOT_WAIT_BDONE << SPRD_DMA_WAIT_BDONE_OFFSET; > + > + temp = slave_cfg->wrap_ptr & SPRD_DMA_LOW_ADDR_MASK; > + temp |= (src >> SPRD_DMA_HIGH_ADDR_OFFSET) & SPRD_DMA_HIGH_ADDR_MASK; > + hw->wrap_ptr = temp; > + > + temp = slave_cfg->wrap_to & SPRD_DMA_LOW_ADDR_MASK; > + temp |= (dst >> SPRD_DMA_HIGH_ADDR_OFFSET) & SPRD_DMA_HIGH_ADDR_MASK; > + hw->wrap_to = temp; > + > + hw->src_addr = src & SPRD_DMA_LOW_ADDR_MASK; > + hw->des_addr = dst & SPRD_DMA_LOW_ADDR_MASK; > + > + if ((slave_cfg->src_step != 0 && slave_cfg->dst_step != 0) > + || (slave_cfg->src_step | slave_cfg->dst_step) == 0) { This check doesn't seem right to me; what are you trying to do here?
> + fix_en = 0; > + } else { > + fix_en = 1; > + if (slave_cfg->src_step) > + fix_mode = 1; > + else > + fix_mode = 0; > + } > + > + if (slave_cfg->wrap_ptr && slave_cfg->wrap_to) { > + wrap_en = 1; > + if (slave_cfg->wrap_to == src) { > + wrap_mode = 0; > + } else if (slave_cfg->wrap_to == dst) { > + wrap_mode = 1; > + } else { > + dev_err(sdev->dma_dev.dev, "invalid wrap mode\n"); > + return -EINVAL; > + } > + } > + > + hw->intc = slave_cfg->int_mode | SPRD_DMA_CFG_ERR_INT_EN; > + > + src_datawidth = sprd_dma_get_datawidth(slave_cfg->cfg.src_addr_width); > + dst_datawidth = sprd_dma_get_datawidth(slave_cfg->cfg.dst_addr_width); > + > + temp = src_datawidth << SPRD_DMA_SRC_DATAWIDTH_OFFSET; > + temp |= dst_datawidth << SPRD_DMA_DES_DATAWIDTH_OFFSET; > + temp |= slave_cfg->req_mode << SPRD_DMA_REQ_MODE_OFFSET; > + temp |= wrap_mode << SPRD_DMA_WRAP_SEL_OFFSET; > + temp |= wrap_en << SPRD_DMA_WRAP_EN_OFFSET; > + temp |= fix_mode << SPRD_DMA_FIX_SEL_OFFSET; > + temp |= fix_en << SPRD_DMA_FIX_EN_OFFSET; > + temp |= slave_cfg->cfg.src_maxburst & SPRD_DMA_FRG_LEN_MASK; > + hw->frg_len = temp; > + > + hw->blk_len = slave_cfg->block_len & SPRD_DMA_BLK_LEN_MASK; > + hw->trsc_len = slave_cfg->transcation_len & SPRD_DMA_TRSC_LEN_MASK; > + > + temp = (slave_cfg->dst_step & SPRD_DMA_TRSF_STEP_MASK) << > + SPRD_DMA_DEST_TRSF_STEP_OFFSET; > + temp |= (slave_cfg->src_step & SPRD_DMA_TRSF_STEP_MASK) << > + SPRD_DMA_SRC_TRSF_STEP_OFFSET; > + hw->trsf_step = temp; > + > + hw->frg_step = 0; > + hw->src_blk_step = 0; > + hw->des_blk_step = 0; > + return 0; > +} > +static struct dma_async_tx_descriptor * > +sprd_dma_prep_slave_sg(struct dma_chan *chan, struct scatterlist *sgl, > +unsigned int sglen, enum dma_transfer_direction dir, > +unsigned long flags, void *context) > +{ > +
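The frg_len assembly above is a classic mask-and-shift register build-up. A minimal userspace model of the same pattern (the DEMO_* offsets and mask are made-up stand-ins, not the real SPRD_DMA_* values) shows how each field lands in its own bit range of a single 32-bit word:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative offsets/mask only; the real driver uses SPRD_DMA_*. */
#define DEMO_SRC_DW_OFFSET   30
#define DEMO_DST_DW_OFFSET   28
#define DEMO_REQ_MODE_OFFSET 24
#define DEMO_FRG_LEN_MASK    0xffff

/* Pack independent fields into one register image, as sprd_dma_config()
 * does when it builds hw->frg_len out of temp. */
static uint32_t demo_pack_frg_len(uint32_t src_dw, uint32_t dst_dw,
				  uint32_t req_mode, uint32_t maxburst)
{
	uint32_t temp = src_dw << DEMO_SRC_DW_OFFSET;

	temp |= dst_dw << DEMO_DST_DW_OFFSET;
	temp |= req_mode << DEMO_REQ_MODE_OFFSET;
	temp |= maxburst & DEMO_FRG_LEN_MASK;	/* low bits: fragment length */
	return temp;
}
```

Because each field is masked/shifted into disjoint bit ranges, the OR operations never clobber each other.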
Re: [PATCH 1/2] x86/boot/KASLR: Add two functions for 1GB huge pages handling
On Wed, 16. May 18:05, Baoquan He wrote: > Functions parse_gb_huge_pages() and process_gb_huge_page() are introduced to > handle conflict between KASLR and huge pages, will be used in the next patch. > > Function parse_gb_huge_pages() is used to parse kernel command-line to get > how many 1GB huge pages have been specified. A static global variable > 'max_gb_huge_pages' is added to store the number. > > And process_gb_huge_page() is used to skip as many 1GB huge pages as possible > from the passed in memory region according to the specified number. > > Signed-off-by: Baoquan He > --- > arch/x86/boot/compressed/kaslr.c | 71 > > 1 file changed, 71 insertions(+) > > diff --git a/arch/x86/boot/compressed/kaslr.c > b/arch/x86/boot/compressed/kaslr.c > index a0a50b91ecef..13bd879cdc5d 100644 > --- a/arch/x86/boot/compressed/kaslr.c > +++ b/arch/x86/boot/compressed/kaslr.c > @@ -215,6 +215,32 @@ static void mem_avoid_memmap(char *str) > memmap_too_large = true; > } > > +/* Store the number of 1GB huge pages which user specified.*/ > +static unsigned long max_gb_huge_pages; > + > +static int parse_gb_huge_pages(char *param, char* val) > +{ > + char *p; > + u64 mem_size; > + static bool gbpage_sz = false; > + > + if (!strcmp(param, "hugepagesz")) { > + p = val; > + mem_size = memparse(p, &p); > + if (mem_size == PUD_SIZE) { > + if (gbpage_sz) > + warn("Repeadly set hugeTLB page size of 1G!\n"); > + gbpage_sz = true; > + } else > + gbpage_sz = false; > + } else if (!strcmp(param, "hugepages") && gbpage_sz) { > + p = val; > + max_gb_huge_pages = simple_strtoull(p, &p, 0); > + debug_putaddr(max_gb_huge_pages); > + } > +} Hello, isn't the return value missing for this function? regards Damian > + > + > static int handle_mem_memmap(void) > { > char *args = (char *)get_cmd_line_ptr(); > @@ -466,6 +492,51 @@ static void store_slot_info(struct mem_vector *region, > unsigned long image_size) > } > } > > +/* Skip as many 1GB huge pages as possible in the passed region. 
*/ > +static void process_gb_huge_page(struct mem_vector *region, unsigned long > image_size) > +{ > + int i = 0; > + unsigned long addr, size; > + struct mem_vector tmp; > + > + if (!max_gb_huge_pages) { > + store_slot_info(region, image_size); > + return; > + } > + > + addr = ALIGN(region->start, PUD_SIZE); > + /* Did we raise the address above the passed in memory entry? */ > + if (addr < region->start + region->size) > + size = region->size - (addr - region->start); > + > + /* Check how many 1GB huge pages can be filtered out*/ > + while (size > PUD_SIZE && max_gb_huge_pages) { > + size -= PUD_SIZE; > + max_gb_huge_pages--; > + i++; > + } > + > + if (!i) { > + store_slot_info(region, image_size); > + return; > + } > + > + /* Process the remaining regions after filtering out. */ > + > + if (addr >= region->start + image_size) { > + tmp.start = region->start; > + tmp.size = addr - region->start; > + store_slot_info(&tmp, image_size); > + } > + > + size = region->size - (addr - region->start) - i * PUD_SIZE; > +if (size >= image_size) { > + tmp.start = addr + i*PUD_SIZE; > + tmp.size = size; > + store_slot_info(&tmp, image_size); > +} > +} > + > static unsigned long slots_fetch_random(void) > { > unsigned long slot; > -- > 2.13.6 >
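The skipping arithmetic in process_gb_huge_page() can be modeled in plain userspace C. This is a simplified sketch, not the kernel code: DEMO_PUD_SIZE and DEMO_ALIGN stand in for PUD_SIZE and ALIGN(), and it only counts how many whole 1GB-aligned chunks a region can donate, capped by the huge-page budget.

```c
#include <assert.h>

#define DEMO_PUD_SIZE (1UL << 30)			/* 1 GB */
#define DEMO_ALIGN(x, a) (((x) + (a) - 1) & ~((a) - 1))

/* Count the 1GB-aligned chunks inside [start, start+size) that can be
 * reserved for huge pages, spending at most `budget` pages. */
static unsigned long demo_count_gb_chunks(unsigned long start,
					  unsigned long size,
					  unsigned long budget)
{
	unsigned long addr = DEMO_ALIGN(start, DEMO_PUD_SIZE);
	unsigned long avail, n = 0;

	if (addr >= start + size)
		return 0;	/* alignment pushed us past the region */
	avail = size - (addr - start);
	while (avail >= DEMO_PUD_SIZE && budget) {
		avail -= DEMO_PUD_SIZE;
		budget--;
		n++;
	}
	return n;
}
```

The leftover head ([start, addr)) and tail after the counted chunks are what the kernel function then hands back to store_slot_info().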
Re: Linux 4.16-rc1: regression bisected, Debian kernel package tool make-kpkg stalls indefinitely during kernel build due to commit "kconfig: remove check_stdin()"
On Tue, Feb 13, 2018 at 2:07 PM, Ulf Magnusson wrote: >> On Tue, Feb 13, 2018 at 12:33:24PM +0100, Ulf Magnusson wrote: >> Sander Eikelenboom wrote: >>> The Debian kernel-package tool make-kpkg for easy building of upstream >>> kernels on Debian fails with linux 4.16-rc1. >>> >>> Bisection turned up as culprit: >>> commit d2a04648a5dbc3d1d043b35257364f0197d4d868 >>> >>> What the commit does is to make silentoldconfig not immediately exit(1) >>> when both of the following apply: >>> >>> 1. stdin is from something that's not a terminal >>> >>> 2. New symbols are prompted for > > Shouldn't be a problem to back this one out either if it turns out to > cause massive amounts of pain in practice I guess, even if it's the > Debian tools doing something weird. > > Good to look into what it is they're doing in any case. It appears that make-kpkg is trying to extract kernel version and configuration information by including the kernel Makefile then referencing the variables it exports. The offending make-kpkg Makefile is kernel_version.mk[1] which can be simplified to the following example: -8<- make-kpkg-regression.mk - include Makefile .PHONY: debian_VERSION debian_VERSION: -8<--- Invoking this script from the root of the kernel source tree, with stdout/stderr redirected (but not stdin) demonstrates the issue: make -f make-kpkg-regression.mk debian_VERSION >/dev/null 2>&1 Before d2a04648a5db this would exit; after, it will hang waiting for input. This occurs because the debian_VERSION target name isn't in no-dot-config-targets defined in Makefile, so Makefile invokes silentoldconfig automatically as necessary.[2] I think the issue is best fixed by make-kpkg, and that d2a04648a5db doesn't need to be reverted. The simple fix would be for make-kpkg to redirect stdin when redirecting stdout/stderr so that scripts/kconfig/conf will fail as it did before. The better fix would be for make-kpkg to use a supported interface for getting the variables it needs, rather than including Makefile. 
Best, Kevin 1. https://salsa.debian.org/srivasta/kernel-package/blob/master/kernel/ruleset/kernel_version.mk 2. Interestingly, it's not just that the target isn't in no-dot-config-targets. make-kpkg actually sets `dot-config := 1` to force silentoldconfig (since 38399c1[3]). 3. https://salsa.debian.org/srivasta/kernel-package/commit/38399c1
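The "redirect stdin too" fix works because a prompting tool sees immediate EOF from /dev/null instead of blocking on a pipe or terminal. A minimal C model of the principle (the helper is illustrative; /dev/null stands in for the redirected stdin of `make silentoldconfig`):

```c
#include <stdio.h>

/* Model of a tool that prompts on stdin: with stdin from /dev/null it
 * gets EOF immediately and can bail out instead of hanging forever. */
static int demo_prompt(FILE *in)
{
	char answer[64];

	if (!fgets(answer, sizeof(answer), in))
		return -1;	/* EOF: no input source, abort cleanly */
	return 0;		/* got an answer */
}
```

With stdin left attached to a pipe that never delivers data, the same fgets() would block, which is exactly the make-kpkg hang described above.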
Re: [PATCH 4/4] staging: lustre: obdclass: change object lookup to no wait mode
> > > Anyway, I understand that Intel has been ignoring kernel.org instead of > > > forwarding their patches properly so you're doing a difficult > > > and thankless job... Thanks for that. I'm sure it's frustrating to > > > look at these patches for you as well. > > > > Thank you for the compliment. Also thank you for taking time to review > > these patches. Your feedback is most welcome and beneficial to the > > health of the lustre client. > > > > Sadly it's not just Intel but other vendors that don't directly contribute > > to the linux lustre client. I have spoken to the vendors about contributing > > and they all say the same thing. Not working with drivers in the staging > > tree. Sadly all the parties involved are very interested in the success > > of the lustre client. No one has ever told me directly why they don't get > > involved but I suspect it comes down to 2 reasons. One is that staging > > drivers are not normally enabled by distributions so their clients > > normally will never deal with the staging lustre client. Secondly vendors > > just lack the man power to contribute in a meaningful way. > > If staging is hurting you, why is it in staging at all? Why not just > drop it, go off and spend a few months to clean up all the issues in > your own tree (with none of those pesky requirements of easy-to-review > patches) and then submit a "clean" filesystem for inclusion in the > "real" part of the kernel tree? > > It doesn't sound like anyone is actually using this code in the tree > as-is, so why even keep it here? I never said being in staging is hurting the progression of Lustre. In fact it is the exact opposite; otherwise I wouldn't be active in this work. What I was pointing out to Dan was that many vendors are reluctant to participate in broader open source development of this type. The whole point of this is to evolve Lustre into a proper open source project not dependent on vendors for survival. 
Several years ago Lustre changed hands several times and the HPC community was worried about its survival. Various institutions banded together to raise the resources to keep it alive. Over time Lustre has been migrating to a more open source community effort. An awesome example is the work the University of Indiana did for the sptlrpc layer. Now we see efforts expanding into the realm of the linux lustre client. Actually HPC sites that are community members are testing and running the linux client. In spite of the lack of vendor involvement, the linux lustre client is making excellent progress. How often do you see style patches anymore? The headers are properly split between userspace UAPI headers and kernel space. One of the major barriers to leaving staging was the lack of a strong presence to continue moving the lustre client forward. That is no longer the case. The finish line is in view.
Re: [PATCH RFC] schedutil: Allow cpufreq requests to be made even when kthread kicked
On 16-05-18, 15:45, Joel Fernandes (Google) wrote: > kernel/sched/cpufreq_schedutil.c | 36 +--- > 1 file changed, 28 insertions(+), 8 deletions(-) > > diff --git a/kernel/sched/cpufreq_schedutil.c > b/kernel/sched/cpufreq_schedutil.c > index e13df951aca7..a87fc281893d 100644 > --- a/kernel/sched/cpufreq_schedutil.c > +++ b/kernel/sched/cpufreq_schedutil.c > @@ -92,9 +92,6 @@ static bool sugov_should_update_freq(struct sugov_policy > *sg_policy, u64 time) > !cpufreq_can_do_remote_dvfs(sg_policy->policy)) > return false; > > - if (sg_policy->work_in_progress) > - return false; > - > if (unlikely(sg_policy->need_freq_update)) { > sg_policy->need_freq_update = false; > /* > @@ -129,8 +126,11 @@ static void sugov_update_commit(struct sugov_policy > *sg_policy, u64 time, > policy->cur = next_freq; > trace_cpu_frequency(next_freq, smp_processor_id()); > } else { > - sg_policy->work_in_progress = true; > - irq_work_queue(&sg_policy->irq_work); > + /* Don't queue request if one was already queued */ > + if (!sg_policy->work_in_progress) { Merge it above to make it "else if". > + sg_policy->work_in_progress = true; > + irq_work_queue(&sg_policy->irq_work); > + } > } > } > > @@ -291,6 +291,15 @@ static void sugov_update_single(struct update_util_data > *hook, u64 time, > > ignore_dl_rate_limit(sg_cpu, sg_policy); > > + /* > + * For slow-switch systems, single policy requests can't run at the > + * moment if the governor thread is already processing a pending > + * frequency switch request, this can be fixed by acquiring update_lock > + * while updating next_freq and work_in_progress but we prefer not to. > + */ > + if (sg_policy->work_in_progress) > + return; > + @Rafael: Do you think it's worth starting to use the lock now for unshared policies? 
> if (!sugov_should_update_freq(sg_policy, time)) > return; > > @@ -382,13 +391,24 @@ sugov_update_shared(struct update_util_data *hook, u64 > time, unsigned int flags) > static void sugov_work(struct kthread_work *work) > { > struct sugov_policy *sg_policy = container_of(work, struct > sugov_policy, work); > + unsigned int freq; > + unsigned long flags; > + > + /* > + * Hold sg_policy->update_lock shortly to handle the case where: > + * incase sg_policy->next_freq is read here, and then updated by > + * sugov_update_shared just before work_in_progress is set to false > + * here, we may miss queueing the new update. > + */ > + raw_spin_lock_irqsave(&sg_policy->update_lock, flags); > + freq = sg_policy->next_freq; > + sg_policy->work_in_progress = false; > + raw_spin_unlock_irqrestore(&sg_policy->update_lock, flags); > > mutex_lock(&sg_policy->work_lock); > - __cpufreq_driver_target(sg_policy->policy, sg_policy->next_freq, > + __cpufreq_driver_target(sg_policy->policy, freq, > CPUFREQ_RELATION_L); No need of line break anymore. > mutex_unlock(&sg_policy->work_lock); > - > - sg_policy->work_in_progress = false; > } > > static void sugov_irq_work(struct irq_work *irq_work) LGTM. -- viresh
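The key idea in the patch — snapshot next_freq and clear work_in_progress atomically under the update lock, so a racing update is never lost — can be sketched as a small pthread example. Names and types here are illustrative stand-ins, not the scheduler's actual structures:

```c
#include <pthread.h>
#include <stdbool.h>

/* Toy analogue of sugov_policy: the worker must read next_freq and
 * clear work_in_progress as one atomic step under update_lock. */
struct demo_policy {
	pthread_mutex_t update_lock;
	unsigned int next_freq;
	bool work_in_progress;
};

/* Analogue of the locked prologue of sugov_work(): take a consistent
 * snapshot of the frequency to program and reopen the request window. */
static unsigned int demo_work_snapshot(struct demo_policy *p)
{
	unsigned int freq;

	pthread_mutex_lock(&p->update_lock);
	freq = p->next_freq;		/* value the worker will program */
	p->work_in_progress = false;	/* a new request may now be queued */
	pthread_mutex_unlock(&p->update_lock);
	return freq;
}
```

If the two statements were not under one lock, an updater could write next_freq between them and see work_in_progress still true, dropping its request — exactly the miss the comment in the patch describes.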
[PATCH] clk: imx6sl: correct ocram_podf clock type
IMX6SL_CLK_OCRAM_PODF is a busy divider; its name in the CCM_CDHIPR register of the Reference Manual CCM chapter is axi_podf_busy, so correct its clock type. Signed-off-by: Anson Huang --- drivers/clk/imx/clk-imx6sl.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/clk/imx/clk-imx6sl.c b/drivers/clk/imx/clk-imx6sl.c index 9642cdf..66b1dd1 100644 --- a/drivers/clk/imx/clk-imx6sl.c +++ b/drivers/clk/imx/clk-imx6sl.c @@ -330,7 +330,7 @@ static void __init imx6sl_clocks_init(struct device_node *ccm_node) clks[IMX6SL_CLK_PERIPH2] = imx_clk_busy_mux("periph2", base + 0x14, 26, 1, base + 0x48, 3, periph2_sels, ARRAY_SIZE(periph2_sels)); /* name parent_name reg shift width */ - clks[IMX6SL_CLK_OCRAM_PODF] = imx_clk_divider("ocram_podf", "ocram_sel", base + 0x14, 16, 3); + clks[IMX6SL_CLK_OCRAM_PODF] = imx_clk_busy_divider("ocram_podf", "ocram_sel", base + 0x14, 16, 3, base + 0x48, 0); clks[IMX6SL_CLK_PERIPH_CLK2_PODF] = imx_clk_divider("periph_clk2_podf", "periph_clk2_sel", base + 0x14, 27, 3); clks[IMX6SL_CLK_PERIPH2_CLK2_PODF] = imx_clk_divider("periph2_clk2_podf", "periph2_clk2_sel", base + 0x14, 0, 3); clks[IMX6SL_CLK_IPG] = imx_clk_divider("ipg", "ahb", base + 0x14, 8, 2); -- 2.7.4
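What makes a "busy" divider different is the handshake: after the divider is reprogrammed, the CCM sets a busy bit in CCM_CDHIPR until the new ratio takes effect, and the busy-divider clock type polls that bit. The sketch below is an illustrative userspace model of that polling loop, not the i.MX driver code (the `-1` return stands in for -ETIMEDOUT):

```c
#include <stdint.h>

/* Poll a status register until the given busy bit clears, giving up
 * after max_polls iterations. Models imx_clk_busy_divider()'s wait on
 * its CCM_CDHIPR bit after a rate change. */
static int demo_wait_not_busy(const uint32_t *cdhipr, uint32_t busy_bit,
			      unsigned int max_polls)
{
	while (max_polls--) {
		if (!(*cdhipr & busy_bit))
			return 0;	/* divider has settled */
	}
	return -1;			/* stand-in for -ETIMEDOUT */
}
```

Using the plain (non-busy) divider type skips this wait, which is why the patch switches ocram_podf to the busy variant.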
[PATCH] clk: imx6sx: disable unnecessary clocks during clock initialization
Disable those unnecessary clocks during kernel boot up to save power; those module clocks should be managed by their module drivers at runtime. Signed-off-by: Anson Huang --- drivers/clk/imx/clk-imx6sx.c | 7 +-- 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/drivers/clk/imx/clk-imx6sx.c b/drivers/clk/imx/clk-imx6sx.c index 0178ee2..10c771b 100644 --- a/drivers/clk/imx/clk-imx6sx.c +++ b/drivers/clk/imx/clk-imx6sx.c @@ -97,12 +97,7 @@ static int const clks_init_on[] __initconst = { IMX6SX_CLK_IPMUX1, IMX6SX_CLK_IPMUX2, IMX6SX_CLK_IPMUX3, IMX6SX_CLK_WAKEUP, IMX6SX_CLK_MMDC_P0_FAST, IMX6SX_CLK_MMDC_P0_IPG, IMX6SX_CLK_ROM, IMX6SX_CLK_ARM, IMX6SX_CLK_IPG, IMX6SX_CLK_OCRAM, - IMX6SX_CLK_PER2_MAIN, IMX6SX_CLK_PERCLK, IMX6SX_CLK_M4, - IMX6SX_CLK_QSPI1, IMX6SX_CLK_QSPI2, IMX6SX_CLK_UART_IPG, - IMX6SX_CLK_UART_SERIAL, IMX6SX_CLK_I2C3, IMX6SX_CLK_ECSPI5, - IMX6SX_CLK_CAN1_IPG, IMX6SX_CLK_CAN1_SERIAL, IMX6SX_CLK_CAN2_IPG, - IMX6SX_CLK_CAN2_SERIAL, IMX6SX_CLK_CANFD, IMX6SX_CLK_EPIT1, - IMX6SX_CLK_EPIT2, + IMX6SX_CLK_PER2_MAIN, IMX6SX_CLK_PERCLK, IMX6SX_CLK_TZASC1, }; static const struct clk_div_table clk_enet_ref_table[] = { -- 2.7.4
[PATCH v7 2/3] gpio: pca953x: define masks for addressing common and extended registers
These mask bits are to be used to map the extended register addresses (which are defined for an unsupported 8-bit pcal chip) to 16 and 24 bit chips (pcal6524). Reviewed-by: Andy Shevchenko Signed-off-by: H. Nikolaus Schaller --- drivers/gpio/gpio-pca953x.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/gpio/gpio-pca953x.c b/drivers/gpio/gpio-pca953x.c index 2b667166e855..c682921d7019 100644 --- a/drivers/gpio/gpio-pca953x.c +++ b/drivers/gpio/gpio-pca953x.c @@ -56,6 +56,10 @@ #define PCAL6524_DEBOUNCE 0x2d #define PCA_GPIO_MASK 0x00FF + +#define PCAL_GPIO_MASK 0x1f +#define PCAL_PINCTRL_MASK 0xe0 + #define PCA_INT 0x0100 #define PCA_PCAL 0x0200 #define PCA_LATCH_INT (PCA_PCAL | PCA_INT) -- 2.12.2
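How the two new masks split an extended register address can be shown with a tiny standalone example. The helper names are illustrative, not the driver's API; the mask values and the PCAL6524_DEBOUNCE address (0x2d) come from the patch itself:

```c
/* Same values the patch adds to gpio-pca953x.c. */
#define PCAL_GPIO_MASK    0x1f	/* low bits: register within a bank */
#define PCAL_PINCTRL_MASK 0xe0	/* high bits: extended/pinctrl bank */

/* Split an 8-bit-chip extended register address into its bank part and
 * its per-bank register part, so it can be rescaled for wider chips. */
static unsigned int pcal_bank_part(unsigned int reg)
{
	return reg & PCAL_PINCTRL_MASK;
}

static unsigned int pcal_reg_part(unsigned int reg)
{
	return reg & PCAL_GPIO_MASK;
}
```

Since the two masks are complementary over 8 bits, bank part and register part always recombine to the original address.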
Re: linux-next: build failure after merge of the drm tree
I've applied this locally for now so I can continue arm64 builds :-) Dave. On 16 May 2018 at 18:09, Oded Gabbay wrote: > On Wed, May 16, 2018 at 9:53 AM, Stephen Rothwell > wrote: >> Hi all, >> >> After merging the drm tree, today's linux-next build (powerpc >> allyesconfig) failed like this: >> >> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c: In function >> 'init_user_pages': >> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c:632:3: error: implicit >> declaration of function 'release_pages'; did you mean 'release_task'? >> [-Werror=implicit-function-declaration] >> release_pages(mem->user_pages, bo->tbo.ttm->num_pages); >> ^ >> release_task >> >> Caused by commit >> >> 5ae0283e831a ("drm/amdgpu: Add userptr support for KFD") >> >> I have applied the following patch for today: >> >> From: Stephen Rothwell >> Date: Wed, 16 May 2018 16:43:34 +1000 >> Subject: [PATCH] drm/amdgpu: include pagemap.h for release_pages() >> >> Fixes: 5ae0283e831a ("drm/amdgpu: Add userptr support for KFD") >> Cc: Felix Kuehling >> Cc: Oded Gabbay >> Signed-off-by: Stephen Rothwell >> --- >> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 1 + >> 1 file changed, 1 insertion(+) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >> index 72ab2b1ffe75..ff8fd75f7ca5 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >> @@ -23,6 +23,7 @@ >> #define pr_fmt(fmt) "kfd2kgd: " fmt >> >> #include >> +#include <linux/pagemap.h> >> #include >> #include >> #include "amdgpu_object.h" >> -- >> 2.17.0 >> >> -- >> Cheers, >> Stephen Rothwell > > Thanks Stephen, > > I'll add it to amdkfd-next and send it to Dave with other fixes. > > Oded > ___ > dri-devel mailing list > dri-de...@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH v7 3/3] gpio: pca953x: fix address calculation for pcal6524
The register constants are so far defined in a way that they fit for the pcal9555a when shifted by the number of banks, i.e. are multiplied by 2 in the accessor function. Now, the pcal6524 has 3 banks, which means the relative offset is multiplied by 4 for the standard registers. Simply applying the bit shift to the extended registers gives a wrong result, since the base offset is already included in the offset. Therefore, we have to add code to the 24-bit accessor functions that adjusts the register number for these extended registers. The formula finally used was developed and proposed by Andy Shevchenko. Suggested-by: Andy Shevchenko Signed-off-by: H. Nikolaus Schaller --- drivers/gpio/gpio-pca953x.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/gpio/gpio-pca953x.c b/drivers/gpio/gpio-pca953x.c index c682921d7019..4ad553f4e41f 100644 --- a/drivers/gpio/gpio-pca953x.c +++ b/drivers/gpio/gpio-pca953x.c @@ -222,9 +222,11 @@ static int pca957x_write_regs_16(struct pca953x_chip *chip, int reg, u8 *val) static int pca953x_write_regs_24(struct pca953x_chip *chip, int reg, u8 *val) { int bank_shift = fls((chip->gpio_chip.ngpio - 1) / BANK_SZ); + int addr = (reg & PCAL_GPIO_MASK) << bank_shift; + int pinctrl = (reg & PCAL_PINCTRL_MASK) << 1; return i2c_smbus_write_i2c_block_data(chip->client, - (reg << bank_shift) | REG_ADDR_AI, + pinctrl | addr | REG_ADDR_AI, NBANK(chip), val); } @@ -264,9 +266,11 @@ static int pca953x_read_regs_16(struct pca953x_chip *chip, int reg, u8 *val) static int pca953x_read_regs_24(struct pca953x_chip *chip, int reg, u8 *val) { int bank_shift = fls((chip->gpio_chip.ngpio - 1) / BANK_SZ); + int addr = (reg & PCAL_GPIO_MASK) << bank_shift; + int pinctrl = (reg & PCAL_PINCTRL_MASK) << 1; return i2c_smbus_read_i2c_block_data(chip->client, -(reg << bank_shift) | REG_ADDR_AI, +pinctrl | addr | REG_ADDR_AI, NBANK(chip), val); } -- 2.12.2
[PATCH v7 1/3] gpio: pca953x: set the PCA_PCAL flag also when matching by DT
The of_device_table is missing the PCA_PCAL flag, so the pcal6524 would be operated in tca6424 compatibility mode, which does not handle the new interrupt mask registers. Suggested-by: Andy Shevchenko Signed-off-by: H. Nikolaus Schaller --- drivers/gpio/gpio-pca953x.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/gpio/gpio-pca953x.c b/drivers/gpio/gpio-pca953x.c index 7d37692d672e..2b667166e855 100644 --- a/drivers/gpio/gpio-pca953x.c +++ b/drivers/gpio/gpio-pca953x.c @@ -58,6 +58,7 @@ #define PCA_GPIO_MASK 0x00FF #define PCA_INT 0x0100 #define PCA_PCAL 0x0200 +#define PCA_LATCH_INT (PCA_PCAL | PCA_INT) #define PCA953X_TYPE 0x1000 #define PCA957X_TYPE 0x2000 #define PCA_TYPE_MASK 0xF000 @@ -946,8 +947,8 @@ static const struct of_device_id pca953x_dt_ids[] = { { .compatible = "nxp,pca9575", .data = OF_957X(16, PCA_INT), }, { .compatible = "nxp,pca9698", .data = OF_953X(40, 0), }, - { .compatible = "nxp,pcal6524", .data = OF_953X(24, PCA_INT), }, - { .compatible = "nxp,pcal9555a", .data = OF_953X(16, PCA_INT), }, + { .compatible = "nxp,pcal6524", .data = OF_953X(24, PCA_LATCH_INT), }, + { .compatible = "nxp,pcal9555a", .data = OF_953X(16, PCA_LATCH_INT), }, { .compatible = "maxim,max7310", .data = OF_953X( 8, 0), }, { .compatible = "maxim,max7312", .data = OF_953X(16, PCA_INT), }, -- 2.12.2
[PATCH v7 0/3] pcal6524 extensions and fixes for pca953x driver
V7: * replace PCAL register masks by hex constants 2018-05-16 19:01:29: V6: * added proper attribution to the formula used for fixing the pcal6524 register address (changes commit message only) * add back missing first patch from V2 that defines the PCA_LATCH_INT constant * removed patches already merged 2018-04-28 18:33:42: V5: * fix wrong split up between patches 1/7 and 2/7. 2018-04-26 19:35:07: V4: * introduced PCA_LATCH_INT constant to make of_table more readable (suggested by Andy Shevchenko) * converted all register constants to hex in a separate patch (suggested by Andy Shevchenko) * separated additional pcal953x and pcal6524 register definitions into separate patches (suggested by Andy Shevchenko) * made special pcal6524 address adjustment more readable (suggested by Andy Shevchenko) * moved gpio-controller and interrupt-controller to the "required" section (reviewed by Rob Herring) 2018-04-10 18:07:07: V3: * add Reported-by: and Reviewed-by: * fix wording for bindings description and example * convert all register offsets to hex * omit the LEVEL-IRQ RFC/hack commit 2018-04-04 21:00:27: V2: * added PCA_PCAL flags if matched through of-table * fix address calculation for extended PCAL6524 registers * hack to map LEVEL_LOW to EDGE_FALLING to be able to test in combination with ts3a227e driver * improve description of bindings for optional vcc-supply and interrupt-controller; 2018-03-10 09:32:53: no initial description H. Nikolaus Schaller (3): gpio: pca953x: set the PCA_PCAL flag also when matching by DT gpio: pca953x: define masks for addressing common and extended registers gpio: pca953x: fix address calculation for pcal6524 drivers/gpio/gpio-pca953x.c | 17 + 1 file changed, 13 insertions(+), 4 deletions(-) -- 2.12.2
Re: [PATCH] perf annotate: Support multiple events without group
Hi, Any comments for this fix? Thanks Jin Yao On 5/10/2018 9:59 PM, Jin Yao wrote: See example, perf record -e cycles,branches ./div perf annotate main --stdio or perf annotate main --stdio2 or perf annotate main The "perf annotate" should show both cycles and branches on the left side, but actually it only shows one event, cycles. It works with an events group like: perf record -e "{cycles,branches}" ./div It should work too even without a group. This patch uses 'symbol_conf.nr_events > 1' to check if it's the multiple-events case. With this patch, perf record -e cycles,branches ./div perf annotate main --stdio2 .. for (i = 0; i < 20; i++) { flag = compute_flag(); 4.85 5.83 38: xor %eax,%eax 0.01 0.01 → callq compute_flag count++; 4.44 2.27 mov count,%edx 0.00 0.00 add $0x1,%edx .. Signed-off-by: Jin Yao --- tools/perf/util/annotate.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c index 5d74a30..ca8d4b1 100644 --- a/tools/perf/util/annotate.c +++ b/tools/perf/util/annotate.c @@ -1054,6 +1054,8 @@ annotation_line__new(struct annotate_args *args, size_t privsize) if (perf_evsel__is_group_event(evsel)) nr = evsel->nr_members; + else if (symbol_conf.nr_events > nr) + nr = symbol_conf.nr_events; size += sizeof(al->samples[0]) * nr; @@ -1326,6 +1328,8 @@ annotation_line__print(struct annotation_line *al, struct symbol *sym, u64 start if (perf_evsel__is_group_event(evsel)) width *= evsel->nr_members; + else if (symbol_conf.nr_events > 1) + width *= symbol_conf.nr_events; if (!*al->line) printf(" %*s:\n", width, " "); @@ -1967,6 +1971,8 @@ int symbol__annotate_printf(struct symbol *sym, struct map *map, if (perf_evsel__is_group_event(evsel)) width *= evsel->nr_members; + else if (symbol_conf.nr_events > 1) + width *= symbol_conf.nr_events; graph_dotted_len = printf(" %-*.*s| Source code & Disassembly of %s for %s (%" PRIu64 " samples)\n", width, width, symbol_conf.show_total_period ? "Period" : @@ -2578,6 +2584,8 @@ int symbol__annotate2(struct symbol *sym, struct map *map, struct perf_evsel *ev if (perf_evsel__is_group_event(evsel)) nr_pcnt = evsel->nr_members; + else if (symbol_conf.nr_events > nr_pcnt) + nr_pcnt = symbol_conf.nr_events; err = symbol__annotate(sym, map, evsel, 0, parch); if (err)
Re: [PATCH v5 13/13] mm: Clear shrinker bit if there are no objects related to memcg
On Tue, May 15, 2018 at 11:55:04AM +0300, Kirill Tkhai wrote: > >> @@ -586,8 +586,23 @@ static unsigned long shrink_slab_memcg(gfp_t > >> gfp_mask, int nid, > >> continue; > >> > >> ret = do_shrink_slab(&sc, shrinker, priority); > >> - if (ret == SHRINK_EMPTY) > >> - ret = 0; > >> + if (ret == SHRINK_EMPTY) { > >> + clear_bit(i, map->map); > >> + /* > >> + * Pairs with mb in memcg_set_shrinker_bit(): > >> + * > >> + * list_lru_add() shrink_slab_memcg() > >> + * list_add_tail() clear_bit() > >> + * > >> + * set_bit() do_shrink_slab() > >> + */ > > > > Please improve the comment so that it isn't just a diagram. > > Please, say, which comment you want to see here. I want the reader to understand why we need to invoke the shrinker twice if it returns SHRINK_EMPTY. The diagram doesn't really help here IMO. So I'd write something like this: ret = do_shrink_slab(&sc, shrinker, priority); if (ret == SHRINK_EMPTY) { clear_bit(i, map->map); /* * After the shrinker reported that it had no objects to free, * but before we cleared the corresponding bit in the memcg * shrinker map, a new object might have been added. To make * sure we have the bit set in this case, we invoke the * shrinker one more time and re-set the bit if it reports that * it is not empty anymore. The memory barrier here pairs with * the barrier in memcg_set_shrinker_bit(): * * list_lru_add() shrink_slab_memcg() * list_add_tail() clear_bit() * * set_bit() do_shrink_slab() */ smp_mb__after_atomic(); ret = do_shrink_slab(&sc, shrinker, priority); if (ret == SHRINK_EMPTY) ret = 0; else memcg_set_shrinker_bit(memcg, nid, i);
[PATCH 3/6] workqueue: Make worker_attach/detach_pool() update worker->pool
For historical reasons, the worker attach/detach functions don't currently manage worker->pool and the callers are manually and inconsistently updating it. This patch moves worker->pool updates into the worker attach/detach functions. This makes worker->pool consistent and clearly defines how worker->pool updates are synchronized. This will help later workqueue visibility improvements by allowing safe access to workqueue information from worker->task. Signed-off-by: Tejun Heo --- kernel/workqueue.c | 16 ++++++++-------- kernel/workqueue_internal.h | 2 +- 2 files changed, 9 insertions(+), 9 deletions(-) diff --git a/kernel/workqueue.c b/kernel/workqueue.c index 91fe0a6..2fde50f 100644 --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -1741,6 +1741,7 @@ static void worker_attach_to_pool(struct worker *worker, worker->flags |= WORKER_UNBOUND; list_add_tail(&worker->node, &pool->workers); + worker->pool = pool; mutex_unlock(&wq_pool_attach_mutex); } @@ -1748,19 +1749,21 @@ static void worker_attach_to_pool(struct worker *worker, /** * worker_detach_from_pool() - detach a worker from its pool * @worker: worker which is attached to its pool - * @pool: the pool @worker is attached to * * Undo the attaching which had been done in worker_attach_to_pool(). The * caller worker shouldn't access to the pool after detached except it has * other reference to the pool. */ -static void worker_detach_from_pool(struct worker *worker, - struct worker_pool *pool) +static void worker_detach_from_pool(struct worker *worker) { + struct worker_pool *pool = worker->pool; struct completion *detach_completion = NULL; mutex_lock(&wq_pool_attach_mutex); + list_del(&worker->node); + worker->pool = NULL; + if (list_empty(&pool->workers)) detach_completion = pool->detach_completion; mutex_unlock(&wq_pool_attach_mutex); @@ -1799,7 +1802,6 @@ static struct worker *create_worker(struct worker_pool *pool) if (!worker) goto fail; - worker->pool = pool; worker->id = id; if (pool->cpu >= 0) @@ -2236,7 +2238,7 @@ static int worker_thread(void *__worker) set_task_comm(worker->task, "kworker/dying"); ida_simple_remove(&pool->worker_ida, worker->id); - worker_detach_from_pool(worker, pool); + worker_detach_from_pool(worker); kfree(worker); return 0; } @@ -2367,7 +2369,6 @@ static int rescuer_thread(void *__rescuer) worker_attach_to_pool(rescuer, pool); spin_lock_irq(&pool->lock); - rescuer->pool = pool; /* * Slurp in all works issued via this workqueue and @@ -2417,10 +2418,9 @@ static int rescuer_thread(void *__rescuer) if (need_more_worker(pool)) wake_up_worker(pool); - rescuer->pool = NULL; spin_unlock_irq(&pool->lock); - worker_detach_from_pool(rescuer, pool); + worker_detach_from_pool(rescuer); spin_lock_irq(&wq_mayday_lock); } diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h index d390d1b..4a182e0 100644 --- a/kernel/workqueue_internal.h +++ b/kernel/workqueue_internal.h @@ -37,7 +37,7 @@ struct worker { /* 64 bytes boundary on 64bit, 32 on 32bit */ struct task_struct *task; /* I: worker task */ - struct worker_pool *pool; /* I: the associated pool */ + struct worker_pool *pool; /* A: the associated pool */ /* L: for rescuers */ struct list_head node; /* A: anchored at pool->workers */ /* A: runs through worker->node */ -- 2.9.5
[PATCH 2/6] workqueue: Replace pool->attach_mutex with global wq_pool_attach_mutex
To improve workqueue visibility, we want to be able to access workqueue information from worker tasks. The per-pool attach mutex makes that difficult because there's no way of stabilizing task -> worker pool association without knowing the pool first. Worker attach/detach is a slow path and there's no need for different pools to be able to perform them concurrently. This patch replaces the per-pool attach_mutex with global wq_pool_attach_mutex to prepare for visibility improvement changes. Signed-off-by: Tejun Heo --- kernel/workqueue.c | 41 ++++++++++++++++++++--------------------- 1 file changed, 20 insertions(+), 21 deletions(-) diff --git a/kernel/workqueue.c b/kernel/workqueue.c index ca7959b..91fe0a6 100644 --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -66,7 +66,7 @@ enum { * be executing on any CPU. The pool behaves as an unbound one. * * Note that DISASSOCIATED should be flipped only while holding -* attach_mutex to avoid changing binding state while +* wq_pool_attach_mutex to avoid changing binding state while * worker_attach_to_pool() is in progress. */ POOL_MANAGER_ACTIVE = 1 << 0, /* being managed */ @@ -123,7 +123,7 @@ enum { *cpu or grabbing pool->lock is enough for read access. If *POOL_DISASSOCIATED is set, it's identical to L. * - * A: pool->attach_mutex protected. + * A: wq_pool_attach_mutex protected. * * PL: wq_pool_mutex protected. * @@ -166,7 +166,6 @@ struct worker_pool { /* L: hash of busy workers */ struct worker *manager; /* L: purely informational */ - struct mutex attach_mutex; /* attach/detach exclusion */ struct list_head workers; /* A: attached workers */ struct completion *detach_completion; /* all workers detached */ @@ -297,6 +296,7 @@ static bool wq_numa_enabled; /* unbound NUMA affinity enabled */ static struct workqueue_attrs *wq_update_unbound_numa_attrs_buf; static DEFINE_MUTEX(wq_pool_mutex); /* protects pools and workqueues list */ +static DEFINE_MUTEX(wq_pool_attach_mutex); /* protects worker attach/detach */ static DEFINE_SPINLOCK(wq_mayday_lock); /* protects wq->maydays list */ static DECLARE_WAIT_QUEUE_HEAD(wq_manager_wait); /* wait for manager to go away */ @@ -399,14 +399,14 @@ static void workqueue_sysfs_unregister(struct workqueue_struct *wq); * @worker: iteration cursor * @pool: worker_pool to iterate workers of * - * This must be called with @pool->attach_mutex. + * This must be called with wq_pool_attach_mutex. * * The if/else clause exists only for the lockdep assertion and can be * ignored. */ #define for_each_pool_worker(worker, pool) \ list_for_each_entry((worker), &(pool)->workers, node) \ - if (({ lockdep_assert_held(&pool->attach_mutex); false; })) { } \ + if (({ lockdep_assert_held(&wq_pool_attach_mutex); false; })) { } \ else /** @@ -1724,7 +1724,7 @@ static struct worker *alloc_worker(int node) static void worker_attach_to_pool(struct worker *worker, struct worker_pool *pool) { - mutex_lock(&pool->attach_mutex); + mutex_lock(&wq_pool_attach_mutex); /* * set_cpus_allowed_ptr() will fail if the cpumask doesn't have any @@ -1733,16 +1733,16 @@ static void worker_attach_to_pool(struct worker *worker, set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask); /* -* The pool->attach_mutex ensures %POOL_DISASSOCIATED remains -* stable across this function. See the comments above the -* flag definition for details. +* The wq_pool_attach_mutex ensures %POOL_DISASSOCIATED remains +* stable across this function. See the comments above the flag +* definition for details. */ if (pool->flags & POOL_DISASSOCIATED) worker->flags |= WORKER_UNBOUND; list_add_tail(&worker->node, &pool->workers); - mutex_unlock(&pool->attach_mutex); + mutex_unlock(&wq_pool_attach_mutex); } /** @@ -1759,11 +1759,11 @@ static void worker_detach_from_pool(struct worker *worker, { struct completion *detach_completion = NULL; - mutex_lock(&pool->attach_mutex); + mutex_lock(&wq_pool_attach_mutex); list_del(&worker->node); if (list_empty(&pool->workers)) detach_completion = pool->detach_completion; - mutex_unlock(&pool->attach_mutex); + mutex_unlock(&wq_pool_attach_mutex); /* clear leftover flags without pool->lock after it is detached */ worker->flags &= ~(WORKER_UNBOUND | WORKER_REBOUND); @@ -3271,7 +3271,6 @@ static int init_worker_pool(struct worker_pool *pool) timer_setup(&pool->mayday_timer, pool_mayday_timeout, 0); - mutex_init(&pool->attach_mutex);
[PATCH 3/6] workqueue: Make worker_attach/detach_pool() update worker->pool
For historical reasons, the worker attach/detach functions don't
currently manage worker->pool and the callers are manually and
inconsistently updating it.

This patch moves worker->pool updates into the worker attach/detach
functions.  This makes worker->pool consistent and clearly defines how
worker->pool updates are synchronized.

This will help later workqueue visibility improvements by allowing
safe access to workqueue information from worker->task.

Signed-off-by: Tejun Heo
---
 kernel/workqueue.c          | 16
 kernel/workqueue_internal.h |  2 +-
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 91fe0a6..2fde50f 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1741,6 +1741,7 @@ static void worker_attach_to_pool(struct worker *worker,
 		worker->flags |= WORKER_UNBOUND;
 
 	list_add_tail(&worker->node, &pool->workers);
+	worker->pool = pool;
 
 	mutex_unlock(&wq_pool_attach_mutex);
 }
@@ -1748,19 +1749,21 @@ static void worker_attach_to_pool(struct worker *worker,
 /**
  * worker_detach_from_pool() - detach a worker from its pool
  * @worker: worker which is attached to its pool
- * @pool: the pool @worker is attached to
  *
  * Undo the attaching which had been done in worker_attach_to_pool().  The
  * caller worker shouldn't access to the pool after detached except it has
  * other reference to the pool.
  */
-static void worker_detach_from_pool(struct worker *worker,
-				    struct worker_pool *pool)
+static void worker_detach_from_pool(struct worker *worker)
 {
+	struct worker_pool *pool = worker->pool;
 	struct completion *detach_completion = NULL;
 
 	mutex_lock(&wq_pool_attach_mutex);
+
 	list_del(&worker->node);
+	worker->pool = NULL;
+
 	if (list_empty(&pool->workers))
 		detach_completion = pool->detach_completion;
 	mutex_unlock(&wq_pool_attach_mutex);
@@ -1799,7 +1802,6 @@ static struct worker *create_worker(struct worker_pool *pool)
 	if (!worker)
 		goto fail;
 
-	worker->pool = pool;
 	worker->id = id;
 
 	if (pool->cpu >= 0)
@@ -2236,7 +2238,7 @@ static int worker_thread(void *__worker)
 
 		set_task_comm(worker->task, "kworker/dying");
 		ida_simple_remove(&pool->worker_ida, worker->id);
-		worker_detach_from_pool(worker, pool);
+		worker_detach_from_pool(worker);
 		kfree(worker);
 		return 0;
 	}
@@ -2367,7 +2369,6 @@ static int rescuer_thread(void *__rescuer)
 		worker_attach_to_pool(rescuer, pool);
 
 		spin_lock_irq(&pool->lock);
-		rescuer->pool = pool;
 
 		/*
 		 * Slurp in all works issued via this workqueue and
@@ -2417,10 +2418,9 @@ static int rescuer_thread(void *__rescuer)
 		if (need_more_worker(pool))
 			wake_up_worker(pool);
 
-		rescuer->pool = NULL;
 		spin_unlock_irq(&pool->lock);
 
-		worker_detach_from_pool(rescuer, pool);
+		worker_detach_from_pool(rescuer);
 
 		spin_lock_irq(&wq_mayday_lock);
 	}
diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h
index d390d1b..4a182e0 100644
--- a/kernel/workqueue_internal.h
+++ b/kernel/workqueue_internal.h
@@ -37,7 +37,7 @@ struct worker {
 	/* 64 bytes boundary on 64bit, 32 on 32bit */
 
 	struct task_struct	*task;		/* I: worker task */
-	struct worker_pool	*pool;		/* I: the associated pool */
+	struct worker_pool	*pool;		/* A: the associated pool */
 						/* L: for rescuers */
 	struct list_head	node;		/* A: anchored at pool->workers */
 						/* A: runs through worker->node */
-- 
2.9.5
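The key idea of the patch above is that the function which takes the attach lock is also the only place that writes the owner pointer, so callers can no longer update it inconsistently. A minimal userspace sketch of that pattern, using pthreads and hypothetical `pool`/`worker` stand-ins (not the kernel types):

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for worker_pool/worker; not the kernel structs. */
struct pool { int nr_workers; };
struct worker { struct pool *pool; };

/* Single lock guarding attach/detach, mirroring wq_pool_attach_mutex. */
static pthread_mutex_t attach_lock = PTHREAD_MUTEX_INITIALIZER;

/* Attach updates worker->pool itself, so callers can't forget it. */
static void attach(struct worker *w, struct pool *p)
{
	pthread_mutex_lock(&attach_lock);
	p->nr_workers++;
	w->pool = p;		/* association set in exactly one place */
	pthread_mutex_unlock(&attach_lock);
}

/* Detach reads the pool from the worker, so no @pool argument is needed. */
static void detach(struct worker *w)
{
	pthread_mutex_lock(&attach_lock);
	struct pool *p = w->pool;
	w->pool = NULL;		/* association cleared in exactly one place */
	p->nr_workers--;
	pthread_mutex_unlock(&attach_lock);
}
```

This also explains why `worker_detach_from_pool()` can drop its `@pool` argument: the worker itself now reliably knows its pool.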
[PATCH 2/6] workqueue: Replace pool->attach_mutex with global wq_pool_attach_mutex
To improve workqueue visibility, we want to be able to access
workqueue information from worker tasks.  The per-pool attach mutex
makes that difficult because there's no way of stabilizing task ->
worker pool association without knowing the pool first.

Worker attach/detach is a slow path and there's no need for different
pools to be able to perform them concurrently.  This patch replaces
the per-pool attach_mutex with global wq_pool_attach_mutex to prepare
for visibility improvement changes.

Signed-off-by: Tejun Heo
---
 kernel/workqueue.c | 41
 1 file changed, 20 insertions(+), 21 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index ca7959b..91fe0a6 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -66,7 +66,7 @@ enum {
 	 * be executing on any CPU.  The pool behaves as an unbound one.
 	 *
 	 * Note that DISASSOCIATED should be flipped only while holding
-	 * attach_mutex to avoid changing binding state while
+	 * wq_pool_attach_mutex to avoid changing binding state while
 	 * worker_attach_to_pool() is in progress.
 	 */
 	POOL_MANAGER_ACTIVE	= 1 << 0,	/* being managed */
@@ -123,7 +123,7 @@ enum {
 	 *     cpu or grabbing pool->lock is enough for read access.  If
 	 *     POOL_DISASSOCIATED is set, it's identical to L.
 	 *
- * A: pool->attach_mutex protected.
+ * A: wq_pool_attach_mutex protected.
  *
  * PL: wq_pool_mutex protected.
@@ -166,7 +166,6 @@ struct worker_pool {
 						/* L: hash of busy workers */
 
 	struct worker		*manager;	/* L: purely informational */
-	struct mutex		attach_mutex;	/* attach/detach exclusion */
 	struct list_head	workers;	/* A: attached workers */
 	struct completion	*detach_completion; /* all workers detached */
 
@@ -297,6 +296,7 @@ static bool wq_numa_enabled;		/* unbound NUMA affinity enabled */
 static struct workqueue_attrs *wq_update_unbound_numa_attrs_buf;
 
 static DEFINE_MUTEX(wq_pool_mutex);	/* protects pools and workqueues list */
+static DEFINE_MUTEX(wq_pool_attach_mutex); /* protects worker attach/detach */
 static DEFINE_SPINLOCK(wq_mayday_lock);	/* protects wq->maydays list */
 static DECLARE_WAIT_QUEUE_HEAD(wq_manager_wait); /* wait for manager to go away */
 
@@ -399,14 +399,14 @@ static void workqueue_sysfs_unregister(struct workqueue_struct *wq);
 * @worker: iteration cursor
 * @pool: worker_pool to iterate workers of
 *
- * This must be called with wq_pool_attach_mutex.
+ * This must be called with wq_pool_attach_mutex.
 *
 * The if/else clause exists only for the lockdep assertion and can be
 * ignored.
 */
 #define for_each_pool_worker(worker, pool)				\
 	list_for_each_entry((worker), &(pool)->workers, node)		\
-		if (({ lockdep_assert_held(&pool->attach_mutex); false; })) { } \
+		if (({ lockdep_assert_held(&wq_pool_attach_mutex); false; })) { } \
 		else
 
 /**
@@ -1724,7 +1724,7 @@ static struct worker *alloc_worker(int node)
 static void worker_attach_to_pool(struct worker *worker,
 				  struct worker_pool *pool)
 {
-	mutex_lock(&pool->attach_mutex);
+	mutex_lock(&wq_pool_attach_mutex);
 
 	/*
	 * set_cpus_allowed_ptr() will fail if the cpumask doesn't have any
@@ -1733,16 +1733,16 @@ static void worker_attach_to_pool(struct worker *worker,
 	set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
 
 	/*
-	 * The pool->attach_mutex ensures %POOL_DISASSOCIATED remains
-	 * stable across this function.  See the comments above the
-	 * flag definition for details.
+	 * The wq_pool_attach_mutex ensures %POOL_DISASSOCIATED remains
+	 * stable across this function.  See the comments above the flag
+	 * definition for details.
	 */
 	if (pool->flags & POOL_DISASSOCIATED)
 		worker->flags |= WORKER_UNBOUND;
 
 	list_add_tail(&worker->node, &pool->workers);
-	mutex_unlock(&pool->attach_mutex);
+	mutex_unlock(&wq_pool_attach_mutex);
 }
 
 /**
@@ -1759,11 +1759,11 @@ static void worker_detach_from_pool(struct worker *worker,
 {
 	struct completion *detach_completion = NULL;
 
-	mutex_lock(&pool->attach_mutex);
+	mutex_lock(&wq_pool_attach_mutex);
 	list_del(&worker->node);
 	if (list_empty(&pool->workers))
 		detach_completion = pool->detach_completion;
-	mutex_unlock(&pool->attach_mutex);
+	mutex_unlock(&wq_pool_attach_mutex);
 
 	/* clear leftover flags without pool->lock after it is detached */
 	worker->flags &= ~(WORKER_UNBOUND | WORKER_REBOUND);
@@ -3271,7 +3271,6 @@ static int init_worker_pool(struct worker_pool *pool)
 	timer_setup(&pool->mayday_timer, pool_mayday_timeout, 0);
 
-	mutex_init(&pool->attach_mutex);
 	INIT_LIST_HEAD(&pool->workers);
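The chicken-and-egg problem the commit message describes is worth spelling out: with a per-pool lock, you need the pool pointer to take the lock that protects the pool pointer. A global lock breaks that cycle. A small userspace sketch with illustrative (non-kernel) types:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Illustrative types only; not the kernel's worker/worker_pool. */
struct pool { const char *name; };
struct worker { struct pool *pool; };

/* Analogue of wq_pool_attach_mutex: one lock for all attach/detach. */
static pthread_mutex_t global_attach_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * With a single global lock, a reader can stabilize the worker->pool
 * association without knowing the pool in advance.  A per-pool lock
 * cannot do this: you would need the pool pointer to take the very
 * lock that protects the pool pointer.
 */
static const char *pool_name_of(struct worker *w)
{
	const char *name = NULL;

	pthread_mutex_lock(&global_attach_lock);
	if (w->pool)		/* worker can't detach while we hold the lock */
		name = w->pool->name;
	pthread_mutex_unlock(&global_attach_lock);
	return name;
}
```

Since attach/detach is a slow path, serializing all pools on one mutex costs essentially nothing, which is the trade-off the patch relies on.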
[PATCH 6/6] workqueue: Show the latest workqueue name in /proc/PID/{comm,stat,status}
There can be a lot of workqueue workers and they all show up with the
cryptic kworker/* names making it difficult to understand which is
doing what and how they came to be.

  # ps -ef | grep kworker
  root      4  2  0 Feb25 ?  00:00:00 [kworker/0:0H]
  root      6  2  0 Feb25 ?  00:00:00 [kworker/u112:0]
  root     19  2  0 Feb25 ?  00:00:00 [kworker/1:0H]
  root     25  2  0 Feb25 ?  00:00:00 [kworker/2:0H]
  root     31  2  0 Feb25 ?  00:00:00 [kworker/3:0H]
  ...

This patch makes workqueue workers report the latest workqueue it was
executing for through /proc/PID/{comm,stat,status}.  The extra
information is appended to the kthread name with intervening '+' if
currently executing, otherwise '-'.

  # cat /proc/25/comm
  kworker/2:0-events_power_efficient

  # cat /proc/25/stat
  25 (kworker/2:0-events_power_efficient) I 2 0 0 0 -1 69238880 0 0...

  # grep Name /proc/25/status
  Name:	kworker/2:0-events_power_efficient

Unfortunately, ps(1) truncates comm to 15 characters,

  # ps 25
  PID TTY  STAT  TIME COMMAND
   25 ?    I     0:00 [kworker/2:0-eve]

making it a lot less useful; however, this should be an easy fix from
ps(1) side.

Signed-off-by: Tejun Heo
Suggested-by: Linus Torvalds
Cc: Craig Small
---
 fs/proc/array.c           |  7 +--
 include/linux/workqueue.h |  1 +
 kernel/workqueue.c        | 39 +++
 3 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/fs/proc/array.c b/fs/proc/array.c
index f29221e..bb1d361 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -99,10 +99,13 @@ void proc_task_name(struct seq_file *m, struct task_struct *p, bool escape)
 {
 	char *buf;
 	size_t size;
-	char tcomm[sizeof(p->comm)];
+	char tcomm[64];
 	int ret;
 
-	get_task_comm(tcomm, p);
+	if (p->flags & PF_WQ_WORKER)
+		wq_worker_comm(tcomm, sizeof(tcomm), p);
+	else
+		__get_task_comm(tcomm, sizeof(tcomm), p);
 
 	size = seq_get_buf(m, &buf);
 
 	if (escape) {
diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 39a0e21..60d673e 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -494,6 +494,7 @@ extern unsigned int work_busy(struct work_struct *work);
 extern __printf(1, 2) void set_worker_desc(const char *fmt, ...);
 extern void print_worker_info(const char *log_lvl, struct task_struct *task);
 extern void show_workqueue_state(void);
+extern void wq_worker_comm(char *buf, size_t size, struct task_struct *task);
 
 /**
 * queue_work - queue work on a workqueue
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 3fbe007..b4a39a1 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -4577,6 +4577,45 @@ void show_workqueue_state(void)
 	rcu_read_unlock_sched();
 }
 
+/* used to show worker information through /proc/PID/{comm,stat,status} */
+void wq_worker_comm(char *buf, size_t size, struct task_struct *task)
+{
+	struct worker *worker;
+	struct worker_pool *pool;
+	int off;
+
+	/* always show the actual comm */
+	off = strscpy(buf, task->comm, size);
+	if (off < 0)
+		return;
+
+	/* stabilize worker pool association */
+	mutex_lock(&wq_pool_attach_mutex);
+
+	worker = kthread_data(task);
+	pool = worker->pool;
+
+	if (pool) {
+		spin_lock_irq(&pool->lock);
+		/*
+		 * ->desc tracks information (wq name or set_worker_desc())
+		 * for the latest execution.  If current, prepend '+',
+		 * otherwise '-'.
+		 */
+		if (worker->desc[0] != '\0') {
+			if (worker->current_work)
+				scnprintf(buf + off, size - off, "+%s",
+					  worker->desc);
+			else
+				scnprintf(buf + off, size - off, "-%s",
+					  worker->desc);
+		}
+		spin_unlock_irq(&pool->lock);
+	}
+
+	mutex_unlock(&wq_pool_attach_mutex);
+}
+
 /*
 * CPU hotplug.
 *
-- 
2.9.5
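For tooling that wants to consume the new comm format, splitting the name back into its parts is straightforward. A sketch (`split_kworker_comm` is a hypothetical helper, not part of the patch); it assumes the base kworker name itself never contains '+' or '-', which holds for the kworker/<cpu>:<id>[H] and kworker/u<n>:<id> naming schemes:

```c
#include <assert.h>
#include <string.h>

/*
 * Split a comm like "kworker/2:0-events_power_efficient" into the
 * kthread part and the workqueue part.  Returns '+' (currently
 * executing), '-' (idle), or 0 if no workqueue suffix is present.
 */
static char split_kworker_comm(const char *comm, char *base, size_t blen,
			       const char **wq)
{
	size_t n = strcspn(comm, "+-");	/* offset of first '+'/'-', or NUL */
	char sep = comm[n];

	if (n >= blen)
		n = blen - 1;		/* truncate base name if needed */
	memcpy(base, comm, n);
	base[n] = '\0';
	*wq = sep ? comm + n + 1 : NULL;
	return sep;
}
```

This is also why the separator choice matters: '+'/'-' never appear in the base kthread name, so old-format and new-format comms can be parsed uniformly.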
[PATCH 4/6] workqueue: Set worker->desc to workqueue name by default
Work functions can use set_worker_desc() to improve the visibility of
what the worker task is doing.  Currently, the desc field is unset at
the beginning of each execution and there is a separate field to track
whether the field is set during the current execution.

Instead of leaving it empty till desc is set, worker->desc can be used
to remember the last workqueue the worker worked on by default and
users that use set_worker_desc() can override it to something more
informative as necessary.

This simplifies desc handling and helps tracking the last workqueue
that the worker executed on to improve visibility.

Signed-off-by: Tejun Heo
---
 kernel/workqueue.c          | 21 ++---
 kernel/workqueue_internal.h |  1 -
 2 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 2fde50f..3fbe007 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2088,6 +2088,12 @@ __acquires(&pool->lock)
 	worker->current_pwq = pwq;
 	work_color = get_work_color(work);
 
+	/*
+	 * Record wq name for cmdline and debug reporting, may get
+	 * overridden through set_worker_desc().
+	 */
+	strscpy(worker->desc, pwq->wq->name, WORKER_DESC_LEN);
+
 	list_del_init(&work->entry);
 
 	/*
@@ -2183,7 +2189,6 @@ __acquires(&pool->lock)
 	worker->current_work = NULL;
 	worker->current_func = NULL;
 	worker->current_pwq = NULL;
-	worker->desc_valid = false;
 	pwq_dec_nr_in_flight(pwq, work_color);
 }
 
@@ -4346,7 +4351,6 @@ void set_worker_desc(const char *fmt, ...)
 		va_start(args, fmt);
 		vsnprintf(worker->desc, sizeof(worker->desc), fmt, args);
 		va_end(args);
-		worker->desc_valid = true;
 	}
 }
 
@@ -4370,7 +4374,6 @@ void print_worker_info(const char *log_lvl, struct task_struct *task)
 	char desc[WORKER_DESC_LEN] = { };
 	struct pool_workqueue *pwq = NULL;
 	struct workqueue_struct *wq = NULL;
-	bool desc_valid = false;
 	struct worker *worker;
 
 	if (!(task->flags & PF_WQ_WORKER))
@@ -4383,22 +4386,18 @@ void print_worker_info(const char *log_lvl, struct task_struct *task)
 	worker = kthread_probe_data(task);
 
 	/*
-	 * Carefully copy the associated workqueue's workfn and name.  Keep
-	 * the original last '\0' in case the original contains garbage.
+	 * Carefully copy the associated workqueue's workfn, name and desc.
+	 * Keep the original last '\0' in case the original is garbage.
	 */
 	probe_kernel_read(&fn, &worker->current_func, sizeof(fn));
 	probe_kernel_read(&pwq, &worker->current_pwq, sizeof(pwq));
 	probe_kernel_read(&wq, &pwq->wq, sizeof(wq));
 	probe_kernel_read(name, wq->name, sizeof(name) - 1);
-
-	/* copy worker description */
-	probe_kernel_read(&desc_valid, &worker->desc_valid, sizeof(desc_valid));
-	if (desc_valid)
-		probe_kernel_read(desc, worker->desc, sizeof(desc) - 1);
+	probe_kernel_read(desc, worker->desc, sizeof(desc) - 1);
 
 	if (fn || name[0] || desc[0]) {
 		printk("%sWorkqueue: %s %pf", log_lvl, name, fn);
-		if (desc[0])
+		if (strcmp(name, desc))
 			pr_cont(" (%s)", desc);
 		pr_cont("\n");
 	}
diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h
index 4a182e0..66fbb5a 100644
--- a/kernel/workqueue_internal.h
+++ b/kernel/workqueue_internal.h
@@ -31,7 +31,6 @@ struct worker {
 	struct work_struct	*current_work;	/* L: work being processed */
 	work_func_t		current_func;	/* L: current_work's fn */
 	struct pool_workqueue	*current_pwq;	/* L: current_work's pwq */
-	bool			desc_valid;	/* ->desc is valid */
 	struct list_head	scheduled;	/* L: scheduled works */
 
 	/* 64 bytes boundary on 64bit, 32 on 32bit */
-- 
2.9.5
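The "default then override" scheme above replaces a boolean validity flag with a value comparison: desc now always holds something, and `print_worker_info()` only prints it when it differs from the wq name. A userspace sketch of that logic (`fake_worker` and the helper names are illustrative, not kernel API):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DESC_LEN 32	/* stand-in; the kernel's WORKER_DESC_LEN is 24 */

struct fake_worker { char desc[DESC_LEN]; };

/* At the start of each work item: default desc to the workqueue name... */
static void start_work(struct fake_worker *w, const char *wq_name)
{
	snprintf(w->desc, sizeof(w->desc), "%s", wq_name);
}

/* ...which a set_worker_desc()-style call may later override. */
static void set_desc(struct fake_worker *w, const char *desc)
{
	snprintf(w->desc, sizeof(w->desc), "%s", desc);
}

/*
 * Mirror of print_worker_info()'s new test: the desc is only worth
 * printing separately when it differs from the wq name, because the
 * default now duplicates it.
 */
static int desc_worth_printing(const struct fake_worker *w,
			       const char *wq_name)
{
	return strcmp(wq_name, w->desc) != 0;
}
```

This is why the `desc_valid` field can go away: "valid" is now simply "different from the default".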
[PATCH 5/6] proc: Consolidate task->comm formatting into proc_task_name()
proc shows task->comm in three places - comm, stat, status - and each
is fetching and formatting task->comm slightly differently.

This patch renames task_name() to proc_task_name(), makes it more
generic, and updates all three paths to use it.  This will enable
expanding comm reporting for workqueue workers.

Signed-off-by: Tejun Heo
---
 fs/proc/array.c    | 26 +++---
 fs/proc/base.c     |  5 ++---
 fs/proc/internal.h |  2 ++
 3 files changed, 19 insertions(+), 14 deletions(-)

diff --git a/fs/proc/array.c b/fs/proc/array.c
index ae2c807..f29221e 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -95,7 +95,7 @@
 #include
 #include "internal.h"
 
-static inline void task_name(struct seq_file *m, struct task_struct *p)
+void proc_task_name(struct seq_file *m, struct task_struct *p, bool escape)
 {
 	char *buf;
 	size_t size;
@@ -104,13 +104,17 @@ static inline void task_name(struct seq_file *m, struct task_struct *p)
 
 	get_task_comm(tcomm, p);
 
-	seq_puts(m, "Name:\t");
-
 	size = seq_get_buf(m, &buf);
-	ret = string_escape_str(tcomm, buf, size, ESCAPE_SPACE | ESCAPE_SPECIAL, "\n\\");
-	seq_commit(m, ret < size ? ret : -1);
+	if (escape) {
+		ret = string_escape_str(tcomm, buf, size,
+					ESCAPE_SPACE | ESCAPE_SPECIAL, "\n\\");
+		if (ret >= size)
+			ret = -1;
+	} else {
+		ret = strscpy(buf, tcomm, size);
+	}
 
-	seq_putc(m, '\n');
+	seq_commit(m, ret);
 }
 
 /*
@@ -365,7 +369,10 @@ int proc_pid_status(struct seq_file *m, struct pid_namespace *ns,
 {
 	struct mm_struct *mm = get_task_mm(task);
 
-	task_name(m, task);
+	seq_puts(m, "Name:\t");
+	proc_task_name(m, task, true);
+	seq_putc(m, '\n');
+
 	task_state(m, ns, pid, task);
 
 	if (mm) {
@@ -400,7 +407,6 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
 	u64 cutime, cstime, utime, stime;
 	u64 cgtime, gtime;
 	unsigned long rsslim = 0;
-	char tcomm[sizeof(task->comm)];
 	unsigned long flags;
 
 	state = *get_task_state(task);
@@ -427,8 +433,6 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
 		}
 	}
 
-	get_task_comm(tcomm, task);
-
 	sigemptyset(&sigign);
 	sigemptyset(&sigcatch);
 	cutime = cstime = utime = stime = 0;
@@ -495,7 +499,7 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
 	seq_put_decimal_ull(m, "", pid_nr_ns(pid, ns));
 	seq_puts(m, " (");
-	seq_puts(m, tcomm);
+	proc_task_name(m, task, false);
 	seq_puts(m, ") ");
 	seq_putc(m, state);
 	seq_put_decimal_ll(m, " ", ppid);
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 2eee4d7..eb17917ca 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1581,9 +1581,8 @@ static int comm_show(struct seq_file *m, void *v)
 	if (!p)
 		return -ESRCH;
 
-	task_lock(p);
-	seq_printf(m, "%s\n", p->comm);
-	task_unlock(p);
+	proc_task_name(m, p, false);
+	seq_putc(m, '\n');
 
 	put_task_struct(p);
 
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 0f1692e..b823fac62 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -131,6 +131,8 @@ unsigned name_to_int(const struct qstr *qstr);
 */
 extern const struct file_operations proc_tid_children_operations;
 
+extern void proc_task_name(struct seq_file *m, struct task_struct *p,
+			   bool escape);
 extern int proc_tid_stat(struct seq_file *, struct pid_namespace *,
			 struct pid *, struct task_struct *);
 extern int proc_tgid_stat(struct seq_file *, struct pid_namespace *,
-- 
2.9.5
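The `escape` flag in the consolidated helper encodes the one real difference between the three paths: status escapes '\n' and '\\' in the comm (so a crafted name can't inject fake status lines), while comm and stat emit it verbatim. A rough userspace model of the two modes, assuming only those two characters need escaping as the `"\n\\"` argument to string_escape_str() suggests:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Sketch of proc_task_name()'s escape vs. verbatim formatting. */
static void format_comm(char *out, size_t size, const char *comm, int escape)
{
	if (!escape) {
		snprintf(out, size, "%s", comm);	/* comm/stat path */
		return;
	}
	/* status path: escape newline and backslash */
	size_t o = 0;
	for (const char *p = comm; *p && o + 2 < size; p++) {
		if (*p == '\n')      { out[o++] = '\\'; out[o++] = 'n';  }
		else if (*p == '\\') { out[o++] = '\\'; out[o++] = '\\'; }
		else                 { out[o++] = *p; }
	}
	out[o] = '\0';
}
```

The real string_escape_str() handles more cases via its flags; this only models the subset the two characters in the `only` string make obvious.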
[PATCH 1/6] proc: Don't allow empty /proc/PID/cmdline for user tasks
Kernel threads have empty /proc/PID/cmdline and some userland tools
including ps(1) and older versions of systemd use this to detect
kernel threads.  However, any userland program can emulate the
behavior by making its argvs unavailable and trick the affected tools
into thinking that the task is a kernel thread.  Linus's reproducer
follows.

  #include
  #include

  int main(void)
  {
	char empty[16384];
	unsigned long ptr;

	asm volatile("" : "=r" (ptr) : "0" (empty) : "memory");
	ptr = (ptr + 4095) & ~4095;
	munmap((void *)ptr, 32768);
	sleep(1000);
	return 0;
  }

Compiling the above program into nullcmdline and running it on an
unpatched kernel shows the following behavior.

  $ ./nullcmdline &
  [1] 2382031
  [devbig577 ~/tmp]$ hexdump -C /proc/2382031/comm
  6e 75 6c 6c 63 6d 64 6c 69 6e 65 0a  |nullcmdline.|
  000c
  $ hexdump -C /proc/2382031/cmdline
  $ ps 2382031
      PID TTY  STAT  TIME COMMAND
  2382031 pts/2  S   0:00 [nullcmdline]

The empty cmdline makes ps(1) think that nullcmdline is a kernel
thread and put brackets around its name (comm), which is mostly a
nuisance but it's possible that this confusion can lead to more
harmful confusions.

This patch fixes the issue by making proc_pid_cmdline_read() never
return empty string for user tasks.  If the result is empty for
whatever reason, comm string is returned.  Even when the comm string
is empty, it still returns the null termination character.

On a patched kernel, running the same command as above gives us.

  $ ./nullcmdline &
  [1] 2317
  [test ~]# hexdump -C /proc/2317/comm
  6e 75 6c 6c 63 6d 64 6c 69 6e 65 0a  |nullcmdline.|
  000c
  $ hexdump -C /proc/2317/cmdline
  6e 75 6c 6c 63 6d 64 6c 69 6e 65 00  |nullcmdline.|
  000c
  $ ps 2317
  PID TTY  STAT  TIME COMMAND
  2317 pts/0  S   0:00 nullcmdline

Note that cmdline is a dup of comm and ps(1) is no longer confused.

Signed-off-by: Tejun Heo
Suggested-by: Linus Torvalds
---
 fs/proc/base.c | 22 +++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 1b2ede6..2eee4d7 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -224,9 +224,10 @@ static ssize_t proc_pid_cmdline_read(struct file *file, char __user *buf,
 	if (!tsk)
 		return -ESRCH;
 	mm = get_task_mm(tsk);
-	put_task_struct(tsk);
-	if (!mm)
-		return 0;
+	if (!mm) {
+		rv = 0;
+		goto out_put_task;
+	}
 	/* Check if process spawned far enough to have cmdline. */
 	if (!mm->env_end) {
 		rv = 0;
@@ -367,8 +368,23 @@ static ssize_t proc_pid_cmdline_read(struct file *file, char __user *buf,
 	free_page((unsigned long)page);
 out_mmput:
 	mmput(mm);
+out_put_task:
+	/*
+	 * Some userland tools use empty cmdline to distinguish kthreads.
+	 * Avoid empty cmdline for user tasks by returning tsk->comm with
+	 * \0 termination when empty.
+	 */
+	if (*pos == 0 && rv == 0 && !(tsk->flags & PF_KTHREAD)) {
+		char tcomm[TASK_COMM_LEN];
+
+		get_task_comm(tcomm, tsk);
+		rv = min(strlen(tcomm) + 1, count);
+		if (copy_to_user(buf, tsk->comm, rv))
+			rv = -EFAULT;
+	}
 	if (rv > 0)
 		*pos += rv;
+	put_task_struct(tsk);
 	return rv;
 }
-- 
2.9.5
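The fallback condition in the hunk above is compact, so here is a userspace model of just that logic. `cmdline_fixup` is a hypothetical helper (the kernel does this inline with copy_to_user); it shows when the comm substitution fires and why the trailing NUL is counted:

```c
#include <assert.h>
#include <string.h>

/*
 * Model of the new fallback in proc_pid_cmdline_read(): if the read
 * at file offset 0 produced nothing (rv == 0) and the task is not a
 * kthread, report the comm - including its trailing NUL - instead,
 * so user tasks never present an empty cmdline.
 */
static long cmdline_fixup(long rv, long pos, int is_kthread,
			  const char *comm, char *buf, long count)
{
	if (pos == 0 && rv == 0 && !is_kthread) {
		long n = (long)strlen(comm) + 1;	/* include the NUL */

		if (n > count)
			n = count;			/* min() in the kernel */
		memcpy(buf, comm, n);
		rv = n;
	}
	return rv;
}
```

Note the guards: a non-zero `rv` (real cmdline data) or a non-zero offset (later reads of the same file) pass through untouched, and kthreads keep their empty cmdline, preserving the detection heuristic for the case it was meant for.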
[PATCHSET] workqueue: Show the latest workqueue name in /proc/PID/{comm,stat,status}
There can be a lot of workqueue workers and they all show up with the cryptic kworker/* names, making it difficult to understand which is doing what and how they came to be.

  # ps -ef | grep kworker
  root         4     2  0 Feb25 ?        00:00:00 [kworker/0:0H]
  root         6     2  0 Feb25 ?        00:00:00 [kworker/u112:0]
  root        19     2  0 Feb25 ?        00:00:00 [kworker/1:0H]
  root        25     2  0 Feb25 ?        00:00:00 [kworker/2:0H]
  root        31     2  0 Feb25 ?        00:00:00 [kworker/3:0H]
  ...

This patchset makes workqueue workers report the latest workqueue they were executing for through /proc/PID/{comm,stat,status}. The extra information is appended to the kthread name with an intervening '+' if currently executing, otherwise '-'.

  # cat /proc/25/comm
  kworker/2:0-events_power_efficient
  # cat /proc/25/stat
  25 (kworker/2:0-events_power_efficient) I 2 0 0 0 -1 69238880 0 0...
  # grep Name /proc/25/status
  Name:   kworker/2:0-events_power_efficient

For details on the design decisions, please refer to the following thread.

  http://lkml.kernel.org/r/20180516153939.gh2368...@devbig577.frc2.facebook.com

This patchset contains the following six patches.

  0001-proc-Don-t-allow-empty-proc-PID-cmdline-for-user-tas.patch
  0002-workqueue-Replace-pool-attach_mutex-with-global-wq_p.patch
  0003-workqueue-Make-worker_attach-detach_pool-update-work.patch
  0004-workqueue-Set-worker-desc-to-workqueue-name-by-defau.patch
  0005-proc-Consolidate-task-comm-formatting-into-proc_task.patch
  0006-workqueue-Show-the-latest-workqueue-name-in-proc-PID.patch

I'm applying the patches to wq/for-4.18. Please let me know if the patchset needs updates (the branch doesn't have any other changes anyway). diffstat follows. Thanks.

 fs/proc/array.c             | 33 +++-
 fs/proc/base.c              | 27 +++---
 fs/proc/internal.h          |  2
 include/linux/workqueue.h   |  1
 kernel/workqueue.c          | 117
 kernel/workqueue_internal.h |  3 -
 6 files changed, 122 insertions(+), 61 deletions(-)

-- 
tejun
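The appended suffix is easy to split off again in tooling. A sketch of a parser for the new names (the helper name is made up for illustration), assuming the workqueue description begins at the first '+' or '-' after the kworker/CPU:ID prefix:

```c
#include <assert.h>
#include <string.h>

/* Parse "kworker/2:0-events_power_efficient" into its parts.
 * Returns the delimiter ('+' if currently executing, '-' otherwise)
 * or 0 if the name carries no workqueue description. */
static char split_kworker_name(const char *name, const char **desc)
{
    const char *p = strchr(name, '/');
    if (!p)
        return 0;
    /* The pool id part holds digits, ':', 'u', 'H'; the description
     * starts at the first '+' or '-' after the slash. */
    p += strcspn(p, "+-");
    if (*p != '+' && *p != '-')
        return 0;
    *desc = p + 1;
    return *p;
}
```

A name without a suffix (e.g. a worker that has not run a work item yet, as in the pre-patch ps output above) simply parses as having no description.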
Re: [PATCH v5 11/13] mm: Iterate only over charged shrinkers during memcg shrink_slab()
On Tue, May 15, 2018 at 01:12:20PM +0300, Kirill Tkhai wrote:
> >> +#define root_mem_cgroup NULL
> >
> > Let's instead export mem_cgroup_is_root(). In case if MEMCG is disabled
> > it will always return false.
>
> export == move to header file

That and adding a stub function in case !MEMCG.

> >> +static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
> >> +				       struct mem_cgroup *memcg, int priority)
> >> +{
> >> +	struct memcg_shrinker_map *map;
> >> +	unsigned long freed = 0;
> >> +	int ret, i;
> >> +
> >> +	if (!memcg_kmem_enabled() || !mem_cgroup_online(memcg))
> >> +		return 0;
> >> +
> >> +	if (!down_read_trylock(&shrinker_rwsem))
> >> +		return 0;
> >> +
> >> +	/*
> >> +	 * 1)Caller passes only alive memcg, so map can't be NULL.
> >> +	 * 2)shrinker_rwsem protects from maps expanding.
> >
> >      ^^
> > Nit: space missing here :-)
>
> I don't understand what you mean here. Please, clarify...

This is just a trivial remark regarding comment formatting. They usually put a space between the number and the first word in the sentence, i.e. between '1)' and 'Caller' in your case.

> >> +	 */
> >> +	map = rcu_dereference_protected(MEMCG_SHRINKER_MAP(memcg, nid), true);
> >> +	BUG_ON(!map);
> >> +
> >> +	for_each_set_bit(i, map->map, memcg_shrinker_nr_max) {
> >> +		struct shrink_control sc = {
> >> +			.gfp_mask = gfp_mask,
> >> +			.nid = nid,
> >> +			.memcg = memcg,
> >> +		};
> >> +		struct shrinker *shrinker;
> >> +
> >> +		shrinker = idr_find(&shrinker_idr, i);
> >> +		if (!shrinker) {
> >> +			clear_bit(i, map->map);
> >> +			continue;
> >> +		}
> >> +		if (list_empty(&shrinker->list))
> >> +			continue;
> >
> > I don't like using shrinker->list as an indicator that the shrinker has
> > been initialized. IMO if you do need such a check, you should split
> > shrinker_idr registration in two steps - allocate a slot in 'prealloc'
> > and set the pointer in 'register'. However, can we really encounter an
> > unregistered shrinker here? AFAIU a bit can be set in the shrinker map
> > only after the corresponding shrinker has been initialized, no?
>
> 1)No, it's not so. Here is a race:
>
> cpu#0                   cpu#1                        cpu#2
> prealloc_shrinker()
>                         prealloc_shrinker()
>                           memcg_expand_shrinker_maps()
>                             memcg_expand_one_shrinker_map()
>                               memset(&map->map, 0xff);
>                                                      do_shrink_slab() (on uninitialized LRUs)
> init LRUs
> register_shrinker_prepared()
>
> So, the check is needed.

OK, I see.

> 2)Assigning NULL pointer can't be used here, since NULL pointer is already used
> to clear unregistered shrinkers from the map. See the check right after
> idr_find().

But it won't break anything if we clear the bit for prealloc-ed, but not yet registered shrinkers, will it?

> list_empty() is used since it's the already existing indicator, which does not
> require an additional member in struct shrinker.

It just looks rather counter-intuitive to me to use shrinker->list to differentiate between registered and unregistered shrinkers. May be, I'm wrong. If you are sure that this is OK, I'm fine with it, but then please add a comment here explaining what this check is needed for.

Thanks.
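The race being discussed can be illustrated with a toy model: expanding the per-memcg map sets all bits (the memset 0xff above), so the iterator can land on a bit whose shrinker slot is merely preallocated, or even empty. A sketch, with names that are illustrative rather than the kernel's:

```c
#include <assert.h>
#include <stdbool.h>

#define NR_SLOTS 8

/* Toy idr: a slot may be empty, preallocated, or fully registered. */
struct slot { bool allocated; bool registered; };

static struct slot idr[NR_SLOTS];
static unsigned long map = (1UL << NR_SLOTS) - 1;  /* memset 0xff: all bits set */

static int shrink_slab_memcg_model(void)
{
    int invoked = 0;
    for (int i = 0; i < NR_SLOTS; i++) {           /* for_each_set_bit() */
        if (!(map & (1UL << i)))
            continue;
        if (!idr[i].allocated) {                   /* idr_find() == NULL */
            map &= ~(1UL << i);                    /* clear_bit() */
            continue;
        }
        if (!idr[i].registered)                    /* list_empty(&shrinker->list) */
            continue;                              /* prealloc'ed, not registered */
        invoked++;                                 /* safe to run the shrinker */
    }
    return invoked;
}
```

Only fully registered slots are invoked; preallocated-but-unregistered slots are skipped but keep their bit, matching the behavior the two checks above implement.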
Re: [PATCH v7] mtd: rawnand: use bit-wise majority to recover the contents of ONFI parameter
Hi,

On 16/05/2018 at 09:56, Boris Brezillon wrote:
On Wed, 16 May 2018 09:32:57 +0200 Chris Moore wrote:
Hi,
On 15/05/2018 at 09:34, Boris Brezillon wrote:
On Tue, 15 May 2018 06:45:51 +0200 Chris Moore wrote:
Hi,
On 13/05/2018 at 06:30, Wan, Jane (Nokia - US/Sunnyvale) wrote:

Per ONFI specification (Rev. 4.0), if all parameter pages have invalid CRC values, the bit-wise majority may be used to recover the contents of the parameter pages from the parameter page copies present.

Signed-off-by: Jane Wan
---
v7: change debug print messages
v6: support the cases that srcbufs are not contiguous
v5: make the bit-wise majority function generic
v4: move the bit-wise majority code in a separate function
v3: fix warning message detected by kbuild test robot
v2: rebase the changes on top of v4.17-rc1

 drivers/mtd/nand/raw/nand_base.c | 50 ++++++++++++++++++++++++++++++++++------
 1 file changed, 45 insertions(+), 5 deletions(-)

diff --git a/drivers/mtd/nand/raw/nand_base.c b/drivers/mtd/nand/raw/nand_base.c
index 72f3a89..b43b784 100644
--- a/drivers/mtd/nand/raw/nand_base.c
+++ b/drivers/mtd/nand/raw/nand_base.c
@@ -5087,6 +5087,35 @@ static int nand_flash_detect_ext_param_page(struct nand_chip *chip,
 }
 
 /*
+ * Recover data with bit-wise majority
+ */
+static void nand_bit_wise_majority(const void **srcbufs,
+				   unsigned int nsrcbufs,
+				   void *dstbuf,
+				   unsigned int bufsize)
+{
+	int i, j, k;
+
+	for (i = 0; i < bufsize; i++) {
+		u8 cnt, val;
+
+		val = 0;
+		for (j = 0; j < 8; j++) {
+			cnt = 0;
+			for (k = 0; k < nsrcbufs; k++) {
+				const u8 *srcbuf = srcbufs[k];
+
+				if (srcbuf[i] & BIT(j))
+					cnt++;
+			}
+			if (cnt > nsrcbufs / 2)
+				val |= BIT(j);
+		}
+		((u8 *)dstbuf)[i] = val;
+	}
+}
+
+/*
  * Check if the NAND chip is ONFI compliant, returns 1 if it is, 0 otherwise.
  */
 static int nand_flash_detect_onfi(struct nand_chip *chip)
@@ -5102,7 +5131,7 @@ static int nand_flash_detect_onfi(struct nand_chip *chip)
 		return 0;
 
 	/* ONFI chip: allocate a buffer to hold its parameter page */
-	p = kzalloc(sizeof(*p), GFP_KERNEL);
+	p = kzalloc((sizeof(*p) * 3), GFP_KERNEL);
 	if (!p)
 		return -ENOMEM;
@@ -5113,21 +5142,32 @@ static int nand_flash_detect_onfi(struct nand_chip *chip)
 	}
 
 	for (i = 0; i < 3; i++) {
-		ret = nand_read_data_op(chip, p, sizeof(*p), true);
+		ret = nand_read_data_op(chip, &p[i], sizeof(*p), true);
 		if (ret) {
 			ret = 0;
 			goto free_onfi_param_page;
 		}
 
-		if (onfi_crc16(ONFI_CRC_BASE, (uint8_t *)p, 254) ==
+		if (onfi_crc16(ONFI_CRC_BASE, (u8 *)&p[i], 254) ==
 		    le16_to_cpu(p->crc)) {
+			if (i)
+				memcpy(p, &p[i], sizeof(*p));
 			break;
 		}
 	}
 
 	if (i == 3) {
-		pr_err("Could not find valid ONFI parameter page; aborting\n");
-		goto free_onfi_param_page;
+		const void *srcbufs[3] = {p, p + 1, p + 2};
+
+		pr_warn("Could not find a valid ONFI parameter page, trying bit-wise majority to recover it\n");
+		nand_bit_wise_majority(srcbufs, ARRAY_SIZE(srcbufs), p,
+				       sizeof(*p));
+
+		if (onfi_crc16(ONFI_CRC_BASE, (u8 *)p, 254) !=
+		    le16_to_cpu(p->crc)) {
+			pr_err("ONFI parameter recovery failed, aborting\n");
+			goto free_onfi_param_page;
+		}
 	}
 
 	/* Check version */

This version is still hard coded for a three sample bitwise majority vote. So why not use the method which I suggested previously for v2 and which I repeat below?

Because I want the nand_bit_wise_majority() function to work with nsrcbufs > 3 (the ONFI spec says there are at least 3 copies of the param page, but NAND vendors can decide to put more). Also, if the X copies of the PARAM page are corrupted (which is rather unlikely), that means we already spent quite a lot of time reading the different copies and calculating the CRC, so I think we don't care about perf optimizations when doing bit-wise majority.
The three sample bitwise majority can be implemented without bit-level manipulation using the identity: majority3(a, b, c) = (a & b) | (a & c) | (b & c). This can be factorized slightly to (a & (b | c)) | (b & c). This enables the operation to be performed 8, 16, 32 or even
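The identity above is easy to verify against the bit-by-bit version. A sketch comparing the two (for three sources only, per the suggestion; byte-wide here, though the word-wise form would work equally well on 32- or 64-bit words):

```c
#include <assert.h>
#include <stdint.h>

/* Per-bit majority over three bytes, as in the patch's loop. */
static uint8_t majority3_bitwise(uint8_t a, uint8_t b, uint8_t c)
{
    uint8_t val = 0;
    for (int j = 0; j < 8; j++) {
        int cnt = !!(a & (1u << j)) + !!(b & (1u << j)) + !!(c & (1u << j));
        if (cnt > 1)
            val |= (uint8_t)(1u << j);
    }
    return val;
}

/* Word-wise identity: majority3(a,b,c) = (a & (b | c)) | (b & c).
 * No per-bit loop needed; works for any word width. */
static uint8_t majority3_wordwise(uint8_t a, uint8_t b, uint8_t c)
{
    return (a & (b | c)) | (b & c);
}
```

Exhaustively checking the two forms against each other confirms they agree for every input combination.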
Re: [PATCH net-next 1/3] net: ethernet: ti: Allow most drivers with COMPILE_TEST
Hi Florian,

I love your patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]

url: https://github.com/0day-ci/linux/commits/Florian-Fainelli/net-Allow-more-drivers-with-COMPILE_TEST/20180517-092807
config: xtensa-allyesconfig (attached as .config)
compiler: xtensa-linux-gcc (GCC) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=xtensa

All warnings (new ones prefixed by >>):

   drivers/net//ethernet/ti/davinci_cpdma.c: In function 'cpdma_chan_submit':
>> drivers/net//ethernet/ti/davinci_cpdma.c:1083:17: warning: passing argument 1 of 'writel_relaxed' makes integer from pointer without a cast [-Wint-conversion]
      writel_relaxed(token, &desc->sw_token);
                     ^
   In file included from arch/xtensa/include/asm/io.h:83:0,
                    from include/linux/scatterlist.h:9,
                    from include/linux/dma-mapping.h:11,
                    from drivers/net//ethernet/ti/davinci_cpdma.c:21:
   include/asm-generic/io.h:303:24: note: expected 'u32 {aka unsigned int}' but argument is of type 'void *'
    #define writel_relaxed writel_relaxed
                           ^
>> include/asm-generic/io.h:304:20: note: in expansion of macro 'writel_relaxed'
    static inline void writel_relaxed(u32 value, volatile void __iomem *addr)
                       ^~
--
   drivers/net/ethernet/ti/davinci_cpdma.c: In function 'cpdma_chan_submit':
   drivers/net/ethernet/ti/davinci_cpdma.c:1083:17: warning: passing argument 1 of 'writel_relaxed' makes integer from pointer without a cast [-Wint-conversion]
      writel_relaxed(token, &desc->sw_token);
                     ^
   In file included from arch/xtensa/include/asm/io.h:83:0,
                    from include/linux/scatterlist.h:9,
                    from include/linux/dma-mapping.h:11,
                    from drivers/net/ethernet/ti/davinci_cpdma.c:21:
   include/asm-generic/io.h:303:24: note: expected 'u32 {aka unsigned int}' but argument is of type 'void *'
    #define writel_relaxed writel_relaxed
                           ^
>> include/asm-generic/io.h:304:20: note: in expansion of macro 'writel_relaxed'
    static inline void writel_relaxed(u32 value, volatile void __iomem *addr)
                       ^~

vim +/writel_relaxed +1083 drivers/net//ethernet/ti/davinci_cpdma.c

ef8c2dab drivers/net/davinci_cpdma.c             Cyril Chemparathy 2010-09-15  1029
ef8c2dab drivers/net/davinci_cpdma.c             Cyril Chemparathy 2010-09-15  1030  int cpdma_chan_submit(struct cpdma_chan *chan, void *token, void *data,
aef614e1 drivers/net/ethernet/ti/davinci_cpdma.c Sebastian Siewior 2013-04-23  1031  		      int len, int directed)
ef8c2dab drivers/net/davinci_cpdma.c             Cyril Chemparathy 2010-09-15  1032  {
ef8c2dab drivers/net/davinci_cpdma.c             Cyril Chemparathy 2010-09-15  1033  	struct cpdma_ctlr *ctlr = chan->ctlr;
ef8c2dab drivers/net/davinci_cpdma.c             Cyril Chemparathy 2010-09-15  1034  	struct cpdma_desc __iomem *desc;
ef8c2dab drivers/net/davinci_cpdma.c             Cyril Chemparathy 2010-09-15  1035  	dma_addr_t buffer;
ef8c2dab drivers/net/davinci_cpdma.c             Cyril Chemparathy 2010-09-15  1036  	unsigned long flags;
ef8c2dab drivers/net/davinci_cpdma.c             Cyril Chemparathy 2010-09-15  1037  	u32 mode;
ef8c2dab drivers/net/davinci_cpdma.c             Cyril Chemparathy 2010-09-15  1038  	int ret = 0;
ef8c2dab drivers/net/davinci_cpdma.c             Cyril Chemparathy 2010-09-15  1039
ef8c2dab drivers/net/davinci_cpdma.c             Cyril Chemparathy 2010-09-15  1040  	spin_lock_irqsave(&chan->lock, flags);
ef8c2dab drivers/net/davinci_cpdma.c             Cyril Chemparathy 2010-09-15  1041
ef8c2dab drivers/net/davinci_cpdma.c             Cyril Chemparathy 2010-09-15  1042  	if (chan->state == CPDMA_STATE_TEARDOWN) {
ef8c2dab drivers/net/davinci_cpdma.c             Cyril Chemparathy 2010-09-15  1043  		ret = -EINVAL;
ef8c2dab drivers/net/davinci_cpdma.c             Cyril Chemparathy 2010-09-15  1044  		goto unlock_ret;
ef8c2dab drivers/net/davinci_cpdma.c             Cyril Chemparathy 2010-09-15  1045  	}
ef8c2dab drivers/net/davinci_cpdma.c             Cyril Chemparathy 2010-09-15  1046
742fb20f drivers/net/ethernet/ti/davinci_cpdma.c Grygorii Strashko 2016-06-27  1047  	if (chan->count >= chan->desc_num) {
742fb20f drivers/net/ethernet/ti/davinci_cpdma.c Grygorii Strashko 2016-06-27  1048  		chan->stats.desc_alloc_fail++;
742fb20f drivers/net/ethernet/ti/davinci_cpdma.c Grygorii Strashko 2016-06-27  1049  		ret = -ENOMEM;
742fb20f
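The warning arises because a void * token is handed to a helper whose first parameter is u32: on targets where pointers are wider than 32 bits, the pointer would be truncated, and the implicit conversion is diagnosed everywhere. A cookie like this needs a pointer-sized slot; a standalone sketch of the distinction (the store helper below is illustrative, not the driver's code):

```c
#include <assert.h>
#include <stdint.h>

/* A pointer-sized token slot, as a descriptor field would have to be
 * to hold a cookie like 'void *token' without truncation. */
static uintptr_t sw_token_slot;

static void store_token(void *token)
{
    /* Passing 'token' straight to a u32-typed writel_relaxed() is what
     * triggered -Wint-conversion above; going through uintptr_t keeps
     * the full pointer value on any target. */
    sw_token_slot = (uintptr_t)token;
}

static void *load_token(void)
{
    return (void *)sw_token_slot;
}
```

The round trip is lossless because uintptr_t is defined to hold any object pointer, which a u32 is not on 64-bit targets.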
[PATCH] of: overlay: validate offset from property fixups
From: Frank Rowand

The smatch static checker marks the data in offset as untrusted, leading it to warn:

  drivers/of/resolver.c:125 update_usages_of_a_phandle_reference()
  error: buffer underflow 'prop->value' 's32min-s32max'

Add a check to verify that offset is within the property data.

Reported-by: Dan Carpenter
Signed-off-by: Frank Rowand
---
 drivers/of/resolver.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/of/resolver.c b/drivers/of/resolver.c
index 65d0b7adfcd4..7edfac6f1914 100644
--- a/drivers/of/resolver.c
+++ b/drivers/of/resolver.c
@@ -122,6 +122,11 @@ static int update_usages_of_a_phandle_reference(struct device_node *overlay,
 			goto err_fail;
 		}
 
+		if (offset < 0 || offset + sizeof(__be32) > prop->length) {
+			err = -EINVAL;
+			goto err_fail;
+		}
+
 		*(__be32 *)(prop->value + offset) = cpu_to_be32(phandle);
 	}
 
-- 
Frank Rowand
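The added check can be exercised on its own. A small sketch of the same bounds test (helper name invented), showing why the explicit negative check matters before the size comparison:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirror of the patch's validation: a 4-byte phandle cell at 'offset'
 * must lie entirely within 'length' bytes of property data. */
static int offset_is_valid(int offset, uint32_t length)
{
    /* The offset < 0 test must come first: in 'offset + sizeof(...)'
     * the int is converted to an unsigned type, so e.g. offset == -4
     * would wrap so that offset + 4 == 0 and slip past the size
     * comparison -- exactly the underflow smatch warned about. */
    if (offset < 0 || (size_t)offset + sizeof(uint32_t) > length)
        return 0;
    return 1;
}
```

In-bounds offsets (including one that ends exactly at the property's last byte) pass; negative offsets and cells running past the end are rejected.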
Re: Revert "dmaengine: pl330: add DMA_PAUSE feature"
On 15-05-18, 11:50, Frank Mori Hess wrote:
> On Tue, May 15, 2018 at 2:21 AM, Vinod wrote:
> >
> > For Pause/resume data loss is _not_ expected.
> >
> >> > and some of the 8250 drivers like 8250_dw.c set a maxburst > 1. If it
> >> > can't count on the pause/residue/terminate working without data loss
> >> > then it is just broken. As is the 8250_omap.c driver. Is the
> >> > description of dmaengine_pause in the documentation of "This pauses
> >> > activity on the DMA channel without data loss" to be interpreted as
> >> > "as long as you resume afterwards"?
> >>
> >> I assume that this requirement is for both - resuming and terminating.
> >
> > Terminate is abort, data loss may happen here.
>
> Wait, are you saying if you do
>
> dma pause          no data loss
>                    read residue here as well
> dma terminate

Oh yes, we aborted...

> then it is acceptable for there to be data loss, because it can be
> blamed on the terminate? In that case the usage of dmaengine in all
> the 8250 serial drivers is broken. Every time there is a break in the
> incoming rx data, the 8250 driver does pause/residue/terminate which
> could potentially cause data from the incoming data stream to be
> dropped.

As I said, terminate is abort. It cleans up the channel and is supposed to be used for cleanup, not for stopping. See Documentation:

- device_terminate_all
  - Aborts all the pending and ongoing transfers on the channel
  - For aborted transfers the complete callback should not be called
  - Can be called from atomic context or from within a complete callback
    of a descriptor. Must not sleep. Drivers must be able to handle this
    correctly.
  - Termination may be asynchronous. The driver does not have to wait
    until the currently active transfer has completely stopped. See
    device_synchronize.

Thanks
-- 
~Vinod
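The semantics under debate can be made concrete with a toy model: pause must preserve in-flight data and keep the residue readable, while terminate is free to discard whatever is still queued. The sketch below (all names illustrative, not the dmaengine API) shows why a pause/residue/terminate sequence is only lossless if the client drains the paused data before terminating:

```c
#include <assert.h>
#include <string.h>

struct toy_chan {
    char fifo[64];
    unsigned int residue;   /* bytes received but not yet delivered */
    int paused;
};

/* pause: no data loss, residue stays readable */
static void toy_pause(struct toy_chan *c)            { c->paused = 1; }
static unsigned int toy_residue(struct toy_chan *c)  { return c->residue; }

/* drain the paused data out while the channel is quiescent */
static unsigned int toy_drain(struct toy_chan *c, char *dst)
{
    unsigned int n = c->residue;
    memcpy(dst, c->fifo, n);
    c->residue = 0;
    return n;
}

/* terminate: abort/cleanup -- in-flight data may be dropped */
static void toy_terminate(struct toy_chan *c)
{
    c->residue = 0;
    c->paused = 0;
}
```

If toy_terminate() were called before toy_drain(), the five buffered bytes would be lost with no error, which is the failure mode Frank is describing for the 8250 rx path.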
linux-next: manual merge of the staging tree with the v4l-dvb tree
Hi all, Today's linux-next merge of the staging tree got a conflict in: drivers/staging/media/atomisp/TODO between commit: 51b8dc5163d2 ("media: staging: atomisp: Remove driver") from the v4l-dvb tree and commit: 1bd421154821 ("staging: atomisp: Augment TODO file with GPIO work item") from the staging tree. I fixed it up (I just removed the file) and can carry the fix as necessary. This is now fixed as far as linux-next is concerned, but any non trivial conflicts should be mentioned to your upstream maintainer when your tree is submitted for merging. You may also want to consider cooperating with the maintainer of the conflicting tree to minimise any particularly complex conflicts. -- Cheers, Stephen Rothwell
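For anyone unfamiliar with this kind of modify/delete conflict, it can be reproduced and resolved the same way in a scratch repository. The paths match the atomisp case, but the history below is fabricated for illustration (the real commits are 51b8dc5163d2 and 1bd421154821):

```shell
# Reproduce a modify/delete conflict and resolve it by keeping the removal.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email next@example.com
git config user.name next
base=$(git symbolic-ref --short HEAD)

mkdir -p drivers/staging/media/atomisp
echo "TODO items" > drivers/staging/media/atomisp/TODO
git add -A && git commit -qm "base"

git checkout -qb staging
echo "- GPIO work item" >> drivers/staging/media/atomisp/TODO
git commit -qam "staging: atomisp: Augment TODO file"

git checkout -q "$base"
git rm -qr drivers/staging/media/atomisp
git commit -qm "media: staging: atomisp: Remove driver"

git merge staging >/dev/null 2>&1 || true     # modify/delete conflict
git rm -q drivers/staging/media/atomisp/TODO  # resolution: drop the file
git commit -qm "Merge staging; TODO stays removed"
test ! -e drivers/staging/media/atomisp/TODO && echo "resolved: file removed"
```

Resolving with `git rm` on the conflicted path records the removal as the merge result, which is exactly "I just removed the file" above.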
Re: [PATCH v5 11/13] mm: Iterate only over charged shrinkers during memcg shrink_slab()
On Tue, May 15, 2018 at 05:49:59PM +0300, Kirill Tkhai wrote: > >> @@ -589,13 +647,7 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int > >> nid, > >>.memcg = memcg, > >>}; > >> > >> - /* > >> - * If kernel memory accounting is disabled, we ignore > >> - * SHRINKER_MEMCG_AWARE flag and call all shrinkers > >> - * passing NULL for memcg. > >> - */ > >> - if (memcg_kmem_enabled() && > >> - !!memcg != !!(shrinker->flags & SHRINKER_MEMCG_AWARE)) > >> + if (!!memcg != !!(shrinker->flags & SHRINKER_MEMCG_AWARE)) > >>continue; > > > > I want this check gone. It's easy to achieve, actually - just remove the > > following lines from shrink_node() > > > > if (global_reclaim(sc)) > > shrink_slab(sc->gfp_mask, pgdat->node_id, NULL, > > sc->priority); > > This check is not related to the patchset. Yes, it is. This patch modifies shrink_slab which is used only by shrink_node. Simplifying shrink_node along the way looks right to me. > Let's don't mix everything in the single series of patches, because > after your last remarks it will grow at least up to 15 patches. Most of which are trivial so I don't see any problem here. > This patchset can't be responsible for everything. I don't understand why you balk at simplifying the code a bit while you are patching related functions anyway. > > >> > >>if (!(shrinker->flags & SHRINKER_NUMA_AWARE)) > >>
Re: [PATCH] cpufreq: brcmstb-avs-cpufreq: sort frequencies in ascending order
On 16-05-18, 12:24, Florian Fainelli wrote: > On 05/15/2018 09:32 PM, Viresh Kumar wrote: > > On 15-05-18, 20:49, Markus Mayer wrote: > >> From: Markus Mayer > >> > >> Most CPUfreq drivers (at least on ARM) seem to be sorting the available > >> frequencies from lowest to highest. To match this behaviour, we reverse > >> the sorting order in brcmstb-avs-cpufreq, so it is now also lowest to > >> highest. > > > > The reasoning isn't correct. Just because everyone else is doing it > > doesn't make it right and so you shouldn't change just because of > > that. > > > > What you should have written instead in the commit log is that the cpufreq > > core performs better if the table is sorted (in any order), and so we > > must sort it as well. > > Is there a reason why set_freq_table_sorted() tries an ascending or > descending sort, but does not enforce one versus another for all drivers? set_freq_table_sorted() doesn't sort the frequency table but checks if the table is already sorted in a particular order. And then cpufreq_frequency_table_target() is optimized based on this flag. We don't have to enforce any particular ordering here. > > But I feel the table is already sorted for your platform, isn't it? > > And I don't see a clear advantage of merging this patch. > > The patch changes the order to have the lowest to highest, whereas the > current implementation has them from highest to lowest. From what you > are saying, it sounds like this is unnecessary, since the sorting is > already making things efficient enough, so this is just a cosmetic thing? Right. It shouldn't make any performance improvements. -- viresh
Re: [PATCH] sched/rt: fix call to cpufreq_update_util
On 16-05-18, 20:18, Vincent Guittot wrote: > With commit 8f111bc357aa ("cpufreq/schedutil: Rewrite CPUFREQ_RT support") > schedutil governor uses rq->rt.rt_nr_running to detect whether a RT task is > currently running on the CPU and to set frequency to max if necessary. > cpufreq_update_util() is called in enqueue/dequeue_top_rt_rq() but > rq->rt.rt_nr_running has not been updated yet when dequeue_top_rt_rq() is > called, so schedutil still considers that a RT task is running when the > last task is dequeued. The update of rq->rt.rt_nr_running happens later > in dequeue_rt_stack(). > > Fixes: 8f111bc357aa ('cpufreq/schedutil: Rewrite CPUFREQ_RT support') > Cc: # v4.16+ > Signed-off-by: Vincent Guittot > --- > kernel/sched/rt.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c > index 7aef6b4..6e74d3d 100644 > --- a/kernel/sched/rt.c > +++ b/kernel/sched/rt.c > @@ -1001,8 +1001,6 @@ dequeue_top_rt_rq(struct rt_rq *rt_rq) > sub_nr_running(rq, rt_rq->rt_nr_running); > rt_rq->rt_queued = 0; > Remove this blank line as well? > - /* Kick cpufreq (see the comment in kernel/sched/sched.h). */ > - cpufreq_update_util(rq, 0); > } > > static void > @@ -1288,6 +1286,9 @@ static void dequeue_rt_stack(struct sched_rt_entity > *rt_se, unsigned int flags) > if (on_rt_rq(rt_se)) > __dequeue_rt_entity(rt_se, flags); > } > + > + /* Kick cpufreq (see the comment in kernel/sched/sched.h). */ > + cpufreq_update_util(rq_of_rt_rq(rt_rq_of_se(back)), 0); > } > > static void enqueue_rt_entity(struct sched_rt_entity *rt_se, unsigned int > flags) > -- > 2.7.4 -- viresh
Re: Grant
I Mikhail Fridman. has selected you specially as one of my beneficiaries for my Charitable Donation, Just as I have declared on May 23, 2016 to give my fortune as charity. Check the link below for confirmation: http://www.ibtimes.co.uk/russias-second-wealthiest-man-mikhail-fridman-plans-leaving-14-2bn-fortune-charity-1561604 Reply as soon as possible with further directives. Best Regards, Mikhail Fridman.
Re: [PATCH 1/4] scsi: ufs: add quirk to fix mishandling utrlclr/utmrlclr
On 5/6/2018 3:44 PM, Alim Akhtar wrote: In the right behavior, setting the bit to '0' indicates clear and '1' indicates no change. If host controller handles this the other way, UFSHCI_QUIRK_BROKEN_REQ_LIST_CLR can be used. Signed-off-by: Seungwon Jeon Signed-off-by: Alim Akhtar --- drivers/scsi/ufs/ufshcd.c | 21 +++-- drivers/scsi/ufs/ufshcd.h | 5 + 2 files changed, 24 insertions(+), 2 deletions(-) diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c index 00e7905..9898ce5 100644 --- a/drivers/scsi/ufs/ufshcd.c +++ b/drivers/scsi/ufs/ufshcd.c @@ -675,7 +675,24 @@ static inline void ufshcd_put_tm_slot(struct ufs_hba *hba, int slot) */ static inline void ufshcd_utrl_clear(struct ufs_hba *hba, u32 pos) { - ufshcd_writel(hba, ~(1 << pos), REG_UTP_TRANSFER_REQ_LIST_CLEAR); + if (hba->quirks & UFSHCI_QUIRK_BROKEN_REQ_LIST_CLR) + ufshcd_writel(hba, (1 << pos), REG_UTP_TRANSFER_REQ_LIST_CLEAR); + else + ufshcd_writel(hba, ~(1 << pos), + REG_UTP_TRANSFER_REQ_LIST_CLEAR); +} + +/** + * ufshcd_utmrl_clear - Clear a bit in UTRMLCLR register + * @hba: per adapter instance + * @pos: position of the bit to be cleared + */ +static inline void ufshcd_utmrl_clear(struct ufs_hba *hba, u32 pos) +{ + if (hba->quirks & UFSHCI_QUIRK_BROKEN_REQ_LIST_CLR) + ufshcd_writel(hba, (1 << pos), REG_UTP_TASK_REQ_LIST_CLEAR); + else + ufshcd_writel(hba, ~(1 << pos), REG_UTP_TASK_REQ_LIST_CLEAR); } /** @@ -5398,7 +5415,7 @@ static int ufshcd_clear_tm_cmd(struct ufs_hba *hba, int tag) goto out; spin_lock_irqsave(hba->host->host_lock, flags); - ufshcd_writel(hba, ~(1 << tag), REG_UTP_TASK_REQ_LIST_CLEAR); + ufshcd_utmrl_clear(hba, tag); spin_unlock_irqrestore(hba->host->host_lock, flags); /* poll for max. 
1 sec to clear door bell register by h/w */ diff --git a/drivers/scsi/ufs/ufshcd.h b/drivers/scsi/ufs/ufshcd.h index 8110dcd..43035f8 100644 --- a/drivers/scsi/ufs/ufshcd.h +++ b/drivers/scsi/ufs/ufshcd.h @@ -595,6 +595,11 @@ struct ufs_hba { */ #define UFSHCD_QUIRK_PRDT_BYTE_GRAN 0x80 + /* +* Cleaer handling for transfer/task request list is just opposite. +*/ Spell check - should be 'Clear' + #define UFSHCI_QUIRK_BROKEN_REQ_LIST_CLR0x100 + unsigned int quirks;/* Deviations from standard UFSHCI spec. */ /* Device deviations from standard UFS device spec. */ Looks good to me, except the spell-check. -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [PATCH net-next 2/3] net: ethernet: freescale: Allow FEC with COMPILE_TEST
Hi Florian, I love your patch! Yet something to improve: [auto build test ERROR on net-next/master] url: https://github.com/0day-ci/linux/commits/Florian-Fainelli/net-Allow-more-drivers-with-COMPILE_TEST/20180517-092807 config: m68k-allmodconfig (attached as .config) compiler: m68k-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0 reproduce: wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree make.cross ARCH=m68k All errors (new ones prefixed by >>): In file included from include/linux/swab.h:5:0, from include/uapi/linux/byteorder/big_endian.h:13, from include/linux/byteorder/big_endian.h:5, from arch/m68k/include/uapi/asm/byteorder.h:5, from include/asm-generic/bitops/le.h:6, from arch/m68k/include/asm/bitops.h:519, from include/linux/bitops.h:38, from include/linux/kernel.h:11, from include/linux/list.h:9, from include/linux/module.h:9, from drivers/net//ethernet/freescale/fec_main.c:24: drivers/net//ethernet/freescale/fec_main.c: In function 'fec_restart': >> drivers/net//ethernet/freescale/fec_main.c:959:26: error: 'FEC_RACC' >> undeclared (first use in this function); did you mean 'FEC_RXIC1'? val = readl(fep->hwp + FEC_RACC); ^ include/uapi/linux/swab.h:117:32: note: in definition of macro '__swab32' (__builtin_constant_p((__u32)(x)) ? 
\ ^ include/linux/byteorder/generic.h:89:21: note: in expansion of macro '__le32_to_cpu' #define le32_to_cpu __le32_to_cpu ^ arch/m68k/include/asm/io_mm.h:452:26: note: in expansion of macro 'in_le32' #define readl(addr) in_le32(addr) ^~~ drivers/net//ethernet/freescale/fec_main.c:959:9: note: in expansion of macro 'readl' val = readl(fep->hwp + FEC_RACC); ^ drivers/net//ethernet/freescale/fec_main.c:959:26: note: each undeclared identifier is reported only once for each function it appears in val = readl(fep->hwp + FEC_RACC); ^ include/uapi/linux/swab.h:117:32: note: in definition of macro '__swab32' (__builtin_constant_p((__u32)(x)) ? \ ^ include/linux/byteorder/generic.h:89:21: note: in expansion of macro '__le32_to_cpu' #define le32_to_cpu __le32_to_cpu ^ arch/m68k/include/asm/io_mm.h:452:26: note: in expansion of macro 'in_le32' #define readl(addr) in_le32(addr) ^~~ drivers/net//ethernet/freescale/fec_main.c:959:9: note: in expansion of macro 'readl' val = readl(fep->hwp + FEC_RACC); ^ In file included from arch/m68k/include/asm/io_mm.h:27:0, from arch/m68k/include/asm/io.h:5, from include/linux/scatterlist.h:9, from include/linux/dma-mapping.h:11, from include/linux/skbuff.h:34, from include/linux/if_ether.h:23, from include/uapi/linux/ethtool.h:19, from include/linux/ethtool.h:18, from include/linux/netdevice.h:41, from drivers/net//ethernet/freescale/fec_main.c:34: drivers/net//ethernet/freescale/fec_main.c:968:38: error: 'FEC_FTRL' undeclared (first use in this function); did you mean 'FEC_ECNTRL'? 
writel(PKT_MAXBUF_SIZE, fep->hwp + FEC_FTRL); ^ arch/m68k/include/asm/raw_io.h:48:64: note: in definition of macro 'out_le32' #define out_le32(addr,l) (void)((*(__force volatile __le32 *) (addr)) = cpu_to_le32(l)) ^~~~ drivers/net//ethernet/freescale/fec_main.c:968:3: note: in expansion of macro 'writel' writel(PKT_MAXBUF_SIZE, fep->hwp + FEC_FTRL); ^~ drivers/net//ethernet/freescale/fec_main.c:1034:38: error: 'FEC_R_FIFO_RSEM' undeclared (first use in this function); did you mean 'FEC_FIFO_RAM'? writel(FEC_ENET_RSEM_V, fep->hwp + FEC_R_FIFO_RSEM); ^ arch/m68k/include/asm/raw_io.h:48:64: note: in definition of macro 'out_le32' #define out_le32(addr,l) (void)((*(__force volatile __le32 *) (addr)) = cpu_to_le32(l)) ^~~~ drivers/net//ethernet/freescale/fec_main.c:1034:3: note: in expansion of macro 'writel' writel(FEC_ENET_RSEM_V, fep->hwp + FEC_R_FIFO_RSEM); ^~ drivers/net//ethernet/freescale/fec_main.c:1035:38: error: 'FEC_R_FIFO_RSFL' undeclared (first use in
Re: [PATCH 1/2] x86/boot/KASLR: Add two functions for 1GB huge pages handling
Hi Chao, On 05/17/18 at 11:27am, Chao Fan wrote: > >+/* Store the number of 1GB huge pages which user specified.*/ > >+static unsigned long max_gb_huge_pages; > >+ > >+static int parse_gb_huge_pages(char *param, char* val) > >+{ > >+char *p; > >+u64 mem_size; > >+static bool gbpage_sz = false; > >+ > >+if (!strcmp(param, "hugepagesz")) { > >+p = val; > >+mem_size = memparse(p, ); > >+if (mem_size == PUD_SIZE) { > >+if (gbpage_sz) > >+warn("Repeadly set hugeTLB page size of 1G!\n"); > >+gbpage_sz = true; > >+} else > >+gbpage_sz = false; > >+} else if (!strcmp(param, "hugepages") && gbpage_sz) { > >+p = val; > >+max_gb_huge_pages = simple_strtoull(p, , 0); > >+debug_putaddr(max_gb_huge_pages); > >+} > >+} > >+ > >+ > > static int handle_mem_memmap(void) > > { > > char *args = (char *)get_cmd_line_ptr(); > >@@ -466,6 +492,51 @@ static void store_slot_info(struct mem_vector *region, > >unsigned long image_size) > > } > > } > > > >+/* Skip as many 1GB huge pages as possible in the passed region. */ > >+static void process_gb_huge_page(struct mem_vector *region, unsigned long > >image_size) > >+{ > >+int i = 0; > >+unsigned long addr, size; > >+struct mem_vector tmp; > >+ > >+if (!max_gb_huge_pages) { > >+store_slot_info(region, image_size); > >+return; > >+} > >+ > >+addr = ALIGN(region->start, PUD_SIZE); > >+/* If Did we raise the address above the passed in memory entry? */ > >+if (addr < region->start + region->size) > >+size = region->size - (addr - region->start); > >+ > >+/* Check how many 1GB huge pages can be filtered out*/ > >+while (size > PUD_SIZE && max_gb_huge_pages) { > >+size -= PUD_SIZE; > >+max_gb_huge_pages--; > > The global variable 'max_gb_huge_pages' means how many huge pages > user specified when you get it from command line. > But here, everytime we find a position which is good for huge page > allocation, the 'max_gdb_huge_page' decreased. 
So in my understanding, > it is used to store how many huge pages that we still need to search memory > for good slots to filter out, right? > If it's right, maybe the name 'max_gb_huge_pages' is not very suitable. > If my understanding is wrong, please tell me. No, you have understood it very right. I finished the draft patch last week, but changed this variable name and the function names several times, still I feel they are not good. However I can't get a better name. Yes, 'max_gb_huge_pages' stores how many 1GB huge pages are expected from the kernel command-line. And in this function it will be decreased. But we can't define another global variable only for decreasing in this place. And you can see that in this patchset I only take care of 1GB huge pages. While on x86 we have two kinds of huge pages, 2MB and 1GB, why 1GB only? Because 2MB is not impacted by KASLR, please check the code in hugetlb_nrpages_setup() of mm/hugetlb.c. Only 1GB huge pages need to be pre-allocated in hugetlb_nrpages_setup(), and if you look into hugetlb_nrpages_setup(), you will find that it will call alloc_bootmem_huge_page() to allocate huge pages one by one, but not at one time. That is why I always add 'gb' in the middle of the global variable and the newly added functions. And it will answer your below questions. When we have walked over all memory regions and 'max_gb_huge_pages' is still not 0, what should we do? It's normal and done as expected. Here hugetlb only tries its best to allocate as many as possible according to 'max_gb_huge_pages'. If it can't be fully satisfied, it's fine. E.g. on a bare-metal machine with 16GB RAM, you add the below to the command-line: default_hugepagesz=1G hugepagesz=1G hugepages=20 Then it will get 14 good 1GB huge pages with kaslr disabled since [0,1G) and [3G,4G) are touched by bios reservation and pci/firmware reservation. Then these 14 huge pages are the maximal value which is expected. It's not a bug in huge pages.
But with kaslr enabled, it sometimes only gets 13 1GB huge pages because KASLR puts the kernel into one of those good 1GB huge pages. This is a bug. I am not very familiar with huge page handling, just read code recently because of this kaslr bug. Hope Luiz and people from his team can help correct and clarify if anything is not right. Especially the function names, I feel they are not good; if anyone has a better idea, I will really appreciate that. > > >+i++; > >+} > >+ > >+if (!i) { > >+store_slot_info(region, image_size); > >+return; > >+} > >+ > >+/* Process the remaining regions after filtering out. */ > >+ > This line may be unusable. Hmm, I made it on purpose. Because 1GB huge pages may be dug out from the middle, then the remaining head and tail regions still need to be handled. I put it here to