Re: [PATCH v2 2/3] arm64: dts: renesas: draak: Describe CVBS input

2018-05-16 Thread kbuild test robot
Hi Jacopo,

I love your patch! Yet something to improve:

[auto build test ERROR on linuxtv-media/master]
[cannot apply to renesas/next v4.17-rc5]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:
https://github.com/0day-ci/linux/commits/Jacopo-Mondi/arm64-dts-Draak-Enable-video-inputs-and-VIN4/20180517-102013
base:   git://linuxtv.org/media_tree.git master
config: arm64-defconfig (attached as .config)
compiler: aarch64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=arm64

All errors (new ones prefixed by >>):

>> Error: arch/arm64/boot/dts/renesas/r8a77995-draak.dts:266.1-6 Label or path vin4 not found
>> FATAL ERROR: Syntax error parsing input tree

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


Re: [PATCH] cpuidle/powernv : init all present cpus for deep states

2018-05-16 Thread Akshay Adiga
Yes, this needs to be sent to stable.

Fixes: d405a98c ("powerpc/powernv: Move cpuidle related code from setup.c to new file")



Re: [PATCH 1/2] x86/boot/KASLR: Add two functions for 1GB huge pages handling

2018-05-16 Thread Chao Fan
On Thu, May 17, 2018 at 12:03:43PM +0800, Baoquan He wrote:
>Hi Chao,
>
>On 05/17/18 at 11:27am, Chao Fan wrote:
>> >+/* Store the number of 1GB huge pages which user specified.*/
>> >+static unsigned long max_gb_huge_pages;
>> >+
>> >+static int parse_gb_huge_pages(char *param, char* val)
>> >+{
>> >+   char *p;
>> >+   u64 mem_size;
>> >+   static bool gbpage_sz = false;
>> >+
>> >+   if (!strcmp(param, "hugepagesz")) {
>> >+   p = val;
>> >+   mem_size = memparse(p, );
>> >+   if (mem_size == PUD_SIZE) {
>> >+   if (gbpage_sz)
>> >+   warn("Repeadly set hugeTLB page size of 1G!\n");
>> >+   gbpage_sz = true;
>> >+   } else
>> >+   gbpage_sz = false;
>> >+   } else if (!strcmp(param, "hugepages") && gbpage_sz) {
>> >+   p = val;
>> >+   max_gb_huge_pages = simple_strtoull(p, , 0);
>> >+   debug_putaddr(max_gb_huge_pages);
>> >+   }
>> >+}
>> >+
>> >+
>> > static int handle_mem_memmap(void)
>> > {
>> >char *args = (char *)get_cmd_line_ptr();
>> >@@ -466,6 +492,51 @@ static void store_slot_info(struct mem_vector *region, 
>> >unsigned long image_size)
>> >}
>> > }
>> > 
>> >+/* Skip as many 1GB huge pages as possible in the passed region. */
>> >+static void process_gb_huge_page(struct mem_vector *region, unsigned long 
>> >image_size)
>> >+{
>> >+   int i = 0;
>> >+   unsigned long addr, size;
>> >+   struct mem_vector tmp;
>> >+
>> >+   if (!max_gb_huge_pages) {
>> >+   store_slot_info(region, image_size);
>> >+   return;
>> >+   }
>> >+
>> >+   addr = ALIGN(region->start, PUD_SIZE);
>> >+   /* If Did we raise the address above the passed in memory entry? */
>> >+   if (addr < region->start + region->size)
>> >+   size = region->size - (addr - region->start);
>> >+
>> >+   /* Check how many 1GB huge pages can be filtered out*/
>> >+   while (size > PUD_SIZE && max_gb_huge_pages) {
>> >+   size -= PUD_SIZE;
>> >+   max_gb_huge_pages--;
>> 
>> The global variable 'max_gb_huge_pages' means how many huge pages the
>> user specified when you get it from the command line.
>> But here, every time we find a position which is good for huge page
>> allocation, 'max_gb_huge_pages' is decreased. So in my understanding,
>> it is used to store how many huge pages we still need to search memory
>> for good slots to filter out, right?
>> If that's right, maybe the name 'max_gb_huge_pages' is not very suitable.
>> If my understanding is wrong, please tell me.
>
>No, you have understood it exactly right. I finished the draft patch last
>week, but changed this variable name and the function names several
>times, and I still feel they are not good. However, I can't find a better name.
>
>Yes, 'max_gb_huge_pages' stores how many 1GB huge pages are expected
>from the kernel command-line. And in this function it will be decreased. But
>we can't define another global variable only for decreasing it in this
>place.
>
>And you can see that in this patchset I only take care of 1GB huge
>pages. While on x86 we have two kinds of huge pages, 2MB and 1GB, why
>1GB only? Because 2MB is not impacted by KASLR; please check the code in
>hugetlb_nrpages_setup() of mm/hugetlb.c. Only 1GB huge pages need to be
>pre-allocated in hugetlb_nrpages_setup(), and if you look into
>hugetlb_nrpages_setup(), you will find that it calls
>alloc_bootmem_huge_page() to allocate huge pages one by one, not all at
>once. That is why I always add 'gb' in the middle of the global
>variable and the newly added function names.
>
>And this answers your question below: if, after walking over all memory
>regions, 'max_gb_huge_pages' is still not 0, what should we do? It's
>normal and works as expected. Here hugetlb only tries its best to allocate
>as many pages as possible according to 'max_gb_huge_pages'. If that can't be
>fully satisfied, it's fine. E.g. on a bare-metal machine with 16GB RAM, you add
>the following to the command-line:
>
>default_hugepagesz=1G hugepagesz=1G hugepages=20
>
>Then it will get 14 good 1GB huge pages with kaslr disabled, since [0,1G)
>and [3G,4G) are touched by BIOS and PCI/firmware reservations.
>These 14 huge pages are the maximum that can be expected. It's not a
>bug in huge pages. But with kaslr enabled, it sometimes gets only 13 1GB
>huge pages, because KASLR put the kernel into one of those good 1GB huge
>pages. This is a bug.

Thanks for your explanation, I got it.

>
>I am not very familiar with huge page handling; I just read the code recently
>because of this kaslr bug. I hope Luiz and people from his team can help
>correct and clarify if anything is not right. Especially the function
>names, which I feel are not good; if anyone has a better idea, I will really
>appreciate it.
>> 
>> >+   i++;
>> >+   }
>> >+
>> >+   if (!i) {
>> >+   store_slot_info(region, image_size);
>> >+   return;
>> >+   }
>> >+
>> >+   /* Process the remaining regions after filtering out. */
>> >+
>> This line 
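
A minimal user-space sketch of the slot-splitting idea discussed above: align the
region to 1GB, carve out as many whole 1GB pages as were requested, and hand only
the head and tail leftovers to KASLR as candidate slots. PUD_SIZE, the helper names
and the sample region below are illustrative assumptions, not the in-kernel code.

#include <stdio.h>

#define PUD_SIZE        (1UL << 30)     /* 1GB */
#define ALIGN(x, a)     (((x) + (a) - 1) & ~((a) - 1))

struct mem_vector { unsigned long start, size; };

/* e.g. default_hugepagesz=1G hugepagesz=1G hugepages=20 */
static unsigned long max_gb_huge_pages = 20;

static void report_slot(struct mem_vector *r)
{
        printf("KASLR candidate region: [%#lx, %#lx)\n",
               r->start, r->start + r->size);
}

/* Carve whole 1GB pages out of 'region'; report only the leftovers. */
static void skip_gb_huge_pages(struct mem_vector *region)
{
        unsigned long addr = ALIGN(region->start, PUD_SIZE);
        unsigned long size, n = 0;
        struct mem_vector tmp;

        if (addr >= region->start + region->size) {
                report_slot(region);    /* too small to hold a 1GB page */
                return;
        }
        size = region->size - (addr - region->start);

        /* Count how many whole 1GB pages fit and are still wanted. */
        while (size >= PUD_SIZE && max_gb_huge_pages) {
                size -= PUD_SIZE;
                max_gb_huge_pages--;
                n++;
        }
        if (!n) {
                report_slot(region);
                return;
        }

        /* Head leftover before the first reserved 1GB page. */
        if (addr > region->start) {
                tmp.start = region->start;
                tmp.size = addr - region->start;
                report_slot(&tmp);
        }

        /* Tail leftover after the last reserved 1GB page. */
        if (size) {
                tmp.start = addr + n * PUD_SIZE;
                tmp.size = size;
                report_slot(&tmp);
        }
}

int main(void)
{
        /* Illustrative usable RAM chunk, roughly [1008MB, 3.23GB). */
        struct mem_vector region = { .start = 0x3f000000UL, .size = 0x90000000UL };

        skip_gb_huge_pages(&region);
        return 0;
}

With kaslr enabled the kernel may otherwise land inside one of the carved-out 1GB
ranges, which is exactly the conflict process_gb_huge_page() is meant to avoid by
handing only the leftovers to store_slot_info().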

[PATCH v7 0/6] provide power off support for iMX6 with external PMIC

2018-05-16 Thread Oleksij Rempel
2018.05.17:
update patches to version v7


This patch series provides power off support for Freescale/NXP iMX6 based
boards with an external power management integrated circuit (PMIC).
As a first step, the PMIC is configured to turn off the system when the
standby pin is asserted. As a second step, we assert the standby pin.
For this reason we need to use pm_power_off_prepare.

Usage of the standby pin for power off is described in the official iMX6
documentation.

2018.03.05:
As this patch set touches multiple subsystems, I think it would make
sense for Shawn Guo to take the whole patch set.
The only part which didn't receive an ACK is the regulator stuff. So I would
hope that Mark Brown can ACK it.

Kind regards,
Oleksij Rempel

2017.12.06:
Adding Linus. Probably there is no maintainer for this patch set.
No changes are made, tested on v4.15-rc1.

2017.10.27:
The last version of this patch set was sent on 20 Jun 2017; this is a rebase
against kernel v4.14-rc6. Probably this set got lost. If I forgot to address
some comments, please point them out.

changes:
v7:
 - use EXPORT_SYMBOL_GPL(pm_power_off_prepare) instead of EXPORT_SYMBOL
 - call imx6q_suspend_finish() directly without cpu_suspend()

v6:
 - rename imx6_pm_poweroff to imx6_pm_stby_poweroff
 - fix "MPIC_STBY_REQ" typo in the comment.

v5:
 - remove useless includes from pm-imx6.c patch
 - add Acked-by to "regulator: pfuze100: add fsl,pmic-stby-poweroff property"
   patch

v4:
 - update comment in "regulator: pfuze100: add fsl,pmic-stby-poweroff ..."
   patch
 - add Acked-by to "ARM: imx6q: provide documentation for new ..."
   patch

v3:
 - set pm_power_off_prepare = NULL on .remove.
 - documentation and spelling fixes.
 - use %pf instead of lookup_symbol_name.

Oleksij Rempel (6):
  ARM: imx6q: provide documentation for new fsl,pmic-stby-poweroff
property
  ARM: imx6: register pm_power_off handler if "fsl,pmic-stby-poweroff"
is set
  kernel/reboot.c: export pm_power_off_prepare
  regulator: pfuze100: add fsl,pmic-stby-poweroff property
  regulator: pfuze100-regulator: provide pm_power_off_prepare handler
  ARM: dts: imx6: RIoTboard provide standby on power off option

 .../devicetree/bindings/clock/imx6q-clock.txt |  8 ++
 .../bindings/regulator/pfuze100.txt   |  7 ++
 arch/arm/boot/dts/imx6dl-riotboard.dts|  5 +
 arch/arm/mach-imx/pm-imx6.c   | 25 +
 drivers/regulator/pfuze100-regulator.c| 92 +++
 kernel/reboot.c   |  1 +
 6 files changed, 138 insertions(+)

-- 
2.17.0
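
To make the two steps above concrete, here is a simplified sketch of the ordering
the series relies on when the system powers off. It is not the kernel's actual
code; it only shows when each hook fires.

#include <linux/pm.h>

/*
 * Simplified sketch only: kernel_power_off() first runs the optional
 * pm_power_off_prepare() hook (via kernel_shutdown_prepare()) and then
 * machine_power_off() ends up in pm_power_off().
 */
static void sketch_power_off_order(void)
{
        /*
         * Step 1: the pfuze100 driver (patch 5/6) hooks in here and
         * programs the PMIC so that a later PMIC_STBY_REQ assertion
         * gates the SoC supplies.
         */
        if (pm_power_off_prepare)
                pm_power_off_prepare();

        /*
         * Step 2: the i.MX6 handler (patch 2/6) hooks in here, enters
         * STOP mode and thereby asserts PMIC_STBY_REQ, letting the
         * pre-programmed PMIC cut the power.
         */
        if (pm_power_off)
                pm_power_off();
}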



[PATCH v7 6/6] ARM: dts: imx6: RIoTboard provide standby on power off option

2018-05-16 Thread Oleksij Rempel
This board, as well as some other boards with i.MX6 and a PMIC, uses a
"PMIC_STBY_REQ" line to notify the PMIC about a state change.
The PMIC is programmed for a specific state change before triggering the
line.
In this case, PMIC_STBY_REQ can be used for standby, sleep
and power off modes.

Signed-off-by: Oleksij Rempel 
---
 arch/arm/boot/dts/imx6dl-riotboard.dts | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/arm/boot/dts/imx6dl-riotboard.dts 
b/arch/arm/boot/dts/imx6dl-riotboard.dts
index 2e98c92adff7..a0e9753ee767 100644
--- a/arch/arm/boot/dts/imx6dl-riotboard.dts
+++ b/arch/arm/boot/dts/imx6dl-riotboard.dts
@@ -90,6 +90,10 @@
status = "okay";
 };
 
+ {
+   fsl,pmic-stby-poweroff;
+};
+
  {
pinctrl-names = "default";
pinctrl-0 = <_enet>;
@@ -170,6 +174,7 @@
reg = <0x08>;
interrupt-parent = <>;
interrupts = <16 8>;
+   fsl,pmic-stby-poweroff;
 
regulators {
reg_vddcore: sw1ab {/* 
VDDARM_IN */
-- 
2.17.0



[PATCH v7 2/6] ARM: imx6: register pm_power_off handler if "fsl,pmic-stby-poweroff" is set

2018-05-16 Thread Oleksij Rempel
One of the Freescale recommended sequences for power off with external
PMIC is the following:
...
3.  SoC is programming PMIC for power off when standby is asserted.
4.  In CCM STOP mode, Standby is asserted, PMIC gates SoC supplies.

See:
http://www.nxp.com/assets/documents/data/en/reference-manuals/IMX6DQRM.pdf
page 5083

This patch implements step 4 of this sequence.

Signed-off-by: Oleksij Rempel 
---
 arch/arm/mach-imx/pm-imx6.c | 25 +
 1 file changed, 25 insertions(+)

diff --git a/arch/arm/mach-imx/pm-imx6.c b/arch/arm/mach-imx/pm-imx6.c
index 017539dd712b..2f5c643f62fb 100644
--- a/arch/arm/mach-imx/pm-imx6.c
+++ b/arch/arm/mach-imx/pm-imx6.c
@@ -601,6 +601,28 @@ static void __init imx6_pm_common_init(const struct 
imx6_pm_socdata
   IMX6Q_GPR1_GINT);
 }
 
+static void imx6_pm_stby_poweroff(void)
+{
+   imx6_set_lpm(STOP_POWER_OFF);
+   imx6q_suspend_finish(0);
+
+   mdelay(1000);
+
+   pr_emerg("Unable to poweroff system\n");
+}
+
+static int imx6_pm_stby_poweroff_probe(void)
+{
+   if (pm_power_off) {
+   pr_warn("%s: pm_power_off already claimed  %p %pf!\n",
+   __func__, pm_power_off, pm_power_off);
+   return -EBUSY;
+   }
+
+   pm_power_off = imx6_pm_stby_poweroff;
+   return 0;
+}
+
 void __init imx6_pm_ccm_init(const char *ccm_compat)
 {
struct device_node *np;
@@ -617,6 +639,9 @@ void __init imx6_pm_ccm_init(const char *ccm_compat)
val = readl_relaxed(ccm_base + CLPCR);
val &= ~BM_CLPCR_LPM;
writel_relaxed(val, ccm_base + CLPCR);
+
+   if (of_property_read_bool(np, "fsl,pmic-stby-poweroff"))
+   imx6_pm_stby_poweroff_probe();
 }
 
 void __init imx6q_pm_init(void)
-- 
2.17.0



[PATCH v7 1/6] ARM: imx6q: provide documentation for new fsl,pmic-stby-poweroff property

2018-05-16 Thread Oleksij Rempel
Signed-off-by: Oleksij Rempel 
Acked-by: Rob Herring 
---
 Documentation/devicetree/bindings/clock/imx6q-clock.txt | 8 
 1 file changed, 8 insertions(+)

diff --git a/Documentation/devicetree/bindings/clock/imx6q-clock.txt 
b/Documentation/devicetree/bindings/clock/imx6q-clock.txt
index a45ca67a9d5f..e1308346e00d 100644
--- a/Documentation/devicetree/bindings/clock/imx6q-clock.txt
+++ b/Documentation/devicetree/bindings/clock/imx6q-clock.txt
@@ -6,6 +6,14 @@ Required properties:
 - interrupts: Should contain CCM interrupt
 - #clock-cells: Should be <1>
 
+Optional properties:
+- fsl,pmic-stby-poweroff: Configure CCM to assert PMIC_STBY_REQ signal
+  on power off.
+  Use this property if the SoC should be powered off by external power
+  management IC (PMIC) triggered via PMIC_STBY_REQ signal.
+  Boards that are designed to initiate poweroff on PMIC_ON_REQ signal should
+  be using "syscon-poweroff" driver instead.
+
 The clock consumer should specify the desired clock by having the clock
 ID in its "clocks" phandle cell.  See include/dt-bindings/clock/imx6qdl-clock.h
 for the full list of i.MX6 Quad and DualLite clock IDs.
-- 
2.17.0



[PATCH v7 5/6] regulator: pfuze100-regulator: provide pm_power_off_prepare handler

2018-05-16 Thread Oleksij Rempel
On some boards the SoC can use one pin, "PMIC_STBY_REQ", to notify the PMIC
about state changes. In this case the internal state of the PMIC must be
preconfigured for the upcoming state change.
It works fine with the current regulator framework, except in the
power-off case.

This patch provides an optional pm_power_off_prepare handler
which configures the standby state of the PMIC to disable all power lines.

In my power consumption test on RIoTBoard, I got the following results:
power off without this patch:  320 mA
power off with this patch:       2 mA
suspend to RAM:                 40 mA

Signed-off-by: Oleksij Rempel 
---
 drivers/regulator/pfuze100-regulator.c | 92 ++
 1 file changed, 92 insertions(+)

diff --git a/drivers/regulator/pfuze100-regulator.c 
b/drivers/regulator/pfuze100-regulator.c
index 63922a2167e5..f6c276ed91d8 100644
--- a/drivers/regulator/pfuze100-regulator.c
+++ b/drivers/regulator/pfuze100-regulator.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #define PFUZE_NUMREGS  128
@@ -42,11 +43,17 @@
 
 #define PFUZE100_COINVOL   0x1a
 #define PFUZE100_SW1ABVOL  0x20
+#define PFUZE100_SW1ABMODE 0x23
 #define PFUZE100_SW1CVOL   0x2e
+#define PFUZE100_SW1CMODE  0x31
 #define PFUZE100_SW2VOL0x35
+#define PFUZE100_SW2MODE   0x38
 #define PFUZE100_SW3AVOL   0x3c
+#define PFUZE100_SW3AMODE  0x3f
 #define PFUZE100_SW3BVOL   0x43
+#define PFUZE100_SW3BMODE  0x46
 #define PFUZE100_SW4VOL0x4a
+#define PFUZE100_SW4MODE   0x4d
 #define PFUZE100_SWBSTCON1 0x66
 #define PFUZE100_VREFDDRCON0x6a
 #define PFUZE100_VSNVSVOL  0x6b
@@ -57,6 +64,13 @@
 #define PFUZE100_VGEN5VOL  0x70
 #define PFUZE100_VGEN6VOL  0x71
 
+#define PFUZE100_SWxMODE_MASK  0xf
+#define PFUZE100_SWxMODE_APS_APS   0x8
+#define PFUZE100_SWxMODE_APS_OFF   0x4
+
+#define PFUZE100_VGENxLPWR BIT(6)
+#define PFUZE100_VGENxSTBY BIT(5)
+
 enum chips { PFUZE100, PFUZE200, PFUZE3000 = 3 };
 
 struct pfuze_regulator {
@@ -489,6 +503,69 @@ static inline struct device_node *match_of_node(int index)
 }
 #endif
 
+static struct pfuze_chip *syspm_pfuze_chip;
+
+static void pfuze_power_off_prepare(void)
+{
+   dev_info(syspm_pfuze_chip->dev, "Configure standy mode for power off");
+
+   /* Switch from default mode: APS/APS to APS/Off */
+   regmap_update_bits(syspm_pfuze_chip->regmap, PFUZE100_SW1ABMODE,
+  PFUZE100_SWxMODE_MASK, PFUZE100_SWxMODE_APS_OFF);
+   regmap_update_bits(syspm_pfuze_chip->regmap, PFUZE100_SW1CMODE,
+  PFUZE100_SWxMODE_MASK, PFUZE100_SWxMODE_APS_OFF);
+   regmap_update_bits(syspm_pfuze_chip->regmap, PFUZE100_SW2MODE,
+  PFUZE100_SWxMODE_MASK, PFUZE100_SWxMODE_APS_OFF);
+   regmap_update_bits(syspm_pfuze_chip->regmap, PFUZE100_SW3AMODE,
+  PFUZE100_SWxMODE_MASK, PFUZE100_SWxMODE_APS_OFF);
+   regmap_update_bits(syspm_pfuze_chip->regmap, PFUZE100_SW3BMODE,
+  PFUZE100_SWxMODE_MASK, PFUZE100_SWxMODE_APS_OFF);
+   regmap_update_bits(syspm_pfuze_chip->regmap, PFUZE100_SW4MODE,
+  PFUZE100_SWxMODE_MASK, PFUZE100_SWxMODE_APS_OFF);
+
+   regmap_update_bits(syspm_pfuze_chip->regmap, PFUZE100_VGEN1VOL,
+  PFUZE100_VGENxLPWR | PFUZE100_VGENxSTBY,
+  PFUZE100_VGENxSTBY);
+   regmap_update_bits(syspm_pfuze_chip->regmap, PFUZE100_VGEN2VOL,
+  PFUZE100_VGENxLPWR | PFUZE100_VGENxSTBY,
+  PFUZE100_VGENxSTBY);
+   regmap_update_bits(syspm_pfuze_chip->regmap, PFUZE100_VGEN3VOL,
+  PFUZE100_VGENxLPWR | PFUZE100_VGENxSTBY,
+  PFUZE100_VGENxSTBY);
+   regmap_update_bits(syspm_pfuze_chip->regmap, PFUZE100_VGEN4VOL,
+  PFUZE100_VGENxLPWR | PFUZE100_VGENxSTBY,
+  PFUZE100_VGENxSTBY);
+   regmap_update_bits(syspm_pfuze_chip->regmap, PFUZE100_VGEN5VOL,
+  PFUZE100_VGENxLPWR | PFUZE100_VGENxSTBY,
+  PFUZE100_VGENxSTBY);
+   regmap_update_bits(syspm_pfuze_chip->regmap, PFUZE100_VGEN6VOL,
+  PFUZE100_VGENxLPWR | PFUZE100_VGENxSTBY,
+  PFUZE100_VGENxSTBY);
+}
+
+static int pfuze_power_off_prepare_init(struct pfuze_chip *pfuze_chip)
+{
+   if (pfuze_chip->chip_id != PFUZE100) {
+   dev_warn(pfuze_chip->dev, "Requested pm_power_off_prepare 
handler for not supported chip\n");
+   return -ENODEV;
+   }
+
+   if (pm_power_off_prepare) {
+   dev_warn(pfuze_chip->dev, "pm_power_off_prepare is already 
registered.\n");
+   return -EBUSY;
+   }
+
+   if (syspm_pfuze_chip) {
+   dev_warn(pfuze_chip->dev, "syspm_pfuze_chip is already 

[PATCH v7 3/6] kernel/reboot.c: export pm_power_off_prepare

2018-05-16 Thread Oleksij Rempel
Export pm_power_off_prepare. It is needed to implement power off on
Freescale/NXP iMX6 based boards with external power management
integrated circuit (PMIC).

Signed-off-by: Oleksij Rempel 
---
 kernel/reboot.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/reboot.c b/kernel/reboot.c
index e4ced883d8de..83810d726f3e 100644
--- a/kernel/reboot.c
+++ b/kernel/reboot.c
@@ -49,6 +49,7 @@ int reboot_force;
  */
 
 void (*pm_power_off_prepare)(void);
+EXPORT_SYMBOL_GPL(pm_power_off_prepare);
 
 /**
  * emergency_restart - reboot the system
-- 
2.17.0
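
For context, a hedged sketch of how a module can use the hook once it is
exported — essentially the pattern patch 5/6 follows. The my_pmic_* names are
placeholders for illustration, not real functions.

#include <linux/errno.h>
#include <linux/module.h>
#include <linux/pm.h>

static void my_pmic_power_off_prepare(void)
{
        /* pre-program the PMIC for the upcoming power-off request */
}

static int __init my_pmic_init(void)
{
        if (pm_power_off_prepare)
                return -EBUSY;          /* another handler already claimed it */

        pm_power_off_prepare = my_pmic_power_off_prepare;
        return 0;
}

static void __exit my_pmic_exit(void)
{
        /* release the hook on removal, as the series does since v3 */
        if (pm_power_off_prepare == my_pmic_power_off_prepare)
                pm_power_off_prepare = NULL;
}

module_init(my_pmic_init);
module_exit(my_pmic_exit);
MODULE_LICENSE("GPL");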



[PATCH v7 4/6] regulator: pfuze100: add fsl,pmic-stby-poweroff property

2018-05-16 Thread Oleksij Rempel
Document the new optional "fsl,pmic-stby-poweroff" property.

Signed-off-by: Oleksij Rempel 
Acked-by: Rob Herring 
---
 Documentation/devicetree/bindings/regulator/pfuze100.txt | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/Documentation/devicetree/bindings/regulator/pfuze100.txt 
b/Documentation/devicetree/bindings/regulator/pfuze100.txt
index c6dd3f5e485b..91fec1a0785d 100644
--- a/Documentation/devicetree/bindings/regulator/pfuze100.txt
+++ b/Documentation/devicetree/bindings/regulator/pfuze100.txt
@@ -4,6 +4,13 @@ Required properties:
 - compatible: "fsl,pfuze100", "fsl,pfuze200", "fsl,pfuze3000"
 - reg: I2C slave address
 
+Optional properties:
+- fsl,pmic-stby-poweroff: if present, configure the PMIC to shutdown all
+  power rails when PMIC_STBY_REQ line is asserted during the power off 
sequence.
+  Use this option if the SoC should be powered off by external power
+  management IC (PMIC) on PMIC_STBY_REQ signal.
+  As opposite to PMIC_STBY_REQ boards can implement PMIC_ON_REQ signal.
+
 Required child node:
 - regulators: This is the list of child nodes that specify the regulator
   initialization data for defined regulators. Please refer to below doc
-- 
2.17.0



Re: [PATCH 2/4] pid: Export find_task_by_vpid for use in external modules

2018-05-16 Thread Eric W. Biederman
Russell King - ARM Linux  writes:

> On Thu, May 10, 2018 at 01:39:18PM -0600, Mathieu Poirier wrote:
>> Hi Russell,
>> 
>> On 10 May 2018 at 02:40, Russell King - ARM Linux  
>> wrote:
>> > This does not leak information from other namespaces because of the
>> > uniqueness of the global PID.  However, what it does leak is the value
>> > of the global PID which is meaningless in the namespace.  So, before
>> > the event stream is delivered to userspace, this value needs to be
>> > re-written to the namespace's PID value.
>> 
>> Unfortunately that can't be done.  The trace stream is compressed and
>> needs to be decompressed using an external library.  I think the only
>> option is to return an error if a user is trying to use this feature
>> from a namespace.
>
> That sounds like a sensible approach, and that should get rid of the
> vpid stuff too.
>
> Eric, would this solve all your concerns?

It does, and I have given my ack to the respin.

I am moderately concerned about using the global pid with hardware.  But
pids are a core abstraction that aren't going anywhere.  As long as
hardware does not impose constraints on pids that software does not
already impose, we should be fine.

Eric
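
A hedged sketch of the check the thread converges on: refuse the feature when
the caller is not in the initial PID namespace instead of exposing or
translating global PIDs. The function name and error code are assumptions of
this sketch.

#include <linux/errno.h>
#include <linux/pid_namespace.h>
#include <linux/sched.h>

static int refuse_outside_init_pid_ns(void)
{
        /* Global PIDs would be meaningless in a child PID namespace. */
        if (task_active_pid_ns(current) != &init_pid_ns)
                return -EOPNOTSUPP;

        return 0;
}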


Re: [PATCH 1/2] x86/boot/KASLR: Add two functions for 1GB huge pages handling

2018-05-16 Thread Baoquan He
On 05/17/18 at 07:12am, damian wrote:
> On Wed, 16. May 18:05, Baoquan He wrote:
> > Functions parse_gb_huge_pages() and process_gb_huge_page() are introduced to
> > handle conflict between KASLR and huge pages, will be used in the next 
> > patch.
> > 
> > Function parse_gb_huge_pages() is used to parse kernel command-line to get
> > how many 1GB huge pages have been specified. A static global variable
> > 'max_gb_huge_pages' is added to store the number.
> > 
> > And process_gb_huge_page() is used to skip as many 1GB huge pages as 
> > possible
> > from the passed in memory region according to the specified number.
> > 
> > Signed-off-by: Baoquan He 
> > ---
> >  arch/x86/boot/compressed/kaslr.c | 71 
> > 
> >  1 file changed, 71 insertions(+)
> > 
> > diff --git a/arch/x86/boot/compressed/kaslr.c 
> > b/arch/x86/boot/compressed/kaslr.c
> > index a0a50b91ecef..13bd879cdc5d 100644
> > --- a/arch/x86/boot/compressed/kaslr.c
> > +++ b/arch/x86/boot/compressed/kaslr.c
> > @@ -215,6 +215,32 @@ static void mem_avoid_memmap(char *str)
> > memmap_too_large = true;
> >  }
> >  
> > +/* Store the number of 1GB huge pages which user specified.*/
> > +static unsigned long max_gb_huge_pages;
> > +
> > +static int parse_gb_huge_pages(char *param, char* val)
> > +{
> > +   char *p;
> > +   u64 mem_size;
> > +   static bool gbpage_sz = false;
> > +
> > +   if (!strcmp(param, "hugepagesz")) {
> > +   p = val;
> > +   mem_size = memparse(p, );
> > +   if (mem_size == PUD_SIZE) {
> > +   if (gbpage_sz)
> > +   warn("Repeadly set hugeTLB page size of 1G!\n");
> > +   gbpage_sz = true;
> > +   } else
> > +   gbpage_sz = false;
> > +   } else if (!strcmp(param, "hugepages") && gbpage_sz) {
> > +   p = val;
> > +   max_gb_huge_pages = simple_strtoull(p, , 0);
> > +   debug_putaddr(max_gb_huge_pages);
> > +   }
> > +}
> Hello,
> 
> the return value is missing for the function or not ?

Thanks, very good catch. It should be 'static void ', no return value
since we didn't check it. Will update when I repost.

> 
> > +
> > +
> >  static int handle_mem_memmap(void)
> >  {
> > char *args = (char *)get_cmd_line_ptr();
> > @@ -466,6 +492,51 @@ static void store_slot_info(struct mem_vector *region, 
> > unsigned long image_size)
> > }
> >  }
> >  
> > +/* Skip as many 1GB huge pages as possible in the passed region. */
> > +static void process_gb_huge_page(struct mem_vector *region, unsigned long 
> > image_size)
> > +{
> > +   int i = 0;
> > +   unsigned long addr, size;
> > +   struct mem_vector tmp;
> > +
> > +   if (!max_gb_huge_pages) {
> > +   store_slot_info(region, image_size);
> > +   return;
> > +   }
> > +
> > +   addr = ALIGN(region->start, PUD_SIZE);
> > +   /* If Did we raise the address above the passed in memory entry? */
> > +   if (addr < region->start + region->size)
> > +   size = region->size - (addr - region->start);
> > +
> > +   /* Check how many 1GB huge pages can be filtered out*/
> > +   while (size > PUD_SIZE && max_gb_huge_pages) {
> > +   size -= PUD_SIZE;
> > +   max_gb_huge_pages--;
> > +   i++;
> > +   }
> > +
> > +   if (!i) {
> > +   store_slot_info(region, image_size);
> > +   return;
> > +   }
> > +
> > +   /* Process the remaining regions after filtering out. */
> > +
> > +   if (addr >= region->start + image_size) {
> > +   tmp.start = region->start;
> > +   tmp.size = addr - region->start;
> > +   store_slot_info(, image_size);
> > +   }
> > +
> > +   size  = region->size - (addr - region->start) - i * PUD_SIZE;
> > +if (size >= image_size) {
> > +   tmp.start = addr + i*PUD_SIZE;
> > +   tmp.size = size;
> > +   store_slot_info(, image_size);
> > +}
> > +}
> > +
> >  static unsigned long slots_fetch_random(void)
> >  {
> > unsigned long slot;
> > -- 
> > 2.13.6
> > 



Re: [PATCH v3 2/2] Input: xen-kbdfront - allow better run-time configuration

2018-05-16 Thread Oleksandr Andrushchenko

On 05/17/2018 12:08 AM, Dmitry Torokhov wrote:

On Wed, May 16, 2018 at 08:47:30PM +0300, Oleksandr Andrushchenko wrote:

On 05/16/2018 08:15 PM, Dmitry Torokhov wrote:

Hi Oleksandr,

On Mon, May 14, 2018 at 05:40:29PM +0300, Oleksandr Andrushchenko wrote:

@@ -211,93 +220,114 @@ static int xenkbd_probe(struct xenbus_device *dev,
if (!info->page)
goto error_nomem;
-   /* Set input abs params to match backend screen res */
-   abs = xenbus_read_unsigned(dev->otherend,
-  XENKBD_FIELD_FEAT_ABS_POINTER, 0);
-   ptr_size[KPARAM_X] = xenbus_read_unsigned(dev->otherend,
- XENKBD_FIELD_WIDTH,
- ptr_size[KPARAM_X]);
-   ptr_size[KPARAM_Y] = xenbus_read_unsigned(dev->otherend,
- XENKBD_FIELD_HEIGHT,
- ptr_size[KPARAM_Y]);
-   if (abs) {
-   ret = xenbus_write(XBT_NIL, dev->nodename,
-  XENKBD_FIELD_REQ_ABS_POINTER, "1");
-   if (ret) {
-   pr_warn("xenkbd: can't request abs-pointer\n");
-   abs = 0;
-   }
-   }
+   /*
+* The below are reverse logic, e.g. if the feature is set, then
+* do not expose the corresponding virtual device.
+*/
+   with_kbd = !xenbus_read_unsigned(dev->nodename,
+XENKBD_FIELD_FEAT_DSBL_KEYBRD, 0);
-   touch = xenbus_read_unsigned(dev->nodename,
-XENKBD_FIELD_FEAT_MTOUCH, 0);
-   if (touch) {
+   with_ptr = !xenbus_read_unsigned(dev->nodename,
+XENKBD_FIELD_FEAT_DSBL_POINTER, 0);
+
+   /* Direct logic: if set, then create multi-touch device. */
+   with_mtouch = xenbus_read_unsigned(dev->nodename,
+  XENKBD_FIELD_FEAT_MTOUCH, 0);
+   if (with_mtouch) {
ret = xenbus_write(XBT_NIL, dev->nodename,
   XENKBD_FIELD_REQ_MTOUCH, "1");
if (ret) {
pr_warn("xenkbd: can't request multi-touch");
-   touch = 0;
+   with_mtouch = 0;
}
}

Does it make sense to still end up calling xenkbd_connect_backend() when
all interfaces (keyboard, pointer, and multitouch) are disabled? Should
we do:

if (!(with_kbd || with_ptr || with_mtouch))
return -ENXIO;

?

It does make sense. Then we probably need to move all xenbus_read_unsigned
calls to the very beginning of the .probe, so no memory allocations are made
which will be useless if we return -ENXIO, e.g. something like

static int xenkbd_probe(struct xenbus_device *dev,
                   const struct xenbus_device_id *id)
{
     int ret, i;
     bool with_mtouch, with_kbd, with_ptr;
     struct xenkbd_info *info;
     struct input_dev *kbd, *ptr, *mtouch;



if (!(with_kbd | with_ptr | with_mtouch))
         return -ENXIO;

Does the above looks ok?

Yes. Another option is to keep the check where I suggested and do

if (...) {
ret = -ENXIO;
goto error;
}

Whichever you prefer is fine with me.

I will go with the change you suggested and
I'll send v4 tomorrow then.

Thanks.


Thank you,
Oleksandr
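
For reference, a sketch of the probe ordering agreed above: read the xenbus
feature flags before any allocation and bail out with -ENXIO when every
virtual device is disabled. It is based on the field names in the quoted v3
code and is not the final v4 patch.

#include <linux/errno.h>
#include <linux/types.h>
#include <xen/xenbus.h>
#include <xen/interface/io/kbdif.h>

static int xenkbd_probe_sketch(struct xenbus_device *dev,
                               const struct xenbus_device_id *id)
{
        bool with_kbd, with_ptr, with_mtouch;

        /* Reverse logic: the feature bits disable the corresponding device. */
        with_kbd = !xenbus_read_unsigned(dev->nodename,
                                         XENKBD_FIELD_FEAT_DSBL_KEYBRD, 0);
        with_ptr = !xenbus_read_unsigned(dev->nodename,
                                         XENKBD_FIELD_FEAT_DSBL_POINTER, 0);
        with_mtouch = xenbus_read_unsigned(dev->nodename,
                                           XENKBD_FIELD_FEAT_MTOUCH, 0);

        /* Nothing to expose: fail before allocating xenkbd_info or input devices. */
        if (!(with_kbd || with_ptr || with_mtouch))
                return -ENXIO;

        /* ... allocation and xenkbd_connect_backend() only from here on ... */
        return 0;
}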


Re: [PATCH 11/40] ipv6/flowlabel: simplify pid namespace lookup

2018-05-16 Thread Eric W. Biederman
Christoph Hellwig  writes:

> On Sat, May 05, 2018 at 07:37:33AM -0500, Eric W. Biederman wrote:
>> Christoph Hellwig  writes:
>> 
>> > The whole seq_file sequence already operates under a single RCU lock pair,
>> > so move the pid namespace lookup into it, and stop grabbing a reference
>> > and remove all kinds of boilerplate code.
>> 
>> This is wrong.
>> 
>> Move task_active_pid_ns(current) from open to seq_start actually means
>> that the results if you pass this proc file between callers the results
>> will change.  So this breaks file descriptor passing.
>> 
>> Open is a bad place to access current.  In the middle of read/write is
>> broken.
>> 
>> 
>> In this particular instance looking up the pid namespace with
>> task_active_pid_ns was a personal brain fart.  What the code should be
>> doing (with an appropriate helper) is:
>> 
>> struct pid_namespace *pid_ns = inode->i_sb->s_fs_info;
>> 
>> Because each mount of proc is bound to a pid namespace.  Looking up the
>> pid namespace from the super_block is a much better way to go.
>
> What do you have in mind for the helper?  For now I've thrown it in
> opencoded into my working tree, but I'd be glad to add a helper.
>
> struct pid_namespace *proc_pid_namespace(struct inode *inode)
> {
>   // maybe warn on for s_magic not on procfs??
>   return inode->i_sb->s_fs_info;
> }

That should work.  Ideally out of line for the proc_fs.h version.
Basically it should be a cousin of PDE_DATA.

Eric
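
A hedged sketch of the helper shape being discussed, with the suggested sanity
check on the superblock magic. The name and the WARN_ON are assumptions of this
sketch, not a settled interface.

#include <linux/bug.h>
#include <linux/fs.h>
#include <linux/magic.h>
#include <linux/pid_namespace.h>

/* A cousin of PDE_DATA(): recover the pid namespace a procfs mount is bound to. */
static struct pid_namespace *proc_pid_ns_sketch(const struct inode *inode)
{
        WARN_ON_ONCE(inode->i_sb->s_magic != PROC_SUPER_MAGIC);
        return inode->i_sb->s_fs_info;
}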



Re: [linux-firmware] Version in WHENCE file

2018-05-16 Thread Luciano Coelho
On Mon, 2018-05-07 at 09:47 +0200, Sedat Dilek wrote:
> On Sun, May 6, 2018 at 3:07 PM, Luciano Coelho  com> wrote:
> > On Sun, 2018-05-06 at 14:46 +0200, Sedat Dilek wrote:
> > > Hi Luca,
> > > 
> > > I hope I caught the correct MLs (not sure if linux-firmware has
> > > a
> > > ML,
> > > I did not found any in the MAINTAINERS file).
> > > 
> > > I have seen that in the WHENCE file there is "Version" with and
> > > without ":", mostly iwlwifi ucodes.
> > > 
> > > As an example:
> > > 
> > >  File: iwlwifi-8265-36.ucode
> > > -Version 36.e91976c0.0
> > > +Version: 36.e91976c0.0
> > > 
> > > I don't know the workflow: Do you want to fix it in your tree or
> > > directly in linux-firmware.git upstream?
> > > My attached patch is against upstream.
> > 
> > Thanks, Sedat!
> > 
> > I'm going to send a new pull-request this week, so I can include
> > your
> > patch in my tree and as part of the pull-request.
> > 
> > --
> > Cheers,
> > Luca.
> 
> OK, Thanks.
> Attached Patch v2 is against your tree [1].
> It differs from v1 against upstream.

You need to write a proper commit message and sign it off so I can
apply it.

--
Luca.


Re: [PATCH] dmaengine: qcom: bam_dma: check if the runtime pm enabled

2018-05-16 Thread Vinod
On 14-05-18, 17:18, Srinivas Kandagatla wrote:
> Disabling pm runtime at probe is not sufficient to get BAM working
> on remotely controlled instances. pm_runtime_get_sync() would return
> -EACCES in such cases.
> So check if runtime pm is enabled before returning error from bam functions.
> 
> Fixes: 5b4a68952a89 ("dmaengine: qcom: bam_dma: disable runtime pm on remote 
> controlled")
> Signed-off-by: Srinivas Kandagatla 
> ---
>  drivers/dma/qcom/bam_dma.c | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/dma/qcom/bam_dma.c b/drivers/dma/qcom/bam_dma.c
> index d29275b97e84..5f4babebc508 100644
> --- a/drivers/dma/qcom/bam_dma.c
> +++ b/drivers/dma/qcom/bam_dma.c
> @@ -540,7 +540,7 @@ static void bam_free_chan(struct dma_chan *chan)
>   int ret;
>  
>   ret = pm_runtime_get_sync(bdev->dev);
> - if (ret < 0)
> + if (pm_runtime_enabled(bdev->dev) && ret < 0)

would it make sense to first check enabled and only then do _get_sync()?

if (pm_runtime_enabled()) {
	ret = pm_runtime_get_sync();
	...
}

thus making clear in code that we do calls only when it is enabled. Also you can
add a local macro for this code and use that rather than copy pasting :)
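
Something along these lines, for example (a sketch only; the helper name and
the exact error handling are made up, not taken from the driver):

/* Resume the BAM if runtime PM is in use; treat disabled PM as success. */
static int bam_pm_get(struct bam_device *bdev)
{
	int ret;

	if (!pm_runtime_enabled(bdev->dev))
		return 0;

	ret = pm_runtime_get_sync(bdev->dev);
	if (ret < 0) {
		pm_runtime_put_noidle(bdev->dev);
		return ret;
	}

	return 0;
}

Callers such as bam_free_chan() would then just do "ret = bam_pm_get(bdev);
if (ret) return;" instead of open-coding the check in every function.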

-- 
~Vinod


Re: [PATCH v3 2/2] dmaengine: sprd: Add Spreadtrum DMA configuration

2018-05-16 Thread Vinod
On 11-05-18, 21:06, Baolin Wang wrote:
> +struct sprd_dma_config {
> + struct dma_slave_config cfg;
> + u32 block_len;
> + u32 transcation_len;

/s/transcation/transaction

Now in the code I see block_len, and this is filled by len, which is sg_dma_len()?
So why two variables when you are using only one?

Second, you are storing parameters programmed thru _prep call into
_dma_config. That is not correct.

We use these to store channel parameters and NOT transaction parameters, which
would differ with each txn. No wonder you can support only 1 txn :)

Lastly, if these are same values why not use src/dstn_max_burst?

> +static enum sprd_dma_datawidth
> +sprd_dma_get_datawidth(enum dma_slave_buswidth buswidth)
> +{
> + switch (buswidth) {
> + case DMA_SLAVE_BUSWIDTH_1_BYTE:
> + case DMA_SLAVE_BUSWIDTH_2_BYTES:
> + case DMA_SLAVE_BUSWIDTH_4_BYTES:
> + case DMA_SLAVE_BUSWIDTH_8_BYTES:
> + return ffs(buswidth) - 1;
> + default:
> + return SPRD_DMA_DATAWIDTH_4_BYTES;

Does default make sense, should we error out?
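
For illustration, erroring out could look roughly like this (a sketch only;
the return type becomes a plain int, so callers would have to check for a
negative value):

static int sprd_dma_get_datawidth(enum dma_slave_buswidth buswidth)
{
	switch (buswidth) {
	case DMA_SLAVE_BUSWIDTH_1_BYTE:
	case DMA_SLAVE_BUSWIDTH_2_BYTES:
	case DMA_SLAVE_BUSWIDTH_4_BYTES:
	case DMA_SLAVE_BUSWIDTH_8_BYTES:
		return ffs(buswidth) - 1;
	default:
		return -EINVAL;	/* reject unsupported bus widths */
	}
}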

> +static u32 sprd_dma_get_step(enum dma_slave_buswidth buswidth)
> +{
> + switch (buswidth) {
> + case DMA_SLAVE_BUSWIDTH_1_BYTE:
> + case DMA_SLAVE_BUSWIDTH_2_BYTES:
> + case DMA_SLAVE_BUSWIDTH_4_BYTES:
> + case DMA_SLAVE_BUSWIDTH_8_BYTES:
> + return buswidth;
> +
> + default:
> + return SPRD_DMA_DWORD_STEP;

Does default make sense, should we error out?

> +static int sprd_dma_config(struct dma_chan *chan, struct sprd_dma_desc 
> *sdesc,
> +dma_addr_t src, dma_addr_t dst,
> +struct sprd_dma_config *slave_cfg)
> +{
> + struct sprd_dma_dev *sdev = to_sprd_dma_dev(chan);
> + struct sprd_dma_chn *schan = to_sprd_dma_chan(chan);
> + struct sprd_dma_chn_hw *hw = >chn_hw;
> + u32 fix_mode = 0, fix_en = 0, wrap_en = 0, wrap_mode = 0;
> + u32 src_datawidth, dst_datawidth, temp;
> +
> + if (slave_cfg->cfg.slave_id)
> + schan->dev_id = slave_cfg->cfg.slave_id;
> +
> + hw->cfg = SPRD_DMA_DONOT_WAIT_BDONE << SPRD_DMA_WAIT_BDONE_OFFSET;
> +
> + temp = slave_cfg->wrap_ptr & SPRD_DMA_LOW_ADDR_MASK;
> + temp |= (src >> SPRD_DMA_HIGH_ADDR_OFFSET) & SPRD_DMA_HIGH_ADDR_MASK;
> + hw->wrap_ptr = temp;
> +
> + temp = slave_cfg->wrap_to & SPRD_DMA_LOW_ADDR_MASK;
> + temp |= (dst >> SPRD_DMA_HIGH_ADDR_OFFSET) & SPRD_DMA_HIGH_ADDR_MASK;
> + hw->wrap_to = temp;
> +
> + hw->src_addr = src & SPRD_DMA_LOW_ADDR_MASK;
> + hw->des_addr = dst & SPRD_DMA_LOW_ADDR_MASK;
> +
> + if ((slave_cfg->src_step != 0 && slave_cfg->dst_step != 0)
> + || (slave_cfg->src_step | slave_cfg->dst_step) == 0) {

this check doesn't seem right to me, what are you trying to do here?
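
If the intent is "use fixed-address mode only when exactly one of the two
steps is zero", a more explicit way to write it might be (a sketch of one
reading of the code above; the interpretation itself is an assumption):

	/* Fixed-address mode: exactly one side keeps its address constant. */
	bool src_steps = slave_cfg->src_step != 0;
	bool dst_steps = slave_cfg->dst_step != 0;

	if (src_steps == dst_steps) {
		fix_en = 0;			/* both step, or neither does */
	} else {
		fix_en = 1;
		fix_mode = src_steps ? 1 : 0;	/* matches the original branch */
	}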

> + fix_en = 0;
> + } else {
> + fix_en = 1;
> + if (slave_cfg->src_step)
> + fix_mode = 1;
> + else
> + fix_mode = 0;
> + }
> +
> + if (slave_cfg->wrap_ptr && slave_cfg->wrap_to) {
> + wrap_en = 1;
> + if (slave_cfg->wrap_to == src) {
> + wrap_mode = 0;
> + } else if (slave_cfg->wrap_to == dst) {
> + wrap_mode = 1;
> + } else {
> + dev_err(sdev->dma_dev.dev, "invalid wrap mode\n");
> + return -EINVAL;
> + }
> + }
> +
> + hw->intc = slave_cfg->int_mode | SPRD_DMA_CFG_ERR_INT_EN;
> +
> + src_datawidth = sprd_dma_get_datawidth(slave_cfg->cfg.src_addr_width);
> + dst_datawidth = sprd_dma_get_datawidth(slave_cfg->cfg.dst_addr_width);
> +
> + temp = src_datawidth << SPRD_DMA_SRC_DATAWIDTH_OFFSET;
> + temp |= dst_datawidth << SPRD_DMA_DES_DATAWIDTH_OFFSET;
> + temp |= slave_cfg->req_mode << SPRD_DMA_REQ_MODE_OFFSET;
> + temp |= wrap_mode << SPRD_DMA_WRAP_SEL_OFFSET;
> + temp |= wrap_en << SPRD_DMA_WRAP_EN_OFFSET;
> + temp |= fix_mode << SPRD_DMA_FIX_SEL_OFFSET;
> + temp |= fix_en << SPRD_DMA_FIX_EN_OFFSET;
> + temp |= slave_cfg->cfg.src_maxburst & SPRD_DMA_FRG_LEN_MASK;
> + hw->frg_len = temp;
> +
> + hw->blk_len = slave_cfg->block_len & SPRD_DMA_BLK_LEN_MASK;
> + hw->trsc_len = slave_cfg->transcation_len & SPRD_DMA_TRSC_LEN_MASK;
> +
> + temp = (slave_cfg->dst_step & SPRD_DMA_TRSF_STEP_MASK) <<
> + SPRD_DMA_DEST_TRSF_STEP_OFFSET;
> + temp |= (slave_cfg->src_step & SPRD_DMA_TRSF_STEP_MASK) <<
> + SPRD_DMA_SRC_TRSF_STEP_OFFSET;
> + hw->trsf_step = temp;
> +
> + hw->frg_step = 0;
> + hw->src_blk_step = 0;
> + hw->des_blk_step = 0;
> + return 0;
> +}
> +static struct dma_async_tx_descriptor *
> +sprd_dma_prep_slave_sg(struct dma_chan *chan, struct scatterlist *sgl,
> +unsigned int sglen, enum dma_transfer_direction dir,
> +unsigned long flags, void *context)
> +{
> + 

Re: [PATCH 1/2] x86/boot/KASLR: Add two functions for 1GB huge pages handling

2018-05-16 Thread damian
On Wed, 16. May 18:05, Baoquan He wrote:
> Functions parse_gb_huge_pages() and process_gb_huge_page() are introduced to
> handle conflict between KASLR and huge pages, will be used in the next patch.
> 
> Function parse_gb_huge_pages() is used to parse kernel command-line to get
> how many 1GB huge pages have been specified. A static global variable
> 'max_gb_huge_pages' is added to store the number.
> 
> And process_gb_huge_page() is used to skip as many 1GB huge pages as possible
> from the passed in memory region according to the specified number.
> 
> Signed-off-by: Baoquan He 
> ---
>  arch/x86/boot/compressed/kaslr.c | 71 
> 
>  1 file changed, 71 insertions(+)
> 
> diff --git a/arch/x86/boot/compressed/kaslr.c 
> b/arch/x86/boot/compressed/kaslr.c
> index a0a50b91ecef..13bd879cdc5d 100644
> --- a/arch/x86/boot/compressed/kaslr.c
> +++ b/arch/x86/boot/compressed/kaslr.c
> @@ -215,6 +215,32 @@ static void mem_avoid_memmap(char *str)
>   memmap_too_large = true;
>  }
>  
> +/* Store the number of 1GB huge pages which user specified.*/
> +static unsigned long max_gb_huge_pages;
> +
> +static int parse_gb_huge_pages(char *param, char* val)
> +{
> + char *p;
> + u64 mem_size;
> + static bool gbpage_sz = false;
> +
> + if (!strcmp(param, "hugepagesz")) {
> + p = val;
> + mem_size = memparse(p, );
> + if (mem_size == PUD_SIZE) {
> + if (gbpage_sz)
> + warn("Repeadly set hugeTLB page size of 1G!\n");
> + gbpage_sz = true;
> + } else
> + gbpage_sz = false;
> + } else if (!strcmp(param, "hugepages") && gbpage_sz) {
> + p = val;
> + max_gb_huge_pages = simple_strtoull(p, , 0);
> + debug_putaddr(max_gb_huge_pages);
> + }
> +}
Hello,

the return value is missing for the function, isn't it?
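
For illustration, one way to make it consistent would be to return a status
from every path, or simply declare it void. A sketch, slightly simplified
from the hunk above:

static int parse_gb_huge_pages(char *param, char *val)
{
	static bool gbpage_sz;
	char *p;
	u64 mem_size;

	if (!strcmp(param, "hugepagesz")) {
		p = val;
		mem_size = memparse(p, &p);
		/* Only remember whether a 1GB huge page size was selected. */
		gbpage_sz = (mem_size == PUD_SIZE);
	} else if (!strcmp(param, "hugepages") && gbpage_sz) {
		p = val;
		max_gb_huge_pages = simple_strtoull(p, &p, 0);
	}

	return 0;
}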

regards
Damian

> +
> +
>  static int handle_mem_memmap(void)
>  {
>   char *args = (char *)get_cmd_line_ptr();
> @@ -466,6 +492,51 @@ static void store_slot_info(struct mem_vector *region, 
> unsigned long image_size)
>   }
>  }
>  
> +/* Skip as many 1GB huge pages as possible in the passed region. */
> +static void process_gb_huge_page(struct mem_vector *region, unsigned long 
> image_size)
> +{
> + int i = 0;
> + unsigned long addr, size;
> + struct mem_vector tmp;
> +
> + if (!max_gb_huge_pages) {
> + store_slot_info(region, image_size);
> + return;
> + }
> +
> + addr = ALIGN(region->start, PUD_SIZE);
> + /* If Did we raise the address above the passed in memory entry? */
> + if (addr < region->start + region->size)
> + size = region->size - (addr - region->start);
> +
> + /* Check how many 1GB huge pages can be filtered out*/
> + while (size > PUD_SIZE && max_gb_huge_pages) {
> + size -= PUD_SIZE;
> + max_gb_huge_pages--;
> + i++;
> + }
> +
> + if (!i) {
> + store_slot_info(region, image_size);
> + return;
> + }
> +
> + /* Process the remaining regions after filtering out. */
> +
> + if (addr >= region->start + image_size) {
> + tmp.start = region->start;
> + tmp.size = addr - region->start;
> + store_slot_info(, image_size);
> + }
> +
> + size  = region->size - (addr - region->start) - i * PUD_SIZE;
> +if (size >= image_size) {
> + tmp.start = addr + i*PUD_SIZE;
> + tmp.size = size;
> + store_slot_info(, image_size);
> +}
> +}
> +
>  static unsigned long slots_fetch_random(void)
>  {
>   unsigned long slot;
> -- 
> 2.13.6
> 


Re: Linux 4.16-rc1: regression bisected, Debian kernel package tool make-kpkg stalls indefinitely during kernel build due to commit "kconfig: remove check_stdin()"

2018-05-16 Thread Kevin Locke
On Tue, Feb 13, 2018 at 2:07 PM, Ulf Magnusson  wrote:
>> On Tue, Feb 13, 2018 at 12:33:24PM +0100, Ulf Magnusson wrote:
>> Sander Eikelenboom wrote:
>>> The Debian kernel-package tool make-kpkg for easy building of upstream
>>> kernels on Debian fails with linux 4.16-rc1.
>>>
>>> Bisection turned up as culprit:
>>>  commit d2a04648a5dbc3d1d043b35257364f0197d4d868
>>>
>>> What the commit does is to make silentoldconfig not immediately exit(1)
>>> when both of the following apply:
>>>
>>> 1. stdin is from something that's not a terminal
>>>
>>> 2. New symbols are prompted for
> 
> Shouldn't be a problem to back this one out either if it turns out to
> cause massive amounts of pain in practice I guess, even if it's the
> Debian tools doing something weird.
> 
> Good to look into what it is they're doing in any case.

It appears that make-kpkg is trying to extract kernel version and
configuration information by including the kernel Makefile then
referencing the variables it exports.  The offending make-kpkg
Makefile is kernel_version.mk[1] which can be simplified to the
following example:

-8<- make-kpkg-regression.mk -
include Makefile
.PHONY: debian_VERSION
debian_VERSION:
-8<---

Invoking this script from the root of the kernel source tree, with
stderr redirected (but not stdin) demonstrates the issue:

make -f make-kpkg-regression.mk debian_VERSION >/dev/null 2>&1

Before d2a04648a5db this would exit, after it will hang waiting for
input.  This occurs because the debian_VERSION target name isn't in
no-dot-config-targets defined in Makefile, so Makefile invokes
silentoldconfig automatically as necessary.[2]

I think the issue is best fixed by make-kpkg, and that d2a04648a5db
doesn't need to be reverted.  The simple fix would be for make-kpkg to
redirect stdin when redirecting stdout/stderr so that
scripts/kconfig/conf will fail as it did before.  The better fix would
be for make-kpkg to use a supported interface for getting the
variables it needs, rather than including Makefile.

Best,
Kevin

1. 
https://salsa.debian.org/srivasta/kernel-package/blob/master/kernel/ruleset/kernel_version.mk
2. Interestingly, it's not just that the target isn't in
   no-dot-config-targets.  make-kpkg actually sets `dot-config := 1`
   to force silentoldconfig (since 38399c1[3]).
3. https://salsa.debian.org/srivasta/kernel-package/commit/38399c1


Re: [PATCH 4/4] staging: lustre: obdclass: change object lookup to no wait mode

2018-05-16 Thread James Simmons

> > > Anyway, I understand that Intel has been ignoring kernel.org instead of
> > > forwarding their patches properly, so you're doing a difficult
> > > and thankless job...  Thanks for that.  I'm sure it's frustrating to
> > > look at these patches for you as well.
> > 
> > Thank you for the compliment. Also thank you for taking time to review
> > these patches. Your feedback is most welcome and beneficial to the
> > health of the lustre client.
> > 
> > Sadly it's not just Intel but other vendors that don't directly contribute
> > to the linux lustre client. I have spoken to the vendors about contributing
> > and they all say the same thing: no working with drivers in the staging
> > tree. Sadly all the parties involved are very interested in the success 
> > of the lustre client. No one has ever told me directly why they don't get
> > involved but I suspect it has to deal with 2 reasons. One is that staging
> > drivers are not normally enabled by distributions so their clients 
> > normally will never deal with the staging lustre client. Secondly, vendors
> > just lack the manpower to contribute in a meaningful way.
> 
> If staging is hurting you, why is it in staging at all?  Why not just
> drop it, go off and spend a few months to clean up all the issues in
> your own tree (with none of those pesky requirements of easy-to-review
> patches) and then submit a "clean" filesystem for inclusion in the
> "real" part of the kernel tree?
> 
> It doesn't sound like anyone is actually using this code in the tree
> as-is, so why even keep it here?

I never said being in staging is hurting the progression of Lustre. In 
fact it is the exact opposite otherwise I wouldn't be active in this work.
What I was pointing out to Dan was that many vendors are reluctant to 
participate in broader open source development of this type.

The whole point of this is to evolve Lustre into a proper open source 
project not dependent on vendors for survival. Several years ago Lustre 
changed hands several times and the HPC community was worried about its
survival. Various institutions banded together to raise the resources to
keep it alive. Over time Lustre has been migrating to a more open source 
community effort. An awesome example is the work the University of Indiana 
did for the sptlrpc layer. Now we see efforts expanding into the realm of 
the linux lustre client. Actually HPC sites that are community members are 
testing and running the linux client. In spite of the lack of vendor 
involvement the linux lustre client is making excellent progress. How 
often do you see style patches anymore? The headers are properly split
between userspace UAPI headers and kernel space. One of the major barriers
to leaving staging was the lack of a strong presence to continue moving
the lustre client forward. That is no longer the case. The finish line is
in view.


Re: [PATCH RFC] schedutil: Allow cpufreq requests to be made even when kthread kicked

2018-05-16 Thread Viresh Kumar
On 16-05-18, 15:45, Joel Fernandes (Google) wrote:
>  kernel/sched/cpufreq_schedutil.c | 36 +---
>  1 file changed, 28 insertions(+), 8 deletions(-)
> 
> diff --git a/kernel/sched/cpufreq_schedutil.c 
> b/kernel/sched/cpufreq_schedutil.c
> index e13df951aca7..a87fc281893d 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -92,9 +92,6 @@ static bool sugov_should_update_freq(struct sugov_policy 
> *sg_policy, u64 time)
>   !cpufreq_can_do_remote_dvfs(sg_policy->policy))
>   return false;
>  
> - if (sg_policy->work_in_progress)
> - return false;
> -
>   if (unlikely(sg_policy->need_freq_update)) {
>   sg_policy->need_freq_update = false;
>   /*
> @@ -129,8 +126,11 @@ static void sugov_update_commit(struct sugov_policy 
> *sg_policy, u64 time,
>   policy->cur = next_freq;
>   trace_cpu_frequency(next_freq, smp_processor_id());
>   } else {
> - sg_policy->work_in_progress = true;
> - irq_work_queue(_policy->irq_work);
> + /* Don't queue request if one was already queued */
> + if (!sg_policy->work_in_progress) {

Merge it above to make it "else if".
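
I.e. the tail of sugov_update_commit() would end up shaped roughly like this
(a sketch reconstructed from the hunk above, not the actual patch):

	if (policy->fast_switch_enabled) {
		next_freq = cpufreq_driver_fast_switch(policy, next_freq);
		if (!next_freq)
			return;

		policy->cur = next_freq;
		trace_cpu_frequency(next_freq, smp_processor_id());
	} else if (!sg_policy->work_in_progress) {
		sg_policy->work_in_progress = true;
		irq_work_queue(&sg_policy->irq_work);
	}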

> + sg_policy->work_in_progress = true;
> + irq_work_queue(_policy->irq_work);
> + }
>   }
>  }
>  
> @@ -291,6 +291,15 @@ static void sugov_update_single(struct update_util_data 
> *hook, u64 time,
>  
>   ignore_dl_rate_limit(sg_cpu, sg_policy);
>  
> + /*
> +  * For slow-switch systems, single policy requests can't run at the
> +  * moment if the governor thread is already processing a pending
> +  * frequency switch request, this can be fixed by acquiring update_lock
> +  * while updating next_freq and work_in_progress but we prefer not to.
> +  */
> + if (sg_policy->work_in_progress)
> + return;
> +

@Rafael: Do you think it's worth starting to use the lock now for unshared
policies?

>   if (!sugov_should_update_freq(sg_policy, time))
>   return;
>  
> @@ -382,13 +391,24 @@ sugov_update_shared(struct update_util_data *hook, u64 
> time, unsigned int flags)
>  static void sugov_work(struct kthread_work *work)
>  {
>   struct sugov_policy *sg_policy = container_of(work, struct 
> sugov_policy, work);
> + unsigned int freq;
> + unsigned long flags;
> +
> + /*
> +  * Hold sg_policy->update_lock shortly to handle the case where:
> +  * incase sg_policy->next_freq is read here, and then updated by
> +  * sugov_update_shared just before work_in_progress is set to false
> +  * here, we may miss queueing the new update.
> +  */
> + raw_spin_lock_irqsave(_policy->update_lock, flags);
> + freq = sg_policy->next_freq;
> + sg_policy->work_in_progress = false;
> + raw_spin_unlock_irqrestore(_policy->update_lock, flags);
>  
>   mutex_lock(_policy->work_lock);
> - __cpufreq_driver_target(sg_policy->policy, sg_policy->next_freq,
> + __cpufreq_driver_target(sg_policy->policy, freq,
>   CPUFREQ_RELATION_L);

No need of line break anymore.

>   mutex_unlock(_policy->work_lock);
> -
> - sg_policy->work_in_progress = false;
>  }
>  
>  static void sugov_irq_work(struct irq_work *irq_work)

LGTM.

-- 
viresh


[PATCH] clk: imx6sl: correct ocram_podf clock type

2018-05-16 Thread Anson Huang
IMX6SL_CLK_OCRAM_PODF is a busy divider; its name in the
CCM_CDHIPR register of the Reference Manual CCM chapter
is axi_podf_busy, so correct its clock type.

Signed-off-by: Anson Huang 
---
 drivers/clk/imx/clk-imx6sl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/clk/imx/clk-imx6sl.c b/drivers/clk/imx/clk-imx6sl.c
index 9642cdf..66b1dd1 100644
--- a/drivers/clk/imx/clk-imx6sl.c
+++ b/drivers/clk/imx/clk-imx6sl.c
@@ -330,7 +330,7 @@ static void __init imx6sl_clocks_init(struct device_node 
*ccm_node)
clks[IMX6SL_CLK_PERIPH2] = imx_clk_busy_mux("periph2", base + 0x14, 26, 
 1,   base + 0x48, 3,  periph2_sels, ARRAY_SIZE(periph2_sels));
 
/*   name   
  parent_name  reg   shift width */
-   clks[IMX6SL_CLK_OCRAM_PODF]= imx_clk_divider("ocram_podf",  
  "ocram_sel", base + 0x14, 16, 3);
+   clks[IMX6SL_CLK_OCRAM_PODF]= imx_clk_busy_divider("ocram_podf", 
  "ocram_sel", base + 0x14, 16, 3, base + 0x48, 0);
clks[IMX6SL_CLK_PERIPH_CLK2_PODF]  = 
imx_clk_divider("periph_clk2_podf",  "periph_clk2_sel",   base + 0x14, 27, 3);
clks[IMX6SL_CLK_PERIPH2_CLK2_PODF] = 
imx_clk_divider("periph2_clk2_podf", "periph2_clk2_sel",  base + 0x14, 0,  3);
clks[IMX6SL_CLK_IPG]   = imx_clk_divider("ipg", 
  "ahb",   base + 0x14, 8,  2);
-- 
2.7.4



[PATCH] clk: imx6sx: disable unnecessary clocks during clock initialization

2018-05-16 Thread Anson Huang
Disable those unnecessary clocks during kernel boot-up to save power;
those module clocks should be managed by the module drivers at runtime.

Signed-off-by: Anson Huang 
---
 drivers/clk/imx/clk-imx6sx.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/clk/imx/clk-imx6sx.c b/drivers/clk/imx/clk-imx6sx.c
index 0178ee2..10c771b 100644
--- a/drivers/clk/imx/clk-imx6sx.c
+++ b/drivers/clk/imx/clk-imx6sx.c
@@ -97,12 +97,7 @@ static int const clks_init_on[] __initconst = {
IMX6SX_CLK_IPMUX1, IMX6SX_CLK_IPMUX2, IMX6SX_CLK_IPMUX3,
IMX6SX_CLK_WAKEUP, IMX6SX_CLK_MMDC_P0_FAST, IMX6SX_CLK_MMDC_P0_IPG,
IMX6SX_CLK_ROM, IMX6SX_CLK_ARM, IMX6SX_CLK_IPG, IMX6SX_CLK_OCRAM,
-   IMX6SX_CLK_PER2_MAIN, IMX6SX_CLK_PERCLK, IMX6SX_CLK_M4,
-   IMX6SX_CLK_QSPI1, IMX6SX_CLK_QSPI2, IMX6SX_CLK_UART_IPG,
-   IMX6SX_CLK_UART_SERIAL, IMX6SX_CLK_I2C3, IMX6SX_CLK_ECSPI5,
-   IMX6SX_CLK_CAN1_IPG, IMX6SX_CLK_CAN1_SERIAL, IMX6SX_CLK_CAN2_IPG,
-   IMX6SX_CLK_CAN2_SERIAL, IMX6SX_CLK_CANFD, IMX6SX_CLK_EPIT1,
-   IMX6SX_CLK_EPIT2,
+   IMX6SX_CLK_PER2_MAIN, IMX6SX_CLK_PERCLK, IMX6SX_CLK_TZASC1,
 };
 
 static const struct clk_div_table clk_enet_ref_table[] = {
-- 
2.7.4



[PATCH v7 2/3] gpio: pca953x: define masks for addressing common and extended registers

2018-05-16 Thread H. Nikolaus Schaller
These mask bits are to be used to map the extended register
addresses (which are defined for an unsupported 8-bit pcal chip)
to 16- and 24-bit chips (pcal6524).

Reviewed-by: Andy Shevchenko 
Signed-off-by: H. Nikolaus Schaller 
---
 drivers/gpio/gpio-pca953x.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpio/gpio-pca953x.c b/drivers/gpio/gpio-pca953x.c
index 2b667166e855..c682921d7019 100644
--- a/drivers/gpio/gpio-pca953x.c
+++ b/drivers/gpio/gpio-pca953x.c
@@ -56,6 +56,10 @@
 #define PCAL6524_DEBOUNCE  0x2d
 
 #define PCA_GPIO_MASK  0x00FF
+
+#define PCAL_GPIO_MASK 0x1f
+#define PCAL_PINCTRL_MASK  0xe0
+
 #define PCA_INT0x0100
 #define PCA_PCAL   0x0200
 #define PCA_LATCH_INT (PCA_PCAL | PCA_INT)
-- 
2.12.2



Re: linux-next: build failure after merge of the drm tree

2018-05-16 Thread Dave Airlie
I've applied this locally for now so I can continue arm64 builds :-)

Dave.

On 16 May 2018 at 18:09, Oded Gabbay  wrote:
> On Wed, May 16, 2018 at 9:53 AM, Stephen Rothwell  
> wrote:
>> Hi all,
>>
>> After merging the drm tree, today's linux-next build (powerpc
>> allyesconfig) failed like this:
>>
>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c: In function 
>> 'init_user_pages':
>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c:632:3: error: implicit 
>> declaration of function 'release_pages'; did you mean 'release_task'? 
>> [-Werror=implicit-function-declaration]
>>release_pages(mem->user_pages, bo->tbo.ttm->num_pages);
>>^
>>release_task
>>
>> Caused by commit
>>
>>   5ae0283e831a ("drm/amdgpu: Add userptr support for KFD")
>>
>> I have applied the following patch for today:
>>
>> From: Stephen Rothwell 
>> Date: Wed, 16 May 2018 16:43:34 +1000
>> Subject: [PATCH] drm/amdgpu: include pagemap.h for release_pages()
>>
>> Fixes: 5ae0283e831a ("drm/amdgpu: Add userptr support for KFD"
>> Cc: Felix Kuehling 
>> Cc: Oded Gabbay 
>> Signed-off-by: Stephen Rothwell 
>> ---
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> index 72ab2b1ffe75..ff8fd75f7ca5 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> @@ -23,6 +23,7 @@
>>  #define pr_fmt(fmt) "kfd2kgd: " fmt
>>
>>  #include 
>> +#include 
>>  #include 
>>  #include 
>>  #include "amdgpu_object.h"
>> --
>> 2.17.0
>>
>> --
>> Cheers,
>> Stephen Rothwell
>
> Thanks Stephen,
>
> I'll add it to amdkfd-next and send it to Dave with other fixes.
>
> Oded
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH v7 3/3] gpio: pca953x: fix address calculation for pcal6524

2018-05-16 Thread H. Nikolaus Schaller
The register constants are so far defined in a way that they fit
for the pcal9555a when shifted by the number of banks, i.e. are
multiplied by 2 in the accessor function.

Now, the pcal6524 has 3 banks which means the relative offset
is multiplied by 4 for the standard registers.

Simply applying the bit shift to the extended registers gives
a wrong result, since the base offset is already included in
the offset.

Therefore, we have to add code to the 24-bit accessor functions
that adjusts the register number for these extended registers.

The formula finally used was developed and proposed by
Andy Shevchenko .

Suggested-by: Andy Shevchenko 
Signed-off-by: H. Nikolaus Schaller 
---
 drivers/gpio/gpio-pca953x.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/gpio/gpio-pca953x.c b/drivers/gpio/gpio-pca953x.c
index c682921d7019..4ad553f4e41f 100644
--- a/drivers/gpio/gpio-pca953x.c
+++ b/drivers/gpio/gpio-pca953x.c
@@ -222,9 +222,11 @@ static int pca957x_write_regs_16(struct pca953x_chip 
*chip, int reg, u8 *val)
 static int pca953x_write_regs_24(struct pca953x_chip *chip, int reg, u8 *val)
 {
int bank_shift = fls((chip->gpio_chip.ngpio - 1) / BANK_SZ);
+   int addr = (reg & PCAL_GPIO_MASK) << bank_shift;
+   int pinctrl = (reg & PCAL_PINCTRL_MASK) << 1;
 
return i2c_smbus_write_i2c_block_data(chip->client,
- (reg << bank_shift) | REG_ADDR_AI,
+ pinctrl | addr | REG_ADDR_AI,
  NBANK(chip), val);
 }
 
@@ -264,9 +266,11 @@ static int pca953x_read_regs_16(struct pca953x_chip *chip, 
int reg, u8 *val)
 static int pca953x_read_regs_24(struct pca953x_chip *chip, int reg, u8 *val)
 {
int bank_shift = fls((chip->gpio_chip.ngpio - 1) / BANK_SZ);
+   int addr = (reg & PCAL_GPIO_MASK) << bank_shift;
+   int pinctrl = (reg & PCAL_PINCTRL_MASK) << 1;
 
return i2c_smbus_read_i2c_block_data(chip->client,
-(reg << bank_shift) | REG_ADDR_AI,
+pinctrl | addr | REG_ADDR_AI,
 NBANK(chip), val);
 }
 
-- 
2.12.2
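
As a worked example of the resulting mapping (the numbers follow directly from
the masks and shifts in this patch; whether they line up with the pcal6524
datasheet is not claimed here):

/*
 * 24-bit chip: bank_shift = fls((24 - 1) / BANK_SZ) = fls(2) = 2
 *
 * Ordinary register 0x03:
 *	addr    = (0x03 & PCAL_GPIO_MASK)    << 2 = 0x0c
 *	pinctrl = (0x03 & PCAL_PINCTRL_MASK) << 1 = 0x00
 *	sent    = 0x0c | REG_ADDR_AI
 *
 * Extended register PCAL6524_DEBOUNCE (0x2d):
 *	addr    = (0x2d & 0x1f) << 2 = 0x34
 *	pinctrl = (0x2d & 0xe0) << 1 = 0x40
 *	sent    = 0x74 | REG_ADDR_AI
 */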



[PATCH v7 1/3] gpio: pca953x: set the PCA_PCAL flag also when matching by DT

2018-05-16 Thread H. Nikolaus Schaller
The of_device_table is missing the PCA_PCAL flag so the
pcal6524 would be operated in tca6424 compatibility mode which
does not handle the new interrupt mask registers.

Suggested-by: Andy Shevchenko 
Signed-off-by: H. Nikolaus Schaller 
---
 drivers/gpio/gpio-pca953x.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpio/gpio-pca953x.c b/drivers/gpio/gpio-pca953x.c
index 7d37692d672e..2b667166e855 100644
--- a/drivers/gpio/gpio-pca953x.c
+++ b/drivers/gpio/gpio-pca953x.c
@@ -58,6 +58,7 @@
 #define PCA_GPIO_MASK  0x00FF
 #define PCA_INT0x0100
 #define PCA_PCAL   0x0200
+#define PCA_LATCH_INT (PCA_PCAL | PCA_INT)
 #define PCA953X_TYPE   0x1000
 #define PCA957X_TYPE   0x2000
 #define PCA_TYPE_MASK  0xF000
@@ -946,8 +947,8 @@ static const struct of_device_id pca953x_dt_ids[] = {
{ .compatible = "nxp,pca9575", .data = OF_957X(16, PCA_INT), },
{ .compatible = "nxp,pca9698", .data = OF_953X(40, 0), },
 
-   { .compatible = "nxp,pcal6524", .data = OF_953X(24, PCA_INT), },
-   { .compatible = "nxp,pcal9555a", .data = OF_953X(16, PCA_INT), },
+   { .compatible = "nxp,pcal6524", .data = OF_953X(24, PCA_LATCH_INT), },
+   { .compatible = "nxp,pcal9555a", .data = OF_953X(16, PCA_LATCH_INT), },
 
{ .compatible = "maxim,max7310", .data = OF_953X( 8, 0), },
{ .compatible = "maxim,max7312", .data = OF_953X(16, PCA_INT), },
-- 
2.12.2



[PATCH v7 0/3] pcal6524 extensions and fixes for pca953x driver

2018-05-16 Thread H. Nikolaus Schaller
V7:
* replace PCAL register masks by hex constants

2018-05-16 19:01:29: V6:
* added proper attribution to the formula used for fixing the
  pcal6524 register address (changes commit message only)
* add back missing first patch from V2 that defines the
  PCA_LATCH_INT constant
* removed patches already merged

2018-04-28 18:33:42: V5:
* fix wrong split up between patches 1/7 and 2/7.

2018-04-26 19:35:07: V4:
* introduced PCA_LATCH_INT constant to make of_table more
  readable (suggested by Andy Shevchenko)
* converted all register constants to hex in a separate
  patch (suggested by Andy Shevchenko)
* separated additional pcal953x and pcal6524 register
  definitions into separate patches (suggested by Andy Shevchenko)
* made special pcal6524 address adjustment more readable
  (suggested by Andy Shevchenko)
* moved gpio-controller and interrupt-controller to the
  "required" section (reviewed by Rob Herring)

2018-04-10 18:07:07: V3:
* add Reported-by: and Reviewed-by:
* fix wording for bindings description and example
* convert all register offsets to hex
* omit the LEVEL-IRQ RFC/hack commit

2018-04-04 21:00:27: V2:
* added PCA_PCAL flags if matched through of-table
* fix address calculation for extended PCAL6524 registers
* hack to map LEVEL_LOW to EDGE_FALLING to be able to
  test in combination with ts3a227e driver
* improve description of bindings for optional vcc-supply
  and interrupt-controller;

2018-03-10 09:32:53: no initial description

H. Nikolaus Schaller (3):
  gpio: pca953x: set the PCA_PCAL flag also when matching by DT
  gpio: pca953x: define masks for addressing common and extended
registers
  gpio: pca953x: fix address calculation for pcal6524

 drivers/gpio/gpio-pca953x.c | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

-- 
2.12.2



Re: [PATCH] perf annotate: Support multiple events without group

2018-05-16 Thread Jin, Yao

Hi,

Any comments for this fix?

Thanks
Jin Yao

On 5/10/2018 9:59 PM, Jin Yao wrote:

See example,

perf record -e cycles,branches ./div
perf annotate main --stdio or
perf annotate main --stdio2 or
perf annotate main

The "perf annotate" should show both cycles and branches on the
left side, but actually it only shows one event (cycles).

It works with an event group like:
perf record -e "{cycles,branches}" ./div

It should work even without a group.

This patch uses 'symbol_conf.nr_events > 1' to check if it's the
multiple events case.

With this patch,

perf record -e cycles,branches ./div
perf annotate main --stdio2

   ..
for (i = 0; i < 20; i++) {
flag = compute_flag();
   4.85   5.83  38:   xor%eax,%eax
   0.01   0.01  → callq  compute_flag

count++;
   4.44   2.27movcount,%edx
   0.00   0.00add$0x1,%edx
   ..

Signed-off-by: Jin Yao 
---
  tools/perf/util/annotate.c | 8 
  1 file changed, 8 insertions(+)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 5d74a30..ca8d4b1 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -1054,6 +1054,8 @@ annotation_line__new(struct annotate_args *args, size_t 
privsize)
  
  	if (perf_evsel__is_group_event(evsel))

nr = evsel->nr_members;
+   else if (symbol_conf.nr_events > nr)
+   nr = symbol_conf.nr_events;
  
  	size += sizeof(al->samples[0]) * nr;
  
@@ -1326,6 +1328,8 @@ annotation_line__print(struct annotation_line *al, struct symbol *sym, u64 start
  
  		if (perf_evsel__is_group_event(evsel))

width *= evsel->nr_members;
+   else if (symbol_conf.nr_events > 1)
+   width *= symbol_conf.nr_events;
  
  		if (!*al->line)

printf(" %*s:\n", width, " ");
@@ -1967,6 +1971,8 @@ int symbol__annotate_printf(struct symbol *sym, struct 
map *map,
  
  	if (perf_evsel__is_group_event(evsel))

width *= evsel->nr_members;
+   else if (symbol_conf.nr_events > 1)
+   width *= symbol_conf.nr_events;
  
  	graph_dotted_len = printf(" %-*.*s|	Source code & Disassembly of %s for %s (%" PRIu64 " samples)\n",

  width, width, symbol_conf.show_total_period ? 
"Period" :
@@ -2578,6 +2584,8 @@ int symbol__annotate2(struct symbol *sym, struct map 
*map, struct perf_evsel *ev
  
  	if (perf_evsel__is_group_event(evsel))

nr_pcnt = evsel->nr_members;
+   else if (symbol_conf.nr_events > nr_pcnt)
+   nr_pcnt = symbol_conf.nr_events;
  
  	err = symbol__annotate(sym, map, evsel, 0, parch);

if (err)



Re: [PATCH v5 13/13] mm: Clear shrinker bit if there are no objects related to memcg

2018-05-16 Thread Vladimir Davydov
On Tue, May 15, 2018 at 11:55:04AM +0300, Kirill Tkhai wrote:
> >> @@ -586,8 +586,23 @@ static unsigned long shrink_slab_memcg(gfp_t 
> >> gfp_mask, int nid,
> >>continue;
> >>  
> >>ret = do_shrink_slab(, shrinker, priority);
> >> -  if (ret == SHRINK_EMPTY)
> >> -  ret = 0;
> >> +  if (ret == SHRINK_EMPTY) {
> >> +  clear_bit(i, map->map);
> >> +  /*
> >> +   * Pairs with mb in memcg_set_shrinker_bit():
> >> +   *
> >> +   * list_lru_add() shrink_slab_memcg()
> >> +   *   list_add_tail()clear_bit()
> >> +   *  
> >> +   *   set_bit()  do_shrink_slab()
> >> +   */
> > 
> > Please improve the comment so that it isn't just a diagram.
> 
> Please, say, which comment you want to see here.

I want the reader to understand why we need to invoke the shrinker twice
if it returns SHRINK_EMPTY. The diagram doesn't really help here IMO. So
I'd write something like this:

ret = do_shrink_slab(&sc, shrinker, priority);
if (ret == SHRINK_EMPTY) {
clear_bit(i, map->map);
/*
 * After the shrinker reported that it had no objects to free,
 * but before we cleared the corresponding bit in the memcg
 * shrinker map, a new object might have been added. To make
 * sure we have the bit set in this case, we invoke the
 * shrinker one more time and re-set the bit if it reports that
 * it is not empty anymore. The memory barrier here pairs with
 * the barrier in memcg_set_shrinker_bit():
 *
 * list_lru_add() shrink_slab_memcg()
 *   list_add_tail()clear_bit()
 *  
 *   set_bit()  do_shrink_slab()
 */
smp_mb__after_atomic();
ret = do_shrink_slab(&sc, shrinker, priority);
if (ret == SHRINK_EMPTY)
ret = 0;
else
memcg_set_shrinker_bit(memcg, nid, i);
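
For readers following the barrier pairing: the other side lives in
memcg_set_shrinker_bit().  A rough sketch of that function is shown below
for context only -- it is reconstructed from the discussion, not quoted
from the series, so details may differ:

  /* sketch of the pairing site; reconstructed for context, may differ in detail */
  static void memcg_set_shrinker_bit(struct mem_cgroup *memcg, int nid, int shrinker_id)
  {
          struct memcg_shrinker_map *map;

          rcu_read_lock();
          map = rcu_dereference(memcg->nodeinfo[nid]->shrinker_map);
          /* Pairs with smp_mb__after_atomic() in shrink_slab_memcg() */
          smp_mb__before_atomic();
          set_bit(shrinker_id, map->map);
          rcu_read_unlock();
  }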


[PATCH 3/6] workqueue: Make worker_attach/detach_pool() update worker->pool

2018-05-16 Thread Tejun Heo
For historical reasons, the worker attach/detach functions don't
currently manage worker->pool and the callers are manually and
inconsistently updating it.

This patch moves worker->pool updates into the worker attach/detach
functions.  This makes worker->pool consistent and clearly defines how
worker->pool updates are synchronized.

This will help later workqueue visibility improvements by allowing
safe access to workqueue information from worker->task.

Signed-off-by: Tejun Heo 
---
 kernel/workqueue.c  | 16 
 kernel/workqueue_internal.h |  2 +-
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 91fe0a6..2fde50f 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1741,6 +1741,7 @@ static void worker_attach_to_pool(struct worker *worker,
worker->flags |= WORKER_UNBOUND;
 
list_add_tail(>node, >workers);
+   worker->pool = pool;
 
mutex_unlock(_pool_attach_mutex);
 }
@@ -1748,19 +1749,21 @@ static void worker_attach_to_pool(struct worker *worker,
 /**
  * worker_detach_from_pool() - detach a worker from its pool
  * @worker: worker which is attached to its pool
- * @pool: the pool @worker is attached to
  *
  * Undo the attaching which had been done in worker_attach_to_pool().  The
  * caller worker shouldn't access to the pool after detached except it has
  * other reference to the pool.
  */
-static void worker_detach_from_pool(struct worker *worker,
-   struct worker_pool *pool)
+static void worker_detach_from_pool(struct worker *worker)
 {
+   struct worker_pool *pool = worker->pool;
struct completion *detach_completion = NULL;
 
mutex_lock(_pool_attach_mutex);
+
list_del(>node);
+   worker->pool = NULL;
+
if (list_empty(>workers))
detach_completion = pool->detach_completion;
mutex_unlock(_pool_attach_mutex);
@@ -1799,7 +1802,6 @@ static struct worker *create_worker(struct worker_pool 
*pool)
if (!worker)
goto fail;
 
-   worker->pool = pool;
worker->id = id;
 
if (pool->cpu >= 0)
@@ -2236,7 +2238,7 @@ static int worker_thread(void *__worker)
 
set_task_comm(worker->task, "kworker/dying");
ida_simple_remove(>worker_ida, worker->id);
-   worker_detach_from_pool(worker, pool);
+   worker_detach_from_pool(worker);
kfree(worker);
return 0;
}
@@ -2367,7 +2369,6 @@ static int rescuer_thread(void *__rescuer)
worker_attach_to_pool(rescuer, pool);
 
spin_lock_irq(>lock);
-   rescuer->pool = pool;
 
/*
 * Slurp in all works issued via this workqueue and
@@ -2417,10 +2418,9 @@ static int rescuer_thread(void *__rescuer)
if (need_more_worker(pool))
wake_up_worker(pool);
 
-   rescuer->pool = NULL;
spin_unlock_irq(>lock);
 
-   worker_detach_from_pool(rescuer, pool);
+   worker_detach_from_pool(rescuer);
 
spin_lock_irq(_mayday_lock);
}
diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h
index d390d1b..4a182e0 100644
--- a/kernel/workqueue_internal.h
+++ b/kernel/workqueue_internal.h
@@ -37,7 +37,7 @@ struct worker {
/* 64 bytes boundary on 64bit, 32 on 32bit */
 
struct task_struct  *task;  /* I: worker task */
-   struct worker_pool  *pool;  /* I: the associated pool */
+   struct worker_pool  *pool;  /* A: the associated pool */
/* L: for rescuers */
struct list_headnode;   /* A: anchored at pool->workers 
*/
/* A: runs through worker->node 
*/
-- 
2.9.5



[PATCH 2/6] workqueue: Replace pool->attach_mutex with global wq_pool_attach_mutex

2018-05-16 Thread Tejun Heo
To improve workqueue visibility, we want to be able to access
workqueue information from worker tasks.  The per-pool attach mutex
makes that difficult because there's no way of stabilizing task ->
worker pool association without knowing the pool first.

Worker attach/detach is a slow path and there's no need for different
pools to be able to perform them concurrently.  This patch replaces
the per-pool attach_mutex with global wq_pool_attach_mutex to prepare
for visibility improvement changes.
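
As a concrete illustration of why a single global lock helps here, patch
6/6 of this series resolves task -> worker -> pool with exactly this
pattern; a stripped-down sketch (the helper name below is made up, and the
code is not a verbatim quote of that patch):

  static void inspect_worker(struct task_struct *task)
  {
          struct worker *worker;
          struct worker_pool *pool;

          mutex_lock(&wq_pool_attach_mutex);
          worker = kthread_data(task);
          pool = worker->pool;    /* stable while the attach mutex is held */
          if (pool) {
                  /* ... inspect pool / workqueue state ... */
          }
          mutex_unlock(&wq_pool_attach_mutex);
  }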

Signed-off-by: Tejun Heo 
---
 kernel/workqueue.c | 41 -
 1 file changed, 20 insertions(+), 21 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index ca7959b..91fe0a6 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -66,7 +66,7 @@ enum {
 * be executing on any CPU.  The pool behaves as an unbound one.
 *
 * Note that DISASSOCIATED should be flipped only while holding
-* attach_mutex to avoid changing binding state while
+* wq_pool_attach_mutex to avoid changing binding state while
 * worker_attach_to_pool() is in progress.
 */
POOL_MANAGER_ACTIVE = 1 << 0,   /* being managed */
@@ -123,7 +123,7 @@ enum {
  *cpu or grabbing pool->lock is enough for read access.  If
  *POOL_DISASSOCIATED is set, it's identical to L.
  *
- * A: pool->attach_mutex protected.
+ * A: wq_pool_attach_mutex protected.
  *
  * PL: wq_pool_mutex protected.
  *
@@ -166,7 +166,6 @@ struct worker_pool {
/* L: hash of busy workers */
 
struct worker   *manager;   /* L: purely informational */
-   struct mutexattach_mutex;   /* attach/detach exclusion */
struct list_headworkers;/* A: attached workers */
struct completion   *detach_completion; /* all workers detached */
 
@@ -297,6 +296,7 @@ static bool wq_numa_enabled;/* unbound NUMA 
affinity enabled */
 static struct workqueue_attrs *wq_update_unbound_numa_attrs_buf;
 
 static DEFINE_MUTEX(wq_pool_mutex);/* protects pools and workqueues list */
+static DEFINE_MUTEX(wq_pool_attach_mutex); /* protects worker attach/detach */
 static DEFINE_SPINLOCK(wq_mayday_lock);/* protects wq->maydays list */
 static DECLARE_WAIT_QUEUE_HEAD(wq_manager_wait); /* wait for manager to go 
away */
 
@@ -399,14 +399,14 @@ static void workqueue_sysfs_unregister(struct 
workqueue_struct *wq);
  * @worker: iteration cursor
  * @pool: worker_pool to iterate workers of
  *
- * This must be called with @pool->attach_mutex.
+ * This must be called with wq_pool_attach_mutex.
  *
  * The if/else clause exists only for the lockdep assertion and can be
  * ignored.
  */
 #define for_each_pool_worker(worker, pool) \
list_for_each_entry((worker), &(pool)->workers, node)   \
-   if (({ lockdep_assert_held(>attach_mutex); false; })) { } 
\
+   if (({ lockdep_assert_held(_pool_attach_mutex); false; })) { 
} \
else
 
 /**
@@ -1724,7 +1724,7 @@ static struct worker *alloc_worker(int node)
 static void worker_attach_to_pool(struct worker *worker,
   struct worker_pool *pool)
 {
-   mutex_lock(>attach_mutex);
+   mutex_lock(_pool_attach_mutex);
 
/*
 * set_cpus_allowed_ptr() will fail if the cpumask doesn't have any
@@ -1733,16 +1733,16 @@ static void worker_attach_to_pool(struct worker *worker,
set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
 
/*
-* The pool->attach_mutex ensures %POOL_DISASSOCIATED remains
-* stable across this function.  See the comments above the
-* flag definition for details.
+* The wq_pool_attach_mutex ensures %POOL_DISASSOCIATED remains
+* stable across this function.  See the comments above the flag
+* definition for details.
 */
if (pool->flags & POOL_DISASSOCIATED)
worker->flags |= WORKER_UNBOUND;
 
list_add_tail(>node, >workers);
 
-   mutex_unlock(>attach_mutex);
+   mutex_unlock(_pool_attach_mutex);
 }
 
 /**
@@ -1759,11 +1759,11 @@ static void worker_detach_from_pool(struct worker 
*worker,
 {
struct completion *detach_completion = NULL;
 
-   mutex_lock(>attach_mutex);
+   mutex_lock(_pool_attach_mutex);
list_del(>node);
if (list_empty(>workers))
detach_completion = pool->detach_completion;
-   mutex_unlock(>attach_mutex);
+   mutex_unlock(_pool_attach_mutex);
 
/* clear leftover flags without pool->lock after it is detached */
worker->flags &= ~(WORKER_UNBOUND | WORKER_REBOUND);
@@ -3271,7 +3271,6 @@ static int init_worker_pool(struct worker_pool *pool)
 
timer_setup(>mayday_timer, pool_mayday_timeout, 0);
 
-   mutex_init(>attach_mutex);
INIT_LIST_HEAD(>workers);
 

[PATCH 6/6] workqueue: Show the latest workqueue name in /proc/PID/{comm,stat,status}

2018-05-16 Thread Tejun Heo
There can be a lot of workqueue workers and they all show up with the
cryptic kworker/* names making it difficult to understand which is
doing what and how they came to be.

  # ps -ef | grep kworker
  root   4   2  0 Feb25 ?00:00:00 [kworker/0:0H]
  root   6   2  0 Feb25 ?00:00:00 [kworker/u112:0]
  root  19   2  0 Feb25 ?00:00:00 [kworker/1:0H]
  root  25   2  0 Feb25 ?00:00:00 [kworker/2:0H]
  root  31   2  0 Feb25 ?00:00:00 [kworker/3:0H]
  ...

This patch makes workqueue workers report the latest workqueue they were
executing for through /proc/PID/{comm,stat,status}.  The extra
information is appended to the kthread name with intervening '+' if
currently executing, otherwise '-'.

  # cat /proc/25/comm
  kworker/2:0-events_power_efficient
  # cat /proc/25/stat
  25 (kworker/2:0-events_power_efficient) I 2 0 0 0 -1 69238880 0 0...
  # grep Name /proc/25/status
  Name:   kworker/2:0-events_power_efficient

Unfortunately, ps(1) truncates comm to 15 characters,

  # ps 25
PID TTY  STAT   TIME COMMAND
 25 ?I  0:00 [kworker/2:0-eve]

making it a lot less useful; however, this should be an easy fix from
ps(1) side.

Signed-off-by: Tejun Heo 
Suggested-by: Linus Torvalds 
Cc: Craig Small 
---
 fs/proc/array.c   |  7 +--
 include/linux/workqueue.h |  1 +
 kernel/workqueue.c| 39 +++
 3 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/fs/proc/array.c b/fs/proc/array.c
index f29221e..bb1d361 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -99,10 +99,13 @@ void proc_task_name(struct seq_file *m, struct task_struct 
*p, bool escape)
 {
char *buf;
size_t size;
-   char tcomm[sizeof(p->comm)];
+   char tcomm[64];
int ret;
 
-   get_task_comm(tcomm, p);
+   if (p->flags & PF_WQ_WORKER)
+   wq_worker_comm(tcomm, sizeof(tcomm), p);
+   else
+   __get_task_comm(tcomm, sizeof(tcomm), p);
 
size = seq_get_buf(m, );
if (escape) {
diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 39a0e21..60d673e 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -494,6 +494,7 @@ extern unsigned int work_busy(struct work_struct *work);
 extern __printf(1, 2) void set_worker_desc(const char *fmt, ...);
 extern void print_worker_info(const char *log_lvl, struct task_struct *task);
 extern void show_workqueue_state(void);
+extern void wq_worker_comm(char *buf, size_t size, struct task_struct *task);
 
 /**
  * queue_work - queue work on a workqueue
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 3fbe007..b4a39a1 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -4577,6 +4577,45 @@ void show_workqueue_state(void)
rcu_read_unlock_sched();
 }
 
+/* used to show worker information through /proc/PID/{comm,stat,status} */
+void wq_worker_comm(char *buf, size_t size, struct task_struct *task)
+{
+   struct worker *worker;
+   struct worker_pool *pool;
+   int off;
+
+   /* always show the actual comm */
+   off = strscpy(buf, task->comm, size);
+   if (off < 0)
+   return;
+
+   /* stabilize worker pool association */
+   mutex_lock(_pool_attach_mutex);
+
+   worker = kthread_data(task);
+   pool = worker->pool;
+
+   if (pool) {
+   spin_lock_irq(>lock);
+   /*
+* ->desc tracks information (wq name or set_worker_desc())
+* for the latest execution.  If current, prepend '+',
+* otherwise '-'.
+*/
+   if (worker->desc[0] != '\0') {
+   if (worker->current_work)
+   scnprintf(buf + off, size - off, "+%s",
+ worker->desc);
+   else
+   scnprintf(buf + off, size - off, "-%s",
+ worker->desc);
+   }
+   spin_unlock_irq(>lock);
+   }
+
+   mutex_unlock(_pool_attach_mutex);
+}
+
 /*
  * CPU hotplug.
  *
-- 
2.9.5



[PATCH 4/6] workqueue: Set worker->desc to workqueue name by default

2018-05-16 Thread Tejun Heo
Work functions can use set_worker_desc() to improve the visibility of
what the worker task is doing.  Currently, the desc field is unset at
the beginning of each execution and there is a separate field to track
whether the field was set during the current execution.

Instead of leaving it empty until desc is set, worker->desc can be used to
remember the last workqueue the worker worked on by default and users
that use set_worker_desc() can override it to something more
informative as necessary.

This simplifies desc handling and helps track the last workqueue
that the worker executed on, improving visibility.
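
For reference, a work function can still provide something more specific;
a minimal, hypothetical sketch (the driver, structure and name format below
are made up for illustration):

  static void mydrv_work_fn(struct work_struct *work)
  {
          struct mydrv_device *dev = container_of(work, struct mydrv_device, work);

          set_worker_desc("mydrv-%s", dev->name);
          /* ... do the actual work ... */
  }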

Signed-off-by: Tejun Heo 
---
 kernel/workqueue.c  | 21 ++---
 kernel/workqueue_internal.h |  1 -
 2 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 2fde50f..3fbe007 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2088,6 +2088,12 @@ __acquires(>lock)
worker->current_pwq = pwq;
work_color = get_work_color(work);
 
+   /*
+* Record wq name for cmdline and debug reporting, may get
+* overridden through set_worker_desc().
+*/
+   strscpy(worker->desc, pwq->wq->name, WORKER_DESC_LEN);
+
list_del_init(>entry);
 
/*
@@ -2183,7 +2189,6 @@ __acquires(>lock)
worker->current_work = NULL;
worker->current_func = NULL;
worker->current_pwq = NULL;
-   worker->desc_valid = false;
pwq_dec_nr_in_flight(pwq, work_color);
 }
 
@@ -4346,7 +4351,6 @@ void set_worker_desc(const char *fmt, ...)
va_start(args, fmt);
vsnprintf(worker->desc, sizeof(worker->desc), fmt, args);
va_end(args);
-   worker->desc_valid = true;
}
 }
 
@@ -4370,7 +4374,6 @@ void print_worker_info(const char *log_lvl, struct 
task_struct *task)
char desc[WORKER_DESC_LEN] = { };
struct pool_workqueue *pwq = NULL;
struct workqueue_struct *wq = NULL;
-   bool desc_valid = false;
struct worker *worker;
 
if (!(task->flags & PF_WQ_WORKER))
@@ -4383,22 +4386,18 @@ void print_worker_info(const char *log_lvl, struct 
task_struct *task)
worker = kthread_probe_data(task);
 
/*
-* Carefully copy the associated workqueue's workfn and name.  Keep
-* the original last '\0' in case the original contains garbage.
+* Carefully copy the associated workqueue's workfn, name and desc.
+* Keep the original last '\0' in case the original is garbage.
 */
probe_kernel_read(, >current_func, sizeof(fn));
probe_kernel_read(, >current_pwq, sizeof(pwq));
probe_kernel_read(, >wq, sizeof(wq));
probe_kernel_read(name, wq->name, sizeof(name) - 1);
-
-   /* copy worker description */
-   probe_kernel_read(_valid, >desc_valid, sizeof(desc_valid));
-   if (desc_valid)
-   probe_kernel_read(desc, worker->desc, sizeof(desc) - 1);
+   probe_kernel_read(desc, worker->desc, sizeof(desc) - 1);
 
if (fn || name[0] || desc[0]) {
printk("%sWorkqueue: %s %pf", log_lvl, name, fn);
-   if (desc[0])
+   if (strcmp(name, desc))
pr_cont(" (%s)", desc);
pr_cont("\n");
}
diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h
index 4a182e0..66fbb5a 100644
--- a/kernel/workqueue_internal.h
+++ b/kernel/workqueue_internal.h
@@ -31,7 +31,6 @@ struct worker {
struct work_struct  *current_work;  /* L: work being processed */
work_func_t current_func;   /* L: current_work's fn */
struct pool_workqueue   *current_pwq; /* L: current_work's pwq */
-   booldesc_valid; /* ->desc is valid */
struct list_headscheduled;  /* L: scheduled works */
 
/* 64 bytes boundary on 64bit, 32 on 32bit */
-- 
2.9.5



[PATCH 5/6] proc: Consolidate task->comm formatting into proc_task_name()

2018-05-16 Thread Tejun Heo
proc shows task->comm in three places - comm, stat, status - and each
is fetching and formatting task->comm slightly differently.  This patch
renames task_name() to proc_task_name(), makes it more generic, and
updates all three paths to use it.

This will enable expanding comm reporting for workqueue workers.

Signed-off-by: Tejun Heo 
---
 fs/proc/array.c| 26 +++---
 fs/proc/base.c |  5 ++---
 fs/proc/internal.h |  2 ++
 3 files changed, 19 insertions(+), 14 deletions(-)

diff --git a/fs/proc/array.c b/fs/proc/array.c
index ae2c807..f29221e 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -95,7 +95,7 @@
 #include 
 #include "internal.h"
 
-static inline void task_name(struct seq_file *m, struct task_struct *p)
+void proc_task_name(struct seq_file *m, struct task_struct *p, bool escape)
 {
char *buf;
size_t size;
@@ -104,13 +104,17 @@ static inline void task_name(struct seq_file *m, struct 
task_struct *p)
 
get_task_comm(tcomm, p);
 
-   seq_puts(m, "Name:\t");
-
size = seq_get_buf(m, );
-   ret = string_escape_str(tcomm, buf, size, ESCAPE_SPACE | 
ESCAPE_SPECIAL, "\n\\");
-   seq_commit(m, ret < size ? ret : -1);
+   if (escape) {
+   ret = string_escape_str(tcomm, buf, size,
+   ESCAPE_SPACE | ESCAPE_SPECIAL, "\n\\");
+   if (ret >= size)
+   ret = -1;
+   } else {
+   ret = strscpy(buf, tcomm, size);
+   }
 
-   seq_putc(m, '\n');
+   seq_commit(m, ret);
 }
 
 /*
@@ -365,7 +369,10 @@ int proc_pid_status(struct seq_file *m, struct 
pid_namespace *ns,
 {
struct mm_struct *mm = get_task_mm(task);
 
-   task_name(m, task);
+   seq_puts(m, "Name:\t");
+   proc_task_name(m, task, true);
+   seq_putc(m, '\n');
+
task_state(m, ns, pid, task);
 
if (mm) {
@@ -400,7 +407,6 @@ static int do_task_stat(struct seq_file *m, struct 
pid_namespace *ns,
u64 cutime, cstime, utime, stime;
u64 cgtime, gtime;
unsigned long rsslim = 0;
-   char tcomm[sizeof(task->comm)];
unsigned long flags;
 
state = *get_task_state(task);
@@ -427,8 +433,6 @@ static int do_task_stat(struct seq_file *m, struct 
pid_namespace *ns,
}
}
 
-   get_task_comm(tcomm, task);
-
sigemptyset();
sigemptyset();
cutime = cstime = utime = stime = 0;
@@ -495,7 +499,7 @@ static int do_task_stat(struct seq_file *m, struct 
pid_namespace *ns,
 
seq_put_decimal_ull(m, "", pid_nr_ns(pid, ns));
seq_puts(m, " (");
-   seq_puts(m, tcomm);
+   proc_task_name(m, task, false);
seq_puts(m, ") ");
seq_putc(m, state);
seq_put_decimal_ll(m, " ", ppid);
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 2eee4d7..eb17917ca 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1581,9 +1581,8 @@ static int comm_show(struct seq_file *m, void *v)
if (!p)
return -ESRCH;
 
-   task_lock(p);
-   seq_printf(m, "%s\n", p->comm);
-   task_unlock(p);
+   proc_task_name(m, p, false);
+   seq_putc(m, '\n');
 
put_task_struct(p);
 
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 0f1692e..b823fac62 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -131,6 +131,8 @@ unsigned name_to_int(const struct qstr *qstr);
  */
 extern const struct file_operations proc_tid_children_operations;
 
+extern void proc_task_name(struct seq_file *m, struct task_struct *p,
+  bool escape);
 extern int proc_tid_stat(struct seq_file *, struct pid_namespace *,
 struct pid *, struct task_struct *);
 extern int proc_tgid_stat(struct seq_file *, struct pid_namespace *,
-- 
2.9.5



[PATCH 1/6] proc: Don't allow empty /proc/PID/cmdline for user tasks

2018-05-16 Thread Tejun Heo
Kernel threads have empty /proc/PID/cmdline and some userland tools
including ps(1) and older versions of systemd use this to detect
kernel threads.  However, any userland program can emulate the
behavior by making its argvs unavailable and trick the affected tools
into thinking that the task is a kernel thread.  Linus's reproducer
follows.

  #include 
  #include 

  int main(void)
  {
  char empty[16384];
  unsigned long ptr;

  asm volatile("" :"=r" (ptr) : "0" (empty):"memory");
  ptr = (ptr+4095) & ~4095;
  munmap((void *)ptr, 32768);

  sleep(1000);
  return 0;
  }

Compiling the above program into nullcmdline and running it on an
unpatched kernel shows the following behavior.

  $ ./nullcmdline &
  [1] 2382031
  [devbig577 ~/tmp]$ hexdump -C /proc/2382031/comm
    6e 75 6c 6c 63 6d 64 6c  69 6e 65 0a  |nullcmdline.|
  000c
  $ hexdump -C /proc/2382031/cmdline
  $ ps 2382031
  PID TTY  STAT   TIME COMMAND
  2382031 pts/2S  0:00 [nullcmdline]

The empty cmdline makes ps(1) think that nullcmdline is a kernel
thread and put brackets around its name (comm), which is mostly a
nuisance but it's possible that this confusion can lead to more
harmful confusions.

This patch fixes the issue by making proc_pid_cmdline_read() never
return an empty string for user tasks.  If the result is empty for
whatever reason, the comm string is returned.  Even when the comm string
is empty, it still returns the null termination character.  On a
patched kernel, running the same command as above gives us.

  $ ./nullcmdline &
  [1] 2317
  [test ~]# hexdump -C /proc/2317/comm
    6e 75 6c 6c 63 6d 64 6c  69 6e 65 0a  |nullcmdline.|
  000c
  $ hexdump -C /proc/2317/cmdline
    6e 75 6c 6c 63 6d 64 6c  69 6e 65 00  |nullcmdline.|
  000c
  $ ps 2317
PID TTY  STAT   TIME COMMAND
   2317 pts/0S  0:00 nullcmdline

Note that cmdline is a dup of comm and ps(1) is no longer confused.

Signed-off-by: Tejun Heo 
Suggested-by: Linus Torvalds 
---
 fs/proc/base.c | 22 +++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 1b2ede6..2eee4d7 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -224,9 +224,10 @@ static ssize_t proc_pid_cmdline_read(struct file *file, 
char __user *buf,
if (!tsk)
return -ESRCH;
mm = get_task_mm(tsk);
-   put_task_struct(tsk);
-   if (!mm)
-   return 0;
+   if (!mm) {
+   rv = 0;
+   goto out_put_task;
+   }
/* Check if process spawned far enough to have cmdline. */
if (!mm->env_end) {
rv = 0;
@@ -367,8 +368,23 @@ static ssize_t proc_pid_cmdline_read(struct file *file, 
char __user *buf,
free_page((unsigned long)page);
 out_mmput:
mmput(mm);
+out_put_task:
+   /*
+* Some userland tools use empty cmdline to distinguish kthreads.
+* Avoid empty cmdline for user tasks by returning tsk->comm with
+* \0 termination when empty.
+*/
+   if (*pos == 0 && rv == 0 && !(tsk->flags & PF_KTHREAD)) {
+   char tcomm[TASK_COMM_LEN];
+
+   get_task_comm(tcomm, tsk);
+   rv = min(strlen(tcomm) + 1, count);
+   if (copy_to_user(buf, tsk->comm, rv))
+   rv = -EFAULT;
+   }
if (rv > 0)
*pos += rv;
+   put_task_struct(tsk);
return rv;
 }
 
-- 
2.9.5



[PATCHSET] workqueue: Show the latest workqueue name in /proc/PID/{comm,stat,status}

2018-05-16 Thread Tejun Heo
There can be a lot of workqueue workers and they all show up with the
cryptic kworker/* names making it difficult to understand which is
doing what and how they came to be.

 # ps -ef | grep kworker
 root   4   2  0 Feb25 ?00:00:00 [kworker/0:0H]
 root   6   2  0 Feb25 ?00:00:00 [kworker/u112:0]
 root  19   2  0 Feb25 ?00:00:00 [kworker/1:0H]
 root  25   2  0 Feb25 ?00:00:00 [kworker/2:0H]
 root  31   2  0 Feb25 ?00:00:00 [kworker/3:0H]
 ...

This patchset makes workqueue workers report the latest workqueue they
were executing for through /proc/PID/{comm,stat,status}.  The extra
information is appended to the kthread name with intervening '+' if
currently executing, otherwise '-'.

 # cat /proc/25/comm
 kworker/2:0-events_power_efficient
 # cat /proc/25/stat
 25 (kworker/2:0-events_power_efficient) I 2 0 0 0 -1 69238880 0 0...
 # grep Name /proc/25/status
 Name:   kworker/2:0-events_power_efficient

For details on the design decisions, please refer to the following
thread.

 http://lkml.kernel.org/r/20180516153939.gh2368...@devbig577.frc2.facebook.com

This patchset contains the following six patches.

 0001-proc-Don-t-allow-empty-proc-PID-cmdline-for-user-tas.patch
 0002-workqueue-Replace-pool-attach_mutex-with-global-wq_p.patch
 0003-workqueue-Make-worker_attach-detach_pool-update-work.patch
 0004-workqueue-Set-worker-desc-to-workqueue-name-by-defau.patch
 0005-proc-Consolidate-task-comm-formatting-into-proc_task.patch
 0006-workqueue-Show-the-latest-workqueue-name-in-proc-PID.patch

I'm applying the patches to wq/for-4.18.  Please let me know if the
patchset needs updates (the branch doesn't have any other changes
anyway).  diffstat follows.  Thanks.

 fs/proc/array.c |   33 +++-
 fs/proc/base.c  |   27 +++---
 fs/proc/internal.h  |2 
 include/linux/workqueue.h   |1 
 kernel/workqueue.c  |  117 
 kernel/workqueue_internal.h |3 -
 6 files changed, 122 insertions(+), 61 deletions(-)

--
tejun


Re: [PATCH v5 11/13] mm: Iterate only over charged shrinkers during memcg shrink_slab()

2018-05-16 Thread Vladimir Davydov
On Tue, May 15, 2018 at 01:12:20PM +0300, Kirill Tkhai wrote:
> >> +#define root_mem_cgroup NULL
> > 
> > Let's instead export mem_cgroup_is_root(). In case if MEMCG is disabled
> > it will always return false.
> 
> export == move to header file

That and adding a stub function in case !MEMCG.

> >> +static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
> >> +  struct mem_cgroup *memcg, int priority)
> >> +{
> >> +  struct memcg_shrinker_map *map;
> >> +  unsigned long freed = 0;
> >> +  int ret, i;
> >> +
> >> +  if (!memcg_kmem_enabled() || !mem_cgroup_online(memcg))
> >> +  return 0;
> >> +
> >> +  if (!down_read_trylock(&shrinker_rwsem))
> >> +  return 0;
> >> +
> >> +  /*
> >> +   * 1)Caller passes only alive memcg, so map can't be NULL.
> >> +   * 2)shrinker_rwsem protects from maps expanding.
> > 
> > ^^
> > Nit: space missing here :-)
> 
> I don't understand what you mean here. Please, clarify...

This is just a trivial remark regarding comment formatting. They usually
put a space between the number and the first word in the sentence, i.e.
between '1)' and 'Caller' in your case.

> 
> >> +   */
> >> +  map = rcu_dereference_protected(MEMCG_SHRINKER_MAP(memcg, nid), true);
> >> +  BUG_ON(!map);
> >> +
> >> +  for_each_set_bit(i, map->map, memcg_shrinker_nr_max) {
> >> +  struct shrink_control sc = {
> >> +  .gfp_mask = gfp_mask,
> >> +  .nid = nid,
> >> +  .memcg = memcg,
> >> +  };
> >> +  struct shrinker *shrinker;
> >> +
> >> +  shrinker = idr_find(&shrinker_idr, i);
> >> +  if (!shrinker) {
> >> +  clear_bit(i, map->map);
> >> +  continue;
> >> +  }
> >> +  if (list_empty(&shrinker->list))
> >> +  continue;
> > 
> > I don't like using shrinker->list as an indicator that the shrinker has
> > been initialized. IMO if you do need such a check, you should split
> > shrinker_idr registration in two steps - allocate a slot in 'prealloc'
> > and set the pointer in 'register'. However, can we really encounter an
> > unregistered shrinker here? AFAIU a bit can be set in the shrinker map
> > only after the corresponding shrinker has been initialized, no?
> 
> 1)No, it's not so. Here is a race:
> cpu#0cpu#1   cpu#2
> prealloc_shrinker()
>  prealloc_shrinker()
>memcg_expand_shrinker_maps()
>  memcg_expand_one_shrinker_map()
>memset(&map->map, 0xff);
>  
> do_shrink_slab() (on uninitialized LRUs)
> init LRUs
> register_shrinker_prepared()
> 
> So, the check is needed.

OK, I see.

> 
> 2)Assigning NULL pointer can't be used here, since NULL pointer is already 
> used
> to clear unregistered shrinkers from the map. See the check right after 
> idr_find().

But it won't break anything if we clear bit for prealloc-ed, but not yet
registered shrinkers, will it?

> 
> list_empty() is used since it's the already existing indicator, which does not
> require additional member in struct shrinker.

It just looks rather counter-intuitive to me to use shrinker->list to
differentiate between registered and unregistered shrinkers. May be, I'm
wrong. If you are sure that this is OK, I'm fine with it, but then
please add a comment here explaining what this check is needed for.

Thanks.
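
A rough sketch of the two-step registration suggested above could look like
the following: reserve the idr slot in the prealloc step and only publish the
shrinker pointer in the register step.  The placeholder value and helper names
below are invented for illustration (only shrinker_idr and shrinker->id are
taken from this thread), and locking on shrinker_rwsem is left out; this is
not code from the patchset itself.

/* Placeholder stored between prealloc and register (illustrative only). */
#define SHRINKER_REGISTERING	((struct shrinker *)~0UL)

static int prealloc_memcg_shrinker(struct shrinker *shrinker)
{
	int id;

	/* Reserve an id, but don't expose a usable shrinker pointer yet. */
	id = idr_alloc(&shrinker_idr, SHRINKER_REGISTERING, 0, 0, GFP_KERNEL);
	if (id < 0)
		return id;
	shrinker->id = id;
	return 0;
}

static void register_memcg_shrinker(struct shrinker *shrinker)
{
	/* Publish the real pointer once the shrinker is fully set up. */
	idr_replace(&shrinker_idr, shrinker, shrinker->id);
}

With such a split, shrink_slab_memcg() could skip a set bit whose idr slot
still holds the placeholder instead of testing shrinker->list.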


Re: [PATCH v7] mtd: rawnand: use bit-wise majority to recover the contents of ONFI parameter

2018-05-16 Thread Chris Moore

Hi,

Le 16/05/2018 à 09:56, Boris Brezillon a écrit :

On Wed, 16 May 2018 09:32:57 +0200
Chris Moore  wrote:


Hi,

Le 15/05/2018 à 09:34, Boris Brezillon a écrit :

On Tue, 15 May 2018 06:45:51 +0200
Chris Moore  wrote:
  

Hi,

Le 13/05/2018 à 06:30, Wan, Jane (Nokia - US/Sunnyvale) a écrit :

Per ONFI specification (Rev. 4.0), if all parameter pages have invalid CRC 
values, the bit-wise majority may be used to recover the contents of the 
parameter pages from the parameter page copies present.

Signed-off-by: Jane Wan 
---
v7: change debug print messages
v6: support the cases that srcbufs are not contiguous
v5: make the bit-wise majority functon generic
v4: move the bit-wise majority code in a separate function
v3: fix warning message detected by kbuild test robot
v2: rebase the changes on top of v4.17-rc1

drivers/mtd/nand/raw/nand_base.c |   50 ++

1 file changed, 45 insertions(+), 5 deletions(-)

diff --git a/drivers/mtd/nand/raw/nand_base.c b/drivers/mtd/nand/raw/nand_base.c
index 72f3a89..b43b784 100644
--- a/drivers/mtd/nand/raw/nand_base.c
+++ b/drivers/mtd/nand/raw/nand_base.c
@@ -5087,6 +5087,35 @@ static int nand_flash_detect_ext_param_page(struct 
nand_chip *chip,
}

/*

+ * Recover data with bit-wise majority
+ */
+static void nand_bit_wise_majority(const void **srcbufs,
+  unsigned int nsrcbufs,
+  void *dstbuf,
+  unsigned int bufsize)
+{
+   int i, j, k;
+
+   for (i = 0; i < bufsize; i++) {
+   u8 cnt, val;
+
+   val = 0;
+   for (j = 0; j < 8; j++) {
+   cnt = 0;
+   for (k = 0; k < nsrcbufs; k++) {
+   const u8 *srcbuf = srcbufs[k];
+
+   if (srcbuf[i] & BIT(j))
+   cnt++;
+   }
+   if (cnt > nsrcbufs / 2)
+   val |= BIT(j);
+   }
+   ((u8 *)dstbuf)[i] = val;
+   }
+}
+
+/*
 * Check if the NAND chip is ONFI compliant, returns 1 if it is, 0 
otherwise.
 */
static int nand_flash_detect_onfi(struct nand_chip *chip)
@@ -5102,7 +5131,7 @@ static int nand_flash_detect_onfi(struct nand_chip *chip)
return 0;

	/* ONFI chip: allocate a buffer to hold its parameter page */

-   p = kzalloc(sizeof(*p), GFP_KERNEL);
+   p = kzalloc((sizeof(*p) * 3), GFP_KERNEL);
if (!p)
return -ENOMEM;

@@ -5113,21 +5142,32 @@ static int nand_flash_detect_onfi(struct nand_chip *chip)

}

	for (i = 0; i < 3; i++) {

-   ret = nand_read_data_op(chip, p, sizeof(*p), true);
+   ret = nand_read_data_op(chip, &p[i], sizeof(*p), true);
if (ret) {
ret = 0;
goto free_onfi_param_page;
}

-		if (onfi_crc16(ONFI_CRC_BASE, (uint8_t *)p, 254) ==

+   if (onfi_crc16(ONFI_CRC_BASE, (u8 *)&p[i], 254) ==
le16_to_cpu(p->crc)) {
+   if (i)
+   memcpy(p, &p[i], sizeof(*p));
break;
}
}

	if (i == 3) {

-   pr_err("Could not find valid ONFI parameter page; aborting\n");
-   goto free_onfi_param_page;
+   const void *srcbufs[3] = {p, p + 1, p + 2};
+
+   pr_warn("Could not find a valid ONFI parameter page, trying bit-wise 
majority to recover it\n");
+   nand_bit_wise_majority(srcbufs, ARRAY_SIZE(srcbufs), p,
+  sizeof(*p));
+
+   if (onfi_crc16(ONFI_CRC_BASE, (u8 *)p, 254) !=
+   le16_to_cpu(p->crc)) {
+   pr_err("ONFI parameter recovery failed, aborting\n");
+   goto free_onfi_param_page;
+   }
}

	/* Check version */

This version is still hard coded for a three sample bitwise majority vote.
So why not use the method which I suggested previously for v2 and which
I repeat below?

Because I want the nand_bit_wise_majority() function to work with
nsrcbufs > 3 (the ONFI spec says there's at least 3 copy of the param
page, but NAND vendor can decide to put more). Also, if the X copies of
the PARAM are corrupted (which is rather unlikely), that means we
already spent quite a lot of time reading the different copies and
calculating the CRC, so I think we don't care about perf optimizations
when doing bit-wise majority.
  

The three sample bitwise majority can be implemented without bit level
manipulation using the identity:
majority3(a, b, c) = (a & b) | (a & c) | (b & c)
This can be factorized slightly to (a & (b | c)) | (b & c)
This enables the operation to be performed 8, 16, 32 or even 
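
Concretely, the word-wise form of the three-copy majority processes a whole
word per step rather than one bit.  A self-contained sketch of the three-copy
case (the function name is invented for illustration and this is not the code
proposed in the patch):

#include <stddef.h>
#include <stdint.h>

/*
 * Bit-wise majority of exactly three buffers, one 32-bit word at a time:
 * a bit is set in dst iff it is set in at least two of a, b, c.
 * Uses the identity (a & b) | (a & c) | (b & c) == (a & (b | c)) | (b & c).
 */
static void majority3_u32(const uint32_t *a, const uint32_t *b,
			  const uint32_t *c, uint32_t *dst, size_t nwords)
{
	size_t i;

	for (i = 0; i < nwords; i++)
		dst[i] = (a[i] & (b[i] | c[i])) | (b[i] & c[i]);
}

The per-bit loop in the patch stays generic for any number of parameter-page
copies, which is the reason given above for keeping it; the identity only
covers the three-copy case.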

Re: [PATCH net-next 1/3] net: ethernet: ti: Allow most drivers with COMPILE_TEST

2018-05-16 Thread kbuild test robot
Hi Florian,

I love your patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]

url:
https://github.com/0day-ci/linux/commits/Florian-Fainelli/net-Allow-more-drivers-with-COMPILE_TEST/20180517-092807
config: xtensa-allyesconfig (attached as .config)
compiler: xtensa-linux-gcc (GCC) 7.2.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=xtensa 

All warnings (new ones prefixed by >>):

   drivers/net//ethernet/ti/davinci_cpdma.c: In function 'cpdma_chan_submit':
>> drivers/net//ethernet/ti/davinci_cpdma.c:1083:17: warning: passing argument 
>> 1 of 'writel_relaxed' makes integer from pointer without a cast 
>> [-Wint-conversion]
 writel_relaxed(token, &desc->sw_token);
^
   In file included from arch/xtensa/include/asm/io.h:83:0,
from include/linux/scatterlist.h:9,
from include/linux/dma-mapping.h:11,
from drivers/net//ethernet/ti/davinci_cpdma.c:21:
   include/asm-generic/io.h:303:24: note: expected 'u32 {aka unsigned int}' but 
argument is of type 'void *'
#define writel_relaxed writel_relaxed
   ^
>> include/asm-generic/io.h:304:20: note: in expansion of macro 'writel_relaxed'
static inline void writel_relaxed(u32 value, volatile void __iomem *addr)
   ^~
--
   drivers/net/ethernet/ti/davinci_cpdma.c: In function 'cpdma_chan_submit':
   drivers/net/ethernet/ti/davinci_cpdma.c:1083:17: warning: passing argument 1 
of 'writel_relaxed' makes integer from pointer without a cast [-Wint-conversion]
 writel_relaxed(token, &desc->sw_token);
^
   In file included from arch/xtensa/include/asm/io.h:83:0,
from include/linux/scatterlist.h:9,
from include/linux/dma-mapping.h:11,
from drivers/net/ethernet/ti/davinci_cpdma.c:21:
   include/asm-generic/io.h:303:24: note: expected 'u32 {aka unsigned int}' but 
argument is of type 'void *'
#define writel_relaxed writel_relaxed
   ^
>> include/asm-generic/io.h:304:20: note: in expansion of macro 'writel_relaxed'
static inline void writel_relaxed(u32 value, volatile void __iomem *addr)
   ^~

vim +/writel_relaxed +1083 drivers/net//ethernet/ti/davinci_cpdma.c

ef8c2dab drivers/net/davinci_cpdma.c Cyril Chemparathy 2010-09-15  
1029  
ef8c2dab drivers/net/davinci_cpdma.c Cyril Chemparathy 2010-09-15  
1030  int cpdma_chan_submit(struct cpdma_chan *chan, void *token, void *data,
aef614e1 drivers/net/ethernet/ti/davinci_cpdma.c Sebastian Siewior 2013-04-23  
1031   int len, int directed)
ef8c2dab drivers/net/davinci_cpdma.c Cyril Chemparathy 2010-09-15  
1032  {
ef8c2dab drivers/net/davinci_cpdma.c Cyril Chemparathy 2010-09-15  
1033 struct cpdma_ctlr   *ctlr = chan->ctlr;
ef8c2dab drivers/net/davinci_cpdma.c Cyril Chemparathy 2010-09-15  
1034 struct cpdma_desc __iomem   *desc;
ef8c2dab drivers/net/davinci_cpdma.c Cyril Chemparathy 2010-09-15  
1035 dma_addr_t  buffer;
ef8c2dab drivers/net/davinci_cpdma.c Cyril Chemparathy 2010-09-15  
1036 unsigned long   flags;
ef8c2dab drivers/net/davinci_cpdma.c Cyril Chemparathy 2010-09-15  
1037 u32 mode;
ef8c2dab drivers/net/davinci_cpdma.c Cyril Chemparathy 2010-09-15  
1038 int ret = 0;
ef8c2dab drivers/net/davinci_cpdma.c Cyril Chemparathy 2010-09-15  
1039  
ef8c2dab drivers/net/davinci_cpdma.c Cyril Chemparathy 2010-09-15  
1040 spin_lock_irqsave(&chan->lock, flags);
ef8c2dab drivers/net/davinci_cpdma.c Cyril Chemparathy 2010-09-15  
1041  
ef8c2dab drivers/net/davinci_cpdma.c Cyril Chemparathy 2010-09-15  
1042 if (chan->state == CPDMA_STATE_TEARDOWN) {
ef8c2dab drivers/net/davinci_cpdma.c Cyril Chemparathy 2010-09-15  
1043 ret = -EINVAL;
ef8c2dab drivers/net/davinci_cpdma.c Cyril Chemparathy 2010-09-15  
1044 goto unlock_ret;
ef8c2dab drivers/net/davinci_cpdma.c Cyril Chemparathy 2010-09-15  
1045 }
ef8c2dab drivers/net/davinci_cpdma.c Cyril Chemparathy 2010-09-15  
1046  
742fb20f drivers/net/ethernet/ti/davinci_cpdma.c Grygorii Strashko 2016-06-27  
1047 if (chan->count >= chan->desc_num)  {
742fb20f drivers/net/ethernet/ti/davinci_cpdma.c Grygorii Strashko 2016-06-27  
1048 chan->stats.desc_alloc_fail++;
742fb20f drivers/net/ethernet/ti/davinci_cpdma.c Grygorii Strashko 2016-06-27  
1049 ret = -ENOMEM;
742fb20f 

[PATCH] of: overlay: validate offset from property fixups

2018-05-16 Thread frowand . list
From: Frank Rowand 

The smatch static checker marks the data in offset as untrusted,
leading it to warn:

  drivers/of/resolver.c:125 update_usages_of_a_phandle_reference()
  error: buffer underflow 'prop->value' 's32min-s32max'

Add check to verify that offset is within the property data.

Reported-by: Dan Carpenter 
Signed-off-by: Frank Rowand 
---
 drivers/of/resolver.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/of/resolver.c b/drivers/of/resolver.c
index 65d0b7adfcd4..7edfac6f1914 100644
--- a/drivers/of/resolver.c
+++ b/drivers/of/resolver.c
@@ -122,6 +122,11 @@ static int update_usages_of_a_phandle_reference(struct 
device_node *overlay,
goto err_fail;
}
 
+   if (offset < 0 || offset + sizeof(__be32) > prop->length) {
+   err = -EINVAL;
+   goto err_fail;
+   }
+
*(__be32 *)(prop->value + offset) = cpu_to_be32(phandle);
}
 
-- 
Frank Rowand 



Re: Revert "dmaengine: pl330: add DMA_PAUSE feature"

2018-05-16 Thread Vinod Koul
On 15-05-18, 11:50, Frank Mori Hess wrote:
> On Tue, May 15, 2018 at 2:21 AM, Vinod  wrote:
> >
> > For Pause/resume data loss is _not_ expected.
> >
> >> > and some of the 8250 drivers like 8250_dw.c set a maxburst > 1.  If it
> >> > can't count on the pause/residue/terminate working without data loss
> >> > then it is just broken.  As is the 8250_omap.c driver.  Is the
> >> > description of dmaengine_pause in the documentation of "This pauses
> >> > activity on the DMA channel without data loss" to be interpreted as
> >> > "as long as you resume afterwards"?
> >>
> >> I assume that this requirement is for both - resuming and terminating.
> >
> > Terminate is abort, data loss may happen here.
> 
> Wait, are you saying if you do
> 
> dma pause

no data loss
> read residue

here as well
> dma terminate

Oh yes, we aborted...
> 
> then it is acceptable for there to be data loss, because it can be
> blamed on the terminate?  In that case the usage of dmaengine in all
> the 8250 serial drivers is broken.  Every time there is a break in the
> incoming rx data, the 8250 driver does pause/residue/terminate which
> could potentially cause data from the incoming data stream to be
> dropped.

As I said, terminate is abort. It cleans up the channel and is supposed to be
used for cleanup, not for stopping. See Documentation:

- device_terminate_all

  - Aborts all the pending and ongoing transfers on the channel

  - For aborted transfers the complete callback should not be called

  - Can be called from atomic context or from within a complete
callback of a descriptor. Must not sleep. Drivers must be able
to handle this correctly.

  - Termination may be asynchronous. The driver does not have to
wait until the currently active transfer has completely stopped.
See device_synchronize.

Thanks
-- 
~Vinod
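
For reference, the call sequence being discussed amounts to roughly the
fragment below.  The variable names (rxchan, cookie) and surrounding context
are assumed and error handling is dropped; this only illustrates the ordering
and is not a quote from any 8250 driver.

	struct dma_tx_state state;
	enum dma_status status;

	/* 1. Pause: per the documentation, no data may be lost here. */
	dmaengine_pause(rxchan);

	/* 2. Residue: state.residue reports how much was left untransferred. */
	status = dmaengine_tx_status(rxchan, cookie, &state);

	/*
	 * 3. Terminate: abort the descriptor; whatever the controller still
	 *    holds after this point is allowed to be dropped.
	 */
	dmaengine_terminate_async(rxchan);
	dmaengine_synchronize(rxchan);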


Re: Revert "dmaengine: pl330: add DMA_PAUSE feature"

2018-05-16 Thread Vinod Koul
On 15-05-18, 11:50, Frank Mori Hess wrote:
> On Tue, May 15, 2018 at 2:21 AM, Vinod  wrote:
> >
> > For Pause/resume data loss is _not_ expected.
> >
> >> > and some of the 8250 drivers like 8250_dw.c set a maxburst > 1.  If it
> >> > can't count on the pause/residue/terminate working without data loss
> >> > then it is just broken.  As is the 8250_omap.c driver.  Is the
> >> > description of dmaengine_pause in the documentation of "This pauses
> >> > activity on the DMA channel without data loss" to be interpreted as
> >> > "as long as you resume afterwards"?
> >>
> >> I assume that this requirement is for both - resuming and terminating.
> >
> > Terminate is abort, data loss may happen here.
> 
> Wait, are you saying if you do
> 
> dma pause

no data loss
> read residue

here as well
> dma terminate

Oh yes, we aborted...
> 
> then it is acceptable for there to be data loss, because it can be
> blamed on the terminate?  In that case the usage of dmaengine in all
> the 8250 serial drivers is broken.  Every time there is a break in the
> incoming rx data, the 8250 driver does pause/residue/terminate which
> could potentially cause data from the incoming data stream to be
> dropped.

As I said terminate is abort. It cleans up the channel and is supposed to
used for cleanup not for stopping. See Documentation:

- device_terminate_all

  - Aborts all the pending and ongoing transfers on the channel

  - For aborted transfers the complete callback should not be called

  - Can be called from atomic context or from within a complete
callback of a descriptor. Must not sleep. Drivers must be able
to handle this correctly.

  - Termination may be asynchronous. The driver does not have to
wait until the currently active transfer has completely stopped.
See device_synchronize.

Thanks
-- 
~Vinod


linux-next: manual merge of the staging tree with the v4l-dvb tree

2018-05-16 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the staging tree got a conflict in:

  drivers/staging/media/atomisp/TODO

between commit:

  51b8dc5163d2 ("media: staging: atomisp: Remove driver")

from the v4l-dvb tree and commit:

  1bd421154821 ("staging: atomisp: Augment TODO file with GPIO work item")

from the staging tree.

I fixed it up (I just removed the file) and can carry the fix as
necessary. This is now fixed as far as linux-next is concerned, but any
non trivial conflicts should be mentioned to your upstream maintainer
when your tree is submitted for merging.  You may also want to consider
cooperating with the maintainer of the conflicting tree to minimise any
particularly complex conflicts.

-- 
Cheers,
Stephen Rothwell


pgpn0MlNOAqLq.pgp
Description: OpenPGP digital signature


Re: [PATCH v5 11/13] mm: Iterate only over charged shrinkers during memcg shrink_slab()

2018-05-16 Thread Vladimir Davydov
On Tue, May 15, 2018 at 05:49:59PM +0300, Kirill Tkhai wrote:
> >> @@ -589,13 +647,7 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int 
> >> nid,
> >>.memcg = memcg,
> >>};
> >>  
> >> -  /*
> >> -   * If kernel memory accounting is disabled, we ignore
> >> -   * SHRINKER_MEMCG_AWARE flag and call all shrinkers
> >> -   * passing NULL for memcg.
> >> -   */
> >> -  if (memcg_kmem_enabled() &&
> >> -  !!memcg != !!(shrinker->flags & SHRINKER_MEMCG_AWARE))
> >> +  if (!!memcg != !!(shrinker->flags & SHRINKER_MEMCG_AWARE))
> >>continue;
> > 
> > I want this check gone. It's easy to achieve, actually - just remove the
> > following lines from shrink_node()
> > 
> > if (global_reclaim(sc))
> > shrink_slab(sc->gfp_mask, pgdat->node_id, NULL,
> > sc->priority);
> 
> This check is not related to the patchset.

Yes, it is. This patch modifies shrink_slab which is used only by
shrink_node. Simplifying shrink_node along the way looks right to me.

> Let's don't mix everything in the single series of patches, because
> after your last remarks it will grow at least up to 15 patches.

Most of which are trivial so I don't see any problem here.

> This patchset can't be responsible for everything.

I don't understand why you balk at simplifying the code a bit while you
are patching related functions anyway.

> 
> >>  
> >>if (!(shrinker->flags & SHRINKER_NUMA_AWARE))
> >>


Re: [PATCH] cpufreq: brcmstb-avs-cpufreq: sort frequencies in ascending order

2018-05-16 Thread Viresh Kumar
On 16-05-18, 12:24, Florian Fainelli wrote:
> On 05/15/2018 09:32 PM, Viresh Kumar wrote:
> > On 15-05-18, 20:49, Markus Mayer wrote:
> >> From: Markus Mayer 
> >>
> >> Most CPUfreq drivers (at least on ARM) seem to be sorting the available
> >> frequencies from lowest to highest. To match this behaviour, we reverse
> >> the sorting order in brcmstb-avs-cpufreq, so it is now also lowest to
> >> highest.
> > 
> > The reasoning isn't correct. Just because everyone else is doing it
> > doesn't make it right and so you shouldn't change just because of
> > that.
> > 
> > What you must written instead in the commit log is that the cpufreq
> > core performs better if the table is sorted (in any order), and so we
> > must sort it as well.
> 
> Is there a reason why set_freq_table_sorted() tries an ascending or
> descending sort, but does not enforce one versus another for all drivers?

set_freq_table_sorted() doesn't sort the frequency table but checks if
the table is already sorted in a particular order. And then
cpufreq_frequency_table_target() is optimized based on this flag. We
don't have to enforce any particular ordering here.

> > But I feel the table is already sorted for your platform, isn't it?
> > And I don't see a clear advantage of merging this patch.
> 
> The patch changes the order to have the lowest to highest, whereas the
> current implementation has them from highest to lowest. From what you
> are saying, it sounds like this is unnecessary, since the sorting is
> already making things efficient enough, so this is just a cosmetic thing?

Right. It shouldn't make any performance improvements.

-- 
viresh
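
To make the detect-versus-enforce point concrete, the classification amounts
to something like the simplified helper below.  The names are invented for
this illustration; it is not the actual set_freq_table_sorted() code.

#include <stdbool.h>
#include <stddef.h>

enum freq_table_order {
	FREQ_TABLE_UNSORTED,
	FREQ_TABLE_SORTED_ASCENDING,
	FREQ_TABLE_SORTED_DESCENDING,
};

/* Classify the order of n frequencies (kHz) without reordering them. */
static enum freq_table_order classify_freq_table(const unsigned int *freq,
						 size_t n)
{
	size_t i;
	bool ascending = true, descending = true;

	for (i = 1; i < n; i++) {
		if (freq[i] < freq[i - 1])
			ascending = false;
		if (freq[i] > freq[i - 1])
			descending = false;
	}

	if (ascending)
		return FREQ_TABLE_SORTED_ASCENDING;
	if (descending)
		return FREQ_TABLE_SORTED_DESCENDING;
	return FREQ_TABLE_UNSORTED;
}

The lookup helpers can then pick the cheaper walk direction based on the
result, which is the optimization referred to above.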


Re: [PATCH] sched/rt: fix call to cpufreq_update_util

2018-05-16 Thread Viresh Kumar
On 16-05-18, 20:18, Vincent Guittot wrote:
> With commit 8f111bc357aa ("cpufreq/schedutil: Rewrite CPUFREQ_RT support")
> schedutil governor uses rq->rt.rt_nr_running to detect whether a RT task is
> currently running on the CPU and to set frequency to max if necessary.
> cpufreq_update_util() is called in enqueue/dequeue_top_rt_rq() but
> rq->rt.rt_nr_running as not been updated yet when dequeue_top_rt_rq() is
> called so schedutil still considers that a RT task is running when the
> last task is dequeued. The update of rq->rt.rt_nr_running happens later
> in dequeue_rt_stack()
> 
> Fixes: 8f111bc357aa ('cpufreq/schedutil: Rewrite CPUFREQ_RT support')
> Cc:  # v4.16+
> Signed-off-by: Vincent Guittot 
> ---
>  kernel/sched/rt.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index 7aef6b4..6e74d3d 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -1001,8 +1001,6 @@ dequeue_top_rt_rq(struct rt_rq *rt_rq)
>   sub_nr_running(rq, rt_rq->rt_nr_running);
>   rt_rq->rt_queued = 0;
>  

Remove this blank line as well ?

> - /* Kick cpufreq (see the comment in kernel/sched/sched.h). */
> - cpufreq_update_util(rq, 0);
>  }
>  
>  static void
> @@ -1288,6 +1286,9 @@ static void dequeue_rt_stack(struct sched_rt_entity 
> *rt_se, unsigned int flags)
>   if (on_rt_rq(rt_se))
>   __dequeue_rt_entity(rt_se, flags);
>   }
> +
> + /* Kick cpufreq (see the comment in kernel/sched/sched.h). */
> + cpufreq_update_util(rq_of_rt_rq(rt_rq_of_se(back)), 0);
>  }
>  
>  static void enqueue_rt_entity(struct sched_rt_entity *rt_se, unsigned int 
> flags)
> -- 
> 2.7.4

-- 
viresh


Re: Grant

2018-05-16 Thread Maratovich.M. Fridman



I Mikhail Fridman. has selected you specially as one of my beneficiaries
for my Charitable Donation, Just as I have declared on May 23, 2016 to give
my fortune as charity.

Check the link below for confirmation:

http://www.ibtimes.co.uk/russias-second-wealthiest-man-mikhail-fridman-plans-leaving-14-2bn-fortune-charity-1561604

Reply as soon as possible with further directives.

Best Regards,
Mikhail Fridman.



Re: [PATCH 1/4] scsi: ufs: add quirk to fix mishandling utrlclr/utmrlclr

2018-05-16 Thread Asutosh Das (asd)

On 5/6/2018 3:44 PM, Alim Akhtar wrote:

In the right behavior, setting the bit to '0' indicates clear and
'1' indicates no change. If host controller handles this the other way,
UFSHCI_QUIRK_BROKEN_REQ_LIST_CLR can be used.

Signed-off-by: Seungwon Jeon 
Signed-off-by: Alim Akhtar 
---
  drivers/scsi/ufs/ufshcd.c | 21 +++--
  drivers/scsi/ufs/ufshcd.h |  5 +
  2 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 00e7905..9898ce5 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -675,7 +675,24 @@ static inline void ufshcd_put_tm_slot(struct ufs_hba *hba, 
int slot)
   */
  static inline void ufshcd_utrl_clear(struct ufs_hba *hba, u32 pos)
  {
-   ufshcd_writel(hba, ~(1 << pos), REG_UTP_TRANSFER_REQ_LIST_CLEAR);
+   if (hba->quirks & UFSHCI_QUIRK_BROKEN_REQ_LIST_CLR)
+   ufshcd_writel(hba, (1 << pos), REG_UTP_TRANSFER_REQ_LIST_CLEAR);
+   else
+   ufshcd_writel(hba, ~(1 << pos),
+   REG_UTP_TRANSFER_REQ_LIST_CLEAR);
+}
+
+/**
+ * ufshcd_utmrl_clear - Clear a bit in UTRMLCLR register
+ * @hba: per adapter instance
+ * @pos: position of the bit to be cleared
+ */
+static inline void ufshcd_utmrl_clear(struct ufs_hba *hba, u32 pos)
+{
+   if (hba->quirks & UFSHCI_QUIRK_BROKEN_REQ_LIST_CLR)
+   ufshcd_writel(hba, (1 << pos), REG_UTP_TASK_REQ_LIST_CLEAR);
+   else
+   ufshcd_writel(hba, ~(1 << pos), REG_UTP_TASK_REQ_LIST_CLEAR);
  }
  
  /**

@@ -5398,7 +5415,7 @@ static int ufshcd_clear_tm_cmd(struct ufs_hba *hba, int 
tag)
goto out;
  
  	spin_lock_irqsave(hba->host->host_lock, flags);

-   ufshcd_writel(hba, ~(1 << tag), REG_UTP_TASK_REQ_LIST_CLEAR);
+   ufshcd_utmrl_clear(hba, tag);
spin_unlock_irqrestore(hba->host->host_lock, flags);
  
  	/* poll for max. 1 sec to clear door bell register by h/w */

diff --git a/drivers/scsi/ufs/ufshcd.h b/drivers/scsi/ufs/ufshcd.h
index 8110dcd..43035f8 100644
--- a/drivers/scsi/ufs/ufshcd.h
+++ b/drivers/scsi/ufs/ufshcd.h
@@ -595,6 +595,11 @@ struct ufs_hba {
 */
#define UFSHCD_QUIRK_PRDT_BYTE_GRAN 0x80
  
+	/*

+* Cleaer handling for transfer/task request list is just opposite.
+*/

Spell check - should be 'Clear'

+   #define UFSHCI_QUIRK_BROKEN_REQ_LIST_CLR0x100
+
unsigned int quirks;/* Deviations from standard UFSHCI spec. */
  
  	/* Device deviations from standard UFS device spec. */




Looks good to me, except the spell-check.

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a 
Linux Foundation Collaborative Project


Re: [PATCH net-next 2/3] net: ethernet: freescale: Allow FEC with COMPILE_TEST

2018-05-16 Thread kbuild test robot
Hi Florian,

I love your patch! Yet something to improve:

[auto build test ERROR on net-next/master]

url:
https://github.com/0day-ci/linux/commits/Florian-Fainelli/net-Allow-more-drivers-with-COMPILE_TEST/20180517-092807
config: m68k-allmodconfig (attached as .config)
compiler: m68k-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=m68k 

All errors (new ones prefixed by >>):

   In file included from include/linux/swab.h:5:0,
from include/uapi/linux/byteorder/big_endian.h:13,
from include/linux/byteorder/big_endian.h:5,
from arch/m68k/include/uapi/asm/byteorder.h:5,
from include/asm-generic/bitops/le.h:6,
from arch/m68k/include/asm/bitops.h:519,
from include/linux/bitops.h:38,
from include/linux/kernel.h:11,
from include/linux/list.h:9,
from include/linux/module.h:9,
from drivers/net//ethernet/freescale/fec_main.c:24:
   drivers/net//ethernet/freescale/fec_main.c: In function 'fec_restart':
>> drivers/net//ethernet/freescale/fec_main.c:959:26: error: 'FEC_RACC' 
>> undeclared (first use in this function); did you mean 'FEC_RXIC1'?
  val = readl(fep->hwp + FEC_RACC);
 ^
   include/uapi/linux/swab.h:117:32: note: in definition of macro '__swab32'
 (__builtin_constant_p((__u32)(x)) ? \
   ^
   include/linux/byteorder/generic.h:89:21: note: in expansion of macro 
'__le32_to_cpu'
#define le32_to_cpu __le32_to_cpu
^
   arch/m68k/include/asm/io_mm.h:452:26: note: in expansion of macro 'in_le32'
#define readl(addr)  in_le32(addr)
 ^~~
   drivers/net//ethernet/freescale/fec_main.c:959:9: note: in expansion of 
macro 'readl'
  val = readl(fep->hwp + FEC_RACC);
^
   drivers/net//ethernet/freescale/fec_main.c:959:26: note: each undeclared 
identifier is reported only once for each function it appears in
  val = readl(fep->hwp + FEC_RACC);
 ^
   include/uapi/linux/swab.h:117:32: note: in definition of macro '__swab32'
 (__builtin_constant_p((__u32)(x)) ? \
   ^
   include/linux/byteorder/generic.h:89:21: note: in expansion of macro 
'__le32_to_cpu'
#define le32_to_cpu __le32_to_cpu
^
   arch/m68k/include/asm/io_mm.h:452:26: note: in expansion of macro 'in_le32'
#define readl(addr)  in_le32(addr)
 ^~~
   drivers/net//ethernet/freescale/fec_main.c:959:9: note: in expansion of 
macro 'readl'
  val = readl(fep->hwp + FEC_RACC);
^
   In file included from arch/m68k/include/asm/io_mm.h:27:0,
from arch/m68k/include/asm/io.h:5,
from include/linux/scatterlist.h:9,
from include/linux/dma-mapping.h:11,
from include/linux/skbuff.h:34,
from include/linux/if_ether.h:23,
from include/uapi/linux/ethtool.h:19,
from include/linux/ethtool.h:18,
from include/linux/netdevice.h:41,
from drivers/net//ethernet/freescale/fec_main.c:34:
   drivers/net//ethernet/freescale/fec_main.c:968:38: error: 'FEC_FTRL' 
undeclared (first use in this function); did you mean 'FEC_ECNTRL'?
  writel(PKT_MAXBUF_SIZE, fep->hwp + FEC_FTRL);
 ^
   arch/m68k/include/asm/raw_io.h:48:64: note: in definition of macro 'out_le32'
#define out_le32(addr,l) (void)((*(__force volatile __le32 *) (addr)) = 
cpu_to_le32(l))
   ^~~~
   drivers/net//ethernet/freescale/fec_main.c:968:3: note: in expansion of 
macro 'writel'
  writel(PKT_MAXBUF_SIZE, fep->hwp + FEC_FTRL);
  ^~
   drivers/net//ethernet/freescale/fec_main.c:1034:38: error: 'FEC_R_FIFO_RSEM' 
undeclared (first use in this function); did you mean 'FEC_FIFO_RAM'?
  writel(FEC_ENET_RSEM_V, fep->hwp + FEC_R_FIFO_RSEM);
 ^
   arch/m68k/include/asm/raw_io.h:48:64: note: in definition of macro 'out_le32'
#define out_le32(addr,l) (void)((*(__force volatile __le32 *) (addr)) = 
cpu_to_le32(l))
   ^~~~
   drivers/net//ethernet/freescale/fec_main.c:1034:3: note: in expansion of 
macro 'writel'
  writel(FEC_ENET_RSEM_V, fep->hwp + FEC_R_FIFO_RSEM);
  ^~
   drivers/net//ethernet/freescale/fec_main.c:1035:38: error: 'FEC_R_FIFO_RSFL' 
undeclared (first use in 

Re: [PATCH 1/2] x86/boot/KASLR: Add two functions for 1GB huge pages handling

2018-05-16 Thread Baoquan He
Hi Chao,

On 05/17/18 at 11:27am, Chao Fan wrote:
> >+/* Store the number of 1GB huge pages which the user specified. */
> >+static unsigned long max_gb_huge_pages;
> >+
> >+static int parse_gb_huge_pages(char *param, char* val)
> >+{
> >+	char *p;
> >+	u64 mem_size;
> >+	static bool gbpage_sz = false;
> >+
> >+	if (!strcmp(param, "hugepagesz")) {
> >+		p = val;
> >+		mem_size = memparse(p, &p);
> >+		if (mem_size == PUD_SIZE) {
> >+			if (gbpage_sz)
> >+				warn("Repeatedly set hugeTLB page size of 1G!\n");
> >+			gbpage_sz = true;
> >+		} else
> >+			gbpage_sz = false;
> >+	} else if (!strcmp(param, "hugepages") && gbpage_sz) {
> >+		p = val;
> >+		max_gb_huge_pages = simple_strtoull(p, &p, 0);
> >+		debug_putaddr(max_gb_huge_pages);
> >+	}
> >+}
> >+
> >+
> > static int handle_mem_memmap(void)
> > {
> > char *args = (char *)get_cmd_line_ptr();
> >@@ -466,6 +492,51 @@ static void store_slot_info(struct mem_vector *region, unsigned long image_size)
> > }
> > }
> > 
> >+/* Skip as many 1GB huge pages as possible in the passed region. */
> >+static void process_gb_huge_page(struct mem_vector *region, unsigned long image_size)
> >+{
> >+	int i = 0;
> >+	unsigned long addr, size;
> >+	struct mem_vector tmp;
> >+
> >+	if (!max_gb_huge_pages) {
> >+		store_slot_info(region, image_size);
> >+		return;
> >+	}
> >+
> >+	addr = ALIGN(region->start, PUD_SIZE);
> >+	/* Did we raise the address above the passed in memory entry? */
> >+	if (addr < region->start + region->size)
> >+		size = region->size - (addr - region->start);
> >+
> >+	/* Check how many 1GB huge pages can be filtered out */
> >+	while (size > PUD_SIZE && max_gb_huge_pages) {
> >+		size -= PUD_SIZE;
> >+		max_gb_huge_pages--;
> 
> The global variable 'max_gb_huge_pages' means how many huge pages the
> user specified when you get it from the command line.
> But here, every time we find a position which is good for huge page
> allocation, 'max_gb_huge_pages' is decreased. So in my understanding,
> it is used to store how many huge pages we still need to search memory
> for good slots to filter out, right?
> If that's right, maybe the name 'max_gb_huge_pages' is not very suitable.
> If my understanding is wrong, please tell me.

No, you have understood it exactly right. I finished the draft patch last
week, but changed this variable name and the function names several
times; I still feel they are not good. However, I can't come up with
better names.

Yes, 'max_gb_huge_pages' stores how many 1GB huge pages are expected
from the kernel command line, and in this function it is decreased. But
we can't define another global variable only for decrementing in this
place.

And you can see that in this patchset I only take care of 1GB huge
pages. On x86 we have two kinds of huge pages, 2MB and 1GB, so why 1GB
only? Because 2MB pages are not impacted by KASLR; please check the code
in hugetlb_nrpages_setup() of mm/hugetlb.c. Only 1GB huge pages need to
be pre-allocated there, and if you look into hugetlb_nrpages_setup(),
you will find that it calls alloc_bootmem_huge_page() to allocate the
huge pages one by one, not all at one time. That is why I always put
'gb' in the middle of the global variable name and the newly added
function names.

And that also answers your question below: if 'max_gb_huge_pages' is
still not 0 after we have walked over all memory regions, what should we
do? That is normal and works as expected. Here hugetlb only tries its
best to allocate as many pages as possible according to
'max_gb_huge_pages'; if the request can't be fully satisfied, that's
fine. E.g. on a bare-metal machine with 16GB RAM, you add the following
to the command line:

default_hugepagesz=1G hugepagesz=1G hugepages=20

Then it gets 14 good 1GB huge pages with KASLR disabled, since [0,1G)
and [3G,4G) are touched by BIOS and PCI/firmware reservations. Those 14
huge pages are the maximum that can be expected; it's not a bug in huge
page handling. But with KASLR enabled, it sometimes gets only 13 1GB
huge pages, because KASLR puts the kernel into one of those good 1GB
huge pages. That is a bug.
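
To make that arithmetic concrete, here is a quick userspace sketch. The
free-region layout below is only an assumption modelling the
reservations mentioned above ([0,1G) and [3G,4G) unusable on a 16GB
box), not real e820 data:

#include <stdio.h>

#define GB (1UL << 30)

struct range { unsigned long start, size; };

/* Count 1GB-aligned 1GB chunks that fit fully inside a free range. */
static unsigned long count_1g_chunks(const struct range *r)
{
	unsigned long addr = (r->start + GB - 1) & ~(GB - 1);	/* align up */
	unsigned long end = r->start + r->size;

	return (addr < end) ? (end - addr) / GB : 0;
}

int main(void)
{
	/* Assumed free regions on the 16GB example machine. */
	struct range free_mem[] = {
		{ 1 * GB,  2 * GB },	/* [1G, 3G)  */
		{ 4 * GB, 12 * GB },	/* [4G, 16G) */
	};
	unsigned long total = 0;
	unsigned int i;

	for (i = 0; i < sizeof(free_mem) / sizeof(free_mem[0]); i++)
		total += count_1g_chunks(&free_mem[i]);

	printf("usable 1GB huge pages: %lu\n", total);	/* prints 14 */
	return 0;
}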

I am not very familiar with huge page handling; I only read the code
recently because of this KASLR bug. I hope Luiz and the people from his
team can help correct and clarify anything that is not right. Especially
the function names: I feel they are not good, and if anyone has a better
idea, I will really appreciate it.
> 
> >+		i++;
> >+	}
> >+
> >+	if (!i) {
> >+		store_slot_info(region, image_size);
> >+		return;
> >+	}
> >+
> >+	/* Process the remaining regions after filtering out. */
> >+
> This line may be unnecessary.

Hmm, I did that on purpose. Because the 1GB huge pages may be dug out
from the middle, the remaining head and tail regions still need to be
handled. I put it here to 
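
As an illustration of that head/tail handling, here is a minimal
userspace sketch of the carve-out idea. struct mem_vector, PUD_SIZE and
store_slot_info() are stubbed out, and the region and image size in
main() are made up; this is a sketch of the approach under discussion,
not the exact patch code:

#include <stdio.h>

#define PUD_SIZE (1UL << 30)

struct mem_vector { unsigned long start, size; };

static unsigned long max_gb_huge_pages = 2;	/* e.g. hugepages=2 */

/* Stub: the real boot code records KASLR candidate slots here. */
static void store_slot_info(struct mem_vector *region, unsigned long image_size)
{
	if (region->size >= image_size)
		printf("slot region: [%#lx, %#lx)\n",
		       region->start, region->start + region->size);
}

static void process_gb_huge_page(struct mem_vector *region, unsigned long image_size)
{
	unsigned long addr, size, i = 0;
	struct mem_vector tmp;

	if (!max_gb_huge_pages) {
		store_slot_info(region, image_size);
		return;
	}

	/* First 1GB-aligned address inside the region, if any. */
	addr = (region->start + PUD_SIZE - 1) & ~(PUD_SIZE - 1);
	if (addr >= region->start + region->size) {
		store_slot_info(region, image_size);
		return;
	}
	size = region->size - (addr - region->start);

	/* Carve out as many aligned 1GB chunks as are still wanted. */
	while (size >= PUD_SIZE && max_gb_huge_pages) {
		size -= PUD_SIZE;
		max_gb_huge_pages--;
		i++;
	}

	if (!i) {	/* nothing carved out, keep the region whole */
		store_slot_info(region, image_size);
		return;
	}

	/* Head piece below the first carved-out chunk. */
	if (addr > region->start) {
		tmp.start = region->start;
		tmp.size = addr - region->start;
		store_slot_info(&tmp, image_size);
	}

	/* Tail piece above the last carved-out chunk. */
	if (size) {
		tmp.start = addr + i * PUD_SIZE;
		tmp.size = size;
		store_slot_info(&tmp, image_size);
	}
}

int main(void)
{
	/* Made-up memory region: starts at 128MB, 5GB long. */
	struct mem_vector region = { 0x8000000UL, 5UL << 30 };

	process_gb_huge_page(&region, 16UL << 20);	/* made-up image size */
	return 0;
}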
