Re: [PATCH 4.4 15/47] ubi: fastmap: Correctly handle interrupted erasures in EBA

2018-07-27 Thread Richard Weinberger
Ben,

Am Samstag, 28. Juli 2018, 03:28:58 CEST schrieb Ben Hutchings:
> > > Isn't there a risk here, that a read error leads to erasing data that
> > > might be recoverable if the read is retried?
> > 
> > Well, read error means that already something went very wrong. At other 
> > places
> > in UBI wo also don't retry reading headers and consider it as fatal when we 
> > are unable to read it.
> > We could also read the EC header, but what do we gain from that?
> > If the VID header is not readable we cannot check fastmap either.
> > 
> > What case exactly do you have in mind?
> 
> I suppose I'm thinking about data recovery from a flash device that has
> become unreliable.  I'm not familiar with ubi; is there a read-only
> mode where the erase wouldn't actually be performed?

We have a ready only mode, UBI enters this mode upon fatal errors.
But I don't see what this has to do with the case this patch addresses.
Here we need to recover from interrupted erases when fastmap is used.
With fastmap the mapping still looks valid to the upper layer but due to
a bad erase we might face later ECC errors and therefore we have to check
whether it is still valid upon first access.
If data is present, mapping is ok. If not (no data or ECC error), unmap it
to make the state sync with the upper layer.

Thanks,
//richard



Re: [PATCH 4.4 15/47] ubi: fastmap: Correctly handle interrupted erasures in EBA

2018-07-27 Thread Richard Weinberger
Ben,

Am Samstag, 28. Juli 2018, 03:28:58 CEST schrieb Ben Hutchings:
> > > Isn't there a risk here, that a read error leads to erasing data that
> > > might be recoverable if the read is retried?
> > 
> > Well, read error means that already something went very wrong. At other 
> > places
> > in UBI wo also don't retry reading headers and consider it as fatal when we 
> > are unable to read it.
> > We could also read the EC header, but what do we gain from that?
> > If the VID header is not readable we cannot check fastmap either.
> > 
> > What case exactly do you have in mind?
> 
> I suppose I'm thinking about data recovery from a flash device that has
> become unreliable.  I'm not familiar with ubi; is there a read-only
> mode where the erase wouldn't actually be performed?

We have a ready only mode, UBI enters this mode upon fatal errors.
But I don't see what this has to do with the case this patch addresses.
Here we need to recover from interrupted erases when fastmap is used.
With fastmap the mapping still looks valid to the upper layer but due to
a bad erase we might face later ECC errors and therefore we have to check
whether it is still valid upon first access.
If data is present, mapping is ok. If not (no data or ECC error), unmap it
to make the state sync with the upper layer.

Thanks,
//richard



Re: [PATCH 4/4] cpufreq: intel_pstate: enable boost for Skylake Xeon

2018-07-27 Thread Francisco Jerez
Srinivas Pandruvada  writes:

> Enable HWP boost on Skylake server and workstations.
>

Please revert this series, it led to significant energy usage and
graphics performance regressions [1].  The reasons are roughly the ones
we discussed by e-mail off-list last April: This causes the intel_pstate
driver to decrease the EPP to zero when the workload blocks on IO
frequently enough, which for the regressing benchmarks detailed in [1]
is a symptom of the workload being heavily IO-bound, which means they
won't benefit at all from the EPP boost since they aren't significantly
CPU-bound, and they will suffer a decrease in parallelism due to the
active CPU core using a larger fraction of the TDP in order to achieve
the same work, causing the GPU to have a lower power budget available,
leading to a decrease in system performance.

You may want to give a shot to my previous suggestion of using [2] in
order to detect whether the system is IO-bound, which you can use as an
indicator that the optimization implemented in this series cannot
possibly improve performance and can be expected to hurt energy usage.

Thanks.

[1] https://bugs.freedesktop.org/show_bug.cgi?id=107410
[2] https://patchwork.kernel.org/patch/10312259/

> Reported-by: Mel Gorman 
> Tested-by: Giovanni Gherdovich 
> Signed-off-by: Srinivas Pandruvada 
> ---
>  drivers/cpufreq/intel_pstate.c | 10 ++
>  1 file changed, 10 insertions(+)
>
> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
> index 70bf63bb4e0e..01c8da1f99db 100644
> --- a/drivers/cpufreq/intel_pstate.c
> +++ b/drivers/cpufreq/intel_pstate.c
> @@ -1794,6 +1794,12 @@ static const struct x86_cpu_id 
> intel_pstate_cpu_ee_disable_ids[] = {
>   {}
>  };
>  
> +static const struct x86_cpu_id intel_pstate_hwp_boost_ids[] __initconst = {
> + ICPU(INTEL_FAM6_SKYLAKE_X, core_funcs),
> + ICPU(INTEL_FAM6_SKYLAKE_DESKTOP, core_funcs),
> + {}
> +};
> +
>  static int intel_pstate_init_cpu(unsigned int cpunum)
>  {
>   struct cpudata *cpu;
> @@ -1824,6 +1830,10 @@ static int intel_pstate_init_cpu(unsigned int cpunum)
>   intel_pstate_disable_ee(cpunum);
>  
>   intel_pstate_hwp_enable(cpu);
> +
> + id = x86_match_cpu(intel_pstate_hwp_boost_ids);
> + if (id)
> + hwp_boost = true;
>   }
>  
>   intel_pstate_get_cpu_pstates(cpu);
> -- 
> 2.13.6


signature.asc
Description: PGP signature


Re: [PATCH 4/4] cpufreq: intel_pstate: enable boost for Skylake Xeon

2018-07-27 Thread Francisco Jerez
Srinivas Pandruvada  writes:

> Enable HWP boost on Skylake server and workstations.
>

Please revert this series, it led to significant energy usage and
graphics performance regressions [1].  The reasons are roughly the ones
we discussed by e-mail off-list last April: This causes the intel_pstate
driver to decrease the EPP to zero when the workload blocks on IO
frequently enough, which for the regressing benchmarks detailed in [1]
is a symptom of the workload being heavily IO-bound, which means they
won't benefit at all from the EPP boost since they aren't significantly
CPU-bound, and they will suffer a decrease in parallelism due to the
active CPU core using a larger fraction of the TDP in order to achieve
the same work, causing the GPU to have a lower power budget available,
leading to a decrease in system performance.

You may want to give a shot to my previous suggestion of using [2] in
order to detect whether the system is IO-bound, which you can use as an
indicator that the optimization implemented in this series cannot
possibly improve performance and can be expected to hurt energy usage.

Thanks.

[1] https://bugs.freedesktop.org/show_bug.cgi?id=107410
[2] https://patchwork.kernel.org/patch/10312259/

> Reported-by: Mel Gorman 
> Tested-by: Giovanni Gherdovich 
> Signed-off-by: Srinivas Pandruvada 
> ---
>  drivers/cpufreq/intel_pstate.c | 10 ++
>  1 file changed, 10 insertions(+)
>
> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
> index 70bf63bb4e0e..01c8da1f99db 100644
> --- a/drivers/cpufreq/intel_pstate.c
> +++ b/drivers/cpufreq/intel_pstate.c
> @@ -1794,6 +1794,12 @@ static const struct x86_cpu_id 
> intel_pstate_cpu_ee_disable_ids[] = {
>   {}
>  };
>  
> +static const struct x86_cpu_id intel_pstate_hwp_boost_ids[] __initconst = {
> + ICPU(INTEL_FAM6_SKYLAKE_X, core_funcs),
> + ICPU(INTEL_FAM6_SKYLAKE_DESKTOP, core_funcs),
> + {}
> +};
> +
>  static int intel_pstate_init_cpu(unsigned int cpunum)
>  {
>   struct cpudata *cpu;
> @@ -1824,6 +1830,10 @@ static int intel_pstate_init_cpu(unsigned int cpunum)
>   intel_pstate_disable_ee(cpunum);
>  
>   intel_pstate_hwp_enable(cpu);
> +
> + id = x86_match_cpu(intel_pstate_hwp_boost_ids);
> + if (id)
> + hwp_boost = true;
>   }
>  
>   intel_pstate_get_cpu_pstates(cpu);
> -- 
> 2.13.6


signature.asc
Description: PGP signature


Re: [PATCH] scripts: Add Python 3 support to tracing/draw_functrace.py

2018-07-27 Thread Masahiro Yamada
2018-07-26 0:01 GMT+09:00 Jeremy Cline :
> On 07/25/2018 10:39 AM, Masahiro Yamada wrote:
>> 2018-07-21 4:35 GMT+09:00 Jeremy Cline :
>>> Use the print function. This maintains Python 2 support and should have
>>> no functional change.
>>>
>>> Signed-off-by: Jeremy Cline 
>>> ---
>>>  scripts/tracing/draw_functrace.py | 3 ++-
>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/scripts/tracing/draw_functrace.py 
>>> b/scripts/tracing/draw_functrace.py
>>> index db40fa04cd51..7d44e796d362 100755
>>> --- a/scripts/tracing/draw_functrace.py
>>> +++ b/scripts/tracing/draw_functrace.py
>>> @@ -20,6 +20,7 @@ Usage:
>>> $ scripts/draw_functrace.py < raw_trace_func > draw_functrace
>>> Then you have your drawn trace in draw_functrace
>>>  """
>>> +from __future__ import print_function
>>
>> What do you need this line for?
>>
>> I have not tested this,
>> but I guess print(CallTree.ROOT) will work for Python 2.
>
> Although "print(CallTree.ROOT)" (as a statement) works in Python 2,
> its behavior is different than print (as a function) in Python 3. In
> this case, there's no additional arguments being provided so the
> behavior will match, but if someone added an argument it would work
> differently on Python 2 vs Python 3:
>
> Python 2.7.15
 print("hello", "world")
> ('hello', 'world')
>
> Python 3.6.6
 print("hello, "world")
>   File "", line 1
> print("hello, "world")
>^
> SyntaxError: invalid syntax

Yes, I know this.


> Importing the print_function works on Python 2.6+[0] and changes print
> to be a function in Python 2 so it'll behave the same in 2 and 3. Given
> that this script doesn't appear to change much it's probably not going
> to save anyone from making that mistake, though. Would you prefer a
> patch without it?

Either will do.
If it is tedious to respin, I will pick this up.


I saw only one simple print statement in this script,
so I wanted to ask you if this had some reason
I might be missing.

Anyway, we will remove 'from __future__ import print_function'
when Python 2 retires.



> [0] https://docs.python.org/3/library/__future__.html
>
> Regards,
> Jeremy
>
>>
>>
>>
>>>
>>>  import sys, re
>>> @@ -123,7 +124,7 @@ def main():
>>> tree = tree.getParent(caller)
>>> tree = tree.calls(callee, calltime)
>>>
>>> -   print CallTree.ROOT
>>> +   print(CallTree.ROOT)
>>>
>>>  if __name__ == "__main__":
>>> main()
>>> --
>>> 2.17.1
>>>
>>
>>
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kbuild" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Best Regards
Masahiro Yamada


Re: [PATCH] scripts: Add Python 3 support to tracing/draw_functrace.py

2018-07-27 Thread Masahiro Yamada
2018-07-26 0:01 GMT+09:00 Jeremy Cline :
> On 07/25/2018 10:39 AM, Masahiro Yamada wrote:
>> 2018-07-21 4:35 GMT+09:00 Jeremy Cline :
>>> Use the print function. This maintains Python 2 support and should have
>>> no functional change.
>>>
>>> Signed-off-by: Jeremy Cline 
>>> ---
>>>  scripts/tracing/draw_functrace.py | 3 ++-
>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/scripts/tracing/draw_functrace.py 
>>> b/scripts/tracing/draw_functrace.py
>>> index db40fa04cd51..7d44e796d362 100755
>>> --- a/scripts/tracing/draw_functrace.py
>>> +++ b/scripts/tracing/draw_functrace.py
>>> @@ -20,6 +20,7 @@ Usage:
>>> $ scripts/draw_functrace.py < raw_trace_func > draw_functrace
>>> Then you have your drawn trace in draw_functrace
>>>  """
>>> +from __future__ import print_function
>>
>> What do you need this line for?
>>
>> I have not tested this,
>> but I guess print(CallTree.ROOT) will work for Python 2.
>
> Although "print(CallTree.ROOT)" (as a statement) works in Python 2,
> its behavior is different than print (as a function) in Python 3. In
> this case, there's no additional arguments being provided so the
> behavior will match, but if someone added an argument it would work
> differently on Python 2 vs Python 3:
>
> Python 2.7.15
 print("hello", "world")
> ('hello', 'world')
>
> Python 3.6.6
 print("hello, "world")
>   File "", line 1
> print("hello, "world")
>^
> SyntaxError: invalid syntax

Yes, I know this.


> Importing the print_function works on Python 2.6+[0] and changes print
> to be a function in Python 2 so it'll behave the same in 2 and 3. Given
> that this script doesn't appear to change much it's probably not going
> to save anyone from making that mistake, though. Would you prefer a
> patch without it?

Either will do.
If it is tedious to respin, I will pick this up.


I saw only one simple print statement in this script,
so I wanted to ask you if this had some reason
I might be missing.

Anyway, we will remove 'from __future__ import print_function'
when Python 2 retires.



> [0] https://docs.python.org/3/library/__future__.html
>
> Regards,
> Jeremy
>
>>
>>
>>
>>>
>>>  import sys, re
>>> @@ -123,7 +124,7 @@ def main():
>>> tree = tree.getParent(caller)
>>> tree = tree.calls(callee, calltime)
>>>
>>> -   print CallTree.ROOT
>>> +   print(CallTree.ROOT)
>>>
>>>  if __name__ == "__main__":
>>> main()
>>> --
>>> 2.17.1
>>>
>>
>>
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kbuild" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Best Regards
Masahiro Yamada


Re: [PATCH 4.17 00/66] 4.17.11-stable review

2018-07-27 Thread Greg Kroah-Hartman
On Fri, Jul 27, 2018 at 10:31:48AM -0700, Guenter Roeck wrote:
> On Fri, Jul 27, 2018 at 11:44:53AM +0200, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 4.17.11 release.
> > There are 66 patches in this series, all will be posted as a response
> > to this one.  If anyone has any issues with these being applied, please
> > let me know.
> > 
> > Responses should be made by Sun Jul 29 09:37:38 UTC 2018.
> > Anything received after that time might be too late.
> > 
> 
> Build results:
>   total: 134 pass: 134 fail: 0
> Qemu test results:
>   total: 177 pass: 177 fail: 0
> 
> Details are available at http://kerneltests.org/builders/.

Thanks for testing all of these and letting me know.

greg k-h


Re: [PATCH 4.17 00/66] 4.17.11-stable review

2018-07-27 Thread Greg Kroah-Hartman
On Fri, Jul 27, 2018 at 10:31:48AM -0700, Guenter Roeck wrote:
> On Fri, Jul 27, 2018 at 11:44:53AM +0200, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 4.17.11 release.
> > There are 66 patches in this series, all will be posted as a response
> > to this one.  If anyone has any issues with these being applied, please
> > let me know.
> > 
> > Responses should be made by Sun Jul 29 09:37:38 UTC 2018.
> > Anything received after that time might be too late.
> > 
> 
> Build results:
>   total: 134 pass: 134 fail: 0
> Qemu test results:
>   total: 177 pass: 177 fail: 0
> 
> Details are available at http://kerneltests.org/builders/.

Thanks for testing all of these and letting me know.

greg k-h


Re: [PATCH 4.17 00/66] 4.17.11-stable review

2018-07-27 Thread Greg Kroah-Hartman
On Fri, Jul 27, 2018 at 01:49:42PM -0600, Shuah Khan wrote:
> On 07/27/2018 03:44 AM, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 4.17.11 release.
> > There are 66 patches in this series, all will be posted as a response
> > to this one.  If anyone has any issues with these being applied, please
> > let me know.
> > 
> > Responses should be made by Sun Jul 29 09:37:38 UTC 2018.
> > Anything received after that time might be too late.
> > 
> > The whole patch series can be found in one patch at:
> > 
> > https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.17.11-rc1.gz
> > or in the git tree and branch at:
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> > linux-4.17.y
> > and the diffstat can be found below.
> > 
> > thanks,
> > 
> > greg k-h
> > 
> 
> Missed a few releases while I was away :)

Welcome back!

> Compiled and booted on my test system. No dmesg regressions.

Thanks for testing all of these and letting me know.

greg k-h


Re: [PATCH 4.17 00/66] 4.17.11-stable review

2018-07-27 Thread Greg Kroah-Hartman
On Fri, Jul 27, 2018 at 01:49:42PM -0600, Shuah Khan wrote:
> On 07/27/2018 03:44 AM, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 4.17.11 release.
> > There are 66 patches in this series, all will be posted as a response
> > to this one.  If anyone has any issues with these being applied, please
> > let me know.
> > 
> > Responses should be made by Sun Jul 29 09:37:38 UTC 2018.
> > Anything received after that time might be too late.
> > 
> > The whole patch series can be found in one patch at:
> > 
> > https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.17.11-rc1.gz
> > or in the git tree and branch at:
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> > linux-4.17.y
> > and the diffstat can be found below.
> > 
> > thanks,
> > 
> > greg k-h
> > 
> 
> Missed a few releases while I was away :)

Welcome back!

> Compiled and booted on my test system. No dmesg regressions.

Thanks for testing all of these and letting me know.

greg k-h


[PATCH 2/5] staging: gasket: apex: fixup undefined PCI class

2018-07-27 Thread Todd Poynor
From: Todd Poynor 

Apex chips with class 0 (PCI_CLASS_NOT_DEFINED) fixed up to
PCI_CLASS_SYSTEM_OTHER to enable PCI resource assignments.

Signed-off-by: Todd Poynor 
---
 drivers/staging/gasket/apex_driver.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/staging/gasket/apex_driver.c 
b/drivers/staging/gasket/apex_driver.c
index 73fc2683a834d..ab466d49608a7 100644
--- a/drivers/staging/gasket/apex_driver.c
+++ b/drivers/staging/gasket/apex_driver.c
@@ -739,3 +739,10 @@ static ssize_t sysfs_show(
gasket_sysfs_put_device_data(device, gasket_dev);
return ret;
 }
+
+static void apex_pci_fixup_class(struct pci_dev *pdev)
+{
+   pdev->class = (PCI_CLASS_SYSTEM_OTHER << 8) | pdev->class;
+}
+DECLARE_PCI_FIXUP_CLASS_HEADER(APEX_PCI_VENDOR_ID, APEX_PCI_DEVICE_ID,
+  PCI_CLASS_NOT_DEFINED, 8, apex_pci_fixup_class);
-- 
2.18.0.345.g5c9ce644c3-goog



[PATCH 3/5] staging: gasket: sysfs: remove unnecessary NULL check on device ptr

2018-07-27 Thread Todd Poynor
From: Todd Poynor 

The device pointer passed into get_mapping() will never be NULL; the
check is unnecessary.

Reported-by: Greg Kroah-Hartman 
Signed-off-by: Todd Poynor 
---
 drivers/staging/gasket/gasket_sysfs.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/drivers/staging/gasket/gasket_sysfs.c 
b/drivers/staging/gasket/gasket_sysfs.c
index 2d8647de697cd..da972ce0e0db0 100644
--- a/drivers/staging/gasket/gasket_sysfs.c
+++ b/drivers/staging/gasket/gasket_sysfs.c
@@ -67,11 +67,6 @@ static struct gasket_sysfs_mapping *get_mapping(struct 
device *device)
 {
int i;
 
-   if (!device) {
-   pr_debug("%s: Received NULL device\n", __func__);
-   return NULL;
-   }
-
for (i = 0; i < GASKET_SYSFS_NUM_MAPPINGS; i++) {
mutex_lock(_mappings[i].mutex);
if (dev_mappings[i].device == device) {
-- 
2.18.0.345.g5c9ce644c3-goog



[PATCH 1/5] staging: gasket: sysfs: remove check for refcount already zero

2018-07-27 Thread Todd Poynor
From: Todd Poynor 

Remove the check for refcount already zero, which shouldn't be
necessary.

Signed-off-by: Todd Poynor 
---
 drivers/staging/gasket/gasket_sysfs.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/staging/gasket/gasket_sysfs.c 
b/drivers/staging/gasket/gasket_sysfs.c
index 418b81797e638..2d8647de697cd 100644
--- a/drivers/staging/gasket/gasket_sysfs.c
+++ b/drivers/staging/gasket/gasket_sysfs.c
@@ -111,8 +111,6 @@ static void put_mapping(struct gasket_sysfs_mapping 
*mapping)
}
 
mutex_lock(>mutex);
-   if (refcount_read(>refcount.refcount) == 0)
-   dev_err(mapping->device, "Refcount is already 0\n");
if (kref_put(>refcount, release_entry)) {
dev_dbg(mapping->device, "Removing Gasket sysfs mapping\n");
/*
-- 
2.18.0.345.g5c9ce644c3-goog



[PATCH 0/5] staging: gasket: fixes and cleanups

2018-07-27 Thread Todd Poynor
From: Todd Poynor 

The fun continues with gasket+apex: remove dead code and unnecessary
stuff, fixup apex PCI class for devices that advertise class 0
(undefined), and make sure the struct device doesn't go away on us.
Most of these from review comments of previous patch series.

Todd Poynor (5):
  staging: gasket: sysfs: remove check for refcount already zero
  staging: gasket: apex: fixup undefined PCI class
  staging: gasket: sysfs: remove unnecessary NULL check on device ptr
  staging: gasket: page table: remove code for "no dma_ops"
  staging: gasket: core: hold reference on device kobj while in use

 drivers/staging/gasket/apex_driver.c   |  7 +++
 drivers/staging/gasket/gasket_core.c   |  5 +-
 drivers/staging/gasket/gasket_page_table.c | 58 +++---
 drivers/staging/gasket/gasket_page_table.h |  3 +-
 drivers/staging/gasket/gasket_sysfs.c  |  7 ---
 5 files changed, 17 insertions(+), 63 deletions(-)

-- 
2.18.0.345.g5c9ce644c3-goog



[PATCH 3/5] staging: gasket: sysfs: remove unnecessary NULL check on device ptr

2018-07-27 Thread Todd Poynor
From: Todd Poynor 

The device pointer passed into get_mapping() will never be NULL; the
check is unnecessary.

Reported-by: Greg Kroah-Hartman 
Signed-off-by: Todd Poynor 
---
 drivers/staging/gasket/gasket_sysfs.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/drivers/staging/gasket/gasket_sysfs.c 
b/drivers/staging/gasket/gasket_sysfs.c
index 2d8647de697cd..da972ce0e0db0 100644
--- a/drivers/staging/gasket/gasket_sysfs.c
+++ b/drivers/staging/gasket/gasket_sysfs.c
@@ -67,11 +67,6 @@ static struct gasket_sysfs_mapping *get_mapping(struct 
device *device)
 {
int i;
 
-   if (!device) {
-   pr_debug("%s: Received NULL device\n", __func__);
-   return NULL;
-   }
-
for (i = 0; i < GASKET_SYSFS_NUM_MAPPINGS; i++) {
mutex_lock(_mappings[i].mutex);
if (dev_mappings[i].device == device) {
-- 
2.18.0.345.g5c9ce644c3-goog



[PATCH 1/5] staging: gasket: sysfs: remove check for refcount already zero

2018-07-27 Thread Todd Poynor
From: Todd Poynor 

Remove the check for refcount already zero, which shouldn't be
necessary.

Signed-off-by: Todd Poynor 
---
 drivers/staging/gasket/gasket_sysfs.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/staging/gasket/gasket_sysfs.c 
b/drivers/staging/gasket/gasket_sysfs.c
index 418b81797e638..2d8647de697cd 100644
--- a/drivers/staging/gasket/gasket_sysfs.c
+++ b/drivers/staging/gasket/gasket_sysfs.c
@@ -111,8 +111,6 @@ static void put_mapping(struct gasket_sysfs_mapping 
*mapping)
}
 
mutex_lock(>mutex);
-   if (refcount_read(>refcount.refcount) == 0)
-   dev_err(mapping->device, "Refcount is already 0\n");
if (kref_put(>refcount, release_entry)) {
dev_dbg(mapping->device, "Removing Gasket sysfs mapping\n");
/*
-- 
2.18.0.345.g5c9ce644c3-goog



[PATCH 0/5] staging: gasket: fixes and cleanups

2018-07-27 Thread Todd Poynor
From: Todd Poynor 

The fun continues with gasket+apex: remove dead code and unnecessary
stuff, fixup apex PCI class for devices that advertise class 0
(undefined), and make sure the struct device doesn't go away on us.
Most of these from review comments of previous patch series.

Todd Poynor (5):
  staging: gasket: sysfs: remove check for refcount already zero
  staging: gasket: apex: fixup undefined PCI class
  staging: gasket: sysfs: remove unnecessary NULL check on device ptr
  staging: gasket: page table: remove code for "no dma_ops"
  staging: gasket: core: hold reference on device kobj while in use

 drivers/staging/gasket/apex_driver.c   |  7 +++
 drivers/staging/gasket/gasket_core.c   |  5 +-
 drivers/staging/gasket/gasket_page_table.c | 58 +++---
 drivers/staging/gasket/gasket_page_table.h |  3 +-
 drivers/staging/gasket/gasket_sysfs.c  |  7 ---
 5 files changed, 17 insertions(+), 63 deletions(-)

-- 
2.18.0.345.g5c9ce644c3-goog



[PATCH 2/5] staging: gasket: apex: fixup undefined PCI class

2018-07-27 Thread Todd Poynor
From: Todd Poynor 

Apex chips with class 0 (PCI_CLASS_NOT_DEFINED) fixed up to
PCI_CLASS_SYSTEM_OTHER to enable PCI resource assignments.

Signed-off-by: Todd Poynor 
---
 drivers/staging/gasket/apex_driver.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/staging/gasket/apex_driver.c 
b/drivers/staging/gasket/apex_driver.c
index 73fc2683a834d..ab466d49608a7 100644
--- a/drivers/staging/gasket/apex_driver.c
+++ b/drivers/staging/gasket/apex_driver.c
@@ -739,3 +739,10 @@ static ssize_t sysfs_show(
gasket_sysfs_put_device_data(device, gasket_dev);
return ret;
 }
+
+static void apex_pci_fixup_class(struct pci_dev *pdev)
+{
+   pdev->class = (PCI_CLASS_SYSTEM_OTHER << 8) | pdev->class;
+}
+DECLARE_PCI_FIXUP_CLASS_HEADER(APEX_PCI_VENDOR_ID, APEX_PCI_DEVICE_ID,
+  PCI_CLASS_NOT_DEFINED, 8, apex_pci_fixup_class);
-- 
2.18.0.345.g5c9ce644c3-goog



[PATCH 4/5] staging: gasket: page table: remove code for "no dma_ops"

2018-07-27 Thread Todd Poynor
From: Todd Poynor 

Remove code with TODOs on it for working around apparent problems
previously seen in a qemu environment where dma_ops was not set
correctly.  There is no user of this in the current code.

Signed-off-by: Todd Poynor 
---
 drivers/staging/gasket/gasket_core.c   |  2 +-
 drivers/staging/gasket/gasket_page_table.c | 58 +++---
 drivers/staging/gasket/gasket_page_table.h |  3 +-
 3 files changed, 8 insertions(+), 55 deletions(-)

diff --git a/drivers/staging/gasket/gasket_core.c 
b/drivers/staging/gasket/gasket_core.c
index f44805c38159b..859a6df9e12df 100644
--- a/drivers/staging/gasket/gasket_core.c
+++ b/drivers/staging/gasket/gasket_core.c
@@ -916,7 +916,7 @@ static int gasket_enable_dev(
_dev->bar_data[
driver_desc->page_table_bar_index],
_desc->page_table_configs[tbl_idx],
-   gasket_dev->dev, gasket_dev->pci_dev, true);
+   gasket_dev->dev, gasket_dev->pci_dev);
if (ret) {
dev_err(gasket_dev->dev,
"Couldn't init page table %d: %d\n",
diff --git a/drivers/staging/gasket/gasket_page_table.c 
b/drivers/staging/gasket/gasket_page_table.c
index 32f1c1e10c7e2..722839603f20d 100644
--- a/drivers/staging/gasket/gasket_page_table.c
+++ b/drivers/staging/gasket/gasket_page_table.c
@@ -212,12 +212,6 @@ struct gasket_page_table {
 * gasket_mmap function, so user_virt belongs in the driver anyhow.
 */
struct gasket_coherent_page_entry *coherent_pages;
-
-   /*
-* Whether the page table uses arch specific dma_ops or
-* whether the driver is supplying its own.
-*/
-   bool dma_ops;
 };
 
 /* Mapping declarations */
@@ -290,7 +284,7 @@ int gasket_page_table_init(
struct gasket_page_table **ppg_tbl,
const struct gasket_bar_data *bar_data,
const struct gasket_page_table_config *page_table_config,
-   struct device *device, struct pci_dev *pci_dev, bool has_dma_ops)
+   struct device *device, struct pci_dev *pci_dev)
 {
ulong bytes;
struct gasket_page_table *pg_tbl;
@@ -353,7 +347,6 @@ int gasket_page_table_init(
bar_data->virt_base[page_table_config->extended_reg]);
pg_tbl->device = device;
pg_tbl->pci_dev = pci_dev;
-   pg_tbl->dma_ops = has_dma_ops;
 
dev_dbg(device, "Page table initialized successfully\n");
 
@@ -759,33 +752,6 @@ static int gasket_map_extended_pages(
return 0;
 }
 
-/*
- * TODO: dma_map_page() is not plugged properly when running under qemu. i.e.
- * dma_ops are not set properly, which causes the kernel to assert.
- *
- * This temporary hack allows the driver to work on qemu, but need to be fixed:
- * - either manually set the dma_ops for the architecture (which incidentally
- * can't be done in an out-of-tree module) - or get qemu to fill the device 
tree
- * properly so as linux plug the proper dma_ops or so as the driver can detect
- * that it is runnig on qemu
- */
-static inline dma_addr_t _no_op_dma_map_page(
-   struct device *dev, struct page *page, size_t offset, size_t size,
-   enum dma_data_direction dir)
-{
-   /*
-* struct dma_map_ops *ops = get_dma_ops(dev);
-* dma_addr_t addr;
-*
-* kmemcheck_mark_initialized(page_address(page) + offset, size);
-* BUG_ON(!valid_dma_direction(dir));
-* addr = ops->map_page(dev, page, offset, size, dir, NULL);
-* debug_dma_map_page(dev, page, offset, size, dir, addr, false);
-*/
-
-   return page_to_phys(page);
-}
-
 /*
  * Get and map last level page table buffers.
  * @pg_tbl: Gasket page table pointer.
@@ -856,16 +822,9 @@ static int gasket_perform_mapping(
ptes[i].offset = offset;
 
/* Map the page into DMA space. */
-   if (pg_tbl->dma_ops) {
-   /* hook in to kernel map functions */
-   ptes[i].dma_addr = dma_map_page(pg_tbl->device,
-   page, 0, PAGE_SIZE, DMA_BIDIRECTIONAL);
-   } else {
-   ptes[i].dma_addr = _no_op_dma_map_page(
-   pg_tbl->device, page, 0, PAGE_SIZE,
-   DMA_BIDIRECTIONAL);
-   }
-
+   ptes[i].dma_addr =
+   dma_map_page(pg_tbl->device, page, 0, PAGE_SIZE,
+DMA_BIDIRECTIONAL);
dev_dbg(pg_tbl->device,
"%s i %d pte %p pfn %p -> mapped %llx\n",
__func__, i, [i],
@@ -1042,13 +1001,8 @@ static int gasket_alloc_extended_subtable(
}
 
/* Map the page into DMA space. */
-   if 

[PATCH 5/5] staging: gasket: core: hold reference on device kobj while in use

2018-07-27 Thread Todd Poynor
From: Todd Poynor 

Hold a reference on the struct device kobject while a pointer to that
device is in use by gasket.

Reported-by: Greg Kroah-Hartman 
Signed-off-by: Todd Poynor 
---
 drivers/staging/gasket/gasket_core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/staging/gasket/gasket_core.c 
b/drivers/staging/gasket/gasket_core.c
index 859a6df9e12df..1dc7273e38734 100644
--- a/drivers/staging/gasket/gasket_core.c
+++ b/drivers/staging/gasket/gasket_core.c
@@ -449,6 +449,7 @@ static int gasket_alloc_dev(
gasket_dev->dev_idx = dev_idx;
snprintf(gasket_dev->kobj_name, GASKET_NAME_MAX, "%s", kobj_name);
gasket_dev->dev = parent;
+   kobject_get(_dev->dev->kobj);
/* gasket_bar_data is uninitialized. */
gasket_dev->num_page_tables = driver_desc->num_page_tables;
/* max_page_table_size and *page table are uninit'ed */
@@ -487,7 +488,7 @@ static void gasket_free_dev(struct gasket_dev *gasket_dev)
mutex_lock(_desc->mutex);
internal_desc->devs[gasket_dev->dev_idx] = NULL;
mutex_unlock(_desc->mutex);
-
+   kobject_put(_dev->dev->kobj);
kfree(gasket_dev);
 }
 
-- 
2.18.0.345.g5c9ce644c3-goog



[PATCH 4/5] staging: gasket: page table: remove code for "no dma_ops"

2018-07-27 Thread Todd Poynor
From: Todd Poynor 

Remove code with TODOs on it for working around apparent problems
previously seen in a qemu environment where dma_ops was not set
correctly.  There is no user of this in the current code.

Signed-off-by: Todd Poynor 
---
 drivers/staging/gasket/gasket_core.c   |  2 +-
 drivers/staging/gasket/gasket_page_table.c | 58 +++---
 drivers/staging/gasket/gasket_page_table.h |  3 +-
 3 files changed, 8 insertions(+), 55 deletions(-)

diff --git a/drivers/staging/gasket/gasket_core.c 
b/drivers/staging/gasket/gasket_core.c
index f44805c38159b..859a6df9e12df 100644
--- a/drivers/staging/gasket/gasket_core.c
+++ b/drivers/staging/gasket/gasket_core.c
@@ -916,7 +916,7 @@ static int gasket_enable_dev(
_dev->bar_data[
driver_desc->page_table_bar_index],
_desc->page_table_configs[tbl_idx],
-   gasket_dev->dev, gasket_dev->pci_dev, true);
+   gasket_dev->dev, gasket_dev->pci_dev);
if (ret) {
dev_err(gasket_dev->dev,
"Couldn't init page table %d: %d\n",
diff --git a/drivers/staging/gasket/gasket_page_table.c 
b/drivers/staging/gasket/gasket_page_table.c
index 32f1c1e10c7e2..722839603f20d 100644
--- a/drivers/staging/gasket/gasket_page_table.c
+++ b/drivers/staging/gasket/gasket_page_table.c
@@ -212,12 +212,6 @@ struct gasket_page_table {
 * gasket_mmap function, so user_virt belongs in the driver anyhow.
 */
struct gasket_coherent_page_entry *coherent_pages;
-
-   /*
-* Whether the page table uses arch specific dma_ops or
-* whether the driver is supplying its own.
-*/
-   bool dma_ops;
 };
 
 /* Mapping declarations */
@@ -290,7 +284,7 @@ int gasket_page_table_init(
struct gasket_page_table **ppg_tbl,
const struct gasket_bar_data *bar_data,
const struct gasket_page_table_config *page_table_config,
-   struct device *device, struct pci_dev *pci_dev, bool has_dma_ops)
+   struct device *device, struct pci_dev *pci_dev)
 {
ulong bytes;
struct gasket_page_table *pg_tbl;
@@ -353,7 +347,6 @@ int gasket_page_table_init(
bar_data->virt_base[page_table_config->extended_reg]);
pg_tbl->device = device;
pg_tbl->pci_dev = pci_dev;
-   pg_tbl->dma_ops = has_dma_ops;
 
dev_dbg(device, "Page table initialized successfully\n");
 
@@ -759,33 +752,6 @@ static int gasket_map_extended_pages(
return 0;
 }
 
-/*
- * TODO: dma_map_page() is not plugged properly when running under qemu. i.e.
- * dma_ops are not set properly, which causes the kernel to assert.
- *
- * This temporary hack allows the driver to work on qemu, but need to be fixed:
- * - either manually set the dma_ops for the architecture (which incidentally
- * can't be done in an out-of-tree module) - or get qemu to fill the device 
tree
- * properly so as linux plug the proper dma_ops or so as the driver can detect
- * that it is runnig on qemu
- */
-static inline dma_addr_t _no_op_dma_map_page(
-   struct device *dev, struct page *page, size_t offset, size_t size,
-   enum dma_data_direction dir)
-{
-   /*
-* struct dma_map_ops *ops = get_dma_ops(dev);
-* dma_addr_t addr;
-*
-* kmemcheck_mark_initialized(page_address(page) + offset, size);
-* BUG_ON(!valid_dma_direction(dir));
-* addr = ops->map_page(dev, page, offset, size, dir, NULL);
-* debug_dma_map_page(dev, page, offset, size, dir, addr, false);
-*/
-
-   return page_to_phys(page);
-}
-
 /*
  * Get and map last level page table buffers.
  * @pg_tbl: Gasket page table pointer.
@@ -856,16 +822,9 @@ static int gasket_perform_mapping(
ptes[i].offset = offset;
 
/* Map the page into DMA space. */
-   if (pg_tbl->dma_ops) {
-   /* hook in to kernel map functions */
-   ptes[i].dma_addr = dma_map_page(pg_tbl->device,
-   page, 0, PAGE_SIZE, DMA_BIDIRECTIONAL);
-   } else {
-   ptes[i].dma_addr = _no_op_dma_map_page(
-   pg_tbl->device, page, 0, PAGE_SIZE,
-   DMA_BIDIRECTIONAL);
-   }
-
+   ptes[i].dma_addr =
+   dma_map_page(pg_tbl->device, page, 0, PAGE_SIZE,
+DMA_BIDIRECTIONAL);
dev_dbg(pg_tbl->device,
"%s i %d pte %p pfn %p -> mapped %llx\n",
__func__, i, [i],
@@ -1042,13 +1001,8 @@ static int gasket_alloc_extended_subtable(
}
 
/* Map the page into DMA space. */
-   if 

[PATCH 5/5] staging: gasket: core: hold reference on device kobj while in use

2018-07-27 Thread Todd Poynor
From: Todd Poynor 

Hold a reference on the struct device kobject while a pointer to that
device is in use by gasket.

Reported-by: Greg Kroah-Hartman 
Signed-off-by: Todd Poynor 
---
 drivers/staging/gasket/gasket_core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/staging/gasket/gasket_core.c 
b/drivers/staging/gasket/gasket_core.c
index 859a6df9e12df..1dc7273e38734 100644
--- a/drivers/staging/gasket/gasket_core.c
+++ b/drivers/staging/gasket/gasket_core.c
@@ -449,6 +449,7 @@ static int gasket_alloc_dev(
gasket_dev->dev_idx = dev_idx;
snprintf(gasket_dev->kobj_name, GASKET_NAME_MAX, "%s", kobj_name);
gasket_dev->dev = parent;
+   kobject_get(_dev->dev->kobj);
/* gasket_bar_data is uninitialized. */
gasket_dev->num_page_tables = driver_desc->num_page_tables;
/* max_page_table_size and *page table are uninit'ed */
@@ -487,7 +488,7 @@ static void gasket_free_dev(struct gasket_dev *gasket_dev)
mutex_lock(_desc->mutex);
internal_desc->devs[gasket_dev->dev_idx] = NULL;
mutex_unlock(_desc->mutex);
-
+   kobject_put(_dev->dev->kobj);
kfree(gasket_dev);
 }
 
-- 
2.18.0.345.g5c9ce644c3-goog



Compliment of the day to you Dear Friend.

2018-07-27 Thread Mrs.Amina.Kadi
 Compliment of the day to you Dear Friend.

Dear Friend.
 
  I am Mrs. Amina Kadi. am sending this brief letter to solicit your
partnership to transfer $5.5 million US Dollars. I shall send you
more information and procedures when I receive positive response from
you.

Mrs. Amina Kadi









Compliment of the day to you Dear Friend.

2018-07-27 Thread Mrs.Amina.Kadi
 Compliment of the day to you Dear Friend.

Dear Friend.
 
  I am Mrs. Amina Kadi. am sending this brief letter to solicit your
partnership to transfer $5.5 million US Dollars. I shall send you
more information and procedures when I receive positive response from
you.

Mrs. Amina Kadi









Re: [alsa-devel] [PATCH] ASoC: soc-pcm: Use delay set in pointer function

2018-07-27 Thread Agrawal, Akshu



On 7/27/2018 8:39 PM, Pierre-Louis Bossart wrote:
> On 7/27/18 5:13 AM, Akshu Agrawal wrote:
>> There are cases where a pointer function populates
>> runtime->delay, such as:
>> ./sound/pci/hda/hda_controller.c
>> ./sound/soc/intel/atom/sst-mfld-platform-pcm.c
>>
>> Also, in some cases cpu dai used is generic and the pcm
>> driver needs to set delay.
>>
>> This delay was getting lost and was overwritten by delays
>> from codec or cpu dai delay function if exposed.
> 
> Humm, yes the runtime->delay set in the .pointer function would be lost 
> without this change, but the delay would still be provided in the 
> followup call to .delay.
> With your change, the same delay will be accounted for twice?
> 

It will not be accounted twice because no driver which is setting
runtime->delay is defining .delay op for cpu_dai. Vice versa is also
true, the drivers which define .delay for cpu_dai don't set
runtime->delay. And I think this is expected from drivers else it would
be a bug from their side.

.delay for codec_dai anyway is different and has to be accounted for.

Thanks,
Akshu
>>
>> Signed-off-by: Akshu Agrawal 
>> ---
>>   sound/soc/soc-pcm.c | 5 -
>>   1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/sound/soc/soc-pcm.c b/sound/soc/soc-pcm.c
>> index 98be04b..b1a2bc2 100644
>> --- a/sound/soc/soc-pcm.c
>> +++ b/sound/soc/soc-pcm.c
>> @@ -1179,6 +1179,9 @@ static snd_pcm_uframes_t soc_pcm_pointer(struct 
>> snd_pcm_substream *substream)
>>  snd_pcm_sframes_t codec_delay = 0;
>>  int i;
>>   
>> +/* clearing the previous delay */
>> +runtime->delay = 0;
>> +
>>  for_each_rtdcom(rtd, rtdcom) {
>>  component = rtdcom->component;
>>   
>> @@ -1203,7 +1206,7 @@ static snd_pcm_uframes_t soc_pcm_pointer(struct 
>> snd_pcm_substream *substream)
>>  }
>>  delay += codec_delay;
>>   
>> -runtime->delay = delay;
>> +runtime->delay += delay;
>>   
>>  return offset;
>>   }
>>
> 


Re: [alsa-devel] [PATCH] ASoC: soc-pcm: Use delay set in pointer function

2018-07-27 Thread Agrawal, Akshu



On 7/27/2018 8:39 PM, Pierre-Louis Bossart wrote:
> On 7/27/18 5:13 AM, Akshu Agrawal wrote:
>> There are cases where a pointer function populates
>> runtime->delay, such as:
>> ./sound/pci/hda/hda_controller.c
>> ./sound/soc/intel/atom/sst-mfld-platform-pcm.c
>>
>> Also, in some cases cpu dai used is generic and the pcm
>> driver needs to set delay.
>>
>> This delay was getting lost and was overwritten by delays
>> from codec or cpu dai delay function if exposed.
> 
> Humm, yes the runtime->delay set in the .pointer function would be lost 
> without this change, but the delay would still be provided in the 
> followup call to .delay.
> With your change, the same delay will be accounted for twice?
> 

It will not be accounted twice because no driver which is setting
runtime->delay is defining .delay op for cpu_dai. Vice versa is also
true, the drivers which define .delay for cpu_dai don't set
runtime->delay. And I think this is expected from drivers else it would
be a bug from their side.

.delay for codec_dai anyway is different and has to be accounted for.

Thanks,
Akshu
>>
>> Signed-off-by: Akshu Agrawal 
>> ---
>>   sound/soc/soc-pcm.c | 5 -
>>   1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/sound/soc/soc-pcm.c b/sound/soc/soc-pcm.c
>> index 98be04b..b1a2bc2 100644
>> --- a/sound/soc/soc-pcm.c
>> +++ b/sound/soc/soc-pcm.c
>> @@ -1179,6 +1179,9 @@ static snd_pcm_uframes_t soc_pcm_pointer(struct 
>> snd_pcm_substream *substream)
>>  snd_pcm_sframes_t codec_delay = 0;
>>  int i;
>>   
>> +/* clearing the previous delay */
>> +runtime->delay = 0;
>> +
>>  for_each_rtdcom(rtd, rtdcom) {
>>  component = rtdcom->component;
>>   
>> @@ -1203,7 +1206,7 @@ static snd_pcm_uframes_t soc_pcm_pointer(struct 
>> snd_pcm_substream *substream)
>>  }
>>  delay += codec_delay;
>>   
>> -runtime->delay = delay;
>> +runtime->delay += delay;
>>   
>>  return offset;
>>   }
>>
> 


[PATCH URGENT for ftrace/core] lockdep: Fix error due to incorrect pointer return

2018-07-27 Thread Joel Fernandes
From: "Joel Fernandes (Google)" 

The 0-day bot caught an issue which all my tests missed (will add it
into my tests) where this_cpu_ptr is incorrectly returning the wrong
pointer from get_lock_stats. Fix it.

Fixes: f4ac253a8df0 ("lockdep: use this_cpu_ptr instead of get_cpu_var stats")
Signed-off-by: Joel Fernandes (Google) 
---
 kernel/locking/lockdep.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index a6a6b7eb4b82..03bfaeb9f4e6 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -249,7 +249,7 @@ void clear_lock_stats(struct lock_class *class)
 
 static struct lock_class_stats *get_lock_stats(struct lock_class *class)
 {
-   return _cpu_ptr(_lock_stats)[class - lock_classes];
+   return _cpu_ptr(cpu_lock_stats)[class - lock_classes];
 }
 
 static void lock_release_holdtime(struct held_lock *hlock)
-- 
2.18.0.345.g5c9ce644c3-goog



[PATCH URGENT for ftrace/core] lockdep: Fix error due to incorrect pointer return

2018-07-27 Thread Joel Fernandes
From: "Joel Fernandes (Google)" 

The 0-day bot caught an issue which all my tests missed (will add it
into my tests) where this_cpu_ptr is incorrectly returning the wrong
pointer from get_lock_stats. Fix it.

Fixes: f4ac253a8df0 ("lockdep: use this_cpu_ptr instead of get_cpu_var stats")
Signed-off-by: Joel Fernandes (Google) 
---
 kernel/locking/lockdep.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index a6a6b7eb4b82..03bfaeb9f4e6 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -249,7 +249,7 @@ void clear_lock_stats(struct lock_class *class)
 
 static struct lock_class_stats *get_lock_stats(struct lock_class *class)
 {
-   return _cpu_ptr(_lock_stats)[class - lock_classes];
+   return _cpu_ptr(cpu_lock_stats)[class - lock_classes];
 }
 
 static void lock_release_holdtime(struct held_lock *hlock)
-- 
2.18.0.345.g5c9ce644c3-goog



[PATCH] PM / devfreq: Generic cpufreq governor

2018-07-27 Thread Saravana Kannan
Many CPU architectures have caches that can scale independent of the CPUs.
Frequency scaling of the caches is necessary to make sure the cache is not
a performance bottleneck that leads to poor performance and power. The same
idea applies for RAM/DDR.

To achieve this, this patch adds a generic devfreq governor that can listen
to the frequency transitions of each CPU frequency domain and then adjusts
the frequency of the cache (or any devfreq device) based on the frequency
of the CPUs.

To decide the frequency of the device, the governor does one of the
following:

* Uses a CPU frequency to device frequency mapping table
  - Either one mapping table used for all CPU freq policies (typically used
for system with homogeneous cores/clusters that have the same OPPs.
  - One mapping table per CPU freq policy (typically used for ASMP systems
with heterogeneous CPUs with different OPPs)

OR

* Scales the device frequency in proportion to the CPU frequency. So, if
  the CPUs are running at their max frequency, the device runs at its max
  frequency.  If the CPUs are running at their min frequency, the device
  runs at its min frequency. And interpolated for frequencies in between.

Signed-off-by: Saravana Kannan 
---
 .../bindings/devfreq/devfreq-cpufreq.txt   |  53 ++
 drivers/devfreq/Kconfig|   8 +
 drivers/devfreq/Makefile   |   1 +
 drivers/devfreq/governor_cpufreq.c | 583 +
 4 files changed, 645 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/devfreq/devfreq-cpufreq.txt
 create mode 100644 drivers/devfreq/governor_cpufreq.c

diff --git a/Documentation/devicetree/bindings/devfreq/devfreq-cpufreq.txt 
b/Documentation/devicetree/bindings/devfreq/devfreq-cpufreq.txt
new file mode 100644
index 000..6537538
--- /dev/null
+++ b/Documentation/devicetree/bindings/devfreq/devfreq-cpufreq.txt
@@ -0,0 +1,53 @@
+Devfreq CPUfreq governor
+
+devfreq-cpufreq is a parent device that contains one or more child devices.
+Each child device provides CPU frequency to device frequency mapping for a
+specific device. Examples of devices that could use this are: DDR, cache and
+CCI.
+
+Parent device name shall be "devfreq-cpufreq".
+
+Required child device properties:
+- cpu-to-dev-map, or cpu-to-dev-map-:
+   A list of tuples where each tuple consists of a
+   CPU frequency (KHz) and the corresponding device
+   frequency. CPU frequencies not listed in the table
+   will use the device frequency that corresponds to the
+   next rounded up CPU frequency.
+   Use "cpu-to-dev-map" if all CPUs in the system should
+   share same mapping.
+   Use cpu-to-dev-map- to describe different
+   mappings for different CPUs. The property should be
+   listed only for the first CPU if multiple CPUs are
+   synchronous.
+- target-dev:  Phandle to device that this mapping applies to.
+
+Example:
+   devfreq-cpufreq {
+   cpubw-cpufreq {
+   target-dev = <>;
+   cpu-to-dev-map =
+   <  30  1144 >,
+   <  422400  2288 >,
+   <  652800  3051 >,
+   <  883200  5996 >,
+   < 1190400  8056 >,
+   < 1497600 10101 >,
+   < 1728000 12145 >,
+   < 2649600 16250 >;
+   };
+
+   cache-cpufreq {
+   target-dev = <>;
+   cpu-to-dev-map =
+   <  30  30 >,
+   <  422400  422400 >,
+   <  652800  499200 >,
+   <  883200  576000 >,
+   <  96  96 >,
+   < 1497600 1036800 >,
+   < 1574400 1574400 >,
+   < 1728000 1651200 >,
+   < 2649600 1728000 >;
+   };
+   };
diff --git a/drivers/devfreq/Kconfig b/drivers/devfreq/Kconfig
index 8503018..5eeb33f 100644
--- a/drivers/devfreq/Kconfig
+++ b/drivers/devfreq/Kconfig
@@ -73,6 +73,14 @@ config DEVFREQ_GOV_PASSIVE
  through sysfs entries. The passive governor recommends that
  devfreq device uses the OPP table to get the frequency/voltage.
 
+config DEVFREQ_GOV_CPUFREQ
+   tristate "CPUfreq"
+   depends on CPU_FREQ
+   help
+ Chooses frequency based on the online CPUs' current frequency and a
+ CPU frequency to device frequency mapping table(s). This governor
+ can be useful for controlling devices such 

[PATCH] PM / devfreq: Generic cpufreq governor

2018-07-27 Thread Saravana Kannan
Many CPU architectures have caches that can scale independent of the CPUs.
Frequency scaling of the caches is necessary to make sure the cache is not
a performance bottleneck that leads to poor performance and power. The same
idea applies for RAM/DDR.

To achieve this, this patch adds a generic devfreq governor that can listen
to the frequency transitions of each CPU frequency domain and then adjusts
the frequency of the cache (or any devfreq device) based on the frequency
of the CPUs.

To decide the frequency of the device, the governor does one of the
following:

* Uses a CPU frequency to device frequency mapping table
  - Either one mapping table used for all CPU freq policies (typically used
for system with homogeneous cores/clusters that have the same OPPs.
  - One mapping table per CPU freq policy (typically used for ASMP systems
with heterogeneous CPUs with different OPPs)

OR

* Scales the device frequency in proportion to the CPU frequency. So, if
  the CPUs are running at their max frequency, the device runs at its max
  frequency.  If the CPUs are running at their min frequency, the device
  runs at its min frequency. And interpolated for frequencies in between.

Signed-off-by: Saravana Kannan 
---
 .../bindings/devfreq/devfreq-cpufreq.txt   |  53 ++
 drivers/devfreq/Kconfig|   8 +
 drivers/devfreq/Makefile   |   1 +
 drivers/devfreq/governor_cpufreq.c | 583 +
 4 files changed, 645 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/devfreq/devfreq-cpufreq.txt
 create mode 100644 drivers/devfreq/governor_cpufreq.c

diff --git a/Documentation/devicetree/bindings/devfreq/devfreq-cpufreq.txt 
b/Documentation/devicetree/bindings/devfreq/devfreq-cpufreq.txt
new file mode 100644
index 000..6537538
--- /dev/null
+++ b/Documentation/devicetree/bindings/devfreq/devfreq-cpufreq.txt
@@ -0,0 +1,53 @@
+Devfreq CPUfreq governor
+
+devfreq-cpufreq is a parent device that contains one or more child devices.
+Each child device provides CPU frequency to device frequency mapping for a
+specific device. Examples of devices that could use this are: DDR, cache and
+CCI.
+
+Parent device name shall be "devfreq-cpufreq".
+
+Required child device properties:
+- cpu-to-dev-map, or cpu-to-dev-map-:
+   A list of tuples where each tuple consists of a
+   CPU frequency (KHz) and the corresponding device
+   frequency. CPU frequencies not listed in the table
+   will use the device frequency that corresponds to the
+   next rounded up CPU frequency.
+   Use "cpu-to-dev-map" if all CPUs in the system should
+   share same mapping.
+   Use cpu-to-dev-map- to describe different
+   mappings for different CPUs. The property should be
+   listed only for the first CPU if multiple CPUs are
+   synchronous.
+- target-dev:  Phandle to device that this mapping applies to.
+
+Example:
+   devfreq-cpufreq {
+   cpubw-cpufreq {
+   target-dev = <>;
+   cpu-to-dev-map =
+   <  30  1144 >,
+   <  422400  2288 >,
+   <  652800  3051 >,
+   <  883200  5996 >,
+   < 1190400  8056 >,
+   < 1497600 10101 >,
+   < 1728000 12145 >,
+   < 2649600 16250 >;
+   };
+
+   cache-cpufreq {
+   target-dev = <>;
+   cpu-to-dev-map =
+   <  30  30 >,
+   <  422400  422400 >,
+   <  652800  499200 >,
+   <  883200  576000 >,
+   <  96  96 >,
+   < 1497600 1036800 >,
+   < 1574400 1574400 >,
+   < 1728000 1651200 >,
+   < 2649600 1728000 >;
+   };
+   };
diff --git a/drivers/devfreq/Kconfig b/drivers/devfreq/Kconfig
index 8503018..5eeb33f 100644
--- a/drivers/devfreq/Kconfig
+++ b/drivers/devfreq/Kconfig
@@ -73,6 +73,14 @@ config DEVFREQ_GOV_PASSIVE
  through sysfs entries. The passive governor recommends that
  devfreq device uses the OPP table to get the frequency/voltage.
 
+config DEVFREQ_GOV_CPUFREQ
+   tristate "CPUfreq"
+   depends on CPU_FREQ
+   help
+ Chooses frequency based on the online CPUs' current frequency and a
+ CPU frequency to device frequency mapping table(s). This governor
+ can be useful for controlling devices such 

Re: [GIT PULL] ACPI fix for v4.18-rc7

2018-07-27 Thread Len Brown
>   ACPI Warning: \_SB.IETM._ART: Return Package type mismatch ..

_ART is the AML method for "Active Cooling Relationship Table".
iIt tells the system how fans are related to temperature sensors.

It is likely that Windows uses DPTF on this platform, and since DPTF
is not using _ART, Linux may be pointing out a vestigial _ART.

Anyway, if your system doesn't handle being hot very well -- this error
message is related to that capability.

cheers,
-Len


Re: [GIT PULL] ACPI fix for v4.18-rc7

2018-07-27 Thread Len Brown
>   ACPI Warning: \_SB.IETM._ART: Return Package type mismatch ..

_ART is the AML method for "Active Cooling Relationship Table".
iIt tells the system how fans are related to temperature sensors.

It is likely that Windows uses DPTF on this platform, and since DPTF
is not using _ART, Linux may be pointing out a vestigial _ART.

Anyway, if your system doesn't handle being hot very well -- this error
message is related to that capability.

cheers,
-Len


Hello from Lisa

2018-07-27 Thread Lisa Johnson
Hello Dear, I am Miss Lisa. I have very important thing to discuss
with you. please, this information is very vital. Contact me with my
private email so we can talk ( lisajoh...@hotmail.com ) I find your
email when I was searching for some that will understand me.

Lisa


Hello from Lisa

2018-07-27 Thread Lisa Johnson
Hello Dear, I am Miss Lisa. I have very important thing to discuss
with you. please, this information is very vital. Contact me with my
private email so we can talk ( lisajoh...@hotmail.com ) I find your
email when I was searching for some that will understand me.

Lisa


Re: [PATCH 13/38] tomoyo: Implement security hooks for the new mount API [ver #10]

2018-07-27 Thread Tetsuo Handa
On 2018/07/28 2:32, David Howells wrote:
> Implement the security hook to check the creation of a new mountpoint for
> Tomoyo.
> 
> As far as I can tell, Tomoyo doesn't make use of the mount data or parse
> any mount options, so I haven't implemented any of the fs_context hooks for
> it.
> 
> Signed-off-by: David Howells 
> cc: Tetsuo Handa 
> cc: tomoyo-dev...@lists.sourceforge.jp
> cc: linux-security-mod...@vger.kernel.org

Would you provide examples of each possible combination as a C program?
For example, if one mount point from multiple sources with different
options are possible, please describe such pattern using syscall so that
LSM modules can run it to see whether they are working as expected. 


Re: [PATCH 13/38] tomoyo: Implement security hooks for the new mount API [ver #10]

2018-07-27 Thread Tetsuo Handa
On 2018/07/28 2:32, David Howells wrote:
> Implement the security hook to check the creation of a new mountpoint for
> Tomoyo.
> 
> As far as I can tell, Tomoyo doesn't make use of the mount data or parse
> any mount options, so I haven't implemented any of the fs_context hooks for
> it.
> 
> Signed-off-by: David Howells 
> cc: Tetsuo Handa 
> cc: tomoyo-dev...@lists.sourceforge.jp
> cc: linux-security-mod...@vger.kernel.org

Would you provide examples of each possible combination as a C program?
For example, if one mount point from multiple sources with different
options are possible, please describe such pattern using syscall so that
LSM modules can run it to see whether they are working as expected. 


Re: [PATCH v2 3/3] selftests/ftrace: Fix kprobe string testcase to not probe notrace function

2018-07-27 Thread Masami Hiramatsu
On Fri, 27 Jul 2018 10:31:45 -0400
Steven Rostedt  wrote:

> On Fri, 27 Jul 2018 08:26:40 -0600
> Shuah Khan  wrote:
> 
> > Masami,
> > 
> > The patch isn't in my Inbox. Could you please send it.
> 
> You don't subscribe to LKML? ;-)
> 
> I just bounced it to you.
> 
> -- Steve

Sorry Shuah and Steve,

I've fixed a problem in this patch. I've sent v3 patch to you.

Thank you,


-- 
Masami Hiramatsu 


Re: [PATCH v2 3/3] selftests/ftrace: Fix kprobe string testcase to not probe notrace function

2018-07-27 Thread Masami Hiramatsu
On Fri, 27 Jul 2018 10:31:45 -0400
Steven Rostedt  wrote:

> On Fri, 27 Jul 2018 08:26:40 -0600
> Shuah Khan  wrote:
> 
> > Masami,
> > 
> > The patch isn't in my Inbox. Could you please send it.
> 
> You don't subscribe to LKML? ;-)
> 
> I just bounced it to you.
> 
> -- Steve

Sorry Shuah and Steve,

I've fixed a problem in this patch. I've sent v3 patch to you.

Thank you,


-- 
Masami Hiramatsu 


[PATCH v3 3/3] selftests/ftrace: Fix kprobe string testcase to not probe notrace function

2018-07-27 Thread Masami Hiramatsu
Fix kprobe string argument testcase to not probe notrace
function. Instead, it probes tracefs function which must
be available with ftrace.

Signed-off-by: Masami Hiramatsu 
---
 Changes in v3:
  - Fix probepoint testcase too.
---
 .../ftrace/test.d/kprobe/kprobe_args_string.tc |   30 
 .../selftests/ftrace/test.d/kprobe/probepoint.tc   |2 +
 2 files changed, 13 insertions(+), 19 deletions(-)

diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc 
b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc
index a0002563e9ee..1ad70cdaf442 100644
--- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc
+++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc
@@ -9,28 +9,22 @@ echo > kprobe_events
 
 case `uname -m` in
 x86_64)
-  ARG2=%si
-  OFFS=8
+  ARG1=%di
 ;;
 i[3456]86)
-  ARG2=%cx
-  OFFS=4
+  ARG1=%ax
 ;;
 aarch64)
-  ARG2=%x1
-  OFFS=8
+  ARG1=%x0
 ;;
 arm*)
-  ARG2=%r1
-  OFFS=4
+  ARG1=%r0
 ;;
 ppc64*)
-  ARG2=%r4
-  OFFS=8
+  ARG1=%r3
 ;;
 ppc*)
-  ARG2=%r4
-  OFFS=4
+  ARG1=%r3
 ;;
 *)
   echo "Please implement other architecture here"
@@ -38,17 +32,17 @@ ppc*)
 esac
 
 : "Test get argument (1)"
-echo "p:testprobe create_trace_kprobe arg1=+0(+0(${ARG2})):string" > 
kprobe_events
+echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):string" > kprobe_events
 echo 1 > events/kprobes/testprobe/enable
-! echo test >> kprobe_events
-tail -n 1 trace | grep -qe "testprobe.* arg1=\"test\""
+echo "p:test _do_fork" >> kprobe_events
+grep -qe "testprobe.* arg1=\"test\"" trace
 
 echo 0 > events/kprobes/testprobe/enable
 : "Test get argument (2)"
-echo "p:testprobe create_trace_kprobe arg1=+0(+0(${ARG2})):string 
arg2=+0(+${OFFS}(${ARG2})):string" > kprobe_events
+echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):string 
arg2=+0(${ARG1}):string" > kprobe_events
 echo 1 > events/kprobes/testprobe/enable
-! echo test1 test2 >> kprobe_events
-tail -n 1 trace | grep -qe "testprobe.* arg1=\"test1\" arg2=\"test2\""
+echo "p:test _do_fork" >> kprobe_events
+grep -qe "testprobe.* arg1=\"test\" arg2=\"test\"" trace
 
 echo 0 > events/enable
 echo > kprobe_events
diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/probepoint.tc 
b/tools/testing/selftests/ftrace/test.d/kprobe/probepoint.tc
index 4fda01a08da4..519d2763f5d2 100644
--- a/tools/testing/selftests/ftrace/test.d/kprobe/probepoint.tc
+++ b/tools/testing/selftests/ftrace/test.d/kprobe/probepoint.tc
@@ -4,7 +4,7 @@
 
 [ -f kprobe_events ] || exit_unsupported # this is configurable
 
-TARGET_FUNC=create_trace_kprobe
+TARGET_FUNC=tracefs_create_dir
 
 dec_addr() { # hexaddr
   printf "%d" "0x"`echo $1 | tail -c 8`



[PATCH v3 3/3] selftests/ftrace: Fix kprobe string testcase to not probe notrace function

2018-07-27 Thread Masami Hiramatsu
Fix kprobe string argument testcase to not probe notrace
function. Instead, it probes tracefs function which must
be available with ftrace.

Signed-off-by: Masami Hiramatsu 
---
 Changes in v3:
  - Fix probepoint testcase too.
---
 .../ftrace/test.d/kprobe/kprobe_args_string.tc |   30 
 .../selftests/ftrace/test.d/kprobe/probepoint.tc   |2 +
 2 files changed, 13 insertions(+), 19 deletions(-)

diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc 
b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc
index a0002563e9ee..1ad70cdaf442 100644
--- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc
+++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc
@@ -9,28 +9,22 @@ echo > kprobe_events
 
 case `uname -m` in
 x86_64)
-  ARG2=%si
-  OFFS=8
+  ARG1=%di
 ;;
 i[3456]86)
-  ARG2=%cx
-  OFFS=4
+  ARG1=%ax
 ;;
 aarch64)
-  ARG2=%x1
-  OFFS=8
+  ARG1=%x0
 ;;
 arm*)
-  ARG2=%r1
-  OFFS=4
+  ARG1=%r0
 ;;
 ppc64*)
-  ARG2=%r4
-  OFFS=8
+  ARG1=%r3
 ;;
 ppc*)
-  ARG2=%r4
-  OFFS=4
+  ARG1=%r3
 ;;
 *)
   echo "Please implement other architecture here"
@@ -38,17 +32,17 @@ ppc*)
 esac
 
 : "Test get argument (1)"
-echo "p:testprobe create_trace_kprobe arg1=+0(+0(${ARG2})):string" > 
kprobe_events
+echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):string" > kprobe_events
 echo 1 > events/kprobes/testprobe/enable
-! echo test >> kprobe_events
-tail -n 1 trace | grep -qe "testprobe.* arg1=\"test\""
+echo "p:test _do_fork" >> kprobe_events
+grep -qe "testprobe.* arg1=\"test\"" trace
 
 echo 0 > events/kprobes/testprobe/enable
 : "Test get argument (2)"
-echo "p:testprobe create_trace_kprobe arg1=+0(+0(${ARG2})):string 
arg2=+0(+${OFFS}(${ARG2})):string" > kprobe_events
+echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):string 
arg2=+0(${ARG1}):string" > kprobe_events
 echo 1 > events/kprobes/testprobe/enable
-! echo test1 test2 >> kprobe_events
-tail -n 1 trace | grep -qe "testprobe.* arg1=\"test1\" arg2=\"test2\""
+echo "p:test _do_fork" >> kprobe_events
+grep -qe "testprobe.* arg1=\"test\" arg2=\"test\"" trace
 
 echo 0 > events/enable
 echo > kprobe_events
diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/probepoint.tc 
b/tools/testing/selftests/ftrace/test.d/kprobe/probepoint.tc
index 4fda01a08da4..519d2763f5d2 100644
--- a/tools/testing/selftests/ftrace/test.d/kprobe/probepoint.tc
+++ b/tools/testing/selftests/ftrace/test.d/kprobe/probepoint.tc
@@ -4,7 +4,7 @@
 
 [ -f kprobe_events ] || exit_unsupported # this is configurable
 
-TARGET_FUNC=create_trace_kprobe
+TARGET_FUNC=tracefs_create_dir
 
 dec_addr() { # hexaddr
   printf "%d" "0x"`echo $1 | tail -c 8`



[PATCH v3 1/3] tracing: kprobes: Prohibit probing on notrace function

2018-07-27 Thread Masami Hiramatsu
Prohibit kprobe-events probing on notrace function.
Since probing on the notrace function can cause recursive
event call. In most case those are just skipped, but
in some case it falls into infinit recursive call.

This protection can be disabled by the kconfig
CONFIG_KPROBE_EVENTS_ON_NOTRACE=y, but it is highly
recommended to keep it "n" for normal kernel.

Signed-off-by: Masami Hiramatsu 
Tested-by: Francis Deslauriers 
---
  Changes in v2
   - Add CONFIG_KPROBE_EVENTS_ON_NOTRACE kconfig for knocking down
 the protection.
  Changes in v3
   - Fix to check raw-address (no symbol) probe point correctly.
---
 kernel/trace/Kconfig|   18 +
 kernel/trace/trace_kprobe.c |   46 +++
 2 files changed, 55 insertions(+), 9 deletions(-)

diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index dcc0166d1997..24d5a58467a3 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -456,6 +456,24 @@ config KPROBE_EVENTS
  This option is also required by perf-probe subcommand of perf tools.
  If you want to use perf tools, this option is strongly recommended.
 
+config KPROBE_EVENTS_ON_NOTRACE
+   bool "Do NOT protect notrace function from kprobe events"
+   depends on KPROBE_EVENTS
+   default n
+   help
+ This is only for the developers who want to debug ftrace itself
+ using kprobe events.
+
+ Usually, ftrace related functions are protected from kprobe-events
+ to prevent an infinit recursion or any unexpected execution path
+ which leads to a kernel crash.
+
+ This option disables such protection and allows you to put kprobe
+ events on ftrace functions for debugging ftrace by itself.
+ Note that this might let you shoot yourself in the foot.
+
+ If unsure, say N.
+
 config UPROBE_EVENTS
bool "Enable uprobes-based dynamic events"
depends on ARCH_SUPPORTS_UPROBES
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 27ace4513c43..2bf75af92f16 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -87,6 +87,21 @@ static nokprobe_inline unsigned long 
trace_kprobe_nhit(struct trace_kprobe *tk)
return nhit;
 }
 
+static nokprobe_inline
+unsigned long trace_kprobe_address(struct trace_kprobe *tk)
+{
+   unsigned long addr;
+
+   if (tk->symbol) {
+   addr = (unsigned long)
+   kallsyms_lookup_name(trace_kprobe_symbol(tk));
+   addr += tk->rp.kp.offset;
+   } else {
+   addr = (unsigned long)tk->rp.kp.addr;
+   }
+   return addr;
+}
+
 bool trace_kprobe_on_func_entry(struct trace_event_call *call)
 {
struct trace_kprobe *tk = (struct trace_kprobe *)call->data;
@@ -99,16 +114,8 @@ bool trace_kprobe_on_func_entry(struct trace_event_call 
*call)
 bool trace_kprobe_error_injectable(struct trace_event_call *call)
 {
struct trace_kprobe *tk = (struct trace_kprobe *)call->data;
-   unsigned long addr;
 
-   if (tk->symbol) {
-   addr = (unsigned long)
-   kallsyms_lookup_name(trace_kprobe_symbol(tk));
-   addr += tk->rp.kp.offset;
-   } else {
-   addr = (unsigned long)tk->rp.kp.addr;
-   }
-   return within_error_injection_list(addr);
+   return within_error_injection_list(trace_kprobe_address(tk));
 }
 
 static int register_kprobe_event(struct trace_kprobe *tk);
@@ -496,6 +503,21 @@ disable_trace_kprobe(struct trace_kprobe *tk, struct 
trace_event_file *file)
return ret;
 }
 
+#ifdef CONFIG_KPROBE_EVENTS_ON_NOTRACE
+#define within_notrace_func(tk)(false)
+#else
+static bool within_notrace_func(struct trace_kprobe *tk)
+{
+   unsigned long offset, size, addr;
+
+   addr = trace_kprobe_address(tk);
+   if (!kallsyms_lookup_size_offset(addr, , ))
+   return true;/* Out of range. */
+
+   return !ftrace_location_range(addr - offset, addr - offset + size);
+}
+#endif
+
 /* Internal register function - just handle k*probes and flags */
 static int __register_trace_kprobe(struct trace_kprobe *tk)
 {
@@ -504,6 +526,12 @@ static int __register_trace_kprobe(struct trace_kprobe *tk)
if (trace_probe_is_registered(>tp))
return -EINVAL;
 
+   if (within_notrace_func(tk)) {
+   pr_warn("Could not probe notrace function %s\n",
+   trace_kprobe_symbol(tk));
+   return -EINVAL;
+   }
+
for (i = 0; i < tk->tp.nr_args; i++)
traceprobe_update_arg(>tp.args[i]);
 



[PATCH v3 1/3] tracing: kprobes: Prohibit probing on notrace function

2018-07-27 Thread Masami Hiramatsu
Prohibit kprobe-events probing on notrace function.
Since probing on the notrace function can cause recursive
event call. In most case those are just skipped, but
in some case it falls into infinit recursive call.

This protection can be disabled by the kconfig
CONFIG_KPROBE_EVENTS_ON_NOTRACE=y, but it is highly
recommended to keep it "n" for normal kernel.

Signed-off-by: Masami Hiramatsu 
Tested-by: Francis Deslauriers 
---
  Changes in v2
   - Add CONFIG_KPROBE_EVENTS_ON_NOTRACE kconfig for knocking down
 the protection.
  Changes in v3
   - Fix to check raw-address (no symbol) probe point correctly.
---
 kernel/trace/Kconfig|   18 +
 kernel/trace/trace_kprobe.c |   46 +++
 2 files changed, 55 insertions(+), 9 deletions(-)

diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index dcc0166d1997..24d5a58467a3 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -456,6 +456,24 @@ config KPROBE_EVENTS
  This option is also required by perf-probe subcommand of perf tools.
  If you want to use perf tools, this option is strongly recommended.
 
+config KPROBE_EVENTS_ON_NOTRACE
+   bool "Do NOT protect notrace function from kprobe events"
+   depends on KPROBE_EVENTS
+   default n
+   help
+ This is only for the developers who want to debug ftrace itself
+ using kprobe events.
+
+ Usually, ftrace related functions are protected from kprobe-events
+ to prevent an infinit recursion or any unexpected execution path
+ which leads to a kernel crash.
+
+ This option disables such protection and allows you to put kprobe
+ events on ftrace functions for debugging ftrace by itself.
+ Note that this might let you shoot yourself in the foot.
+
+ If unsure, say N.
+
 config UPROBE_EVENTS
bool "Enable uprobes-based dynamic events"
depends on ARCH_SUPPORTS_UPROBES
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 27ace4513c43..2bf75af92f16 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -87,6 +87,21 @@ static nokprobe_inline unsigned long 
trace_kprobe_nhit(struct trace_kprobe *tk)
return nhit;
 }
 
+static nokprobe_inline
+unsigned long trace_kprobe_address(struct trace_kprobe *tk)
+{
+   unsigned long addr;
+
+   if (tk->symbol) {
+   addr = (unsigned long)
+   kallsyms_lookup_name(trace_kprobe_symbol(tk));
+   addr += tk->rp.kp.offset;
+   } else {
+   addr = (unsigned long)tk->rp.kp.addr;
+   }
+   return addr;
+}
+
 bool trace_kprobe_on_func_entry(struct trace_event_call *call)
 {
struct trace_kprobe *tk = (struct trace_kprobe *)call->data;
@@ -99,16 +114,8 @@ bool trace_kprobe_on_func_entry(struct trace_event_call 
*call)
 bool trace_kprobe_error_injectable(struct trace_event_call *call)
 {
struct trace_kprobe *tk = (struct trace_kprobe *)call->data;
-   unsigned long addr;
 
-   if (tk->symbol) {
-   addr = (unsigned long)
-   kallsyms_lookup_name(trace_kprobe_symbol(tk));
-   addr += tk->rp.kp.offset;
-   } else {
-   addr = (unsigned long)tk->rp.kp.addr;
-   }
-   return within_error_injection_list(addr);
+   return within_error_injection_list(trace_kprobe_address(tk));
 }
 
 static int register_kprobe_event(struct trace_kprobe *tk);
@@ -496,6 +503,21 @@ disable_trace_kprobe(struct trace_kprobe *tk, struct 
trace_event_file *file)
return ret;
 }
 
+#ifdef CONFIG_KPROBE_EVENTS_ON_NOTRACE
+#define within_notrace_func(tk)(false)
+#else
+static bool within_notrace_func(struct trace_kprobe *tk)
+{
+   unsigned long offset, size, addr;
+
+   addr = trace_kprobe_address(tk);
+   if (!kallsyms_lookup_size_offset(addr, , ))
+   return true;/* Out of range. */
+
+   return !ftrace_location_range(addr - offset, addr - offset + size);
+}
+#endif
+
 /* Internal register function - just handle k*probes and flags */
 static int __register_trace_kprobe(struct trace_kprobe *tk)
 {
@@ -504,6 +526,12 @@ static int __register_trace_kprobe(struct trace_kprobe *tk)
if (trace_probe_is_registered(>tp))
return -EINVAL;
 
+   if (within_notrace_func(tk)) {
+   pr_warn("Could not probe notrace function %s\n",
+   trace_kprobe_symbol(tk));
+   return -EINVAL;
+   }
+
for (i = 0; i < tk->tp.nr_args; i++)
traceprobe_update_arg(>tp.args[i]);
 



[PATCH v3 2/3] selftest/ftrace: Move kprobe selftest function to separate compile unit

2018-07-27 Thread Masami Hiramatsu
From: Francis Deslauriers 

Move selftest function to its own compile unit so it can be compiled
with the ftrace cflags (CC_FLAGS_FTRACE) allowing it to be probed
during the ftrace startup tests.

Signed-off-by: Francis Deslauriers 
Signed-off-by: Masami Hiramatsu 
---
 kernel/trace/Makefile|5 +
 kernel/trace/trace_kprobe.c  |   12 +---
 kernel/trace/trace_kprobe_selftest.c |   10 ++
 kernel/trace/trace_kprobe_selftest.h |7 +++
 4 files changed, 23 insertions(+), 11 deletions(-)
 create mode 100644 kernel/trace/trace_kprobe_selftest.c
 create mode 100644 kernel/trace/trace_kprobe_selftest.h

diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index e2538c7638d4..e38771eccb2f 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -13,6 +13,11 @@ obj-y += trace_selftest_dynamic.o
 endif
 endif
 
+ifdef CONFIG_FTRACE_STARTUP_TEST
+CFLAGS_trace_kprobe_selftest.o = $(CC_FLAGS_FTRACE)
+obj-$(CONFIG_KPROBE_EVENTS) += trace_kprobe_selftest.o
+endif
+
 # If unlikely tracing is enabled, do not trace these files
 ifdef CONFIG_TRACING_BRANCHES
 KBUILD_CFLAGS += -DDISABLE_BRANCH_PROFILING
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 2bf75af92f16..88b49396ba08 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 
+#include "trace_kprobe_selftest.h"
 #include "trace_probe.h"
 
 #define KPROBE_EVENT_SYSTEM "kprobes"
@@ -1578,17 +1579,6 @@ fs_initcall(init_kprobe_trace);
 
 
 #ifdef CONFIG_FTRACE_STARTUP_TEST
-/*
- * The "__used" keeps gcc from removing the function symbol
- * from the kallsyms table. 'noinline' makes sure that there
- * isn't an inlined version used by the test method below
- */
-static __used __init noinline int
-kprobe_trace_selftest_target(int a1, int a2, int a3, int a4, int a5, int a6)
-{
-   return a1 + a2 + a3 + a4 + a5 + a6;
-}
-
 static __init struct trace_event_file *
 find_trace_probe_file(struct trace_kprobe *tk, struct trace_array *tr)
 {
diff --git a/kernel/trace/trace_kprobe_selftest.c 
b/kernel/trace/trace_kprobe_selftest.c
new file mode 100644
index ..16548ee4c8c6
--- /dev/null
+++ b/kernel/trace/trace_kprobe_selftest.c
@@ -0,0 +1,10 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Function used during the kprobe self test. This function is in a separate
+ * compile unit so it can be compile with CC_FLAGS_FTRACE to ensure that it
+ * can be probed by the selftests.
+ */
+int kprobe_trace_selftest_target(int a1, int a2, int a3, int a4, int a5, int 
a6)
+{
+   return a1 + a2 + a3 + a4 + a5 + a6;
+}
diff --git a/kernel/trace/trace_kprobe_selftest.h 
b/kernel/trace/trace_kprobe_selftest.h
new file mode 100644
index ..4e10ec41c013
--- /dev/null
+++ b/kernel/trace/trace_kprobe_selftest.h
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Function used during the kprobe self test. This function is in a separate
+ * compile unit so it can be compile with CC_FLAGS_FTRACE to ensure that it
+ * can be probed by the selftests.
+ */
+int kprobe_trace_selftest_target(int a1, int a2, int a3, int a4, int a5, int 
a6);



[PATCH v3 2/3] selftest/ftrace: Move kprobe selftest function to separate compile unit

2018-07-27 Thread Masami Hiramatsu
From: Francis Deslauriers 

Move selftest function to its own compile unit so it can be compiled
with the ftrace cflags (CC_FLAGS_FTRACE) allowing it to be probed
during the ftrace startup tests.

Signed-off-by: Francis Deslauriers 
Signed-off-by: Masami Hiramatsu 
---
 kernel/trace/Makefile|5 +
 kernel/trace/trace_kprobe.c  |   12 +---
 kernel/trace/trace_kprobe_selftest.c |   10 ++
 kernel/trace/trace_kprobe_selftest.h |7 +++
 4 files changed, 23 insertions(+), 11 deletions(-)
 create mode 100644 kernel/trace/trace_kprobe_selftest.c
 create mode 100644 kernel/trace/trace_kprobe_selftest.h

diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index e2538c7638d4..e38771eccb2f 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -13,6 +13,11 @@ obj-y += trace_selftest_dynamic.o
 endif
 endif
 
+ifdef CONFIG_FTRACE_STARTUP_TEST
+CFLAGS_trace_kprobe_selftest.o = $(CC_FLAGS_FTRACE)
+obj-$(CONFIG_KPROBE_EVENTS) += trace_kprobe_selftest.o
+endif
+
 # If unlikely tracing is enabled, do not trace these files
 ifdef CONFIG_TRACING_BRANCHES
 KBUILD_CFLAGS += -DDISABLE_BRANCH_PROFILING
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 2bf75af92f16..88b49396ba08 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 
+#include "trace_kprobe_selftest.h"
 #include "trace_probe.h"
 
 #define KPROBE_EVENT_SYSTEM "kprobes"
@@ -1578,17 +1579,6 @@ fs_initcall(init_kprobe_trace);
 
 
 #ifdef CONFIG_FTRACE_STARTUP_TEST
-/*
- * The "__used" keeps gcc from removing the function symbol
- * from the kallsyms table. 'noinline' makes sure that there
- * isn't an inlined version used by the test method below
- */
-static __used __init noinline int
-kprobe_trace_selftest_target(int a1, int a2, int a3, int a4, int a5, int a6)
-{
-   return a1 + a2 + a3 + a4 + a5 + a6;
-}
-
 static __init struct trace_event_file *
 find_trace_probe_file(struct trace_kprobe *tk, struct trace_array *tr)
 {
diff --git a/kernel/trace/trace_kprobe_selftest.c 
b/kernel/trace/trace_kprobe_selftest.c
new file mode 100644
index ..16548ee4c8c6
--- /dev/null
+++ b/kernel/trace/trace_kprobe_selftest.c
@@ -0,0 +1,10 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Function used during the kprobe self test. This function is in a separate
+ * compile unit so it can be compile with CC_FLAGS_FTRACE to ensure that it
+ * can be probed by the selftests.
+ */
+int kprobe_trace_selftest_target(int a1, int a2, int a3, int a4, int a5, int 
a6)
+{
+   return a1 + a2 + a3 + a4 + a5 + a6;
+}
diff --git a/kernel/trace/trace_kprobe_selftest.h 
b/kernel/trace/trace_kprobe_selftest.h
new file mode 100644
index ..4e10ec41c013
--- /dev/null
+++ b/kernel/trace/trace_kprobe_selftest.h
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Function used during the kprobe self test. This function is in a separate
+ * compile unit so it can be compile with CC_FLAGS_FTRACE to ensure that it
+ * can be probed by the selftests.
+ */
+int kprobe_trace_selftest_target(int a1, int a2, int a3, int a4, int a5, int 
a6);



[PATCH v3 0/3] tracing: kprobes: Prohibit probing on notrace functions

2018-07-27 Thread Masami Hiramatsu
Hello,

This is the 3rd version of the series to prohibit kprobe
on notrace functions which Francis sent before.

This version fixes some issues on previous version. Fix
to handle the no-symbol kprobe-events correctly and Fix
probepoint.tc testcase to not use notrace function as
a probe point.

Thank you,

---

Francis Deslauriers (1):
  selftest/ftrace: Move kprobe selftest function to separate compile unit

Masami Hiramatsu (2):
  tracing: kprobes: Prohibit probing on notrace function
  selftests/ftrace: Fix kprobe string testcase to not probe notrace function


 kernel/trace/Kconfig   |   18 ++
 kernel/trace/Makefile  |5 ++
 kernel/trace/trace_kprobe.c|   58 +---
 kernel/trace/trace_kprobe_selftest.c   |   10 +++
 kernel/trace/trace_kprobe_selftest.h   |7 ++
 .../ftrace/test.d/kprobe/kprobe_args_string.tc |   30 --
 .../selftests/ftrace/test.d/kprobe/probepoint.tc   |2 -
 7 files changed, 91 insertions(+), 39 deletions(-)
 create mode 100644 kernel/trace/trace_kprobe_selftest.c
 create mode 100644 kernel/trace/trace_kprobe_selftest.h

--
Masami Hiramatsu (Linaro) 


[PATCH v3 0/3] tracing: kprobes: Prohibit probing on notrace functions

2018-07-27 Thread Masami Hiramatsu
Hello,

This is the 3rd version of the series to prohibit kprobe
on notrace functions which Francis sent before.

This version fixes some issues on previous version. Fix
to handle the no-symbol kprobe-events correctly and Fix
probepoint.tc testcase to not use notrace function as
a probe point.

Thank you,

---

Francis Deslauriers (1):
  selftest/ftrace: Move kprobe selftest function to separate compile unit

Masami Hiramatsu (2):
  tracing: kprobes: Prohibit probing on notrace function
  selftests/ftrace: Fix kprobe string testcase to not probe notrace function


 kernel/trace/Kconfig   |   18 ++
 kernel/trace/Makefile  |5 ++
 kernel/trace/trace_kprobe.c|   58 +---
 kernel/trace/trace_kprobe_selftest.c   |   10 +++
 kernel/trace/trace_kprobe_selftest.h   |7 ++
 .../ftrace/test.d/kprobe/kprobe_args_string.tc |   30 --
 .../selftests/ftrace/test.d/kprobe/probepoint.tc   |2 -
 7 files changed, 91 insertions(+), 39 deletions(-)
 create mode 100644 kernel/trace/trace_kprobe_selftest.c
 create mode 100644 kernel/trace/trace_kprobe_selftest.h

--
Masami Hiramatsu (Linaro) 


Re: RCU nocb list not reclaiming causing OOM

2018-07-27 Thread Paul E. McKenney
On Sat, Jul 28, 2018 at 12:07:19AM +, David Chen wrote:
> Hi Paul,
> 
> I wasn't talking about the xchg() though.
> 
> The smp_mb__after_atomic() is not for xchg(), it's for `*tail = 
> rdp->nocb_gp_head;`
> it's stated so in the comment. And I do think we need ordering between
> `*tail = rdp->nocb_gp_head;` and wake up, because the waiter is checking on 
> head not tail.
> 
>   swait_event_interruptible(rdp->nocb_wq,
>
> READ_ONCE(rdp->nocb_follower_head));
> 
> So what I'm saying is that since we need to maintain ordering between `*tail 
> = rdp->nocb_gp_head;`
> and wake up, we need to change the smp_mb__after_atomic() to smp_mb(). Because
> smp_mb__after_atomic() wouldn't guarantee 
> 
> So this is what I'm proposing.

Good eyes!

Hmmm...  What do I do about this in mainline?  Ah, I introduced a
->nocb_lock in mainline to prevent this from happening.  In that same
commit that you didn't want to use because it is hard to backport.  ;-)

So yes, but there might well be other misorderings fixed by the
hard-to-backport commit that your change below does not cover.  Still I
do agree that you need full ordering at that point.

Thanx, Paul

> diff -ru linux-4.9.37.orig/kernel/rcu/tree_plugin.h 
> linux-4.9.37/kernel/rcu/tree_plugin.h
> --- linux-4.9.37.orig/kernel/rcu/tree_plugin.h2017-07-12 
> 06:42:41.0 -0700
> +++ linux-4.9.37/kernel/rcu/tree_plugin.h 2018-07-27 16:23:41.349044259 
> -0700
> @@ -2076,7 +2076,7 @@
>   /* Append callbacks to follower's "done" list. */
>   tail = xchg(>nocb_follower_tail, rdp->nocb_gp_tail);
>   *tail = rdp->nocb_gp_head;
> - smp_mb__after_atomic(); /* Store *tail before wakeup. */
> + smp_mb(); /* Store *tail before wakeup. */
>   if (rdp != my_rdp && tail == >nocb_follower_head) {
>   /*
>* List was empty, wake up the follower.
> 
> Thanks,
> David
> 
> From: Paul E. McKenney 
> Sent: Friday, July 27, 2018 4:47 PM
> To: David Chen
> Cc: linux-kernel@vger.kernel.org
> Subject: Re: RCU nocb list not reclaiming causing OOM
>   
> 
> On Fri, Jul 27, 2018 at 11:16:39PM +, David Chen wrote:
> > Hi Paul,
> > 
> > Thanks for the advice.
> > The bug is kind of hard to hit, so I can't say for certain yet.
> 
> Well, you can always remove the "tail == >nocb_follower_head" as an
> extra belt-and-suspenders safety net.  I am not putting that in mainline,
> but in the privacy of your own copy of the kernel, I don't see any really
> serious problem with it.  (As long as you aren't going for absolute maximum
> performance, but even then there are other more important tuning actions
> and code changes you could make.)
> 
> > Though after another look at the code, I found out the 
> > `smp_mb__after_atomic();`
> > seems to be only a compiler barrier on x86.
> 
> Yes, and that is because the locked xchg instruction used on x86 to
> implement xchg() already provides full ordering.  ;-)
> 
> >    tail = xchg(>nocb_follower_tail, rdp->nocb_gp_tail);
> >    *tail = rdp->nocb_gp_head;
> >    smp_mb__after_atomic(); /* Store *tail before wakeup. */
> >    if (rdp != my_rdp && tail == >nocb_follower_head) {
> >    swake_up(>nocb_wq);
> > 
> > But that wouldn't be enough right? Because from 6b5fc3a13318, it stated that
> > wakeup operation don't guarantee ordering. And when the follower wakes up, 
> > it checks
> > for nocb_follower_head, which is assigned by `*tail = rdp->nocb_gp_head;` 
> > which doesn't
> > have LOCK prefix. So it's possible for follower to wake up and see the list 
> > is empty and go
> > back to sleep.
> 
> Again, xchg() is defined to provide full ordering against all operations
> before and after it.  Each architecture is required to do whatever is
> necessary to implement that full ordering, and x86 need only provide
> its "lock xchg" instruction.
> 
> The smp_mb__after_atomic() has effect only after atomic read-modify-write
> operations that do not return a value, for example, atomic_inc().
> If you use it after a value-returning atomic read-modify-write operation
> like xchg(), all you do is needlessly slow things down on platforms
> that provide non-empty smp_mb__after_atomic() definitions.  So again,
> smp_mb__after_atomic() after xchg() is pointless.
> 
> Please take a look at Documentation/core-api/atomic_ops.rst in the
> Linux-kernel source tree for more information.  Or get a v4.17 kernel
> source tree and check this using the memory model (tools/memory-model
> in that version).
> 
>     Thanx, Paul
> 
> > Thanks,
> > David
> > 
> > From: Paul E. McKenney 
> > Sent: Friday, July 27, 2018 3:31 PM
> > To: David Chen
> > Cc: linux-kernel@vger.kernel.org
> > Subject: Re: RCU nocb list not 

Re: RCU nocb list not reclaiming causing OOM

2018-07-27 Thread Paul E. McKenney
On Sat, Jul 28, 2018 at 12:07:19AM +, David Chen wrote:
> Hi Paul,
> 
> I wasn't talking about the xchg() though.
> 
> The smp_mb__after_atomic() is not for xchg(), it's for `*tail = 
> rdp->nocb_gp_head;`
> it's stated so in the comment. And I do think we need ordering between
> `*tail = rdp->nocb_gp_head;` and wake up, because the waiter is checking on 
> head not tail.
> 
>   swait_event_interruptible(rdp->nocb_wq,
>
> READ_ONCE(rdp->nocb_follower_head));
> 
> So what I'm saying is that since we need to maintain ordering between `*tail 
> = rdp->nocb_gp_head;`
> and wake up, we need to change the smp_mb__after_atomic() to smp_mb(). Because
> smp_mb__after_atomic() wouldn't guarantee 
> 
> So this is what I'm proposing.

Good eyes!

Hmmm...  What do I do about this in mainline?  Ah, I introduced a
->nocb_lock in mainline to prevent this from happening.  In that same
commit that you didn't want to use because it is hard to backport.  ;-)

So yes, but there might well be other misorderings fixed by the
hard-to-backport commit that your change below does not cover.  Still I
do agree that you need full ordering at that point.

Thanx, Paul

> diff -ru linux-4.9.37.orig/kernel/rcu/tree_plugin.h 
> linux-4.9.37/kernel/rcu/tree_plugin.h
> --- linux-4.9.37.orig/kernel/rcu/tree_plugin.h2017-07-12 
> 06:42:41.0 -0700
> +++ linux-4.9.37/kernel/rcu/tree_plugin.h 2018-07-27 16:23:41.349044259 
> -0700
> @@ -2076,7 +2076,7 @@
>   /* Append callbacks to follower's "done" list. */
>   tail = xchg(>nocb_follower_tail, rdp->nocb_gp_tail);
>   *tail = rdp->nocb_gp_head;
> - smp_mb__after_atomic(); /* Store *tail before wakeup. */
> + smp_mb(); /* Store *tail before wakeup. */
>   if (rdp != my_rdp && tail == >nocb_follower_head) {
>   /*
>* List was empty, wake up the follower.
> 
> Thanks,
> David
> 
> From: Paul E. McKenney 
> Sent: Friday, July 27, 2018 4:47 PM
> To: David Chen
> Cc: linux-kernel@vger.kernel.org
> Subject: Re: RCU nocb list not reclaiming causing OOM
>   
> 
> On Fri, Jul 27, 2018 at 11:16:39PM +, David Chen wrote:
> > Hi Paul,
> > 
> > Thanks for the advice.
> > The bug is kind of hard to hit, so I can't say for certain yet.
> 
> Well, you can always remove the "tail == >nocb_follower_head" as an
> extra belt-and-suspenders safety net.  I am not putting that in mainline,
> but in the privacy of your own copy of the kernel, I don't see any really
> serious problem with it.  (As long as you aren't going for absolute maximum
> performance, but even then there are other more important tuning actions
> and code changes you could make.)
> 
> > Though after another look at the code, I found out the 
> > `smp_mb__after_atomic();`
> > seems to be only a compiler barrier on x86.
> 
> Yes, and that is because the locked xchg instruction used on x86 to
> implement xchg() already provides full ordering.  ;-)
> 
> >    tail = xchg(>nocb_follower_tail, rdp->nocb_gp_tail);
> >    *tail = rdp->nocb_gp_head;
> >    smp_mb__after_atomic(); /* Store *tail before wakeup. */
> >    if (rdp != my_rdp && tail == >nocb_follower_head) {
> >    swake_up(>nocb_wq);
> > 
> > But that wouldn't be enough right? Because from 6b5fc3a13318, it stated that
> > wakeup operation don't guarantee ordering. And when the follower wakes up, 
> > it checks
> > for nocb_follower_head, which is assigned by `*tail = rdp->nocb_gp_head;` 
> > which doesn't
> > have LOCK prefix. So it's possible for follower to wake up and see the list 
> > is empty and go
> > back to sleep.
> 
> Again, xchg() is defined to provide full ordering against all operations
> before and after it.  Each architecture is required to do whatever is
> necessary to implement that full ordering, and x86 need only provide
> its "lock xchg" instruction.
> 
> The smp_mb__after_atomic() has effect only after atomic read-modify-write
> operations that do not return a value, for example, atomic_inc().
> If you use it after a value-returning atomic read-modify-write operation
> like xchg(), all you do is needlessly slow things down on platforms
> that provide non-empty smp_mb__after_atomic() definitions.  So again,
> smp_mb__after_atomic() after xchg() is pointless.
> 
> Please take a look at Documentation/core-api/atomic_ops.rst in the
> Linux-kernel source tree for more information.  Or get a v4.17 kernel
> source tree and check this using the memory model (tools/memory-model
> in that version).
> 
>     Thanx, Paul
> 
> > Thanks,
> > David
> > 
> > From: Paul E. McKenney 
> > Sent: Friday, July 27, 2018 3:31 PM
> > To: David Chen
> > Cc: linux-kernel@vger.kernel.org
> > Subject: Re: RCU nocb list not 

Re: [GIT PULL] ACPI fix for v4.18-rc7

2018-07-27 Thread Linus Torvalds
On Fri, Jul 27, 2018 at 5:02 PM Schmauss, Erik  wrote:
>
> The patch below should be able to fix this.

Yes, the error messages seem gone with this.

I see another ACPI warning, but that one has always been there:

  acpi INT33D5:00: intel-hid: created platform device
  ACPI Warning: \_SB.IETM._ART: Return Package type mismatch at index
0 - found Integer, expected Reference (20180531/nspredef-263)
  ACPI: Invalid package element [0]: got number, expecting [R]
  _ART package 0 is invalid, ignored

and doesn't seem to cause any problems. Just mentioning it in case
somebody goes "yeah.."

 Linus


Re: [GIT PULL] ACPI fix for v4.18-rc7

2018-07-27 Thread Linus Torvalds
On Fri, Jul 27, 2018 at 5:02 PM Schmauss, Erik  wrote:
>
> The patch below should be able to fix this.

Yes, the error messages seem gone with this.

I see another ACPI warning, but that one has always been there:

  acpi INT33D5:00: intel-hid: created platform device
  ACPI Warning: \_SB.IETM._ART: Return Package type mismatch at index
0 - found Integer, expected Reference (20180531/nspredef-263)
  ACPI: Invalid package element [0]: got number, expecting [R]
  _ART package 0 is invalid, ignored

and doesn't seem to cause any problems. Just mentioning it in case
somebody goes "yeah.."

 Linus


Re: [PATCH 4.4 15/47] ubi: fastmap: Correctly handle interrupted erasures in EBA

2018-07-27 Thread Ben Hutchings
On Thu, 2018-07-26 at 08:25 +0200, Richard Weinberger wrote:
> Ben,
> 
> Am Donnerstag, 26. Juli 2018, 04:12:54 CEST schrieb Ben Hutchings:
> > On Tue, 2018-07-10 at 20:24 +0200, Greg Kroah-Hartman wrote:
> > > 4.4-stable review patch.  If anyone has any objections, please let me 
> > > know.
> > > 
> > > --
> > > 
> > > From: Richard Weinberger 
> > > 
> > > commit 781932375ffc6411713ee0926ccae8596ed0261c upstream.
> > > 
> > > Fastmap cannot track the LEB unmap operation, therefore it can
> > > happen that after an interrupted erasure the mapping still looks
> > > good from Fastmap's point of view, while reading from the PEB will
> > > cause an ECC error and confuses the upper layer.
> > > 
> > > Instead of teaching users of UBI how to deal with that, we read back
> > > the VID header and check for errors. If the PEB is empty or shows ECC
> > > errors we fixup the mapping and schedule the PEB for erasure.
> > 
> > [...]
> > 
> > Isn't there a risk here, that a read error leads to erasing data that
> > might be recoverable if the read is retried?
> 
> Well, read error means that already something went very wrong. At other places
> in UBI wo also don't retry reading headers and consider it as fatal when we 
> are unable to read it.
> We could also read the EC header, but what do we gain from that?
> If the VID header is not readable we cannot check fastmap either.
> 
> What case exactly do you have in mind?

I suppose I'm thinking about data recovery from a flash device that has
become unreliable.  I'm not familiar with ubi; is there a read-only
mode where the erase wouldn't actually be performed?

Ben.

-- 
Ben Hutchings, Software Developer Codethink Ltd
https://www.codethink.co.uk/ Dale House, 35 Dale Street
 Manchester, M1 2HF, United Kingdom


Re: [PATCH 4.4 15/47] ubi: fastmap: Correctly handle interrupted erasures in EBA

2018-07-27 Thread Ben Hutchings
On Thu, 2018-07-26 at 08:25 +0200, Richard Weinberger wrote:
> Ben,
> 
> Am Donnerstag, 26. Juli 2018, 04:12:54 CEST schrieb Ben Hutchings:
> > On Tue, 2018-07-10 at 20:24 +0200, Greg Kroah-Hartman wrote:
> > > 4.4-stable review patch.  If anyone has any objections, please let me 
> > > know.
> > > 
> > > --
> > > 
> > > From: Richard Weinberger 
> > > 
> > > commit 781932375ffc6411713ee0926ccae8596ed0261c upstream.
> > > 
> > > Fastmap cannot track the LEB unmap operation, therefore it can
> > > happen that after an interrupted erasure the mapping still looks
> > > good from Fastmap's point of view, while reading from the PEB will
> > > cause an ECC error and confuses the upper layer.
> > > 
> > > Instead of teaching users of UBI how to deal with that, we read back
> > > the VID header and check for errors. If the PEB is empty or shows ECC
> > > errors we fixup the mapping and schedule the PEB for erasure.
> > 
> > [...]
> > 
> > Isn't there a risk here, that a read error leads to erasing data that
> > might be recoverable if the read is retried?
> 
> Well, read error means that already something went very wrong. At other places
> in UBI wo also don't retry reading headers and consider it as fatal when we 
> are unable to read it.
> We could also read the EC header, but what do we gain from that?
> If the VID header is not readable we cannot check fastmap either.
> 
> What case exactly do you have in mind?

I suppose I'm thinking about data recovery from a flash device that has
become unreliable.  I'm not familiar with ubi; is there a read-only
mode where the erase wouldn't actually be performed?

Ben.

-- 
Ben Hutchings, Software Developer Codethink Ltd
https://www.codethink.co.uk/ Dale House, 35 Dale Street
 Manchester, M1 2HF, United Kingdom


Re; HELLO DEAR MY DONATION TO YOU.

2018-07-27 Thread Rosrior Rmalhin
HELLO DEAR MY DONATION TO YOU.

May the peace of the Almighty God be with you and your Family,

With Due Respect and Humility, I was compelled to write you under humanitarian 
ground. My name is Mrs Rosareio Romarinhio. TIMOR Origin but married with Late 
Sir Ratnavale from MALAYSIA. I have took a personal decision to donate what I 
inherited from my late husband to the Charity, less privilege, I am 58 years 
old and I was diagnosed for cancer since the past 4 years, immediately after 
the death of my husband. We were both married for many years without a child, I 
took this decision because I don't have any child that will inherit this money.

My late husband was The chairman/CEO of k: E global industries and a 
philanthropist based in MALAYSIA. I have decided to donate a fund through 
Credit Account of ( $1,6000.00 (One Million, Six Hundred Thousand us dollars ) 
to you for this assignment, Charity, less privilege, building of schools, 
hospitals and also for the assistance of the poor widows, Motherless babies, 
Charity organization, CHRISTIAN OR MUSLIM, and orphans. I don't know you in 
person but God knows you, so contacting you for this assignment is a direction 
from the holy spirit of God to donate this fund outside this country through 
you as my late husband has already donated alot in this country,

so you have to make sure that you use this donation fund as I have directed so 
that the name of the Almighty God will be glorify forever. Your urgent response 
is required in this matter due to my present critical condition of my health, 
it was a Sister nurse working in this hospital that helped me to type this 
message, because I am loosing strenght every munites of the day. All I need 
from you is prayers that GOD will accept my soul incase I didn't survive this 
surgery. Also pray for the soul of my late Husband to rest in peace.

Please always be prayerful all through your life, we are visitors on this earth 
and we must be very careful in whatever we do so that our soul will not be a 
waste. I wish you all the best and may God bless you abundantly. I AM WAITING 
FOR YOUR URGENT REPLY

Remain blessed in God.  Yours.

Mother Mrs Rosareio Romarinhio


Re; HELLO DEAR MY DONATION TO YOU.

2018-07-27 Thread Rosrior Rmalhin
HELLO DEAR MY DONATION TO YOU.

May the peace of the Almighty God be with you and your Family,

With Due Respect and Humility, I was compelled to write you under humanitarian 
ground. My name is Mrs Rosareio Romarinhio. TIMOR Origin but married with Late 
Sir Ratnavale from MALAYSIA. I have took a personal decision to donate what I 
inherited from my late husband to the Charity, less privilege, I am 58 years 
old and I was diagnosed for cancer since the past 4 years, immediately after 
the death of my husband. We were both married for many years without a child, I 
took this decision because I don't have any child that will inherit this money.

My late husband was The chairman/CEO of k: E global industries and a 
philanthropist based in MALAYSIA. I have decided to donate a fund through 
Credit Account of ( $1,6000.00 (One Million, Six Hundred Thousand us dollars ) 
to you for this assignment, Charity, less privilege, building of schools, 
hospitals and also for the assistance of the poor widows, Motherless babies, 
Charity organization, CHRISTIAN OR MUSLIM, and orphans. I don't know you in 
person but God knows you, so contacting you for this assignment is a direction 
from the holy spirit of God to donate this fund outside this country through 
you as my late husband has already donated alot in this country,

so you have to make sure that you use this donation fund as I have directed so 
that the name of the Almighty God will be glorify forever. Your urgent response 
is required in this matter due to my present critical condition of my health, 
it was a Sister nurse working in this hospital that helped me to type this 
message, because I am loosing strenght every munites of the day. All I need 
from you is prayers that GOD will accept my soul incase I didn't survive this 
surgery. Also pray for the soul of my late Husband to rest in peace.

Please always be prayerful all through your life, we are visitors on this earth 
and we must be very careful in whatever we do so that our soul will not be a 
waste. I wish you all the best and may God bless you abundantly. I AM WAITING 
FOR YOUR URGENT REPLY

Remain blessed in God.  Yours.

Mother Mrs Rosareio Romarinhio


Re: [PATCH v2 1/3] tracing: kprobes: Prohibit probing on notrace function

2018-07-27 Thread Masami Hiramatsu
On Thu, 26 Jul 2018 14:53:27 +0900
Masami Hiramatsu  wrote:

> Prohibit kprobe-events probing on notrace function.
> Since probing on the notrace function can cause recursive
> event call. In most case those are just skipped, but
> in some case it falls into infinit recursive call.
> 
> This protection can be disabled by the kconfig
> CONFIG_KPROBE_EVENTS_ON_NOTRACE=y, but it is highly
> recommended to keep it "n" for normal kernel.
> 
> Signed-off-by: Masami Hiramatsu 
> Tested-by: Francis Deslauriers 
> ---
>   Changes from v1
>- Add CONFIG_KPROBE_EVENTS_ON_NOTRACE kconfig for knocking down
>  the protection.
> ---
>  kernel/trace/Kconfig|   18 ++
>  kernel/trace/trace_kprobe.c |   23 +++
>  2 files changed, 41 insertions(+)
> 
> diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
> index dcc0166d1997..24d5a58467a3 100644
> --- a/kernel/trace/Kconfig
> +++ b/kernel/trace/Kconfig
> @@ -456,6 +456,24 @@ config KPROBE_EVENTS
> This option is also required by perf-probe subcommand of perf tools.
> If you want to use perf tools, this option is strongly recommended.
>  
> +config KPROBE_EVENTS_ON_NOTRACE
> + bool "Do NOT protect notrace function from kprobe events"
> + depends on KPROBE_EVENTS
> + default n
> + help
> +   This is only for the developers who want to debug ftrace itself
> +   using kprobe events.
> +
> +   Usually, ftrace related functions are protected from kprobe-events
> +   to prevent an infinit recursion or any unexpected execution path
> +   which leads to a kernel crash.
> +
> +   This option disables such protection and allows you to put kprobe
> +   events on ftrace functions for debugging ftrace by itself.
> +   Note that this might let you shoot yourself in the foot.
> +
> +   If unsure, say N.
> +
>  config UPROBE_EVENTS
>   bool "Enable uprobes-based dynamic events"
>   depends on ARCH_SUPPORTS_UPROBES
> diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
> index 27ace4513c43..1f1b4d712a7e 100644
> --- a/kernel/trace/trace_kprobe.c
> +++ b/kernel/trace/trace_kprobe.c
> @@ -496,6 +496,23 @@ disable_trace_kprobe(struct trace_kprobe *tk, struct 
> trace_event_file *file)
>   return ret;
>  }
>  
> +#ifdef CONFIG_KPROBE_EVENTS_ON_NOTRACE
> +#define within_notrace_func(tk)  (false)
> +#else
> +static bool within_notrace_func(struct trace_kprobe *tk)
> +{
> + unsigned long offset, size, addr;
> +
> + addr = kallsyms_lookup_name(trace_kprobe_symbol(tk));

Oops, I found this is wrong. (Thanks for old myself :P)
As ftracetest probepoint.tc said, kprobe-events can be
defined without symbol (direct address) for debugging.
In that case trace_kprobe_symbol will not return the symbol string.
I'll fix this series again.

Thank you,

> + addr += trace_kprobe_offset(tk);
> +
> + if (!kallsyms_lookup_size_offset(addr, , ))
> + return true;/* Out of range. */
> +
> + return !ftrace_location_range(addr - offset, addr - offset + size);
> +}
> +#endif
> +
>  /* Internal register function - just handle k*probes and flags */
>  static int __register_trace_kprobe(struct trace_kprobe *tk)
>  {
> @@ -504,6 +521,12 @@ static int __register_trace_kprobe(struct trace_kprobe 
> *tk)
>   if (trace_probe_is_registered(>tp))
>   return -EINVAL;
>  
> + if (within_notrace_func(tk)) {
> + pr_warn("Could not probe notrace function %s\n",
> + trace_kprobe_symbol(tk));
> + return -EINVAL;
> + }
> +
>   for (i = 0; i < tk->tp.nr_args; i++)
>   traceprobe_update_arg(>tp.args[i]);
>  
> 


-- 
Masami Hiramatsu 


Re: [PATCH v2 1/3] tracing: kprobes: Prohibit probing on notrace function

2018-07-27 Thread Masami Hiramatsu
On Thu, 26 Jul 2018 14:53:27 +0900
Masami Hiramatsu  wrote:

> Prohibit kprobe-events probing on notrace function.
> Since probing on the notrace function can cause recursive
> event call. In most case those are just skipped, but
> in some case it falls into infinit recursive call.
> 
> This protection can be disabled by the kconfig
> CONFIG_KPROBE_EVENTS_ON_NOTRACE=y, but it is highly
> recommended to keep it "n" for normal kernel.
> 
> Signed-off-by: Masami Hiramatsu 
> Tested-by: Francis Deslauriers 
> ---
>   Changes from v1
>- Add CONFIG_KPROBE_EVENTS_ON_NOTRACE kconfig for knocking down
>  the protection.
> ---
>  kernel/trace/Kconfig|   18 ++
>  kernel/trace/trace_kprobe.c |   23 +++
>  2 files changed, 41 insertions(+)
> 
> diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
> index dcc0166d1997..24d5a58467a3 100644
> --- a/kernel/trace/Kconfig
> +++ b/kernel/trace/Kconfig
> @@ -456,6 +456,24 @@ config KPROBE_EVENTS
> This option is also required by perf-probe subcommand of perf tools.
> If you want to use perf tools, this option is strongly recommended.
>  
> +config KPROBE_EVENTS_ON_NOTRACE
> + bool "Do NOT protect notrace function from kprobe events"
> + depends on KPROBE_EVENTS
> + default n
> + help
> +   This is only for the developers who want to debug ftrace itself
> +   using kprobe events.
> +
> +   Usually, ftrace related functions are protected from kprobe-events
> +   to prevent an infinit recursion or any unexpected execution path
> +   which leads to a kernel crash.
> +
> +   This option disables such protection and allows you to put kprobe
> +   events on ftrace functions for debugging ftrace by itself.
> +   Note that this might let you shoot yourself in the foot.
> +
> +   If unsure, say N.
> +
>  config UPROBE_EVENTS
>   bool "Enable uprobes-based dynamic events"
>   depends on ARCH_SUPPORTS_UPROBES
> diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
> index 27ace4513c43..1f1b4d712a7e 100644
> --- a/kernel/trace/trace_kprobe.c
> +++ b/kernel/trace/trace_kprobe.c
> @@ -496,6 +496,23 @@ disable_trace_kprobe(struct trace_kprobe *tk, struct 
> trace_event_file *file)
>   return ret;
>  }
>  
> +#ifdef CONFIG_KPROBE_EVENTS_ON_NOTRACE
> +#define within_notrace_func(tk)  (false)
> +#else
> +static bool within_notrace_func(struct trace_kprobe *tk)
> +{
> + unsigned long offset, size, addr;
> +
> + addr = kallsyms_lookup_name(trace_kprobe_symbol(tk));

Oops, I found this is wrong. (Thanks for old myself :P)
As ftracetest probepoint.tc said, kprobe-events can be
defined without symbol (direct address) for debugging.
In that case trace_kprobe_symbol will not return the symbol string.
I'll fix this series again.

Thank you,

> + addr += trace_kprobe_offset(tk);
> +
> + if (!kallsyms_lookup_size_offset(addr, , ))
> + return true;/* Out of range. */
> +
> + return !ftrace_location_range(addr - offset, addr - offset + size);
> +}
> +#endif
> +
>  /* Internal register function - just handle k*probes and flags */
>  static int __register_trace_kprobe(struct trace_kprobe *tk)
>  {
> @@ -504,6 +521,12 @@ static int __register_trace_kprobe(struct trace_kprobe 
> *tk)
>   if (trace_probe_is_registered(>tp))
>   return -EINVAL;
>  
> + if (within_notrace_func(tk)) {
> + pr_warn("Could not probe notrace function %s\n",
> + trace_kprobe_symbol(tk));
> + return -EINVAL;
> + }
> +
>   for (i = 0; i < tk->tp.nr_args; i++)
>   traceprobe_update_arg(>tp.args[i]);
>  
> 


-- 
Masami Hiramatsu 


Re: include architecture Kconfig files from top-level Kconfig v3

2018-07-27 Thread Masahiro Yamada
2018-07-27 16:48 GMT+09:00 Christoph Hellwig :
> On Fri, Jul 27, 2018 at 10:32:19AM +0900, Masahiro Yamada wrote:
>> This will just add a new unmet dependency warning.
>> CONFIG_PREEMPT_COUNT will be still selected.
>
> True.  I guess we simply need to prohibit CONFIG_DEBUG_ATOMIC_SLEEP
> explicitly if PREEMPT_COUNT isn't supported.  E.g something like this:
>
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 26d3ff7e3cf4..373ce9fecd7e 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -1195,6 +1195,7 @@ config DEBUG_ATOMIC_SLEEP
> bool "Sleep inside atomic section checking"
> select PREEMPT_COUNT
> depends on DEBUG_KERNEL
> +   depends on !ARCH_NO_PREEMPT
> help
>   If you say Y here, various routines which may sleep will become very
>   noisy if they are called inside atomic sections: when a spinlock is


I will fold this into 07/10.


I see another 'select PREEMPT_COUNT' in
drivers/gpu/drm/i915/Kconfig.debug
but DRM_I915 depends on X86, so this is not
a real problem.



-- 
Best Regards
Masahiro Yamada


Re: include architecture Kconfig files from top-level Kconfig v3

2018-07-27 Thread Masahiro Yamada
2018-07-27 16:48 GMT+09:00 Christoph Hellwig :
> On Fri, Jul 27, 2018 at 10:32:19AM +0900, Masahiro Yamada wrote:
>> This will just add a new unmet dependency warning.
>> CONFIG_PREEMPT_COUNT will be still selected.
>
> True.  I guess we simply need to prohibit CONFIG_DEBUG_ATOMIC_SLEEP
> explicitly if PREEMPT_COUNT isn't supported.  E.g something like this:
>
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 26d3ff7e3cf4..373ce9fecd7e 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -1195,6 +1195,7 @@ config DEBUG_ATOMIC_SLEEP
> bool "Sleep inside atomic section checking"
> select PREEMPT_COUNT
> depends on DEBUG_KERNEL
> +   depends on !ARCH_NO_PREEMPT
> help
>   If you say Y here, various routines which may sleep will become very
>   noisy if they are called inside atomic sections: when a spinlock is


I will fold this into 07/10.


I see another 'select PREEMPT_COUNT' in
drivers/gpu/drm/i915/Kconfig.debug
but DRM_I915 depends on X86, so this is not
a real problem.



-- 
Best Regards
Masahiro Yamada


[PATCH] sched/fair: sync expires_seq in distribute_cfs_runtime()

2018-07-27 Thread Cong Wang
Each time we sync cfs_rq->runtime_expires with cfs_b->runtime_expires,
we should sync its ->expires_seq too. However it is missing
for distribute_cfs_runtime(), especially the slack timer call path.

Fixes: 512ac999d275 ("sched/fair: Fix bandwidth timer clock drift condition")
Cc: Xunlei Pang 
Cc: Ben Segall 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Signed-off-by: Cong Wang 
---
 kernel/sched/fair.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2f0a0be4d344..910c50db3d74 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4857,7 +4857,7 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
 }
 
 static u64 distribute_cfs_runtime(struct cfs_bandwidth *cfs_b,
-   u64 remaining, u64 expires)
+   u64 remaining, u64 expires, int expires_seq)
 {
struct cfs_rq *cfs_rq;
u64 runtime;
@@ -4880,6 +4880,7 @@ static u64 distribute_cfs_runtime(struct cfs_bandwidth 
*cfs_b,
 
cfs_rq->runtime_remaining += runtime;
cfs_rq->runtime_expires = expires;
+   cfs_rq->expires_seq = expires_seq;
 
/* we check whether we're throttled above */
if (cfs_rq->runtime_remaining > 0)
@@ -4905,7 +4906,7 @@ static u64 distribute_cfs_runtime(struct cfs_bandwidth 
*cfs_b,
 static int do_sched_cfs_period_timer(struct cfs_bandwidth *cfs_b, int overrun)
 {
u64 runtime, runtime_expires;
-   int throttled;
+   int throttled, expires_seq;
 
/* no need to continue the timer with no bandwidth constraint */
if (cfs_b->quota == RUNTIME_INF)
@@ -4933,6 +4934,7 @@ static int do_sched_cfs_period_timer(struct cfs_bandwidth 
*cfs_b, int overrun)
cfs_b->nr_throttled += overrun;
 
runtime_expires = cfs_b->runtime_expires;
+   expires_seq = cfs_b->expires_seq;
 
/*
 * This check is repeated as we are holding onto the new bandwidth while
@@ -4946,7 +4948,7 @@ static int do_sched_cfs_period_timer(struct cfs_bandwidth 
*cfs_b, int overrun)
raw_spin_unlock(_b->lock);
/* we can't nest cfs_b->lock while distributing bandwidth */
runtime = distribute_cfs_runtime(cfs_b, runtime,
-runtime_expires);
+runtime_expires, expires_seq);
raw_spin_lock(_b->lock);
 
throttled = !list_empty(_b->throttled_cfs_rq);
@@ -5055,6 +5057,7 @@ static __always_inline void return_cfs_rq_runtime(struct 
cfs_rq *cfs_rq)
 static void do_sched_cfs_slack_timer(struct cfs_bandwidth *cfs_b)
 {
u64 runtime = 0, slice = sched_cfs_bandwidth_slice();
+   int expires_seq;
u64 expires;
 
/* confirm we're still not at a refresh boundary */
@@ -5068,12 +5071,13 @@ static void do_sched_cfs_slack_timer(struct 
cfs_bandwidth *cfs_b)
runtime = cfs_b->runtime;
 
expires = cfs_b->runtime_expires;
+   expires_seq = cfs_b->expires_seq;
raw_spin_unlock(_b->lock);
 
if (!runtime)
return;
 
-   runtime = distribute_cfs_runtime(cfs_b, runtime, expires);
+   runtime = distribute_cfs_runtime(cfs_b, runtime, expires, expires_seq);
 
raw_spin_lock(_b->lock);
if (expires == cfs_b->runtime_expires)
-- 
2.14.4



Re: [PATCH 34/38] vfs: syscall: Add fsinfo() to query filesystem information [ver #10]

2018-07-27 Thread Anton Altaparmakov
Hi David,

> On 28 Jul 2018, at 00:49, David Howells  wrote:
> Jann Horn  wrote:
>>> +static int fsinfo_generic_name_encoding(struct dentry *dentry, char *buf)
>>> +{
>>> +   static const char encoding[] = "utf8";
>>> +
>>> +   if (buf)
>>> +   memcpy(buf, encoding, sizeof(encoding) - 1);
>>> +   return sizeof(encoding) - 1;
>>> +}
>> 
>> Is this meant to be "encoding to be used by userspace" or "encoding of
>> on-disk filenames"?
> 
> The latter.
> 
>> Are there any plans to create filesystems that behave differently?
> 
> isofs, fat, ntfs, cifs for example.
> 
>> If the latter: This is wrong for e.g. a vfat mount that uses a codepage,
>> right?  Should the default in that case not be "I don't know"?
> 
> Quite possibly.  Note that it could also be what you're interpreting it as
> because the codepage got overridden by a mount parameter rather than what's on
> the disk (assuming the medium actually records this).

No, nothing like that is recorded on disk.  That would have been way too 
helpful!  (-;  The only place Windows records such information is, you may have 
guessed this: in the registry which of course is local to the computer and 
unrelated to what removable media is attached...

> One thing I'm confused about is that fat has both a codepage and a charset and
> I'm not sure of the difference.

Oh that is quite simple.  (-:

The codepage is what is used to translate from/to the on-disk DOS 8.3 style 
names into the kernel's Unicode character representation.  The correct codepage 
for a particular volume is not stored on disk so it can lead to all sorts of 
fun if you for example create some names on for example a Japanese Windows on a 
FAT formatted USB stick and then plug that into a US or European Windows where 
the default code pages are completely different - all your filenames will 
appear totally corrupt.  (Note this ONLY affects 8.3 style/DOS/short names or 
whatever you want to call them.)

The charset on the other hand is what is used to convert strings coming in 
from/going out to userspace into the kernel's Unicode character representation.

The one nice thing about VFAT (and there aren't many nice things about it!) is 
that for long names (i.e. not the 8.3 style/DOS/short names), it actually 
stores on-disk little-endian UTF-16 (since Windows 2000, before that it used 
little endian UCS-2 - the change was needed to support things like Emojis and 
some languages that go outside the UCS-2 range of fixed 16-bit unicode).

Hope this clears that up.

Best regards,

Anton

> David

-- 
Anton Altaparmakov  (replace at with @)
Lead in File System Development, Tuxera Inc., http://www.tuxera.com/
Linux NTFS maintainer



Re: [PATCH 34/38] vfs: syscall: Add fsinfo() to query filesystem information [ver #10]

2018-07-27 Thread Anton Altaparmakov
Hi David,

> On 28 Jul 2018, at 00:49, David Howells  wrote:
> Jann Horn  wrote:
>>> +static int fsinfo_generic_name_encoding(struct dentry *dentry, char *buf)
>>> +{
>>> +   static const char encoding[] = "utf8";
>>> +
>>> +   if (buf)
>>> +   memcpy(buf, encoding, sizeof(encoding) - 1);
>>> +   return sizeof(encoding) - 1;
>>> +}
>> 
>> Is this meant to be "encoding to be used by userspace" or "encoding of
>> on-disk filenames"?
> 
> The latter.
> 
>> Are there any plans to create filesystems that behave differently?
> 
> isofs, fat, ntfs, cifs for example.
> 
>> If the latter: This is wrong for e.g. a vfat mount that uses a codepage,
>> right?  Should the default in that case not be "I don't know"?
> 
> Quite possibly.  Note that it could also be what you're interpreting it as
> because the codepage got overridden by a mount parameter rather than what's on
> the disk (assuming the medium actually records this).

No, nothing like that is recorded on disk.  That would have been way too 
helpful!  (-;  The only place Windows records such information is, you may have 
guessed this: in the registry which of course is local to the computer and 
unrelated to what removable media is attached...

> One thing I'm confused about is that fat has both a codepage and a charset and
> I'm not sure of the difference.

Oh that is quite simple.  (-:

The codepage is what is used to translate from/to the on-disk DOS 8.3 style 
names into the kernel's Unicode character representation.  The correct codepage 
for a particular volume is not stored on disk so it can lead to all sorts of 
fun if you for example create some names on for example a Japanese Windows on a 
FAT formatted USB stick and then plug that into a US or European Windows where 
the default code pages are completely different - all your filenames will 
appear totally corrupt.  (Note this ONLY affects 8.3 style/DOS/short names or 
whatever you want to call them.)

The charset on the other hand is what is used to convert strings coming in 
from/going out to userspace into the kernel's Unicode character representation.

The one nice thing about VFAT (and there aren't many nice things about it!) is 
that for long names (i.e. not the 8.3 style/DOS/short names), it actually 
stores on-disk little-endian UTF-16 (since Windows 2000, before that it used 
little endian UCS-2 - the change was needed to support things like Emojis and 
some languages that go outside the UCS-2 range of fixed 16-bit unicode).

Hope this clears that up.

Best regards,

Anton

> David

-- 
Anton Altaparmakov  (replace at with @)
Lead in File System Development, Tuxera Inc., http://www.tuxera.com/
Linux NTFS maintainer



[PATCH] sched/fair: sync expires_seq in distribute_cfs_runtime()

2018-07-27 Thread Cong Wang
Each time we sync cfs_rq->runtime_expires with cfs_b->runtime_expires,
we should sync its ->expires_seq too. However it is missing
for distribute_cfs_runtime(), especially the slack timer call path.

Fixes: 512ac999d275 ("sched/fair: Fix bandwidth timer clock drift condition")
Cc: Xunlei Pang 
Cc: Ben Segall 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Signed-off-by: Cong Wang 
---
 kernel/sched/fair.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2f0a0be4d344..910c50db3d74 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4857,7 +4857,7 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
 }
 
 static u64 distribute_cfs_runtime(struct cfs_bandwidth *cfs_b,
-   u64 remaining, u64 expires)
+   u64 remaining, u64 expires, int expires_seq)
 {
struct cfs_rq *cfs_rq;
u64 runtime;
@@ -4880,6 +4880,7 @@ static u64 distribute_cfs_runtime(struct cfs_bandwidth 
*cfs_b,
 
cfs_rq->runtime_remaining += runtime;
cfs_rq->runtime_expires = expires;
+   cfs_rq->expires_seq = expires_seq;
 
/* we check whether we're throttled above */
if (cfs_rq->runtime_remaining > 0)
@@ -4905,7 +4906,7 @@ static u64 distribute_cfs_runtime(struct cfs_bandwidth 
*cfs_b,
 static int do_sched_cfs_period_timer(struct cfs_bandwidth *cfs_b, int overrun)
 {
u64 runtime, runtime_expires;
-   int throttled;
+   int throttled, expires_seq;
 
/* no need to continue the timer with no bandwidth constraint */
if (cfs_b->quota == RUNTIME_INF)
@@ -4933,6 +4934,7 @@ static int do_sched_cfs_period_timer(struct cfs_bandwidth 
*cfs_b, int overrun)
cfs_b->nr_throttled += overrun;
 
runtime_expires = cfs_b->runtime_expires;
+   expires_seq = cfs_b->expires_seq;
 
/*
 * This check is repeated as we are holding onto the new bandwidth while
@@ -4946,7 +4948,7 @@ static int do_sched_cfs_period_timer(struct cfs_bandwidth 
*cfs_b, int overrun)
raw_spin_unlock(_b->lock);
/* we can't nest cfs_b->lock while distributing bandwidth */
runtime = distribute_cfs_runtime(cfs_b, runtime,
-runtime_expires);
+runtime_expires, expires_seq);
raw_spin_lock(_b->lock);
 
throttled = !list_empty(_b->throttled_cfs_rq);
@@ -5055,6 +5057,7 @@ static __always_inline void return_cfs_rq_runtime(struct 
cfs_rq *cfs_rq)
 static void do_sched_cfs_slack_timer(struct cfs_bandwidth *cfs_b)
 {
u64 runtime = 0, slice = sched_cfs_bandwidth_slice();
+   int expires_seq;
u64 expires;
 
/* confirm we're still not at a refresh boundary */
@@ -5068,12 +5071,13 @@ static void do_sched_cfs_slack_timer(struct 
cfs_bandwidth *cfs_b)
runtime = cfs_b->runtime;
 
expires = cfs_b->runtime_expires;
+   expires_seq = cfs_b->expires_seq;
raw_spin_unlock(_b->lock);
 
if (!runtime)
return;
 
-   runtime = distribute_cfs_runtime(cfs_b, runtime, expires);
+   runtime = distribute_cfs_runtime(cfs_b, runtime, expires, expires_seq);
 
raw_spin_lock(_b->lock);
if (expires == cfs_b->runtime_expires)
-- 
2.14.4



Re: Remounting filesystem read-only

2018-07-27 Thread Theodore Y. Ts'o
On Fri, Jul 27, 2018 at 01:34:31PM -0700, Sodagudi Prasad wrote:
> > The error should be pretty clear: "Inode table for bg 0 marked as
> > needing zeroing".  That should never happen.
> 
> Can you provide any debug patch to detect when this corruption is happening?
> Source of this corruption and how this is partition getting corrupted?
> Or which file system operation lead to this corruption?

Do you have a reliable repro?  If it's a one-off, it can be caused by
*anything*.  Crappy hardware, a bug in some proprietary, binary-only
GPU driver dereferencing some wild pointer that corrupts kernel
memory, etc.

Asking for a debug patch is like asking for "can you create technology
that can detect when a cockroach enter my house?"

So if you have a reliable repro, then we know what operations might be
triggering the corruption, and then you work on creating a minimal
repro, and only *then* when we have a restricted set of possibilities
that might be the cause (for example, if removing a GPU call makes the
problem go away, then the patch would need to be in the proprietary
GPU driver)

> I am digging code a bit around this warning to understand more.

The warning means that a flag in block group descriptor #0 is set
that should never be set.  How did the flag get set?  There is any
number of things that could cause that.

You might want to look at the block group descriptor via dumpe2fs or
debugfs, to see if it's just a single bit getting flipped, or if the
entire block group descriptor is garbage.  Note that under normal code
paths, the flag *never* gets set by ext4 kernel code.  The flag will
get set on non-block group 0 block group descriptors by ext4, and the
ext4 kernel code will only clear the flag.

Of course, if there is a bug in some driver that dereferences a
pointer widely, all bets are off.

- Ted


Re: Remounting filesystem read-only

2018-07-27 Thread Theodore Y. Ts'o
On Fri, Jul 27, 2018 at 01:34:31PM -0700, Sodagudi Prasad wrote:
> > The error should be pretty clear: "Inode table for bg 0 marked as
> > needing zeroing".  That should never happen.
> 
> Can you provide any debug patch to detect when this corruption is happening?
> Source of this corruption and how this is partition getting corrupted?
> Or which file system operation lead to this corruption?

Do you have a reliable repro?  If it's a one-off, it can be caused by
*anything*.  Crappy hardware, a bug in some proprietary, binary-only
GPU driver dereferencing some wild pointer that corrupts kernel
memory, etc.

Asking for a debug patch is like asking for "can you create technology
that can detect when a cockroach enter my house?"

So if you have a reliable repro, then we know what operations might be
triggering the corruption, and then you work on creating a minimal
repro, and only *then* when we have a restricted set of possibilities
that might be the cause (for example, if removing a GPU call makes the
problem go away, then the patch would need to be in the proprietary
GPU driver)

> I am digging code a bit around this warning to understand more.

The warning means that a flag in block group descriptor #0 is set
that should never be set.  How did the flag get set?  There is any
number of things that could cause that.

You might want to look at the block group descriptor via dumpe2fs or
debugfs, to see if it's just a single bit getting flipped, or if the
entire block group descriptor is garbage.  Note that under normal code
paths, the flag *never* gets set by ext4 kernel code.  The flag will
get set on non-block group 0 block group descriptors by ext4, and the
ext4 kernel code will only clear the flag.

Of course, if there is a bug in some driver that dereferences a
pointer widely, all bets are off.

- Ted


Re: [PATCH v2] PM / devfreq: Add support for QCOM devfreq firmware

2018-07-27 Thread skannan

On 2018-05-23 07:39, Rob Herring wrote:

Reviving an old thread. Sorry about the late reply. Got busy.

On Tue, May 22, 2018 at 1:30 PM, Saravana Kannan 
 wrote:

On 05/22/2018 11:08 AM, Rob Herring wrote:


On Fri, May 18, 2018 at 12:52:40AM -0700, Saravana Kannan wrote:


The firmware present in some QCOM chipsets offloads the steps 
necessary

for
changing the frequency of some devices (Eg: L3). This driver 
implements

the
devfreq interface for this firmware so that various governors could 
be

used
to scale the frequency of these devices.

Each client (say cluster 0 and cluster 1) that wants to vote for a
particular device's frequency (say, L3 frequency) is represented as 
a

separate voter device (qcom,devfreq-fw-voter) that's a child of the
firmware device (qcom,devfreq-fw).

Signed-off-by: Saravana Kannan 
---
  .../bindings/devfreq/devfreq-qcom-fw.txt   |  41 +++
  drivers/devfreq/Kconfig|  14 +
  drivers/devfreq/Makefile   |   1 +
  drivers/devfreq/devfreq_qcom_fw.c  | 330
+
  4 files changed, 386 insertions(+)
  create mode 100644
Documentation/devicetree/bindings/devfreq/devfreq-qcom-fw.txt
  create mode 100644 drivers/devfreq/devfreq_qcom_fw.c

diff --git
a/Documentation/devicetree/bindings/devfreq/devfreq-qcom-fw.txt
b/Documentation/devicetree/bindings/devfreq/devfreq-qcom-fw.txt
new file mode 100644
index 000..f882a0b
--- /dev/null
+++ b/Documentation/devicetree/bindings/devfreq/devfreq-qcom-fw.txt
@@ -0,0 +1,41 @@
+QCOM Devfreq firmware device
+
+Some Qualcomm Technologies, Inc. (QTI) chipsets have a firmware 
that

+offloads the steps for frequency switching. It provides a table of
+supported frequencies and a register to request one of the 
supported

+freqencies.
+
+The qcom,devfreq-fw represents this firmware as a device. 
Sometimes,
+multiple entities want to vote on the frequency request that is 
sent to

the
+firmware. The qcom,devfreq-fw-voter represents these voters as 
child

+devices of the corresponding qcom,devfreq-fw device.
+
+Required properties:
+- compatible:  Must be "qcom,devfreq-fw" or
"qcom,devfreq-fw-voter"



No versions of firmware?



Sure, I can add a v1. Right now the interface has always been 
identical. I

thought if it changed in the future I'll add -v2.


Sounds like you are making up version numbers. If you don't have real
h/w or firmware version numbers, then use an SoC specific compatible
string.


+Only for qcom,devfreq-fw:
+- reg: Pairs of physical base addresses and region 
sizes

of
+   memory mapped registers.



Registers? Is this firmware or h/w block?



It's a HW block that has its own firmware.


So you have 2 things that could change: the h/w interface and the
firmware version. Make sure the compatible string(s) is specific
enough for the OS to know the exact combination.


For all practical purposes, the FW is opaque to the OS. It doesn't 
affect anything the OS can do with the IP block. So, the HW version is 
what matters. I'll figure out the actual HW version and use that.



+- reg-names:   Names of the bases for the above registers.
+   Required register regions are:
+   - "en-base": address of register to check if 
the

+ firmware is enabled.
+   - "ftbl-base": address region for the 
frequency

+ table.
+   - "perf-base": address of register to 
request a

+ frequency.
+
+Example:
+
+   qcom,devfreq-l3 {
+   compatible = "qcom,devfreq-fw";
+   reg-names = "en-base", "ftbl-base", "perf-base";
+   reg = <0x18321000 0x4>, <0x18321110 0x600>, 
<0x18321920

0x4>;
+
+   qcom,cpu0-l3 {
+   compatible = "qcom,devfreq-fw-voter";



There's no point in these nodes. They don't have any properties or
resources.



These nodes decide how many voters this device supports. Each voter 
would be
a devfreq node that will have its own governor set. For example, one 
of them

would use this governor:
http://lkml.iu.edu/hypermail/linux/kernel/1805.2/02474.html

You can also attach different devfreq-event devices to each one of 
these
voter devices based on what events you want to use for scaling each 
voter.

So, the devices are definitely needed in the larger context.


Sorry, I still don't understand.


Ok, let me try to explain. Let's take L3 as an example. Different other 
IPs might have different requirements on the L3 frequency. For example, 
little CPUs might want L3 to run at 400 MHz, big CPUs might want L3 to 
run at 1000 MHz, GPU or some other peripheral might want L3 to run at 
800 MHz. The L3 freq needs to be set to max of these requests -- in this 
case 1000 MHz. I'm trying to represent each of these "votes" on L3 as a 
device. Once I register these with devfreq, each of 

Re: [PATCH v2] PM / devfreq: Add support for QCOM devfreq firmware

2018-07-27 Thread skannan

On 2018-05-23 07:39, Rob Herring wrote:

Reviving an old thread. Sorry about the late reply. Got busy.

On Tue, May 22, 2018 at 1:30 PM, Saravana Kannan 
 wrote:

On 05/22/2018 11:08 AM, Rob Herring wrote:


On Fri, May 18, 2018 at 12:52:40AM -0700, Saravana Kannan wrote:


The firmware present in some QCOM chipsets offloads the steps 
necessary

for
changing the frequency of some devices (Eg: L3). This driver 
implements

the
devfreq interface for this firmware so that various governors could 
be

used
to scale the frequency of these devices.

Each client (say cluster 0 and cluster 1) that wants to vote for a
particular device's frequency (say, L3 frequency) is represented as 
a

separate voter device (qcom,devfreq-fw-voter) that's a child of the
firmware device (qcom,devfreq-fw).

Signed-off-by: Saravana Kannan 
---
  .../bindings/devfreq/devfreq-qcom-fw.txt   |  41 +++
  drivers/devfreq/Kconfig|  14 +
  drivers/devfreq/Makefile   |   1 +
  drivers/devfreq/devfreq_qcom_fw.c  | 330
+
  4 files changed, 386 insertions(+)
  create mode 100644
Documentation/devicetree/bindings/devfreq/devfreq-qcom-fw.txt
  create mode 100644 drivers/devfreq/devfreq_qcom_fw.c

diff --git
a/Documentation/devicetree/bindings/devfreq/devfreq-qcom-fw.txt
b/Documentation/devicetree/bindings/devfreq/devfreq-qcom-fw.txt
new file mode 100644
index 000..f882a0b
--- /dev/null
+++ b/Documentation/devicetree/bindings/devfreq/devfreq-qcom-fw.txt
@@ -0,0 +1,41 @@
+QCOM Devfreq firmware device
+
+Some Qualcomm Technologies, Inc. (QTI) chipsets have a firmware 
that

+offloads the steps for frequency switching. It provides a table of
+supported frequencies and a register to request one of the 
supported

+freqencies.
+
+The qcom,devfreq-fw represents this firmware as a device. 
Sometimes,
+multiple entities want to vote on the frequency request that is 
sent to

the
+firmware. The qcom,devfreq-fw-voter represents these voters as 
child

+devices of the corresponding qcom,devfreq-fw device.
+
+Required properties:
+- compatible:  Must be "qcom,devfreq-fw" or
"qcom,devfreq-fw-voter"



No versions of firmware?



Sure, I can add a v1. Right now the interface has always been 
identical. I

thought if it changed in the future I'll add -v2.


Sounds like you are making up version numbers. If you don't have real
h/w or firmware version numbers, then use an SoC specific compatible
string.


+Only for qcom,devfreq-fw:
+- reg: Pairs of physical base addresses and region 
sizes

of
+   memory mapped registers.



Registers? Is this firmware or h/w block?



It's a HW block that has its own firmware.


So you have 2 things that could change: the h/w interface and the
firmware version. Make sure the compatible string(s) is specific
enough for the OS to know the exact combination.


For all practical purposes, the FW is opaque to the OS. It doesn't 
affect anything the OS can do with the IP block. So, the HW version is 
what matters. I'll figure out the actual HW version and use that.



+- reg-names:   Names of the bases for the above registers.
+   Required register regions are:
+   - "en-base": address of register to check if 
the

+ firmware is enabled.
+   - "ftbl-base": address region for the 
frequency

+ table.
+   - "perf-base": address of register to 
request a

+ frequency.
+
+Example:
+
+   qcom,devfreq-l3 {
+   compatible = "qcom,devfreq-fw";
+   reg-names = "en-base", "ftbl-base", "perf-base";
+   reg = <0x18321000 0x4>, <0x18321110 0x600>, 
<0x18321920

0x4>;
+
+   qcom,cpu0-l3 {
+   compatible = "qcom,devfreq-fw-voter";



There's no point in these nodes. They don't have any properties or
resources.



These nodes decide how many voters this device supports. Each voter 
would be
a devfreq node that will have its own governor set. For example, one 
of them

would use this governor:
http://lkml.iu.edu/hypermail/linux/kernel/1805.2/02474.html

You can also attach different devfreq-event devices to each one of 
these
voter devices based on what events you want to use for scaling each 
voter.

So, the devices are definitely needed in the larger context.


Sorry, I still don't understand.


Ok, let me try to explain. Let's take L3 as an example. Different other 
IPs might have different requirements on the L3 frequency. For example, 
little CPUs might want L3 to run at 400 MHz, big CPUs might want L3 to 
run at 1000 MHz, GPU or some other peripheral might want L3 to run at 
800 MHz. The L3 freq needs to be set to max of these requests -- in this 
case 1000 MHz. I'm trying to represent each of these "votes" on L3 as a 
device. Once I register these with devfreq, each of 

Re: [PATCH 34/38] vfs: syscall: Add fsinfo() to query filesystem information [ver #10]

2018-07-27 Thread David Howells
Jann Horn  wrote:

> > fs/fat/inode.c: sprintf(buf, "cp%d", sbi->options.codepage);
> 
> sprintf(buf, "cp%d", sbi->options.codepage);
> sbi->nls_disk = load_nls(buf);
> if (!sbi->nls_disk) {
> fat_msg(sb, KERN_ERR, "codepage %s not found", buf);
> goto out_fail;
> }

Sorry, yes.  I was reading the print as the part of the display of superblock
parameters.

David


Re: [PATCH 34/38] vfs: syscall: Add fsinfo() to query filesystem information [ver #10]

2018-07-27 Thread David Howells
Jann Horn  wrote:

> > fs/fat/inode.c: sprintf(buf, "cp%d", sbi->options.codepage);
> 
> sprintf(buf, "cp%d", sbi->options.codepage);
> sbi->nls_disk = load_nls(buf);
> if (!sbi->nls_disk) {
> fat_msg(sb, KERN_ERR, "codepage %s not found", buf);
> goto out_fail;
> }

Sorry, yes.  I was reading the print as the part of the display of superblock
parameters.

David


Re: RCU nocb list not reclaiming causing OOM

2018-07-27 Thread David Chen
Hi Paul,

I wasn't talking about the xchg() though.

The smp_mb__after_atomic() is not for xchg(), it's for `*tail = 
rdp->nocb_gp_head;`
it's stated so in the comment. And I do think we need ordering between
`*tail = rdp->nocb_gp_head;` and wake up, because the waiter is checking on 
head not tail.

swait_event_interruptible(rdp->nocb_wq,
 
READ_ONCE(rdp->nocb_follower_head));

So what I'm saying is that since we need to maintain ordering between `*tail = 
rdp->nocb_gp_head;`
and wake up, we need to change the smp_mb__after_atomic() to smp_mb(). Because
smp_mb__after_atomic() wouldn't guarantee 

So this is what I'm proposing.

diff -ru linux-4.9.37.orig/kernel/rcu/tree_plugin.h 
linux-4.9.37/kernel/rcu/tree_plugin.h
--- linux-4.9.37.orig/kernel/rcu/tree_plugin.h  2017-07-12 06:42:41.0 
-0700
+++ linux-4.9.37/kernel/rcu/tree_plugin.h   2018-07-27 16:23:41.349044259 
-0700
@@ -2076,7 +2076,7 @@
/* Append callbacks to follower's "done" list. */
tail = xchg(>nocb_follower_tail, rdp->nocb_gp_tail);
*tail = rdp->nocb_gp_head;
-   smp_mb__after_atomic(); /* Store *tail before wakeup. */
+   smp_mb(); /* Store *tail before wakeup. */
if (rdp != my_rdp && tail == >nocb_follower_head) {
/*
 * List was empty, wake up the follower.

Thanks,
David

From: Paul E. McKenney 
Sent: Friday, July 27, 2018 4:47 PM
To: David Chen
Cc: linux-kernel@vger.kernel.org
Subject: Re: RCU nocb list not reclaiming causing OOM
  

On Fri, Jul 27, 2018 at 11:16:39PM +, David Chen wrote:
> Hi Paul,
> 
> Thanks for the advice.
> The bug is kind of hard to hit, so I can't say for certain yet.

Well, you can always remove the "tail == >nocb_follower_head" as an
extra belt-and-suspenders safety net.  I am not putting that in mainline,
but in the privacy of your own copy of the kernel, I don't see any really
serious problem with it.  (As long as you aren't going for absolute maximum
performance, but even then there are other more important tuning actions
and code changes you could make.)

> Though after another look at the code, I found out the 
> `smp_mb__after_atomic();`
> seems to be only a compiler barrier on x86.

Yes, and that is because the locked xchg instruction used on x86 to
implement xchg() already provides full ordering.  ;-)

>    tail = xchg(>nocb_follower_tail, rdp->nocb_gp_tail);
>    *tail = rdp->nocb_gp_head;
>    smp_mb__after_atomic(); /* Store *tail before wakeup. */
>    if (rdp != my_rdp && tail == >nocb_follower_head) {
>    swake_up(>nocb_wq);
> 
> But that wouldn't be enough right? Because from 6b5fc3a13318, it stated that
> wakeup operation don't guarantee ordering. And when the follower wakes up, it 
> checks
> for nocb_follower_head, which is assigned by `*tail = rdp->nocb_gp_head;` 
> which doesn't
> have LOCK prefix. So it's possible for follower to wake up and see the list 
> is empty and go
> back to sleep.

Again, xchg() is defined to provide full ordering against all operations
before and after it.  Each architecture is required to do whatever is
necessary to implement that full ordering, and x86 need only provide
its "lock xchg" instruction.

The smp_mb__after_atomic() has effect only after atomic read-modify-write
operations that do not return a value, for example, atomic_inc().
If you use it after a value-returning atomic read-modify-write operation
like xchg(), all you do is needlessly slow things down on platforms
that provide non-empty smp_mb__after_atomic() definitions.  So again,
smp_mb__after_atomic() after xchg() is pointless.

Please take a look at Documentation/core-api/atomic_ops.rst in the
Linux-kernel source tree for more information.  Or get a v4.17 kernel
source tree and check this using the memory model (tools/memory-model
in that version).

    Thanx, Paul

> Thanks,
> David
> 
> From: Paul E. McKenney 
> Sent: Friday, July 27, 2018 3:31 PM
> To: David Chen
> Cc: linux-kernel@vger.kernel.org
> Subject: Re: RCU nocb list not reclaiming causing OOM
>   
> 
> On Fri, Jul 27, 2018 at 07:07:46PM +, David Chen wrote:
> > Hi Paul,
> > 
> > I'd like to opinion again on this subject.
> > 
> > So we are going to backport this patch:
> > 6b5fc3a13318 ("rcu: Add memory barriers for NOCB leader wakeup")
> 
> Does this one solve the problem, or are you still seeing hangs?
> If you are no longer seeing hangs, my advice is "hands off keyboard",
> though some would no doubt point out that I should follow that advice
> more myself.  ;-)
> 
> > But the other one:
> > 8be6e1b15c54 ("rcu: Use timer as backstop for NOCB deferred wakeups")
> > It doesn't apply cleanly, and I'm not too comfortable porting it myself.
> 
> Yeah, that one is a bit on the non-trivial side, 

Re: RCU nocb list not reclaiming causing OOM

2018-07-27 Thread David Chen
Hi Paul,

I wasn't talking about the xchg() though.

The smp_mb__after_atomic() is not for xchg(), it's for `*tail = 
rdp->nocb_gp_head;`
it's stated so in the comment. And I do think we need ordering between
`*tail = rdp->nocb_gp_head;` and wake up, because the waiter is checking on 
head not tail.

swait_event_interruptible(rdp->nocb_wq,
 
READ_ONCE(rdp->nocb_follower_head));

So what I'm saying is that since we need to maintain ordering between `*tail = 
rdp->nocb_gp_head;`
and wake up, we need to change the smp_mb__after_atomic() to smp_mb(). Because
smp_mb__after_atomic() wouldn't guarantee 

So this is what I'm proposing.

diff -ru linux-4.9.37.orig/kernel/rcu/tree_plugin.h 
linux-4.9.37/kernel/rcu/tree_plugin.h
--- linux-4.9.37.orig/kernel/rcu/tree_plugin.h  2017-07-12 06:42:41.0 
-0700
+++ linux-4.9.37/kernel/rcu/tree_plugin.h   2018-07-27 16:23:41.349044259 
-0700
@@ -2076,7 +2076,7 @@
/* Append callbacks to follower's "done" list. */
tail = xchg(>nocb_follower_tail, rdp->nocb_gp_tail);
*tail = rdp->nocb_gp_head;
-   smp_mb__after_atomic(); /* Store *tail before wakeup. */
+   smp_mb(); /* Store *tail before wakeup. */
if (rdp != my_rdp && tail == >nocb_follower_head) {
/*
 * List was empty, wake up the follower.

Thanks,
David

From: Paul E. McKenney 
Sent: Friday, July 27, 2018 4:47 PM
To: David Chen
Cc: linux-kernel@vger.kernel.org
Subject: Re: RCU nocb list not reclaiming causing OOM
  

On Fri, Jul 27, 2018 at 11:16:39PM +, David Chen wrote:
> Hi Paul,
> 
> Thanks for the advice.
> The bug is kind of hard to hit, so I can't say for certain yet.

Well, you can always remove the "tail == >nocb_follower_head" as an
extra belt-and-suspenders safety net.  I am not putting that in mainline,
but in the privacy of your own copy of the kernel, I don't see any really
serious problem with it.  (As long as you aren't going for absolute maximum
performance, but even then there are other more important tuning actions
and code changes you could make.)

> Though after another look at the code, I found out the 
> `smp_mb__after_atomic();`
> seems to be only a compiler barrier on x86.

Yes, and that is because the locked xchg instruction used on x86 to
implement xchg() already provides full ordering.  ;-)

>    tail = xchg(>nocb_follower_tail, rdp->nocb_gp_tail);
>    *tail = rdp->nocb_gp_head;
>    smp_mb__after_atomic(); /* Store *tail before wakeup. */
>    if (rdp != my_rdp && tail == >nocb_follower_head) {
>    swake_up(>nocb_wq);
> 
> But that wouldn't be enough right? Because from 6b5fc3a13318, it stated that
> wakeup operation don't guarantee ordering. And when the follower wakes up, it 
> checks
> for nocb_follower_head, which is assigned by `*tail = rdp->nocb_gp_head;` 
> which doesn't
> have LOCK prefix. So it's possible for follower to wake up and see the list 
> is empty and go
> back to sleep.

Again, xchg() is defined to provide full ordering against all operations
before and after it.  Each architecture is required to do whatever is
necessary to implement that full ordering, and x86 need only provide
its "lock xchg" instruction.

The smp_mb__after_atomic() has effect only after atomic read-modify-write
operations that do not return a value, for example, atomic_inc().
If you use it after a value-returning atomic read-modify-write operation
like xchg(), all you do is needlessly slow things down on platforms
that provide non-empty smp_mb__after_atomic() definitions.  So again,
smp_mb__after_atomic() after xchg() is pointless.

Please take a look at Documentation/core-api/atomic_ops.rst in the
Linux-kernel source tree for more information.  Or get a v4.17 kernel
source tree and check this using the memory model (tools/memory-model
in that version).

    Thanx, Paul

> Thanks,
> David
> 
> From: Paul E. McKenney 
> Sent: Friday, July 27, 2018 3:31 PM
> To: David Chen
> Cc: linux-kernel@vger.kernel.org
> Subject: Re: RCU nocb list not reclaiming causing OOM
>   
> 
> On Fri, Jul 27, 2018 at 07:07:46PM +, David Chen wrote:
> > Hi Paul,
> > 
> > I'd like to opinion again on this subject.
> > 
> > So we are going to backport this patch:
> > 6b5fc3a13318 ("rcu: Add memory barriers for NOCB leader wakeup")
> 
> Does this one solve the problem, or are you still seeing hangs?
> If you are no longer seeing hangs, my advice is "hands off keyboard",
> though some would no doubt point out that I should follow that advice
> more myself.  ;-)
> 
> > But the other one:
> > 8be6e1b15c54 ("rcu: Use timer as backstop for NOCB deferred wakeups")
> > It doesn't apply cleanly, and I'm not too comfortable porting it myself.
> 
> Yeah, that one is a bit on the non-trivial side, 

Re: [PATCH] tracing: do not leak kernel addresses

2018-07-27 Thread Theodore Y. Ts'o
On Fri, Jul 27, 2018 at 03:05:43PM -0700, Sandeep Patil wrote:
> On Fri, Jul 27, 2018 at 04:21:14PM -0400, Theodore Y. Ts'o wrote:
> > On Fri, Jul 27, 2018 at 04:11:03PM -0400, Steven Rostedt wrote:
> > > That said, I would assume that
> > > other Android utilities are using other debugfs files for system
> > > status and such.
> 
> As of today, I think a lot of information in 'bugreports' is read
> out of debugfs (including things like binder stats). We do have a plan
> to change that.

Hmm, if it's only for bugreports, maybe it can be only mounted when
about root processes getting tricked into reading from debugfs.


> Indeed, I think it can. However, the problem is the last time I tried to
> remove this a whole bunch of things just broke. So, it wasn't about losing
> a functionality here and there. Agree, we need to clean up platform to not use
> debugfs first. Then we can expect Apps or other native processes to not rely
> on debugfs at all.

Is Android controlling access to debugfs files via SELinux?  If so,
then access to debugfs can be gradually cranked down as use cases are
removed.

- Ted


   


Re: [PATCH] tracing: do not leak kernel addresses

2018-07-27 Thread Theodore Y. Ts'o
On Fri, Jul 27, 2018 at 03:05:43PM -0700, Sandeep Patil wrote:
> On Fri, Jul 27, 2018 at 04:21:14PM -0400, Theodore Y. Ts'o wrote:
> > On Fri, Jul 27, 2018 at 04:11:03PM -0400, Steven Rostedt wrote:
> > > That said, I would assume that
> > > other Android utilities are using other debugfs files for system
> > > status and such.
> 
> As of today, I think a lot of information in 'bugreports' is read
> out of debugfs (including things like binder stats). We do have a plan
> to change that.

Hmm, if it's only for bugreports, maybe it can be only mounted when
about root processes getting tricked into reading from debugfs.


> Indeed, I think it can. However, the problem is the last time I tried to
> remove this a whole bunch of things just broke. So, it wasn't about losing
> a functionality here and there. Agree, we need to clean up platform to not use
> debugfs first. Then we can expect Apps or other native processes to not rely
> on debugfs at all.

Is Android controlling access to debugfs files via SELinux?  If so,
then access to debugfs can be gradually cranked down as use cases are
removed.

- Ted


   


Re: [PATCH 7/9] irqchip: add a RISC-V PLIC driver

2018-07-27 Thread Atish Patra

On 7/26/18 7:38 AM, Christoph Hellwig wrote:

This patch adds a driver for the Platform Level Interrupt Controller (PLIC)
specified as part of the RISC-V supervisor level ISA manual, in the memory
layout implemented by SiFive and qemu.

The PLIC connects global interrupt sources to the local interrupt controller
on each hart.

This driver is based on the driver in the RISC-V tree from Palmer Dabbelt,
but has been almost entirely rewritten since.

Signed-off-by: Christoph Hellwig 


I tried to boot HighFive Unleashed with the patch series after applying 
all the patches from riscv-all branch except timer & irq patches. It 
gets stuck pretty early.


Here is my github repo with all the changes:
https://github.com/atishp04/riscv-linux/commits/master_chris_cleanup_hifive

I am still looking into it.
Palmer: Did I miss something?

FWIW, here is the boot log.
- Boot log ---
[0.00] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=5
[0.00] NR_IRQS: 0, nr_irqs: 0, preallocated irqs: 0
[0.00] plic: mapped 53 interrupts to 4 (out of 9) handlers.
[0.00] clocksource: riscv_clocksource: mask: 0x 
max_cycles: 0x1d854df40, max_idle_ns: 3526361616960 ns
[0.00] Calibrating delay loop (skipped), value calculated using 
timer frequency.. 2.00 BogoMIPS (lpj=1)

[0.01] pid_max: default: 32768 minimum: 301
[0.01] Mount-cache hash table entries: 16384 (order: 5, 131072 
bytes)
[0.02] Mountpoint-cache hash table entries: 16384 (order: 5, 
131072 bytes)

[0.02] Hierarchical SRCU implementation.
[0.03] smp: Bringing up secondary CPUs ...

---
  drivers/irqchip/Kconfig  |  13 ++
  drivers/irqchip/Makefile |   1 +
  drivers/irqchip/irq-riscv-plic.c | 219 +++
  3 files changed, 233 insertions(+)
  create mode 100644 drivers/irqchip/irq-riscv-plic.c

diff --git a/drivers/irqchip/Kconfig b/drivers/irqchip/Kconfig
index e9233db16e03..5ac6dd922a0d 100644
--- a/drivers/irqchip/Kconfig
+++ b/drivers/irqchip/Kconfig
@@ -372,3 +372,16 @@ config QCOM_PDC
  IRQs for Qualcomm Technologies Inc (QTI) mobile chips.
  
  endmenu

+
+config RISCV_PLIC
+   bool "Platform-Level Interrupt Controller"
+   depends on RISCV
+   default y
+   help
+  This enables support for the PLIC chip found in standard RISC-V
+  systems.  The PLIC controls devices interrupts and connects them to
+  each core's local interrupt controller.  Aside from timer and
+  software interrupts, all other interrupt sources (MSI, GPIO, etc)
+  are subordinate to the PLIC.
+
+  If you don't know what to do here, say Y.
diff --git a/drivers/irqchip/Makefile b/drivers/irqchip/Makefile
index 15f268f646bf..bf9238da8a31 100644
--- a/drivers/irqchip/Makefile
+++ b/drivers/irqchip/Makefile
@@ -87,3 +87,4 @@ obj-$(CONFIG_MESON_IRQ_GPIO)  += irq-meson-gpio.o
  obj-$(CONFIG_GOLDFISH_PIC)+= irq-goldfish-pic.o
  obj-$(CONFIG_NDS32)   += irq-ativic32.o
  obj-$(CONFIG_QCOM_PDC)+= qcom-pdc.o
+obj-$(CONFIG_RISCV_PLIC)   += irq-riscv-plic.o
diff --git a/drivers/irqchip/irq-riscv-plic.c b/drivers/irqchip/irq-riscv-plic.c
new file mode 100644
index ..274fe2cba544
--- /dev/null
+++ b/drivers/irqchip/irq-riscv-plic.c
@@ -0,0 +1,219 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2017 SiFive
+ * Copyright (C) 2018 Christoph Hellwig
+ */
+#define pr_fmt(fmt) "plic: " fmt
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/*
+ * From the RISC-V Priviledged Spec v1.10:
+ *
+ * Global interrupt sources are assigned small unsigned integer identifiers,
+ * beginning at the value 1.  An interrupt ID of 0 is reserved to mean "no
+ * interrupt".  Interrupt identifiers are also used to break ties when two or
+ * more interrupt sources have the same assigned priority. Smaller values of
+ * interrupt ID take precedence over larger values of interrupt ID.
+ *
+ * While the RISC-V supervisor spec doesn't define the maximum number of
+ * devices supported by the PLIC, the largest number supported by devices
+ * marked as 'riscv,plic0' (which is the only device type this driver supports,
+ * and is the only extant PLIC as of now) is 1024.  As mentioned above, device
+ * 0 is defined to be non-existent so this device really only supports 1023
+ * devices.
+ */
+#define MAX_DEVICES1024
+#define MAX_CONTEXTS   15872
+


Is there any way we can preserve some of the comments in the original 
patch about memory-mapped control registers or at least a reference 
where to find the register offset calculations?


IMHO, it is helpful for anybody who is not familiar with the details.

+/*
+ * Each interrupt source has a priority register associated 

Re: [PATCH 7/9] irqchip: add a RISC-V PLIC driver

2018-07-27 Thread Atish Patra

On 7/26/18 7:38 AM, Christoph Hellwig wrote:

This patch adds a driver for the Platform Level Interrupt Controller (PLIC)
specified as part of the RISC-V supervisor level ISA manual, in the memory
layout implemented by SiFive and qemu.

The PLIC connects global interrupt sources to the local interrupt controller
on each hart.

This driver is based on the driver in the RISC-V tree from Palmer Dabbelt,
but has been almost entirely rewritten since.

Signed-off-by: Christoph Hellwig 


I tried to boot HighFive Unleashed with the patch series after applying 
all the patches from riscv-all branch except timer & irq patches. It 
gets stuck pretty early.


Here is my github repo with all the changes:
https://github.com/atishp04/riscv-linux/commits/master_chris_cleanup_hifive

I am still looking into it.
Palmer: Did I miss something?

FWIW, here is the boot log.
- Boot log ---
[0.00] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=5
[0.00] NR_IRQS: 0, nr_irqs: 0, preallocated irqs: 0
[0.00] plic: mapped 53 interrupts to 4 (out of 9) handlers.
[0.00] clocksource: riscv_clocksource: mask: 0x 
max_cycles: 0x1d854df40, max_idle_ns: 3526361616960 ns
[0.00] Calibrating delay loop (skipped), value calculated using 
timer frequency.. 2.00 BogoMIPS (lpj=1)

[0.01] pid_max: default: 32768 minimum: 301
[0.01] Mount-cache hash table entries: 16384 (order: 5, 131072 
bytes)
[0.02] Mountpoint-cache hash table entries: 16384 (order: 5, 
131072 bytes)

[0.02] Hierarchical SRCU implementation.
[0.03] smp: Bringing up secondary CPUs ...

---
  drivers/irqchip/Kconfig  |  13 ++
  drivers/irqchip/Makefile |   1 +
  drivers/irqchip/irq-riscv-plic.c | 219 +++
  3 files changed, 233 insertions(+)
  create mode 100644 drivers/irqchip/irq-riscv-plic.c

diff --git a/drivers/irqchip/Kconfig b/drivers/irqchip/Kconfig
index e9233db16e03..5ac6dd922a0d 100644
--- a/drivers/irqchip/Kconfig
+++ b/drivers/irqchip/Kconfig
@@ -372,3 +372,16 @@ config QCOM_PDC
  IRQs for Qualcomm Technologies Inc (QTI) mobile chips.
  
  endmenu

+
+config RISCV_PLIC
+   bool "Platform-Level Interrupt Controller"
+   depends on RISCV
+   default y
+   help
+  This enables support for the PLIC chip found in standard RISC-V
+  systems.  The PLIC controls devices interrupts and connects them to
+  each core's local interrupt controller.  Aside from timer and
+  software interrupts, all other interrupt sources (MSI, GPIO, etc)
+  are subordinate to the PLIC.
+
+  If you don't know what to do here, say Y.
diff --git a/drivers/irqchip/Makefile b/drivers/irqchip/Makefile
index 15f268f646bf..bf9238da8a31 100644
--- a/drivers/irqchip/Makefile
+++ b/drivers/irqchip/Makefile
@@ -87,3 +87,4 @@ obj-$(CONFIG_MESON_IRQ_GPIO)  += irq-meson-gpio.o
  obj-$(CONFIG_GOLDFISH_PIC)+= irq-goldfish-pic.o
  obj-$(CONFIG_NDS32)   += irq-ativic32.o
  obj-$(CONFIG_QCOM_PDC)+= qcom-pdc.o
+obj-$(CONFIG_RISCV_PLIC)   += irq-riscv-plic.o
diff --git a/drivers/irqchip/irq-riscv-plic.c b/drivers/irqchip/irq-riscv-plic.c
new file mode 100644
index ..274fe2cba544
--- /dev/null
+++ b/drivers/irqchip/irq-riscv-plic.c
@@ -0,0 +1,219 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2017 SiFive
+ * Copyright (C) 2018 Christoph Hellwig
+ */
+#define pr_fmt(fmt) "plic: " fmt
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/*
+ * From the RISC-V Priviledged Spec v1.10:
+ *
+ * Global interrupt sources are assigned small unsigned integer identifiers,
+ * beginning at the value 1.  An interrupt ID of 0 is reserved to mean "no
+ * interrupt".  Interrupt identifiers are also used to break ties when two or
+ * more interrupt sources have the same assigned priority. Smaller values of
+ * interrupt ID take precedence over larger values of interrupt ID.
+ *
+ * While the RISC-V supervisor spec doesn't define the maximum number of
+ * devices supported by the PLIC, the largest number supported by devices
+ * marked as 'riscv,plic0' (which is the only device type this driver supports,
+ * and is the only extant PLIC as of now) is 1024.  As mentioned above, device
+ * 0 is defined to be non-existent so this device really only supports 1023
+ * devices.
+ */
+#define MAX_DEVICES1024
+#define MAX_CONTEXTS   15872
+


Is there any way we can preserve some of the comments in the original 
patch about memory-mapped control registers or at least a reference 
where to find the register offset calculations?


IMHO, it is helpful for anybody who is not familiar with the details.

+/*
+ * Each interrupt source has a priority register associated 

RE: [GIT PULL] ACPI fix for v4.18-rc7

2018-07-27 Thread Schmauss, Erik


> -Original Message-
> From: linux-acpi-ow...@vger.kernel.org [mailto:linux-acpi-
> ow...@vger.kernel.org] On Behalf Of Schmauss, Erik
> Sent: Friday, July 27, 2018 2:51 PM
> To: Linus Torvalds ; Rafael J. Wysocki
> 
> Cc: Linux ACPI ; Linux Kernel Mailing List  ker...@vger.kernel.org>
> Subject: RE: [GIT PULL] ACPI fix for v4.18-rc7
> 
> 
> 
> > -Original Message-
> > From: linux-acpi-ow...@vger.kernel.org [mailto:linux-acpi-
> > ow...@vger.kernel.org] On Behalf Of Linus Torvalds
> > Sent: Friday, July 27, 2018 11:38 AM
> > To: Rafael J. Wysocki 
> > Cc: Linux ACPI ; Linux Kernel Mailing List
> > 
> > Subject: Re: [GIT PULL] ACPI fix for v4.18-rc7
> >
> > My XPS13 laptop is unhappy with something recent.
> >
> > As of yesterday, I get this at bootup:
> >
> >   ACPI: Added _OSI(Module Device)
> >   ACPI: Added _OSI(Processor Device)
> >   ACPI: Added _OSI(3.0 _SCP Extensions)
> >   ACPI: Added _OSI(Processor Aggregator Device)
> >   ACPI: Added _OSI(Linux-Dell-Video)
> >   ACPI Error: Result stack is empty! State=(ptrval)
> > (20180531/dswstate-65)
> >   ACPI Error: AE_AML_NO_RETURN_VALUE, Missing or null operand
> > (20180531/dsutils-612)
> >   ACPI Error: AE_AML_NO_RETURN_VALUE, While creating Arg 0
> > (20180531/dsutils-727)
> >   ACPI Error: Result stack is empty! State=(ptrval)
> > (20180531/dswstate-65)
> >   ACPI Error: AE_AML_NO_RETURN_VALUE, Missing or null operand
> > (20180531/dsutils-612)
> >   ACPI Error: AE_AML_NO_RETURN_VALUE, While creating Arg 0
> > (20180531/dsutils-727)
> >   ACPI Error: Result stack is empty! State=(ptrval)
> > (20180531/dswstate-65)
> >   ACPI Error: AE_AML_NO_RETURN_VALUE, Missing or null operand
> > (20180531/dsutils-612)
> >   ACPI Error: AE_AML_NO_RETURN_VALUE, While creating Arg 0
> > (20180531/dsutils-727)
> >   ACPI Error: Result stack is empty! State=(ptrval)
> > (20180531/dswstate-65)
> >   ACPI Error: AE_AML_NO_RETURN_VALUE, Missing or null operand
> > (20180531/dsutils-612)
> >   ACPI Error: AE_AML_NO_RETURN_VALUE, While creating Arg 0
> > (20180531/dsutils-727)
> >   ACPI Error: Result stack is empty! State=(ptrval)
> > (20180531/dswstate-65)
> >   ACPI Error: AE_AML_NO_RETURN_VALUE, Missing or null operand
> > (20180531/dsutils-612)
> >   ACPI Error: AE_AML_NO_RETURN_VALUE, While creating Arg 0
> > (20180531/dsutils-727)
> >   ACPI Error: Result stack is empty! State=(ptrval)
> > (20180531/dswstate-65)
> >   ACPI Error: AE_AML_NO_RETURN_VALUE, Missing or null operand
> > (20180531/dsutils-612)
> >   ACPI Error: AE_AML_NO_RETURN_VALUE, While creating Arg 0
> > (20180531/dsutils-727)
> >   ACPI Error: Result stack is empty! State=(ptrval)
> > (20180531/dswstate-65)
> >   ACPI Error: AE_AML_NO_RETURN_VALUE, Missing or null operand
> > (20180531/dsutils-612)
> >   ACPI Error: AE_AML_NO_RETURN_VALUE, While creating Arg 0
> > (20180531/dsutils-727)
> >   ACPI Error: Result stack is empty! State=(ptrval)
> > (20180531/dswstate-65)
> >   ACPI Error: AE_AML_NO_RETURN_VALUE, Missing or null operand
> > (20180531/dsutils-612)
> >   ACPI Error: AE_AML_NO_RETURN_VALUE, While creating Arg 0
> > (20180531/dsutils-727)
> >   ACPI Error: Result stack is empty! State=(ptrval)
> > (20180531/dswstate-65)
> >   ACPI Error: AE_AML_NO_RETURN_VALUE, Missing or null operand
> > (20180531/dsutils-612)
> >   ACPI Error: AE_AML_NO_RETURN_VALUE, While creating Arg 0
> > (20180531/dsutils-727)
> >   ACPI Error: Result stack is empty! State=(ptrval)
> > (20180531/dswstate-65)
> >   ACPI Error: AE_AML_NO_RETURN_VALUE, Missing or null operand
> > (20180531/dsutils-612)
> >   ACPI Error: AE_AML_NO_RETURN_VALUE, While creating Arg 0
> > (20180531/dsutils-727)
> >   ACPI Error: Result stack is empty! State=(ptrval)
> > (20180531/dswstate-65)
> >   ACPI Error: AE_AML_NO_RETURN_VALUE, Missing or null operand
> > (20180531/dsutils-612)
> >   ACPI Error: AE_AML_NO_RETURN_VALUE, While creating Arg 0
> > (20180531/dsutils-727)
> >   ACPI: 9 ACPI AML tables successfully acquired and loaded
> >
> > It does *not* happen in
> >
> > Linux version 4.18.0-rc6-00110-g6e77b267723c
> >
> > but it *does* happen in
> >
> > Linux version 4.18.0-rc6-00152-gcd3f77d74ac3
> >
> > and while I didn't bisect it, I'm assuming it's due to 73c2a01c52b6
> > ("ACPICA: AML Parser: ignore dispatcher error status during table
> > load")
> >
> > Ideas?
> 
> Sorry about the breakage. I found the cause this failure and I'm working on a 
> fix
> right now.
> 
> Erik

Hi,

The patch below should be able to fix this.


From a828a091828599154d8f6e8bfee1495a3df5cf34 Mon Sep 17 00:00:00 2001
From: Erik Schmauss 
Date: Fri, 27 Jul 2018 15:37:35 -0700
From: Erik Schmauss 
Subject: [PATCH 1] ACPICA: AML Parser: ignore control method status in 
module-level code

Previous change in the AML parser code blindly set all non-successful
dispatcher statuses to AE_OK. This approach is incorrect because
successful control method 

RE: [GIT PULL] ACPI fix for v4.18-rc7

2018-07-27 Thread Schmauss, Erik


> -Original Message-
> From: linux-acpi-ow...@vger.kernel.org [mailto:linux-acpi-
> ow...@vger.kernel.org] On Behalf Of Schmauss, Erik
> Sent: Friday, July 27, 2018 2:51 PM
> To: Linus Torvalds ; Rafael J. Wysocki
> 
> Cc: Linux ACPI ; Linux Kernel Mailing List  ker...@vger.kernel.org>
> Subject: RE: [GIT PULL] ACPI fix for v4.18-rc7
> 
> 
> 
> > -Original Message-
> > From: linux-acpi-ow...@vger.kernel.org [mailto:linux-acpi-
> > ow...@vger.kernel.org] On Behalf Of Linus Torvalds
> > Sent: Friday, July 27, 2018 11:38 AM
> > To: Rafael J. Wysocki 
> > Cc: Linux ACPI ; Linux Kernel Mailing List
> > 
> > Subject: Re: [GIT PULL] ACPI fix for v4.18-rc7
> >
> > My XPS13 laptop is unhappy with something recent.
> >
> > As of yesterday, I get this at bootup:
> >
> >   ACPI: Added _OSI(Module Device)
> >   ACPI: Added _OSI(Processor Device)
> >   ACPI: Added _OSI(3.0 _SCP Extensions)
> >   ACPI: Added _OSI(Processor Aggregator Device)
> >   ACPI: Added _OSI(Linux-Dell-Video)
> >   ACPI Error: Result stack is empty! State=(ptrval)
> > (20180531/dswstate-65)
> >   ACPI Error: AE_AML_NO_RETURN_VALUE, Missing or null operand
> > (20180531/dsutils-612)
> >   ACPI Error: AE_AML_NO_RETURN_VALUE, While creating Arg 0
> > (20180531/dsutils-727)
> >   ACPI Error: Result stack is empty! State=(ptrval)
> > (20180531/dswstate-65)
> >   ACPI Error: AE_AML_NO_RETURN_VALUE, Missing or null operand
> > (20180531/dsutils-612)
> >   ACPI Error: AE_AML_NO_RETURN_VALUE, While creating Arg 0
> > (20180531/dsutils-727)
> >   ACPI Error: Result stack is empty! State=(ptrval)
> > (20180531/dswstate-65)
> >   ACPI Error: AE_AML_NO_RETURN_VALUE, Missing or null operand
> > (20180531/dsutils-612)
> >   ACPI Error: AE_AML_NO_RETURN_VALUE, While creating Arg 0
> > (20180531/dsutils-727)
> >   ACPI Error: Result stack is empty! State=(ptrval)
> > (20180531/dswstate-65)
> >   ACPI Error: AE_AML_NO_RETURN_VALUE, Missing or null operand
> > (20180531/dsutils-612)
> >   ACPI Error: AE_AML_NO_RETURN_VALUE, While creating Arg 0
> > (20180531/dsutils-727)
> >   ACPI Error: Result stack is empty! State=(ptrval)
> > (20180531/dswstate-65)
> >   ACPI Error: AE_AML_NO_RETURN_VALUE, Missing or null operand
> > (20180531/dsutils-612)
> >   ACPI Error: AE_AML_NO_RETURN_VALUE, While creating Arg 0
> > (20180531/dsutils-727)
> >   ACPI Error: Result stack is empty! State=(ptrval)
> > (20180531/dswstate-65)
> >   ACPI Error: AE_AML_NO_RETURN_VALUE, Missing or null operand
> > (20180531/dsutils-612)
> >   ACPI Error: AE_AML_NO_RETURN_VALUE, While creating Arg 0
> > (20180531/dsutils-727)
> >   ACPI Error: Result stack is empty! State=(ptrval)
> > (20180531/dswstate-65)
> >   ACPI Error: AE_AML_NO_RETURN_VALUE, Missing or null operand
> > (20180531/dsutils-612)
> >   ACPI Error: AE_AML_NO_RETURN_VALUE, While creating Arg 0
> > (20180531/dsutils-727)
> >   ACPI Error: Result stack is empty! State=(ptrval)
> > (20180531/dswstate-65)
> >   ACPI Error: AE_AML_NO_RETURN_VALUE, Missing or null operand
> > (20180531/dsutils-612)
> >   ACPI Error: AE_AML_NO_RETURN_VALUE, While creating Arg 0
> > (20180531/dsutils-727)
> >   ACPI Error: Result stack is empty! State=(ptrval)
> > (20180531/dswstate-65)
> >   ACPI Error: AE_AML_NO_RETURN_VALUE, Missing or null operand
> > (20180531/dsutils-612)
> >   ACPI Error: AE_AML_NO_RETURN_VALUE, While creating Arg 0
> > (20180531/dsutils-727)
> >   ACPI Error: Result stack is empty! State=(ptrval)
> > (20180531/dswstate-65)
> >   ACPI Error: AE_AML_NO_RETURN_VALUE, Missing or null operand
> > (20180531/dsutils-612)
> >   ACPI Error: AE_AML_NO_RETURN_VALUE, While creating Arg 0
> > (20180531/dsutils-727)
> >   ACPI Error: Result stack is empty! State=(ptrval)
> > (20180531/dswstate-65)
> >   ACPI Error: AE_AML_NO_RETURN_VALUE, Missing or null operand
> > (20180531/dsutils-612)
> >   ACPI Error: AE_AML_NO_RETURN_VALUE, While creating Arg 0
> > (20180531/dsutils-727)
> >   ACPI: 9 ACPI AML tables successfully acquired and loaded
> >
> > It does *not* happen in
> >
> > Linux version 4.18.0-rc6-00110-g6e77b267723c
> >
> > but it *does* happen in
> >
> > Linux version 4.18.0-rc6-00152-gcd3f77d74ac3
> >
> > and while I didn't bisect it, I'm assuming it's due to 73c2a01c52b6
> > ("ACPICA: AML Parser: ignore dispatcher error status during table
> > load")
> >
> > Ideas?
> 
> Sorry about the breakage. I found the cause this failure and I'm working on a 
> fix
> right now.
> 
> Erik

Hi,

The patch below should be able to fix this.


From a828a091828599154d8f6e8bfee1495a3df5cf34 Mon Sep 17 00:00:00 2001
From: Erik Schmauss 
Date: Fri, 27 Jul 2018 15:37:35 -0700
From: Erik Schmauss 
Subject: [PATCH 1] ACPICA: AML Parser: ignore control method status in 
module-level code

Previous change in the AML parser code blindly set all non-successful
dispatcher statuses to AE_OK. This approach is incorrect because
successful control method 

Re: [PATCH v2 3/3] selftests/ftrace: Fix kprobe string testcase to not probe notrace function

2018-07-27 Thread Masami Hiramatsu
On Fri, 27 Jul 2018 17:43:07 -0400
Steven Rostedt  wrote:

> On Thu, 26 Jul 2018 14:54:23 +0900
> Masami Hiramatsu  wrote:
> 
> > Fix kprobe string argument testcase to not probe notrace
> > function. Instead, it probes tracefs function which must
> > be available with ftrace.
> 
> Hi Masami,
> 
> With these patches applied, this test fails:
> 
>  ./ftracetest  test.d/kprobe/probepoint.tc
> 
> Gives me the error in dmseg:
> 
>   trace_kprobe: Could not probe notrace function create_trace_kprobe
> 
> Is this expected? I don't see create_trace_kprobe as a notrace function
> either.
> 
> I even applied patch 3 (going through Shuah's tree) to me test and it
> still fails.

Oops, why I missed this...
OK, I'll fix it.

Thanks,

> 
> Config attached.
> 
> -- Steve
> 
> > 
> > Signed-off-by: Masami Hiramatsu 
> > ---
> >  .../ftrace/test.d/kprobe/kprobe_args_string.tc |   30 
> > 
> >  1 file changed, 12 insertions(+), 18 deletions(-)
> > 
> > diff --git 
> > a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc 
> > b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc
> > index a0002563e9ee..1ad70cdaf442 100644
> > --- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc
> > +++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc
> > @@ -9,28 +9,22 @@ echo > kprobe_events
> >  
> >  case `uname -m` in
> >  x86_64)
> > -  ARG2=%si
> > -  OFFS=8
> > +  ARG1=%di
> >  ;;
> >  i[3456]86)
> > -  ARG2=%cx
> > -  OFFS=4
> > +  ARG1=%ax
> >  ;;
> >  aarch64)
> > -  ARG2=%x1
> > -  OFFS=8
> > +  ARG1=%x0
> >  ;;
> >  arm*)
> > -  ARG2=%r1
> > -  OFFS=4
> > +  ARG1=%r0
> >  ;;
> >  ppc64*)
> > -  ARG2=%r4
> > -  OFFS=8
> > +  ARG1=%r3
> >  ;;
> >  ppc*)
> > -  ARG2=%r4
> > -  OFFS=4
> > +  ARG1=%r3
> >  ;;
> >  *)
> >echo "Please implement other architecture here"
> > @@ -38,17 +32,17 @@ ppc*)
> >  esac
> >  
> >  : "Test get argument (1)"
> > -echo "p:testprobe create_trace_kprobe arg1=+0(+0(${ARG2})):string" > 
> > kprobe_events
> > +echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):string" > 
> > kprobe_events
> >  echo 1 > events/kprobes/testprobe/enable
> > -! echo test >> kprobe_events
> > -tail -n 1 trace | grep -qe "testprobe.* arg1=\"test\""
> > +echo "p:test _do_fork" >> kprobe_events
> > +grep -qe "testprobe.* arg1=\"test\"" trace
> >  
> >  echo 0 > events/kprobes/testprobe/enable
> >  : "Test get argument (2)"
> > -echo "p:testprobe create_trace_kprobe arg1=+0(+0(${ARG2})):string 
> > arg2=+0(+${OFFS}(${ARG2})):string" > kprobe_events
> > +echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):string 
> > arg2=+0(${ARG1}):string" > kprobe_events
> >  echo 1 > events/kprobes/testprobe/enable
> > -! echo test1 test2 >> kprobe_events
> > -tail -n 1 trace | grep -qe "testprobe.* arg1=\"test1\" arg2=\"test2\""
> > +echo "p:test _do_fork" >> kprobe_events
> > +grep -qe "testprobe.* arg1=\"test\" arg2=\"test\"" trace
> >  
> >  echo 0 > events/enable
> >  echo > kprobe_events
> 


-- 
Masami Hiramatsu 


Re: [PATCH v2 3/3] selftests/ftrace: Fix kprobe string testcase to not probe notrace function

2018-07-27 Thread Masami Hiramatsu
On Fri, 27 Jul 2018 17:43:07 -0400
Steven Rostedt  wrote:

> On Thu, 26 Jul 2018 14:54:23 +0900
> Masami Hiramatsu  wrote:
> 
> > Fix kprobe string argument testcase to not probe notrace
> > function. Instead, it probes tracefs function which must
> > be available with ftrace.
> 
> Hi Masami,
> 
> With these patches applied, this test fails:
> 
>  ./ftracetest  test.d/kprobe/probepoint.tc
> 
> Gives me the error in dmseg:
> 
>   trace_kprobe: Could not probe notrace function create_trace_kprobe
> 
> Is this expected? I don't see create_trace_kprobe as a notrace function
> either.
> 
> I even applied patch 3 (going through Shuah's tree) to me test and it
> still fails.

Oops, why I missed this...
OK, I'll fix it.

Thanks,

> 
> Config attached.
> 
> -- Steve
> 
> > 
> > Signed-off-by: Masami Hiramatsu 
> > ---
> >  .../ftrace/test.d/kprobe/kprobe_args_string.tc |   30 
> > 
> >  1 file changed, 12 insertions(+), 18 deletions(-)
> > 
> > diff --git 
> > a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc 
> > b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc
> > index a0002563e9ee..1ad70cdaf442 100644
> > --- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc
> > +++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc
> > @@ -9,28 +9,22 @@ echo > kprobe_events
> >  
> >  case `uname -m` in
> >  x86_64)
> > -  ARG2=%si
> > -  OFFS=8
> > +  ARG1=%di
> >  ;;
> >  i[3456]86)
> > -  ARG2=%cx
> > -  OFFS=4
> > +  ARG1=%ax
> >  ;;
> >  aarch64)
> > -  ARG2=%x1
> > -  OFFS=8
> > +  ARG1=%x0
> >  ;;
> >  arm*)
> > -  ARG2=%r1
> > -  OFFS=4
> > +  ARG1=%r0
> >  ;;
> >  ppc64*)
> > -  ARG2=%r4
> > -  OFFS=8
> > +  ARG1=%r3
> >  ;;
> >  ppc*)
> > -  ARG2=%r4
> > -  OFFS=4
> > +  ARG1=%r3
> >  ;;
> >  *)
> >echo "Please implement other architecture here"
> > @@ -38,17 +32,17 @@ ppc*)
> >  esac
> >  
> >  : "Test get argument (1)"
> > -echo "p:testprobe create_trace_kprobe arg1=+0(+0(${ARG2})):string" > 
> > kprobe_events
> > +echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):string" > 
> > kprobe_events
> >  echo 1 > events/kprobes/testprobe/enable
> > -! echo test >> kprobe_events
> > -tail -n 1 trace | grep -qe "testprobe.* arg1=\"test\""
> > +echo "p:test _do_fork" >> kprobe_events
> > +grep -qe "testprobe.* arg1=\"test\"" trace
> >  
> >  echo 0 > events/kprobes/testprobe/enable
> >  : "Test get argument (2)"
> > -echo "p:testprobe create_trace_kprobe arg1=+0(+0(${ARG2})):string 
> > arg2=+0(+${OFFS}(${ARG2})):string" > kprobe_events
> > +echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):string 
> > arg2=+0(${ARG1}):string" > kprobe_events
> >  echo 1 > events/kprobes/testprobe/enable
> > -! echo test1 test2 >> kprobe_events
> > -tail -n 1 trace | grep -qe "testprobe.* arg1=\"test1\" arg2=\"test2\""
> > +echo "p:test _do_fork" >> kprobe_events
> > +grep -qe "testprobe.* arg1=\"test\" arg2=\"test\"" trace
> >  
> >  echo 0 > events/enable
> >  echo > kprobe_events
> 


-- 
Masami Hiramatsu 


Re: [PATCH 34/38] vfs: syscall: Add fsinfo() to query filesystem information [ver #10]

2018-07-27 Thread Jann Horn
On Sat, Jul 28, 2018 at 1:51 AM David Howells  wrote:
> David Howells  wrote:
>
> > One thing I'm confused about is that fat has both a codepage and a charset 
> > and
> > I'm not sure of the difference.
>
> In fact, it's not clear that the codepage is actually used.
>
> warthog>git grep '[.>]codepage'
> fs/fat/inode.c: opts->codepage = fat_default_codepage;
> fs/fat/inode.c: opts->codepage = option;
> fs/fat/inode.c: sprintf(buf, "cp%d", sbi->options.codepage);

sprintf(buf, "cp%d", sbi->options.codepage);
sbi->nls_disk = load_nls(buf);
if (!sbi->nls_disk) {
fat_msg(sb, KERN_ERR, "codepage %s not found", buf);
goto out_fail;
}


Re: [PATCH 34/38] vfs: syscall: Add fsinfo() to query filesystem information [ver #10]

2018-07-27 Thread Jann Horn
On Sat, Jul 28, 2018 at 1:51 AM David Howells  wrote:
> David Howells  wrote:
>
> > One thing I'm confused about is that fat has both a codepage and a charset 
> > and
> > I'm not sure of the difference.
>
> In fact, it's not clear that the codepage is actually used.
>
> warthog>git grep '[.>]codepage'
> fs/fat/inode.c: opts->codepage = fat_default_codepage;
> fs/fat/inode.c: opts->codepage = option;
> fs/fat/inode.c: sprintf(buf, "cp%d", sbi->options.codepage);

sprintf(buf, "cp%d", sbi->options.codepage);
sbi->nls_disk = load_nls(buf);
if (!sbi->nls_disk) {
fat_msg(sb, KERN_ERR, "codepage %s not found", buf);
goto out_fail;
}


Re: [PATCH 34/38] vfs: syscall: Add fsinfo() to query filesystem information [ver #10]

2018-07-27 Thread David Howells
David Howells  wrote:

> One thing I'm confused about is that fat has both a codepage and a charset and
> I'm not sure of the difference.

In fact, it's not clear that the codepage is actually used.

warthog>git grep '[.>]codepage'
fs/fat/inode.c: opts->codepage = fat_default_codepage;
fs/fat/inode.c: opts->codepage = option;
fs/fat/inode.c: sprintf(buf, "cp%d", sbi->options.codepage);

David


Re: [PATCH 34/38] vfs: syscall: Add fsinfo() to query filesystem information [ver #10]

2018-07-27 Thread David Howells
David Howells  wrote:

> One thing I'm confused about is that fat has both a codepage and a charset and
> I'm not sure of the difference.

In fact, it's not clear that the codepage is actually used.

warthog>git grep '[.>]codepage'
fs/fat/inode.c: opts->codepage = fat_default_codepage;
fs/fat/inode.c: opts->codepage = option;
fs/fat/inode.c: sprintf(buf, "cp%d", sbi->options.codepage);

David


Re: [PATCH 34/38] vfs: syscall: Add fsinfo() to query filesystem information [ver #10]

2018-07-27 Thread David Howells
Jann Horn  wrote:

> > +static int fsinfo_generic_name_encoding(struct dentry *dentry, char *buf)
> > +{
> > +   static const char encoding[] = "utf8";
> > +
> > +   if (buf)
> > +   memcpy(buf, encoding, sizeof(encoding) - 1);
> > +   return sizeof(encoding) - 1;
> > +}
> 
> Is this meant to be "encoding to be used by userspace" or "encoding of
> on-disk filenames"?

The latter.

> Are there any plans to create filesystems that behave differently?

isofs, fat, ntfs, cifs for example.

> If the latter: This is wrong for e.g. a vfat mount that uses a codepage,
> right?  Should the default in that case not be "I don't know"?

Quite possibly.  Note that it could also be what you're interpreting it as
because the codepage got overridden by a mount parameter rather than what's on
the disk (assuming the medium actually records this).

One thing I'm confused about is that fat has both a codepage and a charset and
I'm not sure of the difference.

David


Re: [PATCH 34/38] vfs: syscall: Add fsinfo() to query filesystem information [ver #10]

2018-07-27 Thread David Howells
Jann Horn  wrote:

> > +static int fsinfo_generic_name_encoding(struct dentry *dentry, char *buf)
> > +{
> > +   static const char encoding[] = "utf8";
> > +
> > +   if (buf)
> > +   memcpy(buf, encoding, sizeof(encoding) - 1);
> > +   return sizeof(encoding) - 1;
> > +}
> 
> Is this meant to be "encoding to be used by userspace" or "encoding of
> on-disk filenames"?

The latter.

> Are there any plans to create filesystems that behave differently?

isofs, fat, ntfs, cifs for example.

> If the latter: This is wrong for e.g. a vfat mount that uses a codepage,
> right?  Should the default in that case not be "I don't know"?

Quite possibly.  Note that it could also be what you're interpreting it as
because the codepage got overridden by a mount parameter rather than what's on
the disk (assuming the medium actually records this).

One thing I'm confused about is that fat has both a codepage and a charset and
I'm not sure of the difference.

David


Re: RCU nocb list not reclaiming causing OOM

2018-07-27 Thread Paul E. McKenney
On Fri, Jul 27, 2018 at 11:16:39PM +, David Chen wrote:
> Hi Paul,
> 
> Thanks for the advice.
> The bug is kind of hard to hit, so I can't say for certain yet.

Well, you can always remove the "tail == >nocb_follower_head" as an
extra belt-and-suspenders safety net.  I am not putting that in mainline,
but in the privacy of your own copy of the kernel, I don't see any really
serious problem with it.  (As long as you aren't going for absolute maximum
performance, but even then there are other more important tuning actions
and code changes you could make.)

> Though after another look at the code, I found out the 
> `smp_mb__after_atomic();`
> seems to be only a compiler barrier on x86.

Yes, and that is because the locked xchg instruction used on x86 to
implement xchg() already provides full ordering.  ;-)

>   tail = xchg(>nocb_follower_tail, rdp->nocb_gp_tail);
>   *tail = rdp->nocb_gp_head;
>   smp_mb__after_atomic(); /* Store *tail before wakeup. */
>   if (rdp != my_rdp && tail == >nocb_follower_head) {
>   swake_up(>nocb_wq);
> 
> But that wouldn't be enough right? Because from 6b5fc3a13318, it stated that
> wakeup operation don't guarantee ordering. And when the follower wakes up, it 
> checks
> for nocb_follower_head, which is assigned by `*tail = rdp->nocb_gp_head;` 
> which doesn't
> have LOCK prefix. So it's possible for follower to wake up and see the list 
> is empty and go
> back to sleep.

Again, xchg() is defined to provide full ordering against all operations
before and after it.  Each architecture is required to do whatever is
necessary to implement that full ordering, and x86 need only provide
its "lock xchg" instruction.

The smp_mb__after_atomic() has effect only after atomic read-modify-write
operations that do not return a value, for example, atomic_inc().
If you use it after a value-returning atomic read-modify-write operation
like xchg(), all you do is needlessly slow things down on platforms
that provide non-empty smp_mb__after_atomic() definitions.  So again,
smp_mb__after_atomic() after xchg() is pointless.

Please take a look at Documentation/core-api/atomic_ops.rst in the
Linux-kernel source tree for more information.  Or get a v4.17 kernel
source tree and check this using the memory model (tools/memory-model
in that version).

Thanx, Paul

> Thanks,
> David
> 
> From: Paul E. McKenney 
> Sent: Friday, July 27, 2018 3:31 PM
> To: David Chen
> Cc: linux-kernel@vger.kernel.org
> Subject: Re: RCU nocb list not reclaiming causing OOM
>   
> 
> On Fri, Jul 27, 2018 at 07:07:46PM +, David Chen wrote:
> > Hi Paul,
> > 
> > I'd like to opinion again on this subject.
> > 
> > So we are going to backport this patch:
> > 6b5fc3a13318 ("rcu: Add memory barriers for NOCB leader wakeup")
> 
> Does this one solve the problem, or are you still seeing hangs?
> If you are no longer seeing hangs, my advice is "hands off keyboard",
> though some would no doubt point out that I should follow that advice
> more myself.  ;-)
> 
> > But the other one:
> > 8be6e1b15c54 ("rcu: Use timer as backstop for NOCB deferred wakeups")
> > It doesn't apply cleanly, and I'm not too comfortable porting it myself.
> 
> Yeah, that one is a bit on the non-trivial side, no two ways about it.
> 
> > So I'm wondering if I use the following change to always wake up follower 
> > thread
> > regardless if the list was empty or not, just to be on the safe side. Do 
> > you think
> > this change is reasonable? Do you see any problem it might cause?
> > 
> > Thanks,
> > David
> > 
> > diff -ru linux-4.9.37.orig/kernel/rcu/tree_plugin.h 
> > linux-4.9.37/kernel/rcu/tree_plugin.h
> > --- linux-4.9.37.orig/kernel/rcu/tree_plugin.h    2017-07-12 
> > 06:42:41.0 -0700
> > +++ linux-4.9.37/kernel/rcu/tree_plugin.h 2018-07-27 11:57:03.582134519 
> > -0700
> > @@ -2077,7 +2077,7 @@
> >    tail = xchg(>nocb_follower_tail, rdp->nocb_gp_tail);
> >    *tail = rdp->nocb_gp_head;
> >    smp_mb__after_atomic(); /* Store *tail before wakeup. */
> > - if (rdp != my_rdp && tail == >nocb_follower_head) {
> > + if (rdp != my_rdp) {
> 
> This will burn a bit of extra CPU time, but it should be fine other than
> that.
> 
>     Thanx, Paul
> 
> >    /*
> > * List was empty, wake up the follower.
> > * Memory barriers supplied by atomic_long_add().
> > 
> > 
> > From: David Chen
> > Sent: Friday, July 20, 2018 5:12 PM
> > To: paul...@linux.vnet.ibm.com
> > Cc: linux-kernel@vger.kernel.org
> > Subject: Re: RCU nocb list not reclaiming causing OOM
> >    
> > 
> > Hi Paul,
> > 
> > Ok, I'll try those patches.
> > 
> > Thanks,
> > David
> >   
> > From: Paul E. McKenney 
> > Sent: Friday, July 20, 2018 4:32:12 PM
> > To: David 

Re: RCU nocb list not reclaiming causing OOM

2018-07-27 Thread Paul E. McKenney
On Fri, Jul 27, 2018 at 11:16:39PM +, David Chen wrote:
> Hi Paul,
> 
> Thanks for the advice.
> The bug is kind of hard to hit, so I can't say for certain yet.

Well, you can always remove the "tail == >nocb_follower_head" as an
extra belt-and-suspenders safety net.  I am not putting that in mainline,
but in the privacy of your own copy of the kernel, I don't see any really
serious problem with it.  (As long as you aren't going for absolute maximum
performance, but even then there are other more important tuning actions
and code changes you could make.)

> Though after another look at the code, I found out the 
> `smp_mb__after_atomic();`
> seems to be only a compiler barrier on x86.

Yes, and that is because the locked xchg instruction used on x86 to
implement xchg() already provides full ordering.  ;-)

>   tail = xchg(>nocb_follower_tail, rdp->nocb_gp_tail);
>   *tail = rdp->nocb_gp_head;
>   smp_mb__after_atomic(); /* Store *tail before wakeup. */
>   if (rdp != my_rdp && tail == >nocb_follower_head) {
>   swake_up(>nocb_wq);
> 
> But that wouldn't be enough right? Because from 6b5fc3a13318, it stated that
> wakeup operation don't guarantee ordering. And when the follower wakes up, it 
> checks
> for nocb_follower_head, which is assigned by `*tail = rdp->nocb_gp_head;` 
> which doesn't
> have LOCK prefix. So it's possible for follower to wake up and see the list 
> is empty and go
> back to sleep.

Again, xchg() is defined to provide full ordering against all operations
before and after it.  Each architecture is required to do whatever is
necessary to implement that full ordering, and x86 need only provide
its "lock xchg" instruction.

The smp_mb__after_atomic() has effect only after atomic read-modify-write
operations that do not return a value, for example, atomic_inc().
If you use it after a value-returning atomic read-modify-write operation
like xchg(), all you do is needlessly slow things down on platforms
that provide non-empty smp_mb__after_atomic() definitions.  So again,
smp_mb__after_atomic() after xchg() is pointless.

Please take a look at Documentation/core-api/atomic_ops.rst in the
Linux-kernel source tree for more information.  Or get a v4.17 kernel
source tree and check this using the memory model (tools/memory-model
in that version).

Thanx, Paul

> Thanks,
> David
> 
> From: Paul E. McKenney 
> Sent: Friday, July 27, 2018 3:31 PM
> To: David Chen
> Cc: linux-kernel@vger.kernel.org
> Subject: Re: RCU nocb list not reclaiming causing OOM
>   
> 
> On Fri, Jul 27, 2018 at 07:07:46PM +, David Chen wrote:
> > Hi Paul,
> > 
> > I'd like to opinion again on this subject.
> > 
> > So we are going to backport this patch:
> > 6b5fc3a13318 ("rcu: Add memory barriers for NOCB leader wakeup")
> 
> Does this one solve the problem, or are you still seeing hangs?
> If you are no longer seeing hangs, my advice is "hands off keyboard",
> though some would no doubt point out that I should follow that advice
> more myself.  ;-)
> 
> > But the other one:
> > 8be6e1b15c54 ("rcu: Use timer as backstop for NOCB deferred wakeups")
> > It doesn't apply cleanly, and I'm not too comfortable porting it myself.
> 
> Yeah, that one is a bit on the non-trivial side, no two ways about it.
> 
> > So I'm wondering if I use the following change to always wake up follower 
> > thread
> > regardless if the list was empty or not, just to be on the safe side. Do 
> > you think
> > this change is reasonable? Do you see any problem it might cause?
> > 
> > Thanks,
> > David
> > 
> > diff -ru linux-4.9.37.orig/kernel/rcu/tree_plugin.h 
> > linux-4.9.37/kernel/rcu/tree_plugin.h
> > --- linux-4.9.37.orig/kernel/rcu/tree_plugin.h    2017-07-12 
> > 06:42:41.0 -0700
> > +++ linux-4.9.37/kernel/rcu/tree_plugin.h 2018-07-27 11:57:03.582134519 
> > -0700
> > @@ -2077,7 +2077,7 @@
> >    tail = xchg(>nocb_follower_tail, rdp->nocb_gp_tail);
> >    *tail = rdp->nocb_gp_head;
> >    smp_mb__after_atomic(); /* Store *tail before wakeup. */
> > - if (rdp != my_rdp && tail == >nocb_follower_head) {
> > + if (rdp != my_rdp) {
> 
> This will burn a bit of extra CPU time, but it should be fine other than
> that.
> 
>     Thanx, Paul
> 
> >    /*
> > * List was empty, wake up the follower.
> > * Memory barriers supplied by atomic_long_add().
> > 
> > 
> > From: David Chen
> > Sent: Friday, July 20, 2018 5:12 PM
> > To: paul...@linux.vnet.ibm.com
> > Cc: linux-kernel@vger.kernel.org
> > Subject: Re: RCU nocb list not reclaiming causing OOM
> >    
> > 
> > Hi Paul,
> > 
> > Ok, I'll try those patches.
> > 
> > Thanks,
> > David
> >   
> > From: Paul E. McKenney 
> > Sent: Friday, July 20, 2018 4:32:12 PM
> > To: David 

Re: [PATCH 0/10] psi: pressure stall information for CPU, memory, and IO v2

2018-07-27 Thread Suren Baghdasaryan
On Thu, Jul 26, 2018 at 1:07 PM, Johannes Weiner  wrote:
> On Thu, Jul 26, 2018 at 11:07:32AM +1000, Singh, Balbir wrote:
>> On 7/25/18 1:15 AM, Johannes Weiner wrote:
>> > On Tue, Jul 24, 2018 at 07:14:02AM +1000, Balbir Singh wrote:
>> >> Does the mechanism scale? I am a little concerned about how frequently
>> >> this infrastructure is monitored/read/acted upon.
>> >
>> > I expect most users to poll in the frequency ballpark of the running
>> > averages (10s, 1m, 5m). Our OOMD defaults to 5s polling of the 10s
>> > average; we collect the 1m average once per minute from our machines
>> > and cgroups to log the system/workload health trends in our fleet.
>> >
>> > Suren has been experimenting with adaptive polling down to the
>> > millisecond range on Android.
>> >
>>
>> I think this is a bad way of doing things, polling only adds to
>> overheads, there needs to be an event driven mechanism and the
>> selection of the events need to happen in user space.
>
> Of course, I'm not saying you should be doing this, and in fact Suren
> and I were talking about notification/event infrastructure.

I implemented a psi-monitor prototype which allows userspace to
specify the max PSI stall it can tolerate (in terms of % of time spent
on memory management). When that threshold is breached an event to
userspace is generated. I'm still testing it but early results look
promising. I'm planning to send it upstream when it's ready and after
the main PSI patchset is merged.

>
> You asked if this scales and I'm telling you it's not impossible to
> read at such frequencies.
>

Yes it's doable. One usecase might be to poll at a higher rate for a
short period of time immediately after the initial event is received
to clarify the short-term signal dynamics.

> Maybe you can clarify your question.
>
>> >> Why aren't existing mechanisms sufficient
>> >
>> > Our existing stuff gives a lot of indication when something *may* be
>> > an issue, like the rate of page reclaim, the number of refaults, the
>> > average number of active processes, one task waiting on a resource.
>> >
>> > But the real difference between an issue and a non-issue is how much
>> > it affects your overall goal of making forward progress or reacting to
>> > a request in time. And that's the only thing users really care
>> > about. It doesn't matter whether my system is doing 2314 or 6723 page
>> > refaults per minute, or scanned 8495 pages recently. I need to know
>> > whether I'm losing 1% or 20% of my time on overcommitted memory.
>> >
>> > Delayacct is time-based, so it's a step in the right direction, but it
>> > doesn't aggregate tasks and CPUs into compound productivity states to
>> > tell you if only parts of your workload are seeing delays (which is
>> > often tolerable for the purpose of ensuring maximum HW utilization) or
>> > your system overall is not making forward progress. That aggregation
>> > isn't something you can do in userspace with polled delayacct data.
>>
>> By aggregation you mean cgroup aggregation?
>
> System-wide and per cgroup.


Re: [PATCH 0/10] psi: pressure stall information for CPU, memory, and IO v2

2018-07-27 Thread Suren Baghdasaryan
On Thu, Jul 26, 2018 at 1:07 PM, Johannes Weiner  wrote:
> On Thu, Jul 26, 2018 at 11:07:32AM +1000, Singh, Balbir wrote:
>> On 7/25/18 1:15 AM, Johannes Weiner wrote:
>> > On Tue, Jul 24, 2018 at 07:14:02AM +1000, Balbir Singh wrote:
>> >> Does the mechanism scale? I am a little concerned about how frequently
>> >> this infrastructure is monitored/read/acted upon.
>> >
>> > I expect most users to poll in the frequency ballpark of the running
>> > averages (10s, 1m, 5m). Our OOMD defaults to 5s polling of the 10s
>> > average; we collect the 1m average once per minute from our machines
>> > and cgroups to log the system/workload health trends in our fleet.
>> >
>> > Suren has been experimenting with adaptive polling down to the
>> > millisecond range on Android.
>> >
>>
>> I think this is a bad way of doing things, polling only adds to
>> overheads, there needs to be an event driven mechanism and the
>> selection of the events need to happen in user space.
>
> Of course, I'm not saying you should be doing this, and in fact Suren
> and I were talking about notification/event infrastructure.

I implemented a psi-monitor prototype which allows userspace to
specify the max PSI stall it can tolerate (in terms of % of time spent
on memory management). When that threshold is breached an event to
userspace is generated. I'm still testing it but early results look
promising. I'm planning to send it upstream when it's ready and after
the main PSI patchset is merged.

>
> You asked if this scales and I'm telling you it's not impossible to
> read at such frequencies.
>

Yes it's doable. One usecase might be to poll at a higher rate for a
short period of time immediately after the initial event is received
to clarify the short-term signal dynamics.

> Maybe you can clarify your question.
>
>> >> Why aren't existing mechanisms sufficient
>> >
>> > Our existing stuff gives a lot of indication when something *may* be
>> > an issue, like the rate of page reclaim, the number of refaults, the
>> > average number of active processes, one task waiting on a resource.
>> >
>> > But the real difference between an issue and a non-issue is how much
>> > it affects your overall goal of making forward progress or reacting to
>> > a request in time. And that's the only thing users really care
>> > about. It doesn't matter whether my system is doing 2314 or 6723 page
>> > refaults per minute, or scanned 8495 pages recently. I need to know
>> > whether I'm losing 1% or 20% of my time on overcommitted memory.
>> >
>> > Delayacct is time-based, so it's a step in the right direction, but it
>> > doesn't aggregate tasks and CPUs into compound productivity states to
>> > tell you if only parts of your workload are seeing delays (which is
>> > often tolerable for the purpose of ensuring maximum HW utilization) or
>> > your system overall is not making forward progress. That aggregation
>> > isn't something you can do in userspace with polled delayacct data.
>>
>> By aggregation you mean cgroup aggregation?
>
> System-wide and per cgroup.


Re: RCU nocb list not reclaiming causing OOM

2018-07-27 Thread David Chen
Hi Paul,

Thanks for the advice.
The bug is kind of hard to hit, so I can't say for certain yet.

Though after another look at the code, I found out the `smp_mb__after_atomic();`
seems to be only a compiler barrier on x86.

tail = xchg(>nocb_follower_tail, rdp->nocb_gp_tail);
*tail = rdp->nocb_gp_head;
smp_mb__after_atomic(); /* Store *tail before wakeup. */
if (rdp != my_rdp && tail == >nocb_follower_head) {
swake_up(>nocb_wq);

But that wouldn't be enough right? Because from 6b5fc3a13318, it stated that
wakeup operation don't guarantee ordering. And when the follower wakes up, it 
checks
for nocb_follower_head, which is assigned by `*tail = rdp->nocb_gp_head;` which 
doesn't
have LOCK prefix. So it's possible for follower to wake up and see the list is 
empty and go
back to sleep.

Thanks,
David

From: Paul E. McKenney 
Sent: Friday, July 27, 2018 3:31 PM
To: David Chen
Cc: linux-kernel@vger.kernel.org
Subject: Re: RCU nocb list not reclaiming causing OOM
  

On Fri, Jul 27, 2018 at 07:07:46PM +, David Chen wrote:
> Hi Paul,
> 
> I'd like to opinion again on this subject.
> 
> So we are going to backport this patch:
> 6b5fc3a13318 ("rcu: Add memory barriers for NOCB leader wakeup")

Does this one solve the problem, or are you still seeing hangs?
If you are no longer seeing hangs, my advice is "hands off keyboard",
though some would no doubt point out that I should follow that advice
more myself.  ;-)

> But the other one:
> 8be6e1b15c54 ("rcu: Use timer as backstop for NOCB deferred wakeups")
> It doesn't apply cleanly, and I'm not too comfortable porting it myself.

Yeah, that one is a bit on the non-trivial side, no two ways about it.

> So I'm wondering if I use the following change to always wake up follower 
> thread
> regardless if the list was empty or not, just to be on the safe side. Do you 
> think
> this change is reasonable? Do you see any problem it might cause?
> 
> Thanks,
> David
> 
> diff -ru linux-4.9.37.orig/kernel/rcu/tree_plugin.h 
> linux-4.9.37/kernel/rcu/tree_plugin.h
> --- linux-4.9.37.orig/kernel/rcu/tree_plugin.h    2017-07-12 
> 06:42:41.0 -0700
> +++ linux-4.9.37/kernel/rcu/tree_plugin.h 2018-07-27 11:57:03.582134519 
> -0700
> @@ -2077,7 +2077,7 @@
>    tail = xchg(>nocb_follower_tail, rdp->nocb_gp_tail);
>    *tail = rdp->nocb_gp_head;
>    smp_mb__after_atomic(); /* Store *tail before wakeup. */
> - if (rdp != my_rdp && tail == >nocb_follower_head) {
> + if (rdp != my_rdp) {

This will burn a bit of extra CPU time, but it should be fine other than
that.

    Thanx, Paul

>    /*
> * List was empty, wake up the follower.
> * Memory barriers supplied by atomic_long_add().
> 
> 
> From: David Chen
> Sent: Friday, July 20, 2018 5:12 PM
> To: paul...@linux.vnet.ibm.com
> Cc: linux-kernel@vger.kernel.org
> Subject: Re: RCU nocb list not reclaiming causing OOM
>    
> 
> Hi Paul,
> 
> Ok, I'll try those patches.
> 
> Thanks,
> David
>   
> From: Paul E. McKenney 
> Sent: Friday, July 20, 2018 4:32:12 PM
> To: David Chen
> Cc: linux-kernel@vger.kernel.org
> Subject: Re: RCU nocb list not reclaiming causing OOM
>   
> 
> On Fri, Jul 20, 2018 at 11:05:52PM +, David Chen wrote:
> > Hi Paul,
> > 
> > We hit an RCU issue on 4.9.37 kernel. One of the nocb_follower list grows 
> > too
> > large, and not getting reclaimed, causing the system to OOM.
> > 
> > Printing the culprit rcu_sched_data:
> > 
> >   nocb_q_count = {
> > counter = 32369635
> >   },
> >   nocb_follower_head = 0x88ae901c0a00,
> >   nocb_follower_tail = 0x88af1538b8d8,
> >   nocb_kthread = 0x88b06d29,
> > 
> > As you can see here, the nocb_follower_head is not empty, so in theory, the
> > nocb_kthread shouldn't go to sleep. However, if dump the stack of the 
> > kthread:
> > 
> > crash> bt 0x88b06d29
> > PID: 21 TASK: 88b06d29  CPU: 3   COMMAND: "rcuos/1"
> >  #0 [afc9020b7dc0] __schedule at 8d8789dc
> >  #1 [afc9020b7e38] schedule at 8d878e76
> >  #2 [afc9020b7e50] rcu_nocb_kthread at 8d112337
> >  #3 [afc9020b7ec8] kthread at 8d0c6ce7
> >  #4 [afc9020b7f50] ret_from_fork at 8d87d755
> > 
> > And if we dis the address at 8d112337:
> > 
> > /usr/src/debug/kernel-4.9.37/linux-4.9.37-29.nutanix.07142017.el7.centos.x86_64/kernel/rcu/tree_plugin.h:
> >  2106
> > 0x8d11232d :  test   %rax,%rax
> > 0x8d112330 :  jne    0x8d112355 
> > 
> > 0x8d112332 :  callq  0x8d878e40 
> > 
> > 0x8d112337 :  lea    -0x40(%rbp),%rsi
> > 
> > So the kthread is blocked at swait_event_interruptible in the 
> > nocb_follower_wait.
> > This contradict with the fact that nocb_follower_head 

Re: RCU nocb list not reclaiming causing OOM

2018-07-27 Thread David Chen
Hi Paul,

Thanks for the advice.
The bug is kind of hard to hit, so I can't say for certain yet.

Though after another look at the code, I found out the `smp_mb__after_atomic();`
seems to be only a compiler barrier on x86.

tail = xchg(>nocb_follower_tail, rdp->nocb_gp_tail);
*tail = rdp->nocb_gp_head;
smp_mb__after_atomic(); /* Store *tail before wakeup. */
if (rdp != my_rdp && tail == >nocb_follower_head) {
swake_up(>nocb_wq);

But that wouldn't be enough right? Because from 6b5fc3a13318, it stated that
wakeup operation don't guarantee ordering. And when the follower wakes up, it 
checks
for nocb_follower_head, which is assigned by `*tail = rdp->nocb_gp_head;` which 
doesn't
have LOCK prefix. So it's possible for follower to wake up and see the list is 
empty and go
back to sleep.

Thanks,
David

From: Paul E. McKenney 
Sent: Friday, July 27, 2018 3:31 PM
To: David Chen
Cc: linux-kernel@vger.kernel.org
Subject: Re: RCU nocb list not reclaiming causing OOM
  

On Fri, Jul 27, 2018 at 07:07:46PM +, David Chen wrote:
> Hi Paul,
> 
> I'd like to opinion again on this subject.
> 
> So we are going to backport this patch:
> 6b5fc3a13318 ("rcu: Add memory barriers for NOCB leader wakeup")

Does this one solve the problem, or are you still seeing hangs?
If you are no longer seeing hangs, my advice is "hands off keyboard",
though some would no doubt point out that I should follow that advice
more myself.  ;-)

> But the other one:
> 8be6e1b15c54 ("rcu: Use timer as backstop for NOCB deferred wakeups")
> It doesn't apply cleanly, and I'm not too comfortable porting it myself.

Yeah, that one is a bit on the non-trivial side, no two ways about it.

> So I'm wondering if I use the following change to always wake up follower 
> thread
> regardless if the list was empty or not, just to be on the safe side. Do you 
> think
> this change is reasonable? Do you see any problem it might cause?
> 
> Thanks,
> David
> 
> diff -ru linux-4.9.37.orig/kernel/rcu/tree_plugin.h 
> linux-4.9.37/kernel/rcu/tree_plugin.h
> --- linux-4.9.37.orig/kernel/rcu/tree_plugin.h    2017-07-12 
> 06:42:41.0 -0700
> +++ linux-4.9.37/kernel/rcu/tree_plugin.h 2018-07-27 11:57:03.582134519 
> -0700
> @@ -2077,7 +2077,7 @@
>    tail = xchg(>nocb_follower_tail, rdp->nocb_gp_tail);
>    *tail = rdp->nocb_gp_head;
>    smp_mb__after_atomic(); /* Store *tail before wakeup. */
> - if (rdp != my_rdp && tail == >nocb_follower_head) {
> + if (rdp != my_rdp) {

This will burn a bit of extra CPU time, but it should be fine other than
that.

    Thanx, Paul

>    /*
> * List was empty, wake up the follower.
> * Memory barriers supplied by atomic_long_add().
> 
> 
> From: David Chen
> Sent: Friday, July 20, 2018 5:12 PM
> To: paul...@linux.vnet.ibm.com
> Cc: linux-kernel@vger.kernel.org
> Subject: Re: RCU nocb list not reclaiming causing OOM
>    
> 
> Hi Paul,
> 
> Ok, I'll try those patches.
> 
> Thanks,
> David
>   
> From: Paul E. McKenney 
> Sent: Friday, July 20, 2018 4:32:12 PM
> To: David Chen
> Cc: linux-kernel@vger.kernel.org
> Subject: Re: RCU nocb list not reclaiming causing OOM
>   
> 
> On Fri, Jul 20, 2018 at 11:05:52PM +, David Chen wrote:
> > Hi Paul,
> > 
> > We hit an RCU issue on 4.9.37 kernel. One of the nocb_follower list grows 
> > too
> > large, and not getting reclaimed, causing the system to OOM.
> > 
> > Printing the culprit rcu_sched_data:
> > 
> >   nocb_q_count = {
> > counter = 32369635
> >   },
> >   nocb_follower_head = 0x88ae901c0a00,
> >   nocb_follower_tail = 0x88af1538b8d8,
> >   nocb_kthread = 0x88b06d29,
> > 
> > As you can see here, the nocb_follower_head is not empty, so in theory, the
> > nocb_kthread shouldn't go to sleep. However, if dump the stack of the 
> > kthread:
> > 
> > crash> bt 0x88b06d29
> > PID: 21 TASK: 88b06d29  CPU: 3   COMMAND: "rcuos/1"
> >  #0 [afc9020b7dc0] __schedule at 8d8789dc
> >  #1 [afc9020b7e38] schedule at 8d878e76
> >  #2 [afc9020b7e50] rcu_nocb_kthread at 8d112337
> >  #3 [afc9020b7ec8] kthread at 8d0c6ce7
> >  #4 [afc9020b7f50] ret_from_fork at 8d87d755
> > 
> > And if we dis the address at 8d112337:
> > 
> > /usr/src/debug/kernel-4.9.37/linux-4.9.37-29.nutanix.07142017.el7.centos.x86_64/kernel/rcu/tree_plugin.h:
> >  2106
> > 0x8d11232d :  test   %rax,%rax
> > 0x8d112330 :  jne    0x8d112355 
> > 
> > 0x8d112332 :  callq  0x8d878e40 
> > 
> > 0x8d112337 :  lea    -0x40(%rbp),%rsi
> > 
> > So the kthread is blocked at swait_event_interruptible in the 
> > nocb_follower_wait.
> > This contradict with the fact that nocb_follower_head 

[PATCH 07/10] staging:rtl8192u: Refactor dm_dig_connect_e - Style

2018-07-27 Thread John Whitmore
The enumerated type dm_dig_connect_e is only used to group constant
values, as the actual type is never used as the type for the variables
which use the defined constants (cur_connect_state and pre_connect_state).

These two member variables have had there defined types changed to
properly reflect there usage and to permit compiler type checks to be
performed.

In addition the definition of the enumerated type has been moved above
the structure which uses the type. The typedef of the enumerated type
has been removed to clear the checkpatch issue with defining new types
and the enumerated value DIG_CONNECT_MAX has been removed since it is
never used in code.

The resulting changes are all coding style in nature and should not
impact runtime code execution.

Signed-off-by: John Whitmore 
---
 drivers/staging/rtl8192u/r8192U_dm.h | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/staging/rtl8192u/r8192U_dm.h 
b/drivers/staging/rtl8192u/r8192U_dm.h
index 450bcbe3095e..825df06e17a8 100644
--- a/drivers/staging/rtl8192u/r8192U_dm.h
+++ b/drivers/staging/rtl8192u/r8192U_dm.h
@@ -76,6 +76,11 @@ enum dm_dig_sta_e {
DM_STA_DIG_MAX
 };
 
+enum dm_dig_connect_e {
+   DIG_DISCONNECT = 0,
+   DIG_CONNECT = 1,
+};
+
 /* 2007/10/04 MH Define upper and lower threshold of DIG enable or disable. */
 struct dig {
u8  dig_enable_flag;
@@ -91,8 +96,8 @@ struct dig {
 
enum dm_dig_sta_e   dig_state;
enum dm_dig_sta_e   dig_highpwr_state;
-   u8  cur_connect_state;
-   u8  pre_connect_state;
+   enum dm_dig_connect_e   cur_connect_state;
+   enum dm_dig_connect_e   pre_connect_state;
 
u8  curpd_thstate;
u8  prepd_thstate;
@@ -110,12 +115,6 @@ struct dig {
longrssi_val;
 };
 
-typedef enum tag_dig_connect_definition {
-   DIG_DISCONNECT = 0,
-   DIG_CONNECT = 1,
-   DIG_CONNECT_MAX
-} dm_dig_connect_e;
-
 typedef enum tag_dig_packetdetection_threshold_definition {
DIG_PD_AT_LOW_POWER = 0,
DIG_PD_AT_NORMAL_POWER = 1,
-- 
2.18.0



[PATCH 07/10] staging:rtl8192u: Refactor dm_dig_connect_e - Style

2018-07-27 Thread John Whitmore
The enumerated type dm_dig_connect_e is only used to group constant
values, as the actual type is never used as the type for the variables
which use the defined constants (cur_connect_state and pre_connect_state).

These two member variables have had there defined types changed to
properly reflect there usage and to permit compiler type checks to be
performed.

In addition the definition of the enumerated type has been moved above
the structure which uses the type. The typedef of the enumerated type
has been removed to clear the checkpatch issue with defining new types
and the enumerated value DIG_CONNECT_MAX has been removed since it is
never used in code.

The resulting changes are all coding style in nature and should not
impact runtime code execution.

Signed-off-by: John Whitmore 
---
 drivers/staging/rtl8192u/r8192U_dm.h | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/staging/rtl8192u/r8192U_dm.h 
b/drivers/staging/rtl8192u/r8192U_dm.h
index 450bcbe3095e..825df06e17a8 100644
--- a/drivers/staging/rtl8192u/r8192U_dm.h
+++ b/drivers/staging/rtl8192u/r8192U_dm.h
@@ -76,6 +76,11 @@ enum dm_dig_sta_e {
DM_STA_DIG_MAX
 };
 
+enum dm_dig_connect_e {
+   DIG_DISCONNECT = 0,
+   DIG_CONNECT = 1,
+};
+
 /* 2007/10/04 MH Define upper and lower threshold of DIG enable or disable. */
 struct dig {
u8  dig_enable_flag;
@@ -91,8 +96,8 @@ struct dig {
 
enum dm_dig_sta_e   dig_state;
enum dm_dig_sta_e   dig_highpwr_state;
-   u8  cur_connect_state;
-   u8  pre_connect_state;
+   enum dm_dig_connect_e   cur_connect_state;
+   enum dm_dig_connect_e   pre_connect_state;
 
u8  curpd_thstate;
u8  prepd_thstate;
@@ -110,12 +115,6 @@ struct dig {
longrssi_val;
 };
 
-typedef enum tag_dig_connect_definition {
-   DIG_DISCONNECT = 0,
-   DIG_CONNECT = 1,
-   DIG_CONNECT_MAX
-} dm_dig_connect_e;
-
 typedef enum tag_dig_packetdetection_threshold_definition {
DIG_PD_AT_LOW_POWER = 0,
DIG_PD_AT_NORMAL_POWER = 1,
-- 
2.18.0



[PATCH 04/10] staging:rtl8192u: Remove enum dm_dig_op_e

2018-07-27 Thread John Whitmore
Remove the enumerated type dm_dig_op_e. The type is only used as a
parameter to the function dm_change_dynamic_initgain_thresh(), but
that function is never referenced in the code at all.

I would consider this to be a coding style change as the function is
never referenced and as a result the enumeration is never used. In
any case there should be no impact on runtime code execution.

Signed-off-by: John Whitmore 
---
 drivers/staging/rtl8192u/r8192U_dm.c | 91 
 drivers/staging/rtl8192u/r8192U_dm.h | 20 --
 2 files changed, 111 deletions(-)

diff --git a/drivers/staging/rtl8192u/r8192U_dm.c 
b/drivers/staging/rtl8192u/r8192U_dm.c
index c4e4e3ba394b..16dbe55d0e15 100644
--- a/drivers/staging/rtl8192u/r8192U_dm.c
+++ b/drivers/staging/rtl8192u/r8192U_dm.c
@@ -1613,97 +1613,6 @@ static void dm_bb_initialgain_backup(struct net_device 
*dev)
 }   /* dm_BBInitialGainBakcup */
 
 #endif
-/*-
- * Function:   dm_change_dynamic_initgain_thresh()
- *
- * Overview:
- *
- * Input:  NONE
- *
- * Output: NONE
- *
- * Return: NONE
- *
- * Revised History:
- * WhenWho Remark
- * 05/29/2008  amy Create Version 0 porting from windows 
code.
- *
- *---*/
-
-void dm_change_dynamic_initgain_thresh(struct net_device *dev, u32 dm_type,
-  u32 dm_value)
-{
-   switch (dm_type) {
-   case DIG_TYPE_THRESH_HIGH:
-   dm_digtable.rssi_high_thresh = dm_value;
-   break;
-
-   case  DIG_TYPE_THRESH_LOW:
-   dm_digtable.rssi_low_thresh = dm_value;
-   break;
-
-   case  DIG_TYPE_THRESH_HIGHPWR_HIGH:
-   dm_digtable.rssi_high_power_highthresh = dm_value;
-   break;
-
-   case DIG_TYPE_THRESH_HIGHPWR_LOW:
-   dm_digtable.rssi_high_power_lowthresh = dm_value;
-   break;
-
-   case DIG_TYPE_ENABLE:
-   dm_digtable.dig_state   = DM_STA_DIG_MAX;
-   dm_digtable.dig_enable_flag = true;
-   break;
-
-   case DIG_TYPE_DISABLE:
-   dm_digtable.dig_state   = DM_STA_DIG_MAX;
-   dm_digtable.dig_enable_flag = false;
-   break;
-
-   case DIG_TYPE_DBG_MODE:
-   if (dm_value >= DM_DBG_MAX)
-   dm_value = DM_DBG_OFF;
-   dm_digtable.dbg_mode= (u8)dm_value;
-   break;
-
-   case DIG_TYPE_RSSI:
-   if (dm_value > 100)
-   dm_value = 30;
-   dm_digtable.rssi_val= (long)dm_value;
-   break;
-
-   case DIG_TYPE_ALGORITHM:
-   if (dm_value >= DIG_ALGO_MAX)
-   dm_value = DIG_ALGO_BY_FALSE_ALARM;
-   if (dm_digtable.dig_algorithm != (u8)dm_value)
-   dm_digtable.dig_algorithm_switch = 1;
-   dm_digtable.dig_algorithm   = (u8)dm_value;
-   break;
-
-   case DIG_TYPE_BACKOFF:
-   if (dm_value > 30)
-   dm_value = 30;
-   dm_digtable.backoff_val = (u8)dm_value;
-   break;
-
-   case DIG_TYPE_RX_GAIN_MIN:
-   if (dm_value == 0)
-   dm_value = 0x1;
-   dm_digtable.rx_gain_range_min = (u8)dm_value;
-   break;
-
-   case DIG_TYPE_RX_GAIN_MAX:
-   if (dm_value > 0x50)
-   dm_value = 0x50;
-   dm_digtable.rx_gain_range_max = (u8)dm_value;
-   break;
-
-   default:
-   break;
-   }
-
-}  /* DM_ChangeDynamicInitGainThresh */
-
 /*-
  * Function:   dm_dig_init()
  *
diff --git a/drivers/staging/rtl8192u/r8192U_dm.h 
b/drivers/staging/rtl8192u/r8192U_dm.h
index 942e9ba8d3ad..7d7eeb1cb9fc 100644
--- a/drivers/staging/rtl8192u/r8192U_dm.h
+++ b/drivers/staging/rtl8192u/r8192U_dm.h
@@ -105,24 +105,6 @@ struct dig {
longrssi_val;
 };
 
-/* 2007/10/11 MH Define DIG operation type. */
-typedef enum tag_dynamic_init_gain_operation_type_definition {
-   DIG_TYPE_THRESH_HIGH= 0,
-   DIG_TYPE_THRESH_LOW = 1,
-   DIG_TYPE_THRESH_HIGHPWR_HIGH= 2,
-   DIG_TYPE_THRESH_HIGHPWR_LOW = 3,
-   DIG_TYPE_DBG_MODE   = 4,
-   DIG_TYPE_RSSI   = 5,
-   DIG_TYPE_ALGORITHM  = 6,
-   DIG_TYPE_BACKOFF= 7,
-   DIG_TYPE_PWDB_FACTOR= 8,
-   DIG_TYPE_RX_GAIN_MIN= 9,
-   DIG_TYPE_RX_GAIN_MAX  

[PATCH 04/10] staging:rtl8192u: Remove enum dm_dig_op_e

2018-07-27 Thread John Whitmore
Remove the enumerated type dm_dig_op_e. The type is only used as a
parameter to the function dm_change_dynamic_initgain_thresh(), but
that function is never referenced in the code at all.

I would consider this to be a coding style change as the function is
never referenced and as a result the enumeration is never used. In
any case there should be no impact on runtime code execution.

Signed-off-by: John Whitmore 
---
 drivers/staging/rtl8192u/r8192U_dm.c | 91 
 drivers/staging/rtl8192u/r8192U_dm.h | 20 --
 2 files changed, 111 deletions(-)

diff --git a/drivers/staging/rtl8192u/r8192U_dm.c 
b/drivers/staging/rtl8192u/r8192U_dm.c
index c4e4e3ba394b..16dbe55d0e15 100644
--- a/drivers/staging/rtl8192u/r8192U_dm.c
+++ b/drivers/staging/rtl8192u/r8192U_dm.c
@@ -1613,97 +1613,6 @@ static void dm_bb_initialgain_backup(struct net_device 
*dev)
 }   /* dm_BBInitialGainBakcup */
 
 #endif
-/*-
- * Function:   dm_change_dynamic_initgain_thresh()
- *
- * Overview:
- *
- * Input:  NONE
- *
- * Output: NONE
- *
- * Return: NONE
- *
- * Revised History:
- * WhenWho Remark
- * 05/29/2008  amy Create Version 0 porting from windows 
code.
- *
- *---*/
-
-void dm_change_dynamic_initgain_thresh(struct net_device *dev, u32 dm_type,
-  u32 dm_value)
-{
-   switch (dm_type) {
-   case DIG_TYPE_THRESH_HIGH:
-   dm_digtable.rssi_high_thresh = dm_value;
-   break;
-
-   case  DIG_TYPE_THRESH_LOW:
-   dm_digtable.rssi_low_thresh = dm_value;
-   break;
-
-   case  DIG_TYPE_THRESH_HIGHPWR_HIGH:
-   dm_digtable.rssi_high_power_highthresh = dm_value;
-   break;
-
-   case DIG_TYPE_THRESH_HIGHPWR_LOW:
-   dm_digtable.rssi_high_power_lowthresh = dm_value;
-   break;
-
-   case DIG_TYPE_ENABLE:
-   dm_digtable.dig_state   = DM_STA_DIG_MAX;
-   dm_digtable.dig_enable_flag = true;
-   break;
-
-   case DIG_TYPE_DISABLE:
-   dm_digtable.dig_state   = DM_STA_DIG_MAX;
-   dm_digtable.dig_enable_flag = false;
-   break;
-
-   case DIG_TYPE_DBG_MODE:
-   if (dm_value >= DM_DBG_MAX)
-   dm_value = DM_DBG_OFF;
-   dm_digtable.dbg_mode= (u8)dm_value;
-   break;
-
-   case DIG_TYPE_RSSI:
-   if (dm_value > 100)
-   dm_value = 30;
-   dm_digtable.rssi_val= (long)dm_value;
-   break;
-
-   case DIG_TYPE_ALGORITHM:
-   if (dm_value >= DIG_ALGO_MAX)
-   dm_value = DIG_ALGO_BY_FALSE_ALARM;
-   if (dm_digtable.dig_algorithm != (u8)dm_value)
-   dm_digtable.dig_algorithm_switch = 1;
-   dm_digtable.dig_algorithm   = (u8)dm_value;
-   break;
-
-   case DIG_TYPE_BACKOFF:
-   if (dm_value > 30)
-   dm_value = 30;
-   dm_digtable.backoff_val = (u8)dm_value;
-   break;
-
-   case DIG_TYPE_RX_GAIN_MIN:
-   if (dm_value == 0)
-   dm_value = 0x1;
-   dm_digtable.rx_gain_range_min = (u8)dm_value;
-   break;
-
-   case DIG_TYPE_RX_GAIN_MAX:
-   if (dm_value > 0x50)
-   dm_value = 0x50;
-   dm_digtable.rx_gain_range_max = (u8)dm_value;
-   break;
-
-   default:
-   break;
-   }
-
-}  /* DM_ChangeDynamicInitGainThresh */
-
 /*-
  * Function:   dm_dig_init()
  *
diff --git a/drivers/staging/rtl8192u/r8192U_dm.h 
b/drivers/staging/rtl8192u/r8192U_dm.h
index 942e9ba8d3ad..7d7eeb1cb9fc 100644
--- a/drivers/staging/rtl8192u/r8192U_dm.h
+++ b/drivers/staging/rtl8192u/r8192U_dm.h
@@ -105,24 +105,6 @@ struct dig {
longrssi_val;
 };
 
-/* 2007/10/11 MH Define DIG operation type. */
-typedef enum tag_dynamic_init_gain_operation_type_definition {
-   DIG_TYPE_THRESH_HIGH= 0,
-   DIG_TYPE_THRESH_LOW = 1,
-   DIG_TYPE_THRESH_HIGHPWR_HIGH= 2,
-   DIG_TYPE_THRESH_HIGHPWR_LOW = 3,
-   DIG_TYPE_DBG_MODE   = 4,
-   DIG_TYPE_RSSI   = 5,
-   DIG_TYPE_ALGORITHM  = 6,
-   DIG_TYPE_BACKOFF= 7,
-   DIG_TYPE_PWDB_FACTOR= 8,
-   DIG_TYPE_RX_GAIN_MIN= 9,
-   DIG_TYPE_RX_GAIN_MAX  

[PATCH 05/10] staging:rtl8192u: Refactor enum dm_dig_alg_e - Style

2018-07-27 Thread John Whitmore
The enumerated type dm_dig_alg_e is only used by one variable in the
code, 'dig_algorithm', a member variable of the structure dig. That
member variable was defined to be of type 'u8' thus negating any
advantage of the use of an enumerated type, (compiler type-checking).

The type of the variable 'dig_algorithm' has been change to reflect
its use of the enumeration and the enumerated type moved in the file
so that it appears before it is used in the file.

Additionally the 'typedef' has been removed to clear the checkpatch
issue with defining new types.

These changes are all coding style in nature and as such should have
no impact on runtime code execution.

Signed-off-by: John Whitmore 
---
 drivers/staging/rtl8192u/r8192U_dm.h | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/staging/rtl8192u/r8192U_dm.h 
b/drivers/staging/rtl8192u/r8192U_dm.h
index 7d7eeb1cb9fc..22e25a5cf40a 100644
--- a/drivers/staging/rtl8192u/r8192U_dm.h
+++ b/drivers/staging/rtl8192u/r8192U_dm.h
@@ -65,6 +65,11 @@
 
 /*--Define structure*/
 
+enum dm_dig_alg_e {
+   DIG_ALGO_BY_FALSE_ALARM = 0,
+   DIG_ALGO_BY_RSSI= 1,
+};
+
 enum dm_dig_sta_e {
DM_STA_DIG_OFF = 0,
DM_STA_DIG_ON,
@@ -74,7 +79,7 @@ enum dm_dig_sta_e {
 /* 2007/10/04 MH Define upper and lower threshold of DIG enable or disable. */
 struct dig {
u8  dig_enable_flag;
-   u8  dig_algorithm;
+   enum dm_dig_alg_e   dig_algorithm;
u8  dbg_mode;
u8  dig_algorithm_switch;
 
@@ -105,12 +110,6 @@ struct dig {
longrssi_val;
 };
 
-typedef enum tag_dig_algorithm_definition {
-   DIG_ALGO_BY_FALSE_ALARM = 0,
-   DIG_ALGO_BY_RSSI= 1,
-   DIG_ALGO_MAX
-} dm_dig_alg_e;
-
 typedef enum tag_dig_dbgmode_definition {
DIG_DBG_OFF = 0,
DIG_DBG_ON = 1,
-- 
2.18.0



[PATCH 09/10] staging:rtl8192u: Refactor enum dm_dig_cs_ratio_e - Style

2018-07-27 Thread John Whitmore
The enumerated type dm_dig_cs_ratio_e is never actually used as a type,
but only as a collection of related constants. This is because the
variables, which use the defined constant values, are defined as being
of type u8 rather then enum dm_dig_cs_ratio_e. This omission negates
the possibility of taking advantage of compiler type checking.

To enable the use of compiler type checking of the enumeration the two
variables, (curcs_ratio_state & precs_ratio_state), which use the type's
constants have their types changed from u8 to enum dm_dig_cs_ratio_e.

Additionally the types declaration has been moved above the dig
structure, where the type is used. The 'typedef' keyword has been
removed from the type to clear the checkpatch issue with defining new
types. And the constant DIG_CS_MAX has been removed since this is never
used in the code.

These changes are purely coding style changes and should not impact
on runtime code execution.

Signed-off-by: John Whitmore 
---
 drivers/staging/rtl8192u/r8192U_dm.h | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/staging/rtl8192u/r8192U_dm.h 
b/drivers/staging/rtl8192u/r8192U_dm.h
index 41b08211e958..2aef330379cb 100644
--- a/drivers/staging/rtl8192u/r8192U_dm.h
+++ b/drivers/staging/rtl8192u/r8192U_dm.h
@@ -87,6 +87,11 @@ enum dm_dig_pd_th_e {
DIG_PD_AT_HIGH_POWER = 2,
 };
 
+enum dm_dig_cs_ratio_e {
+   DIG_CS_RATIO_LOWER = 0,
+   DIG_CS_RATIO_HIGHER = 1,
+};
+
 /* 2007/10/04 MH Define upper and lower threshold of DIG enable or disable. */
 struct dig {
u8  dig_enable_flag;
@@ -107,8 +112,8 @@ struct dig {
 
enum dm_dig_pd_th_e curpd_thstate;
enum dm_dig_pd_th_e prepd_thstate;
-   u8  curcs_ratio_state;
-   u8  precs_ratio_state;
+   enum dm_dig_cs_ratio_e  curcs_ratio_state;
+   enum dm_dig_cs_ratio_e  precs_ratio_state;
 
u32 pre_ig_value;
u32 cur_ig_value;
@@ -121,11 +126,6 @@ struct dig {
longrssi_val;
 };
 
-typedef enum tag_dig_cck_cs_ratio_state_definition {
-   DIG_CS_RATIO_LOWER = 0,
-   DIG_CS_RATIO_HIGHER = 1,
-   DIG_CS_MAX
-} dm_dig_cs_ratio_e;
 struct dynamic_rx_path_sel {
u8  Enable;
u8  DbgMode;
-- 
2.18.0



[PATCH 02/10] staging:rtl8192u: Refactor use of enum dm_dig_sta_e - Style

2018-07-27 Thread John Whitmore
Refactor the use of the enumerated type dm_dig_sta_e, which is not
actually used for type checking by the compiler.

The enumerated type defines values for the enumeration, which are used
by both dig_state and dig_highpwr_state, (members of the struct dig).
Both of those variables were defined as being of type u8. This negates
any usefulness of the use of the enumeration, (compiler type checking).

To make use of the compiler's type-checking the two member variables,
dig_state and dig_highpwr_state have been changed to being of type
enum dm_dig_sta_e. The enumerated type has been moved above the
struct dig definition so that the enumeration is already defined when
compiler reaches the two types using the enumerated type.

In addition the 'typedef' of the enumerated type has been removed to
clear the checkpatch issue with declaring new types.

These changes, whilst convoluted, are purely coding style in nature and
should not impact runtime code execution.

Signed-off-by: John Whitmore 
---
 drivers/staging/rtl8192u/r8192U_dm.h | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/staging/rtl8192u/r8192U_dm.h 
b/drivers/staging/rtl8192u/r8192U_dm.h
index e86dda99c223..2444e1c1357b 100644
--- a/drivers/staging/rtl8192u/r8192U_dm.h
+++ b/drivers/staging/rtl8192u/r8192U_dm.h
@@ -64,6 +64,13 @@
 
 
 /*--Define structure*/
+
+enum dm_dig_sta_e {
+   DM_STA_DIG_OFF = 0,
+   DM_STA_DIG_ON,
+   DM_STA_DIG_MAX
+};
+
 /* 2007/10/04 MH Define upper and lower threshold of DIG enable or disable. */
 struct dig {
u8  dig_enable_flag;
@@ -77,8 +84,8 @@ struct dig {
longrssi_high_power_lowthresh;
longrssi_high_power_highthresh;
 
-   u8  dig_state;
-   u8  dig_highpwr_state;
+   enum dm_dig_sta_e   dig_state;
+   enum dm_dig_sta_e   dig_highpwr_state;
u8  cur_connect_state;
u8  pre_connect_state;
 
@@ -98,13 +105,6 @@ struct dig {
longrssi_val;
 };
 
-typedef enum tag_dynamic_init_gain_state_definition {
-   DM_STA_DIG_OFF = 0,
-   DM_STA_DIG_ON,
-   DM_STA_DIG_MAX
-} dm_dig_sta_e;
-
-
 /* 2007/10/08 MH Define RATR state. */
 typedef enum tag_dynamic_ratr_state_definition {
DM_RATR_STA_HIGH = 0,
-- 
2.18.0



[PATCH 03/10] staging:rtl8192u: Refactor enum dm_ratr_sta_e usage - Style

2018-07-27 Thread John Whitmore
The enumerated type dm_ratr_sta_e was defined in the file
drivers/staging/rtl8192u/r8192U_dm.h but never actually used in that
file. The only variable which uses this enumerated type is 'ratr_state',
a member variable of the _rate_adaptive structure defined in the file
drivers/staging/rtl8192u/r8192U.h.

To clarify and place the enumerated type close to where it is used the
type was moved to the drivers/staging/rtl8192u/r8192U.h file.

In addition the member variable 'ratr_state' which uses the enumerated
constants was declared as being of type 'u8'. This negates any advantage
of the enumerated type, compiler type-checking, so that member variable's
type has been changed to being of the enumerated type.

Additionally the typedef from the enumerated type has been removed to
clear the checkpatch issue with defining new types.

This is a coding style change and should have no impact on runtime code
execution.

Signed-off-by: John Whitmore 
---
 drivers/staging/rtl8192u/r8192U.h| 10 +-
 drivers/staging/rtl8192u/r8192U_dm.h |  8 
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/staging/rtl8192u/r8192U.h 
b/drivers/staging/rtl8192u/r8192U.h
index e615d9b3f6b1..b00781e1f5ee 100644
--- a/drivers/staging/rtl8192u/r8192U.h
+++ b/drivers/staging/rtl8192u/r8192U.h
@@ -662,9 +662,17 @@ typedef enum _RT_RF_TYPE_819xU {
RF_PSEUDO_11N = 4,
 } RT_RF_TYPE_819xU, *PRT_RF_TYPE_819xU;
 
+/* 2007/10/08 MH Define RATR state. */
+enum dm_ratr_sta_e {
+   DM_RATR_STA_HIGH = 0,
+   DM_RATR_STA_MIDDLE = 1,
+   DM_RATR_STA_LOW = 2,
+   DM_RATR_STA_MAX
+};
+
 typedef struct _rate_adaptive {
u8  rate_adaptive_disabled;
-   u8  ratr_state;
+   enum dm_ratr_sta_e  ratr_state;
u16 reserve;
 
u32 high_rssi_thresh_for_ra;
diff --git a/drivers/staging/rtl8192u/r8192U_dm.h 
b/drivers/staging/rtl8192u/r8192U_dm.h
index 2444e1c1357b..942e9ba8d3ad 100644
--- a/drivers/staging/rtl8192u/r8192U_dm.h
+++ b/drivers/staging/rtl8192u/r8192U_dm.h
@@ -105,14 +105,6 @@ struct dig {
longrssi_val;
 };
 
-/* 2007/10/08 MH Define RATR state. */
-typedef enum tag_dynamic_ratr_state_definition {
-   DM_RATR_STA_HIGH = 0,
-   DM_RATR_STA_MIDDLE = 1,
-   DM_RATR_STA_LOW = 2,
-   DM_RATR_STA_MAX
-} dm_ratr_sta_e;
-
 /* 2007/10/11 MH Define DIG operation type. */
 typedef enum tag_dynamic_init_gain_operation_type_definition {
DIG_TYPE_THRESH_HIGH= 0,
-- 
2.18.0



[PATCH 05/10] staging:rtl8192u: Refactor enum dm_dig_alg_e - Style

2018-07-27 Thread John Whitmore
The enumerated type dm_dig_alg_e is only used by one variable in the
code, 'dig_algorithm', a member variable of the structure dig. That
member variable was defined to be of type 'u8' thus negating any
advantage of the use of an enumerated type, (compiler type-checking).

The type of the variable 'dig_algorithm' has been change to reflect
its use of the enumeration and the enumerated type moved in the file
so that it appears before it is used in the file.

Additionally the 'typedef' has been removed to clear the checkpatch
issue with defining new types.

These changes are all coding style in nature and as such should have
no impact on runtime code execution.

Signed-off-by: John Whitmore 
---
 drivers/staging/rtl8192u/r8192U_dm.h | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/staging/rtl8192u/r8192U_dm.h 
b/drivers/staging/rtl8192u/r8192U_dm.h
index 7d7eeb1cb9fc..22e25a5cf40a 100644
--- a/drivers/staging/rtl8192u/r8192U_dm.h
+++ b/drivers/staging/rtl8192u/r8192U_dm.h
@@ -65,6 +65,11 @@
 
 /*--Define structure*/
 
+enum dm_dig_alg_e {
+   DIG_ALGO_BY_FALSE_ALARM = 0,
+   DIG_ALGO_BY_RSSI= 1,
+};
+
 enum dm_dig_sta_e {
DM_STA_DIG_OFF = 0,
DM_STA_DIG_ON,
@@ -74,7 +79,7 @@ enum dm_dig_sta_e {
 /* 2007/10/04 MH Define upper and lower threshold of DIG enable or disable. */
 struct dig {
u8  dig_enable_flag;
-   u8  dig_algorithm;
+   enum dm_dig_alg_e   dig_algorithm;
u8  dbg_mode;
u8  dig_algorithm_switch;
 
@@ -105,12 +110,6 @@ struct dig {
longrssi_val;
 };
 
-typedef enum tag_dig_algorithm_definition {
-   DIG_ALGO_BY_FALSE_ALARM = 0,
-   DIG_ALGO_BY_RSSI= 1,
-   DIG_ALGO_MAX
-} dm_dig_alg_e;
-
 typedef enum tag_dig_dbgmode_definition {
DIG_DBG_OFF = 0,
DIG_DBG_ON = 1,
-- 
2.18.0



[PATCH 09/10] staging:rtl8192u: Refactor enum dm_dig_cs_ratio_e - Style

2018-07-27 Thread John Whitmore
The enumerated type dm_dig_cs_ratio_e is never actually used as a type,
but only as a collection of related constants. This is because the
variables, which use the defined constant values, are defined as being
of type u8 rather then enum dm_dig_cs_ratio_e. This omission negates
the possibility of taking advantage of compiler type checking.

To enable the use of compiler type checking of the enumeration the two
variables, (curcs_ratio_state & precs_ratio_state), which use the type's
constants have their types changed from u8 to enum dm_dig_cs_ratio_e.

Additionally the types declaration has been moved above the dig
structure, where the type is used. The 'typedef' keyword has been
removed from the type to clear the checkpatch issue with defining new
types. And the constant DIG_CS_MAX has been removed since this is never
used in the code.

These changes are purely coding style changes and should not impact
on runtime code execution.

Signed-off-by: John Whitmore 
---
 drivers/staging/rtl8192u/r8192U_dm.h | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/staging/rtl8192u/r8192U_dm.h 
b/drivers/staging/rtl8192u/r8192U_dm.h
index 41b08211e958..2aef330379cb 100644
--- a/drivers/staging/rtl8192u/r8192U_dm.h
+++ b/drivers/staging/rtl8192u/r8192U_dm.h
@@ -87,6 +87,11 @@ enum dm_dig_pd_th_e {
DIG_PD_AT_HIGH_POWER = 2,
 };
 
+enum dm_dig_cs_ratio_e {
+   DIG_CS_RATIO_LOWER = 0,
+   DIG_CS_RATIO_HIGHER = 1,
+};
+
 /* 2007/10/04 MH Define upper and lower threshold of DIG enable or disable. */
 struct dig {
u8  dig_enable_flag;
@@ -107,8 +112,8 @@ struct dig {
 
enum dm_dig_pd_th_e curpd_thstate;
enum dm_dig_pd_th_e prepd_thstate;
-   u8  curcs_ratio_state;
-   u8  precs_ratio_state;
+   enum dm_dig_cs_ratio_e  curcs_ratio_state;
+   enum dm_dig_cs_ratio_e  precs_ratio_state;
 
u32 pre_ig_value;
u32 cur_ig_value;
@@ -121,11 +126,6 @@ struct dig {
longrssi_val;
 };
 
-typedef enum tag_dig_cck_cs_ratio_state_definition {
-   DIG_CS_RATIO_LOWER = 0,
-   DIG_CS_RATIO_HIGHER = 1,
-   DIG_CS_MAX
-} dm_dig_cs_ratio_e;
 struct dynamic_rx_path_sel {
u8  Enable;
u8  DbgMode;
-- 
2.18.0



  1   2   3   4   5   6   7   8   9   10   >