Re: [PATCH 2/4] base/drivers/arch_topology: Replace mutex with READ_ONCE / WRITE_ONCE

2018-10-29 Thread Viresh Kumar
On Mon, Oct 29, 2018 at 9:54 PM Daniel Lezcano
 wrote:
>
> The mutex protects a per_cpu variable access. The potential race can
> happen only when the cpufreq governor module is loaded and at the same
> time the cpu capacity is changed in the sysfs.
>
> There is no real interest of using a mutex to protect a variable
> assignation when there is no situation where a task can take the lock
> and block.
>
> Replace the mutex by READ_ONCE / WRITE_ONCE.
>
> Signed-off-by: Daniel Lezcano 
> ---
>  drivers/base/arch_topology.c  | 7 +--
>  include/linux/arch_topology.h | 2 +-
>  2 files changed, 2 insertions(+), 7 deletions(-)

Reviewed-by: Viresh Kumar 


Re: [PATCH 2/4] base/drivers/arch_topology: Replace mutex with READ_ONCE / WRITE_ONCE

2018-10-29 Thread Viresh Kumar
On Mon, Oct 29, 2018 at 9:54 PM Daniel Lezcano
 wrote:
>
> The mutex protects a per_cpu variable access. The potential race can
> happen only when the cpufreq governor module is loaded and at the same
> time the cpu capacity is changed in the sysfs.
>
> There is no real interest of using a mutex to protect a variable
> assignation when there is no situation where a task can take the lock
> and block.
>
> Replace the mutex by READ_ONCE / WRITE_ONCE.
>
> Signed-off-by: Daniel Lezcano 
> ---
>  drivers/base/arch_topology.c  | 7 +--
>  include/linux/arch_topology.h | 2 +-
>  2 files changed, 2 insertions(+), 7 deletions(-)

Reviewed-by: Viresh Kumar 


Re: [PATCH v2] bit_spinlock: introduce smp_cond_load_relaxed

2018-10-29 Thread Gao Xiang
Hi,

On 2018/10/30 14:04, Gao Xiang wrote:
> It is better to use wrapped smp_cond_load_relaxed
> instead of open-coded busy waiting for bit_spinlock.
> 
> Signed-off-by: Gao Xiang 
> ---
> 
> change log v2:
>  - fix the incorrect expression !(VAL >> (bitnum & (BITS_PER_LONG-1)))
>  - the test result is described in the following reply.
> 
> Thanks,
> Gao Xiang


Simple test script:
#include 
#include 
#include 

unsigned long global_lock;

int test_thread(void *data)
{
unsigned long thread_id = (unsigned long)data;
int i;
u64 start = ktime_get_ns();

for (i = 0; i < 50; ++i) {
bit_spin_lock(0, _lock);
__asm__("yield");
bit_spin_unlock(0, _lock);
}
pr_err("Thread id: %lu time: %llu\n", thread_id, ktime_get_ns() - 
start);

do_exit(0);
}


static int __init bitspinlock_test_module_init(void)
{
int i;

for (i = 0; i < 8; ++i) {
if (IS_ERR(kthread_run(test_thread, (void *)(unsigned long)i, 
"thread-%d", i)))
pr_err("fail to create thread %d\n", i);
}

return 0;
}

static void __exit bitspinlock_test_module_exit(void)
{
}

module_init(bitspinlock_test_module_init);
module_exit(bitspinlock_test_module_exit);
MODULE_LICENSE("GPL");


...and tested in the following ARM server environment:

Processor: HI1616 (https://en.wikichip.org/wiki/hisilicon/hi16xx/hi1616)
Board: HiSilicon D05 Development Board (http://open-estuary.org/d05/)
Memory: 512GB
Host OS: Ubuntu 18.04.1 LTS (Ubuntu 4.15.0-29.31-generic 4.15.18)
QEMU KVM OS: Linux 4.19 + buildroot
QEMU KVM cmdline: qemu-system-aarch64 -enable-kvm -cpu host -smp 4 -m 256M 
-kernel Image -M virt,kernel_irqchip=on -nographic -hda rootfs.ext2 -append 
'root=/dev/vda console=ttyAMA0 earlycon=pl011,0x900' -serial mon:stdio -net 
none

Without this patch:
  Thread 0   Thread 1   Thread 2   Thread 3   Thread 4   Thread 5   Thread 6   
Thread 7
1 1283709480 1271869280  454742480 1173673820 1145643640 1118846920  774616920 
1144146140
2  643580180  625143860  576841700  322982340  649987880  585749000  529178880  
373374780
3  672307220  847315000  880801860 1039502040  667086380 1033939940 1035381120 
1046898300
4  568635580  440547020  737000380  910040880  804543740  712314280  868896880  
867049000
5  749107320  726397720  776134480  611970100  756721040  753449440  711691300  
609343300

With this patch:
  Thread 0   Thread 1   Thread 2   Thread 3   Thread 4   Thread 5   Thread 6   
Thread 7
1  170327620  196322160  169434180   74723860  178145600  178873460  143843260  
 70998780
2  166415220  129649200  166161240  175241520  155474460  112811860  157003140  
150087420
3  511420780  117655640  598641860  596213720  462888760  430838600  554346300  
428035120 
4  174520240  156311800  120274280   87465380  172781400  136118620  163728340  
 63026360
5  153677940  202786860  183626500  140721300  150311360  161266840  168154340  
107247460

Thanks,
Gao Xiang


Re: [PATCH v2] bit_spinlock: introduce smp_cond_load_relaxed

2018-10-29 Thread Gao Xiang
Hi,

On 2018/10/30 14:04, Gao Xiang wrote:
> It is better to use wrapped smp_cond_load_relaxed
> instead of open-coded busy waiting for bit_spinlock.
> 
> Signed-off-by: Gao Xiang 
> ---
> 
> change log v2:
>  - fix the incorrect expression !(VAL >> (bitnum & (BITS_PER_LONG-1)))
>  - the test result is described in the following reply.
> 
> Thanks,
> Gao Xiang


Simple test script:
#include 
#include 
#include 

unsigned long global_lock;

int test_thread(void *data)
{
unsigned long thread_id = (unsigned long)data;
int i;
u64 start = ktime_get_ns();

for (i = 0; i < 50; ++i) {
bit_spin_lock(0, _lock);
__asm__("yield");
bit_spin_unlock(0, _lock);
}
pr_err("Thread id: %lu time: %llu\n", thread_id, ktime_get_ns() - 
start);

do_exit(0);
}


static int __init bitspinlock_test_module_init(void)
{
int i;

for (i = 0; i < 8; ++i) {
if (IS_ERR(kthread_run(test_thread, (void *)(unsigned long)i, 
"thread-%d", i)))
pr_err("fail to create thread %d\n", i);
}

return 0;
}

static void __exit bitspinlock_test_module_exit(void)
{
}

module_init(bitspinlock_test_module_init);
module_exit(bitspinlock_test_module_exit);
MODULE_LICENSE("GPL");


...and tested in the following ARM server environment:

Processor: HI1616 (https://en.wikichip.org/wiki/hisilicon/hi16xx/hi1616)
Board: HiSilicon D05 Development Board (http://open-estuary.org/d05/)
Memory: 512GB
Host OS: Ubuntu 18.04.1 LTS (Ubuntu 4.15.0-29.31-generic 4.15.18)
QEMU KVM OS: Linux 4.19 + buildroot
QEMU KVM cmdline: qemu-system-aarch64 -enable-kvm -cpu host -smp 4 -m 256M 
-kernel Image -M virt,kernel_irqchip=on -nographic -hda rootfs.ext2 -append 
'root=/dev/vda console=ttyAMA0 earlycon=pl011,0x900' -serial mon:stdio -net 
none

Without this patch:
  Thread 0   Thread 1   Thread 2   Thread 3   Thread 4   Thread 5   Thread 6   
Thread 7
1 1283709480 1271869280  454742480 1173673820 1145643640 1118846920  774616920 
1144146140
2  643580180  625143860  576841700  322982340  649987880  585749000  529178880  
373374780
3  672307220  847315000  880801860 1039502040  667086380 1033939940 1035381120 
1046898300
4  568635580  440547020  737000380  910040880  804543740  712314280  868896880  
867049000
5  749107320  726397720  776134480  611970100  756721040  753449440  711691300  
609343300

With this patch:
  Thread 0   Thread 1   Thread 2   Thread 3   Thread 4   Thread 5   Thread 6   
Thread 7
1  170327620  196322160  169434180   74723860  178145600  178873460  143843260  
 70998780
2  166415220  129649200  166161240  175241520  155474460  112811860  157003140  
150087420
3  511420780  117655640  598641860  596213720  462888760  430838600  554346300  
428035120 
4  174520240  156311800  120274280   87465380  172781400  136118620  163728340  
 63026360
5  153677940  202786860  183626500  140721300  150311360  161266840  168154340  
107247460

Thanks,
Gao Xiang


[PATCH v2] bit_spinlock: introduce smp_cond_load_relaxed

2018-10-29 Thread Gao Xiang
It is better to use wrapped smp_cond_load_relaxed
instead of open-coded busy waiting for bit_spinlock.

Signed-off-by: Gao Xiang 
---

change log v2:
 - fix the incorrect expression !(VAL >> (bitnum & (BITS_PER_LONG-1)))
 - the test result is described in the following reply.

Thanks,
Gao Xiang

 include/linux/bit_spinlock.h | 23 ++-
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/include/linux/bit_spinlock.h b/include/linux/bit_spinlock.h
index bbc4730a6505..d5f922b5ffd9 100644
--- a/include/linux/bit_spinlock.h
+++ b/include/linux/bit_spinlock.h
@@ -15,22 +15,19 @@
  */
 static inline void bit_spin_lock(int bitnum, unsigned long *addr)
 {
-   /*
-* Assuming the lock is uncontended, this never enters
-* the body of the outer loop. If it is contended, then
-* within the inner loop a non-atomic test is used to
-* busywait with less bus contention for a good time to
-* attempt to acquire the lock bit.
-*/
-   preempt_disable();
 #if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
-   while (unlikely(test_and_set_bit_lock(bitnum, addr))) {
-   preempt_enable();
-   do {
-   cpu_relax();
-   } while (test_bit(bitnum, addr));
+   const unsigned int bitshift = bitnum & (BITS_PER_LONG - 1);
+
+   while (1) {
+   smp_cond_load_relaxed([BIT_WORD(bitnum)],
+ !((VAL >> bitshift) & 1));
preempt_disable();
+   if (!test_and_set_bit_lock(bitnum, addr))
+   break;
+   preempt_enable();
}
+#else
+   preempt_disable();
 #endif
__acquire(bitlock);
 }
-- 
2.17.1



[PATCH v2] bit_spinlock: introduce smp_cond_load_relaxed

2018-10-29 Thread Gao Xiang
It is better to use wrapped smp_cond_load_relaxed
instead of open-coded busy waiting for bit_spinlock.

Signed-off-by: Gao Xiang 
---

change log v2:
 - fix the incorrect expression !(VAL >> (bitnum & (BITS_PER_LONG-1)))
 - the test result is described in the following reply.

Thanks,
Gao Xiang

 include/linux/bit_spinlock.h | 23 ++-
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/include/linux/bit_spinlock.h b/include/linux/bit_spinlock.h
index bbc4730a6505..d5f922b5ffd9 100644
--- a/include/linux/bit_spinlock.h
+++ b/include/linux/bit_spinlock.h
@@ -15,22 +15,19 @@
  */
 static inline void bit_spin_lock(int bitnum, unsigned long *addr)
 {
-   /*
-* Assuming the lock is uncontended, this never enters
-* the body of the outer loop. If it is contended, then
-* within the inner loop a non-atomic test is used to
-* busywait with less bus contention for a good time to
-* attempt to acquire the lock bit.
-*/
-   preempt_disable();
 #if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
-   while (unlikely(test_and_set_bit_lock(bitnum, addr))) {
-   preempt_enable();
-   do {
-   cpu_relax();
-   } while (test_bit(bitnum, addr));
+   const unsigned int bitshift = bitnum & (BITS_PER_LONG - 1);
+
+   while (1) {
+   smp_cond_load_relaxed([BIT_WORD(bitnum)],
+ !((VAL >> bitshift) & 1));
preempt_disable();
+   if (!test_and_set_bit_lock(bitnum, addr))
+   break;
+   preempt_enable();
}
+#else
+   preempt_disable();
 #endif
__acquire(bitlock);
 }
-- 
2.17.1



Re: [PATCH 1/4] base/drivers/arch_topology: Remove useless check

2018-10-29 Thread Viresh Kumar
On Mon, Oct 29, 2018 at 9:56 PM Daniel Lezcano
 wrote:

Would have been better if I was cc'd on all the patches since I was
looking at this
stuff actively this week :)

> The function 'register_cpufreq_notifier' registers the
> init_cpu_capacity_notifier() only if raw_capacity is not NULL.
>
> Hence init_cpu_capacity_notifier() can not be called with raw_capacity
> set to NULL, it is pointless to check it.

It isn't entirely pointless though.

It is possible for init_cpu_capacity_notifier() to get called after
free_raw_capacity()
is called from it as the notifier unregistration happens from a workqueue.


Re: [PATCH 1/4] base/drivers/arch_topology: Remove useless check

2018-10-29 Thread Viresh Kumar
On Mon, Oct 29, 2018 at 9:56 PM Daniel Lezcano
 wrote:

Would have been better if I was cc'd on all the patches since I was
looking at this
stuff actively this week :)

> The function 'register_cpufreq_notifier' registers the
> init_cpu_capacity_notifier() only if raw_capacity is not NULL.
>
> Hence init_cpu_capacity_notifier() can not be called with raw_capacity
> set to NULL, it is pointless to check it.

It isn't entirely pointless though.

It is possible for init_cpu_capacity_notifier() to get called after
free_raw_capacity()
is called from it as the notifier unregistration happens from a workqueue.


[PATCH] arm64: dts: qcom: msm8998: Reserve gpio ranges on MTP

2018-10-29 Thread Bjorn Andersson
GPIOs 0 through 3 and 81 through 84 are configured to not be accessible
from the application CPUs. Mark them as reserved to allow the MSM8998
MTP to boot after the introduction of 3edfb7bd76bd ("gpiolib: Show
correct direction from the beginning").

Signed-off-by: Bjorn Andersson 
---
 arch/arm64/boot/dts/qcom/msm8998-mtp.dtsi | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/arm64/boot/dts/qcom/msm8998-mtp.dtsi 
b/arch/arm64/boot/dts/qcom/msm8998-mtp.dtsi
index b4276da1fb0d..11fd1fe8bdb5 100644
--- a/arch/arm64/boot/dts/qcom/msm8998-mtp.dtsi
+++ b/arch/arm64/boot/dts/qcom/msm8998-mtp.dtsi
@@ -241,3 +241,7 @@
};
};
 };
+
+ {
+   gpio-reserved-ranges = <0 4>, <81 4>;
+};
-- 
2.18.0



[PATCH] arm64: dts: qcom: msm8998: Reserve gpio ranges on MTP

2018-10-29 Thread Bjorn Andersson
GPIOs 0 through 3 and 81 through 84 are configured to not be accessible
from the application CPUs. Mark them as reserved to allow the MSM8998
MTP to boot after the introduction of 3edfb7bd76bd ("gpiolib: Show
correct direction from the beginning").

Signed-off-by: Bjorn Andersson 
---
 arch/arm64/boot/dts/qcom/msm8998-mtp.dtsi | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/arm64/boot/dts/qcom/msm8998-mtp.dtsi 
b/arch/arm64/boot/dts/qcom/msm8998-mtp.dtsi
index b4276da1fb0d..11fd1fe8bdb5 100644
--- a/arch/arm64/boot/dts/qcom/msm8998-mtp.dtsi
+++ b/arch/arm64/boot/dts/qcom/msm8998-mtp.dtsi
@@ -241,3 +241,7 @@
};
};
 };
+
+ {
+   gpio-reserved-ranges = <0 4>, <81 4>;
+};
-- 
2.18.0



[PATCH] doc: correct parameter in stallwarn

2018-10-29 Thread Joel Fernandes (Google)
The stallwarn document incorrectly mentions 'fps=' instead of 'fqs='.
Correct that.

Signed-off-by: Joel Fernandes (Google) 
---
 Documentation/RCU/stallwarn.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.txt
index b01bcafc64aa..073dbc12d1ea 100644
--- a/Documentation/RCU/stallwarn.txt
+++ b/Documentation/RCU/stallwarn.txt
@@ -205,7 +205,7 @@ handlers are no longer able to execute on this CPU.  This 
can happen if
 the stalled CPU is spinning with interrupts are disabled, or, in -rt
 kernels, if a high-priority process is starving RCU's softirq handler.
 
-The "fps=" shows the number of force-quiescent-state idle/offline
+The "fqs=" shows the number of force-quiescent-state idle/offline
 detection passes that the grace-period kthread has made across this
 CPU since the last time that this CPU noted the beginning of a grace
 period.
-- 
2.19.1.568.g152ad8e336-goog



[PATCH] doc: correct parameter in stallwarn

2018-10-29 Thread Joel Fernandes (Google)
The stallwarn document incorrectly mentions 'fps=' instead of 'fqs='.
Correct that.

Signed-off-by: Joel Fernandes (Google) 
---
 Documentation/RCU/stallwarn.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.txt
index b01bcafc64aa..073dbc12d1ea 100644
--- a/Documentation/RCU/stallwarn.txt
+++ b/Documentation/RCU/stallwarn.txt
@@ -205,7 +205,7 @@ handlers are no longer able to execute on this CPU.  This 
can happen if
 the stalled CPU is spinning with interrupts are disabled, or, in -rt
 kernels, if a high-priority process is starving RCU's softirq handler.
 
-The "fps=" shows the number of force-quiescent-state idle/offline
+The "fqs=" shows the number of force-quiescent-state idle/offline
 detection passes that the grace-period kthread has made across this
 CPU since the last time that this CPU noted the beginning of a grace
 period.
-- 
2.19.1.568.g152ad8e336-goog



linux-next: Tree for Oct 30

2018-10-29 Thread Stephen Rothwell
Hi all,

Please do not add any v4.21/v5.1 code to your linux-next included trees
until after the merge window closes.

Changes since 20181029:

My fixes tree contains this:

  "drivers: net: include linux/ip.h for iphdr"

The compiler-attributes tree gained a conflict against the kbuild tree.

The vfs tree gained a conflict against Linus' tree.

Non-merge commits (relative to Linus' tree): 1133
 1678 files changed, 65393 insertions(+), 20326 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a
multi_v7_defconfig for arm and a native build of tools/perf. After
the final fixups (if any), I do an x86_64 modules_install followed by
builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit),
ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc
and sparc64 defconfig. And finally, a simple boot test of the powerpc
pseries_le_defconfig kernel in qemu (with and without kvm enabled).

Below is a summary of the state of the merge.

I am currently merging 291 trees (counting Linus' and 66 trees of bug
fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (4b42745211af Merge tag 'armsoc-soc' of 
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc)
Merging fixes/master (2941927a2da1 drivers: net: include linux/ip.h for iphdr)
Merging kbuild-current/fixes (9f51ae62c84a Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net)
Merging arc-current/for-curr (56d02dd9e794 ARC: IOC: panic if kernel was 
started with previously enabled IOC)
Merging arm-current/fixes (3a58ac65e2d7 ARM: 8799/1: mm: fix pci_ioremap_io() 
offset check)
Merging arm64-fixes/for-next/fixes (ca2b497253ad arm64: perf: Reject 
stand-alone CHAIN events for PMUv3)
Merging m68k-current/for-linus (58c116fb7dc6 m68k/sun3: Remove is_medusa and 
m68k_pgtable_cachemode)
Merging powerpc-fixes/fixes (ac1788cc7da4 powerpc/numa: Skip onlining a offline 
node in kdump path)
Merging sparc/master (345671ea0f92 Merge branch 'akpm' (patches from Andrew))
Merging fscrypt-current/for-stable (ae64f9bd1d36 Linux 4.15-rc2)
Merging net/master (9f51ae62c84a Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net)
Merging bpf/master (d8fd9e106fbc bpf: fix wrong helper enablement in cgroup 
local storage)
Merging ipsec/master (6788fac82001 Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf)
Merging netfilter/master (c1cf13068b26 Merge branch 'master' of 
git://blackhole.kfki.hu/nf)
Merging ipvs/master (feb9f55c33e5 netfilter: nft_dynset: allow dynamic updates 
of non-anonymous set)
Merging wireless-drivers/master (3baafeffa48a iwlwifi: 1000: set the TFD queue 
size)
Merging mac80211/master (8d0be26c781a mac80211_hwsim: fix module init error 
paths for netlink)
Merging rdma-fixes/for-rc (a3671a4f973e RDMA/ucma: Fix Spectre v1 vulnerability)
Merging sound-current/for-linus (aedef16a63d5 ALSA: dice: fix to wait for 
releases of all ALSA character devices)
Merging sound-asoc-fixes/for-linus (eafb621d62c7 Merge branch 'asoc-4.19' into 
asoc-linus)
Merging regmap-fixes/for-linus (35a7f35ad1b1 Linux 4.19-rc8)
Merging regulator-fixes/for-linus (84df9525b0c2 Linux 4.19)
Merging spi-fixes/for-linus (599eb81f4118 Merge branch 'spi-4.19' into 
spi-linus)
Merging pci-current/for-linus (2edab4df98d9 PCI: Expand the "PF" acronym in 
Kconfig help text)
Merging driver-core.current/driver-core-linus (9f51ae62c84a Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net)
Merging tty.current/tty-linus (202dc3cc10b4 serial: sh-sci: Fix receive on 
SCIFA/SCIFB variants with DMA)
Merging usb.current/usb-linus (69d5b97c5973 HID: we do not randomly make new 
drivers 'default y')
Merging usb-gadget-fixes/fixes (d9707490077b usb: dwc2: Fix call location of 
dwc2_check_core_endianness)
Merging usb-serial-

linux-next: Tree for Oct 30

2018-10-29 Thread Stephen Rothwell
Hi all,

Please do not add any v4.21/v5.1 code to your linux-next included trees
until after the merge window closes.

Changes since 20181029:

My fixes tree contains this:

  "drivers: net: include linux/ip.h for iphdr"

The compiler-attributes tree gained a conflict against the kbuild tree.

The vfs tree gained a conflict against Linus' tree.

Non-merge commits (relative to Linus' tree): 1133
 1678 files changed, 65393 insertions(+), 20326 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a
multi_v7_defconfig for arm and a native build of tools/perf. After
the final fixups (if any), I do an x86_64 modules_install followed by
builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit),
ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc
and sparc64 defconfig. And finally, a simple boot test of the powerpc
pseries_le_defconfig kernel in qemu (with and without kvm enabled).

Below is a summary of the state of the merge.

I am currently merging 291 trees (counting Linus' and 66 trees of bug
fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (4b42745211af Merge tag 'armsoc-soc' of 
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc)
Merging fixes/master (2941927a2da1 drivers: net: include linux/ip.h for iphdr)
Merging kbuild-current/fixes (9f51ae62c84a Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net)
Merging arc-current/for-curr (56d02dd9e794 ARC: IOC: panic if kernel was 
started with previously enabled IOC)
Merging arm-current/fixes (3a58ac65e2d7 ARM: 8799/1: mm: fix pci_ioremap_io() 
offset check)
Merging arm64-fixes/for-next/fixes (ca2b497253ad arm64: perf: Reject 
stand-alone CHAIN events for PMUv3)
Merging m68k-current/for-linus (58c116fb7dc6 m68k/sun3: Remove is_medusa and 
m68k_pgtable_cachemode)
Merging powerpc-fixes/fixes (ac1788cc7da4 powerpc/numa: Skip onlining a offline 
node in kdump path)
Merging sparc/master (345671ea0f92 Merge branch 'akpm' (patches from Andrew))
Merging fscrypt-current/for-stable (ae64f9bd1d36 Linux 4.15-rc2)
Merging net/master (9f51ae62c84a Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net)
Merging bpf/master (d8fd9e106fbc bpf: fix wrong helper enablement in cgroup 
local storage)
Merging ipsec/master (6788fac82001 Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf)
Merging netfilter/master (c1cf13068b26 Merge branch 'master' of 
git://blackhole.kfki.hu/nf)
Merging ipvs/master (feb9f55c33e5 netfilter: nft_dynset: allow dynamic updates 
of non-anonymous set)
Merging wireless-drivers/master (3baafeffa48a iwlwifi: 1000: set the TFD queue 
size)
Merging mac80211/master (8d0be26c781a mac80211_hwsim: fix module init error 
paths for netlink)
Merging rdma-fixes/for-rc (a3671a4f973e RDMA/ucma: Fix Spectre v1 vulnerability)
Merging sound-current/for-linus (aedef16a63d5 ALSA: dice: fix to wait for 
releases of all ALSA character devices)
Merging sound-asoc-fixes/for-linus (eafb621d62c7 Merge branch 'asoc-4.19' into 
asoc-linus)
Merging regmap-fixes/for-linus (35a7f35ad1b1 Linux 4.19-rc8)
Merging regulator-fixes/for-linus (84df9525b0c2 Linux 4.19)
Merging spi-fixes/for-linus (599eb81f4118 Merge branch 'spi-4.19' into 
spi-linus)
Merging pci-current/for-linus (2edab4df98d9 PCI: Expand the "PF" acronym in 
Kconfig help text)
Merging driver-core.current/driver-core-linus (9f51ae62c84a Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net)
Merging tty.current/tty-linus (202dc3cc10b4 serial: sh-sci: Fix receive on 
SCIFA/SCIFB variants with DMA)
Merging usb.current/usb-linus (69d5b97c5973 HID: we do not randomly make new 
drivers 'default y')
Merging usb-gadget-fixes/fixes (d9707490077b usb: dwc2: Fix call location of 
dwc2_check_core_endianness)
Merging usb-serial-

Re: [RFC PATCH] Implement /proc/pid/kill

2018-10-29 Thread Aleksa Sarai
On 2018-10-29, Daniel Colascione  wrote:
> Add a simple proc-based kill interface. To use /proc/pid/kill, just
> write the signal number in base-10 ASCII to the kill file of the
> process to be killed: for example, 'echo 9 > /proc/$$/kill'.
> 
> Semantically, /proc/pid/kill works like kill(2), except that the
> process ID comes from the proc filesystem context instead of from an
> explicit system call parameter. This way, it's possible to avoid races
> between inspecting some aspect of a process and that process's PID
> being reused for some other process.

(Aside from any UX concerns other folks might have.)

I think it would be a good idea to (at least temporarily) restrict this
so that only processes that are in the same PID namespace as the /proc
being resolved through may use this interface. Otherwise you might have
cases where partial container breakouts can start sending signals to
PIDs they wouldn't normally be able to address.

> With /proc/pid/kill, it's possible to write a proper race-free and
> safe pkill(1). An approximation follows. A real program might use
> openat(2), having opened a process's /proc/pid directory explicitly,
> with the directory file descriptor serving as a sort of "process
> handle".

I do like the idea of holding a dirfd to /proc/$pid to address
processes, and it something I considered doing in runc. (Unfortunately
there are lots of things that make it a bit difficult to use /proc/$pid
exclusively for introspection of a process -- especially in the context
of containers.)

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH



signature.asc
Description: PGP signature


Re: [RFC PATCH] Implement /proc/pid/kill

2018-10-29 Thread Aleksa Sarai
On 2018-10-29, Daniel Colascione  wrote:
> Add a simple proc-based kill interface. To use /proc/pid/kill, just
> write the signal number in base-10 ASCII to the kill file of the
> process to be killed: for example, 'echo 9 > /proc/$$/kill'.
> 
> Semantically, /proc/pid/kill works like kill(2), except that the
> process ID comes from the proc filesystem context instead of from an
> explicit system call parameter. This way, it's possible to avoid races
> between inspecting some aspect of a process and that process's PID
> being reused for some other process.

(Aside from any UX concerns other folks might have.)

I think it would be a good idea to (at least temporarily) restrict this
so that only processes that are in the same PID namespace as the /proc
being resolved through may use this interface. Otherwise you might have
cases where partial container breakouts can start sending signals to
PIDs they wouldn't normally be able to address.

> With /proc/pid/kill, it's possible to write a proper race-free and
> safe pkill(1). An approximation follows. A real program might use
> openat(2), having opened a process's /proc/pid directory explicitly,
> with the directory file descriptor serving as a sort of "process
> handle".

I do like the idea of holding a dirfd to /proc/$pid to address
processes, and it something I considered doing in runc. (Unfortunately
there are lots of things that make it a bit difficult to use /proc/$pid
exclusively for introspection of a process -- especially in the context
of containers.)

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH



signature.asc
Description: PGP signature


Re: [RFC PATCH v2 3/3] mm, oom: hand over MMF_OOM_SKIP to exit path if it is guranteed to finish

2018-10-29 Thread Tetsuo Handa
Michal Hocko wrote:
> @@ -3156,6 +3166,13 @@ void exit_mmap(struct mm_struct *mm)
> vma = remove_vma(vma);
> }
> vm_unacct_memory(nr_accounted);
> +
> +   /*
> +* Now that the full address space is torn down, make sure the
> +* OOM killer skips over this task
> +*/
> +   if (oom)
> +   set_bit(MMF_OOM_SKIP, >flags);
>  }
> 
>  /* Insert vm structure into process list sorted by address

I don't like setting MMF_OOF_SKIP after remove_vma() loop. 50 users might
call vma->vm_ops->close() from remove_vma(). Some of them are doing fs
writeback, some of them might be doing GFP_KERNEL allocation from
vma->vm_ops->open() with a lock also held by vma->vm_ops->close().

I don't think that waiting for completion of remove_vma() loop is safe.
And my patch is safe.

 drivers/android/binder.c  |2 +-
 drivers/gpu/drm/drm_gem_cma_helper.c  |2 +-
 drivers/gpu/drm/drm_vm.c  |8 
 drivers/gpu/drm/gma500/framebuffer.c  |2 +-
 drivers/gpu/drm/gma500/psb_drv.c  |2 +-
 drivers/gpu/drm/i915/i915_drv.c   |2 +-
 drivers/gpu/drm/ttm/ttm_bo_vm.c   |2 +-
 drivers/gpu/drm/udl/udl_drv.c |2 +-
 drivers/gpu/drm/v3d/v3d_drv.c |2 +-
 drivers/gpu/drm/vc4/vc4_drv.c |2 +-
 drivers/gpu/drm/vgem/vgem_drv.c   |2 +-
 drivers/gpu/drm/vkms/vkms_drv.c   |2 +-
 drivers/gpu/drm/xen/xen_drm_front.c   |2 +-
 drivers/hwtracing/intel_th/msu.c  |2 +-
 drivers/hwtracing/stm/core.c  |2 +-
 drivers/infiniband/core/uverbs_main.c |2 +-
 drivers/infiniband/sw/rdmavt/mmap.c   |2 +-
 drivers/infiniband/sw/rxe/rxe_mmap.c  |2 +-
 drivers/media/common/videobuf2/videobuf2-memops.c |2 +-
 drivers/media/pci/meye/meye.c |2 +-
 drivers/media/platform/omap/omap_vout.c   |2 +-
 drivers/media/usb/stkwebcam/stk-webcam.c  |2 +-
 drivers/media/v4l2-core/videobuf-dma-contig.c |2 +-
 drivers/media/v4l2-core/videobuf-dma-sg.c |2 +-
 drivers/media/v4l2-core/videobuf-vmalloc.c|2 +-
 drivers/misc/genwqe/card_dev.c|2 +-
 drivers/misc/mic/scif/scif_mmap.c |2 +-
 drivers/misc/sgi-gru/grufile.c|2 +-
 drivers/rapidio/devices/rio_mport_cdev.c  |2 +-
 drivers/staging/comedi/comedi_fops.c  |2 +-
 drivers/staging/media/zoran/zoran_driver.c|2 +-
 drivers/staging/vme/devices/vme_user.c|2 +-
 drivers/usb/core/devio.c  |2 +-
 drivers/usb/mon/mon_bin.c |2 +-
 drivers/video/fbdev/omap2/omapfb/omapfb-main.c|2 +-
 drivers/xen/gntalloc.c|2 +-
 drivers/xen/gntdev.c  |2 +-
 drivers/xen/privcmd-buf.c |2 +-
 drivers/xen/privcmd.c |2 +-
 fs/9p/vfs_file.c  |2 +-
 fs/fuse/file.c|2 +-
 fs/kernfs/file.c  |2 +-
 include/linux/mm.h|2 +-
 ipc/shm.c |2 +-
 kernel/events/core.c  |2 +-
 kernel/relay.c|2 +-
 mm/hugetlb.c  |2 +-
 mm/mmap.c |   14 +++---
 net/packet/af_packet.c|2 +-
 sound/core/pcm_native.c   |4 ++--
 sound/usb/usx2y/us122l.c  |2 +-
 sound/usb/usx2y/usx2yhwdeppcm.c   |2 +-
 52 files changed, 62 insertions(+), 62 deletions(-)


Re: [RFC PATCH v2 3/3] mm, oom: hand over MMF_OOM_SKIP to exit path if it is guranteed to finish

2018-10-29 Thread Tetsuo Handa
Michal Hocko wrote:
> @@ -3156,6 +3166,13 @@ void exit_mmap(struct mm_struct *mm)
> vma = remove_vma(vma);
> }
> vm_unacct_memory(nr_accounted);
> +
> +   /*
> +* Now that the full address space is torn down, make sure the
> +* OOM killer skips over this task
> +*/
> +   if (oom)
> +   set_bit(MMF_OOM_SKIP, >flags);
>  }
> 
>  /* Insert vm structure into process list sorted by address

I don't like setting MMF_OOF_SKIP after remove_vma() loop. 50 users might
call vma->vm_ops->close() from remove_vma(). Some of them are doing fs
writeback, some of them might be doing GFP_KERNEL allocation from
vma->vm_ops->open() with a lock also held by vma->vm_ops->close().

I don't think that waiting for completion of remove_vma() loop is safe.
And my patch is safe.

 drivers/android/binder.c  |2 +-
 drivers/gpu/drm/drm_gem_cma_helper.c  |2 +-
 drivers/gpu/drm/drm_vm.c  |8 
 drivers/gpu/drm/gma500/framebuffer.c  |2 +-
 drivers/gpu/drm/gma500/psb_drv.c  |2 +-
 drivers/gpu/drm/i915/i915_drv.c   |2 +-
 drivers/gpu/drm/ttm/ttm_bo_vm.c   |2 +-
 drivers/gpu/drm/udl/udl_drv.c |2 +-
 drivers/gpu/drm/v3d/v3d_drv.c |2 +-
 drivers/gpu/drm/vc4/vc4_drv.c |2 +-
 drivers/gpu/drm/vgem/vgem_drv.c   |2 +-
 drivers/gpu/drm/vkms/vkms_drv.c   |2 +-
 drivers/gpu/drm/xen/xen_drm_front.c   |2 +-
 drivers/hwtracing/intel_th/msu.c  |2 +-
 drivers/hwtracing/stm/core.c  |2 +-
 drivers/infiniband/core/uverbs_main.c |2 +-
 drivers/infiniband/sw/rdmavt/mmap.c   |2 +-
 drivers/infiniband/sw/rxe/rxe_mmap.c  |2 +-
 drivers/media/common/videobuf2/videobuf2-memops.c |2 +-
 drivers/media/pci/meye/meye.c |2 +-
 drivers/media/platform/omap/omap_vout.c   |2 +-
 drivers/media/usb/stkwebcam/stk-webcam.c  |2 +-
 drivers/media/v4l2-core/videobuf-dma-contig.c |2 +-
 drivers/media/v4l2-core/videobuf-dma-sg.c |2 +-
 drivers/media/v4l2-core/videobuf-vmalloc.c|2 +-
 drivers/misc/genwqe/card_dev.c|2 +-
 drivers/misc/mic/scif/scif_mmap.c |2 +-
 drivers/misc/sgi-gru/grufile.c|2 +-
 drivers/rapidio/devices/rio_mport_cdev.c  |2 +-
 drivers/staging/comedi/comedi_fops.c  |2 +-
 drivers/staging/media/zoran/zoran_driver.c|2 +-
 drivers/staging/vme/devices/vme_user.c|2 +-
 drivers/usb/core/devio.c  |2 +-
 drivers/usb/mon/mon_bin.c |2 +-
 drivers/video/fbdev/omap2/omapfb/omapfb-main.c|2 +-
 drivers/xen/gntalloc.c|2 +-
 drivers/xen/gntdev.c  |2 +-
 drivers/xen/privcmd-buf.c |2 +-
 drivers/xen/privcmd.c |2 +-
 fs/9p/vfs_file.c  |2 +-
 fs/fuse/file.c|2 +-
 fs/kernfs/file.c  |2 +-
 include/linux/mm.h|2 +-
 ipc/shm.c |2 +-
 kernel/events/core.c  |2 +-
 kernel/relay.c|2 +-
 mm/hugetlb.c  |2 +-
 mm/mmap.c |   14 +++---
 net/packet/af_packet.c|2 +-
 sound/core/pcm_native.c   |4 ++--
 sound/usb/usx2y/us122l.c  |2 +-
 sound/usb/usx2y/usx2yhwdeppcm.c   |2 +-
 52 files changed, 62 insertions(+), 62 deletions(-)


[PATCH] kbuild: consolidate single targets

2018-10-29 Thread Masahiro Yamada
Instead of specifying target/source pairs, let's list patterns that we
want to handle as single targets. This slightly changes the behavior;
the top Makefile previously checked the presence of a source file,
now Kbuild will descend into a subdirectory anyway to find out what to
do there.

Signed-off-by: Masahiro Yamada 
---

 Makefile | 16 +---
 1 file changed, 1 insertion(+), 15 deletions(-)

diff --git a/Makefile b/Makefile
index be76e6e..7d13add 100644
--- a/Makefile
+++ b/Makefile
@@ -1713,21 +1713,7 @@ else
 target-dir = $(if $(KBUILD_EXTMOD),$(dir $<),$(dir $@))
 endif
 
-%.s: %.c prepare scripts FORCE
-   $(Q)$(MAKE) $(build)=$(build-dir) $(target-dir)$(notdir $@)
-%.i: %.c prepare scripts FORCE
-   $(Q)$(MAKE) $(build)=$(build-dir) $(target-dir)$(notdir $@)
-%.o: %.c prepare scripts FORCE
-   $(Q)$(MAKE) $(build)=$(build-dir) $(target-dir)$(notdir $@)
-%.lst: %.c prepare scripts FORCE
-   $(Q)$(MAKE) $(build)=$(build-dir) $(target-dir)$(notdir $@)
-%.s: %.S prepare scripts FORCE
-   $(Q)$(MAKE) $(build)=$(build-dir) $(target-dir)$(notdir $@)
-%.o: %.S prepare scripts FORCE
-   $(Q)$(MAKE) $(build)=$(build-dir) $(target-dir)$(notdir $@)
-%.symtypes: %.c prepare scripts FORCE
-   $(Q)$(MAKE) $(build)=$(build-dir) $(target-dir)$(notdir $@)
-%.ll: %.c prepare scripts FORCE
+%.i %.ll %.lst %.o %.s %.symtypes: prepare scripts FORCE
$(Q)$(MAKE) $(build)=$(build-dir) $(target-dir)$(notdir $@)
 
 # Modules
-- 
2.7.4



[PATCH] kbuild: consolidate single targets

2018-10-29 Thread Masahiro Yamada
Instead of specifying target/source pairs, let's list patterns that we
want to handle as single targets. This slightly changes the behavior;
the top Makefile previously checked the presence of a source file,
now Kbuild will descend into a subdirectory anyway to find out what to
do there.

Signed-off-by: Masahiro Yamada 
---

 Makefile | 16 +---
 1 file changed, 1 insertion(+), 15 deletions(-)

diff --git a/Makefile b/Makefile
index be76e6e..7d13add 100644
--- a/Makefile
+++ b/Makefile
@@ -1713,21 +1713,7 @@ else
 target-dir = $(if $(KBUILD_EXTMOD),$(dir $<),$(dir $@))
 endif
 
-%.s: %.c prepare scripts FORCE
-   $(Q)$(MAKE) $(build)=$(build-dir) $(target-dir)$(notdir $@)
-%.i: %.c prepare scripts FORCE
-   $(Q)$(MAKE) $(build)=$(build-dir) $(target-dir)$(notdir $@)
-%.o: %.c prepare scripts FORCE
-   $(Q)$(MAKE) $(build)=$(build-dir) $(target-dir)$(notdir $@)
-%.lst: %.c prepare scripts FORCE
-   $(Q)$(MAKE) $(build)=$(build-dir) $(target-dir)$(notdir $@)
-%.s: %.S prepare scripts FORCE
-   $(Q)$(MAKE) $(build)=$(build-dir) $(target-dir)$(notdir $@)
-%.o: %.S prepare scripts FORCE
-   $(Q)$(MAKE) $(build)=$(build-dir) $(target-dir)$(notdir $@)
-%.symtypes: %.c prepare scripts FORCE
-   $(Q)$(MAKE) $(build)=$(build-dir) $(target-dir)$(notdir $@)
-%.ll: %.c prepare scripts FORCE
+%.i %.ll %.lst %.o %.s %.symtypes: prepare scripts FORCE
$(Q)$(MAKE) $(build)=$(build-dir) $(target-dir)$(notdir $@)
 
 # Modules
-- 
2.7.4



[PATCH 2/2] kbuild: remove cc-name variable

2018-10-29 Thread Masahiro Yamada
There is one more user of $(cc-name) in the top Makefile. It is supposed
to detect Clang before invoking Kconfig, so it should still be there
in the $(shell ...) form. All the other users of $(cc-name) have been
replaced with $(CONFIG_CC_IS_CLANG). Hence, scripts/Kbuild.include does
not need to define cc-name any more.

Signed-off-by: Masahiro Yamada 
---

 Makefile   | 2 +-
 scripts/Kbuild.include | 4 
 2 files changed, 1 insertion(+), 5 deletions(-)

diff --git a/Makefile b/Makefile
index bd93bc3..430f7de 100644
--- a/Makefile
+++ b/Makefile
@@ -485,7 +485,7 @@ ifneq ($(KBUILD_SRC),)
$(Q)$(CONFIG_SHELL) $(srctree)/scripts/mkmakefile $(srctree)
 endif
 
-ifeq ($(cc-name),clang)
+ifneq ($(shell $(CC) --version 2>&1 | head -n 1 | grep clang),)
 ifneq ($(CROSS_COMPILE),)
 CLANG_TARGET   := --target=$(notdir $(CROSS_COMPILE:%-=%))
 GCC_TOOLCHAIN_DIR := $(dir $(shell which $(LD)))
diff --git a/scripts/Kbuild.include b/scripts/Kbuild.include
index ca21a35..51703ae 100644
--- a/scripts/Kbuild.include
+++ b/scripts/Kbuild.include
@@ -140,10 +140,6 @@ cc-option-yn = $(call try-run,\
 cc-disable-warning = $(call try-run,\
$(CC) -Werror $(KBUILD_CPPFLAGS) $(CC_OPTION_CFLAGS) -W$(strip $(1)) -c 
-x c /dev/null -o "$$TMP",-Wno-$(strip $(1)))
 
-# cc-name
-# Expands to either gcc or clang
-cc-name = $(shell $(CC) -v 2>&1 | grep -q "clang version" && echo clang || 
echo gcc)
-
 # cc-version
 cc-version = $(shell $(CONFIG_SHELL) $(srctree)/scripts/gcc-version.sh $(CC))
 
-- 
2.7.4



Re: [PATCH v2] pstore: Avoid duplicate call of persistent_ram_zap()

2018-10-29 Thread Joel Fernandes
On Mon, Oct 29, 2018 at 06:37:53AM +, Peng15 Wang 王鹏 wrote:
> 
> 
> >From: Kees Cook 
> >Sent: Monday, October 29, 2018 0:03
> >To: Peng15 Wang 王鹏
> >Cc: an...@enomsg.org; ccr...@android.com; tony.l...@intel.com; 
> >linux-kernel@vger.kernel.org; Joel Fernandes
> >Subject: Re: [PATCH v2] pstore: Avoid duplicate call of persistent_ram_zap()
> >
> >On Sat, Oct 27, 2018 at 2:08 PM, Peng15 Wang 王鹏  
> >wrote:
> >> When initialing prz with invalid data in buffer(no PERSISTENT_RAM_SIG),
> >> function call path is like this:
> >>
> >> ramoops_init_prz ->
> >> |
> >> |-> persistent_ram_new -> persistent_ram_post_init -> persistent_ram_zap
> >> |
> >> |-> persistent_ram_zap
> >>
> >> As we can see, persistent_ram_zap() is called twice.
> >> We can avoid this by adding an option to persistent_ram_new(), and
> >> only call persistent_ram_zap() when it is needed.
> >>
> >> Signed-off-by: Peng Wang 
> >> ---
> >>  fs/pstore/ram.c|  5 +++--
> >>  fs/pstore/ram_core.c   | 11 +++
> >>  include/linux/pstore_ram.h |  3 ++-
> >>  3 files changed, 12 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/fs/pstore/ram.c b/fs/pstore/ram.c
> >> index ffcff6516e89..3044274de2f0 100644
> >> --- a/fs/pstore/ram.c
> >> +++ b/fs/pstore/ram.c
> >> @@ -596,7 +596,8 @@ static int ramoops_init_przs(const char *name,
> >>   name, i, *cnt - 1);
> >> prz_ar[i] = persistent_ram_new(*paddr, zone_sz, sig,
> >>>ecc_info,
> >> -  cxt->memtype, flags, label);
> >> +  cxt->memtype, flags,
> >> +  label, true);
> >> if (IS_ERR(prz_ar[i])) {
> >> err = PTR_ERR(prz_ar[i]);
> >> dev_err(dev, "failed to request %s mem region 
> >> (0x%zx@0x%llx): %d\n",
> >> @@ -640,7 +641,7 @@ static int ramoops_init_prz(const char *name,
> >>
> >> label = kasprintf(GFP_KERNEL, "ramoops:%s", name);
> >> *prz = persistent_ram_new(*paddr, sz, sig, >ecc_info,
> >> - cxt->memtype, 0, label);
> >> + cxt->memtype, 0, label, false);
> >> if (IS_ERR(*prz)) {
> >> int err = PTR_ERR(*prz);
> >>
> >> diff --git a/fs/pstore/ram_core.c b/fs/pstore/ram_core.c
> >> index 12e21f789194..d8a520c8741c 100644
> >> --- a/fs/pstore/ram_core.c
> >> +++ b/fs/pstore/ram_core.c
> >> @@ -486,7 +486,8 @@ static int persistent_ram_buffer_map(phys_addr_t 
> >> start, phys_addr_t size,
> >>  }
> >>
> >>  static int persistent_ram_post_init(struct persistent_ram_zone *prz, u32 
> >> sig,
> >> -   struct persistent_ram_ecc_info 
> >> *ecc_info)
> >> +   struct persistent_ram_ecc_info 
> >> *ecc_info,
> >> +   bool zap_option)
> >>  {
> >> int ret;
> >>
> >> @@ -514,7 +515,8 @@ static int persistent_ram_post_init(struct 
> >> persistent_ram_zone *prz, u32 >sig,
> >>
> >> /* Rewind missing or invalid memory area. */
> >> prz->buffer->sig = sig;
> >> -   persistent_ram_zap(prz);
> >> +   if (zap_option)
> >> +   persistent_ram_zap(prz);
> >
> >This part of persistent_ram_post_init() handles the "invalid buffer"
> >case, which should always zap. The question is whether or not to zap
> >in the case of a valid buffer (the "return 0" earlier in the
> >function). I think you v2 patch needs similar changes found in your
> >v1: the v2 patch also needs to remove the "return 0" and replace it
> >with "zap_option = true;" and to remove the zap call from
> >ramoops_init_prz(). Then I think all the paths will be consolidated.
> 
> Thank you so much for the tips!
> 
> Furthermore,  we can make "zap_option" stand for whether its caller want to 
> zap in case of
> a valid buffer. So ramoops_init_przs() would say "false", and 
> ramoops_init_prz() would 
> say "true".
> 
> In persistent_ram_post_init(), if zap_option says "false", we return 
> immediately after 
> persistent_ram_save_old(), otherwise persistent_ram_zap would be called at 
> the end.

Can you not just add it to the flags, something like PRZ_ZAP_NEW, and set
that flag before calling ramoops_init_prz*, then check the flag in
persistent_ram_new? We are already passing flags to persistent_ram_new.

That way no new function arguments are needed and its simple.

 - Joel



[PATCH 2/2] kbuild: remove cc-name variable

2018-10-29 Thread Masahiro Yamada
There is one more user of $(cc-name) in the top Makefile. It is supposed
to detect Clang before invoking Kconfig, so it should still be there
in the $(shell ...) form. All the other users of $(cc-name) have been
replaced with $(CONFIG_CC_IS_CLANG). Hence, scripts/Kbuild.include does
not need to define cc-name any more.

Signed-off-by: Masahiro Yamada 
---

 Makefile   | 2 +-
 scripts/Kbuild.include | 4 
 2 files changed, 1 insertion(+), 5 deletions(-)

diff --git a/Makefile b/Makefile
index bd93bc3..430f7de 100644
--- a/Makefile
+++ b/Makefile
@@ -485,7 +485,7 @@ ifneq ($(KBUILD_SRC),)
$(Q)$(CONFIG_SHELL) $(srctree)/scripts/mkmakefile $(srctree)
 endif
 
-ifeq ($(cc-name),clang)
+ifneq ($(shell $(CC) --version 2>&1 | head -n 1 | grep clang),)
 ifneq ($(CROSS_COMPILE),)
 CLANG_TARGET   := --target=$(notdir $(CROSS_COMPILE:%-=%))
 GCC_TOOLCHAIN_DIR := $(dir $(shell which $(LD)))
diff --git a/scripts/Kbuild.include b/scripts/Kbuild.include
index ca21a35..51703ae 100644
--- a/scripts/Kbuild.include
+++ b/scripts/Kbuild.include
@@ -140,10 +140,6 @@ cc-option-yn = $(call try-run,\
 cc-disable-warning = $(call try-run,\
$(CC) -Werror $(KBUILD_CPPFLAGS) $(CC_OPTION_CFLAGS) -W$(strip $(1)) -c 
-x c /dev/null -o "$$TMP",-Wno-$(strip $(1)))
 
-# cc-name
-# Expands to either gcc or clang
-cc-name = $(shell $(CC) -v 2>&1 | grep -q "clang version" && echo clang || 
echo gcc)
-
 # cc-version
 cc-version = $(shell $(CONFIG_SHELL) $(srctree)/scripts/gcc-version.sh $(CC))
 
-- 
2.7.4



Re: [PATCH v2] pstore: Avoid duplicate call of persistent_ram_zap()

2018-10-29 Thread Joel Fernandes
On Mon, Oct 29, 2018 at 06:37:53AM +, Peng15 Wang 王鹏 wrote:
> 
> 
> >From: Kees Cook 
> >Sent: Monday, October 29, 2018 0:03
> >To: Peng15 Wang 王鹏
> >Cc: an...@enomsg.org; ccr...@android.com; tony.l...@intel.com; 
> >linux-kernel@vger.kernel.org; Joel Fernandes
> >Subject: Re: [PATCH v2] pstore: Avoid duplicate call of persistent_ram_zap()
> >
> >On Sat, Oct 27, 2018 at 2:08 PM, Peng15 Wang 王鹏  
> >wrote:
> >> When initialing prz with invalid data in buffer(no PERSISTENT_RAM_SIG),
> >> function call path is like this:
> >>
> >> ramoops_init_prz ->
> >> |
> >> |-> persistent_ram_new -> persistent_ram_post_init -> persistent_ram_zap
> >> |
> >> |-> persistent_ram_zap
> >>
> >> As we can see, persistent_ram_zap() is called twice.
> >> We can avoid this by adding an option to persistent_ram_new(), and
> >> only call persistent_ram_zap() when it is needed.
> >>
> >> Signed-off-by: Peng Wang 
> >> ---
> >>  fs/pstore/ram.c|  5 +++--
> >>  fs/pstore/ram_core.c   | 11 +++
> >>  include/linux/pstore_ram.h |  3 ++-
> >>  3 files changed, 12 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/fs/pstore/ram.c b/fs/pstore/ram.c
> >> index ffcff6516e89..3044274de2f0 100644
> >> --- a/fs/pstore/ram.c
> >> +++ b/fs/pstore/ram.c
> >> @@ -596,7 +596,8 @@ static int ramoops_init_przs(const char *name,
> >>   name, i, *cnt - 1);
> >> prz_ar[i] = persistent_ram_new(*paddr, zone_sz, sig,
> >>>ecc_info,
> >> -  cxt->memtype, flags, label);
> >> +  cxt->memtype, flags,
> >> +  label, true);
> >> if (IS_ERR(prz_ar[i])) {
> >> err = PTR_ERR(prz_ar[i]);
> >> dev_err(dev, "failed to request %s mem region 
> >> (0x%zx@0x%llx): %d\n",
> >> @@ -640,7 +641,7 @@ static int ramoops_init_prz(const char *name,
> >>
> >> label = kasprintf(GFP_KERNEL, "ramoops:%s", name);
> >> *prz = persistent_ram_new(*paddr, sz, sig, >ecc_info,
> >> - cxt->memtype, 0, label);
> >> + cxt->memtype, 0, label, false);
> >> if (IS_ERR(*prz)) {
> >> int err = PTR_ERR(*prz);
> >>
> >> diff --git a/fs/pstore/ram_core.c b/fs/pstore/ram_core.c
> >> index 12e21f789194..d8a520c8741c 100644
> >> --- a/fs/pstore/ram_core.c
> >> +++ b/fs/pstore/ram_core.c
> >> @@ -486,7 +486,8 @@ static int persistent_ram_buffer_map(phys_addr_t 
> >> start, phys_addr_t size,
> >>  }
> >>
> >>  static int persistent_ram_post_init(struct persistent_ram_zone *prz, u32 
> >> sig,
> >> -   struct persistent_ram_ecc_info 
> >> *ecc_info)
> >> +   struct persistent_ram_ecc_info 
> >> *ecc_info,
> >> +   bool zap_option)
> >>  {
> >> int ret;
> >>
> >> @@ -514,7 +515,8 @@ static int persistent_ram_post_init(struct 
> >> persistent_ram_zone *prz, u32 >sig,
> >>
> >> /* Rewind missing or invalid memory area. */
> >> prz->buffer->sig = sig;
> >> -   persistent_ram_zap(prz);
> >> +   if (zap_option)
> >> +   persistent_ram_zap(prz);
> >
> >This part of persistent_ram_post_init() handles the "invalid buffer"
> >case, which should always zap. The question is whether or not to zap
> >in the case of a valid buffer (the "return 0" earlier in the
> >function). I think you v2 patch needs similar changes found in your
> >v1: the v2 patch also needs to remove the "return 0" and replace it
> >with "zap_option = true;" and to remove the zap call from
> >ramoops_init_prz(). Then I think all the paths will be consolidated.
> 
> Thank you so much for the tips!
> 
> Furthermore,  we can make "zap_option" stand for whether its caller want to 
> zap in case of
> a valid buffer. So ramoops_init_przs() would say "false", and 
> ramoops_init_prz() would 
> say "true".
> 
> In persistent_ram_post_init(), if zap_option says "false", we return 
> immediately after 
> persistent_ram_save_old(), otherwise persistent_ram_zap would be called at 
> the end.

Can you not just add it to the flags, something like PRZ_ZAP_NEW, and set
that flag before calling ramoops_init_prz*, then check the flag in
persistent_ram_new? We are already passing flags to persistent_ram_new.

That way no new function arguments are needed and its simple.

 - Joel



Re: [PATCH tip/core/rcu 02/19] rcu: Defer reporting RCU-preempt quiescent states when disabled

2018-10-29 Thread Joel Fernandes
On Mon, Oct 29, 2018 at 07:27:35AM -0700, Paul E. McKenney wrote:
> On Mon, Oct 29, 2018 at 11:24:42AM +, Ran Rozenstein wrote:
> > Hi Paul and all,
> > 
> > > -Original Message-
> > > From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel-
> > > ow...@vger.kernel.org] On Behalf Of Paul E. McKenney
> > > Sent: Thursday, August 30, 2018 01:21
> > > To: linux-kernel@vger.kernel.org
> > > Cc: mi...@kernel.org; jiangshan...@gmail.com; dipan...@in.ibm.com;
> > > a...@linux-foundation.org; mathieu.desnoy...@efficios.com;
> > > j...@joshtriplett.org; t...@linutronix.de; pet...@infradead.org;
> > > rost...@goodmis.org; dhowe...@redhat.com; eduma...@google.com;
> > > fweis...@gmail.com; o...@redhat.com; j...@joelfernandes.org; Paul E.
> > > McKenney 
> > > Subject: [PATCH tip/core/rcu 02/19] rcu: Defer reporting RCU-preempt
> > > quiescent states when disabled
> > > 
> > > This commit defers reporting of RCU-preempt quiescent states at
> > > rcu_read_unlock_special() time when any of interrupts, softirq, or
> > > preemption are disabled.  These deferred quiescent states are reported at 
> > > a
> > > later RCU_SOFTIRQ, context switch, idle entry, or CPU-hotplug offline
> > > operation.  Of course, if another RCU read-side critical section has 
> > > started in
> > > the meantime, the reporting of the quiescent state will be further 
> > > deferred.
> > > 
> > > This also means that disabling preemption, interrupts, and/or softirqs 
> > > will act
> > > as an RCU-preempt read-side critical section.
> > > This is enforced by checking preempt_count() as needed.
> > > 
> > > Some special cases must be handled on an ad-hoc basis, for example,
> > > context switch is a quiescent state even though both the scheduler and
> > > do_exit() disable preemption.  In these cases, additional calls to
> > > rcu_preempt_deferred_qs() override the preemption disabling.  Similar 
> > > logic
> > > overrides disabled interrupts in rcu_preempt_check_callbacks() because in
> > > this case the quiescent state happened just before the corresponding
> > > scheduling-clock interrupt.
> > > 
> > > In theory, this change lifts a long-standing restriction that required 
> > > that if
> > > interrupts were disabled across a call to rcu_read_unlock() that the 
> > > matching
> > > rcu_read_lock() also be contained within that interrupts-disabled region 
> > > of
> > > code.  Because the reporting of the corresponding RCU-preempt quiescent
> > > state is now deferred until after interrupts have been enabled, it is no 
> > > longer
> > > possible for this situation to result in deadlocks involving the 
> > > scheduler's
> > > runqueue and priority-inheritance locks.  This may allow some code
> > > simplification that might reduce interrupt latency a bit.  Unfortunately, 
> > > in
> > > practice this would also defer deboosting a low-priority task that had 
> > > been
> > > subjected to RCU priority boosting, so real-time-response considerations
> > > might well force this restriction to remain in place.
> > > 
> > > Because RCU-preempt grace periods are now blocked not only by RCU read-
> > > side critical sections, but also by disabling of interrupts, preemption, 
> > > and
> > > softirqs, it will be possible to eliminate RCU-bh and RCU-sched in favor 
> > > of
> > > RCU-preempt in CONFIG_PREEMPT=y kernels.  This may require some
> > > additional plumbing to provide the network denial-of-service guarantees
> > > that have been traditionally provided by RCU-bh.  Once these are in place,
> > > CONFIG_PREEMPT=n kernels will be able to fold RCU-bh into RCU-sched.
> > > This would mean that all kernels would have but one flavor of RCU, which
> > > would open the door to significant code cleanup.
> > > 
> > > Moving to a single flavor of RCU would also have the beneficial effect of
> > > reducing the NOCB kthreads by at least a factor of two.
> > > 
> > > Signed-off-by: Paul E. McKenney  [ paulmck:
> > > Apply rcu_read_unlock_special() preempt_count() feedback
> > >   from Joel Fernandes. ]
> > > [ paulmck: Adjust rcu_eqs_enter() call to rcu_preempt_deferred_qs() in
> > >   response to bug reports from kbuild test robot. ] [ paulmck: Fix bug 
> > > located
> > > by kbuild test robot involving recursion
> > >   via rcu_preempt_deferred_qs(). ]
> > > ---
> > >  .../RCU/Design/Requirements/Requirements.html |  50 +++---
> > >  include/linux/rcutiny.h   |   5 +
> > >  kernel/rcu/tree.c |   9 ++
> > >  kernel/rcu/tree.h |   3 +
> > >  kernel/rcu/tree_exp.h |  71 +++--
> > >  kernel/rcu/tree_plugin.h  | 144 +-
> > >  6 files changed, 205 insertions(+), 77 deletions(-)
> > > 
> > 
> > We started seeing the trace below in our regression system, after I 
> > bisected I found this is the offending commit.
> > This appears immediately on boot. 
> > Please let me know if you need any additional details.
> 
> 

Re: [PATCH tip/core/rcu 02/19] rcu: Defer reporting RCU-preempt quiescent states when disabled

2018-10-29 Thread Joel Fernandes
On Mon, Oct 29, 2018 at 07:27:35AM -0700, Paul E. McKenney wrote:
> On Mon, Oct 29, 2018 at 11:24:42AM +, Ran Rozenstein wrote:
> > Hi Paul and all,
> > 
> > > -Original Message-
> > > From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel-
> > > ow...@vger.kernel.org] On Behalf Of Paul E. McKenney
> > > Sent: Thursday, August 30, 2018 01:21
> > > To: linux-kernel@vger.kernel.org
> > > Cc: mi...@kernel.org; jiangshan...@gmail.com; dipan...@in.ibm.com;
> > > a...@linux-foundation.org; mathieu.desnoy...@efficios.com;
> > > j...@joshtriplett.org; t...@linutronix.de; pet...@infradead.org;
> > > rost...@goodmis.org; dhowe...@redhat.com; eduma...@google.com;
> > > fweis...@gmail.com; o...@redhat.com; j...@joelfernandes.org; Paul E.
> > > McKenney 
> > > Subject: [PATCH tip/core/rcu 02/19] rcu: Defer reporting RCU-preempt
> > > quiescent states when disabled
> > > 
> > > This commit defers reporting of RCU-preempt quiescent states at
> > > rcu_read_unlock_special() time when any of interrupts, softirq, or
> > > preemption are disabled.  These deferred quiescent states are reported at 
> > > a
> > > later RCU_SOFTIRQ, context switch, idle entry, or CPU-hotplug offline
> > > operation.  Of course, if another RCU read-side critical section has 
> > > started in
> > > the meantime, the reporting of the quiescent state will be further 
> > > deferred.
> > > 
> > > This also means that disabling preemption, interrupts, and/or softirqs 
> > > will act
> > > as an RCU-preempt read-side critical section.
> > > This is enforced by checking preempt_count() as needed.
> > > 
> > > Some special cases must be handled on an ad-hoc basis, for example,
> > > context switch is a quiescent state even though both the scheduler and
> > > do_exit() disable preemption.  In these cases, additional calls to
> > > rcu_preempt_deferred_qs() override the preemption disabling.  Similar 
> > > logic
> > > overrides disabled interrupts in rcu_preempt_check_callbacks() because in
> > > this case the quiescent state happened just before the corresponding
> > > scheduling-clock interrupt.
> > > 
> > > In theory, this change lifts a long-standing restriction that required 
> > > that if
> > > interrupts were disabled across a call to rcu_read_unlock() that the 
> > > matching
> > > rcu_read_lock() also be contained within that interrupts-disabled region 
> > > of
> > > code.  Because the reporting of the corresponding RCU-preempt quiescent
> > > state is now deferred until after interrupts have been enabled, it is no 
> > > longer
> > > possible for this situation to result in deadlocks involving the 
> > > scheduler's
> > > runqueue and priority-inheritance locks.  This may allow some code
> > > simplification that might reduce interrupt latency a bit.  Unfortunately, 
> > > in
> > > practice this would also defer deboosting a low-priority task that had 
> > > been
> > > subjected to RCU priority boosting, so real-time-response considerations
> > > might well force this restriction to remain in place.
> > > 
> > > Because RCU-preempt grace periods are now blocked not only by RCU read-
> > > side critical sections, but also by disabling of interrupts, preemption, 
> > > and
> > > softirqs, it will be possible to eliminate RCU-bh and RCU-sched in favor 
> > > of
> > > RCU-preempt in CONFIG_PREEMPT=y kernels.  This may require some
> > > additional plumbing to provide the network denial-of-service guarantees
> > > that have been traditionally provided by RCU-bh.  Once these are in place,
> > > CONFIG_PREEMPT=n kernels will be able to fold RCU-bh into RCU-sched.
> > > This would mean that all kernels would have but one flavor of RCU, which
> > > would open the door to significant code cleanup.
> > > 
> > > Moving to a single flavor of RCU would also have the beneficial effect of
> > > reducing the NOCB kthreads by at least a factor of two.
> > > 
> > > Signed-off-by: Paul E. McKenney  [ paulmck:
> > > Apply rcu_read_unlock_special() preempt_count() feedback
> > >   from Joel Fernandes. ]
> > > [ paulmck: Adjust rcu_eqs_enter() call to rcu_preempt_deferred_qs() in
> > >   response to bug reports from kbuild test robot. ] [ paulmck: Fix bug 
> > > located
> > > by kbuild test robot involving recursion
> > >   via rcu_preempt_deferred_qs(). ]
> > > ---
> > >  .../RCU/Design/Requirements/Requirements.html |  50 +++---
> > >  include/linux/rcutiny.h   |   5 +
> > >  kernel/rcu/tree.c |   9 ++
> > >  kernel/rcu/tree.h |   3 +
> > >  kernel/rcu/tree_exp.h |  71 +++--
> > >  kernel/rcu/tree_plugin.h  | 144 +-
> > >  6 files changed, 205 insertions(+), 77 deletions(-)
> > > 
> > 
> > We started seeing the trace below in our regression system, after I 
> > bisected I found this is the offending commit.
> > This appears immediately on boot. 
> > Please let me know if you need any additional details.
> 
> 

Re: [RFC PATCH] Implement /proc/pid/kill

2018-10-29 Thread Joel Fernandes
On Mon, Oct 29, 2018 at 3:11 PM Daniel Colascione  wrote:
>
> Add a simple proc-based kill interface. To use /proc/pid/kill, just
> write the signal number in base-10 ASCII to the kill file of the
> process to be killed: for example, 'echo 9 > /proc/$$/kill'.
>
> Semantically, /proc/pid/kill works like kill(2), except that the
> process ID comes from the proc filesystem context instead of from an
> explicit system call parameter. This way, it's possible to avoid races
> between inspecting some aspect of a process and that process's PID
> being reused for some other process.
>
> With /proc/pid/kill, it's possible to write a proper race-free and
> safe pkill(1). An approximation follows. A real program might use
> openat(2), having opened a process's /proc/pid directory explicitly,
> with the directory file descriptor serving as a sort of "process
> handle".

How long does the 'inspection' procedure take? If its a short
duration, then is PID reuse really an issue, I mean the PIDs are not
reused until wrap around and the only reason this can be a problem is
if you have the wrap around while the 'inspecting some aspect'
procedure takes really long.

Also the proc fs is typically not the right place for this. Some
entries in proc are writeable, but those are for changing values of
kernel data structures. The title of man proc(5) is "proc - process
information pseudo-filesystem". So its "information" right?

IMO without a really good reason for this, it could really be a hard
sell but the RFC was worth it anyway to discuss it ;-)

thanks,

- Joel


Re: [RFC PATCH] Implement /proc/pid/kill

2018-10-29 Thread Joel Fernandes
On Mon, Oct 29, 2018 at 3:11 PM Daniel Colascione  wrote:
>
> Add a simple proc-based kill interface. To use /proc/pid/kill, just
> write the signal number in base-10 ASCII to the kill file of the
> process to be killed: for example, 'echo 9 > /proc/$$/kill'.
>
> Semantically, /proc/pid/kill works like kill(2), except that the
> process ID comes from the proc filesystem context instead of from an
> explicit system call parameter. This way, it's possible to avoid races
> between inspecting some aspect of a process and that process's PID
> being reused for some other process.
>
> With /proc/pid/kill, it's possible to write a proper race-free and
> safe pkill(1). An approximation follows. A real program might use
> openat(2), having opened a process's /proc/pid directory explicitly,
> with the directory file descriptor serving as a sort of "process
> handle".

How long does the 'inspection' procedure take? If its a short
duration, then is PID reuse really an issue, I mean the PIDs are not
reused until wrap around and the only reason this can be a problem is
if you have the wrap around while the 'inspecting some aspect'
procedure takes really long.

Also the proc fs is typically not the right place for this. Some
entries in proc are writeable, but those are for changing values of
kernel data structures. The title of man proc(5) is "proc - process
information pseudo-filesystem". So its "information" right?

IMO without a really good reason for this, it could really be a hard
sell but the RFC was worth it anyway to discuss it ;-)

thanks,

- Joel


Re: [PATCH] kretprobe: produce sane stack traces

2018-10-29 Thread Aleksa Sarai
On 2018-10-30, Masami Hiramatsu  wrote:
> > Historically, kretprobe has always produced unusable stack traces
> > (kretprobe_trampoline is the only entry in most cases, because of the
> > funky stack pointer overwriting). This has caused quite a few annoyances
> > when using tracing to debug problems[1] -- since return values are only
> > available with kretprobes but stack traces were only usable for kprobes,
> > users had to probe both and then manually associate them.
> 
> Yes, this unfortunately still happens. I once tried to fix it by
> replacing current "kretprobe instance" with graph-tracer's per-thread
> return stack. (https://lkml.org/lkml/2017/8/21/553)

I played with graph-tracer a while ago and it didn't appear to have
associated return values? Is this hidden somewhere or did I just miss
it?

> I still believe that direction is the best solution to solve this kind
> of issues, otherwise, we have to have 2 different stack fixups for
> kretprobe and ftrace graph tracer. (I will have a talk with Steve at
> plumbers next month)

I'm definitely :+1: on removing the duplication of the stack fixups, my
first instinct was to try to refactor all of the stack_trace code so
that we didn't have multiple arch-specific "get the stack trace" paths
(and so we could generically add current_kretprobe_instance() to one
codepath). But after looking into it, I was convinced this would be more
than a little ugly to do.

> > With the advent of bpf_trace, users would have been able to do this
> > association in bpf, but this was less than ideal (because
> > bpf_get_stackid would still produce rubbish and programs that didn't
> > know better would get silly results). The main usecase for stack traces
> > (at least with bpf_trace) is for DTrace-style aggregation on stack
> > traces (both entry and exit). Therefore we cannot simply correct the
> > stack trace on exit -- we must stash away the stack trace and return the
> > entry stack trace when it is requested.
> > 
> > In theory, patches like commit 76094a2cf46e ("ftrace: distinguish
> > kretprobe'd functions in trace logs") are no longer necessary *for
> > tracing* because now all kretprobe traces should produce sane stack
> > traces. However it's not clear whether removing them completely is
> > reasonable.
> 
> Then, let's try to revert it :)

Sure. :P

> BTW, could you also add a test case for ftrace too?
> also, I have some comments below.

Yup, will do.

> > +#define KRETPROBE_TRACE_SIZE 1024
> > +struct kretprobe_trace {
> > +   int nr_entries;
> > +   unsigned long entries[KRETPROBE_TRACE_SIZE];
> > +};
> 
> Hmm, do we really need all entries? It takes 8KB for each instances.
> Note that the number of instances can be big if the system core number
> is larger.

Yeah, you're right this is too large for a default.

But the problem is that we need it to be large enough for any of the
tracers to be happy -- otherwise we'd have to dynamically allocate it
and I had a feeling this would be seen as a Bad Idea™ in the kprobe
paths.

  * ftrace uses PAGE_SIZE/sizeof(u64) == 512 (on x86_64).
  * perf_events (and thus BPF) uses 127 as the default but can be
configured via sysctl -- and thus can be unbounded.
  * show_stack(...) doesn't appear to have a limit, but I might just be
misreading the x86-specific code.

As mentioned above, the lack of consensus on a single structure for
storing stack traces also means that there is a lack of consensus on
what the largest reasonable stack is.

But maybe just doing 127 would be "reasonable"?

(Athough, dynamically allocating would allow us to just use 'struct
stack_trace' directly without needing to embed a different structure.)

> > +   hlist_for_each_entry_safe(iter, next, head, hlist) {
> 
> Why would you use "_safe" variant here? if you don't modify the hlist,
> you don't need to use it.

Yup, my mistake.

> > +void kretprobe_save_stack_trace(struct kretprobe_instance *ri,
> > +   struct stack_trace *trace)
> > +{
> > +   int i;
> > +   struct kretprobe_trace *krt = >entry;
> > +
> > +   for (i = trace->skip; i < krt->nr_entries; i++) {
> > +   if (trace->nr_entries >= trace->max_entries)
> > +   break;
> > +   trace->entries[trace->nr_entries++] = krt->entries[i];
> > +   }
> > +}
> > +EXPORT_SYMBOL_GPL(kretprobe_save_stack_trace);
> > +
> > +void kretprobe_perf_callchain_kernel(struct kretprobe_instance *ri,
> > +struct perf_callchain_entry_ctx *ctx)
> > +{
> > +   int i;
> > +   struct kretprobe_trace *krt = >entry;
> > +
> > +   for (i = 0; i < krt->nr_entries; i++) {
> > +   if (krt->entries[i] == ULONG_MAX)
> > +   break;
> > +   perf_callchain_store(ctx, (u64) krt->entries[i]);
> > +   }
> > +}
> > +EXPORT_SYMBOL_GPL(kretprobe_perf_callchain_kernel);
> 
> 
> Why do we need to export these functions?

That's a good question -- I must've just banged out the EXPORT
statements without thinking. 

Re: [PATCH] kretprobe: produce sane stack traces

2018-10-29 Thread Aleksa Sarai
On 2018-10-30, Masami Hiramatsu  wrote:
> > Historically, kretprobe has always produced unusable stack traces
> > (kretprobe_trampoline is the only entry in most cases, because of the
> > funky stack pointer overwriting). This has caused quite a few annoyances
> > when using tracing to debug problems[1] -- since return values are only
> > available with kretprobes but stack traces were only usable for kprobes,
> > users had to probe both and then manually associate them.
> 
> Yes, this unfortunately still happens. I once tried to fix it by
> replacing current "kretprobe instance" with graph-tracer's per-thread
> return stack. (https://lkml.org/lkml/2017/8/21/553)

I played with graph-tracer a while ago and it didn't appear to have
associated return values? Is this hidden somewhere or did I just miss
it?

> I still believe that direction is the best solution to solve this kind
> of issues, otherwise, we have to have 2 different stack fixups for
> kretprobe and ftrace graph tracer. (I will have a talk with Steve at
> plumbers next month)

I'm definitely :+1: on removing the duplication of the stack fixups, my
first instinct was to try to refactor all of the stack_trace code so
that we didn't have multiple arch-specific "get the stack trace" paths
(and so we could generically add current_kretprobe_instance() to one
codepath). But after looking into it, I was convinced this would be more
than a little ugly to do.

> > With the advent of bpf_trace, users would have been able to do this
> > association in bpf, but this was less than ideal (because
> > bpf_get_stackid would still produce rubbish and programs that didn't
> > know better would get silly results). The main usecase for stack traces
> > (at least with bpf_trace) is for DTrace-style aggregation on stack
> > traces (both entry and exit). Therefore we cannot simply correct the
> > stack trace on exit -- we must stash away the stack trace and return the
> > entry stack trace when it is requested.
> > 
> > In theory, patches like commit 76094a2cf46e ("ftrace: distinguish
> > kretprobe'd functions in trace logs") are no longer necessary *for
> > tracing* because now all kretprobe traces should produce sane stack
> > traces. However it's not clear whether removing them completely is
> > reasonable.
> 
> Then, let's try to revert it :)

Sure. :P

> BTW, could you also add a test case for ftrace too?
> also, I have some comments below.

Yup, will do.

> > +#define KRETPROBE_TRACE_SIZE 1024
> > +struct kretprobe_trace {
> > +   int nr_entries;
> > +   unsigned long entries[KRETPROBE_TRACE_SIZE];
> > +};
> 
> Hmm, do we really need all entries? It takes 8KB for each instances.
> Note that the number of instances can be big if the system core number
> is larger.

Yeah, you're right this is too large for a default.

But the problem is that we need it to be large enough for any of the
tracers to be happy -- otherwise we'd have to dynamically allocate it
and I had a feeling this would be seen as a Bad Idea™ in the kprobe
paths.

  * ftrace uses PAGE_SIZE/sizeof(u64) == 512 (on x86_64).
  * perf_events (and thus BPF) uses 127 as the default but can be
configured via sysctl -- and thus can be unbounded.
  * show_stack(...) doesn't appear to have a limit, but I might just be
misreading the x86-specific code.

As mentioned above, the lack of consensus on a single structure for
storing stack traces also means that there is a lack of consensus on
what the largest reasonable stack is.

But maybe just doing 127 would be "reasonable"?

(Athough, dynamically allocating would allow us to just use 'struct
stack_trace' directly without needing to embed a different structure.)

> > +   hlist_for_each_entry_safe(iter, next, head, hlist) {
> 
> Why would you use "_safe" variant here? if you don't modify the hlist,
> you don't need to use it.

Yup, my mistake.

> > +void kretprobe_save_stack_trace(struct kretprobe_instance *ri,
> > +   struct stack_trace *trace)
> > +{
> > +   int i;
> > +   struct kretprobe_trace *krt = >entry;
> > +
> > +   for (i = trace->skip; i < krt->nr_entries; i++) {
> > +   if (trace->nr_entries >= trace->max_entries)
> > +   break;
> > +   trace->entries[trace->nr_entries++] = krt->entries[i];
> > +   }
> > +}
> > +EXPORT_SYMBOL_GPL(kretprobe_save_stack_trace);
> > +
> > +void kretprobe_perf_callchain_kernel(struct kretprobe_instance *ri,
> > +struct perf_callchain_entry_ctx *ctx)
> > +{
> > +   int i;
> > +   struct kretprobe_trace *krt = >entry;
> > +
> > +   for (i = 0; i < krt->nr_entries; i++) {
> > +   if (krt->entries[i] == ULONG_MAX)
> > +   break;
> > +   perf_callchain_store(ctx, (u64) krt->entries[i]);
> > +   }
> > +}
> > +EXPORT_SYMBOL_GPL(kretprobe_perf_callchain_kernel);
> 
> 
> Why do we need to export these functions?

That's a good question -- I must've just banged out the EXPORT
statements without thinking. 

Re: [PATCH 2/2] gsmi: Log event for critical thermal thresholds

2018-10-29 Thread kbuild test robot
Hi Duncan,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on soc-thermal/next]
[also build test ERROR on v4.19]
[cannot apply to next-20181029]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Ross-Zwisler/thermal-Add-notifier-call-chain-for-hot-critical-events/20181023-043806
base:   
https://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal.git 
next
config: i386-randconfig-k0-10291547 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All errors (new ones prefixed by >>):

   drivers/firmware/google/gsmi.o: In function `gsmi_exit':
>> drivers/firmware/google/gsmi.c:936: undefined reference to 
>> `unregister_thermal_notifier'
   drivers/firmware/google/gsmi.o: In function `gsmi_init':
>> drivers/firmware/google/gsmi.c:909: undefined reference to 
>> `register_thermal_notifier'

vim +936 drivers/firmware/google/gsmi.c

   787  
   788  static __init int gsmi_init(void)
   789  {
   790  unsigned long flags;
   791  int ret;
   792  
   793  ret = gsmi_system_valid();
   794  if (ret)
   795  return ret;
   796  
   797  gsmi_dev.smi_cmd = acpi_gbl_FADT.smi_command;
   798  
   799  /* register device */
   800  gsmi_dev.pdev = platform_device_register_full(_dev_info);
   801  if (IS_ERR(gsmi_dev.pdev)) {
   802  printk(KERN_ERR "gsmi: unable to register platform 
device\n");
   803  return PTR_ERR(gsmi_dev.pdev);
   804  }
   805  
   806  /* SMI access needs to be serialized */
   807  spin_lock_init(_dev.lock);
   808  
   809  ret = -ENOMEM;
   810  gsmi_dev.dma_pool = dma_pool_create("gsmi", _dev.pdev->dev,
   811   GSMI_BUF_SIZE, 
GSMI_BUF_ALIGN, 0);
   812  if (!gsmi_dev.dma_pool)
   813  goto out_err;
   814  
   815  /*
   816   * pre-allocate buffers because sometimes we are called when
   817   * this is not feasible: oops, panic, die, mce, etc
   818   */
   819  gsmi_dev.name_buf = gsmi_buf_alloc();
   820  if (!gsmi_dev.name_buf) {
   821  printk(KERN_ERR "gsmi: failed to allocate name 
buffer\n");
   822  goto out_err;
   823  }
   824  
   825  gsmi_dev.data_buf = gsmi_buf_alloc();
   826  if (!gsmi_dev.data_buf) {
   827  printk(KERN_ERR "gsmi: failed to allocate data 
buffer\n");
   828  goto out_err;
   829  }
   830  
   831  gsmi_dev.param_buf = gsmi_buf_alloc();
   832  if (!gsmi_dev.param_buf) {
   833  printk(KERN_ERR "gsmi: failed to allocate param 
buffer\n");
   834  goto out_err;
   835  }
   836  
   837  /*
   838   * Determine type of handshake used to serialize the SMI
   839   * entry. See also gsmi_exec().
   840   *
   841   * There's a "behavior" present on some chipsets where writing 
the
   842   * SMI trigger register in the southbridge doesn't result in an
   843   * immediate SMI. Rather, the processor can execute "a few" more
   844   * instructions before the SMI takes effect. To ensure 
synchronous
   845   * behavior, implement a handshake between the kernel driver 
and the
   846   * firmware handler to spin until released. This ioctl 
determines
   847   * the type of handshake.
   848   *
   849   * NONE: The firmware handler does not implement any
   850   * handshake. Either it doesn't need to, or it's legacy firmware
   851   * that doesn't know it needs to and never will.
   852   *
   853   * CF: The firmware handler will clear the CF in the saved
   854   * state before returning. The driver may set the CF and test 
for
   855   * it to clear before proceeding.
   856   *
   857   * SPIN: The firmware handler does not implement any handshake
   858   * but the driver should spin for a hundred or so microseconds
   859   * to ensure the SMI has triggered.
   860   *
   861   * Finally, the handler will return -ENOSYS if
   862   * GSMI_CMD_HANDSHAKE_TYPE is unimplemented, which implies
   863   * HANDSHAKE_NONE.
   864   */
   865  spin_lock_irqsave(_dev.lock, flags);
   866  gsmi_dev.handshake_type = GSMI_HANDSHAKE_SPIN;
   867  gsmi_dev.handshake_type =
   868  gsmi_exec(GSMI_CALLBACK, GSMI_CMD_HANDSHAKE_TYPE);
   869  if (gs

Re: [PATCH 2/2] gsmi: Log event for critical thermal thresholds

2018-10-29 Thread kbuild test robot
Hi Duncan,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on soc-thermal/next]
[also build test ERROR on v4.19]
[cannot apply to next-20181029]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Ross-Zwisler/thermal-Add-notifier-call-chain-for-hot-critical-events/20181023-043806
base:   
https://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal.git 
next
config: i386-randconfig-k0-10291547 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All errors (new ones prefixed by >>):

   drivers/firmware/google/gsmi.o: In function `gsmi_exit':
>> drivers/firmware/google/gsmi.c:936: undefined reference to 
>> `unregister_thermal_notifier'
   drivers/firmware/google/gsmi.o: In function `gsmi_init':
>> drivers/firmware/google/gsmi.c:909: undefined reference to 
>> `register_thermal_notifier'

vim +936 drivers/firmware/google/gsmi.c

   787  
   788  static __init int gsmi_init(void)
   789  {
   790  unsigned long flags;
   791  int ret;
   792  
   793  ret = gsmi_system_valid();
   794  if (ret)
   795  return ret;
   796  
   797  gsmi_dev.smi_cmd = acpi_gbl_FADT.smi_command;
   798  
   799  /* register device */
   800  gsmi_dev.pdev = platform_device_register_full(_dev_info);
   801  if (IS_ERR(gsmi_dev.pdev)) {
   802  printk(KERN_ERR "gsmi: unable to register platform 
device\n");
   803  return PTR_ERR(gsmi_dev.pdev);
   804  }
   805  
   806  /* SMI access needs to be serialized */
   807  spin_lock_init(_dev.lock);
   808  
   809  ret = -ENOMEM;
   810  gsmi_dev.dma_pool = dma_pool_create("gsmi", _dev.pdev->dev,
   811   GSMI_BUF_SIZE, 
GSMI_BUF_ALIGN, 0);
   812  if (!gsmi_dev.dma_pool)
   813  goto out_err;
   814  
   815  /*
   816   * pre-allocate buffers because sometimes we are called when
   817   * this is not feasible: oops, panic, die, mce, etc
   818   */
   819  gsmi_dev.name_buf = gsmi_buf_alloc();
   820  if (!gsmi_dev.name_buf) {
   821  printk(KERN_ERR "gsmi: failed to allocate name 
buffer\n");
   822  goto out_err;
   823  }
   824  
   825  gsmi_dev.data_buf = gsmi_buf_alloc();
   826  if (!gsmi_dev.data_buf) {
   827  printk(KERN_ERR "gsmi: failed to allocate data 
buffer\n");
   828  goto out_err;
   829  }
   830  
   831  gsmi_dev.param_buf = gsmi_buf_alloc();
   832  if (!gsmi_dev.param_buf) {
   833  printk(KERN_ERR "gsmi: failed to allocate param 
buffer\n");
   834  goto out_err;
   835  }
   836  
   837  /*
   838   * Determine type of handshake used to serialize the SMI
   839   * entry. See also gsmi_exec().
   840   *
   841   * There's a "behavior" present on some chipsets where writing 
the
   842   * SMI trigger register in the southbridge doesn't result in an
   843   * immediate SMI. Rather, the processor can execute "a few" more
   844   * instructions before the SMI takes effect. To ensure 
synchronous
   845   * behavior, implement a handshake between the kernel driver 
and the
   846   * firmware handler to spin until released. This ioctl 
determines
   847   * the type of handshake.
   848   *
   849   * NONE: The firmware handler does not implement any
   850   * handshake. Either it doesn't need to, or it's legacy firmware
   851   * that doesn't know it needs to and never will.
   852   *
   853   * CF: The firmware handler will clear the CF in the saved
   854   * state before returning. The driver may set the CF and test 
for
   855   * it to clear before proceeding.
   856   *
   857   * SPIN: The firmware handler does not implement any handshake
   858   * but the driver should spin for a hundred or so microseconds
   859   * to ensure the SMI has triggered.
   860   *
   861   * Finally, the handler will return -ENOSYS if
   862   * GSMI_CMD_HANDSHAKE_TYPE is unimplemented, which implies
   863   * HANDSHAKE_NONE.
   864   */
   865  spin_lock_irqsave(_dev.lock, flags);
   866  gsmi_dev.handshake_type = GSMI_HANDSHAKE_SPIN;
   867  gsmi_dev.handshake_type =
   868  gsmi_exec(GSMI_CALLBACK, GSMI_CMD_HANDSHAKE_TYPE);
   869  if (gs

Re: memcg oops: memcg_kmem_charge_memcg()->try_charge()->page_counter_try_charge()->BOOM

2018-10-29 Thread Mike Galbraith
On Mon, 2018-10-29 at 21:49 +, Roman Gushchin wrote:
> On Mon, Oct 29, 2018 at 09:46:54PM +0100, Mike Galbraith wrote:
> 
> > Ah, I have cgroup_disable=memory on the command line, which turns out
> > to be why your box doesn't explode, while mine does.
> 
> Yeah, here it is. I'll send the fix in few minutes. Please,
> test it on your setup. Your tested-by will be appreciated.

Yup, all-better-by:/me


Re: memcg oops: memcg_kmem_charge_memcg()->try_charge()->page_counter_try_charge()->BOOM

2018-10-29 Thread Mike Galbraith
On Mon, 2018-10-29 at 21:49 +, Roman Gushchin wrote:
> On Mon, Oct 29, 2018 at 09:46:54PM +0100, Mike Galbraith wrote:
> 
> > Ah, I have cgroup_disable=memory on the command line, which turns out
> > to be why your box doesn't explode, while mine does.
> 
> Yeah, here it is. I'll send the fix in few minutes. Please,
> test it on your setup. Your tested-by will be appreciated.

Yup, all-better-by:/me


Re: [RFC PATCH] Minimal non-child process exit notification support

2018-10-29 Thread Joel Fernandes
On Mon, Oct 29, 2018 at 1:01 PM Daniel Colascione  wrote:
>
> Thanks for taking a look.
>
> On Mon, Oct 29, 2018 at 7:45 PM, Joel Fernandes  wrote:
> >
> > On Mon, Oct 29, 2018 at 10:53 AM Daniel Colascione  
> > wrote:
> > >
> > > This patch adds a new file under /proc/pid, /proc/pid/exithand.
> > > Attempting to read from an exithand file will block until the
> > > corresponding process exits, at which point the read will successfully
> > > complete with EOF.  The file descriptor supports both blocking
> > > operations and poll(2). It's intended to be a minimal interface for
> > > allowing a program to wait for the exit of a process that is not one
> > > of its children.
> > >
> > > Why might we want this interface? Android's lmkd kills processes in
> > > order to free memory in response to various memory pressure
> > > signals. It's desirable to wait until a killed process actually exits
> > > before moving on (if needed) to killing the next process. Since the
> > > processes that lmkd kills are not lmkd's children, lmkd currently
> > > lacks a way to wait for a proces to actually die after being sent
> > > SIGKILL; today, lmkd resorts to polling the proc filesystem pid
> >
> > Any idea why it needs to wait and then send SIGKILL? Why not do
> > SIGKILL and look for errno == ESRCH in a loop with a delay.
>
> I want to get polling loops out of the system. Polling loops are bad
> for wakeup attribution, bad for power, bad for priority inheritance,
> and bad for latency. There's no right answer to the question "How long
> should I wait before checking $CONDITION again?". If we can have an
> explicit waitqueue interface to something, we should. Besides, PID
> polling is vulnerable to PID reuse, whereas this mechanism (just like
> anything based on struct pid) is immune to it.

The argument sounds Ok to me. I would also more details in the commit
message about the alternate methods to do this (such as kill polling
or ptrace) and why they don't work well etc so no one asks any
questions. Like maybe under a "other ways to do this" section. A bit
of googling also showed a netlink way of doing it without polling
(though I don't look into that much and wouldn't be surprised if its
more complicated)

Also I guess when you send a patch, it'd be good to pass
"--cc-cmd='./scripts/get_maintainer.pl" to git-send-email so it
automatically CCs the maintainers who maintain this.

thanks,

- Joel


Re: [RFC PATCH] Minimal non-child process exit notification support

2018-10-29 Thread Joel Fernandes
On Mon, Oct 29, 2018 at 1:01 PM Daniel Colascione  wrote:
>
> Thanks for taking a look.
>
> On Mon, Oct 29, 2018 at 7:45 PM, Joel Fernandes  wrote:
> >
> > On Mon, Oct 29, 2018 at 10:53 AM Daniel Colascione  
> > wrote:
> > >
> > > This patch adds a new file under /proc/pid, /proc/pid/exithand.
> > > Attempting to read from an exithand file will block until the
> > > corresponding process exits, at which point the read will successfully
> > > complete with EOF.  The file descriptor supports both blocking
> > > operations and poll(2). It's intended to be a minimal interface for
> > > allowing a program to wait for the exit of a process that is not one
> > > of its children.
> > >
> > > Why might we want this interface? Android's lmkd kills processes in
> > > order to free memory in response to various memory pressure
> > > signals. It's desirable to wait until a killed process actually exits
> > > before moving on (if needed) to killing the next process. Since the
> > > processes that lmkd kills are not lmkd's children, lmkd currently
> > > lacks a way to wait for a proces to actually die after being sent
> > > SIGKILL; today, lmkd resorts to polling the proc filesystem pid
> >
> > Any idea why it needs to wait and then send SIGKILL? Why not do
> > SIGKILL and look for errno == ESRCH in a loop with a delay.
>
> I want to get polling loops out of the system. Polling loops are bad
> for wakeup attribution, bad for power, bad for priority inheritance,
> and bad for latency. There's no right answer to the question "How long
> should I wait before checking $CONDITION again?". If we can have an
> explicit waitqueue interface to something, we should. Besides, PID
> polling is vulnerable to PID reuse, whereas this mechanism (just like
> anything based on struct pid) is immune to it.

The argument sounds Ok to me. I would also more details in the commit
message about the alternate methods to do this (such as kill polling
or ptrace) and why they don't work well etc so no one asks any
questions. Like maybe under a "other ways to do this" section. A bit
of googling also showed a netlink way of doing it without polling
(though I don't look into that much and wouldn't be surprised if its
more complicated)

Also I guess when you send a patch, it'd be good to pass
"--cc-cmd='./scripts/get_maintainer.pl" to git-send-email so it
automatically CCs the maintainers who maintain this.

thanks,

- Joel


Re: [PATCH] arm64/numa: Add more vetting in numa_set_distance()

2018-10-29 Thread Anshuman Khandual



On 10/29/2018 08:18 PM, Will Deacon wrote:
> On Mon, Oct 29, 2018 at 06:15:42PM +0530, Anshuman Khandual wrote:
>> On 10/29/2018 06:02 PM, John Garry wrote:
>>> On 29/10/2018 12:16, Will Deacon wrote:
 On Mon, Oct 29, 2018 at 12:14:09PM +, John Garry wrote:
> On 29/10/2018 11:25, Will Deacon wrote:
>> On Fri, Oct 26, 2018 at 09:57:47PM +0800, John Garry wrote:
>>> Currently it is acceptable to set the distance between 2 separate nodes 
>>> to
>>> LOCAL_DISTANCE.
>>>
>>> Reject this as it is invalid.
>>>
>>> This change avoids a crash reported in [1].
>>>
>>> [1] https://www.spinics.net/lists/arm-kernel/msg683304.html
>>>
>>> Signed-off-by: John Garry 
>>>
>>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
>>> index 146c04c..6092e3d 100644
>>> --- a/arch/arm64/mm/numa.c
>>> +++ b/arch/arm64/mm/numa.c
>>> @@ -335,7 +335,8 @@ void __init numa_set_distance(int from, int to, int 
>>> distance)
>>> }
>>>
>>> if ((u8)distance != distance ||
>>> -    (from == to && distance != LOCAL_DISTANCE)) {
>>> +    (from == to && distance != LOCAL_DISTANCE) ||
>>> +    (from != to && distance == LOCAL_DISTANCE)) {
>>
>> The current code here is more-or-less lifted from the x86 implementation
>> of numa_set_distance().
>
> Right, I did notice this. I didn't think that x86 folks would be so
> concerned since they generally only use ACPI, and the ACPI code already
> validates these distances in drivers/acpi/numa.c: slit_valid() [unlike OF
> code].
>
>  I think we should either factor out the sanity check
>> into a core helper or make the core code robust to these funny 
>> configurations.
>
> OK, so to me it would make sense to factor out a sanity check into a core
> helper.

 That, or have the OF code perform the same validation that slit_valid() is
 doing for ACPI. I'm just trying to avoid other architectures running into
 this problem down the line.

>>>
>>> Right, OF code should do this validation job if ACPI is doing it
>>> (especially since the DT bindings actually specify the distance rules),
>>> and not rely on the arch NUMA code to accept/reject numa_set_distance()
>>> combinations.
>>
>> I would say this particular condition checking still falls under arch NUMA 
>> init
>> code sanity check like other basic tests what numa_set_distance() currently 
>> does
>> already but it should not be a necessity for the OF driver to check these. 
>> It can
>> choose to check but arch NUMA should check basic things like two different 
>> NUMA
>> nodes should not have LOCAL_DISTANCE as distance like in this case.
>>
>>  (from == to && distance != LOCAL_DISTANCE) ||
>>  (from != to && distance == LOCAL_DISTANCE))
>>
>>
>>>
>>> And, in addition to this, I'd say OF should disable NUMA if given an
>>> invalid table (like ACPI does).
>>
>> Taking a decision to disable NUMA should be with kernel (arch NUMA) once 
>> kernel
>> starts booting. Platform should have sent right values, OF driver trying to
>> adjust stuff what platform has sent with FDT once the kernel starts booting 
>> is
>> not right. For example "Kernel NUMA wont like the distance factors lets clean
>> then up before passing on to MM". Disabling NUMA is one such major decision 
>> which
>> should be with arch NUMA code not with OF driver.
> 
> I don't fully understand what you're getting at here, but why would the
> check posted by John be arch-specific? It's already done in the core code
> for ACPI, so there's a discrepancy between ACPI and FDT that should be
> resolved. I'd also argue that the subtleties of this check are actually
> based on what the core code is willing to accept in terms of the NUMA
> description, so it's also the best place to enforce it.

Agreed. I had overlooked the existing semantics with respect to ACPI parsing.
Yes, there is a discrepancy with respect to FDT which should be fixed. But
IMHO its also worth to enhance numa_set_distance() checks with this proposed
new check as well.


Re: Can VFIO pin only a specific region of guest mem when use pass through devices?

2018-10-29 Thread Peter Xu
On Mon, Oct 29, 2018 at 12:29:22PM -0600, Alex Williamson wrote:
> On Mon, 29 Oct 2018 17:14:46 +0800
> Jason Wang  wrote:
> 
> > On 2018/10/29 上午10:42, Simon Guo wrote:
> > > Hi,
> > >
> > > I am using network device pass through mode with qemu x86(-device 
> > > vfio-pci,host=:xx:yy.z)
> > > and “intel_iommu=on” in host kernel command line, and it shows the whole 
> > > guest memory
> > > were pinned(vfio_pin_pages()), viewed by the “top” RES memory output. I 
> > > understand it is due
> > > to device can DMA to any guest memory address and it cannot be swapped.
> > >
> > > However can we just pin a rang of address space allowed by iommu group of 
> > > that device,
> > > instead of pin whole address space? I do notice some code like 
> > > vtd_host_dma_iommu().
> > > Maybe there is already some way to enable that?
> > >
> > > Sorry if I missed some basics. I googled some but no luck to find the 
> > > answer yet. Please
> > > let me know if any discussion already raised on that.
> > >
> > > Any other suggestion will also be appreciated. For example, can we modify 
> > > the guest network
> > > card driver to allocate only from a specific memory region(zone), and 
> > > qemu advises guest
> > > kernel to only pin that memory region(zone) accordingly?
> > >
> > > Thanks,
> > > - Simon  
> > 
> > 
> > One possible method is to enable IOMMU of VM.
> 
> Right, making use of a virtual IOMMU in the VM is really the only way
> to bound the DMA to some subset of guest memory, but vIOMMU usage by
> the guest is optional on x86 and even if the guest does use it, it might
> enable passthrough mode, which puts you back at the problem that all
> guest memory is pinned with the additional problem that it might also
> be accounted for once per assigned device and may hit locked memory
> limits.  Also, the DMA mapping and unmapping path with a vIOMMU is very
> slow, so performance of the device in the guest will be abysmal unless
> the use case is limited to very static mappings, such as userspace use
> within the guest for nested assignment or perhaps DPDK use cases.
> 
> Modifying the guest to only use a portion of memory for DMA sounds like
> a quite intrusive option.  There are certainly IOMMU models where the
> IOMMU provides a fixed IOVA range, but creating dynamic mappings within
> that range doesn't really solve anything given that it simply returns
> us to a vIOMMU with slow mapping.  A window with a fixed identity
> mapping used as a DMA zone seems plausible, but again, also pretty
> intrusive to the guest, possibly also to the drivers.  Host IOMMU page
> faulting can also help the pinned memory footprint, but of course
> requires hardware support and lots of new code paths, many of which are
> already being discussed for things like Scalable IOV and SVA.  Thanks,

Agree with Jason's and Alex's comments.  One trivial additional: the
whole guest RAM will possibly still be pinned for a very short period
during guest system boot (e.g., when running guest BIOS) and before
the guest kernel enables the vIOMMU for the assigned device since the
bootup code like BIOS would still need to be able to access the whole
guest memory.

Thanks,

-- 
Peter Xu


Re: [PATCH] arm64/numa: Add more vetting in numa_set_distance()

2018-10-29 Thread Anshuman Khandual



On 10/29/2018 08:18 PM, Will Deacon wrote:
> On Mon, Oct 29, 2018 at 06:15:42PM +0530, Anshuman Khandual wrote:
>> On 10/29/2018 06:02 PM, John Garry wrote:
>>> On 29/10/2018 12:16, Will Deacon wrote:
 On Mon, Oct 29, 2018 at 12:14:09PM +, John Garry wrote:
> On 29/10/2018 11:25, Will Deacon wrote:
>> On Fri, Oct 26, 2018 at 09:57:47PM +0800, John Garry wrote:
>>> Currently it is acceptable to set the distance between 2 separate nodes 
>>> to
>>> LOCAL_DISTANCE.
>>>
>>> Reject this as it is invalid.
>>>
>>> This change avoids a crash reported in [1].
>>>
>>> [1] https://www.spinics.net/lists/arm-kernel/msg683304.html
>>>
>>> Signed-off-by: John Garry 
>>>
>>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
>>> index 146c04c..6092e3d 100644
>>> --- a/arch/arm64/mm/numa.c
>>> +++ b/arch/arm64/mm/numa.c
>>> @@ -335,7 +335,8 @@ void __init numa_set_distance(int from, int to, int 
>>> distance)
>>> }
>>>
>>> if ((u8)distance != distance ||
>>> -    (from == to && distance != LOCAL_DISTANCE)) {
>>> +    (from == to && distance != LOCAL_DISTANCE) ||
>>> +    (from != to && distance == LOCAL_DISTANCE)) {
>>
>> The current code here is more-or-less lifted from the x86 implementation
>> of numa_set_distance().
>
> Right, I did notice this. I didn't think that x86 folks would be so
> concerned since they generally only use ACPI, and the ACPI code already
> validates these distances in drivers/acpi/numa.c: slit_valid() [unlike OF
> code].
>
>  I think we should either factor out the sanity check
>> into a core helper or make the core code robust to these funny 
>> configurations.
>
> OK, so to me it would make sense to factor out a sanity check into a core
> helper.

 That, or have the OF code perform the same validation that slit_valid() is
 doing for ACPI. I'm just trying to avoid other architectures running into
 this problem down the line.

>>>
>>> Right, OF code should do this validation job if ACPI is doing it
>>> (especially since the DT bindings actually specify the distance rules),
>>> and not rely on the arch NUMA code to accept/reject numa_set_distance()
>>> combinations.
>>
>> I would say this particular condition checking still falls under arch NUMA 
>> init
>> code sanity check like other basic tests what numa_set_distance() currently 
>> does
>> already but it should not be a necessity for the OF driver to check these. 
>> It can
>> choose to check but arch NUMA should check basic things like two different 
>> NUMA
>> nodes should not have LOCAL_DISTANCE as distance like in this case.
>>
>>  (from == to && distance != LOCAL_DISTANCE) ||
>>  (from != to && distance == LOCAL_DISTANCE))
>>
>>
>>>
>>> And, in addition to this, I'd say OF should disable NUMA if given an
>>> invalid table (like ACPI does).
>>
>> Taking a decision to disable NUMA should be with kernel (arch NUMA) once 
>> kernel
>> starts booting. Platform should have sent right values, OF driver trying to
>> adjust stuff what platform has sent with FDT once the kernel starts booting 
>> is
>> not right. For example "Kernel NUMA wont like the distance factors lets clean
>> then up before passing on to MM". Disabling NUMA is one such major decision 
>> which
>> should be with arch NUMA code not with OF driver.
> 
> I don't fully understand what you're getting at here, but why would the
> check posted by John be arch-specific? It's already done in the core code
> for ACPI, so there's a discrepancy between ACPI and FDT that should be
> resolved. I'd also argue that the subtleties of this check are actually
> based on what the core code is willing to accept in terms of the NUMA
> description, so it's also the best place to enforce it.

Agreed. I had overlooked the existing semantics with respect to ACPI parsing.
Yes, there is a discrepancy with respect to FDT which should be fixed. But
IMHO its also worth to enhance numa_set_distance() checks with this proposed
new check as well.


Re: Can VFIO pin only a specific region of guest mem when use pass through devices?

2018-10-29 Thread Peter Xu
On Mon, Oct 29, 2018 at 12:29:22PM -0600, Alex Williamson wrote:
> On Mon, 29 Oct 2018 17:14:46 +0800
> Jason Wang  wrote:
> 
> > On 2018/10/29 上午10:42, Simon Guo wrote:
> > > Hi,
> > >
> > > I am using network device pass through mode with qemu x86(-device 
> > > vfio-pci,host=:xx:yy.z)
> > > and “intel_iommu=on” in host kernel command line, and it shows the whole 
> > > guest memory
> > > were pinned(vfio_pin_pages()), viewed by the “top” RES memory output. I 
> > > understand it is due
> > > to device can DMA to any guest memory address and it cannot be swapped.
> > >
> > > However can we just pin a rang of address space allowed by iommu group of 
> > > that device,
> > > instead of pin whole address space? I do notice some code like 
> > > vtd_host_dma_iommu().
> > > Maybe there is already some way to enable that?
> > >
> > > Sorry if I missed some basics. I googled some but no luck to find the 
> > > answer yet. Please
> > > let me know if any discussion already raised on that.
> > >
> > > Any other suggestion will also be appreciated. For example, can we modify 
> > > the guest network
> > > card driver to allocate only from a specific memory region(zone), and 
> > > qemu advises guest
> > > kernel to only pin that memory region(zone) accordingly?
> > >
> > > Thanks,
> > > - Simon  
> > 
> > 
> > One possible method is to enable IOMMU of VM.
> 
> Right, making use of a virtual IOMMU in the VM is really the only way
> to bound the DMA to some subset of guest memory, but vIOMMU usage by
> the guest is optional on x86 and even if the guest does use it, it might
> enable passthrough mode, which puts you back at the problem that all
> guest memory is pinned with the additional problem that it might also
> be accounted for once per assigned device and may hit locked memory
> limits.  Also, the DMA mapping and unmapping path with a vIOMMU is very
> slow, so performance of the device in the guest will be abysmal unless
> the use case is limited to very static mappings, such as userspace use
> within the guest for nested assignment or perhaps DPDK use cases.
> 
> Modifying the guest to only use a portion of memory for DMA sounds like
> a quite intrusive option.  There are certainly IOMMU models where the
> IOMMU provides a fixed IOVA range, but creating dynamic mappings within
> that range doesn't really solve anything given that it simply returns
> us to a vIOMMU with slow mapping.  A window with a fixed identity
> mapping used as a DMA zone seems plausible, but again, also pretty
> intrusive to the guest, possibly also to the drivers.  Host IOMMU page
> faulting can also help the pinned memory footprint, but of course
> requires hardware support and lots of new code paths, many of which are
> already being discussed for things like Scalable IOV and SVA.  Thanks,

Agree with Jason's and Alex's comments.  One trivial additional: the
whole guest RAM will possibly still be pinned for a very short period
during guest system boot (e.g., when running guest BIOS) and before
the guest kernel enables the vIOMMU for the assigned device since the
bootup code like BIOS would still need to be able to access the whole
guest memory.

Thanks,

-- 
Peter Xu


How to implement "#interrupt-cells = <2>" for a gpiochip?

2018-10-29 Thread Daniel Santos
Hello,

I'm trying to use a GPIO as an interrupt on an mt7620 (using OpenWRT
drivers) and I can't seem to figure out how to glue my two-celled
interrupt description (including the trigger) to the device tree code. 
This is the gpio driver I'm using: 
https://github.com/openwrt/openwrt/blob/master/target/linux/ramips/patches-4.14/0027-GPIO-MIPS-ralink-add-gpio-driver-for-ralink-SoC.patch

And this is the gpio chip in the device tree:

gpio0: gpio@600 {
compatible = "ralink,mt7620a-gpio", 
"ralink,rt2880-gpio";
reg = <0x600 0x34>;

resets = < 13>;
reset-names = "pio";

interrupt-parent = <>;
interrupts = <6>;

interrupt-controller;
#interrupt-cells = <2>;

gpio-controller;
#gpio-cells = <2>;

ralink,gpio-base = <0>;
ralink,num-gpios = <24>;
ralink,register-map = [ 00 04 08 0c
20 24 28 2c
30 34 ];
};


I've added the "interrupt-controller;" and "#interrupt-cells" myself. 
This is my i2c device:

 {
status = "okay";

imu: lsm6ds3@6b {
compatible = "st,lsm6ds3";
reg = <0x6b>;
interrupt-parent = <>;
interrupts = <14 IRQ_TYPE_EDGE_FALLING>;
};
};


The problem is that when the driver probes and asks what the trigger for
the irq is, it returns zero instead of 2 (IRQ_TYPE_EDGE_FALLING).  I
presume this is because the two-celled interrupts aren't implemented by
the gpio driver? 
Documentation/devicetree/bindings/interrupt-controller/interrupts.txt says:

A device is marked as an interrupt controller with the
"interrupt-controller"
property. This is a empty, boolean property. An additional
"#interrupt-cells"
property defines the number of cells needed to specify a single interrupt.

It is the responsibility of the interrupt controller's binding to define the
length and format of the interrupt specifier. The following two variants are
commonly used:
...

However, I'm having great trouble finding documentation on how to write
these bindings. Can anybody give me a pointer please?

Thanks,
Daniel


How to implement "#interrupt-cells = <2>" for a gpiochip?

2018-10-29 Thread Daniel Santos
Hello,

I'm trying to use a GPIO as an interrupt on an mt7620 (using OpenWRT
drivers) and I can't seem to figure out how to glue my two-celled
interrupt description (including the trigger) to the device tree code. 
This is the gpio driver I'm using: 
https://github.com/openwrt/openwrt/blob/master/target/linux/ramips/patches-4.14/0027-GPIO-MIPS-ralink-add-gpio-driver-for-ralink-SoC.patch

And this is the gpio chip in the device tree:

gpio0: gpio@600 {
compatible = "ralink,mt7620a-gpio", 
"ralink,rt2880-gpio";
reg = <0x600 0x34>;

resets = < 13>;
reset-names = "pio";

interrupt-parent = <>;
interrupts = <6>;

interrupt-controller;
#interrupt-cells = <2>;

gpio-controller;
#gpio-cells = <2>;

ralink,gpio-base = <0>;
ralink,num-gpios = <24>;
ralink,register-map = [ 00 04 08 0c
20 24 28 2c
30 34 ];
};


I've added the "interrupt-controller;" and "#interrupt-cells" myself. 
This is my i2c device:

 {
status = "okay";

imu: lsm6ds3@6b {
compatible = "st,lsm6ds3";
reg = <0x6b>;
interrupt-parent = <>;
interrupts = <14 IRQ_TYPE_EDGE_FALLING>;
};
};


The problem is that when the driver probes and asks what the trigger for
the irq is, it returns zero instead of 2 (IRQ_TYPE_EDGE_FALLING).  I
presume this is because the two-celled interrupts aren't implemented by
the gpio driver? 
Documentation/devicetree/bindings/interrupt-controller/interrupts.txt says:

A device is marked as an interrupt controller with the
"interrupt-controller"
property. This is a empty, boolean property. An additional
"#interrupt-cells"
property defines the number of cells needed to specify a single interrupt.

It is the responsibility of the interrupt controller's binding to define the
length and format of the interrupt specifier. The following two variants are
commonly used:
...

However, I'm having great trouble finding documentation on how to write
these bindings. Can anybody give me a pointer please?

Thanks,
Daniel


Re: [PATCH] arm64/numa: Add more vetting in numa_set_distance()

2018-10-29 Thread Anshuman Khandual



On 10/29/2018 08:14 PM, John Garry wrote:
>
>  I think we should either factor out the sanity check
>> into a core helper or make the core code robust to these funny 
>> configurations.
>
> OK, so to me it would make sense to factor out a sanity check into a core
> helper.

 That, or have the OF code perform the same validation that slit_valid() is
 doing for ACPI. I'm just trying to avoid other architectures running into
 this problem down the line.

>>>
>>> Right, OF code should do this validation job if ACPI is doing it 
>>> (especially since the DT bindings actually specify the distance rules), and 
>>> not rely on the arch NUMA code to accept/reject numa_set_distance() 
>>> combinations.
>>
>> I would say this particular condition checking still falls under arch NUMA 
>> init
>> code sanity check like other basic tests what numa_set_distance() currently 
>> does
>> already but it should not be a necessity for the OF driver to check these.
> 
> The checks in the arch NUMA code mean that invalid inter-node distance 
> combinations are ignored.

Right and should not this new test (from != to && distance == LOCAL_DISTANCE) be
one of them as well ? numa_set_distance() updates the table or just throws some
warnings while skipping entries it deems invalid. It would be okay to have this
new check there in addition to others like this patch suggests.

> 
> However, if any entries in the table are invalid, then the whole table can be 
> discarded as none of it can be believed, i.e. it's better to validate the 
> table.
>

Agreed. slit_valid() on the ACPI parsing is currently enforcing that before
acpi_numa_slit_init() which would call into numa_set_distance(). Hence arch
NUMA code numa_set_distance() never had the opportunity to do the sanity
checks as ACPI slit_valid() has completely invalidated the table.

Unlike ACPI path, of_numa_parse_distance_map_v1() does not do any sanity
checks on the distance values parse from the "distance-matrix" property
and all the checks directly falls on numa_set_distance(). This needs to
be fixed in line with ACPI

* If (to == from) ---> distance = LOCAL_DISTANCE
* If (to != from) ---> distance > LOCAL_DISTANCE

At the same time its okay to just enhance numa_set_distance() test coverage
to include this new test. If we would have trusted firmware parsing all the
way, existing basic checks about node range, distance stuff should not have
been there in numa_set_distance(). Hence IMHO even if we fix the OF driver
part, we should include this new check there as well.

> It can
>> choose to check but arch NUMA should check basic things like two different 
>> NUMA
>> nodes should not have LOCAL_DISTANCE as distance like in this case.
>>
>> (from == to && distance != LOCAL_DISTANCE) ||
>>     (from != to && distance == LOCAL_DISTANCE))
>>
>>
>>>
>>> And, in addition to this, I'd say OF should disable NUMA if given an 
>>> invalid table (like ACPI does).
>>
>> Taking a decision to disable NUMA should be with kernel (arch NUMA) once 
>> kernel
>> starts booting. Platform should have sent right values, OF driver trying to
>> adjust stuff what platform has sent with FDT once the kernel starts booting 
>> is
>> not right. For example "Kernel NUMA wont like the distance factors lets clean
>> then up before passing on to MM".
> 
> Sorry, but I don't know who was advocating this.

I was just giving an example. Invalidating NUMA distance table during firmware
table (ACPI or FDT) parsing forces arm64_numa_init() to fall back on dummy NUMA
node which is like disabling NUMA. But that is the current semantics with ACPI
parsing which I overlooked. Fixing OF driver to do the same wont extend this
any further, hence my previous concern does not stand valid.

> 
> Disabling NUMA is one such major decision which
>> should be with arch NUMA code not with OF driver.
> 
> I meant parsing the table would fail, so arch NUMA would fall back on dummy 
> NUMA.

Right and ACPI parsing does that and can force a fallback on a dummy NUMA node.


Re: [PATCH] arm64/numa: Add more vetting in numa_set_distance()

2018-10-29 Thread Anshuman Khandual



On 10/29/2018 08:14 PM, John Garry wrote:
>
>  I think we should either factor out the sanity check
>> into a core helper or make the core code robust to these funny 
>> configurations.
>
> OK, so to me it would make sense to factor out a sanity check into a core
> helper.

 That, or have the OF code perform the same validation that slit_valid() is
 doing for ACPI. I'm just trying to avoid other architectures running into
 this problem down the line.

>>>
>>> Right, OF code should do this validation job if ACPI is doing it 
>>> (especially since the DT bindings actually specify the distance rules), and 
>>> not rely on the arch NUMA code to accept/reject numa_set_distance() 
>>> combinations.
>>
>> I would say this particular condition checking still falls under arch NUMA 
>> init
>> code sanity check like other basic tests what numa_set_distance() currently 
>> does
>> already but it should not be a necessity for the OF driver to check these.
> 
> The checks in the arch NUMA code mean that invalid inter-node distance 
> combinations are ignored.

Right and should not this new test (from != to && distance == LOCAL_DISTANCE) be
one of them as well ? numa_set_distance() updates the table or just throws some
warnings while skipping entries it deems invalid. It would be okay to have this
new check there in addition to others like this patch suggests.

> 
> However, if any entries in the table are invalid, then the whole table can be 
> discarded as none of it can be believed, i.e. it's better to validate the 
> table.
>

Agreed. slit_valid() on the ACPI parsing is currently enforcing that before
acpi_numa_slit_init() which would call into numa_set_distance(). Hence arch
NUMA code numa_set_distance() never had the opportunity to do the sanity
checks as ACPI slit_valid() has completely invalidated the table.

Unlike ACPI path, of_numa_parse_distance_map_v1() does not do any sanity
checks on the distance values parse from the "distance-matrix" property
and all the checks directly falls on numa_set_distance(). This needs to
be fixed in line with ACPI

* If (to == from) ---> distance = LOCAL_DISTANCE
* If (to != from) ---> distance > LOCAL_DISTANCE

At the same time its okay to just enhance numa_set_distance() test coverage
to include this new test. If we would have trusted firmware parsing all the
way, existing basic checks about node range, distance stuff should not have
been there in numa_set_distance(). Hence IMHO even if we fix the OF driver
part, we should include this new check there as well.

> It can
>> choose to check but arch NUMA should check basic things like two different 
>> NUMA
>> nodes should not have LOCAL_DISTANCE as distance like in this case.
>>
>> (from == to && distance != LOCAL_DISTANCE) ||
>>     (from != to && distance == LOCAL_DISTANCE))
>>
>>
>>>
>>> And, in addition to this, I'd say OF should disable NUMA if given an 
>>> invalid table (like ACPI does).
>>
>> Taking a decision to disable NUMA should be with kernel (arch NUMA) once 
>> kernel
>> starts booting. Platform should have sent right values, OF driver trying to
>> adjust stuff what platform has sent with FDT once the kernel starts booting 
>> is
>> not right. For example "Kernel NUMA wont like the distance factors lets clean
>> then up before passing on to MM".
> 
> Sorry, but I don't know who was advocating this.

I was just giving an example. Invalidating NUMA distance table during firmware
table (ACPI or FDT) parsing forces arm64_numa_init() to fall back on dummy NUMA
node which is like disabling NUMA. But that is the current semantics with ACPI
parsing which I overlooked. Fixing OF driver to do the same wont extend this
any further, hence my previous concern does not stand valid.

> 
> Disabling NUMA is one such major decision which
>> should be with arch NUMA code not with OF driver.
> 
> I meant parsing the table would fail, so arch NUMA would fall back on dummy 
> NUMA.

Right and ACPI parsing does that and can force a fallback on a dummy NUMA node.


RE: [PATCH] binder: ipc namespace support for android binder

2018-10-29 Thread 周威
> > > It's not obvious from this patch where this dependency comes 
> > > from...why is SYSVIPC required? I'd like to not have to require 
> > > IPC_NS either for devices.
> >
> > Yes, the patch is not highly dependent on SYSVIPC, but it will be 
> > convenient if require it. I will update it to drop dependency of it in 
> > V2 patch. This patch doesn't need IPC_NS set at present.
> 
> Actually it is dependent on IPC_NS since it makes changes to ipc/namespace.c 
> which is compiled only if CONFIG_IPC_NS.
> 

Actually it does not require IPC_NS, the code in ipc/namespace.c are namespace 
specific, 
and is *not needed* if ipc namespace is not supported.  <-- fixed here

> There are a couple more implementations similar to this one.
> https://lwn.net/Articles/577957/ and some submissions to AOSP derived from 
> that one that introduce a generic registration function for namespace support 
> [1], and changes to binder to implement namespaces [2].
> 
> If this is really needed, then we should have a solution that works for 
> devices without requiring IPC_NS or SYSVIPC. Also, we should not add 
> binder-specific code to ipc/namespace.c or include/linux/ipc_namespace.h.
> 
> -Todd
> 
> [1] https://android-review.googlesource.com/c/kernel/common/+/471961
> [2] https://android-review.googlesource.com/c/kernel/common/+/471825
>

If the binder will be isolated by namespace, it must put binder proc and binder 
context in ipc_namespace (or with something like void* as [1] did)
I have sent the V2 patch, that patch does not require SYSVIPC or IPC_NS. If 
IPC_NS is not set, binder_init will put proc and context into init_ipc_ns.
If SYSVIPC and CONFIG_POSIX_MQUEUE are both unset, I will make a fake 
init_ipc_ns to put them. it is marked as no static intentionally to let compile 
generate an error if it has defined somewhere alse. The code in ipc/namespace.c 
is just to notify binder to do some installationwhere namespace are
creating, If no IPC_NS set, the initialization in binder_init will be enough.
So please review and test the V2 patch.


RE: [PATCH] binder: ipc namespace support for android binder

2018-10-29 Thread 周威
> > > It's not obvious from this patch where this dependency comes 
> > > from...why is SYSVIPC required? I'd like to not have to require 
> > > IPC_NS either for devices.
> >
> > Yes, the patch is not highly dependent on SYSVIPC, but it will be 
> > convenient if require it. I will update it to drop dependency of it in 
> > V2 patch. This patch doesn't need IPC_NS set at present.
> 
> Actually it is dependent on IPC_NS since it makes changes to ipc/namespace.c 
> which is compiled only if CONFIG_IPC_NS.
> 

Actually it does not require IPC_NS, the code in ipc/namespace.c are namespace 
specific, 
and is *not needed* if ipc namespace is not supported.  <-- fixed here

> There are a couple more implementations similar to this one.
> https://lwn.net/Articles/577957/ and some submissions to AOSP derived from 
> that one that introduce a generic registration function for namespace support 
> [1], and changes to binder to implement namespaces [2].
> 
> If this is really needed, then we should have a solution that works for 
> devices without requiring IPC_NS or SYSVIPC. Also, we should not add 
> binder-specific code to ipc/namespace.c or include/linux/ipc_namespace.h.
> 
> -Todd
> 
> [1] https://android-review.googlesource.com/c/kernel/common/+/471961
> [2] https://android-review.googlesource.com/c/kernel/common/+/471825
>

If the binder will be isolated by namespace, it must put binder proc and binder 
context in ipc_namespace (or with something like void* as [1] did)
I have sent the V2 patch, that patch does not require SYSVIPC or IPC_NS. If 
IPC_NS is not set, binder_init will put proc and context into init_ipc_ns.
If SYSVIPC and CONFIG_POSIX_MQUEUE are both unset, I will make a fake 
init_ipc_ns to put them. it is marked as no static intentionally to let compile 
generate an error if it has defined somewhere alse. The code in ipc/namespace.c 
is just to notify binder to do some installationwhere namespace are
creating, If no IPC_NS set, the initialization in binder_init will be enough.
So please review and test the V2 patch.


Re: [PATCH] mm: handle no memcg case in memcg_kmem_charge() properly

2018-10-29 Thread Shakeel Butt
On Mon, Oct 29, 2018 at 6:01 PM Rik van Riel  wrote:
>
> On Mon, 2018-10-29 at 17:50 -0700, Shakeel Butt wrote:
> > On Mon, Oct 29, 2018 at 2:52 PM Roman Gushchin  wrote:
> > >
> > > Mike Galbraith reported a regression caused by the commit
> > > 9b6f7e163cd0
> > > ("mm: rework memcg kernel stack accounting") on a system with
> > > "cgroup_disable=memory" boot option: the system panics with the
> > > following stack trace:
> > >
> > >   [0.928542] BUG: unable to handle kernel NULL pointer dereference
> > > at 00f8
> > >   [0.929317] PGD 0 P4D 0
> > >   [0.929573] Oops: 0002 [#1] PREEMPT SMP PTI
> > >   [0.929984] CPU: 0 PID: 1 Comm: systemd Not tainted 4.19.0-
> > > preempt+ #410
> > >   [0.930637] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> > > BIOS ?-20180531_142017-buildhw-08.phx2.fed4
> > >   [0.931862] RIP: 0010:page_counter_try_charge+0x22/0xc0
> > >   [0.932376] Code: 41 5d c3 c3 0f 1f 40 00 0f 1f 44 00 00 48 85 ff
> > > 0f 84 a7 00 00 00 41 56 48 89 f8 49 89 fe 49
> > >   [0.934283] RSP: 0018:acf68031fcb8 EFLAGS: 00010202
> > >   [0.934826] RAX: 00f8 RBX:  RCX:
> > > 
> > >   [0.935558] RDX: acf68031fd08 RSI: 0020 RDI:
> > > 00f8
> > >   [0.936288] RBP: 0001 R08: 8063 R09:
> > > 99ff7cd37a40
> > >   [0.937021] R10: acf68031fed0 R11: 0020 R12:
> > > 0020
> > >   [0.937749] R13: acf68031fd08 R14: 00f8 R15:
> > > 99ff7da1ec60
> > >   [0.938486] FS:  7fc2140bb280() GS:99ff7da0()
> > > knlGS:
> > >   [0.939311] CS:  0010 DS:  ES:  CR0: 80050033
> > >   [0.939905] CR2: 00f8 CR3: 12dc8002 CR4:
> > > 00760ef0
> > >   [0.940638] DR0:  DR1:  DR2:
> > > 
> > >   [0.941366] DR3:  DR6: fffe0ff0 DR7:
> > > 0400
> > >   [0.942110] PKRU: 5554
> > >   [0.942412] Call Trace:
> > >   [0.942673]  try_charge+0xcb/0x780
> > >   [0.943031]  memcg_kmem_charge_memcg+0x28/0x80
> > >   [0.943486]  ? __vmalloc_node_range+0x1e4/0x280
> > >   [0.943971]  memcg_kmem_charge+0x8b/0x1d0
> > >   [0.944396]  copy_process.part.41+0x1ca/0x2070
> > >   [0.944853]  ? get_acl+0x1a/0x120
> > >   [0.945200]  ? shmem_tmpfile+0x90/0x90
> > >   [0.945596]  _do_fork+0xd7/0x3d0
> > >   [0.945934]  ? trace_hardirqs_off_thunk+0x1a/0x1c
> > >   [0.946421]  do_syscall_64+0x5a/0x180
> > >   [0.946798]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > >
> > > The problem occurs because get_mem_cgroup_from_current() returns
> > > the NULL pointer if memory controller is disabled. Let's check
> > > if this is a case at the beginning of memcg_kmem_charge() and
> > > just return 0 if mem_cgroup_disabled() returns true. This is how
> > > we handle this case in many other places in the memory controller
> > > code.
> > >
> > > Fixes: 9b6f7e163cd0 ("mm: rework memcg kernel stack accounting")
> > > Reported-by: Mike Galbraith 
> > > Signed-off-by: Roman Gushchin 
> > > Cc: Michal Hocko 
> > > Cc: Johannes Weiner 
> > > Cc: Vladimir Davydov 
> > > Cc: Andrew Morton 
> > > ---
> > >  mm/memcontrol.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > index 54920cbc46bf..6e1469b80cb7 100644
> > > --- a/mm/memcontrol.c
> > > +++ b/mm/memcontrol.c
> > > @@ -2593,7 +2593,7 @@ int memcg_kmem_charge(struct page *page,
> > > gfp_t gfp, int order)
> > > struct mem_cgroup *memcg;
> > > int ret = 0;
> > >
> > > -   if (memcg_kmem_bypass())
> > > +   if (mem_cgroup_disabled() || memcg_kmem_bypass())
> > > return 0;
> > >
> >
> > Why not check memcg_kmem_enabled() before calling memcg_kmem_charge()
> > in memcg_charge_kernel_stack()?
>
> Check Roman's backtrace again. The function
> memcg_charge_kernel_stack() is not in it.
>

It got inlined.

> That is why it is generally better to check
> in the called function, rather than add a
> check to every call site (and maybe miss one
> or two).
>

I think the reason the check was at the call site was not to introduce
jmp/call in the allocation hot path for processes in the root memcg. I
don't have any strong preference but we should be persistent i.e.
checks at call site for all or check in the called function for all.

Shakeel


Re: [PATCH] mm: handle no memcg case in memcg_kmem_charge() properly

2018-10-29 Thread Shakeel Butt
On Mon, Oct 29, 2018 at 6:01 PM Rik van Riel  wrote:
>
> On Mon, 2018-10-29 at 17:50 -0700, Shakeel Butt wrote:
> > On Mon, Oct 29, 2018 at 2:52 PM Roman Gushchin  wrote:
> > >
> > > Mike Galbraith reported a regression caused by the commit
> > > 9b6f7e163cd0
> > > ("mm: rework memcg kernel stack accounting") on a system with
> > > "cgroup_disable=memory" boot option: the system panics with the
> > > following stack trace:
> > >
> > >   [0.928542] BUG: unable to handle kernel NULL pointer dereference
> > > at 00f8
> > >   [0.929317] PGD 0 P4D 0
> > >   [0.929573] Oops: 0002 [#1] PREEMPT SMP PTI
> > >   [0.929984] CPU: 0 PID: 1 Comm: systemd Not tainted 4.19.0-
> > > preempt+ #410
> > >   [0.930637] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> > > BIOS ?-20180531_142017-buildhw-08.phx2.fed4
> > >   [0.931862] RIP: 0010:page_counter_try_charge+0x22/0xc0
> > >   [0.932376] Code: 41 5d c3 c3 0f 1f 40 00 0f 1f 44 00 00 48 85 ff
> > > 0f 84 a7 00 00 00 41 56 48 89 f8 49 89 fe 49
> > >   [0.934283] RSP: 0018:acf68031fcb8 EFLAGS: 00010202
> > >   [0.934826] RAX: 00f8 RBX:  RCX:
> > > 
> > >   [0.935558] RDX: acf68031fd08 RSI: 0020 RDI:
> > > 00f8
> > >   [0.936288] RBP: 0001 R08: 8063 R09:
> > > 99ff7cd37a40
> > >   [0.937021] R10: acf68031fed0 R11: 0020 R12:
> > > 0020
> > >   [0.937749] R13: acf68031fd08 R14: 00f8 R15:
> > > 99ff7da1ec60
> > >   [0.938486] FS:  7fc2140bb280() GS:99ff7da0()
> > > knlGS:
> > >   [0.939311] CS:  0010 DS:  ES:  CR0: 80050033
> > >   [0.939905] CR2: 00f8 CR3: 12dc8002 CR4:
> > > 00760ef0
> > >   [0.940638] DR0:  DR1:  DR2:
> > > 
> > >   [0.941366] DR3:  DR6: fffe0ff0 DR7:
> > > 0400
> > >   [0.942110] PKRU: 5554
> > >   [0.942412] Call Trace:
> > >   [0.942673]  try_charge+0xcb/0x780
> > >   [0.943031]  memcg_kmem_charge_memcg+0x28/0x80
> > >   [0.943486]  ? __vmalloc_node_range+0x1e4/0x280
> > >   [0.943971]  memcg_kmem_charge+0x8b/0x1d0
> > >   [0.944396]  copy_process.part.41+0x1ca/0x2070
> > >   [0.944853]  ? get_acl+0x1a/0x120
> > >   [0.945200]  ? shmem_tmpfile+0x90/0x90
> > >   [0.945596]  _do_fork+0xd7/0x3d0
> > >   [0.945934]  ? trace_hardirqs_off_thunk+0x1a/0x1c
> > >   [0.946421]  do_syscall_64+0x5a/0x180
> > >   [0.946798]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > >
> > > The problem occurs because get_mem_cgroup_from_current() returns
> > > the NULL pointer if memory controller is disabled. Let's check
> > > if this is a case at the beginning of memcg_kmem_charge() and
> > > just return 0 if mem_cgroup_disabled() returns true. This is how
> > > we handle this case in many other places in the memory controller
> > > code.
> > >
> > > Fixes: 9b6f7e163cd0 ("mm: rework memcg kernel stack accounting")
> > > Reported-by: Mike Galbraith 
> > > Signed-off-by: Roman Gushchin 
> > > Cc: Michal Hocko 
> > > Cc: Johannes Weiner 
> > > Cc: Vladimir Davydov 
> > > Cc: Andrew Morton 
> > > ---
> > >  mm/memcontrol.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > index 54920cbc46bf..6e1469b80cb7 100644
> > > --- a/mm/memcontrol.c
> > > +++ b/mm/memcontrol.c
> > > @@ -2593,7 +2593,7 @@ int memcg_kmem_charge(struct page *page,
> > > gfp_t gfp, int order)
> > > struct mem_cgroup *memcg;
> > > int ret = 0;
> > >
> > > -   if (memcg_kmem_bypass())
> > > +   if (mem_cgroup_disabled() || memcg_kmem_bypass())
> > > return 0;
> > >
> >
> > Why not check memcg_kmem_enabled() before calling memcg_kmem_charge()
> > in memcg_charge_kernel_stack()?
>
> Check Roman's backtrace again. The function
> memcg_charge_kernel_stack() is not in it.
>

It got inlined.

> That is why it is generally better to check
> in the called function, rather than add a
> check to every call site (and maybe miss one
> or two).
>

I think the reason the check was at the call site was not to introduce
jmp/call in the allocation hot path for processes in the root memcg. I
don't have any strong preference but we should be persistent i.e.
checks at call site for all or check in the called function for all.

Shakeel


RE: [PATCH] binder: ipc namespace support for android binder

2018-10-29 Thread 周威
> > > It's not obvious from this patch where this dependency comes 
> > > from...why is SYSVIPC required? I'd like to not have to require 
> > > IPC_NS either for devices.
> >
> > Yes, the patch is not highly dependent on SYSVIPC, but it will be 
> > convenient if require it. I will update it to drop dependency of it in 
> > V2 patch. This patch doesn't need IPC_NS set at present.
> 
> Actually it is dependent on IPC_NS since it makes changes to ipc/namespace.c 
> which is compiled only if CONFIG_IPC_NS.
> 

Actually it does not require IPC_NS, the code in ipc/namespace.c are namespace 
specific, and is *not needed* if ipc namespace is supported.

> There are a couple more implementations similar to this one.
> https://lwn.net/Articles/577957/ and some submissions to AOSP derived from 
> that one that introduce a generic registration function for namespace support 
> [1], and changes to binder to implement namespaces [2].
> 
> If this is really needed, then we should have a solution that works for 
> devices without requiring IPC_NS or SYSVIPC. Also, we should not add 
> binder-specific code to ipc/namespace.c or include/linux/ipc_namespace.h.
> 
> -Todd
> 
> [1] https://android-review.googlesource.com/c/kernel/common/+/471961
> [2] https://android-review.googlesource.com/c/kernel/common/+/471825
>

If the binder will be isolated by namespace, it must put binder proc and binder 
context in ipc_namespace (or with something like void* as [1] did)
I have sent the V2 patch, that patch does not require SYSVIPC or IPC_NS. If 
IPC_NS is not set, binder_init will put proc and context into init_ipc_ns.
If SYSVIPC and CONFIG_POSIX_MQUEUE are both unset, I will make a fake 
init_ipc_ns to put them. it is marked as no static intentionally to let compile 
generate an error if it has defined somewhere alse. The code in ipc/namespace.c 
is just to notify binder to do some installationwhere namespace are
creating, If no IPC_NS set, the initialization in binder_init will be enough.
So please review and test the V2 patch.


RE: [PATCH] binder: ipc namespace support for android binder

2018-10-29 Thread 周威
> > > It's not obvious from this patch where this dependency comes 
> > > from...why is SYSVIPC required? I'd like to not have to require 
> > > IPC_NS either for devices.
> >
> > Yes, the patch is not highly dependent on SYSVIPC, but it will be 
> > convenient if require it. I will update it to drop dependency of it in 
> > V2 patch. This patch doesn't need IPC_NS set at present.
> 
> Actually it is dependent on IPC_NS since it makes changes to ipc/namespace.c 
> which is compiled only if CONFIG_IPC_NS.
> 

Actually it does not require IPC_NS, the code in ipc/namespace.c are namespace 
specific, and is *not needed* if ipc namespace is supported.

> There are a couple more implementations similar to this one.
> https://lwn.net/Articles/577957/ and some submissions to AOSP derived from 
> that one that introduce a generic registration function for namespace support 
> [1], and changes to binder to implement namespaces [2].
> 
> If this is really needed, then we should have a solution that works for 
> devices without requiring IPC_NS or SYSVIPC. Also, we should not add 
> binder-specific code to ipc/namespace.c or include/linux/ipc_namespace.h.
> 
> -Todd
> 
> [1] https://android-review.googlesource.com/c/kernel/common/+/471961
> [2] https://android-review.googlesource.com/c/kernel/common/+/471825
>

If the binder will be isolated by namespace, it must put binder proc and binder 
context in ipc_namespace (or with something like void* as [1] did)
I have sent the V2 patch, that patch does not require SYSVIPC or IPC_NS. If 
IPC_NS is not set, binder_init will put proc and context into init_ipc_ns.
If SYSVIPC and CONFIG_POSIX_MQUEUE are both unset, I will make a fake 
init_ipc_ns to put them. it is marked as no static intentionally to let compile 
generate an error if it has defined somewhere alse. The code in ipc/namespace.c 
is just to notify binder to do some installationwhere namespace are
creating, If no IPC_NS set, the initialization in binder_init will be enough.
So please review and test the V2 patch.


[PATCH -next] nds32: Remove duplicated include from pm.c

2018-10-29 Thread YueHaibing
Remove duplicated include.

Signed-off-by: YueHaibing 
---
 arch/nds32/kernel/pm.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/nds32/kernel/pm.c b/arch/nds32/kernel/pm.c
index 6989560..ffa8040 100644
--- a/arch/nds32/kernel/pm.c
+++ b/arch/nds32/kernel/pm.c
@@ -5,7 +5,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 







[PATCH -next] nds32: Remove duplicated include from pm.c

2018-10-29 Thread YueHaibing
Remove duplicated include.

Signed-off-by: YueHaibing 
---
 arch/nds32/kernel/pm.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/nds32/kernel/pm.c b/arch/nds32/kernel/pm.c
index 6989560..ffa8040 100644
--- a/arch/nds32/kernel/pm.c
+++ b/arch/nds32/kernel/pm.c
@@ -5,7 +5,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 







Re: [PATCH v8 2/2] samples: add an example of seccomp user trap

2018-10-29 Thread Tycho Andersen
On Mon, Oct 29, 2018 at 11:31:00PM +, Serge E. Hallyn wrote:
> On Mon, Oct 29, 2018 at 04:40:31PM -0600, Tycho Andersen wrote:
> > +   if (req->data.nr != __NR_mount) {
> > +   fprintf(stderr, "huh? trapped something besides mknod? %d\n", 
> > req->data.nr);
> 
> 'besides mount' ?

Yes, thanks :)

Tycho


Re: [PATCH v8 2/2] samples: add an example of seccomp user trap

2018-10-29 Thread Tycho Andersen
On Mon, Oct 29, 2018 at 11:31:00PM +, Serge E. Hallyn wrote:
> On Mon, Oct 29, 2018 at 04:40:31PM -0600, Tycho Andersen wrote:
> > +   if (req->data.nr != __NR_mount) {
> > +   fprintf(stderr, "huh? trapped something besides mknod? %d\n", 
> > req->data.nr);
> 
> 'besides mount' ?

Yes, thanks :)

Tycho


Re: [PATCH] V4 init/main.c Enable watchdog_thresh control from kernel line

2018-10-29 Thread kbuild test robot
Hi Laurence,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linux-sof-driver/master]
[also build test ERROR on v4.19 next-20181029]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Laurence-Oberman/V4-init-main-c-Enable-watchdog_thresh-control-from-kernel-line/20181025-040136
base:   https://github.com/thesofproject/linux master
config: i386-randconfig-k3-10291547 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All errors (new ones prefixed by >>):

   init/main.c: In function 'is_watchdog_thresh_setup':
>> init/main.c:1036:20: error: 'watchdog_thresh' undeclared (first use in this 
>> function); did you mean 'proc_watchdog_thresh'?
 get_option(, _thresh);
   ^~~
   proc_watchdog_thresh
   init/main.c:1036:20: note: each undeclared identifier is reported only once 
for each function it appears in

vim +1036 init/main.c

  1033  
  1034  static int __init is_watchdog_thresh_setup(char *str)
  1035  {
> 1036  get_option(, _thresh);
  1037  return 1;
  1038  }
  1039  __setup("watchdog_thresh=", is_watchdog_thresh_setup);
  1040  
  1041  

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


Re: [PATCH] V4 init/main.c Enable watchdog_thresh control from kernel line

2018-10-29 Thread kbuild test robot
Hi Laurence,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linux-sof-driver/master]
[also build test ERROR on v4.19 next-20181029]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Laurence-Oberman/V4-init-main-c-Enable-watchdog_thresh-control-from-kernel-line/20181025-040136
base:   https://github.com/thesofproject/linux master
config: i386-randconfig-k3-10291547 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All errors (new ones prefixed by >>):

   init/main.c: In function 'is_watchdog_thresh_setup':
>> init/main.c:1036:20: error: 'watchdog_thresh' undeclared (first use in this 
>> function); did you mean 'proc_watchdog_thresh'?
 get_option(, _thresh);
   ^~~
   proc_watchdog_thresh
   init/main.c:1036:20: note: each undeclared identifier is reported only once 
for each function it appears in

vim +1036 init/main.c

  1033  
  1034  static int __init is_watchdog_thresh_setup(char *str)
  1035  {
> 1036  get_option(, _thresh);
  1037  return 1;
  1038  }
  1039  __setup("watchdog_thresh=", is_watchdog_thresh_setup);
  1040  
  1041  

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


Re: [PATCHES/RFC] Re: A concern about overflow ring buffer mode

2018-10-29 Thread Liang, Kan




On 10/29/2018 6:42 PM, David Miller wrote:

From: "Liang, Kan" 
Date: Mon, 29 Oct 2018 18:32:40 -0400


- struct annotation_options *annotation_options 
__maybe_unused)
+ struct annotation_options *annotation_options __maybe_unused,
+ atomic_t *nr_rb_read __maybe_unused)
  {


What is going on with the indentations of this patch?



Sorry, my editor auto wraps the line.
The patch has been sent in a separate email.

Thanks,
Kan


Re: [PATCHES/RFC] Re: A concern about overflow ring buffer mode

2018-10-29 Thread Liang, Kan




On 10/29/2018 6:42 PM, David Miller wrote:

From: "Liang, Kan" 
Date: Mon, 29 Oct 2018 18:32:40 -0400


- struct annotation_options *annotation_options 
__maybe_unused)
+ struct annotation_options *annotation_options __maybe_unused,
+ atomic_t *nr_rb_read __maybe_unused)
  {


What is going on with the indentations of this patch?



Sorry, my editor auto wraps the line.
The patch has been sent in a separate email.

Thanks,
Kan


[RFC PATCH] perf top: Move the timeout warning from event processing thread to display thread

2018-10-29 Thread kan . liang
From: Kan Liang 

The main event processing thread may hang if the ring buffer event
processing timeouts.

Analysis from David Miller:
"It hangs the event thread, because the ui call waits for a keypress
but the display thread will eat them up and the event thread thus
hangs in select()."

The timeout warning is moved to display thread.

The nr_rb_read is introduced to track the times of
perf_top__mmap_read(), which is the main function of event processing.
If the nr_rb_read doesn't increase during the refresh time, the display
thread may output stale data. The timeout warning will be triggered.

The timeout warning can only be triggered one time to avoid the annoying
and duplicated warning message.

The first perf_top__mmap_read() is moved to after display thread create.
Because the perf_top__mmap_read() could cost long time. For example, the
function may cost tens of minutes on Knights Landing platform with
parallel kernel build. There will be nothing displayed on the screen.
The display thread has to be created before perf_top__mmap_read(). But
at that time, the data is not ready. Display thread has to sleep
refresh time.

Fix: 8cc42de736b6 ("perf top: Check the latency of
perf_top__mmap_read()")
Reported-by: David Miller 
Signed-off-by: Kan Liang 
---
 tools/perf/builtin-c2c.c   |  4 +--
 tools/perf/builtin-report.c|  3 ++-
 tools/perf/builtin-top.c   | 39 +++-
 tools/perf/ui/browsers/hists.c | 58 ++
 tools/perf/ui/browsers/hists.h |  2 +-
 tools/perf/util/hist.h |  6 +++--
 tools/perf/util/top.h  |  1 +
 7 files changed, 85 insertions(+), 28 deletions(-)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index f3aa9d0..1e77515 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -2371,7 +2371,7 @@ static int perf_c2c__browse_cacheline(struct hist_entry 
*he)
c2c_browser__update_nr_entries(browser);
 
while (1) {
-   key = hist_browser__run(browser, "? - help", true);
+   key = hist_browser__run(browser, "? - help", true, NULL);
 
switch (key) {
case 's':
@@ -2440,7 +2440,7 @@ static int perf_c2c__hists_browse(struct hists *hists)
c2c_browser__update_nr_entries(browser);
 
while (1) {
-   key = hist_browser__run(browser, "? - help", true);
+   key = hist_browser__run(browser, "? - help", true, NULL);
 
switch (key) {
case 'q':
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 257c9c1..2fc1273 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -561,7 +561,8 @@ static int report__browse_hists(struct report *rep)
ret = perf_evlist__tui_browse_hists(evlist, help, NULL,
rep->min_percent,
>header.env,
-   true, 
>annotation_opts);
+   true, >annotation_opts,
+   NULL);
/*
 * Usually "ret" is the last pressed key, and we only
 * care if the key notifies us to switch data file.
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index d21d875..95409de 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -584,6 +584,8 @@ static void *display_thread_tui(void *arg)
.refresh= top->delay_secs,
};
 
+   sleep(top->delay_secs);
+
/* In order to read symbols from other namespaces perf to  needs to call
 * setns(2).  This isn't permitted if the struct_fs has multiple users.
 * unshare(2) the fs so that we may continue to setns into namespaces
@@ -607,7 +609,8 @@ static void *display_thread_tui(void *arg)
  top->min_percent,
  >session->header.env,
  !top->record_opts.overwrite,
- >annotation_opts);
+ >annotation_opts,
+ >nr_rb_read);
 
done = 1;
return NULL;
@@ -633,6 +636,11 @@ static void *display_thread(void *arg)
struct termios save;
struct perf_top *top = arg;
int delay_msecs, c;
+   bool rb_read_timeout_warned = false;
+   bool rb_read_timeout = false;
+   int last_nr_rb_read = 0;
+
+   sleep(top->delay_secs);
 
/* In order to read symbols from other namespaces perf to  needs to call
 * setns(2).  This isn't permitted if the struct_fs has multiple users.
@@ -651,12 +659,26 @@ static void *display_thread(void *arg)
 
while (!done) {
perf_top__print_sym_table(top);
+
+   if 

[RFC PATCH] perf top: Move the timeout warning from event processing thread to display thread

2018-10-29 Thread kan . liang
From: Kan Liang 

The main event processing thread may hang if the ring buffer event
processing timeouts.

Analysis from David Miller:
"It hangs the event thread, because the ui call waits for a keypress
but the display thread will eat them up and the event thread thus
hangs in select()."

The timeout warning is moved to display thread.

The nr_rb_read is introduced to track the times of
perf_top__mmap_read(), which is the main function of event processing.
If the nr_rb_read doesn't increase during the refresh time, the display
thread may output stale data. The timeout warning will be triggered.

The timeout warning can only be triggered one time to avoid the annoying
and duplicated warning message.

The first perf_top__mmap_read() is moved to after display thread create.
Because the perf_top__mmap_read() could cost long time. For example, the
function may cost tens of minutes on Knights Landing platform with
parallel kernel build. There will be nothing displayed on the screen.
The display thread has to be created before perf_top__mmap_read(). But
at that time, the data is not ready. Display thread has to sleep
refresh time.

Fix: 8cc42de736b6 ("perf top: Check the latency of
perf_top__mmap_read()")
Reported-by: David Miller 
Signed-off-by: Kan Liang 
---
 tools/perf/builtin-c2c.c   |  4 +--
 tools/perf/builtin-report.c|  3 ++-
 tools/perf/builtin-top.c   | 39 +++-
 tools/perf/ui/browsers/hists.c | 58 ++
 tools/perf/ui/browsers/hists.h |  2 +-
 tools/perf/util/hist.h |  6 +++--
 tools/perf/util/top.h  |  1 +
 7 files changed, 85 insertions(+), 28 deletions(-)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index f3aa9d0..1e77515 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -2371,7 +2371,7 @@ static int perf_c2c__browse_cacheline(struct hist_entry 
*he)
c2c_browser__update_nr_entries(browser);
 
while (1) {
-   key = hist_browser__run(browser, "? - help", true);
+   key = hist_browser__run(browser, "? - help", true, NULL);
 
switch (key) {
case 's':
@@ -2440,7 +2440,7 @@ static int perf_c2c__hists_browse(struct hists *hists)
c2c_browser__update_nr_entries(browser);
 
while (1) {
-   key = hist_browser__run(browser, "? - help", true);
+   key = hist_browser__run(browser, "? - help", true, NULL);
 
switch (key) {
case 'q':
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 257c9c1..2fc1273 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -561,7 +561,8 @@ static int report__browse_hists(struct report *rep)
ret = perf_evlist__tui_browse_hists(evlist, help, NULL,
rep->min_percent,
>header.env,
-   true, 
>annotation_opts);
+   true, >annotation_opts,
+   NULL);
/*
 * Usually "ret" is the last pressed key, and we only
 * care if the key notifies us to switch data file.
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index d21d875..95409de 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -584,6 +584,8 @@ static void *display_thread_tui(void *arg)
.refresh= top->delay_secs,
};
 
+   sleep(top->delay_secs);
+
/* In order to read symbols from other namespaces perf to  needs to call
 * setns(2).  This isn't permitted if the struct_fs has multiple users.
 * unshare(2) the fs so that we may continue to setns into namespaces
@@ -607,7 +609,8 @@ static void *display_thread_tui(void *arg)
  top->min_percent,
  >session->header.env,
  !top->record_opts.overwrite,
- >annotation_opts);
+ >annotation_opts,
+ >nr_rb_read);
 
done = 1;
return NULL;
@@ -633,6 +636,11 @@ static void *display_thread(void *arg)
struct termios save;
struct perf_top *top = arg;
int delay_msecs, c;
+   bool rb_read_timeout_warned = false;
+   bool rb_read_timeout = false;
+   int last_nr_rb_read = 0;
+
+   sleep(top->delay_secs);
 
/* In order to read symbols from other namespaces perf to  needs to call
 * setns(2).  This isn't permitted if the struct_fs has multiple users.
@@ -651,12 +659,26 @@ static void *display_thread(void *arg)
 
while (!done) {
perf_top__print_sym_table(top);
+
+   if 

arch/x86/include/asm/rmwcc.h:23:17: error: jump into statement expression

2018-10-29 Thread kbuild test robot
Hi Peter,

FYI, the error/warning still remains.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
master
head:   4b42745211af552f170f38a1b97f4a112b5da6b2
commit: 7aa54be2976550f17c11a1c3e3630002dea39303 locking/qspinlock, x86: 
Provide liveness guarantee
date:   13 days ago
config: x86_64-randconfig-j0-10290909 (attached as .config)
compiler: gcc-4.9 (Debian 4.9.4-2) 4.9.4
reproduce:
git checkout 7aa54be2976550f17c11a1c3e3630002dea39303
# save the attached .config to linux build tree
make ARCH=x86_64 

All errors (new ones prefixed by >>):

   In file included from arch/x86/include/asm/atomic.h:5:0,
from include/linux/atomic.h:7,
from include/linux/crypto.h:20,
from arch/x86/kernel/asm-offsets.c:9:
   arch/x86/include/asm/qspinlock.h: In function 
'queued_fetch_set_pending_acquire':
>> arch/x86/include/asm/rmwcc.h:23:17: error: jump into statement expression
   : clobbers : cc_label);\
^
   include/linux/compiler.h:58:42: note: in definition of macro '__trace_if'
 if (__builtin_constant_p(!!(cond)) ? !!(cond) :   \
 ^
   arch/x86/include/asm/qspinlock.h:18:2: note: in expansion of macro 'if'
 if (GEN_BINARY_RMWcc(LOCK_PREFIX "btsl", lock->val.counter, c,
 ^
   arch/x86/include/asm/rmwcc.h:21:2: note: in expansion of macro 
'asm_volatile_goto'
 asm_volatile_goto (fullop "; j" #cc " %l[cc_label]"  \
 ^
   arch/x86/include/asm/rmwcc.h:54:2: note: in expansion of macro '__GEN_RMWcc'
 __GEN_RMWcc(op " %[val], " arg0, var, cc,   \
 ^
   arch/x86/include/asm/rmwcc.h:58:2: note: in expansion of macro 
'GEN_BINARY_RMWcc_6'
 GEN_BINARY_RMWcc_6(op, var, cc, vcon, val, "%[var]")
 ^
   arch/x86/include/asm/rmwcc.h:9:30: note: in expansion of macro 
'GEN_BINARY_RMWcc_5'
#define __RMWcc_CONCAT(a, b) a ## b
 ^
   arch/x86/include/asm/rmwcc.h:10:28: note: in expansion of macro 
'__RMWcc_CONCAT'
#define RMWcc_CONCAT(a, b) __RMWcc_CONCAT(a, b)
   ^
   arch/x86/include/asm/rmwcc.h:60:32: note: in expansion of macro 
'RMWcc_CONCAT'
#define GEN_BINARY_RMWcc(X...) RMWcc_CONCAT(GEN_BINARY_RMWcc_, 
RMWcc_ARGS(X))(X)
   ^
   arch/x86/include/asm/qspinlock.h:18:6: note: in expansion of macro 
'GEN_BINARY_RMWcc'
 if (GEN_BINARY_RMWcc(LOCK_PREFIX "btsl", lock->val.counter, c,
 ^
   arch/x86/include/asm/rmwcc.h:25:1: note: label 'cc_label' defined here
cc_label: c = true;  \
^
   include/linux/compiler.h:58:30: note: in definition of macro '__trace_if'
 if (__builtin_constant_p(!!(cond)) ? !!(cond) :   \
 ^
   arch/x86/include/asm/qspinlock.h:18:2: note: in expansion of macro 'if'
 if (GEN_BINARY_RMWcc(LOCK_PREFIX "btsl", lock->val.counter, c,
 ^
   arch/x86/include/asm/rmwcc.h:54:2: note: in expansion of macro '__GEN_RMWcc'
 __GEN_RMWcc(op " %[val], " arg0, var, cc,   \
 ^
   arch/x86/include/asm/rmwcc.h:58:2: note: in expansion of macro 
'GEN_BINARY_RMWcc_6'
 GEN_BINARY_RMWcc_6(op, var, cc, vcon, val, "%[var]")
 ^
   arch/x86/include/asm/rmwcc.h:9:30: note: in expansion of macro 
'GEN_BINARY_RMWcc_5'
#define __RMWcc_CONCAT(a, b) a ## b
 ^
   arch/x86/include/asm/rmwcc.h:10:28: note: in expansion of macro 
'__RMWcc_CONCAT'
#define RMWcc_CONCAT(a, b) __RMWcc_CONCAT(a, b)
   ^
   arch/x86/include/asm/rmwcc.h:60:32: note: in expansion of macro 
'RMWcc_CONCAT'
#define GEN_BINARY_RMWcc(X...) RMWcc_CONCAT(GEN_BINARY_RMWcc_, 
RMWcc_ARGS(X))(X)
   ^
   arch/x86/include/asm/qspinlock.h:18:6: note: in expansion of macro 
'GEN_BINARY_RMWcc'
 if (GEN_BINARY_RMWcc(LOCK_PREFIX "btsl", lock->val.counter, c,
 ^
>> arch/x86/include/asm/rmwcc.h:25:1: error: duplicate label 'cc_label'
cc_label: c = true;  \
^
   include/linux/compiler.h:58:42: note: in definition of macro '__trace_if'
 if (__builtin_constant_p(!!(cond)) ? !!(cond) :   \
 ^
   arch/x86/include/asm/qspinlock.h:18:2: note: in expansion of macro 'if'
 if (GEN_BINARY_RMWcc(LOCK_PREFIX "btsl", lock->val.counter, c,
 ^
   arch/x86/include/asm/rmwcc.h:54:2: note: in expansion of macro '__GEN_RMWcc'
 __GEN_RMWcc(op " %[val], " arg0, var, cc,   \
 ^
   arch/x86/include/asm/rmwcc.h:58:2: note: in expansion of macro 
'GEN_BINARY_RMWcc_6'
 GEN_BINARY_RMWcc_6(op, var, cc, vcon, val, "%[var]")
 ^
   arch/x86/include/asm/rmwcc.h:9:30: note: in expansion of macro 
'GEN_BINARY_RMWcc_5'
#define __RMWcc_CONCAT(a, b) a ## b
 ^
   arch/x86/include/asm/rmwcc.h:10:28: note: in expansion of macro 
'__RMWcc_CONCAT'
#define RMWcc_CONCAT(a, b) __RMWcc_CONCAT(a, b)
  

Re: [PATCH v2 1/1] iommu/arm-smmu-v3: eliminate a potential memory corruption on Hi16xx soc

2018-10-29 Thread Leizhen (ThunderTown)



On 2018/10/30 1:59, Will Deacon wrote:
> On Sat, Oct 20, 2018 at 03:36:54PM +0800, Zhen Lei wrote:
>> The standard GITS_TRANSLATER register in ITS is only 4 bytes, but
>> Hisilicon expands the next 4 bytes to carry some IMPDEF information. That
>> means, total 8 bytes data will be written to MSIAddress each time.
>>
>> MSIAddr: |4bytes|4bytes|
>>   |MSIData   |IMPDEF|
>>
>> There is no problem for ITS, because the next 4 bytes space is reserved
>> in ITS. But it will overwrite the 4 bytes memory following "sync_count".
>> It's very fortunately that the previous and the next neighbour of the
>> "sync_count" are both aligned by 8 bytes, so no problem is met now.
>>
>> It's good to explicitly add a workaround:
>> 1. Add gcc __attribute__((aligned(8))) to make sure that "sync_count" is
>>always aligned by 8 bytes.
>> 2. Add a "int" struct member to make sure the 4 bytes padding is always
>>exist.
>>
>> There is no functional change.
>>
>> Signed-off-by: Zhen Lei 
>> ---
>>  drivers/iommu/arm-smmu-v3.c | 15 ++-
>>  1 file changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
>> index 5059d09..624fdd0 100644
>> --- a/drivers/iommu/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm-smmu-v3.c
>> @@ -586,7 +586,20 @@ struct arm_smmu_device {
>>  
>>  struct arm_smmu_strtab_cfg  strtab_cfg;
>>  
>> -u32 sync_count;
>> +/*
>> + * The alignment and padding is required by Hi16xx of Hisilicon.
>> + * Because the ITS hardware on Hi16xx will truncate the MSIAddress(Here
>> + * it's the address of "sync_count") to 8 bytes boundary first, then
>> + * write 32 bits MSIdata at offset 0, and 32 bits IMPDEF data at offset
>> + * 4. Without this workaround, the adjacent member maybe overwritten.
>> + *
>> + *|---4bytes---|---4bytes---|
>> + * MSIAddress & (~0x7):   MSIdata  | IMPDEF data|
>> + */
>> +struct {
>> +u32 sync_count;
>> +int padding;
>> +} __attribute__((aligned(8)));
> 
> I thought the conclusion after reviewing your original patch was to maintain
> the union and drop the alignment directive? e.g.
> 
>   union {
>   u32 sync_count;
>   u64 padding; /* Hi16xx writes an extra 32 bits of goodness 
> */
>   };
OK, I will sent v3.

> 
> Will
> 
> .
> 

-- 
Thanks!
BestRegards



arch/x86/include/asm/rmwcc.h:23:17: error: jump into statement expression

2018-10-29 Thread kbuild test robot
Hi Peter,

FYI, the error/warning still remains.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
master
head:   4b42745211af552f170f38a1b97f4a112b5da6b2
commit: 7aa54be2976550f17c11a1c3e3630002dea39303 locking/qspinlock, x86: 
Provide liveness guarantee
date:   13 days ago
config: x86_64-randconfig-j0-10290909 (attached as .config)
compiler: gcc-4.9 (Debian 4.9.4-2) 4.9.4
reproduce:
git checkout 7aa54be2976550f17c11a1c3e3630002dea39303
# save the attached .config to linux build tree
make ARCH=x86_64 

All errors (new ones prefixed by >>):

   In file included from arch/x86/include/asm/atomic.h:5:0,
from include/linux/atomic.h:7,
from include/linux/crypto.h:20,
from arch/x86/kernel/asm-offsets.c:9:
   arch/x86/include/asm/qspinlock.h: In function 
'queued_fetch_set_pending_acquire':
>> arch/x86/include/asm/rmwcc.h:23:17: error: jump into statement expression
   : clobbers : cc_label);\
^
   include/linux/compiler.h:58:42: note: in definition of macro '__trace_if'
 if (__builtin_constant_p(!!(cond)) ? !!(cond) :   \
 ^
   arch/x86/include/asm/qspinlock.h:18:2: note: in expansion of macro 'if'
 if (GEN_BINARY_RMWcc(LOCK_PREFIX "btsl", lock->val.counter, c,
 ^
   arch/x86/include/asm/rmwcc.h:21:2: note: in expansion of macro 
'asm_volatile_goto'
 asm_volatile_goto (fullop "; j" #cc " %l[cc_label]"  \
 ^
   arch/x86/include/asm/rmwcc.h:54:2: note: in expansion of macro '__GEN_RMWcc'
 __GEN_RMWcc(op " %[val], " arg0, var, cc,   \
 ^
   arch/x86/include/asm/rmwcc.h:58:2: note: in expansion of macro 
'GEN_BINARY_RMWcc_6'
 GEN_BINARY_RMWcc_6(op, var, cc, vcon, val, "%[var]")
 ^
   arch/x86/include/asm/rmwcc.h:9:30: note: in expansion of macro 
'GEN_BINARY_RMWcc_5'
#define __RMWcc_CONCAT(a, b) a ## b
 ^
   arch/x86/include/asm/rmwcc.h:10:28: note: in expansion of macro 
'__RMWcc_CONCAT'
#define RMWcc_CONCAT(a, b) __RMWcc_CONCAT(a, b)
   ^
   arch/x86/include/asm/rmwcc.h:60:32: note: in expansion of macro 
'RMWcc_CONCAT'
#define GEN_BINARY_RMWcc(X...) RMWcc_CONCAT(GEN_BINARY_RMWcc_, 
RMWcc_ARGS(X))(X)
   ^
   arch/x86/include/asm/qspinlock.h:18:6: note: in expansion of macro 
'GEN_BINARY_RMWcc'
 if (GEN_BINARY_RMWcc(LOCK_PREFIX "btsl", lock->val.counter, c,
 ^
   arch/x86/include/asm/rmwcc.h:25:1: note: label 'cc_label' defined here
cc_label: c = true;  \
^
   include/linux/compiler.h:58:30: note: in definition of macro '__trace_if'
 if (__builtin_constant_p(!!(cond)) ? !!(cond) :   \
 ^
   arch/x86/include/asm/qspinlock.h:18:2: note: in expansion of macro 'if'
 if (GEN_BINARY_RMWcc(LOCK_PREFIX "btsl", lock->val.counter, c,
 ^
   arch/x86/include/asm/rmwcc.h:54:2: note: in expansion of macro '__GEN_RMWcc'
 __GEN_RMWcc(op " %[val], " arg0, var, cc,   \
 ^
   arch/x86/include/asm/rmwcc.h:58:2: note: in expansion of macro 
'GEN_BINARY_RMWcc_6'
 GEN_BINARY_RMWcc_6(op, var, cc, vcon, val, "%[var]")
 ^
   arch/x86/include/asm/rmwcc.h:9:30: note: in expansion of macro 
'GEN_BINARY_RMWcc_5'
#define __RMWcc_CONCAT(a, b) a ## b
 ^
   arch/x86/include/asm/rmwcc.h:10:28: note: in expansion of macro 
'__RMWcc_CONCAT'
#define RMWcc_CONCAT(a, b) __RMWcc_CONCAT(a, b)
   ^
   arch/x86/include/asm/rmwcc.h:60:32: note: in expansion of macro 
'RMWcc_CONCAT'
#define GEN_BINARY_RMWcc(X...) RMWcc_CONCAT(GEN_BINARY_RMWcc_, 
RMWcc_ARGS(X))(X)
   ^
   arch/x86/include/asm/qspinlock.h:18:6: note: in expansion of macro 
'GEN_BINARY_RMWcc'
 if (GEN_BINARY_RMWcc(LOCK_PREFIX "btsl", lock->val.counter, c,
 ^
>> arch/x86/include/asm/rmwcc.h:25:1: error: duplicate label 'cc_label'
cc_label: c = true;  \
^
   include/linux/compiler.h:58:42: note: in definition of macro '__trace_if'
 if (__builtin_constant_p(!!(cond)) ? !!(cond) :   \
 ^
   arch/x86/include/asm/qspinlock.h:18:2: note: in expansion of macro 'if'
 if (GEN_BINARY_RMWcc(LOCK_PREFIX "btsl", lock->val.counter, c,
 ^
   arch/x86/include/asm/rmwcc.h:54:2: note: in expansion of macro '__GEN_RMWcc'
 __GEN_RMWcc(op " %[val], " arg0, var, cc,   \
 ^
   arch/x86/include/asm/rmwcc.h:58:2: note: in expansion of macro 
'GEN_BINARY_RMWcc_6'
 GEN_BINARY_RMWcc_6(op, var, cc, vcon, val, "%[var]")
 ^
   arch/x86/include/asm/rmwcc.h:9:30: note: in expansion of macro 
'GEN_BINARY_RMWcc_5'
#define __RMWcc_CONCAT(a, b) a ## b
 ^
   arch/x86/include/asm/rmwcc.h:10:28: note: in expansion of macro 
'__RMWcc_CONCAT'
#define RMWcc_CONCAT(a, b) __RMWcc_CONCAT(a, b)
  

Re: [PATCH v2 1/1] iommu/arm-smmu-v3: eliminate a potential memory corruption on Hi16xx soc

2018-10-29 Thread Leizhen (ThunderTown)



On 2018/10/30 1:59, Will Deacon wrote:
> On Sat, Oct 20, 2018 at 03:36:54PM +0800, Zhen Lei wrote:
>> The standard GITS_TRANSLATER register in ITS is only 4 bytes, but
>> Hisilicon expands the next 4 bytes to carry some IMPDEF information. That
>> means, total 8 bytes data will be written to MSIAddress each time.
>>
>> MSIAddr: |4bytes|4bytes|
>>   |MSIData   |IMPDEF|
>>
>> There is no problem for ITS, because the next 4 bytes space is reserved
>> in ITS. But it will overwrite the 4 bytes memory following "sync_count".
>> It's very fortunately that the previous and the next neighbour of the
>> "sync_count" are both aligned by 8 bytes, so no problem is met now.
>>
>> It's good to explicitly add a workaround:
>> 1. Add gcc __attribute__((aligned(8))) to make sure that "sync_count" is
>>always aligned by 8 bytes.
>> 2. Add a "int" struct member to make sure the 4 bytes padding is always
>>exist.
>>
>> There is no functional change.
>>
>> Signed-off-by: Zhen Lei 
>> ---
>>  drivers/iommu/arm-smmu-v3.c | 15 ++-
>>  1 file changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
>> index 5059d09..624fdd0 100644
>> --- a/drivers/iommu/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm-smmu-v3.c
>> @@ -586,7 +586,20 @@ struct arm_smmu_device {
>>  
>>  struct arm_smmu_strtab_cfg  strtab_cfg;
>>  
>> -u32 sync_count;
>> +/*
>> + * The alignment and padding is required by Hi16xx of Hisilicon.
>> + * Because the ITS hardware on Hi16xx will truncate the MSIAddress(Here
>> + * it's the address of "sync_count") to 8 bytes boundary first, then
>> + * write 32 bits MSIdata at offset 0, and 32 bits IMPDEF data at offset
>> + * 4. Without this workaround, the adjacent member maybe overwritten.
>> + *
>> + *|---4bytes---|---4bytes---|
>> + * MSIAddress & (~0x7):   MSIdata  | IMPDEF data|
>> + */
>> +struct {
>> +u32 sync_count;
>> +int padding;
>> +} __attribute__((aligned(8)));
> 
> I thought the conclusion after reviewing your original patch was to maintain
> the union and drop the alignment directive? e.g.
> 
>   union {
>   u32 sync_count;
>   u64 padding; /* Hi16xx writes an extra 32 bits of goodness 
> */
>   };
OK, I will sent v3.

> 
> Will
> 
> .
> 

-- 
Thanks!
BestRegards



Re: [PATCH] ASoC: AMD: Fix race condition between register access

2018-10-29 Thread Daniel Kurtz
Hi Akshu,


On Mon, Oct 29, 2018 at 1:39 AM Agrawal, Akshu  wrote:
>
> During simultaneous running of playback and capture, we
> got hit by incorrect value write on common register. This was due
> to race condition between 2 streams.
> Fixing this by locking the common register access.

Nice catch!  It looks looks like one of the operations you are trying
to make atomic is that two step Addr + Data register update.
If so, then I recommend refactoring a bit, and just doing that locked
2-step-access in its own helper function, like this:

  static void acp_reg_write_srbm_targ(void __iomem *acp_mmio, u32
addr, u32 data)
  {
unsigned long flags;

spin_lock_irqsave(, flags);
acp_reg_write(addr, acp_mmio, mmACP_SRBM_Targ_Idx_Addr);
acp_reg_write(data, acp_mmio, mmACP_SRBM_Targ_Idx_Data);
spin_unlock_irqrestore(, flags);
  }

And similarly, you can add 2 more locking helpers, one for modifying
the imr/ch and another for mmACP_I2S_16BIT_RESOLUTION_EN.

>
> Signed-off-by: Akshu Agrawal 
> ---
>  sound/soc/amd/acp-pcm-dma.c | 29 +
>  1 file changed, 29 insertions(+)
>
> diff --git a/sound/soc/amd/acp-pcm-dma.c b/sound/soc/amd/acp-pcm-dma.c
> index 0ac4b5b..993a7db 100644
> --- a/sound/soc/amd/acp-pcm-dma.c
> +++ b/sound/soc/amd/acp-pcm-dma.c
> @@ -121,6 +121,9 @@
> .periods_max = CAPTURE_MAX_NUM_PERIODS,
>  };
>
> +/* Lock to protect access to registers */
> +static DEFINE_SPINLOCK(lock);
> +
>  static u32 acp_reg_read(void __iomem *acp_mmio, u32 reg)
>  {
> return readl(acp_mmio + (reg * 4));
> @@ -168,9 +171,12 @@ static void config_dma_descriptor_in_sram(void __iomem 
> *acp_mmio,
>   acp_dma_dscr_transfer_t *descr_info)
>  {
> u32 sram_offset;
> +   unsigned long flags;
>
> sram_offset = (descr_idx * sizeof(acp_dma_dscr_transfer_t));
>
> +   spin_lock_irqsave(, flags);
> +
> /* program the source base address. */
> acp_reg_write(sram_offset, acp_mmio, mmACP_SRBM_Targ_Idx_Addr);
> acp_reg_write(descr_info->src,  acp_mmio, mmACP_SRBM_Targ_Idx_Data);
> @@ -181,6 +187,8 @@ static void config_dma_descriptor_in_sram(void __iomem 
> *acp_mmio,
> /* program the number of bytes to be transferred for this descriptor. 
> */
> acp_reg_write(sram_offset + 8,  acp_mmio, mmACP_SRBM_Targ_Idx_Addr);
> acp_reg_write(descr_info->xfer_val, acp_mmio, 
> mmACP_SRBM_Targ_Idx_Data);
> +
> +   spin_unlock_irqrestore(, flags);
>  }
>
>  static void pre_config_reset(void __iomem *acp_mmio, u16 ch_num)
> @@ -309,8 +317,12 @@ static void acp_pte_config(void __iomem *acp_mmio, 
> struct page *pg,
> u32 low;
> u32 high;
> u32 offset;
> +   unsigned long flags;
>
> offset  = ACP_DAGB_GRP_SRBM_SRAM_BASE_OFFSET + (pte_offset * 8);
> +
> +   spin_lock_irqsave(, flags);
> +
> for (page_idx = 0; page_idx < (num_of_pages); page_idx++) {
> /* Load the low address of page int ACP SRAM through SRBM */
> acp_reg_write((offset + (page_idx * 8)),
> @@ -333,6 +345,8 @@ static void acp_pte_config(void __iomem *acp_mmio, struct 
> page *pg,
> /* Move to next physically contiguos page */
> pg++;
> }
> +
> +   spin_unlock_irqrestore(, flags);
>  }
>
>  static void config_acp_dma(void __iomem *acp_mmio,
> @@ -367,6 +381,7 @@ static void acp_dma_cap_channel_enable(void __iomem 
> *acp_mmio,
>u16 cap_channel)
>  {
> u32 val, ch_reg, imr_reg, res_reg;
> +   unsigned long flags;
>
> switch (cap_channel) {
> case CAP_CHANNEL1:
> @@ -381,6 +396,8 @@ static void acp_dma_cap_channel_enable(void __iomem 
> *acp_mmio,
> imr_reg = mmACP_I2SMICSP_IMR0;
> break;
> }
> +   spin_lock_irqsave(, flags);
> +
> val = acp_reg_read(acp_mmio,
>mmACP_I2S_16BIT_RESOLUTION_EN);
> if (val & ACP_I2S_MIC_16BIT_RESOLUTION_EN) {
> @@ -393,12 +410,15 @@ static void acp_dma_cap_channel_enable(void __iomem 
> *acp_mmio,
> val &= ~ACP_I2SMICSP_IMR1__I2SMICSP_RXFOM_MASK;
> acp_reg_write(val, acp_mmio, imr_reg);
> acp_reg_write(0x1, acp_mmio, ch_reg);
> +
> +   spin_unlock_irqrestore(, flags);
>  }
>
>  static void acp_dma_cap_channel_disable(void __iomem *acp_mmio,
> u16 cap_channel)
>  {
> u32 val, ch_reg, imr_reg;
> +   unsigned long flags;
>
> switch (cap_channel) {
> case CAP_CHANNEL1:
> @@ -411,11 +431,15 @@ static void acp_dma_cap_channel_disable(void __iomem 
> *acp_mmio,
> ch_reg = mmACP_I2SMICSP_RER0;
> break;
> }
> +   spin_lock_irqsave(, flags);
> +
> val = acp_reg_read(acp_mmio, imr_reg);
> val |= ACP_I2SMICSP_IMR1__I2SMICSP_RXDAM_MASK;
> val |= 

Re: [PATCH] ASoC: AMD: Fix race condition between register access

2018-10-29 Thread Daniel Kurtz
Hi Akshu,


On Mon, Oct 29, 2018 at 1:39 AM Agrawal, Akshu  wrote:
>
> During simultaneous running of playback and capture, we
> got hit by incorrect value write on common register. This was due
> to race condition between 2 streams.
> Fixing this by locking the common register access.

Nice catch!  It looks looks like one of the operations you are trying
to make atomic is that two step Addr + Data register update.
If so, then I recommend refactoring a bit, and just doing that locked
2-step-access in its own helper function, like this:

  static void acp_reg_write_srbm_targ(void __iomem *acp_mmio, u32
addr, u32 data)
  {
unsigned long flags;

spin_lock_irqsave(, flags);
acp_reg_write(addr, acp_mmio, mmACP_SRBM_Targ_Idx_Addr);
acp_reg_write(data, acp_mmio, mmACP_SRBM_Targ_Idx_Data);
spin_unlock_irqrestore(, flags);
  }

And similarly, you can add 2 more locking helpers, one for modifying
the imr/ch and another for mmACP_I2S_16BIT_RESOLUTION_EN.

>
> Signed-off-by: Akshu Agrawal 
> ---
>  sound/soc/amd/acp-pcm-dma.c | 29 +
>  1 file changed, 29 insertions(+)
>
> diff --git a/sound/soc/amd/acp-pcm-dma.c b/sound/soc/amd/acp-pcm-dma.c
> index 0ac4b5b..993a7db 100644
> --- a/sound/soc/amd/acp-pcm-dma.c
> +++ b/sound/soc/amd/acp-pcm-dma.c
> @@ -121,6 +121,9 @@
> .periods_max = CAPTURE_MAX_NUM_PERIODS,
>  };
>
> +/* Lock to protect access to registers */
> +static DEFINE_SPINLOCK(lock);
> +
>  static u32 acp_reg_read(void __iomem *acp_mmio, u32 reg)
>  {
> return readl(acp_mmio + (reg * 4));
> @@ -168,9 +171,12 @@ static void config_dma_descriptor_in_sram(void __iomem 
> *acp_mmio,
>   acp_dma_dscr_transfer_t *descr_info)
>  {
> u32 sram_offset;
> +   unsigned long flags;
>
> sram_offset = (descr_idx * sizeof(acp_dma_dscr_transfer_t));
>
> +   spin_lock_irqsave(, flags);
> +
> /* program the source base address. */
> acp_reg_write(sram_offset, acp_mmio, mmACP_SRBM_Targ_Idx_Addr);
> acp_reg_write(descr_info->src,  acp_mmio, mmACP_SRBM_Targ_Idx_Data);
> @@ -181,6 +187,8 @@ static void config_dma_descriptor_in_sram(void __iomem 
> *acp_mmio,
> /* program the number of bytes to be transferred for this descriptor. 
> */
> acp_reg_write(sram_offset + 8,  acp_mmio, mmACP_SRBM_Targ_Idx_Addr);
> acp_reg_write(descr_info->xfer_val, acp_mmio, 
> mmACP_SRBM_Targ_Idx_Data);
> +
> +   spin_unlock_irqrestore(, flags);
>  }
>
>  static void pre_config_reset(void __iomem *acp_mmio, u16 ch_num)
> @@ -309,8 +317,12 @@ static void acp_pte_config(void __iomem *acp_mmio, 
> struct page *pg,
> u32 low;
> u32 high;
> u32 offset;
> +   unsigned long flags;
>
> offset  = ACP_DAGB_GRP_SRBM_SRAM_BASE_OFFSET + (pte_offset * 8);
> +
> +   spin_lock_irqsave(, flags);
> +
> for (page_idx = 0; page_idx < (num_of_pages); page_idx++) {
> /* Load the low address of page int ACP SRAM through SRBM */
> acp_reg_write((offset + (page_idx * 8)),
> @@ -333,6 +345,8 @@ static void acp_pte_config(void __iomem *acp_mmio, struct 
> page *pg,
> /* Move to next physically contiguos page */
> pg++;
> }
> +
> +   spin_unlock_irqrestore(, flags);
>  }
>
>  static void config_acp_dma(void __iomem *acp_mmio,
> @@ -367,6 +381,7 @@ static void acp_dma_cap_channel_enable(void __iomem 
> *acp_mmio,
>u16 cap_channel)
>  {
> u32 val, ch_reg, imr_reg, res_reg;
> +   unsigned long flags;
>
> switch (cap_channel) {
> case CAP_CHANNEL1:
> @@ -381,6 +396,8 @@ static void acp_dma_cap_channel_enable(void __iomem 
> *acp_mmio,
> imr_reg = mmACP_I2SMICSP_IMR0;
> break;
> }
> +   spin_lock_irqsave(, flags);
> +
> val = acp_reg_read(acp_mmio,
>mmACP_I2S_16BIT_RESOLUTION_EN);
> if (val & ACP_I2S_MIC_16BIT_RESOLUTION_EN) {
> @@ -393,12 +410,15 @@ static void acp_dma_cap_channel_enable(void __iomem 
> *acp_mmio,
> val &= ~ACP_I2SMICSP_IMR1__I2SMICSP_RXFOM_MASK;
> acp_reg_write(val, acp_mmio, imr_reg);
> acp_reg_write(0x1, acp_mmio, ch_reg);
> +
> +   spin_unlock_irqrestore(, flags);
>  }
>
>  static void acp_dma_cap_channel_disable(void __iomem *acp_mmio,
> u16 cap_channel)
>  {
> u32 val, ch_reg, imr_reg;
> +   unsigned long flags;
>
> switch (cap_channel) {
> case CAP_CHANNEL1:
> @@ -411,11 +431,15 @@ static void acp_dma_cap_channel_disable(void __iomem 
> *acp_mmio,
> ch_reg = mmACP_I2SMICSP_RER0;
> break;
> }
> +   spin_lock_irqsave(, flags);
> +
> val = acp_reg_read(acp_mmio, imr_reg);
> val |= ACP_I2SMICSP_IMR1__I2SMICSP_RXDAM_MASK;
> val |= 

Re: [PATCH v3] mm/page_owner: use kvmalloc instead of kmalloc

2018-10-29 Thread Miles Chen
On Mon, 2018-10-29 at 09:17 +0100, Michal Hocko wrote:
> On Mon 29-10-18 09:07:08, Michal Hocko wrote:
> [...]
> > Besides that, the following doesn't make much sense to me. It simply
> > makes no sense to use vmalloc for sub page allocation regardless of
> > HIGHMEM.
> 
> OK, it is still early morning here. Now I get the point of the patch.
> You just want to (ab)use highmeme for smaller requests. I do not like
> this, to be honest. It causes an internal fragmentation and more
> importantly the VMALLOC space on 32b where HIGHMEM is enabled (do we
> have any 64b with HIGHMEM btw?) is quite small to be wasted like that.
> 
thanks for your comment. It looks like that using vmalloc fallback for
sub page allocation is not good here.

Your comment gave another idea:

1. force kbuf to PAGE_SIZE
2. allocate a page by alloc_page(GFP_KERNEL | __GFP_HIGHMEM); so we can
get a highmem page if possible
3. use kmap/kunmap pair to create mapping for this page. No vmalloc
space is used.
4. do not change kvmalloc logic.


> In any case such a changes should come with some numbers and as a
> separate patch for sure.
> 
> > > diff --git a/mm/util.c b/mm/util.c
> > > index 8bf08b5b5760..7b1c59b9bfbf 100644
> > > --- a/mm/util.c
> > > +++ b/mm/util.c
> > > @@ -416,10 +416,10 @@ void *kvmalloc_node(size_t size, gfp_t flags, int 
> > > node)
> > >   ret = kmalloc_node(size, kmalloc_flags, node);
> > >  
> > >   /*
> > > -  * It doesn't really make sense to fallback to vmalloc for sub page
> > > -  * requests
> > > +  * It only makes sense to fallback to vmalloc for sub page
> > > +  * requests if we might be able to allocate highmem pages.
> > >*/
> > > - if (ret || size <= PAGE_SIZE)
> > > + if (ret || (!IS_ENABLED(CONFIG_HIGHMEM) && size <= PAGE_SIZE))
> > >   return ret;
> > >  
> > >   return __vmalloc_node_flags_caller(size, node, flags,
> > > -- 
> > > 2.18.0
> > > 
> > 
> > -- 
> > Michal Hocko
> > SUSE Labs
> 




Re: [PATCH v3] mm/page_owner: use kvmalloc instead of kmalloc

2018-10-29 Thread Miles Chen
On Mon, 2018-10-29 at 09:17 +0100, Michal Hocko wrote:
> On Mon 29-10-18 09:07:08, Michal Hocko wrote:
> [...]
> > Besides that, the following doesn't make much sense to me. It simply
> > makes no sense to use vmalloc for sub page allocation regardless of
> > HIGHMEM.
> 
> OK, it is still early morning here. Now I get the point of the patch.
> You just want to (ab)use highmeme for smaller requests. I do not like
> this, to be honest. It causes an internal fragmentation and more
> importantly the VMALLOC space on 32b where HIGHMEM is enabled (do we
> have any 64b with HIGHMEM btw?) is quite small to be wasted like that.
> 
thanks for your comment. It looks like that using vmalloc fallback for
sub page allocation is not good here.

Your comment gave another idea:

1. force kbuf to PAGE_SIZE
2. allocate a page by alloc_page(GFP_KERNEL | __GFP_HIGHMEM); so we can
get a highmem page if possible
3. use kmap/kunmap pair to create mapping for this page. No vmalloc
space is used.
4. do not change kvmalloc logic.


> In any case such a changes should come with some numbers and as a
> separate patch for sure.
> 
> > > diff --git a/mm/util.c b/mm/util.c
> > > index 8bf08b5b5760..7b1c59b9bfbf 100644
> > > --- a/mm/util.c
> > > +++ b/mm/util.c
> > > @@ -416,10 +416,10 @@ void *kvmalloc_node(size_t size, gfp_t flags, int 
> > > node)
> > >   ret = kmalloc_node(size, kmalloc_flags, node);
> > >  
> > >   /*
> > > -  * It doesn't really make sense to fallback to vmalloc for sub page
> > > -  * requests
> > > +  * It only makes sense to fallback to vmalloc for sub page
> > > +  * requests if we might be able to allocate highmem pages.
> > >*/
> > > - if (ret || size <= PAGE_SIZE)
> > > + if (ret || (!IS_ENABLED(CONFIG_HIGHMEM) && size <= PAGE_SIZE))
> > >   return ret;
> > >  
> > >   return __vmalloc_node_flags_caller(size, node, flags,
> > > -- 
> > > 2.18.0
> > > 
> > 
> > -- 
> > Michal Hocko
> > SUSE Labs
> 




linux-next: manual merge of the vfs tree with Linus' tree

2018-10-29 Thread Stephen Rothwell
Hi Al,

Today's linux-next merge of the vfs tree got a conflict in:

  fs/compat_ioctl.c

between commit:

  77654350306a ("take compat TIOC[SG]SERIAL treatment into tty_compat_ioctl()")

from Linus' tree and commit:

  69374d063be0 ("compat_ioctl: remove pointless HCI... ioctls")

from the vfs tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc fs/compat_ioctl.c
index 6e30949d9f77,326ceab5246a..
--- a/fs/compat_ioctl.c
+++ b/fs/compat_ioctl.c
@@@ -429,13 -499,68 +429,6 @@@ static int mt_ioctl_trans(struct file *
  
  #endif /* CONFIG_BLOCK */
  
- /* Bluetooth ioctls */
- #define HCIUARTSETPROTO   _IOW('U', 200, int)
- #define HCIUARTGETPROTO   _IOR('U', 201, int)
- #define HCIUARTGETDEVICE  _IOR('U', 202, int)
- #define HCIUARTSETFLAGS   _IOW('U', 203, int)
- #define HCIUARTGETFLAGS   _IOR('U', 204, int)
 -struct serial_struct32 {
 -compat_int_ttype;
 -compat_int_tline;
 -compat_uint_t   port;
 -compat_int_tirq;
 -compat_int_tflags;
 -compat_int_txmit_fifo_size;
 -compat_int_tcustom_divisor;
 -compat_int_tbaud_base;
 -unsigned short  close_delay;
 -chario_type;
 -charreserved_char[1];
 -compat_int_thub6;
 -unsigned short  closing_wait; /* time to wait before closing */
 -unsigned short  closing_wait2; /* no longer used... */
 -compat_uint_t   iomem_base;
 -unsigned short  iomem_reg_shift;
 -unsigned intport_high;
 - /* compat_ulong_t  iomap_base FIXME */
 -compat_int_treserved[1];
 -};
 -
 -static int serial_struct_ioctl(struct file *file,
 -  unsigned cmd, struct serial_struct32 __user *ss32)
 -{
 -typedef struct serial_struct32 SS32;
 -int err;
 -  struct serial_struct __user *ss = compat_alloc_user_space(sizeof(*ss));
 -__u32 udata;
 -  unsigned int base;
 -  unsigned char *iomem_base;
 -
 -  if (ss == NULL)
 -  return -EFAULT;
 -if (cmd == TIOCSSERIAL) {
 -  if (copy_in_user(ss, ss32, offsetof(SS32, iomem_base)) ||
 -  get_user(udata, >iomem_base))
 -  return -EFAULT;
 -  iomem_base = compat_ptr(udata);
 -  if (put_user(iomem_base, >iomem_base) ||
 -  convert_in_user(>iomem_reg_shift,
 ->iomem_reg_shift) ||
 -  convert_in_user(>port_high, >port_high) ||
 -  put_user(0UL, >iomap_base))
 -  return -EFAULT;
 -}
 -  err = do_ioctl(file, cmd, (unsigned long)ss);
 -if (cmd == TIOCGSERIAL && err >= 0) {
 -  if (copy_in_user(ss32, ss, offsetof(SS32, iomem_base)) ||
 -  get_user(iomem_base, >iomem_base))
 -  return -EFAULT;
 -  base = (unsigned long)iomem_base  >> 32 ?
 -  0x : (unsigned)(unsigned long)iomem_base;
 -  if (put_user(base, >iomem_base) ||
 -  convert_in_user(>iomem_reg_shift,
 ->iomem_reg_shift) ||
 -  convert_in_user(>port_high, >port_high))
 -  return -EFAULT;
 -}
 -return err;
 -}
--
  #define RTC_IRQP_READ32   _IOR('p', 0x0b, compat_ulong_t)
  #define RTC_IRQP_SET32_IOW('p', 0x0c, compat_ulong_t)
  #define RTC_EPOCH_READ32  _IOR('p', 0x0d, compat_ulong_t)


pgpemguebeHG1.pgp
Description: OpenPGP digital signature


linux-next: manual merge of the vfs tree with Linus' tree

2018-10-29 Thread Stephen Rothwell
Hi Al,

Today's linux-next merge of the vfs tree got a conflict in:

  fs/compat_ioctl.c

between commit:

  77654350306a ("take compat TIOC[SG]SERIAL treatment into tty_compat_ioctl()")

from Linus' tree and commit:

  69374d063be0 ("compat_ioctl: remove pointless HCI... ioctls")

from the vfs tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc fs/compat_ioctl.c
index 6e30949d9f77,326ceab5246a..
--- a/fs/compat_ioctl.c
+++ b/fs/compat_ioctl.c
@@@ -429,13 -499,68 +429,6 @@@ static int mt_ioctl_trans(struct file *
  
  #endif /* CONFIG_BLOCK */
  
- /* Bluetooth ioctls */
- #define HCIUARTSETPROTO   _IOW('U', 200, int)
- #define HCIUARTGETPROTO   _IOR('U', 201, int)
- #define HCIUARTGETDEVICE  _IOR('U', 202, int)
- #define HCIUARTSETFLAGS   _IOW('U', 203, int)
- #define HCIUARTGETFLAGS   _IOR('U', 204, int)
 -struct serial_struct32 {
 -compat_int_ttype;
 -compat_int_tline;
 -compat_uint_t   port;
 -compat_int_tirq;
 -compat_int_tflags;
 -compat_int_txmit_fifo_size;
 -compat_int_tcustom_divisor;
 -compat_int_tbaud_base;
 -unsigned short  close_delay;
 -chario_type;
 -charreserved_char[1];
 -compat_int_thub6;
 -unsigned short  closing_wait; /* time to wait before closing */
 -unsigned short  closing_wait2; /* no longer used... */
 -compat_uint_t   iomem_base;
 -unsigned short  iomem_reg_shift;
 -unsigned intport_high;
 - /* compat_ulong_t  iomap_base FIXME */
 -compat_int_treserved[1];
 -};
 -
 -static int serial_struct_ioctl(struct file *file,
 -  unsigned cmd, struct serial_struct32 __user *ss32)
 -{
 -typedef struct serial_struct32 SS32;
 -int err;
 -  struct serial_struct __user *ss = compat_alloc_user_space(sizeof(*ss));
 -__u32 udata;
 -  unsigned int base;
 -  unsigned char *iomem_base;
 -
 -  if (ss == NULL)
 -  return -EFAULT;
 -if (cmd == TIOCSSERIAL) {
 -  if (copy_in_user(ss, ss32, offsetof(SS32, iomem_base)) ||
 -  get_user(udata, >iomem_base))
 -  return -EFAULT;
 -  iomem_base = compat_ptr(udata);
 -  if (put_user(iomem_base, >iomem_base) ||
 -  convert_in_user(>iomem_reg_shift,
 ->iomem_reg_shift) ||
 -  convert_in_user(>port_high, >port_high) ||
 -  put_user(0UL, >iomap_base))
 -  return -EFAULT;
 -}
 -  err = do_ioctl(file, cmd, (unsigned long)ss);
 -if (cmd == TIOCGSERIAL && err >= 0) {
 -  if (copy_in_user(ss32, ss, offsetof(SS32, iomem_base)) ||
 -  get_user(iomem_base, >iomem_base))
 -  return -EFAULT;
 -  base = (unsigned long)iomem_base  >> 32 ?
 -  0x : (unsigned)(unsigned long)iomem_base;
 -  if (put_user(base, >iomem_base) ||
 -  convert_in_user(>iomem_reg_shift,
 ->iomem_reg_shift) ||
 -  convert_in_user(>port_high, >port_high))
 -  return -EFAULT;
 -}
 -return err;
 -}
--
  #define RTC_IRQP_READ32   _IOR('p', 0x0b, compat_ulong_t)
  #define RTC_IRQP_SET32_IOW('p', 0x0c, compat_ulong_t)
  #define RTC_EPOCH_READ32  _IOR('p', 0x0d, compat_ulong_t)


pgpemguebeHG1.pgp
Description: OpenPGP digital signature


make[2]: *** No rule to make target 'arch/xtensa/boot/dts/csp.dtb', needed by '__build'.

2018-10-29 Thread kbuild test robot
Hi Rob,

FYI, the error/warning still remains.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
master
head:   4b42745211af552f170f38a1b97f4a112b5da6b2
commit: 37c8a5fafa3bb7dcdd51774be353be6cb2912b86 kbuild: consolidate Devicetree 
dtb build rules
date:   4 weeks ago
config: xtensa-common_defconfig (attached as .config)
compiler: xtensa-linux-gcc (GCC) 8.1.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
git checkout 37c8a5fafa3bb7dcdd51774be353be6cb2912b86
# save the attached .config to linux build tree
GCC_VERSION=8.1.0 make.cross ARCH=xtensa 

All errors (new ones prefixed by >>):

>> make[2]: *** No rule to make target 'arch/xtensa/boot/dts/csp.dtb', needed 
>> by '__build'.
>> make[2]: *** No rule to make target 'arch/xtensa/boot/dts/lx200mx.dtb', 
>> needed by '__build'.
>> make[2]: *** No rule to make target 'arch/xtensa/boot/dts/ml605.dtb', needed 
>> by '__build'.
>> make[2]: *** No rule to make target 'arch/xtensa/boot/dts/kc705_nommu.dtb', 
>> needed by '__build'.
>> make[2]: *** No rule to make target 'arch/xtensa/boot/dts/kc705.dtb', needed 
>> by '__build'.
>> make[2]: *** No rule to make target 'arch/xtensa/boot/dts/lx60.dtb', needed 
>> by '__build'.
   make[2]: Target '__build' not remade because of errors.

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


make[2]: *** No rule to make target 'arch/xtensa/boot/dts/csp.dtb', needed by '__build'.

2018-10-29 Thread kbuild test robot
Hi Rob,

FYI, the error/warning still remains.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
master
head:   4b42745211af552f170f38a1b97f4a112b5da6b2
commit: 37c8a5fafa3bb7dcdd51774be353be6cb2912b86 kbuild: consolidate Devicetree 
dtb build rules
date:   4 weeks ago
config: xtensa-common_defconfig (attached as .config)
compiler: xtensa-linux-gcc (GCC) 8.1.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
git checkout 37c8a5fafa3bb7dcdd51774be353be6cb2912b86
# save the attached .config to linux build tree
GCC_VERSION=8.1.0 make.cross ARCH=xtensa 

All errors (new ones prefixed by >>):

>> make[2]: *** No rule to make target 'arch/xtensa/boot/dts/csp.dtb', needed 
>> by '__build'.
>> make[2]: *** No rule to make target 'arch/xtensa/boot/dts/lx200mx.dtb', 
>> needed by '__build'.
>> make[2]: *** No rule to make target 'arch/xtensa/boot/dts/ml605.dtb', needed 
>> by '__build'.
>> make[2]: *** No rule to make target 'arch/xtensa/boot/dts/kc705_nommu.dtb', 
>> needed by '__build'.
>> make[2]: *** No rule to make target 'arch/xtensa/boot/dts/kc705.dtb', needed 
>> by '__build'.
>> make[2]: *** No rule to make target 'arch/xtensa/boot/dts/lx60.dtb', needed 
>> by '__build'.
   make[2]: Target '__build' not remade because of errors.

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


RTL8812au driver source code A little help please?

2018-10-29 Thread Nathaniel Russell
make[1]: Entering directory '/usr/src/linux-4.19'
  CC [M]  /tmp/rtl8812AU_8821AU_linux/os_dep/linux/os_intfs.o
/tmp/rtl8812AU_8821AU_linux/os_dep/linux/os_intfs.c:816:22: error:
initialization of ‘u16 (*)(struct net_device *, struct sk_buff *,
struct net_device *, u16 (*)(struct net_device *, struct sk_buff *,
struct net_device *))’ {aka ‘short unsigned int (*)(struct net_device
*, struct sk_buff *, struct net_device *, short unsigned int
(*)(struct net_device *, struct sk_buff *, struct net_device *))’}
from incompatible pointer type ‘u16 (*)(struct net_device *, struct
sk_buff *, void *, u16 (*)(struct net_device *, struct sk_buff *,
struct net_device *))’ {aka ‘short unsigned int (*)(struct net_device
*, struct sk_buff *, void *, short unsigned int (*)(struct net_device
*, struct sk_buff *, struct net_device *))’}
[-Werror=incompatible-pointer-types]
  .ndo_select_queue = rtw_select_queue,
  ^~~~
/tmp/rtl8812AU_8821AU_linux/os_dep/linux/os_intfs.c:816:22: note:
(near initialization for ‘rtw_netdev_ops.ndo_select_queue’)
cc1: some warnings being treated as errors
make[2]: *** [scripts/Makefile.build:306:
/tmp/rtl8812AU_8821AU_linux/os_dep/linux/os_intfs.o] Error 1
make[1]: *** [Makefile:1517: _module_/tmp/rtl8812AU_8821AU_linux] Error 2
make[1]: Leaving directory '/usr/src/linux-4.19'
make: *** [Makefile:1584: modules] Error 2

If anyone could help me with this issue it would be greatly appreciated.


RTL8812au driver source code A little help please?

2018-10-29 Thread Nathaniel Russell
make[1]: Entering directory '/usr/src/linux-4.19'
  CC [M]  /tmp/rtl8812AU_8821AU_linux/os_dep/linux/os_intfs.o
/tmp/rtl8812AU_8821AU_linux/os_dep/linux/os_intfs.c:816:22: error:
initialization of ‘u16 (*)(struct net_device *, struct sk_buff *,
struct net_device *, u16 (*)(struct net_device *, struct sk_buff *,
struct net_device *))’ {aka ‘short unsigned int (*)(struct net_device
*, struct sk_buff *, struct net_device *, short unsigned int
(*)(struct net_device *, struct sk_buff *, struct net_device *))’}
from incompatible pointer type ‘u16 (*)(struct net_device *, struct
sk_buff *, void *, u16 (*)(struct net_device *, struct sk_buff *,
struct net_device *))’ {aka ‘short unsigned int (*)(struct net_device
*, struct sk_buff *, void *, short unsigned int (*)(struct net_device
*, struct sk_buff *, struct net_device *))’}
[-Werror=incompatible-pointer-types]
  .ndo_select_queue = rtw_select_queue,
  ^~~~
/tmp/rtl8812AU_8821AU_linux/os_dep/linux/os_intfs.c:816:22: note:
(near initialization for ‘rtw_netdev_ops.ndo_select_queue’)
cc1: some warnings being treated as errors
make[2]: *** [scripts/Makefile.build:306:
/tmp/rtl8812AU_8821AU_linux/os_dep/linux/os_intfs.o] Error 1
make[1]: *** [Makefile:1517: _module_/tmp/rtl8812AU_8821AU_linux] Error 2
make[1]: Leaving directory '/usr/src/linux-4.19'
make: *** [Makefile:1584: modules] Error 2

If anyone could help me with this issue it would be greatly appreciated.


Re: [PATCH] kretprobe: produce sane stack traces

2018-10-29 Thread Masami Hiramatsu
Hi Aleksa,

On Sat, 27 Oct 2018 00:22:10 +1100
Aleksa Sarai  wrote:

> Historically, kretprobe has always produced unusable stack traces
> (kretprobe_trampoline is the only entry in most cases, because of the
> funky stack pointer overwriting). This has caused quite a few annoyances
> when using tracing to debug problems[1] -- since return values are only
> available with kretprobes but stack traces were only usable for kprobes,
> users had to probe both and then manually associate them.

Yes, this unfortunately still happens. I once tried to fix it by
replacing current "kretprobe instance" with graph-tracer's per-thread
return stack. (https://lkml.org/lkml/2017/8/21/553)

I still believe that direction is the best solution to solve this kind
of issues, otherwise, we have to have 2 different stack fixups for
kretprobe and ftrace graph tracer. (I will have a talk with Steve at
plumbers next month)

Anyway, until that merge happens, this patch looks good to avoid
this issue for generic solution (e.g. for the arch which doesn't
supports retstack).


> 
> With the advent of bpf_trace, users would have been able to do this
> association in bpf, but this was less than ideal (because
> bpf_get_stackid would still produce rubbish and programs that didn't
> know better would get silly results). The main usecase for stack traces
> (at least with bpf_trace) is for DTrace-style aggregation on stack
> traces (both entry and exit). Therefore we cannot simply correct the
> stack trace on exit -- we must stash away the stack trace and return the
> entry stack trace when it is requested.
> 
> In theory, patches like commit 76094a2cf46e ("ftrace: distinguish
> kretprobe'd functions in trace logs") are no longer necessary *for
> tracing* because now all kretprobe traces should produce sane stack
> traces. However it's not clear whether removing them completely is
> reasonable.

Then, let's try to revert it :)

BTW, could you also add a test case for ftrace too?
also, I have some comments below.

> 
> [1]: https://github.com/iovisor/bpftrace/issues/101
> 
> Cc: Brendan Gregg 
> Cc: Christian Brauner 
> Signed-off-by: Aleksa Sarai 
> ---
>  include/linux/kprobes.h   |  15 ++
>  kernel/events/callchain.c |   8 ++-
>  kernel/kprobes.c  | 108 +-
>  kernel/trace/trace.c  |  11 +++-
>  4 files changed, 138 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
> index e909413e4e38..8a4f78a0c990 100644
> --- a/include/linux/kprobes.h
> +++ b/include/linux/kprobes.h
> @@ -40,6 +40,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  #include 
>  
>  #ifdef CONFIG_KPROBES
> @@ -168,11 +170,18 @@ struct kretprobe {
>   raw_spinlock_t lock;
>  };
>  
> +#define KRETPROBE_TRACE_SIZE 1024
> +struct kretprobe_trace {
> + int nr_entries;
> + unsigned long entries[KRETPROBE_TRACE_SIZE];
> +};

Hmm, do we really need all entries? It takes 8KB for each instances.
Note that the number of instances can be big if the system core number
is larger.

> +
>  struct kretprobe_instance {
>   struct hlist_node hlist;
>   struct kretprobe *rp;
>   kprobe_opcode_t *ret_addr;
>   struct task_struct *task;
> + struct kretprobe_trace entry;
>   char data[0];
>  };
>  
> @@ -371,6 +380,12 @@ void unregister_kretprobe(struct kretprobe *rp);
>  int register_kretprobes(struct kretprobe **rps, int num);
>  void unregister_kretprobes(struct kretprobe **rps, int num);
>  
> +struct kretprobe_instance *current_kretprobe_instance(void);
> +void kretprobe_save_stack_trace(struct kretprobe_instance *ri,
> + struct stack_trace *trace);
> +void kretprobe_perf_callchain_kernel(struct kretprobe_instance *ri,
> +  struct perf_callchain_entry_ctx *ctx);
> +
>  void kprobe_flush_task(struct task_struct *tk);
>  void recycle_rp_inst(struct kretprobe_instance *ri, struct hlist_head *head);
>  
> diff --git a/kernel/events/callchain.c b/kernel/events/callchain.c
> index 24a77c34e9ad..98edcd8a6987 100644
> --- a/kernel/events/callchain.c
> +++ b/kernel/events/callchain.c
> @@ -12,6 +12,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "internal.h"
>  
> @@ -197,9 +198,14 @@ get_perf_callchain(struct pt_regs *regs, u32 init_nr, 
> bool kernel, bool user,
>   ctx.contexts_maxed = false;
>  
>   if (kernel && !user_mode(regs)) {
> + struct kretprobe_instance *ri = current_kretprobe_instance();
> +
>   if (add_mark)
>   perf_callchain_store_context(, PERF_CONTEXT_KERNEL);
> - perf_callchain_kernel(, regs);
> + if (ri)
> + kretprobe_perf_callchain_kernel(ri, );
> + else
> + perf_callchain_kernel(, regs);
>   }
>  
>   if (user) {
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index 90e98e233647..a0776a7d1244 100644

Re: [PATCH] kretprobe: produce sane stack traces

2018-10-29 Thread Masami Hiramatsu
Hi Aleksa,

On Sat, 27 Oct 2018 00:22:10 +1100
Aleksa Sarai  wrote:

> Historically, kretprobe has always produced unusable stack traces
> (kretprobe_trampoline is the only entry in most cases, because of the
> funky stack pointer overwriting). This has caused quite a few annoyances
> when using tracing to debug problems[1] -- since return values are only
> available with kretprobes but stack traces were only usable for kprobes,
> users had to probe both and then manually associate them.

Yes, this unfortunately still happens. I once tried to fix it by
replacing current "kretprobe instance" with graph-tracer's per-thread
return stack. (https://lkml.org/lkml/2017/8/21/553)

I still believe that direction is the best solution to solve this kind
of issues, otherwise, we have to have 2 different stack fixups for
kretprobe and ftrace graph tracer. (I will have a talk with Steve at
plumbers next month)

Anyway, until that merge happens, this patch looks good to avoid
this issue for generic solution (e.g. for the arch which doesn't
supports retstack).


> 
> With the advent of bpf_trace, users would have been able to do this
> association in bpf, but this was less than ideal (because
> bpf_get_stackid would still produce rubbish and programs that didn't
> know better would get silly results). The main usecase for stack traces
> (at least with bpf_trace) is for DTrace-style aggregation on stack
> traces (both entry and exit). Therefore we cannot simply correct the
> stack trace on exit -- we must stash away the stack trace and return the
> entry stack trace when it is requested.
> 
> In theory, patches like commit 76094a2cf46e ("ftrace: distinguish
> kretprobe'd functions in trace logs") are no longer necessary *for
> tracing* because now all kretprobe traces should produce sane stack
> traces. However it's not clear whether removing them completely is
> reasonable.

Then, let's try to revert it :)

BTW, could you also add a test case for ftrace too?
also, I have some comments below.

> 
> [1]: https://github.com/iovisor/bpftrace/issues/101
> 
> Cc: Brendan Gregg 
> Cc: Christian Brauner 
> Signed-off-by: Aleksa Sarai 
> ---
>  include/linux/kprobes.h   |  15 ++
>  kernel/events/callchain.c |   8 ++-
>  kernel/kprobes.c  | 108 +-
>  kernel/trace/trace.c  |  11 +++-
>  4 files changed, 138 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
> index e909413e4e38..8a4f78a0c990 100644
> --- a/include/linux/kprobes.h
> +++ b/include/linux/kprobes.h
> @@ -40,6 +40,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  #include 
>  
>  #ifdef CONFIG_KPROBES
> @@ -168,11 +170,18 @@ struct kretprobe {
>   raw_spinlock_t lock;
>  };
>  
> +#define KRETPROBE_TRACE_SIZE 1024
> +struct kretprobe_trace {
> + int nr_entries;
> + unsigned long entries[KRETPROBE_TRACE_SIZE];
> +};

Hmm, do we really need all entries? It takes 8KB for each instances.
Note that the number of instances can be big if the system core number
is larger.

> +
>  struct kretprobe_instance {
>   struct hlist_node hlist;
>   struct kretprobe *rp;
>   kprobe_opcode_t *ret_addr;
>   struct task_struct *task;
> + struct kretprobe_trace entry;
>   char data[0];
>  };
>  
> @@ -371,6 +380,12 @@ void unregister_kretprobe(struct kretprobe *rp);
>  int register_kretprobes(struct kretprobe **rps, int num);
>  void unregister_kretprobes(struct kretprobe **rps, int num);
>  
> +struct kretprobe_instance *current_kretprobe_instance(void);
> +void kretprobe_save_stack_trace(struct kretprobe_instance *ri,
> + struct stack_trace *trace);
> +void kretprobe_perf_callchain_kernel(struct kretprobe_instance *ri,
> +  struct perf_callchain_entry_ctx *ctx);
> +
>  void kprobe_flush_task(struct task_struct *tk);
>  void recycle_rp_inst(struct kretprobe_instance *ri, struct hlist_head *head);
>  
> diff --git a/kernel/events/callchain.c b/kernel/events/callchain.c
> index 24a77c34e9ad..98edcd8a6987 100644
> --- a/kernel/events/callchain.c
> +++ b/kernel/events/callchain.c
> @@ -12,6 +12,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "internal.h"
>  
> @@ -197,9 +198,14 @@ get_perf_callchain(struct pt_regs *regs, u32 init_nr, 
> bool kernel, bool user,
>   ctx.contexts_maxed = false;
>  
>   if (kernel && !user_mode(regs)) {
> + struct kretprobe_instance *ri = current_kretprobe_instance();
> +
>   if (add_mark)
>   perf_callchain_store_context(, PERF_CONTEXT_KERNEL);
> - perf_callchain_kernel(, regs);
> + if (ri)
> + kretprobe_perf_callchain_kernel(ri, );
> + else
> + perf_callchain_kernel(, regs);
>   }
>  
>   if (user) {
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index 90e98e233647..a0776a7d1244 100644

[PATCH v3] thermal: qoriq: add multiple sensors support

2018-10-29 Thread andy . tang
From: Yuantian Tang 

The QorIQ Layerscape SoC has several thermal sensors but the current
driver only supports one.

Massage the code to be sensor oriented and allow the support for
multiple sensors.

Signed-off-by: Yuantian Tang 
Reviewed-by: Daniel Lezcano 
---
v3:
  - add Reviewed-by
v2:
  - update the commit message
  - refine the qoriq_tmu_register_tmu_zone()

 drivers/thermal/qoriq_thermal.c |  100 ++-
 1 files changed, 46 insertions(+), 54 deletions(-)

diff --git a/drivers/thermal/qoriq_thermal.c b/drivers/thermal/qoriq_thermal.c
index 450ed66..8beb344 100644
--- a/drivers/thermal/qoriq_thermal.c
+++ b/drivers/thermal/qoriq_thermal.c
@@ -59,14 +59,21 @@ struct qoriq_tmu_regs {
u32 ttr3cr; /* Temperature Range 3 Control Register */
 };
 
+struct qoriq_tmu_data;
+
 /*
  * Thermal zone data
  */
+struct qoriq_sensor {
+   struct thermal_zone_device  *tzd;
+   struct qoriq_tmu_data   *qdata;
+   int id;
+};
+
 struct qoriq_tmu_data {
-   struct thermal_zone_device *tz;
struct qoriq_tmu_regs __iomem *regs;
-   int sensor_id;
bool little_endian;
+   struct qoriq_sensor *sensor[SITES_MAX];
 };
 
 static void tmu_write(struct qoriq_tmu_data *p, u32 val, void __iomem *addr)
@@ -87,48 +94,51 @@ static u32 tmu_read(struct qoriq_tmu_data *p, void __iomem 
*addr)
 
 static int tmu_get_temp(void *p, int *temp)
 {
+   struct qoriq_sensor *qsensor = p;
+   struct qoriq_tmu_data *qdata = qsensor->qdata;
u32 val;
-   struct qoriq_tmu_data *data = p;
 
-   val = tmu_read(data, >regs->site[data->sensor_id].tritsr);
+   val = tmu_read(qdata, >regs->site[qsensor->id].tritsr);
*temp = (val & 0xff) * 1000;
 
return 0;
 }
 
-static int qoriq_tmu_get_sensor_id(void)
+static const struct thermal_zone_of_device_ops tmu_tz_ops = {
+   .get_temp = tmu_get_temp,
+};
+
+static int qoriq_tmu_register_tmu_zone(struct platform_device *pdev)
 {
-   int ret, id;
-   struct of_phandle_args sensor_specs;
-   struct device_node *np, *sensor_np;
+   struct qoriq_tmu_data *qdata = platform_get_drvdata(pdev);
+   int id, sites = 0;
 
-   np = of_find_node_by_name(NULL, "thermal-zones");
-   if (!np)
-   return -ENODEV;
+   for (id = 0; id < SITES_MAX; id++) {
+   qdata->sensor[id] = devm_kzalloc(>dev,
+   sizeof(struct qoriq_sensor), GFP_KERNEL);
+   if (!qdata->sensor[id])
+   return -ENOMEM;
 
-   sensor_np = of_get_next_child(np, NULL);
-   ret = of_parse_phandle_with_args(sensor_np, "thermal-sensors",
-   "#thermal-sensor-cells",
-   0, _specs);
-   if (ret) {
-   of_node_put(np);
-   of_node_put(sensor_np);
-   return ret;
-   }
+   qdata->sensor[id]->id = id;
+   qdata->sensor[id]->qdata = qdata;
 
-   if (sensor_specs.args_count >= 1) {
-   id = sensor_specs.args[0];
-   WARN(sensor_specs.args_count > 1,
-   "%s: too many cells in sensor specifier %d\n",
-   sensor_specs.np->name, sensor_specs.args_count);
-   } else {
-   id = 0;
-   }
+   qdata->sensor[id]->tzd = devm_thermal_zone_of_sensor_register(
+   >dev, id, qdata->sensor[id], _tz_ops);
+   if (IS_ERR(qdata->sensor[id]->tzd)) {
+   if (PTR_ERR(qdata->sensor[id]->tzd) == -ENODEV)
+   continue;
+   else
+   return PTR_ERR(qdata->sensor[id]->tzd);
 
-   of_node_put(np);
-   of_node_put(sensor_np);
+   }
+
+   sites |= 0x1 << (15 - id);
+   }
+   /* Enable monitoring */
+   if (sites != 0)
+   tmu_write(qdata, sites | TMR_ME | TMR_ALPF, >regs->tmr);
 
-   return id;
+   return 0;
 }
 
 static int qoriq_tmu_calibration(struct platform_device *pdev)
@@ -178,16 +188,11 @@ static void qoriq_tmu_init_device(struct qoriq_tmu_data 
*data)
tmu_write(data, TMR_DISABLE, >regs->tmr);
 }
 
-static const struct thermal_zone_of_device_ops tmu_tz_ops = {
-   .get_temp = tmu_get_temp,
-};
-
 static int qoriq_tmu_probe(struct platform_device *pdev)
 {
int ret;
struct qoriq_tmu_data *data;
struct device_node *np = pdev->dev.of_node;
-   u32 site;
 
if (!np) {
dev_err(>dev, "Device OF-Node is NULL");
@@ -203,13 +208,6 @@ static int qoriq_tmu_probe(struct platform_device *pdev)
 
data->little_endian = of_property_read_bool(np, "little-endian");
 
-   data->sensor_id = qoriq_tmu_get_sensor_id();
-   if (data->sensor_id < 0) {
-   dev_err(>dev, "Failed to get sensor id\n");
- 

[PATCH v3] thermal: qoriq: add multiple sensors support

2018-10-29 Thread andy . tang
From: Yuantian Tang 

The QorIQ Layerscape SoC has several thermal sensors but the current
driver only supports one.

Massage the code to be sensor oriented and allow the support for
multiple sensors.

Signed-off-by: Yuantian Tang 
Reviewed-by: Daniel Lezcano 
---
v3:
  - add Reviewed-by
v2:
  - update the commit message
  - refine the qoriq_tmu_register_tmu_zone()

 drivers/thermal/qoriq_thermal.c |  100 ++-
 1 files changed, 46 insertions(+), 54 deletions(-)

diff --git a/drivers/thermal/qoriq_thermal.c b/drivers/thermal/qoriq_thermal.c
index 450ed66..8beb344 100644
--- a/drivers/thermal/qoriq_thermal.c
+++ b/drivers/thermal/qoriq_thermal.c
@@ -59,14 +59,21 @@ struct qoriq_tmu_regs {
u32 ttr3cr; /* Temperature Range 3 Control Register */
 };
 
+struct qoriq_tmu_data;
+
 /*
  * Thermal zone data
  */
+struct qoriq_sensor {
+   struct thermal_zone_device  *tzd;
+   struct qoriq_tmu_data   *qdata;
+   int id;
+};
+
 struct qoriq_tmu_data {
-   struct thermal_zone_device *tz;
struct qoriq_tmu_regs __iomem *regs;
-   int sensor_id;
bool little_endian;
+   struct qoriq_sensor *sensor[SITES_MAX];
 };
 
 static void tmu_write(struct qoriq_tmu_data *p, u32 val, void __iomem *addr)
@@ -87,48 +94,51 @@ static u32 tmu_read(struct qoriq_tmu_data *p, void __iomem 
*addr)
 
 static int tmu_get_temp(void *p, int *temp)
 {
+   struct qoriq_sensor *qsensor = p;
+   struct qoriq_tmu_data *qdata = qsensor->qdata;
u32 val;
-   struct qoriq_tmu_data *data = p;
 
-   val = tmu_read(data, >regs->site[data->sensor_id].tritsr);
+   val = tmu_read(qdata, >regs->site[qsensor->id].tritsr);
*temp = (val & 0xff) * 1000;
 
return 0;
 }
 
-static int qoriq_tmu_get_sensor_id(void)
+static const struct thermal_zone_of_device_ops tmu_tz_ops = {
+   .get_temp = tmu_get_temp,
+};
+
+static int qoriq_tmu_register_tmu_zone(struct platform_device *pdev)
 {
-   int ret, id;
-   struct of_phandle_args sensor_specs;
-   struct device_node *np, *sensor_np;
+   struct qoriq_tmu_data *qdata = platform_get_drvdata(pdev);
+   int id, sites = 0;
 
-   np = of_find_node_by_name(NULL, "thermal-zones");
-   if (!np)
-   return -ENODEV;
+   for (id = 0; id < SITES_MAX; id++) {
+   qdata->sensor[id] = devm_kzalloc(>dev,
+   sizeof(struct qoriq_sensor), GFP_KERNEL);
+   if (!qdata->sensor[id])
+   return -ENOMEM;
 
-   sensor_np = of_get_next_child(np, NULL);
-   ret = of_parse_phandle_with_args(sensor_np, "thermal-sensors",
-   "#thermal-sensor-cells",
-   0, _specs);
-   if (ret) {
-   of_node_put(np);
-   of_node_put(sensor_np);
-   return ret;
-   }
+   qdata->sensor[id]->id = id;
+   qdata->sensor[id]->qdata = qdata;
 
-   if (sensor_specs.args_count >= 1) {
-   id = sensor_specs.args[0];
-   WARN(sensor_specs.args_count > 1,
-   "%s: too many cells in sensor specifier %d\n",
-   sensor_specs.np->name, sensor_specs.args_count);
-   } else {
-   id = 0;
-   }
+   qdata->sensor[id]->tzd = devm_thermal_zone_of_sensor_register(
+   >dev, id, qdata->sensor[id], _tz_ops);
+   if (IS_ERR(qdata->sensor[id]->tzd)) {
+   if (PTR_ERR(qdata->sensor[id]->tzd) == -ENODEV)
+   continue;
+   else
+   return PTR_ERR(qdata->sensor[id]->tzd);
 
-   of_node_put(np);
-   of_node_put(sensor_np);
+   }
+
+   sites |= 0x1 << (15 - id);
+   }
+   /* Enable monitoring */
+   if (sites != 0)
+   tmu_write(qdata, sites | TMR_ME | TMR_ALPF, >regs->tmr);
 
-   return id;
+   return 0;
 }
 
 static int qoriq_tmu_calibration(struct platform_device *pdev)
@@ -178,16 +188,11 @@ static void qoriq_tmu_init_device(struct qoriq_tmu_data 
*data)
tmu_write(data, TMR_DISABLE, >regs->tmr);
 }
 
-static const struct thermal_zone_of_device_ops tmu_tz_ops = {
-   .get_temp = tmu_get_temp,
-};
-
 static int qoriq_tmu_probe(struct platform_device *pdev)
 {
int ret;
struct qoriq_tmu_data *data;
struct device_node *np = pdev->dev.of_node;
-   u32 site;
 
if (!np) {
dev_err(>dev, "Device OF-Node is NULL");
@@ -203,13 +208,6 @@ static int qoriq_tmu_probe(struct platform_device *pdev)
 
data->little_endian = of_property_read_bool(np, "little-endian");
 
-   data->sensor_id = qoriq_tmu_get_sensor_id();
-   if (data->sensor_id < 0) {
-   dev_err(>dev, "Failed to get sensor id\n");
- 

Re: [PATCH] mm: handle no memcg case in memcg_kmem_charge() properly

2018-10-29 Thread Rik van Riel
On Mon, 2018-10-29 at 17:50 -0700, Shakeel Butt wrote:
> On Mon, Oct 29, 2018 at 2:52 PM Roman Gushchin  wrote:
> > 
> > Mike Galbraith reported a regression caused by the commit
> > 9b6f7e163cd0
> > ("mm: rework memcg kernel stack accounting") on a system with
> > "cgroup_disable=memory" boot option: the system panics with the
> > following stack trace:
> > 
> >   [0.928542] BUG: unable to handle kernel NULL pointer dereference
> > at 00f8
> >   [0.929317] PGD 0 P4D 0
> >   [0.929573] Oops: 0002 [#1] PREEMPT SMP PTI
> >   [0.929984] CPU: 0 PID: 1 Comm: systemd Not tainted 4.19.0-
> > preempt+ #410
> >   [0.930637] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> > BIOS ?-20180531_142017-buildhw-08.phx2.fed4
> >   [0.931862] RIP: 0010:page_counter_try_charge+0x22/0xc0
> >   [0.932376] Code: 41 5d c3 c3 0f 1f 40 00 0f 1f 44 00 00 48 85 ff
> > 0f 84 a7 00 00 00 41 56 48 89 f8 49 89 fe 49
> >   [0.934283] RSP: 0018:acf68031fcb8 EFLAGS: 00010202
> >   [0.934826] RAX: 00f8 RBX:  RCX:
> > 
> >   [0.935558] RDX: acf68031fd08 RSI: 0020 RDI:
> > 00f8
> >   [0.936288] RBP: 0001 R08: 8063 R09:
> > 99ff7cd37a40
> >   [0.937021] R10: acf68031fed0 R11: 0020 R12:
> > 0020
> >   [0.937749] R13: acf68031fd08 R14: 00f8 R15:
> > 99ff7da1ec60
> >   [0.938486] FS:  7fc2140bb280() GS:99ff7da0()
> > knlGS:
> >   [0.939311] CS:  0010 DS:  ES:  CR0: 80050033
> >   [0.939905] CR2: 00f8 CR3: 12dc8002 CR4:
> > 00760ef0
> >   [0.940638] DR0:  DR1:  DR2:
> > 
> >   [0.941366] DR3:  DR6: fffe0ff0 DR7:
> > 0400
> >   [0.942110] PKRU: 5554
> >   [0.942412] Call Trace:
> >   [0.942673]  try_charge+0xcb/0x780
> >   [0.943031]  memcg_kmem_charge_memcg+0x28/0x80
> >   [0.943486]  ? __vmalloc_node_range+0x1e4/0x280
> >   [0.943971]  memcg_kmem_charge+0x8b/0x1d0
> >   [0.944396]  copy_process.part.41+0x1ca/0x2070
> >   [0.944853]  ? get_acl+0x1a/0x120
> >   [0.945200]  ? shmem_tmpfile+0x90/0x90
> >   [0.945596]  _do_fork+0xd7/0x3d0
> >   [0.945934]  ? trace_hardirqs_off_thunk+0x1a/0x1c
> >   [0.946421]  do_syscall_64+0x5a/0x180
> >   [0.946798]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > 
> > The problem occurs because get_mem_cgroup_from_current() returns
> > the NULL pointer if memory controller is disabled. Let's check
> > if this is a case at the beginning of memcg_kmem_charge() and
> > just return 0 if mem_cgroup_disabled() returns true. This is how
> > we handle this case in many other places in the memory controller
> > code.
> > 
> > Fixes: 9b6f7e163cd0 ("mm: rework memcg kernel stack accounting")
> > Reported-by: Mike Galbraith 
> > Signed-off-by: Roman Gushchin 
> > Cc: Michal Hocko 
> > Cc: Johannes Weiner 
> > Cc: Vladimir Davydov 
> > Cc: Andrew Morton 
> > ---
> >  mm/memcontrol.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 54920cbc46bf..6e1469b80cb7 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -2593,7 +2593,7 @@ int memcg_kmem_charge(struct page *page,
> > gfp_t gfp, int order)
> > struct mem_cgroup *memcg;
> > int ret = 0;
> > 
> > -   if (memcg_kmem_bypass())
> > +   if (mem_cgroup_disabled() || memcg_kmem_bypass())
> > return 0;
> > 
> 
> Why not check memcg_kmem_enabled() before calling memcg_kmem_charge()
> in memcg_charge_kernel_stack()?

Check Roman's backtrace again. The function
memcg_charge_kernel_stack() is not in it.

That is why it is generally better to check
in the called function, rather than add a
check to every call site (and maybe miss one
or two).

Acked-by: Rik van Riel 
-- 
All Rights Reversed.


signature.asc
Description: This is a digitally signed message part


Re: [PATCH] mm: handle no memcg case in memcg_kmem_charge() properly

2018-10-29 Thread Rik van Riel
On Mon, 2018-10-29 at 17:50 -0700, Shakeel Butt wrote:
> On Mon, Oct 29, 2018 at 2:52 PM Roman Gushchin  wrote:
> > 
> > Mike Galbraith reported a regression caused by the commit
> > 9b6f7e163cd0
> > ("mm: rework memcg kernel stack accounting") on a system with
> > "cgroup_disable=memory" boot option: the system panics with the
> > following stack trace:
> > 
> >   [0.928542] BUG: unable to handle kernel NULL pointer dereference
> > at 00f8
> >   [0.929317] PGD 0 P4D 0
> >   [0.929573] Oops: 0002 [#1] PREEMPT SMP PTI
> >   [0.929984] CPU: 0 PID: 1 Comm: systemd Not tainted 4.19.0-
> > preempt+ #410
> >   [0.930637] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> > BIOS ?-20180531_142017-buildhw-08.phx2.fed4
> >   [0.931862] RIP: 0010:page_counter_try_charge+0x22/0xc0
> >   [0.932376] Code: 41 5d c3 c3 0f 1f 40 00 0f 1f 44 00 00 48 85 ff
> > 0f 84 a7 00 00 00 41 56 48 89 f8 49 89 fe 49
> >   [0.934283] RSP: 0018:acf68031fcb8 EFLAGS: 00010202
> >   [0.934826] RAX: 00f8 RBX:  RCX:
> > 
> >   [0.935558] RDX: acf68031fd08 RSI: 0020 RDI:
> > 00f8
> >   [0.936288] RBP: 0001 R08: 8063 R09:
> > 99ff7cd37a40
> >   [0.937021] R10: acf68031fed0 R11: 0020 R12:
> > 0020
> >   [0.937749] R13: acf68031fd08 R14: 00f8 R15:
> > 99ff7da1ec60
> >   [0.938486] FS:  7fc2140bb280() GS:99ff7da0()
> > knlGS:
> >   [0.939311] CS:  0010 DS:  ES:  CR0: 80050033
> >   [0.939905] CR2: 00f8 CR3: 12dc8002 CR4:
> > 00760ef0
> >   [0.940638] DR0:  DR1:  DR2:
> > 
> >   [0.941366] DR3:  DR6: fffe0ff0 DR7:
> > 0400
> >   [0.942110] PKRU: 5554
> >   [0.942412] Call Trace:
> >   [0.942673]  try_charge+0xcb/0x780
> >   [0.943031]  memcg_kmem_charge_memcg+0x28/0x80
> >   [0.943486]  ? __vmalloc_node_range+0x1e4/0x280
> >   [0.943971]  memcg_kmem_charge+0x8b/0x1d0
> >   [0.944396]  copy_process.part.41+0x1ca/0x2070
> >   [0.944853]  ? get_acl+0x1a/0x120
> >   [0.945200]  ? shmem_tmpfile+0x90/0x90
> >   [0.945596]  _do_fork+0xd7/0x3d0
> >   [0.945934]  ? trace_hardirqs_off_thunk+0x1a/0x1c
> >   [0.946421]  do_syscall_64+0x5a/0x180
> >   [0.946798]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > 
> > The problem occurs because get_mem_cgroup_from_current() returns
> > the NULL pointer if memory controller is disabled. Let's check
> > if this is a case at the beginning of memcg_kmem_charge() and
> > just return 0 if mem_cgroup_disabled() returns true. This is how
> > we handle this case in many other places in the memory controller
> > code.
> > 
> > Fixes: 9b6f7e163cd0 ("mm: rework memcg kernel stack accounting")
> > Reported-by: Mike Galbraith 
> > Signed-off-by: Roman Gushchin 
> > Cc: Michal Hocko 
> > Cc: Johannes Weiner 
> > Cc: Vladimir Davydov 
> > Cc: Andrew Morton 
> > ---
> >  mm/memcontrol.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 54920cbc46bf..6e1469b80cb7 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -2593,7 +2593,7 @@ int memcg_kmem_charge(struct page *page,
> > gfp_t gfp, int order)
> > struct mem_cgroup *memcg;
> > int ret = 0;
> > 
> > -   if (memcg_kmem_bypass())
> > +   if (mem_cgroup_disabled() || memcg_kmem_bypass())
> > return 0;
> > 
> 
> Why not check memcg_kmem_enabled() before calling memcg_kmem_charge()
> in memcg_charge_kernel_stack()?

Check Roman's backtrace again. The function
memcg_charge_kernel_stack() is not in it.

That is why it is generally better to check
in the called function, rather than add a
check to every call site (and maybe miss one
or two).

Acked-by: Rik van Riel 
-- 
All Rights Reversed.


signature.asc
Description: This is a digitally signed message part


Re: [PATCH] fs/proc: introduce /proc/stat2 file

2018-10-29 Thread Vito Caputo
On Mon, Oct 29, 2018 at 11:04:45PM +, Daniel Colascione wrote:
> On Mon, Oct 29, 2018 at 7:25 PM, Davidlohr Bueso  wrote:
> > This patch introduces a new /proc/stat2 file that is identical to the
> > regular 'stat' except that it zeroes all hard irq statistics. The new
> > file is a drop in replacement to stat for users that need performance.
> 
> For a while now, I've been thinking over ways to improve the
> performance of collecting various bits of kernel information. I don't
> think that a proliferation of special-purpose named bag-of-fields file
> variants is the right answer, because even if you add a few info-file
> variants, you're still left with a situation where a given file
> provides a particular caller with too little or too much information.
> I'd much rather move to a model in which userspace *explicitly* tells
> the kernel which fields it wants, with the kernel replying with just
> those particular fields, maybe in their raw binary representations.
> The ASCII-text bag-of-everything files would remain available for
> ad-hoc and non-performance critical use, but programs that cared about
> performance would have an efficient bypass. One concrete approach is
> to let users open up today's proc files and, instead of read(2)ing a
> text blob, use an ioctl to retrieve specified and targeted information
> of the sort that would normally be encoded in the text blob. Because
> callers would open the same file when using either the text or binary
> interfaces, little would have to change, and it'd be easy to implement
> fallbacks when a particular system doesn't support a particular
> fast-path ioctl.


We have two extremes of granularity in the /proc and /sys virtual
filesystems today:

On procfs there's these legacy files which aggregate loosely-related
system information, and in cases where you actually want most of what's
provided, it's a nice optimization because you can sample it all in a
single pread() call.

On sysfs the granularity is much finer with it being fairly common to
find a file-per-datum.  This has other advantages, like not needing to
parse snowflake formats which sometimes varied across kernel versions
like in procfs, or needing to burden the kernel to produce more
information than necessary.

But anyone who has written tools trying to sample large subsets of the
granular information in sysfs at a high rate will know how quickly it
becomes rather costly in terms of system calls.

The last time I went down this path, I wished there were a system call
like readv() which accepted a vector a new iovec type specifying an fd.

Then the sysfs model could be made a more efficient by coalescing all
the required read syscalls into a single megaread bundling all the
relevant fds that are simply kept open and reused.

If we had such a readv() variant, the sysfs granular model could be used
to granularly expose all the information we currently expose in /proc,
while still being relatively efficient in terms of system calls per
sample.  Sure you still have to lookup and open all the files of
interest, but that only needs to occur once at initialization.

Regards,
Vito Caputo


Re: [PATCH] fs/proc: introduce /proc/stat2 file

2018-10-29 Thread Vito Caputo
On Mon, Oct 29, 2018 at 11:04:45PM +, Daniel Colascione wrote:
> On Mon, Oct 29, 2018 at 7:25 PM, Davidlohr Bueso  wrote:
> > This patch introduces a new /proc/stat2 file that is identical to the
> > regular 'stat' except that it zeroes all hard irq statistics. The new
> > file is a drop in replacement to stat for users that need performance.
> 
> For a while now, I've been thinking over ways to improve the
> performance of collecting various bits of kernel information. I don't
> think that a proliferation of special-purpose named bag-of-fields file
> variants is the right answer, because even if you add a few info-file
> variants, you're still left with a situation where a given file
> provides a particular caller with too little or too much information.
> I'd much rather move to a model in which userspace *explicitly* tells
> the kernel which fields it wants, with the kernel replying with just
> those particular fields, maybe in their raw binary representations.
> The ASCII-text bag-of-everything files would remain available for
> ad-hoc and non-performance critical use, but programs that cared about
> performance would have an efficient bypass. One concrete approach is
> to let users open up today's proc files and, instead of read(2)ing a
> text blob, use an ioctl to retrieve specified and targeted information
> of the sort that would normally be encoded in the text blob. Because
> callers would open the same file when using either the text or binary
> interfaces, little would have to change, and it'd be easy to implement
> fallbacks when a particular system doesn't support a particular
> fast-path ioctl.


We have two extremes of granularity in the /proc and /sys virtual
filesystems today:

On procfs there's these legacy files which aggregate loosely-related
system information, and in cases where you actually want most of what's
provided, it's a nice optimization because you can sample it all in a
single pread() call.

On sysfs the granularity is much finer with it being fairly common to
find a file-per-datum.  This has other advantages, like not needing to
parse snowflake formats which sometimes varied across kernel versions
like in procfs, or needing to burden the kernel to produce more
information than necessary.

But anyone who has written tools trying to sample large subsets of the
granular information in sysfs at a high rate will know how quickly it
becomes rather costly in terms of system calls.

The last time I went down this path, I wished there were a system call
like readv() which accepted a vector a new iovec type specifying an fd.

Then the sysfs model could be made a more efficient by coalescing all
the required read syscalls into a single megaread bundling all the
relevant fds that are simply kept open and reused.

If we had such a readv() variant, the sysfs granular model could be used
to granularly expose all the information we currently expose in /proc,
while still being relatively efficient in terms of system calls per
sample.  Sure you still have to lookup and open all the files of
interest, but that only needs to occur once at initialization.

Regards,
Vito Caputo


Re: [PATCH] mm: handle no memcg case in memcg_kmem_charge() properly

2018-10-29 Thread Shakeel Butt
On Mon, Oct 29, 2018 at 2:52 PM Roman Gushchin  wrote:
>
> Mike Galbraith reported a regression caused by the commit 9b6f7e163cd0
> ("mm: rework memcg kernel stack accounting") on a system with
> "cgroup_disable=memory" boot option: the system panics with the
> following stack trace:
>
>   [0.928542] BUG: unable to handle kernel NULL pointer dereference at 
> 00f8
>   [0.929317] PGD 0 P4D 0
>   [0.929573] Oops: 0002 [#1] PREEMPT SMP PTI
>   [0.929984] CPU: 0 PID: 1 Comm: systemd Not tainted 4.19.0-preempt+ #410
>   [0.930637] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> ?-20180531_142017-buildhw-08.phx2.fed4
>   [0.931862] RIP: 0010:page_counter_try_charge+0x22/0xc0
>   [0.932376] Code: 41 5d c3 c3 0f 1f 40 00 0f 1f 44 00 00 48 85 ff 0f 84 a7 
> 00 00 00 41 56 48 89 f8 49 89 fe 49
>   [0.934283] RSP: 0018:acf68031fcb8 EFLAGS: 00010202
>   [0.934826] RAX: 00f8 RBX:  RCX: 
>   [0.935558] RDX: acf68031fd08 RSI: 0020 RDI: 00f8
>   [0.936288] RBP: 0001 R08: 8063 R09: 99ff7cd37a40
>   [0.937021] R10: acf68031fed0 R11: 0020 R12: 0020
>   [0.937749] R13: acf68031fd08 R14: 00f8 R15: 99ff7da1ec60
>   [0.938486] FS:  7fc2140bb280() GS:99ff7da0() 
> knlGS:
>   [0.939311] CS:  0010 DS:  ES:  CR0: 80050033
>   [0.939905] CR2: 00f8 CR3: 12dc8002 CR4: 00760ef0
>   [0.940638] DR0:  DR1:  DR2: 
>   [0.941366] DR3:  DR6: fffe0ff0 DR7: 0400
>   [0.942110] PKRU: 5554
>   [0.942412] Call Trace:
>   [0.942673]  try_charge+0xcb/0x780
>   [0.943031]  memcg_kmem_charge_memcg+0x28/0x80
>   [0.943486]  ? __vmalloc_node_range+0x1e4/0x280
>   [0.943971]  memcg_kmem_charge+0x8b/0x1d0
>   [0.944396]  copy_process.part.41+0x1ca/0x2070
>   [0.944853]  ? get_acl+0x1a/0x120
>   [0.945200]  ? shmem_tmpfile+0x90/0x90
>   [0.945596]  _do_fork+0xd7/0x3d0
>   [0.945934]  ? trace_hardirqs_off_thunk+0x1a/0x1c
>   [0.946421]  do_syscall_64+0x5a/0x180
>   [0.946798]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> The problem occurs because get_mem_cgroup_from_current() returns
> the NULL pointer if memory controller is disabled. Let's check
> if this is a case at the beginning of memcg_kmem_charge() and
> just return 0 if mem_cgroup_disabled() returns true. This is how
> we handle this case in many other places in the memory controller
> code.
>
> Fixes: 9b6f7e163cd0 ("mm: rework memcg kernel stack accounting")
> Reported-by: Mike Galbraith 
> Signed-off-by: Roman Gushchin 
> Cc: Michal Hocko 
> Cc: Johannes Weiner 
> Cc: Vladimir Davydov 
> Cc: Andrew Morton 
> ---
>  mm/memcontrol.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 54920cbc46bf..6e1469b80cb7 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2593,7 +2593,7 @@ int memcg_kmem_charge(struct page *page, gfp_t gfp, int 
> order)
> struct mem_cgroup *memcg;
> int ret = 0;
>
> -   if (memcg_kmem_bypass())
> +   if (mem_cgroup_disabled() || memcg_kmem_bypass())
> return 0;
>

Why not check memcg_kmem_enabled() before calling memcg_kmem_charge()
in memcg_charge_kernel_stack()?

> memcg = get_mem_cgroup_from_current();
> --
> 2.17.2
>


Re: [PATCH] mm: handle no memcg case in memcg_kmem_charge() properly

2018-10-29 Thread Shakeel Butt
On Mon, Oct 29, 2018 at 2:52 PM Roman Gushchin  wrote:
>
> Mike Galbraith reported a regression caused by the commit 9b6f7e163cd0
> ("mm: rework memcg kernel stack accounting") on a system with
> "cgroup_disable=memory" boot option: the system panics with the
> following stack trace:
>
>   [0.928542] BUG: unable to handle kernel NULL pointer dereference at 
> 00f8
>   [0.929317] PGD 0 P4D 0
>   [0.929573] Oops: 0002 [#1] PREEMPT SMP PTI
>   [0.929984] CPU: 0 PID: 1 Comm: systemd Not tainted 4.19.0-preempt+ #410
>   [0.930637] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> ?-20180531_142017-buildhw-08.phx2.fed4
>   [0.931862] RIP: 0010:page_counter_try_charge+0x22/0xc0
>   [0.932376] Code: 41 5d c3 c3 0f 1f 40 00 0f 1f 44 00 00 48 85 ff 0f 84 a7 
> 00 00 00 41 56 48 89 f8 49 89 fe 49
>   [0.934283] RSP: 0018:acf68031fcb8 EFLAGS: 00010202
>   [0.934826] RAX: 00f8 RBX:  RCX: 
>   [0.935558] RDX: acf68031fd08 RSI: 0020 RDI: 00f8
>   [0.936288] RBP: 0001 R08: 8063 R09: 99ff7cd37a40
>   [0.937021] R10: acf68031fed0 R11: 0020 R12: 0020
>   [0.937749] R13: acf68031fd08 R14: 00f8 R15: 99ff7da1ec60
>   [0.938486] FS:  7fc2140bb280() GS:99ff7da0() 
> knlGS:
>   [0.939311] CS:  0010 DS:  ES:  CR0: 80050033
>   [0.939905] CR2: 00f8 CR3: 12dc8002 CR4: 00760ef0
>   [0.940638] DR0:  DR1:  DR2: 
>   [0.941366] DR3:  DR6: fffe0ff0 DR7: 0400
>   [0.942110] PKRU: 5554
>   [0.942412] Call Trace:
>   [0.942673]  try_charge+0xcb/0x780
>   [0.943031]  memcg_kmem_charge_memcg+0x28/0x80
>   [0.943486]  ? __vmalloc_node_range+0x1e4/0x280
>   [0.943971]  memcg_kmem_charge+0x8b/0x1d0
>   [0.944396]  copy_process.part.41+0x1ca/0x2070
>   [0.944853]  ? get_acl+0x1a/0x120
>   [0.945200]  ? shmem_tmpfile+0x90/0x90
>   [0.945596]  _do_fork+0xd7/0x3d0
>   [0.945934]  ? trace_hardirqs_off_thunk+0x1a/0x1c
>   [0.946421]  do_syscall_64+0x5a/0x180
>   [0.946798]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> The problem occurs because get_mem_cgroup_from_current() returns
> the NULL pointer if memory controller is disabled. Let's check
> if this is a case at the beginning of memcg_kmem_charge() and
> just return 0 if mem_cgroup_disabled() returns true. This is how
> we handle this case in many other places in the memory controller
> code.
>
> Fixes: 9b6f7e163cd0 ("mm: rework memcg kernel stack accounting")
> Reported-by: Mike Galbraith 
> Signed-off-by: Roman Gushchin 
> Cc: Michal Hocko 
> Cc: Johannes Weiner 
> Cc: Vladimir Davydov 
> Cc: Andrew Morton 
> ---
>  mm/memcontrol.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 54920cbc46bf..6e1469b80cb7 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2593,7 +2593,7 @@ int memcg_kmem_charge(struct page *page, gfp_t gfp, int 
> order)
> struct mem_cgroup *memcg;
> int ret = 0;
>
> -   if (memcg_kmem_bypass())
> +   if (mem_cgroup_disabled() || memcg_kmem_bypass())
> return 0;
>

Why not check memcg_kmem_enabled() before calling memcg_kmem_charge()
in memcg_charge_kernel_stack()?

> memcg = get_mem_cgroup_from_current();
> --
> 2.17.2
>


net/sunrpc/auth_gss/gss_krb5_seal.c:144:14: error: implicit declaration of function 'cmpxchg64'; did you mean 'cmpxchg'?

2018-10-29 Thread kbuild test robot
Hi Arnd,

FYI, the error/warning still remains.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
master
head:   4b42745211af552f170f38a1b97f4a112b5da6b2
commit: 21924765862a0871908a35cb0e53e2e1c169b888 SUNRPC: use cmpxchg64() in 
gss_seq_send64_fetch_and_inc()
date:   3 weeks ago
config: sh-allmodconfig (attached as .config)
compiler: sh4-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
git checkout 21924765862a0871908a35cb0e53e2e1c169b888
# save the attached .config to linux build tree
GCC_VERSION=7.2.0 make.cross ARCH=sh 

All errors (new ones prefixed by >>):

   net/sunrpc/auth_gss/gss_krb5_seal.c: In function 
'gss_seq_send64_fetch_and_inc':
>> net/sunrpc/auth_gss/gss_krb5_seal.c:144:14: error: implicit declaration of 
>> function 'cmpxchg64'; did you mean 'cmpxchg'? 
>> [-Werror=implicit-function-declaration]
  seq_send = cmpxchg64(>seq_send64, old, old + 1);
 ^
 cmpxchg
   cc1: some warnings being treated as errors

vim +144 net/sunrpc/auth_gss/gss_krb5_seal.c

   136  
   137  u64
   138  gss_seq_send64_fetch_and_inc(struct krb5_ctx *ctx)
   139  {
   140  u64 old, seq_send = READ_ONCE(ctx->seq_send);
   141  
   142  do {
   143  old = seq_send;
 > 144  seq_send = cmpxchg64(>seq_send64, old, old + 1);
   145  } while (old != seq_send);
   146  return seq_send;
   147  }
   148  

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


net/sunrpc/auth_gss/gss_krb5_seal.c:144:14: error: implicit declaration of function 'cmpxchg64'; did you mean 'cmpxchg'?

2018-10-29 Thread kbuild test robot
Hi Arnd,

FYI, the error/warning still remains.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
master
head:   4b42745211af552f170f38a1b97f4a112b5da6b2
commit: 21924765862a0871908a35cb0e53e2e1c169b888 SUNRPC: use cmpxchg64() in 
gss_seq_send64_fetch_and_inc()
date:   3 weeks ago
config: sh-allmodconfig (attached as .config)
compiler: sh4-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
git checkout 21924765862a0871908a35cb0e53e2e1c169b888
# save the attached .config to linux build tree
GCC_VERSION=7.2.0 make.cross ARCH=sh 

All errors (new ones prefixed by >>):

   net/sunrpc/auth_gss/gss_krb5_seal.c: In function 
'gss_seq_send64_fetch_and_inc':
>> net/sunrpc/auth_gss/gss_krb5_seal.c:144:14: error: implicit declaration of 
>> function 'cmpxchg64'; did you mean 'cmpxchg'? 
>> [-Werror=implicit-function-declaration]
  seq_send = cmpxchg64(>seq_send64, old, old + 1);
 ^
 cmpxchg
   cc1: some warnings being treated as errors

vim +144 net/sunrpc/auth_gss/gss_krb5_seal.c

   136  
   137  u64
   138  gss_seq_send64_fetch_and_inc(struct krb5_ctx *ctx)
   139  {
   140  u64 old, seq_send = READ_ONCE(ctx->seq_send);
   141  
   142  do {
   143  old = seq_send;
 > 144  seq_send = cmpxchg64(>seq_send64, old, old + 1);
   145  } while (old != seq_send);
   146  return seq_send;
   147  }
   148  

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


sound/pci/hda/patch_ca0132.c:7650:20: error: implicit declaration of function 'pci_iomap'; did you mean 'pcim_iomap'?

2018-10-29 Thread kbuild test robot
Hi Rakesh,

FYI, the error/warning still remains.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
master
head:   4b42745211af552f170f38a1b97f4a112b5da6b2
commit: 6bae5ea9498926440ffc883f3dbceb0adc65e492 ASoC: hdac_hda: add asoc 
extension for legacy HDA codec drivers
date:   9 weeks ago
config: sh-allyesconfig (attached as .config)
compiler: sh4-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
git checkout 6bae5ea9498926440ffc883f3dbceb0adc65e492
# save the attached .config to linux build tree
GCC_VERSION=7.2.0 make.cross ARCH=sh 

All errors (new ones prefixed by >>):

   sound/pci/hda/patch_ca0132.c: In function 'patch_ca0132':
>> sound/pci/hda/patch_ca0132.c:7650:20: error: implicit declaration of 
>> function 'pci_iomap'; did you mean 'pcim_iomap'? 
>> [-Werror=implicit-function-declaration]
  spec->mem_base = pci_iomap(codec->bus->pci, 2, 0xC20);
   ^
   pcim_iomap
   sound/pci/hda/patch_ca0132.c:7650:18: warning: assignment makes pointer from 
integer without a cast [-Wint-conversion]
  spec->mem_base = pci_iomap(codec->bus->pci, 2, 0xC20);
 ^
   cc1: some warnings being treated as errors

vim +7650 sound/pci/hda/patch_ca0132.c

d5c016b56 Gabriele Martino 2015-05-18  7581  
95c6e9cb7 Ian Minett   2011-06-15  7582  static int patch_ca0132(struct 
hda_codec *codec)
95c6e9cb7 Ian Minett   2011-06-15  7583  {
95c6e9cb7 Ian Minett   2011-06-15  7584 struct ca0132_spec *spec;
a73d511c4 Ian Minett   2012-12-20  7585 int err;
d5c016b56 Gabriele Martino 2015-05-18  7586 const struct snd_pci_quirk 
*quirk;
95c6e9cb7 Ian Minett   2011-06-15  7587  
4e76a8833 Takashi Iwai 2014-02-25  7588 codec_dbg(codec, 
"patch_ca0132\n");
95c6e9cb7 Ian Minett   2011-06-15  7589  
95c6e9cb7 Ian Minett   2011-06-15  7590 spec = kzalloc(sizeof(*spec), 
GFP_KERNEL);
95c6e9cb7 Ian Minett   2011-06-15  7591 if (!spec)
95c6e9cb7 Ian Minett   2011-06-15  7592 return -ENOMEM;
95c6e9cb7 Ian Minett   2011-06-15  7593 codec->spec = spec;
993884f6a Chih-Chung Chang 2013-03-25  7594 spec->codec = codec;
95c6e9cb7 Ian Minett   2011-06-15  7595  
225068ab2 Takashi Iwai 2015-05-29  7596 codec->patch_ops = 
ca0132_patch_ops;
225068ab2 Takashi Iwai 2015-05-29  7597 codec->pcm_format_first = 1;
225068ab2 Takashi Iwai 2015-05-29  7598 codec->no_sticky_stream = 1;
225068ab2 Takashi Iwai 2015-05-29  7599  
d5c016b56 Gabriele Martino 2015-05-18  7600 /* Detect codec quirk */
d5c016b56 Gabriele Martino 2015-05-18  7601 quirk = 
snd_pci_quirk_lookup(codec->bus->pci, ca0132_quirks);
d5c016b56 Gabriele Martino 2015-05-18  7602 if (quirk)
d5c016b56 Gabriele Martino 2015-05-18  7603 spec->quirk = 
quirk->value;
d5c016b56 Gabriele Martino 2015-05-18  7604 else
d5c016b56 Gabriele Martino 2015-05-18  7605 spec->quirk = 
QUIRK_NONE;
d5c016b56 Gabriele Martino 2015-05-18  7606  
e24aa0a4c Takashi Iwai 2014-08-10  7607 spec->dsp_state = 
DSP_DOWNLOAD_INIT;
a7e76271b Ian Minett   2012-12-20  7608 spec->num_mixers = 1;
017310fbe Connor McAdams   2018-05-08  7609  
017310fbe Connor McAdams   2018-05-08  7610 /* Set which mixers each quirk 
uses. */
017310fbe Connor McAdams   2018-05-08  7611 switch (spec->quirk) {
017310fbe Connor McAdams   2018-05-08  7612 case QUIRK_SBZ:
e25e34450 Connor McAdams   2018-08-08  7613 spec->mixers[0] = 
desktop_mixer;
017310fbe Connor McAdams   2018-05-08  7614 
snd_hda_codec_set_name(codec, "Sound Blaster Z");
017310fbe Connor McAdams   2018-05-08  7615 break;
e25e34450 Connor McAdams   2018-08-08  7616 case QUIRK_R3D:
e25e34450 Connor McAdams   2018-08-08  7617 spec->mixers[0] = 
desktop_mixer;
e25e34450 Connor McAdams   2018-08-08  7618 
snd_hda_codec_set_name(codec, "Recon3D");
e25e34450 Connor McAdams   2018-08-08  7619 break;
017310fbe Connor McAdams   2018-05-08  7620 case QUIRK_R3DI:
017310fbe Connor McAdams   2018-05-08  7621 spec->mixers[0] = 
r3di_mixer;
017310fbe Connor McAdams   2018-05-08  7622 
snd_hda_codec_set_name(codec, "Recon3Di");
017310fbe Connor McAdams   2018-05-08  7623 break;
017310fbe Connor McAdams   2018-05-08  7624 default:
a7e76271b Ian Minett   2012-12-20  7625 spec->mixers[0] = 
ca0132_mixer;
017310fbe Connor McAdams   2018-05-08  7626 break;
017310fbe Connor McAdams   2018-05-08  7627 }
a7e76271b Ian Minett   2012-12-20  7628  
08eca6b1f Connor McAdams   2018-08-08  7629 /* Setup whether or not to use 
alt functions/controls/pci_mmio */
009b8f979 Connor McAdams   2018-05-08  7630 switch 

sound/pci/hda/patch_ca0132.c:7650:20: error: implicit declaration of function 'pci_iomap'; did you mean 'pcim_iomap'?

2018-10-29 Thread kbuild test robot
Hi Rakesh,

FYI, the error/warning still remains.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
master
head:   4b42745211af552f170f38a1b97f4a112b5da6b2
commit: 6bae5ea9498926440ffc883f3dbceb0adc65e492 ASoC: hdac_hda: add asoc 
extension for legacy HDA codec drivers
date:   9 weeks ago
config: sh-allyesconfig (attached as .config)
compiler: sh4-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
git checkout 6bae5ea9498926440ffc883f3dbceb0adc65e492
# save the attached .config to linux build tree
GCC_VERSION=7.2.0 make.cross ARCH=sh 

All errors (new ones prefixed by >>):

   sound/pci/hda/patch_ca0132.c: In function 'patch_ca0132':
>> sound/pci/hda/patch_ca0132.c:7650:20: error: implicit declaration of 
>> function 'pci_iomap'; did you mean 'pcim_iomap'? 
>> [-Werror=implicit-function-declaration]
  spec->mem_base = pci_iomap(codec->bus->pci, 2, 0xC20);
   ^
   pcim_iomap
   sound/pci/hda/patch_ca0132.c:7650:18: warning: assignment makes pointer from 
integer without a cast [-Wint-conversion]
  spec->mem_base = pci_iomap(codec->bus->pci, 2, 0xC20);
 ^
   cc1: some warnings being treated as errors

vim +7650 sound/pci/hda/patch_ca0132.c

d5c016b56 Gabriele Martino 2015-05-18  7581  
95c6e9cb7 Ian Minett   2011-06-15  7582  static int patch_ca0132(struct 
hda_codec *codec)
95c6e9cb7 Ian Minett   2011-06-15  7583  {
95c6e9cb7 Ian Minett   2011-06-15  7584 struct ca0132_spec *spec;
a73d511c4 Ian Minett   2012-12-20  7585 int err;
d5c016b56 Gabriele Martino 2015-05-18  7586 const struct snd_pci_quirk 
*quirk;
95c6e9cb7 Ian Minett   2011-06-15  7587  
4e76a8833 Takashi Iwai 2014-02-25  7588 codec_dbg(codec, 
"patch_ca0132\n");
95c6e9cb7 Ian Minett   2011-06-15  7589  
95c6e9cb7 Ian Minett   2011-06-15  7590 spec = kzalloc(sizeof(*spec), 
GFP_KERNEL);
95c6e9cb7 Ian Minett   2011-06-15  7591 if (!spec)
95c6e9cb7 Ian Minett   2011-06-15  7592 return -ENOMEM;
95c6e9cb7 Ian Minett   2011-06-15  7593 codec->spec = spec;
993884f6a Chih-Chung Chang 2013-03-25  7594 spec->codec = codec;
95c6e9cb7 Ian Minett   2011-06-15  7595  
225068ab2 Takashi Iwai 2015-05-29  7596 codec->patch_ops = 
ca0132_patch_ops;
225068ab2 Takashi Iwai 2015-05-29  7597 codec->pcm_format_first = 1;
225068ab2 Takashi Iwai 2015-05-29  7598 codec->no_sticky_stream = 1;
225068ab2 Takashi Iwai 2015-05-29  7599  
d5c016b56 Gabriele Martino 2015-05-18  7600 /* Detect codec quirk */
d5c016b56 Gabriele Martino 2015-05-18  7601 quirk = 
snd_pci_quirk_lookup(codec->bus->pci, ca0132_quirks);
d5c016b56 Gabriele Martino 2015-05-18  7602 if (quirk)
d5c016b56 Gabriele Martino 2015-05-18  7603 spec->quirk = 
quirk->value;
d5c016b56 Gabriele Martino 2015-05-18  7604 else
d5c016b56 Gabriele Martino 2015-05-18  7605 spec->quirk = 
QUIRK_NONE;
d5c016b56 Gabriele Martino 2015-05-18  7606  
e24aa0a4c Takashi Iwai 2014-08-10  7607 spec->dsp_state = 
DSP_DOWNLOAD_INIT;
a7e76271b Ian Minett   2012-12-20  7608 spec->num_mixers = 1;
017310fbe Connor McAdams   2018-05-08  7609  
017310fbe Connor McAdams   2018-05-08  7610 /* Set which mixers each quirk 
uses. */
017310fbe Connor McAdams   2018-05-08  7611 switch (spec->quirk) {
017310fbe Connor McAdams   2018-05-08  7612 case QUIRK_SBZ:
e25e34450 Connor McAdams   2018-08-08  7613 spec->mixers[0] = 
desktop_mixer;
017310fbe Connor McAdams   2018-05-08  7614 
snd_hda_codec_set_name(codec, "Sound Blaster Z");
017310fbe Connor McAdams   2018-05-08  7615 break;
e25e34450 Connor McAdams   2018-08-08  7616 case QUIRK_R3D:
e25e34450 Connor McAdams   2018-08-08  7617 spec->mixers[0] = 
desktop_mixer;
e25e34450 Connor McAdams   2018-08-08  7618 
snd_hda_codec_set_name(codec, "Recon3D");
e25e34450 Connor McAdams   2018-08-08  7619 break;
017310fbe Connor McAdams   2018-05-08  7620 case QUIRK_R3DI:
017310fbe Connor McAdams   2018-05-08  7621 spec->mixers[0] = 
r3di_mixer;
017310fbe Connor McAdams   2018-05-08  7622 
snd_hda_codec_set_name(codec, "Recon3Di");
017310fbe Connor McAdams   2018-05-08  7623 break;
017310fbe Connor McAdams   2018-05-08  7624 default:
a7e76271b Ian Minett   2012-12-20  7625 spec->mixers[0] = 
ca0132_mixer;
017310fbe Connor McAdams   2018-05-08  7626 break;
017310fbe Connor McAdams   2018-05-08  7627 }
a7e76271b Ian Minett   2012-12-20  7628  
08eca6b1f Connor McAdams   2018-08-08  7629 /* Setup whether or not to use 
alt functions/controls/pci_mmio */
009b8f979 Connor McAdams   2018-05-08  7630 switch 

Re: [PATCH v2 3/5] Creates macro to avoid variable shadowing

2018-10-29 Thread Leonardo Bras
Thank you!
On Sun, Oct 28, 2018 at 1:38 PM Masahiro Yamada
 wrote:
>
> On Tue, Oct 23, 2018 at 10:11 AM Leonardo Brás  wrote:
> >
> > Creates DEF_FIELD_ADDR_VAR as a more generic version of the DEF_FIELD_ADD
> > macro, allowing usage of a variable name other than the struct element name.
> > Also, sets DEF_FIELD_ADDR as a specific usage of DEF_FILD_ADDR_VAR in which
> > the var name is the same as the struct element name.
> >
> > Signed-off-by: Leonardo Brás 
> > ---
>
>
> Applied to linux-kbuild.
>
>
>
> >  scripts/mod/file2alias.c | 24 +---
> >  1 file changed, 17 insertions(+), 7 deletions(-)
> >
> > diff --git a/scripts/mod/file2alias.c b/scripts/mod/file2alias.c
> > index 7be43697ff84..3015c0bdecb2 100644
> > --- a/scripts/mod/file2alias.c
> > +++ b/scripts/mod/file2alias.c
> > @@ -95,12 +95,20 @@ extern struct devtable *__start___devtable[], 
> > *__stop___devtable[];
> >   */
> >  #define DEF_FIELD(m, devid, f) \
> > typeof(((struct devid *)0)->f) f = TO_NATIVE(*(typeof(f) *)((m) + 
> > OFF_##devid##_##f))
> > +
> > +/* Define a variable v that holds the address of field f of struct devid
> > + * based at address m.  Due to the way typeof works, for a field of type
> > + * T[N] the variable has type T(*)[N], _not_ T*.
> > + */
> > +#define DEF_FIELD_ADDR_VAR(m, devid, f, v) \
> > +   typeof(((struct devid *)0)->f) *v = ((m) + OFF_##devid##_##f)
> > +
> >  /* Define a variable f that holds the address of field f of struct devid
> >   * based at address m.  Due to the way typeof works, for a field of type
> >   * T[N] the variable has type T(*)[N], _not_ T*.
> >   */
> >  #define DEF_FIELD_ADDR(m, devid, f) \
> > -   typeof(((struct devid *)0)->f) *f = ((m) + OFF_##devid##_##f)
> > +   DEF_FIELD_ADDR_VAR(m, devid, f, f)
> >
> >  /* Add a table entry.  We test function type matches while we're here. */
> >  #define ADD_TO_DEVTABLE(device_id, type, function) \
> > @@ -641,25 +649,27 @@ static void do_pnp_card_entries(void *symval, 
> > unsigned long size,
> > unsigned int i;
> >
> > device_id_check(mod->name, "pnp", size, id_size, symval);
> > +   DEF_FIELD_ADDR(symval, pnp_card_device_id, devs);
> > +   typeof(devs) devs_last;
> >
> > for (i = 0; i < count; i++) {
> > unsigned int j;
> > -   DEF_FIELD_ADDR(symval + i*id_size, pnp_card_device_id, 
> > devs);
> > +   devs_last = devs + i * id_size;
> >
> > for (j = 0; j < PNP_MAX_DEVICES; j++) {
> > -   const char *id = (char *)(*devs)[j].id;
> > -   int i2, j2;
> > +   const char *id = (char *)(*devs_last)[j].id;
> > +   int j2;
> > int dup = 0;
> >
> > if (!id[0])
> > break;
> >
> > /* find duplicate, already added value */
> > -   for (i2 = 0; i2 < i && !dup; i2++) {
> > -   DEF_FIELD_ADDR(symval + i2*id_size, 
> > pnp_card_device_id, devs);
> > +   while ((devs_last -= id_size) >= devs && !dup) {
> >
> > for (j2 = 0; j2 < PNP_MAX_DEVICES; j2++) {
> > -   const char *id2 = (char 
> > *)(*devs)[j2].id;
> > +   const char *id2 =
> > +   (char *)(*devs_last)[j2].id;
> >
> > if (!id2[0])
> > break;
> > --
> > 2.19.1
> >
>
>
> --
> Best Regards
> Masahiro Yamada


Re: [PATCH v2 3/5] Creates macro to avoid variable shadowing

2018-10-29 Thread Leonardo Bras
Thank you!
On Sun, Oct 28, 2018 at 1:38 PM Masahiro Yamada
 wrote:
>
> On Tue, Oct 23, 2018 at 10:11 AM Leonardo Brás  wrote:
> >
> > Creates DEF_FIELD_ADDR_VAR as a more generic version of the DEF_FIELD_ADD
> > macro, allowing usage of a variable name other than the struct element name.
> > Also, sets DEF_FIELD_ADDR as a specific usage of DEF_FILD_ADDR_VAR in which
> > the var name is the same as the struct element name.
> >
> > Signed-off-by: Leonardo Brás 
> > ---
>
>
> Applied to linux-kbuild.
>
>
>
> >  scripts/mod/file2alias.c | 24 +---
> >  1 file changed, 17 insertions(+), 7 deletions(-)
> >
> > diff --git a/scripts/mod/file2alias.c b/scripts/mod/file2alias.c
> > index 7be43697ff84..3015c0bdecb2 100644
> > --- a/scripts/mod/file2alias.c
> > +++ b/scripts/mod/file2alias.c
> > @@ -95,12 +95,20 @@ extern struct devtable *__start___devtable[], 
> > *__stop___devtable[];
> >   */
> >  #define DEF_FIELD(m, devid, f) \
> > typeof(((struct devid *)0)->f) f = TO_NATIVE(*(typeof(f) *)((m) + 
> > OFF_##devid##_##f))
> > +
> > +/* Define a variable v that holds the address of field f of struct devid
> > + * based at address m.  Due to the way typeof works, for a field of type
> > + * T[N] the variable has type T(*)[N], _not_ T*.
> > + */
> > +#define DEF_FIELD_ADDR_VAR(m, devid, f, v) \
> > +   typeof(((struct devid *)0)->f) *v = ((m) + OFF_##devid##_##f)
> > +
> >  /* Define a variable f that holds the address of field f of struct devid
> >   * based at address m.  Due to the way typeof works, for a field of type
> >   * T[N] the variable has type T(*)[N], _not_ T*.
> >   */
> >  #define DEF_FIELD_ADDR(m, devid, f) \
> > -   typeof(((struct devid *)0)->f) *f = ((m) + OFF_##devid##_##f)
> > +   DEF_FIELD_ADDR_VAR(m, devid, f, f)
> >
> >  /* Add a table entry.  We test function type matches while we're here. */
> >  #define ADD_TO_DEVTABLE(device_id, type, function) \
> > @@ -641,25 +649,27 @@ static void do_pnp_card_entries(void *symval, 
> > unsigned long size,
> > unsigned int i;
> >
> > device_id_check(mod->name, "pnp", size, id_size, symval);
> > +   DEF_FIELD_ADDR(symval, pnp_card_device_id, devs);
> > +   typeof(devs) devs_last;
> >
> > for (i = 0; i < count; i++) {
> > unsigned int j;
> > -   DEF_FIELD_ADDR(symval + i*id_size, pnp_card_device_id, 
> > devs);
> > +   devs_last = devs + i * id_size;
> >
> > for (j = 0; j < PNP_MAX_DEVICES; j++) {
> > -   const char *id = (char *)(*devs)[j].id;
> > -   int i2, j2;
> > +   const char *id = (char *)(*devs_last)[j].id;
> > +   int j2;
> > int dup = 0;
> >
> > if (!id[0])
> > break;
> >
> > /* find duplicate, already added value */
> > -   for (i2 = 0; i2 < i && !dup; i2++) {
> > -   DEF_FIELD_ADDR(symval + i2*id_size, 
> > pnp_card_device_id, devs);
> > +   while ((devs_last -= id_size) >= devs && !dup) {
> >
> > for (j2 = 0; j2 < PNP_MAX_DEVICES; j2++) {
> > -   const char *id2 = (char 
> > *)(*devs)[j2].id;
> > +   const char *id2 =
> > +   (char *)(*devs_last)[j2].id;
> >
> > if (!id2[0])
> > break;
> > --
> > 2.19.1
> >
>
>
> --
> Best Regards
> Masahiro Yamada


Re: [PATCH v3 2/5] kbuild: Removes unnecessary shadowed local variable.

2018-10-29 Thread Leonardo Bras
Sorry, I will take care next time.

Thank you,

Leonardo Bras

On Sun, Oct 28, 2018 at 1:37 PM Masahiro Yamada
 wrote:
>
> On Wed, Oct 24, 2018 at 1:04 PM Leonardo Bras  wrote:
> >
> > Removes an unnecessary shadowed local variable (start).
> > It was used only once, with the same value it was started before
> > the if block.
> >
> > Signed-off-by: Leonardo Bras 
>
>
>
> Applied to linux-kbuild
> with some fixups in the subject.
>
> Please do not add a period to the end of the subject.
>
>
>
>
>
>
> > ---
> >  scripts/asn1_compiler.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/scripts/asn1_compiler.c b/scripts/asn1_compiler.c
> > index c146020fc783..1b28787028d3 100644
> > --- a/scripts/asn1_compiler.c
> > +++ b/scripts/asn1_compiler.c
> > @@ -413,7 +413,7 @@ static void tokenise(char *buffer, char *end)
> >
> > /* Handle string tokens */
> > if (isalpha(*p)) {
> > -   const char **dir, *start = p;
> > +   const char **dir;
> >
> > /* Can be a directive, type name or element
> >  * name.  Find the end of the name.
> > --
> > 2.19.1
> >
>
>
> --
> Best Regards
> Masahiro Yamada


Re: [PATCH v3 2/5] kbuild: Removes unnecessary shadowed local variable.

2018-10-29 Thread Leonardo Bras
Sorry, I will take care next time.

Thank you,

Leonardo Bras

On Sun, Oct 28, 2018 at 1:37 PM Masahiro Yamada
 wrote:
>
> On Wed, Oct 24, 2018 at 1:04 PM Leonardo Bras  wrote:
> >
> > Removes an unnecessary shadowed local variable (start).
> > It was used only once, with the same value it was started before
> > the if block.
> >
> > Signed-off-by: Leonardo Bras 
>
>
>
> Applied to linux-kbuild
> with some fixups in the subject.
>
> Please do not add a period to the end of the subject.
>
>
>
>
>
>
> > ---
> >  scripts/asn1_compiler.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/scripts/asn1_compiler.c b/scripts/asn1_compiler.c
> > index c146020fc783..1b28787028d3 100644
> > --- a/scripts/asn1_compiler.c
> > +++ b/scripts/asn1_compiler.c
> > @@ -413,7 +413,7 @@ static void tokenise(char *buffer, char *end)
> >
> > /* Handle string tokens */
> > if (isalpha(*p)) {
> > -   const char **dir, *start = p;
> > +   const char **dir;
> >
> > /* Can be a directive, type name or element
> >  * name.  Find the end of the name.
> > --
> > 2.19.1
> >
>
>
> --
> Best Regards
> Masahiro Yamada


Re: [PATCH] binder: ipc namespace support for android binder

2018-10-29 Thread Todd Kjos
+christ...@brauner.io

On Sun, Oct 28, 2018 at 7:29 PM chouryzhou(周威)  wrote:
...
>
> > It's not obvious from this patch where this dependency comes
> > from...why is SYSVIPC required? I'd like to not have to require IPC_NS
> > either for devices.
>
> Yes, the patch is not highly dependent on SYSVIPC, but it will be convenient
> if require it. I will update it to drop dependency of it in V2 patch. This 
> patch
> doesn't need IPC_NS set at present.

Actually it is dependent on IPC_NS since it makes changes to
ipc/namespace.c which is
compiled only if CONFIG_IPC_NS.

There are a couple more implementations similar to this one.
https://lwn.net/Articles/577957/ and some submissions to AOSP derived
from that one
that introduce a generic registration function for namespace support [1], and
changes to binder to implement namespaces [2].

If this is really needed, then we should have a solution that works
for devices without
requiring IPC_NS or SYSVIPC. Also, we should not add binder-specific code to
ipc/namespace.c or include/linux/ipc_namespace.h.

-Todd

[1] https://android-review.googlesource.com/c/kernel/common/+/471961
[2] https://android-review.googlesource.com/c/kernel/common/+/471825


Re: [PATCH] binder: ipc namespace support for android binder

2018-10-29 Thread Todd Kjos
+christ...@brauner.io

On Sun, Oct 28, 2018 at 7:29 PM chouryzhou(周威)  wrote:
...
>
> > It's not obvious from this patch where this dependency comes
> > from...why is SYSVIPC required? I'd like to not have to require IPC_NS
> > either for devices.
>
> Yes, the patch is not highly dependent on SYSVIPC, but it will be convenient
> if require it. I will update it to drop dependency of it in V2 patch. This 
> patch
> doesn't need IPC_NS set at present.

Actually it is dependent on IPC_NS since it makes changes to
ipc/namespace.c which is
compiled only if CONFIG_IPC_NS.

There are a couple more implementations similar to this one.
https://lwn.net/Articles/577957/ and some submissions to AOSP derived
from that one
that introduce a generic registration function for namespace support [1], and
changes to binder to implement namespaces [2].

If this is really needed, then we should have a solution that works
for devices without
requiring IPC_NS or SYSVIPC. Also, we should not add binder-specific code to
ipc/namespace.c or include/linux/ipc_namespace.h.

-Todd

[1] https://android-review.googlesource.com/c/kernel/common/+/471961
[2] https://android-review.googlesource.com/c/kernel/common/+/471825


[ANNOUNCE] v4.19-rt1

2018-10-29 Thread Sebastian Andrzej Siewior
Dear RT folks!

I'm pleased to announce the v4.19-rt1 patch set. 

Changes since v4.18.16-rt9:

  - rebase to v4.19

Known issues
 - A warning triggered in "rcu_note_context_switch" originated from
   SyS_timer_gettime(). The issue was always there, it is now
   visible. Reported by Grygorii Strashko and Daniel Wagner.

You can get this release via the git tree at:

git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git 
v4.19-rt1

The RT patch against v4.19 can be found here:


https://cdn.kernel.org/pub/linux/kernel/projects/rt/4.19/older/patch-4.19-rt1.patch.xz

The split quilt queue is available at:


https://cdn.kernel.org/pub/linux/kernel/projects/rt/4.19/older/patches-4.19-rt1.tar.xz

Sebastian


[ANNOUNCE] v4.19-rt1

2018-10-29 Thread Sebastian Andrzej Siewior
Dear RT folks!

I'm pleased to announce the v4.19-rt1 patch set. 

Changes since v4.18.16-rt9:

  - rebase to v4.19

Known issues
 - A warning triggered in "rcu_note_context_switch" originated from
   SyS_timer_gettime(). The issue was always there, it is now
   visible. Reported by Grygorii Strashko and Daniel Wagner.

You can get this release via the git tree at:

git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git 
v4.19-rt1

The RT patch against v4.19 can be found here:


https://cdn.kernel.org/pub/linux/kernel/projects/rt/4.19/older/patch-4.19-rt1.patch.xz

The split quilt queue is available at:


https://cdn.kernel.org/pub/linux/kernel/projects/rt/4.19/older/patches-4.19-rt1.tar.xz

Sebastian


Re: [GIT PULL] rpmsg updates for v4.20

2018-10-29 Thread Linus Torvalds
On Mon, Oct 29, 2018 at 4:30 PM Bjorn Andersson
 wrote:
>
> rpmsg updates for v4.20

Pulled (along with the remoteproc branch),

Linus


Re: [GIT PULL] rpmsg updates for v4.20

2018-10-29 Thread Linus Torvalds
On Mon, Oct 29, 2018 at 4:30 PM Bjorn Andersson
 wrote:
>
> rpmsg updates for v4.20

Pulled (along with the remoteproc branch),

Linus


Re: Re: [PATCH] fs/proc: introduce /proc/stat2 file

2018-10-29 Thread Alexey Dobriyan
On Mon, Oct 29, 2018 at 11:40:47PM +, Daniel Colascione wrote:
> On Mon, Oct 29, 2018 at 11:34 PM, Alexey Dobriyan  wrote:
> >> I'd much rather move to a model in which userspace *explicitly* tells
> >> the kernel which fields it wants, with the kernel replying with just
> >> those particular fields, maybe in their raw binary representations.
> >> The ASCII-text bag-of-everything files would remain available for
> >> ad-hoc and non-performance critical use, but programs that cared about
> >> performance would have an efficient bypass. One concrete approach is
> >> to let users open up today's proc files and, instead of read(2)ing a
> >> text blob, use an ioctl to retrieve specified and targeted information
> >> of the sort that would normally be encoded in the text blob. Because
> >> callers would open the same file when using either the text or binary
> >> interfaces, little would have to change, and it'd be easy to implement
> >> fallbacks when a particular system doesn't support a particular
> >> fast-path ioctl.
> >
> > You've just reinvented systems calls.
> 
> I don't know why you say so. There are important benefits that come
> from using an ioctl on a proc file FD instead of a plain system call.
> Procfs files have file permissions,auditing, SCM_RIGHTS-ability, PID
> race immunity, and other things that you wouldn't get from a plain
> "get this information about this PID" system call.

This whole thread started because /proc/stat is slow and every number in
/proc/stat is system global.

If you continue adding stuff to /proc, one day someone will notice that
core VFS adds considerable overhead, at this point there is nothing
anyone could do.

I'd strongly advise to look at what this DB actually needs and deliver
just that.

Very little of other things apply to /proc/stat:
* system call auditing exists,
* /proc/stat is world readable and continues to be so,
* thus passing descriptor around is pretty useless,
* $PID race doesn't apply.

Additionally passing descriptors feels like party trick.
I suspect that's not how people use statistics in /proc: they run
processes and one priviledged enough monitoring daemon collects data,
otherwise userspace needs to cooperate with monitoring userspace
which of course doesn't happen.

PID race is solved by giving out descriptors which pin "struct pid".
Which is how the race is solved currently: dentry pins inode, inode
pins "struct pid".


Re: Re: [PATCH] fs/proc: introduce /proc/stat2 file

2018-10-29 Thread Alexey Dobriyan
On Mon, Oct 29, 2018 at 11:40:47PM +, Daniel Colascione wrote:
> On Mon, Oct 29, 2018 at 11:34 PM, Alexey Dobriyan  wrote:
> >> I'd much rather move to a model in which userspace *explicitly* tells
> >> the kernel which fields it wants, with the kernel replying with just
> >> those particular fields, maybe in their raw binary representations.
> >> The ASCII-text bag-of-everything files would remain available for
> >> ad-hoc and non-performance critical use, but programs that cared about
> >> performance would have an efficient bypass. One concrete approach is
> >> to let users open up today's proc files and, instead of read(2)ing a
> >> text blob, use an ioctl to retrieve specified and targeted information
> >> of the sort that would normally be encoded in the text blob. Because
> >> callers would open the same file when using either the text or binary
> >> interfaces, little would have to change, and it'd be easy to implement
> >> fallbacks when a particular system doesn't support a particular
> >> fast-path ioctl.
> >
> > You've just reinvented systems calls.
> 
> I don't know why you say so. There are important benefits that come
> from using an ioctl on a proc file FD instead of a plain system call.
> Procfs files have file permissions,auditing, SCM_RIGHTS-ability, PID
> race immunity, and other things that you wouldn't get from a plain
> "get this information about this PID" system call.

This whole thread started because /proc/stat is slow and every number in
/proc/stat is system global.

If you continue adding stuff to /proc, one day someone will notice that
core VFS adds considerable overhead, at this point there is nothing
anyone could do.

I'd strongly advise to look at what this DB actually needs and deliver
just that.

Very little of other things apply to /proc/stat:
* system call auditing exists,
* /proc/stat is world readable and continues to be so,
* thus passing descriptor around is pretty useless,
* $PID race doesn't apply.

Additionally passing descriptors feels like party trick.
I suspect that's not how people use statistics in /proc: they run
processes and one priviledged enough monitoring daemon collects data,
otherwise userspace needs to cooperate with monitoring userspace
which of course doesn't happen.

PID race is solved by giving out descriptors which pin "struct pid".
Which is how the race is solved currently: dentry pins inode, inode
pins "struct pid".


  1   2   3   4   5   6   7   8   9   10   >