Re: [PATCH 3/3] imx8mq.dtsi: add the sdma nodes
Hi Angus,

What is the status of this patch? Most likely this should go through Shawn's tree. I noticed that I have also sent a similar patch to Shawn: https://www.spinics.net/lists/arm-kernel/msg708424.html

So, let's coordinate and work better on this. I am now preparing another series in which I add the SAI nodes and enable the WM codec on imx8MQ. If you don't mind, I will pick the relevant changes from this patch, add them to my series, and then send them to Shawn.

thanks,
Daniel.

On Sun, Jan 20, 2019 at 4:05 AM Angus Ainslie (Purism) wrote:
>
> Add the sdma nodes to the base devicetree for the imx8mq
>
> Signed-off-by: Angus Ainslie (Purism)
> ---
>  arch/arm64/boot/dts/freescale/imx8mq.dtsi | 31 +++
>  1 file changed, 31 insertions(+)
>
> diff --git a/arch/arm64/boot/dts/freescale/imx8mq.dtsi b/arch/arm64/boot/dts/freescale/imx8mq.dtsi
> index c0402375e7c1..0b9a9b5ae7b7 100644
> --- a/arch/arm64/boot/dts/freescale/imx8mq.dtsi
> +++ b/arch/arm64/boot/dts/freescale/imx8mq.dtsi
> @@ -336,6 +336,19 @@
>                         clocks = <&clk IMX8MQ_CLK_WDOG3_ROOT>;
>                         status = "disabled";
>                 };
> +
> +               sdma2: sdma@302c0000 {
> +                       compatible = "fsl,imx8mq-sdma", "fsl,imx7d-sdma";
> +                       reg = <0x302c0000 0x10000>;
> +                       interrupts = IRQ_TYPE_LEVEL_HIGH>;
> +                       clocks = <&clk IMX8MQ_CLK_SDMA2_ROOT>,
> +                                <&clk IMX8MQ_CLK_SDMA2_ROOT>;
> +                       clock-names = "ipg", "ahb";
> +                       #dma-cells = <3>;
> +                       fsl,sdma-ram-script-name = "imx/sdma/sdma-imx7d.bin";
> +                       fsl,ratio-1-1;
> +                       status = "disabled";
> +               };
>         };
>
>         bus@30400000 { /* AIPS2 */
> @@ -370,6 +383,8 @@
>                         clocks = <&clk IMX8MQ_CLK_UART3_ROOT>,
>                                  <&clk IMX8MQ_CLK_UART3_ROOT>;
>                         clock-names = "ipg", "per";
> +                       dmas = <&sdma1 26 4 0>, <&sdma1 27 4 0>;
> +                       dma-names = "rx", "tx";
>                         status = "disabled";
>                 };
>
> @@ -381,6 +396,8 @@
>                         clocks = <&clk IMX8MQ_CLK_UART2_ROOT>,
>                                  <&clk IMX8MQ_CLK_UART2_ROOT>;
>                         clock-names = "ipg", "per";
> +                       dmas = <&sdma1 24 4 0>, <&sdma1 25 4 0>;
> +                       dma-names = "rx", "tx";
>                         status = "disabled";
>                 };
>
> @@ -432,6 +449,8 @@
>                         clocks = <&clk IMX8MQ_CLK_UART4_ROOT>,
>                                  <&clk IMX8MQ_CLK_UART4_ROOT>;
>                         clock-names = "ipg", "per";
> +                       dmas = <&sdma1 28 4 0>, <&sdma1 29 4 0>;
> +                       dma-names = "rx", "tx";
>                         status = "disabled";
>                 };
>
> @@ -465,6 +484,18 @@
>                         status = "disabled";
>                 };
>
> +               sdma1: sdma@30bd0000 {
> +                       compatible = "fsl,imx8mq-sdma", "fsl,imx7d-sdma";
> +                       reg = <0x30bd0000 0x10000>;
> +                       interrupts = ;
> +                       clocks = <&clk IMX8MQ_CLK_SDMA1_ROOT>,
> +                                <&clk IMX8MQ_CLK_SDMA1_ROOT>;
> +                       clock-names = "ipg", "ahb";
> +                       #dma-cells = <3>;
> +                       fsl,sdma-ram-script-name = "imx/sdma/sdma-imx7d.bin";
> +                       status = "disabled";
> +               };
> +
>                 fec1: ethernet@30be0000 {
>                         compatible = "fsl,imx8mq-fec", "fsl,imx6sx-fec";
>                         reg = <0x30be0000 0x10000>;
> --
> 2.17.1
>
Re: [PATCH V3 2/2] dt-bindings: imx: add new resources to scu resource table
On Wed, 20 Feb 2019 14:35:06 +0000, Anson Huang wrote:

> Add new resources as below according to the latest system
> controller firmware for new features:
>
>     IMX_SC_R_PERF
>     IMX_SC_R_OCRAM
>     IMX_SC_R_DMA_5_CH0
>     IMX_SC_R_DMA_5_CH1
>     IMX_SC_R_DMA_5_CH2
>     IMX_SC_R_DMA_5_CH3
>     IMX_SC_R_ATTESTATION
>
> Signed-off-by: Anson Huang
> ---
> No changes since V2, just split the patch into 2: one patch to remove
> resources and the other to add new resources.
> ---
>  include/dt-bindings/firmware/imx/rsrc.h | 11 ---
>  1 file changed, 8 insertions(+), 3 deletions(-)
>

Reviewed-by: Rob Herring
Re: [PATCH v3 0/8] nic: thunderx: fix communication races between VF & PF
From: Vadim Lomovtsev
Date: Wed, 20 Feb 2019 11:02:42 +0000

> The ThunderX CN88XX NIC Virtual Function driver uses a mailbox interface
> to communicate with the physical function driver. Each VF has its own pair
> of mailbox registers to read from and write to. The mailbox registers
> have no protection from possible races, so it has to be implemented
> on the software side.
>
> After long-term testing with a loop of 'ip link set up/down'
> commands, it was found that there are two possible scenarios when
> a race condition appears:
>  1. VF receives a link change message from PF while VF is sending an RX mode
>     configuration message to PF at the same time from a separate thread.
>  2. PF receives an RX mode configuration from VF while, at the same time,
>     in a separate thread, PF detects a link status change and sends the
>     appropriate message to the particular VF.
>
> Both cases lead to the mailbox data being rewritten, the NIC VF messaging
> control data being updated incorrectly, and the communication sequence
> getting broken.
>
> This patch series addresses the race condition in VF & PF communication.
>
> Changes:
> v1 -> v2
>  - 0000: correct typo in cover letter subject: 'betwen' -> 'between';
>  - move link state polling request task from PF to VF
>    instead of checking status of mailbox irq;
> v2 -> v3
>  - 0003: change return type of nicvf_send_cfg_done() function
>    from int to void;
>  - 0007: update subject and remove unused variable 'netdev'
>    from nicvf_link_status_check_task() function;

Series applied, thanks.
Re: [PATCH 6/6] perf thread-stack: Hide x86 retpolines
Em Wed, Jan 09, 2019 at 11:18:35AM +0200, Adrian Hunter escreveu:

> x86 retpoline functions pollute the call graph by showing up everywhere
> there is an indirect branch, but they do not really mean anything. Make
> changes so that the default retpoline functions will no longer appear in
> the call graph. Note this only affects the call graph, since all the
> original branches are left unchanged.
>
> This does not handle function return thunks, nor is there any improvement
> for the handling of inline thunks or extern thunks.
>
> Example:
>
> $ cat simple-retpoline.c
> __attribute__((noinline)) int bar(void)
> {
>         return -1;
> }
>
> int foo(void)
> {
>         return bar() + 1;
> }
>
> __attribute__((indirect_branch("thunk"))) int main()
> {
>         int (*volatile fn)(void) = foo;
>
>         fn();
>         return fn();
> }
> $ gcc -ggdb3 -Wall -Wextra -O2 -o simple-retpoline simple-retpoline.c
> $ objdump -d simple-retpoline
>
> 1040 <main>:
>  1040:  48 83 ec 18           sub    $0x18,%rsp
>  1044:  48 8d 05 25 01 00 00  lea    0x125(%rip),%rax   # 1170 <foo>
>  104b:  48 89 44 24 08        mov    %rax,0x8(%rsp)
>  1050:  48 8b 44 24 08        mov    0x8(%rsp),%rax
>  1055:  e8 1f 01 00 00        callq  1179 <__x86_indirect_thunk_rax>
>  105a:  48 8b 44 24 08        mov    0x8(%rsp),%rax
>  105f:  48 83 c4 18           add    $0x18,%rsp
>  1063:  e9 11 01 00 00        jmpq   1179 <__x86_indirect_thunk_rax>
>
> 1160 <bar>:
>  1160:  b8 ff ff ff ff        mov    $0xffffffff,%eax
>  1165:  c3                    retq
>
> 1170 <foo>:
>  1170:  e8 eb ff ff ff        callq  1160 <bar>
>  1175:  83 c0 01              add    $0x1,%eax
>  1178:  c3                    retq
>
> 1179 <__x86_indirect_thunk_rax>:
>  1179:  e8 07 00 00 00        callq  1185 <__x86_indirect_thunk_rax+0xc>
>  117e:  f3 90                 pause
>  1180:  0f ae e8              lfence
>  1183:  eb f9                 jmp    117e <__x86_indirect_thunk_rax+0x5>
>  1185:  48 89 04 24           mov    %rax,(%rsp)
>  1189:  c3                    retq
>
> $ perf record -o simple-retpoline.perf.data -e intel_pt/cyc/u ./simple-retpoline
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0,017 MB simple-retpoline.perf.data ]
> $ perf script -i simple-retpoline.perf.data --itrace=be -s ~/libexec/perf-core/scripts/python/export-to-sqlite.py simple-retpoline.db branches calls
> 2019-01-08 14:03:37.851655 Creating database...
> 2019-01-08 14:03:37.863256 Writing records...
> 2019-01-08 14:03:38.069750 Adding indexes
> 2019-01-08 14:03:38.078799 Done
> $ ~/libexec/perf-core/scripts/python/exported-sql-viewer.py simple-retpoline.db
>
> Before:
>
>     main
>         -> __x86_indirect_thunk_rax
>             -> __x86_indirect_thunk_rax
>                 -> foo
>                     -> bar
>
> After:
>
>     main
>         -> foo
>             -> bar
>
> Signed-off-by: Adrian Hunter
> ---
>  tools/perf/util/thread-stack.c | 112 -
>  1 file changed, 109 insertions(+), 3 deletions(-)
>
> diff --git a/tools/perf/util/thread-stack.c b/tools/perf/util/thread-stack.c
> index 632c07a125ab..805e30836460 100644
> --- a/tools/perf/util/thread-stack.c
> +++ b/tools/perf/util/thread-stack.c
> @@ -20,6 +20,7 @@
>  #include "thread.h"
>  #include "event.h"
>  #include "machine.h"
> +#include "env.h"
>  #include "util.h"
>  #include "debug.h"
>  #include "symbol.h"
> @@ -29,6 +30,19 @@
>
>  #define STACK_GROWTH 2048
>
> +/*
> + * State of retpoline detection.
> + *
> + * RETPOLINE_NONE: no retpoline detection
> + * X86_RETPOLINE_POSSIBLE: x86 retpoline possible
> + * X86_RETPOLINE_DETECTED: x86 retpoline detected
> + */
> +enum retpoline_state_t {
> +	RETPOLINE_NONE,
> +	X86_RETPOLINE_POSSIBLE,
> +	X86_RETPOLINE_DETECTED,
> +};
> +
>  /**
>   * struct thread_stack_entry - thread stack entry.
>   * @ret_addr: return address
> @@ -64,6 +78,7 @@ struct thread_stack_entry {
>   * @crp: call/return processor
>   * @comm: current comm
>   * @arr_sz: size of array if this is the first element of an array
> + * @rstate: used to detect retpolines
>   */
>  struct thread_stack {
>  	struct thread_stack_entry *stack;
> @@ -76,6 +91,7 @@ struct thread_stack {
>  	struct call_return_processor *crp;
>  	struct comm *comm;
>  	unsigned int arr_sz;
> +	enum retpoline_state_t rstate;
>  };
>
>  /*
> @@ -115,10 +131,16 @@ static int thread_stack__init(struct thread_stack *ts, struct thread *thread,
>  	if (err)
>  		return err;
>
> -	if (thread->mg &&
[v6 PATCH 3/6] RISC-V: Remove NR_CPUs check during hartid search from DT
In a non-SMP configuration, the hartid can be higher than NR_CPUS, so riscv_of_processor_hartid() should not compare the hartid with NR_CPUS in that case. Moreover, this function checks all the DT properties of a hart node, so the NR_CPUS comparison seems out of place.

Signed-off-by: Atish Patra
Reviewed-by: Christoph Hellwig
Reviewed-by: Anup Patel
---
 arch/riscv/kernel/cpu.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
index d1d9bfd5..cf2fca12 100644
--- a/arch/riscv/kernel/cpu.c
+++ b/arch/riscv/kernel/cpu.c
@@ -34,10 +34,6 @@ int riscv_of_processor_hartid(struct device_node *node)
 		pr_warn("Found CPU without hart ID\n");
 		return -ENODEV;
 	}
-	if (hart >= NR_CPUS) {
-		pr_info("Found hart ID %d, which is above NR_CPUs. Disabling this hart\n", hart);
-		return -ENODEV;
-	}
 	if (!of_device_is_available(node)) {
 		pr_info("CPU with hartid=%d is not available\n", hart);
--
2.7.4
[v6 PATCH 0/6] Various SMP related fixes
The existing upstream kernel doesn't boot for a non-SMP configuration. This patch series addresses various issues with non-SMP configurations. The series is based on 5.0-rc7 plus Johan's patch series mentioned below. Tested on both QEMU and the HiFive Unleashed board using both OpenSBI & BBL.

https://lore.kernel.org/lkml/20190118140308.9599-1-jo...@kernel.org/

Changes from v5 -> v6
1. Dropped driver patches as they have been accepted via the drivers tree.

Changes from v4 -> v5
1. Continue processing other harts even if the isa string is incorrect for a single hart during HWCAP processing.

Changes from v3 -> v4
1. Fixed commit text length issues.
2. Updated hwcap patch to use common capabilities of all harts.
3. Rebased on Johan's patch series.

Changes from v2 -> v3
1. Fixed spurious white space.
2. Added lockdep for the smpboot completion variable.
3. Added a sanity check for hwcap.

Changes from v1 -> v2
1. Moved the cpuid to hartid map to smp.c from setup.c.
2. Split the 3rd patch into several small patches based on logical grouping.
3. Added a new patch that fixes an issue in the hwcap query.
4. Changed the title of the patch series.

Atish Patra (6):
  RISC-V: Do not wait indefinitely in __cpu_up
  RISC-V: Move cpuid to hartid mapping to SMP.
  RISC-V: Remove NR_CPUs check during hartid search from DT
  RISC-V: Allow hartid-to-cpuid function to fail.
  RISC-V: Compare cpuid with NR_CPUS before mapping.
  RISC-V: Assign hwcap as per common capabilities.

 arch/riscv/include/asm/smp.h   | 18 +-
 arch/riscv/kernel/cpu.c        |  4 ----
 arch/riscv/kernel/cpufeature.c | 41 ++---
 arch/riscv/kernel/setup.c      |  9 -
 arch/riscv/kernel/smp.c        | 10 +-
 arch/riscv/kernel/smpboot.c    | 20 +---
 6 files changed, 61 insertions(+), 41 deletions(-)

--
2.7.4
[v6 PATCH 4/6] RISC-V: Allow hartid-to-cpuid function to fail.
It is perfectly okay to call riscv_hartid_to_cpuid() for a hartid that is not mapped to a CPU id. This can happen if the calling function retrieves the hartid from DT but that hart was never brought online by the firmware or kernel for whatever reason. There is no need to BUG() in that case. A negative error return is sufficient, and the calling function should always check the return value.

Signed-off-by: Atish Patra
Reviewed-by: Anup Patel
Reviewed-by: Christoph Hellwig
---
 arch/riscv/kernel/smp.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/riscv/kernel/smp.c b/arch/riscv/kernel/smp.c
index b69883c6..ca99f0fb 100644
--- a/arch/riscv/kernel/smp.c
+++ b/arch/riscv/kernel/smp.c
@@ -60,7 +60,6 @@ int riscv_hartid_to_cpuid(int hartid)
 		return i;

 	pr_err("Couldn't find cpu id for hartid [%d]\n", hartid);
-	BUG();
 	return i;
 }
--
2.7.4
[v6 PATCH 2/6] RISC-V: Move cpuid to hartid mapping to SMP.
Currently, the logical CPU id to physical hartid mapping is defined for both SMP and non-SMP configurations. This is not required, as we need it only for SMP configurations. For the non-SMP use case, the mapping function can directly return boot_cpu_hartid. The reverse mapping function, i.e. hartid to cpuid, can be called for any valid but not booted hart, so it should return the default cpu 0 only for the boot hartid.

Signed-off-by: Atish Patra
Reviewed-by: Anup Patel
Reviewed-by: Christoph Hellwig
---
 arch/riscv/include/asm/smp.h | 18 +-
 arch/riscv/kernel/setup.c    |  9 -
 arch/riscv/kernel/smp.c      |  9 +
 3 files changed, 22 insertions(+), 14 deletions(-)

diff --git a/arch/riscv/include/asm/smp.h b/arch/riscv/include/asm/smp.h
index 41aa73b4..636a934f 100644
--- a/arch/riscv/include/asm/smp.h
+++ b/arch/riscv/include/asm/smp.h
@@ -19,16 +19,17 @@
 #include

 #define INVALID_HARTID ULONG_MAX
+
+struct seq_file;
+extern unsigned long boot_cpu_hartid;
+
+#ifdef CONFIG_SMP
 /*
  * Mapping between linux logical cpu index and hartid.
  */
 extern unsigned long __cpuid_to_hartid_map[NR_CPUS];
 #define cpuid_to_hartid_map(cpu)    __cpuid_to_hartid_map[cpu]

-struct seq_file;
-
-#ifdef CONFIG_SMP
-
 /* print IPI stats */
 void show_ipi_stats(struct seq_file *p, int prec);

@@ -58,7 +59,14 @@ static inline void show_ipi_stats(struct seq_file *p, int prec)

 static inline int riscv_hartid_to_cpuid(int hartid)
 {
-	return 0;
+	if (hartid == boot_cpu_hartid)
+		return 0;
+
+	return -1;
+}
+static inline unsigned long cpuid_to_hartid_map(int cpu)
+{
+	return boot_cpu_hartid;
 }

 static inline void riscv_cpuid_to_hartid_mask(const struct cpumask *in,
diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
index fb09e013..61c81616 100644
--- a/arch/riscv/kernel/setup.c
+++ b/arch/riscv/kernel/setup.c
@@ -61,15 +61,6 @@ EXPORT_SYMBOL(empty_zero_page);

 atomic_t hart_lottery;
 unsigned long boot_cpu_hartid;
-unsigned long __cpuid_to_hartid_map[NR_CPUS] = {
-	[0 ... NR_CPUS-1] = INVALID_HARTID
-};
-
-void __init smp_setup_processor_id(void)
-{
-	cpuid_to_hartid_map(0) = boot_cpu_hartid;
-}
-
 #ifdef CONFIG_BLK_DEV_INITRD
 static void __init setup_initrd(void)
 {
diff --git a/arch/riscv/kernel/smp.c b/arch/riscv/kernel/smp.c
index 246635ea..b69883c6 100644
--- a/arch/riscv/kernel/smp.c
+++ b/arch/riscv/kernel/smp.c
@@ -36,6 +36,15 @@ enum ipi_message_type {
 	IPI_MAX
 };

+unsigned long __cpuid_to_hartid_map[NR_CPUS] = {
+	[0 ... NR_CPUS-1] = INVALID_HARTID
+};
+
+void __init smp_setup_processor_id(void)
+{
+	cpuid_to_hartid_map(0) = boot_cpu_hartid;
+}
+
 /* A collection of single bit ipi messages. */
 static struct {
 	unsigned long stats[IPI_MAX] ____cacheline_aligned;
--
2.7.4
[v6 PATCH 1/6] RISC-V: Do not wait indefinitely in __cpu_up
In the SMP path, __cpu_up() waits indefinitely for the other CPU to come online. This is wrong, as the other CPU might be disabled in machine mode and the possible CPU mask is set from the cpus present in DT. Introduce a completion variable and wait for only one second.

Signed-off-by: Atish Patra
Reviewed-by: Anup Patel
Reviewed-by: Christoph Hellwig
---
 arch/riscv/kernel/smpboot.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/arch/riscv/kernel/smpboot.c b/arch/riscv/kernel/smpboot.c
index 6e281325..d369b669 100644
--- a/arch/riscv/kernel/smpboot.c
+++ b/arch/riscv/kernel/smpboot.c
@@ -39,6 +39,7 @@

 void *__cpu_up_stack_pointer[NR_CPUS];
 void *__cpu_up_task_pointer[NR_CPUS];
+static DECLARE_COMPLETION(cpu_running);

 void __init smp_prepare_boot_cpu(void)
 {
@@ -77,6 +78,7 @@ void __init setup_smp(void)

 int __cpu_up(unsigned int cpu, struct task_struct *tidle)
 {
+	int ret = 0;
 	int hartid = cpuid_to_hartid_map(cpu);
 	tidle->thread_info.cpu = cpu;
@@ -92,10 +94,16 @@ int __cpu_up(unsigned int cpu, struct task_struct *tidle)
 			  task_stack_page(tidle) + THREAD_SIZE);
 	WRITE_ONCE(__cpu_up_task_pointer[hartid], tidle);

-	while (!cpu_online(cpu))
-		cpu_relax();
+	lockdep_assert_held(&cpu_running);
+	wait_for_completion_timeout(&cpu_running,
+					   msecs_to_jiffies(1000));

-	return 0;
+	if (!cpu_online(cpu)) {
+		pr_crit("CPU%u: failed to come online\n", cpu);
+		ret = -EIO;
+	}
+
+	return ret;
 }

 void __init smp_cpus_done(unsigned int max_cpus)
@@ -121,6 +129,7 @@ asmlinkage void __init smp_callin(void)
 	 * a local TLB flush right now just in case.
 	 */
 	local_flush_tlb_all();
+	complete(&cpu_running);
 	/*
 	 * Disable preemption before enabling interrupts, so we don't try to
 	 * schedule a CPU that hasn't actually started yet.
--
2.7.4
[v6 PATCH 5/6] RISC-V: Compare cpuid with NR_CPUS before mapping.
We should never have a cpuid greater than NR_CPUS. Compare with NR_CPUS before creating the mapping between logical and physical CPU ids. This is also mandatory, as the NR_CPUS check has been removed from riscv_of_processor_hartid().

Signed-off-by: Atish Patra
Reviewed-by: Anup Patel
Reviewed-by: Christoph Hellwig
---
 arch/riscv/kernel/smpboot.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/riscv/kernel/smpboot.c b/arch/riscv/kernel/smpboot.c
index d369b669..eb533b5c 100644
--- a/arch/riscv/kernel/smpboot.c
+++ b/arch/riscv/kernel/smpboot.c
@@ -66,6 +66,11 @@ void __init setup_smp(void)
 			found_boot_cpu = 1;
 			continue;
 		}
+		if (cpuid >= NR_CPUS) {
+			pr_warn("Invalid cpuid [%d] for hartid [%d]\n",
+				cpuid, hart);
+			break;
+		}

 		cpuid_to_hartid_map(cpuid) = hart;
 		set_cpu_possible(cpuid, true);
--
2.7.4
[v6 PATCH 6/6] RISC-V: Assign hwcap as per common capabilities.
Currently, we set hwcap based on the first valid hart from DT. This may not always be correct, as that hart might not be the current booting cpu or may have different capabilities. Set hwcap to the capabilities supported by all possible harts with "okay" status.

Signed-off-by: Atish Patra
---
 arch/riscv/kernel/cpufeature.c | 41 ++++++++++++++++---------------
 1 file changed, 22 insertions(+), 19 deletions(-)

diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index e7a4701f..bc29b010 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -20,6 +20,7 @@
 #include
 #include
 #include
+#include

 unsigned long elf_hwcap __read_mostly;
 #ifdef CONFIG_FPU
@@ -42,28 +43,30 @@ void riscv_fill_hwcap(void)

 	elf_hwcap = 0;

-	/*
-	 * We don't support running Linux on hertergenous ISA systems. For
-	 * now, we just check the ISA of the first "okay" processor.
-	 */
 	for_each_of_cpu_node(node) {
-		if (riscv_of_processor_hartid(node) >= 0)
-			break;
-	}
-	if (!node) {
-		pr_warn("Unable to find \"cpu\" devicetree entry\n");
-		return;
-	}
+		unsigned long this_hwcap = 0;

-	if (of_property_read_string(node, "riscv,isa", &isa)) {
-		pr_warn("Unable to find \"riscv,isa\" devicetree entry\n");
-		of_node_put(node);
-		return;
-	}
-	of_node_put(node);
+		if (riscv_of_processor_hartid(node) < 0)
+			continue;

-	for (i = 0; i < strlen(isa); ++i)
-		elf_hwcap |= isa2hwcap[(unsigned char)(isa[i])];
+		if (of_property_read_string(node, "riscv,isa", &isa)) {
+			pr_warn("Unable to find \"riscv,isa\" devicetree entry\n");
+			continue;
+		}
+
+		for (i = 0; i < strlen(isa); ++i)
+			this_hwcap |= isa2hwcap[(unsigned char)(isa[i])];
+
+		/*
+		 * All "okay" harts should have the same isa. Set HWCAP based on
+		 * the common capabilities of every "okay" hart, in case they
+		 * don't.
+		 */
+		if (elf_hwcap)
+			elf_hwcap &= this_hwcap;
+		else
+			elf_hwcap = this_hwcap;
+	}

 	/* We don't support systems with F but without D, so mask those out
 	 * here. */
--
2.7.4
Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault
On Fri, 22 Feb 2019 11:34:58 -0800
Alexei Starovoitov wrote:

> so you're saying you will break existing kprobe scripts?

Yes, we may.

> I don't think it's a good idea.
> It's not acceptable to break the bpf_probe_read uapi.

Then you may need to add more code to determine whether the address is in user space or in the kernel, and take the appropriate route, before calling probe_kernel_read().

-- Steve
Re: [PATCH] r8152: Fix an error on RTL8153-BD MAC Address Passthrough support
From: David Chen
Date: Wed, 20 Feb 2019 13:47:19 +0800

> From: David Chen
>
> RTL8153-BD is used in the Dell DA300 type-C dongle.
> Add RTL8153-BD support to activate MAC address pass-through on the DA300.
> This applies a correction to a previously submitted patch in the net.git tree.
>
> Signed-off-by: David Chen

Applied, thanks.
[PATCH v6 7/7] ARM: tegra: Add firmware calls required for suspend-resume on Tegra30
In order to suspend-resume the CPU with the Trusted Foundations firmware present on Tegra30, the LP1/LP2 boot vectors and CPU caches need to be set up using firmware calls, and the suspend code shall then avoid re-disabling the parts that were disabled by the firmware.

Signed-off-by: Dmitry Osipenko
---
 arch/arm/mach-tegra/pm.c            | 49 +
 arch/arm/mach-tegra/reset-handler.S | 26 +++
 arch/arm/mach-tegra/sleep.S         | 14 ++---
 3 files changed, 84 insertions(+), 5 deletions(-)

diff --git a/arch/arm/mach-tegra/pm.c b/arch/arm/mach-tegra/pm.c
index 1ad5719779b0..f209f59e0daf 100644
--- a/arch/arm/mach-tegra/pm.c
+++ b/arch/arm/mach-tegra/pm.c
@@ -33,11 +33,13 @@

 #include
 #include
+#include
 #include
 #include
 #include
 #include
 #include
+#include

 #include "iomap.h"
 #include "pm.h"
@@ -159,6 +161,28 @@ int tegra_cpu_do_idle(void)

 static int tegra_sleep_cpu(unsigned long v2p)
 {
+	/*
+	 * L2 cache disabling using the kernel API is only allowed when all
+	 * secondary CPUs are offline. The cache has to be disabled with
+	 * MMU-on if cache maintenance is done via the Trusted Foundations
+	 * firmware. Note that CPUIDLE won't ever enter powergate on Tegra30
+	 * if any of the secondary CPUs is online, and this is the LP2-idle
+	 * code-path only for Tegra20/30.
+	 */
+	if (trusted_foundations_registered())
+		outer_disable();
+
+	/*
+	 * Note that besides setting up the CPU reset vector this firmware
+	 * call may also do the following, depending on the FW version:
+	 *  1) Disable L2. But this doesn't matter since we already
+	 *     disabled the L2.
+	 *  2) Disable D-cache. This needs to be taken into account in
+	 *     particular by tegra_disable_clean_inv_dcache(), which
+	 *     shall avoid the re-disable.
+	 */
+	call_firmware_op(prepare_idle, TF_PM_MODE_LP2);
+
 	setup_mm_for_reboot();
 	tegra_sleep_cpu_finish(v2p);

@@ -197,6 +221,14 @@ void tegra_idle_lp2_last(void)

 	cpu_suspend(PHYS_OFFSET - PAGE_OFFSET, &tegra_sleep_cpu);

+	/*
+	 * Resume L2 cache if it wasn't re-enabled early during resume,
+	 * which is the case for Tegra30 that has to re-enable the cache
+	 * via firmware call. In other cases the cache is already enabled and
+	 * hence re-enabling is a no-op. This is always a no-op on Tegra114+.
+	 */
+	outer_resume();
+
 	restore_cpu_complex();
 	cpu_cluster_pm_exit();
 }
@@ -215,6 +247,15 @@ enum tegra_suspend_mode tegra_pm_validate_suspend_mode(

 static int tegra_sleep_core(unsigned long v2p)
 {
+	/*
+	 * The cache has to be disabled with MMU-on if cache maintenance is
+	 * done via the Trusted Foundations firmware. This is a no-op on
+	 * Tegra114+.
+	 */
+	if (trusted_foundations_registered())
+		outer_disable();
+
+	call_firmware_op(prepare_idle, TF_PM_MODE_LP1);
+
 	setup_mm_for_reboot();
 	tegra_sleep_core_finish(v2p);

@@ -342,6 +383,14 @@ static int tegra_suspend_enter(suspend_state_t state)

 	cpu_suspend(PHYS_OFFSET - PAGE_OFFSET, tegra_sleep_func);

+	/*
+	 * Resume L2 cache if it wasn't re-enabled early during resume,
+	 * which is the case for Tegra30 that has to re-enable the cache
+	 * via firmware call. In other cases the cache is already enabled and
+	 * hence re-enabling is a no-op.
+	 */
+	outer_resume();
+
 	switch (mode) {
 	case TEGRA_SUSPEND_LP1:
 		tegra_suspend_exit_lp1();
diff --git a/arch/arm/mach-tegra/reset-handler.S b/arch/arm/mach-tegra/reset-handler.S
index 3bf202819534..19a609046547 100644
--- a/arch/arm/mach-tegra/reset-handler.S
+++ b/arch/arm/mach-tegra/reset-handler.S
@@ -20,6 +20,7 @@

 #include
 #include
+#include

 #include
 #include
@@ -76,6 +77,7 @@ ENTRY(tegra_resume)
 	orr	r1, r1, #1
 	str	r1, [r0]
 #endif
+	bl	tegra_resume_trusted_foundations

 #ifdef CONFIG_CACHE_L2X0
 	/* L2 cache resume & re-enable */
@@ -88,6 +90,30 @@ end_ca9_scu_l2_resume:
 	b	cpu_resume
 ENDPROC(tegra_resume)
+
+/*
+ * tegra_resume_trusted_foundations
+ *
+ * Trusted Foundations firmware initialization.
+ *
+ * Doesn't return if firmware is present.
+ * Corrupted registers: r1, r2
+ */
+ENTRY(tegra_resume_trusted_foundations)
+	/* Check whether the Trusted Foundations firmware is present. */
+	mov32	r2, TEGRA_IRAM_BASE + TEGRA_IRAM_RESET_HANDLER_OFFSET
+	ldr	r1, =__tegra_cpu_reset_handler_data_offset + \
+			RESET_DATA(TF_PRESENT)
+	ldr	r1, [r2, r1]
+	cmp	r1, #0
+	reteq	lr
+
+ .arch_extension sec
+	/* First call after suspend wakes firmware. No arguments required. */
+	smc	#0
+
+	b
Re: [PATCH] rpmsg: virtio_rpmsg_bus: acknowledge the received creation message
Thanks for the review.

On Fri, Feb 22, 2019 at 6:44 PM Arnaud Pouliquen wrote:
>
> Hello Xiang,
>
> On 2/12/19 8:13 AM, Xiang Xiao wrote:
> > From: QianWenfa
> >
> > the two-phase handshake lets the client initiate the transfer
> > immediately, without the server side having to send a dummy message first.
>
> As discussed on the OpenAMP mailing list, the first point (from my pov) is
> to figure out whether this should be part of the rpmsg protocol or be service
> dependent (so managed by the rpmsg client itself)...

Here is my thought:
1. The ack message doesn't have any side effect, except that the kernel sends back the local address through the NS channel directly.
2. But without this message, the remote side can't send a message first due to the lack of the kernel address information.

The second limitation is very strange for all developers, I think.

> > Signed-off-by: Wenfa Qian
> > Signed-off-by: Xiang Xiao
> > ---
> >  drivers/rpmsg/virtio_rpmsg_bus.c | 25 -
> >  1 file changed, 20 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/rpmsg/virtio_rpmsg_bus.c b/drivers/rpmsg/virtio_rpmsg_bus.c
> > index 664f957..e323c98 100644
> > --- a/drivers/rpmsg/virtio_rpmsg_bus.c
> > +++ b/drivers/rpmsg/virtio_rpmsg_bus.c
> > @@ -71,6 +71,7 @@ struct virtproc_info {
> >
> >  /* The feature bitmap for virtio rpmsg */
> >  #define VIRTIO_RPMSG_F_NS	0 /* RP supports name service notifications */
> > +#define VIRTIO_RPMSG_F_ACK	1 /* RP supports name service acknowledge */
> >
> >  /**
> >   * struct rpmsg_hdr - common header for all rpmsg messages
> > @@ -115,10 +116,12 @@ struct rpmsg_ns_msg {
> >   *
> >   * @RPMSG_NS_CREATE: a new remote service was just created
> >   * @RPMSG_NS_DESTROY: a known remote service was just destroyed
> > + * @RPMSG_NS_ACK: acknowledge the previous creation message
> >   */
> >  enum rpmsg_ns_flags {
> >  	RPMSG_NS_CREATE		= 0,
> >  	RPMSG_NS_DESTROY	= 1,
> > +	RPMSG_NS_ACK		= 2,
> >  };
> >
> >  /**
> > @@ -330,13 +333,14 @@ static int virtio_rpmsg_announce_create(struct rpmsg_device *rpdev)
> >  	int err = 0;
> >
> >  	/* need to tell remote processor's name service about this channel ? */
> > -	if (rpdev->announce && rpdev->ept &&
> > +	if (rpdev->ept && (rpdev->announce ||
> > +	    virtio_has_feature(vrp->vdev, VIRTIO_RPMSG_F_ACK)) &&
> >  	    virtio_has_feature(vrp->vdev, VIRTIO_RPMSG_F_NS)) {
>
> Regarding this condition I have a concern. If rpdev->announce is false
> but the VIRTIO_RPMSG_F_ACK feature is set, this will generate an ack message.
> Seems that this condition can occur depending on the rpmsg_device
> struct provided on registration, when rpmsg_dev_probe is called.

The ack message can't be sent before the device probe succeeds; otherwise the early ack makes the remote side consider the kernel driver ready and send a message immediately, but nobody can actually receive that message.

> >  		struct rpmsg_ns_msg nsm;
> >
> >  		strncpy(nsm.name, rpdev->id.name, RPMSG_NAME_SIZE);
> >  		nsm.addr = rpdev->ept->addr;
> > -		nsm.flags = RPMSG_NS_CREATE;
> > +		nsm.flags = rpdev->announce ? RPMSG_NS_CREATE : RPMSG_NS_ACK;
> >
> >  		err = rpmsg_sendto(rpdev->ept, &nsm, sizeof(nsm), RPMSG_NS_ADDR);
> >  		if (err)
> > @@ -820,6 +824,7 @@ static int rpmsg_ns_cb(struct rpmsg_device *rpdev, void *data, int len,
> >  	struct rpmsg_channel_info chinfo;
> >  	struct virtproc_info *vrp = priv;
> >  	struct device *dev = &vrp->vdev->dev;
> > +	struct device *tmp;
> >  	int ret;
> >
> >  #if defined(CONFIG_DYNAMIC_DEBUG)
> > @@ -847,21 +852,30 @@ static int rpmsg_ns_cb(struct rpmsg_device *rpdev, void *data, int len,
> >  	msg->name[RPMSG_NAME_SIZE - 1] = '\0';
> >
> >  	dev_info(dev, "%sing channel %s addr 0x%x\n",
> > -		 msg->flags & RPMSG_NS_DESTROY ? "destroy" : "creat",
> > +		 msg->flags == RPMSG_NS_ACK ? "ack" :
> > +		 msg->flags == RPMSG_NS_DESTROY ? "destroy" : "creat",
> >  		 msg->name, msg->addr);
> >
> >  	strncpy(chinfo.name, msg->name, sizeof(chinfo.name));
> >  	chinfo.src = RPMSG_ADDR_ANY;
> >  	chinfo.dst = msg->addr;
> >
> > -	if (msg->flags & RPMSG_NS_DESTROY) {
> > +	if (msg->flags == RPMSG_NS_DESTROY) {
> >  		ret = rpmsg_unregister_device(&vrp->vdev->dev, &chinfo);
> >  		if (ret)
> >  			dev_err(dev, "rpmsg_destroy_channel failed: %d\n", ret);
> > -	} else {
> > +	} else if (msg->flags == RPMSG_NS_CREATE) {
> >  		newch = rpmsg_create_channel(vrp, &chinfo);
> >  		if (!newch)
> >  			dev_err(dev, "rpmsg_create_channel failed\n");
> > +	} else if (msg->flags == RPMSG_NS_ACK) {
> > +		chinfo.dst = RPMSG_ADDR_ANY;
> > +		tmp = rpmsg_find_device(&vrp->vdev->dev, &chinfo);
>
> nit-picking... replace &vrp->vdev->dev by dev and change
[PATCH v6 6/7] ARM: tegra: Always boot CPU in ARM-mode
The CPU always jumps into the reset handler in ARM mode from the Trusted Foundations firmware, hence let's make the CPU always jump into the kernel in ARM mode regardless of the firmware's presence. This is required to make a Thumb-2 kernel work with the Trusted Foundations firmware on Tegra30.

Signed-off-by: Dmitry Osipenko
---
 arch/arm/mach-tegra/reset-handler.S | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm/mach-tegra/reset-handler.S b/arch/arm/mach-tegra/reset-handler.S
index 6bea95d165fa..3bf202819534 100644
--- a/arch/arm/mach-tegra/reset-handler.S
+++ b/arch/arm/mach-tegra/reset-handler.S
@@ -113,6 +113,7 @@ ENTRY(__tegra_cpu_reset_handler_start)
  * must be position-independent.
  */

+	.arm
 	.align L1_CACHE_SHIFT
 ENTRY(__tegra_cpu_reset_handler)
--
2.20.1
[PATCH v6 5/7] ARM: tegra: Don't apply CPU erratas in insecure mode
CPU isn't allowed to touch secure registers while running under secure monitor. Hence skip applying of CPU erratas in the reset handler if Trusted Foundations firmware presents. Partially based on work done by Michał Mirosław [1]. [1] https://www.spinics.net/lists/arm-kernel/msg594768.html Signed-off-by: Dmitry Osipenko --- arch/arm/mach-tegra/reset-handler.S | 23 --- arch/arm/mach-tegra/reset.c | 3 +++ arch/arm/mach-tegra/reset.h | 9 +++-- arch/arm/mach-tegra/sleep-tegra20.S | 4 4 files changed, 26 insertions(+), 13 deletions(-) diff --git a/arch/arm/mach-tegra/reset-handler.S b/arch/arm/mach-tegra/reset-handler.S index 805f306fa6f7..6bea95d165fa 100644 --- a/arch/arm/mach-tegra/reset-handler.S +++ b/arch/arm/mach-tegra/reset-handler.S @@ -29,8 +29,6 @@ #define PMC_SCRATCH41 0x140 -#define RESET_DATA(x) ((TEGRA_RESET_##x)*4) - #ifdef CONFIG_PM_SLEEP /* * tegra_resume @@ -121,6 +119,12 @@ ENTRY(__tegra_cpu_reset_handler) cpsid aif, 0x13 @ SVC mode, interrupts disabled tegra_get_soc_id TEGRA_APB_MISC_BASE, r6 + + adr r12, __tegra_cpu_reset_handler_data + ldr r5, [r12, #RESET_DATA(TF_PRESENT)] + cmp r5, #0 + bne after_errata + #ifdef CONFIG_ARCH_TEGRA_2x_SOC t20_check: cmp r6, #TEGRA20 @@ -155,7 +159,6 @@ after_errata: and r10, r10, #0x3 @ R10 = CPU number mov r11, #1 mov r11, r11, lsl r10 @ R11 = CPU mask - adr r12, __tegra_cpu_reset_handler_data #ifdef CONFIG_SMP /* Does the OS know about this CPU? */ @@ -169,10 +172,9 @@ after_errata: cmp r6, #TEGRA20 bne 1f /* If not CPU0, don't let CPU0 reset CPU1 now that CPU1 is coming up. 
*/ - mov32 r5, TEGRA_IRAM_BASE + TEGRA_IRAM_RESET_HANDLER_OFFSET mov r0, #CPU_NOT_RESETTABLE cmp r10, #0 - strneb r0, [r5, #__tegra20_cpu1_resettable_status_offset] + strneb r0, [r12, #RESET_DATA(RESETTABLE_STATUS)] 1: #endif @@ -277,14 +279,13 @@ ENDPROC(__tegra_cpu_reset_handler) .align L1_CACHE_SHIFT .type __tegra_cpu_reset_handler_data, %object .globl __tegra_cpu_reset_handler_data + .globl __tegra_cpu_reset_handler_data_offset + .equ __tegra_cpu_reset_handler_data_offset, \ + . - __tegra_cpu_reset_handler_start __tegra_cpu_reset_handler_data: - .rept TEGRA_RESET_DATA_SIZE - .long 0 + .rept TEGRA_RESET_DATA_SIZE + .long 0 .endr - .globl __tegra20_cpu1_resettable_status_offset - .equ __tegra20_cpu1_resettable_status_offset, \ - . - __tegra_cpu_reset_handler_start - .byte 0 .align L1_CACHE_SHIFT ENTRY(__tegra_cpu_reset_handler_end) diff --git a/arch/arm/mach-tegra/reset.c b/arch/arm/mach-tegra/reset.c index dc558892753c..b02ae7699842 100644 --- a/arch/arm/mach-tegra/reset.c +++ b/arch/arm/mach-tegra/reset.c @@ -24,6 +24,7 @@ #include #include #include +#include #include "iomap.h" #include "irammap.h" @@ -89,6 +90,8 @@ static void __init tegra_cpu_reset_handler_enable(void) void __init tegra_cpu_reset_handler_init(void) { + __tegra_cpu_reset_handler_data[TEGRA_RESET_TF_PRESENT] = + trusted_foundations_registered(); #ifdef CONFIG_SMP __tegra_cpu_reset_handler_data[TEGRA_RESET_MASK_PRESENT] = diff --git a/arch/arm/mach-tegra/reset.h b/arch/arm/mach-tegra/reset.h index 9c479c7925b8..db0e6b3097ab 100644 --- a/arch/arm/mach-tegra/reset.h +++ b/arch/arm/mach-tegra/reset.h @@ -25,7 +25,11 @@ #define TEGRA_RESET_STARTUP_SECONDARY 3 #define TEGRA_RESET_STARTUP_LP2 4 #define TEGRA_RESET_STARTUP_LP1 5 -#define TEGRA_RESET_DATA_SIZE 6 +#define TEGRA_RESET_RESETTABLE_STATUS 6 +#define TEGRA_RESET_TF_PRESENT 7 +#define TEGRA_RESET_DATA_SIZE 8 + +#define RESET_DATA(x) ((TEGRA_RESET_##x)*4) #ifndef __ASSEMBLY__ @@ -49,7 +53,8 @@ void __tegra_cpu_reset_handler_end(void); 
(u32)__tegra_cpu_reset_handler_start))) #define tegra20_cpu1_resettable_status \ (IO_ADDRESS(TEGRA_IRAM_BASE + TEGRA_IRAM_RESET_HANDLER_OFFSET + \ -(u32)__tegra20_cpu1_resettable_status_offset)) + ((u32)&__tegra_cpu_reset_handler_data[TEGRA_RESET_RESETTABLE_STATUS] - \ +(u32)__tegra_cpu_reset_handler_start))) #endif #define tegra_cpu_reset_handler_offset \ diff --git a/arch/arm/mach-tegra/sleep-tegra20.S b/arch/arm/mach-tegra/sleep-tegra20.S index dedeebfccc55..50d51d3465f6 100644 --- a/arch/arm/mach-tegra/sleep-tegra20.S +++ b/arch/arm/mach-tegra/sleep-tegra20.S @@ -28,6 +28,7 @@ #include #include "irammap.h" +#include "reset.h" #include "sleep.h" #define EMC_CFG 0xc @@ -53,6 +54,9 @@ #define
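The RESET_DATA() macro in the reset.h hunk above converts a slot index into a byte offset, since each slot in __tegra_cpu_reset_handler_data is a 32-bit word; the assembly then does `ldr r5, [r12, #RESET_DATA(TF_PRESENT)]`. Below is a standalone userspace model of that layout, for illustration only (the helper function is not from the patch):

```c
#include <assert.h>
#include <stdint.h>

/* Slot indices mirror the reset.h hunk in the patch above. */
#define TEGRA_RESET_STARTUP_SECONDARY 3
#define TEGRA_RESET_STARTUP_LP2       4
#define TEGRA_RESET_STARTUP_LP1       5
#define TEGRA_RESET_RESETTABLE_STATUS 6
#define TEGRA_RESET_TF_PRESENT        7
#define TEGRA_RESET_DATA_SIZE         8

/* Each slot is a 32-bit word, so the byte offset is index * 4. */
#define RESET_DATA(x) ((TEGRA_RESET_##x) * 4)

/* Illustrative helper (not in the patch): reads the TF_PRESENT slot the
 * same way the assembly's ldr with #RESET_DATA(TF_PRESENT) does. */
static uint32_t tf_present(const uint32_t *data)
{
        return data[RESET_DATA(TF_PRESENT) / 4];
}
```

With this layout, the old separate `__tegra20_cpu1_resettable_status_offset` byte becomes just another word-sized slot, which is what lets the patch drop the extra `.equ`/`.byte` bookkeeping.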
[PATCH v6 3/7] ARM: trusted_foundations: Provide information about whether firmware is registered
Add a helper that provides information about whether the Trusted Foundations firmware operations have been registered. Signed-off-by: Dmitry Osipenko --- arch/arm/firmware/trusted_foundations.c| 5 + arch/arm/include/asm/trusted_foundations.h | 7 +++ 2 files changed, 12 insertions(+) diff --git a/arch/arm/firmware/trusted_foundations.c b/arch/arm/firmware/trusted_foundations.c index c496f4cc49cb..d795ed83a3cd 100644 --- a/arch/arm/firmware/trusted_foundations.c +++ b/arch/arm/firmware/trusted_foundations.c @@ -172,3 +172,8 @@ void of_register_trusted_foundations(void) panic("Trusted Foundation: missing version-minor property\n"); register_trusted_foundations(); } + +bool trusted_foundations_registered(void) +{ + return firmware_ops == &trusted_foundations_ops; +} diff --git a/arch/arm/include/asm/trusted_foundations.h b/arch/arm/include/asm/trusted_foundations.h index cdd48ab7d191..3f23fa493db6 100644 --- a/arch/arm/include/asm/trusted_foundations.h +++ b/arch/arm/include/asm/trusted_foundations.h @@ -31,6 +31,7 @@ #include #include #include +#include #define TF_PM_MODE_LP0 0 #define TF_PM_MODE_LP1 1 @@ -47,6 +48,7 @@ struct trusted_foundations_platform_data { void register_trusted_foundations(struct trusted_foundations_platform_data *pd); void of_register_trusted_foundations(void); +bool trusted_foundations_registered(void); #else /* CONFIG_TRUSTED_FOUNDATIONS */ @@ -74,6 +76,11 @@ static inline void of_register_trusted_foundations(void) if (of_find_compatible_node(NULL, NULL, "tlm,trusted-foundations")) register_trusted_foundations(NULL); } + +static inline bool trusted_foundations_registered(void) +{ + return false; +} #endif /* CONFIG_TRUSTED_FOUNDATIONS */ #endif -- 2.20.1
[PATCH v6 4/7] ARM: tegra: Set up L2 cache using Trusted Foundations firmware
On Tegra30 the L2 cache should be initialized using a firmware call if the CPU is running in insecure mode. Set up the required outer-cache write_sec() callback early during boot using the firmware API; the call is always a NO-OP on T114+ and is a NO-OP on T20/T30 if the Trusted Foundations firmware node isn't present in the device-tree. Signed-off-by: Dmitry Osipenko --- arch/arm/mach-tegra/tegra.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/arm/mach-tegra/tegra.c b/arch/arm/mach-tegra/tegra.c index f9587be48235..1e89cfefbf68 100644 --- a/arch/arm/mach-tegra/tegra.c +++ b/arch/arm/mach-tegra/tegra.c @@ -38,6 +38,7 @@ #include #include +#include #include #include #include @@ -74,6 +75,7 @@ static void __init tegra_init_early(void) { of_register_trusted_foundations(); tegra_cpu_reset_handler_init(); + call_firmware_op(l2x0_init); } static void __init tegra_dt_init_irq(void) -- 2.20.1
[PATCH v6 1/7] ARM: trusted_foundations: Implement L2 cache initialization callback
Implement an L2 cache initialization firmware callback that should be invoked early during boot in order to set up the required outer cache driver's callbacks. Partially based on work done by Michał Mirosław [1]. [1] https://www.spinics.net/lists/arm-kernel/msg594765.html Signed-off-by: Dmitry Osipenko --- arch/arm/firmware/trusted_foundations.c | 46 + 1 file changed, 46 insertions(+) diff --git a/arch/arm/firmware/trusted_foundations.c b/arch/arm/firmware/trusted_foundations.c index 689e6565abfc..3bf61a5933b9 100644 --- a/arch/arm/firmware/trusted_foundations.c +++ b/arch/arm/firmware/trusted_foundations.c @@ -18,8 +18,15 @@ #include #include #include +#include +#include #include +#define TF_CACHE_MAINT 0xfffff100 + +#define TF_CACHE_ENABLE 1 +#define TF_CACHE_DISABLE 2 + #define TF_SET_CPU_BOOT_ADDR_SMC 0xfffff200 #define TF_CPU_PM 0xfffffffc @@ -67,9 +74,48 @@ static int tf_prepare_idle(void) return 0; } +#ifdef CONFIG_CACHE_L2X0 +static void tf_cache_write_sec(unsigned long val, unsigned int reg) +{ + static u32 l2x0_way_mask = 0xff; + static u32 l2x0_aux_ctrl = 0; + + switch (reg) { + case L2X0_AUX_CTRL: + l2x0_aux_ctrl = val; + + if (l2x0_aux_ctrl & BIT(16)) + l2x0_way_mask = 0xffff; + break; + + case L2X0_CTRL: + if (val == L2X0_CTRL_EN) + tf_generic_smc(TF_CACHE_MAINT, TF_CACHE_ENABLE, + l2x0_aux_ctrl); + else + tf_generic_smc(TF_CACHE_MAINT, TF_CACHE_DISABLE, + l2x0_way_mask); + break; + + default: + break; + } +} + +static int tf_init_cache(void) +{ + outer_cache.write_sec = tf_cache_write_sec; + + return 0; +} +#endif /* CONFIG_CACHE_L2X0 */ + static const struct firmware_ops trusted_foundations_ops = { .set_cpu_boot_addr = tf_set_cpu_boot_addr, .prepare_idle = tf_prepare_idle, +#ifdef CONFIG_CACHE_L2X0 + .l2x0_init = tf_init_cache, +#endif }; void register_trusted_foundations(struct trusted_foundations_platform_data *pd) -- 2.20.1
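The tf_cache_write_sec() callback above has to remember the way mask from the aux-control write so that a later cache-disable SMC can name all the ways: bit 16 of the PL310 aux control register selects 16-way associativity, widening the mask from 0xff to 0xffff. A standalone model of just that bookkeeping (the pure-function shape is illustrative, not the driver's):

```c
#include <assert.h>
#include <stdint.h>

#define BIT(n) (1u << (n))

/* Models the way-mask selection in tf_cache_write_sec(): when the
 * recorded L2X0_AUX_CTRL value has bit 16 set the cache is 16-way,
 * so the firmware's cache-disable call must cover 16 ways (0xffff)
 * instead of the default 8 (0xff). */
static uint32_t way_mask_for(uint32_t aux_ctrl)
{
        return (aux_ctrl & BIT(16)) ? 0xffff : 0xff;
}
```

This is why the callback caches the aux-control value in a static variable: the L2X0_CTRL disable write arrives later and carries no associativity information of its own.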
[PATCH v6 2/7] ARM: trusted_foundations: Make prepare_idle call to take mode argument
The Trusted Foundations firmware call varies depending on the required suspend-mode. Make the firmware API take a mode argument in order to expose all of the modes to the firmware user. Signed-off-by: Dmitry Osipenko --- arch/arm/firmware/trusted_foundations.c| 29 -- arch/arm/include/asm/firmware.h| 2 +- arch/arm/include/asm/trusted_foundations.h | 6 + arch/arm/mach-tegra/cpuidle-tegra114.c | 3 ++- 4 files changed, 36 insertions(+), 4 deletions(-) diff --git a/arch/arm/firmware/trusted_foundations.c b/arch/arm/firmware/trusted_foundations.c index 3bf61a5933b9..c496f4cc49cb 100644 --- a/arch/arm/firmware/trusted_foundations.c +++ b/arch/arm/firmware/trusted_foundations.c @@ -67,9 +67,34 @@ static int tf_set_cpu_boot_addr(int cpu, unsigned long boot_addr) return 0; } -static int tf_prepare_idle(void) +static int tf_prepare_idle(unsigned long mode) { - tf_generic_smc(TF_CPU_PM, TF_CPU_PM_S1_NOFLUSH_L2, cpu_boot_addr); + switch (mode) { + case TF_PM_MODE_LP0: + tf_generic_smc(TF_CPU_PM, TF_CPU_PM_S3, cpu_boot_addr); + break; + + case TF_PM_MODE_LP1: + tf_generic_smc(TF_CPU_PM, TF_CPU_PM_S2, cpu_boot_addr); + break; + + case TF_PM_MODE_LP1_NO_MC_CLK: + tf_generic_smc(TF_CPU_PM, TF_CPU_PM_S2_NO_MC_CLK, + cpu_boot_addr); + break; + + case TF_PM_MODE_LP2: + tf_generic_smc(TF_CPU_PM, TF_CPU_PM_S1, cpu_boot_addr); + break; + + case TF_PM_MODE_LP2_NOFLUSH_L2: + tf_generic_smc(TF_CPU_PM, TF_CPU_PM_S1_NOFLUSH_L2, + cpu_boot_addr); + break; + + default: + return -EINVAL; + } return 0; } diff --git a/arch/arm/include/asm/firmware.h b/arch/arm/include/asm/firmware.h index 34c1d96ef46d..6698272bbcbf 100644 --- a/arch/arm/include/asm/firmware.h +++ b/arch/arm/include/asm/firmware.h @@ -24,7 +24,7 @@ struct firmware_ops { /* * Inform the firmware we intend to enter CPU idle mode */ - int (*prepare_idle)(void); + int (*prepare_idle)(unsigned long mode); /* * Enters CPU idle mode */ diff --git a/arch/arm/include/asm/trusted_foundations.h b/arch/arm/include/asm/trusted_foundations.h 
index 00748350cf72..cdd48ab7d191 100644 --- a/arch/arm/include/asm/trusted_foundations.h +++ b/arch/arm/include/asm/trusted_foundations.h @@ -32,6 +32,12 @@ #include #include +#define TF_PM_MODE_LP0 0 +#define TF_PM_MODE_LP1 1 +#define TF_PM_MODE_LP1_NO_MC_CLK 2 +#define TF_PM_MODE_LP2 3 +#define TF_PM_MODE_LP2_NOFLUSH_L2 4 + struct trusted_foundations_platform_data { unsigned int version_major; unsigned int version_minor; diff --git a/arch/arm/mach-tegra/cpuidle-tegra114.c b/arch/arm/mach-tegra/cpuidle-tegra114.c index e3fbcfedf845..3b9af4766cdf 100644 --- a/arch/arm/mach-tegra/cpuidle-tegra114.c +++ b/arch/arm/mach-tegra/cpuidle-tegra114.c @@ -24,6 +24,7 @@ #include #include #include +#include #include #include "cpuidle.h" @@ -46,7 +47,7 @@ static int tegra114_idle_power_down(struct cpuidle_device *dev, tegra_set_cpu_in_lp2(); cpu_pm_enter(); - call_firmware_op(prepare_idle); + call_firmware_op(prepare_idle, TF_PM_MODE_LP2_NOFLUSH_L2); /* Do suspend by ourselves if the firmware does not implement it */ if (call_firmware_op(do_idle, 0) == -ENOSYS) -- 2.20.1
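The tf_prepare_idle() rework above is a straight mode-to-SMC-command dispatch. Here is a standalone model of that mapping; the TF_PM_MODE_* values come from the patch, but the TF_CPU_PM_S* command codes are not shown in this series' context, so the enum values below are placeholders for illustration only:

```c
#include <assert.h>

/* Mode values copied from the trusted_foundations.h hunk in the patch. */
#define TF_PM_MODE_LP0            0
#define TF_PM_MODE_LP1            1
#define TF_PM_MODE_LP1_NO_MC_CLK  2
#define TF_PM_MODE_LP2            3
#define TF_PM_MODE_LP2_NOFLUSH_L2 4

/* Placeholder command codes -- assumed, not taken from the patch. */
enum tf_cmd { S3, S2, S2_NO_MC_CLK, S1, S1_NOFLUSH_L2, CMD_INVALID };

/* Models the switch in tf_prepare_idle(): one firmware command per
 * suspend mode, anything else is rejected (-EINVAL in the driver). */
static enum tf_cmd mode_to_cmd(unsigned long mode)
{
        switch (mode) {
        case TF_PM_MODE_LP0:            return S3;
        case TF_PM_MODE_LP1:            return S2;
        case TF_PM_MODE_LP1_NO_MC_CLK:  return S2_NO_MC_CLK;
        case TF_PM_MODE_LP2:            return S1;
        case TF_PM_MODE_LP2_NOFLUSH_L2: return S1_NOFLUSH_L2;
        default:                        return CMD_INVALID;
        }
}
```

Note the inversion in the naming: the shallowest Linux state (LP2) maps to the firmware's S1, and the deepest (LP0) to S3, which is why an explicit table like this is clearer than arithmetic on the mode number.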
[PATCH v6 0/7] Support Trusted Foundations firmware on Tegra30
Hello, This patchset adds support for the Trusted Foundations firmware on NVIDIA Tegra30. Pretty much all Tegra30 consumer devices have that firmware, and the upstream kernel can't boot on those devices without firmware support. Changelog: v6: - One patch was accidentally messed up in v5; this is fixed now. - Squashed "Support L2 cache maintenance done via firmware" patch into the "Add firmware calls..." patch. - The l2x0_init() firmware callback is now invoked unconditionally because it is always a NO-OP on T114+ and is a NO-OP on T20/30 if the firmware node is missing from the device-tree, hence there is no need to check the machine's DT compatible as was done in previous versions of the series. v5: - Fixed machine hanging on disabling D-cache during suspend; it turned out there are slight variations in behaviour between firmware versions with regard to cache management. Thanks to Robert Yang for reporting the problem and helping with the solution. v4: - Fixed Thumb2-kernel hanging on Tegra20; it turned out it was not a good idea to switch the CPU into Thumb2 mode right after jumping into the reset handler. - Moved LP2-mode-set firmware call invocation to a later stage to better replicate what the downstream kernel does. This change was suggested by Robert Yang and fixes a system hang on the Ouya game console. - Added references to the original work made by Michał Mirosław into commit messages of the relevant patches. v3: - Implemented suspend-resume support. - Reworked arm/firmware/trusted_foundations.c a tad. Now the cache is getting properly initialized, and cache enabling / disabling is supported. v2: - The "Don't apply CPU erratas in insecure mode" patch got some cleanup, in particular resolved the messiness in __tegra_cpu_reset_handler_data. - Added a comment to tf_cache_write_sec(), justifying the warning message. 
Dmitry Osipenko (7): ARM: trusted_foundations: Implement L2 cache initialization callback ARM: trusted_foundations: Make prepare_idle call to take mode argument ARM: trusted_foundations: Provide information about whether firmware is registered ARM: tegra: Set up L2 cache using Trusted Foundations firmware ARM: tegra: Don't apply CPU erratas in insecure mode ARM: tegra: Always boot CPU in ARM-mode ARM: tegra: Add firmware calls required for suspend-resume on Tegra30 arch/arm/firmware/trusted_foundations.c| 80 +- arch/arm/include/asm/firmware.h| 2 +- arch/arm/include/asm/trusted_foundations.h | 13 arch/arm/mach-tegra/cpuidle-tegra114.c | 3 +- arch/arm/mach-tegra/pm.c | 49 + arch/arm/mach-tegra/reset-handler.S| 50 +++--- arch/arm/mach-tegra/reset.c| 3 + arch/arm/mach-tegra/reset.h| 9 ++- arch/arm/mach-tegra/sleep-tegra20.S| 4 ++ arch/arm/mach-tegra/sleep.S| 14 ++-- arch/arm/mach-tegra/tegra.c| 2 + 11 files changed, 207 insertions(+), 22 deletions(-) -- 2.20.1
Re: [PATCH v4 1/3] dt-bindings: phy: Add Stingray USB PHY binding document
On Fri, Feb 22, 2019 at 11:29 AM Srinath Mannam wrote: > > Hi Rob, > > Thanks for the review, Please find my comments below in line. > > On Fri, Feb 22, 2019 at 10:50 PM Rob Herring wrote: > > > > On Wed, Feb 20, 2019 at 04:04:00PM +0530, Srinath Mannam wrote: > > > Add DT binding document for Stingray USB PHY. > > > > > > Signed-off-by: Srinath Mannam > > > Reviewed-by: Florian Fainelli > > > Reviewed-by: Scott Branden > > > --- > > > .../bindings/phy/brcm,stingray-usb-phy.txt | 62 > > > ++ > > > 1 file changed, 62 insertions(+) > > > create mode 100644 > > > Documentation/devicetree/bindings/phy/brcm,stingray-usb-phy.txt > > > > > > diff --git > > > a/Documentation/devicetree/bindings/phy/brcm,stingray-usb-phy.txt > > > b/Documentation/devicetree/bindings/phy/brcm,stingray-usb-phy.txt > > > new file mode 100644 > > > index 000..da19236 > > > --- /dev/null > > > +++ b/Documentation/devicetree/bindings/phy/brcm,stingray-usb-phy.txt > > > @@ -0,0 +1,62 @@ > > > +Broadcom Stingray USB PHY > > > + > > > +Required properties: > > > + - compatible : should be one of the listed compatibles > > > + - "brcm,sr-usb-combo-phy" is a combo PHY has one SS PHY and one HS > > > PHY. > > > + - "brcm,sr-usb-hs-phy" has a single HS PHY. > > > + - reg: offset and length of the PHY blocks registers > > > + - address-cells: should be 1 > > > + - size-cells: should be 0 > > > + > > > +Sub-nodes: > > > + brcm,sr-usb-combo-phy have two sub-nodes for one SS PHY and one HS PHY. > > > + > > > +Sub-nodes required properties: > > > + - reg: required for brcm,sr-usb-phy model PHY. > > > + reg value 0 is HS PHY and 1 is SS PHY. 
> > > + - phy-cells: generic PHY binding; must be 0 > > > + > > > +Refer to phy/phy-bindings.txt for the generic PHY binding properties > > > + > > > +Example: > > > + usbphy0: usb-phy@0 { > > > + compatible = "brcm,sr-usb-combo-phy"; > > > + reg = <0x 0x100>; > > > + #address-cells = <1>; > > > + #size-cells = <0>; > > > + > > > + usb0_phy0: phy@0 { > > > + reg = <0>; > > > + #phy-cells = <0>; > > > + }; > > > + > > > + usb0_phy1: phy@1 { > > > + reg = <1>; > > > + #phy-cells = <0>; > > > + }; > > Again, you don't need child nodes here. There are not any per-child > resources. Clients can refer to <&usbphy0 1> just as easily as > <&usb0_phy1>. This is why we have #phy-cells. > This phy controller is a combo PHY: it has one Super Speed USB PHY and > one High Speed USB PHY. > We need to create two PHY devices inside the driver to initialize and > service (reset) both SS and HS PHYs separately. > That is the reason we used two child nodes. What you do in the driver is your business. That is independent of the binding. Go look at other phy drivers which use #phy-cells=1. The .of_xlate() function is what converts the phy cells to a struct phy. Rob
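Rob's point is that with `#phy-cells = <1>` the provider's `.of_xlate()` maps the single specifier cell onto one of the provider's internal PHY instances, so the binding needs no child nodes. A userspace sketch of that idea (names and the two-element layout are illustrative, not the actual driver's):

```c
#include <assert.h>
#include <stddef.h>

struct phy { int id; };

struct phy_provider {
        struct phy phys[2];     /* [0] = HS PHY, [1] = SS PHY */
};

/* Models .of_xlate(): a client reference <&usbphy0 N> selects phys[N].
 * The real kernel callback would return ERR_PTR(-ENODEV) on a bad cell;
 * here NULL stands in for that. */
static struct phy *sr_usb_of_xlate(struct phy_provider *p, unsigned int cell)
{
        if (cell >= 2)
                return NULL;
        return &p->phys[cell];
}

static struct phy_provider prov = { .phys = { { .id = 0 }, { .id = 1 } } };

/* Illustrative wrapper so the mapping is easy to exercise. */
static int xlate_id(unsigned int cell)
{
        struct phy *ph = sr_usb_of_xlate(&prov, cell);
        return ph ? ph->id : -1;
}
```

The driver still creates two `struct phy` objects internally; the binding just doesn't have to mirror that split in device-tree.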
Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault
On Fri, Feb 22, 2019 at 02:30:26PM -0500, Steven Rostedt wrote: > On Fri, 22 Feb 2019 11:27:05 -0800 > Alexei Starovoitov wrote: > > > On Fri, Feb 22, 2019 at 09:43:14AM -0800, Linus Torvalds wrote: > > > > > > Then we should still probably fix up "__probe_kernel_read()" to not > > > allow user accesses. The easiest way to do that is actually likely to > > > use the "unsafe_get_user()" functions *without* doing a > > > uaccess_begin(), which will mean that modern CPU's will simply fault > > > on a kernel access to user space. > > > > On bpf side the bpf_probe_read() helper just calls probe_kernel_read() > > and users pass both user and kernel addresses into it and expect > > that the helper will actually try to read from that address. > > > > If __probe_kernel_read will suddenly start failing on all user addresses > > it will break the expectations. > > How do we solve it in bpf_probe_read? > > Call probe_kernel_read and if that fails call unsafe_get_user byte-by-byte > > in the loop? > > That's doable, but people already complain that bpf_probe_read() is slow > > and shows up in their perf report. > > We're changing kprobes to add a specific flag to say that we want to > differentiate between kernel or user reads. Can this be done with > bpf_probe_read()? If it's showing up in perf report, I doubt a single so you're saying you will break existing kprobe scripts? I don't think it's a good idea. It's not acceptable to break bpf_probe_read uapi.
Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault
On Fri, 22 Feb 2019 11:27:05 -0800 Alexei Starovoitov wrote: > On Fri, Feb 22, 2019 at 09:43:14AM -0800, Linus Torvalds wrote: > > > > Then we should still probably fix up "__probe_kernel_read()" to not > > allow user accesses. The easiest way to do that is actually likely to > > use the "unsafe_get_user()" functions *without* doing a > > uaccess_begin(), which will mean that modern CPU's will simply fault > > on a kernel access to user space. > > On bpf side the bpf_probe_read() helper just calls probe_kernel_read() > and users pass both user and kernel addresses into it and expect > that the helper will actually try to read from that address. > > If __probe_kernel_read will suddenly start failing on all user addresses > it will break the expectations. > How do we solve it in bpf_probe_read? > Call probe_kernel_read and if that fails call unsafe_get_user byte-by-byte > in the loop? > That's doable, but people already complain that bpf_probe_read() is slow > and shows up in their perf report. We're changing kprobes to add a specific flag to say that we want to differentiate between kernel or user reads. Can this be done with bpf_probe_read()? If it's showing up in perf report, I doubt a single check is going to cause an issue. In fact, it may actually help speed things up as the read will be optimized for either user or kernel address reading. -- Steve
[PATCH 1/2] habanalabs: don't print result when rc indicates error
send_cpu_message() doesn't update the result parameter when an error occurs in its code. Therefore, callers of send_cpu_message() shouldn't use the result value when the return code indicates error. This patch fixes a static checker warning in goya_test_cpu_queue(), where that function did print the result even though the return code from send_cpu_message() indicated error. Signed-off-by: Oded Gabbay --- drivers/misc/habanalabs/goya/goya.c | 14 ++ 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/drivers/misc/habanalabs/goya/goya.c b/drivers/misc/habanalabs/goya/goya.c index c43bd37fe693..e6f0d49ab71a 100644 --- a/drivers/misc/habanalabs/goya/goya.c +++ b/drivers/misc/habanalabs/goya/goya.c @@ -3380,10 +3380,16 @@ int goya_test_cpu_queue(struct hl_device *hdev) rc = hdev->asic_funcs->send_cpu_message(hdev, (u32 *) _pkt, sizeof(test_pkt), HL_DEVICE_TIMEOUT_USEC, ); - if (!rc) - dev_info(hdev->dev, "queue test on CPU queue succeeded\n"); - else - dev_err(hdev->dev, "CPU queue test failed (0x%08lX)\n", result); + if (!rc) { + if (result == ARMCP_PACKET_FENCE_VAL) + dev_info(hdev->dev, + "queue test on CPU queue succeeded\n"); + else + dev_err(hdev->dev, + "CPU queue test failed (0x%08lX)\n", result); + } else { + dev_err(hdev->dev, "CPU queue test failed, error %d\n", rc); + } return rc; } -- 2.18.0
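The fix above encodes a simple rule: the `result` out-parameter is only meaningful when the return code is 0, and even then the queue test passes only if the fence value matches. A standalone model of that decision (FENCE_VAL is an assumed placeholder for ARMCP_PACKET_FENCE_VAL, whose value isn't shown here):

```c
#include <assert.h>
#include <stdint.h>

#define FENCE_VAL 0xCAFECAFEu   /* assumed placeholder value */

/* Models the corrected success check in goya_test_cpu_queue():
 * if rc is nonzero, result is stale and must not be inspected;
 * otherwise the test succeeds only on a matching fence value. */
static int queue_test_ok(int rc, uint32_t result)
{
        if (rc)
                return 0;
        return result == FENCE_VAL;
}
```

The original bug was precisely the missing first branch: printing `result` after `rc` indicated failure, when `send_cpu_message()` had never written it.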
[PATCH 2/2] habanalabs: driver's Kconfig must select DMA_SHARED_BUFFER
The driver uses the DMA_BUF module which is built only if DMA_SHARED_BUFFER is selected. DMA_SHARED_BUFFER doesn't have any dependencies so it is ok to select it (as done by many other components). Signed-off-by: Oded Gabbay --- drivers/misc/habanalabs/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/misc/habanalabs/Kconfig b/drivers/misc/habanalabs/Kconfig index b7f38a14caf5..80400a035dc1 100644 --- a/drivers/misc/habanalabs/Kconfig +++ b/drivers/misc/habanalabs/Kconfig @@ -6,6 +6,7 @@ config HABANA_AI tristate "HabanaAI accelerators (habanalabs)" depends on PCI select FRAME_VECTOR + select DMA_SHARED_BUFFER help Enables PCIe card driver for Habana's AI Processors (AIP) that are designed to accelerate Deep Learning inference and training workloads. -- 2.18.0
Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault
On Fri, Feb 22, 2019 at 09:43:14AM -0800, Linus Torvalds wrote: > > Then we should still probably fix up "__probe_kernel_read()" to not > allow user accesses. The easiest way to do that is actually likely to > use the "unsafe_get_user()" functions *without* doing a > uaccess_begin(), which will mean that modern CPU's will simply fault > on a kernel access to user space. On bpf side the bpf_probe_read() helper just calls probe_kernel_read() and users pass both user and kernel addresses into it and expect that the helper will actually try to read from that address. If __probe_kernel_read will suddenly start failing on all user addresses it will break the expectations. How do we solve it in bpf_probe_read? Call probe_kernel_read and if that fails call unsafe_get_user byte-by-byte in the loop? That's doable, but people already complain that bpf_probe_read() is slow and shows up in their perf report.
Re: [PATCH v2] dt-bindings: Add vendor prefix for techstar
Rob, On Tue, Feb 12, 2019 at 4:57 PM Jagan Teki wrote: > > Add vendor prefix for techstar, known as > Shenzhen Techstar Electronics Co., Ltd. a known producer for LCD modules. > > Signed-off-by: Jagan Teki > --- > Changes for v2: > - rebase for master > > Documentation/devicetree/bindings/vendor-prefixes.txt | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/Documentation/devicetree/bindings/vendor-prefixes.txt > b/Documentation/devicetree/bindings/vendor-prefixes.txt > index 7eae7a95ab2a..a5ad57eaee20 100644 > --- a/Documentation/devicetree/bindings/vendor-prefixes.txt > +++ b/Documentation/devicetree/bindings/vendor-prefixes.txt > @@ -397,6 +397,7 @@ tcl Toby Churchill Ltd. > technexion TechNexion > technologicTechnologic Systems > tempo Tempo Semiconductor > +techstar Shenzhen Techstar Electronics Co., Ltd. > terasicTerasic Inc. > thine THine Electronics, Inc. > ti Texas Instruments > -- Can you apply this, the relevant driver is mainline already. [1] [1] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=849b2e3ff9698226ab91e034d52cbb1da92a5b4c
Re: [RFC][PATCH 00/16] sched: Core scheduling
On 2/22/19 6:20 AM, Peter Zijlstra wrote: > On Fri, Feb 22, 2019 at 01:17:01PM +0100, Paolo Bonzini wrote: >> On 18/02/19 21:40, Peter Zijlstra wrote: >>> On Mon, Feb 18, 2019 at 09:49:10AM -0800, Linus Torvalds wrote: On Mon, Feb 18, 2019 at 9:40 AM Peter Zijlstra wrote: > > However; whichever way around you turn this cookie; it is expensive and > nasty. Do you (or anybody else) have numbers for real loads? Because performance is all that matters. If performance is bad, then it's pointless, since just turning off SMT is the answer. >>> >>> Not for these patches; they stopped crashing only yesterday and I >>> cleaned them up and send them out. >>> >>> The previous version; which was more horrible; but L1TF complete, was >>> between OK-ish and horrible depending on the number of VMEXITs a >>> workload had. >>> >>> If there were close to no VMEXITs, it beat smt=off, if there were lots >>> of VMEXITs it was far far worse. Supposedly hosting people try their >>> very bestest to have no VMEXITs so it mostly works for them (with the >>> obvious exception of single VCPU guests). >> >> If you are giving access to dedicated cores to guests, you also let them >> do PAUSE/HLT/MWAIT without vmexits and the host just thinks it's a CPU >> bound workload. >> >> In any case, IIUC what you are looking for is: >> >> 1) take a benchmark that *is* helped by SMT, this will be something CPU >> bound. >> >> 2) compare two runs, one without SMT and without core scheduler, and one >> with SMT+core scheduler. >> >> 3) find out whether performance is helped by SMT despite the increased >> overhead of the core scheduler >> >> Do you want some other load in the host, so that the scheduler actually >> does do something? Or is the point just that you show that the >> performance isn't affected when the scheduler does not have anything to >> do (which should be obvious, but having numbers is always better)? 
> > Well, what _I_ want is for all this to just go away :-) > > Tim did much of testing last time around; and I don't think he did > core-pinning of VMs much (although I'm sure he did some of that). I'm Yes. The last time around I tested basic scenarios like: 1. single VM pinned on a core 2. 2 VMs pinned on a core 3. system oversubscription (no pinning) In general, CPU bound benchmarks and even things without too much I/O causing lots of VMexits perform better with HT than without for Peter's last patchset. > still a complete virt noob; I can barely boot a VM to save my life. > > (you should be glad to not have heard my cursing at qemu cmdline when > trying to reproduce some of Tim's results -- lets just say that I can > deal with gpg) > > I'm sure he tried some oversubscribed scenarios without pinning. We did try some oversubscribed scenarios like SPECVirt, that tried to squeeze tons of VMs on a single system in over subscription mode. There're two main problems in the last go around: 1. Workload with high rate of Vmexits (SpecVirt is one) were a major source of pain when we tried Peter's previous patchset. The switch from vcpus to qemu and back in previous version of Peter's patch requires some coordination between the hyperthread siblings via IPI. And for workload that does this a lot, the overhead quickly added up. For Peter's new patch, this overhead hopefully would be reduced and give better performance. 2. Load balancing is quite tricky. Peter's last patchset did not have load balancing for consolidating compatible running threads. I did some non-sophisticated load balancing to pair vcpus up. But the constant vcpu migrations overhead probably ate up any improvements from better load pairing. So I didn't get much improvement in the over-subscription case when turning on load balancing to consolidate the VCPUs of the same VM. We'll probably have to try out this incarnation of Peter's patch and see how well the load balancing works. 
I'll try to line up some benchmarking folks to do some tests. Tim
Re: [PATCH v5 4/9] mm/mmu_notifier: contextual information for event enums
On 2/19/19 12:04 PM, jgli...@redhat.com wrote: From: Jérôme Glisse CPU page table update can happens for many reasons, not only as a result s/update/updates s/happens/happen of a syscall (munmap(), mprotect(), mremap(), madvise(), ...) but also as a result of kernel activities (memory compression, reclaim, migration, ...). This patch introduce a set of enums that can be associated with each of s/introduce/introduces the events triggering a mmu notifier. Latter patches take advantages of those enum values. s/advantages/advantage - UNMAP: munmap() or mremap() - CLEAR: page table is cleared (migration, compaction, reclaim, ...) - PROTECTION_VMA: change in access protections for the range - PROTECTION_PAGE: change in access protections for page in the range - SOFT_DIRTY: soft dirtyness tracking s/dirtyness/dirtiness Being able to identify munmap() and mremap() from other reasons why the page table is cleared is important to allow user of mmu notifier to update their own internal tracking structure accordingly (on munmap or mremap it is not longer needed to track range of virtual address as it becomes invalid). 
Signed-off-by: Jérôme Glisse Cc: Christian König Cc: Joonas Lahtinen Cc: Jani Nikula Cc: Rodrigo Vivi Cc: Jan Kara Cc: Andrea Arcangeli Cc: Peter Xu Cc: Felix Kuehling Cc: Jason Gunthorpe Cc: Ross Zwisler Cc: Dan Williams Cc: Paolo Bonzini Cc: Radim Krčmář Cc: Michal Hocko Cc: Christian Koenig Cc: Ralph Campbell Cc: John Hubbard Cc: k...@vger.kernel.org Cc: dri-de...@lists.freedesktop.org Cc: linux-r...@vger.kernel.org Cc: Arnd Bergmann --- include/linux/mmu_notifier.h | 30 ++ 1 file changed, 30 insertions(+) diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h index c8672c366f67..2386e71ac1b8 100644 --- a/include/linux/mmu_notifier.h +++ b/include/linux/mmu_notifier.h @@ -10,6 +10,36 @@ struct mmu_notifier; struct mmu_notifier_ops; +/** + * enum mmu_notifier_event - reason for the mmu notifier callback + * @MMU_NOTIFY_UNMAP: either munmap() that unmap the range or a mremap() that + * move the range I would say something about the VMA for the notifier range is being deleted. MMU notifier clients can then use this case to remove any policy or access counts associated with the range. Just changing the PTE to "no access" as in the CLEAR case doesn't mean a policy which prefers device private memory over system memory should be cleared. + * + * @MMU_NOTIFY_CLEAR: clear page table entry (many reasons for this like + * madvise() or replacing a page by another one, ...). + * + * @MMU_NOTIFY_PROTECTION_VMA: update is due to protection change for the range + * ie using the vma access permission (vm_page_prot) to update the whole range + * is enough no need to inspect changes to the CPU page table (mprotect() + * syscall) + * + * @MMU_NOTIFY_PROTECTION_PAGE: update is due to change in read/write flag for + * pages in the range so to mirror those changes the user must inspect the CPU + * page table (from the end callback). + * + * @MMU_NOTIFY_SOFT_DIRTY: soft dirty accounting (still same page and same + * access flags). 
User should soft dirty the page in the end callback to make + * sure that anyone relying on soft dirtyness catch pages that might be written + * through non CPU mappings. + */ +enum mmu_notifier_event { + MMU_NOTIFY_UNMAP = 0, + MMU_NOTIFY_CLEAR, + MMU_NOTIFY_PROTECTION_VMA, + MMU_NOTIFY_PROTECTION_PAGE, + MMU_NOTIFY_SOFT_DIRTY, +}; + #ifdef CONFIG_MMU_NOTIFIER /*
Re: [PATCH v2] kcm: remove any offset before parsing messages
On Thu, Feb 21, 2019 at 12:22 AM Dominique Martinet wrote: > > Tom Herbert wrote on Wed, Feb 20, 2019: > > > When the client closes the socket, some messages are obviously still "in > > > flight", and the server will recv a POLLERR notification on the csock at > > > some point with many messages left. > > > The documentation says to unattach the csock when you get POLLERR. If I > > > do that, the kcm socket will no longer give me any message, so all the > > > messages still in flight at the time are lost. > > > > So basically it sounds like you're interested in supporting TCP > > connections that are half closed. I believe that the error in half > > closed is EPIPE, so if the TCP socket returns that it can be ignored > > and the socket can continue being attached and used to send data. > > Did you mean 'can continue being attached and used to receive data'? > No, I meant shutdown on the receive side when a FIN is received. TX is still allowed to drain any queued bytes. To support shutdown on the TX side would require additional logic since we need to effectively detach the transmit path but retain the receive path. I'm not sure this is a compelling use case to support. > I can confirm getsockopt with SO_ERROR gets me EPIPE, but I don't see > how to efficiently ignore EPIPE until POLLIN gets unset -- polling on > both the csock and kcm socket will do many needless wakeups on only the > csock from what I can see, so I'd need some holdoff timer or something. > I guess it's possible though. We might need to clear the error somehow. Maybe a read of zero bytes? > > > Another possibility is to add some linger semantics to an attached > > socket. For instance, a large message might be sent so that part of > > the message is queued in TCP and part is queued in the KCM socket. > > Unattach would probably break that message. 
We probably want a linger > > option similar to SO_LINGER (or maybe just use the option on the TCP > > socket) that means don't complete the detach until any message being > > transmitted on the lower socket has been queued. > > That would certainly work, even if non-obvious from a user standpoint. > > > > > > I'd like to see some retry on ENOMEM before this is merged though, so > > > > while I'm there I'll resend this with a second patch doing that > > > > retry... I think just not setting strp->interrupted and not reporting > > > > the error up might be enough? Will have to try either way. > > > > > > I also tried playing with that without much success. > > > I had assumed just not calling strp_parser_err() (which calls the > > > abort_parser cb) would be enough, eventually calling strp_start_timer() > > > like the !len case, but no can do. > > > > I think you need to ignore the ENOMEM and have a timer or other > > callback to retry the operation in the future. > > Yes, that's what I had intended to try; basically just break out and > schedule the timer as said above. You might want to look at some other systems; I don't recall if there's a hook that can be used for when memory pressure is relieved. > > After a bit more debugging, this part works (__strp_recv() is called > again); but the next packet that is treated properly is rejected because > by the time __strp_recv() was called again a new skb was read and the > length isn't large enough to go all the way into the new packet, so this > test fails: > } else if (len <= (ssize_t)head->len - > skb->len - stm->strp.offset) { > /* Length must be into new skb (and also > * greater than zero) > */ > STRP_STATS_INCR(strp->stats.bad_hdr_len); > strp_parser_err(strp, -EPROTO, desc); > > So I need to figure a way to say "call this function again without > reading more data" somehow, or make this check more lax e.g. accept any > len > 0 after a retry maybe... 
> Removing that branch altogether seems to work at least but I'm not sure > we'd want to? I like the check since it's conservative and covers the normal case. Maybe just need some more logic? > (grmbl at this slow VM and strparser not being possible to enable as a > module, it takes ages to test) > > > > > With that said, returning 0 from the parse function also raises POLLERR > > > on the csock and hangs netparser, so things aren't that simple... > > > > Can you point to where this is happening. If the parse_msg callback > > returns 0 that is suppose to indicate that more bytes are needed. > > I just blindly returned 0 "from time to time" in the kcm parser > function, but looking at above it was failing on the same check. > This somewhat makes sense for this one to fail here if a new packet was > read, no sane parser function should give a length smaller than what > they require to determine the length. > > > Thanks, > -- > Dominique
Re: [PATCHv6 07/10] acpi/hmat: Register processor domain to its memory
On Fri, Feb 22, 2019 at 10:48 AM Keith Busch wrote: > > On Wed, Feb 20, 2019 at 11:02:01PM +0100, Rafael J. Wysocki wrote: > > On Thu, Feb 14, 2019 at 6:10 PM Keith Busch wrote: > > > config ACPI_HMAT > > > bool "ACPI Heterogeneous Memory Attribute Table Support" > > > depends on ACPI_NUMA > > > + select HMEM_REPORTING > > > > If you want to do this here, I'm not sure that defining HMEM_REPORTING > > as a user-selectable option is a good idea. In particular, I don't > > really think that setting ACPI_HMAT without it makes a lot of sense. > > Apart from this, the patch looks reasonable to me. > > I'm trying to implement based on the feedback, but I'm a little confused. > > As I have it at the moment, HMEM_REPORTING is not user-prompted, so > another option needs to turn it on. I have ACPI_HMAT do that here. > > So when you say it's a bad idea to make HMEM_REPORTING user selectable, > isn't it already not user selectable? > > If I do it the other way around, that's going to make HMEM_REPORTING > complicated if a non-ACPI implementation wants to report HMEM > properties. Agree. If a platform supports these HMEM properties then they should be reported. ACPI_HMAT is that opt-in for ACPI based platforms, and other archs can do something similar. It's not clear that one would ever want to opt-in to HMAT support and opt-out of reporting any of it to userspace.
Re: [v4 PATCH 8/8] RISC-V: Assign hwcap as per comman capabilities.
On 2/14/19 3:49 PM, Atish Patra wrote: On 2/13/19 4:38 PM, Palmer Dabbelt wrote: On Wed, 13 Feb 2019 00:44:42 PST (-0800), jo...@kernel.org wrote: On Tue, Feb 12, 2019 at 11:58:10AM -0800, Atish Patra wrote: On 2/12/19 3:25 AM, Johan Hovold wrote: On Tue, Feb 12, 2019 at 03:10:12AM -0800, Atish Patra wrote: Currently, we set hwcap based on first valid hart from DT. This may not be correct always as that hart might not be current booting cpu or may have a different capability. Set hwcap as the capabilities supported by all possible harts with "okay" status. Signed-off-by: Atish Patra --- arch/riscv/kernel/cpufeature.c | 41 ++--- 1 file changed, 22 insertions(+), 19 deletions(-) diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c index e7a4701f..a1e4fb34 100644 --- a/arch/riscv/kernel/cpufeature.c +++ b/arch/riscv/kernel/cpufeature.c @@ -20,6 +20,7 @@ #include #include #include +#include unsigned long elf_hwcap __read_mostly; #ifdef CONFIG_FPU @@ -42,28 +43,30 @@ void riscv_fill_hwcap(void) elf_hwcap = 0; - /* -* We don't support running Linux on hertergenous ISA systems. For -* now, we just check the ISA of the first "okay" processor. -*/ for_each_of_cpu_node(node) { - if (riscv_of_processor_hartid(node) >= 0) - break; - } - if (!node) { - pr_warn("Unable to find \"cpu\" devicetree entry\n"); - return; - } + unsigned long this_hwcap = 0; - if (of_property_read_string(node, "riscv,isa", )) { - pr_warn("Unable to find \"riscv,isa\" devicetree entry\n"); - of_node_put(node); - return; - } - of_node_put(node); + if (riscv_of_processor_hartid(node) < 0) + continue; - for (i = 0; i < strlen(isa); ++i) - elf_hwcap |= isa2hwcap[(unsigned char)(isa[i])]; + if (of_property_read_string(node, "riscv,isa", )) { + pr_warn("Unable to find \"riscv,isa\" devicetree entry\n"); + return; Did you want "continue" here to continue processing the other harts? Hmm. If a cpu node doesn't have isa in DT, that means DT is wrong. 
A "continue" here will let user space use other harts just with a warning message? Returning here will not set elf_hwcap, which forces the user to fix the DT. I am not sure what the defined behavior should be in this case. Any thoughts? The problem is that the proposed code might still set elf_hwcap -- it all depends on the order of the hart nodes in the DT (i.e. it will only be left unset if the first node is malformed). For that reason, I'd say it's better to either bail out (hard, or at least with elf_hwcap unset) or to continue processing the other nodes. The former might break current systems with malformed DTs, though. And since the harts are expected to have the same ISA, continuing the processing while warning and ignoring the malformed node might be acceptable. Handling malformed device trees by providing a warning and an empty HWCAP seems like the right way to go to me. If I understand you correctly, you prefer the following things to be done in case of a malformed DT: 1. Print a warning message 2. Unset the entire HWCAP 3. Return without processing other harts. This will most likely result in a panic when user space starts. Is this correct? As per our offline discussion, we should let the kernel avoid setting any value for the cpu with an incorrect DT entry and continue for other harts. A warning is enough. This is fine as long as user space never sees that hart. As the hart enumeration depends on riscv_of_processor_hartid, the hart with a corrupted isa property will never boot. riscv_of_processor_hartid will return -ENODEV if the "riscv,isa" property is not present. Moreover, the discussed conditional statement will not even be executed unless there is memory corruption or somebody corrupts the DT on the fly. So we can continue with the patch as it is. I will just resend the series (dropping the driver patches) for easy merge. Regards, Atish Johan
Re: [PATCH] arm64: dts: rockchip: Add support for the Orange Pi RK3399 board.
On Fri, Feb 22, 2019 at 12:46 PM Alexis Ballier wrote: > > This adds basic support for the Orange Pi RK3399 board. > What works: > - SD card / emmc. > - Debug UART > - Ethernet > - USB: Type C, internal USB3 for SATA, 4 USB 2.0 ports > - Sensors: All of them but the Hall sensor. > - Buttons > - Wifi, Bluetooth > - HDMI out > > Signed-off-by: Alexis Ballier > Cc: devicet...@vger.kernel.org > Cc: Heiko Stuebner > Cc: linux-arm-ker...@lists.infradead.org > Cc: linux-rockc...@lists.infradead.org > Cc: linux-kernel@vger.kernel.org > > --- > > The board is heavily based on the Sapphire one, hence this patch is also > based on this DT. This is why I kept the copyrights of the sapphire DT. > --- > .../devicetree/bindings/arm/rockchip.yaml | 5 + > arch/arm64/boot/dts/rockchip/Makefile | 1 + > .../boot/dts/rockchip/rk3399-orangepi.dts | 778 ++ > 3 files changed, 784 insertions(+) > create mode 100644 arch/arm64/boot/dts/rockchip/rk3399-orangepi.dts Reviewed-by: Rob Herring
Re: [PATCH RFC v2 4/4] PCI: hotplug: Add quirk For Dell nvme pcie switches
On 2/21/19 8:05 PM, Oliver wrote: > On Fri, Feb 22, 2019 at 5:38 AM wrote: >> On 2/21/19 1:57 AM, Lukas Wunner wrote: [snip] >>> If the quirk is x86-specific, please enclose it in "#ifdef CONFIG_X86" >>> to reduce kernel footprint on other arches. >> >> That's a tricky one. If you look at p. 185 of [1], items 9, 11, and 12 >> are standard x16 cards that would fit in any x16 slot. Those cards have >> the offending switches. >> >> On the one hand, you could take the cards and backplane and put them in >> a non-x86 system. On the other hand, I don't see why someone would >> want to do this. > > I have a couple of POWER boxes with Dell branded switch cards in them. > I have no idea why either, but it does happen. The hardware debouncer, I think, is on the backplane, so you'd really need both the switch and backplane combo. I've seen other marketing departments call the switches "NVMe HBA". Alex
Re: [PATCH RFC] mm/vmscan: try to protect active working set of cgroup from reclaim.
On Fri, Feb 22, 2019 at 08:58:25PM +0300, Andrey Ryabinin wrote: > In the presence of more than one memory cgroup in the system our reclaim > logic just sucks. When we hit a memory limit (global or a limit on a > cgroup with subgroups) we reclaim some memory from all cgroups. > This sucks because the cgroup that allocates more often always wins. > E.g. a job that allocates a lot of clean, rarely used page cache will push > out of memory other jobs with a relatively small, all-in-memory active > working set. > > To prevent such situations we have memcg controls like low/max, etc. which > are supposed to protect jobs or limit them so they do not hurt others. > But memory cgroups are very hard to configure right because it requires > precise knowledge of the workload, which may vary during execution. > E.g. setting a memory limit means that the job won't be able to use all memory > in the system for page cache even if the rest of the system is idle. > Basically our current scheme requires configuring every single cgroup > in the system. > > I think we can do better. The idea proposed by this patch is to reclaim > only inactive pages and only from cgroups that have a big > (!inactive_is_low()) inactive list, and go back to shrinking active lists > only if all inactive lists are low. Yes, you are absolutely right. We shouldn't go after active pages as long as there are plenty of inactive pages around. That's the global reclaim policy, and we currently fail to translate that well to cgrouped systems. Setting group protections or limits would work around this problem, but they're kind of a red herring. We shouldn't ever allow use-once streams to push out hot workingsets, that's a bug. 
> @@ -2489,6 +2491,10 @@ static void get_scan_count(struct lruvec *lruvec, > struct mem_cgroup *memcg, > > scan >>= sc->priority; > > + if (!sc->may_shrink_active && inactive_list_is_low(lruvec, > + file, memcg, sc, false)) > + scan = 0; > + > /* >* If the cgroup's already been deleted, make sure to >* scrape out the remaining cache. > @@ -2733,6 +2739,7 @@ static bool shrink_node(pg_data_t *pgdat, struct > scan_control *sc) > struct reclaim_state *reclaim_state = current->reclaim_state; > unsigned long nr_reclaimed, nr_scanned; > bool reclaimable = false; > + bool retry; > > do { > struct mem_cgroup *root = sc->target_mem_cgroup; > @@ -2742,6 +2749,8 @@ static bool shrink_node(pg_data_t *pgdat, struct > scan_control *sc) > }; > struct mem_cgroup *memcg; > > + retry = false; > + > memset(>nr, 0, sizeof(sc->nr)); > > nr_reclaimed = sc->nr_reclaimed; > @@ -2813,6 +2822,13 @@ static bool shrink_node(pg_data_t *pgdat, struct > scan_control *sc) > } > } while ((memcg = mem_cgroup_iter(root, memcg, ))); > > + if ((sc->nr_scanned - nr_scanned) == 0 && > + !sc->may_shrink_active) { > + sc->may_shrink_active = 1; > + retry = true; > + continue; > + } Using !scanned as the gate could be a problem. There might be a cgroup that has inactive pages on the local level, but when viewed from the system level the total inactive pages in the system might still be low compared to active ones. In that case we should go after active pages. Basically, during global reclaim, the answer for whether active pages should be scanned or not should be the same regardless of whether the memory is all global or whether it's spread out between cgroups. The reason this isn't the case is because we're checking the ratio at the lruvec level - which is the highest level (and identical to the node counters) when memory is global, but it's at the lowest level when memory is cgrouped. 
So IMO what we should do is: - At the beginning of global reclaim, use node_page_state() to compare the INACTIVE_FILE:ACTIVE_FILE ratio and then decide whether reclaim can go after active pages or not. Regardless of what the ratio is in individual lruvecs. - And likewise at the beginning of cgroup limit reclaim, walk the subtree starting at sc->target_mem_cgroup, sum up the INACTIVE_FILE and ACTIVE_FILE counters, and make inactive_is_low() decision on those sums.
Re: [PATCH net] net: socket: set sock->sk to NULL after calling proto_ops::release()
On Fri, Feb 22, 2019 at 10:25:09AM -0800, Eric Dumazet wrote: > > > On 02/22/2019 09:57 AM, Eric Biggers wrote: > > > ->setattr() is called under inode_lock(), which __sock_release() also > > takes. So > > the uses of sock->sk are serialized. See commit 6d8c50dcb029 ("socket: > > close > > race condition between sock_close() and sockfs_setattr()"). > > Oh right, we added another inode_lock()/inode_unlock() for sock_close() An interesting question is whether anything else will be confused by sock->sk && sock->sk->sk_socket != sock I'd still like to figure out if we could simply make sock_orphan() do something like if (likely(sk->sk_socket)) sk->sk_socket->sk = NULL; just before sk_set_socket(sk, NULL); That would make for much easier rules; the question is whether anything relies upon the windows when linkage between socket and sock is not symmetrical...
Re: [PATCH RESEND V6 2/2] clk: imx: scu: add cpu frequency scaling support
Quoting Anson Huang (2019-02-21 18:32:10) > On NXP's i.MX SoCs with a system controller inside, CPU frequency > scaling can ONLY be done by the system controller firmware, and it > can ONLY be requested from secure mode, so the Linux kernel has to > call ARM SMC to trap to ARM-Trusted-Firmware to request that the system > controller firmware do the CPU frequency scaling. > > This patch adds i.MX system controller CPU frequency scaling support; > it reuses the cpufreq-dt driver and implements the CPU frequency scaling > inside the SCU clock driver. > > Signed-off-by: Anson Huang Ah I missed one thing, see below. > @@ -180,6 +185,23 @@ static long clk_scu_round_rate(struct clk_hw *hw, > unsigned long rate, > return rate; > } > > +static int clk_scu_atf_set_cpu_rate(struct clk_hw *hw, unsigned long rate, > + unsigned long parent_rate) > +{ > + struct clk_scu *clk = to_clk_scu(hw); > + struct arm_smccc_res res; > + unsigned long cluster_id; > + > + if (clk->rsrc_id == IMX_SC_R_A35) > + cluster_id = 0; Do we still need to check this anymore? Why not just always use cluster_id 0? > + > + /* CPU frequency scaling can ONLY be done by ARM-Trusted-Firmware */ > + arm_smccc_smc(IMX_SIP_CPUFREQ, IMX_SIP_SET_CPUFREQ, > + cluster_id, rate, 0, 0, 0, 0, &res); Not checking would make this work, whereas keeping the check causes this code to sometimes pass an uninitialized value from the stack. > + > + return 0; > +}
Re: [PATCH v5 3/9] mm/mmu_notifier: convert mmu_notifier_range->blockable to a flags
On 2/19/19 12:04 PM, jgli...@redhat.com wrote: From: Jérôme Glisse Use an unsigned field for flags other than blockable and convert the blockable field to be one of those flags. Signed-off-by: Jérôme Glisse Cc: Christian König Cc: Joonas Lahtinen Cc: Jani Nikula Cc: Rodrigo Vivi Cc: Jan Kara Cc: Andrea Arcangeli Cc: Peter Xu Cc: Felix Kuehling Cc: Jason Gunthorpe Cc: Andrew Morton Cc: Ross Zwisler Cc: Dan Williams Cc: Paolo Bonzini Cc: Radim Krčmář Cc: Michal Hocko Cc: Christian Koenig Cc: Ralph Campbell Cc: John Hubbard Cc: k...@vger.kernel.org Cc: dri-de...@lists.freedesktop.org Cc: linux-r...@vger.kernel.org Cc: Arnd Bergmann --- include/linux/mmu_notifier.h | 11 +++ 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h index e630def131ce..c8672c366f67 100644 --- a/include/linux/mmu_notifier.h +++ b/include/linux/mmu_notifier.h @@ -25,11 +25,13 @@ struct mmu_notifier_mm { spinlock_t lock; }; +#define MMU_NOTIFIER_RANGE_BLOCKABLE (1 << 0) + struct mmu_notifier_range { struct mm_struct *mm; unsigned long start; unsigned long end; - bool blockable; + unsigned flags; }; struct mmu_notifier_ops { @@ -229,7 +231,7 @@ extern void __mmu_notifier_invalidate_range(struct mm_struct *mm, static inline bool mmu_notifier_range_blockable(const struct mmu_notifier_range *range) { - return range->blockable; + return (range->flags & MMU_NOTIFIER_RANGE_BLOCKABLE); } static inline void mmu_notifier_release(struct mm_struct *mm) @@ -275,7 +277,7 @@ static inline void mmu_notifier_invalidate_range_start(struct mmu_notifier_range *range) { if (mm_has_notifiers(range->mm)) { - range->blockable = true; + range->flags |= MMU_NOTIFIER_RANGE_BLOCKABLE; __mmu_notifier_invalidate_range_start(range); } } @@ -284,7 +286,7 @@ static inline int mmu_notifier_invalidate_range_start_nonblock(struct mmu_notifier_range *range) { if (mm_has_notifiers(range->mm)) { - range->blockable = false; + range->flags &= 
~MMU_NOTIFIER_RANGE_BLOCKABLE; return __mmu_notifier_invalidate_range_start(range); } return 0; @@ -331,6 +333,7 @@ static inline void mmu_notifier_range_init(struct mmu_notifier_range *range, range->mm = mm; range->start = start; range->end = end; + range->flags = 0; } #define ptep_clear_flush_young_notify(__vma, __address, __ptep) \ Reviewed-by: Ralph Campbell
Re: [PATCH RESEND V6 2/2] clk: imx: scu: add cpu frequency scaling support
Quoting Anson Huang (2019-02-21 18:32:10) > On NXP's i.MX SoCs with a system controller inside, CPU frequency > scaling can ONLY be done by the system controller firmware, and it > can ONLY be requested from secure mode, so the Linux kernel has to > call ARM SMC to trap to ARM-Trusted-Firmware to request that the system > controller firmware do the CPU frequency scaling. > > This patch adds i.MX system controller CPU frequency scaling support; > it reuses the cpufreq-dt driver and implements the CPU frequency scaling > inside the SCU clock driver. > > Signed-off-by: Anson Huang > --- Applied to clk-next
Re: [PATCH v2 2/2] selftests/ftrace: Replace \e with \033
On Fri, 22 Feb 2019 08:59:42 -0700 shuah wrote: > On 2/22/19 7:59 AM, Masami Hiramatsu wrote: > > On Fri, 22 Feb 2019 10:10:21 +0100 > > Juerg Haefliger wrote: > > > >> The \e sequence character is not POSIX. Fix that by using \033 instead. > >> > >> Signed-off-by: Juerg Haefliger > > > > This also looks good to me. > > > > Acked-by: Masami Hiramatsu > > > > Thank you! > > > > > I wasn't on the To or Cc on the original patch. Make sure to > run the get_maintainers and include the people it recommends. > > Please resend the patch to me, so I can take it for 5.1-rc1 And add both Masami and my Acked-by when you send it to Shuah (for both patches). Acked-by: Steven Rostedt (VMware) -- Steve
Re: [PATCH 1/5] mm/workingset: remove unused @mapping argument in workingset_eviction()
On Fri, 2019-02-22 at 20:43 +0300, Andrey Ryabinin wrote: > workingset_eviction() doesn't use and never did use the @mapping > argument. > Remove it. > > Signed-off-by: Andrey Ryabinin > Cc: Johannes Weiner > Cc: Michal Hocko > Cc: Vlastimil Babka > Cc: Rik van Riel > Cc: Mel Gorman Acked-by: Rik van Riel -- All Rights Reversed.
Re: [PATCH 3/5] mm/compaction: pass pgdat to too_many_isolated() instead of zone
On Fri, 2019-02-22 at 20:43 +0300, Andrey Ryabinin wrote: > too_many_isolated() in mm/compaction.c looks only at node state, > so it makes more sense to change argument to pgdat instead of zone. > > Signed-off-by: Andrey Ryabinin > Cc: Johannes Weiner > Cc: Michal Hocko > Cc: Vlastimil Babka > Cc: Rik van Riel > Cc: Mel Gorman Acked-by: Rik van Riel -- All Rights Reversed.
[PATCH] null_blk: fix checking for REQ_FUA
null_handle_bio() erroneously uses the bio_op() macro, which masks out the request flag bits, including REQ_FUA, so the check always fails. Fix this by checking bio->bi_opf directly. Signed-off-by: Heinz Mauelshagen --- drivers/block/null_blk_main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/block/null_blk_main.c b/drivers/block/null_blk_main.c index 62c9654b9ce8..04b9d415511b 100644 --- a/drivers/block/null_blk_main.c +++ b/drivers/block/null_blk_main.c @@ -1104,7 +1104,7 @@ static int null_handle_bio(struct nullb_cmd *cmd) len = bvec.bv_len; err = null_transfer(nullb, bvec.bv_page, len, bvec.bv_offset, op_is_write(bio_op(bio)), sector, -bio_op(bio) & REQ_FUA); +bio->bi_opf & REQ_FUA); if (err) { spin_unlock_irq(&nullb->lock); return err; -- 2.20.1
Re: [PATCH v5 2/9] mm/mmu_notifier: convert user range->blockable to helper function
On 2/19/19 12:04 PM, jgli...@redhat.com wrote: From: Jérôme Glisse Use the mmu_notifier_range_blockable() helper function instead of directly dereferencing the range->blockable field. This is done to make it easier to change the mmu_notifier range field. This patch is the outcome of the following coccinelle patch: %<--- @@ identifier I1, FN; @@ FN(..., struct mmu_notifier_range *I1, ...) { <... -I1->blockable +mmu_notifier_range_blockable(I1) ...> } --->% spatch --in-place --sp-file blockable.spatch --dir . Signed-off-by: Jérôme Glisse Cc: Christian König Cc: Joonas Lahtinen Cc: Jani Nikula Cc: Rodrigo Vivi Cc: Jan Kara Cc: Andrea Arcangeli Cc: Peter Xu Cc: Felix Kuehling Cc: Jason Gunthorpe Cc: Ross Zwisler Cc: Dan Williams Cc: Paolo Bonzini Cc: Radim Krčmář Cc: Michal Hocko Cc: Christian Koenig Cc: Ralph Campbell Cc: John Hubbard Cc: k...@vger.kernel.org Cc: dri-de...@lists.freedesktop.org Cc: linux-r...@vger.kernel.org Cc: Arnd Bergmann --- drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c | 8 drivers/gpu/drm/i915/i915_gem_userptr.c | 2 +- drivers/gpu/drm/radeon/radeon_mn.c | 4 ++-- drivers/infiniband/core/umem_odp.c | 5 +++-- drivers/xen/gntdev.c| 6 +++--- mm/hmm.c| 6 +++--- mm/mmu_notifier.c | 2 +- virt/kvm/kvm_main.c | 3 ++- 8 files changed, 19 insertions(+), 17 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c index 3e6823fdd939..58ed401c5996 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c @@ -256,14 +256,14 @@ static int amdgpu_mn_invalidate_range_start_gfx(struct mmu_notifier *mn, /* TODO we should be able to split locking for interval tree and * amdgpu_mn_invalidate_node */ - if (amdgpu_mn_read_lock(amn, range->blockable)) + if (amdgpu_mn_read_lock(amn, mmu_notifier_range_blockable(range))) return -EAGAIN; it = interval_tree_iter_first(>objects, range->start, end); while (it) { struct amdgpu_mn_node *node; - if (!range->blockable) { + if 
(!mmu_notifier_range_blockable(range)) { amdgpu_mn_read_unlock(amn); return -EAGAIN; } @@ -299,7 +299,7 @@ static int amdgpu_mn_invalidate_range_start_hsa(struct mmu_notifier *mn, /* notification is exclusive, but interval is inclusive */ end = range->end - 1; - if (amdgpu_mn_read_lock(amn, range->blockable)) + if (amdgpu_mn_read_lock(amn, mmu_notifier_range_blockable(range))) return -EAGAIN; it = interval_tree_iter_first(>objects, range->start, end); @@ -307,7 +307,7 @@ static int amdgpu_mn_invalidate_range_start_hsa(struct mmu_notifier *mn, struct amdgpu_mn_node *node; struct amdgpu_bo *bo; - if (!range->blockable) { + if (!mmu_notifier_range_blockable(range)) { amdgpu_mn_read_unlock(amn); return -EAGAIN; } diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c index 1d3f9a31ad61..777b3f8727e7 100644 --- a/drivers/gpu/drm/i915/i915_gem_userptr.c +++ b/drivers/gpu/drm/i915/i915_gem_userptr.c @@ -122,7 +122,7 @@ userptr_mn_invalidate_range_start(struct mmu_notifier *_mn, while (it) { struct drm_i915_gem_object *obj; - if (!range->blockable) { + if (!mmu_notifier_range_blockable(range)) { ret = -EAGAIN; break; } diff --git a/drivers/gpu/drm/radeon/radeon_mn.c b/drivers/gpu/drm/radeon/radeon_mn.c index b3019505065a..c9bd1278f573 100644 --- a/drivers/gpu/drm/radeon/radeon_mn.c +++ b/drivers/gpu/drm/radeon/radeon_mn.c @@ -133,7 +133,7 @@ static int radeon_mn_invalidate_range_start(struct mmu_notifier *mn, /* TODO we should be able to split locking for interval tree and * the tear down. 
*/ - if (range->blockable) + if (mmu_notifier_range_blockable(range)) mutex_lock(>lock); else if (!mutex_trylock(>lock)) return -EAGAIN; @@ -144,7 +144,7 @@ static int radeon_mn_invalidate_range_start(struct mmu_notifier *mn, struct radeon_bo *bo; long r; - if (!range->blockable) { + if (!mmu_notifier_range_blockable(range)) { ret = -EAGAIN; goto out_unlock; } diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c index 012044f16d1c..3a3f1538d295 100644 --- a/drivers/infiniband/core/umem_odp.c +++
Re: [PATCH v5 1/9] mm/mmu_notifier: helper to test if a range invalidation is blockable
On 2/19/19 12:04 PM, jgli...@redhat.com wrote: From: Jérôme Glisse Simple helpers to test if range invalidation is blockable. Latter patches use cocinnelle to convert all direct dereference of range-> blockable to use this function instead so that we can convert the blockable field to an unsigned for more flags. Signed-off-by: Jérôme Glisse Cc: Christian König Cc: Joonas Lahtinen Cc: Jani Nikula Cc: Rodrigo Vivi Cc: Jan Kara Cc: Andrea Arcangeli Cc: Peter Xu Cc: Felix Kuehling Cc: Jason Gunthorpe Cc: Andrew Morton Cc: Ross Zwisler Cc: Dan Williams Cc: Paolo Bonzini Cc: Radim Krčmář Cc: Michal Hocko Cc: Christian Koenig Cc: Ralph Campbell Cc: John Hubbard Cc: k...@vger.kernel.org Cc: dri-de...@lists.freedesktop.org Cc: linux-r...@vger.kernel.org Cc: linux-fsde...@vger.kernel.org Cc: Arnd Bergmann --- include/linux/mmu_notifier.h | 11 +++ 1 file changed, 11 insertions(+) diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h index 4050ec1c3b45..e630def131ce 100644 --- a/include/linux/mmu_notifier.h +++ b/include/linux/mmu_notifier.h @@ -226,6 +226,12 @@ extern void __mmu_notifier_invalidate_range_end(struct mmu_notifier_range *r, extern void __mmu_notifier_invalidate_range(struct mm_struct *mm, unsigned long start, unsigned long end); +static inline bool +mmu_notifier_range_blockable(const struct mmu_notifier_range *range) +{ + return range->blockable; +} + static inline void mmu_notifier_release(struct mm_struct *mm) { if (mm_has_notifiers(mm)) @@ -455,6 +461,11 @@ static inline void _mmu_notifier_range_init(struct mmu_notifier_range *range, #define mmu_notifier_range_init(range, mm, start, end) \ _mmu_notifier_range_init(range, start, end) +static inline bool +mmu_notifier_range_blockable(const struct mmu_notifier_range *range) +{ + return true; +} static inline int mm_has_notifiers(struct mm_struct *mm) { Reviewed-by: Ralph Campbell
Re: [PATCH RFC] mm/vmscan: try to protect active working set of cgroup from reclaim.
On Fri, 2019-02-22 at 20:58 +0300, Andrey Ryabinin wrote: > In the presence of more than one memory cgroup in the system our reclaim > logic just sucks. When we hit a memory limit (global or a limit on a > cgroup with subgroups) we reclaim some memory from all cgroups. > This sucks because the cgroup that allocates more often always > wins. > E.g. a job that allocates a lot of clean, rarely used page cache will > push > out of memory other jobs with a relatively small, all-in-memory > active working set. > > To prevent such situations we have memcg controls like low/max, etc. > which > are supposed to protect jobs or limit them so they do not hurt > others. > But memory cgroups are very hard to configure right because it > requires > precise knowledge of the workload, which may vary during > execution. > E.g. setting a memory limit means that the job won't be able to use all > memory > in the system for page cache even if the rest of the system is idle. > Basically our current scheme requires configuring every single > cgroup > in the system. > > I think we can do better. The idea proposed by this patch is to > reclaim > only inactive pages and only from cgroups that have a big > (!inactive_is_low()) inactive list, and go back to shrinking active > lists > only if all inactive lists are low. Your general idea seems like a good one, but the logic in the code seems a little convoluted to me. I wonder if we can simplify things a little, by checking (when we enter page reclaim) whether the pgdat has enough inactive pages based on the node_page_state statistics, and basing our decision whether or not to scan the active lists off that. As it stands, your patch seems like the kind of code that makes perfect sense today, but which will confuse people who look at the code two years from now. If the code could be made a little more explicit, great. 
If there are good reasons to do things in the fallback way your current patch does it, the code could use some good comments explaining why :) -- All Rights Reversed.
Re: [PATCH v3] mm/hugetlb: Fix unsigned overflow in __nr_hugepages_store_common()
On 2/22/19 6:01 AM, Jing Xiangfeng wrote: Thanks, just a couple small changes. > A user can change a node-specific hugetlb count, i.e. > /sys/devices/system/node/node1/hugepages/hugepages-2048kB Please make that, /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages > the calculated value of count is a total number of huge pages. It could > overflow when a user enters an extremely high value. If so, the total > number of huge pages could be a small value, which is not what the user > expects. > We can simply fix it by setting count to ULONG_MAX, then it goes on. This > may be more in line with the user's intention of allocating as many huge > pages as possible. > > Signed-off-by: Jing Xiangfeng > --- > mm/hugetlb.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index afef616..18fa7d7 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -2423,7 +2423,10 @@ static ssize_t __nr_hugepages_store_common(bool > obey_mempolicy, >* per node hstate attribute: adjust count to global, >* but restrict alloc/free to the specified node. >*/ > + unsigned long old_count = count; > count += h->nr_huge_pages - h->nr_huge_pages_node[nid]; Also, adding a comment here about checking for overflow would help people reading the code. Something like, /* * If user specified count causes overflow, set to * largest possible value. */ -- Mike Kravetz > + if (count < old_count) > + count = ULONG_MAX; > init_nodemask_of_node(nodes_allowed, nid); > } else > nodes_allowed = &node_states[N_MEMORY]; >
Re: [PATCH] x86/kvm/mmu: fix switch between root and guest MMUs
Paolo Bonzini writes:
> On 22/02/19 17:45, Vitaly Kuznetsov wrote:
>> Commit 14c07ad89f4d ("x86/kvm/mmu: introduce guest_mmu") brought one subtle
>> change: previously, when switching back from L2 to L1, we were resetting
>> MMU hooks (like mmu->get_cr3()) in kvm_init_mmu() called from
>> nested_vmx_load_cr3() and now we do that in nested_ept_uninit_mmu_context()
>> when we re-target the vcpu->arch.mmu pointer.
>> The change itself looks logical: if nested_ept_init_mmu_context() changes
>> something, then nested_ept_uninit_mmu_context() restores it back. There is,
>> however, one thing: the following call chain:
>>
>>  nested_vmx_load_cr3()
>>   kvm_mmu_new_cr3()
>>    __kvm_mmu_new_cr3()
>>     fast_cr3_switch()
>>      cached_root_available()
>>
>> now happens with MMU hooks pointing to the new MMU (root MMU in our case)
>> while previously it was happening with the old one. cached_root_available()
>> tries to stash the current root, but it is incorrect to read the current
>> CR3 with mmu->get_cr3(); we need to use old_mmu->get_cr3(), which in the
>> case where we're switching from L2 to L1 is guest_mmu. (BTW, in the shadow
>> page tables case this is a non-issue because we don't switch MMUs.)
>>
>> While we could have tried to guess that we're switching between MMUs and
>> call the right ->get_cr3() from cached_root_available(), this seems to be
>> overly complicated. Instead, just stash the corresponding CR3 when setting
>> root_hpa and make cached_root_available() use the stashed value.
>>
>> Fixes: 14c07ad89f4d ("x86/kvm/mmu: introduce guest_mmu")
>> Signed-off-by: Vitaly Kuznetsov
>
> Is the bug latent until the other patch? Or are they completely
> separate issues?

This one was reported as https://bugs.launchpad.net/qemu/+bug/1813165 (and I still don't completely understand why it is only SMM which is broken - or is it?). While working on it I noticed that fast_cr3_switch() actually doesn't work (even after this patch, when we stop putting incorrect values in the cache with cached_root_available()).
So in the end they seem to be fairly independent. -- Vitaly
[RFC PATCH 19/20] net: silan: sc92031: Remove stale comment about mmiowb()
mmiowb() is no more. It has ceased to be. It is an ex-barrier. So remove all references to it from comments.

Signed-off-by: Will Deacon
---
 drivers/net/ethernet/silan/sc92031.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/silan/sc92031.c b/drivers/net/ethernet/silan/sc92031.c
index db5dc8ce0aff..02b3962b0e63 100644
--- a/drivers/net/ethernet/silan/sc92031.c
+++ b/drivers/net/ethernet/silan/sc92031.c
@@ -251,7 +251,6 @@ enum PMConfigBits {
  * use of mdelay() at _sc92031_reset.
  * Functions prefixed with _sc92031_ must be called with the lock held;
  * functions prefixed with sc92031_ must be called without the lock held.
- * Use mmiowb() before unlocking if the hardware was written to.
  */

 /* Locking rules for the interrupt:
-- 
2.11.0
[RFC PATCH 17/20] scsi: qla1280: Remove stale comment about mmiowb()
All mmiowb() invocations have been removed, so there's no need to keep banging on about it.

Signed-off-by: Will Deacon
---
 drivers/scsi/qla1280.c | 10 --
 1 file changed, 10 deletions(-)

diff --git a/drivers/scsi/qla1280.c b/drivers/scsi/qla1280.c
index 93acbc5094f0..327eff67a1ee 100644
--- a/drivers/scsi/qla1280.c
+++ b/drivers/scsi/qla1280.c
@@ -3363,16 +3363,6 @@ qla1280_isp_cmd(struct scsi_qla_host *ha)

 	/*
 	 * Update request index to mailbox4 (Request Queue In).
-	 * The mmiowb() ensures that this write is ordered with writes by other
-	 * CPUs. Without the mmiowb(), it is possible for the following:
-	 *	CPU A posts write of index 5 to mailbox4
-	 *	CPU A releases host lock
-	 *	CPU B acquires host lock
-	 *	CPU B posts write of index 6 to mailbox4
-	 *	On PCI bus, order reverses and write of 6 posts, then index 5,
-	 *	causing chip to issue full queue of stale commands
-	 *	The mmiowb() prevents future writes from crossing the barrier.
-	 *	See Documentation/driver-api/device-io.rst for more information.
 	 */
 	WRT_REG_WORD(&ha->mailbox4, ha->req_ring_index);
-- 
2.11.0
[RFC PATCH 20/20] arch: Remove dummy mmiowb() definitions from arch code
Now that no driver code is using mmiowb() directly, we can remove the dummy definitions from the architectures that don't make use of asm-generic/io.h Signed-off-by: Will Deacon --- arch/alpha/include/asm/io.h| 2 -- arch/hexagon/include/asm/io.h | 2 -- arch/parisc/include/asm/io.h | 2 -- arch/sparc/include/asm/io_64.h | 2 -- 4 files changed, 8 deletions(-) diff --git a/arch/alpha/include/asm/io.h b/arch/alpha/include/asm/io.h index 4c533fc94d62..ccf9d65166bb 100644 --- a/arch/alpha/include/asm/io.h +++ b/arch/alpha/include/asm/io.h @@ -513,8 +513,6 @@ extern inline void writeq(u64 b, volatile void __iomem *addr) #define writel_relaxed(b, addr)__raw_writel(b, addr) #define writeq_relaxed(b, addr)__raw_writeq(b, addr) -#define mmiowb() - /* * String version of IO memory access ops: */ diff --git a/arch/hexagon/include/asm/io.h b/arch/hexagon/include/asm/io.h index e17262ad125e..3d0ae09c2b8e 100644 --- a/arch/hexagon/include/asm/io.h +++ b/arch/hexagon/include/asm/io.h @@ -184,8 +184,6 @@ static inline void writel(u32 data, volatile void __iomem *addr) #define writew_relaxed __raw_writew #define writel_relaxed __raw_writel -#define mmiowb() - /* * Need an mtype somewhere in here, for cache type deals? * This is probably too long for an inline. 
diff --git a/arch/parisc/include/asm/io.h b/arch/parisc/include/asm/io.h index afe493b23d04..b163043d49db 100644 --- a/arch/parisc/include/asm/io.h +++ b/arch/parisc/include/asm/io.h @@ -229,8 +229,6 @@ static inline void writeq(unsigned long long q, volatile void __iomem *addr) #define writel_relaxed(l, addr)writel(l, addr) #define writeq_relaxed(q, addr)writeq(q, addr) -#define mmiowb() do { } while (0) - void memset_io(volatile void __iomem *addr, unsigned char val, int count); void memcpy_fromio(void *dst, const volatile void __iomem *src, int count); void memcpy_toio(volatile void __iomem *dst, const void *src, int count); diff --git a/arch/sparc/include/asm/io_64.h b/arch/sparc/include/asm/io_64.h index b162c23ae8c2..688911051b44 100644 --- a/arch/sparc/include/asm/io_64.h +++ b/arch/sparc/include/asm/io_64.h @@ -396,8 +396,6 @@ static inline void memcpy_toio(volatile void __iomem *dst, const void *src, } } -#define mmiowb() - #ifdef __KERNEL__ /* On sparc64 we have the whole physical IO address space accessible -- 2.11.0
[RFC PATCH 16/20] drivers: Remove explicit invocations of mmiowb()
mmiowb() is now implied by spin_unlock() on architectures that require it, so there is no reason to call it from driver code. This patch was generated using coccinelle: @mmiowb@ @@ - mmiowb(); and invoked as: $ for d in drivers include/linux/qed sound; do \ spatch --include-headers --sp-file mmiowb.cocci --dir $d --in-place; done Signed-off-by: Will Deacon --- drivers/crypto/cavium/nitrox/nitrox_reqmgr.c | 4 --- drivers/dma/txx9dmac.c | 3 --- drivers/firewire/ohci.c| 1 - drivers/gpu/drm/i915/intel_hdmi.c | 10 drivers/ide/tx4939ide.c| 2 -- drivers/infiniband/hw/hfi1/chip.c | 3 --- drivers/infiniband/hw/hfi1/pio.c | 1 - drivers/infiniband/hw/hns/hns_roce_hw_v1.c | 2 -- drivers/infiniband/hw/mlx4/qp.c| 6 - drivers/infiniband/hw/mlx5/qp.c| 1 - drivers/infiniband/hw/mthca/mthca_cmd.c| 6 - drivers/infiniband/hw/mthca/mthca_cq.c | 5 drivers/infiniband/hw/mthca/mthca_qp.c | 17 - drivers/infiniband/hw/mthca/mthca_srq.c| 6 - drivers/infiniband/hw/qedr/verbs.c | 12 - drivers/infiniband/hw/qib/qib_iba6120.c| 4 --- drivers/infiniband/hw/qib/qib_iba7220.c| 3 --- drivers/infiniband/hw/qib/qib_iba7322.c| 3 --- drivers/infiniband/hw/qib/qib_sd7220.c | 4 --- drivers/media/pci/dt3155/dt3155.c | 8 -- drivers/memstick/host/jmb38x_ms.c | 4 --- drivers/misc/ioc4.c| 2 -- drivers/misc/mei/hw-me.c | 3 --- drivers/misc/tifm_7xx1.c | 1 - drivers/mmc/host/alcor.c | 1 - drivers/mmc/host/sdhci.c | 13 -- drivers/mmc/host/tifm_sd.c | 3 --- drivers/mmc/host/via-sdmmc.c | 10 drivers/mtd/nand/raw/r852.c| 2 -- drivers/mtd/nand/raw/txx9ndfmc.c | 1 - drivers/net/ethernet/aeroflex/greth.c | 1 - drivers/net/ethernet/alacritech/slicoss.c | 4 --- drivers/net/ethernet/amazon/ena/ena_com.c | 1 - drivers/net/ethernet/atheros/atlx/atl1.c | 1 - drivers/net/ethernet/atheros/atlx/atl2.c | 1 - drivers/net/ethernet/broadcom/bnx2.c | 4 --- drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c| 2 -- drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h| 4 --- .../net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c| 1 - 
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 29 -- drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c | 1 - drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c | 2 -- drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c | 4 --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 3 --- drivers/net/ethernet/broadcom/tg3.c| 6 - .../net/ethernet/cavium/liquidio/cn66xx_device.c | 10 .../net/ethernet/cavium/liquidio/octeon_device.c | 1 - drivers/net/ethernet/cavium/liquidio/octeon_droq.c | 4 --- .../net/ethernet/cavium/liquidio/request_manager.c | 1 - drivers/net/ethernet/intel/e1000/e1000_main.c | 5 drivers/net/ethernet/intel/e1000e/netdev.c | 7 -- drivers/net/ethernet/intel/fm10k/fm10k_iov.c | 2 -- drivers/net/ethernet/intel/fm10k/fm10k_main.c | 5 drivers/net/ethernet/intel/i40e/i40e_txrx.c| 5 drivers/net/ethernet/intel/iavf/iavf_txrx.c| 5 drivers/net/ethernet/intel/ice/ice_txrx.c | 5 drivers/net/ethernet/intel/igb/igb_main.c | 5 drivers/net/ethernet/intel/igbvf/netdev.c | 4 --- drivers/net/ethernet/intel/igc/igc_main.c | 5 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 5 drivers/net/ethernet/marvell/sky2.c| 4 --- drivers/net/ethernet/mellanox/mlx4/catas.c | 4 --- drivers/net/ethernet/mellanox/mlx4/cmd.c | 13 -- drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 1 - drivers/net/ethernet/myricom/myri10ge/myri10ge.c | 2 -- drivers/net/ethernet/neterion/s2io.c | 2 -- drivers/net/ethernet/neterion/vxge/vxge-main.c | 5 drivers/net/ethernet/neterion/vxge/vxge-traffic.c | 4 --- drivers/net/ethernet/qlogic/qed/qed_int.c | 13 -- drivers/net/ethernet/qlogic/qed/qed_spq.c | 3 --- drivers/net/ethernet/qlogic/qede/qede_ethtool.c| 8 -- drivers/net/ethernet/qlogic/qede/qede_fp.c | 8 -- drivers/net/ethernet/qlogic/qla3xxx.c | 1 - drivers/net/ethernet/qlogic/qlge/qlge.h| 1 -
[RFC PATCH 03/20] mmiowb: Hook up mmiowb helpers to spinlocks and generic I/O accessors
Removing explicit calls to mmiowb() from driver code means that we must now call into the generic mmiowb_spin_{lock,unlock}() functions from the core spinlock code. In order to elide barriers following critical sections without any I/O writes, we also hook into the asm-generic I/O routines.

Signed-off-by: Will Deacon
---
 include/asm-generic/io.h        | 3 ++-
 include/linux/spinlock.h        | 11 ++-
 kernel/locking/spinlock_debug.c | 6 +-
 3 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h
index 303871651f8a..bc490a746602 100644
--- a/include/asm-generic/io.h
+++ b/include/asm-generic/io.h
@@ -19,6 +19,7 @@
 #include
 #endif

+#include
 #include

 #ifndef mmiowb
@@ -49,7 +50,7 @@

 /* serialize device access against a spin_unlock, usually handled there. */
 #ifndef __io_aw
-#define __io_aw()	barrier()
+#define __io_aw()	mmiowb_set_pending()
 #endif

 #ifndef __io_pbw
diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
index e089157dcf97..4298b1b31d9b 100644
--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -57,6 +57,7 @@
 #include
 #include
 #include
+#include

 /*
@@ -177,6 +178,7 @@ do { \
 static inline void do_raw_spin_lock(raw_spinlock_t *lock) __acquires(lock)
 {
 	__acquire(lock);
+	mmiowb_spin_lock();
 	arch_spin_lock(&lock->raw_lock);
 }

@@ -188,16 +190,23 @@ static inline void do_raw_spin_lock_flags(raw_spinlock_t *lock, unsigned long *flags) __acquires(lock)
 {
 	__acquire(lock);
+	mmiowb_spin_lock();
 	arch_spin_lock_flags(&lock->raw_lock, *flags);
 }

 static inline int do_raw_spin_trylock(raw_spinlock_t *lock)
 {
-	return arch_spin_trylock(&(lock)->raw_lock);
+	int ret = arch_spin_trylock(&(lock)->raw_lock);
+
+	if (ret)
+		mmiowb_spin_lock();
+
+	return ret;
 }

 static inline void do_raw_spin_unlock(raw_spinlock_t *lock) __releases(lock)
 {
+	mmiowb_spin_unlock();
 	arch_spin_unlock(&lock->raw_lock);
 	__release(lock);
 }
diff --git a/kernel/locking/spinlock_debug.c b/kernel/locking/spinlock_debug.c
index 9aa0fccd5d43..654484b6e70c
100644
--- a/kernel/locking/spinlock_debug.c
+++ b/kernel/locking/spinlock_debug.c
@@ -109,6 +109,7 @@ static inline void debug_spin_unlock(raw_spinlock_t *lock)
  */
 void do_raw_spin_lock(raw_spinlock_t *lock)
 {
+	mmiowb_spin_lock();
 	debug_spin_lock_before(lock);
 	arch_spin_lock(&lock->raw_lock);
 	debug_spin_lock_after(lock);
@@ -118,8 +119,10 @@ int do_raw_spin_trylock(raw_spinlock_t *lock)
 {
 	int ret = arch_spin_trylock(&lock->raw_lock);

-	if (ret)
+	if (ret) {
+		mmiowb_spin_lock();
 		debug_spin_lock_after(lock);
+	}
 #ifndef CONFIG_SMP
 	/*
 	 * Must not happen on UP:
@@ -131,6 +134,7 @@ int do_raw_spin_trylock(raw_spinlock_t *lock)

 void do_raw_spin_unlock(raw_spinlock_t *lock)
 {
+	mmiowb_spin_unlock();
 	debug_spin_unlock(lock);
 	arch_spin_unlock(&lock->raw_lock);
 }
-- 
2.11.0
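For readers following along, the mmiowb_spin_{lock,unlock}() helpers that these hooks call into can be modelled in user-space C. This is a simplified, single-threaded sketch of the asm-generic tracking logic (a nesting count plus a "pending I/O write" flag), not the kernel code itself:

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the per-CPU tracking state (the kernel keeps this in a
 * per-CPU structure; here a single global stands in for it). */
static struct {
	unsigned int nest_count;   /* depth of held spinlocks */
	bool mmiowb_pending;       /* was an MMIO write done under a lock? */
} state;

/* Stand-in for the architecture's real barrier, so we can assert on it. */
static bool barrier_issued;
static void mmiowb(void) { barrier_issued = true; }

/* Called from the I/O accessors (via __io_aw()): remember that an MMIO
 * write happened inside a critical section. */
static void mmiowb_set_pending(void)
{
	if (state.nest_count)
		state.mmiowb_pending = true;
}

/* Called from do_raw_spin_lock() and a successful trylock. */
static void mmiowb_spin_lock(void)
{
	state.nest_count++;
}

/* Called from do_raw_spin_unlock(): issue the barrier only if an MMIO
 * write actually occurred, eliding it for I/O-free critical sections. */
static void mmiowb_spin_unlock(void)
{
	if (state.mmiowb_pending) {
		state.mmiowb_pending = false;
		mmiowb();
	}
	state.nest_count--;
}
```

The counter (rather than a plain flag) is what lets this scheme survive nested locks, which is exactly where the old PowerPC single-flag trick falls down, as the powerpc patch later in the series notes.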
[RFC PATCH 04/20] ARM: io: Remove useless definition of mmiowb()
ARM includes asm-generic/io.h, which provides a dummy definition of mmiowb() if one isn't already provided by the architecture. Remove the useless definition.

Signed-off-by: Will Deacon
---
 arch/arm/include/asm/io.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/arm/include/asm/io.h b/arch/arm/include/asm/io.h
index 6b51826ab3d1..7e22c81398c4 100644
--- a/arch/arm/include/asm/io.h
+++ b/arch/arm/include/asm/io.h
@@ -281,8 +281,6 @@ extern void _memcpy_fromio(void *, const volatile void __iomem *, size_t);
 extern void _memcpy_toio(volatile void __iomem *, const void *, size_t);
 extern void _memset_io(volatile void __iomem *, int, size_t);

-#define mmiowb()
-
 /*
  * Memory access primitives
  *
-- 
2.11.0
[RFC PATCH 18/20] i40iw: Redefine i40iw_mmiowb() to do nothing
mmiowb() is now implicit in spin_unlock(), so there's no reason to call it from driver code. Redefine i40iw_mmiowb() to do nothing instead.

Signed-off-by: Will Deacon
---
 drivers/infiniband/hw/i40iw/i40iw_osdep.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/i40iw/i40iw_osdep.h b/drivers/infiniband/hw/i40iw/i40iw_osdep.h
index f27be3e7830b..d474aad62a81 100644
--- a/drivers/infiniband/hw/i40iw/i40iw_osdep.h
+++ b/drivers/infiniband/hw/i40iw/i40iw_osdep.h
@@ -211,7 +211,7 @@ enum i40iw_status_code i40iw_hw_manage_vf_pble_bp(struct i40iw_device *iwdev,
 struct i40iw_sc_vsi;
 void i40iw_hw_stats_start_timer(struct i40iw_sc_vsi *vsi);
 void i40iw_hw_stats_stop_timer(struct i40iw_sc_vsi *vsi);
-#define i40iw_mmiowb() mmiowb()
+#define i40iw_mmiowb() do { } while (0)
 void i40iw_wr32(struct i40iw_hw *hw, u32 reg, u32 value);
 u32 i40iw_rd32(struct i40iw_hw *hw, u32 reg);
 #endif	/* _I40IW_OSDEP_H_ */
-- 
2.11.0
[RFC PATCH 15/20] drivers: Remove useless trailing comments from mmiowb() invocations
In preparation for removing mmiowb() from driver code altogether using coccinelle, remove all trailing comments since they won't be picked up by spatch later on and would end up being preserved in the code.

Signed-off-by: Will Deacon
---
 drivers/infiniband/hw/hfi1/chip.c                | 2 +-
 drivers/infiniband/hw/qedr/verbs.c               | 2 +-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h  | 2 +-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 2 +-
 drivers/scsi/bnx2i/bnx2i_hwi.c                   | 2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/hw/hfi1/chip.c b/drivers/infiniband/hw/hfi1/chip.c
index b443642eac02..955bad21a519 100644
--- a/drivers/infiniband/hw/hfi1/chip.c
+++ b/drivers/infiniband/hw/hfi1/chip.c
@@ -8352,7 +8352,7 @@ static inline void clear_recv_intr(struct hfi1_ctxtdata *rcd)
 	struct hfi1_devdata *dd = rcd->dd;
 	u32 addr = CCE_INT_CLEAR + (8 * rcd->ireg);

-	mmiowb();	/* make sure everything before is written */
+	mmiowb();
 	write_csr(dd, addr, rcd->imask);
 	/* force the above write on the chip and get a value back */
 	(void)read_csr(dd, addr);
diff --git a/drivers/infiniband/hw/qedr/verbs.c b/drivers/infiniband/hw/qedr/verbs.c
index e1ccf32b1c3d..23353e0e4bd4 100644
--- a/drivers/infiniband/hw/qedr/verbs.c
+++ b/drivers/infiniband/hw/qedr/verbs.c
@@ -3744,7 +3744,7 @@ int qedr_post_recv(struct ib_qp *ibqp, const struct ib_recv_wr *wr,
 		if (rdma_protocol_iwarp(&dev->ibdev, 1)) {
 			writel(qp->rq.iwarp_db2_data.raw, qp->rq.iwarp_db2);
-			mmiowb();	/* for second doorbell */
+			mmiowb();
 		}

 		wr = wr->next;
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
index 2462e7aa0c5d..1ed068509337 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
@@ -527,7 +527,7 @@ static inline void bnx2x_update_rx_prod(struct bnx2x *bp,
 		REG_WR_RELAXED(bp, fp->ustorm_rx_prods_offset + i * 4,
 			       ((u32 *)&rx_prods)[i]);

-	mmiowb();	/* keep prod updates ordered */
+	mmiowb();
DP(NETIF_MSG_RX_STATUS, "queue[%d]: wrote bd_prod %u cqe_prod %u sge_prod %u\n", diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c index 3b5b47e98c73..64bc6d6fcd65 100644 --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c @@ -5244,7 +5244,7 @@ static void bnx2x_update_eq_prod(struct bnx2x *bp, u16 prod) { /* No memory barriers */ storm_memset_eq_prod(bp, prod, BP_FUNC(bp)); - mmiowb(); /* keep prod updates ordered */ + mmiowb(); } static int bnx2x_cnic_handle_cfc_del(struct bnx2x *bp, u32 cid, diff --git a/drivers/scsi/bnx2i/bnx2i_hwi.c b/drivers/scsi/bnx2i/bnx2i_hwi.c index fae6f71e677d..d56a78f411cd 100644 --- a/drivers/scsi/bnx2i/bnx2i_hwi.c +++ b/drivers/scsi/bnx2i/bnx2i_hwi.c @@ -280,7 +280,7 @@ static void bnx2i_ring_sq_dbell(struct bnx2i_conn *bnx2i_conn, int count) } else writew(count, ep->qp.ctx_base + CNIC_SEND_DOORBELL); - mmiowb(); /* flush posted PCI writes */ + mmiowb(); } -- 2.11.0
[RFC PATCH 14/20] Documentation: Kill all references to mmiowb()
The guarantees provided by mmiowb() are now provided implicitly by spin_unlock(), so we can remove all references to this most confusing of barriers from our Documentation. Good riddance.

Signed-off-by: Will Deacon
---
 Documentation/driver-api/device-io.rst  | 45 --
 Documentation/driver-api/pci/p2pdma.rst | 4 --
 Documentation/memory-barriers.txt       | 103 ++--
 3 files changed, 4 insertions(+), 148 deletions(-)

diff --git a/Documentation/driver-api/device-io.rst b/Documentation/driver-api/device-io.rst
index b00b23903078..0e389378f71d 100644
--- a/Documentation/driver-api/device-io.rst
+++ b/Documentation/driver-api/device-io.rst
@@ -103,51 +103,6 @@ continuing execution::
         ha->flags.ints_enabled = 0;
     }

-In addition to write posting, on some large multiprocessing systems
-(e.g. SGI Challenge, Origin and Altix machines) posted writes won't be
-strongly ordered coming from different CPUs. Thus it's important to
-properly protect parts of your driver that do memory-mapped writes with
-locks and use the :c:func:`mmiowb()` to make sure they arrive in the
-order intended. Issuing a regular readX() will also ensure write ordering,
-but should only be used when the
-driver has to be sure that the write has actually arrived at the device
-(not that it's simply ordered with respect to other writes), since a
-full readX() is a relatively expensive operation.
-
-Generally, one should use :c:func:`mmiowb()` prior to releasing a spinlock
-that protects regions using :c:func:`writeb()` or similar functions that
-aren't surrounded by readb() calls, which will ensure ordering
-and flushing. The following pseudocode illustrates what might occur if
-write ordering isn't guaranteed via :c:func:`mmiowb()` or one of the
-readX() functions::
-
-    CPU A:  spin_lock_irqsave(&dev_lock, flags)
-    CPU A:  ...
-    CPU A:  writel(newval, ring_ptr);
-    CPU A:  spin_unlock_irqrestore(&dev_lock, flags)
-            ...
-    CPU B:  spin_lock_irqsave(&dev_lock, flags)
-    CPU B:  writel(newval2, ring_ptr);
-    CPU B:  ...
-    CPU B:  spin_unlock_irqrestore(&dev_lock, flags)
-
-In the case above, newval2 could be written to ring_ptr before newval.
-Fixing it is easy though::
-
-    CPU A:  spin_lock_irqsave(&dev_lock, flags)
-    CPU A:  ...
-    CPU A:  writel(newval, ring_ptr);
-    CPU A:  mmiowb(); /* ensure no other writes beat us to the device */
-    CPU A:  spin_unlock_irqrestore(&dev_lock, flags)
-            ...
-    CPU B:  spin_lock_irqsave(&dev_lock, flags)
-    CPU B:  writel(newval2, ring_ptr);
-    CPU B:  ...
-    CPU B:  mmiowb();
-    CPU B:  spin_unlock_irqrestore(&dev_lock, flags)
-
-See tg3.c for a real world example of how to use :c:func:`mmiowb()`
-
 PCI ordering rules also guarantee that PIO read responses arrive after
 any outstanding DMA writes from that bus, since for some devices the
 result of a readb() call may signal to the driver that a DMA transaction is
diff --git a/Documentation/driver-api/pci/p2pdma.rst b/Documentation/driver-api/pci/p2pdma.rst
index 6d85b5a2598d..44deb52beeb4 100644
--- a/Documentation/driver-api/pci/p2pdma.rst
+++ b/Documentation/driver-api/pci/p2pdma.rst
@@ -132,10 +132,6 @@ precludes passing these pages to userspace.
 P2P memory is also technically IO memory but should never have any side
 effects behind it. Thus, the order of loads and stores should not be
 important and ioreadX(), iowriteX() and friends should not be necessary.
-However, as the memory is not cache coherent, if access ever needs to
-be protected by a spinlock then :c:func:`mmiowb()` must be used before
-unlocking the lock. (See ACQUIRES VS I/O ACCESSES in
-Documentation/memory-barriers.txt)

 P2P DMA Support Library
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 158947ae78c2..6d6eff413462 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1937,21 +1937,6 @@ There are some more advanced barrier functions:
      information on consistent memory.
-MMIO WRITE BARRIER --- - -The Linux kernel also has a special barrier for use with memory-mapped I/O -writes: - - mmiowb(); - -This is a variation on the mandatory write barrier that causes writes to weakly -ordered I/O regions to be partially ordered. Its effects may go beyond the -CPU->Hardware interface and actually affect the hardware at some level. - -See the subsection "Acquires vs I/O accesses" for more information. - - === IMPLICIT KERNEL MEMORY BARRIERS === @@ -2317,75 +2302,6 @@ But it won't see any of: *E, *F or *G following RELEASE Q - -ACQUIRES VS I/O ACCESSES - - -Under certain circumstances (especially involving NUMA), I/O accesses within -two spinlocked sections on two different CPUs may be seen as interleaved by the -PCI bridge, because the PCI bridge does not necessarily participate in the
[RFC PATCH 12/20] powerpc: mmiowb: Hook up mmiowb() implementation to asm-generic code
In a bid to kill off explicit mmiowb() usage in driver code, hook up the asm-generic mmiowb() tracking code but override all of the functions so that the existing (flawed) implementation is preserved on Power. Future work should either use the asm-generic implementation entirely, or implement a similar counting mechanism to avoid erroneously skipping barrier invocations when they are needed. Signed-off-by: Will Deacon --- arch/powerpc/Kconfig| 1 + arch/powerpc/include/asm/Kbuild | 1 - arch/powerpc/include/asm/io.h | 33 +++-- arch/powerpc/include/asm/mmiowb.h | 41 + arch/powerpc/include/asm/spinlock.h | 17 --- 5 files changed, 45 insertions(+), 48 deletions(-) create mode 100644 arch/powerpc/include/asm/mmiowb.h diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 2890d36eb531..242ee248b922 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -134,6 +134,7 @@ config PPC select ARCH_HAS_ELF_RANDOMIZE select ARCH_HAS_FORTIFY_SOURCE select ARCH_HAS_GCOV_PROFILE_ALL + select ARCH_HAS_MMIOWB if PPC64 && SMP select ARCH_HAS_PHYS_TO_DMA select ARCH_HAS_PMEM_APIif PPC64 select ARCH_HAS_PTE_SPECIAL diff --git a/arch/powerpc/include/asm/Kbuild b/arch/powerpc/include/asm/Kbuild index 57bd1f6660f4..77ff7fb24823 100644 --- a/arch/powerpc/include/asm/Kbuild +++ b/arch/powerpc/include/asm/Kbuild @@ -8,7 +8,6 @@ generic-y += irq_regs.h generic-y += irq_work.h generic-y += local64.h generic-y += mcs_spinlock.h -generic-y += mmiowb.h generic-y += preempt.h generic-y += rwsem.h generic-y += vtime.h diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h index 7f19fbd3ba55..828100476ba6 100644 --- a/arch/powerpc/include/asm/io.h +++ b/arch/powerpc/include/asm/io.h @@ -34,14 +34,11 @@ extern struct pci_dev *isa_bridge_pcidev; #include #include #include +#include #include #include #include -#ifdef CONFIG_PPC64 -#include -#endif - #define SIO_CONFIG_RA 0x398 #define SIO_CONFIG_RD 0x399 @@ -107,12 +104,6 @@ extern bool isa_io_special; * */ -#ifdef 
CONFIG_PPC64 -#define IO_SET_SYNC_FLAG() do { local_paca->io_sync = 1; } while(0) -#else -#define IO_SET_SYNC_FLAG() -#endif - #define DEF_MMIO_IN_X(name, size, insn)\ static inline u##size name(const volatile u##size __iomem *addr) \ { \ @@ -127,7 +118,7 @@ static inline void name(volatile u##size __iomem *addr, u##size val)\ { \ __asm__ __volatile__("sync;"#insn" %1,%y0" \ : "=Z" (*addr) : "r" (val) : "memory"); \ - IO_SET_SYNC_FLAG(); \ + mmiowb_set_pending(); \ } #define DEF_MMIO_IN_D(name, size, insn)\ @@ -144,7 +135,7 @@ static inline void name(volatile u##size __iomem *addr, u##size val)\ { \ __asm__ __volatile__("sync;"#insn"%U0%X0 %1,%0" \ : "=m" (*addr) : "r" (val) : "memory"); \ - IO_SET_SYNC_FLAG(); \ + mmiowb_set_pending(); \ } DEF_MMIO_IN_D(in_8, 8, lbz); @@ -652,24 +643,6 @@ static inline void name at \ #include -#ifdef CONFIG_PPC32 -#define mmiowb() -#else -/* - * Enforce synchronisation of stores vs. spin_unlock - * (this does it explicitly, though our implementation of spin_unlock - * does it implicitely too) - */ -static inline void mmiowb(void) -{ - unsigned long tmp; - - __asm__ __volatile__("sync; li %0,0; stb %0,%1(13)" - : "=" (tmp) : "i" (offsetof(struct paca_struct, io_sync)) - : "memory"); -} -#endif /* !CONFIG_PPC32 */ - static inline void iosync(void) { __asm__ __volatile__ ("sync" : : : "memory"); diff --git a/arch/powerpc/include/asm/mmiowb.h b/arch/powerpc/include/asm/mmiowb.h new file mode 100644 index ..7647ffcced1c --- /dev/null +++ b/arch/powerpc/include/asm/mmiowb.h @@ -0,0 +1,41 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_POWERPC_MMIOWB_H +#define _ASM_POWERPC_MMIOWB_H + +#ifdef CONFIG_ARCH_HAS_MMIOWB + +#include +#include +#include + +#define mmiowb() mb() + +/* + * Note: this is broken, as it doesn't correctly handle sequences such as: + * + * spin_lock();// io_sync = 0 + * outb(42, port); // io_sync = 1 + * spin_lock();// io_sync = 0 + * ... + * spin_unlock(); + * spin_unlock(); + * + * The
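The flaw in the single-flag io_sync scheme that this patch's comment describes can be demonstrated with a small user-space model (illustrative names; the real flag lives in the powerpc paca and the barrier is a `sync`):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Model of the flawed single-flag scheme: spin_lock() clears the flag,
 * an I/O write sets it, and spin_unlock() issues a barrier only if the
 * flag is set. A nested spin_lock() clears the flag set by an earlier
 * I/O write, so the outer unlock erroneously skips its barrier.
 */
static bool io_sync;
static int barriers;

static void flag_spin_lock(void)  { io_sync = false; }
static void flag_io_write(void)   { io_sync = true; }

static void flag_spin_unlock(void)
{
	if (io_sync) {
		barriers++;      /* stands in for the real sync/mmiowb() */
		io_sync = false;
	}
}
```

A counting scheme like the asm-generic one (nesting depth plus pending flag) does not lose the barrier in the nested case, which is why the commit message suggests it as the eventual replacement.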
[RFC PATCH 07/20] nds32: io: Remove useless definition of mmiowb()
mmiowb() only makes sense for SMP platforms, so we can remove it entirely for nds32.

Signed-off-by: Will Deacon
---
 arch/nds32/include/asm/io.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/nds32/include/asm/io.h b/arch/nds32/include/asm/io.h
index 71cd226d6863..5ef8ae5ba833 100644
--- a/arch/nds32/include/asm/io.h
+++ b/arch/nds32/include/asm/io.h
@@ -55,8 +55,6 @@ static inline u32 __raw_readl(const volatile void __iomem *addr)
 #define __iormb()	rmb()
 #define __iowmb()	wmb()

-#define mmiowb()	__asm__ __volatile__ ("msync all" : : : "memory");
-
 /*
  * {read,write}{b,w,l,q}_relaxed() are like the regular version, but
  * are not guaranteed to provide ordering against spinlocks or memory
-- 
2.11.0
[RFC PATCH 13/20] riscv: mmiowb: Hook up mmiowb() implementation to asm-generic code
In a bid to kill off explicit mmiowb() usage in driver code, hook up the asm-generic mmiowb() tracking code for riscv, so that an mmiowb() is automatically issued from spin_unlock() if an I/O write was performed in the critical section. Signed-off-by: Will Deacon --- arch/riscv/Kconfig | 1 + arch/riscv/include/asm/Kbuild | 1 - arch/riscv/include/asm/io.h | 15 ++- arch/riscv/include/asm/mmiowb.h | 14 ++ 4 files changed, 17 insertions(+), 14 deletions(-) create mode 100644 arch/riscv/include/asm/mmiowb.h diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig index 515fc3cc9687..3e7acb997235 100644 --- a/arch/riscv/Kconfig +++ b/arch/riscv/Kconfig @@ -49,6 +49,7 @@ config RISCV select RISCV_TIMER select GENERIC_IRQ_MULTI_HANDLER select ARCH_HAS_PTE_SPECIAL + select ARCH_HAS_MMIOWB if SMP config MMU def_bool y diff --git a/arch/riscv/include/asm/Kbuild b/arch/riscv/include/asm/Kbuild index 221cd2ec78a4..cccd12cf27d4 100644 --- a/arch/riscv/include/asm/Kbuild +++ b/arch/riscv/include/asm/Kbuild @@ -21,7 +21,6 @@ generic-y += kvm_para.h generic-y += local.h generic-y += local64.h generic-y += mm-arch-hooks.h -generic-y += mmiowb.h generic-y += mutex.h generic-y += percpu.h generic-y += preempt.h diff --git a/arch/riscv/include/asm/io.h b/arch/riscv/include/asm/io.h index 1d9c1376dc64..744fd92e77bc 100644 --- a/arch/riscv/include/asm/io.h +++ b/arch/riscv/include/asm/io.h @@ -20,6 +20,7 @@ #define _ASM_RISCV_IO_H #include +#include extern void __iomem *ioremap(phys_addr_t offset, unsigned long size); @@ -100,18 +101,6 @@ static inline u64 __raw_readq(const volatile void __iomem *addr) #endif /* - * FIXME: I'm flip-flopping on whether or not we should keep this or enforce - * the ordering with I/O on spinlocks like PowerPC does. The worry is that - * drivers won't get this correct, but I also don't want to introduce a fence - * into the lock code that otherwise only uses AMOs (and is essentially defined - * by the ISA to be correct). 
For now I'm leaving this here: "o,w" is - * sufficient to ensure that all writes to the device have completed before the - * write to the spinlock is allowed to commit. I surmised this from reading - * "ACQUIRES VS I/O ACCESSES" in memory-barriers.txt. - */ -#define mmiowb() __asm__ __volatile__ ("fence o,w" : : : "memory"); - -/* * Unordered I/O memory access primitives. These are even more relaxed than * the relaxed versions, as they don't even order accesses between successive * operations to the I/O regions. @@ -165,7 +154,7 @@ static inline u64 __raw_readq(const volatile void __iomem *addr) #define __io_br() do {} while (0) #define __io_ar(v) __asm__ __volatile__ ("fence i,r" : : : "memory"); #define __io_bw() __asm__ __volatile__ ("fence w,o" : : : "memory"); -#define __io_aw() do {} while (0) +#define __io_aw() mmiowb_set_pending() #define readb(c) ({ u8 __v; __io_br(); __v = readb_cpu(c); __io_ar(__v); __v; }) #define readw(c) ({ u16 __v; __io_br(); __v = readw_cpu(c); __io_ar(__v); __v; }) diff --git a/arch/riscv/include/asm/mmiowb.h b/arch/riscv/include/asm/mmiowb.h new file mode 100644 index ..5d7e3a2b4e3b --- /dev/null +++ b/arch/riscv/include/asm/mmiowb.h @@ -0,0 +1,14 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef _ASM_RISCV_MMIOWB_H +#define _ASM_RISCV_MMIOWB_H + +/* + * "o,w" is sufficient to ensure that all writes to the device have completed + * before the write to the spinlock is allowed to commit. + */ +#define mmiowb() __asm__ __volatile__ ("fence o,w" : : : "memory"); + +#include + +#endif /* ASM_RISCV_MMIOWB_H */ -- 2.11.0
[RFC PATCH 08/20] m68k: io: Remove useless definition of mmiowb()
m68k includes asm-generic/io.h, which provides a dummy definition of mmiowb() if one isn't already provided by the architecture. Remove the useless definition.

Cc: Geert Uytterhoeven
Signed-off-by: Will Deacon
---
 arch/m68k/include/asm/io_mm.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/m68k/include/asm/io_mm.h b/arch/m68k/include/asm/io_mm.h
index 782b78f8a048..6c03ca5bc436 100644
--- a/arch/m68k/include/asm/io_mm.h
+++ b/arch/m68k/include/asm/io_mm.h
@@ -377,8 +377,6 @@ static inline void isa_delay(void)
 #define writesw(port, buf, nr)	raw_outsw((port), (u16 *)(buf), (nr))
 #define writesl(port, buf, nr)	raw_outsl((port), (u32 *)(buf), (nr))

-#define mmiowb()
-
 #ifndef CONFIG_SUN3
 #define IO_SPACE_LIMIT 0x
 #else
-- 
2.11.0
[RFC PATCH 00/20] Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb())
Hi all, This series attempts to remove explicit mmiowb() calls from driver code, instead adding it to spin_unlock() for architectures that require it. The infamous "PowerPC trick" was found to be flawed, so a correct but probably slower version is made generic for architectures wishing to use it. I've ported RISC-V over to this interface, and hooked up PowerPC to the generic infrastructure but with their existing implementation. Other architectures that require a definition for mmiowb() now have it unconditionally called from their spin_unlock() code. If this is deemed to be expensive, they should be able to do something similar to RISC-V. I used coccinelle to remove the vast majority of the users, and the trivial spatch invocation is included in the commit message of that patch. I've also *build-tested* this for RISC-V, ia64, mips64, sh, powerpc64 and arm64. This is based on a couple of other I/O-related series that I have pending: https://lkml.org/lkml/2019/2/22/484 https://lkml.org/lkml/2019/2/22/500 so I've pushed the whole lot out on a branch for your comfort and convenience: git://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git io Feedback welcome, Will Cc: "Paul E. McKenney" Cc: Benjamin Herrenschmidt Cc: Michael Ellerman Cc: Arnd Bergmann Cc: Peter Zijlstra Cc: Andrea Parri Cc: Palmer Dabbelt Cc: Daniel Lustig Cc: David Howells Cc: Alan Stern Cc: Linus Torvalds Cc: "Maciej W. 
Rozycki" Cc: Paul Burton Cc: Ingo Molnar Cc: Yoshinori Sato Cc: Rich Felker Cc: Tony Luck --->8 Will Deacon (20): asm-generic/mmiowb: Add generic implementation of mmiowb() tracking arch: Use asm-generic header for asm/mmiowb.h mmiowb: Hook up mmiowb helpers to spinlocks and generic I/O accessors ARM: io: Remove useless definition of mmiowb() arm64: io: Remove useless definition of mmiowb() x86: io: Remove useless definition of mmiowb() nds32: io: Remove useless definition of mmiowb() m68k: io: Remove useless definition of mmiowb() sh: Add unconditional mmiowb() to arch_spin_unlock() mips: Add unconditional mmiowb() to arch_spin_unlock() ia64: Add unconditional mmiowb() to arch_spin_unlock() powerpc: mmiowb: Hook up mmwiob() implementation to asm-generic code riscv: mmiowb: Hook up mmwiob() implementation to asm-generic code Documentation: Kill all references to mmiowb() drivers: Remove useless trailing comments from mmiowb() invocations drivers: Remove explicit invocations of mmiowb() scsi: qla1280: Remove stale comment about mmiowb() i40iw: Redefine i40iw_mmiowb() to do nothing net: silan: sc92031: Remove stale comment about mmiowb() arch: Remove dummy mmiowb() definitions from arch code Documentation/driver-api/device-io.rst | 45 - Documentation/driver-api/pci/p2pdma.rst| 4 - Documentation/memory-barriers.txt | 103 + arch/alpha/include/asm/Kbuild | 1 + arch/alpha/include/asm/io.h| 2 - arch/arc/include/asm/Kbuild| 1 + arch/arm/include/asm/Kbuild| 1 + arch/arm/include/asm/io.h | 2 - arch/arm64/include/asm/Kbuild | 1 + arch/arm64/include/asm/io.h| 2 - arch/c6x/include/asm/Kbuild| 1 + arch/csky/include/asm/Kbuild | 1 + arch/h8300/include/asm/Kbuild | 1 + arch/hexagon/include/asm/Kbuild| 1 + arch/hexagon/include/asm/io.h | 2 - arch/ia64/include/asm/io.h | 4 - arch/ia64/include/asm/mmiowb.h | 12 +++ arch/ia64/include/asm/spinlock.h | 2 + arch/m68k/include/asm/Kbuild | 1 + arch/m68k/include/asm/io_mm.h | 2 - arch/microblaze/include/asm/Kbuild | 1 + 
arch/mips/include/asm/io.h | 3 - arch/mips/include/asm/mmiowb.h | 11 +++ arch/mips/include/asm/spinlock.h | 15 +++ arch/nds32/include/asm/Kbuild | 1 + arch/nds32/include/asm/io.h| 2 - arch/nios2/include/asm/Kbuild | 1 + arch/openrisc/include/asm/Kbuild | 1 + arch/parisc/include/asm/Kbuild | 1 + arch/parisc/include/asm/io.h | 2 - arch/powerpc/Kconfig | 1 + arch/powerpc/include/asm/io.h | 33 +-- arch/powerpc/include/asm/mmiowb.h | 41 arch/powerpc/include/asm/spinlock.h| 17 arch/riscv/Kconfig | 1 + arch/riscv/include/asm/io.h| 15 +-- arch/riscv/include/asm/mmiowb.h| 14 +++
[RFC PATCH 02/20] arch: Use asm-generic header for asm/mmiowb.h
Hook up asm-generic/mmiowb.h to Kbuild for all architectures so that we can subsequently include asm/mmiowb.h from core code. Cc: Masahiro Yamada Cc: Arnd Bergmann Signed-off-by: Will Deacon --- arch/alpha/include/asm/Kbuild | 1 + arch/arc/include/asm/Kbuild| 1 + arch/arm/include/asm/Kbuild| 1 + arch/arm64/include/asm/Kbuild | 1 + arch/c6x/include/asm/Kbuild| 1 + arch/csky/include/asm/Kbuild | 1 + arch/h8300/include/asm/Kbuild | 1 + arch/hexagon/include/asm/Kbuild| 1 + arch/ia64/include/asm/Kbuild | 1 + arch/m68k/include/asm/Kbuild | 1 + arch/microblaze/include/asm/Kbuild | 1 + arch/mips/include/asm/Kbuild | 1 + arch/nds32/include/asm/Kbuild | 1 + arch/nios2/include/asm/Kbuild | 1 + arch/openrisc/include/asm/Kbuild | 1 + arch/parisc/include/asm/Kbuild | 1 + arch/powerpc/include/asm/Kbuild| 1 + arch/riscv/include/asm/Kbuild | 1 + arch/s390/include/asm/Kbuild | 1 + arch/sh/include/asm/Kbuild | 1 + arch/sparc/include/asm/Kbuild | 1 + arch/um/include/asm/Kbuild | 1 + arch/unicore32/include/asm/Kbuild | 1 + arch/xtensa/include/asm/Kbuild | 1 + 24 files changed, 24 insertions(+) diff --git a/arch/alpha/include/asm/Kbuild b/arch/alpha/include/asm/Kbuild index dc0ab28baca1..2b4b2fe715bb 100644 --- a/arch/alpha/include/asm/Kbuild +++ b/arch/alpha/include/asm/Kbuild @@ -8,6 +8,7 @@ generic-y += fb.h generic-y += irq_work.h generic-y += mcs_spinlock.h generic-y += mm-arch-hooks.h +generic-y += mmiowb.h generic-y += preempt.h generic-y += sections.h generic-y += trace_clock.h diff --git a/arch/arc/include/asm/Kbuild b/arch/arc/include/asm/Kbuild index caa270261521..138f564afc74 100644 --- a/arch/arc/include/asm/Kbuild +++ b/arch/arc/include/asm/Kbuild @@ -14,6 +14,7 @@ generic-y += local.h generic-y += local64.h generic-y += mcs_spinlock.h generic-y += mm-arch-hooks.h +generic-y += mmiowb.h generic-y += msi.h generic-y += parport.h generic-y += percpu.h diff --git a/arch/arm/include/asm/Kbuild b/arch/arm/include/asm/Kbuild index 1d66db9c9db5..ee6d155ca372 100644 --- 
a/arch/arm/include/asm/Kbuild +++ b/arch/arm/include/asm/Kbuild @@ -9,6 +9,7 @@ generic-y += kdebug.h generic-y += local.h generic-y += local64.h generic-y += mm-arch-hooks.h +generic-y += mmiowb.h generic-y += msi.h generic-y += parport.h generic-y += preempt.h diff --git a/arch/arm64/include/asm/Kbuild b/arch/arm64/include/asm/Kbuild index 1e17ea5c372b..3dae4fd028cf 100644 --- a/arch/arm64/include/asm/Kbuild +++ b/arch/arm64/include/asm/Kbuild @@ -13,6 +13,7 @@ generic-y += local.h generic-y += local64.h generic-y += mcs_spinlock.h generic-y += mm-arch-hooks.h +generic-y += mmiowb.h generic-y += msi.h generic-y += qrwlock.h generic-y += qspinlock.h diff --git a/arch/c6x/include/asm/Kbuild b/arch/c6x/include/asm/Kbuild index 63b4a1705182..fda53087eabc 100644 --- a/arch/c6x/include/asm/Kbuild +++ b/arch/c6x/include/asm/Kbuild @@ -22,6 +22,7 @@ generic-y += kprobes.h generic-y += local.h generic-y += mcs_spinlock.h generic-y += mm-arch-hooks.h +generic-y += mmiowb.h generic-y += mmu.h generic-y += mmu_context.h generic-y += pci.h diff --git a/arch/csky/include/asm/Kbuild b/arch/csky/include/asm/Kbuild index 2a0abe8f2a35..95f4e550db8a 100644 --- a/arch/csky/include/asm/Kbuild +++ b/arch/csky/include/asm/Kbuild @@ -28,6 +28,7 @@ generic-y += linkage.h generic-y += local.h generic-y += local64.h generic-y += mm-arch-hooks.h +generic-y += mmiowb.h generic-y += module.h generic-y += mutex.h generic-y += pci.h diff --git a/arch/h8300/include/asm/Kbuild b/arch/h8300/include/asm/Kbuild index 961c1dc064e1..86ad35d93861 100644 --- a/arch/h8300/include/asm/Kbuild +++ b/arch/h8300/include/asm/Kbuild @@ -29,6 +29,7 @@ generic-y += local.h generic-y += local64.h generic-y += mcs_spinlock.h generic-y += mm-arch-hooks.h +generic-y += mmiowb.h generic-y += mmu.h generic-y += mmu_context.h generic-y += module.h diff --git a/arch/hexagon/include/asm/Kbuild b/arch/hexagon/include/asm/Kbuild index b25fd42aa0f4..ff9134a01a13 100644 --- a/arch/hexagon/include/asm/Kbuild +++ 
b/arch/hexagon/include/asm/Kbuild @@ -23,6 +23,7 @@ generic-y += local.h generic-y += local64.h generic-y += mcs_spinlock.h generic-y += mm-arch-hooks.h +generic-y += mmiowb.h generic-y += pci.h generic-y += percpu.h generic-y += preempt.h diff --git a/arch/ia64/include/asm/Kbuild b/arch/ia64/include/asm/Kbuild index 43e21fe3499c..3273d7aedfa0 100644 --- a/arch/ia64/include/asm/Kbuild +++ b/arch/ia64/include/asm/Kbuild @@ -4,6 +4,7 @@ generic-y += exec.h generic-y += irq_work.h generic-y += mcs_spinlock.h generic-y += mm-arch-hooks.h +generic-y += mmiowb.h generic-y += preempt.h generic-y += trace_clock.h generic-y += vtime.h diff --git a/arch/m68k/include/asm/Kbuild b/arch/m68k/include/asm/Kbuild index 95f8f631c4df..2add40f9fdef 100644 --- a/arch/m68k/include/asm/Kbuild +++ b/arch/m68k/include/asm/Kbuild
[RFC PATCH 06/20] x86: io: Remove useless definition of mmiowb()
x86 maps mmiowb() to barrier(), but this is superfluous because a
compiler barrier is already implied by spin_unlock(). Since x86 also
includes asm-generic/io.h in its asm/io.h file, we can remove the
definition entirely and pick up the dummy definition from core code.

Signed-off-by: Will Deacon
---
 arch/x86/include/asm/io.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index 686247db3106..a06a9f8294ea 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -90,8 +90,6 @@ build_mmio_write(__writel, "l", unsigned int, "r", )
 #define __raw_writew __writew
 #define __raw_writel __writel
 
-#define mmiowb() barrier()
-
 #ifdef CONFIG_X86_64
 
 build_mmio_read(readq, "q", u64, "=r", :"memory")
-- 
2.11.0
Re: [PATCH] x86/kvm/mmu: make mmu->prev_roots cache work for NPT case
Paolo Bonzini writes: > On 22/02/19 17:46, Vitaly Kuznetsov wrote: >> I noticed that fast_cr3_switch() always fails when we switch back from L2 >> to L1 as it is not able to find a cached root. This is odd: host's CR3 >> usually stays the same, we expect to always follow the fast path. Turns >> out the problem is that page role is always mismatched because >> kvm_mmu_get_page() filters out cr4_pae when direct, the value is stored >> in page header and later compared with new_role in cached_root_available(). >> As cr4_pae is always set in long mode prev_roots cache is dysfunctional. > > Really cr4_pae means "are the PTEs 8 bytes". So I think your patch is > correct but on top we should set it to 1 (not zero!!) for > kvm_calc_shadow_ept_root_page_role, init_kvm_nested_mmu and > kvm_calc_tdp_mmu_root_page_role. Or maybe everything breaks with that > change. > Yes, exactly. If we put '1' there kvm_mmu_get_page() will again filter it out and we won't be able to find the root in prev_roots cache :-( >> - Do not clear cr4_pae in kvm_mmu_get_page() and check direct on call sites >> (detect_write_misaligned(), get_written_sptes()). > > These only run with shadow page tables, by the way. > Yes, and that's why I think it may make sense to move the filtering logic there. At least in other cases cr4_pae will always be equal to is_pae(). It seems I know too little about shadow paging and all these corner cases :-( -- Vitaly
[RFC PATCH 09/20] sh: Add unconditional mmiowb() to arch_spin_unlock()
The mmiowb() macro is horribly difficult to use and drivers will
continue to work most of the time if they omit a call when it is
required.

Rather than rely on driver authors getting this right, push mmiowb()
into arch_spin_unlock() for sh. If this is deemed to be a performance
issue, a subsequent optimisation could make use of ARCH_HAS_MMIOWB to
elide the barrier in cases where no I/O writes were performed inside
the critical section.

Cc: Yoshinori Sato
Cc: Rich Felker
Signed-off-by: Will Deacon
---
 arch/sh/include/asm/Kbuild          |  1 -
 arch/sh/include/asm/io.h            |  3 ---
 arch/sh/include/asm/mmiowb.h        | 12 ++++++++++++
 arch/sh/include/asm/spinlock-llsc.h |  2 ++
 4 files changed, 14 insertions(+), 4 deletions(-)
 create mode 100644 arch/sh/include/asm/mmiowb.h

diff --git a/arch/sh/include/asm/Kbuild b/arch/sh/include/asm/Kbuild
index e3f6926c4b86..a6ef3fee5f85 100644
--- a/arch/sh/include/asm/Kbuild
+++ b/arch/sh/include/asm/Kbuild
@@ -13,7 +13,6 @@ generic-y += local.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
-generic-y += mmiowb.h
 generic-y += parport.h
 generic-y += percpu.h
 generic-y += preempt.h
diff --git a/arch/sh/include/asm/io.h b/arch/sh/include/asm/io.h
index 4f7f235f15f8..c28e37a344ad 100644
--- a/arch/sh/include/asm/io.h
+++ b/arch/sh/include/asm/io.h
@@ -229,9 +229,6 @@ __BUILD_IOPORT_STRING(q, u64)
 
 #define IO_SPACE_LIMIT 0x
 
-/* synco on SH-4A, otherwise a nop */
-#define mmiowb() wmb()
-
 /* We really want to try and get these to memcpy etc */
 void memcpy_fromio(void *, const volatile void __iomem *, unsigned long);
 void memcpy_toio(volatile void __iomem *, const void *, unsigned long);
diff --git a/arch/sh/include/asm/mmiowb.h b/arch/sh/include/asm/mmiowb.h
new file mode 100644
index ..535d59735f1d
--- /dev/null
+++ b/arch/sh/include/asm/mmiowb.h
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_SH_MMIOWB_H
+#define __ASM_SH_MMIOWB_H
+
+#include
+
+/* synco on SH-4A, otherwise a nop */
+#define mmiowb() wmb()
+
+#include
+
+#endif /* __ASM_SH_MMIOWB_H */
diff --git a/arch/sh/include/asm/spinlock-llsc.h b/arch/sh/include/asm/spinlock-llsc.h
index 786ee0fde3b0..7fd929cd2e7a 100644
--- a/arch/sh/include/asm/spinlock-llsc.h
+++ b/arch/sh/include/asm/spinlock-llsc.h
@@ -47,6 +47,8 @@ static inline void arch_spin_unlock(arch_spinlock_t *lock)
 {
 	unsigned long tmp;
 
+	/* This could be optimised with ARCH_HAS_MMIOWB */
+	mmiowb();
 	__asm__ __volatile__ (
 		"mov	#1, %0	! arch_spin_unlock	\n\t"
 		"mov.l	%0, @%1	\n\t"
-- 
2.11.0
[RFC PATCH 11/20] ia64: Add unconditional mmiowb() to arch_spin_unlock()
The mmiowb() macro is horribly difficult to use and drivers will
continue to work most of the time if they omit a call when it is
required.

Rather than rely on driver authors getting this right, push mmiowb()
into arch_spin_unlock() for ia64. If this is deemed to be a performance
issue, a subsequent optimisation could make use of ARCH_HAS_MMIOWB to
elide the barrier in cases where no I/O writes were performed inside
the critical section.

Signed-off-by: Will Deacon
---
 arch/ia64/include/asm/Kbuild     |  1 -
 arch/ia64/include/asm/io.h       |  4 ----
 arch/ia64/include/asm/mmiowb.h   | 12 ++++++++++++
 arch/ia64/include/asm/spinlock.h |  2 ++
 4 files changed, 14 insertions(+), 5 deletions(-)
 create mode 100644 arch/ia64/include/asm/mmiowb.h

diff --git a/arch/ia64/include/asm/Kbuild b/arch/ia64/include/asm/Kbuild
index 3273d7aedfa0..43e21fe3499c 100644
--- a/arch/ia64/include/asm/Kbuild
+++ b/arch/ia64/include/asm/Kbuild
@@ -4,7 +4,6 @@ generic-y += exec.h
 generic-y += irq_work.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
-generic-y += mmiowb.h
 generic-y += preempt.h
 generic-y += trace_clock.h
 generic-y += vtime.h
diff --git a/arch/ia64/include/asm/io.h b/arch/ia64/include/asm/io.h
index 1e6fef69bb01..7f2371ba04a4 100644
--- a/arch/ia64/include/asm/io.h
+++ b/arch/ia64/include/asm/io.h
@@ -119,8 +119,6 @@ extern int valid_mmap_phys_addr_range (unsigned long pfn, size_t count);
  * Ensure ordering of I/O space writes. This will make sure that writes
  * following the barrier will arrive after all previous writes. For most
  * ia64 platforms, this is a simple 'mf.a' instruction.
- *
- * See Documentation/driver-api/device-io.rst for more information.
  */
 static inline void ___ia64_mmiowb(void)
 {
@@ -296,7 +294,6 @@ __outsl (unsigned long port, const void *src, unsigned long count)
 #define __outb		platform_outb
 #define __outw		platform_outw
 #define __outl		platform_outl
-#define __mmiowb	platform_mmiowb
 
 #define inb(p)		__inb(p)
 #define inw(p)		__inw(p)
@@ -310,7 +307,6 @@ __outsl (unsigned long port, const void *src, unsigned long count)
 #define outsb(p,s,c)	__outsb(p,s,c)
 #define outsw(p,s,c)	__outsw(p,s,c)
 #define outsl(p,s,c)	__outsl(p,s,c)
-#define mmiowb()	__mmiowb()
 
 /*
  * The address passed to these functions are ioremap()ped already.
diff --git a/arch/ia64/include/asm/mmiowb.h b/arch/ia64/include/asm/mmiowb.h
new file mode 100644
index ..238d56172c6f
--- /dev/null
+++ b/arch/ia64/include/asm/mmiowb.h
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _ASM_IA64_MMIOWB_H
+#define _ASM_IA64_MMIOWB_H
+
+#include
+
+#define mmiowb() platform_mmiowb()
+
+#include
+
+#endif /* _ASM_IA64_MMIOWB_H */
diff --git a/arch/ia64/include/asm/spinlock.h b/arch/ia64/include/asm/spinlock.h
index afd0b3121b4c..5f620e66384e 100644
--- a/arch/ia64/include/asm/spinlock.h
+++ b/arch/ia64/include/asm/spinlock.h
@@ -73,6 +73,8 @@ static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
 {
 	unsigned short *p = (unsigned short *)&lock->lock + 1, tmp;
 
+	/* This could be optimised with ARCH_HAS_MMIOWB */
+	mmiowb();
 	asm volatile ("ld2.bias %0=[%1]" : "=r"(tmp) : "r"(p));
 	WRITE_ONCE(*p, (tmp + 2) & ~1);
 }
-- 
2.11.0
[RFC PATCH 10/20] mips: Add unconditional mmiowb() to arch_spin_unlock()
The mmiowb() macro is horribly difficult to use and drivers will
continue to work most of the time if they omit a call when it is
required.

Rather than rely on driver authors getting this right, push mmiowb()
into arch_spin_unlock() for mips. If this is deemed to be a performance
issue, a subsequent optimisation could make use of ARCH_HAS_MMIOWB to
elide the barrier in cases where no I/O writes were performed inside
the critical section.

Signed-off-by: Will Deacon
---
 arch/mips/include/asm/Kbuild     |  1 -
 arch/mips/include/asm/io.h       |  3 ---
 arch/mips/include/asm/mmiowb.h   | 11 +++++++++++
 arch/mips/include/asm/spinlock.h | 15 +++++++++++++++
 4 files changed, 26 insertions(+), 4 deletions(-)
 create mode 100644 arch/mips/include/asm/mmiowb.h

diff --git a/arch/mips/include/asm/Kbuild b/arch/mips/include/asm/Kbuild
index 5653b1e47dd0..f15d5db5dd67 100644
--- a/arch/mips/include/asm/Kbuild
+++ b/arch/mips/include/asm/Kbuild
@@ -13,7 +13,6 @@ generic-y += irq_work.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
-generic-y += mmiowb.h
 generic-y += msi.h
 generic-y += parport.h
 generic-y += percpu.h
diff --git a/arch/mips/include/asm/io.h b/arch/mips/include/asm/io.h
index 845fbbc7a2e3..29997e42480e 100644
--- a/arch/mips/include/asm/io.h
+++ b/arch/mips/include/asm/io.h
@@ -102,9 +102,6 @@ static inline void set_io_port_base(unsigned long base)
 #define iobarrier_w() wmb()
 #define iobarrier_sync() iob()
 
-/* Some callers use this older API instead. */
-#define mmiowb() iobarrier_w()
-
 /*
  * virt_to_phys	-	map virtual addresses to physical
  * @address: address to remap
diff --git a/arch/mips/include/asm/mmiowb.h b/arch/mips/include/asm/mmiowb.h
new file mode 100644
index ..a40824e3ef8e
--- /dev/null
+++ b/arch/mips/include/asm/mmiowb.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_MMIOWB_H
+#define _ASM_MMIOWB_H
+
+#include
+
+#define mmiowb() iobarrier_w()
+
+#include
+
+#endif /* _ASM_MMIOWB_H */
diff --git a/arch/mips/include/asm/spinlock.h b/arch/mips/include/asm/spinlock.h
index ee81297d9117..8a88eb265516 100644
--- a/arch/mips/include/asm/spinlock.h
+++ b/arch/mips/include/asm/spinlock.h
@@ -11,6 +11,21 @@
 
 #include
 #include
+
+#include
+
+#define queued_spin_unlock queued_spin_unlock
+/**
+ * queued_spin_unlock - release a queued spinlock
+ * @lock : Pointer to queued spinlock structure
+ */
+static inline void queued_spin_unlock(struct qspinlock *lock)
+{
+	/* This could be optimised with ARCH_HAS_MMIOWB */
+	mmiowb();
+	smp_store_release(&lock->locked, 0);
+}
+
 #include
 
 #endif /* _ASM_SPINLOCK_H */
-- 
2.11.0
[RFC PATCH 05/20] arm64: io: Remove useless definition of mmiowb()
arm64 includes asm-generic/io.h, which provides a dummy definition of
mmiowb() if one isn't already provided by the architecture. Remove the
useless definition.

Signed-off-by: Will Deacon
---
 arch/arm64/include/asm/io.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h
index 8bb7210ac286..b807cb9b517d 100644
--- a/arch/arm64/include/asm/io.h
+++ b/arch/arm64/include/asm/io.h
@@ -124,8 +124,6 @@ static inline u64 __raw_readq(const volatile void __iomem *addr)
 #define __io_par(v)	__iormb(v)
 #define __iowmb()	wmb()
 
-#define mmiowb()	do { } while (0)
-
 /*
  * Relaxed I/O memory access primitives. These follow the Device memory
  * ordering rules but do not guarantee any ordering relative to Normal memory
-- 
2.11.0
[RFC PATCH 01/20] asm-generic/mmiowb: Add generic implementation of mmiowb() tracking
In preparation for removing all explicit mmiowb() calls from driver
code, implement a tracking system in asm-generic based on the PowerPC
implementation. This allows architectures with a non-empty mmiowb()
definition to automatically have the barrier inserted in spin_unlock()
following a critical section containing an I/O write.

Signed-off-by: Will Deacon
---
 include/asm-generic/mmiowb.h | 60 ++++++++++++++++++++++++++++++++++++
 kernel/Kconfig.locks         |  3 +++
 kernel/locking/spinlock.c    |  5 ++++
 3 files changed, 68 insertions(+)
 create mode 100644 include/asm-generic/mmiowb.h

diff --git a/include/asm-generic/mmiowb.h b/include/asm-generic/mmiowb.h
new file mode 100644
index ..1cec8907806f
--- /dev/null
+++ b/include/asm-generic/mmiowb.h
@@ -0,0 +1,60 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_GENERIC_MMIOWB_H
+#define __ASM_GENERIC_MMIOWB_H
+
+/*
+ * Generic implementation of mmiowb() tracking for spinlocks.
+ *
+ * If your architecture doesn't ensure that writes to an I/O peripheral
+ * within two spinlocked sections on two different CPUs are seen by the
+ * peripheral in the order corresponding to the lock handover, then you
+ * need to follow these FIVE easy steps:
+ *
+ *	1. Implement mmiowb() in asm/mmiowb.h and then #include this file
+ *	2. Ensure your I/O write accessors call mmiowb_set_pending()
+ *	3. Select ARCH_HAS_MMIOWB
+ *	4. Untangle the resulting mess of header files
+ *	5. Complain to your architects
+ */
+#if defined(CONFIG_ARCH_HAS_MMIOWB) && defined(CONFIG_SMP)
+
+#include
+#include
+#include
+
+struct mmiowb_state {
+	u16	nesting_count;
+	u16	mmiowb_pending;
+};
+DECLARE_PER_CPU(struct mmiowb_state, __mmiowb_state);
+
+#ifndef mmiowb_set_pending
+static inline void mmiowb_set_pending(void)
+{
+	__this_cpu_write(__mmiowb_state.mmiowb_pending, 1);
+}
+#endif
+
+#ifndef mmiowb_spin_lock
+static inline void mmiowb_spin_lock(void)
+{
+	if (__this_cpu_inc_return(__mmiowb_state.nesting_count) == 1)
+		__this_cpu_write(__mmiowb_state.mmiowb_pending, 0);
+}
+#endif
+
+#ifndef mmiowb_spin_unlock
+static inline void mmiowb_spin_unlock(void)
+{
+	if (__this_cpu_xchg(__mmiowb_state.mmiowb_pending, 0))
+		mmiowb();
+	__this_cpu_dec_return(__mmiowb_state.nesting_count);
+}
+#endif
+
+#else
+#define mmiowb_set_pending()	do { } while (0)
+#define mmiowb_spin_lock()	do { } while (0)
+#define mmiowb_spin_unlock()	do { } while (0)
+#endif	/* CONFIG_ARCH_HAS_MMIOWB && CONFIG_SMP */
+#endif	/* __ASM_GENERIC_MMIOWB_H */
diff --git a/kernel/Kconfig.locks b/kernel/Kconfig.locks
index 84d882f3e299..04976ae41176 100644
--- a/kernel/Kconfig.locks
+++ b/kernel/Kconfig.locks
@@ -248,3 +248,6 @@ config ARCH_USE_QUEUED_RWLOCKS
 config QUEUED_RWLOCKS
 	def_bool y if ARCH_USE_QUEUED_RWLOCKS
 	depends on SMP
+
+config ARCH_HAS_MMIOWB
+	bool
diff --git a/kernel/locking/spinlock.c b/kernel/locking/spinlock.c
index 936f3d14dd6b..cbae365d7dd1 100644
--- a/kernel/locking/spinlock.c
+++ b/kernel/locking/spinlock.c
@@ -22,6 +22,11 @@
 #include
 #include
 
+#ifdef CONFIG_ARCH_HAS_MMIOWB
+DEFINE_PER_CPU(struct mmiowb_state, __mmiowb_state);
+EXPORT_PER_CPU_SYMBOL(__mmiowb_state);
+#endif
+
 /*
  * If lockdep is enabled then we use the non-preemption spin-ops
  * even on CONFIG_PREEMPT, because lockdep assumes that interrupts are
-- 
2.11.0
Re: [PATCH] i2c: Allow recovery of the initial IRQ by an I2C client device.
On Fri, Feb 22, 2019 at 3:32 AM Wolfram Sang wrote: > > > > > > But I still have the feeling that the problem is not solved at the > > > > right place. In i2c_new_device() we are storing parts of the fields of > > > > struct i2c_board_info, and when resetting the irq we are losing > > > > information. This patch solves that, but I wonder if the IRQ should > > > > not be 'simply' set in i2c_device_probe(). This means we also need to > > > > store the .resources of info, but I have a feeling this will be less > > > > error prone in the future. > > > > > > > > But this is just my guts telling me something is not right. I would > > > > perfectly understand if we want to get this merged ASAP. > > > > > > > > So given that the code is correct, this is my: > > > > Reviewed-by: Benjamin Tissoires > > > > > > > > But at least I have expressed my feelings :) > > > > > > Which I can relate to very much. I see the code solves the issue but my > > > feeling is that we are patching around something which should be handled > > > differently in general. > > > > > > Is somebody willing to research this further? > > > > > > Thanks for your input. > > > > > > > I would be willing to have more of a look at it but am slightly > > nervous I am not right person as all the systems I currently work > > with are DT based so don't really exemplify the issue at all. > > Thank you! Well, I'll be there, too. Discussing, reviewing, testing. And > if we have Benjamin for that on board as well, then I think we have a > good start for that task :) Others are more than welcome to join, too, > of course. > I'm also more familiar with device-tree (just came across this on my personal laptop) but happy to review and test at the risk of learning something.
Re: [PATCHv6 07/10] acpi/hmat: Register processor domain to its memory
On Wed, Feb 20, 2019 at 11:02:01PM +0100, Rafael J. Wysocki wrote: > On Thu, Feb 14, 2019 at 6:10 PM Keith Busch wrote: > > config ACPI_HMAT > > bool "ACPI Heterogeneous Memory Attribute Table Support" > > depends on ACPI_NUMA > > + select HMEM_REPORTING > > If you want to do this here, I'm not sure that defining HMEM_REPORTING > as a user-selectable option is a good idea. In particular, I don't > really think that setting ACPI_HMAT without it makes a lot of sense. > Apart from this, the patch looks reasonable to me. I'm trying to implement based on the feedback, but I'm a little confused. As I have it at the moment, HMEM_REPORTING is not user-prompted, so another option needs to turn it on. I have ACPI_HMAT do that here. So when you say it's a bad idea to make HMEM_REPORTING user selectable, isn't it already not user selectable? If I do it the other way around, that's going to make HMEM_REPORTING complicated if a non-ACPI implementation wants to report HMEM properties.
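For readers following the Kconfig discussion: the distinction hinges on whether a bool symbol carries a prompt string. A sketch of the two shapes (symbol names taken from the thread; help text and surrounding options omitted, so treat this as illustrative rather than the actual Kconfig):

```kconfig
# Shape under discussion: no prompt string, so HMEM_REPORTING is not
# user-selectable and only gets enabled when another symbol selects it.
config HMEM_REPORTING
	bool

config ACPI_HMAT
	bool "ACPI Heterogeneous Memory Attribute Table Support"
	depends on ACPI_NUMA
	select HMEM_REPORTING

# Alternative shape: adding a prompt would make it user-selectable,
# which is what the review comment advises against when ACPI_HMAT
# selects it anyway:
#
# config HMEM_REPORTING
#	bool "Heterogeneous memory attribute reporting"
```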
[PATCH] arm64: dts: rockchip: Add support for the Orange Pi RK3399 board.
This adds basic support for the Orange Pi RK3399 board. What works: - SD card / emmc. - Debug UART - Ethernet - USB: Type C, internal USB3 for SATA, 4 USB 2.0 ports - Sensors: All of them but the Hall sensor. - Buttons - Wifi, Bluetooth - HDMI out Signed-off-by: Alexis Ballier Cc: devicet...@vger.kernel.org Cc: Heiko Stuebner Cc: linux-arm-ker...@lists.infradead.org Cc: linux-rockc...@lists.infradead.org Cc: linux-kernel@vger.kernel.org --- The board is heavily based on the Sapphire one, hence this patch is also based on this DT. This is why I kept the copyrights of the sapphire DT. --- .../devicetree/bindings/arm/rockchip.yaml | 5 + arch/arm64/boot/dts/rockchip/Makefile | 1 + .../boot/dts/rockchip/rk3399-orangepi.dts | 778 ++ 3 files changed, 784 insertions(+) create mode 100644 arch/arm64/boot/dts/rockchip/rk3399-orangepi.dts diff --git a/Documentation/devicetree/bindings/arm/rockchip.yaml b/Documentation/devicetree/bindings/arm/rockchip.yaml index eb4d7f356543..58b075fcd78d 100644 --- a/Documentation/devicetree/bindings/arm/rockchip.yaml +++ b/Documentation/devicetree/bindings/arm/rockchip.yaml @@ -309,6 +309,11 @@ properties: - const: phytec,rk3288-phycore-som - const: rockchip,rk3288 + - description: Orange Pi RK3399 board +items: + - const: rockchip,rk3399-orangepi + - const: rockchip,rk3399 + - description: Pine64 Rock64 items: - const: pine64,rock64 diff --git a/arch/arm64/boot/dts/rockchip/Makefile b/arch/arm64/boot/dts/rockchip/Makefile index 1b28fa72ea0b..25454e2cf2de 100644 --- a/arch/arm64/boot/dts/rockchip/Makefile +++ b/arch/arm64/boot/dts/rockchip/Makefile @@ -18,6 +18,7 @@ dtb-$(CONFIG_ARCH_ROCKCHIP) += rk3399-gru-scarlet-inx.dtb dtb-$(CONFIG_ARCH_ROCKCHIP) += rk3399-gru-scarlet-kd.dtb dtb-$(CONFIG_ARCH_ROCKCHIP) += rk3399-nanopc-t4.dtb dtb-$(CONFIG_ARCH_ROCKCHIP) += rk3399-nanopi-m4.dtb +dtb-$(CONFIG_ARCH_ROCKCHIP) += rk3399-orangepi.dtb dtb-$(CONFIG_ARCH_ROCKCHIP) += rk3399-puma-haikou.dtb dtb-$(CONFIG_ARCH_ROCKCHIP) += rk3399-roc-pc.dtb 
dtb-$(CONFIG_ARCH_ROCKCHIP) += rk3399-rock-pi-4.dtb diff --git a/arch/arm64/boot/dts/rockchip/rk3399-orangepi.dts b/arch/arm64/boot/dts/rockchip/rk3399-orangepi.dts new file mode 100644 index ..e33f0c8ce2c9 --- /dev/null +++ b/arch/arm64/boot/dts/rockchip/rk3399-orangepi.dts @@ -0,0 +1,778 @@ +// SPDX-License-Identifier: (GPL-2.0+ OR MIT) +/* + * Copyright (c) 2017 Fuzhou Rockchip Electronics Co., Ltd. + */ + +/dts-v1/; + +#include "dt-bindings/pwm/pwm.h" +#include "dt-bindings/input/input.h" +#include "rk3399.dtsi" +#include "rk3399-opp.dtsi" + +/ { + model = "Orange Pi RK3399 Board"; + compatible = "rockchip,rk3399-orangepi", "rockchip,rk3399"; + + chosen { + stdout-path = "serial2:150n8"; + }; + + clkin_gmac: external-gmac-clock { + compatible = "fixed-clock"; + clock-frequency = <12500>; + clock-output-names = "clkin_gmac"; + #clock-cells = <0>; + }; + + dc_12v: dc-12v { + compatible = "regulator-fixed"; + regulator-name = "dc_12v"; + regulator-always-on; + regulator-boot-on; + regulator-min-microvolt = <1200>; + regulator-max-microvolt = <1200>; + }; + + keys: gpio-keys { + compatible = "gpio-keys"; + autorepeat; + + power { + debounce-interval = <100>; + gpios = < RK_PA5 GPIO_ACTIVE_LOW>; + label = "GPIO Power"; + linux,code = ; + linux,input-type = <1>; + pinctrl-names = "default"; + pinctrl-0 = <_btn>; + wakeup-source; + }; + }; + + adc-keys { + compatible = "adc-keys"; + io-channels = < 1>; + io-channel-names = "buttons"; + keyup-threshold-microvolt = <180>; + poll-interval = <100>; + + button-up { + label = "Volume Up"; + linux,code = ; + press-threshold-microvolt = <10>; + }; + + button-down { + label = "Volume Down"; + linux,code = ; + press-threshold-microvolt = <30>; + }; + + back { + label = "Back"; + linux,code = ; + press-threshold-microvolt = <985000>; + }; + + menu { + label = "Menu"; + linux,code = ; + press-threshold-microvolt = <1314000>; + }; + }; + + sdio_pwrseq:
Re: [PATCH net-next 1/7] net: phy: marvell10g: Use get_features to get the PHY abilities
On 21.02.2019 10:51, Maxime Chevallier wrote: > The Alaska family of 10G PHYs has more abilities than the ones listed in > PHY_10GBIT_FULL_FEATURES, the exact list depending on the model. > > Make use of the newly introduced .get_features call to build this list, > using genphy_c45_pma_read_abilities to build the list of supported > linkmodes, and adding autoneg ability based on what's reported by the AN > MMD. > > .config_init is still used to validate the interface_mode. > > Signed-off-by: Maxime Chevallier > --- > drivers/net/phy/marvell10g.c | 10 +++--- > 1 file changed, 7 insertions(+), 3 deletions(-) > > diff --git a/drivers/net/phy/marvell10g.c b/drivers/net/phy/marvell10g.c > index 9ea27acf05ad..65ef469adf58 100644 > --- a/drivers/net/phy/marvell10g.c > +++ b/drivers/net/phy/marvell10g.c > @@ -233,8 +233,6 @@ static int mv3310_resume(struct phy_device *phydev) > > static int mv3310_config_init(struct phy_device *phydev) > { > - int ret, val; > - > /* Check that the PHY interface type is compatible */ > if (phydev->interface != PHY_INTERFACE_MODE_SGMII && > phydev->interface != PHY_INTERFACE_MODE_XAUI && > @@ -242,6 +240,12 @@ static int mv3310_config_init(struct phy_device *phydev) > phydev->interface != PHY_INTERFACE_MODE_10GKR) > return -ENODEV; > > + return 0; > +} > + > +static int mv3310_get_features(struct phy_device *phydev) > +{ After my just submitted patch to include the aneg capability checking in genphy_c45_pma_read_abilities() function mv3310_get_features() isn't needed any longer and can be replaced with the generic one. But we can make this change afterwards, then you don't have to rework your series. Also I'm not sure whether there will be a 5.0-rc8 or whether beginning of next week we'll see 5.0. In the latter case we're a little bit in a hurry because the merge window will start very soon. 
> +	int ret, val;
>
>  	if (phydev->c45_ids.devices_in_package & MDIO_DEVS_AN) {
>  		val = phy_read_mmd(phydev, MDIO_MMD_AN, MDIO_STAT1);
>  		if (val < 0)
> @@ -429,7 +433,7 @@ static struct phy_driver mv3310_drivers[] = {
>  		.phy_id		= 0x002b09aa,
>  		.phy_id_mask	= MARVELL_PHY_ID_MASK,
>  		.name		= "mv88x3310",
> -		.features	= PHY_10GBIT_FEATURES,
> +		.get_features	= mv3310_get_features,
>  		.soft_reset	= gen10g_no_soft_reset,
>  		.config_init	= mv3310_config_init,
>  		.probe		= mv3310_probe,
>
Re: [GIT PULL] clk fixes for v5.0-rc7
The pull request you sent on Thu, 21 Feb 2019 13:57:21 -0800: > https://git.kernel.org/pub/scm/linux/kernel/git/clk/linux.git > tags/clk-fixes-for-linus has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/a3504f7a38233030def726fcfe692e786ab162df Thank you! -- Deet-doot-dot, I am a bot. https://korg.wiki.kernel.org/userdoc/prtracker
Re: [GIT PULL] Please pull RDMA subsystem changes
The pull request you sent on Thu, 21 Feb 2019 23:07:20 +: > git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git tags/for-linus has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/168bd29830e8ebbffcd70d2af50249dca088e1a8 Thank you! -- Deet-doot-dot, I am a bot. https://korg.wiki.kernel.org/userdoc/prtracker
Re: [git pull] drm fixes for 5.0-rc8 (or final)
The pull request you sent on Fri, 22 Feb 2019 10:13:42 +1000: > git://anongit.freedesktop.org/drm/drm tags/drm-fixes-2019-02-22 has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/6ee2846cb4e7c6e8acdcb265299ad1c6de42b437 Thank you! -- Deet-doot-dot, I am a bot. https://korg.wiki.kernel.org/userdoc/prtracker
Re: [PATCH v3 7/9] dt-bindings: serial: Add Milbeaut serial driver description
On Wed, 20 Feb 2019 16:44:37 +0900, Sugaya Taichi wrote: > Add DT bindings document for Milbeaut serial driver. > > Signed-off-by: Sugaya Taichi > --- > .../devicetree/bindings/serial/milbeaut-uart.txt| 21 > + > 1 file changed, 21 insertions(+) > create mode 100644 Documentation/devicetree/bindings/serial/milbeaut-uart.txt > Reviewed-by: Rob Herring
Re: [PATCH] Bluetooth: btusb: request wake pin with NOAUTOEN
On Thu, Feb 21, 2019 at 04:30:51PM -0800, Brian Norris wrote: > Badly-designed systems might have (for example) active-high wake pins > that default to high (e.g., because of external pull ups) until they > have an active firmware which starts driving it low. This can cause an > interrupt storm in the time between request_irq() and disable_irq(). > > We don't support shared interrupts here, so let's just pre-configure the > interrupt to avoid auto-enabling it. > > Fixes: fd913ef7ce61 ("Bluetooth: btusb: Add out-of-band wakeup support") > Signed-off-by: Brian Norris > --- > drivers/bluetooth/btusb.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/bluetooth/btusb.c b/drivers/bluetooth/btusb.c > index 4761499db9ee..470ee68555d9 100644 > --- a/drivers/bluetooth/btusb.c > +++ b/drivers/bluetooth/btusb.c > @@ -2885,6 +2885,7 @@ static int btusb_config_oob_wake(struct hci_dev *hdev) > return 0; > } > > + irq_set_status_flags(irq, IRQ_NOAUTOEN); > ret = devm_request_irq(>dev, irq, btusb_oob_wake_handler, > 0, "OOB Wake-on-BT", data); > if (ret) { > @@ -2899,7 +2900,6 @@ static int btusb_config_oob_wake(struct hci_dev *hdev) > } > > data->oob_wake_irq = irq; > - disable_irq(irq); > bt_dev_info(hdev, "OOB Wake-on-BT configured at IRQ %u", irq); > return 0; > } Reviewed-by: Matthias Kaehlcke
Re: [PATCH v3 00/20] Merge text_poke fixes and executable lockdowns
On Fri, 2019-02-22 at 17:14 +0100, Borislav Petkov wrote: > On Thu, Feb 21, 2019 at 03:44:31PM -0800, Rick Edgecombe wrote: > > Changes v2 to v3: > > - Fix commit messages and comments [Boris] > > - Rename VM_HAS_SPECIAL_PERMS [Boris] > > - Remove unnecessary local variables [Boris] > > - Rename set_alias_*() functions [Boris, Andy] > > - Save/restore DR registers when using temporary mm > > - Move line deletion from patch 10 to patch 17 > > In your previous submission there was a patch called > > Subject: [PATCH v2 01/20] Fix "x86/alternatives: Lockdep-enforce text_mutex in > text_poke*()" > > What happened to it? > > It did introduce a function text_poke_kgdb(), a.o., and I see this > function in the diff contexts in some of the patches in this submission > so it looks to me like you missed that first patch when submitting v3? > > Or am *I* missing something? > > Thx. > Oh, you are right! Sorry about that. I'll just send a new version with fixes for other comments instead of a resend of this one. Thanks, Rick
Re: [PATCH 3/3] arm64: dts: rockchip: move QCA6174A wakeup pin into its USB node
On Thu, Feb 21, 2019 at 04:34:03PM -0800, Brian Norris wrote: > Currently, we don't coordinate BT USB activity with our handling of the > BT out-of-band wake pin, and instead just use gpio-keys. That causes > problems because we have no way of distinguishing wake activity due to a > BT device (e.g., mouse) vs. the BT controller (e.g., re-configuring wake > mask before suspend). This can cause spurious wake events just because > we, for instance, try to reconfigure the host controller's event mask > before suspending. > > We can avoid these synchronization problems by handling the BT wake pin > directly in the btusb driver -- for all activity up until BT controller > suspend(), we simply listen to normal USB activity (e.g., to know the > difference between device and host activity); once we're really ready to > suspend the host controller, there should be no more host activity, and > only *then* do we unmask the GPIO interrupt. > > This is already supported by btusb; we just need to describe the wake > pin in the right node. > > We list 2 compatible properties, since both PID/VID pairs show up on > Scarlet devices, and they're both essentially identical QCA6174A-based > modules. > > Also note that the polarity was wrong before: Qualcomm implemented WAKE > as active high, not active low. We only got away with this because > gpio-keys always reconfigured us as bi-directional edge-triggered. > > Finally, we have an external pull-up and a level-shifter on this line > (we didn't notice Qualcomm's polarity in the initial design), so we > can't do pull-down. Switch to pull-none. 
> > Signed-off-by: Brian Norris > --- > This patch is also required to make this stable, but since it's not > really tied to the device tree, and it's an existing bug, I sent it > separately: > > https://lore.kernel.org/patchwork/patch/1044896/ > Subject: Bluetooth: btusb: request wake pin with NOAUTOEN > > .../dts/rockchip/rk3399-gru-chromebook.dtsi | 13 ++ > .../boot/dts/rockchip/rk3399-gru-scarlet.dtsi | 46 --- > arch/arm64/boot/dts/rockchip/rk3399-gru.dtsi | 13 -- > 3 files changed, 42 insertions(+), 30 deletions(-) > > diff --git a/arch/arm64/boot/dts/rockchip/rk3399-gru-chromebook.dtsi > b/arch/arm64/boot/dts/rockchip/rk3399-gru-chromebook.dtsi > index c400be64170e..931640e9aed4 100644 > --- a/arch/arm64/boot/dts/rockchip/rk3399-gru-chromebook.dtsi > +++ b/arch/arm64/boot/dts/rockchip/rk3399-gru-chromebook.dtsi > @@ -200,6 +200,19 @@ > pinctrl-0 = <_en>; > pwm-delay-us = <1>; > }; > + > + gpio_keys: gpio-keys { > + compatible = "gpio-keys"; > + pinctrl-names = "default"; > + pinctrl-0 = <_host_wake_l>; > + > + wake_on_bt: wake-on-bt { > + label = "Wake-on-Bluetooth"; > + gpios = < 3 GPIO_ACTIVE_LOW>; > + linux,code = ; > + wakeup-source; > + }; > + }; > }; > > _bigcpu { > diff --git a/arch/arm64/boot/dts/rockchip/rk3399-gru-scarlet.dtsi > b/arch/arm64/boot/dts/rockchip/rk3399-gru-scarlet.dtsi > index fc50b3ef758c..3e2196c08473 100644 > --- a/arch/arm64/boot/dts/rockchip/rk3399-gru-scarlet.dtsi > +++ b/arch/arm64/boot/dts/rockchip/rk3399-gru-scarlet.dtsi > @@ -175,6 +175,21 @@ > pinctrl-0 = <_en>; > wakeup-delay-ms = <250>; > }; > + > + gpio_keys: gpio-keys { > + compatible = "gpio-keys"; > + pinctrl-names = "default"; > + pinctrl-0 = <_eject_odl>; > + > + pen-insert { > + label = "Pen Insert"; > + /* Insert = low, eject = high */ > + gpios = < 1 GPIO_ACTIVE_LOW>; > + linux,code = ; > + linux,input-type = ; > + wakeup-source; > + }; > + }; > }; > > /* pp900_s0 aliases */ > @@ -328,20 +343,6 @@ camera: { > <4>; > }; > > -_keys { > - pinctrl-names = "default"; > 
- pinctrl-0 = <_host_wake_l>, <_eject_odl>; > - > - pen-insert { > - label = "Pen Insert"; > - /* Insert = low, eject = high */ > - gpios = < 1 GPIO_ACTIVE_LOW>; > - linux,code = ; > - linux,input-type = ; > - wakeup-source; > - }; > -}; > - > _tunnel { > google,remote-bus = <0>; > }; > @@ -437,8 +438,19 @@ camera: { > status = "okay"; > }; > > -_on_bt { > - gpios = < 2 GPIO_ACTIVE_LOW>; > +_host0_ohci { > + #address-cells = <1>; > + #size-cells = <0>; > + > + qca_bt: bt@1 { > + compatible = "usb0cf3,e300", "usb04ca,301a"; > + reg = <1>; > + pinctrl-names = "default"; > + pinctrl-0 = <_host_wake_l>; > + interrupt-parent = <>; > + interrupts = <2 IRQ_TYPE_LEVEL_HIGH>; > + interrupt-names =
Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault
On Fri, Feb 22, 2019 at 9:48 AM Andy Lutomirski wrote: > > > On Feb 22, 2019, at 9:43 AM, Linus Torvalds > > wrote: > > > > Then we should still probably fix up "__probe_kernel_read()" to not > > allow user accesses. The easiest way to do that is actually likely to > > use the "unsafe_get_user()" functions *without* doing a > > uaccess_begin(), which will mean that modern CPU's will simply fault > > on a kernel access to user space. > > > > The nice thing about that is that usually developers will have access > > to exactly those modern boxes, so the people who notice that it > > doesn't work are the right people. > > We use probe_kernel_read() from oops code. I’d rather it return -EFAULT than > oops harder and kill the first oops. It would still do that. Using the unsafe_get_user() macros doesn't remove the exception handling, and we wouldn't remove the whole "pagefault_disable()" either. So it would work exactly the same way it does now, except on a modern CPU it would return -EFAULT for a user space access due to AC not being set. So no "oops harder", only "safer accesses". Linus
[PATCH] kasan: prevent recursive instrumentation
When CONFIG_KASAN is selected, <linux/kasan-checks.h> defines the prototypes for kasan_check_{read,write}(), rather than inline stubs. This is the case even when it is included by files which are not instrumented by KASAN.

Where helpers (e.g. the atomics) are instrumented with explicit calls to kasan_check_{read,write}(), this results in files being unexpectedly instrumented. This is problematic as it can result in instrumentation in files which cannot safely call these functions, such as within the EFI stub:

[mark@lakrids:~/src/linux]% usekorg 8.1.0 make ARCH=arm64 CROSS_COMPILE=aarch64-linux- -j64 -s
drivers/firmware/efi/libstub/arm-stub.stub.o: In function `atomic_set':
/home/mark/src/linux/./include/asm-generic/atomic-instrumented.h:44: undefined reference to `__efistub_kasan_check_write'
/home/mark/src/linux/./include/asm-generic/atomic-instrumented.h:44:(.init.text+0xa0): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `__efistub_kasan_check_write'
Makefile:1021: recipe for target 'vmlinux' failed
make: *** [vmlinux] Error 1

Let's avoid this by ensuring that uninstrumented files are given the same stub definitions of these functions as used for !KASAN builds. So that the stub definitions don't conflict with the real definitions in (uninstrumented) common KASAN code, the real definitions are prefixed with underscores and called from unprefixed macros. Any compiler-generated instrumentation uses separate __asan_{load,store}_*() entry points, and is not affected by this change.

Signed-off-by: Mark Rutland
Cc: Alexander Potapenko
Cc: Andrey Ryabinin
Cc: Dmitry Vyukov
Cc: Ingo Molnar
---
 include/linux/kasan-checks.h | 8 +---
 mm/kasan/common.c            | 8
 scripts/Makefile.kasan       | 2 +-
 3 files changed, 10 insertions(+), 8 deletions(-)

Hi,

Assuming the KASAN folk are happy with this, I'd like this to be queued in the tip tree, where the arm64 instrumented atomics are already queued.

Thanks,
Mark.
diff --git a/include/linux/kasan-checks.h b/include/linux/kasan-checks.h
index d314150658a4..6cf4b41a5393 100644
--- a/include/linux/kasan-checks.h
+++ b/include/linux/kasan-checks.h
@@ -2,9 +2,11 @@
 #ifndef _LINUX_KASAN_CHECKS_H
 #define _LINUX_KASAN_CHECKS_H
 
-#ifdef CONFIG_KASAN
-void kasan_check_read(const volatile void *p, unsigned int size);
-void kasan_check_write(const volatile void *p, unsigned int size);
+#if defined(CONFIG_KASAN) && !defined (KASAN_NOSANITIZE)
+void __kasan_check_read(const volatile void *p, unsigned int size);
+#define kasan_check_read __kasan_check_read
+void __kasan_check_write(const volatile void *p, unsigned int size);
+#define kasan_check_write __kasan_check_write
 #else
 static inline void kasan_check_read(const volatile void *p, unsigned int size)
 { }
diff --git a/mm/kasan/common.c b/mm/kasan/common.c
index 73c9cbfdedf4..630e32838adb 100644
--- a/mm/kasan/common.c
+++ b/mm/kasan/common.c
@@ -95,17 +95,17 @@ void kasan_disable_current(void)
 	current->kasan_depth--;
 }
 
-void kasan_check_read(const volatile void *p, unsigned int size)
+void __kasan_check_read(const volatile void *p, unsigned int size)
 {
 	check_memory_region((unsigned long)p, size, false, _RET_IP_);
 }
-EXPORT_SYMBOL(kasan_check_read);
+EXPORT_SYMBOL(__kasan_check_read);
 
-void kasan_check_write(const volatile void *p, unsigned int size)
+void __kasan_check_write(const volatile void *p, unsigned int size)
 {
 	check_memory_region((unsigned long)p, size, true, _RET_IP_);
 }
-EXPORT_SYMBOL(kasan_check_write);
+EXPORT_SYMBOL(__kasan_check_write);
 
 #undef memset
 void *memset(void *addr, int c, size_t len)
diff --git a/scripts/Makefile.kasan b/scripts/Makefile.kasan
index 25c259df8ffa..c475a8ac776c 100644
--- a/scripts/Makefile.kasan
+++ b/scripts/Makefile.kasan
@@ -52,5 +52,5 @@ CFLAGS_KASAN := -fsanitize=kernel-hwaddress \
 endif # CONFIG_KASAN_SW_TAGS
 
 ifdef CONFIG_KASAN
-CFLAGS_KASAN_NOSANITIZE := -fno-builtin
+CFLAGS_KASAN_NOSANITIZE := -fno-builtin -DKASAN_NOSANITIZE
 endif
-- 
2.11.0
Re: [PATCH v5 8/8] ARM: tegra: Add firmware calls required for suspend-resume
22.02.2019 20:59, Dmitry Osipenko пишет: > In order to resume CPU from suspend via trusted Foundations firmware, > the LP1/LP2 boot vectors and CPU caches need to be set up using the > firmware calls. > > Signed-off-by: Dmitry Osipenko > --- > arch/arm/mach-tegra/pm.c| 53 ++--- > arch/arm/mach-tegra/reset-handler.S | 26 ++ > arch/arm/mach-tegra/sleep.S | 3 +- > 3 files changed, 61 insertions(+), 21 deletions(-) > > diff --git a/arch/arm/mach-tegra/pm.c b/arch/arm/mach-tegra/pm.c > index 66c8cd63dd86..f209f59e0daf 100644 > --- a/arch/arm/mach-tegra/pm.c > +++ b/arch/arm/mach-tegra/pm.c > @@ -33,6 +33,7 @@ > #include > > #include > +#include > #include > #include > #include > @@ -160,6 +161,28 @@ int tegra_cpu_do_idle(void) > > static int tegra_sleep_cpu(unsigned long v2p) > { > + /* > + * L2 cache disabling using kernel API only allowed when all > + * secondary CPU's are offline. Cache have to be disabled with > + * MMU-on if cache maintenance is done via Trusted Foundations > + * firmware. Note that CPUIDLE won't ever enter powergate on Tegra30 > + * if any of secondary CPU's is online and this is the LP2-idle > + * code-path only for Tegra20/30. > + */ > + if (trusted_foundations_registered()) > + outer_disable(); > + > + /* > + * Note that besides of setting up CPU reset vector this firmware > + * call may also do the following, depending on the FW version: > + * 1) Disable L2. But this doesn't matter since we already > + * disabled the L2. > + * 2) Disable D-cache. This need to be taken into account in > + * particular by the tegra_disable_clean_inv_dcache() which > + * shall avoid the re-disable. > + */ > + call_firmware_op(prepare_idle, TF_PM_MODE_LP2); > + > setup_mm_for_reboot(); > tegra_sleep_cpu_finish(v2p); > > @@ -196,24 +219,13 @@ void tegra_idle_lp2_last(void) > cpu_cluster_pm_enter(); > suspend_cpu_complex(); > > - /* > - * L2 cache disabling using kernel API only allowed when all > - * secondary CPU's are offline. 
Cache have to be disabled early > - * if cache maintenance is done via Trusted Foundations firmware. > - * Note that CPUIDLE won't ever enter powergate on Tegra30 if any > - * of secondary CPU's is online and this is the LP2 codepath only > - * for Tegra20/30. > - */ > - if (trusted_foundations_registered()) > - outer_disable(); > - > cpu_suspend(PHYS_OFFSET - PAGE_OFFSET, _sleep_cpu); > > /* >* Resume L2 cache if it wasn't re-enabled early during resume, >* which is the case for Tegra30 that has to re-enable the cache >* via firmware call. In other cases cache is already enabled and > - * hence re-enabling is a no-op. > + * hence re-enabling is a no-op. This is always a no-op on Tegra114+. >*/ > outer_resume(); > > @@ -235,6 +247,15 @@ enum tegra_suspend_mode tegra_pm_validate_suspend_mode( > > static int tegra_sleep_core(unsigned long v2p) > { > + /* > + * Cache have to be disabled with MMU-on if cache maintenance is done > + * via Trusted Foundations firmware. This is a no-op on Tegra114+. > + */ > + if (trusted_foundations_registered()) > + outer_disable(); > + > + call_firmware_op(prepare_idle, TF_PM_MODE_LP1); > + > setup_mm_for_reboot(); > tegra_sleep_core_finish(v2p); > > @@ -360,14 +381,6 @@ static int tegra_suspend_enter(suspend_state_t state) > break; > } > > - /* > - * Cache have to be disabled early if cache maintenance is done > - * via Trusted Foundations firmware. Otherwise this is a no-op, > - * like on Tegra114+. > - */ > - if (trusted_foundations_registered()) > - outer_disable(); > - > cpu_suspend(PHYS_OFFSET - PAGE_OFFSET, tegra_sleep_func); Seems I messed up the rebasing a tad. Will send another version.
Re: [PATCH V3 00/10] X86/KVM/Hyper-V: Add HV ept tlb range list flush support in KVM
On 22/02/19 16:06, lantianyu1...@gmail.com wrote: > From: Lan Tianyu > > This patchset is to introduce hv ept tlb range list flush function > support in the KVM MMU component. Flushing ept tlbs of several address > range can be done via single hypercall and new list flush function is > used in the kvm_mmu_commit_zap_page() and FNAME(sync_page). This patchset > also adds more hv ept tlb range flush support in more KVM MMU function. > > This patchset is based on the fix patch "x86/Hyper-V: Fix definition > HV_MAX_FLUSH_REP_COUNT". > (https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1939455.html) Note that this won't make it in 5.1 unless Linus releases an -rc8. Otherwise, I'll get to it next week. Thanks, Paolo
[PATCH hyperv-fixes] hv_netvsc: Fix IP header checksum for coalesced packets
From: Haiyang Zhang

Incoming packets may have IP header checksum verified by the host. They may not have IP header checksum computed after coalescing. This patch re-computes the checksum when necessary, otherwise the packets may be dropped, because the Linux network stack always checks it.

Signed-off-by: Haiyang Zhang
---
 drivers/net/hyperv/netvsc_drv.c | 22 +++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 256adbd044f5..cf4897043e83 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -744,6 +744,14 @@ void netvsc_linkstatus_callback(struct net_device *net,
 	schedule_delayed_work(_ctx->dwork, 0);
 }
 
+static void netvsc_comp_ipcsum(struct sk_buff *skb)
+{
+	struct iphdr *iph = (struct iphdr *)skb->data;
+
+	iph->check = 0;
+	iph->check = ip_fast_csum(iph, iph->ihl);
+}
+
 static struct sk_buff *netvsc_alloc_recv_skb(struct net_device *net,
 					     struct netvsc_channel *nvchan)
 {
@@ -770,9 +778,17 @@ static struct sk_buff *netvsc_alloc_recv_skb(struct net_device *net,
 	/* skb is already created with CHECKSUM_NONE */
 	skb_checksum_none_assert(skb);
 
-	/*
-	 * In Linux, the IP checksum is always checked.
-	 * Do L4 checksum offload if enabled and present.
+	/* Incoming packets may have IP header checksum verified by the host.
+	 * They may not have IP header checksum computed after coalescing.
+	 * We compute it here if the flags are set, because on Linux, the IP
+	 * checksum is always checked.
+	 */
+	if (csum_info && csum_info->receive.ip_checksum_value_invalid &&
+	    csum_info->receive.ip_checksum_succeeded &&
+	    skb->protocol == htons(ETH_P_IP))
+		netvsc_comp_ipcsum(skb);
+
+	/* Do L4 checksum offload if enabled and present.
 	 */
 	if (csum_info && (net->features & NETIF_F_RXCSUM)) {
 		if (csum_info->receive.tcp_checksum_succeeded ||
-- 
2.19.1
Re: [PATCH net] net: socket: set sock->sk to NULL after calling proto_ops::release()
On 02/22/2019 09:57 AM, Eric Biggers wrote: > ->setattr() is called under inode_lock(), which __sock_release() also takes. > So > the uses of sock->sk are serialized. See commit 6d8c50dcb029 ("socket: close > race condition between sock_close() and sockfs_setattr()"). Oh right, we added another inode_lock()/inode_unlock() for sock_close() > > The issue now is that if ->setattr() happens *after* __sock_release() (which > is > possible if fchownat() gets the reference to the file's 'struct path', then > the > file is close()d by another thread, then fchownat() continues), it will see > stale sock->sk because for many socket types it wasn't set to NULL earlier. > > - Eric >
Re: general protection fault in rose_send_frame
syzbot has found a reproducer for the following crash on: HEAD commit:7a25c6c0aac8 rocker: Add missing break for PRE_BRIDGE_FLAGS git tree: net-next console output: https://syzkaller.appspot.com/x/log.txt?x=104002d8c0 kernel config: https://syzkaller.appspot.com/x/.config?x=58cb9d752ba5f3e0 dashboard link: https://syzkaller.appspot.com/bug?extid=7078ae989d857fe17988 compiler: gcc (GCC) 9.0.0 20181231 (experimental) syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1224c304c0 C reproducer: https://syzkaller.appspot.com/x/repro.c?x=151824f8c0 IMPORTANT: if you fix the bug, please add the following tag to the commit: Reported-by: syzbot+7078ae989d857fe17...@syzkaller.appspotmail.com IPv6: ADDRCONF(NETDEV_CHANGE): hsr_slave_1: link becomes ready 8021q: adding VLAN 0 to HW filter on device batadv0 IPv6: ADDRCONF(NETDEV_CHANGE): rose0: link becomes ready kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: [#1] PREEMPT SMP KASAN CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.0.0-rc7+ #76 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:rose_send_frame+0x1a8/0x280 net/rose/rose_link.c:104 Code: c1 ea 03 80 3c 02 00 0f 85 8d 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8b 63 20 49 8d bc 24 58 03 00 00 48 89 fa 48 c1 ea 03 <80> 3c 02 00 75 7e 49 8b 94 24 58 03 00 00 e9 b8 fe ff ff e8 f0 53 RSP: 0018:8880ae807ae8 EFLAGS: 00010202 RAX: dc00 RBX: 88809aa1fa00 RCX: 86378efb RDX: 006b RSI: 8637902c RDI: 0358 RBP: 8880ae807b18 R08: 8887dec0 R09: ed101263146d R10: ed101263146c R11: 88809318a363 R12: R13: 0078 R14: 0005 R15: 8880a4f6fbc0 FS: () GS:8880ae80() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 7f23b3e4ee78 CR3: 9eef3000 CR4: 001406f0 DR0: DR1: DR2: DR3: DR6: fffe0ff0 DR7: 0400 Call Trace: rose_transmit_clear_request+0x1de/0x2a0 net/rose/rose_link.c:258 rose_rx_call_request+0x4ea/0x1990 net/rose/af_rose.c:1001 rose_loopback_timer+0x26a/0x3f0 
net/rose/rose_loopback.c:100 call_timer_fn+0x190/0x720 kernel/time/timer.c:1325 expire_timers kernel/time/timer.c:1362 [inline] __run_timers kernel/time/timer.c:1681 [inline] __run_timers kernel/time/timer.c:1649 [inline] run_timer_softirq+0x652/0x1700 kernel/time/timer.c:1694 __do_softirq+0x266/0x95a kernel/softirq.c:292 invoke_softirq kernel/softirq.c:373 [inline] irq_exit+0x180/0x1d0 kernel/softirq.c:413 exiting_irq arch/x86/include/asm/apic.h:536 [inline] smp_apic_timer_interrupt+0x14a/0x570 arch/x86/kernel/apic/apic.c:1062 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:807 RIP: 0010:native_safe_halt+0x2/0x10 arch/x86/include/asm/irqflags.h:58 Code: ff ff ff 48 89 c7 48 89 45 d8 e8 59 6c a1 fa 48 8b 45 d8 e9 ce fe ff ff 48 89 df e8 48 6c a1 fa eb 82 90 90 90 90 90 90 fb f4 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90 RSP: 0018:88807d08 EFLAGS: 0286 ORIG_RAX: ff13 RAX: 11125061 RBX: 8887dec0 RCX: RDX: dc00 RSI: 0001 RDI: 8887e73c RBP: 88807d38 R08: 8887dec0 R09: R10: R11: R12: R13: 889282f8 R14: R15: arch_cpu_idle+0x10/0x20 arch/x86/kernel/process.c:555 default_idle_call+0x36/0x90 kernel/sched/idle.c:93 cpuidle_idle_call kernel/sched/idle.c:153 [inline] do_idle+0x386/0x570 kernel/sched/idle.c:262 cpu_startup_entry+0x1b/0x20 kernel/sched/idle.c:353 rest_init+0x245/0x37b init/main.c:442 arch_call_rest_init+0xe/0x1b start_kernel+0x803/0x83c init/main.c:739 x86_64_start_reservations+0x29/0x2b arch/x86/kernel/head64.c:470 x86_64_start_kernel+0x77/0x7b arch/x86/kernel/head64.c:451 secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:243 Modules linked in: ---[ end trace 8f29030702ddb052 ]--- RIP: 0010:rose_send_frame+0x1a8/0x280 net/rose/rose_link.c:104 Code: c1 ea 03 80 3c 02 00 0f 85 8d 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8b 63 20 49 8d bc 24 58 03 00 00 48 89 fa 48 c1 ea 03 <80> 3c 02 00 75 7e 49 8b 94 24 58 03 00 00 e9 b8 fe ff ff e8 f0 53 RSP: 0018:8880ae807ae8 EFLAGS: 00010202 RAX: dc00 RBX: 88809aa1fa00 RCX: 86378efb RDX: 
006b RSI: 8637902c RDI: 0358 RBP: 8880ae807b18 R08: 8887dec0 R09: ed101263146d R10: ed101263146c R11: 88809318a363 R12: R13: 0078 R14: 0005 R15: 8880a4f6fbc0 FS: () GS:8880ae80() knlGS: CS: 0010 DS: ES: CR0:
Re: [PATCH] KVM: MMU: record maximum physical address width in kvm_mmu_extended_role
On 20/02/19 08:06, Yu Zhang wrote: > Hi Paolo, any comments on this patch? And the other one(kvm: x86: Return > LA57 feature based on hardware capability )? :-) Queued both, thanks. Paolo > On Fri, Feb 01, 2019 at 12:09:23AM +0800, Yu Zhang wrote: >> Previously, commit 7dcd57552008 ("x86/kvm/mmu: check if tdp/shadow >> MMU reconfiguration is needed") offered some optimization to avoid >> the unnecessary reconfiguration. Yet one scenario is broken - when >> cpuid changes VM's maximum physical address width, reconfiguration >> is needed to reset the reserved bits. Also, the TDP may need to >> reset its shadow_root_level when this value is changed. >> >> To fix this, a new field, maxphyaddr, is introduced in the extended >> role structure to keep track of the configured guest physical address >> width. >> >> Signed-off-by: Yu Zhang >> --- >> Cc: Paolo Bonzini >> Cc: "Radim Krčmář" >> Cc: Thomas Gleixner >> Cc: Ingo Molnar >> Cc: Borislav Petkov >> Cc: "H. Peter Anvin" >> Cc: linux-kernel@vger.kernel.org >> --- >> arch/x86/include/asm/kvm_host.h | 1 + >> arch/x86/kvm/mmu.c | 1 + >> 2 files changed, 2 insertions(+) >> >> diff --git a/arch/x86/include/asm/kvm_host.h >> b/arch/x86/include/asm/kvm_host.h >> index 4660ce9..be87f71 100644 >> --- a/arch/x86/include/asm/kvm_host.h >> +++ b/arch/x86/include/asm/kvm_host.h >> @@ -299,6 +299,7 @@ struct kvm_mmu_memory_cache { >> unsigned int cr4_smap:1; >> unsigned int cr4_smep:1; >> unsigned int cr4_la57:1; >> +unsigned int maxphyaddr:6; >> }; >> }; >> >> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c >> index ce770b4..2b74505 100644 >> --- a/arch/x86/kvm/mmu.c >> +++ b/arch/x86/kvm/mmu.c >> @@ -4769,6 +4769,7 @@ static union kvm_mmu_extended_role >> kvm_calc_mmu_role_ext(struct kvm_vcpu *vcpu) >> ext.cr4_pse = !!is_pse(vcpu); >> ext.cr4_pke = !!kvm_read_cr4_bits(vcpu, X86_CR4_PKE); >> ext.cr4_la57 = !!kvm_read_cr4_bits(vcpu, X86_CR4_LA57); >> +ext.maxphyaddr = cpuid_maxphyaddr(vcpu); >> >> ext.valid = 1; >> >> -- >> 1.9.1 
>> > > Thanks > Yu >
Re: [PATCH 5/5] mm/vmscan: don't forcely shrink active anon lru list
On Fri, Feb 22, 2019 at 08:43:37PM +0300, Andrey Ryabinin wrote:
> shrink_node_memcg() always forcibly shrinks the active anon list.
> This doesn't seem like correct behavior. If the system/memcg has no
> swap, it's absolutely pointless to rebalance the anon lru lists.
> And in case we did scan the active anon list above, it's unclear why
> we would need this additional force scan. If there are cases when we
> want a more aggressive scan of the anon lru, we should just change the
> scan target in get_scan_count() (and better explain such cases in the
> comments).
>
> Remove this force shrink and let get_scan_count() decide how much of
> the active anon we want to shrink.

This change breaks the anon pre-aging. The idea behind this is that the
VM maintains a small batch of anon reclaim candidates with recent access
information. On every reclaim, even when we just trim cache, which is
the most common reclaim mode, but also when we just swapped out some
pages and shrunk the inactive anon list, at the end of it we make sure
that the list of potential anon candidates is refilled for the next
reclaim cycle.

The comments for this are above inactive_list_is_low() and the
age_active_anon() call from kswapd.

Re: no swap, you are correct. We should gate that rebalancing on
total_swap_pages, just like age_active_anon() does.
Re: [PATCHv6 06/10] node: Add memory-side caching attributes
On Fri, Feb 22, 2019 at 10:09 AM Keith Busch wrote: > > On Fri, Feb 22, 2019 at 11:12:38AM +0100, Brice Goglin wrote: > > Le 14/02/2019 à 18:10, Keith Busch a écrit : > > > +What: > > > /sys/devices/system/node/nodeX/memory_side_cache/indexY/size > > > +Date: December 2018 > > > +Contact: Keith Busch > > > +Description: > > > + The size of this memory side cache in bytes. > > > > > > Hello Keith, > > > > CPU-side cache size is reported in kilobytes: > > > > $ cat > > /sys/devices/system/cpu/cpu0/cache/index*/size > > > > 32K > > 32K > > 256K > > 4096K > > > > Can you do the same of memory-side caches instead of reporting bytes? > > Ok, will do. Ugh, please no. Don't optimize sysfs for human consumption. That 'K' now needs to be parsed.
Re: [PATCH v2] IB/mlx4: Increase the timeout for CM cache
On Sun, Feb 17, 2019 at 03:45:12PM +0100, Håkon Bugge wrote:
> Using CX-3 virtual functions, either from a bare-metal machine or
> pass-through from a VM, MAD packets are proxied through the PF driver.
>
> Since the VF drivers have separate name spaces for MAD Transaction Ids
> (TIDs), the PF driver has to re-map the TIDs and keep the book keeping
> in a cache.
>
> Following the RDMA Connection Manager (CM) protocol, it is clear when
> an entry has to be evicted from the cache. But life is not perfect,
> remote peers may die or be rebooted. Hence, there is a timeout to wipe
> out a cache entry, after which the PF driver assumes the remote peer
> has gone.
>
> During workloads where a high number of QPs are destroyed concurrently,
> an excessive amount of CM DREQ retries has been observed.
>
> The problem can be demonstrated in a bare-metal environment, where two
> nodes have instantiated 8 VFs each. This is using dual-ported HCAs, so
> we have 16 vPorts per physical server.
>
> 64 processes are associated with each vPort and create and destroy
> one QP for each of the remote 64 processes. That is, 1024 QPs per
> vPort, all in all 16K QPs. The QPs are created/destroyed using the
> CM.
>
> When tearing down these 16K QPs, excessive CM DREQ retries (and
> duplicates) are observed. With some cat/paste/awk wizardry on the
> infiniband_cm sysfs, we observe as a sum of the 16 vPorts on one of
> the nodes:
>
> cm_rx_duplicates:
>       dreq  2102
> cm_rx_msgs:
>       drep  1989
>       dreq  6195
>       rep   3968
>       req   4224
>       rtu   4224
> cm_tx_msgs:
>       drep  4093
>       dreq 27568
>       rep   4224
>       req   3968
>       rtu   3968
> cm_tx_retries:
>       dreq 23469
>
> Note that the active/passive side is equally distributed between the
> two nodes.
>
> Enabling pr_debug in cm.c gives tons of:
>
> [171778.814239] mlx4_ib_multiplex_cm_handler: id{slave:
> 1,sl_cm_id: 0xd393089f} is NULL!
>
> By increasing the CM_CLEANUP_CACHE_TIMEOUT from 5 to 30 seconds, the
> tear-down phase of the application is reduced from approximately 90 to
> 50 seconds.
> Retries/duplicates are also significantly reduced:
>
> cm_rx_duplicates:
>       dreq  2460
> []
> cm_tx_retries:
>       dreq  3010
>       req     47
>
> Increasing the timeout further didn't help, as these duplicates and
> retries stem from a too short CMA timeout, which was 20 (~4 seconds)
> on the systems. By increasing the CMA timeout to 22 (~17 seconds), the
> numbers fell down to about 10 for both of them.
>
> Adjustment of the CMA timeout is not part of this commit.
>
> Signed-off-by: Håkon Bugge
> Acked-by: Jack Morgenstein
> ---

Applied to for-next

Thanks,
Jason
Re: [PATCH v8 0/7] freezer for cgroup v2
Hey, Oleg.

On Fri, Feb 22, 2019 at 05:34:42PM +0100, Oleg Nesterov wrote:
> > ptrace support is a lot less important than kill for sure but if at
> > all possible I think it'd be better to have it
>
> Tejun, I agree it would be better. I did not argue with that.
>
> The question is how this can be implemented. And how much uglifications^W
> complications in the core kernel code this needs.

Yeah, sure thing. It's always a trade-off, but given that this is
something we're gonna expose to userspace as an API which will be
difficult to change too noticeably once released, I think it'd be
worthwhile to spend some effort to get the visible semantics right.

> > To summarize, the ideal result is the frozen state to be "stuck in
> > jobctl stop loop"
>
> Not sure I understand, but I don't think this can work...
>
> Let me repeat, imo the freezer should be "reliable", it shouldn't get
> stuck in CGRP_FREEZE state forever if, say, it races with vfork(). And
> personally I think this is more important than (very limited) ptrace
> support.

Absolutely, it makes no sense to ship it otherwise. I don't think
anyone is saying that we can ship it while it doesn't work reliably.

> So I think it too should somehow interact with freezable_schedule/etc.

You mean freezer_do_not_count(), right? As long as the task is
guaranteed to be trapped by signal stop afterwards (and they are), we
can likely use them the same way. The only thing to be careful about
would be ensuring that we don't end up flipping the group-level frozen
state in between.

Would something like that work?

Thanks.

--
tejun
Re: [PATCH 2/2] clk: ingenic: Fix doc of ingenic_cgu_div_info
Quoting Paul Cercueil (2019-01-27 18:09:21)
> The 'div' field does not represent a number of bits used to divide
> (i.e. right-shift) the divider, but a number itself used to divide
> the divider.
>
> Signed-off-by: Paul Cercueil
> Signed-off-by: Maarten ter Huurne
> Cc:
> ---

Applied to clk-next
Re: [PATCH 1/2] clk: ingenic: Fix round_rate misbehaving with non-integer dividers
Quoting Paul Cercueil (2019-01-27 18:09:20)
> Take a parent rate of 180 MHz, and a requested rate of 4.285715 MHz.
> This results in a theoretical divider of 41.93, which is then rounded
> up to 42. The .round_rate function would then return (180 MHz / 42) as
> the clock, rounded down, so 4.285714 MHz.
>
> Calling clk_set_rate on 4.285714 MHz would round the rate again, and
> give a theoretical divider of 42.028, now rounded up to 43, and the
> rate returned would be (180 MHz / 43), which is 4.186046 MHz, i.e. not
> what we requested.
>
> Fix this by rounding up the divisions.
>
> Signed-off-by: Paul Cercueil
> Tested-by: Maarten ter Huurne
> Cc:
> ---

Applied to clk-next
Re: [PATCHv6 06/10] node: Add memory-side caching attributes
On Fri, Feb 22, 2019 at 11:22:12AM +0100, Brice Goglin wrote:
> On 14/02/2019 at 18:10, Keith Busch wrote:
> > +What:		/sys/devices/system/node/nodeX/memory_side_cache/indexY/associativity
> > +Date:		December 2018
> > +Contact:	Keith Busch
> > +Description:
> > +		The cache's associativity: 0 for direct mapped, non-zero if
> > +		indexed.
>
> Should we rename "associativity" to "indexing" or something else?
>
> When I see "associativity" that contains 0, I tend to interpret this as
> the associativity value itself, which would mean fully associative here
> (as in the CPU-side cache "ways_of_associativity" attribute), while
> actually 0 means direct-mapped (i.e. 1-associative) with your
> semantics.
>
> Brice

Yes, that's a good suggestion.
Re: [PATCH v2 08/14] clk: qcom: hfpll: CLK_IGNORE_UNUSED
Quoting Jorge Ramirez-Ortiz (2019-01-28 10:32:55)
> When COMMON_CLK_DISABLED_UNUSED is set, in an effort to save power and
> to keep the software model of the clock in line with reality, the
> framework traverses the clock tree and disables those clocks that
> were enabled by the firmware but have not been enabled by any device
> driver.
>
> If CPUFREQ is enabled, early during system boot it might attempt to
> change the CPU frequency ("set_rate"). If the HFPLL is selected as a
> provider, it will then change the rate for this clock.
>
> As boot continues, clk_disable_unused_subtree will run. Since it won't
> find a valid counter (enable_count) for a clock that is actually
> enabled, it will attempt to disable it, which will cause the CPU to
> stop. Notice that in this driver, calls to check whether the clock is
> enabled are routed via the is_enabled callback, which queries the
> hardware.
>
> This commit, rather than marking the clock critical and forcing the
> clock to be always enabled, addresses the above scenario by making
> sure the clock is not disabled, while continuing to rely on the
> firmware to enable the clock.
>
> Co-developed-by: Niklas Cassel
> Signed-off-by: Niklas Cassel
> Signed-off-by: Jorge Ramirez-Ortiz
> ---
>  drivers/clk/qcom/hfpll.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/clk/qcom/hfpll.c b/drivers/clk/qcom/hfpll.c
> index 0ffed0d..9d92f5d 100644
> --- a/drivers/clk/qcom/hfpll.c
> +++ b/drivers/clk/qcom/hfpll.c
> @@ -58,6 +58,7 @@ static int qcom_hfpll_probe(struct platform_device *pdev)
>                 .parent_names = (const char *[]){ "xo" },
>                 .num_parents = 1,
>                 .ops = &clk_ops_hfpll,
> +               .flags = CLK_IGNORE_UNUSED,

Please put some sort of similar comment in the code so we can find it
without git history digging.