Re: [PATCH] arm64: dts: qcom: sdm845: Add reserve-memory nodes

2018-10-30 Thread Sibi Sankar

Hi Doug,
Thanks for the review!

I noticed both the quirks just after sending it out :(, will fix them.


On 2018-10-30 00:07, Doug Anderson wrote:

Hi,

On Fri, Oct 26, 2018 at 5:28 AM Sibi Sankar  
wrote:


Add reserve-memory nodes for mpss and mba required for
remoteproc mss pil.

Signed-off-by: Sibi Sankar 
---
 arch/arm64/boot/dts/qcom/sdm845.dtsi | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi 
b/arch/arm64/boot/dts/qcom/sdm845.dtsi

index d3fe012ad84e..f74892e447f9 100644
--- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
+++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
@@ -88,6 +88,16 @@
reg = <0 0x8620 0 0x2d0>;
no-map;
};
+
+   mpss_region: reserved-memory@8e00 {
+   no-map;
+   reg = <0 0x8e00 0 0x780>;
+   };
+
+   mba_region: reserved-memory@9650 {


nit: All of the other reserved memory in this same section just has
the node name "memory".  Can you please follow suit?  Also above.
This is the kind of thing that Rob H. usually cares about doing right.


+   no-map;
+   reg = <0 0x9650 0 0x20>;


nit: All of the other reserved memory in this same section has "reg"
above "no-map".  It doesn't matter a whole lot, but why not make it
match everyone else?  Also above.


-Doug


--
-- Sibi Sankar --
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project.


Re: [PATCH] arm64: dts: qcom: sdm845: Add reserve-memory nodes

2018-10-30 Thread Sibi Sankar

Hi Doug,
Thanks for the review!

I noticed both the quirks just after sending it out :(, will fix them.


On 2018-10-30 00:07, Doug Anderson wrote:

Hi,

On Fri, Oct 26, 2018 at 5:28 AM Sibi Sankar  
wrote:


Add reserve-memory nodes for mpss and mba required for
remoteproc mss pil.

Signed-off-by: Sibi Sankar 
---
 arch/arm64/boot/dts/qcom/sdm845.dtsi | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi 
b/arch/arm64/boot/dts/qcom/sdm845.dtsi

index d3fe012ad84e..f74892e447f9 100644
--- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
+++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
@@ -88,6 +88,16 @@
reg = <0 0x8620 0 0x2d0>;
no-map;
};
+
+   mpss_region: reserved-memory@8e00 {
+   no-map;
+   reg = <0 0x8e00 0 0x780>;
+   };
+
+   mba_region: reserved-memory@9650 {


nit: All of the other reserved memory in this same section just has
the node name "memory".  Can you please follow suit?  Also above.
This is the kind of thing that Rob H. usually cares about doing right.


+   no-map;
+   reg = <0 0x9650 0 0x20>;


nit: All of the other reserved memory in this same section has "reg"
above "no-map".  It doesn't matter a whole lot, but why not make it
match everyone else?  Also above.


-Doug


--
-- Sibi Sankar --
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project.


Re: [PATCH 1/3] retpolines: Only enable retpoline when compiler support it

2018-10-30 Thread Zhenzhong Duan

On 2018/10/30 18:09, Peter Zijlstra wrote:

On Tue, Oct 30, 2018 at 06:39:24PM +0900, Masahiro Yamada wrote:

Hi,



On Tue, Oct 30, 2018 at 3:57 PM Zhenzhong Duan
 wrote:


Since retpoline capable compilers are widely available, make
CONFIG_RETPOLINE hard depend on it.

Change KBUILD to use CONFIG_RETPOLINE_SUPPORT to avoid conflict with
CONFIG_RETPOLINE which is used by kernel.

With all that stuff, the check of RETPOLINE is changed to
CONFIG_RETPOLINE.

This change is based on suggestion in https://lkml.org/lkml/2018/9/18/1016

Signed-off-by: Zhenzhong Duan 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Daniel Borkmann 
Cc: David Woodhouse 
Cc: H. Peter Anvin 
Cc: Ingo Molnar 
Cc: Konrad Rzeszutek Wilk 
Cc: Andy Lutomirski 
Cc: Masahiro Yamada 
Cc: Michal Marek 
---



Instead of adding another CONFIG option,
does it make sense to add compiler support checks
to 'depends on' syntax ?


config RETPOLINE
  bool "Avoid speculative indirect branches in kernel"
  depends on $(cc-option,-mindirect-branch=thunk-extern
-mindirect-branch-register) || \
 $(cc-option,-mretpoline-external-thunk)
  default y
  select STACK_VALIDATION if HAVE_STACK_VALIDATION


Looks better, thanks for suggestion.



That seems to be what we did for stackprotector, which is similar in
that it used to fail the build. So yes, this seems sane.


Should I add a scripts/gcc-x86_64-has-retpoline.sh like what 
stackprotector does as below or there is a simpler way?


config CC_HAS_SANE_STACKPROTECTOR
bool
default 
$(success,$(srctree)/scripts/gcc-x86_64-has-stack-protector.sh $(CC)) if 
64BIT
default 
$(success,$(srctree)/scripts/gcc-x86_32-has-stack-protector.sh $(CC))

help
   We have to make sure stack protector is unconditionally 
disabled if

   the compiler produces broken code.

Thanks
Zhenzhong


Re: [PATCH 1/3] retpolines: Only enable retpoline when compiler support it

2018-10-30 Thread Zhenzhong Duan

On 2018/10/30 18:09, Peter Zijlstra wrote:

On Tue, Oct 30, 2018 at 06:39:24PM +0900, Masahiro Yamada wrote:

Hi,



On Tue, Oct 30, 2018 at 3:57 PM Zhenzhong Duan
 wrote:


Since retpoline capable compilers are widely available, make
CONFIG_RETPOLINE hard depend on it.

Change KBUILD to use CONFIG_RETPOLINE_SUPPORT to avoid conflict with
CONFIG_RETPOLINE which is used by kernel.

With all that stuff, the check of RETPOLINE is changed to
CONFIG_RETPOLINE.

This change is based on suggestion in https://lkml.org/lkml/2018/9/18/1016

Signed-off-by: Zhenzhong Duan 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Daniel Borkmann 
Cc: David Woodhouse 
Cc: H. Peter Anvin 
Cc: Ingo Molnar 
Cc: Konrad Rzeszutek Wilk 
Cc: Andy Lutomirski 
Cc: Masahiro Yamada 
Cc: Michal Marek 
---



Instead of adding another CONFIG option,
does it make sense to add compiler support checks
to 'depends on' syntax ?


config RETPOLINE
  bool "Avoid speculative indirect branches in kernel"
  depends on $(cc-option,-mindirect-branch=thunk-extern
-mindirect-branch-register) || \
 $(cc-option,-mretpoline-external-thunk)
  default y
  select STACK_VALIDATION if HAVE_STACK_VALIDATION


Looks better, thanks for suggestion.



That seems to be what we did for stackprotector, which is similar in
that it used to fail the build. So yes, this seems sane.


Should I add a scripts/gcc-x86_64-has-retpoline.sh like what 
stackprotector does as below or there is a simpler way?


config CC_HAS_SANE_STACKPROTECTOR
bool
default 
$(success,$(srctree)/scripts/gcc-x86_64-has-stack-protector.sh $(CC)) if 
64BIT
default 
$(success,$(srctree)/scripts/gcc-x86_32-has-stack-protector.sh $(CC))

help
   We have to make sure stack protector is unconditionally 
disabled if

   the compiler produces broken code.

Thanks
Zhenzhong


[PATCH RFC] perf: Go back to by-hand proc mmap parsing and kill timeout

2018-10-30 Thread David Miller


This goes back to by-hand parsing of the proc mmap file, and removes
the timeout.

In my measurements this makes the parsing about twice as fast.

Profiling thread synthesizing shows that most of the time is spent in
the sscanf() call.

Processing samples is critical for perf top and perf record -a startup
performance, and perf's ability to keep up with high sample rates.

Signed-off-by: David S. Miller 

diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c
index 2b1ef704169f..d4b5177c3c22 100644
--- a/tools/perf/builtin-kvm.c
+++ b/tools/perf/builtin-kvm.c
@@ -1364,8 +1364,6 @@ static int kvm_events_live(struct perf_kvm_stat *kvm,
"show events other than"
" HLT (x86 only) or Wait state (s390 only)"
" that take longer than duration usecs"),
-   OPT_UINTEGER(0, "proc-map-timeout", >opts.proc_map_timeout,
-   "per thread proc mmap processing timeout in 
ms"),
OPT_END()
};
const char * const live_usage[] = {
@@ -1394,7 +1392,6 @@ static int kvm_events_live(struct perf_kvm_stat *kvm,
kvm->opts.target.uses_mmap = false;
kvm->opts.target.uid_str = NULL;
kvm->opts.target.uid = UINT_MAX;
-   kvm->opts.proc_map_timeout = 500;
 
symbol__init(NULL);
disable_buildid_cache();
@@ -1453,8 +1450,7 @@ static int kvm_events_live(struct perf_kvm_stat *kvm,
perf_session__set_id_hdr_size(kvm->session);
ordered_events__set_copy_on_queue(>session->ordered_events, true);
machine__synthesize_threads(>session->machines.host, 
>opts.target,
-   kvm->evlist->threads, false,
-   kvm->opts.proc_map_timeout, 1);
+   kvm->evlist->threads, false, 1);
err = kvm_live_open_events(kvm);
if (err)
goto out;
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 0980dfe3396b..4259ea42a8ef 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -633,8 +633,7 @@ static int record__synthesize_workload(struct record *rec, 
bool tail)
err = perf_event__synthesize_thread_map(>tool, thread_map,
 process_synthesized_event,
 >session->machines.host,
-rec->opts.sample_address,
-rec->opts.proc_map_timeout);
+rec->opts.sample_address);
thread_map__put(thread_map);
return err;
 }
@@ -848,8 +847,7 @@ static int record__synthesize(struct record *rec, bool tail)
}
 
err = __machine__synthesize_threads(machine, tool, >target, 
rec->evlist->threads,
-   process_synthesized_event, 
opts->sample_address,
-   opts->proc_map_timeout, 1);
+   process_synthesized_event, 
opts->sample_address, 1);
 out:
return err;
 }
@@ -1521,7 +1519,6 @@ static struct record record = {
.uses_mmap   = true,
.default_per_cpu = true,
},
-   .proc_map_timeout = 500,
},
.tool = {
.sample = process_sample_event,
@@ -1651,8 +1648,6 @@ static struct option __record_options[] = {
parse_clockid),
OPT_STRING_OPTARG('S', "snapshot", _snapshot_opts,
  "opts", "AUX area tracing Snapshot Mode", ""),
-   OPT_UINTEGER(0, "proc-map-timeout", _map_timeout,
-   "per thread proc mmap processing timeout in ms"),
OPT_BOOLEAN(0, "namespaces", _namespaces,
"Record namespaces events"),
OPT_BOOLEAN(0, "switch-events", _switch_events,
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index d21d8751e749..4eec44d815bd 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -1096,7 +1096,6 @@ static int __cmd_top(struct perf_top *top)
 
machine__synthesize_threads(>session->machines.host, >target,
top->evlist->threads, false,
-   opts->proc_map_timeout,
top->nr_threads_synthesize);
 
if (top->nr_threads_synthesize > 1)
@@ -1256,7 +1255,6 @@ int cmd_top(int argc, const char **argv)
.target = {
.uses_mmap   = true,
},
-   .proc_map_timeout= 500,
.overwrite  = 1,
},
.max_stack   = sysctl__max_stack(),
@@ -1360,8 +1358,6 @@ int cmd_top(int argc, const char **argv)
OPT_STRING('w', 

[PATCH RFC] perf: Go back to by-hand proc mmap parsing and kill timeout

2018-10-30 Thread David Miller


This goes back to by-hand parsing of the proc mmap file, and removes
the timeout.

In my measurements this makes the parsing about twice as fast.

Profiling thread synthesizing shows that most of the time is spent in
the sscanf() call.

Processing samples is critical for perf top and perf record -a startup
performance, and perf's ability to keep up with high sample rates.

Signed-off-by: David S. Miller 

diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c
index 2b1ef704169f..d4b5177c3c22 100644
--- a/tools/perf/builtin-kvm.c
+++ b/tools/perf/builtin-kvm.c
@@ -1364,8 +1364,6 @@ static int kvm_events_live(struct perf_kvm_stat *kvm,
"show events other than"
" HLT (x86 only) or Wait state (s390 only)"
" that take longer than duration usecs"),
-   OPT_UINTEGER(0, "proc-map-timeout", >opts.proc_map_timeout,
-   "per thread proc mmap processing timeout in 
ms"),
OPT_END()
};
const char * const live_usage[] = {
@@ -1394,7 +1392,6 @@ static int kvm_events_live(struct perf_kvm_stat *kvm,
kvm->opts.target.uses_mmap = false;
kvm->opts.target.uid_str = NULL;
kvm->opts.target.uid = UINT_MAX;
-   kvm->opts.proc_map_timeout = 500;
 
symbol__init(NULL);
disable_buildid_cache();
@@ -1453,8 +1450,7 @@ static int kvm_events_live(struct perf_kvm_stat *kvm,
perf_session__set_id_hdr_size(kvm->session);
ordered_events__set_copy_on_queue(>session->ordered_events, true);
machine__synthesize_threads(>session->machines.host, 
>opts.target,
-   kvm->evlist->threads, false,
-   kvm->opts.proc_map_timeout, 1);
+   kvm->evlist->threads, false, 1);
err = kvm_live_open_events(kvm);
if (err)
goto out;
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 0980dfe3396b..4259ea42a8ef 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -633,8 +633,7 @@ static int record__synthesize_workload(struct record *rec, 
bool tail)
err = perf_event__synthesize_thread_map(>tool, thread_map,
 process_synthesized_event,
 >session->machines.host,
-rec->opts.sample_address,
-rec->opts.proc_map_timeout);
+rec->opts.sample_address);
thread_map__put(thread_map);
return err;
 }
@@ -848,8 +847,7 @@ static int record__synthesize(struct record *rec, bool tail)
}
 
err = __machine__synthesize_threads(machine, tool, >target, 
rec->evlist->threads,
-   process_synthesized_event, 
opts->sample_address,
-   opts->proc_map_timeout, 1);
+   process_synthesized_event, 
opts->sample_address, 1);
 out:
return err;
 }
@@ -1521,7 +1519,6 @@ static struct record record = {
.uses_mmap   = true,
.default_per_cpu = true,
},
-   .proc_map_timeout = 500,
},
.tool = {
.sample = process_sample_event,
@@ -1651,8 +1648,6 @@ static struct option __record_options[] = {
parse_clockid),
OPT_STRING_OPTARG('S', "snapshot", _snapshot_opts,
  "opts", "AUX area tracing Snapshot Mode", ""),
-   OPT_UINTEGER(0, "proc-map-timeout", _map_timeout,
-   "per thread proc mmap processing timeout in ms"),
OPT_BOOLEAN(0, "namespaces", _namespaces,
"Record namespaces events"),
OPT_BOOLEAN(0, "switch-events", _switch_events,
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index d21d8751e749..4eec44d815bd 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -1096,7 +1096,6 @@ static int __cmd_top(struct perf_top *top)
 
machine__synthesize_threads(>session->machines.host, >target,
top->evlist->threads, false,
-   opts->proc_map_timeout,
top->nr_threads_synthesize);
 
if (top->nr_threads_synthesize > 1)
@@ -1256,7 +1255,6 @@ int cmd_top(int argc, const char **argv)
.target = {
.uses_mmap   = true,
},
-   .proc_map_timeout= 500,
.overwrite  = 1,
},
.max_stack   = sysctl__max_stack(),
@@ -1360,8 +1358,6 @@ int cmd_top(int argc, const char **argv)
OPT_STRING('w', 

Re: [PATCH] reiserfs: remove workaround code for GCC 3.x

2018-10-30 Thread Michael Ellerman
Masahiro Yamada  writes:

> Commit cafa0010cd51 ("Raise the minimum required gcc version to 4.6")
> bumped the minimum GCC version to 4.6 for all architectures.
>
> The workaround code in fs/reiserfs/Makefile is obsolete now.
>
> Signed-off-by: Masahiro Yamada 
> ---
>
>  fs/reiserfs/Makefile | 9 -
>  1 file changed, 9 deletions(-)
>
> diff --git a/fs/reiserfs/Makefile b/fs/reiserfs/Makefile
> index a39a562..bd29c58 100644
> --- a/fs/reiserfs/Makefile
> +++ b/fs/reiserfs/Makefile
> @@ -26,14 +26,5 @@ ifeq ($(CONFIG_REISERFS_FS_POSIX_ACL),y)
>  reiserfs-objs += xattr_acl.o
>  endif
>  
> -# gcc -O2 (the kernel default)  is overaggressive on ppc32 when many inline
> -# functions are used.  This causes the compiler to advance the stack
> -# pointer out of the available stack space, corrupting kernel space,
> -# and causing a panic. Since this behavior only affects ppc32, this ifeq
> -# will work around it. If any other architecture displays this behavior,
> -# add it here.
> -ccflags-$(CONFIG_PPC32) := $(call cc-ifversion, -lt, 0400, -O1)
> -
>  TAGS:
>   etags *.c

Oh wow that's gross. Thanks for cleaning it up.

Acked-by: Michael Ellerman 

cheers


Re: [PATCH] reiserfs: remove workaround code for GCC 3.x

2018-10-30 Thread Michael Ellerman
Masahiro Yamada  writes:

> Commit cafa0010cd51 ("Raise the minimum required gcc version to 4.6")
> bumped the minimum GCC version to 4.6 for all architectures.
>
> The workaround code in fs/reiserfs/Makefile is obsolete now.
>
> Signed-off-by: Masahiro Yamada 
> ---
>
>  fs/reiserfs/Makefile | 9 -
>  1 file changed, 9 deletions(-)
>
> diff --git a/fs/reiserfs/Makefile b/fs/reiserfs/Makefile
> index a39a562..bd29c58 100644
> --- a/fs/reiserfs/Makefile
> +++ b/fs/reiserfs/Makefile
> @@ -26,14 +26,5 @@ ifeq ($(CONFIG_REISERFS_FS_POSIX_ACL),y)
>  reiserfs-objs += xattr_acl.o
>  endif
>  
> -# gcc -O2 (the kernel default)  is overaggressive on ppc32 when many inline
> -# functions are used.  This causes the compiler to advance the stack
> -# pointer out of the available stack space, corrupting kernel space,
> -# and causing a panic. Since this behavior only affects ppc32, this ifeq
> -# will work around it. If any other architecture displays this behavior,
> -# add it here.
> -ccflags-$(CONFIG_PPC32) := $(call cc-ifversion, -lt, 0400, -O1)
> -
>  TAGS:
>   etags *.c

Oh wow that's gross. Thanks for cleaning it up.

Acked-by: Michael Ellerman 

cheers


Re: 4.18: early boot crash in thermal_cooling_device_destroy_sysfs

2018-10-30 Thread Randy Dunlap
On 10/30/18 6:07 PM, Zhang Rui wrote:
> Hi, Randy,
> 
> On 五, 2018-10-26 at 20:35 -0700, Randy Dunlap wrote:
>> On 10/26/18 2:14 AM, Rafael J. Wysocki wrote:
>>>
>>> On Monday, October 22, 2018 8:37:25 PM CEST Randy Dunlap wrote:


 On 8/16/18 2:33 PM, Randy Dunlap wrote:
>
> Hi,
>
> Sorry for the photo.  That's all I have available so far.
>
> https://www.infradead.org/~rdunlap/doc/IMG_20180816_133254743_H
> DR.jpg
>
>
> Does anyone recognize this?
>
> This is an (older) Toshiba laptop.  The kernel .config is
> mostly an
> allmodconfig with some DEBUG options disabled and other options
> enabled
> so that it can boot without using an initramfs.  (and with
> COMPILE_TEST
> disabled :)
>
>
> The full kernel .config file is attached.
>
> Thanks,
>
 This is a result of CONFIG_DEBUG_TEST_DRIVER_REMOVE=y.
 [switch from 64-bit to 32-bit machine]


 When using CONFIG_DEBUG_VM=y, it BUGs at:
 [5.553603] [ cut here ]
 [5.553733] kernel BUG at arch/x86/mm/physaddr.c:75!
 [5.557788] invalid opcode:  [#1] PREEMPT SMP
 DEBUG_PAGEALLOC
 [5.558738] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.19.0-
 rc7 #4
 [5.558738] Hardware name: Dell Inc. Inspiron
 1318   /0C236D, BIOS A04 01/15/2009
 [5.558738] EIP: __phys_addr+0x40/0x90
 [5.558738] Code: 00 40 75 2e 8b 15 00 57 23 d5 85 d2 74 12 89
 d9 c1 e9 0c 39 ca 72 5b e8 2e ca ff ff 39 d8 75 4a 89 d8 5b 5d c3
 8d 74 26 00 90 <0f> 0b 8d b6 00 00 00 00 8b 0d 80 56 23 d5 8d 91
 00 00 80 00 39 d0
 [5.558738] EAX: 6b6b6b6b EBX: 6b6b6b6b ECX: 00140011 EDX:
 
 [5.558738] ESI: f489 EDI: d4a58d60 EBP: f40c1e0c ESP:
 f40c1e08
 [5.558738] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
 EFLAGS: 00210a97
 [5.558738] CR0: 80050033 CR2:  CR3: 14cad000 CR4:
 000406d0
 [5.558738] Call Trace:
 [5.558738]  kfree+0x1f/0x160
 [5.558738]  thermal_cooling_device_destroy_sysfs+0x11/0x20
 [5.558738]  thermal_cooling_device_unregister+0x168/0x180
 [5.558738]  acpi_pss_perf_exit.isra.4+0x32/0x50
 [5.558738]  acpi_processor_stop+0x4d/0x60
 [5.558738]  really_probe+0xa3/0x3e0
 [5.558738]  driver_probe_device+0x5b/0x120
 [5.558738]  __driver_attach+0xd9/0x100
 [5.558738]  ? driver_probe_device+0x120/0x120
 [5.558738]  bus_for_each_dev+0x56/0x90
 [5.558738]  driver_attach+0x14/0x20
 [5.558738]  ? driver_probe_device+0x120/0x120
 [5.558738]  bus_add_driver+0x117/0x210
 [5.558738]  driver_register+0x61/0xb0
 [5.558738]  acpi_processor_driver_init+0x19/0x88
 [5.558738]  ? acpi_pci_slot_init+0xf/0xf
 [5.558738]  do_one_initcall+0x3e/0x15a
 [5.558738]  ? do_early_param+0x75/0x75
 [5.558738]  kernel_init_freeable+0x170/0x1f3
 [5.558738]  ? rest_init+0xcd/0xcd
 [5.558738]  kernel_init+0x8/0xdb
 [5.558738]  ret_from_fork+0x2e/0x38
 [5.558738] Modules linked in:
 [5.625269] _warn_unseeded_randomness: 1 callbacks suppressed
 [5.625272] random: get_random_bytes called from
 init_oops_id+0x3a/0x40 with crng_init=0
 [5.629758] ---[ end trace 65b17bf4d18e7692 ]---
 [5.631573] EIP: __phys_addr+0x40/0x90
 [5.633242] Code: 00 40 75 2e 8b 15 00 57 23 d5 85 d2 74 12 89
 d9 c1 e9 0c 39 ca 72 5b e8 2e ca ff ff 39 d8 75 4a 89 d8 5b 5d c3
 8d 74 26 00 90 <0f> 0b 8d b6 00 00 00 00 8b 0d 80 56 23 d5 8d 91
 00 00 80 00 39 d0
 [5.638618] EAX: 6b6b6b6b EBX: 6b6b6b6b ECX: 00140011 EDX:
 
 [5.640703] ESI: f489 EDI: d4a58d60 EBP: f40c1e0c ESP:
 d4cb13dc
 [5.642801] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
 EFLAGS: 00210a97
 [5.645053] CR0: 80050033 CR2:  CR3: 14cad000 CR4:
 000406d0
 [5.647179] Kernel panic - not syncing: Fatal exception
 [5.648172] Kernel Offset: 0x1300 from 0xc100
 (relocation range: 0xc000-0xf77fdfff)
 [5.648172] ---[ end Kernel panic - not syncing: Fatal
 exception ]---


 When not using CONFIG_DEBUG_VM, it BUGs in kfree:
 [5.497864] [ cut here ]
 [5.498215] kernel BUG at mm/slub.c:3901!
 [5.501739] invalid opcode:  [#1] PREEMPT SMP
 [5.502720] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.19.0-
 rc7 #3
 [5.502720] Hardware name: Dell Inc. Inspiron
 1318   /0C236D, BIOS A04 01/15/2009
 [5.502720] EIP: kfree+0x117/0x150
 [5.502720] Code: 74 21 8b 06 31 d2 f6 c4 80 74 04 0f b6 56 31
 89 f0 e8 7d e0 fa ff e9 7b ff ff ff 8d b4 26 00 00 00 00 90 8b 46
 04 a8 01 75 d8 <0f> 0b 8d b4 26 00 00 00 00 8b 75 f0 ff 75 ec 89
 d9 89 f8 

Re: 4.18: early boot crash in thermal_cooling_device_destroy_sysfs

2018-10-30 Thread Randy Dunlap
On 10/30/18 6:07 PM, Zhang Rui wrote:
> Hi, Randy,
> 
> On 五, 2018-10-26 at 20:35 -0700, Randy Dunlap wrote:
>> On 10/26/18 2:14 AM, Rafael J. Wysocki wrote:
>>>
>>> On Monday, October 22, 2018 8:37:25 PM CEST Randy Dunlap wrote:


 On 8/16/18 2:33 PM, Randy Dunlap wrote:
>
> Hi,
>
> Sorry for the photo.  That's all I have available so far.
>
> https://www.infradead.org/~rdunlap/doc/IMG_20180816_133254743_H
> DR.jpg
>
>
> Does anyone recognize this?
>
> This is an (older) Toshiba laptop.  The kernel .config is
> mostly an
> allmodconfig with some DEBUG options disabled and other options
> enabled
> so that it can boot without using an initramfs.  (and with
> COMPILE_TEST
> disabled :)
>
>
> The full kernel .config file is attached.
>
> Thanks,
>
 This is a result of CONFIG_DEBUG_TEST_DRIVER_REMOVE=y.
 [switch from 64-bit to 32-bit machine]


 When using CONFIG_DEBUG_VM=y, it BUGs at:
 [5.553603] [ cut here ]
 [5.553733] kernel BUG at arch/x86/mm/physaddr.c:75!
 [5.557788] invalid opcode:  [#1] PREEMPT SMP
 DEBUG_PAGEALLOC
 [5.558738] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.19.0-
 rc7 #4
 [5.558738] Hardware name: Dell Inc. Inspiron
 1318   /0C236D, BIOS A04 01/15/2009
 [5.558738] EIP: __phys_addr+0x40/0x90
 [5.558738] Code: 00 40 75 2e 8b 15 00 57 23 d5 85 d2 74 12 89
 d9 c1 e9 0c 39 ca 72 5b e8 2e ca ff ff 39 d8 75 4a 89 d8 5b 5d c3
 8d 74 26 00 90 <0f> 0b 8d b6 00 00 00 00 8b 0d 80 56 23 d5 8d 91
 00 00 80 00 39 d0
 [5.558738] EAX: 6b6b6b6b EBX: 6b6b6b6b ECX: 00140011 EDX:
 
 [5.558738] ESI: f489 EDI: d4a58d60 EBP: f40c1e0c ESP:
 f40c1e08
 [5.558738] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
 EFLAGS: 00210a97
 [5.558738] CR0: 80050033 CR2:  CR3: 14cad000 CR4:
 000406d0
 [5.558738] Call Trace:
 [5.558738]  kfree+0x1f/0x160
 [5.558738]  thermal_cooling_device_destroy_sysfs+0x11/0x20
 [5.558738]  thermal_cooling_device_unregister+0x168/0x180
 [5.558738]  acpi_pss_perf_exit.isra.4+0x32/0x50
 [5.558738]  acpi_processor_stop+0x4d/0x60
 [5.558738]  really_probe+0xa3/0x3e0
 [5.558738]  driver_probe_device+0x5b/0x120
 [5.558738]  __driver_attach+0xd9/0x100
 [5.558738]  ? driver_probe_device+0x120/0x120
 [5.558738]  bus_for_each_dev+0x56/0x90
 [5.558738]  driver_attach+0x14/0x20
 [5.558738]  ? driver_probe_device+0x120/0x120
 [5.558738]  bus_add_driver+0x117/0x210
 [5.558738]  driver_register+0x61/0xb0
 [5.558738]  acpi_processor_driver_init+0x19/0x88
 [5.558738]  ? acpi_pci_slot_init+0xf/0xf
 [5.558738]  do_one_initcall+0x3e/0x15a
 [5.558738]  ? do_early_param+0x75/0x75
 [5.558738]  kernel_init_freeable+0x170/0x1f3
 [5.558738]  ? rest_init+0xcd/0xcd
 [5.558738]  kernel_init+0x8/0xdb
 [5.558738]  ret_from_fork+0x2e/0x38
 [5.558738] Modules linked in:
 [5.625269] _warn_unseeded_randomness: 1 callbacks suppressed
 [5.625272] random: get_random_bytes called from
 init_oops_id+0x3a/0x40 with crng_init=0
 [5.629758] ---[ end trace 65b17bf4d18e7692 ]---
 [5.631573] EIP: __phys_addr+0x40/0x90
 [5.633242] Code: 00 40 75 2e 8b 15 00 57 23 d5 85 d2 74 12 89
 d9 c1 e9 0c 39 ca 72 5b e8 2e ca ff ff 39 d8 75 4a 89 d8 5b 5d c3
 8d 74 26 00 90 <0f> 0b 8d b6 00 00 00 00 8b 0d 80 56 23 d5 8d 91
 00 00 80 00 39 d0
 [5.638618] EAX: 6b6b6b6b EBX: 6b6b6b6b ECX: 00140011 EDX:
 
 [5.640703] ESI: f489 EDI: d4a58d60 EBP: f40c1e0c ESP:
 d4cb13dc
 [5.642801] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
 EFLAGS: 00210a97
 [5.645053] CR0: 80050033 CR2:  CR3: 14cad000 CR4:
 000406d0
 [5.647179] Kernel panic - not syncing: Fatal exception
 [5.648172] Kernel Offset: 0x1300 from 0xc100
 (relocation range: 0xc000-0xf77fdfff)
 [5.648172] ---[ end Kernel panic - not syncing: Fatal
 exception ]---


 When not using CONFIG_DEBUG_VM, it BUGs in kfree:
 [5.497864] [ cut here ]
 [5.498215] kernel BUG at mm/slub.c:3901!
 [5.501739] invalid opcode:  [#1] PREEMPT SMP
 [5.502720] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.19.0-
 rc7 #3
 [5.502720] Hardware name: Dell Inc. Inspiron
 1318   /0C236D, BIOS A04 01/15/2009
 [5.502720] EIP: kfree+0x117/0x150
 [5.502720] Code: 74 21 8b 06 31 d2 f6 c4 80 74 04 0f b6 56 31
 89 f0 e8 7d e0 fa ff e9 7b ff ff ff 8d b4 26 00 00 00 00 90 8b 46
 04 a8 01 75 d8 <0f> 0b 8d b4 26 00 00 00 00 8b 75 f0 ff 75 ec 89
 d9 89 f8 

Re: [PATCH] fs: fix lost error code in dio_complete

2018-10-30 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig 


Re: [PATCH] fs: fix lost error code in dio_complete

2018-10-30 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig 


[git pull] mount API series

2018-10-30 Thread Al Viro
mount API series from David Howells.  Last cycle's objections
had been of the "I'd do it differently" variety and with no such
differently done variants having ever materialized over several
cycles...

Conflicts: two trivial ones in
drivers/infiniband/Kconfig (removal of select ANON_INODES)
fs/f2fs/super.c (->remount signature change)
and a non-trivial one in fs/proc/inode.c - there we have
mainline adding
/* procfs dentries and inodes don't require IO to create */
s->s_shrink.seeks = 0;
to proc_fill_super() (in 4b85afbdacd2 "mm: zero-seek shrinkers")
while that series moves the sucker to fs/proc/root.c.  Resolved by
removing the old copy from fs/proc/inode.c and adding the same lines
into the new copy in fs/proc/root.c.
I'd put a variant of resolution into #proposed-merge.

David's cover letter follows; it's obviously over the top for commit
message of the merge.  Where to cut it is up to you...

=

Here are a set of patches to create a filesystem context prior to setting
up a new mount, populating it with the parsed options/binary data, looking
up/creating the superblock, querying it and then effecting the mount.  This
is also used for remount since much of the parsing stuff is common in many
filesystems.

This allows namespaces and other information to be conveyed through the
mount procedure.  This is done with something like:

fd = fsopen("nfs", 0);
fsconfig(fd, FSCONFIG_SET_STRING, "option", "val", 0);
fsconfig(fd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
struct fsinfo_statfs statfs;
fsinfo(fd, NULL, NULL, , sizeof(statfs));
mfd = fsmount(fd, MS_NODEV);
move_mount(mfd, "", AT_FDCWD, "/mnt", MOVE_MOUNT_F_EMPTY_PATH);

The new move_mount() syscall can also be used simply to move mounts around:

move_mount(AT_FDCWD, "/mnt", AT_FDCWD, "/mnt2", 0);

And, in conjunction with the open_tree() syscall, can be used to clone
mounts:

fd = open_tree(AT_FDCWD, "/mnt", AT_RECURSIVE | OPEN_TREE_CLONE);
move_mount(mfd, "", AT_FDCWD, "/mnt2", MOVE_MOUNT_F_EMPTY_PATH);

File descriptors can be used as mountpoint references:

fd = open_tree(AT_FDCWD, "/mnt", 0);
move_mount(mfd, "", AT_FDCWD, "/mnt2", MOVE_MOUNT_F_EMPTY_PATH);
move_mount(mfd, "", AT_FDCWD, "/mnt3", MOVE_MOUNT_F_EMPTY_PATH);

which, in this example, will *move* the mount at /mnt to /mnt2 and thence
to /mnt3.

Superblocks can be picked and reconfigured:

fd = fspick(AT_FDCWD, "/mnt", 0)
fsconfig(fd, FSCONFIG_SET_STRING, "option", "other-val", 0);
fsconfig(fd, FSCONFIG_SET_STRING, "option2", "true", 0);
fsconfig(fd, FSCONFIG_SET_STRING, "option3", "1234", 0);
fsconfig(fd, FSCONFIG_CMD_RECONFIGURE, NULL, NULL, 0);

Filesystem parameters and other attributes can also be queried from the fd
returned by fsopen() or fspick():

fd = fspick(AT_FDCWD, "/mnt", 0)
struct fsinfo_params params = {
.request= FSINFO_ATTR_PARAMETER,
.Nth= 1,
.Mth= 3,
};
char param_buf[4096];
fsinfo(fd, NULL, , param_buf, sizeof(param_buf));

which will retrieve the 4th value of the 2nd parameter (0 being first) as a
printable string.

Parameters and attributes can also be queried by path or on an ordinary fd:

struct fsinfo_params params = {
.request= FSINFO_ATTR_VOLUME_NAME,
};
char param_buf[4096];
fsinfo(AT_FDCWD, "/etc/passwd", , param_buf,
   sizeof(param_buf));

The details of a filesystem's parser can also be queried:

fd = fsopen("ext4", 0);
struct fsinfo_params params = {
.request= FSINFO_ATTR_PARAM_NAME,
.Nth= 1,
};
char param_buf[4096];
fsinfo(fd, NULL, , param_buf, sizeof(param_buf));

which, in this instance, will retrieve the name of parameter #1.

I have implemented filesystem context handling for procfs, nfs, mqueue,
cpuset, kernfs, sysfs, cgroup and afs filesystems.  Unconverted filesystems
are handled by a legacy filesystem wrapper for the moment.

Note that I didn't use netlink as that would make the core kernel depend on
CONFIG_NET and CONFIG_NETLINK and would introduce network namespacing
issues.



WHY DO WE WANT THIS?


Firstly, there's a bunch of problems with the mount(2) syscall:

 (1) It's actually six or seven different interfaces rolled into one and
 weird combinations of flags make it do different things beyond the
 original specification of the syscall.

 (2) It produces a particularly large and diverse set of errors, which have
 to be mapped back to a small error code.  Yes, there's dmesg - if you
 have it configured - but you can't necessarily see that if you're
 doing a 

[git pull] mount API series

2018-10-30 Thread Al Viro
mount API series from David Howells.  Last cycle's objections
had been of the "I'd do it differently" variety and with no such
differently done variants having ever materialized over several
cycles...

Conflicts: two trivial ones in
drivers/infiniband/Kconfig (removal of select ANON_INODES)
fs/f2fs/super.c (->remount signature change)
and a non-trivial one in fs/proc/inode.c - there we have
mainline adding
/* procfs dentries and inodes don't require IO to create */
s->s_shrink.seeks = 0;
to proc_fill_super() (in 4b85afbdacd2 "mm: zero-seek shrinkers")
while that series moves the sucker to fs/proc/root.c.  Resolved by
removing the old copy from fs/proc/inode.c and adding the same lines
into the new copy in fs/proc/root.c.
I'd put a variant of resolution into #proposed-merge.

David's cover letter follows; it's obviously over the top for commit
message of the merge.  Where to cut it is up to you...

=

Here are a set of patches to create a filesystem context prior to setting
up a new mount, populating it with the parsed options/binary data, looking
up/creating the superblock, querying it and then effecting the mount.  This
is also used for remount since much of the parsing stuff is common in many
filesystems.

This allows namespaces and other information to be conveyed through the
mount procedure.  This is done with something like:

fd = fsopen("nfs", 0);
fsconfig(fd, FSCONFIG_SET_STRING, "option", "val", 0);
fsconfig(fd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
struct fsinfo_statfs statfs;
fsinfo(fd, NULL, NULL, , sizeof(statfs));
mfd = fsmount(fd, MS_NODEV);
move_mount(mfd, "", AT_FDCWD, "/mnt", MOVE_MOUNT_F_EMPTY_PATH);

The new move_mount() syscall can also be used simply to move mounts around:

move_mount(AT_FDCWD, "/mnt", AT_FDCWD, "/mnt2", 0);

And, in conjunction with the open_tree() syscall, can be used to clone
mounts:

fd = open_tree(AT_FDCWD, "/mnt", AT_RECURSIVE | OPEN_TREE_CLONE);
move_mount(mfd, "", AT_FDCWD, "/mnt2", MOVE_MOUNT_F_EMPTY_PATH);

File descriptors can be used as mountpoint references:

fd = open_tree(AT_FDCWD, "/mnt", 0);
move_mount(mfd, "", AT_FDCWD, "/mnt2", MOVE_MOUNT_F_EMPTY_PATH);
move_mount(mfd, "", AT_FDCWD, "/mnt3", MOVE_MOUNT_F_EMPTY_PATH);

which, in this example, will *move* the mount at /mnt to /mnt2 and thence
to /mnt3.

Superblocks can be picked and reconfigured:

fd = fspick(AT_FDCWD, "/mnt", 0)
fsconfig(fd, FSCONFIG_SET_STRING, "option", "other-val", 0);
fsconfig(fd, FSCONFIG_SET_STRING, "option2", "true", 0);
fsconfig(fd, FSCONFIG_SET_STRING, "option3", "1234", 0);
fsconfig(fd, FSCONFIG_CMD_RECONFIGURE, NULL, NULL, 0);

Filesystem parameters and other attributes can also be queried from the fd
returned by fsopen() or fspick():

fd = fspick(AT_FDCWD, "/mnt", 0)
struct fsinfo_params params = {
.request= FSINFO_ATTR_PARAMETER,
.Nth= 1,
.Mth= 3,
};
char param_buf[4096];
fsinfo(fd, NULL, , param_buf, sizeof(param_buf));

which will retrieve the 4th value of the 2nd parameter (0 being first) as a
printable string.

Parameters and attributes can also be queried by path or on an ordinary fd:

struct fsinfo_params params = {
.request= FSINFO_ATTR_VOLUME_NAME,
};
char param_buf[4096];
fsinfo(AT_FDCWD, "/etc/passwd", , param_buf,
   sizeof(param_buf));

The details of a filesystem's parser can also be queried:

fd = fsopen("ext4", 0);
struct fsinfo_params params = {
.request= FSINFO_ATTR_PARAM_NAME,
.Nth= 1,
};
char param_buf[4096];
fsinfo(fd, NULL, , param_buf, sizeof(param_buf));

which, in this instance, will retrieve the name of parameter #1.

I have implemented filesystem context handling for procfs, nfs, mqueue,
cpuset, kernfs, sysfs, cgroup and afs filesystems.  Unconverted filesystems
are handled by a legacy filesystem wrapper for the moment.

Note that I didn't use netlink as that would make the core kernel depend on
CONFIG_NET and CONFIG_NETLINK and would introduce network namespacing
issues.



WHY DO WE WANT THIS?


Firstly, there's a bunch of problems with the mount(2) syscall:

 (1) It's actually six or seven different interfaces rolled into one and
 weird combinations of flags make it do different things beyond the
 original specification of the syscall.

 (2) It produces a particularly large and diverse set of errors, which have
 to be mapped back to a small error code.  Yes, there's dmesg - if you
 have it configured - but you can't necessarily see that if you're
 doing a 

[PATCH RFC] perf: Start 'top' display thread earlier.

2018-10-30 Thread David Miller


If events are coming in at a rate such that the event processing
thread can barely keep up, our initial run of the event ring will
almost never terminate and this delays the starting of the display
thread.

The screen basically stays black until the event thread can get out of
it's endless loop.

Therefore, start the display thread before we start processing the
ring buffer.

This also make sure that we always have the user requested real time
setting engaged when processing the ring.

Signed-off-by: David S. Miller 

diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index d21d875..e7a78ce 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -1134,11 +1125,6 @@ static int __cmd_top(struct perf_top *top)
 if (!target__none(>target))
 perf_evlist__enable(top->evlist);
 
-   /* Wait for a minimal set of events before starting the snapshot */
-   perf_evlist__poll(top->evlist, 100);
-
-   perf_top__mmap_read(top);
-
ret = -1;
if (pthread_create(, NULL, (use_browser > 0 ? display_thread_tui 
:
display_thread), 
top)) {
@@ -1156,6 +1142,11 @@ static int __cmd_top(struct perf_top *top)
}
}
 
+   /* Wait for a minimal set of events before starting the snapshot */
+   perf_evlist__poll(top->evlist, 100);
+
+   perf_top__mmap_read(top);
+
while (!done) {
u64 hits = top->samples;
 


[PATCH RFC] perf: Start 'top' display thread earlier.

2018-10-30 Thread David Miller


If events are coming in at a rate such that the event processing
thread can barely keep up, our initial run of the event ring will
almost never terminate and this delays the starting of the display
thread.

The screen basically stays black until the event thread can get out of
it's endless loop.

Therefore, start the display thread before we start processing the
ring buffer.

This also make sure that we always have the user requested real time
setting engaged when processing the ring.

Signed-off-by: David S. Miller 

diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index d21d875..e7a78ce 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -1134,11 +1125,6 @@ static int __cmd_top(struct perf_top *top)
 if (!target__none(>target))
 perf_evlist__enable(top->evlist);
 
-   /* Wait for a minimal set of events before starting the snapshot */
-   perf_evlist__poll(top->evlist, 100);
-
-   perf_top__mmap_read(top);
-
ret = -1;
if (pthread_create(, NULL, (use_browser > 0 ? display_thread_tui 
:
display_thread), 
top)) {
@@ -1156,6 +1142,11 @@ static int __cmd_top(struct perf_top *top)
}
}
 
+   /* Wait for a minimal set of events before starting the snapshot */
+   perf_evlist__poll(top->evlist, 100);
+
+   perf_top__mmap_read(top);
+
while (!done) {
u64 hits = top->samples;
 


[PATCH] perf: Don't clone maps from parent when synthesizing forks

2018-10-30 Thread David Miller


When synthesizing FORK events, we are trying to create thread objects
for the already running tasks on the machine.

Normally, for a kernel FORK event, we want to clone the parent's maps
because that is what the kernel just did.

But when synthesizing, this should not be done.  If we do, we end up
with overlapping maps as we process the sythesized MMAP2 events that
get delivered shortly thereafter.

Use the FORK event misc flags in an internal way to signal this
situation, so we can elide the map clone when appropriate.

Signed-off-by: David S. Miller 

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index f35eb72739c0..9de8780ac8d9 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -646,10 +646,12 @@ struct perf_event_mmap_page {
  *
  *   PERF_RECORD_MISC_MMAP_DATA  - PERF_RECORD_MMAP* events
  *   PERF_RECORD_MISC_COMM_EXEC  - PERF_RECORD_COMM event
+ *   PERF_RECORD_MISC_FORK_EXEC  - PERF_RECORD_FORK event (perf internal)
  *   PERF_RECORD_MISC_SWITCH_OUT - PERF_RECORD_SWITCH* events
  */
 #define PERF_RECORD_MISC_MMAP_DATA (1 << 13)
 #define PERF_RECORD_MISC_COMM_EXEC (1 << 13)
+#define PERF_RECORD_MISC_FORK_EXEC (1 << 13)
 #define PERF_RECORD_MISC_SWITCH_OUT(1 << 13)
 /*
  * These PERF_RECORD_MISC_* flags below are safely reused
diff --git a/tools/include/uapi/linux/perf_event.h 
b/tools/include/uapi/linux/perf_event.h
index f35eb72739c0..9de8780ac8d9 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -646,10 +646,12 @@ struct perf_event_mmap_page {
  *
  *   PERF_RECORD_MISC_MMAP_DATA  - PERF_RECORD_MMAP* events
  *   PERF_RECORD_MISC_COMM_EXEC  - PERF_RECORD_COMM event
+ *   PERF_RECORD_MISC_FORK_EXEC  - PERF_RECORD_FORK event (perf internal)
  *   PERF_RECORD_MISC_SWITCH_OUT - PERF_RECORD_SWITCH* events
  */
 #define PERF_RECORD_MISC_MMAP_DATA (1 << 13)
 #define PERF_RECORD_MISC_COMM_EXEC (1 << 13)
+#define PERF_RECORD_MISC_FORK_EXEC (1 << 13)
 #define PERF_RECORD_MISC_SWITCH_OUT(1 << 13)
 /*
  * These PERF_RECORD_MISC_* flags below are safely reused
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index bc646185f8d9..e9c108a6b1c3 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -308,6 +308,7 @@ static int perf_event__synthesize_fork(struct perf_tool 
*tool,
event->fork.pid  = tgid;
event->fork.tid  = pid;
event->fork.header.type = PERF_RECORD_FORK;
+   event->fork.header.misc = PERF_RECORD_MISC_FORK_EXEC;
 
event->fork.header.size = (sizeof(event->fork) + machine->id_hdr_size);
 
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 111ae858cbcb..214b7979c4e7 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1708,6 +1708,7 @@ int machine__process_fork_event(struct machine *machine, 
union perf_event *event
struct thread *parent = machine__findnew_thread(machine,
event->fork.ppid,
event->fork.ptid);
+   bool do_maps_clone = true;
int err = 0;
 
if (dump_trace)
@@ -1737,8 +1738,11 @@ int machine__process_fork_event(struct machine *machine, 
union perf_event *event
thread = machine__findnew_thread(machine, event->fork.pid,
 event->fork.tid);
 
+   if (event->fork.header.misc & PERF_RECORD_MISC_FORK_EXEC)
+   do_maps_clone = false;
+
if (thread == NULL || parent == NULL ||
-   thread__fork(thread, parent, sample->time) < 0) {
+   thread__fork(thread, parent, sample->time, do_maps_clone) < 0) {
dump_printf("problem processing PERF_RECORD_FORK, skipping 
event.\n");
err = -1;
}
diff --git a/tools/perf/util/thread.c b/tools/perf/util/thread.c
index 2048d393ece6..54b2c9ceba9f 100644
--- a/tools/perf/util/thread.c
+++ b/tools/perf/util/thread.c
@@ -330,7 +330,8 @@ static int thread__prepare_access(struct thread *thread)
 }
 
 static int thread__clone_map_groups(struct thread *thread,
-   struct thread *parent)
+   struct thread *parent,
+   bool do_maps_clone)
 {
/* This is new thread, we share map groups for process. */
if (thread->pid_ == parent->pid_)
@@ -341,15 +342,14 @@ static int thread__clone_map_groups(struct thread *thread,
 thread->pid_, thread->tid, parent->pid_, parent->tid);
return 0;
}
-
/* But this one is new process, copy maps. */
-   if (map_groups__clone(thread, parent->mg) < 0)
+   if (do_maps_clone &&
+   map_groups__clone(thread, parent->mg) < 0)
return -ENOMEM;
-
return 0;
 }
 
-int 

[PATCH] perf: Don't clone maps from parent when synthesizing forks

2018-10-30 Thread David Miller


When synthesizing FORK events, we are trying to create thread objects
for the already running tasks on the machine.

Normally, for a kernel FORK event, we want to clone the parent's maps
because that is what the kernel just did.

But when synthesizing, this should not be done.  If we do, we end up
with overlapping maps as we process the sythesized MMAP2 events that
get delivered shortly thereafter.

Use the FORK event misc flags in an internal way to signal this
situation, so we can elide the map clone when appropriate.

Signed-off-by: David S. Miller 

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index f35eb72739c0..9de8780ac8d9 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -646,10 +646,12 @@ struct perf_event_mmap_page {
  *
  *   PERF_RECORD_MISC_MMAP_DATA  - PERF_RECORD_MMAP* events
  *   PERF_RECORD_MISC_COMM_EXEC  - PERF_RECORD_COMM event
+ *   PERF_RECORD_MISC_FORK_EXEC  - PERF_RECORD_FORK event (perf internal)
  *   PERF_RECORD_MISC_SWITCH_OUT - PERF_RECORD_SWITCH* events
  */
 #define PERF_RECORD_MISC_MMAP_DATA (1 << 13)
 #define PERF_RECORD_MISC_COMM_EXEC (1 << 13)
+#define PERF_RECORD_MISC_FORK_EXEC (1 << 13)
 #define PERF_RECORD_MISC_SWITCH_OUT(1 << 13)
 /*
  * These PERF_RECORD_MISC_* flags below are safely reused
diff --git a/tools/include/uapi/linux/perf_event.h 
b/tools/include/uapi/linux/perf_event.h
index f35eb72739c0..9de8780ac8d9 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -646,10 +646,12 @@ struct perf_event_mmap_page {
  *
  *   PERF_RECORD_MISC_MMAP_DATA  - PERF_RECORD_MMAP* events
  *   PERF_RECORD_MISC_COMM_EXEC  - PERF_RECORD_COMM event
+ *   PERF_RECORD_MISC_FORK_EXEC  - PERF_RECORD_FORK event (perf internal)
  *   PERF_RECORD_MISC_SWITCH_OUT - PERF_RECORD_SWITCH* events
  */
 #define PERF_RECORD_MISC_MMAP_DATA (1 << 13)
 #define PERF_RECORD_MISC_COMM_EXEC (1 << 13)
+#define PERF_RECORD_MISC_FORK_EXEC (1 << 13)
 #define PERF_RECORD_MISC_SWITCH_OUT(1 << 13)
 /*
  * These PERF_RECORD_MISC_* flags below are safely reused
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index bc646185f8d9..e9c108a6b1c3 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -308,6 +308,7 @@ static int perf_event__synthesize_fork(struct perf_tool 
*tool,
event->fork.pid  = tgid;
event->fork.tid  = pid;
event->fork.header.type = PERF_RECORD_FORK;
+   event->fork.header.misc = PERF_RECORD_MISC_FORK_EXEC;
 
event->fork.header.size = (sizeof(event->fork) + machine->id_hdr_size);
 
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 111ae858cbcb..214b7979c4e7 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1708,6 +1708,7 @@ int machine__process_fork_event(struct machine *machine, 
union perf_event *event
struct thread *parent = machine__findnew_thread(machine,
event->fork.ppid,
event->fork.ptid);
+   bool do_maps_clone = true;
int err = 0;
 
if (dump_trace)
@@ -1737,8 +1738,11 @@ int machine__process_fork_event(struct machine *machine, 
union perf_event *event
thread = machine__findnew_thread(machine, event->fork.pid,
 event->fork.tid);
 
+   if (event->fork.header.misc & PERF_RECORD_MISC_FORK_EXEC)
+   do_maps_clone = false;
+
if (thread == NULL || parent == NULL ||
-   thread__fork(thread, parent, sample->time) < 0) {
+   thread__fork(thread, parent, sample->time, do_maps_clone) < 0) {
dump_printf("problem processing PERF_RECORD_FORK, skipping 
event.\n");
err = -1;
}
diff --git a/tools/perf/util/thread.c b/tools/perf/util/thread.c
index 2048d393ece6..54b2c9ceba9f 100644
--- a/tools/perf/util/thread.c
+++ b/tools/perf/util/thread.c
@@ -330,7 +330,8 @@ static int thread__prepare_access(struct thread *thread)
 }
 
 static int thread__clone_map_groups(struct thread *thread,
-   struct thread *parent)
+   struct thread *parent,
+   bool do_maps_clone)
 {
/* This is new thread, we share map groups for process. */
if (thread->pid_ == parent->pid_)
@@ -341,15 +342,14 @@ static int thread__clone_map_groups(struct thread *thread,
 thread->pid_, thread->tid, parent->pid_, parent->tid);
return 0;
}
-
/* But this one is new process, copy maps. */
-   if (map_groups__clone(thread, parent->mg) < 0)
+   if (do_maps_clone &&
+   map_groups__clone(thread, parent->mg) < 0)
return -ENOMEM;
-
return 0;
 }
 
-int 

Re: [PATCH 2/2] pinctrl: pinctrl-npcm7xx: Set BGPIOF_VOLATILE_REG

2018-10-30 Thread Kun Yi
On Tue, Oct 30, 2018 at 5:08 AM Linus Walleij  wrote:
>
> On Wed, Oct 17, 2018 at 11:30 PM Kun Yi  wrote:
>
> > Indicate that the pins are both controlled by the pinctrl driver and the
> > generic GPIO driver, thus GPIO driver should read the register value
> > before updating, instead of using the stored shadow register values.
> >
> > Signed-off-by: Kun Yi 
>
> This is quite a rough measure, if we instead use regmap-mmio
> we can exercise fine control over what register are volatile and
> not instead of saying that all of them or some of them are.

Thanks for your review Linus! I don't have time to rewrite using
regmap-mmio at the moment though. I have discussed with the driver
author and we will first patch the pinctrl driver by making the
pinctrl functions use the gpio-mmio accessors instead of directly reg
read/writes. When I have time I will look into your suggestion to
improve the driver.

>
> Yours,
> Linus Walleij

-- 
Regards,
Kun


Re: [PATCH 2/2] pinctrl: pinctrl-npcm7xx: Set BGPIOF_VOLATILE_REG

2018-10-30 Thread Kun Yi
On Tue, Oct 30, 2018 at 5:08 AM Linus Walleij  wrote:
>
> On Wed, Oct 17, 2018 at 11:30 PM Kun Yi  wrote:
>
> > Indicate that the pins are both controlled by the pinctrl driver and the
> > generic GPIO driver, thus GPIO driver should read the register value
> > before updating, instead of using the stored shadow register values.
> >
> > Signed-off-by: Kun Yi 
>
> This is quite a rough measure, if we instead use regmap-mmio
> we can exercise fine control over what register are volatile and
> not instead of saying that all of them or some of them are.

Thanks for your review Linus! I don't have time to rewrite using
regmap-mmio at the moment though. I have discussed with the driver
author and we will first patch the pinctrl driver by making the
pinctrl functions use the gpio-mmio accessors instead of directly reg
read/writes. When I have time I will look into your suggestion to
improve the driver.

>
> Yours,
> Linus Walleij

-- 
Regards,
Kun


[PATCH RFC] hist lookups

2018-10-30 Thread David Miller


So when a cpu is overpowered processing samples, most of the time is
spent in the histogram code.

It seems we initialize a ~262 byte structure on the stack to do every
histogram entry lookup.

This is a side effect of how the sorting code is shared with the code
that does lookups and insertions into the histogram tree(s).

I tried to change this so that lookups use a smaller key, but it gets
ugly real fast.

I don't know when I'd be able to work more on this so I'm posting this
hoping maybe someone else can move it forward, or maybe even find a
better way to do this.

The histogram code is really the limiting factor in how well perf can
handle high sample rates.

diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index f96c005..f0265e4 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -81,6 +81,12 @@ sort__thread_cmp(struct hist_entry *left, struct hist_entry 
*right)
return right->thread->tid - left->thread->tid;
 }
 
+static int64_t
+sort__thread_cmp_key(struct hist_entry *entry, struct hist_entry_cmp_key *key)
+{
+   return key->al->thread->tid - entry->thread->tid;
+}
+
 static int hist_entry__thread_snprintf(struct hist_entry *he, char *bf,
   size_t size, unsigned int width)
 {
@@ -104,6 +110,7 @@ static int hist_entry__thread_filter(struct hist_entry *he, 
int type, const void
 struct sort_entry sort_thread = {
.se_header  = "Pid:Command",
.se_cmp = sort__thread_cmp,
+   .se_cmp_key = sort__thread_cmp_key,
.se_snprintf= hist_entry__thread_snprintf,
.se_filter  = hist_entry__thread_filter,
.se_width_idx   = HISTC_THREAD,
@@ -123,6 +130,13 @@ sort__comm_cmp(struct hist_entry *left, struct hist_entry 
*right)
 }
 
 static int64_t
+sort__comm_cmp_key(struct hist_entry *entry,
+  struct hist_entry_cmp_key *key)
+{
+   return strcmp(comm__str(key->comm), comm__str(entry->comm));
+}
+
+static int64_t
 sort__comm_collapse(struct hist_entry *left, struct hist_entry *right)
 {
return strcmp(comm__str(right->comm), comm__str(left->comm));
@@ -143,6 +157,7 @@ static int hist_entry__comm_snprintf(struct hist_entry *he, 
char *bf,
 struct sort_entry sort_comm = {
.se_header  = "Command",
.se_cmp = sort__comm_cmp,
+   .se_cmp_key = sort__comm_cmp_key,
.se_collapse= sort__comm_collapse,
.se_sort= sort__comm_sort,
.se_snprintf= hist_entry__comm_snprintf,
@@ -178,6 +193,12 @@ sort__dso_cmp(struct hist_entry *left, struct hist_entry 
*right)
return _sort__dso_cmp(right->ms.map, left->ms.map);
 }
 
+static int64_t
+sort__dso_cmp_key(struct hist_entry *entry, struct hist_entry_cmp_key *key)
+{
+   return _sort__dso_cmp(key->al->map, entry->ms.map);
+}
+
 static int _hist_entry__dso_snprintf(struct map *map, char *bf,
 size_t size, unsigned int width)
 {
@@ -209,6 +230,7 @@ static int hist_entry__dso_filter(struct hist_entry *he, 
int type, const void *a
 struct sort_entry sort_dso = {
.se_header  = "Shared Object",
.se_cmp = sort__dso_cmp,
+   .se_cmp_key = sort__dso_cmp_key,
.se_snprintf= hist_entry__dso_snprintf,
.se_filter  = hist_entry__dso_filter,
.se_width_idx   = HISTC_DSO,
@@ -260,6 +282,25 @@ sort__sym_cmp(struct hist_entry *left, struct hist_entry 
*right)
 }
 
 static int64_t
+sort__sym_cmp_key(struct hist_entry *entry, struct hist_entry_cmp_key *key)
+{
+   int64_t ret;
+
+   if (!entry->ms.sym && !key->al->sym)
+   return _sort__addr_cmp(entry->ip, key->al->addr);
+
+   /*
+* comparing symbol address alone is not enough since it's a
+* relative address within a dso.
+*/
+   ret = sort__dso_cmp_key(entry, key);
+   if (ret != 0)
+   return ret;
+
+   return _sort__sym_cmp(entry->ms.sym, key->al->sym);
+}
+
+static int64_t
 sort__sym_sort(struct hist_entry *left, struct hist_entry *right)
 {
if (!left->ms.sym || !right->ms.sym)
@@ -323,6 +364,7 @@ static int hist_entry__sym_filter(struct hist_entry *he, 
int type, const void *a
 struct sort_entry sort_sym = {
.se_header  = "Symbol",
.se_cmp = sort__sym_cmp,
+   .se_cmp_key = sort__sym_cmp_key,
.se_sort= sort__sym_sort,
.se_snprintf= hist_entry__sym_snprintf,
.se_filter  = hist_entry__sym_filter,
@@ -347,6 +389,18 @@ sort__srcline_cmp(struct hist_entry *left, struct 
hist_entry *right)
return strcmp(right->srcline, left->srcline);
 }
 
+static int64_t
+sort__srcline_cmp_key(struct hist_entry *entry, struct hist_entry_cmp_key *key)
+{
+   if (!entry->srcline)
+   entry->srcline = hist_entry__srcline(entry);
+   if (!key->al->srcline)
+   key->al->srcline =
+   map__srcline(key->al->map, 

[PATCH RFC] hist lookups

2018-10-30 Thread David Miller


So when a cpu is overpowered processing samples, most of the time is
spent in the histogram code.

It seems we initialize a ~262 byte structure on the stack to do every
histogram entry lookup.

This is a side effect of how the sorting code is shared with the code
that does lookups and insertions into the histogram tree(s).

I tried to change this so that lookups use a smaller key, but it gets
ugly real fast.

I don't know when I'd be able to work more on this so I'm posting this
hoping maybe someone else can move it forward, or maybe even find a
better way to do this.

The histogram code is really the limiting factor in how well perf can
handle high sample rates.

diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index f96c005..f0265e4 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -81,6 +81,12 @@ sort__thread_cmp(struct hist_entry *left, struct hist_entry 
*right)
return right->thread->tid - left->thread->tid;
 }
 
+static int64_t
+sort__thread_cmp_key(struct hist_entry *entry, struct hist_entry_cmp_key *key)
+{
+   return key->al->thread->tid - entry->thread->tid;
+}
+
 static int hist_entry__thread_snprintf(struct hist_entry *he, char *bf,
   size_t size, unsigned int width)
 {
@@ -104,6 +110,7 @@ static int hist_entry__thread_filter(struct hist_entry *he, 
int type, const void
 struct sort_entry sort_thread = {
.se_header  = "Pid:Command",
.se_cmp = sort__thread_cmp,
+   .se_cmp_key = sort__thread_cmp_key,
.se_snprintf= hist_entry__thread_snprintf,
.se_filter  = hist_entry__thread_filter,
.se_width_idx   = HISTC_THREAD,
@@ -123,6 +130,13 @@ sort__comm_cmp(struct hist_entry *left, struct hist_entry 
*right)
 }
 
 static int64_t
+sort__comm_cmp_key(struct hist_entry *entry,
+  struct hist_entry_cmp_key *key)
+{
+   return strcmp(comm__str(key->comm), comm__str(entry->comm));
+}
+
+static int64_t
 sort__comm_collapse(struct hist_entry *left, struct hist_entry *right)
 {
return strcmp(comm__str(right->comm), comm__str(left->comm));
@@ -143,6 +157,7 @@ static int hist_entry__comm_snprintf(struct hist_entry *he, 
char *bf,
 struct sort_entry sort_comm = {
.se_header  = "Command",
.se_cmp = sort__comm_cmp,
+   .se_cmp_key = sort__comm_cmp_key,
.se_collapse= sort__comm_collapse,
.se_sort= sort__comm_sort,
.se_snprintf= hist_entry__comm_snprintf,
@@ -178,6 +193,12 @@ sort__dso_cmp(struct hist_entry *left, struct hist_entry 
*right)
return _sort__dso_cmp(right->ms.map, left->ms.map);
 }
 
+static int64_t
+sort__dso_cmp_key(struct hist_entry *entry, struct hist_entry_cmp_key *key)
+{
+   return _sort__dso_cmp(key->al->map, entry->ms.map);
+}
+
 static int _hist_entry__dso_snprintf(struct map *map, char *bf,
 size_t size, unsigned int width)
 {
@@ -209,6 +230,7 @@ static int hist_entry__dso_filter(struct hist_entry *he, 
int type, const void *a
 struct sort_entry sort_dso = {
.se_header  = "Shared Object",
.se_cmp = sort__dso_cmp,
+   .se_cmp_key = sort__dso_cmp_key,
.se_snprintf= hist_entry__dso_snprintf,
.se_filter  = hist_entry__dso_filter,
.se_width_idx   = HISTC_DSO,
@@ -260,6 +282,25 @@ sort__sym_cmp(struct hist_entry *left, struct hist_entry 
*right)
 }
 
 static int64_t
+sort__sym_cmp_key(struct hist_entry *entry, struct hist_entry_cmp_key *key)
+{
+   int64_t ret;
+
+   if (!entry->ms.sym && !key->al->sym)
+   return _sort__addr_cmp(entry->ip, key->al->addr);
+
+   /*
+* comparing symbol address alone is not enough since it's a
+* relative address within a dso.
+*/
+   ret = sort__dso_cmp_key(entry, key);
+   if (ret != 0)
+   return ret;
+
+   return _sort__sym_cmp(entry->ms.sym, key->al->sym);
+}
+
+static int64_t
 sort__sym_sort(struct hist_entry *left, struct hist_entry *right)
 {
if (!left->ms.sym || !right->ms.sym)
@@ -323,6 +364,7 @@ static int hist_entry__sym_filter(struct hist_entry *he, 
int type, const void *a
 struct sort_entry sort_sym = {
.se_header  = "Symbol",
.se_cmp = sort__sym_cmp,
+   .se_cmp_key = sort__sym_cmp_key,
.se_sort= sort__sym_sort,
.se_snprintf= hist_entry__sym_snprintf,
.se_filter  = hist_entry__sym_filter,
@@ -347,6 +389,18 @@ sort__srcline_cmp(struct hist_entry *left, struct 
hist_entry *right)
return strcmp(right->srcline, left->srcline);
 }
 
+static int64_t
+sort__srcline_cmp_key(struct hist_entry *entry, struct hist_entry_cmp_key *key)
+{
+   if (!entry->srcline)
+   entry->srcline = hist_entry__srcline(entry);
+   if (!key->al->srcline)
+   key->al->srcline =
+   map__srcline(key->al->map, 

Re: [RFC PATCH] Implement /proc/pid/kill

2018-10-30 Thread Eric W. Biederman
Christian Brauner  writes:

> On Tue, Oct 30, 2018 at 12:12 PM Daniel Colascione  wrote:
>>
>> On Tue, Oct 30, 2018 at 11:04 AM, Christian Brauner
>>  wrote:
>> > On Tue, Oct 30, 2018 at 11:48 AM Daniel Colascione  
>> > wrote:
>> >>
>> >> Why not?
>> >>
>> >> Does your proposed API allow for a race-free pkill, with arbitrary
>> >> selection criteria? This capability is a good litmus test for fixing
>> >> the long-standing Unix process API issues.
>> >
>> > You'd have a handle on the process with an fd so yes, it would be.
>>
>> Thanks. That's good to hear.
>>
>> Any idea on the timetable for this proposal? I'm open to lots of
>> alternative technical approaches, but I don't want this capability to
>> languish for a long time.
>
> Latest end of year likely sooner depending on the feedback I'm getting
> during LPC.

Frankly.  If you want a race free fork variant probably the easiest
thing to do is to return a open copy of the proc directory entry.  Wrapped
in a bind mount so that you can't see beyond that directory in proc.

My only concern would be if a vfsmount per process would be too heavy for such 
a use.

Eric





Re: [RFC PATCH] Implement /proc/pid/kill

2018-10-30 Thread Eric W. Biederman
Christian Brauner  writes:

> On Tue, Oct 30, 2018 at 12:12 PM Daniel Colascione  wrote:
>>
>> On Tue, Oct 30, 2018 at 11:04 AM, Christian Brauner
>>  wrote:
>> > On Tue, Oct 30, 2018 at 11:48 AM Daniel Colascione  
>> > wrote:
>> >>
>> >> Why not?
>> >>
>> >> Does your proposed API allow for a race-free pkill, with arbitrary
>> >> selection criteria? This capability is a good litmus test for fixing
>> >> the long-standing Unix process API issues.
>> >
>> > You'd have a handle on the process with an fd so yes, it would be.
>>
>> Thanks. That's good to hear.
>>
>> Any idea on the timetable for this proposal? I'm open to lots of
>> alternative technical approaches, but I don't want this capability to
>> languish for a long time.
>
> Latest end of year likely sooner depending on the feedback I'm getting
> during LPC.

Frankly.  If you want a race free fork variant probably the easiest
thing to do is to return a open copy of the proc directory entry.  Wrapped
in a bind mount so that you can't see beyond that directory in proc.

My only concern would be if a vfsmount per process would be too heavy for such 
a use.

Eric





Re: [PATCH 1/3] retpolines: Only enable retpoline when compiler support it

2018-10-30 Thread Zhenzhong Duan

On 2018/10/30 16:32, Peter Zijlstra wrote:

On Mon, Oct 29, 2018 at 11:55:04PM -0700, Zhenzhong Duan wrote:

Since retpoline capable compilers are widely available, make
CONFIG_RETPOLINE hard depend on it.

Change KBUILD to use CONFIG_RETPOLINE_SUPPORT to avoid conflict with
CONFIG_RETPOLINE which is used by kernel.

With all that stuff, the check of RETPOLINE is changed to
CONFIG_RETPOLINE.


So what happens when we select CONFIG_RETPOLINE but do not have
RETPOLINE_SUPPORT ? From a quick reading we'll silently build a
!retpoline kernel. I would expect a build failure.


CONFIG_RETPOLINE is only defined when CONFIG_RETPOLINE_SUPPORT is 
selected. See below chunk.


--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -221,9 +221,10 @@ KBUILD_CFLAGS += -Wno-sign-compare
 KBUILD_CFLAGS += -fno-asynchronous-unwind-tables

 # Avoid indirect branches in kernel to deal with Spectre
-ifdef CONFIG_RETPOLINE
+ifdef CONFIG_RETPOLINE_SUPPORT
 ifneq ($(RETPOLINE_CFLAGS),)
-  KBUILD_CFLAGS += $(RETPOLINE_CFLAGS) -DRETPOLINE
+  KBUILD_CFLAGS += $(RETPOLINE_CFLAGS) -DCONFIG_RETPOLINE
+  KBUILD_AFLAGS += -DCONFIG_RETPOLINE
 endif
 endif


Thanks
Zhenzhong


Re: [PATCH 1/3] retpolines: Only enable retpoline when compiler support it

2018-10-30 Thread Zhenzhong Duan

On 2018/10/30 16:32, Peter Zijlstra wrote:

On Mon, Oct 29, 2018 at 11:55:04PM -0700, Zhenzhong Duan wrote:

Since retpoline capable compilers are widely available, make
CONFIG_RETPOLINE hard depend on it.

Change KBUILD to use CONFIG_RETPOLINE_SUPPORT to avoid conflict with
CONFIG_RETPOLINE which is used by kernel.

With all that stuff, the check of RETPOLINE is changed to
CONFIG_RETPOLINE.


So what happens when we select CONFIG_RETPOLINE but do not have
RETPOLINE_SUPPORT ? From a quick reading we'll silently build a
!retpoline kernel. I would expect a build failure.


CONFIG_RETPOLINE is only defined when CONFIG_RETPOLINE_SUPPORT is 
selected. See below chunk.


--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -221,9 +221,10 @@ KBUILD_CFLAGS += -Wno-sign-compare
 KBUILD_CFLAGS += -fno-asynchronous-unwind-tables

 # Avoid indirect branches in kernel to deal with Spectre
-ifdef CONFIG_RETPOLINE
+ifdef CONFIG_RETPOLINE_SUPPORT
 ifneq ($(RETPOLINE_CFLAGS),)
-  KBUILD_CFLAGS += $(RETPOLINE_CFLAGS) -DRETPOLINE
+  KBUILD_CFLAGS += $(RETPOLINE_CFLAGS) -DCONFIG_RETPOLINE
+  KBUILD_AFLAGS += -DCONFIG_RETPOLINE
 endif
 endif


Thanks
Zhenzhong


[PATCH] arm64: dts: nxp: ls208xa: add more thermal zone support

2018-10-30 Thread Yuantian Tang
Ls208xa has several thermal sensors. Add all the sensor id to dts
to enable them.

To make the dts cleaner, re-organize the nodes to split out the
common part so that it can be shared with other SoCs.

Signed-off-by: Yuantian Tang 
---
 arch/arm64/boot/dts/freescale/fsl-ls2080a.dtsi  |8 +-
 arch/arm64/boot/dts/freescale/fsl-ls2088a.dtsi  |8 +-
 arch/arm64/boot/dts/freescale/fsl-ls208xa.dtsi  |   83 +++-
 arch/arm64/boot/dts/freescale/fsl-tmu-map1.dtsi |   99 +
 arch/arm64/boot/dts/freescale/fsl-tmu-map2.dtsi |   99 +
 arch/arm64/boot/dts/freescale/fsl-tmu-map3.dtsi |   99 +
 arch/arm64/boot/dts/freescale/fsl-tmu.dtsi  |  251 +++
 7 files changed, 591 insertions(+), 56 deletions(-)
 create mode 100644 arch/arm64/boot/dts/freescale/fsl-tmu-map1.dtsi
 create mode 100644 arch/arm64/boot/dts/freescale/fsl-tmu-map2.dtsi
 create mode 100644 arch/arm64/boot/dts/freescale/fsl-tmu-map3.dtsi
 create mode 100644 arch/arm64/boot/dts/freescale/fsl-tmu.dtsi

diff --git a/arch/arm64/boot/dts/freescale/fsl-ls2080a.dtsi 
b/arch/arm64/boot/dts/freescale/fsl-ls2080a.dtsi
index f9c1d30..8f9788c 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls2080a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls2080a.dtsi
@@ -12,7 +12,7 @@
 #include "fsl-ls208xa.dtsi"
 
  {
-   cpu0: cpu@0 {
+   cooling_map0: cpu0: cpu@0 {
device_type = "cpu";
compatible = "arm,cortex-a57";
reg = <0x0>;
@@ -32,7 +32,7 @@
#cooling-cells = <2>;
};
 
-   cpu2: cpu@100 {
+   cooling_map1: cpu2: cpu@100 {
device_type = "cpu";
compatible = "arm,cortex-a57";
reg = <0x100>;
@@ -52,7 +52,7 @@
#cooling-cells = <2>;
};
 
-   cpu4: cpu@200 {
+   cooling_map2: cpu4: cpu@200 {
device_type = "cpu";
compatible = "arm,cortex-a57";
reg = <0x200>;
@@ -72,7 +72,7 @@
#cooling-cells = <2>;
};
 
-   cpu6: cpu@300 {
+   cooling_map3: cpu6: cpu@300 {
device_type = "cpu";
compatible = "arm,cortex-a57";
reg = <0x300>;
diff --git a/arch/arm64/boot/dts/freescale/fsl-ls2088a.dtsi 
b/arch/arm64/boot/dts/freescale/fsl-ls2088a.dtsi
index 7c882da..013fe16 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls2088a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls2088a.dtsi
@@ -12,7 +12,7 @@
 #include "fsl-ls208xa.dtsi"
 
  {
-   cpu0: cpu@0 {
+   cooling_map0: cpu0: cpu@0 {
device_type = "cpu";
compatible = "arm,cortex-a72";
reg = <0x0>;
@@ -32,7 +32,7 @@
#cooling-cells = <2>;
};
 
-   cpu2: cpu@100 {
+   cooling_map1: cpu2: cpu@100 {
device_type = "cpu";
compatible = "arm,cortex-a72";
reg = <0x100>;
@@ -52,7 +52,7 @@
#cooling-cells = <2>;
};
 
-   cpu4: cpu@200 {
+   cooling_map2: cpu4: cpu@200 {
device_type = "cpu";
compatible = "arm,cortex-a72";
reg = <0x200>;
@@ -72,7 +72,7 @@
#cooling-cells = <2>;
};
 
-   cpu6: cpu@300 {
+   cooling_map3: cpu6: cpu@300 {
device_type = "cpu";
compatible = "arm,cortex-a72";
reg = <0x300>;
diff --git a/arch/arm64/boot/dts/freescale/fsl-ls208xa.dtsi 
b/arch/arm64/boot/dts/freescale/fsl-ls208xa.dtsi
index 8cb78dd..4102317 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls208xa.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls208xa.dtsi
@@ -75,54 +75,7 @@
mask = <0x2>;
};
 
-   thermal-zones {
-   cpu_thermal: cpu-thermal {
-   polling-delay-passive = <1000>;
-   polling-delay = <5000>;
-
-   thermal-sensors = < 4>;
-
-   trips {
-   cpu_alert: cpu-alert {
-   temperature = <75000>;
-   hysteresis = <2000>;
-   type = "passive";
-   };
-   cpu_crit: cpu-crit {
-   temperature = <85000>;
-   hysteresis = <2000>;
-   type = "critical";
-   };
-   };
-
-   cooling-maps {
-   map0 {
-   trip = <_alert>;
-   cooling-device =
-   < THERMAL_NO_LIMIT
-   THERMAL_NO_LIMIT>;
-   };
-   map1 {
- 

[PATCH] arm64: dts: nxp: ls208xa: add more thermal zone support

2018-10-30 Thread Yuantian Tang
Ls208xa has several thermal sensors. Add all the sensor id to dts
to enable them.

To make the dts cleaner, re-organize the nodes to split out the
common part so that it can be shared with other SoCs.

Signed-off-by: Yuantian Tang 
---
 arch/arm64/boot/dts/freescale/fsl-ls2080a.dtsi  |8 +-
 arch/arm64/boot/dts/freescale/fsl-ls2088a.dtsi  |8 +-
 arch/arm64/boot/dts/freescale/fsl-ls208xa.dtsi  |   83 +++-
 arch/arm64/boot/dts/freescale/fsl-tmu-map1.dtsi |   99 +
 arch/arm64/boot/dts/freescale/fsl-tmu-map2.dtsi |   99 +
 arch/arm64/boot/dts/freescale/fsl-tmu-map3.dtsi |   99 +
 arch/arm64/boot/dts/freescale/fsl-tmu.dtsi  |  251 +++
 7 files changed, 591 insertions(+), 56 deletions(-)
 create mode 100644 arch/arm64/boot/dts/freescale/fsl-tmu-map1.dtsi
 create mode 100644 arch/arm64/boot/dts/freescale/fsl-tmu-map2.dtsi
 create mode 100644 arch/arm64/boot/dts/freescale/fsl-tmu-map3.dtsi
 create mode 100644 arch/arm64/boot/dts/freescale/fsl-tmu.dtsi

diff --git a/arch/arm64/boot/dts/freescale/fsl-ls2080a.dtsi 
b/arch/arm64/boot/dts/freescale/fsl-ls2080a.dtsi
index f9c1d30..8f9788c 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls2080a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls2080a.dtsi
@@ -12,7 +12,7 @@
 #include "fsl-ls208xa.dtsi"
 
  {
-   cpu0: cpu@0 {
+   cooling_map0: cpu0: cpu@0 {
device_type = "cpu";
compatible = "arm,cortex-a57";
reg = <0x0>;
@@ -32,7 +32,7 @@
#cooling-cells = <2>;
};
 
-   cpu2: cpu@100 {
+   cooling_map1: cpu2: cpu@100 {
device_type = "cpu";
compatible = "arm,cortex-a57";
reg = <0x100>;
@@ -52,7 +52,7 @@
#cooling-cells = <2>;
};
 
-   cpu4: cpu@200 {
+   cooling_map2: cpu4: cpu@200 {
device_type = "cpu";
compatible = "arm,cortex-a57";
reg = <0x200>;
@@ -72,7 +72,7 @@
#cooling-cells = <2>;
};
 
-   cpu6: cpu@300 {
+   cooling_map3: cpu6: cpu@300 {
device_type = "cpu";
compatible = "arm,cortex-a57";
reg = <0x300>;
diff --git a/arch/arm64/boot/dts/freescale/fsl-ls2088a.dtsi 
b/arch/arm64/boot/dts/freescale/fsl-ls2088a.dtsi
index 7c882da..013fe16 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls2088a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls2088a.dtsi
@@ -12,7 +12,7 @@
 #include "fsl-ls208xa.dtsi"
 
  {
-   cpu0: cpu@0 {
+   cooling_map0: cpu0: cpu@0 {
device_type = "cpu";
compatible = "arm,cortex-a72";
reg = <0x0>;
@@ -32,7 +32,7 @@
#cooling-cells = <2>;
};
 
-   cpu2: cpu@100 {
+   cooling_map1: cpu2: cpu@100 {
device_type = "cpu";
compatible = "arm,cortex-a72";
reg = <0x100>;
@@ -52,7 +52,7 @@
#cooling-cells = <2>;
};
 
-   cpu4: cpu@200 {
+   cooling_map2: cpu4: cpu@200 {
device_type = "cpu";
compatible = "arm,cortex-a72";
reg = <0x200>;
@@ -72,7 +72,7 @@
#cooling-cells = <2>;
};
 
-   cpu6: cpu@300 {
+   cooling_map3: cpu6: cpu@300 {
device_type = "cpu";
compatible = "arm,cortex-a72";
reg = <0x300>;
diff --git a/arch/arm64/boot/dts/freescale/fsl-ls208xa.dtsi 
b/arch/arm64/boot/dts/freescale/fsl-ls208xa.dtsi
index 8cb78dd..4102317 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls208xa.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls208xa.dtsi
@@ -75,54 +75,7 @@
mask = <0x2>;
};
 
-   thermal-zones {
-   cpu_thermal: cpu-thermal {
-   polling-delay-passive = <1000>;
-   polling-delay = <5000>;
-
-   thermal-sensors = < 4>;
-
-   trips {
-   cpu_alert: cpu-alert {
-   temperature = <75000>;
-   hysteresis = <2000>;
-   type = "passive";
-   };
-   cpu_crit: cpu-crit {
-   temperature = <85000>;
-   hysteresis = <2000>;
-   type = "critical";
-   };
-   };
-
-   cooling-maps {
-   map0 {
-   trip = <_alert>;
-   cooling-device =
-   < THERMAL_NO_LIMIT
-   THERMAL_NO_LIMIT>;
-   };
-   map1 {
- 

Re: [RFC PATCH] Implement /proc/pid/kill

2018-10-30 Thread Eric W. Biederman
Aleksa Sarai  writes:

> On 2018-10-29, Daniel Colascione  wrote:
>> Add a simple proc-based kill interface. To use /proc/pid/kill, just
>> write the signal number in base-10 ASCII to the kill file of the
>> process to be killed: for example, 'echo 9 > /proc/$$/kill'.
>> 
>> Semantically, /proc/pid/kill works like kill(2), except that the
>> process ID comes from the proc filesystem context instead of from an
>> explicit system call parameter. This way, it's possible to avoid races
>> between inspecting some aspect of a process and that process's PID
>> being reused for some other process.
>
> (Aside from any UX concerns other folks might have.)
>
> I think it would be a good idea to (at least temporarily) restrict this
> so that only processes that are in the same PID namespace as the /proc
> being resolved through may use this interface. Otherwise you might have
> cases where partial container breakouts can start sending signals to
> PIDs they wouldn't normally be able to address.

No.  That is the container managers job.  If you have the wrong proc
mounted in your container or otherwise allow access to it that is the
fault of the application that set up the container.

The pid namespace limits visibility.  If something becomes visible and
you have permissions over it, it is perfectly reasonable for you to
execute those permissions.

Eric


Re: [RFC PATCH] Implement /proc/pid/kill

2018-10-30 Thread Eric W. Biederman
Aleksa Sarai  writes:

> On 2018-10-29, Daniel Colascione  wrote:
>> Add a simple proc-based kill interface. To use /proc/pid/kill, just
>> write the signal number in base-10 ASCII to the kill file of the
>> process to be killed: for example, 'echo 9 > /proc/$$/kill'.
>> 
>> Semantically, /proc/pid/kill works like kill(2), except that the
>> process ID comes from the proc filesystem context instead of from an
>> explicit system call parameter. This way, it's possible to avoid races
>> between inspecting some aspect of a process and that process's PID
>> being reused for some other process.
>
> (Aside from any UX concerns other folks might have.)
>
> I think it would be a good idea to (at least temporarily) restrict this
> so that only processes that are in the same PID namespace as the /proc
> being resolved through may use this interface. Otherwise you might have
> cases where partial container breakouts can start sending signals to
> PIDs they wouldn't normally be able to address.

No.  That is the container managers job.  If you have the wrong proc
mounted in your container or otherwise allow access to it that is the
fault of the application that set up the container.

The pid namespace limits visibility.  If something becomes visible and
you have permissions over it, it is perfectly reasonable for you to
execute those permissions.

Eric


Re: [RFC PATCH] Implement /proc/pid/kill

2018-10-30 Thread Eric W. Biederman
Daniel Colascione  writes:

> Add a simple proc-based kill interface. To use /proc/pid/kill, just
> write the signal number in base-10 ASCII to the kill file of the
> process to be killed: for example, 'echo 9 > /proc/$$/kill'.
>
> Semantically, /proc/pid/kill works like kill(2), except that the
> process ID comes from the proc filesystem context instead of from an
> explicit system call parameter. This way, it's possible to avoid races
> between inspecting some aspect of a process and that process's PID
> being reused for some other process.
>
> With /proc/pid/kill, it's possible to write a proper race-free and
> safe pkill(1). An approximation follows. A real program might use
> openat(2), having opened a process's /proc/pid directory explicitly,
> with the directory file descriptor serving as a sort of "process
> handle".
>
> #!/bin/bash
> set -euo pipefail
> pat=$1
> for proc_status in /proc/*/status; do (
> cd $(dirname $proc_status)
> readarray proc_argv -d'' < cmdline
> if ((${#proc_argv[@]} > 0)) &&
>[[ ${proc_argv[0]} = *$pat* ]];
> then
> echo 15 > kill
> fi
> ) || true; done
>

In general this looks good.

Unfortunately the permission checks are are subject to a serious
problem.  Even if I don't have permission to kill a process I quite
likely will be allowed to open the file.

Then I just need to find a setuid or setcap executable will write to
stdout or stderr a number.  Then I have killed something I should not
have the privileges to kill.

At a bare minimum you need to perform the permission check using the
credentials of the opener of the file.Which means refactoring
kill_pid so that you can perform the permission check for killing the
application during open.  Given that process credentials can change
completely during exec you also need to rule out a change in process
credentials making it so that the original opener of the file would not
be able to kill the process as it is now.

But overall this looks quite reasaonble.

Eric

> Signed-off-by: Daniel Colascione 
> ---
>  fs/proc/base.c | 39 +++
>  1 file changed, 39 insertions(+)
>
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index 7e9f07bf260d..923d62b21e67 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -205,6 +205,44 @@ static int proc_root_link(struct dentry *dentry, struct 
> path *path)
>   return result;
>  }
>  
> +static ssize_t proc_pid_kill_write(struct file *file,
> +const char __user *buf,
> +size_t count, loff_t *ppos)
> +{
> + ssize_t res;
> + int sig;
> + char buffer[4];
> +
> + res = -EINVAL;
> + if (*ppos != 0)
> + goto out;
> +
> + res = -EINVAL;
> + if (count > sizeof(buffer) - 1)
> + goto out;
> +
> + res = -EFAULT;
> + if (copy_from_user(buffer, buf, count))
> + goto out;
> +
> + buffer[count] = '\0';
> + res = kstrtoint(strstrip(buffer), 10, );
> + if (res)
> + goto out;
> +
> + res = kill_pid(proc_pid(file_inode(file)), sig, 0);
> + if (res)
> + goto out;
> + res = count;
> +out:
> + return res;
> +
> +}
> +
> +static const struct file_operations proc_pid_kill_ops = {
> + .write  = proc_pid_kill_write,
> +};
> +
>  static ssize_t get_mm_cmdline(struct mm_struct *mm, char __user *buf,
> size_t count, loff_t *ppos)
>  {
> @@ -2935,6 +2973,7 @@ static const struct pid_entry tgid_base_stuff[] = {
>  #ifdef CONFIG_HAVE_ARCH_TRACEHOOK
>   ONE("syscall",S_IRUSR, proc_pid_syscall),
>  #endif
> + REG("kill",   S_IRUGO | S_IWUGO, proc_pid_kill_ops),
>   REG("cmdline",S_IRUGO, proc_pid_cmdline_ops),
>   ONE("stat",   S_IRUGO, proc_tgid_stat),
>   ONE("statm",  S_IRUGO, proc_pid_statm),


Re: [RFC PATCH] Implement /proc/pid/kill

2018-10-30 Thread Eric W. Biederman
Daniel Colascione  writes:

> Add a simple proc-based kill interface. To use /proc/pid/kill, just
> write the signal number in base-10 ASCII to the kill file of the
> process to be killed: for example, 'echo 9 > /proc/$$/kill'.
>
> Semantically, /proc/pid/kill works like kill(2), except that the
> process ID comes from the proc filesystem context instead of from an
> explicit system call parameter. This way, it's possible to avoid races
> between inspecting some aspect of a process and that process's PID
> being reused for some other process.
>
> With /proc/pid/kill, it's possible to write a proper race-free and
> safe pkill(1). An approximation follows. A real program might use
> openat(2), having opened a process's /proc/pid directory explicitly,
> with the directory file descriptor serving as a sort of "process
> handle".
>
> #!/bin/bash
> set -euo pipefail
> pat=$1
> for proc_status in /proc/*/status; do (
> cd $(dirname $proc_status)
> readarray proc_argv -d'' < cmdline
> if ((${#proc_argv[@]} > 0)) &&
>[[ ${proc_argv[0]} = *$pat* ]];
> then
> echo 15 > kill
> fi
> ) || true; done
>

In general this looks good.

Unfortunately the permission checks are are subject to a serious
problem.  Even if I don't have permission to kill a process I quite
likely will be allowed to open the file.

Then I just need to find a setuid or setcap executable will write to
stdout or stderr a number.  Then I have killed something I should not
have the privileges to kill.

At a bare minimum you need to perform the permission check using the
credentials of the opener of the file.Which means refactoring
kill_pid so that you can perform the permission check for killing the
application during open.  Given that process credentials can change
completely during exec you also need to rule out a change in process
credentials making it so that the original opener of the file would not
be able to kill the process as it is now.

But overall this looks quite reasaonble.

Eric

> Signed-off-by: Daniel Colascione 
> ---
>  fs/proc/base.c | 39 +++
>  1 file changed, 39 insertions(+)
>
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index 7e9f07bf260d..923d62b21e67 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -205,6 +205,44 @@ static int proc_root_link(struct dentry *dentry, struct 
> path *path)
>   return result;
>  }
>  
> +static ssize_t proc_pid_kill_write(struct file *file,
> +const char __user *buf,
> +size_t count, loff_t *ppos)
> +{
> + ssize_t res;
> + int sig;
> + char buffer[4];
> +
> + res = -EINVAL;
> + if (*ppos != 0)
> + goto out;
> +
> + res = -EINVAL;
> + if (count > sizeof(buffer) - 1)
> + goto out;
> +
> + res = -EFAULT;
> + if (copy_from_user(buffer, buf, count))
> + goto out;
> +
> + buffer[count] = '\0';
> + res = kstrtoint(strstrip(buffer), 10, );
> + if (res)
> + goto out;
> +
> + res = kill_pid(proc_pid(file_inode(file)), sig, 0);
> + if (res)
> + goto out;
> + res = count;
> +out:
> + return res;
> +
> +}
> +
> +static const struct file_operations proc_pid_kill_ops = {
> + .write  = proc_pid_kill_write,
> +};
> +
>  static ssize_t get_mm_cmdline(struct mm_struct *mm, char __user *buf,
> size_t count, loff_t *ppos)
>  {
> @@ -2935,6 +2973,7 @@ static const struct pid_entry tgid_base_stuff[] = {
>  #ifdef CONFIG_HAVE_ARCH_TRACEHOOK
>   ONE("syscall",S_IRUSR, proc_pid_syscall),
>  #endif
> + REG("kill",   S_IRUGO | S_IWUGO, proc_pid_kill_ops),
>   REG("cmdline",S_IRUGO, proc_pid_cmdline_ops),
>   ONE("stat",   S_IRUGO, proc_tgid_stat),
>   ONE("statm",  S_IRUGO, proc_pid_statm),


ERROR: "numa_slit" [drivers/nvme/host/nvme-core.ko] undefined!

2018-10-30 Thread kbuild test robot
Hi Matias,

FYI, the error/warning still remains.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
master
head:   310c7585e8300ddc46211df0757c11e4299ec482
commit: 73569e11032fc5a9b314b6351632cfca7793afd5 lightnvm: remove dependencies 
on BLK_DEV_NVME and PCI
date:   3 weeks ago
config: ia64-allmodconfig (attached as .config)
compiler: ia64-linux-gcc (GCC) 8.1.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
git checkout 73569e11032fc5a9b314b6351632cfca7793afd5
# save the attached .config to linux build tree
GCC_VERSION=8.1.0 make.cross ARCH=ia64 

All errors (new ones prefixed by >>):

   ERROR: "ia64_delay_loop" [drivers/spi/spi-thunderx.ko] undefined!
>> ERROR: "numa_slit" [drivers/nvme/host/nvme-core.ko] undefined!
   ERROR: "__sw_hweight8" [drivers/net/wireless/mediatek/mt76/mt76.ko] 
undefined!
   ERROR: "ia64_delay_loop" [drivers/net/phy/mdio-cavium.ko] undefined!

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


ERROR: "numa_slit" [drivers/nvme/host/nvme-core.ko] undefined!

2018-10-30 Thread kbuild test robot
Hi Matias,

FYI, the error/warning still remains.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
master
head:   310c7585e8300ddc46211df0757c11e4299ec482
commit: 73569e11032fc5a9b314b6351632cfca7793afd5 lightnvm: remove dependencies 
on BLK_DEV_NVME and PCI
date:   3 weeks ago
config: ia64-allmodconfig (attached as .config)
compiler: ia64-linux-gcc (GCC) 8.1.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
git checkout 73569e11032fc5a9b314b6351632cfca7793afd5
# save the attached .config to linux build tree
GCC_VERSION=8.1.0 make.cross ARCH=ia64 

All errors (new ones prefixed by >>):

   ERROR: "ia64_delay_loop" [drivers/spi/spi-thunderx.ko] undefined!
>> ERROR: "numa_slit" [drivers/nvme/host/nvme-core.ko] undefined!
   ERROR: "__sw_hweight8" [drivers/net/wireless/mediatek/mt76/mt76.ko] 
undefined!
   ERROR: "ia64_delay_loop" [drivers/net/phy/mdio-cavium.ko] undefined!

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


Re: [PATCH v2] proc: use ns_capable instead of capable for timerslack_ns

2018-10-30 Thread Eric W. Biederman
Benjamin Gordon  writes:

> Access to timerslack_ns is controlled by a process having CAP_SYS_NICE
> in its effective capability set, but the current check looks in the root
> namespace instead of the process' user namespace.  Since a process is
> allowed to do other activities controlled by CAP_SYS_NICE inside a
> namespace, it should also be able to adjust timerslack_ns.

Acked-by: "Eric W. Biederman" 

I don't see any fundamental probess with how the processes user
namespace is being accessed.   You can race with setns
and that may result in a descendent user namespace of the current
user namespace being set.  But if you have permissions in the parent
user namespace you will have permissions over a child user namespace.
So the race can't effect the outcome of the ns_capable test.

That and while __task_cred(p) may change it is guaranteed there is a
valid one until __put_task_struct which only happens when a process has
a zero refcount.  Which the success of get_proc_task in before these
checks already ensures is not true.

So from my perspective this looks like a reasonable change.

I don't know how this looks from people who understand the timer bits
and what timerslack does. I suspect it is reasonable as there is no
permission check for changing yourself.

Eric

> Signed-off-by: Benjamin Gordon 
> Cc: John Stultz 
> Cc: "Eric W. Biederman" 
> Cc: Kees Cook 
> Cc: "Serge E. Hallyn" 
> Cc: Thomas Gleixner 
> Cc: Arjan van de Ven 
> Cc: Oren Laadan 
> Cc: Ruchi Kandoi 
> Cc: Rom Lemarchand 
> Cc: Todd Kjos 
> Cc: Colin Cross 
> Cc: Nick Kralevich 
> Cc: Dmitry Shmidt 
> Cc: Elliott Hughes 
> Cc: Android Kernel Team 
> Cc: Andrew Morton 
> ---
>
> Changes from v1:
>   - Use the namespace of the target process instead of the file opener.
> Didn't carry over John Stultz' Acked-by since the changes aren't
> cosmetic.
>
>  fs/proc/base.c | 12 +---
>  1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index c78d8da09b52c..bdc093ba81dd3 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -2385,10 +2385,13 @@ static ssize_t timerslack_ns_write(struct file *file, 
> const char __user *buf,
>   return -ESRCH;
>  
>   if (p != current) {
> - if (!capable(CAP_SYS_NICE)) {
> + rcu_read_lock();
> + if (!ns_capable(__task_cred(p)->user_ns, CAP_SYS_NICE)) {
> + rcu_read_unlock();
>   count = -EPERM;
>   goto out;
>   }
> + rcu_read_unlock();
>  
>   err = security_task_setscheduler(p);
>   if (err) {
> @@ -2421,11 +2424,14 @@ static int timerslack_ns_show(struct seq_file *m, 
> void *v)
>   return -ESRCH;
>  
>   if (p != current) {
> -
> - if (!capable(CAP_SYS_NICE)) {
> + rcu_read_lock();
> + if (!ns_capable(__task_cred(p)->user_ns, CAP_SYS_NICE)) {
> + rcu_read_unlock();
>   err = -EPERM;
>   goto out;
>   }
> + rcu_read_unlock();
> +
>   err = security_task_getscheduler(p);
>   if (err)
>   goto out;


Re: [PATCH v2] proc: use ns_capable instead of capable for timerslack_ns

2018-10-30 Thread Eric W. Biederman
Benjamin Gordon  writes:

> Access to timerslack_ns is controlled by a process having CAP_SYS_NICE
> in its effective capability set, but the current check looks in the root
> namespace instead of the process' user namespace.  Since a process is
> allowed to do other activities controlled by CAP_SYS_NICE inside a
> namespace, it should also be able to adjust timerslack_ns.

Acked-by: "Eric W. Biederman" 

I don't see any fundamental probess with how the processes user
namespace is being accessed.   You can race with setns
and that may result in a descendent user namespace of the current
user namespace being set.  But if you have permissions in the parent
user namespace you will have permissions over a child user namespace.
So the race can't effect the outcome of the ns_capable test.

That and while __task_cred(p) may change it is guaranteed there is a
valid one until __put_task_struct which only happens when a process has
a zero refcount.  Which the success of get_proc_task in before these
checks already ensures is not true.

So from my perspective this looks like a reasonable change.

I don't know how this looks from people who understand the timer bits
and what timerslack does. I suspect it is reasonable as there is no
permission check for changing yourself.

Eric

> Signed-off-by: Benjamin Gordon 
> Cc: John Stultz 
> Cc: "Eric W. Biederman" 
> Cc: Kees Cook 
> Cc: "Serge E. Hallyn" 
> Cc: Thomas Gleixner 
> Cc: Arjan van de Ven 
> Cc: Oren Laadan 
> Cc: Ruchi Kandoi 
> Cc: Rom Lemarchand 
> Cc: Todd Kjos 
> Cc: Colin Cross 
> Cc: Nick Kralevich 
> Cc: Dmitry Shmidt 
> Cc: Elliott Hughes 
> Cc: Android Kernel Team 
> Cc: Andrew Morton 
> ---
>
> Changes from v1:
>   - Use the namespace of the target process instead of the file opener.
> Didn't carry over John Stultz' Acked-by since the changes aren't
> cosmetic.
>
>  fs/proc/base.c | 12 +---
>  1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index c78d8da09b52c..bdc093ba81dd3 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -2385,10 +2385,13 @@ static ssize_t timerslack_ns_write(struct file *file, 
> const char __user *buf,
>   return -ESRCH;
>  
>   if (p != current) {
> - if (!capable(CAP_SYS_NICE)) {
> + rcu_read_lock();
> + if (!ns_capable(__task_cred(p)->user_ns, CAP_SYS_NICE)) {
> + rcu_read_unlock();
>   count = -EPERM;
>   goto out;
>   }
> + rcu_read_unlock();
>  
>   err = security_task_setscheduler(p);
>   if (err) {
> @@ -2421,11 +2424,14 @@ static int timerslack_ns_show(struct seq_file *m, 
> void *v)
>   return -ESRCH;
>  
>   if (p != current) {
> -
> - if (!capable(CAP_SYS_NICE)) {
> + rcu_read_lock();
> + if (!ns_capable(__task_cred(p)->user_ns, CAP_SYS_NICE)) {
> + rcu_read_unlock();
>   err = -EPERM;
>   goto out;
>   }
> + rcu_read_unlock();
> +
>   err = security_task_getscheduler(p);
>   if (err)
>   goto out;


Re: [PATCH 1/4] base/drivers/arch_topology: Remove useless check

2018-10-30 Thread Viresh Kumar
On 30-10-18, 14:35, Daniel Lezcano wrote:
> I'm wondering if having a statically per_cpu variable, even if it is not
> free at the end, isn't worth regarding the twisted code we end up with
> an allocation.

Maybe yeah.

-- 
viresh


Re: [PATCH 1/4] base/drivers/arch_topology: Remove useless check

2018-10-30 Thread Viresh Kumar
On 30-10-18, 14:35, Daniel Lezcano wrote:
> I'm wondering if having a statically per_cpu variable, even if it is not
> free at the end, isn't worth regarding the twisted code we end up with
> an allocation.

Maybe yeah.

-- 
viresh


Re: [RFC PATCH] Implement /proc/pid/kill

2018-10-30 Thread Joel Fernandes
On Tue, Oct 30, 2018 at 7:56 PM, Aleksa Sarai  wrote:
> On 2018-10-31, Christian Brauner  wrote:
>> > I think Aleksa's larger point is that it's useful to treat processes
>> > as other file-descriptor-named, poll-able, wait-able resources.
>> > Consistency is important. A process is just another system resource,
>> > and like any other system resource, you should be open to hold a file
>> > descriptor to it and do things to that process via that file
>> > descriptor. The precise form of this process-handle FD is up for
>> > debate. The existing /proc/$PID directory FD is a good candidate for a
>> > process handle FD, since it does almost all of what's needed. But
>> > regardless of what form a process handle FD takes, we need it. I don't
>> > see a case for continuing to treat processes in a non-unixy,
>> > non-file-descriptor-based manner.
>>
>> That's what I'm proposing in the API for which I'm gathering feedback.
>> I have presented parts of this in various discussions at LSS Europe last week
>> and will be at LPC.
>> We don't want to rush an API like this though. It was tried before in
>> other forms
>> and these proposals didn't make it.
>
> :+1: on a well thought-out and generic proposal. As we've discussed
> elsewhere, this is an issue that really would be great to (finally)
> solve.

Excited to see this and please count me in for discussions around this. thanks.

 - Joel


Re: [RFC PATCH] Implement /proc/pid/kill

2018-10-30 Thread Joel Fernandes
On Tue, Oct 30, 2018 at 7:56 PM, Aleksa Sarai  wrote:
> On 2018-10-31, Christian Brauner  wrote:
>> > I think Aleksa's larger point is that it's useful to treat processes
>> > as other file-descriptor-named, poll-able, wait-able resources.
>> > Consistency is important. A process is just another system resource,
>> > and like any other system resource, you should be open to hold a file
>> > descriptor to it and do things to that process via that file
>> > descriptor. The precise form of this process-handle FD is up for
>> > debate. The existing /proc/$PID directory FD is a good candidate for a
>> > process handle FD, since it does almost all of what's needed. But
>> > regardless of what form a process handle FD takes, we need it. I don't
>> > see a case for continuing to treat processes in a non-unixy,
>> > non-file-descriptor-based manner.
>>
>> That's what I'm proposing in the API for which I'm gathering feedback.
>> I have presented parts of this in various discussions at LSS Europe last week
>> and will be at LPC.
>> We don't want to rush an API like this though. It was tried before in
>> other forms
>> and these proposals didn't make it.
>
> :+1: on a well thought-out and generic proposal. As we've discussed
> elsewhere, this is an issue that really would be great to (finally)
> solve.

Excited to see this and please count me in for discussions around this. thanks.

 - Joel


Re: [PATCH v4] pstore: Avoid duplicate call of persistent_ram_zap()

2018-10-30 Thread Joel Fernandes
On Tue, Oct 30, 2018 at 8:57 PM, Peng15 Wang 王鹏  wrote:
>
>>From: Joel Fernandes 
>>Sent: Wednesday, October 31, 2018 6:16
>>To: Kees Cook
>>Cc: Peng15 Wang 王鹏; Anton Vorontsov; Colin Cross; Tony Luck; LKML; 
>>vipwangerx...@gmail.com
>>Subject: Re: [PATCH v4] pstore: Avoid duplicate call of persistent_ram_zap()
>>
>>On Tue, Oct 30, 2018 at 02:52:43PM -0700, Kees Cook wrote:
>>> On Tue, Oct 30, 2018 at 2:38 PM, Joel Fernandes  
>>> wrote:
>>> > On Tue, Oct 30, 2018 at 03:52:34PM +0800, Peng Wang wrote:
>>> >> When initialing prz with invalid data in buffer(no PERSISTENT_RAM_SIG),
>>> >> function call path is like this:
>>> >>
>>> >> ramoops_init_prz ->
>>> >> |
>>> >> |-> persistent_ram_new -> persistent_ram_post_init -> persistent_ram_zap
>>> >> |
>>> >> |-> persistent_ram_zap
>>> >>
>>> >> As we can see, persistent_ram_zap() is called twice.
>>> >> We can avoid this by adding an option to persistent_ram_new(), and
>>> >> only call persistent_ram_zap() when it is needed.
>>> >>
>>> >> Signed-off-by: Peng Wang 
>>> >> ---
>>> >>  fs/pstore/ram.c| 4 +---
>>> >>  fs/pstore/ram_core.c   | 5 +++--
>>> >>  include/linux/pstore_ram.h | 1 +
>>> >>  3 files changed, 5 insertions(+), 5 deletions(-)
>>> >>
>>> >> diff --git a/fs/pstore/ram.c b/fs/pstore/ram.c
>>> >> index ffcff6516e89..b51901f97dc2 100644
>>> >> --- a/fs/pstore/ram.c
>>> >> +++ b/fs/pstore/ram.c
>>> >> @@ -640,7 +640,7 @@ static int ramoops_init_prz(const char *name,
>>> >>
>>> >>   label = kasprintf(GFP_KERNEL, "ramoops:%s", name);
>>> >>   *prz = persistent_ram_new(*paddr, sz, sig, >ecc_info,
>>> >> -   cxt->memtype, 0, label);
>>> >> +   cxt->memtype, PRZ_FLAG_ZAP_OLD, label);
>>> >>   if (IS_ERR(*prz)) {
>>> >>   int err = PTR_ERR(*prz);
>>> >
>>> > Looks good to me except the minor comment below:
>>> >
>>> >>
>>> >> @@ -649,8 +649,6 @@ static int ramoops_init_prz(const char *name,
>>> >>   return err;
>>> >>   }
>>> >>
>>> >> - persistent_ram_zap(*prz);
>>> >> -
>>> >>   *paddr += sz;
>>> >>
>>> >>   return 0;
>>> >> diff --git a/fs/pstore/ram_core.c b/fs/pstore/ram_core.c
>>> >> index 12e21f789194..2ededd1ea1c2 100644
>>> >> --- a/fs/pstore/ram_core.c
>>> >> +++ b/fs/pstore/ram_core.c
>>> >> @@ -505,15 +505,16 @@ static int persistent_ram_post_init(struct 
>>> >> persistent_ram_zone *prz, u32 sig,
>>> >>   pr_debug("found existing buffer, size %zu, start 
>>> >> %zu\n",
>>> >>buffer_size(prz), buffer_start(prz));
>>> >>   persistent_ram_save_old(prz);
>>> >> - return 0;
>>> >> + if (!(prz->flags & PRZ_FLAG_ZAP_OLD))
>>> >> + return 0;
>>> >
>>> > This could be written differently.
>>> >
>>> > We could just do:
>>> >
>>> > if (prz->flags & PRZ_FLAG_ZAP_OLD)
>>> > persistent_ram_zap(prz);
>>> >
>>> > And remove the zap from below below.
>>>
>>> I actually rearranged things a little to avoid additional round-trips
>>> on the mailing list. :)
>>>
>>> > Since Kees already took this patch, I can just patch this in my series if
>>> > Kees and you are Ok with this suggestion.
>>>
>>> I've put it up here:
>>> https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/commit/?h=pstore/devel=ac564e023248e3f4d87917b91d12376ddfca5000
>>
>>Cool, it LGTM :)
>>
>>- Joel
>>
>
> Thank you all for these warm help.
>
> This is my first time to submit a patch to community. Feel great!

Congrats and welcome to the mother ship ;-)

 - Joel


Re: [PATCH v4] pstore: Avoid duplicate call of persistent_ram_zap()

2018-10-30 Thread Joel Fernandes
On Tue, Oct 30, 2018 at 8:57 PM, Peng15 Wang 王鹏  wrote:
>
>>From: Joel Fernandes 
>>Sent: Wednesday, October 31, 2018 6:16
>>To: Kees Cook
>>Cc: Peng15 Wang 王鹏; Anton Vorontsov; Colin Cross; Tony Luck; LKML; 
>>vipwangerx...@gmail.com
>>Subject: Re: [PATCH v4] pstore: Avoid duplicate call of persistent_ram_zap()
>>
>>On Tue, Oct 30, 2018 at 02:52:43PM -0700, Kees Cook wrote:
>>> On Tue, Oct 30, 2018 at 2:38 PM, Joel Fernandes  
>>> wrote:
>>> > On Tue, Oct 30, 2018 at 03:52:34PM +0800, Peng Wang wrote:
>>> >> When initialing prz with invalid data in buffer(no PERSISTENT_RAM_SIG),
>>> >> function call path is like this:
>>> >>
>>> >> ramoops_init_prz ->
>>> >> |
>>> >> |-> persistent_ram_new -> persistent_ram_post_init -> persistent_ram_zap
>>> >> |
>>> >> |-> persistent_ram_zap
>>> >>
>>> >> As we can see, persistent_ram_zap() is called twice.
>>> >> We can avoid this by adding an option to persistent_ram_new(), and
>>> >> only call persistent_ram_zap() when it is needed.
>>> >>
>>> >> Signed-off-by: Peng Wang 
>>> >> ---
>>> >>  fs/pstore/ram.c| 4 +---
>>> >>  fs/pstore/ram_core.c   | 5 +++--
>>> >>  include/linux/pstore_ram.h | 1 +
>>> >>  3 files changed, 5 insertions(+), 5 deletions(-)
>>> >>
>>> >> diff --git a/fs/pstore/ram.c b/fs/pstore/ram.c
>>> >> index ffcff6516e89..b51901f97dc2 100644
>>> >> --- a/fs/pstore/ram.c
>>> >> +++ b/fs/pstore/ram.c
>>> >> @@ -640,7 +640,7 @@ static int ramoops_init_prz(const char *name,
>>> >>
>>> >>   label = kasprintf(GFP_KERNEL, "ramoops:%s", name);
>>> >>   *prz = persistent_ram_new(*paddr, sz, sig, >ecc_info,
>>> >> -   cxt->memtype, 0, label);
>>> >> +   cxt->memtype, PRZ_FLAG_ZAP_OLD, label);
>>> >>   if (IS_ERR(*prz)) {
>>> >>   int err = PTR_ERR(*prz);
>>> >
>>> > Looks good to me except the minor comment below:
>>> >
>>> >>
>>> >> @@ -649,8 +649,6 @@ static int ramoops_init_prz(const char *name,
>>> >>   return err;
>>> >>   }
>>> >>
>>> >> - persistent_ram_zap(*prz);
>>> >> -
>>> >>   *paddr += sz;
>>> >>
>>> >>   return 0;
>>> >> diff --git a/fs/pstore/ram_core.c b/fs/pstore/ram_core.c
>>> >> index 12e21f789194..2ededd1ea1c2 100644
>>> >> --- a/fs/pstore/ram_core.c
>>> >> +++ b/fs/pstore/ram_core.c
>>> >> @@ -505,15 +505,16 @@ static int persistent_ram_post_init(struct 
>>> >> persistent_ram_zone *prz, u32 sig,
>>> >>   pr_debug("found existing buffer, size %zu, start 
>>> >> %zu\n",
>>> >>buffer_size(prz), buffer_start(prz));
>>> >>   persistent_ram_save_old(prz);
>>> >> - return 0;
>>> >> + if (!(prz->flags & PRZ_FLAG_ZAP_OLD))
>>> >> + return 0;
>>> >
>>> > This could be written differently.
>>> >
>>> > We could just do:
>>> >
>>> > if (prz->flags & PRZ_FLAG_ZAP_OLD)
>>> > persistent_ram_zap(prz);
>>> >
>>> > And remove the zap from below below.
>>>
>>> I actually rearranged things a little to avoid additional round-trips
>>> on the mailing list. :)
>>>
>>> > Since Kees already took this patch, I can just patch this in my series if
>>> > Kees and you are Ok with this suggestion.
>>>
>>> I've put it up here:
>>> https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/commit/?h=pstore/devel=ac564e023248e3f4d87917b91d12376ddfca5000
>>
>>Cool, it LGTM :)
>>
>>- Joel
>>
>
> Thank you all for these warm help.
>
> This is my first time to submit a patch to community. Feel great!

Congrats and welcome to the mother ship ;-)

 - Joel


Re: [PATCH v4] pstore: Avoid duplicate call of persistent_ram_zap()

2018-10-30 Thread Peng15 Wang 王鹏

>From: Joel Fernandes 
>Sent: Wednesday, October 31, 2018 6:16
>To: Kees Cook
>Cc: Peng15 Wang 王鹏; Anton Vorontsov; Colin Cross; Tony Luck; LKML; 
>vipwangerx...@gmail.com
>Subject: Re: [PATCH v4] pstore: Avoid duplicate call of persistent_ram_zap()
>
>On Tue, Oct 30, 2018 at 02:52:43PM -0700, Kees Cook wrote:
>> On Tue, Oct 30, 2018 at 2:38 PM, Joel Fernandes  
>> wrote:
>> > On Tue, Oct 30, 2018 at 03:52:34PM +0800, Peng Wang wrote:
>> >> When initialing prz with invalid data in buffer(no PERSISTENT_RAM_SIG),
>> >> function call path is like this:
>> >>
>> >> ramoops_init_prz ->
>> >> |
>> >> |-> persistent_ram_new -> persistent_ram_post_init -> persistent_ram_zap
>> >> |
>> >> |-> persistent_ram_zap
>> >>
>> >> As we can see, persistent_ram_zap() is called twice.
>> >> We can avoid this by adding an option to persistent_ram_new(), and
>> >> only call persistent_ram_zap() when it is needed.
>> >>
>> >> Signed-off-by: Peng Wang 
>> >> ---
>> >>  fs/pstore/ram.c| 4 +---
>> >>  fs/pstore/ram_core.c   | 5 +++--
>> >>  include/linux/pstore_ram.h | 1 +
>> >>  3 files changed, 5 insertions(+), 5 deletions(-)
>> >>
>> >> diff --git a/fs/pstore/ram.c b/fs/pstore/ram.c
>> >> index ffcff6516e89..b51901f97dc2 100644
>> >> --- a/fs/pstore/ram.c
>> >> +++ b/fs/pstore/ram.c
>> >> @@ -640,7 +640,7 @@ static int ramoops_init_prz(const char *name,
>> >>
>> >>   label = kasprintf(GFP_KERNEL, "ramoops:%s", name);
>> >>   *prz = persistent_ram_new(*paddr, sz, sig, >ecc_info,
>> >> -   cxt->memtype, 0, label);
>> >> +   cxt->memtype, PRZ_FLAG_ZAP_OLD, label);
>> >>   if (IS_ERR(*prz)) {
>> >>   int err = PTR_ERR(*prz);
>> >
>> > Looks good to me except the minor comment below:
>> >
>> >>
>> >> @@ -649,8 +649,6 @@ static int ramoops_init_prz(const char *name,
>> >>   return err;
>> >>   }
>> >>
>> >> - persistent_ram_zap(*prz);
>> >> -
>> >>   *paddr += sz;
>> >>
>> >>   return 0;
>> >> diff --git a/fs/pstore/ram_core.c b/fs/pstore/ram_core.c
>> >> index 12e21f789194..2ededd1ea1c2 100644
>> >> --- a/fs/pstore/ram_core.c
>> >> +++ b/fs/pstore/ram_core.c
>> >> @@ -505,15 +505,16 @@ static int persistent_ram_post_init(struct 
>> >> persistent_ram_zone *prz, u32 sig,
>> >>   pr_debug("found existing buffer, size %zu, start 
>> >> %zu\n",
>> >>buffer_size(prz), buffer_start(prz));
>> >>   persistent_ram_save_old(prz);
>> >> - return 0;
>> >> + if (!(prz->flags & PRZ_FLAG_ZAP_OLD))
>> >> + return 0;
>> >
>> > This could be written differently.
>> >
>> > We could just do:
>> >
>> > if (prz->flags & PRZ_FLAG_ZAP_OLD)
>> > persistent_ram_zap(prz);
>> >
>> > And remove the zap from below below.
>>
>> I actually rearranged things a little to avoid additional round-trips
>> on the mailing list. :)
>>
>> > Since Kees already took this patch, I can just patch this in my series if
>> > Kees and you are Ok with this suggestion.
>>
>> I've put it up here:
>> https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/commit/?h=pstore/devel=ac564e023248e3f4d87917b91d12376ddfca5000
>
>Cool, it LGTM :)
>
>- Joel
>

Thank you all for these warm help.

This is my first time to submit a patch to community. Feel great!


Re: [PATCH v4] pstore: Avoid duplicate call of persistent_ram_zap()

2018-10-30 Thread Peng15 Wang 王鹏

>From: Joel Fernandes 
>Sent: Wednesday, October 31, 2018 6:16
>To: Kees Cook
>Cc: Peng15 Wang 王鹏; Anton Vorontsov; Colin Cross; Tony Luck; LKML; 
>vipwangerx...@gmail.com
>Subject: Re: [PATCH v4] pstore: Avoid duplicate call of persistent_ram_zap()
>
>On Tue, Oct 30, 2018 at 02:52:43PM -0700, Kees Cook wrote:
>> On Tue, Oct 30, 2018 at 2:38 PM, Joel Fernandes  
>> wrote:
>> > On Tue, Oct 30, 2018 at 03:52:34PM +0800, Peng Wang wrote:
>> >> When initialing prz with invalid data in buffer(no PERSISTENT_RAM_SIG),
>> >> function call path is like this:
>> >>
>> >> ramoops_init_prz ->
>> >> |
>> >> |-> persistent_ram_new -> persistent_ram_post_init -> persistent_ram_zap
>> >> |
>> >> |-> persistent_ram_zap
>> >>
>> >> As we can see, persistent_ram_zap() is called twice.
>> >> We can avoid this by adding an option to persistent_ram_new(), and
>> >> only call persistent_ram_zap() when it is needed.
>> >>
>> >> Signed-off-by: Peng Wang 
>> >> ---
>> >>  fs/pstore/ram.c| 4 +---
>> >>  fs/pstore/ram_core.c   | 5 +++--
>> >>  include/linux/pstore_ram.h | 1 +
>> >>  3 files changed, 5 insertions(+), 5 deletions(-)
>> >>
>> >> diff --git a/fs/pstore/ram.c b/fs/pstore/ram.c
>> >> index ffcff6516e89..b51901f97dc2 100644
>> >> --- a/fs/pstore/ram.c
>> >> +++ b/fs/pstore/ram.c
>> >> @@ -640,7 +640,7 @@ static int ramoops_init_prz(const char *name,
>> >>
>> >>   label = kasprintf(GFP_KERNEL, "ramoops:%s", name);
>> >>   *prz = persistent_ram_new(*paddr, sz, sig, >ecc_info,
>> >> -   cxt->memtype, 0, label);
>> >> +   cxt->memtype, PRZ_FLAG_ZAP_OLD, label);
>> >>   if (IS_ERR(*prz)) {
>> >>   int err = PTR_ERR(*prz);
>> >
>> > Looks good to me except the minor comment below:
>> >
>> >>
>> >> @@ -649,8 +649,6 @@ static int ramoops_init_prz(const char *name,
>> >>   return err;
>> >>   }
>> >>
>> >> - persistent_ram_zap(*prz);
>> >> -
>> >>   *paddr += sz;
>> >>
>> >>   return 0;
>> >> diff --git a/fs/pstore/ram_core.c b/fs/pstore/ram_core.c
>> >> index 12e21f789194..2ededd1ea1c2 100644
>> >> --- a/fs/pstore/ram_core.c
>> >> +++ b/fs/pstore/ram_core.c
>> >> @@ -505,15 +505,16 @@ static int persistent_ram_post_init(struct 
>> >> persistent_ram_zone *prz, u32 sig,
>> >>   pr_debug("found existing buffer, size %zu, start 
>> >> %zu\n",
>> >>buffer_size(prz), buffer_start(prz));
>> >>   persistent_ram_save_old(prz);
>> >> - return 0;
>> >> + if (!(prz->flags & PRZ_FLAG_ZAP_OLD))
>> >> + return 0;
>> >
>> > This could be written differently.
>> >
>> > We could just do:
>> >
>> > if (prz->flags & PRZ_FLAG_ZAP_OLD)
>> > persistent_ram_zap(prz);
>> >
>> > And remove the zap from below below.
>>
>> I actually rearranged things a little to avoid additional round-trips
>> on the mailing list. :)
>>
>> > Since Kees already took this patch, I can just patch this in my series if
>> > Kees and you are Ok with this suggestion.
>>
>> I've put it up here:
>> https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/commit/?h=pstore/devel=ac564e023248e3f4d87917b91d12376ddfca5000
>
>Cool, it LGTM :)
>
>- Joel
>

Thank you all for these warm help.

This is my first time to submit a patch to community. Feel great!


linux-next: Tree for Oct 31

2018-10-30 Thread Stephen Rothwell
Hi all,

Please do not add any v4.21/v5.1 code to your linux-next included trees
until after the merge window closes.

Changes since 20181030:

My fixes tree contains this:

  "drivers: net: include linux/ip.h for iphdr"

The xfs tree gained a conflict against Linus' tree.

The vfs tree gained a conflict against the xfs tree.

Non-merge commits (relative to Linus' tree): 1071
 1632 files changed, 62897 insertions(+), 19314 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a
multi_v7_defconfig for arm and a native build of tools/perf. After
the final fixups (if any), I do an x86_64 modules_install followed by
builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit),
ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc
and sparc64 defconfig. And finally, a simple boot test of the powerpc
pseries_le_defconfig kernel in qemu (with and without kvm enabled).

Below is a summary of the state of the merge.

I am currently merging 291 trees (counting Linus' and 66 trees of bug
fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (310c7585e830 Merge tag 'nfsd-4.20' of 
git://linux-nfs.org/~bfields/linux)
Merging fixes/master (2941927a2da1 drivers: net: include linux/ip.h for iphdr)
Merging kbuild-current/fixes (11743c56785c Merge tag 'rpmsg-v4.20' of 
git://github.com/andersson/remoteproc)
Merging arc-current/for-curr (a75e410a8bc2 ARCv2: boot log unaligned access in 
use)
Merging arm-current/fixes (3a58ac65e2d7 ARM: 8799/1: mm: fix pci_ioremap_io() 
offset check)
Merging arm64-fixes/for-next/fixes (ca2b497253ad arm64: perf: Reject 
stand-alone CHAIN events for PMUv3)
Merging m68k-current/for-linus (58c116fb7dc6 m68k/sun3: Remove is_medusa and 
m68k_pgtable_cachemode)
Merging powerpc-fixes/fixes (ac1788cc7da4 powerpc/numa: Skip onlining a offline 
node in kdump path)
Merging sparc/master (0f0a691f1ef9 sparc64: Remvoe set_fs() from 
perf_callchain_user().)
Merging fscrypt-current/for-stable (ae64f9bd1d36 Linux 4.15-rc2)
Merging net/master (a6b3a3fa0423 net: mvpp2: Fix affinity hint allocation)
Merging bpf/master (d8fd9e106fbc bpf: fix wrong helper enablement in cgroup 
local storage)
Merging ipsec/master (533555e5cbb6 xfrm: Fix error return code in 
xfrm_output_one())
Merging netfilter/master (29a0dd66e953 netfilter: xt_IDLETIMER: add sysfs 
filename checking routine)
Merging ipvs/master (feb9f55c33e5 netfilter: nft_dynset: allow dynamic updates 
of non-anonymous set)
Merging wireless-drivers/master (3baafeffa48a iwlwifi: 1000: set the TFD queue 
size)
Merging mac80211/master (8d0be26c781a mac80211_hwsim: fix module init error 
paths for netlink)
Merging rdma-fixes/for-rc (a3671a4f973e RDMA/ucma: Fix Spectre v1 vulnerability)
Merging sound-current/for-linus (826b5de90c0b ALSA: firewire-lib: fix 
insufficient PCM rule for period/buffer size)
Merging sound-asoc-fixes/for-linus (f65f0722e85d Merge branch 'asoc-4.19' into 
asoc-linus)
Merging regmap-fixes/for-linus (35a7f35ad1b1 Linux 4.19-rc8)
Merging regulator-fixes/for-linus (84df9525b0c2 Linux 4.19)
Merging spi-fixes/for-linus (599eb81f4118 Merge branch 'spi-4.19' into 
spi-linus)
Merging pci-current/for-linus (2edab4df98d9 PCI: Expand the "PF" acronym in 
Kconfig help text)
Merging driver-core.current/driver-core-linus (9f51ae62c84a Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net)
Merging tty.current/tty-linus (202dc3cc10b4 serial: sh-sci: Fix receive on 
SCIFA/SCIFB variants with DMA)
Merging usb.current/usb-linus (69d5b97c5973 HID: we do not randomly make new 
drivers 'default y')
Merging usb-gadget-fixes/fixes (d9707490077b usb: dwc2: Fix call location of 
dwc2_check_core_endianness)
Merging usb-serial-fixes/usb-linus (0238df646e62 Linux 4.19-rc7)
Merging usb-chipidea-fixes/ci-fo

linux-next: Tree for Oct 31

2018-10-30 Thread Stephen Rothwell
Hi all,

Please do not add any v4.21/v5.1 code to your linux-next included trees
until after the merge window closes.

Changes since 20181030:

My fixes tree contains this:

  "drivers: net: include linux/ip.h for iphdr"

The xfs tree gained a conflict against Linus' tree.

The vfs tree gained a conflict against the xfs tree.

Non-merge commits (relative to Linus' tree): 1071
 1632 files changed, 62897 insertions(+), 19314 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a
multi_v7_defconfig for arm and a native build of tools/perf. After
the final fixups (if any), I do an x86_64 modules_install followed by
builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit),
ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc
and sparc64 defconfig. And finally, a simple boot test of the powerpc
pseries_le_defconfig kernel in qemu (with and without kvm enabled).

Below is a summary of the state of the merge.

I am currently merging 291 trees (counting Linus' and 66 trees of bug
fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (310c7585e830 Merge tag 'nfsd-4.20' of 
git://linux-nfs.org/~bfields/linux)
Merging fixes/master (2941927a2da1 drivers: net: include linux/ip.h for iphdr)
Merging kbuild-current/fixes (11743c56785c Merge tag 'rpmsg-v4.20' of 
git://github.com/andersson/remoteproc)
Merging arc-current/for-curr (a75e410a8bc2 ARCv2: boot log unaligned access in 
use)
Merging arm-current/fixes (3a58ac65e2d7 ARM: 8799/1: mm: fix pci_ioremap_io() 
offset check)
Merging arm64-fixes/for-next/fixes (ca2b497253ad arm64: perf: Reject 
stand-alone CHAIN events for PMUv3)
Merging m68k-current/for-linus (58c116fb7dc6 m68k/sun3: Remove is_medusa and 
m68k_pgtable_cachemode)
Merging powerpc-fixes/fixes (ac1788cc7da4 powerpc/numa: Skip onlining a offline 
node in kdump path)
Merging sparc/master (0f0a691f1ef9 sparc64: Remvoe set_fs() from 
perf_callchain_user().)
Merging fscrypt-current/for-stable (ae64f9bd1d36 Linux 4.15-rc2)
Merging net/master (a6b3a3fa0423 net: mvpp2: Fix affinity hint allocation)
Merging bpf/master (d8fd9e106fbc bpf: fix wrong helper enablement in cgroup 
local storage)
Merging ipsec/master (533555e5cbb6 xfrm: Fix error return code in 
xfrm_output_one())
Merging netfilter/master (29a0dd66e953 netfilter: xt_IDLETIMER: add sysfs 
filename checking routine)
Merging ipvs/master (feb9f55c33e5 netfilter: nft_dynset: allow dynamic updates 
of non-anonymous set)
Merging wireless-drivers/master (3baafeffa48a iwlwifi: 1000: set the TFD queue 
size)
Merging mac80211/master (8d0be26c781a mac80211_hwsim: fix module init error 
paths for netlink)
Merging rdma-fixes/for-rc (a3671a4f973e RDMA/ucma: Fix Spectre v1 vulnerability)
Merging sound-current/for-linus (826b5de90c0b ALSA: firewire-lib: fix 
insufficient PCM rule for period/buffer size)
Merging sound-asoc-fixes/for-linus (f65f0722e85d Merge branch 'asoc-4.19' into 
asoc-linus)
Merging regmap-fixes/for-linus (35a7f35ad1b1 Linux 4.19-rc8)
Merging regulator-fixes/for-linus (84df9525b0c2 Linux 4.19)
Merging spi-fixes/for-linus (599eb81f4118 Merge branch 'spi-4.19' into 
spi-linus)
Merging pci-current/for-linus (2edab4df98d9 PCI: Expand the "PF" acronym in 
Kconfig help text)
Merging driver-core.current/driver-core-linus (9f51ae62c84a Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net)
Merging tty.current/tty-linus (202dc3cc10b4 serial: sh-sci: Fix receive on 
SCIFA/SCIFB variants with DMA)
Merging usb.current/usb-linus (69d5b97c5973 HID: we do not randomly make new 
drivers 'default y')
Merging usb-gadget-fixes/fixes (d9707490077b usb: dwc2: Fix call location of 
dwc2_check_core_endianness)
Merging usb-serial-fixes/usb-linus (0238df646e62 Linux 4.19-rc7)
Merging usb-chipidea-fixes/ci-fo

Re: [PATCH v2 11/11] ext4: access to uninitialized bh fields in ext4_xattr_set_handle()

2018-10-30 Thread Vasily Averin
On 10/31/2018 04:30 AM, Andreas Dilger wrote:
> Could you please explain your statement below that on-stack
> initialization does not zero unspecified fields?  According 
> to documents I found, for example:
> 
> https://gcc.gnu.org/onlinedocs/gcc/Designated-Inits.html
> 
> they *are* initialized to zero. 

I did not know it,
I re-checked it in generated assembler code and found that you are right and I 
was wrong.

Please drop this patch,
should I resend of rest of this patch set once again?

Thank you,
Vasily Averin


Re: [PATCH v2 11/11] ext4: access to uninitialized bh fields in ext4_xattr_set_handle()

2018-10-30 Thread Vasily Averin
On 10/31/2018 04:30 AM, Andreas Dilger wrote:
> Could you please explain your statement below that on-stack
> initialization does not zero unspecified fields?  According 
> to documents I found, for example:
> 
> https://gcc.gnu.org/onlinedocs/gcc/Designated-Inits.html
> 
> they *are* initialized to zero. 

I did not know it,
I re-checked it in generated assembler code and found that you are right and I 
was wrong.

Please drop this patch,
should I resend of rest of this patch set once again?

Thank you,
Vasily Averin


Re: [PATCH v2 0/8] OLPC 1.75 Keyboard/Touchpad fixes

2018-10-30 Thread James Cameron
G'day,

Success, see below.

On Tue, Oct 30, 2018 at 08:40:38PM +0100, Lubomir Rintel wrote:
> Hello Pavel,
> 
> On Tue, 2018-10-30 at 11:26 +0100, Pavel Machek wrote:
> > Hi!
> > 
> > > > https://github.com/hackerspace/olpc-xo175-linux/wiki/How-to-run-an-up-to-date-Linux-on-a-XO-1.75
> > > > 
> > > > I didn't test it yet -- will do when I get home in the evening.
> > > > But
> > > > chances are it's good enough and I guess you'd be able to get it
> > > > working even if I messed up some details.
> >
> > [...]
> > Could I get prepared binary zImage for testing?
> 
> https://github.com/hackerspace/olpc-xo175-buildroot/releases
> 
> Here's a SD card image that works for me. The topmost commit in the
> same repository is the build configuration that was used to generate
> it:
> 
> https://github.com/hackerspace/olpc-xo175-buildroot/commit/71783d599.patch
> 
> Note it is only going to boot off the SD card, because the root=
> argument is hardwired in the devicetree. Sorry about that -- I built
> the image before I noticed you're booting off an USB stick and I don't
> have the resources to regenerate the image at the moment.

Your image does boot for me - after changing features on
filesystem.

http://dev.laptop.org/~quozl/y/1gHh5m.txt (dmesg)

dumpe2fs of your image filesystem features; has_journal ext_attr
resize_inode dir_index filetype flex_bg sparse_super large_file
huge_file dir_nlink extra_isize metadata_csum, and flags;
signed_directory_hash.

dumpe2fs of my image filesystem features; has_journal ext_attr
resize_inode dir_index filetype extent flex_bg sparse_super uninit_bg
dir_nlink extra_isize, and flags; unsigned_directory_hash.

Our OLPC OS builder uses "mkfs.ext4 -O dir_index,^huge_file", from
e2fsprogs 1.42.5.

I'll look at the microSD card errors; by trying another one.

Fantastic progress though, thanks!  Wish I were a full time kernel
developer, but so much else to do now.

> If this won't boot for you, we may need fixes for older FW.

Let me know what you need there; with a patch, and if it isn't too
extensive I could spin a new build.  We're not producing these models,
so I don't _have_ to keep the factory test code working.

https://github.com/quozl/openfirmware

-- 
James Cameron
http://quozl.netrek.org/


Re: [PATCH v2 0/8] OLPC 1.75 Keyboard/Touchpad fixes

2018-10-30 Thread James Cameron
G'day,

Success, see below.

On Tue, Oct 30, 2018 at 08:40:38PM +0100, Lubomir Rintel wrote:
> Hello Pavel,
> 
> On Tue, 2018-10-30 at 11:26 +0100, Pavel Machek wrote:
> > Hi!
> > 
> > > > https://github.com/hackerspace/olpc-xo175-linux/wiki/How-to-run-an-up-to-date-Linux-on-a-XO-1.75
> > > > 
> > > > I didn't test it yet -- will do when I get home in the evening.
> > > > But
> > > > chances are it's good enough and I guess you'd be able to get it
> > > > working even if I messed up some details.
> >
> > [...]
> > Could I get prepared binary zImage for testing?
> 
> https://github.com/hackerspace/olpc-xo175-buildroot/releases
> 
> Here's a SD card image that works for me. The topmost commit in the
> same repository is the build configuration that was used to generate
> it:
> 
> https://github.com/hackerspace/olpc-xo175-buildroot/commit/71783d599.patch
> 
> Note it is only going to boot off the SD card, because the root=
> argument is hardwired in the devicetree. Sorry about that -- I built
> the image before I noticed you're booting off an USB stick and I don't
> have the resources to regenerate the image at the moment.

Your image does boot for me - after changing features on
filesystem.

http://dev.laptop.org/~quozl/y/1gHh5m.txt (dmesg)

dumpe2fs of your image filesystem features; has_journal ext_attr
resize_inode dir_index filetype flex_bg sparse_super large_file
huge_file dir_nlink extra_isize metadata_csum, and flags;
signed_directory_hash.

dumpe2fs of my image filesystem features; has_journal ext_attr
resize_inode dir_index filetype extent flex_bg sparse_super uninit_bg
dir_nlink extra_isize, and flags; unsigned_directory_hash.

Our OLPC OS builder uses "mkfs.ext4 -O dir_index,^huge_file", from
e2fsprogs 1.42.5.

I'll look at the microSD card errors; by trying another one.

Fantastic progress though, thanks!  Wish I were a full time kernel
developer, but so much else to do now.

> If this won't boot for you, we may need fixes for older FW.

Let me know what you need there; with a patch, and if it isn't too
extensive I could spin a new build.  We're not producing these models,
so I don't _have_ to keep the factory test code working.

https://github.com/quozl/openfirmware

-- 
James Cameron
http://quozl.netrek.org/


arch/x86/include/asm/rmwcc.h:23:17: error: jump into statement expression

2018-10-30 Thread kbuild test robot
Hi Peter,

FYI, the error/warning still remains.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
master
head:   310c7585e8300ddc46211df0757c11e4299ec482
commit: 7aa54be2976550f17c11a1c3e3630002dea39303 locking/qspinlock, x86: 
Provide liveness guarantee
date:   2 weeks ago
config: x86_64-randconfig-j0-10290909 (attached as .config)
compiler: gcc-4.9 (Debian 4.9.4-2) 4.9.4
reproduce:
git checkout 7aa54be2976550f17c11a1c3e3630002dea39303
# save the attached .config to linux build tree
make ARCH=x86_64 

All errors (new ones prefixed by >>):

   In file included from arch/x86/include/asm/atomic.h:5:0,
from include/linux/atomic.h:7,
from include/linux/crypto.h:20,
from arch/x86/kernel/asm-offsets.c:9:
   arch/x86/include/asm/qspinlock.h: In function 
'queued_fetch_set_pending_acquire':
>> arch/x86/include/asm/rmwcc.h:23:17: error: jump into statement expression
   : clobbers : cc_label);\
^
   include/linux/compiler.h:58:42: note: in definition of macro '__trace_if'
 if (__builtin_constant_p(!!(cond)) ? !!(cond) :   \
 ^
   arch/x86/include/asm/qspinlock.h:18:2: note: in expansion of macro 'if'
 if (GEN_BINARY_RMWcc(LOCK_PREFIX "btsl", lock->val.counter, c,
 ^
   arch/x86/include/asm/rmwcc.h:21:2: note: in expansion of macro 
'asm_volatile_goto'
 asm_volatile_goto (fullop "; j" #cc " %l[cc_label]"  \
 ^
   arch/x86/include/asm/rmwcc.h:54:2: note: in expansion of macro '__GEN_RMWcc'
 __GEN_RMWcc(op " %[val], " arg0, var, cc,   \
 ^
   arch/x86/include/asm/rmwcc.h:58:2: note: in expansion of macro 
'GEN_BINARY_RMWcc_6'
 GEN_BINARY_RMWcc_6(op, var, cc, vcon, val, "%[var]")
 ^
   arch/x86/include/asm/rmwcc.h:9:30: note: in expansion of macro 
'GEN_BINARY_RMWcc_5'
#define __RMWcc_CONCAT(a, b) a ## b
 ^
   arch/x86/include/asm/rmwcc.h:10:28: note: in expansion of macro 
'__RMWcc_CONCAT'
#define RMWcc_CONCAT(a, b) __RMWcc_CONCAT(a, b)
   ^
   arch/x86/include/asm/rmwcc.h:60:32: note: in expansion of macro 
'RMWcc_CONCAT'
#define GEN_BINARY_RMWcc(X...) RMWcc_CONCAT(GEN_BINARY_RMWcc_, 
RMWcc_ARGS(X))(X)
   ^
   arch/x86/include/asm/qspinlock.h:18:6: note: in expansion of macro 
'GEN_BINARY_RMWcc'
 if (GEN_BINARY_RMWcc(LOCK_PREFIX "btsl", lock->val.counter, c,
 ^
   arch/x86/include/asm/rmwcc.h:25:1: note: label 'cc_label' defined here
cc_label: c = true;  \
^
   include/linux/compiler.h:58:30: note: in definition of macro '__trace_if'
 if (__builtin_constant_p(!!(cond)) ? !!(cond) :   \
 ^
   arch/x86/include/asm/qspinlock.h:18:2: note: in expansion of macro 'if'
 if (GEN_BINARY_RMWcc(LOCK_PREFIX "btsl", lock->val.counter, c,
 ^
   arch/x86/include/asm/rmwcc.h:54:2: note: in expansion of macro '__GEN_RMWcc'
 __GEN_RMWcc(op " %[val], " arg0, var, cc,   \
 ^
   arch/x86/include/asm/rmwcc.h:58:2: note: in expansion of macro 
'GEN_BINARY_RMWcc_6'
 GEN_BINARY_RMWcc_6(op, var, cc, vcon, val, "%[var]")
 ^
   arch/x86/include/asm/rmwcc.h:9:30: note: in expansion of macro 
'GEN_BINARY_RMWcc_5'
#define __RMWcc_CONCAT(a, b) a ## b
 ^
   arch/x86/include/asm/rmwcc.h:10:28: note: in expansion of macro 
'__RMWcc_CONCAT'
#define RMWcc_CONCAT(a, b) __RMWcc_CONCAT(a, b)
   ^
   arch/x86/include/asm/rmwcc.h:60:32: note: in expansion of macro 
'RMWcc_CONCAT'
#define GEN_BINARY_RMWcc(X...) RMWcc_CONCAT(GEN_BINARY_RMWcc_, 
RMWcc_ARGS(X))(X)
   ^
   arch/x86/include/asm/qspinlock.h:18:6: note: in expansion of macro 
'GEN_BINARY_RMWcc'
 if (GEN_BINARY_RMWcc(LOCK_PREFIX "btsl", lock->val.counter, c,
 ^
>> arch/x86/include/asm/rmwcc.h:25:1: error: duplicate label 'cc_label'
cc_label: c = true;  \
^
   include/linux/compiler.h:58:42: note: in definition of macro '__trace_if'
 if (__builtin_constant_p(!!(cond)) ? !!(cond) :   \
 ^
   arch/x86/include/asm/qspinlock.h:18:2: note: in expansion of macro 'if'
 if (GEN_BINARY_RMWcc(LOCK_PREFIX "btsl", lock->val.counter, c,
 ^
   arch/x86/include/asm/rmwcc.h:54:2: note: in expansion of macro '__GEN_RMWcc'
 __GEN_RMWcc(op " %[val], " arg0, var, cc,   \
 ^
   arch/x86/include/asm/rmwcc.h:58:2: note: in expansion of macro 
'GEN_BINARY_RMWcc_6'
 GEN_BINARY_RMWcc_6(op, var, cc, vcon, val, "%[var]")
 ^
   arch/x86/include/asm/rmwcc.h:9:30: note: in expansion of macro 
'GEN_BINARY_RMWcc_5'
#define __RMWcc_CONCAT(a, b) a ## b
 ^
   arch/x86/include/asm/rmwcc.h:10:28: note: in expansion of macro 
'__RMWcc_CONCAT'
#define RMWcc_CONCAT(a, b) __RMWcc_CONCAT(a, b)
  

arch/x86/include/asm/rmwcc.h:23:17: error: jump into statement expression

2018-10-30 Thread kbuild test robot
Hi Peter,

FYI, the error/warning still remains.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
master
head:   310c7585e8300ddc46211df0757c11e4299ec482
commit: 7aa54be2976550f17c11a1c3e3630002dea39303 locking/qspinlock, x86: 
Provide liveness guarantee
date:   2 weeks ago
config: x86_64-randconfig-j0-10290909 (attached as .config)
compiler: gcc-4.9 (Debian 4.9.4-2) 4.9.4
reproduce:
git checkout 7aa54be2976550f17c11a1c3e3630002dea39303
# save the attached .config to linux build tree
make ARCH=x86_64 

All errors (new ones prefixed by >>):

   In file included from arch/x86/include/asm/atomic.h:5:0,
from include/linux/atomic.h:7,
from include/linux/crypto.h:20,
from arch/x86/kernel/asm-offsets.c:9:
   arch/x86/include/asm/qspinlock.h: In function 
'queued_fetch_set_pending_acquire':
>> arch/x86/include/asm/rmwcc.h:23:17: error: jump into statement expression
   : clobbers : cc_label);\
^
   include/linux/compiler.h:58:42: note: in definition of macro '__trace_if'
 if (__builtin_constant_p(!!(cond)) ? !!(cond) :   \
 ^
   arch/x86/include/asm/qspinlock.h:18:2: note: in expansion of macro 'if'
 if (GEN_BINARY_RMWcc(LOCK_PREFIX "btsl", lock->val.counter, c,
 ^
   arch/x86/include/asm/rmwcc.h:21:2: note: in expansion of macro 
'asm_volatile_goto'
 asm_volatile_goto (fullop "; j" #cc " %l[cc_label]"  \
 ^
   arch/x86/include/asm/rmwcc.h:54:2: note: in expansion of macro '__GEN_RMWcc'
 __GEN_RMWcc(op " %[val], " arg0, var, cc,   \
 ^
   arch/x86/include/asm/rmwcc.h:58:2: note: in expansion of macro 
'GEN_BINARY_RMWcc_6'
 GEN_BINARY_RMWcc_6(op, var, cc, vcon, val, "%[var]")
 ^
   arch/x86/include/asm/rmwcc.h:9:30: note: in expansion of macro 
'GEN_BINARY_RMWcc_5'
#define __RMWcc_CONCAT(a, b) a ## b
 ^
   arch/x86/include/asm/rmwcc.h:10:28: note: in expansion of macro 
'__RMWcc_CONCAT'
#define RMWcc_CONCAT(a, b) __RMWcc_CONCAT(a, b)
   ^
   arch/x86/include/asm/rmwcc.h:60:32: note: in expansion of macro 
'RMWcc_CONCAT'
#define GEN_BINARY_RMWcc(X...) RMWcc_CONCAT(GEN_BINARY_RMWcc_, 
RMWcc_ARGS(X))(X)
   ^
   arch/x86/include/asm/qspinlock.h:18:6: note: in expansion of macro 
'GEN_BINARY_RMWcc'
 if (GEN_BINARY_RMWcc(LOCK_PREFIX "btsl", lock->val.counter, c,
 ^
   arch/x86/include/asm/rmwcc.h:25:1: note: label 'cc_label' defined here
cc_label: c = true;  \
^
   include/linux/compiler.h:58:30: note: in definition of macro '__trace_if'
 if (__builtin_constant_p(!!(cond)) ? !!(cond) :   \
 ^
   arch/x86/include/asm/qspinlock.h:18:2: note: in expansion of macro 'if'
 if (GEN_BINARY_RMWcc(LOCK_PREFIX "btsl", lock->val.counter, c,
 ^
   arch/x86/include/asm/rmwcc.h:54:2: note: in expansion of macro '__GEN_RMWcc'
 __GEN_RMWcc(op " %[val], " arg0, var, cc,   \
 ^
   arch/x86/include/asm/rmwcc.h:58:2: note: in expansion of macro 
'GEN_BINARY_RMWcc_6'
 GEN_BINARY_RMWcc_6(op, var, cc, vcon, val, "%[var]")
 ^
   arch/x86/include/asm/rmwcc.h:9:30: note: in expansion of macro 
'GEN_BINARY_RMWcc_5'
#define __RMWcc_CONCAT(a, b) a ## b
 ^
   arch/x86/include/asm/rmwcc.h:10:28: note: in expansion of macro 
'__RMWcc_CONCAT'
#define RMWcc_CONCAT(a, b) __RMWcc_CONCAT(a, b)
   ^
   arch/x86/include/asm/rmwcc.h:60:32: note: in expansion of macro 
'RMWcc_CONCAT'
#define GEN_BINARY_RMWcc(X...) RMWcc_CONCAT(GEN_BINARY_RMWcc_, 
RMWcc_ARGS(X))(X)
   ^
   arch/x86/include/asm/qspinlock.h:18:6: note: in expansion of macro 
'GEN_BINARY_RMWcc'
 if (GEN_BINARY_RMWcc(LOCK_PREFIX "btsl", lock->val.counter, c,
 ^
>> arch/x86/include/asm/rmwcc.h:25:1: error: duplicate label 'cc_label'
cc_label: c = true;  \
^
   include/linux/compiler.h:58:42: note: in definition of macro '__trace_if'
 if (__builtin_constant_p(!!(cond)) ? !!(cond) :   \
 ^
   arch/x86/include/asm/qspinlock.h:18:2: note: in expansion of macro 'if'
 if (GEN_BINARY_RMWcc(LOCK_PREFIX "btsl", lock->val.counter, c,
 ^
   arch/x86/include/asm/rmwcc.h:54:2: note: in expansion of macro '__GEN_RMWcc'
 __GEN_RMWcc(op " %[val], " arg0, var, cc,   \
 ^
   arch/x86/include/asm/rmwcc.h:58:2: note: in expansion of macro 
'GEN_BINARY_RMWcc_6'
 GEN_BINARY_RMWcc_6(op, var, cc, vcon, val, "%[var]")
 ^
   arch/x86/include/asm/rmwcc.h:9:30: note: in expansion of macro 
'GEN_BINARY_RMWcc_5'
#define __RMWcc_CONCAT(a, b) a ## b
 ^
   arch/x86/include/asm/rmwcc.h:10:28: note: in expansion of macro 
'__RMWcc_CONCAT'
#define RMWcc_CONCAT(a, b) __RMWcc_CONCAT(a, b)
  

Re: mmotm 2018-10-30-16-08 uploaded (arch/x86/kernel/vsmp_64.c)

2018-10-30 Thread Randy Dunlap
On 10/30/18 4:09 PM, a...@linux-foundation.org wrote:
> The mm-of-the-moment snapshot 2018-10-30-16-08 has been uploaded to
> 
>http://www.ozlabs.org/~akpm/mmotm/
> 
> mmotm-readme.txt says
> 
> README for mm-of-the-moment:
> 
> http://www.ozlabs.org/~akpm/mmotm/
> 
> This is a snapshot of my -mm patch queue.  Uploaded at random hopefully
> more than once a week.


Build error on x86_64 from origin.patch (i.e., not mmotm)
when CONFIG_PCI is not enabled:
Oh:  CONFIG_X86_VSMP is also not enabled, but
arch/x86/kernel/Makefile always tries to build vsmp_64.o.


ld: arch/x86/kernel/vsmp_64.o: in function `vsmp_cap_cpus':
vsmp_64.c:(.init.text+0x1e): undefined reference to `read_pci_config'


-- 
~Randy


Re: mmotm 2018-10-30-16-08 uploaded (arch/x86/kernel/vsmp_64.c)

2018-10-30 Thread Randy Dunlap
On 10/30/18 4:09 PM, a...@linux-foundation.org wrote:
> The mm-of-the-moment snapshot 2018-10-30-16-08 has been uploaded to
> 
>http://www.ozlabs.org/~akpm/mmotm/
> 
> mmotm-readme.txt says
> 
> README for mm-of-the-moment:
> 
> http://www.ozlabs.org/~akpm/mmotm/
> 
> This is a snapshot of my -mm patch queue.  Uploaded at random hopefully
> more than once a week.


Build error on x86_64 from origin.patch (i.e., not mmotm)
when CONFIG_PCI is not enabled:
Oh:  CONFIG_X86_VSMP is also not enabled, but
arch/x86/kernel/Makefile always tries to build vsmp_64.o.


ld: arch/x86/kernel/vsmp_64.o: in function `vsmp_cap_cpus':
vsmp_64.c:(.init.text+0x1e): undefined reference to `read_pci_config'


-- 
~Randy


Re: WARNING in get_unlocked_entry

2018-10-30 Thread Matthew Wilcox
On Tue, Oct 30, 2018 at 08:00:03AM -0700, syzbot wrote:
> syzbot found the following crash on:
> 
> HEAD commit:4b42745211af Merge tag 'armsoc-soc' of git://git.kernel.or..
> git tree:   upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=1187d06d40
> kernel config:  https://syzkaller.appspot.com/x/.config?x=93932074d01b4a5
> dashboard link: https://syzkaller.appspot.com/bug?extid=4fd0c066d82852499145
> compiler:   gcc (GCC) 8.0.1 20180413 (experimental)
> 
> Unfortunately, I don't have any reproducer for this crash yet.

Hmmpf.  Would have been nice if syzbot had caught this during its tests
of the -next tree ...

> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+4fd0c066d82852499...@syzkaller.appspotmail.com
> 
> EXT4-fs (sda1): Cannot specify journal on remount
> EXT4-fs (sda1): DAX enabled. Warning: EXPERIMENTAL, use at your own risk
> EXT4-fs (sda1): warning: refusing change of dax flag with busy inodes while
> remounting
> WARNING: CPU: 0 PID: 12870 at fs/dax.c:227 get_unlocked_entry+0x3ac/0x4d0
> fs/dax.c:227
> Kernel panic - not syncing: panic_on_warn set ...
> 
> CPU: 0 PID: 12870 Comm: syz-executor3 Not tainted 4.19.0+ #87
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x244/0x39d lib/dump_stack.c:113
>  panic+0x238/0x4e7 kernel/panic.c:184
>  __warn.cold.8+0x20/0x4a kernel/panic.c:536
>  report_bug+0x254/0x2d0 lib/bug.c:186
>  fixup_bug arch/x86/kernel/traps.c:178 [inline]
>  do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
>  do_invalid_op+0x36/0x40 arch/x86/kernel/traps.c:290
>  invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:966
> RIP: 0010:get_unlocked_entry+0x3ac/0x4d0 fs/dax.c:227
> Code: e8 31 ff 83 e0 01 48 89 c6 48 89 85 10 ff ff ff e8 09 18 96 ff 48 8b
> 85 10 ff ff ff 48 85 c0 0f 85 34 fe ff ff e8 c4 16 96 ff <0f> 0b e8 bd 16 96
> ff 48 b8 00 00 00 00 00 fc ff df 48 03 85 f8 fe
> RSP: 0018:88018025e7b8 EFLAGS: 00010012
> RAX: 0004 RBX: 88018025ead8 RCX: c90008983000
> RDX: 0341 RSI: 81e94d3c RDI: 0007
> RBP: 88018025e8c8 R08: 88017f862280 R09: ed003004bd08
> R10: ed003004bd08 R11: 0003 R12: dc00
> R13: ea0005e01040 R14: 88018025e800 R15: 88018025e8a0
>  grab_mapping_entry fs/dax.c:447 [inline]

OK, so I'd like to know what value actually got loaded into 'entry'.
Maybe my x86-fu is weak, but I don't understand how this register dump
correlates with the instructions I'm seeing.

Here's the C code:

entry = xas_load(xas);
if (!entry || xa_is_internal(entry) ||
WARN_ON_ONCE(!xa_is_value(entry)) ||
!dax_is_locked(entry))
return entry;


I downloaded the .config that syzbot was using and built fs/dax.o.
Disassembling it, I get:

WARN_ON_ONCE(!xa_is_value(entry)) ||
5e53:   31 ff   xor%edi,%edi
5e55:   83 e0 01and$0x1,%eax
5e58:   48 89 c6mov%rax,%rsi
5e5b:   48 89 85 10 ff ff ffmov%rax,-0xf0(%rbp)
5e62:   e8 00 00 00 00  callq  5e67 
5e63: R_X86_64_PLT32__sanitizer_cov_trace_const_cmp8
-0x4
5e67:   48 8b 85 10 ff ff ffmov-0xf0(%rbp),%rax
5e6e:   48 85 c0test   %rax,%rax
5e71:   0f 85 34 fe ff ff   jne5cab 
5e77:   e8 00 00 00 00  callq  5e7c 
5e78: R_X86_64_PLT32__sanitizer_cov_trace_pc-0x4
5e7c:   0f 0b   ud2

That ud2 is the WARN_ON trigger.  It's at offset 0x3ac from the start of
get_unlocked_entry() so it matches the dump.

The key is 'test %rax,%rax' at 5e6e.  If that passes, we jump to 5cab
where we do xa_to_value() in order to implement dax_is_locked().  So is
'entry' actually 0x0004 like the dump says?

That's where I get confused.  The previous insn (5e67) loads rax from what
I presume is the stack, having previously stored it there at insn 535b.
But insn 5e51 should have reduced %rax to just 0 or 1 in the low bit of
the register.  So clearly I have my understanding of x86 insns confused.
If someone could help me out here, I'd be grateful.



Re: [PATCH v2] mtd: spi-nor: Add support for SPI boot flash access for AMD Family 16h

2018-10-30 Thread Grandbois, Brett

On 30/10/18 6:26 pm, Boris Brezillon wrote:
> On Mon, 29 Oct 2018 23:15:42 +
> "Grandbois, Brett"  wrote:
>
>> On 28/10/18 1:39 am, Boris Brezillon wrote:
>>> Hi Brett,
>>>
>>> On Tue, 16 Oct 2018 00:57:41 +
>>> "Grandbois, Brett"   wrote:
>>>   
 Add support to expose the SPI boot flash on AMD Family 16h CPUs as
 a standard mtd device to give userspace BIOS updaters greater
 feature support.  The BIOS and Kernel Developer's Guide refers to
 this as the 'SPI ROM' controller and so the driver follows that
 naming convention for consistency.
   
>>> We're currently trying to convert spi-nor controller drivers to the
>>> spi-mem interface [1]. Can you look at this new interface and tell
>>> me if you'd be able to implement it? If that's not possible, then
>>> I'd prefer to have this driver implement the mtd_info interface
>>> directly.
>> So from going over the spi-mem interface it looks like the intent is
>> for these sorts of devices to be a standard spi_controller with only
>> mem_ops defined and the transfer/_one/_one_message left as NULL?  Is
>> that correct?
> Yes
>
>> That's a bit of a pivot from how it's currently done
>> (it's conceptually similar to the intel-spi-pci driver so I was
>> following that)
> Yes, and that's exactly what I'd like to avoid. intel-spi-pci will
> probably be the trickiest conversion, so I'd like to avoid having
> another one ;-).
>
>> but I should be able to rework it to the new
>> interface.  This then lives under drivers/spi and thus should be
>> submitted to linux-spi?
> Actually, if the controller is only ever connected to the same SPI NOR
> chip (no need for advanced detection scheme) and does not support
> support Octo/Quad/Dual modes (or any other advanced features), you'll
> be better off implementing mtd->_read/_write/_erase() directly (the
> driver would then live in drivers/mtd/devices/).

Hmm, conceptually the device is better suited to the mtd layer than the spi 
layer.  The HW is designed to only ever access spi flash chips, not as a 
general spi master controller, so really the end result of it should always 
only ever be 1 mtd device.  Unfortunately as the device probing and command 
sequences for this are the same as implemented in spi-nor I'd either be 
duplicating a lot of existing code, or just wrapping around spi_nor_scan which 
sounds like we're back to the dedicated spi-nor controller you're trying to 
move away from.

The spi-mem interface ops do nicely match up with what the controller supports, 
including the new direct mapping mode which I'd be able to make use of, so as 
long as there aren't any issues with supporting only mem_ops and not the 
message transfers then it's probably the way to go.

   



  



Re: WARNING in get_unlocked_entry

2018-10-30 Thread Matthew Wilcox
On Tue, Oct 30, 2018 at 08:00:03AM -0700, syzbot wrote:
> syzbot found the following crash on:
> 
> HEAD commit:4b42745211af Merge tag 'armsoc-soc' of git://git.kernel.or..
> git tree:   upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=1187d06d40
> kernel config:  https://syzkaller.appspot.com/x/.config?x=93932074d01b4a5
> dashboard link: https://syzkaller.appspot.com/bug?extid=4fd0c066d82852499145
> compiler:   gcc (GCC) 8.0.1 20180413 (experimental)
> 
> Unfortunately, I don't have any reproducer for this crash yet.

Hmmpf.  Would have been nice if syzbot had caught this during its tests
of the -next tree ...

> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+4fd0c066d82852499...@syzkaller.appspotmail.com
> 
> EXT4-fs (sda1): Cannot specify journal on remount
> EXT4-fs (sda1): DAX enabled. Warning: EXPERIMENTAL, use at your own risk
> EXT4-fs (sda1): warning: refusing change of dax flag with busy inodes while
> remounting
> WARNING: CPU: 0 PID: 12870 at fs/dax.c:227 get_unlocked_entry+0x3ac/0x4d0
> fs/dax.c:227
> Kernel panic - not syncing: panic_on_warn set ...
> 
> CPU: 0 PID: 12870 Comm: syz-executor3 Not tainted 4.19.0+ #87
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x244/0x39d lib/dump_stack.c:113
>  panic+0x238/0x4e7 kernel/panic.c:184
>  __warn.cold.8+0x20/0x4a kernel/panic.c:536
>  report_bug+0x254/0x2d0 lib/bug.c:186
>  fixup_bug arch/x86/kernel/traps.c:178 [inline]
>  do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
>  do_invalid_op+0x36/0x40 arch/x86/kernel/traps.c:290
>  invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:966
> RIP: 0010:get_unlocked_entry+0x3ac/0x4d0 fs/dax.c:227
> Code: e8 31 ff 83 e0 01 48 89 c6 48 89 85 10 ff ff ff e8 09 18 96 ff 48 8b
> 85 10 ff ff ff 48 85 c0 0f 85 34 fe ff ff e8 c4 16 96 ff <0f> 0b e8 bd 16 96
> ff 48 b8 00 00 00 00 00 fc ff df 48 03 85 f8 fe
> RSP: 0018:88018025e7b8 EFLAGS: 00010012
> RAX: 0004 RBX: 88018025ead8 RCX: c90008983000
> RDX: 0341 RSI: 81e94d3c RDI: 0007
> RBP: 88018025e8c8 R08: 88017f862280 R09: ed003004bd08
> R10: ed003004bd08 R11: 0003 R12: dc00
> R13: ea0005e01040 R14: 88018025e800 R15: 88018025e8a0
>  grab_mapping_entry fs/dax.c:447 [inline]

OK, so I'd like to know what value actually got loaded into 'entry'.
Maybe my x86-fu is weak, but I don't understand how this register dump
correlates with the instructions I'm seeing.

Here's the C code:

entry = xas_load(xas);
if (!entry || xa_is_internal(entry) ||
WARN_ON_ONCE(!xa_is_value(entry)) ||
!dax_is_locked(entry))
return entry;


I downloaded the .config that syzbot was using and built fs/dax.o.
Disassembling it, I get:

WARN_ON_ONCE(!xa_is_value(entry)) ||
5e53:   31 ff   xor%edi,%edi
5e55:   83 e0 01and$0x1,%eax
5e58:   48 89 c6mov%rax,%rsi
5e5b:   48 89 85 10 ff ff ffmov%rax,-0xf0(%rbp)
5e62:   e8 00 00 00 00  callq  5e67 
5e63: R_X86_64_PLT32__sanitizer_cov_trace_const_cmp8
-0x4
5e67:   48 8b 85 10 ff ff ffmov-0xf0(%rbp),%rax
5e6e:   48 85 c0test   %rax,%rax
5e71:   0f 85 34 fe ff ff   jne5cab 
5e77:   e8 00 00 00 00  callq  5e7c 
5e78: R_X86_64_PLT32__sanitizer_cov_trace_pc-0x4
5e7c:   0f 0b   ud2

That ud2 is the WARN_ON trigger.  It's at offset 0x3ac from the start of
get_unlocked_entry() so it matches the dump.

The key is 'test %rax,%rax' at 5e6e.  If that passes, we jump to 5cab
where we do xa_to_value() in order to implement dax_is_locked().  So is
'entry' actually 0x0004 like the dump says?

That's where I get confused.  The previous insn (5e67) loads rax from what
I presume is the stack, having previously stored it there at insn 535b.
But insn 5e51 should have reduced %rax to just 0 or 1 in the low bit of
the register.  So clearly I have my understanding of x86 insns confused.
If someone could help me out here, I'd be grateful.



Re: [PATCH v2] mtd: spi-nor: Add support for SPI boot flash access for AMD Family 16h

2018-10-30 Thread Grandbois, Brett

On 30/10/18 6:26 pm, Boris Brezillon wrote:
> On Mon, 29 Oct 2018 23:15:42 +
> "Grandbois, Brett"  wrote:
>
>> On 28/10/18 1:39 am, Boris Brezillon wrote:
>>> Hi Brett,
>>>
>>> On Tue, 16 Oct 2018 00:57:41 +
>>> "Grandbois, Brett"   wrote:
>>>   
 Add support to expose the SPI boot flash on AMD Family 16h CPUs as
 a standard mtd device to give userspace BIOS updaters greater
 feature support.  The BIOS and Kernel Developer's Guide refers to
 this as the 'SPI ROM' controller and so the driver follows that
 naming convention for consistency.
   
>>> We're currently trying to convert spi-nor controller drivers to the
>>> spi-mem interface [1]. Can you look at this new interface and tell
>>> me if you'd be able to implement it? If that's not possible, then
>>> I'd prefer to have this driver implement the mtd_info interface
>>> directly.
>> So from going over the spi-mem interface it looks like the intent is
>> for these sorts of devices to be a standard spi_controller with only
>> mem_ops defined and the transfer/_one/_one_message left as NULL?  Is
>> that correct?
> Yes
>
>> That's a bit of a pivot from how it's currently done
>> (it's conceptually similar to the intel-spi-pci driver so I was
>> following that)
> Yes, and that's exactly what I'd like to avoid. intel-spi-pci will
> probably be the trickiest conversion, so I'd like to avoid having
> another one ;-).
>
>> but I should be able to rework it to the new
>> interface.  This then lives under drivers/spi and thus should be
>> submitted to linux-spi?
> Actually, if the controller is only ever connected to the same SPI NOR
> chip (no need for advanced detection scheme) and does not support
> support Octo/Quad/Dual modes (or any other advanced features), you'll
> be better off implementing mtd->_read/_write/_erase() directly (the
> driver would then live in drivers/mtd/devices/).

Hmm, conceptually the device is better suited to the mtd layer than the spi 
layer.  The HW is designed to only ever access spi flash chips, not as a 
general spi master controller, so really the end result of it should always 
only ever be 1 mtd device.  Unfortunately as the device probing and command 
sequences for this are the same as implemented in spi-nor I'd either be 
duplicating a lot of existing code, or just wrapping around spi_nor_scan which 
sounds like we're back to the dedicated spi-nor controller you're trying to 
move away from.

The spi-mem interface ops do nicely match up with what the controller supports, 
including the new direct mapping mode which I'd be able to make use of, so as 
long as there aren't any issues with supporting only mem_ops and not the 
message transfers then it's probably the way to go.

   



  



[PATCH] perf top: Display the LBR stats in callchain entry

2018-10-30 Thread Jin Yao
Perf report has supported the displaying of LBR stats
(such as cycles, predicted%) in callchain entry.

For example,
perf report --branch-history --stdio

--1.01%--intel_idle mwait.h:29
  intel_idle cpufeature.h:164 (cycles:5)
  intel_idle cpufeature.h:164 (predicted:76.4%)
  intel_idle mwait.h:102 (cycles:41)
  intel_idle current.h:15

While perf top has not supported that.

For example,
perf top -a -b --call-graph branch

-   13.86% 0.23%  [kernel]  [k] __x86_indirect_thunk_rax
   - 13.65% __x86_indirect_thunk_rax
  + 1.69% do_syscall_64
  + 1.68% do_select
  + 1.41% ktime_get
  + 0.70% __schedule
  + 0.62% do_sys_poll
0.58% __x86_indirect_thunk_rax

Actually it's very easy to enable this feature in perf top.

With this patch, the result is:

perf top -a -b --call-graph branch

-   13.58% 0.00%  [kernel]  [k] __x86_indirect_thunk_rax
   - 13.57% __x86_indirect_thunk_rax (predicted:93.9%)
  + 1.78% do_select (cycles:2)
  + 1.68% perf_pmu_disable.part.99 (cycles:1)
  + 1.45% ___sys_recvmsg (cycles:25)
  + 0.81% unix_stream_sendmsg (cycles:18)
  + 0.80% ktime_get (cycles:400)
0.58% pick_next_task_fair (cycles:47)
  + 0.56% i915_request_retire (cycles:2)
  + 0.52% do_sys_poll (cycles:4)

Signed-off-by: Jin Yao 
---
 tools/perf/builtin-top.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index d21d875..7691b21 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -1420,6 +1420,9 @@ int cmd_top(int argc, const char **argv)
}
}
 
+   if (opts->branch_stack && callchain_param.enabled)
+   symbol_conf.show_branchflag_count = true;
+
sort__mode = SORT_MODE__TOP;
/* display thread wants entries to be collapsed in a different tree */
perf_hpp_list.need_collapse = 1;
-- 
2.7.4



[PATCH] perf top: Display the LBR stats in callchain entry

2018-10-30 Thread Jin Yao
Perf report has supported the displaying of LBR stats
(such as cycles, predicted%) in callchain entry.

For example,
perf report --branch-history --stdio

--1.01%--intel_idle mwait.h:29
  intel_idle cpufeature.h:164 (cycles:5)
  intel_idle cpufeature.h:164 (predicted:76.4%)
  intel_idle mwait.h:102 (cycles:41)
  intel_idle current.h:15

While perf top has not supported that.

For example,
perf top -a -b --call-graph branch

-   13.86% 0.23%  [kernel]  [k] __x86_indirect_thunk_rax
   - 13.65% __x86_indirect_thunk_rax
  + 1.69% do_syscall_64
  + 1.68% do_select
  + 1.41% ktime_get
  + 0.70% __schedule
  + 0.62% do_sys_poll
0.58% __x86_indirect_thunk_rax

Actually it's very easy to enable this feature in perf top.

With this patch, the result is:

perf top -a -b --call-graph branch

-   13.58% 0.00%  [kernel]  [k] __x86_indirect_thunk_rax
   - 13.57% __x86_indirect_thunk_rax (predicted:93.9%)
  + 1.78% do_select (cycles:2)
  + 1.68% perf_pmu_disable.part.99 (cycles:1)
  + 1.45% ___sys_recvmsg (cycles:25)
  + 0.81% unix_stream_sendmsg (cycles:18)
  + 0.80% ktime_get (cycles:400)
0.58% pick_next_task_fair (cycles:47)
  + 0.56% i915_request_retire (cycles:2)
  + 0.52% do_sys_poll (cycles:4)

Signed-off-by: Jin Yao 
---
 tools/perf/builtin-top.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index d21d875..7691b21 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -1420,6 +1420,9 @@ int cmd_top(int argc, const char **argv)
}
}
 
+   if (opts->branch_stack && callchain_param.enabled)
+   symbol_conf.show_branchflag_count = true;
+
sort__mode = SORT_MODE__TOP;
/* display thread wants entries to be collapsed in a different tree */
perf_hpp_list.need_collapse = 1;
-- 
2.7.4



Re: [PATCH v5 08/11] arm64: dts: allwinner: a64: pinebook: enable power supplies

2018-10-30 Thread Chen-Yu Tsai
On Wed, Oct 24, 2018 at 2:54 AM Oskari Lemmela  wrote:
>
> From: Vasily Khoruzhick 
>
> Pinebook has ACIN connector and 1 mAh battery.
>
> Signed-off-by: Vasily Khoruzhick 

You should have your own SoB following the author's when you resend other
people's patches. Otherwise,

Reviewed-by: Chen-Yu Tsai 


Re: [PATCH v5 08/11] arm64: dts: allwinner: a64: pinebook: enable power supplies

2018-10-30 Thread Chen-Yu Tsai
On Wed, Oct 24, 2018 at 2:54 AM Oskari Lemmela  wrote:
>
> From: Vasily Khoruzhick 
>
> Pinebook has ACIN connector and 1 mAh battery.
>
> Signed-off-by: Vasily Khoruzhick 

You should have your own SoB following the author's when you resend other
people's patches. Otherwise,

Reviewed-by: Chen-Yu Tsai 


Re: [PATCH v5 07/11] arm64: dts: allwinner: a64: sopine-baseboard: enable power supplies

2018-10-30 Thread Chen-Yu Tsai
On Wed, Oct 24, 2018 at 2:54 AM Oskari Lemmela  wrote:
>
> AXP803 ACIN pins are routed from SOM to the DC jack on the baseboard.
> AXP803 charger pins BATSENSE, LOADSENSE, N_BATDRV, LX_CHG, VIN_CHG
> and IPSOUT are connected via PMOS driver to SOM VBAT pins. VBAT and
> AXP803 TS pins are routed to the baseboard 3-pin battery connector.
>
> Signed-off-by: Oskari Lemmela 
> Reviewed-by: Quentin Schulz 

Reviewed-by: Chen-Yu Tsai 


Re: [PATCH v5 07/11] arm64: dts: allwinner: a64: sopine-baseboard: enable power supplies

2018-10-30 Thread Chen-Yu Tsai
On Wed, Oct 24, 2018 at 2:54 AM Oskari Lemmela  wrote:
>
> AXP803 ACIN pins are routed from SOM to the DC jack on the baseboard.
> AXP803 charger pins BATSENSE, LOADSENSE, N_BATDRV, LX_CHG, VIN_CHG
> and IPSOUT are connected via PMOS driver to SOM VBAT pins. VBAT and
> AXP803 TS pins are routed to the baseboard 3-pin battery connector.
>
> Signed-off-by: Oskari Lemmela 
> Reviewed-by: Quentin Schulz 

Reviewed-by: Chen-Yu Tsai 


Re: [PATCH v5 06/11] arm64: dts: allwinner: axp803: add AC and battery power supplies

2018-10-30 Thread Chen-Yu Tsai
On Wed, Oct 24, 2018 at 2:54 AM Oskari Lemmela  wrote:
>
> Parts of the AXP803 are compatible with their counterparts on the AXP813.
> Add DT nodes ADC, GPIO, AC and battery power supplies.
>
> Signed-off-by: Oskari Lemmela 
> Reviewed-by: Quentin Schulz 
> Tested-by: Vasily Khoruzhick 

Reviewed-by: Chen-Yu Tsai 


Re: [PATCH v5 06/11] arm64: dts: allwinner: axp803: add AC and battery power supplies

2018-10-30 Thread Chen-Yu Tsai
On Wed, Oct 24, 2018 at 2:54 AM Oskari Lemmela  wrote:
>
> Parts of the AXP803 are compatible with their counterparts on the AXP813.
> Add DT nodes ADC, GPIO, AC and battery power supplies.
>
> Signed-off-by: Oskari Lemmela 
> Reviewed-by: Quentin Schulz 
> Tested-by: Vasily Khoruzhick 

Reviewed-by: Chen-Yu Tsai 


Re: [PATCH v5 03/11] dt-bindings: gpio: gpio-axp209: add AXP803 GPIO bindings

2018-10-30 Thread Chen-Yu Tsai
On Wed, Oct 24, 2018 at 2:54 AM Oskari Lemmela  wrote:
>
> The AXP803 GPIO is compatible with AXP813 GPIO, but add
> specific compatible for it.
>
> Signed-off-by: Oskari Lemmela 
> ---
>  Documentation/devicetree/bindings/gpio/gpio-axp209.txt | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/Documentation/devicetree/bindings/gpio/gpio-axp209.txt 
> b/Documentation/devicetree/bindings/gpio/gpio-axp209.txt
> index fc42b2caa06d..5337a21d7bcf 100644
> --- a/Documentation/devicetree/bindings/gpio/gpio-axp209.txt
> +++ b/Documentation/devicetree/bindings/gpio/gpio-axp209.txt
> @@ -11,6 +11,7 @@ This driver employs the per-pin muxing pattern.
>  Required properties:
>  - compatible: Should be one of:
> - "x-powers,axp209-gpio"
> +   - "x-powers,axp803-gpio"
> - "x-powers,axp813-gpio"
>  - #gpio-cells: Should be two. The first cell is the pin number and the
>second is the GPIO flags.
> @@ -67,6 +68,7 @@ GPIO0 |   gpio_in, gpio_out, ldo, adc
>  GPIO1  |   gpio_in, gpio_out, ldo, adc
>  GPIO2  |   gpio_in, gpio_out
>
> +axp803
>  axp813
>  --
>  GPIO   |   Functions
> --

As mentioned in my reply to the cover letter, this patch isn't needed.

We ask people to add model-specific compatibles in the device tree
in case the hardware turns out not to be so compatible with the old
hardware we thought it was compatible with. With the model-specific
compatible in place, we have a way out. Without it, we'd have to ask
everyone to update their device trees, which annoys some people who'd
like to have some sort of backward compatibility.

If the model-specific + fallback compatibles combination was to be
listed, the line would include both compatibles. But we're not doing
that for Allwinner stuff. As mentioned it is there just in case. The
best outcome is we don't care and don't need to use them.

ChenYu


Re: [PATCH v5 03/11] dt-bindings: gpio: gpio-axp209: add AXP803 GPIO bindings

2018-10-30 Thread Chen-Yu Tsai
On Wed, Oct 24, 2018 at 2:54 AM Oskari Lemmela  wrote:
>
> The AXP803 GPIO is compatible with AXP813 GPIO, but add
> specific compatible for it.
>
> Signed-off-by: Oskari Lemmela 
> ---
>  Documentation/devicetree/bindings/gpio/gpio-axp209.txt | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/Documentation/devicetree/bindings/gpio/gpio-axp209.txt 
> b/Documentation/devicetree/bindings/gpio/gpio-axp209.txt
> index fc42b2caa06d..5337a21d7bcf 100644
> --- a/Documentation/devicetree/bindings/gpio/gpio-axp209.txt
> +++ b/Documentation/devicetree/bindings/gpio/gpio-axp209.txt
> @@ -11,6 +11,7 @@ This driver employs the per-pin muxing pattern.
>  Required properties:
>  - compatible: Should be one of:
> - "x-powers,axp209-gpio"
> +   - "x-powers,axp803-gpio"
> - "x-powers,axp813-gpio"
>  - #gpio-cells: Should be two. The first cell is the pin number and the
>second is the GPIO flags.
> @@ -67,6 +68,7 @@ GPIO0 |   gpio_in, gpio_out, ldo, adc
>  GPIO1  |   gpio_in, gpio_out, ldo, adc
>  GPIO2  |   gpio_in, gpio_out
>
> +axp803
>  axp813
>  --
>  GPIO   |   Functions
> --

As mentioned in my reply to the cover letter, this patch isn't needed.

We ask people to add model-specific compatibles in the device tree
in case the hardware turns out not to be so compatible with the old
hardware we thought it was compatible with. With the model-specific
compatible in place, we have a way out. Without it, we'd have to ask
everyone to update their device trees, which annoys some people who'd
like to have some sort of backward compatibility.

If the model-specific + fallback compatibles combination was to be
listed, the line would include both compatibles. But we're not doing
that for Allwinner stuff. As mentioned it is there just in case. The
best outcome is we don't care and don't need to use them.

ChenYu


Re: [RFC PATCH] Implement /proc/pid/kill

2018-10-30 Thread Aleksa Sarai
On 2018-10-31, Christian Brauner  wrote:
> > I think Aleksa's larger point is that it's useful to treat processes
> > as other file-descriptor-named, poll-able, wait-able resources.
> > Consistency is important. A process is just another system resource,
> > and like any other system resource, you should be open to hold a file
> > descriptor to it and do things to that process via that file
> > descriptor. The precise form of this process-handle FD is up for
> > debate. The existing /proc/$PID directory FD is a good candidate for a
> > process handle FD, since it does almost all of what's needed. But
> > regardless of what form a process handle FD takes, we need it. I don't
> > see a case for continuing to treat processes in a non-unixy,
> > non-file-descriptor-based manner.
> 
> That's what I'm proposing in the API for which I'm gathering feedback.
> I have presented parts of this in various discussions at LSS Europe last week
> and will be at LPC.
> We don't want to rush an API like this though. It was tried before in
> other forms
> and these proposals didn't make it.

:+1: on a well thought-out and generic proposal. As we've discussed
elsewhere, this is an issue that really would be great to (finally)
solve.

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH



signature.asc
Description: PGP signature


Re: [RFC PATCH] Implement /proc/pid/kill

2018-10-30 Thread Aleksa Sarai
On 2018-10-31, Christian Brauner  wrote:
> > I think Aleksa's larger point is that it's useful to treat processes
> > as other file-descriptor-named, poll-able, wait-able resources.
> > Consistency is important. A process is just another system resource,
> > and like any other system resource, you should be open to hold a file
> > descriptor to it and do things to that process via that file
> > descriptor. The precise form of this process-handle FD is up for
> > debate. The existing /proc/$PID directory FD is a good candidate for a
> > process handle FD, since it does almost all of what's needed. But
> > regardless of what form a process handle FD takes, we need it. I don't
> > see a case for continuing to treat processes in a non-unixy,
> > non-file-descriptor-based manner.
> 
> That's what I'm proposing in the API for which I'm gathering feedback.
> I have presented parts of this in various discussions at LSS Europe last week
> and will be at LPC.
> We don't want to rush an API like this though. It was tried before in
> other forms
> and these proposals didn't make it.

:+1: on a well thought-out and generic proposal. As we've discussed
elsewhere, this is an issue that really would be great to (finally)
solve.

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH



signature.asc
Description: PGP signature


Re: [PATCH v5 09/11] power: supply: add AC power supply driver for AXP813

2018-10-30 Thread Chen-Yu Tsai
On Wed, Oct 24, 2018 at 2:54 AM Oskari Lemmela  wrote:
>
> AXP813 and AXP803 PMICs can control input current and minimum voltage.
>
> Both of these values are configurable.
>
> Signed-off-by: Oskari Lemmela 
> Reviewed-by: Quentin Schulz 

Reviewed-by: Chen-Yu Tsai 


Re: [PATCH v5 09/11] power: supply: add AC power supply driver for AXP813

2018-10-30 Thread Chen-Yu Tsai
On Wed, Oct 24, 2018 at 2:54 AM Oskari Lemmela  wrote:
>
> AXP813 and AXP803 PMICs can control input current and minimum voltage.
>
> Both of these values are configurable.
>
> Signed-off-by: Oskari Lemmela 
> Reviewed-by: Quentin Schulz 

Reviewed-by: Chen-Yu Tsai 


Re: [PATCH v5 02/11] dt-bindings: power: supply: axp20x: add AXP803 power bindings

2018-10-30 Thread Chen-Yu Tsai
On Wed, Oct 24, 2018 at 2:54 AM Oskari Lemmela  wrote:
>
> The AXP803 power supplies are compatible with AXP813, but
> add specific compatibles for them.
>
> Signed-off-by: Oskari Lemmela 
> ---
>  .../devicetree/bindings/power/supply/axp20x_ac_power.txt | 1 +
>  .../devicetree/bindings/power/supply/axp20x_battery.txt  | 1 +
>  2 files changed, 2 insertions(+)
>
> diff --git 
> a/Documentation/devicetree/bindings/power/supply/axp20x_ac_power.txt 
> b/Documentation/devicetree/bindings/power/supply/axp20x_ac_power.txt
> index 7a1fb532abe5..acdeb4b8f4cc 100644
> --- a/Documentation/devicetree/bindings/power/supply/axp20x_ac_power.txt
> +++ b/Documentation/devicetree/bindings/power/supply/axp20x_ac_power.txt
> @@ -4,6 +4,7 @@ Required Properties:
>   - compatible: One of:
> "x-powers,axp202-ac-power-supply"
> "x-powers,axp221-ac-power-supply"
> +   "x-powers,axp803-ac-power-supply"
> "x-powers,axp813-ac-power-supply"
>
>  This node is a subnode of the axp20x PMIC.
> diff --git 
> a/Documentation/devicetree/bindings/power/supply/axp20x_battery.txt 
> b/Documentation/devicetree/bindings/power/supply/axp20x_battery.txt
> index 41916f69902c..780ebd7e3b84 100644
> --- a/Documentation/devicetree/bindings/power/supply/axp20x_battery.txt
> +++ b/Documentation/devicetree/bindings/power/supply/axp20x_battery.txt
> @@ -4,6 +4,7 @@ Required Properties:
>   - compatible, one of:
> "x-powers,axp209-battery-power-supply"
> "x-powers,axp221-battery-power-supply"
> +   "x-powers,axp803-battery-power-supply"
> "x-powers,axp813-battery-power-supply"
>
>  This node is a subnode of its respective PMIC DT node.

As mentioned in my reply to the cover letter, this patch isn't needed.

We ask people to add model-specific compatibles in the device tree
in case the hardware turns out not to be so compatible with the old
hardware we thought it was compatible with. With the model-specific
compatible in place, we have a way out. Without it, we'd have to ask
everyone to update their device trees, which annoys some people who'd
like to have some sort of backward compatibility.

If the model-specific + fallback compatibles combination was to be
listed, the line would include both compatibles. But we're not doing
that for Allwinner stuff. As mentioned it is there just in case. The
best outcome is we don't care and don't need to use them.

ChenYu


Re: [PATCH v5 02/11] dt-bindings: power: supply: axp20x: add AXP803 power bindings

2018-10-30 Thread Chen-Yu Tsai
On Wed, Oct 24, 2018 at 2:54 AM Oskari Lemmela  wrote:
>
> The AXP803 power supplies are compatible with AXP813, but
> add specific compatibles for them.
>
> Signed-off-by: Oskari Lemmela 
> ---
>  .../devicetree/bindings/power/supply/axp20x_ac_power.txt | 1 +
>  .../devicetree/bindings/power/supply/axp20x_battery.txt  | 1 +
>  2 files changed, 2 insertions(+)
>
> diff --git 
> a/Documentation/devicetree/bindings/power/supply/axp20x_ac_power.txt 
> b/Documentation/devicetree/bindings/power/supply/axp20x_ac_power.txt
> index 7a1fb532abe5..acdeb4b8f4cc 100644
> --- a/Documentation/devicetree/bindings/power/supply/axp20x_ac_power.txt
> +++ b/Documentation/devicetree/bindings/power/supply/axp20x_ac_power.txt
> @@ -4,6 +4,7 @@ Required Properties:
>   - compatible: One of:
> "x-powers,axp202-ac-power-supply"
> "x-powers,axp221-ac-power-supply"
> +   "x-powers,axp803-ac-power-supply"
> "x-powers,axp813-ac-power-supply"
>
>  This node is a subnode of the axp20x PMIC.
> diff --git 
> a/Documentation/devicetree/bindings/power/supply/axp20x_battery.txt 
> b/Documentation/devicetree/bindings/power/supply/axp20x_battery.txt
> index 41916f69902c..780ebd7e3b84 100644
> --- a/Documentation/devicetree/bindings/power/supply/axp20x_battery.txt
> +++ b/Documentation/devicetree/bindings/power/supply/axp20x_battery.txt
> @@ -4,6 +4,7 @@ Required Properties:
>   - compatible, one of:
> "x-powers,axp209-battery-power-supply"
> "x-powers,axp221-battery-power-supply"
> +   "x-powers,axp803-battery-power-supply"
> "x-powers,axp813-battery-power-supply"
>
>  This node is a subnode of its respective PMIC DT node.

As mentioned in my reply to the cover letter, this patch isn't needed.

We ask people to add model-specific compatibles in the device tree
in case the hardware turns out not to be so compatible with the old
hardware we thought it was compatible with. With the model-specific
compatible in place, we have a way out. Without it, we'd have to ask
everyone to update their device trees, which annoys some people who'd
like to have some sort of backward compatibility.

If the model-specific + fallback compatibles combination was to be
listed, the line would include both compatibles. But we're not doing
that for Allwinner stuff. As mentioned it is there just in case. The
best outcome is we don't care and don't need to use them.

ChenYu


Re: [PATCH v5 00/11] AXP8x3 AC and battery power supply support

2018-10-30 Thread Chen-Yu Tsai
On Fri, Oct 26, 2018 at 3:08 AM Sebastian Reichel
 wrote:
>
> Hi,
>
> Patches 1, 2 & 9 look good to me and do not seem to have any
> dependencies. I plan to merge them after the merge window
> for 4.20 closes.

Patches 2, 3 & 4 aren't needed. They aren't in the correct format
for model-specific + fallback compatibles either.

ChenYu

>
> -- Sebastian
>
> On Tue, Oct 23, 2018 at 09:53:19PM +0300, Oskari Lemmela wrote:
> > AXP813 AC power supply support with input current and
> > voltage limiting support.
> >
> > AXP803 AC and battery power supply support.
> >
> > Changes in v5:
> > * Return correct input current limit for values 0x6 and 0x7
> > * Add specific compatibles for AXP803
> > * Change commit messages
> > * Add Vasily Khoruzhick Pinebook DTS patch
> >
> > Changes in v4:
> > * Change order of axp20x-gpio in axp20x.c
> > * Fix indentation and spaces to tabs in axp20x.c
> >
> > Changes in v3:
> > * Reorder ac_power_supply DT node
> > * Rename axp20x_ac_power_set_property function
> > * Split mfd commit
> >
> > Changes in v2:
> > * Reuse axp813 compatibles for axp803
> > * Refactor axp20x_ac_power.c
> >
> > Oskari Lemmela (10):
> >   dt-bindings: power: supply: axp20x: add AXP813 AC power DT binding
> >   dt-bindings: power: supply: axp20x: add AXP803 power bindings
> >   dt-bindings: gpio: gpio-axp209: add AXP803 GPIO bindings
> >   dt-bindings: iio: adc: add AXP803 ADC bindings
> >   ARM: dts: axp81x: add AC power supply subnode
> >   arm64: dts: allwinner: axp803: add AC and battery power supplies
> >   arm64: dts: allwinner: a64: sopine-baseboard: enable power supplies
> >   power: supply: add AC power supply driver for AXP813
> >   mfd: axp20x: Add AC power supply cell for AXP813
> >   mfd: axp20x: Add supported cells for AXP803
> >
> > Vasily Khoruzhick (1):
> >   arm64: dts: allwinner: a64: pinebook: enable power supplies
> >
> >  .../devicetree/bindings/gpio/gpio-axp209.txt  |  2 +
> >  .../bindings/iio/adc/axp20x_adc.txt   |  2 +
> >  .../bindings/power/supply/axp20x_ac_power.txt |  4 +
> >  .../bindings/power/supply/axp20x_battery.txt  |  1 +
> >  arch/arm/boot/dts/axp81x.dtsi |  5 +
> >  arch/arm64/boot/dts/allwinner/axp803.dtsi | 33 +++
> >  .../dts/allwinner/sun50i-a64-pinebook.dts |  8 ++
> >  .../allwinner/sun50i-a64-sopine-baseboard.dts |  8 ++
> >  drivers/mfd/axp20x.c  | 20 
> >  drivers/power/supply/axp20x_ac_power.c| 94 +++
> >  include/linux/mfd/axp20x.h|  1 +
> >  11 files changed, 178 insertions(+)
> >
> > --
> > 2.17.1
> >


RE: scsi_set_medium_removal timeout issue

2018-10-30 Thread Zengtao (B)
Hi:

>-Original Message-
>From: Alan Stern [mailto:st...@rowland.harvard.edu]
>Sent: Tuesday, October 30, 2018 10:08 PM
>To: Zengtao (B) 
>Cc: j...@linux.vnet.ibm.com; martin.peter...@oracle.com;
>gre...@linuxfoundation.org; linux-s...@vger.kernel.org;
>linux-kernel@vger.kernel.org; linux-...@vger.kernel.org;
>usb-stor...@lists.one-eyed-alien.net
>Subject: Re: scsi_set_medium_removal timeout issue
>
>On Tue, 30 Oct 2018, Zengtao (B) wrote:
>
>> Hi
>>
>> I have recently met a scsi_set_medium_removal timeout issue, and it's
>> related  to both SCSI and USB MASS storage.
>> Since i am not an expert in either scsi or usb mass storage, i am
>> writing to report the issue and ask for a solution from you guys.
>>
>> My test scenario is as follow:
>> 1.Linux HOST-Linux mass storage gadget(the back store is a flash
>partition).
>> 2.Host mount the device.
>> 3.Host writes some data to the Mass storage device.
>> 4.Host Unmount the device.
>> Both Linux kernels(Host and Device) are Linux 4.9.
>> Some has reported the same issue a long time ago, but it remains there.
>> https://www.spinics.net/lists/linux-usb/msg53739.html
>>
>> For the issue itself, there is my observation:
>> In the step 4, the Host will issue an
>PREVENT_ALLOW_MEDIUM_REMOVAL scsi command.
>> and and timeout happens due to the device 's very slow
>fsg_lun_fsync_sub.
>
>Something is wrong here.  Before sending PREVENT-ALLOW MEDIUM
>REMOVAL, the host should issue SYNCHRONIZE CACHE.  This will force
>fsg_lun_fsync_sub to run, and the host should allow a long timeout for
>this command.  Then when PREVENT-ALLOW MEDIUM REMOVAL is sent,
>nothing will need to be flushed.
>

Definitely, I haven't seen the SYNCHRONIZE CACHE from the host, it directly
issued the PREVENT-ALLOW MEDIUM REMOVAL, so maybe something wrong
with the scsi layer or something wrong with the mass storage class driver?

Zengtao 

>Alan Stern
>
>> I found there are two methods to workaround the issue:
>> 1. Change the timeout value of host scsi command
>> PREVENT_ALLOW_MEDIUM_REMOVAL to to about 60 seconds from 10
>seconds.
>> 2. Remove the fsg_lun_fsync_sub in the device's Mass storage gadget
>driver.
>>
>> Thanks
>>
>> Regards
>> zentao



Re: [PATCH v5 00/11] AXP8x3 AC and battery power supply support

2018-10-30 Thread Chen-Yu Tsai
On Fri, Oct 26, 2018 at 3:08 AM Sebastian Reichel
 wrote:
>
> Hi,
>
> Patches 1, 2 & 9 look good to me and do not seem to have any
> dependencies. I plan to merge them after the merge window
> for 4.20 closes.

Patches 2, 3 & 4 aren't needed. They aren't in the correct format
for model-specific + fallback compatibles either.

ChenYu

>
> -- Sebastian
>
> On Tue, Oct 23, 2018 at 09:53:19PM +0300, Oskari Lemmela wrote:
> > AXP813 AC power supply support with input current and
> > voltage limiting support.
> >
> > AXP803 AC and battery power supply support.
> >
> > Changes in v5:
> > * Return correct input current limit for values 0x6 and 0x7
> > * Add specific compatibles for AXP803
> > * Change commit messages
> > * Add Vasily Khoruzhick Pinebook DTS patch
> >
> > Changes in v4:
> > * Change order of axp20x-gpio in axp20x.c
> > * Fix indentation and spaces to tabs in axp20x.c
> >
> > Changes in v3:
> > * Reorder ac_power_supply DT node
> > * Rename axp20x_ac_power_set_property function
> > * Split mfd commit
> >
> > Changes in v2:
> > * Reuse axp813 compatibles for axp803
> > * Refactor axp20x_ac_power.c
> >
> > Oskari Lemmela (10):
> >   dt-bindings: power: supply: axp20x: add AXP813 AC power DT binding
> >   dt-bindings: power: supply: axp20x: add AXP803 power bindings
> >   dt-bindings: gpio: gpio-axp209: add AXP803 GPIO bindings
> >   dt-bindings: iio: adc: add AXP803 ADC bindings
> >   ARM: dts: axp81x: add AC power supply subnode
> >   arm64: dts: allwinner: axp803: add AC and battery power supplies
> >   arm64: dts: allwinner: a64: sopine-baseboard: enable power supplies
> >   power: supply: add AC power supply driver for AXP813
> >   mfd: axp20x: Add AC power supply cell for AXP813
> >   mfd: axp20x: Add supported cells for AXP803
> >
> > Vasily Khoruzhick (1):
> >   arm64: dts: allwinner: a64: pinebook: enable power supplies
> >
> >  .../devicetree/bindings/gpio/gpio-axp209.txt  |  2 +
> >  .../bindings/iio/adc/axp20x_adc.txt   |  2 +
> >  .../bindings/power/supply/axp20x_ac_power.txt |  4 +
> >  .../bindings/power/supply/axp20x_battery.txt  |  1 +
> >  arch/arm/boot/dts/axp81x.dtsi |  5 +
> >  arch/arm64/boot/dts/allwinner/axp803.dtsi | 33 +++
> >  .../dts/allwinner/sun50i-a64-pinebook.dts |  8 ++
> >  .../allwinner/sun50i-a64-sopine-baseboard.dts |  8 ++
> >  drivers/mfd/axp20x.c  | 20 
> >  drivers/power/supply/axp20x_ac_power.c| 94 +++
> >  include/linux/mfd/axp20x.h|  1 +
> >  11 files changed, 178 insertions(+)
> >
> > --
> > 2.17.1
> >


RE: scsi_set_medium_removal timeout issue

2018-10-30 Thread Zengtao (B)
Hi:

>-Original Message-
>From: Alan Stern [mailto:st...@rowland.harvard.edu]
>Sent: Tuesday, October 30, 2018 10:08 PM
>To: Zengtao (B) 
>Cc: j...@linux.vnet.ibm.com; martin.peter...@oracle.com;
>gre...@linuxfoundation.org; linux-s...@vger.kernel.org;
>linux-kernel@vger.kernel.org; linux-...@vger.kernel.org;
>usb-stor...@lists.one-eyed-alien.net
>Subject: Re: scsi_set_medium_removal timeout issue
>
>On Tue, 30 Oct 2018, Zengtao (B) wrote:
>
>> Hi
>>
>> I have recently met a scsi_set_medium_removal timeout issue, and it's
>> related  to both SCSI and USB MASS storage.
>> Since i am not an expert in either scsi or usb mass storage, i am
>> writing to report the issue and ask for a solution from you guys.
>>
>> My test scenario is as follow:
>> 1.Linux HOST-Linux mass storage gadget(the back store is a flash
>partition).
>> 2.Host mount the device.
>> 3.Host writes some data to the Mass storage device.
>> 4.Host Unmount the device.
>> Both Linux kernels(Host and Device) are Linux 4.9.
>> Some has reported the same issue a long time ago, but it remains there.
>> https://www.spinics.net/lists/linux-usb/msg53739.html
>>
>> For the issue itself, there is my observation:
>> In the step 4, the Host will issue an
>PREVENT_ALLOW_MEDIUM_REMOVAL scsi command.
>> and and timeout happens due to the device 's very slow
>fsg_lun_fsync_sub.
>
>Something is wrong here.  Before sending PREVENT-ALLOW MEDIUM
>REMOVAL, the host should issue SYNCHRONIZE CACHE.  This will force
>fsg_lun_fsync_sub to run, and the host should allow a long timeout for
>this command.  Then when PREVENT-ALLOW MEDIUM REMOVAL is sent,
>nothing will need to be flushed.
>

Definitely, I haven't seen the SYNCHRONIZE CACHE from the host, it directly
issued the PREVENT-ALLOW MEDIUM REMOVAL, so maybe something wrong
with the scsi layer or something wrong with the mass storage class driver?

Zengtao 

>Alan Stern
>
>> I found there are two methods to workaround the issue:
>> 1. Change the timeout value of host scsi command
>> PREVENT_ALLOW_MEDIUM_REMOVAL to to about 60 seconds from 10
>seconds.
>> 2. Remove the fsg_lun_fsync_sub in the device's Mass storage gadget
>driver.
>>
>> Thanks
>>
>> Regards
>> zentao



Re: [PATCH v5 04/11] dt-bindings: iio: adc: add AXP803 ADC bindings

2018-10-30 Thread Chen-Yu Tsai
On Mon, Oct 29, 2018 at 9:10 PM Quentin Schulz
 wrote:
>
> Hi Jonathan,
>
> On Sun, Oct 28, 2018 at 03:40:11PM +, Jonathan Cameron wrote:
> > On Wed, 24 Oct 2018 08:56:33 -0500
> > Rob Herring  wrote:
> >
> > > On Tue, 23 Oct 2018 21:53:23 +0300, Oskari Lemmela wrote:
> > > > The AXP803 ADC is compatible with AXP813 ADC, but add
> > > > specific compatible for it.
> > > >
> > > > Signed-off-by: Oskari Lemmela 
> > > > ---
> > > >  Documentation/devicetree/bindings/iio/adc/axp20x_adc.txt | 2 ++
> > > >  1 file changed, 2 insertions(+)
> > > >
> > >
> > > Reviewed-by: Rob Herring 
> >
> > This doesn't seem to have any dependencies with the other patches
> > so applied to the togreg branch of iio.git and pushed out as testing
> > for the autobuilders to ignore.  However I am a little curious to know
> > why we would add the ID and then not use it (that I can see)?
> >
>
> Sometimes with Allwinner (and X-Powers), two IPs seem identical until we
> discover something that is slightly different. When this happens, we
> have to add a compatible to differentiate both. However, we would also
> need to change the Device Tree to change the compatible. We would need
> to handle the driver behaviour for both Device Trees.
>
> So better anticipate a possible difference so that we don't have to do
> some hacks in the driver to handle the device correctly.
>
> As always, Chen-Yu or Maxime may know better so I'm just stating what I
> seem to recall.

With Allwinner stuff (X-Powers included), sometimes the documents are
incomplete or have errors. We tend to add a model-specific compatible
just in case things turn out not to be so compatible, unless someone
has triple-checked everything, documents and actual hardware included.

However we don't actually document these, so this patch isn't strictly
needed. (I suppose this might annoy the device tree binding maintainers.)

ChenYu


Re: [PATCH v5 04/11] dt-bindings: iio: adc: add AXP803 ADC bindings

2018-10-30 Thread Chen-Yu Tsai
On Mon, Oct 29, 2018 at 9:10 PM Quentin Schulz
 wrote:
>
> Hi Jonathan,
>
> On Sun, Oct 28, 2018 at 03:40:11PM +, Jonathan Cameron wrote:
> > On Wed, 24 Oct 2018 08:56:33 -0500
> > Rob Herring  wrote:
> >
> > > On Tue, 23 Oct 2018 21:53:23 +0300, Oskari Lemmela wrote:
> > > > The AXP803 ADC is compatible with AXP813 ADC, but add
> > > > specific compatible for it.
> > > >
> > > > Signed-off-by: Oskari Lemmela 
> > > > ---
> > > >  Documentation/devicetree/bindings/iio/adc/axp20x_adc.txt | 2 ++
> > > >  1 file changed, 2 insertions(+)
> > > >
> > >
> > > Reviewed-by: Rob Herring 
> >
> > This doesn't seem to have any dependencies with the other patches
> > so applied to the togreg branch of iio.git and pushed out as testing
> > for the autobuilders to ignore.  However I am a little curious to know
> > why we would add the ID and then not use it (that I can see)?
> >
>
> Sometimes with Allwinner (and X-Powers), two IPs seem identical until we
> discover something that is slightly different. When this happens, we
> have to add a compatible to differentiate both. However, we would also
> need to change the Device Tree to change the compatible. We would need
> to handle the driver behaviour for both Device Trees.
>
> So better anticipate a possible difference so that we don't have to do
> some hacks in the driver to handle the device correctly.
>
> As always, Chen-Yu or Maxime may know better so I'm just stating what I
> seem to recall.

With Allwinner stuff (X-Powers included), sometimes the documents are
incomplete or have errors. We tend to add a model-specific compatible
just in case things turn out not to be so compatible, unless someone
has triple-checked everything, documents and actual hardware included.

However we don't actually document these, so this patch isn't strictly
needed. (I suppose this might annoy the device tree binding maintainers.)

ChenYu


Re: [PATCH v1 2/2] x86/hyperv: make HvNotifyLongSpinWait hypercall

2018-10-30 Thread Yi Sun
After syncing with Hyper-V team, we have got answers as below.

On 18-10-24 16:53:00, Michael Kelley wrote:
> From: Yi Sun   Sent: Friday, October 19, 2018 6:14 
> AM
> > 
> > The HvNotifyLongSpinWait hypercall (HVCALL_NOTIFY_LONG_SPIN_WAIT)
> > is used by a guest OS to notify the hypervisor that the calling
> > virtual processor is attempting to acquire a resource that is
> > potentially held by another virtual processor within the same
> > Virtual Machine. This scheduling hint improves the scalability of
> > VMs with more than one virtual processor on Hyper-V.
> > 
> > Per MSFT TLFS, the retry number (SpinWaitInfo) is sent to hypervisor
> > only when the retry number exceeds the recommended number. If
> > recommended number is 0x, never retry.
> 
> The HvNotifyLongSpinWait hypercall should be understood to be
> advisory only.  As you noted, it is a scheduling hint to the
> hypervisor that some virtual CPU in the VM holds a spin lock.  Even
> though Linux knows which vCPU holds the spin lock, the hypercall
> does not provide a way to give that information to Hyper-V.  The
> hypercall always returns immediately.
> 
> The "retry number" is a bit mis-named in the Hyper-V Top Level
> Functional Spec (TLFS).  It is essentially a threshold value.  Hyper-V is
> saying "don't bother to advise me about the spin lock until you have
> done a certain number of spins."  This threshold prevents
> over-notifying Hyper-V such that the notification becomes somewhat
> meaningless.   It's not immediately clear to me why the hypercall passes
> that value as an input, but maybe it lets the Hyper-V scheduler prioritize
> among vCPUs based on how many times they have spun for a lock.  I
> think we were told that current Hyper-V implementations ignore this
> input value anyway.
> 
> I believe the description of the sentinel value 0x in the
> Hyper-V TLFS is incorrect.  Because it is the max possible threshold
> value, that value in the EBX register just means to not ever bother to
> notify.   The description should be "0x indicates never to notify."
> The value does *not* indicate anything about retrying to obtain the
> spin lock.
> 
You are right. 0x only indicates never to notify. We should not
break the spin loop.

> >  static bool __initdata hv_pvspin = true;
> > 
> > +bool hv_notify_long_spin_wait(int retry_num)
> 
> retry_num should probably be declared as unsigned int.  You
> don't want it to be treated as a negative number if the high
> order bit is set.
> 
Yes, thanks!

> > +{
> > +   /*
> > +* Per MSFT TLFS, the SpinWaitInfo is sent to hypervisor only when
> > +* the retry number exceeds the recommended number.
> > +*
> > +* If recommended number is 0x, never retry.
> > +*/
> > +   if (ms_hyperv.num_spin_retry == HYPERV_SPINLOCK_RETRY_NEVER)
> > +   return false;
> > +
> > +   if ((0 == retry_num % ms_hyperv.num_spin_retry) && retry_num)
> 
> I don't know if the "%" function is right here.  Your implementation will
> notify Hyper-V on every multiple of num_spin_retry.   The alternative is to
> notify once when the threshold is exceeded, and never again for this
> particular attempt to obtain a spin lock.  We should check with the Hyper-V
> team for which approach they expect to be used.
> 
We should send the notification on every multiple of the recommended
number.

> > +   hv_do_fast_hypercall8(HVCALL_NOTIFY_LONG_SPIN_WAIT,
> > + retry_num);
> 
> The Hyper-V TLFS seems to be inconsistent on whether the input parameter
> is 32-bits or 64-bits.   In one place it is typed as UINT64, but in another 
> place
> it is shown as only 4 bytes.  Need to clear this up with the Hyper-V team as
> well.
> 
It is 32-bits.

> > +
> > +   return true;
> 
> I don't see a need for this function to return true vs. false.  Any calling 
> code
> should not change its behavior based on num_spin_retry.   This function will
> either notify Hyper-V or not notify Hyper-V, depending on whether the number
> of attempts to obtain the spinlock meets the threshold.  But calling code will
> do the same thing regardless of whether such a notification is made. 
> 
You are right. I will change it to 'void'.

> Michael


Re: [PATCH v1 2/2] x86/hyperv: make HvNotifyLongSpinWait hypercall

2018-10-30 Thread Yi Sun
After syncing with Hyper-V team, we have got answers as below.

On 18-10-24 16:53:00, Michael Kelley wrote:
> From: Yi Sun   Sent: Friday, October 19, 2018 6:14 
> AM
> > 
> > The HvNotifyLongSpinWait hypercall (HVCALL_NOTIFY_LONG_SPIN_WAIT)
> > is used by a guest OS to notify the hypervisor that the calling
> > virtual processor is attempting to acquire a resource that is
> > potentially held by another virtual processor within the same
> > Virtual Machine. This scheduling hint improves the scalability of
> > VMs with more than one virtual processor on Hyper-V.
> > 
> > Per MSFT TLFS, the retry number (SpinWaitInfo) is sent to hypervisor
> > only when the retry number exceeds the recommended number. If
> > recommended number is 0x, never retry.
> 
> The HvNotifyLongSpinWait hypercall should be understood to be
> advisory only.  As you noted, it is a scheduling hint to the
> hypervisor that some virtual CPU in the VM holds a spin lock.  Even
> though Linux knows which vCPU holds the spin lock, the hypercall
> does not provide a way to give that information to Hyper-V.  The
> hypercall always returns immediately.
> 
> The "retry number" is a bit mis-named in the Hyper-V Top Level
> Functional Spec (TLFS).  It is essentially a threshold value.  Hyper-V is
> saying "don't bother to advise me about the spin lock until you have
> done a certain number of spins."  This threshold prevents
> over-notifying Hyper-V such that the notification becomes somewhat
> meaningless.   It's not immediately clear to me why the hypercall passes
> that value as an input, but maybe it lets the Hyper-V scheduler prioritize
> among vCPUs based on how many times they have spun for a lock.  I
> think we were told that current Hyper-V implementations ignore this
> input value anyway.
> 
> I believe the description of the sentinel value 0x in the
> Hyper-V TLFS is incorrect.  Because it is the max possible threshold
> value, that value in the EBX register just means to not ever bother to
> notify.   The description should be "0x indicates never to notify."
> The value does *not* indicate anything about retrying to obtain the
> spin lock.
> 
You are right. 0x only indicates never to notify. We should not
break the spin loop.

> >  static bool __initdata hv_pvspin = true;
> > 
> > +bool hv_notify_long_spin_wait(int retry_num)
> 
> retry_num should probably be declared as unsigned int.  You
> don't want it to be treated as a negative number if the high
> order bit is set.
> 
Yes, thanks!

> > +{
> > +   /*
> > +* Per MSFT TLFS, the SpinWaitInfo is sent to hypervisor only when
> > +* the retry number exceeds the recommended number.
> > +*
> > +* If recommended number is 0x, never retry.
> > +*/
> > +   if (ms_hyperv.num_spin_retry == HYPERV_SPINLOCK_RETRY_NEVER)
> > +   return false;
> > +
> > +   if ((0 == retry_num % ms_hyperv.num_spin_retry) && retry_num)
> 
> I don't know if the "%" function is right here.  Your implementation will
> notify Hyper-V on every multiple of num_spin_retry.   The alternative is to
> notify once when the threshold is exceeded, and never again for this
> particular attempt to obtain a spin lock.  We should check with the Hyper-V
> team for which approach they expect to be used.
> 
We should send the notification on every multiple of the recommended
number.

> > +   hv_do_fast_hypercall8(HVCALL_NOTIFY_LONG_SPIN_WAIT,
> > + retry_num);
> 
> The Hyper-V TLFS seems to be inconsistent on whether the input parameter
> is 32-bits or 64-bits.   In one place it is typed as UINT64, but in another 
> place
> it is shown as only 4 bytes.  Need to clear this up with the Hyper-V team as
> well.
> 
It is 32-bits.

> > +
> > +   return true;
> 
> I don't see a need for this function to return true vs. false.  Any calling 
> code
> should not change its behavior based on num_spin_retry.   This function will
> either notify Hyper-V or not notify Hyper-V, depending on whether the number
> of attempts to obtain the spinlock meets the threshold.  But calling code will
> do the same thing regardless of whether such a notification is made. 
> 
You are right. I will change it to 'void'.

> Michael


Re: [RFC PATCH] Implement /proc/pid/kill

2018-10-30 Thread Daniel Colascione
On Wed, Oct 31, 2018 at 12:42 AM, Joel Fernandes  wrote:
> On Wed, Oct 31, 2018 at 09:49:08AM +1100, Aleksa Sarai wrote:
>> On 2018-10-30, Joel Fernandes  wrote:
>> > > > [...]
>> > > > > > > (Unfortunately
>> > > > > > > there are lots of things that make it a bit difficult to use 
>> > > > > > > /proc/$pid
>> > > > > > > exclusively for introspection of a process -- especially in the 
>> > > > > > > context
>> > > > > > > of containers.)
>> > > > > >
>> > > > > > Tons of things already break without a working /proc. What do you 
>> > > > > > have in mind?
>> > > > >
>> > > > > Heh, if only that was the only blocker. :P
>> > > > >
>> > > > > The basic problem is that currently container runtimes either depend 
>> > > > > on
>> > > > > some non-transient on-disk state (which becomes invalid on machine
>> > > > > reboots or dead processes and so on), or on long-running processes 
>> > > > > that
>> > > > > keep file descriptors required for administration of a container 
>> > > > > alive
>> > > > > (think O_PATH to /dev/pts/ptmx to avoid malicious container 
>> > > > > filesystem
>> > > > > attacks). Usually both.
>> > > > >
>> > > > > What would be really useful would be having some way of "hiding 
>> > > > > away" a
>> > > > > mount namespace (of the pid1 of the container) that has all of the
>> > > > > information and bind-mounts-to-file-descriptors that are necessary 
>> > > > > for
>> > > > > administration. If the container's pid1 dies all of the transient 
>> > > > > state
>> > > > > has disappeared automatically -- because the stashed mount namespace 
>> > > > > has
>> > > > > died. In addition, if this was done the way I'm thinking with (and 
>> > > > > this
>> > > > > is the contentious bit) hierarchical mount namespaces you could make 
>> > > > > it
>> > > > > so that the pid1 could not manipulate its current mount namespace to
>> > > > > confuse the administrative process. You would also then create an
>> > > > > intermediate user namespace to help with several race conditions 
>> > > > > (that
>> > > > > have caused security bugs like CVE-2016-9962) we've seen when joining
>> > > > > containers.
>> > > > >
>> > > > > Unfortunately this all depends on hierarchical mount namespaces (and
>> > > > > note that this would just be that NS_GET_PARENT gives you the mount
>> > > > > namespace that it was created in -- I'm not suggesting we redesign 
>> > > > > peers
>> > > > > or anything like that). This makes it basically a non-starter.
>> > > > >
>> > > > > But if, on top of this ground-work, we then referenced containers
>> > > > > entirely via an fd to /proc/$pid then you could also avoid PID reuse
>> > > > > races (as well as being able to find out implicitly whether a 
>> > > > > container
>> > > > > has died thanks to the error semantics of /proc/$pid). And that's the
>> > > > > way I would suggest doing it (if we had these other things in place).
>> > > >
>> > > > I didn't fully follow exactly what you mean. If you can explain for the
>> > > > layman who doesn't know much experience with containers..
>> > > >
>> > > > Are you saying that keeping open a /proc/$pid directory handle is not
>> > > > sufficient to prevent PID reuse while the proc entries under 
>> > > > /proc/$pid are
>> > > > being looked into? If its not sufficient, then isn't that a bug? If it 
>> > > > is
>> > > > sufficient, then can we not just keep the handle open while we do 
>> > > > whatever we
>> > > > want under /proc/$pid ?
>> > >
>> > > Sorry, I went on a bit of a tangent about various internals of container
>> > > runtimes. My main point is that I would love to use /proc/$pid because
>> > > it makes reuse handling very trivial and is always correct, but that
>> > > there are things which stop us from being able to use it for everything
>> > > (which is what my incoherent rambling was on about).
>> >
>> > Ok thanks. So I am guessing if the following sequence works, then Dan's 
>> > patch is not
>> > needed.
>> >
>> > 1. open /proc/ directory
>> > 2. inspect /proc/ or do whatever with 
>> > 3. Issue the kill on 
>> > 4. Close the /proc/ directory opened in step 1.
>> >
>> > So unless I missed something, the above sequence will not cause any PID 
>> > reuse
>> > races.
>>
>> (Sorry, I misunderstood your original question.)
>>
>> The problem is that holding /proc/$pid doesn't stop the PID from dying
>> and being reused. The benefit of holding open /proc/$pid is that you
>> will get an error if you try to use it *after* the PID has died -- which
>> means that you don't need to worry about explicitly checking for PID
>> reuse if you are only operating with the file descriptor and not the
>> PID.
>>
>> So that sequence won't always work. There is a race where the pid might
>> die and be recycled by the time you call kill(2) -- after you've done
>> step 2. By tying step 2 and 3 together -- in this patch -- you remove
>> the race (since in order to resolve the "kill" procfs file VFS must
>> resolve the PID first -- 

Re: [RFC PATCH] Implement /proc/pid/kill

2018-10-30 Thread Daniel Colascione
On Wed, Oct 31, 2018 at 12:42 AM, Joel Fernandes  wrote:
> On Wed, Oct 31, 2018 at 09:49:08AM +1100, Aleksa Sarai wrote:
>> On 2018-10-30, Joel Fernandes  wrote:
>> > > > [...]
>> > > > > > > (Unfortunately
>> > > > > > > there are lots of things that make it a bit difficult to use 
>> > > > > > > /proc/$pid
>> > > > > > > exclusively for introspection of a process -- especially in the 
>> > > > > > > context
>> > > > > > > of containers.)
>> > > > > >
>> > > > > > Tons of things already break without a working /proc. What do you 
>> > > > > > have in mind?
>> > > > >
>> > > > > Heh, if only that was the only blocker. :P
>> > > > >
>> > > > > The basic problem is that currently container runtimes either depend 
>> > > > > on
>> > > > > some non-transient on-disk state (which becomes invalid on machine
>> > > > > reboots or dead processes and so on), or on long-running processes 
>> > > > > that
>> > > > > keep file descriptors required for administration of a container 
>> > > > > alive
>> > > > > (think O_PATH to /dev/pts/ptmx to avoid malicious container 
>> > > > > filesystem
>> > > > > attacks). Usually both.
>> > > > >
>> > > > > What would be really useful would be having some way of "hiding 
>> > > > > away" a
>> > > > > mount namespace (of the pid1 of the container) that has all of the
>> > > > > information and bind-mounts-to-file-descriptors that are necessary 
>> > > > > for
>> > > > > administration. If the container's pid1 dies all of the transient 
>> > > > > state
>> > > > > has disappeared automatically -- because the stashed mount namespace 
>> > > > > has
>> > > > > died. In addition, if this was done the way I'm thinking with (and 
>> > > > > this
>> > > > > is the contentious bit) hierarchical mount namespaces you could make 
>> > > > > it
>> > > > > so that the pid1 could not manipulate its current mount namespace to
>> > > > > confuse the administrative process. You would also then create an
>> > > > > intermediate user namespace to help with several race conditions 
>> > > > > (that
>> > > > > have caused security bugs like CVE-2016-9962) we've seen when joining
>> > > > > containers.
>> > > > >
>> > > > > Unfortunately this all depends on hierarchical mount namespaces (and
>> > > > > note that this would just be that NS_GET_PARENT gives you the mount
>> > > > > namespace that it was created in -- I'm not suggesting we redesign 
>> > > > > peers
>> > > > > or anything like that). This makes it basically a non-starter.
>> > > > >
>> > > > > But if, on top of this ground-work, we then referenced containers
>> > > > > entirely via an fd to /proc/$pid then you could also avoid PID reuse
>> > > > > races (as well as being able to find out implicitly whether a 
>> > > > > container
>> > > > > has died thanks to the error semantics of /proc/$pid). And that's the
>> > > > > way I would suggest doing it (if we had these other things in place).
>> > > >
>> > > > I didn't fully follow exactly what you mean. If you can explain for the
>> > > > layman who doesn't know much experience with containers..
>> > > >
>> > > > Are you saying that keeping open a /proc/$pid directory handle is not
>> > > > sufficient to prevent PID reuse while the proc entries under 
>> > > > /proc/$pid are
>> > > > being looked into? If its not sufficient, then isn't that a bug? If it 
>> > > > is
>> > > > sufficient, then can we not just keep the handle open while we do 
>> > > > whatever we
>> > > > want under /proc/$pid ?
>> > >
>> > > Sorry, I went on a bit of a tangent about various internals of container
>> > > runtimes. My main point is that I would love to use /proc/$pid because
>> > > it makes reuse handling very trivial and is always correct, but that
>> > > there are things which stop us from being able to use it for everything
>> > > (which is what my incoherent rambling was on about).
>> >
>> > Ok thanks. So I am guessing if the following sequence works, then Dan's 
>> > patch is not
>> > needed.
>> >
>> > 1. open /proc/ directory
>> > 2. inspect /proc/ or do whatever with 
>> > 3. Issue the kill on 
>> > 4. Close the /proc/ directory opened in step 1.
>> >
>> > So unless I missed something, the above sequence will not cause any PID 
>> > reuse
>> > races.
>>
>> (Sorry, I misunderstood your original question.)
>>
>> The problem is that holding /proc/$pid doesn't stop the PID from dying
>> and being reused. The benefit of holding open /proc/$pid is that you
>> will get an error if you try to use it *after* the PID has died -- which
>> means that you don't need to worry about explicitly checking for PID
>> reuse if you are only operating with the file descriptor and not the
>> PID.
>>
>> So that sequence won't always work. There is a race where the pid might
>> die and be recycled by the time you call kill(2) -- after you've done
>> step 2. By tying step 2 and 3 together -- in this patch -- you remove
>> the race (since in order to resolve the "kill" procfs file VFS must
>> resolve the PID first -- 

Re: [RFC PATCH] Implement /proc/pid/kill

2018-10-30 Thread Daniel Colascione
On Wed, Oct 31, 2018 at 12:57 AM, Joel Fernandes  wrote:
> On Tue, Oct 30, 2018 at 11:10:47PM +, Daniel Colascione wrote:
>> On Tue, Oct 30, 2018 at 10:33 PM, Joel Fernandes  
>> wrote:
>> > On Wed, Oct 31, 2018 at 09:23:39AM +1100, Aleksa Sarai wrote:
>> >> On 2018-10-30, Joel Fernandes  wrote:
>> >> > On Wed, Oct 31, 2018 at 07:45:01AM +1100, Aleksa Sarai wrote:
>> >> > [...]
>> >> > > > > (Unfortunately
>> >> > > > > there are lots of things that make it a bit difficult to use 
>> >> > > > > /proc/$pid
>> >> > > > > exclusively for introspection of a process -- especially in the 
>> >> > > > > context
>> >> > > > > of containers.)
>> >> > > >
>> >> > > > Tons of things already break without a working /proc. What do you 
>> >> > > > have in mind?
>> >> > >
>> >> > > Heh, if only that was the only blocker. :P
>> >> > >
>> >> > > The basic problem is that currently container runtimes either depend 
>> >> > > on
>> >> > > some non-transient on-disk state (which becomes invalid on machine
>> >> > > reboots or dead processes and so on), or on long-running processes 
>> >> > > that
>> >> > > keep file descriptors required for administration of a container alive
>> >> > > (think O_PATH to /dev/pts/ptmx to avoid malicious container filesystem
>> >> > > attacks). Usually both.
>> >> > >
>> >> > > What would be really useful would be having some way of "hiding away" 
>> >> > > a
>> >> > > mount namespace (of the pid1 of the container) that has all of the
>> >> > > information and bind-mounts-to-file-descriptors that are necessary for
>> >> > > administration. If the container's pid1 dies all of the transient 
>> >> > > state
>> >> > > has disappeared automatically -- because the stashed mount namespace 
>> >> > > has
>> >> > > died. In addition, if this was done the way I'm thinking with (and 
>> >> > > this
>> >> > > is the contentious bit) hierarchical mount namespaces you could make 
>> >> > > it
>> >> > > so that the pid1 could not manipulate its current mount namespace to
>> >> > > confuse the administrative process. You would also then create an
>> >> > > intermediate user namespace to help with several race conditions (that
>> >> > > have caused security bugs like CVE-2016-9962) we've seen when joining
>> >> > > containers.
>> >> > >
>> >> > > Unfortunately this all depends on hierarchical mount namespaces (and
>> >> > > note that this would just be that NS_GET_PARENT gives you the mount
>> >> > > namespace that it was created in -- I'm not suggesting we redesign 
>> >> > > peers
>> >> > > or anything like that). This makes it basically a non-starter.
>> >> > >
>> >> > > But if, on top of this ground-work, we then referenced containers
>> >> > > entirely via an fd to /proc/$pid then you could also avoid PID reuse
>> >> > > races (as well as being able to find out implicitly whether a 
>> >> > > container
>> >> > > has died thanks to the error semantics of /proc/$pid). And that's the
>> >> > > way I would suggest doing it (if we had these other things in place).
>> >> >
>> >> > I didn't fully follow exactly what you mean. If you can explain for the
>> >> > layman who doesn't know much experience with containers..
>> >> >
>> >> > Are you saying that keeping open a /proc/$pid directory handle is not
>> >> > sufficient to prevent PID reuse while the proc entries under /proc/$pid 
>> >> > are
>> >> > being looked into? If its not sufficient, then isn't that a bug? If it 
>> >> > is
>> >> > sufficient, then can we not just keep the handle open while we do 
>> >> > whatever we
>> >> > want under /proc/$pid ?
>> >>
>> >> Sorry, I went on a bit of a tangent about various internals of container
>> >> runtimes. My main point is that I would love to use /proc/$pid because
>> >> it makes reuse handling very trivial and is always correct, but that
>> >> there are things which stop us from being able to use it for everything
>> >> (which is what my incoherent rambling was on about).
>> >
>> > Ok thanks. So I am guessing if the following sequence works, then Dan's 
>> > patch is not
>> > needed.
>> >
>> > 1. open /proc/ directory
>> > 2. inspect /proc/ or do whatever with 
>> > 3. Issue the kill on 
>> > 4. Close the /proc/ directory opened in step 1.
>> >
>> > So unless I missed something, the above sequence will not cause any PID 
>> > reuse
>> > races.
>>
>> Keeping a /proc/$PID directory file descriptor open does not prevent
>> $PID being used to name some other process. If it could, you could
>> pretty quickly fill a whole system's process table. See the program
>> below, which demonstrates the PID collision.
>
> I know. We both were not sure about that earlier, that's why I requested you
> to write the program when we were privately chatting. Now I'm sure because
> Aleska answered that and the you program you wrote showed that too.

I don't think that this behavior was ever in doubt from my side.

> I wonder if this cannot be plumbed by just making the /proc/$PID directory
> opens hold a reference to task_struct 

Re: [PATCH v1 2/2] x86/hyperv: make HvNotifyLongSpinWait hypercall

2018-10-30 Thread Yi Sun
On 18-10-23 17:33:28, Yi Sun wrote:
> On 18-10-23 10:51:27, Peter Zijlstra wrote:
> > On Tue, Oct 23, 2018 at 10:57:40AM +0800, Yi Sun wrote:
> > > On 18-10-22 19:15:16, Peter Zijlstra wrote:
> > 
> > > > >  +#if defined(CONFIG_X86_64) && 
> > > > >  defined(CONFIG_PARAVIRT_SPINLOCKS) && IS_ENABLED(CONFIG_HYPERV)
> > > > >  +if 
> > > > >  (!hv_notify_long_spin_wait(SPIN_THRESHOLD - loop))
> > > > >  +break;
> > > > >  +#endif
> > > > 
> > > > Secondly; how come you thought that was acceptable in any way shape or
> > > > form?
> > > > 
> > > Sorry for that. Will try another way.
> > 
> > Can you try and explain why vcpu_is_preempted() doesn't work for you?
> 
> I thought HvSpinWaitInfo is used to notify hypervisor the spin number
> which is different with vcpu_is_preempted. So I did not consider
> vcpu_is_preempted.
> 
> But HvSpinWaitInfo is a quite simple function and could be combined
> with vcpu_is_preempted together. So I think it is OK to use
> vcpu_is_preempted to make codes clean. I will have a try.

After checking codes, there is one issue to call vcpu_is_preempted.
There are two spin loops in qspinlock_paravirt.h. One loop in
'pv_wait_node' calls vcpu_is_preempted. But another loop in
'pv_wait_head_or_lock' does not call vcpu_is_preempted. It also does
not call any other ops of 'pv_lock_ops' in the loop. So I am afraid
we have to add one more ops in 'pv_lock_ops' to do this.


Re: [RFC PATCH] Implement /proc/pid/kill

2018-10-30 Thread Daniel Colascione
On Wed, Oct 31, 2018 at 12:57 AM, Joel Fernandes  wrote:
> On Tue, Oct 30, 2018 at 11:10:47PM +, Daniel Colascione wrote:
>> On Tue, Oct 30, 2018 at 10:33 PM, Joel Fernandes  
>> wrote:
>> > On Wed, Oct 31, 2018 at 09:23:39AM +1100, Aleksa Sarai wrote:
>> >> On 2018-10-30, Joel Fernandes  wrote:
>> >> > On Wed, Oct 31, 2018 at 07:45:01AM +1100, Aleksa Sarai wrote:
>> >> > [...]
>> >> > > > > (Unfortunately
>> >> > > > > there are lots of things that make it a bit difficult to use 
>> >> > > > > /proc/$pid
>> >> > > > > exclusively for introspection of a process -- especially in the 
>> >> > > > > context
>> >> > > > > of containers.)
>> >> > > >
>> >> > > > Tons of things already break without a working /proc. What do you 
>> >> > > > have in mind?
>> >> > >
>> >> > > Heh, if only that was the only blocker. :P
>> >> > >
>> >> > > The basic problem is that currently container runtimes either depend 
>> >> > > on
>> >> > > some non-transient on-disk state (which becomes invalid on machine
>> >> > > reboots or dead processes and so on), or on long-running processes 
>> >> > > that
>> >> > > keep file descriptors required for administration of a container alive
>> >> > > (think O_PATH to /dev/pts/ptmx to avoid malicious container filesystem
>> >> > > attacks). Usually both.
>> >> > >
>> >> > > What would be really useful would be having some way of "hiding away" 
>> >> > > a
>> >> > > mount namespace (of the pid1 of the container) that has all of the
>> >> > > information and bind-mounts-to-file-descriptors that are necessary for
>> >> > > administration. If the container's pid1 dies all of the transient 
>> >> > > state
>> >> > > has disappeared automatically -- because the stashed mount namespace 
>> >> > > has
>> >> > > died. In addition, if this was done the way I'm thinking with (and 
>> >> > > this
>> >> > > is the contentious bit) hierarchical mount namespaces you could make 
>> >> > > it
>> >> > > so that the pid1 could not manipulate its current mount namespace to
>> >> > > confuse the administrative process. You would also then create an
>> >> > > intermediate user namespace to help with several race conditions (that
>> >> > > have caused security bugs like CVE-2016-9962) we've seen when joining
>> >> > > containers.
>> >> > >
>> >> > > Unfortunately this all depends on hierarchical mount namespaces (and
>> >> > > note that this would just be that NS_GET_PARENT gives you the mount
>> >> > > namespace that it was created in -- I'm not suggesting we redesign 
>> >> > > peers
>> >> > > or anything like that). This makes it basically a non-starter.
>> >> > >
>> >> > > But if, on top of this ground-work, we then referenced containers
>> >> > > entirely via an fd to /proc/$pid then you could also avoid PID reuse
>> >> > > races (as well as being able to find out implicitly whether a 
>> >> > > container
>> >> > > has died thanks to the error semantics of /proc/$pid). And that's the
>> >> > > way I would suggest doing it (if we had these other things in place).
>> >> >
>> >> > I didn't fully follow exactly what you mean. If you can explain for the
>> >> > layman who doesn't know much experience with containers..
>> >> >
>> >> > Are you saying that keeping open a /proc/$pid directory handle is not
>> >> > sufficient to prevent PID reuse while the proc entries under /proc/$pid 
>> >> > are
>> >> > being looked into? If its not sufficient, then isn't that a bug? If it 
>> >> > is
>> >> > sufficient, then can we not just keep the handle open while we do 
>> >> > whatever we
>> >> > want under /proc/$pid ?
>> >>
>> >> Sorry, I went on a bit of a tangent about various internals of container
>> >> runtimes. My main point is that I would love to use /proc/$pid because
>> >> it makes reuse handling very trivial and is always correct, but that
>> >> there are things which stop us from being able to use it for everything
>> >> (which is what my incoherent rambling was on about).
>> >
>> > Ok thanks. So I am guessing if the following sequence works, then Dan's 
>> > patch is not
>> > needed.
>> >
>> > 1. open /proc/ directory
>> > 2. inspect /proc/ or do whatever with 
>> > 3. Issue the kill on 
>> > 4. Close the /proc/ directory opened in step 1.
>> >
>> > So unless I missed something, the above sequence will not cause any PID 
>> > reuse
>> > races.
>>
>> Keeping a /proc/$PID directory file descriptor open does not prevent
>> $PID being used to name some other process. If it could, you could
>> pretty quickly fill a whole system's process table. See the program
>> below, which demonstrates the PID collision.
>
> I know. We both were not sure about that earlier, that's why I requested you
> to write the program when we were privately chatting. Now I'm sure because
> Aleska answered that and the you program you wrote showed that too.

I don't think that this behavior was ever in doubt from my side.

> I wonder if this cannot be plumbed by just making the /proc/$PID directory
> opens hold a reference to task_struct 

Re: [PATCH v1 2/2] x86/hyperv: make HvNotifyLongSpinWait hypercall

2018-10-30 Thread Yi Sun
On 18-10-23 17:33:28, Yi Sun wrote:
> On 18-10-23 10:51:27, Peter Zijlstra wrote:
> > On Tue, Oct 23, 2018 at 10:57:40AM +0800, Yi Sun wrote:
> > > On 18-10-22 19:15:16, Peter Zijlstra wrote:
> > 
> > > > >  +#if defined(CONFIG_X86_64) && 
> > > > >  defined(CONFIG_PARAVIRT_SPINLOCKS) && IS_ENABLED(CONFIG_HYPERV)
> > > > >  +if 
> > > > >  (!hv_notify_long_spin_wait(SPIN_THRESHOLD - loop))
> > > > >  +break;
> > > > >  +#endif
> > > > 
> > > > Secondly; how come you thought that was acceptable in any way shape or
> > > > form?
> > > > 
> > > Sorry for that. Will try another way.
> > 
> > Can you try and explain why vcpu_is_preempted() doesn't work for you?
> 
> I thought HvSpinWaitInfo is used to notify hypervisor the spin number
> which is different with vcpu_is_preempted. So I did not consider
> vcpu_is_preempted.
> 
> But HvSpinWaitInfo is a quite simple function and could be combined
> with vcpu_is_preempted together. So I think it is OK to use
> vcpu_is_preempted to make codes clean. I will have a try.

After checking codes, there is one issue to call vcpu_is_preempted.
There are two spin loops in qspinlock_paravirt.h. One loop in
'pv_wait_node' calls vcpu_is_preempted. But another loop in
'pv_wait_head_or_lock' does not call vcpu_is_preempted. It also does
not call any other ops of 'pv_lock_ops' in the loop. So I am afraid
we have to add one more ops in 'pv_lock_ops' to do this.


net/sunrpc/auth_gss/gss_krb5_seal.c:144:14: error: implicit declaration of function 'cmpxchg64'; did you mean 'cmpxchg'?

2018-10-30 Thread kbuild test robot
Hi Arnd,

FYI, the error/warning still remains.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
master
head:   310c7585e8300ddc46211df0757c11e4299ec482
commit: 21924765862a0871908a35cb0e53e2e1c169b888 SUNRPC: use cmpxchg64() in 
gss_seq_send64_fetch_and_inc()
date:   4 weeks ago
config: mips-nlm_xlr_defconfig (attached as .config)
compiler: mips-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
git checkout 21924765862a0871908a35cb0e53e2e1c169b888
# save the attached .config to linux build tree
GCC_VERSION=7.2.0 make.cross ARCH=mips 

All errors (new ones prefixed by >>):

   net/sunrpc/auth_gss/gss_krb5_seal.c: In function 
'gss_seq_send64_fetch_and_inc':
>> net/sunrpc/auth_gss/gss_krb5_seal.c:144:14: error: implicit declaration of 
>> function 'cmpxchg64'; did you mean 'cmpxchg'? 
>> [-Werror=implicit-function-declaration]
  seq_send = cmpxchg64(>seq_send64, old, old + 1);
 ^
 cmpxchg
   cc1: some warnings being treated as errors

vim +144 net/sunrpc/auth_gss/gss_krb5_seal.c

   136  
   137  u64
   138  gss_seq_send64_fetch_and_inc(struct krb5_ctx *ctx)
   139  {
   140  u64 old, seq_send = READ_ONCE(ctx->seq_send);
   141  
   142  do {
   143  old = seq_send;
 > 144  seq_send = cmpxchg64(>seq_send64, old, old + 1);
   145  } while (old != seq_send);
   146  return seq_send;
   147  }
   148  

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


net/sunrpc/auth_gss/gss_krb5_seal.c:144:14: error: implicit declaration of function 'cmpxchg64'; did you mean 'cmpxchg'?

2018-10-30 Thread kbuild test robot
Hi Arnd,

FYI, the error/warning still remains.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
master
head:   310c7585e8300ddc46211df0757c11e4299ec482
commit: 21924765862a0871908a35cb0e53e2e1c169b888 SUNRPC: use cmpxchg64() in 
gss_seq_send64_fetch_and_inc()
date:   4 weeks ago
config: mips-nlm_xlr_defconfig (attached as .config)
compiler: mips-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
git checkout 21924765862a0871908a35cb0e53e2e1c169b888
# save the attached .config to linux build tree
GCC_VERSION=7.2.0 make.cross ARCH=mips 

All errors (new ones prefixed by >>):

   net/sunrpc/auth_gss/gss_krb5_seal.c: In function 
'gss_seq_send64_fetch_and_inc':
>> net/sunrpc/auth_gss/gss_krb5_seal.c:144:14: error: implicit declaration of 
>> function 'cmpxchg64'; did you mean 'cmpxchg'? 
>> [-Werror=implicit-function-declaration]
  seq_send = cmpxchg64(>seq_send64, old, old + 1);
 ^
 cmpxchg
   cc1: some warnings being treated as errors

vim +144 net/sunrpc/auth_gss/gss_krb5_seal.c

   136  
   137  u64
   138  gss_seq_send64_fetch_and_inc(struct krb5_ctx *ctx)
   139  {
   140  u64 old, seq_send = READ_ONCE(ctx->seq_send);
   141  
   142  do {
   143  old = seq_send;
 > 144  seq_send = cmpxchg64(>seq_send64, old, old + 1);
   145  } while (old != seq_send);
   146  return seq_send;
   147  }
   148  

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


Re: [PATCH v8 1/2] seccomp: add a return code to trap to userspace

2018-10-30 Thread Kees Cook
On Tue, Oct 30, 2018 at 5:29 PM, Tycho Andersen  wrote:
> On Tue, Oct 30, 2018 at 03:34:54PM -0700, Kees Cook wrote:
>> On Tue, Oct 30, 2018 at 3:32 PM, Tycho Andersen  wrote:
>> > On Tue, Oct 30, 2018 at 03:00:17PM -0700, Kees Cook wrote:
>> >> On Tue, Oct 30, 2018 at 2:54 PM, Tycho Andersen  wrote:
>> >> > On Tue, Oct 30, 2018 at 02:49:21PM -0700, Kees Cook wrote:
>> >> >> On Mon, Oct 29, 2018 at 3:40 PM, Tycho Andersen  wrote:
>> >> >> > * switch to a flags based future-proofing mechanism for struct
>> >> >> >   seccomp_notif and seccomp_notif_resp, thus avoiding version 
>> >> >> > issues
>> >> >> >   with structure length (Kees)
>> >> >> [...]
>> >> >> >
>> >> >> > +struct seccomp_notif {
>> >> >> > +   __u64 id;
>> >> >> > +   __u32 pid;
>> >> >> > +   __u32 flags;
>> >> >> > +   struct seccomp_data data;
>> >> >> > +};
>> >> >> > +
>> >> >> > +struct seccomp_notif_resp {
>> >> >> > +   __u64 id;
>> >> >> > +   __s64 val;
>> >> >> > +   __s32 error;
>> >> >> > +   __u32 flags;
>> >> >> > +};
>> >> >>
>> >> >> Hrm, so, what's the plan for when struct seccomp_data changes size?
>> >> >
>> >> > I guess my plan was don't ever change the size again, just use flags
>> >> > and have extra state available via ioctl().
>> >> >
>> >> >> I'm realizing that it might be "too late" for userspace to discover
>> >> >> it's running on a newer kernel. i.e. it gets a user notification, and
>> >> >> discovers flags it doesn't know how to handle. Do we actually need
>> >> >> both flags AND a length? Designing UAPI is frustrating! :)
>> >> >
>> >> > :). I don't see this as such a big problem -- in fact it's better than
>> >> > the length mode, where you don't know what you don't know, because it
>> >> > only copied as much info as you could handle. Older userspace would
>> >> > simply not use information it didn't know how to use.
>> >> >
>> >> >> Do we need another ioctl to discover the seccomp_data size maybe?
>> >> >
>> >> > That could be an option as well, assuming we agree that size would
>> >> > work, which I thought we didn't?
>> >>
>> >> Size alone wasn't able to determine the layout of the seccomp_notif
>> >> structure since it had holes (in the prior version). seccomp_data
>> >> doesn't have holes and is likely to change in size (see the recent
>> >> thread on adding the MPK register to it...)
>> >
>> > Oh, sorry, I misread this as seccomp_notif, not seccomp_data.
>> >
>> >> I'm trying to imagine the right API for this. A portable user of
>> >> seccomp_notif expects the id/pid/flags/data to always be in the same
>> >> place, but it's the size of seccomp_data that may change. So it wants
>> >> to allocate space for seccomp_notif header and "everything else", of
>> >> which is may only understand the start of seccomp_data (and ignore any
>> >> new trailing fields).
>> >>
>> >> So... perhaps the "how big are things?" ioctl would report the header
>> >> size and the seccomp_data size. Then both are flexible. And flags
>> >> would be left as a way to "version" the header?
>> >>
>> >> Any Linux API list members want to chime in here?
>> >
>> > So:
>> >
>> > struct seccomp_notify_sizes {
>> > u16 seccomp_notify;
>> > u16 seccomp_data;
>> > };
>> >
>> > ioctl(fd, SECCOMP_IOCTL_GET_SIZE, );
>> >
>> > This would be only one extra syscall over the lifetime of the listener
>> > process, which doesn't seem too bad. One thing that's slightly
>> > annoying is that you can't do it until you actually get an event, so
>> > maybe it could be a command on the seccomp syscall instead:
>> >
>> > seccomp(SECCOMP_GET_NOTIF_SIZES, 0, );
>>
>> Yeah, top-level makes more sense. u16 seems fine too.
>
> So one problem is this is that the third argument of the seccomp
> syscall is declared as const char, so I get:
>
> kernel/seccomp.c: In function ‘seccomp_get_notif_sizes’:
> kernel/seccomp.c:1401:19: warning: passing argument 1 of ‘copy_to_user’ 
> discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
>   if (copy_to_user(usizes, , sizeof(sizes)))
>^~
> In file included from ./include/linux/compat.h:19:0,
>  from kernel/seccomp.c:19:
> ./include/linux/uaccess.h:152:1: note: expected ‘void *’ but argument is of 
> type ‘const char *’
>  copy_to_user(void __user *to, const void *from, unsigned long n)
>  ^~~~
>
> If I drop the const it doesn't complain, but I'm not sure what the protocol is
> for changing the types of syscall declarations. In principle it doesn't really
> mean anything, but...

I think this should be fine. It's documented as "void *"...

-- 
Kees Cook


Re: [PATCH v8 1/2] seccomp: add a return code to trap to userspace

2018-10-30 Thread Kees Cook
On Tue, Oct 30, 2018 at 5:29 PM, Tycho Andersen  wrote:
> On Tue, Oct 30, 2018 at 03:34:54PM -0700, Kees Cook wrote:
>> On Tue, Oct 30, 2018 at 3:32 PM, Tycho Andersen  wrote:
>> > On Tue, Oct 30, 2018 at 03:00:17PM -0700, Kees Cook wrote:
>> >> On Tue, Oct 30, 2018 at 2:54 PM, Tycho Andersen  wrote:
>> >> > On Tue, Oct 30, 2018 at 02:49:21PM -0700, Kees Cook wrote:
>> >> >> On Mon, Oct 29, 2018 at 3:40 PM, Tycho Andersen  wrote:
>> >> >> > * switch to a flags based future-proofing mechanism for struct
>> >> >> >   seccomp_notif and seccomp_notif_resp, thus avoiding version 
>> >> >> > issues
>> >> >> >   with structure length (Kees)
>> >> >> [...]
>> >> >> >
>> >> >> > +struct seccomp_notif {
>> >> >> > +   __u64 id;
>> >> >> > +   __u32 pid;
>> >> >> > +   __u32 flags;
>> >> >> > +   struct seccomp_data data;
>> >> >> > +};
>> >> >> > +
>> >> >> > +struct seccomp_notif_resp {
>> >> >> > +   __u64 id;
>> >> >> > +   __s64 val;
>> >> >> > +   __s32 error;
>> >> >> > +   __u32 flags;
>> >> >> > +};
>> >> >>
>> >> >> Hrm, so, what's the plan for when struct seccomp_data changes size?
>> >> >
>> >> > I guess my plan was don't ever change the size again, just use flags
>> >> > and have extra state available via ioctl().
>> >> >
>> >> >> I'm realizing that it might be "too late" for userspace to discover
>> >> >> it's running on a newer kernel. i.e. it gets a user notification, and
>> >> >> discovers flags it doesn't know how to handle. Do we actually need
>> >> >> both flags AND a length? Designing UAPI is frustrating! :)
>> >> >
>> >> > :). I don't see this as such a big problem -- in fact it's better than
>> >> > the length mode, where you don't know what you don't know, because it
>> >> > only copied as much info as you could handle. Older userspace would
>> >> > simply not use information it didn't know how to use.
>> >> >
>> >> >> Do we need another ioctl to discover the seccomp_data size maybe?
>> >> >
>> >> > That could be an option as well, assuming we agree that size would
>> >> > work, which I thought we didn't?
>> >>
>> >> Size alone wasn't able to determine the layout of the seccomp_notif
>> >> structure since it had holes (in the prior version). seccomp_data
>> >> doesn't have holes and is likely to change in size (see the recent
>> >> thread on adding the MPK register to it...)
>> >
>> > Oh, sorry, I misread this as seccomp_notif, not seccomp_data.
>> >
>> >> I'm trying to imagine the right API for this. A portable user of
>> >> seccomp_notif expects the id/pid/flags/data to always be in the same
>> >> place, but it's the size of seccomp_data that may change. So it wants
>> >> to allocate space for seccomp_notif header and "everything else", of
>> >> which is may only understand the start of seccomp_data (and ignore any
>> >> new trailing fields).
>> >>
>> >> So... perhaps the "how big are things?" ioctl would report the header
>> >> size and the seccomp_data size. Then both are flexible. And flags
>> >> would be left as a way to "version" the header?
>> >>
>> >> Any Linux API list members want to chime in here?
>> >
>> > So:
>> >
>> > struct seccomp_notify_sizes {
>> > u16 seccomp_notify;
>> > u16 seccomp_data;
>> > };
>> >
>> > ioctl(fd, SECCOMP_IOCTL_GET_SIZE, );
>> >
>> > This would be only one extra syscall over the lifetime of the listener
>> > process, which doesn't seem too bad. One thing that's slightly
>> > annoying is that you can't do it until you actually get an event, so
>> > maybe it could be a command on the seccomp syscall instead:
>> >
>> > seccomp(SECCOMP_GET_NOTIF_SIZES, 0, );
>>
>> Yeah, top-level makes more sense. u16 seems fine too.
>
> So one problem is this is that the third argument of the seccomp
> syscall is declared as const char, so I get:
>
> kernel/seccomp.c: In function ‘seccomp_get_notif_sizes’:
> kernel/seccomp.c:1401:19: warning: passing argument 1 of ‘copy_to_user’ 
> discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
>   if (copy_to_user(usizes, , sizeof(sizes)))
>^~
> In file included from ./include/linux/compat.h:19:0,
>  from kernel/seccomp.c:19:
> ./include/linux/uaccess.h:152:1: note: expected ‘void *’ but argument is of 
> type ‘const char *’
>  copy_to_user(void __user *to, const void *from, unsigned long n)
>  ^~~~
>
> If I drop the const it doesn't complain, but I'm not sure what the protocol is
> for changing the types of syscall declarations. In principle it doesn't really
> mean anything, but...

I think this should be fine. It's documented as "void *"...

-- 
Kees Cook


[GIT PULL] Thermal management updates for v4.20-rc1

2018-10-30 Thread Zhang Rui
Hi, Linus,

Please pull from
  git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux.git next

to receive the latest Thermal management updates for v4.20-rc1 with
top-most commit c2b59d279dbbac750958f6a1bc4841e431d934e3:

  thermal: core: using power_efficient_wq for thermal worker (2018-10-
10 21:48:50 +0800)

on top of commit 0238df646e6224016a45505d2c111a24669ebe21:

  Linux 4.19-rc7 (2018-10-07 17:26:02 +0200)

Specifics:
- Fix a use-after-free issue when unregistering a thermal cooling
device.(Dmitry Osipenko)

- use power_efficient_wq for thermal worker to save more power. (Jeson
Gao)

thanks,
rui


Dmitry Osipenko (1):
  thermal: core: Fix use-after-free in
thermal_cooling_device_destroy_sysfs

Jeson Gao (1):
  thermal: core: using power_efficient_wq for thermal worker

 drivers/thermal/thermal_core.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)


[GIT PULL] Thermal management updates for v4.20-rc1

2018-10-30 Thread Zhang Rui
Hi, Linus,

Please pull from
  git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux.git next

to receive the latest Thermal management updates for v4.20-rc1 with
top-most commit c2b59d279dbbac750958f6a1bc4841e431d934e3:

  thermal: core: using power_efficient_wq for thermal worker (2018-10-
10 21:48:50 +0800)

on top of commit 0238df646e6224016a45505d2c111a24669ebe21:

  Linux 4.19-rc7 (2018-10-07 17:26:02 +0200)

Specifics:
- Fix a use-after-free issue when unregistering a thermal cooling
device.(Dmitry Osipenko)

- use power_efficient_wq for thermal worker to save more power. (Jeson
Gao)

thanks,
rui


Dmitry Osipenko (1):
  thermal: core: Fix use-after-free in
thermal_cooling_device_destroy_sysfs

Jeson Gao (1):
  thermal: core: using power_efficient_wq for thermal worker

 drivers/thermal/thermal_core.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)


make[2]: *** No rule to make target 'arch/xtensa/boot/dts/csp.dtb', needed by '__build'.

2018-10-30 Thread kbuild test robot
Hi Rob,

FYI, the error/warning still remains.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
master
head:   310c7585e8300ddc46211df0757c11e4299ec482
commit: 37c8a5fafa3bb7dcdd51774be353be6cb2912b86 kbuild: consolidate Devicetree 
dtb build rules
date:   4 weeks ago
config: xtensa-defconfig (attached as .config)
compiler: xtensa-linux-gcc (GCC) 8.1.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
git checkout 37c8a5fafa3bb7dcdd51774be353be6cb2912b86
# save the attached .config to linux build tree
GCC_VERSION=8.1.0 make.cross ARCH=xtensa 

All errors (new ones prefixed by >>):

>> make[2]: *** No rule to make target 'arch/xtensa/boot/dts/csp.dtb', needed 
>> by '__build'.
>> make[2]: *** No rule to make target 'arch/xtensa/boot/dts/lx200mx.dtb', 
>> needed by '__build'.
>> make[2]: *** No rule to make target 'arch/xtensa/boot/dts/ml605.dtb', needed 
>> by '__build'.
>> make[2]: *** No rule to make target 'arch/xtensa/boot/dts/kc705_nommu.dtb', 
>> needed by '__build'.
>> make[2]: *** No rule to make target 'arch/xtensa/boot/dts/kc705.dtb', needed 
>> by '__build'.
>> make[2]: *** No rule to make target 'arch/xtensa/boot/dts/lx60.dtb', needed 
>> by '__build'.
   make[2]: Target '__build' not remade because of errors.

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


Re: linux-next: manual merge of the vfs tree with the xfs tree

2018-10-30 Thread Dave Chinner
On Wed, Oct 31, 2018 at 11:52:47AM +1100, Stephen Rothwell wrote:
> Hi all,
> 
> [I don't understand why all this new work turned up in the xfs tree
> during the merge window ...]
> 
> Today's linux-next merge of the vfs tree got a conflict in:
> 
>   fs/read_write.c
> 
> between commits:
> 
>   42ec3d4c0218 ("vfs: make remap_file_range functions take and return bytes 
> completed")
>   eca3654e3cc7 ("vfs: enable remap callers that can handle short operations")
> 
> from the xfs tree and commit:
> 
>   5de4480ae7f8 ("vfs: allow dedupe of user owned read-only files")
> 
> from the vfs tree.
> 
> I fixed it up (see below) and can carry the fix as necessary. This
> is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your tree
> is submitted for merging.  You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any particularly
> complex conflicts.

Looks ok. I didn't expect this conflict, but looks simple enough
to resolve. Thanks!

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


make[2]: *** No rule to make target 'arch/xtensa/boot/dts/csp.dtb', needed by '__build'.

2018-10-30 Thread kbuild test robot
Hi Rob,

FYI, the error/warning still remains.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
master
head:   310c7585e8300ddc46211df0757c11e4299ec482
commit: 37c8a5fafa3bb7dcdd51774be353be6cb2912b86 kbuild: consolidate Devicetree 
dtb build rules
date:   4 weeks ago
config: xtensa-defconfig (attached as .config)
compiler: xtensa-linux-gcc (GCC) 8.1.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
git checkout 37c8a5fafa3bb7dcdd51774be353be6cb2912b86
# save the attached .config to linux build tree
GCC_VERSION=8.1.0 make.cross ARCH=xtensa 

All errors (new ones prefixed by >>):

>> make[2]: *** No rule to make target 'arch/xtensa/boot/dts/csp.dtb', needed 
>> by '__build'.
>> make[2]: *** No rule to make target 'arch/xtensa/boot/dts/lx200mx.dtb', 
>> needed by '__build'.
>> make[2]: *** No rule to make target 'arch/xtensa/boot/dts/ml605.dtb', needed 
>> by '__build'.
>> make[2]: *** No rule to make target 'arch/xtensa/boot/dts/kc705_nommu.dtb', 
>> needed by '__build'.
>> make[2]: *** No rule to make target 'arch/xtensa/boot/dts/kc705.dtb', needed 
>> by '__build'.
>> make[2]: *** No rule to make target 'arch/xtensa/boot/dts/lx60.dtb', needed 
>> by '__build'.
   make[2]: Target '__build' not remade because of errors.

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


Re: linux-next: manual merge of the vfs tree with the xfs tree

2018-10-30 Thread Dave Chinner
On Wed, Oct 31, 2018 at 11:52:47AM +1100, Stephen Rothwell wrote:
> Hi all,
> 
> [I don't understand why all this new work turned up in the xfs tree
> during the merge window ...]
> 
> Today's linux-next merge of the vfs tree got a conflict in:
> 
>   fs/read_write.c
> 
> between commits:
> 
>   42ec3d4c0218 ("vfs: make remap_file_range functions take and return bytes 
> completed")
>   eca3654e3cc7 ("vfs: enable remap callers that can handle short operations")
> 
> from the xfs tree and commit:
> 
>   5de4480ae7f8 ("vfs: allow dedupe of user owned read-only files")
> 
> from the vfs tree.
> 
> I fixed it up (see below) and can carry the fix as necessary. This
> is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your tree
> is submitted for merging.  You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any particularly
> complex conflicts.

Looks ok. I didn't expect this conflict, but looks simple enough
to resolve. Thanks!

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [RFC] doc: rcu: remove note on smp_mb during synchronize_rcu

2018-10-30 Thread Joel Fernandes
Hi Paul,

On Tue, Oct 30, 2018 at 04:43:36PM -0700, Paul E. McKenney wrote:
> On Tue, Oct 30, 2018 at 03:26:49PM -0700, Joel Fernandes wrote:
> > Hi Paul,
> > 
> > On Sat, Oct 27, 2018 at 09:30:46PM -0700, Joel Fernandes (Google) wrote:
> > > As per this thread [1], it seems this smp_mb isn't needed anymore:
> > > "So the smp_mb() that I was trying to add doesn't need to be there."
> > > 
> > > So let us remove this part from the memory ordering documentation.
> > > 
> > > [1] https://lkml.org/lkml/2017/10/6/707
> > > 
> > > Signed-off-by: Joel Fernandes (Google) 
> > 
> > I was just checking about this patch. Do you feel it is correct to remove
> > this part from the docs? Are you satisified that a barrier isn't needed 
> > there
> > now? Or did I miss something?
> 
> Apologies, it got lost in the shuffle.  I have now applied it with a
> bit of rework to the commit log, thank you!

No worries, thanks for taking it!

Just wanted to update you on my progress reading/correcting the docs. The
'Memory Ordering' is taking a bit of time so I paused that and I'm focusing
on finishing all the other low hanging fruit. This activity is mostly during
night hours after the baby is asleep but some times I also manage to sneak it
into the day job ;-)

BTW I do want to discuss about this smp_mb patch above with you at LPC if you
had time, even though we are removing it from the documentation. I thought
about it a few times, and I was not able to fully appreciate the need for the
barrier (that is even assuming that complete() etc did not do the right
thing).  Specifically I was wondering same thing Peter said in the above
thread I think that - if that rcu_read_unlock() triggered all the spin
locking up the tree of nodes, then why is that locking not sufficient to
prevent reads from the read-side section from bleeding out? That would
prevent the reader that just unlocked from seeing anything that happens
_after_ the synchronize_rcu.

Also about GP memory ordering and RCU-tree-locking, I think you mentioned to
me that the RCU reader-sections are virtually extended both forward and
backward and whereever it ends, those paths do heavy-weight synchronization
that should be sufficient to prevent memory ordering issues (such as those
you mentioned in the Requierments document). That is exactly why we don't
need explicit barriers during rcu_read_unlock. If I recall I asked you why
those are not needed. So that answer made sense, but then now on going
through the 'Memory Ordering' document, I see that you mentioned there is
reliance on the locking. Is that reliance on locking necessary to maintain
ordering then?

Or did I miss the points completely? :(

--
TODO list of the index file marking which ones I have finished perusing:

arrayRCU.txtDONE
checklist.txt   DONE
listRCU.txt DONE
lockdep.txt DONE
lockdep-splat.txt   DONE
NMI-RCU.txt
rcu_dereference.txt
rcubarrier.txt
rculist_nulls.txt
rcuref.txt
rcu.txt
RTFP.txtDONE
stallwarn.txt   DONE
torture.txt
UP.txt
whatisRCU.txt   DONE

Design
 - Data-Structures  DONE
 - Requirements DONE
 - Expedited-Grace-Periods
 - Memory Ordering  next



Re: [RFC] doc: rcu: remove note on smp_mb during synchronize_rcu

2018-10-30 Thread Joel Fernandes
Hi Paul,

On Tue, Oct 30, 2018 at 04:43:36PM -0700, Paul E. McKenney wrote:
> On Tue, Oct 30, 2018 at 03:26:49PM -0700, Joel Fernandes wrote:
> > Hi Paul,
> > 
> > On Sat, Oct 27, 2018 at 09:30:46PM -0700, Joel Fernandes (Google) wrote:
> > > As per this thread [1], it seems this smp_mb isn't needed anymore:
> > > "So the smp_mb() that I was trying to add doesn't need to be there."
> > > 
> > > So let us remove this part from the memory ordering documentation.
> > > 
> > > [1] https://lkml.org/lkml/2017/10/6/707
> > > 
> > > Signed-off-by: Joel Fernandes (Google) 
> > 
> > I was just checking about this patch. Do you feel it is correct to remove
> > this part from the docs? Are you satisified that a barrier isn't needed 
> > there
> > now? Or did I miss something?
> 
> Apologies, it got lost in the shuffle.  I have now applied it with a
> bit of rework to the commit log, thank you!

No worries, thanks for taking it!

Just wanted to update you on my progress reading/correcting the docs. The
'Memory Ordering' is taking a bit of time so I paused that and I'm focusing
on finishing all the other low hanging fruit. This activity is mostly during
night hours after the baby is asleep but some times I also manage to sneak it
into the day job ;-)

BTW I do want to discuss about this smp_mb patch above with you at LPC if you
had time, even though we are removing it from the documentation. I thought
about it a few times, and I was not able to fully appreciate the need for the
barrier (that is even assuming that complete() etc did not do the right
thing).  Specifically I was wondering same thing Peter said in the above
thread I think that - if that rcu_read_unlock() triggered all the spin
locking up the tree of nodes, then why is that locking not sufficient to
prevent reads from the read-side section from bleeding out? That would
prevent the reader that just unlocked from seeing anything that happens
_after_ the synchronize_rcu.

Also about GP memory ordering and RCU-tree-locking, I think you mentioned to
me that the RCU reader-sections are virtually extended both forward and
backward and whereever it ends, those paths do heavy-weight synchronization
that should be sufficient to prevent memory ordering issues (such as those
you mentioned in the Requierments document). That is exactly why we don't
need explicit barriers during rcu_read_unlock. If I recall I asked you why
those are not needed. So that answer made sense, but then now on going
through the 'Memory Ordering' document, I see that you mentioned there is
reliance on the locking. Is that reliance on locking necessary to maintain
ordering then?

Or did I miss the points completely? :(

--
TODO list of the index file marking which ones I have finished perusing:

arrayRCU.txtDONE
checklist.txt   DONE
listRCU.txt DONE
lockdep.txt DONE
lockdep-splat.txt   DONE
NMI-RCU.txt
rcu_dereference.txt
rcubarrier.txt
rculist_nulls.txt
rcuref.txt
rcu.txt
RTFP.txtDONE
stallwarn.txt   DONE
torture.txt
UP.txt
whatisRCU.txt   DONE

Design
 - Data-Structures  DONE
 - Requirements DONE
 - Expedited-Grace-Periods
 - Memory Ordering  next



  1   2   3   4   5   6   7   8   9   10   >