Re: [PATCH v2 2/3] arch: simplify architecture specific page size configuration

2024-03-07 Thread Michael Ellerman
Arnd Bergmann  writes:
> From: Arnd Bergmann 
>
> arc, arm64, parisc and powerpc all have their own Kconfig symbols
> in place of the common CONFIG_PAGE_SIZE_4KB symbols. Change these
> so the common symbols are the ones that are actually used, while
> leaving the architecture specific ones as the user visible
> place for configuring it, to avoid breaking user configs.
>
> Reviewed-by: Christophe Leroy  (powerpc32)
> Acked-by: Catalin Marinas 
> Acked-by: Helge Deller  # parisc
> Signed-off-by: Arnd Bergmann 
> ---
> No changes from v1
>
>  arch/arc/Kconfig  |  3 +++
>  arch/arc/include/uapi/asm/page.h  |  6 ++
>  arch/arm64/Kconfig| 29 +
>  arch/arm64/include/asm/page-def.h |  2 +-
>  arch/parisc/Kconfig   |  3 +++
>  arch/parisc/include/asm/page.h| 10 +-
>  arch/powerpc/Kconfig  | 31 ++-
>  arch/powerpc/include/asm/page.h   |  2 +-
>  scripts/gdb/linux/constants.py.in |  2 +-
>  scripts/gdb/linux/mm.py   |  2 +-
>  10 files changed, 32 insertions(+), 58 deletions(-)
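
As a rough illustration of what the consolidation buys, the sketch below shows how C code could derive its page-size constants from whichever common CONFIG_PAGE_SIZE_*KB symbol is set. This is a user-space sketch under stated assumptions, not code from the patch: the DEMO_* names and the two align helpers are hypothetical, and the hard-coded CONFIG_PAGE_SIZE_64KB define stands in for what Kconfig would generate.

```c
#include <assert.h>

/* Assumption: stand-in for the Kconfig-generated header. A real kernel
 * build gets exactly one of these symbols from its .config. */
#define CONFIG_PAGE_SIZE_64KB 1

/* Map the common symbol to a shift (hypothetical DEMO_ names). */
#if defined(CONFIG_PAGE_SIZE_4KB)
#define DEMO_PAGE_SHIFT 12
#elif defined(CONFIG_PAGE_SIZE_16KB)
#define DEMO_PAGE_SHIFT 14
#elif defined(CONFIG_PAGE_SIZE_64KB)
#define DEMO_PAGE_SHIFT 16
#else
#error "no page size selected"
#endif

#define DEMO_PAGE_SIZE (1UL << DEMO_PAGE_SHIFT)
#define DEMO_PAGE_MASK (~(DEMO_PAGE_SIZE - 1))

/* Compile-time sanity check: the size must be a power of two. */
static_assert((DEMO_PAGE_SIZE & (DEMO_PAGE_SIZE - 1)) == 0,
	      "page size must be a power of two");

/* Round an address down to the start of its page. */
static inline unsigned long demo_page_align_down(unsigned long addr)
{
	return addr & DEMO_PAGE_MASK;
}

/* Round an address up to the next page boundary (no-op if aligned). */
static inline unsigned long demo_page_align_up(unsigned long addr)
{
	return (addr + DEMO_PAGE_SIZE - 1) & DEMO_PAGE_MASK;
}
```

With one set of common symbols, generic code only needs a single #if ladder like this instead of one per architecture-specific symbol.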

Acked-by: Michael Ellerman  (powerpc)

cheers



Re: [PATCH v2 1/3] arch: consolidate existing CONFIG_PAGE_SIZE_*KB definitions

2024-03-06 Thread Michael Ellerman
Hi Arnd,

Arnd Bergmann  writes:
> From: Arnd Bergmann 
>
> These four architectures define the same Kconfig symbols for configuring
> the page size. Move the logic into a common place where it can be shared
> with all other architectures.
>
> Signed-off-by: Arnd Bergmann 
> ---
> Changes from v1:
>  - improve Kconfig help texts
>  - fix Hexagon Kconfig
>
>  arch/Kconfig  | 92 ++-
>  arch/hexagon/Kconfig  | 24 ++--
>  arch/hexagon/include/asm/page.h   |  6 +-
>  arch/loongarch/Kconfig| 21 ++-
>  arch/loongarch/include/asm/page.h | 10 +---
>  arch/mips/Kconfig | 58 ++-
>  arch/mips/include/asm/page.h  | 16 +-
>  arch/sh/include/asm/page.h| 13 +
>  arch/sh/mm/Kconfig| 42 --
>  9 files changed, 121 insertions(+), 161 deletions(-)

There are a few "help" lines missing, which breaks the build:

  arch/Kconfig:1134: syntax error
  arch/Kconfig:1133: invalid statement
  arch/Kconfig:1134: invalid statement
  arch/Kconfig:1135:warning: ignoring unsupported character '.'
  arch/Kconfig:1135:warning: ignoring unsupported character '.'
  arch/Kconfig:1135: invalid statement
  arch/Kconfig:1136: invalid statement
  arch/Kconfig:1137:warning: ignoring unsupported character '.'
  arch/Kconfig:1137: invalid statement
  arch/Kconfig:1143: syntax error
  arch/Kconfig:1142: invalid statement
  arch/Kconfig:1143: invalid statement
  arch/Kconfig:1144:warning: ignoring unsupported character '.'
  arch/Kconfig:1144: invalid statement
  arch/Kconfig:1145: invalid statement
  arch/Kconfig:1146: invalid statement
  arch/Kconfig:1147: invalid statement
  arch/Kconfig:1148:warning: ignoring unsupported character '.'
  arch/Kconfig:1148: invalid statement
  make[4]: *** [../scripts/kconfig/Makefile:85: syncconfig] Error 1

Fixup diff is:

diff --git a/arch/Kconfig b/arch/Kconfig
index 56d45a75f625..f2295fa3b48c 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -1130,6 +1130,7 @@ config PAGE_SIZE_16KB
 config PAGE_SIZE_32KB
 	bool "32KiB pages"
 	depends on HAVE_PAGE_SIZE_32KB
+	help
 	  Using 32KiB page size will result in slightly higher performance
 	  kernel at the price of higher memory consumption compared to
 	  16KiB pages.  This option is available only on cnMIPS cores.
@@ -1139,6 +1140,7 @@ config PAGE_SIZE_32KB
 config PAGE_SIZE_64KB
 	bool "64KiB pages"
 	depends on HAVE_PAGE_SIZE_64KB
+	help
 	  Using 64KiB page size will result in slightly higher performance
 	  kernel at the price of much higher memory consumption compared to
 	  4KiB or 16KiB pages.


cheers



Re: [PATCH 4/4] vdso: avoid including asm/page.h

2024-02-27 Thread Michael Ellerman
Christophe Leroy  writes:
> Le 26/02/2024 à 17:14, Arnd Bergmann a écrit :
>> From: Arnd Bergmann 
>> 
>> The recent change to the vdso_data_store broke building compat VDSO
>> on at least arm64 because it includes headers outside of the include/vdso/
>> namespace:
>
> I understand that powerpc64 also has an issue, see 
> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20231221120410.2226678-1-...@ellerman.id.au/

Yeah, and that patch would silently conflict with this series, which is
not ideal.

I could delay merging my patch above until after this series goes in;
mine only fixes a fairly obscure build warning.

cheers



Re: [PATCH v2 02/29] powerpc: Remove PT_NOTE workaround

2019-10-11 Thread Michael Ellerman
Kees Cook  writes:
> In preparation for moving NOTES into RO_DATA, remove the PT_NOTE
> workaround since the kernel requires at least gcc 4.6 now.
>
> Signed-off-by: Kees Cook 
> ---
>  arch/powerpc/kernel/vmlinux.lds.S | 24 ++--
>  1 file changed, 2 insertions(+), 22 deletions(-)

Acked-by: Michael Ellerman 

For the archives, Joel tried a similar patch a while back which caused
some problems, see:

  https://lore.kernel.org/linuxppc-dev/20190321003253.22100-1-j...@jms.id.au/

and a v2:

  https://lore.kernel.org/linuxppc-dev/20190329064453.12761-1-j...@jms.id.au/

This is similar to his v2. The only outstanding comment on his v2 was
from Segher:
  (And I do not know if there are any tools that expect the notes in a phdr,
  or even specifically the second phdr).

But this patch solves that by not changing the note.

cheers

> diff --git a/arch/powerpc/kernel/vmlinux.lds.S 
> b/arch/powerpc/kernel/vmlinux.lds.S
> index 81e672654789..a3c8492b2b19 100644
> --- a/arch/powerpc/kernel/vmlinux.lds.S
> +++ b/arch/powerpc/kernel/vmlinux.lds.S
> @@ -20,20 +20,6 @@ ENTRY(_stext)
>  PHDRS {
>   kernel PT_LOAD FLAGS(7); /* RWX */
>   note PT_NOTE FLAGS(0);
> - dummy PT_NOTE FLAGS(0);
> -
> - /* binutils < 2.18 has a bug that makes it misbehave when taking an
> -ELF file with all segments at load address 0 as input.  This
> -happens when running "strip" on vmlinux, because of the AT() magic
> -in this linker script.  People using GCC >= 4.2 won't run into
> -this problem, because the "build-id" support will put some data
> -into the "notes" segment (at a non-zero load address).
> -
> -To work around this, we force some data into both the "dummy"
> -segment and the kernel segment, so the dummy segment will get a
> -non-zero load address.  It's not enough to always create the
> -"notes" segment, since if nothing gets assigned to it, its load
> -address will be zero.  */
>  }
>  
>  #ifdef CONFIG_PPC64
> @@ -178,14 +164,8 @@ SECTIONS
>   EXCEPTION_TABLE(0)
>  
>   NOTES :kernel :note
> -
> - /* The dummy segment contents for the bug workaround mentioned above
> -near PHDRS.  */
> - .dummy : AT(ADDR(.dummy) - LOAD_OFFSET) {
> - LONG(0)
> - LONG(0)
> - LONG(0)
> - } :kernel :dummy
> + /* Restore program header away from PT_NOTE. */
> + .dummy : { *(.dummy) } :kernel
>  
>  /*
>   * Init sections discarded at runtime
> -- 
> 2.17.1


Re: [PATCH v2 03/29] powerpc: Rename PT_LOAD identifier "kernel" to "text"

2019-10-11 Thread Michael Ellerman
Kees Cook  writes:
> In preparation for moving NOTES into RO_DATA, rename the linker script
> internal identifier for the PT_LOAD Program Header from "kernel" to
> "text" to match other architectures.
>
> Signed-off-by: Kees Cook 
> ---
>  arch/powerpc/kernel/vmlinux.lds.S | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)

Acked-by: Michael Ellerman 

cheers

> diff --git a/arch/powerpc/kernel/vmlinux.lds.S 
> b/arch/powerpc/kernel/vmlinux.lds.S
> index a3c8492b2b19..e184a63aa5b0 100644
> --- a/arch/powerpc/kernel/vmlinux.lds.S
> +++ b/arch/powerpc/kernel/vmlinux.lds.S
> @@ -18,7 +18,7 @@
>  ENTRY(_stext)
>  
>  PHDRS {
> - kernel PT_LOAD FLAGS(7); /* RWX */
> + text PT_LOAD FLAGS(7); /* RWX */
>   note PT_NOTE FLAGS(0);
>  }
>  
> @@ -63,7 +63,7 @@ SECTIONS
>  #else /* !CONFIG_PPC64 */
>   HEAD_TEXT
>  #endif
> - } :kernel
> + } :text
>  
>   __head_end = .;
>  
> @@ -112,7 +112,7 @@ SECTIONS
>   __got2_end = .;
>  #endif /* CONFIG_PPC32 */
>  
> - } :kernel
> + } :text
>  
>   . = ALIGN(ETEXT_ALIGN_SIZE);
>   _etext = .;
> @@ -163,9 +163,9 @@ SECTIONS
>  #endif
>   EXCEPTION_TABLE(0)
>  
> - NOTES :kernel :note
> + NOTES :text :note
>   /* Restore program header away from PT_NOTE. */
> - .dummy : { *(.dummy) } :kernel
> + .dummy : { *(.dummy) } :text
>  
>  /*
>   * Init sections discarded at runtime
> @@ -180,7 +180,7 @@ SECTIONS
>  #ifdef CONFIG_PPC64
>   *(.tramp.ftrace.init);
>  #endif
> - } :kernel
> + } :text
>  
>   /* .exit.text is discarded at runtime, not link time,
>* to deal with references from __bug_table
> -- 
> 2.17.1


Re: [PATCH v2 01/29] powerpc: Rename "notes" PT_NOTE to "note"

2019-10-11 Thread Michael Ellerman
Kees Cook  writes:
> The Program Header identifiers are internal to the linker scripts. In
> preparation for moving the NOTES segment declaration into RO_DATA,
> standardize the identifier for the PT_NOTE entry to "note" as used by
> all other architectures that emit PT_NOTE.
>
> Signed-off-by: Kees Cook 
> ---
>  arch/powerpc/kernel/vmlinux.lds.S | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Acked-by: Michael Ellerman 

cheers

> diff --git a/arch/powerpc/kernel/vmlinux.lds.S 
> b/arch/powerpc/kernel/vmlinux.lds.S
> index 060a1acd7c6d..81e672654789 100644
> --- a/arch/powerpc/kernel/vmlinux.lds.S
> +++ b/arch/powerpc/kernel/vmlinux.lds.S
> @@ -19,7 +19,7 @@ ENTRY(_stext)
>  
>  PHDRS {
>   kernel PT_LOAD FLAGS(7); /* RWX */
> - notes PT_NOTE FLAGS(0);
> + note PT_NOTE FLAGS(0);
>   dummy PT_NOTE FLAGS(0);
>  
>   /* binutils < 2.18 has a bug that makes it misbehave when taking an
> @@ -177,7 +177,7 @@ SECTIONS
>  #endif
>   EXCEPTION_TABLE(0)
>  
> - NOTES :kernel :notes
> + NOTES :kernel :note
>  
>   /* The dummy segment contents for the bug workaround mentioned above
>  near PHDRS.  */
> -- 
> 2.17.1


Re: [PATCH v5] numa: make node_to_cpumask_map() NUMA_NO_NODE aware

2019-09-16 Thread Michael Ellerman
Yunsheng Lin  writes:
> When passing the return value of dev_to_node() to cpumask_of_node()
> without checking if the device's node id is NUMA_NO_NODE, there is
> global-out-of-bounds detected by KASAN.
>
> From the discussion [1], NUMA_NO_NODE really means no node affinity,
> which also means all cpus should be usable. So the cpumask_of_node()
> should always return all cpus online when user passes the node id as
> NUMA_NO_NODE, just like similar semantic that page allocator handles
> NUMA_NO_NODE.
>
> But we cannot really copy the page allocator logic. Simply because the
> page allocator doesn't enforce the near node affinity. It just picks it
> up as a preferred node but then it is free to fallback to any other numa
> node. This is not the case here and node_to_cpumask_map will only restrict
> to the particular node's cpus which would have really non deterministic
> behavior depending on where the code is executed. So in fact we really
> want to return cpu_online_mask for NUMA_NO_NODE.
>
> Some arches were already NUMA_NO_NODE aware, but they return cpu_all_mask,
> which should be identical with cpu_online_mask when those arches do not
> support cpu hotplug, this patch also changes them to return cpu_online_mask
> in order to be consistent and use NUMA_NO_NODE instead of "-1".

Except some of those arches *do* support CPU hotplug, powerpc and sparc
at least. So switching from cpu_all_mask to cpu_online_mask is a
meaningful change.

That doesn't mean it's wrong, but you need to explain why it's the right
change.
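
To make the difference concrete, here is a small user-space model of the two behaviours. It is only a sketch under stated assumptions: the demo_* names, the 8-CPU layout and the node map are hypothetical, and the "old" variant merely approximates what the commit message says some arches did.

```c
#include <assert.h>
#include <stdint.h>

#define NUMA_NO_NODE  (-1)
#define DEMO_NR_NODES 2

/* Hypothetical 8-CPU machine, one bit per CPU. */
typedef uint8_t demo_cpumask_t;

/* Every CPU that could ever be present. */
static const demo_cpumask_t demo_cpu_all_mask = 0xff;
/* CPUs 4-7 are currently hot-unplugged in this model. */
static demo_cpumask_t demo_cpu_online_mask = 0x0f;

static const demo_cpumask_t demo_node_to_cpumask_map[DEMO_NR_NODES] = {
	0x33,	/* node 0: CPUs 0, 1, 4, 5 */
	0xcc,	/* node 1: CPUs 2, 3, 6, 7 */
};

/* Old behaviour: NUMA_NO_NODE maps to all CPUs, online or not. */
static demo_cpumask_t demo_cpumask_of_node_old(int node)
{
	if (node == NUMA_NO_NODE)
		return demo_cpu_all_mask;
	assert(node >= 0 && node < DEMO_NR_NODES);
	return demo_node_to_cpumask_map[node];
}

/* Proposed behaviour: NUMA_NO_NODE maps to the online CPUs only. */
static demo_cpumask_t demo_cpumask_of_node_new(int node)
{
	if (node == NUMA_NO_NODE)
		return demo_cpu_online_mask;
	assert(node >= 0 && node < DEMO_NR_NODES);
	return demo_node_to_cpumask_map[node];
}
```

In this model the two variants agree only while every CPU is online; once CPUs 4-7 are unplugged the old variant still reports them for NUMA_NO_NODE, which is why the switch needs a justification on hotplug-capable arches.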


> Also there is a debugging version of node_to_cpumask_map() for x86 and
> arm64, which is only used when CONFIG_DEBUG_PER_CPU_MAPS is defined, this
> patch changes it to handle NUMA_NO_NODE as normal node_to_cpumask_map().
>
> [1] https://lore.kernel.org/patchwork/patch/1125789/
> Signed-off-by: Yunsheng Lin 
> Suggested-by: Michal Hocko 
> Acked-by: Michal Hocko 
> ---
> V5: Drop unsigned "fix" change for x86/arm64, and change comment log
> according to Michal's comment.
> V4: Have all these changes in a single patch.

This makes it much harder to get the patch merged: you basically have to
get Andrew Morton to merge it now. Sending individual patches for each
arch means each arch maintainer can merge them separately.

cheers

> V3: Change to only handle NUMA_NO_NODE, and return cpu_online_mask
> for NUMA_NO_NODE case, and change the commit log to better justify
> the change.
> V2: make the node id checking change to other arches too.
> ---
>  arch/alpha/include/asm/topology.h| 2 +-
>  arch/arm64/include/asm/numa.h| 3 +++
>  arch/arm64/mm/numa.c | 3 +++
>  arch/mips/include/asm/mach-ip27/topology.h   | 4 ++--
>  arch/mips/include/asm/mach-loongson64/topology.h | 4 +++-
>  arch/powerpc/include/asm/topology.h  | 6 +++---
>  arch/s390/include/asm/topology.h | 3 +++
>  arch/sparc/include/asm/topology_64.h | 6 +++---
>  arch/x86/include/asm/topology.h  | 3 +++
>  arch/x86/mm/numa.c   | 3 +++
>  10 files changed, 27 insertions(+), 10 deletions(-)
>
> diff --git a/arch/alpha/include/asm/topology.h 
> b/arch/alpha/include/asm/topology.h
> index 5a77a40..836c9e2 100644
> --- a/arch/alpha/include/asm/topology.h
> +++ b/arch/alpha/include/asm/topology.h
> @@ -31,7 +31,7 @@ static const struct cpumask *cpumask_of_node(int node)
>   int cpu;
>  
>   if (node == NUMA_NO_NODE)
> - return cpu_all_mask;
> + return cpu_online_mask;
>  
>   cpumask_clear(_to_cpumask_map[node]);
>  
> diff --git a/arch/arm64/include/asm/numa.h b/arch/arm64/include/asm/numa.h
> index 626ad01..c8a4b31 100644
> --- a/arch/arm64/include/asm/numa.h
> +++ b/arch/arm64/include/asm/numa.h
> @@ -25,6 +25,9 @@ const struct cpumask *cpumask_of_node(int node);
>  /* Returns a pointer to the cpumask of CPUs on Node 'node'. */
>  static inline const struct cpumask *cpumask_of_node(int node)
>  {
> + if (node == NUMA_NO_NODE)
> + return cpu_online_mask;
> +
>   return node_to_cpumask_map[node];
>  }
>  #endif
> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
> index 4f241cc..f57202d 100644
> --- a/arch/arm64/mm/numa.c
> +++ b/arch/arm64/mm/numa.c
> @@ -46,6 +46,9 @@ EXPORT_SYMBOL(node_to_cpumask_map);
>   */
>  const struct cpumask *cpumask_of_node(int node)
>  {
> + if (node == NUMA_NO_NODE)
> + return cpu_online_mask;
> +
>   if (WARN_ON(node >= nr_node_ids))
>   return cpu_none_mask;
>  
> diff --git a/arch/mips/include/asm/mach-ip27/topology.h 
> b/arch/mips/include/asm/mach-ip27/topology.h
> index 965f079..04505e6 100644
> --- a/arch/mips/include/asm/mach-ip27/topology.h
> +++ b/arch/mips/include/asm/mach-ip27/topology.h
> @@ -15,8 +15,8 @@ struct cpuinfo_ip27 {
>  extern struct cpuinfo_ip27 sn_cpu_info[NR_CPUS];
>  
>  #define cpu_to_node(cpu) (sn_cpu_info[(cpu)].p_nodeid)
> 

Re: [PATCH 1/2] arch: mark syscall number 435 reserved for clone3

2019-07-19 Thread Michael Ellerman
Christian Brauner  writes:
> On Fri, Jul 19, 2019 at 08:18:02PM +1000, Michael Ellerman wrote:
>> Christian Brauner  writes:
>> > On Mon, Jul 15, 2019 at 03:56:04PM +0200, Christian Borntraeger wrote:
>> >> I think Vasily already has a clone3 patch for s390x with 435. 
>> >
>> > A quick follow-up on this. Helge and Michael have asked whether there
>> > are any tests for clone3. Yes, there will be and I try to have them
>> > ready by the end of this or next week for review. In the meantime I
>> > hope the following minimalistic test program that just verifies very
>> > very basic functionality (It's not pretty.) will help you test:
>> 
>> Hi Christian,
>> 
>> Thanks for the test.
>> 
>> This actually oopses on powerpc, it hits the BUG_ON in CHECK_FULL_REGS
>> in process.c around line 1633:
>> 
>> 	} else {
>> 		/* user thread */
>> 		struct pt_regs *regs = current_pt_regs();
>> 		CHECK_FULL_REGS(regs);
>> 		*childregs = *regs;
>> 		if (usp)
>> 
>> 
>> So I'll have to dig into how we fix that before we wire up clone3.
>> 
>> Turns out testing is good! :)
>
> Indeed. I have a test-suite for clone3 in mind and I hope to have it
> ready by the end of next week. It's just always the finding the time
> part that is annoying. :)

I know the feeling!

> Thanks for digging into this, Michael!

No worries, happy to help where I can.

In the intervening five minutes I remembered how we handle this: we just
need a little wrapper to save the non-volatile regs:

_GLOBAL(ppc_clone3)
	bl	save_nvgprs
	bl	sys_clone3
	b	.Lsyscall_exit


A while back I meant to make it generate those automatically based on a
flag in the syscall.tbl but of course haven't got around to it :)

So with the above it seems all good:

$ ./clone3 ; echo $?
Parent process received child's pid 4204 as return value
Parent process received child's pidfd 3
Parent process received child's pid 4204 as return argument
Child process with pid 4204
0

I'll send a patch to wire it up on Monday.

cheers


Re: [PATCH 1/2] arch: mark syscall number 435 reserved for clone3

2019-07-19 Thread Michael Ellerman
Christian Brauner  writes:
> On Mon, Jul 15, 2019 at 03:56:04PM +0200, Christian Borntraeger wrote:
>> I think Vasily already has a clone3 patch for s390x with 435. 
>
> A quick follow-up on this. Helge and Michael have asked whether there
> are any tests for clone3. Yes, there will be and I try to have them
> ready by the end of this or next week for review. In the meantime I
> hope the following minimalistic test program that just verifies very
> very basic functionality (It's not pretty.) will help you test:

Hi Christian,

Thanks for the test.

This actually oopses on powerpc, it hits the BUG_ON in CHECK_FULL_REGS
in process.c around line 1633:

	} else {
		/* user thread */
		struct pt_regs *regs = current_pt_regs();
		CHECK_FULL_REGS(regs);
		*childregs = *regs;
		if (usp)


So I'll have to dig into how we fix that before we wire up clone3.

Turns out testing is good! :)

cheers


Re: [PATCH v9 10/10] selftests: add openat2(2) selftests

2019-07-07 Thread Michael Ellerman
Hi Aleksa,

A few minor comments below.

Aleksa Sarai  writes:
> diff --git a/tools/testing/selftests/openat2/Makefile 
> b/tools/testing/selftests/openat2/Makefile
> new file mode 100644
> index ..8235a49928f6
> --- /dev/null
> +++ b/tools/testing/selftests/openat2/Makefile
> @@ -0,0 +1,12 @@
> +# SPDX-License-Identifier: GPL-2.0
> +
> +CFLAGS += -Wall -O2 -g
> +TEST_GEN_PROGS := linkmode_test resolve_test rename_attack_test
> +
> +include ../lib.mk
> +
> +$(OUTPUT)/linkmode_test: linkmode_test.c helpers.o
> +$(OUTPUT)/rename_attack_test: rename_attack_test.c helpers.o
> +$(OUTPUT)/resolve_test: resolve_test.c helpers.o

You don't need to tell make that foo depends on foo.c.

Also if you make the dependency be on helpers.c then you won't get an
intermediate helpers.o, and then you don't need to clean it.

So the above three lines could just be:

$(TEST_GEN_PROGS): helpers.c

> +EXTRA_CLEAN = helpers.o $(wildcard /tmp/ksft-openat2-*)

If you follow my advice above you don't need helpers.o in there.

Deleting things from /tmp is also a bit fishy on shared machines, i.e. it
will error if those files happen to be owned by another user.

cheers


Re: [PATCH v3 2/3] arch: wire-up close_range()

2019-05-27 Thread Michael Ellerman
Christian Brauner  writes:
> diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl 
> b/arch/powerpc/kernel/syscalls/syscall.tbl
> index 103655d84b4b..ba2c1f078cbd 100644
> --- a/arch/powerpc/kernel/syscalls/syscall.tbl
> +++ b/arch/powerpc/kernel/syscalls/syscall.tbl
> @@ -515,3 +515,4 @@
>  431  common  fsconfig        sys_fsconfig
>  432  common  fsmount         sys_fsmount
>  433  common  fspick          sys_fspick
> +435  common  close_range     sys_close_range

With a minor build fix the selftest passes for me on ppc64le:

  # ./close_range_test 
  1..9
  ok 1 do not allow invalid flag values for close_range()
  ok 2 close_range() from 3 to 53
  ok 3 fcntl() verify closed range from 3 to 53
  ok 4 close_range() from 54 to 95
  ok 5 fcntl() verify closed range from 54 to 95
  ok 6 close_range() from 96 to 102
  ok 7 fcntl() verify closed range from 96 to 102
  ok 8 close_range() closed single file descriptor
  ok 9 fcntl() verify closed single file descriptor
  # Pass 9 Fail 0 Xfail 0 Xpass 0 Skip 0 Error 0


Acked-by: Michael Ellerman  (powerpc)

cheers


Re: [PATCH v2 2/2] tests: add close_range() tests

2019-05-27 Thread Michael Ellerman
Christian Brauner  writes:
> This adds basic tests for the new close_range() syscall.
> - test that no invalid flags can be passed
> - test that a range of file descriptors is correctly closed
> - test that a range of file descriptors is correctly closed if there
>   are already closed file descriptors in the range
> - test that max_fd is correctly capped to the current fdtable maximum
>
> Signed-off-by: Christian Brauner 
> Cc: Arnd Bergmann 
> Cc: Jann Horn 
> Cc: David Howells 
> Cc: Dmitry V. Levin 
> Cc: Oleg Nesterov 
> Cc: Linus Torvalds 
> Cc: Florian Weimer 
> Cc: linux-...@vger.kernel.org
> ---
> v1: unchanged
> v2:
> - Christian Brauner :
>   - verify that close_range() correctly closes a single file descriptor
> ---
>  tools/testing/selftests/Makefile  |   1 +
>  tools/testing/selftests/core/.gitignore   |   1 +
>  tools/testing/selftests/core/Makefile |   6 +
>  .../testing/selftests/core/close_range_test.c | 142 ++
>  4 files changed, 150 insertions(+)
>  create mode 100644 tools/testing/selftests/core/.gitignore
>  create mode 100644 tools/testing/selftests/core/Makefile
>  create mode 100644 tools/testing/selftests/core/close_range_test.c
>
> diff --git a/tools/testing/selftests/core/.gitignore 
> b/tools/testing/selftests/core/.gitignore
> new file mode 100644
> index ..6e6712ce5817
> --- /dev/null
> +++ b/tools/testing/selftests/core/.gitignore
> @@ -0,0 +1 @@
> +close_range_test
> diff --git a/tools/testing/selftests/core/Makefile 
> b/tools/testing/selftests/core/Makefile
> new file mode 100644
> index ..de3ae68aa345
> --- /dev/null
> +++ b/tools/testing/selftests/core/Makefile
> @@ -0,0 +1,6 @@
> +CFLAGS += -g -I../../../../usr/include/ -I../../../../include

Your second -I pulls the unexported kernel headers in, userspace
programs shouldn't include unexported kernel headers.

It breaks the build on powerpc with eg:

  powerpc64le-linux-gnu-gcc -g -I../../../../usr/include/ -I../../../../include close_range_test.c  -o /output/kselftest/core/close_range_test
  In file included from /usr/powerpc64le-linux-gnu/include/bits/fcntl-linux.h:346,
                   from /usr/powerpc64le-linux-gnu/include/bits/fcntl.h:62,
                   from /usr/powerpc64le-linux-gnu/include/fcntl.h:35,
                   from close_range_test.c:5:
  ../../../../include/linux/falloc.h:13:2: error: unknown type name '__s16'
    __s16  l_type;
    ^


Did you do that on purpose or just copy it from one of the other
Makefiles? :)

If you're just wanting to get the syscall number when the headers
haven't been exported, I think the best solution is to do eg:

diff --git a/tools/testing/selftests/core/close_range_test.c b/tools/testing/selftests/core/close_range_test.c
index d6e6079d3d53..34c6f02f25de 100644
--- a/tools/testing/selftests/core/close_range_test.c
+++ b/tools/testing/selftests/core/close_range_test.c
@@ -14,6 +14,10 @@

 #include "../kselftest.h"

+#ifndef __NR_close_range
+#define __NR_close_range   435
+#endif
+
 static inline int sys_close_range(unsigned int fd, unsigned int max_fd,
  unsigned int flags)
 {


cheers


Re: [PATCH 2/2] arch: add pidfd and io_uring syscalls everywhere

2019-04-04 Thread Michael Ellerman
Jens Axboe  writes:
> On 4/3/19 5:11 AM, Will Deacon wrote:
>> On Wed, Apr 03, 2019 at 01:47:50PM +1100, Michael Ellerman wrote:
>>> Arnd Bergmann  writes:
>>>> diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl 
>>>> b/arch/powerpc/kernel/syscalls/syscall.tbl
>>>> index b18abb0c3dae..00f5a63c8d9a 100644
>>>> --- a/arch/powerpc/kernel/syscalls/syscall.tbl
>>>> +++ b/arch/powerpc/kernel/syscalls/syscall.tbl
>>>> @@ -505,3 +505,7 @@
>>>>  421   32      rt_sigtimedwait_time64          sys_rt_sigtimedwait             compat_sys_rt_sigtimedwait_time64
>>>>  422   32      futex_time64                    sys_futex                       sys_futex
>>>>  423   32      sched_rr_get_interval_time64    sys_sched_rr_get_interval       sys_sched_rr_get_interval
>>>> +424   common  pidfd_send_signal               sys_pidfd_send_signal
>>>> +425   common  io_uring_setup                  sys_io_uring_setup
>>>> +426   common  io_uring_enter                  sys_io_uring_enter
>>>> +427   common  io_uring_register               sys_io_uring_register
>>>
>>> Acked-by: Michael Ellerman  (powerpc)
>>>
>>> Lightly tested.
>>>
>>> The pidfd_test selftest passes.
>> 
>> That reports pass for me too, although it fails to unshare the pid ns, which
>> I assume is benign.

If you run it as root it should work?

>>> Ran the io_uring example from fio, which prints lots of:
>> 
>> How did you invoke that? I had a play with the tests in:
>
> It's t/io_uring from the fio repo:
>
> git://git.kernel.dk/fio
>
> and you just run it ala:
>
> # make t/io_uring
> # t/io_uring /dev/some_device

Yeah that's all I did.

>> will@autoplooker:~/liburing/test$ ./io_uring_register 
>> RELIMIT_MEMLOCK: 67108864 (67108864)
>> [   35.477875] Unable to handle kernel NULL pointer dereference at virtual address 0070
>> [   35.478969] Mem abort info:
>> [   35.479296]   ESR = 0x9604
>> [   35.479785]   Exception class = DABT (current EL), IL = 32 bits
>> [   35.480528]   SET = 0, FnV = 0
>> [   35.480980]   EA = 0, S1PTW = 0
>> [   35.481345] Data abort info:
>> [   35.481680]   ISV = 0, ISS = 0x0004
>> [   35.482267]   CM = 0, WnR = 0
>> [   35.482618] user pgtable: 4k pages, 48-bit VAs, pgdp = (ptrval)
>> [   35.483486] [0070] pgd=
>> [   35.484041] Internal error: Oops: 9604 [#1] PREEMPT SMP
>> [   35.484788] Modules linked in:
>> [   35.485311] CPU: 113 PID: 3973 Comm: io_uring_regist Not tainted 5.1.0-rc3-00012-g40b114779944 #1
>> [   35.486712] Hardware name: linux,dummy-virt (DT)
>> [   35.487450] pstate: 2045 (nzCv daif +PAN -UAO)
>> [   35.488228] pc : link_pwq+0x10/0x60
>> [   35.488794] lr : apply_wqattrs_commit+0xe0/0x118
>> [   35.489550] sp : 17e2bbc0
>
> Huh, this looks odd, it's crashing inside the wq setup.

Looks like you found a bug :)

cheers


Re: [PATCH 2/2] arch: add pidfd and io_uring syscalls everywhere

2019-04-02 Thread Michael Ellerman
Arnd Bergmann  writes:
> diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl 
> b/arch/powerpc/kernel/syscalls/syscall.tbl
> index b18abb0c3dae..00f5a63c8d9a 100644
> --- a/arch/powerpc/kernel/syscalls/syscall.tbl
> +++ b/arch/powerpc/kernel/syscalls/syscall.tbl
> @@ -505,3 +505,7 @@
>  421  32      rt_sigtimedwait_time64          sys_rt_sigtimedwait             compat_sys_rt_sigtimedwait_time64
>  422  32      futex_time64                    sys_futex                       sys_futex
>  423  32      sched_rr_get_interval_time64    sys_sched_rr_get_interval       sys_sched_rr_get_interval
> +424  common  pidfd_send_signal               sys_pidfd_send_signal
> +425  common  io_uring_setup                  sys_io_uring_setup
> +426  common  io_uring_enter                  sys_io_uring_enter
> +427  common  io_uring_register               sys_io_uring_register

Acked-by: Michael Ellerman  (powerpc)

Lightly tested.

The pidfd_test selftest passes.

Ran the io_uring example from fio, which prints lots of:

IOPS=209952, IOS/call=32/32, inflight=117 (117), Cachehit=0.00%
IOPS=209952, IOS/call=32/32, inflight=116 (116), Cachehit=0.00%
IOPS=209920, IOS/call=32/32, inflight=115 (115), Cachehit=0.00%
IOPS=209952, IOS/call=32/32, inflight=115 (115), Cachehit=0.00%
IOPS=209920, IOS/call=32/32, inflight=115 (115), Cachehit=0.00%
IOPS=209952, IOS/call=32/32, inflight=115 (115), Cachehit=0.00%
IOPS=210016, IOS/call=32/32, inflight=114 (114), Cachehit=0.00%
IOPS=210016, IOS/call=32/32, inflight=113 (113), Cachehit=0.00%
IOPS=210048, IOS/call=32/32, inflight=113 (113), Cachehit=0.00%
IOPS=210016, IOS/call=32/32, inflight=113 (113), Cachehit=0.00%
IOPS=210048, IOS/call=32/32, inflight=112 (112), Cachehit=0.00%
IOPS=210016, IOS/call=32/32, inflight=110 (110), Cachehit=0.00%
IOPS=210048, IOS/call=32/32, inflight=105 (105), Cachehit=0.00%
IOPS=210048, IOS/call=32/32, inflight=104 (104), Cachehit=0.00%
IOPS=210080, IOS/call=32/32, inflight=102 (102), Cachehit=0.00%
IOPS=210112, IOS/call=32/32, inflight=100 (100), Cachehit=0.00%
IOPS=210080, IOS/call=32/32, inflight=97 (97), Cachehit=0.00%
IOPS=210112, IOS/call=32/32, inflight=97 (97), Cachehit=0.00%
IOPS=210112, IOS/call=32/31, inflight=126 (126), Cachehit=0.00%
IOPS=210048, IOS/call=32/32, inflight=126 (126), Cachehit=0.00%
IOPS=210048, IOS/call=32/32, inflight=125 (125), Cachehit=0.00%
IOPS=210016, IOS/call=32/32, inflight=119 (119), Cachehit=0.00%
IOPS=210048, IOS/call=32/32, inflight=117 (117), Cachehit=0.00%
IOPS=210016, IOS/call=32/32, inflight=114 (114), Cachehit=0.00%
IOPS=210048, IOS/call=32/32, inflight=111 (111), Cachehit=0.00%
IOPS=210048, IOS/call=32/32, inflight=108 (108), Cachehit=0.00%
IOPS=210048, IOS/call=32/32, inflight=107 (107), Cachehit=0.00%
IOPS=210048, IOS/call=32/32, inflight=105 (105), Cachehit=0.00%

Which is good I think?


cheers


Re: [PATCH 2/2] arch: add pidfd and io_uring syscalls everywhere

2019-04-02 Thread Michael Ellerman
Arnd Bergmann  writes:
> On Sun, Mar 31, 2019 at 5:47 PM Michael Ellerman  wrote:
>>
>> Arnd Bergmann  writes:
>> > Add the io_uring and pidfd_send_signal system calls to all architectures.
>> >
>> > These system calls are designed to handle both native and compat tasks,
>> > so all entries are the same across architectures, only arm-compat and
>> > the generic table still use an old format.
>> >
>> > Signed-off-by: Arnd Bergmann 
>> > ---
>> >  arch/alpha/kernel/syscalls/syscall.tbl  | 4 
>> >  arch/arm/tools/syscall.tbl  | 4 
>> >  arch/arm64/include/asm/unistd.h | 2 +-
>> >  arch/arm64/include/asm/unistd32.h   | 8 
>> >  arch/ia64/kernel/syscalls/syscall.tbl   | 4 
>> >  arch/m68k/kernel/syscalls/syscall.tbl   | 4 
>> >  arch/microblaze/kernel/syscalls/syscall.tbl | 4 
>> >  arch/mips/kernel/syscalls/syscall_n32.tbl   | 4 
>> >  arch/mips/kernel/syscalls/syscall_n64.tbl   | 4 
>> >  arch/mips/kernel/syscalls/syscall_o32.tbl   | 4 
>> >  arch/parisc/kernel/syscalls/syscall.tbl | 4 
>> >  arch/powerpc/kernel/syscalls/syscall.tbl| 4 
>>
>> Have you done any testing?
>>
>> I'd rather not wire up syscalls that have never been tested at all on
>> powerpc.
>
> No, I have not. I did review the system calls carefully and added the first
> patch to fix the bug on x86 compat mode before adding the same bug
> on the other compat architectures though ;-)
>
> Generally, my feeling is that adding system calls is not fundamentally
> different from adding other ABIs, and we should really do it at
> the same time across all architectures, rather than waiting for each
> maintainer to get around to reviewing and testing the new calls
> first. This is not a problem on powerpc, but a lot of other architectures
> are less active, which is how we have always ended up with
> different sets of system calls across architectures.

Well it's still something of a problem on powerpc. No one has
volunteered to test io_uring on powerpc, so at this stage it will go in
completely untested.

If there was a selftest in the tree I'd be a bit happier, because at
least then our CI would start testing it as soon as the syscalls were
wired up in linux-next.

And yeah obviously I should test it, but I don't have infinite time
unfortunately.

> The problem here is that this makes it harder for the C library to
> know when a system call is guaranteed to be available. glibc
> still needs a feature test for newly added syscalls to see if they
> are working (they might be backported to an older kernel, or
> disabled), but whenever the minimum kernel version is increased,
> it makes sense to drop those checks and assume non-optional
> system calls will work if they were part of that minimum version.

But that's the thing, if we just wire them up untested they may not
actually work. And then you have the far worse situation where the
syscall exists in kernel version x but does not actually work properly.

See the mess we have with pkeys for example.
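
The kind of runtime feature test Arnd mentions can be sketched roughly as below. This is a hedged illustration, not glibc's code: demo_have_pidfd_send_signal is a hypothetical helper, and the probe relies on my assumption that a kernel implementing the call rejects an invalid pidfd of -1 with an error other than ENOSYS and without side effects.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* for syscall() */
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Returns 1 if the running kernel has an entry for pidfd_send_signal,
 * 0 otherwise. The probe passes an invalid pidfd (-1), so it should
 * never actually deliver a signal.
 *
 * Note the limit of this check: a result of 1 only says the syscall is
 * wired up, not that it works correctly on this architecture.
 */
static int demo_have_pidfd_send_signal(void)
{
#ifdef __NR_pidfd_send_signal
	long rc = syscall(__NR_pidfd_send_signal, -1, 0, NULL, 0U);

	return !(rc == -1 && errno == ENOSYS);
#else
	return 0;	/* headers too old to know the syscall number */
#endif
}
```

A probe like this can tell "not wired up" from "wired up", but it cannot tell "wired up" from "wired up and tested", which is the gap being argued about here.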

> In the future, I'd hope that any new system calls get added
> right away on all architectures when they land (it was a bit
> tricky this time, because I still did a bunch of reworks that
> conflicted with the new calls). Bugs will happen of course, but
> I think adding them sooner makes it more likely to catch those
> bugs early on so we have a chance to fix them properly,
> and need fewer arch specific workarounds (ideally none)
> for system calls.

For syscalls that have a selftest in the tree, and don't rely on
anything arch specific I agree.

I'm a bit more wary of things that are not easily tested and have the
potential to work differently across arches.

cheers


Re: [PATCH 2/2] arch: add pidfd and io_uring syscalls everywhere

2019-03-31 Thread Michael Ellerman
Arnd Bergmann  writes:
> Add the io_uring and pidfd_send_signal system calls to all architectures.
>
> These system calls are designed to handle both native and compat tasks,
> so all entries are the same across architectures, only arm-compat and
> the generic table still use an old format.
>
> Signed-off-by: Arnd Bergmann 
> ---
>  arch/alpha/kernel/syscalls/syscall.tbl  | 4 
>  arch/arm/tools/syscall.tbl  | 4 
>  arch/arm64/include/asm/unistd.h | 2 +-
>  arch/arm64/include/asm/unistd32.h   | 8 
>  arch/ia64/kernel/syscalls/syscall.tbl   | 4 
>  arch/m68k/kernel/syscalls/syscall.tbl   | 4 
>  arch/microblaze/kernel/syscalls/syscall.tbl | 4 
>  arch/mips/kernel/syscalls/syscall_n32.tbl   | 4 
>  arch/mips/kernel/syscalls/syscall_n64.tbl   | 4 
>  arch/mips/kernel/syscalls/syscall_o32.tbl   | 4 
>  arch/parisc/kernel/syscalls/syscall.tbl | 4 
>  arch/powerpc/kernel/syscalls/syscall.tbl| 4 

Have you done any testing?

I'd rather not wire up syscalls that have never been tested at all on
powerpc.

cheers


Re: [PATCH net-next v5 12/12] sock: Add SO_RCVTIMEO_NEW and SO_SNDTIMEO_NEW

2019-02-10 Thread Michael Ellerman
Deepa Dinamani  writes:

>> You touched powerpc in the previous patch but not this one.
>>
>> That's because we use the asm-generic version I assume.
>
> That is correct.
>
>> Would be good to mention in the change log though to avoid any confusion.
>
> I'm not sure how to do that now. It looks like the series has already
> been applied to net-next with a couple of merge conflicts fixed.

That's fine, it's not that important.

cheers


Re: [PATCH net-next v5 11/12] socket: Rename SO_RCVTIMEO/ SO_SNDTIMEO with _OLD suffixes

2019-02-06 Thread Michael Ellerman
Deepa Dinamani  writes:

> SO_RCVTIMEO and SO_SNDTIMEO socket options use struct timeval
> as the time format. struct timeval is not y2038 safe.
> The subsequent patches in the series add support for new socket
> timeout options with _NEW suffix that will use y2038 safe
> data structures. Although the existing struct timeval layout
> is sufficiently wide to represent timeouts, because of the way
> libc will interpret time_t based on user defined flag, these
> new flags provide a way of having a structure that is the same
> for all architectures consistently.
> Rename the existing options with _OLD suffix forms so that the
> right option is enabled for userspace applications according
> to the architecture and time_t definition of libc.
>
> Signed-off-by: Deepa Dinamani 
> Acked-by: Willem de Bruijn 
> Cc: ccaul...@redhat.com
> Cc: del...@gmx.de
> Cc: pau...@samba.org
> Cc: r...@linux-mips.org
> Cc: r...@twiddle.net
> Cc: cluster-de...@redhat.com
> Cc: linuxppc-...@lists.ozlabs.org
> Cc: linux-alpha@vger.kernel.org
> Cc: linux-a...@vger.kernel.org
> Cc: linux-m...@vger.kernel.org
> Cc: linux-par...@vger.kernel.org
> Cc: sparcli...@vger.kernel.org
> ---
>  arch/alpha/include/uapi/asm/socket.h   | 7 +--
>  arch/mips/include/uapi/asm/socket.h| 6 --
>  arch/parisc/include/uapi/asm/socket.h  | 6 --
>  arch/powerpc/include/uapi/asm/socket.h | 4 ++--

The powerpc changes look OK to me.

Acked-by: Michael Ellerman  (powerpc)

cheers

> diff --git a/arch/powerpc/include/uapi/asm/socket.h 
> b/arch/powerpc/include/uapi/asm/socket.h
> index 94de465e0920..12aa0c43e775 100644
> --- a/arch/powerpc/include/uapi/asm/socket.h
> +++ b/arch/powerpc/include/uapi/asm/socket.h
> @@ -11,8 +11,8 @@
>  
>  #define SO_RCVLOWAT  16
>  #define SO_SNDLOWAT  17
> -#define SO_RCVTIMEO  18
> -#define SO_SNDTIMEO  19
> +#define SO_RCVTIMEO_OLD  18
> +#define SO_SNDTIMEO_OLD  19
>  #define SO_PASSCRED  20
>  #define SO_PEERCRED  21
>  
> diff --git a/include/uapi/asm-generic/socket.h 
> b/include/uapi/asm-generic/socket.h
> index 2713e0fa68ef..c56b8b487c12 100644
> --- a/include/uapi/asm-generic/socket.h
> +++ b/include/uapi/asm-generic/socket.h
> @@ -30,8 +30,8 @@
>  #define SO_PEERCRED  17
>  #define SO_RCVLOWAT  18
>  #define SO_SNDLOWAT  19
> -#define SO_RCVTIMEO  20
> -#define SO_SNDTIMEO  21
> +#define SO_RCVTIMEO_OLD  20
> +#define SO_SNDTIMEO_OLD  21
>  #endif
>  
>  /* Security levels - as per NRL IPv6 - don't actually do anything */
> @@ -116,6 +116,8 @@
>  
>  #if !defined(__KERNEL__)
>  
> +#define  SO_RCVTIMEO SO_RCVTIMEO_OLD
> +#define  SO_SNDTIMEO SO_SNDTIMEO_OLD
>  #if __BITS_PER_LONG == 64 || (defined(__x86_64__) && defined(__ILP32__))
>  /* on 64-bit and x32, avoid the ?: operator */
>  #define SO_TIMESTAMP SO_TIMESTAMP_OLD


Re: [PATCH net-next v5 12/12] sock: Add SO_RCVTIMEO_NEW and SO_SNDTIMEO_NEW

2019-02-06 Thread Michael Ellerman
Deepa Dinamani  writes:

> Add new socket timeout options that are y2038 safe.
>
> Signed-off-by: Deepa Dinamani 
> Acked-by: Willem de Bruijn 
> Cc: ccaul...@redhat.com
> Cc: da...@davemloft.net
> Cc: del...@gmx.de
> Cc: pau...@samba.org
> Cc: r...@linux-mips.org
> Cc: r...@twiddle.net
> Cc: cluster-de...@redhat.com
> Cc: linuxppc-...@lists.ozlabs.org
> Cc: linux-alpha@vger.kernel.org
> Cc: linux-a...@vger.kernel.org
> Cc: linux-m...@vger.kernel.org
> Cc: linux-par...@vger.kernel.org
> Cc: sparcli...@vger.kernel.org
> ---
>  arch/alpha/include/uapi/asm/socket.h  | 12 --
>  arch/mips/include/uapi/asm/socket.h   | 11 +-
>  arch/parisc/include/uapi/asm/socket.h | 10 -
>  arch/sparc/include/uapi/asm/socket.h  | 11 +-

You touched powerpc in the previous patch but not this one.

That's because we use the asm-generic version I assume. Would be good to
mention in the change log though to avoid any confusion.
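
For readers following along, the selection done by the SO_RCVTIMEO macro in the quoted patch can be sketched as a plain function. The names are hypothetical, and the numeric values are simply borrowed from the hunks in this thread (20 from the asm-generic `_OLD` define, 66 from the alpha/mips `_NEW` define), so treat them as illustrative rather than as the values for any one architecture.

```c
#include <stddef.h>

/* Illustrative values only, taken from the hunks quoted in this thread. */
enum {
    EX_SO_RCVTIMEO_OLD = 20,
    EX_SO_RCVTIMEO_NEW = 66
};

/*
 * Mirrors the shape of the 32-bit branch of the macro:
 *   sizeof(time_t) == sizeof(__kernel_long_t) ? _OLD : _NEW
 * The two sizes are passed in so the choice can be exercised for any
 * libc/kernel combination without rebuilding.
 */
int ex_pick_rcvtimeo(size_t time_t_size, size_t kernel_long_size)
{
    return time_t_size == kernel_long_size ? EX_SO_RCVTIMEO_OLD
                                           : EX_SO_RCVTIMEO_NEW;
}
```

A 32-bit libc built with a 64-bit time_t corresponds to `ex_pick_rcvtimeo(8, 4)` and ends up with the `_NEW` option; matching sizes keep the `_OLD` one.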

cheers

>  include/uapi/asm-generic/socket.h | 11 +-
>  net/core/sock.c   | 53 ---
>  6 files changed, 83 insertions(+), 25 deletions(-)
>
> diff --git a/arch/alpha/include/uapi/asm/socket.h 
> b/arch/alpha/include/uapi/asm/socket.h
> index 9826d1db71d0..0d0fddb7e738 100644
> --- a/arch/alpha/include/uapi/asm/socket.h
> +++ b/arch/alpha/include/uapi/asm/socket.h
> @@ -119,19 +119,25 @@
>  #define SO_TIMESTAMPNS_NEW  64
>  #define SO_TIMESTAMPING_NEW 65
>  
> -#if !defined(__KERNEL__)
> +#define SO_RCVTIMEO_NEW 66
> +#define SO_SNDTIMEO_NEW 67
>  
> -#define  SO_RCVTIMEO SO_RCVTIMEO_OLD
> -#define  SO_SNDTIMEO SO_SNDTIMEO_OLD
> +#if !defined(__KERNEL__)
>  
>  #if __BITS_PER_LONG == 64
>  #define SO_TIMESTAMP SO_TIMESTAMP_OLD
>  #define SO_TIMESTAMPNS   SO_TIMESTAMPNS_OLD
>  #define SO_TIMESTAMPING SO_TIMESTAMPING_OLD
> +
> +#define SO_RCVTIMEO  SO_RCVTIMEO_OLD
> +#define SO_SNDTIMEO  SO_SNDTIMEO_OLD
>  #else
>  #define SO_TIMESTAMP (sizeof(time_t) == sizeof(__kernel_long_t) ? SO_TIMESTAMP_OLD : SO_TIMESTAMP_NEW)
>  #define SO_TIMESTAMPNS (sizeof(time_t) == sizeof(__kernel_long_t) ? SO_TIMESTAMPNS_OLD : SO_TIMESTAMPNS_NEW)
>  #define SO_TIMESTAMPING (sizeof(time_t) == sizeof(__kernel_long_t) ? SO_TIMESTAMPING_OLD : SO_TIMESTAMPING_NEW)
> +
> +#define SO_RCVTIMEO (sizeof(time_t) == sizeof(__kernel_long_t) ? SO_RCVTIMEO_OLD : SO_RCVTIMEO_NEW)
> +#define SO_SNDTIMEO (sizeof(time_t) == sizeof(__kernel_long_t) ? SO_SNDTIMEO_OLD : SO_SNDTIMEO_NEW)
>  #endif
>  
>  #define SCM_TIMESTAMP   SO_TIMESTAMP
> diff --git a/arch/mips/include/uapi/asm/socket.h 
> b/arch/mips/include/uapi/asm/socket.h
> index 96cc0e907f12..eb9f33f8a8b3 100644
> --- a/arch/mips/include/uapi/asm/socket.h
> +++ b/arch/mips/include/uapi/asm/socket.h
> @@ -130,18 +130,25 @@
>  #define SO_TIMESTAMPNS_NEW  64
>  #define SO_TIMESTAMPING_NEW 65
>  
> +#define SO_RCVTIMEO_NEW 66
> +#define SO_SNDTIMEO_NEW 67
> +
>  #if !defined(__KERNEL__)
>  
> -#define  SO_RCVTIMEO SO_RCVTIMEO_OLD
> -#define  SO_SNDTIMEO SO_SNDTIMEO_OLD
>  #if __BITS_PER_LONG == 64
>  #define SO_TIMESTAMP SO_TIMESTAMP_OLD
>  #define SO_TIMESTAMPNS   SO_TIMESTAMPNS_OLD
>  #define SO_TIMESTAMPING  SO_TIMESTAMPING_OLD
> +
> +#define SO_RCVTIMEO SO_RCVTIMEO_OLD
> +#define SO_SNDTIMEO SO_SNDTIMEO_OLD
>  #else
>  #define SO_TIMESTAMP (sizeof(time_t) == sizeof(__kernel_long_t) ? SO_TIMESTAMP_OLD : SO_TIMESTAMP_NEW)
>  #define SO_TIMESTAMPNS (sizeof(time_t) == sizeof(__kernel_long_t) ? SO_TIMESTAMPNS_OLD : SO_TIMESTAMPNS_NEW)
>  #define SO_TIMESTAMPING (sizeof(time_t) == sizeof(__kernel_long_t) ? SO_TIMESTAMPING_OLD : SO_TIMESTAMPING_NEW)
> +
> +#define SO_RCVTIMEO (sizeof(time_t) == sizeof(__kernel_long_t) ? SO_RCVTIMEO_OLD : SO_RCVTIMEO_NEW)
> +#define SO_SNDTIMEO (sizeof(time_t) == sizeof(__kernel_long_t) ? SO_SNDTIMEO_OLD : SO_SNDTIMEO_NEW)
>  #endif
>  
>  #define SCM_TIMESTAMP   SO_TIMESTAMP
> diff --git a/arch/parisc/include/uapi/asm/socket.h 
> b/arch/parisc/include/uapi/asm/socket.h
> index 046f0cd9cce4..16e428f03526 100644
> --- a/arch/parisc/include/uapi/asm/socket.h
> +++ b/arch/parisc/include/uapi/asm/socket.h
> @@ -111,18 +111,24 @@
>  #define SO_TIMESTAMPNS_NEW  0x4039
>  #define SO_TIMESTAMPING_NEW 0x403A
>  
> +#define SO_RCVTIMEO_NEW 0x4040
> +#define SO_SNDTIMEO_NEW 0x4041
> +
>  #if !defined(__KERNEL__)
>  
> -#define  SO_RCVTIMEO SO_RCVTIMEO_OLD
> -#define  SO_SNDTIMEO SO_SNDTIMEO_OLD
>  #if __BITS_PER_LONG == 64
>  #define SO_TIMESTAMP SO_TIMESTAMP_OLD
>  #define SO_TIMESTAMPNS   SO_TIMESTAMPNS_OLD
>  #define SO_TIMESTAMPING SO_TIMESTAMPING_OLD
> +#define SO_RCVTIMEO  SO_RCVTIMEO_OLD
> +#define SO_SNDTIMEO  SO_SNDTIMEO_OLD
>  #else
>  #define SO_TIMESTAMP (sizeof(time_t) == sizeof(__kernel_long_t) ? 
> SO_TIMESTAMP_OLD : 

Re: [PATCH v2 10/21] memblock: refactor internal allocation functions

2019-02-03 Thread Michael Ellerman
Mike Rapoport  writes:

> Currently, memblock has several internal functions with overlapping
> functionality. They all call memblock_find_in_range_node() to find free
> memory and then reserve the allocated range and mark it with kmemleak.
> However, there is difference in the allocation constraints and in fallback
> strategies.
>
> The allocations returning physical address first attempt to find free
> memory on the specified node within mirrored memory regions, then retry on
> the same node without the requirement for memory mirroring and finally fall
> back to all available memory.
>
> The allocations returning virtual address start with clamping the allowed
> range to memblock.current_limit, attempt to allocate from the specified
> node from regions with mirroring and with user defined minimal address. If
> such allocation fails, next attempt is done with node restriction lifted.
> Next, the allocation is retried with minimal address reset to zero and at
> last without the requirement for mirrored regions.
>
> Let's consolidate various fallbacks handling and make them more consistent
> for physical and virtual variants. Most of the fallback handling is moved
> to memblock_alloc_range_nid() and it now handles node and mirror fallbacks.
>
> The memblock_alloc_internal() uses memblock_alloc_range_nid() to get a
> physical address of the allocated range and converts it to virtual address.
>
> The fallback for allocation below the specified minimal address remains in
> memblock_alloc_internal() because memblock_alloc_range_nid() is used by CMA
> with exact requirement for lower bounds.
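
The fallback order described above for the physical-address variant (same node with mirroring, same node without mirroring, then any node) can be sketched as a toy model. This is not the memblock code: `struct ex_region`, `ex_find()` and `ex_alloc_fallback()` are hypothetical names, and the "memory map" is just an array of whole regions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EX_NODE_ANY (-1)

/* Toy stand-in for a free memory region; not a memblock structure. */
struct ex_region {
    uint64_t base;      /* non-zero start address */
    uint64_t size;
    int      nid;       /* NUMA node the region belongs to */
    bool     mirrored;
    bool     used;
};

/* Claim the first free region matching the constraints; 0 means failure. */
uint64_t ex_find(struct ex_region *r, size_t n, uint64_t size,
                 int nid, bool need_mirror)
{
    for (size_t i = 0; i < n; i++) {
        if (r[i].used || r[i].size < size)
            continue;
        if (nid != EX_NODE_ANY && r[i].nid != nid)
            continue;
        if (need_mirror && !r[i].mirrored)
            continue;
        r[i].used = true;
        return r[i].base;
    }
    return 0;
}

/* Fallback chain: node + mirror, then node only, then any node. */
uint64_t ex_alloc_fallback(struct ex_region *r, size_t n,
                           uint64_t size, int nid)
{
    uint64_t pa = ex_find(r, n, size, nid, true);

    if (!pa)
        pa = ex_find(r, n, size, nid, false);
    if (!pa && nid != EX_NODE_ANY)
        pa = ex_find(r, n, size, EX_NODE_ANY, false);
    return pa;
}
```

In this model a request for node 1 only lands on node 0 when node 1 has nothing suitable left, which is the behaviour one would expect for per-node data.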

This is causing problems on some of my machines.

I see NODE_DATA allocations falling back to node 0 when they shouldn't,
or didn't previously.

eg, before:

57990190: (116011251): numa:   NODE_DATA [mem 0xfffe4980-0xfffebfff]
58152042: (116373087): numa:   NODE_DATA [mem 0x8fff90980-0x8fff97fff]

after:

16356872061562: (6296877055): numa:   NODE_DATA [mem 0xfffe4980-0xfffebfff]
16356872079279: (6296894772): numa:   NODE_DATA [mem 0xfffcd300-0xfffd497f]
16356872096376: (6296911869): numa: NODE_DATA(1) on node 0


On some of my other systems it does that, and then panics because it
can't allocate anything at all:

[0.00] numa:   NODE_DATA [mem 0x7ffcaee80-0x7ffcb3fff]
[0.00] numa:   NODE_DATA [mem 0x7ffc99d00-0x7ffc9ee7f]
[0.00] numa: NODE_DATA(1) on node 0
[0.00] Kernel panic - not syncing: Cannot allocate 20864 bytes for node 16 data
[0.00] CPU: 0 PID: 0 Comm: swapper Not tainted 5.0.0-rc4-gccN-next-20190201-gdc4c899 #1
[0.00] Call Trace:
[0.00] [c11cfca0] [c0c11044] dump_stack+0xe8/0x164 (unreliable)
[0.00] [c11cfcf0] [c00fdd6c] panic+0x17c/0x3e0
[0.00] [c11cfd90] [c0f61bc8] initmem_init+0x128/0x260
[0.00] [c11cfe60] [c0f57940] setup_arch+0x398/0x418
[0.00] [c11cfee0] [c0f50a94] start_kernel+0xa0/0x684
[0.00] [c11cff90] [c000af70] start_here_common+0x1c/0x52c
[0.00] Rebooting in 180 seconds..


So there's something going wrong there, I haven't had time to dig into
it though (Sunday night here).

cheers


Re: [PATCH v2 09/21] memblock: drop memblock_alloc_base()

2019-01-29 Thread Michael Ellerman
Mike Rapoport  writes:

> The memblock_alloc_base() function tries to allocate a memory up to the
> limit specified by its max_addr parameter and panics if the allocation
> fails. Replace its usage with memblock_phys_alloc_range() and make the
> callers check the return value and panic in case of error.
>
> Signed-off-by: Mike Rapoport 
> ---
>  arch/powerpc/kernel/rtas.c  |  6 +-
>  arch/powerpc/mm/hash_utils_64.c |  8 ++--
>  arch/s390/kernel/smp.c  |  6 +-
>  drivers/macintosh/smu.c |  2 +-
>  include/linux/memblock.h|  2 --
>  mm/memblock.c   | 14 --
>  6 files changed, 17 insertions(+), 21 deletions(-)

Acked-by: Michael Ellerman  (powerpc)

cheers


Re: [PATCH v2 06/21] memblock: memblock_phys_alloc_try_nid(): don't panic

2019-01-29 Thread Michael Ellerman
Michael Ellerman  writes:

> Mike Rapoport  writes:
>
>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
>> index ae34e3a..2c61ea4 100644
>> --- a/arch/arm64/mm/numa.c
>> +++ b/arch/arm64/mm/numa.c
>> @@ -237,6 +237,10 @@ static void __init setup_node_data(int nid, u64 
>> start_pfn, u64 end_pfn)
>>  pr_info("Initmem setup node %d []\n", nid);
>>  
>>  nd_pa = memblock_phys_alloc_try_nid(nd_size, SMP_CACHE_BYTES, nid);
>> +if (!nd_pa)
>> +panic("Cannot allocate %zu bytes for node %d data\n",
>> +      nd_size, nid);
>> +
>>  nd = __va(nd_pa);

Wrong hunk, O_o

> Acked-by: Michael Ellerman  (powerpc)

You know what I mean though :)

cheers


Re: [PATCH v2 06/21] memblock: memblock_phys_alloc_try_nid(): don't panic

2019-01-29 Thread Michael Ellerman
Mike Rapoport  writes:

> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
> index ae34e3a..2c61ea4 100644
> --- a/arch/arm64/mm/numa.c
> +++ b/arch/arm64/mm/numa.c
> @@ -237,6 +237,10 @@ static void __init setup_node_data(int nid, u64 
> start_pfn, u64 end_pfn)
>   pr_info("Initmem setup node %d []\n", nid);
>  
>   nd_pa = memblock_phys_alloc_try_nid(nd_size, SMP_CACHE_BYTES, nid);
> + if (!nd_pa)
> + panic("Cannot allocate %zu bytes for node %d data\n",
> +   nd_size, nid);
> +
>   nd = __va(nd_pa);

Acked-by: Michael Ellerman  (powerpc)

cheers


Re: [PATCH v2 02/21] powerpc: use memblock functions returning virtual address

2019-01-29 Thread Michael Ellerman
Mike Rapoport  writes:

> From: Christophe Leroy 
>
> Since only the virtual address of allocated blocks is used,
> lets use functions returning directly virtual address.
>
> Those functions have the advantage of also zeroing the block.
>
> [ MR:
>  - updated error message in alloc_stack() to be more verbose
>  - converted several additional call sites ]
>
> Signed-off-by: Christophe Leroy 
> Signed-off-by: Mike Rapoport 
> ---
>  arch/powerpc/kernel/dt_cpu_ftrs.c |  3 +--
>  arch/powerpc/kernel/irq.c |  5 -
>  arch/powerpc/kernel/paca.c|  6 +-
>  arch/powerpc/kernel/prom.c|  5 -
>  arch/powerpc/kernel/setup_32.c| 26 --
>  5 files changed, 26 insertions(+), 19 deletions(-)

LGTM.

Acked-by: Michael Ellerman 

cheers


Re: [PATCH 14/15] arch: add split IPC system calls where needed

2019-01-15 Thread Michael Ellerman
Arnd Bergmann  writes:
> On Mon, Jan 14, 2019 at 4:59 AM Michael Ellerman  wrote:
>> Arnd Bergmann  writes:
>> >  arch/m68k/kernel/syscalls/syscall.tbl | 11 +++
>> >  arch/mips/kernel/syscalls/syscall_o32.tbl | 11 +++
>> >  arch/powerpc/kernel/syscalls/syscall.tbl  | 12 
>>
>> I have some changes I'd like to make to our syscall table that will
>> clash with this.
>>
>> I'll try and send them today.
>
> Ok. Are those for 5.0 or 5.1? If they are intended for 5.0, it would be
> nice for me to have a branch based on 5.0-rc1 that I can put
> the other patches on top of.

For 5.1.

I can put them in a topic branch for you.

>> > diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl 
>> > b/arch/powerpc/kernel/syscalls/syscall.tbl
>> > index db3bbb8744af..1bffab54ff35 100644
>> > --- a/arch/powerpc/kernel/syscalls/syscall.tbl
>> > +++ b/arch/powerpc/kernel/syscalls/syscall.tbl
>> > @@ -425,3 +425,15 @@
>> >  386  nospu   pkey_mprotect   sys_pkey_mprotect
>> >  387  nospu   rseqsys_rseq
>> >  388  nospu   io_pgetevents   sys_io_pgetevents
>> >compat_sys_io_pgetevents
>> > +# room for arch specific syscalls
>> > +392  64  semtimedop  sys_semtimedop
>> > +393  common  semget  sys_semget
>> > +394  common  semctl  sys_semctl   
>> >compat_sys_semctl
>> > +395  common  shmget  sys_shmget
>> > +396  common  shmctl  sys_shmctl   
>> >compat_sys_shmctl
>> > +397  common  shmat   sys_shmat
>> >compat_sys_shmat
>> > +398  common  shmdt   sys_shmdt
>> > +399  common  msgget  sys_msgget
>> > +400  common  msgsnd  sys_msgsnd   
>> >compat_sys_msgsnd
>> > +401  common  msgrcv  sys_msgrcv   
>> >compat_sys_msgrcv
>> > +402  common  msgctl  sys_msgctl   
>> >compat_sys_msgctl
>>
>> We already have a gap at 366-377 from when we tried to add the split IPC
>> calls a few years back.
>>
>> I guess I don't mind leaving that gap and using the common numbers as
>> you've done here.
>>
>> But it would be good to add a comment pointing out that we have room
>> at 366 for more arch specific syscalls as well.
>
> Ah, I missed that. I've added this to my patch now:
>
> index 5c0936d862fc..2ddfba536d5f 100644
> --- a/arch/powerpc/kernel/syscalls/syscall.tbl
> +++ b/arch/powerpc/kernel/syscalls/syscall.tbl
> @@ -460,6 +460,7 @@
>  363spu switch_endian   sys_ni_syscall
>  364common  userfaultfd sys_userfaultfd
>  365common  membarrier  sys_membarrier
> +# 366-377 originally left for IPC, now unused
>  378nospu   mlock2  sys_mlock2
>  379nospu   copy_file_range sys_copy_file_range
>  380common  preadv2 sys_preadv2
>  compat_sys_preadv2

Thanks.

cheers


Re: [PATCH 14/15] arch: add split IPC system calls where needed

2019-01-15 Thread Michael Ellerman
Arnd Bergmann  writes:

> On Tue, Jan 15, 2019 at 4:01 PM Arnd Bergmann  wrote:
>>
>> On Mon, Jan 14, 2019 at 4:59 AM Michael Ellerman  wrote:
>> > Arnd Bergmann  writes:
>> > >  arch/m68k/kernel/syscalls/syscall.tbl | 11 +++
>> > >  arch/mips/kernel/syscalls/syscall_o32.tbl | 11 +++
>> > >  arch/powerpc/kernel/syscalls/syscall.tbl  | 12 
>> >
>> > I have some changes I'd like to make to our syscall table that will
>> > clash with this.
>> >
>> > I'll try and send them today.
>>
>> Ok. Are those for 5.0 or 5.1? If they are intended for 5.0, it would be
>> nice for me to have a branch based on 5.0-rc1 that I can put
>> the other patches on top of.
>
> There is also another change that I considered:
>
> At the end of my series, we have a lot of entries like
>
> 245 32  clock_settime   sys_clock_settime32
> 245 64  clock_settime   sys_clock_settime
> 245 spu clock_settime   sys_clock_settime
>
> which could be folded into
>
> 245 32  clock_settime   sys_clock_settime32
> 245 spu64 clock_settime   sys_clock_settime
>
> if we just add another option to the ABI field. Any thoughts on
> that?

My series splits spu out into a separate field. So the above would be:

245 32  -   clock_settime   sys_clock_settime32
245 64  spu clock_settime   sys_clock_settime
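
As a rough illustration of the layout with spu as its own column, a line in that shape could be parsed as below. The column meanings are inferred only from the two example lines above (number, ABI, spu marker, name, entry point), with "-" assumed to mean "not for SPU"; the struct and function names are hypothetical and this is not the kernel's table-processing script.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one table line in the five-column layout. */
struct ex_tbl_entry {
    int  nr;
    char abi[16];
    char spu[8];     /* "spu" or "-" in the examples above */
    char name[32];
    char entry[64];
    int  on_spu;     /* 1 if the spu column says "spu" */
};

/* Returns the number of fields parsed (5 for a complete line). */
int ex_parse_tbl_line(const char *line, struct ex_tbl_entry *e)
{
    int n = sscanf(line, "%d %15s %7s %31s %63s",
                   &e->nr, e->abi, e->spu, e->name, e->entry);

    e->on_spu = (n >= 3 && strcmp(e->spu, "spu") == 0);
    return n;
}
```

With a separate column, the 64-bit and SPU cases share one line and only the 32-bit entry point needs its own.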

cheers


Re: [PATCH 14/15] arch: add split IPC system calls where needed

2019-01-14 Thread Michael Ellerman
Michael Ellerman  writes:
> Hi Arnd,
>
> Arnd Bergmann  writes:
>> The IPC system call handling is highly inconsistent across architectures,
>> some use sys_ipc, some use separate calls, and some use both.  We also
>> have some architectures that require passing IPC_64 in the flags, and
>> others that set it implicitly.
...
>
> We already have a gap at 366-377 from when we tried to add the split IPC
> calls a few years back.
>
> I guess I don't mind leaving that gap and using the common numbers as
> you've done here.
>
> But it would be good to add a comment pointing out that we have room
> at 366 for more arch specific syscalls as well.
>
> cheers

Guess I sent that one twice.

cheers


Re: [PATCH 14/15] arch: add split IPC system calls where needed

2019-01-13 Thread Michael Ellerman
Hi Arnd,

Arnd Bergmann  writes:
> The IPC system call handling is highly inconsistent across architectures,
> some use sys_ipc, some use separate calls, and some use both.  We also
> have some architectures that require passing IPC_64 in the flags, and
> others that set it implicitly.
>
> For the addition of a y2038 safe semtimedop() system call, I chose to only
> support the separate entry points, but that requires first supporting
> the regular ones with their own syscall numbers.
>
> The IPC_64 is now implied by the new semctl/shmctl/msgctl system
> calls even on the architectures that require passing it with the ipc()
> multiplexer.
>
> I'm not adding the new semtimedop() or semop() on 32-bit architectures,
> those will get implemented using the new semtimedop_time64() version
> that gets added along with the other time64 calls.
> Three 64-bit architectures (powerpc, s390 and sparc) get semtimedop().
>
> Signed-off-by: Arnd Bergmann 
> ---
> One aspect here that might be a bit controversial is the use of
> the same system call numbers across all architectures, synchronizing
> all of them with the x86-32 numbers. With the new syscall.tbl
> files, I hope we can just keep doing that in the future, and no
> longer require the architecture maintainers to assign a number.
>
> This is mainly useful for implementers of the C libraries: if
> we can add future system calls everywhere at the same time, using
> a particular version of the kernel headers also guarantees that
> the system call number macro is visible.
> ---
>  arch/m68k/kernel/syscalls/syscall.tbl | 11 +++
>  arch/mips/kernel/syscalls/syscall_o32.tbl | 11 +++
>  arch/powerpc/kernel/syscalls/syscall.tbl  | 12 

I have some changes I'd like to make to our syscall table that will
clash with this.

I'll try and send them today.

> diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl 
> b/arch/powerpc/kernel/syscalls/syscall.tbl
> index db3bbb8744af..1bffab54ff35 100644
> --- a/arch/powerpc/kernel/syscalls/syscall.tbl
> +++ b/arch/powerpc/kernel/syscalls/syscall.tbl
> @@ -425,3 +425,15 @@
>  386  nospu   pkey_mprotect   sys_pkey_mprotect
>  387  nospu   rseqsys_rseq
>  388  nospu   io_pgetevents   sys_io_pgetevents   
> compat_sys_io_pgetevents
> +# room for arch specific syscalls
> +392  64  semtimedop  sys_semtimedop
> +393  common  semget  sys_semget
> +394  common  semctl  sys_semctl  
> compat_sys_semctl
> +395  common  shmget  sys_shmget
> +396  common  shmctl  sys_shmctl  
> compat_sys_shmctl
> +397  common  shmat   sys_shmat   
> compat_sys_shmat
> +398  common  shmdt   sys_shmdt
> +399  common  msgget  sys_msgget
> +400  common  msgsnd  sys_msgsnd  
> compat_sys_msgsnd
> +401  common  msgrcv  sys_msgrcv  
> compat_sys_msgrcv
> +402  common  msgctl  sys_msgctl  
> compat_sys_msgctl

We already have a gap at 366-377 from when we tried to add the split IPC
calls a few years back.

I guess I don't mind leaving that gap and using the common numbers as
you've done here.

But it would be good to add a comment pointing out that we have room
at 366 for more arch specific syscalls as well.

cheers


Re: [PATCH 14/15] arch: add split IPC system calls where needed

2019-01-13 Thread Michael Ellerman
Hi Arnd,

Arnd Bergmann  writes:
> The IPC system call handling is highly inconsistent across architectures,
> some use sys_ipc, some use separate calls, and some use both.  We also
> have some architectures that require passing IPC_64 in the flags, and
> others that set it implicitly.
>
> For the addition of a y2038 safe semtimedop() system call, I chose to only
> support the separate entry points, but that requires first supporting
> the regular ones with their own syscall numbers.
>
> The IPC_64 is now implied by the new semctl/shmctl/msgctl system
> calls even on the architectures that require passing it with the ipc()
> multiplexer.
>
> I'm not adding the new semtimedop() or semop() on 32-bit architectures,
> those will get implemented using the new semtimedop_time64() version
> that gets added along with the other time64 calls.
> Three 64-bit architectures (powerpc, s390 and sparc) get semtimedop().
>
> Signed-off-by: Arnd Bergmann 
> ---
> One aspect here that might be a bit controversial is the use of
> the same system call numbers across all architectures, synchronizing
> all of them with the x86-32 numbers. With the new syscall.tbl
> files, I hope we can just keep doing that in the future, and no
> longer require the architecture maintainers to assign a number.
>
> This is mainly useful for implementers of the C libraries: if
> we can add future system calls everywhere at the same time, using
> a particular version of the kernel headers also guarantees that
> the system call number macro is visible.
> ---
>  arch/m68k/kernel/syscalls/syscall.tbl | 11 +++
>  arch/mips/kernel/syscalls/syscall_o32.tbl | 11 +++
>  arch/powerpc/kernel/syscalls/syscall.tbl  | 12 

I have some changes I'd like to make to our syscall table that will
clash with this.

I'll try and send them today.

> diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl 
> b/arch/powerpc/kernel/syscalls/syscall.tbl
> index db3bbb8744af..1bffab54ff35 100644
> --- a/arch/powerpc/kernel/syscalls/syscall.tbl
> +++ b/arch/powerpc/kernel/syscalls/syscall.tbl
> @@ -425,3 +425,15 @@
>  386  nospu   pkey_mprotect   sys_pkey_mprotect
>  387  nospu   rseqsys_rseq
>  388  nospu   io_pgetevents   sys_io_pgetevents   
> compat_sys_io_pgetevents
> +# room for arch specific syscalls
> +392  64  semtimedop  sys_semtimedop
> +393  common  semget  sys_semget
> +394  common  semctl  sys_semctl  
> compat_sys_semctl
> +395  common  shmget  sys_shmget
> +396  common  shmctl  sys_shmctl  
> compat_sys_shmctl
> +397  common  shmat   sys_shmat   
> compat_sys_shmat
> +398  common  shmdt   sys_shmdt
> +399  common  msgget  sys_msgget
> +400  common  msgsnd  sys_msgsnd  
> compat_sys_msgsnd
> +401  common  msgrcv  sys_msgrcv  
> compat_sys_msgrcv
> +402  common  msgctl  sys_msgctl  
> compat_sys_msgctl

We already have a gap at 366-377 from when we tried to add the split IPC
calls a few years back.

I guess I don't mind leaving that gap and using the common numbers.

But would be good to add a comment pointing out that we have room there
for arch specific syscalls as well.

cheers


Re: [PATCH v4 10/13] x86: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude incapable PMUs

2019-01-10 Thread Michael Ellerman
Peter Zijlstra  writes:

> On Mon, Jan 07, 2019 at 04:27:27PM +, Andrew Murray wrote:
>> For drivers that do not support context exclusion let's advertise the
>> PERF_PMU_CAP_NO_EXCLUDE capability. This ensures that perf will
>> prevent us from handling events where any exclusion flags are set.
>> Let's also remove the now unnecessary check for exclusion flags.
>> 
>> Signed-off-by: Andrew Murray 
>> ---
>>  arch/x86/events/amd/ibs.c  | 13 +
>>  arch/x86/events/amd/power.c| 10 ++
>>  arch/x86/events/intel/cstate.c | 12 +++-
>>  arch/x86/events/intel/rapl.c   |  9 ++---
>>  arch/x86/events/intel/uncore_snb.c |  9 ++---
>>  arch/x86/events/msr.c  | 10 ++
>>  6 files changed, 12 insertions(+), 51 deletions(-)
>
> You (correctly) don't add CAP_NO_EXCLUDE to the main x86 pmu code, but
> then you also don't check if it handles all the various exclude options
> correctly/consistently.
>
> Now; I must admit that that is a bit of a maze, but I think we can at
> least add exclude_idle and exclude_hv fails in there, nothing uses those
> afaict.
>
> On the various exclude options; they are as follows (IIUC):
>
>   - exclude_guest: we're a HV/host-kernel and we don't want the counter
>to run when we run a guest context.
>
>   - exclude_host: we're a HV/host-kernel and we don't want the counter
>   to run when we run in host context.
>
>   - exclude_hv: we're a guest and don't want the counter to run in HV
> context.
>
> Now, KVM always implies exclude_hv afaict (for guests)

On Power it mostly does.

There's some host code that can run in real mode (MMU off) and therefore
doesn't do a full context switch out of the guest (including the PMU),
so that's host code that is running while the guest PMCs are still
counting.

cheers


Re: [PATCH 10/10] perf/doc: update design.txt for exclude_{host|guest} flags

2018-12-11 Thread Michael Ellerman
Andrew Murray  writes:
> On Tue, Dec 11, 2018 at 10:06:53PM +1100, Michael Ellerman wrote:
>> [ Reviving old thread. ]
>> 
>> Andrew Murray  writes:
>> > On Tue, Nov 20, 2018 at 10:31:36PM +1100, Michael Ellerman wrote:
>> >> Andrew Murray  writes:
>> >> 
>> >> > Update design.txt to reflect the presence of the exclude_host
>> >> > and exclude_guest perf flags.
>> >> >
>> >> > Signed-off-by: Andrew Murray 
>> >> > ---
>> >> >  tools/perf/design.txt | 4 
>> >> >  1 file changed, 4 insertions(+)
>> >> >
>> >> > diff --git a/tools/perf/design.txt b/tools/perf/design.txt
>> >> > index a28dca2..7de7d83 100644
>> >> > --- a/tools/perf/design.txt
>> >> > +++ b/tools/perf/design.txt
>> >> > @@ -222,6 +222,10 @@ The 'exclude_user', 'exclude_kernel' and 
>> >> > 'exclude_hv' bits provide a
>> >> >  way to request that counting of events be restricted to times when the
>> >> >  CPU is in user, kernel and/or hypervisor mode.
>> >> >  
>> >> > +Furthermore the 'exclude_host' and 'exclude_guest' bits provide a way
>> >> > +to request counting of events restricted to guest and host contexts 
>> >> > when
>> >> > +using virtualisation.
>> >> 
>> >> How does exclude_host differ from exclude_hv ?
>> >
>> > I believe exclude_host / exclude_guest are intended to distinguish
>> > between host and guest in the hosted hypervisor context (KVM).
>> 
>> OK yeah, from the perf-list man page:
>> 
>>u - user-space counting
>>k - kernel counting
>>h - hypervisor counting
>>I - non idle counting
>>G - guest counting (in KVM guests)
>>H - host counting (not in KVM guests)
>> 
>> > Whereas exclude_hv allows to distinguish between guest and
>> > hypervisor in the bare-metal type hypervisors.
>> 
>> Except that's exactly not how we use them on powerpc :)
>> 
>> We use exclude_hv to exclude "the hypervisor", regardless of whether
>> it's KVM or PowerVM (which is a bare-metal hypervisor).
>> 
>> We don't use exclude_host / exclude_guest at all, which I guess is a
>> bug, except I didn't know they existed until this thread.
>> 
>> eg, in a KVM guest:
>> 
>>   $ perf record -e cycles:G /bin/bash -c "for i in {0..10}; do :;done"
>>   $ perf report -D | grep -Fc "dso: [hypervisor]"
>>   16
>> 
>> 
>> > In the case of arm64 - if VHE extensions are present then the host
>> > kernel will run at a higher privilege to the guest kernel, in which
>> > case there is no distinction between hypervisor and host so we ignore
>> > exclude_hv. But where VHE extensions are not present then the host
>> > kernel runs at the same privilege level as the guest and we use a
>> > higher privilege level to switch between them - in this case we can
>> > use exclude_hv to discount that hypervisor role of switching between
>> > guests.
>> 
>> I couldn't find any arm64 perf code using exclude_host/guest at all?
>
> Correct - but this is in flight as I am currently adding support for this
> see [1].

OK, so at least that will be consistent across arm64 & x86.

>> And I don't see any x86 code using exclude_hv.
>
> I can't find any either.

I think that's because they don't need it, because they don't let guests
program the PMU directly. It's all handled by the host and the host
doesn't let the guest count host cycles anyway. But I could be wrong I'm
no x86 expert.

>> But maybe that's OK, I just worry this is confusing for users.
>
> There is some extra context regarding this where exclude_guest/exclude_host
> was added, see [2]

Good find. I had looked at that commit, but the thread on the list is
more informative.

In fact there was even a man page update! Never occurred to me to look
there :P

http://man7.org/linux/man-pages/man2/perf_event_open.2.html

   exclude_host (since Linux 3.2)
      When conducting measurements that include processes running VM
      instances (i.e., have executed a KVM_RUN ioctl(2)), only measure
      events happening inside a guest instance.  This is only
      meaningful outside the guests; this setting does not change
      counts gathered inside of a guest.  Currently, this functionality
      is x86 only.

   exclude_guest (since Linux 3.2)
      When conducting measurements that include processes running VM
      instances (i.e., have executed a KVM_RUN ioctl(2)), do not
      measure events happening inside guest instances.  This is only
      meaningful outside the guests; this setting does not change
      counts gathered inside of a guest.  Currently, this functionality
      is x86 only.


Which makes things much clearer.
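
To make the man page wording concrete, here is a minimal userspace sketch
of an attr that asks for guest-only cycle counting from the host side. The
helper name guest_only_cycles_attr() is made up for illustration; the
struct and constants come from <linux/perf_event.h>. Whether a given
architecture honours the bit is exactly the open question in this thread,
so treat it as an illustration of the documented semantics only.

```c
#include <string.h>
#include <linux/perf_event.h>

/*
 * Hypothetical helper: build an attr that counts CPU cycles only while
 * a guest is running. Per the man page text quoted above this is only
 * meaningful when used outside the guest.
 */
static struct perf_event_attr guest_only_cycles_attr(void)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_HARDWARE;
	attr.config = PERF_COUNT_HW_CPU_CYCLES;
	attr.exclude_host = 1;	/* only measure events inside a guest */
	attr.exclude_guest = 0;	/* do not filter out guest events */
	return attr;
}
```

The result would be handed to perf_event_open(2), which has no glibc
wrapper and is normally invoked via syscall(SYS_perf_event_open, ...).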

Perhaps you want to add a reference to the man page in your text,
something like?

  Furthermore the 'exclude_host' and 'exclude_guest' bits provide a way
  to request counting of events restricted to guest and host contexts when
  using virtualisation. See the perf_event_open(2) man page for more
  detail.


cheers


Re: [PATCH 10/10] perf/doc: update design.txt for exclude_{host|guest} flags

2018-12-11 Thread Michael Ellerman
[ Reviving old thread. ]

Andrew Murray  writes:
> On Tue, Nov 20, 2018 at 10:31:36PM +1100, Michael Ellerman wrote:
>> Andrew Murray  writes:
>> 
>> > Update design.txt to reflect the presence of the exclude_host
>> > and exclude_guest perf flags.
>> >
>> > Signed-off-by: Andrew Murray 
>> > ---
>> >  tools/perf/design.txt | 4 
>> >  1 file changed, 4 insertions(+)
>> >
>> > diff --git a/tools/perf/design.txt b/tools/perf/design.txt
>> > index a28dca2..7de7d83 100644
>> > --- a/tools/perf/design.txt
>> > +++ b/tools/perf/design.txt
>> > @@ -222,6 +222,10 @@ The 'exclude_user', 'exclude_kernel' and 'exclude_hv' bits provide a
>> >  way to request that counting of events be restricted to times when the
>> >  CPU is in user, kernel and/or hypervisor mode.
>> >  
>> > +Furthermore the 'exclude_host' and 'exclude_guest' bits provide a way
>> > +to request counting of events restricted to guest and host contexts when
>> > +using virtualisation.
>> 
>> How does exclude_host differ from exclude_hv ?
>
> I believe exclude_host / exclude_guest are intended to distinguish
> between host and guest in the hosted hypervisor context (KVM).

OK yeah, from the perf-list man page:

   u - user-space counting
   k - kernel counting
   h - hypervisor counting
   I - non idle counting
   G - guest counting (in KVM guests)
   H - host counting (not in KVM guests)

> Whereas exclude_hv allows to distinguish between guest and
> hypervisor in the bare-metal type hypervisors.

Except that's exactly not how we use them on powerpc :)

We use exclude_hv to exclude "the hypervisor", regardless of whether
it's KVM or PowerVM (which is a bare-metal hypervisor).

We don't use exclude_host / exclude_guest at all, which I guess is a
bug, except I didn't know they existed until this thread.

eg, in a KVM guest:

  $ perf record -e cycles:G /bin/bash -c "for i in {0..10}; do :;done"
  $ perf report -D | grep -Fc "dso: [hypervisor]"
  16


> In the case of arm64 - if VHE extensions are present then the host
> kernel will run at a higher privilege to the guest kernel, in which
> case there is no distinction between hypervisor and host so we ignore
> exclude_hv. But where VHE extensions are not present then the host
> kernel runs at the same privilege level as the guest and we use a
> higher privilege level to switch between them - in this case we can
> use exclude_hv to discount that hypervisor role of switching between
> guests.

I couldn't find any arm64 perf code using exclude_host/guest at all?

And I don't see any x86 code using exclude_hv.

But maybe that's OK, I just worry this is confusing for users.

cheers


Re: [PATCH v3 09/12] powerpc: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude incapable PMUs

2018-12-09 Thread Michael Ellerman
Andrew Murray  writes:

> For PowerPC PMUs that do not support context exclusion let's
> advertise the PERF_PMU_CAP_NO_EXCLUDE capability. This ensures that
> perf will prevent us from handling events where any exclusion flags
> are set. Let's also remove the now unnecessary check for exclusion
> flags.
>
> Signed-off-by: Andrew Murray 
> ---
>  arch/powerpc/perf/hv-24x7.c | 10 +-
>  arch/powerpc/perf/hv-gpci.c | 10 +-
>  arch/powerpc/perf/imc-pmu.c | 19 +--
>  3 files changed, 3 insertions(+), 36 deletions(-)

Looks good.

Acked-by: Michael Ellerman  (powerpc)

cheers

> diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
> index 72238ee..d2b8e60 100644
> --- a/arch/powerpc/perf/hv-24x7.c
> +++ b/arch/powerpc/perf/hv-24x7.c
> @@ -1306,15 +1306,6 @@ static int h_24x7_event_init(struct perf_event *event)
>   return -EINVAL;
>   }
>  
> - /* unsupported modes and filters */
> - if (event->attr.exclude_user   ||
> - event->attr.exclude_kernel ||
> - event->attr.exclude_hv ||
> - event->attr.exclude_idle   ||
> - event->attr.exclude_host   ||
> - event->attr.exclude_guest)
> - return -EINVAL;
> -
>   /* no branch sampling */
>   if (has_branch_stack(event))
>   return -EOPNOTSUPP;
> @@ -1577,6 +1568,7 @@ static struct pmu h_24x7_pmu = {
>   .start_txn   = h_24x7_event_start_txn,
>   .commit_txn  = h_24x7_event_commit_txn,
>   .cancel_txn  = h_24x7_event_cancel_txn,
> + .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
>  };
>  
>  static int hv_24x7_init(void)
> diff --git a/arch/powerpc/perf/hv-gpci.c b/arch/powerpc/perf/hv-gpci.c
> index 43fabb3..735e77b 100644
> --- a/arch/powerpc/perf/hv-gpci.c
> +++ b/arch/powerpc/perf/hv-gpci.c
> @@ -232,15 +232,6 @@ static int h_gpci_event_init(struct perf_event *event)
>   return -EINVAL;
>   }
>  
> - /* unsupported modes and filters */
> - if (event->attr.exclude_user   ||
> - event->attr.exclude_kernel ||
> - event->attr.exclude_hv ||
> - event->attr.exclude_idle   ||
> - event->attr.exclude_host   ||
> - event->attr.exclude_guest)
> - return -EINVAL;
> -
>   /* no branch sampling */
>   if (has_branch_stack(event))
>   return -EOPNOTSUPP;
> @@ -285,6 +276,7 @@ static struct pmu h_gpci_pmu = {
>   .start   = h_gpci_event_start,
>   .stop= h_gpci_event_stop,
>   .read= h_gpci_event_update,
> + .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
>  };
>  
>  static int hv_gpci_init(void)
> diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
> index 1fafc32b..1dbb0ee 100644
> --- a/arch/powerpc/perf/imc-pmu.c
> +++ b/arch/powerpc/perf/imc-pmu.c
> @@ -473,15 +473,6 @@ static int nest_imc_event_init(struct perf_event *event)
>   if (event->hw.sample_period)
>   return -EINVAL;
>  
> - /* unsupported modes and filters */
> - if (event->attr.exclude_user   ||
> - event->attr.exclude_kernel ||
> - event->attr.exclude_hv ||
> - event->attr.exclude_idle   ||
> - event->attr.exclude_host   ||
> - event->attr.exclude_guest)
> - return -EINVAL;
> -
>   if (event->cpu < 0)
>   return -EINVAL;
>  
> @@ -748,15 +739,6 @@ static int core_imc_event_init(struct perf_event *event)
>   if (event->hw.sample_period)
>   return -EINVAL;
>  
> - /* unsupported modes and filters */
> - if (event->attr.exclude_user   ||
> - event->attr.exclude_kernel ||
> - event->attr.exclude_hv ||
> - event->attr.exclude_idle   ||
> - event->attr.exclude_host   ||
> - event->attr.exclude_guest)
> - return -EINVAL;
> -
>   if (event->cpu < 0)
>   return -EINVAL;
>  
> @@ -1069,6 +1051,7 @@ static int update_pmu_ops(struct imc_pmu *pmu)
>   pmu->pmu.stop = imc_event_stop;
>   pmu->pmu.read = imc_event_update;
>   pmu->pmu.attr_groups = pmu->attr_groups;
> + pmu->pmu.capabilities = PERF_PMU_CAP_NO_EXCLUDE;
>   pmu->attr_groups[IMC_FORMAT_ATTR] = &imc_format_group;
>  
>   switch (pmu->domain) {
> -- 
> 2.7.4
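
For readers following along, the effect described in the changelog above
(the core rejecting events with exclusion flags once a PMU advertises the
capability) can be modelled in userspace roughly as below. This is a
sketch under stated assumptions, not the kernel code: every model_* name
and the value of MODEL_PMU_CAP_NO_EXCLUDE are invented for the example.

```c
#include <errno.h>
#include <stdbool.h>

/* Assumed value for the sketch; the real flag is defined by the kernel. */
#define MODEL_PMU_CAP_NO_EXCLUDE 0x1

/* Cut-down stand-in for the exclude bits of struct perf_event_attr. */
struct model_attr {
	unsigned int exclude_user:1, exclude_kernel:1, exclude_hv:1,
		     exclude_idle:1, exclude_host:1, exclude_guest:1;
};

struct model_pmu {
	int capabilities;
};

static bool model_has_any_exclude_flag(const struct model_attr *attr)
{
	return attr->exclude_user || attr->exclude_kernel ||
	       attr->exclude_hv || attr->exclude_idle ||
	       attr->exclude_host || attr->exclude_guest;
}

/*
 * One central check replaces the per-driver "unsupported modes and
 * filters" blocks: a PMU that cannot filter refuses filtered events.
 */
static int model_try_init_event(const struct model_pmu *pmu,
				const struct model_attr *attr)
{
	if ((pmu->capabilities & MODEL_PMU_CAP_NO_EXCLUDE) &&
	    model_has_any_exclude_flag(attr))
		return -EINVAL;
	return 0;
}
```

The point of the design is that each driver only declares what it cannot
do, and the rejection logic lives in one place.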


Re: [PATCH V2] mm: Replace all open encodings for NUMA_NO_NODE

2018-11-26 Thread Michael Ellerman
Anshuman Khandual  writes:
> At present there are multiple places where invalid node number is encoded
> as -1. Even though implicitly understood it is always better to have macros
> in there. Replace these open encodings for an invalid node number with the
> global macro NUMA_NO_NODE. This helps remove NUMA related assumptions like
> 'invalid node' from various places redirecting them to a common definition.
>
> Signed-off-by: Anshuman Khandual 
> ---
> Changes in V2:
>
> - Added inclusion of 'numa.h' header at various places per Andrew
> - Updated 'dev_to_node' to use NUMA_NO_NODE instead per Vinod
>
> Changes in V1: (https://lkml.org/lkml/2018/11/23/485)
>
> - Dropped OCFS2 changes per Joseph
> - Dropped media/video drivers changes per Hans
>
> RFC - https://patchwork.kernel.org/patch/10678035/
>
> Build tested this with multiple cross compiler options like alpha, sparc,
> arm64, x86, powerpc, powerpc64le etc with their default config which might
> not have compile tested all driver related changes. I will appreciate
> folks giving this a test in their respective build environment.
>
> All these places for replacement were found by running the following grep
> patterns on the entire kernel code. Please let me know if this might have
> missed some instances. This might also have replaced some false positives.
> I will appreciate suggestions, inputs and review.
>
> 1. git grep "nid == -1"
> 2. git grep "node == -1"
> 3. git grep "nid = -1"
> 4. git grep "node = -1"
>
>  arch/alpha/include/asm/topology.h |  3 ++-
>  arch/ia64/kernel/numa.c   |  2 +-
>  arch/ia64/mm/discontig.c  |  6 +++---
>  arch/ia64/sn/kernel/io_common.c   |  3 ++-
>  arch/powerpc/include/asm/pci-bridge.h |  3 ++-
>  arch/powerpc/kernel/paca.c|  3 ++-
>  arch/powerpc/kernel/pci-common.c  |  3 ++-
>  arch/powerpc/mm/numa.c| 14 +++---
>  arch/powerpc/platforms/powernv/memtrace.c |  5 +++--

These powerpc changes all look fine.

Acked-by: Michael Ellerman 


cheers
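
For anyone skimming, the kind of replacement the patch makes looks roughly
like the sketch below. pick_alloc_node() is a hypothetical helper, not
something from the patch; NUMA_NO_NODE is defined locally with the same
value the kernel uses so the example stands alone.

```c
/* Defined here only so the sketch is self-contained. */
#define NUMA_NO_NODE (-1)

/*
 * Hypothetical helper: fall back to the local node when the caller did
 * not ask for a specific one.
 */
static int pick_alloc_node(int requested_nid, int local_nid)
{
	/* The open-coded form would have been: requested_nid == -1 */
	if (requested_nid == NUMA_NO_NODE)
		return local_nid;
	return requested_nid;
}
```

Behaviour is unchanged; the macro just makes the "no node" case
searchable and self-describing.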


Re: [PATCH 10/10] perf/doc: update design.txt for exclude_{host|guest} flags

2018-11-20 Thread Michael Ellerman
Andrew Murray  writes:

> Update design.txt to reflect the presence of the exclude_host
> and exclude_guest perf flags.
>
> Signed-off-by: Andrew Murray 
> ---
>  tools/perf/design.txt | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/tools/perf/design.txt b/tools/perf/design.txt
> index a28dca2..7de7d83 100644
> --- a/tools/perf/design.txt
> +++ b/tools/perf/design.txt
> @@ -222,6 +222,10 @@ The 'exclude_user', 'exclude_kernel' and 'exclude_hv' bits provide a
>  way to request that counting of events be restricted to times when the
>  CPU is in user, kernel and/or hypervisor mode.
>  
> +Furthermore the 'exclude_host' and 'exclude_guest' bits provide a way
> +to request counting of events restricted to guest and host contexts when
> +using virtualisation.

How does exclude_host differ from exclude_hv ?

cheers


Re: [PATCH 01/10] perf/core: Add macro to test for event exclusion flags

2018-11-20 Thread Michael Ellerman
Andrew Murray  writes:

> Add a macro that tests if any of the perf event exclusion flags
> are set on a given event.
>
> Signed-off-by: Andrew Murray 
> ---
>  include/linux/perf_event.h | 9 +
>  1 file changed, 9 insertions(+)
>
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 53c500f..89ee7fa 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -1004,6 +1004,15 @@ perf_event__output_id_sample(struct perf_event *event,
>  extern void
>  perf_log_lost_samples(struct perf_event *event, u64 lost);
>  
> +static inline bool event_has_exclude_flags(struct perf_event *event)
> +{
>   struct perf_event_attr *attr = &event->attr;
> +
> + return attr->exclude_idle || attr->exclude_user ||
> +attr->exclude_kernel || attr->exclude_hv ||
> +attr->exclude_guest || attr->exclude_host;
> +}

Sorry to be a total PITA, but using "flags" plural suggests that it only
returns true if there is more than one exclude flag set.

A better name would be event_has_exclude_flag() or maybe
event_has_any_exclude_flag().

If you're doing a respin anyway it'd be nice to fix the name, but
obviously it's not high priority.
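
To illustrate the naming point, a userspace sketch with the second
suggested name. struct sketch_attr is a cut-down stand-in invented for
the example, not the real struct perf_event_attr.

```c
#include <stdbool.h>

/* Cut-down stand-in holding just the six exclude bits. */
struct sketch_attr {
	unsigned int exclude_idle:1, exclude_user:1, exclude_kernel:1,
		     exclude_hv:1, exclude_guest:1, exclude_host:1;
};

/* True as soon as a single exclude bit is set, which the name now says. */
static bool sketch_has_any_exclude_flag(const struct sketch_attr *attr)
{
	return attr->exclude_idle || attr->exclude_user ||
	       attr->exclude_kernel || attr->exclude_hv ||
	       attr->exclude_guest || attr->exclude_host;
}
```

With one bit set the helper returns true, which is the case the plural
"flags" name could be misread as excluding.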

cheers


Re: [PATCH 04/10] powerpc: perf/core: generalise event exclusion checking with perf macro

2018-11-20 Thread Michael Ellerman
Andrew Murray  writes:

> Replace checking of perf event exclusion flags with perf macro.
>
> Signed-off-by: Andrew Murray 
> ---
>  arch/powerpc/perf/hv-24x7.c |  7 +--
>  arch/powerpc/perf/hv-gpci.c |  7 +--
>  arch/powerpc/perf/imc-pmu.c | 14 ++
>  3 files changed, 4 insertions(+), 24 deletions(-)

These conversions look fine, thanks.

Acked-by: Michael Ellerman 

cheers

> diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
> index 72238ee..60db22d 100644
> --- a/arch/powerpc/perf/hv-24x7.c
> +++ b/arch/powerpc/perf/hv-24x7.c
> @@ -1307,12 +1307,7 @@ static int h_24x7_event_init(struct perf_event *event)
>   }
>  
>   /* unsupported modes and filters */
> - if (event->attr.exclude_user   ||
> - event->attr.exclude_kernel ||
> - event->attr.exclude_hv ||
> - event->attr.exclude_idle   ||
> - event->attr.exclude_host   ||
> - event->attr.exclude_guest)
> + if (event_has_exclude_flags(event))
>   return -EINVAL;
>  
>   /* no branch sampling */
> diff --git a/arch/powerpc/perf/hv-gpci.c b/arch/powerpc/perf/hv-gpci.c
> index 43fabb3..2d2b5c0 100644
> --- a/arch/powerpc/perf/hv-gpci.c
> +++ b/arch/powerpc/perf/hv-gpci.c
> @@ -233,12 +233,7 @@ static int h_gpci_event_init(struct perf_event *event)
>   }
>  
>   /* unsupported modes and filters */
> - if (event->attr.exclude_user   ||
> - event->attr.exclude_kernel ||
> - event->attr.exclude_hv ||
> - event->attr.exclude_idle   ||
> - event->attr.exclude_host   ||
> - event->attr.exclude_guest)
> + if (event_has_exclude_flags(event))
>   return -EINVAL;
>  
>   /* no branch sampling */
> diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
> index 1fafc32b..1ae1d3f 100644
> --- a/arch/powerpc/perf/imc-pmu.c
> +++ b/arch/powerpc/perf/imc-pmu.c
> @@ -474,12 +474,7 @@ static int nest_imc_event_init(struct perf_event *event)
>   return -EINVAL;
>  
>   /* unsupported modes and filters */
> - if (event->attr.exclude_user   ||
> - event->attr.exclude_kernel ||
> - event->attr.exclude_hv ||
> - event->attr.exclude_idle   ||
> - event->attr.exclude_host   ||
> - event->attr.exclude_guest)
> + if (event_has_exclude_flags(event))
>   return -EINVAL;
>  
>   if (event->cpu < 0)
> @@ -749,12 +744,7 @@ static int core_imc_event_init(struct perf_event *event)
>   return -EINVAL;
>  
>   /* unsupported modes and filters */
> - if (event->attr.exclude_user   ||
> - event->attr.exclude_kernel ||
> - event->attr.exclude_hv ||
> - event->attr.exclude_idle   ||
> - event->attr.exclude_host   ||
> - event->attr.exclude_guest)
> + if (event_has_exclude_flags(event))
>   return -EINVAL;
>  
>   if (event->cpu < 0)
> -- 
> 2.7.4


Re: [PATCH] memblock: stop using implicit alignment to SMP_CACHE_BYTES

2018-10-10 Thread Michael Ellerman
Mike Rapoport  writes:

> When the memblock allocation APIs are called with align = 0, the alignment is
> implicitly set to SMP_CACHE_BYTES.
>
> Replace all such uses of memblock APIs with the 'align' parameter explicitly
> set to SMP_CACHE_BYTES and stop implicit alignment assignment in the
> memblock internal allocation functions.
>
> For the case when memblock APIs are used via helper functions, e.g. like
> iommu_arena_new_node() in Alpha, the helper functions were detected with
> Coccinelle's help and then manually examined and updated where appropriate.
>
> The direct memblock APIs users were updated using the semantic patch below:
>
> @@
> expression size, min_addr, max_addr, nid;
> @@
> (
> |
> - memblock_alloc_try_nid_raw(size, 0, min_addr, max_addr, nid)
> + memblock_alloc_try_nid_raw(size, SMP_CACHE_BYTES, min_addr, max_addr, nid)
> |
> - memblock_alloc_try_nid_nopanic(size, 0, min_addr, max_addr, nid)
> + memblock_alloc_try_nid_nopanic(size, SMP_CACHE_BYTES, min_addr, max_addr, nid)
> |
> - memblock_alloc_try_nid(size, 0, min_addr, max_addr, nid)
> + memblock_alloc_try_nid(size, SMP_CACHE_BYTES, min_addr, max_addr, nid)
> |
> - memblock_alloc(size, 0)
> + memblock_alloc(size, SMP_CACHE_BYTES)
> |
> - memblock_alloc_raw(size, 0)
> + memblock_alloc_raw(size, SMP_CACHE_BYTES)
> |
> - memblock_alloc_from(size, 0, min_addr)
> + memblock_alloc_from(size, SMP_CACHE_BYTES, min_addr)
> |
> - memblock_alloc_nopanic(size, 0)
> + memblock_alloc_nopanic(size, SMP_CACHE_BYTES)
> |
> - memblock_alloc_low(size, 0)
> + memblock_alloc_low(size, SMP_CACHE_BYTES)
> |
> - memblock_alloc_low_nopanic(size, 0)
> + memblock_alloc_low_nopanic(size, SMP_CACHE_BYTES)
> |
> - memblock_alloc_from_nopanic(size, 0, min_addr)
> + memblock_alloc_from_nopanic(size, SMP_CACHE_BYTES, min_addr)
> |
> - memblock_alloc_node(size, 0, nid)
> + memblock_alloc_node(size, SMP_CACHE_BYTES, nid)
> )
>
> Suggested-by: Michal Hocko 
> Signed-off-by: Mike Rapoport 
> ---
...
>  arch/powerpc/kernel/pci_32.c  |  3 ++-
>  arch/powerpc/lib/alloc.c  |  2 +-
>  arch/powerpc/mm/mmu_context_nohash.c  |  7 +++---
>  arch/powerpc/platforms/powermac/nvram.c   |  2 +-
>  arch/powerpc/platforms/powernv/pci-ioda.c |  6 ++---
>  arch/powerpc/sysdev/msi_bitmap.c  |  2 +-

The powerpc changes all look fine.

I'm not quite clear on how SMP_CACHE_BYTES is getting included.

I think it's: memblock.h -> mm.h -> mmzone.h -> cache.h

So that's probably fine.

Acked-by: Michael Ellerman  (powerpc)


cheers
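
For reference, the behavioural change in the changelog can be modelled
with a toy placement function. This is a sketch only: the model_* names
are invented, MODEL_SMP_CACHE_BYTES is an assumed 64-byte line size, and
the assert in the "new" variant is my stand-in for whatever the real
allocator does when handed align == 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_SMP_CACHE_BYTES 64	/* assumed cache line size */

/* Round addr up to the next multiple of align (a power of two). */
static uintptr_t model_align_up(uintptr_t addr, size_t align)
{
	return (addr + align - 1) & ~((uintptr_t)align - 1);
}

/* Old behaviour: align == 0 silently means SMP_CACHE_BYTES. */
static uintptr_t model_place_old(uintptr_t cursor, size_t align)
{
	if (!align)
		align = MODEL_SMP_CACHE_BYTES;
	return model_align_up(cursor, align);
}

/* New behaviour: the caller has to name the alignment it wants. */
static uintptr_t model_place_new(uintptr_t cursor, size_t align)
{
	assert(align != 0);
	return model_align_up(cursor, align);
}
```

A caller that used to pass 0 gets the same placement by passing the
cache line size explicitly, which is what the semantic patch rewrites.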


Re: [PATCH 16/67] powerpc: rename dma_direct_ to dma_nommu_

2018-01-02 Thread Michael Ellerman
Geert Uytterhoeven <ge...@linux-m68k.org> writes:

> On Tue, Jan 2, 2018 at 10:45 AM, Michael Ellerman <m...@ellerman.id.au> wrote:
>> Christoph Hellwig <h...@lst.de> writes:
>>
>>> We want to use the dma_direct_ namespace for a generic implementation,
>>> so rename powerpc to the second best choice: dma_nommu_.
>>
>> I'm not a fan of "nommu". Some of the users of direct ops *are* using an
>> IOMMU, they're just setting up a 1:1 mapping once at init time, rather
>> than mapping dynamically.
>>
>> Though I don't have a good idea for a better name, maybe "1to1",
>> "linear", "premapped" ?
>
> "identity"?

I think that would be wrong, but thanks for trying to help :)

The address on the device side is sometimes (often?) offset from the CPU
address. So eg. the device can DMA to RAM address 0x0 using address
0x800.

Identity would imply 0 == 0 etc.

I think "bijective" is the correct term, but that's probably a bit
esoteric.
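
To spell out the difference, a small sketch of a fixed-offset mapping.
The struct and function names are hypothetical, and the offset is just
the example value from above.

```c
#include <stdint.h>

/* Hypothetical: a constant offset between CPU and device-side addresses. */
struct linear_dma_map {
	uint64_t dma_offset;
};

/* CPU address -> address the device must use. */
static uint64_t cpu_to_dma(const struct linear_dma_map *m, uint64_t cpu_addr)
{
	return cpu_addr + m->dma_offset;
}

/* Inverse mapping; every device address maps back to one CPU address. */
static uint64_t dma_to_cpu(const struct linear_dma_map *m, uint64_t dma_addr)
{
	return dma_addr - m->dma_offset;
}
```

The mapping is one-to-one and invertible, but with a non-zero offset
cpu_to_dma(0) is not 0, so "identity" would be the wrong word.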

cheers
--
To unsubscribe from this list: send the line "unsubscribe linux-alpha" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 16/67] powerpc: rename dma_direct_ to dma_nommu_

2018-01-02 Thread Michael Ellerman
Christoph Hellwig  writes:

> We want to use the dma_direct_ namespace for a generic implementation,
> so rename powerpc to the second best choice: dma_nommu_.

I'm not a fan of "nommu". Some of the users of direct ops *are* using an
IOMMU, they're just setting up a 1:1 mapping once at init time, rather
than mapping dynamically.

Though I don't have a good idea for a better name, maybe "1to1",
"linear", "premapped" ?

cheers


Re: [PATCH 1/3] futex: remove duplicated code

2017-03-04 Thread Michael Ellerman
Jiri Slaby <jsl...@suse.cz> writes:

> There is code duplicated over all architecture's headers for
> futex_atomic_op_inuser. Namely op decoding, access_ok check for uaddr,
> and comparison of the result.
>
> Remove this duplication and leave up to the arches only the needed
> assembly which is now in arch_futex_atomic_op_inuser.

Looks OK and boots on powerpc. But I don't think anything's actually
calling those futex ops. Is there a test suite I should run?

Acked-by: Michael Ellerman <m...@ellerman.id.au> (powerpc)

cheers
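
For context, the "op decoding" and "comparison of the result" that the
changelog says were duplicated look roughly like the userspace sketch
below. Function and struct names are made up; the field layout follows
the FUTEX_OP() encoding from the uapi header as I understand it, and the
FUTEX_OP_OPARG_SHIFT flag bit is deliberately ignored to keep it short.

```c
#include <errno.h>
#include <stdint.h>

struct futex_op_fields {
	int op;		/* 0 SET, 1 ADD, 2 OR, 3 ANDN, 4 XOR */
	int cmp;	/* 0 EQ, 1 NE, 2 LT, 3 LE, 4 GT, 5 GE */
	int oparg;
	int cmparg;
};

/* Sign-extend the low 12 bits of v. */
static int sign_extend12(uint32_t v)
{
	v &= 0xfff;
	return (v & 0x800) ? (int)v - 0x1000 : (int)v;
}

/* Split an encoded op into its four fields (shift flag not handled). */
static struct futex_op_fields decode_futex_op(uint32_t encoded_op)
{
	struct futex_op_fields f;

	f.op = (encoded_op >> 28) & 7;
	f.cmp = (encoded_op >> 24) & 15;
	f.oparg = sign_extend12(encoded_op >> 12);
	f.cmparg = sign_extend12(encoded_op);
	return f;
}

/* Compare the old futex value against cmparg using the decoded cmp. */
static int futex_compare(int cmp, int oldval, int cmparg)
{
	switch (cmp) {
	case 0: return oldval == cmparg;
	case 1: return oldval != cmparg;
	case 2: return oldval < cmparg;
	case 3: return oldval <= cmparg;
	case 4: return oldval > cmparg;
	case 5: return oldval >= cmparg;
	default: return -ENOSYS;
	}
}
```

Only the atomic read-modify-write in the middle needs per-arch assembly,
which is why the rest can live in common code.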