Re: Improving documentation of parent-ID field in /proc/PID/mountinfo

2017-11-12 Thread Ram Pai
On Mon, Nov 13, 2017 at 07:02:21AM +0100, Michael Kerrisk (man-pages) wrote:
> Hello Ram,
> 
> Long ago (2.6.29) you added the /proc/PID/mountinfo file and
> associated documentation in Documentation/filesystems/proc.txt. Later,
> I pasted much of that documentation into the proc(5) manual page.
> 
> That documentation says of the second field in the file:
> 
> [[
> This file contains lines of the form:
> 
> 36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - ext3 /dev/root rw,errors=continue
> (1)(2)(3)   (4)   (5)      (6)      (7)   (8) (9)   (10)         (11)
> 
> (1) mount ID:  unique identifier of the mount (may be reused after umount)
> (2) parent ID:  ID of parent (or of self for the top of the mount tree)
> ...
> ]]
> 
> The last piece of the description of field (2) doesn't seem to be
> correct, or is at least rather unclear. I take this to be saying
> that for the root mount point, /, field (2) will have the same value
> as field (1). I never actually looked at this detail closely, but
> Alexander pointed out that this is obviously not so, as one can
> immediately verify:
> 
> $ grep '/ / ' /proc/$$/mountinfo
> 65 0 8:2 / / rw,relatime shared:1 - ext4 /dev/sda2 rw,seclabel,data=order
> 
> I dug around in the kernel source for a bit. I do not have an exact
> handle on the details, but I can see roughly what is going on.
> Internally, there seems to be one ("hidden") mount ID reserved to each
> mount namespace, and that ID is the parent of the root mount point.
> 
> Looking through the (4.14) kernel source, mount IDs are allocated by
> mnt_alloc_id() (in fs/namespace.c), which is in turn called by
> alloc_vfsmnt() which is in turn called by clone_mnt().
> 
> A new mount namespace is created by the kernel function copy_mnt_ns()
> (in fs/namespace.c, called by create_new_namespaces() in
> kernel/nsproxy.c). The copy_mnt_ns() function calls copy_tree() (in
> fs/namespace.c), and copy_tree() calls clone_mnt() in *two* places.
> The first of these is the call that creates the "hidden" mount ID that
> becomes the parent of the root mount point. (I verified this by
> instrumenting the kernel with a few printk() calls to display the
> IDs.)  The second place where copy_tree() calls clone_mnt() is in a
> loop that replicates each of the mount points (including the root
> mount point) in the source mount namespace.

We used to report that mount, once upon a time.  Something has changed
the behavior since then and it's not reported any more, thus making it
hidden.

> 
> With these details in mind, I propose to patch the man page to read as
> below. Perhaps you have some corrections or improvements to suggest
> for this text?
> 
> [[
>   (2)  parent ID: the ID of the parent mount.  For the root
>        mount point, the ID shown here is a hidden mount ID
>        associated with the mount namespace.  That ID is
>        distinct from any of the IDs shown in field (1) of the
>        records shown in the mountinfo file, and does not
>        appear in field (1) in the mountinfo file in any other
>        mount namespace.  (In the initial mount namespace,
>        this hidden ID has the value 0.)

It captures the current semantics correctly.

RP
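
The check discussed in this thread can be sketched in user space. The following is a minimal illustration, not code from the kernel or from the man page: it reads fields (1) and (2) from mountinfo-style lines and reports whether a given parent ID is missing from field (1) of every record, which is what "hidden" means here. The helper names parse_mount_ids() and parent_is_hidden() are made up for this example.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Parse field (1) (mount ID) and field (2) (parent ID) from one
 * mountinfo-style line. Returns 0 on success, -1 on a malformed line.
 * Hypothetical helper, named for this example only.
 */
static int parse_mount_ids(const char *line, int *mount_id, int *parent_id)
{
	return sscanf(line, "%d %d", mount_id, parent_id) == 2 ? 0 : -1;
}

/*
 * Return 1 if parent_id does not occur as a mount ID (field (1)) in any
 * of the given lines, that is, if it refers to a mount that is not
 * listed. Return 0 if some record carries that mount ID.
 */
static int parent_is_hidden(const char *const *lines, size_t n, int parent_id)
{
	for (size_t i = 0; i < n; i++) {
		int id, parent;

		if (parse_mount_ids(lines[i], &id, &parent) == 0 &&
		    id == parent_id)
			return 0;
	}
	return 1;
}
```

Fed the lines of /proc/self/mountinfo, parent_is_hidden() would be expected to return 1 for the parent ID of the root mount and 0 for the parent ID of any other mount in the same namespace.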



[GIT PULL] RAS updates for v4.15

2017-11-12 Thread Ingo Molnar
Linus,

Please pull the latest ras-core-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git ras-core-for-linus

   # HEAD: 783ca517bfd62ca516178712775e4b273292d5b1 x86/MCE/AMD: Fix mce_severity_amd_smca() signature

Two minor updates to AMD SMCA support, plus a timer_setup() conversion.

 Thanks,

Ingo

-->
Kees Cook (1):
  x86/mce: Convert timers to use timer_setup()

Yazen Ghannam (2):
  x86/MCE/AMD: Always give panic severity for UC errors in kernel context
  x86/MCE/AMD: Fix mce_severity_amd_smca() signature


 arch/x86/kernel/cpu/mcheck/mce-severity.c |  9 -
 arch/x86/kernel/cpu/mcheck/mce.c          | 13 +
 2 files changed, 9 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce-severity.c b/arch/x86/kernel/cpu/mcheck/mce-severity.c
index 87cc9ab7a13c..4ca632a06e0b 100644
--- a/arch/x86/kernel/cpu/mcheck/mce-severity.c
+++ b/arch/x86/kernel/cpu/mcheck/mce-severity.c
@@ -204,7 +204,7 @@ static int error_context(struct mce *m)
 	return IN_KERNEL;
 }
 
-static int mce_severity_amd_smca(struct mce *m, int err_ctx)
+static int mce_severity_amd_smca(struct mce *m, enum context err_ctx)
 {
 	u32 addr = MSR_AMD64_SMCA_MCx_CONFIG(m->bank);
 	u32 low, high;
@@ -245,6 +245,9 @@ static int mce_severity_amd(struct mce *m, int tolerant, char **msg, bool is_exc
 
 	if (m->status & MCI_STATUS_UC) {
 
+		if (ctx == IN_KERNEL)
+			return MCE_PANIC_SEVERITY;
+
 		/*
 		 * On older systems where overflow_recov flag is not present, we
 		 * should simply panic if an error overflow occurs. If
@@ -255,10 +258,6 @@ static int mce_severity_amd(struct mce *m, int tolerant, char **msg, bool is_exc
 			if (mce_flags.smca)
 				return mce_severity_amd_smca(m, ctx);
 
-			/* software can try to contain */
-			if (!(m->mcgstatus & MCG_STATUS_RIPV) && (ctx == IN_KERNEL))
-				return MCE_PANIC_SEVERITY;
-
 			/* kill current process */
 			return MCE_AR_SEVERITY;
 		} else {
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 3b413065c613..b1d616d08eee 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1367,13 +1367,12 @@ static void __start_timer(struct timer_list *t, unsigned long interval)
 	local_irq_restore(flags);
 }
 
-static void mce_timer_fn(unsigned long data)
+static void mce_timer_fn(struct timer_list *t)
 {
-	struct timer_list *t = this_cpu_ptr(&mce_timer);
-	int cpu = smp_processor_id();
+	struct timer_list *cpu_t = this_cpu_ptr(&mce_timer);
 	unsigned long iv;
 
-	WARN_ON(cpu != data);
+	WARN_ON(cpu_t != t);
 
 	iv = __this_cpu_read(mce_next_interval);
 
@@ -1763,17 +1762,15 @@ static void mce_start_timer(struct timer_list *t)
 static void __mcheck_cpu_setup_timer(void)
 {
 	struct timer_list *t = this_cpu_ptr(&mce_timer);
-	unsigned int cpu = smp_processor_id();
 
-	setup_pinned_timer(t, mce_timer_fn, cpu);
+	timer_setup(t, mce_timer_fn, TIMER_PINNED);
 }
 
 static void __mcheck_cpu_init_timer(void)
 {
 	struct timer_list *t = this_cpu_ptr(&mce_timer);
-	unsigned int cpu = smp_processor_id();
 
-	setup_pinned_timer(t, mce_timer_fn, cpu);
+	timer_setup(t, mce_timer_fn, TIMER_PINNED);
 	mce_start_timer(t);
 }
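
The shape of the timer_setup() conversion in mce.c can be illustrated with a small user-space model. This is only a sketch under assumptions: struct toy_timer, toy_timer_setup(), toy_timer_fire() and TOY_TIMER_PINNED are invented stand-ins, not kernel APIs. It shows the callback receiving the timer pointer itself, rather than an unsigned long cookie, and comparing it with the timer it expects, as the WARN_ON(cpu_t != t) check does.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct timer_list. */
struct toy_timer {
	void (*function)(struct toy_timer *t);
	unsigned int flags;
};

#define TOY_TIMER_PINNED 0x1u	/* made-up flag value for this model */

/* Models timer_setup(): record the callback and the flags. */
static void toy_timer_setup(struct toy_timer *t,
			    void (*fn)(struct toy_timer *),
			    unsigned int flags)
{
	t->function = fn;
	t->flags = flags;
}

/* Models expiry: the callback is handed the timer that fired. */
static void toy_timer_fire(struct toy_timer *t)
{
	t->function(t);
}

static struct toy_timer cpu_timer;	/* stands in for the per-CPU mce_timer */
static int fired;			/* number of callback invocations */
static int mismatches;			/* times the sanity check tripped */

static void toy_timer_fn(struct toy_timer *t)
{
	struct toy_timer *cpu_t = &cpu_timer;

	if (cpu_t != t)		/* models WARN_ON(cpu_t != t) */
		mismatches++;
	fired++;
}
```

In this model, firing cpu_timer leaves mismatches at 0, while firing any other timer with the same callback increments it.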
 


RE: [PATCH v5 0/5] fw_cfg: add DMA operations & etc/vmcoreinfo support

2017-11-12 Thread Hatayama, Daisuke
Marc-Andre,

It looks to me like the 4th and 5th patches somehow have not been sent.
Could you send them, too?
I'd like to have them so that I can actually build and run the kernel for testing.

> -----Original Message-----
> From: linux-kernel-ow...@vger.kernel.org
> [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Marc-Andre Lureau
> Sent: Wednesday, November 8, 2017 1:24 AM
> To: linux-kernel@vger.kernel.org
> Cc: so...@cmu.edu; qemu-de...@nongnu.org; m...@redhat.com; Marc-André Lureau
> 
> Subject: [PATCH v5 0/5] fw_cfg: add DMA operations & etc/vmcoreinfo support
> 
> Hi,
> 
> This series adds DMA operations support to the qemu fw_cfg kernel
> module and populates "etc/vmcoreinfo" with vmcoreinfo location
> details.
> 
> Note: the support for this entry handling has been merged for next
> qemu release (2.11)
> 
> v5:
> - resent to CC kdump people on the paddr_vmcoreinfo_note() export patch
> 
> v4:
> - export paddr_vmcoreinfo_note() to fix fw_cfg.ko build
> - fix build with !CONFIG_CRASH_CORE
> - replace the unbounded yield() loop with a usleep_range() loop and a
>   200ms timeout
> - do not write vmcoreinfo entry when running the kdump kernel (D. Hatayama)
> - drop the experimental sysfs write support patch from this series
> 
> v3: (thanks kbuild)
> - add "fw_cfg: fix the command line module name" patch
> - fix build of "fw_cfg: add DMA register" with CONFIG_FW_CFG_SYSFS_CMDLINE=y
> - fix 'Wshift-count-overflow'
> 
> v2:
> - use platform device for dma mapping
> - add etc/vmcoreinfo patch
> - some code cleanups
> 
> Marc-André Lureau (5):
>   fw_cfg: fix the command line module name
>   fw_cfg: add DMA register
>   fw_cfg: do DMA read operation
>   crash: export paddr_vmcoreinfo_note()
>   fw_cfg: write vmcoreinfo details
> 
>  drivers/firmware/qemu_fw_cfg.c | 292 +
>  kernel/crash_core.c            |   1 +
>  2 files changed, 264 insertions(+), 29 deletions(-)
> 
> --
> 2.15.0.125.g8f49766d64
> 
> 
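
The v4 changelog item about replacing an unbounded yield() loop with a usleep_range() loop and a 200ms timeout describes a common bounded-polling pattern. Here is a user-space sketch of that pattern, assuming POSIX nanosleep() and clock_gettime() in place of the kernel helpers. The names poll_with_timeout(), counter_expired() and never_ready() are hypothetical, and none of this is taken from the fw_cfg patches themselves.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Current monotonic time in nanoseconds. */
static long long now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/*
 * Poll done(arg) until it returns true or timeout_ms has elapsed,
 * sleeping sleep_us microseconds between polls.
 * Returns 0 on completion, -ETIMEDOUT if the deadline passes first.
 */
static int poll_with_timeout(bool (*done)(void *), void *arg,
			     long timeout_ms, long sleep_us)
{
	long long deadline = now_ns() + timeout_ms * 1000000LL;

	for (;;) {
		struct timespec ts = {
			.tv_sec = sleep_us / 1000000L,
			.tv_nsec = (sleep_us % 1000000L) * 1000L,
		};

		if (done(arg))
			return 0;
		if (now_ns() >= deadline)
			return -ETIMEDOUT;
		nanosleep(&ts, NULL);
	}
}

/* Example condition: becomes true after *arg further polls. */
static bool counter_expired(void *arg)
{
	int *remaining = arg;

	if (*remaining == 0)
		return true;
	(*remaining)--;
	return false;
}

/* Example condition that never completes, to exercise the timeout. */
static bool never_ready(void *arg)
{
	(void)arg;
	return false;
}
```

The caller gets an error instead of spinning forever when the device never signals completion, which is the point of bounding the loop.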



[GIT PULL] perf updates for v4.15

2017-11-12 Thread Ingo Molnar
Linus,

Please pull the latest perf-core-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git perf-core-for-linus

   # HEAD: fcdfafcb73be8fa45909327bbddca46fb362a675 kprobes: Don't spam the build log with deprecation warnings

The main changes in this cycle were:

  - kprobes updates: use better W^X patterns for code modifications, improve
    optprobes, remove jprobes. (Masami Hiramatsu, Kees Cook)

  - core fixes: event timekeeping (enabled/running times statistics) fixes,
    perf_event_read() locking fixes and cleanups, etc. (Peter Zijlstra)

  - Extend x86 Intel free-running PEBS support and support x86 user-register
    sampling in perf record and perf script. (Andi Kleen)

  - Tooling updates:

     - Completely rework the way inline frames are handled. Instead of querying
       for the inline nodes on-demand in the individual tools, we now create
       proper callchain nodes for inlined frames. (Milian Wolff)

     - 'perf trace' updates (Arnaldo Carvalho de Melo)

     - Implement a way to print formatted output to per-event files in 'perf
       script' to facilitate generating flamegraphs, eliminating the need to
       write scripts to do that separation (yuzhoujian, Arnaldo Carvalho de Melo)

     - Update vendor events JSON metrics for Intel's Broadwell, Broadwell
       Server, Haswell, Haswell Server, IvyBridge, IvyTown, JakeTown, Sandy
       Bridge, Skylake, SkyLake Server - and Goldmont Plus V1 (Andi Kleen,
       Kan Liang)

     - Multithread the synthesizing of PERF_RECORD_ events for pre-existing
       threads in 'perf top', speeding up that phase, greatly improving the
       user experience in systems such as Intel's Knights Mill (Kan Liang)

     - Introduce the concept of weak groups in 'perf stat': try to set up a
       group, but if it's not schedulable fall back to not using a group. That
       gives us the best of both worlds: groups if they work, but still a
       usable fallback if they don't. (Andi Kleen)

     - perf sched timehist enhancements (David Ahern)

     - ... various other enhancements, updates, cleanups and fixes.

 
 Thanks,

Ingo

-->
Alexander Shishkin (1):
  perf/core: Explain perf_sched_mutex

Andi Kleen (40):
  perf tools: Support weak groups in 'perf stat'
  perf vendor events: Support metric_group and no event name in JSON parser
  perf stat: Factor out generic metric printing
  perf stat: Print generic metric header even for failed expressions
  perf pmu: Extract function to get JSON alias map
  perf stat: Support JSON metrics in perf stat
  perf list: Add metric groups to perf list
  perf stat: Don't use ctx for saved values lookup
  perf stat: Support duration_time for metrics
  perf stat: Hide internal duration_time counter
  perf stat: Update walltime_nsecs_stats in interval mode
  perf record: Support direct --user-regs arguments
  perf script: Support user regs
  perf stat: Fall weak group back even for EBADF
  perf vendor events: Add JSON metrics for Broadwell
  perf vendor events: Add JSON metrics for Skylake
  perf vendor events: Add JSON metrics for Sandy Bridge
  perf vendor events: Add JSON metrics for Sandy Bridge EP
  perf vendor events: Add JSON metrics for Ivy Bridge
  perf vendor events: Add JSON metrics for Haswell
  perf vendor events: Add JSON metrics for Ivy Town
  perf vendor events: Add JSON metrics for Haswell EP
  perf vendor events: Add JSON metrics for Broadwell Server
  perf vendor events: Add JSON metrics for Broadwell DE
  perf vendor events: Add JSON metrics for Skylake server
  perf pmu: Improve error messages for missing PMUs
  perf stat: Fix adding multiple event groups
  perf/x86: Enable free running PEBS for REGS_USER/INTR
  perf vendor events: Update JSON metrics for Broadwell
  perf vendor events: Update JSON metrics for Broadwell Server
  perf vendor events: Update JSON metrics for Haswell
  perf vendor events: Update JSON metrics for Haswell Server
  perf vendor events: Update JSON metrics for IvyBridge
  perf vendor events: Update JSON metrics for IvyTown
  perf vendor events: Update JSON metrics for JakeTown
  perf vendor events: Update JSON metrics for Sandy Bridge
  perf vendor events: Update JSON metrics for Skylake
  perf vendor events: Update JSON metrics for Skylake Server
  perf list: Fix group description in the man page
  perf vendor events: Fix incorrect cmask syntax for some Intel metrics

Arnaldo Carvalho de Melo (25):
  perf tools: Make copyfile_offset() static
  perf machine: Optimize a bit the machine__findnew_thread() methods
  perf trace beauty madvise: Generate 'behavior' string table from kernel headers
  tools: Update asm-generic/mman-common.h copy from the kernel
  perf tools: Get all of tools/{arch,include}/ in the MANIFEST
  

[PATCH] perf tool: Fix build failure when NO_AUXTRACE=1

2017-11-12 Thread Ravi Bangoria
The perf tool fails to build with the following error when AUXTRACE is
not set:

  $ make NO_AUXTRACE=1
  builtin-script.c: In function 'perf_script__process_auxtrace_info':
  util/auxtrace.h:608:44: error: called object is not a function or function pointer
   #define perf_event__process_auxtrace_info  0
                                              ^

Fix it by guarding function under HAVE_AUXTRACE_SUPPORT.

Fixes: 47e5a26a916b ("perf script: Fix --per-event-dump for auxtrace synth evsels")
Signed-off-by: Ravi Bangoria 
---
 tools/perf/builtin-script.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index ad6404dcf91c..9b43bda45a41 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -2848,6 +2848,7 @@ int process_cpu_map_event(struct perf_tool *tool __maybe_unused,
 	return set_maps(script);
 }
 
+#ifdef HAVE_AUXTRACE_SUPPORT
 static int perf_script__process_auxtrace_info(struct perf_tool *tool,
 					      union perf_event *event,
 					      struct perf_session *session)
@@ -2862,6 +2863,9 @@ static int perf_script__process_auxtrace_info(struct perf_tool *tool,
 
 	return ret;
 }
+#else
+#define perf_script__process_auxtrace_info 0
+#endif
 
 int cmd_script(int argc, const char **argv)
 {
-- 
2.13.6
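
The failure mode and the fix can be reproduced in a self-contained sketch. Every name here is hypothetical (HAVE_FEATURE, feature__process_event, tool__process_event, struct tool); only the pattern mirrors the patch. When a feature is compiled out, its handler is stubbed as the constant 0 so that it can still initialize a function-pointer slot. A wrapper that calls the handler would then expand to 0(ev), which does not compile, so the wrapper has to sit under the same guard and get its own stub.

```c
#include <stddef.h>

struct event;					/* opaque, hypothetical */
typedef int (*event_handler_t)(struct event *ev);

#ifdef HAVE_FEATURE
/* Real handler, only available when the feature is built in. */
static int feature__process_event(struct event *ev)
{
	(void)ev;
	return 0;
}

/* Wrapper that calls the handler: it must live under the same guard. */
static int tool__process_event(struct event *ev)
{
	return feature__process_event(ev);
}
#else
/* Stubs: valid as null function pointers, but not callable. */
#define feature__process_event	0
#define tool__process_event	0
#endif

struct tool {
	event_handler_t handler;
};

/* Compiles either way: a real function, or a null handler slot. */
static const struct tool script_tool = {
	.handler = tool__process_event,
};
```

Built without -DHAVE_FEATURE, script_tool.handler is a null pointer, and callers are expected to check for that before dispatching.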



Re: [PATCH] perf/core: fast breakpoint modification via _IOC_MODIFY_BREAKPOINT

2017-11-12 Thread Jiri Olsa
On Sun, Nov 12, 2017 at 11:09:23AM -0800, Milind Chabbi wrote:
> On Thu, Nov 9, 2017 at 10:59 AM, Milind Chabbi wrote:
> > SNIP
> >
> > On Thu, Nov 9, 2017 at 5:12 AM, Jiri Olsa  wrote:
> >>
> >>
> >> how about something like below (untested)
> >>
> >> looks like there's no irq caller for modify_user_hw_breakpoint,
> >> so we should be fine with locking nr_bp_mutex
> >>
> >> jirka
> >>
> >>
> >> ---
> >> diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c
> >> index 3f8cb1e14588..f062b68399ea 100644
> >> --- a/kernel/events/hw_breakpoint.c
> >> +++ b/kernel/events/hw_breakpoint.c
> >> @@ -448,6 +448,8 @@ int modify_user_hw_breakpoint(struct perf_event *bp, struct perf_event_attr *att
> >>  	else
> >>  		perf_event_disable(bp);
> >>
> >> +	release_bp_slot(bp);
> >> +
> >>  	bp->attr.bp_addr = attr->bp_addr;
> >>  	bp->attr.bp_type = attr->bp_type;
> >>  	bp->attr.bp_len = attr->bp_len;
> >> @@ -455,9 +457,9 @@ int modify_user_hw_breakpoint(struct perf_event *bp, struct perf_event_attr *att
> >>  	if (attr->disabled)
> >>  		goto end;
> >>
> >> -	err = validate_hw_breakpoint(bp);
> >> +	err = reserve_bp_slot(bp);
> >>  	if (!err)
> >> -		perf_event_enable(bp);
> >> +		err = validate_hw_breakpoint(bp);
> >>
> >>  	if (err) {
> >>  		bp->attr.bp_addr = old_addr;
> >> @@ -469,6 +471,7 @@ int modify_user_hw_breakpoint(struct perf_event *bp, struct perf_event_attr *att
> >>  		return err;
> >>  	}
> >>
> >> +	perf_event_enable(bp);
> >>  end:
> >>  	bp->attr.disabled = attr->disabled;
> >>
> >
> > We can do this accounting only if bp->attr.bp_type != attr->bp_type.
> >
> > -Milind
> 
> 
> Jirka,
> 
> Neither of us seems to fully understand the convoluted logic used in
> breakpoint counting.

yea, I was hoping some of the guys would take over ;-)

the problem I have with the patch above is that we could
fail to reserve the slot at the end, which is not what
the caller might expect

> 
> I tested the following sequence on an x86 machine, which has four
> debug registers (without your suggested patch for counting
> correction).
> 
> fd1 = perf_event_open(...); //BP_TYPE= HW_BREAKPOINT_RW @ ADDR1
> fd2 = perf_event_open(...); //BP_TYPE= HW_BREAKPOINT_RW @ ADDR2
> fd3 = perf_event_open(...); //BP_TYPE= HW_BREAKPOINT_RW @ ADDR3
> fd4 = perf_event_open(...); //BP_TYPE= HW_BREAKPOINT_RW @ ADDR4
> ioctl(fd4, MODIFY, ...); // change fd4 to BP_TYPE= HW_BREAKPOINT_X @ ADDR5
> close(fd4);
> fd5 = perf_event_open(); //BP_TYPE=RW @ ADDR6
> 
> We expected fd5 to fail because four BP_TYPE=TYPE_DATA are in use as
> per the accounting, but in reality, fd5 was successfully opened.

but you closed fd4 before opening fd5..?

> 
> Is the accounting accidentally working on x86?
> Is there another architecture where TYPE_DATA and TYPE_INS are counted
> differently?

[jolsa@krava linux-perf]$ grep -r HAVE_MIXED_BREAKPOINTS_REGS arch/*
arch/Kconfig:config HAVE_MIXED_BREAKPOINTS_REGS
arch/sh/Kconfig:select HAVE_MIXED_BREAKPOINTS_REGS
arch/x86/Kconfig:   select HAVE_MIXED_BREAKPOINTS_REGS

I'll try to check on it this week

jirka


Crypto Update for 4.15

2017-11-12 Thread Herbert Xu
Hi Linus: 

Here is the crypto update for 4.15:

API:

- Disambiguate EBUSY when queueing crypto request by adding ENOSPC.
  This change touches code outside the crypto API.
- Reset settings when empty string is written to rng_current.

Algorithms:

- Add OSCCA SM3 secure hash.

Drivers:

- Remove old mv_cesa driver (replaced by marvell/cesa).
- Enable rfc3686/ecb/cfb/ofb AES in crypto4xx.
- Add ccm/gcm AES in crypto4xx.
- Add support for BCM7278 in iproc-rng200.
- Add hash support on Exynos in s5p-sss.
- Fix fallback-induced error in vmx.
- Fix output IV in atmel-aes.
- Fix empty GCM hash in mediatek.

Others:

- Fix DoS potential in lib/mpi.
- Fix potential out-of-order issues with padata.

Please note that there may be a conflict with the tips tree due
to the timer_setup patch being applied in both cryptodev and
the tips tree.  The version in the tips tree also touches the
mv_cesa driver which just happens to have been removed in this
cycle in cryptodev.  Any changes to mv_cesa may be safely discarded.


Please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6.git linus


Allen (1):
  crypto: omap - return -ENOMEM on allocation failure.

Arnd Bergmann (1):
  crypto: axis - hide an unused variable

Arvind Yadav (11):
  crypto: nx - constify vio_device_id
  crypto: nx-842 - constify vio_device_id
  hwrng: pseries - constify vio_device_id
  crypto: padlock-aes - constify x86_cpu_id
  crypto: padlock-sha - constify x86_cpu_id
  hwrng: core - pr_err() strings should end with newlines
  crypto: omap-aes - pr_err() strings should end with newlines
  crypto: virtio - pr_err() strings should end with newlines
  crypto: chelsio - pr_err() strings should end with newlines
  crypto: qat - pr_err() strings should end with newlines
  crypto: bcm - pr_err() strings should end with newlines

Boris BREZILLON (5):
  crypto: marvell - Add a platform_device_id table
  ARM: configs: Stop selecting the old CESA driver
  crypto: marvell - Remove the old mv_cesa driver
  crypto: marvell - Switch cipher algs to the skcipher interface
  crypto: marvell - Add a NULL entry at the end of mv_cesa_plat_id_table[]

Christian Lamparter (25):
  crypto: crypto4xx - remove bad list_del
  crypto: crypto4xx - remove unused definitions and write-only variables
  crypto: crypto4xx - set CRYPTO_ALG_KERN_DRIVER_ONLY flag
  crypto: crypto4xx - remove extern statement before function declaration
  crypto: crypto4xx - remove double assignment of pd_uinfo->state
  crypto: crypto4xx - fix dynamic_sa_ctl's sa_contents declaration
  crypto: crypto4xx - move and refactor dynamic_contents helpers
  crypto: crypto4xx - enable AES RFC3686, ECB, CFB and OFB offloads
  crypto: crypto4xx - refactor crypto4xx_copy_pkt_to_dst()
  crypto: crypto4xx - replace crypto4xx_dev's scatter_buffer_size with constant
  crypto: crypto4xx - fix crypto4xx_build_pdr, crypto4xx_build_sdr leak
  crypto: crypto4xx - pointer arithmetic overhaul
  crypto: crypto4xx - wire up hmac_mc to hmac_muting
  crypto: crypto4xx - fix off-by-one AES-OFB
  crypto: crypto4xx - fix type mismatch compiler error
  crypto: crypto4xx - increase context and scatter ring buffer elements
  crypto: crypto4xx - add backlog queue support
  crypto: crypto4xx - use the correct LE32 format for IV and key defs
  crypto: crypto4xx - overhaul crypto4xx_build_pd()
  crypto: crypto4xx - fix various warnings
  crypto: crypto4xx - fix stalls under heavy load
  crypto: crypto4xx - simplify sa and state context acquisition
  crypto: crypto4xx - prepare for AEAD support
  crypto: crypto4xx - add aes-ccm support
  crypto: crypto4xx - add aes-gcm support

Christophe Jaillet (2):
  crypto: lrw - Fix an error handling path in 'create()'
  crypto: lrw - Check for incorrect cipher name

Colin Ian King (5):
  crypto: aesni - make arrays aesni_simd_skciphers and aesni_simd_skciphers2 static
  crypto: algboss - remove redundant setting of len to zero
  crypto: cavium - clean up clang warning on unread variable offset
  crypto: ccp - remove unused variable qim
  crypto: qat - remove unused and redundant pointer vf_info

Corentin LABBE (14):
  crypto: gcm - add GCM IV size constant
  crypto: caam - Use GCM IV size constant
  crypto: ccp - Use GCM IV size constant
  crypto: nx - Use GCM IV size constant
  crypto: atmel - Use GCM IV size constant
  crypto: bcm - Use GCM IV size constant
  crypto: mediatek - Use GCM IV size constant
  crypto: chelsio - Use GCM IV size constant
  crypto: omap - Use GCM IV size constant
  crypto: gcm - Use GCM IV size constant
  crypto: aesni - Use GCM IV size constant
  crypto: stm32 - use of_device_get_match_data
  crypto: omap - use of_device_get_match_data
  crypto: bcm - use of_device_get_match_data

Eric Biggers 

[PATCH] spi: spi-fsl-dspi: add SPI_LSB_FIRST to driver capabilities

2017-11-12 Thread Kurt Kanzenbach
The driver as well as the controller support the SPI lsb first
mode. However, it's not possible to configure it e.g. when using
spidev. Adding this flag to mode_bits resolves the issue and lsb first
mode can be used.

Signed-off-by: Kurt Kanzenbach 
---
 drivers/spi/spi-fsl-dspi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/spi/spi-fsl-dspi.c b/drivers/spi/spi-fsl-dspi.c
index f652f70cb8db..02d3ed7f2558 100644
--- a/drivers/spi/spi-fsl-dspi.c
+++ b/drivers/spi/spi-fsl-dspi.c
@@ -980,7 +980,7 @@ static int dspi_probe(struct platform_device *pdev)
master->dev.of_node = pdev->dev.of_node;
 
master->cleanup = dspi_cleanup;
-   master->mode_bits = SPI_CPOL | SPI_CPHA;
+   master->mode_bits = SPI_CPOL | SPI_CPHA | SPI_LSB_FIRST;
master->bits_per_word_mask = SPI_BPW_MASK(4) | SPI_BPW_MASK(8) |
SPI_BPW_MASK(16);
 
-- 
2.11.0



Re: [PATCH v4 4/4] ARM64: dts: meson: drop "sana" clock from SAR ADC

2017-11-12 Thread Yixun Lan
Hi Kevin & others

  I'd like to re-send just patch [4/4] (leaving the others [1-3/4]
unchanged), so that there are separate DT patches for the 32-bit and 64-bit
platforms. Is this OK for you?


On 11/12/17 09:33, Martin Blumenstingl wrote:
> Hi Yixun,
> 
> On Tue, Nov 7, 2017 at 3:10 PM, Yixun Lan  wrote:
>> From: Xingyu Chen 
>>
>> The SAR ADC module doesn't require the "sana" clock.
>>
>> Signed-off-by: Xingyu Chen 
>> Signed-off-by: Yixun Lan 
>> ---
>>  arch/arm/boot/dts/meson8.dtsi   | 5 ++---
>>  arch/arm/boot/dts/meson8b.dtsi  | 5 ++---
> these two should go into a separate patch (with "ARM: dts: ..."
> prefix) - the ARM maintainers want separate pull requests for the
> 32-bit and 64-bit .dts changes, so patches should also follow that
> schema
> 
> with that fixed, you can add my ACK on both (32-bit and 64-bit) .dts patches:
> Acked-by: Martin Blumenstingl
> 

Thanks, I will send a separate patch for this and add your 'Acked-by'.

>>  arch/arm64/boot/dts/amlogic/meson-gxbb.dtsi | 3 +--
>>  arch/arm64/boot/dts/amlogic/meson-gxl.dtsi  | 3 +--
>>  4 files changed, 6 insertions(+), 10 deletions(-)
>>
>> diff --git a/arch/arm/boot/dts/meson8.dtsi b/arch/arm/boot/dts/meson8.dtsi
>> index b98d44fde6b6..f93d6cf6e094 100644
>> --- a/arch/arm/boot/dts/meson8.dtsi
>> +++ b/arch/arm/boot/dts/meson8.dtsi
>> @@ -289,9 +289,8 @@
>>   {
>> compatible = "amlogic,meson8-saradc", "amlogic,meson-saradc";
>> clocks = < CLKID_XTAL>,
>> -   < CLKID_SAR_ADC>,
>> -   < CLKID_SANA>;
>> -   clock-names = "clkin", "core", "sana";
>> +   < CLKID_SAR_ADC>;
>> +   clock-names = "clkin", "core";
>>  };
>>
>>   {
>> diff --git a/arch/arm/boot/dts/meson8b.dtsi b/arch/arm/boot/dts/meson8b.dtsi
>> index bc278da7df0d..4aa444284f0c 100644
>> --- a/arch/arm/boot/dts/meson8b.dtsi
>> +++ b/arch/arm/boot/dts/meson8b.dtsi
>> @@ -185,9 +185,8 @@
>>   {
>> compatible = "amlogic,meson8b-saradc", "amlogic,meson-saradc";
>> clocks = < CLKID_XTAL>,
>> -   < CLKID_SAR_ADC>,
>> -   < CLKID_SANA>;
>> -   clock-names = "clkin", "core", "sana";
>> +   < CLKID_SAR_ADC>;
>> +   clock-names = "clkin", "core";
>>  };
>>
>>  _AO {
>> diff --git a/arch/arm64/boot/dts/amlogic/meson-gxbb.dtsi b/arch/arm64/boot/dts/amlogic/meson-gxbb.dtsi
>> index af834cdbba79..b77f2593cdc3 100644
>> --- a/arch/arm64/boot/dts/amlogic/meson-gxbb.dtsi
>> +++ b/arch/arm64/boot/dts/amlogic/meson-gxbb.dtsi
>> @@ -686,10 +686,9 @@
>> compatible = "amlogic,meson-gxbb-saradc", "amlogic,meson-saradc";
>> clocks = <>,
>>  < CLKID_SAR_ADC>,
>> -< CLKID_SANA>,
>>  < CLKID_SAR_ADC_CLK>,
>>  < CLKID_SAR_ADC_SEL>;
>> -   clock-names = "clkin", "core", "sana", "adc_clk", "adc_sel";
>> +   clock-names = "clkin", "core", "adc_clk", "adc_sel";
>>  };
>>
>>  _emmc_a {
>> diff --git a/arch/arm64/boot/dts/amlogic/meson-gxl.dtsi b/arch/arm64/boot/dts/amlogic/meson-gxl.dtsi
>> index d8dd3298b15c..07805a3b4db0 100644
>> --- a/arch/arm64/boot/dts/amlogic/meson-gxl.dtsi
>> +++ b/arch/arm64/boot/dts/amlogic/meson-gxl.dtsi
>> @@ -628,10 +628,9 @@
>> compatible = "amlogic,meson-gxl-saradc", "amlogic,meson-saradc";
>> clocks = <>,
>>  < CLKID_SAR_ADC>,
>> -< CLKID_SANA>,
>>  < CLKID_SAR_ADC_CLK>,
>>  < CLKID_SAR_ADC_SEL>;
>> -   clock-names = "clkin", "core", "sana", "adc_clk", "adc_sel";
>> +   clock-names = "clkin", "core", "adc_clk", "adc_sel";
>>  };
>>
>>  _emmc_a {
>> --
>> 2.14.1
>>
> 
> .
> 


RE: [PATCH 1/2] mm: drop migrate type checks from has_unmovable_pages

2017-11-12 Thread Ran Wang
Hello Michal,



> Date: Fri, 13 Oct 2017 14:00:12 +0200
> 
> From: Michal Hocko 
> 
> Michael has noticed that the memory offline tries to migrate kernel code
> pages when doing  echo 0 > /sys/devices/system/memory/memory0/online
> 
> The current implementation will fail the operation after several failed page
> migration attempts but we shouldn't even attempt to migrate that memory
> and fail right away because this memory is clearly not migrateable. This will
> become a real problem when we drop the retry loop counter resp. timeout.
> 
> The real problem is in has_unmovable_pages in fact. We should fail if there
> are any non migrateable pages in the area. In order to guarantee that
> remove the migrate type checks because MIGRATE_MOVABLE is not
> guaranteed to contain only migrateable pages. It is merely a heuristic.
> Similarly MIGRATE_CMA does guarantee that the page allocator doesn't
> allocate any non-migrateable pages from the block but CMA allocations
> themselves are unlikely to be migrateable. Therefore remove both checks.
> 
> Reported-by: Michael Ellerman 
> Signed-off-by: Michal Hocko 
> Tested-by: Michael Ellerman 
> Acked-by: Vlastimil Babka 
> ---
>  mm/page_alloc.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3badcedf96a7..ad0294ab3e4f 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7355,9 +7355,6 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
>*/
>   if (zone_idx(zone) == ZONE_MOVABLE)
>   return false;
> - mt = get_pageblock_migratetype(page);
> - if (mt == MIGRATE_MOVABLE || is_migrate_cma(mt))
> - return false;

Dropping this check causes the DWC3 USB controller to fail during
initialization on Layerscape processors (such as the LS1043A), as shown below:

[2.701437] xhci-hcd xhci-hcd.0.auto: new USB bus registered, assigned bus number 1
[2.710949] cma: cma_alloc: alloc failed, req-size: 1 pages, ret: -16
[2.717411] xhci-hcd xhci-hcd.0.auto: can't setup: -12
[2.727940] xhci-hcd xhci-hcd.0.auto: USB bus 1 deregistered
[2.733607] xhci-hcd: probe of xhci-hcd.0.auto failed with error -12
[2.739978] xhci-hcd xhci-hcd.1.auto: xHCI Host Controller

I also noticed that someone recently reported to you that DWC2 is affected
as well. Do you have a solution yet?

Best regards

Ran
> 
>   pfn = page_to_pfn(page);
>   for (found = 0, iter = 0; iter < pageblock_nr_pages; iter++) {


[GIT PULL] locking changes for v4.15

2017-11-12 Thread Ingo Molnar
Linus,

Please pull the latest locking-core-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git locking-core-for-linus

   # HEAD: 450cbdd0125cfa5d7bbf9e2a6b6961cc48d29730 locking/x86: Use LOCK ADD for smp_mb() instead of MFENCE

The main changes in this cycle are:

 - Another attempt at enabling cross-release lockdep dependency tracking 
   (automatically part of CONFIG_PROVE_LOCKING=y), this time with better 
   performance and fewer false positives. (Byungchul Park)

 - Introduce lockdep_assert_irqs_enabled()/disabled() and convert open-coded
   equivalents to lockdep variants. (Frederic Weisbecker)

 - Add down_read_killable() and use it in the VFS's iterate_dir() method.
   (Kirill Tkhai)

 - Convert remaining uses of ACCESS_ONCE() to READ_ONCE()/WRITE_ONCE(). Most of 
   the conversion was Coccinelle driven. (Mark Rutland, Paul E. McKenney)

 - Get rid of lockless_dereference(), by strengthening Alpha atomics, 
   strengthening READ_ONCE() with smp_read_barrier_depends() and thus being able
   to convert users of lockless_dereference() to READ_ONCE(). (Will Deacon)

 - Various micro-optimizations:

    - better PV qspinlocks (Waiman Long)
    - better x86 barriers (Michael S. Tsirkin)
    - better x86 refcounts (Kees Cook)

 - ... plus other fixes and enhancements. (Borislav Petkov, Juergen Gross,
   Miguel Bernal Marin)

 Thanks,

Ingo

-->
Borislav Petkov (1):
  locking/static_keys: Improve uninitialized key warning

Byungchul Park (8):
  locking/lockdep: Provide empty lockdep_map structure for !CONFIG_LOCKDEP
  locking/lockdep, sched/completions: Change the prefix of lock name for completion variables
  locking/lockdep: Add a boot parameter allowing unwind in cross-release and disable it by default
  locking/lockdep: Remove the BROKEN flag from CONFIG_LOCKDEP_CROSSRELEASE and CONFIG_LOCKDEP_COMPLETIONS
  locking/lockdep: Introduce CONFIG_BOOTPARAM_LOCKDEP_CROSSRELEASE_FULLSTACK=y
  sched/completions: Add support for initializing completions with lockdep_map
  workqueue: Remove now redundant lock acquisitions wrt. workqueue flushes
  block, locking/lockdep: Assign a lock_class per gendisk used for wait_for_completion()

Cheng Jian (1):
  locking/rwlocks: Fix comments

Christoph Hellwig (1):
  block: Use DECLARE_COMPLETION_ONSTACK() in submit_bio_wait()

Dou Liyang (1):
  x86/paravirt: Set up the virt_spin_lock_key after static keys get initialized

Frederic Weisbecker (14):
  locking/lockdep: Add IRQs disabled/enabled assertion APIs: lockdep_assert_irqs_enabled()/disabled()
  irq/softirqs: Use lockdep to assert IRQs are disabled/enabled
  workqueue: Use lockdep to assert IRQs are disabled/enabled
  timers/nohz: Use lockdep to assert IRQs are disabled/enabled
  timers/hrtimer: Use lockdep to assert IRQs are disabled/enabled
  smp/core: Use lockdep to assert IRQs are disabled/enabled
  x86: Use lockdep to assert IRQs are disabled/enabled
  perf/core: Use lockdep to assert IRQs are disabled/enabled
  irq/timings: Use lockdep to assert IRQs are disabled/enabled
  irq_work: Use lockdep to assert IRQs are disabled/enabled
  sched/clock, sched/cputime: Use lockdep to assert IRQs are disabled/enabled
  timers/posix-cpu-timers: Use lockdep to assert IRQs are disabled/enabled
  netpoll: Use lockdep to assert IRQs are disabled/enabled
  rcu: Use lockdep to assert IRQs are disabled/enabled

Juergen Gross (2):
  locking/paravirt: Use new static key for controlling call of virt_spin_lock()
  locking/spinlocks, paravirt, xen: Correct the xen_nopvspin case

Kees Cook (2):
  locking/refcounts, x86/asm: Use unique .text section for refcount exceptions
  locking/refcounts, x86/asm: Enable CONFIG_ARCH_HAS_REFCOUNT

Kirill Tkhai (6):
  locking/arch, alpha: Add __down_read_killable()
  locking/arch, ia64: Add __down_read_killable()
  locking/arch, s390: Add __down_read_killable()
  locking/arch, x86: Add __down_read_killable()
  locking/rwsem: Add down_read_killable()
  locking/rwsem, fs: Use killable down_read() in iterate_dir()

Mark Rutland (14):
  locking/atomics, dm-integrity: Convert ACCESS_ONCE() to READ_ONCE()/WRITE_ONCE()
  locking/atomics, EDAC/altera: Convert ACCESS_ONCE() to READ_ONCE()/WRITE_ONCE()
  locking/atomics, firmware/ivc: Convert ACCESS_ONCE() to READ_ONCE()/WRITE_ONCE()
  locking/atomics, fs/dcache: Convert ACCESS_ONCE() to READ_ONCE()/WRITE_ONCE()
  locking/atomics, fs/ncpfs: Convert ACCESS_ONCE() to READ_ONCE()/WRITE_ONCE()
  locking/atomics, media/dvb_ringbuffer: Convert ACCESS_ONCE() to READ_ONCE()/WRITE_ONCE()
  locking/atomics, net/netlink/netfilter: Convert ACCESS_ONCE() to READ_ONCE()/WRITE_ONCE()
  locking/atomics, net/ipv4/tcp_input.c: Convert ACCESS_ONCE() to READ_ONCE()/WRITE_ONCE()
  locking/atomics, net/average: 

[GIT PULL] usercopy whitelisting for v4.15-rc1

2017-11-12 Thread Kees Cook
Hi,

Please pull these hardened usercopy whitelisting changes for v4.15-rc1.
This significantly narrows the areas of memory that can be copied to/from
userspace in the face of usercopy bugs.

Thanks!

-Kees

The following changes since commit 9e66317d3c92ddaab330c125dfe9d06eee268aff:

  Linux 4.14-rc3 (2017-10-01 14:54:54 -0700)

are available in the git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git tags/usercopy-v4.15-rc1

for you to fetch changes up to 3889a28c449c01cebe166e413a58742002c2352b:

  lkdtm: Update usercopy tests for whitelisting (2017-11-08 15:40:04 -0800)


Currently, hardened usercopy performs dynamic bounds checking on slab
cache objects. This is good, but still leaves a lot of kernel memory
available to be copied to/from userspace in the face of bugs. To further
restrict what memory is available for copying, this creates a way to
whitelist specific areas of a given slab cache object for copying to/from
userspace, allowing much finer granularity of access control. Slab caches
that are never exposed to userspace can declare no whitelist for their
objects, thereby keeping them unavailable to userspace via dynamic copy
operations. (Note, an implicit form of whitelisting is the use of constant
sizes in usercopy operations and get_user()/put_user(); these bypass
hardened usercopy checks since these sizes cannot change at runtime.)
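
The whitelist check described above can be sketched as a small userspace
model. This is only an illustration under stated assumptions: struct
cache_model, its fields and usercopy_allowed() are hypothetical names and are
not taken from the series itself. The idea is that each cache records one
permitted window (an offset and a size within an object), and a copy is
allowed only if it stays inside that window.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a slab cache's usercopy whitelist. */
struct cache_model {
    size_t useroffset; /* start of the whitelisted region within an object */
    size_t usersize;   /* length of the whitelisted region; 0 means none */
};

/*
 * Return true if copying `len` bytes starting `offset` bytes into an
 * object stays entirely inside the whitelisted region.
 */
static bool usercopy_allowed(const struct cache_model *c,
                             size_t offset, size_t len)
{
    if (c->usersize == 0)
        return false;               /* cache declared no whitelist */
    if (offset < c->useroffset)
        return false;               /* starts before the region */
    if (offset - c->useroffset > c->usersize)
        return false;               /* starts after the region */
    /* Compare against the remaining room; avoids overflow in offset + len. */
    return len <= c->usersize - (offset - c->useroffset);
}
```

A cache that is never exposed to userspace corresponds to usersize == 0 in
this model, so every dynamically sized copy from it is rejected.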


David Windsor (23):
  usercopy: Prepare for usercopy whitelisting
  usercopy: Enforce slab cache usercopy region boundaries
  usercopy: Mark kmalloc caches as usercopy caches
  dcache: Define usercopy region in dentry_cache slab cache
  vfs: Define usercopy region in names_cache slab caches
  vfs: Copy struct mount.mnt_id to userspace using put_user()
  ext4: Define usercopy region in ext4_inode_cache slab cache
  ext2: Define usercopy region in ext2_inode_cache slab cache
  jfs: Define usercopy region in jfs_ip slab cache
  befs: Define usercopy region in befs_inode_cache slab cache
  exofs: Define usercopy region in exofs_inode_cache slab cache
  orangefs: Define usercopy region in orangefs_inode_cache slab cache
  ufs: Define usercopy region in ufs_inode_cache slab cache
  vxfs: Define usercopy region in vxfs_inode slab cache
  cifs: Define usercopy region in cifs_request slab cache
  scsi: Define usercopy region in scsi_sense_cache slab cache
  net: Define usercopy region in struct proto slab cache
  ip: Define usercopy region in IP proto slab cache
  caif: Define usercopy region in caif proto slab cache
  sctp: Define usercopy region in SCTP proto slab cache
  sctp: Copy struct sctp_sock.autoclose to userspace using put_user()
  fork: Define usercopy region in mm_struct slab caches
  fork: Define usercopy region in thread_stack slab caches

Kees Cook (8):
  net: Restrict unwhitelisted proto caches to size 0
  fork: Provide usercopy whitelisting for task_struct
  x86: Implement thread_struct whitelist for hardened usercopy
  arm64: Implement thread_struct whitelist for hardened usercopy
  arm: Implement thread_struct whitelist for hardened usercopy
  usercopy: Allow for temporary fallback for non-whitelisted usercopy
  usercopy: Restrict non-usercopy caches to size 0
  lkdtm: Update usercopy tests for whitelisting

Paolo Bonzini (2):
  kvm: whitelist struct kvm_vcpu_arch
  kvm: x86: fix KVM_XEN_HVM_CONFIG ioctl

 arch/Kconfig                       | 11 +
 arch/arm/Kconfig                   |  1 +
 arch/arm/include/asm/processor.h   |  7 +++
 arch/arm64/Kconfig                 |  1 +
 arch/arm64/include/asm/processor.h |  8
 arch/x86/Kconfig                   |  1 +
 arch/x86/include/asm/processor.h   |  8
 arch/x86/kvm/x86.c                 |  7 +--
 drivers/misc/lkdtm.h               |  4 +-
 drivers/misc/lkdtm_core.c          |  4 +-
 drivers/misc/lkdtm_usercopy.c      | 88 +-
 drivers/scsi/scsi_lib.c            |  9 ++--
 fs/befs/linuxvfs.c                 | 14 +++---
 fs/cifs/cifsfs.c                   | 10 +++--
 fs/dcache.c                        |  9 ++--
 fs/exofs/super.c                   |  7 ++-
 fs/ext2/super.c                    | 12 +++---
 fs/ext4/super.c                    | 12 +++---
 fs/fhandle.c                       |  3 +-
 fs/freevxfs/vxfs_super.c           |  8 +++-
 fs/jfs/super.c                     |  8 ++--
 fs/orangefs/super.c                | 15 ---
 fs/ufs/super.c                     | 13 +++---
 include/linux/sched/task.h         | 14 ++
 include/linux/slab.h               | 27 +---
 include/linux/slab_def.h           |  3 ++
 include/linux/slub_def.h           |  3 ++
 include/linux/stddef.h             |  2 +
 include/net/sctp/structs.h         |  9 +++-
 

Re: [PATCH 4/4] kbuild: optimize object directory creation for incremental build

2017-11-12 Thread Masahiro Yamada
Hi Cao,


2017-11-10 19:58 GMT+09:00 Cao jin :
> Masahiro-san
>
> On 11/09/2017 11:41 PM, Masahiro Yamada wrote:
>> The previous commit largely optimized the object directory creation.
>> We can optimize it more for incremental build.
>>
>> There are already *.cmd files in the output directory.  The existing
>> *.cmd files have been picked up by $(wildcard ...).  Obviously,
>> directories containing them exist too, so we can skip "mkdir -p".
>>
>> With this, Kbuild runs almost zero "mkdir -p" in incremental building.
>>
>> Signed-off-by: Masahiro Yamada 
>> ---
>>
>>  scripts/Makefile.build | 5 +++++
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/scripts/Makefile.build b/scripts/Makefile.build
>> index 89ac180..90ea7a5 100644
>> --- a/scripts/Makefile.build
>> +++ b/scripts/Makefile.build
>> @@ -583,8 +583,13 @@ endif
>>  ifneq ($(KBUILD_SRC),)
>>  # Create directories for object files if directory does not exist
>>  obj-dirs := $(sort $(obj) $(patsubst %/,%, $(dir $(targets))))
>> +# If cmd_files exist, their directories apparently exist.  Skip mkdir.
>> +exist-dirs := $(sort $(patsubst %/,%, $(dir $(cmd_files))))
>> +obj-dirs := $(strip $(filter-out . $(exist-dirs), $(obj-dirs)))
>
> First, I am not sure the dot "." here is necessary, because I guess
> kbuild always descends into subdirectories for recursive make, so every
> entry in $(cmd_files) should have at least a one-level directory.


The top level Makefile descends into ./Kbuild

prepare0: archprepare gcc-plugins
	$(Q)$(MAKE) $(build)=.

So it is possible to have a zero-level directory.


> Second, Assuming that "." probably exists,

Right, "." always exists.  That's why I filtered it out.


> would it be "./"? because it
> is what "dir" function returns.

No.
You missed  $(patsubst %/,%, ...)
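
To make the point concrete, here is a rough C model of what
$(patsubst %/,%, $(dir ...)) yields for a single word. It is only a sketch:
dir_without_slash() is a hypothetical helper written for this illustration,
not anything in Kbuild, and its handling of a bare "/" is an assumption that
does not matter for the paths discussed here.

```c
#include <stdio.h>
#include <string.h>

/*
 * Model of $(patsubst %/,%,$(dir path)) for one word.
 * $(dir) keeps everything up to and including the last '/', or yields
 * "./" when the path contains no '/'. The patsubst then drops that one
 * trailing '/'.
 */
static void dir_without_slash(const char *path, char *out, size_t outlen)
{
    const char *slash = strrchr(path, '/');

    if (!slash) {
        /* $(dir foo.o) is "./", which the patsubst turns into "." */
        snprintf(out, outlen, ".");
        return;
    }
    if (slash == path) {
        /* Assumption: leave a bare "/" alone. */
        snprintf(out, outlen, "/");
        return;
    }
    /* Copy the directory part without its trailing '/'. */
    snprintf(out, outlen, "%.*s", (int)(slash - path), path);
}
```

A file with no directory component therefore ends up as ".", not "./", which
is the word that the filter-out in the patch removes.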


Having said that, "." generally comes from phony targets
and I think I can fix it in a more correct way.
I will remove "." from v2.


> --
> Sincerely,
> Cao jin
>
>> +ifneq ($(obj-dirs),)
>>  $(shell mkdir -p $(obj-dirs))
>>  endif
>> +endif
>>
>
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kbuild" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Best Regards
Masahiro Yamada


Re: [PATCH 2/3] Input: twl6040-vibra: fix child-node lookup

2017-11-12 Thread Peter Ujfalusi


On 2017-11-11 17:43, Johan Hovold wrote:
> Fix child-node lookup during probe, which ended up searching the whole
> device tree depth-first starting at parent rather than just matching on
> its children.
> 
> Later sanity checks on node properties (which would likely be missing)
> should prevent this from causing much trouble however, especially as the
> original premature free of the parent node has already been fixed
> separately (but that "fix" was apparently never backported to stable).
> 
> Fixes: e7ec014a47e4 ("Input: twl6040-vibra - update for device tree support")
> Fixes: c52c545ead97 ("Input: twl6040-vibra - fix DT node memory management")
> Cc: stable  # 3.6
> Cc: Peter Ujfalusi 
> Cc: H. Nikolaus Schaller 
> Signed-off-by: Johan Hovold 

Acked-by: Peter Ujfalusi 

> ---
>  drivers/input/misc/twl6040-vibra.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/input/misc/twl6040-vibra.c b/drivers/input/misc/twl6040-vibra.c
> index 5690eb7ff954..15e0d352c4cc 100644
> --- a/drivers/input/misc/twl6040-vibra.c
> +++ b/drivers/input/misc/twl6040-vibra.c
> @@ -248,8 +248,7 @@ static int twl6040_vibra_probe(struct platform_device *pdev)
>   int vddvibr_uV = 0;
>   int error;
>  
> - of_node_get(twl6040_core_dev->of_node);
> - twl6040_core_node = of_find_node_by_name(twl6040_core_dev->of_node,
> + twl6040_core_node = of_get_child_by_name(twl6040_core_dev->of_node,
>"vibra");
>   if (!twl6040_core_node) {
>   dev_err(&pdev->dev, "parent of node is missing?\n");
> 

- Péter

Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki


Re: [PATCH 1/3] Input: twl4030-vibra: fix sibling-node lookup

2017-11-12 Thread Peter Ujfalusi


On 2017-11-11 17:43, Johan Hovold wrote:
> A helper purported to look up a child node based on its name was using
> the wrong of-helper and ended up prematurely freeing the parent of-node
> while searching the whole device tree depth-first starting at the parent
> node.
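
The difference between the two lookups described above can be sketched with a
small userspace model. Everything here is a simplified assumption: struct node
and the two helpers are hypothetical stand-ins, not the kernel's struct
device_node or its OF helpers, and the wide search is restricted to the
subtree below the start node for brevity. The reference drop on the start
node models the premature free that the commit message describes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a device-tree node. */
struct node {
    const char *name;
    struct node *child;   /* first child */
    struct node *sibling; /* next sibling */
    int refcount;
};

/* Look only at the direct children of parent; take a reference on a match. */
static struct node *get_child_by_name(struct node *parent, const char *name)
{
    struct node *n;

    for (n = parent->child; n; n = n->sibling) {
        if (strcmp(n->name, name) == 0) {
            n->refcount++;
            return n;
        }
    }
    return NULL;
}

/* Depth-first walk over everything below start. */
static struct node *search_below(struct node *start, const char *name)
{
    struct node *n, *hit;

    for (n = start->child; n; n = n->sibling) {
        if (strcmp(n->name, name) == 0)
            return n;
        hit = search_below(n, name);
        if (hit)
            return hit;
    }
    return NULL;
}

/*
 * Model of the wrong helper: it searches far beyond the direct children
 * and drops a reference on the node the search started from.
 */
static struct node *find_node_below(struct node *start, const char *name)
{
    struct node *hit = search_below(start, name);

    if (hit)
        hit->refcount++;
    start->refcount--; /* the caller never took this reference */
    return hit;
}
```

In this model a "codec" node nested under some other child is found by
find_node_below() but not by get_child_by_name(), and only the former
disturbs the parent's reference count.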
> 
> Fixes: 64b9e4d803b1 ("input: twl4030-vibra: Support for DT booted kernel")
> Fixes: e661d0a04462 ("Input: twl4030-vibra - fix ERROR: Bad of_node_put() warning")
> Cc: stable  # 3.7
> Cc: Peter Ujfalusi 
> Cc: Marek Belisko 
> Signed-off-by: Johan Hovold 

Acked-by: Peter Ujfalusi 

> ---
>  drivers/input/misc/twl4030-vibra.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/input/misc/twl4030-vibra.c b/drivers/input/misc/twl4030-vibra.c
> index 6c51d404874b..c37aea9ac272 100644
> --- a/drivers/input/misc/twl4030-vibra.c
> +++ b/drivers/input/misc/twl4030-vibra.c
> @@ -178,12 +178,14 @@ static SIMPLE_DEV_PM_OPS(twl4030_vibra_pm_ops,
>twl4030_vibra_suspend, twl4030_vibra_resume);
>  
>  static bool twl4030_vibra_check_coexist(struct twl4030_vibra_data *pdata,
> -   struct device_node *node)
> +   struct device_node *parent)
>  {
> + struct device_node *node;
> +
>   if (pdata && pdata->coexist)
>   return true;
>  
> - node = of_find_node_by_name(node, "codec");
> + node = of_get_child_by_name(parent, "codec");
>   if (node) {
>   of_node_put(node);
>   return true;
> 

- Péter

Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki


Re: [PATCH] KVM: x86: inject exceptions produced by x86_decode_insn

2017-11-12 Thread Wanpeng Li
2017-11-10 17:49 GMT+08:00 Paolo Bonzini :
> Sometimes, a processor might execute an instruction while another
> processor is updating the page tables for that instruction's code page,
> but before the TLB shootdown completes.  The interesting case happens
> if the page is in the TLB.
>
> In general, the processor will succeed in executing the instruction and
> nothing bad happens.  However, what if the instruction is an MMIO access?
> If *that* happens, KVM invokes the emulator, and the emulator gets the
> updated page tables.  If the update side had marked the code page as non
> present, the page table walk then will fail and so will x86_decode_insn.
>
> Unfortunately, even though kvm_fetch_guest_virt is correctly returning
> X86EMUL_PROPAGATE_FAULT, x86_decode_insn's caller treats the failure as
> a fatal error if the instruction cannot simply be reexecuted (as is the
> case for MMIO).  And this in fact happened sometimes when rebooting
> Windows 2012r2 guests.  Just checking ctxt->have_exception and injecting
> the exception if true is enough to fix the case.
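
The order of checks on the decode-failure path, as the description above and
the hunk below present it, can be modelled roughly as follows. This is a
userspace sketch under stated assumptions: struct decode_failure, enum outcome
and handle_decode_failure() are hypothetical names, and the inject_ok field
stands in for the return value of inject_emulated_exception().

```c
#include <stdbool.h>

/* Hypothetical model of the state seen after instruction decode fails. */
struct decode_failure {
    bool can_reexecute;  /* the guest can simply run the instruction again */
    bool have_exception; /* a fault was recorded for delivery to the guest */
    bool inject_ok;      /* assumed result of injecting that fault */
    bool skip_requested; /* models the EMULTYPE_SKIP case */
};

enum outcome { OUTCOME_DONE, OUTCOME_FAIL, OUTCOME_FATAL };

static enum outcome handle_decode_failure(const struct decode_failure *f)
{
    if (f->can_reexecute)
        return OUTCOME_DONE;
    /* The added check: deliver the recorded fault instead of giving up. */
    if (f->have_exception && f->inject_ok)
        return OUTCOME_DONE;
    if (f->skip_requested)
        return OUTCOME_FAIL;
    return OUTCOME_FATAL; /* models handle_emulation_failure() */
}
```

In this model the MMIO case (can_reexecute false) is rescued only when
have_exception is already set by the time the check runs; if decoding fails
without setting it, the outcome is unchanged.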

I found that the only place which can set ctxt->have_exception is the
function x86_emulate_insn(); x86_decode_insn() will not set
ctxt->have_exception even if kvm_fetch_guest_virt() returns
X86EMUL_PROPAGATE_FAULT.

Regards,
Wanpeng Li

>
> Thanks to Eduardo Habkost for helping in the debugging of this issue.
>
> Reported-by: Yanan Fu 
> Cc: Eduardo Habkost 
> Cc: sta...@vger.kernel.org
> Signed-off-by: Paolo Bonzini 
> ---
>  arch/x86/kvm/x86.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 34c85aa2e2d1..6dbed9022797 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5722,6 +5722,8 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu,
>                 if (reexecute_instruction(vcpu, cr2, write_fault_to_spt,
>                                         emulation_type))
>                         return EMULATE_DONE;
> +               if (ctxt->have_exception && inject_emulated_exception(vcpu))
> +                       return EMULATE_DONE;
>                 if (emulation_type & EMULTYPE_SKIP)
>                         return EMULATE_FAIL;
>                 return handle_emulation_failure(vcpu);
> --
> 1.8.3.1
>


[GIT PULL] RCU updates for v4.15

2017-11-12 Thread Ingo Molnar
Linus,

Please pull the latest core-rcu-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git core-rcu-for-linus

   # HEAD: 72bc286b81d21404cdfecddf76b64c7163aac764 Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu

The main changes in this cycle are:

  - Documentation updates
  - RCU CPU stall-warning updates
  - Torture-test updates
  - Miscellaneous fixes

Size-wise, the biggest updates are to documentation. Excluding documentation,
the diffstat becomes:

 18 files changed, 205 insertions(+), 79 deletions(-)

... and most of the code increase comes from a single commit which expands
debugging:

  9b9500da8150: rcu: Make RCU CPU stall warnings check for irq-disabled CPUs

 Thanks,

Ingo

-->
Alan Stern (1):
  memory-barriers: Rework multicopy-atomicity section

Guilherme G. Piccoli (1):
  doc: Rewrite confusing statement about memory barriers

Neeraj Upadhyay (1):
  rcu: Fix up pending cbs check in rcu_prepare_for_idle

Paul E. McKenney (18):
  documentation: RCU grace-period memory ordering guarantees
  documentation: Long-running irq handlers can stall RCU grace periods
  documentation: Slow systems can stall RCU grace periods
  documentation: Update RCU CPU stall warning messages
  memory-barriers: Replace uses of "transitive"
  rcu: Create call_rcu_tasks() kthread at boot time
  irq_work: Map irq_work_on_queue() to irq_work_on() in !SMP
  sched: Make resched_cpu() unconditional
  sched,rcu: Make cond_resched() provide RCU quiescent state
  rcu: Make RCU CPU stall warnings check for irq-disabled CPUs
  rcu: Turn off tracing before dumping trace
  rcu: Suppress RCU CPU stall warnings while dumping trace
  rcutorture: Add interrupt-disable capability to stall-warning tests
  rcutorture: Dump writer stack if stalled
  torture: Provide TMPDIR environment variable to specify tmpdir
  rcu: Suppress lockdep false-positive ->boost_mtx complaints
  rcu: Add extended-quiescent-state testing advice
  srcu: Add parameters to SRCU docbook comments

Scott Tsai (1):
  memory-barriers.txt: Fix typo in pairing example

Sebastian Andrzej Siewior (2):
  rcu: Do not include rtmutex_common.h unconditionally
  rcu/segcblist: Include rcupdate.h


 .../Design/Memory-Ordering/Tree-RCU-Diagram.html   |9 +
 .../Memory-Ordering/Tree-RCU-Memory-Ordering.html  |  707 +++
 .../TreeRCU-callback-invocation.svg|  486 ++
 .../Memory-Ordering/TreeRCU-callback-registry.svg  |  655 +++
 .../RCU/Design/Memory-Ordering/TreeRCU-dyntick.svg |  700 +++
 .../Design/Memory-Ordering/TreeRCU-gp-cleanup.svg  | 1126 +
 .../RCU/Design/Memory-Ordering/TreeRCU-gp-fqs.svg  | 1309 +
 .../Design/Memory-Ordering/TreeRCU-gp-init-1.svg   |  656 +++
 .../Design/Memory-Ordering/TreeRCU-gp-init-2.svg   |  656 +++
 .../Design/Memory-Ordering/TreeRCU-gp-init-3.svg   |  632 +++
 .../RCU/Design/Memory-Ordering/TreeRCU-gp.svg  | 5135 
 .../RCU/Design/Memory-Ordering/TreeRCU-hotplug.svg |  775 +++
 .../RCU/Design/Memory-Ordering/TreeRCU-qs.svg  | 1095 +
 .../RCU/Design/Memory-Ordering/rcu_node-lock.svg   |  229 +
 Documentation/RCU/stallwarn.txt|  200 +-
 Documentation/admin-guide/kernel-parameters.txt|3 +
 Documentation/memory-barriers.txt  |  197 +-
 include/linux/irq_work.h   |3 -
 kernel/irq_work.c  |9 +-
 kernel/rcu/rcu.h   |   21 +-
 kernel/rcu/rcu_segcblist.c |1 +
 kernel/rcu/rcutorture.c|   24 +-
 kernel/rcu/tree.c  |  159 +-
 kernel/rcu/tree.h  |5 +
 kernel/rcu/tree_plugin.h   |   14 +-
 kernel/rcu/update.c|   25 +-
 kernel/sched/core.c|5 +-
 .../selftests/rcutorture/bin/config_override.sh|2 +-
 .../selftests/rcutorture/bin/configcheck.sh|2 +-
 .../testing/selftests/rcutorture/bin/configinit.sh |2 +-
 .../testing/selftests/rcutorture/bin/kvm-build.sh  |2 +-
 .../selftests/rcutorture/bin/kvm-test-1-run.sh |2 +-
 tools/testing/selftests/rcutorture/bin/kvm.sh  |4 +-
 .../selftests/rcutorture/bin/parse-build.sh|2 +-
 .../selftests/rcutorture/bin/parse-torture.sh  |2 +-
 35 files changed, 14598 insertions(+), 256 deletions(-)
 create mode 100644 Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Diagram.html
 create mode 100644 Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.html
 create mode 100644 Documentation/RCU/Design/Memory-Ordering/TreeRCU-callback-invocation.svg
 create mode 100644 Documentation/RCU/Design/Memory-Ordering/TreeRCU-callback-registry.svg
 

Re: [PATCHv4 3/6] powerpc64: Add .opd based function descriptor dereference

2017-11-12 Thread Santosh Sivaraj
* Sergey Senozhatsky wrote (on 2017-11-10 08:48:27 +0900):

> We are moving towards separate kernel and module function descriptor
> dereference callbacks. This patch enables it for powerpc64.
> 
> For pointers that belong to the kernel
> -  Added __start_opd and __end_opd pointers, to track the kernel
>.opd section address range;
> 
> -  Added dereference_kernel_function_descriptor(). Now we
>will dereference only function pointers that are within
>[__start_opd, __end_opd);
> 
> For pointers that belong to a module
> -  Added dereference_module_function_descriptor() to handle module
>function descriptor dereference. Now we will dereference only
>pointers that are within [module->opd.start, module->opd.end).
> 
> Signed-off-by: Sergey Senozhatsky 
> ---
>  arch/powerpc/include/asm/module.h   |  3 +++
>  arch/powerpc/include/asm/sections.h | 12 
>  arch/powerpc/kernel/module_64.c | 14 ++
>  arch/powerpc/kernel/vmlinux.lds.S   |  2 ++
>  4 files changed, 31 insertions(+)
>

Looks good on powerpc. If you wish:

Tested-by: Santosh Sivaraj  # for powerpc

Thanks,
Santosh

> diff --git a/arch/powerpc/include/asm/module.h 
> b/arch/powerpc/include/asm/module.h
> index 6c0132c7212f..7e28442827f1 100644
> --- a/arch/powerpc/include/asm/module.h
> +++ b/arch/powerpc/include/asm/module.h
> @@ -45,6 +45,9 @@ struct mod_arch_specific {
>   unsigned long tramp;
>  #endif
>  
> + /* For module function descriptor dereference */
> + unsigned long start_opd;
> + unsigned long end_opd;
>  #else /* powerpc64 */
>   /* Indices of PLT sections within module. */
>   unsigned int core_plt_section;
> diff --git a/arch/powerpc/include/asm/sections.h 
> b/arch/powerpc/include/asm/sections.h
> index 82bec63bbd4f..e335a8f846af 100644
> --- a/arch/powerpc/include/asm/sections.h
> +++ b/arch/powerpc/include/asm/sections.h
> @@ -66,6 +66,9 @@ static inline int overlaps_kvm_tmp(unsigned long start, 
> unsigned long end)
>  }
>  
>  #ifdef PPC64_ELF_ABI_v1
> +
> +#define HAVE_DEREFERENCE_FUNCTION_DESCRIPTOR 1
> +
>  #undef dereference_function_descriptor
>  static inline void *dereference_function_descriptor(void *ptr)
>  {
> @@ -76,6 +79,15 @@ static inline void *dereference_function_descriptor(void 
> *ptr)
>   ptr = p;
>   return ptr;
>  }
> +
> +#undef dereference_kernel_function_descriptor
> +static inline void *dereference_kernel_function_descriptor(void *ptr)
> +{
> + if (ptr < (void *)__start_opd || ptr >= (void *)__end_opd)
> + return ptr;
> +
> + return dereference_function_descriptor(ptr);
> +}
>  #endif /* PPC64_ELF_ABI_v1 */
>  
>  #endif
> diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
> index 759104b99f9f..218971ac7e04 100644
> --- a/arch/powerpc/kernel/module_64.c
> +++ b/arch/powerpc/kernel/module_64.c
> @@ -93,6 +93,15 @@ static unsigned int local_entry_offset(const Elf64_Sym 
> *sym)
>  {
>   return 0;
>  }
> +
> +void *dereference_module_function_descriptor(struct module *mod, void *ptr)
> +{
> + if (ptr < (void *)mod->arch.start_opd ||
> + ptr >= (void *)mod->arch.end_opd)
> + return ptr;
> +
> + return dereference_function_descriptor(ptr);
> +}
>  #endif
>  
>  #define STUB_MAGIC 0x73747562 /* stub */
> @@ -344,6 +353,11 @@ int module_frob_arch_sections(Elf64_Ehdr *hdr,
>   else if (strcmp(secstrings+sechdrs[i].sh_name,"__versions")==0)
>   dedotify_versions((void *)hdr + sechdrs[i].sh_offset,
> sechdrs[i].sh_size);
> + else if (!strcmp(secstrings + sechdrs[i].sh_name, ".opd")) {
> + me->arch.start_opd = sechdrs[i].sh_addr;
> + me->arch.end_opd = sechdrs[i].sh_addr +
> +sechdrs[i].sh_size;
> + }
>  
>   /* We don't handle .init for the moment: rename to _init */
>   while ((p = strstr(secstrings + sechdrs[i].sh_name, ".init")))
> diff --git a/arch/powerpc/kernel/vmlinux.lds.S 
> b/arch/powerpc/kernel/vmlinux.lds.S
> index 0494e1566ee2..5dac5ab22fa2 100644
> --- a/arch/powerpc/kernel/vmlinux.lds.S
> +++ b/arch/powerpc/kernel/vmlinux.lds.S
> @@ -278,7 +278,9 @@ SECTIONS
>   }
>  
>   .opd : AT(ADDR(.opd) - LOAD_OFFSET) {
> + __start_opd = .;
>   *(.opd)
> + __end_opd = .;
>   }
>  
>   . = ALIGN(256);
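
The range check that the quoted patch adds can be illustrated outside the
kernel. The following is only a minimal sketch in portable C, not the
kernel implementation: struct func_desc, struct opd_range and
deref_if_in_range() are hypothetical names chosen for illustration, and the
descriptor layout (entry address first) is assumed for the example. A
pointer is treated as a descriptor only if it falls inside the half-open
range [start, end); anything else is returned unchanged.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a function descriptor: entry address first. */
struct func_desc {
	uintptr_t entry;
	uintptr_t toc;
	uintptr_t env;
};

/* Hypothetical description of an .opd-like section as [start, end). */
struct opd_range {
	uintptr_t start;
	uintptr_t end;
};

/*
 * Dereference ptr as a descriptor only when it lies inside the range.
 * Pointers outside the range are assumed to be plain text addresses
 * already and are returned as-is.
 */
static void *deref_if_in_range(const struct opd_range *r, void *ptr)
{
	uintptr_t addr = (uintptr_t)ptr;

	assert(r->start <= r->end);
	if (addr < r->start || addr >= r->end)
		return ptr;

	return (void *)((const struct func_desc *)ptr)->entry;
}
```

The same helper covers both cases described in the commit message: the
caller passes the kernel-wide range for kernel pointers, or the per-module
range for module pointers.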

-- 


Re: [kernel-hardening] [PATCH v4] scripts: add leaking_addresses.pl

2017-11-12 Thread Kaiwan N Billimoria
On Mon, Nov 13, 2017 at 11:38 AM, Tobin C. Harding  wrote:
> On Mon, Nov 13, 2017 at 11:16:28AM +0530, kaiwan.billimo...@gmail.com wrote:
>> On Mon, 2017-11-13 at 09:21 +1100, Tobin C. Harding wrote:
>> > On Fri, Nov 10, 2017 at 07:26:34PM +0530, kaiwan.billimo...@gmail.com
>> >  wrote:
>> > > On Tue, 2017-11-07 at 21:32 +1100, Tobin C. Harding wrote:
>> > > > Currently we are leaking addresses from the kernel to user space.
>> > > > This
...
>
> So, Linus has requested that I set up a tree for the development of
> this. I have to work out the details of how to do that and then I'll
> email you so you can pull the current version. I can then take
> your patch via LKML as per usual.
>
Super. Thanks.


[PATCH] KVM: X86: Avoid to handle first-time write when updating the pv stuffs each time

2017-11-12 Thread Wanpeng Li
From: Wanpeng Li 

Each time the pvclock, wall clock, or steal time shared memory pages 
are updated, there is logic to handle a first-time write. This logic 
belongs in the pv setup path instead, because it only matters if we 
suspect that the guest cannot be guaranteed to initialize the version 
field to an even number. This patch handles the first-time write of 
pvclock and steal time during setup, since those updates are 
frequent, and leaves the wall clock handling as it is, since the wall 
clock is rarely updated.
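
The version-field convention behind this change can be sketched in plain C.
This is a simplified illustration, not the KVM code: struct shared_rec,
rec_setup() and rec_update() are hypothetical names, and the memory
barriers that a real implementation needs between the steps are omitted.
An odd version marks an update in progress and an even version marks a
stable record, so an odd initial value left by the guest is rounded up
once at setup time and the frequent update path no longer has to check
for it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical shared record: version is odd while an update is in flight. */
struct shared_rec {
	uint32_t version;
	uint64_t payload;
};

/* One-time setup step: round an odd (junk) initial version up to even. */
static void rec_setup(struct shared_rec *r)
{
	if (r->version & 1)
		r->version += 1;
}

/* Frequent update path: relies on rec_setup() having run once before. */
static void rec_update(struct shared_rec *r, uint64_t payload)
{
	assert((r->version & 1) == 0);	/* precondition set up by rec_setup() */

	r->version += 1;		/* odd: update in progress */
	r->payload = payload;
	r->version += 1;		/* even: update complete */
}
```

A reader of such a record would retry whenever it sees an odd version or a
version that changed while it was copying the payload.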

Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Cc: Liran Alon 
Signed-off-by: Wanpeng Li 
---
 arch/x86/kvm/x86.c | 29 ++---
 1 file changed, 22 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4552427..19311e0 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1833,9 +1833,6 @@ static void kvm_setup_pvclock_page(struct kvm_vcpu *v)
 */
BUILD_BUG_ON(offsetof(struct pvclock_vcpu_time_info, version) != 0);
 
-   if (guest_hv_clock.version & 1)
-   ++guest_hv_clock.version;  /* first time write, random junk */
-
vcpu->hv_clock.version = guest_hv_clock.version + 1;
kvm_write_guest_cached(v->kvm, &vcpu->pv_time,
&vcpu->hv_clock,
@@ -2126,9 +2123,6 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
 
vcpu->arch.st.steal.preempted = 0;
 
-   if (vcpu->arch.st.steal.version & 1)
-   vcpu->arch.st.steal.version += 1;  /* first time write, random junk */
-
vcpu->arch.st.steal.version += 1;
 
kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.st.stime,
@@ -2256,8 +2250,19 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 &vcpu->arch.pv_time, data & ~1ULL,
 sizeof(struct pvclock_vcpu_time_info)))
vcpu->arch.pv_time_enabled = false;
-   else
+   else {
+   struct pvclock_vcpu_time_info guest_hv_clock;
+
vcpu->arch.pv_time_enabled = true;
+   if (unlikely(kvm_read_guest_cached(vcpu->kvm, &vcpu->arch.pv_time,
+   &guest_hv_clock, sizeof(guest_hv_clock))))
+   break;
+   if (guest_hv_clock.version & 1)
+   ++guest_hv_clock.version;  /* first time write, random junk */
+   kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.pv_time,
+   &vcpu->arch.hv_clock,
+   sizeof(vcpu->arch.hv_clock.version));
+   }
 
break;
}
@@ -2283,6 +2288,16 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
if (!(data & KVM_MSR_ENABLED))
break;
 
+   if (unlikely(kvm_read_guest_cached(vcpu->kvm, &vcpu->arch.st.stime,
+   &vcpu->arch.st.steal, sizeof(struct kvm_steal_time))))
+   break;
+
+   if (vcpu->arch.st.steal.version & 1)
+   vcpu->arch.st.steal.version += 1;  /* first time write, random junk */
+
+   kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.st.stime,
+   &vcpu->arch.st.steal, sizeof(struct kvm_steal_time));
+
kvm_make_request(KVM_REQ_STEAL_UPDATE, vcpu);
 
break;
-- 
2.7.4



[GIT PULL] s390 updates for v4.15

2017-11-12 Thread Heiko Carstens
Hello Linus,

since Martin is on vacation you get the s390 pull request for the v4.15
merge window this time from me.

Please pull from the 'for-linus' branch of

git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git for-linus

to receive the following updates:

Besides a lot of cleanups and bug fixes these are the most important changes:

- A new regset for runtime instrumentation registers

- Hardware accelerated AES-GCM support for the aes_s390 module

- Support for the new CEX6S crypto cards

- Support for FORTIFY_SOURCE

- Addition of missing z13 and new z14 instructions to the in-kernel
  disassembler

- Generate opcode tables for the in-kernel disassembler out of a simple text
  file instead of having to manually maintain those tables

- Fast memset16, memset32 and memset64 implementations

- Removal of named saved segment support

- Hardware counter support for z14

- Queued spinlocks and queued rwlocks implementations for s390

- Use the stack_depth tracking feature for s390 BPF JIT

- A new s390_sthyi system call which emulates the sthyi (store hypervisor
  information) instruction

- Removal of the old KVM virtio transport

- An s390 specific CPU alternatives implementation which is used in the new
  spinlock code

You will see two trivial merge conflicts caused by the "SPDX GPL-2.0 license"
commit and the removal of s390 specific header files:

arch/s390/include/asm/rwsem.h
arch/s390/include/uapi/asm/kvm_virtio.h

The resolution is to simply remove those files.

Thanks,
Heiko

Alice Frosi (2):
  s390/runtime_instrumentation: clean up struct runtime_instr_cb
  s390/ptrace: add runtime instrumention register get/set

Arnd Bergmann (1):
  s390/dasd: avoid calling do_gettimeofday()

Christian Borntraeger (2):
  s390/pci: do not require AIS facility
  s390/virtio: remove unused header file kvm_virtio.h

Cornelia Huck (1):
  MAINTAINERS: add virtio-ccw.h to virtio/s390 section

Dong Jia Shi (2):
  vfio: ccw: bypass bad idaw address when fetching IDAL ccws
  vfio: ccw: validate the count field of a ccw before pinning

Elena Reshetova (1):
  vmur: convert urdev.ref_count from atomic_t to refcount_t

Harald Freudenberger (7):
  s390/zcrypt: Explicitly check input data length.
  s390/crypto: add s390 platform specific aes gcm support.
  s390/zcrypt: CEX6S exploitation
  s390/zcrypt: Enable special header file flag for AU CPRP
  s390/zcrypt: Introduce QACT support for AP bus devices.
  s390/archrandom: Reconsider s390 arch random implementation
  s390/zcrypt: Rework struct ap_qact_ap_info.

Heiko Carstens (31):
  s390: convert release_thread() into a static inline function
  s390/runtime instrumention: fix possible memory corruption
  s390/runtime instrumentation: simplify task exit handling
  s390/guarded storage: fix possible memory corruption
  s390/ptrace: fix guarded storage regset handling
  s390/guarded storage: simplify task exit handling
  s390: get rid of exit_thread()
  s390: add support for FORTIFY_SOURCE
  s390/cpumf: remove superfluous nr_cpumask_bits check
  s390/virtio: simplify Makefile
  s390/disassembler: add missing end marker for e7 table
  s390/disassembler: fix LRDFU format
  s390/disassembler: remove double instructions
  s390/disassembler: add sthyi instruction
  s390/disassembler: add missing z13 instructions
  s390/disassembler: add new z14 instructions
  s390: use generic rwsem implementation
  s390: implement memset16, memset32 & memset64
  s390/mm: use memset64 instead of clear_table
  s390: optimize memset implementation
  s390: cleanup string ops prototypes
  s390/kprobes: remove KPROBE_SWAP_INST state
  s390/debug: adjust coding style
  s390: remove named saved segment support
  s390/disassembler: remove insn_to_mnemonic()
  s390/disassembler: generate opcode tables from text file
  s390: avoid undefined behaviour
  s390: simplify transactional execution elf hwcap handling
  Merge tag 'vfio-ccw-20171109' of git://git.kernel.org/.../kvms390/vfio-ccw into features
  s390: fix transactional execution control register handling
  s390/noexec: execute kexec datamover without DAT

Hendrik Brueckner (1):
  s390/cpum_cf: add hardware counter support for IBM z14

Himanshu Jha (1):
  s390/sclp: Use setup_timer and mod_timer

Jason J. Herne (1):
  s390: vfio-ccw: Do not attempt to free no-op, test and tic cda.

Jean Delvare (1):
  s390/char: fix cdev_add usage

Johannes Thumshirn (1):
  samples/kprobes: Add s390 case in kprobe example module

Julian Wiedmann (1):
  s390/ccwgroup: tie a ccwgroup driver to its ccw driver

Luc Van Oostenryck (1):
  s390: pass endianness info to sparse

Martin Schwidefsky (14):
  s390/topology: add detection of dedicated vs shared CPUs
  s390/spinlock: use the cpu number +1 as spinlock value
  s390/spinlock: introduce 

drivers/firmware/google/vpd.c: duplicate sysfs file

2017-11-12 Thread Randy Dunlap
sysfs: cannot create duplicate filename '/devices/platform/vpd'

on the second load of this driver.  I.e.,

modprobe vpd-sysfs
rmmod vpd-sysfs
modprobe vpd-sysfs
[boom]


on 4.14-rc8

-- 
~Randy


[PATCH IMPROVEMENT/BUGFIX 0/4] block, bfq: increase sustainable IOPS and fix a bug

2017-11-12 Thread Paolo Valente
Hi,
these patches address the following issue, raised and
discussed in [1].

BFQ provides a proportional share policy for the blkio controller.  In
this respect, BFQ updates the I/O accounting related to its policy,
i.e., the statistics contained in the special files blkio.bfq.* in
blkio groups (these files are the bfq counterpart of the blkio.*
statistic files updated by CFQ).  To update these statistics, BFQ
invokes some blkg_*stats_* functions.  We have found out that these
functions take a considerable percentage, about 40%, of the total
execution time of BFQ.

This patch series contains two patches to address this issue, namely
the patches anticipated and discussed in their main aspects in [1].

The first of these two patches is patch 3/4 in this series: it enables
BFQ to execute the above blkg_*stats_* functions, where possible, in
parallel with the rest of the code of the scheduler.  With this
improvement, the maximum request-processing rate sustainable with BFQ
grows by 25%-30%, depending on the CPU.  For instance, it grows from
250 to 310 KIOPS on an Intel i7-4850HQ. These results, and the others
reported in this letter, have been obtained and can be reproduced very
easily with the script [2].

Unfortunately, even after the above improvement, blkg_*stats_*
functions still cause a noticeable loss of sustainable throughput.  To
give an idea, on an Intel i7-4850HQ, if the update of blkio.bfq.*
statistics is not performed at all, then the sustainable throughput
grows from 310 to 400 KIOPS.  This issue has been already discussed in
[1] as well. In brief, we agreed to make a further commit, which
introduces the possibility to disable/re-enable at boot, or at
module-loading time, the updating of all blkio statistics for
proportional-share policies, i.e., of both those updated by BFQ and
those updated by CFQ.

We are already working on that commit, but finalizing it will take
some time.  Fortunately, following a suggestion/recommendation of
Tejun in the same thread [1], it is already possible to drastically
increase BFQ performance, when no blkio-debugging information is
needed. Tejun's suggestion/recommendation is to move most blkio.bfq.*
statistics behind an already existing config option,
CONFIG_DEBUG_BLK_CGROUP.  Patch 4/4 in this series does that.  Thanks
to this change, if CONFIG_DEBUG_BLK_CGROUP is not set, then bfq does
attain a further boost in sustainable throughput, which ranges from
+30% to +45%, depending on the CPU (some figures in the
documentation).

The above two patches are preceded by two preliminary patches.  The
first updates the conservative range of IOPS (sustainable with BFQ)
that was previously reported in the documentation.  The patch replaces
this piece of information with the actual, much higher limits that we
have measured while working on the above two commits.  The second
preliminary patch fixes a functional bug, related to the update of the
above statistics.

We waited for one week of testing from bfq users before submitting
these patches. We hope we are still in time for having these
improvements and fixes considered for 4.15.

NOTE.

Two definitions of empty functions in patch 4/4 trigger the following
checkpatch error: "open brace '{' following function definitions go on
the next line". Unfortunately, following this recommendation does seem
to worsen code in our case: in addition to making these two
definitions slightly harder to read, it would break symmetry with
respect to all other definitions of empty functions, both those
already present in the base code, and those added by the patch
itself. In particular, in all those other definitions, the empty body
of the functions is on the same line as the prototypes of the
functions. Oddly, the latter definitions do not cause the same error
report.

Thanks,
Paolo

[1] https://www.spinics.net/lists/linux-block/msg18943.html
[2] https://github.com/Algodev-github/IOSpeed

Luca Miccio (2):
  block, bfq: add missing invocations of bfqg_stats_update_io_add/remove
  block, bfq: move debug blkio stats behind CONFIG_DEBUG_BLK_CGROUP

Paolo Valente (2):
  doc, block, bfq: update max IOPS sustainable with BFQ
  block, bfq: update blkio stats outside the scheduler lock

 Documentation/block/bfq-iosched.txt |  43 +--
 block/bfq-cgroup.c  | 148 
 block/bfq-iosched.c | 117 ++--
 block/bfq-iosched.h |   4 +-
 block/bfq-wf2q.c|   1 -
 5 files changed, 233 insertions(+), 80 deletions(-)

--
2.10.0


[PATCH BUGFIX/IMPROVEMENT 2/4] block, bfq: add missing invocations of bfqg_stats_update_io_add/remove

2017-11-12 Thread Paolo Valente
From: Luca Miccio 

bfqg_stats_update_io_add and bfqg_stats_update_io_remove are to be
invoked, respectively, when an I/O request enters and when an I/O
request exits the scheduler. Unfortunately, bfq does not fully comply
with this scheme, because it does not invoke these functions for
requests that are inserted into or extracted from its priority
dispatch list. This commit fixes this mistake.

Tested-by: Lee Tibbert 
Tested-by: Oleksandr Natalenko 
Signed-off-by: Paolo Valente 
Signed-off-by: Luca Miccio 
---
 block/bfq-iosched.c | 21 ++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 889a854..91703eb 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -1359,7 +1359,6 @@ static void bfq_bfqq_handle_idle_busy_switch(struct bfq_data *bfqd,
bfqq->ttime.last_end_request +
bfqd->bfq_slice_idle * 3;
 
-   bfqg_stats_update_io_add(bfqq_group(RQ_BFQQ(rq)), bfqq, rq->cmd_flags);
 
/*
 * bfqq deserves to be weight-raised if:
@@ -1633,7 +1632,6 @@ static void bfq_remove_request(struct request_queue *q,
if (rq->cmd_flags & REQ_META)
bfqq->meta_pending--;
 
-   bfqg_stats_update_io_remove(bfqq_group(bfqq), rq->cmd_flags);
 }
 
 static bool bfq_bio_merge(struct blk_mq_hw_ctx *hctx, struct bio *bio)
@@ -1746,6 +1744,7 @@ static void bfq_requests_merged(struct request_queue *q, struct request *rq,
bfqq->next_rq = rq;
 
bfq_remove_request(q, next);
+   bfqg_stats_update_io_remove(bfqq_group(bfqq), next->cmd_flags);
 
spin_unlock_irq(&bfqq->bfqd->lock);
 end:
@@ -3700,6 +3699,9 @@ static struct request *bfq_dispatch_request(struct blk_mq_hw_ctx *hctx)
spin_lock_irq(&bfqd->lock);
 
rq = __bfq_dispatch_request(hctx);
+   if (rq && RQ_BFQQ(rq))
+   bfqg_stats_update_io_remove(bfqq_group(RQ_BFQQ(rq)),
+   rq->cmd_flags);
spin_unlock_irq(&bfqd->lock);
 
return rq;
@@ -4224,6 +4226,7 @@ static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 {
struct request_queue *q = hctx->queue;
struct bfq_data *bfqd = q->elevator->elevator_data;
+   struct bfq_queue *bfqq = RQ_BFQQ(rq);
 
spin_lock_irq(&bfqd->lock);
if (blk_mq_sched_try_insert_merge(q, rq)) {
@@ -4243,6 +4246,12 @@ static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
list_add_tail(&rq->queuelist, &bfqd->dispatch);
} else {
__bfq_insert_request(bfqd, rq);
+   /*
+* Update bfqq, because, if a queue merge has occurred
+* in __bfq_insert_request, then rq has been
+* redirected into a new queue.
+*/
+   bfqq = RQ_BFQQ(rq);
 
if (rq_mergeable(rq)) {
elv_rqhash_add(q, rq);
@@ -4251,6 +4260,9 @@ static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
}
}
 
+   if (bfqq)
+   bfqg_stats_update_io_add(bfqq_group(bfqq), bfqq, rq->cmd_flags);
+
spin_unlock_irq(&bfqd->lock);
 }
 
@@ -4428,8 +4440,11 @@ static void bfq_finish_request(struct request *rq)
 * lock is held.
 */
 
-   if (!RB_EMPTY_NODE(&rq->rb_node))
+   if (!RB_EMPTY_NODE(&rq->rb_node)) {
bfq_remove_request(rq->q, rq);
+   bfqg_stats_update_io_remove(bfqq_group(bfqq),
+   rq->cmd_flags);
+   }
bfq_put_rq_priv_body(bfqq);
}
 
-- 
2.10.0



[PATCH BUGFIX/IMPROVEMENT 4/4] block, bfq: move debug blkio stats behind CONFIG_DEBUG_BLK_CGROUP

2017-11-12 Thread Paolo Valente
From: Luca Miccio 

BFQ currently creates, and updates, its own instance of the whole
set of blkio statistics that cfq creates. Yet, from the comments
of Tejun Heo in [1], it turned out that most of these statistics
are meant/useful only for debugging. This commit makes BFQ create
the latter, debugging statistics only if the option
CONFIG_DEBUG_BLK_CGROUP is set.

By doing so, this commit also enables BFQ to enjoy a high performance
boost. The reason is that, if CONFIG_DEBUG_BLK_CGROUP is not set, then
BFQ has to update far fewer statistics, and, in particular, not the
heaviest to update.  To give an idea of the benefits, if
CONFIG_DEBUG_BLK_CGROUP is not set, then, on an Intel i7-4850HQ, and
with 8 threads doing random I/O in parallel on null_blk (configured
with 0 latency), the throughput of BFQ grows from 310 to 400 KIOPS
(+30%). We have measured similar or even much higher boosts with other
CPUs: e.g., +45% with an ARM CortexTM-A53 Octa-core. Our results have
been obtained and can be reproduced very easily with the script in [1].

[1] https://www.spinics.net/lists/linux-block/msg18943.html

Suggested-by: Tejun Heo 
Suggested-by: Ulf Hansson 
Tested-by: Lee Tibbert 
Tested-by: Oleksandr Natalenko 
Signed-off-by: Luca Miccio 
Signed-off-by: Paolo Valente 
---
 Documentation/block/bfq-iosched.txt |  38 +++--
 block/bfq-cgroup.c  | 148 
 block/bfq-iosched.c |  14 ++--
 block/bfq-iosched.h |   4 +-
 4 files changed, 125 insertions(+), 79 deletions(-)

diff --git a/Documentation/block/bfq-iosched.txt b/Documentation/block/bfq-iosched.txt
index 7fad6c0..8d8d8f0 100644
--- a/Documentation/block/bfq-iosched.txt
+++ b/Documentation/block/bfq-iosched.txt
@@ -20,12 +20,22 @@ for that device, by setting low_latency to 0. See Section 3 for
 details on how to configure BFQ for the desired tradeoff between
 latency and throughput, or on how to maximize throughput.
 
-BFQ has a non-null overhead, which limits the maximum IOPS that the
-CPU can process for a device scheduled with BFQ. To give an idea of
-the limits on slow or average CPUs, here are BFQ limits for three
-different CPUs, on, respectively, an average laptop, an old desktop,
-and a cheap embedded system, in case full hierarchical support is
-enabled (i.e., CONFIG_BFQ_GROUP_IOSCHED is set):
+BFQ has a non-null overhead, which limits the maximum IOPS that a CPU
+can process for a device scheduled with BFQ. To give an idea of the
+limits on slow or average CPUs, here are, first, the limits of BFQ for
+three different CPUs, on, respectively, an average laptop, an old
+desktop, and a cheap embedded system, in case full hierarchical
+support is enabled (i.e., CONFIG_BFQ_GROUP_IOSCHED is set), but
+CONFIG_DEBUG_BLK_CGROUP is not set (Section 4-2):
+- Intel i7-4850HQ: 400 KIOPS
+- AMD A8-3850: 250 KIOPS
+- ARM CortexTM-A53 Octa-core: 80 KIOPS
+
+If CONFIG_DEBUG_BLK_CGROUP is set (and of course full hierarchical
+support is enabled), then the sustainable throughput with BFQ
+decreases, because all blkio.bfq* statistics are created and updated
+(Section 4-2). For BFQ, this leads to the following maximum
+sustainable throughputs, on the same systems as above:
 - Intel i7-4850HQ: 310 KIOPS
 - AMD A8-3850: 200 KIOPS
 - ARM CortexTM-A53 Octa-core: 56 KIOPS
@@ -505,6 +515,22 @@ BFQ-specific files is "blkio.bfq." or "io.bfq." For example, the group
 parameter to set the weight of a group with BFQ is blkio.bfq.weight
 or io.bfq.weight.
 
+As for cgroups-v1 (blkio controller), the exact set of stat files
+created, and kept up-to-date by bfq, depends on whether
+CONFIG_DEBUG_BLK_CGROUP is set. If it is set, then bfq creates all
+the stat files documented in
+Documentation/cgroup-v1/blkio-controller.txt. If, instead,
+CONFIG_DEBUG_BLK_CGROUP is not set, then bfq creates only the files
+blkio.bfq.io_service_bytes
+blkio.bfq.io_service_bytes_recursive
+blkio.bfq.io_serviced
+blkio.bfq.io_serviced_recursive
+
+The value of CONFIG_DEBUG_BLK_CGROUP greatly influences the maximum
+throughput sustainable with bfq, because updating the blkio.bfq.*
+stats is rather costly, especially for some of the stats enabled by
+CONFIG_DEBUG_BLK_CGROUP.
+
 Parameters to set
 -
 
diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
index ceefb9a..da1525e 100644
--- a/block/bfq-cgroup.c
+++ b/block/bfq-cgroup.c
@@ -24,7 +24,7 @@
 
 #include "bfq-iosched.h"
 
-#ifdef CONFIG_BFQ_GROUP_IOSCHED
+#if defined(CONFIG_BFQ_GROUP_IOSCHED) &&  defined(CONFIG_DEBUG_BLK_CGROUP)
 
 /* bfqg stats flags */
 enum bfqg_stats_flags {
@@ -152,6 +152,57 @@ void bfqg_stats_update_avg_queue_size(struct bfq_group *bfqg)
bfqg_stats_update_group_wait_time(stats);
 }
 
+void bfqg_stats_update_io_add(struct bfq_group *bfqg, struct bfq_queue *bfqq,
+ unsigned int op)
+{
+   blkg_rwstat_add(&bfqg->stats.queued, op, 1);
+   bfqg_stats_end_empty_time(&bfqg->stats);
+   if (!(bfqq == 

[PATCH BUGFIX/IMPROVEMENT 1/4] doc, block, bfq: update max IOPS sustainable with BFQ

2017-11-12 Thread Paolo Valente
We have investigated more deeply the performance of BFQ, in terms of
number of IOPS that can be processed by the CPU when BFQ is used as
I/O scheduler. In more detail, using the script [1], we have measured
the number of IOPS reached on top of a null block device configured
with zero latency, as a function of the workload (sequential read,
sequential write, random read, random write) and of the system (we
considered desktops, laptops and embedded systems).

Based on the resulting figures, with this commit we update the
current, conservative IOPS range reported in BFQ documentation. In
particular, the documentation now reports, for each of three different
systems, the lowest number of IOPS obtained for that system with the
above test (namely, the value obtained with the workload leading to
the lowest IOPS).

[1] https://github.com/Algodev-github/IOSpeed

Reviewed-by: Lee Tibbert 
Signed-off-by: Paolo Valente 
Signed-off-by: Luca Miccio 
---
 Documentation/block/bfq-iosched.txt | 17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/Documentation/block/bfq-iosched.txt b/Documentation/block/bfq-iosched.txt
index 3d6951d..7a93615 100644
--- a/Documentation/block/bfq-iosched.txt
+++ b/Documentation/block/bfq-iosched.txt
@@ -20,12 +20,17 @@ for that device, by setting low_latency to 0. See Section 3 for
 details on how to configure BFQ for the desired tradeoff between
 latency and throughput, or on how to maximize throughput.
 
-On average CPUs, the current version of BFQ can handle devices
-performing at most ~30K IOPS; at most ~50 KIOPS on faster CPUs. As a
-reference, 30-50 KIOPS correspond to very high bandwidths with
-sequential I/O (e.g., 8-12 GB/s if I/O requests are 256 KB large), and
-to 120-200 MB/s with 4KB random I/O. BFQ is currently being tested on
-multi-queue devices too.
+BFQ has a non-null overhead, which limits the maximum IOPS that the
+CPU can process for a device scheduled with BFQ. To give an idea of
+the limits on slow or average CPUs, here are BFQ limits for three
+different CPUs, on, respectively, an average laptop, an old desktop,
+and a cheap embedded system, in case full hierarchical support is
+enabled (i.e., CONFIG_BFQ_GROUP_IOSCHED is set):
+- Intel i7-4850HQ: 250 KIOPS
+- AMD A8-3850: 170 KIOPS
+- ARM CortexTM-A53 Octa-core: 45 KIOPS
+
+BFQ works for multi-queue devices too.
 
 The table of contents follow. Impatients can just jump to Section 3.
 
-- 
2.10.0



[PATCH BUGFIX/IMPROVEMENT 3/4] block, bfq: update blkio stats outside the scheduler lock

2017-11-12 Thread Paolo Valente
bfq invokes various blkg_*stats_* functions to update the statistics
contained in the special files blkio.bfq.* in the blkio controller
groups, i.e., the I/O accounting related to the proportional-share
policy provided by bfq. The execution of these functions takes a
considerable percentage, about 40%, of the total per-request execution
time of bfq (i.e., of the sum of the execution time of all the bfq
functions that have to be executed to process an I/O request from its
creation to its destruction).  This reduces the request-processing
rate sustainable by bfq noticeably, even on a multicore CPU. In fact,
the bfq functions that invoke blkg_*stats_* functions cannot be
executed in parallel with the rest of the code of bfq, because both
are executed under the same per-device scheduler lock.

To reduce this slowdown, this commit moves, wherever possible, the
invocation of these functions (more precisely, of the bfq functions
that invoke blkg_*stats_* functions) outside the critical sections
protected by the scheduler lock.

With this change, and with all blkio.bfq.* statistics enabled, the
throughput grows, e.g., from 250 to 310 KIOPS (+25%) on an Intel
i7-4850HQ, in case of 8 threads doing random I/O in parallel on
null_blk, with the latter configured with 0 latency. We obtained the
same or higher throughput boosts, up to +30%, with other processors
(some figures are reported in the documentation). For our tests, we
used the script [1], with which our results can be easily reproduced.

NOTE. This commit still protects the invocation of blkg_*stats_*
functions with the request_queue lock, because the group these
functions are invoked on may otherwise disappear before or while these
functions are executed.  Fortunately, tests without even this lock
show, by difference, that the serialization caused by this lock has a
little impact (at most ~5% of throughput reduction).

[1] https://github.com/Algodev-github/IOSpeed

Tested-by: Lee Tibbert 
Tested-by: Oleksandr Natalenko 
Signed-off-by: Paolo Valente 
Signed-off-by: Luca Miccio 
---
 Documentation/block/bfq-iosched.txt |   6 +-
 block/bfq-iosched.c | 110 
 block/bfq-wf2q.c|   1 -
 3 files changed, 102 insertions(+), 15 deletions(-)

diff --git a/Documentation/block/bfq-iosched.txt 
b/Documentation/block/bfq-iosched.txt
index 7a93615..7fad6c0 100644
--- a/Documentation/block/bfq-iosched.txt
+++ b/Documentation/block/bfq-iosched.txt
@@ -26,9 +26,9 @@ the limits on slow or average CPUs, here are BFQ limits for 
three
 different CPUs, on, respectively, an average laptop, an old desktop,
 and a cheap embedded system, in case full hierarchical support is
 enabled (i.e., CONFIG_BFQ_GROUP_IOSCHED is set):
-- Intel i7-4850HQ: 250 KIOPS
-- AMD A8-3850: 170 KIOPS
-- ARM CortexTM-A53 Octa-core: 45 KIOPS
+- Intel i7-4850HQ: 310 KIOPS
+- AMD A8-3850: 200 KIOPS
+- ARM CortexTM-A53 Octa-core: 56 KIOPS
 
 BFQ works for multi-queue devices too.
 
diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 91703eb..69e05f8 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -2228,7 +2228,6 @@ static void __bfq_set_in_service_queue(struct bfq_data 
*bfqd,
   struct bfq_queue *bfqq)
 {
if (bfqq) {
-   bfqg_stats_update_avg_queue_size(bfqq_group(bfqq));
bfq_clear_bfqq_fifo_expire(bfqq);
 
bfqd->budgets_assigned = (bfqd->budgets_assigned * 7 + 256) / 8;
@@ -3469,7 +3468,6 @@ static struct bfq_queue *bfq_select_queue(struct bfq_data 
*bfqd)
 */
bfq_clear_bfqq_wait_request(bfqq);
hrtimer_try_to_cancel(>idle_slice_timer);
-   bfqg_stats_update_idle_time(bfqq_group(bfqq));
}
goto keep_queue;
}
@@ -3695,15 +3693,67 @@ static struct request *bfq_dispatch_request(struct 
blk_mq_hw_ctx *hctx)
 {
struct bfq_data *bfqd = hctx->queue->elevator->elevator_data;
struct request *rq;
+#ifdef CONFIG_BFQ_GROUP_IOSCHED
+   struct bfq_queue *in_serv_queue, *bfqq;
+   bool waiting_rq, idle_timer_disabled;
+#endif
 
spin_lock_irq(>lock);
 
+#ifdef CONFIG_BFQ_GROUP_IOSCHED
+   in_serv_queue = bfqd->in_service_queue;
+   waiting_rq = in_serv_queue && bfq_bfqq_wait_request(in_serv_queue);
+
+   rq = __bfq_dispatch_request(hctx);
+
+   idle_timer_disabled =
+   waiting_rq && !bfq_bfqq_wait_request(in_serv_queue);
+
+#else
rq = __bfq_dispatch_request(hctx);
-   if (rq && RQ_BFQQ(rq))
-   bfqg_stats_update_io_remove(bfqq_group(RQ_BFQQ(rq)),
-   rq->cmd_flags);
+#endif
spin_unlock_irq(>lock);
 
+#ifdef CONFIG_BFQ_GROUP_IOSCHED
+   

[PATCH BUGFIX/IMPROVEMENT 3/4] block, bfq: update blkio stats outside the scheduler lock

2017-11-12 Thread Paolo Valente
bfq invokes various blkg_*stats_* functions to update the statistics
contained in the special files blkio.bfq.* in the blkio controller
groups, i.e., the I/O accounting related to the proportional-share
policy provided by bfq. The execution of these functions takes a
considerable percentage, about 40%, of the total per-request execution
time of bfq (i.e., of the sum of the execution time of all the bfq
functions that have to be executed to process an I/O request from its
creation to its destruction).  This reduces the request-processing
rate sustainable by bfq noticeably, even on a multicore CPU. In fact,
the bfq functions that invoke blkg_*stats_* functions cannot be
executed in parallel with the rest of the code of bfq, because both
are executed under the same per-device scheduler lock.

To reduce this slowdown, this commit moves, wherever possible, the
invocation of these functions (more precisely, of the bfq functions
that invoke blkg_*stats_* functions) outside the critical sections
protected by the scheduler lock.
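
As a rough illustration of this pattern, here is a user-space sketch
with pthread mutexes. It is not the bfq code: struct sched, struct
stats and dispatch_one() are hypothetical names, and the second mutex
only stands in for the request_queue lock mentioned in the NOTE below.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the scheduler state. */
struct sched {
	pthread_mutex_t lock;	/* plays the role of the scheduler lock */
	int queued;		/* protected by lock */
};

/* Hypothetical stand-in for the blkio statistics. */
struct stats {
	pthread_mutex_t lock;	/* separate, less contended lock */
	long dispatched;	/* protected by lock */
};

/* Dispatch one request; return true if a request was dispatched. */
static bool dispatch_one(struct sched *s, struct stats *st)
{
	bool dispatched;

	assert(s && st);

	/* Critical section: only scheduling decisions, no accounting. */
	pthread_mutex_lock(&s->lock);
	dispatched = s->queued > 0;
	if (dispatched)
		s->queued--;
	pthread_mutex_unlock(&s->lock);

	/*
	 * Accounting is done after the scheduler lock is dropped, using
	 * only the outcome recorded above, so it no longer serializes
	 * with other work done under the scheduler lock.
	 */
	if (dispatched) {
		pthread_mutex_lock(&st->lock);
		st->dispatched++;
		pthread_mutex_unlock(&st->lock);
	}

	return dispatched;
}
```

The point of the sketch is only the ordering: whatever the statistics
need is recorded in local variables while the scheduler lock is held,
and the update itself happens after the unlock.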

With this change, and with all blkio.bfq.* statistics enabled, the
throughput grows, e.g., from 250 to 310 KIOPS (+25%) on an Intel
i7-4850HQ, in case of 8 threads doing random I/O in parallel on
null_blk, with the latter configured with 0 latency. We obtained the
same or higher throughput boosts, up to +30%, with other processors
(some figures are reported in the documentation). For our tests, we
used the script [1], with which our results can be easily reproduced.

NOTE. This commit still protects the invocation of blkg_*stats_*
functions with the request_queue lock, because the group these
functions are invoked on may otherwise disappear before or while these
functions are executed.  Fortunately, tests without even this lock
show, by difference, that the serialization caused by this lock has
little impact (at most ~5% of throughput reduction).

[1] https://github.com/Algodev-github/IOSpeed

Tested-by: Lee Tibbert 
Tested-by: Oleksandr Natalenko 
Signed-off-by: Paolo Valente 
Signed-off-by: Luca Miccio 
---
 Documentation/block/bfq-iosched.txt |   6 +-
 block/bfq-iosched.c | 110 
 block/bfq-wf2q.c|   1 -
 3 files changed, 102 insertions(+), 15 deletions(-)

diff --git a/Documentation/block/bfq-iosched.txt b/Documentation/block/bfq-iosched.txt
index 7a93615..7fad6c0 100644
--- a/Documentation/block/bfq-iosched.txt
+++ b/Documentation/block/bfq-iosched.txt
@@ -26,9 +26,9 @@ the limits on slow or average CPUs, here are BFQ limits for three
 different CPUs, on, respectively, an average laptop, an old desktop,
 and a cheap embedded system, in case full hierarchical support is
 enabled (i.e., CONFIG_BFQ_GROUP_IOSCHED is set):
-- Intel i7-4850HQ: 250 KIOPS
-- AMD A8-3850: 170 KIOPS
-- ARM CortexTM-A53 Octa-core: 45 KIOPS
+- Intel i7-4850HQ: 310 KIOPS
+- AMD A8-3850: 200 KIOPS
+- ARM CortexTM-A53 Octa-core: 56 KIOPS
 
 BFQ works for multi-queue devices too.
 
diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 91703eb..69e05f8 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -2228,7 +2228,6 @@ static void __bfq_set_in_service_queue(struct bfq_data *bfqd,
   struct bfq_queue *bfqq)
 {
if (bfqq) {
-   bfqg_stats_update_avg_queue_size(bfqq_group(bfqq));
bfq_clear_bfqq_fifo_expire(bfqq);
 
bfqd->budgets_assigned = (bfqd->budgets_assigned * 7 + 256) / 8;
@@ -3469,7 +3468,6 @@ static struct bfq_queue *bfq_select_queue(struct bfq_data *bfqd)
 */
bfq_clear_bfqq_wait_request(bfqq);
hrtimer_try_to_cancel(&bfqd->idle_slice_timer);
-   bfqg_stats_update_idle_time(bfqq_group(bfqq));
}
goto keep_queue;
}
@@ -3695,15 +3693,67 @@ static struct request *bfq_dispatch_request(struct blk_mq_hw_ctx *hctx)
 {
struct bfq_data *bfqd = hctx->queue->elevator->elevator_data;
struct request *rq;
+#ifdef CONFIG_BFQ_GROUP_IOSCHED
+   struct bfq_queue *in_serv_queue, *bfqq;
+   bool waiting_rq, idle_timer_disabled;
+#endif
 
spin_lock_irq(&bfqd->lock);
 
+#ifdef CONFIG_BFQ_GROUP_IOSCHED
+   in_serv_queue = bfqd->in_service_queue;
+   waiting_rq = in_serv_queue && bfq_bfqq_wait_request(in_serv_queue);
+
+   rq = __bfq_dispatch_request(hctx);
+
+   idle_timer_disabled =
+   waiting_rq && !bfq_bfqq_wait_request(in_serv_queue);
+
+#else
rq = __bfq_dispatch_request(hctx);
-   if (rq && RQ_BFQQ(rq))
-   bfqg_stats_update_io_remove(bfqq_group(RQ_BFQQ(rq)),
-   rq->cmd_flags);
+#endif
spin_unlock_irq(&bfqd->lock);
 
+#ifdef CONFIG_BFQ_GROUP_IOSCHED
+   bfqq = rq ? RQ_BFQQ(rq) : NULL;
+   if (!idle_timer_disabled && !bfqq)
+   

linux-next: Tree for Nov 13

2017-11-12 Thread Stephen Rothwell
Hi all,

Please do not add any v4.16 material to your linux-next included trees
until v4.15-rc1 has been released.

Changes since 20171110:

The powerpc tree still had its build failure for which I applied a
patch.

The keys tree gained a build failure so I used the version from
next-20171110.

Non-merge commits (relative to Linus' tree): 12048
 11498 files changed, 555327 insertions(+), 266135 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a
multi_v7_defconfig for arm and a native build of tools/perf. After
the final fixups (if any), I do an x86_64 modules_install followed by
builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit),
ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc
and sparc64 defconfig. And finally, a simple boot test of the powerpc
pseries_le_defconfig kernel in qemu (with and without kvm enabled).

Below is a summary of the state of the merge.

I am currently merging 272 trees (counting Linus' and 42 trees of bug
fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (bebc6082da0a Linux 4.14)
Merging fixes/master (820bf5c419e4 Merge tag 'scsi-fixes' of 
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi)
Merging kbuild-current/fixes (bb3f38c3c5b7 kbuild: clang: fix build failures 
with sparse check)
Merging arc-current/for-curr (92d44128241f ARCv2: Accomodate HS48 MMUv5 by 
relaxing MMU ver checking)
Merging arm-current/fixes (b9dd05c7002e ARM: 8720/1: ensure dump_instr() checks 
addr_limit)
Merging m68k-current/for-linus (558d5ad276c9 m68k/mac: Avoid soft-lockup 
warning after mach_power_off)
Merging metag-fixes/fixes (b884a190afce metag/usercopy: Add missing fixups)
Merging powerpc-fixes/fixes (7ecb37f62fe5 powerpc/perf: Fix core-imc hotplug 
callback failure during imc initialization)
Merging sparc/master (23198ddffb6c sparc32: Add cmpxchg64().)
Merging fscrypt-current/for-stable (42d97eb0ade3 fscrypt: fix renaming and 
linking special files)
Merging net/master (b39545684a90 Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net)
Merging ipsec/master (c9f3f813d462 xfrm: Fix stack-out-of-bounds read in 
xfrm_state_find.)
Merging netfilter/master (7400bb4b5800 netfilter: nf_reject_ipv4: Fix 
use-after-free in send_reset)
Merging ipvs/master (f7fb77fc1235 netfilter: nft_compat: check extension hook 
mask only if set)
Merging wireless-drivers/master (a6127b4440d1 Merge tag 
'iwlwifi-for-kalle-2017-10-06' of 
git://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-fixes)
Merging mac80211/master (9618aec3349b Merge tag 'mac80211-for-davem-2017-10-25' 
of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211)
Merging sound-current/for-linus (75ee94b20b46 ALSA: hda - fix headset mic 
problem for Dell machines with alc274)
Merging pci-current/for-linus (6b7be529634b MAINTAINERS: Add Lorenzo Pieralisi 
for PCI host bridge drivers)
Merging driver-core.current/driver-core-linus (39dae59d66ac Linux 4.14-rc8)
Merging tty.current/tty-linus (8a5776a5f498 Linux 4.14-rc4)
Merging usb.current/usb-linus (bb176f67090c Linux 4.14-rc6)
Merging usb-gadget-fixes/fixes (7c80f9e4a588 usb: usbtest: fix NULL pointer 
dereference)
Merging usb-serial-fixes/usb-linus (0b07194bb55e Linux 4.14-rc7)
Merging usb-chipidea-fixes/ci-for-usb-stable (cbb22ebcfb99 usb: chipidea: core: 
check before accessing ci_role in ci_role_show)
Merging phy/fixes (2fb850092fd9 phy: rockchip-typec: Check for errors from 
tcphy_phy_init())
Merging staging.current/staging-linus (bb176f67090c Linux 4.14-rc6)
Merging char-misc.current/char-misc-linus (bb176f67090c Linux 4.14-rc6)
Merging input-current/for-linus (26dd633e437d Input: synaptics-rmi4 - RMI4 can 
also use SMBUS version 3)
Merging crypto-current/master (441f99c90497 crypto: ccm - preserve the IV 
buffer)
Merging ide/master 

Re: linux-next: manual merge of the arm64 tree with Linus' tree

2017-11-12 Thread Stephen Rothwell
Hi all,

On Wed, 1 Nov 2017 07:57:23 +1100 Stephen Rothwell  
wrote:
> 
> Today's linux-next merge of the arm64 tree got a conflict in:
> 
>   drivers/acpi/arm64/iort.c
> 
> between commit:
> 
>   37f6b42e9c29 ("ACPI/IORT: Fix PCI ACS enablement")
> 
> from Linus' tree and commit:
> 
>   896dd2c32484 ("ACPI/IORT: Make platform devices initialization code SMMU 
> agnostic")
> 
> from the arm64 tree.
> 
> I fixed it up (see below) and can carry the fix as necessary. This
> is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your tree
> is submitted for merging.  You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any particularly
> complex conflicts.
> 
> -- 
> Cheers,
> Stephen Rothwell
> 
> diff --cc drivers/acpi/arm64/iort.c
> index de56394dd161,7dc964f4d8f1..
> --- a/drivers/acpi/arm64/iort.c
> +++ b/drivers/acpi/arm64/iort.c
> @@@ -1215,7 -1326,7 +1357,8 @@@ static void __init iort_init_platform_d
>   struct acpi_table_iort *iort;
>   struct fwnode_handle *fwnode;
>   int i, ret;
>  +bool acs_enabled = false;
> + const struct iort_dev_config *ops;
>   
>   /*
>* iort_table and iort both point to the start of IORT table, but
> @@@ -1235,12 -1346,8 +1378,11 @@@
>   return;
>   }
>   
>  +if (!acs_enabled)
>  +acs_enabled = iort_enable_acs(iort_node);
>  +
> - if ((iort_node->type == ACPI_IORT_NODE_SMMU) ||
> - (iort_node->type == ACPI_IORT_NODE_SMMU_V3)) {
> - 
> + ops = iort_get_dev_cfg(iort_node);
> + if (ops) {
>   fwnode = acpi_alloc_fwnode_static();
>   if (!fwnode)
>   return;

Just a reminder that this conflict still exists.

-- 
Cheers,
Stephen Rothwell


Re: [kernel-hardening] [PATCH v4] scripts: add leaking_addresses.pl

2017-11-12 Thread Tobin C. Harding
On Mon, Nov 13, 2017 at 11:16:28AM +0530, kaiwan.billimo...@gmail.com wrote:
> On Mon, 2017-11-13 at 09:21 +1100, Tobin C. Harding wrote:
> > On Fri, Nov 10, 2017 at 07:26:34PM +0530, kaiwan.billimo...@gmail.com
> >  wrote:
> > > On Tue, 2017-11-07 at 21:32 +1100, Tobin C. Harding wrote:
> > > > Currently we are leaking addresses from the kernel to user space.
> > > > This
> > > > script is an attempt to find some of those leakages. Script
> > > > parses
> > > > `dmesg` output and /proc and /sys files for hex strings that look
> > > > like
> > > > kernel addresses.
> > > > 
> > > > Only works for 64 bit kernels, the reason being that kernel
> > > > addresses
> > > > on 64 bit kernels have '' as the leading bit pattern making
> > > > greping
> > > > possible. On 32 kernels we don't have this luxury.
> > > 
> > > Tobin C. Harding  wrote:
> > > > Only works for 64 bit kernels, the reason being that kernel
> > > > addresses
> > > > on 64 bit kernels have '' as the leading bit pattern making
> > > > greping
> > > > possible. On 32 kernels we don't have this luxury.
> > > 
> > > [RFC] leaking_addresses.pl - enhance it to work for 32-bit kernels
> > > as well
> > > 
> > > (Firstly, apologies if I've got the protocol horribly wrong- should
> > > this
> > > be a new thread altogether?)
> > 
> > I think this patch will need to wait until the patch set that is
> > currently in flight is either merged or dropped.
> > 
> Thanks for looking at it!
> Okay; blocking on merge || drop...  :-)

So, Linus has requested that I set up a tree for the development of
this. I have to work out the details of how to do that and then I'll
email you so you can pull the current version. I can then take
your patch via LKML as per usual.

> > We can work this out pragmatically, Perl can give us an architecture
> > string then a few regexs can ascertain which architecture we are
> > running
> > on. This is in the inflight patch set. 
> > 
> > > The patch below does Not take into account (yet) stuff like:
> > >  - exactly which files & dirs should be skipped on 32-bit (will it
> > > be
> > > identical to 64-bit?; unsure..)
> > 
> > As per discussion later in this thread we may need to consider
> > architecture specific lists for files/directories to skip. 
> Right
> > 
> > >  - it currently hard-codes a global 'PAGE_OFFSET_32BIT=0xc000'
> > > , just
> > >  so I can test quickly; must figure whether to query it or pass it;
> > >  Suggestions?
> > 
> > Perhaps we should have a command line option for this.
> > 
> > --kernel-base-address
> 
> Why not just detect it programatically? We could devise a series of
> fallbacks; something like:
> - if .config exists in the kernel source tree root, grep it for
> PAGE_OFFSET
> - if not, grep the arch-specific (arch//configs/)
> for the same
> - if for some reason we don't have enough info regarding specific
> platform and thus the defconfig filename (could happen for ARM, PPC?),
> we then fail and request the user to pass it as a parameter.
> 
> > >  - the 'false positives'; again, what differs for 32-bit?

Sounds good to me.
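
For what it's worth, the fallback chain could look roughly like the
sketch below. It is written in C only for illustration (the script
itself is Perl), and everything in it is an assumption: the helper
names are made up, the caller is expected to have read the candidate
config files into memory already, and the symbol is assumed to appear
as a CONFIG_PAGE_OFFSET=<hex> line, which will not hold for every
architecture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Scan the text of one config file for a line of the (assumed) form
 * CONFIG_PAGE_OFFSET=<hex>. On success store the value and return true.
 */
static bool page_offset_from_config(const char *text, unsigned long long *out)
{
	static const char key[] = "CONFIG_PAGE_OFFSET=";
	const size_t keylen = sizeof(key) - 1;
	const char *p = text;

	assert(text && out);

	while ((p = strstr(p, key)) != NULL) {
		/* Only accept a match at the start of a line. */
		if (p == text || p[-1] == '\n') {
			char *end;
			unsigned long long v = strtoull(p + keylen, &end, 16);

			if (end != p + keylen) {
				*out = v;
				return true;
			}
		}
		p += keylen;
	}
	return false;
}

/*
 * Try each candidate config text in order (e.g. .config first, then a
 * defconfig); if none yields a value, fall back to the value passed by
 * the user, if any. Returning false means the user has to be asked.
 */
static bool detect_page_offset(const char *const *configs, size_t n,
			       const unsigned long long *user_value,
			       unsigned long long *out)
{
	for (size_t i = 0; i < n; i++)
		if (configs[i] && page_offset_from_config(configs[i], out))
			return true;

	if (user_value) {
		*out = *user_value;
		return true;
	}
	return false;
}
```

A NULL entry in configs stands for a file that does not exist, so the
caller can always pass the candidates in the same order and let the
helper skip the missing ones.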

thanks,
Tobin.


Improving documentation of parent-ID field in /proc/PID/mountinfo

2017-11-12 Thread Michael Kerrisk (man-pages)
Hello Ram,

Long ago (2.6.29) you added the /proc/PID/mountinfo file and
associated documentation in Documentation/filesystems/proc.txt. Later,
I pasted much of that documentation into the proc(5) manual page.

That documentation says of the second field in the file:

[[
This file contains lines of the form:

36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - ext3 /dev/root rw,errors=continue
(1)(2)(3)   (4)   (5)  (6)  (7)   (8) (9)   (10) (11)

(1) mount ID:  unique identifier of the mount (may be reused after umount)
(2) parent ID:  ID of parent (or of self for the top of the mount tree)
...
]]

The last piece of the description of field (2) doesn't seem to be
correct, or is at least rather unclear. I take this to be saying that
for the root mount point, /, field (2) will have the same value
as field (1). I never actually looked at this detail closely, but
Alexander pointed out that this is obviously not so, as one can
immediately verify:

$ grep '/ / ' /proc/$$/mountinfo
65 0 8:2 / / rw,relatime shared:1 - ext4 /dev/sda2 rw,seclabel,data=order

I dug around in the kernel source for a bit. I do not have an exact
handle on the details, but I can see roughly what is going on.
Internally, there seems to be one ("hidden") mount ID reserved for each
mount namespace, and that ID is the parent of the root mount point.
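
For anyone who wants to check this from a program rather than with
grep, a minimal sketch follows. The struct and function names are made
up for illustration, and it only relies on the field layout quoted
above: fields (1) and (2) are decimal integers and field (5) is the
mount point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical container for the fields of interest. */
struct mount_ids {
	int mount_id;		/* field (1) */
	int parent_id;		/* field (2) */
	char mount_point[256];	/* field (5) */
};

/*
 * Parse one line of /proc/PID/mountinfo. Fields (3) and (4) are
 * skipped. Returns true if all three fields of interest were read.
 */
static bool parse_mountinfo_line(const char *line, struct mount_ids *m)
{
	assert(line && m);

	return sscanf(line, "%d %d %*s %*s %255s",
		      &m->mount_id, &m->parent_id, m->mount_point) == 3;
}
```

Feeding it the line shown above yields mount ID 65 and parent ID 0 for
the mount point "/", that is, the parent ID of the root mount point is
not its own mount ID.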

Looking through the (4.14) kernel source, mount IDs are allocated by
mnt_alloc_id() (in fs/namespace.c), which is in turn called by
alloc_vfsmnt() which is in turn called by clone_mnt().

A new mount namespace is created by the kernel function copy_mnt_ns()
(in fs/namespace.c, called by create_new_namespaces() in
kernel/nsproxy.c). The copy_mnt_ns() function calls copy_tree() (in
fs/namespace.c), and copy_tree() calls clone_mnt() in *two* places.
The first of these is the call that creates the "hidden" mount ID that
becomes the parent of the root mount point. (I verified this by
instrumenting the kernel with a few printk() calls to display the
IDs.)  The second place where copy_tree() calls clone_mnt() is in a
loop that replicates each of the mount points (including the root
mount point) in the source mount namespace.

With these details in mind, I propose to patch the man page to read as
below. Perhaps you have some corrections or improvements to suggest
for this text?

[[
  (2)  parent ID: the ID of the parent mount.  For  the  root
   mount  point,  the  ID shown here is a hidden mount ID
   associated with the mount namespace.  That ID is  dis‐
   tinct  from  any  of the IDs shown in field (1) of the
   records shown in the  mountinfo  file,  and  does  not
   appear in field (1) in the mountinfo file in any other
   mount namespace.  (In  the  initial  mount  namespace,
   this hidden ID has the value 0.)
]]

With best regards,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


Re: linux-next: manual merge of the tip tree with the net-next tree

2017-11-12 Thread Stephen Rothwell
Hi all,

On Mon, 30 Oct 2017 20:55:47 + Mark Brown  wrote:
>
> Today's linux-next merge of the tip tree got a conflict in:
> 
>   net/ipv4/tcp_output.c
> 
> between commit:
> 
>   6aa7de059173a ("locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE()")
> 
> in the tip tree and some change in the net-next tree.
> 
> I fixed it up (see below) and can carry the fix as necessary. This
> is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your tree
> is submitted for merging.  You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any particularly
> complex conflicts.
> 
> diff --cc net/ipv4/tcp_output.c
> index a69a34f57330,48531da1aba6..
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@@ -1978,7 -1908,7 +1978,7 @@@ static bool tcp_tso_should_defer(struc
>   if ((skb != tcp_write_queue_tail(sk)) && (limit >= skb->len))
>   goto send_now;
>   
> - win_divisor = ACCESS_ONCE(sock_net(sk)->ipv4.sysctl_tcp_tso_win_divisor);
>  -win_divisor = READ_ONCE(sysctl_tcp_tso_win_divisor);
> ++win_divisor = READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_tso_win_divisor);
>   if (win_divisor) {
>   u32 chunk = min(tp->snd_wnd, tp->snd_cwnd * tp->mss_cache);
>   

Just a reminder that this conflict still exists.

-- 
Cheers,
Stephen Rothwell


Re: linux-next: manual merge of the devicetree tree with the drm tree

2017-11-12 Thread Stephen Rothwell
Hi all,

On Mon, 30 Oct 2017 20:37:56 + Mark Brown  wrote:
>
> Today's linux-next merge of the devicetree tree got a conflict in:
> 
>   drivers/gpu/drm/tilcdc/tilcdc_slave_compat.c
> 
> between commit:
> 
>   44cd3939c111b7 ("drm/tilcdc: Remove redundant OF_DETACHED flag setting")
> 
> from the drm tree and commit:
> 
>   f948d6d8b792bb ("of: overlay: avoid race condition between applying multiple overlays")
> 
> from the devicetree tree.
> 
> I fixed it up (see below) and can carry the fix as necessary. This
> is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your tree
> is submitted for merging.  You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any particularly
> complex conflicts.
> 
> diff --cc drivers/gpu/drm/tilcdc/tilcdc_slave_compat.c
> index 482299a6f3b0,54025af534d4..
> --- a/drivers/gpu/drm/tilcdc/tilcdc_slave_compat.c
> +++ b/drivers/gpu/drm/tilcdc/tilcdc_slave_compat.c
> @@@ -163,12 -162,8 +162,6 @@@ static struct device_node * __init tilc
>   return NULL;
>   }
>   
> - ret = of_resolve_phandles(overlay);
> - if (ret) {
> - pr_err("%s: Failed to resolve phandles: %d\n", __func__, ret);
> - return NULL;
> - }
>  -of_node_set_flag(overlay, OF_DETACHED);
> --
>   return overlay;
>   }
>   

Just a reminder that this conflict still exists.

-- 
Cheers,
Stephen Rothwell


Re: linux-next: manual merge of the ext4 tree with the fscrypt tree

2017-11-12 Thread Stephen Rothwell
Hi all,

On Mon, 30 Oct 2017 14:48:04 + Mark Brown  wrote:
>
> Today's linux-next merge of the ext4 tree got a conflict in:
> 
>   fs/ext4/inode.c
> 
> between commit:
> 
>   2ee6a576be564272 ("fs, fscrypt: add an S_ENCRYPTED inode flag")
> 
> from the fscrypt tree and commit:
> 
>   d4e50e6d43b2620f ("ext4: add ext4_should_use_dax()")
> 
> from the ext4 tree.
> 
> I fixed it up (see below) and can carry the fix as necessary. This
> is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your tree
> is submitted for merging.  You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any particularly
> complex conflicts.
> 
> diff --cc fs/ext4/inode.c
> index 617c7feced24,9f836e2ec18c..
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@@ -4572,6 -4610,21 +4610,23 @@@ int ext4_get_inode_loc(struct inode *in
>   !ext4_test_inode_state(inode, EXT4_STATE_XATTR));
>   }
>   
> + static bool ext4_should_use_dax(struct inode *inode)
> + {
> ++unsigned int flags = EXT4_I(inode)->i_flags;
> ++
> + if (!test_opt(inode->i_sb, DAX))
> + return false;
> + if (!S_ISREG(inode->i_mode))
> + return false;
> + if (ext4_should_journal_data(inode))
> + return false;
> + if (ext4_has_inline_data(inode))
> + return false;
>  -if (ext4_encrypted_inode(inode))
> ++if (flags & EXT4_ENCRYPT_FL)
> + return false;
> + return true;
> + }
> + 
>   void ext4_set_inode_flags(struct inode *inode)
>   {
>   unsigned int flags = EXT4_I(inode)->i_flags;
> @@@ -4587,15 -4640,10 +4642,13 @@@
>   new_fl |= S_NOATIME;
>   if (flags & EXT4_DIRSYNC_FL)
>   new_fl |= S_DIRSYNC;
> - if (test_opt(inode->i_sb, DAX) && S_ISREG(inode->i_mode) &&
> - !ext4_should_journal_data(inode) && !ext4_has_inline_data(inode) &&
> - !(flags & EXT4_ENCRYPT_FL))
> + if (ext4_should_use_dax(inode))
>   new_fl |= S_DAX;
>  +if (flags & EXT4_ENCRYPT_FL)
>  +new_fl |= S_ENCRYPTED;
>   inode_set_flags(inode, new_fl,
>  -S_SYNC|S_APPEND|S_IMMUTABLE|S_NOATIME|S_DIRSYNC|S_DAX);
>  +S_SYNC|S_APPEND|S_IMMUTABLE|S_NOATIME|S_DIRSYNC|S_DAX|
>  +S_ENCRYPTED);
>   }
>   
>   static blkcnt_t ext4_inode_blocks(struct ext4_inode *raw_inode,

Just a reminder that this conflict still exists.

-- 
Cheers,
Stephen Rothwell


linux-next: build warning after merge of the akpm-current tree

2017-11-12 Thread Stephen Rothwell
Hi Andrew,

After merging the akpm-current tree, today's linux-next build (x86_64
allmodconfig) produced this warning:

In file included from include/linux/printk.h:7:0,
 from include/linux/kernel.h:14,
 from lib/test_find_bit.c:28:
lib/test_find_bit.c: In function 'test_find_first_bit':
include/linux/kern_levels.h:5:18: warning: format '%ld' expects argument of type 'long int', but argument 2 has type 'cycles_t {aka long long unsigned int}' [-Wformat=]
 #define KERN_SOH "\001"  /* ASCII Start Of Header */
  ^
include/linux/kern_levels.h:11:18: note: in expansion of macro 'KERN_SOH'
 #define KERN_ERR KERN_SOH "3" /* error conditions */
  ^
include/linux/printk.h:300:9: note: in expansion of macro 'KERN_ERR'
  printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
 ^
lib/test_find_bit.c:54:2: note: in expansion of macro 'pr_err'
  pr_err("find_first_bit:\t\t%ld cycles,\t%ld iterations\n", cycles, cnt);
  ^
lib/test_find_bit.c: In function 'test_find_next_bit':
include/linux/kern_levels.h:5:18: warning: format '%ld' expects argument of type 'long int', but argument 2 has type 'cycles_t {aka long long unsigned int}' [-Wformat=]
 #define KERN_SOH "\001"  /* ASCII Start Of Header */
  ^
include/linux/kern_levels.h:11:18: note: in expansion of macro 'KERN_SOH'
 #define KERN_ERR KERN_SOH "3" /* error conditions */
  ^
include/linux/printk.h:300:9: note: in expansion of macro 'KERN_ERR'
  printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
 ^
lib/test_find_bit.c:68:2: note: in expansion of macro 'pr_err'
  pr_err("find_next_bit:\t\t%ld cycles,\t%ld iterations\n", cycles, cnt);
  ^
lib/test_find_bit.c: In function 'test_find_next_zero_bit':
include/linux/kern_levels.h:5:18: warning: format '%ld' expects argument of type 'long int', but argument 2 has type 'cycles_t {aka long long unsigned int}' [-Wformat=]
 #define KERN_SOH "\001"  /* ASCII Start Of Header */
  ^
include/linux/kern_levels.h:11:18: note: in expansion of macro 'KERN_SOH'
 #define KERN_ERR KERN_SOH "3" /* error conditions */
  ^
include/linux/printk.h:300:9: note: in expansion of macro 'KERN_ERR'
  printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
 ^
lib/test_find_bit.c:82:2: note: in expansion of macro 'pr_err'
  pr_err("find_next_zero_bit:\t%ld cycles,\t%ld iterations\n",
  ^
lib/test_find_bit.c: In function 'test_find_last_bit':
include/linux/kern_levels.h:5:18: warning: format '%ld' expects argument of type 'long int', but argument 2 has type 'cycles_t {aka long long unsigned int}' [-Wformat=]
 #define KERN_SOH "\001"  /* ASCII Start Of Header */
  ^
include/linux/kern_levels.h:11:18: note: in expansion of macro 'KERN_SOH'
 #define KERN_ERR KERN_SOH "3" /* error conditions */
  ^
include/linux/printk.h:300:9: note: in expansion of macro 'KERN_ERR'
  printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
 ^
lib/test_find_bit.c:102:2: note: in expansion of macro 'pr_err'
  pr_err("find_last_bit:\t\t%ld cycles,\t%ld iterations\n", cycles, cnt);
  ^

Introduced by commit

  09588b1f1d58 ("lib: test module for find_*_bit() functions")

-- 
Cheers,
Stephen Rothwell


[PATCH v2] coccinelle: orplus: reorganize to improve performance

2017-11-12 Thread Julia Lawall
Adding two #define constants is less common than performing & and |
operations on them, so put the addition first to reduce the set of cases
that have to be considered in detail.  At the same time, add & and |
patterns for both arguments of +, to account for commutativity and obtain
more results.

Running time is divided by 3 when applying this to the whole kernel on my
laptop with an Intel i5-6200U CPU.

Signed-off-by: Julia Lawall 

---

v2: added SOB and fixed typos in the commit message

diff --git a/scripts/coccinelle/misc/orplus.cocci b/scripts/coccinelle/misc/orplus.cocci
index 81fabf3..08de5be 100644
--- a/scripts/coccinelle/misc/orplus.cocci
+++ b/scripts/coccinelle/misc/orplus.cocci
@@ -14,7 +14,19 @@ virtual report
 virtual context

 @r@
-constant c;
+constant c,c1;
+identifier i,i1;
+position p;
+@@
+
+(
+ c1 + c - 1
+|
+ c1@i1 +@p c@i
+)
+
+@s@
+constant r.c, r.c1;
 identifier i;
 expression e;
 @@
@@ -27,28 +39,31 @@ e & c@i
 e |= c@i
 |
 e &= c@i
+|
+e | c1@i
+|
+e & c1@i
+|
+e |= c1@i
+|
+e &= c1@i
 )

-@s@
-constant r.c,c1;
-identifier i1;
-position p;
+@depends on s@
+position r.p;
+constant c1,c2;
 @@

-(
- c1 + c - 1
-|
-*c1@i1 +@p c
-)
+* c1 +@p c2

-@script:python depends on org@
-p << s.p;
+@script:python depends on s && org@
+p << r.p;
 @@

 cocci.print_main("sum of probable bitmasks, consider |",p)

-@script:python depends on report@
-p << s.p;
+@script:python depends on s && report@
+p << r.p;
 @@

 msg = "WARNING: sum of probable bitmasks, consider |"
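
As background for the "sum of probable bitmasks, consider |" message that
this script emits, the following small Python sketch shows why adding two
flag constants is suspicious: the sum equals the bitwise OR only while the
masks share no bits. The flag names and values are hypothetical and are not
taken from the kernel.

```python
# Hypothetical flag constants, standing in for #define'd bitmasks.
FLAG_A = 0x1
FLAG_B = 0x2
FLAG_AB = FLAG_A | FLAG_B  # 0x3, a composite mask that already contains FLAG_A


def combine_with_plus(x, y):
    """Combine two masks by addition (the pattern the script flags)."""
    return x + y


def combine_with_or(x, y):
    """Combine two masks with bitwise OR (the suggested replacement)."""
    return x | y


# Disjoint bits: both operators give the same mask.
print(hex(combine_with_plus(FLAG_A, FLAG_B)), hex(combine_with_or(FLAG_A, FLAG_B)))

# Overlapping bits: the addition carries into a bit neither mask had set.
print(hex(combine_with_plus(FLAG_AB, FLAG_A)), hex(combine_with_or(FLAG_AB, FLAG_A)))
```

The second print shows the sum producing 0x4, a bit that belongs to neither
input, while the OR keeps the mask at 0x3.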


[PATCH] timekeeping: Eliminate the useless declaration of ktime_get_raw_and_real_ts64()

2017-11-12 Thread Dou Liyang
Commit:

  ba26621e63ce ("time: Remove duplicated code in ktime_get_raw_and_real()")

... got rid of ktime_get_raw_and_real_ts64(), but left its declaration
behind.

Remove it.

Signed-off-by: Dou Liyang 
---
 include/linux/timekeeping.h | 6 --
 1 file changed, 6 deletions(-)

diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
index c198ab4..b17bcce 100644
--- a/include/linux/timekeeping.h
+++ b/include/linux/timekeeping.h
@@ -143,12 +143,6 @@ extern bool timekeeping_rtc_skipresume(void);
 extern void timekeeping_inject_sleeptime64(struct timespec64 *delta);
 
 /*
- * PPS accessor
- */
-extern void ktime_get_raw_and_real_ts64(struct timespec64 *ts_raw,
-   struct timespec64 *ts_real);
-
-/*
  * struct system_time_snapshot - simultaneous raw/real time capture with
  * counter value
  * @cycles:Clocksource counter value to produce the system times
-- 
2.5.5





Re: [kernel-hardening] [PATCH v4] scripts: add leaking_addresses.pl

2017-11-12 Thread kaiwan . billimoria
On Mon, 2017-11-13 at 09:21 +1100, Tobin C. Harding wrote:
> On Fri, Nov 10, 2017 at 07:26:34PM +0530, kaiwan.billimo...@gmail.com
>  wrote:
> > On Tue, 2017-11-07 at 21:32 +1100, Tobin C. Harding wrote:
> > > Currently we are leaking addresses from the kernel to user space.
> > > This
> > > script is an attempt to find some of those leakages. Script
> > > parses
> > > `dmesg` output and /proc and /sys files for hex strings that look
> > > like
> > > kernel addresses.
> > > 
> > > Only works for 64 bit kernels, the reason being that kernel
> > > addresses
> > > on 64 bit kernels have '' as the leading bit pattern making
> > > greping
> > > possible. On 32 kernels we don't have this luxury.
> > 
> > Tobin C. Harding  wrote:
> > > Only works for 64 bit kernels, the reason being that kernel
> > > addresses
> > > on 64 bit kernels have '' as the leading bit pattern making
> > > greping
> > > possible. On 32 kernels we don't have this luxury.
> > 
> > [RFC] leaking_addresses.pl - enhance it to work for 32-bit kernels
> > as well
> > 
> > (Firstly, apologies if I've got the protocol horribly wrong- should
> > this
> > be a new thread altogether?)
> 
> I think this patch will need to wait until the patch set that is
> currently in flight is either merged or dropped.
> 
Thanks for looking at it!
Okay; blocking on merge || drop...  :-)

> > 
> We can work this out pragmatically, Perl can give us an architecture
> string then a few regexs can ascertain which architecture we are
> running
> on. This is in the inflight patch set. 
> 
> > The patch below does Not take into account (yet) stuff like:
> >  - exactly which files & dirs should be skipped on 32-bit (will it
> > be
> > identical to 64-bit?; unsure..)
> 
> As per discussion later in this thread we may need to consider
> architecture specific lists for files/directories to skip. 
Right
> 
> >  - it currently hard-codes a global 'PAGE_OFFSET_32BIT=0xc000'
> > , just
> >  so I can test quickly; must figure whether to query it or pass it;
> >  Suggestions?
> 
> Perhaps we should have a command line option for this.
> 
>   --kernel-base-address

Why not just detect it programmatically? We could devise a series of
fallbacks; something like:
- if .config exists in the kernel source tree root, grep it for
PAGE_OFFSET
- if not, grep the arch-specific (arch//configs/)
for the same
- if for some reason we don't have enough info regarding specific
platform and thus the defconfig filename (could happen for ARM, PPC?),
we then fail and request the user to pass it as a parameter.
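
A rough sketch of that fallback chain, written in Python purely for
illustration (the script itself is Perl). The function names, the
CONFIG_PAGE_OFFSET option name and the argument layout are all assumptions
made for this sketch, not part of any posted patch:

```python
import os
import re

# Assumed form of the config line, e.g. "CONFIG_PAGE_OFFSET=0xC0000000".
_PAGE_OFFSET_RE = re.compile(
    r"^CONFIG_PAGE_OFFSET=(0x[0-9a-fA-F]+)\s*$", re.MULTILINE
)


def _grep_page_offset(path):
    """Return the page offset found in a config file, or None."""
    try:
        with open(path) as f:
            match = _PAGE_OFFSET_RE.search(f.read())
    except OSError:
        return None  # a missing or unreadable file just means "try the next one"
    return int(match.group(1), 16) if match else None


def detect_page_offset(kernel_root, defconfig=None, user_value=None):
    """Try .config, then an arch defconfig, then a user-supplied value."""
    candidates = [os.path.join(kernel_root, ".config")]
    if defconfig is not None:
        candidates.append(defconfig)
    for path in candidates:
        value = _grep_page_offset(path)
        if value is not None:
            return value
    if user_value is not None:
        return int(user_value, 16)
    # Nothing worked: fail and ask the user to pass the value explicitly.
    raise RuntimeError("PAGE_OFFSET not found; please pass it as a parameter")
```

The order mirrors the list above; one could just as well let an explicit
user-supplied value take priority over the config files.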

> >  - the 'false positives'; again, what differs for 32-bit?
> >(BTW, shouldn't the dmesg 'root=UUID=<...>' line be a false
> > positive
> > & skipped?).
> 
> We could probably do with architecture specific false
> positives. Inflight patch set refactors false_positive() so adding to
> this should be easy.
Sure.
> 
> > Also, I must point out that I'm a complete newbie to Perl :-) so,
> > pl excuse
> > my highly inadequate perl-foo; I rely on you perl gurus out there
> > to fix
> > and optimize :)
> 
> I'm no Perl guru but following are a few tips I have picked up over
> the
> last month.
Thanks, will fix the issues you point out..
> 
> > 
> Conceptually your ideas look good to me. If there is some reason this
> approach won't work hopefully someone else will jump in and say so.
> 
> Nice work, thanks for putting in effort to get 32 bit machines
> supported. Let's see what happens with the inflight patch set then
> work
> on getting these ideas in.
> 
Thanks! yes..
> thanks,
> Tobin.
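
For illustration only, one way the 32-bit check could look once a page
offset is known: treat any 8-digit hex string at or above that offset as a
possible kernel address. This is a Python sketch, not the Perl from the
patch; the regex, the function name and the example offset 0xC0000000 are
assumptions, and the false-positive handling discussed above is left out.

```python
import re

# Assumed example value; the real offset depends on the kernel configuration.
PAGE_OFFSET_32BIT = 0xC0000000

# Exactly eight hex digits, optionally prefixed with "0x", as a whole word.
_HEX32 = re.compile(r"\b(?:0x)?([0-9a-fA-F]{8})\b")


def may_leak_32bit(line, page_offset=PAGE_OFFSET_32BIT):
    """Return the 8-digit hex strings in line whose value is >= page_offset."""
    return [
        m.group(1)
        for m in _HEX32.finditer(line)
        if int(m.group(1), 16) >= page_offset
    ]


print(may_leak_32bit("ptr=c1234567 user=08048000"))
```

Longer hex runs (for example 16-digit values) are deliberately not matched,
since the word-boundary anchors require exactly eight digits.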


linux-next: build warning after merge of the akpm-current tree

2017-11-12 Thread Stephen Rothwell
Hi Andrew,

After merging the akpm-current tree, today's linux-next build (powerpc
ppc64_defconfig) produced this warning:

In file included from include/linux/mmzone.h:17:0,
 from include/linux/mempolicy.h:10,
 from mm/mempolicy.c:70:
mm/mempolicy.c: In function 'mpol_to_str':
include/linux/nodemask.h:107:41: warning: the address of 'nodes' will always 
evaluate as 'true' [-Waddress]
 #define nodemask_pr_args(maskp) (maskp) ? MAX_NUMNODES : 0, (maskp) ? 
(maskp)->bits : NULL
 ^
mm/mempolicy.c:2817:11: note: in expansion of macro 'nodemask_pr_args'
   nodemask_pr_args(&nodes));
   ^
include/linux/nodemask.h:107:69: warning: the address of 'nodes' will always 
evaluate as 'true' [-Waddress]
 #define nodemask_pr_args(maskp) (maskp) ? MAX_NUMNODES : 0, (maskp) ? 
(maskp)->bits : NULL
 ^
mm/mempolicy.c:2817:11: note: in expansion of macro 'nodemask_pr_args'
   nodemask_pr_args(&nodes));
   ^

Introduced by commit

  b2c1ed23bdc1 ("mm: simplify nodemask printing")
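
The warning appears because the macro tests its argument against NULL at the
call site, and the address of a local variable can never be NULL. One
possible way to keep the NULL tolerance without tripping -Waddress is to move
the test into small inline helpers, so that it is applied to a function
parameter instead. This is only a sketch: the helper names, the placeholder
MAX_NUMNODES value and the cut-down nodemask_t are assumptions made for the
example, not the fix that was actually applied.

```c
#include <stddef.h>

#define MAX_NUMNODES 64   /* placeholder value for this sketch */

/* Cut-down stand-in for the kernel type, just enough to compile. */
typedef struct {
    unsigned long bits[MAX_NUMNODES / (8 * sizeof(unsigned long))];
} nodemask_t;

/*
 * Hypothetical helpers: the NULL check is done on a parameter, so the
 * compiler no longer sees "&local ? a : b" where the macro is expanded.
 */
static inline unsigned int pr_numnodes(const nodemask_t *m)
{
    return m ? MAX_NUMNODES : 0;
}

static inline const unsigned long *pr_bits(const nodemask_t *m)
{
    return m ? m->bits : NULL;
}

/* Still expands to the two arguments expected by a "%*pbl" style format. */
#define nodemask_pr_args(maskp) pr_numnodes(maskp), pr_bits(maskp)
```

Callers would be unchanged: passing the address of a local mask or a pointer
that may be NULL both expand to the same pair of arguments.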

-- 
Cheers,
Stephen Rothwell


[PATCH] powerpc/perf: Add debugfs interface for imc-mode and imc-command

2017-11-12 Thread Anju T Sudhakar
The In-Memory Collection (IMC) counter PMU driver controls the ucode's
execution state. At system boot, the IMC perf driver pauses the ucode. The
ucode state is changed to "running" only when any of the nest units are
monitored or profiled using the perf tool.

Nest units support only a limited set of hardware counters, and the ucode is
always programmed in "production" ("accumulation") mode. This mode is
configured to provide key performance metric data for most of the nest units.

But the ucode also supports other modes, which can be used for debugging to
drill down into specific nest units. That is, when switched to "powerbus"
debug mode (for example), the ucode will dynamically reconfigure the nest
counters to target only "powerbus" related events in the hardware counters.
This allows the IMC nest unit to focus on powerbus related transactions in
the system in more detail. At this point, production mode events may or may
not be counted.

IMC nest counters have both in-band (ucode) and out-of-band access.
Since not all nest counter configurations are supported by the ucode,
out-of-band tools are used to characterize other nest counter configurations.

This patch provides an interface via debugfs to enable the switching of ucode
modes in the system. To switch the ucode mode, one has to first pause the
microcode (imc_cmd), and then write the target mode value to the "imc_mode"
file.
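
From user space, the two-step sequence described above might look roughly
like the following. This is a sketch under assumptions: the helper names are
invented, and the pause command and mode encodings are left as parameters
because their real values are defined by the ucode and are not given in this
posting.

```c
#include <stdio.h>

/* Write a 64-bit value, in hex, to an attribute file. Returns 0 on success. */
static int write_u64_attr(const char *path, unsigned long long val)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (!f)
        return -1;
    ok = fprintf(f, "0x%llx\n", val) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/*
 * Pause the ucode first, then select the new mode.
 * pause_cmd and mode are placeholders supplied by the caller.
 */
int imc_switch_mode(const char *cmd_path, const char *mode_path,
                    unsigned long long pause_cmd, unsigned long long mode)
{
    if (write_u64_attr(cmd_path, pause_cmd) != 0)
        return -1;      /* do not touch the mode if the pause failed */
    return write_u64_attr(mode_path, mode);
}
```

On a system with this patch, cmd_path and mode_path would be a matching pair
of files under /sys/kernel/debug/imc, such as imc_cmd_0 and imc_mode_0.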
  

Proposed Approach
===

In the proposed approach, the function (export_imc_mode_and_cmd) which creates
the debugfs interface for imc mode and command is implemented in opal-imc.c.
Thus we can use imc_get_mem_addr() to get the homer base address for each chip.

The interface to expose imc mode and command is required only if we have nest
pmu units registered. Employing the existing data structures to track whether
we have any nest units registered would require exporting data from the perf
side to opal-imc.c. Instead, an integer is introduced to hold that information
by counting successful nest unit registrations. The debugfs interface is
removed based on the integer count.

Example for the interface:

root@:/sys/kernel/debug/imc# ls
imc_cmd_0  imc_cmd_8  imc_mode_0  imc_mode_8

Signed-off-by: Anju T Sudhakar 
---
 arch/powerpc/include/asm/imc-pmu.h|  7 +++
 arch/powerpc/platforms/powernv/opal-imc.c | 74 ++-
 2 files changed, 79 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/imc-pmu.h 
b/arch/powerpc/include/asm/imc-pmu.h
index 7f74c28..317002d 100644
--- a/arch/powerpc/include/asm/imc-pmu.h
+++ b/arch/powerpc/include/asm/imc-pmu.h
@@ -40,6 +40,13 @@
 #define THREAD_IMC_ENABLE   0x8000ULL
 
 /*
+ * For debugfs interface for imc-mode and imc-command
+ */
+#define IMC_CNTL_BLK_OFFSET        0x3FC00
+#define IMC_CNTL_BLK_CMD_OFFSET    8
+#define IMC_CNTL_BLK_MODE_OFFSET   32
+
+/*
  * Structure to hold memory address information for imc units.
  */
 struct imc_mem_info {
diff --git a/arch/powerpc/platforms/powernv/opal-imc.c 
b/arch/powerpc/platforms/powernv/opal-imc.c
index 21f6531..a88ddab 100644
--- a/arch/powerpc/platforms/powernv/opal-imc.c
+++ b/arch/powerpc/platforms/powernv/opal-imc.c
@@ -21,6 +21,70 @@
 #include 
 #include 
 #include 
+#include 
+
+static struct dentry *parent;
+
+/* Helpers to export imc command and status via debugfs */
+static int debugfs_imc_mem_get(void *data, u64 *val)
+{
+   *val = cpu_to_be64(*(u64 *)data);
+   return 0;
+}
+
+static int debugfs_imc_mem_set(void *data, u64 val)
+{
+   *(u64 *)data = cpu_to_be64(val);
+   return 0;
+}
+DEFINE_DEBUGFS_ATTRIBUTE(fops_imc_x64, debugfs_imc_mem_get, 
debugfs_imc_mem_set,
+
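
The get/set helpers above apply cpu_to_be64() in both directions, which works
because a byte swap is its own inverse: the word is kept big-endian in memory
and seen in CPU order through debugfs. The following user-space sketch shows
that storage convention, together with one plausible way the three offsets
could be combined. The combination (base + block offset + field offset) is an
assumption, since this part of the diff is cut off, and all function names
are invented.

```c
#include <stdint.h>

#define IMC_CNTL_BLK_OFFSET       0x3FC00
#define IMC_CNTL_BLK_CMD_OFFSET   8
#define IMC_CNTL_BLK_MODE_OFFSET  32

/* Assumed layout: a control block at a fixed offset from the per-chip base. */
static inline unsigned char *imc_cmd_word(unsigned char *base)
{
    return base + IMC_CNTL_BLK_OFFSET + IMC_CNTL_BLK_CMD_OFFSET;
}

static inline unsigned char *imc_mode_word(unsigned char *base)
{
    return base + IMC_CNTL_BLK_OFFSET + IMC_CNTL_BLK_MODE_OFFSET;
}

/* Store a value so that the most significant byte comes first in memory. */
static inline void imc_word_set(unsigned char *p, uint64_t val)
{
    for (int i = 0; i < 8; i++)
        p[i] = (unsigned char)(val >> (56 - 8 * i));
}

/* Load a big-endian 64-bit word back into CPU order. */
static inline uint64_t imc_word_get(const unsigned char *p)
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}
```

Working on bytes keeps the sketch independent of host endianness and
alignment; kernel code would use the cpu_to_be64() family instead.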

Re: [PATCH] powerpc/perf: Add debugfs interface for imc run-mode and run-status

2017-11-12 Thread Anju T Sudhakar


Hi,

Kindly ignore this version

Thanks,
Anju

On Monday 13 November 2017 11:06 AM, Anju T Sudhakar wrote:

The In-Memory Collection (IMC) counter PMU driver controls the ucode's
execution state. At system boot, the IMC perf driver pauses the ucode. The
ucode state is changed to "running" only when any of the nest units are
monitored or profiled using the perf tool.

Nest units support only a limited set of hardware counters, and the ucode is
always programmed in "production" ("accumulation") mode. This mode is
configured to provide key performance metric data for most of the nest units.

But the ucode also supports other modes, which can be used for debugging to
drill down into specific nest units. That is, when switched to "powerbus"
debug mode (for example), the ucode will dynamically reconfigure the nest
counters to target only "powerbus" related events in the hardware counters.
This allows the IMC nest unit to focus on powerbus related transactions in
the system in more detail. At this point, production mode events may or may
not be counted.

IMC nest counters have both in-band (ucode) and out-of-band access.
Since not all nest counter configurations are supported by the ucode,
out-of-band tools are used to characterize other nest counter configurations.

This patch provides an interface via debugfs to enable the switching of ucode
modes in the system. To switch the ucode mode, one has to first pause the
microcode (imc_cmd), and then write the target mode value to the "imc_mode"
file.

Proposed Approach
===

In the proposed approach, the function (export_imc_mode_and_cmd) which creates
the debugfs interface for imc mode and command is implemented in opal-imc.c.
Thus we can use imc_get_mem_addr() to get the homer base address for each chip.

The interface to expose imc mode and command is required only if we have nest
pmu units registered. Employing the existing data structures to track whether
we have any nest units registered would require exporting data from the perf
side to opal-imc.c. Instead, an integer is introduced to hold that information
by counting successful nest unit registrations. The debugfs interface is
removed based on the integer count.

Example for the interface:

root@:/sys/kernel/debug/imc# ls
imc_cmd_0  imc_cmd_8  imc_mode_0  imc_mode_8
 
Signed-off-by: Anju T Sudhakar 

---
  arch/powerpc/include/asm/imc-pmu.h|  7 +++
  arch/powerpc/platforms/powernv/opal-imc.c | 74 ++-
  2 files changed, 79 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/imc-pmu.h 
b/arch/powerpc/include/asm/imc-pmu.h
index 7f74c28..317002d 100644
--- a/arch/powerpc/include/asm/imc-pmu.h
+++ b/arch/powerpc/include/asm/imc-pmu.h
@@ -40,6 +40,13 @@
  #define THREAD_IMC_ENABLE   0x8000ULL

  /*
+ * For debugfs interface for imc-mode and imc-command
+ */
+#define IMC_CNTL_BLK_OFFSET        0x3FC00
+#define IMC_CNTL_BLK_CMD_OFFSET    8
+#define IMC_CNTL_BLK_MODE_OFFSET   32
+
+/*
   * Structure to hold memory address information for imc units.
   */
  struct imc_mem_info {
diff --git a/arch/powerpc/platforms/powernv/opal-imc.c 
b/arch/powerpc/platforms/powernv/opal-imc.c
index 21f6531..a88ddab 100644
--- a/arch/powerpc/platforms/powernv/opal-imc.c
+++ b/arch/powerpc/platforms/powernv/opal-imc.c
@@ -21,6 +21,70 @@
  #include 
  #include 
  #include 
+#include 
+
+static struct dentry *parent;
+
+/* Helpers to export imc command and status via debugfs */
+static int debugfs_imc_mem_get(void *data, u64 *val)
+{
+   *val = cpu_to_be64(*(u64 *)data);
+   return 0;
+}
+
+static int debugfs_imc_mem_set(void *data, u64 val)
+{
+   *(u64 *)data = cpu_to_be64(val);
+   return 0;
+}
+DEFINE_DEBUGFS_ATTRIBUTE(fops_imc_x64, debugfs_imc_mem_get, 
debugfs_imc_mem_set,
+   "0x%016llx\n");
+
+static struct dentry *debugfs_create_imc_x64(const char *name, umode_t mode,
+   struct dentry *parent, u64  *value)
+{
+   return debugfs_create_file_unsafe(name, mode, parent, value,
+                                     &fops_imc_x64);
+}
+
+/*
+ * export_imc_mode_and_cmd: Create a debugfs interface
+ * for imc_cmd and imc_mode
+ *   

Re: linux-next: manual merge of the powerpc tree with Linus' tree

2017-11-12 Thread Stephen Rothwell
Hi all,

On Mon, 30 Oct 2017 12:51:33 + Mark Brown  wrote:
>
> Hi all,
> 
> Today's linux-next merge of the powerpc tree got a conflict in:
> 
>   arch/powerpc/kvm/powerpc.c
> 
> between commit:
> 
>   ac64115a66c1 ("KVM: PPC: Fix oops when checking KVM_CAP_PPC_HTM")
> 
> from Linus' tree and commit:
> 
>   2a3d6553cbd7 ("KVM: PPC: Tie KVM_CAP_PPC_HTM to the user-visible TM 
> feature")
> 
> from the powerpc tree.
> 
> I fixed it up (see below) and can carry the fix as necessary. This
> is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your tree
> is submitted for merging.  You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any particularly
> complex conflicts.
> 
> diff --cc arch/powerpc/kvm/powerpc.c
> index ee279c7f4802,a3746b98ec11..
> --- a/arch/powerpc/kvm/powerpc.c
> +++ b/arch/powerpc/kvm/powerpc.c
> @@@ -644,7 -644,8 +644,8 @@@ int kvm_vm_ioctl_check_extension(struc
>   break;
>   #endif
>   case KVM_CAP_PPC_HTM:
> - r = cpu_has_feature(CPU_FTR_TM_COMP) && hv_enabled;
>  -r = is_kvmppc_hv_enabled(kvm) &&
> ++r = hv_enabled &&
> + (cur_cpu_spec->cpu_user_features2 & PPC_FEATURE2_HTM_COMP);
>   break;
>   default:
>   r = 0;

Just a reminder that this conflict still exists.

-- 
Cheers,
Stephen Rothwell


[PATCH] powerpc/perf: Add debugfs interface for imc run-mode and run-status

2017-11-12 Thread Anju T Sudhakar
The In-Memory Collection (IMC) counter PMU driver controls the ucode's
execution state. At system boot, the IMC perf driver pauses the ucode. The
ucode state is changed to "running" only when any of the nest units are
monitored or profiled using the perf tool.

Nest units support only a limited set of hardware counters, and the ucode is
always programmed in "production" ("accumulation") mode. This mode is
configured to provide key performance metric data for most of the nest units.

But the ucode also supports other modes, which can be used for debugging to
drill down into specific nest units. That is, when switched to "powerbus"
debug mode (for example), the ucode will dynamically reconfigure the nest
counters to target only "powerbus" related events in the hardware counters.
This allows the IMC nest unit to focus on powerbus related transactions in
the system in more detail. At this point, production mode events may or may
not be counted.

IMC nest counters have both in-band (ucode) and out-of-band access.
Since not all nest counter configurations are supported by the ucode,
out-of-band tools are used to characterize other nest counter configurations.

This patch provides an interface via debugfs to enable the switching of ucode
modes in the system. To switch the ucode mode, one has to first pause the
microcode (imc_cmd), and then write the target mode value to the "imc_mode"
file.

Proposed Approach
===

In the proposed approach, the function (export_imc_mode_and_cmd) which creates
the debugfs interface for imc mode and command is implemented in opal-imc.c.
Thus we can use imc_get_mem_addr() to get the homer base address for each chip.

The interface to expose imc mode and command is required only if we have nest
pmu units registered. Employing the existing data structures to track whether
we have any nest units registered would require exporting data from the perf
side to opal-imc.c. Instead, an integer is introduced to hold that information
by counting successful nest unit registrations. The debugfs interface is
removed based on the integer count.

Example for the interface:

root@:/sys/kernel/debug/imc# ls
imc_cmd_0  imc_cmd_8  imc_mode_0  imc_mode_8

Signed-off-by: Anju T Sudhakar 
---
 arch/powerpc/include/asm/imc-pmu.h|  7 +++
 arch/powerpc/platforms/powernv/opal-imc.c | 74 ++-
 2 files changed, 79 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/imc-pmu.h 
b/arch/powerpc/include/asm/imc-pmu.h
index 7f74c28..317002d 100644
--- a/arch/powerpc/include/asm/imc-pmu.h
+++ b/arch/powerpc/include/asm/imc-pmu.h
@@ -40,6 +40,13 @@
 #define THREAD_IMC_ENABLE   0x8000ULL
 
 /*
+ * For debugfs interface for imc-mode and imc-command
+ */
+#define IMC_CNTL_BLK_OFFSET        0x3FC00
+#define IMC_CNTL_BLK_CMD_OFFSET    8
+#define IMC_CNTL_BLK_MODE_OFFSET   32
+
+/*
  * Structure to hold memory address information for imc units.
  */
 struct imc_mem_info {
diff --git a/arch/powerpc/platforms/powernv/opal-imc.c 
b/arch/powerpc/platforms/powernv/opal-imc.c
index 21f6531..a88ddab 100644
--- a/arch/powerpc/platforms/powernv/opal-imc.c
+++ b/arch/powerpc/platforms/powernv/opal-imc.c
@@ -21,6 +21,70 @@
 #include 
 #include 
 #include 
+#include 
+
+static struct dentry *parent;
+
+/* Helpers to export imc command and status via debugfs */
+static int debugfs_imc_mem_get(void *data, u64 *val)
+{
+   *val = cpu_to_be64(*(u64 *)data);
+   return 0;
+}
+
+static int debugfs_imc_mem_set(void *data, u64 val)
+{
+   *(u64 *)data = cpu_to_be64(val);
+   return 0;
+}
+DEFINE_DEBUGFS_ATTRIBUTE(fops_imc_x64, debugfs_imc_mem_get, 
debugfs_imc_mem_set,
+

Re: linux-next: manual merge of the integrity tree with the jc-docs tree

2017-11-12 Thread Stephen Rothwell
Hi all,

On Wed, 18 Oct 2017 11:50:25 +0100 Mark Brown  wrote:
>
> Today's linux-next merge of the integrity tree got a conflict in:
> 
>   Documentation/ABI/testing/evm
> 
> between commit:
> 
>   c7f66400f504fd5 ("Documentation: fix security related doc refs")
> 
> from the jc-docs tree and commit:
> 
>   cbad39d632b7c18 ("EVM: Allow userspace to signal an RSA key has been 
> loaded")
> 
> from the integrity tree.
> 
> I fixed it up (see below) and can carry the fix as necessary. This
> is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your tree
> is submitted for merging.  You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any particularly
> complex conflicts.
> 
> diff --cc Documentation/ABI/testing/evm
> index ca622c9aa24c,a0bbccb00736..
> --- a/Documentation/ABI/testing/evm
> +++ b/Documentation/ABI/testing/evm
> @@@ -7,17 -7,36 +7,36 @@@ Description
>   HMAC-sha1 value across the extended attributes, storing the
>   value as the extended attribute 'security.evm'.
>   
> - EVM depends on the Kernel Key Retention System to provide it
> - with a trusted/encrypted key for the HMAC-sha1 operation.
> - The key is loaded onto the root's keyring using keyctl.  Until
> - EVM receives notification that the key has been successfully
> - loaded onto the keyring (echo 1 > /evm), EVM
> - can not create or validate the 'security.evm' xattr, but
> - returns INTEGRITY_UNKNOWN.  Loading the key and signaling EVM
> - should be done as early as possible.  Normally this is done
> - in the initramfs, which has already been measured as part
> - of the trusted boot.  For more information on creating and
> - loading existing trusted/encrypted keys, refer to:
> - Documentation/security/keys/trusted-encrypted.rst.  (A sample
> - dracut patch, which loads the trusted/encrypted key and enables
> - EVM, is available from http://linux-ima.sourceforge.net/#EVM.)
> + EVM supports two classes of security.evm. The first is
> + an HMAC-sha1 generated locally with a
> + trusted/encrypted key stored in the Kernel Key
> + Retention System. The second is a digital signature
> + generated either locally or remotely using an
> + asymmetric key. These keys are loaded onto root's
> + keyring using keyctl, and EVM is then enabled by
> + echoing a value to /evm:
> + 
> + 1: enable HMAC validation and creation
> + 2: enable digital signature validation
> + 3: enable HMAC and digital signature validation and HMAC
> +creation
> + 
> + Further writes will be blocked if HMAC support is enabled or
> + if bit 32 is set:
> + 
> + echo 0x80000002 >/evm
> + 
> + will enable digital signature validation and block
> + further writes to /evm.
> + 
> + Until this is done, EVM can not create or validate the
> + 'security.evm' xattr, but returns INTEGRITY_UNKNOWN.
> + Loading keys and signaling EVM should be done as early
> + as possible.  Normally this is done in the initramfs,
> + which has already been measured as part of the trusted
> + boot.  For more information on creating and loading
> + existing trusted/encrypted keys, refer to:
>  -Documentation/keys-trusted-encrypted.txt. Both dracut
> ++Documentation/security/keys/trusted-encrypted.rst. Both dracut
> + (via 97masterkey and 98integrity) and systemd (via
> + core/ima-setup) have support for loading keys at boot
> + time.
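
The value written to the evm file, as described in the merged text above,
behaves like a small bit mask. A sketch of how such a value could be decoded
follows; the flag and function names are illustrative, and reading "bit 32"
as the mask 0x80000000 is an assumption based on that description.

```c
#include <stdint.h>

/* Illustrative names; the low values follow the 1/2/3 list in the text. */
#define EVM_HMAC_FLAG  0x1u         /* HMAC validation and creation */
#define EVM_SIG_FLAG   0x2u         /* digital signature validation */
#define EVM_LOCK_FLAG  0x80000000u  /* assumed meaning of "bit 32" */

struct evm_request {
    int hmac;                  /* HMAC support requested */
    int signature;             /* signature validation requested */
    int blocks_further_writes; /* later writes to the file are refused */
};

struct evm_request evm_decode(uint32_t value)
{
    struct evm_request r;

    r.hmac = (value & EVM_HMAC_FLAG) != 0;
    r.signature = (value & EVM_SIG_FLAG) != 0;
    /* Writes are blocked once HMAC is enabled or the top bit is set. */
    r.blocks_further_writes = r.hmac || (value & EVM_LOCK_FLAG) != 0;
    return r;
}
```

With this reading, writing 2 enables signature validation and leaves the
file writable, while 0x80000002 enables it and blocks further writes.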

Just a reminder that this conflict still exists (and is now relevant to
the security tree).

-- 
Cheers,
Stephen Rothwell


Re: [kernel-hardening] Re: [PATCH v4] scripts: add leaking_addresses.pl

2017-11-12 Thread Kaiwan N Billimoria
On Mon, Nov 13, 2017 at 10:05 AM, Tobin C. Harding  wrote:
> On Mon, Nov 13, 2017 at 06:37:28AM +0300, Kirill A. Shutemov wrote:
>> On Mon, Nov 13, 2017 at 10:06:46AM +1100, Tobin C. Harding wrote:
>> > On Sun, Nov 12, 2017 at 02:10:07AM +0300, Kirill A. Shutemov wrote:
...
>> >
>> > Thanks for the link. So it looks like we need to refactor the kernel
>> > address regular expression into a function that takes into account the
>> > machine architecture and the number of page table levels. We will need
>> > to add this to the false positive checks also.
>> >
>> > > Not sure if we care. It also won't work for other 64-bit architectures
>> > > that have more than 256TB of virtual address space.
>> >
>> > Is this because of the virtual memory map?
>>
>> On x86 direct mapping is the nearest thing we have to userspace.
>>
>> > Did you mean 512TB?
>>
>> No, I mean 256TB.
>>
>> You have all kernel memory in the range from 0x to
>> 0x if you have 256 TB of virtual address space. If you
>> have more, something might be outside the range.
>
> Doesn't 4-level paging already limit a system to 64TB of memory? So any
> system better equipped than this will use 5-level paging right? If I am
> totally talking rubbish please ignore, I'm appreciative that you pointed
> out the limitation already. Perhaps we can add a comment to the script
>
> # Script may miss some addresses on machines with more than 256TB of
> # memory.

I think the 256TB refers to *virtual* address space, not physical RAM.

Also, IMHO, the script should 'transparently' take the number of paging
levels into account (instead of the user needing to pass a parameter).
IOW it should be able to detect this (say, from the .config file) and act
accordingly, in the sense that the regexes and associated logic would differ.
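
To make the detection idea concrete, here is a rough sketch (in Python rather
than the script's Perl) of deriving the paging level from .config text and
computing where the kernel half of the address space starts. It assumes
x86-64, where 4-level paging gives 48-bit virtual addresses (256TB) and
5-level paging gives 57-bit ones, and it assumes CONFIG_X86_5LEVEL=y is the
only signal that matters. All function names are hypothetical.

```python
import re


def paging_levels(config_text):
    """Return 5 if the .config text enables CONFIG_X86_5LEVEL, else 4."""
    if re.search(r"^CONFIG_X86_5LEVEL=y$", config_text, re.MULTILINE):
        return 5
    return 4


def virtual_address_bits(levels):
    # x86-64 assumption: 48 bits with 4 levels, 57 bits with 5 levels.
    return {4: 48, 5: 57}[levels]


def kernel_space_start(levels):
    """Lowest address of the upper (kernel) canonical half."""
    bits = virtual_address_bits(levels)
    return (1 << 64) - (1 << (bits - 1))


def virtual_space_tib(levels):
    """Total virtual address space in TiB for the given paging depth."""
    return (1 << virtual_address_bits(levels)) >> 40
```

A regex for kernel addresses could then be built from kernel_space_start()
instead of being hard-coded for the 4-level case.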


Re: linux-next: manual merge of the tip tree with the FIXME tree

2017-11-12 Thread Stephen Rothwell
Hi Mark,

On Wed, 11 Oct 2017 17:10:35 +0100 Mark Brown  wrote:
>
> Today's linux-next merge of the tip tree got a conflict in:
> 
>   arch/s390/include/asm/spinlock.h
> 
> between a series of commits adding wait queuing to s390 spinlocks
> from the s390 tree:
> 
> eb3b7b848fb3dd00f7a57d633 s390/rwlock: introduce rwlock wait queueing
> b96f7d881ad94203e997cd2aa s390/spinlock: introduce spinlock wait queueing
> 8153380379ecc8381f6d55f64 s390/spinlock: use the cpu number +1 as spinlock 
> value
> 
> and Will's series of commits removing dummy implementations of spinlock
> related things from the tip tree:
> 
> a4c1887d4c1462b0ec5a8989f locking/arch: Remove dummy 
> arch_{read,spin,write}_lock_flags() implementations
> 0160fb177d484367e041ac251 locking/arch: Remove dummy 
> arch_{read,spin,write}_relax() implementations
> a8a217c22116eff6c120d753c locking/core: Remove {read,spin,write}_can_lock()
> 
> I don't feel confident I can resolve this conflict sensibly without
> taking too long so I've used the tip tree from yesterday.

Just a reminder that this conflict still exists.

-- 
Cheers,
Stephen Rothwell


RE: [PATCH] mm/hugetlb: Implement ASLR and topdown for hugetlb mappings

2017-11-12 Thread Zhang, Shile (NSB - CN/Hangzhou)
Hi, Russell,

Have you had any time to check this patch?
I ran into this gap in my work: the application cannot mmap a big
hugepage region (about 360MB) because no contiguous virtual memory is left
in the default "TASK_UNMMAPPED_AREA" with the legacy bottom-up layout.
We need this patch to fix this issue.

Could you please help check this patch?

Thanks!

BR, Shile

-Original Message-
From: Shile Zhang [mailto:shile.zh...@nokia-sbell.com] 
Sent: Friday, November 03, 2017 5:19 PM
To: Russell King 
Cc: linux-kernel@vger.kernel.org; Zhang, Shile (NSB - CN/Hangzhou) 

Subject: [PATCH] mm/hugetlb: Implement ASLR and topdown for hugetlb mappings

merge from arch/x86

Signed-off-by: Shile Zhang 
---
 arch/arm/include/asm/page.h |  1 +
 arch/arm/mm/hugetlbpage.c   | 85 +
 2 files changed, 86 insertions(+)

diff --git a/arch/arm/include/asm/page.h b/arch/arm/include/asm/page.h
index 4355f0e..994630f 100644
--- a/arch/arm/include/asm/page.h
+++ b/arch/arm/include/asm/page.h
@@ -144,6 +144,7 @@ extern void copy_page(void *to, const void *from);
 
 #ifdef CONFIG_KUSER_HELPERS
 #define __HAVE_ARCH_GATE_AREA 1
+#define HAVE_ARCH_HUGETLB_UNMAPPED_AREA
 #endif
 
 #ifdef CONFIG_ARM_LPAE
diff --git a/arch/arm/mm/hugetlbpage.c b/arch/arm/mm/hugetlbpage.c
index fcafb52..46ed0c8 100644
--- a/arch/arm/mm/hugetlbpage.c
+++ b/arch/arm/mm/hugetlbpage.c
@@ -45,3 +45,88 @@ int pmd_huge(pmd_t pmd)
 {
return pmd_val(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT);
 }
+
+#ifdef CONFIG_HUGETLB_PAGE
+static unsigned long hugetlb_get_unmapped_area_bottomup(struct file *file,
+   unsigned long addr, unsigned long len,
+   unsigned long pgoff, unsigned long flags)
+{
+   struct hstate *h = hstate_file(file);
+   struct vm_unmapped_area_info info;
+
+   info.flags = 0;
+   info.length = len;
+   info.low_limit = current->mm->mmap_legacy_base;
+   info.high_limit = TASK_SIZE;
+   info.align_mask = PAGE_MASK & ~huge_page_mask(h);
+   info.align_offset = 0;
+   return vm_unmapped_area(&info);
+}
+
+static unsigned long hugetlb_get_unmapped_area_topdown(struct file *file,
+   unsigned long addr0, unsigned long len,
+   unsigned long pgoff, unsigned long flags)
+{
+   struct hstate *h = hstate_file(file);
+   struct vm_unmapped_area_info info;
+   unsigned long addr;
+
+   info.flags = VM_UNMAPPED_AREA_TOPDOWN;
+   info.length = len;
+   info.low_limit = PAGE_SIZE;
+   info.high_limit = current->mm->mmap_base;
+   info.align_mask = PAGE_MASK & ~huge_page_mask(h);
+   info.align_offset = 0;
+   addr = vm_unmapped_area(&info);
+
+   /*
+* A failed mmap() very likely causes application failure,
+* so fall back to the bottom-up function here. This scenario
+* can happen with large stack limits and large mmap()
+* allocations.
+*/
+   if (addr & ~PAGE_MASK) {
+   VM_BUG_ON(addr != -ENOMEM);
+   info.flags = 0;
+   info.low_limit = TASK_UNMAPPED_BASE;
+   info.high_limit = TASK_SIZE;
+   addr = vm_unmapped_area(&info);
+   }
+
+   return addr;
+}
+
+unsigned long
+hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
+   unsigned long len, unsigned long pgoff, unsigned long flags)
+{
+   struct hstate *h = hstate_file(file);
+   struct mm_struct *mm = current->mm;
+   struct vm_area_struct *vma;
+
+   if (len & ~huge_page_mask(h))
+   return -EINVAL;
+   if (len > TASK_SIZE)
+   return -ENOMEM;
+
+   if (flags & MAP_FIXED) {
+   if (prepare_hugepage_range(file, addr, len))
+   return -EINVAL;
+   return addr;
+   }
+
+   if (addr) {
+   addr = ALIGN(addr, huge_page_size(h));
+   vma = find_vma(mm, addr);
+   if (TASK_SIZE - len >= addr &&
+   (!vma || addr + len <= vma->vm_start))
+   return addr;
+   }
+   if (mm->get_unmapped_area == arch_get_unmapped_area)
+   return hugetlb_get_unmapped_area_bottomup(file, addr, len,
+   pgoff, flags);
+   else
+   return hugetlb_get_unmapped_area_topdown(file, addr, len,
+   pgoff, flags);
+}
+#endif /* CONFIG_HUGETLB_PAGE */
-- 
2.6.2
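
For readers unfamiliar with the mask arithmetic in the patch, the sketch
below reproduces the align_mask expression and the length check in plain
Python. It is an illustration only: it assumes 4 KiB base pages, a 32-bit
unsigned long, and (in the usage note) a 2 MiB huge page size. The helper
names align_mask and length_is_valid are not from the patch.

```python
MASK32 = 0xFFFFFFFF                      # model a 32-bit unsigned long (assumption)
PAGE_SIZE = 4096                         # assumed 4 KiB base pages
PAGE_MASK = ~(PAGE_SIZE - 1) & MASK32    # 0xFFFFF000


def huge_page_mask(huge_page_size):
    """Mask that clears the offset bits inside one huge page."""
    return ~(huge_page_size - 1) & MASK32


def align_mask(huge_page_size):
    # Same expression as info.align_mask in the patch:
    # the bits between the base page size and the huge page size.
    return PAGE_MASK & ~huge_page_mask(huge_page_size) & MASK32


def length_is_valid(length, huge_page_size):
    # Mirrors "if (len & ~huge_page_mask(h)) return -EINVAL;"
    return (length & ~huge_page_mask(huge_page_size) & MASK32) == 0
```

With a 2 MiB huge page, align_mask(2 << 20) evaluates to 0x1FF000, so any
address returned by the search must have those bits clear, i.e. be huge-page
aligned.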



Re: linux-next: manual merge of the tip tree with the s390 tree

2017-11-12 Thread Stephen Rothwell
Hi all,

On Wed, 11 Oct 2017 16:51:45 +0100 Mark Brown  wrote:
>
> Today's linux-next merge of the tip tree got a conflict in:
> 
>   arch/s390/include/asm/rwsem.h
> 
> between commit:
> 
>91a1fad759ffd ("s390: use generic rwsem implementation")
> 
> from the s390 tree and commit:
> 
>a61ba2c8a48f1 ("locking/arch, s390: Add __down_read_killable()")
> 
> from the tip tree.
> 
> I fixed it up by re-deleting the file and can carry the fix as
> necessary. This is now fixed as far as linux-next is concerned, but any
> non trivial conflicts should be mentioned to your upstream maintainer
> when your tree is submitted for merging.  You may also want to consider
> cooperating with the maintainer of the conflicting tree to minimise any
> particularly complex conflicts.

Just a reminder that this conflict still exists.

-- 
Cheers,
Stephen Rothwell


Re: linux-next: manual merge of the drivers-x86 tree with the net-next tree

2017-11-12 Thread Stephen Rothwell
Hi all,

On Mon, 9 Oct 2017 18:56:33 +0100 Mark Brown  wrote:
>
> Today's linux-next merge of the drivers-x86 tree got a conflict in:
> 
>   Documentation/admin-guide/thunderbolt.rst
> 
> between commit:
> 
>e69b6c02b4c3b ("net: Add support for networking over Thunderbolt cable")
> 
> from the net-next tree and commit:
> 
>ce6a90027c10f ("platform/x86: Add driver to force WMI Thunderbolt 
> controller power status")
> 
> from the drivers-x86 tree.
> 
> I fixed it up (see below) and can carry the fix as necessary. This
> is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your tree
> is submitted for merging.  You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any particularly
> complex conflicts.
> 
> diff --cc Documentation/admin-guide/thunderbolt.rst
> index 5c62d11d77e8,dadcd66ee12f..
> --- a/Documentation/admin-guide/thunderbolt.rst
> +++ b/Documentation/admin-guide/thunderbolt.rst
> @@@ -198,26 -198,17 +198,41 @@@ information is missing
>   To recover from this mode, one needs to flash a valid NVM image to the
>   host host controller in the same way it is done in the previous chapter.
>   
>  +Networking over Thunderbolt cable
>  +---------------------------------
>  +Thunderbolt technology allows software communication across two hosts
>  +connected by a Thunderbolt cable.
>  +
>  +It is possible to tunnel any kind of traffic over Thunderbolt link but
>  +currently we only support Apple ThunderboltIP protocol.
>  +
>  +If the other host is running Windows or macOS only thing you need to
>  +do is to connect Thunderbolt cable between the two hosts, the
>  +``thunderbolt-net`` is loaded automatically. If the other host is also
>  +Linux you should load ``thunderbolt-net`` manually on one host (it does
>  +not matter which one)::
>  +
>  +  # modprobe thunderbolt-net
>  +
>  +This triggers module load on the other host automatically. If the driver
>  +is built-in to the kernel image, there is no need to do anything.
>  +
>  +The driver will create one virtual ethernet interface per Thunderbolt
>  +port which are named like ``thunderbolt0`` and so on. From this point
>  +you can either use standard userspace tools like ``ifconfig`` to
>  +configure the interface or let your GUI to handle it automatically.
> ++
> + Forcing power
> + -------------
> + Many OEMs include a method that can be used to force the power of a
> + thunderbolt controller to an "On" state even if nothing is connected.
> + If supported by your machine this will be exposed by the WMI bus with
> + a sysfs attribute called "force_power".
> + 
> + For example the intel-wmi-thunderbolt driver exposes this attribute in:
> +   
> /sys/devices/platform/PNP0C14:00/wmi_bus/wmi_bus-PNP0C14:00/86CCFD48-205E-4A77-9C48-2021CBEDE341/force_power
> + 
> +   To force the power to on, write 1 to this attribute file.
> +   To disable force power, write 0 to this attribute file.
> + 
> + Note: it's currently not possible to query the force power state of a 
> platform.

Just a reminder that this conflict still exists.

-- 
Cheers,
Stephen Rothwell
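
As an aside on the "Forcing power" text quoted above: writing 1 or 0 to the
force_power attribute could be scripted roughly as below. This is a minimal
sketch, assuming the caller passes the machine-specific sysfs path (such as
the one shown in the documentation) and has permission to write to it. The
function name set_force_power is hypothetical.

```python
def set_force_power(attr_path, on):
    """Write "1" to force the controller on, "0" to stop forcing it."""
    # attr_path is the force_power sysfs attribute; it differs per machine.
    with open(attr_path, "w") as attr:
        attr.write("1" if on else "0")
```

Since the documentation notes that the state cannot be queried, the sketch
only writes and never reads the attribute back.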


Re: [PATCH 2/3] X86/kdump: crashkernel=X try to reserve below 896M first then below 4G and MAXMEM

2017-11-12 Thread Dave Young
On 10/24/17 at 01:31pm, Dave Young wrote:
> Now crashkernel=X will fail if there's not enough memory at low region
> (below 896M) when trying to reserve large memory size.  One can use
> crashkernel=xM,high to reserve it at high region (>4G) but it is more
> convenient to improve crashkernel=X to:
> 
>  - First try to reserve X below 896M (for being compatible with old
>kexec-tools).
>  - If fails, try to reserve X below 4G (swiotlb need to stay below 4G).
>  - If fails, try to reserve X from MAXMEM top down.
> 
> It's more transparent and user-friendly.
> 
> If crashkernel is large and the reserved is beyond 896M, old kexec-tools
> is not compatible with new kernel because old kexec-tools can not load
> kernel at high memory region, there was an old discussion below:
> https://lkml.org/lkml/2013/10/15/601
> 
> In my tests, however, the behavior is consistent. If an old kernel
> fails to reserve memory in the low region, kdump does not work because
> no memory is reserved. With this patch, if a new kernel successfully
> reserves memory in a high region, old kexec-tools (tested 2.0.2) still
> fails to load the kdump kernel, so this is acceptable and there is no
> need to worry about compatibility.
> 
> Here is the test result (kexec-tools 2.0.2, no high memory load
> support):
> Crashkernel over 4G:
> # cat /proc/iomem|grep Crash
>   be00-cdff : Crash kernel
>   21300-21eff : Crash kernel
> # ./kexec  -p /boot/vmlinuz-`uname -r`
> Memory for crashkernel is not reserved
> Please reserve memory by passing "crashkernel=X@Y" parameter to the kernel
> Then try loading kdump kernel
> 
> crashkernel: 896M-4G:
> # cat /proc/iomem|grep Crash
>   9600-cdef : Crash kernel
> # ./kexec -p /boot/vmlinuz-4.14.0-rc4+
> ELF core (kcore) parse failed
> Cannot load /boot/vmlinuz-4.14.0-rc4+
> 
> Signed-off-by: Dave Young 
> ---
>  arch/x86/kernel/setup.c |   16 
>  1 file changed, 16 insertions(+)
> 
> --- linux-x86.orig/arch/x86/kernel/setup.c
> +++ linux-x86/arch/x86/kernel/setup.c
> @@ -568,6 +568,22 @@ static void __init reserve_crashkernel(v
>   high ? CRASH_ADDR_HIGH_MAX
>: CRASH_ADDR_LOW_MAX,
>   crash_size, CRASH_ALIGN);
> +#ifdef CONFIG_X86_64
> + /*
> +  * crashkernel=X reserve below 896M fails? Try below 4G
> +  */
> + if (!high && !crash_base)
> + crash_base = memblock_find_in_range(CRASH_ALIGN,
> + (1ULL << 32),
> + crash_size, CRASH_ALIGN);
> + /*
> +  * crashkernel=X reserve below 4G fails? Try MAXMEM
> +  */
> + if (!high && !crash_base)
> + crash_base = memblock_find_in_range(CRASH_ALIGN,
> + CRASH_ADDR_HIGH_MAX,
> + crash_size, CRASH_ALIGN);
> +#endif
>   if (!crash_base) {
>   pr_info("crashkernel reservation failed - No suitable 
> area found.\n");
>   return;
> 
> 

Andrew, this patch is good to have. Could you take it into your tree?
The other two patches may need more discussion, so I will drop them for now.

Thanks
Dave
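
The fallback order described in the patch (below 896M, then below 4G, then
up to MAXMEM) can be modelled in a few lines. The sketch below is a toy
simulation, not the kernel code: find_in_range is a simplified stand-in for
memblock_find_in_range() that picks the highest aligned fit, and the 16 MiB
alignment and 64 TiB upper limit are assumptions chosen for illustration.

```python
MiB = 1 << 20
GiB = 1 << 30


def find_in_range(free, start, end, size, align):
    """Toy stand-in for memblock_find_in_range(): highest aligned fit, or 0."""
    best = 0
    for lo, hi in free:
        lo, hi = max(lo, start), min(hi, end)   # clip region to [start, end)
        if hi - lo < size:
            continue
        base = (hi - size) & ~(align - 1)       # round down to alignment
        if base >= lo:
            best = max(best, base)
    return best


def reserve_crashkernel(free, size, align=16 * MiB, high_max=64 << 40):
    """Try below 896M, then below 4G, then up to high_max; 0 means failure."""
    for limit in (896 * MiB, 4 * GiB, high_max):
        base = find_in_range(free, align, limit, size, align)
        if base:
            return base
    return 0
```

For example, with free memory at [0, 512M) and [4G, 8G), a 256M request is
satisfied below 896M, while a 1G request falls through both low limits and
lands at 7G.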

