date:20190510

Re: Linux v5.1.1

2019-05-10 Thread Greg Kroah-Hartman

On Fri, May 10, 2019 at 09:47:14PM +0200, Sedat Dilek wrote:
> Hi Greg,
> 
> I have seen that all other Linux-stable Git branches got a new release.
> 
> What happened to Linux-stable-5.1.y and v5.1.1 release?

Dinner happened before I could get to them all :)

> Is there a show-stopper?

Nope, nothing was "supposed" to be released until today, according to
the -rc announcement, so there's no real issue.

Dealing with 5 stable trees at once is not trivial, please give us a
chance...

greg k-h

Re: [PATCH 5.1 00/30] 5.1.1-stable review

2019-05-10 Thread Greg Kroah-Hartman

On Fri, May 10, 2019 at 10:53:34PM +0530, Vandana BN wrote:
> 
> On 10/05/19 12:12 AM, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 5.1.1 release.
> > There are 30 patches in this series, all will be posted as a response
> > to this one.  If anyone has any issues with these being applied, please
> > let me know.
> >
> > Responses should be made by Sat 11 May 2019 06:11:35 PM UTC.
> > Anything received after that time might be too late.
> >
> > The whole patch series can be found in one patch at:
> > 
> > https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.1.1-rc1.gz
> > or in the git tree and branch at:
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> > linux-5.1.y
> > and the diffstat can be found below.
> >
> > thanks,
> >
> > greg k-h
> 
> Compiled, booted and no regressions on my system.

Wonderful, thanks!

Re: [PATCH 5.1 00/30] 5.1.1-stable review

2019-05-10 Thread Greg Kroah-Hartman

On Fri, May 10, 2019 at 03:14:08PM -0600, shuah wrote:
> On 5/9/19 12:42 PM, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 5.1.1 release.
> > There are 30 patches in this series, all will be posted as a response
> > to this one.  If anyone has any issues with these being applied, please
> > let me know.
> > 
> > Responses should be made by Sat 11 May 2019 06:11:35 PM UTC.
> > Anything received after that time might be too late.
> > 
> > The whole patch series can be found in one patch at:
> > 
> > https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.1.1-rc1.gz
> > or in the git tree and branch at:
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> > linux-5.1.y
> > and the diffstat can be found below.
> > 
> 
> Compiled and booted on my test system. No dmesg regressions.

Thanks for testing all of these and letting me know.

greg k-h

Re: [PATCH 5.1 00/30] 5.1.1-stable review

2019-05-10 Thread Greg Kroah-Hartman

On Fri, May 10, 2019 at 11:27:43AM -0500, Dan Rue wrote:
> On Thu, May 09, 2019 at 08:42:32PM +0200, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 5.1.1 release.
> > There are 30 patches in this series, all will be posted as a response
> > to this one.  If anyone has any issues with these being applied, please
> > let me know.
> > 
> > Responses should be made by Sat 11 May 2019 06:11:35 PM UTC.
> > Anything received after that time might be too late.
> 
> Results from Linaro’s test farm.
> No regressions on arm64, arm, x86_64, and i386.

Great, thanks!

Re: [PATCH 5.1 00/30] 5.1.1-stable review

2019-05-10 Thread Greg Kroah-Hartman

On Fri, May 10, 2019 at 09:46:23AM -0700, Guenter Roeck wrote:
> On Thu, May 09, 2019 at 08:42:32PM +0200, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 5.1.1 release.
> > There are 30 patches in this series, all will be posted as a response
> > to this one.  If anyone has any issues with these being applied, please
> > let me know.
> > 
> > Responses should be made by Sat 11 May 2019 06:11:35 PM UTC.
> > Anything received after that time might be too late.
> > 
> 
> Build results:
>   total: 159 pass: 159 fail: 0
> Qemu test results:
>   total: 349 pass: 349 fail: 0

Wonderful, thanks for testing all of these and letting me know.

greg k-h

Re: [PATCH] mm: Remove duplicate headers

2019-05-10 Thread Souptick Joarder

On Sat, May 11, 2019 at 9:04 AM Sabyasachi Gupta wrote:
>
> Remove asm/sections.h and asm/fixmap.h which are included more than once
>
> Signed-off-by: Sabyasachi Gupta 

Acked-by: Souptick Joarder 

> ---
>  arch/arm/mm/mmu.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
> index fcded2c..29035f4 100644
> --- a/arch/arm/mm/mmu.c
> +++ b/arch/arm/mm/mmu.c
> @@ -23,7 +23,6 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  #include 
>  #include 
>  #include 
> @@ -36,7 +35,6 @@
>  #include 
>  #include 
>  #include 
> -#include 
>
>  #include "fault.h"
>  #include "mm.h"
> --
> 2.7.4
>

The UAPI references Kconfig's CONFIG_* macros (variables)

2019-05-10 Thread Michael Witten

The  UAPI  headers  contain  references  to  Kconfig's  `CONFIG_'
macros. Either this is a bug,  or there needs to be some standard
way for users  to provide definitions for these  macros, in order
to complete Linux's user-space API. Consider:

  #!/bin/sh
  cd "${linux_repo}"

  # Careful!
  #git reset --hard HEAD
  #git clean -fdx

  git checkout v5.1
  #zcat /proc/config.gz >.config

  mkdir -p /tmp/uapi/include
  make INSTALL_HDR_PATH=/tmp/uapi headers_install

  printf >/tmp/uapi/raw.c '%s\n%s\n' \
'#include <linux/raw.h>' \
'int main() { return MAX_RAW_MINORS; }'

Then:

  $ gcc -c -I/tmp/uapi/include /tmp/uapi/raw.c
  In file included from /tmp/uapi/raw.c:1:0:
  /tmp/uapi/raw.c: In function ‘main’:
  /tmp/uapi/include/linux/raw.h:17:24: error: ‘CONFIG_MAX_RAW_DEVS’ undeclared (first use in this function)
   #define MAX_RAW_MINORS CONFIG_MAX_RAW_DEVS
  ^
  /tmp/uapi/raw.c:2:21: note: in expansion of macro ‘MAX_RAW_MINORS’
   int main() { return MAX_RAW_MINORS; }
   ^~
  /tmp/uapi/include/linux/raw.h:17:24: note: each undeclared identifier is reported only once for each function it appears in
   #define MAX_RAW_MINORS CONFIG_MAX_RAW_DEVS
  ^
  /tmp/uapi/raw.c:2:21: note: in expansion of macro ‘MAX_RAW_MINORS’
   int main() { return MAX_RAW_MINORS; }
   ^~

As you can  see, the UAPI is actually incomplete;  there is not a
valid  definition for  `MAX_RAW_MINORS'.  Indeed,  on my  system,
`CONFIG_MAX_RAW_DEVS'  isn't   ever  defined   anywhere,  because
`CONFIG_RAW_DRIVER' is not set:

  $ git show v5.1:drivers/char/Kconfig | sed -n 467,469p
  config MAX_RAW_DEVS
  int "Maximum number of RAW devices to support (1-65536)"
  depends on RAW_DRIVER
  $ zcat /proc/config.gz | grep RAW_DRIVER
  # CONFIG_RAW_DRIVER is not set

Even if `CONFIG_RAW_DRIVER' were  set, the desired definition for
the  macro  `CONFIG_MAX_RAW_DEVS'  would  only be  found  in  the
following  header (generated  at  build-time),  which is  neither
officially nor likely available to user space:

  "${linux_repo}"/include/generated/autoconf.h

Other  such  references to  `CONFIG_*'  macros  are seen  in  the
following  (some  appear only  in  comments,  but perhaps  that's
conceptually a mistake, too):

  $ (cd /tmp/uapi/include && grep -R . -e \\bCONFIG_)
  ./asm/mman.h:#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
  ./asm/auxvec.h:#if defined(CONFIG_IA32_EMULATION) || !defined(CONFIG_X86_64)
  ./asm/e820.h: * kernel was built: MAX_NUMNODES == (1 << CONFIG_NODES_SHIFT),
  ./asm/e820.h: * unless the CONFIG_X86_PMEM_LEGACY option is set.
  ./asm/e820.h: * if CONFIG_INTEL_TXT is enabled, memory of this type will be
  ./mtd/ubi-user.h: * default kernel value of %CONFIG_MTD_UBI_BEB_LIMIT will be used.
  ./linux/pkt_cls.h:TCA_FW_INDEV, /*  used by CONFIG_NET_CLS_IND */
  ./linux/pkt_cls.h:TCA_FW_ACT, /* used by CONFIG_NET_CLS_ACT */
  ./linux/cm4000_cs.h: * member sizes. This leads to CONFIG_COMPAT breakage, since 32bit userspace
  ./linux/eventpoll.h:#ifdef CONFIG_PM_SLEEP
  ./linux/hw_breakpoint.h:#ifdef CONFIG_HAVE_MIXED_BREAKPOINTS_REGS
  ./linux/bpf.h: * has been built with CONFIG_EFFICIENT_UNALIGNED_ACCESS not set,
  ./linux/bpf.h: *  the **CONFIG_CGROUP_NET_CLASSID** configuration option set to
  ./linux/bpf.h: *  **CONFIG_IP_ROUTE_CLASSID** configuration option.
  ./linux/bpf.h: *  with the **CONFIG_BPF_KPROBE_OVERRIDE** configuration
  ./linux/bpf.h: *  the CONFIG_FUNCTION_ERROR_INJECTION option. As of this writing,
  ./linux/bpf.h: *  **CONFIG_XFRM** configuration option.
  ./linux/bpf.h: *  the **CONFIG_BPF_LIRC_MODE2** configuration option set to
  ./linux/bpf.h: *  the **CONFIG_BPF_LIRC_MODE2** configuration option set to
  ./linux/bpf.h: *  **CONFIG_SOCK_CGROUP_DATA** configuration option.
  ./linux/bpf.h: *  **CONFIG_NET** configuration option.
  ./linux/bpf.h: *  **CONFIG_NET** configuration option.
  ./linux/bpf.h: *  the **CONFIG_BPF_LIRC_MODE2** configuration option set to
  ./linux/raw.h:#define MAX_RAW_MINORS CONFIG_MAX_RAW_DEVS
  ./linux/pktcdvd.h:#if defined(CONFIG_CDROM_PKTCDVD_WCACHE)
  ./linux/flat.h:#ifdef CONFIG_BINFMT_SHARED_FLAT
  ./linux/videodev2.h: * Only implemented if CONFIG_VIDEO_ADV_DEBUG is defined.
  ./linux/elfcore.h:#ifdef CONFIG_BINFMT_ELF_FDPIC
  ./linux/atmdev.h:#ifdef CONFIG_COMPAT
  ./asm-generic/bitsperlong.h: * both 32 and 64 bit user space must not rely on CONFIG_64BIT
  ./asm-generic/unistd.h:/* mm/, CONFIG_MMU only */
  ./asm-generic/mman-common.h:#ifdef CONFIG_MMAP_ALLOW_UNINITIALIZED
  ./asm-generic/fcntl.h:#ifndef CONFIG_64BIT

What is the correct way to think about this?

  * Should the UAPI make no reference to build-time configurations?
  * Should the UAPI headers include sanity checks on behalf of

Re: [PATCH] mm: vmscan: correct nr_reclaimed for THP

2019-05-10 Thread Yafang Shao

On Sat, May 11, 2019 at 12:36 AM Matthew Wilcox  wrote:
>
> On Fri, May 10, 2019 at 10:12:40AM +0800, Huang, Ying wrote:
> > > +   nr_reclaimed += (1 << compound_order(page));
> >
> > How about changing this to
> >
> >
> > nr_reclaimed += hpage_nr_pages(page);
>
> Please don't.  That embeds the knowledge that we can only swap out either
> normal pages or THP sized pages.

Agreed.
compound_order() is more general than hpage_nr_pages().
It seems to me that hpage_nr_pages() is abused a little in lots of places.
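
For readers following along, a small user-space model of the
arithmetic being discussed. The helper name pages_for_order() is made
up for this sketch; the "order" argument mirrors what compound_order()
reports for a page, and order 9 corresponds to a 2 MiB THP on x86-64
with 4 KiB base pages:

```c
/*
 * Number of base pages covered by a compound page of the given order.
 * Works for any order, which is why it does not bake in the assumption
 * that a page is either a normal page or exactly THP sized.
 */
static unsigned long pages_for_order(unsigned int order)
{
	return 1UL << order;
}
```

So pages_for_order(0) is 1 for a normal page and pages_for_order(9) is
512 for the x86-64 THP case, with every order in between handled the
same way.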

Thanks
Yafang

RE: [EXT] Re: [PATCH v6] arm64: dts: ls1088a: add one more thermal zone node

2019-05-10 Thread Andy Tang

Thanks Viresh for your explanation.

BR,
Andy
> -Original Message-
> From: Viresh Kumar 
> Sent: May 10, 2019 18:12
> To: Andy Tang 
> Cc: Daniel Lezcano ; Shawn Guo
> ; Leo Li ; robh...@kernel.org;
> mark.rutl...@arm.com; linux-arm-ker...@lists.infradead.org;
> devicet...@vger.kernel.org; linux-kernel@vger.kernel.org;
> linux...@vger.kernel.org; rui.zh...@intel.com; edubez...@gmail.com
> Subject: Re: [EXT] Re: [PATCH v6] arm64: dts: ls1088a: add one more thermal
> zone node
> 
> Caution: EXT Email
> 
> On 10-05-19, 08:47, Andy Tang wrote:
> > + Viresh for help.
> >
> > > -Original Message-
> > > From: Daniel Lezcano 
> > > Sent: May 10, 2019 15:17
> > > To: Andy Tang ; Shawn Guo
> 
> > > Cc: Leo Li ; robh...@kernel.org;
> > > mark.rutl...@arm.com; linux-arm-ker...@lists.infradead.org;
> > > devicet...@vger.kernel.org; linux-kernel@vger.kernel.org;
> > > linux...@vger.kernel.org; rui.zh...@intel.com; edubez...@gmail.com
> > > Subject: Re: [EXT] Re: [PATCH v6] arm64: dts: ls1088a: add one more
> > > thermal zone node
> > >
> > > Caution: EXT Email
> > >
> > > On 10/05/2019 05:40, Andy Tang wrote:
> > > >> -Original Message-
> > > >> From: Shawn Guo 
> > > >> Sent: May 10, 2019 11:14
> > > >> To: Andy Tang 
> > > >> Cc: Leo Li ; robh...@kernel.org;
> > > >> mark.rutl...@arm.com; linux-arm-ker...@lists.infradead.org;
> > > >> devicet...@vger.kernel.org; linux-kernel@vger.kernel.org;
> > > >> linux...@vger.kernel.org; daniel.lezc...@linaro.org;
> > > >> rui.zh...@intel.com; edubez...@gmail.com
> > > >> Subject: [EXT] Re: [PATCH v6] arm64: dts: ls1088a: add one more
> > > >> thermal zone node
> > > >>
> > > >> Caution: EXT Email
> > > >>
> > > >> On Tue, Apr 23, 2019 at 10:25:07AM +0800, Yuantian Tang wrote:
> > > >>> Ls1088a has 2 thermal sensors, core cluster and SoC platform.
> > > >>> Core cluster sensor is used to monitor the temperature of core
> > > >>> and SoC platform is for platform. The current dts only support the
> > > >>> first sensor.
> > > >>> This patch adds the second sensor node to dts to enable it.
> > > >>>
> > > >>> Signed-off-by: Yuantian Tang 
> > > >>> ---
> > > >>> v6:
> > > >>> - add cooling device map to cpu0-7 in platform node.
> > > > I'd like to explain a little. I think it makes sense that multiple
> > > > thermal zones map to the same cooling device.
> > > > In this way, no matter which thermal zone raises a temp alarm, it
> > > > can call the cooling device to chill out.
> > > > I also asked the cpufreq maintainer about the cooling map issue, he
> > > > thinks it would be fine.
> 
> Yes, you asked me and I said it should be okay.
> 
> > > > I have tested and no issue found.
> > > >
> > > > Daniel, what's your thought?
> > >
> > > If there are multiple thermal zones, they will be managed by
> > > different instances of a thermal governor. Each instance will act
> > > on the shared cooling device and will collide in their decisions:
> > >
> > >  - If the sensors are close, their behavior will be similar
> > > regarding the temperature. The governors may take the same decision
> > > for the cooling device. But in such case having just one thermal zone
> > > managed is enough.
> > >
> > >  - If the sensors are not close, their behavior will be different
> > > regarding the temperature. The governors will take different
> > > decisions regarding the cooling device (one will decrease the freq, other
> > > will increase the freq).
> > >
> > > As the thermal governors are not able to manage several thermal
> > > zones and there is one cooling device (the cpu cooling device), this
> > > setup won't work as expected IMO.
> > >
> > > The setup making sense is having a thermal zone per 'cluster' and a
> > > cooling device per 'cluster'. That means the platform has one clock line
> > > per 'cluster'.
> > > The thermal management happens in a self-contained thermal zone (one
> > > cooling device - one governor - one thermal zone).
> > >
> > > In the case of HMP, other combinations are possible to be optimal.
> 
> But not sure how I missed the obvious, though I do remember thinking about
> this.
> 
> So the problem is that the cpu_cooling driver will get requests in parallel to
> set different max frequencies and the last call will always win and may result
> in undesired outcome.
> 
> Sorry about creating the confusion.
> 
> --
> viresh
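
The collision described above can be modelled in a few lines of
user-space C. This is only a sketch: the function names are made up,
and taking the minimum of all requests is just one conceivable
aggregation policy shown for contrast, not a description of what the
cpu_cooling driver does:

```c
#include <stddef.h>

/*
 * "Last call wins": the shared cooling device simply keeps whatever
 * maximum frequency was requested most recently.
 */
static unsigned int cap_last_wins(const unsigned int *req_khz, size_t n)
{
	unsigned int cap = 0;

	for (size_t i = 0; i < n; i++)
		cap = req_khz[i];
	return cap;
}

/*
 * Hypothetical alternative for comparison: honour the most restrictive
 * (lowest) of all outstanding requests.
 */
static unsigned int cap_most_restrictive(const unsigned int *req_khz, size_t n)
{
	unsigned int cap = n ? req_khz[0] : 0;

	for (size_t i = 1; i < n; i++)
		if (req_khz[i] < cap)
			cap = req_khz[i];
	return cap;
}
```

With one zone asking for 1200000 kHz and a second zone asking for
1600000 kHz afterwards, the first function ends up at 1600000 even
though the first zone still wants the lower cap, which is the
undesired outcome mentioned in the thread.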

[PATCH] mm: Remove duplicate headers

2019-05-10 Thread Sabyasachi Gupta

Remove asm/sections.h and asm/fixmap.h which are included more than once

Signed-off-by: Sabyasachi Gupta 
---
 arch/arm/mm/mmu.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index fcded2c..29035f4 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -23,7 +23,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -36,7 +35,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #include "fault.h"
 #include "mm.h"
-- 
2.7.4

[PATCH] .gitignore: exclude .get_maintainer.ignore and .gitattributes

2019-05-10 Thread Masahiro Yamada

Also, sort the patterns alphabetically. Update the comment since
we have non-git files here.

Signed-off-by: Masahiro Yamada 
---

 .gitignore | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/.gitignore b/.gitignore
index d263ca9..7587ef56 100644
--- a/.gitignore
+++ b/.gitignore
@@ -81,12 +81,14 @@ modules.builtin
 /tar-install/
 
 #
-# git files that we don't want to ignore even if they are dot-files
+# We don't want to ignore the following even if they are dot-files
 #
+!.clang-format
+!.cocciconfig
+!.get_maintainer.ignore
+!.gitattributes
 !.gitignore
 !.mailmap
-!.cocciconfig
-!.clang-format
 
 #
 # Generated include files
-- 
2.7.4

Re: [RFC] x86: Speculative execution warnings

2019-05-10 Thread Randy Dunlap

On 5/10/19 12:25 PM, Nadav Amit wrote:
> It may be useful to check at runtime whether certain assertions are
> violated even during speculative execution. This makes it possible to
> avoid adding unnecessary memory fences while still checking that no data
> leak channels exist.
> 
> For example, adding such checks can show that allocating zeroed pages
> can return speculatively non-zeroed pages (the first qword is not
> zero).  [This might be a problem when the page-fault handler performs
> software page-walk, for example.]
> 
> Introduce SPEC_WARN_ON(), which checks at runtime whether a certain
> condition is violated during speculative execution. The condition should
> be computed without branches, e.g., using bitwise operators. The check
> will wait for the condition to be realized (i.e., not speculated), and
> if the assertion is violated, a warning will be thrown.
> 
> Warnings can be provided in one of two modes: precise and imprecise.
> Neither mode is perfect. The precise mode does not always make it easy
> to understand which assertion was broken, but instead points to a point
> in the execution somewhere around the point in which the assertion was
> violated.  In addition, it prints a warning for each violation (unlike
> WARN_ONCE() like behavior).
> 
> The imprecise mode, on the other hand, can sometimes throw the wrong
> indication, specifically if the control flow has changed between the
> speculative execution and the actual one. Note that it is not a
> false-positive, it just means that the output would mislead the user to
> think the wrong assertion was broken.
> 
> There are some more limitations. Since the mechanism requires an
> indirect branch, it should not be used in production systems that are
> susceptible to Spectre v2. The mechanism requires TSX and performance
> counters that are only available in Skylake+. There is a hidden
> assumption that TSX is not used in the kernel for anything else, other
> than this mechanism.
> 
> The basic idea behind the implementation is to use a performance counter
> that updates also during speculative execution as an indication for
> assertion failure. By using conditional-mov, which is not predicted,
> to affect the control flow, the condition is realized before the event
> that affects the PMU is triggered.
> 
> Enable this feature by setting "spec_warn=on" or "spec_warn=precise"
> kernel parameter. I did not run performance numbers but I guess the
> overhead should not be too high.

Hi,
If this progresses, please document spec_warn={on|precise} in
Documentation/admin-guide/kernel-parameters.txt.

> I did not run too many tests, but brief experiments suggest that it does
> work. Let me know if I missed anything and whether you think this can be
> useful. To be frank, the exact use cases are not super clear, and there
> are various possible extensions (e.g., ensuring the speculation window
> is long enough by adding data dependencies). I would appreciate your
> inputs.
> 
> Cc: Andy Lutomirsky 
> Cc: Ingo Molnar 
> Cc: Peter Zijlstra 
> Cc: Thomas Gleixner 
> Cc: Jann Horn 
> Signed-off-by: Nadav Amit 
> ---
>  arch/x86/Kconfig |   4 +
>  arch/x86/include/asm/nospec-branch.h |  30 +
>  arch/x86/kernel/Makefile |   1 +
>  arch/x86/kernel/nospec.c | 185 +++
>  4 files changed, 220 insertions(+)
>  create mode 100644 arch/x86/kernel/nospec.c

thanks.
-- 
~Randy

Re: [PATCH 2/2] powerpc/perf: Fix mmcra corruption by bhrb_filter

2019-05-10 Thread Ravi Bangoria




On 5/11/19 8:12 AM, Ravi Bangoria wrote:
> Consider a scenario where user creates two events:
> 
>   1st event:
> attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
> attr.branch_sample_type = PERF_SAMPLE_BRANCH_ANY;
> fd = perf_event_open(attr, 0, 1, -1, 0);
> 
>   This sets cpuhw->bhrb_filter to 0 and returns valid fd.
> 
>   2nd event:
> attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
> attr.branch_sample_type = PERF_SAMPLE_BRANCH_CALL;
> fd = perf_event_open(attr, 0, 1, -1, 0);
> 
>   It overrides cpuhw->bhrb_filter to -1 and returns with error.
> 
> Now if power_pmu_enable() gets called by any path other than
> power_pmu_add(), ppmu->config_bhrb(-1) will set mmcra to -1.
> 
> Signed-off-by: Ravi Bangoria 

Fixes: 3925f46bb590 ("powerpc/perf: Enable branch stack sampling framework")

[RFC] x86: Speculative execution warnings

2019-05-10 Thread Nadav Amit

It may be useful to check at runtime whether certain assertions are
violated even during speculative execution. This makes it possible to
avoid adding unnecessary memory fences while still checking that no data
leak channels exist.

For example, adding such checks can show that allocating zeroed pages
can return speculatively non-zeroed pages (the first qword is not
zero).  [This might be a problem when the page-fault handler performs
software page-walk, for example.]

Introduce SPEC_WARN_ON(), which checks at runtime whether a certain
condition is violated during speculative execution. The condition should
be computed without branches, e.g., using bitwise operators. The check
will wait for the condition to be realized (i.e., not speculated), and
if the assertion is violated, a warning will be thrown.
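
As an illustration of "computed without branches", here is a small
sketch of a condition built only from bitwise operators. The helper
any_nonzero4() is a made-up name and is not part of this patch; it
loosely follows the zeroed-page example above by checking whether any
of four words is non-zero. Whether the compiler really emits
branch-free code still has to be checked in the generated assembly:

```c
/*
 * Non-zero if and only if at least one of the four words is non-zero.
 * Bitwise OR is used instead of ||, so the source contains no
 * short-circuit evaluation that would translate into a conditional
 * branch.
 */
static unsigned long any_nonzero4(const unsigned long w[4])
{
	return w[0] | w[1] | w[2] | w[3];
}
```

A value computed this way is the kind of expression that could be
handed to SPEC_WARN_ON() as its condition.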

Warnings can be provided in one of two modes: precise and imprecise.
Neither mode is perfect. The precise mode does not always make it easy
to understand which assertion was broken, but instead points to a point
in the execution somewhere around the point in which the assertion was
violated.  In addition, it prints a warning for each violation (unlike
WARN_ONCE() like behavior).

The imprecise mode, on the other hand, can sometimes throw the wrong
indication, specifically if the control flow has changed between the
speculative execution and the actual one. Note that it is not a
false-positive, it just means that the output would mislead the user to
think the wrong assertion was broken.

There are some more limitations. Since the mechanism requires an
indirect branch, it should not be used in production systems that are
susceptible to Spectre v2. The mechanism requires TSX and performance
counters that are only available in Skylake+. There is a hidden
assumption that TSX is not used in the kernel for anything else, other
than this mechanism.

The basic idea behind the implementation is to use a performance counter
that updates also during speculative execution as an indication for
assertion failure. By using conditional-mov, which is not predicted,
to affect the control flow, the condition is realized before the event
that affects the PMU is triggered.

Enable this feature by setting "spec_warn=on" or "spec_warn=precise"
kernel parameter. I did not run performance numbers but I guess the
overhead should not be too high.

I did not run too many tests, but brief experiments suggest that it does
work. Let me know if I missed anything and whether you think this can be
useful. To be frank, the exact use cases are not super clear, and there
are various possible extensions (e.g., ensuring the speculation window
is long enough by adding data dependencies). I would appreciate your
inputs.

Cc: Andy Lutomirsky 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Jann Horn 
Signed-off-by: Nadav Amit 
---
 arch/x86/Kconfig |   4 +
 arch/x86/include/asm/nospec-branch.h |  30 +
 arch/x86/kernel/Makefile |   1 +
 arch/x86/kernel/nospec.c | 185 +++
 4 files changed, 220 insertions(+)
 create mode 100644 arch/x86/kernel/nospec.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 62fc3fda1a05..2cc57c2172be 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2887,6 +2887,10 @@ config X86_DMA_REMAP
 config HAVE_GENERIC_GUP
def_bool y
 
+config DEBUG_SPECULATIVE_EXECUTION
+   bool "Debug speculative execution"
+   depends on X86_64
+
 source "drivers/firmware/Kconfig"
 
 source "arch/x86/kvm/Kconfig"
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index dad12b767ba0..3f1af6378304 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -290,6 +290,36 @@ static inline void indirect_branch_prediction_barrier(void)
 /* The Intel SPEC CTRL MSR base value cache */
 extern u64 x86_spec_ctrl_base;
 
+#ifdef CONFIG_DEBUG_SPECULATIVE_EXECUTION
+
+extern bool spec_check(unsigned long cond);
+
+DECLARE_STATIC_KEY_FALSE(spec_test_key);
+DECLARE_STATIC_KEY_FALSE(spec_test_precise_key);
+
+#define SPEC_WARN_ON(cond) \
+do {   \
+   bool _error;\
+   \
+   if (!static_branch_unlikely(&spec_test_key))\
+   break;  \
+   \
+   _error = spec_check((unsigned long)(cond)); \
+   \
+   if (static_branch_unlikely(&spec_test_precise_key)) \
+   break;  \
+   \
+

[PATCH 2/2] powerpc/perf: Fix mmcra corruption by bhrb_filter

2019-05-10 Thread Ravi Bangoria

Consider a scenario where user creates two events:

  1st event:
attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
attr.branch_sample_type = PERF_SAMPLE_BRANCH_ANY;
fd = perf_event_open(attr, 0, 1, -1, 0);

  This sets cpuhw->bhrb_filter to 0 and returns valid fd.

  2nd event:
attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
attr.branch_sample_type = PERF_SAMPLE_BRANCH_CALL;
fd = perf_event_open(attr, 0, 1, -1, 0);

  It overrides cpuhw->bhrb_filter to -1 and returns with error.

Now if power_pmu_enable() gets called by any path other than
power_pmu_add(), ppmu->config_bhrb(-1) will set mmcra to -1.

Signed-off-by: Ravi Bangoria 
---
 arch/powerpc/perf/core-book3s.c | 6 --
 arch/powerpc/perf/power8-pmu.c  | 3 +++
 arch/powerpc/perf/power9-pmu.c  | 3 +++
 3 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index b0723002a396..8eb5dc5df62b 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -1846,6 +1846,7 @@ static int power_pmu_event_init(struct perf_event *event)
int n;
int err;
struct cpu_hw_events *cpuhw;
+   u64 bhrb_filter;
 
if (!ppmu)
return -ENOENT;
@@ -1951,13 +1952,14 @@ static int power_pmu_event_init(struct perf_event *event)
err = power_check_constraints(cpuhw, events, cflags, n + 1);
 
if (has_branch_stack(event)) {
-   cpuhw->bhrb_filter = ppmu->bhrb_filter_map(
+   bhrb_filter = ppmu->bhrb_filter_map(
event->attr.branch_sample_type);
 
-   if (cpuhw->bhrb_filter == -1) {
+   if (bhrb_filter == -1) {
put_cpu_var(cpu_hw_events);
return -EOPNOTSUPP;
}
+   cpuhw->bhrb_filter = bhrb_filter;
}
 
put_cpu_var(cpu_hw_events);
diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index d12a2db26353..d10feef93b6b 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -29,6 +29,7 @@ enum {
 #define POWER8_MMCRA_IFM1   0x4000UL
 #define POWER8_MMCRA_IFM2   0x8000UL
 #define POWER8_MMCRA_IFM3   0xC000UL
+#define POWER8_MMCRA_BHRB_MASK  0xC000UL
 
 /*
  * Raw event encoding for PowerISA v2.07 (Power8):
@@ -243,6 +244,8 @@ static u64 power8_bhrb_filter_map(u64 branch_sample_type)
 
 static void power8_config_bhrb(u64 pmu_bhrb_filter)
 {
+   pmu_bhrb_filter &= POWER8_MMCRA_BHRB_MASK;
+
/* Enable BHRB filter in PMU */
mtspr(SPRN_MMCRA, (mfspr(SPRN_MMCRA) | pmu_bhrb_filter));
 }
diff --git a/arch/powerpc/perf/power9-pmu.c b/arch/powerpc/perf/power9-pmu.c
index 030544e35959..f3987915cadc 100644
--- a/arch/powerpc/perf/power9-pmu.c
+++ b/arch/powerpc/perf/power9-pmu.c
@@ -92,6 +92,7 @@ enum {
 #define POWER9_MMCRA_IFM1  0x4000UL
 #define POWER9_MMCRA_IFM2  0x8000UL
 #define POWER9_MMCRA_IFM3  0xC000UL
+#define POWER9_MMCRA_BHRB_MASK 0xC000UL
 
 /* Nasty Power9 specific hack */
 #define PVR_POWER9_CUMULUS 0x2000
@@ -300,6 +301,8 @@ static u64 power9_bhrb_filter_map(u64 branch_sample_type)
 
 static void power9_config_bhrb(u64 pmu_bhrb_filter)
 {
+   pmu_bhrb_filter &= POWER9_MMCRA_BHRB_MASK;
+
/* Enable BHRB filter in PMU */
mtspr(SPRN_MMCRA, (mfspr(SPRN_MMCRA) | pmu_bhrb_filter));
 }
-- 
2.20.1
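
The two-event scenario from the commit message can be reproduced in a
small user-space model. Everything here is a stand-in invented for the
sketch: struct cpu_state plays the role of cpuhw, filter_map() plays
the role of ppmu->bhrb_filter_map() and pretends that only sample type
1 is supported, and event_init_old()/event_init_new() mimic the shape
of the code before and after the core-book3s.c hunk:

```c
#include <stdint.h>

#define FILTER_UNSUPPORTED ((uint64_t)-1)

/* Stand-in for the per-CPU state that holds the BHRB filter. */
struct cpu_state {
	uint64_t bhrb_filter;
};

/* Hypothetical filter map: type 1 is supported, everything else is not. */
static uint64_t filter_map(unsigned int type)
{
	return type == 1 ? 0 : FILTER_UNSUPPORTED;
}

/* Pre-patch shape: the shared field is overwritten before validation. */
static int event_init_old(struct cpu_state *s, unsigned int type)
{
	s->bhrb_filter = filter_map(type);
	if (s->bhrb_filter == FILTER_UNSUPPORTED)
		return -1;
	return 0;
}

/* Post-patch shape: validate a local copy, commit only on success. */
static int event_init_new(struct cpu_state *s, unsigned int type)
{
	uint64_t filter = filter_map(type);

	if (filter == FILTER_UNSUPPORTED)
		return -1;
	s->bhrb_filter = filter;
	return 0;
}
```

In the old shape, a failed second event leaves the shared field at -1
even though the call returned an error; in the new shape the field
keeps the value set by the first, valid event.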

[PATCH 1/2] perf ioctl: Add check for the sample_period value

2019-05-10 Thread Ravi Bangoria

Add a check for the sample_period value sent from userspace. A negative
value does not make sense, and in powerpc arch code it could cause
a recursive PMI leading to a hang (reported when running perf-fuzzer).

Signed-off-by: Ravi Bangoria 
---
 kernel/events/core.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index abbd4b3b96c2..e44c90378940 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5005,6 +5005,9 @@ static int perf_event_period(struct perf_event *event, u64 __user *arg)
if (perf_event_check_period(event, value))
return -EINVAL;
 
+   if (!event->attr.freq && (value & (1ULL << 63)))
+   return -EINVAL;
+
    event_function_call(event, __perf_event_period, &value);
 
return 0;
-- 
2.20.1
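
The test added by the hunk above treats bit 63 of the u64 as a sign
bit. A stand-alone sketch of just that test, using a made-up helper
name (period_has_sign_bit) and leaving out the attr.freq part of the
condition:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if the 64-bit value would be negative when interpreted as a
 * signed number, i.e. if bit 63 is set.
 */
static bool period_has_sign_bit(uint64_t value)
{
	return (value & (1ULL << 63)) != 0;
}
```

For example, a period of (u64)-1 passed from user space has bit 63 set
and would be rejected, while any value up to 2^63 - 1 passes.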

Re: [PATCH 7/8] thermal: mediatek: add another get_temp ops for thermal sensors

2019-05-10 Thread Nicolas Boichat

On Thu, May 2, 2019 at 7:45 PM michael.kao  wrote:
>
> From: Michael Kao 
>
> Provide thermal zone to read thermal sensor
> in the SoC. We can read all the thermal sensors
> value in the SoC by the node /sys/class/thermal/
>
> Signed-off-by: Michael Kao 
> ---
>  drivers/thermal/mtk_thermal.c | 68 
> ++-
>  1 file changed, 60 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/thermal/mtk_thermal.c b/drivers/thermal/mtk_thermal.c
> index cb41e46..d5c78b0 100644
> --- a/drivers/thermal/mtk_thermal.c
> +++ b/drivers/thermal/mtk_thermal.c
> @@ -230,6 +230,11 @@ enum {
>
>  struct mtk_thermal;
>
> +struct mtk_thermal_zone {
> +   struct mtk_thermal *mt;
> +   int id;
> +};
> +
>  struct thermal_bank_cfg {
> unsigned int num_sensors;
> const int *sensors;
> @@ -612,7 +617,7 @@ static int mtk_thermal_bank_temperature(struct mtk_thermal_bank *bank)
>  * not immediately shut down.
>  */
> if (temp > 20)
> -   temp = 0;
> +   temp = -EACCES;

EACCES/permission denied doesn't really seem to be the right error
code here. Maybe EAGAIN?
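
To illustrate what I mean, a user-space sketch of the filter returning
-EAGAIN. The helper name filter_first_read() and the 200000 millidegree
limit are assumptions made up for this example, not values taken from
the driver:

```c
#include <errno.h>

/* Hypothetical plausibility limit in millidegrees Celsius. */
#define BOGUS_LIMIT_MC 200000

/*
 * Stores the reading in *out and returns 0 if it looks plausible.
 * Returns -EAGAIN and leaves *out untouched otherwise, telling the
 * caller that the value is not usable yet and the read can be retried.
 */
static int filter_first_read(int temp_mc, int *out)
{
	if (temp_mc > BOGUS_LIMIT_MC)
		return -EAGAIN;

	*out = temp_mc;
	return 0;
}
```

EAGAIN says "try again later", which matches a bogus first sample
better than "permission denied" does.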

>
> if (temp > max)
> max = temp;
> @@ -623,7 +628,8 @@ static int mtk_thermal_bank_temperature(struct mtk_thermal_bank *bank)
>
>  static int mtk_read_temp(void *data, int *temperature)
>  {
> -   struct mtk_thermal *mt = data;
> +   struct mtk_thermal_zone *tz = data;
> +   struct mtk_thermal *mt = tz->mt;
> int i;
> int tempmax = INT_MIN;
>
> @@ -636,16 +642,48 @@ static int mtk_read_temp(void *data, int *temperature)
>
> mtk_thermal_put_bank(bank);
> }
> -

I'd drop that change.

> *temperature = tempmax;
>
> return 0;
>  }
>
> +static int mtk_read_sensor_temp(void *data, int *temperature)
> +{
> +   struct mtk_thermal_zone *tz = data;
> +   struct mtk_thermal *mt = tz->mt;
> +   const struct mtk_thermal_data *conf = mt->conf;
> +   int id = tz->id - 1;
> +   int temp = INT_MIN;

No need to initialize temp.

> +   u32 raw;
> +
> +   if (id < 0)
> +   return  -EACCES;

EINVAL?

> +
> +   raw = readl(mt->thermal_base + conf->msr[id]);
> +
> +   temp = raw_to_mcelsius(mt, id, raw);
> +
> +   /*
> +* The first read of a sensor often contains very high bogus
> +* temperature value. Filter these out so that the system does
> +* not immediately shut down.
> +*/
> +

nit: Drop this blank line

> +   if (temp > 20)
> +   return  -EACCES;

Again, EAGAIN, maybe?

> +
> +   *temperature = temp;
> +   return 0;
> +}
> +
>  static const struct thermal_zone_of_device_ops mtk_thermal_ops = {
> .get_temp = mtk_read_temp,
>  };
>
> +static const struct thermal_zone_of_device_ops mtk_thermal_sensor_ops = {
> +   .get_temp = mtk_read_sensor_temp,
> +};
> +
>  static void mtk_thermal_init_bank(struct mtk_thermal *mt, int num,
>   u32 apmixed_phys_base, u32 auxadc_phys_base,
>   int ctrl_id)
> @@ -878,6 +916,7 @@ static int mtk_thermal_probe(struct platform_device *pdev)
> struct resource *res;
> u64 auxadc_phys_base, apmixed_phys_base;
> struct thermal_zone_device *tzdev;
> +   struct mtk_thermal_zone *tz;
>
> mt = devm_kzalloc(&pdev->dev, sizeof(*mt), GFP_KERNEL);
> if (!mt)
> @@ -959,11 +998,24 @@ static int mtk_thermal_probe(struct platform_device *pdev)
>
> platform_set_drvdata(pdev, mt);
>
> -   tzdev = devm_thermal_zone_of_sensor_register(&pdev->dev, 0, mt,
> -&mtk_thermal_ops);
> -   if (IS_ERR(tzdev)) {
> -   ret = PTR_ERR(tzdev);
> -   goto err_disable_clk_peri_therm;
> +   for (i = 0; i < mt->conf->num_sensors + 1; i++) {
> +   tz = kmalloc(sizeof(*tz), GFP_KERNEL);

Are we leaking this pointer? Should this be devm_kmalloc?

> +   if (!tz)
> +   return -ENOMEM;
> +
> +   tz->mt = mt;
> +   tz->id = i;
> +
> +   tzdev = devm_thermal_zone_of_sensor_register(&pdev->dev, i,
> +   tz, (i == 0) ?
> +   &mtk_thermal_ops : &mtk_thermal_sensor_ops);
> +
> +   if (IS_ERR(tzdev)) {
> +   if (IS_ERR(tzdev) != -EACCES) {

Why would EACCES ever happen? AFAICT
devm_thermal_zone_of_sensor_register does not actually try to read the
temperature value? Or does the error come from somewhere else?
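
A side note on the check quoted just above: IS_ERR() only answers yes or no, so comparing its result with -EACCES can never match; the errno has to be extracted with PTR_ERR() first. The user-space sketch below imitates the kernel helpers to show the difference. The demo_* names and DEMO_MAX_ERRNO are stand-ins invented for this illustration, not the kernel's own definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: error codes live in the last 4095 values of the address space. */
#define DEMO_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static inline void *demo_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* True if the pointer carries an encoded errno instead of a real address. */
static inline bool demo_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-DEMO_MAX_ERRNO;
}

/* Decode the errno; only meaningful when demo_is_err() is true. */
static inline long demo_ptr_err(const void *ptr)
{
	assert(demo_is_err(ptr));
	return (long)(intptr_t)ptr;
}

/* Shape of the quoted check: a 0/1 value compared with an errno, always true. */
static inline bool demo_buggy_is_fatal(const void *ptr)
{
	int is_err = demo_is_err(ptr);

	return is_err != -EACCES;
}

/* Probable intent: an error is fatal unless it is specifically -EACCES. */
static inline bool demo_is_fatal(const void *ptr)
{
	return demo_is_err(ptr) && demo_ptr_err(ptr) != -EACCES;
}
```

With these helpers, demo_buggy_is_fatal() returns true even for a valid pointer, while demo_is_fatal() only fires for real errors other than -EACCES.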

> +   ret = PTR_ERR(tzdev);
> +   goto err_disable_clk_peri_therm;
> +   }
> +   }
> }
>
> return 0;
> --
> 1.9.1
>
>

Re: [PATCH] kbuild: add script check for cross compilation utilities

2019-05-10 Thread Nathan Chancellor

A few comments below, but nothing major; this seems to work fine as is.

On Thu, May 09, 2019 at 01:19:21PM -0700, 'Nick Desaulniers' via Clang Built Linux wrote:
> When cross compiling via setting CROSS_COMPILE, if the prefixed tools
> are not found, then the host utilities are often instead invoked, and
> produce often difficult to understand errors.  This is most commonly the
> case for developers new to cross compiling the kernel that have yet to
> install the proper cross compilation toolchain. Rather than charge
> headlong into a build that will fail obscurely, check that the tools
> exist before starting to compile, and fail with a friendly error
> message.

This part of the commit message makes it sound like this is a generic
problem when it is actually specific to clang. make will fail on its
own when building with gcc if CROSS_COMPILE is not properly set (since
gcc won't be found).

On a side note, seems kind of odd that clang falls back to the host
tools when a non-host --target argument is used... (how in the world is
that expected to work?)

> 
> Before:
> $ ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- make CC=clang
> ...
> /usr/bin/as: unrecognized option '-EL'
> clang: error: assembler command failed with exit code 1 (use -v to see
> invocation)
> make[2]: *** [../scripts/Makefile.build:279: scripts/mod/empty.o] Error 1
> make[2]: *** Waiting for unfinished jobs
> make[1]: *** [/linux/Makefile:1118:
> prepare0] Error 2
> make: *** [Makefile:179: sub-make] Error 2
> 
> After:
> $ ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- make CC=clang
> $CROSS_COMPILE set to arm-linux-gnueabihf-, but unable to find
> arm-linux-gnueabihf-as.
> Makefile:522: recipe for target 'outputmakefile' failed
> make: *** [outputmakefile] Error 1
> 
> Signed-off-by: Nick Desaulniers 

Reviewed-by: Nathan Chancellor 

> ---
> Note: this is probably more generally useful, but after a few minutes
> wrestling with Make errors related to "recipe commences before first
> target" and "missing separator," I came to understand my hatred of GNU
> Make. Open to suggestions for where better to invoke this from the top
> level Makefile.
> 
>  Makefile  |  1 +
>  scripts/check_crosscompile.sh | 18 ++
>  2 files changed, 19 insertions(+)
>  create mode 100755 scripts/check_crosscompile.sh
> 
> diff --git a/Makefile b/Makefile
> index a61a95b6b38f..774339674b59 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -519,6 +519,7 @@ endif
>  
>  ifneq ($(shell $(CC) --version 2>&1 | head -n 1 | grep clang),)
>  ifneq ($(CROSS_COMPILE),)
> + $(Q)$(CONFIG_SHELL) $(srctree)/scripts/check_crosscompile.sh
>  CLANG_FLAGS  := --target=$(notdir $(CROSS_COMPILE:%-=%))
>  GCC_TOOLCHAIN_DIR := $(dir $(shell which $(CROSS_COMPILE)elfedit))
>  CLANG_FLAGS  += --prefix=$(GCC_TOOLCHAIN_DIR)
> diff --git a/scripts/check_crosscompile.sh b/scripts/check_crosscompile.sh
> new file mode 100755
> index ..f4586fbfee18
> --- /dev/null
> +++ b/scripts/check_crosscompile.sh
> @@ -0,0 +1,18 @@
> +#!/bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# (c) 2019, Nick Desaulniers 

I think a space between the comment and function here would look nicer.

> +function check () {
> +  # Remove trailing commands, for example arch/arm/Makefile may add `-EL`.
> +  utility=$(echo ${1} | awk '{print $1;}')

Shellcheck mentions the ${1} should be quoted.

> +  command -v "${utility}" &> /dev/null
> +  if [[ $? != 0 ]]; then

This can be simplified into:

if ! command -v "${utility}" &> /dev/null; then

> +echo "\$CROSS_COMPILE set to ${CROSS_COMPILE}," \
> +  "but unable to find ${utility}."
> +exit 1
> +  fi
> +}

Maybe a space here and after utilities?

> +utilities=("${AS}" "${LD}" "${CC}" "${AR}" "${NM}" "${STRIP}" "${OBJCOPY}"
> +  "${OBJDUMP}")

I think this would look a little better with the "${OBJDUMP}" aligned to
the "${AS}" (and maybe split the lines to make them evenly align?)

Another note: this script could in theory be invoked via 'sh' if bash
doesn't exist on a system (see CONFIG_SHELL's definition), in which case only
POSIX-compliant constructs should be used (so no arrays). I don't know
whether this happens often enough to matter (or whether it does in this case),
but it is worth mentioning.

> +for utility in "${utilities[@]}"; do
> +  check "${utility}"
> +done
> -- 
> 2.21.0.1020.gf2820cf01a-goog

Re: [PATCH 3/3] i2c: i801: avoid panic if ioreamp fails

2019-05-10 Thread Kefeng Wang



On 2019/5/10 20:18, Jean Delvare wrote:
> On Fri, 10 May 2019 17:35:46 +0800, Kefeng Wang wrote:
>> On 2019/5/10 16:09, Jean Delvare wrote:
>>> We don't need this anyway. The comment says it can't fail, so why
>>> bother checking for a condition which will never happen?  
>> The ioremap could fail due to lack of memory; our internal test robot (with
>> FAULT_INJECTION enabled) found this issue.
> The code only runs on x86 where this specific memory segment is
> standardized for the purpose. That's how we know it "can't fail".
>
> That being said, maybe it could fail for other reasons (internal kernel
> bug, or bogus BIOS maybe), and I don't mind adding the check
> anyway, as this code path is not performance critical.
Got it, please ignore it.
>

Re: [PATCH] nvme/pci: Use host managed power state for suspend

2019-05-10 Thread Edmund Nadolski


On 5/10/19 2:29 PM, Keith Busch wrote:

The nvme pci driver prepares its devices for power loss during suspend
by shutting down the controllers, and the power setting is deferred to
pci driver's power management before the platform removes power. The
suspend-to-idle mode, however, does not remove power.

NVMe devices that implement host managed power settings can achieve
lower power and better transition latencies than using generic PCI
power settings. Try to use this feature if the platform is not involved
with the suspend. If successful, restore the previous power state on
resume.

Cc: Mario Limonciello 
Cc: Kai Heng Feng 
Signed-off-by: Keith Busch 
---
Disclaimer: I've tested only on emulation faking support for the feature.

General question: different devices potentially have divergent values
for power consumption and transition latencies. Would it be useful to
allow a user tunable setting to select the desired target power state
instead of assuming the lowest one?


In general I prefer fewer knobs; but for a new setting it seems
advisable not to presume that one value is appropriate for everyone (as
long as they can't set an invalid value or otherwise hose things).



  drivers/nvme/host/core.c | 27 
  drivers/nvme/host/nvme.h |  2 ++
  drivers/nvme/host/pci.c  | 53 
  3 files changed, 82 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index a6644a2c3ef7..eb3640fd8838 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1132,6 +1132,33 @@ static int nvme_set_features(struct nvme_ctrl *dev, unsigned fid, unsigned dword
return ret;
  }
  
+int nvme_set_power(struct nvme_ctrl *ctrl, unsigned ps)
+{
+   return nvme_set_features(ctrl, NVME_FEAT_POWER_MGMT, ps, NULL, 0, NULL);
+}
+EXPORT_SYMBOL_GPL(nvme_set_power);
+
+int nvme_get_power(struct nvme_ctrl *ctrl, u32 *result)
+{
+   struct nvme_command c;
+   union nvme_result res;
+   int ret;
+
+   if (!result)
+   return -EINVAL;


As this is only called from one place, is this check really needed?



+
+   memset(&c, 0, sizeof(c));
+   c.features.opcode = nvme_admin_get_features;
+   c.features.fid = cpu_to_le32(NVME_FEAT_POWER_MGMT);
+
+   ret = __nvme_submit_sync_cmd(ctrl->admin_q, &c, &res,
+   NULL, 0, 0, NVME_QID_ANY, 0, 0, false);
+   if (ret >= 0)
+   *result = le32_to_cpu(res.u32);
+   return ret;
+}
+EXPORT_SYMBOL_GPL(nvme_get_power);
+
  int nvme_set_queue_count(struct nvme_ctrl *ctrl, int *count)
  {
u32 q_count = (*count - 1) | ((*count - 1) << 16);
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 5ee75b5ff83f..eaa571ac06d2 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -459,6 +459,8 @@ int __nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
unsigned timeout, int qid, int at_head,
blk_mq_req_flags_t flags, bool poll);
  int nvme_set_queue_count(struct nvme_ctrl *ctrl, int *count);
+int nvme_set_power(struct nvme_ctrl *ctrl, unsigned ps);
+int nvme_get_power(struct nvme_ctrl *ctrl, u32 *result);
  void nvme_stop_keep_alive(struct nvme_ctrl *ctrl);
  int nvme_reset_ctrl(struct nvme_ctrl *ctrl);
  int nvme_reset_ctrl_sync(struct nvme_ctrl *ctrl);
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 3e4fb891a95a..0d5d91e5b293 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -18,6 +18,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -116,6 +117,7 @@ struct nvme_dev {
u32 cmbsz;
u32 cmbloc;
struct nvme_ctrl ctrl;
+   u32 last_ps;
  
  	mempool_t *iod_mempool;
  
@@ -2828,11 +2830,59 @@ static void nvme_remove(struct pci_dev *pdev)

  }
  
  #ifdef CONFIG_PM_SLEEP

+static int nvme_deep_state(struct nvme_dev *dev)
+{
+   struct pci_dev *pdev = to_pci_dev(dev->dev);
+   int ret;
+
+   /*
+* Save the current power state in case a user tool set a power policy
+* for this device. We'll restore that state on resume.
+*/
+   dev->last_ps = 0;
+   ret = nvme_get_power(&dev->ctrl, &dev->last_ps);


Should we validate (range check) the value reported by the device?
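
One way such a range check could look, as a user-space sketch rather than driver code: mask the Get Features result down to the power state field and compare it against NPSS. The 5-bit field width reflects my reading of the NVMe Power Management feature layout and should be checked against the spec; demo_ps_from_result(), demo_ps_valid() and demo_save_ps() are hypothetical names, not part of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the power state occupies the low 5 bits of the result dword. */
#define DEMO_PS_MASK 0x1fu

/* Strip everything except the power state field. */
static inline uint32_t demo_ps_from_result(uint32_t result)
{
	return result & DEMO_PS_MASK;
}

/* NPSS is zero-based, so valid states are 0..npss inclusive. */
static inline bool demo_ps_valid(uint32_t result, uint8_t npss)
{
	return demo_ps_from_result(result) <= npss;
}

/* Store the state only if it is in range; leave *last_ps untouched otherwise. */
static inline int demo_save_ps(uint32_t result, uint8_t npss, uint32_t *last_ps)
{
	assert(last_ps != NULL);

	if (!demo_ps_valid(result, npss))
		return -EINVAL;

	*last_ps = demo_ps_from_result(result);
	return 0;
}
```

Rejecting an out-of-range value at suspend time would avoid writing a bogus state back to the device on resume; whether that should fail the suspend or fall back to state 0 is a separate policy question.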



+
+   /*
+* Return the error to halt suspend if the driver either couldn't
+* submit a command or didn't see a response.
+*/
+   if (ret < 0)
+   return ret;
+
+   ret = nvme_set_power(&dev->ctrl, dev->ctrl.npss);
+   if (ret < 0)
+   return ret;
+
+   if (!ret) {
+   /*
+* A saved state prevents pci pm from generically controlling
+* the device's power. We're using protocol specific settings
+* so we don't want pci interfering.
+*/
+   pci_save_state(pdev);
+   } else {
+

Re: [Qemu-devel] [PATCH v7 2/6] virtio-pmem: Add virtio pmem driver

2019-05-10 Thread Pankaj Gupta



> >
> > Hi Dan,
> >
> > Thank you for the review. Please see my reply inline.
> >
> > >
> > > Hi Pankaj,
> > >
> > > Some minor file placement comments below.
> >
> > Sure.
> >
> > >
> > > On Thu, Apr 25, 2019 at 10:02 PM Pankaj Gupta  wrote:
> > > >
> > > > This patch adds virtio-pmem driver for KVM guest.
> > > >
> > > > Guest reads the persistent memory range information from
> > > > Qemu over VIRTIO and registers it on nvdimm_bus. It also
> > > > creates a nd_region object with the persistent memory
> > > > range information so that existing 'nvdimm/pmem' driver
> > > > can reserve this into system memory map. This way
> > > > 'virtio-pmem' driver uses existing functionality of pmem
> > > > driver to register persistent memory compatible for DAX
> > > > capable filesystems.
> > > >
> > > > This also provides function to perform guest flush over
> > > > VIRTIO from 'pmem' driver when userspace performs flush
> > > > on DAX memory range.
> > > >
> > > > Signed-off-by: Pankaj Gupta 
> > > > ---
> > > >  drivers/nvdimm/virtio_pmem.c | 114 +
> > > >  drivers/virtio/Kconfig   |  10 +++
> > > >  drivers/virtio/Makefile  |   1 +
> > > >  drivers/virtio/pmem.c| 118 +++
> > > >  include/linux/virtio_pmem.h  |  60 
> > > >  include/uapi/linux/virtio_ids.h  |   1 +
> > > >  include/uapi/linux/virtio_pmem.h |  10 +++
> > > >  7 files changed, 314 insertions(+)
> > > >  create mode 100644 drivers/nvdimm/virtio_pmem.c
> > > >  create mode 100644 drivers/virtio/pmem.c
> > > >  create mode 100644 include/linux/virtio_pmem.h
> > > >  create mode 100644 include/uapi/linux/virtio_pmem.h
> > > >
> > > > diff --git a/drivers/nvdimm/virtio_pmem.c
> > > > b/drivers/nvdimm/virtio_pmem.c
> > > > new file mode 100644
> > > > index ..66b582f751a3
> > > > --- /dev/null
> > > > +++ b/drivers/nvdimm/virtio_pmem.c
> > > > @@ -0,0 +1,114 @@
> > > > +// SPDX-License-Identifier: GPL-2.0
> > > > +/*
> > > > + * virtio_pmem.c: Virtio pmem Driver
> > > > + *
> > > > + * Discovers persistent memory range information
> > > > + * from host and provides a virtio based flushing
> > > > + * interface.
> > > > + */
> > > > +#include 
> > > > +#include "nd.h"
> > > > +
> > > > + /* The interrupt handler */
> > > > +void host_ack(struct virtqueue *vq)
> > > > +{
> > > > +   unsigned int len;
> > > > +   unsigned long flags;
> > > > +   struct virtio_pmem_request *req, *req_buf;
> > > > +   struct virtio_pmem *vpmem = vq->vdev->priv;
> > > > +
> > > > > +   spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > > > > +   while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
> > > > > +   req->done = true;
> > > > > +   wake_up(&req->host_acked);
> > > > > +
> > > > > +   if (!list_empty(&vpmem->req_list)) {
> > > > > +   req_buf = list_first_entry(&vpmem->req_list,
> > > > > +   struct virtio_pmem_request, list);
> > > > > +   list_del(&vpmem->req_list);
> > > > > +   req_buf->wq_buf_avail = true;
> > > > > +   wake_up(&req_buf->wq_buf);
> > > > > +   }
> > > > > +   }
> > > > > +   spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(host_ack);
> > > > +
> > > > + /* The request submission function */
> > > > +int virtio_pmem_flush(struct nd_region *nd_region)
> > > > +{
> > > > +   int err;
> > > > +   unsigned long flags;
> > > > +   struct scatterlist *sgs[2], sg, ret;
> > > > +   struct virtio_device *vdev = nd_region->provider_data;
> > > > +   struct virtio_pmem *vpmem = vdev->priv;
> > > > +   struct virtio_pmem_request *req;
> > > > +
> > > > +   might_sleep();
> > > > +   req = kmalloc(sizeof(*req), GFP_KERNEL);
> > > > +   if (!req)
> > > > +   return -ENOMEM;
> > > > +
> > > > +   req->done = req->wq_buf_avail = false;
> > > > +   strcpy(req->name, "FLUSH");
> > > > > +   init_waitqueue_head(&req->host_acked);
> > > > > +   init_waitqueue_head(&req->wq_buf);
> > > > > +   sg_init_one(&sg, req->name, strlen(req->name));
> > > > > +   sgs[0] = &sg;
> > > > > +   sg_init_one(&ret, &req->ret, sizeof(req->ret));
> > > > > +   sgs[1] = &ret;
> > > > > +
> > > > > +   spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > > > > +   err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
> > > > > +   if (err) {
> > > > > +   dev_err(&vdev->dev, "failed to send command to virtio pmem device\n");
> > > > > +
> > > > > +   list_add_tail(&vpmem->req_list, &req->list);
> > > > > +   spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > > > > +
> > > > > +   /* When host has read buffer, this completes via host_ack */
> > > > > +   wait_event(req->wq_buf, req->wq_buf_avail);
> > > > > +   spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > > > > +   }
> > > > +   err =

[PATCH] mfd: stmfx: Fix macro definition spelling

2019-05-10 Thread Nathan Chancellor

Clang warns:

In file included from drivers/mfd/stmfx.c:13:
include/linux/mfd/stmfx.h:7:9: warning: 'MFD_STMFX_H' is used as a
header guard here, followed by #define of a different macro
[-Wheader-guard]

Fixes: 06252ade9156 ("mfd: Add ST Multi-Function eXpander (STMFX) core driver")
Link: https://github.com/ClangBuiltLinux/linux/issues/475
Signed-off-by: Nathan Chancellor 
---
 include/linux/mfd/stmfx.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/mfd/stmfx.h b/include/linux/mfd/stmfx.h
index d890595b89b6..3c67983678ec 100644
--- a/include/linux/mfd/stmfx.h
+++ b/include/linux/mfd/stmfx.h
@@ -5,7 +5,7 @@
  */
 
 #ifndef MFD_STMFX_H
-#define MFX_STMFX_H
+#define MFD_STMFX_H
 
 #include 
 
-- 
2.21.0
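
For readers unfamiliar with the warning, here is a minimal illustration of what the guard is supposed to do: the #ifndef and the #define must name the same macro, otherwise a second inclusion of the header is not skipped. DEMO_GUARD_H, DEMO_ANSWER and demo_answer() are made-up names, and the two copies below stand in for one header that gets included twice.

```c
#include <assert.h>

/* First inclusion: the guard macro is undefined, so the body is compiled. */
#ifndef DEMO_GUARD_H
#define DEMO_GUARD_H
enum { DEMO_ANSWER = 42 };
static inline int demo_answer(void) { return DEMO_ANSWER; }
#endif

/*
 * Second inclusion of the same text: the guard macro is now defined, so the
 * body is skipped. Had the #define above named a different macro, these
 * lines would be compiled again and fail as redefinitions.
 */
#ifndef DEMO_GUARD_H
#define DEMO_GUARD_H
enum { DEMO_ANSWER = 42 };
static inline int demo_answer(void) { return DEMO_ANSWER; }
#endif

/* Compile-time confirmation that the guarded contents are visible. */
static_assert(DEMO_ANSWER == 42, "guarded header contents are visible");
```

Changing the first #define to a misspelled name turns the second copy into a redefinition error, which is the situation -Wheader-guard is designed to flag early.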

Re: [Qemu-devel] [PATCH v8 3/6] libnvdimm: add dax_dev sync flag

2019-05-10 Thread Pankaj Gupta



> > > >
> > > > > This patch adds 'DAXDEV_SYNC' flag which is set
> > > > > for nd_region doing synchronous flush. This is later
> > > > > used to disable MAP_SYNC functionality for
> > > > > ext4 & xfs filesystem for devices that don't support
> > > > > synchronous flush.
> > > >
> > > > Signed-off-by: Pankaj Gupta 
> > > > ---
> > > >  drivers/dax/bus.c|  2 +-
> > > >  drivers/dax/super.c  | 13 -
> > > >  drivers/md/dm.c  |  3 ++-
> > > >  drivers/nvdimm/pmem.c|  5 -
> > > >  drivers/nvdimm/region_devs.c |  7 +++
> > > >  include/linux/dax.h  |  8 ++--
> > > >  include/linux/libnvdimm.h|  1 +
> > > >  7 files changed, 33 insertions(+), 6 deletions(-)
> > > [..]
> > > > diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> > > > index 043f0761e4a0..ee007b75d9fd 100644
> > > > --- a/drivers/md/dm.c
> > > > +++ b/drivers/md/dm.c
> > > > @@ -1969,7 +1969,8 @@ static struct mapped_device *alloc_dev(int minor)
> > > > sprintf(md->disk->disk_name, "dm-%d", minor);
> > > >
> > > > if (IS_ENABLED(CONFIG_DAX_DRIVER)) {
> > > > > -   dax_dev = alloc_dax(md, md->disk->disk_name, &dm_dax_ops);
> > > > > +   dax_dev = alloc_dax(md, md->disk->disk_name, &dm_dax_ops,
> > > > > +DAXDEV_F_SYNC);
> > >
> > > Apologies for not realizing this until now, but this is broken.
> > > > Imagine a device-mapper configuration composed of both 'async'
> > > virtio-pmem and 'sync' pmem. The 'sync' flag needs to be unified
> > > across all members. I would change this argument to '0' and then
> > > arrange for it to be set at dm_table_supports_dax() time after
> > > validating that all components support synchronous dax.
> >
> > o.k. Need to set 'DAXDEV_F_SYNC' flag after verifying all the target
> > components support synchronous DAX.
> >
> > > Just a question: if a device mapper configuration is composed of both
> > > virtio-pmem and pmem devices, do we want to configure device mapper for
> > > async flush?
> 
> If it's composed of both then, yes, it needs to be async flush at the
> device-mapper level. Otherwise MAP_SYNC will succeed and fail to
> trigger fsync on the host file when necessary for the virtio-pmem
> backed portion of the device-mapper device.

o.k. Agree.

Thank you,
Pankaj
> 
>

RE: [PATCH v20 00/28] Intel SGX1 support

2019-05-10 Thread Xing, Cedric

Hi Andy and Jethro,

> > > You have probably misread my email. By mmap(), I meant the enclave
> file would be mapped via *multiple* mmap() calls, in the same way as
> what dlopen() would do in loading regular shared object. The intention
> here is to make the enclave file subject to the same checks as regular
> shared objects.
> >
> > No, I didn't misread your email. My original point still stands:
> > requiring that an enclave's memory is created from one or more mmap
> > calls of a file puts significant restrictions on the enclave's on-disk
> > representation.
> >

I was talking in the context of ELF, with the assumption that changing RW pages 
to RX is disallowed by LSM. In that case, mmap() would be the only way to 
load a page from disk without having to "write" to it. That is just an 
example, though, not the focus of the discussion.

The point I was trying to make was, that the driver is going to copy both 
content and permissions from the source so the security properties established 
(by IMA/LSM) around that source page would be carried onto the EPC page being 
EADD'ed. The driver doesn't care how that source page came into existence. It 
could be mapped from an ELF file as in the example, or it could be a result 
from JIT as long as LSM allows it. The driver will be file format agnostic.

> 
> For a tiny bit of background, Linux (AFAIK*) makes no effort to ensure
> the complete integrity of DSOs.  What Linux *does* do (if so
> configured) is to make sure that only approved data is mapped
> executable.  So, if you want to have some bytes be executable, those
> bytes have to come from a file that passes the relevant LSM and IMA
> checks.  So we have two valid approaches, I think.
> 
> Approach 1: we treat SGX exactly the same way and make it so that only
> bytes that pass the relevant checks can be mapped as code within an
> enclave.  This imposes no particular restrictions on the file format
> -- we just need some API that takes an fd, an offset, and a length,
> and adds those bytes as code to an enclave.  (It could also take a
> pointer and a length and make sure that the pointer points to
> executable memory -- same effect.)

I assume "some API" is some user mode API so this approach is the same as what 
I suggested in my last email. Am I correct?

> 
> Approach 2: we decide that we want a stronger guarantee and that we
> *will* ensure the integrity of the enclave.  This means:
> 
> 2a) that we either need to load the entire thing from some approved
> file, and we commit to supporting one or more file formats.
> 
> 2b) we need to check that the eventual enclave hash is approved.  Or
> we could have a much shorter file that is just the hash and we check
> that.  At its simplest, the file could be *only* the hash, and there
> could be an LSM callback to check it.  In the future, if someone wants
> to allow enclaves to be embedded in DSOs, we could have a special ELF
> note or similar that contains an enclave hash or similar.
> 
> 2c) same as 2b except that we expose the whole SIGSTRUCT, not just the
> hash.
> 
> Here are some pros and cons of various bits:
> 
> 1 and 2a allow anti-virus software to scan the enclave code, and 2a
> allows it to scan the whole enclave.  I don't know if this is actually
> important.

I guess anti-virus software can scan any enclave file in *all* cases as long as 
it understands the format of that enclave. It doesn't necessarily mean the kernel 
has to understand that enclave format (as enclave file could be parsed in user 
mode) or the anti-virus software has to understand all formats (if any) 
supported natively by the kernel.
 
> 
> 2a is by far the most complicated kernel implementation.
> 

Agreed. I don't see any reason 2a would be necessary.

> 2b and 2c are almost file-format agnostic.  1 is completely file
> format agnostic but, in exchange, it's much weaker.

I'd say 1 and (variants of) 2 are orthogonal. SGX always enforces integrity, 
so not doing integrity checks at EADD doesn't mean the enclave integrity is not 
being enforced. 1 and 2 are basically 2 different checkpoints where LSM hooks 
could be placed. And a given LSM implementation/policy may enforce either 1 or 
2, or both, or neither. 

> 
> 2b and 2c should solve most (but not all) of the launch control
> complaints that Dr. Greg cares about, in the sense that the LSM policy
> quite literally validates that the enclave is approved.
> 
> As a straw man design, I propose the following, which is mostly 2c.
> The whole loading process works almost as in Jarkko's current driver,
> but the actual ioctl that triggers EINIT changes.  When you issue the
> ioctl, you pass in an fd and the SIGSTRUCT is loaded and checked from
> the fd.  The idea is that software that ships an enclave will ship a
> .sgxsig file that is literally a SIGSTRUCT for the enclave.  With
> SELinux, that file gets labeled something like
> sgx_enclave_sigstruct_t.  And we have the following extra twist: if
> you're calling the EADD ioctl

Re: [PATCH v8 3/6] libnvdimm: add dax_dev sync flag

2019-05-10 Thread Dan Williams

On Fri, May 10, 2019 at 5:45 PM Pankaj Gupta  wrote:
>
>
>
> > >
> > > > This patch adds 'DAXDEV_SYNC' flag which is set
> > > > for nd_region doing synchronous flush. This is later
> > > > used to disable MAP_SYNC functionality for
> > > > ext4 & xfs filesystem for devices that don't support
> > > > synchronous flush.
> > >
> > > Signed-off-by: Pankaj Gupta 
> > > ---
> > >  drivers/dax/bus.c|  2 +-
> > >  drivers/dax/super.c  | 13 -
> > >  drivers/md/dm.c  |  3 ++-
> > >  drivers/nvdimm/pmem.c|  5 -
> > >  drivers/nvdimm/region_devs.c |  7 +++
> > >  include/linux/dax.h  |  8 ++--
> > >  include/linux/libnvdimm.h|  1 +
> > >  7 files changed, 33 insertions(+), 6 deletions(-)
> > [..]
> > > diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> > > index 043f0761e4a0..ee007b75d9fd 100644
> > > --- a/drivers/md/dm.c
> > > +++ b/drivers/md/dm.c
> > > @@ -1969,7 +1969,8 @@ static struct mapped_device *alloc_dev(int minor)
> > > sprintf(md->disk->disk_name, "dm-%d", minor);
> > >
> > > if (IS_ENABLED(CONFIG_DAX_DRIVER)) {
> > > > -   dax_dev = alloc_dax(md, md->disk->disk_name, &dm_dax_ops);
> > > > +   dax_dev = alloc_dax(md, md->disk->disk_name, &dm_dax_ops,
> > > +DAXDEV_F_SYNC);
> >
> > Apologies for not realizing this until now, but this is broken.
> > > Imagine a device-mapper configuration composed of both 'async'
> > virtio-pmem and 'sync' pmem. The 'sync' flag needs to be unified
> > across all members. I would change this argument to '0' and then
> > arrange for it to be set at dm_table_supports_dax() time after
> > validating that all components support synchronous dax.
>
> o.k. Need to set 'DAXDEV_F_SYNC' flag after verifying all the target
> components support synchronous DAX.
>
> > Just a question: if a device mapper configuration is composed of both
> > virtio-pmem and pmem devices, do we want to configure device mapper for
> > async flush?

If it's composed of both then, yes, it needs to be async flush at the
device-mapper level. Otherwise MAP_SYNC will succeed and fail to
trigger fsync on the host file when necessary for the virtio-pmem
backed portion of the device-mapper device.
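
The rule above can be sketched in plain C: the mapped device may only advertise synchronous DAX if every member does. This is a user-space illustration, not device-mapper code; struct demo_target and demo_table_supports_sync() are invented for the example, and treating an empty table as "not sync" is my own conservative assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one member device of a mapped device. */
struct demo_target {
	const char *name;
	bool sync;	/* true for plain pmem, false for virtio-pmem */
};

/*
 * The combined device is synchronous only if all members are. A single
 * async member forces the whole device to async, so that MAP_SYNC is
 * refused instead of silently skipping the host-side fsync.
 */
static inline bool demo_table_supports_sync(const struct demo_target *targets,
					    size_t count)
{
	assert(targets != NULL || count == 0);

	if (count == 0)
		return false;

	for (size_t i = 0; i < count; i++) {
		if (!targets[i].sync)
			return false;
	}
	return true;
}
```

In this model the flag would start out cleared and be set only after a full pass over the members succeeds, which mirrors the suggestion to pass '0' at allocation time and decide later.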

Re: [PATCH 2/4] x86/kprobes: Fix frame pointer annotations

2019-05-10 Thread Masami Hiramatsu

On Fri, 10 May 2019 14:40:54 +0200
Peter Zijlstra  wrote:

> On Fri, May 10, 2019 at 01:58:31PM +0900, Masami Hiramatsu wrote:
> > On Thu, 9 May 2019 19:14:16 +0200
> > Peter Zijlstra  wrote:
> > > > > --- a/arch/x86/kernel/kprobes/core.c
> > > > > +++ b/arch/x86/kernel/kprobes/core.c
> > > > > @@ -731,29 +731,8 @@ asm(
> > > > >   ".global kretprobe_trampoline\n"
> > > > >   ".type kretprobe_trampoline, @function\n"
> > > > >   "kretprobe_trampoline:\n"
> 
> > > > Here, we need a gap for storing ret-ip, because kretprobe_trampoline is
> > > > the address which is returned from the target function. We have no 
> > > > "ret-ip" here at this point. So something like
> > > > 
> > > > +   "push $0\n" /* This is a gap, will be filled with real 
> > > > return address*/
> > > 
> > > The trampoline already provides a gap, trampoline_handler() will need to
> > > use int3_emulate_push() if it wants to inject something on the return
> > > stack.
> > 
> > I guess you mean the int3 case. This trampoline is used as a return 
> > destination.
> 
> > > When the target function is called, kretprobe interrupts the first
> > > instruction and replaces the return address with this trampoline. When a
> > > "ret" instruction is done, it returns to this trampoline. Thus the stack
> > > frame starts with the previous context here. As you described above,
> 
> I would prefer to change that to inject an extra return address, instead
> of replacing it. With the new exception stuff we can actually do that.
> 
> So on entry we then go from:
> 
>   
>   RET-IP
> 
> to
> 
>   
>   RET-IP
>   return-trampoline
> 
> So when the function returns, it falls into the trampoline instead.

Is that really possible? On x86-64, most parameters are passed in registers,
but on x86-32 (and on x86-64 in rare cases) some parameters can be passed on
the stack. If we change the stack layout in the function prologue, the code in
the function body cannot access those parameters on the stack.

Thank you,

> 
> > > > > +  * On entry the stack looks like:
> > > > > +  *
> > > > > +  *   2*4(%esp) 
> > > > > +  *   1*4(%esp) RET-IP
> > > > > +  *   0*4(%esp) func
> > 
> > From this trampoline call, the stack looks like:
> > 
> >  *   1*4(%esp) 
> >  *   0*4(%esp) func
> > 
> > So we need one more push.
> 
> And then the stack looks just right at this point.
> 
> > > > > + "push trampoline_handler\n"
> > > > > + "jmp call_to_exception_trampoline\n"
> > > > >   ".size kretprobe_trampoline, .-kretprobe_trampoline\n"
> > > > >  );
> > > > >  NOKPROBE_SYMBOL(kretprobe_trampoline);


-- 
Masami Hiramatsu

Re: [PATCH 2/4] x86/kprobes: Fix frame pointer annotations

2019-05-10 Thread Masami Hiramatsu

On Fri, 10 May 2019 14:31:31 +0200
Peter Zijlstra  wrote:

> On Fri, May 10, 2019 at 01:58:31PM +0900, Masami Hiramatsu wrote:
> > On Thu, 9 May 2019 19:14:16 +0200
> > Peter Zijlstra  wrote:
> 
> > > Ideally also the optimized kprobe trampoline, but I've not managed to
> > > fully comprehend that one.
> > 
> > As you pointed in other reply, save/restore can be a macro, but
> > each trampoline code is slightly different. Optprobe template has
> > below parts
> > 
> > (jumped from probed address)
> > [store regs]
> > [setup function arguments (pt_regs and probed address)]
> > [handler call]
> > [restore regs]
> > [execute copied instruction]
> 
>  instruction_s_ ?

Yes.

> 
> The JMP to this trampoline is likely 5 bytes and could have clobbered
> multiple instructions, we'd then have to place them all here, and
> 
> > [jump back to probed address]
> 
> jump to after whatever instructions were clobbered by the JMP.

Right!

> > > Note that there is a limitation: if it is an optimized probe, the user
> > > handler cannot change regs->ip. (We cannot use "ret" after executing
> > > a copied instruction, which must run on the same stack.)
> 
> Changing regs->ip in this case is going to be massively dodgy indeed :-)
> But so would changing much else; changing stack layout would also be
> somewhat tricky.

Yes, so the stack must be same after [restore regs].

Thank you,

-- 
Masami Hiramatsu

RE: [PATCH] nvme/pci: Use host managed power state for suspend

2019-05-10 Thread Mario.Limonciello

> 
> Cc: Mario Limonciello 
> Cc: Kai Heng Feng 
> Signed-off-by: Keith Busch 
> ---
> Disclaimer: I've tested only on emulation faking support for the feature.

Thanks for sharing.  I'll arrange some testing of this with storage partners
early next week.

> 
> General question: different devices potentially have divergent values for
> power consumption and transition latencies. Would it be useful to allow a
> user tunable setting to select the desired target power state instead of
> assuming the lowest one?

Since this action only happens on the way down to suspend-to-idle, I don't
think there is a lot of value in making the target state user tunable.

If you don't go into the deepest state for NVMe, at least on Intel the PCH
generally won't be able to go into its deepest state either.

> 
>  drivers/nvme/host/core.c | 27 
> drivers/nvme/host/nvme.h |  2 ++  drivers/nvme/host/pci.c  | 53
> 
>  3 files changed, 82 insertions(+)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index
> a6644a2c3ef7..eb3640fd8838 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -1132,6 +1132,33 @@ static int nvme_set_features(struct nvme_ctrl *dev,
> unsigned fid, unsigned dword
>   return ret;
>  }
> 
> +int nvme_set_power(struct nvme_ctrl *ctrl, unsigned ps)
> +{
> + return nvme_set_features(ctrl, NVME_FEAT_POWER_MGMT, ps, NULL, 0, NULL);
> +}
> +EXPORT_SYMBOL_GPL(nvme_set_power);
> +
> +int nvme_get_power(struct nvme_ctrl *ctrl, u32 *result) {
> + struct nvme_command c;
> + union nvme_result res;
> + int ret;
> +
> + if (!result)
> + return -EINVAL;
> +
> + memset(&c, 0, sizeof(c));
> + c.features.opcode = nvme_admin_get_features;
> + c.features.fid = cpu_to_le32(NVME_FEAT_POWER_MGMT);
> +
> + ret = __nvme_submit_sync_cmd(ctrl->admin_q, &c, &res,
> + NULL, 0, 0, NVME_QID_ANY, 0, 0, false);
> + if (ret >= 0)
> + *result = le32_to_cpu(res.u32);
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(nvme_get_power);
> +
>  int nvme_set_queue_count(struct nvme_ctrl *ctrl, int *count)
>  {
>   u32 q_count = (*count - 1) | ((*count - 1) << 16);
> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
> index 5ee75b5ff83f..eaa571ac06d2 100644
> --- a/drivers/nvme/host/nvme.h
> +++ b/drivers/nvme/host/nvme.h
> @@ -459,6 +459,8 @@ int __nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
>   unsigned timeout, int qid, int at_head,
>   blk_mq_req_flags_t flags, bool poll);
>  int nvme_set_queue_count(struct nvme_ctrl *ctrl, int *count);
> +int nvme_set_power(struct nvme_ctrl *ctrl, unsigned ps);
> +int nvme_get_power(struct nvme_ctrl *ctrl, u32 *result);
>  void nvme_stop_keep_alive(struct nvme_ctrl *ctrl);
>  int nvme_reset_ctrl(struct nvme_ctrl *ctrl);
>  int nvme_reset_ctrl_sync(struct nvme_ctrl *ctrl);
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 3e4fb891a95a..0d5d91e5b293 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -18,6 +18,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include
> @@ -116,6 +117,7 @@ struct nvme_dev {
>   u32 cmbsz;
>   u32 cmbloc;
>   struct nvme_ctrl ctrl;
> + u32 last_ps;
> 
>   mempool_t *iod_mempool;
> 
> @@ -2828,11 +2830,59 @@ static void nvme_remove(struct pci_dev *pdev)
>  }
> 
>  #ifdef CONFIG_PM_SLEEP
> +static int nvme_deep_state(struct nvme_dev *dev)
> +{
> + struct pci_dev *pdev = to_pci_dev(dev->dev);
> + int ret;
> +
> + /*
> +  * Save the current power state in case a user tool set a power policy
> +  * for this device. We'll restore that state on resume.
> +  */
> + dev->last_ps = 0;
> + ret = nvme_get_power(&dev->ctrl, &dev->last_ps);
> +
> + /*
> +  * Return the error to halt suspend if the driver either couldn't
> +  * submit a command or didn't see a response.
> +  */
> + if (ret < 0)
> + return ret;
> +
> + ret = nvme_set_power(&dev->ctrl, dev->ctrl.npss);
> + if (ret < 0)
> + return ret;
> +
> + if (!ret) {
> + /*
> +  * A saved state prevents pci pm from generically controlling
> +  * the device's power. We're using protocol specific settings
> +  * so we don't want pci interfering.
> +  */
> + pci_save_state(pdev);
> + } else {
> + /*
> +  * The drive failed the low power request. Fallback to device
> +  * shutdown and clear npss to force a controller reset on
> +  * resume. The value will be rediscovered during reset.
> +  */
> + dev->ctrl.npss = 0;
> + nvme_dev_disable(dev, true);
> + }
> + return 0;
> +}
> +
>  static int nvme_suspend(struct device

Re: [PATCH, RFC] byteorder: sanity check toolchain vs kernel endianess

2019-05-10 Thread Arnd Bergmann

On Fri, May 10, 2019 at 6:53 AM Dmitry Vyukov  wrote:
> >
> > I think it's good to have a sanity check in-place for consistency.
>
>
> Hi,
>
> This broke our cross-builds from x86. I am using:
>
> $ powerpc64le-linux-gnu-gcc --version
> powerpc64le-linux-gnu-gcc (Debian 7.2.0-7) 7.2.0
>
> and it says that it's little-endian somehow:
>
> $ powerpc64le-linux-gnu-gcc -dM -E - < /dev/null | grep BYTE_ORDER
> #define __BYTE_ORDER__ __ORDER_LITTLE_ENDIAN__
>
> Is it a broken compiler? Or have I always been holding it wrong? Is there
> some additional flag I need to add?

It looks like a bug in the kernel Makefiles to me. powerpc32 is always
big-endian; powerpc64 used to be big-endian but is now usually little-endian.
There are often three separate toolchains that default to the respective
user space targets (ppc32be, ppc64be, ppc64le), but generally you should be
able to build any of the three kernel configurations with any of those
compilers, and have the Makefile pass the correct
-m32/-m64/-mbig-endian/-mlittle-endian command line options depending on the
kernel configuration. It seems that this is not happening here. I have not
checked why, but if this is the problem, it should be easy enough to figure
out.

   Arnd
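
The kind of consistency check under discussion can be illustrated outside the kernel. The sketch below is an assumption-laden illustration, not the code from the patch: it compares what the compiler reports through the GCC/Clang predefined macro __BYTE_ORDER__ with what a configuration expects, and also probes memory at run time. The function names and the want_little parameter are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#if !defined(__BYTE_ORDER__) || !defined(__ORDER_LITTLE_ENDIAN__)
#error "this sketch relies on the GCC/Clang __BYTE_ORDER__ predefined macros"
#endif

/* What the compiler claims for the flags it was invoked with. */
static int compiler_is_little_endian(void)
{
    return __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__;
}

/* What the generated code does: look at the first byte of the value 1. */
static int target_is_little_endian(void)
{
    uint32_t probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1);
    return first == 1;
}

/*
 * Hypothetical check: want_little stands for the byte order the build
 * configuration expects. Returns 0 on a match and -1 on a mismatch.
 */
static int check_endianness(int want_little)
{
    return compiler_is_little_endian() == (want_little != 0) ? 0 : -1;
}
```

If the build system does not pass the matching -mbig-endian or -mlittle-endian option, the compiler keeps its default byte order and a check of this kind reports a mismatch for the other configuration, which is consistent with the failure described above.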

Re: [PATCH v8 3/6] libnvdimm: add dax_dev sync flag

2019-05-10 Thread Pankaj Gupta




> >
> > This patch adds 'DAXDEV_SYNC' flag which is set
> > for nd_region doing synchronous flush. This later
> > is used to disable MAP_SYNC functionality for
> > ext4 & xfs filesystem for devices that don't support
> > synchronous flush.
> >
> > Signed-off-by: Pankaj Gupta 
> > ---
> >  drivers/dax/bus.c|  2 +-
> >  drivers/dax/super.c  | 13 -
> >  drivers/md/dm.c  |  3 ++-
> >  drivers/nvdimm/pmem.c|  5 -
> >  drivers/nvdimm/region_devs.c |  7 +++
> >  include/linux/dax.h  |  8 ++--
> >  include/linux/libnvdimm.h|  1 +
> >  7 files changed, 33 insertions(+), 6 deletions(-)
> [..]
> > diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> > index 043f0761e4a0..ee007b75d9fd 100644
> > --- a/drivers/md/dm.c
> > +++ b/drivers/md/dm.c
> > @@ -1969,7 +1969,8 @@ static struct mapped_device *alloc_dev(int minor)
> > sprintf(md->disk->disk_name, "dm-%d", minor);
> >
> > if (IS_ENABLED(CONFIG_DAX_DRIVER)) {
> > -   dax_dev = alloc_dax(md, md->disk->disk_name, &dm_dax_ops);
> > +   dax_dev = alloc_dax(md, md->disk->disk_name, &dm_dax_ops,
> > +DAXDEV_F_SYNC);
> 
> Apologies for not realizing this until now, but this is broken.
> Imagine a device-mapper configuration composed of both 'async'
> virtio-pmem and 'sync' pmem. The 'sync' flag needs to be unified
> across all members. I would change this argument to '0' and then
> arrange for it to be set at dm_table_supports_dax() time after
> validating that all components support synchronous dax.

OK. I need to set the 'DAXDEV_F_SYNC' flag after verifying that all the target
components support synchronous DAX.
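
The rule requested in the review (the sync flag is only valid when every member of the table supports synchronous DAX) amounts to an "all of" check. A minimal sketch, assuming a hypothetical fake_target type and helper name; treating an empty table as not synchronous is also an assumption of this sketch:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one component of a device-mapper table. */
struct fake_target {
    bool sync_dax;   /* true if this component flushes synchronously */
};

/*
 * Returns true only if every target supports synchronous DAX.
 * A single 'async' member (e.g. virtio-pmem) makes the whole device async.
 */
static bool table_supports_sync_dax(const struct fake_target *targets,
                                    size_t count)
{
    size_t i;

    if (count == 0)
        return false;   /* assumption: nothing to vouch for the flag */

    for (i = 0; i < count; i++) {
        if (!targets[i].sync_dax)
            return false;
    }
    return true;
}
```

With this shape, the flag would be passed as 0 at allocation time and set later only when the check returns true.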

Just a question: if a device-mapper configuration is composed of both
virtio-pmem and pmem devices, do we want to configure device mapper for async
flush?

Thank you,
Pankaj 

Re: [PATCH] usercopy: Remove HARDENED_USERCOPY_PAGESPAN

2019-05-10 Thread Laura Abbott


On 5/10/19 3:43 PM, Kees Cook wrote:

This feature continues to cause more problems than it solves[1]. Its
intention was to check the bounds of page-allocator allocations by using
__GFP_COMP, for which we would need to find all missing __GFP_COMP
markings. This work has been on hold and there is an argument[2]
that such markings are not even the correct signal for checking for
same-allocation pages. Instead of depending on BROKEN, this just removes
it entirely. It can be trivially reverted if/when a better solution for
tracking page allocator sizes is found.

[1] https://www.mail-archive.com/linux-crypto@vger.kernel.org/msg37479.html
[2] https://lkml.kernel.org/r/20190415022412.ga29...@bombadil.infradead.org

Suggested-by: Eric Biggers 
Signed-off-by: Kees Cook 
---
  mm/usercopy.c| 67 
  security/Kconfig | 11 
  2 files changed, 78 deletions(-)

diff --git a/mm/usercopy.c b/mm/usercopy.c
index 14faadcedd06..15dc1bf03303 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -159,70 +159,6 @@ static inline void check_bogus_address(const unsigned long ptr, unsigned long n,
usercopy_abort("null address", NULL, to_user, ptr, n);
  }
  
-/* Checks for allocs that are marked in some way as spanning multiple pages. */

-static inline void check_page_span(const void *ptr, unsigned long n,
-  struct page *page, bool to_user)
-{
-#ifdef CONFIG_HARDENED_USERCOPY_PAGESPAN
-   const void *end = ptr + n - 1;
-   struct page *endpage;
-   bool is_reserved, is_cma;
-
-   /*
-* Sometimes the kernel data regions are not marked Reserved (see
-* check below). And sometimes [_sdata,_edata) does not cover
-* rodata and/or bss, so check each range explicitly.
-*/
-
-   /* Allow reads of kernel rodata region (if not marked as Reserved). */
-   if (ptr >= (const void *)__start_rodata &&
-   end <= (const void *)__end_rodata) {
-   if (!to_user)
-   usercopy_abort("rodata", NULL, to_user, 0, n);
-   return;
-   }
-
-   /* Allow kernel data region (if not marked as Reserved). */
-   if (ptr >= (const void *)_sdata && end <= (const void *)_edata)
-   return;
-
-   /* Allow kernel bss region (if not marked as Reserved). */
-   if (ptr >= (const void *)__bss_start &&
-   end <= (const void *)__bss_stop)
-   return;
-



I agree the page spanning check is broken, but is it worth keeping the
checks against __rodata, __bss, etc.?
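
For reference, the region checks in question all have the same shape: the object [ptr, ptr + n - 1] must lie inside one known range. A stand-alone sketch of that test follows; object_within() is a hypothetical helper, and rejecting empty or wrapping objects is an assumption of this sketch rather than something the removed code did.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * True when the object starting at ptr with length n lies entirely inside
 * [start, stop], mirroring the "ptr >= start && end <= stop" comparisons
 * in the removed check_page_span().
 */
static bool object_within(uintptr_t ptr, size_t n,
                          uintptr_t start, uintptr_t stop)
{
    uintptr_t end;

    /* Assumption: treat empty objects and address wrap-around as failures. */
    if (n == 0 || ptr + (n - 1) < ptr)
        return false;

    end = ptr + (n - 1);
    return ptr >= start && end <= stop;
}
```

Keeping only checks of this kind would not depend on __GFP_COMP markings, which is the part of the feature described as broken.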


-   /* Is the object wholly within one base page? */
-   if (likely(((unsigned long)ptr & (unsigned long)PAGE_MASK) ==
-  ((unsigned long)end & (unsigned long)PAGE_MASK)))
-   return;
-
-   /* Allow if fully inside the same compound (__GFP_COMP) page. */
-   endpage = virt_to_head_page(end);
-   if (likely(endpage == page))
-   return;
-
-   /*
-* Reject if range is entirely either Reserved (i.e. special or
-* device memory), or CMA. Otherwise, reject since the object spans
-* several independently allocated pages.
-*/
-   is_reserved = PageReserved(page);
-   is_cma = is_migrate_cma_page(page);
-   if (!is_reserved && !is_cma)
-   usercopy_abort("spans multiple pages", NULL, to_user, 0, n);
-
-   for (ptr += PAGE_SIZE; ptr <= end; ptr += PAGE_SIZE) {
-   page = virt_to_head_page(ptr);
-   if (is_reserved && !PageReserved(page))
-   usercopy_abort("spans Reserved and non-Reserved pages",
-  NULL, to_user, 0, n);
-   if (is_cma && !is_migrate_cma_page(page))
-   usercopy_abort("spans CMA and non-CMA pages", NULL,
-  to_user, 0, n);
-   }
-#endif
-}
-
  static inline void check_heap_object(const void *ptr, unsigned long n,
 bool to_user)
  {
@@ -236,9 +172,6 @@ static inline void check_heap_object(const void *ptr, unsigned long n,
if (PageSlab(page)) {
/* Check slab allocator for flags and size. */
__check_heap_object(ptr, n, page, to_user);
-   } else {
-   /* Verify object does not incorrectly span multiple pages. */
-   check_page_span(ptr, n, page, to_user);
}
  }
  
diff --git a/security/Kconfig b/security/Kconfig

index 353cfef71d4e..8392647f5a4c 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -176,17 +176,6 @@ config HARDENED_USERCOPY_FALLBACK
  Booting with "slab_common.usercopy_fallback=Y/N" can change
  this setting.
  
-config HARDENED_USERCOPY_PAGESPAN

-   bool "Refuse to copy allocations that span multiple pages"
-   depends on HARDENED_USERCOPY
-   depends on EXPERT
-   help
- When a multi-page allocation is done

[PATCH v2] media: v4l2-subdev: Verify arguments of v4l2_subdev_call()

2019-05-10 Thread Janusz Krzysztofik

Correctness of format type (try or active) and pad number parameters
passed to subdevice operation callbacks is currently verified only for IOCTL
calls.  However, those callbacks are also used by drivers, e.g., V4L2
host interfaces.

Since both subdev_do_ioctl() and drivers are using v4l2_subdev_call()
macro while calling subdevice operations, move those parameter checks
from subdev_do_ioctl() to v4l2_subdev_call() so we can avoid
reimplementing those checks inside drivers.
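
The idea of wrapping each operation so that its arguments are validated before the driver callback runs can be sketched in plain C. Everything named fake_* or FMT_* below is a hypothetical stand-in for the V4L2 types and constants, and the sketch uses ordinary if statements instead of the GNU "?:" shorthand that appears in the diff.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins for V4L2_SUBDEV_FORMAT_TRY / _ACTIVE. */
enum { FMT_TRY = 0, FMT_ACTIVE = 1 };

struct fake_format {
    uint32_t which;
    uint32_t pad;
};

struct fake_subdev {
    uint32_t num_pads;
    int (*get_fmt)(struct fake_subdev *sd, struct fake_format *fmt);
};

/* Reject anything other than the two known format types. */
static int check_which(uint32_t which)
{
    return (which != FMT_TRY && which != FMT_ACTIVE) ? -EINVAL : 0;
}

/* Reject pad numbers beyond what the subdevice exposes. */
static int check_pad(const struct fake_subdev *sd, uint32_t pad)
{
    return pad >= sd->num_pads ? -EINVAL : 0;
}

/* Wrapper: validate first, and only then call the driver's callback. */
static int call_get_fmt(struct fake_subdev *sd, struct fake_format *fmt)
{
    int ret = check_which(fmt->which);

    if (ret)
        return ret;
    ret = check_pad(sd, fmt->pad);
    if (ret)
        return ret;
    return sd->get_fmt(sd, fmt);
}

/* Trivial callback used to exercise the wrapper. */
static int demo_get_fmt(struct fake_subdev *sd, struct fake_format *fmt)
{
    (void)sd;
    (void)fmt;
    return 0;
}
```

Because every caller goes through the wrapper, both the ioctl path and in-kernel users get the same validation without repeating it.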

Signed-off-by: Janusz Krzysztofik 
---
Changelog:
v1->v2:
- replace the horrible macro with a structure of wrapper functions;
  inspired by Hans' and Sakari's comments - thanks!
 
 drivers/media/v4l2-core/v4l2-subdev.c | 222 +++---
 include/media/v4l2-subdev.h           |   6 +
 2 files changed, 139 insertions(+), 89 deletions(-)

diff --git a/drivers/media/v4l2-core/v4l2-subdev.c b/drivers/media/v4l2-core/v4l2-subdev.c
index d75815ab0d7b..213194660d52 100644
--- a/drivers/media/v4l2-core/v4l2-subdev.c
+++ b/drivers/media/v4l2-core/v4l2-subdev.c
@@ -120,56 +120,165 @@ static int subdev_close(struct file *file)
return 0;
 }
 
+static inline int check_which(__u32 which)
+{
+   return which != V4L2_SUBDEV_FORMAT_TRY &&
+  which != V4L2_SUBDEV_FORMAT_ACTIVE ? -EINVAL : 0;
+}
+
 #if defined(CONFIG_VIDEO_V4L2_SUBDEV_API)
+static inline int check_pad(struct v4l2_subdev *sd, __u32 pad)
+{
+   return pad >= sd->entity.num_pads ? -EINVAL : 0;
+}
+#else
+#define check_pad(...) 0
+#endif
+
 static int check_format(struct v4l2_subdev *sd,
struct v4l2_subdev_format *format)
 {
-   if (format->which != V4L2_SUBDEV_FORMAT_TRY &&
-   format->which != V4L2_SUBDEV_FORMAT_ACTIVE)
-   return -EINVAL;
+   return check_which(format->which) ? : check_pad(sd, format->pad);
+}
 
-   if (format->pad >= sd->entity.num_pads)
-   return -EINVAL;
+static int check_get_fmt(struct v4l2_subdev *sd,
+struct v4l2_subdev_pad_config *cfg,
+struct v4l2_subdev_format *format)
+{
+   return check_format(sd, format) ? :
+   sd->ops->pad->get_fmt(sd, cfg, format);
+}
 
-   return 0;
+static int check_set_fmt(struct v4l2_subdev *sd,
+struct v4l2_subdev_pad_config *cfg,
+struct v4l2_subdev_format *format)
+{
+   return check_format(sd, format) ? :
+   sd->ops->pad->set_fmt(sd, cfg, format);
 }
 
-static int check_crop(struct v4l2_subdev *sd, struct v4l2_subdev_crop *crop)
+static int check_enum_mbus_code(struct v4l2_subdev *sd,
+   struct v4l2_subdev_pad_config *cfg,
+   struct v4l2_subdev_mbus_code_enum *code)
 {
-   if (crop->which != V4L2_SUBDEV_FORMAT_TRY &&
-   crop->which != V4L2_SUBDEV_FORMAT_ACTIVE)
-   return -EINVAL;
+   return check_which(code->which) ? : check_pad(sd, code->pad) ? :
+   sd->ops->pad->enum_mbus_code(sd, cfg, code);
+}
 
-   if (crop->pad >= sd->entity.num_pads)
-   return -EINVAL;
+static int check_enum_frame_size(struct v4l2_subdev *sd,
+struct v4l2_subdev_pad_config *cfg,
+struct v4l2_subdev_frame_size_enum *fse)
+{
+   return check_which(fse->which) ? : check_pad(sd, fse->pad) ? :
+   sd->ops->pad->enum_frame_size(sd, cfg, fse);
+}
 
-   return 0;
+static int check_frame_interval(struct v4l2_subdev *sd,
+   struct v4l2_subdev_frame_interval *fi)
+{
+   return check_pad(sd, fi->pad);
+}
+
+static int check_g_frame_interval(struct v4l2_subdev *sd,
+ struct v4l2_subdev_frame_interval *fi)
+{
+   return check_frame_interval(sd, fi) ? :
+   sd->ops->video->g_frame_interval(sd, fi);
+}
+
+static int check_s_frame_interval(struct v4l2_subdev *sd,
+ struct v4l2_subdev_frame_interval *fi)
+{
+   return check_frame_interval(sd, fi) ? :
+   sd->ops->video->s_frame_interval(sd, fi);
+}
+
+static int check_enum_frame_interval(struct v4l2_subdev *sd,
+   struct v4l2_subdev_pad_config *cfg,
+   struct v4l2_subdev_frame_interval_enum *fie)
+{
+   return check_which(fie->which) ? : check_pad(sd, fie->pad) ? :
+   sd->ops->pad->enum_frame_interval(sd, cfg, fie);
 }
 
 static int check_selection(struct v4l2_subdev *sd,
   struct v4l2_subdev_selection *sel)
 {
-   if (sel->which != V4L2_SUBDEV_FORMAT_TRY &&
-   sel->which != V4L2_SUBDEV_FORMAT_ACTIVE)
-   return -EINVAL;
+   return check_which(sel->which) ? : check_pad(sd, sel->pad);
+}
 
-   if (sel->pad >= sd->entity.num_pads)
-   return -EINVAL;
+static int

Re: [PATCH v3 5/7] mm: rework non-root kmem_cache lifecycle management

2019-05-10 Thread Shakeel Butt

From: Roman Gushchin 
Date: Wed, May 8, 2019 at 1:41 PM
To: Andrew Morton, Shakeel Butt
Cc: , ,
, Johannes Weiner, Michal Hocko, Rik van Riel,
Christoph Lameter, Vladimir Davydov, , Roman
Gushchin

> This commit makes several important changes in the lifecycle
> of a non-root kmem_cache, which also affect the lifecycle
> of a memory cgroup.
>
> Currently each charged slab page has a page->mem_cgroup pointer
> to the memory cgroup and holds a reference to it.
> Kmem_caches are held by the memcg and are released with it.
> It means that none of kmem_caches are released unless at least one
> reference to the memcg exists, which is not optimal.
>
> So the current scheme can be illustrated as:
> page->mem_cgroup->kmem_cache.
>
> To implement the slab memory reparenting we need to invert the scheme
> into: page->kmem_cache->mem_cgroup.
>
> Let's make every page to hold a reference to the kmem_cache (we
> already have a stable pointer), and make kmem_caches to hold a single
> reference to the memory cgroup.
>
> To make this possible we need to introduce a new percpu refcounter
> for non-root kmem_caches. The counter is initialized to the percpu
> mode, and is switched to atomic mode after deactivation, so we never
> shutdown an active cache. The counter is bumped for every charged page
> and also for every running allocation. So the kmem_cache can't
> be released unless all allocations complete.
>
> To shutdown non-active empty kmem_caches, let's reuse the
> infrastructure of the RCU-delayed work queue, used previously for
> the deactivation. After the generalization, it's perfectly suited
> for our needs.
>
> Since now we can release a kmem_cache at any moment after the
> deactivation, let's call sysfs_slab_remove() only from the shutdown
> path. It makes deactivation path simpler.
>
> Because we don't set the page->mem_cgroup pointer, we need to change
> the way how memcg-level stats is working for slab pages. We can't use
> mod_lruvec_page_state() helpers anymore, so switch over to
> mod_lruvec_state().
>
> * I used the following simple approach to test the performance
> (stolen from another patchset by T. Harding):
>
> time find / -name fname-no-exist
> echo 2 > /proc/sys/vm/drop_caches
> repeat several times
>
> Results (I've chosen best results in several runs):
>
>        orig       patched
>
> real   0m0.700s   0m0.722s
> user   0m0.114s   0m0.120s
> sys    0m0.317s   0m0.324s
>
> real   0m0.729s   0m0.746s
> user   0m0.110s   0m0.139s
> sys    0m0.320s   0m0.317s
>
> real   0m0.745s   0m0.719s
> user   0m0.108s   0m0.124s
> sys    0m0.320s   0m0.323s
>

You need to re-run the experiment. The numbers are the same as in the
previous version, but the patch has changed a lot.
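
As a reading aid for the quoted commit message, the inverted ownership chain (page -> kmem_cache -> mem_cgroup) can be modelled with plain counters. This is a simplified, single-threaded sketch: the fake_* types and cache_* helpers are hypothetical, an ordinary long stands in for the percpu refcounter, and "deactivation drops the initial reference" is an assumption about how such a counter is typically used.

```c
#include <stddef.h>

struct fake_memcg {
    long refs;                 /* references held on the memory cgroup */
};

struct fake_cache {
    long refs;                 /* initial reference plus one per charged page */
    int released;              /* set to 1 once the cache has been shut down */
    struct fake_memcg *memcg;
};

/* The cache takes a single reference on its memcg for its whole lifetime. */
static void cache_create(struct fake_cache *c, struct fake_memcg *m)
{
    c->refs = 1;
    c->released = 0;
    c->memcg = m;
    m->refs++;
}

/* Dropping the last cache reference releases the cache and its memcg ref. */
static void cache_put(struct fake_cache *c)
{
    if (--c->refs == 0) {
        c->memcg->refs--;
        c->memcg = NULL;
        c->released = 1;
    }
}

/* Each charged page pins the cache, not the memcg. */
static void cache_charge_page(struct fake_cache *c)
{
    c->refs++;
}

static void cache_uncharge_page(struct fake_cache *c)
{
    cache_put(c);
}

/* Deactivation drops the initial reference; pages may still pin the cache. */
static void cache_deactivate(struct fake_cache *c)
{
    cache_put(c);
}
```

In this model the memcg reference count does not grow with the number of charged pages, which is the property the commit message relies on for reparenting.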

> So it looks like the difference is not noticeable in this test.
>
> Signed-off-by: Roman Gushchin 
> ---
>  include/linux/slab.h |  3 +-
>  mm/memcontrol.c  | 53 +--
>  mm/slab.h| 74 +++-
>  mm/slab_common.c | 63 +++--
>  mm/slub.c| 12 +--
>  5 files changed, 112 insertions(+), 93 deletions(-)
>
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index 47923c173f30..1b54e5f83342 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -16,6 +16,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>
>  /*
> @@ -152,7 +153,6 @@ int kmem_cache_shrink(struct kmem_cache *);
>
>  void memcg_create_kmem_cache(struct mem_cgroup *, struct kmem_cache *);
>  void memcg_deactivate_kmem_caches(struct mem_cgroup *);
> -void memcg_destroy_kmem_caches(struct mem_cgroup *);
>
>  /*
>   * Please use this macro to create slab caches. Simply specify the
> @@ -641,6 +641,7 @@ struct memcg_cache_params {
> struct mem_cgroup *memcg;
> struct list_head children_node;
> struct list_head kmem_caches_node;
> +   struct percpu_ref refcnt;
>
> void (*work_fn)(struct kmem_cache *);
> union {
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index b2c39f187cbb..9b27988c8969 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2610,12 +2610,13 @@ static void memcg_schedule_kmem_cache_create(struct mem_cgroup *memcg,
>  {
> struct memcg_kmem_cache_create_work *cw;
>
> +   if (!css_tryget_online(&memcg->css))
> +   return;
> +
> cw = kmalloc(sizeof(*cw), GFP_NOWAIT | __GFP_NOWARN);
> if (!cw)
> return;
>
> -   css_get(&memcg->css);
> -
> cw->memcg = memcg;
> cw->cachep = cachep;
> INIT_WORK(&cw->work, memcg_kmem_cache_create_func);
> @@ -2651,20 +2652,35 @@ struct kmem_cache *memcg_kmem_get_cache(struct kmem_cache *cachep)
> struct mem_cgroup *memcg;
> struct kmem_cache *memcg_cachep;
> int kmemcg_id;
> +   struct memcg_cache_array *arr;
>
>

Re: [PATCH v3 6/7] mm: reparent slab memory on cgroup removal

2019-05-10 Thread Shakeel Butt

From: Roman Gushchin 
Date: Wed, May 8, 2019 at 1:41 PM
To: Andrew Morton, Shakeel Butt
Cc: , ,
, Johannes Weiner, Michal Hocko, Rik van Riel,
Christoph Lameter, Vladimir Davydov, , Roman
Gushchin

> Let's reparent memcg slab memory on memcg offlining. This allows us
> to release the memory cgroup without waiting for the last outstanding
> kernel object (e.g. dentry used by another application).
>
> So instead of reparenting all accounted slab pages, let's do reparent
> a relatively small amount of kmem_caches. Reparenting is performed as
> a part of the deactivation process.
>
> Since the parent cgroup is already charged, everything we need to do
> is to splice the list of kmem_caches to the parent's kmem_caches list,
> swap the memcg pointer and drop the css refcounter for each kmem_cache
> and adjust the parent's css refcounter. Quite simple.
>
> Please, note that kmem_cache->memcg_params.memcg isn't a stable
> pointer anymore. It's safe to read it under rcu_read_lock() or
> with slab_mutex held.
>
> We can race with the slab allocation and deallocation paths. It's not
> a big problem: parent's charge and slab global stats are always
> correct, and we don't care anymore about the child usage and global
> stats. The child cgroup is already offline, so we don't use or show it
> anywhere.
>
> Local slab stats (NR_SLAB_RECLAIMABLE and NR_SLAB_UNRECLAIMABLE)
> aren't used anywhere except count_shadow_nodes(). But even there it
> won't break anything: after reparenting "nodes" will be 0 on child
> level (because we're already reparenting shrinker lists), and on
> parent level page stats always were 0, and this patch won't change
> anything.
>
> Signed-off-by: Roman Gushchin 
> ---
>  include/linux/slab.h |  4 ++--
>  mm/memcontrol.c  | 14 --
>  mm/slab.h| 14 +-
>  mm/slab_common.c | 23 ---
>  4 files changed, 39 insertions(+), 16 deletions(-)
>
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index 1b54e5f83342..109cab2ad9b4 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -152,7 +152,7 @@ void kmem_cache_destroy(struct kmem_cache *);
>  int kmem_cache_shrink(struct kmem_cache *);
>
>  void memcg_create_kmem_cache(struct mem_cgroup *, struct kmem_cache *);
> -void memcg_deactivate_kmem_caches(struct mem_cgroup *);
> +void memcg_deactivate_kmem_caches(struct mem_cgroup *, struct mem_cgroup *);
>
>  /*
>   * Please use this macro to create slab caches. Simply specify the
> @@ -638,7 +638,7 @@ struct memcg_cache_params {
> bool dying;
> };
> struct {
> -   struct mem_cgroup *memcg;
> +   struct mem_cgroup __rcu *memcg;
> struct list_head children_node;
> struct list_head kmem_caches_node;
> struct percpu_ref refcnt;
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 9b27988c8969..6e4d9ed16069 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3220,15 +3220,15 @@ static void memcg_offline_kmem(struct mem_cgroup *memcg)
>  */
> memcg->kmem_state = KMEM_ALLOCATED;
>
> -   memcg_deactivate_kmem_caches(memcg);
> -
> -   kmemcg_id = memcg->kmemcg_id;
> -   BUG_ON(kmemcg_id < 0);
> -
> parent = parent_mem_cgroup(memcg);
> if (!parent)
> parent = root_mem_cgroup;
>
> +   memcg_deactivate_kmem_caches(memcg, parent);
> +
> +   kmemcg_id = memcg->kmemcg_id;
> +   BUG_ON(kmemcg_id < 0);
> +
> /*
>  * Change kmemcg_id of this cgroup and all its descendants to the
>  * parent's id, and then move all entries from this cgroup's list_lrus
> @@ -3261,7 +3261,6 @@ static void memcg_free_kmem(struct mem_cgroup *memcg)
> if (memcg->kmem_state == KMEM_ALLOCATED) {
> WARN_ON(!list_empty(&memcg->kmem_caches));
> static_branch_dec(&memcg_kmem_enabled_key);
> -   WARN_ON(page_counter_read(&memcg->kmem));
> }
>  }
>  #else
> @@ -4673,6 +4672,9 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
>
> /* The following stuff does not apply to the root */
> if (!parent) {
> +#ifdef CONFIG_MEMCG_KMEM
> +   INIT_LIST_HEAD(&memcg->kmem_caches);
> +#endif
> root_mem_cgroup = memcg;
> return &memcg->css;
> }
> diff --git a/mm/slab.h b/mm/slab.h
> index 2acc68a7e0a0..acdc1810639d 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -264,10 +264,11 @@ static __always_inline int memcg_charge_slab(struct page *page,
> struct lruvec *lruvec;
> int ret;
>
> -   memcg = s->memcg_params.memcg;
> +   rcu_read_lock();
> +   memcg = rcu_dereference(s->memcg_params.memcg);
> ret = memcg_kmem_charge_memcg(page, gfp, order, memcg);

You cannot call memcg_kmem_charge_memcg() within rcu_read_lock(). You
need to css_tryget_online(), though I don't know
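
(The message is cut off at this point in the archive.) One way to read the suggestion, presumably because charging may block and blocking is not allowed inside an RCU read-side section, is: look the pointer up under RCU, pin it with a "try get" that fails for offline groups, leave the RCU section, and only then charge. The sketch below is a single-threaded model of the pinning step only. All names are hypothetical, and walking up to the parent when the group is offline is an assumption of this sketch, not something stated in the review.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a cgroup_subsys_state. */
struct fake_css {
    bool online;
    long refs;
    struct fake_css *parent;
};

/* Take a reference only if the group is still online. */
static bool css_tryget_online_model(struct fake_css *css)
{
    if (!css->online)
        return false;
    css->refs++;
    return true;
}

static void css_put_model(struct fake_css *css)
{
    css->refs--;
}

/*
 * Would run inside the RCU read-side section in real code: find the nearest
 * online ancestor and pin it, so the caller can drop RCU before charging.
 * Returns NULL if no group on the path is online.
 */
static struct fake_css *pin_online_ancestor(struct fake_css *css)
{
    while (css && !css_tryget_online_model(css))
        css = css->parent;
    return css;
}
```

The caller would perform the charge against the returned group outside the RCU section and then release it with css_put_model().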

Re: [PATCH v3 7/7] mm: fix /proc/kpagecgroup interface for slab pages

2019-05-10 Thread Shakeel Butt

From: Roman Gushchin 
Date: Wed, May 8, 2019 at 1:40 PM
To: Andrew Morton, Shakeel Butt
Cc: , ,
, Johannes Weiner, Michal Hocko, Rik van Riel,
Christoph Lameter, Vladimir Davydov, , Roman
Gushchin

> Switching to an indirect scheme of getting mem_cgroup pointer for
> !root slab pages broke /proc/kpagecgroup interface for them.
>
> Let's fix it by learning page_cgroup_ino() how to get memcg
> pointer for slab pages.
>
> Reported-by: Shakeel Butt 
> Signed-off-by: Roman Gushchin 
> ---
>  mm/memcontrol.c  |  5 -
>  mm/slab.h| 21 +
>  mm/slab_common.c |  1 +
>  3 files changed, 26 insertions(+), 1 deletion(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 6e4d9ed16069..8114838759f6 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -494,7 +494,10 @@ ino_t page_cgroup_ino(struct page *page)
> unsigned long ino = 0;
>
> rcu_read_lock();
> -   memcg = READ_ONCE(page->mem_cgroup);
> +   if (PageSlab(page))
> +   memcg = memcg_from_slab_page(page);
> +   else
> +   memcg = READ_ONCE(page->mem_cgroup);
> while (memcg && !(memcg->css.flags & CSS_ONLINE))
> memcg = parent_mem_cgroup(memcg);
> if (memcg)
> diff --git a/mm/slab.h b/mm/slab.h
> index acdc1810639d..cb684fbe2cc2 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -256,6 +256,22 @@ static inline struct kmem_cache *memcg_root_cache(struct kmem_cache *s)
> return s->memcg_params.root_cache;
>  }
>

Can you please document the preconditions of this function? It seems
the page must be PageSlab(), so why is it necessary to check PageTail()
and call compound_head()?


> +static inline struct mem_cgroup *memcg_from_slab_page(struct page *page)
> +{
> +   struct kmem_cache *s;
> +
> +   WARN_ON_ONCE(!rcu_read_lock_held());
> +
> +   if (PageTail(page))
> +   page = compound_head(page);
> +
> +   s = READ_ONCE(page->slab_cache);
> +   if (s && !is_root_cache(s))
> +   return rcu_dereference(s->memcg_params.memcg);
> +
> +   return NULL;
> +}
> +
>  static __always_inline int memcg_charge_slab(struct page *page,
>  gfp_t gfp, int order,
>  struct kmem_cache *s)
> @@ -338,6 +354,11 @@ static inline struct kmem_cache *memcg_root_cache(struct kmem_cache *s)
> return s;
>  }
>
> +static inline struct mem_cgroup *memcg_from_slab_page(struct page *page)
> +{
> +   return NULL;
> +}
> +
>  static inline int memcg_charge_slab(struct page *page, gfp_t gfp, int order,
> struct kmem_cache *s)
>  {
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 36673a43ed31..0cfdad0a0aac 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -253,6 +253,7 @@ static void memcg_unlink_cache(struct kmem_cache *s)
> list_del(&s->memcg_params.kmem_caches_node);
> mem_cgroup_put(rcu_dereference_protected(s->memcg_params.memcg,
> lockdep_is_held(&slab_mutex)));
> +   rcu_assign_pointer(s->memcg_params.memcg, NULL);
> }
>  }
>  #else
> --
> 2.20.1
>

Re: [PATCH v3 3/7] mm: introduce __memcg_kmem_uncharge_memcg()

2019-05-10 Thread Shakeel Butt

From: Roman Gushchin 
Date: Wed, May 8, 2019 at 1:30 PM
To: Andrew Morton, Shakeel Butt
Cc: , ,
, Johannes Weiner, Michal Hocko, Rik van Riel,
Christoph Lameter, Vladimir Davydov, , Roman
Gushchin

> Let's separate the page counter modification code out of
> __memcg_kmem_uncharge() in a way similar to what
> __memcg_kmem_charge() and __memcg_kmem_charge_memcg() work.
>
> This will allow to reuse this code later using a new
> memcg_kmem_uncharge_memcg() wrapper, which calls
> __memcg_kmem_unchare_memcg() if memcg_kmem_enabled()

__memcg_kmem_uncharge_memcg()

> check is passed.
>
> Signed-off-by: Roman Gushchin 

Reviewed-by: Shakeel Butt 
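
For readers skimming the thread, the split being reviewed here (a low-level helper that only adjusts page counters, plus a wrapper that converts an order into a page count and checks whether accounting is enabled) can be modelled as follows. This is a user-space sketch with hypothetical names; the three boolean parameters stand in for memcg_kmem_enabled(), !cgroup_subsys_on_dfl() and do_memsw_account() and are assumptions of the model.

```c
#include <stdbool.h>

/* Hypothetical stand-in: three page counters of one memory cgroup. */
struct fake_memcg {
    long kmem;
    long memory;
    long memsw;
};

/* Counter-only helper, modelled on __memcg_kmem_uncharge_memcg(). */
static void uncharge_memcg_model(struct fake_memcg *memcg,
                                 unsigned int nr_pages,
                                 bool legacy_hierarchy, bool memsw_accounting)
{
    if (legacy_hierarchy)
        memcg->kmem -= nr_pages;

    memcg->memory -= nr_pages;
    if (memsw_accounting)
        memcg->memsw -= nr_pages;
}

/* Wrapper, modelled on memcg_kmem_uncharge_memcg(): order -> page count. */
static void uncharge_wrapper_model(struct fake_memcg *memcg, int order,
                                   bool kmem_enabled,
                                   bool legacy_hierarchy,
                                   bool memsw_accounting)
{
    if (kmem_enabled)
        uncharge_memcg_model(memcg, 1u << order,
                             legacy_hierarchy, memsw_accounting);
}
```

An order of 2 therefore uncharges four pages, and nothing is touched when accounting is disabled.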


> ---
>  include/linux/memcontrol.h | 10 ++
>  mm/memcontrol.c| 25 +
>  2 files changed, 27 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 36bdfe8e5965..deb209510902 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -1298,6 +1298,8 @@ int __memcg_kmem_charge(struct page *page, gfp_t gfp, int order);
>  void __memcg_kmem_uncharge(struct page *page, int order);
>  int __memcg_kmem_charge_memcg(struct page *page, gfp_t gfp, int order,
>   struct mem_cgroup *memcg);
> +void __memcg_kmem_uncharge_memcg(struct mem_cgroup *memcg,
> +unsigned int nr_pages);
>
>  extern struct static_key_false memcg_kmem_enabled_key;
>  extern struct workqueue_struct *memcg_kmem_cache_wq;
> @@ -1339,6 +1341,14 @@ static inline int memcg_kmem_charge_memcg(struct page *page, gfp_t gfp,
> return __memcg_kmem_charge_memcg(page, gfp, order, memcg);
> return 0;
>  }
> +
> +static inline void memcg_kmem_uncharge_memcg(struct page *page, int order,
> +struct mem_cgroup *memcg)
> +{
> +   if (memcg_kmem_enabled())
> +   __memcg_kmem_uncharge_memcg(memcg, 1 << order);
> +}
> +
>  /*
>   * helper for accessing a memcg's index. It will be used as an index in the
>   * child cache array in kmem_cache, and also to derive its name. This 
> function
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 48a8f1c35176..b2c39f187cbb 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2750,6 +2750,22 @@ int __memcg_kmem_charge(struct page *page, gfp_t gfp, int order)
> css_put(&memcg->css);
> return ret;
>  }
> +
> +/**
> + * __memcg_kmem_uncharge_memcg: uncharge a kmem page
> + * @memcg: memcg to uncharge
> + * @nr_pages: number of pages to uncharge
> + */
> +void __memcg_kmem_uncharge_memcg(struct mem_cgroup *memcg,
> +unsigned int nr_pages)
> +{
> +   if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
> +   page_counter_uncharge(&memcg->kmem, nr_pages);
> +
> +   page_counter_uncharge(&memcg->memory, nr_pages);
> +   if (do_memsw_account())
> +   page_counter_uncharge(&memcg->memsw, nr_pages);
> +}
>  /**
>   * __memcg_kmem_uncharge: uncharge a kmem page
>   * @page: page to uncharge
> @@ -2764,14 +2780,7 @@ void __memcg_kmem_uncharge(struct page *page, int order)
> return;
>
> VM_BUG_ON_PAGE(mem_cgroup_is_root(memcg), page);
> -
> -   if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
> -   page_counter_uncharge(&memcg->kmem, nr_pages);
> -
> -   page_counter_uncharge(&memcg->memory, nr_pages);
> -   if (do_memsw_account())
> -   page_counter_uncharge(&memcg->memsw, nr_pages);
> -
> +   __memcg_kmem_uncharge_memcg(memcg, nr_pages);
> page->mem_cgroup = NULL;
>
> /* slab pages do not have PageKmemcg flag set */
> --
> 2.20.1
>

Re: [PATCH v3 4/7] mm: unify SLAB and SLUB page accounting

2019-05-10 Thread Shakeel Butt

From: Roman Gushchin 
Date: Wed, May 8, 2019 at 1:40 PM
To: Andrew Morton, Shakeel Butt
Cc: , ,
, Johannes Weiner, Michal Hocko, Rik van Riel,
Christoph Lameter, Vladimir Davydov, , Roman
Gushchin

> Currently the page accounting code is duplicated in SLAB and SLUB
> internals. Let's move it into new (un)charge_slab_page helpers
> in the slab_common.c file. These helpers will be responsible
> for statistics (global and memcg-aware) and memcg charging.
> So they are replacing direct memcg_(un)charge_slab() calls.
>
> Signed-off-by: Roman Gushchin 

Reviewed-by: Shakeel Butt 
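
The shape of the new helpers (pick the statistics index from the cache flags, update the statistic only when the charge succeeded, and undo both on free) can be modelled in a few lines. This is a sketch with hypothetical names: the charge_result field replaces the real memcg charge call, and a plain array replaces the vmstat counters.

```c
/* Hypothetical stand-ins for NR_SLAB_RECLAIMABLE / NR_SLAB_UNRECLAIMABLE. */
enum { STAT_RECLAIMABLE, STAT_UNRECLAIMABLE, STAT_COUNT };

/* Hypothetical stand-in for SLAB_RECLAIM_ACCOUNT. */
#define FLAG_RECLAIM_ACCOUNT 0x1u

struct fake_cache {
    unsigned int flags;
    int charge_result;   /* canned result of the memcg charge: 0 or -errno */
};

static long slab_stats[STAT_COUNT];

/* Same decision as cache_vmstat_idx() in the quoted diff. */
static int cache_stat_idx(const struct fake_cache *s)
{
    return (s->flags & FLAG_RECLAIM_ACCOUNT) ?
        STAT_RECLAIMABLE : STAT_UNRECLAIMABLE;
}

/* Charge first; account 2^order pages only if the charge succeeded. */
static int charge_slab_page_model(const struct fake_cache *s, int order)
{
    int ret = s->charge_result;

    if (!ret)
        slab_stats[cache_stat_idx(s)] += 1L << order;
    return ret;
}

/* Reverse the accounting for a page that is being freed. */
static void uncharge_slab_page_model(const struct fake_cache *s, int order)
{
    slab_stats[cache_stat_idx(s)] -= 1L << order;
}
```

Having one pair of helpers means SLAB and SLUB no longer need their own copies of the flag test and the counter update.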


> ---
>  mm/slab.c | 19 +++
>  mm/slab.h | 25 +
>  mm/slub.c | 14 ++
>  3 files changed, 30 insertions(+), 28 deletions(-)
>
> diff --git a/mm/slab.c b/mm/slab.c
> index 83000e46b870..32e6af9ed9af 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -1389,7 +1389,6 @@ static struct page *kmem_getpages(struct kmem_cache *cachep, gfp_t flags,
> int nodeid)
>  {
> struct page *page;
> -   int nr_pages;
>
> flags |= cachep->allocflags;
>
> @@ -1399,17 +1398,11 @@ static struct page *kmem_getpages(struct kmem_cache *cachep, gfp_t flags,
> return NULL;
> }
>
> -   if (memcg_charge_slab(page, flags, cachep->gfporder, cachep)) {
> +   if (charge_slab_page(page, flags, cachep->gfporder, cachep)) {
> __free_pages(page, cachep->gfporder);
> return NULL;
> }
>
> -   nr_pages = (1 << cachep->gfporder);
> -   if (cachep->flags & SLAB_RECLAIM_ACCOUNT)
> -   mod_lruvec_page_state(page, NR_SLAB_RECLAIMABLE, nr_pages);
> -   else
> -   mod_lruvec_page_state(page, NR_SLAB_UNRECLAIMABLE, nr_pages);
> -
> __SetPageSlab(page);
> /* Record if ALLOC_NO_WATERMARKS was set when allocating the slab */
> if (sk_memalloc_socks() && page_is_pfmemalloc(page))
> @@ -1424,12 +1417,6 @@ static struct page *kmem_getpages(struct kmem_cache *cachep, gfp_t flags,
>  static void kmem_freepages(struct kmem_cache *cachep, struct page *page)
>  {
> int order = cachep->gfporder;
> -   unsigned long nr_freed = (1 << order);
> -
> -   if (cachep->flags & SLAB_RECLAIM_ACCOUNT)
> -   mod_lruvec_page_state(page, NR_SLAB_RECLAIMABLE, -nr_freed);
> -   else
> -   mod_lruvec_page_state(page, NR_SLAB_UNRECLAIMABLE, -nr_freed);
>
> BUG_ON(!PageSlab(page));
> __ClearPageSlabPfmemalloc(page);
> @@ -1438,8 +1425,8 @@ static void kmem_freepages(struct kmem_cache *cachep, 
> struct page *page)
> page->mapping = NULL;
>
> if (current->reclaim_state)
> -   current->reclaim_state->reclaimed_slab += nr_freed;
> -   memcg_uncharge_slab(page, order, cachep);
> +   current->reclaim_state->reclaimed_slab += 1 << order;
> +   uncharge_slab_page(page, order, cachep);
> __free_pages(page, order);
>  }
>
> diff --git a/mm/slab.h b/mm/slab.h
> index 4a261c97c138..c9a31120fa1d 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -205,6 +205,12 @@ ssize_t slabinfo_write(struct file *file, const char 
> __user *buffer,
>  void __kmem_cache_free_bulk(struct kmem_cache *, size_t, void **);
>  int __kmem_cache_alloc_bulk(struct kmem_cache *, gfp_t, size_t, void **);
>
> +static inline int cache_vmstat_idx(struct kmem_cache *s)
> +{
> +   return (s->flags & SLAB_RECLAIM_ACCOUNT) ?
> +   NR_SLAB_RECLAIMABLE : NR_SLAB_UNRECLAIMABLE;
> +}
> +
>  #ifdef CONFIG_MEMCG_KMEM
>
>  /* List of all root caches. */
> @@ -352,6 +358,25 @@ static inline void memcg_link_cache(struct kmem_cache *s,
>
>  #endif /* CONFIG_MEMCG_KMEM */
>
> +static __always_inline int charge_slab_page(struct page *page,
> +   gfp_t gfp, int order,
> +   struct kmem_cache *s)
> +{
> +   int ret = memcg_charge_slab(page, gfp, order, s);
> +
> +   if (!ret)
> +   mod_lruvec_page_state(page, cache_vmstat_idx(s), 1 << order);
> +
> +   return ret;
> +}
> +
> +static __always_inline void uncharge_slab_page(struct page *page, int order,
> +  struct kmem_cache *s)
> +{
> +   mod_lruvec_page_state(page, cache_vmstat_idx(s), -(1 << order));
> +   memcg_uncharge_slab(page, order, s);
> +}
> +
>  static inline struct kmem_cache *cache_from_obj(struct kmem_cache *s, void 
> *x)
>  {
> struct kmem_cache *cachep;
> diff --git a/mm/slub.c b/mm/slub.c
> index 43c34d54ad86..9ec25a588bdd 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1494,7 +1494,7 @@ static inline struct page *alloc_slab_page(struct 
> kmem_cache *s,
> else
> page = __alloc_pages_node(node, flags, order);
>
> -   if (page && memcg_charge_slab(page, flags, order, s)) {
> +   if (page && charge_slab_page(page,

Re: [PATCH v3 2/7] mm: generalize postponed non-root kmem_cache deactivation

2019-05-10 Thread Shakeel Butt

From: Roman Gushchin 
Date: Wed, May 8, 2019 at 1:30 PM
To: Andrew Morton, Shakeel Butt
Cc: Johannes Weiner, Michal Hocko, Rik van Riel,
Christoph Lameter, Vladimir Davydov, Roman Gushchin

> Currently SLUB uses a work scheduled after an RCU grace period
> to deactivate a non-root kmem_cache. This mechanism can be reused
> for kmem_caches reparenting, but requires some generalization.
>
> Let's decouple all infrastructure (rcu callback, work callback)
> from the SLUB-specific code, so it can be used with SLAB as well.
>
> Also, let's rename some functions to make the code look simpler.
> All SLAB/SLUB-specific functions start with "__". Remove "deact_"
> prefix from the corresponding struct fields.
>
> Here is the graph of a new calling scheme:
> kmemcg_cache_deactivate()
>   __kmemcg_cache_deactivate()                  SLAB/SLUB-specific
>   kmemcg_schedule_work_after_rcu()             rcu
>     kmemcg_after_rcu_workfn()                  work
>       kmemcg_cache_deactivate_after_rcu()
>         __kmemcg_cache_deactivate_after_rcu()  SLAB/SLUB-specific
>
> instead of:
> __kmemcg_cache_deactivate()                    SLAB/SLUB-specific
>   slab_deactivate_memcg_cache_rcu_sched()      SLUB-only
>     kmemcg_deactivate_rcufn                    SLUB-only, rcu
>       kmemcg_deactivate_workfn                 SLUB-only, work
>         kmemcg_cache_deact_after_rcu()         SLUB-only
>
> Signed-off-by: Roman Gushchin 

Reviewed-by: Shakeel Butt 
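
The generalized scheme boils down to "record a callback, run it once later, then clear it". A minimal user-space sketch of that shape follows. All names ending in `_model` are hypothetical, the RCU grace period and the workqueue are not modelled at all, and the "already pending" guard is an assumption added for illustration rather than something shown in the quoted hunk.

```c
#include <stddef.h>

struct cache_model;
typedef void (*work_fn_t)(struct cache_model *);

struct cache_model {
	work_fn_t work_fn;	/* pending callback, NULL when idle */
	int deactivated;	/* set by the allocator-specific hook */
};

/* Stand-in for an allocator-specific __kmemcg_cache_deactivate_after_rcu(). */
void deactivate_after_rcu_model(struct cache_model *s)
{
	s->deactivated = 1;
}

/* Stand-in for kmemcg_schedule_work_after_rcu(): only records the callback. */
int schedule_after_rcu_model(struct cache_model *s, work_fn_t fn)
{
	if (s->work_fn)
		return -1;	/* assumption: refuse while work is pending */
	s->work_fn = fn;
	return 0;
}

/* Stand-in for kmemcg_after_rcu_workfn(): run the callback, then clear it. */
void after_rcu_workfn_model(struct cache_model *s)
{
	if (!s->work_fn)
		return;
	s->work_fn(s);
	s->work_fn = NULL;
}
```

Because the generic part only knows about a function pointer, the same plumbing can later carry a different callback (for example one that reparents a cache) without touching the SLAB or SLUB internals.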

> ---
>  include/linux/slab.h |  6 ++---
>  mm/slab.c|  4 +++
>  mm/slab.h|  3 ++-
>  mm/slab_common.c | 62 
>  mm/slub.c|  8 +-
>  5 files changed, 38 insertions(+), 45 deletions(-)
>
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index 9449b19c5f10..47923c173f30 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -642,10 +642,10 @@ struct memcg_cache_params {
> struct list_head children_node;
> struct list_head kmem_caches_node;
>
> -   void (*deact_fn)(struct kmem_cache *);
> +   void (*work_fn)(struct kmem_cache *);
> union {
> -   struct rcu_head deact_rcu_head;
> -   struct work_struct deact_work;
> +   struct rcu_head rcu_head;
> +   struct work_struct work;
> };
> };
> };
> diff --git a/mm/slab.c b/mm/slab.c
> index f6eff59e018e..83000e46b870 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -2281,6 +2281,10 @@ void __kmemcg_cache_deactivate(struct kmem_cache 
> *cachep)
>  {
> __kmem_cache_shrink(cachep);
>  }
> +
> +void __kmemcg_cache_deactivate_after_rcu(struct kmem_cache *s)
> +{
> +}
>  #endif
>
>  int __kmem_cache_shutdown(struct kmem_cache *cachep)
> diff --git a/mm/slab.h b/mm/slab.h
> index 6a562ca72bca..4a261c97c138 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -172,6 +172,7 @@ int __kmem_cache_shutdown(struct kmem_cache *);
>  void __kmem_cache_release(struct kmem_cache *);
>  int __kmem_cache_shrink(struct kmem_cache *);
>  void __kmemcg_cache_deactivate(struct kmem_cache *s);
> +void __kmemcg_cache_deactivate_after_rcu(struct kmem_cache *s);
>  void slab_kmem_cache_release(struct kmem_cache *);
>
>  struct seq_file;
> @@ -291,7 +292,7 @@ static __always_inline void memcg_uncharge_slab(struct 
> page *page, int order,
>  extern void slab_init_memcg_params(struct kmem_cache *);
>  extern void memcg_link_cache(struct kmem_cache *s, struct mem_cgroup *memcg);
>  extern void slab_deactivate_memcg_cache_rcu_sched(struct kmem_cache *s,
> -   void (*deact_fn)(struct kmem_cache *));
> +   void (*work_fn)(struct kmem_cache *));
>
>  #else /* CONFIG_MEMCG_KMEM */
>
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 6e00bdf8618d..4e5b4292a763 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -691,17 +691,18 @@ void memcg_create_kmem_cache(struct mem_cgroup *memcg,
> put_online_cpus();
>  }
>
> -static void kmemcg_deactivate_workfn(struct work_struct *work)
> +static void kmemcg_after_rcu_workfn(struct work_struct *work)
>  {
> struct kmem_cache *s = container_of(work, struct kmem_cache,
> -   memcg_params.deact_work);
> +   memcg_params.work);
>
> get_online_cpus();
> get_online_mems();
>
> mutex_lock(&slab_mutex);
>
> -   s->memcg_params.deact_fn(s);
> +   s->memcg_params.work_fn(s);
> +   s->memcg_params.work_fn = NULL;
>
> mutex_unlock(&slab_mutex);
>
> @@ -712,37 +713,28 @@ static void kmemcg_deactivate_workfn(struct work_struct 
> *work)
> css_put(&s->memcg_params.memcg->css);
>  }
>
> -static void kmemcg_deactivate_rcufn(struct rcu_head *head)
>

Re: [PATCH 4/4] Revert "platform/chrome: cros_ec_spi: Transfer messages at high priority"

2019-05-10 Thread Guenter Roeck

From: Douglas Anderson 
Date: Fri, May 10, 2019 at 3:35 PM
To: Mark Brown, Benson Leung, Enric Balletbo i Serra
Cc: Guenter Roeck, Douglas Anderson

> This reverts commit 37a186225a0c020516bafad2727fdcdfc039a1e4.
>
> We have a better solution in the patch ("platform/chrome: cros_ec_spi:
> Set ourselves as timing sensitive").  Let's revert the uglier and less
> reliable solution.
>
> Signed-off-by: Douglas Anderson 

Reviewed-by: Guenter Roeck 

> ---
>
>  drivers/platform/chrome/cros_ec_spi.c | 80 ++-
>  1 file changed, 6 insertions(+), 74 deletions(-)
>
> diff --git a/drivers/platform/chrome/cros_ec_spi.c 
> b/drivers/platform/chrome/cros_ec_spi.c
> index 757a115502ec..70ff1ad09012 100644
> --- a/drivers/platform/chrome/cros_ec_spi.c
> +++ b/drivers/platform/chrome/cros_ec_spi.c
> @@ -75,27 +75,6 @@ struct cros_ec_spi {
> unsigned int end_of_msg_delay;
>  };
>
> -typedef int (*cros_ec_xfer_fn_t) (struct cros_ec_device *ec_dev,
> - struct cros_ec_command *ec_msg);
> -
> -/**
> - * struct cros_ec_xfer_work_params - params for our high priority workers
> - *
> - * @work: The work_struct needed to queue work
> - * @fn: The function to use to transfer
> - * @ec_dev: ChromeOS EC device
> - * @ec_msg: Message to transfer
> - * @ret: The return value of the function
> - */
> -
> -struct cros_ec_xfer_work_params {
> -   struct work_struct work;
> -   cros_ec_xfer_fn_t fn;
> -   struct cros_ec_device *ec_dev;
> -   struct cros_ec_command *ec_msg;
> -   int ret;
> -};
> -
>  static void debug_packet(struct device *dev, const char *name, u8 *ptr,
>  int len)
>  {
> @@ -371,13 +350,13 @@ static int cros_ec_spi_receive_response(struct 
> cros_ec_device *ec_dev,
>  }
>
>  /**
> - * do_cros_ec_pkt_xfer_spi - Transfer a packet over SPI and receive the reply
> + * cros_ec_pkt_xfer_spi - Transfer a packet over SPI and receive the reply
>   *
>   * @ec_dev: ChromeOS EC device
>   * @ec_msg: Message to transfer
>   */
> -static int do_cros_ec_pkt_xfer_spi(struct cros_ec_device *ec_dev,
> -  struct cros_ec_command *ec_msg)
> +static int cros_ec_pkt_xfer_spi(struct cros_ec_device *ec_dev,
> +   struct cros_ec_command *ec_msg)
>  {
> struct ec_host_response *response;
> struct cros_ec_spi *ec_spi = ec_dev->priv;
> @@ -514,13 +493,13 @@ static int do_cros_ec_pkt_xfer_spi(struct 
> cros_ec_device *ec_dev,
>  }
>
>  /**
> - * do_cros_ec_cmd_xfer_spi - Transfer a message over SPI and receive the 
> reply
> + * cros_ec_cmd_xfer_spi - Transfer a message over SPI and receive the reply
>   *
>   * @ec_dev: ChromeOS EC device
>   * @ec_msg: Message to transfer
>   */
> -static int do_cros_ec_cmd_xfer_spi(struct cros_ec_device *ec_dev,
> -  struct cros_ec_command *ec_msg)
> +static int cros_ec_cmd_xfer_spi(struct cros_ec_device *ec_dev,
> +   struct cros_ec_command *ec_msg)
>  {
> struct cros_ec_spi *ec_spi = ec_dev->priv;
> struct spi_transfer trans;
> @@ -632,53 +611,6 @@ static int do_cros_ec_cmd_xfer_spi(struct cros_ec_device 
> *ec_dev,
> return ret;
>  }
>
> -static void cros_ec_xfer_high_pri_work(struct work_struct *work)
> -{
> -   struct cros_ec_xfer_work_params *params;
> -
> -   params = container_of(work, struct cros_ec_xfer_work_params, work);
> -   params->ret = params->fn(params->ec_dev, params->ec_msg);
> -}
> -
> -static int cros_ec_xfer_high_pri(struct cros_ec_device *ec_dev,
> -struct cros_ec_command *ec_msg,
> -cros_ec_xfer_fn_t fn)
> -{
> -   struct cros_ec_xfer_work_params params;
> -
> -   INIT_WORK_ONSTACK(&params.work, cros_ec_xfer_high_pri_work);
> -   params.ec_dev = ec_dev;
> -   params.ec_msg = ec_msg;
> -   params.fn = fn;
> -
> -   /*
> -* This looks a bit ridiculous.  Why do the work on a
> -* different thread if we're just going to block waiting for
> -* the thread to finish?  The key here is that the thread is
> -* running at high priority but the calling context might not
> -* be.  We need to be at high priority to avoid getting
> -* context switched out for too long and the EC giving up on
> -* the transfer.
> -*/
> -   queue_work(system_highpri_wq, &params.work);
> -   flush_work(&params.work);
> -   destroy_work_on_stack(&params.work);
> -
> -   return params.ret;
> -}
> -
> -static int cros_ec_pkt_xfer_spi(struct cros_ec_device *ec_dev,
> -   struct cros_ec_command *ec_msg)
> -{
> -   return cros_ec_xfer_high_pri(ec_dev, ec_msg, do_cros_ec_pkt_xfer_spi);
> -}
> -
> -static int cros_ec_cmd_xfer_spi(struct cros_ec_device *ec_dev,
> -   struct cros_ec_command *ec_msg)
> -{
> -   return cros_ec_xfer_high_pri(ec_dev, ec_msg,

Re: [PATCH 2/4] spi: Allow SPI devices to specify that they are timing sensitive

2019-05-10 Thread Guenter Roeck

From: Douglas Anderson 
Date: Fri, May 10, 2019 at 3:35 PM
To: Mark Brown, Benson Leung, Enric Balletbo i Serra
Cc: Guenter Roeck, Douglas Anderson

> If a device on the SPI bus is very sensitive to timing then it may be
> necessary (for correctness) not to get interrupted during a transfer.
> One example is the EC (Embedded Controller) on Chromebooks.  The
> Chrome OS EC will drop a transfer if more than ~8ms passes between the
> chip select being asserted and the transfer finishing.
>
> The SPI framework already has code to handle the case where transfers
> are timing sensitive.  It can set its message pumping thread to
> realtime to minimize interruptions during the transfer.  However,
> at the moment, this mode can only be requested by a SPI controller.
> Let's allow the drivers for SPI devices to also request this mode.
>
> NOTE: at the moment if a given device on a bus says that it's timing
> sensitive then we'll pump all messages on that bus at high priority.
> It is possible we might want to relax this in the future but it seems
> like it should be fine for now.
>
> Signed-off-by: Douglas Anderson 

Nitpick: I would use 'rt' instead of 'timing_sensitive' as name for the
new variable.

Otherwise:

Reviewed-by: Guenter Roeck 
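
To make the bus-wide effect in the NOTE concrete, here is a small user-space model of the check this patch adds to spi_setup(). It is a sketch only: `controller_model`, `device_model`, `boost_model()` and `setup_model()` are hypothetical names, and the scheduler call is replaced by a counter.

```c
#include <stdbool.h>

struct controller_model {
	bool rt;	/* message pump runs at realtime priority */
	int boosts;	/* how often the priority boost was applied */
};

struct device_model {
	bool timing_sensitive;
	struct controller_model *controller;
};

/* Stand-in for spi_boost_thread_priority(); the real one calls the scheduler. */
void boost_model(struct controller_model *ctlr)
{
	ctlr->boosts++;
}

/* Models the new check: boost once, the first time a sensitive device appears. */
void setup_model(struct device_model *dev)
{
	if (dev->timing_sensitive && !dev->controller->rt) {
		dev->controller->rt = true;
		boost_model(dev->controller);
	}
}
```

Once any device on the controller asks for it, `rt` stays set for the whole bus, so later devices (sensitive or not) share the realtime pump thread and the boost is applied only once.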

> ---
>
>  drivers/spi/spi.c   | 34 --
>  include/linux/spi/spi.h |  3 +++
>  2 files changed, 31 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
> index 0597f7086de3..d117ab3adafa 100644
> --- a/drivers/spi/spi.c
> +++ b/drivers/spi/spi.c
> @@ -1367,10 +1367,30 @@ static void spi_pump_messages(struct kthread_work 
> *work)
> __spi_pump_messages(ctlr, true);
>  }
>
> -static int spi_init_queue(struct spi_controller *ctlr)
> +/**
> + * spi_boost_thread_priority - set the controller to pump at realtime 
> priority
> + * @ctlr: controller to boost priority of
> + *
> + * This can be called because the controller requested realtime priority
> + * (by setting the ->rt value before calling spi_register_controller()) or
> + * because a device on the bus said that its transfers were timing sensitive.
> + *
> + * NOTE: at the moment if any device on a bus says it is timing sensitive 
> then
> + * all the devices on this bus will do transfers at realtime priority.  If
> + * this eventually becomes a problem we may see if we can find a way to boost
> + * the priority only temporarily during relevant transfers.
> + */
> +static void spi_boost_thread_priority(struct spi_controller *ctlr)
>  {
> struct sched_param param = { .sched_priority = MAX_RT_PRIO - 1 };
>
> +   dev_info(&ctlr->dev,
> +   "will run message pump with realtime priority\n");
> +   sched_setscheduler(ctlr->kworker_task, SCHED_FIFO, &param);
> +}
> +
> +static int spi_init_queue(struct spi_controller *ctlr)
> +{
> ctlr->running = false;
> ctlr->busy = false;
>
> @@ -1390,11 +1410,8 @@ static int spi_init_queue(struct spi_controller *ctlr)
>  * request and the scheduling of the message pump thread. Without this
>  * setting the message pump thread will remain at default priority.
>  */
> -   if (ctlr->rt) {
> -   dev_info(&ctlr->dev,
> -   "will run message pump with realtime priority\n");
> -   sched_setscheduler(ctlr->kworker_task, SCHED_FIFO, &param);
> -   }
> +   if (ctlr->rt)
> +   spi_boost_thread_priority(ctlr);
>
> return 0;
>  }
> @@ -2985,6 +3002,11 @@ int spi_setup(struct spi_device *spi)
>
> spi_set_cs(spi, false);
>
> +   if (spi->timing_sensitive && !spi->controller->rt) {
> +   spi->controller->rt = true;
> +   spi_boost_thread_priority(spi->controller);
> +   }
> +
> dev_dbg(&spi->dev, "setup mode %d, %s%s%s%s%u bits/w, %u Hz max --> 
> %d\n",
> (int) (spi->mode & (SPI_CPOL | SPI_CPHA)),
> (spi->mode & SPI_CS_HIGH) ? "cs_high, " : "",
> diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h
> index 053abd22ad31..ef6bdd4d25f2 100644
> --- a/include/linux/spi/spi.h
> +++ b/include/linux/spi/spi.h
> @@ -109,6 +109,8 @@ void spi_statistics_add_transfer_stats(struct 
> spi_statistics *stats,
>   * This may be changed by the device's driver, or left at the
>   * default (0) indicating protocol words are eight bit bytes.
>   * The spi_transfer.bits_per_word can override this for each transfer.
> + * @timing_sensitive: Transfers for this device are sensitive to timing
> + * so we should do our transfer at high priority.
>   * @irq: Negative, or the number passed to request_irq() to receive
>   * interrupts from this device.
>   * @controller_state: Controller's runtime state
> @@ -143,6 +145,7 @@ struct spi_device {
> u32 max_speed_hz;
> u8  chip_select;
> u8  bits_per_word;
> +   bool

Re: [PATCH 3/4] platform/chrome: cros_ec_spi: Set ourselves as timing sensitive

2019-05-10 Thread Guenter Roeck

From: Douglas Anderson 
Date: Fri, May 10, 2019 at 3:35 PM
To: Mark Brown, Benson Leung, Enric Balletbo i Serra
Cc: Guenter Roeck, Douglas Anderson

> All currently known ECs in the wild are very sensitive to timing.
> Specifically the ECs are known to drop a transfer if more than 8 ms
> passes from the assertion of the chip select until the transfer
> finishes.
>
> Let's use the new feature introduced in the patch ("spi: Allow SPI
> devices to specify that they are timing sensitive") to specify this
> and increase the success rate of our transfers.
>
> NOTE: if future Chrome OS ECs ever fix themselves to be less sensitive
> then we could consider adding a property (or compatible string) to not
> set this property.  For now we need it across the board.
>
> With this change we can revert the commit 37a186225a0c
> ("platform/chrome: cros_ec_spi: Transfer messages at high priority").
> ...and, in fact, transfers are _even more_ reliable than they were
> with that commit since the SPI framework will use a higher priority
> (realtime) and we no longer lose our priority when we get shunted over
> to the message pumping thread (because we now always get shunted and
> the thread is high priority).
>
> Signed-off-by: Douglas Anderson 

Reviewed-by: Guenter Roeck 

> ---
>
>  drivers/platform/chrome/cros_ec_spi.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/platform/chrome/cros_ec_spi.c 
> b/drivers/platform/chrome/cros_ec_spi.c
> index 8e9451720e73..757a115502ec 100644
> --- a/drivers/platform/chrome/cros_ec_spi.c
> +++ b/drivers/platform/chrome/cros_ec_spi.c
> @@ -703,6 +703,7 @@ static int cros_ec_spi_probe(struct spi_device *spi)
>
> spi->bits_per_word = 8;
> spi->mode = SPI_MODE_0;
> +   spi->timing_sensitive = true;
> err = spi_setup(spi);
> if (err < 0)
> return err;
> --
> 2.21.0.1020.gf2820cf01a-goog
>

Re: [PATCH v3 1/7] mm: postpone kmem_cache memcg pointer initialization to memcg_link_cache()

2019-05-10 Thread Shakeel Butt

From: Roman Gushchin 
Date: Wed, May 8, 2019 at 1:30 PM
To: Andrew Morton, Shakeel Butt
Cc: Johannes Weiner, Michal Hocko, Rik van Riel,
Christoph Lameter, Vladimir Davydov, Roman Gushchin

> Initialize kmem_cache->memcg_params.memcg pointer in
> memcg_link_cache() rather than in init_memcg_params().
>
> Once kmem_cache will hold a reference to the memory cgroup,
> it will simplify the refcounting.
>
> For non-root kmem_caches memcg_link_cache() is always called
> before the kmem_cache becomes visible to a user, so it's safe.
>
> Signed-off-by: Roman Gushchin 

Reviewed-by: Shakeel Butt 
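
The ordering argument in the commit message (the owner pointer is set at link time, which always happens before the cache is visible) can be illustrated with a tiny user-space model. The `_model` names below are hypothetical and the list manipulation is reduced to a `linked` flag; this is a sketch, not the kernel implementation.

```c
#include <stddef.h>

struct memcg_model { int id; };

struct cache_model {
	struct cache_model *root_cache;	/* NULL for a root cache */
	struct memcg_model *memcg;	/* owner; set only at link time */
	int linked;			/* visible on the cache lists */
};

/* Stand-in for init_memcg_params(): it no longer touches ->memcg. */
void init_params_model(struct cache_model *s, struct cache_model *root)
{
	s->root_cache = root;
	s->memcg = NULL;
	s->linked = 0;
}

/* Stand-in for memcg_link_cache(): root caches ignore the memcg argument. */
void link_cache_model(struct cache_model *s, struct memcg_model *memcg)
{
	if (s->root_cache)
		s->memcg = memcg;
	s->linked = 1;
}
```

In this model a non-root cache never has `linked` set while `memcg` is still NULL, which is the invariant the commit message relies on.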


> ---
>  mm/slab.c|  2 +-
>  mm/slab.h|  5 +++--
>  mm/slab_common.c | 14 +++---
>  mm/slub.c|  2 +-
>  4 files changed, 12 insertions(+), 11 deletions(-)
>
> diff --git a/mm/slab.c b/mm/slab.c
> index 2915d912e89a..f6eff59e018e 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -1268,7 +1268,7 @@ void __init kmem_cache_init(void)
>   nr_node_ids * sizeof(struct kmem_cache_node 
> *),
>   SLAB_HWCACHE_ALIGN, 0, 0);
> list_add(&kmem_cache->list, &slab_caches);
> -   memcg_link_cache(kmem_cache);
> +   memcg_link_cache(kmem_cache, NULL);
> slab_state = PARTIAL;
>
> /*
> diff --git a/mm/slab.h b/mm/slab.h
> index 43ac818b8592..6a562ca72bca 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -289,7 +289,7 @@ static __always_inline void memcg_uncharge_slab(struct 
> page *page, int order,
>  }
>
>  extern void slab_init_memcg_params(struct kmem_cache *);
> -extern void memcg_link_cache(struct kmem_cache *s);
> +extern void memcg_link_cache(struct kmem_cache *s, struct mem_cgroup *memcg);
>  extern void slab_deactivate_memcg_cache_rcu_sched(struct kmem_cache *s,
> void (*deact_fn)(struct kmem_cache *));
>
> @@ -344,7 +344,8 @@ static inline void slab_init_memcg_params(struct 
> kmem_cache *s)
>  {
>  }
>
> -static inline void memcg_link_cache(struct kmem_cache *s)
> +static inline void memcg_link_cache(struct kmem_cache *s,
> +   struct mem_cgroup *memcg)
>  {
>  }
>
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 58251ba63e4a..6e00bdf8618d 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -140,13 +140,12 @@ void slab_init_memcg_params(struct kmem_cache *s)
>  }
>
>  static int init_memcg_params(struct kmem_cache *s,
> -   struct mem_cgroup *memcg, struct kmem_cache *root_cache)
> +struct kmem_cache *root_cache)
>  {
> struct memcg_cache_array *arr;
>
> if (root_cache) {
> s->memcg_params.root_cache = root_cache;
> -   s->memcg_params.memcg = memcg;
> INIT_LIST_HEAD(&s->memcg_params.children_node);
> INIT_LIST_HEAD(&s->memcg_params.kmem_caches_node);
> return 0;
> @@ -221,11 +220,12 @@ int memcg_update_all_caches(int num_memcgs)
> return ret;
>  }
>
> -void memcg_link_cache(struct kmem_cache *s)
> +void memcg_link_cache(struct kmem_cache *s, struct mem_cgroup *memcg)
>  {
> if (is_root_cache(s)) {
> list_add(&s->root_caches_node, &slab_root_caches);
> } else {
> +   s->memcg_params.memcg = memcg;
> list_add(&s->memcg_params.children_node,
>  &s->memcg_params.root_cache->memcg_params.children);
> list_add(&s->memcg_params.kmem_caches_node,
> @@ -244,7 +244,7 @@ static void memcg_unlink_cache(struct kmem_cache *s)
>  }
>  #else
>  static inline int init_memcg_params(struct kmem_cache *s,
> -   struct mem_cgroup *memcg, struct kmem_cache *root_cache)
> +   struct kmem_cache *root_cache)
>  {
> return 0;
>  }
> @@ -384,7 +384,7 @@ static struct kmem_cache *create_cache(const char *name,
> s->useroffset = useroffset;
> s->usersize = usersize;
>
> -   err = init_memcg_params(s, memcg, root_cache);
> +   err = init_memcg_params(s, root_cache);
> if (err)
> goto out_free_cache;
>
> @@ -394,7 +394,7 @@ static struct kmem_cache *create_cache(const char *name,
>
> s->refcount = 1;
> list_add(&s->list, &slab_caches);
> -   memcg_link_cache(s);
> +   memcg_link_cache(s, memcg);
>  out:
> if (err)
> return ERR_PTR(err);
> @@ -997,7 +997,7 @@ struct kmem_cache *__init create_kmalloc_cache(const char 
> *name,
>
> create_boot_cache(s, name, size, flags, useroffset, usersize);
> list_add(&s->list, &slab_caches);
> -   memcg_link_cache(s);
> +   memcg_link_cache(s, NULL);
> s->refcount = 1;
> return s;
>  }
> diff --git a/mm/slub.c b/mm/slub.c
> index 5b2e364102e1..16f7e4f5a141 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -4219,7 +4219,7 @@ static struct kmem_cache * __init bootstrap(struct 
> kmem_cache *static_cache)
> }
>

Re: [PATCH v3 0/7] mm: reparent slab memory on cgroup removal

2019-05-10 Thread Shakeel Butt

From: Roman Gushchin 
Date: Wed, May 8, 2019 at 1:30 PM
To: Andrew Morton, Shakeel Butt
Cc: Johannes Weiner, Michal Hocko, Rik van Riel,
Christoph Lameter, Vladimir Davydov, Roman Gushchin

> # Why do we need this?
>
> We've noticed that the number of dying cgroups is steadily growing on most
> of our hosts in production. The following investigation revealed an issue
> in userspace memory reclaim code [1], accounting of kernel stacks [2],
> and also the main reason: slab objects.
>
> The underlying problem is quite simple: any page charged
> to a cgroup holds a reference to it, so the cgroup can't be reclaimed unless
> all charged pages are gone. If a slab object is actively used by other cgroups,
> it won't be reclaimed, and will prevent the origin cgroup from being reclaimed.
>
> Slab objects, first of all the vfs cache, are shared between cgroups that
> use the same underlying fs, and, even more importantly, between multiple
> generations of the same workload. So if something runs periodically, each
> time in a new cgroup (like how systemd works), we accumulate multiple
> dying cgroups.
>
> Strictly speaking pagecache isn't different here, but there is a key difference:
> we disable protection and apply some extra pressure on LRUs of dying cgroups,

How do you apply extra pressure on dying cgroups? cgroup-v2 does not
have memory.force_empty.


> and these LRUs contain all charged pages.
> My experiments show that with the disabled kernel memory accounting the number
> of dying cgroups stabilizes at a relatively small number (~100, depends on
> memory pressure and cgroup creation rate), and with kernel memory accounting
> it grows pretty steadily up to several thousands.
>
> Memory cgroups are quite complex and big objects (mostly due to percpu stats),
> so it leads to noticeable memory losses. Memory occupied by dying cgroups
> is measured in hundreds of megabytes. I've even seen a host with more than 
> 100Gb
> of memory wasted for dying cgroups. It leads to a degradation of performance
> with the uptime, and generally limits the usage of cgroups.
>
> My previous attempt [3] to fix the problem by applying extra pressure on slab
> shrinker lists caused regressions with xfs and ext4, and has been reverted [4].
> The following attempts to find the right balance [5, 6] were not successful.
>
> So instead of trying to find a maybe non-existing balance, let's reparent
> the accounted slabs to the parent cgroup on cgroup removal.
>
>
> # Implementation approach
>
> There is however a significant problem with reparenting of slab memory:
> there is no list of charged pages. Some of them are in shrinker lists,
> but not all. Introducing a new list is really not an option.
>
> But fortunately there is a way forward: every slab page has a stable pointer
> to the corresponding kmem_cache. So the idea is to reparent kmem_caches
> instead of slab pages.
>
> It's actually simpler and cheaper, but requires some underlying changes:
> 1) Make kmem_caches hold a single reference to the memory cgroup,
>instead of a separate reference per every slab page.
> 2) Stop setting page->mem_cgroup pointer for memcg slab pages and use
>page->kmem_cache->memcg indirection instead. It's used only on
>slab page release, so it shouldn't be a big issue.
> 3) Introduce a refcounter for non-root slab caches. It's required to
>be able to destroy kmem_caches when they become empty and release
>the associated memory cgroup.
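
The three changes listed above can be pictured with a small reference-counting model. This is a user-space sketch under assumptions, not the series' implementation: the `_model` names are hypothetical, the cache's own refcounter from item 3 is left out, and the cgroup is reduced to a counter plus a parent pointer.

```c
#include <stddef.h>

struct memcg_model {
	int refs;			/* references pinning this cgroup */
	struct memcg_model *parent;
};

struct cache_model {
	struct memcg_model *memcg;	/* the single owner reference */
	long pages;			/* slab pages currently in the cache */
};

/* The cache takes exactly one reference on its cgroup when it is linked. */
void cache_link_model(struct cache_model *c, struct memcg_model *m)
{
	c->memcg = m;
	c->pages = 0;
	m->refs++;
}

/* Page allocation touches only the cache, never the cgroup refcount. */
void cache_add_pages_model(struct cache_model *c, long n)
{
	c->pages += n;
}

/* On cgroup removal, move the cache's single reference to the parent. */
void cache_reparent_model(struct cache_model *c)
{
	struct memcg_model *old = c->memcg;

	if (!old || !old->parent)
		return;
	old->parent->refs++;
	c->memcg = old->parent;
	old->refs--;
}
```

Because pages are reached through the cache rather than holding their own references, moving one pointer per cache is enough to unpin the dying cgroup regardless of how many slab pages are still in use.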
>
> There is a bonus: currently we do release empty kmem_caches on cgroup
> removal, however all others wait for the memory cgroup to be released.
> These refactorings allow kmem_caches to be released as soon as they
> become inactive and free.
>
> Some additional implementation details are provided in corresponding
> commit messages.
>
>
> # Results
>
> Below is the average number of dying cgroups on two groups of our production
> hosts. They do run some sort of web frontend workload, the memory pressure
> is moderate. As we can see, with the kernel memory reparenting the number
> stabilizes in 50s range; however with the original version it grows almost
> linearly and doesn't show any signs of plateauing. The difference in slab
> and percpu usage between patched and unpatched versions also grows linearly.
> In 6 days it reached 200Mb.
>
> day            0    1    2    3    4    5    6
> original      39  338  580  827 1098 1349 1574
> patched       23   44   45   47   50   46   55
> mem diff(Mb)  53   73   99  137  148  182  209
>
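
The "almost linearly" claim can be checked against the table with a few lines of C. This is just arithmetic on the quoted numbers; the array and function names are made up for the example.

```c
/* Dying-cgroup counts for days 0..6, copied from the table above. */
const int original_counts[7] = { 39, 338, 580, 827, 1098, 1349, 1574 };
const int patched_counts[7]  = { 23,  44,  45,  47,   50,   46,   55 };

/* Average per-day increase between the first and the last sample. */
double avg_daily_growth(const int *counts, int n)
{
	return (double)(counts[n - 1] - counts[0]) / (n - 1);
}
```

For the table above this gives (1574 - 39) / 6, roughly 256 additional dying cgroups per day on the original kernel, against (55 - 23) / 6, roughly 5 per day, with the patches applied.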
>
> # History
>
> v3:
>   1) reworked memcg kmem_cache search on allocation path
>   2) fixed /proc/kpagecgroup interface
>
> v2:
>   1) switched to percpu kmem_cache refcounter
>   2) a reference to kmem_cache is held during the allocation
>   3) slabs stats are fixed for !MEMCG case (and the refactoring
>  is separated into a standalone patch)
>   4) kmem_cache reparenting is performed from

Re: [PATCH 1/4] spi: For controllers that need realtime always use the pump thread

2019-05-10 Thread Guenter Roeck

From: Douglas Anderson 
Date: Fri, May 10, 2019 at 3:35 PM
To: Mark Brown, Benson Leung, Enric Balletbo i Serra
Cc: Guenter Roeck, Douglas Anderson

> If a controller specifies that it needs high priority for sending
> messages we should always schedule our transfers on the thread.  If we
> don't do this we'll do the transfer in the caller's context which
> might not be very high priority.
>
> Signed-off-by: Douglas Anderson 

Reviewed-by: Guenter Roeck 
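
The new condition is small enough to state as a pure function. The sketch below uses a hypothetical name and plain booleans in place of the controller fields; it only restates the `if` from the hunk that follows.

```c
#include <stdbool.h>

/*
 * Returns true when the caller must hand the transfer to the pump thread
 * instead of doing it in its own context.
 */
bool must_defer_to_pump_thread(bool idling, bool rt, bool in_kthread)
{
	return idling || (rt && !in_kthread);
}
```

The `!in_kthread` term matters: without it the pump thread itself would requeue the work forever on a realtime controller instead of performing the transfer.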

> ---
>
>  drivers/spi/spi.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
> index 8eb7460dd744..0597f7086de3 100644
> --- a/drivers/spi/spi.c
> +++ b/drivers/spi/spi.c
> @@ -1230,8 +1230,11 @@ static void __spi_pump_messages(struct spi_controller 
> *ctlr, bool in_kthread)
> return;
> }
>
> -   /* If another context is idling the device then defer */
> -   if (ctlr->idling) {
> +   /*
> +* If another context is idling the device then defer.
> +* If we are high priority then the thread should do the transfer.
> +*/
> +   if (ctlr->idling || (ctlr->rt && !in_kthread)) {
> kthread_queue_work(&ctlr->kworker, &ctlr->pump_messages);
> spin_unlock_irqrestore(&ctlr->queue_lock, flags);
> return;
> --
> 2.21.0.1020.gf2820cf01a-goog
>

Re: [PATCH] mm/gup.c: Make follow_page_mask static

2019-05-10 Thread Ira Weiny

On Sat, May 11, 2019 at 12:38:32AM +0530, Bharath Vedartham wrote:
> follow_page_mask is only used in gup.c, make it static.
> 
> Tested by compiling and booting. Grepped the source for
> "follow_page_mask" to be sure it is not used elsewhere.
> 
> Signed-off-by: Bharath Vedartham 

Reviewed-by: Ira Weiny 

> ---
>  mm/gup.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index 91819b8..e6f3b7f 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -409,7 +409,7 @@ static struct page *follow_p4d_mask(struct vm_area_struct 
> *vma,
>   * an error pointer if there is a mapping to something not represented
>   * by a page descriptor (see also vm_normal_page()).
>   */
> -struct page *follow_page_mask(struct vm_area_struct *vma,
> +static struct page *follow_page_mask(struct vm_area_struct *vma,
> unsigned long address, unsigned int flags,
> struct follow_page_context *ctx)
>  {
> -- 
> 2.7.4
>

Re: [PATCH v6 5/6] binfmt_*: scope path resolution of interpreters

2019-05-10 Thread Christian Brauner

On Sat, May 11, 2019 at 12:55 AM Jann Horn  wrote:
>
> On Fri, May 10, 2019 at 02:20:23PM -0700, Andy Lutomirski wrote:
> > On Fri, May 10, 2019 at 1:41 PM Jann Horn  wrote:
> > >
> > > On Tue, May 07, 2019 at 05:17:35AM +1000, Aleksa Sarai wrote:
> > > > On 2019-05-06, Jann Horn  wrote:
> > > > > In my opinion, CVE-2019-5736 points out two different problems:
> > > > >
> > > > > The big problem: The __ptrace_may_access() logic has a special-case
> > > > > short-circuit for "introspection" that you can't opt out of; this
> > > > > makes it possible to open things in procfs that are related to the
> > > > > current process even if the credentials of the process wouldn't permit
> > > > > accessing another process like it. I think the proper fix to deal with
> > > > > this would be to add a prctl() flag for "set whether introspection is
> > > > > allowed for this process", and if userspace has manually un-set that
> > > > > flag, any introspection special-case logic would be skipped.
> > > >
> > > > We could do PR_SET_DUMPABLE=3 for this, I guess?
> > >
> > > Hmm... I'd make it a new prctl() command, since introspection is
> > > somewhat orthogonal to dumpability. Also, dumpability is per-mm, and I
> > > think the introspection flag should be per-thread.
> >
> > I've lost track of the context here, but it seems to me that
> > mitigating attacks involving accidental following of /proc links
> > shouldn't depend on dumpability.  What's the actual problem this is
> > trying to solve again?
>
> The one actual security problem that I've seen related to this is
> CVE-2019-5736. There is a write-up of it at
> 
> under "Successful approach", but it goes more or less as follows:
>
> A container is running that doesn't use user namespaces (because for
> some reason I don't understand, apparently some people still do that).
> An evil process is running inside the container with UID 0 (as in,
> GLOBAL_ROOT_UID); so if the evil process inside the container was able
> to reach root-owned files on the host filesystem, it could write into
> them.
>
> The container engine wants to spawn a new process inside the container.
> It forks off a child that joins the container's namespaces (including
> PID and mount namespaces), and then the child calls execve() on some
> path in the container.
> The attacker replaces the executable in the container with a symlink
> to /proc/self/exe and replaces a library inside the container with a
> malicious one.
> When the container engine calls execve(), intending to run an executable
> inside the container, it instead goes through ptrace_may_access() using
> the introspection short-circuit and re-executes its own executable
> through the jumped symlink /proc/self/exe (which is normally unreachable
> for the container). After the execve(), the process loads an evil
> library from inside the container and is under the control of the
> container.
> Now the container controls a process whose /proc/self/exe is a jumped
> symlink to a host executable, and the container can write into it.
>
> Some container engines are now using an extremely ugly hack to work
> around this - whenever they want to enter a container, they copy the
> host binary into a new memfd and execute that to avoid exposing the
> original host binary to containers:
> 
>
>
> In my opinion, the problems here are:
>
>  - Apparently some people run untrusted containers without user
>    namespaces. It would be really nice if people could not do that.
>    (Probably the biggest problem here.)

I know I sound like a broken record since I've been going on about this
forever together with a lot of other people but honestly,
the fact that people are running untrusted workloads in privileged containers
is the real issue here.

Aleksa is a good friend of mine and we have discussed this a lot, so I hope
he doesn't hate me for saying this again: it is crazy that there are container
runtimes out there that promise (or at least do not state the opposite) that
containers without user namespaces, or containers with user namespaces
that allow mapping the host root id to anything, can be safe. They cannot.

Even if this /proc/*/exe thing is somehow blocked there
are other ways of escaping from a privileged container.
We (i.e. LXC) literally do not accept CVEs for privileged containers
because we do not consider them safe by design.
It seems to me to be heading in the wrong direction to keep up the
illusion that with enough effort we can make this all nice and safe.
Yes, the userspace memfd hack we came up with is as ugly as a security
patch can be, but if you make promises you can't keep, you had better be
prepared to pay the price when things start to fall apart.
So if this part of the patch is just needed to handle this do we really
want to do all that tricky work or is there more to gain from

Re: [Qemu-devel] [PATCH v7 2/6] virtio-pmem: Add virtio pmem driver

2019-05-10 Thread Dan Williams

On Wed, May 8, 2019 at 4:19 AM Pankaj Gupta  wrote:
>
>
> Hi Dan,
>
> Thank you for the review. Please see my reply inline.
>
> >
> > Hi Pankaj,
> >
> > Some minor file placement comments below.
>
> Sure.
>
> >
> > On Thu, Apr 25, 2019 at 10:02 PM Pankaj Gupta  wrote:
> > >
> > > This patch adds virtio-pmem driver for KVM guest.
> > >
> > > Guest reads the persistent memory range information from
> > > Qemu over VIRTIO and registers it on nvdimm_bus. It also
> > > creates a nd_region object with the persistent memory
> > > range information so that existing 'nvdimm/pmem' driver
> > > can reserve this into system memory map. This way
> > > 'virtio-pmem' driver uses existing functionality of pmem
> > > driver to register persistent memory compatible for DAX
> > > capable filesystems.
> > >
> > > This also provides function to perform guest flush over
> > > VIRTIO from 'pmem' driver when userspace performs flush
> > > on DAX memory range.
> > >
> > > Signed-off-by: Pankaj Gupta 
> > > ---
> > >  drivers/nvdimm/virtio_pmem.c | 114 +
> > >  drivers/virtio/Kconfig   |  10 +++
> > >  drivers/virtio/Makefile  |   1 +
> > >  drivers/virtio/pmem.c| 118 +++
> > >  include/linux/virtio_pmem.h  |  60 
> > >  include/uapi/linux/virtio_ids.h  |   1 +
> > >  include/uapi/linux/virtio_pmem.h |  10 +++
> > >  7 files changed, 314 insertions(+)
> > >  create mode 100644 drivers/nvdimm/virtio_pmem.c
> > >  create mode 100644 drivers/virtio/pmem.c
> > >  create mode 100644 include/linux/virtio_pmem.h
> > >  create mode 100644 include/uapi/linux/virtio_pmem.h
> > >
> > > diff --git a/drivers/nvdimm/virtio_pmem.c b/drivers/nvdimm/virtio_pmem.c
> > > new file mode 100644
> > > index ..66b582f751a3
> > > --- /dev/null
> > > +++ b/drivers/nvdimm/virtio_pmem.c
> > > @@ -0,0 +1,114 @@
> > > +// SPDX-License-Identifier: GPL-2.0
> > > +/*
> > > + * virtio_pmem.c: Virtio pmem Driver
> > > + *
> > > + * Discovers persistent memory range information
> > > + * from host and provides a virtio based flushing
> > > + * interface.
> > > + */
> > > +#include 
> > > +#include "nd.h"
> > > +
> > > + /* The interrupt handler */
> > > +void host_ack(struct virtqueue *vq)
> > > +{
> > > +   unsigned int len;
> > > +   unsigned long flags;
> > > +   struct virtio_pmem_request *req, *req_buf;
> > > +   struct virtio_pmem *vpmem = vq->vdev->priv;
> > > +
> > > +   spin_lock_irqsave(>pmem_lock, flags);
> > > +   while ((req = virtqueue_get_buf(vq, )) != NULL) {
> > > +   req->done = true;
> > > +   wake_up(>host_acked);
> > > +
> > > +   if (!list_empty(>req_list)) {
> > > +   req_buf = list_first_entry(>req_list,
> > > +   struct virtio_pmem_request, list);
> > > +   list_del(>req_list);
> > > +   req_buf->wq_buf_avail = true;
> > > +   wake_up(_buf->wq_buf);
> > > +   }
> > > +   }
> > > +   spin_unlock_irqrestore(>pmem_lock, flags);
> > > +}
> > > +EXPORT_SYMBOL_GPL(host_ack);
> > > +
> > > + /* The request submission function */
> > > +int virtio_pmem_flush(struct nd_region *nd_region)
> > > +{
> > > +   int err;
> > > +   unsigned long flags;
> > > +   struct scatterlist *sgs[2], sg, ret;
> > > +   struct virtio_device *vdev = nd_region->provider_data;
> > > +   struct virtio_pmem *vpmem = vdev->priv;
> > > +   struct virtio_pmem_request *req;
> > > +
> > > +   might_sleep();
> > > +   req = kmalloc(sizeof(*req), GFP_KERNEL);
> > > +   if (!req)
> > > +   return -ENOMEM;
> > > +
> > > +   req->done = req->wq_buf_avail = false;
> > > +   strcpy(req->name, "FLUSH");
> > > +   init_waitqueue_head(>host_acked);
> > > +   init_waitqueue_head(>wq_buf);
> > > +   sg_init_one(, req->name, strlen(req->name));
> > > +   sgs[0] = 
> > > +   sg_init_one(, >ret, sizeof(req->ret));
> > > +   sgs[1] = 
> > > +
> > > +   spin_lock_irqsave(>pmem_lock, flags);
> > > +   err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, 
> > > GFP_ATOMIC);
> > > +   if (err) {
> > > +   dev_err(>dev, "failed to send command to virtio pmem
> > > device\n");
> > > +
> > > +   list_add_tail(>req_list, >list);
> > > +   spin_unlock_irqrestore(>pmem_lock, flags);
> > > +
> > > +   /* When host has read buffer, this completes via host_ack
> > > */
> > > +   wait_event(req->wq_buf, req->wq_buf_avail);
> > > +   spin_lock_irqsave(>pmem_lock, flags);
> > > +   }
> > > +   err = virtqueue_kick(vpmem->req_vq);
> > > +   spin_unlock_irqrestore(>pmem_lock, flags);
> > > +
> > > +   if (!err) {
> > > +   err = -EIO;
> > > +   goto ret;
> > > +   }
> > > +   /* When host has

Re: [PATCH v8 0/6] virtio pmem driver

2019-05-10 Thread Pankaj Gupta



> >
> >  Hi Michael & Dan,
> >
> >  Please review/ack the patch series from LIBNVDIMM & VIRTIO side.
> >  We have ack on ext4, xfs patches(4, 5 & 6) patch 2. Still need
> >  your ack on nvdimm patches(1 & 3) & virtio patch 2.
> 
> I was planning to merge these via the nvdimm tree, not ack them. Did
> you have another maintainer lined up to take these patches?

Sorry for not being clear on this. I meant to say the same.

We proposed that the patch series be merged via the nvdimm tree, as you kindly
agreed. We only need an ack on virtio patch 2 from Michael.

Thank you for all your help.

Best regards,
Pankaj Gupta

>

Re: Question about sched_setaffinity()

2019-05-10 Thread Paul E. McKenney

On Fri, May 10, 2019 at 02:08:19PM +0200, Peter Zijlstra wrote:
> On Thu, May 09, 2019 at 12:36:25PM -0700, Paul E. McKenney wrote:
> > I forward-ported the relevant patches from -rcu and placed them on -rcu
> > branch peterz.2019.05.09a, and this is what produced the output above.
> > 
> > Any other debugging thoughts?
> > 
> > Or, if you wish, you can reproduce by running the following:
> > 
> > nohup tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 8 --duration 2 
> > --configs "TRIVIAL" --bootargs 
> > "trace_event=sched:sched_switch,sched:sched_wakeup ftrace=function_graph 
> > ftrace_graph_filter=sched_setaffinity,migration_cpu_stop" --kconfig 
> > "CONFIG_FUNCTION_TRACER=y CONFIG_FUNCTION_GRAPH_TRACER=y"
> > 
> > This gets me the following summary output:
> > 
> >  --- Thu May  9 12:08:31 PDT 2019 Test summary:
> >  Results directory: 
> > /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2019.05.09-12:08:31
> >  tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 8 --duration 2 
> > --configs TRIVIAL --bootargs 
> > trace_event=sched:sched_switch,sched:sched_wakeup ftrace=function_graph 
> > ftrace_graph_filter=sched_setaffinity,migration_cpu_stop --kconfig 
> > CONFIG_FUNCTION_TRACER=y CONFIG_FUNCTION_GRAPH_TRACER=y
> >  TRIVIAL --- 2177 GPs (18.1417/s) [trivial: g0 f0x0 ]
> >  :CONFIG_HOTPLUG_CPU: improperly set
> >  WARNING: BAD SEQ 2176:2176 last:2177 version 4
> >  
> > /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2019.05.09-12:08:31/TRIVIAL/console.log
> >  WARNING: Assertion failure in 
> > /home/git/linux-2.6-tip/tools/testing/selftests/rcutorture/res/2019.05.09-12:08:31/TRIVIAL/console.log
> >  WARNING: Summary: Warnings: 1 Bugs: 1 Call Traces: 5 Stalls: 8
> 
> So I could reproduce...
> 
> I must first complain about your scripts; it does "make mrproper" on the
> source tree every time you run it, this is not appreciated. For one, it
> deletes my 'tags' file.

This is because it builds in a different directory, and "make O=/path"
complains if you don't have the source directory pristine.

But there really is no longer any reason to build in a different
directory, I suppose.  This is a largish change, but I am working on it.

> Getting it to not rebuild the whole kernel every time wasn't easy
> either.

You trust "make" far more than I do!  I am thinking of adding a
"--trust-make" argument that suppresses the "make clean".  Maybe if
I grow to trust "make" in the fulness of time, I can remove the "make
clean" entirely.  But given ccache, and given the duration of the typical
rcutorture run, and given that there are multiple rcutorture scenarios
each with a different .config, this hasn't been a priority.  The build
step is already omitted for repeated runs.

> Aside from that it seems to 'work'.
> 
> The below trace explain the issue. Some Paul person did it, see below.
> It's broken per construction :-)

*facepalm*  Hence the very strange ->cpus_allowed mask.  I really
should have figured that one out.

The fix is straightforward.  I just added "rcutorture.shuffle_interval=0"
to the TRIVIAL.boot file, which stops rcutorture from shuffling its
kthreads around.

Please accept my apologies for the hassle, and thank you for tracking
this down!!!

Thanx, Paul

Re: [PATCH] mm: vmscan: correct nr_reclaimed for THP

2019-05-10 Thread Matthew Wilcox

On Fri, May 10, 2019 at 03:54:56PM -0700, Ira Weiny wrote:
> On Fri, May 10, 2019 at 09:36:12AM -0700, Matthew Wilcox wrote:
> > On Fri, May 10, 2019 at 10:12:40AM +0800, Huang, Ying wrote:
> > > > +   nr_reclaimed += (1 << compound_order(page));
> > > 
> > > How about to change this to
> > > 
> > > 
> > > nr_reclaimed += hpage_nr_pages(page);
> > 
> > Please don't.  That embeds the knowledge that we can only swap out either 
> > normal pages or THP sized pages.  I'm trying to make the VM capable of 
> > supporting arbitrary-order pages, and this would be just one more place
> > to fix.
> > 
> > I'm sympathetic to the "self documenting" argument.  My current tree has
> > a patch in it:
> > 
> > mm: Introduce compound_nr
> > 
> > Replace 1 << compound_order(page) with compound_nr(page).  Minor
> > improvements in readability.
> > 
> > It goes along with this patch:
> > 
> > mm: Introduce page_size()
> > 
> > It's unnecessarily hard to find out the size of a potentially huge page.
> > Replace 'PAGE_SIZE << compound_order(page)' with page_size(page).
> > 
> > Better suggestions on naming gratefully received.  I'm more happy with 
> > page_size() than I am with compound_nr().  page_nr() gives the wrong
> > impression; page_count() isn't great either.
> 
> Stupid question : what does 'nr' stand for?

NumbeR.  It's relatively common argot in the Linux kernel (as you can
see from the earlier example ...

> > > nr_reclaimed += hpage_nr_pages(page);

willy@bobo:~/kernel/xarray-2$ git grep -w nr mm |wc -l
388
willy@bobo:~/kernel/xarray-2$ git grep -w nr fs |wc -l
1067
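
The two helpers proposed above can be modelled in user space as follows. This
is only a sketch: struct page_model is a hypothetical stand-in for struct page,
PAGE_SHIFT is assumed to be 12, and the bodies are simply the expressions the
patch descriptions say the helpers replace, not the code in the actual tree.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT 12                 /* assumption: 4 KiB base pages */
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Hypothetical stand-in for struct page: only the compound order is kept. */
struct page_model {
	unsigned int order;           /* 0 = base page, 9 = 2 MiB THP on x86-64 */
};

static unsigned int compound_order(const struct page_model *page)
{
	return page->order;
}

/* Number of base pages; replaces the open-coded 1 << compound_order(page). */
static unsigned long compound_nr(const struct page_model *page)
{
	/* Guard against shifting past the width of unsigned long. */
	assert(compound_order(page) < 8 * sizeof(unsigned long) - PAGE_SHIFT);
	return 1UL << compound_order(page);
}

/* Size in bytes; replaces PAGE_SIZE << compound_order(page). */
static unsigned long page_size(const struct page_model *page)
{
	return PAGE_SIZE << compound_order(page);
}
```

Unlike hpage_nr_pages(), neither helper assumes that a page is either a base
page or THP sized, so an order-3 page yields 8 without any further change.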

Re: [PATCH v6 5/6] binfmt_*: scope path resolution of interpreters

2019-05-10 Thread Jann Horn

On Fri, May 10, 2019 at 02:20:23PM -0700, Andy Lutomirski wrote:
> On Fri, May 10, 2019 at 1:41 PM Jann Horn  wrote:
> >
> > On Tue, May 07, 2019 at 05:17:35AM +1000, Aleksa Sarai wrote:
> > > On 2019-05-06, Jann Horn  wrote:
> > > > In my opinion, CVE-2019-5736 points out two different problems:
> > > >
> > > > The big problem: The __ptrace_may_access() logic has a special-case
> > > > short-circuit for "introspection" that you can't opt out of; this
> > > > makes it possible to open things in procfs that are related to the
> > > > current process even if the credentials of the process wouldn't permit
> > > > accessing another process like it. I think the proper fix to deal with
> > > > this would be to add a prctl() flag for "set whether introspection is
> > > > allowed for this process", and if userspace has manually un-set that
> > > > flag, any introspection special-case logic would be skipped.
> > >
> > > We could do PR_SET_DUMPABLE=3 for this, I guess?
> >
> > Hmm... I'd make it a new prctl() command, since introspection is
> > somewhat orthogonal to dumpability. Also, dumpability is per-mm, and I
> > think the introspection flag should be per-thread.
> 
> I've lost track of the context here, but it seems to me that
> mitigating attacks involving accidental following of /proc links
> shouldn't depend on dumpability.  What's the actual problem this is
> trying to solve again?

The one actual security problem that I've seen related to this is
CVE-2019-5736. There is a write-up of it at

under "Successful approach", but it goes more or less as follows:

A container is running that doesn't use user namespaces (because for
some reason I don't understand, apparently some people still do that).
An evil process is running inside the container with UID 0 (as in,
GLOBAL_ROOT_UID); so if the evil process inside the container was able
to reach root-owned files on the host filesystem, it could write into
them.

The container engine wants to spawn a new process inside the container.
It forks off a child that joins the container's namespaces (including
PID and mount namespaces), and then the child calls execve() on some
path in the container.
The attacker replaces the executable in the container with a symlink
to /proc/self/exe and replaces a library inside the container with a
malicious one.
When the container engine calls execve(), intending to run an executable
inside the container, it instead goes through ptrace_may_access() using
the introspection short-circuit and re-executes its own executable
through the jumped symlink /proc/self/exe (which is normally unreachable
for the container). After the execve(), the process loads an evil
library from inside the container and is under the control of the
container.
Now the container controls a process whose /proc/self/exe is a jumped
symlink to a host executable, and the container can write into it.

Some container engines are now using an extremely ugly hack to work
around this - whenever they want to enter a container, they copy the
host binary into a new memfd and execute that to avoid exposing the
original host binary to containers:

In my opinion, the problems here are:

 - Apparently some people run untrusted containers without user
   namespaces. It would be really nice if people could not do that.
   (Probably the biggest problem here.)
 - ptrace_may_access() has a short-circuit that permits a process to
   unintentionally look at itself even if it has dropped privileges -
   here, it permits the execve("/proc/self/exe", ...) that would
   normally be blocked by the check for CAP_SYS_PTRACE if the process
   is nondumpable.
 - You can use /proc/*/exe to get a writable fd.
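
The memfd workaround mentioned above can be sketched roughly as follows. This
is an illustration under assumptions, not the code of any particular container
engine: the function name clone_self_to_memfd() and the memfd name "self-clone"
are made up here, and the choice of seals is an assumption about what makes the
copy immutable.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Copy the running binary into an anonymous memfd and seal it, so that a
 * later exec of the returned fd never exposes the original host binary.
 * Returns the sealed fd, or -1 on error.
 */
static int clone_self_to_memfd(void)
{
	char buf[65536];
	ssize_t n;
	int src, memfd;

	src = open("/proc/self/exe", O_RDONLY | O_CLOEXEC);
	if (src < 0)
		return -1;

	memfd = memfd_create("self-clone", MFD_CLOEXEC | MFD_ALLOW_SEALING);
	if (memfd < 0) {
		close(src);
		return -1;
	}

	while ((n = read(src, buf, sizeof(buf))) != 0) {
		ssize_t off = 0;

		if (n < 0) {
			if (errno == EINTR)
				continue;
			goto fail;
		}
		/* Handle short writes. */
		while (off < n) {
			ssize_t w = write(memfd, buf + off, (size_t)(n - off));

			if (w < 0) {
				if (errno == EINTR)
					continue;
				goto fail;
			}
			off += w;
		}
		assert(off == n);
	}
	close(src);

	/* After sealing, nobody can modify or resize the copy. */
	if (fcntl(memfd, F_ADD_SEALS, F_SEAL_SHRINK | F_SEAL_GROW |
				      F_SEAL_WRITE | F_SEAL_SEAL) < 0) {
		close(memfd);
		return -1;
	}
	return memfd;

fail:
	close(src);
	close(memfd);
	return -1;
}
```

A caller would then pass the returned fd to fexecve() instead of re-executing
the on-disk binary. The copy costs memory and time on every container entry,
which is part of why it is called a hack above.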

Re: [PATCH, RFC 2/2] Implement sharing/unsharing of PMDs for FS/DAX

2019-05-10 Thread Mike Kravetz

On 5/10/19 9:16 AM, Larry Bassel wrote:
> On 09 May 19 09:49, Matthew Wilcox wrote:
>> On Thu, May 09, 2019 at 09:05:33AM -0700, Larry Bassel wrote:
>>> This is based on (but somewhat different from) what hugetlbfs
>>> does to share/unshare page tables.
>>
>> Wow, that worked out far more cleanly than I was expecting to see.
> 
> Yes, I was pleasantly surprised. As I've mentioned already, I 
> think this is at least partially due to the nature of DAX.

I have not looked in detail to make sure this is indeed all the places you
need to hook and special case for sharing/unsharing.  Since this scheme is
somewhat like that used for hugetlb, I just wanted to point out some nasty
bugs related to hugetlb PMD sharing that were fixed last year.

5e41540c8a0f hugetlbfs: fix kernel BUG at fs/hugetlbfs/inode.c:444!
dff11abe280b hugetlb: take PMD sharing into account when flushing tlb/caches
017b1660df89 mm: migration: fix migration of huge PMD shared pages

The common issue in these is that when unmapping a page with a shared PMD
mapping you need to flush the entire shared range and not just the unmapped
page.  The above changes were hugetlb specific.  I do not know if any of
this applies in the case of DAX.
-- 
Mike Kravetz
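
The common issue described above comes down to widening the flush range to the
whole region covered by one shared PMD page. A minimal user-space sketch,
assuming x86-64 with 4-level paging (one PMD page maps a PUD_SIZE-aligned
1 GiB region); struct flush_range and shared_pmd_flush_range() are hypothetical
names, not kernel interfaces:

```c
#include <assert.h>

#define PUD_SHIFT 30                  /* assumption: x86-64, 4-level paging */
#define PUD_SIZE  (1UL << PUD_SHIFT)
#define PUD_MASK  (~(PUD_SIZE - 1))

/* Hypothetical half-open range [start, end) to flush. */
struct flush_range {
	unsigned long start;
	unsigned long end;
};

/*
 * Given the address of a page being unmapped through a shared PMD, return
 * the whole PUD_SIZE-aligned region that the shared PMD page maps, rather
 * than just the single page.
 */
static struct flush_range shared_pmd_flush_range(unsigned long addr)
{
	struct flush_range r;

	r.start = addr & PUD_MASK;
	/* The end of the region must not wrap around the address space. */
	assert(r.start + PUD_SIZE > r.start);
	r.end = r.start + PUD_SIZE;
	return r;
}
```

Whether DAX PMD sharing needs the same widening is exactly the open question
raised above; the sketch only shows the arithmetic.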

Re: [PATCH] mm: vmscan: correct nr_reclaimed for THP

2019-05-10 Thread Ira Weiny

On Fri, May 10, 2019 at 09:36:12AM -0700, Matthew Wilcox wrote:
> On Fri, May 10, 2019 at 10:12:40AM +0800, Huang, Ying wrote:
> > > + nr_reclaimed += (1 << compound_order(page));
> > 
> > How about to change this to
> > 
> > 
> > nr_reclaimed += hpage_nr_pages(page);
> 
> Please don't.  That embeds the knowledge that we can only swap out either 
> normal pages or THP sized pages.  I'm trying to make the VM capable of 
> supporting arbitrary-order pages, and this would be just one more place
> to fix.
> 
> I'm sympathetic to the "self documenting" argument.  My current tree has
> a patch in it:
> 
> mm: Introduce compound_nr
> 
> Replace 1 << compound_order(page) with compound_nr(page).  Minor
> improvements in readability.
> 
> It goes along with this patch:
> 
> mm: Introduce page_size()
> 
> It's unnecessarily hard to find out the size of a potentially huge page.
> Replace 'PAGE_SIZE << compound_order(page)' with page_size(page).
> 
> Better suggestions on naming gratefully received.  I'm more happy with 
> page_size() than I am with compound_nr().  page_nr() gives the wrong
> impression; page_count() isn't great either.

Stupid question : what does 'nr' stand for?

Ira

Re: [PATCH v5 04/11] of: irq: document properties for wakeup interrupt parent

2019-05-10 Thread Rob Herring

On Tue, May 7, 2019 at 3:41 PM Lina Iyer  wrote:
>
> Some interrupt controllers in a SoC, are always powered on and have a
> select interrupts routed to them, so that they can wakeup the SoC from
> suspend. Add wakeup-parent DT property to refer to these interrupt
> controllers.
>
> If the interrupts routed to the wakeup parent are not sequential, than a
> map needs to exist to associate the same interrupt line on multiple
> interrupt controllers. Providing this map in every driver is cumbersome.
> Let's add this in the device tree and document the properties to map the
> interrupt specifiers
>
> Cc: devicet...@vger.kernel.org
> Signed-off-by: Lina Iyer 
> ---
> Changes in v5:
> - Update documentation to describe masks in the example
> Changes in v4:
> - Added this documentation
> ---
>  .../interrupt-controller/interrupts.txt   | 54 +++
>  1 file changed, 54 insertions(+)
>
> diff --git 
> a/Documentation/devicetree/bindings/interrupt-controller/interrupts.txt 
> b/Documentation/devicetree/bindings/interrupt-controller/interrupts.txt
> index 8a3c40829899..e3e43f5d5566 100644
> --- a/Documentation/devicetree/bindings/interrupt-controller/interrupts.txt
> +++ b/Documentation/devicetree/bindings/interrupt-controller/interrupts.txt
> @@ -108,3 +108,57 @@ commonly used:
> sensitivity = <7>;
> };
> };
> +
> +3) Interrupt wakeup parent
> +--
> +
> +Some interrupt controllers in a SoC, are always powered on and have a select
> +interrupts routed to them, so that they can wakeup the SoC from suspend. 
> These
> +interrupt controllers do not fall into the category of a parent interrupt
> +controller and can be specified by the "wakeup-parent" property and contain a
> +single phandle referring to the wakeup capable interrupt controller.
> +
> +   Example:
> +   wakeup-parent = <_intc>;
> +
> +
> +4) Interrupt mapping
> +
> +
> +Sometimes interrupts may be detected by more than one interrupt controller
> +(depending on which controller is active). The interrupt controllers may not
> +be in hierarchy and therefore the interrupt controller driver is required to
> +establish the relationship between the same interrupt at different interrupt
> +controllers. If these interrupts are not sequential then a map needs to be
> +specified to help identify these interrupts.
> +
> +Mapping the interrupt specifiers in the device tree can be done using the
> +"irqdomain-map" property. The property contains interrupt specifier at the
> +current interrupt controller followed by the interrupt specifier at the 
> mapped
> +interrupt controller.
> +
> +   irqdomain-map = 

I'm wondering why we need a new map property rather than just using
interrupt-map? Contrary to what Linus said, it is not PCI only.

It would be an extension of the current behavior. It's generally used
to map each interrupt to different parents or swizzle the routing (in
the PCI case). Generally, a node would be either an
'interrupt-controller' or an 'interrupt-map' node. The interrupt
parsing code (for the kernel at least) prioritizes
'interrupt-controller' path, so adding 'interrupt-map' could be done
without changing behavior.

Another concern I have with this is it only solves the problem of an
IRQ routed to multiple parents for the case of 2 parents. What happens
when we have an IRQ routed to 3 different parents? Maybe the solution
is the incoming-interrupt-specifier can be listed more than once. Marc
already expressed concerns with the scalability of interrupt-map
property, so that's maybe not an ideal solution.

> +
> +The optional properties "irqdomain-map-mask" and "irqdomain-map-pass-thru" 
> may
> +be provided to help interpret the valid bits of the incoming and mapped
> +interrupt specifiers respectively.
> +
> +   Example:
> +   intc: interrupt-controller@17a0 {
> +   #interrupt-cells = <3>;

The phandle doesn't count as a cell, so this should be 2.

> +   };
> +
> +   pinctrl@340 {
> +   #interrupt-cells = <2>;
> +   irqdomain-map = <22 0  36 0>, <24 0  37 0>;
> +   irqdomain-map-mask = <0xff 0>;
> +   irqdomain-map-pass-thru = <0 0xff>;
> +   };
> +
> +In the above example, the input interrupt specifier map-mask <0xff 0> applied
> +on the incoming interrupt specifier of the map <22 0>, <24 0>, returns the
> +input interrupt 22, 24 etc. The second argument being irq type is immaterial
> +from the map and is used from the incoming request instead. The pass-thru
> +specifier parses the output interrupt specifier from the rest of the unparsed
> +argments from the map < 36 0>, < 37 0> etc to return the output
> +interrupt 36, 37 etc.
> --
> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
> a Linux Foundation Collaborative Project
>
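
How the mask and pass-thru cells in the quoted example interact can be
sketched as a small lookup. This is a user-space model under assumptions: it
follows the semantics of the existing nexus-style "-map-mask" and
"-map-pass-thru" properties, it leaves out the parent phandle, and it uses two
cells on both sides. struct irq_map_entry and irqdomain_map_lookup() are
hypothetical names, not kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CELLS 2   /* #interrupt-cells of the pinctrl node in the example */

/* One map entry: incoming specifier followed by the mapped specifier. */
struct irq_map_entry {
	uint32_t in[CELLS];
	uint32_t out[CELLS];
};

/*
 * Look up an incoming specifier. Only the bits selected by mask take part
 * in matching; the bits selected by pass_thru are copied from the incoming
 * specifier into the output instead of being taken from the map.
 * Returns true and fills out[] on a match, false otherwise.
 */
static bool irqdomain_map_lookup(const struct irq_map_entry *map, size_t n,
				 const uint32_t mask[CELLS],
				 const uint32_t pass_thru[CELLS],
				 const uint32_t in[CELLS],
				 uint32_t out[CELLS])
{
	assert(map && mask && pass_thru && in && out);

	for (size_t i = 0; i < n; i++) {
		bool match = true;

		for (size_t c = 0; c < CELLS; c++) {
			if ((in[c] & mask[c]) != (map[i].in[c] & mask[c]))
				match = false;
		}
		if (!match)
			continue;

		for (size_t c = 0; c < CELLS; c++)
			out[c] = (map[i].out[c] & ~pass_thru[c]) |
				 (in[c] & pass_thru[c]);
		return true;
	}
	return false;
}
```

With the example's values, an incoming <22 4> maps to <36 4>: cell 0 is
matched and translated, while cell 1 (the irq type) is ignored for matching
and passed through unchanged.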

[PATCH 0/3 v5] Kexec cmdline bufffer measure

2019-05-10 Thread Prakhar Srivastava

From: Prakhar Srivastava 

The motive behind the patch series is to measure the cmdline args
used for soft reboot/kexec case.

For secure boot attestation, it is necessary to measure the kernel
command line and the kernel version. For cold boot, the boot loader
can be enhanced to measure these parameters.
(https://mjg59.dreamwidth.org/48897.html)
However, for attestation across soft reboot boundary, these values 
also need to be measured during kexec.

Currently, for the kexec (kexec_file_load)/soft reboot scenario, the cmdline
args that are used to boot the next kernel are not measured. For the
normal case of boot/hard reboot, the cmdline args are measured into the TPM.

The hash of boot command line is calculated and added to the current 
running kernel's measurement list.  On a soft reboot like kexec, the PCRs
are not reset to zero.  Refer to commit 94c3aac567a9 ("ima: on soft 
reboot, restore the measurement list") patch description.

To achieve the above, the patch series does the following:
  -adds a new ima hook: ima_kexec_cmdline which measures the cmdline args
   into the ima log, behind a new ima policy entry KEXEC_CMDLINE.
  -since the cmdline args cannot be appraised, a new template field(buf) is
   added. The template field contains the buffer passed(cmdline args), which
   can be used to appraise/attest at a later stage.
  -call the ima_kexec_cmdline(...) hook from kexec_file_load call.

The ima logs need to be carried over to the next kernel, which will be followed
up by other patchsets for x86_64 and arm64.


Changelog:
v5:
  -add a new ima hook and policy to measure the cmdline
   args(ima_kexec_cmdline)
  -add a new template field buf to contain the buffer measured.
   [suggested by Mimi Zohar]
  -call ima_kexec_cmdline from kexec_file_load path

v4:
  - per feedback from LSM community, removed the LSM hook and renamed the
    IMA policy to KEXEC_CMDLINE

v3: (rebase changes to next-general)
  - Add policy checks for buffer[suggested by Mimi Zohar]
  - use the IMA_XATTR to add buffer
  - Add kexec_cmdline used for kexec file load
  - Add an LSM hook to allow usage by other LSM.[suggested by Mimi Zohar]

v2:
  - Add policy checks for buffer[suggested by Mimi Zohar]
  - Add an LSM hook to allow usage by other LSM.[suggested by Mimi Zohar]
  - use the IMA_XATTR to add buffer instead of sig template

v1:
  -Add kconfigs to control the ima_buffer_check
  -measure the cmdline args suffixed with the kernel file name
  -add the buffer to the template sig field.


Prakhar Srivastava (3):
  add a new ima hook and policy to measure the cmdline
  add a new template field buf to contain the buffer
  call ima_kexec_cmdline from kexec_file_load path

 Documentation/ABI/testing/ima_policy  |   1 +
 include/linux/ima.h   |   2 +
 kernel/kexec_file.c   |   2 +
 security/integrity/ima/ima.h  |   1 +
 security/integrity/ima/ima_api.c  |   1 +
 security/integrity/ima/ima_main.c | 107 ++
 security/integrity/ima/ima_policy.c   |   9 ++
 security/integrity/ima/ima_template.c |   2 +
 security/integrity/ima/ima_template_lib.c |  21 +
 security/integrity/ima/ima_template_lib.h |   4 +
 security/integrity/integrity.h|   1 +
 11 files changed, 151 insertions(+)

-- 
2.20.1

[PATCH 2/3 v5] add a new template field buf to contain the buffer

2019-05-10 Thread Prakhar Srivastava

From: Prakhar Srivastava 

The buffer(cmdline args) added to the ima log cannot be attested
without having the actual buffer. Thus, to make the measured buffer
available to store/read, a new ima template field (buf) is added.
The cmdline args used for soft reboot can then be read and attested
later.

The patch adds a new template field buf to store/read the buffer
used while measuring kexec_cmdline args in the
[PATCH 1/3 v5]: "add a new ima hook and policy to measure the cmdline".
Signed-off-by: Prakhar Srivastava 
---
 security/integrity/ima/ima_main.c | 23 +++
 security/integrity/ima/ima_template.c |  2 ++
 security/integrity/ima/ima_template_lib.c | 21 +
 security/integrity/ima/ima_template_lib.h |  4 
 security/integrity/integrity.h|  1 +
 5 files changed, 51 insertions(+)

diff --git a/security/integrity/ima/ima_main.c 
b/security/integrity/ima/ima_main.c
index 1d186bda25fe..ca12885ca241 100644
--- a/security/integrity/ima/ima_main.c
+++ b/security/integrity/ima/ima_main.c
@@ -605,10 +605,32 @@ static int process_buffer_measurement(const void *buf, 
int size,
int pcr = CONFIG_IMA_MEASURE_PCR_IDX;
int action = 0;
 
+   struct buffer_xattr {
+   enum evm_ima_xattr_type type;
+   u16 buf_length;
+   unsigned char buf[0];
+   };
+   struct buffer_xattr *buffer_event_data = NULL;
+   int alloc_length = 0;
+
action = ima_get_action(NULL, cred, secid, 0, KEXEC_CMDLINE, );
if (!(action & IMA_AUDIT) && !(action & IMA_MEASURE))
goto out;
 
+   alloc_length = sizeof(struct buffer_xattr) + size;
+   buffer_event_data = kzalloc(alloc_length, GFP_KERNEL);
+   if (!buffer_event_data) {
+   ret = -ENOMEM;
+   goto out;
+   }
+
+   buffer_event_data->type = IMA_XATTR_BUFFER;
+   buffer_event_data->buf_length = size;
+   memcpy(buffer_event_data->buf, buf, size);
+
+   event_data.xattr_value = (struct evm_ima_xattr_data *)buffer_event_data;
+   event_data.xattr_len = alloc_length;
+
memset(iint, 0, sizeof(*iint));
memset(, 0, sizeof(hash));
 
@@ -638,6 +660,7 @@ static int process_buffer_measurement(const void *buf, int 
size,
}
 
 out:
+   kfree(buffer_event_data);
return ret;
 }
 
diff --git a/security/integrity/ima/ima_template.c 
b/security/integrity/ima/ima_template.c
index b631b8bc7624..a76d1c04162a 100644
--- a/security/integrity/ima/ima_template.c
+++ b/security/integrity/ima/ima_template.c
@@ -43,6 +43,8 @@ static const struct ima_template_field supported_fields[] = {
 .field_show = ima_show_template_string},
{.field_id = "sig", .field_init = ima_eventsig_init,
 .field_show = ima_show_template_sig},
+   {.field_id = "buf", .field_init = ima_eventbuf_init,
+.field_show = ima_show_template_buf},
 };
 #define MAX_TEMPLATE_NAME_LEN 15
 
diff --git a/security/integrity/ima/ima_template_lib.c 
b/security/integrity/ima/ima_template_lib.c
index 513b457ae900..95a827f42c18 100644
--- a/security/integrity/ima/ima_template_lib.c
+++ b/security/integrity/ima/ima_template_lib.c
@@ -162,6 +162,11 @@ void ima_show_template_sig(struct seq_file *m, enum 
ima_show_type show,
ima_show_template_field_data(m, show, DATA_FMT_HEX, field_data);
 }
 
+void ima_show_template_buf(struct seq_file *m, enum ima_show_type show,
+   struct ima_field_data *field_data)
+{
+   ima_show_template_field_data(m, show, DATA_FMT_HEX, field_data);
+}
 /**
  * ima_parse_buf() - Parses lengths and data from an input buffer
  * @bufstartp:   Buffer start address.
@@ -389,3 +394,19 @@ int ima_eventsig_init(struct ima_event_data *event_data,
return ima_write_template_field_data(xattr_value, event_data->xattr_len,
 DATA_FMT_HEX, field_data);
 }
+
+/*
+ *  ima_eventbuf_init - include the buffer(kexec-cmldine) as part of the
+ *  template data.
+ */
+int ima_eventbuf_init(struct ima_event_data *event_data,
+   struct ima_field_data *field_data)
+{
+   struct evm_ima_xattr_data *xattr_value = event_data->xattr_value;
+
+   if ((!xattr_value) || (xattr_value->type != IMA_XATTR_BUFFER))
+   return 0;
+
+   return ima_write_template_field_data(xattr_value, event_data->xattr_len,
+   DATA_FMT_HEX, field_data);
+}
diff --git a/security/integrity/ima/ima_template_lib.h 
b/security/integrity/ima/ima_template_lib.h
index 6a3d8b831deb..12f1a8578b31 100644
--- a/security/integrity/ima/ima_template_lib.h
+++ b/security/integrity/ima/ima_template_lib.h
@@ -29,6 +29,8 @@ void ima_show_template_string(struct seq_file *m, enum 
ima_show_type show,
  struct ima_field_data *field_data);
 void ima_show_template_sig(struct seq_file *m, enum ima_show_type show,

[PATCH 3/3 v5] call ima_kexec_cmdline from kexec_file_load path

2019-05-10 Thread Prakhar Srivastava

From: Prakhar Srivastava 

To measure the cmdline args used in case of a soft reboot, call the
ima hook defined in [PATCH 1/3 v5]: "add a new ima hook and policy to
measure the cmdline".

Signed-off-by: Prakhar Srivastava 
---
 kernel/kexec_file.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index f1d0e00a3971..e779bcf674a0 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -241,6 +241,8 @@ kimage_file_prepare_segments(struct kimage *image, int 
kernel_fd, int initrd_fd,
ret = -EINVAL;
goto out;
}
+
+   ima_kexec_cmdline(image->cmdline_buf, image->cmdline_buf_len - 1);
}
 
/* Call arch image load handlers */
-- 
2.20.1
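
A note on the length argument in the hunk above: the call passes
image->cmdline_buf_len - 1, presumably so that the measured bytes exclude the
terminating NUL that the preceding check requires. The sketch below models
that arithmetic in userspace C. struct image_model and measured_len() are
hypothetical names used only for illustration; they are not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the two struct kimage fields used in the hunk. */
struct image_model {
	const char *cmdline_buf;
	size_t cmdline_buf_len;	/* assumed to include the trailing NUL */
};

/*
 * Number of bytes that would be handed to the measurement hook:
 * everything except the terminating NUL.
 */
static size_t measured_len(const struct image_model *image)
{
	/* Mirrors the precondition: the buffer must be a NUL-terminated string. */
	assert(image->cmdline_buf_len > 0);
	assert(image->cmdline_buf[image->cmdline_buf_len - 1] == '\0');
	return image->cmdline_buf_len - 1;
}
```

Under that assumption the measured size equals strlen() of the command line,
so a verifier replaying the log would hash the visible string only.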

[PATCH v3 2/2] ASoC: Intel: Skylake: Add Cometlake PCI IDs

2019-05-10 Thread Evan Green

Add PCI IDs for Intel CometLake platforms, which from a software
point of view are extremely similar to Cannonlake platforms.

Signed-off-by: Evan Green 
---

Changes in v3:
- Don't select CML_* in SND_SOC_INTEL_SKYLAKE (Pierre-Louis)

Changes in v2:
- Add 0x06c8 for CML-H (Pierre-Louis)

 sound/soc/intel/Kconfig| 16 
 sound/soc/intel/skylake/skl-messages.c | 16 
 sound/soc/intel/skylake/skl.c  | 10 ++
 3 files changed, 42 insertions(+)

diff --git a/sound/soc/intel/Kconfig b/sound/soc/intel/Kconfig
index fc1396adde71..b089ed3bf77f 100644
--- a/sound/soc/intel/Kconfig
+++ b/sound/soc/intel/Kconfig
@@ -165,6 +165,22 @@ config SND_SOC_INTEL_CFL
  If you have a Intel CoffeeLake platform with the DSP
  enabled in the BIOS then enable this option by saying Y or m.
 
+config SND_SOC_INTEL_CML_H
+   tristate "CometLake-H Platforms"
+   depends on PCI && ACPI
+   select SND_SOC_INTEL_SKYLAKE_FAMILY
+   help
+ If you have a Intel CometLake-H platform with the DSP
+ enabled in the BIOS then enable this option by saying Y or m.
+
+config SND_SOC_INTEL_CML_LP
+   tristate "CometLake-LP Platforms"
+   depends on PCI && ACPI
+   select SND_SOC_INTEL_SKYLAKE_FAMILY
+   help
+ If you have a Intel CometLake-LP platform with the DSP
+ enabled in the BIOS then enable this option by saying Y or m.
+
 config SND_SOC_INTEL_SKYLAKE_FAMILY
tristate
select SND_SOC_INTEL_SKYLAKE_COMMON
diff --git a/sound/soc/intel/skylake/skl-messages.c 
b/sound/soc/intel/skylake/skl-messages.c
index 4bf70b4429f0..df01dc952521 100644
--- a/sound/soc/intel/skylake/skl-messages.c
+++ b/sound/soc/intel/skylake/skl-messages.c
@@ -255,6 +255,22 @@ static const struct skl_dsp_ops dsp_ops[] = {
.init_fw = cnl_sst_init_fw,
.cleanup = cnl_sst_dsp_cleanup
},
+   {
+   .id = 0x02c8,
+   .num_cores = 4,
+   .loader_ops = bxt_get_loader_ops,
+   .init = cnl_sst_dsp_init,
+   .init_fw = cnl_sst_init_fw,
+   .cleanup = cnl_sst_dsp_cleanup
+   },
+   {
+   .id = 0x06c8,
+   .num_cores = 4,
+   .loader_ops = bxt_get_loader_ops,
+   .init = cnl_sst_dsp_init,
+   .init_fw = cnl_sst_init_fw,
+   .cleanup = cnl_sst_dsp_cleanup
+   },
 };
 
 const struct skl_dsp_ops *skl_get_dsp_ops(int pci_id)
diff --git a/sound/soc/intel/skylake/skl.c b/sound/soc/intel/skylake/skl.c
index 4ed5b7e17d44..f864f7b3df3a 100644
--- a/sound/soc/intel/skylake/skl.c
+++ b/sound/soc/intel/skylake/skl.c
@@ -1166,6 +1166,16 @@ static const struct pci_device_id skl_ids[] = {
/* CFL */
{ PCI_DEVICE(0x8086, 0xa348),
.driver_data = (unsigned long)&snd_soc_acpi_intel_cnl_machines},
+#endif
+#if IS_ENABLED(CONFIG_SND_SOC_INTEL_CML_LP)
+   /* CML-LP */
+   { PCI_DEVICE(0x8086, 0x02c8),
+   .driver_data = (unsigned long)&snd_soc_acpi_intel_cnl_machines},
+#endif
+#if IS_ENABLED(CONFIG_SND_SOC_INTEL_CML_H)
+   /* CML-H */
+   { PCI_DEVICE(0x8086, 0x06c8),
+   .driver_data = (unsigned long)&snd_soc_acpi_intel_cnl_machines},
 #endif
{ 0, }
 };
-- 
2.20.1

[PATCH v3 0/2] ASoC: Intel: Add Cometlake PCI IDs

2019-05-10 Thread Evan Green



This small series adds PCI IDs for Cometlake platforms, for a
dazzling audio experience.

This is based on linux-next's next-20190510.

Changes in v3:
- Copy cnl_desc to new cml_desc, and avoid selecting cannonlake (Pierre-Louis)
- Don't select CML_* in SND_SOC_INTEL_SKYLAKE (Pierre-Louis)

Changes in v2:
- Add CML-H ID 0x06c8 (Pierre-Louis)
- Add 0x06c8 for CML-H (Pierre-Louis)

Evan Green (2):
  ASoC: SOF: Add Comet Lake PCI IDs
  ASoC: Intel: Skylake: Add Cometlake PCI IDs

 sound/soc/intel/Kconfig| 16 +
 sound/soc/intel/skylake/skl-messages.c | 16 +
 sound/soc/intel/skylake/skl.c  | 10 
 sound/soc/sof/intel/Kconfig| 32 ++
 sound/soc/sof/sof-pci-dev.c| 28 ++
 5 files changed, 102 insertions(+)

-- 
2.20.1

Re: [PATCH 5/5] arm64: dts: meson: sei510: add network support

2019-05-10 Thread Kevin Hilman

Jerome Brunet  writes:

> Enable the network interface of the SEI510 which use the internal PHY.
>
> Signed-off-by: Jerome Brunet 

I tried testing this series on SEI510, but I must still be missing some
defconfig options, as the default defconfig doesn't lead to a working
interface.

I tried adding this kconfig fragment[1], and the dwmac probes/inits but
I must still be missing something, as the dwmac is still failing to find
a PHY.  Boot log: https://termbin.com/ivf3

I have the same result testing on the u200.

Kevin

[1] amlogic network kconfig fragment
CONFIG_STMMAC_ETH=y

# following are needed, but automatically enabled if above is set
#CONFIG_STMMAC_PLATFORM=m
#CONFIG_DWMAC_MESON=m

CONFIG_PHYLIB=y
CONFIG_MICREL_PHY=y
CONFIG_REALTEK_PHY=y

CONFIG_MDIO_BUS_MUX_MESON_G12A=y
CONFIG_MESON_GXL_PHY=y

[PATCH 1/3 v5] add a new ima hook and policy to measure the cmdline

2019-05-10 Thread Prakhar Srivastava

From: Prakhar Srivastava 

For secure boot attestation, it is necessary to measure the kernel
command line and the kernel version. For cold boot, the boot loader
can be enhanced to measure these parameters. However, for attestation
across soft reboot boundary, these values also need to be measured
during kexec.

For this reason, this patch adds support for measuring these
parameters during kexec. To achieve this, a new ima policy and
hook id, defined KEXEC_CMDLINE and ima_kexec_cmdline respectively,
are added.

Signed-off-by: Prakhar Srivastava 
---
 Documentation/ABI/testing/ima_policy |  1 +
 include/linux/ima.h  |  2 +
 security/integrity/ima/ima.h |  1 +
 security/integrity/ima/ima_api.c |  1 +
 security/integrity/ima/ima_main.c| 84 
 security/integrity/ima/ima_policy.c  |  9 +++
 6 files changed, 98 insertions(+)

diff --git a/Documentation/ABI/testing/ima_policy 
b/Documentation/ABI/testing/ima_policy
index 74c6702de74e..62e7cd687e9c 100644
--- a/Documentation/ABI/testing/ima_policy
+++ b/Documentation/ABI/testing/ima_policy
@@ -29,6 +29,7 @@ Description:
base:   func:= 
[BPRM_CHECK][MMAP_CHECK][CREDS_CHECK][FILE_CHECK][MODULE_CHECK]
[FIRMWARE_CHECK]
[KEXEC_KERNEL_CHECK] [KEXEC_INITRAMFS_CHECK]
+   [KEXEC_CMDLINE]
mask:= [[^]MAY_READ] [[^]MAY_WRITE] [[^]MAY_APPEND]
   [[^]MAY_EXEC]
fsmagic:= hex value
diff --git a/include/linux/ima.h b/include/linux/ima.h
index dc12fbcf484c..2e2c77280be8 100644
--- a/include/linux/ima.h
+++ b/include/linux/ima.h
@@ -26,6 +26,7 @@ extern int ima_read_file(struct file *file, enum 
kernel_read_file_id id);
 extern int ima_post_read_file(struct file *file, void *buf, loff_t size,
  enum kernel_read_file_id id);
 extern void ima_post_path_mknod(struct dentry *dentry);
+extern void ima_kexec_cmdline(const void *buf, int size);
 
 #ifdef CONFIG_IMA_KEXEC
 extern void ima_add_kexec_buffer(struct kimage *image);
@@ -92,6 +93,7 @@ static inline void ima_post_path_mknod(struct dentry *dentry)
return;
 }
 
+static inline void ima_kexec_cmdline(const void *buf, int size) {}
 #endif /* CONFIG_IMA */
 
 #ifndef CONFIG_IMA_KEXEC
diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
index d213e835c498..226a26d8de09 100644
--- a/security/integrity/ima/ima.h
+++ b/security/integrity/ima/ima.h
@@ -184,6 +184,7 @@ static inline unsigned long ima_hash_key(u8 *digest)
hook(KEXEC_KERNEL_CHECK)\
hook(KEXEC_INITRAMFS_CHECK) \
hook(POLICY_CHECK)  \
+   hook(KEXEC_CMDLINE) \
hook(MAX_CHECK)
 #define __ima_hook_enumify(ENUM)   ENUM,
 
diff --git a/security/integrity/ima/ima_api.c b/security/integrity/ima/ima_api.c
index c7505fb122d4..800d965232e5 100644
--- a/security/integrity/ima/ima_api.c
+++ b/security/integrity/ima/ima_api.c
@@ -169,6 +169,7 @@ void ima_add_violation(struct file *file, const unsigned 
char *filename,
  * subj=, obj=, type=, func=, mask=, fsmagic=
  * subj,obj, and type: are LSM specific.
  * func: FILE_CHECK | BPRM_CHECK | CREDS_CHECK | MMAP_CHECK | MODULE_CHECK
+ * | KEXEC_CMDLINE
  * mask: contains the permission mask
  * fsmagic: hex value
  *
diff --git a/security/integrity/ima/ima_main.c 
b/security/integrity/ima/ima_main.c
index 357edd140c09..1d186bda25fe 100644
--- a/security/integrity/ima/ima_main.c
+++ b/security/integrity/ima/ima_main.c
@@ -576,6 +576,90 @@ int ima_load_data(enum kernel_load_data_id id)
return 0;
 }
 
+/*
+ * process_buffer_measurement - Measure the buffer passed to ima log.
+ * (Instead of using the file hash use the buffer hash).
+ * @buf - The buffer that needs to be added to the log
+ * @size - size of buffer(in bytes)
+ * @eventname - event name to be used for buffer.
+ *
+ * The buffer passed is added to the ima log.
+ *
+ * On success return 0.
+ * On error cases surface errors from ima calls.
+ */
+static int process_buffer_measurement(const void *buf, int size,
+   const char *eventname, const struct cred *cred,
+   u32 secid)
+{
+   int ret = 0;
+   struct ima_template_entry *entry = NULL;
+   struct integrity_iint_cache tmp_iint, *iint = &tmp_iint;
+   struct ima_event_data event_data = {iint, NULL, NULL,
+   NULL, 0, NULL};
+   struct {
+   struct ima_digest_data hdr;
+   char digest[IMA_MAX_DIGEST_SIZE];
+   } hash;
+   int violation = 0;
+   int pcr = CONFIG_IMA_MEASURE_PCR_IDX;
+   int action = 0;
+
+   action = ima_get_action(NULL, cred, secid, 0, KEXEC_CMDLINE, &pcr);
+   if (!(action & IMA_AUDIT) && !(action & IMA_MEASURE))
+   goto out;
+
+

[PATCH v3 1/2] ASoC: SOF: Add Comet Lake PCI IDs

2019-05-10 Thread Evan Green

Add support for Intel Comet Lake platforms by adding a new Kconfig
for CometLake and the appropriate PCI IDs.

Signed-off-by: Evan Green 
---

Changes in v3:
- Copy cnl_desc to new cml_desc, and avoid selecting cannonlake (Pierre-Louis)

Changes in v2:
- Add CML-H ID 0x06c8 (Pierre-Louis)

 sound/soc/sof/intel/Kconfig | 32 
 sound/soc/sof/sof-pci-dev.c | 28 
 2 files changed, 60 insertions(+)

diff --git a/sound/soc/sof/intel/Kconfig b/sound/soc/sof/intel/Kconfig
index 603e0db4f012..17e10d65fc0c 100644
--- a/sound/soc/sof/intel/Kconfig
+++ b/sound/soc/sof/intel/Kconfig
@@ -24,6 +24,8 @@ config SND_SOC_SOF_INTEL_PCI
select SND_SOC_SOF_CANNONLAKE  if SND_SOC_SOF_CANNONLAKE_SUPPORT
select SND_SOC_SOF_COFFEELAKE  if SND_SOC_SOF_COFFEELAKE_SUPPORT
select SND_SOC_SOF_ICELAKE if SND_SOC_SOF_ICELAKE_SUPPORT
+   select SND_SOC_SOF_COMETLAKE_LP if SND_SOC_SOF_COMETLAKE_LP_SUPPORT
+   select SND_SOC_SOF_COMETLAKE_H if SND_SOC_SOF_COMETLAKE_H_SUPPORT
help
  This option is not user-selectable but automagically handled by
  'select' statements at a higher level
@@ -179,6 +181,36 @@ config SND_SOC_SOF_ICELAKE
  This option is not user-selectable but automagically handled by
  'select' statements at a higher level
 
+config SND_SOC_SOF_COMETLAKE_LP
+   tristate
+   select SND_SOC_SOF_HDA_COMMON
+   help
+ This option is not user-selectable but automagically handled by
+ 'select' statements at a higher level
+
+config SND_SOC_SOF_COMETLAKE_LP_SUPPORT
+   bool "SOF support for CometLake-LP"
+   help
+ This adds support for Sound Open Firmware for Intel(R) platforms
+ using the Cometlake-LP processors.
+ Say Y if you have such a device.
+ If unsure select "N".
+
+config SND_SOC_SOF_COMETLAKE_H
+   tristate
+   select SND_SOC_SOF_HDA_COMMON
+   help
+ This option is not user-selectable but automagically handled by
+ 'select' statements at a higher level
+
+config SND_SOC_SOF_COMETLAKE_H_SUPPORT
+   bool "SOF support for CometLake-H"
+   help
+ This adds support for Sound Open Firmware for Intel(R) platforms
+ using the Cometlake-H processors.
+ Say Y if you have such a device.
+ If unsure select "N".
+
 config SND_SOC_SOF_HDA_COMMON
tristate
select SND_SOC_SOF_INTEL_COMMON
diff --git a/sound/soc/sof/sof-pci-dev.c b/sound/soc/sof/sof-pci-dev.c
index b778dffb2d25..d736806c2e0d 100644
--- a/sound/soc/sof/sof-pci-dev.c
+++ b/sound/soc/sof/sof-pci-dev.c
@@ -129,6 +129,26 @@ static const struct sof_dev_desc cfl_desc = {
 };
 #endif
 
+#if IS_ENABLED(CONFIG_SND_SOC_SOF_COMETLAKE_LP) || \
+   IS_ENABLED(CONFIG_SND_SOC_SOF_COMETLAKE_H)
+
+static const struct sof_dev_desc cml_desc = {
+   .machines   = snd_soc_acpi_intel_cnl_machines,
+   .resindex_lpe_base  = 0,
+   .resindex_pcicfg_base   = -1,
+   .resindex_imr_base  = -1,
+   .irqindex_host_ipc  = -1,
+   .resindex_dma_base  = -1,
+   .chip_info = &cnl_chip_info,
+   .default_fw_path = "intel/sof",
+   .default_tplg_path = "intel/sof-tplg",
+   .nocodec_fw_filename = "sof-cnl.ri",
+   .nocodec_tplg_filename = "sof-cnl-nocodec.tplg",
+   .ops = &sof_cnl_ops,
+   .arch_ops = &sof_xtensa_arch_ops
+};
+#endif
+
 #if IS_ENABLED(CONFIG_SND_SOC_SOF_ICELAKE)
 static const struct sof_dev_desc icl_desc = {
.machines   = snd_soc_acpi_intel_icl_machines,
@@ -353,6 +373,14 @@ static const struct pci_device_id sof_pci_ids[] = {
 #if IS_ENABLED(CONFIG_SND_SOC_SOF_ICELAKE)
{ PCI_DEVICE(0x8086, 0x34C8),
.driver_data = (unsigned long)&icl_desc},
+#endif
+#if IS_ENABLED(CONFIG_SND_SOC_SOF_COMETLAKE_LP)
+   { PCI_DEVICE(0x8086, 0x02c8),
+   .driver_data = (unsigned long)&cml_desc},
+#endif
+#if IS_ENABLED(CONFIG_SND_SOC_SOF_COMETLAKE_H)
+   { PCI_DEVICE(0x8086, 0x06c8),
+   .driver_data = (unsigned long)&cml_desc},
 #endif
{ 0, }
 };
-- 
2.20.1

Re: [PATCH v2 0/3] initramfs: add support for xattrs in the initial ram disk

2019-05-10 Thread Mimi Zohar

On Fri, 2019-05-10 at 15:46 -0500, Rob Landley wrote:
> On 5/10/19 6:49 AM, Mimi Zohar wrote:
> > On Fri, 2019-05-10 at 08:56 +0200, Roberto Sassu wrote:
> >> On 5/9/2019 8:34 PM, Rob Landley wrote:
> >>> On 5/9/19 6:24 AM, Roberto Sassu wrote:
> > 
> >>>> The difference with another proposal
> >>>> (https://lore.kernel.org/patchwork/cover/888071/) is that xattrs can be
> >>>> included in an image without changing the image format, as opposed to
> >>>> defining a new one. As seen from the discussion, if a new format has to be
> >>>> defined, it should fix the issues of the existing format, which requires
> >>>> more time.
> >>>
> >>> So you've explicitly chosen _not_ to address Y2038 while you're there.
> >>
> >> Can you be more specific?
> > 
> > Right, this patch set avoids incrementing the CPIO magic number and
> > the resulting changes required (eg. increasing the timestamp field
> > size), by including a file with the security xattrs in the CPIO.  In
> > either case, including the security xattrs in the initramfs header or
> > as a separate file, the initramfs, itself, needs to be signed.
> 
> The /init binary in the initramfs runs as root and launches all other processes
> on the system. Presumably it can write any xattrs it wants to, and doesn't need
> any extra permissions granted to it to do so. But as soon as you start putting
> xattrs on _other_ files within the initramfs that are _not_ necessarily running
> as PID 1, _that's_ when the need to sign the initramfs comes in?
> 
> Presumably the signing occurs on the gzipped file. How does that affect the cpio
> parsing _after_ it's decompressed? Why would that be part of _this_ patch?

The signing and verification of the initramfs is a separate issue, not
part of this patch set.  The only reason for mentioning it here was to
say that both methods of including the security xattrs require the
initramfs be signed.  Just as the kernel image needs to be signed and
verified, the initramfs should be too.

Mimi

[PATCH 3/4] platform/chrome: cros_ec_spi: Set ourselves as timing sensitive

2019-05-10 Thread Douglas Anderson

All currently known ECs in the wild are very sensitive to timing.
Specifically the ECs are known to drop a transfer if more than 8 ms
passes from the assertion of the chip select until the transfer
finishes.

Let's use the new feature introduced in the patch ("spi: Allow SPI
devices to specify that they are timing sensitive") to specify this
and increase the success rate of our transfers.

NOTE: if future Chrome OS ECs ever fix themselves to be less sensitive
then we could consider adding a property (or compatible string) to not
set this property.  For now we need it across the board.

With this change we can revert the commit 37a186225a0c
("platform/chrome: cros_ec_spi: Transfer messages at high priority").
...and, in fact, transfers are _even more_ reliable than they were
with that commit since the SPI framework will use a higher priority
(realtime) and we no longer lose our priority when we get shunted over
to the message pumping thread (because we now always get shunted and
the thread is high priority).

Signed-off-by: Douglas Anderson 
---

 drivers/platform/chrome/cros_ec_spi.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/platform/chrome/cros_ec_spi.c 
b/drivers/platform/chrome/cros_ec_spi.c
index 8e9451720e73..757a115502ec 100644
--- a/drivers/platform/chrome/cros_ec_spi.c
+++ b/drivers/platform/chrome/cros_ec_spi.c
@@ -703,6 +703,7 @@ static int cros_ec_spi_probe(struct spi_device *spi)
 
spi->bits_per_word = 8;
spi->mode = SPI_MODE_0;
+   spi->timing_sensitive = true;
err = spi_setup(spi);
if (err < 0)
return err;
-- 
2.21.0.1020.gf2820cf01a-goog

[PATCH 0/4] spi: A better solution for cros_ec_spi reliability

2019-05-10 Thread Douglas Anderson

This series is a much better solution for getting the Chrome OS EC to
talk reliably and replaces commit 37a186225a0c ("platform/chrome:
cros_ec_spi: Transfer messages at high priority").

Note that the cros_ec bits can't land until the SPI bits are
somewhere.  If the SPI bits look OK to land it might be convenient if
they could be placed somewhere with a stable git hash?

Special thanks to Guenter Roeck for pointing out the "realtime"
feature of the SPI framework so I didn't re-invent the wheel.  I have
no idea how I missed it.  :-/


Douglas Anderson (4):
  spi: For controllers that need realtime always use the pump thread
  spi: Allow SPI devices to specify that they are timing sensitive
  platform/chrome: cros_ec_spi: Set ourselves as timing sensitive
  Revert "platform/chrome: cros_ec_spi: Transfer messages at high
priority"

 drivers/platform/chrome/cros_ec_spi.c | 81 +++
 drivers/spi/spi.c | 41 +++---
 include/linux/spi/spi.h   |  3 +
 3 files changed, 43 insertions(+), 82 deletions(-)

-- 
2.21.0.1020.gf2820cf01a-goog

[PATCH 4/4] Revert "platform/chrome: cros_ec_spi: Transfer messages at high priority"

2019-05-10 Thread Douglas Anderson

This reverts commit 37a186225a0c020516bafad2727fdcdfc039a1e4.

We have a better solution in the patch ("platform/chrome: cros_ec_spi:
Set ourselves as timing sensitive").  Let's revert the uglier and less
reliable solution.

Signed-off-by: Douglas Anderson 
---

 drivers/platform/chrome/cros_ec_spi.c | 80 ++-
 1 file changed, 6 insertions(+), 74 deletions(-)

diff --git a/drivers/platform/chrome/cros_ec_spi.c 
b/drivers/platform/chrome/cros_ec_spi.c
index 757a115502ec..70ff1ad09012 100644
--- a/drivers/platform/chrome/cros_ec_spi.c
+++ b/drivers/platform/chrome/cros_ec_spi.c
@@ -75,27 +75,6 @@ struct cros_ec_spi {
unsigned int end_of_msg_delay;
 };
 
-typedef int (*cros_ec_xfer_fn_t) (struct cros_ec_device *ec_dev,
- struct cros_ec_command *ec_msg);
-
-/**
- * struct cros_ec_xfer_work_params - params for our high priority workers
- *
- * @work: The work_struct needed to queue work
- * @fn: The function to use to transfer
- * @ec_dev: ChromeOS EC device
- * @ec_msg: Message to transfer
- * @ret: The return value of the function
- */
-
-struct cros_ec_xfer_work_params {
-   struct work_struct work;
-   cros_ec_xfer_fn_t fn;
-   struct cros_ec_device *ec_dev;
-   struct cros_ec_command *ec_msg;
-   int ret;
-};
-
 static void debug_packet(struct device *dev, const char *name, u8 *ptr,
 int len)
 {
@@ -371,13 +350,13 @@ static int cros_ec_spi_receive_response(struct 
cros_ec_device *ec_dev,
 }
 
 /**
- * do_cros_ec_pkt_xfer_spi - Transfer a packet over SPI and receive the reply
+ * cros_ec_pkt_xfer_spi - Transfer a packet over SPI and receive the reply
  *
  * @ec_dev: ChromeOS EC device
  * @ec_msg: Message to transfer
  */
-static int do_cros_ec_pkt_xfer_spi(struct cros_ec_device *ec_dev,
-  struct cros_ec_command *ec_msg)
+static int cros_ec_pkt_xfer_spi(struct cros_ec_device *ec_dev,
+   struct cros_ec_command *ec_msg)
 {
struct ec_host_response *response;
struct cros_ec_spi *ec_spi = ec_dev->priv;
@@ -514,13 +493,13 @@ static int do_cros_ec_pkt_xfer_spi(struct cros_ec_device 
*ec_dev,
 }
 
 /**
- * do_cros_ec_cmd_xfer_spi - Transfer a message over SPI and receive the reply
+ * cros_ec_cmd_xfer_spi - Transfer a message over SPI and receive the reply
  *
  * @ec_dev: ChromeOS EC device
  * @ec_msg: Message to transfer
  */
-static int do_cros_ec_cmd_xfer_spi(struct cros_ec_device *ec_dev,
-  struct cros_ec_command *ec_msg)
+static int cros_ec_cmd_xfer_spi(struct cros_ec_device *ec_dev,
+   struct cros_ec_command *ec_msg)
 {
struct cros_ec_spi *ec_spi = ec_dev->priv;
struct spi_transfer trans;
@@ -632,53 +611,6 @@ static int do_cros_ec_cmd_xfer_spi(struct cros_ec_device 
*ec_dev,
return ret;
 }
 
-static void cros_ec_xfer_high_pri_work(struct work_struct *work)
-{
-   struct cros_ec_xfer_work_params *params;
-
-   params = container_of(work, struct cros_ec_xfer_work_params, work);
-   params->ret = params->fn(params->ec_dev, params->ec_msg);
-}
-
-static int cros_ec_xfer_high_pri(struct cros_ec_device *ec_dev,
-struct cros_ec_command *ec_msg,
-cros_ec_xfer_fn_t fn)
-{
-   struct cros_ec_xfer_work_params params;
-
-   INIT_WORK_ONSTACK(&params.work, cros_ec_xfer_high_pri_work);
-   params.ec_dev = ec_dev;
-   params.ec_msg = ec_msg;
-   params.fn = fn;
-
-   /*
-* This looks a bit ridiculous.  Why do the work on a
-* different thread if we're just going to block waiting for
-* the thread to finish?  The key here is that the thread is
-* running at high priority but the calling context might not
-* be.  We need to be at high priority to avoid getting
-* context switched out for too long and the EC giving up on
-* the transfer.
-*/
-   queue_work(system_highpri_wq, &params.work);
-   flush_work(&params.work);
-   destroy_work_on_stack(&params.work);
-
-   return params.ret;
-}
-
-static int cros_ec_pkt_xfer_spi(struct cros_ec_device *ec_dev,
-   struct cros_ec_command *ec_msg)
-{
-   return cros_ec_xfer_high_pri(ec_dev, ec_msg, do_cros_ec_pkt_xfer_spi);
-}
-
-static int cros_ec_cmd_xfer_spi(struct cros_ec_device *ec_dev,
-   struct cros_ec_command *ec_msg)
-{
-   return cros_ec_xfer_high_pri(ec_dev, ec_msg, do_cros_ec_cmd_xfer_spi);
-}
-
 static void cros_ec_spi_dt_probe(struct cros_ec_spi *ec_spi, struct device 
*dev)
 {
struct device_node *np = dev->of_node;
-- 
2.21.0.1020.gf2820cf01a-goog

[PATCH 1/4] spi: For controllers that need realtime always use the pump thread

2019-05-10 Thread Douglas Anderson

If a controller specifies that it needs high priority for sending
messages we should always schedule our transfers on the thread.  If we
don't do this we'll do the transfer in the caller's context which
might not be very high priority.

Signed-off-by: Douglas Anderson 
---

 drivers/spi/spi.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
index 8eb7460dd744..0597f7086de3 100644
--- a/drivers/spi/spi.c
+++ b/drivers/spi/spi.c
@@ -1230,8 +1230,11 @@ static void __spi_pump_messages(struct spi_controller 
*ctlr, bool in_kthread)
return;
}
 
-   /* If another context is idling the device then defer */
-   if (ctlr->idling) {
+   /*
+* If another context is idling the device then defer.
+* If we are high priority then the thread should do the transfer.
+*/
+   if (ctlr->idling || (ctlr->rt && !in_kthread)) {
kthread_queue_work(&ctlr->kworker, &ctlr->pump_messages);
spin_unlock_irqrestore(&ctlr->queue_lock, flags);
return;
-- 
2.21.0.1020.gf2820cf01a-goog
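
The changed condition in this patch can be read as a small truth table. The
following userspace sketch models it. struct ctlr_model and
should_defer_to_pump_thread() are hypothetical names chosen for illustration,
and the model ignores locking and the rest of __spi_pump_messages().

```c
#include <stdbool.h>

/* Hypothetical model of the controller flags read by the changed condition. */
struct ctlr_model {
	bool idling;	/* another context is idling the device */
	bool rt;	/* controller wants realtime message pumping */
};

/*
 * Returns true when the caller should queue the work to the pump thread
 * instead of doing the transfer in its own context.
 */
static bool should_defer_to_pump_thread(const struct ctlr_model *ctlr,
					bool in_kthread)
{
	return ctlr->idling || (ctlr->rt && !in_kthread);
}
```

The !in_kthread term matters: without it the pump thread itself would keep
re-queueing the work on a realtime controller and never perform the transfer.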

[PATCH 2/4] spi: Allow SPI devices to specify that they are timing sensitive

2019-05-10 Thread Douglas Anderson

If a device on the SPI bus is very sensitive to timing then it may be
necessary (for correctness) not to get interrupted during a transfer.
One example is the EC (Embedded Controller) on Chromebooks.  The
Chrome OS EC will drop a transfer if more than ~8ms passes between the
chip select being asserted and the transfer finishing.

The SPI framework already has code to handle the case where transfers
are timing sensitive.  It can set its message pumping thread to
realtime to minimize interruptions during the transfer.  However,
at the moment, this mode can only be requested by a SPI controller.
Let's allow the drivers for SPI devices to also request this mode.

NOTE: at the moment if a given device on a bus says that it's timing
sensitive then we'll pump all messages on that bus at high priority.
It is possible we might want to relax this in the future but it seems
like it should be fine for now.

Signed-off-by: Douglas Anderson 
---

 drivers/spi/spi.c   | 34 --
 include/linux/spi/spi.h |  3 +++
 2 files changed, 31 insertions(+), 6 deletions(-)

diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
index 0597f7086de3..d117ab3adafa 100644
--- a/drivers/spi/spi.c
+++ b/drivers/spi/spi.c
@@ -1367,10 +1367,30 @@ static void spi_pump_messages(struct kthread_work *work)
__spi_pump_messages(ctlr, true);
 }
 
-static int spi_init_queue(struct spi_controller *ctlr)
+/**
+ * spi_boost_thread_priority - set the controller to pump at realtime priority
+ * @ctlr: controller to boost priority of
+ *
+ * This can be called because the controller requested realtime priority
+ * (by setting the ->rt value before calling spi_register_controller()) or
+ * because a device on the bus said that its transfers were timing sensitive.
+ *
+ * NOTE: at the moment if any device on a bus says it is timing sensitive then
+ * all the devices on this bus will do transfers at realtime priority.  If
+ * this eventually becomes a problem we may see if we can find a way to boost
+ * the priority only temporarily during relevant transfers.
+ */
+static void spi_boost_thread_priority(struct spi_controller *ctlr)
 {
struct sched_param param = { .sched_priority = MAX_RT_PRIO - 1 };
 
+   dev_info(&ctlr->dev,
+   "will run message pump with realtime priority\n");
+   sched_setscheduler(ctlr->kworker_task, SCHED_FIFO, &param);
+}
+
+static int spi_init_queue(struct spi_controller *ctlr)
+{
ctlr->running = false;
ctlr->busy = false;
 
@@ -1390,11 +1410,8 @@ static int spi_init_queue(struct spi_controller *ctlr)
 * request and the scheduling of the message pump thread. Without this
 * setting the message pump thread will remain at default priority.
 */
-   if (ctlr->rt) {
-   dev_info(&ctlr->dev,
-   "will run message pump with realtime priority\n");
-   sched_setscheduler(ctlr->kworker_task, SCHED_FIFO, &param);
-   }
+   if (ctlr->rt)
+   spi_boost_thread_priority(ctlr);
 
return 0;
 }
@@ -2985,6 +3002,11 @@ int spi_setup(struct spi_device *spi)
 
spi_set_cs(spi, false);
 
+   if (spi->timing_sensitive && !spi->controller->rt) {
+   spi->controller->rt = true;
+   spi_boost_thread_priority(spi->controller);
+   }
+
dev_dbg(&spi->dev, "setup mode %d, %s%s%s%s%u bits/w, %u Hz max --> %d\n",
(int) (spi->mode & (SPI_CPOL | SPI_CPHA)),
(spi->mode & SPI_CS_HIGH) ? "cs_high, " : "",
diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h
index 053abd22ad31..ef6bdd4d25f2 100644
--- a/include/linux/spi/spi.h
+++ b/include/linux/spi/spi.h
@@ -109,6 +109,8 @@ void spi_statistics_add_transfer_stats(struct 
spi_statistics *stats,
  * This may be changed by the device's driver, or left at the
  * default (0) indicating protocol words are eight bit bytes.
  * The spi_transfer.bits_per_word can override this for each transfer.
+ * @timing_sensitive: Transfers for this device are sensitive to timing
+ * so we should do our transfer at high priority.
  * @irq: Negative, or the number passed to request_irq() to receive
  * interrupts from this device.
  * @controller_state: Controller's runtime state
@@ -143,6 +145,7 @@ struct spi_device {
u32 max_speed_hz;
u8  chip_select;
u8  bits_per_word;
+   bool    timing_sensitive;
u32 mode;
 #define SPI_CPHA    0x01    /* clock phase */
 #define SPI_CPOL    0x02    /* clock polarity */
-- 
2.21.0.1020.gf2820cf01a-goog
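
As the NOTE in the commit message says, one timing-sensitive device promotes
the whole bus. The userspace sketch below models the check added to
spi_setup(). All type and function names are hypothetical stand-ins, and the
boost is reduced to a counter rather than a real scheduler call.

```c
#include <stdbool.h>

/* Hypothetical model of the controller state touched by the patch. */
struct controller_model {
	bool rt;		/* message pump runs at realtime priority */
	int boost_calls;	/* how many times the priority was raised */
};

/* Hypothetical model of a device sitting on that controller's bus. */
struct device_model {
	struct controller_model *controller;
	bool timing_sensitive;
};

/* Stand-in for the priority boost; here it only counts invocations. */
static void boost_thread_priority(struct controller_model *ctlr)
{
	ctlr->boost_calls++;
}

/* Models the new check: promote the controller once, on first request. */
static void setup_model(struct device_model *spi)
{
	if (spi->timing_sensitive && !spi->controller->rt) {
		spi->controller->rt = true;
		boost_thread_priority(spi->controller);
	}
}
```

Because the check is guarded by !rt, repeated setup calls or several
sensitive devices on one bus raise the priority only once, and a controller
that already requested realtime mode is left alone.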

[PATCH 1/3 v5] add a new ima hook and policy to measure the cmdline

2019-05-10 Thread Prakhar Srivastava

From: Prakhar Srivastava 

For this reason, this patch adds support for measuring these
parameters during kexec. To achieve this, a new ima policy and
hook id, defined KEXEC_CMDLINE and ima_kexec_cmdline respectively,
are added.

Signed-off-by: Prakhar Srivastava 
---
 Documentation/ABI/testing/ima_policy |  1 +
 include/linux/ima.h  |  2 +
 security/integrity/ima/ima.h |  1 +
 security/integrity/ima/ima_api.c |  1 +
 security/integrity/ima/ima_main.c| 84 
 security/integrity/ima/ima_policy.c  |  9 +++
 6 files changed, 98 insertions(+)

diff --git a/Documentation/ABI/testing/ima_policy 
b/Documentation/ABI/testing/ima_policy
index 74c6702de74e..62e7cd687e9c 100644
--- a/Documentation/ABI/testing/ima_policy
+++ b/Documentation/ABI/testing/ima_policy
@@ -29,6 +29,7 @@ Description:
base:   func:= 
[BPRM_CHECK][MMAP_CHECK][CREDS_CHECK][FILE_CHECK][MODULE_CHECK]
[FIRMWARE_CHECK]
[KEXEC_KERNEL_CHECK] [KEXEC_INITRAMFS_CHECK]
+   [KEXEC_CMDLINE]
mask:= [[^]MAY_READ] [[^]MAY_WRITE] [[^]MAY_APPEND]
   [[^]MAY_EXEC]
fsmagic:= hex value
diff --git a/include/linux/ima.h b/include/linux/ima.h
index dc12fbcf484c..2e2c77280be8 100644
--- a/include/linux/ima.h
+++ b/include/linux/ima.h
@@ -26,6 +26,7 @@ extern int ima_read_file(struct file *file, enum 
kernel_read_file_id id);
 extern int ima_post_read_file(struct file *file, void *buf, loff_t size,
  enum kernel_read_file_id id);
 extern void ima_post_path_mknod(struct dentry *dentry);
+extern void ima_kexec_cmdline(const void *buf, int size);
 
 #ifdef CONFIG_IMA_KEXEC
 extern void ima_add_kexec_buffer(struct kimage *image);
@@ -92,6 +93,7 @@ static inline void ima_post_path_mknod(struct dentry *dentry)
return;
 }
 
+static inline void ima_kexec_cmdline(const void *buf, int size) {}
 #endif /* CONFIG_IMA */
 
 #ifndef CONFIG_IMA_KEXEC
diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
index d213e835c498..226a26d8de09 100644
--- a/security/integrity/ima/ima.h
+++ b/security/integrity/ima/ima.h
@@ -184,6 +184,7 @@ static inline unsigned long ima_hash_key(u8 *digest)
hook(KEXEC_KERNEL_CHECK)\
hook(KEXEC_INITRAMFS_CHECK) \
hook(POLICY_CHECK)  \
+   hook(KEXEC_CMDLINE) \
hook(MAX_CHECK)
 #define __ima_hook_enumify(ENUM)   ENUM,
 
diff --git a/security/integrity/ima/ima_api.c b/security/integrity/ima/ima_api.c
index c7505fb122d4..800d965232e5 100644
--- a/security/integrity/ima/ima_api.c
+++ b/security/integrity/ima/ima_api.c
@@ -169,6 +169,7 @@ void ima_add_violation(struct file *file, const unsigned 
char *filename,
  * subj=, obj=, type=, func=, mask=, fsmagic=
  * subj,obj, and type: are LSM specific.
  * func: FILE_CHECK | BPRM_CHECK | CREDS_CHECK | MMAP_CHECK | MODULE_CHECK
+ * | KEXEC_CMDLINE
  * mask: contains the permission mask
  * fsmagic: hex value
  *
diff --git a/security/integrity/ima/ima_main.c 
b/security/integrity/ima/ima_main.c
index 357edd140c09..1d186bda25fe 100644
--- a/security/integrity/ima/ima_main.c
+++ b/security/integrity/ima/ima_main.c
@@ -576,6 +576,90 @@ int ima_load_data(enum kernel_load_data_id id)
return 0;
 }
 
+/*
+ * process_buffer_measurement - Measure the buffer passed to ima log.
+ * (Instead of using the file hash use the buffer hash).
+ * @buf - The buffer that needs to be added to the log
+ * @size - size of buffer(in bytes)
+ * @eventname - event name to be used for buffer.
+ *
+ * The buffer passed is added to the ima log.
+ *
+ * On success return 0.
+ * On error cases surface errors from ima calls.
+ */
+static int process_buffer_measurement(const void *buf, int size,
+   const char *eventname, const struct cred *cred,
+   u32 secid)
+{
+   int ret = 0;
+   struct ima_template_entry *entry = NULL;
+   struct integrity_iint_cache tmp_iint, *iint = &tmp_iint;
+   struct ima_event_data event_data = {iint, NULL, NULL,
+   NULL, 0, NULL};
+   struct {
+   struct ima_digest_data hdr;
+   char digest[IMA_MAX_DIGEST_SIZE];
+   } hash;
+   int violation = 0;
+   int pcr = CONFIG_IMA_MEASURE_PCR_IDX;
+   int action = 0;
+
+   action = ima_get_action(NULL, cred, secid, 0, KEXEC_CMDLINE, &pcr);
+   if (!(action & IMA_AUDIT) && !(action & IMA_MEASURE))
+   goto out;
+
+   memset(iint, 0, sizeof(*iint));
+   memset(&hash, 0, sizeof(hash));
+
+   event_data.filename = eventname;
+
+   iint->ima_hash = &hash.hdr;
+   iint->ima_hash->algo = ima_hash_algo;
+   iint->ima_hash->length = hash_digest_size[ima_hash_algo];
+
+   ret = ima_calc_buffer_hash(buf,

[PATCH 2/3 v5] add a new template field buf to contain the buffer

2019-05-10 Thread Prakhar Srivastava

From: Prakhar Srivastava 

The buffer (cmdline args) added to the ima log cannot be attested
without having the actual buffer. Thus, to make the measured buffer
available to store/read, a new ima template field (buf) is added.
The cmdline args used for soft reboot can then be read and attested
later.
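
As a rough user-space illustration of the record this patch builds (a
small header followed by a copy of the buffer, allocated in one piece),
something like the following could be used. The struct name, the type
value and the helper are hypothetical and are not part of the patch; the
length check against the 16-bit field is an addition of this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the header + payload record in the patch. */
struct buf_record {
	uint32_t type;        /* stand-in for enum evm_ima_xattr_type */
	uint16_t buf_length;  /* payload length in bytes */
	unsigned char buf[];  /* payload, allocated together with the header */
};

#define BUF_RECORD_TYPE 0x07u /* illustrative value only, not IMA_XATTR_BUFFER */

/*
 * Allocate a record and copy the payload into it.  Returns NULL when the
 * payload does not fit the 16-bit length field or allocation fails.
 * On success *total (if non-NULL) receives the full record size.
 */
static struct buf_record *buf_record_new(const void *data, size_t size,
					 size_t *total)
{
	struct buf_record *rec;

	assert(data != NULL || size == 0);

	if (size > UINT16_MAX)
		return NULL;

	rec = calloc(1, sizeof(*rec) + size);
	if (!rec)
		return NULL;

	rec->type = BUF_RECORD_TYPE;
	rec->buf_length = (uint16_t)size;
	if (size)
		memcpy(rec->buf, data, size);
	if (total)
		*total = sizeof(*rec) + size;

	return rec;
}
```

The caller owns the returned record and releases it with free(), which
mirrors the kzalloc()/kfree() pairing in the hunk below.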

The patch adds a new template field buf to store/read the buffer
used while measuring kexec_cmdline args in the 
[PATCH 1/2 v5]: "add a new ima hook and policy to measure the cmdline".
Signed-off-by: Prakhar Srivastava 
---
 security/integrity/ima/ima_main.c | 23 +++
 security/integrity/ima/ima_template.c |  2 ++
 security/integrity/ima/ima_template_lib.c | 21 +
 security/integrity/ima/ima_template_lib.h |  4 
 security/integrity/integrity.h|  1 +
 5 files changed, 51 insertions(+)

diff --git a/security/integrity/ima/ima_main.c 
b/security/integrity/ima/ima_main.c
index 1d186bda25fe..ca12885ca241 100644
--- a/security/integrity/ima/ima_main.c
+++ b/security/integrity/ima/ima_main.c
@@ -605,10 +605,32 @@ static int process_buffer_measurement(const void *buf, 
int size,
int pcr = CONFIG_IMA_MEASURE_PCR_IDX;
int action = 0;
 
+   struct buffer_xattr {
+   enum evm_ima_xattr_type type;
+   u16 buf_length;
+   unsigned char buf[0];
+   };
+   struct buffer_xattr *buffer_event_data = NULL;
+   int alloc_length = 0;
+
action = ima_get_action(NULL, cred, secid, 0, KEXEC_CMDLINE, &pcr);
if (!(action & IMA_AUDIT) && !(action & IMA_MEASURE))
goto out;
 
+   alloc_length = sizeof(struct buffer_xattr) + size;
+   buffer_event_data = kzalloc(alloc_length, GFP_KERNEL);
+   if (!buffer_event_data) {
+   ret = -ENOMEM;
+   goto out;
+   }
+
+   buffer_event_data->type = IMA_XATTR_BUFFER;
+   buffer_event_data->buf_length = size;
+   memcpy(buffer_event_data->buf, buf, size);
+
+   event_data.xattr_value = (struct evm_ima_xattr_data *)buffer_event_data;
+   event_data.xattr_len = alloc_length;
+
memset(iint, 0, sizeof(*iint));
memset(&hash, 0, sizeof(hash));
 
@@ -638,6 +660,7 @@ static int process_buffer_measurement(const void *buf, int 
size,
}
 
 out:
+   kfree(buffer_event_data);
return ret;
 }
 
diff --git a/security/integrity/ima/ima_template.c 
b/security/integrity/ima/ima_template.c
index b631b8bc7624..a76d1c04162a 100644
--- a/security/integrity/ima/ima_template.c
+++ b/security/integrity/ima/ima_template.c
@@ -43,6 +43,8 @@ static const struct ima_template_field supported_fields[] = {
 .field_show = ima_show_template_string},
{.field_id = "sig", .field_init = ima_eventsig_init,
 .field_show = ima_show_template_sig},
+   {.field_id = "buf", .field_init = ima_eventbuf_init,
+.field_show = ima_show_template_buf},
 };
 #define MAX_TEMPLATE_NAME_LEN 15
 
diff --git a/security/integrity/ima/ima_template_lib.c 
b/security/integrity/ima/ima_template_lib.c
index 513b457ae900..95a827f42c18 100644
--- a/security/integrity/ima/ima_template_lib.c
+++ b/security/integrity/ima/ima_template_lib.c
@@ -162,6 +162,11 @@ void ima_show_template_sig(struct seq_file *m, enum 
ima_show_type show,
ima_show_template_field_data(m, show, DATA_FMT_HEX, field_data);
 }
 
+void ima_show_template_buf(struct seq_file *m, enum ima_show_type show,
+   struct ima_field_data *field_data)
+{
+   ima_show_template_field_data(m, show, DATA_FMT_HEX, field_data);
+}
 /**
  * ima_parse_buf() - Parses lengths and data from an input buffer
  * @bufstartp:   Buffer start address.
@@ -389,3 +394,19 @@ int ima_eventsig_init(struct ima_event_data *event_data,
return ima_write_template_field_data(xattr_value, event_data->xattr_len,
 DATA_FMT_HEX, field_data);
 }
+
+/*
+ *  ima_eventbuf_init - include the buffer(kexec-cmdline) as part of the
+ *  template data.
+ */
+int ima_eventbuf_init(struct ima_event_data *event_data,
+   struct ima_field_data *field_data)
+{
+   struct evm_ima_xattr_data *xattr_value = event_data->xattr_value;
+
+   if ((!xattr_value) || (xattr_value->type != IMA_XATTR_BUFFER))
+   return 0;
+
+   return ima_write_template_field_data(xattr_value, event_data->xattr_len,
+   DATA_FMT_HEX, field_data);
+}
diff --git a/security/integrity/ima/ima_template_lib.h 
b/security/integrity/ima/ima_template_lib.h
index 6a3d8b831deb..12f1a8578b31 100644
--- a/security/integrity/ima/ima_template_lib.h
+++ b/security/integrity/ima/ima_template_lib.h
@@ -29,6 +29,8 @@ void ima_show_template_string(struct seq_file *m, enum 
ima_show_type show,
  struct ima_field_data *field_data);
 void ima_show_template_sig(struct seq_file *m, enum ima_show_type show,

[PATCH 0/3 v5] Kexec cmdline buffer measure

2019-05-10 Thread Prakhar Srivastava

From: Prakhar Srivastava 

For secure boot attestation, it is necessary to measure the kernel
command line and the kernel version. For cold boot, the boot loader
can be enhanced to measure these parameters.
(https://mjg59.dreamwidth.org/48897.html)
However, for attestation across soft reboot boundary, these values 
also need to be measured during kexec.

Currently, for the kexec (kexec_file_load)/soft reboot scenario, the cmdline
args that are used to boot the next kernel are not measured. For the
normal case of boot/hard reboot, the cmdline args are measured into the TPM.

The hash of boot command line is calculated and added to the current 
running kernel's measurement list.  On a soft reboot like kexec, the PCRs
are not reset to zero.  Refer to commit 94c3aac567a9 ("ima: on soft 
reboot, restore the measurement list") patch description.

To achieve the above, the patch series does the following:
  -adds a new ima hook: ima_kexec_cmdline which measures the cmdline args
   into the ima log, behind a new ima policy entry KEXEC_CMDLINE.
  -since the cmdline args cannot be appraised, a new template field(buf) is
   added. The template field contains the buffer passed(cmdline args), which
   can be used to appraise/attest at a later stage.
  -call the ima_kexec_cmdline(...) hook from kexec_file_load call.

The ima logs need to be carried over to the next kernel, which will be followed
up by other patchsets for x86_64 and arm64.


Changelog:
v5:
  -add a new ima hook and policy to measure the cmdline
args(ima_kexec_cmdline)
  -add a new template field buf to contain the buffer measured.
[suggested by Mimi Zohar]
  -call ima_kexec_cmdline from kexec_file_load path

v4:
  - per feedback from LSM community, removed the LSM hook and renamed the
IMA policy to KEXEC_CMDLINE

v3: (rebase changes to next-general)
  - Add policy checks for buffer[suggested by Mimi Zohar]
  - use the IMA_XATTR to add buffer
  - Add kexec_cmdline used for kexec file load
  - Add an LSM hook to allow usage by other LSM.[suggested by Mimi Zohar]

v2:
  - Add policy checks for buffer[suggested by Mimi Zohar]
  - Add an LSM hook to allow usage by other LSM.[suggested by Mimi Zohar]
  - use the IMA_XATTR to add buffer instead of sig template

v1:
  -Add kconfigs to control the ima_buffer_check
  -measure the cmdline args suffixed with the kernel file name
  -add the buffer to the template sig field.


Prakhar Srivastava (3):
  add a new ima hook and policy to measure the cmdline
  add a new template field buf to contain the buffer
  call ima_kexec_cmdline from kexec_file_load path

 Documentation/ABI/testing/ima_policy  |   1 +
 include/linux/ima.h   |   2 +
 kernel/kexec_file.c   |   2 +
 security/integrity/ima/ima.h  |   1 +
 security/integrity/ima/ima_api.c  |   1 +
 security/integrity/ima/ima_main.c | 107 ++
 security/integrity/ima/ima_policy.c   |   9 ++
 security/integrity/ima/ima_template.c |   2 +
 security/integrity/ima/ima_template_lib.c |  21 +
 security/integrity/ima/ima_template_lib.h |   4 +
 security/integrity/integrity.h|   1 +
 11 files changed, 151 insertions(+)

-- 
2.20.1

[PATCH 3/3 v5] call ima_kexec_cmdline from kexec_file_load path

2019-05-10 Thread Prakhar Srivastava

From: Prakhar Srivastava 

Signed-off-by: Prakhar Srivastava 
---
 kernel/kexec_file.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index f1d0e00a3971..e779bcf674a0 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -241,6 +241,8 @@ kimage_file_prepare_segments(struct kimage *image, int 
kernel_fd, int initrd_fd,
ret = -EINVAL;
goto out;
}
+
+   ima_kexec_cmdline(image->cmdline_buf, image->cmdline_buf_len - 1);
}
 
/* Call arch image load handlers */
-- 
2.20.1

Re: [PATCH] gcc-plugins: arm_ssp_per_task_plugin: Fix for older GCC < 6

2019-05-10 Thread Doug Anderson

Hi,

> Use gen_rtx_set instead of gen_rtx_SET. The former is a wrapper macro
> that handles the difference between GCC versions implementing
> the latter.
>
> This fixes the following error on my system with g++ 5.4.0 as the host
> compiler
>
>HOSTCXX -fPIC scripts/gcc-plugins/arm_ssp_per_task_plugin.o
>  scripts/gcc-plugins/arm_ssp_per_task_plugin.c:42:14: error: macro 
> "gen_rtx_SET" requires 3 arguments, but only 2 given
>   mask)),
>^
>  scripts/gcc-plugins/arm_ssp_per_task_plugin.c: In function ‘unsigned int 
> arm_pertask_ssp_rtl_execute()’:
>  scripts/gcc-plugins/arm_ssp_per_task_plugin.c:39:20: error: ‘gen_rtx_SET’ 
> was not declared in this scope
> emit_insn_before(gen_rtx_SET
>
> Signed-off-by: Chris Packham 
> ---
>  scripts/gcc-plugins/arm_ssp_per_task_plugin.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

I can confirm that I was getting compile errors before this patch and
applying it allowed me to compile and boot.  Thanks!  :-)

Tested-by: Douglas Anderson 

-Doug

[GIT PULL] security subsystem: Tomoyo updates for v5.2

2019-05-10 Thread James Morris

Please pull.

These patches include fixes to enable fuzz testing, and a fix for 
calculating whether a filesystem is user-modifiable.

The following changes since commit 1fb3b526df3bd7647e7854915ae6b22299408baf:

  Merge tag 'docs-5.2a' of git://git.lwn.net/linux (2019-05-10 13:24:53 -0400)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security.git next-tomoyo2

for you to fetch changes up to 4ad98ac46490d5f8441025930070eaf028cfd0f2:

  tomoyo: Don't emit WARNING: string while fuzzing testing. (2019-05-10 14:58:35 -0700)


Tetsuo Handa (4):
  tomoyo: Add a kernel config option for fuzzing testing.
  tomoyo: Check address length before reading address family
  tomoyo: Change pathname calculation for read-only filesystems.
  tomoyo: Don't emit WARNING: string while fuzzing testing.

 security/tomoyo/Kconfig| 10 ++
 security/tomoyo/common.c   | 13 -
 security/tomoyo/network.c  |  4 
 security/tomoyo/realpath.c |  3 ++-
 security/tomoyo/util.c |  2 ++
 5 files changed, 30 insertions(+), 2 deletions(-)

Re: [RFC][PATCH 3/2] livepatch: remove klp_check_compiler_support()

2019-05-10 Thread Josh Poimboeuf

On Fri, May 10, 2019 at 11:47:50PM +0200, Jiri Kosina wrote:
> From: Jiri Kosina 
> 
> The only purpose of klp_check_compiler_support() is to make sure that we 
> are not using ftrace on x86 via mcount (because that's executed only after 
> prologue has already happened, and that's too late for livepatching 
> purposes).
> 
> Now that mcount is not supported by ftrace any more, there is no need for 
> klp_check_compiler_support() either.
> 
> Reported-by: Linus Torvalds 
> Signed-off-by: Jiri Kosina 
> ---
> 
> I guess it makes most sense to merge this together with mcount removal in 
> one go.

Acked-by: Josh Poimboeuf 

-- 
Josh

Re: [PATCH v2 00/17] kunit: introduce KUnit, the Linux kernel unit testing framework

2019-05-10 Thread Frank Rowand

On 5/9/19 3:20 PM, Logan Gunthorpe wrote:
> 
> 
> On 2019-05-09 3:42 p.m., Theodore Ts'o wrote:
>> On Thu, May 09, 2019 at 11:12:12AM -0700, Frank Rowand wrote:
>>>
>>>     "My understanding is that the intent of KUnit is to avoid booting a 
>>> kernel on
>>>     real hardware or in a virtual machine.  That seems to be a matter of 
>>> semantics
>>>     to me because isn't invoking a UML Linux just running the Linux kernel 
>>> in
>>>     a different form of virtualization?
>>>
>>>     So I do not understand why KUnit is an improvement over kselftest.
>>>
>>>     ...
>>> 
>>> What am I missing?"
>> 
>> One major difference: kselftest requires a userspace environment;
>> it starts systemd, requires a root file system from which you can
>> load modules, etc.  Kunit doesn't require a root file system;
>> doesn't require that you start systemd; doesn't allow you to run
>> arbitrary perl, python, bash, etc. scripts.  As such, it's much
>> lighter weight than kselftest, and will have much less overhead
>> before you can start running tests.  So it's not really the same
>> kind of virtualization.

I'm back to reply to this subthread, after a delay, as promised.


> I largely agree with everything Ted has said in this thread, but I
> wonder if we are conflating two different ideas that is causing an
> impasse. From what I see, Kunit actually provides two different
> things:

> 1) An execution environment that can be run very quickly in userspace
> on tests in the kernel source. This speeds up the tests and gives a
> lot of benefit to developers using those tests because they can get
> feedback on their code changes a *lot* quicker.

kselftest in-kernel tests provide exactly the same when the tests are
configured as "built-in" code instead of as modules.


> 2) A framework to write unit tests that provides a lot of the same
> facilities as other common unit testing frameworks from userspace
> (ie. a runner that runs a list of tests and a bunch of helpers such
> as KUNIT_EXPECT_* to simplify test passes and failures).

> The first item from Kunit is novel and I see absolutely no overlap
> with anything kselftest does. It's also the valuable thing I'd like
> to see merged and grow.

The first item exists in kselftest.


> The second item, arguably, does have significant overlap with
> kselftest. Whether you are running short tests in a light weight UML
> environment or higher level tests in an heavier VM the two could be
> using the same framework for writing or defining in-kernel tests. It
> *may* also be valuable for some people to be able to run all the UML
> tests in the heavy VM environment along side other higher level
> tests.
> 
> Looking at the selftests tree in the repo, we already have similar
> items to what Kunit is adding as I described in point (2) above.
> kselftest_harness.h contains macros like EXPECT_* and ASSERT_* with
> very similar intentions to the new KUNIT_EXECPT_* and KUNIT_ASSERT_*
> macros.

I might be wrong here because I have not dug deeply enough into the
code!!!  Does this framework apply to the userspace tests, the
in-kernel tests, or both?  My "not having dug enough GUESS" is that
these are for the user space tests (although if so, they could be
extended for in-kernel use also).

So I think this one maybe does not have an overlap between KUnit
and kselftest.


> However, the number of users of this harness appears to be quite
> small. Most of the code in the selftests tree seems to be a random
> mismash of scripts and userspace code so it's not hard to see it as
> something completely different from the new Kunit:
> $ git grep --files-with-matches kselftest_harness.h *
> Documentation/dev-tools/kselftest.rst
> MAINTAINERS
> tools/testing/selftests/kselftest_harness.h
> tools/testing/selftests/net/tls.c
> tools/testing/selftests/rtc/rtctest.c
> tools/testing/selftests/seccomp/Makefile
> tools/testing/selftests/seccomp/seccomp_bpf.c
> tools/testing/selftests/uevent/Makefile
> tools/testing/selftests/uevent/uevent_filtering.c


> Thus, I can personally see a lot of value in integrating the kunit
> test framework with this kselftest harness. There's only a small
> number of users of the kselftest harness today, so one way or another
> it seems like getting this integrated early would be a good idea.
> Letting Kunit and Kselftests progress independently for a few years
> will only make this worse and may become something we end up
> regretting.

Yes, this I agree with.

-Frank

> 
> Logan

Re: [PATCH] platform/chrome: cros_ec_spi: Always add of_match_table

2019-05-10 Thread Benson Leung

Hi Evan,

On Thu, May 09, 2019 at 11:17:50AM -0700, Evan Green wrote:
> The Chrome OS EC driver attaches to devices using the of_match_table
> even when ACPI is the underlying firmware. It does this using the
> magic PRP0001 ACPI HID, which tells ACPI to go find an OF compatible
> string under the hood and match on that.
> 
> The cros_ec_spi driver needs to provide the of_match_table regardless
> of whether CONFIG_OF is enabled or not, since the table is used by
> ACPI for PRP0001 devices.
> 
> Signed-off-by: Evan Green 

Looks good to me.
Reviewed-by: Benson Leung 

I'll leave this to Enric to merge to our for-next.

Thanks,
Benson

-- 
Benson Leung
Staff Software Engineer
Chrome OS Kernel
Google Inc.
ble...@google.com
Chromium OS Project
ble...@chromium.org



[PATCH v5 1/9] Revert "media: staging/imx: add media device to capture register"

2019-05-10 Thread Steve Longerbeam

The imx6-specific subdevs that register a capture device will no
longer hold a reference to the media device, so this commit must be
reverted.

This reverts commit 16204b8a1c1af77725533b77936e6c73953486ae.

Signed-off-by: Steve Longerbeam 
---
 drivers/staging/media/imx/imx-ic-prpencvf.c   | 2 +-
 drivers/staging/media/imx/imx-media-capture.c | 6 +++---
 drivers/staging/media/imx/imx-media-csi.c | 2 +-
 drivers/staging/media/imx/imx-media.h | 3 +--
 drivers/staging/media/imx/imx7-media-csi.c| 2 +-
 5 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/staging/media/imx/imx-ic-prpencvf.c 
b/drivers/staging/media/imx/imx-ic-prpencvf.c
index 3ca1422f6154..5c8e6ad8c025 100644
--- a/drivers/staging/media/imx/imx-ic-prpencvf.c
+++ b/drivers/staging/media/imx/imx-ic-prpencvf.c
@@ -1270,7 +1270,7 @@ static int prp_registered(struct v4l2_subdev *sd)
if (ret)
return ret;
 
-   ret = imx_media_capture_device_register(priv->md, priv->vdev);
+   ret = imx_media_capture_device_register(priv->vdev);
if (ret)
return ret;
 
diff --git a/drivers/staging/media/imx/imx-media-capture.c 
b/drivers/staging/media/imx/imx-media-capture.c
index 7688238a3396..9703c85b19c4 100644
--- a/drivers/staging/media/imx/imx-media-capture.c
+++ b/drivers/staging/media/imx/imx-media-capture.c
@@ -706,8 +706,7 @@ void imx_media_capture_device_error(struct 
imx_media_video_dev *vdev)
 }
 EXPORT_SYMBOL_GPL(imx_media_capture_device_error);
 
-int imx_media_capture_device_register(struct imx_media_dev *md,
- struct imx_media_video_dev *vdev)
+int imx_media_capture_device_register(struct imx_media_video_dev *vdev)
 {
struct capture_priv *priv = to_capture_priv(vdev);
struct v4l2_subdev *sd = priv->src_sd;
@@ -716,7 +715,8 @@ int imx_media_capture_device_register(struct imx_media_dev 
*md,
struct v4l2_subdev_format fmt_src;
int ret;
 
-   priv->md = md;
+   /* get media device */
+   priv->md = dev_get_drvdata(sd->v4l2_dev->dev);
 
vfd->v4l2_dev = sd->v4l2_dev;
 
diff --git a/drivers/staging/media/imx/imx-media-csi.c 
b/drivers/staging/media/imx/imx-media-csi.c
index c33d714ed953..41965d8b56c4 100644
--- a/drivers/staging/media/imx/imx-media-csi.c
+++ b/drivers/staging/media/imx/imx-media-csi.c
@@ -1816,7 +1816,7 @@ static int csi_registered(struct v4l2_subdev *sd)
if (ret)
goto free_fim;
 
-   ret = imx_media_capture_device_register(priv->md, priv->vdev);
+   ret = imx_media_capture_device_register(priv->vdev);
if (ret)
goto free_fim;
 
diff --git a/drivers/staging/media/imx/imx-media.h 
b/drivers/staging/media/imx/imx-media.h
index fc5d969ded79..dd603a6b3a70 100644
--- a/drivers/staging/media/imx/imx-media.h
+++ b/drivers/staging/media/imx/imx-media.h
@@ -272,8 +272,7 @@ int imx_media_of_add_csi(struct imx_media_dev *imxmd,
 struct imx_media_video_dev *
 imx_media_capture_device_init(struct v4l2_subdev *src_sd, int pad);
 void imx_media_capture_device_remove(struct imx_media_video_dev *vdev);
-int imx_media_capture_device_register(struct imx_media_dev *md,
- struct imx_media_video_dev *vdev);
+int imx_media_capture_device_register(struct imx_media_video_dev *vdev);
 void imx_media_capture_device_unregister(struct imx_media_video_dev *vdev);
 struct imx_media_buffer *
 imx_media_capture_device_next_buf(struct imx_media_video_dev *vdev);
diff --git a/drivers/staging/media/imx/imx7-media-csi.c 
b/drivers/staging/media/imx/imx7-media-csi.c
index a708a0340eb1..18eb5d3ecf10 100644
--- a/drivers/staging/media/imx/imx7-media-csi.c
+++ b/drivers/staging/media/imx/imx7-media-csi.c
@@ -1126,7 +1126,7 @@ static int imx7_csi_registered(struct v4l2_subdev *sd)
if (ret < 0)
return ret;
 
-   ret = imx_media_capture_device_register(csi->imxmd, csi->vdev);
+   ret = imx_media_capture_device_register(csi->vdev);
if (ret < 0)
return ret;
 
-- 
2.17.1

Re: [RFC][PATCH 3/2] livepatch: remove klp_check_compiler_support()

2019-05-10 Thread Steven Rostedt

On Fri, 10 May 2019 23:47:50 +0200 (CEST)
Jiri Kosina  wrote:

> From: Jiri Kosina 
> 
> The only purpose of klp_check_compiler_support() is to make sure that we 
> are not using ftrace on x86 via mcount (because that's executed only after 
> prologue has already happened, and that's too late for livepatching 
> purposes).
> 
> Now that mcount is not supported by ftrace any more, there is no need for 
> klp_check_compiler_support() either.
> 
> Reported-by: Linus Torvalds 
> Signed-off-by: Jiri Kosina 
> ---
> 
> I guess it makes most sense to merge this together with mcount removal in 
> one go.

Thanks, I applied it to my queue and will start running it through my
tests.

-- Steve

[PATCH 5/5] net: phy: dp83867: Use unsigned variables to store unsigned properties

2019-05-10 Thread Trent Piepho

The variables used to store u32 DT properties were signed ints.  This
doesn't work properly if the value of the property is too large to fit
in a signed int.  Use unsigned variables so this doesn't happen.
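
To illustrate the problem described above, here is a small user-space
sketch.  The bound FIFO_DEPTH_MAX and both helper names are hypothetical
and chosen only for the example.  Converting a large u32 to int is
implementation-defined; on the usual two's-complement targets it turns
negative and slips past an upper-bound check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FIFO_DEPTH_MAX 3 /* hypothetical upper bound for the example */

/* Range check done on a signed copy of the property value. */
static bool depth_ok_signed(uint32_t prop)
{
	/* Implementation-defined for values above INT_MAX; typically negative. */
	int depth = (int)prop;

	return depth <= FIFO_DEPTH_MAX;
}

/* Range check done on the unsigned value itself. */
static bool depth_ok_unsigned(uint32_t prop)
{
	return prop <= FIFO_DEPTH_MAX;
}
```

With a property value of 0x80000000 the signed variant typically accepts
the value while the unsigned variant rejects it.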

Cc: Andrew Lunn 
Cc: Florian Fainelli 
Cc: Heiner Kallweit 
Signed-off-by: Trent Piepho 
---
 drivers/net/phy/dp83867.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/phy/dp83867.c b/drivers/net/phy/dp83867.c
index a46cc9427fb3..edd9e27425e8 100644
--- a/drivers/net/phy/dp83867.c
+++ b/drivers/net/phy/dp83867.c
@@ -82,9 +82,9 @@ enum {
 };
 
 struct dp83867_private {
-   int rx_id_delay;
-   int tx_id_delay;
-   int fifo_depth;
+   u32 rx_id_delay;
+   u32 tx_id_delay;
+   u32 fifo_depth;
int io_impedance;
int port_mirroring;
bool rxctrl_strap_quirk;
-- 
2.14.5

[PATCH 3/5] net: phy: dp83867: Add ability to disable output clock

2019-05-10 Thread Trent Piepho

Generally, the output clock pin is only used for testing and only serves
as a source of RF noise after this.  It could be used to daisy-chain
PHYs, but this is uncommon.  Since the PHY can disable the output, make
doing so an option.  I do this by adding another enumeration to the
allowed values of ti,clk-output-sel.

The code was not using the value DP83867_CLK_O_SEL_REF_CLK as one might
expect: to select the REF_CLK as the output.  Rather it meant "keep
clock output setting as is", which, depending on PHY strapping, might
not be outputting REF_CLK.

Change this so DP83867_CLK_O_SEL_REF_CLK means enable REF_CLK output.
Omitting the property will leave the setting as is (which was the
previous behavior in this case).

Out of range values were silently converted into
DP83867_CLK_O_SEL_REF_CLK.  Change this so they generate an error.
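
A stand-alone sketch of the validation and the mask/value selection
described above.  The bit definitions follow the hunk below; the values
used for CLK_O_SEL_REF_CLK and CLK_O_SEL_OFF are assumptions made for
this example, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CLK_O_SEL_REF_CLK 0xCu        /* assumed: highest valid selector */
#define CLK_O_SEL_OFF     0xFFFFFFFFu /* assumed: "output disabled" sentinel */

#define CLK_O_DISABLE     (1u << 6)
#define CLK_O_SEL_MASK    (0x1fu << 8)
#define CLK_O_SEL_SHIFT   8

/* A selector is valid if it is in the mux range or is the OFF sentinel. */
static bool clk_sel_valid(uint32_t sel)
{
	return sel <= CLK_O_SEL_REF_CLK || sel == CLK_O_SEL_OFF;
}

/* Compute the register mask and value to write for a valid selector. */
static void clk_sel_to_reg(uint32_t sel, uint16_t *mask, uint16_t *val)
{
	assert(clk_sel_valid(sel));

	/* The disable bit is always part of the update. */
	*mask = CLK_O_DISABLE;

	if (sel == CLK_O_SEL_OFF) {
		/* Only set the disable bit; leave the mux field alone. */
		*val = CLK_O_DISABLE;
	} else {
		/* Clear the disable bit and program the mux field. */
		*mask |= CLK_O_SEL_MASK;
		*val = (uint16_t)(sel << CLK_O_SEL_SHIFT);
	}
}
```

Because the disable bit is always in the mask, selecting a clock source
also re-enables an output that strapping or an earlier write disabled.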

Cc: Andrew Lunn 
Cc: Florian Fainelli 
Cc: Heiner Kallweit 
Signed-off-by: Trent Piepho 
---
 drivers/net/phy/dp83867.c | 36 ++--
 1 file changed, 26 insertions(+), 10 deletions(-)

diff --git a/drivers/net/phy/dp83867.c b/drivers/net/phy/dp83867.c
index fd35131a0c39..420729cd6025 100644
--- a/drivers/net/phy/dp83867.c
+++ b/drivers/net/phy/dp83867.c
@@ -68,6 +68,7 @@
 
 #define DP83867_IO_MUX_CFG_IO_IMPEDANCE_MAX    0x0
 #define DP83867_IO_MUX_CFG_IO_IMPEDANCE_MIN    0x1f
+#define DP83867_IO_MUX_CFG_CLK_O_DISABLE   BIT(6)
 #define DP83867_IO_MUX_CFG_CLK_O_SEL_MASK  (0x1f << 8)
 #define DP83867_IO_MUX_CFG_CLK_O_SEL_SHIFT 8
 
@@ -87,7 +88,8 @@ struct dp83867_private {
int io_impedance;
int port_mirroring;
bool rxctrl_strap_quirk;
-   int clk_output_sel;
+   bool set_clk_output;
+   u32 clk_output_sel;
 };
 
 static int dp83867_ack_interrupt(struct phy_device *phydev)
@@ -154,11 +156,16 @@ static int dp83867_of_init(struct phy_device *phydev)
/* Optional configuration */
ret = of_property_read_u32(of_node, "ti,clk-output-sel",
                            &dp83867->clk_output_sel);
-   if (ret || dp83867->clk_output_sel > DP83867_CLK_O_SEL_REF_CLK)
-   /* Keep the default value if ti,clk-output-sel is not set
-* or too high
-*/
-   dp83867->clk_output_sel = DP83867_CLK_O_SEL_REF_CLK;
+   /* If not set, keep default */
+   if (!ret) {
+   dp83867->set_clk_output = true;
+   if (dp83867->clk_output_sel > DP83867_CLK_O_SEL_REF_CLK &&
+   dp83867->clk_output_sel != DP83867_CLK_O_SEL_OFF) {
+   phydev_err(phydev, "ti,clk-output-sel value %u out of range\n",
+  dp83867->clk_output_sel);
+   return -EINVAL;
+   }
+   }
 
if (of_property_read_bool(of_node, "ti,max-output-impedance"))
dp83867->io_impedance = DP83867_IO_MUX_CFG_IO_IMPEDANCE_MAX;
@@ -288,11 +295,20 @@ static int dp83867_config_init(struct phy_device *phydev)
dp83867_config_port_mirroring(phydev);
 
/* Clock output selection if muxing property is set */
-   if (dp83867->clk_output_sel != DP83867_CLK_O_SEL_REF_CLK)
+   if (dp83867->set_clk_output) {
+   u16 mask = DP83867_IO_MUX_CFG_CLK_O_DISABLE;
+
+   if (dp83867->clk_output_sel == DP83867_CLK_O_SEL_OFF) {
+   val = DP83867_IO_MUX_CFG_CLK_O_DISABLE;
+   } else {
+   mask |= DP83867_IO_MUX_CFG_CLK_O_SEL_MASK;
+   val = dp83867->clk_output_sel <<
+ DP83867_IO_MUX_CFG_CLK_O_SEL_SHIFT;
+   }
+
phy_modify_mmd(phydev, DP83867_DEVADDR, DP83867_IO_MUX_CFG,
-  DP83867_IO_MUX_CFG_CLK_O_SEL_MASK,
-  dp83867->clk_output_sel <<
-  DP83867_IO_MUX_CFG_CLK_O_SEL_SHIFT);
+  mask, val);
+   }
 
return 0;
 }
-- 
2.14.5

Re: [PATCH 0/2] arm64: dts: meson: g12a board node order

2019-05-10 Thread Kevin Hilman

Jerome Brunet  writes:

> The order of the nodes in the u200 and sei510 is a bit fancy.
> Order nodes by address, then node name, then aliases.
>
> This makes rebasing a little less painful

Fully agree.  Thanks for the cleanup.

Queued for v5.3 (branch: v5.3/dt64)

Kevin

[PATCH 4/5] net: phy: dp83867: Disable tx/rx delay when not configured

2019-05-10 Thread Trent Piepho

The code was assuming the reset default of the delay control register
was to have delay disabled.  This is what the datasheet shows as the
register's initial value.  However, that's not actually true: the
default is controlled by the PHY's pin strapping.

If the interface mode is selected as RX or TX delay only, ensure the
other direction's delay is disabled.

If the interface mode is just "rgmii", with neither TX nor RX internal
delay, one might expect that the driver should disable both delays.  But
this is not what the driver does.  It leaves the setting at the PHY's
strapping default.  And that default, for no pins with strapping
resistors, is to have delay enabled at 2.00 ns.

Rather than change this behavior, I've kept it the same and documented
it.  No delay will most likely not work and will break ethernet on any
board using "rgmii" mode.
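
The intended behavior can be sketched in plain C as follows.  The bit
positions and the mode enum are assumptions made for this example (the
real definitions are in the driver and in the PHY core), and the helper
name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define RGMII_TX_CLK_DELAY_EN (1u << 1) /* assumed bit position */
#define RGMII_RX_CLK_DELAY_EN (1u << 0) /* assumed bit position */

/* Hypothetical stand-in for the PHY_INTERFACE_MODE_RGMII* values. */
enum rgmii_mode {
	MODE_RGMII,      /* no internal delay requested */
	MODE_RGMII_ID,   /* TX and RX internal delay */
	MODE_RGMII_RXID, /* RX internal delay only */
	MODE_RGMII_TXID, /* TX internal delay only */
};

/* Return the new delay control register value for the given mode. */
static uint16_t apply_delay_mode(uint16_t reg, enum rgmii_mode mode)
{
	/* Plain "rgmii": keep whatever the pin strapping selected. */
	if (mode == MODE_RGMII)
		return reg;

	/* Start from both delays disabled, then enable what was asked for. */
	reg &= (uint16_t)~(RGMII_TX_CLK_DELAY_EN | RGMII_RX_CLK_DELAY_EN);

	switch (mode) {
	case MODE_RGMII_ID:
		reg |= RGMII_TX_CLK_DELAY_EN | RGMII_RX_CLK_DELAY_EN;
		break;
	case MODE_RGMII_RXID:
		reg |= RGMII_RX_CLK_DELAY_EN;
		break;
	case MODE_RGMII_TXID:
		reg |= RGMII_TX_CLK_DELAY_EN;
		break;
	default:
		break;
	}

	return reg;
}
```

Clearing both bits first is what keeps a strapped-on delay from
surviving in the direction that the selected mode does not request.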

Cc: Andrew Lunn 
Cc: Florian Fainelli 
Cc: Heiner Kallweit 
Signed-off-by: Trent Piepho 
---
 drivers/net/phy/dp83867.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/phy/dp83867.c b/drivers/net/phy/dp83867.c
index 420729cd6025..a46cc9427fb3 100644
--- a/drivers/net/phy/dp83867.c
+++ b/drivers/net/phy/dp83867.c
@@ -256,10 +256,16 @@ static int dp83867_config_init(struct phy_device *phydev)
return ret;
}
 
+   /* If rgmii mode with no internal delay is selected, we do NOT use
+* aligned mode as one might expect.  Instead we use the PHY's default
+* based on pin strapping.  And the "mode 0" default is to *use*
+* internal delay with a value of 7 (2.00 ns).
+*/
if ((phydev->interface >= PHY_INTERFACE_MODE_RGMII_ID) &&
(phydev->interface <= PHY_INTERFACE_MODE_RGMII_RXID)) {
val = phy_read_mmd(phydev, DP83867_DEVADDR, DP83867_RGMIICTL);
 
+   val &= ~(DP83867_RGMII_TX_CLK_DELAY_EN | DP83867_RGMII_RX_CLK_DELAY_EN);
if (phydev->interface == PHY_INTERFACE_MODE_RGMII_ID)
val |= (DP83867_RGMII_TX_CLK_DELAY_EN | DP83867_RGMII_RX_CLK_DELAY_EN);
 
-- 
2.14.5

[RFC][PATCH 3/2] livepatch: remove klp_check_compiler_support()

2019-05-10 Thread Jiri Kosina

From: Jiri Kosina 

The only purpose of klp_check_compiler_support() is to make sure that we 
are not using ftrace on x86 via mcount (because that's executed only after 
prologue has already happened, and that's too late for livepatching 
purposes).

Now that mcount is not supported by ftrace any more, there is no need for 
klp_check_compiler_support() either.

Reported-by: Linus Torvalds 
Signed-off-by: Jiri Kosina 
---

I guess it makes most sense to merge this together with mcount removal in 
one go.

 arch/powerpc/include/asm/livepatch.h | 5 -
 arch/s390/include/asm/livepatch.h| 5 -
 arch/x86/include/asm/livepatch.h | 5 -
 kernel/livepatch/core.c  | 8 
 4 files changed, 23 deletions(-)

diff --git a/arch/powerpc/include/asm/livepatch.h 
b/arch/powerpc/include/asm/livepatch.h
index 5070df19d463..c005aee5ea43 100644
--- a/arch/powerpc/include/asm/livepatch.h
+++ b/arch/powerpc/include/asm/livepatch.h
@@ -24,11 +24,6 @@
 #include 
 
 #ifdef CONFIG_LIVEPATCH
-static inline int klp_check_compiler_support(void)
-{
-   return 0;
-}
-
 static inline void klp_arch_set_pc(struct pt_regs *regs, unsigned long ip)
 {
regs->nip = ip;
diff --git a/arch/s390/include/asm/livepatch.h 
b/arch/s390/include/asm/livepatch.h
index 672f95b12d40..818612b784cd 100644
--- a/arch/s390/include/asm/livepatch.h
+++ b/arch/s390/include/asm/livepatch.h
@@ -13,11 +13,6 @@
 
 #include 
 
-static inline int klp_check_compiler_support(void)
-{
-   return 0;
-}
-
 static inline void klp_arch_set_pc(struct pt_regs *regs, unsigned long ip)
 {
regs->psw.addr = ip;
diff --git a/arch/x86/include/asm/livepatch.h b/arch/x86/include/asm/livepatch.h
index 2f2bdf0662f8..a66f6706c2de 100644
--- a/arch/x86/include/asm/livepatch.h
+++ b/arch/x86/include/asm/livepatch.h
@@ -24,11 +24,6 @@
 #include 
 #include 
 
-static inline int klp_check_compiler_support(void)
-{
-   return 0;
-}
-
 static inline void klp_arch_set_pc(struct pt_regs *regs, unsigned long ip)
 {
regs->ip = ip;
diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
index f12c0eabd843..7e5cdeeca3bd 100644
--- a/kernel/livepatch/core.c
+++ b/kernel/livepatch/core.c
@@ -1207,14 +1207,6 @@ void klp_module_going(struct module *mod)
 
 static int __init klp_init(void)
 {
-   int ret;
-
-   ret = klp_check_compiler_support();
-   if (ret) {
-   pr_info("Your compiler is too old; turning off.\n");
-   return -EINVAL;
-   }
-
klp_root_kobj = kobject_create_and_add("livepatch", kernel_kobj);
if (!klp_root_kobj)
return -ENOMEM;
-- 
Jiri Kosina
SUSE Labs

Re: [PATCH 2/4] powerpc/stackprotector: work around stack-guard init from atomic

2019-05-10 Thread Steven Rostedt

On Wed, 27 Mar 2019 19:33:08 +0100
Sebastian Andrzej Siewior  wrote:

> This is invoked from the secondary CPU in atomic context. On x86 we use
> tsc instead. On Power we XOR it against mftb() so lets use stack address
> as the initial value.
> 
> Signed-off-by: Sebastian Andrzej Siewior 

Hi Sebastian,

in your repo, you marked this as stable-rt, but this code was added in
4.20, and the next -rt is at 4.19.

-- Steve



> ---
>  arch/powerpc/include/asm/stackprotector.h | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/stackprotector.h 
> b/arch/powerpc/include/asm/stackprotector.h
> index 1c8460e235838..e764eb4b6c284 100644
> --- a/arch/powerpc/include/asm/stackprotector.h
> +++ b/arch/powerpc/include/asm/stackprotector.h
> @@ -24,7 +24,11 @@ static __always_inline void boot_init_stack_canary(void)
>   unsigned long canary;
>  
>   /* Try to get a semi random initial value. */
> +#ifdef CONFIG_PREEMPT_RT_FULL
> + canary = (unsigned long)
> +#else
>   canary = get_random_canary();
> +#endif
>   canary ^= mftb();
>   canary ^= LINUX_VERSION_CODE;
>   canary &= CANARY_MASK;

[PATCH v5 5/9] Revert "media: imx: Set capture compose rectangle in capture_device_set_format"

2019-05-10 Thread Steve Longerbeam

Revert this commit, as imx_media_capture_device_set_format() will be
removed. The arguments to imx_media_mbus_fmt_to_pix_fmt() and
imx_media_capture_device_set_format() in imx7_csi_set_fmt() are also
reverted.

This reverts commit 5964cbd8692252615370b77eb96764dd70c2f837.

Signed-off-by: Steve Longerbeam 
---
Changes in v3:
- revert to previous args in imx7_csi_set_fmt().
---
 drivers/staging/media/imx/imx-ic-prpencvf.c   |  5 ++--
 drivers/staging/media/imx/imx-media-capture.c | 24 +--
 drivers/staging/media/imx/imx-media-csi.c |  5 ++--
 drivers/staging/media/imx/imx-media-utils.c   | 20 
 drivers/staging/media/imx/imx-media.h |  6 ++---
 drivers/staging/media/imx/imx7-media-csi.c|  5 ++--
 6 files changed, 25 insertions(+), 40 deletions(-)

diff --git a/drivers/staging/media/imx/imx-ic-prpencvf.c 
b/drivers/staging/media/imx/imx-ic-prpencvf.c
index 8242d88dfb82..afaa3a8b15e9 100644
--- a/drivers/staging/media/imx/imx-ic-prpencvf.c
+++ b/drivers/staging/media/imx/imx-ic-prpencvf.c
@@ -910,7 +910,6 @@ static int prp_set_fmt(struct v4l2_subdev *sd,
const struct imx_media_pixfmt *cc;
struct v4l2_pix_format vdev_fmt;
struct v4l2_mbus_framefmt *fmt;
-   struct v4l2_rect vdev_compose;
int ret = 0;
 
if (sdformat->pad >= PRPENCVF_NUM_PADS)
@@ -952,11 +951,11 @@ static int prp_set_fmt(struct v4l2_subdev *sd,
priv->cc[sdformat->pad] = cc;
 
/* propagate output pad format to capture device */
-   imx_media_mbus_fmt_to_pix_fmt(&vdev_fmt, &vdev_compose,
+   imx_media_mbus_fmt_to_pix_fmt(&vdev_fmt,
  &priv->format_mbus[PRPENCVF_SRC_PAD],
  priv->cc[PRPENCVF_SRC_PAD]);
mutex_unlock(&priv->lock);
-   imx_media_capture_device_set_format(vdev, &vdev_fmt, &vdev_compose);
+   imx_media_capture_device_set_format(vdev, &vdev_fmt);
 
return 0;
 out:
diff --git a/drivers/staging/media/imx/imx-media-capture.c 
b/drivers/staging/media/imx/imx-media-capture.c
index 335084a6b0cd..555f6204660b 100644
--- a/drivers/staging/media/imx/imx-media-capture.c
+++ b/drivers/staging/media/imx/imx-media-capture.c
@@ -205,8 +205,7 @@ static int capture_g_fmt_vid_cap(struct file *file, void 
*fh,
 
 static int __capture_try_fmt_vid_cap(struct capture_priv *priv,
 struct v4l2_subdev_format *fmt_src,
-struct v4l2_format *f,
-struct v4l2_rect *compose)
+struct v4l2_format *f)
 {
const struct imx_media_pixfmt *cc, *cc_src;
 
@@ -246,8 +245,7 @@ static int __capture_try_fmt_vid_cap(struct capture_priv 
*priv,
}
}
 
-   imx_media_mbus_fmt_to_pix_fmt(&f->fmt.pix, compose,
- &fmt_src->format, cc);
+   imx_media_mbus_fmt_to_pix_fmt(&f->fmt.pix, &fmt_src->format, cc);
 
return 0;
 }
@@ -265,7 +263,7 @@ static int capture_try_fmt_vid_cap(struct file *file, void 
*fh,
if (ret)
return ret;
 
-   return __capture_try_fmt_vid_cap(priv, &fmt_src, f, NULL);
+   return __capture_try_fmt_vid_cap(priv, &fmt_src, f);
 }
 
 static int capture_s_fmt_vid_cap(struct file *file, void *fh,
@@ -273,7 +271,6 @@ static int capture_s_fmt_vid_cap(struct file *file, void 
*fh,
 {
struct capture_priv *priv = video_drvdata(file);
struct v4l2_subdev_format fmt_src;
-   struct v4l2_rect compose;
int ret;
 
if (vb2_is_busy(&priv->q)) {
@@ -287,14 +284,17 @@ static int capture_s_fmt_vid_cap(struct file *file, void 
*fh,
if (ret)
return ret;
 
-   ret = __capture_try_fmt_vid_cap(priv, &fmt_src, f, &compose);
+   ret = __capture_try_fmt_vid_cap(priv, &fmt_src, f);
if (ret)
return ret;
 
priv->vdev.fmt.fmt.pix = f->fmt.pix;
priv->vdev.cc = imx_media_find_format(f->fmt.pix.pixelformat,
  CS_SEL_ANY, true);
-   priv->vdev.compose = compose;
+   priv->vdev.compose.left = 0;
+   priv->vdev.compose.top = 0;
+   priv->vdev.compose.width = fmt_src.format.width;
+   priv->vdev.compose.height = fmt_src.format.height;
 
return 0;
 }
@@ -655,8 +655,7 @@ static struct video_device capture_videodev = {
 };
 
 void imx_media_capture_device_set_format(struct imx_media_video_dev *vdev,
-const struct v4l2_pix_format *pix,
-const struct v4l2_rect *compose)
+struct v4l2_pix_format *pix)
 {
struct capture_priv *priv = to_capture_priv(vdev);
 
@@ -664,7 +663,6 @@ void imx_media_capture_device_set_format(struct 
imx_media_video_dev *vdev,
priv->vdev.fmt.fmt.pix = *pix;
priv->vdev.cc = imx_media_find_format(pix->pixelformat, CS_SEL_ANY,
  true);
-

[PATCH v5 4/9] media: staging/imx: Move add_video_device into capture_device_register

2019-05-10 Thread Steve Longerbeam

Move imx_media_add_video_device() into imx_media_capture_device_register().
Also, since the former has no error conditions, convert it to return void.

Signed-off-by: Steve Longerbeam 
---
 drivers/staging/media/imx/imx-ic-prpencvf.c   |  5 -
 drivers/staging/media/imx/imx-media-capture.c |  3 +++
 drivers/staging/media/imx/imx-media-csi.c |  7 +--
 drivers/staging/media/imx/imx-media-utils.c   |  9 -
 drivers/staging/media/imx/imx-media.h |  4 ++--
 drivers/staging/media/imx/imx7-media-csi.c| 12 +---
 6 files changed, 11 insertions(+), 29 deletions(-)

diff --git a/drivers/staging/media/imx/imx-ic-prpencvf.c 
b/drivers/staging/media/imx/imx-ic-prpencvf.c
index ddcd87a17c71..8242d88dfb82 100644
--- a/drivers/staging/media/imx/imx-ic-prpencvf.c
+++ b/drivers/staging/media/imx/imx-ic-prpencvf.c
@@ -1241,7 +1241,6 @@ static int prp_s_frame_interval(struct v4l2_subdev *sd,
 static int prp_registered(struct v4l2_subdev *sd)
 {
struct prp_priv *priv = sd_to_priv(sd);
-   struct imx_ic_priv *ic_priv = priv->ic_priv;
int i, ret;
u32 code;
 
@@ -1271,10 +1270,6 @@ static int prp_registered(struct v4l2_subdev *sd)
if (ret)
return ret;
 
-   ret = imx_media_add_video_device(ic_priv->md, priv->vdev);
-   if (ret)
-   goto unreg;
-
ret = prp_init_controls(priv);
if (ret)
goto unreg;
diff --git a/drivers/staging/media/imx/imx-media-capture.c 
b/drivers/staging/media/imx/imx-media-capture.c
index 211ec4df2066..335084a6b0cd 100644
--- a/drivers/staging/media/imx/imx-media-capture.c
+++ b/drivers/staging/media/imx/imx-media-capture.c
@@ -780,6 +780,9 @@ int imx_media_capture_device_register(struct 
imx_media_video_dev *vdev)
 
vfd->ctrl_handler = &priv->ctrl_hdlr;
 
+   /* add vdev to the video device list */
+   imx_media_add_video_device(priv->md, vdev);
+
return 0;
 unreg:
video_unregister_device(vfd);
diff --git a/drivers/staging/media/imx/imx-media-csi.c 
b/drivers/staging/media/imx/imx-media-csi.c
index ea3d13103c91..c70fa6b509ae 100644
--- a/drivers/staging/media/imx/imx-media-csi.c
+++ b/drivers/staging/media/imx/imx-media-csi.c
@@ -1820,13 +1820,8 @@ static int csi_registered(struct v4l2_subdev *sd)
if (ret)
goto free_fim;
 
-   ret = imx_media_add_video_device(priv->md, priv->vdev);
-   if (ret)
-   goto unreg;
-
return 0;
-unreg:
-   imx_media_capture_device_unregister(priv->vdev);
+
 free_fim:
if (priv->fim)
imx_media_fim_free(priv->fim);
diff --git a/drivers/staging/media/imx/imx-media-utils.c 
b/drivers/staging/media/imx/imx-media-utils.c
index c52aa59acd05..8a6e57652402 100644
--- a/drivers/staging/media/imx/imx-media-utils.c
+++ b/drivers/staging/media/imx/imx-media-utils.c
@@ -767,18 +767,17 @@ imx_media_find_subdev_by_devname(struct imx_media_dev 
*imxmd,
 EXPORT_SYMBOL_GPL(imx_media_find_subdev_by_devname);
 
 /*
- * Adds a video device to the master video device list. This is called by
- * an async subdev that owns a video device when it is registered.
+ * Adds a video device to the master video device list. This is called
+ * when a video device is registered.
  */
-int imx_media_add_video_device(struct imx_media_dev *imxmd,
-  struct imx_media_video_dev *vdev)
+void imx_media_add_video_device(struct imx_media_dev *imxmd,
+   struct imx_media_video_dev *vdev)
 {
mutex_lock(&imxmd->mutex);
 
list_add_tail(&vdev->list, &imxmd->vdev_list);
 
mutex_unlock(&imxmd->mutex);
-   return 0;
 }
 EXPORT_SYMBOL_GPL(imx_media_add_video_device);
 
diff --git a/drivers/staging/media/imx/imx-media.h 
b/drivers/staging/media/imx/imx-media.h
index ba2d75bcc4c9..71e20f53ed7b 100644
--- a/drivers/staging/media/imx/imx-media.h
+++ b/drivers/staging/media/imx/imx-media.h
@@ -189,8 +189,8 @@ imx_media_find_subdev_by_fwnode(struct imx_media_dev *imxmd,
 struct v4l2_subdev *
 imx_media_find_subdev_by_devname(struct imx_media_dev *imxmd,
 const char *devname);
-int imx_media_add_video_device(struct imx_media_dev *imxmd,
-  struct imx_media_video_dev *vdev);
+void imx_media_add_video_device(struct imx_media_dev *imxmd,
+   struct imx_media_video_dev *vdev);
 int imx_media_find_mipi_csi2_channel(struct imx_media_dev *imxmd,
 struct media_entity *start_entity);
 struct media_pad *
diff --git a/drivers/staging/media/imx/imx7-media-csi.c 
b/drivers/staging/media/imx/imx7-media-csi.c
index 96d01d8af874..f2037aba6e0e 100644
--- a/drivers/staging/media/imx/imx7-media-csi.c
+++ b/drivers/staging/media/imx/imx7-media-csi.c
@@ -1126,17 +1126,7 @@ static int imx7_csi_registered(struct v4l2_subdev *sd)
if (ret < 0)
return ret;
 
-   ret = imx_media_capture_device_register(csi->vdev);
-   if (ret < 0)

[PATCH v5 9/9] media: staging/imx: Don't set driver data for v4l2_dev

2019-05-10 Thread Steve Longerbeam

The media device is already available via multiple methods, so there is no
need to set driver data for v4l2_dev to the media device.

In imx_media_link_notify(), get media device from link->graph_obj.mdev.

In imx_media_capture_device_register(), get media device from
v4l2_dev->mdev.

Signed-off-by: Steve Longerbeam 
---
 drivers/staging/media/imx/imx-media-capture.c| 5 +++--
 drivers/staging/media/imx/imx-media-dev-common.c | 7 ++-
 2 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/drivers/staging/media/imx/imx-media-capture.c 
b/drivers/staging/media/imx/imx-media-capture.c
index 8a908c3e5e60..ea7f2decfc16 100644
--- a/drivers/staging/media/imx/imx-media-capture.c
+++ b/drivers/staging/media/imx/imx-media-capture.c
@@ -735,15 +735,16 @@ int imx_media_capture_device_register(struct 
imx_media_video_dev *vdev)
 {
struct capture_priv *priv = to_capture_priv(vdev);
struct v4l2_subdev *sd = priv->src_sd;
+   struct v4l2_device *v4l2_dev = sd->v4l2_dev;
struct video_device *vfd = vdev->vfd;
struct vb2_queue *vq = &priv->q;
struct v4l2_subdev_format fmt_src;
int ret;
 
/* get media device */
-   priv->md = dev_get_drvdata(sd->v4l2_dev->dev);
+   priv->md = container_of(v4l2_dev->mdev, struct imx_media_dev, md);
 
-   vfd->v4l2_dev = sd->v4l2_dev;
+   vfd->v4l2_dev = v4l2_dev;
 
ret = video_register_device(vfd, VFL_TYPE_GRABBER, -1);
if (ret) {
diff --git a/drivers/staging/media/imx/imx-media-dev-common.c 
b/drivers/staging/media/imx/imx-media-dev-common.c
index 89dc4ec8dadb..66b505f7e8df 100644
--- a/drivers/staging/media/imx/imx-media-dev-common.c
+++ b/drivers/staging/media/imx/imx-media-dev-common.c
@@ -260,10 +260,11 @@ static int imx_media_inherit_controls(struct 
imx_media_dev *imxmd,
 static int imx_media_link_notify(struct media_link *link, u32 flags,
 unsigned int notification)
 {
+   struct imx_media_dev *imxmd = container_of(link->graph_obj.mdev,
+  struct imx_media_dev, md);
struct media_entity *source = link->source->entity;
struct imx_media_pad_vdev *pad_vdev;
struct list_head *pad_vdev_list;
-   struct imx_media_dev *imxmd;
struct video_device *vfd;
struct v4l2_subdev *sd;
int pad_idx, ret;
@@ -279,8 +280,6 @@ static int imx_media_link_notify(struct media_link *link, 
u32 flags,
sd = media_entity_to_v4l2_subdev(source);
pad_idx = link->source->index;
 
-   imxmd = dev_get_drvdata(sd->v4l2_dev->dev);
-
pad_vdev_list = to_pad_vdev_list(sd, pad_idx);
if (!pad_vdev_list) {
/* nothing to do if source sd has no pad vdev list */
@@ -384,8 +383,6 @@ struct imx_media_dev *imx_media_dev_init(struct device *dev,
goto cleanup;
}
 
-   dev_set_drvdata(imxmd->v4l2_dev.dev, imxmd);
-
INIT_LIST_HEAD(&imxmd->vdev_list);
 
v4l2_async_notifier_init(&imxmd->notifier);
-- 
2.17.1
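
For readers less familiar with the idiom, the patch above recovers the enclosing imx_media_dev from a pointer to its embedded media_device member instead of going through driver data. A minimal userspace sketch of that pattern is given here; the struct layouts and the helper name to_imxmd() are hypothetical stand-ins rather than the kernel definitions, and container_of() is re-implemented with offsetof() so the fragment compiles outside the kernel tree.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, simplified structs; the real kernel types have many more fields. */
struct media_device {
	int id;
};

struct imx_media_dev {
	int flags;
	struct media_device md;	/* embedded by value, not a pointer */
};

/* Hypothetical helper: map an embedded media_device back to its owner. */
static struct imx_media_dev *to_imxmd(struct media_device *mdev)
{
	return container_of(mdev, struct imx_media_dev, md);
}
```

The lookup only works because md is embedded by value in the outer struct; if it were a separately allocated object, the pointer arithmetic would not land on the owner.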

[PATCH v5 2/9] media: staging/imx: Switch to sync registration for IPU subdevs

2019-05-10 Thread Steve Longerbeam

Because the IPU sub-devices VDIC and IC are not present in the
device-tree, platform devices were created for them instead. This
allowed these sub-devices to be added to the media device's async
notifier and registered asynchronously along with the other
sub-devices that do have a device-tree presence (CSI and devices
external to the IPU and SoC).

But that approach isn't really necessary. The IPU sub-devices don't
actually require a backing device (sd->dev is allowed to be NULL).
And that approach can't get around the fact that the IPU sub-devices
are not part of a device hierarchy, which makes it awkward to retrieve
the parent IPU of these devices.

By registering them synchronously, they can be registered from the CSI
async bound notifier, so the init function for them can be given the CSI
subdev, whose dev->parent is the IPU. That is a somewhat cleaner way
to retrieve the parent IPU.

So convert to synchronous registration for the VDIC and IC task
sub-devices, at the time a CSI sub-device is bound. There is no longer
a backing device for them (sd->dev is NULL), but that's ok. Also
set the VDIC/IC sub-device owner as the IPU, so that a reference can
be taken on the IPU module.

Since the VDIC and IC task drivers are no longer platform drivers,
they are now statically linked to imx-media module.

Signed-off-by: Steve Longerbeam 
---
Changes in v3:
- statically link VDIC and IC task objects to imx-media module in
  Makefile.
---
 drivers/staging/media/imx/Makefile|   6 +-
 drivers/staging/media/imx/imx-ic-common.c |  70 ++--
 drivers/staging/media/imx/imx-ic-prp.c|  34 +-
 drivers/staging/media/imx/imx-ic-prpencvf.c   |  70 ++--
 drivers/staging/media/imx/imx-ic.h|   7 +-
 drivers/staging/media/imx/imx-media-capture.c |   7 +-
 drivers/staging/media/imx/imx-media-csi.c |   2 +-
 drivers/staging/media/imx/imx-media-dev.c | 121 +-
 .../staging/media/imx/imx-media-internal-sd.c | 356 --
 drivers/staging/media/imx/imx-media-of.c  |  38 +-
 drivers/staging/media/imx/imx-media-vdic.c|  85 ++---
 drivers/staging/media/imx/imx-media.h |  67 ++--
 drivers/staging/media/imx/imx7-media-csi.c|   3 +-
 13 files changed, 327 insertions(+), 539 deletions(-)

diff --git a/drivers/staging/media/imx/Makefile 
b/drivers/staging/media/imx/Makefile
index d2d909a36239..86f0c81b6a3b 100644
--- a/drivers/staging/media/imx/Makefile
+++ b/drivers/staging/media/imx/Makefile
@@ -1,14 +1,12 @@
 # SPDX-License-Identifier: GPL-2.0
-imx-media-objs := imx-media-dev.o imx-media-internal-sd.o imx-media-of.o
+imx-media-objs := imx-media-dev.o imx-media-internal-sd.o imx-media-of.o \
+   imx-ic-common.o imx-ic-prp.o imx-ic-prpencvf.o imx-media-vdic.o
 imx-media-objs += imx-media-dev-common.o
 imx-media-common-objs := imx-media-utils.o imx-media-fim.o
-imx-media-ic-objs := imx-ic-common.o imx-ic-prp.o imx-ic-prpencvf.o
 
 obj-$(CONFIG_VIDEO_IMX_MEDIA) += imx-media.o
 obj-$(CONFIG_VIDEO_IMX_MEDIA) += imx-media-common.o
 obj-$(CONFIG_VIDEO_IMX_MEDIA) += imx-media-capture.o
-obj-$(CONFIG_VIDEO_IMX_MEDIA) += imx-media-vdic.o
-obj-$(CONFIG_VIDEO_IMX_MEDIA) += imx-media-ic.o
 
 obj-$(CONFIG_VIDEO_IMX_CSI) += imx-media-csi.o
 obj-$(CONFIG_VIDEO_IMX_CSI) += imx6-mipi-csi2.o
diff --git a/drivers/staging/media/imx/imx-ic-common.c 
b/drivers/staging/media/imx/imx-ic-common.c
index 1addb0893c57..ad0c291db03c 100644
--- a/drivers/staging/media/imx/imx-ic-common.c
+++ b/drivers/staging/media/imx/imx-ic-common.c
@@ -8,8 +8,6 @@
  * the Free Software Foundation; either version 2 of the License, or
  * (at your option) any later version.
  */
-#include 
-#include 
 #include 
 #include 
 #include "imx-media.h"
@@ -24,23 +22,25 @@ static struct imx_ic_ops *ic_ops[IC_NUM_OPS] = {
[IC_TASK_VIEWFINDER] = &imx_ic_prpencvf_ops,
 };
 
-static int imx_ic_probe(struct platform_device *pdev)
+struct v4l2_subdev *imx_media_ic_register(struct imx_media_dev *imxmd,
+ struct device *ipu_dev,
+ struct ipu_soc *ipu,
+ u32 grp_id)
 {
-   struct imx_media_ipu_internal_sd_pdata *pdata;
+   struct v4l2_device *v4l2_dev = &imxmd->v4l2_dev;
struct imx_ic_priv *priv;
int ret;
 
-   priv = devm_kzalloc(&pdev->dev, sizeof(*priv), GFP_KERNEL);
+   priv = devm_kzalloc(ipu_dev, sizeof(*priv), GFP_KERNEL);
if (!priv)
-   return -ENOMEM;
+   return ERR_PTR(-ENOMEM);
 
-   platform_set_drvdata(pdev, &priv->sd);
-   priv->dev = &pdev->dev;
+   priv->ipu_dev = ipu_dev;
+   priv->ipu = ipu;
+   priv->md = imxmd;
 
-   /* get our ipu_id, grp_id and IC task id */
-   pdata = priv->dev->platform_data;
-   priv->ipu_id = pdata->ipu_id;
-   switch (pdata->grp_id) {
+   /* get our IC task id */
+   switch (grp_id) {
case IMX_MEDIA_GRP_ID_IPU_IC_PRP:
priv->task_id = IC_TASK_PRP;
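
As a reading aid: the hunk above changes the registration function from returning an int to returning a pointer, so failures are encoded with ERR_PTR() rather than as a plain negative errno. A minimal userspace sketch of that convention follows. The three helpers mimic the kernel's ERR_PTR()/PTR_ERR()/IS_ERR() but are stand-ins written for this example, and struct fake_subdev and fake_ic_register() are hypothetical names, not part of the driver.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Userspace stand-ins for the kernel's ERR_PTR()/PTR_ERR()/IS_ERR() helpers. */
static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	/* Error codes occupy the top MAX_ERRNO values of the address space. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical subdev type and registration function, for illustration only. */
struct fake_subdev {
	unsigned int grp_id;
};

static struct fake_subdev *fake_ic_register(unsigned int grp_id)
{
	struct fake_subdev *sd;

	if (grp_id == 0)
		return ERR_PTR(-EINVAL);	/* assumed: reject an unknown group id */

	sd = calloc(1, sizeof(*sd));
	if (!sd)
		return ERR_PTR(-ENOMEM);

	sd->grp_id = grp_id;
	return sd;
}
```

A caller would test the result with IS_ERR() and extract the code with PTR_ERR(), in the same way the CSI code elsewhere in this series checks the return value of the subdev lookup helpers.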

[GIT PULL] VFIO updates for v5.2-rc1

2019-05-10 Thread Alex Williamson

Hi Linus,

The following changes since commit 085b7755808aa11f78ab9377257e1dad2e6fa4bb:

  Linux 5.1-rc6 (2019-04-21 10:45:57 -0700)

are available in the Git repository at:

  git://github.com/awilliam/linux-vfio.git tags/vfio-v5.2-rc1

for you to fetch changes up to 15c80c1659f27364734a3938b04d1c67479aa11c:

  vfio: Add Cornelia Huck as reviewer (2019-05-08 11:41:26 -0600)


VFIO updates for v5.2-rc1

 - Improve dev_printk() usage (Bjorn Helgaas)

 - Fix issue with blocking in !TASK_RUNNING state while waiting for
   userspace to release devices (Farhan Ali)

 - Fix error path cleanup in nvlink setup (Greg Kurz)

 - mdev-core cleanups and fixes in preparation for more use cases
   (Parav Pandit)

 - Cornelia has volunteered as an official vfio reviewer (Cornelia Huck)


Bjorn Helgaas (1):
  vfio: Use dev_printk() when possible

Cornelia Huck (1):
  vfio: Add Cornelia Huck as reviewer

Farhan Ali (1):
  vfio: Fix WARNING "do not call blocking ops when !TASK_RUNNING"

Greg Kurz (1):
  vfio-pci/nvlink2: Fix potential VMA leak

Parav Pandit (7):
  vfio/mdev: Avoid release parent reference during error path
  vfio/mdev: Removed unused kref
  vfio/mdev: Drop redundant extern for exported symbols
  vfio/mdev: Avoid masking error code to EBUSY
  vfio/mdev: Follow correct remove sequence
  vfio/mdev: Fix aborting mdev child device removal if one fails
  vfio/mdev: Avoid inline get and put parent helpers

 MAINTAINERS|  1 +
 drivers/vfio/mdev/mdev_core.c  | 18 +++
 drivers/vfio/mdev/mdev_private.h   |  1 -
 drivers/vfio/mdev/mdev_sysfs.c |  2 +-
 drivers/vfio/pci/vfio_pci.c| 23 -
 drivers/vfio/pci/vfio_pci_config.c | 29 +--
 drivers/vfio/pci/vfio_pci_nvlink2.c|  2 +
 .../vfio/platform/reset/vfio_platform_amdxgbe.c|  5 +-
 drivers/vfio/platform/vfio_platform_common.c   | 12 +++--
 drivers/vfio/vfio.c| 59 +-
 include/linux/mdev.h   | 21 
 include/linux/pci.h|  3 ++
 12 files changed, 81 insertions(+), 95 deletions(-)

[PATCH v5 8/9] media: staging/imx: Improve pipeline searching

2019-05-10 Thread Steve Longerbeam

Export find_pipeline_pad(), renaming to imx_media_pipeline_pad(), and
extend its functionality to allow searching for video devices in the
enabled pipeline in addition to sub-devices.

As part of this:

- Rename imx_media_find_mipi_csi2_channel() to
  imx_media_pipeline_csi2_channel().

- Remove imx_media_find_upstream_pad(), it is redundant now.

- Rename imx_media_find_upstream_subdev() to imx_media_pipeline_subdev()
  with an additional boolean argument for searching upstream or downstream.

- Add imx_media_pipeline_video_device() which is analogous to
  imx_media_pipeline_subdev() but searches for video devices.

- Remove imxmd pointer arg from all of the functions above, it was
  never used in those functions. With that change the i.MX5/6 CSI,
  VDIC, and IC sub-devices no longer require the media_device.

Signed-off-by: Steve Longerbeam 
---
 drivers/staging/media/imx/imx-ic-common.c |   4 +-
 drivers/staging/media/imx/imx-ic-prp.c|   4 +-
 drivers/staging/media/imx/imx-ic.h|   1 -
 drivers/staging/media/imx/imx-media-csi.c |  13 +-
 drivers/staging/media/imx/imx-media-fim.c |   4 -
 .../staging/media/imx/imx-media-internal-sd.c |   5 +-
 drivers/staging/media/imx/imx-media-utils.c   | 128 ++
 drivers/staging/media/imx/imx-media-vdic.c|   5 +-
 drivers/staging/media/imx/imx-media.h |  20 +--
 drivers/staging/media/imx/imx7-media-csi.c|   2 +-
 10 files changed, 93 insertions(+), 93 deletions(-)

diff --git a/drivers/staging/media/imx/imx-ic-common.c 
b/drivers/staging/media/imx/imx-ic-common.c
index ad0c291db03c..37734984beb4 100644
--- a/drivers/staging/media/imx/imx-ic-common.c
+++ b/drivers/staging/media/imx/imx-ic-common.c
@@ -22,12 +22,11 @@ static struct imx_ic_ops *ic_ops[IC_NUM_OPS] = {
[IC_TASK_VIEWFINDER] = &imx_ic_prpencvf_ops,
 };
 
-struct v4l2_subdev *imx_media_ic_register(struct imx_media_dev *imxmd,
+struct v4l2_subdev *imx_media_ic_register(struct v4l2_device *v4l2_dev,
  struct device *ipu_dev,
  struct ipu_soc *ipu,
  u32 grp_id)
 {
-   struct v4l2_device *v4l2_dev = &imxmd->v4l2_dev;
struct imx_ic_priv *priv;
int ret;
 
@@ -37,7 +36,6 @@ struct v4l2_subdev *imx_media_ic_register(struct 
imx_media_dev *imxmd,
 
priv->ipu_dev = ipu_dev;
priv->ipu = ipu;
-   priv->md = imxmd;
 
/* get our IC task id */
switch (grp_id) {
diff --git a/drivers/staging/media/imx/imx-ic-prp.c 
b/drivers/staging/media/imx/imx-ic-prp.c
index 663db200e594..1432776a33f9 100644
--- a/drivers/staging/media/imx/imx-ic-prp.c
+++ b/drivers/staging/media/imx/imx-ic-prp.c
@@ -302,8 +302,8 @@ static int prp_link_validate(struct v4l2_subdev *sd,
if (ret)
return ret;
 
-   csi = imx_media_find_upstream_subdev(ic_priv->md, &ic_priv->sd.entity,
-IMX_MEDIA_GRP_ID_IPU_CSI);
+   csi = imx_media_pipeline_subdev(&ic_priv->sd.entity,
+   IMX_MEDIA_GRP_ID_IPU_CSI, true);
if (IS_ERR(csi))
csi = NULL;
 
diff --git a/drivers/staging/media/imx/imx-ic.h 
b/drivers/staging/media/imx/imx-ic.h
index 1dcbb37aeada..ff2f66f11982 100644
--- a/drivers/staging/media/imx/imx-ic.h
+++ b/drivers/staging/media/imx/imx-ic.h
@@ -16,7 +16,6 @@
 struct imx_ic_priv {
struct device *ipu_dev;
struct ipu_soc *ipu;
-   struct imx_media_dev *md;
struct v4l2_subdev sd;
int    task_id;
void   *task_priv;
diff --git a/drivers/staging/media/imx/imx-media-csi.c 
b/drivers/staging/media/imx/imx-media-csi.c
index 68c2b1a3066a..555904759078 100644
--- a/drivers/staging/media/imx/imx-media-csi.c
+++ b/drivers/staging/media/imx/imx-media-csi.c
@@ -60,7 +60,6 @@ struct csi_skip_desc {
 struct csi_priv {
struct device *dev;
struct ipu_soc *ipu;
-   struct imx_media_dev *md;
struct v4l2_subdev sd;
struct media_pad pad[CSI_NUM_PADS];
/* the video device at IDMAC output pad */
@@ -182,8 +181,8 @@ static int csi_get_upstream_endpoint(struct csi_priv *priv,
 * CSI-2 receiver if it is in the path, otherwise stay
 * with video mux.
 */
-   sd = imx_media_find_upstream_subdev(priv->md, src,
-   IMX_MEDIA_GRP_ID_CSI2);
+   sd = imx_media_pipeline_subdev(src, IMX_MEDIA_GRP_ID_CSI2,
+  true);
if (!IS_ERR(sd))
src = &sd->entity;
}
@@ -197,7 +196,7 @@ static int csi_get_upstream_endpoint(struct csi_priv *priv,
src = &priv->sd.entity;
 
/* get source pad of entity directly upstream from src */
-   pad = imx_media_find_upstream_pad(priv->md, src, 0);
+   pad = imx_media_pipeline_pad(src, 0, 0, true);

[PATCH v5 3/9] media: staging/imx: Pass device to alloc/free_dma_buf

2019-05-10 Thread Steve Longerbeam

Allocate and free a DMA coherent buffer in imx_media_alloc/free_dma_buf()
from the given device. This allows DMA alloc and free using a device
that is backed by real hardware, which for the imx5/6/7 CSI is the CSI
unit, and for the internal IPU sub-devices, is the parent IPU.

Signed-off-by: Steve Longerbeam 
---
 drivers/staging/media/imx/imx-ic-prpencvf.c | 18 +-
 drivers/staging/media/imx/imx-media-csi.c   |  6 +++---
 drivers/staging/media/imx/imx-media-utils.c | 13 ++---
 drivers/staging/media/imx/imx-media.h   |  4 ++--
 drivers/staging/media/imx/imx7-media-csi.c  |  4 ++--
 5 files changed, 22 insertions(+), 23 deletions(-)

diff --git a/drivers/staging/media/imx/imx-ic-prpencvf.c 
b/drivers/staging/media/imx/imx-ic-prpencvf.c
index 069cce512280..ddcd87a17c71 100644
--- a/drivers/staging/media/imx/imx-ic-prpencvf.c
+++ b/drivers/staging/media/imx/imx-ic-prpencvf.c
@@ -464,13 +464,13 @@ static int prp_setup_rotation(struct prp_priv *priv)
incc = priv->cc[PRPENCVF_SINK_PAD];
outcc = vdev->cc;
 
-   ret = imx_media_alloc_dma_buf(ic_priv->md, &priv->rot_buf[0],
+   ret = imx_media_alloc_dma_buf(ic_priv->ipu_dev, &priv->rot_buf[0],
  outfmt->sizeimage);
if (ret) {
v4l2_err(&ic_priv->sd, "failed to alloc rot_buf[0], %d\n", ret);
return ret;
}
-   ret = imx_media_alloc_dma_buf(ic_priv->md, &priv->rot_buf[1],
+   ret = imx_media_alloc_dma_buf(ic_priv->ipu_dev, &priv->rot_buf[1],
  outfmt->sizeimage);
if (ret) {
v4l2_err(&ic_priv->sd, "failed to alloc rot_buf[1], %d\n", ret);
@@ -543,9 +543,9 @@ static int prp_setup_rotation(struct prp_priv *priv)
 unsetup_vb2:
prp_unsetup_vb2_buf(priv, VB2_BUF_STATE_QUEUED);
 free_rot1:
-   imx_media_free_dma_buf(ic_priv->md, &priv->rot_buf[1]);
+   imx_media_free_dma_buf(ic_priv->ipu_dev, &priv->rot_buf[1]);
 free_rot0:
-   imx_media_free_dma_buf(ic_priv->md, &priv->rot_buf[0]);
+   imx_media_free_dma_buf(ic_priv->ipu_dev, &priv->rot_buf[0]);
return ret;
 }
 
@@ -563,8 +563,8 @@ static void prp_unsetup_rotation(struct prp_priv *priv)
 
ipu_ic_disable(priv->ic);
 
-   imx_media_free_dma_buf(ic_priv->md, &priv->rot_buf[0]);
-   imx_media_free_dma_buf(ic_priv->md, &priv->rot_buf[1]);
+   imx_media_free_dma_buf(ic_priv->ipu_dev, &priv->rot_buf[0]);
+   imx_media_free_dma_buf(ic_priv->ipu_dev, &priv->rot_buf[1]);
 }
 
 static int prp_setup_norotation(struct prp_priv *priv)
@@ -656,7 +656,7 @@ static int prp_start(struct prp_priv *priv)
 
outfmt = &vdev->fmt.fmt.pix;
 
-   ret = imx_media_alloc_dma_buf(ic_priv->md, &priv->underrun_buf,
+   ret = imx_media_alloc_dma_buf(ic_priv->ipu_dev, &priv->underrun_buf,
  outfmt->sizeimage);
if (ret)
goto out_put_ipu;
@@ -726,7 +726,7 @@ static int prp_start(struct prp_priv *priv)
 out_unsetup:
prp_unsetup(priv, VB2_BUF_STATE_QUEUED);
 out_free_underrun:
-   imx_media_free_dma_buf(ic_priv->md, &priv->underrun_buf);
+   imx_media_free_dma_buf(ic_priv->ipu_dev, &priv->underrun_buf);
 out_put_ipu:
prp_put_ipu_resources(priv);
return ret;
@@ -763,7 +763,7 @@ static void prp_stop(struct prp_priv *priv)
 
prp_unsetup(priv, VB2_BUF_STATE_ERROR);
 
-   imx_media_free_dma_buf(ic_priv->md, &priv->underrun_buf);
+   imx_media_free_dma_buf(ic_priv->ipu_dev, &priv->underrun_buf);
 
/* cancel the EOF timeout timer */
del_timer_sync(&priv->eof_timeout_timer);
diff --git a/drivers/staging/media/imx/imx-media-csi.c 
b/drivers/staging/media/imx/imx-media-csi.c
index 93b107eab5f5..ea3d13103c91 100644
--- a/drivers/staging/media/imx/imx-media-csi.c
+++ b/drivers/staging/media/imx/imx-media-csi.c
@@ -612,7 +612,7 @@ static int csi_idmac_start(struct csi_priv *priv)
 
outfmt = &vdev->fmt.fmt.pix;
 
-   ret = imx_media_alloc_dma_buf(priv->md, &priv->underrun_buf,
+   ret = imx_media_alloc_dma_buf(priv->dev, &priv->underrun_buf,
  outfmt->sizeimage);
if (ret)
goto out_put_ipu;
@@ -666,7 +666,7 @@ static int csi_idmac_start(struct csi_priv *priv)
 out_unsetup:
csi_idmac_unsetup(priv, VB2_BUF_STATE_QUEUED);
 out_free_dma_buf:
-   imx_media_free_dma_buf(priv->md, &priv->underrun_buf);
+   imx_media_free_dma_buf(priv->dev, &priv->underrun_buf);
 out_put_ipu:
csi_idmac_put_ipu_resources(priv);
return ret;
@@ -698,7 +698,7 @@ static void csi_idmac_stop(struct csi_priv *priv)
 
csi_idmac_unsetup(priv, VB2_BUF_STATE_ERROR);
 
-   imx_media_free_dma_buf(priv->md, &priv->underrun_buf);
+   imx_media_free_dma_buf(priv->dev, &priv->underrun_buf);
 
/* cancel the EOF timeout timer */
del_timer_sync(&priv->eof_timeout_timer);
diff --git a/drivers/staging/media/imx/imx-media-utils.c 
b/drivers/staging/media/imx/imx-media-utils.c
index 1c63a2765a81..c52aa59acd05 100644
--- a/drivers/staging/media/imx/imx-media-utils.c

[PATCH v5 6/9] media: staging/imx: Remove capture_device_set_format

2019-05-10 Thread Steve Longerbeam

Don't propagate the source pad format to the connected capture device.
It's now the responsibility of userspace to call VIDIOC_S_FMT on the
capture device to ensure the capture format and compose rectangle
are compatible with the connected source. To check this, validate
the capture format with the source before streaming starts.

Signed-off-by: Steve Longerbeam 
---
Changes in v4:
- add **cc arg to __capture_try_fmt_vid_cap() to validate colorspace,
  instead of calling ipu_pixelformat_to_colorspace().
- add error message if capture format validation failed.
---
 drivers/staging/media/imx/imx-ic-prpencvf.c   | 16 +
 drivers/staging/media/imx/imx-media-capture.c | 71 +--
 drivers/staging/media/imx/imx-media-csi.c | 16 +
 drivers/staging/media/imx/imx-media.h |  2 -
 drivers/staging/media/imx/imx7-media-csi.c| 17 +
 5 files changed, 55 insertions(+), 67 deletions(-)

diff --git a/drivers/staging/media/imx/imx-ic-prpencvf.c 
b/drivers/staging/media/imx/imx-ic-prpencvf.c
index afaa3a8b15e9..63334fd61492 100644
--- a/drivers/staging/media/imx/imx-ic-prpencvf.c
+++ b/drivers/staging/media/imx/imx-ic-prpencvf.c
@@ -906,9 +906,7 @@ static int prp_set_fmt(struct v4l2_subdev *sd,
   struct v4l2_subdev_format *sdformat)
 {
struct prp_priv *priv = sd_to_priv(sd);
-   struct imx_media_video_dev *vdev = priv->vdev;
const struct imx_media_pixfmt *cc;
-   struct v4l2_pix_format vdev_fmt;
struct v4l2_mbus_framefmt *fmt;
int ret = 0;
 
@@ -945,19 +943,9 @@ static int prp_set_fmt(struct v4l2_subdev *sd,
priv->cc[PRPENCVF_SRC_PAD] = outcc;
}
 
-   if (sdformat->which == V4L2_SUBDEV_FORMAT_TRY)
-   goto out;
-
-   priv->cc[sdformat->pad] = cc;
+   if (sdformat->which == V4L2_SUBDEV_FORMAT_ACTIVE)
+   priv->cc[sdformat->pad] = cc;
 
-   /* propagate output pad format to capture device */
-   imx_media_mbus_fmt_to_pix_fmt(&vdev_fmt,
- &priv->format_mbus[PRPENCVF_SRC_PAD],
- priv->cc[PRPENCVF_SRC_PAD]);
-   mutex_unlock(&priv->lock);
-   imx_media_capture_device_set_format(vdev, &vdev_fmt);
-
-   return 0;
 out:
mutex_unlock(&priv->lock);
return ret;
diff --git a/drivers/staging/media/imx/imx-media-capture.c 
b/drivers/staging/media/imx/imx-media-capture.c
index 555f6204660b..8a908c3e5e60 100644
--- a/drivers/staging/media/imx/imx-media-capture.c
+++ b/drivers/staging/media/imx/imx-media-capture.c
@@ -205,7 +205,9 @@ static int capture_g_fmt_vid_cap(struct file *file, void 
*fh,
 
 static int __capture_try_fmt_vid_cap(struct capture_priv *priv,
 struct v4l2_subdev_format *fmt_src,
-struct v4l2_format *f)
+struct v4l2_format *f,
+const struct imx_media_pixfmt **retcc,
+struct v4l2_rect *compose)
 {
const struct imx_media_pixfmt *cc, *cc_src;
 
@@ -247,6 +249,16 @@ static int __capture_try_fmt_vid_cap(struct capture_priv 
*priv,
 
imx_media_mbus_fmt_to_pix_fmt(&f->fmt.pix, &fmt_src->format, cc);
 
+   if (retcc)
+   *retcc = cc;
+
+   if (compose) {
+   compose->left = 0;
+   compose->top = 0;
+   compose->width = fmt_src->format.width;
+   compose->height = fmt_src->format.height;
+   }
+
return 0;
 }
 
@@ -263,7 +275,7 @@ static int capture_try_fmt_vid_cap(struct file *file, void 
*fh,
if (ret)
return ret;
 
-   return __capture_try_fmt_vid_cap(priv, &fmt_src, f);
+   return __capture_try_fmt_vid_cap(priv, &fmt_src, f, NULL, NULL);
 }
 
 static int capture_s_fmt_vid_cap(struct file *file, void *fh,
@@ -284,17 +296,12 @@ static int capture_s_fmt_vid_cap(struct file *file, void 
*fh,
if (ret)
return ret;
 
-   ret = __capture_try_fmt_vid_cap(priv, &fmt_src, f);
+   ret = __capture_try_fmt_vid_cap(priv, &fmt_src, f, &priv->vdev.cc,
+   &priv->vdev.compose);
if (ret)
return ret;
 
priv->vdev.fmt.fmt.pix = f->fmt.pix;
-   priv->vdev.cc = imx_media_find_format(f->fmt.pix.pixelformat,
- CS_SEL_ANY, true);
-   priv->vdev.compose.left = 0;
-   priv->vdev.compose.top = 0;
-   priv->vdev.compose.width = fmt_src.format.width;
-   priv->vdev.compose.height = fmt_src.format.height;
 
return 0;
 }
@@ -524,6 +531,33 @@ static void capture_buf_queue(struct vb2_buffer *vb)
spin_unlock_irqrestore(&priv->q_lock, flags);
 }
 
+static int capture_validate_fmt(struct capture_priv *priv)
+{
+   struct v4l2_subdev_format fmt_src;
+   const struct imx_media_pixfmt *cc;
+   struct v4l2_rect compose;
+   struct v4l2_format f;
+   int ret;
+
+
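
The `retcc` and `compose` parameters added to `__capture_try_fmt_vid_cap()` above are optional outputs: the try path passes NULL for both, while the set path passes pointers it wants filled in. A minimal user-space sketch of that pattern follows. The types `pixfmt` and `rect` and the function `try_format()` are hypothetical stand-ins, not the driver's own definitions.

```c
#include <stddef.h>

/* Hypothetical stand-ins for imx_media_pixfmt and v4l2_rect. */
struct pixfmt { const char *name; };
struct rect { int left, top, width, height; };

static const struct pixfmt demo_fmt = { "demo" };

/*
 * Validate a format and, only if the caller asked for them,
 * report the chosen pixel format and the compose rectangle.
 */
static int try_format(int width, int height,
		      const struct pixfmt **retcc, struct rect *compose)
{
	if (width <= 0 || height <= 0)
		return -1;

	if (retcc)			/* optional output */
		*retcc = &demo_fmt;

	if (compose) {			/* optional output */
		compose->left = 0;
		compose->top = 0;
		compose->width = width;
		compose->height = height;
	}

	return 0;
}
```

A caller that only wants validation passes `NULL, NULL`; a caller that commits the format passes the addresses of its stored state, so both paths share one computation.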

[PATCH v5 7/9] media: staging/imx: Re-organize modules

2019-05-10 Thread Steve Longerbeam

Re-organize modules, and which objects are linked into those modules, so
that:

- imx6-media (renamed from imx-media) is the media driver module for
  imx5/6 only, and has no symbol exports.

- imx6-media-csi (renamed from imx-media-csi) is the subdev driver
  module for imx5/6 CSI. It is now linked directly with imx-media-fim,
  since only the imx5/6 CSI makes use of the frame interval monitor.

- imx-media-common now only contains common code between imx5/6 and imx7
  media drivers. It contains imx-media-utils, imx-media-of,
  imx-media-dev-common, and imx-media-capture. In order to achieve that,
  some functions common to imx5/6 and imx7 have been moved out of
  imx-media-dev.c and into imx-media-dev-common.c.

Signed-off-by: Steve Longerbeam 
---
 drivers/staging/media/imx/Makefile|  14 +-
 .../staging/media/imx/imx-media-dev-common.c  | 345 +-
 drivers/staging/media/imx/imx-media-dev.c | 340 +
 drivers/staging/media/imx/imx-media-fim.c |   5 -
 drivers/staging/media/imx/imx-media-of.c  |   3 +
 drivers/staging/media/imx/imx-media.h |  16 +-
 drivers/staging/media/imx/imx7-media-csi.c|   4 +-
 7 files changed, 369 insertions(+), 358 deletions(-)

diff --git a/drivers/staging/media/imx/Makefile 
b/drivers/staging/media/imx/Makefile
index 86f0c81b6a3b..aa6c4b4ad37e 100644
--- a/drivers/staging/media/imx/Makefile
+++ b/drivers/staging/media/imx/Makefile
@@ -1,14 +1,16 @@
 # SPDX-License-Identifier: GPL-2.0
-imx-media-objs := imx-media-dev.o imx-media-internal-sd.o imx-media-of.o \
+imx6-media-objs := imx-media-dev.o imx-media-internal-sd.o \
imx-ic-common.o imx-ic-prp.o imx-ic-prpencvf.o imx-media-vdic.o
-imx-media-objs += imx-media-dev-common.o
-imx-media-common-objs := imx-media-utils.o imx-media-fim.o
 
-obj-$(CONFIG_VIDEO_IMX_MEDIA) += imx-media.o
+imx-media-common-objs := imx-media-capture.o imx-media-dev-common.o \
+   imx-media-of.o imx-media-utils.o
+
+imx6-media-csi-objs := imx-media-csi.o imx-media-fim.o
+
+obj-$(CONFIG_VIDEO_IMX_MEDIA) += imx6-media.o
 obj-$(CONFIG_VIDEO_IMX_MEDIA) += imx-media-common.o
-obj-$(CONFIG_VIDEO_IMX_MEDIA) += imx-media-capture.o
 
-obj-$(CONFIG_VIDEO_IMX_CSI) += imx-media-csi.o
+obj-$(CONFIG_VIDEO_IMX_CSI) += imx6-media-csi.o
 obj-$(CONFIG_VIDEO_IMX_CSI) += imx6-mipi-csi2.o
 
 obj-$(CONFIG_VIDEO_IMX7_CSI) += imx7-media-csi.o
diff --git a/drivers/staging/media/imx/imx-media-dev-common.c 
b/drivers/staging/media/imx/imx-media-dev-common.c
index 6cd93419b81d..89dc4ec8dadb 100644
--- a/drivers/staging/media/imx/imx-media-dev-common.c
+++ b/drivers/staging/media/imx/imx-media-dev-common.c
@@ -8,9 +8,342 @@
 
 #include 
 #include 
+#include 
+#include 
+#include 
+#include 
 #include "imx-media.h"
 
-static const struct v4l2_async_notifier_operations imx_media_subdev_ops = {
+static inline struct imx_media_dev *notifier2dev(struct v4l2_async_notifier *n)
+{
+   return container_of(n, struct imx_media_dev, notifier);
+}
+
+/* async subdev bound notifier */
+static int imx_media_subdev_bound(struct v4l2_async_notifier *notifier,
+ struct v4l2_subdev *sd,
+ struct v4l2_async_subdev *asd)
+{
+   v4l2_info(sd->v4l2_dev, "subdev %s bound\n", sd->name);
+
+   return 0;
+}
+
+/*
+ * Create the media links for all subdevs that registered.
+ * Called after all async subdevs have bound.
+ */
+static int imx_media_create_links(struct v4l2_async_notifier *notifier)
+{
+   struct imx_media_dev *imxmd = notifier2dev(notifier);
+   struct v4l2_subdev *sd;
+
+   list_for_each_entry(sd, &imxmd->v4l2_dev.subdevs, list) {
+   switch (sd->grp_id) {
+   case IMX_MEDIA_GRP_ID_IPU_VDIC:
+   case IMX_MEDIA_GRP_ID_IPU_IC_PRP:
+   case IMX_MEDIA_GRP_ID_IPU_IC_PRPENC:
+   case IMX_MEDIA_GRP_ID_IPU_IC_PRPVF:
+   /*
+* links have already been created for the
+* sync-registered subdevs.
+*/
+   break;
+   case IMX_MEDIA_GRP_ID_IPU_CSI0:
+   case IMX_MEDIA_GRP_ID_IPU_CSI1:
+   case IMX_MEDIA_GRP_ID_CSI:
+   imx_media_create_csi_of_links(imxmd, sd);
+   break;
+   default:
+   /*
+* if this subdev has fwnode links, create media
+* links for them.
+*/
+   imx_media_create_of_links(imxmd, sd);
+   break;
+   }
+   }
+
+   return 0;
+}
+
+/*
+ * adds given video device to given imx-media source pad vdev list.
+ * Continues upstream from the pad entity's sink pads.
+ */
+static int imx_media_add_vdev_to_pad(struct imx_media_dev *imxmd,
+struct imx_media_video_dev *vdev,
+

Re: [PATCH v3 4/4] clk: at91: sckc: add support for SAM9X60

2019-05-10 Thread Alexandre Belloni

On 10/05/2019 11:23:40+, claudiu.bez...@microchip.com wrote:
> From: Claudiu Beznea 
> 
> Add support for SAM9X60's slow clock.
> 
> Signed-off-by: Claudiu Beznea 
Acked-by: Alexandre Belloni 

> ---
>  drivers/clk/at91/sckc.c | 74 
> +
>  1 file changed, 74 insertions(+)
> 
> diff --git a/drivers/clk/at91/sckc.c b/drivers/clk/at91/sckc.c
> index 2a4ac548de80..2c410f41b413 100644
> --- a/drivers/clk/at91/sckc.c
> +++ b/drivers/clk/at91/sckc.c
> @@ -415,6 +415,80 @@ static void __init of_sama5d3_sckc_setup(struct 
> device_node *np)
>  CLK_OF_DECLARE(sama5d3_clk_sckc, "atmel,sama5d3-sckc",
>  of_sama5d3_sckc_setup);
>  
> +static const struct clk_slow_bits at91sam9x60_bits = {
> + .cr_osc32en = BIT(1),
> + .cr_osc32byp = BIT(2),
> + .cr_oscsel = BIT(24),
> +};
> +
> +static void __init of_sam9x60_sckc_setup(struct device_node *np)
> +{
> + void __iomem *regbase = of_iomap(np, 0);
> + struct clk_hw_onecell_data *clk_data;
> + struct clk_hw *slow_rc, *slow_osc;
> + const char *xtal_name;
> + const char *parent_names[2] = { "slow_rc_osc", "slow_osc" };
> + bool bypass;
> + int ret;
> +
> + if (!regbase)
> + return;
> +
> + slow_rc = clk_hw_register_fixed_rate(NULL, parent_names[0], NULL, 0,
> +  32768);
> + if (IS_ERR(slow_rc))
> + return;
> +
> + xtal_name = of_clk_get_parent_name(np, 0);
> + if (!xtal_name)
> + goto unregister_slow_rc;
> +
> + bypass = of_property_read_bool(np, "atmel,osc-bypass");
> + slow_osc = at91_clk_register_slow_osc(regbase, parent_names[1],
> +   xtal_name, 500, bypass,
> +   &at91sam9x60_bits);
> + if (IS_ERR(slow_osc))
> + goto unregister_slow_rc;
> +
> + clk_data = kzalloc(sizeof(*clk_data) + (2 * sizeof(struct clk_hw *)),
> +GFP_KERNEL);
> + if (!clk_data)
> + goto unregister_slow_osc;
> +
> + /* MD_SLCK and TD_SLCK. */
> + clk_data->num = 2;
> + clk_data->hws[0] = clk_hw_register_fixed_rate(NULL, "md_slck",
> +   parent_names[0],
> +   0, 32768);
> + if (IS_ERR(clk_data->hws[0]))
> + goto clk_data_free;
> +
> + clk_data->hws[1] = at91_clk_register_sam9x5_slow(regbase, "td_slck",
> +  parent_names, 2,
> +  &at91sam9x60_bits);
> + if (IS_ERR(clk_data->hws[1]))
> + goto unregister_md_slck;
> +
> + ret = of_clk_add_hw_provider(np, of_clk_hw_onecell_get, clk_data);
> + if (WARN_ON(ret))
> + goto unregister_td_slck;
> +
> + return;
> +
> +unregister_td_slck:
> + clk_hw_unregister(clk_data->hws[1]);
> +unregister_md_slck:
> + clk_hw_unregister(clk_data->hws[0]);
> +clk_data_free:
> + kfree(clk_data);
> +unregister_slow_osc:
> + clk_hw_unregister(slow_osc);
> +unregister_slow_rc:
> + clk_hw_unregister(slow_rc);
> +}
> +CLK_OF_DECLARE(sam9x60_clk_sckc, "microchip,sam9x60-sckc",
> +of_sam9x60_sckc_setup);
> +
>  static int clk_sama5d4_slow_osc_prepare(struct clk_hw *hw)
>  {
>   struct clk_sama5d4_slow_osc *osc = to_clk_sama5d4_slow_osc(hw);
> -- 
> 2.7.4
> 

-- 
Alexandre Belloni, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

Re: [PATCH] arm64: arch_k3: Fix kconfig dependency warning

2019-05-10 Thread Tony Lindgren

* Marc Zyngier  [190510 18:30]:
> On Fri, 10 May 2019 06:16:38 +0100,
> Lokesh Vutla  wrote:
> > 
> > 
> > 
> > On 10/05/19 9:22 AM, YueHaibing wrote:
> > > Fix Kbuild warning when SOC_TI is not set
> > > 
> > > WARNING: unmet direct dependencies detected for TI_SCI_INTA_IRQCHIP
> > >   Depends on [n]: TI_SCI_PROTOCOL [=y] && SOC_TI [=n]
> > >   Selected by [y]:
> > >   - ARCH_K3 [=y]
> > > 
> > > Fixes: 009669e74813 ("arm64: arch_k3: Enable interrupt controller 
> > > drivers")
> > > Signed-off-by: YueHaibing 
> > 
> > Thanks for catching it.
> > 
> > Reviewed-by: Lokesh Vutla 
> 
> Tony, can you please route this patch via armsoc?

Thanks, adding Tero to the loop so he can queue it.

Regards,

Tony

Re: [PATCH 1/2] arm64: dts: meson: sei510: consistently order nodes

2019-05-10 Thread Kevin Hilman

Jerome Brunet  writes:

> Like other boards, order nodes by address then node names then aliases.
>
> Signed-off-by: Jerome Brunet 
> ---
>  .../boot/dts/amlogic/meson-g12a-sei510.dts| 92 +--
>  1 file changed, 46 insertions(+), 46 deletions(-)
>
> diff --git a/arch/arm64/boot/dts/amlogic/meson-g12a-sei510.dts 
> b/arch/arm64/boot/dts/amlogic/meson-g12a-sei510.dts
> index 34b40587e5ef..61fb30047d7f 100644
> --- a/arch/arm64/boot/dts/amlogic/meson-g12a-sei510.dts
> +++ b/arch/arm64/boot/dts/amlogic/meson-g12a-sei510.dts
> @@ -14,10 +14,6 @@
>   compatible = "seirobotics,sei510", "amlogic,g12a";
>   model = "SEI Robotics SEI510";
>  
> - aliases {
> - serial0 = _AO;
> - };
> -
>   adc_keys {
>   compatible = "adc-keys";
>   io-channels = < 0>;
> @@ -31,13 +27,8 @@
>   };
>   };
>  
> - ao_5v: regulator-ao_5v {
> - compatible = "regulator-fixed";
> - regulator-name = "AO_5V";
> - regulator-min-microvolt = <500>;
> - regulator-max-microvolt = <500>;
> - vin-supply = <_in>;
> - regulator-always-on;
> + aliases {
> + serial0 = _AO;
>   };

minor nit: I kind of like "aliases" and "chosen" at the top since they
are kind of special nodes, but honestly, I can't think of a really good
reason other than personal preference, so keeping things sorted as
you've done here is probably better.

Kevin

[PATCH] nvme/pci: Use host managed power state for suspend

2019-05-10 Thread Keith Busch

The nvme pci driver prepares its devices for power loss during suspend
by shutting down the controllers, and the power setting is deferred to
pci driver's power management before the platform removes power. The
suspend-to-idle mode, however, does not remove power.

NVMe devices that implement host managed power settings can achieve
lower power and better transition latencies than using generic PCI
power settings. Try to use this feature if the platform is not involved
with the suspend. If successful, restore the previous power state on
resume.

Cc: Mario Limonciello 
Cc: Kai Heng Feng 
Signed-off-by: Keith Busch 
---
Disclaimer: I've tested only on emulation faking support for the feature.

General question: different devices potentially have divergent values
for power consumption and transition latencies. Would it be useful to
allow a user tunable setting to select the desired target power state
instead of assuming the lowest one?

 drivers/nvme/host/core.c | 27 
 drivers/nvme/host/nvme.h |  2 ++
 drivers/nvme/host/pci.c  | 53 
 3 files changed, 82 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index a6644a2c3ef7..eb3640fd8838 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1132,6 +1132,33 @@ static int nvme_set_features(struct nvme_ctrl *dev, 
unsigned fid, unsigned dword
return ret;
 }
 
+int nvme_set_power(struct nvme_ctrl *ctrl, unsigned ps)
+{
+   return nvme_set_features(ctrl, NVME_FEAT_POWER_MGMT, ps, NULL, 0, NULL);
+}
+EXPORT_SYMBOL_GPL(nvme_set_power);
+
+int nvme_get_power(struct nvme_ctrl *ctrl, u32 *result)
+{
+   struct nvme_command c;
+   union nvme_result res;
+   int ret;
+
+   if (!result)
+   return -EINVAL;
+
+   memset(&c, 0, sizeof(c));
+   c.features.opcode = nvme_admin_get_features;
+   c.features.fid = cpu_to_le32(NVME_FEAT_POWER_MGMT);
+
+   ret = __nvme_submit_sync_cmd(ctrl->admin_q, &c, &res,
+   NULL, 0, 0, NVME_QID_ANY, 0, 0, false);
+   if (ret >= 0)
+   *result = le32_to_cpu(res.u32);
+   return ret;
+}
+EXPORT_SYMBOL_GPL(nvme_get_power);
+
 int nvme_set_queue_count(struct nvme_ctrl *ctrl, int *count)
 {
u32 q_count = (*count - 1) | ((*count - 1) << 16);
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 5ee75b5ff83f..eaa571ac06d2 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -459,6 +459,8 @@ int __nvme_submit_sync_cmd(struct request_queue *q, struct 
nvme_command *cmd,
unsigned timeout, int qid, int at_head,
blk_mq_req_flags_t flags, bool poll);
 int nvme_set_queue_count(struct nvme_ctrl *ctrl, int *count);
+int nvme_set_power(struct nvme_ctrl *ctrl, unsigned ps);
+int nvme_get_power(struct nvme_ctrl *ctrl, u32 *result);
 void nvme_stop_keep_alive(struct nvme_ctrl *ctrl);
 int nvme_reset_ctrl(struct nvme_ctrl *ctrl);
 int nvme_reset_ctrl_sync(struct nvme_ctrl *ctrl);
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 3e4fb891a95a..0d5d91e5b293 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -116,6 +117,7 @@ struct nvme_dev {
u32 cmbsz;
u32 cmbloc;
struct nvme_ctrl ctrl;
+   u32 last_ps;
 
mempool_t *iod_mempool;
 
@@ -2828,11 +2830,59 @@ static void nvme_remove(struct pci_dev *pdev)
 }
 
 #ifdef CONFIG_PM_SLEEP
+static int nvme_deep_state(struct nvme_dev *dev)
+{
+   struct pci_dev *pdev = to_pci_dev(dev->dev);
+   int ret;
+
+   /*
+* Save the current power state in case a user tool set a power policy
+* for this device. We'll restore that state on resume.
+*/
+   dev->last_ps = 0;
+   ret = nvme_get_power(&dev->ctrl, &dev->last_ps);
+
+   /*
+* Return the error to halt suspend if the driver either couldn't
+* submit a command or didn't see a response.
+*/
+   if (ret < 0)
+   return ret;
+
+   ret = nvme_set_power(&dev->ctrl, dev->ctrl.npss);
+   if (ret < 0)
+   return ret;
+
+   if (!ret) {
+   /*
+* A saved state prevents pci pm from generically controlling
+* the device's power. We're using protocol specific settings
+* so we don't want pci interfering.
+*/
+   pci_save_state(pdev);
+   } else {
+   /*
+* The drive failed the low power request. Fallback to device
+* shutdown and clear npss to force a controller reset on
+* resume. The value will be rediscovered during reset.
+*/
+   dev->ctrl.npss = 0;
+   nvme_dev_disable(dev, true);
+   }
+   return 0;
+}
+
 static int nvme_suspend(struct device *dev)
 {
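
The error handling in `nvme_deep_state()` above depends on a three-way return convention, as its comments describe: a negative value means the command could not be submitted or got no response, zero means the controller accepted the request, and a positive value means the controller failed it. A rough user-space model of that decision logic follows. The enum and both function names are hypothetical, and no NVMe commands are issued.

```c
#include <errno.h>

/* What the suspend path ends up doing; for illustration only. */
enum suspend_action {
	SUSPEND_ABORT,		/* negative return: halt the suspend */
	SUSPEND_LOW_POWER,	/* zero: device entered the requested state */
	SUSPEND_SHUTDOWN,	/* positive: device refused, shut it down */
};

/* Classify the (simulated) result of the set-power request. */
static enum suspend_action classify_set_power(int ret)
{
	if (ret < 0)
		return SUSPEND_ABORT;
	if (ret == 0)
		return SUSPEND_LOW_POWER;
	return SUSPEND_SHUTDOWN;
}

/*
 * Model of the whole path: get_ret and set_ret stand in for the
 * results of the get-power and set-power requests. The return
 * value mirrors what the suspend callback would see.
 */
static int deep_state_model(int get_ret, int set_ret,
			    enum suspend_action *action)
{
	if (get_ret < 0) {
		*action = SUSPEND_ABORT;
		return get_ret;
	}

	*action = classify_set_power(set_ret);
	if (*action == SUSPEND_ABORT)
		return set_ret;

	/* Both remaining outcomes let the suspend proceed. */
	return 0;
}
```

Note that only the negative case stops the suspend. A positive status is absorbed by falling back to a full device shutdown, so the caller still sees success.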

Re: [PATCH v2 3/3] initramfs: introduce do_readxattrs()

2019-05-10 Thread Jann Horn

On Thu, May 09, 2019 at 01:24:20PM +0200, Roberto Sassu wrote:
> This patch adds support for an alternative method to add xattrs to files in
> the rootfs filesystem. Instead of extracting them directly from the ram
> disk image, they are extracted from a regular file called .xattr-list, that
> can be added by any ram disk generator available today.
[...]
> +struct path_hdr {
> + char p_size[10]; /* total size including p_size field */
> + char p_data[];  /* \0 */
> +};
> +
> +static int __init do_readxattrs(void)
> +{
> + struct path_hdr hdr;
> + char str[sizeof(hdr.p_size) + 1];
> + unsigned long file_entry_size;
> + size_t size, name_buf_size, total_size;
> + struct kstat st;
> + int ret, fd;
> +
> + ret = vfs_lstat(XATTR_LIST_FILENAME, &st);
> + if (ret < 0)
> + return ret;
> +
> + total_size = st.size;
> +
> + fd = ksys_open(XATTR_LIST_FILENAME, O_RDONLY, 0);
> + if (fd < 0)
> + return fd;
> +
> + while (total_size) {
> + size = ksys_read(fd, (char *)&hdr, sizeof(hdr));
[...]
> + ksys_close(fd);
> +
> + if (ret < 0)
> + error("Unable to parse xattrs");
> +
> + return ret;
> +}

Please use something like filp_open()+kernel_read()+fput() instead of
ksys_open()+ksys_read()+ksys_close(). I understand that some of the init
code needs to use the syscall wrappers because no equivalent VFS
functions are available, but please use the VFS functions when that's
easy to do.
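
As an untested sketch of what that could look like for the header read (kernel context only, so it will not build as a standalone program; the helper name `read_xattr_hdr()` is hypothetical, while `struct path_hdr` and `XATTR_LIST_FILENAME` are taken from the patch):

```c
#include <linux/err.h>
#include <linux/fs.h>
#include <linux/init.h>

/* Hypothetical helper: read one path_hdr from the start of the list file. */
static int __init read_xattr_hdr(struct path_hdr *hdr)
{
	struct file *file;
	loff_t pos = 0;
	ssize_t len;

	/* filp_open() returns an ERR_PTR() on failure, never NULL. */
	file = filp_open(XATTR_LIST_FILENAME, O_RDONLY, 0);
	if (IS_ERR(file))
		return PTR_ERR(file);

	/* kernel_read() takes a kernel buffer and advances pos itself. */
	len = kernel_read(file, hdr, sizeof(*hdr), &pos);

	/* Drop the reference taken by filp_open(). */
	fput(file);

	if (len < 0)
		return len;

	/* Treat a short read as an I/O error in this sketch. */
	return len == sizeof(*hdr) ? 0 : -EIO;
}
```

This keeps a `struct file *` and an explicit file position instead of going through a file descriptor, so no entry in the fd table is needed.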

Re: [PATCH v3 3/4] dt-bindings: clk: at91: add bindings for SAM9X60's slow clock controller

2019-05-10 Thread Alexandre Belloni

On 10/05/2019 11:23:35+, claudiu.bez...@microchip.com wrote:
> From: Claudiu Beznea 
> 
> Add bindings for SAM9X60's slow clock controller.
> 
> Signed-off-by: Claudiu Beznea 
Reviewed-by: Alexandre Belloni 

> ---
> 
> Hi Rob,
> 
> I didn't add your Reviewed-by tag to this version since I changed
> the driver with regards to clock-cells DT binding (and I thought you
> may want to comment on this).
> 
> Thank you,
> Claudiu Beznea
> 
>  Documentation/devicetree/bindings/clock/at91-clock.txt | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/clock/at91-clock.txt 
> b/Documentation/devicetree/bindings/clock/at91-clock.txt
> index b520280e33ff..13f45db3b66d 100644
> --- a/Documentation/devicetree/bindings/clock/at91-clock.txt
> +++ b/Documentation/devicetree/bindings/clock/at91-clock.txt
> @@ -9,10 +9,11 @@ Slow Clock controller:
>  Required properties:
>  - compatible : shall be one of the following:
>   "atmel,at91sam9x5-sckc",
> - "atmel,sama5d3-sckc" or
> - "atmel,sama5d4-sckc":
> + "atmel,sama5d3-sckc",
> + "atmel,sama5d4-sckc" or
> + "microchip,sam9x60-sckc":
>   at91 SCKC (Slow Clock Controller)
> -- #clock-cells : shall be 0.
> +- #clock-cells : shall be 1 for "microchip,sam9x60-sckc" otherwise shall be 
> 0.
>  - clocks : shall be the input parent clock phandle for the clock.
>  
>  Optional properties:
> -- 
> 2.7.4
> 

-- 
Alexandre Belloni, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
