date:20180216

Re: [PATCH] x86/microcode/intel: Check microcode revision before updating sibling threads

2018-02-16 Thread Ingo Molnar


* Ashok Raj  wrote:

> After updating microcode on one of the threads in the core, the
> thread sibling automatically gets the update since the microcode
> resources are shared. Check the ucode revision on the cpu before
> performing a ucode update.

s/cpu/CPU

> 
> Signed-off-by: Ashok Raj 
> Cc: X86 ML 
> Cc: LKML 
> ---
>  arch/x86/kernel/cpu/microcode/intel.c | 16 +---
>  1 file changed, 13 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
> index 09b95a7..036d1db 100644
> --- a/arch/x86/kernel/cpu/microcode/intel.c
> +++ b/arch/x86/kernel/cpu/microcode/intel.c
> @@ -776,7 +776,7 @@ static enum ucode_state apply_microcode_intel(int cpu)
>  {
>   struct microcode_intel *mc;
>   struct ucode_cpu_info *uci;
> - struct cpuinfo_x86 *c;
> + struct cpuinfo_x86 *c = &cpu_data(cpu);
>   static int prev_rev;
>   u32 rev;
>  
> @@ -793,6 +793,18 @@ static enum ucode_state apply_microcode_intel(int cpu)
>   return UCODE_NFOUND;
>   }
>  
> + rev = intel_get_microcode_revision();
> + /*
> +  * Its possible the microcode got udpated
> +  * because its sibling update was done earlier.
> +  * Skip the udpate in that case.
> +  */
> + if (rev >= mc->hdr.rev) {
> + uci->cpu_sig.rev = rev;
> + c->microcode = rev;
> + return UCODE_OK;
> + }

s/udpate/update

Also, more fundamentally, during microcode early testing, isn't it possible for 
internal iterations of the microcode to have the same revision, but be 
different?

This patch would prevent re-loading it - for a seemingly minimal benefit.

Thanks,

Ingo
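The revision check under discussion can be pictured with a small userspace model. It assumes, as the patch description says, that both hardware threads of a core share one loaded microcode revision. The names core_ucode, hw_thread and apply_ucode() are hypothetical: they stand in for the revision MSR read and the actual load, and are not the driver's code. The comment on the comparison marks the case Ingo raises, where a different blob reusing the same revision number is skipped too.

```c
#include <stdint.h>

/* Hypothetical model: both threads of a core share one microcode revision. */
struct core_ucode {
	uint32_t rev;			/* revision currently loaded in the core */
};

struct hw_thread {
	struct core_ucode *core;	/* shared with the sibling thread */
};

enum ucode_result { UCODE_APPLIED, UCODE_SKIPPED };

static enum ucode_result apply_ucode(struct hw_thread *t, uint32_t blob_rev)
{
	/* Stands in for reading the revision of the running microcode. */
	uint32_t rev = t->core->rev;

	/*
	 * The check from the patch: skip when the running revision is
	 * already at least the blob's (e.g. the sibling loaded it first).
	 * With ">=", a different blob that reuses the same revision
	 * number is skipped as well.
	 */
	if (rev >= blob_rev)
		return UCODE_SKIPPED;

	t->core->rev = blob_rev;	/* stands in for the actual load */
	return UCODE_APPLIED;
}
```

For example, loading revision 0x20 through one thread returns UCODE_APPLIED, and the same call through its sibling then returns UCODE_SKIPPED.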

[PATCH 2/2] proc: use set_puts() at /proc/*/wchan

2018-02-16 Thread Alexey Dobriyan

Signed-off-by: Alexey Dobriyan 
---

 fs/proc/base.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -396,7 +396,7 @@ static int proc_pid_wchan(struct seq_file *m, struct pid_namespace *ns,
 
wchan = get_wchan(task);
if (wchan && !lookup_symbol_name(wchan, symname)) {
-   seq_printf(m, "%s", symname);
+   seq_puts(m, symname);
return 0;
}

[PATCH 1/2] proc: check permissions earlier for /proc/*/wchan

2018-02-16 Thread Alexey Dobriyan

get_wchan() accesses the task's stack page before permissions are checked;
let's not play this game.

Signed-off-by: Alexey Dobriyan 
---

 fs/proc/base.c |   13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -391,14 +391,17 @@ static int proc_pid_wchan(struct seq_file *m, struct pid_namespace *ns,
unsigned long wchan;
char symname[KSYM_NAME_LEN];
 
-   wchan = get_wchan(task);
+   if (!ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS))
+   goto print0;
 
-   if (wchan && ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS)
-   && !lookup_symbol_name(wchan, symname))
+   wchan = get_wchan(task);
+   if (wchan && !lookup_symbol_name(wchan, symname)) {
seq_printf(m, "%s", symname);
-   else
-   seq_putc(m, '0');
+   return 0;
+   }
 
+print0:
+   seq_putc(m, '0');
return 0;
 }
 #endif /* CONFIG_KALLSYMS */
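The ordering this patch establishes (permission check first, stack access only afterwards) can be sketched as a userspace model. The task struct and the read_wchan_sym() and show_wchan() helpers are hypothetical. read_wchan_sym() only stands in for get_wchan(), and its counter makes it visible that a caller who fails the check never causes a stack access.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical task model; none of these names are kernel APIs. */
struct task {
	bool readable_by_caller;	/* result of the permission check */
	const char *wchan_sym;		/* symbol the task sleeps in, or NULL */
	int stack_reads;		/* counts accesses to the task's stack */
};

static const char *read_wchan_sym(struct task *t)
{
	t->stack_reads++;		/* stands in for get_wchan() */
	return t->wchan_sym;
}

/* Formats the wchan output into buf, checking permission before anything else. */
static void show_wchan(struct task *t, char *buf, size_t len)
{
	const char *sym;

	if (!t->readable_by_caller)
		goto print0;		/* never touch the stack without permission */

	sym = read_wchan_sym(t);
	if (sym) {
		snprintf(buf, len, "%s", sym);
		return;
	}

print0:
	snprintf(buf, len, "0");
}
```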

Re: [PATCH] USB: chaoskey: Use kasprintf() over strcpy()/strcat()

2018-02-16 Thread Keith Packard

Kees Cook  writes:

> Instead of kmalloc() with manually calculated values followed by
> multiple strcpy()/strcat() calls, just fold it all into a single
> kasprintf() call.
>
> Signed-off-by: Kees Cook 

Reviewed-by: Keith Packard 

-- 
-keith
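The idea behind kasprintf() is to size the string, allocate it and format it in one call instead of using kmalloc() with a hand-computed length followed by strcpy()/strcat(). For readers outside the kernel, that idea can be approximated in userspace C. This is a sketch, not the kernel implementation. alloc_printf() is a hypothetical name, and a "%s-%s" format joining two strings is only an example. Where available, the GNU asprintf() extension does the same job.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Userspace analogue of kasprintf(): returns a malloc()ed string or NULL. */
static char *alloc_printf(const char *fmt, ...)
{
	va_list ap;
	char *buf;
	int n;

	/* First pass: measure the formatted length without writing anything. */
	va_start(ap, fmt);
	n = vsnprintf(NULL, 0, fmt, ap);
	va_end(ap);
	if (n < 0)
		return NULL;

	buf = malloc((size_t)n + 1);
	if (!buf)
		return NULL;

	/* Second pass: format into the exactly sized buffer. */
	va_start(ap, fmt);
	vsnprintf(buf, (size_t)n + 1, fmt, ap);
	va_end(ap);
	return buf;
}
```

A caller would write something like char *name = alloc_printf("%s-%s", a, b); and later free(name). No length arithmetic appears at the call site, so no length calculation can be wrong there.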



Re: [PATCH 1/3] tools include powerpc: Grab a copy of arch/powerpc/include/uapi/asm/unistd.h

2018-02-16 Thread Ravi Bangoria

Oops.. Really sorry about that.

I've tested acme/perf/core on Ubuntu ppc32 with and without libaudit-dev
and it's working fine.

Thank you very much for fixing it,
Ravi

On 02/16/2018 11:20 PM, Arnaldo Carvalho de Melo wrote:
> On Fri, Feb 16, 2018 at 02:29:01PM -0300, Arnaldo Carvalho de Melo wrote:
>> Humm, we need to create two tables, one for 32-bit and another for 64,
>> even with ppc not having (AFAIK) clashes in syscall numbers for 32/64...
>>
>> Trying to do it now.
> Now seems to work, take a look at my perf/core branch, should be one of the 
> first few csets.
>
> perfbuilder@cc1a85517216:/git/perf$ grep 192 /tmp/build/perf/arch/powerpc/include/generated/asm/syscalls_32.c
>   [192] = "mmap2",
> perfbuilder@cc1a85517216:/git/perf$ powerpc-linux-gnu-gcc --version
> powerpc-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
> perfbuilder@cc1a85517216:/git/perf$
>
> perfbuilder@9d7fc9dcfb73:/git/perf$ grep 192 /tmp/build/perf/arch/powerpc/include/generated/asm/syscalls_32.c
>   [192] = "mmap2",
> perfbuilder@9d7fc9dcfb73:/git/perf$ grep 192 /tmp/build/perf/arch/powerpc/include/generated/asm/syscalls_64.c
> perfbuilder@9d7fc9dcfb73:/git/perf$ powerpc64-linux-gnu-gcc --version
> powerpc64-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.1) 5.4.0 20160609
> perfbuilder@9d7fc9dcfb73:/git/perf$ 
>
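Arnaldo's point about needing separate 32-bit and 64-bit tables can be illustrated with name tables indexed by syscall number through designated initializers, in the style of the generated syscalls_32.c above. This is a hypothetical sketch, not the code perf generates. Only the mmap2 entry from the build output is filled in, the 64-bit table is an empty placeholder, and syscall_name() is a made-up lookup helper.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Per-ABI tables; numbers not listed stay NULL. Most entries omitted here. */
static const char *const syscalls_32[] = {
	[192] = "mmap2",
};

static const char *const syscalls_64[] = {
	[0] = NULL,	/* placeholder: entries omitted in this sketch */
};

/* Looks up a syscall name in the table for the requested ABI; NULL if unknown. */
static const char *syscall_name(bool is_64bit, int nr)
{
	const char *const *table = is_64bit ? syscalls_64 : syscalls_32;
	size_t size = is_64bit ? ARRAY_SIZE(syscalls_64) : ARRAY_SIZE(syscalls_32);

	if (nr < 0 || (size_t)nr >= size)
		return NULL;
	return table[nr];
}
```

With one table per ABI, looking up 192 succeeds only in the 32-bit table. This is consistent with the grep output above, which finds no 192 in syscalls_64.c.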

[PATCH v2] block/loop: add documentation for sysfs interface

2018-02-16 Thread Aishwarya Pant

Documentation has been compiled from git logs and by reading through
code.

Signed-off-by: Aishwarya Pant 
---
For drivers/block/loop.c, I don't see any maintainers or mailing lists except
for LKML. I am guessing the linux-block mailing list should be okay.

Changes in v2:
- Add linux-bl...@vger.kernel.org to the recipient list.

 Documentation/ABI/testing/sysfs-block-loop | 50 ++
 1 file changed, 50 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-block-loop

diff --git a/Documentation/ABI/testing/sysfs-block-loop b/Documentation/ABI/testing/sysfs-block-loop
new file mode 100644
index ..627f4eb87286
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-block-loop
@@ -0,0 +1,50 @@
+What:  /sys/block/loopX/loop/autoclear
+Date:  Aug, 2010
+KernelVersion: v2.6.37
+Contact:   linux-bl...@vger.kernel.org
+Description:
+   (RO) Shows if the device is in autoclear mode or not ("1" or
+   "0"). Autoclear (if set) indicates that the loopback device will
+   self-destruct after the last close.
+
+What:  /sys/block/loopX/loop/backing_file
+Date:  Aug, 2010
+KernelVersion: v2.6.37
+Contact:   linux-bl...@vger.kernel.org
+Description:
+   (RO) The path of the backing file that the loop device maps its
+   data blocks to.
+
+What:  /sys/block/loopX/loop/offset
+Date:  Aug, 2010
+KernelVersion: v2.6.37
+Contact:   linux-bl...@vger.kernel.org
+Description:
+   (RO) Start offset (in bytes).
+
+What:  /sys/block/loopX/loop/sizelimit
+Date:  Aug, 2010
+KernelVersion: v2.6.37
+Contact:   linux-bl...@vger.kernel.org
+Description:
+   (RO) The size (in bytes) that the block device maps, starting
+   from the offset.
+
+What:  /sys/block/loopX/loop/partscan
+Date:  Aug, 2011
+KernelVersion: v3.10
+Contact:   linux-bl...@vger.kernel.org
+Description:
+   (RO) Shows if automatic partition scanning is enabled for the
+   device or not ("1" or "0"). This can be requested individually
+   per loop device during its setup by setting LO_FLAGS_PARTSCAN in
+   the ioctl request. By default, no partition tables are
+   scanned.
+
+What:  /sys/block/loopX/loop/dio
+Date:  Aug, 2015
+KernelVersion: v4.10
+Contact:   linux-bl...@vger.kernel.org
+Description:
+   (RO) Shows if direct I/O is being used to access the backing file
+   or not ("1" or "0").
-- 
2.16.1
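The boolean attributes documented in this patch (autoclear, partscan, dio) report "1" or "0". A script or tool reading them mostly needs to tolerate the trailing newline that sysfs values usually carry. Below is a minimal C sketch. It assumes the /sys/block/loopX/loop/ layout described in the patch, and parse_sysfs_flag() and read_loop_flag() are hypothetical helper names.

```c
#include <stdio.h>

/* Parses "1"/"0" with an optional trailing newline; returns 1, 0, or -1. */
static int parse_sysfs_flag(const char *s)
{
	if ((s[0] == '0' || s[0] == '1') &&
	    (s[1] == '\0' || (s[1] == '\n' && s[2] == '\0')))
		return s[0] - '0';
	return -1;
}

/* Reads /sys/block/<dev>/loop/<attr> and parses it; -1 on any error. */
static int read_loop_flag(const char *dev, const char *attr)
{
	char path[256], buf[16];
	FILE *f;

	snprintf(path, sizeof(path), "/sys/block/%s/loop/%s", dev, attr);
	f = fopen(path, "r");
	if (!f)
		return -1;	/* not a loop device, or attribute missing */
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return parse_sysfs_flag(buf);
}
```

For example, read_loop_flag("loop0", "dio") would return 1 when the loop0 device is using direct I/O, assuming the attribute is present.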

Re: [PATCH 08/23] kconfig: add 'macro' keyword to support user-defined function

2018-02-16 Thread Ulf Magnusson

On Fri, Feb 16, 2018 at 11:44:25PM -0500, Nicolas Pitre wrote:
> On Sat, 17 Feb 2018, Ulf Magnusson wrote:
> 
> > On Sat, Feb 17, 2018 at 3:30 AM, Nicolas Pitre  wrote:
> > > On Sat, 17 Feb 2018, Ulf Magnusson wrote:
> > >
> > >> On Fri, Feb 16, 2018 at 02:49:31PM -0500, Nicolas Pitre wrote:
> > >> > On Sat, 17 Feb 2018, Masahiro Yamada wrote:
> > >> >
> > >> > > Now, we got a basic ability to test compiler capability in Kconfig.
> > >> > >
> > >> > > config CC_HAS_STACKPROTECTOR
> > >> > > bool
> > >> > > default $(shell $CC -Werror -fstack-protector -c -x c /dev/null -o /dev/null)
> > >> > >
> > >> > > This works, but it is ugly to repeat this long boilerplate.
> > >> > >
> > >> > > We want to describe like this:
> > >> > >
> > >> > > config CC_HAS_STACKPROTECTOR
> > >> > > bool
> > >> > > default $(cc-option -fstack-protector)
> > >> > >
> > >> > > It is straight-forward to implement a new function, but I do not like
> > >> > > to hard-code specialized functions like this.  Hence, here is another
> > >> > > feature to add functions from Kconfig files.
> > >> > >
> > >> > > A user-defined function can be defined as a string type symbol with
> > >> > > a special keyword 'macro'.  It can be referenced in the same way as
> > >> > > built-in functions.  This feature was also inspired by Makefile where
> > >> > > user-defined functions are referenced by $(call func-name, args...),
> > >> > > but I omitted the 'call' to makes it shorter.
> > >> > >
> > >> > > The macro definition can contain $(1), $(2), ... which will be replaced
> > >> > > with arguments from the caller.
> > >> > >
> > >> > > Example code:
> > >> > >
> > >> > >   config cc-option
> > >> > >   string
> > >> > >   macro $(shell $CC -Werror $(1) -c -x c /dev/null -o /dev/null)
> > >> >
> > >> > I think this syntax for defining a macro shouldn't start with the
> > >> > "config" keyword, unless you want it to be part of the config symbol
> > >> > space and land it in .config. And typing it as a "string" while it
> > >> > actually returns y/n (hence a bool) is also strange.
> > >> >
> > >> > What about this instead:
> > >> >
> > >> > macro cc-option
> > >> > bool $(shell $CC -Werror $(1) -c -x c /dev/null -o /dev/null)
> > >> >
> > >> > This makes it easier to extend as well if need be.
> > >> >
> > >> >
> > >> > Nicolas
> > >>
> > >> I haven't gone over the patchset in detail yet and might be missing
> > >> something here, but if this is just meant to be a textual shorthand,
> > >> then why give it a type at all?
> > >
> > > It is meant to be like a user-defined function.
> > >
> > >> Do you think a simpler syntax like this would make sense?
> > >>
> > >>   macro cc-option "$(shell $CC -Werror $(1) -c -x c /dev/null -o /dev/null)"
> > >>
> > >> That's the most general version, where you could use it for other stuff
> > >> besides $(shell ...) as well, just to keep parity.
> > >
> > > This is not extendable.  Let's imagine that you might want to implement
> > > some kind of conditionals some day e.g.:
> > >
> > > macro complex_test
> > > bool $(shell foo) if LOCKDEP_SUPPORT
> > > bool y if DEBUG_DRIVER
> > > bool n
> > 
> > I still don't quite get the semantics here. How would the behavior
> > change if the type was changed to say string or int in some or all of
> > the lines?
> 
> I admit this wouldn't make sense to have multiple different types. In 
> this example, the bool keyword acts as syntactic sugar more than 
> anything else.
> 
> > Since the current model is to evaluate $() while the Kconfig files are
> > being parsed, would this require evaluating Kconfig expressions during
> > parsing? There is a relatively clean and (somewhat) easy to understand
> > parsing/evaluation separation at the moment, which I like.
> 
> Agreed. Let's forget about the conditionals then.
> 
> 
> Nicolas

This is also related to why it feels off to me to (at least for its own
sake) make macro definitions mimic symbol definitions.

To me, parsing being a different domain makes it "okay" to use a
different syntax for macros compared to symbol definitions, especially
if it happens to be handier. It even makes things less confusing,
because there's less risk of mixing up the two domains (it's rare to mix
up the preprocessor with C "proper", since the syntax is so different).
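The parse-time view described here treats a macro such as cc-option as plain text substitution of $(1), $(2), ... that happens before any Kconfig expression is evaluated. The C sketch below is a hypothetical illustration of that substitution, not the code in the kconfig patch set. It handles only single-digit argument references and leaves other $(...) text, such as $(shell ...), untouched.

```c
#include <string.h>

/*
 * Copies body into out, replacing $(1)..$(9) with args[0]..args[8].
 * Returns 0 on success, or -1 if an argument is missing or out is too small.
 */
static int expand_macro(const char *body, const char *const *args, int nargs,
			char *out, size_t outlen)
{
	size_t o = 0;

	while (*body) {
		const char *piece = body;
		size_t len = 1;

		if (body[0] == '$' && body[1] == '(' &&
		    body[2] >= '1' && body[2] <= '9' && body[3] == ')') {
			int idx = body[2] - '1';

			if (idx >= nargs)
				return -1;	/* reference to a missing argument */
			piece = args[idx];
			len = strlen(piece);
			body += 4;		/* skip "$(N)" */
		} else {
			body++;			/* ordinary character: copy as-is */
		}
		if (o + len >= outlen)
			return -1;
		memcpy(out + o, piece, len);
		o += len;
	}
	out[o] = '\0';
	return 0;
}
```

Expanding the cc-option body with the single argument -fstack-protector yields the long $(shell ...) line from Masahiro's first example. The shell call itself would still have to be evaluated afterwards.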

More practically, I'm not sure that

macro foo "definition"

would be that hard to extend in practice, if you'd ever need to. You could
always add a new keyword:

fancy-macro/function/whatever foo ...

I admit it'd be a bit ugly if you'd ever end up with something like

macro foo "definition"
bit_ugly

It's still not the end of the world though, IMO, and I suspect there'd
be better-looking options if you'd need to extend things on the macro
side.

That macro syntax seems like the simplest possible thing to me, with no
obvious major dr

[PATCH v2] aoe: document sysfs interface

2018-02-16 Thread Aishwarya Pant

Documentation has been compiled from git commit logs and descriptions in
Documentation/aoe/aoe.txt. This should be useful for scripting and
tracking changes in the ABI.

Signed-off-by: Aishwarya Pant 
---
Changes in v2:
- interface -> interfaces in description of netif

 Documentation/ABI/testing/sysfs-block-aoe | 44 +++
 1 file changed, 44 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-block-aoe

diff --git a/Documentation/ABI/testing/sysfs-block-aoe b/Documentation/ABI/testing/sysfs-block-aoe
new file mode 100644
index ..6f0795f7f10c
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-block-aoe
@@ -0,0 +1,44 @@
+What:  /sys/block/etherd*/mac
+Date:  Apr, 2005
+KernelVersion: v2.6.12
+Contact:   Ed L. Cashin 
+Description:
+   (RO) The Ethernet address of the remote ATA over Ethernet (AoE)
+   device.
+
+What:  /sys/block/etherd*/netif
+Date:  Apr, 2005
+KernelVersion: v2.6.12
+Contact:   Ed L. Cashin 
+Description:
+   (RO) The names of the network interfaces on the local host through
+   which we are communicating with the remote AoE device.
+
+What:  /sys/block/etherd*/state
+Date:  Apr, 2005
+KernelVersion: v2.6.12
+Contact:   Ed L. Cashin 
+Description:
+   (RO) Device status. The state attribute is "up" when the device
+   is ready for I/O and "down" if detected but unusable. The
+   "down,closewait" state shows that the device is still open and
+   cannot come up again until it has been closed. The "up,kickme"
+   state means that the driver wants to send more commands to the
+   target but found that the maximum number of commands were already
+   waiting for a response. It will retry after being kicked by the
+   periodic timer handler routine.
+
+What:  /sys/block/etherd*/firmware-version
+Date:  Apr, 2005
+KernelVersion: v2.6.12
+Contact:   Ed L. Cashin 
+Description:
+   (RO) Version of the firmware in the target.
+
+What:  /sys/block/etherd*/payload
+Date:  Dec, 2012
+KernelVersion: v3.10
+Contact:   Ed L. Cashin 
+Description:
+   (RO) The amount of user data transferred (in bytes) inside each AoE
+   command on the network, network headers excluded.
-- 
2.16.1
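The state attribute combines a base state with an optional qualifier, so a consumer may want to map the documented strings to an enum. Below is a minimal C sketch. It recognises only the four values listed in the patch and tolerates a trailing newline. The enum and function names are hypothetical.

```c
#include <string.h>

enum aoe_state {
	AOE_UP,			/* "up": ready for I/O */
	AOE_DOWN,		/* "down": detected but unusable */
	AOE_DOWN_CLOSEWAIT,	/* still open; cannot come up until closed */
	AOE_UP_KICKME,		/* command window full; timer will retry */
	AOE_UNKNOWN,
};

static enum aoe_state parse_aoe_state(const char *s)
{
	static const struct {
		const char *text;
		enum aoe_state state;
	} map[] = {
		{ "up", AOE_UP },
		{ "down", AOE_DOWN },
		{ "down,closewait", AOE_DOWN_CLOSEWAIT },
		{ "up,kickme", AOE_UP_KICKME },
	};
	size_t n = strcspn(s, "\n");	/* ignore a trailing newline */

	for (size_t i = 0; i < sizeof(map) / sizeof(map[0]); i++)
		if (strlen(map[i].text) == n && strncmp(s, map[i].text, n) == 0)
			return map[i].state;
	return AOE_UNKNOWN;
}
```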

[PATCH 00/17] Include linux trace docs to Sphinx TOC tree

2018-02-16 Thread changbin . du

From: Changbin Du 

Hi All,
The Linux tracers are so useful that I want to make the docs better. The kernel
now uses Sphinx to generate intelligent and beautiful documentation from
reStructuredText files. I converted most of the Linux trace docs to rst format
in this series.

To preview, please visit the URL below:
http://docservice.askxiong.com/linux-kernel/trace/index.html

Thank you!

Changbin Du (17):
  Documentation: add Linux tracing to Sphinx TOC tree
  trace doc: convert trace/ftrace-design.txt to rst format
  trace doc: add ftrace-uses.rst to doc tree
  trace doc: convert trace/tracepoint-analysis.txt to rst format
  trace doc: convert trace/ftrace.txt to rst format
  trace doc: convert trace/kprobetrace.txt to rst format
  trace doc: convert trace/uprobetracer.txt to rst format
  trace doc: convert trace/tracepoints.txt to rst format
  trace doc: convert trace/events.txt to rst format
  trace doc: convert trace/events-kmem.txt to rst format
  trace doc: convert trace/events-power.txt to rst format
  trace doc: convert trace/events-nmi.txt to rst format
  trace doc: convert trace/events-msr.txt to rst format
  trace doc: convert trace/mmiotrace.txt to rst format
  trace doc: convert trace/hwlat_detector.txt to rst format
  trace doc: convert trace/intel_th.txt to rst format
  trace doc: convert trace/stm.txt to rst format

 Documentation/index.rst|1 +
 .../trace/{events-kmem.txt => events-kmem.rst} |   50 +-
 Documentation/trace/events-msr.rst |   40 +
 Documentation/trace/events-msr.txt |   37 -
 Documentation/trace/events-nmi.rst |   45 +
 Documentation/trace/events-nmi.txt |   43 -
 .../trace/{events-power.txt => events-power.rst}   |   52 +-
 Documentation/trace/{events.txt => events.rst} |  669 ++--
 .../trace/{ftrace-design.txt => ftrace-design.rst} |  252 +-
 Documentation/trace/ftrace-uses.rst|   23 +-
 Documentation/trace/ftrace.rst | 3332 
 Documentation/trace/ftrace.txt | 3220 ---
 .../{hwlat_detector.txt => hwlat_detector.rst} |   26 +-
 Documentation/trace/index.rst  |   23 +
 Documentation/trace/{intel_th.txt => intel_th.rst} |   43 +-
 .../trace/{kprobetrace.txt => kprobetrace.rst} |  100 +-
 .../trace/{mmiotrace.txt => mmiotrace.rst} |   86 +-
 Documentation/trace/{stm.txt => stm.rst}   |   23 +-
 ...epoint-analysis.txt => tracepoint-analysis.rst} |   41 +-
 .../trace/{tracepoints.txt => tracepoints.rst} |   77 +-
 .../trace/{uprobetracer.txt => uprobetracer.rst}   |   44 +-
 21 files changed, 4237 insertions(+), 3990 deletions(-)
 rename Documentation/trace/{events-kmem.txt => events-kmem.rst} (76%)
 create mode 100644 Documentation/trace/events-msr.rst
 delete mode 100644 Documentation/trace/events-msr.txt
 create mode 100644 Documentation/trace/events-nmi.rst
 delete mode 100644 Documentation/trace/events-nmi.txt
 rename Documentation/trace/{events-power.txt => events-power.rst} (65%)
 rename Documentation/trace/{events.txt => events.rst} (82%)
 rename Documentation/trace/{ftrace-design.txt => ftrace-design.rst} (74%)
 create mode 100644 Documentation/trace/ftrace.rst
 delete mode 100644 Documentation/trace/ftrace.txt
 rename Documentation/trace/{hwlat_detector.txt => hwlat_detector.rst} (83%)
 create mode 100644 Documentation/trace/index.rst
 rename Documentation/trace/{intel_th.txt => intel_th.rst} (82%)
 rename Documentation/trace/{kprobetrace.txt => kprobetrace.rst} (63%)
 rename Documentation/trace/{mmiotrace.txt => mmiotrace.rst} (78%)
 rename Documentation/trace/{stm.txt => stm.rst} (91%)
 rename Documentation/trace/{tracepoint-analysis.txt => tracepoint-analysis.rst} (93%)
 rename Documentation/trace/{tracepoints.txt => tracepoints.rst} (74%)
 rename Documentation/trace/{uprobetracer.txt => uprobetracer.rst} (86%)

-- 
2.7.4

[PATCH 03/17] trace doc: add ftrace-uses.rst to doc tree

2018-02-16 Thread changbin . du

From: Changbin Du 

Add ftrace-uses.rst into the Sphinx TOC tree. A few format issues are also fixed.

Cc: Steven Rostedt 
Signed-off-by: Changbin Du 
---
 Documentation/trace/ftrace-uses.rst | 23 ---
 Documentation/trace/index.rst   |  1 +
 2 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/Documentation/trace/ftrace-uses.rst b/Documentation/trace/ftrace-uses.rst
index 3aed560..998a60a 100644
--- a/Documentation/trace/ftrace-uses.rst
+++ b/Documentation/trace/ftrace-uses.rst
@@ -21,13 +21,14 @@ how to use ftrace to implement your own function callbacks.
 
 The ftrace context
 ==================
+.. warning::
 
-WARNING: The ability to add a callback to almost any function within the
-kernel comes with risks. A callback can be called from any context
-(normal, softirq, irq, and NMI). Callbacks can also be called just before
-going to idle, during CPU bring up and takedown, or going to user space.
-This requires extra care to what can be done inside a callback. A callback
-can be called outside the protective scope of RCU.
+  The ability to add a callback to almost any function within the
+  kernel comes with risks. A callback can be called from any context
+  (normal, softirq, irq, and NMI). Callbacks can also be called just before
+  going to idle, during CPU bring up and takedown, or going to user space.
+  This requires extra care to what can be done inside a callback. A callback
+  can be called outside the protective scope of RCU.
 
 The ftrace infrastructure has some protections agains recursions and RCU
 but one must still be very careful how they use the callbacks.
@@ -54,15 +55,15 @@ an ftrace_ops with ftrace:
 
 Both .flags and .private are optional. Only .func is required.
 
-To enable tracing call::
+To enable tracing call:
 
 .. c:function::  register_ftrace_function(&ops);
 
-To disable tracing call::
+To disable tracing call:
 
 .. c:function::  unregister_ftrace_function(&ops);
 
-The above is defined by including the header::
+The above is defined by including the header:
 
 .. c:function:: #include 
 
@@ -200,7 +201,7 @@ match a specific pattern.
 
 See Filter Commands in :file:`Documentation/trace/ftrace.txt`.
 
-To just trace the schedule function::
+To just trace the schedule function:
 
 .. code-block:: c
 
@@ -210,7 +211,7 @@ To add more functions, call the ftrace_set_filter() more than once with the
 @reset parameter set to zero. To remove the current filter set and replace it
 with new functions defined by @buf, have @reset be non-zero.
 
-To remove all the filtered functions and trace all functions::
+To remove all the filtered functions and trace all functions:
 
 .. code-block:: c
 
diff --git a/Documentation/trace/index.rst b/Documentation/trace/index.rst
index c8000ba..aa2baad 100644
--- a/Documentation/trace/index.rst
+++ b/Documentation/trace/index.rst
@@ -6,3 +6,4 @@ Linux Tracing Technologies
:maxdepth: 2
 
ftrace-design
+   ftrace-uses
-- 
2.7.4
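The @reset behaviour of ftrace_set_filter() described in this document can be modelled in plain C to show how successive calls combine. This is a hypothetical userspace model, not the kernel's filter implementation. It matches exact function names rather than the patterns ftrace accepts, and the filter_set type and helper names are made up for illustration. Passing NULL with a non-zero reset models clearing the filter so that every function is traced.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_FILTERS 8

/* Hypothetical per-ops filter list; an empty list means "trace everything". */
struct filter_set {
	const char *funcs[MAX_FILTERS];
	int nr;
};

/* Models @reset: non-zero drops the current set before adding func. */
static int set_filter(struct filter_set *f, const char *func, int reset)
{
	if (reset)
		f->nr = 0;
	if (!func)
		return 0;	/* reset only: back to tracing all functions */
	if (f->nr == MAX_FILTERS)
		return -1;
	f->funcs[f->nr++] = func;
	return 0;
}

static bool is_traced(const struct filter_set *f, const char *func)
{
	if (f->nr == 0)
		return true;
	for (int i = 0; i < f->nr; i++)
		if (strcmp(f->funcs[i], func) == 0)
			return true;
	return false;
}
```

Calling set_filter() with reset=1 for "schedule" and then with reset=0 for a second function leaves both traced. A later call with reset=1 replaces both.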

[PATCH] virtio_balloon: export huge page allocation statistics

2018-02-16 Thread Jonathan Helman

Export statistics for successful and failed huge page allocations
from the virtio balloon driver. These 2 stats come directly from
the vm_events HTLB_BUDDY_PGALLOC and HTLB_BUDDY_PGALLOC_FAIL.

Signed-off-by: Jonathan Helman 
---
 drivers/virtio/virtio_balloon.c | 6 ++
 include/uapi/linux/virtio_balloon.h | 4 +++-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index dfe5684..6b237e3 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -272,6 +272,12 @@ static unsigned int update_balloon_stats(struct virtio_balloon *vb)
pages_to_bytes(events[PSWPOUT]));
update_stat(vb, idx++, VIRTIO_BALLOON_S_MAJFLT, events[PGMAJFAULT]);
update_stat(vb, idx++, VIRTIO_BALLOON_S_MINFLT, events[PGFAULT]);
+#ifdef CONFIG_HUGETLB_PAGE
+   update_stat(vb, idx++, VIRTIO_BALLOON_S_HTLB_PGALLOC,
+   events[HTLB_BUDDY_PGALLOC]);
+   update_stat(vb, idx++, VIRTIO_BALLOON_S_HTLB_PGFAIL,
+   events[HTLB_BUDDY_PGALLOC_FAIL]);
+#endif
 #endif
update_stat(vb, idx++, VIRTIO_BALLOON_S_MEMFREE,
pages_to_bytes(i.freeram));
diff --git a/include/uapi/linux/virtio_balloon.h b/include/uapi/linux/virtio_balloon.h
index 4e8b830..e3e8071 100644
--- a/include/uapi/linux/virtio_balloon.h
+++ b/include/uapi/linux/virtio_balloon.h
@@ -53,7 +53,9 @@ struct virtio_balloon_config {
 #define VIRTIO_BALLOON_S_MEMTOT   5   /* Total amount of memory */
 #define VIRTIO_BALLOON_S_AVAIL6   /* Available memory as in /proc */
 #define VIRTIO_BALLOON_S_CACHES   7   /* Disk caches */
-#define VIRTIO_BALLOON_S_NR   8
+#define VIRTIO_BALLOON_S_HTLB_PGALLOC  8  /* Number of htlb pgalloc successes */
+#define VIRTIO_BALLOON_S_HTLB_PGFAIL   9  /* Number of htlb pgalloc failures */
+#define VIRTIO_BALLOON_S_NR   10
 
 /*
  * Memory statistics structure.
-- 
1.8.3.1
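The uapi change adds two tags and raises VIRTIO_BALLOON_S_NR from 8 to 10. If the driver's stats array is sized by VIRTIO_BALLOON_S_NR, which is an assumption here, that bump is what leaves room for the two extra entries. The sketch below is a simplified userspace model. balloon_stat uses host-endian fields instead of the packed __virtio16/__virtio64 layout, fill_htlb_stats() is a hypothetical helper, and the starting index in the usage is arbitrary, since array positions are not the same as tag values.

```c
#include <stdint.h>

/* Tag values as defined by the patch in include/uapi/linux/virtio_balloon.h. */
#define VIRTIO_BALLOON_S_HTLB_PGALLOC	8
#define VIRTIO_BALLOON_S_HTLB_PGFAIL	9
#define VIRTIO_BALLOON_S_NR		10

/* Simplified, host-endian stand-in for struct virtio_balloon_stat. */
struct balloon_stat {
	uint16_t tag;
	uint64_t val;
};

static void update_stat(struct balloon_stat *stats, unsigned int idx,
			uint16_t tag, uint64_t val)
{
	stats[idx].tag = tag;
	stats[idx].val = val;
}

/* Appends the two huge page counters at idx; returns the next free index. */
static unsigned int fill_htlb_stats(struct balloon_stat *stats, unsigned int idx,
				    uint64_t pgalloc, uint64_t pgalloc_fail)
{
	update_stat(stats, idx++, VIRTIO_BALLOON_S_HTLB_PGALLOC, pgalloc);
	update_stat(stats, idx++, VIRTIO_BALLOON_S_HTLB_PGFAIL, pgalloc_fail);
	return idx;
}
```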

[PATCH 02/17] trace doc: convert trace/ftrace-design.txt to rst format

2018-02-16 Thread changbin . du

From: Changbin Du 

This converts the plain text documentation to reStructuredText format and
adds it to the Sphinx TOC tree. This documentation is not in sync with the
current code, so mark it as out of date.

Cc: Steven Rostedt 
Signed-off-by: Changbin Du 
---
 .../trace/{ftrace-design.txt => ftrace-design.rst} | 252 -
 Documentation/trace/index.rst  |   2 +
 2 files changed, 141 insertions(+), 113 deletions(-)
 rename Documentation/trace/{ftrace-design.txt => ftrace-design.rst} (74%)

diff --git a/Documentation/trace/ftrace-design.txt b/Documentation/trace/ftrace-design.rst
similarity index 74%
rename from Documentation/trace/ftrace-design.txt
rename to Documentation/trace/ftrace-design.rst
index a273dd0..a8e22e0 100644
--- a/Documentation/trace/ftrace-design.txt
+++ b/Documentation/trace/ftrace-design.rst
@@ -1,6 +1,12 @@
-   function tracer guts
-   ====================
-   By Mike Frysinger
+======================
+Function Tracer Design
+======================
+
+:Author: Mike Frysinger
+
+.. caution::
+   This document is out of date. Some of the descriptions below no longer
+   match the current implementation.
 
 Introduction
 
@@ -21,8 +27,8 @@ Prerequisites
 -------------
 
 Ftrace relies on these features being implemented:
- STACKTRACE_SUPPORT - implement save_stack_trace()
- TRACE_IRQFLAGS_SUPPORT - implement include/asm/irqflags.h
+  - STACKTRACE_SUPPORT - implement save_stack_trace()
+  - TRACE_IRQFLAGS_SUPPORT - implement include/asm/irqflags.h
 
 
 HAVE_FUNCTION_TRACER
@@ -32,9 +38,11 @@ You will need to implement the mcount and the ftrace_stub functions.
 
 The exact mcount symbol name will depend on your toolchain.  Some call it
 "mcount", "_mcount", or even "__mcount".  You can probably figure it out by
-running something like:
+running something like::
+
$ echo 'main(){}' | gcc -x c -S -o - - -pg | grep mcount
callmcount
+
 We'll make the assumption below that the symbol is "mcount" just to keep things
 nice and simple in the examples.
 
@@ -56,8 +64,9 @@ size of the mcount call that is embedded in the function).
 
 For example, if the function foo() calls bar(), when the bar() function calls
 mcount(), the arguments mcount() will pass to the tracer are:
-   "frompc" - the address bar() will use to return to foo()
-   "selfpc" - the address bar() (with mcount() size adjustment)
+
+  - "frompc" - the address bar() will use to return to foo()
+  - "selfpc" - the address bar() (with mcount() size adjustment)
 
 Also keep in mind that this mcount function will be called *a lot*, so
 optimizing for the default case of no tracer will help the smooth running of
@@ -67,39 +76,41 @@ means the code flow should usually be kept linear (i.e. no branching in the nop
 case).  This is of course an optimization and not a hard requirement.
 
 Here is some pseudo code that should help (these functions should actually be
-implemented in assembly):
+implemented in assembly)::
 
-void ftrace_stub(void)
-{
-   return;
-}
+   void ftrace_stub(void)
+   {
+   return;
+   }
 
-void mcount(void)
-{
-   /* save any bare state needed in order to do initial checking */
+   void mcount(void)
+   {
+   /* save any bare state needed in order to do initial checking */
 
-   extern void (*ftrace_trace_function)(unsigned long, unsigned long);
-   if (ftrace_trace_function != ftrace_stub)
-   goto do_trace;
+   extern void (*ftrace_trace_function)(unsigned long, unsigned long);
+   if (ftrace_trace_function != ftrace_stub)
+   goto do_trace;
 
-   /* restore any bare state */
+   /* restore any bare state */
 
-   return;
+   return;
 
-do_trace:
+   do_trace:
 
-   /* save all state needed by the ABI (see paragraph above) */
+   /* save all state needed by the ABI (see paragraph above) */
 
-   unsigned long frompc = ...;
-   unsigned long selfpc =  - MCOUNT_INSN_SIZE;
-   ftrace_trace_function(frompc, selfpc);
+   unsigned long frompc = ...;
+   unsigned long selfpc =  - MCOUNT_INSN_SIZE;
+   ftrace_trace_function(frompc, selfpc);
 
-   /* restore all state needed by the ABI */
-}
+   /* restore all state needed by the ABI */
+   }
 
 Don't forget to export mcount for modules !
-extern void mcount(void);
-EXPORT_SYMBOL(mcount);
+::
+
+   extern void mcount(void);
+   EXPORT_SYMBOL(mcount);
 
 
 HAVE_FUNCTION_GRAPH_TRACER
@@ -127,38 +138,40 @@ That function will simply call the common ftrace_return_to_handler function and
 that will return the original return address with which you can return to the
 original call site.
 
-Here is the updated mcount pseudo code:
-void mcount(void)
-{
-...
-   if (ftrace_trace_function != ftrace_stub)
-   goto

[PATCH 11/17] trace doc: convert trace/events-power.txt to rst format

2018-02-16 Thread changbin . du

From: Changbin Du 

This converts the plain text documentation to reStructuredText format and
adds it to the Sphinx TOC tree. No essential content change.

Cc: Steven Rostedt 
Signed-off-by: Changbin Du 
---
 .../trace/{events-power.txt => events-power.rst}   | 52 +-
 Documentation/trace/index.rst  |  1 +
 2 files changed, 31 insertions(+), 22 deletions(-)
 rename Documentation/trace/{events-power.txt => events-power.rst} (65%)

diff --git a/Documentation/trace/events-power.txt b/Documentation/trace/events-power.rst
similarity index 65%
rename from Documentation/trace/events-power.txt
rename to Documentation/trace/events-power.rst
index 21d514c..a77daca 100644
--- a/Documentation/trace/events-power.txt
+++ b/Documentation/trace/events-power.rst
@@ -1,13 +1,14 @@
-
-   Subsystem Trace Points: power
+=============================
+Subsystem Trace Points: power
+=============================
 
 The power tracing system captures events related to power transitions
 within the kernel. Broadly speaking there are three major subheadings:
 
-  o Power state switch which reports events related to suspend (S-states),
- cpuidle (C-states) and cpufreq (P-states)
-  o System clock related changes
-  o Power domains related changes and transitions
+  - Power state switch which reports events related to suspend (S-states),
+cpuidle (C-states) and cpufreq (P-states)
+  - System clock related changes
+  - Power domains related changes and transitions
 
 This document describes what each of the tracepoints is and why they
 might be useful.
@@ -22,14 +23,16 @@ Cf. include/trace/events/power.h for the events definitions.
 
 A 'cpu' event class gathers the CPU-related events: cpuidle and
 cpufreq.
+::
 
-cpu_idle   "state=%lu cpu_id=%lu"
-cpu_frequency  "state=%lu cpu_id=%lu"
+  cpu_idle "state=%lu cpu_id=%lu"
+  cpu_frequency"state=%lu cpu_id=%lu"
 
 A suspend event is used to indicate the system going in and out of the
 suspend mode:
+::
 
-machine_suspend"state=%lu"
+  machine_suspend  "state=%lu"
 
 
 Note: the value of '-1' or '4294967295' for state means an exit from the current state,
@@ -45,10 +48,11 @@ correctly draw the states diagrams and to calculate accurate statistics etc.
 
 The clock events are used for clock enable/disable and for
 clock rate change.
+::
 
-clock_enable   "%s state=%lu cpu_id=%lu"
-clock_disable  "%s state=%lu cpu_id=%lu"
-clock_set_rate "%s state=%lu cpu_id=%lu"
+  clock_enable "%s state=%lu cpu_id=%lu"
+  clock_disable"%s state=%lu cpu_id=%lu"
+  clock_set_rate   "%s state=%lu cpu_id=%lu"
 
 The first parameter gives the clock name (e.g. "gpio1_iclk").
 The second parameter is '1' for enable, '0' for disable, the target
@@ -57,8 +61,9 @@ clock rate for set_rate.
 3. Power domains events
 =======================
 The power domain events are used for power domains transitions
+::
 
-power_domain_target"%s state=%lu cpu_id=%lu"
+  power_domain_target  "%s state=%lu cpu_id=%lu"
 
 The first parameter gives the power domain name (e.g. "mpu_pwrdm").
 The second parameter is the power domain target state.
@@ -67,28 +72,31 @@ The second parameter is the power domain target state.
 
 The PM QoS events are used for QoS add/update/remove request and for
 target/flags update.
+::
 
-pm_qos_add_request "pm_qos_class=%s value=%d"
-pm_qos_update_request  "pm_qos_class=%s value=%d"
-pm_qos_remove_request  "pm_qos_class=%s value=%d"
-pm_qos_update_request_timeout  "pm_qos_class=%s value=%d, timeout_us=%ld"
+  pm_qos_add_request "pm_qos_class=%s value=%d"
+  pm_qos_update_request  "pm_qos_class=%s value=%d"
+  pm_qos_remove_request  "pm_qos_class=%s value=%d"
+  pm_qos_update_request_timeout  "pm_qos_class=%s value=%d, timeout_us=%ld"
 
 The first parameter gives the QoS class name (e.g. "CPU_DMA_LATENCY").
 The second parameter is value to be added/updated/removed.
 The third parameter is timeout value in usec.
+::
 
-pm_qos_update_target   "action=%s prev_value=%d curr_value=%d"
-pm_qos_update_flags    "action=%s prev_value=0x%x curr_value=0x%x"
+  pm_qos_update_target   "action=%s prev_value=%d curr_value=%d"
+  pm_qos_update_flags    "action=%s prev_value=0x%x curr_value=0x%x"
 
 The first parameter gives the QoS action name (e.g. "ADD_REQ").
 The second parameter is the previous QoS value.
 The third parameter is the current QoS value to update.
 
 And, there are also events used for device PM QoS add/update/remove request.
+::
 
-dev_pm_qos_add_request "device=%s type=%s new_value=%d"
-dev_pm_qos_update_request  "device=%s type=%s new_value=%d"
-dev_pm_qos_remove_request  "device=%s type=%s new_value=%d"
+

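The note in the power events patch above says that a state of '-1' or '4294967295' marks an exit from the current state. A minimal Python sketch (hypothetical helper names, not part of the kernel or of this patch) that pulls the decimal key=value pairs out of a payload such as the clock events' "%s state=%lu cpu_id=%lu" and flags those exit values:

```python
import re

# From the note above: a state of -1 or 4294967295 means "exit from the current state".
EXIT_STATES = {-1, 4294967295}

# Decimal key=value pairs such as "state=1" or "cpu_id=0".
# Bare names like "gpio1_iclk" and hex values like "0x1f" are skipped.
FIELD_RE = re.compile(r"(\w+)=(-?\d+)\b")


def parse_power_fields(payload: str) -> dict:
    """Return the decimal key=value pairs of one power event payload."""
    return {key: int(value) for key, value in FIELD_RE.findall(payload)}


def is_exit(fields: dict) -> bool:
    """True if the event's state field marks an exit from the current state."""
    return fields.get("state") in EXIT_STATES


if __name__ == "__main__":
    for payload in ("gpio1_iclk state=1 cpu_id=0", "state=4294967295"):
        fields = parse_power_fields(payload)
        print(payload, "->", fields, "exit" if is_exit(fields) else "enter/change")
```
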
[PATCH 12/17] trace doc: convert trace/events-nmi.txt to rst format

2018-02-16 Thread changbin . du

From: Changbin Du 

This converts the plain text documentation to reStructuredText format and
adds it to the Sphinx TOC tree. No essential content change.

Cc: Steven Rostedt 
Signed-off-by: Changbin Du 
---
 Documentation/trace/events-nmi.rst | 45 ++
 Documentation/trace/events-nmi.txt | 43 
 Documentation/trace/index.rst  |  1 +
 3 files changed, 46 insertions(+), 43 deletions(-)
 create mode 100644 Documentation/trace/events-nmi.rst
 delete mode 100644 Documentation/trace/events-nmi.txt

diff --git a/Documentation/trace/events-nmi.rst b/Documentation/trace/events-nmi.rst
new file mode 100644
index 000..9e0a728
--- /dev/null
+++ b/Documentation/trace/events-nmi.rst
@@ -0,0 +1,45 @@
+
+NMI Trace Events
+
+
+These events normally show up here:
+
+   /sys/kernel/debug/tracing/events/nmi
+
+
+nmi_handler
+---
+
+You might want to use this tracepoint if you suspect that your
+NMI handlers are hogging large amounts of CPU time.  The kernel
+will warn if it sees long-running handlers::
+
+   INFO: NMI handler took too long to run: 9.207 msecs
+
+and this tracepoint will allow you to drill down and get some
+more details.
+
+Let's say you suspect that perf_event_nmi_handler() is causing
+you some problems and you only want to trace that handler
+specifically.  You need to find its address::
+
+   $ grep perf_event_nmi_handler /proc/kallsyms
+   81625600 t perf_event_nmi_handler
+
+Let's also say you are only interested in when that function is
+really hogging a lot of CPU time, like a millisecond at a time.
+Note that the kernel's output is in milliseconds, but the input
+to the filter is in nanoseconds!  You can filter on 'delta_ns'::
+
+   cd /sys/kernel/debug/tracing/events/nmi/nmi_handler
+   echo 'handler==0x81625600 && delta_ns>100' > filter
+   echo 1 > enable
+
+Your output would then look like::
+
+   $ cat /sys/kernel/debug/tracing/trace_pipe
+   -0 [000] d.h3   505.397558: nmi_handler: perf_event_nmi_handler() delta_ns: 3236765 handled: 1
+   -0 [000] d.h3   505.805893: nmi_handler: perf_event_nmi_handler() delta_ns: 3174234 handled: 1
+   -0 [000] d.h3   506.158206: nmi_handler: perf_event_nmi_handler() delta_ns: 3084642 handled: 1
+   -0 [000] d.h3   506.334346: nmi_handler: perf_event_nmi_handler() delta_ns: 3080351 handled: 1
+
diff --git a/Documentation/trace/events-nmi.txt b/Documentation/trace/events-nmi.txt
deleted file mode 100644
index c03c8c8..000
--- a/Documentation/trace/events-nmi.txt
+++ /dev/null
@@ -1,43 +0,0 @@
-NMI Trace Events
-
-These events normally show up here:
-
-   /sys/kernel/debug/tracing/events/nmi
-
---
-
-nmi_handler:
-
-You might want to use this tracepoint if you suspect that your
-NMI handlers are hogging large amounts of CPU time.  The kernel
-will warn if it sees long-running handlers:
-
-   INFO: NMI handler took too long to run: 9.207 msecs
-
-and this tracepoint will allow you to drill down and get some
-more details.
-
-Let's say you suspect that perf_event_nmi_handler() is causing
-you some problems and you only want to trace that handler
-specifically.  You need to find its address:
-
-   $ grep perf_event_nmi_handler /proc/kallsyms
-   81625600 t perf_event_nmi_handler
-
-Let's also say you are only interested in when that function is
-really hogging a lot of CPU time, like a millisecond at a time.
-Note that the kernel's output is in milliseconds, but the input
-to the filter is in nanoseconds!  You can filter on 'delta_ns':
-
-cd /sys/kernel/debug/tracing/events/nmi/nmi_handler
-echo 'handler==0x81625600 && delta_ns>100' > filter
-echo 1 > enable
-
-Your output would then look like:
-
-$ cat /sys/kernel/debug/tracing/trace_pipe
--0 [000] d.h3   505.397558: nmi_handler: perf_event_nmi_handler() delta_ns: 3236765 handled: 1
--0 [000] d.h3   505.805893: nmi_handler: perf_event_nmi_handler() delta_ns: 3174234 handled: 1
--0 [000] d.h3   506.158206: nmi_handler: perf_event_nmi_handler() delta_ns: 3084642 handled: 1
--0 [000] d.h3   506.334346: nmi_handler: perf_event_nmi_handler() delta_ns: 3080351 handled: 1
-
diff --git a/Documentation/trace/index.rst b/Documentation/trace/index.rst
index 309c9c5..f4a8fbc 100644
--- a/Documentation/trace/index.rst
+++ b/Documentation/trace/index.rst
@@ -15,3 +15,4 @@ Linux Tracing Technologies
events
events-kmem
events-power
+   events-nmi
-- 
2.7.4
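
The nmi_handler filter in the patch above takes delta_ns in nanoseconds, while the kernel's warning reports milliseconds, so a millisecond threshold has to be multiplied by 1,000,000. A minimal Python sketch, with hypothetical helper names and assuming the "nmi_handler: FUNC delta_ns: N handled: N" layout of the sample output, that does the conversion and keeps only the slow invocations from trace_pipe text:

```python
import re

NS_PER_MS = 1_000_000  # 1 millisecond = 1,000,000 nanoseconds

# Tail of an nmi_handler line, e.g.
# "... nmi_handler: perf_event_nmi_handler() delta_ns: 3236765 handled: 1"
NMI_RE = re.compile(r"nmi_handler: (\S+) delta_ns: (\d+) handled: (\d+)")


def ms_to_ns(ms: float) -> int:
    """Convert a threshold in milliseconds to the nanoseconds used by delta_ns."""
    return int(round(ms * NS_PER_MS))


def slow_handlers(lines, threshold_ms: float):
    """Yield (handler, delta_ms) for nmi_handler events slower than threshold_ms."""
    threshold_ns = ms_to_ns(threshold_ms)
    for line in lines:
        m = NMI_RE.search(line)
        if m is None:
            continue  # not an nmi_handler event line
        handler, delta_ns = m.group(1), int(m.group(2))
        if delta_ns > threshold_ns:
            yield handler, delta_ns / NS_PER_MS


if __name__ == "__main__":
    sample = [
        "505.397558: nmi_handler: perf_event_nmi_handler() delta_ns: 3236765 handled: 1",
        "505.805893: nmi_handler: perf_event_nmi_handler() delta_ns: 3174234 handled: 1",
    ]
    print(f"delta_ns>{ms_to_ns(1)}")  # filter fragment for a 1 ms threshold
    for handler, ms in slow_handlers(sample, threshold_ms=1):
        print(f"{handler}: {ms:.3f} ms")
```

For a 1 ms threshold, ms_to_ns(1) returns 1000000, which is the number to put after "delta_ns>" in the filter.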

[PATCH 06/17] trace doc: convert trace/kprobetrace.txt to rst format

2018-02-16 Thread changbin . du

From: Changbin Du 

This converts the plain text documentation to reStructuredText format and
adds it to the Sphinx TOC tree. No essential content change.

Cc: Steven Rostedt 
Signed-off-by: Changbin Du 
---
 Documentation/trace/index.rst  |   1 +
 .../trace/{kprobetrace.txt => kprobetrace.rst} | 100 +++--
 2 files changed, 55 insertions(+), 46 deletions(-)
 rename Documentation/trace/{kprobetrace.txt => kprobetrace.rst} (63%)

diff --git a/Documentation/trace/index.rst b/Documentation/trace/index.rst
index 947c6db..c8e2130 100644
--- a/Documentation/trace/index.rst
+++ b/Documentation/trace/index.rst
@@ -9,3 +9,4 @@ Linux Tracing Technologies
tracepoint-analysis
ftrace
ftrace-uses
+   kprobetrace
diff --git a/Documentation/trace/kprobetrace.txt b/Documentation/trace/kprobetrace.rst
similarity index 63%
rename from Documentation/trace/kprobetrace.txt
rename to Documentation/trace/kprobetrace.rst
index 1a3a3d6..3e0f971 100644
--- a/Documentation/trace/kprobetrace.txt
+++ b/Documentation/trace/kprobetrace.rst
@@ -1,8 +1,8 @@
-Kprobe-based Event Tracing
-==
-
- Documentation is written by Masami Hiramatsu
+==
+Kprobe-based Event Tracing
+==
 
+:Author: Masami Hiramatsu
 
 Overview
 
@@ -23,6 +23,8 @@ current_tracer. Instead of that, add probe points via
 
 Synopsis of kprobe_events
 -
+::
+
   p[:[GRP/]EVENT] [MOD:]SYM[+offs]|MEMADDR [FETCHARGS] : Set a probe
   r[MAXACTIVE][:[GRP/]EVENT] [MOD:]SYM[+0] [FETCHARGS] : Set a return probe
   -:[GRP/]EVENT: Clear a probe
@@ -66,7 +68,7 @@ String type is a special type, which fetches a "null-terminated" string from
 kernel space. This means it will fail and store NULL if the string container
 has been paged out.
 Bitfield is another special type, which takes 3 parameters, bit-width, bit-
-offset, and container-size (usually 32). The syntax is;
+offset, and container-size (usually 32). The syntax is::
 
  b@/
 
@@ -75,7 +77,7 @@ For $comm, the default type is "string"; any other type is invalid.
 
 Per-Probe Event Filtering
 -
- Per-probe event filtering feature allows you to set different filter on each
+Per-probe event filtering feature allows you to set different filter on each
 probe and gives you what arguments will be shown in trace buffer. If an event
 name is specified right after 'p:' or 'r:' in kprobe_events, it adds an event
 under tracing/events/kprobes/, at the directory you can see 'id',
@@ -96,87 +98,93 @@ id:
 
 Event Profiling
 ---
- You can check the total number of probe hits and probe miss-hits via
+You can check the total number of probe hits and probe miss-hits via
 /sys/kernel/debug/tracing/kprobe_profile.
- The first column is event name, the second is the number of probe hits,
+The first column is event name, the second is the number of probe hits,
 the third is the number of probe miss-hits.
 
 
 Usage examples
 --
 To add a probe as a new event, write a new definition to kprobe_events
-as below.
+as below::
 
  echo 'p:myprobe do_sys_open dfd=%ax filename=%dx flags=%cx mode=+4($stack)' > /sys/kernel/debug/tracing/kprobe_events
 
- This sets a kprobe on the top of do_sys_open() function with recording
+This sets a kprobe on the top of do_sys_open() function with recording
 1st to 4th arguments as "myprobe" event. Note, which register/stack entry is
 assigned to each function argument depends on arch-specific ABI. If you unsure
 the ABI, please try to use probe subcommand of perf-tools (you can find it
 under tools/perf/).
 As this example shows, users can choose more familiar names for each arguments.
+::
 
  echo 'r:myretprobe do_sys_open $retval' >> /sys/kernel/debug/tracing/kprobe_events
 
- This sets a kretprobe on the return point of do_sys_open() function with
+This sets a kretprobe on the return point of do_sys_open() function with
 recording return value as "myretprobe" event.
- You can see the format of these events via
+You can see the format of these events via
 /sys/kernel/debug/tracing/events/kprobes//format.
+::
 
   cat /sys/kernel/debug/tracing/events/kprobes/myprobe/format
-name: myprobe
-ID: 780
-format:
-field:unsigned short common_type;   offset:0;   size:2; signed:0;
-field:unsigned char common_flags;   offset:2;   size:1; signed:0;
-field:unsigned char common_preempt_count;   offset:3;   size:1; signed:0;
-field:int common_pid;   offset:4;   size:4; signed:1;
+  name: myprobe
+  ID: 780
+  format:
+  field:unsigned short common_type;   offset:0;   size:2; signed:0;
+  field:unsigned char common_flags;   offset:2;   size:1; signed:0;
+  field:unsigned char common_preempt_count;   offset:3;   size:1; signed:0

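The kprobe_events synopsis in the patch above ("p[:[GRP/]EVENT] [MOD:]SYM[+offs]|MEMADDR [FETCHARGS]", plus the "r" and "-:" forms) maps onto a small string builder. A minimal Python sketch with hypothetical function names; it covers only symbol-based probes (no MEMADDR and no MAXACTIVE), and its output is the text the usage examples echo into /sys/kernel/debug/tracing/kprobe_events:

```python
def _event_name(event: str, group: str = "") -> str:
    # The "[GRP/]EVENT" part of the synopsis
    return f"{group}/{event}" if group else event


def kprobe_def(symbol: str, event: str = "", group: str = "", offset: int = 0,
               fetchargs=(), ret: bool = False, module: str = "") -> str:
    """Build "p[:[GRP/]EVENT] [MOD:]SYM[+offs] [FETCHARGS]",
    or with ret=True "r[:[GRP/]EVENT] [MOD:]SYM [FETCHARGS]"."""
    if ret and offset:
        # The synopsis only allows +0 on return probes
        raise ValueError("return probes take no offset other than +0")
    if group and not event:
        raise ValueError("a group needs an event name")
    head = "r" if ret else "p"
    if event:
        head += ":" + _event_name(event, group)
    target = f"{module}:{symbol}" if module else symbol
    if offset:
        target += f"+{offset}"
    return " ".join([head, target, *fetchargs])


def kprobe_clear(event: str, group: str = "") -> str:
    """Build the "-:[GRP/]EVENT" line that removes a probe."""
    return "-:" + _event_name(event, group)


if __name__ == "__main__":
    print(kprobe_def("do_sys_open", "myprobe",
                     fetchargs=["dfd=%ax", "filename=%dx", "flags=%cx", "mode=+4($stack)"]))
    print(kprobe_def("do_sys_open", "myretprobe", fetchargs=["$retval"], ret=True))
```

Which register holds which argument is still arch-specific, as the text above says; the builder only assembles the syntax.
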
[PATCH 10/17] trace doc: convert trace/events-kmem.txt to rst format

2018-02-16 Thread changbin . du

From: Changbin Du 

This converts the plain text documentation to reStructuredText format and
adds it to the Sphinx TOC tree. No essential content change.

Cc: Steven Rostedt 
Signed-off-by: Changbin Du 
---
 .../trace/{events-kmem.txt => events-kmem.rst} | 50 ++
 Documentation/trace/index.rst  |  1 +
 2 files changed, 32 insertions(+), 19 deletions(-)
 rename Documentation/trace/{events-kmem.txt => events-kmem.rst} (76%)

diff --git a/Documentation/trace/events-kmem.txt b/Documentation/trace/events-kmem.rst
similarity index 76%
rename from Documentation/trace/events-kmem.txt
rename to Documentation/trace/events-kmem.rst
index 1948004..5554841 100644
--- a/Documentation/trace/events-kmem.txt
+++ b/Documentation/trace/events-kmem.rst
@@ -1,22 +1,26 @@
-   Subsystem Trace Points: kmem
+
+Subsystem Trace Points: kmem
+
 
 The kmem tracing system captures events related to object and page allocation
 within the kernel. Broadly speaking there are five major subheadings.
 
-  o Slab allocation of small objects of unknown type (kmalloc)
-  o Slab allocation of small objects of known type
-  o Page allocation
-  o Per-CPU Allocator Activity
-  o External Fragmentation
+  - Slab allocation of small objects of unknown type (kmalloc)
+  - Slab allocation of small objects of known type
+  - Page allocation
+  - Per-CPU Allocator Activity
+  - External Fragmentation
 
 This document describes what each of the tracepoints is and why they
 might be useful.
 
 1. Slab allocation of small objects of unknown type
 ===
-kmalloc        call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s
-kmalloc_node   call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s node=%d
-kfree  call_site=%lx ptr=%p
+::
+
+  kmalloc  call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s
+  kmalloc_node call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s node=%d
+  kfree    call_site=%lx ptr=%p
 
 Heavy activity for these events may indicate that a specific cache is
 justified, particularly if kmalloc slab pages are getting significantly
@@ -27,9 +31,11 @@ the allocation sites were.
 
 2. Slab allocation of small objects of known type
 =
-kmem_cache_alloc   call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s
-kmem_cache_alloc_node  call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s node=%d
-kmem_cache_free    call_site=%lx ptr=%p
+::
+
+  kmem_cache_alloc call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s
+  kmem_cache_alloc_node    call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s node=%d
+  kmem_cache_free  call_site=%lx ptr=%p
 
 These events are similar in usage to the kmalloc-related events except that
 it is likely easier to pin the event down to a specific cache. At the time
@@ -38,10 +44,12 @@ but the call_site can usually be used to extrapolate that information.
 
 3. Page allocation
 ==
-mm_page_alloc    page=%p pfn=%lu order=%d migratetype=%d gfp_flags=%s
-mm_page_alloc_zone_locked page=%p pfn=%lu order=%u migratetype=%d cpu=%d percpu_refill=%d
-mm_page_free page=%p pfn=%lu order=%d
-mm_page_free_batched page=%p pfn=%lu order=%d cold=%d
+::
+
+  mm_page_alloc  page=%p pfn=%lu order=%d migratetype=%d gfp_flags=%s
+  mm_page_alloc_zone_locked page=%p pfn=%lu order=%u migratetype=%d cpu=%d percpu_refill=%d
+  mm_page_free   page=%p pfn=%lu order=%d
+  mm_page_free_batched   page=%p pfn=%lu order=%d cold=%d
 
 These four events deal with page allocation and freeing. mm_page_alloc is
 a simple indicator of page allocator activity. Pages may be allocated from
@@ -65,8 +73,10 @@ contention on the zone->lru_lock.
 
 4. Per-CPU Allocator Activity
 =
-mm_page_alloc_zone_locked  page=%p pfn=%lu order=%u migratetype=%d cpu=%d percpu_refill=%d
-mm_page_pcpu_drain page=%p pfn=%lu order=%d cpu=%d migratetype=%d
+::
+
+  mm_page_alloc_zone_locked    page=%p pfn=%lu order=%u migratetype=%d cpu=%d percpu_refill=%d
+  mm_page_pcpu_drain   page=%p pfn=%lu order=%d cpu=%d migratetype=%d
 
 In front of the page allocator is a per-cpu page allocator. It exists only
 for order-0 pages, reduces contention on the zone->lock and reduces the
@@ -92,7 +102,9 @@ can be allocated and freed on the same CPU through some algorithm change.
 
 5. External Fragmentation
 =
-mm_page_alloc_extfrag  page=%p pfn=%lu alloc_order=%d fallback_order=%d pageblock_order=%d alloc_migratetype=%d fallback_migratetype=%d fragmenting=%d change_ownership=%d
+::
+
+  mm_page_alloc_extfrag    page=%p pfn=%lu alloc_order=%d fallback_order=%d

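The kmalloc and kmalloc_node format lines in the kmem patch above carry both bytes_req and bytes_alloc; their difference is the space the allocator handed out beyond the request. A minimal Python sketch, with a hypothetical helper name and assuming the trace lines print those key=value pairs as in the format strings, that totals this slack per call_site to help spot allocation sites that might justify a dedicated cache:

```python
import re
from collections import defaultdict

# Only kmalloc / kmalloc_node events carry both bytes_req and bytes_alloc here.
EVENT_RE = re.compile(r"\bkmalloc(?:_node)?: ")
KV_RE = re.compile(r"(\w+)=(\S+)")


def kmalloc_slack(lines) -> dict:
    """Sum bytes_alloc - bytes_req per call_site over kmalloc/kmalloc_node lines."""
    slack = defaultdict(int)
    for line in lines:
        if not EVENT_RE.search(line):
            continue  # kfree, kmem_cache_* and other events are ignored
        kv = dict(KV_RE.findall(line))
        slack[kv["call_site"]] += int(kv["bytes_alloc"]) - int(kv["bytes_req"])
    return dict(slack)


if __name__ == "__main__":
    import sys
    # e.g. cat /sys/kernel/debug/tracing/trace_pipe | python3 sketch.py
    for site, wasted in sorted(kmalloc_slack(sys.stdin).items(), key=lambda kv: -kv[1]):
        print(site, wasted)
```
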
[PATCH 13/17] trace doc: convert trace/events-msr.txt to rst format

2018-02-16 Thread changbin . du

From: Changbin Du 

This converts the plain text documentation to reStructuredText format and
adds it to the Sphinx TOC tree. No essential content change.

Cc: Steven Rostedt 
Signed-off-by: Changbin Du 
---
 Documentation/trace/events-msr.rst | 40 ++
 Documentation/trace/events-msr.txt | 37 ---
 Documentation/trace/index.rst  |  1 +
 3 files changed, 41 insertions(+), 37 deletions(-)
 create mode 100644 Documentation/trace/events-msr.rst
 delete mode 100644 Documentation/trace/events-msr.txt

diff --git a/Documentation/trace/events-msr.rst b/Documentation/trace/events-msr.rst
new file mode 100644
index 000..e938aa0
--- /dev/null
+++ b/Documentation/trace/events-msr.rst
@@ -0,0 +1,40 @@
+
+MSR Trace Events
+
+
+The x86 kernel supports tracing most MSR (Model Specific Register) accesses.
+To see the definition of the MSRs on Intel systems please see the SDM
+at http://www.intel.com/sdm (Volume 3)
+
+Available trace points:
+
+/sys/kernel/debug/tracing/events/msr/
+
+Trace MSR reads:
+
+read_msr
+
+  - msr: MSR number
+  - val: Value written
+  - failed: 1 if the access failed, otherwise 0
+
+
+Trace MSR writes:
+
+write_msr
+
+  - msr: MSR number
+  - val: Value written
+  - failed: 1 if the access failed, otherwise 0
+
+
+Trace RDPMC in kernel:
+
+rdpmc
+
+The trace data can be post processed with the postprocess/decode_msr.py script::
+
+  cat /sys/kernel/debug/tracing/trace | decode_msr.py /usr/src/linux/include/asm/msr-index.h
+
+to add symbolic MSR names.
+
diff --git a/Documentation/trace/events-msr.txt b/Documentation/trace/events-msr.txt
deleted file mode 100644
index 78c383b..000
--- a/Documentation/trace/events-msr.txt
+++ /dev/null
@@ -1,37 +0,0 @@
-
-The x86 kernel supports tracing most MSR (Model Specific Register) accesses.
-To see the definition of the MSRs on Intel systems please see the SDM
-at http://www.intel.com/sdm (Volume 3)
-
-Available trace points:
-
-/sys/kernel/debug/tracing/events/msr/
-
-Trace MSR reads
-
-read_msr
-
-msr: MSR number
-val: Value written
-failed: 1 if the access failed, otherwise 0
-
-
-Trace MSR writes
-
-write_msr
-
-msr: MSR number
-val: Value written
-failed: 1 if the access failed, otherwise 0
-
-
-Trace RDPMC in kernel
-
-rdpmc
-
-The trace data can be post processed with the postprocess/decode_msr.py script
-
-cat /sys/kernel/debug/tracing/trace | decode_msr.py /usr/src/linux/include/asm/msr-index.h
-
-to add symbolic MSR names.
-
diff --git a/Documentation/trace/index.rst b/Documentation/trace/index.rst
index f4a8fbc..307468d 100644
--- a/Documentation/trace/index.rst
+++ b/Documentation/trace/index.rst
@@ -16,3 +16,4 @@ Linux Tracing Technologies
events-kmem
events-power
events-nmi
+   events-msr
-- 
2.7.4
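
The decode_msr.py step in the MSR patch above replaces raw MSR numbers with symbolic names taken from msr-index.h. The following is not that script, only a minimal Python sketch of the same idea with hypothetical function names; it assumes MSRs are defined as "#define MSR_NAME 0xHEX" lines and that each read_msr/write_msr trace line prints the MSR number in hex right after the event name:

```python
import re
import sys

# "#define MSR_IA32_TSC  0x00000010" -> ("MSR_IA32_TSC", "0x00000010")
DEFINE_RE = re.compile(r"^#define\s+(MSR_\w+)\s+(0x[0-9a-fA-F]+)\b")
# Assumed event layout: the hex MSR number follows "read_msr: " or "write_msr: ".
EVENT_RE = re.compile(r"\b(read_msr|write_msr): ([0-9a-fA-F]+)")


def load_msr_names(header_lines) -> dict:
    """Map MSR number -> first MSR_* name defined for it in msr-index.h."""
    names = {}
    for line in header_lines:
        m = DEFINE_RE.match(line)
        if m:
            names.setdefault(int(m.group(2), 16), m.group(1))
    return names


def annotate(trace_line: str, names: dict) -> str:
    """Append "(NAME)" after the MSR number when the number is known."""
    def repl(m):
        name = names.get(int(m.group(2), 16))
        return f"{m.group(0)} ({name})" if name else m.group(0)
    return EVENT_RE.sub(repl, trace_line)


if __name__ == "__main__" and len(sys.argv) > 1:
    # cat /sys/kernel/debug/tracing/trace | python3 sketch.py msr-index.h
    with open(sys.argv[1]) as header:
        table = load_msr_names(header)
    for line in sys.stdin:
        sys.stdout.write(annotate(line, table))
```
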

[PATCH 09/17] trace doc: convert trace/events.txt to rst format

2018-02-16 Thread changbin . du

From: Changbin Du 

This converts the plain text documentation to reStructuredText format and
adds it to the Sphinx TOC tree. No essential content change.

Cc: Steven Rostedt 
Signed-off-by: Changbin Du 
---
 Documentation/trace/{events.txt => events.rst} | 669 +
 Documentation/trace/index.rst  |   1 +
 2 files changed, 337 insertions(+), 333 deletions(-)
 rename Documentation/trace/{events.txt => events.rst} (82%)

diff --git a/Documentation/trace/events.txt b/Documentation/trace/events.rst
similarity index 82%
rename from Documentation/trace/events.txt
rename to Documentation/trace/events.rst
index 2cc08d4..3d6fdea 100644
--- a/Documentation/trace/events.txt
+++ b/Documentation/trace/events.rst
@@ -1,7 +1,9 @@
-Event Tracing
+=
+Event Tracing
+=
 
-   Documentation written by Theodore Ts'o
-   Updated by Li Zefan and Tom Zanussi
+:Author: Theodore Ts'o
+:Updated: Li Zefan and Tom Zanussi
 
 1. Introduction
 ===
@@ -25,23 +27,22 @@ The events which are available for tracing can be found in the file
 /sys/kernel/debug/tracing/available_events.
 
 To enable a particular event, such as 'sched_wakeup', simply echo it
-to /sys/kernel/debug/tracing/set_event. For example:
+to /sys/kernel/debug/tracing/set_event. For example::
 
# echo sched_wakeup >> /sys/kernel/debug/tracing/set_event
 
-[ Note: '>>' is necessary, otherwise it will firstly disable
-  all the events. ]
+.. Note:: '>>' is necessary, otherwise it will firstly disable all the events.
 
 To disable an event, echo the event name to the set_event file prefixed
-with an exclamation point:
+with an exclamation point::
 
# echo '!sched_wakeup' >> /sys/kernel/debug/tracing/set_event
 
-To disable all events, echo an empty line to the set_event file:
+To disable all events, echo an empty line to the set_event file::
 
# echo > /sys/kernel/debug/tracing/set_event
 
-To enable all events, echo '*:*' or '*:' to the set_event file:
+To enable all events, echo '*:*' or '*:' to the set_event file::
 
# echo *:* > /sys/kernel/debug/tracing/set_event
 
@@ -50,7 +51,7 @@ etc., and a full event name looks like this: :.  The
 subsystem name is optional, but it is displayed in the available_events
 file.  All of the events in a subsystem can be specified via the syntax
 ":*"; for example, to enable all irq events, you can use the
-command:
+command::
 
# echo 'irq:*' > /sys/kernel/debug/tracing/set_event
 
@@ -60,33 +61,33 @@ command:
 The events available are also listed in /sys/kernel/debug/tracing/events/ hierarchy
 of directories.
 
-To enable event 'sched_wakeup':
+To enable event 'sched_wakeup'::
 
# echo 1 > /sys/kernel/debug/tracing/events/sched/sched_wakeup/enable
 
-To disable it:
+To disable it::
 
# echo 0 > /sys/kernel/debug/tracing/events/sched/sched_wakeup/enable
 
-To enable all events in sched subsystem:
+To enable all events in sched subsystem::
 
# echo 1 > /sys/kernel/debug/tracing/events/sched/enable
 
-To enable all events:
+To enable all events::
 
# echo 1 > /sys/kernel/debug/tracing/events/enable
 
 When reading one of these enable files, there are four results:
 
- 0 - all events this file affects are disabled
- 1 - all events this file affects are enabled
- X - there is a mixture of events enabled and disabled
- ? - this file does not affect any event
+ - 0 - all events this file affects are disabled
+ - 1 - all events this file affects are enabled
+ - X - there is a mixture of events enabled and disabled
+ - ? - this file does not affect any event
 
 2.3 Boot option
 ---
 
-In order to facilitate early boot debugging, use boot option:
+In order to facilitate early boot debugging, use boot option::
 
trace_event=[event-list]
 
@@ -115,7 +116,7 @@ the fields prefixed with 'common_'.  The other fields vary between
 events and correspond to the fields defined in the TRACE_EVENT
 definition for that event.
 
-Each field in the format has the form:
+Each field in the format has the form::
 
  field:field-type field-name; offset:N; size:N;
 
@@ -123,27 +124,27 @@ where offset is the offset of the field in the trace record and size
 is the size of the data item, in bytes.
 
 For example, here's the information displayed for the 'sched_wakeup'
-event:
+event::
 
-# cat /sys/kernel/debug/tracing/events/sched/sched_wakeup/format
+   # cat /sys/kernel/debug/tracing/events/sched/sched_wakeup/format
 
-name: sched_wakeup
-ID: 60
-format:
-   field:unsigned short common_type;   offset:0;   size:2;
-   field:unsigned char common_flags;   offset:2;   size:1;
-   field:unsigned char common_preempt_count;   offset:3;   size:1;
-   field:int common_pid;   offset:4;   size:4;
-   field:int common_tgid;  offset:8;   size:4;
+   name: sched_wakeup
+   ID: 60
+   format:
+

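The events.rst text above gives each format entry the form "field:field-type field-name; offset:N; size:N;", as in the sched_wakeup dump. A minimal Python sketch with hypothetical helper names that parses such a dump, assuming that layout (extra entries such as "signed:N;" are simply ignored):

```python
import re

# One "field:TYPE NAME; offset:N; size:N;" entry from an events/*/format file.
FIELD_RE = re.compile(
    r"field:(?P<decl>[^;]+);\s*offset:(?P<offset>\d+);\s*size:(?P<size>\d+);"
)


def parse_format(text: str) -> list:
    """Return (name, type, offset, size) tuples for every field line in text."""
    fields = []
    for m in FIELD_RE.finditer(text):
        # The last word of the declaration is the field name, the rest its type;
        # array fields keep their brackets in the name, e.g. "comm[16]".
        ctype, _, name = m.group("decl").strip().rpartition(" ")
        fields.append((name, ctype, int(m.group("offset")), int(m.group("size"))))
    return fields


def field_bytes(fields: list) -> int:
    """Bytes spanned by the parsed fields (end offset of the last one)."""
    return max((offset + size for _, _, offset, size in fields), default=0)


if __name__ == "__main__":
    path = "/sys/kernel/debug/tracing/events/sched/sched_wakeup/format"
    with open(path) as f:
        for field in parse_format(f.read()):
            print(field)
```
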
[PATCH 07/17] trace doc: convert trace/uprobetracer.txt to rst format

2018-02-16 Thread changbin . du

From: Changbin Du 

This converts the plain text documentation to reStructuredText format and
adds it to the Sphinx TOC tree. No essential content change.

Cc: Steven Rostedt 
Signed-off-by: Changbin Du 
---
 Documentation/trace/index.rst  |  1 +
 .../trace/{uprobetracer.txt => uprobetracer.rst}   | 44 +-
 2 files changed, 27 insertions(+), 18 deletions(-)
 rename Documentation/trace/{uprobetracer.txt => uprobetracer.rst} (86%)

diff --git a/Documentation/trace/index.rst b/Documentation/trace/index.rst
index c8e2130..353fb8a 100644
--- a/Documentation/trace/index.rst
+++ b/Documentation/trace/index.rst
@@ -10,3 +10,4 @@ Linux Tracing Technologies
ftrace
ftrace-uses
kprobetrace
+   uprobetracer
diff --git a/Documentation/trace/uprobetracer.txt b/Documentation/trace/uprobetracer.rst
similarity index 86%
rename from Documentation/trace/uprobetracer.txt
rename to Documentation/trace/uprobetracer.rst
index bf526a7c..98d3f69 100644
--- a/Documentation/trace/uprobetracer.txt
+++ b/Documentation/trace/uprobetracer.rst
@@ -1,7 +1,8 @@
-Uprobe-tracer: Uprobe-based Event Tracing
-=
+=
+Uprobe-tracer: Uprobe-based Event Tracing
+=
 
-   Documentation written by Srikar Dronamraju
+:Author: Srikar Dronamraju
 
 
 Overview
@@ -19,6 +20,8 @@ user to calculate the offset of the probepoint in the object.
 
 Synopsis of uprobe_tracer
 -
+::
+
   p[:[GRP/]EVENT] PATH:OFFSET [FETCHARGS] : Set a uprobe
   r[:[GRP/]EVENT] PATH:OFFSET [FETCHARGS] : Set a return uprobe (uretprobe)
   -:[GRP/]EVENT   : Clear uprobe or uretprobe event
@@ -57,7 +60,7 @@ x86-64 uses x64).
 String type is a special type, which fetches a "null-terminated" string from
 user space.
 Bitfield is another special type, which takes 3 parameters, bit-width, bit-
-offset, and container-size (usually 32). The syntax is;
+offset, and container-size (usually 32). The syntax is::
 
  b@/
 
@@ -74,28 +77,28 @@ the third is the number of probe miss-hits.
 Usage examples
 --
  * Add a probe as a new uprobe event, write a new definition to uprobe_events
-as below: (sets a uprobe at an offset of 0x4245c0 in the executable /bin/bash)
+   as below (sets a uprobe at an offset of 0x4245c0 in the executable /bin/bash)::
 
 echo 'p /bin/bash:0x4245c0' > /sys/kernel/debug/tracing/uprobe_events
 
- * Add a probe as a new uretprobe event:
+ * Add a probe as a new uretprobe event::
 
 echo 'r /bin/bash:0x4245c0' > /sys/kernel/debug/tracing/uprobe_events
 
- * Unset registered event:
+ * Unset registered event::
 
 echo '-:p_bash_0x4245c0' >> /sys/kernel/debug/tracing/uprobe_events
 
- * Print out the events that are registered:
+ * Print out the events that are registered::
 
 cat /sys/kernel/debug/tracing/uprobe_events
 
- * Clear all events:
+ * Clear all events::
 
 echo > /sys/kernel/debug/tracing/uprobe_events
 
 Following example shows how to dump the instruction pointer and %ax register
-at the probed text address. Probe zfree function in /bin/zsh:
+at the probed text address. Probe zfree function in /bin/zsh::
 
 # cd /sys/kernel/debug/tracing/
 # cat /proc/`pgrep zsh`/maps | grep /bin/zsh | grep r-xp
@@ -103,24 +106,27 @@ at the probed text address. Probe zfree function in /bin/zsh:
 # objdump -T /bin/zsh | grep -w zfree
 00446420 g DF .text  0012  Base zfree
 
-  0x46420 is the offset of zfree in object /bin/zsh that is loaded at
-  0x0040. Hence the command to uprobe would be:
+0x46420 is the offset of zfree in object /bin/zsh that is loaded at
+0x0040. Hence the command to uprobe would be::
 
 # echo 'p:zfree_entry /bin/zsh:0x46420 %ip %ax' > uprobe_events
 
-  And the same for the uretprobe would be:
+And the same for the uretprobe would be::
 
 # echo 'r:zfree_exit /bin/zsh:0x46420 %ip %ax' >> uprobe_events
 
-Please note: User has to explicitly calculate the offset of the probe-point
-in the object. We can see the events that are registered by looking at the
-uprobe_events file.
+.. note:: User has to explicitly calculate the offset of the probe-point
+   in the object.
+
+We can see the events that are registered by looking at the uprobe_events file.
+::
 
 # cat uprobe_events
 p:uprobes/zfree_entry /bin/zsh:0x00046420 arg1=%ip arg2=%ax
 r:uprobes/zfree_exit /bin/zsh:0x00046420 arg1=%ip arg2=%ax
 
-Format of events can be seen by viewing the file events/uprobes/zfree_entry/format
+Format of events can be seen by viewing the file events/uprobes/zfree_entry/format.
+::
 
 # cat events/uprobes/zfree_entry/format
 name: zfree_entry
@@ -139,16 +145,18 @@ Format of events can be seen by viewing the file events/uprobes/zfree_entry/form
 print fmt: "(%lx) arg1=%lx arg2=%lx", REC->__probe_ip, REC->arg1,

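The zfree example in the uprobetracer patch above derives the uprobe offset by subtracting the load address of the /bin/zsh text mapping from the symbol value printed by objdump; with the values quoted there, 0x446420 minus the offset 0x46420 implies a load address of 0x400000. A minimal Python sketch of that arithmetic with hypothetical helper names, assuming a non-PIE executable; the general form adds the mapping's file offset (third column of /proc/PID/maps) back in, which is 0 for the usual first r-xp mapping:

```python
def parse_maps_line(line: str):
    """Split one /proc/PID/maps line into (start, end, perms, file_offset, path)."""
    fields = line.split()
    start, end = (int(x, 16) for x in fields[0].split("-"))
    perms, file_offset = fields[1], int(fields[2], 16)
    path = fields[5] if len(fields) > 5 else ""  # anonymous mappings have no path
    return start, end, perms, file_offset, path


def uprobe_offset(symbol_addr: int, map_start: int, map_file_offset: int = 0) -> int:
    """Offset of a symbol inside the object, as written after PATH: in uprobe_events.

    Assumes a non-PIE executable, where objdump's symbol value is already the
    run-time virtual address inside the mapping that starts at map_start.
    """
    if symbol_addr < map_start:
        raise ValueError("symbol lies below the mapping start")
    return symbol_addr - map_start + map_file_offset


def uprobe_def(event: str, path: str, offset: int, fetchargs=(), ret: bool = False) -> str:
    """Build "p:EVENT PATH:OFFSET [FETCHARGS]" (or "r:..." for a uretprobe)."""
    return " ".join([f"{'r' if ret else 'p'}:{event} {path}:{offset:#x}", *fetchargs])
```

With a mapping starting at 0x400000 and file offset 0, uprobe_offset(0x446420, 0x400000) gives 0x46420, and uprobe_def("zfree_entry", "/bin/zsh", 0x46420, ["%ip", "%ax"]) reproduces the probe definition echoed into uprobe_events in the example.
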
[PATCH 16/17] trace doc: convert trace/intel_th.txt to rst format

2018-02-16 Thread changbin . du

From: Changbin Du 

This converts the plain text documentation to reStructuredText format and
adds it to the Sphinx TOC tree. No essential content change.

Cc: Steven Rostedt 
Signed-off-by: Changbin Du 
---
 Documentation/trace/index.rst  |  1 +
 Documentation/trace/{intel_th.txt => intel_th.rst} | 43 +++---
 2 files changed, 23 insertions(+), 21 deletions(-)
 rename Documentation/trace/{intel_th.txt => intel_th.rst} (82%)

diff --git a/Documentation/trace/index.rst b/Documentation/trace/index.rst
index eabbbaf..02cc56c 100644
--- a/Documentation/trace/index.rst
+++ b/Documentation/trace/index.rst
@@ -19,3 +19,4 @@ Linux Tracing Technologies
events-msr
mmiotrace
hwlat_detector
+   intel_th
diff --git a/Documentation/trace/intel_th.txt b/Documentation/trace/intel_th.rst
similarity index 82%
rename from Documentation/trace/intel_th.txt
rename to Documentation/trace/intel_th.rst
index 7a57165..990f132 100644
--- a/Documentation/trace/intel_th.txt
+++ b/Documentation/trace/intel_th.rst
@@ -1,3 +1,4 @@
+===
 Intel(R) Trace Hub (TH)
 ===
 
@@ -18,13 +19,13 @@ via sysfs attributes.
 
 Currently, the following Intel TH subdevices (blocks) are supported:
   - Software Trace Hub (STH), trace source, which is a System Trace
-  Module (STM) device,
+Module (STM) device,
   - Memory Storage Unit (MSU), trace output, which allows storing
-  trace hub output in system memory,
+trace hub output in system memory,
   - Parallel Trace Interface output (PTI), trace output to an external
-  debug host via a PTI port,
+debug host via a PTI port,
   - Global Trace Hub (GTH), which is a switch and a central component
-  of Intel(R) Trace Hub architecture.
+of Intel(R) Trace Hub architecture.
 
 Common attributes for output devices are described in
 Documentation/ABI/testing/sysfs-bus-intel_th-output-devices, the most
@@ -65,41 +66,41 @@ allocated, are accessible via /dev/intel_th0/msc{0,1}.
 Quick example
 -
 
-# figure out which GTH port is the first memory controller:
+# figure out which GTH port is the first memory controller::
 
-$ cat /sys/bus/intel_th/devices/0-msc0/port
-0
+   $ cat /sys/bus/intel_th/devices/0-msc0/port
+   0
 
-# looks like it's port 0, configure master 33 to send data to port 0:
+# looks like it's port 0, configure master 33 to send data to port 0::
 
-$ echo 0 > /sys/bus/intel_th/devices/0-gth/masters/33
+   $ echo 0 > /sys/bus/intel_th/devices/0-gth/masters/33
 
 # allocate a 2-windowed multiblock buffer on the first memory
-# controller, each with 64 pages:
+# controller, each with 64 pages::
 
-$ echo multi > /sys/bus/intel_th/devices/0-msc0/mode
-$ echo 64,64 > /sys/bus/intel_th/devices/0-msc0/nr_pages
+   $ echo multi > /sys/bus/intel_th/devices/0-msc0/mode
+   $ echo 64,64 > /sys/bus/intel_th/devices/0-msc0/nr_pages
 
-# enable wrapping for this controller, too:
+# enable wrapping for this controller, too::
 
-$ echo 1 > /sys/bus/intel_th/devices/0-msc0/wrap
+   $ echo 1 > /sys/bus/intel_th/devices/0-msc0/wrap
 
-# and enable tracing into this port:
+# and enable tracing into this port::
 
-$ echo 1 > /sys/bus/intel_th/devices/0-msc0/active
+   $ echo 1 > /sys/bus/intel_th/devices/0-msc0/active
 
 # .. send data to master 33, see stm.txt for more details ..
 # .. wait for traces to pile up ..
-# .. and stop the trace:
+# .. and stop the trace::
 
-$ echo 0 > /sys/bus/intel_th/devices/0-msc0/active
+   $ echo 0 > /sys/bus/intel_th/devices/0-msc0/active
 
-# and now you can collect the trace from the device node:
+# and now you can collect the trace from the device node::
 
-$ cat /dev/intel_th0/msc0 > my_stp_trace
+   $ cat /dev/intel_th0/msc0 > my_stp_trace
 
 Host Debugger Mode
-==
+--
 
 It is possible to configure the Trace Hub and control its trace
 capture from a remote debug host, which should be connected via one of
-- 
2.7.4
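
The Quick example in the intel_th patch above is a fixed sequence of sysfs reads and writes. A minimal Python sketch that replays it; the helper names and the root parameter (which lets the sequence run against a scratch directory instead of /sys) are hypothetical, and the device names 0-msc0 and 0-gth and master 33 are simply the ones used in the example:

```python
from pathlib import Path

# Real location used in the example; pass another root to test against a scratch tree.
DEFAULT_ROOT = Path("/sys/bus/intel_th/devices")


def _write(path: Path, value: str) -> None:
    # Like "echo VALUE > path": one value per write, newline-terminated.
    path.write_text(value + "\n")


def start_msc_capture(master: int, root: Path = DEFAULT_ROOT,
                      msc: str = "0-msc0", gth: str = "0-gth",
                      windows=(64, 64), wrap: bool = True) -> int:
    """Route `master` to memory controller `msc` and start tracing.

    Returns the GTH output port that the memory controller is attached to.
    """
    msc_dir = root / msc
    port = int((msc_dir / "port").read_text().strip())  # which GTH port is the MSC
    _write(root / gth / "masters" / str(master), str(port))  # send master to that port
    _write(msc_dir / "mode", "multi")  # multiblock buffer
    _write(msc_dir / "nr_pages", ",".join(str(n) for n in windows))  # pages per window
    _write(msc_dir / "wrap", "1" if wrap else "0")
    _write(msc_dir / "active", "1")  # enable tracing into this port
    return port


def stop_msc_capture(root: Path = DEFAULT_ROOT, msc: str = "0-msc0") -> None:
    """Stop the trace; the data can then be copied from /dev/intel_th0/msc0."""
    _write(root / msc / "active", "0")
```
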

[PATCH 17/17] trace doc: convert trace/stm.txt to rst format

2018-02-16 Thread changbin . du

From: Changbin Du 

This converts the plain text documentation to reStructuredText format and
adds it to the Sphinx TOC tree. No essential content change.

Cc: Steven Rostedt 
Signed-off-by: Changbin Du 
---
 Documentation/trace/index.rst|  1 +
 Documentation/trace/{stm.txt => stm.rst} | 23 ---
 2 files changed, 13 insertions(+), 11 deletions(-)
 rename Documentation/trace/{stm.txt => stm.rst} (91%)

diff --git a/Documentation/trace/index.rst b/Documentation/trace/index.rst
index 02cc56c..b58c10b 100644
--- a/Documentation/trace/index.rst
+++ b/Documentation/trace/index.rst
@@ -20,3 +20,4 @@ Linux Tracing Technologies
mmiotrace
hwlat_detector
intel_th
+   stm
diff --git a/Documentation/trace/stm.txt b/Documentation/trace/stm.rst
similarity index 91%
rename from Documentation/trace/stm.txt
rename to Documentation/trace/stm.rst
index 0376575..2c22ddb 100644
--- a/Documentation/trace/stm.txt
+++ b/Documentation/trace/stm.rst
@@ -1,3 +1,4 @@
+===
 System Trace Module
 ===
 
@@ -32,14 +33,14 @@ associated with it, located in "stp-policy" subsystem directory in
 configfs. The topmost directory's name (the policy) is formatted as
 the STM device name to which this policy applies and and arbitrary
 string identifier separated by a stop. From the examle above, a rule
-may look like this:
+may look like this::
 
-$ ls /config/stp-policy/dummy_stm.my-policy/user
-channels masters
-$ cat /config/stp-policy/dummy_stm.my-policy/user/masters
-48 63
-$ cat /config/stp-policy/dummy_stm.my-policy/user/channels
-0 127
+   $ ls /config/stp-policy/dummy_stm.my-policy/user
+   channels masters
+   $ cat /config/stp-policy/dummy_stm.my-policy/user/masters
+   48 63
+   $ cat /config/stp-policy/dummy_stm.my-policy/user/channels
+   0 127
 
 which means that the master allocation pool for this rule consists of
 masters 48 through 63 and channel allocation pool has channels 0
@@ -78,9 +79,9 @@ stm_source
 For kernel-based trace sources, there is "stm_source" device
 class. Devices of this class can be connected and disconnected to/from
 stm devices at runtime via a sysfs attribute called "stm_source_link"
-by writing the name of the desired stm device there, for example:
+by writing the name of the desired stm device there, for example::
 
-$ echo dummy_stm.0 > /sys/class/stm_source/console/stm_source_link
+   $ echo dummy_stm.0 > /sys/class/stm_source/console/stm_source_link
 
 For examples on how to use stm_source interface in the kernel, refer
 to stm_console, stm_heartbeat or stm_ftrace drivers.
@@ -118,5 +119,5 @@ the same time.
 
 Currently only Ftrace "function" tracer is supported.
 
-[1] https://software.intel.com/sites/default/files/managed/d3/3c/intel-th-developer-manual.pdf
-[2] http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0444b/index.html
+* [1] https://software.intel.com/sites/default/files/managed/d3/3c/intel-th-developer-manual.pdf
+* [2] http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0444b/index.html
-- 
2.7.4
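
For illustration only: each policy rule's "masters" and "channels" files hold an inclusive "first last" pair. The rule above therefore covers masters 48 through 63 and channels 0 through 127. Below is a minimal user-space sketch that parses such a pair and checks an ID against it. The `stp_range` type and both helper names are hypothetical; they are not part of the STM or configfs API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Inclusive range as shown by a policy rule's "masters" or "channels"
 * file, e.g. "48 63". Hypothetical type, for illustration only. */
struct stp_range {
	unsigned int first;
	unsigned int last;
};

/* Parse "first last"; returns false on malformed or inverted input. */
static bool stp_range_parse(const char *text, struct stp_range *r)
{
	assert(text != NULL && r != NULL);
	if (sscanf(text, "%u %u", &r->first, &r->last) != 2)
		return false;
	return r->first <= r->last;
}

/* True if id falls inside the rule's allocation pool. */
static bool stp_range_contains(const struct stp_range *r, unsigned int id)
{
	return id >= r->first && id <= r->last;
}
```

A caller would read one line with fgets() from, e.g., /config/stp-policy/dummy_stm.my-policy/user/masters and pass it to stp_range_parse().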

[PATCH 14/17] trace doc: convert trace/mmiotrace.txt to rst format

2018-02-16 Thread changbin . du

From: Changbin Du 

This converts the plain text documentation to reStructuredText format and
adds it to the Sphinx TOC tree. No essential content change.

Cc: Steven Rostedt 
Signed-off-by: Changbin Du 
---
 Documentation/trace/index.rst  |  1 +
 .../trace/{mmiotrace.txt => mmiotrace.rst} | 86 +-
 2 files changed, 54 insertions(+), 33 deletions(-)
 rename Documentation/trace/{mmiotrace.txt => mmiotrace.rst} (78%)

diff --git a/Documentation/trace/index.rst b/Documentation/trace/index.rst
index 307468d..4b3d690 100644
--- a/Documentation/trace/index.rst
+++ b/Documentation/trace/index.rst
@@ -17,3 +17,4 @@ Linux Tracing Technologies
events-power
events-nmi
events-msr
+   mmiotrace
diff --git a/Documentation/trace/mmiotrace.txt b/Documentation/trace/mmiotrace.rst
similarity index 78%
rename from Documentation/trace/mmiotrace.txt
rename to Documentation/trace/mmiotrace.rst
index 664e738..5116e8c 100644
--- a/Documentation/trace/mmiotrace.txt
+++ b/Documentation/trace/mmiotrace.rst
@@ -1,4 +1,6 @@
-   In-kernel memory-mapped I/O tracing
+===================================
+In-kernel memory-mapped I/O tracing
+===================================
 
 
 Home page and links to optional user space tools:
@@ -31,30 +33,35 @@ is no way to automatically detect if you are losing events due to CPUs racing.
 
 Usage Quick Reference
 -
+::
 
-$ mount -t debugfs debugfs /sys/kernel/debug
-$ echo mmiotrace > /sys/kernel/debug/tracing/current_tracer
-$ cat /sys/kernel/debug/tracing/trace_pipe > mydump.txt &
-Start X or whatever.
-$ echo "X is up" > /sys/kernel/debug/tracing/trace_marker
-$ echo nop > /sys/kernel/debug/tracing/current_tracer
-Check for lost events.
+   $ mount -t debugfs debugfs /sys/kernel/debug
+   $ echo mmiotrace > /sys/kernel/debug/tracing/current_tracer
+   $ cat /sys/kernel/debug/tracing/trace_pipe > mydump.txt &
+   Start X or whatever.
+   $ echo "X is up" > /sys/kernel/debug/tracing/trace_marker
+   $ echo nop > /sys/kernel/debug/tracing/current_tracer
+   Check for lost events.
 
 
 Usage
 -
 
 Make sure debugfs is mounted to /sys/kernel/debug.
-If not (requires root privileges):
-$ mount -t debugfs debugfs /sys/kernel/debug
+If not (requires root privileges)::
+
+   $ mount -t debugfs debugfs /sys/kernel/debug
 
 Check that the driver you are about to trace is not loaded.
 
-Activate mmiotrace (requires root privileges):
-$ echo mmiotrace > /sys/kernel/debug/tracing/current_tracer
+Activate mmiotrace (requires root privileges)::
+
+   $ echo mmiotrace > /sys/kernel/debug/tracing/current_tracer
+
+Start storing the trace::
+
+   $ cat /sys/kernel/debug/tracing/trace_pipe > mydump.txt &
 
-Start storing the trace:
-$ cat /sys/kernel/debug/tracing/trace_pipe > mydump.txt &
 The 'cat' process should stay running (sleeping) in the background.
 
 Load the driver you want to trace and use it. Mmiotrace will only catch MMIO
@@ -66,30 +73,42 @@ This makes it easier to see which part of the (huge) trace corresponds to
 which action. It is recommended to place descriptive markers about what you
 do.
 
-Shut down mmiotrace (requires root privileges):
-$ echo nop > /sys/kernel/debug/tracing/current_tracer
+Shut down mmiotrace (requires root privileges)::
+
+   $ echo nop > /sys/kernel/debug/tracing/current_tracer
+
 The 'cat' process exits. If it does not, kill it by issuing 'fg' command and
 pressing ctrl+c.
 
-Check that mmiotrace did not lose events due to a buffer filling up. Either
-$ grep -i lost mydump.txt
-which tells you exactly how many events were lost, or use
-$ dmesg
+Check that mmiotrace did not lose events due to a buffer filling up. Either::
+
+   $ grep -i lost mydump.txt
+
+which tells you exactly how many events were lost, or use::
+
+   $ dmesg
+
 to view your kernel log and look for "mmiotrace has lost events" warning. If
 events were lost, the trace is incomplete. You should enlarge the buffers and
 try again. Buffers are enlarged by first seeing how large the current buffers
-are:
-$ cat /sys/kernel/debug/tracing/buffer_size_kb
+are::
+
+   $ cat /sys/kernel/debug/tracing/buffer_size_kb
+
 gives you a number. Approximately double this number and write it back, for
-instance:
-$ echo 128000 > /sys/kernel/debug/tracing/buffer_size_kb
+instance::
+
+   $ echo 128000 > /sys/kernel/debug/tracing/buffer_size_kb
+
 Then start again from the top.
 
 If you are doing a trace for a driver project, e.g. Nouveau, you should also
-do the following before sending your results:
-$ lspci -vvv > lspci.txt
-$ dmesg > dmesg.txt
-$ tar zcf pciid-nick-mmiotrace.tar.gz mydump.txt lspci.txt dmesg.txt
+do the following before sending your results::
+
+   $ lspci -vvv > lspci.txt
+   $ dmesg > dmesg.txt
+   $ tar zcf pciid-nick-mmiotrace.tar.gz mydump.txt lspci.txt dmesg.txt
+
 and then send the .tar.gz file. The trace compresses considerably. Re
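
The "approximately double buffer_size_kb and write it back" step can be scripted. The following is a minimal sketch that assumes the leading integer in the file is the per-CPU buffer size in KB. The helper name double_buffer_size() is hypothetical. It takes the path as a parameter, so it can be pointed at /sys/kernel/debug/tracing/buffer_size_kb (as root) or at a scratch file. Input that does not start with a number is rejected rather than guessed at.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Read the current buffer size (KB) from `path`, write back twice that
 * value, and report the new size through *new_kb.
 * Returns 0 on success, -1 on any error (nothing is written on parse errors).
 */
static int double_buffer_size(const char *path, unsigned long *new_kb)
{
	char line[128];
	char *end;
	unsigned long kb;
	FILE *f;

	assert(path != NULL && new_kb != NULL);

	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(line, sizeof(line), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	/* Only the leading integer is used; trailing text is ignored. */
	errno = 0;
	kb = strtoul(line, &end, 10);
	if (errno != 0 || end == line || kb == 0)
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fprintf(f, "%lu\n", kb * 2) < 0) {
		fclose(f);
		return -1;
	}
	if (fclose(f) != 0)
		return -1;

	*new_kb = kb * 2;
	return 0;
}
```

After a successful call, start again from the top of the procedure, as the text says.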

[PATCH 08/17] trace doc: convert trace/tracepoints.txt to rst format

2018-02-16 Thread changbin . du

From: Changbin Du 

This converts the plain text documentation to reStructuredText format and
adds it to the Sphinx TOC tree. No essential content change.

Cc: Steven Rostedt 
Signed-off-by: Changbin Du 
---
 Documentation/trace/index.rst  |  1 +
 .../trace/{tracepoints.txt => tracepoints.rst} | 77 +++---
 2 files changed, 41 insertions(+), 37 deletions(-)
 rename Documentation/trace/{tracepoints.txt => tracepoints.rst} (74%)

diff --git a/Documentation/trace/index.rst b/Documentation/trace/index.rst
index 353fb8a..c8bbdfc 100644
--- a/Documentation/trace/index.rst
+++ b/Documentation/trace/index.rst
@@ -11,3 +11,4 @@ Linux Tracing Technologies
ftrace-uses
kprobetrace
uprobetracer
+   tracepoints
diff --git a/Documentation/trace/tracepoints.txt b/Documentation/trace/tracepoints.rst
similarity index 74%
rename from Documentation/trace/tracepoints.txt
rename to Documentation/trace/tracepoints.rst
index a3efac6..6e3ce3b 100644
--- a/Documentation/trace/tracepoints.txt
+++ b/Documentation/trace/tracepoints.rst
@@ -1,6 +1,8 @@
-Using the Linux Kernel Tracepoints
+==================================
+Using the Linux Kernel Tracepoints
+==================================
 
-   Mathieu Desnoyers
+:Author: Mathieu Desnoyers
 
 
 This document introduces Linux Kernel Tracepoints and their use. It
@@ -9,8 +11,8 @@ connect probe functions to them and provides some examples of probe
 functions.
 
 
-* Purpose of tracepoints
-
+Purpose of tracepoints
+----------------------
 A tracepoint placed in code provides a hook to call a function (probe)
 that you can provide at runtime. A tracepoint can be "on" (a probe is
 connected to it) or "off" (no probe is attached). When a tracepoint is
@@ -31,8 +33,8 @@ header file.
 They can be used for tracing and performance accounting.
 
 
-* Usage
-
+Usage
+-----
 Two elements are required for tracepoints :
 
 - A tracepoint definition, placed in a header file.
@@ -40,52 +42,53 @@ Two elements are required for tracepoints :
 
 In order to use tracepoints, you should include linux/tracepoint.h.
 
-In include/trace/events/subsys.h :
+In include/trace/events/subsys.h::
 
-#undef TRACE_SYSTEM
-#define TRACE_SYSTEM subsys
+   #undef TRACE_SYSTEM
+   #define TRACE_SYSTEM subsys
 
-#if !defined(_TRACE_SUBSYS_H) || defined(TRACE_HEADER_MULTI_READ)
-#define _TRACE_SUBSYS_H
+   #if !defined(_TRACE_SUBSYS_H) || defined(TRACE_HEADER_MULTI_READ)
+   #define _TRACE_SUBSYS_H
 
-#include 
+   #include 
 
-DECLARE_TRACE(subsys_eventname,
-   TP_PROTO(int firstarg, struct task_struct *p),
-   TP_ARGS(firstarg, p));
+   DECLARE_TRACE(subsys_eventname,
+   TP_PROTO(int firstarg, struct task_struct *p),
+   TP_ARGS(firstarg, p));
 
-#endif /* _TRACE_SUBSYS_H */
+   #endif /* _TRACE_SUBSYS_H */
 
-/* This part must be outside protection */
-#include 
+   /* This part must be outside protection */
+   #include 
 
-In subsys/file.c (where the tracing statement must be added) :
+In subsys/file.c (where the tracing statement must be added)::
 
-#include 
+   #include 
 
-#define CREATE_TRACE_POINTS
-DEFINE_TRACE(subsys_eventname);
+   #define CREATE_TRACE_POINTS
+   DEFINE_TRACE(subsys_eventname);
 
-void somefct(void)
-{
-   ...
-   trace_subsys_eventname(arg, task);
-   ...
-}
+   void somefct(void)
+   {
+   ...
+   trace_subsys_eventname(arg, task);
+   ...
+   }
 
 Where :
-- subsys_eventname is an identifier unique to your event
+  - subsys_eventname is an identifier unique to your event
+
 - subsys is the name of your subsystem.
 - eventname is the name of the event to trace.
 
-- TP_PROTO(int firstarg, struct task_struct *p) is the prototype of the
-  function called by this tracepoint.
+  - `TP_PROTO(int firstarg, struct task_struct *p)` is the prototype of the
+function called by this tracepoint.
 
-- TP_ARGS(firstarg, p) are the parameters names, same as found in the
-  prototype.
+  - `TP_ARGS(firstarg, p)` are the parameters names, same as found in the
+prototype.
 
-- if you use the header in multiple source files, #define CREATE_TRACE_POINTS
-  should appear only in one source file.
+  - if you use the header in multiple source files, `#define CREATE_TRACE_POINTS`
+should appear only in one source file.
 
 Connecting a function (probe) to a tracepoint is done by providing a
 probe (function to call) for the specific tracepoint through
@@ -117,7 +120,7 @@ used to export the defined tracepoints.
 
 If you need to do a bit of work for a tracepoint parameter, and
 that work is only used for the tracepoint, that work can be encapsulated
-within an if statement with the following:
+within an if statement with the following::
 
if (trace_foo_bar_enabled()) {
int i;
@@ -139,7 +142,7 @@ The advantage of using the trace__enabled(
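
To make the "on" (probe connected) and "off" (no probe) idea concrete, here is a user-space analogue with a single probe slot. This is not how the kernel implements tracepoints. The real ones use static keys and an RCU-protected probe array generated by the tracepoint macros. The names here only loosely mirror the generated trace_/register_trace_ helpers, and struct task is a hypothetical stand-in for struct task_struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct task;	/* stand-in for struct task_struct */

/* Probes receive private data first, then the TP_PROTO arguments. */
typedef void (*subsys_eventname_probe_t)(void *data, int firstarg,
					 struct task *p);

struct tracepoint_sketch {
	subsys_eventname_probe_t probe;	/* NULL means the tracepoint is "off" */
	void *data;
};

static struct tracepoint_sketch tp_subsys_eventname;

static bool trace_subsys_eventname_enabled(void)
{
	return tp_subsys_eventname.probe != NULL;
}

/* Single-slot registration; the kernel supports multiple probes. */
static int register_trace_subsys_eventname(subsys_eventname_probe_t probe,
					   void *data)
{
	assert(probe != NULL);
	if (tp_subsys_eventname.probe)
		return -1;
	tp_subsys_eventname.probe = probe;
	tp_subsys_eventname.data = data;
	return 0;
}

static void unregister_trace_subsys_eventname(void)
{
	tp_subsys_eventname.probe = NULL;
	tp_subsys_eventname.data = NULL;
}

/* Call site: only a check when off, a probe call when on. */
static void trace_subsys_eventname(int firstarg, struct task *p)
{
	if (trace_subsys_eventname_enabled())
		tp_subsys_eventname.probe(tp_subsys_eventname.data, firstarg, p);
}

/* Example probe: counts hits and remembers the last argument. */
struct probe_stats {
	int hits;
	int last_arg;
};

static void count_probe(void *data, int firstarg, struct task *p)
{
	struct probe_stats *s = data;

	(void)p;
	s->hits++;
	s->last_arg = firstarg;
}
```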

[PATCH 15/17] trace doc: convert trace/hwlat_detector.txt to rst format

2018-02-16 Thread changbin . du

From: Changbin Du 

This converts the plain text documentation to reStructuredText format and
adds it to the Sphinx TOC tree. No essential content change.

Cc: Steven Rostedt 
Signed-off-by: Changbin Du 
---
 .../{hwlat_detector.txt => hwlat_detector.rst} | 26 +-
 Documentation/trace/index.rst  |  1 +
 2 files changed, 16 insertions(+), 11 deletions(-)
 rename Documentation/trace/{hwlat_detector.txt => hwlat_detector.rst} (83%)

diff --git a/Documentation/trace/hwlat_detector.txt b/Documentation/trace/hwlat_detector.rst
similarity index 83%
rename from Documentation/trace/hwlat_detector.txt
rename to Documentation/trace/hwlat_detector.rst
index 3207717..5739349 100644
--- a/Documentation/trace/hwlat_detector.txt
+++ b/Documentation/trace/hwlat_detector.rst
@@ -1,4 +1,8 @@
-Introduction:
+=========================
+Hardware Latency Detector
+=========================
+
+Introduction
 -
 
 The tracer hwlat_detector is a special purpose tracer that is used to
@@ -28,7 +32,7 @@ Note that the hwlat detector should *NEVER* be used in a production environment.
 It is intended to be run manually to determine if the hardware platform has a
 problem with long system firmware service routines.
 
-Usage:
+Usage
 --
 
 Write the ASCII text "hwlat" into the current_tracer file of the tracing system
@@ -36,16 +40,16 @@ Write the ASCII text "hwlat" into the current_tracer file of the tracing system
 redefine the threshold in microseconds (us) above which latency spikes will
 be taken into account.
 
-Example:
+Example::
 
# echo hwlat > /sys/kernel/tracing/current_tracer
# echo 100 > /sys/kernel/tracing/tracing_thresh
 
 The /sys/kernel/tracing/hwlat_detector interface contains the following files:
 
-width  - time period to sample with CPUs held (usecs)
- must be less than the total window size (enforced)
-window - total period of sampling, width being inside (usecs)
+  - width - time period to sample with CPUs held (usecs)
+must be less than the total window size (enforced)
+  - window - total period of sampling, width being inside (usecs)
 
 By default the width is set to 500,000 and window to 1,000,000, meaning that
 for every 1,000,000 usecs (1s) the hwlat detector will spin for 500,000 usecs
@@ -67,11 +71,11 @@ The following tracing directory files are used by the hwlat_detector:
 
 in /sys/kernel/tracing:
 
- tracing_threshold - minimum latency value to be considered (usecs)
- tracing_max_latency   - maximum hardware latency actually observed (usecs)
- tracing_cpumask   - the CPUs to move the hwlat thread across
- hwlat_detector/width  - specified amount of time to spin within window (usecs)
- hwlat_detector/window - amount of time between (width) runs (usecs)
+ - tracing_threshold   - minimum latency value to be considered (usecs)
+ - tracing_max_latency - maximum hardware latency actually observed (usecs)
+ - tracing_cpumask - the CPUs to move the hwlat thread across
+ - hwlat_detector/width- specified amount of time to spin within window (usecs)
+ - hwlat_detector/window   - amount of time between (width) runs (usecs)
 
 The hwlat detector's kernel thread will migrate across each CPU specified in
 tracing_cpumask between each window. To limit the migration, either modify
diff --git a/Documentation/trace/index.rst b/Documentation/trace/index.rst
index 4b3d690..eabbbaf 100644
--- a/Documentation/trace/index.rst
+++ b/Documentation/trace/index.rst
@@ -18,3 +18,4 @@ Linux Tracing Technologies
events-nmi
events-msr
mmiotrace
+   hwlat_detector
-- 
2.7.4
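
Here is a rough user-space illustration of the width/window idea. It spins for `width` microseconds, reading a clock back to back, and keeps the largest gap between consecutive reads. The in-kernel detector samples with interrupts disabled and reports gaps above tracing_thresh. User space cannot disable interrupts, so this sketch also sees scheduler and interrupt noise. The helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Mirror the documented rule: width must be less than window. */
static bool hwlat_params_valid(uint64_t width_us, uint64_t window_us)
{
	return width_us > 0 && width_us < window_us;
}

/*
 * Spin for width_us, reading the clock back to back, and return the
 * largest gap (in microseconds) seen between two consecutive reads.
 */
static uint64_t hwlat_sample(uint64_t width_us)
{
	uint64_t start = now_ns(), last = start, max_gap = 0;
	uint64_t width_ns = width_us * 1000;

	for (;;) {
		uint64_t t = now_ns();

		if (t - last > max_gap)
			max_gap = t - last;
		last = t;
		if (t - start >= width_ns)
			break;
	}
	return max_gap / 1000;
}
```

With the defaults, hwlat_sample(500000) would spin for half of each 1,000,000 us window. A result above a tracing_thresh of 100 would be the kind of spike the tracer records.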

[PATCH 01/17] Documentation: add Linux tracing to Sphinx TOC tree

2018-02-16 Thread changbin . du

From: Changbin Du 

This just adds an index.rst for the trace subsystem. More docs will
be added later.

Cc: Steven Rostedt 
Signed-off-by: Changbin Du 
---
 Documentation/index.rst   | 1 +
 Documentation/trace/index.rst | 6 ++
 2 files changed, 7 insertions(+)
 create mode 100644 Documentation/trace/index.rst

diff --git a/Documentation/index.rst b/Documentation/index.rst
index ef5080c..3b99ab9 100644
--- a/Documentation/index.rst
+++ b/Documentation/index.rst
@@ -64,6 +64,7 @@ merged much easier.
dev-tools/index
doc-guide/index
kernel-hacking/index
+   trace/index
maintainer/index
 
 Kernel API documentation
diff --git a/Documentation/trace/index.rst b/Documentation/trace/index.rst
new file mode 100644
index 000..d986ead
--- /dev/null
+++ b/Documentation/trace/index.rst
@@ -0,0 +1,6 @@
+==========================
+Linux Tracing Technologies
+==========================
+
+.. toctree::
+   :maxdepth: 2
-- 
2.7.4

[PATCH 04/17] trace doc: convert trace/tracepoint-analysis.txt to rst format

2018-02-16 Thread changbin . du

From: Changbin Du 

This converts the plain text documentation to reStructuredText format and
adds it to the Sphinx TOC tree. No essential content change.

Cc: Steven Rostedt 
Signed-off-by: Changbin Du 
---
 Documentation/trace/index.rst  |  1 +
 ...epoint-analysis.txt => tracepoint-analysis.rst} | 41 ++
 2 files changed, 27 insertions(+), 15 deletions(-)
 rename Documentation/trace/{tracepoint-analysis.txt => tracepoint-analysis.rst} (93%)

diff --git a/Documentation/trace/index.rst b/Documentation/trace/index.rst
index aa2baad..61b5551 100644
--- a/Documentation/trace/index.rst
+++ b/Documentation/trace/index.rst
@@ -6,4 +6,5 @@ Linux Tracing Technologies
:maxdepth: 2
 
ftrace-design
+   tracepoint-analysis
ftrace-uses
diff --git a/Documentation/trace/tracepoint-analysis.txt b/Documentation/trace/tracepoint-analysis.rst
similarity index 93%
rename from Documentation/trace/tracepoint-analysis.txt
rename to Documentation/trace/tracepoint-analysis.rst
index 058cc6c..a4d3ff2 100644
--- a/Documentation/trace/tracepoint-analysis.txt
+++ b/Documentation/trace/tracepoint-analysis.rst
@@ -1,7 +1,7 @@
-   Notes on Analysing Behaviour Using Events and Tracepoints
-
-   Documentation written by Mel Gorman
-   PCL information heavily based on email from Ingo Molnar
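+=========================================================
+Notes on Analysing Behaviour Using Events and Tracepoints
+=========================================================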
+:Author: Mel Gorman (PCL information heavily based on email from Ingo Molnar)
 
 1. Introduction
 ===
@@ -27,18 +27,18 @@ assumed that the PCL tool tools/perf has been installed and is in your path.
 --
 
 All possible events are visible from /sys/kernel/debug/tracing/events. Simply
-calling
+calling::
 
   $ find /sys/kernel/debug/tracing/events -type d
 
 will give a fair indication of the number of events available.
 
 2.2 PCL (Performance Counters for Linux)

+
 
 Discovery and enumeration of all counters and events, including tracepoints,
 are available with the perf tool. Getting a list of available events is a
-simple case of:
+simple case of::
 
   $ perf list 2>&1 | grep Tracepoint
   ext4:ext4_free_inode [Tracepoint event]
@@ -57,7 +57,7 @@ simple case of:
 
 See Documentation/trace/events.txt for a proper description on how events
 can be enabled system-wide. A short example of enabling all events related
-to page allocation would look something like:
+to page allocation would look something like::
 
   $ for i in `find /sys/kernel/debug/tracing/events -name "enable" | grep mm_`; do echo 1 > $i; done
 
@@ -67,6 +67,7 @@ to page allocation would look something like:
 In SystemTap, tracepoints are accessible using the kernel.trace() function
 call. The following is an example that reports every 5 seconds what processes
 were allocating the pages.
+::
 
   global page_allocs
 
@@ -91,6 +92,7 @@ were allocating the pages.
 
 By specifying the -a switch and analysing sleep, the system-wide events
 for a duration of time can be examined.
+::
 
  $ perf stat -a \
-e kmem:mm_page_alloc -e kmem:mm_page_free \
@@ -118,6 +120,7 @@ basis using set_ftrace_pid.
 
 Events can be activated and tracked for the duration of a process on a local
 basis using PCL such as follows.
+::
 
   $ perf stat -e kmem:mm_page_alloc -e kmem:mm_page_free \
 -e kmem:mm_page_free_batched ./hackbench 10
@@ -145,6 +148,7 @@ Any workload can exhibit variances between runs and it can be important
 to know what the standard deviation is. By and large, this is left to the
 performance analyst to do it by hand. In the event that the discrete event
 occurrences are useful to the performance analyst, then perf can be used.
+::
 
   $ perf stat --repeat 5 -e kmem:mm_page_alloc -e kmem:mm_page_free
-e kmem:mm_page_free_batched ./hackbench 10
@@ -167,6 +171,7 @@ aggregation of discrete events, then a script would need to be developed.
 
 Using --repeat, it is also possible to view how events are fluctuating over
 time on a system-wide basis using -a and sleep.
+::
 
   $ perf stat -e kmem:mm_page_alloc -e kmem:mm_page_free \
-e kmem:mm_page_free_batched \
@@ -188,9 +193,9 @@ When events are enabled the events that are triggering can be read from
 options exist as well. By post-processing the output, further information can
 be gathered on-line as appropriate. Examples of post-processing might include
 
-  o Reading information from /proc for the PID that triggered the event
-  o Deriving a higher-level event from a series of lower-level events.
-  o Calculating latencies between two events
+  - Reading information from /proc for the PID that triggered the event
+  - Deriving a higher-level event from a series of lower-level events.
+  - Calculating latencie
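
As an example of the post-processing described above (counting events and computing the latency between two events), here is a small sketch that parses ftrace text output. It assumes the timestamp is immediately followed by ": event_name:". It also assumes the first ": " in a line ends the timestamp, which means the task name must not contain ": ". Both are assumptions about the input, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Parse "... 5678.123456: mm_page_alloc: ..." into timestamp and event. */
static bool parse_trace_line(const char *line, double *ts,
			     char *event, size_t event_len)
{
	const char *colon = strstr(line, ": ");
	const char *p, *name, *name_end;
	char *end;
	size_t len;

	if (!colon || event_len == 0)
		return false;

	/* Walk back from ": " to the start of the timestamp token. */
	p = colon;
	while (p > line && p[-1] != ' ')
		p--;
	if (p == colon)
		return false;
	*ts = strtod(p, &end);
	if (end != colon)
		return false;

	/* The event name runs from after ": " up to the next ':'. */
	name = colon + 2;
	name_end = strchr(name, ':');
	if (!name_end)
		return false;
	len = (size_t)(name_end - name);
	if (len == 0 || len >= event_len)
		return false;
	memcpy(event, name, len);
	event[len] = '\0';
	return true;
}

/* Number of lines whose event name equals `event`. */
static unsigned int count_event(const char *const *lines, size_t n,
				const char *event)
{
	unsigned int hits = 0;
	char name[64];
	double ts;

	for (size_t i = 0; i < n; i++)
		if (parse_trace_line(lines[i], &ts, name, sizeof(name)) &&
		    strcmp(name, event) == 0)
			hits++;
	return hits;
}

/* Seconds from the first `from` event to the next `to` event, or -1. */
static double first_latency(const char *const *lines, size_t n,
			    const char *from, const char *to)
{
	char name[64];
	double ts, start = 0.0;
	bool seen_from = false;

	for (size_t i = 0; i < n; i++) {
		if (!parse_trace_line(lines[i], &ts, name, sizeof(name)))
			continue;
		if (!seen_from && strcmp(name, from) == 0) {
			start = ts;
			seen_from = true;
		} else if (seen_from && strcmp(name, to) == 0) {
			return ts - start;
		}
	}
	return -1.0;
}
```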

RE: [PATCH v1 0/4] platform/x86: mlx-platform: Add bus differed and auto detection functionalities

2018-02-16 Thread Vadim Pasternak



> -Original Message-
> From: Darren Hart [mailto:dvh...@infradead.org]
> Sent: Friday, February 16, 2018 3:33 AM
> To: Vadim Pasternak 
> Cc: andy.shevche...@gmail.com; gre...@linuxfoundation.org; platform-
> driver-...@vger.kernel.org; linux-kernel@vger.kernel.org; j...@resnulli.us;
> Michael Shych 
> Subject: Re: [PATCH v1 0/4] platform/x86: mlx-platform: Add bus differed and
> auto detection functionalities
> 
> On Tue, Feb 13, 2018 at 10:09:32PM +, Vadim Pasternak wrote:
> > This patchset:
> > - Adds define for the channels number for mux device.
> > - Adds differed bus functionality.
> > - Changes input for device create routine in mlxreg-hotplug driver.
> > - Adds physical bus number auto detection.
> 
> Hi Vadim,

Hi Darren,

> 
> We are now in the RC cycle for 4.16. Do you consider these changes to be
> "fixes" that need to be included in 4.16? It was difficult to tell from the
> commit messages how severe some of the possible issues were.

I actually intended these patches for -next, since 4.16 is
already closed.

> 
> I'm concerned about the level of testing seen by the previous patch series if
> we are getting these changes so soon after they were merged.
> Can you comment on why these are being discovered now - and how confident
> are we in the drivers with these changes applied?

These patches have been tested on all relevant platforms.
The issues were discovered some time ago, and I had planned to submit these
patches after the previous series.

Do you think it's better to consider this series as a bug fix, and target it to
4.16?

Thanks,
Vadim.

> 
> >
> > Vadim Pasternak (4):
> >   platform/x86: mlx-platform: Use define for the channel numbers
> >   platform/x86: mlx-platform: Add differed bus functionality
> >   platform/mellanox: mlxreg-hotplug: Change input for device create
> > routine
> >   platform/x86: mlx-platform: Add physical bus number auto detection
> >
> >  drivers/platform/mellanox/mlxreg-hotplug.c | 31 +-
> >  drivers/platform/x86/mlx-platform.c| 68 --
> >  include/linux/platform_data/mlxreg.h   |  4 ++
> >  3 files changed, 90 insertions(+), 13 deletions(-)
> >
> > --
> > 2.1.4
> >
> >
> 
> --
> Darren Hart
> VMware Open Source Technology Center

[PATCH] ARM: sunxi: Fix multi-cluster SMP support compilation in multi v6/v7 configs

2018-02-16 Thread Chen-Yu Tsai

Various parts of the assembly code used in the multi-cluster SMP support
require ARMv7-A. If the kernel config also has multi v6 support enabled,
Kbuild defaults to building for armv6k, which does not support some of
the instructions we use.

Configure the Makefile such that the multi-cluster SMP code is always
built for ARMv7-A. This is also what mach-exynos does for their MC-SMP
code.

Signed-off-by: Chen-Yu Tsai 
---

This addresses "[sunxi:sunxi/core-for-4.17 1/4] /tmp/ccSQM2rD.s:438:
Error: selected processor does not support `isb' in ARM mode"
reported by the kbuild test robot for arm-allmodconfig.

Should we apply it, or squash it in the original patch?

---
 arch/arm/mach-sunxi/Makefile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm/mach-sunxi/Makefile b/arch/arm/mach-sunxi/Makefile
index 3e741e959c7c..3c2c4384357a 100644
--- a/arch/arm/mach-sunxi/Makefile
+++ b/arch/arm/mach-sunxi/Makefile
@@ -1,3 +1,4 @@
 obj-$(CONFIG_ARCH_SUNXI) += sunxi.o
 obj-$(CONFIG_ARCH_SUNXI_MC_SMP) += mc_smp.o
+CFLAGS_mc_smp.o+= -march=armv7-a
 obj-$(CONFIG_SMP) += platsmp.o
-- 
2.16.1

Plan for DTC re-sync?

2018-02-16 Thread Masahiro Yamada

Hi Rob,


Do you have a plan to sync scripts/dtc/
with upstream?



I want the following commit from the upstream DTC project.

commit b260c4f610c004c6e9e36c5f7bbb58d23e605bf1
Author: Grant Likely 
Date:   Mon Nov 20 17:12:18 2017 +

Fix ambiguous grammar for devicetree rule




-- 
Best Regards
Masahiro Yamada

[PATCH] USB: chaoskey: Use kasprintf() over strcpy()/strcat()

2018-02-16 Thread Kees Cook

Instead of kmalloc() with manually calculated values followed by
multiple strcpy()/strcat() calls, just fold it all into a single
kasprintf() call.

Signed-off-by: Kees Cook 
---
 drivers/usb/misc/chaoskey.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/usb/misc/chaoskey.c b/drivers/usb/misc/chaoskey.c
index 716cb515523e..cf5828ce927a 100644
--- a/drivers/usb/misc/chaoskey.c
+++ b/drivers/usb/misc/chaoskey.c
@@ -168,14 +168,10 @@ static int chaoskey_probe(struct usb_interface *interface,
 */
 
if (udev->product && udev->serial) {
-   dev->name = kmalloc(strlen(udev->product) + 1 +
-   strlen(udev->serial) + 1, GFP_KERNEL);
+   dev->name = kasprintf(GFP_KERNEL, "%s-%s", udev->product,
+ udev->serial);
if (dev->name == NULL)
goto out;
-
-   strcpy(dev->name, udev->product);
-   strcat(dev->name, "-");
-   strcat(dev->name, udev->serial);
}
 
dev->interface = interface;
-- 
2.7.4


-- 
Kees Cook
Pixel Security
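
kasprintf() formats into a buffer sized exactly for the result. Below is a user-space approximation of that measure-then-fill approach using vsnprintf(). xasprintf() is a hypothetical name, and malloc() stands in for the kernel allocator, so there is no gfp_t argument.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Format into a freshly allocated, exactly sized buffer; NULL on failure. */
static char *xasprintf(const char *fmt, ...)
{
	va_list ap;
	int len, second;
	char *buf;

	assert(fmt != NULL);

	va_start(ap, fmt);
	len = vsnprintf(NULL, 0, fmt, ap);	/* first pass: measure */
	va_end(ap);
	if (len < 0)
		return NULL;

	buf = malloc((size_t)len + 1);
	if (!buf)
		return NULL;

	va_start(ap, fmt);
	second = vsnprintf(buf, (size_t)len + 1, fmt, ap);	/* second pass: fill */
	va_end(ap);
	assert(second == len);	/* both passes must agree */
	(void)second;
	return buf;
}
```

A call like xasprintf("%s-%s", product, serial) then replaces the kmalloc()/strcpy()/strcat() sequence, and NULL still signals allocation failure.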

Re: [PATCH 08/23] kconfig: add 'macro' keyword to support user-defined function

2018-02-16 Thread Nicolas Pitre

On Sat, 17 Feb 2018, Ulf Magnusson wrote:

> On Sat, Feb 17, 2018 at 3:30 AM, Nicolas Pitre  wrote:
> > On Sat, 17 Feb 2018, Ulf Magnusson wrote:
> >
> >> On Fri, Feb 16, 2018 at 02:49:31PM -0500, Nicolas Pitre wrote:
> >> > On Sat, 17 Feb 2018, Masahiro Yamada wrote:
> >> >
> >> > > Now, we got a basic ability to test compiler capability in Kconfig.
> >> > >
> >> > > config CC_HAS_STACKPROTECTOR
> >> > > bool
> >> > > default $(shell $CC -Werror -fstack-protector -c -x c /dev/null -o /dev/null)
> >> > >
> >> > > This works, but it is ugly to repeat this long boilerplate.
> >> > >
> >> > > We want to describe like this:
> >> > >
> >> > > config CC_HAS_STACKPROTECTOR
> >> > > bool
> >> > > default $(cc-option -fstack-protector)
> >> > >
> >> > > It is straight-forward to implement a new function, but I do not like
> >> > > to hard-code specialized functions like this.  Hence, here is another
> >> > > feature to add functions from Kconfig files.
> >> > >
> >> > > A user-defined function can be defined as a string type symbol with
> >> > > a special keyword 'macro'.  It can be referenced in the same way as
> >> > > built-in functions.  This feature was also inspired by Makefile where
> >> > > user-defined functions are referenced by $(call func-name, args...),
> >> > > but I omitted the 'call' to make it shorter.
> >> > >
> >> > > The macro definition can contain $(1), $(2), ... which will be replaced
> >> > > with arguments from the caller.
> >> > >
> >> > > Example code:
> >> > >
> >> > >   config cc-option
> >> > >   string
> >> > >   macro $(shell $CC -Werror $(1) -c -x c /dev/null -o /dev/null)
> >> >
> >> > I think this syntax for defining a macro shouldn't start with the
> >> > "config" keyword, unless you want it to be part of the config symbol
> >> > space and land it in .config. And typing it as a "string" while it
> >> > actually returns y/n (hence a bool) is also strange.
> >> >
> >> > What about this instead:
> >> >
> >> > macro cc-option
> >> > bool $(shell $CC -Werror $(1) -c -x c /dev/null -o /dev/null)
> >> >
> >> > This makes it easier to extend as well if need be.
> >> >
> >> >
> >> > Nicolas
> >>
> >> I haven't gone over the patchset in detail yet and might be missing
> >> something here, but if this is just meant to be a textual shorthand,
> >> then why give it a type at all?
> >
> > It is meant to be like a user-defined function.
> >
> >> Do you think a simpler syntax like this would make sense?
> >>
> >>   macro cc-option "$(shell $CC -Werror $(1) -c -x c /dev/null -o /dev/null)"
> >>
> >> That's the most general version, where you could use it for other stuff
> >> besides $(shell ...) as well, just to keep parity.
> >
> > This is not extendable.  Let's imagine that you might want to implement
> > some kind of conditionals some day e.g.:
> >
> > macro complex_test
> > bool $(shell foo) if LOCKDEP_SUPPORT
> > bool y if DEBUG_DRIVER
> > bool n
> 
> I still don't quite get the semantics here. How would the behavior
> change if the type was changed to say string or int in some or all of
> the lines?

I admit it wouldn't make sense to have multiple different types. In
this example, the bool keyword acts as syntactic sugar more than
anything else.

> Since the current model is to evaluate $() while the Kconfig files are
> being parsed, would this require evaluating Kconfig expressions during
> parsing? There is a relatively clean and (somewhat) easy to understand
> parsing/evaluation separation at the moment, which I like.

Agreed. Let's forget about the conditionals then.


Nicolas
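
The "$(1), $(2), ... replaced with arguments from the caller" step described above is plain textual substitution. Below is a minimal sketch of it. It assumes single-digit argument numbers and that references past the supplied arguments expand to nothing, as with Make's $(call). This is not the patchset's code, and expand_macro() is a hypothetical name. Other $(...) references such as $(shell ...) are copied through untouched for a later expansion pass.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy `body` into `out`, replacing each "$(N)" (N = 1..9) with args[N-1].
 * Returns false if `out` is too small for the result.
 */
static bool expand_macro(const char *body, const char *const *args,
			 size_t nargs, char *out, size_t outlen)
{
	size_t used = 0;

	assert(body != NULL && out != NULL && outlen > 0);

	while (*body) {
		const char *piece = body;	/* default: copy one char */
		size_t len = 1;

		if (body[0] == '$' && body[1] == '(' &&
		    body[2] >= '1' && body[2] <= '9' && body[3] == ')') {
			size_t idx = (size_t)(body[2] - '1');

			piece = idx < nargs ? args[idx] : "";
			len = strlen(piece);
			body += 4;
		} else {
			body++;
		}
		if (used + len >= outlen)
			return false;
		memcpy(out + used, piece, len);
		used += len;
	}
	out[used] = '\0';
	return true;
}
```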

Re: [PATCH v2 0/6] crypto: engine - Permit to enqueue all async requests

2018-02-16 Thread Herbert Xu

On Fri, Feb 16, 2018 at 04:36:56PM +0100, Corentin Labbe wrote:
>
> As mentioned in the cover letter, all patches (except the documentation one)
> should be squashed.
> A kbuild robot reported a build error on cryptodev due to this.

It's too late now.  In future if you want the patches to be squashed
then please send them in one email.

Thanks,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Re: [PATCH 08/23] kconfig: add 'macro' keyword to support user-defined function

2018-02-16 Thread Ulf Magnusson

On Sat, Feb 17, 2018 at 3:30 AM, Nicolas Pitre  wrote:
> On Sat, 17 Feb 2018, Ulf Magnusson wrote:
>
>> On Fri, Feb 16, 2018 at 02:49:31PM -0500, Nicolas Pitre wrote:
>> > On Sat, 17 Feb 2018, Masahiro Yamada wrote:
>> >
>> > > Now, we got a basic ability to test compiler capability in Kconfig.
>> > >
>> > > config CC_HAS_STACKPROTECTOR
>> > > bool
>> > > default $(shell $CC -Werror -fstack-protector -c -x c /dev/null -o /dev/null)
>> > >
>> > > This works, but it is ugly to repeat this long boilerplate.
>> > >
>> > > We want to describe like this:
>> > >
>> > > config CC_HAS_STACKPROTECTOR
>> > > bool
>> > > default $(cc-option -fstack-protector)
>> > >
>> > > It is straight-forward to implement a new function, but I do not like
>> > > to hard-code specialized functions like this.  Hence, here is another
>> > > feature to add functions from Kconfig files.
>> > >
>> > > A user-defined function can be defined as a string type symbol with
>> > > a special keyword 'macro'.  It can be referenced in the same way as
>> > > built-in functions.  This feature was also inspired by Makefile where
>> > > user-defined functions are referenced by $(call func-name, args...),
>> > > but I omitted the 'call' to make it shorter.
>> > >
>> > > The macro definition can contain $(1), $(2), ... which will be replaced
>> > > with arguments from the caller.
>> > >
>> > > Example code:
>> > >
>> > >   config cc-option
>> > >   string
>> > >   macro $(shell $CC -Werror $(1) -c -x c /dev/null -o /dev/null)
>> >
>> > I think this syntax for defining a macro shouldn't start with the
>> > "config" keyword, unless you want it to be part of the config symbol
>> > space and land it in .config. And typing it as a "string" while it
>> > actually returns y/n (hence a bool) is also strange.
>> >
>> > What about this instead:
>> >
>> > macro cc-option
>> > bool $(shell $CC -Werror $(1) -c -x c /dev/null -o /dev/null)
>> >
>> > This makes it easier to extend as well if need be.
>> >
>> >
>> > Nicolas
>>
>> I haven't gone over the patchset in detail yet and might be missing
>> something here, but if this is just meant to be a textual shorthand,
>> then why give it a type at all?
>
> It is meant to be like a user-defined function.
>
>> Do you think a simpler syntax like this would make sense?
>>
> >>   macro cc-option "$(shell $CC -Werror $(1) -c -x c /dev/null -o /dev/null)"
>>
>> That's the most general version, where you could use it for other stuff
>> besides $(shell ...) as well, just to keep parity.
>
> This is not extendable.  Let's imagine that you might want to implement
> some kind of conditionals some day e.g.:
>
> macro complex_test
> bool $(shell foo) if LOCKDEP_SUPPORT
> bool y if DEBUG_DRIVER
> bool n

I still don't quite get the semantics here. How would the behavior
change if the type was changed to say string or int in some or all of
the lines?

Since the current model is to evaluate $() while the Kconfig files are
being parsed, would this require evaluating Kconfig expressions during
parsing? There is a relatively clean and (somewhat) easy to understand
parsing/evaluation separation at the moment, which I like.

Do you have anything in mind that would be cleaner and simpler to
implement in this way compared to using plain symbols?

>
> There is no real advantage to simplify the macro definition to its
> simplest expression, unlike its actual usage.

Maybe I'm being grumpy, but this feels like it's adding complexity
rather than reducing it.

I like the rest of this patchset, because the behavior is easy to
understand and fits well with Kconfig's evaluation model: $() is just
a kind of preprocessor that runs during parsing and does value
substitution based on shell commands, possibly along with some helper
macros to avoid repetition.
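As an illustration of that preprocessor model, here is a minimal C sketch of the textual substitution a user-defined macro needs: each $(1), $(2), ... in the body is replaced with the caller's arguments before any Kconfig expression is evaluated. The function name expand_args() is hypothetical and this is not the code from the patchset; it only handles single-digit argument references and leaves everything else, including $(shell ...), untouched for a later expansion step.

```c
#include <string.h>

/*
 * Hypothetical sketch, not the Kconfig implementation: copy 'body' into
 * 'out', replacing each "$(N)" (N = 1..9) with args[N-1]. References
 * past the supplied arguments expand to "", as with Make's $(call ...).
 * Returns the length written, or -1 if 'out' is too small.
 */
static int expand_args(const char *body, const char *const args[], int nargs,
		       char *out, size_t outsz)
{
	size_t o = 0;

	while (*body) {
		const char *piece;
		size_t len;

		if (body[0] == '$' && body[1] == '(' &&
		    body[2] >= '1' && body[2] <= '9' && body[3] == ')') {
			int n = body[2] - '0';

			piece = (n <= nargs) ? args[n - 1] : "";
			len = strlen(piece);
			body += 4;
		} else {
			/* Ordinary character, including "$(shell" and "$CC". */
			piece = body;
			len = 1;
			body++;
		}

		if (o + len >= outsz)	/* keep room for the NUL */
			return -1;
		memcpy(out + o, piece, len);
		o += len;
	}
	out[o] = '\0';
	return (int)o;
}
```

Because the substitution is purely textual, the result is just another string for the existing $(shell ...) step to evaluate, which keeps parsing and evaluation separate.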

I think we should think hard about whether we actually need anything
more than that before complicating Kconfig even further "just in
case." If the goal is simplification, then it's bad if we eventually
end up with a bigger mess than the Makefiles.

>
>> Are there any cases where something more advanced than that might be
>> warranted (e.g., macros that expand to complete expressions)?
>
> Maybe not now, but there is no need to close the door on the possibility
> either.
>
>
> Nicolas

Kconfig has no notion of types for expressions by the way. The
simplest way to look at it is that all symbols have a tristate value
(which is n for non-bool/tristate symbols) and a string value. Which
one gets used depends on the context. In A && B, the tristate values
are used, and in A = B the string values are compared.
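A simplified C model of that dual-value view, for illustration only: struct sym and the expr_*() helpers below are invented here rather than taken from the Kconfig sources. With n < m < y, && takes the minimum of the tristate values, while = compares the string values and yields y or n.

```c
#include <string.h>

/* Ordered so that && can be computed as a minimum. */
enum tristate { no = 0, mod = 1, yes = 2 };

/* Hypothetical symbol model: every symbol carries both values. */
struct sym {
	enum tristate tri;	/* n for non-bool/tristate symbols */
	const char *str;	/* string value, e.g. "y", "m", "foo bar" */
};

/* A && B: uses the tristate values. */
static enum tristate expr_and(const struct sym *a, const struct sym *b)
{
	return a->tri < b->tri ? a->tri : b->tri;
}

/* A = B: compares the string values. */
static enum tristate expr_eq(const struct sym *a, const struct sym *b)
{
	return strcmp(a->str, b->str) == 0 ? yes : no;
}
```

A constant such as "foo bar" fits the same model: its tristate value is n and its string value is the literal text.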

In something like 'default "foo bar"', "foo bar" is actually a
constant symbol. If we were to drop the straightforward preprocessor
model, then constant symbols would no longer necessarily be constant.
I have a feeling that that might make Kconfig's internals even
messier.

Constant (and undefined) symbols en

Re: [PATCH 2/3] Documentation: convert trace/ftrace-design.txt to rst format

2018-02-16 Thread Du, Changbin

On Fri, Feb 16, 2018 at 12:36:29PM -0500, Steven Rostedt wrote:
> On Fri, 16 Feb 2018 05:49:52 -0700
> Jonathan Corbet  wrote:
> 
> > On Thu, 15 Feb 2018 22:57:05 -0500
> > Steven Rostedt  wrote:
> > 
> > > This document is out of date, and I rather have it updated before we
> > > make it more "available" elsewhere.  
> > 
> > Imagine that, an out-of-date doc in the kernel :)
> > 
> > Seriously, though, I'd argue that (1) it's already highly available, and
> > (2) it's useful now.  And (3) who knows when that update will happen?
> > Unless we have reason to believe that a new version is waiting on the
> > wings, I don't really see why we would want to delay this work.
> 
> Actually, some of these documents I was thinking of labeling as
> "obsolete" or simply removing them. The ftrace-design one is about
> how to port ftrace to other architectures, and I already had to correct
> people that based their work on it.
> 
> Yeah, I really need to get some time to update them, but like everyone
> else, that's just the 90th thing I have to do.
> 
> -- Steve
Reading this doc, I think most of the information is still useful for understanding
the implementation. So how about just putting a caution at the beginning of the doc,
as below, before it gets updated?
http://docservice.askxiong.com/linux-kernel/trace/ftrace-design.html

Anyway, I just converted them all. I will send them out. Please comment if some
of them should be removed.

-- 
Thanks,
Changbin Du

Re: [PATCH] tools/memory-model: remove rb-dep, smp_read_barrier_depends, and lockless_dereference

2018-02-16 Thread Andrea Parri

On Fri, Feb 16, 2018 at 05:22:55PM -0500, Alan Stern wrote:
> Since commit 76ebbe78f739 ("locking/barriers: Add implicit
> smp_read_barrier_depends() to READ_ONCE()") was merged for the 4.15
> kernel, it has not been necessary to use smp_read_barrier_depends().
> Similarly, commit 59ecbbe7b31c ("locking/barriers: Kill
> lockless_dereference()") removed lockless_dereference() from the
> kernel.
> 
> Since these primitives are no longer part of the kernel, they do not
> belong in the Linux Kernel Memory Consistency Model.  This patch
> removes them, along with the internal rb-dep relation, and updates the
> relevant documentation.
> 
> Signed-off-by: Alan Stern 
> 
> ---

[...]


> Index: usb-4.x/tools/memory-model/linux-kernel.def
> ===
> --- usb-4.x/tools/memory-model.orig/linux-kernel.def
> +++ usb-4.x/tools/memory-model/linux-kernel.def
> @@ -13,14 +13,12 @@ WRITE_ONCE(X,V) { __store{once}(X,V); }
>  smp_store_release(X,V) { __store{release}(*X,V); }
>  smp_load_acquire(X) __load{acquire}(*X)
>  rcu_assign_pointer(X,V) { __store{release}(X,V); }
> -lockless_dereference(X) __load{lderef}(X)
>  rcu_dereference(X) __load{deref}(X)

^^^ __load{once}


>  
>  // Fences
>  smp_mb() { __fence{mb} ; }
>  smp_rmb() { __fence{rmb} ; }
>  smp_wmb() { __fence{wmb} ; }
> -smp_read_barrier_depends() { __fence{rb_dep}; }
>  smp_mb__before_atomic() { __fence{before-atomic} ; }
>  smp_mb__after_atomic() { __fence{after-atomic} ; }
>  smp_mb__after_spinlock() { __fence{after-spinlock} ; }
> Index: usb-4.x/tools/memory-model/Documentation/cheatsheet.txt
> ===
> --- usb-4.x/tools/memory-model.orig/Documentation/cheatsheet.txt
> +++ usb-4.x/tools/memory-model/Documentation/cheatsheet.txt
> @@ -6,8 +6,7 @@
>  Store, e.g., WRITE_ONCE()Y   
> Y
>  Load, e.g., READ_ONCE()  Y  Y
> Y
>  Unsuccessful RMW operation   Y  Y
> Y
> -smp_read_barrier_depends()  Y   Y   Y
> -*_dereference()  Y  Y   Y
> Y
> +rcu_dereference()Y  Y   Y
> Y
>  Successful *_acquire()   R   Y  Y   Y   YY   
> Y
>  Successful *_release() CY  YY W  
> Y
>  smp_rmb()   Y   RY  YR

Akira's observation about READ_ONCE extends to all (annotated) loads.  In
fact, it also applies to loads corresponding to unsuccessful RMW operations;
consider, for example, the following variation of MP+onceassign+derefonce:

C T

{
y=z;
z=0;
}

P0(int *x, int **y)
{
WRITE_ONCE(*x, 1);
smp_store_release(y, x);
}

P1(int **y, int *z)
{
int *r0;
int r1;

r0 = cmpxchg_relaxed(y, z, z);
r1 = READ_ONCE(*r0);
}

exists (1:r0=x /\ 1:r1=0)

The final state is allowed w/o the patch, and forbidden w/ the patch.

This also reminds me of

   5a8897cc7631fa544d079c443800f4420d1b173f
   ("locking/atomics/alpha: Add smp_read_barrier_depends() to _release()/_relaxed() atomics")

(that we probably want to mention in the commit message).

  Andrea



Re: [PATCH] aoe: document sysfs interface

2018-02-16 Thread Ed Cashin


On 02/16/2018 10:39 AM, Aishwarya Pant wrote:

Documentation has been compiled from git commit logs and descriptions in
Documentation/aoe/aoe.txt. This should be useful for scripting and
tracking changes in the ABI.

...

+What:  /sys/block/etherd*/netif

...

+Description:
+   (RO) The name of the network interface on the localhost through
+   which we are communicating with the remote AoE device.


I'd recommend changing that to "network interfaces".

Thanks!

--
  Ed

Re: [PATCH 08/23] kconfig: add 'macro' keyword to support user-defined function

2018-02-16 Thread Nicolas Pitre

On Sat, 17 Feb 2018, Ulf Magnusson wrote:

> On Fri, Feb 16, 2018 at 02:49:31PM -0500, Nicolas Pitre wrote:
> > On Sat, 17 Feb 2018, Masahiro Yamada wrote:
> > 
> > > Now, we got a basic ability to test compiler capability in Kconfig.
> > > 
> > > config CC_HAS_STACKPROTECTOR
> > > bool
> > > default $(shell $CC -Werror -fstack-protector -c -x c /dev/null -o /dev/null)
> > > 
> > > This works, but it is ugly to repeat this long boilerplate.
> > > 
> > > We want to describe like this:
> > > 
> > > config CC_HAS_STACKPROTECTOR
> > > bool
> > > default $(cc-option -fstack-protector)
> > > 
> > > It is straight-forward to implement a new function, but I do not like
> > > to hard-code specialized functions like this.  Hence, here is another
> > > feature to add functions from Kconfig files.
> > > 
> > > A user-defined function can be defined as a string type symbol with
> > > a special keyword 'macro'.  It can be referenced in the same way as
> > > built-in functions.  This feature was also inspired by Makefile where
> > > user-defined functions are referenced by $(call func-name, args...),
> > > but I omitted the 'call' to make it shorter.
> > > 
> > > The macro definition can contain $(1), $(2), ... which will be replaced
> > > with arguments from the caller.
> > > 
> > > Example code:
> > > 
> > >   config cc-option
> > >   string
> > >   macro $(shell $CC -Werror $(1) -c -x c /dev/null -o /dev/null)
> > 
> > I think this syntax for defining a macro shouldn't start with the 
> > "config" keyword, unless you want it to be part of the config symbol 
> > space and land it in .config. And typing it as a "string" while it 
> > actually returns y/n (hence a bool) is also strange.
> > 
> > What about this instead:
> > 
> > macro cc-option
> > bool $(shell $CC -Werror $(1) -c -x c /dev/null -o /dev/null)
> > 
> > This makes it easier to extend as well if need be.
> > 
> > 
> > Nicolas
> 
> I haven't gone over the patchset in detail yet and might be missing
> something here, but if this is just meant to be a textual shorthand,
> then why give it a type at all?

It is meant to be like a user-defined function.

> Do you think a simpler syntax like this would make sense?
> 
> >   macro cc-option "$(shell $CC -Werror $(1) -c -x c /dev/null -o /dev/null)"
> 
> That's the most general version, where you could use it for other stuff
> besides $(shell ...) as well, just to keep parity.

This is not extendable.  Let's imagine that you might want to implement 
some kind of conditionals some day e.g.:

macro complex_test
bool $(shell foo) if LOCKDEP_SUPPORT
bool y if DEBUG_DRIVER
bool n

There is no real advantage to simplify the macro definition to its 
simplest expression, unlike its actual usage.

> Are there any cases where something more advanced than that might be
> warranted (e.g., macros that expand to complete expressions)?

Maybe not now, but there is no need to close the door on the possibility 
either.


Nicolas

Re: [PATCH v3 08/11] watchdog/hpwdt: Programable Pretimeout NMI

2018-02-16 Thread Guenter Roeck


On 02/16/2018 05:56 PM, Jerry Hoemann wrote:

On Fri, Feb 16, 2018 at 03:55:06PM -0800, Guenter Roeck wrote:

On Fri, Feb 16, 2018 at 04:46:17PM -0700, Jerry Hoemann wrote:

On Fri, Feb 16, 2018 at 12:34:40PM -0800, Guenter Roeck wrote:

On Thu, Feb 15, 2018 at 04:43:57PM -0700, Jerry Hoemann wrote:


...


@@ -98,12 +106,21 @@ static int hpwdt_settimeout(struct watchdog_device *dev, unsigned int val)
  }
  
  #ifdef CONFIG_HPWDT_NMI_DECODING	/* { */

+static int hpwdt_set_pretimeout(struct watchdog_device *dev, unsigned int val)
+{
+   if (val && (val != PRETIMEOUT_SEC)) {


Unnecessary ( )



There are several things going on here. I'm not sure which one the above
comment refers to.


The "Unnecessary" refers to the ( ) around the second part of the expression
above. While there may be valid reasons to include extra ( ), I think we
can trust the C compiler to get it right here.


Okay, wasn't sure what you were getting at here.

I trust the C compiler, I don't trust humans.  In compound conditionals,
I'll add parens so that the meaning is clear.



... and I don't accept patches with excessive ( ) if I catch them, because
it confuses me since I start looking for a meaning that isn't there.




While a pretimeout NMI isn't required by the HW to be enabled, if enabled the
length of pretimeout is fixed by HW.

I didn't see anything in the API that would allow us to communicate this
"feature" to the user.  The timeout at least has both min_timeout and
max_timeout, but I didn't see anything similar for the pretimeout.  I also
don't think it's reasonable to fail here if the requested value is not 9, as
the user really has no way of knowing what the valid range of pretimeout
values is.  So I accept any non-zero value for the pretimeout, but then set
the pretimeout to 9.

But at the same time, I don't like to silently change a human request
without at least a warning.


Sorry, I lost you here.


I wasn't sure what you were objecting to.  I thought you might
not have understood why I was converting non-zero values of
"pretimeout" to 9.  I was trying to explain the reasoning.

A problem I see with the watchdog API is that users don't know
what range of values is acceptable for pretimeout.

For HPE ProLiant systems, one cannot just choose an arbitrary
value for pretimeout.

I don't see a reasonable way that a user can determine the valid range
for pretimeout for HPE systems given our hardware restrictions.



I fully understand this, and I did not object to it. Watchdog drivers may and
are expected to adjust timeouts and pretimeouts to match hardware constraints,
and user space programs are expected to check the actual values after setting
timeouts. That is nothing unexpected or special.







The actual timeout can be a value smaller than 9 seconds.
Minimum is 1 second. What happens if the user configures
a timeout of less than 9 seconds as well as a pretimeout ?
Will it fire immediately ?


The architecture is silent on this issue.  My experience with
this is that if timeout < 9 seconds, the NMI is not issued.
System resets when the timeout expires.  This could be implementation
dependent.

Note, this is not a new issue.


Bad argument.


I'm not sure exactly what your objections are.  I'll point out that:

1) hpwdt has been using pretimeout NMI for watchdog for > 10 years.
2) For 8 years, it's been possible to have a timeout < 9 seconds.
3) AFAIK this hasn't proven to be a big issue.
4) I have real questions as to how (or if) to address the issue.



and I don't see the problem with returning EINVAL, i.e., rejecting the
pretimeout, if the selected timeout value is <= 9 seconds.

So the review found a problem (or maybe inconsistency), and you
refuse to fix it because you don't see the fix as part of this
patch set, even though it would literally require one or two lines
of code. Hmm.


I am perfectly willing to discuss the problem, but I don't think
it is a requirement for this patch set.


Well, I think it is. But see below.






I thought about setting the min timeout to ten seconds to avoid this situation.


You could reject requests to set the pretimeout to a value <= the
timeout.


I think you miscommunicated here.


I think you understand what I mean: reject setting a pretimeout
if the timeout is set to be <= 9 seconds.


It is perfectly fine to have a timeout of 30 seconds with a pretimeout of 9 seconds.




I haven't dug into the various user level clients of watchdog so I'm not sure
what the impact of making this change would be to them.
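The check Guenter proposes can be sketched as follows, assuming (as in the patch) a fixed hardware pretimeout of PRETIMEOUT_SEC = 9 seconds and that any non-zero request is rounded to that value. struct wdd_sketch and sketch_set_pretimeout() are hypothetical stand-ins for struct watchdog_device and the driver's set_pretimeout callback, reduced to the two fields needed; this is not the hpwdt code.

```c
#include <errno.h>

#define PRETIMEOUT_SEC 9	/* fixed by hardware, per this thread */

/* Hypothetical stand-in for the relevant fields of struct watchdog_device. */
struct wdd_sketch {
	unsigned int timeout;
	unsigned int pretimeout;
};

/*
 * Accept 0 (disable) or any non-zero request, which is rounded to the
 * fixed hardware value. Refuse to enable the pretimeout when the timeout
 * is not longer than PRETIMEOUT_SEC, the case where, per Jerry's
 * observation, no NMI is issued before the reset.
 */
static int sketch_set_pretimeout(struct wdd_sketch *wdd, unsigned int val)
{
	if (val && wdd->timeout <= PRETIMEOUT_SEC)
		return -EINVAL;

	wdd->pretimeout = val ? PRETIMEOUT_SEC : 0;
	return 0;
}
```

The condition needs no extra parentheses: && binds more loosely than <=. User space can still read back the pretimeout after setting it to see the adjusted value.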





+   dev_info(dev->parent, "Setting pretimeout to %d\n", PRETIMEOUT_SEC);


Please no ongoing logging noise. This can easily be abused to clog
the kernel log.


Good point.  I will look at WARN_ONCE or something similar.


A traceback if someone sets a bad timeout ? That would be even worse.



I am thinking something more in line with setting a static variable if
the message had already been printed and not reprinting it.
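The "print it only once" approach described here might look like this user-space sketch: a static flag suppresses every message after the first. note_pretimeout_adjusted() is a hypothetical name, and fprintf() to stderr stands in for dev_info(). In kernel code, the existing printk_once()/pr_info_once() helpers implement the same idea.

```c
#include <stdbool.h>
#include <stdio.h>

#define PRETIMEOUT_SEC 9

/*
 * Hypothetical helper: log the pretimeout adjustment only the first time
 * it happens, so a user space loop setting the pretimeout cannot flood
 * the log. Returns true if a message was actually printed.
 */
static bool note_pretimeout_adjusted(void)
{
	static bool printed;	/* zero-initialized: nothing printed yet */

	if (printed)
		return false;

	printed = true;
	fprintf(stderr, "Setting pretimeout to %d\n", PRETIMEOUT_SEC);
	return true;
}
```

Unlike WARN_ONCE(), this produces no stack trace, which matches Guenter's objection to a traceback for a bad user-supplied value.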



That is ok, and different to W

[PATCH 4/4] Staging: ks7010: hostif: Fix multiple use of arguments in ps_confirm_wait_inc() macro.

2018-02-16 Thread Quytelda Kahja

Use GCC extensions to prevent macro arguments from accidentally being evaluated
multiple times when the macro is called.

Signed-off-by: Quytelda Kahja 
---
 drivers/staging/ks7010/ks_hostif.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/staging/ks7010/ks_hostif.c b/drivers/staging/ks7010/ks_hostif.c
index 30c9592b3a00..92035e8ac843 100644
--- a/drivers/staging/ks7010/ks_hostif.c
+++ b/drivers/staging/ks7010/ks_hostif.c
@@ -1306,11 +1306,10 @@ int hostif_data_request(struct ks_wlan_private *priv, struct sk_buff *skb)
return ret;
 }
 
-#define ps_confirm_wait_inc(priv)   \
-   do { \
-   if (atomic_read(&priv->psstatus.status) > PS_ACTIVE_SET) \
-   atomic_inc(&priv->psstatus.confirm_wait);\
-   } while (0)
+#define ps_confirm_wait_inc(priv)  \
+   ({ typeof(priv) priv_ = (priv); \
+   if (atomic_read(&priv_->psstatus.status) > PS_ACTIVE_SET) \
+   atomic_inc(&priv_->psstatus.confirm_wait); })
 
 static
 void hostif_mib_get_request(struct ks_wlan_private *priv,
-- 
2.16.1
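To show what these ks7010 patches guard against, here is a small sketch contrasting a plain macro, which evaluates its argument twice, with a GNU C statement expression that evaluates it once into a typeof() temporary, the same pattern the patches use. The macro and helper names are invented for illustration. __typeof__ is spelled this way so the sketch also compiles with -std=c99; statement expressions themselves remain a GCC/Clang extension.

```c
/* Counts how many times the argument expression is evaluated. */
static int evaluations;

/* Hypothetical argument with a visible side effect. */
static int next_value(void)
{
	evaluations++;
	return 3;
}

/* Plain macro: 'x' appears twice, so its side effects happen twice. */
#define SQUARE_TWICE(x) ((x) * (x))

/*
 * Statement expression: 'x' is evaluated exactly once into x_.
 * __typeof__ does not evaluate its operand.
 */
#define SQUARE_ONCE(x)				\
	({ __typeof__(x) x_ = (x);		\
	   x_ * x_; })
```

Calling SQUARE_TWICE(next_value()) bumps the counter twice, while SQUARE_ONCE(next_value()) bumps it once; both yield 9.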

[PATCH 3/4] Staging: ks7010: hostif: Fix multiple use of arguments in rate and event masking macros.

2018-02-16 Thread Quytelda Kahja

Use GCC extensions to prevent macro arguments from accidentally being evaluated
multiple times when the macro is called.

Signed-off-by: Quytelda Kahja 
---
 drivers/staging/ks7010/ks_hostif.h | 74 +-
 1 file changed, 50 insertions(+), 24 deletions(-)

diff --git a/drivers/staging/ks7010/ks_hostif.h b/drivers/staging/ks7010/ks_hostif.h
index 5bae8d468e23..750ac86cee77 100644
--- a/drivers/staging/ks7010/ks_hostif.h
+++ b/drivers/staging/ks7010/ks_hostif.h
@@ -599,19 +599,39 @@ struct hostif_mic_failure_confirm_t {
 #define TX_RATE_48M(uint8_t)(480 / 5)
 #define TX_RATE_54M(uint8_t)(540 / 5)
 
-#define IS_11B_RATE(A) (((A & RATE_MASK) == TX_RATE_1M) || ((A & RATE_MASK) == TX_RATE_2M) || \
-   ((A & RATE_MASK) == TX_RATE_5M) || ((A & RATE_MASK) == TX_RATE_11M))
-
-#define IS_OFDM_RATE(A) (((A & RATE_MASK) == TX_RATE_6M) || ((A & RATE_MASK) == TX_RATE_12M) || \
-((A & RATE_MASK) == TX_RATE_24M) || ((A & RATE_MASK) == TX_RATE_9M) || \
-((A & RATE_MASK) == TX_RATE_18M) || ((A & RATE_MASK) == TX_RATE_36M) || \
-((A & RATE_MASK) == TX_RATE_48M) || ((A & RATE_MASK) == TX_RATE_54M))
-
-#define IS_11BG_RATE(A) (IS_11B_RATE(A) || IS_OFDM_RATE(A))
-
-#define IS_OFDM_EXT_RATE(A) (((A & RATE_MASK) == TX_RATE_9M) || ((A & RATE_MASK) == TX_RATE_18M) || \
-((A & RATE_MASK) == TX_RATE_36M) || ((A & RATE_MASK) == TX_RATE_48M) || \
-((A & RATE_MASK) == TX_RATE_54M))
+#define IS_11B_RATE(A) \
+   ({  \
+   typeof(A) A_ = (A); \
+   ((A_ & RATE_MASK) == TX_RATE_1M) || \
+   ((A_ & RATE_MASK) == TX_RATE_2M) || \
+   ((A_ & RATE_MASK) == TX_RATE_5M) || \
+   ((A_ & RATE_MASK) == TX_RATE_11M); })
+
+#define IS_OFDM_RATE(A)\
+   ({  \
+   typeof(A) A_ = (A); \
+   ((A_ & RATE_MASK) == TX_RATE_6M) || \
+   ((A_ & RATE_MASK) == TX_RATE_12M) ||\
+   ((A_ & RATE_MASK) == TX_RATE_24M) ||\
+   ((A_ & RATE_MASK) == TX_RATE_9M) || \
+   ((A_ & RATE_MASK) == TX_RATE_18M) ||\
+   ((A_ & RATE_MASK) == TX_RATE_36M) ||\
+   ((A_ & RATE_MASK) == TX_RATE_48M) ||\
+   ((A_ & RATE_MASK) == TX_RATE_54M); })
+
+#define IS_11BG_RATE(A)\
+   ({  \
+   typeof(A) A_ = (A); \
+   IS_11B_RATE(A_) || IS_OFDM_RATE(A_); })
+
+#define IS_OFDM_EXT_RATE(A)\
+   ({  \
+   typeof(A) A_ = (A); \
+   ((A_ & RATE_MASK) == TX_RATE_9M) || \
+   ((A_ & RATE_MASK) == TX_RATE_18M) ||\
+   ((A_ & RATE_MASK) == TX_RATE_36M) ||\
+   ((A_ & RATE_MASK) == TX_RATE_48M) ||\
+   ((A_ & RATE_MASK) == TX_RATE_54M); })
 
 enum connect_status_type {
CONNECT_STATUS,
@@ -633,17 +653,23 @@ enum multicast_filter_type {
 
 /* macro function */
 #define HIF_EVENT_MASK 0xE800
-#define IS_HIF_IND(_EVENT)  ((_EVENT & HIF_EVENT_MASK) == 0xE800  && \
-((_EVENT & ~HIF_EVENT_MASK) == 0x0001 || \
-(_EVENT & ~HIF_EVENT_MASK) == 0x0006 || \
-(_EVENT & ~HIF_EVENT_MASK) == 0x000C || \
-(_EVENT & ~HIF_EVENT_MASK) == 0x0011 || \
-(_EVENT & ~HIF_EVENT_MASK) == 0x0012))
-
-#define IS_HIF_CONF(_EVENT) ((_EVENT & HIF_EVENT_MASK) == 0xE800  && \
-(_EVENT & ~HIF_EVENT_MASK) > 0x  && \
-(_EVENT & ~HIF_EVENT_MASK) < 0x0012  && \
-!IS_HIF_IND(_EVENT))
+#define IS_HIF_IND(_EVENT)   \
+   ({\
+   typeof(_EVENT) EVENT_ = (_EVENT); \
+   (EVENT_ & HIF_EVENT_MASK) == 0xE800  &&   \
+   ((EVENT_ & ~HIF_EVENT_MASK) == 0x0001 ||  \
+(EVENT_ & ~HIF_EVENT_MASK) == 0x0006 ||  \
+
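An alternative to statement-expression macros such as IS_11B_RATE() above is a static inline function, which evaluates its argument exactly once in plain C and gives the argument a real type. The sketch below shows one such function. RATE_MASK and the 1/2/5.5/11 Mbit/s rate codes are assumed values, following the (rate in 100 kbit/s) / 5 pattern of TX_RATE_48M and TX_RATE_54M in the patch; they are not copied from ks_hostif.h, and is_11b_rate() is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values: rate codes are (rate in 100 kbit/s) / 5. */
#define RATE_MASK	0x7F
#define TX_RATE_1M	(uint8_t)(10 / 5)
#define TX_RATE_2M	(uint8_t)(20 / 5)
#define TX_RATE_5M	(uint8_t)(55 / 5)
#define TX_RATE_11M	(uint8_t)(110 / 5)

/*
 * Hypothetical inline replacement for IS_11B_RATE(): 'rate' is evaluated
 * once, and bits outside RATE_MASK (assumed to be a flag bit) are ignored.
 */
static inline bool is_11b_rate(uint8_t rate)
{
	uint8_t r = rate & RATE_MASK;

	return r == TX_RATE_1M || r == TX_RATE_2M ||
	       r == TX_RATE_5M || r == TX_RATE_11M;
}
```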

[PATCH 2/4] Staging: ks7010: hostif: Fix multiple use of arguments in SME queue macros.

2018-02-16 Thread Quytelda Kahja

Use GCC extensions to prevent macro arguments from accidentally being evaluated
multiple times when the macro is called.

Signed-off-by: Quytelda Kahja 
---
 drivers/staging/ks7010/ks_hostif.c | 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/drivers/staging/ks7010/ks_hostif.c b/drivers/staging/ks7010/ks_hostif.c
index 975dbbb3abd0..30c9592b3a00 100644
--- a/drivers/staging/ks7010/ks_hostif.c
+++ b/drivers/staging/ks7010/ks_hostif.c
@@ -22,12 +22,19 @@
 #include /* New driver API */
 
 /* macro */
-#define inc_smeqhead(priv) \
-   (priv->sme_i.qhead = (priv->sme_i.qhead + 1) % SME_EVENT_BUFF_SIZE)
-#define inc_smeqtail(priv) \
-   (priv->sme_i.qtail = (priv->sme_i.qtail + 1) % SME_EVENT_BUFF_SIZE)
-#define cnt_smeqbody(priv) \
-   (((priv->sme_i.qtail + SME_EVENT_BUFF_SIZE) - (priv->sme_i.qhead)) % SME_EVENT_BUFF_SIZE)
+#define inc_smeqhead(priv) \
+   ({ typeof(priv) priv_ = (priv); \
+   unsigned int next_qhead = priv_->sme_i.qhead + 1;   \
+   priv_->sme_i.qhead = next_qhead % SME_EVENT_BUFF_SIZE; })
+#define inc_smeqtail(priv) \
+   ({ typeof(priv) priv_ = (priv); \
+   unsigned int next_qtail = priv_->sme_i.qtail + 1;   \
+   priv_->sme_i.qtail = next_qtail % SME_EVENT_BUFF_SIZE; })
+#define cnt_smeqbody(priv) \
+   ({ typeof(priv) priv_ = (priv); \
+   unsigned int left_cnt = \
+   priv_->sme_i.qtail + SME_EVENT_BUFF_SIZE;   \
+(left_cnt - (priv_->sme_i.qhead)) % SME_EVENT_BUFF_SIZE; })
 
 #define KS_WLAN_MEM_FLAG (GFP_ATOMIC)
 
-- 
2.16.1
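The queue macros in this patch implement circular-buffer index arithmetic: advance an index modulo the buffer size, and count entries as (tail + size - head) % size, where adding the size first keeps the difference non-negative once the tail has wrapped. Below is a sketch of the same arithmetic written as static inline functions (a different technique from the patch's typeof macros), assuming a hypothetical struct holding only the two indices and a made-up buffer size of 8; the smeq_*() names are also hypothetical.

```c
/* Hypothetical stand-ins for priv->sme_i and SME_EVENT_BUFF_SIZE. */
#define SME_EVENT_BUFF_SIZE 8U

struct sme_queue {
	unsigned int qhead;	/* assumed: next entry to consume */
	unsigned int qtail;	/* assumed: next free slot to fill */
};

static inline void smeq_inc_head(struct sme_queue *q)
{
	q->qhead = (q->qhead + 1) % SME_EVENT_BUFF_SIZE;
}

static inline void smeq_inc_tail(struct sme_queue *q)
{
	q->qtail = (q->qtail + 1) % SME_EVENT_BUFF_SIZE;
}

/* Number of queued entries; adding the size first handles a wrapped tail. */
static inline unsigned int smeq_count(const struct sme_queue *q)
{
	return (q->qtail + SME_EVENT_BUFF_SIZE - q->qhead) % SME_EVENT_BUFF_SIZE;
}
```

With this scheme a completely full buffer (tail caught up with head) would report a count of 0, so at most SME_EVENT_BUFF_SIZE - 1 entries can be held unambiguously.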

[PATCH 1/4] Staging: ks7010: sdio: Fix multiple use of arguments in RX/TX queue macros.

2018-02-16 Thread Quytelda Kahja

Use GCC extensions to prevent macro arguments from accidentally being evaluated
multiple times when the macro is called.

Signed-off-by: Quytelda Kahja 
---
 drivers/staging/ks7010/ks7010_sdio.c | 40 
 1 file changed, 27 insertions(+), 13 deletions(-)

diff --git a/drivers/staging/ks7010/ks7010_sdio.c b/drivers/staging/ks7010/ks7010_sdio.c
index 8cfdff198334..ffa7e2382353 100644
--- a/drivers/staging/ks7010/ks7010_sdio.c
+++ b/drivers/staging/ks7010/ks7010_sdio.c
@@ -32,19 +32,33 @@ static const struct sdio_device_id ks7010_sdio_ids[] = {
 };
 MODULE_DEVICE_TABLE(sdio, ks7010_sdio_ids);
 
-#define inc_txqhead(priv) \
-   (priv->tx_dev.qhead = (priv->tx_dev.qhead + 1) % TX_DEVICE_BUFF_SIZE)
-#define inc_txqtail(priv) \
-   (priv->tx_dev.qtail = (priv->tx_dev.qtail + 1) % TX_DEVICE_BUFF_SIZE)
-#define cnt_txqbody(priv) \
-   (((priv->tx_dev.qtail + TX_DEVICE_BUFF_SIZE) - (priv->tx_dev.qhead)) % TX_DEVICE_BUFF_SIZE)
-
-#define inc_rxqhead(priv) \
-   (priv->rx_dev.qhead = (priv->rx_dev.qhead + 1) % RX_DEVICE_BUFF_SIZE)
-#define inc_rxqtail(priv) \
-   (priv->rx_dev.qtail = (priv->rx_dev.qtail + 1) % RX_DEVICE_BUFF_SIZE)
-#define cnt_rxqbody(priv) \
-   (((priv->rx_dev.qtail + RX_DEVICE_BUFF_SIZE) - (priv->rx_dev.qhead)) % RX_DEVICE_BUFF_SIZE)
+#define inc_txqhead(priv)  \
+   ({ typeof(priv) priv_ = (priv); \
+   unsigned int next_qhead = priv_->tx_dev.qhead + 1;  \
+   priv_->tx_dev.qhead = next_qhead % TX_DEVICE_BUFF_SIZE; })
+#define inc_txqtail(priv)  \
+   ({ typeof(priv) priv_ = (priv); \
+   unsigned int next_qtail = priv_->tx_dev.qtail + 1;  \
+   priv_->tx_dev.qtail = next_qtail % TX_DEVICE_BUFF_SIZE; })
+#define cnt_txqbody(priv)  \
+   ({ typeof(priv) priv_ = (priv); \
+   unsigned int left_cnt = \
+   priv_->tx_dev.qtail + TX_DEVICE_BUFF_SIZE;  \
+   (left_cnt - (priv_->tx_dev.qhead)) % TX_DEVICE_BUFF_SIZE; })
+
+#define inc_rxqhead(priv)  \
+   ({ typeof(priv) priv_ = (priv); \
+   unsigned int next_qhead = priv_->rx_dev.qhead + 1;  \
+   priv_->rx_dev.qhead = next_qhead % RX_DEVICE_BUFF_SIZE; })
+#define inc_rxqtail(priv)  \
+   ({ typeof(priv) priv_ = (priv); \
+   unsigned int next_qtail = priv_->rx_dev.qtail + 1;  \
+   priv_->rx_dev.qtail = next_qtail % RX_DEVICE_BUFF_SIZE; })
+#define cnt_rxqbody(priv)  \
+   ({ typeof(priv) priv_ = (priv); \
+   unsigned int left_cnt = \
+   priv_->rx_dev.qtail + RX_DEVICE_BUFF_SIZE;  \
+   (left_cnt - (priv_->rx_dev.qhead)) % RX_DEVICE_BUFF_SIZE; })
 
 /* Read single byte from device address into byte (CMD52) */
 static int ks7010_sdio_readb(struct ks_wlan_private *priv, unsigned int address,
-- 
2.16.1

Re: [PATCH v3 01/15] Documentation: add newcx initramfs format description

2018-02-16 Thread hpa

On February 16, 2018 1:47:35 PM PST, Victor Kamensky  wrote:
>
>
>On Fri, 16 Feb 2018, Rob Landley wrote:
>
>>
>> On 02/16/2018 02:59 PM, H. Peter Anvin wrote:
>>> On 02/16/18 12:33, Taras Kondratiuk wrote:
>>>> Many of the Linux security/integrity features are dependent on file
>>>> metadata, stored as extended attributes (xattrs), for making decisions.
>>>> These features need to be initialized during initcall and enabled as
>>>> early as possible for complete security coverage.
>>>>
>>>> Initramfs (tmpfs) supports xattrs, but newc CPIO archive format does not
>>>> support including them into the archive.
>>>>
>>>> This patch describes "extended" newc format (newcx) that is based on
>>>> newc and has following changes:
>>>> - extended attributes support
>>>> - increased size of filesize to support files >4GB
>>>> - increased mtime field size to have 64 bits of seconds and added a
>>>>   field for nanoseconds
>>>> - removed unused checksum field

>>>
>>> If you are going to implement a new, non-backwards-compatible format,
>>> you shouldn't replicate the mistakes of the current format.  Specifically:
>>
>> So rather than make minimal changes to the existing format and continue to
>> support the existing format (sharing as much code as possible), you recommend
>> gratuitous aesthetic changes?
>>
>>> 1. The use of ASCII-encoded fixed-length numbers is an idiotic legacy
>>> from an era before there were any portable way of dealing with numbers
>>> with prespecified endianness.
>>
>> It lets encoders and decoders easily share code with the existing cpio format,
>> which we still intend to be able to read and write.
>>
>>> If you are going to use ASCII, make them
>>> delimited so that they don't have fixed limits, or just use binary.
>>
>> When it's gzipped this accomplishes what? (Other than being gratuitously
>> different from the previous iteration?)
>>
>>> The cpio header isn't fixed size, so that argument goes away, in fact
>>> the only way to determine the end of the header is to scan forward.
>>>
>>> 2. Alignment sensitivity!  Because there is no header length
>>> information, the above scan tells you where the header ends, but there
>>> is padding before the data, and the size of that padding is only defined
>>> by alignment.
>>
>> Again, these are minimal changes to the existing cpio format. You're complaining
>> about _cpio_, and that the new stuff isn't _different_ enough from it.
>>
>>> 3. Inband encoding of EOF: if you actually have a filename "TRAILER!!!"
>>> you have problems.
>>
>> Been there, done that:
>>
>> http://lkml.iu.edu/hypermail/linux/kernel/1801.3/01791.html
>>
>>> But first, before you define a whole new format for which no tools exist
>>> (you will have to work with the maintainers of the GNU tools to add
>>> support)
>>
>> No, he's been working with the maintainer of toybox to add support (for about a
>> year now), which gets him the Android command line. And the kernel has its own
>> built-in tool to generate cpio images anyway.
>>
>> Why would anyone care what the GNU project thinks?
>
>In our internal use of this patch series we do use gnu cpio
>to create initramfs.cpio.
>
>A reference to the GNU cpio patch that supports the newcx format is
>posted in the description for this series:
>
>https://raw.githubusercontent.com/victorkamensky/initramfs-xattrs-poky/rocko/meta/recipes-extended/cpio/cpio-2.12/cpio-xattrs.patch
>
>Whether the GNU cpio maintainers will accept it is a different matter.
>We will try, but we need to start somewhere and agree on
>new format first.
>
>Thanks,
>Victor
>
>>> you should see how complex it would be to support the POSIX
>>> tar/pax format,
>>
>> That argument was had (at length) when initramfs went in over a decade ago.
>> There are links in Documentation/filesystems/ramfs-rootfs-initramfs.txt to the
>> mailing list entries about it.
>>
>>> which already has all the features you are seeking, and
>>> by now is well-supported.
>>
>> So... tar wasn't well-supported 15 years ago? (Hasn't the kernel source always
>> been distributed via tarball back since 0.0.1?)
>>
>> You're suggesting having a whole second codepath that shares no code with the
>> existing cpio extractor. Are you suggesting abandoning support for the existing
>> initramfs.cpio.gz file format?
>>
>> Rob
>>

Introducing new, incompatible data formats is an inherently *very* costly 
operation; unfortunately many engineers don't seem to have a good grip of just 
*how* expensive it is (see "silly embedded nonsense hacks", "too little, too 
soon".)

Cpio itself is a great horror show of just how bad this gets: a bunch of minor 
tweaks without finding underlying design bugs resulting in a ton of mutually 
incompatible formats.  "They are almost the same" doesn't help: they are still 
incompatible.
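For reference, the newc header debated in this thread is the 6-byte magic "070701" followed by 13 fields of 8 ASCII hex digits each (110 bytes in total); the file name follows and is padded to a 4-byte boundary, as is the file data. The sketch below (newc_field() and newc_align4() are hypothetical helper names) decodes one fixed-width field and computes that padding, which makes both the 32-bit field limit and the alignment sensitivity concrete.

```c
#include <stddef.h>
#include <stdint.h>

#define NEWC_HDR_LEN 110	/* "070701" + 13 fields x 8 hex digits */

/*
 * Hypothetical helper: decode one fixed-width 8-character ASCII hex field.
 * Returns 0 on success, -1 on a non-hex character. Eight hex digits cap
 * every field, including the file size, at 0xFFFFFFFF.
 */
static int newc_field(const char *p, uint32_t *out)
{
	uint32_t v = 0;
	int i;

	for (i = 0; i < 8; i++) {
		char c = p[i];
		uint32_t d;

		if (c >= '0' && c <= '9')
			d = c - '0';
		else if (c >= 'a' && c <= 'f')
			d = c - 'a' + 10;
		else if (c >= 'A' && c <= 'F')
			d = c - 'A' + 10;
		else
			return -1;
		v = (v << 4) | d;
	}
	*out = v;
	return 0;
}

/* Round an archive offset up to the next 4-byte boundary. */
static size_t newc_align4(size_t off)
{
	return (off + 3) & ~(size_t)3;
}
```

For example, an entry named "init" (namesize 5, counting the NUL) occupies 110 + 5 = 115 bytes of header and name, so its data starts at offset 116.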

Introducing a new incompatible data format without strong justification is 
engineering malpractice.  Doing it under the non-justification of expedience 
("oh, we can share most of the

Re: [PATCH v2] reset: add support for non-DT systems

2018-02-16 Thread David Lechner


On 02/13/2018 12:39 PM, Bartosz Golaszewski wrote:

From: Bartosz Golaszewski 

The reset framework only supports device-tree. There are some platforms,
however, that need to use it even in legacy, board-file based mode.

An example of such an architecture is the DaVinci family of SoCs which
supports both device tree and legacy boot modes and we don't want to
introduce any regressions.

We're currently working on converting the platform from its hand-crafted
clock API to using the common clock framework. Part of the overhaul will
be representing the chip's power sleep controller's reset lines using
the reset framework.

This changeset extends the core reset code with a new field in the
reset controller struct which contains an array of lookup entries. Each
entry contains the device name and an additional, optional identifier
string.

Drivers can register a set of reset lines using this lookup table and
concerned devices can access them using the regular reset_control API.

This new function is only called as a fallback in case the of_node
field is NULL and doesn't change anything for current users.

Tested with a dummy reset driver with several lookup entries.

An example lookup table can look like this:

static const struct reset_lookup foobar_reset_lookup[] = {
[FOO_RESET] = { .dev = "foo", .id = "foo_id" },
[BAR_RESET] = { .dev = "bar", .id = NULL },
{ }
};

where FOO_RESET and BAR_RESET will correspond with the id parameters
of reset callbacks.

Cc: Sekhar Nori 
Cc: Kevin Hilman 
Cc: David Lechner 
Signed-off-by: Bartosz Golaszewski 
---
v1 -> v2:
- renamed the new function to __reset_control_get_from_lookup()
- added a missing break; when a matching entry is found
- rearranged the code in __reset_control_get() - we can no longer get to the
   return at the bottom, so remove it and return from
   __reset_control_get_from_lookup() if __of_reset_control_get() fails
- return -ENOENT from reset_control_get() if we can't find a matching entry;
   the previously returned -EINVAL referred to the fact that we passed a device
   without the of_node, which is no longer an error condition
- add a comment about needing a sentinel in the lookup table

  drivers/reset/core.c | 40 +++-
  include/linux/reset-controller.h | 14 ++
  2 files changed, 53 insertions(+), 1 deletion(-)

diff --git a/drivers/reset/core.c b/drivers/reset/core.c
index da4292e9de97..b104a0c5c511 100644
--- a/drivers/reset/core.c
+++ b/drivers/reset/core.c
@@ -493,6 +493,44 @@ struct reset_control *__of_reset_control_get(struct device_node *node,
  }
  EXPORT_SYMBOL_GPL(__of_reset_control_get);
  
+static struct reset_control *
+__reset_control_get_from_lookup(struct device *dev, const char *id,
+   bool shared, bool optional)
+{
+   struct reset_controller_dev *rcdev;
+   const char *dev_id = dev_name(dev);
+   struct reset_control *rstc = NULL;
+   const struct reset_lookup *lookup;
+   int index;
+
+   mutex_lock(&reset_list_mutex);
+
+   list_for_each_entry(rcdev, &reset_controller_list, list) {
+   if (!rcdev->lookup)
+   continue;
+
+   lookup = rcdev->lookup;
+   for (index = 0; lookup->dev; index++, lookup++) {
+   if (strcmp(dev_id, lookup->dev))
+   continue;
+
+   if ((!id && !lookup->id) ||
+   (id && lookup->id && !strcmp(id, lookup->id))) {
+   rstc = __reset_control_get_internal(rcdev,
+   index, shared);
+   break;
+   }
+   }
+   }



This method of determining the index is not very useful. In the case of the DSP
reset on OMAP-L138, the index *must* be the LPSC module domain number, which is
15. This would require us to create 15 dummy entries in the rcdev->lookup array
so that we get the correct index in order to get the correct reset control.

I think it would be better to just store the index in struct reset_lookup.

Another option would be to require the length of lookup to be rcdev->nr_resets
instead of using a sentinel.
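The first alternative could look roughly like the following sketch. It assumes a hypothetical .index member (not in the posted patch), so an entry's position in the array no longer matters and a DSP reset can map straight to LPSC 15. struct reset_lookup_idx, the "dsp" device name and find_reset_line() are illustrative only; the table is still terminated by an entry with .dev == NULL.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical variant of struct reset_lookup with an explicit index. */
struct reset_lookup_idx {
	unsigned int index;	/* reset line number, e.g. the LPSC module number */
	const char *dev;
	const char *id;
};

static const struct reset_lookup_idx dsp_reset_lookup[] = {
	{ .index = 15, .dev = "dsp", .id = NULL },
	{ .index = 0, .dev = NULL, .id = NULL },	/* sentinel */
};

/* Same matching rules as the patch, but returns the stored index. */
static int find_reset_line(const struct reset_lookup_idx *lookup,
			   const char *dev_id, const char *id)
{
	for (; lookup->dev; lookup++) {
		if (strcmp(dev_id, lookup->dev))
			continue;
		if ((!id && !lookup->id) ||
		    (id && lookup->id && !strcmp(id, lookup->id)))
			return (int)lookup->index;
	}
	return -1;	/* no match; the kernel code would use -ENOENT */
}
```

With the index stored in the entry, no dummy entries are needed, and the table length is independent of rcdev->nr_resets.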

Re: [PATCH v3 08/11] watchdog/hpwdt: Programmable Pretimeout NMI

2018-02-16 Thread Jerry Hoemann

On Fri, Feb 16, 2018 at 03:55:06PM -0800, Guenter Roeck wrote:
> On Fri, Feb 16, 2018 at 04:46:17PM -0700, Jerry Hoemann wrote:
> > On Fri, Feb 16, 2018 at 12:34:40PM -0800, Guenter Roeck wrote:
> > > On Thu, Feb 15, 2018 at 04:43:57PM -0700, Jerry Hoemann wrote:

...

> > > > @@ -98,12 +106,21 @@ static int hpwdt_settimeout(struct watchdog_device *dev, unsigned int val)
> > > >  }
> > > >  
> > > >  #ifdef CONFIG_HPWDT_NMI_DECODING   /* { */
> > > > +static int hpwdt_set_pretimeout(struct watchdog_device *dev, unsigned int val)
> > > > +{
> > > > +   if (val && (val != PRETIMEOUT_SEC)) {
> > > 
> > > Unnecessary ( )
> > 
> > 
> > There are several things going on here. I'm not sure which one the above
> > comment refers to.
> > 
> The "Unnecessary" refers to the ( ) around the second part of the expression
> above. While there may be valid reasons to include extra ( ), I think we
> can trust the C compiler to get it right here.

Okay, wasn't sure what you were getting at here.

I trust the C compiler, I don't trust humans.  In compound conditionals,
I'll add parens so that the meaning is clear.

> > While a pretimeout NMI isn't required by the HW to be enabled, if enabled the
> > length of pretimeout is fixed by HW.
> > 
> > I didn't see anything in the API that would allow us to communicate to
> > the user this "feature."  timeout at least has both min_timeout and max_timeout, but
> > I didn't see similar for pretimeout.  I also don't think it's reasonable to fail
> > here if the requested value is not 9, as the user really has no way of knowing what
> > the valid range of pretimeout values is.  So I accept any non-zero value
> > for pretimeout, but then set pretimeout to be 9.
> > 
> > But at the same time, I don't like to silently change a human request
> > w/o at least warning.
> > 
> Sorry, I lost you here.

I wasn't sure what you were objecting to.  I thought you might
not have understood why I was converting non-zero values of
"pretimeout" to 9.  I was trying to explain the reasoning.

A problem I see with the watchdog API is that users don't know
what is an acceptable range of values for pretimeout.

For HPE ProLiant systems, one cannot just choose an arbitrary
value for pretimeout.

I don't see a reasonable way that a user can determine the valid range
for pretimeout for HPE systems given our hardware restrictions.

> 
> > > 
> > > The actual timeout can be a value smaller than 9 seconds.
> > > Minimum is 1 second. What happens if the user configures
> > > a timeout of less than 9 seconds as well as a pretimeout ?
> > > Will it fire immediately ? 
> > 
> > The architecture is silent on this issue.  My experience with
> > this is that if timeout < 9 seconds, the NMI is not issued.
> > System resets when the timeout expires.  This could be implementation
> > dependent.
> > 
> > Note, this is not a new issue.
> > 
> Bad argument.

I'm not sure exactly what your objections are.  I'll point out that:

1) hpwdt has been using pretimeout NMI for watchdog for > 10 years.
2) For 8 years, it's been possible to have a timeout < 9 seconds.
3) AFAIK this hasn't proven to be a big issue.
4) I have real questions as to how (or if) to address the issue.

I am perfectly willing to discuss the problem, but I don't think
it is a requirement for this patch set.

> 
> > I thought about setting the min timeout to ten seconds to avoid this situation.
> > 
> You could reject requests to set the pretimeout to a value <= the
> timeout.

I think you mis-communicated here.

It is perfectly fine to have a timeout of 30 seconds with a pretimeout of 9 seconds.
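For reference, a minimal sketch of a validation rule consistent with that example, assuming the intended constraint is that a non-zero pretimeout must be strictly smaller than the timeout. pretimeout_valid() is a hypothetical helper, not hpwdt or watchdog-core code.

```c
#include <stdbool.h>

/*
 * Hypothetical check: a pretimeout of 0 disables the NMI and is always
 * acceptable; otherwise the NMI must be due strictly before the timeout.
 */
static bool pretimeout_valid(unsigned int timeout, unsigned int pretimeout)
{
	return pretimeout == 0 || pretimeout < timeout;
}
```

Under this rule the 30 s timeout with the fixed 9 s pretimeout is accepted, while a 5 s timeout combined with a 9 s pretimeout (the case raised earlier in the thread) would be rejected.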

> 
> > I haven't dug into the various user level clients of watchdog so I'm not sure
> > what the impact of making this change would be to them.
> > 
> > 
> > > 
> > > > +   dev_info(dev->parent, "Setting pretimeout to %d\n", PRETIMEOUT_SEC);
> > > 
> > > Please no ongoing logging noise. This can easily be abused to clog
> > > the kernel log.
> > 
> > Good point.  I will look at WARN_ONCE or something similar.
> > 
> A traceback if someone sets a bad timeout ? That would be even worse.

I am thinking of something more along the lines of setting a static variable
once the message has been printed and not printing it again.
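A user-space sketch of that approach, combined with the "accept any non-zero value, use 9" behaviour described earlier. PRETIMEOUT_SEC matches the patch; the pretimeout variable and set_pretimeout() are hypothetical stand-ins for the watchdog_device field and the driver callback, and fprintf() stands in for dev_info().

```c
#include <stdbool.h>
#include <stdio.h>

#define PRETIMEOUT_SEC 9	/* fixed by the hardware, per this thread */

static unsigned int pretimeout;	/* stands in for dev->pretimeout */

/*
 * Any non-zero request is coerced to PRETIMEOUT_SEC; the notice is
 * printed at most once, so repeated requests cannot flood the log.
 */
static int set_pretimeout(unsigned int val)
{
	static bool warned;

	if (val && val != PRETIMEOUT_SEC) {
		if (!warned) {
			fprintf(stderr, "Setting pretimeout to %d\n",
				PRETIMEOUT_SEC);
			warned = true;
		}
		val = PRETIMEOUT_SEC;
	}
	pretimeout = val;
	return 0;
}
```

In the kernel, the *_once printk helpers (for example pr_info_once()) do the same bookkeeping without an explicit flag.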

-- 

-
Jerry Hoemann  Software Engineer   Hewlett Packard Enterprise
-

Re: [PATCH 1/1] perf: Add CPU hotplug support for events

2018-02-16 Thread Raghavendra Rao Ananta




On 02/16/2018 12:39 PM, Peter Zijlstra wrote:

On Fri, Feb 16, 2018 at 10:06:29AM -0800, Raghavendra Rao Ananta wrote:

No, this is absolutely disgusting. You can simply keep the events in the
dead CPU's context. It's really not that hard.

Keeping the events in the dead CPU's context was also an idea that we had.
However, detaching that event from the PMU when the CPU is offline would be
a pain. Consider the scenario in which an event is about to be destroyed
when the CPU is offline (yet still attached to the CPU). During its
destruction, a cross-CPU call is made (from perf_remove_from_context()) to
the offlined CPU to detach the event from the CPU's PMU. As the CPU is
offline, that would not be possible, and again separate logic would have to be
written for cleaning up the events whose CPUs are offline.


That is actually really simple to deal with. The real problems are with
semantics: is an event enabled when the CPU is dead? Can you
disable/enable an event on a dead CPU?

The below patch (_completely_ untested) should do most of it, but needs
help with the details. I suspect we want to allow enable/disable on
events that are on a dead CPU, and equally I think we want to account
the time an enabled event spends on a dead CPU to go towards the
'enabled' bucket.
I've gone through your diff, and it is similar in texture to what we
are trying to do (except for maintaining an offline event list).
Nevertheless, I tried to test your patch. I created a hw event and
tried to offline the CPU in parallel, and I immediately hit a watchdog
soft lockup bug! I tried the same thing by first switching off the CPU
(without any event created), and I hit a similar issue. I am sure we
can fix it, but apart from the "why are we doing hotplug?" question,
was there any specific issue with our patch?





Also, you _still_ don't explain why you care about dead CPUs.

I wanted to understand: if we no longer care about hotplugging of CPUs,
then why do we still have exported symbols such as cpu_up() and
cpu_down()? Moreover, the hotplug interface is also exposed to
user space (through sysfs). As long as these interfaces exist,
there's always a chance of a CPU being brought up or down. Can you
please clear this up for me?


-- Raghavendra

--
Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

syscon regmap for disabled node?

2018-02-16 Thread Suman Anna

Hi Pankaj, Arnd, Lee,

I am testing some code to use a syscon/regmap interface and I find that
the syscon/regmap is initialized even on a disabled device node with a
"syscon" compatible property, when I expected it to fail. Prior to
commit bdb0066df96e ("mfd: syscon: Decouple syscon interface from
platform devices"), the driver would never have probed, and
of_syscon_register() only checks the compatible property, not whether
the device node is available. Is this intentional or a bug?
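One way to express the expected behaviour is to gate registration on both the compatible string and the node's status. The sketch below is user-space and hypothetical: struct fake_node and syscon_should_register() are illustrative names, and the availability rule is modelled on the kernel's of_device_is_available(), which treats a node as available when its "status" property is absent, "okay" or "ok". Real compatible properties can list several strings; this sketch checks only one.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, minimal stand-in for a device tree node. */
struct fake_node {
	const char *compatible;
	const char *status;	/* NULL when the property is absent */
};

/* Same rule as of_device_is_available(): absent, "okay" or "ok". */
static bool node_is_available(const struct fake_node *np)
{
	if (!np->status)
		return true;
	return !strcmp(np->status, "okay") || !strcmp(np->status, "ok");
}

/* Hypothetical gate: compatible match plus availability. */
static bool syscon_should_register(const struct fake_node *np)
{
	return np->compatible && !strcmp(np->compatible, "syscon") &&
	       node_is_available(np);
}
```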

regards
Suman

Re: [PATCH 3/3] taint: Add taint for randstruct

2018-02-16 Thread Kees Cook

On Fri, Feb 16, 2018 at 1:02 PM, Andrew Morton
 wrote:
> On Thu, 15 Feb 2018 19:37:44 -0800 Kees Cook  wrote:
>
>> --- a/Documentation/sysctl/kernel.txt
>> +++ b/Documentation/sysctl/kernel.txt
>> @@ -991,6 +991,7 @@ ORed together. The letters are seen in "Tainted" line of Oops reports.
>>   16384 (L): A soft lockup has previously occurred on the system.
>>   32768 (K): The kernel has been live patched.
>>   65536 (X): Auxiliary taint, defined and used by for distros.
>> +131072 (T): The kernel was built with the struct randomization plugin.
>
> Uncle.
>
>
> From: Andrew Morton 
> Subject: Documentation/sysctl/kernel.txt: show taint codes in hex
>
> The decimal representation is getting a bit hard to follow.

The rationale, AIUI, is that /proc/sys/kernel/tainted prints the
values in decimal. If we change the docs to be hex and leave the
output decimal, that makes it even harder to examine.

If we change the proc output, will we break userspace? And if we do
change it, maybe avoid numbers altogether and have proc report the same
thing the Oops does (the letter codes)? (But then the sysctl would
need to parse the letters...)
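To illustrate the trade-off, a small user-space sketch that turns the decimal value read from /proc/sys/kernel/tainted into the letter codes an Oops would show. The bit-to-letter table is transcribed from Documentation/sysctl/kernel.txt plus the 'T' added by this series; taint_letters() is a hypothetical helper, not an existing tool.

```c
#include <stddef.h>
#include <string.h>

/*
 * Letters for taint bits 0..17. Bit 0 is shown as 'P' here; the Oops
 * "Tainted:" line prints 'G' in that position when the bit is clear.
 */
static const char taint_chars[] = "PFSRMBUDAWCIOELKXT";

/* Write the letters of all set bits, lowest bit first, into out. */
static void taint_letters(unsigned long value, char *out, size_t len)
{
	size_t n = 0;
	size_t bit;

	if (!len)
		return;
	for (bit = 0; bit < sizeof(taint_chars) - 1 && n + 1 < len; bit++)
		if (value & (1UL << bit))
			out[n++] = taint_chars[bit];
	out[n] = '\0';
}
```

For example, a soft lockup on a randstruct kernel reads back as 147456 in decimal, which is 0x24000 in hex (0x4000 for L plus 0x20000 for T) and "LT" as letters.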

-Kees

-- 
Kees Cook
Pixel Security

[PATCH] arm64: Add support for new control bits CTR_EL0.DIC and CTR_EL0.IDC

2018-02-16 Thread Shanker Donthineni

The two point-of-unification cache maintenance operations, 'DC CVAU' and
'IC IVAU', are optional for implementors as per the ARMv8 specification.
This patch parses the updated CTR_EL0 register definition and adds
the required changes to skip PoU operations if the hardware reports
CTR_EL0.IDC and/or CTR_EL0.DIC.

CTR_EL0.DIC, bit [29]: Instruction cache invalidation requirements for
 instruction to data coherence.
  0: Instruction cache invalidation to the point of unification
     is required for instruction to data coherence.
  1: Instruction cache invalidation to the point of unification is
     not required for instruction to data coherence.

CTR_EL0.IDC, bit [28]: Data cache clean requirements for instruction to
 data coherence.
  0: Data cache clean to the point of unification is required for
     instruction to data coherence, unless CLIDR_EL1.LoC == 0b000
     or (CLIDR_EL1.LoUIS == 0b000 && CLIDR_EL1.LoUU == 0b000).
  1: Data cache clean to the point of unification is not required
     for instruction to data coherence.
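As a cross-check of the field layout, a user-space sketch that decodes a CTR_EL0 value the same way the *_line_size macros below do (4 bytes per word shifted by the log2 field) and reports whether the IDC/DIC bits allow the PoU loops to be skipped. The helper names are hypothetical; CTR_IDC_SHIFT matches the name used in dcache_by_line_op. The sketch ignores the CLIDR_EL1 exception for IDC == 0, and the kernel uses the system-wide safe value (read_ctr) rather than one CPU's raw register.

```c
#include <stdbool.h>
#include <stdint.h>

#define CTR_IMINLINE_MASK	0xfu	/* bits [3:0] */
#define CTR_DMINLINE_SHIFT	16	/* bits [19:16] */
#define CTR_IDC_SHIFT		28
#define CTR_DIC_SHIFT		29

/* Same arithmetic as "ubfm tmp, ctr, #16, #19; mov reg, #4; lsl reg, reg, tmp". */
static unsigned int ctr_dcache_line_bytes(uint64_t ctr)
{
	return 4u << ((ctr >> CTR_DMINLINE_SHIFT) & 0xf);
}

/* Same arithmetic as "and tmp, ctr, #0xf; mov reg, #4; lsl reg, reg, tmp". */
static unsigned int ctr_icache_line_bytes(uint64_t ctr)
{
	return 4u << (ctr & CTR_IMINLINE_MASK);
}

/* IDC == 1: the DC CVAU loop is not needed for I/D coherence. */
static bool ctr_skip_dc_cvau(uint64_t ctr)
{
	return (ctr >> CTR_IDC_SHIFT) & 1;
}

/* DIC == 1: the IC IVAU loop is not needed for I/D coherence. */
static bool ctr_skip_ic_ivau(uint64_t ctr)
{
	return (ctr >> CTR_DIC_SHIFT) & 1;
}
```

For an example value of 0x84448004, both minimum line sizes decode to 64 bytes and neither IDC nor DIC is set, so both maintenance loops would still run.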

Signed-off-by: Philip Elcan 
Signed-off-by: Shanker Donthineni 
---
 arch/arm64/include/asm/assembler.h | 48 --
 arch/arm64/include/asm/cache.h |  2 ++
 arch/arm64/kernel/cpufeature.c |  2 ++
 arch/arm64/mm/cache.S  | 26 ++---
 4 files changed, 51 insertions(+), 27 deletions(-)

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index 3c78835..9eaa948 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -30,6 +30,7 @@
 #include 
 #include 
 #include 
+#include 
 
.macro save_and_disable_daif, flags
mrs \flags, daif
@@ -334,9 +335,9 @@
  * raw_dcache_line_size - get the minimum D-cache line size on this CPU
  * from the CTR register.
  */
-   .macro  raw_dcache_line_size, reg, tmp
-   mrs \tmp, ctr_el0   // read CTR
-   ubfm \tmp, \tmp, #16, #19   // cache line size encoding
+   .macro  raw_dcache_line_size, reg, tmp, ctr
+   mrs \ctr, ctr_el0   // read CTR
+   ubfm \tmp, \ctr, #16, #19   // cache line size encoding
    mov \reg, #4    // bytes per word
    lsl \reg, \reg, \tmp    // actual cache line size
    .endm
@@ -344,9 +345,9 @@
 /*
  * dcache_line_size - get the safe D-cache line size across all CPUs
  */
-   .macro  dcache_line_size, reg, tmp
-   read_ctr \tmp
-   ubfm \tmp, \tmp, #16, #19   // cache line size encoding
+   .macro  dcache_line_size, reg, tmp, ctr
+   read_ctr \ctr
+   ubfm \tmp, \ctr, #16, #19   // cache line size encoding
    mov \reg, #4    // bytes per word
    lsl \reg, \reg, \tmp    // actual cache line size
    .endm
@@ -355,9 +356,9 @@
  * raw_icache_line_size - get the minimum I-cache line size on this CPU
  * from the CTR register.
  */
-   .macro  raw_icache_line_size, reg, tmp
-   mrs \tmp, ctr_el0   // read CTR
-   and \tmp, \tmp, #0xf    // cache line size encoding
+   .macro  raw_icache_line_size, reg, tmp, ctr
+   mrs \ctr, ctr_el0   // read CTR
+   and \tmp, \ctr, #0xf    // cache line size encoding
    mov \reg, #4    // bytes per word
    lsl \reg, \reg, \tmp    // actual cache line size
    .endm
@@ -365,9 +366,9 @@
 /*
  * icache_line_size - get the safe I-cache line size across all CPUs
  */
-   .macro  icache_line_size, reg, tmp
-   read_ctr \tmp
-   and \tmp, \tmp, #0xf    // cache line size encoding
+   .macro  icache_line_size, reg, tmp, ctr
+   read_ctr \ctr
+   and \tmp, \ctr, #0xf    // cache line size encoding
    mov \reg, #4    // bytes per word
    lsl \reg, \reg, \tmp    // actual cache line size
    .endm
@@ -408,13 +409,21 @@
  * size:   size of the region
  * Corrupts:   kaddr, size, tmp1, tmp2
  */
-   .macro dcache_by_line_op op, domain, kaddr, size, tmp1, tmp2
-   dcache_line_size \tmp1, \tmp2
+   .macro dcache_by_line_op op, domain, kaddr, size, tmp1, tmp2, tmp3
+   dcache_line_size \tmp1, \tmp2, \tmp3
add \size, \kaddr, \size
sub \tmp2, \tmp1, #1
bic \kaddr, \kaddr, \tmp2
 9998:
-   .if (\op == cvau || \op == cvac)
+   .if (\op == cvau)
+alternative_if_not ARM64_WORKAROUND_CLEAN_CACHE
+   tbnz \tmp3, #CTR_IDC_SHIFT, 9997f
+   dc  cvau, \kaddr
+alternative_else
+   dc  civac, \kaddr
+   nop
+alternative_endif
+   .elseif (\op == cvac)
 alternative_if_not ARM64_WORKAROUND_CLEAN_CACHE
dc  \o

Re: [PATCH v3 1/1] mm: page_alloc: skip over regions of invalid pfns on UMA

2018-02-16 Thread Andrew Morton

On Mon, 12 Feb 2018 19:47:59 +0100 Michal Hocko  wrote:

> > prerequisite for this is to reach some agreement on what people think is
> > the best option, which I feel didn't occur yet.
> 
> I do not have a _strong_ preference here as well. So I will leave the
> decision to you.
> 
> In any case feel free to add
> Acked-by: Michal Hocko 

I find Michal's version to be a little tidier.

Eugeniu, please send Michal's patch at me with a fresh changelog, with
your signed-off-by and your tested-by and your reported-by and we may
as well add Michal's (thus-far-missing) signed-off-by ;)

Re: [PATCH] tools/memory-model: remove rb-dep, smp_read_barrier_depends, and lockless_dereference

2018-02-16 Thread Paul E. McKenney

On Fri, Feb 16, 2018 at 05:22:55PM -0500, Alan Stern wrote:
> Since commit 76ebbe78f739 ("locking/barriers: Add implicit
> smp_read_barrier_depends() to READ_ONCE()") was merged for the 4.15
> kernel, it has not been necessary to use smp_read_barrier_depends().
> Similarly, commit 59ecbbe7b31c ("locking/barriers: Kill
> lockless_dereference()") removed lockless_dereference() from the
> kernel.
> 
> Since these primitives are no longer part of the kernel, they do not
> belong in the Linux Kernel Memory Consistency Model.  This patch
> removes them, along with the internal rb-dep relation, and updates the
> relevant documentation.
> 
> Signed-off-by: Alan Stern 

I queued this, but would welcome an update that addressed Akira's
feedback as appropriate.

Thanx, Paul

> ---
> 
> Index: usb-4.x/tools/memory-model/linux-kernel.cat
> ===
> --- usb-4.x/tools/memory-model.orig/linux-kernel.cat
> +++ usb-4.x/tools/memory-model/linux-kernel.cat
> @@ -25,7 +25,6 @@ include "lock.cat"
>  (***)
> 
>  (* Fences *)
> -let rb-dep = [R] ; fencerel(Rb_dep) ; [R]
>  let rmb = [R \ Noreturn] ; fencerel(Rmb) ; [R \ Noreturn]
>  let wmb = [W] ; fencerel(Wmb) ; [W]
>  let mb = ([M] ; fencerel(Mb) ; [M]) |
> @@ -61,11 +60,9 @@ let dep = addr | data
>  let rwdep = (dep | ctrl) ; [W]
>  let overwrite = co | fr
>  let to-w = rwdep | (overwrite & int)
> -let rrdep = addr | (dep ; rfi)
> -let strong-rrdep = rrdep+ & rb-dep
> -let to-r = strong-rrdep | rfi-rel-acq
> +let to-r = addr | (dep ; rfi) | rfi-rel-acq
>  let fence = strong-fence | wmb | po-rel | rmb | acq-po
> -let ppo = rrdep* ; (to-r | to-w | fence)
> +let ppo = to-r | to-w | fence
> 
>  (* Propagation: Ordering from release operations and strong fences. *)
>  let A-cumul(r) = rfe? ; r
> Index: usb-4.x/tools/memory-model/Documentation/explanation.txt
> ===
> --- usb-4.x/tools/memory-model.orig/Documentation/explanation.txt
> +++ usb-4.x/tools/memory-model/Documentation/explanation.txt
> @@ -1,5 +1,5 @@
> -Explanation of the Linux-Kernel Memory Model
> -
> +Explanation of the Linux-Kernel Memory Consistency Model
> +
> 
>  :Author: Alan Stern 
>  :Created: October 2017
> @@ -35,25 +35,24 @@ Explanation of the Linux-Kernel Memory M
>  INTRODUCTION
>  
> 
> -The Linux-kernel memory model (LKMM) is rather complex and obscure.
> -This is particularly evident if you read through the linux-kernel.bell
> -and linux-kernel.cat files that make up the formal version of the
> -memory model; they are extremely terse and their meanings are far from
> -clear.
> +The Linux-kernel memory consistency model (LKMM) is rather complex and
> +obscure.  This is particularly evident if you read through the
> +linux-kernel.bell and linux-kernel.cat files that make up the formal
> +version of the model; they are extremely terse and their meanings are
> +far from clear.
> 
>  This document describes the ideas underlying the LKMM.  It is meant
> -for people who want to understand how the memory model was designed.
> -It does not go into the details of the code in the .bell and .cat
> -files; rather, it explains in English what the code expresses
> -symbolically.
> +for people who want to understand how the model was designed.  It does
> +not go into the details of the code in the .bell and .cat files;
> +rather, it explains in English what the code expresses symbolically.
> 
>  Sections 2 (BACKGROUND) through 5 (ORDERING AND CYCLES) are aimed
> -toward beginners; they explain what memory models are and the basic
> -notions shared by all such models.  People already familiar with these
> -concepts can skim or skip over them.  Sections 6 (EVENTS) through 12
> -(THE FROM_READS RELATION) describe the fundamental relations used in
> -many memory models.  Starting in Section 13 (AN OPERATIONAL MODEL),
> -the workings of the LKMM itself are covered.
> +toward beginners; they explain what memory consistency models are and
> +the basic notions shared by all such models.  People already familiar
> +with these concepts can skim or skip over them.  Sections 6 (EVENTS)
> +through 12 (THE FROM_READS RELATION) describe the fundamental
> +relations used in many models.  Starting in Section 13 (AN OPERATIONAL
> +MODEL), the workings of the LKMM itself are covered.
> 
>  Warning: The code examples in this document are not written in the
>  proper format for litmus tests.  They don't include a header line, the
> @@ -827,8 +826,8 @@ A-cumulative; they only affect the propa
>  executed on C before the fence (i.e., those which precede the fence in
>  program order).
> 
> -smp_read_barrier_depends(), rcu_read_lock(), rcu_read_unlock(), and
> -synchronize_rcu() fences have other properties which we discuss later.
> +read_

Re: [PATCH] lib: Rename compiler intrinsic selects to GENERIC_LIB_*

2018-02-16 Thread Palmer Dabbelt


On Tue, 13 Feb 2018 14:49:37 PST (-0800), jho...@kernel.org wrote:

On Tue, Feb 13, 2018 at 01:48:18PM -0800, Palmer Dabbelt wrote:

On Fri, 09 Feb 2018 05:22:52 PST (-0800), matt.redfe...@mips.com wrote:
> When these are included into arch Kconfig files, maintaining
> alphabetical ordering of the selects means these get split up. To allow
> for keeping things tidier and alphabetical, rename the selects to
> GENERIC_LIB_*
>
> Signed-off-by: Matt Redfearn 

Thanks!  Do you want me to take this in my tree?

Reviewed-by: Palmer Dabbelt 


Since a new version of the "MIPS: use generic GCC library routines from
lib/" series would depend on it, and it makes sense for that series to
go via the MIPS tree, I think it would be simpler for this patch to also
be taken (with your ack) via the MIPS tree. Is that okay?


That's great, thanks!



Thanks
James



> ---
>  arch/riscv/Kconfig |  6 +++---
>  lib/Kconfig| 12 ++--
>  lib/Makefile   | 12 ++--
>  3 files changed, 15 insertions(+), 15 deletions(-)
>
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index 2c6adf12713a..5f1e2188d029 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -99,9 +99,9 @@ config ARCH_RV32I
>bool "RV32I"
>select CPU_SUPPORTS_32BIT_KERNEL
>select 32BIT
> -  select GENERIC_ASHLDI3
> -  select GENERIC_ASHRDI3
> -  select GENERIC_LSHRDI3
> +  select GENERIC_LIB_ASHLDI3
> +  select GENERIC_LIB_ASHRDI3
> +  select GENERIC_LIB_LSHRDI3
>
>  config ARCH_RV64I
>bool "RV64I"
> diff --git a/lib/Kconfig b/lib/Kconfig
> index c5e84fbcb30b..946d0890aad6 100644
> --- a/lib/Kconfig
> +++ b/lib/Kconfig
> @@ -584,20 +584,20 @@ config STRING_SELFTEST
>
>  endmenu
>
> -config GENERIC_ASHLDI3
> +config GENERIC_LIB_ASHLDI3
>bool
>
> -config GENERIC_ASHRDI3
> +config GENERIC_LIB_ASHRDI3
>bool
>
> -config GENERIC_LSHRDI3
> +config GENERIC_LIB_LSHRDI3
>bool
>
> -config GENERIC_MULDI3
> +config GENERIC_LIB_MULDI3
>bool
>
> -config GENERIC_CMPDI2
> +config GENERIC_LIB_CMPDI2
>bool
>
> -config GENERIC_UCMPDI2
> +config GENERIC_LIB_UCMPDI2
>bool
> diff --git a/lib/Makefile b/lib/Makefile
> index d11c48ec8ffd..7e1ef77e86a3 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -252,9 +252,9 @@ obj-$(CONFIG_SBITMAP) += sbitmap.o
>  obj-$(CONFIG_PARMAN) += parman.o
>
>  # GCC library routines
> -obj-$(CONFIG_GENERIC_ASHLDI3) += ashldi3.o
> -obj-$(CONFIG_GENERIC_ASHRDI3) += ashrdi3.o
> -obj-$(CONFIG_GENERIC_LSHRDI3) += lshrdi3.o
> -obj-$(CONFIG_GENERIC_MULDI3) += muldi3.o
> -obj-$(CONFIG_GENERIC_CMPDI2) += cmpdi2.o
> -obj-$(CONFIG_GENERIC_UCMPDI2) += ucmpdi2.o
> +obj-$(CONFIG_GENERIC_LIB_ASHLDI3) += ashldi3.o
> +obj-$(CONFIG_GENERIC_LIB_ASHRDI3) += ashrdi3.o
> +obj-$(CONFIG_GENERIC_LIB_LSHRDI3) += lshrdi3.o
> +obj-$(CONFIG_GENERIC_LIB_MULDI3) += muldi3.o
> +obj-$(CONFIG_GENERIC_LIB_CMPDI2) += cmpdi2.o
> +obj-$(CONFIG_GENERIC_LIB_UCMPDI2) += ucmpdi2.o

Re: [PATCH] z3fold: limit use of stale list for allocation

2018-02-16 Thread Andrew Morton

On Sat, 10 Feb 2018 12:02:52 +0100 Vitaly Wool  wrote:

> Currently if z3fold couldn't find an unbuddied page it would first
> try to pull a page off the stale list. The problem with this
> approach is that we can't 100% guarantee that the page is not
> processed by the workqueue thread at the same time unless we run
> cancel_work_sync() on it, which we can't do if we're in an atomic
> context. So let's just limit stale list usage to non-atomic
> contexts only.

This smells like a bugfix.  What are the end-user visible effects of
the bug?

[PATCH v2/rollup] headers: untangle kmemleak.h from mm.h

2018-02-16 Thread Randy Dunlap

From: Randy Dunlap 

Currently <linux/slab.h> #includes <linux/kmemleak.h> for no obvious
reason. It looks like it's only a convenience, so remove kmemleak.h
from slab.h and add <linux/kmemleak.h> to any users of kmemleak_*
that don't already #include it.
Also remove <linux/kmemleak.h> from source files that do not use it.

This is tested on i386 allmodconfig and x86_64 allmodconfig. It
would be good to run it through the 0day bot for other $ARCHes.
I have neither the horsepower nor the storage space for the other
$ARCHes.

Update: This patch has been extensively build-tested by both the 0day
bot & kisskb/ozlabs build farms. Both of them reported 2 build failures
for which patches are included here (in v2).

[slab.h is the second most used header file after module.h; kernel.h
is right there with slab.h. There could be some minor error in the
counting due to some #includes having comments after them and I
didn't combine all of those.]

Signed-off-by: Randy Dunlap 
Reviewed-by: Ingo Molnar 
Cc: Wei Yongjun 
Cc: Luis R. Rodriguez 
Cc: Greg Kroah-Hartman 
Cc: Mimi Zohar 
Cc: John Johansen 
Link: http://kisskb.ellerman.id.au/kisskb/head/13396/
Reported-by: Michael Ellerman  # 2 build failures
Reported-by: Fengguang Wu  # 2 build failures
---

v2: add patches for build failures in lib/test_firmware.c and
security/integrity/digsig.c.
  + add Ingo's Reviewed-by: tag.

 arch/powerpc/sysdev/dart_iommu.c  |1 +
 arch/powerpc/sysdev/msi_bitmap.c  |1 +
 arch/s390/kernel/nmi.c|2 +-
 arch/s390/kernel/smp.c|1 -
 arch/sparc/kernel/irq_64.c|1 -
 arch/x86/kernel/pci-dma.c |1 -
 drivers/iommu/exynos-iommu.c  |1 +
 drivers/iommu/mtk_iommu_v1.c  |1 -
 drivers/net/ethernet/ti/cpsw.c|1 +
 drivers/net/wireless/realtek/rtlwifi/pci.c|1 -
 drivers/net/wireless/realtek/rtlwifi/rtl8192c/fw_common.c |1 -
 drivers/staging/rtl8188eu/hal/fw.c|2 +-
 drivers/staging/rtlwifi/pci.c |1 -
 drivers/virtio/virtio_ring.c  |1 -
 include/linux/slab.h  |1 -
 kernel/ucount.c   |1 +
 lib/test_firmware.c   |1 +
 mm/cma.c  |1 +
 mm/memblock.c |1 +
 net/core/sysctl_net_core.c|1 -
 net/ipv4/route.c  |1 -
 security/apparmor/lsm.c   |1 -
 security/integrity/digsig.c   |1 +
 23 files changed, 11 insertions(+), 14 deletions(-)

--- lnx-416-rc1.orig/include/linux/slab.h
+++ lnx-416-rc1/include/linux/slab.h
@@ -125,7 +125,6 @@
 #define ZERO_OR_NULL_PTR(x) ((unsigned long)(x) <= \
(unsigned long)ZERO_SIZE_PTR)
 
-#include 
 #include 
 
 struct mem_cgroup;
--- lnx-416-rc1.orig/kernel/ucount.c
+++ lnx-416-rc1/kernel/ucount.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #define UCOUNTS_HASHTABLE_BITS 10
--- lnx-416-rc1.orig/mm/memblock.c
+++ lnx-416-rc1/mm/memblock.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
--- lnx-416-rc1.orig/mm/cma.c
+++ lnx-416-rc1/mm/cma.c
@@ -35,6 +35,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "cma.h"
--- lnx-416-rc1.orig/drivers/staging/rtl8188eu/hal/fw.c
+++ lnx-416-rc1/drivers/staging/rtl8188eu/hal/fw.c
@@ -30,7 +30,7 @@
 #include "rtl8188e_hal.h"
 
 #include 
-#include 
+#include 
 
 static void _rtl88e_enable_fw_download(struct adapter *adapt, bool enable)
 {
--- lnx-416-rc1.orig/drivers/iommu/exynos-iommu.c
+++ lnx-416-rc1/drivers/iommu/exynos-iommu.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
--- lnx-416-rc1.orig/arch/s390/kernel/nmi.c
+++ lnx-416-rc1/arch/s390/kernel/nmi.c
@@ -15,7 +15,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include 
 #include 
 #include 
--- lnx-416-rc1.orig/arch/powerpc/sysdev/dart_iommu.c
+++ lnx-416-rc1/arch/powerpc/sysdev/dart_iommu.c
@@ -38,6 +38,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
--- lnx-416-rc1.orig/arch/powerpc/sysdev/msi_bitmap.c
+++ lnx-416-rc1/arch/powerpc/sysdev/msi_bitmap.c
@@ -10,6 +10,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
--- lnx-416-rc1.orig/drivers/net/ethernet/ti/cpsw.c
+++ lnx-416-rc1/drivers/net/ethernet/ti/cpsw.c
@@ -35,6 +35,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
--- lnx-416-rc1.orig/drivers/virtio/virtio_ring.c
+++ lnx-416-rc1/drivers/virtio/virtio_ring.c
@@ -23,7 +23,6 @

Re: [PATCH v3 6/6] drm/rockchip: cdn-dp: remove the DP phy switch

2018-02-16 Thread Heiko Stuebner

Am Freitag, 16. Februar 2018, 13:09:56 CET schrieb Enric Balletbo i Serra:
> From: Chris Zhong 
> 
> There are 2 Type-C PHYs in RK3399, but only one DP controller. Hence
> only one PHY can connect to the DP controller at a time; the other should
> be disconnected. The GRF_SOC_CON26 register has a switch bit to do this:
> setting the bit enables PHY 1, clearing it enables PHY 0.
> 
> If the board has 2 Type-C ports, the DP driver gets the PHY id from
> devm_of_phy_get_by_index, and then controls this switch according to
> this id. But some other boards have only one Type-C port, which may be PHY 0
> or PHY 1. The dts node id cannot tell us the correct PHY id. Hence move
> this switch to the PHY driver, which can distinguish between PHY 0
> and PHY 1 and then write the correct register bit.
> 
> Signed-off-by: Chris Zhong 
> Signed-off-by: Enric Balletbo i Serra 

Reviewed-by: Heiko Stuebner

Re: [PATCH v3 4/6] phy: rockchip-typec: force to USB2 if DP at 4 lanes mode

2018-02-16 Thread Heiko Stuebner

Am Freitag, 16. Februar 2018, 13:09:54 CET schrieb Enric Balletbo i Serra:
> From: Chris Zhong 
> 
> The usb3tousb2_en bit is cleared to 0 in probe(), which makes the USB
> controller work in USB3 mode. If the USB PHY is then turned on in DP-only
> mode (4-lane DP), rockchip_usb3_phy_power_on() returns early, so the
> usb3_host_disable and usb3_host_port bits keep the values set by
> coreboot. In coreboot, these 3 bits are set for USB2 mode, but now one of
> them has been changed to USB3, which leaves the USB controller in an
> unknown state.
> 
> These 3 bits should be set to USB2 mode if the Type-C port works in
> 4-lane mode, and switched back to USB3 mode when USB disconnects.
> 
> Signed-off-by: Chris Zhong 
> Signed-off-by: Enric Balletbo i Serra 

Reviewed-by: Heiko Stuebner

Re: [PATCH v3 5/6] phy: rockchip-typec: support DP phy switch

2018-02-16 Thread Heiko Stuebner

Am Freitag, 16. Februar 2018, 13:09:55 CET schrieb Enric Balletbo i Serra:
> From: Chris Zhong 
> 
> There are 2 Type-C PHYs in RK3399, but only one DP controller. Hence
> only one PHY can connect to the DP controller at a time; the other should
> be disconnected. The GRF_SOC_CON26 register has a switch bit to do this:
> setting the bit enables PHY 1, clearing it enables PHY 0.
> 
> Signed-off-by: Chris Zhong 
> Signed-off-by: Enric Balletbo i Serra 
> ---

Reviewed-by: Heiko Stuebner
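A sketch of what writing that switch bit amounts to, assuming the usual Rockchip GRF convention in which bits [31:16] are write-enable bits for bits [15:0], so a single write changes only the selected bit. The bit position is a parameter because this message does not give it; grf_hiword_update() and dp_phy_select_value() are hypothetical helpers, and a driver would hand the result to a regmap write on the GRF.

```c
#include <stdint.h>

/*
 * Hiword-mask write value: enable writes to `bit` (upper half) and set
 * or clear it (lower half). Valid for bit < 16.
 */
static uint32_t grf_hiword_update(unsigned int bit, int set)
{
	return (UINT32_C(1) << (bit + 16)) |
	       ((set ? UINT32_C(1) : UINT32_C(0)) << bit);
}

/* Hypothetical: route the DP controller to PHY 1 (bit set) or PHY 0 (clear). */
static uint32_t dp_phy_select_value(unsigned int sel_bit, int phy_id)
{
	return grf_hiword_update(sel_bit, phy_id == 1);
}
```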

Re: [PATCH] Make kernel taint on invalid module signatures configurable

2018-02-16 Thread Matthew Garrett

On Fri, Feb 16, 2018 at 12:25 AM Philipp Hahn  wrote:
> Sadly didn't work for me :-(
> If my understanding is correct and if that would work, Debian (and
> others) could load their public key into Shim and then use the
> associated private key for signing their modules.

This works for UEFI systems, but distributions have to support non-UEFI as
well.

Re: [PATCH v3 3/6] phy: rockchip-typec: enable usb3 host during usb3 phy power on

2018-02-16 Thread Heiko Stuebner

Am Freitag, 16. Februar 2018, 13:09:53 CET schrieb Enric Balletbo i Serra:
> From: William wu 
> 
> We have forced usb3 to work in usb2 only mode in firmware by setting
> usb3tousb2_en (bit3 of GRF_USB3PHY0/1_CON0) to 1, and setting
> host_u3_port_disable (bit0 of GRF_USB3OTG0/1_CON1) to 1 and host_u3_port
> (bit15~12 of GRF_USB3OTG0/1_CON1) to 0. So we need to re-enable usb3
> host.
> 
> Note that the RK3399 TRM suggests that we should keep the whole usb3
> controller in reset for the duration of the Type-C PHY initialization.
> However, it's hard to assert the reset in the current reset
> framework. And according to the TRM, we are not required to clear the
> usb3tousb2 bit before the pipe is ready. So let's enable the usb3 host
> after the pipe is ready to avoid the Type-C PHY initialization failure.
> 
> Signed-off-by: William wu 
> Signed-off-by: Enric Balletbo i Serra 

Reviewed-by: Heiko Stuebner
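The DT properties documented elsewhere in this series describe each field as an (offset, bit, write-mask bit) triplet, e.g. <0xe580 3 19> for usb3tousb2-en on PHY 0. Rockchip GRF registers use the upper 16 bits as per-bit write enables (bit N+16 gates a write to bit N), so clearing usb3tousb2_en (bit 3) means writing write-enable bit 19 with bit 3 left at zero. A minimal sketch of that encoding, assuming the hiword-mask convention; grf_field_val is a hypothetical helper, not the driver's code:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: build the value to write to a Rockchip-style GRF
 * register. Setting wmask_bit lets the hardware update `bit`; all other
 * fields keep their value because their write-enable bits stay zero.
 */
static uint32_t grf_field_val(unsigned int bit, unsigned int wmask_bit, bool set)
{
	uint32_t val = UINT32_C(1) << wmask_bit;	/* allow the write */

	if (set)
		val |= UINT32_C(1) << bit;		/* new field value */
	return val;
}
```

For example, grf_field_val(3, 19, false) yields 0x00080000, which clears usb3tousb2_en without disturbing any other bit in the register.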

Re: [PATCH v3 2/6] dt-bindings: phy-rockchip-typec: deprecate some register properties.

2018-02-16 Thread Heiko Stuebner

Am Freitag, 16. Februar 2018, 13:09:52 CET schrieb Enric Balletbo i Serra:
> Now that the following register properties are in the driver, document
> them as deprecated and recommend not using them in new bindings.
> 
> The deprecated properties are:
> 
> - rockchip,typec-conn-dir : the register of type-c connector direction
> - rockchip,usb3tousb2-en : the register of type-c force usb3 to usb2
>enable control.
> - rockchip,external-psm : the register of type-c phy external psm clock
>   selection.
> - rockchip,pipe-status : the register of type-c phy pipe status.
> 
> Signed-off-by: Enric Balletbo i Serra 

Reviewed-by: Heiko Stuebner

Re: [PATCH v3 1/6] phy: rockchip-typec: deprecate some DT properties for various register fields.

2018-02-16 Thread Heiko Stuebner

Am Freitag, 16. Februar 2018, 13:09:51 CET schrieb Enric Balletbo i Serra:
> Adding properties for various register fields in the DT doesn't scale and
> this information should be in the driver instead.
> 
> Before this patch these registers (description below) were specified in
> the DT, every register node contained 3 sections: offset, enable bit,
> write mask bit.
> 
>  - rockchip,typec-conn-dir : the register of type-c connector direction,
>for type-c phy0, it must be <0xe580 0 16>;
>for type-c phy1, it must be <0xe58c 0 16>;
>  - rockchip,usb3tousb2-en : the register of type-c force usb3 to usb2 enable
>control.
>for type-c phy0, it must be <0xe580 3 19>;
>for type-c phy1, it must be <0xe58c 3 19>;
>  - rockchip,external-psm : the register of type-c phy external psm clock
>selection.
>for type-c phy0, it must be <0xe588 14 30>;
>for type-c phy1, it must be <0xe594 14 30>;
>  - rockchip,pipe-status : the register of type-c phy pipe status.
>for type-c phy0, it must be <0xe5c0 0 0>;
>for type-c phy1, it must be <0xe5c0 16 16>;
> 
> After this patch these register definitions are in the driver, so they
> can be removed from the DT. Note that there are 2 type-c phys for RK3399 with
> different offsets, the driver checks the phy base address of the running
> instance and applies the right offsets.
> 
> Signed-off-by: Enric Balletbo i Serra 
> ---
> Changes since v2:
> - Suggested by Heiko Stuebner:
>   - Prefix phy config struct with rk3399_ as is rk3399-specific.
>   - Create a new struct similar to things like the inno-usb2-phy
>   - Select phy config according to the compatible and remove the
> specific constants.
> Changes since v1:
> - This patch is new in this series, with the aim of getting rid
>   of some registers in the DT. Suggested by Rob Herring.

looks great now
Reviewed-by: Heiko Stuebner
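The "driver checks the phy base address of the running instance and applies the right offsets" approach can be sketched as a per-instance table. The field triplets below are the ones listed in the commit message; the struct and helper names are hypothetical, and the base addresses 0xff7c0000 and 0xff800000 are assumed to be the RK3399 Type-C PHY register bases.

```c
#include <stddef.h>
#include <stdint.h>

/* One register field: GRF offset, bit position, write-mask bit. */
struct usb3phy_reg {
	uint32_t offset;
	uint32_t enable_bit;
	uint32_t write_enable;
};

/* Hypothetical per-instance config, values taken from the DT text above. */
struct rk3399_typec_phy_cfg {
	uintptr_t reg;				/* PHY base address (assumed) */
	struct usb3phy_reg typec_conn_dir;
	struct usb3phy_reg usb3tousb2_en;
	struct usb3phy_reg external_psm;
	struct usb3phy_reg pipe_status;
};

static const struct rk3399_typec_phy_cfg rk3399_cfgs[] = {
	{
		.reg            = 0xff7c0000,
		.typec_conn_dir = { 0xe580, 0, 16 },
		.usb3tousb2_en  = { 0xe580, 3, 19 },
		.external_psm   = { 0xe588, 14, 30 },
		.pipe_status    = { 0xe5c0, 0, 0 },
	},
	{
		.reg            = 0xff800000,
		.typec_conn_dir = { 0xe58c, 0, 16 },
		.usb3tousb2_en  = { 0xe58c, 3, 19 },
		.external_psm   = { 0xe594, 14, 30 },
		.pipe_status    = { 0xe5c0, 16, 16 },
	},
};

/* Return the config whose base address matches, or NULL if none does. */
static const struct rk3399_typec_phy_cfg *find_cfg(uintptr_t base)
{
	for (size_t i = 0; i < sizeof(rk3399_cfgs) / sizeof(rk3399_cfgs[0]); i++)
		if (rk3399_cfgs[i].reg == base)
			return &rk3399_cfgs[i];
	return NULL;
}
```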

Re: [PATCH 4/4] fs/dcache: Avoid the try_lock loops in dentry_kill()

2018-02-16 Thread Linus Torvalds

On Fri, Feb 16, 2018 at 3:49 PM, John Ogness  wrote:
>
> After reading your initial feedback my idea was to change both
> lock_parent() and dentry_lock_inode() to not only communicate _if_ the
> lock was successful, but also if d_lock was dropped in the process. (For
> example, with a tristate rather than boolean return value.) Then callers
> would know if they needed to recheck the dentry contents.

So I think that would work well for your dentry_lock_inode() helper,
and might indeed solve my reaction to your dentry_kill() patch.

I suspect it doesn't work for lock_parent(), because that has to also
return the parent itself. So you'd have to add another way to say
"didn't need to drop dentry lock". I suspect it gets ugly real fast.

But yes, making dentry_lock_inode() return 0/1/2 for "fail/fast/slow"
(or whatever) sounds doable. And then your dentry_kill() patch can use
a "switch ()" to handle the cases, and the whole "need to revalidate"
might become pretty clear and clean.

I'd suggest you ignore lock_parent() for now. Unless you come up with
something clever.

Linus
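A minimal userspace sketch of the "0/1/2 for fail/fast/slow" idea, with pthread mutexes standing in for d_lock and i_lock. All names (obj, obj_lock_inode, kill_obj) are hypothetical; the point is that the caller's switch makes the "need to revalidate" case explicit.

```c
#include <pthread.h>

/* Hypothetical tristate result for the inode-locking helper. */
enum lock_result {
	LOCK_FAILED = 0,	/* could not take the inode lock */
	LOCK_FAST   = 1,	/* got it without dropping d_lock */
	LOCK_SLOW   = 2,	/* had to drop and re-take d_lock */
};

struct obj {
	pthread_mutex_t d_lock;		/* stands in for dentry->d_lock */
	pthread_mutex_t *inode_lock;	/* stands in for inode->i_lock, may be NULL */
};

/* Called with o->d_lock held; returns with it held. */
static enum lock_result obj_lock_inode(struct obj *o)
{
	pthread_mutex_t *il = o->inode_lock;

	if (!il)
		return LOCK_FAILED;
	if (pthread_mutex_trylock(il) == 0)
		return LOCK_FAST;

	/* Respect lock order: drop d_lock, take the inode lock, re-take d_lock. */
	pthread_mutex_unlock(&o->d_lock);
	pthread_mutex_lock(il);
	pthread_mutex_lock(&o->d_lock);
	if (o->inode_lock != il) {	/* changed while d_lock was dropped */
		pthread_mutex_unlock(il);
		return LOCK_FAILED;
	}
	return LOCK_SLOW;
}

static int kill_obj(struct obj *o)
{
	switch (obj_lock_inode(o)) {
	case LOCK_FAILED:
		return -1;		/* caller retries or bails out */
	case LOCK_SLOW:
		/* d_lock was dropped: re-validate anything read earlier here */
		/* fall through */
	case LOCK_FAST:
		pthread_mutex_unlock(o->inode_lock);	/* real code would kill first */
		return 0;
	}
	return -1;
}
```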

Re: [PATCH] proc/kpageflags: add KPF_WAITERS

2018-02-16 Thread Andrew Morton

On Sun, 11 Feb 2018 13:36:41 +0300 Konstantin Khlebnikov  wrote:

> KPF_WAITERS indicates tasks are waiting for a page lock or writeback.
> This might be a false positive; in that case the next unlock will clear it.

Well, kpageflags is full of potential false-positives.  Or do you think
this flag is especially vulnerable?

In other words, under what circumstances will we have KPF_WAITERS set
when PG_locked and PG_writeback are clear?

> This looks like worthwhile information, not only for kernel hacking.

Why?  What are the use-cases, in detail?  How are we to justify this
modification?

> In the page-types tool, in non-raw mode, treat KPF_WAITERS without
> KPF_LOCKED and KPF_WRITEBACK as a false positive and hide it.

>  fs/proc/page.c |1 +
>  include/uapi/linux/kernel-page-flags.h |1 +
>  tools/vm/page-types.c  |7 +++

Please update Documentation/vm/pagemap.txt.
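The page-types filtering described in the patch can be sketched as below. KPF_LOCKED (0) and KPF_WRITEBACK (8) are the existing bit numbers from include/uapi/linux/kernel-page-flags.h; the KPF_WAITERS bit number is an assumption, since the patch body isn't quoted here.

```c
#include <stdint.h>

#define KPF_LOCKED	0	/* include/uapi/linux/kernel-page-flags.h */
#define KPF_WRITEBACK	8	/* same header */
#define KPF_WAITERS	26	/* assumed bit number for the proposed flag */

#define KPF_BIT(n)	(UINT64_C(1) << (n))

/*
 * In non-raw mode, drop KPF_WAITERS when neither KPF_LOCKED nor
 * KPF_WRITEBACK is set, treating it as a stale (false-positive) bit.
 */
static uint64_t hide_stale_waiters(uint64_t flags)
{
	if ((flags & KPF_BIT(KPF_WAITERS)) &&
	    !(flags & (KPF_BIT(KPF_LOCKED) | KPF_BIT(KPF_WRITEBACK))))
		flags &= ~KPF_BIT(KPF_WAITERS);
	return flags;
}
```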

Re: [PATCH v3 08/11] watchdog/hpwdt: Programable Pretimeout NMI

2018-02-16 Thread Guenter Roeck

On Fri, Feb 16, 2018 at 04:46:17PM -0700, Jerry Hoemann wrote:
> On Fri, Feb 16, 2018 at 12:34:40PM -0800, Guenter Roeck wrote:
> > On Thu, Feb 15, 2018 at 04:43:57PM -0700, Jerry Hoemann wrote:
> > > Make whether or not the hpwdt watchdog delivers a pretimeout NMI
> > > programable by the user.
> > > 
> > > The underlying iLO hardware is programmable as to whether or not
> > > a pre-timeout NMI is delivered to the system before the iLO resets
> > > the system.  However, the iLO does not allow for programming the
> > > length of time that NMI is delivered before the system is reset.
> > > 
> > > Hence, in hpwdt_set_pretimeout, val == 0 disables the NMI. Any
> > > non-zero value sets the pretimeout length to what the hardware
> > > supports.
> > > 
> > > Signed-off-by: Jerry Hoemann 
> > > ---
> > >  drivers/watchdog/hpwdt.c | 42 --
> > >  1 file changed, 36 insertions(+), 6 deletions(-)
> > > 
> > > diff --git a/drivers/watchdog/hpwdt.c b/drivers/watchdog/hpwdt.c
> > > index da9a04101814..dc0ad20738ed 100644
> > > --- a/drivers/watchdog/hpwdt.c
> > > +++ b/drivers/watchdog/hpwdt.c
> > > @@ -28,12 +28,15 @@
> > >  #define TICKS_TO_SECS(ticks) ((ticks) * 128 / 1000)
> > >  #define HPWDT_MAX_TIMER  TICKS_TO_SECS(65535)
> > >  #define DEFAULT_MARGIN   30
> > > +#define PRETIMEOUT_SEC   9
> > >  
> > >  static unsigned int soft_margin = DEFAULT_MARGIN;/* in seconds */
> > > -static unsigned int reload;  /* the computed soft_margin */
> > >  static bool nowayout = WATCHDOG_NOWAYOUT;
> > >  #ifdef CONFIG_HPWDT_NMI_DECODING
> > >  static unsigned int allow_kdump = 1;
> > > +static bool pretimeout = 1;
> > > +#else
> > > +static bool pretimeout;
> > >  #endif
> > >  
> > static bool pretimeout = IS_ENABLED(CONFIG_HPWDT_NMI_DECODING);
> 
> ack. will do.
> 
> > 
> > >  static void __iomem *pci_mem_addr;   /* the PCI-memory address */
> > > @@ -55,10 +58,12 @@ static struct watchdog_device hpwdt_dev;
> > >   */
> > >  static int hpwdt_start(struct watchdog_device *dev)
> > >  {
> > > - reload = SECS_TO_TICKS(dev->timeout);
> > > + int control = 0x81 | (pretimeout ? 0x4 : 0);
> > > + int reload = SECS_TO_TICKS(dev->timeout);
> > >  
> > > + dev_dbg(dev->parent, "start watchdog 0x%08x:0x%02x\n", reload, control);
> > >   iowrite16(reload, hpwdt_timer_reg);
> > > - iowrite8(0x85, hpwdt_timer_con);
> > > + iowrite8(control, hpwdt_timer_con);
> > >  
> > >   return 0;
> > >  }
> > > @@ -67,6 +72,8 @@ static int hpwdt_stop(struct watchdog_device *dev)
> > >  {
> > >   unsigned long data;
> > >  
> > > + dev_dbg(dev->parent, "stop  watchdog\n");
> > > +
> > Unrelated.
> > 
> > >   data = ioread8(hpwdt_timer_con);
> > >   data &= 0xFE;
> > >   iowrite8(data, hpwdt_timer_con);
> > > @@ -75,8 +82,9 @@ static int hpwdt_stop(struct watchdog_device *dev)
> > >  
> > >  static int hpwdt_ping(struct watchdog_device *dev)
> > >  {
> > > - reload = SECS_TO_TICKS(dev->timeout);
> > > + int reload = SECS_TO_TICKS(dev->timeout);
> > >  
> > > + dev_dbg(dev->parent, "ping  watchdog 0x%08x\n", reload);
> > 
> > Unrelated. If you want to add debug messages, please do it
> > in a separate patch.
> 
> 
> Different patch, but same set?  I'll move these (and the ones from the
> earlier patch) to a new separate patch later in the set.
> 
> > 
> > >   iowrite16(reload, hpwdt_timer_reg);
> > >  
> > >   return 0;
> > > @@ -98,12 +106,21 @@ static int hpwdt_settimeout(struct watchdog_device *dev, unsigned int val)
> > >  }
> > >  
> > >  #ifdef CONFIG_HPWDT_NMI_DECODING /* { */
> > > +static int hpwdt_set_pretimeout(struct watchdog_device *dev, unsigned int val)
> > > +{
> > > + if (val && (val != PRETIMEOUT_SEC)) {
> > 
> > Unnecessary ( )
> 
> 
> There are several things going on here. I'm not sure which one the above
> comment is intended for.
> 
The "Unnecessary" refers to the ( ) around the second part of the expression
above. While there may be valid reasons to include extra ( ), I think we
can trust the C compiler to get it right here.

> While a pretimeout NMI isn't required by the HW to be enabled, if enabled the
> length of pretimeout is fixed by HW.
> 
> I didn't see anything in the API that would allow us to communicate to
> the user this "feature."  timeout at least has both min_timeout and
> max_timeout, but I didn't see anything similar for pretimeout.  I also
> don't think it's reasonable to fail here if the requested value is not 9,
> as the user really has no way of knowing what the valid range of
> pretimeout values is.  So I accept any non-zero value for pretimeout,
> but then set pretimeout to 9.
> 
> But at the same time, I don't like to silently change a human request
> w/o at least warning.
> 
Sorry, I lost you here.

> > 
> > The actual timeout can be a value smaller than 9 seconds.
> > Minimum is 1 second. What happens if the user configures
> > a timeout of less than 9

Re: [PATCH][V2] rtc: tx4939: avoid unintended sign extension on a 24 bit shift

2018-02-16 Thread Alexandre Belloni

On 15/02/2018 at 19:36:14 +, Colin King wrote:
> From: Colin Ian King 
> 
> When buf[5] is shifted left by 24 bits, it is first promoted to a
> 32 bit signed int, and the result is then sign-extended to an unsigned
> long. If the top bit of buf[5] is set, then all the upper bits of sec
> end up also being set because of the sign extension. Fix this by
> casting buf[5] to an unsigned long before the shift.
> 
> Detected by CoverityScan, CID#1465292 ("Unintended sign extension")
> 
> Fixes: 0e1492330cd2 ("rtc: add rtc-tx4939 driver")
> Signed-off-by: Colin Ian King 
> ---
>  drivers/rtc/rtc-tx4939.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
Applied, thanks.

-- 
Alexandre Belloni, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
http://bootlin.com
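The effect of the fix can be shown with a small sketch. The byte layout (seconds assembled little-endian from buf[2]..buf[5]) is an assumption for illustration; the cast on buf[5] is the part the patch adds.

```c
#include <stdint.h>

/*
 * Assemble a 32-bit seconds count from bytes buf[2]..buf[5] (layout
 * assumed). Without the cast, buf[5] is promoted to signed int before the
 * shift; for buf[5] >= 0x80 the result does not fit in int, and on 64-bit
 * targets it can end up sign-extended into the upper bits of the unsigned
 * long. Casting first keeps the whole expression unsigned.
 */
static unsigned long rtc_bytes_to_sec(const unsigned char *buf)
{
	return ((unsigned long)buf[5] << 24) | (buf[4] << 16) |
	       (buf[3] << 8) | buf[2];
}
```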

Re: [PATCH 08/23] kconfig: add 'macro' keyword to support user-defined function

2018-02-16 Thread Ulf Magnusson

On Fri, Feb 16, 2018 at 02:49:31PM -0500, Nicolas Pitre wrote:
> On Sat, 17 Feb 2018, Masahiro Yamada wrote:
> 
> > Now, we got a basic ability to test compiler capability in Kconfig.
> > 
> > config CC_HAS_STACKPROTECTOR
> > bool
> > default $(shell $CC -Werror -fstack-protector -c -x c /dev/null -o /dev/null)
> > 
> > This works, but it is ugly to repeat this long boilerplate.
> > 
> > We want to describe like this:
> > 
> > config CC_HAS_STACKPROTECTOR
> > bool
> > default $(cc-option -fstack-protector)
> > 
> > It is straightforward to implement a new function, but I do not like
> > to hard-code specialized functions like this.  Hence, here is another
> > feature to add functions from Kconfig files.
> > 
> > A user-defined function can be defined as a string type symbol with
> > a special keyword 'macro'.  It can be referenced in the same way as
> > built-in functions.  This feature was also inspired by Makefile where
> > user-defined functions are referenced by $(call func-name, args...),
> > but I omitted the 'call' to make it shorter.
> > 
> > The macro definition can contain $(1), $(2), ... which will be replaced
> > with arguments from the caller.
> > 
> > Example code:
> > 
> >   config cc-option
> >   string
> >   macro $(shell $CC -Werror $(1) -c -x c /dev/null -o /dev/null)
> 
> I think this syntax for defining a macro shouldn't start with the 
> "config" keyword, unless you want it to be part of the config symbol 
> space and land it in .config. And typing it as a "string" while it 
> actually returns y/n (hence a bool) is also strange.
> 
> What about this instead:
> 
> macro cc-option
>   bool $(shell $CC -Werror $(1) -c -x c /dev/null -o /dev/null)
> 
> This makes it easier to extend as well if need be.
> 
> 
> Nicolas

I haven't gone over the patchset in detail yet and might be missing
something here, but if this is just meant to be a textual shorthand,
then why give it a type at all?

Do you think a simpler syntax like this would make sense?

macro cc-option "$(shell $CC -Werror $(1) -c -x c /dev/null -o /dev/null)"

That's the most general version, where you could use it for other stuff
besides $(shell ...) as well, just to keep parity.

You could then always just expand $() as a string, and maybe spit out
"n" and "y" in the cases Linus suggested for $(shell ...). The existing
logic for constant symbols should then take care of converting that into
a tristate value where appropriate.

If you go with that and want to support $() outside quotes, then

$(foo)

would just be a shorthand for

"$(foo)"

Are there any cases where something more advanced than that might be
warranted (e.g., macros that expand to complete expressions)? It seems
pretty nice and nonmagical otherwise.

Cheers,
Ulf
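To make the proposed $(1), $(2), ... substitution concrete, here is a sketch of a plain textual expander in C. It is not Kconfig's implementation; expand_args is a hypothetical helper, and it follows Make's $(call) convention that references beyond the supplied arguments expand to nothing.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Replace "$(1)", "$(2)", ... in body with the caller's arguments and
 * copy everything else verbatim. Returns 0 on success, -1 if out is too
 * small to hold the result plus its terminating NUL.
 */
static int expand_args(const char *body, const char *const *args, int nargs,
		       char *out, size_t outsz)
{
	size_t len = 0;

	if (outsz == 0)
		return -1;
	while (*body) {
		const char *src = body;		/* default: one literal char */
		const char *next = body + 1;
		size_t n = 1;

		if (body[0] == '$' && body[1] == '(' &&
		    isdigit((unsigned char)body[2])) {
			char *end;
			long idx = strtol(body + 2, &end, 10);

			if (*end == ')') {
				src = (idx >= 1 && idx <= nargs) ? args[idx - 1] : "";
				n = strlen(src);
				next = end + 1;
			}
		}
		if (len + n >= outsz)
			return -1;
		memcpy(out + len, src, n);
		len += n;
		body = next;
	}
	out[len] = '\0';
	return 0;
}
```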

Re: [PATCH v2 01/10] drivers: qcom: rpmh-rsc: add RPMH controller for QCOM SoCs

2018-02-16 Thread Lina Iyer


Thanks Evan for your review.

On Fri, Feb 16 2018 at 21:30 +, Evan Green wrote:

Hi Lina,

On Thu, Feb 15, 2018 at 9:34 AM, Lina Iyer  wrote:



+
+/**
+ * tcs_response: Response object for a request


Can you embed the acronym definition, ie: tcs_response: Responses for
a Trigger Command Set.


Sure.


+ *
+ * @drv: the controller
+ * @msg: the request for this response
+ * @m: the tcs identifier
+ * @err: error reported in the response
+ * @list: link list object.
+ */
+struct tcs_response {
+   struct rsc_drv *drv;
+   struct tcs_request *msg;
+   u32 m;
+   int err;
+   struct list_head list;
+};
+
+/**
+ * tcs_group: group of TCSes for a request state
+ *


Document @drv.


OK


+ * @type: type of the TCS in this group - active, sleep, wake
+ * @tcs_mask: mask of the TCSes relative to all the TCSes in the RSC
+ * @tcs_offset: start of the TCS group relative to the TCSes in the RSC
+ * @num_tcs: number of TCSes in this type
+ * @ncpt: number of commands in each TCS
+ * @tcs_lock: lock for synchronizing this TCS writes
+ * @responses: response objects for requests sent from each TCS
+ */
+struct tcs_group {
+   struct rsc_drv *drv;
+   int type;
+   u32 tcs_mask;
+   u32 tcs_offset;
+   int num_tcs;
+   int ncpt;
+   spinlock_t tcs_lock;
+   struct tcs_response *responses[MAX_TCS_PER_TYPE];
+};
+
+/**
+ * rsc_drv: the RSC controller


Would be more helpfully described as Resource State Coordinator controller.


OK



+static void write_tcs_reg_sync(struct rsc_drv *drv, int reg, int m, int n,
+ u32 data)
+{
+   write_tcs_reg(drv, reg, m, n, data);
+   for (;;) {
+   if (data == read_tcs_reg(drv, reg, m, n))
+   break;
+   udelay(1);


Should this time out and return a failure?


There is no reason for this to fail. We just need to ensure that it is
written. Sometimes writes going through the bus take time to complete.
When we exit this function, we are assured that the value has been
written.


+   }
+}
+
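If a timeout were wanted after all, a bounded variant of the read-back loop might look like the sketch below. wait_for_readback is hypothetical and counts polls rather than elapsed time; in-kernel code would more likely use readl_poll_timeout_atomic() from <linux/iopoll.h>.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Poll a register until it reads back the expected value, giving up after
 * max_polls extra attempts instead of spinning forever.
 */
static int wait_for_readback(const volatile uint32_t *reg, uint32_t data,
			     unsigned int max_polls)
{
	for (unsigned int i = 0; i <= max_polls; i++) {
		if (*reg == data)
			return 0;
		/* a real driver would delay ~1us here, e.g. udelay(1) */
	}
	return -ETIMEDOUT;
}
```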
+static bool tcs_is_free(struct rsc_drv *drv, int m)
+{
+   return !atomic_read(&drv->tcs_in_use[m]) &&
+  read_tcs_reg(drv, RSC_DRV_STATUS, m, 0);
+}
+
+static struct tcs_group *get_tcs_of_type(struct rsc_drv *drv, int type)
+{
+   int i;
+   struct tcs_group *tcs;
+
+   for (i = 0; i < TCS_TYPE_NR; i++) {
+   if (type == drv->tcs[i].type)
+   break;
+   }
+
+   if (i == TCS_TYPE_NR)
+   return ERR_PTR(-EINVAL);
+
+   tcs = &drv->tcs[i];
+   if (!tcs->num_tcs)
+   return ERR_PTR(-EINVAL);
+
+   return tcs;
+}
+
+static struct tcs_group *get_tcs_for_msg(struct rsc_drv *drv,
+   struct tcs_request *msg)
+{
+   int type;
+
+   switch (msg->state) {
+   case RPMH_ACTIVE_ONLY_STATE:
+   type = ACTIVE_TCS;
+   break;
+   default:
+   return ERR_PTR(-EINVAL);
+   }
+
+   return get_tcs_of_type(drv, type);
+}
+
+static void send_tcs_response(struct tcs_response *resp)
+{
+   struct rsc_drv *drv = resp->drv;
+   unsigned long flags;
+
+   if (!resp)
+   return;


Does this ever happen? Ah, I see that it might in the irq handler. But
get_response already assumes that there is a response, and reaches
through it. So I don't think you need this here nor the check+label in
the irq handler.

Is requesting an index out of range purely a developer error, or could
it happen in some sort of runtime scarcity situation? If it's a
developer error, I'd get rid of all the null checking. If it's
something that might really happen under the right circumstances, then
my comment above doesn't stand and you'd want to fix the null
dereference in get_response instead.


I added the check so that I don't have a confusing goto in the IRQ handler.


+
+   spin_lock_irqsave(&drv->drv_lock, flags);
+   INIT_LIST_HEAD(&resp->list);
+   list_add_tail(&resp->list, &drv->response_pending);
+   spin_unlock_irqrestore(&drv->drv_lock, flags);
+
+   tasklet_schedule(&drv->tasklet);
+}
+
+/**
+ * tcs_irq_handler: TX Done interrupt handler
+ */
+static irqreturn_t tcs_irq_handler(int irq, void *p)
+{
+   struct rsc_drv *drv = p;
+   int m, i;
+   u32 irq_status, sts;
+   struct tcs_response *resp;
+   struct tcs_cmd *cmd;
+   int err;
+
+   irq_status = read_tcs_reg(drv, RSC_DRV_IRQ_STATUS, 0, 0);
+
+   for (m = 0; m < drv->num_tcs; m++) {
+   if (!(irq_status & (u32)BIT(m)))
+   continue;
+
+   err = 0;
+   resp = get_response(drv, m);
+   if (!resp) {


I mention this above, but I don't think get_response can gracefully
return null, so is this needed?


+   WARN_ON(1);
+   goto skip_resp;
+   }
+
+   for (i = 0; i

Re: [PATCH v2 05/10] drivers: qcom: rpmh-rsc: write sleep/wake requests to TCS

2018-02-16 Thread Evan Green

Hello Lina,

On Thu, Feb 15, 2018 at 9:35 AM, Lina Iyer  wrote:
> Sleep and wake requests are sent when the application processor
> subsystem of the SoC is entering deep sleep states like in suspend.
> These requests help lower the system power requirements when the
> resources are not in use.
>
> Sleep and wake requests are written to the TCS slots but are not
> triggered at the time of writing. The TCS are triggered by the firmware
> after the last of the CPUs has executed its WFI. Since these requests
> may come in different batches, it is the job of this controller
> driver to arrange the requests into the available TCSes.
>
> Signed-off-by: Lina Iyer 
> ---
>  drivers/soc/qcom/rpmh-internal.h |   7 +++
>  drivers/soc/qcom/rpmh-rsc.c  | 126 +++
>  2 files changed, 133 insertions(+)
>
[...]
> +static int find_slots(struct tcs_group *tcs, struct tcs_request *msg,
> +int *m, int *n)
> +{
> +   int slot, offset;
> +   int i = 0;
> +
> +   /* Find if we already have the msg in our TCS */
> +   slot = find_match(tcs, msg->payload, msg->num_payload);
> +   if (slot >= 0)
> +   goto copy_data;
> +
> +   /* Do over, until we can fit the full payload in a TCS */
> +   do {
> +   slot = bitmap_find_next_zero_area(tcs->slots, MAX_TCS_SLOTS,
> +i, msg->num_payload, 0);
> +   if (slot == MAX_TCS_SLOTS)
> +   break;
> +   i += tcs->ncpt;
> +   } while (slot + msg->num_payload - 1 >= i);
> +
> +   if (slot == MAX_TCS_SLOTS)
> +   return -ENOMEM;
> +
> +copy_data:
> +   bitmap_set(tcs->slots, slot, msg->num_payload);
> +   /* Copy the addresses of the resources over to the slots */
> +   for (i = 0; tcs->cmd_addr && i < msg->num_payload; i++)

I don't think tcs->cmd_addr can be null, can it? Above, find_match()
is already reaching through cmd_addr with enthusiasm. If kept, it
could at least be moved outside of the loop.

Evan
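The slot search in find_slots() (find a run of free slots that does not straddle a TCS of ncpt commands) can be sketched in userspace with a plain bool array in place of the kernel bitmap. The helpers are hypothetical and use -1 as their own failure sentinel; the copy step does the cmd_addr NULL check once, outside the loop, as suggested above.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_SLOTS 16	/* total command slots across all TCSes (illustrative) */

/* First index >= start with n consecutive free slots, or -1 if none. */
static int find_zero_area(const bool *used, int start, int n)
{
	for (int s = start; s + n <= MAX_SLOTS; s++) {
		int k = 0;

		while (k < n && !used[s + k])
			k++;
		if (k == n)
			return s;
	}
	return -1;
}

/* n free slots that stay within one TCS of ncpt commands, or -1. */
static int find_slot(const bool *used, int ncpt, int n)
{
	int i = 0, slot;

	if (n <= 0 || n > ncpt)
		return -1;
	do {
		slot = find_zero_area(used, i, n);
		if (slot < 0)
			return -1;
		i += ncpt;			/* start of the next TCS */
	} while (slot + n - 1 >= i);		/* crossed a boundary: retry */
	return slot;
}

/* Mark slots used and record addresses; NULL check hoisted out of the loop. */
static void claim_slots(bool *used, uint32_t *cmd_addr, int slot,
			const uint32_t *addrs, int n)
{
	for (int k = 0; k < n; k++)
		used[slot + k] = true;
	if (!cmd_addr)
		return;
	for (int k = 0; k < n; k++)
		cmd_addr[slot + k] = addrs[k];
}
```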

Re: [PATCH 4/4] fs/dcache: Avoid the try_lock loops in dentry_kill()

2018-02-16 Thread John Ogness

On 2018-02-17, Linus Torvalds  wrote:
>> dentry_kill() calls both dentry_lock_inode() and lock_parent() in the
>> common case. So by changing the semantics of lock_parent(), I am
>> removing two "recheck in case I dropped" in the common case rather
>> than just the one you pointed out.
>
> Ok, that would be lovely, but doesn't that end up being a nasty patch?

After reading your initial feedback my idea was to change both
lock_parent() and dentry_lock_inode() to not only communicate _if_ the
lock was successful, but also if d_lock was dropped in the process. (For
example, with a tristate rather than boolean return value.) Then callers
would know if they needed to recheck the dentry contents.

> So it may be that my dislike of the "re-check after possibly dropping
> the lock" is not really about the re-checking, but about just how it
> made that function look much more complicated.

I understand what you are saying and I appreciate the comments. I will
code up some variations for myself and try to pick the one that is the
least complicated for my v2.

John Ogness

Re: [tip:x86/pti] x86/speculation: Use IBRS if available before calling into firmware

2018-02-16 Thread Tim Chen

On 02/16/2018 11:16 AM, David Woodhouse wrote:
> On Fri, 2018-02-16 at 10:44 -0800, Tim Chen wrote:
>>
>> I encountered a hang on one machine but not others when using the above
>> macro.  It is probably an alignment thing with ALTERNATIVE, as the
>> problem went away after I made the change below:
>>
>> Tim
>>
>> diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
>> index 8f2ff74..0f65bd2 100644
>> --- a/arch/x86/include/asm/nospec-branch.h
>> +++ b/arch/x86/include/asm/nospec-branch.h
>> @@ -148,6 +148,7 @@ extern char __indirect_thunk_end[];
>>  
>>  #define alternative_msr_write(_msr, _val, _feature)\
>> asm volatile(ALTERNATIVE("",\
>> +   ".align 16\n\t"\
>> "movl %[msr], %%ecx\n\t"   \
>> "movl %[val], %%eax\n\t"   \
>> "movl $0, %%edx\n\t"   \
> 
> That's weird. Note that .align in an altinstr section isn't actually
> going to do what you'd expect; the oldinstr and altinstr sections
> aren't necessarily aligned the same, so however many NOPs it inserts
> into the alternative, might be deliberately *misaligning* it in the
> code that actually gets executed.
> 
> Are you sure you're not running a kernel where the alternatives code
> would turn that alternative which *starts* with a NOP, into *all* NOPs?
> 

I rebuilt the kernel without the align, and I'm no longer
seeing the issue on the machine that had it earlier.
So let's ignore this for now, as I can't reproduce the problem.

It was probably some other issue that caused the hang I saw earlier.

Thanks.

Tim

Re: [PATCH v3 08/11] watchdog/hpwdt: Programable Pretimeout NMI

2018-02-16 Thread Jerry Hoemann

On Fri, Feb 16, 2018 at 12:34:40PM -0800, Guenter Roeck wrote:
> On Thu, Feb 15, 2018 at 04:43:57PM -0700, Jerry Hoemann wrote:
> > Make whether or not the hpwdt watchdog delivers a pretimeout NMI
> > programable by the user.
> > 
> > The underlying iLO hardware is programmable as to whether or not
> > a pre-timeout NMI is delivered to the system before the iLO resets
> > the system.  However, the iLO does not allow for programming the
> > length of time that NMI is delivered before the system is reset.
> > 
> > Hence, in hpwdt_set_pretimeout, val == 0 disables the NMI. Any
> > non-zero value sets the pretimeout length to what the hardware
> > supports.
> > 
> > Signed-off-by: Jerry Hoemann 
> > ---
> >  drivers/watchdog/hpwdt.c | 42 --
> >  1 file changed, 36 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/watchdog/hpwdt.c b/drivers/watchdog/hpwdt.c
> > index da9a04101814..dc0ad20738ed 100644
> > --- a/drivers/watchdog/hpwdt.c
> > +++ b/drivers/watchdog/hpwdt.c
> > @@ -28,12 +28,15 @@
> >  #define TICKS_TO_SECS(ticks)   ((ticks) * 128 / 1000)
> >  #define HPWDT_MAX_TIMERTICKS_TO_SECS(65535)
> >  #define DEFAULT_MARGIN 30
> > +#define PRETIMEOUT_SEC 9
> >  
> >  static unsigned int soft_margin = DEFAULT_MARGIN;  /* in seconds */
> > -static unsigned int reload;/* the computed soft_margin */
> >  static bool nowayout = WATCHDOG_NOWAYOUT;
> >  #ifdef CONFIG_HPWDT_NMI_DECODING
> >  static unsigned int allow_kdump = 1;
> > +static bool pretimeout = 1;
> > +#else
> > +static bool pretimeout;
> >  #endif
> >  
> static bool pretimeout = IS_ENABLED(CONFIG_HPWDT_NMI_DECODING);

ack. will do.

> 
> >  static void __iomem *pci_mem_addr; /* the PCI-memory address */
> > @@ -55,10 +58,12 @@ static struct watchdog_device hpwdt_dev;
> >   */
> >  static int hpwdt_start(struct watchdog_device *dev)
> >  {
> > -   reload = SECS_TO_TICKS(dev->timeout);
> > +   int control = 0x81 | (pretimeout ? 0x4 : 0);
> > +   int reload = SECS_TO_TICKS(dev->timeout);
> >  
> > +   dev_dbg(dev->parent, "start watchdog 0x%08x:0x%02x\n", reload, control);
> > iowrite16(reload, hpwdt_timer_reg);
> > -   iowrite8(0x85, hpwdt_timer_con);
> > +   iowrite8(control, hpwdt_timer_con);
> >  
> > return 0;
> >  }
> > @@ -67,6 +72,8 @@ static int hpwdt_stop(struct watchdog_device *dev)
> >  {
> > unsigned long data;
> >  
> > +   dev_dbg(dev->parent, "stop  watchdog\n");
> > +
> Unrelated.
> 
> > data = ioread8(hpwdt_timer_con);
> > data &= 0xFE;
> > iowrite8(data, hpwdt_timer_con);
> > @@ -75,8 +82,9 @@ static int hpwdt_stop(struct watchdog_device *dev)
> >  
> >  static int hpwdt_ping(struct watchdog_device *dev)
> >  {
> > -   reload = SECS_TO_TICKS(dev->timeout);
> > +   int reload = SECS_TO_TICKS(dev->timeout);
> >  
> > +   dev_dbg(dev->parent, "ping  watchdog 0x%08x\n", reload);
> 
> Unrelated. If you want to add debug messages, please do it
> in a separate patch.


Different patch, but same set?  I'll move these (and the ones from the
earlier patch) to a new separate patch later in the set.

> 
> > iowrite16(reload, hpwdt_timer_reg);
> >  
> > return 0;
> > @@ -98,12 +106,21 @@ static int hpwdt_settimeout(struct watchdog_device *dev, unsigned int val)
> >  }
> >  
> >  #ifdef CONFIG_HPWDT_NMI_DECODING   /* { */
> > +static int hpwdt_set_pretimeout(struct watchdog_device *dev, unsigned int val)
> > +{
> > +   if (val && (val != PRETIMEOUT_SEC)) {
> 
> Unnecessary ( )


There are several things going on here. I'm not sure which one the above
comment is intended for.

While a pretimeout NMI isn't required by the HW to be enabled, if enabled the
length of pretimeout is fixed by HW.

I didn't see anything in the API that would allow us to communicate to
the user this "feature."  timeout at least has both min_timeout and
max_timeout, but I didn't see anything similar for pretimeout.  I also
don't think it's reasonable to fail here if the requested value is not 9,
as the user really has no way of knowing what the valid range of
pretimeout values is.  So I accept any non-zero value for pretimeout,
but then set pretimeout to 9.

But at the same time, I don't like to silently change a human request
w/o at least warning.
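The behaviour described above (0 disables the NMI, any non-zero request is coerced to the fixed 9 seconds, with a warning) can be sketched as follows. The function names are hypothetical; in the driver the warning would go through dev_warn() rather than stderr. The control byte matches hpwdt_start() in the patch, and with the NMI enabled it equals the previously hard-coded 0x85.

```c
#include <stdbool.h>
#include <stdio.h>

#define PRETIMEOUT_SEC 9	/* fixed by the iLO hardware, per the patch */

/*
 * Map a requested pretimeout to what the hardware supports: 0 disables
 * the NMI, anything else becomes PRETIMEOUT_SEC, with a warning when the
 * request is changed. Returns the effective pretimeout.
 */
static unsigned int hpwdt_effective_pretimeout(unsigned int val, bool *nmi_enabled)
{
	*nmi_enabled = val != 0;
	if (val && val != PRETIMEOUT_SEC)
		fprintf(stderr, "pretimeout %u not supported, using %d\n",
			val, PRETIMEOUT_SEC);
	return val ? PRETIMEOUT_SEC : 0;
}

/* Timer control byte as built in hpwdt_start(): 0x81, plus 0x4 for the NMI. */
static unsigned int hpwdt_control(bool nmi_enabled)
{
	return 0x81 | (nmi_enabled ? 0x4 : 0);
}
```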



> 
> The actual timeout can be a value smaller than 9 seconds.
> Minimum is 1 second. What happens if the user configures
> a timeout of less than 9 seconds as well as a pretimeout ?
> Will it fire immediately ? 

The architecture is silent on this issue.  My experience with
this is that if timeout < 9 seconds, the NMI is not issued.
System resets when the timeout expires.  This could be implementation
dependent.

Note, this is not a new issue.

I thought about setting the min timeout to ten seconds to avoid this situation.

I haven't dug into the various user level clients of watchdog so I'm not sure 
what the impact of making this change would be t

[PATCH RESEND] powerpc/5200: dts: digsy_mtc.dts: fix rv3029 compatible

2018-02-16 Thread Alexandre Belloni

The proper compatible for rv3029 is microcrystal,rv3029.

Acked-by: Anatolij Gustschin 
Signed-off-by: Alexandre Belloni 
---

Hi,

I'm resending that one because I prefer not taking DT patches through the RTC
tree.

 arch/powerpc/boot/dts/digsy_mtc.dts | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/boot/dts/digsy_mtc.dts b/arch/powerpc/boot/dts/digsy_mtc.dts
index c280e75c86bf..c3922fc03e0b 100644
--- a/arch/powerpc/boot/dts/digsy_mtc.dts
+++ b/arch/powerpc/boot/dts/digsy_mtc.dts
@@ -78,7 +78,7 @@
};
 
rtc@56 {
-   compatible = "mc,rv3029c2";
+   compatible = "microcrystal,rv3029";
reg = <0x56>;
};
 
-- 
2.16.1

Re: [PATCH -mm -v5 RESEND] mm, swap: Fix race between swapoff and some swap operations

2018-02-16 Thread Andrew Morton

On Wed, 14 Feb 2018 08:38:00 +0800 "Huang\, Ying"  wrote:

> Andrew Morton  writes:
> 
> > On Tue, 13 Feb 2018 09:42:20 +0800 "Huang, Ying"  wrote:
> >
> >> From: Huang Ying 
> >> 
> >> When swapin is performed, after getting the swap entry information
> >> from the page table, the system will swap in the swap entry without any
> >> lock held to prevent the swap device from being swapped off.  This may
> >> cause a race like the one below,
> >
> > Sigh.  In terms of putting all the work into the swapoff path and
> > avoiding overheads in the hot paths, I guess this is about as good as
> > it will get.
> >
> > It's a very low-priority fix so I'd prefer to keep the patch in -mm
> > until Hugh has had an opportunity to think about it.
> >
> >> ...
> >>  
> >> +/*
> >> + * Check whether swap entry is valid in the swap device.  If so,
> >> + * return pointer to swap_info_struct, and keep the swap entry valid
> >> + * via preventing the swap device from being swapoff, until
> >> + * put_swap_device() is called.  Otherwise return NULL.
> >> + */
> >> +struct swap_info_struct *get_swap_device(swp_entry_t entry)
> >> +{
> >> +  struct swap_info_struct *si;
> >> +  unsigned long type, offset;
> >> +
> >> +  if (!entry.val)
> >> +  goto out;
> >> +  type = swp_type(entry);
> >> +  if (type >= nr_swapfiles)
> >> +  goto bad_nofile;
> >> +  si = swap_info[type];
> >> +
> >> +  preempt_disable();
> >
> > This preempt_disable() is later than I'd expect.  If a well-timed race
> > occurs, `si' could now be pointing at a defunct entry.  If that
> > well-timed race include a swapoff AND a swapon, `si' could be pointing
> > at the info for a new device?
> 
> struct swap_info_struct pointed to by swap_info[] will never be freed.
> During swapoff, we only free the memory pointed to by the fields of
> struct swap_info_struct.  And during swapon, we always reuse
> swap_info[type] if it's not NULL.  So it should be safe to dereference
> swap_info[type] with preemption enabled.

That's my point.  If there's a race window during which there is a
parallel swapoff+swapon, this swap_info_struct may now be in use for a
different device?

[PATCH] alpha: rtc: remove unused set_mmss ops

2018-02-16 Thread Alexandre Belloni

The .set_mmss and .set_mmss64 ops are only called when the RTC is not
providing an implementation for the .set_time callback.

On alpha, .set_time is provided so .set_mmss64 is never called. Remove the
unused code.
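The dispatch order this relies on can be sketched as below. rtc_ops_sketch is a cut-down, hypothetical stand-in for struct rtc_class_ops, and it passes seconds instead of a struct rtc_time to keep the example small; the point is only that .set_mmss64 is reached when .set_time is absent.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two rtc_class_ops callbacks involved. */
struct rtc_ops_sketch {
	int (*set_time)(int64_t secs);
	int (*set_mmss64)(int64_t secs);
};

/* .set_time wins; .set_mmss64 is only a fallback when it is missing. */
static int rtc_set_sketch(const struct rtc_ops_sketch *ops, int64_t secs)
{
	if (!ops)
		return -ENODEV;
	if (ops->set_time)
		return ops->set_time(secs);
	if (ops->set_mmss64)
		return ops->set_mmss64(secs);
	return -EINVAL;
}

/* Example callbacks that just report which path was taken. */
static int demo_set_time(int64_t secs) { (void)secs; return 1; }
static int demo_set_mmss64(int64_t secs) { (void)secs; return 2; }
```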

Signed-off-by: Alexandre Belloni 
---
 arch/alpha/kernel/rtc.c | 99 -
 1 file changed, 99 deletions(-)

diff --git a/arch/alpha/kernel/rtc.c b/arch/alpha/kernel/rtc.c
index b3da0dcda47d..0816e6c747e8 100644
--- a/arch/alpha/kernel/rtc.c
+++ b/arch/alpha/kernel/rtc.c
@@ -114,83 +114,6 @@ alpha_rtc_set_time(struct device *dev, struct rtc_time *tm)
return mc146818_set_time(tm);
 }
 
-static int
-alpha_rtc_set_mmss(struct device *dev, time64_t nowtime)
-{
-   int retval = 0;
-   int real_seconds, real_minutes, cmos_minutes;
-   unsigned char save_control, save_freq_select;
-
-   /* Note: This code only updates minutes and seconds.  Comments
-  indicate this was to avoid messing with unknown time zones,
-  and with the epoch nonsense described above.  In order for
-  this to work, the existing clock cannot be off by more than
-  15 minutes.
-
-  ??? This choice is may be out of date.  The x86 port does
-  not have problems with timezones, and the epoch processing has
-  now been fixed in alpha_set_rtc_time.
-
-  In either case, one can always force a full rtc update with
-  the userland hwclock program, so surely 15 minute accuracy
-  is no real burden.  */
-
-   /* In order to set the CMOS clock precisely, we have to be called
-  500 ms after the second nowtime has started, because when
-  nowtime is written into the registers of the CMOS clock, it will
-  jump to the next second precisely 500 ms later. Check the Motorola
-  MC146818A or Dallas DS12887 data sheet for details.  */
-
-   /* irq are locally disabled here */
-   spin_lock(&rtc_lock);
-   /* Tell the clock it's being set */
-   save_control = CMOS_READ(RTC_CONTROL);
-   CMOS_WRITE((save_control|RTC_SET), RTC_CONTROL);
-
-   /* Stop and reset prescaler */
-   save_freq_select = CMOS_READ(RTC_FREQ_SELECT);
-   CMOS_WRITE((save_freq_select|RTC_DIV_RESET2), RTC_FREQ_SELECT);
-
-   cmos_minutes = CMOS_READ(RTC_MINUTES);
-   if (!(save_control & RTC_DM_BINARY) || RTC_ALWAYS_BCD)
-   cmos_minutes = bcd2bin(cmos_minutes);
-
-   real_seconds = nowtime % 60;
-   real_minutes = nowtime / 60;
-   if (((abs(real_minutes - cmos_minutes) + 15) / 30) & 1) {
-   /* correct for half hour time zone */
-   real_minutes += 30;
-   }
-   real_minutes %= 60;
-
-   if (abs(real_minutes - cmos_minutes) < 30) {
-   if (!(save_control & RTC_DM_BINARY) || RTC_ALWAYS_BCD) {
-   real_seconds = bin2bcd(real_seconds);
-   real_minutes = bin2bcd(real_minutes);
-   }
-   CMOS_WRITE(real_seconds,RTC_SECONDS);
-   CMOS_WRITE(real_minutes,RTC_MINUTES);
-   } else {
-   printk_once(KERN_NOTICE
-   "set_rtc_mmss: can't update from %d to %d\n",
-   cmos_minutes, real_minutes);
-   retval = -1;
-   }
-
-   /* The following flags have to be released exactly in this order,
-* otherwise the DS12887 (popular MC146818A clone with integrated
-* battery and quartz) will not reset the oscillator and will not
-* update precisely 500 ms later. You won't find this mentioned in
-* the Dallas Semiconductor data sheets, but who believes data
-* sheets anyway ...   -- Markus Kuhn
-*/
-   CMOS_WRITE(save_control, RTC_CONTROL);
-   CMOS_WRITE(save_freq_select, RTC_FREQ_SELECT);
-   spin_unlock(&rtc_lock);
-
-   return retval;
-}
-
 static int
 alpha_rtc_ioctl(struct device *dev, unsigned int cmd, unsigned long arg)
 {
@@ -210,7 +133,6 @@ alpha_rtc_ioctl(struct device *dev, unsigned int cmd, unsigned long arg)
 static const struct rtc_class_ops alpha_rtc_ops = {
.read_time = alpha_rtc_read_time,
.set_time = alpha_rtc_set_time,
-   .set_mmss64 = alpha_rtc_set_mmss,
.ioctl = alpha_rtc_ioctl,
 };
 
@@ -225,7 +147,6 @@ static const struct rtc_class_ops alpha_rtc_ops = {
 
 union remote_data {
struct rtc_time *tm;
-   unsigned long now;
long retval;
 };
 
@@ -267,29 +188,9 @@ remote_set_time(struct device *dev, struct rtc_time *tm)
return alpha_rtc_set_time(NULL, tm);
 }
 
-static void
-do_remote_mmss(void *data)
-{
-   union remote_data *x = data;
-   x->retval = alpha_rtc_set_mmss(NULL, x->now);
-}
-
-static int
-remote_set_mmss(struct device *dev, time64_t now)
-{
-   union remote_data x;
-   if (smp_processor_id() != boot_cpuid) {
-   x.now = now;
-   smp_call_fun

Re: [PATCH 4/4] fs/dcache: Avoid the try_lock loops in dentry_kill()

2018-02-16 Thread Linus Torvalds

On Fri, Feb 16, 2018 at 3:05 PM, John Ogness  wrote:
>
> dentry_kill() calls both dentry_lock_inode() and lock_parent() in the
> common case. So by changing the semantics of lock_parent(), I am
> removing two "recheck in case I dropped" in the common case rather than
> just the one you pointed out.

Ok, that would be lovely, but doesn't that end up being a nasty patch?
You can't just move the trylock into the caller, since then you need
to move all the other stuff too?

Or were you planning on splitting lock_parent() into two, for the
"fast case vs complex case"?

Or maybe I'm entirely missing something and we're miscommunicating.

I'm actually not so much worried about the cost of re-checking (the
real cost tends to be the locked cycle itself) as I am about the code
looking understandable. Your d_delete() patch didn't make me go "that
looks more complicated", probably partly because of the nice helper
function.

So it may be that my dislike of the "re-check after possibly dropping
the lock" is not really about the re-checking, but about just how it
made that function look much more complicated.

 Linus

Re: [PATCH] tools/memory-model: remove rb-dep, smp_read_barrier_depends, and lockless_dereference

2018-02-16 Thread Akira Yokosawa

On 2018/02/16 17:22:55 -0500, Alan Stern wrote:
> Since commit 76ebbe78f739 ("locking/barriers: Add implicit
> smp_read_barrier_depends() to READ_ONCE()") was merged for the 4.15
> kernel, it has not been necessary to use smp_read_barrier_depends().
> Similarly, commit 59ecbbe7b31c ("locking/barriers: Kill
> lockless_dereference()") removed lockless_dereference() from the
> kernel.
> 
> Since these primitives are no longer part of the kernel, they do not
> belong in the Linux Kernel Memory Consistency Model.  This patch
> removes them, along with the internal rb-dep relation, and updates the
> relevant documentation.
> 
> Signed-off-by: Alan Stern 
> 

A few nits.  Please see inline comments below.

With those fixed,

Reviewed-by: Akira Yokosawa 

 Thanks, Akira

> ---
> 
> Index: usb-4.x/tools/memory-model/linux-kernel.cat
> ===
> --- usb-4.x/tools/memory-model.orig/linux-kernel.cat
> +++ usb-4.x/tools/memory-model/linux-kernel.cat
> @@ -25,7 +25,6 @@ include "lock.cat"
>  (***)
>  
>  (* Fences *)
> -let rb-dep = [R] ; fencerel(Rb_dep) ; [R]
>  let rmb = [R \ Noreturn] ; fencerel(Rmb) ; [R \ Noreturn]
>  let wmb = [W] ; fencerel(Wmb) ; [W]
>  let mb = ([M] ; fencerel(Mb) ; [M]) |
> @@ -61,11 +60,9 @@ let dep = addr | data
>  let rwdep = (dep | ctrl) ; [W]
>  let overwrite = co | fr
>  let to-w = rwdep | (overwrite & int)
> -let rrdep = addr | (dep ; rfi)
> -let strong-rrdep = rrdep+ & rb-dep
> -let to-r = strong-rrdep | rfi-rel-acq
> +let to-r = addr | (dep ; rfi) | rfi-rel-acq
>  let fence = strong-fence | wmb | po-rel | rmb | acq-po
> -let ppo = rrdep* ; (to-r | to-w | fence)
> +let ppo = to-r | to-w | fence
>  
>  (* Propagation: Ordering from release operations and strong fences. *)
>  let A-cumul(r) = rfe? ; r
> Index: usb-4.x/tools/memory-model/Documentation/explanation.txt
> ===
> --- usb-4.x/tools/memory-model.orig/Documentation/explanation.txt
> +++ usb-4.x/tools/memory-model/Documentation/explanation.txt
> @@ -1,5 +1,5 @@
> -Explanation of the Linux-Kernel Memory Model
> -
> +Explanation of the Linux-Kernel Memory Consistency Model
> +
>  
>  :Author: Alan Stern 
>  :Created: October 2017
> @@ -35,25 +35,24 @@ Explanation of the Linux-Kernel Memory M
>  INTRODUCTION
>  
>  
> -The Linux-kernel memory model (LKMM) is rather complex and obscure.
> -This is particularly evident if you read through the linux-kernel.bell
> -and linux-kernel.cat files that make up the formal version of the
> -memory model; they are extremely terse and their meanings are far from
> -clear.
> +The Linux-kernel memory consistency model (LKMM) is rather complex and
> +obscure.  This is particularly evident if you read through the
> +linux-kernel.bell and linux-kernel.cat files that make up the formal
> +version of the model; they are extremely terse and their meanings are
> +far from clear.
>  
>  This document describes the ideas underlying the LKMM.  It is meant
> -for people who want to understand how the memory model was designed.
> -It does not go into the details of the code in the .bell and .cat
> -files; rather, it explains in English what the code expresses
> -symbolically.
> +for people who want to understand how the model was designed.  It does
> +not go into the details of the code in the .bell and .cat files;
> +rather, it explains in English what the code expresses symbolically.
>  
>  Sections 2 (BACKGROUND) through 5 (ORDERING AND CYCLES) are aimed
> -toward beginners; they explain what memory models are and the basic
> -notions shared by all such models.  People already familiar with these
> -concepts can skim or skip over them.  Sections 6 (EVENTS) through 12
> -(THE FROM_READS RELATION) describe the fundamental relations used in
> -many memory models.  Starting in Section 13 (AN OPERATIONAL MODEL),
> -the workings of the LKMM itself are covered.
> +toward beginners; they explain what memory consistency models are and
> +the basic notions shared by all such models.  People already familiar
> +with these concepts can skim or skip over them.  Sections 6 (EVENTS)
> +through 12 (THE FROM_READS RELATION) describe the fundamental
> +relations used in many models.  Starting in Section 13 (AN OPERATIONAL
> +MODEL), the workings of the LKMM itself are covered.
>  
>  Warning: The code examples in this document are not written in the
>  proper format for litmus tests.  They don't include a header line, the
> @@ -827,8 +826,8 @@ A-cumulative; they only affect the propa
>  executed on C before the fence (i.e., those which precede the fence in
>  program order).
>  
> -smp_read_barrier_depends(), rcu_read_lock(), rcu_read_unlock(), and
> -synchronize_rcu() fences have other properties which we discuss later.
> +read_lock(), rcu_read_unlock(), and synchronize_rc

Re: [PATCH 2/3] x86/mm: introduce __PAGE_KERNEL_GLOBAL

2018-02-16 Thread Dave Hansen

On 02/16/2018 10:25 AM, Nadav Amit wrote:
>> --- a/arch/x86/mm/pageattr.c~kpti-no-global-for-kernel-mappings  
>> 2018-02-13 15:17:56.148210060 -0800
>> +++ b/arch/x86/mm/pageattr.c 2018-02-13 15:17:56.153210060 -0800
>> @@ -593,7 +593,8 @@ try_preserve_large_page(pte_t *kpte, uns
>>   * different bit positions in the two formats.
>>   */
>>  req_prot = pgprot_4k_2_large(req_prot);
>> -req_prot = pgprot_set_on_present(req_prot, _PAGE_GLOBAL | _PAGE_PSE);
>> +req_prot = pgprot_set_on_present(req_prot,
>> +__PAGE_KERNEL_GLOBAL | _PAGE_PSE);
>>  req_prot = canon_pgprot(req_prot);
> From these chunks, it seems to me that req_prot will not have the global bit
> on when “nopti” parameter is provided. What am I missing?

BTW, this code is broken.  It's trying to unconditionally set
_PAGE_GLOBAL whenever we do change_page_attr() and friends.  It gets
fixed up by canon_pgprot(), but it's wrong to do in the first place.
I've got a better fix for this coming.

Re: [PATCH 4/4] fs/dcache: Avoid the try_lock loops in dentry_kill()

2018-02-16 Thread John Ogness

On 2018-02-16, Linus Torvalds  wrote:
>> lock_parent() already has the problem you are referring to. Callers
>> are required to recheck the dentry contents and check the returned
>> parent because they do not know if the trylock succeeded. See
>> d_prune_aliases(), for example.
>
> What are you talking about?
>
> lock_parent() does the nice "spin_trylock succeeded" special case.
>
> Yes, it will then do the "unlock dentry, do the parent first, then
> re-check" too, and callers may need to worry about it.
>
> But that's not what I'm complaining about in your patch. You remove
> the simple case, and make dentry_kill() do the "recheck in case I
> dropped" every single time.

dentry_lock_inode() uses the same semantics as lock_parent(). The caller
does not know if the trylock succeeded. Any caller using lock_parent()
must "recheck in case I dropped", just as with dentry_lock_inode(). This
is what you have pointed out.

> The fact that there are _other_ complex cases doesn't make it any
> better. The whole "but Bobby does it too" thing is not a defense.
> Would you jump off a bridge just because your friend did it?

dentry_kill() calls both dentry_lock_inode() and lock_parent() in the
common case. So by changing the semantics of lock_parent(), I am
removing two "recheck in case I dropped" in the common case rather than
just the one you pointed out.

John Ogness

Re: [PATCH 1/3] Kconfig: disable PROFILE_ALL_BRANCHES for compile testing

2018-02-16 Thread Steven Rostedt

On Fri, 16 Feb 2018 23:40:22 +0100
Arnd Bergmann  wrote:

>   ((CAPI_MSG *) msg)->info.facility_req.structs[1] = LI_REQ_SILENT_UPDATE & 0xff;
> drivers/isdn/hardware/eicon/message.c:11163:54: error: array subscript is above array bounds [-Werror=array-bounds]
>   ((CAPI_MSG *) msg)->info.facility_req.structs[2] = LI_REQ_SILENT_UPDATE >> 8;
> drivers/isdn/hardware/eicon/message.c:11164:54: error: array subscript is above array bounds [-Werror=array-bounds]
>   ((CAPI_MSG *) msg)->info.facility_req.structs[3] = 0;
> 
> All those are nonsense AFAICT, and we see them only because the "if()" override
> ends up confusing gcc's value-range tracking in the same way it used to cause
> lots of -Wmaybe-uninitialized warnings (which we just disable these days
> with PROFILE_ALL_BRANCHES).

I'm fine with your patch then.

-- Steve

[PATCH v1 1/2] PCI: Probe for device reset support before driver claim

2018-02-16 Thread Bjorn Helgaas

From: Bjorn Helgaas 

Previously we called pci_probe_reset_function() in this path:

  pci_sysfs_init  # late_initcall
for_each_pci_dev(dev)
  pci_create_sysfs_dev_files(dev)
pci_create_capabilities_sysfs(dev)
  pci_probe_reset_function
pci_dev_specific_reset
pcie_has_flr
  pcie_capability_read_dword

pci_sysfs_init() is a late_initcall, and a driver may have already claimed
one of these devices and enabled runtime power management for it, so the
device could already be in D3hot by the time we get to pci_sysfs_init().

The device itself should respond to the config read even while it's in
D3hot, but if an upstream bridge is also in D3hot, the read won't even
reach the device because the bridge won't forward it downstream to the
device.  If the bridge is a PCIe port, it should complete the read as an
Unsupported Request, which may be reported to the CPU as an exception or as
invalid data.

Avoid this case by probing for reset support from pci_init_capabilities(),
before a driver can claim the device.  The device should be in D0 and fully
accessible at that point.

Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/pci-sysfs.c |3 +--
 drivers/pci/probe.c |3 +++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index eb6bee8724cc..4933f0270471 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -1542,11 +1542,10 @@ static int pci_create_capabilities_sysfs(struct pci_dev *dev)
/* Active State Power Management */
pcie_aspm_create_sysfs_dev_files(dev);
 
-   if (!pci_probe_reset_function(dev)) {
+   if (dev->reset_fn) {
retval = device_create_file(&dev->dev, &reset_attr);
if (retval)
goto error;
-   dev->reset_fn = 1;
}
return 0;
 
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index ef5377438a1e..489660d0d384 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2121,6 +2121,9 @@ static void pci_init_capabilities(struct pci_dev *dev)
 
/* Advanced Error Reporting */
pci_aer_init(dev);
+
+   if (pci_probe_reset_function(dev) == 0)
+   dev->reset_fn = 1;
 }
 
 /*

[PATCH v1 2/2] PCI: Remove redundant probes for device reset support

2018-02-16 Thread Bjorn Helgaas

From: Bjorn Helgaas 

We probe every device for whether it supports reset so we can tell whether
to create a sysfs "reset" file for it.  We do that probe in
pci_init_capabilities() during enumeration and save the result in
dev->reset_fn.  The result doesn't depend on any other devices on the bus
and shouldn't change after boot, so we don't need to do the probe again.

Remove the pci_probe_reset_function() calls and rely on the dev->reset_fn
we found during enumeration.  No functional change intended.

Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/pci.c |   15 ++-
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index f6a4dd10d9b0..4db740e4f50a 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4450,9 +4450,8 @@ int pci_reset_function(struct pci_dev *dev)
 {
int rc;
 
-   rc = pci_probe_reset_function(dev);
-   if (rc)
-   return rc;
+   if (!dev->reset_fn)
+   return -ENOTTY;
 
pci_dev_lock(dev);
pci_dev_save_and_disable(dev);
@@ -4487,9 +4486,8 @@ int pci_reset_function_locked(struct pci_dev *dev)
 {
int rc;
 
-   rc = pci_probe_reset_function(dev);
-   if (rc)
-   return rc;
+   if (!dev->reset_fn)
+   return -ENOTTY;
 
pci_dev_save_and_disable(dev);
 
@@ -4511,9 +4509,8 @@ int pci_try_reset_function(struct pci_dev *dev)
 {
int rc;
 
-   rc = pci_probe_reset_function(dev);
-   if (rc)
-   return rc;
+   if (!dev->reset_fn)
+   return -ENOTTY;
 
if (!pci_dev_trylock(dev))
return -EAGAIN;

[PATCH v1 0/2] PCI: Probe for reset support earlier

2018-02-16 Thread Bjorn Helgaas

The PCI core currently uses a late_initcall to probe each device for
whether it supports reset.  This is dangerous because a driver may have
already claimed the device by this point, and the PCI core should not
interfere with the driver by touching the device on its own.

These patches move the probe to be earlier, during enumeration, before a
driver has a chance to claim the device.

---

Bjorn Helgaas (2):
  PCI: Probe for device reset support before driver claim
  PCI: Remove redundant probes for device reset support


 drivers/pci/pci-sysfs.c |3 +--
 drivers/pci/pci.c   |   15 ++-
 drivers/pci/probe.c |3 +++
 3 files changed, 10 insertions(+), 11 deletions(-)

[PATCH v3 0/2] drivers/qcom: add Command DB support

2018-02-16 Thread Lina Iyer

Changes in v3:
- use min_t instead of MIN
- add cmd db memory to reserved memory region
- use devm_memremap

These patches add support for reading a shared memory database in the newer
QCOM SoCs called Command DB. With the new architecture on SDM845, shared
resources like clocks, regulators etc., have dynamic properties. These
properties may change based on external components, board configurations or
available feature set. A remote processor detects these parameters and fills up
the database with the resource and available state information. Platform
drivers that need these shared resources will need to query this database to
get the address and properties and vote for the state.

The information in the database is static.  The database is a read-only memory
location that is available to Linux. A pre-defined string is used as a key into
an entry in the database. Generally, platform drivers query the database only
at init to get the information they need.
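A rough sketch of a key-string lookup, assuming (hypothetically) that each entry stores its key as 8 NUL-padded bytes alongside a resource address. The real on-memory layout is defined by struct entry_header in the driver; `struct cmd_db_entry` and `cmd_db_lookup()` here are illustrative names only, not the driver's API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical entry: 8-byte NUL-padded key plus the resource address. */
struct cmd_db_entry {
	char id[8];    /* not NUL-terminated when the key is 8 chars long */
	uint32_t addr; /* address reported by the remote processor */
};

/* Find @key among @n entries; on success store its address in *addr. */
static int cmd_db_lookup(const struct cmd_db_entry *entries, size_t n,
			 const char *key, uint32_t *addr)
{
	char query[8] = { 0 };   /* NUL-padded copy of the key */
	size_t len = strlen(key);
	size_t i;

	assert(entries != NULL && addr != NULL);
	if (len > sizeof(query))
		return -EINVAL;  /* can never match an 8-byte id */
	memcpy(query, key, len);

	/* Compare all 8 bytes so "clk" does not match "clk0". */
	for (i = 0; i < n; i++) {
		if (memcmp(entries[i].id, query, sizeof(query)) == 0) {
			*addr = entries[i].addr;
			return 0;
		}
	}
	return -ENODEV;
}
```

Since the database is static, a driver would typically do such a lookup once at init and keep the result.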

[v1]: https://www.spinics.net/lists/linux-arm-msm/msg32462.html
[v2]: https://lkml.org/lkml/2018/2/8/588

Lina Iyer (2):
  drivers: qcom: add command DB driver
  dt-bindings: introduce Command DB for QCOM SoCs

 .../devicetree/bindings/arm/msm/cmd-db.txt |  30 ++
 drivers/of/platform.c  |   1 +
 drivers/soc/qcom/Kconfig   |   9 +
 drivers/soc/qcom/Makefile  |   1 +
 drivers/soc/qcom/cmd-db.c  | 319 +
 include/soc/qcom/cmd-db.h  |  50 
 6 files changed, 410 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/arm/msm/cmd-db.txt
 create mode 100644 drivers/soc/qcom/cmd-db.c
 create mode 100644 include/soc/qcom/cmd-db.h

--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

[PATCH v3 1/2] drivers: qcom: add command DB driver

2018-02-16 Thread Lina Iyer

From: Mahesh Sivasubramanian 

Command DB is a simple database in the shared memory of QCOM SoCs that
provides information regarding shared resources. Some shared resources
in the SoC have properties that are probed dynamically at boot by the
remote processor. The information pertaining to the SoC and the platform
is made available in the shared memory. Drivers can query this
information using predefined strings.

Signed-off-by: Mahesh Sivasubramanian 
Signed-off-by: Lina Iyer 
---
 drivers/of/platform.c |   1 +
 drivers/soc/qcom/Kconfig  |   9 ++
 drivers/soc/qcom/Makefile |   1 +
 drivers/soc/qcom/cmd-db.c | 319 ++
 include/soc/qcom/cmd-db.h |  50 
 5 files changed, 380 insertions(+)
 create mode 100644 drivers/soc/qcom/cmd-db.c
 create mode 100644 include/soc/qcom/cmd-db.h

diff --git a/drivers/of/platform.c b/drivers/of/platform.c
index c00d81dfac0b..26fb43847f4b 100644
--- a/drivers/of/platform.c
+++ b/drivers/of/platform.c
@@ -494,6 +494,7 @@ EXPORT_SYMBOL_GPL(of_platform_default_populate);
 #ifndef CONFIG_PPC
 static const struct of_device_id reserved_mem_matches[] = {
{ .compatible = "qcom,rmtfs-mem" },
+   { .compatible = "qcom,cmd-db" },
{ .compatible = "ramoops" },
{}
 };
diff --git a/drivers/soc/qcom/Kconfig b/drivers/soc/qcom/Kconfig
index e050eb83341d..b12868a2b92d 100644
--- a/drivers/soc/qcom/Kconfig
+++ b/drivers/soc/qcom/Kconfig
@@ -3,6 +3,15 @@
 #
 menu "Qualcomm SoC drivers"
 
+config QCOM_COMMAND_DB
+   bool "Qualcomm Command DB"
+   depends on (ARCH_QCOM && OF) || COMPILE_TEST
+   help
+ Command DB queries shared memory by key string for shared system
+ resources. Platform drivers that require to set state of a shared
+ resource on a RPM-hardened platform must use this database to get
+ SoC specific identifier and information for the shared resources.
+
 config QCOM_GLINK_SSR
tristate "Qualcomm Glink SSR driver"
depends on RPMSG
diff --git a/drivers/soc/qcom/Makefile b/drivers/soc/qcom/Makefile
index dcebf2814e6d..bbd1230fc441 100644
--- a/drivers/soc/qcom/Makefile
+++ b/drivers/soc/qcom/Makefile
@@ -1,4 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0
+obj-$(CONFIG_QCOM_COMMAND_DB) += cmd-db.o
 obj-$(CONFIG_QCOM_GLINK_SSR) +=glink_ssr.o
 obj-$(CONFIG_QCOM_GSBI)+=  qcom_gsbi.o
 obj-$(CONFIG_QCOM_MDT_LOADER)  += mdt_loader.o
diff --git a/drivers/soc/qcom/cmd-db.c b/drivers/soc/qcom/cmd-db.c
new file mode 100644
index ..0792a2a98fc9
--- /dev/null
+++ b/drivers/soc/qcom/cmd-db.c
@@ -0,0 +1,319 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (c) 2016-2018, The Linux Foundation. All rights reserved. */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#define NUM_PRIORITY   2
+#define MAX_SLV_ID 8
+#define CMD_DB_MAGIC   0x0C0330DBUL
+#define SLAVE_ID_MASK  0x7
+#define SLAVE_ID_SHIFT 16
+
+#define ENTRY_HEADER(hdr)  ((void *)cmd_db_header +\
+   sizeof(*cmd_db_header) +\
+   hdr->header_offset)
+
+#define RSC_OFFSET(hdr, ent)   ((void *)cmd_db_header +\
+   sizeof(*cmd_db_header) +\
+   hdr.data_offset + ent.offset)
+
+/**
+ * entry_header: header for each entry in cmddb
+ *
+ * @id: resource's identifier
+ * @priority: unused
+ * @addr: the address of the resource
+ * @len: length of the data
+ * @offset: offset at which data starts
+ */
+struct entry_header {
+   u64 id;
+   u32 priority[NUM_PRIORITY];
+   u32 addr;
+   u16 len;
+   u16 offset;
+};
+
+/**
+ * rsc_hdr: resource header information
+ *
+ * @slv_id: id for the resource
+ * @header_offset: Entry header offset from data
+ * @data_offset: Entry offset for data location
+ * @cnt: number of entries for HW type
+ * @version: MSB is major, LSB is minor
+ */
+struct rsc_hdr {
+   u16 slv_id;
+   u16 header_offset;
+   u16 data_offset;
+   u16 cnt;
+   u16 version;
+   u16 reserved[3];
+};
+
+/**
+ * cmd_db_header: The DB header information
+ *
+ * @version: The cmd db version
+ * @magic_number: constant expected in the database
+ * @header: array of resources
+ * @check_sum: check sum for the header. Unused.
+ * @reserved: reserved memory
+ * @data: driver specific data
+ */
+struct cmd_db_header {
+   u32 version;
+   u32 magic_num;
+   struct rsc_hdr header[MAX_SLV_ID];
+   u32 check_sum;
+   u32 reserved;
+   u8 data[];
+};
+
+/**
+ * DOC: Description of the Command DB database.
+ *
+ * At the start of the command DB memory is the cmd_db_header structure.
+ * The cmd_db_header holds the version, checksum, magic key as well as an
+ * array for header for each slave (depicted by the rsc_header). Each h/w
+ * based accelerator is a 'slave' (shared resour

[PATCH v3 2/2] dt-bindings: introduce Command DB for QCOM SoCs

2018-02-16 Thread Lina Iyer

From: Mahesh Sivasubramanian 

Command DB provides information on shared resources like clocks,
regulators etc., probed at boot by the remote subsystem and made
available in shared memory.

Cc: devicet...@vger.kernel.org
Signed-off-by: Mahesh Sivasubramanian 
Signed-off-by: Lina Iyer 
---
 .../devicetree/bindings/arm/msm/cmd-db.txt | 30 ++
 1 file changed, 30 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/arm/msm/cmd-db.txt

diff --git a/Documentation/devicetree/bindings/arm/msm/cmd-db.txt b/Documentation/devicetree/bindings/arm/msm/cmd-db.txt
new file mode 100644
index ..039be54fe9c4
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/msm/cmd-db.txt
@@ -0,0 +1,30 @@
+Command DB
+-
+
+Command DB is a database that provides a mapping between resource key and the
+resource address for a system resource managed by a remote processor. The data
+is stored in a shared memory region and is loaded by the remote processor.
+
+Some of the Qualcomm Technologies Inc SoC's have hardware accelerators for
+controlling shared resources. Depending on the board configuration the shared
+resource properties may change. These properties are dynamically probed by the
+remote processor and made available in the shared memory.
+
+The devicetree representation of the command DB driver should be:
+
+PROPERTIES:
+- compatible:
+   Usage: required
+   Value type: 
+   Definition: Should be "qcom,cmd-db"
+
+Example:
+
+   reserved-memory {
+   [...]
+   qcom,cmd-db@c3f000c {
+   reg = <0x0 0xc3f000c 0x0 0x8>,
+ <0x0 0x85fe 0x0 0x2>;
+   compatible = "qcom,cmd-db";
+   };
+   };
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

Re: [PATCH] tun: fix mismatch in mutex lock-unlock in tun_get_user()

2018-02-16 Thread Eric Dumazet

On Fri, Feb 16, 2018 at 2:11 PM, Alexey Khoroshilov
 wrote:
> There is a single error path where tfile->napi_mutex is left locked.
> It can lead to a deadlock.
>
> Found by Linux Driver Verification project (linuxtesting.org).
>
> Signed-off-by: Alexey Khoroshilov 
> ---
>  drivers/net/tun.c | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 81e6cc951e7f..0072a9832532 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -1879,6 +1879,10 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
> default:
> this_cpu_inc(tun->pcpu_stats->rx_dropped);
> kfree_skb(skb);
> +   if (frags) {
> +   tfile->napi.skb = NULL;
> +   mutex_unlock(&tfile->napi_mutex);
> +   }
> return -EINVAL;


I do not believe this can happen for IFF_TUN

IFF_NAPI_FRAGS can only be set for IFF_TAP

Re: [PATCH 4/4] fs/dcache: Avoid the try_lock loops in dentry_kill()

2018-02-16 Thread Linus Torvalds

On Fri, Feb 16, 2018 at 2:32 PM, John Ogness  wrote:
>
> lock_parent() already has the problem you are referring to. Callers are
> required to recheck the dentry contents and check the returned parent
> because they do not know if the trylock succeeded. See
> d_prune_aliases(), for example.

What are you talking about?

lock_parent() does the nice "spin_trylock succeeded" special case.

Yes, it will then do the "unlock dentry, do the parent first, then
re-check" too, and callers may need to worry about it.

But that's not what I'm complaining about in your patch. You remove
the simple case, and make dentry_kill() do the "recheck in case I
dropped" every single time.

It's the "turn a simple case into a complex case" that I absolutely detest.

The fact that there are _other_ complex cases doesn't make it any
better. The whole "but Bobby does it too" thing is not a defense.
Would you jump off a bridge just because your friend did it?
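For readers following this thread, a simplified userspace sketch of the lock_parent() shape under discussion: a trylock fast path, and a slow path that drops the child lock, takes the locks in parent-then-child order, and re-checks. It uses pthread mutexes and a hypothetical `struct obj` instead of dentries and spinlocks, and it ignores object lifetime (the kernel relies on RCU to keep the parent alive while nothing is locked). The `*dropped` out-parameter marks exactly the case in which a caller has to re-validate what it read before the lock was dropped.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical object; lock order is parent before child. */
struct obj {
	pthread_mutex_t lock;
	struct obj *parent;   /* protected by this object's lock */
};

/*
 * Called with child->lock held. Returns with child->lock and the returned
 * parent's lock both held, or NULL (child->lock still held) if there is no
 * parent. *dropped is set if child->lock was released along the way.
 */
static struct obj *lock_parent_sketch(struct obj *child, int *dropped)
{
	struct obj *parent = child->parent;

	assert(child != NULL && dropped != NULL);
	*dropped = 0;
	if (!parent)
		return NULL;

	/* Fast path: taking the lock out of order is safe if it never blocks. */
	if (pthread_mutex_trylock(&parent->lock) == 0)
		return parent;

	/* Slow path: drop the child, lock in order, then re-check. */
	*dropped = 1;
	for (;;) {
		pthread_mutex_unlock(&child->lock);
		pthread_mutex_lock(&parent->lock);
		pthread_mutex_lock(&child->lock);
		if (child->parent == parent)
			return parent;
		/* Child was moved while unlocked: retry with its new parent. */
		pthread_mutex_unlock(&parent->lock);
		parent = child->parent;
		if (!parent)
			return NULL;
	}
}
```

In the common uncontended case only the fast path runs and `*dropped` stays 0, so no re-check is needed.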

   Linus

Re: [PATCH 1/3] Kconfig: disable PROFILE_ALL_BRANCHES for compile testing

2018-02-16 Thread Arnd Bergmann

On Fri, Feb 16, 2018 at 11:14 PM, Arnd Bergmann  wrote:
> On Fri, Feb 16, 2018 at 11:03 PM, Steven Rostedt  wrote:
>> On Fri, 16 Feb 2018 22:41:11 +0100
>> Arnd Bergmann  wrote:
>>
>>> This can easily double the time for compiling a driver but does not
>>> provide any benefit for the compile tester, so it's better left disabled.
>>>
>>> In addition, any 'inline' function that is not also 'static' and that
>>> contains an 'if' causes a warning like
>>>
>>> include/linux/string.h:212:2: note: in expansion of macro 'if'
>>>   if (strscpy(p, q, p_size < q_size ? p_size : q_size) < 0)
>>>   ^~
>>> include/linux/compiler.h:162:4: warning: '__f' is static but declared in inline function 'strcpy' which is not static
>>>
>>> without this patch, and I could not come up with a nice fix for that.
>>> In combination with my patch to always enable 'CONFIG_COMPILE_TEST'
>>> during 'randconfig' builds, we can at least hide these warnings for
>>> most users.
>>
>> This looks like it fixes the same issue that was already fixed and is
>> in Linus's tree.
>>
>>  http://lkml.kernel.org/r/9199446b-a141-c0c3-9678-a3f9107f2...@infradead.org
>>
>> See commit 68e76e034b6b1 ("tracing: Prevent PROFILE_ALL_BRANCHES when
>> FORTIFY_SOURCE=y")
>
> Ah, right. I missed that when I wrote the new changelog text for this old
> patch of mine. It also means I should rebase the patch so it applies
> on mainline, as I still want PROFILE_ALL_BRANCHES to be disabled
> in COMPILE_TEST kernels for the build speed aspect.

I retested on top of that patch and found that a couple of other warnings show up
in an allmodconfig build with PROFILE_ALL_BRANCHES:

lib/zstd/decompress.c: In function 'ZSTD_decompressStream':
lib/zstd/decompress.c:416:2: error: argument 1 null where non-null expected [-Werror=nonnull]
drivers/crypto/qat/qat_common/qat_algs.c: In function 'qat_alg_do_precomputes':
drivers/crypto/qat/qat_common/qat_algs.c:156:7: error: argument 1 range [18446744071562067968, 18446744073709551615] exceeds maximum object size 9223372036854775807 [-Werror=alloc-size-larger-than=]
drivers/isdn/hardware/eicon/message.c: In function 'mixer_notify_update':
drivers/isdn/hardware/eicon/message.c:11162:54: error: array subscript is above array bounds [-Werror=array-bounds]
  ((CAPI_MSG *) msg)->info.facility_req.structs[1] = LI_REQ_SILENT_UPDATE & 0xff;
drivers/isdn/hardware/eicon/message.c:11163:54: error: array subscript is above array bounds [-Werror=array-bounds]
  ((CAPI_MSG *) msg)->info.facility_req.structs[2] = LI_REQ_SILENT_UPDATE >> 8;
drivers/isdn/hardware/eicon/message.c:11164:54: error: array subscript is above array bounds [-Werror=array-bounds]
  ((CAPI_MSG *) msg)->info.facility_req.structs[3] = 0;

All those are nonsense AFAICT, and we see them only because the "if()" override
ends up confusing gcc's value-range tracking in the same way it used to cause
lots of -Wmaybe-uninitialized warnings (which we just disable these days
with PROFILE_ALL_BRANCHES).

Arnd

Re: [PATCH v3] of: cache phandle nodes to reduce cost of of_find_node_by_phandle()

2018-02-16 Thread Frank Rowand

On 02/16/18 14:20, Frank Rowand wrote:
> On 02/16/18 01:04, Chintan Pandya wrote:
>>
>>
>> On 2/15/2018 6:22 AM, frowand.l...@gmail.com wrote:
>>> From: Frank Rowand 
>>>
>>> Create a cache of the nodes that contain a phandle property.  Use this
>>> cache to find the node for a given phandle value instead of scanning
>>> the devicetree to find the node.  If the phandle value is not found
>>> in the cache, of_find_node_by_phandle() will fall back to the tree
>>> scan algorithm.
>>>
> 
> < snip >
> 
>>> diff --git a/drivers/of/base.c b/drivers/of/base.c
>>> index ad28de96e13f..ab545dfa9173 100644
>>> --- a/drivers/of/base.c
>>> +++ b/drivers/of/base.c
>>> @@ -91,10 +91,69 @@ int __weak of_node_to_nid(struct device_node *np)
>>>   }
>>>   #endif
>>>   +static struct device_node **phandle_cache;
>>> +static u32 phandle_cache_mask;
>>> +
>>> +/*
>>> + * Assumptions behind phandle_cache implementation:
>>> + *   - phandle property values are in a contiguous range of 1..n
>>> + *
>>> + * If the assumptions do not hold, then
>>> + *   - the phandle lookup overhead reduction provided by the cache
>>> + * will likely be less
>>> + */
>>> +static void of_populate_phandle_cache(void)
>>> +{
>>> +    unsigned long flags;
>>> +    u32 cache_entries;
>>> +    struct device_node *np;
>>> +    u32 phandles = 0;
>>> +
>>> +    raw_spin_lock_irqsave(&devtree_lock, flags);
>>> +
>>> +    kfree(phandle_cache);
>>
>> I couldn't understand this. Everything else looks good to me.
> 
> I will be adding a call to of_populate_phandle_cache() from the
> devicetree overlay code.  I put the kfree here so that the previous
> cache memory is freed when a new cache is created.
> 
> Adding the call from the overlay code is not done in this
> series because I have a patch series modifying overlays and
> I do not want to create a conflict or ordering between that
> series and that patch.  The lack of the call from overlay
s/that patch/this patch/

> code means that overlay code will gain some of the overhead
> reduction from this patch, but possibly not the entire reduction.
> 
> 
>>
>>> +    phandle_cache = NULL;
>>> +
>>> +    for_each_of_allnodes(np)
> 
> < snip >
>

Re: [PATCH] of: add early boot allocation of of_find_node_by_phandle() cache

2018-02-16 Thread Frank Rowand

On 02/16/18 01:07, Chintan Pandya wrote:
> 
> 
> On 2/15/2018 6:14 AM, frowand.l...@gmail.com wrote:
>> From: Frank Rowand 
>>
>> The initial implementation of the of_find_node_by_phandle() cache
>> allocates the cache using kcalloc().  Add an early boot allocation
>> of the cache so it will be usable during early boot.  Switch over
>> to the kcalloc() based cache once normal memory allocation
>> becomes available.
>>
>> Signed-off-by: Frank Rowand 
>> ---
>>
>> This patch is optional, to be added at Rob's discretion.  The
>> extra complexity is not as much as I had feared, but the boot
>> speed up is also likely small.
>>
>>   drivers/of/base.c   | 33 +
>>   drivers/of/fdt.c    |  2 ++
>>   drivers/of/of_private.h |  2 ++
>>   3 files changed, 37 insertions(+)
>>
>> diff --git a/drivers/of/base.c b/drivers/of/base.c
>> index ab545dfa9173..d7b1ff1209e8 100644
>> --- a/drivers/of/base.c
>> +++ b/drivers/of/base.c
>> @@ -16,9 +16,11 @@
>>     #define pr_fmt(fmt)    "OF: " fmt
>>   +#include 
>>   #include 
>>   #include 
>>   #include 
>> +#include 
>>   #include 
>>   #include 
>>   #include 
>> @@ -131,6 +133,29 @@ static void of_populate_phandle_cache(void)
>>   raw_spin_unlock_irqrestore(&devtree_lock, flags);
>>   }
>>   +void __init of_populate_phandle_cache_early(void)
>> +{
>> +    u32 cache_entries;
>> +    struct device_node *np;
>> +    u32 phandles = 0;
>> +    size_t size;
>> +
>> +    for_each_of_allnodes(np)
>> +    if (np->phandle && np->phandle != OF_PHANDLE_ILLEGAL)
>> +    phandles++;
>> +
>> +    cache_entries = roundup_pow_of_two(phandles);
>> +    phandle_cache_mask = cache_entries - 1;
>> +
>> +    size = cache_entries * sizeof(*phandle_cache);
>> +    phandle_cache = memblock_virt_alloc(size, 4);
>> +    memset(phandle_cache, 0, size);
>> +
>> +    for_each_of_allnodes(np)
>> +    if (np->phandle && np->phandle != OF_PHANDLE_ILLEGAL)
>> +    phandle_cache[np->phandle & phandle_cache_mask] = np;
>> +}
> 
> There is a lot of code duplication in this function with
> of_populate_phandle_cache. Would you think of taking out
> common code or differ the function with extra bool parameter
> to say 'early' or 'not early'.

Good observation, and normally yes.  My first implementation of this
feature actually did what you suggest.

It turns out to be a bit more complicated than one might expect
because some of the code is marked __init.  That results in
passing the memory allocation function as a parameter to
of_populate_phandle_cache().  See __unflatten_device_tree() for
an example of what this entails.  Then the parts of
of_populate_phandle_cache() that need to be encapsulated in an
'if (!early)' test are scattered throughout the function, so the test
becomes rather intrusive in terms of code readability.

In the end, the method I chose results in cleaner code for
of_populate_phandle_cache(), plus the memory used by
of_populate_phandle_cache_early() gets reclaimed after boot, since
it is marked __init.  Both functions are relatively small and
the code common to both is unlikely to be modified, so I do
not see this causing a maintenance burden.
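To make the trade-off concrete, here is a rough userspace sketch of the shared-function alternative: a single populate routine that takes the allocator as a parameter, similar in spirit to how __unflatten_device_tree() is passed its allocation function. Everything here (struct node, populate_cache(), early_alloc()) is hypothetical, and it deliberately omits the devtree_lock locking, the kfree() of the previous cache and the __init placement, which are the parts that make the real merged function messier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node list standing in for for_each_of_allnodes(). */
struct node {
	uint32_t phandle;	/* 0 means "no phandle" */
	struct node *next;
};

typedef void *(*alloc_fn)(size_t size);

static struct node **cache;
static uint32_t cache_mask;

static uint32_t roundup_pow2(uint32_t n)
{
	uint32_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/* One routine for both phases; only the allocator differs. */
static int populate_cache(struct node *all, alloc_fn alloc)
{
	uint32_t count = 0, entries;
	struct node *np;

	for (np = all; np; np = np->next)
		if (np->phandle)
			count++;

	entries = roundup_pow2(count);
	cache = alloc(entries * sizeof(*cache));
	if (!cache)
		return -1;
	memset(cache, 0, entries * sizeof(*cache));
	cache_mask = entries - 1;

	for (np = all; np; np = np->next)
		if (np->phandle)
			cache[np->phandle & cache_mask] = np;
	return 0;
}

/* Stand-in for an early-boot allocator: a bump pointer over a static pool. */
static struct node *early_pool[64];
static size_t early_used;	/* in pool slots */

static void *early_alloc(size_t size)
{
	size_t slots = (size + sizeof(early_pool[0]) - 1) / sizeof(early_pool[0]);
	void *p;

	if (slots > sizeof(early_pool) / sizeof(early_pool[0]) - early_used)
		return NULL;
	p = &early_pool[early_used];
	early_used += slots;
	return p;
}
```

In this shape the late call is simply populate_cache(list, malloc). The cost only appears once the real function must also lock, free the old cache and be callable after __init memory is gone.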

-Frank

>> +
>>   #ifndef CONFIG_MODULES
>>   static int __init of_free_phandle_cache(void)
>>   {
>> @@ -150,7 +175,15 @@ static int __init of_free_phandle_cache(void)
>>     void __init of_core_init(void)
>>   {
>> +    unsigned long flags;
>>   struct device_node *np;
>> +    phys_addr_t size;
>> +
>> +    raw_spin_lock_irqsave(&devtree_lock, flags);
>> +    size = (phandle_cache_mask + 1) * sizeof(*phandle_cache);
>> +    memblock_free(__pa(phandle_cache), size);
>> +    phandle_cache = NULL;
>> +    raw_spin_unlock_irqrestore(&devtree_lock, flags);
>>     of_populate_phandle_cache();
>>   diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
>> index 84aa9d676375..cb320df23f26 100644
>> --- a/drivers/of/fdt.c
>> +++ b/drivers/of/fdt.c
>> @@ -1264,6 +1264,8 @@ void __init unflatten_device_tree(void)
>>   of_alias_scan(early_init_dt_alloc_memory_arch);
>>     unittest_unflatten_overlay_base();
>> +
>> +    of_populate_phandle_cache_early();
>>   }
>>     /**
>> diff --git a/drivers/of/of_private.h b/drivers/of/of_private.h
>> index fa70650136b4..6720448c84cc 100644
>> --- a/drivers/of/of_private.h
>> +++ b/drivers/of/of_private.h
>> @@ -134,6 +134,8 @@ extern void __of_sysfs_remove_bin_file(struct device_node *np,
>>   /* illegal phandle value (set when unresolved) */
>>   #define OF_PHANDLE_ILLEGAL    0xdeadbeef
>>   +extern void __init of_populate_phandle_cache_early(void);
>> +
>>   /* iterators for transactions, used for overlays */
>>   /* forward iterator */
>>   #define for_each_transaction_entry(_oft, _te) \
>>
> 
> Chintan

Re: [PATCH 4/4] fs/dcache: Avoid the try_lock loops in dentry_kill()

2018-02-16 Thread John Ogness

On 2018-02-16, Linus Torvalds  wrote:
> On Fri, Feb 16, 2018 at 7:09 AM, John Ogness  
> wrote:
>> dentry_kill() holds dentry->d_lock and needs to acquire both
>> dentry->d_inode->i_lock and dentry->d_parent->d_lock. This cannot be
>> done with spin_lock() operations because it's the reverse of the
>> regular lock order. To avoid ABBA deadlocks it is done with two
>> trylock loops.
>>
>> Trylock loops are problematic in two scenarios:
>
> I don't mind this patch series per se (although I would really like Al
> to ack it), but this particular patch I hate.
>
> Why?
>
>> Avoid the trylock loops by using dentry_lock_inode() and lock_parent()
>> which take the locks in the appropriate order. As both functions drop
>> dentry->lock briefly, this requires rechecking of the dentry content
>> as it might have changed after dropping the lock.
>
> I think the trylock should be done first, and then you don't need that
> recheck for the common case.
>
> I realize that the recheck itself isn't expensive, but it's mostly
> about the code flow and the comment:
>
>> +* Recheck refcount as it might have been incremented while
>> +* d_lock was dropped.
>
> the thing is, 99.9% of the time the d_lock wasn't dropped, so that
> "while d_lock was dropped" comment is misleading.
>
> Re-organizing it to do the trylock fastpath explicitly here and not
> bothering with the re-check etc crud for the common case is the right
> thing to do.
>
> And the old code was actually organized exactly that way, with a
>
> if (inode && unlikely(!spin_trylock(&inode->i_lock)))
> goto failed;
>
> at the top.
>
> But instead of having that unlikely "failed" case do the complex
> thing, you made the *normal* case do the complex thing.
>
> So NAK on this.

lock_parent() already has the problem you are referring to. Callers are
required to recheck the dentry contents and check the returned parent
because they do not know if the trylock succeeded. See
d_prune_aliases(), for example.

Would you like my v2 to fix up the lock_parent() semantics to address
your concerns there as well?
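The structure Linus asks for, sketched in userspace with pthread mutexes: attempt the out-of-order trylock first, and confine the drop, relock and recheck sequence to the failure path. struct obj and lock_parent_for_kill() are hypothetical, and treating refcount == 1 as "caller holds the last reference" is an assumption for illustration, not the dcache rule.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical object with its own lock and a reference count. */
struct obj {
	pthread_mutex_t lock;
	int refcount;
};

/*
 * Called with child->lock held; parent->lock is needed too, but the
 * normal lock order is parent before child, so blocking on parent here
 * could ABBA-deadlock.
 *
 * Returns 1 with both locks held if the child may still be killed, or
 * 0 with only child->lock held if it gained a reference while its lock
 * was dropped (refcount == 1 is assumed to mean "last reference").
 */
static int lock_parent_for_kill(struct obj *child, struct obj *parent)
{
	/* Fast path: child->lock was never dropped, nothing to recheck. */
	if (pthread_mutex_trylock(&parent->lock) == 0)
		return 1;

	/* Slow path: drop child, then take both in the proper order. */
	pthread_mutex_unlock(&child->lock);
	pthread_mutex_lock(&parent->lock);
	pthread_mutex_lock(&child->lock);

	/* Only on this path can the child have changed under us. */
	if (child->refcount != 1) {
		pthread_mutex_unlock(&parent->lock);
		return 0;
	}
	return 1;
}
```

The recheck, and the comment explaining it, then lives only on the path where the lock really was dropped.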

John Ogness

Nokia N900 camera in v4.16-rc1: ready for testing?

2018-02-16 Thread Pavel Machek

Hi!

The camera should work on the N900 with v4.16-rc1. Autofocus should
work; flash support is in the queue.

The patch below is needed for non-square images. A patched v4l-utils is
strongly recommended for taking photos.

Sakari: any ideas about this one? This is the bug I showed you in
Prague...

Best regards,
Pavel

commit b685b7b98fc50149779416b33d234e4f9ff6ad0e
Author: Pavel 
Date:   Mon Feb 13 21:26:51 2017 +0100

omap3isp: fix VP2SDR bit so capture (not preview) works

This is necessary for capture (not preview) to work properly on the
N900. Why is not known.

diff --git a/drivers/media/platform/omap3isp/ispccdc.c b/drivers/media/platform/omap3isp/ispccdc.c
index b66276a..6435857 100644
--- a/drivers/media/platform/omap3isp/ispccdc.c
+++ b/drivers/media/platform/omap3isp/ispccdc.c
@@ -1182,7 +1182,8 @@ static void ccdc_configure(struct isp_ccdc_device *ccdc)
/* Use the raw, unprocessed data when writing to memory. The H3A and
 * histogram modules are still fed with lens shading corrected data.
 */
-   syn_mode &= ~ISPCCDC_SYN_MODE_VP2SDR;
+// syn_mode &= ~ISPCCDC_SYN_MODE_VP2SDR;
+   syn_mode |= ISPCCDC_SYN_MODE_VP2SDR;
 
if (ccdc->output & CCDC_OUTPUT_MEMORY)
syn_mode |= ISPCCDC_SYN_MODE_WEN;
@@ -1249,6 +1250,8 @@ static void ccdc_configure(struct isp_ccdc_device *ccdc)
<< ISPCCDC_VERT_LINES_NLV_SHIFT,
   OMAP3_ISP_IOMEM_CCDC, ISPCCDC_VERT_LINES);
 
+   printk("configuring for %d(%d)x%d\n", crop->width, ccdc->video_out.bpl_value, crop->height);
+
ccdc_config_outlineoffset(ccdc, ccdc->video_out.bpl_value,
  format->field);
 

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



[PATCH] Staging: gdm724x: tty: Fix macro argument reuse that could cause side-effects.

2018-02-16 Thread Quytelda Kahja

Fix a coding style warning from checkpatch.pl about macro argument reuse.
Use GNU extensions (statement expressions and typeof) to evaluate each
problematic macro argument once into a local variable, so that it can be
referenced multiple times without repeating its side effects.

Signed-off-by: Quytelda Kahja 
---
 drivers/staging/gdm724x/gdm_tty.c | 24 
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/drivers/staging/gdm724x/gdm_tty.c b/drivers/staging/gdm724x/gdm_tty.c
index fc7682c18f20..73d39fa86d10 100644
--- a/drivers/staging/gdm724x/gdm_tty.c
+++ b/drivers/staging/gdm724x/gdm_tty.c
@@ -37,14 +37,22 @@
 
 #define MUX_TX_MAX_SIZE 2048
 
-#define gdm_tty_send(n, d, l, i, c, b) (\
-   n->tty_dev->send_func(n->tty_dev->priv_dev, d, l, i, c, b))
-#define gdm_tty_recv(n, c) (\
-   n->tty_dev->recv_func(n->tty_dev->priv_dev, c))
-#define gdm_tty_send_control(n, r, v, d, l) (\
-   n->tty_dev->send_control(n->tty_dev->priv_dev, r, v, d, l))
-
-#define GDM_TTY_READY(gdm) (gdm && gdm->tty_dev && gdm->port.count)
+#define gdm_tty_send(n, d, l, i, c, b) \
+   ({ typeof(n) n_ = (n);  \
+   void *priv_dev = n_->tty_dev->priv_dev; \
+   n_->tty_dev->send_func(priv_dev, d, l, i, c, b); })
+#define gdm_tty_recv(n, c) \
+   ({ typeof(n) n_ = (n);  \
+   void *priv_dev = n_->tty_dev->priv_dev; \
+   n_->tty_dev->recv_func(priv_dev, c); })
+#define gdm_tty_send_control(n, r, v, d, l)\
+   ({ typeof(n) n_ = (n);  \
+   void *priv_dev = n_->tty_dev->priv_dev; \
+   n_->tty_dev->send_control(priv_dev, r, v, d, l); })
+
+#define GDM_TTY_READY(gdm) \
+   ({ typeof(gdm) gdm_ = gdm;  \
+   gdm_ && gdm_->tty_dev && gdm_->port.count; })
 
 static struct tty_driver *gdm_driver[TTY_MAX_COUNT];
 static struct gdm *gdm_table[TTY_MAX_COUNT][GDM_TTY_MINOR];
-- 
2.16.1

[PATCH] tools/memory-model: remove rb-dep, smp_read_barrier_depends, and lockless_dereference

2018-02-16 Thread Alan Stern

Since commit 76ebbe78f739 ("locking/barriers: Add implicit
smp_read_barrier_depends() to READ_ONCE()") was merged for the 4.15
kernel, it has not been necessary to use smp_read_barrier_depends().
Similarly, commit 59ecbbe7b31c ("locking/barriers: Kill
lockless_dereference()") removed lockless_dereference() from the
kernel.

Since these primitives are no longer part of the kernel, they do not
belong in the Linux Kernel Memory Consistency Model.  This patch
removes them, along with the internal rb-dep relation, and updates the
relevant documentation.
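The pattern rb-dep and smp_read_barrier_depends() existed for is a load of a pointer followed by a dereference through it (an address dependency); after 76ebbe78f739 kernel code handles this with READ_ONCE() or rcu_dereference() on the reader side. A rough userspace analogue in C11 atomics follows. It is not LKMM or kernel code, the names are illustrative, and mainstream compilers currently implement memory_order_consume as memory_order_acquire, so it is stronger than the kernel's dependency ordering.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

struct msg {
	int payload;
};

static struct msg the_msg;
static _Atomic(struct msg *) published;	/* NULL until the writer publishes */

static void *writer(void *arg)
{
	(void)arg;
	the_msg.payload = 42;
	/* Release: the initialization above is visible before the pointer. */
	atomic_store_explicit(&published, &the_msg, memory_order_release);
	return NULL;
}

/* Spin until the pointer appears, then read through it. */
static int reader(void)
{
	struct msg *p;

	/*
	 * The dereference below carries an address dependency on this
	 * load; consume ordering is the C11 name for relying on that.
	 */
	while (!(p = atomic_load_explicit(&published, memory_order_consume)))
		;
	return p->payload;
}
```

Running writer() in one thread and reader() in another, the reader always observes 42 once it sees a non-NULL pointer.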

Signed-off-by: Alan Stern 

---

Index: usb-4.x/tools/memory-model/linux-kernel.cat
===
--- usb-4.x/tools/memory-model.orig/linux-kernel.cat
+++ usb-4.x/tools/memory-model/linux-kernel.cat
@@ -25,7 +25,6 @@ include "lock.cat"
 (***)
 
 (* Fences *)
-let rb-dep = [R] ; fencerel(Rb_dep) ; [R]
 let rmb = [R \ Noreturn] ; fencerel(Rmb) ; [R \ Noreturn]
 let wmb = [W] ; fencerel(Wmb) ; [W]
 let mb = ([M] ; fencerel(Mb) ; [M]) |
@@ -61,11 +60,9 @@ let dep = addr | data
 let rwdep = (dep | ctrl) ; [W]
 let overwrite = co | fr
 let to-w = rwdep | (overwrite & int)
-let rrdep = addr | (dep ; rfi)
-let strong-rrdep = rrdep+ & rb-dep
-let to-r = strong-rrdep | rfi-rel-acq
+let to-r = addr | (dep ; rfi) | rfi-rel-acq
 let fence = strong-fence | wmb | po-rel | rmb | acq-po
-let ppo = rrdep* ; (to-r | to-w | fence)
+let ppo = to-r | to-w | fence
 
 (* Propagation: Ordering from release operations and strong fences. *)
 let A-cumul(r) = rfe? ; r
Index: usb-4.x/tools/memory-model/Documentation/explanation.txt
===
--- usb-4.x/tools/memory-model.orig/Documentation/explanation.txt
+++ usb-4.x/tools/memory-model/Documentation/explanation.txt
@@ -1,5 +1,5 @@
-Explanation of the Linux-Kernel Memory Model
-
+Explanation of the Linux-Kernel Memory Consistency Model
+
 
 :Author: Alan Stern 
 :Created: October 2017
@@ -35,25 +35,24 @@ Explanation of the Linux-Kernel Memory M
 INTRODUCTION
 
 
-The Linux-kernel memory model (LKMM) is rather complex and obscure.
-This is particularly evident if you read through the linux-kernel.bell
-and linux-kernel.cat files that make up the formal version of the
-memory model; they are extremely terse and their meanings are far from
-clear.
+The Linux-kernel memory consistency model (LKMM) is rather complex and
+obscure.  This is particularly evident if you read through the
+linux-kernel.bell and linux-kernel.cat files that make up the formal
+version of the model; they are extremely terse and their meanings are
+far from clear.
 
 This document describes the ideas underlying the LKMM.  It is meant
-for people who want to understand how the memory model was designed.
-It does not go into the details of the code in the .bell and .cat
-files; rather, it explains in English what the code expresses
-symbolically.
+for people who want to understand how the model was designed.  It does
+not go into the details of the code in the .bell and .cat files;
+rather, it explains in English what the code expresses symbolically.
 
 Sections 2 (BACKGROUND) through 5 (ORDERING AND CYCLES) are aimed
-toward beginners; they explain what memory models are and the basic
-notions shared by all such models.  People already familiar with these
-concepts can skim or skip over them.  Sections 6 (EVENTS) through 12
-(THE FROM_READS RELATION) describe the fundamental relations used in
-many memory models.  Starting in Section 13 (AN OPERATIONAL MODEL),
-the workings of the LKMM itself are covered.
+toward beginners; they explain what memory consistency models are and
+the basic notions shared by all such models.  People already familiar
+with these concepts can skim or skip over them.  Sections 6 (EVENTS)
+through 12 (THE FROM_READS RELATION) describe the fundamental
+relations used in many models.  Starting in Section 13 (AN OPERATIONAL
+MODEL), the workings of the LKMM itself are covered.
 
 Warning: The code examples in this document are not written in the
 proper format for litmus tests.  They don't include a header line, the
@@ -827,8 +826,8 @@ A-cumulative; they only affect the propa
 executed on C before the fence (i.e., those which precede the fence in
 program order).
 
-smp_read_barrier_depends(), rcu_read_lock(), rcu_read_unlock(), and
-synchronize_rcu() fences have other properties which we discuss later.
+rcu_read_lock(), rcu_read_unlock(), and synchronize_rcu() fences have
+other properties which we discuss later.
 
 
 PROPAGATION ORDER RELATION: cumul-fence
@@ -988,8 +987,8 @@ Another possibility, not mentioned earli
 section, is:
 
X and Y are both loads, X ->addr Y (i.e., there is an address
-   dependency from X to Y), and an smp_read_barrier_depends()
-   fence occurs between them.
+   dependen

Re: [PATCH v3] of: cache phandle nodes to reduce cost of of_find_node_by_phandle()

2018-02-16 Thread Frank Rowand

On 02/16/18 01:04, Chintan Pandya wrote:
> 
> 
> On 2/15/2018 6:22 AM, frowand.l...@gmail.com wrote:
>> From: Frank Rowand 
>>
>> Create a cache of the nodes that contain a phandle property.  Use this
>> cache to find the node for a given phandle value instead of scanning
>> the devicetree to find the node.  If the phandle value is not found
>> in the cache, of_find_node_by_phandle() will fall back to the tree
>> scan algorithm.
>>

< snip >

>> diff --git a/drivers/of/base.c b/drivers/of/base.c
>> index ad28de96e13f..ab545dfa9173 100644
>> --- a/drivers/of/base.c
>> +++ b/drivers/of/base.c
>> @@ -91,10 +91,69 @@ int __weak of_node_to_nid(struct device_node *np)
>>   }
>>   #endif
>>   +static struct device_node **phandle_cache;
>> +static u32 phandle_cache_mask;
>> +
>> +/*
>> + * Assumptions behind phandle_cache implementation:
>> + *   - phandle property values are in a contiguous range of 1..n
>> + *
>> + * If the assumptions do not hold, then
>> + *   - the phandle lookup overhead reduction provided by the cache
>> + * will likely be less
>> + */
>> +static void of_populate_phandle_cache(void)
>> +{
>> +    unsigned long flags;
>> +    u32 cache_entries;
>> +    struct device_node *np;
>> +    u32 phandles = 0;
>> +
>> +    raw_spin_lock_irqsave(&devtree_lock, flags);
>> +
>> +    kfree(phandle_cache);
> 
> I couldn't understand this. Everything else looks good to me.

I will be adding a call to of_populate_phandle_cache() from the
devicetree overlay code.  I put the kfree here so that the previous
cache memory is freed when a new cache is created.

Adding the call from the overlay code is not done in this
series because I have a patch series modifying overlays and
I do not want to create a conflict or ordering between that
series and that patch.  The lack of the call from overlay
code means that overlay code will gain some of the overhead
reduction from this patch, but possibly not the entire reduction.
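A userspace sketch of the lifecycle described above: each (re)population frees the previous array before building a new one, and lookups still fall back to a full scan when the masked slot is empty or holds a different phandle (for example a node added later, as an overlay would add). struct node, populate_cache() and find_by_phandle() are hypothetical; devtree_lock locking and node refcounting are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical node list standing in for for_each_of_allnodes(). */
struct node {
	uint32_t phandle;	/* 0 means "no phandle" */
	struct node *next;
};

static struct node **cache;
static uint32_t cache_mask;

static uint32_t roundup_pow2(uint32_t n)
{
	uint32_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/* (Re)build the cache; callable again after the tree changes. */
static void populate_cache(struct node *all)
{
	uint32_t count = 0, entries;
	struct node *np;

	free(cache);		/* release the previous cache, like the kfree() */
	cache = NULL;

	for (np = all; np; np = np->next)
		if (np->phandle)
			count++;

	entries = roundup_pow2(count);
	cache = calloc(entries, sizeof(*cache));
	if (!cache)
		return;		/* lookups still work via the scan */
	cache_mask = entries - 1;

	for (np = all; np; np = np->next)
		if (np->phandle)
			cache[np->phandle & cache_mask] = np;
}

static struct node *find_by_phandle(struct node *all, uint32_t ph)
{
	struct node *np;

	if (!ph)
		return NULL;
	/* Cache hit only if the slot holds exactly this phandle. */
	if (cache && cache[ph & cache_mask] &&
	    cache[ph & cache_mask]->phandle == ph)
		return cache[ph & cache_mask];
	/* Miss (never cached, or slot taken by a colliding phandle): scan. */
	for (np = all; np; np = np->next)
		if (np->phandle == ph)
			return np;
	return NULL;
}
```

In this sketch a node added after the cache was built (phandle 5) is still found through the scan. After repopulating, phandles 1 and 5 share slot 1 (5 & 3 == 1), so one of them keeps falling back to the scan, which is the "less reduction" case the patch comment mentions for non-contiguous phandles.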


> 
>> +    phandle_cache = NULL;
>> +
>> +    for_each_of_allnodes(np)

< snip >

Re: [PATCH 1/3] Kconfig: disable PROFILE_ALL_BRANCHES for compile testing

2018-02-16 Thread Arnd Bergmann

On Fri, Feb 16, 2018 at 11:03 PM, Steven Rostedt  wrote:
> On Fri, 16 Feb 2018 22:41:11 +0100
> Arnd Bergmann  wrote:
>
>> This can easily double the time for compiling a driver but does not
>> provide any benefit for the compile tester, so it's better left disabled.
>>
>> In addition, any 'inline' function that is not also 'static' and that
>> contains an 'if' causes a warning like
>>
>> include/linux/string.h:212:2: note: in expansion of macro 'if'
>>   if (strscpy(p, q, p_size < q_size ? p_size : q_size) < 0)
>>   ^~
>> include/linux/compiler.h:162:4: warning: '__f' is static but declared in inline function 'strcpy' which is not static
>>
>> without this patch, and I could not come up with a nice fix for that.
>> In combination with my patch to always enable 'CONFIG_COMPILE_TEST'
>> during 'randconfig' builds, we can at least hide these warnings for
>> most users.
>
> This looks like it fixes the same issue that was already fixed and is
> in Linus's tree.
>
>  http://lkml.kernel.org/r/9199446b-a141-c0c3-9678-a3f9107f2...@infradead.org
>
> See commit 68e76e034b6b1 ("tracing: Prevent PROFILE_ALL_BRANCHES when
> FORTIFY_SOURCE=y")

Ah, right. I missed that when I wrote the new changelog text for this old
patch of mine. It also means I should rebase the patch so it applies
on mainline, as I still want PROFILE_ALL_BRANCHES to be disabled
in COMPILE_TEST kernels for the build speed aspect.

Greg, could you add the 68e76e034b6b1 commit to 4.14-stable and
4.15-stable in the meantime?

Arnd

Re: [PATCH v3 5/6] vfio/type1: Add IOVA range capability support

2018-02-16 Thread Alex Williamson

On Thu, 15 Feb 2018 09:45:03 +
Shameer Kolothum  wrote:

> This  allows the user-space to retrieve the supported IOVA
> range(s), excluding any reserved regions. The implementation
> is based on capability chains, added to VFIO_IOMMU_GET_INFO ioctl.
> 
> Signed-off-by: Shameer Kolothum 
> ---
>  drivers/vfio/vfio_iommu_type1.c | 92 
> +
>  include/uapi/linux/vfio.h   | 23 +++
>  2 files changed, 115 insertions(+)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index dae01c5..21e575c 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -1925,6 +1925,68 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
>   return ret;
>  }
>  
> +static int vfio_add_iova_cap(struct vfio_info_cap *caps,
> +  struct vfio_iommu_type1_info_cap_iova_range *cap_iovas,
> +  size_t size)
> +{
> + struct vfio_info_cap_header *header;
> + struct vfio_iommu_type1_info_cap_iova_range *iova_cap;
> +
> + header = vfio_info_cap_add(caps, size,
> + VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE, 1);
> + if (IS_ERR(header))
> + return PTR_ERR(header);
> +
> + iova_cap = container_of(header,
> + struct vfio_iommu_type1_info_cap_iova_range, header);
> + iova_cap->nr_iovas = cap_iovas->nr_iovas;
> + memcpy(iova_cap->iova_ranges, cap_iovas->iova_ranges,
> + cap_iovas->nr_iovas * sizeof(*cap_iovas->iova_ranges));
> + return 0;
> +}
> +
> +static int vfio_build_iommu_iova_caps(struct vfio_iommu *iommu,
> + struct vfio_info_cap *caps)
> +{
> + struct vfio_iommu_type1_info_cap_iova_range *cap_iovas;
> + struct vfio_iova *iova;
> + size_t size;
> + int iovas = 0, i = 0, ret;
> +
> + mutex_lock(&iommu->lock);
> +
> + list_for_each_entry(iova, &iommu->iova_list, list)
> + iovas++;
> +
> + if (!iovas) {
> + ret = -EINVAL;
> + goto out_unlock;
> + }
> +
> + size = sizeof(*cap_iovas) + (iovas * sizeof(*cap_iovas->iova_ranges));
> +
> + cap_iovas = kzalloc(size, GFP_KERNEL);
> + if (!cap_iovas) {
> + ret = -ENOMEM;
> + goto out_unlock;
> + }
> +
> + cap_iovas->nr_iovas = iovas;
> +
> + list_for_each_entry(iova, &iommu->iova_list, list) {
> + cap_iovas->iova_ranges[i].start = iova->start;
> + cap_iovas->iova_ranges[i].end = iova->end;
> + i++;
> + }
> +
> + ret = vfio_add_iova_cap(caps, cap_iovas, size);
> +
> + kfree(cap_iovas);
> +out_unlock:
> + mutex_unlock(&iommu->lock);
> + return ret;
> +}
> +
>  static long vfio_iommu_type1_ioctl(void *iommu_data,
>  unsigned int cmd, unsigned long arg)
>  {
> @@ -1946,6 +2008,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>   }
>   } else if (cmd == VFIO_IOMMU_GET_INFO) {
>   struct vfio_iommu_type1_info info;
> + struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
> + int ret;
>  
>   minsz = offsetofend(struct vfio_iommu_type1_info, iova_pgsizes);
>  
> @@ -1959,6 +2023,34 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>  
>   info.iova_pgsizes = vfio_pgsize_bitmap(iommu);
>  
> + if (info.argsz == minsz)
> + goto done;

I don't think the above branch should exist, we want to tell the user
via argsz and flags that capabilities exist even if they only passed
the previous structure size through.

> +
> + ret = vfio_build_iommu_iova_caps(iommu, &caps);
> + if (ret)
> + return ret;
> +
> + if (caps.size) {
> + info.flags |= VFIO_IOMMU_INFO_CAPS;
> + minsz = offsetofend(struct vfio_iommu_type1_info,
> +  cap_offset);

Only update minsz if this is within the provided argsz.

> + if (info.argsz < sizeof(info) + caps.size) {
> + info.argsz = sizeof(info) + caps.size;
> + info.cap_offset = 0;

IOW, if cap_offset doesn't get copied to the user, that's ok, we've
provided them the flag and argsz they need to recognize it's there and
call with a sufficient buffer next time.
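A userspace model of the argsz handshake described here: the caps flag and the required argsz are reported even to a caller that passed only the old structure size, minsz grows to include cap_offset only when the caller's argsz covers it, and cap_offset stays 0 when the chain does not fit. struct info, INFO_FLAG_CAPS and get_info() are hypothetical simplifications, not the real uapi or ioctl code, and copying the chain itself is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define INFO_FLAG_CAPS 0x2u	/* hypothetical stand-in for VFIO_IOMMU_INFO_CAPS */

/* Simplified stand-in for struct vfio_iommu_type1_info. */
struct info {
	uint32_t argsz;
	uint32_t flags;
	uint64_t pgsizes;
	uint32_t cap_offset;	/* newer field; older userspace doesn't know it */
};

/* Structure size known to older userspace (offsetofend of pgsizes). */
#define OLD_MINSZ (offsetof(struct info, pgsizes) + sizeof(uint64_t))
/* Size including cap_offset. */
#define CAP_MINSZ (offsetof(struct info, cap_offset) + sizeof(uint32_t))

/*
 * 'in' is what userspace passed, 'out' is what it gets back; only the
 * first minsz bytes are copied, mimicking copy_to_user(arg, &info,
 * minsz).  Returns minsz, or 0 where the ioctl would return -EINVAL.
 */
static size_t get_info(const struct info *in, struct info *out,
		       uint32_t caps_size)
{
	struct info info;
	size_t minsz = OLD_MINSZ;

	memset(&info, 0, sizeof(info));
	info.argsz = in->argsz;
	if (info.argsz < minsz)
		return 0;
	info.pgsizes = 0x1000;

	/* No early exit when argsz == OLD_MINSZ: always advertise caps. */
	if (caps_size) {
		info.flags |= INFO_FLAG_CAPS;
		/* Grow minsz only if the caller's buffer covers cap_offset. */
		if (info.argsz >= CAP_MINSZ)
			minsz = CAP_MINSZ;
		if (info.argsz < sizeof(info) + caps_size) {
			/* Too small: report the size needed next time. */
			info.argsz = sizeof(info) + caps_size;
			info.cap_offset = 0;
		} else {
			/* The chain would be copied right after the struct. */
			info.cap_offset = sizeof(info);
		}
	}
	memcpy(out, &info, minsz);
	return minsz;
}
```

An old caller passing OLD_MINSZ gets back the caps flag and a larger argsz; retrying with that argsz yields a nonzero cap_offset.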

> + } else {
> + vfio_info_cap_shift(&caps, sizeof(info));
> + if (copy_to_user((void __user *)arg +
> + sizeof(info), caps.buf,
> + caps.size)) {
> + kfree(caps.buf);
> + return -EFAULT;
> + }
> + info.cap_offset = siz

[PATCH] tun: fix mismatch in mutex lock-unlock in tun_get_user()

2018-02-16 Thread Alexey Khoroshilov

There is a single error path where tfile->napi_mutex is left locked,
which can lead to a deadlock.

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Alexey Khoroshilov 
---
 drivers/net/tun.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 81e6cc951e7f..0072a9832532 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1879,6 +1879,10 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
default:
this_cpu_inc(tun->pcpu_stats->rx_dropped);
kfree_skb(skb);
+   if (frags) {
+   tfile->napi.skb = NULL;
+   mutex_unlock(&tfile->napi_mutex);
+   }
return -EINVAL;
}
}
-- 
2.7.4
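The rule this fix restores, modelled in userspace with a pthread mutex: once napi_mutex has been taken for a frags packet, every exit, including the default case of the IP-version switch, must clear the pending state and unlock. struct queue and get_user_sketch() are hypothetical, and the 4/6 switch only loosely mirrors the check in tun_get_user().

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant bits of struct tun_file. */
struct queue {
	pthread_mutex_t napi_mutex;
	int napi_pending;	/* models tfile->napi.skb being set */
};

/*
 * With 'frags' set, napi_mutex is taken before the packet is parsed,
 * so every return after that point must release it again.
 */
static int get_user_sketch(struct queue *q, bool frags, int ip_version)
{
	if (frags) {
		pthread_mutex_lock(&q->napi_mutex);
		q->napi_pending = 1;
	}

	switch (ip_version) {
	case 4:
	case 6:
		break;
	default:
		/* Error path: undo the frags state, as the fix does. */
		if (frags) {
			q->napi_pending = 0;
			pthread_mutex_unlock(&q->napi_mutex);
		}
		return -EINVAL;
	}

	/* Normal path: hand off the packet, then drop the lock. */
	if (frags) {
		q->napi_pending = 0;
		pthread_mutex_unlock(&q->napi_mutex);
	}
	return 0;
}
```

A common kernel alternative is a single goto label that performs the unlock, so a newly added error path cannot forget it; the patch instead mirrors the cleanup inline in the one affected branch.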

Re: [PATCH RFC v2 4/6] x86: Disable PTI on compatibility mode

2018-02-16 Thread Nadav Amit

Dmitry Safonov <0x7f454...@gmail.com> wrote:

> 2018-02-16 7:11 GMT+00:00 Cyrill Gorcunov :
>> On Thu, Feb 15, 2018 at 11:29:42PM +, Andy Lutomirski wrote:
>> ...
>> +bool pti_handle_segment_not_present(long error_code)
>> +{
>> +   if (!static_cpu_has(X86_FEATURE_PTI))
>> +   return false;
>> +
>> +   if ((unsigned short)error_code != GDT_ENTRY_DEFAULT_USER_CS << 3)
>> +   return false;
>> +
>> +   pti_reenable();
>> +   return true;
>> +}
> 
> Please don't.  You're trying to emulate the old behavior here, but
> you're emulating it wrong.  In particular, you won't trap on LAR.
 
 Yes, I thought I’ll manage to address LAR, but failed. I thought you said
 this is not a “show-stopper”. I’ll adapt your approach of using prctl, 
 although
 it really limits the benefit of this mechanism.
>>> 
>>> It's possible we could get away with adding the prctl but making the
>>> default be that only the bitness that matches the program being run is
>>> allowed.  After all, it's possible that CRIU is literally the only
>>> program that switches bitness using the GDT.  (DOSEMU2 definitely does
>>> cross-bitness stuff, but it uses the LDT as far as I know.)  And I've
>>> never been entirely sure that CRIU fully counts toward the Linux
>>> "don't break ABI" guarantee.
>>> 
>>> Linus, how would you feel about, by default, preventing 64-bit
>>> programs from long-jumping to __USER32_CS and vice versa?  I think it
>>> has some value as a hardening measure.  I've certainly engaged in some
>>> exploit shenanigans myself that took advantage of the ability to long
>>> jump/ret to change bitness at will.  This wouldn't affect users of
>>> modify_ldt() -- 64-bit programs could still create and use their own
>>> private 32-bit segments with modify_ldt(), and seccomp can (and
>>> should!) prevent that in sandboxed programs.
>>> 
>>> In general, I prefer an approach where everything is explicit to an
>>> approach where we almost, but not quite, emulate the weird historical
>>> behavior.
>>> 
>>> Pavel and Cyrill, how annoying would it be if CRIU had to do an extra
>>> arch_prctl() to enable its cross-bitness shenanigans when
>>> checkpointing and restoring a 32-bit program?
>> 
>> I think this should not be a problem for criu (CC'ing Dima, who has
>> been working on compat mode support in criu). As far as I remember
>> we initiate restoring of 32 bit tasks in native 64 bit mode (well,
>> ia32e to be precise :) mode and then, once everything is ready,
>> we changing the mode by doing a return to __USER32_CS descriptor.
>> So this won't be painful to add additional prctl call here.
> 
> Yeah, restoring will still be easy..
> But checkpointing will be harder if we can't switch to 64-bit mode.
> ATM we have one 64-bit parasite binary, which does all seizing job
> for both 64 and 32 bit binaries.
> So, if you can't switch back to 64-bit from 32-bit mode, we'll need
> to keep two parasites.

I can allow switching back and forth by dynamically enabling and
disabling PTI. Andy, Dave, do you think that makes it a viable option?
Should I respin another version of the patch set?
