Re: [PATCH] staging: mt7621-pinctrl: fix uninitialized variable ngroups

2018-11-10 Thread Sergio Paracuellos
On Sat, Nov 10, 2018 at 11:28:06PM +, Colin King wrote:
> From: Colin Ian King 
> 
> Currently the for_each_node_with_property loop is incrementing the
> variable ngroups; however, it was not initialized and hence contains
> garbage. Fix this by initializing ngroups to zero.
> 
> Detected with static analysis with cppcheck:
> 
> [drivers/staging/mt7621-pinctrl/pinctrl-rt2880.c:89]: (error) Uninitialized
> variable: ngroups
> 
> Fixes: e12a1a6e087b ("staging: mt7621-pinctrl: refactor 
> rt2880_pinctrl_dt_node_to_map function")
> Signed-off-by: Colin Ian King 
> ---
>  drivers/staging/mt7621-pinctrl/pinctrl-rt2880.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/staging/mt7621-pinctrl/pinctrl-rt2880.c 
> b/drivers/staging/mt7621-pinctrl/pinctrl-rt2880.c
> index b8566ed898f1..aa98fbb17013 100644
> --- a/drivers/staging/mt7621-pinctrl/pinctrl-rt2880.c
> +++ b/drivers/staging/mt7621-pinctrl/pinctrl-rt2880.c
> @@ -82,7 +82,7 @@ static int rt2880_pinctrl_dt_node_to_map(struct pinctrl_dev 
> *pctrldev,
>   struct property *prop;
>   const char *function_name, *group_name;
>   int ret;
> - int ngroups;
> + int ngroups = 0;
>   unsigned int reserved_maps = 0;
>  
>   for_each_node_with_property(np_config, "group")
> -- 
> 2.19.1
> 

Thanks, Colin. Looks good.

Reviewed-by: Sergio Paracuellos 

Best regards,
Sergio Paracuellos



[PATCH] ftrace: remove KASAN poison in ftrace_ops_test()

2018-11-10 Thread Zhizhou Zhang
ftrace_ops_test() passes the address of a local variable to
hash_contains_ip(), which can result in a KASAN stack-out-of-bounds
warning.

Signed-off-by: Zhizhou Zhang 
---
 kernel/trace/ftrace.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index f536f60..6e11f90 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -1522,6 +1522,8 @@ ftrace_ops_test(struct ftrace_ops *ops, unsigned long ip, 
void *regs)
rcu_assign_pointer(hash.filter_hash, ops->func_hash->filter_hash);
rcu_assign_pointer(hash.notrace_hash, ops->func_hash->notrace_hash);
 
+   kasan_unpoison_task_stack(current);
+
if (hash_contains_ip(ip, ))
ret = 1;
else
-- 
2.7.4




Re: Official Linux system wrapper library?

2018-11-10 Thread Michael Kerrisk (man-pages)
[adding in glibc folk for comment]

On 11/10/18 7:52 PM, Daniel Colascione wrote:
> Now that glibc is basically not adding any new system call wrappers,
> how about publishing an "official" system call glue library as part of
> the kernel distribution, along with the uapi headers? I don't think
> it's reasonable to expect people to keep using syscall(__NR_XXX) for
> all new functionality, especially as the system grows increasingly
> sophisticated capabilities (like the new mount API, and hopefully the
> new process API) outside the strictures of the POSIX process.

As a quick glance at the glibc NEWS file shows, the above is not
quite true:

[[
Version 2.28
* The renameat2 function has been added...
* The statx function has been added...

Version 2.27
* Support for memory protection keys was added.  The  header now
  declares the functions pkey_alloc, pkey_free, pkey_mprotect...
* The copy_file_range function was added.

Version 2.26
* New wrappers for the Linux-specific system calls preadv2 and pwritev2.

Version 2.25
* The getrandom [function] have been added.
]]

I make that 11 system call wrappers added in the last 2 years.

That said, of course, there are many system calls that lack wrappers [1],
and the use of syscall() is undesirable.

The glibc folk do have their reasons for being conservative around
adding system calls (https://lwn.net/Articles/655028/). However, at
this point, I think one of the limiting factors is developer time
on the glibc project. Quite possibly, they just need some help to
add more (properly designed) wrappers faster.

Cheers,

Michael

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=6399 is a
longstanding example.

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/




Re: [PATCH] pinctrl: qcom: ssbi-gpio: fix gpio-hog related boot issues

2018-11-10 Thread Bjorn Andersson
On Sat 10 Nov 17:34 PST 2018, Brian Masney wrote:

> When attempting to set up a gpio hog, device probing will repeatedly
> fail with -EPROBE_DEFER errors. This is caused by a circular dependency
> between the gpio and pinctrl frameworks. If the gpio-ranges property is
> present in device tree, then the gpio framework will handle the gpio pin
> registration and eliminate the circular dependency.
> 
> See Christian Lamparter's commit a86caa9ba5d7 ("pinctrl: msm: fix
> gpio-hog related boot issues") for a detailed commit message that
> explains the issue in much more detail. The code comment in this commit
> came from Christian's commit.
> 
> I did not test this change against any hardware supported by this
> particular driver, however I was able to validate this same fix works
> for pinctrl-spmi-gpio.c using a LG Nexus 5 (hammerhead) phone.
> 
> Signed-off-by: Brian Masney 

Reviewed-by: Bjorn Andersson 

Regards,
Bjorn

> ---
> For the patch and discussion regarding pinctrl-spmi-gpio.c, see
> https://lore.kernel.org/lkml/20181101001149.13453-6-masn...@onstation.org/
> 
>  drivers/pinctrl/qcom/pinctrl-ssbi-gpio.c | 23 +--
>  1 file changed, 17 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/pinctrl/qcom/pinctrl-ssbi-gpio.c 
> b/drivers/pinctrl/qcom/pinctrl-ssbi-gpio.c
> index 6b30bef829ab..ded7d765af2e 100644
> --- a/drivers/pinctrl/qcom/pinctrl-ssbi-gpio.c
> +++ b/drivers/pinctrl/qcom/pinctrl-ssbi-gpio.c
> @@ -762,12 +762,23 @@ static int pm8xxx_gpio_probe(struct platform_device 
> *pdev)
>   return ret;
>   }
>  
> - ret = gpiochip_add_pin_range(>chip,
> -  dev_name(pctrl->dev),
> -  0, 0, pctrl->chip.ngpio);
> - if (ret) {
> - dev_err(pctrl->dev, "failed to add pin range\n");
> - goto unregister_gpiochip;
> + /*
> +  * For DeviceTree-supported systems, the gpio core checks the
> +  * pinctrl's device node for the "gpio-ranges" property.
> +  * If it is present, it takes care of adding the pin ranges
> +  * for the driver. In this case the driver can skip ahead.
> +  *
> +  * In order to remain compatible with older, existing DeviceTree
> +  * files which don't set the "gpio-ranges" property or systems that
> +  * utilize ACPI the driver has to call gpiochip_add_pin_range().
> +  */
> + if (!of_property_read_bool(pctrl->dev->of_node, "gpio-ranges")) {
> + ret = gpiochip_add_pin_range(>chip, dev_name(pctrl->dev),
> +  0, 0, pctrl->chip.ngpio);
> + if (ret) {
> + dev_err(pctrl->dev, "failed to add pin range\n");
> + goto unregister_gpiochip;
> + }
>   }
>  
>   platform_set_drvdata(pdev, pctrl);
> -- 
> 2.17.2
> 
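For illustration, a hedged device-tree fragment showing the property the patch checks for (cell values are hypothetical, not taken from any real board file): when "gpio-ranges" is present in the controller node, the gpio core registers the pin ranges itself and the driver skips its own gpiochip_add_pin_range() call.

```dts
/* Hypothetical fragment; the three cells after the phandle are
 * <gpio-offset pin-offset count>. */
pm8921_gpio: gpio@150 {
	gpio-controller;
	#gpio-cells = <2>;
	gpio-ranges = <&pm8921_gpio 0 0 44>;
};
```

Because the gpio core parses this property before the gpiochip is fully registered, the pinctrl/gpio ordering cycle that caused the -EPROBE_DEFER loop never forms.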




[PATCH v4 00/10] x86/alternative: text_poke() fixes

2018-11-10 Thread Nadav Amit
This patch-set addresses some issues that might affect the security and
the correctness of code patching.

The main issue that the patches deal with is the fact that the fixmap
PTEs that are used for patching are available for access from other
cores and might be exploited. They are not even flushed from the TLB in
remote cores, so the risk is even higher. This set addresses this issue
by introducing a temporary mm that is only used during patching.
Unfortunately, due to init ordering, fixmap is still used during
boot-time patching. Future patches can eliminate the need for it.

To do so, we need to avoid using text_poke() before the poking-mm is
initialized and instead use text_poke_early().

During v3 of this set, Andy & Thomas suggested that early patching of
modules can be improved by simply writing to the memory. This actually
raises a security concern: there should not be any W+X mappings at any
given moment, and module loading breaks this protection for no good
reason. So this set also addresses that issue, while (presumably)
improving patching speed by making module memory initially RW(+NX) and
changing it to RO(+X) before execution.

In addition, the set addresses various issues related to code patching
and does some cleanup. In this version I removed some Tested-by and
Reviewed-by tags due to extensive changes to some of the patches.

v3->v4:
- Setting modules as RO when loading [Andy, tglx]
- Adding text_poke_kgdb() to keep the text_mutex assertion [tglx]
- Simpler logic to decide when to use early-poking [peterZ]
- More cleanup

v2->v3:
- Remove the fallback path in text_poke() [peterZ]
- poking_init() was broken due to the local variable poking_addr
- Preallocate tables for the temporary-mm to avoid sleep-in-atomic
- Prevent KASAN from yelling at text_poke()

v1->v2:
- Partial revert of 9222f606506c added to 1/6 [masami]
- Added Masami's reviewed-by tag

RFC->v1:
- Added handling of error in get_locked_pte()
- Remove lockdep assertion, clarify text_mutex use instead [masami]
- Comment fix [peterz]
- Removed remainders of text_poke return value [masami]
- Use __weak for poking_init instead of macros [masami]
- Simplify error handling in poking_init [masami]

Andy Lutomirski (1):
  x86/mm: temporary mm struct

Nadav Amit (9):
  Fix "x86/alternatives: Lockdep-enforce text_mutex in text_poke*()"
  x86/jump_label: Use text_poke_early() during early init
  fork: provide a function for copying init_mm
  x86/alternative: initializing temporary mm for patching
  x86/alternative: use temporary mm for text poking
  x86/kgdb: avoid redundant comparison of code
  x86: avoid W^X being broken during modules loading
  x86/jump-label: remove support for custom poker
  x86/alternative: remove the return value of text_poke_*()

 arch/x86/include/asm/fixmap.h|   2 -
 arch/x86/include/asm/mmu_context.h   |  20 +++
 arch/x86/include/asm/pgtable.h   |   3 +
 arch/x86/include/asm/text-patching.h |   9 +-
 arch/x86/kernel/alternative.c| 208 +--
 arch/x86/kernel/jump_label.c |  24 ++--
 arch/x86/kernel/kgdb.c   |  19 +--
 arch/x86/kernel/module.c |   2 +-
 arch/x86/mm/init_64.c|  39 +
 include/linux/filter.h   |   6 +
 include/linux/sched/task.h   |   1 +
 init/main.c  |   3 +
 kernel/fork.c|  24 +++-
 kernel/module.c  |  10 ++
 14 files changed, 289 insertions(+), 81 deletions(-)

-- 
2.17.1



[PATCH v4 01/10] Fix "x86/alternatives: Lockdep-enforce text_mutex in text_poke*()"

2018-11-10 Thread Nadav Amit
text_mutex is currently expected to be held before text_poke() is
called, but kgdb does not take the mutex, and instead *supposedly*
ensures the lock is not taken and will not be acquired by any other core
while text_poke() is running.

The reason for the "supposedly" comment is that it is not entirely clear
that this would be the case if gdb_do_roundup is zero.

This patch creates two wrapper functions, text_poke() and
text_poke_kgdb() which do or do not run the lockdep assertion
respectively.

While we are at it, change the return code of text_poke() to something
meaningful. One day, callers might actually respect it and the existing
BUG_ON() when patching fails could be removed. For kgdb, the return
value can actually be used.

Cc: Jiri Kosina 
Cc: Andy Lutomirski 
Cc: Kees Cook 
Cc: Dave Hansen 
Cc: Masami Hiramatsu 
Fixes: 9222f606506c ("x86/alternatives: Lockdep-enforce text_mutex in 
text_poke*()")
Suggested-by: Peter Zijlstra 
Signed-off-by: Nadav Amit 
---
 arch/x86/include/asm/text-patching.h |  3 +-
 arch/x86/kernel/alternative.c| 72 +---
 arch/x86/kernel/kgdb.c   | 15 --
 3 files changed, 66 insertions(+), 24 deletions(-)

diff --git a/arch/x86/include/asm/text-patching.h 
b/arch/x86/include/asm/text-patching.h
index e85ff65c43c3..5a2600370763 100644
--- a/arch/x86/include/asm/text-patching.h
+++ b/arch/x86/include/asm/text-patching.h
@@ -34,7 +34,8 @@ extern void *text_poke_early(void *addr, const void *opcode, 
size_t len);
  * On the local CPU you need to be protected again NMI or MCE handlers seeing 
an
  * inconsistent instruction while you patch.
  */
-extern void *text_poke(void *addr, const void *opcode, size_t len);
+extern int text_poke(void *addr, const void *opcode, size_t len);
+extern int text_poke_kgdb(void *addr, const void *opcode, size_t len);
 extern int poke_int3_handler(struct pt_regs *regs);
 extern void *text_poke_bp(void *addr, const void *opcode, size_t len, void 
*handler);
 extern int after_bootmem;
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index ebeac487a20c..ebe9210dc92e 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -678,23 +678,12 @@ void *__init_or_module text_poke_early(void *addr, const 
void *opcode,
return addr;
 }
 
-/**
- * text_poke - Update instructions on a live kernel
- * @addr: address to modify
- * @opcode: source of the copy
- * @len: length to copy
- *
- * Only atomic text poke/set should be allowed when not doing early patching.
- * It means the size must be writable atomically and the address must be 
aligned
- * in a way that permits an atomic write. It also makes sure we fit on a single
- * page.
- */
-void *text_poke(void *addr, const void *opcode, size_t len)
+static int __text_poke(void *addr, const void *opcode, size_t len)
 {
unsigned long flags;
char *vaddr;
struct page *pages[2];
-   int i;
+   int i, r = 0;
 
/*
 * While boot memory allocator is runnig we cannot use struct
@@ -702,8 +691,6 @@ void *text_poke(void *addr, const void *opcode, size_t len)
 */
BUG_ON(!after_bootmem);
 
-   lockdep_assert_held(_mutex);
-
if (!core_kernel_text((unsigned long)addr)) {
pages[0] = vmalloc_to_page(addr);
pages[1] = vmalloc_to_page(addr + PAGE_SIZE);
@@ -712,7 +699,8 @@ void *text_poke(void *addr, const void *opcode, size_t len)
WARN_ON(!PageReserved(pages[0]));
pages[1] = virt_to_page(addr + PAGE_SIZE);
}
-   BUG_ON(!pages[0]);
+   if (!pages[0])
+   return -EFAULT;
local_irq_save(flags);
set_fixmap(FIX_TEXT_POKE0, page_to_phys(pages[0]));
if (pages[1])
@@ -727,9 +715,57 @@ void *text_poke(void *addr, const void *opcode, size_t len)
/* Could also do a CLFLUSH here to speed up CPU recovery; but
   that causes hangs on some VIA CPUs. */
for (i = 0; i < len; i++)
-   BUG_ON(((char *)addr)[i] != ((char *)opcode)[i]);
+   if (((char *)addr)[i] != ((char *)opcode)[i])
+   r = -EFAULT;
local_irq_restore(flags);
-   return addr;
+   return r;
+}
+
+/**
+ * text_poke - Update instructions on a live kernel
+ * @addr: address to modify
+ * @opcode: source of the copy
+ * @len: length to copy
+ *
+ * Only atomic text poke/set should be allowed when not doing early patching.
+ * It means the size must be writable atomically and the address must be 
aligned
+ * in a way that permits an atomic write. It also makes sure we fit on a single
+ * page.
+ */
+int text_poke(void *addr, const void *opcode, size_t len)
+{
+   int r;
+
+   lockdep_assert_held(_mutex);
+
+   r = __text_poke(addr, opcode, len);
+
+   /*
+* TODO: change the callers to consider the return value and remove this
+*   historical assertion.
+*/
+   



[PATCH v4 02/10] x86/jump_label: Use text_poke_early() during early init

2018-11-10 Thread Nadav Amit
There is no apparent reason not to use text_poke_early() during early
init when we do not patch code that might be on the stack (i.e., code
we would return to in the middle of once it is patched). This appears
to be the case for jump labels, so do so.

This is required for the next patches that would set a temporary mm for
patching, which is initialized after some static-keys are
enabled/disabled.

Cc: Andy Lutomirski 
Cc: Kees Cook 
Cc: Dave Hansen 
Cc: Masami Hiramatsu 
Co-Developed-by: Peter Zijlstra 
Signed-off-by: Nadav Amit 
---
 arch/x86/kernel/jump_label.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c
index aac0c1f7e354..ed5fe274a7d8 100644
--- a/arch/x86/kernel/jump_label.c
+++ b/arch/x86/kernel/jump_label.c
@@ -52,7 +52,12 @@ static void __ref __jump_label_transform(struct jump_entry 
*entry,
jmp.offset = jump_entry_target(entry) -
 (jump_entry_code(entry) + JUMP_LABEL_NOP_SIZE);
 
-   if (early_boot_irqs_disabled)
+   /*
+* As long as we're UP and not yet marked RO, we can use
+* text_poke_early; SYSTEM_BOOTING guarantees both, as we switch to
+* SYSTEM_SCHEDULING before going either.
+*/
+   if (system_state == SYSTEM_BOOTING)
poker = text_poke_early;
 
if (type == JUMP_LABEL_JMP) {
-- 
2.17.1



[PATCH v4 04/10] fork: provide a function for copying init_mm

2018-11-10 Thread Nadav Amit
Provide a function for copying init_mm. This function will be later used
for setting a temporary mm.

Cc: Andy Lutomirski 
Cc: Kees Cook 
Cc: Peter Zijlstra 
Cc: Dave Hansen 
Reviewed-by: Masami Hiramatsu 
Tested-by: Masami Hiramatsu 
Signed-off-by: Nadav Amit 
---
 include/linux/sched/task.h |  1 +
 kernel/fork.c  | 24 ++--
 2 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index 108ede99e533..ac0a675678f5 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -74,6 +74,7 @@ extern void exit_itimers(struct signal_struct *);
 extern long _do_fork(unsigned long, unsigned long, unsigned long, int __user *, int __user *, unsigned long);
 extern long do_fork(unsigned long, unsigned long, unsigned long, int __user *, int __user *);
 struct task_struct *fork_idle(int);
+struct mm_struct *copy_init_mm(void);
 extern pid_t kernel_thread(int (*fn)(void *), void *arg, unsigned long flags);
 extern long kernel_wait4(pid_t, int __user *, int, struct rusage *);
 
diff --git a/kernel/fork.c b/kernel/fork.c
index 07cddff89c7b..01d3f5b39363 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1297,13 +1297,20 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
complete_vfork_done(tsk);
 }
 
-/*
- * Allocate a new mm structure and copy contents from the
- * mm structure of the passed in task structure.
+/**
+ * dup_mm() - duplicates an existing mm structure
+ * @tsk: the task_struct with which the new mm will be associated.
+ * @oldmm: the mm to duplicate.
+ *
+ * Allocates a new mm structure and copies contents from the provided
+ * @oldmm structure.
+ *
+ * Return: the duplicated mm or NULL on failure.
  */
-static struct mm_struct *dup_mm(struct task_struct *tsk)
+static struct mm_struct *dup_mm(struct task_struct *tsk,
+   struct mm_struct *oldmm)
 {
-   struct mm_struct *mm, *oldmm = current->mm;
+   struct mm_struct *mm;
int err;
 
mm = allocate_mm();
@@ -1370,7 +1377,7 @@ static int copy_mm(unsigned long clone_flags, struct task_struct *tsk)
}
 
retval = -ENOMEM;
-   mm = dup_mm(tsk);
+   mm = dup_mm(tsk, current->mm);
if (!mm)
goto fail_nomem;
 
@@ -2176,6 +2183,11 @@ struct task_struct *fork_idle(int cpu)
return task;
 }
 
+struct mm_struct *copy_init_mm(void)
+{
+   return dup_mm(NULL, &init_mm);
+}
+
 /*
  *  Ok, this is the main fork-routine.
  *
-- 
2.17.1



[PATCH v4 05/10] x86/alternative: initializing temporary mm for patching

2018-11-10 Thread Nadav Amit
To prevent improper use of the PTEs that are used for text patching, we
want to use a temporary mm struct. We initialize it by copying the init
mm.

The address that will be used for patching is taken from the lower area
that is usually used for the task memory. Doing so prevents the need to
frequently synchronize the temporary-mm (e.g., when BPF programs are
installed), since different PGDs are used for the task memory.

Finally, we randomize the address of the PTEs to harden against exploits
that use these PTEs.

Cc: Kees Cook 
Cc: Peter Zijlstra 
Cc: Dave Hansen 
Reviewed-by: Masami Hiramatsu 
Tested-by: Masami Hiramatsu 
Suggested-by: Andy Lutomirski 
Signed-off-by: Nadav Amit 
---
 arch/x86/include/asm/pgtable.h   |  3 +++
 arch/x86/include/asm/text-patching.h |  2 ++
 arch/x86/kernel/alternative.c|  3 +++
 arch/x86/mm/init_64.c| 39 
 init/main.c  |  3 +++
 5 files changed, 50 insertions(+)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 40616e805292..e8f630d9a2ed 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1021,6 +1021,9 @@ static inline void __meminit init_trampoline_default(void)
/* Default trampoline pgd value */
trampoline_pgd_entry = init_top_pgt[pgd_index(__PAGE_OFFSET)];
 }
+
+void __init poking_init(void);
+
 # ifdef CONFIG_RANDOMIZE_MEMORY
 void __meminit init_trampoline(void);
 # else
diff --git a/arch/x86/include/asm/text-patching.h b/arch/x86/include/asm/text-patching.h
index 5a2600370763..e5716ef9a721 100644
--- a/arch/x86/include/asm/text-patching.h
+++ b/arch/x86/include/asm/text-patching.h
@@ -39,5 +39,7 @@ extern int text_poke_kgdb(void *addr, const void *opcode, size_t len);
 extern int poke_int3_handler(struct pt_regs *regs);
 extern void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler);
 extern int after_bootmem;
+extern __ro_after_init struct mm_struct *poking_mm;
+extern __ro_after_init unsigned long poking_addr;
 
 #endif /* _ASM_X86_TEXT_PATCHING_H */
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index ebe9210dc92e..d3ae5c26e5a0 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -678,6 +678,9 @@ void *__init_or_module text_poke_early(void *addr, const void *opcode,
return addr;
 }
 
+__ro_after_init struct mm_struct *poking_mm;
+__ro_after_init unsigned long poking_addr;
+
 static int __text_poke(void *addr, const void *opcode, size_t len)
 {
unsigned long flags;
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 5fab264948c2..56d56d77aa66 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -53,6 +53,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "mm_internal.h"
 
@@ -1388,6 +1389,44 @@ unsigned long memory_block_size_bytes(void)
return memory_block_size_probed;
 }
 
+/*
+ * Initialize an mm_struct to be used during poking and a pointer to be used
+ * during patching. If anything fails during initialization, poking will be done
+ * using the fixmap, which is unsafe, so warn the user about it.
+ */
+void __init poking_init(void)
+{
+   spinlock_t *ptl;
+   pte_t *ptep;
+
+   poking_mm = copy_init_mm();
+   if (!poking_mm) {
+   pr_err("x86/mm: error setting a separate poking address space");
+   return;
+   }
+
+   /*
+* Randomize the poking address, but make sure that the following page
+* will be mapped at the same PMD. We need 2 pages, so find space for 3,
+* and adjust the address if the PMD ends after the first one.
+*/
+   poking_addr = TASK_UNMAPPED_BASE +
+   (kaslr_get_random_long("Poking") & PAGE_MASK) %
+   (TASK_SIZE - TASK_UNMAPPED_BASE - 3 * PAGE_SIZE);
+
+   if (((poking_addr + PAGE_SIZE) & ~PMD_MASK) == 0)
+   poking_addr += PAGE_SIZE;
+
+   /*
+* We need to trigger the allocation of the page-tables that will be
+* needed for poking now. Later, poking may be performed in an atomic
+* section, which might cause allocation to fail.
+*/
+   ptep = get_locked_pte(poking_mm, poking_addr, &ptl);
+   if (!WARN_ON(!ptep))
+   pte_unmap_unlock(ptep, ptl);
+}
+
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
 /*
  * Initialise the sparsemem vmemmap using huge-pages at the PMD level.
diff --git a/init/main.c b/init/main.c
index ee147103ba1b..a461150adfb1 100644
--- a/init/main.c
+++ b/init/main.c
@@ -497,6 +497,8 @@ void __init __weak thread_stack_cache_init(void)
 
 void __init __weak mem_encrypt_init(void) { }
 
+void __init __weak poking_init(void) { }
+
 bool initcall_debug;
 core_param(initcall_debug, initcall_debug, bool, 0644);
 
@@ -731,6 +733,7 @@ asmlinkage __visible void __init start_kernel(void)
taskstats_init_early();
delayacct_init();
 
+   poking_init();

[PATCH v4 07/10] x86/kgdb: avoid redundant comparison of code

2018-11-10 Thread Nadav Amit
text_poke() already ensures that the written value is the correct one
and fails if that is not the case. There is no need for an additional
comparison. Remove it.

Signed-off-by: Nadav Amit 
---
 arch/x86/kernel/kgdb.c | 10 --
 1 file changed, 10 deletions(-)

diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c
index 8091b2e381d4..d14e1be576fd 100644
--- a/arch/x86/kernel/kgdb.c
+++ b/arch/x86/kernel/kgdb.c
@@ -751,7 +751,6 @@ void kgdb_arch_set_pc(struct pt_regs *regs, unsigned long ip)
 int kgdb_arch_set_breakpoint(struct kgdb_bkpt *bpt)
 {
int err;
-   char opc[BREAK_INSTR_SIZE];
 
bpt->type = BP_BREAKPOINT;
err = probe_kernel_read(bpt->saved_instr, (char *)bpt->bpt_addr,
@@ -772,11 +771,6 @@ int kgdb_arch_set_breakpoint(struct kgdb_bkpt *bpt)
 BREAK_INSTR_SIZE);
if (err)
return err;
-   err = probe_kernel_read(opc, (char *)bpt->bpt_addr, BREAK_INSTR_SIZE);
-   if (err)
-   return err;
-   if (memcmp(opc, arch_kgdb_ops.gdb_bpt_instr, BREAK_INSTR_SIZE))
-   return -EINVAL;
bpt->type = BP_POKE_BREAKPOINT;
 
return err;
@@ -785,7 +779,6 @@ int kgdb_arch_set_breakpoint(struct kgdb_bkpt *bpt)
 int kgdb_arch_remove_breakpoint(struct kgdb_bkpt *bpt)
 {
int err;
-   char opc[BREAK_INSTR_SIZE];
 
if (bpt->type != BP_POKE_BREAKPOINT)
goto knl_write;
@@ -798,9 +791,6 @@ int kgdb_arch_remove_breakpoint(struct kgdb_bkpt *bpt)
err = text_poke_kgdb((void *)bpt->bpt_addr, bpt->saved_instr,
 BREAK_INSTR_SIZE);
if (err)
-   return err;
-   err = probe_kernel_read(opc, (char *)bpt->bpt_addr, BREAK_INSTR_SIZE);
-   if (err || memcmp(opc, bpt->saved_instr, BREAK_INSTR_SIZE))
goto knl_write;
return err;
 
-- 
2.17.1



[PATCH v4 03/10] x86/mm: temporary mm struct

2018-11-10 Thread Nadav Amit
From: Andy Lutomirski 

Sometimes we want to set temporary page-table entries (PTEs) on one of
the cores, without allowing other cores to use - even speculatively -
these mappings. There are two benefits to doing so:

(1) Security: if sensitive PTEs are set, the temporary mm prevents their
use on other cores. This hardens security, as it prevents exploiting a
dangling pointer to overwrite sensitive data using the sensitive PTE.

(2) Avoiding TLB shootdowns: the PTEs do not need to be flushed in
remote page-tables.

To do so a temporary mm_struct can be used. Mappings which are private
for this mm can be set in the userspace part of the address-space.
During the whole time in which the temporary mm is loaded, interrupts
must be disabled.

The first use-case for temporary PTEs, which will follow, is for poking
the kernel text.

[ Commit message was written by Nadav ]

Cc: Kees Cook 
Cc: Peter Zijlstra 
Cc: Dave Hansen 
Reviewed-by: Masami Hiramatsu 
Tested-by: Masami Hiramatsu 
Signed-off-by: Andy Lutomirski 
Signed-off-by: Nadav Amit 
---
 arch/x86/include/asm/mmu_context.h | 20 
 1 file changed, 20 insertions(+)

diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 0ca50611e8ce..7cc8e5c50bf6 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -338,4 +338,24 @@ static inline unsigned long __get_current_cr3_fast(void)
return cr3;
 }
 
+typedef struct {
+   struct mm_struct *prev;
+} temporary_mm_state_t;
+
+static inline temporary_mm_state_t use_temporary_mm(struct mm_struct *mm)
+{
+   temporary_mm_state_t state;
+
+   lockdep_assert_irqs_disabled();
+   state.prev = this_cpu_read(cpu_tlbstate.loaded_mm);
+   switch_mm_irqs_off(NULL, mm, current);
+   return state;
+}
+
+static inline void unuse_temporary_mm(temporary_mm_state_t prev)
+{
+   lockdep_assert_irqs_disabled();
+   switch_mm_irqs_off(NULL, prev.prev, current);
+}
+
 #endif /* _ASM_X86_MMU_CONTEXT_H */
-- 
2.17.1




[PATCH v4 08/10] x86: avoid W^X being broken during modules loading

2018-11-10 Thread Nadav Amit
When modules and BPF filters are loaded, there is a time window in
which some memory is both writable and executable. An attacker who has
already found another vulnerability (e.g., a dangling pointer) might be
able to exploit this behavior to overwrite kernel code. This patch
prevents having writable and executable PTEs at this stage.

In addition, avoiding having R+X mappings can also slightly simplify the
patching of modules code on initialization (e.g., by alternatives and
static-key), as would be done in the next patch.

To avoid having W+X mappings, set them initially as RW (NX), and only
after they are set as RO, set them as X as well. Setting them as
executable is done as a separate step to avoid a situation in which one
core still has the old PTE cached (hence writable) while another
already sees the updated PTE (executable), which would break the W^X
protection.

Cc: Andy Lutomirski 
Cc: Kees Cook 
Cc: Peter Zijlstra 
Cc: Dave Hansen 
Cc: Masami Hiramatsu 
Suggested-by: Thomas Gleixner 
Suggested-by: Andy Lutomirski 
Signed-off-by: Nadav Amit 
---
 arch/x86/kernel/alternative.c | 28 +---
 arch/x86/kernel/module.c  |  2 +-
 include/linux/filter.h|  6 ++
 kernel/module.c   | 10 ++
 4 files changed, 38 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 96607ef285c3..70827332da0f 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -667,15 +667,29 @@ void __init alternative_instructions(void)
  * handlers seeing an inconsistent instruction while you patch.
  */
 void *__init_or_module text_poke_early(void *addr, const void *opcode,
- size_t len)
+  size_t len)
 {
unsigned long flags;
-   local_irq_save(flags);
-   memcpy(addr, opcode, len);
-   local_irq_restore(flags);
-   sync_core();
-   /* Could also do a CLFLUSH here to speed up CPU recovery; but
-  that causes hangs on some VIA CPUs. */
+
+   if (static_cpu_has(X86_FEATURE_NX) &&
+   is_module_text_address((unsigned long)addr)) {
+   /*
+* Modules text is marked initially as non-executable, so the
+* code cannot be running and speculative code-fetches are
+* prevented. We can just change the code.
+*/
+   memcpy(addr, opcode, len);
+   } else {
+   local_irq_save(flags);
+   memcpy(addr, opcode, len);
+   local_irq_restore(flags);
+   sync_core();
+
+   /*
+* Could also do a CLFLUSH here to speed up CPU recovery; but
+* that causes hangs on some VIA CPUs.
+*/
+   }
return addr;
 }
 
diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
index b052e883dd8c..cfa3106faee4 100644
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -87,7 +87,7 @@ void *module_alloc(unsigned long size)
p = __vmalloc_node_range(size, MODULE_ALIGN,
MODULES_VADDR + get_module_load_offset(),
MODULES_END, GFP_KERNEL,
-   PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
+   PAGE_KERNEL, 0, NUMA_NO_NODE,
__builtin_return_address(0));
if (p && (kasan_module_alloc(p, size) < 0)) {
vfree(p);
diff --git a/include/linux/filter.h b/include/linux/filter.h
index de629b706d1d..ee9ae03c5f56 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -704,7 +704,13 @@ static inline void bpf_prog_unlock_ro(struct bpf_prog *fp)
 
 static inline void bpf_jit_binary_lock_ro(struct bpf_binary_header *hdr)
 {
+   /*
+* Perform mapping changes in two stages to avoid opening a time-window
+* in which a PTE is cached in any TLB as writable, but marked as
+* executable in the memory-resident mappings (e.g., page-tables).
+*/
set_memory_ro((unsigned long)hdr, hdr->pages);
+   set_memory_x((unsigned long)hdr, hdr->pages);
 }
 
 static inline void bpf_jit_binary_unlock_ro(struct bpf_binary_header *hdr)
diff --git a/kernel/module.c b/kernel/module.c
index 49a405891587..7cb207249437 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -1946,9 +1946,19 @@ void module_enable_ro(const struct module *mod, bool after_init)
if (!rodata_enabled)
return;
 
+   /*
+* Perform mapping changes in two stages to avoid opening a time-window
+* in which a PTE is cached in any TLB as writable, but marked as
+* executable in the memory-resident mappings (e.g., page-tables).
+*/
frob_text(>core_layout, set_memory_ro);
+   frob_text(>core_layout, set_memory_x);
+
frob_rodata(>core_layout, set_memory_ro);
+

[PATCH v4 06/10] x86/alternative: use temporary mm for text poking

2018-11-10 Thread Nadav Amit
text_poke() can potentially compromise security, as it sets temporary
PTEs in the fixmap. These PTEs might be used to rewrite the kernel code
from other cores, accidentally or maliciously, if an attacker gains the
ability to write to kernel memory.

Moreover, since remote TLBs are not flushed after the temporary PTEs are
removed, the time-window in which the code is writable is not limited if
the fixmap PTEs - maliciously or accidentally - are cached in the TLB.
To address these potential security hazards, we use a temporary mm for
patching the code.

Finally, text_poke() is also not conservative enough when mapping pages,
as it always tries to map 2 pages, even when a single one is sufficient.
So try to be more conservative, and do not map more than needed.

Cc: Andy Lutomirski 
Cc: Kees Cook 
Cc: Peter Zijlstra 
Cc: Dave Hansen 
Cc: Masami Hiramatsu 
Signed-off-by: Nadav Amit 
---
 arch/x86/include/asm/fixmap.h |   2 -
 arch/x86/kernel/alternative.c | 112 +++---
 2 files changed, 89 insertions(+), 25 deletions(-)

diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
index 50ba74a34a37..9da8cccdf3fb 100644
--- a/arch/x86/include/asm/fixmap.h
+++ b/arch/x86/include/asm/fixmap.h
@@ -103,8 +103,6 @@ enum fixed_addresses {
 #ifdef CONFIG_PARAVIRT
FIX_PARAVIRT_BOOTMAP,
 #endif
-   FIX_TEXT_POKE1, /* reserve 2 pages for text_poke() */
-   FIX_TEXT_POKE0, /* first page is last, because allocation is backward */
 #ifdef CONFIG_X86_INTEL_MID
FIX_LNW_VRTC,
 #endif
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index d3ae5c26e5a0..96607ef285c3 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -683,43 +684,108 @@ __ro_after_init unsigned long poking_addr;
 
 static int __text_poke(void *addr, const void *opcode, size_t len)
 {
+   bool cross_page_boundary = offset_in_page(addr) + len > PAGE_SIZE;
+   temporary_mm_state_t prev;
+   struct page *pages[2] = {NULL};
unsigned long flags;
-   char *vaddr;
-   struct page *pages[2];
-   int i, r = 0;
+   pte_t pte, *ptep;
+   spinlock_t *ptl;
+   int r = 0;
 
/*
-* While boot memory allocator is runnig we cannot use struct
-* pages as they are not yet initialized.
+* While boot memory allocator is running we cannot use struct pages as
+* they are not yet initialized.
 */
BUG_ON(!after_bootmem);
 
if (!core_kernel_text((unsigned long)addr)) {
pages[0] = vmalloc_to_page(addr);
-   pages[1] = vmalloc_to_page(addr + PAGE_SIZE);
+   if (cross_page_boundary)
+   pages[1] = vmalloc_to_page(addr + PAGE_SIZE);
} else {
pages[0] = virt_to_page(addr);
WARN_ON(!PageReserved(pages[0]));
-   pages[1] = virt_to_page(addr + PAGE_SIZE);
+   if (cross_page_boundary)
+   pages[1] = virt_to_page(addr + PAGE_SIZE);
}
-   if (!pages[0])
+
+   if (!pages[0] || (cross_page_boundary && !pages[1]))
return -EFAULT;
+
local_irq_save(flags);
-   set_fixmap(FIX_TEXT_POKE0, page_to_phys(pages[0]));
-   if (pages[1])
-   set_fixmap(FIX_TEXT_POKE1, page_to_phys(pages[1]));
-   vaddr = (char *)fix_to_virt(FIX_TEXT_POKE0);
-   memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
-   clear_fixmap(FIX_TEXT_POKE0);
-   if (pages[1])
-   clear_fixmap(FIX_TEXT_POKE1);
-   local_flush_tlb();
-   sync_core();
-   /* Could also do a CLFLUSH here to speed up CPU recovery; but
-  that causes hangs on some VIA CPUs. */
-   for (i = 0; i < len; i++)
-   if (((char *)addr)[i] != ((char *)opcode)[i])
-   r = -EFAULT;
+
+   /*
+* The lock is not really needed, but this allows to avoid open-coding.
+*/
+   ptep = get_locked_pte(poking_mm, poking_addr, &ptl);
+
+   /*
+* If we failed to allocate a PTE, fail. This should *never* happen,
+* since we preallocate the PTE.
+*/
+   if (WARN_ON_ONCE(!ptep))
+   goto out;
+
+   pte = mk_pte(pages[0], PAGE_KERNEL);
+   set_pte_at(poking_mm, poking_addr, ptep, pte);
+
+   if (cross_page_boundary) {
+   pte = mk_pte(pages[1], PAGE_KERNEL);
+   set_pte_at(poking_mm, poking_addr + PAGE_SIZE, ptep + 1, pte);
+   }
+
+   /*
+* Loading the temporary mm behaves as a compiler barrier, which
+* guarantees that the PTE will be set at the time memcpy() is done.
+*/
+   prev = use_temporary_mm(poking_mm);
+
+   kasan_disable_current();
+   memcpy((u8 *)poking_addr + offset_in_page(addr), opcode, len);
+   kasan_enable_current();

[PATCH v4 10/10] x86/alternative: remove the return value of text_poke_*()

2018-11-10 Thread Nadav Amit
The return value of text_poke_early() and text_poke_bp() is useless.
Remove it.

Cc: Andy Lutomirski 
Cc: Kees Cook 
Cc: Peter Zijlstra 
Cc: Dave Hansen 
Cc: Masami Hiramatsu 
Signed-off-by: Nadav Amit 
---
 arch/x86/include/asm/text-patching.h |  4 ++--
 arch/x86/kernel/alternative.c| 11 ---
 2 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/text-patching.h 
b/arch/x86/include/asm/text-patching.h
index e5716ef9a721..a7234cd435d2 100644
--- a/arch/x86/include/asm/text-patching.h
+++ b/arch/x86/include/asm/text-patching.h
@@ -18,7 +18,7 @@ static inline void apply_paravirt(struct paravirt_patch_site 
*start,
 #define __parainstructions_end NULL
 #endif
 
-extern void *text_poke_early(void *addr, const void *opcode, size_t len);
+extern void text_poke_early(void *addr, const void *opcode, size_t len);
 
 /*
  * Clear and restore the kernel write-protection flag on the local CPU.
@@ -37,7 +37,7 @@ extern void *text_poke_early(void *addr, const void *opcode, 
size_t len);
 extern int text_poke(void *addr, const void *opcode, size_t len);
 extern int text_poke_kgdb(void *addr, const void *opcode, size_t len);
 extern int poke_int3_handler(struct pt_regs *regs);
-extern void *text_poke_bp(void *addr, const void *opcode, size_t len, void 
*handler);
+extern void text_poke_bp(void *addr, const void *opcode, size_t len, void 
*handler);
 extern int after_bootmem;
 extern __ro_after_init struct mm_struct *poking_mm;
 extern __ro_after_init unsigned long poking_addr;
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 70827332da0f..ab0278c7ecfa 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -264,7 +264,7 @@ static void __init_or_module add_nops(void *insns, unsigned 
int len)
 
 extern struct alt_instr __alt_instructions[], __alt_instructions_end[];
 extern s32 __smp_locks[], __smp_locks_end[];
-void *text_poke_early(void *addr, const void *opcode, size_t len);
+void text_poke_early(void *addr, const void *opcode, size_t len);
 
 /*
  * Are we looking at a near JMP with a 1 or 4-byte displacement.
@@ -666,8 +666,8 @@ void __init alternative_instructions(void)
  * instructions. And on the local CPU you need to be protected again NMI or MCE
  * handlers seeing an inconsistent instruction while you patch.
  */
-void *__init_or_module text_poke_early(void *addr, const void *opcode,
-  size_t len)
+void __init_or_module text_poke_early(void *addr, const void *opcode,
+ size_t len)
 {
unsigned long flags;
 
@@ -690,7 +690,6 @@ void *__init_or_module text_poke_early(void *addr, const 
void *opcode,
 * that causes hangs on some VIA CPUs.
 */
}
-   return addr;
 }
 
 __ro_after_init struct mm_struct *poking_mm;
@@ -906,7 +905,7 @@ int poke_int3_handler(struct pt_regs *regs)
  *   replacing opcode
  * - sync cores
  */
-void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler)
+void text_poke_bp(void *addr, const void *opcode, size_t len, void *handler)
 {
unsigned char int3 = 0xcc;
 
@@ -948,7 +947,5 @@ void *text_poke_bp(void *addr, const void *opcode, size_t 
len, void *handler)
 * the writing of the new instruction.
 */
bp_patching_in_progress = false;
-
-   return addr;
 }
 
-- 
2.17.1



[PATCH v4 09/10] x86/jump-label: remove support for custom poker

2018-11-10 Thread Nadav Amit
There are only two types of poking: early and breakpoint based. The use
of a function pointer to perform poking complicates the code and is
probably inefficient due to the use of indirect branches.
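The shape of this refactor can be sketched with simplified, illustrative signatures (not the kernel's actual ones): a boolean selects one of the two known pokers, so both call sites compile to direct calls instead of dispatching through a function pointer.

```c
#include <string.h>

/*
 * Sketch of the refactor described above, with simplified signatures.
 * Instead of passing a "poker" function pointer and calling through it
 * (an indirect branch), a flag picks one of the two known pokers.
 */
static void poke_early(void *addr, const void *opcode, size_t len)
{
	memcpy(addr, opcode, len);	/* stand-in for text_poke_early() */
}

static void poke_bp(void *addr, const void *opcode, size_t len)
{
	memcpy(addr, opcode, len);	/* stand-in for text_poke_bp() */
}

static void transform(void *addr, const void *code, size_t len, int early)
{
	if (early)
		poke_early(addr, code, len);	/* direct call */
	else
		poke_bp(addr, code, len);	/* direct call */
}
```

Both branches are statically known targets, which is the point of dropping the pointer parameter.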

Cc: Andy Lutomirski 
Cc: Kees Cook 
Cc: Peter Zijlstra 
Cc: Dave Hansen 
Cc: Masami Hiramatsu 
Signed-off-by: Nadav Amit 
---
 arch/x86/kernel/jump_label.c | 17 +++--
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c
index ed5fe274a7d8..7947df599e58 100644
--- a/arch/x86/kernel/jump_label.c
+++ b/arch/x86/kernel/jump_label.c
@@ -39,13 +39,13 @@ static void bug_at(unsigned char *ip, int line)
 
 static void __ref __jump_label_transform(struct jump_entry *entry,
 enum jump_label_type type,
-void *(*poker)(void *, const void *, 
size_t),
 int init)
 {
union jump_code_union jmp;
const unsigned char default_nop[] = { STATIC_KEY_INIT_NOP };
const unsigned char *ideal_nop = ideal_nops[NOP_ATOMIC5];
const void *expect, *code;
+   bool early_poking = init;
int line;
 
jmp.jump = 0xe9;
@@ -58,7 +58,7 @@ static void __ref __jump_label_transform(struct jump_entry 
*entry,
 * SYSTEM_SCHEDULING before going either.
 */
if (system_state == SYSTEM_BOOTING)
-   poker = text_poke_early;
+   early_poking = true;
 
if (type == JUMP_LABEL_JMP) {
if (init) {
@@ -82,16 +82,13 @@ static void __ref __jump_label_transform(struct jump_entry 
*entry,
bug_at((void *)jump_entry_code(entry), line);
 
/*
-* Make text_poke_bp() a default fallback poker.
-*
 * At the time the change is being done, just ignore whether we
 * are doing nop -> jump or jump -> nop transition, and assume
 * always nop being the 'currently valid' instruction
-*
 */
-   if (poker) {
-   (*poker)((void *)jump_entry_code(entry), code,
-JUMP_LABEL_NOP_SIZE);
+   if (early_poking) {
+   text_poke_early((void *)jump_entry_code(entry), code,
+   JUMP_LABEL_NOP_SIZE);
return;
}
 
@@ -103,7 +100,7 @@ void arch_jump_label_transform(struct jump_entry *entry,
   enum jump_label_type type)
 {
mutex_lock(&text_mutex);
-   __jump_label_transform(entry, type, NULL, 0);
+   __jump_label_transform(entry, type, 0);
mutex_unlock(&text_mutex);
 }
 
@@ -133,7 +130,7 @@ __init_or_module void 
arch_jump_label_transform_static(struct jump_entry *entry,
jlstate = JL_STATE_NO_UPDATE;
}
if (jlstate == JL_STATE_UPDATE)
-   __jump_label_transform(entry, type, text_poke_early, 1);
+   __jump_label_transform(entry, type, 1);
 }
 
 #endif
-- 
2.17.1



[PATCH v4 08/10] x86: avoid W^X being broken during modules loading

2018-11-10 Thread Nadav Amit
When modules and BPF filters are loaded, there is a time window in
which some memory is both writable and executable. An attacker that has
already found another vulnerability (e.g., a dangling pointer) might be
able to exploit this behavior to overwrite kernel code. This patch
prevents having writable executable PTEs in this stage.

In addition, avoiding having R+X mappings can also slightly simplify the
patching of modules code on initialization (e.g., by alternatives and
static-key), as would be done in the next patch.

To avoid having W+X mappings, set them initially as RW (NX) and after
they are set as RO set them as X as well. Setting them as executable is
done as a separate step to avoid one core in which the old PTE is cached
(hence writable), and another which sees the updated PTE (executable),
which would break the W^X protection.

Cc: Andy Lutomirski 
Cc: Kees Cook 
Cc: Peter Zijlstra 
Cc: Dave Hansen 
Cc: Masami Hiramatsu 
Suggested-by: Thomas Gleixner 
Suggested-by: Andy Lutomirski 
Signed-off-by: Nadav Amit 
---
 arch/x86/kernel/alternative.c | 28 +---
 arch/x86/kernel/module.c  |  2 +-
 include/linux/filter.h|  6 ++
 kernel/module.c   | 10 ++
 4 files changed, 38 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 96607ef285c3..70827332da0f 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -667,15 +667,29 @@ void __init alternative_instructions(void)
  * handlers seeing an inconsistent instruction while you patch.
  */
 void *__init_or_module text_poke_early(void *addr, const void *opcode,
- size_t len)
+  size_t len)
 {
unsigned long flags;
-   local_irq_save(flags);
-   memcpy(addr, opcode, len);
-   local_irq_restore(flags);
-   sync_core();
-   /* Could also do a CLFLUSH here to speed up CPU recovery; but
-  that causes hangs on some VIA CPUs. */
+
+   if (static_cpu_has(X86_FEATURE_NX) &&
+   is_module_text_address((unsigned long)addr)) {
+   /*
+* Modules text is marked initially as non-executable, so the
+* code cannot be running and speculative code-fetches are
+* prevented. We can just change the code.
+*/
+   memcpy(addr, opcode, len);
+   } else {
+   local_irq_save(flags);
+   memcpy(addr, opcode, len);
+   local_irq_restore(flags);
+   sync_core();
+
+   /*
+* Could also do a CLFLUSH here to speed up CPU recovery; but
+* that causes hangs on some VIA CPUs.
+*/
+   }
return addr;
 }
 
diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
index b052e883dd8c..cfa3106faee4 100644
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -87,7 +87,7 @@ void *module_alloc(unsigned long size)
p = __vmalloc_node_range(size, MODULE_ALIGN,
MODULES_VADDR + get_module_load_offset(),
MODULES_END, GFP_KERNEL,
-   PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
+   PAGE_KERNEL, 0, NUMA_NO_NODE,
__builtin_return_address(0));
if (p && (kasan_module_alloc(p, size) < 0)) {
vfree(p);
diff --git a/include/linux/filter.h b/include/linux/filter.h
index de629b706d1d..ee9ae03c5f56 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -704,7 +704,13 @@ static inline void bpf_prog_unlock_ro(struct bpf_prog *fp)
 
 static inline void bpf_jit_binary_lock_ro(struct bpf_binary_header *hdr)
 {
+   /*
+* Perform mapping changes in two stages to avoid opening a time-window
+* in which a PTE is cached in any TLB as writable, but marked as
+* executable in the memory-resident mappings (e.g., page-tables).
+*/
set_memory_ro((unsigned long)hdr, hdr->pages);
+   set_memory_x((unsigned long)hdr, hdr->pages);
 }
 
 static inline void bpf_jit_binary_unlock_ro(struct bpf_binary_header *hdr)
diff --git a/kernel/module.c b/kernel/module.c
index 49a405891587..7cb207249437 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -1946,9 +1946,19 @@ void module_enable_ro(const struct module *mod, bool 
after_init)
if (!rodata_enabled)
return;
 
+   /*
+* Perform mapping changes in two stages to avoid opening a time-window
+* in which a PTE is cached in any TLB as writable, but marked as
+* executable in the memory-resident mappings (e.g., page-tables).
+*/
frob_text(&mod->core_layout, set_memory_ro);
+   frob_text(&mod->core_layout, set_memory_x);
+
frob_rodata(&mod->core_layout, set_memory_ro);
+

[PATCH v4 06/10] x86/alternative: use temporary mm for text poking

2018-11-10 Thread Nadav Amit
text_poke() can potentially compromise security as it sets temporary
PTEs in the fixmap. These PTEs might be used to rewrite the kernel code
from other cores, accidentally or maliciously, if an attacker gains the
ability to write to kernel memory.

Moreover, since remote TLBs are not flushed after the temporary PTEs are
removed, the time-window in which the code is writable is not limited if
the fixmap PTEs - maliciously or accidentally - are cached in the TLB.
To address these potential security hazards, we use a temporary mm for
patching the code.

Finally, text_poke() is also not conservative enough when mapping pages,
as it always tries to map 2 pages, even when a single one is sufficient.
So try to be more conservative, and do not map more than needed.
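There is no user-space equivalent of a temporary mm, but the effect can be approximated with a short-lived writable alias of an otherwise read-only mapping (memfd_create() plus a second mmap() as stand-ins; an analogy, not the patch's code):

```c
#define _GNU_SOURCE		/* for memfd_create() (glibc >= 2.27) */
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * User-space analogy for the scheme above: the "text" view is never
 * writable; patching goes through a short-lived writable alias that is
 * torn down immediately after the memcpy(), mirroring how the target
 * page is visible at poking_addr only while the temporary mm is loaded.
 * Returns the patched byte as seen through the read-only view, or -1.
 */
static int poke_via_alias(void)
{
	long page = sysconf(_SC_PAGESIZE);
	unsigned char *text, *alias;
	int fd, byte;

	fd = memfd_create("text", 0);
	if (fd < 0 || ftruncate(fd, page))
		return -1;

	text  = mmap(NULL, page, PROT_READ, MAP_SHARED, fd, 0);
	alias = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (text == MAP_FAILED || alias == MAP_FAILED)
		return -1;

	memcpy(alias, "\x90\x90\xc3", 3);	/* patch through the alias */
	munmap(alias, page);			/* writable window closes */

	byte = text[2];				/* visible in the RO view */
	munmap(text, page);
	close(fd);
	return byte;
}
```

The writable mapping exists only for the duration of the write, which is the property the temporary mm gives the kernel.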

Cc: Andy Lutomirski 
Cc: Kees Cook 
Cc: Peter Zijlstra 
Cc: Dave Hansen 
Cc: Masami Hiramatsu 
Signed-off-by: Nadav Amit 
---
 arch/x86/include/asm/fixmap.h |   2 -
 arch/x86/kernel/alternative.c | 112 +++---
 2 files changed, 89 insertions(+), 25 deletions(-)

diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
index 50ba74a34a37..9da8cccdf3fb 100644
--- a/arch/x86/include/asm/fixmap.h
+++ b/arch/x86/include/asm/fixmap.h
@@ -103,8 +103,6 @@ enum fixed_addresses {
 #ifdef CONFIG_PARAVIRT
FIX_PARAVIRT_BOOTMAP,
 #endif
-   FIX_TEXT_POKE1, /* reserve 2 pages for text_poke() */
-   FIX_TEXT_POKE0, /* first page is last, because allocation is backward */
 #ifdef CONFIG_X86_INTEL_MID
FIX_LNW_VRTC,
 #endif
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index d3ae5c26e5a0..96607ef285c3 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -683,43 +684,108 @@ __ro_after_init unsigned long poking_addr;
 
 static int __text_poke(void *addr, const void *opcode, size_t len)
 {
+   bool cross_page_boundary = offset_in_page(addr) + len > PAGE_SIZE;
+   temporary_mm_state_t prev;
+   struct page *pages[2] = {NULL};
unsigned long flags;
-   char *vaddr;
-   struct page *pages[2];
-   int i, r = 0;
+   pte_t pte, *ptep;
+   spinlock_t *ptl;
+   int r = 0;
 
/*
-* While boot memory allocator is runnig we cannot use struct
-* pages as they are not yet initialized.
+* While boot memory allocator is running we cannot use struct pages as
+* they are not yet initialized.
 */
BUG_ON(!after_bootmem);
 
if (!core_kernel_text((unsigned long)addr)) {
pages[0] = vmalloc_to_page(addr);
-   pages[1] = vmalloc_to_page(addr + PAGE_SIZE);
+   if (cross_page_boundary)
+   pages[1] = vmalloc_to_page(addr + PAGE_SIZE);
} else {
pages[0] = virt_to_page(addr);
WARN_ON(!PageReserved(pages[0]));
-   pages[1] = virt_to_page(addr + PAGE_SIZE);
+   if (cross_page_boundary)
+   pages[1] = virt_to_page(addr + PAGE_SIZE);
}
-   if (!pages[0])
+
+   if (!pages[0] || (cross_page_boundary && !pages[1]))
return -EFAULT;
+
local_irq_save(flags);
-   set_fixmap(FIX_TEXT_POKE0, page_to_phys(pages[0]));
-   if (pages[1])
-   set_fixmap(FIX_TEXT_POKE1, page_to_phys(pages[1]));
-   vaddr = (char *)fix_to_virt(FIX_TEXT_POKE0);
-   memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
-   clear_fixmap(FIX_TEXT_POKE0);
-   if (pages[1])
-   clear_fixmap(FIX_TEXT_POKE1);
-   local_flush_tlb();
-   sync_core();
-   /* Could also do a CLFLUSH here to speed up CPU recovery; but
-  that causes hangs on some VIA CPUs. */
-   for (i = 0; i < len; i++)
-   if (((char *)addr)[i] != ((char *)opcode)[i])
-   r = -EFAULT;
+
+   /*
+* The lock is not really needed, but this allows to avoid open-coding.
+*/
+   ptep = get_locked_pte(poking_mm, poking_addr, &ptl);
+
+   /*
+* If we failed to allocate a PTE, fail. This should *never* happen,
+* since we preallocate the PTE.
+*/
+   if (WARN_ON_ONCE(!ptep))
+   goto out;
+
+   pte = mk_pte(pages[0], PAGE_KERNEL);
+   set_pte_at(poking_mm, poking_addr, ptep, pte);
+
+   if (cross_page_boundary) {
+   pte = mk_pte(pages[1], PAGE_KERNEL);
+   set_pte_at(poking_mm, poking_addr + PAGE_SIZE, ptep + 1, pte);
+   }
+
+   /*
+* Loading the temporary mm behaves as a compiler barrier, which
+* guarantees that the PTE will be set at the time memcpy() is done.
+*/
+   prev = use_temporary_mm(poking_mm);
+
+   kasan_disable_current();
+   memcpy((u8 *)poking_addr + offset_in_page(addr), opcode, len);
+   kasan_enable_current();


Re: [PATCH 3.16 000/410] 3.16.57-rc1 review

2018-11-10 Thread Guenter Roeck
On Sun, Nov 11, 2018 at 12:09:03AM +, Ben Hutchings wrote:
> On Sat, 2018-06-16 at 22:18 +0100, Ben Hutchings wrote:
> > On Fri, 2018-06-08 at 07:14 -0700, Guenter Roeck wrote:
> > > On Thu, Jun 07, 2018 at 03:05:20PM +0100, Ben Hutchings wrote:
> > > > This is the start of the stable review cycle for the 3.16.57 release.
> > > > There are 410 patches in this series, which will be posted as responses
> > > > to this one.  If anyone has any issues with these being applied, please
> > > > let me know.
> > > > 
> > > > Responses should be made by Thu Jun 14 18:00:00 UTC 2018.
> > > > Anything received after that time might be too late.
> > > > 
> > > 
> > > Build results:
> > >   total: 138 pass: 136 fail: 2
> > > Failed builds: 
> > >   i386:tools/perf 
> > >   x86_64:tools/perf 
> > > Qemu test results:
> > >   total: 116 pass: 116 fail: 0
> > > 
> > > tools/perf builds are new, so it is probably not entirely surprising
> > > that they fail.
> > > 
> > > Details are available at http://kerneltests.org/builders/.
> > 
> > Thanks for testing.  I see you've now made the tools/perf builds work
> > for 3.16, so thanks for that as well.
> 
> I looked again and for the current patch queue I see:
> 
> Building i386:tools/perf ... failed (script) - skipping
> 
> Building x86_64:tools/perf ... failed (script) - skipping
>
perf builds are currently skipped in 3.16. Let me check if I can drop that.

Guenter



[PATCH] staging: greybus: arche-apb-ctrl.c: Switch to the gpio descriptor interface

2018-11-10 Thread Nishad Kamdar
Use the gpiod interface instead of the deprecated old non-descriptor
interface.
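For reference, the general shape of such a conversion looks roughly like this (a generic sketch with made-up device-tree names, not code from this driver):

```c
/*
 * Generic sketch of an integer-GPIO to descriptor conversion; the
 * "reset" name and the probe function are hypothetical.
 *
 * Old style:
 *   int gpio = of_get_named_gpio(np, "reset-gpios", 0);
 *   devm_gpio_request_one(dev, gpio, GPIOF_OUT_INIT_LOW, "reset");
 *   gpio_set_value(gpio, 1);
 */
static int example_probe(struct device *dev)
{
	struct gpio_desc *reset;

	/* Lookup and request in one step, keyed by function name. */
	reset = devm_gpiod_get(dev, "reset", GPIOD_OUT_LOW);
	if (IS_ERR(reset))
		return PTR_ERR(reset);

	/* Logical value: the descriptor layer applies DT polarity. */
	gpiod_set_value(reset, 1);
	return 0;
}
```

A side benefit of the descriptor API is that active-low polarity can be described in the device tree instead of being hard-coded in the driver.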

Signed-off-by: Nishad Kamdar 
---
 drivers/staging/greybus/arche-apb-ctrl.c | 158 ++-
 1 file changed, 65 insertions(+), 93 deletions(-)

diff --git a/drivers/staging/greybus/arche-apb-ctrl.c 
b/drivers/staging/greybus/arche-apb-ctrl.c
index cc8d6fc831b4..fd19e2394c9c 100644
--- a/drivers/staging/greybus/arche-apb-ctrl.c
+++ b/drivers/staging/greybus/arche-apb-ctrl.c
@@ -8,9 +8,8 @@
 
 #include 
 #include 
-#include 
+#include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -20,17 +19,16 @@
 #include 
 #include "arche_platform.h"
 
-
 static void apb_bootret_deassert(struct device *dev);
 
 struct arche_apb_ctrl_drvdata {
/* Control GPIO signals to and from AP <=> AP Bridges */
-   int resetn_gpio;
-   int boot_ret_gpio;
-   int pwroff_gpio;
-   int wake_in_gpio;
-   int wake_out_gpio;
-   int pwrdn_gpio;
+   struct gpio_desc *resetn;
+   struct gpio_desc *boot_ret;
+   struct gpio_desc *pwroff;
+   struct gpio_desc *wake_in;
+   struct gpio_desc *wake_out;
+   struct gpio_desc *pwrdn;
 
enum arche_platform_state state;
bool init_disabled;
@@ -38,28 +36,28 @@ struct arche_apb_ctrl_drvdata {
struct regulator *vcore;
struct regulator *vio;
 
-   int clk_en_gpio;
+   struct gpio_desc *clk_en;
struct clk *clk;
 
struct pinctrl *pinctrl;
struct pinctrl_state *pin_default;
 
/* V2: SPI Bus control  */
-   int spi_en_gpio;
+   struct gpio_desc *spi_en;
bool spi_en_polarity_high;
 };
 
 /*
  * Note that these low level api's are active high
  */
-static inline void deassert_reset(unsigned int gpio)
+static inline void deassert_reset(struct gpio_desc *gpio)
 {
-   gpio_set_value(gpio, 1);
+   gpiod_set_value(gpio, 1);
 }
 
-static inline void assert_reset(unsigned int gpio)
+static inline void assert_reset(struct gpio_desc *gpio)
 {
-   gpio_set_value(gpio, 0);
+   gpiod_set_value(gpio, 0);
 }
 
 /*
@@ -76,11 +74,11 @@ static int coldboot_seq(struct platform_device *pdev)
return 0;
 
/* Hold APB in reset state */
-   assert_reset(apb->resetn_gpio);
+   assert_reset(apb->resetn);
 
if (apb->state == ARCHE_PLATFORM_STATE_FW_FLASHING &&
-   gpio_is_valid(apb->spi_en_gpio))
-   devm_gpio_free(dev, apb->spi_en_gpio);
+   apb->spi_en)
+   devm_gpiod_put(dev, apb->spi_en);
 
/* Enable power to APB */
if (!IS_ERR(apb->vcore)) {
@@ -102,13 +100,13 @@ static int coldboot_seq(struct platform_device *pdev)
apb_bootret_deassert(dev);
 
/* On DB3 clock was not mandatory */
-   if (gpio_is_valid(apb->clk_en_gpio))
-   gpio_set_value(apb->clk_en_gpio, 1);
+   if (apb->clk_en)
+   gpiod_set_value(apb->clk_en, 1);
 
usleep_range(100, 200);
 
/* deassert reset to APB : Active-low signal */
-   deassert_reset(apb->resetn_gpio);
+   deassert_reset(apb->resetn);
 
apb->state = ARCHE_PLATFORM_STATE_ACTIVE;
 
@@ -120,6 +118,7 @@ static int fw_flashing_seq(struct platform_device *pdev)
struct device *dev = &pdev->dev;
struct arche_apb_ctrl_drvdata *apb = platform_get_drvdata(pdev);
int ret;
+   unsigned long flags;
 
if (apb->init_disabled ||
apb->state == ARCHE_PLATFORM_STATE_FW_FLASHING)
@@ -137,25 +136,20 @@ static int fw_flashing_seq(struct platform_device *pdev)
return ret;
}
 
-   if (gpio_is_valid(apb->spi_en_gpio)) {
-   unsigned long flags;
-
-   if (apb->spi_en_polarity_high)
-   flags = GPIOF_OUT_INIT_HIGH;
-   else
-   flags = GPIOF_OUT_INIT_LOW;
+   if (apb->spi_en_polarity_high)
+   flags = GPIOD_OUT_HIGH;
+   else
+   flags = GPIOD_OUT_LOW;
 
-   ret = devm_gpio_request_one(dev, apb->spi_en_gpio,
-   flags, "apb_spi_en");
-   if (ret) {
-   dev_err(dev, "Failed requesting SPI bus en gpio %d\n",
-   apb->spi_en_gpio);
-   return ret;
-   }
+   apb->spi_en = devm_gpiod_get(dev, "gb,apb_spi_en", flags);
+   if (IS_ERR(apb->spi_en)) {
+   ret = PTR_ERR(apb->spi_en);
+   dev_err(dev, "Failed requesting SPI bus en GPIO: %d\n", ret);
+   return ret;
}
 
/* for flashing device should be in reset state */
-   assert_reset(apb->resetn_gpio);
+   assert_reset(apb->resetn);
apb->state = ARCHE_PLATFORM_STATE_FW_FLASHING;
 
return 0;
@@ -178,8 +172,8 @@ static int standby_boot_seq(struct platform_device *pdev)
return 0;
 
if (apb->state == ARCHE_PLATFORM_STATE_FW_FLASHING &&
- 


[PATCH v2 1/2] perf cs-etm: Set branch instruction flags in packet

2018-11-10 Thread Leo Yan
The perf sample data contains flags to indicate which type of branch
instruction the hardware trace data belongs to, so they can be used to
print a human-readable string.  Arm CoreSight ETM sample data never
sets these flags and always leaves them zeroed, so the perf tool skips
printing the string for instruction types.

Arm CoreSight ETM supports different kinds of instructions for the
A64, A32 and T32 ISAs; this patch sets the branch instruction flags in
the packet for all of them.

The brief idea of the implementation is described below:

- An element with the OCSD_GEN_TRC_ELEM_TRACE_ON type is taken as a
  trace beginning packet; elements with OCSD_GEN_TRC_ELEM_NO_SYNC or
  OCSD_GEN_TRC_ELEM_EO_TRACE are used to mark the trace end;

  As Mike suggested, the packet stream might contain more than one
  TRACE_ON packet; the first TRACE_ON packet indicates trace end and
  the second one is taken as trace restarting.  We will handle this
  special case in the upper layer with the packet queue handling,
  which has more context so is a more suitable place to fix it up.
  This will be accomplished in a subsequent patch.

- For an instruction range packet, three factors mainly decide the
  branch instruction type:

  elem->last_i_type
  elem->last_i_subtype
  elem->last_instr_cond

  If the instruction is an immediate branch without the link and
  return flags, we consider it a function-internal branch.  In fact an
  immediate branch can also be used to enter a function; usually this
  is only done in assembly code, to directly call a symbol without
  expecting to return.  After reviewing normal kernel functions and
  user space programs, both very seldom use an immediate branch for a
  function call.  On the other hand, to decide whether an immediate
  branch jumps within a function or calls one, we would need the start
  address of the next packet and the symbol offset of that address,
  which would add much complexity to the implementation.  So for this
  version we simply treat an immediate branch as a function-internal
  branch.  Moreover, we rely on 'elem->last_instr_cond' to decide
  whether the branch instruction is conditional or not.

  If the instruction is an immediate branch with link, it is the 'BL'
  instruction, which is used for a function call.

  If the instruction is an indirect branch with the subtype
  OCSD_S_INSTR_V7_IMPLIED_RET, the decoder hints at a function return
  for the A32/T32 cases below; set the branch flag as function return
  (thanks to Al for the suggestion).

BX R14
MOV PC, LR
POP {…, PC}
LDR PC, [SP], #offset

  If the instruction is an indirect branch without link, it
  corresponds to the 'BR' instruction; 'BR' is usually used by
  dynamically linked libraries as shown below, so we treat it as a
  return instruction.

0680 <.plt>:
 680:   a9bf7bf0        stp     x16, x30, [sp, #-16]!
 684:   9090            adrp    x16, 1 <__FRAME_END__+0xf630>
 688:   f947fe11        ldr     x17, [x16, #4088]
 68c:   913fe210        add     x16, x16, #0xff8
 690:   d61f0220        br      x17

  If the instruction is an indirect branch with link, e.g. 'BLR', we
  treat it as a function call.

  For function return, ARMv8 introduces the dedicated instruction
  'RET', which carries the subtype OCSD_S_INSTR_V8_RET.

- This patch divides exception packets into three types:

  The first type of exception is caused by external logic such as a
  bus, the interrupt controller, the debug module, or PE reset or
  halt; this corresponds to the flags "bcyi" defined in the doc
  perf-script.txt;

  The second type is for system calls; it is set to "bcs" following
  the definition in the doc;

  The third type is for CPU traps, data and instruction prefetch
  aborts, and alignment aborts; usually these exceptions are
  synchronous for the CPU, so set them to the "bci" type.
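The classification and exception rules above are only described in prose here (the corresponding diff hunk is truncated below).  As an illustration, a minimal self-contained C sketch of both mappings might look like the following; note that the enum values and flag bits are local stand-ins for OpenCSD's instruction type/subtype enums and perf's PERF_IP_FLAG_* bits, not the real identifiers.

```c
/*
 * Hedged sketch, not the actual perf/OpenCSD code: the enums and flag
 * bits below are illustrative stand-ins for ocsd_instr_type,
 * ocsd_instr_subtype and the PERF_IP_FLAG_* bits.
 */
#include <assert.h>

#define FLAG_BRANCH      (1u << 0)  /* 'b' */
#define FLAG_CALL        (1u << 1)  /* 'c' */
#define FLAG_RETURN      (1u << 2)  /* 'r' */
#define FLAG_CONDITIONAL (1u << 3)  /* conditional branch */
#define FLAG_SYSCALLRET  (1u << 4)  /* 's' */
#define FLAG_ASYNC       (1u << 5)  /* 'y' */
#define FLAG_INTERRUPT   (1u << 6)  /* 'i' */

enum instr_type { INSTR_BR, INSTR_BR_INDIRECT }; /* immediate vs indirect */
enum instr_subtype {
	SUBTYPE_NONE,
	SUBTYPE_BR_LINK,        /* BL / BLR */
	SUBTYPE_V7_IMPLIED_RET, /* MOV PC, LR; POP {..., PC}; etc. */
	SUBTYPE_V8_RET,         /* A64 RET */
};

/* Classify one instruction range element into branch sample flags. */
static unsigned int classify_branch(enum instr_type type,
				    enum instr_subtype subtype, int is_cond)
{
	unsigned int flags = FLAG_BRANCH;

	if (type == INSTR_BR) {
		/* Immediate branch: BL is a call; anything else is
		 * treated as a function-internal branch. */
		if (subtype == SUBTYPE_BR_LINK)
			flags |= FLAG_CALL;
		else if (is_cond)
			flags |= FLAG_CONDITIONAL;
	} else {
		/* Indirect branch: BLR is a call; BR, implied returns
		 * and A64 RET are all treated as function returns. */
		if (subtype == SUBTYPE_BR_LINK)
			flags |= FLAG_CALL;
		else
			flags |= FLAG_RETURN;
	}
	return flags;
}

enum exc_class { EXC_EXTERNAL, EXC_SYSCALL, EXC_TRAP };

/* Map the three exception classes to the "bcyi"/"bcs"/"bci" sets. */
static unsigned int exception_flags(enum exc_class class)
{
	switch (class) {
	case EXC_EXTERNAL: /* bus, interrupt ctrl, debug, reset/halt */
		return FLAG_BRANCH | FLAG_CALL | FLAG_ASYNC |
		       FLAG_INTERRUPT;
	case EXC_SYSCALL:  /* SVC */
		return FLAG_BRANCH | FLAG_CALL | FLAG_SYSCALLRET;
	case EXC_TRAP:     /* sync traps: aborts, alignment faults */
	default:
		return FLAG_BRANCH | FLAG_CALL | FLAG_INTERRUPT;
	}
}
```

For example, a conditional immediate branch yields FLAG_BRANCH | FLAG_CONDITIONAL, matching the prose above.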

Cc: Mathieu Poirier 
Cc: Mike Leach 
Cc: Robert Walker 
Cc: Al Grant 
Cc: Andi Kleen 
Cc: Adrian Hunter 
Cc: Arnaldo Carvalho de Melo 
Signed-off-by: Leo Yan 
---
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 168 
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.h |   1 +
 2 files changed, 169 insertions(+)

diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c 
b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
index d1a6cbc..0e50c52 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
@@ -303,6 +303,7 @@ cs_etm_decoder__buffer_packet(struct cs_etm_decoder 
*decoder,
decoder->packet_buffer[et].instr_count = 0;
decoder->packet_buffer[et].last_instr_taken_branch = false;
decoder->packet_buffer[et].last_instr_size = 0;
+   decoder->packet_buffer[et].flags = 0;
 
if (decoder->packet_count == MAX_BUFFER - 1)
return OCSD_RESP_WAIT;
@@ -437,6 +438,171 @@ 

[PATCH v2 0/2] perf cs-etm: Add support for sample flags

2018-11-10 Thread Leo Yan
This patch series adds support for sample flags so perf can print
sample flags for branch instructions.

Patch 0001 sets branch instruction flags in the packet; it carries the
core code of this series, setting flags according to the decoding
element type and using the element's instruction type, subtype and
condition flag to decide the flag values.

Patch 0002 supports sample flags by copying the flag values from the
packet structure to the sample structure, and it includes three fixups
for TRACE_ON/TRACE_OFF and exception packets.

The patch series is based on OpenCSD v0.10.0; Rob's patch 'perf:
Support for Arm A32/T32 instruction sets in CoreSight trace' is also a
prerequisite for supporting the A32/T32 ISAs.

This patch series is applied on the acme's perf core branch [1] with the
latest commit f1d23afaf677 ("perf bpf: Reduce the hardcoded .max_entries
for pid_maps") and has two prerequisites:
1) It's dependent on Rob's patch 'perf: Support for Arm A32/T32
   instruction sets in CoreSight trace' [2];
2) It's dependent on another patch series 'perf cs-etm: Correct packets
   handling' [3].

After applying the dependency patches and this patch series, we can
verify sample flags with below command:

  # perf script -F,-time,+flags,+ip,+sym,+dso,+addr,+symoff -k vmlinux

Changes from v1:
* Moved exception packets handling patches into patch series 'perf
  cs-etm: Correct packets handling'.
* Added sample flags fixing up for TRACE_OFF packet.
* Created a new function which is used to maintain flags fixing up.

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/log/?h=perf/core
[2] 
http://archive.armlinux.org.uk/lurker/message/20181109.091126.9d69489d.en.html
[3] 
http://archive.armlinux.org.uk/lurker/message/2018.045938.782b378b.en.html


Leo Yan (2):
  perf cs-etm: Set branch instruction flags in packet
  perf cs-etm: Add support sample flags

 tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 168 
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.h |   1 +
 tools/perf/util/cs-etm.c|  43 +-
 3 files changed, 210 insertions(+), 2 deletions(-)

-- 
2.7.4



[PATCH v2 2/2] perf cs-etm: Add support sample flags

2018-11-10 Thread Leo Yan
We have prepared the flags in the packet structure, so we need to copy
the related values into the sample structure so that the perf tool can
make use of sample flags.

PREV_PACKET contains the branch instruction flags and PACKET actually
contains the flags for the next branch instruction.  So this patch
sets the sample flags from 'etmq->prev_packet->flags'.

This patch includes three fixups for sample flags based on the packet
context:

- If the packet is an exception packet or an exception return packet,
  update the previous packet with the exception specific flags;
- If there is a TRACE_ON or TRACE_OFF packet in the middle of
  instruction packets, the trace is discontinuous, so append the flag
  PERF_IP_FLAG_TRACE_END to the previous packet to indicate the trace
  has ended;
- If an instruction packet follows a TRACE_OFF packet, that
  instruction restarts the trace.  So set the flag
  PERF_IP_FLAG_TRACE_BEGIN on the TRACE_OFF packet if there is one;
  the flag isn't used by the TRACE_OFF packet itself but indicates
  trace restarting when generating the sample.

Signed-off-by: Leo Yan 
---
 tools/perf/util/cs-etm.c | 43 +--
 1 file changed, 41 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 455f132..afca6f3 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -676,7 +676,7 @@ static int cs_etm__synth_instruction_sample(struct 
cs_etm_queue *etmq,
sample.stream_id = etmq->etm->instructions_id;
sample.period = period;
sample.cpu = etmq->packet->cpu;
-   sample.flags = 0;
+   sample.flags = etmq->prev_packet->flags;
sample.insn_len = 1;
sample.cpumode = event->sample.header.misc;
 
@@ -735,7 +735,7 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue 
*etmq)
sample.stream_id = etmq->etm->branches_id;
sample.period = 1;
sample.cpu = etmq->packet->cpu;
-   sample.flags = 0;
+   sample.flags = etmq->prev_packet->flags;
sample.cpumode = event->sample.header.misc;
 
/*
@@ -878,6 +878,43 @@ static int cs_etm__synth_events(struct cs_etm_auxtrace 
*etm,
return 0;
 }
 
+static void cs_etm__fixup_flags(struct cs_etm_queue *etmq)
+{
+   /*
+* The decoding stream might insert a single TRACE_OFF packet in
+* the middle of instruction packets, i.e. without the usual
+* TRACE_OFF/TRACE_ON pair.  In this case an instruction packet
+* follows the TRACE_OFF packet, so fix up prev_packet with the
+* flag PERF_IP_FLAG_TRACE_BEGIN, which is finally used by the
+* instruction packet to generate samples.
+*/
+   if (etmq->prev_packet->sample_type == CS_ETM_TRACE_OFF &&
+   etmq->packet->sample_type == CS_ETM_RANGE)
+   etmq->prev_packet->flags = PERF_IP_FLAG_BRANCH |
+  PERF_IP_FLAG_TRACE_BEGIN;
+
+   if (etmq->prev_packet->sample_type == CS_ETM_RANGE) {
+   /*
+* When an exception packet is inserted, update the flags
+* so perf knows these are exception related branches.
+*/
+   if (etmq->packet->sample_type == CS_ETM_EXCEPTION ||
+   etmq->packet->sample_type == CS_ETM_EXCEPTION_RET)
+   etmq->prev_packet->flags = etmq->packet->flags;
+
+   /*
+* The trace is discontinuous, whether caused by a TRACE_ON
+* packet or a TRACE_OFF packet; if the previous packet is an
+* instruction packet, simply set the flag
+* PERF_IP_FLAG_TRACE_END for it.
+*/
+   if (etmq->packet->sample_type == CS_ETM_TRACE_ON ||
+   etmq->packet->sample_type == CS_ETM_TRACE_OFF)
+   etmq->prev_packet->flags |= PERF_IP_FLAG_TRACE_END;
+   }
+}
+
 static int cs_etm__sample(struct cs_etm_queue *etmq)
 {
struct cs_etm_auxtrace *etm = etmq->etm;
@@ -1100,6 +1137,8 @@ static int cs_etm__run_decoder(struct cs_etm_queue *etmq)
 */
break;
 
+   cs_etm__fixup_flags(etmq);
+
switch (etmq->packet->sample_type) {
case CS_ETM_RANGE:
/*
-- 
2.7.4



[PATCH v1 2/5] perf cs-etm: Avoid stale branch samples when flush packet

2018-11-10 Thread Leo Yan
At the end of trace buffer handling, the function cs_etm__flush() is
invoked to flush any remaining branch stack entries.  As a side effect
it also generates a branch sample, because 'etmq->packet' doesn't
contain a newly arrived packet but points to a stale packet after the
packet swap, so it wrongly synthesizes branch samples with stale
packet info.

The detailed flow below shows how the issue arises:

  Packet1: start_addr=0x08b1fbf0 end_addr=0x08b1fbfc
  Packet2: start_addr=0x08b1fb5c end_addr=0x08b1fb6c

  step 1: cs_etm__sample():
sample: ip=(0x08b1fbfc-4) addr=0x08b1fb5c

  step 2: flush packet in cs_etm__run_decoder():
cs_etm__run_decoder()
  `-> err = cs_etm__flush(etmq, false);
sample: ip=(0x08b1fb6c-4) addr=0x08b1fbf0

Packet1 and packet2 are two consecutive packets.  When packet2 is the
newly arrived packet, cs_etm__sample() generates a branch sample for
the two packets and uses [packet1::end_addr - 4 =>
packet2::start_addr] as the branch jump flow; this is the first branch
sample, generated in step 1.  At the end of cs_etm__sample() the
packets are swapped, so 'etm->prev_packet' = packet2 and 'etm->packet'
= packet1; so far the branch samples are okay.

If packet2 is the last packet in the trace buffer, even though no new
packet arrives, cs_etm__run_decoder() invokes cs_etm__flush() to flush
branch stack entries as expected, but it also generates a branch
sample by taking 'etm->packet' as a newly arrived packet; the branch
jump flow then becomes [packet2::end_addr - 4 => packet1::start_addr].
This is the second sample, generated in step 2; it is a stale sample
and should not be generated.

This patch adds a new argument 'new_packet' to cs_etm__flush(); we
pass 'true' when there is a new packet, otherwise 'false' so the
function only flushes branch stack entries and avoids generating a
sample for a stale packet.

Signed-off-by: Leo Yan 
---
 tools/perf/util/cs-etm.c | 20 +---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index fe18d7b..f4fa877 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -955,7 +955,7 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)
return 0;
 }
 
-static int cs_etm__flush(struct cs_etm_queue *etmq)
+static int cs_etm__flush(struct cs_etm_queue *etmq, bool new_packet)
 {
int err = 0;
struct cs_etm_auxtrace *etm = etmq->etm;
@@ -989,6 +989,20 @@ static int cs_etm__flush(struct cs_etm_queue *etmq)
 
}
 
+   /*
+* If 'new_packet' is false, this call has no new packet and
+* 'etmq->packet' contains the stale packet left over from the
+* previous packet swap.  In this case the function is invoked
+* only for flushing the branch stack at the end of buffer
+* handling.
+*
+* Simply put, branch samples should be generated only when a
+* new packet is received; otherwise, bail out directly to
+* avoid generating a branch sample from a stale packet.
+*/
+   if (!new_packet)
+   return 0;
+
if (etm->sample_branches &&
etmq->prev_packet->sample_type == CS_ETM_RANGE) {
err = cs_etm__synth_branch_sample(etmq);
@@ -1075,7 +1089,7 @@ static int cs_etm__run_decoder(struct cs_etm_queue *etmq)
 * Discontinuity in trace, flush
 * previous branch stack
 */
-   cs_etm__flush(etmq);
+   cs_etm__flush(etmq, true);
break;
case CS_ETM_EMPTY:
/*
@@ -1092,7 +1106,7 @@ static int cs_etm__run_decoder(struct cs_etm_queue *etmq)
 
if (err == 0)
/* Flush any remaining branch stack entries */
-   err = cs_etm__flush(etmq);
+   err = cs_etm__flush(etmq, false);
}
 
return err;
-- 
2.7.4



[PATCH v1 5/5] perf cs-etm: Track exception number

2018-11-10 Thread Leo Yan
When an exception packet arrives, it contains the exception number;
the exception number indicates the exception type, so from it we can
tell whether the exception was taken for an interrupt, a system call,
another trap, etc.  But because the decoder cannot deliver the
exception number correctly with the exception return packet, we cannot
tell the type of an exception return when preparing sample flags.

This patch adds a new 'exc_num' array to the decoder structure to
record the exception number per CPU; the number is recorded in the
array when the exception packet arrives and can later be used by the
exception return packet.  If a discontinuous trace is detected via a
TRACE_ON or TRACE_OFF packet, the exception number is set to an
invalid value.
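
The per-CPU bookkeeping described above can be sketched as a small self-contained C fragment; the names below (exc_tracker, EXC_NUM_INVALID) are illustrative stand-ins, not the perf decoder API, and UINT32_MAX mirrors the invalid value used in the diff.

```c
/*
 * Hedged sketch of per-CPU exception-number tracking: record the
 * number on an exception packet, invalidate it on discontinuous
 * trace (TRACE_ON/TRACE_OFF), look it up on exception return.
 */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define EXC_NUM_INVALID UINT32_MAX

struct exc_tracker {
	int num_cpu;
	uint32_t *exc_num;  /* one slot per CPU */
};

static int exc_tracker_init(struct exc_tracker *t, int num_cpu)
{
	t->num_cpu = num_cpu;
	t->exc_num = calloc(num_cpu, sizeof(*t->exc_num));
	if (!t->exc_num)
		return -1;
	for (int cpu = 0; cpu < num_cpu; cpu++)
		t->exc_num[cpu] = EXC_NUM_INVALID;
	return 0;
}

/* Exception packet: remember the number for a later exception return. */
static void exc_tracker_record(struct exc_tracker *t, int cpu, uint32_t num)
{
	t->exc_num[cpu] = num;
}

/* TRACE_ON/TRACE_OFF packet: trace is discontinuous, invalidate. */
static void exc_tracker_clear(struct exc_tracker *t, int cpu)
{
	t->exc_num[cpu] = EXC_NUM_INVALID;
}

/* Exception return packet: consult the recorded number. */
static uint32_t exc_tracker_lookup(struct exc_tracker *t, int cpu)
{
	return t->exc_num[cpu];
}
```

The same life cycle is what the diff below implements directly inside struct cs_etm_decoder.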

Signed-off-by: Leo Yan 
---
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 67 ++---
 1 file changed, 59 insertions(+), 8 deletions(-)

diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c 
b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
index b8cb7a3e..d1a6cbc 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
@@ -43,6 +43,7 @@ struct cs_etm_decoder {
u32 packet_count;
u32 head;
u32 tail;
+   u32 *exc_num;
struct cs_etm_packet packet_buffer[MAX_BUFFER];
 };
 
@@ -368,24 +369,64 @@ static ocsd_datapath_resp_t
 cs_etm_decoder__buffer_trace_off(struct cs_etm_decoder *decoder,
 const uint8_t trace_chan_id)
 {
-   return cs_etm_decoder__buffer_packet(decoder, trace_chan_id,
-CS_ETM_TRACE_OFF);
+   int ret;
+   struct cs_etm_packet *packet;
+
+   ret = cs_etm_decoder__buffer_packet(decoder, trace_chan_id,
+   CS_ETM_TRACE_OFF);
+   if (ret != OCSD_RESP_CONT && ret != OCSD_RESP_WAIT)
+   return ret;
+
+   packet = &decoder->packet_buffer[decoder->tail];
+
+   /* Clear exception number for discontinuous trace */
+   decoder->exc_num[packet->cpu] = UINT32_MAX;
+
+   return ret;
 }
 
 static ocsd_datapath_resp_t
 cs_etm_decoder__buffer_trace_on(struct cs_etm_decoder *decoder,
const uint8_t trace_chan_id)
 {
-   return cs_etm_decoder__buffer_packet(decoder, trace_chan_id,
-CS_ETM_TRACE_ON);
+   int ret;
+   struct cs_etm_packet *packet;
+
+   ret = cs_etm_decoder__buffer_packet(decoder, trace_chan_id,
+   CS_ETM_TRACE_ON);
+   if (ret != OCSD_RESP_CONT && ret != OCSD_RESP_WAIT)
+   return ret;
+
+   packet = &decoder->packet_buffer[decoder->tail];
+
+   /* Clear exception number for discontinuous trace */
+   decoder->exc_num[packet->cpu] = UINT32_MAX;
+
+   return ret;
 }
 
 static ocsd_datapath_resp_t
 cs_etm_decoder__buffer_exception(struct cs_etm_decoder *decoder,
+const ocsd_generic_trace_elem *elem,
 const uint8_t trace_chan_id)
 {
-   return cs_etm_decoder__buffer_packet(decoder, trace_chan_id,
-CS_ETM_EXCEPTION);
+   int ret;
+   struct cs_etm_packet *packet;
+
+   ret = cs_etm_decoder__buffer_packet(decoder, trace_chan_id,
+   CS_ETM_EXCEPTION);
+   if (ret != OCSD_RESP_CONT && ret != OCSD_RESP_WAIT)
+   return ret;
+
+   packet = >packet_buffer[decoder->tail];
+
+   /*
+* Exception number is recorded per CPU and later can be used
+* for exception return instruction analysis.
+*/
+   decoder->exc_num[packet->cpu] = elem->exception_number;
+
+   return ret;
 }
 
 static ocsd_datapath_resp_t
@@ -423,7 +464,7 @@ static ocsd_datapath_resp_t 
cs_etm_decoder__gen_trace_elem_printer(
trace_chan_id);
break;
case OCSD_GEN_TRC_ELEM_EXCEPTION:
-   resp = cs_etm_decoder__buffer_exception(decoder,
+   resp = cs_etm_decoder__buffer_exception(decoder, elem,
trace_chan_id);
break;
case OCSD_GEN_TRC_ELEM_EXCEPTION_RET:
@@ -511,6 +552,10 @@ cs_etm_decoder__new(int num_cpu, struct 
cs_etm_decoder_params *d_params,
if (!decoder)
return NULL;
 
+   decoder->exc_num = zalloc(sizeof(*decoder->exc_num) * num_cpu);
+   if (!decoder->exc_num)
+   goto err_free_decoder;
+
decoder->data = d_params->data;
decoder->prev_return = OCSD_RESP_CONT;
cs_etm_decoder__clear_buffer(decoder);
@@ -531,7 +576,7 @@ cs_etm_decoder__new(int num_cpu, struct 
cs_etm_decoder_params *d_params,
decoder->dcd_tree = ocsd_create_dcd_tree(format, flags);
 
if (decoder->dcd_tree == 0)
-   

[PATCH v1 5/5] perf cs-etm: Track exception number

2018-11-10 Thread Leo Yan
When an exception packet comes, it contains the exception number; the
exception number indicates the exception type, so from it we can know
whether the exception was taken for an interrupt, a system call or
another trap, etc.  But the decoder cannot deliver the exception number
with the exception return packet, so when preparing sample flags we
cannot tell what type an exception return belongs to.

This patch adds a new 'exc_num' array to the decoder structure to record
the exception number per CPU; the number is recorded in the array when
an exception packet arrives and can later be used by the corresponding
exception return packet.  If a discontinuous trace is detected via a
TRACE_ON or TRACE_OFF packet, the exception number is reset to an
invalid value.

Signed-off-by: Leo Yan 
---
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 67 ++---
 1 file changed, 59 insertions(+), 8 deletions(-)

diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
index b8cb7a3e..d1a6cbc 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
@@ -43,6 +43,7 @@ struct cs_etm_decoder {
u32 packet_count;
u32 head;
u32 tail;
+   u32 *exc_num;
struct cs_etm_packet packet_buffer[MAX_BUFFER];
 };
 
@@ -368,24 +369,64 @@ static ocsd_datapath_resp_t
 cs_etm_decoder__buffer_trace_off(struct cs_etm_decoder *decoder,
 const uint8_t trace_chan_id)
 {
-   return cs_etm_decoder__buffer_packet(decoder, trace_chan_id,
-CS_ETM_TRACE_OFF);
+   int ret;
+   struct cs_etm_packet *packet;
+
+   ret = cs_etm_decoder__buffer_packet(decoder, trace_chan_id,
+   CS_ETM_TRACE_OFF);
+   if (ret != OCSD_RESP_CONT && ret != OCSD_RESP_WAIT)
+   return ret;
+
+   packet = &decoder->packet_buffer[decoder->tail];
+
+   /* Clear exception number for discontinuous trace */
+   decoder->exc_num[packet->cpu] = UINT32_MAX;
+
+   return ret;
 }
 
 static ocsd_datapath_resp_t
 cs_etm_decoder__buffer_trace_on(struct cs_etm_decoder *decoder,
const uint8_t trace_chan_id)
 {
-   return cs_etm_decoder__buffer_packet(decoder, trace_chan_id,
-CS_ETM_TRACE_ON);
+   int ret;
+   struct cs_etm_packet *packet;
+
+   ret = cs_etm_decoder__buffer_packet(decoder, trace_chan_id,
+   CS_ETM_TRACE_ON);
+   if (ret != OCSD_RESP_CONT && ret != OCSD_RESP_WAIT)
+   return ret;
+
+   packet = &decoder->packet_buffer[decoder->tail];
+
+   /* Clear exception number for discontinuous trace */
+   decoder->exc_num[packet->cpu] = UINT32_MAX;
+
+   return ret;
 }
 
 static ocsd_datapath_resp_t
 cs_etm_decoder__buffer_exception(struct cs_etm_decoder *decoder,
+const ocsd_generic_trace_elem *elem,
 const uint8_t trace_chan_id)
 {
-   return cs_etm_decoder__buffer_packet(decoder, trace_chan_id,
-CS_ETM_EXCEPTION);
+   int ret;
+   struct cs_etm_packet *packet;
+
+   ret = cs_etm_decoder__buffer_packet(decoder, trace_chan_id,
+   CS_ETM_EXCEPTION);
+   if (ret != OCSD_RESP_CONT && ret != OCSD_RESP_WAIT)
+   return ret;
+
+   packet = &decoder->packet_buffer[decoder->tail];
+
+   /*
+* Exception number is recorded per CPU and later can be used
+* for exception return instruction analysis.
+*/
+   decoder->exc_num[packet->cpu] = elem->exception_number;
+
+   return ret;
 }
 
 static ocsd_datapath_resp_t
@@ -423,7 +464,7 @@ static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer(
trace_chan_id);
break;
case OCSD_GEN_TRC_ELEM_EXCEPTION:
-   resp = cs_etm_decoder__buffer_exception(decoder,
+   resp = cs_etm_decoder__buffer_exception(decoder, elem,
trace_chan_id);
break;
case OCSD_GEN_TRC_ELEM_EXCEPTION_RET:
@@ -511,6 +552,10 @@ cs_etm_decoder__new(int num_cpu, struct cs_etm_decoder_params *d_params,
if (!decoder)
return NULL;
 
+   decoder->exc_num = zalloc(sizeof(*decoder->exc_num) * num_cpu);
+   if (!decoder->exc_num)
+   goto err_free_decoder;
+
decoder->data = d_params->data;
decoder->prev_return = OCSD_RESP_CONT;
cs_etm_decoder__clear_buffer(decoder);
@@ -531,7 +576,7 @@ cs_etm_decoder__new(int num_cpu, struct cs_etm_decoder_params *d_params,
decoder->dcd_tree = ocsd_create_dcd_tree(format, flags);
 
if (decoder->dcd_tree == 0)
-   

[PATCH v1 4/5] perf cs-etm: Generate branch sample for exception packet

2018-11-10 Thread Leo Yan
An exception packet appears as one element with 'elem_type' ==
OCSD_GEN_TRC_ELEM_EXCEPTION or OCSD_GEN_TRC_ELEM_EXCEPTION_RET,
representing exception entry and exit respectively.  The decoder sets
the packet fields 'packet->exc' and 'packet->exc_ret' to indicate
exception packets; but exception packets don't have a dedicated sample
type and share the sample type CS_ETM_RANGE with normal instruction
packets.

As a result, exception packets are treated as normal instruction
packets, which confusingly mixes different packet types.  Furthermore,
instruction range packets are processed for branch samples only when
'packet->last_instr_taken_branch' is true and are omitted otherwise;
this makes a mess of exception entry and return, because we don't have
complete address range info for the context switch.

To process exception packets properly, this patch introduces two new
sample types, CS_ETM_EXCEPTION and CS_ETM_EXCEPTION_RET; packets of
these two kinds are handled by cs_etm__exception().  The function
cs_etm__exception() forces the previous CS_ETM_RANGE packet's flag
'prev_packet->last_instr_taken_branch' to true, which matches the
program flow when an exception traps from user space to kernel space,
whether or not the most recent flow took a branch; this is also safe
for returning to user space after exception handling.

Now that exception packets have their own sample types, the packet
fields 'packet->exc' and 'packet->exc_ret' aren't needed anymore, so
remove them.

Signed-off-by: Leo Yan 
---
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 26 +--
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.h |  4 ++--
 tools/perf/util/cs-etm.c| 28 +
 3 files changed, 50 insertions(+), 8 deletions(-)

diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
index 9d52727..b8cb7a3e 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
@@ -269,8 +269,6 @@ static void cs_etm_decoder__clear_buffer(struct cs_etm_decoder *decoder)
decoder->packet_buffer[i].instr_count = 0;
decoder->packet_buffer[i].last_instr_taken_branch = false;
decoder->packet_buffer[i].last_instr_size = 0;
-   decoder->packet_buffer[i].exc = false;
-   decoder->packet_buffer[i].exc_ret = false;
decoder->packet_buffer[i].cpu = INT_MIN;
}
 }
@@ -298,8 +296,6 @@ cs_etm_decoder__buffer_packet(struct cs_etm_decoder *decoder,
 
decoder->packet_buffer[et].sample_type = sample_type;
decoder->packet_buffer[et].isa = CS_ETM_ISA_UNKNOWN;
-   decoder->packet_buffer[et].exc = false;
-   decoder->packet_buffer[et].exc_ret = false;
decoder->packet_buffer[et].cpu = *((int *)inode->priv);
decoder->packet_buffer[et].start_addr = CS_ETM_INVAL_ADDR;
decoder->packet_buffer[et].end_addr = CS_ETM_INVAL_ADDR;
@@ -384,6 +380,22 @@ cs_etm_decoder__buffer_trace_on(struct cs_etm_decoder *decoder,
 CS_ETM_TRACE_ON);
 }
 
+static ocsd_datapath_resp_t
+cs_etm_decoder__buffer_exception(struct cs_etm_decoder *decoder,
+const uint8_t trace_chan_id)
+{
+   return cs_etm_decoder__buffer_packet(decoder, trace_chan_id,
+CS_ETM_EXCEPTION);
+}
+
+static ocsd_datapath_resp_t
+cs_etm_decoder__buffer_exception_ret(struct cs_etm_decoder *decoder,
+const uint8_t trace_chan_id)
+{
+   return cs_etm_decoder__buffer_packet(decoder, trace_chan_id,
+CS_ETM_EXCEPTION_RET);
+}
+
 static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer(
const void *context,
const ocsd_trc_index_t indx __maybe_unused,
@@ -411,10 +423,12 @@ static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer(
trace_chan_id);
break;
case OCSD_GEN_TRC_ELEM_EXCEPTION:
-   decoder->packet_buffer[decoder->tail].exc = true;
+   resp = cs_etm_decoder__buffer_exception(decoder,
+   trace_chan_id);
break;
case OCSD_GEN_TRC_ELEM_EXCEPTION_RET:
-   decoder->packet_buffer[decoder->tail].exc_ret = true;
+   resp = cs_etm_decoder__buffer_exception_ret(decoder,
+   trace_chan_id);
break;
case OCSD_GEN_TRC_ELEM_PE_CONTEXT:
case OCSD_GEN_TRC_ELEM_EO_TRACE:
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
index a38c97c..0d1c18d 

[PATCH v1 1/5] perf cs-etm: Correct packets swapping in cs_etm__flush()

2018-11-10 Thread Leo Yan
The structure cs_etm_queue uses 'prev_packet' to point to the previous
packet; this can be combined with the newly arrived packet to generate
samples.

In cs_etm__flush() packets are swapped only when the flag
'etm->synth_opts.last_branch' is true, which means packets are not
swapped unless the '--itrace=il' option for generating last branch
entries is given; in that case 'prev_packet' doesn't point to the
correct previous packet and the stale packet is still used to generate
the next sample.  So when dumping the trace with the 'perf script'
command we can see an incorrect flow built from the stale packet's
address info.

This patch corrects the packet swapping in cs_etm__flush(); besides the
flag 'etm->synth_opts.last_branch' it also checks the flag
'etm->sample_branches', and if either is true it swaps the packets so
the correct content is saved to 'prev_packet'.  This fixes the wrong
program flow in the dump.

Signed-off-by: Leo Yan 
---
 tools/perf/util/cs-etm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 48ad217..fe18d7b 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -997,7 +997,7 @@ static int cs_etm__flush(struct cs_etm_queue *etmq)
}
 
 swap_packet:
-   if (etmq->etm->synth_opts.last_branch) {
+   if (etm->sample_branches || etmq->etm->synth_opts.last_branch) {
/*
 * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
 * the next incoming packet.
-- 
2.7.4



[PATCH v1 3/5] perf cs-etm: Support for NO_SYNC packet

2018-11-10 Thread Leo Yan
As described in OpenCSD (the CoreSight decoder library), the decoding
stream can include a trace element of type OCSD_GEN_TRC_ELEM_NO_SYNC;
this element indicates 'either at start of decode, or after overflow /
bad packet', so we should take it as a signal that tracing is off and
that there is a trace discontinuity.  In trace dumps from 'perf
script', the OCSD_GEN_TRC_ELEM_NO_SYNC element sometimes pairs with an
OCSD_GEN_TRC_ELEM_TRACE_ON element to show that tracing was turned off
and back on; in that case the cs-etm code already handles the TRACE_ON
packet well, so we observe the trace discontinuity.  But in another
case only a standalone OCSD_GEN_TRC_ELEM_NO_SYNC element is inserted
between instruction packets; we fail to handle that case, and users
cannot see the trace discontinuity.

This patch introduces a new type, CS_ETM_TRACE_OFF, to generate a
packet when the decoder delivers an OCSD_GEN_TRC_ELEM_NO_SYNC element.
When generating samples, a CS_ETM_TRACE_OFF packet behaves almost the
same as a CS_ETM_TRACE_ON packet: both invoke cs_etm__flush() to
generate samples for the previous instruction packet, and
cs_etm__sample() also needs to generate samples when a TRACE_OFF packet
is followed by a sequential instruction packet.  This patch also
converts the address to 0 for a TRACE_OFF packet, the same as for a
TRACE_ON packet.

Signed-off-by: Leo Yan 
---
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 10 ++
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.h |  7 ---
 tools/perf/util/cs-etm.c| 15 +++
 3 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
index 5efb616..9d52727 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
@@ -369,6 +369,14 @@ cs_etm_decoder__buffer_range(struct cs_etm_decoder *decoder,
 }
 
 static ocsd_datapath_resp_t
+cs_etm_decoder__buffer_trace_off(struct cs_etm_decoder *decoder,
+const uint8_t trace_chan_id)
+{
+   return cs_etm_decoder__buffer_packet(decoder, trace_chan_id,
+CS_ETM_TRACE_OFF);
+}
+
+static ocsd_datapath_resp_t
 cs_etm_decoder__buffer_trace_on(struct cs_etm_decoder *decoder,
const uint8_t trace_chan_id)
 {
@@ -389,6 +397,8 @@ static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer(
case OCSD_GEN_TRC_ELEM_UNKNOWN:
break;
case OCSD_GEN_TRC_ELEM_NO_SYNC:
+   resp = cs_etm_decoder__buffer_trace_off(decoder,
+   trace_chan_id);
decoder->trace_on = false;
break;
case OCSD_GEN_TRC_ELEM_TRACE_ON:
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
index 9351bd1..a38c97c 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
@@ -23,9 +23,10 @@ struct cs_etm_buffer {
 };
 
 enum cs_etm_sample_type {
-   CS_ETM_EMPTY = 0,
-   CS_ETM_RANGE = 1 << 0,
-   CS_ETM_TRACE_ON = 1 << 1,
+   CS_ETM_EMPTY= 0,
+   CS_ETM_RANGE= 1 << 0,
+   CS_ETM_TRACE_ON = 1 << 1,
+   CS_ETM_TRACE_OFF= 1 << 2,
 };
 
 enum cs_etm_isa {
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index f4fa877..2a0cef9 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -517,8 +517,9 @@ static inline int cs_etm__t32_instr_size(struct cs_etm_queue *etmq,
 
 static inline u64 cs_etm__first_executed_instr(struct cs_etm_packet *packet)
 {
-   /* Returns 0 for the CS_ETM_TRACE_ON packet */
-   if (packet->sample_type == CS_ETM_TRACE_ON)
+   /* Returns 0 for TRACE_ON and TRACE_OFF packets */
+   if (packet->sample_type == CS_ETM_TRACE_ON ||
+   packet->sample_type == CS_ETM_TRACE_OFF)
return 0;
 
return packet->start_addr;
@@ -527,8 +528,9 @@ static inline u64 cs_etm__first_executed_instr(struct cs_etm_packet *packet)
 static inline
 u64 cs_etm__last_executed_instr(const struct cs_etm_packet *packet)
 {
-   /* Returns 0 for the CS_ETM_TRACE_ON packet */
-   if (packet->sample_type == CS_ETM_TRACE_ON)
+   /* Returns 0 for TRACE_ON and TRACE_OFF packets */
+   if (packet->sample_type == CS_ETM_TRACE_ON ||
+   packet->sample_type == CS_ETM_TRACE_OFF)
return 0;
 
return packet->end_addr - packet->last_instr_size;
@@ -930,6 +932,10 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)
if (etmq->prev_packet->sample_type == CS_ETM_TRACE_ON)
generate_sample = true;
 
+   /* Generate sample for tracing off packet */
+   

[PATCH v1 0/5] perf cs-etm: Correct packets handling

2018-11-10 Thread Leo Yan
The perf cs-etm module converts decoder elements into packets and then
uses the context across packets to generate synthesized samples;
finally, the perf tool can use these samples for statistics and report
the results.

This patch series addresses several issues with packet handling and
sample generation that were found while first working on branch sample
flags support for Arm CoreSight trace data, so this series is also a
dependency of another patch series for sample flags.

The first two patches mainly fix issues in cs_etm__flush(): patch 0001
corrects the packet swapping in cs_etm__flush(), which fixes the wrong
branch samples caused by the missing swap; patch 0002 fixes wrong
sample generation from stale packets at the end of every trace buffer.

Patch 0003 adds support for the NO_SYNC packet; without it, the trace
decoding cannot reflect the trace discontinuity that a NO_SYNC packet
causes.

Patches 0004/0005 were previously published in the patch series 'perf
cs-etm: Add support for sample flags', but I have moved them into this
series because they are more related to packet handling.  Patch 0004
generates branch samples for exception packets, and patch 0005 tracks
the exception number.

This patch series applies on acme's perf/core branch [1] at the latest
commit f1d23afaf677 ("perf bpf: Reduce the hardcoded .max_entries
for pid_maps") and has one prerequisite, Rob's patch 'perf: Support
for Arm A32/T32 instruction sets in CoreSight trace' [2].

With the dependency patch applied, this patch series has been tested
for branch sample dumping with the command below on a Juno board:

  # perf script -F,-time,+ip,+sym,+dso,+addr,+symoff -k vmlinux

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/log/?h=perf/core
[2] 
http://archive.armlinux.org.uk/lurker/message/20181109.091126.9d69489d.en.html


Leo Yan (5):
  perf cs-etm: Correct packets swapping in cs_etm__flush()
  perf cs-etm: Avoid stale branch samples when flush packet
  perf cs-etm: Support for NO_SYNC packet
  perf cs-etm: Generate branch sample for exception packet
  perf cs-etm: Track exception number

 tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 91 ++---
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.h | 11 +--
 tools/perf/util/cs-etm.c| 65 +++---
 3 files changed, 146 insertions(+), 21 deletions(-)

-- 
2.7.4



Re: [PATCH v2] scripts/kconfig/merge_config: don't redefine 'y' to 'm'

2018-11-10 Thread Masahiro Yamada
On Fri, Nov 9, 2018 at 4:45 AM Anders Roxell  wrote:
>
> In today's merge_config.sh the order of the config fragment files dictates
> the output of a config option. With this approach we will get different
> .config files depending on the order of the config fragment files.
>
> So doing something like:
> $ ./merge/kconfig/merge_config.sh selftest.config drm.config
>
> Where selftest.config defines DRM=y and drm.config defines DRM=m, the
> result will be "DRM=m".
>
> Rework to add a switch to get builtin '=y' precedence over modules '=m',
> this will result in "DRM=y". If we do something like this:
>
> $ ./merge/kconfig/merge_config.sh -y selftest.config drm.config
>
> Suggested-by: Arnd Bergmann 
> Signed-off-by: Anders Roxell 
> ---
>  scripts/kconfig/merge_config.sh | 34 +++--
>  1 file changed, 28 insertions(+), 6 deletions(-)
>
> diff --git a/scripts/kconfig/merge_config.sh b/scripts/kconfig/merge_config.sh
> index da66e7742282..fcd18f642fc7 100755
> --- a/scripts/kconfig/merge_config.sh
> +++ b/scripts/kconfig/merge_config.sh
> @@ -22,6 +22,7 @@
>
>  clean_up() {
> rm -f $TMP_FILE
> +   rm -f $MERGE_FILE
> exit
>  }
>  trap clean_up HUP INT TERM
> @@ -32,6 +33,7 @@ usage() {
> echo "  -monly merge the fragments, do not execute the make 
> command"
> echo "  -nuse allnoconfig instead of alldefconfig"
> echo "  -rlist redundant entries when merging fragments"
> +   echo "  -ymake builtin have precedence over modules"
> echo "  -Odir to put generated output files.  Consider setting 
> \$KCONFIG_CONFIG instead."
> echo
> echo "Used prefix: '$CONFIG_PREFIX'. You can redefine it with 
> \$CONFIG_ environment variable."
> @@ -40,6 +42,8 @@ usage() {
>  RUNMAKE=true
>  ALLTARGET=alldefconfig
>  WARNREDUN=false
> +BUILTIN=false
> +BUILTIN_FLAG=false


Could you move the initialization of BUILTIN_FLAG
into the inner for-loop ?



>  OUTPUT=.
>  CONFIG_PREFIX=${CONFIG_-CONFIG_}
>
> @@ -64,6 +68,11 @@ while true; do
> shift
> continue
> ;;
> +   "-y")
> +   BUILTIN=true
> +   shift
> +   continue
> +   ;;
> "-O")
> if [ -d $2 ];then
> OUTPUT=$(echo $2 | sed 's/\/*$//')
> @@ -105,13 +114,15 @@ MERGE_LIST=$*
>  SED_CONFIG_EXP="s/^\(# \)\{0,1\}\(${CONFIG_PREFIX}[a-zA-Z0-9_]*\)[= ].*/\2/p"
>
>  TMP_FILE=$(mktemp ./.tmp.config.XX)
> +MERGE_FILE=$(mktemp ./.merge_tmp.config.XX)
>
>  echo "Using $INITFILE as base"
>  cat $INITFILE > $TMP_FILE
>
>  # Merge files, printing warnings on overridden values
> -for MERGE_FILE in $MERGE_LIST ; do
> -   echo "Merging $MERGE_FILE"
> +for ORIG_MERGE_FILE in $MERGE_LIST ; do
> +   cat $ORIG_MERGE_FILE > $MERGE_FILE


This 'cat' should be moved after the check
of the presence of '$ORIG_MERGE_FILE'.


> +   echo "Merging $ORIG_MERGE_FILE"
> if [ ! -r "$MERGE_FILE" ]; then

This check always returns false now.


> echo "The merge file '$MERGE_FILE' does not exist.  Exit." >&2
> exit 1
> @@ -122,15 +133,26 @@ for MERGE_FILE in $MERGE_LIST ; do
> grep -q -w $CFG $TMP_FILE || continue
> PREV_VAL=$(grep -w $CFG $TMP_FILE)
> NEW_VAL=$(grep -w $CFG $MERGE_FILE)

Could you add 'BUILTIN_FLAG=false' here?



> -   if [ "x$PREV_VAL" != "x$NEW_VAL" ] ; then
> -   echo Value of $CFG is redefined by fragment 
> $MERGE_FILE:
> +   if [ "$BUILTIN" = "true" ] && [ "${NEW_VAL#CONFIG_*=}" = "m" 
> ] && [ "${PREV_VAL#CONFIG_*=}" = "y" ]; then
> +   echo Previous  value: $PREV_VAL
> +   echo New value:   $NEW_VAL
> +   echo -y passed, will not demote y to m
> +   echo
> +   BUILTIN_FLAG=true
> +   elif [ "x$PREV_VAL" != "x$NEW_VAL" ] ; then
> +   echo Value of $CFG is redefined by fragment 
> $ORIG_MERGE_FILE:
> echo Previous  value: $PREV_VAL
> echo New value:   $NEW_VAL
> echo
> elif [ "$WARNREDUN" = "true" ]; then
> -   echo Value of $CFG is redundant by fragment 
> $MERGE_FILE:
> +   echo Value of $CFG is redundant by fragment 
> $ORIG_MERGE_FILE:
> +   fi
> +   if [ "$BUILTIN_FLAG" = "false" ]; then
> +   sed -i "/$CFG[ =]/d" $TMP_FILE
> +   else
> +   sed -i "/$CFG[ =]/d" $MERGE_FILE
> +   BUILTIN_FLAG=false


Then this 'BUILTIN_FLAG=false' can go away.

Thanks.


> fi
> -   sed -i "/$CFG[ =]/d" $TMP_FILE
> done
> cat $MERGE_FILE >> $TMP_FILE
>  done
> --
> 2.19.1
>


-- 
Best 
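Pulling Masahiro's three review comments together, the merge loop would
end up shaped roughly like the sketch below (a toy reproduction with a
hypothetical merge_fragments helper, not the actual merge_config.sh
patch): BUILTIN_FLAG is reset inside the per-symbol loop, and the
fragment's existence is checked before it is copied, so the readability
test can actually fail again.

```shell
#!/bin/sh
# Toy sketch of the reviewed merge loop (hypothetical helper, not the
# actual merge_config.sh). With BUILTIN=true, a later fragment's
# CONFIG_FOO=m does not demote an earlier CONFIG_FOO=y.
BUILTIN=true

merge_fragments() {
	TMP_FILE=$(mktemp ./.tmp.config.XXXXXX)
	MERGE_FILE=$(mktemp ./.merge_tmp.config.XXXXXX)
	cat "$1" > "$TMP_FILE"
	shift
	for ORIG_MERGE_FILE in "$@"; do
		# Review comment: check the fragment exists *before*
		# copying it, otherwise the -r test can never fail.
		if [ ! -r "$ORIG_MERGE_FILE" ]; then
			echo "The merge file '$ORIG_MERGE_FILE' does not exist." >&2
			return 1
		fi
		cat "$ORIG_MERGE_FILE" > "$MERGE_FILE"
		CFG_LIST=$(sed -n 's/^\(# \)\{0,1\}\(CONFIG_[a-zA-Z0-9_]*\)[= ].*/\2/p' "$MERGE_FILE")
		for CFG in $CFG_LIST; do
			# Review comment: reset the flag per symbol,
			# inside the inner loop.
			BUILTIN_FLAG=false
			grep -q -w "$CFG" "$TMP_FILE" || continue
			PREV_VAL=$(grep -w "$CFG" "$TMP_FILE")
			NEW_VAL=$(grep -w "$CFG" "$MERGE_FILE")
			if [ "$BUILTIN" = "true" ] && \
			   [ "${NEW_VAL#CONFIG_*=}" = "m" ] && \
			   [ "${PREV_VAL#CONFIG_*=}" = "y" ]; then
				BUILTIN_FLAG=true
			fi
			if [ "$BUILTIN_FLAG" = "false" ]; then
				sed -i "/$CFG[ =]/d" "$TMP_FILE"   # new value wins
			else
				sed -i "/$CFG[ =]/d" "$MERGE_FILE" # keep builtin =y
			fi
		done
		cat "$MERGE_FILE" >> "$TMP_FILE"
	done
	rm -f "$MERGE_FILE"
	cat "$TMP_FILE"
	rm -f "$TMP_FILE"
}

# Demo: CONFIG_DRM=y in the base fragment survives CONFIG_DRM=m later.
printf 'CONFIG_DRM=y\n' > selftest.config
printf 'CONFIG_DRM=m\n' > drm.config
merge_fragments selftest.config drm.config	# prints: CONFIG_DRM=y
rm -f selftest.config drm.config
```

The demo fragments mirror the selftest.config/drm.config example from
the commit message.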


crashkernel=512M is no longer working on this aarch64 server

2018-11-10 Thread Qian Cai
It was broken somewhere between b00d209241ff and 3541833fd1f2.

[0.00] cannot allocate crashkernel (size:0x2000)

Where a good one looks like this,

[0.00] crashkernel reserved: 0x0860 - 0x2860 
(512 MB)

Some commits look more suspicious than others.

  mm: add mm_pxd_folded checks to pgtable_bytes accounting functions
  mm: introduce mm_[p4d|pud|pmd]_folded
  mm: make the __PAGETABLE_PxD_FOLDED defines non-empty

# diff -u ../iomem.good.txt ../iomem.bad.txt 
--- ../iomem.good.txt   2018-11-10 22:28:20.092614398 -0500
+++ ../iomem.bad.txt2018-11-10 20:39:54.930294479 -0500
@@ -1,9 +1,8 @@
 -3965 : System RAM
   0008-018c : Kernel code
-  018d-020a : reserved
-  020b-045a : Kernel data
-  0860-285f : Crash kernel
-  2873-2d5a : reserved
+  018d-0762 : reserved
+  0763-09b2 : Kernel data
+  231b-2802 : reserved
   30ec-30ec : reserved
   3566-3965 : reserved
 3966-396f : reserved
@@ -127,7 +126,7 @@
   7c520-7c520 : 0004:48:00.0
 104000-17fbff : System RAM
   13fbfd-13fdfd : reserved
-  16fba8-17fbfd : reserved
+  16fafd-17fbfd : reserved
   17fbfe-17fbff : reserved
 18-1ffbff : System RAM
   1bfbff-1bfdfe : reserved

The memory map looks like this,

[0.00] ACPI: Early table checksum verification disabled
[0.00] ACPI: RSDP 0x398D0014 24 (v02 HISI  )
[0.00] ACPI: XSDT 0x398C00E8 64 (v01 HISI   HIP07
  0113)
[0.00] ACPI: FACP 0x3977 000114 (v06 HISI   HIP07
 INTL 20151124)
[0.00] ACPI: DSDT 0x3973 00691A (v02 HISI   HIP07
 INTL 20170728)
[0.00] ACPI: MCFG 0x397C AC (v01 HISI   HIP07
 INTL 20151124)
[0.00] ACPI: SLIT 0x397B 3C (v01 HISI   HIP07
 INTL 20151124)
[0.00] ACPI: SRAT 0x397A 000578 (v03 HISI   HIP07
 INTL 20151124)
[0.00] ACPI: DBG2 0x3979 5A (v00 HISI   HIP07
 INTL 20151124)
[0.00] ACPI: GTDT 0x3976 7C (v02 HISI   HIP07
 INTL 20151124)
[0.00] ACPI: APIC 0x3975 0014E4 (v04 HISI   HIP07
 INTL 20151124)
[0.00] ACPI: IORT 0x3974 000554 (v00 HISI   HIP07
 INTL 20170728)
[0.00] ACPI: SRAT: Node 0 PXM 0 [mem 0x-0x3fff]
[0.00] ACPI: SRAT: Node 1 PXM 1 [mem 0x18-0x1f]
[0.00] ACPI: SRAT: Node 0 PXM 0 [mem 0x10-0x17]
[0.00] ACPI: SRAT: Node 3 PXM 3 [mem 0x90-0x97]
[0.00] ACPI: SRAT: Node 2 PXM 2 [mem 0x88-0x8f]
[0.00] NUMA: NODE_DATA [mem 0x17fbffe5c0-0x17fbff]
[0.00] NUMA: NODE_DATA [mem 0x1ffbffe5c0-0x1ffbff]
[0.00] NUMA: NODE_DATA [mem 0x8ffbffe5c0-0x8ffbff]
[0.00] NUMA: NODE_DATA [mem 0x97fadce5c0-0x97fadc]
[0.00] Zone ranges:
[0.00]   DMA32[mem 0x-0x]
[0.00]   Normal   [mem 0x0001-0x0097fbff]
[0.00] Movable zone start for each node
[0.00] Early memory node ranges
[0.00]   node   0: [mem 0x-0x3965]
[0.00]   node   0: [mem 0x3966-0x396f]
[0.00]   node   0: [mem 0x3970-0x3977]
[0.00]   node   0: [mem 0x3978-0x3978]
[0.00]   node   0: [mem 0x3979-0x397c]
[0.00]   node   0: [mem 0x397d-0x398b]
[0.00]   node   0: [mem 0x398c-0x398d]
[0.00]   node   0: [mem 0x398e-0x39d5]
[0.00]   node   0: [mem 0x39d6-0x3ed4]
[0.00]   node   0: [mem 0x3ed5-0x3ed7]
[0.00]   node   0: [mem 0x3ed8-0x3fbf]
[0.00]   node   0: [mem 0x00104000-0x0017fbff]
[0.00]   node   1: [mem 0x0018-0x001ffbff]
[0.00]   node   2: [mem 0x0088-0x008ffbff]
[0.00]   node   3: [mem 0x0090-0x0097fbff]






[PATCH] arm64: disable KASAN for save_trace()

2018-11-10 Thread Zhizhou Zhang
save_trace(), which is called from walk_stackframe(), always tries to
read/write its caller's stack. This results in a KASAN
stack-out-of-bounds warning, so mute it.

Signed-off-by: Zhizhou Zhang 
---
 arch/arm64/kernel/stacktrace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index 4989f7e..e93ca67 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -107,7 +107,7 @@ struct stack_trace_data {
unsigned int skip;
 };
 
-static int save_trace(struct stackframe *frame, void *d)
+static int __no_sanitize_address save_trace(struct stackframe *frame, void *d)
 {
struct stack_trace_data *data = d;
struct stack_trace *trace = data->trace;
-- 
2.7.4






Re: dyntick-idle CPU and node's qsmask

2018-11-10 Thread Paul E. McKenney
On Sat, Nov 10, 2018 at 07:09:25PM -0800, Joel Fernandes wrote:
> On Sat, Nov 10, 2018 at 03:04:36PM -0800, Paul E. McKenney wrote:
> > On Sat, Nov 10, 2018 at 01:46:59PM -0800, Joel Fernandes wrote:
> > > Hi Paul and everyone,
> > > 
> > > I was tracing/studying the RCU code today in paul/dev branch and noticed 
> > > that
> > > for dyntick-idle CPUs, the RCU GP thread is clearing the rnp->qsmask
> > > corresponding to the leaf node for the idle CPU, and reporting a QS on 
> > > their
> > > behalf.
> > > 
> > > rcu_sched-10[003]40.008039: rcu_fqs:  rcu_sched 792 0 
> > > dti
> > > rcu_sched-10[003]40.008039: rcu_fqs:  rcu_sched 801 2 
> > > dti
> > > rcu_sched-10[003]40.008041: rcu_quiescent_state_report: rcu_sched 
> > > 805 5>0 0 0 3 0
> > > 
> > > That's all good but I was wondering if we can do better for the idle CPUs 
> > > if
> > > we can some how not set the qsmask of the node in the first place. Then no
> > > reporting would be needed of quiescent state is needed for idle CPUs 
> > > right?
> > > And we would also not need to acquire the rnp lock I think.
> > > 
> > > At least for a single node tree RCU system, it seems that would avoid 
> > > needing
> > > to acquire the lock without complications. Anyway let me know your 
> > > thoughts
> > > and happy to discuss this at the hallways of the LPC as well for folks
> > > attending :)
> > 
> > We could, but that would require consulting the rcu_data structure for
> > each CPU while initializing the grace period, thus increasing the number
> > of cache misses during grace-period initialization and also shortly after
> > for any non-idle CPUs.  This seems backwards on busy systems where each
> 
> When I traced, it appears to me that rcu_data structure of a remote CPU was
> being consulted anyway by the rcu_sched thread. So it seems like such cache
> miss would happen anyway whether it is during grace-period initialization or
> during the fqs stage? I guess I'm trying to say, the consultation of remote
> CPU's rcu_data happens anyway.

Hmmm...

The rcu_gp_init() function does access an rcu_data structure, but it is
that of the current CPU, so shouldn't involve a communications cache miss,
at least not in the common case.

Or are you seeing these cross-CPU rcu_data accesses in rcu_gp_fqs() or
functions that it calls?  In that case, please see below.

> > CPU will with high probability report its own quiescent state before three
> > jiffies pass, in which case the cache misses on the rcu_data structures
> > would be wasted motion.
> 
> If all the CPUs are busy and reporting their QS themselves, then I think the
> qsmask is likely 0 so then rcu_implicit_dynticks_qs (called from
> force_qs_rnp) wouldn't be called and so there would no cache misses on
> rcu_data right?

Yes, but assuming that all CPUs report their quiescent states before
the first call to rcu_gp_fqs().  One exception is when some CPU is
looping in the kernel for many milliseconds without passing through a
quiescent state.  This is because for recent kernels, cond_resched()
is not a quiescent state until the grace period is something like 100
milliseconds old.  (For older kernels, cond_resched() was never an RCU
quiescent state unless it actually scheduled.)

Why wait 100 milliseconds?  Because otherwise the increase in
cond_resched() overhead shows up all too well, causing 0day test robot
to complain bitterly.  Besides, I would expect that in the common case,
CPUs would be executing usermode code.

Ah, did you build with NO_HZ_FULL, boot with nohz_full CPUs, and then run
CPU-bound usermode workloads on those CPUs?  Such CPUs would appear to
be idle from an RCU perspective.  But these CPUs would never touch their
rcu_data structures, so they would likely remain in the RCU grace-period
kthread's cache.  So this should work well also.  Give or take that other
work would likely eject them from the cache, but in that case they would
be capacity cache misses rather than the aforementioned communications
cache misses.  Not that this distinction matters to whoever is measuring
performance.  ;-)

> > Now, this does increase overhead on mostly idle systems, but the theory
> > is that mostly idle systems are most able to absorb this extra overhead.
> 
> Yes. Could we use rcuperf to check the impact of such change?

I would be very surprised if the overhead was large enough for rcuperf
to be able to see it.

> Anyway it was just an idea that popped up when I was going through traces :)
> Thanks for the discussion and happy to discuss further or try out anything.

Either way, I do appreciate your going through this.  People have found
RCU bugs this way, one of which involved RCU uselessly calling a particular
function twice in quick succession.  ;-)

Thanx, Paul






RE,

2018-11-10 Thread Miss Juliet Muhammad
I have a deal for you, in your region.




RE: [PATCH V2 3/5] Drivers: hv: kvp: Fix the recent regression caused by incorrect clean-up

2018-11-10 Thread Dexuan Cui
> From: gre...@linuxfoundation.org 
> Sent: Thursday, November 1, 2018 21:54
> To: Dexuan Cui 
> Cc: Michael Kelley ; KY Srinivasan
> ; linux-kernel@vger.kernel.org;
> de...@linuxdriverproject.org; o...@aepfle.de; a...@canonical.com;
> jasow...@redhat.com; Stephen Hemminger ;
> vkuznets ; Sasha Levin
> ; Haiyang Zhang ;
> sta...@vger.kernel.org
> Subject: Re: [PATCH V2 3/5] Drivers: hv: kvp: Fix the recent regression caused
> by incorrect clean-up
> 
> On Thu, Nov 01, 2018 at 07:22:28PM +, Dexuan Cui wrote:
> > > From: gre...@linuxfoundation.org 
> > > Sent: Thursday, November 1, 2018 11:57
> > > To: Dexuan Cui 
> > >
> > > On Wed, Oct 31, 2018 at 11:23:54PM +, Dexuan Cui wrote:
> > > > > From: Michael Kelley 
> > > > > Sent: Wednesday, October 24, 2018 08:38
> > > > > From: k...@linuxonhyperv.com   Sent:
> > > Wednesday,
> > > > > October 17, 2018 10:10 PM
> > > > > > From: Dexuan Cui 
> > > > > >
> > > > > > In kvp_send_key(), we do need call process_ib_ipinfo() if
> > > > > > message->kvp_hdr.operation is KVP_OP_GET_IP_INFO, because it
> turns
> > > out
> > > > > > the userland hv_kvp_daemon needs the info of operation, adapter_id
> > > and
> > > > > > addr_family. With the incorrect fc62c3b1977d, the host can't get the
> > > > > > VM's IP via KVP.
> > > > > >
> > > > > > And, fc62c3b1977d added a "break;", but actually forgot to 
> > > > > > initialize
> > > > > > the key_size/value in the case of KVP_OP_SET, so the default 
> > > > > > key_size
> of
> > > > > > 0 is passed to the kvp daemon, and the pool files
> > > > > > /var/lib/hyperv/.kvp_pool_* can't be updated.
> > > > > >
> > > > > > This patch effectively rolls back the previous fc62c3b1977d, and
> > > > > > correctly fixes the "this statement may fall through" warnings.
> > > > > >
> > > > > > This patch is tested on WS 2012 R2 and 2016.
> > > > > >
> > > > > > Fixes: fc62c3b1977d ("Drivers: hv: kvp: Fix two "this statement may
> fall
> > > > > through" warnings")
> > > > > > Signed-off-by: Dexuan Cui 
> > > > > > Cc: K. Y. Srinivasan 
> > > > > > Cc: Haiyang Zhang 
> > > > > > Cc: Stephen Hemminger 
> > > > > > Cc: 
> > > > > > Signed-off-by: K. Y. Srinivasan 
> > > > > > ---
> > > > > >  drivers/hv/hv_kvp.c | 26 ++
> > > > > >  1 file changed, 22 insertions(+), 4 deletions(-)
> > > > > >
> > > > > Reviewed-by: Michael Kelley 
> > > >
> > > > Hi Greg,
> > > > Can you please take a look at this patch?
> > >
> > > Nope, I'm not the hv maintainer, they need to look at this and ack it,
> > > not me :)
> > >
> > > greg k-h
> >
> > Hi Greg,
> > KY has added his Signed-off-by in the mail.
> >
> > I'll ask the other HV maintainers to take a look as well.
> 
> Ok, then I'll look at it after 4.20-rc1 is out, nothing I can do until
> then anyway...
> 
> thanks,
> 
> greg k-h

Hi Greg,
Can you please take a look at the patch now?

The patch has received

Reviewed-by: Michael Kelley 
Signed-off-by: Haiyang Zhang 
Signed-off-by: K. Y. Srinivasan 

Thanks,
-- Dexuan




RE,

2018-11-10 Thread Miss Juliet Muhammad
I have a deal for you, in your region.



Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-10 Thread Joel Fernandes
On Sat, Nov 10, 2018 at 07:40:10PM -0800, Andy Lutomirski wrote:
> 
> 
> > On Nov 10, 2018, at 6:38 PM, Joel Fernandes  wrote:
> > 
> >> On Sat, Nov 10, 2018 at 02:18:23PM -0800, Andy Lutomirski wrote:
> >> 
>  On Nov 10, 2018, at 2:09 PM, Joel Fernandes  
>  wrote:
>  
> > On Sat, Nov 10, 2018 at 11:11:27AM -0800, Daniel Colascione wrote:
> >> On Sat, Nov 10, 2018 at 10:45 AM, Daniel Colascione 
> >>  wrote:
> >> On Sat, Nov 10, 2018 at 10:24 AM, Joel Fernandes 
> >>  wrote:
> >> Thanks Andy for your thoughts, my comments below:
>  [snip]
> >> I don't see it as warty, different seals will work differently. It 
> >> works
> >> quite well for our usecase, and since Linux is all about solving real
> >> problems in the real world, it would be useful to have it.
> >> 
> >>> - causes a probably-observable effect in the file mode in F_GETFL.
> >> 
> >> Wouldn't that be the right thing to observe anyway?
> >> 
> >>> - causes reopen to fail.
> >> 
> >> So this concern isn't true anymore if we make reopen fail only for 
> >> WRITE
> >> opens as Daniel suggested. I will make this change so that the 
> >> security fix
> >> is a clean one.
> >> 
> >>> - does *not* affect other struct files that may already exist on the 
> >>> same inode.
> >> 
> >> TBH if you really want to block all writes to the file, then you want
> >> F_SEAL_WRITE, not this seal. The usecase we have is the fd is sent 
> >> over IPC
> >> to another process and we want to prevent any new writes in the 
> >> receiver
> >> side. There is no way this other receiving process can have an 
> >> existing fd
> >> unless it was already sent one without the seal applied.  The proposed 
> >> seal
> >> could be renamed to F_SEAL_FD_WRITE if that is preferred.
> >> 
> >>> - mysteriously malfunctions if you try to set it again on another 
> >>> struct
> >>> file that already exists
> >>> 
> >> 
> >> I didn't follow this, could you explain more?
> >> 
> >>> - probably is insecure when used on hugetlbfs.
> >> 
> >> The usecase is not expected to prevent all writes, indeed the usecase
> >> requires existing mmaps to continue to be able to write into the 
> >> memory map.
> >> So would you call that a security issue too? The use of the seal wants 
> >> to
> >> allow existing mmap regions to continue to be written into (I 
> >> mentioned
> >> more details in the cover letter).
> >> 
> >>> I see two reasonable solutions:
> >>> 
> >>> 1. Don’t fiddle with the struct file at all. Instead make the inode 
> >>> flag
> >>> work by itself.
> >> 
> >> Currently, the various VFS paths check only the struct file's f_mode 
> >> to deny
> >> writes of already opened files. This would mean more checking in all 
> >> those
> >> paths (and modification of all those paths).
> >> 
> >> Anyway going with that idea, we could
> >> 1. call deny_write_access(file) from the memfd's seal path which 
> >> decrements
> >> the inode::i_writecount.
> >> 2. call get_write_access(inode) in the various VFS paths in addition to
> >> checking for FMODE_*WRITE and deny the write (in case i_writecount is 
> >> negative)
> >> 
> >> That will prevent both reopens, and writes from succeeding. However I 
> >> worry a
> >> bit about 2 not being too familiar with VFS internals, about what the
> >> consequences of doing that may be.
> > 
> > IMHO, modifying both the inode and the struct file separately is fine,
> > since they mean different things. In regular filesystems, it's fine to
> > have a read-write open file description for a file whose inode grants
> > write permission to nobody. Speaking of which: is fchmod enough to
> > prevent this attack?
>  
>  Well, yes and no. fchmod does prevent reopening the file RW, but
>  anyone with permissions (owner, CAP_FOWNER) can just fchmod it back. A
>  seal is supposed to be irrevocable, so fchmod-as-inode-seal probably
>  isn't sufficient by itself. While it might be good enough for Android
>  (in the sense that it'll prevent RW-reopens from other security
>  contexts to which we send an open memfd file), it's still conceptually
>  ugly, IMHO. Let's go with the original approach of just tweaking the
>  inode so that open-for-write is permanently blocked.
> >>> 
> >>> Agreed with the idea of modifying both file and inode flags. I was 
> >>> thinking
> >>> modifying i_mode may do the trick but as you pointed it probably could be
> >>> reverted by chmod or some other attribute setting calls.
> >>> 
> >>> OTOH, I don't think deny_write_access(file) can be reverted from any
> >>> user-facing path so we could do that from the seal to prevent the future
> >>> opens in write mode. 

Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-10 Thread Andy Lutomirski



> On Nov 10, 2018, at 6:38 PM, Joel Fernandes  wrote:
> 
>> On Sat, Nov 10, 2018 at 02:18:23PM -0800, Andy Lutomirski wrote:
>> 
 On Nov 10, 2018, at 2:09 PM, Joel Fernandes  wrote:
 
> On Sat, Nov 10, 2018 at 11:11:27AM -0800, Daniel Colascione wrote:
>> On Sat, Nov 10, 2018 at 10:45 AM, Daniel Colascione  
>> wrote:
>> On Sat, Nov 10, 2018 at 10:24 AM, Joel Fernandes 
>>  wrote:
>> Thanks Andy for your thoughts, my comments below:
 [snip]
>> I don't see it as warty, different seals will work differently. It works
>> quite well for our usecase, and since Linux is all about solving real
> problems in the real world, it would be useful to have it.
>> 
>>> - causes a probably-observable effect in the file mode in F_GETFL.
>> 
>> Wouldn't that be the right thing to observe anyway?
>> 
>>> - causes reopen to fail.
>> 
>> So this concern isn't true anymore if we make reopen fail only for WRITE
>> opens as Daniel suggested. I will make this change so that the security 
>> fix
>> is a clean one.
>> 
>>> - does *not* affect other struct files that may already exist on the 
>>> same inode.
>> 
>> TBH if you really want to block all writes to the file, then you want
>> F_SEAL_WRITE, not this seal. The usecase we have is the fd is sent over 
>> IPC
>> to another process and we want to prevent any new writes in the receiver
>> side. There is no way this other receiving process can have an existing 
>> fd
>> unless it was already sent one without the seal applied.  The proposed 
>> seal
>> could be renamed to F_SEAL_FD_WRITE if that is preferred.
>> 
>>> - mysteriously malfunctions if you try to set it again on another struct
>>> file that already exists
>>> 
>> 
>> I didn't follow this, could you explain more?
>> 
>>> - probably is insecure when used on hugetlbfs.
>> 
>> The usecase is not expected to prevent all writes, indeed the usecase
>> requires existing mmaps to continue to be able to write into the memory 
>> map.
>> So would you call that a security issue too? The use of the seal wants to
>> allow existing mmap regions to continue to be written into (I 
>> mentioned
>> more details in the cover letter).
>> 
>>> I see two reasonable solutions:
>>> 
>>> 1. Don’t fiddle with the struct file at all. Instead make the inode flag
>>> work by itself.
>> 
>> Currently, the various VFS paths check only the struct file's f_mode to 
>> deny
>> writes of already opened files. This would mean more checking in all 
>> those
>> paths (and modification of all those paths).
>> 
>> Anyway going with that idea, we could
>> 1. call deny_write_access(file) from the memfd's seal path which 
>> decrements
>> the inode::i_writecount.
>> 2. call get_write_access(inode) in the various VFS paths in addition to
>> checking for FMODE_*WRITE and deny the write (in case i_writecount is 
>> negative)
>> 
>> That will prevent both reopens, and writes from succeeding. However I 
>> worry a
>> bit about 2 not being too familiar with VFS internals, about what the
>> consequences of doing that may be.
> 
> IMHO, modifying both the inode and the struct file separately is fine,
> since they mean different things. In regular filesystems, it's fine to
> have a read-write open file description for a file whose inode grants
> write permission to nobody. Speaking of which: is fchmod enough to
> prevent this attack?
 
 Well, yes and no. fchmod does prevent reopening the file RW, but
 anyone with permissions (owner, CAP_FOWNER) can just fchmod it back. A
 seal is supposed to be irrevocable, so fchmod-as-inode-seal probably
 isn't sufficient by itself. While it might be good enough for Android
 (in the sense that it'll prevent RW-reopens from other security
 contexts to which we send an open memfd file), it's still conceptually
 ugly, IMHO. Let's go with the original approach of just tweaking the
 inode so that open-for-write is permanently blocked.
>>> 
>>> Agreed with the idea of modifying both file and inode flags. I was thinking
>>> modifying i_mode may do the trick but as you pointed it probably could be
>>> reverted by chmod or some other attribute setting calls.
>>> 
>>> OTOH, I don't think deny_write_access(file) can be reverted from any
>>> user-facing path so we could do that from the seal to prevent the future
>>> opens in write mode. I'll double check and test that out tomorrow.
>>> 
>>> 
>> 
>> This seems considerably more complicated and more fragile than needed. Just
>> add a new F_SEAL_WRITE_FUTURE.  Grep for F_SEAL_WRITE and make the _FUTURE
>> variant work exactly like it with two exceptions:
>> 
>> - shmem_mmap and maybe its hugetlbfs equivalent should check for it and act

Re: [PATCH for-stable] dmaengine: stm32-dma: fix incomplete configuration in cyclic mode

2018-11-10 Thread Greg KH
On Tue, Oct 16, 2018 at 05:00:03PM -0700, Joel Fernandes (Google) wrote:
> From: Pierre Yves MORDRET 
> 
> commit e57cb3b3f10d005410f09d4598cc6d62b833f2b0 upstream.
> 
> When in cyclic mode, the configuration is updated after having started the
> DMA hardware (STM32_DMA_SCR_EN) leading to incomplete configuration of
> SMxAR registers.
> 
> Signed-off-by: Pierre-Yves MORDRET 
> Signed-off-by: Hugues Fruchet 
> Signed-off-by: Vinod Koul 
> ---
>  drivers/dma/stm32-dma.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)

I'm just guessing you want this for 4.14.y?  I've added it there.  If
not, please give me a hint :)



Re: dyntick-idle CPU and node's qsmask

2018-11-10 Thread Joel Fernandes
On Sat, Nov 10, 2018 at 03:04:36PM -0800, Paul E. McKenney wrote:
> On Sat, Nov 10, 2018 at 01:46:59PM -0800, Joel Fernandes wrote:
> > Hi Paul and everyone,
> > 
> > I was tracing/studying the RCU code today in paul/dev branch and noticed 
> > that
> > for dyntick-idle CPUs, the RCU GP thread is clearing the rnp->qsmask
> > corresponding to the leaf node for the idle CPU, and reporting a QS on their
> > behalf.
> > 
> > rcu_sched-10[003]40.008039: rcu_fqs:  rcu_sched 792 0 
> > dti
> > rcu_sched-10[003]40.008039: rcu_fqs:  rcu_sched 801 2 
> > dti
> > rcu_sched-10[003]40.008041: rcu_quiescent_state_report: rcu_sched 
> > 805 5>0 0 0 3 0
> > 
> > That's all good but I was wondering if we can do better for the idle CPUs if
> > we can somehow not set the qsmask of the node in the first place. Then no
> > reporting of a quiescent state would be needed for idle CPUs, right?
> > And we would also not need to acquire the rnp lock I think.
> > 
> > At least for a single node tree RCU system, it seems that would avoid 
> > needing
> > to acquire the lock without complications. Anyway let me know your thoughts
> > and happy to discuss this at the hallways of the LPC as well for folks
> > attending :)
> 
> We could, but that would require consulting the rcu_data structure for
> each CPU while initializing the grace period, thus increasing the number
> of cache misses during grace-period initialization and also shortly after
> for any non-idle CPUs.  This seems backwards on busy systems where each

When I traced, it appears to me that rcu_data structure of a remote CPU was
being consulted anyway by the rcu_sched thread. So it seems like such a cache
miss would happen anyway whether it is during grace-period initialization or
during the fqs stage? I guess I'm trying to say, the consultation of remote
CPU's rcu_data happens anyway.

> CPU will with high probability report its own quiescent state before three
> jiffies pass, in which case the cache misses on the rcu_data structures
> would be wasted motion.

If all the CPUs are busy and reporting their QS themselves, then I think the
qsmask is likely 0 so then rcu_implicit_dynticks_qs (called from
force_qs_rnp) wouldn't be called and so there would be no cache misses on
rcu_data right?

> Now, this does increase overhead on mostly idle systems, but the theory
> is that mostly idle systems are most able to absorb this extra overhead.

Yes. Could we use rcuperf to check the impact of such a change?

Anyway it was just an idea that popped up when I was going through traces :)
Thanks for the discussion and happy to discuss further or try out anything.

- Joel



sound/pci/hda/patch_ca0132.c:7650:20: error: implicit declaration of function 'pci_iomap'; did you mean 'pcim_iomap'?

2018-11-10 Thread kbuild test robot
Hi Rakesh,

FYI, the error/warning still remains.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
master
head:   e255aee5b66ce4af025e6f77122114c01303b861
commit: 6bae5ea9498926440ffc883f3dbceb0adc65e492 ASoC: hdac_hda: add asoc 
extension for legacy HDA codec drivers
date:   2 months ago
config: sh-allyesconfig (attached as .config)
compiler: sh4-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
git checkout 6bae5ea9498926440ffc883f3dbceb0adc65e492
# save the attached .config to linux build tree
GCC_VERSION=7.2.0 make.cross ARCH=sh 

All errors (new ones prefixed by >>):

   sound/pci/hda/patch_ca0132.c: In function 'patch_ca0132':
>> sound/pci/hda/patch_ca0132.c:7650:20: error: implicit declaration of 
>> function 'pci_iomap'; did you mean 'pcim_iomap'? 
>> [-Werror=implicit-function-declaration]
  spec->mem_base = pci_iomap(codec->bus->pci, 2, 0xC20);
   ^
   pcim_iomap
   sound/pci/hda/patch_ca0132.c:7650:18: warning: assignment makes pointer from 
integer without a cast [-Wint-conversion]
  spec->mem_base = pci_iomap(codec->bus->pci, 2, 0xC20);
 ^
   cc1: some warnings being treated as errors

vim +7650 sound/pci/hda/patch_ca0132.c

d5c016b56 Gabriele Martino 2015-05-18  7581  
95c6e9cb7 Ian Minett   2011-06-15  7582  static int patch_ca0132(struct 
hda_codec *codec)
95c6e9cb7 Ian Minett   2011-06-15  7583  {
95c6e9cb7 Ian Minett   2011-06-15  7584 struct ca0132_spec *spec;
a73d511c4 Ian Minett   2012-12-20  7585 int err;
d5c016b56 Gabriele Martino 2015-05-18  7586 const struct snd_pci_quirk 
*quirk;
95c6e9cb7 Ian Minett   2011-06-15  7587  
4e76a8833 Takashi Iwai 2014-02-25  7588 codec_dbg(codec, 
"patch_ca0132\n");
95c6e9cb7 Ian Minett   2011-06-15  7589  
95c6e9cb7 Ian Minett   2011-06-15  7590 spec = kzalloc(sizeof(*spec), 
GFP_KERNEL);
95c6e9cb7 Ian Minett   2011-06-15  7591 if (!spec)
95c6e9cb7 Ian Minett   2011-06-15  7592 return -ENOMEM;
95c6e9cb7 Ian Minett   2011-06-15  7593 codec->spec = spec;
993884f6a Chih-Chung Chang 2013-03-25  7594 spec->codec = codec;
95c6e9cb7 Ian Minett   2011-06-15  7595  
225068ab2 Takashi Iwai 2015-05-29  7596 codec->patch_ops = 
ca0132_patch_ops;
225068ab2 Takashi Iwai 2015-05-29  7597 codec->pcm_format_first = 1;
225068ab2 Takashi Iwai 2015-05-29  7598 codec->no_sticky_stream = 1;
225068ab2 Takashi Iwai 2015-05-29  7599  
d5c016b56 Gabriele Martino 2015-05-18  7600 /* Detect codec quirk */
d5c016b56 Gabriele Martino 2015-05-18  7601 quirk = 
snd_pci_quirk_lookup(codec->bus->pci, ca0132_quirks);
d5c016b56 Gabriele Martino 2015-05-18  7602 if (quirk)
d5c016b56 Gabriele Martino 2015-05-18  7603 spec->quirk = 
quirk->value;
d5c016b56 Gabriele Martino 2015-05-18  7604 else
d5c016b56 Gabriele Martino 2015-05-18  7605 spec->quirk = 
QUIRK_NONE;
d5c016b56 Gabriele Martino 2015-05-18  7606  
e24aa0a4c Takashi Iwai 2014-08-10  7607 spec->dsp_state = 
DSP_DOWNLOAD_INIT;
a7e76271b Ian Minett   2012-12-20  7608 spec->num_mixers = 1;
017310fbe Connor McAdams   2018-05-08  7609  
017310fbe Connor McAdams   2018-05-08  7610 /* Set which mixers each quirk 
uses. */
017310fbe Connor McAdams   2018-05-08  7611 switch (spec->quirk) {
017310fbe Connor McAdams   2018-05-08  7612 case QUIRK_SBZ:
e25e34450 Connor McAdams   2018-08-08  7613 spec->mixers[0] = 
desktop_mixer;
017310fbe Connor McAdams   2018-05-08  7614 
snd_hda_codec_set_name(codec, "Sound Blaster Z");
017310fbe Connor McAdams   2018-05-08  7615 break;
e25e34450 Connor McAdams   2018-08-08  7616 case QUIRK_R3D:
e25e34450 Connor McAdams   2018-08-08  7617 spec->mixers[0] = 
desktop_mixer;
e25e34450 Connor McAdams   2018-08-08  7618 
snd_hda_codec_set_name(codec, "Recon3D");
e25e34450 Connor McAdams   2018-08-08  7619 break;
017310fbe Connor McAdams   2018-05-08  7620 case QUIRK_R3DI:
017310fbe Connor McAdams   2018-05-08  7621 spec->mixers[0] = 
r3di_mixer;
017310fbe Connor McAdams   2018-05-08  7622 
snd_hda_codec_set_name(codec, "Recon3Di");
017310fbe Connor McAdams   2018-05-08  7623 break;
017310fbe Connor McAdams   2018-05-08  7624 default:
a7e76271b Ian Minett   2012-12-20  7625 spec->mixers[0] = 
ca0132_mixer;
017310fbe Connor McAdams   2018-05-08  7626 break;
017310fbe Connor McAdams   2018-05-08  7627 }
a7e76271b Ian Minett   2012-12-20  7628  
08eca6b1f Connor McAdams   2018-08-08  7629 /* Setup whether or not to use 
alt functions/controls/pci_mmio */
009b8f979 Connor McAdams   2018-05-08  7630 switch 

sound/pci/hda/patch_ca0132.c:7650:20: error: implicit declaration of function 'pci_iomap'; did you mean 'pcim_iomap'?

2018-11-10 Thread kbuild test robot
Hi Rakesh,

FYI, the error/warning still remains.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
master
head:   e255aee5b66ce4af025e6f77122114c01303b861
commit: 6bae5ea9498926440ffc883f3dbceb0adc65e492 ASoC: hdac_hda: add asoc 
extension for legacy HDA codec drivers
date:   2 months ago
config: sh-allyesconfig (attached as .config)
compiler: sh4-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
git checkout 6bae5ea9498926440ffc883f3dbceb0adc65e492
# save the attached .config to linux build tree
GCC_VERSION=7.2.0 make.cross ARCH=sh 

All errors (new ones prefixed by >>):

   sound/pci/hda/patch_ca0132.c: In function 'patch_ca0132':
>> sound/pci/hda/patch_ca0132.c:7650:20: error: implicit declaration of function 'pci_iomap'; did you mean 'pcim_iomap'? [-Werror=implicit-function-declaration]
     spec->mem_base = pci_iomap(codec->bus->pci, 2, 0xC20);
                      ^
                      pcim_iomap
   sound/pci/hda/patch_ca0132.c:7650:18: warning: assignment makes pointer from integer without a cast [-Wint-conversion]
     spec->mem_base = pci_iomap(codec->bus->pci, 2, 0xC20);
                    ^
   cc1: some warnings being treated as errors

vim +7650 sound/pci/hda/patch_ca0132.c

d5c016b56 Gabriele Martino 2015-05-18  7581  
95c6e9cb7 Ian Minett   2011-06-15  7582  static int patch_ca0132(struct hda_codec *codec)
95c6e9cb7 Ian Minett   2011-06-15  7583  {
95c6e9cb7 Ian Minett   2011-06-15  7584 struct ca0132_spec *spec;
a73d511c4 Ian Minett   2012-12-20  7585 int err;
d5c016b56 Gabriele Martino 2015-05-18  7586 const struct snd_pci_quirk *quirk;
95c6e9cb7 Ian Minett   2011-06-15  7587  
4e76a8833 Takashi Iwai 2014-02-25  7588 codec_dbg(codec, "patch_ca0132\n");
95c6e9cb7 Ian Minett   2011-06-15  7589  
95c6e9cb7 Ian Minett   2011-06-15  7590 spec = kzalloc(sizeof(*spec), GFP_KERNEL);
95c6e9cb7 Ian Minett   2011-06-15  7591 if (!spec)
95c6e9cb7 Ian Minett   2011-06-15  7592 return -ENOMEM;
95c6e9cb7 Ian Minett   2011-06-15  7593 codec->spec = spec;
993884f6a Chih-Chung Chang 2013-03-25  7594 spec->codec = codec;
95c6e9cb7 Ian Minett   2011-06-15  7595  
225068ab2 Takashi Iwai 2015-05-29  7596 codec->patch_ops = ca0132_patch_ops;
225068ab2 Takashi Iwai 2015-05-29  7597 codec->pcm_format_first = 1;
225068ab2 Takashi Iwai 2015-05-29  7598 codec->no_sticky_stream = 1;
225068ab2 Takashi Iwai 2015-05-29  7599  
d5c016b56 Gabriele Martino 2015-05-18  7600 /* Detect codec quirk */
d5c016b56 Gabriele Martino 2015-05-18  7601 quirk = snd_pci_quirk_lookup(codec->bus->pci, ca0132_quirks);
d5c016b56 Gabriele Martino 2015-05-18  7602 if (quirk)
d5c016b56 Gabriele Martino 2015-05-18  7603 spec->quirk = quirk->value;
d5c016b56 Gabriele Martino 2015-05-18  7604 else
d5c016b56 Gabriele Martino 2015-05-18  7605 spec->quirk = QUIRK_NONE;
d5c016b56 Gabriele Martino 2015-05-18  7606  
e24aa0a4c Takashi Iwai 2014-08-10  7607 spec->dsp_state = DSP_DOWNLOAD_INIT;
a7e76271b Ian Minett   2012-12-20  7608 spec->num_mixers = 1;
017310fbe Connor McAdams   2018-05-08  7609  
017310fbe Connor McAdams   2018-05-08  7610 /* Set which mixers each quirk uses. */
017310fbe Connor McAdams   2018-05-08  7611 switch (spec->quirk) {
017310fbe Connor McAdams   2018-05-08  7612 case QUIRK_SBZ:
e25e34450 Connor McAdams   2018-08-08  7613 spec->mixers[0] = desktop_mixer;
017310fbe Connor McAdams   2018-05-08  7614 snd_hda_codec_set_name(codec, "Sound Blaster Z");
017310fbe Connor McAdams   2018-05-08  7615 break;
e25e34450 Connor McAdams   2018-08-08  7616 case QUIRK_R3D:
e25e34450 Connor McAdams   2018-08-08  7617 spec->mixers[0] = desktop_mixer;
e25e34450 Connor McAdams   2018-08-08  7618 snd_hda_codec_set_name(codec, "Recon3D");
e25e34450 Connor McAdams   2018-08-08  7619 break;
017310fbe Connor McAdams   2018-05-08  7620 case QUIRK_R3DI:
017310fbe Connor McAdams   2018-05-08  7621 spec->mixers[0] = r3di_mixer;
017310fbe Connor McAdams   2018-05-08  7622 snd_hda_codec_set_name(codec, "Recon3Di");
017310fbe Connor McAdams   2018-05-08  7623 break;
017310fbe Connor McAdams   2018-05-08  7624 default:
a7e76271b Ian Minett   2012-12-20  7625 spec->mixers[0] = ca0132_mixer;
017310fbe Connor McAdams   2018-05-08  7626 break;
017310fbe Connor McAdams   2018-05-08  7627 }
a7e76271b Ian Minett   2012-12-20  7628  
08eca6b1f Connor McAdams   2018-08-08  7629 /* Setup whether or not to use alt functions/controls/pci_mmio */
009b8f979 Connor McAdams   2018-05-08  7630 switch
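[Editorial note: one plausible shape of a fix, sketched here only for context and not taken from the patch that eventually landed, is to guard the MMIO setup so architectures built without CONFIG_PCI (such as this sh-allyesconfig build) never reference pci_iomap(); field and variable names are taken from the listing above.]

```c
#ifdef CONFIG_PCI
	spec->mem_base = pci_iomap(codec->bus->pci, 2, 0xC20);
#else
	/* No PCI: leave the MMIO base unset and fall back to verb I/O. */
	spec->mem_base = NULL;
#endif
```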

Re: PEBS level 2/3 breaks dwarf unwinding! [WAS: Re: Broken dwarf unwinding - wrong stack pointer register value?]

2018-11-10 Thread Travis Downs
On Sat, Nov 10, 2018 at 8:07 PM Andi Kleen  wrote:
>
> On Sat, Nov 10, 2018 at 04:42:48PM -0500, Travis Downs wrote:
> > I guess this problem doesn't occur for LBR unwinding since the LBR
> > records are captured at the same
> > moment in time as the PEBS record, so reflect the correct branch
> > sequence.
>
> Actually it happens with LBRs too, but it always gives the backtrace
> consistently at the PMI trigger point.


That's weird - so the LBR records are from the PMI point, but the rest
of the PEBS record comes from the PEBS trigger point? Or the LBR isn't
part of PEBS at all?

>
> > What would this fix mean for perf report when you use cycles:pp and
> > cycles:ppp (or any PEBS based events)? The unwinding should generally
> > work, but the IP at the top of that stack (from the PMI) will
> > generally be different than that recorded by PEBS. The tree view and
> > overhead calculations will be based on the captured stacks, I guess -
> > but when I annotate, will the values I see correspond to the PEBS IPs
> > or the PMI IPs?
>
> Based on PEBS IPs.
>
> It would be a good idea to add a check to perf report
> that the two IPs are different, and if they differ
> add some indicator to the sample. This could be a new sort key,
> although that would waste some space on the screen, or something
> else.


In the case that PEBS events are used, the IP will differ essentially
100% of the time, right? That is, there will always be *some* skid.

>
>
> It wouldn't be cover all cases, for example if you have recursion
> on the same function it might report the same IP even though
> it's a different instance, but I presume that should be rare
> enough to not be a problem.
>

Well the main problem I see is that "IP inconsistency" will be the
usual case, and it will be hard to present in a reasonable way in the
report. For example, the backtrace-based displays/reports may indicate
that 80% of your samples are in function X, but based on the PEBS IP
records, only 50% may fall in that function, so you'll always have a
weird thing where when you are investigating within the stack-display
you might see 1234 samples in a function, but when you annotate only
789 samples are accounted for, or whatever.

I don't think this is 100% solvable, it's mostly an issue of
displaying it reasonably and managing expectations.

If the LBR record came from PEBS (as I had thought, but perhaps you
are indicating otherwise above), I could imagine a hybrid mode where
LBR is used to go back some number of calls and then dwarf or FP or
whatever unwinding takes over, because the further down the stack you
go, the more likely the PEBS trigger point and the PMI point are to
have a consistent stack.


Re: [PATCH] proc: fix and merge proc-self-map-file tests

2018-11-10 Thread Rafael David Tinoco
On Sat, Nov 10, 2018, at 4:49 PM, Alexey Dobriyan wrote:
> On Sat, Nov 10, 2018 at 03:56:03PM -0200, Rafael David Tinoco wrote:
> > On Sat, Nov 10, 2018, at 3:47 PM, Alexey Dobriyan wrote:
> > > On Fri, Nov 09, 2018 at 09:30:36AM -0200, Rafael David Tinoco wrote:
> > > > Merge proc-self-map-files tests into one since this test should focus in
> > > > testing readlink in /proc/self/map_files/* only, and not trying to test
> > > > mapping virtual address 0.
> > > > 
> > > > Lowest virtual address for user space mapping in other architectures,
> > > > like arm, is *at least* *(PAGE_SIZE * 2) and NULL hint does not
> > > > guarantee that when MAP_FIXED flag, important to this test, is given.
> > > > This patch also fixes this issue in remaining test.
> > > 
> > > > -   p = mmap(NULL, PAGE_SIZE, PROT_NONE, MAP_PRIVATE|MAP_FILE, fd, 
> > > > 0);
> > > > +   p = mmap((void *) (2 * PAGE_SIZE), PAGE_SIZE, PROT_NONE,
> > > 
> > > I don't know ARM. Is this 2 page limitation a limitation of hardware or
> > > kernel's?
> > 
> > Kernel:
> > https://bugs.linaro.org/show_bug.cgi?id=3782#c7
> 
> Ahh. please test the path I've sent, I don't have arm install readily
> available.

I replied to your patch based on some of the discussion we had in this thread.

Thanks

Rafael
-
Rafael D. Tinoco
Linaro Kernel Validation Team


Re: [PATCH] proc: fixup map_files test on arm

2018-11-10 Thread Rafael David Tinoco
Including Shuah and kselftest list...

On Sat, Nov 10, 2018, at 4:49 PM, Alexey Dobriyan wrote:
> https://bugs.linaro.org/show_bug.cgi?id=3782
> 
> Turns out arm doesn't allow to map address 0, so try minimum virtual
> address instead.
> 
> Reported-by: Rafael David Tinoco 
> Signed-off-by: Alexey Dobriyan 
> ---
> 
>  tools/testing/selftests/proc/proc-self-map-files-002.c |9 +++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> --- a/tools/testing/selftests/proc/proc-self-map-files-002.c
> +++ b/tools/testing/selftests/proc/proc-self-map-files-002.c
> @@ -13,7 +13,7 @@
>   * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT 
> OF
>   * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
>   */
> -/* Test readlink /proc/self/map_files/... with address 0. */
> +/* Test readlink /proc/self/map_files/... with minimum address. */
>  #include 
>  #include 
>  #include 
> @@ -47,6 +47,11 @@ static void fail(const char *fmt, unsigned long a, 
> unsigned long b)
>  int main(void)
>  {
>   const unsigned int PAGE_SIZE = sysconf(_SC_PAGESIZE);
> +#ifdef __arm__
> + unsigned long va = 2 * PAGE_SIZE;
> +#else
> + unsigned long va = 0;
> +#endif
>   void *p;
>   int fd;
>   unsigned long a, b;
> @@ -55,7 +60,7 @@ int main(void)
>   if (fd == -1)
>   return 1;
>  
> - p = mmap(NULL, PAGE_SIZE, PROT_NONE, MAP_PRIVATE|MAP_FILE|MAP_FIXED, 
> fd, 0);
> + p = mmap(va, PAGE_SIZE, PROT_NONE, MAP_PRIVATE|MAP_FILE|MAP_FIXED, fd, 
> 0);
>   if (p == MAP_FAILED) {
>   if (errno == EPERM)
>   return 2;

I have sent a patch removing proc-self-map-files-002 AND making 001 use a
mmap (MAP_FIXED) hint of *at least* (2 * PAGE_SIZE), which would likely suit
all architectures, avoiding making the test specific to one while still
testing the symlinks for issues (like bad chars, spaces, and so on).
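[Editorial note: to make the constraint concrete, here is a minimal userspace sketch of probing a fixed low-address mapping. The helper names are hypothetical, not from the kselftest sources; the 2 * PAGE_SIZE floor mirrors the hint discussed above, and vm.mmap_min_addr may still reject it with EPERM on many configurations.]

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/mman.h>
#include <unistd.h>

/* Lowest hint the test would try: 2 * PAGE_SIZE, per the arm constraint. */
unsigned long low_probe_addr(unsigned long page_size)
{
	return 2 * page_size;
}

/* Attempt a MAP_FIXED mapping at that hint.  Returns 0 on success, or
 * the errno from mmap(); EPERM typically means the hint lies below the
 * kernel's vm.mmap_min_addr floor. */
int try_low_fixed_map(void)
{
	unsigned long page = (unsigned long)sysconf(_SC_PAGESIZE);
	void *p = mmap((void *)low_probe_addr(page), page, PROT_NONE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);

	if (p == MAP_FAILED)
		return errno;
	munmap(p, page);
	return 0;
}
```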

Both tests (001 and 002) have pretty much the same code, while they could be
two tests in a single file using the kselftest framework. Is the NULL hint +
MAP_FIXED imperative for this test? Why not have it all in a single test? Are
you keeping the NULL hint just to test mmap, apart from the core of this test?

Sorry to insist. If you want to keep it like this, I can create a similar test
in LTP - for the symlinks only, which seems important - and blacklist this one
in our kselftest functional test list (https://lkft.linaro.org/), then no
change is needed on your side.

Thanks 


Re: [PATCH RFC] bluetooth: add uart h4 devices via serdev/devicetree

2018-11-10 Thread Sebastian Reichel
Hi,

On Sun, Nov 11, 2018 at 12:20:34AM +0100, Andreas Kemnade wrote:
> This is a first try to be able to use h4 devices specified in
> the devicetree, so you do not need to call hciattach and
> it can be automatically probed.
> 
> Of course, proper devicetree bindings documentation is
> missing. And also you would extend that by regulator/
> enable gpio settings.
> 
> But before proceeding further it should be checked if the
> general way of doing things is right.
> 
> Signed-off-by: Andreas Kemnade 
> ---

Patch looks good to me, just one note

>  drivers/bluetooth/hci_h4.c | 78 
> ++
>  1 file changed, 78 insertions(+)
> 

[...]

> + return hci_uart_register_device(hu, );
> +}
> +
> +static void hci_h4_remove(struct serdev_device *serdev)
> +{
> + struct h4_device *h4dev = serdev_device_get_drvdata(serdev);
> +
> + hci_uart_unregister_device(&h4dev->hu);
> +}

I suggest to add a patch introducing

devm_hci_uart_register_device()

All existing users of hci_uart_register_device() could use it
(your driver, hci_bcm, hci_h5, hci_ll, hci_nokia and hci_qca)
and all drivers but hci_qca can drop their remove function.

-- Sebastian


signature.asc
Description: PGP signature


Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

2018-11-10 Thread Joel Fernandes
On Sat, Nov 10, 2018 at 02:18:23PM -0800, Andy Lutomirski wrote:
> 
> > On Nov 10, 2018, at 2:09 PM, Joel Fernandes  wrote:
> > 
> >> On Sat, Nov 10, 2018 at 11:11:27AM -0800, Daniel Colascione wrote:
> >>> On Sat, Nov 10, 2018 at 10:45 AM, Daniel Colascione  
> >>> wrote:
>  On Sat, Nov 10, 2018 at 10:24 AM, Joel Fernandes 
>   wrote:
>  Thanks Andy for your thoughts, my comments below:
> >> [snip]
>  I don't see it as warty, different seals will work differently. It works
>  quite well for our usecase, and since Linux is all about solving real
>  problems in the real work, it would be useful to have it.
>  
> > - causes a probably-observable effect in the file mode in F_GETFL.
>  
>  Wouldn't that be the right thing to observe anyway?
>  
> > - causes reopen to fail.
>  
>  So this concern isn't true anymore if we make reopen fail only for WRITE
>  opens as Daniel suggested. I will make this change so that the security 
>  fix
>  is a clean one.
>  
> > - does *not* affect other struct files that may already exist on the 
> > same inode.
>  
>  TBH if you really want to block all writes to the file, then you want
>  F_SEAL_WRITE, not this seal. The usecase we have is the fd is sent over 
>  IPC
>  to another process and we want to prevent any new writes in the receiver
>  side. There is no way this other receiving process can have an existing 
>  fd
>  unless it was already sent one without the seal applied.  The proposed 
>  seal
>  could be renamed to F_SEAL_FD_WRITE if that is preferred.
>  
> > - mysteriously malfunctions if you try to set it again on another struct
> > file that already exists
> > 
>  
>  I didn't follow this, could you explain more?
>  
> > - probably is insecure when used on hugetlbfs.
>  
>  The usecase is not expected to prevent all writes, indeed the usecase
>  requires existing mmaps to continue to be able to write into the memory 
>  map.
>  So would you call that a security issue too? The use of the seal wants to
>  allow existing mmap regions to be continue to be written into (I 
>  mentioned
>  more details in the cover letter).
>  
> > I see two reasonable solutions:
> > 
> > 1. Don’t fiddle with the struct file at all. Instead make the inode flag
> > work by itself.
>  
>  Currently, the various VFS paths check only the struct file's f_mode to 
>  deny
>  writes of already opened files. This would mean more checking in all 
>  those
>  paths (and modification of all those paths).
>  
>  Anyway going with that idea, we could
>  1. call deny_write_access(file) from the memfd's seal path which 
>  decrements
>  the inode::i_writecount.
>  2. call get_write_access(inode) in the various VFS paths in addition to
>  checking for FMODE_*WRITE and deny the write (incase i_writecount is 
>  negative)
>  
>  That will prevent both reopens, and writes from succeeding. However I 
>  worry a
>  bit about 2 not being too familiar with VFS internals, about what the
>  consequences of doing that may be.
> >>> 
> >>> IMHO, modifying both the inode and the struct file separately is fine,
> >>> since they mean different things. In regular filesystems, it's fine to
> >>> have a read-write open file description for a file whose inode grants
> >>> write permission to nobody. Speaking of which: is fchmod enough to
> >>> prevent this attack?
> >> 
> >> Well, yes and no. fchmod does prevent reopening the file RW, but
> >> anyone with permissions (owner, CAP_FOWNER) can just fchmod it back. A
> >> seal is supposed to be irrevocable, so fchmod-as-inode-seal probably
> >> isn't sufficient by itself. While it might be good enough for Android
> >> (in the sense that it'll prevent RW-reopens from other security
> >> contexts to which we send an open memfd file), it's still conceptually
> >> ugly, IMHO. Let's go with the original approach of just tweaking the
> >> inode so that open-for-write is permanently blocked.
> > 
> > Agreed with the idea of modifying both file and inode flags. I was thinking
> > modifying i_mode may do the trick but as you pointed it probably could be
> > reverted by chmod or some other attribute setting calls.
> > 
> > OTOH, I don't think deny_write_access(file) can be reverted from any
> > user-facing path so we could do that from the seal to prevent the future
> > opens in write mode. I'll double check and test that out tomorrow.
> > 
> > 
> 
> This seems considerably more complicated and more fragile than needed. Just
> add a new F_SEAL_WRITE_FUTURE.  Grep for F_SEAL_WRITE and make the _FUTURE
> variant work exactly like it with two exceptions:
> 
> - shmem_mmap and maybe its hugetlbfs equivalent should check for it and act
> accordingly.

There's more to it than that, we also need to block future 
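[Editorial note: for background on the seal semantics being compared here, the behaviour of the long-existing F_SEAL_WRITE can be demonstrated from userspace. This is a sketch of the current API only, not of the proposed F_SEAL_FUTURE_WRITE, which the thread is still designing.]

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns the errno produced by write() after sealing the memfd, 0 if
 * the write unexpectedly succeeds, or -1 on setup failure.  With
 * F_SEAL_WRITE the write is expected to fail with EPERM even on the fd
 * that added the seal; the proposed F_SEAL_FUTURE_WRITE would instead
 * only block writers that open the file afterwards. */
int write_errno_after_seal(void)
{
	int fd = memfd_create("seal-demo", MFD_ALLOW_SEALING);
	int err;

	if (fd < 0)
		return -1;
	if (fcntl(fd, F_ADD_SEALS, F_SEAL_WRITE) < 0) {
		close(fd);
		return -1;
	}
	if (write(fd, "x", 1) == 1) {
		close(fd);
		return 0;
	}
	err = errno;
	close(fd);
	return err;
}
```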

Can I Trust You?

2018-11-10 Thread Abel Brent
Dear friend,

I am Abel Brent, a NATO soldier serving in Afghanistan. I and my 
Comrades, we are seeking your assistance to help us 
receive/invest our funds in your country in any lucrative 
business. Please if this proposal is acceptable by you, kindly 
respond back to me for more details.

Thanks and waiting to hear from you


Abel. 







Re: [PATCH] clk: qcom: gcc: Fix board clock node name

2018-11-10 Thread Taniya Das

Hello Vinod,

On 11/9/2018 3:20 PM, Vinod Koul wrote:

Device tree node name are not supposed to have "_" in them so fix the
node name use of xo_board to xo-board

Fixes: 652f1813c113 ("clk: qcom: gcc: Add global clock controller driver for 
QCS404")
Signed-off-by: Vinod Koul 
---

Steve: RobH pointed this on DTS patches, would be great if you can pick this
as a fix

  drivers/clk/qcom/gcc-qcs404.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/clk/qcom/gcc-qcs404.c b/drivers/clk/qcom/gcc-qcs404.c
index e4ca6a45f313..ef1b267cb058 100644
--- a/drivers/clk/qcom/gcc-qcs404.c
+++ b/drivers/clk/qcom/gcc-qcs404.c
@@ -265,7 +265,7 @@ static struct clk_fixed_factor cxo = {
.div = 1,
.hw.init = &(struct clk_init_data){
.name = "cxo",
-   .parent_names = (const char *[]){ "xo_board" },
+   .parent_names = (const char *[]){ "xo-board" },
.num_parents = 1,
.ops = &clk_fixed_factor_ops,
},



This fixed clock needs to be removed once the RPM<->SMD clocks are
added. Why not have this clock be part of the device tree?
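[Editorial note: for illustration, the board clock could be described as a fixed-clock node in DT; this fragment is hypothetical and the frequency is made up for the example.]

```
xo-board {
	compatible = "fixed-clock";
	#clock-cells = <0>;
	clock-frequency = <19200000>;
	clock-output-names = "xo-board";
};
```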


--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation.

--


WARNING in ovl_instantiate

2018-11-10 Thread syzbot

Hello,

syzbot found the following crash on:

HEAD commit: 442b8cea2477 Add linux-next specific files for 20181109
git tree:   linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=169a6fbd40
kernel config:  https://syzkaller.appspot.com/x/.config?x=2f72bdb11df9fbe8
dashboard link: https://syzkaller.appspot.com/bug?extid=9c69c282adc4edd2b540
compiler:   gcc (GCC) 8.0.1 20180413 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+9c69c282adc4edd2b...@syzkaller.appspotmail.com

WARNING: CPU: 0 PID: 9768 at fs/overlayfs/dir.c:263  
ovl_instantiate+0x369/0x400 fs/overlayfs/dir.c:263

Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 9768 Comm: syz-executor2 Not tainted 4.20.0-rc1-next-20181109+  
#110
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011

Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x244/0x39d lib/dump_stack.c:113
 panic+0x2ad/0x55c kernel/panic.c:188
 __warn.cold.8+0x20/0x45 kernel/panic.c:540
 report_bug+0x254/0x2d0 lib/bug.c:186
 fixup_bug arch/x86/kernel/traps.c:178 [inline]
 do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
 do_invalid_op+0x36/0x40 arch/x86/kernel/traps.c:290
 invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:969
RIP: 0010:ovl_instantiate+0x369/0x400 fs/overlayfs/dir.c:263
Code: c3 89 c6 e8 69 84 ee fe 85 db 0f 85 9e 00 00 00 e8 4c 83 ee fe 4c 89  
e7 45 31 f6 e8 11 18 45 ff e9 ec fe ff ff e8 37 83 ee fe <0f> 0b e9 e0 fe  
ff ff e8 2b 83 ee fe 0f 0b e9 63 ff ff ff e8 1f db

RSP: 0018:88018f31f990 EFLAGS: 00010212
RAX: 0004 RBX: 88018f31fa28 RCX: c90013c02000
RDX: a369 RSI: 82912579 RDI: 0007
RBP: 88018f31fa50 R08: 8801bb18a000 R09: ed0031e63ee5
R10: ed0031e63ee5 R11: 0003 R12: 8801cd1e8300
R13: 88018f31f9c8 R14: ff8c R15: 
 ovl_create_over_whiteout fs/overlayfs/dir.c:518 [inline]
 ovl_create_or_link+0xad6/0x1560 fs/overlayfs/dir.c:582
 ovl_create_object+0x2e9/0x3a0 fs/overlayfs/dir.c:616
 ovl_create+0x2b/0x30 fs/overlayfs/dir.c:630
 vfs_create+0x388/0x5b0 fs/namei.c:2912
 do_mknodat+0x410/0x530 fs/namei.c:3766
 __do_sys_mknod fs/namei.c:3795 [inline]
 __se_sys_mknod fs/namei.c:3793 [inline]
 __x64_sys_mknod+0x7b/0xb0 fs/namei.c:3793
 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x457569
Code: fd b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7  
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff  
ff 0f 83 cb b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00

RSP: 002b:7f2126d09c78 EFLAGS: 0246 ORIG_RAX: 0085
RAX: ffda RBX: 0003 RCX: 00457569
RDX:  RSI:  RDI: 2340
RBP: 0072bfa0 R08:  R09: 
R10:  R11: 0246 R12: 7f2126d0a6d4
R13: 004c2a6e R14: 004d4110 R15: 
Kernel Offset: disabled
Rebooting in 86400 seconds..


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkal...@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with  
syzbot.



Re: stable/linux-3.16.y build: 178 builds: 1 failed, 177 passed, 2 errors, 57 warnings (v3.16.52)

2018-11-10 Thread Ben Hutchings
On Sat, 2018-01-13 at 19:51 +0100, Manfred Spraul wrote:
> Hi Arnd,
> 
> On 01/03/2018 12:15 AM, Arnd Bergmann wrote:
> > > 2 ipc/sem.c:377:6: warning: '___p1' may be used uninitialized in this 
> > > function [-Wmaybe-uninitialized]
> > This code was last touched in 3.16 by the backport of commit
> > 5864a2fd3088 ("ipc/sem.c: fix complex_count vs. simple op race")
> > 
> > The warning is in "smp_load_acquire(&sma->complex_mode))", and I suspect
> > that commit 27d7be1801a4 ("ipc/sem.c: avoid using spin_unlock_wait()")
> > avoided the warning upstream by removing the smp_mb() before it.
> The smp_mb() pairs with spin_unlock_wait() in complexmode_enter()
> It is removed by commit 27d7be1801a4 ("ipc/sem.c: avoid using 
> spin_unlock_wait()").
> 
>  From what I see, it doesn't exist in any of the stable kernels 
> (intentionally, the above commit is a rewrite for better performance).
> 
> ___p1 is from smp_load_acquire()
>  typeof(*p) ___p1 = READ_ONCE(*p);   \
> 
> I don't see how ___p1 could be used uninitialized. Perhaps a compiler issue?

On arm64 smp_load_acquire() was implemented in assembly that only
supported 4-byte and 8-byte words.  And complex_mode is a bool (1-byte) 
field.

So I believe the fix is:

878a84d5a8a1 arm64: add missing data types in smp_load_acquire/smp_store_release

Ben.

-- 
Ben Hutchings
Reality is just a crutch for people who can't handle science fiction.



signature.asc
Description: This is a digitally signed message part



[PATCH] pinctrl: qcom: ssbi-gpio: fix gpio-hog related boot issues

2018-11-10 Thread Brian Masney
When attempting to set up a gpio hog, device probing will repeatedly
fail with -EPROBE_DEFER errors. It is caused by a circular dependency
between the gpio and pinctrl frameworks. If the gpio-ranges property is
present in device tree, then the gpio framework will handle the gpio pin
registration and eliminate the circular dependency.

See Christian Lamparter's commit a86caa9ba5d7 ("pinctrl: msm: fix
gpio-hog related boot issues") for a detailed commit message that
explains the issue in much more detail. The code comment in this commit
came from Christian's commit.

I did not test this change against any hardware supported by this
particular driver, however I was able to validate this same fix works
for pinctrl-spmi-gpio.c using a LG Nexus 5 (hammerhead) phone.

Signed-off-by: Brian Masney 
---
For the patch and discussion regarding pinctrl-spmi-gpio.c, see
https://lore.kernel.org/lkml/20181101001149.13453-6-masn...@onstation.org/

 drivers/pinctrl/qcom/pinctrl-ssbi-gpio.c | 23 +--
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/drivers/pinctrl/qcom/pinctrl-ssbi-gpio.c 
b/drivers/pinctrl/qcom/pinctrl-ssbi-gpio.c
index 6b30bef829ab..ded7d765af2e 100644
--- a/drivers/pinctrl/qcom/pinctrl-ssbi-gpio.c
+++ b/drivers/pinctrl/qcom/pinctrl-ssbi-gpio.c
@@ -762,12 +762,23 @@ static int pm8xxx_gpio_probe(struct platform_device *pdev)
return ret;
}
 
-   ret = gpiochip_add_pin_range(&pctrl->chip,
-dev_name(pctrl->dev),
-0, 0, pctrl->chip.ngpio);
-   if (ret) {
-   dev_err(pctrl->dev, "failed to add pin range\n");
-   goto unregister_gpiochip;
+   /*
+* For DeviceTree-supported systems, the gpio core checks the
+* pinctrl's device node for the "gpio-ranges" property.
+* If it is present, it takes care of adding the pin ranges
+* for the driver. In this case the driver can skip ahead.
+*
+* In order to remain compatible with older, existing DeviceTree
+* files which don't set the "gpio-ranges" property or systems that
+* utilize ACPI the driver has to call gpiochip_add_pin_range().
+*/
+   if (!of_property_read_bool(pctrl->dev->of_node, "gpio-ranges")) {
+   ret = gpiochip_add_pin_range(&pctrl->chip, dev_name(pctrl->dev),
+0, 0, pctrl->chip.ngpio);
+   if (ret) {
+   dev_err(pctrl->dev, "failed to add pin range\n");
+   goto unregister_gpiochip;
+   }
}
 
platform_set_drvdata(pdev, pctrl);
-- 
2.17.2




Re: [RFC PATCH 07/12] locking/lockdep: Add support for nested terminal locks

2018-11-10 Thread Peter Zijlstra
On Sat, Nov 10, 2018 at 07:30:54PM -0500, Waiman Long wrote:
> On 11/10/2018 09:20 AM, Peter Zijlstra wrote:
> > On Thu, Nov 08, 2018 at 03:34:23PM -0500, Waiman Long wrote:
> >> There are use cases where we want to allow 2-level nesting of one
> >> terminal lock underneath another one. So the terminal lock type is now
> >> extended to support a new nested terminal lock where it can allow the
> >> acquisition of another regular terminal lock underneath it.
> > You're stretching things here... If you're allowing things under it, it
> > is no longer a terminal lock.
> >
> > Why would you want to do such a thing?
> 
> A majority of the gain in debugobjects is to make the hash lock a kind
> of terminal lock. Yes, I may be stretching it a bit here. I will take
> back the nesting patch and consider doing that in a future patch.

Maybe try and write a better changelog? I'm not following, but that
could also be because I've been awake for almost 20 hours :/



Re: Patch to consider for stable 3.18, 4.4 and 4.9

2018-11-10 Thread Ben Hutchings
On Mon, 2018-03-05 at 20:43 +, Tomasz Kramkowski wrote:
> In September last year, Ben Hutchings submitted commit [9547837bdccb]
> for 3.16.48-rc1 and I informed him that it would be useless without
> [3f3752705dbd] (and that maybe [c3883fe06488] would be useful as well).
> Ben dropped the patch but suggested I email this list with the
> information of the other two patches but I never quite got around to it.
> 
> Now I see Sasha Levin is submitting [3f3752705dbd] and [c3883fe06488]
> for 4.9, 4.4 and 3.18 it would now make sense to include [9547837bdccb].
> This patch fixes a minor problem where a certain USB adapter for Sega
> Genesis controllers appears as one input device when it has two ports
> for two controllers. I imagine some users of emulator distributions
> might use stable kernels and might benefit from this fix.
> 
> I'm actually not entirely sure that patch is something suitable for
> stable but since it was already submitted once then I don't think it
> hurts to bring it up again (despite it breaking stable-kernel-rules as
> far as I understand it).
> 
> Commits mentioned:
> [9547837bdccb]: HID: usbhid: add quirk for innomedia INNEX GENESIS/ATARI 
> adapter
> [3f3752705dbd]: HID: reject input outside logical range only if null state is 
> set
> [c3883fe06488]: HID: clamp input to logical range if no null state

I've finally queued these up for 3.16, thanks.

Ben.

> If the patch [9547837bdccb] is not relevant then feel free to ignore
> this email.
> 
> Thanks,
> 
-- 
Ben Hutchings
Reality is just a crutch for people who can't handle science fiction.




signature.asc
Description: This is a digitally signed message part


