date:20161025

[RFC v4 00/18] Landlock LSM: Unprivileged sandboxing

2016-10-25 Thread Mickaël Salaün

Hi,

This fourth RFC brings some improvements over the previous one [1]. An important
new point is the abstraction from the raw types of LSM hook arguments: it is
now possible to call a Landlock function the same way for LSM hooks with
different internal argument types. Some parts of the code have been revamped
with RCU to properly deal with concurrency. From a userland point of view, the
only remaining link with seccomp-bpf is the ability to use the seccomp(2)
syscall to load and enforce a Landlock rule; seccomp filters can no longer
trigger Landlock rules. For now, it is also no longer possible for an
unprivileged user to enforce a Landlock rule on a cgroup through delegation.

As suggested, I plan to write documentation for userland and kernel developers
with some kind of guiding principles. A remaining open question is how to
enforce limits on rule creation.


# Landlock LSM

The goal of this new stackable Linux Security Module (LSM) called Landlock is
to allow any process, including unprivileged ones, to create powerful security
sandboxes comparable to the Seatbelt/XNU Sandbox or the OpenBSD Pledge. This
kind of sandbox is expected to help mitigate the security impact of bugs or
unexpected/malicious behaviors in userland applications.

eBPF programs are used to create a security rule. They are very limited (i.e.
they can only call a whitelist of functions) and cannot cause a denial of
service (i.e. no loops). A new dedicated eBPF map makes it possible to collect
Landlock handles and compare them with system resources (e.g. files or network
connections).

The approach taken is to add the minimum amount of code while still allowing
userland to create quite complex access rules. A dedicated security policy
language such as the one used by SELinux, AppArmor and other major LSMs involves
a lot of code and is usually reserved for a trusted user (i.e. root).


# eBPF

To get an expressive language while still being safe and small, Landlock is
based on eBPF. Landlock should be usable by untrusted processes and must
therefore expose a minimal attack surface. eBPF bytecode is minimal yet
powerful, widely used, and designed to be used by not-so-trusted applications.
Reusing this code avoids repeating the same mistakes and minimizes new code
while still taking a generic approach. Only a few additional features are
added, such as a new kind of arraymap and some dedicated eBPF functions.

An eBPF program has access to an eBPF context which contains the LSM hook
arguments (as seccomp-bpf does with syscall arguments). They can be used
directly or passed to helper functions according to their types. It is then
possible to do complex access checks without race conditions or inconsistent
evaluation (i.e. incorrect mirroring of the OS code and state [2]).

There is one eBPF program subtype per LSM hook. This makes it possible to
statically check which context accesses are performed by an eBPF program. This
is needed to prevent kernel address leaks and to ensure that LSM hook arguments
are used correctly with eBPF functions. Moreover, this safe pointer handling
removes the need for runtime checks or abstract data, which improves
performance. Any user can add multiple Landlock eBPF programs per LSM hook.
They are stacked and evaluated one after the other (cf. seccomp-bpf).
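As a rough user-space model of programs being "stacked and evaluated one after the other", the sketch below keeps a hook's programs in a singly linked list in which each newly added program points at the previously added one, the way seccomp chains its filters. The names (`struct rule`, `rule_push`, `stack_allows`), the "0 allows, anything else denies" return convention and the rule that every program in the stack must allow the access are assumptions for illustration, not the Landlock API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical context handed to every program of one hook. */
struct hook_ctx {
	unsigned int requested_access;
};

/* Hypothetical program: returns 0 to allow, non-zero to deny. */
typedef int (*rule_fn)(const struct hook_ctx *ctx);

/* One node of the append-only stack; prev points at the older rule. */
struct rule {
	rule_fn fn;
	const struct rule *prev;
};

/* Push a new rule on top of the existing stack and return the new top. */
static const struct rule *rule_push(struct rule *node, rule_fn fn,
				    const struct rule *top)
{
	node->fn = fn;
	node->prev = top;
	return node;
}

/* Walk from the newest rule to the oldest; any denial denies the access. */
static bool stack_allows(const struct rule *top, const struct hook_ctx *ctx)
{
	for (const struct rule *r = top; r != NULL; r = r->prev) {
		if (r->fn(ctx) != 0)
			return false;
	}
	return true;
}

/* Example program: deny requests carrying the (hypothetical) write bit 0x2. */
static int deny_write(const struct hook_ctx *ctx)
{
	return (ctx->requested_access & 0x2) ? -1 : 0;
}

/* Example program that never objects. */
static int allow_all(const struct hook_ctx *ctx)
{
	(void)ctx;
	return 0;
}
```

Because nodes are only ever pushed, a process in this model can add restrictions on top of inherited ones but cannot remove an earlier rule.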


# LSM hooks

Unlike syscalls, LSM hooks are security checkpoints and are not architecture
dependent. They are designed to match a security need associated with a
security policy (e.g. access to a file). Exposing parts of some LSM hooks
instead of using the syscall API for sandboxing should help to avoid bugs and
hacks such as those encountered with the first RFC. Instead of redoing the work
of the LSM hooks through syscalls, we should use and expose them as
access-control LSM policies do.

Only a subset of the hooks is meaningful for an unprivileged sandbox mechanism
(e.g. file system or network access control). Landlock uses an abstraction of
raw LSM hooks, which makes it possible to deal with future changes of the LSM
hook API. Moreover, thanks to the eBPF program typing (per LSM hook) used by
Landlock, it should not be hard to make such evolutions backward compatible.


# Use case scenario

First, a process needs to create a new dedicated eBPF map containing handles.
These handles are references to system resources (e.g. files or directories)
and are grouped in one or multiple maps to be efficiently managed and checked
in batches. This kind of map can be passed to Landlock eBPF functions to
compare, for example, with a file access request. The handles are only
accessible from the eBPF programs created by the same thread.
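The handle idea can be illustrated from user space. The sketch below records a file's identity as its (st_dev, st_ino) pair and checks whether a requested path refers to one of the recorded objects. The real handles and maps described above are eBPF objects living in the kernel; `struct fs_handle`, `handle_from_path` and `handles_contain` are hypothetical names, and identifying a file by device and inode number is only one assumed way to do it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical handle: identifies a file system object by device and inode. */
struct fs_handle {
	dev_t dev;
	ino_t ino;
};

/* Fill *h from path; returns 0 on success, -1 if stat() fails. */
static int handle_from_path(const char *path, struct fs_handle *h)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return -1;
	h->dev = st.st_dev;
	h->ino = st.st_ino;
	return 0;
}

/* True if path names one of the n objects recorded in map. */
static bool handles_contain(const struct fs_handle *map, size_t n,
			    const char *path)
{
	struct fs_handle req;

	if (handle_from_path(path, &req) != 0)
		return false;	/* unknown or inaccessible paths never match */
	for (size_t i = 0; i < n; i++) {
		if (map[i].dev == req.dev && map[i].ino == req.ino)
			return true;
	}
	return false;
}
```

Comparing identities rather than path strings means two different paths to the same object (e.g. through a symlink) are treated as the same resource.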

The loaded Landlock eBPF programs can be triggered by a seccomp filter
returning RET_LANDLOCK. In addition, a cookie (16-bit value) can be passed from
a seccomp filter to eBPF programs. This allows flexible security policies
between seccomp and Landlock.

Another way to enforce a Landlock security policy is to attach Landlock
programs to a dedicated cgroup. All the processes in this cgroup will then be

[RFC v4 01/18] landlock: Add Kconfig

2016-10-25 Thread Mickaël Salaün

Initial Landlock Kconfig needed to split the Landlock eBPF and seccomp
parts to ease the review.

Changes from v2:
* add seccomp filter or cgroups (with eBPF programs attached support)
  dependencies

Signed-off-by: Mickaël Salaün 
Cc: James Morris 
Cc: Kees Cook 
Cc: Serge E. Hallyn 
---
 security/Kconfig  |  1 +
 security/landlock/Kconfig | 23 +++
 2 files changed, 24 insertions(+)
 create mode 100644 security/landlock/Kconfig

diff --git a/security/Kconfig b/security/Kconfig
index 118f4549404e..c63194c561c5 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -164,6 +164,7 @@ source security/tomoyo/Kconfig
 source security/apparmor/Kconfig
 source security/loadpin/Kconfig
 source security/yama/Kconfig
+source security/landlock/Kconfig
 
 source security/integrity/Kconfig
 
diff --git a/security/landlock/Kconfig b/security/landlock/Kconfig
new file mode 100644
index ..dec64270b06d
--- /dev/null
+++ b/security/landlock/Kconfig
@@ -0,0 +1,23 @@
+config SECURITY_LANDLOCK
+   bool "Landlock sandbox support"
+   depends on SECURITY
+   depends on BPF_SYSCALL
+   depends on SECCOMP_FILTER || CGROUP_BPF
+   default y
+   help
+ Landlock is a stacked LSM which allows any user to load a security
+ policy to restrict their processes (i.e. create a sandbox). The
+ policy is a list of stacked eBPF programs for some LSM hooks. Each
+ program can do some access comparison to check if an access request
+ is legitimate.
+
+ You need to enable seccomp filter and/or cgroups (with eBPF programs
+ attached support) to apply a security policy to either a process
+ hierarchy (e.g. application with built-in sandboxing) or a group of
+ processes (e.g. container sandboxing). It is recommended to enable
+ both seccomp filter and cgroups.
+
+ Further information about eBPF can be found in
+ Documentation/networking/filter.txt
+
+ If you are unsure how to answer this question, answer Y.
-- 
2.9.3

[RFC v4 11/18] seccomp,landlock: Handle Landlock hooks per process hierarchy

2016-10-25 Thread Mickaël Salaün

The seccomp(2) syscall can be used to apply a Landlock rule to the
current process. As with a seccomp filter, the Landlock rule is enforced
for all its future children. An inherited rule tree can be updated
(append-only) by the owner of inherited Landlock nodes (e.g. a parent
process that creates a new rule). However, an intermediate task, which
did not create a rule, will not be able to update its children's rules.
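Enforcing a rule "for all its future children" amounts to sharing one reference-counted object between a parent and its children, which is what the `copy_seccomp()` hunk of this patch does for `landlock_hooks` next to `get_seccomp_filter()`. Below is a minimal single-threaded sketch of that ownership pattern; `struct hooks_model`, `hooks_get`, `hooks_put`, `fork_copy` and `task_exit` are hypothetical stand-ins (the patch's own release helper is `put_landlock_hooks()`, whose body is not shown here), and kernel code would use atomic reference counts.

```c
#include <stdlib.h>

/* Hypothetical model of a task's shared rule set. */
struct hooks_model {
	unsigned int usage;	/* number of tasks referencing this set */
};

struct task_model {
	struct hooks_model *hooks;	/* NULL when no rule is enforced */
};

/* Take a reference (no-op for NULL). */
static struct hooks_model *hooks_get(struct hooks_model *h)
{
	if (h)
		h->usage++;
	return h;
}

/* Drop a reference; free the set when the last user goes away. */
static void hooks_put(struct hooks_model *h)
{
	if (h && --h->usage == 0)
		free(h);
}

/* On fork, the child shares (does not copy) the parent's rule set. */
static void fork_copy(struct task_model *child, const struct task_model *parent)
{
	child->hooks = hooks_get(parent->hooks);
}

/* On exit, a task releases its reference. */
static void task_exit(struct task_model *t)
{
	hooks_put(t->hooks);
	t->hooks = NULL;
}
```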

Changes since v3:
* remove the hard link with seccomp (suggested by Andy Lutomirski and
  Kees Cook):
  * remove the cookie, which could imply multiple evaluations of Landlock
    rules
  * remove the origin field in struct landlock_data
* remove documentation fix (merged upstream)
* rename the new seccomp command to SECCOMP_ADD_LANDLOCK_RULE
* internal renaming

Changes since v2:
* Landlock programs can now be run without seccomp filter but for any
  syscall (from the process) or interruption
* move Landlock related functions and structs into security/landlock/*
  (to manage cgroups as well)
* fix seccomp filter handling: run Landlock programs for each of their
  legitimate seccomp filter
* properly clean up all seccomp results
* cosmetic changes to ease the understanding
* fix some ifdef

Signed-off-by: Mickaël Salaün 
Cc: Kees Cook 
Cc: Andy Lutomirski 
Cc: Will Drewry 
Cc: Andrew Morton 
Link: 
https://lkml.kernel.org/r/cagxu5j+qowiyquhifobtupfpxp6xevdgf08bw4yzkvdtcha...@mail.gmail.com
---
 include/linux/landlock.h |  5 +
 include/linux/seccomp.h  |  8 +++
 include/uapi/linux/seccomp.h |  1 +
 kernel/fork.c| 13 +--
 kernel/seccomp.c |  8 +++
 security/landlock/lsm.c  |  8 +--
 security/landlock/manager.c  | 51 
 7 files changed, 90 insertions(+), 4 deletions(-)

diff --git a/include/linux/landlock.h b/include/linux/landlock.h
index 263be3cf0b48..72b4235d255f 100644
--- a/include/linux/landlock.h
+++ b/include/linux/landlock.h
@@ -74,5 +74,10 @@ struct landlock_hooks {
 
 void put_landlock_hooks(struct landlock_hooks *hooks);
 
+#ifdef CONFIG_SECCOMP_FILTER
+int landlock_seccomp_append_prog(unsigned int flags,
+   const char __user *user_bpf_fd);
+#endif /* CONFIG_SECCOMP_FILTER */
+
 #endif /* CONFIG_SECURITY_LANDLOCK */
 #endif /* _LINUX_LANDLOCK_H */
diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index e25aee2cdfc0..4a8ccc7ff976 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -10,6 +10,10 @@
 #include 
 #include 
 
+#if defined(CONFIG_SECCOMP_FILTER) && defined(CONFIG_SECURITY_LANDLOCK)
+#include 
+#endif /* CONFIG_SECCOMP_FILTER && CONFIG_SECURITY_LANDLOCK */
+
 struct seccomp_filter;
 /**
  * struct seccomp - the state of a seccomp'ed process
@@ -18,6 +22,7 @@ struct seccomp_filter;
  * system calls available to a process.
  * @filter: must always point to a valid seccomp-filter or NULL as it is
  *  accessed without locking during system call entry.
+ * @landlock_hooks: contains an array of Landlock programs.
  *
  *  @filter must only be accessed from the context of current as there
  *  is no read locking.
@@ -25,6 +30,9 @@ struct seccomp_filter;
 struct seccomp {
int mode;
struct seccomp_filter *filter;
+#if defined(CONFIG_SECCOMP_FILTER) && defined(CONFIG_SECURITY_LANDLOCK)
+   struct landlock_hooks *landlock_hooks;
+#endif /* CONFIG_SECCOMP_FILTER && CONFIG_SECURITY_LANDLOCK */
 };
 
 #ifdef CONFIG_HAVE_ARCH_SECCOMP_FILTER
diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
index 0f238a43ff1e..56dd692cddac 100644
--- a/include/uapi/linux/seccomp.h
+++ b/include/uapi/linux/seccomp.h
@@ -13,6 +13,7 @@
 /* Valid operations for seccomp syscall. */
 #define SECCOMP_SET_MODE_STRICT0
 #define SECCOMP_SET_MODE_FILTER1
+#define SECCOMP_ADD_LANDLOCK_RULE  2
 
 /* Valid flags for SECCOMP_SET_MODE_FILTER */
 #define SECCOMP_FILTER_FLAG_TSYNC  1
diff --git a/kernel/fork.c b/kernel/fork.c
index 0690e43bdda5..d8af3ba554fa 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -510,7 +510,10 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
 * the usage counts on the error path calling free_task.
 */
tsk->seccomp.filter = NULL;
-#endif
+#ifdef CONFIG_SECURITY_LANDLOCK
+   tsk->seccomp.landlock_hooks = NULL;
+#endif /* CONFIG_SECURITY_LANDLOCK */
+#endif /* CONFIG_SECCOMP */
 
setup_thread_stack(tsk, orig);
clear_user_return_notifier(tsk);
@@ -1378,7 +1381,13 @@ static void copy_seccomp(struct task_struct *p)
 
/* Ref-count the new filter user, and assign it. */
get_seccomp_filter(current);
-   p->seccomp = current->seccomp;
+   p->seccomp.mode = current->seccomp.mode;
+   p->seccomp.filter = current->seccomp.filter;
+#if defined(CONFIG_SECCOMP_FILTER) && defined(CONFIG_SECURITY_LANDLOCK)
+   p->seccomp.landlock_hooks = current->seccomp.landlock_ho

RE: [PATCH v9 02/12] vfio: VFIO based driver for Mediated devices

2016-10-25 Thread Tian, Kevin

> From: Kirti Wankhede [mailto:kwankh...@nvidia.com]
> Sent: Tuesday, October 18, 2016 5:22 AM
> 
> vfio_mdev driver registers with mdev core driver.
> MDEV core driver creates mediated device and calls probe routine of

use same case - either 'mdev core' or 'MDEV core'

> vfio_mdev driver for each device.
> Probe routine of vfio_mdev driver adds mediated device to VFIO core module
> 
> This driver forms a shim layer that pass through VFIO devices operations
> to vendor driver for mediated devices.
> 
> Signed-off-by: Kirti Wankhede 
> Signed-off-by: Neo Jia 
> Change-Id: I583f4734752971d3d112324d69e2508c88f359ec
> ---
>  drivers/vfio/mdev/Kconfig |   7 ++
>  drivers/vfio/mdev/Makefile|   1 +
>  drivers/vfio/mdev/vfio_mdev.c | 148
> ++
>  3 files changed, 156 insertions(+)
>  create mode 100644 drivers/vfio/mdev/vfio_mdev.c
> 
> diff --git a/drivers/vfio/mdev/Kconfig b/drivers/vfio/mdev/Kconfig
> index 93addace9a67..6cef0c4d2ceb 100644
> --- a/drivers/vfio/mdev/Kconfig
> +++ b/drivers/vfio/mdev/Kconfig
> @@ -9,3 +9,10 @@ config VFIO_MDEV
>   See Documentation/vfio-mdev/vfio-mediated-device.txt for more details.
> 
>  If you don't know what do here, say N.
> +
> +config VFIO_MDEV_DEVICE
> +tristate "VFIO support for Mediated devices"
> +depends on VFIO && VFIO_MDEV
> +default n
> +help
> +VFIO based driver for mediated devices.
> diff --git a/drivers/vfio/mdev/Makefile b/drivers/vfio/mdev/Makefile
> index 31bc04801d94..fa2d5ea466ee 100644
> --- a/drivers/vfio/mdev/Makefile
> +++ b/drivers/vfio/mdev/Makefile
> @@ -2,3 +2,4 @@
>  mdev-y := mdev_core.o mdev_sysfs.o mdev_driver.o
> 
>  obj-$(CONFIG_VFIO_MDEV) += mdev.o
> +obj-$(CONFIG_VFIO_MDEV_DEVICE) += vfio_mdev.o
> diff --git a/drivers/vfio/mdev/vfio_mdev.c b/drivers/vfio/mdev/vfio_mdev.c
> new file mode 100644
> index ..b7b47604ce7a
> --- /dev/null
> +++ b/drivers/vfio/mdev/vfio_mdev.c
> @@ -0,0 +1,148 @@
> +/*
> + * VFIO based driver for Mediated device
> + *
> + * Copyright (c) 2016, NVIDIA CORPORATION. All rights reserved.
> + * Author: Neo Jia 
> + *  Kirti Wankhede 
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "mdev_private.h"
> +
> +#define DRIVER_VERSION  "0.1"
> +#define DRIVER_AUTHOR   "NVIDIA Corporation"
> +#define DRIVER_DESC "VFIO based driver for Mediated device"
> +
> +static int vfio_mdev_open(void *device_data)
> +{
> + struct mdev_device *mdev = device_data;
> + struct parent_device *parent = mdev->parent;
> + int ret;
> +
> + if (unlikely(!parent->ops->open))
> + return -EINVAL;
> +
> + if (!try_module_get(THIS_MODULE))
> + return -ENODEV;
> +
> + ret = parent->ops->open(mdev);
> + if (ret)
> + module_put(THIS_MODULE);
> +
> + return ret;
> +}
> +
> +static void vfio_mdev_release(void *device_data)
> +{
> + struct mdev_device *mdev = device_data;
> + struct parent_device *parent = mdev->parent;
> +
> + if (parent->ops->release)

likely()

> + parent->ops->release(mdev);
> +
> + module_put(THIS_MODULE);
> +}
> +
> +static long vfio_mdev_unlocked_ioctl(void *device_data,
> +  unsigned int cmd, unsigned long arg)
> +{
> + struct mdev_device *mdev = device_data;
> + struct parent_device *parent = mdev->parent;
> +
> + if (unlikely(!parent->ops->ioctl))
> + return -EINVAL;
> +
> + return parent->ops->ioctl(mdev, cmd, arg);
> +}
> +
> +static ssize_t vfio_mdev_read(void *device_data, char __user *buf,
> +   size_t count, loff_t *ppos)
> +{
> + struct mdev_device *mdev = device_data;
> + struct parent_device *parent = mdev->parent;
> +
> + if (unlikely(!parent->ops->read))
> + return -EINVAL;
> +
> + return parent->ops->read(mdev, buf, count, ppos);
> +}
> +
> +static ssize_t vfio_mdev_write(void *device_data, const char __user *buf,
> +size_t count, loff_t *ppos)
> +{
> + struct mdev_device *mdev = device_data;
> + struct parent_device *parent = mdev->parent;
> +
> + if (unlikely(!parent->ops->write))
> + return -EINVAL;
> +
> + return parent->ops->write(mdev, buf, count, ppos);
> +}
> +
> +static int vfio_mdev_mmap(void *device_data, struct vm_area_struct *vma)
> +{
> + struct mdev_device *mdev = device_data;
> + struct parent_device *parent = mdev->parent;
> +
> + if (unlikely(!parent->ops->mmap))
> + return -EINVAL;
> +
> + return parent->ops->mmap(mdev, vma);
> +}
> +
> +static const struct vfio_device_ops vfio_mdev_dev_ops = {
> + .name   = "vfio-mdev",
> +

[tip:x86/asm] x86/decoder: Use stderr if insn sanity test fails

2016-10-25 Thread tip-bot for Paul Bolle

Commit-ID:  bb12d6740f6de393927362f23f833a79d85df384
Gitweb: http://git.kernel.org/tip/bb12d6740f6de393927362f23f833a79d85df384
Author: Paul Bolle 
AuthorDate: Tue, 25 Oct 2016 22:56:05 +0200
Committer:  Ingo Molnar 
CommitDate: Wed, 26 Oct 2016 08:41:06 +0200

x86/decoder: Use stderr if insn sanity test fails

If the instruction sanity test fails, it prints a "Failure" message to
stdout. Make this program behave like the rest of the build and print
that message to stderr.
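The rule applied here (report to stderr only when something failed, otherwise to stdout) fits in a tiny helper. `report_stream()` and `report()` are hypothetical names; the patch itself simply inlines the `(errors) ? stderr : stdout` expression in its `fprintf()` call.

```c
#include <stdio.h>

/* Failures go to stderr, successful runs go to stdout. */
static FILE *report_stream(int errors)
{
	return errors ? stderr : stdout;
}

/* Print a summary line in the spirit of insn_sanity's final message. */
static void report(const char *prog, int insns, int errors)
{
	fprintf(report_stream(errors),
		"%s: %s: decoded and checked %d instructions with %d errors\n",
		prog, errors ? "Failure" : "Success", insns, errors);
}
```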

Signed-off-by: Paul Bolle 
Cc: Andrew Morton 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Link: 
http://lkml.kernel.org/r/1477428965-20548-3-git-send-email-pebo...@tiscali.nl
Signed-off-by: Ingo Molnar 
---
 arch/x86/tools/insn_sanity.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/tools/insn_sanity.c b/arch/x86/tools/insn_sanity.c
index ba70ff2..1972565 100644
--- a/arch/x86/tools/insn_sanity.c
+++ b/arch/x86/tools/insn_sanity.c
@@ -269,7 +269,8 @@ int main(int argc, char **argv)
insns++;
}
 
-   fprintf(stdout, "%s: %s: decoded and checked %d %s instructions with %d errors (seed:0x%x)\n",
+   fprintf((errors) ? stderr : stdout,
+   "%s: %s: decoded and checked %d %s instructions with %d errors (seed:0x%x)\n",
prog,
(errors) ? "Failure" : "Success",
insns,

[tip:x86/asm] x86/decoder: Use stdout if insn decoder test is successful

2016-10-25 Thread tip-bot for Paul Bolle

Commit-ID:  bdcc18b548b8f1fab23c097724c6f32daac03185
Gitweb: http://git.kernel.org/tip/bdcc18b548b8f1fab23c097724c6f32daac03185
Author: Paul Bolle 
AuthorDate: Tue, 25 Oct 2016 22:56:04 +0200
Committer:  Ingo Molnar 
CommitDate: Wed, 26 Oct 2016 08:41:06 +0200

x86/decoder: Use stdout if insn decoder test is successful

If the instruction decoder test ran successfully, it prints a message like
this to stderr:
Succeed: decoded and checked 1767380 instructions

But, as described in "console mode programming user interface guidelines
version 101" which doesn't exist, programs should use stderr for errors
or warnings. We're told about a successful run here, so the instruction
decoder test should use stdout.

Let's fix the typo too, while we're at it.

Signed-off-by: Paul Bolle 
Cc: Andrew Morton 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Link: 
http://lkml.kernel.org/r/1477428965-20548-2-git-send-email-pebo...@tiscali.nl
Signed-off-by: Ingo Molnar 
---
 arch/x86/tools/test_get_len.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/tools/test_get_len.c b/arch/x86/tools/test_get_len.c
index 56f04db..ecf31e0 100644
--- a/arch/x86/tools/test_get_len.c
+++ b/arch/x86/tools/test_get_len.c
@@ -167,7 +167,7 @@ int main(int argc, char **argv)
fprintf(stderr, "Warning: decoded and checked %d"
" instructions with %d warnings\n", insns, warnings);
else
-   fprintf(stderr, "Succeed: decoded and checked %d"
+   fprintf(stdout, "Success: decoded and checked %d"
" instructions\n", insns);
return 0;
 }

[git pull] drm/x86 pat regression fix.

2016-10-25 Thread Dave Airlie

Hi Linus,

This is a standalone pull request for the fix for a regression introduced
in -rc1 by a change to vm_insert_mixed to start using the PAT range tracking
to validate page protections. With that change in place, all the VRAM mappings
for GPU drivers ended up as UC instead of WC.

There are probably better ways to fix this long term, but nothing I'd consider
for -fixes that wouldn't need more time to settle. So I've just created a new
arch API that drivers can use to reserve all their VRAM aperture ranges as WC.

Dave.

The following changes since commit 07d9a380680d1c0eb51ef87ff2eab5c994949e69:

  Linux 4.9-rc2 (2016-10-23 17:10:14 -0700)

are available in the git repository at:

  git://people.freedesktop.org/~airlied/linux tags/drm-x86-pat-regression-fix

for you to fetch changes up to 7cf321d118a825c1541b43ca45294126fd474efa:

  drm/drivers: add support for using the arch wc mapping API.
(2016-10-26 16:48:01 +1000)


patches to fix a regression in 4.9-rc1 on x86 PAT


Dave Airlie (2):
  x86/io: add interface to reserve io memtype for a resource range. (v1.1)
  drm/drivers: add support for using the arch wc mapping API.

 arch/x86/include/asm/io.h  |  6 ++
 arch/x86/mm/pat.c  | 14 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |  5 +
 drivers/gpu/drm/ast/ast_ttm.c  |  6 ++
 drivers/gpu/drm/cirrus/cirrus_ttm.c|  7 +++
 drivers/gpu/drm/mgag200/mgag200_ttm.c  |  7 +++
 drivers/gpu/drm/nouveau/nouveau_ttm.c  |  8 
 drivers/gpu/drm/radeon/radeon_object.c |  5 +
 include/linux/io.h | 22 ++
 9 files changed, 80 insertions(+)

RE: [PATCH v9 01/12] vfio: Mediated device Core driver

2016-10-25 Thread Tian, Kevin

> From: Kirti Wankhede [mailto:kwankh...@nvidia.com]
> Sent: Tuesday, October 18, 2016 5:22 AM
> 
> Design for Mediated Device Driver:
> Main purpose of this driver is to provide a common interface for mediated
> device management that can be used by different drivers of different
> devices.
> 
> This module provides a generic interface to create the device, add it to
> mediated bus, add device to IOMMU group and then add it to vfio group.
> 
> Below is the high Level block diagram, with Nvidia, Intel and IBM devices
> as example, since these are the devices which are going to actively use
> this module as of now.
> 
>  +---------------+
>  |               |
>  | +-----------+ |  mdev_register_driver() +--------------+
>  | |           | +<------------------------+ __init()     |
>  | |  mdev     | |                         |              |
>  | |  bus      | +------------------------>+              |<-> VFIO user
>  | |  driver   | |     probe()/remove()    | vfio_mdev.ko |    APIs
>  | |           | |                         |              |
>  | +-----------+ |                         +--------------+
>  |               |
>  |  MDEV CORE    |
>  |   MODULE      |
>  |   mdev.ko     |
>  | +-----------+ |  mdev_register_device() +--------------+
>  | |           | +<------------------------+              |
>  | |           | |                         |  nvidia.ko   |<-> physical
>  | |           | +------------------------>+              |    device
>  | |           | |        callback         +--------------+
>  | | Physical  | |
>  | |  device   | |  mdev_register_device() +--------------+
>  | | interface | |<------------------------+              |
>  | |           | |                         |  i915.ko     |<-> physical
>  | |           | +------------------------>+              |    device
>  | |           | |        callback         +--------------+
>  | |           | |
>  | |           | |  mdev_register_device() +--------------+
>  | |           | +<------------------------+              |
>  | |           | |                         | ccw_device.ko|<-> physical
>  | |           | +------------------------>+              |    device
>  | |           | |        callback         +--------------+
>  | +-----------+ |
>  +---------------+
> 
> Core driver provides two types of registration interfaces:
> 1. Registration interface for mediated bus driver:
> 
> /**
>  * struct mdev_driver - Mediated device's driver
>  * @name: driver name
>  * @probe: called when new device created
>  * @remove: called when device removed
>  * @driver: device driver structure
>  *
>  **/
> struct mdev_driver {
> 	const char *name;
> 	int  (*probe)  (struct device *dev);
> 	void (*remove) (struct device *dev);
> 	struct device_driver driver;
> };
> 
> Mediated bus driver for mdev device should use this interface to register
> and unregister with core driver respectively:
> 
> int  mdev_register_driver(struct mdev_driver *drv, struct module *owner);
> void mdev_unregister_driver(struct mdev_driver *drv);
> 
> Medisted bus driver is responsible to add/delete mediated devices to/from

Medisted -> Mediated

> VFIO group when devices are bound and unbound to the driver.
> 
> 2. Physical device driver interface
> This interface provides vendor driver the set APIs to manage physical
> device related work in its driver. APIs are :
> 
> * dev_attr_groups: attributes of the parent device.
> * mdev_attr_groups: attributes of the mediated device.
> * supported_type_groups: attributes to define supported type. This is
>mandatory field.
> * create: to allocate basic resources in driver for a mediated device.

In 'which driver'? It would be clearer to remove 'in driver' here.

> * remove: to free resources in driver when mediated device is destroyed.
> * open: open callback of mediated device
> * release: release callback of mediated device
> * read : read emulation callback.
> * write: write emulation callback.
> * mmap: mmap emulation callback.
> * ioctl: ioctl callback.

You only highlight 'mandatory field' for supported_type_groups. What
about the other fields? Are all of them optional? Please clarify, and also
stay consistent with the later code comments.

> 
> Drivers should use these interfaces to register and unregister device to
> mdev core driver respectively:
> 
> extern int  mdev_register_device(struct device *dev,
>  const struct parent_ops *ops);
> extern void mdev_unregister_device(struct device *dev);
> 
> There are no locks to serialize above callbacks in mdev driver and
> vfio_mdev driver. If required, vendor driver can have locks to serialize
> above APIs in their driver.
> 
> Signed-off-by: Kirti Wankhede 
> Signed-off-by: Neo Jia 
> Change-Id: I73a5084574270b14541c529461ea2f03c292d510
> ---
>  drivers/vfio/Kconfig |   1 +
>  drivers/vfio/Makefile|   1 +
>  drivers/vfio/mdev/Kconfig|  11 ++
>  drivers/v

Re: [PATCH] tty/serial: at91: fix hardware handshake on Atmel platforms

2016-10-25 Thread Richard Genoud

On 25/10/2016 18:22, Alexandre Belloni wrote:
> Hi,
> 
> On 25/10/2016 at 18:11:35 +0200, Richard Genoud wrote :
>> commit 1cf6e8fc8341 ("tty/serial: at91: fix RTS line management when
>> hardware handshake is enabled"), despite its title, broke hardware
>> handshake on *every* Atmel platforms.
>>
>> The only one partially working is the SAMA5D2.
>>
> 
> [...]
> 
>> Changes since v4:
>>  - the mctrl_gpio_use_rtscts() is gone since it was atmel_serial
>>  specific. (so patch 1 is gone)
>>  - patches 2 and 3 have been merged together since it didn't make
>>  a lot of sense to correct the GPIO case in one separate patch.
>>  - ATMEL_US_USMODE_HWHS is now unset for platform with PDC
>>
>> Changes since v3:
>>  - remove superfluous #include  (thanks to Uwe)
>>  - rebase on next-20160930
>>
>> Changes since v2:
>>  - remove IS_ERR_OR_NULL() test in patch 1/3 as Uwe suggested.
>>  - fix typos in patch 2/3
>>  - rebase on next-20160927
>>  - simplify the logic in patch 3/3.
>>
>> Changes since v1:
>>  - Correct patch 1 with the error found by kbuild.
>>  - Add Alexandre's Acked-by on patch 2
>>  - Rewrite patch 3 logic in the light of the on-going discussion
>>with Cyrille and Alexandre.
>>
> 
> The changelog has to go after the --- marker.

You're right.
thanks !
>> * the list may not be exhaustive
>>
>> Signed-off-by: Richard Genoud 
> 
> Acked-by: Alexandre Belloni 
> 
>> ---
>>  drivers/tty/serial/atmel_serial.c | 25 +
>>  1 file changed, 21 insertions(+), 4 deletions(-)
>>
>>  I think this should go in the stable tree since it fixes the flow
>>  control broken since v4.0.
>>  But It won't compile on versions before 4.9rc1 because:
>>  function atmel_use_fifo was introduced in 4.4.12 / 4.7
>>  variable atmel_port was introduced in 4.9rc1
>>
>>  That's why I didn't add the Cc stable in the email body.
>>
>>
>> diff --git a/drivers/tty/serial/atmel_serial.c b/drivers/tty/serial/atmel_serial.c
>> index fd8aa1f4ba78..2c7c45904ba7 100644
>> --- a/drivers/tty/serial/atmel_serial.c
>> +++ b/drivers/tty/serial/atmel_serial.c
>> @@ -2132,11 +2132,28 @@ static void atmel_set_termios(struct uart_port *port, struct ktermios *termios,
>>  mode |= ATMEL_US_USMODE_RS485;
>>  } else if (termios->c_cflag & CRTSCTS) {
>>  /* RS232 with hardware handshake (RTS/CTS) */
>> -if (atmel_use_dma_rx(port) && !atmel_use_fifo(port)) {
>> -dev_info(port->dev, "not enabling hardware flow control because DMA is used");
>> -termios->c_cflag &= ~CRTSCTS;
>> -} else {
>> +if (atmel_use_fifo(port) &&
>> +!mctrl_gpio_to_gpiod(atmel_port->gpios, UART_GPIO_CTS)) {
>> +/*
>> + * with ATMEL_US_USMODE_HWHS set, the controller will
>> + * be able to drive the RTS pin high/low when the RX
>> + * FIFO is above RXFTHRES/below RXFTHRES2.
>> + * It will also disable the transmitter when the CTS
>> + * pin is high.
>> + * This mode is not activated if CTS pin is a GPIO
>> + * because in this case, the transmitter is always
>> + * disabled.
>> + * If the RTS pin is a GPIO, the controller won't be
>> + * able to drive it according to the FIFO thresholds,
>> + * but it will be handled by the driver.
>> + */
>>  mode |= ATMEL_US_USMODE_HWHS;
>> +} else {
>> +/*
>> + * For platforms without FIFO, the flow control is
>> + * handled by the driver.
>> + */
>> +mode |= ATMEL_US_USMODE_NORMAL;
>>  }
>>  } else {
>>  /* RS232 without hadware handshake */
>
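The choice the new `atmel_set_termios()` code makes for RS232 with CRTSCTS can be read as a small truth table: hardware handshake mode only when the port has a FIFO and CTS is not a GPIO, normal mode (flow control left to the driver) otherwise. The sketch below isolates that decision; the enum and function names are hypothetical, and the two booleans stand in for `atmel_use_fifo(port)` and `mctrl_gpio_to_gpiod(atmel_port->gpios, UART_GPIO_CTS) != NULL` from the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for ATMEL_US_USMODE_NORMAL / ATMEL_US_USMODE_HWHS. */
enum usart_mode { USMODE_NORMAL, USMODE_HWHS };

/*
 * has_fifo:    the USART has a FIFO (atmel_use_fifo() in the patch)
 * cts_is_gpio: CTS is routed through a GPIO instead of the USART pin
 */
static enum usart_mode rs232_crtscts_mode(bool has_fifo, bool cts_is_gpio)
{
	/* FIFO plus native CTS: the controller drives RTS from the RX FIFO
	 * thresholds and gates the transmitter on CTS. */
	if (has_fifo && !cts_is_gpio)
		return USMODE_HWHS;
	/* Otherwise the driver handles flow control itself. */
	return USMODE_NORMAL;
}
```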

Re: [PATCH] IB/mlx4: avoid a -Wmaybe-uninitialize warning

2016-10-25 Thread Yishai Hadas


On 10/25/2016 7:16 PM, Arnd Bergmann wrote:

There is an old warning about mlx4_SW2HW_EQ_wrapper on x86:

ethernet/mellanox/mlx4/resource_tracker.c: In function ‘mlx4_SW2HW_EQ_wrapper’:
ethernet/mellanox/mlx4/resource_tracker.c:3071:10: error: ‘eq’ may be used uninitialized in this function [-Werror=maybe-uninitialized]

The problem here is that gcc won't track the state of the variable
across a spin_unlock. Moving the assignment out of the lock is
safe here and avoids the warning.

Signed-off-by: Arnd Bergmann 


Reviewed-by: Yishai Hadas 


---
 drivers/net/ethernet/mellanox/mlx4/resource_tracker.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
index 84d7857ccc27..c548beaaf910 100644
--- a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
+++ b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
@@ -1605,13 +1605,14 @@ static int eq_res_start_move_to(struct mlx4_dev *dev, int slave, int index,
r->com.from_state = r->com.state;
r->com.to_state = state;
r->com.state = RES_EQ_BUSY;
-   if (eq)
-   *eq = r;
}
}

spin_unlock_irq(mlx4_tlock(dev));

+   if (!err && eq)
+   *eq = r;
+
return err;
 }
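The pattern in this fix (do the work under the lock, then publish the result through the optional output pointer only after unlocking, and only on success) is not specific to mlx4. Below is a user-space sketch in which a C11 `atomic_flag` spin lock stands in for `spin_lock_irq()`/`spin_unlock_irq()`; `struct table`, `table_start_move()` and the chosen error codes are hypothetical.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

#define TABLE_SIZE 4

struct entry {
	int state;
};

/* Hypothetical resource table guarded by a spin lock. */
struct table {
	atomic_flag lock;
	struct entry entries[TABLE_SIZE];
};

static void table_lock(struct table *t)
{
	while (atomic_flag_test_and_set_explicit(&t->lock, memory_order_acquire))
		;	/* spin until the holder clears the flag */
}

static void table_unlock(struct table *t)
{
	atomic_flag_clear_explicit(&t->lock, memory_order_release);
}

/*
 * Mark entry index busy and optionally return it through *out.
 * *out is written only after the lock is dropped and only when err == 0,
 * mirroring the "if (!err && eq) *eq = r;" placement in the patch.
 */
static int table_start_move(struct table *t, int index, int busy_state,
			    struct entry **out)
{
	struct entry *r = NULL;
	int err = 0;

	table_lock(t);
	if (index < 0 || index >= TABLE_SIZE) {
		err = -ENOENT;
	} else {
		r = &t->entries[index];
		if (r->state == busy_state)
			err = -EBUSY;
		else
			r->state = busy_state;
	}
	table_unlock(t);

	if (!err && out)
		*out = r;
	return err;
}
```

Keeping the output assignment outside the critical section also keeps the locked region as short as possible.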

Re: [PATCH 19/28] brcmfmac: avoid maybe-uninitialized warning in brcmf_cfg80211_start_ap

2016-10-25 Thread Kalle Valo

Arnd Bergmann  writes:

> A bugfix added a sanity check around the assignment and use of the
> 'is_11d' variable, which looks correct to me, but as the function is
> rather complex already, this confuses the compiler to the point where
> it can no longer figure out if the variable is always initialized
> correctly:
>
> brcm80211/brcmfmac/cfg80211.c: In function ‘brcmf_cfg80211_start_ap’:
> brcm80211/brcmfmac/cfg80211.c:4586:10: error: ‘is_11d’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
>
> This adds an initialization for the newly introduced case in which
> the variable should not really be used, in order to make the warning
> go away.
>
> Fixes: b3589dfe0212 ("brcmfmac: ignore 11d configuration errors")
> Cc: Hante Meuleman 
> Cc: Arend van Spriel 
> Cc: Kalle Valo 
> Signed-off-by: Arnd Bergmann 

Via which tree are you planning to submit this? Should I take it?

-- 
Kalle Valo
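The kind of change Arnd describes (giving a variable a defined value on the path where it is not expected to be used, so the compiler can prove it is always initialized) looks roughly like this. The sketch is not the brcmfmac code: `query_regdomain_11d()` and `start_ap_model()` are hypothetical, and treating a failed query as "11d off" is an assumed default.

```c
#include <stdbool.h>

/* Hypothetical firmware query: on success stores the 11d flag. */
static int query_regdomain_11d(bool fw_ok, bool fw_value, bool *is_11d)
{
	if (!fw_ok)
		return -1;	/* *is_11d is left untouched on failure */
	*is_11d = fw_value;
	return 0;
}

/* Returns whether 11d would be enabled when starting the AP. */
static bool start_ap_model(bool fw_ok, bool fw_value)
{
	bool is_11d;

	if (query_regdomain_11d(fw_ok, fw_value, &is_11d) < 0) {
		/* The error is ignored, so give the variable an explicit
		 * value on this path as well. */
		is_11d = false;
	}
	return is_11d;
}
```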

Re: [PATCH v2] staging: vc04_services: Replace dmac_map_area with dmac_map_sg

2016-10-25 Thread Greg KH

On Tue, Oct 25, 2016 at 07:23:27PM -0700, Michael Zoran wrote:
> The original arm implementation uses dmac_map_area which is not
> portable.  Replace it with an architecture neutral version
> which uses dma_map_sg.
> 
> As you can see that for larger page sizes, the dma_map_sg
> implementation is faster then the original unportable dma_map_area
> implementation.
> 
> Test                 dmac_map_area  dma_map_page  dma_map_sg
>                          (us/iter)     (us/iter)   (us/iter)
> vchiq_test -b 4 1               51            76          76
> vchiq_test -b 8 1               70            82          91
> vchiq_test -b 16 1              94           118         121
> vchiq_test -b 32 1             146           173         187
> vchiq_test -b 64 1             263           328         299
> vchiq_test -b 128 1            529           631         595
> vchiq_test -b 256 1           2285          2275        2001
> vchiq_test -b 512 1           4372          4616        4123
> 
> For message sizes >= 64KB, dma_map_sg is faster than dma_map_page.
> 
> For message sizes >= 256KB, dma_map_sg is the fastest
> implementation.
> 
> "Normal" message sizes should be about 1MB, which is above the
> size at which this change starts to show a speed increase.
> 
> This is v2 of the patch, which adds extra WARN_ONs and
> incorporates feedback from Eric Anholt.
> 
> Signed-off-by: Michael Zoran 
> ---
>  .../interface/vchiq_arm/vchiq_2835_arm.c   | 152 +
>  1 file changed, 93 insertions(+), 59 deletions(-)

Nice work!

I'd like to get an ack from Eric before applying it...

thanks,

greg k-h

Re: [PATCH v2] drm/mediatek: fixed the calc method of data rate per lane

2016-10-25 Thread CK Hu

Hi, Jitao:

On Tue, 2016-10-25 at 13:40 +0800, Jitao Shi wrote:
> The DSI frame rate is tuned by the pixel clock. When entering and exiting
> LP mode, DSI adds extra signals (i.e. Tlpx, Ths-prepare, Ths-zero,
> Ths-trail, Ths-exit); these make the horizontal time longer than normal
> and reduce the FPS. The data rate therefore has to be multiplied by a
> coefficient that offsets the effect of these extra signals:
> coefficient = ((htotal*bpp/lane_number)+Tlpx+Ths_prep+Ths_zero+Ths_trail+
> Ths_exit)/(htotal*bpp/lane_number)
> 
> Signed-off-by: Jitao Shi 
> ---
> Change since v1:
>  - phy_timing2 and phy_timing3 refer to the clock cycle time.
>  - define values of LPX, HS_PRPR, HS_ZERO, HS_TRAIL, TA_GO, TA_SURE,
>    TA_GET and DA_HS_EXIT
> ---
>  drivers/gpu/drm/mediatek/mtk_dsi.c |  103 +++-
>  1 file changed, 67 insertions(+), 36 deletions(-)
> 
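The data-rate arithmetic in the quoted mtk_dsi_poweron() hunk can be checked in user space. This is a sketch that mirrors that computation: the overhead cycle count is a plain parameter instead of the LPX/HS_* register macros, div_round_up_u64() stands in for DIV_ROUND_UP_ULL, and the sample timings used below are assumptions, not MediaTek values.

```c
#include <assert.h>
#include <stdint.h>

/* Round-up 64-bit division (user-space stand-in for DIV_ROUND_UP_ULL). */
static uint64_t div_round_up_u64(uint64_t n, uint64_t d)
{
	assert(d != 0);
	return (n + d - 1) / d;
}

/*
 * Per-lane DSI data rate in Hz, following the quoted hunk:
 *   bit_clock     = pixelclock (kHz) * 1000 * bpp
 *   htotal_bits   = htotal * bpp
 *   overhead_bits = overhead_cycles * lanes * 8
 *   data_rate     = bit_clock * (htotal_bits + overhead_bits)
 *                   / (htotal_bits * lanes), rounded up
 */
static uint64_t dsi_data_rate_hz(uint32_t pixelclock_khz, uint32_t bpp,
				 uint32_t lanes, uint32_t htotal,
				 uint32_t overhead_cycles)
{
	uint64_t bit_clock = (uint64_t)pixelclock_khz * 1000 * bpp;
	uint64_t htotal_bits = (uint64_t)htotal * bpp;
	uint64_t overhead_bits = (uint64_t)overhead_cycles * lanes * 8;
	uint64_t total_bits = htotal_bits + overhead_bits;

	return div_round_up_u64(bit_clock * total_bits, htotal_bits * lanes);
}
```

With an assumed 148500 kHz pixel clock, 24 bpp, 4 lanes and htotal = 2200, zero overhead gives 891000000 Hz per lane; 10 overhead cycles raise it to 896400000 Hz, i.e. a coefficient of 53120/52800 (about 1.006).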

[snip...]

>  
> -static void dsi_phy_timconfig(struct mtk_dsi *dsi)
> +static void dsi_phy_timconfig(struct mtk_dsi *dsi, u32 phy_timing0,
> +   u32 phy_timing1, u32 phy_timing2,
> +   u32 phy_timing3)
>  {
> - u32 timcon0, timcon1, timcon2, timcon3;
> - unsigned int ui, cycle_time;
> - unsigned int lpx;
> -
> - ui = 1000 / dsi->data_rate + 0x01;
> - cycle_time = 8000 / dsi->data_rate + 0x01;
> - lpx = 5;
> -
> - timcon0 = (8 << 24) | (0xa << 16) | (0x6 << 8) | lpx;
> - timcon1 = (7 << 24) | (5 * lpx << 16) | ((3 * lpx) / 2) << 8 |
> -   (4 * lpx);
> - timcon2 = ((NS_TO_CYCLE(0x64, cycle_time) + 0xa) << 24) |
> -   (NS_TO_CYCLE(0x150, cycle_time) << 16);
> - timcon3 = (2 * lpx) << 16 | NS_TO_CYCLE(80 + 52 * ui, cycle_time) << 8 |
> -NS_TO_CYCLE(0x40, cycle_time);
> -
> - writel(timcon0, dsi->regs + DSI_PHY_TIMECON0);
> - writel(timcon1, dsi->regs + DSI_PHY_TIMECON1);
> - writel(timcon2, dsi->regs + DSI_PHY_TIMECON2);
> - writel(timcon3, dsi->regs + DSI_PHY_TIMECON3);

Why do you move these calculations to mtk_dsi_poweron()? You could keep the
calculation here and just make some modifications.

Regards,
CK

> + writel(phy_timing0, dsi->regs + DSI_PHY_TIMECON0);
> + writel(phy_timing1, dsi->regs + DSI_PHY_TIMECON1);
> + writel(phy_timing2, dsi->regs + DSI_PHY_TIMECON2);
> + writel(phy_timing3, dsi->regs + DSI_PHY_TIMECON3);
>  }
>  
>  static void mtk_dsi_enable(struct mtk_dsi *dsi)
> @@ -202,19 +188,51 @@ static int mtk_dsi_poweron(struct mtk_dsi *dsi)
>  {
>   struct device *dev = dsi->dev;
>   int ret;
> + u64 bit_clock, total_bits;
> + u32 htotal, htotal_bits, bit_per_pixel, overhead_cycles, overhead_bits;
> + u32 phy_timing0, phy_timing1, phy_timing2, phy_timing3;
> + u32 ui, cycle_time;
>  
>   if (++dsi->refcount != 1)
>   return 0;
>  
> + switch (dsi->format) {
> + case MIPI_DSI_FMT_RGB565:
> + bit_per_pixel = 16;
> + break;
> + case MIPI_DSI_FMT_RGB666_PACKED:
> + bit_per_pixel = 18;
> + break;
> + case MIPI_DSI_FMT_RGB666:
> + case MIPI_DSI_FMT_RGB888:
> + default:
> + bit_per_pixel = 24;
> + break;
> + }
> + /**
> +  * data_rate = (pixel_clock) * bit_per_pixel * mipi_ratio / lane_num;
> +  * vm.pixelclock is Khz, data_rata unit is Hz, so need to multiply 1000
> +  * mipi_ratio is (htotal * byte_per_pixel / lane_num + Tlpx + Ths_prep
> +  *+ Thstrail + Ths_exit + Ths_zero) /
> +  *   (htotal * byte_per_pixel /lane_number)
> +  */
> + bit_clock = dsi->vm.pixelclock * 1000 * bit_per_pixel;
> + htotal = dsi->vm.hactive + dsi->vm.hback_porch + dsi->vm.hfront_porch +
> +  dsi->vm.hsync_len;
> + htotal_bits = htotal * bit_per_pixel;
> +
>   /**
> -  * data_rate = (pixel_clock / 1000) * pixel_dipth * mipi_ratio;
> -  * pixel_clock unit is Khz, data_rata unit is MHz, so need divide 1000.
> -  * mipi_ratio is mipi clk coefficient for balance the pixel clk in mipi.
> -  * we set mipi_ratio is 1.05.
> +  * overhead = lpx + hs_prepare + hs_zero + hs_trail + hs_exit
>*/
> - dsi->data_rate = dsi->vm.pixelclock * 3 * 21 / (1 * 1000 * 10);
> + overhead_cycles = LPX + (HS_PRPR >> 8) + (HS_ZERO >> 16) +
> +   (HS_TRAIL >> 24) + (DA_HS_EXIT >> 24);
> + overhead_bits = overhead_cycles * dsi->lanes * 8;
> + total_bits = htotal_bits + overhead_bits;
>  
> - ret = clk_set_rate(dsi->hs_clk, dsi->data_rate * 100);
> + dsi->data_rate = DIV_ROUND_UP_ULL(bit_clock * total_bits,
> +   htotal_bits * dsi->lanes);
> +
> + ret = clk_set_rate(dsi->hs_clk, dsi->data_rate);
>   if (ret < 0) {
>   dev_err(dev, "Failed to set data rate: %d\n", ret);
>   goto err_refcount;
> @@ -236,7 +254,20 @@ static int mtk_dsi_poweron(struct mtk_dsi *dsi)
>  
>   mtk_dsi_enable(dsi);
>   mtk_dsi_reset(dsi);
> - dsi_

Re: [PATCH] crypto: caam: fix type mismatch warning

2016-10-25 Thread Horia Geanta Neag

On 10/26/2016 12:29 AM, Arnd Bergmann wrote:
> Building the caam driver on arm64 produces a harmless warning:
> 
> drivers/crypto/caam/caamalg.c:140:139: warning: comparison of distinct pointer types lacks a cast
> 
> We can use min_t to tell the compiler which type we want it to use
> here.
> 
> Fixes: 5ecf8ef9103c ("crypto: caam - fix sg dump")
> Signed-off-by: Arnd Bergmann 
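The fix relies on the difference between the kernel's min(), which compares the addresses of two temporaries so that mismatched operand types trigger "comparison of distinct pointer types", and min_t(), which first converts both operands to one named type. A user-space sketch of both macros, modeled on (not copied from) the definitions in include/linux/kernel.h; it assumes GCC or Clang for __typeof__ and statement expressions.

```c
#include <stddef.h>

/*
 * Strict min: the pointer comparison is a compile-time type check. If x and
 * y have different types, &_x and &_y are distinct pointer types and the
 * compiler warns, as in the caam build.
 */
#define strict_min(x, y) ({			\
	__typeof__(x) _x = (x);			\
	__typeof__(y) _y = (y);			\
	(void)(&_x == &_y);			\
	_x < _y ? _x : _y; })

/* min_t: convert both operands to an explicit type first, so no warning. */
#define min_t(type, x, y) ({			\
	type _a = (x);				\
	type _b = (y);				\
	_a < _b ? _a : _b; })
```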
Reviewed-by: Horia Geantă 

Thanks,
Horia

[PATCH V3 1/9] PM / OPP: Reword binding supporting multiple regulators per device

2016-10-25 Thread Viresh Kumar

On certain platforms (like TI), DVFS for a single device (CPU) requires
configuring multiple power supplies.

The OPP bindings already contain a binding and an example for this
case, but they aren't sufficient. For example, there is no way for the code
parsing these bindings to know which voltage values belong to which
power supply. It is also not possible to know the order in which the
supplies need to be configured while switching OPPs.

This patch tries to clarify those details and makes some minor changes
as well.

Note that the bindings specify neither the order in which the regulators
need to be programmed nor the order in which the entries for the
supplies are added.

The user of the bindings (like the kernel) shall already know these
details, and the DT is responsible only for supplying the values for the
regulators.
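The reworded binding in the diff below requires every regulator's entry in opp-microvolt to have the same size: either one target value per supply or one triplet per supply, each in its own <> field. A minimal validation sketch, assuming the property has already been read into a flat array of u32 cells in supply order; struct supply_uv and parse_microvolt() are hypothetical names, and the triplet order (target, min, max) follows the existing opp-microvolt convention.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-supply voltage values in microvolts. */
struct supply_uv {
	unsigned long target, min, max;
};

/*
 * Split an opp-microvolt cell array among nreg regulators.
 * Accepts either nreg cells (one target per supply) or 3 * nreg cells
 * (target/min/max per supply). Returns 0 on success or -EINVAL.
 */
static int parse_microvolt(const uint32_t *cells, size_t ncells,
			   size_t nreg, struct supply_uv *out)
{
	size_t i;

	if (nreg == 0 || (ncells != nreg && ncells != 3 * nreg))
		return -EINVAL;

	for (i = 0; i < nreg; i++) {
		if (ncells == nreg) {
			/* Single value: target == min == max */
			out[i].target = out[i].min = out[i].max = cells[i];
		} else {
			out[i].target = cells[3 * i];
			out[i].min = cells[3 * i + 1];
			out[i].max = cells[3 * i + 2];
		}
	}
	return 0;
}
```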

Cc: Mark Brown 
Cc: devicet...@vger.kernel.org
Signed-off-by: Viresh Kumar 
Acked-by: Rob Herring 
Reviewed-by: Stephen Boyd 
---
 Documentation/devicetree/bindings/opp/opp.txt | 25 +
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/Documentation/devicetree/bindings/opp/opp.txt b/Documentation/devicetree/bindings/opp/opp.txt
index ee91cbdd95ee..af476df510f1 100644
--- a/Documentation/devicetree/bindings/opp/opp.txt
+++ b/Documentation/devicetree/bindings/opp/opp.txt
@@ -86,8 +86,13 @@ properties.
   Single entry is for target voltage and three entries are for 
   voltages.
 
-  Entries for multiple regulators must be present in the same order as
-  regulators are specified in device's DT node.
+  Entries for multiple regulators shall be provided in the same field separated
+  by angular brackets <>. The OPP binding doesn't provide any provisions to
+  relate the values to their power supplies or the order in which the supplies
+  need to be configured.
+
+  Entries for all regulators shall be of the same size, i.e. either all use a
+  single value or triplets.
 
 - opp-microvolt-: Named opp-microvolt property. This is exactly similar to
   the above opp-microvolt property, but allows multiple voltage ranges to be
@@ -104,10 +109,12 @@ properties.
 
   Should only be set if opp-microvolt is set for the OPP.
 
-  Entries for multiple regulators must be present in the same order as
-  regulators are specified in device's DT node. If this property isn't required
-  for few regulators, then this should be marked as zero for them. If it isn't
-  required for any regulator, then this property need not be present.
+  Entries for multiple regulators shall be provided in the same field separated
+  by angular brackets <>. If current values aren't required for a regulator,
+  then it shall be filled with 0. If current values aren't required for any of
+  the regulators, then this field is not required. The OPP binding doesn't
+  provide any provisions to relate the values to their power supplies or the
+  order in which the supplies need to be configured.
 
 - opp-microamp-: Named opp-microamp property. Similar to
   opp-microvolt- property, but for microamp instead.
@@ -386,10 +393,12 @@ Example 4: Handling multiple regulators
 / {
cpus {
cpu@0 {
-   compatible = "arm,cortex-a7";
+   compatible = "vendor,cpu-type";
...
 
-   cpu-supply = <&cpu_supply0>, <&cpu_supply1>, <&cpu_supply2>;
+   vcc0-supply = <&cpu_supply0>;
+   vcc1-supply = <&cpu_supply1>;
+   vcc2-supply = <&cpu_supply2>;
operating-points-v2 = <&cpu0_opp_table>;
};
};
-- 
2.7.1.410.g6faf27b

[PATCH V3 9/9] PM / OPP: Don't assume platform doesn't have regulators

2016-10-25 Thread Viresh Kumar

If the regulators aren't set explicitly by the platform, the OPP core
assumes that the platform doesn't have any regulator and uses the
clk-only callback.

If the platform failed to register a regulator with the core, then this
can turn out to be a dangerous assumption as the OPP core will try to
change clk without changing regulators.

Handle that properly by making sure that the DT didn't have any entries
for supply voltages either.
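The decision this patch adds to dev_pm_opp_set_rate() can be summarized as a small pure function. A sketch with hypothetical types: with no regulators registered, fall back to clock-only scaling, unless the OPP carries a non-zero supply voltage from DT, which suggests the platform failed to register its regulator.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical outcome of the dispatch decision. */
enum opp_path { OPP_CLK_ONLY, OPP_WITH_REGULATORS };

/*
 * have_regulators: whether regulators were registered with the OPP core.
 * first_supply_uv: u_volt of the first supply parsed from DT (0 if none).
 * Returns 0 and sets *path, or -EINVAL when DT has voltages but no
 * regulator was registered (changing only the clock would be unsafe).
 */
static int choose_opp_path(bool have_regulators, unsigned long first_supply_uv,
			   enum opp_path *path)
{
	if (have_regulators) {
		*path = OPP_WITH_REGULATORS;
		return 0;
	}
	if (first_supply_uv != 0)
		return -EINVAL;
	*path = OPP_CLK_ONLY;
	return 0;
}
```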

Signed-off-by: Viresh Kumar 
---
 drivers/base/power/opp/core.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/base/power/opp/core.c b/drivers/base/power/opp/core.c
index 3298fac01bb0..34cd48dfe89e 100644
--- a/drivers/base/power/opp/core.c
+++ b/drivers/base/power/opp/core.c
@@ -734,7 +734,17 @@ int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq)
 
/* Only frequency scaling */
if (!regulators) {
-   rcu_read_unlock();
+   /*
+* DT contained supply ratings? Consider platform failed to set
+* regulators.
+*/
+   if (unlikely(opp->supplies[0].u_volt)) {
+   rcu_read_unlock();
+   dev_err(dev, "%s: Regulator not registered with OPP core\n",
+   __func__);
+   return -EINVAL;
+   }
+
return _generic_set_opp_clk_only(dev, clk, old_freq, freq);
}
 
-- 
2.7.1.410.g6faf27b

[PATCH V3 8/9] PM / OPP: Don't WARN on multiple calls to dev_pm_opp_set_regulators()

2016-10-25 Thread Viresh Kumar

If a platform-specific OPP driver has called this routine first and set
the regulators, then the second call from the cpufreq-dt driver will hit
the WARN_ON(). Remove the WARN_ON(), but continue to return an error in
such cases.

Signed-off-by: Viresh Kumar 
Reviewed-by: Stephen Boyd 
Tested-by: Dave Gerlach 
---
 drivers/base/power/opp/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/base/power/opp/core.c b/drivers/base/power/opp/core.c
index f4f6b1fdbe06..3298fac01bb0 100644
--- a/drivers/base/power/opp/core.c
+++ b/drivers/base/power/opp/core.c
@@ -1469,7 +1469,7 @@ int dev_pm_opp_set_regulators(struct device *dev, const char * const names[],
}
 
/* Already have regulators set */
-   if (WARN_ON(opp_table->regulators)) {
+   if (opp_table->regulators) {
ret = -EBUSY;
goto err;
}
-- 
2.7.1.410.g6faf27b

[PATCH V3 4/9] PM / OPP: Pass struct dev_pm_opp_supply to _set_opp_voltage()

2016-10-25 Thread Viresh Kumar

Pass the entire supply structure instead of all of its fields.
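Passing the whole supply structure keeps call sites short and lets a complete supply be copied in one assignment. A user-space sketch: the voltage field names follow struct dev_pm_opp_supply as used in the diff (the real structure also carries a current value), while fake_regulator and set_voltage_triplet() are hypothetical stand-ins for the regulator framework that only validate ordering.

```c
#include <errno.h>

/* Voltage fields as used in the diff (microvolts). */
struct opp_supply {
	unsigned long u_volt;
	unsigned long u_volt_min;
	unsigned long u_volt_max;
};

/* Hypothetical regulator stand-in: remembers the last accepted target. */
struct fake_regulator {
	unsigned long cur_uv;
};

static int set_voltage_triplet(struct fake_regulator *reg, unsigned long min,
			       unsigned long target, unsigned long max)
{
	if (min > target || target > max)
		return -EINVAL;
	reg->cur_uv = target;
	return 0;
}

/* Same shape as the new _set_opp_voltage(): one pointer, not three values. */
static int set_opp_voltage(struct fake_regulator *reg,
			   const struct opp_supply *supply)
{
	return set_voltage_triplet(reg, supply->u_volt_min, supply->u_volt,
				   supply->u_volt_max);
}
```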

Signed-off-by: Viresh Kumar 
Tested-by: Dave Gerlach 
---
 drivers/base/power/opp/core.c | 44 +--
 1 file changed, 17 insertions(+), 27 deletions(-)

diff --git a/drivers/base/power/opp/core.c b/drivers/base/power/opp/core.c
index 8d6006151c9a..37fad2eb0f47 100644
--- a/drivers/base/power/opp/core.c
+++ b/drivers/base/power/opp/core.c
@@ -542,8 +542,7 @@ static struct clk *_get_opp_clk(struct device *dev)
 }
 
 static int _set_opp_voltage(struct device *dev, struct regulator *reg,
-   unsigned long u_volt, unsigned long u_volt_min,
-   unsigned long u_volt_max)
+   struct dev_pm_opp_supply *supply)
 {
int ret;
 
@@ -554,14 +553,15 @@ static int _set_opp_voltage(struct device *dev, struct regulator *reg,
return 0;
}
 
-   dev_dbg(dev, "%s: voltages (mV): %lu %lu %lu\n", __func__, u_volt_min,
-   u_volt, u_volt_max);
+   dev_dbg(dev, "%s: voltages (mV): %lu %lu %lu\n", __func__,
+   supply->u_volt_min, supply->u_volt, supply->u_volt_max);
 
-   ret = regulator_set_voltage_triplet(reg, u_volt_min, u_volt,
-   u_volt_max);
+   ret = regulator_set_voltage_triplet(reg, supply->u_volt_min,
+   supply->u_volt, supply->u_volt_max);
if (ret)
dev_err(dev, "%s: failed to set voltage (%lu %lu %lu mV): %d\n",
-   __func__, u_volt_min, u_volt, u_volt_max, ret);
+   __func__, supply->u_volt_min, supply->u_volt,
+   supply->u_volt_max, ret);
 
return ret;
 }
@@ -583,8 +583,7 @@ int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq)
struct regulator *reg;
struct clk *clk;
unsigned long freq, old_freq;
-   unsigned long u_volt, u_volt_min, u_volt_max;
-   unsigned long old_u_volt, old_u_volt_min, old_u_volt_max;
+   struct dev_pm_opp_supply old_supply, new_supply;
int ret;
 
if (unlikely(!target_freq)) {
@@ -634,17 +633,12 @@ int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq)
return ret;
}
 
-   if (IS_ERR(old_opp)) {
-   old_u_volt = 0;
-   } else {
-   old_u_volt = old_opp->supply.u_volt;
-   old_u_volt_min = old_opp->supply.u_volt_min;
-   old_u_volt_max = old_opp->supply.u_volt_max;
-   }
+   if (IS_ERR(old_opp))
+   old_supply.u_volt = 0;
+   else
+   memcpy(&old_supply, &old_opp->supply, sizeof(old_supply));
 
-   u_volt = opp->supply.u_volt;
-   u_volt_min = opp->supply.u_volt_min;
-   u_volt_max = opp->supply.u_volt_max;
+   memcpy(&new_supply, &opp->supply, sizeof(new_supply));
 
reg = opp_table->regulator;
 
@@ -652,8 +646,7 @@ int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq)
 
/* Scaling up? Scale voltage before frequency */
if (freq > old_freq) {
-   ret = _set_opp_voltage(dev, reg, u_volt, u_volt_min,
-  u_volt_max);
+   ret = _set_opp_voltage(dev, reg, &new_supply);
if (ret)
goto restore_voltage;
}
@@ -672,8 +665,7 @@ int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq)
 
/* Scaling down? Scale voltage after frequency */
if (freq < old_freq) {
-   ret = _set_opp_voltage(dev, reg, u_volt, u_volt_min,
-  u_volt_max);
+   ret = _set_opp_voltage(dev, reg, &new_supply);
if (ret)
goto restore_freq;
}
@@ -686,10 +678,8 @@ int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq)
__func__, old_freq);
 restore_voltage:
/* This shouldn't harm even if the voltages weren't updated earlier */
-   if (old_u_volt) {
-   _set_opp_voltage(dev, reg, old_u_volt, old_u_volt_min,
-old_u_volt_max);
-   }
+   if (old_supply.u_volt)
+   _set_opp_voltage(dev, reg, &old_supply);
 
return ret;
 }
-- 
2.7.1.410.g6faf27b

[PATCH V3 7/9] PM / OPP: Allow platform specific custom set_opp() callbacks

2016-10-25 Thread Viresh Kumar

The generic set_opp() handler isn't sufficient for platforms with
complex DVFS. For example, some TI platforms have multiple regulators
for a CPU device. The order in which the various supplies need to be
programmed is only known to the platform code, and it's best to leave it
to that code.

This patch implements APIs to register a platform-specific set_opp()
callback.
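A reduced user-space model of the registration and dispatch logic in this patch: registration is rejected for a NULL callback, after OPPs have been added, or when a helper is already set, and the set-rate path prefers the platform callback over the generic one. All types and names are hypothetical simplifications, not the OPP core's API; locking and the opp_table cleanup on error are omitted.

```c
#include <errno.h>
#include <stddef.h>

enum handler { HANDLER_NONE, HANDLER_GENERIC, HANDLER_PLATFORM };

struct set_opp_data {
	unsigned long old_rate, new_rate;
	enum handler handled_by;	/* records which path ran (demo only) */
};

typedef int (*set_opp_fn)(struct set_opp_data *data);

struct opp_table {
	int nr_opps;		/* OPPs already added */
	set_opp_fn set_opp;	/* optional platform callback */
};

static int generic_set_opp(struct set_opp_data *data)
{
	data->handled_by = HANDLER_GENERIC;
	return 0;
}

/* Example platform callback; a real one would order supplies itself. */
static int platform_set_opp(struct set_opp_data *data)
{
	data->handled_by = HANDLER_PLATFORM;
	return 0;
}

/* Reject NULL, late (OPPs already added) or repeated registration. */
static int register_set_opp_helper(struct opp_table *t, set_opp_fn fn)
{
	if (!fn)
		return -EINVAL;
	if (t->nr_opps > 0 || t->set_opp)
		return -EBUSY;
	t->set_opp = fn;
	return 0;
}

/* Prefer the platform callback, else fall back to the generic handler. */
static int set_rate(struct opp_table *t, struct set_opp_data *data)
{
	set_opp_fn fn = t->set_opp ? t->set_opp : generic_set_opp;

	return fn(data);
}
```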

Signed-off-by: Viresh Kumar 
Tested-by: Dave Gerlach 
---
 drivers/base/power/opp/core.c | 116 +-
 drivers/base/power/opp/opp.h  |   1 +
 include/linux/pm_opp.h|  10 
 3 files changed, 125 insertions(+), 2 deletions(-)

diff --git a/drivers/base/power/opp/core.c b/drivers/base/power/opp/core.c
index dedb08a66e99..f4f6b1fdbe06 100644
--- a/drivers/base/power/opp/core.c
+++ b/drivers/base/power/opp/core.c
@@ -673,6 +673,7 @@ int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq)
 {
struct opp_table *opp_table;
unsigned long freq, old_freq;
+   int (*set_opp)(struct device *dev, struct dev_pm_set_opp_data *data);
struct dev_pm_opp *old_opp, *opp;
struct regulator **regulators;
struct dev_pm_set_opp_data *data;
@@ -737,6 +738,11 @@ int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq)
return _generic_set_opp_clk_only(dev, clk, old_freq, freq);
}
 
+   if (opp_table->set_opp)
+   set_opp = opp_table->set_opp;
+   else
+   set_opp = _generic_set_opp;
+
data = opp_table->set_opp_data;
data->regulators = regulators;
data->regulator_count = opp_table->regulator_count;
@@ -754,7 +760,7 @@ int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq)
 
rcu_read_unlock();
 
-   return _generic_set_opp(dev, data);
+   return set_opp(dev, data);
 }
 EXPORT_SYMBOL_GPL(dev_pm_opp_set_rate);
 
@@ -888,6 +894,9 @@ static void _remove_opp_table(struct opp_table *opp_table)
if (opp_table->regulators)
return;
 
+   if (opp_table->set_opp)
+   return;
+
/* Release clk */
if (!IS_ERR(opp_table->clk))
clk_put(opp_table->clk);
@@ -1486,7 +1495,7 @@ int dev_pm_opp_set_regulators(struct device *dev, const char * const names[],
 
opp_table->regulator_count = count;
 
-   /* Allocate block only once to pass to ->set_rate() */
+   /* Allocate block only once to pass to ->set_opp() */
ret = _allocate_set_opp_data(opp_table);
if (ret)
goto free_regulators;
@@ -1561,6 +1570,109 @@ void dev_pm_opp_put_regulators(struct device *dev)
 EXPORT_SYMBOL_GPL(dev_pm_opp_put_regulators);
 
 /**
+ * dev_pm_opp_register_set_opp_helper() - Register custom OPP set rate helper
+ * @dev: Device for which the helper is getting registered.
+ * @set_opp: Custom set OPP helper.
+ *
+ * This is useful to support complex platforms (like platforms with multiple
+ * regulators per device), instead of the generic OPP set rate helper.
+ *
+ * This must be called before any OPPs are initialized for the device.
+ *
+ * Locking: The internal opp_table and opp structures are RCU protected.
+ * Hence this function internally uses RCU updater strategy with mutex locks
+ * to keep the integrity of the internal data structures. Callers should ensure
+ * that this function is *NOT* called under RCU protection or in contexts where
+ * mutex cannot be locked.
+ */
+int dev_pm_opp_register_set_opp_helper(struct device *dev,
+   int (*set_opp)(struct device *dev, struct dev_pm_set_opp_data *data))
+{
+   struct opp_table *opp_table;
+   int ret;
+
+   if (!set_opp)
+   return -EINVAL;
+
+   mutex_lock(&opp_table_lock);
+
+   opp_table = _add_opp_table(dev);
+   if (!opp_table) {
+   ret = -ENOMEM;
+   goto unlock;
+   }
+
+   /* This should be called before OPPs are initialized */
+   if (WARN_ON(!list_empty(&opp_table->opp_list))) {
+   ret = -EBUSY;
+   goto err;
+   }
+
+   /* Already have custom set_opp helper */
+   if (WARN_ON(opp_table->set_opp)) {
+   ret = -EBUSY;
+   goto err;
+   }
+
+   opp_table->set_opp = set_opp;
+
+   mutex_unlock(&opp_table_lock);
+   return 0;
+
+err:
+   _remove_opp_table(opp_table);
+unlock:
+   mutex_unlock(&opp_table_lock);
+
+   return ret;
+}
+EXPORT_SYMBOL_GPL(dev_pm_opp_register_set_opp_helper);
+
+/**
+ * dev_pm_opp_register_put_opp_helper() - Releases resources blocked for
+ *set_opp helper
+ * @dev: Device for which custom set_opp helper has to be cleared.
+ *
+ * Locking: The internal opp_table and opp structures are RCU protected.
+ * Hence this function internally uses RCU updater strategy with mutex locks
+ * to keep the integrity of the internal data structures. Callers should ensure
+ * that this function is *NOT* called un

[PATCH V3 6/9] PM / OPP: Separate out _generic_opp_set_rate()

2016-10-25 Thread Viresh Kumar

Later patches will add support for custom opp_set_rate callbacks. This
patch separates out the code for the generic opp_set_rate handler to
prepare for that.

Signed-off-by: Viresh Kumar 
Tested-by: Dave Gerlach 
---
 drivers/base/power/opp/core.c | 180 +-
 drivers/base/power/opp/opp.h  |   2 +
 include/linux/pm_opp.h|  33 
 3 files changed, 162 insertions(+), 53 deletions(-)

diff --git a/drivers/base/power/opp/core.c b/drivers/base/power/opp/core.c
index 5a35fdd4b61b..dedb08a66e99 100644
--- a/drivers/base/power/opp/core.c
+++ b/drivers/base/power/opp/core.c
@@ -596,6 +596,69 @@ static int _set_opp_voltage(struct device *dev, struct regulator *reg,
return ret;
 }
 
+static inline int
+_generic_set_opp_clk_only(struct device *dev, struct clk *clk,
+ unsigned long old_freq, unsigned long freq)
+{
+   int ret;
+
+   ret = clk_set_rate(clk, freq);
+   if (ret) {
+   dev_err(dev, "%s: failed to set clock rate: %d\n", __func__,
+   ret);
+   }
+
+   return ret;
+}
+
+static int _generic_set_opp(struct device *dev,
+   struct dev_pm_set_opp_data *data)
+{
+   struct dev_pm_opp_supply *old_supply = data->old_opp.supplies;
+   struct dev_pm_opp_supply *new_supply = data->new_opp.supplies;
+   unsigned long old_freq = data->old_opp.rate, freq = data->new_opp.rate;
+   struct regulator *reg = data->regulators[0];
+   int ret;
+
+   /* This function only supports single regulator per device */
+   if (WARN_ON(data->regulator_count > 1)) {
+   dev_err(dev, "multiple regulators are not supported\n");
+   return -EINVAL;
+   }
+
+   /* Scaling up? Scale voltage before frequency */
+   if (freq > old_freq) {
+   ret = _set_opp_voltage(dev, reg, new_supply);
+   if (ret)
+   goto restore_voltage;
+   }
+
+   /* Change frequency */
+   ret = _generic_set_opp_clk_only(dev, data->clk, old_freq, freq);
+   if (ret)
+   goto restore_voltage;
+
+   /* Scaling down? Scale voltage after frequency */
+   if (freq < old_freq) {
+   ret = _set_opp_voltage(dev, reg, new_supply);
+   if (ret)
+   goto restore_freq;
+   }
+
+   return 0;
+
+restore_freq:
+   if (_generic_set_opp_clk_only(dev, data->clk, freq, old_freq))
+   dev_err(dev, "%s: failed to restore old-freq (%lu Hz)\n",
+   __func__, old_freq);
+restore_voltage:
+   /* This shouldn't harm even if the voltages weren't updated earlier */
+   if (old_supply->u_volt)
+   _set_opp_voltage(dev, reg, old_supply);
+
+   return ret;
+}
+
 /**
  * dev_pm_opp_set_rate() - Configure new OPP based on frequency
  * @dev:device for which we do this operation
@@ -609,12 +672,12 @@ static int _set_opp_voltage(struct device *dev, struct regulator *reg,
 int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq)
 {
struct opp_table *opp_table;
+   unsigned long freq, old_freq;
struct dev_pm_opp *old_opp, *opp;
-   struct regulator *reg = ERR_PTR(-ENXIO);
+   struct regulator **regulators;
+   struct dev_pm_set_opp_data *data;
struct clk *clk;
-   unsigned long freq, old_freq;
-   struct dev_pm_opp_supply old_supply, new_supply;
-   int ret;
+   int ret, size;
 
if (unlikely(!target_freq)) {
dev_err(dev, "%s: Invalid target frequency %lu\n", __func__,
@@ -663,64 +726,35 @@ int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq)
return ret;
}
 
-   if (opp_table->regulators) {
-   /* This function only supports single regulator per device */
-   if (WARN_ON(opp_table->regulator_count > 1)) {
-   dev_err(dev, "multiple regulators not supported\n");
-   rcu_read_unlock();
-   return -EINVAL;
-   }
+   dev_dbg(dev, "%s: switching OPP: %lu Hz --> %lu Hz\n", __func__,
+   old_freq, freq);
 
-   reg = opp_table->regulators[0];
+   regulators = opp_table->regulators;
+
+   /* Only frequency scaling */
+   if (!regulators) {
+   rcu_read_unlock();
+   return _generic_set_opp_clk_only(dev, clk, old_freq, freq);
}
 
+   data = opp_table->set_opp_data;
+   data->regulators = regulators;
+   data->regulator_count = opp_table->regulator_count;
+   data->clk = clk;
+
+   data->old_opp.rate = old_freq;
+   size = sizeof(*opp->supplies) * opp_table->regulator_count;
if (IS_ERR(old_opp))
-   old_supply.u_volt = 0;
+   memset(data->old_opp.supplies, 0, size);
else
-   memcpy(&old_supply, old_opp->supplies, sizeo

[PATCH V3 2/9] PM / OPP: Don't use OPP structure outside of rcu protected section

2016-10-25 Thread Viresh Kumar

The OPP structure must not be used outside of the RCU-protected section.
Cache the values to be used in separate variables instead.
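The rule is that anything needed after rcu_read_unlock() must be copied while the read-side section is still held, because the object may be replaced and freed afterwards. RCU itself is kernel-only, so this user-space sketch uses a plain reader counter as a stand-in for rcu_read_lock()/rcu_read_unlock(); it is not RCU and not thread-safe, it only illustrates the copy-before-unlock pattern. All names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

struct opp_volts {
	unsigned long u_volt, u_volt_min, u_volt_max;
};

/* Hypothetical published object plus a reader count standing in for RCU. */
static struct opp_volts *published;
static int readers;

static void read_lock(void) { readers++; }
static void read_unlock(void) { readers--; }

/* Updater: may free the old object once no reader is inside the section. */
static void replace_published(struct opp_volts next)
{
	struct opp_volts *n = malloc(sizeof(*n));

	if (!n)
		abort();
	assert(readers == 0);
	*n = next;
	free(published);
	published = n;
}

/* Reader: copy the values while protected; callers use only the copy. */
static struct opp_volts snapshot_volts(void)
{
	struct opp_volts copy;

	read_lock();
	copy = *published;
	read_unlock();
	return copy;
}
```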

Signed-off-by: Viresh Kumar 
Tested-by: Dave Gerlach 
---
 drivers/base/power/opp/core.c | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/base/power/opp/core.c b/drivers/base/power/opp/core.c
index 4c7c6da7a989..056527a3fb4e 100644
--- a/drivers/base/power/opp/core.c
+++ b/drivers/base/power/opp/core.c
@@ -584,6 +584,7 @@ int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq)
struct clk *clk;
unsigned long freq, old_freq;
unsigned long u_volt, u_volt_min, u_volt_max;
+   unsigned long old_u_volt, old_u_volt_min, old_u_volt_max;
int ret;
 
if (unlikely(!target_freq)) {
@@ -633,6 +634,14 @@ int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq)
return ret;
}
 
+   if (IS_ERR(old_opp)) {
+   old_u_volt = 0;
+   } else {
+   old_u_volt = old_opp->u_volt;
+   old_u_volt_min = old_opp->u_volt_min;
+   old_u_volt_max = old_opp->u_volt_max;
+   }
+
u_volt = opp->u_volt;
u_volt_min = opp->u_volt_min;
u_volt_max = opp->u_volt_max;
@@ -677,9 +686,10 @@ int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq)
__func__, old_freq);
 restore_voltage:
/* This shouldn't harm even if the voltages weren't updated earlier */
-   if (!IS_ERR(old_opp))
-   _set_opp_voltage(dev, reg, old_opp->u_volt,
-old_opp->u_volt_min, old_opp->u_volt_max);
+   if (old_u_volt) {
+   _set_opp_voltage(dev, reg, old_u_volt, old_u_volt_min,
+old_u_volt_max);
+   }
 
return ret;
 }
-- 
2.7.1.410.g6faf27b

[PATCH V3 5/9] PM / OPP: Add infrastructure to manage multiple regulators

2016-10-25 Thread Viresh Kumar

This patch adds infrastructure to manage multiple regulators and updates
the only user (cpufreq-dt) of dev_pm_opp_set{put}_regulator().

This is preparatory work for adding full support for devices with
multiple regulators.
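With multiple regulators, dev_pm_opp_get_max_volt_latency() in the diff below finds, for each supply, the lowest u_volt_min and highest u_volt_max over all available OPPs, then adds up each regulator's transition time. A user-space sketch of that aggregation: volt_time_us() is a hypothetical linear-slew stand-in for regulator_set_voltage_time(), and the fixed-size supplies array is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_SUPPLIES 4

struct supply_range {
	unsigned long u_volt_min, u_volt_max;
};

struct opp_entry {
	bool available;
	struct supply_range supplies[MAX_SUPPLIES];
};

/* Hypothetical stand-in for regulator_set_voltage_time(): linear slew. */
static unsigned long volt_time_us(unsigned long min_uv, unsigned long max_uv,
				  unsigned long slew_uv_per_us)
{
	return (max_uv - min_uv + slew_uv_per_us - 1) / slew_uv_per_us;
}

/*
 * Worst-case voltage transition latency in ns over 'count' supplies:
 * widest [min, max] range per supply over available OPPs, summed.
 */
static unsigned long max_volt_latency_ns(const struct opp_entry *opps,
					 size_t nopps, size_t count,
					 unsigned long slew_uv_per_us)
{
	unsigned long latency_ns = 0;
	size_t i, j;

	for (i = 0; i < count; i++) {
		unsigned long min = ~0UL, max = 0;

		for (j = 0; j < nopps; j++) {
			if (!opps[j].available)
				continue;
			if (opps[j].supplies[i].u_volt_min < min)
				min = opps[j].supplies[i].u_volt_min;
			if (opps[j].supplies[i].u_volt_max > max)
				max = opps[j].supplies[i].u_volt_max;
		}
		/* Skip supplies for which no OPP is available */
		if (max >= min)
			latency_ns += volt_time_us(min, max, slew_uv_per_us) * 1000;
	}
	return latency_ns;
}
```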

Signed-off-by: Viresh Kumar 
Tested-by: Dave Gerlach 
---
 drivers/base/power/opp/core.c| 220 ++-
 drivers/base/power/opp/debugfs.c |  52 +++--
 drivers/base/power/opp/of.c  | 103 --
 drivers/base/power/opp/opp.h |  10 +-
 drivers/cpufreq/cpufreq-dt.c |   9 +-
 include/linux/pm_opp.h   |   8 +-
 6 files changed, 280 insertions(+), 122 deletions(-)

diff --git a/drivers/base/power/opp/core.c b/drivers/base/power/opp/core.c
index 37fad2eb0f47..5a35fdd4b61b 100644
--- a/drivers/base/power/opp/core.c
+++ b/drivers/base/power/opp/core.c
@@ -93,6 +93,8 @@ struct opp_table *_find_opp_table(struct device *dev)
  * Return: voltage in micro volt corresponding to the opp, else
  * return 0
  *
+ * This is useful only for devices with single power supply.
+ *
  * Locking: This function must be called under rcu_read_lock(). opp is a rcu
  * protected pointer. This means that opp which could have been fetched by
  * opp_find_freq_{exact,ceil,floor} functions is valid as long as we are
@@ -112,7 +114,7 @@ unsigned long dev_pm_opp_get_voltage(struct dev_pm_opp *opp)
if (IS_ERR_OR_NULL(tmp_opp))
pr_err("%s: Invalid parameters\n", __func__);
else
-   v = tmp_opp->supply.u_volt;
+   v = tmp_opp->supplies[0].u_volt;
 
return v;
 }
@@ -222,10 +224,13 @@ unsigned long dev_pm_opp_get_max_volt_latency(struct device *dev)
 {
struct opp_table *opp_table;
struct dev_pm_opp *opp;
-   struct regulator *reg;
+   struct regulator *reg, **regulators;
unsigned long latency_ns = 0;
-   unsigned long min_uV = ~0, max_uV = 0;
-   int ret;
+   int ret, size, i, count;
+   struct {
+   unsigned long min;
+   unsigned long max;
+   } *uV;
 
rcu_read_lock();
 
@@ -235,21 +240,41 @@ unsigned long dev_pm_opp_get_max_volt_latency(struct device *dev)
return 0;
}
 
-   reg = opp_table->regulator;
-   if (IS_ERR(reg)) {
+   count = opp_table->regulator_count;
+
+   if (!count) {
/* Regulator may not be required for device */
rcu_read_unlock();
return 0;
}
 
-   list_for_each_entry_rcu(opp, &opp_table->opp_list, node) {
-   if (!opp->available)
-   continue;
+   size = count * sizeof(*regulators);
+   regulators = kmemdup(opp_table->regulators, size, GFP_KERNEL);
+   if (!regulators) {
+   rcu_read_unlock();
+   return 0;
+   }
 
-   if (opp->supply.u_volt_min < min_uV)
-   min_uV = opp->supply.u_volt_min;
-   if (opp->supply.u_volt_max > max_uV)
-   max_uV = opp->supply.u_volt_max;
+   uV = kmalloc_array(count, sizeof(*uV), GFP_KERNEL);
+   if (!uV) {
+   kfree(regulators);
+   rcu_read_unlock();
+   return 0;
+   }
+
+   for (i = 0; i < count; i++) {
+   uV[i].min = ~0;
+   uV[i].max = 0;
+
+   list_for_each_entry_rcu(opp, &opp_table->opp_list, node) {
+   if (!opp->available)
+   continue;
+
+   if (opp->supplies[i].u_volt_min < uV[i].min)
+   uV[i].min = opp->supplies[i].u_volt_min;
+   if (opp->supplies[i].u_volt_max > uV[i].max)
+   uV[i].max = opp->supplies[i].u_volt_max;
+   }
}
 
rcu_read_unlock();
@@ -258,9 +283,14 @@ unsigned long dev_pm_opp_get_max_volt_latency(struct device *dev)
 * The caller needs to ensure that opp_table (and hence the regulator)
 * isn't freed, while we are executing this routine.
 */
-   ret = regulator_set_voltage_time(reg, min_uV, max_uV);
-   if (ret > 0)
-   latency_ns = ret * 1000;
+   for (i = 0; reg = regulators[i], i < count; i++) {
+   ret = regulator_set_voltage_time(reg, uV[i].min, uV[i].max);
+   if (ret > 0)
+   latency_ns += ret * 1000;
+   }
+
+   kfree(uV);
+   kfree(regulators);
 
return latency_ns;
 }
@@ -580,7 +610,7 @@ int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq)
 {
struct opp_table *opp_table;
struct dev_pm_opp *old_opp, *opp;
-   struct regulator *reg;
+   struct regulator *reg = ERR_PTR(-ENXIO);
struct clk *clk;
unsigned long freq, old_freq;
struct dev_pm_opp_supply old_supply, new_supply;
@@ -633,14 +663,23 @@ int dev_pm_opp_set_rate(struct device *dev, unsigned long target_fr

[PATCH V3 0/9] PM / OPP: Multiple regulator support

2016-10-25 Thread Viresh Kumar

Hi,

Some platforms (like TI) have complex DVFS configurations for CPU
devices, where multiple regulators must be configured to change the
DVFS state of the device. This was explained well by Nishanth
earlier [1].

One of the major complaints about the multiple regulators case was that
the DT isn't responsible, in any way, for representing the order in which
multiple supplies need to be programmed, before or after the frequency
change. This series takes that into account and leaves such information
to the platform-specific OPP driver, which can register its own
opp_set_rate() callback with the OPP core; the OPP core then calls it
during DVFS.
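For reference, the ordering a platform callback takes over is the one the generic single-regulator handler in this series follows: raise the voltage before raising the frequency, lower it after lowering the frequency, and roll back on failure. A user-space sketch with a hypothetical mock clock/regulator state and a single supply; it models that ordering, not the OPP core itself.

```c
#include <stdbool.h>

/* Hypothetical mock hardware state: one clock, one supply. */
struct dvfs_state {
	unsigned long freq_hz;
	unsigned long volt_uv;
	bool fail_clk;		/* inject a clock failure for testing */
};

static int set_volt(struct dvfs_state *s, unsigned long uv)
{
	s->volt_uv = uv;
	return 0;
}

static int set_clk(struct dvfs_state *s, unsigned long hz)
{
	if (s->fail_clk)
		return -1;
	s->freq_hz = hz;
	return 0;
}

/* Voltage before frequency when scaling up, after it when scaling down. */
static int dvfs_switch(struct dvfs_state *s, unsigned long new_hz,
		       unsigned long new_uv)
{
	unsigned long old_hz = s->freq_hz, old_uv = s->volt_uv;
	int ret;

	if (new_hz > old_hz) {
		ret = set_volt(s, new_uv);
		if (ret)
			goto restore_voltage;
	}

	ret = set_clk(s, new_hz);
	if (ret)
		goto restore_voltage;

	if (new_hz < old_hz) {
		ret = set_volt(s, new_uv);
		if (ret)
			goto restore_freq;
	}
	return 0;

restore_freq:
	set_clk(s, old_hz);
restore_voltage:
	set_volt(s, old_uv);
	return ret;
}
```

A platform with several supplies would replace this function with its own callback that sequences each regulator around the clock change.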

The patches are tested on Exynos5250 (Dual A15). I have hacked around DT
and code to pass values for multiple regulators and verified that they
are all properly read by the kernel (using debugfs interface).

Dave Gerlach has already tested it on the real TI platforms and it works
well for him.

This is rebased on the linux-next branch of the PM tree.

V2->V3:
- The last patch is new
- Removed a debug leftover pr_info() message
- Renamed few names as s/set_rate/set_opp
- Removed a TODO comment (as it is done now with this series)
- created struct for min_uV and max_uV
- kerneldoc comments for structures in pm_opp.h
- s/const char */const char * const
- use kasprintf()
- Some more minor reformatting
- More Ack/RBY tags added

V1->V2:
- Ack from Rob for 1st patch
- Moved the supplies structure to pm_opp.h (Dave)
- Fixed a compilation warning.

--
viresh

[1] https://marc.info/?l=linux-pm&m=145684495832764&w=2

Viresh Kumar (9):
  PM / OPP: Reword binding supporting multiple regulators per device
  PM / OPP: Don't use OPP structure outside of rcu protected section
  PM / OPP: Manage supply's voltage/current in a separate structure
  PM / OPP: Pass struct dev_pm_opp_supply to _set_opp_voltage()
  PM / OPP: Add infrastructure to manage multiple regulators
  PM / OPP: Separate out _generic_opp_set_rate()
  PM / OPP: Allow platform specific custom set_opp() callbacks
  PM / OPP: Don't WARN on multiple calls to dev_pm_opp_set_regulators()
  PM / OPP: Don't assume platform doesn't have regulators

 Documentation/devicetree/bindings/opp/opp.txt |  25 +-
 drivers/base/power/opp/core.c                 | 510 --
 drivers/base/power/opp/debugfs.c              |  52 ++-
 drivers/base/power/opp/of.c                   | 105 --
 drivers/base/power/opp/opp.h                  |  20 +-
 drivers/cpufreq/cpufreq-dt.c                  |   9 +-
 include/linux/pm_opp.h                        |  67 +++-
 7 files changed, 605 insertions(+), 183 deletions(-)

-- 
2.7.1.410.g6faf27b

[PATCH V3 3/9] PM / OPP: Manage supply's voltage/current in a separate structure

2016-10-25 Thread Viresh Kumar

This is a preparatory step for multiple regulator per device support.
Move the voltage/current variables to a new structure.
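A simplified userspace model of what this patch does: the voltage fields move into an embedded supply structure, and callers such as the v1 table code and dev_pm_opp_get_max_volt_latency() reach them through `opp->supply`. The types and helpers below (`struct opp`, `opp_init_v1()`, `opp_volt_span()`) are illustrative stand-ins, not kernel code; the tolerance arithmetic mirrors the `_opp_add_v1()` hunk in the diff.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Same fields as struct dev_pm_opp_supply in this series (simplified). */
struct opp_supply {
	unsigned long u_volt;
	unsigned long u_volt_min;
	unsigned long u_volt_max;
	unsigned long u_amp;
};

/* Simplified OPP: the voltage fields now live in an embedded supply. */
struct opp {
	unsigned long rate;
	bool available;
	struct opp_supply supply;
};

/* Like _opp_add_v1(): derive min/max from a percentage tolerance. */
static void opp_init_v1(struct opp *opp, unsigned long freq,
			unsigned long u_volt, unsigned int tolerance_pct)
{
	unsigned long tol = u_volt * tolerance_pct / 100;

	opp->rate = freq;
	opp->available = true;
	opp->supply.u_volt = u_volt;
	opp->supply.u_volt_min = u_volt - tol;
	opp->supply.u_volt_max = u_volt + tol;
	opp->supply.u_amp = 0;
}

/*
 * Like the loop in dev_pm_opp_get_max_volt_latency(): the widest voltage
 * span covered by all available OPPs.
 */
static void opp_volt_span(const struct opp *opps, size_t n,
			  unsigned long *min_uV, unsigned long *max_uV)
{
	*min_uV = ULONG_MAX;
	*max_uV = 0;
	for (size_t i = 0; i < n; i++) {
		if (!opps[i].available)
			continue;
		if (opps[i].supply.u_volt_min < *min_uV)
			*min_uV = opps[i].supply.u_volt_min;
		if (opps[i].supply.u_volt_max > *max_uV)
			*max_uV = opps[i].supply.u_volt_max;
	}
}
```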

Signed-off-by: Viresh Kumar 
Tested-by: Dave Gerlach 
---
 drivers/base/power/opp/core.c    | 44 +---
 drivers/base/power/opp/debugfs.c |  8
 drivers/base/power/opp/of.c      | 18
 drivers/base/power/opp/opp.h     | 11 +++---
 include/linux/pm_opp.h           | 16 +++
 5 files changed, 55 insertions(+), 42 deletions(-)

diff --git a/drivers/base/power/opp/core.c b/drivers/base/power/opp/core.c
index 056527a3fb4e..8d6006151c9a 100644
--- a/drivers/base/power/opp/core.c
+++ b/drivers/base/power/opp/core.c
@@ -112,7 +112,7 @@ unsigned long dev_pm_opp_get_voltage(struct dev_pm_opp *opp)
if (IS_ERR_OR_NULL(tmp_opp))
pr_err("%s: Invalid parameters\n", __func__);
else
-   v = tmp_opp->u_volt;
+   v = tmp_opp->supply.u_volt;
 
return v;
 }
@@ -246,10 +246,10 @@ unsigned long dev_pm_opp_get_max_volt_latency(struct device *dev)
if (!opp->available)
continue;
 
-   if (opp->u_volt_min < min_uV)
-   min_uV = opp->u_volt_min;
-   if (opp->u_volt_max > max_uV)
-   max_uV = opp->u_volt_max;
+   if (opp->supply.u_volt_min < min_uV)
+   min_uV = opp->supply.u_volt_min;
+   if (opp->supply.u_volt_max > max_uV)
+   max_uV = opp->supply.u_volt_max;
}
 
rcu_read_unlock();
@@ -637,14 +637,14 @@ int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq)
if (IS_ERR(old_opp)) {
old_u_volt = 0;
} else {
-   old_u_volt = old_opp->u_volt;
-   old_u_volt_min = old_opp->u_volt_min;
-   old_u_volt_max = old_opp->u_volt_max;
+   old_u_volt = old_opp->supply.u_volt;
+   old_u_volt_min = old_opp->supply.u_volt_min;
+   old_u_volt_max = old_opp->supply.u_volt_max;
}
 
-   u_volt = opp->u_volt;
-   u_volt_min = opp->u_volt_min;
-   u_volt_max = opp->u_volt_max;
+   u_volt = opp->supply.u_volt;
+   u_volt_min = opp->supply.u_volt_min;
+   u_volt_max = opp->supply.u_volt_max;
 
reg = opp_table->regulator;
 
@@ -957,10 +957,11 @@ static bool _opp_supported_by_regulators(struct dev_pm_opp *opp,
struct regulator *reg = opp_table->regulator;
 
if (!IS_ERR(reg) &&
-   !regulator_is_supported_voltage(reg, opp->u_volt_min,
-   opp->u_volt_max)) {
+   !regulator_is_supported_voltage(reg, opp->supply.u_volt_min,
+   opp->supply.u_volt_max)) {
pr_warn("%s: OPP minuV: %lu maxuV: %lu, not supported by regulator\n",
-   __func__, opp->u_volt_min, opp->u_volt_max);
+   __func__, opp->supply.u_volt_min,
+   opp->supply.u_volt_max);
return false;
}
 
@@ -993,11 +994,12 @@ int _opp_add(struct device *dev, struct dev_pm_opp *new_opp,
 
/* Duplicate OPPs */
dev_warn(dev, "%s: duplicate OPPs detected. Existing: freq: %lu, volt: %lu, enabled: %d. New: freq: %lu, volt: %lu, enabled: %d\n",
-__func__, opp->rate, opp->u_volt, opp->available,
-new_opp->rate, new_opp->u_volt, new_opp->available);
+__func__, opp->rate, opp->supply.u_volt,
+opp->available, new_opp->rate, new_opp->supply.u_volt,
+new_opp->available);
 
-   return opp->available && new_opp->u_volt == opp->u_volt ?
-   0 : -EEXIST;
+   return opp->available &&
+  new_opp->supply.u_volt == opp->supply.u_volt ? 0 : -EEXIST;
}
 
new_opp->opp_table = opp_table;
@@ -1064,9 +1066,9 @@ int _opp_add_v1(struct device *dev, unsigned long freq, long u_volt,
/* populate the opp table */
new_opp->rate = freq;
tol = u_volt * opp_table->voltage_tolerance_v1 / 100;
-   new_opp->u_volt = u_volt;
-   new_opp->u_volt_min = u_volt - tol;
-   new_opp->u_volt_max = u_volt + tol;
+   new_opp->supply.u_volt = u_volt;
+   new_opp->supply.u_volt_min = u_volt - tol;
+   new_opp->supply.u_volt_max = u_volt + tol;
new_opp->available = true;
new_opp->dynamic = dynamic;
 
diff --git a/drivers/base/power/opp/debugfs.c b/drivers/base/power/opp/debugfs.c
index ef1ae6b52042..c897676ca35f 100644
--- a/drivers/base/power/opp/debugfs.c
+++ b/drivers/base/power/opp/debugfs.c
@@ -63,16 +63,16 @@ int opp_debug_create_one(struct dev_pm_opp *opp, struct opp_table *opp_table)
if (!debugfs_create_ulong("rate_hz", S_IRUGO, d, &opp->rate))

Re: [PATCH] powerpc: process.c: fix Kconfig typo

2016-10-25 Thread Valentin Rothberg

On Wed, Oct 26, 2016 at 7:52 AM, Michael Ellerman  wrote:
> Cyril Bur  writes:
>
>> On Wed, 2016-10-05 at 07:57 +0200, Valentin Rothberg wrote:
>>> s/ALIVEC/ALTIVEC/
>>>
>>
>> Oops, nice catch
>>
>>> Signed-off-by: Valentin Rothberg 
>>
>> Reviewed-by: Cyril Bur 
>
> How did we not notice? Sounds like we need a new selftest.
>
> Looks like this should have:
>
> Fixes: dc16b553c949 ("powerpc: Always restore FPU/VEC/VSX if hardware transactional memory in use")
>
>
> And I guess I need to start running checkkconfigsymbols.py on every
> commit.

Happy to find a new user :-)  You can also run the script on a range
of commits via '--diff commitA..commitB', which can save some time.

Best regards,
 Valentin

> cheers

RE: [PATCH 0/6] ACPICA: Interpreter: Improve lock order fixes

2016-10-25 Thread Zheng, Lv

Hi, Rafael

> From: linux-acpi-ow...@vger.kernel.org 
> [mailto:linux-acpi-ow...@vger.kernel.org] On Behalf Of Rafael J.
> Wysocki
> Subject: Re: [PATCH 0/6] ACPICA: Interpreter: Improve lock order fixes
> 
> On Tue, Oct 25, 2016 at 7:20 AM, Lv Zheng  wrote:
> > This patchset improves ACPICA interpreter lock order fixes. Including
> > several urgent regression fixes [PATCH 0-3].
> 
> OK, thanks!
> 
> So patches [4-6/6] appear to be cleanups and I'd prefer them to be
> applied in a usual way (ie. via the upstream ACPICA).

I think PATCH 4 is also an urgent fix.
There are 3 table loading modes now. In one of them,
acpi_ds_initialize_region() is invoked when acpi_ds_initialize_objects() is
invoked. In the other modes, it is invoked in acpi_ds_load2_end_op(), so it is
a no-op in acpi_ds_initialize_objects().

When it is not a no-op in acpi_ds_initialize_objects(), the wrong return value
becomes an exception that prevents the table from being correctly
loaded/initialized.
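The failure mode described above, where a step that should be a no-op returns an unexpected status and aborts table initialization, can be sketched with a generic object walk. This is not ACPICA code: the status values, `walk_objects()`, and both callbacks are hypothetical.

```c
#include <assert.h>

/* Hypothetical status codes, loosely modeled on ACPICA's AE_* values. */
typedef int model_status;
#define MODEL_OK        0
#define MODEL_NOT_FOUND 1

/* Hypothetical per-object callback; counts the objects it was called for. */
typedef model_status (*walk_cb)(int obj, int *visited);

/* Visit every object, stopping at the first non-OK status. */
static model_status walk_objects(int count, walk_cb cb, int *visited)
{
	for (int obj = 0; obj < count; obj++) {
		model_status st = cb(obj, visited);

		if (st != MODEL_OK)
			return st;	/* remaining objects are never initialized */
	}
	return MODEL_OK;
}

/* Region init already happened elsewhere, but a stale status leaks out. */
static model_status init_region_buggy(int obj, int *visited)
{
	(void)obj;
	(*visited)++;
	return MODEL_NOT_FOUND;
}

/* Same no-op path, reporting success so the walk can continue. */
static model_status init_region_fixed(int obj, int *visited)
{
	(void)obj;
	(*visited)++;
	return MODEL_OK;
}
```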

[PATCH 5-6] are cleanups.

> 
> I'd like to take the [1-3/6] as fixes for 4.9-rc3 though, but for that
> I need you to tell me which mainline kernel commits are fixed by them.
> 
> IOW, what should I put into the Fixes: tags.
> 
> [In the future, if you post a regression fix, please always add a
> Fixes: tag to it pointing to the commit being fixed.]

OK, I'll add the Fixes tag and re-send the patches.

Thanks and best regards
Lv

> 
> > Patches tested with customized ACPI table where _PS0/_PS3 methods are
> > customized to invoke a serialized control method which creates named
> > objects. When pm_async=yes, AE_ALREADY_EXISTS can be seen in suspend/resume
> > process. This is an existing issue, triggered in 4.9-rc1 by ACPICA
> > interpreter lock order fixes, and can be fixed by [PATCH 1] in this series.
> >
> > Lv Zheng (6):
> >   ACPICA: Dispatcher: Fix order issue of method termination
> >   ACPICA: Dispatcher: Fix an unbalanced lock exit path in
> > acpi_ds_auto_serialize_method()
> >   ACPICA: Dispatcher: Tune interpreter lock around
> > acpi_ev_initialize_region()
> >   ACPICA: Events: Cleanup acpi_ev_initialize_region()
> >   ACPICA: Tables: Cleanup acpi_tb_install_and_load_table()
> >   ACPICA: Tables: Add acpi_tb_unload_table()
> 
> Thanks,
> Rafael

Re: [PATCH 1/2] x86/io: add interface to reserve io memtype for a resource range. (v1.1)

2016-10-25 Thread Dave Airlie

>>
>> Is there anything for a driver to be able to tell when this is actually needed?
>> How will driver developers know? Can you add a bit of documentation to
>> the API? If it's transitive towards a secondary solution, indicating so
>> would help driver developers.
>
> I'll plug the io-mapping stuff again here, and more specifically the
> userspace pte wrangling stuff we've added in 4.9 to i915_mm.c. Should
> probably move that one to the core. That way io_mapping takes care of the
> full reservation, and allows you to on-demand kmap (for kernel) and write
> ptes. All nicely fast and all, and for bonus, also nicely encapsulated.

Yeah I think ideally we'd want to move towards that, however we don't tend
to want to ioremap the full range even on 64-bit, which is what io-mapping does.

At least on most GPUs with VRAM, we rarely want to map VRAM for much;
I think page tables and fbcon are probably the two main reasons to touch
it at all.

So I don't think we need to be as efficient as i915 in this area.

Dave.

Re: [PATCH V2 6/8] PM / OPP: Separate out _generic_opp_set_rate()

2016-10-25 Thread Viresh Kumar

On 25-10-16, 11:59, Stephen Boyd wrote:
> On 10/20, Viresh Kumar wrote:
> > Later patches would add support for custom opp_set_rate callbacks. This
> 
> I know the OPP consumer function has "rate" in the name, but it
> makes more sense to call the callback set_opp instead because we
> could be doing a lot more than setting the opp rate.

Done.

> > patch separates out the code for generic opp_set_rate handler in order
> > to prepare for that.
> > 
> > Signed-off-by: Viresh Kumar 
> > ---
> > diff --git a/drivers/base/power/opp/core.c b/drivers/base/power/opp/core.c
> > index 45c70ce07864..96f04392daef 100644
> > --- a/drivers/base/power/opp/core.c
> > +++ b/drivers/base/power/opp/core.c
> > @@ -596,6 +596,73 @@ static int _set_opp_voltage(struct device *dev, struct regulator *reg,
> > return ret;
> >  }
> >  
> > +static inline int
> > +_generic_opp_set_rate_clk_only(struct device *dev, struct clk *clk,
> > +  unsigned long old_freq, unsigned long freq)
> > +{
> > +   int ret;
> > +
> > +   /* Change frequency */
> > +   dev_dbg(dev, "%s: switching OPP: %lu Hz --> %lu Hz\n",
> > +   __func__, old_freq, freq);
> 
> Perhaps this should stay at the beginning of OPP transitions?
> Otherwise it can get confusing when multiple switching OPP
> messages appear on OPP transition failures.

Done.
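For context, a generic set-OPP handler typically orders operations so the supply is never too low for the running frequency: raise the voltage before increasing the frequency, and lower it only after decreasing the frequency. The sketch below models that ordering in userspace; `struct hw_state`, the 1 uV-per-kHz rule, and all helpers are hypothetical and only stand in for the clk and regulator calls.

```c
#include <assert.h>

/* Hypothetical hardware state; stands in for clk and regulator handles. */
struct hw_state {
	unsigned long freq;	/* Hz */
	unsigned long u_volt;	/* microvolts */
	int unsafe_steps;	/* times freq ran above what u_volt supports */
};

/* Hypothetical rule: 1 uV of supply per kHz (illustration only). */
static unsigned long min_volt_for(unsigned long freq)
{
	return freq / 1000;
}

static void check_safe(struct hw_state *hw)
{
	if (hw->u_volt < min_volt_for(hw->freq))
		hw->unsafe_steps++;
}

static void set_voltage(struct hw_state *hw, unsigned long u_volt)
{
	hw->u_volt = u_volt;
	check_safe(hw);
}

static void set_clk(struct hw_state *hw, unsigned long freq)
{
	hw->freq = freq;
	check_safe(hw);
}

/* Voltage before frequency when scaling up, after it when scaling down. */
static void generic_set_opp(struct hw_state *hw, unsigned long freq,
			    unsigned long u_volt)
{
	unsigned long old_freq = hw->freq;

	if (freq > old_freq)
		set_voltage(hw, u_volt);

	set_clk(hw, freq);

	if (freq < old_freq)
		set_voltage(hw, u_volt);
}
```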

> > +struct clk;
> 
> Is struct regulator also forward declared?

Done now.

> >  struct dev_pm_opp;
> >  struct device;
> >  
> > @@ -24,6 +25,36 @@ enum dev_pm_opp_event {
> > OPP_EVENT_ADD, OPP_EVENT_REMOVE, OPP_EVENT_ENABLE, OPP_EVENT_DISABLE,
> >  };
> >  
> > +/**
> > + * struct dev_pm_opp_supply - Power supply voltage/current values
> > + * @u_volt:Target voltage in microvolts corresponding to this OPP
> > + * @u_volt_min:Minimum voltage in microvolts corresponding to this OPP
> > + * @u_volt_max:Maximum voltage in microvolts corresponding to this OPP
> > + * @u_amp: Maximum current drawn by the device in microamperes
> > + *
> > + * This structure stores the voltage/current values for a single power supply.
> > + */
> > +struct dev_pm_opp_supply {
> > +   unsigned long u_volt;
> > +   unsigned long u_volt_min;
> > +   unsigned long u_volt_max;
> > +   unsigned long u_amp;
> > +};
> 
> This structure moved during this series. Can we avoid that and
> already have it in the right place to begin with?

Done.

> > +
> > +struct dev_pm_opp_info {
> > +   unsigned long rate;
> > +   struct dev_pm_opp_supply *supplies;
> > +};
> > +
> > +struct dev_pm_set_rate_data {
> 
> dev_pm_set_opp_data?

Done.

> > +   struct dev_pm_opp_info old_opp;
> > +   struct dev_pm_opp_info new_opp;
> > +
> > +   struct regulator **regulators;
> > +   unsigned int regulator_count;
> > +   struct clk *clk;
> > +};
> 
> The above two structures don't get kernel doc?

Done.

-- 
viresh

Re: [RFC PATCH 1/5] mm/page_alloc: always add freeing page at the tail of the buddy list

2016-10-25 Thread Xishi Qiu

On 2016/10/26 13:59, Joonsoo Kim wrote:

> On Wed, Oct 26, 2016 at 01:50:37PM +0800, Xishi Qiu wrote:
>> On 2016/10/26 12:37, Joonsoo Kim wrote:
>>
>>> On Mon, Oct 17, 2016 at 05:21:54PM +0800, Xishi Qiu wrote:
>>>> On 2016/10/13 16:08, js1...@gmail.com wrote:
>>>>
>>>>> From: Joonsoo Kim
>>>>>
>>>>> Currently, freeing page can stay longer in the buddy list if next higher
>>>>> order page is in the buddy list in order to help coalescence. However,
>>>>> it doesn't work for the simplest sequential free case. For example, think
>>>>> about the situation that 8 consecutive pages are freed in sequential
>>>>> order.
>>>>>
>>>>> page 0: attached at the head of order 0 list
>>>>> page 1: merged with page 0, attached at the head of order 1 list
>>>>> page 2: attached at the tail of order 0 list
>>>>> page 3: merged with page 2 and then merged with page 0, attached at
>>>>>  the head of order 2 list
>>>>> page 4: attached at the head of order 0 list
>>>>> page 5: merged with page 4, attached at the tail of order 1 list
>>>>> page 6: attached at the tail of order 0 list
>>>>> page 7: merged with page 6 and then merged with page 4. Lastly, merged
>>>>>  with page 0 and we get order 3 freepage.
>>>>>
>>>>> With excluding page 0 case, there are three cases that freeing page is
>>>>> attached at the head of buddy list in this example and if just one
>>>>> corresponding ordered allocation request comes at that moment, this page
>>>>> in being a high order page will be allocated and we would fail to make
>>>>> order-3 freepage.
>>>>>
>>>>> Allocation usually happens in sequential order and free also does. So, it
>>>>> would be important to detect such a situation and to give some chance
>>>>> to be coalesced.
>>>>>
>>>>> I think that simple and effective heuristic about this case is just
>>>>> attaching freeing page at the tail of the buddy list unconditionally.
>>>>> If freeing isn't merged during one rotation, it would be actual
>>>>> fragmentation and we don't need to care about it for coalescence.
>>>>>
>>>>
>>>> Hi Joonsoo,
>>>>
>>>> I find another two places to reduce fragmentation.
>>>>
>>>> 1)
>>>> __rmqueue_fallback
>>>>   steal_suitable_fallback
>>>>     move_freepages_block
>>>>       move_freepages
>>>>         list_move
>>>> If we steal some free pages, we will add these page at the head of
>>>> start_migratetype list, this will cause more fixed migratetype, because
>>>> this pages will be allocated more easily.
>>>> So how about use list_move_tail instead of list_move?
>>>
>>> Yeah... I don't think deeply but, at a glance, it would be helpful.
>>>
>>>>
>>>> 2)
>>>> __rmqueue_fallback
>>>>   expand
>>>>     list_add
>>>> How about use list_add_tail instead of list_add? If add the tail, then the
>>>> rest of pages will be hard to be allocated and we can merge them again as
>>>> soon as the page freed.
>>>
>>> I guess that it has no effect. When we do __rmqueue_fallback() and
>>> expand(), we don't have any freepage on this or more order. So,
>>> list_add or list_add_tail will show the same result.
>>>
>>
>> Hi Joonsoo,
>>
>> Usually this list is empty, but in the following case, the list is not empty.
>>
>> __rmqueue_fallback
>>   steal_suitable_fallback
>>     move_freepages_block  // move to the list of start_migratetype
>>   expand  // split the largest order first
>>     list_add  // add to the list of start_migratetype
> 
> In this case, the freepage stolen in steal_suitable_fallback() and the
> split freepage would come from the same pageblock. So, it doesn't
> matter which list_add* function we use.
> 

Yes, they are from the same pageblock. The stolen freepage will move to the
start_migratetype list, and expand() will add to the same migratetype list
too, but that list may not be empty because of the stolen freepage.
So when we split the largest order, pages added to the tail will be allocated
less easily, right?
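The head-versus-tail question comes down to list order: the allocator takes the first entry of a free list, so a page added with list_add() is handed out next, while one added with list_add_tail() is handed out last and has more time to merge with a freed buddy. A minimal userspace model of `<linux/list.h>`-style insertion follows; the list helpers and `struct free_page` are simplified stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list in the style of <linux/list.h>. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_insert_between(struct list_head *n, struct list_head *prev,
				struct list_head *next)
{
	next->prev = n;
	n->next = next;
	n->prev = prev;
	prev->next = n;
}

/* Insert right after the head: the next allocation takes it. */
static void list_add(struct list_head *n, struct list_head *h)
{
	list_insert_between(n, h, h->next);
}

/* Insert just before the head, i.e. at the tail: taken last. */
static void list_add_tail(struct list_head *n, struct list_head *h)
{
	list_insert_between(n, h->prev, h);
}

/* Model of a free page sitting on a free list. */
struct free_page {
	struct list_head lru;	/* first member, so a cast recovers the page */
	int pfn;
};

/* Allocation takes from the head of the free list. */
static struct free_page *take_first(struct list_head *h)
{
	struct list_head *first = h->next;

	if (first == h)
		return NULL;
	first->prev->next = first->next;
	first->next->prev = first->prev;
	return (struct free_page *)first;
}
```

With pages added in the order list_add(a), list_add(b), list_add_tail(c), allocations return b, then a, then c.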

Thanks,
Xishi Qiu

> Thanks.
> 
> .
>

Re: [PATCH 6/5] KVM: x86: fix periodic lapic timer with hrtimers

2016-10-25 Thread Wanpeng Li

2016-10-26 14:02 GMT+08:00 Wanpeng Li :
> 2016-10-25 19:43 GMT+08:00 Radim Krčmář :
>> 2016-10-25 07:39+0800, Wanpeng Li:
>>> 2016-10-24 23:27 GMT+08:00 Radim Krčmář :
>>>> 2016-10-24 17:09+0200, Paolo Bonzini:
>>>>> On 24/10/2016 17:03, Radim Krčmář wrote:
>>>> [...]
>>>>>
>>>>> Reviewed-by: Paolo Bonzini
>>>>>
>>>>> Go ahead, squash it into 5/5 and commit to kvm/queue. :)
>>>>
>>>> Did that, thanks.
>>>>
>>>> Wanpeng, the code is now under your name so please check it and/or
>>>> complain.
>>>
>>> This patch 6/5 incurred regressions.
>>>
>>> - The latency of the periodic mode which is emulated by VMX preemption
>>> is almost the same as periodic mode which is emulated by hrtimer.
>>
>> Hm, what numbers are you getting?
>
> The two fixes look good to me. However, the code which you removed in
> kvm_lapic_switch_to_hv_timer() results in different numbers.
>
> w/o remove: hlt average latency = 2398462
> w/ remove:  hlt average latency = 2403845
>
>>
>> When I ran the test with the original series, then it actually had worse
>
> Did you test this by running my kvm-unit-tests/apic_timer_latency.flat?
>
>> results with the VMX preemption than it did with the hrtimer:
>>
>>   hlt average latency   = 1464151
>>   pause average latency = 1467605
>>
>> hlt tests the hrtimer, pause tests the VMX preemption.  I just replaced
>> "hlt" with "pause" in the assembly loop.
>>
>> The worse result was because the VMX preemption period was computed
>> incorrectly -- it was being added to now().  Some time passes between
>> the expiration and reading of now(), so this time was extending the
>> period while it shouldn't have.
>>
>> If I run the test with [6/5], it gets sane numbers:
>>
>>   hlt average latency   = 1465107
>>   pause average latency = 1465093
>>
>> The numbers are sane because the test is not computing latency (= how
>> long after the timer should have fired have we received the interrupt)
>> -- it is computing the duration of the period in cycles, which is much
>> better right now.
>
> Agreed.
>
>>
>>> - The oneshot mode test of kvm-unit-tests/apic_timer_latency.flat almost fails.
>>
>> Oops, silly mistake -- apic_timer_expired() was in the 'else' branch in
>> [5/5] and I didn't invert the condition after moving it.
>>
>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>> index 6244988418be..d7e74c8ec8ca 100644
>> --- a/arch/x86/kvm/lapic.c
>> +++ b/arch/x86/kvm/lapic.c
>> @@ -1354,8 +1354,8 @@ static void start_sw_period(struct kvm_lapic *apic)
>> return;
>>
>> if (apic_lvtt_oneshot(apic) &&
>> -   ktime_after(apic->lapic_timer.target_expiration,
>> -   apic->lapic_timer.timer.base->get_time())) {
>> +   !ktime_after(apic->lapic_timer.target_expiration,
>> +apic->lapic_timer.timer.base->get_time())) {
>> apic_timer_expired(apic);
>> return;
>> }
>>
>
> It works.
>
>> Paolo, can you squash that?
>>
>>> Btw, hope you can also apply the testcase for kvm-unit-tests. :)
>>
>> I will have some comments, because it would be nicer if it measured the
>> latency ... expected_expiration is not computed correctly.
>
> It measures the latency from when the guest programs the clock event device
> until the interrupt is injected into the guest after the timer fires.

By comparing this with the clock event device emulated by hrtimer, we can
calculate the latency benefit from VMX preemption.
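The two metrics in this exchange can be separated explicitly: latency is how long after the programmed deadline an interrupt arrives, while the period is the spacing between consecutive interrupts. A sketch over hypothetical timestamp traces (arbitrary cycle units, assuming every interrupt arrives at or after its deadline); the function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Latency: how long after each programmed deadline the interrupt arrived. */
static unsigned long long avg_latency(const unsigned long long *deadline,
				      const unsigned long long *irq,
				      size_t n)
{
	unsigned long long sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += irq[i] - deadline[i];	/* assumes irq[i] >= deadline[i] */
	return n ? sum / n : 0;
}

/* Period: average spacing between consecutive interrupts, a different metric. */
static unsigned long long avg_period(const unsigned long long *irq, size_t n)
{
	if (n < 2)
		return 0;
	return (irq[n - 1] - irq[0]) / (n - 1);
}
```

A periodic timer can show a correct average period while still delivering each interrupt late, which is why measuring only the period says little about latency.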

Regards,
Wanpeng Li

Re: [PATCH V2 7/8] PM / OPP: Allow platform specific custom opp_set_rate() callbacks

2016-10-25 Thread Viresh Kumar

On 25-10-16, 12:01, Stephen Boyd wrote:
> On 10/20, Viresh Kumar wrote:
> > The generic opp_set_rate() handler isn't sufficient for platforms with
> > complex DVFS.  For example, some TI platforms have multiple regulators
> > for a CPU device. The order in which various supplies need to be
> > programmed is only known to the platform code and it's best to leave it
> > to it.
> > 
> > This patch implements APIs to register platform specific opp_set_rate()
> > callback.
> > 
> > Signed-off-by: Viresh Kumar 
> > ---
> 
> Overall it looks ok, but I'd prefer set_opp instead of set_rate.

Done.

-- 
viresh

Re: [PATCH V2 5/8] PM / OPP: Add infrastructure to manage multiple regulators

2016-10-25 Thread Viresh Kumar

On 25-10-16, 09:49, Stephen Boyd wrote:
> On 10/20, Viresh Kumar wrote:
> > diff --git a/drivers/base/power/opp/core.c b/drivers/base/power/opp/core.c
> > index 37fad2eb0f47..45c70ce07864 100644
> > --- a/drivers/base/power/opp/core.c
> > +++ b/drivers/base/power/opp/core.c
> > @@ -235,21 +237,44 @@ unsigned long dev_pm_opp_get_max_volt_latency(struct device *dev)
> > return 0;
> > }
> >  
> > -   reg = opp_table->regulator;
> > -   if (IS_ERR(reg)) {
> > +   count = opp_table->regulator_count;
> > +
> > +   if (!count) {
> > /* Regulator may not be required for device */
> > rcu_read_unlock();
> > return 0;
> > }
> >  
> > -   list_for_each_entry_rcu(opp, &opp_table->opp_list, node) {
> > -   if (!opp->available)
> > -   continue;
> > +   size = count * sizeof(*regulators);
> > +   regulators = kmemdup(opp_table->regulators, size, GFP_KERNEL);
> 
> How do we allocate under RCU? Doesn't that spit out big warnings?

That doesn't. I even tried enabling several locking debug config options.

> > +   if (!regulators) {
> > +   rcu_read_unlock();
> > +   return 0;
> > +   }
> > +
> > +   min_uV = kmalloc(count * (sizeof(*min_uV) + sizeof(*max_uV)),
> 
> Do we imagine min_uV is going to be a different size from max_uV?
> It may be better to have a struct for min/max pair and then
> stride them. Then the kmalloc() can become a kmalloc_array().

done.
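A sketch of the review suggestion in userspace C: store each supply's bounds as one `min`/`max` pair and allocate an array of pairs with an overflow-checked helper. `struct volt_range` and `alloc_array()` are hypothetical; `alloc_array()` only imitates the overflow check that kmalloc_array() performs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical pair type: one element per supply instead of two arrays. */
struct volt_range {
	unsigned long min;
	unsigned long max;
};

/*
 * Userspace stand-in for kmalloc_array(): refuse n * size products that
 * would overflow instead of silently allocating a short buffer.
 */
static void *alloc_array(size_t n, size_t size)
{
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;
	return malloc(n * size);
}
```

Usage: `struct volt_range *r = alloc_array(count, sizeof(*r));` then fill `r[i].min` and `r[i].max` per supply.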

> > -   *opp_table = _add_opp_table(dev);
> > -   if (!*opp_table) {
> > -   kfree(opp);
> > +   /* allocate new OPP node + and supplies structures */
> > +   opp = kzalloc(sizeof(*opp) + supply_size, GFP_KERNEL);
> > +   if (!opp) {
> > +   kfree(table);
> > return NULL;
> > }
> >  
> > +   opp->supplies = (struct dev_pm_opp_supply *)(opp + 1);
> 
> So put the supplies at the end of the OPP structure as an empty
> array?

Yes. Added a comment to clarify as well.
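The single-allocation layout discussed here (an OPP node immediately followed by its supplies, with `supplies` pointing at `opp + 1`) can be modeled in userspace as follows. The types are simplified stand-ins, `calloc()` plays the role of kzalloc(), and the sketch assumes `count` is small enough that the size computation cannot overflow.

```c
#include <assert.h>
#include <stdlib.h>

struct opp_supply {
	unsigned long u_volt;
	unsigned long u_volt_min;
	unsigned long u_volt_max;
	unsigned long u_amp;
};

/* Simplified OPP node; 'supplies' points into the same allocation. */
struct opp_node {
	unsigned long rate;
	struct opp_supply *supplies;
};

/* One zeroed allocation for the node plus 'count' trailing supplies. */
static struct opp_node *opp_node_alloc(size_t count)
{
	struct opp_node *opp =
		calloc(1, sizeof(*opp) + count * sizeof(struct opp_supply));

	if (!opp)
		return NULL;
	/* The supplies start right after the node, like (opp + 1) in the patch. */
	opp->supplies = (struct opp_supply *)(opp + 1);
	return opp;
}
```

One allocation means one free() (or kfree()) releases both the node and its supplies.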

> > -int dev_pm_opp_set_regulator(struct device *dev, const char *name)
> > +int dev_pm_opp_set_regulators(struct device *dev, const char *names[],
> 
> Make names a const char * const *?

Done.

> I seem to recall arrays as
> function arguments has some problem but my C mastery is failing
> right now.

I am not sure why it would be a problem, and of course what gets passed is the
address and not the array.
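On the array-parameter question: in C a parameter declared as an array is adjusted to a pointer, so `const char * const names[]` and `const char * const *names` declare the same thing. The usual pitfall, and possibly what was being recalled, is that `sizeof(names)` inside the callee yields the size of a pointer, so the element count must be passed separately. A small sketch; `find_supply()` and the supply names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * 'names' is really 'const char * const *': the callee cannot use sizeof
 * to count entries, so the caller passes 'count' explicitly.
 */
static int find_supply(const char * const names[], unsigned int count,
		       const char *name)
{
	for (unsigned int i = 0; i < count; i++)
		if (strcmp(names[i], name) == 0)
			return (int)i;
	return -1;
}
```

Callers holding a plain `const char *names[]` array can pass it directly, because C permits adding `const` at the pointed-to level in this conversion.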

> > +   for (i = 0; i < count; i++) {
> > +   reg = regulator_get_optional(dev, names[i]);
> > +   pr_info("%s: %d: %p: %s\n", __func__, __LINE__, reg, names[i]);
> 
> Debug noise?

Yes.

> > +static bool opp_debug_create_supplies(struct dev_pm_opp *opp,
> > + struct opp_table *opp_table,
> > + struct dentry *pdentry)
> > +{
> > +   struct dentry *d;
> > +   int i = 0;
> > +   char name[] = "supply-X"; /* support only 0-9 supplies */
> 
> But we don't verify that's the case? Why not use kasprintf() and
> free() and then there isn't any limit?

Done.
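The kasprintf() suggestion removes the single-digit limit because the name buffer is sized to fit whatever the format produces. A userspace equivalent using the standard two-pass vsnprintf() idiom; `alloc_printf()` is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Size the string with a first vsnprintf() pass, then allocate exactly
 * enough and format again. The caller frees the result.
 */
static char *alloc_printf(const char *fmt, ...)
{
	va_list ap;
	int len;
	char *buf;

	va_start(ap, fmt);
	len = vsnprintf(NULL, 0, fmt, ap);
	va_end(ap);
	if (len < 0)
		return NULL;

	buf = malloc((size_t)len + 1);
	if (!buf)
		return NULL;

	va_start(ap, fmt);
	vsnprintf(buf, (size_t)len + 1, fmt, ap);
	va_end(ap);
	return buf;
}
```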

> > diff --git a/drivers/base/power/opp/of.c b/drivers/base/power/opp/of.c
> > index b7fcd0a1b58b..c857fb07a5bc 100644
> > --- a/drivers/base/power/opp/of.c
> > +++ b/drivers/base/power/opp/of.c
> > @@ -105,12 +106,13 @@ static bool _opp_is_supported(struct device *dev, struct opp_table *opp_table,
> >  static int opp_parse_supplies(struct dev_pm_opp *opp, struct device *dev,
> >   struct opp_table *opp_table)
> >  {
> > -   u32 microvolt[3] = {0};
> > -   u32 val;
> > -   int count, ret;
> > +   u32 *microvolt, *microamp = NULL;
> > +   int supplies, vcount, icount, ret, i, j;
> > struct property *prop = NULL;
> > char name[NAME_MAX];
> >  
> > +   supplies = opp_table->regulator_count ? opp_table->regulator_count : 1;
> 
> We can't have regulator_count == 1 by default?

It is used in various places to distinguish whether or not regulators have
been set by platform code. The OPP core can still be used just for DT data,
i.e. no opp-set, so it is important to support cases where the regulators
aren't set.
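A sketch of why `supplies` falls back to 1: with no regulators registered, the DT values are still parsed as a single supply. Each supply contributes either one opp-microvolt value (the target, with min and max equal to it) or a <target min max> triplet. This is a userspace model with hypothetical names, not the code in of.c.

```c
#include <assert.h>
#include <stddef.h>

struct opp_supply {
	unsigned long u_volt, u_volt_min, u_volt_max;
};

/*
 * 'microvolt' holds the raw cells (u32 in DT; unsigned int here).
 * Returns 0 on success, -1 if vcount matches neither layout.
 */
static int parse_microvolt(const unsigned int *microvolt, size_t vcount,
			   unsigned int regulator_count,
			   struct opp_supply *out)
{
	/* No regulators set by platform code: still parse one supply. */
	size_t supplies = regulator_count ? regulator_count : 1;
	size_t i, j = 0;

	if (vcount != supplies && vcount != supplies * 3)
		return -1;

	for (i = 0; i < supplies; i++) {
		out[i].u_volt = microvolt[j++];
		if (vcount == supplies) {
			out[i].u_volt_min = out[i].u_volt;
			out[i].u_volt_max = out[i].u_volt;
		} else {
			out[i].u_volt_min = microvolt[j++];
			out[i].u_volt_max = microvolt[j++];
		}
	}
	return 0;
}
```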

> > @@ -155,7 +155,8 @@ enum opp_table_access {
> >   * @supported_hw_count: Number of elements in supported_hw array.
> >   * @prop_name: A name to postfix to many DT properties, while parsing them.
> >   * @clk: Device's clock handle
> > - * @regulator: Supply regulator
> > + * @regulators: Supply regulators
> > + * @regulator_count: Number of Power Supply regulators
> 
> Lowercase Power Supply please.

Done.

> >   * @dentry:debugfs dentry pointer of the real device directory (not links).
> >   * @dentry_name: Name of the real dentry.
> >   *
> > diff --git a/drivers/cpufreq/cpufreq-dt.c b/drivers/cpufreq/cpufreq-dt.c
> > index 5c07ae05d69a..15cb26118dc7 100644
> > --- a/drivers/cpufreq/cpufreq-dt.c
> > +++ b/drivers/cpufreq/cpufreq-dt.c
> > @@ -186,7 +186,10 @@ static int cpufreq_init(struct cpufreq_policy *policy)
> >  */
> > name = find_supply_name(cpu_dev);
> > if (name) {
> > -

Re: [PATCH 6/5] KVM: x86: fix periodic lapic timer with hrtimers

2016-10-25 Thread Wanpeng Li

2016-10-25 19:43 GMT+08:00 Radim Krčmář :
> 2016-10-25 07:39+0800, Wanpeng Li:
>> 2016-10-24 23:27 GMT+08:00 Radim Krčmář :
>>> 2016-10-24 17:09+0200, Paolo Bonzini:
>>>> On 24/10/2016 17:03, Radim Krčmář wrote:
>>> [...]
>>>>
>>>> Reviewed-by: Paolo Bonzini
>>>>
>>>> Go ahead, squash it into 5/5 and commit to kvm/queue. :)
>>>
>>> Did that, thanks.
>>>
>>> Wanpeng, the code is now under your name so please check it and/or
>>> complain.
>>
>> This patch 6/5 incurred regressions.
>>
>> - The latency of the periodic mode which is emulated by VMX preemption
>> is almost the same as periodic mode which is emulated by hrtimer.
>
> Hm, what numbers are you getting?

The two fixes look good to me. However, the code which you removed in
kvm_lapic_switch_to_hv_timer() results in different numbers.

w/o remove: hlt average latency = 2398462
w/ remove:  hlt average latency = 2403845

>
> When I ran the test with the original series, then it actually had worse

Did you test this by running my kvm-unit-tests/apic_timer_latency.flat?

> results with the VMX preemption than it did with the hrtimer:
>
>   hlt average latency   = 1464151
>   pause average latency = 1467605
>
> hlt tests the hrtimer, pause tests the VMX preemption.  I just replaced
> "hlt" with "pause" in the assembly loop.
>
> The worse result was because the VMX preemption period was computed
> incorrectly -- it was being added to now().  Some time passes between
> the expiration and reading of now(), so this time was extending the
> period while it shouldn't have.
>
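Radim's explanation of the drift can be reproduced numerically: if each rearm computes the next deadline from now(), which is read some time after the previous expiration, that handler delay accumulates every period; computing it from the previous target does not drift. A userspace sketch with hypothetical names and arbitrary time units; it is not the KVM lapic code.

```c
#include <assert.h>

/* Next deadline based on the time the handler happened to read. */
static unsigned long long next_from_now(unsigned long long now,
					unsigned long long period)
{
	return now + period;		/* drifts by (now - target) each period */
}

/* Next deadline based on the previous target: stays on the original grid. */
static unsigned long long next_from_target(unsigned long long target,
					   unsigned long long period)
{
	return target + period;
}

/* Deadline after 'periods' rearms, if each rearm runs 'delay' after expiry. */
static unsigned long long simulate(int periods, unsigned long long period,
				   unsigned long long delay, int from_target)
{
	unsigned long long target = 0;

	for (int i = 0; i < periods; i++) {
		unsigned long long now = target + delay;	/* handler runs late */

		target = from_target ? next_from_target(target, period)
				     : next_from_now(now, period);
	}
	return target;
}
```

With a period of 1000 and a handler delay of 7, four rearms end at 4000 when based on the previous target but at 4028 when based on now().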
> If I run the test with [6/5], it gets sane numbers:
>
>   hlt average latency   = 1465107
>   pause average latency = 1465093
>
> The numbers are sane because the test is not computing latency (= how
> long after the timer should have fired have we received the interrupt)
> -- it is computing the duration of the period in cycles, which is much
> better right now.

Agreed.

>
>> - The oneshot mode test of kvm-unit-tests/apic_timer_latency.flat almost fails.
>
> Oops, silly mistake -- apic_timer_expired() was in the 'else' branch in
> [5/5] and I didn't invert the condition after moving it.
>
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 6244988418be..d7e74c8ec8ca 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1354,8 +1354,8 @@ static void start_sw_period(struct kvm_lapic *apic)
> return;
>
> if (apic_lvtt_oneshot(apic) &&
> -   ktime_after(apic->lapic_timer.target_expiration,
> -   apic->lapic_timer.timer.base->get_time())) {
> +   !ktime_after(apic->lapic_timer.target_expiration,
> +apic->lapic_timer.timer.base->get_time())) {
> apic_timer_expired(apic);
> return;
> }
>

It works.
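The corrected condition reads: for a oneshot timer whose target expiration is not after the current time, the deadline has already passed, so the interrupt is injected immediately instead of arming a timer. A userspace model of that decision; `start_oneshot()` and the types are hypothetical, and `after()` follows the meaning of ktime_after().

```c
#include <assert.h>
#include <stdbool.h>

typedef long long ktime_ns;	/* nanoseconds; stand-in for ktime_t */

/* Same meaning as ktime_after(a, b): true if a is strictly later than b. */
static bool after(ktime_ns a, ktime_ns b)
{
	return a > b;
}

enum action { FIRE_NOW, ARM_TIMER };

/* A deadline that is not in the future has already expired. */
static enum action start_oneshot(ktime_ns target, ktime_ns now)
{
	if (!after(target, now))
		return FIRE_NOW;
	return ARM_TIMER;
}
```

Without the `!`, already-expired deadlines would arm a timer and future ones would fire immediately, which matches the oneshot test failures reported above.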

> Paolo, can you squash that?
>
>> Btw, hope you can also apply the testcase for kvm-unit-tests. :)
>
> I will have some comments, because it would be nicer if it measured the
> latency ... expected_expiration is not computed correctly.

It measures the latency from when the guest programs the clock event device
until the interrupt is injected into the guest after the timer fires.

Regards,
Wanpeng Li

Re: [RFC PATCH 1/5] mm/page_alloc: always add freeing page at the tail of the buddy list

2016-10-25 Thread Joonsoo Kim

On Wed, Oct 26, 2016 at 01:50:37PM +0800, Xishi Qiu wrote:
> On 2016/10/26 12:37, Joonsoo Kim wrote:
> 
> > On Mon, Oct 17, 2016 at 05:21:54PM +0800, Xishi Qiu wrote:
> >> On 2016/10/13 16:08, js1...@gmail.com wrote:
> >>
> >>> From: Joonsoo Kim 
> >>>
> >>> Currently, freeing page can stay longer in the buddy list if next higher
> >>> order page is in the buddy list in order to help coalescence. However,
> >>> it doesn't work for the simplest sequential free case. For example, think
> >>> about the situation that 8 consecutive pages are freed in sequential
> >>> order.
> >>>
> >>> page 0: attached at the head of order 0 list
> >>> page 1: merged with page 0, attached at the head of order 1 list
> >>> page 2: attached at the tail of order 0 list
> >>> page 3: merged with page 2 and then merged with page 0, attached at
> >>>  the head of order 2 list
> >>> page 4: attached at the head of order 0 list
> >>> page 5: merged with page 4, attached at the tail of order 1 list
> >>> page 6: attached at the tail of order 0 list
> >>> page 7: merged with page 6 and then merged with page 4. Lastly, merged
> >>>  with page 0 and we get order 3 freepage.
> >>>
> >>> With excluding page 0 case, there are three cases that freeing page is
> >>> attached at the head of buddy list in this example and if just one
> >>> corresponding ordered allocation request comes at that moment, this page
> >>> in being a high order page will be allocated and we would fail to make
> >>> order-3 freepage.
> >>>
> >>> Allocation usually happens in sequential order and free also does. So, it
> >>> would be important to detect such a situation and to give some chance
> >>> to be coalesced.
> >>>
> >>> I think that simple and effective heuristic about this case is just
> >>> attaching freeing page at the tail of the buddy list unconditionally.
> >>> If freeing isn't merged during one rotation, it would be actual
> >>> fragmentation and we don't need to care about it for coalescence.
> >>>
> >>
> >> Hi Joonsoo,
> >>
> >> I find another two places to reduce fragmentation.
> >>
> >> 1)
> >> __rmqueue_fallback
> >>   steal_suitable_fallback
> >>     move_freepages_block
> >>       move_freepages
> >>         list_move
> >> If we steal some free pages, we will add these page at the head of 
> >> start_migratetype list,
> >> this will cause more fixed migratetype, because this pages will be 
> >> allocated more easily.
> >> So how about use list_move_tail instead of list_move?
> > 
> > Yeah... I don't think deeply but, at a glance, it would be helpful.
> > 
> >>
> >> 2)
> >> __rmqueue_fallback
> >>   expand
> >>     list_add
> >> How about use list_add_tail instead of list_add? If add the tail, then the 
> >> rest of pages
> >> will be hard to be allocated and we can merge them again as soon as the 
> >> page freed.
> > 
> > I guess that it has no effect. When we do __rmqueue_fallback() and
> > expand(), we don't have any freepage on this or more order. So,
> > list_add or list_add_tail will show the same result.
> > 
> 
> Hi Joonsoo,
> 
> Usually this list is empty, but in the following case, the list is not empty.
> 
> __rmqueue_fallback
>   steal_suitable_fallback
>     move_freepages_block  // move to the list of start_migratetype
>   expand  // split the largest order first
>     list_add  // add to the list of start_migratetype

In this case, the freepage stolen in steal_suitable_fallback() and the
split freepage would come from the same pageblock. So, it doesn't
matter which list_add* function we use.

Thanks.
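The eight-page walk-through in the patch description follows the buddy rule: at order k, the buddy of a block starting at pfn p starts at p ^ (1 << k), and two free buddies merge into an order k+1 block starting at the lower pfn. The toy model below tracks only merging (no head or tail placement, no migratetypes); `free_at` and `free_page_model()` are hypothetical names, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ORDER_MODEL 4	/* orders 0..3 */
#define NR_PAGES        8

/* free_at[order][pfn] is true when a free block of that order starts at pfn. */
static bool free_at[MAX_ORDER_MODEL][NR_PAGES];

/* Free one order-0 page and merge with free buddies while possible. */
static int free_page_model(unsigned int pfn)
{
	unsigned int order = 0;

	while (order < MAX_ORDER_MODEL - 1) {
		unsigned int buddy = pfn ^ (1u << order);

		if (!free_at[order][buddy])
			break;
		free_at[order][buddy] = false;	/* take the buddy off its list */
		pfn &= ~(1u << order);		/* merged block starts at the lower pfn */
		order++;
	}
	free_at[order][pfn] = true;
	return (int)order;			/* order of the resulting free block */
}
```

Freeing pages 0 to 7 in order yields blocks of order 0, 1, 0, 2, 0, 1, 0, 3, matching the sequence in the patch description, provided nothing is allocated in between.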

Re: [RFC PATCH 1/5] mm/page_alloc: always add freeing page at the tail of the buddy list

2016-10-25 Thread Xishi Qiu

On 2016/10/26 12:37, Joonsoo Kim wrote:

> On Mon, Oct 17, 2016 at 05:21:54PM +0800, Xishi Qiu wrote:
>> On 2016/10/13 16:08, js1...@gmail.com wrote:
>>
>>> From: Joonsoo Kim 
>>>
>>> Currently, freeing page can stay longer in the buddy list if next higher
>>> order page is in the buddy list in order to help coalescence. However,
>>> it doesn't work for the simplest sequential free case. For example, think
>>> about the situation that 8 consecutive pages are freed in sequential
>>> order.
>>>
>>> page 0: attached at the head of order 0 list
>>> page 1: merged with page 0, attached at the head of order 1 list
>>> page 2: attached at the tail of order 0 list
>>> page 3: merged with page 2 and then merged with page 0, attached at
>>>  the head of order 2 list
>>> page 4: attached at the head of order 0 list
>>> page 5: merged with page 4, attached at the tail of order 1 list
>>> page 6: attached at the tail of order 0 list
>>> page 7: merged with page 6 and then merged with page 4. Lastly, merged
>>>  with page 0 and we get order 3 freepage.
>>>
>>> With excluding page 0 case, there are three cases that freeing page is
>>> attached at the head of buddy list in this example and if just one
>>> corresponding ordered allocation request comes at that moment, this page
>>> in being a high order page will be allocated and we would fail to make
>>> order-3 freepage.
>>>
>>> Allocation usually happens in sequential order and free also does. So, it
>>> would be important to detect such a situation and to give some chance
>>> to be coalesced.
>>>
>>> I think that simple and effective heuristic about this case is just
>>> attaching freeing page at the tail of the buddy list unconditionally.
>>> If freeing isn't merged during one rotation, it would be actual
>>> fragmentation and we don't need to care about it for coalescence.
>>>
>>
>> Hi Joonsoo,
>>
>> I find another two places to reduce fragmentation.
>>
>> 1)
>> __rmqueue_fallback
>>  steal_suitable_fallback
>>  move_freepages_block
>>  move_freepages
>>  list_move
>> If we steal some free pages, we will add these page at the head of 
>> start_migratetype list,
>> this will cause more fixed migratetype, because this pages will be allocated 
>> more easily.
>> So how about use list_move_tail instead of list_move?
> 
> Yeah... I don't think deeply but, at a glance, it would be helpful.
> 
>>
>> 2)
>> __rmqueue_fallback
>>  expand
>>  list_add
>> How about use list_add_tail instead of list_add? If add the tail, then the 
>> rest of pages
>> will be hard to be allocated and we can merge them again as soon as the page 
>> freed.
> 
> I guess that it has no effect. When we do __rmqueue_fallback() and
> expand(), we don't have any freepage on this or more order. So,
> list_add or list_add_tail will show the same result.
> 

Hi Joonsoo,

Usually this list is empty, but in the following case, the list is not empty.

__rmqueue_fallback
  steal_suitable_fallback
  move_freepages_block  // move to the list of start_migratetype
  expand  // split the largest order first
  list_add  // add to the list of start_migratetype

Thanks,
Xishi Qiu

Re: [PATCH 1/2] x86/io: add interface to reserve io memtype for a resource range. (v1.1)

2016-10-25 Thread Daniel Vetter

On Tue, Oct 25, 2016 at 07:31:29PM +0200, Luis R. Rodriguez wrote:
> On Mon, Oct 24, 2016 at 04:31:45PM +1000, Dave Airlie wrote:
> > A recent change to the mm code in:
> > 87744ab3832b83ba71b931f86f9cfdb000d07da5
> > mm: fix cache mode tracking in vm_insert_mixed()
> > 
> > started enforcing checking the memory type against the registered list for
> > mixed pfn insertion mappings. It happens that the drm drivers for a number
> > of gpus relied on this being broken. Currently the driver only inserted
> > VRAM mappings into the tracking table when they came from the kernel,
> > and userspace mappings never landed in the table. This led to a regression
> > where all the mappings end up as UC instead of WC now.
> 
> Eek.
> 
> > I've considered a number of solutions but since this needs to be fixed
> > in fixes and not next, and some of the solutions were going to introduce
> > overhead that hadn't been there before I didn't consider them viable at
> > this stage. These mainly concerned hooking into the TTM io reserve APIs,
> > but these API have a bunch of fast paths I didn't want to unwind to add
> > this to.
> > 
> > The solution I've decided on is to add a new API like the arch_phys_wc
> > APIs (these would have worked but wc_del didn't take a range), and
> > use them from the drivers to add a WC compatible mapping to the table
> > for all VRAM on those GPUs. This means we can then create userspace
> > mapping that won't get degraded to UC.
> 
> Is anything on a driver to be able to tell when this is actually needed ?
> How will driver developers know? Can you add a bit of documentation to
> the API? If its transitive towards a secondary solution indicating so
> would help driver developers.

I'll plug the io-mapping stuff again here, and more specifically the
userspace PTE wrangling stuff we've added in 4.9 to i915_mm.c. We should
probably move that one to the core. That way io_mapping takes care of the
full reservation, and allows you to kmap on demand (for the kernel) and to
write PTEs. All nicely fast, and as a bonus, also nicely encapsulated.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

[tip:x86/asm] mm/page_alloc: Remove kernel address exposure in free_reserved_area()

2016-10-25 Thread tip-bot for Josh Poimboeuf

Commit-ID:  adb1fe9ae2ee6ef6bc10f3d5a588020e7664dfa7
Gitweb: http://git.kernel.org/tip/adb1fe9ae2ee6ef6bc10f3d5a588020e7664dfa7
Author: Josh Poimboeuf 
AuthorDate: Tue, 25 Oct 2016 09:51:14 -0500
Committer:  Ingo Molnar 
CommitDate: Tue, 25 Oct 2016 18:40:37 +0200

mm/page_alloc: Remove kernel address exposure in free_reserved_area()

Linus suggested we try to remove some of the low-hanging fruit related
to kernel address exposure in dmesg.  The only leaks I see on my local
system are:

  Freeing SMP alternatives memory: 32K (9e309000 - 9e311000)
  Freeing initrd memory: 10588K (a0b736b42000 - a0b737599000)
  Freeing unused kernel memory: 3592K (9df87000 - 9e309000)
  Freeing unused kernel memory: 1352K (a0b7288ae000 - a0b728a0)
  Freeing unused kernel memory: 632K (a0b728d62000 - a0b728e0)

Linus says:

  "I suspect we should just remove [the addresses in the 'Freeing'
   messages]. I'm sure they are useful in theory, but I suspect they
   were more useful back when the whole "free init memory" was
   originally done.

   These days, if we have a use-after-free, I suspect the init-mem
   situation is the easiest situation by far. Compared to all the dynamic
   allocations which are much more likely to show it anyway. So having
   debug output for that case is likely not all that productive."

With this patch the freeing messages now look like this:

  Freeing SMP alternatives memory: 32K
  Freeing initrd memory: 10588K
  Freeing unused kernel memory: 3592K
  Freeing unused kernel memory: 1352K
  Freeing unused kernel memory: 632K

Suggested-by: Linus Torvalds 
Signed-off-by: Josh Poimboeuf 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Denys Vlasenko 
Cc: H. Peter Anvin 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: linux...@kvack.org
Link: http://lkml.kernel.org/r/6836ff90c45b71d38e5d4405aec56fa9e5d1d4b2.1477405374.git.jpoim...@redhat.com
Signed-off-by: Ingo Molnar 
---
 mm/page_alloc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2b3bf67..3f63973 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6508,8 +6508,8 @@ unsigned long free_reserved_area(void *start, void *end, int poison, char *s)
}
 
if (pages && s)
-   pr_info("Freeing %s memory: %ldK (%p - %p)\n",
-   s, pages << (PAGE_SHIFT - 10), start, end);
+   pr_info("Freeing %s memory: %ldK\n",
+   s, pages << (PAGE_SHIFT - 10));
 
return pages;
 }

[tip:x86/asm] x86/dumpstack: Remove raw stack dump

2016-10-25 Thread tip-bot for Josh Poimboeuf

Commit-ID:  0ee1dd9f5e7eae4e55f95935b72d4beecb03de9c
Gitweb: http://git.kernel.org/tip/0ee1dd9f5e7eae4e55f95935b72d4beecb03de9c
Author: Josh Poimboeuf 
AuthorDate: Tue, 25 Oct 2016 09:51:13 -0500
Committer:  Ingo Molnar 
CommitDate: Tue, 25 Oct 2016 18:40:37 +0200

x86/dumpstack: Remove raw stack dump

For mostly historical reasons, the x86 oops dump shows the raw stack
values:

  ...
  [registers]
  Stack:
   880079af7350 880079905400  c98f3ae0
   a0196610 0001 0001 87654321
   0002   
  Call Trace:
  ...

This seems to be an artifact from long ago, and probably isn't needed
anymore.  It generally just adds noise to the dump, and it can be
actively harmful because it leaks kernel addresses.

Linus says:

  "The stack dump actually goes back to forever, and it used to be
   useful back in 1992 or so. But it used to be useful mainly because
   stacks were simpler and we didn't have very good call traces anyway. I
   definitely remember having used them - I just do not remember having
   used them in the last ten+ years.

   Of course, it's still true that if you can trigger an oops, you've
   likely already lost the security game, but since the stack dump is so
   useless, let's aim to just remove it and make games like the above
   harder."

This also removes the related 'kstack=' cmdline option and the
'kstack_depth_to_print' sysctl.

Suggested-by: Linus Torvalds 
Signed-off-by: Josh Poimboeuf 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Denys Vlasenko 
Cc: H. Peter Anvin 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Link: http://lkml.kernel.org/r/e83bd50df52d8fe88e94d2566426ae40d813bf8f.1477405374.git.jpoim...@redhat.com
Signed-off-by: Ingo Molnar 
---
 Documentation/kernel-parameters.txt   |  3 --
 Documentation/sysctl/kernel.txt   |  8 -
 Documentation/x86/x86_64/boot-options.txt |  4 ---
 arch/x86/include/asm/stacktrace.h |  5 ---
 arch/x86/kernel/dumpstack.c   | 21 ++--
 arch/x86/kernel/dumpstack_32.c| 33 +--
 arch/x86/kernel/dumpstack_64.c| 53 +--
 kernel/sysctl.c   |  7 
 8 files changed, 4 insertions(+), 130 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 37babf9..049a917 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1958,9 +1958,6 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
kmemcheck=2 (one-shot mode)
Default: 2 (one-shot mode)
 
-   kstack=N        [X86] Print N words from the kernel stack
-   in oops dumps.
-
kvm.ignore_msrs=[KVM] Ignore guest accesses to unhandled MSRs.
Default is 0 (don't ignore, but inject #GP)
 
diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index ffab8b5..065f184 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -40,7 +40,6 @@ show up in /proc/sys/kernel:
 - hung_task_warnings
 - kexec_load_disabled
 - kptr_restrict
-- kstack_depth_to_print   [ X86 only ]
 - l2cr                        [ PPC only ]
 - modprobe                    ==> Documentation/debugging-modules.txt
 - modules_disabled
@@ -395,13 +394,6 @@ When kptr_restrict is set to (2), kernel pointers printed using
 
 ==
 
-kstack_depth_to_print: (X86 only)
-
-Controls the number of words to print when dumping the raw
-kernel stack.
-
-==
-
 l2cr: (PPC only)
 
 This flag controls the L2 cache of G3 processor boards. If
diff --git a/Documentation/x86/x86_64/boot-options.txt b/Documentation/x86/x86_64/boot-options.txt
index 0965a71..61b611e 100644
--- a/Documentation/x86/x86_64/boot-options.txt
+++ b/Documentation/x86/x86_64/boot-options.txt
@@ -277,10 +277,6 @@ IOMMU (input/output memory management unit)
 space might stop working. Use this option if you have devices that
 are accessed from userspace directly on some PCI host bridge.
 
-Debugging
-
-  kstack=N Print N words from the kernel stack in oops dumps.
-
 Miscellaneous
 
nogbpages
diff --git a/arch/x86/include/asm/stacktrace.h b/arch/x86/include/asm/stacktrace.h
index 37f2e0b..1e375b0 100644
--- a/arch/x86/include/asm/stacktrace.h
+++ b/arch/x86/include/asm/stacktrace.h
@@ -43,8 +43,6 @@ static inline bool on_stack(struct stack_info *info, void *addr, size_t len)
addr + len > begin && addr + len <= end);
 }
 
-extern int kstack_depth_to_print;
-
 #ifdef CONFIG_X86_32
 #define STACKSLOTS_PER_LINE 8
 #else
@@ -86,9 +84,6 @@ get_stack_pointer(struct task_struct *task, struct pt_regs *regs)
 void show_trace

[tip:x86/asm] x86/dumpstack: Remove kernel text addresses from stack dump

2016-10-25 Thread tip-bot for Josh Poimboeuf

Commit-ID:  bb5e5ce545f2031c96f7901cd8d1698ea3ca4c9c
Gitweb: http://git.kernel.org/tip/bb5e5ce545f2031c96f7901cd8d1698ea3ca4c9c
Author: Josh Poimboeuf 
AuthorDate: Tue, 25 Oct 2016 09:51:12 -0500
Committer:  Ingo Molnar 
CommitDate: Tue, 25 Oct 2016 18:40:37 +0200

x86/dumpstack: Remove kernel text addresses from stack dump

Printing kernel text addresses in stack dumps is of questionable value,
especially now that address randomization is becoming common.

It can be a security issue because it leaks kernel addresses.  It also
affects the usefulness of the stack dump.  Linus says:

  "I actually spend time cleaning up commit messages in logs, because
  useless data that isn't actually information (random hex numbers) is
  actively detrimental.

  It makes commit logs less legible.

  It also makes it harder to parse dumps.

  It's not useful. That makes it actively bad.

  I probably look at more oops reports than most people. I have not
  found the hex numbers useful for the last five years, because they are
  just randomized crap.

  The stack content thing just makes code scroll off the screen etc, for
  example."

The only real downside to removing these addresses is that they can be
used to disambiguate duplicate symbol names.  However such cases are
rare, and the context of the stack dump should be enough to be able to
figure it out.

There's now a 'faddr2line' script which can be used to convert a
function address to a file name and line:

  $ ./scripts/faddr2line ~/k/vmlinux write_sysrq_trigger+0x51/0x60
  write_sysrq_trigger+0x51/0x60:
  write_sysrq_trigger at drivers/tty/sysrq.c:1098

Or gdb can be used:

  $ echo "list *write_sysrq_trigger+0x51" |gdb ~/k/vmlinux |grep "is in"
  (gdb) 0x815b5d83 is in driver_probe_device (/home/jpoimboe/git/linux/drivers/base/dd.c:378).

(But note that when there are duplicate symbol names, gdb will only show
the first symbol it finds.  faddr2line is recommended over gdb because
it handles duplicates and it also does function size checking.)

Here's an example of what a stack dump looks like after this change:

  BUG: unable to handle kernel NULL pointer dereference at   (null)
  IP: sysrq_handle_crash+0x45/0x80
  PGD 36bfa067 [   29.650644] PUD 7aca3067
  Oops: 0002 [#1] PREEMPT SMP
  Modules linked in: ...
  CPU: 1 PID: 786 Comm: bash Tainted: GE   4.9.0-rc1+ #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.1-1.fc24 04/01/2014
  task: 880078582a40 task.stack: c9ba8000
  RIP: 0010:sysrq_handle_crash+0x45/0x80
  RSP: 0018:c9babdc8 EFLAGS: 00010296
  RAX: 880078582a40 RBX: 0063 RCX: 0001
  RDX: 0001 RSI:  RDI: 0292
  RBP: c9babdc8 R08: 000b31866061 R09: 
  R10: 0001 R11:  R12: 
  R13: 0007 R14: 81ee8680 R15: 
  FS:  7ffb43869700() GS:88007d40() knlGS:
  CS:  0010 DS:  ES:  CR0: 80050033
  CR2:  CR3: 7a3e9000 CR4: 001406e0
  Stack:
   c9babe00 81572d08 81572bd5 0002
    880079606600 7ffb4386e000 c9babe20
   81573201 880036a3fd00 fffb c9babe40
  Call Trace:
   __handle_sysrq+0x138/0x220
   ? __handle_sysrq+0x5/0x220
   write_sysrq_trigger+0x51/0x60
   proc_reg_write+0x42/0x70
   __vfs_write+0x37/0x140
   ? preempt_count_sub+0xa1/0x100
   ? __sb_start_write+0xf5/0x210
   ? vfs_write+0x183/0x1a0
   vfs_write+0xb8/0x1a0
   SyS_write+0x58/0xc0
   entry_SYSCALL_64_fastpath+0x1f/0xc2
  RIP: 0033:0x7ffb42f55940
  RSP: 002b:7ffd33bb6b18 EFLAGS: 0246 ORIG_RAX: 0001
  RAX: ffda RBX: 0046 RCX: 7ffb42f55940
  RDX: 0002 RSI: 7ffb4386e000 RDI: 0001
  RBP: 0011 R08: 7ffb4321ea40 R09: 7ffb43869700
  R10: 7ffb43869700 R11: 0246 R12: 00778a10
  R13: 7ffd33bb5c00 R14: 0007 R15: 0010
  Code: 34 e8 d0 34 bc ff 48 c7 c2 3b 2b 57 81 be 01 00 00 00 48 c7 c7 e0 dd e5 81 e8 a8 55 ba ff c7 05 0e 3f de 00 01 00 00 00 0f ae f8  04 25 00 00 00 00 01 5d c3 e8 4c 49 bc ff 84 c0 75 c3 48 c7
  RIP: sysrq_handle_crash+0x45/0x80 RSP: c9babdc8
  CR2: 

Suggested-by: Linus Torvalds 
Signed-off-by: Josh Poimboeuf 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Denys Vlasenko 
Cc: H. Peter Anvin 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Link: http://lkml.kernel.org/r/69329cb29b8f324bb5fcea14d61d224807fb6488.1477405374.git.jpoim...@redhat.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/include/asm/kdebug.h |  1 -
 arch/x86/kernel/dumpstack.c   | 18 --
 arch/x86/kernel/process_32.c  |  7 +++
 arch/x86/kernel/process_64.c  |  6 +++---
 arch/x86/mm/fault.c   |  3 +--

[tip:x86/asm] scripts/faddr2line: Fix "size mismatch" error

2016-10-25 Thread tip-bot for Josh Poimboeuf

Commit-ID:  efdb4167e676aaba7505bec739785b76e206cb45
Gitweb: http://git.kernel.org/tip/efdb4167e676aaba7505bec739785b76e206cb45
Author: Josh Poimboeuf 
AuthorDate: Tue, 25 Oct 2016 09:51:11 -0500
Committer:  Ingo Molnar 
CommitDate: Tue, 25 Oct 2016 18:40:37 +0200

scripts/faddr2line: Fix "size mismatch" error

I'm not sure how we missed this problem before.  When I take a function
address and size from an oops and give it to faddr2line, it usually
complains about a size mismatch:

  $ scripts/faddr2line ~/k/vmlinux write_sysrq_trigger+0x51/0x60
  skipping write_sysrq_trigger address at 0x815731a1 due to size mismatch (0x60 != 83)
  no match for write_sysrq_trigger+0x51/0x60

The problem is caused by differences in how kallsyms and faddr2line
determine a function's size.

kallsyms calculates a function's size by parsing the output of 'nm -n'
and subtracting the next function's address from the current function's
address.  This means that nop instructions after the end of the function
are included in the size.

In contrast, faddr2line reads the size from the symbol table, which does
*not* include the ending nops in the function's size.

Change faddr2line to calculate the size from the output of 'nm -n' to be
consistent with kallsyms and oops outputs.

Signed-off-by: Josh Poimboeuf 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Denys Vlasenko 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Link: http://lkml.kernel.org/r/bd313ed7c4003f6b1fda63e825325c44a9d837de.1477405374.git.jpoim...@redhat.com
Signed-off-by: Ingo Molnar 
---
 scripts/faddr2line | 33 +
 1 file changed, 21 insertions(+), 12 deletions(-)

diff --git a/scripts/faddr2line b/scripts/faddr2line
index 450b332..29df825 100755
--- a/scripts/faddr2line
+++ b/scripts/faddr2line
@@ -105,9 +105,18 @@ __faddr2line() {
# In rare cases there might be duplicates.
while read symbol; do
local fields=($symbol)
-   local sym_base=0x${fields[1]}
-   local sym_size=${fields[2]}
-   local sym_type=${fields[3]}
+   local sym_base=0x${fields[0]}
+   local sym_type=${fields[1]}
+   local sym_end=0x${fields[3]}
+
+   # calculate the size
+   local sym_size=$(($sym_end - $sym_base))
+   if [[ -z $sym_size ]] || [[ $sym_size -le 0 ]]; then
+   warn "bad symbol size: base: $sym_base end: $sym_end"
+   DONE=1
+   return
+   fi
+   sym_size=0x$(printf %x $sym_size)
 
# calculate the address
local addr=$(($sym_base + $offset))
@@ -116,26 +125,26 @@ __faddr2line() {
DONE=1
return
fi
-   local hexaddr=0x$(printf %x $addr)
+   addr=0x$(printf %x $addr)
 
# weed out non-function symbols
-   if [[ $sym_type != "FUNC" ]]; then
+   if [[ $sym_type != t ]] && [[ $sym_type != T ]]; then
[[ $print_warnings = 1 ]] &&
-   echo "skipping $func address at $hexaddr due to non-function symbol"
+   echo "skipping $func address at $addr due to non-function symbol of type '$sym_type'"
continue
fi
 
# if the user provided a size, make sure it matches the symbol's size
if [[ -n $size ]] && [[ $size -ne $sym_size ]]; then
[[ $print_warnings = 1 ]] &&
-   echo "skipping $func address at $hexaddr due to size mismatch ($size != $sym_size)"
+   echo "skipping $func address at $addr due to size mismatch ($size != $sym_size)"
continue;
fi
 
# make sure the provided offset is within the symbol's range
if [[ $offset -gt $sym_size ]]; then
[[ $print_warnings = 1 ]] &&
-   echo "skipping $func address at $hexaddr due to size mismatch ($offset > $sym_size)"
+   echo "skipping $func address at $addr due to size mismatch ($offset > $sym_size)"
continue
fi
 
@@ -143,12 +152,12 @@ __faddr2line() {
[[ $FIRST = 0 ]] && echo
FIRST=0
 
-   local hexsize=0x$(printf %x $sym_size)
-   echo "$func+$offset/$hexsize:"
-   addr2line -fpie $objfile $hexaddr | sed "s; $dir_prefix\(\./\)*; ;"
+   # pass real address to addr2line
+   echo "$func+$offset/$sym_size:"
+   addr2line -fpie $objfile $addr | sed "s; $dir_prefix\(\./\)*; ;"
DONE=1
 
-   done < <(readelf -sW $objfile | awk -v f=$func '$8 == f {pri

[PATCH v2 3/3] kernel/smp: Tell the user we're bringing up secondary CPUs

2016-10-25 Thread Michael Ellerman

Currently we don't print anything before starting to bring up secondary
CPUs. This can be confusing if it takes a long time to bring up the
secondaries, or if the kernel crashes while doing so and produces no
further output.

On x86 they work around this by detecting when the first secondary CPU
comes up and printing a message (see announce_cpu()). But doing it in
smp_init() is simpler and works for all arches.

Signed-off-by: Michael Ellerman 
Reviewed-by: Borislav Petkov 
---
 kernel/smp.c | 2 ++
 1 file changed, 2 insertions(+)

v2: Drop "smp:" from pr_info() now we have pr_fmt() defined.

diff --git a/kernel/smp.c b/kernel/smp.c
index 4323c5db7d26..77fcdb9f2775 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -555,6 +555,8 @@ void __init smp_init(void)
idle_threads_init();
cpuhp_threads_init();
 
+   pr_info("Bringing up secondary CPUs ...\n");
+
/* FIXME: This should be done in userspace --RR */
for_each_present_cpu(cpu) {
if (num_online_cpus() >= setup_max_cpus)
-- 
2.7.4

[PATCH v2 2/3] kernel/smp: Make the SMP boot message common on all arches

2016-10-25 Thread Michael Ellerman

Currently after bringing up secondary CPUs all arches print "Brought up
%d CPUs". On x86 they also print the number of nodes that were brought
online.

It would be nice to also print the number of nodes on other arches.
Although we could override smp_announce() on the other ~10 NUMA aware
arches, it seems simpler to just always print the number of nodes. On
non-NUMA arches there is just always 1 node.

Having done that, smp_announce() is no longer weak, and seems small
enough to just pull directly into smp_init().

Also update the printing of "%d CPUs" to be smart when an SMP kernel is
booted on a single CPU system, or when only one CPU is available, e.g.:

   smp: Brought up 2 nodes, 1 CPU

Signed-off-by: Michael Ellerman 
Reviewed-by: Borislav Petkov 
---
 arch/x86/kernel/smpboot.c |  8 
 kernel/smp.c  | 13 +++--
 2 files changed, 7 insertions(+), 14 deletions(-)

v2: Print singular CPU when only 1 CPU is found.
Drop "smp:" from pr_info() now we have pr_fmt() defined.

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 42f5eb7b4f6c..b9f02383f372 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -821,14 +821,6 @@ wakeup_secondary_cpu_via_init(int phys_apicid, unsigned long start_eip)
return (send_status | accept_status);
 }
 
-void smp_announce(void)
-{
-   int num_nodes = num_online_nodes();
-
-   printk(KERN_INFO "x86: Booted up %d node%s, %d CPUs\n",
-  num_nodes, (num_nodes > 1 ? "s" : ""), num_online_cpus());
-}
-
 /* reduce the number of lines printed when booting a large cpu count system */
 static void announce_cpu(int cpu, int apicid)
 {
diff --git a/kernel/smp.c b/kernel/smp.c
index 2d1f15d43022..4323c5db7d26 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -546,14 +546,10 @@ void __init setup_nr_cpu_ids(void)
nr_cpu_ids = find_last_bit(cpumask_bits(cpu_possible_mask),NR_CPUS) + 1;
 }
 
-void __weak smp_announce(void)
-{
-   printk(KERN_INFO "Brought up %d CPUs\n", num_online_cpus());
-}
-
 /* Called by boot processor to activate the rest. */
 void __init smp_init(void)
 {
+   int num_nodes, num_cpus;
unsigned int cpu;
 
idle_threads_init();
@@ -567,8 +563,13 @@ void __init smp_init(void)
cpu_up(cpu);
}
 
+   num_nodes = num_online_nodes();
+   num_cpus  = num_online_cpus();
+   pr_info("Brought up %d node%s, %d CPU%s\n",
+   num_nodes, (num_nodes > 1 ? "s" : ""),
+   num_cpus,  (num_cpus  > 1 ? "s" : ""));
+
/* Any cleanup work */
-   smp_announce();
smp_cpus_done(setup_max_cpus);
 }
 
-- 
2.7.4

[PATCH v2 1/3] kernel/smp: Define pr_fmt() for smp.c

2016-10-25 Thread Michael Ellerman

This makes all our pr_xxx()'s start with "smp: ", which helps pin down
where they come from and generally looks nice. There is actually only
one pr_xxx() use in smp.c at the moment, but we will add some more in
the next commit.

Suggested-by: Borislav Petkov 
Signed-off-by: Michael Ellerman 
---
 kernel/smp.c | 3 +++
 1 file changed, 3 insertions(+)

v2: New in v2.

diff --git a/kernel/smp.c b/kernel/smp.c
index bba3b201668d..2d1f15d43022 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -3,6 +3,9 @@
  *
  * (C) Jens Axboe  2008
  */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
 #include 
 #include 
 #include 
-- 
2.7.4

Re: [PATCH 1/2] kernel/smp: Make the SMP boot message common on all arches

2016-10-25 Thread Michael Ellerman

Ingo Molnar  writes:
> * Michael Ellerman  wrote:
>> @@ -564,8 +560,11 @@ void __init smp_init(void)
>>  cpu_up(cpu);
>>  }
>>  
>> +num_nodes = num_online_nodes();
>> +pr_info("smp: Brought up %d node%s, %d CPUs\n",
>> +num_nodes, (num_nodes > 1 ? "s" : ""), num_online_cpus());
>
> No objections - but pedantry requires me to mention that while we are evolving
> this code and changing the strings I think we should make the CPU announcement
> CPU%s smart as well: an SMP kernel on a single CPU bootup will result in
> num_online_cpus() == 1, right?

Yeah that makes sense. I don't often boot any single CPU systems, but I
tested with maxcpus=1 and it does look nicer:

smp: Brought up 2 nodes, 1 CPU


Will send a v2.

cheers

Re: [PATCH 1/2] kernel/smp: Make the SMP boot message common on all arches

2016-10-25 Thread Michael Ellerman

Borislav Petkov  writes:
> On Thu, Oct 13, 2016 at 07:55:19PM +1100, Michael Ellerman wrote:
>> @@ -564,8 +560,11 @@ void __init smp_init(void)
>>  cpu_up(cpu);
>>  }
>>  
>> +num_nodes = num_online_nodes();
>> +pr_info("smp: Brought up %d node%s, %d CPUs\n",
>> +num_nodes, (num_nodes > 1 ? "s" : ""), num_online_cpus());
>
> Please define pr_fmt for this file so that pr_info adds the prefix
> automatically. I guess
>
>   #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>
> at the top, before all the include directives should suffice.

Sure thing.

> Other than that, for both patches:
>
> Reviewed-by: Borislav Petkov 

Thanks, v2 coming soon.

cheers

[PATCH] arm64: defconfig: Enable DRM DU and V4L2 FCP + VSP modules

2016-10-25 Thread Magnus Damm

From: Magnus Damm 

Extend the ARM64 defconfig to enable the DU DRM device as module
together with required dependencies of V4L2 FCP and VSP modules.

This enables VGA output on the r8a7795 Salvator-X board.

Signed-off-by: Magnus Damm 
---

 Written against next-20161026

 arch/arm64/configs/defconfig |   14 ++
 1 file changed, 14 insertions(+)

--- 0001/arch/arm64/configs/defconfig
+++ work/arch/arm64/configs/defconfig   2016-10-26 14:10:58.220607110 +0900
@@ -293,8 +293,22 @@ CONFIG_REGULATOR_PWM=y
 CONFIG_REGULATOR_QCOM_SMD_RPM=y
 CONFIG_REGULATOR_QCOM_SPMI=y
 CONFIG_REGULATOR_S2MPS11=y
+CONFIG_MEDIA_SUPPORT=m
+CONFIG_MEDIA_CAMERA_SUPPORT=y
+CONFIG_MEDIA_ANALOG_TV_SUPPORT=y
+CONFIG_MEDIA_DIGITAL_TV_SUPPORT=y
+CONFIG_MEDIA_CONTROLLER=y
+CONFIG_VIDEO_V4L2_SUBDEV_API=y
+# CONFIG_DVB_NET is not set
+CONFIG_V4L_MEM2MEM_DRIVERS=y
+CONFIG_VIDEO_RENESAS_FCP=m
+CONFIG_VIDEO_RENESAS_VSP1=m
 CONFIG_DRM=m
 CONFIG_DRM_NOUVEAU=m
+CONFIG_DRM_RCAR_DU=m
+CONFIG_DRM_RCAR_HDMI=y
+CONFIG_DRM_RCAR_LVDS=y
+CONFIG_DRM_RCAR_VSP=y
 CONFIG_DRM_TEGRA=m
 CONFIG_DRM_PANEL_SIMPLE=m
 CONFIG_DRM_I2C_ADV7511=m

[PATCH 3/3] x86/vmware: Add paravirt sched clock

2016-10-25 Thread Alexey Makhalov

Set pv_time_ops.sched_clock to vmware_sched_clock(). It is a simplified
version of native_sched_clock() without the ring buffer of mult/shift/offset
triplets and without preempt toggling.
Since the VMware hypervisor provides a constant TSC, we can use a constant
mult/shift/offset triplet calculated at boot time.

A no-vmw-sched-clock kernel parameter is added to switch back to the
native_sched_clock() implementation.

Signed-off-by: Alexey Makhalov 
Acked-by: Alok N Kataria 
---
 Documentation/kernel-parameters.txt |  4 
 arch/x86/kernel/cpu/vmware.c| 38 +
 2 files changed, 42 insertions(+)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 37babf9..b3b2ec0 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2754,6 +2754,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
no-kvmapf   [X86,KVM] Disable paravirtualized asynchronous page
fault handling.
 
+   no-vmw-sched-clock
+   [X86,PV_OPS] Disable paravirtualized VMware scheduler
+   clock and use the default one.
+
no-steal-acc    [X86,KVM] Disable paravirtualized steal time accounting.
steal time is computed, but won't influence scheduler
behaviour
diff --git a/arch/x86/kernel/cpu/vmware.c b/arch/x86/kernel/cpu/vmware.c
index e3fb320..6ef22c1 100644
--- a/arch/x86/kernel/cpu/vmware.c
+++ b/arch/x86/kernel/cpu/vmware.c
@@ -24,10 +24,12 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
 #include 
+#include 
 
 #define CPUID_VMWARE_INFO_LEAF 0x4000
 #define VMWARE_HYPERVISOR_MAGIC0x564D5868
@@ -62,10 +64,46 @@ static unsigned long vmware_get_tsc_khz(void)
 }
 
 #ifdef CONFIG_PARAVIRT
+static struct cyc2ns_data vmware_cyc2ns __ro_after_init;
+
+static int vmw_sched_clock __initdata = 1;
+static __init int setup_vmw_sched_clock(char *s)
+{
+   vmw_sched_clock = 0;
+   return 0;
+}
+early_param("no-vmw-sched-clock", setup_vmw_sched_clock);
+
+static unsigned long long vmware_sched_clock(void)
+{
+   unsigned long long ns;
+
+   ns = mul_u64_u32_shr(rdtsc(), vmware_cyc2ns.cyc2ns_mul,
+vmware_cyc2ns.cyc2ns_shift);
+   ns -= vmware_cyc2ns.cyc2ns_offset;
+   return ns;
+}
+
 static void __init vmware_paravirt_ops_setup(void)
 {
pv_info.name = "VMware";
pv_cpu_ops.io_delay = paravirt_nop;
+
+   if (vmware_tsc_khz && vmw_sched_clock) {
+   unsigned long long tsc_now = rdtsc();
+
+   clocks_calc_mult_shift(&vmware_cyc2ns.cyc2ns_mul,
+  &vmware_cyc2ns.cyc2ns_shift,
+  vmware_tsc_khz,
+  NSEC_PER_MSEC, 0);
+   vmware_cyc2ns.cyc2ns_offset =
+   mul_u64_u32_shr(tsc_now, vmware_cyc2ns.cyc2ns_mul,
+   vmware_cyc2ns.cyc2ns_shift);
+
+   pv_time_ops.sched_clock = vmware_sched_clock;
+   pr_info("vmware: using sched offset of %llu ns\n",
+   vmware_cyc2ns.cyc2ns_offset);
+   }
 }
 #else
 #define vmware_paravirt_ops_setup() do {} while (0)
-- 
2.10.1

[PATCH 2/3] x86/vmware: Add basic paravirt ops support

2016-10-25 Thread Alexey Makhalov

Add basic paravirt support:
 1. set pv_info.name to "VMware" to get the proper boot log message
Booting paravirtualized kernel on VMware
instead of "... on bare hardware"
 2. set pv_cpu_ops.io_delay() to an empty function, paravirt_nop(), to
avoid VM exits on I/O delays.

Signed-off-by: Alexey Makhalov 
Acked-by: Alok N Kataria 
---
 arch/x86/kernel/cpu/vmware.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/arch/x86/kernel/cpu/vmware.c b/arch/x86/kernel/cpu/vmware.c
index 480790f..e3fb320 100644
--- a/arch/x86/kernel/cpu/vmware.c
+++ b/arch/x86/kernel/cpu/vmware.c
@@ -61,6 +61,16 @@ static unsigned long vmware_get_tsc_khz(void)
return vmware_tsc_khz;
 }
 
+#ifdef CONFIG_PARAVIRT
+static void __init vmware_paravirt_ops_setup(void)
+{
+   pv_info.name = "VMware";
+   pv_cpu_ops.io_delay = paravirt_nop;
+}
+#else
+#define vmware_paravirt_ops_setup() do {} while (0)
+#endif
+
 static void __init vmware_platform_setup(void)
 {
uint32_t eax, ebx, ecx, edx;
@@ -94,6 +104,8 @@ static void __init vmware_platform_setup(void)
} else {
pr_warn("Failed to get TSC freq from the hypervisor\n");
}
+
+   vmware_paravirt_ops_setup();
 }
 
 /*
-- 
2.10.1
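The two changes in this patch amount to overriding entries of a paravirt ops table at boot, with a `do { } while (0)` stub when CONFIG_PARAVIRT is off. A small userspace model of that pattern follows. The `*_model` structures, the MODEL_CONFIG_PARAVIRT switch and the exit counter are hypothetical stand-ins, and the default delay is only assumed to trap to the hypervisor (the native delay is an I/O port write).

```c
#define MODEL_CONFIG_PARAVIRT 1	/* comment out to get the no-op macro */

struct pv_info_model { const char *name; };
struct pv_cpu_ops_model { void (*io_delay)(void); };

unsigned long simulated_vm_exits;

/* Stands in for the default I/O delay, assumed to cause a VM exit. */
void trapping_io_delay(void) { simulated_vm_exits++; }

/* Stands in for paravirt_nop(). */
void nop_io_delay(void) { }

struct pv_info_model pv_info_m = { .name = "bare hardware" };
struct pv_cpu_ops_model pv_cpu_ops_m = { .io_delay = trapping_io_delay };

#ifdef MODEL_CONFIG_PARAVIRT
void paravirt_ops_setup_model(void)
{
	pv_info_m.name = "VMware";
	pv_cpu_ops_m.io_delay = nop_io_delay;	/* no more exits on delays */
}
#else
/* Expands to exactly one statement, so it is safe in an unbraced if/else. */
#define paravirt_ops_setup_model() do { } while (0)
#endif
```

Because the disabled variant expands to a single statement, `if (on_vmware) paravirt_ops_setup_model(); else ...` still parses correctly when MODEL_CONFIG_PARAVIRT is undefined.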

[PATCH 0/3] x86/vmware guest improvements

2016-10-25 Thread Alexey Makhalov

This patchset includes several VMware guest improvements:

Alexey Makhalov (3):
  x86/vmware: Use tsc_khz value for calibrate_cpu()
  x86/vmware: Add basic paravirt ops support
  x86/vmware: Add paravirt sched clock

 Documentation/kernel-parameters.txt |  4 +++
 arch/x86/kernel/cpu/vmware.c| 51 +
 2 files changed, 55 insertions(+)

-- 
2.10.1

[PATCH 1/3] x86/vmware: Use tsc_khz value for calibrate_cpu()

2016-10-25 Thread Alexey Makhalov

After aa297292d708, there are separate native calibrations for cpu_khz and
tsc_khz. The code sets x86_platform.calibrate_cpu to native_calibrate_cpu(),
which looks at CPUID leaf 0x16 or MSRs for the CPU frequency. Since we keep
tsc_khz constant (even after vMotion), cpu_khz and tsc_khz may start to
diverge.

tsc_init() now does

	cpu_khz = x86_platform.calibrate_cpu();
	tsc_khz = x86_platform.calibrate_tsc();
	if (tsc_khz == 0)
		tsc_khz = cpu_khz;
	else if (abs(cpu_khz - tsc_khz) * 10 > tsc_khz)
		cpu_khz = tsc_khz;

We want cpu_khz and tsc_khz to stay in sync even if they diverge by less
than 10%. This patch resolves the issue by setting
x86_platform.calibrate_cpu to vmware_get_tsc_khz().

Signed-off-by: Alexey Makhalov 
Acked-by: Alok N Kataria 
---
 arch/x86/kernel/cpu/vmware.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kernel/cpu/vmware.c b/arch/x86/kernel/cpu/vmware.c
index 4e34da4b..480790f 100644
--- a/arch/x86/kernel/cpu/vmware.c
+++ b/arch/x86/kernel/cpu/vmware.c
@@ -83,6 +83,7 @@ static void __init vmware_platform_setup(void)
 
vmware_tsc_khz = tsc_khz;
x86_platform.calibrate_tsc = vmware_get_tsc_khz;
+   x86_platform.calibrate_cpu = vmware_get_tsc_khz;
 
 #ifdef CONFIG_X86_LOCAL_APIC
/* Skip lapic calibration since we know the bus frequency. */
-- 
2.10.1
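The tsc_init() rule quoted in the commit message only overrides cpu_khz when the two values differ by more than 10% of tsc_khz. The model below reproduces that rule in userspace; `long` is used instead of the kernel's unsigned types to keep the subtraction signed, and the function name is hypothetical. It shows a 4% gap surviving, which is the case this patch addresses by returning tsc_khz from both calibration hooks.

```c
#include <stdlib.h>

/* Userspace model of the quoted tsc_init() reconciliation (values in kHz). */
void reconcile_khz(long *cpu_khz, long *tsc_khz)
{
	if (*tsc_khz == 0)
		*tsc_khz = *cpu_khz;		/* no TSC calibration: use CPU value */
	else if (labs(*cpu_khz - *tsc_khz) * 10 > *tsc_khz)
		*cpu_khz = *tsc_khz;		/* >10% apart: trust the TSC */
	/* otherwise both values are kept, even if they differ */
}
```

With cpu_khz = 2300000 and tsc_khz = 2400000 the gap is about 4%, so cpu_khz is left unchanged; with cpu_khz = 2000000 it is about 17% and cpu_khz is replaced by tsc_khz.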

RE: [PATCH 0/9] staging: fsl-mc: move bus driver out of staging, add dpio

2016-10-25 Thread Stuart Yoder

> -Original Message-
> From: Alexander Graf [mailto:ag...@suse.de]
> Sent: Monday, October 24, 2016 9:34 AM
> To: Stuart Yoder ; gre...@linuxfoundation.org
> Cc: German Rivera ; de...@driverdev.osuosl.org; 
> linux-kernel@vger.kernel.org;
> a...@arndb.de; Leo Li 
> Subject: Re: [PATCH 0/9] staging: fsl-mc: move bus driver out of staging, add 
> dpio
> 
> Hi Stuart,
> 
> On 10/21/2016 04:01 PM, Stuart Yoder wrote:
> > This patch series: A) addresses the final item in the staging
> > TODO list for the fsl-mc bus driver-- adding a functional driver
> > on top of the bus driver, and B) requests that the fsl-mc bus driver
> > be moved out of staging.
> 
> Awesome, it's great to see progress again! :)
> 
> > The proposed destination for the bus driver is drivers/bus.
> > Proposed location for global header files for fsl-mc and dpaa2
> > is include/linux/fsl.
> >
> > The functional driver added is for the DPIO object which provides
> > queuing services for other DPAA2 drivers.  An overview of the
> 
> I thought the idea of the TODO item was to have a full-fledged user of
> the bus, like a full network driver. The TODO item reads:
> 
> > -* Add at least one device driver for a DPAA2 object (child device of the
> > -  fsl-mc bus).  Most likely candidate for this is adding DPAA2 Ethernet
> > -  driver support, which depends on drivers for several objects: DPNI,
> > -  DPIO, DPMAC.  Other pre-requisites include:

DPIO is a "full-fledged user" of the bus.  But, yes, it does provide
infrastructure services and so does not have a standalone I/O function.

> which to me indicates that DPIO is only part of that goal. Of course I'm
> the last person blocking progress to move the driver out of staging. But
> are we at the right point yet?

I thought the goal was to demonstrate a driver on top of the fsl-mc
bus driver because without that it would have been difficult to validate/review
that the bus infrastructure was correct.

The DPIO driver demonstrates full use of the bus driver infrastructure--
getting probed, discovering and mapping mmio regions, initializing the
device, initializing interrupts.

> To me the topmost important bit of having this outside of staging is
> actually missing in the TODO list (probably since it's obvious): Have
> stable, reliable, responsible maintainership for the code.
> 
> So far I've seen German do the initial push upstream, then there was
> silence for a while. Now some time passed and you push a few bits here
> and there again. All of the efforts are great and very appreciated, but
> I'm missing the "maintainer" figure. Some peer to German and you who
> oversees the whole thing, reviews your patches and devotes at least 2-3
> days a week to only upstream fsl-mc work. Someone like York for U-Boot
> or Scott for general Linux work.
> 
> Without that, there's too much of a chance that the code will stay
> incomplete, bitrot, etc. And that'd be bad for everyone involved. I
> think the concept behind fsl-mc is great and exactly what people need,
> so we should make sure it succeeds.

I agree we need that.  We are actively working on getting an additional
maintainer (or two), and until we can get the right person(s) I'm willing
to fill that role.  We're not going to let this code bitrot.

I actually think getting the bus driver out of staging will help spur
broader involvement by NXP engineers in the fsl-mc bus support.  There
are enhancements like a resource management interface for user space,
an interface to see the MC log buffer, SMMU-related hooks for the fsl-mc
bus, and vfio for the fsl-mc bus.  All that stuff is on hold until we
get the bus driver out of staging. The directive we have is to add no
new features until the bus driver is out.

For example, the ARM SMMU driver has an include of ,
but I don't see the SMMU maintainers accepting the following in
arm-smmu.c:
   #include <../drivers/staging/fsl-mc/include/mc.h>

Given that the fsl-mc bus TODO list is done, there is not a whole lot
for a new maintainer to do to the bus driver itself until we get the
driver out of staging (aside from reviewing another DPAA2 object driver
that would also go into staging).

Once the bus driver + dpio is out of staging it also opens up the door
for other DPAA2 drivers (network, crypto, DMA, L2 switch,
decompression/compression, and others) to be upstreamed.  I didn't think
we wanted all of those to go into staging, but we were waiting until
one driver was accepted first, proving the bus infrastructure is
sound.  I was hoping DPIO could be that proof of concept.

So, in short, I think getting the bus driver and DPIO out of staging
will open some parallel development and will also provide more 
opportunities for some new maintainers to get involved, because there
will be more to review and do.

However, if you want things to stay in staging for now, I will resubmit
and put DPIO there.

Thanks,
Stuart

Re: [RFC 8/8] mm: Add N_COHERENT_DEVICE node type into node_states[]

2016-10-25 Thread Anshuman Khandual

On 10/25/2016 12:52 PM, Balbir Singh wrote:
> 
> 
> On 24/10/16 15:31, Anshuman Khandual wrote:
>> Add a new member N_COHERENT_DEVICE into node_states[] nodemask array to
>> enlist all those nodes which contain only coherent device memory. Also
>> creates a new sysfs interface /sys/devices/system/node/is_coherent_device
>> to list down all those nodes which has coherent device memory.
>>
>> Signed-off-by: Anshuman Khandual 
>> ---
>>  Documentation/ABI/stable/sysfs-devices-node |  7 +++
>>  drivers/base/node.c |  6 ++
>>  include/linux/nodemask.h|  3 +++
>>  mm/memory_hotplug.c | 10 ++
>>  4 files changed, 26 insertions(+)
>>
>> diff --git a/Documentation/ABI/stable/sysfs-devices-node 
>> b/Documentation/ABI/stable/sysfs-devices-node
>> index 5b2d0f0..5538791 100644
>> --- a/Documentation/ABI/stable/sysfs-devices-node
>> +++ b/Documentation/ABI/stable/sysfs-devices-node
>> @@ -29,6 +29,13 @@ Description:
>>  Nodes that have regular or high memory.
>>  Depends on CONFIG_HIGHMEM.
>>  
>> +What:   /sys/devices/system/node/is_coherent_device
>> +Date:   October 2016
>> +Contact:Linux Memory Management list 
>> +Description:
>> +Lists the nodemask of nodes that have coherent memory.
>> +Depends on CONFIG_COHERENT_DEVICE.
>> +
>>  What:   /sys/devices/system/node/nodeX
>>  Date:   October 2002
>>  Contact:Linux Memory Management list 
>> diff --git a/drivers/base/node.c b/drivers/base/node.c
>> index 5548f96..5b5dd89 100644
>> --- a/drivers/base/node.c
>> +++ b/drivers/base/node.c
>> @@ -661,6 +661,9 @@ static struct node_attr node_state_attr[] = {
>>  [N_MEMORY] = _NODE_ATTR(has_memory, N_MEMORY),
>>  #endif
>>  [N_CPU] = _NODE_ATTR(has_cpu, N_CPU),
>> +#ifdef CONFIG_COHERENT_DEVICE
>> +[N_COHERENT_DEVICE] = _NODE_ATTR(is_coherent_device, N_COHERENT_DEVICE),
>> +#endif
>>  };
>>  
>>  static struct attribute *node_state_attrs[] = {
>> @@ -674,6 +677,9 @@ static struct attribute *node_state_attrs[] = {
>>  &node_state_attr[N_MEMORY].attr.attr,
>>  #endif
>>  &node_state_attr[N_CPU].attr.attr,
>> +#ifdef CONFIG_COHERENT_DEVICE
>> +&node_state_attr[N_COHERENT_DEVICE].attr.attr,
>> +#endif
>>  NULL
>>  };
>>  
>> diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h
>> index f746e44..605cb0d 100644
>> --- a/include/linux/nodemask.h
>> +++ b/include/linux/nodemask.h
>> @@ -393,6 +393,9 @@ enum node_states {
>>  N_MEMORY = N_HIGH_MEMORY,
>>  #endif
>>  N_CPU,  /* The node has one or more cpus */
>> +#ifdef CONFIG_COHERENT_DEVICE
>> +N_COHERENT_DEVICE,  /* The node has coherent device memory */
>> +#endif
>>  NR_NODE_STATES
>>  };
>>  
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index 9629273..8f03962 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -1044,6 +1044,11 @@ static void node_states_set_node(int node, struct 
>> memory_notify *arg)
>>  if (arg->status_change_nid_high >= 0)
>>  node_set_state(node, N_HIGH_MEMORY);
>>  
>> +#ifdef CONFIG_COHERENT_DEVICE
>> +if (isolated_cdm_node(node))
>> +node_set_state(node, N_COHERENT_DEVICE);
>> +#endif
>> +
> 
> #ifdef not required, see below
> 

Right, will change.

>>  node_set_state(node, N_MEMORY);
>>  }
>>  
>> @@ -1858,6 +1863,11 @@ static void node_states_clear_node(int node, struct 
>> memory_notify *arg)
>>  if ((N_MEMORY != N_HIGH_MEMORY) &&
>>  (arg->status_change_nid >= 0))
>>  node_clear_state(node, N_MEMORY);
>> +
>> +#ifdef CONFIG_COHERENT_DEVICE
>> +if (isolated_cdm_node(node))
>> +node_clear_state(node, N_COHERENT_DEVICE);
>> +#endif
>>  }
>>  
> 
> I think the #ifdefs are not needed if isolated_cdm_node
> is defined for both with and without CONFIG_COHERENT_DEVICE.
> 
> I think this patch needs to move up in the series so that
> node state can be examined by other core algorithms

Okay, will move up.
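Balbir's suggestion is the usual header-stub pattern: provide isolated_cdm_node() in both configurations and let the !CONFIG_COHERENT_DEVICE version return false, so call sites need no #ifdef. One caveat visible in the RFC as posted: N_COHERENT_DEVICE itself is only declared under CONFIG_COHERENT_DEVICE, so an unguarded call also needs that enum member (or some fallback) to exist in both configurations. The userspace sketch below assumes it is declared unconditionally; every `sim_` name and the CDM node mask are hypothetical.

```c
#include <stdbool.h>

/* Leave undefined to exercise the stub; define to use the "real" check. */
/* #define CONFIG_COHERENT_DEVICE 1 */

enum sim_node_states {
	SIM_N_MEMORY,
	SIM_N_CPU,
	SIM_N_COHERENT_DEVICE,	/* declared unconditionally (assumption) */
	SIM_NR_NODE_STATES
};

unsigned long sim_node_mask[SIM_NR_NODE_STATES];

void sim_node_set_state(int node, enum sim_node_states state)
{
	sim_node_mask[state] |= 1UL << node;
}

#ifdef CONFIG_COHERENT_DEVICE
unsigned long sim_cdm_nodes;	/* hypothetical: nodes with only device memory */
bool sim_isolated_cdm_node(int node)
{
	return sim_cdm_nodes & (1UL << node);
}
#else
static inline bool sim_isolated_cdm_node(int node)
{
	(void)node;
	return false;		/* no CDM support: never a coherent-device node */
}
#endif

/* Call site without #ifdef, as suggested in the review. */
void sim_node_states_set_node(int node)
{
	if (sim_isolated_cdm_node(node))
		sim_node_set_state(node, SIM_N_COHERENT_DEVICE);
	sim_node_set_state(node, SIM_N_MEMORY);
}
```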

Re: [RFC PATCH 2/5] mm/page_alloc: use smallest fallback page first in movable allocation

2016-10-25 Thread Joonsoo Kim

On Fri, Oct 14, 2016 at 12:52:26PM +0200, Vlastimil Babka wrote:
> On 10/14/2016 03:26 AM, Joonsoo Kim wrote:
> >On Thu, Oct 13, 2016 at 11:12:10AM +0200, Vlastimil Babka wrote:
> >>On 10/13/2016 10:08 AM, js1...@gmail.com wrote:
> >>>From: Joonsoo Kim 
> >>>
> >>>When we try to find freepage in fallback buddy list, we always serach
> >>>the largest one. This would help for fragmentation if we process
> >>>unmovable/reclaimable allocation request because it could cause permanent
> >>>fragmentation on movable pageblock and spread out such allocations would
> >>>cause more fragmentation. But, movable allocation request is
> >>>rather different. It would be simply freed or migrated so it doesn't
> >>>contribute to fragmentation on the other pageblock. In this case, it would
> >>>be better not to break the precious highest order freepage so we need to
> >>>search the smallest freepage first.
> >>
> >>I've also pondered this, but then found a lower hanging fruit that
> >>should be hopefully clear win and mitigate most cases of breaking
> >>high-order pages unnecessarily:
> >>
> >>http://marc.info/?l=linux-mm&m=147582914330198&w=2
> >
> >Yes, I agree with that change. That's the similar patch what I tried
> >before.
> >
> >"mm/page_alloc: don't break highest order freepage if steal"
> >http://marc.info/?l=linux-mm&m=143011930520417&w=2
> 
> Ah, indeed, I forgot about it and had to rediscover :)
> 
> >
> >>
> >>So I would try that first, and then test your patch on top? In your
> >>patch there's a risk that we make it harder for
> >>unmovable/reclaimable pageblocks to become movable again (we start
> >>with the smallest page which means there's lower chance that
> >>move_freepages_block() will convert more than half of the block).
> >
> >Indeed, but, with your "count movable pages when stealing", risk would
> >disappear. :)
> 
> Hmm, but that counting is only triggered when we attempt to steal
> whole pageblock. For movable allocation, can_steal_fallback() allows
> that only for
> (order >= pageblock_order / 2), and since your patch makes "order"
> as small as possible for movable allocations, the chances are lower?

Chances are lower than current but we eventually try to steal that
(order >= pageblock_order / 2) freepage from unmovable pageblock and
your logic will result in changing pageblock migratetype from
unmovable to movable.

Thanks.
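The disagreement is about which fallback order gets picked and whether that order is large enough to trigger whole-pageblock stealing for a movable request. The simplified model below contrasts the current largest-first search with the RFC's smallest-first search and applies the `order >= pageblock_order / 2` condition quoted above. Its names are hypothetical, MAX_ORDER = 11 and pageblock_order = 9 are assumed (as on a typical x86-64 configuration with 2 MiB huge pages), and it ignores the migratetype fallback tables and the other conditions in can_steal_fallback().

```c
#include <stdbool.h>

#define SIM_MAX_ORDER		11	/* assumed MAX_ORDER */
#define SIM_PAGEBLOCK_ORDER	9	/* assumed pageblock_order */

/*
 * Pick a fallback order >= order from a bitmask of orders that have a free
 * block in some fallback migratetype. Returns -1 if none.
 */
int find_fallback_order(unsigned int avail_mask, int order, bool largest_first)
{
	int o;

	if (largest_first) {		/* current behaviour */
		for (o = SIM_MAX_ORDER - 1; o >= order; o--)
			if (avail_mask & (1u << o))
				return o;
	} else {			/* RFC: smallest first for movable */
		for (o = order; o < SIM_MAX_ORDER; o++)
			if (avail_mask & (1u << o))
				return o;
	}
	return -1;
}

/* Movable request: whole-pageblock stealing only from this order upward. */
bool movable_can_steal_pageblock(int fallback_order)
{
	return fallback_order >= SIM_PAGEBLOCK_ORDER / 2;
}
```

With free blocks at orders 2, 5 and 10, largest-first takes order 10 (stealing allowed) while smallest-first takes order 2 (not allowed). Once only an order-5 block is left, smallest-first also reaches the threshold (5 >= 9 / 2), which is the case Joonsoo points to.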

Re: [RFC PATCH 1/5] mm/page_alloc: always add freeing page at the tail of the buddy list

2016-10-25 Thread Joonsoo Kim

On Mon, Oct 17, 2016 at 05:21:54PM +0800, Xishi Qiu wrote:
> On 2016/10/13 16:08, js1...@gmail.com wrote:
> 
> > From: Joonsoo Kim 
> > 
> > Currently, freeing page can stay longer in the buddy list if next higher
> > order page is in the buddy list in order to help coalescence. However,
> > it doesn't work for the simplest sequential free case. For example, think
> > about the situation that 8 consecutive pages are freed in sequential
> > order.
> > 
> > page 0: attached at the head of order 0 list
> > page 1: merged with page 0, attached at the head of order 1 list
> > page 2: attached at the tail of order 0 list
> > page 3: merged with page 2 and then merged with page 0, attached at
> >  the head of order 2 list
> > page 4: attached at the head of order 0 list
> > page 5: merged with page 4, attached at the tail of order 1 list
> > page 6: attached at the tail of order 0 list
> > page 7: merged with page 6 and then merged with page 4. Lastly, merged
> >  with page 0 and we get order 3 freepage.
> > 
> > With excluding page 0 case, there are three cases that freeing page is
> > attached at the head of buddy list in this example and if just one
> > corresponding ordered allocation request comes at that moment, this page
> > in being a high order page will be allocated and we would fail to make
> > order-3 freepage.
> > 
> > Allocation usually happens in sequential order and free also does. So, it
> > would be important to detect such a situation and to give some chance
> > to be coalesced.
> > 
> > I think that simple and effective heuristic about this case is just
> > attaching freeing page at the tail of the buddy list unconditionally.
> > If freeing isn't merged during one rotation, it would be actual
> > fragmentation and we don't need to care about it for coalescence.
> > 
> 
> Hi Joonsoo,
> 
> I find another two places to reduce fragmentation.
> 
> 1)
> __rmqueue_fallback
>   steal_suitable_fallback
>   move_freepages_block
>   move_freepages
>   list_move
> If we steal some free pages, we will add these page at the head of 
> start_migratetype list,
> this will cause more fixed migratetype, because this pages will be allocated 
> more easily.
> So how about use list_move_tail instead of list_move?

Yeah... I haven't thought about it deeply but, at a glance, it would be helpful.

> 
> 2)
> __rmqueue_fallback
>   expand
>   list_add
> How about use list_add_tail instead of list_add? If add the tail, then the 
> rest of pages
> will be hard to be allocated and we can merge them again as soon as the page 
> freed.

I guess that it has no effect. When we do __rmqueue_fallback() and
expand(), we don't have any free page of this or a higher order. So
list_add and list_add_tail will give the same result.

Thanks.
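The merge sequence in the quoted example follows from the buddy rule: the buddy of a block at `pfn` and `order` starts at `pfn ^ (1 << order)`, and a merged block starts at the lower of the two. The sketch below replays freeing pages 0 to 7 in order on a hypothetical eight-page zone and reports the order each freed page ends up in. It does not model head/tail placement on the free lists, which is what the patch changes; it only shows which intermediate blocks (orders 1 and 2) exist between frees and could be handed out before the final order-3 merge.

```c
#include <stdbool.h>
#include <string.h>

#define SIM_NR_ORDERS	4	/* orders 0..3: enough for 8 pages */
#define SIM_NR_PAGES	8

struct buddy_sim {
	/* free_at[order][pfn]: a free block of this order starts at pfn */
	bool free_at[SIM_NR_ORDERS][SIM_NR_PAGES];
};

void buddy_sim_init(struct buddy_sim *s)
{
	memset(s, 0, sizeof(*s));
}

/* Free the order-0 page at pfn, merging with free buddies; return final order. */
unsigned int buddy_sim_free(struct buddy_sim *s, unsigned long pfn)
{
	unsigned int order = 0;

	while (order < SIM_NR_ORDERS - 1) {
		unsigned long buddy = pfn ^ (1UL << order);

		if (!s->free_at[order][buddy])
			break;				/* buddy busy: stop merging */
		s->free_at[order][buddy] = false;	/* take buddy off its list */
		pfn &= ~(1UL << order);			/* merged block starts lower */
		order++;
	}
	s->free_at[order][pfn] = true;
	return order;
}
```

Replaying frees of pages 0..7 returns orders 0, 1, 0, 2, 0, 1, 0, 3, matching the quoted walk-through.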

Re: [kernel-hardening] [PATCH] module: extend 'rodata=off' boot cmdline parameter to module mappings

2016-10-25 Thread AKASHI Takahiro

Rusty, Jessica

On Wed, Oct 26, 2016 at 10:43:32AM +1030, Rusty Russell wrote:
> AKASHI Takahiro  writes:
> > On Thu, Oct 20, 2016 at 01:48:15PM -0700, Kees Cook wrote:
> >> On Wed, Oct 19, 2016 at 11:24 PM, AKASHI Takahiro
> >>  wrote:
> >> > The current "rodata=off" parameter disables read-only kernel mappings
> >> > under CONFIG_DEBUG_RODATA:
> >> > commit d2aa1acad22f ("mm/init: Add 'rodata=off' boot cmdline 
> >> > parameter
> >> > to disable read-only kernel mappings")
> >> >
> >> > This patch is a logical extension to module mappings ie. read-only 
> >> > mappings
> >> > at module loading can be disabled even if CONFIG_DEBUG_SET_MODULE_RONX
> >> > (mainly for debug use). Please note, however, that it only affects RO/RW
> >> > permissions, keeping NX set.
> 
> This patch looks good (except the minor issues noted by Kees); please CC
> the followup version to Jessica as new module maintainer.

I think that the new version (v2)[1] addresses Kees' comments already.

[1] http://lkml.iu.edu//hypermail/linux/kernel/1610.2/04163.html

Thanks,
-Takahiro AKASHI

> Thanks!
> Rusty.
> 
> >> >
> >> > This is the first step to make CONFIG_DEBUG_SET_MODULE_RONX mandatory
> >> > (always-on) in the future as CONFIG_DEBUG_RODATA on x86 and arm64.
> >> >
> >> > Suggested-by: Mark Rutland 
> >> > Signed-off-by: AKASHI Takahiro 
> >> > Cc: Rusty Russell 
> >> > ---
> >> > v1:
> >> >   * remove RFC's "module_ronx=" and merge it with "rodata="
> >> >   * always keep NX set if CONFIG_SET_MODULE_RONX
> >> >
> >> >  include/linux/init.h |  3 ++-
> >> >  init/main.c  |  2 +-
> >> >  kernel/module.c  | 21 ++---
> >> >  3 files changed, 21 insertions(+), 5 deletions(-)
> >> >
> >> > diff --git a/include/linux/init.h b/include/linux/init.h
> >> > index e30104c..20aa2eb 100644
> >> > --- a/include/linux/init.h
> >> > +++ b/include/linux/init.h
> >> > @@ -126,7 +126,8 @@ void prepare_namespace(void);
> >> >  void __init load_default_modules(void);
> >> >  int __init init_rootfs(void);
> >> >
> >> > -#ifdef CONFIG_DEBUG_RODATA
> >> > +#if defined(CONFIG_DEBUG_RODATA) || 
> >> > defined(CONFIG_DEBUG_SET_MODULE_RONX)
> >> > +extern bool rodata_enabled;
> >> >  void mark_rodata_ro(void);
> >> >  #endif
> >> >
> >> > diff --git a/init/main.c b/init/main.c
> >> > index 2858be7..92db2f3 100644
> >> > --- a/init/main.c
> >> > +++ b/init/main.c
> >> > @@ -915,7 +915,7 @@ static int try_to_run_init_process(const char 
> >> > *init_filename)
> >> >  static noinline void __init kernel_init_freeable(void);
> >> >
> >> >  #ifdef CONFIG_DEBUG_RODATA
> >> > -static bool rodata_enabled = true;
> >> > +bool rodata_enabled = true;
> >> 
> >> Is there a mismatch here between the extern ifdef and the bool ifdef?
> >> I.e. shouldn't the ifdef here be || DEBUG_SET_MODULE_RONX too?
> >
> > Yes.
> >
> >> Also, can you mark this as __ro_after_init, since nothing changes it
> >> after the kernel command line is parsed?
> >
> > Yes, yes.
> >
> > Thanks,
> > -Takahiro AKASHI
> >
> >> Otherwise, this looks fine to me.
> >> 
> >> -Kees
> >> 
> >> 
> >> -- 
> >> Kees Cook
> >> Nexus Security
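Kees's two review points (matching guards for the declaration and the definition, and __ro_after_init on a flag that is only written while the command line is parsed) can be illustrated with the userspace sketch below. It is not the v2 patch: the config macro is defined by hand, RO_AFTER_INIT is an empty stand-in for __ro_after_init, and set_rodata_param() is a hypothetical, simplified handler that only treats "off" as disabling.

```c
#include <stdbool.h>
#include <string.h>

/* Userspace stand-ins: pretend only the module option is enabled. */
#define CONFIG_DEBUG_SET_MODULE_RONX 1
#define RO_AFTER_INIT	/* empty stand-in for __ro_after_init */

/* Header side (include/linux/init.h in the patch). */
#if defined(CONFIG_DEBUG_RODATA) || defined(CONFIG_DEBUG_SET_MODULE_RONX)
extern bool rodata_enabled;
#endif

/* Definition side (init/main.c): identical guard, so the symbol exists
 * whenever it is declared. */
#if defined(CONFIG_DEBUG_RODATA) || defined(CONFIG_DEBUG_SET_MODULE_RONX)
bool rodata_enabled RO_AFTER_INIT = true;

/* Hypothetical boot-parameter handler: only "off" disables RO mappings. */
int set_rodata_param(const char *str)
{
	rodata_enabled = strcmp(str, "off") != 0;
	return 0;
}
#endif
```

In the kernel, __ro_after_init fits here because boot parameters are parsed before mark_rodata_ro() makes that section read-only.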

Re: [PATCH v6 3/6] mm/cma: populate ZONE_CMA

2016-10-25 Thread Joonsoo Kim

On Tue, Oct 18, 2016 at 05:27:30PM +0900, Joonsoo Kim wrote:
> On Tue, Oct 18, 2016 at 09:42:57AM +0200, Vlastimil Babka wrote:
> > On 10/14/2016 05:03 AM, js1...@gmail.com wrote:
> > >@@ -145,6 +145,35 @@ static int __init cma_activate_area(struct cma *cma)
> > > static int __init cma_init_reserved_areas(void)
> > > {
> > >   int i;
> > >+  struct zone *zone;
> > >+  pg_data_t *pgdat;
> > >+
> > >+  if (!cma_area_count)
> > >+  return 0;
> > >+
> > >+  for_each_online_pgdat(pgdat) {
> > >+  unsigned long start_pfn = UINT_MAX, end_pfn = 0;
> > >+
> > >+  for (i = 0; i < cma_area_count; i++) {
> > >+  if (pfn_to_nid(cma_areas[i].base_pfn) !=
> > >+  pgdat->node_id)
> > >+  continue;
> > >+
> > >+  start_pfn = min(start_pfn, cma_areas[i].base_pfn);
> > >+  end_pfn = max(end_pfn, cma_areas[i].base_pfn +
> > >+  cma_areas[i].count);
> > >+  }
> > >+
> > >+  if (!end_pfn)
> > >+  continue;
> > >+
> > >+  zone = &pgdat->node_zones[ZONE_CMA];
> > >+
> > >+  /* ZONE_CMA doesn't need to exceed CMA region */
> > >+  zone->zone_start_pfn = max(zone->zone_start_pfn, start_pfn);
> > >+  zone->spanned_pages = min(zone_end_pfn(zone), end_pfn) -
> > >+  zone->zone_start_pfn;
> > 
> > Hmm, do the max/min here work as intended? IIUC the initial
> 
> Yeap.
> 
> > zone_start_pfn is UINT_MAX and zone->spanned_pages is 1? So at least
> > the max/min should be swapped?
> 
> No. CMA zone's start/end pfn are updated as node's start/end pfn.
> 
> > Also the zone_end_pfn(zone) on the second line already sees the
> > changes to zone->zone_start_pfn in the first line, so it's kind of a
> > mess. You should probably cache zone_end_pfn() to a temporary
> > variable before changing zone_start_pfn.
> 
> You're right although it doesn't cause any problem. I look at the code
> again and find that max/min isn't needed. Calculated start/end pfn
> should be inbetween node's start/end pfn so max(zone->zone_start_pfn,
> start_pfn) will return start_pfn and messed up min(zone_end_pfn(zone),
> end_pfn) will return end_pfn in all the cases.
> 
> Anyway, I will fix it as following.
> 
> zone->zone_start_pfn = start_pfn
> zone->spanned_pages = end_pfn - start_pfn
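The fixed computation takes the PFN span of all CMA areas on a node and assigns it directly, instead of clamping against the zone's existing bounds. It also avoids the ordering trap Vlastimil noted: zone_end_pfn() is zone_start_pfn + spanned_pages, so reading it after zone_start_pfn has been updated gives a shifted end. A userspace sketch under assumptions: the `sim_` structures are hypothetical, each area carries its node id instead of calling pfn_to_nid(), and ULONG_MAX is used as the initial start since the PFNs are unsigned long.

```c
#include <limits.h>

struct sim_cma_area {
	unsigned long base_pfn;
	unsigned long count;	/* pages */
	int nid;		/* stands in for pfn_to_nid(base_pfn) */
};

struct sim_zone {
	unsigned long zone_start_pfn;
	unsigned long spanned_pages;
};

/*
 * Set the zone to exactly cover the CMA areas on node nid.
 * Returns 0 (zone untouched) if the node has no CMA area, else 1.
 */
int sim_cma_zone_span(const struct sim_cma_area *areas, int nr, int nid,
		      struct sim_zone *zone)
{
	unsigned long start_pfn = ULONG_MAX, end_pfn = 0;
	int i;

	for (i = 0; i < nr; i++) {
		unsigned long end = areas[i].base_pfn + areas[i].count;

		if (areas[i].nid != nid)
			continue;
		if (areas[i].base_pfn < start_pfn)
			start_pfn = areas[i].base_pfn;
		if (end > end_pfn)
			end_pfn = end;
	}

	if (!end_pfn)
		return 0;

	/* Direct assignment: no clamping and no read of the old zone end. */
	zone->zone_start_pfn = start_pfn;
	zone->spanned_pages = end_pfn - start_pfn;
	return 1;
}
```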

Hello,

Here comes fixed one.

--->8
>From 93fb05a83d74f9e2c8caebc2fa6d1a8807c9ffb6 Mon Sep 17 00:00:00 2001
From: Joonsoo Kim 
Date: Thu, 24 Mar 2016 22:29:10 +0900
Subject: [PATCH] mm/cma: populate ZONE_CMA

Until now, reserved pages for CMA have been managed in the ordinary zones
to which their PFNs belong. This approach has numerous problems and fixing
them isn't easy (they are described in the previous patch). To fix this
situation, ZONE_CMA was introduced in the previous patch, but it is not
yet populated. This patch implements population of ZONE_CMA by stealing
the reserved pages from the ordinary zones.

Unlike the previous implementation, where a kernel allocation request
with __GFP_MOVABLE could be serviced from the CMA region, in the new
approach only allocation requests with GFP_HIGHUSER_MOVABLE can be
serviced from the CMA region. This is an inevitable design decision when
using the zone implementation, because ZONE_CMA could contain highmem.
Due to this decision, ZONE_CMA will work like ZONE_HIGHMEM or ZONE_MOVABLE.

I don't think this is a problem because most file cache pages and
anonymous pages are requested with GFP_HIGHUSER_MOVABLE. This is supported
by the fact that many systems with ZONE_HIGHMEM work fine. A notable
disadvantage is that we cannot use these pages for blockdev file cache
pages, because those usually have __GFP_MOVABLE but not __GFP_HIGHMEM and
__GFP_USER. But this case has pros and cons. In my experience, blockdev
file cache pages are one of the top reasons that cause cma_alloc() to
fail temporarily, so excluding that case gives a stronger guarantee that
cma_alloc() succeeds.

The implementation itself is easy to understand: steal the pages when the
CMA area is initialized and recalculate the various per-zone stats/thresholds.

Reviewed-by: Aneesh Kumar K.V 
Signed-off-by: Joonsoo Kim 
---
 include/linux/memory_hotplug.h |  3 --
 include/linux/mm.h |  1 +
 mm/cma.c   | 62 ++
 mm/internal.h  |  3 ++
 mm/page_alloc.c| 29 +---
 5 files changed, 86 insertions(+), 12 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 01033fa..ea5af47 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -198,9 +198,6 @@ extern void get_page_bootmem(unsigned long ingo, struct 
page *page,
 void mem_hotplug_begin(void);
 void mem_hotplug_done(void);

-extern void set_zone_contiguous(struct zone *zone);
-extern void clear_zone_contiguous(struct

Re: [PATCH v3] x86/msr: Add write msr notrace to avoid the debug codes splash

2016-10-25 Thread Wanpeng Li

2016-10-25 19:15 GMT+08:00 Paolo Bonzini :
>
>
> On 25/10/2016 04:58, Wanpeng Li wrote:
>> @@ -319,7 +319,7 @@ static void kvm_guest_apic_eoi_write(u32 reg, u32 val)
>
> This needs to be notrace too.

Ok, I just sent out a new version for this.

Regards,
Wanpeng Li

Re: [RFC PATCH] xhci: do not halt the secondary HCD

2016-10-25 Thread Joel Stanley

On Tue, Sep 20, 2016 at 5:56 PM, Mathias Nyman
 wrote:
> Quick Googling shows that that TI TUSB 73x0 USB3.0 xHCI host has an issue
> with halting.
>
> Errata says host needs 125us to 1ms between the last control transfer and
> clearing the run/stop bit. (halting the host)
>
> Suggested workaround is to wait at least 2ms before halting the host.
>
> See issue #10 in:
> http://www.ti.com/lit/er/sllz076/sllz076.pdf
>
> It might just be that the patch works because it forces halting the host to
> be done later (secondary hcd -> primary hcd),  giving it enough time after
> the last control transfer.

Well spotted.

I gave this a go, adding a quirk and performing a msleep:

+++ b/drivers/usb/host/xhci.c
@@ -109,6 +109,10 @@ int xhci_halt(struct xhci_hcd *xhci)
 {
 	int ret;
 	xhci_dbg_trace(xhci, trace_xhci_dbg_init, "// Halt the HC");
+
+	if (xhci->quirks & XHCI_HALT_DELAY_QUIRK)
+		msleep(2);
+
 	xhci_quiesce(xhci);

However it didn't help.

Are we guaranteed that transfers are not in flight at that point?

>
>>> a first step.
>>>
>>> load primary
>>> load secondary  (starts the xhci controller
>>> ...
>>> unload secondary (halts the controller)
>>> unload primary   (free memory)
>
>
> Now thinking about it, it doesn't really make sense to halt the host
> controller hardware before removing the primary hcd. It will just cause
> devices under the primary (USB2) to be removed uncleanly.  So basically
> the idea of the workaround makes sense, it just needs to be cleaned up
> from a workaround to intended behavior.

Great. When you say clean up, do you just mean tidying the comments?

Cheers,

Joel


>
> We might also need an additional quirk for TI TUSB 73x0 that adds a
> msleep() before the xhci_halt, even if it's moved to the last hcd removed.
>
> -Mathias
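Joel's test slept a fixed 2 ms inside xhci_halt(). The errata wording ties the delay to the last control transfer, so one hypothetical variant records when that transfer completed and sleeps only for whatever is left of a 2 ms window. The userspace sketch below shows that arithmetic with CLOCK_MONOTONIC and nanosleep(); the names are invented, and in the driver the timestamp and sleep would instead come from kernel facilities such as ktime_get() and usleep_range().

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

#define SIM_HALT_DELAY_NS 2000000ULL	/* errata workaround: at least 2 ms */

uint64_t sim_now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Time still to wait before halting; 0 once the window has passed. */
uint64_t halt_delay_remaining_ns(uint64_t last_ctrl_ns, uint64_t now_ns)
{
	uint64_t elapsed = now_ns - last_ctrl_ns;

	return elapsed >= SIM_HALT_DELAY_NS ? 0 : SIM_HALT_DELAY_NS - elapsed;
}

/* Sleep until at least SIM_HALT_DELAY_NS after the last control transfer. */
void wait_before_halt(uint64_t last_ctrl_ns)
{
	uint64_t rem = halt_delay_remaining_ns(last_ctrl_ns, sim_now_ns());
	struct timespec req = {
		.tv_sec = (time_t)(rem / 1000000000ULL),
		.tv_nsec = (long)(rem % 1000000000ULL),
	};

	if (rem)
		nanosleep(&req, NULL);
}
```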

[PATCH v4] x86/msr: Add write msr notrace to avoid the debug codes splash

2016-10-25 Thread Wanpeng Li

From: Wanpeng Li 

As Peterz pointed out:

| The thing is, many many smp_reschedule_interrupt() invocations don't
| actually execute anything much at all and are only send to tickle the
| return to user path (which does the actual preemption).

This patch adds a notrace variant of wrmsr to avoid the debug code splash.

Suggested-by: Peter Zijlstra 
Suggested-by: Paolo Bonzini 
Cc: Ingo Molnar 
Cc: Mike Galbraith 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Paolo Bonzini 
Signed-off-by: Wanpeng Li 
---
 arch/x86/include/asm/apic.h |  3 ++-
 arch/x86/include/asm/msr.h  | 15 +++
 arch/x86/kernel/apic/apic.c |  1 +
 arch/x86/kernel/kvm.c   |  6 +++---
 arch/x86/kernel/smp.c   |  2 --
 5 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index f5aaf6c..a5a0bcf 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -196,7 +196,7 @@ static inline void native_apic_msr_write(u32 reg, u32 v)
 
 static inline void native_apic_msr_eoi_write(u32 reg, u32 v)
 {
-   wrmsr(APIC_BASE_MSR + (APIC_EOI >> 4), APIC_EOI_ACK, 0);
+   wrmsr_notrace(APIC_BASE_MSR + (APIC_EOI >> 4), APIC_EOI_ACK, 0);
 }
 
 static inline u32 native_apic_msr_read(u32 reg)
@@ -332,6 +332,7 @@ struct apic {
 * on write for EOI.
 */
void (*eoi_write)(u32 reg, u32 v);
+   void (*native_eoi_write)(u32 reg, u32 v);
u64 (*icr_read)(void);
void (*icr_write)(u32 low, u32 high);
void (*wait_icr_idle)(void);
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index b5fee97..afbb221 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -127,6 +127,21 @@ notrace static inline void native_write_msr(unsigned int 
msr,
 }
 
 /* Can be uninlined because referenced by paravirt */
+notrace static inline void native_write_msr_notrace(unsigned int msr,
+   unsigned low, unsigned high)
+{
+   asm volatile("1: wrmsr\n"
+"2:\n"
+_ASM_EXTABLE_HANDLE(1b, 2b, ex_handler_wrmsr_unsafe)
+: : "c" (msr), "a"(low), "d" (high) : "memory");
+}
+
+static inline void wrmsr_notrace(unsigned msr, unsigned low, unsigned high)
+{
+   native_write_msr_notrace(msr, low, high);
+}
+
+/* Can be uninlined because referenced by paravirt */
 notrace static inline int native_write_msr_safe(unsigned int msr,
unsigned low, unsigned high)
 {
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 88c657b..2686894 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -2263,6 +2263,7 @@ void __init apic_set_eoi_write(void (*eoi_write)(u32 reg, 
u32 v))
for (drv = __apicdrivers; drv < __apicdrivers_end; drv++) {
/* Should happen once for each apic */
WARN_ON((*drv)->eoi_write == eoi_write);
+   (*drv)->native_eoi_write = (*drv)->eoi_write;
(*drv)->eoi_write = eoi_write;
}
 }
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index edbbfc8..a4627ed 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -308,7 +308,7 @@ static void kvm_register_steal_time(void)
 
 static DEFINE_PER_CPU(unsigned long, kvm_apic_eoi) = KVM_PV_EOI_DISABLED;
 
-static void kvm_guest_apic_eoi_write(u32 reg, u32 val)
+static void kvm_guest_apic_eoi_write_notrace(u32 reg, u32 val)
 {
/**
 * This relies on __test_and_clear_bit to modify the memory
@@ -319,7 +319,7 @@ static void kvm_guest_apic_eoi_write(u32 reg, u32 val)
 */
if (__test_and_clear_bit(KVM_PV_EOI_BIT, this_cpu_ptr(&kvm_apic_eoi)))
return;
-   apic_write(APIC_EOI, APIC_EOI_ACK);
+   apic->native_eoi_write(APIC_EOI, APIC_EOI_ACK);
 }
 
 static void kvm_guest_cpu_init(void)
@@ -474,7 +474,7 @@ void __init kvm_guest_init(void)
}
 
if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
-   apic_set_eoi_write(kvm_guest_apic_eoi_write);
+   apic_set_eoi_write(kvm_guest_apic_eoi_write_notrace);
 
if (kvmclock_vsyscall)
kvm_setup_vsyscall_timeinfo();
diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index c00cb64..68f8cc2 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -261,10 +261,8 @@ static inline void __smp_reschedule_interrupt(void)
 
 __visible void smp_reschedule_interrupt(struct pt_regs *regs)
 {
-   irq_enter();
ack_APIC_irq();
__smp_reschedule_interrupt();
-   irq_exit();
/*
 * KVM uses this interrupt to force a cpu out of guest mode
 */
-- 
1.9.1

Re: [PATCHv4 18/43] block: define BIO_MAX_PAGES to HPAGE_PMD_NR if huge page cache enabled

2016-10-25 Thread Andreas Dilger

On Oct 25, 2016, at 6:54 AM, Kirill A. Shutemov  wrote:
> 
> On Tue, Oct 25, 2016 at 12:21:22AM -0700, Christoph Hellwig wrote:
>> On Tue, Oct 25, 2016 at 03:13:17AM +0300, Kirill A. Shutemov wrote:
>>> We are going to do IO a huge page at a time. So we need BIO_MAX_PAGES to be
>>> at least HPAGE_PMD_NR. For x86-64, it's 512 pages.
>> 
>> NAK.  The maximum bio size should not depend on an obscure vm config,
>> please send a standalone patch increasing the size to the block list,
>> with a much longer explanation.  Also you can't simply increase the size
>> of the largest pool, we'll probably need more pools instead, or maybe
>> even implement a similar chaining scheme as we do for struct
>> scatterlist.
> 
> The size of required pool depends on architecture: different architectures
> have different (huge page size)/(base page size).
> 
> Would it be okay if I add one more pool with size equal to HPAGE_PMD_NR,
> if it's bigger than BIO_MAX_PAGES and huge pages are enabled?

Why wouldn't you have all the pool sizes in between?  Definitely 1MB has
been too small already for high-bandwidth IO.  I wouldn't mind BIOs up to
4MB or larger since most high-end RAID hardware does best with 4MB IOs.

Cheers, Andreas








RE: [PATCH 1/3] clk: qcom: gdsc: Add support for gdscs with HW control

2016-10-25 Thread Sricharan

Hi Stan,

>Hi Sricharan,
>
>On 10/24/2016 01:18 PM, Sricharan R wrote:
>> From: Rajendra Nayak 
>>
>> Some GDSCs might support a HW control mode, wherein the power
>> domain (gdsc) is brought in and out of low power state (while
>> unused) without any SW assistance, saving power.
>> Such GDSCs can be configured in a HW control mode when powered on
>> until they are explicitly requested to be powered off by software.
>>
>> Signed-off-by: Rajendra Nayak 
>> Signed-off-by: Sricharan R 
>> ---
>>  drivers/clk/qcom/gdsc.c | 15 +++
>>  drivers/clk/qcom/gdsc.h |  1 +
>>  2 files changed, 16 insertions(+)
>>
>> diff --git a/drivers/clk/qcom/gdsc.c b/drivers/clk/qcom/gdsc.c
>> index f12d7b2..a5e1c8c 100644
>> --- a/drivers/clk/qcom/gdsc.c
>> +++ b/drivers/clk/qcom/gdsc.c
>> @@ -55,6 +55,13 @@ static int gdsc_is_enabled(struct gdsc *sc, unsigned int reg)
>>  return !!(val & PWR_ON_MASK);
>>  }
>>
>> +static int gdsc_hwctrl(struct gdsc *sc, bool en)
>> +{
>> +u32 val = en ? HW_CONTROL_MASK : 0;
>> +
>> +return regmap_update_bits(sc->regmap, sc->gdscr, HW_CONTROL_MASK, val);
>> +}
>> +
>>  static int gdsc_toggle_logic(struct gdsc *sc, bool en)
>>  {
>>  int ret;
>> @@ -164,6 +171,10 @@ static int gdsc_enable(struct generic_pm_domain *domain)
>>   */
>>  udelay(1);
>>
>> +/* Turn on HW trigger mode if supported */
>> +if (sc->flags & HW_CTRL)
>> +gdsc_hwctrl(sc, true);
>
   Sure, will add the check.

Regards,
 Sricharan

[PATCH] drm: rcar-du: Fix R-Car Gen3 crash when VSP is disabled

2016-10-25 Thread Magnus Damm

From: Magnus Damm 

For the DU to operate on R-Car Gen3 hardware, a combination of DU
and VSP devices is required. Since the DU driver also supports
earlier hardware generations, the VSP portion is enabled via Kconfig.

As of v4.9-rc1, the arm64 defconfig enables the DU driver as a
module; however, this is not enough to support R-Car Gen3. With
CONFIG_DRM_RCAR_VSP=n, the kernel crashes when loading the module.
This patch fixes that particular case.

In more detail, the crash triggers in drm_atomic_get_plane_state()
when __drm_atomic_helper_set_config() passes NULL as crtc->primary.

This patch corrects this issue by failing to load the DU driver on
R-Car Gen3 when VSP is not available.

Signed-off-by: Magnus Damm 
---

 drivers/gpu/drm/rcar-du/rcar_du_vsp.h |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- 0001/drivers/gpu/drm/rcar-du/rcar_du_vsp.h
+++ work/drivers/gpu/drm/rcar-du/rcar_du_vsp.h  2016-10-26 00:01:12.920607110 +0900
@@ -70,7 +70,7 @@ void rcar_du_vsp_disable(struct rcar_du_
 void rcar_du_vsp_atomic_begin(struct rcar_du_crtc *crtc);
 void rcar_du_vsp_atomic_flush(struct rcar_du_crtc *crtc);
 #else
-static inline int rcar_du_vsp_init(struct rcar_du_vsp *vsp) { return 0; };
+static inline int rcar_du_vsp_init(struct rcar_du_vsp *vsp) { return -ENXIO; };
 static inline void rcar_du_vsp_enable(struct rcar_du_crtc *crtc) { };
 static inline void rcar_du_vsp_disable(struct rcar_du_crtc *crtc) { };
 static inline void rcar_du_vsp_atomic_begin(struct rcar_du_crtc *crtc) { };

Re: [PATCH V2 4/8] PM / OPP: Pass struct dev_pm_opp_supply to _set_opp_voltage()

2016-10-25 Thread Viresh Kumar

On 25-10-16, 13:26, Stephen Boyd wrote:
> For things like AVS we'll probably want to do that, although it's
> sort of funny because replacing RCU with rw-locks is the opposite
> direction most people go.

Yes, that would be very funny :)

> With AVS we would be updating the
> voltage(s) in use for the current OPP, and we would want that
> update to block any OPP transition until the voltage is adjusted.
> I don't know how we would do that with RCU very well. Plus, RCU
> is for reader heavy things, but we mostly have one or two
> readers.

Not just that, think of the opp_disable() function. What currently guarantees that
an OPP being disabled isn't in use right now, or about to be used?

I strongly feel RCU is not the best fit for OPP core at least.

> I guess it's ok for now to do all this copying, but it feels like
> we'll need to undo a large portion of it later with things like
> AVS.

Yes.

> Or at least we'll be doing copies for almost no reason
> because we'll want to hold the read lock across the whole OPP
> transition. I was going to suggest we pass around information
> about what we want to grab from the RCU protected data
> structures, think index of regulator, etc. and then have small
> RCU read-side critical sections to grab that info during the OPP
> transition but I'm not sure that's any better. It might be worse
> because the OPP could change during the OPP transition and we
> could be using half of the old and half of the new data.

The problem is that this code is getting harder to read for everybody. If we are
finding it difficult to understand, what about newbies?

-- 
viresh

Re: [PATCH 1/3] usb: dwc3: host: inherit dma configuration from parent dev

2016-10-25 Thread Peter Chen

On Tue, Oct 25, 2016 at 04:26:26PM +0530, Sriram Dash wrote:
> For the xhci-hcd platform device, not all DMA parameters are configured
> properly, notably the DMA ops for dwc3 devices.
> 
> The idea here is that you pass in the parent of_node along with the child
> device pointer, so it would behave exactly like the parent already does.
> The difference is that it also handles all the other attributes besides
> the mask.
> Splitting the usb_bus->controller field into the Linux-internal device
> (used for the sysfs hierarchy, for printks and for power management)
> and a new pointer (used for DMA, DT enumeration and phy lookup) probably
> covers all that we really need.
> 
> Signed-off-by: Arnd Bergmann 
> Signed-off-by: Sriram Dash 
> Cc: Felipe Balbi 
> Cc: Grygorii Strashko 
> Cc: Sinjan Kumar 
> Cc: David Fisher 
> Cc: Catalin Marinas 
> Cc: "Thang Q. Nguyen" 
> Cc: Yoshihiro Shimoda 
> Cc: Stephen Boyd 
> Cc: Bjorn Andersson 
> Cc: Ming Lei 
> Cc: Jon Masters 
> Cc: Dann Frazier 
> Cc: Peter Chen 
> Cc: Leo Li 
> ---
>  drivers/usb/chipidea/host.c  |  3 ++-
>  drivers/usb/chipidea/udc.c   | 10 +
>  drivers/usb/core/buffer.c| 12 +--
>  drivers/usb/core/hcd.c   | 48 ++--
>  drivers/usb/core/usb.c   | 18 -
>  drivers/usb/dwc3/core.c  | 22 +---
>  drivers/usb/dwc3/core.h  |  1 +
>  drivers/usb/dwc3/ep0.c   |  8 
>  drivers/usb/dwc3/gadget.c| 37 +-
>  drivers/usb/dwc3/host.c  |  8 
>  drivers/usb/host/ehci-fsl.c  |  4 ++--
>  drivers/usb/host/xhci-mem.c  | 12 +--
>  drivers/usb/host/xhci-plat.c | 33 +++---
>  drivers/usb/host/xhci.c  | 15 ++
>  include/linux/usb.h  |  1 +
>  include/linux/usb/hcd.h  |  3 +++
>  16 files changed, 144 insertions(+), 91 deletions(-)
> 
> diff --git a/drivers/usb/chipidea/host.c b/drivers/usb/chipidea/host.c
> index 96ae695..ca27893 100644
> --- a/drivers/usb/chipidea/host.c
> +++ b/drivers/usb/chipidea/host.c
> @@ -116,7 +116,8 @@ static int host_start(struct ci_hdrc *ci)
>   if (usb_disabled())
>   return -ENODEV;
>  
> - hcd = usb_create_hcd(&ci_ehci_hc_driver, ci->dev, dev_name(ci->dev));
> + hcd = __usb_create_hcd(&ci_ehci_hc_driver, ci->dev->parent,
> +ci->dev, dev_name(ci->dev), NULL);
>   if (!hcd)
>   return -ENOMEM;
>  
> diff --git a/drivers/usb/chipidea/udc.c b/drivers/usb/chipidea/udc.c
> index 661f43f..bc55922 100644
> --- a/drivers/usb/chipidea/udc.c
> +++ b/drivers/usb/chipidea/udc.c
> @@ -423,7 +423,8 @@ static int _hardware_enqueue(struct ci_hw_ep *hwep, struct ci_hw_req *hwreq)
>  
>   hwreq->req.status = -EALREADY;
>  
> - ret = usb_gadget_map_request(&ci->gadget, &hwreq->req, hwep->dir);
> + ret = usb_gadget_map_request_by_dev(ci->dev->parent,
> + &hwreq->req, hwep->dir);
>   if (ret)
>   return ret;
>  
> @@ -603,7 +604,8 @@ static int _hardware_dequeue(struct ci_hw_ep *hwep, struct ci_hw_req *hwreq)
>   list_del_init(&node->td);
>   }
>  
> - usb_gadget_unmap_request(&hwep->ci->gadget, &hwreq->req, hwep->dir);
> + usb_gadget_unmap_request_by_dev(hwep->ci->dev->parent,
> + &hwreq->req, hwep->dir);
>  
>   hwreq->req.actual += actual;
>  
> @@ -1904,13 +1906,13 @@ static int udc_start(struct ci_hdrc *ci)
>   INIT_LIST_HEAD(&ci->gadget.ep_list);
>  
>   /* alloc resources */
> - ci->qh_pool = dma_pool_create("ci_hw_qh", dev,
> + ci->qh_pool = dma_pool_create("ci_hw_qh", dev->parent,
>  sizeof(struct ci_hw_qh),
>  64, CI_HDRC_PAGE_SIZE);
>   if (ci->qh_pool == NULL)
>   return -ENOMEM;
>  
> - ci->td_pool = dma_pool_create("ci_hw_td", dev,
> + ci->td_pool = dma_pool_create("ci_hw_td", dev->parent,
>  sizeof(struct ci_hw_td),
>  64, CI_HDRC_PAGE_SIZE);

The chipidea part is ok for me, but just follow Arnd's suggestion
for patch split, subject, and commit log.

Peter

>   if (ci->td_pool == NULL) {
> diff --git a/drivers/usb/core/buffer.c b/drivers/usb/core/buffer.c
> index 98e39f9..1e41ef7 100644
> --- a/drivers/usb/core/buffer.c
> +++ b/drivers/usb/core/buffer.c
> @@ -63,7 +63,7 @@ int hcd_buffer_create(struct usb_hcd *hcd)
>   int i, size;
>  
>   if (!IS_ENABLED(CONFIG_HAS_DMA) ||
> - (!hcd->self.controller->dma_mask &&
> + (!hcd->self.sysdev->dma_mask &&
>!(hcd->driver->flags & HCD_LOCAL_MEM)))
>   return 0;
>  
> @@ -72,7 +72,7 @@ int hcd_buffer_create(struct usb_hcd *hcd)
>   if (!size)
>   continue;
>   snprintf(name, sizeof(name), "buffer-%d

Re: [PATCH V2 0/8] PM / OPP: Multiple regulator support

2016-10-25 Thread Viresh Kumar

On 25-10-16, 16:13, Dave Gerlach wrote:
> I think what you have shared below is a good safety check but if I rename
> the regulator properties in the DT for the cpu (to vdd and vbb, meaning
> cpufreq detects no regulator) and do *not* call dev_pm_opp_set_regulators
> before cpufreq-dt probes we fail before we even get to that point:
> 
> [16.946] cpu cpu0: opp_parse_supplies: Invalid number of elements in opp-microvolt property (6) with supplies (1)
> [16.967] cpu cpu0: _of_add_opp_table_v2: Failed to add OPP, -22
> [16.982] cpu cpu0: dev_pm_opp_get_opp_count: OPP table not found (-19)
> [16.982] cpu cpu0: OPP table is not ready, deferring probe
> 
> This failure is because opp_parse_supplies assumes a count of 1 regulator if
> no regulators at all are present and then hard fails if too many voltages
> have been passed for each OPP.

Exactly. And yes this is intentional.

> It seems we need a check much earlier similar
> to what you suggested below to allow us to defer if an OPP has supplied
> voltages but no regulator has been registered with the system. I think this
> is reasonable even for the 1 regulator case, no?

No.

The OPP core needs to know about regulators only if the user drivers want it to
manage DVFS. It is still possible for cpufreq drivers to use the OPP framework for
managing the tables but do the real DVFS work themselves. That's why setting
regulator names is not compulsory in the code.

And it's only wrong if dev_pm_opp_set_rate() is called without first setting the
regulators.

> cpufreq-dt won't handle this properly as is, but now that the opp core is
> evolving perhaps it makes sense to modify the resources_available check
> slightly to rely on the OPP core rather than just a dummy
> regulator_get_optional to see if the regulator is ready.

I am not sure yet what to change there. Do you mean regarding multiple
regulators?

-- 
viresh

Re: [PATCH V2 0/6] ARM64: Uprobe support added

2016-10-25 Thread Pratyush Anand

Hi Catalin,

Please let me know whether everything other than is_trap_insn() looks
fine to you, so that I can rework it in time. It would be great if we
could make it into v4.9.


~Pratyush


On Tue, Sep 27, 2016 at 1:17 PM, Pratyush Anand  wrote:
> Changes since v1:
> * Exposed sync_icache_aliases() and used that instead of flush_uprobe_xol_access()
> * Assigned 0x0005 to BRK64_ESR_UPROBES instead of 0x0008
> * moved uprobe_opcode_t from probes.h to uprobes.h
> * Assigned 4 to TIF_UPROBE instead of 5
> * Assigned AARCH64_INSN_SIZE to UPROBE_SWBP_INSN_SIZE instead of hard code 4.
> * Removed saved_fault_code from struct arch_uprobe_task
> * Removed preempt_dis(en)able() from arch_uprobe_copy_ixol()
> * Removed case INSN_GOOD from arch_uprobe_analyze_insn()
> * Now we do check that probe point is not for a 32 bit task.
> * Return a false positive from is_trap_insn()
> * Changes for rebase conflict resolution
>
> V1 was here: https://lkml.org/lkml/2016/8/2/29
> Patches have been rebased on next-20160927, so that there would be no
> conflicts with other arm64/for-next/core patches.
>
> Patches have been tested for following:
> 1. Step-able instructions, like sub, ldr, add etc.
> 2. Simulation-able like ret, cbnz, cbz etc.
> 3. uretprobe
> 4. Reject-able instructions like sev, wfe etc.
> 5. trapped and abort xol path
> 6. probe at unaligned user address.
> 7. longjump test cases
>
> aarch32 task probing is not yet supported.
>
> Pratyush Anand (6):
>   arm64: kprobe: protect/rename few definitions to be reused by uprobe
>   arm64: kgdb_step_brk_fn: ignore other's exception
>   arm64: Handle TRAP_TRACE for user mode as well
>   arm64: Handle TRAP_BRKPT for user mode as well
>   arm64: introduce mm context flag to keep 32 bit task information
>   arm64: Add uprobe support
>
>  arch/arm64/Kconfig  |   3 +
>  arch/arm64/include/asm/cacheflush.h |   1 +
>  arch/arm64/include/asm/debug-monitors.h |   3 +
>  arch/arm64/include/asm/elf.h|  12 +-
>  arch/arm64/include/asm/mmu.h|   1 +
>  arch/arm64/include/asm/probes.h |  19 +--
>  arch/arm64/include/asm/ptrace.h |   8 ++
>  arch/arm64/include/asm/thread_info.h|   5 +-
>  arch/arm64/include/asm/uprobes.h|  36 ++
>  arch/arm64/kernel/debug-monitors.c  |  40 +++---
>  arch/arm64/kernel/kgdb.c|   3 +
>  arch/arm64/kernel/probes/Makefile   |   2 +
>  arch/arm64/kernel/probes/decode-insn.c  |  32 ++---
>  arch/arm64/kernel/probes/decode-insn.h  |   8 +-
>  arch/arm64/kernel/probes/kprobes.c  |  36 +++---
>  arch/arm64/kernel/probes/uprobes.c  | 221
>  arch/arm64/kernel/signal.c  |   3 +
>  arch/arm64/mm/flush.c   |   2 +-
>  18 files changed, 371 insertions(+), 64 deletions(-)
>  create mode 100644 arch/arm64/include/asm/uprobes.h
>  create mode 100644 arch/arm64/kernel/probes/uprobes.c
>
> --
> 2.7.4
>

Re: [PATCH 1/2] mm/memblock: prepare a capability to support memblock near alloc

2016-10-25 Thread Leizhen (ThunderTown)



On 2016/10/25 21:23, Michal Hocko wrote:
> On Tue 25-10-16 10:59:17, Zhen Lei wrote:
>> If HAVE_MEMORYLESS_NODES is selected and some memoryless NUMA nodes
>> actually exist, the percpu variable areas and NUMA control blocks of those
>> memoryless nodes need to be allocated from the nearest available
>> node to improve performance.
>>
>> Although memblock_alloc_try_nid and memblock_virt_alloc_try_nid try the
>> specified nid first, if that allocation fails they fall back directly to
>> NUMA_NO_NODE. This means any node may be used on the second attempt.
>>
>> To stay compatible with the existing behavior, I use a macro,
>> node_distance_ready, to control it. By default, node_distance_ready is not
>> defined on any platform, so the above-mentioned functions work as before.
>> Otherwise, they try the nearest node first.
> 
> I am sorry but it is absolutely unclear to me _what_ is the motivation
> of the patch. Is this a performance optimization, correctness issue or
> something else? Could you please restate what is the problem, why do you
> think it has to be fixed at memblock layer and describe what the actual
> fix is please?
This is a performance optimization. The problem arises when some memoryless NUMA
nodes actually exist. For example, there are 4 nodes in total (0, 1, 2, 3), node 1
has no memory, and the topology is as below:
        ------------board------------
        |                           |
        |                           |
     socket0                     socket1
      /   \                       /   \
     /     \                     /     \
  node0   node1               node2   node3
distance[1][0] is smaller than distance[1][2] and distance[1][3], so CPUs on node1
access the memory of node0 faster than that of node2 or node3.

Linux defines a lot of percpu variables; each CPU has its own copy and most of the
time only accesses its own percpu area. In this example, we want the percpu areas of
the CPUs on node1 to be allocated from node0, but without these patches that is not
guaranteed.

If each node has its own memory, we can directly use the functions below to allocate
memory from the local node:
1. memblock_alloc_nid
2. memblock_alloc_try_nid
3. memblock_virt_alloc_try_nid_nopanic
4. memblock_virt_alloc_try_nid

So these patches are only needed for the memoryless NUMA scenario.

Another use case is the control block "extern pg_data_t *node_data[]".
Here is an example from the x86 NUMA code in arch/x86/mm/numa.c:
static void __init alloc_node_data(int nid)
{
	... ...
	/*
	 * Allocate node data.  Try node-local memory and then any node.
//==>But the nearest node is the best
	 * Never allocate in DMA zone.
	 */
	nd_pa = memblock_alloc_nid(nd_size, SMP_CACHE_BYTES, nid);
	if (!nd_pa) {
		nd_pa = __memblock_alloc_base(nd_size, SMP_CACHE_BYTES,
					      MEMBLOCK_ALLOC_ACCESSIBLE);
		if (!nd_pa) {
			pr_err("Cannot find %zu bytes in node %d\n",
			       nd_size, nid);
			return;
		}
	}
	nd = __va(nd_pa);
	... ...
	node_data[nid] = nd;

> 
> From a quick glance you are trying to bend over the memblock API for
> something that should be handled on a different layer.
> 
>>
>> Signed-off-by: Zhen Lei 
>> ---
>>  mm/memblock.c | 76 ++-
>>  1 file changed, 65 insertions(+), 11 deletions(-)
>>
>> diff --git a/mm/memblock.c b/mm/memblock.c
>> index 7608bc3..556bbd2 100644
>> --- a/mm/memblock.c
>> +++ b/mm/memblock.c
>> @@ -1213,9 +1213,71 @@ phys_addr_t __init memblock_alloc(phys_addr_t size, phys_addr_t align)
>>  return memblock_alloc_base(size, align, MEMBLOCK_ALLOC_ACCESSIBLE);
>>  }
>>
>> +#ifndef node_distance_ready
>> +#define node_distance_ready()   0
>> +#endif
>> +
>> +static phys_addr_t __init memblock_alloc_near_nid(phys_addr_t size,
>> +phys_addr_t align, phys_addr_t start,
>> +phys_addr_t end, int nid, ulong flags,
>> +int alloc_func_type)
>> +{
>> +int nnid, round = 0;
>> +u64 pa;
>> +DECLARE_BITMAP(nodes_map, MAX_NUMNODES);
>> +
>> +bitmap_zero(nodes_map, MAX_NUMNODES);
>> +
>> +again:
>> +/*
>> + * There are total 4 cases:
>> + * 
>> + *   1)2) node_distance_ready || !node_distance_ready
>> + *  Round 1, nnid = nid = NUMA_NO_NODE;
>> + * 
>> + *   3) !node_distance_ready
>> + *  Round 1, nnid = nid;
>> + *::Round 2, currently only applicable for alloc_func_type = <0>
>> + *  Round 2, nnid = NUMA_NO_NODE;
>> + *   4) node_distance_ready
>> + *  Round 1, LOCAL_DISTANCE, nnid = nid;
>> + *  Round ?, nnid = nearest nid;
>> + */
>> +if (!nod

[PATCH v6 4/5] ARM: DTS: da850: Add cfgchip syscon node

2016-10-25 Thread David Lechner

Add a syscon node for the SoC CFGCHIPn registers. This is needed for
the new usb phy driver.

Signed-off-by: David Lechner 
---
 arch/arm/boot/dts/da850.dtsi | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/arm/boot/dts/da850.dtsi b/arch/arm/boot/dts/da850.dtsi
index f79e1b9..6bbf20d 100644
--- a/arch/arm/boot/dts/da850.dtsi
+++ b/arch/arm/boot/dts/da850.dtsi
@@ -188,6 +188,10 @@
};
 
};
+   cfgchip: cfgchip@1417c {
+   compatible = "ti,da830-cfgchip", "syscon";
+   reg = <0x1417c 0x14>;
+   };
edma0: edma@0 {
compatible = "ti,edma3-tpcc";
/* eDMA3 CC0: 0x01c0  - 0x01c0 7fff */
-- 
2.7.4

[PATCH v6 0/5] da8xx USB PHY platform devices and clocks

2016-10-25 Thread David Lechner

It has been almost 6 months since the v5 submission, so here is a recap:

* There were a number of phy and usb dependencies that were submitted
  separately.
* The last of the usb dependencies has finally made its way into linux-next
  today.
* This series was recently included in "[PATCH/RFT v2 00/17] Add DT support for
  ohci-da8xx". I am breaking it back out again as a standalone series.


v6 changes:

* Combine "ARM: davinci: da8xx: Enable the usb20 "per" clk on phy_clk_enable"
  from the "[PATCH/RFT v2 00/17] Add DT support for ohci-da8xx" series with
  the "ARM: davinci: da8xx: add usb phy clocks" patch in this series.
* Change the syscon and da8xx-usb-phy device ids to -1.

v5 changes: renamed "usbphy" to "usb_phy" or "usb-phy" as appropriate

v4 changes: fix strict checkpatch complaint

v3 changes:

* Fixed the davinci device tree declarations to use the preferred DT address
  convention so that the items I have added can be correct too.
* Moved that davinci clock init so that we don't have to call ioremap in the
  clock mux functions.
* Added a new "syscon" device for the CFGCHIP registers. This is used by the
  USB PHY driver and will be used in the future in common clock framework
  drivers.
* USB clocks are moved to a common file instead of having duplicated code.
* PHY driver uses syscon for CFGCHIP registers instead of using them directly.

David Lechner (5):
  ARM: davinci: da8xx: add usb phy clocks
  ARM: davinci: da8xx: Add CFGCHIP syscon platform declaration.
  ARM: davinci: da8xx: Add USB PHY platform declaration
  ARM: DTS: da850: Add cfgchip syscon node
  ARM: DTS: da850: Add usb phy node

 arch/arm/boot/dts/da850.dtsi|   9 ++
 arch/arm/mach-davinci/board-da830-evm.c |  52 +++---
 arch/arm/mach-davinci/board-da850-evm.c |   4 +
 arch/arm/mach-davinci/board-mityomapl138.c  |   4 +
 arch/arm/mach-davinci/board-omapl138-hawk.c |  23 ++-
 arch/arm/mach-davinci/devices-da8xx.c   |  28 
 arch/arm/mach-davinci/include/mach/da8xx.h  |   6 +
 arch/arm/mach-davinci/usb-da8xx.c   | 243 +++-
 8 files changed, 327 insertions(+), 42 deletions(-)

-- 
2.7.4

[PATCH v6 1/5] ARM: davinci: da8xx: add usb phy clocks

2016-10-25 Thread David Lechner

Up to this point, the USB phy clock configuration was handled manually in
the board files and in the usb drivers. This adds proper clocks so that
the usb drivers can use clk_get and clk_enable and not have to worry about
the details. Also, the related code is removed from the board files and
replaced with the new clock registration functions.

Signed-off-by: David Lechner 
Signed-off-by: Axel Haslam 
---

I have added "ARM: davinci: da8xx: Enable the usb20 "per" clk on phy_clk_enable"
from Axel Haslam to this patch.

In the review of Axel's patch, Sekhar said:

> We should not be using a NULL device pointer here. Can you pass the musb
> device pointer available in the same file? Also, da850_clks[] in da850.c
> needs to be fixed to add the matching device name.

However, the musb device may not be registered. The usb20_clk can be used to
supply a 48MHz clock to USB 1.1 (ohci) without using the musb device. So, I am
inclined to leave this as NULL.


 arch/arm/mach-davinci/board-da830-evm.c |  22 ++-
 arch/arm/mach-davinci/board-omapl138-hawk.c |  16 +-
 arch/arm/mach-davinci/include/mach/da8xx.h  |   3 +
 arch/arm/mach-davinci/usb-da8xx.c   | 232 +++-
 4 files changed, 252 insertions(+), 21 deletions(-)

diff --git a/arch/arm/mach-davinci/board-da830-evm.c b/arch/arm/mach-davinci/board-da830-evm.c
index 3d8cf8c..605d444 100644
--- a/arch/arm/mach-davinci/board-da830-evm.c
+++ b/arch/arm/mach-davinci/board-da830-evm.c
@@ -115,18 +115,6 @@ static __init void da830_evm_usb_init(void)
 */
cfgchip2 = __raw_readl(DA8XX_SYSCFG0_VIRT(DA8XX_CFGCHIP2_REG));
 
-   /* USB2.0 PHY reference clock is 24 MHz */
-   cfgchip2 &= ~CFGCHIP2_REFFREQ;
-   cfgchip2 |=  CFGCHIP2_REFFREQ_24MHZ;
-
-   /*
-* Select internal reference clock for USB 2.0 PHY
-* and use it as a clock source for USB 1.1 PHY
-* (this is the default setting anyway).
-*/
-   cfgchip2 &= ~CFGCHIP2_USB1PHYCLKMUX;
-   cfgchip2 |=  CFGCHIP2_USB2PHYCLKMUX;
-
/*
 * We have to override VBUS/ID signals when MUSB is configured into the
 * host-only mode -- ID pin will float if no cable is connected, so the
@@ -143,6 +131,16 @@ static __init void da830_evm_usb_init(void)
__raw_writel(cfgchip2, DA8XX_SYSCFG0_VIRT(DA8XX_CFGCHIP2_REG));
 
/* USB_REFCLKIN is not used. */
+   ret = da8xx_register_usb20_phy_clk(false);
+   if (ret)
+   pr_warn("%s: USB 2.0 PHY CLK registration failed: %d\n",
+   __func__, ret);
+
+   ret = da8xx_register_usb11_phy_clk(false);
+   if (ret)
+   pr_warn("%s: USB 1.1 PHY CLK registration failed: %d\n",
+   __func__, ret);
+
ret = davinci_cfg_reg(DA830_USB0_DRVVBUS);
if (ret)
pr_warn("%s: USB 2.0 PinMux setup failed: %d\n", __func__, ret);
diff --git a/arch/arm/mach-davinci/board-omapl138-hawk.c b/arch/arm/mach-davinci/board-omapl138-hawk.c
index ee62486..d4930b6 100644
--- a/arch/arm/mach-davinci/board-omapl138-hawk.c
+++ b/arch/arm/mach-davinci/board-omapl138-hawk.c
@@ -243,7 +243,6 @@ static irqreturn_t omapl138_hawk_usb_ocic_irq(int irq, void *dev_id)
 static __init void omapl138_hawk_usb_init(void)
 {
int ret;
-   u32 cfgchip2;
 
ret = davinci_cfg_reg_list(da850_hawk_usb11_pins);
if (ret) {
@@ -251,12 +250,15 @@ static __init void omapl138_hawk_usb_init(void)
return;
}
 
-   /* Setup the Ref. clock frequency for the HAWK at 24 MHz. */
-
-   cfgchip2 = __raw_readl(DA8XX_SYSCFG0_VIRT(DA8XX_CFGCHIP2_REG));
-   cfgchip2 &= ~CFGCHIP2_REFFREQ;
-   cfgchip2 |=  CFGCHIP2_REFFREQ_24MHZ;
-   __raw_writel(cfgchip2, DA8XX_SYSCFG0_VIRT(DA8XX_CFGCHIP2_REG));
+   /* USB_REFCLKIN is not used. */
+   ret = da8xx_register_usb20_phy_clk(false);
+   if (ret)
+   pr_warn("%s: USB 2.0 PHY CLK registration failed: %d\n",
+   __func__, ret);
+   ret = da8xx_register_usb11_phy_clk(false);
+   if (ret)
+   pr_warn("%s: USB 1.1 PHY CLK registration failed: %d\n",
+   __func__, ret);
 
ret = gpio_request_one(DA850_USB1_VBUS_PIN,
GPIOF_DIR_OUT, "USB1 VBUS");
diff --git a/arch/arm/mach-davinci/include/mach/da8xx.h b/arch/arm/mach-davinci/include/mach/da8xx.h
index f9f9713..c367530 100644
--- a/arch/arm/mach-davinci/include/mach/da8xx.h
+++ b/arch/arm/mach-davinci/include/mach/da8xx.h
@@ -88,6 +88,9 @@ int da850_register_edma(struct edma_rsv_info *rsv[2]);
 int da8xx_register_i2c(int instance, struct davinci_i2c_platform_data *pdata);
 int da8xx_register_spi_bus(int instance, unsigned num_chipselect);
 int da8xx_register_watchdog(void);
+int da8xx_register_usb_refclkin(int rate);
+int da8xx_register_usb20_phy_clk(bool use_usb_refclkin);
+int da8xx_register_usb11_phy_clk(bool use_usb_refclkin);
 int da8xx_register_usb20(unsigned

[PATCH v6 3/5] ARM: davinci: da8xx: Add USB PHY platform declaration

2016-10-25 Thread David Lechner

There is now a proper phy driver for the DA8xx SoC USB PHY. This adds the
platform device declarations needed to use it.

Signed-off-by: David Lechner 
---

da8xx-usb-phy device id is changed to -1 since there is only one da8xx-usb-phy
device.

 arch/arm/mach-davinci/board-da830-evm.c | 28 +---
 arch/arm/mach-davinci/board-omapl138-hawk.c |  5 +
 arch/arm/mach-davinci/include/mach/da8xx.h  |  1 +
 arch/arm/mach-davinci/usb-da8xx.c   | 11 +++
 4 files changed, 22 insertions(+), 23 deletions(-)

diff --git a/arch/arm/mach-davinci/board-da830-evm.c b/arch/arm/mach-davinci/board-da830-evm.c
index 3051cb6..c62766e 100644
--- a/arch/arm/mach-davinci/board-da830-evm.c
+++ b/arch/arm/mach-davinci/board-da830-evm.c
@@ -26,7 +26,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #include 
 #include 
@@ -106,30 +105,8 @@ static irqreturn_t da830_evm_usb_ocic_irq(int irq, void *dev_id)
 
 static __init void da830_evm_usb_init(void)
 {
-   u32 cfgchip2;
int ret;
 
-   /*
-* Set up USB clock/mode in the CFGCHIP2 register.
-* FYI:  CFGCHIP2 is 0xef00 initially.
-*/
-   cfgchip2 = __raw_readl(DA8XX_SYSCFG0_VIRT(DA8XX_CFGCHIP2_REG));
-
-   /*
-* We have to override VBUS/ID signals when MUSB is configured into the
-* host-only mode -- ID pin will float if no cable is connected, so the
-* controller won't be able to drive VBUS thinking that it's a B-device.
-* Otherwise, we want to use the OTG mode and enable VBUS comparators.
-*/
-   cfgchip2 &= ~CFGCHIP2_OTGMODE;
-#ifdef CONFIG_USB_MUSB_HOST
-   cfgchip2 |=  CFGCHIP2_FORCE_HOST;
-#else
-   cfgchip2 |=  CFGCHIP2_SESENDEN | CFGCHIP2_VBDTCTEN;
-#endif
-
-   __raw_writel(cfgchip2, DA8XX_SYSCFG0_VIRT(DA8XX_CFGCHIP2_REG));
-
/* USB_REFCLKIN is not used. */
ret = da8xx_register_usb20_phy_clk(false);
if (ret)
@@ -141,6 +118,11 @@ static __init void da830_evm_usb_init(void)
pr_warn("%s: USB 1.1 PHY CLK registration failed: %d\n",
__func__, ret);
 
+   ret = da8xx_register_usb_phy();
+   if (ret)
+   pr_warn("%s: USB PHY registration failed: %d\n",
+   __func__, ret);
+
ret = davinci_cfg_reg(DA830_USB0_DRVVBUS);
if (ret)
pr_warn("%s: USB 2.0 PinMux setup failed: %d\n", __func__, ret);
diff --git a/arch/arm/mach-davinci/board-omapl138-hawk.c b/arch/arm/mach-davinci/board-omapl138-hawk.c
index 8691a25..c5cb8d9 100644
--- a/arch/arm/mach-davinci/board-omapl138-hawk.c
+++ b/arch/arm/mach-davinci/board-omapl138-hawk.c
@@ -260,6 +260,11 @@ static __init void omapl138_hawk_usb_init(void)
pr_warn("%s: USB 1.1 PHY CLK registration failed: %d\n",
__func__, ret);
 
+   ret = da8xx_register_usb_phy();
+   if (ret)
+   pr_warn("%s: USB PHY registration failed: %d\n",
+   __func__, ret);
+
ret = gpio_request_one(DA850_USB1_VBUS_PIN,
GPIOF_DIR_OUT, "USB1 VBUS");
if (ret < 0) {
diff --git a/arch/arm/mach-davinci/include/mach/da8xx.h b/arch/arm/mach-davinci/include/mach/da8xx.h
index c32444b..38d932e 100644
--- a/arch/arm/mach-davinci/include/mach/da8xx.h
+++ b/arch/arm/mach-davinci/include/mach/da8xx.h
@@ -92,6 +92,7 @@ int da8xx_register_watchdog(void);
 int da8xx_register_usb_refclkin(int rate);
 int da8xx_register_usb20_phy_clk(bool use_usb_refclkin);
 int da8xx_register_usb11_phy_clk(bool use_usb_refclkin);
+int da8xx_register_usb_phy(void);
 int da8xx_register_usb20(unsigned mA, unsigned potpgt);
 int da8xx_register_usb11(struct da8xx_ohci_root_hub *pdata);
 int da8xx_register_emac(void);
diff --git a/arch/arm/mach-davinci/usb-da8xx.c b/arch/arm/mach-davinci/usb-da8xx.c
index 71a6d85..9c30bff 100644
--- a/arch/arm/mach-davinci/usb-da8xx.c
+++ b/arch/arm/mach-davinci/usb-da8xx.c
@@ -7,6 +7,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -243,6 +244,16 @@ int __init da8xx_register_usb11_phy_clk(bool use_usb_refclkin)
return ret;
 }
 
+static struct platform_device da8xx_usb_phy = {
+   .name   = "da8xx-usb-phy",
+   .id = -1,
+};
+
+int __init da8xx_register_usb_phy(void)
+{
+   return platform_device_register(&da8xx_usb_phy);
+}
+
 #if IS_ENABLED(CONFIG_USB_MUSB_HDRC)
 
 static struct musb_hdrc_config musb_config = {
-- 
2.7.4

[PATCH v6 2/5] ARM: davinci: da8xx: Add CFGCHIP syscon platform declaration.

2016-10-25 Thread David Lechner

The CFGCHIP registers are used by a number of devices, so use a syscon
device to share them. The first consumer of this will be the phy-da8xx-usb
driver.
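
A syscon device exposes a shared register block to several drivers through a regmap, and each consumer typically changes only its own bits with a read-modify-write such as regmap_update_bits(). The following is a minimal user-space sketch of that idea, not code from this series: the five CFGCHIP registers are modelled as a plain array indexed 0-4 (a real regmap uses byte offsets), and cfgchip_update_bits() as well as the bit fields used with it are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of the five shared CFGCHIP0..CFGCHIP4 registers. */
#define NUM_CFGCHIP_REGS 5

static uint32_t cfgchip[NUM_CFGCHIP_REGS];

/*
 * Read-modify-write with the same semantics as regmap_update_bits():
 * only the bits set in 'mask' change; all other bits are preserved.
 * Returns 0 on success, -1 for an out-of-range register index.
 */
static int cfgchip_update_bits(unsigned int reg, uint32_t mask, uint32_t val)
{
	if (reg >= NUM_CFGCHIP_REGS)
		return -1;

	cfgchip[reg] = (cfgchip[reg] & ~mask) | (val & mask);
	return 0;
}
```

Two consumers that each write only their own mask into the same register leave each other's bits intact, which is the point of sharing the block through one syscon instead of having every driver map and overwrite it.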

Signed-off-by: David Lechner 
---

syscon device id is changed to -1 since there is only one syscon device.

 arch/arm/mach-davinci/board-da830-evm.c |  4 
 arch/arm/mach-davinci/board-da850-evm.c |  4 
 arch/arm/mach-davinci/board-mityomapl138.c  |  4 
 arch/arm/mach-davinci/board-omapl138-hawk.c |  4 
 arch/arm/mach-davinci/devices-da8xx.c   | 28 
 arch/arm/mach-davinci/include/mach/da8xx.h  |  2 ++
 6 files changed, 46 insertions(+)

diff --git a/arch/arm/mach-davinci/board-da830-evm.c b/arch/arm/mach-davinci/board-da830-evm.c
index 605d444..3051cb6 100644
--- a/arch/arm/mach-davinci/board-da830-evm.c
+++ b/arch/arm/mach-davinci/board-da830-evm.c
@@ -586,6 +586,10 @@ static __init void da830_evm_init(void)
struct davinci_soc_info *soc_info = &davinci_soc_info;
int ret;
 
+   ret = da8xx_register_cfgchip();
+   if (ret)
+   pr_warn("%s: CFGCHIP registration failed: %d\n", __func__, ret);
+
ret = da830_register_gpio();
if (ret)
pr_warn("%s: GPIO init failed: %d\n", __func__, ret);
diff --git a/arch/arm/mach-davinci/board-da850-evm.c b/arch/arm/mach-davinci/board-da850-evm.c
index 8e4539f..ec5cb10 100644
--- a/arch/arm/mach-davinci/board-da850-evm.c
+++ b/arch/arm/mach-davinci/board-da850-evm.c
@@ -1345,6 +1345,10 @@ static __init void da850_evm_init(void)
 {
int ret;
 
+   ret = da8xx_register_cfgchip();
+   if (ret)
+   pr_warn("%s: CFGCHIP registration failed: %d\n", __func__, ret);
+
ret = da850_register_gpio();
if (ret)
pr_warn("%s: GPIO init failed: %d\n", __func__, ret);
diff --git a/arch/arm/mach-davinci/board-mityomapl138.c b/arch/arm/mach-davinci/board-mityomapl138.c
index bc4e63f..1a6d430 100644
--- a/arch/arm/mach-davinci/board-mityomapl138.c
+++ b/arch/arm/mach-davinci/board-mityomapl138.c
@@ -514,6 +514,10 @@ static void __init mityomapl138_init(void)
 {
int ret;
 
+   ret = da8xx_register_cfgchip();
+   if (ret)
+   pr_warn("%s: CFGCHIP registration failed: %d\n", __func__, ret);
+
/* for now, no special EDMA channels are reserved */
ret = da850_register_edma(NULL);
if (ret)
diff --git a/arch/arm/mach-davinci/board-omapl138-hawk.c b/arch/arm/mach-davinci/board-omapl138-hawk.c
index d4930b6..8691a25 100644
--- a/arch/arm/mach-davinci/board-omapl138-hawk.c
+++ b/arch/arm/mach-davinci/board-omapl138-hawk.c
@@ -294,6 +294,10 @@ static __init void omapl138_hawk_init(void)
 {
int ret;
 
+   ret = da8xx_register_cfgchip();
+   if (ret)
+   pr_warn("%s: CFGCHIP registration failed: %d\n", __func__, ret);
+
ret = da850_register_gpio();
if (ret)
pr_warn("%s: GPIO init failed: %d\n", __func__, ret);
diff --git a/arch/arm/mach-davinci/devices-da8xx.c b/arch/arm/mach-davinci/devices-da8xx.c
index add3771..31a99db 100644
--- a/arch/arm/mach-davinci/devices-da8xx.c
+++ b/arch/arm/mach-davinci/devices-da8xx.c
@@ -11,6 +11,7 @@
  * (at your option) any later version.
  */
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1089,3 +1090,30 @@ int __init da850_register_sata(unsigned long refclkpn)
return platform_device_register(&da850_sata_device);
 }
 #endif
+
+static struct syscon_platform_data da8xx_cfgchip_platform_data = {
+   .label  = "cfgchip",
+};
+
+static struct resource da8xx_cfgchip_resources[] = {
+   {
+   .start  = DA8XX_SYSCFG0_BASE + DA8XX_CFGCHIP0_REG,
+   .end= DA8XX_SYSCFG0_BASE + DA8XX_CFGCHIP4_REG + 3,
+   .flags  = IORESOURCE_MEM,
+   },
+};
+
+static struct platform_device da8xx_cfgchip_device = {
+   .name   = "syscon",
+   .id = -1,
+   .dev= {
+   .platform_data  = &da8xx_cfgchip_platform_data,
+   },
+   .num_resources  = ARRAY_SIZE(da8xx_cfgchip_resources),
+   .resource   = da8xx_cfgchip_resources,
+};
+
+int __init da8xx_register_cfgchip(void)
+{
+   return platform_device_register(&da8xx_cfgchip_device);
+}
diff --git a/arch/arm/mach-davinci/include/mach/da8xx.h b/arch/arm/mach-davinci/include/mach/da8xx.h
index c367530..c32444b 100644
--- a/arch/arm/mach-davinci/include/mach/da8xx.h
+++ b/arch/arm/mach-davinci/include/mach/da8xx.h
@@ -61,6 +61,7 @@ extern unsigned int da850_max_speed;
 #define DA8XX_CFGCHIP1_REG 0x180
 #define DA8XX_CFGCHIP2_REG 0x184
 #define DA8XX_CFGCHIP3_REG 0x188
+#define DA8XX_CFGCHIP4_REG 0x18c
 
 #define DA8XX_SYSCFG1_BASE (IO_PHYS + 0x22C000)
 #define DA8XX_SYSCFG1_VIRT(x)  (da8xx_syscfg1_base + (x))
@@ -116,6 +117,7 @@ void da8xx_rproc_reserve_cma(void);
 int da8xx_register_rproc(void);
 int da850_register_gpio(void);
 int da830_register_gpio(void);
+int da8xx_re
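
The memory resource in da8xx_cfgchip_resources in the CFGCHIP patch above spans CFGCHIP0 through the last byte of CFGCHIP4. The sketch below redoes that arithmetic in user space. DA8XX_CFGCHIP0_REG is not shown in this excerpt, so its value 0x17c is an assumption inferred from the 4-byte spacing of CFGCHIP1..CFGCHIP4; struct mem_range is a hypothetical stand-in for struct resource, whose end field is inclusive.

```c
#include <stdint.h>

#define CFGCHIP0_REG 0x17cu	/* assumption: value not shown in this excerpt */
#define CFGCHIP4_REG 0x18cu	/* DA8XX_CFGCHIP4_REG as added by the patch */

/* Hypothetical stand-in for struct resource: 'end' is the last byte. */
struct mem_range {
	uint32_t start;
	uint32_t end;
};

static struct mem_range cfgchip_range(uint32_t syscfg0_base)
{
	struct mem_range r = {
		.start = syscfg0_base + CFGCHIP0_REG,
		/* +3: last byte of the 32-bit CFGCHIP4 register */
		.end   = syscfg0_base + CFGCHIP4_REG + 3,
	};
	return r;
}

/* Same arithmetic as the kernel's resource_size(): end - start + 1. */
static uint32_t range_size(struct mem_range r)
{
	return r.end - r.start + 1;
}
```

With those offsets the region is 0x18c + 3 - 0x17c + 1 = 20 bytes, i.e. five 32-bit registers, independent of the SYSCFG0 base address.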

[PATCH v6 5/5] ARM: DTS: da850: Add usb phy node

2016-10-25 Thread David Lechner

Add a node for the new usb phy driver.

Signed-off-by: David Lechner 
---
 arch/arm/boot/dts/da850.dtsi | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/arm/boot/dts/da850.dtsi b/arch/arm/boot/dts/da850.dtsi
index 6bbf20d..33fcdce 100644
--- a/arch/arm/boot/dts/da850.dtsi
+++ b/arch/arm/boot/dts/da850.dtsi
@@ -376,6 +376,11 @@
>;
status = "disabled";
};
+   usb_phy: usb-phy {
+   compatible = "ti,da830-usb-phy";
+   #phy-cells = <1>;
+   status = "disabled";
+   };
gpio: gpio@226000 {
compatible = "ti,dm6441-gpio";
gpio-controller;
-- 
2.7.4

linux-next: no releases next week

2016-10-25 Thread Stephen Rothwell

Hi all,

There will probably be no linux-next releases next week while I am
attending Kernel Summit.

-- 
Cheers,
Stephen Rothwell

linux-next: Tree for Oct 26

2016-10-25 Thread Stephen Rothwell

Hi all,

There will probably be no linux-next releases next week while I attend
the Kernel Summit.

Changes since 20161025:

The sunxi tree lost its build failure.

The akpm-current tree still had its build failures for which I applied
2 patches.

Non-merge commits (relative to Linus' tree): 2628
 3334 files changed, 210166 insertions(+), 49968 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc and an allmodconfig (with
CONFIG_BUILD_DOCSRC=n) for x86_64, a multi_v7_defconfig for arm and a
native build of tools/perf. After the final fixups (if any), I do an
x86_64 modules_install followed by builds for x86_64 allnoconfig,
powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig, allyesconfig
(this fails its final link) and pseries_le_defconfig and i386, sparc
and sparc64 defconfig.

Below is a summary of the state of the merge.

I am currently merging 245 trees (counting Linus' and 35 trees of patches
pending for Linus' tree).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (9fe68cad6e74 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6)
Merging fixes/master (30066ce675d3 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6)
Merging kbuild-current/rc-fixes (989cea5c14be kbuild: prevent lib-ksyms.o rebuilds)
Merging arc-current/for-curr (9868c77a82f7 ARC: build: retire old toggles)
Merging arm-current/fixes (6127d124ee4e ARM: wire up new pkey syscalls)
Merging m68k-current/for-linus (6736e65effc3 m68k: Migrate exception table users off module.h and onto extable.h)
Merging metag-fixes/fixes (35d04077ad96 metag: Only define atomic_dec_if_positive conditionally)
Merging powerpc-fixes/fixes (09b7e37b18ee powerpc/64: Fix race condition in setting lock bit in idle/wakeup code)
Merging sparc/master (ee9e83973d54 sparc32: Fix old style declaration GCC warnings)
Merging net/master (44060abe1dd6 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth)
CONFLICT (content): Merge conflict in drivers/net/ethernet/qlogic/Kconfig
Applying: qed*: merge fix for CONFIG_INFINIBAND_QEDR Kconfig move
Merging ipsec/master (7f92083eb58f vti6: flush x-netns xfrm cache when vti interface is removed)
Merging netfilter/master (7034b566a4e7 netfilter: fix nf_queue handling)
Merging ipvs/master (ea43f860d984 Merge branch 'ethoc-fixes')
Merging wireless-drivers/master (1ea2643961b0 ath6kl: add Dell OEM SDIO I/O for the Venue 8 Pro)
Merging mac80211/master (b4f0fd4baa90 qed: Use list_move_tail instead of list_del/list_add_tail)
Merging sound-current/for-linus (9b50898ad96c ALSA: seq: Fix time account regression)
Merging pci-current/for-linus (02a1b8f4167e PCI: designware-plat: Update author email address)
Merging driver-core.current/driver-core-linus (07d9a380680d Linux 4.9-rc2)
Merging tty.current/tty-linus (1001354ca341 Linux 4.9-rc1)
Merging usb.current/usb-linus (b76032396d79 usb: renesas_usbhs: add wait after initialization for R-Car Gen3)
Merging usb-gadget-fixes/fixes (a1aa8cf6471b Revert "Documentation: devicetree: dwc2: Deprecate g-tx-fifo-size")
Merging usb-serial-fixes/usb-linus (07d9a380680d Linux 4.9-rc2)
Merging usb-chipidea-fixes/ci-for-usb-stable (6b7f456e67a1 usb: chipidea: host: fix NULL ptr dereference during shutdown)
Merging phy/fixes (1001354ca341 Linux 4.9-rc1)
Merging staging.current/staging-linus (e866dd8aab76 greybus: fix a leak on error in gb_module_create())
Merging char-misc.current/char-misc-linus (407a3aee6ee2 hv: do not lose pending heartbeat vmbus packets)
Merging input-current/for-linus (324ae0958cab Input: psmouse - cleanup Focaltech code)
Merging crypto-current/master (6d4952d9d9d4 hwrng: core - Don't use a stack buffer in add_early_randomness())
Mergi

Re: [RFC PATCH 0/6] UART slave devices using serio

2016-10-25 Thread Sebastian Reichel

Hi,

On Tue, Oct 25, 2016 at 05:02:23PM -0500, Rob Herring wrote:
> On Tue, Oct 25, 2016 at 4:55 PM, Sebastian Reichel wrote:
> > On Wed, Aug 24, 2016 at 06:24:30PM -0500, Rob Herring wrote:
> >> [...]
> > I had a more detailed look at the series during the last two weeks.
> > For me the approach looks ok and it should work for the nokia bluetooth
> > use case. Actually my work on that driver is more or less stalled until
> > this is solved, so it would be nice to move this forward. Whose feedback
> > is this waiting for? I guess
> 
> I think it is mainly waiting for me to spend more time on it and get
> the tty port part done.

The general approach could already be commented on.

> I could use help especially for converting the BT part properly.

Ok, I will have a look at that.

> >  * Alan & Greg for the serial parts
> >  * Marcel for the bluetooth parts
> >  * Dmitry for the serio parts
> >
> > Maybe you can try to find some minutes at the Kernel Summit to talk
> > about this?
> 
> Still waiting for my invite...
> But I will be at Plumbers if folks want to discuss this.

Ok. I obviously assumed invites have already been sent and that you
would be invited.

-- Sebastian



[PATCH v2 5/5] posix-timers: make it configurable

2016-10-25 Thread Nicolas Pitre

Some embedded systems have no use for them.  This removes about
22KB from the kernel binary size when configured out.

Corresponding syscalls are routed to a stub that logs the attempt to
use them, which should be enough of a clue if they were disabled
without proper consideration. They are: timer_create, timer_gettime,
timer_getoverrun, timer_settime, timer_delete and clock_adjtime.

The clock_settime, clock_gettime, clock_getres and clock_nanosleep
syscalls are replaced by simple wrappers compatible with CLOCK_REALTIME,
CLOCK_MONOTONIC and CLOCK_BOOTTIME only which should cover the vast
majority of use cases with very little code.
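
The reduced clock syscalls described above accept only CLOCK_REALTIME, CLOCK_MONOTONIC and CLOCK_BOOTTIME. A user-space sketch of that dispatch, assuming a Linux host and using the C library's clock_gettime() for the actual reading, might look like this. reduced_clock_gettime() is a hypothetical name; the kernel wrappers in kernel/time/posix-stubs.c are not reproduced here.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <time.h>

#ifndef CLOCK_BOOTTIME
#define CLOCK_BOOTTIME 7	/* Linux clock ID, in case libc headers lack it */
#endif

/*
 * Accept only the three clocks kept when POSIX timers are configured
 * out; every other clock ID is rejected with -EINVAL.
 */
static int reduced_clock_gettime(clockid_t which, struct timespec *tp)
{
	switch (which) {
	case CLOCK_REALTIME:
	case CLOCK_MONOTONIC:
	case CLOCK_BOOTTIME:
		/* Delegate the reading itself to the host's clock_gettime(). */
		return clock_gettime(which, tp) == 0 ? 0 : -errno;
	default:
		return -EINVAL;
	}
}
```

Rejecting unknown clock IDs up front, rather than routing them through a generic clock table, is what keeps such wrappers small.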

Signed-off-by: Nicolas Pitre 
Reviewed-by: Josh Triplett 
---
 drivers/ptp/Kconfig  |   2 +-
 include/linux/posix-timers.h |  28 +-
 include/linux/sched.h|  10 
 init/Kconfig |  17 +++
 kernel/signal.c  |   4 ++
 kernel/time/Makefile |  10 +++-
 kernel/time/posix-stubs.c| 118 +++
 7 files changed, 184 insertions(+), 5 deletions(-)
 create mode 100644 kernel/time/posix-stubs.c

diff --git a/drivers/ptp/Kconfig b/drivers/ptp/Kconfig
index 0f7492f8ea..bdce332911 100644
--- a/drivers/ptp/Kconfig
+++ b/drivers/ptp/Kconfig
@@ -6,7 +6,7 @@ menu "PTP clock support"
 
 config PTP_1588_CLOCK
tristate "PTP clock support"
-   depends on NET
+   depends on NET && POSIX_TIMERS
select PPS
select NET_PTP_CLASSIFY
help
diff --git a/include/linux/posix-timers.h b/include/linux/posix-timers.h
index 62d44c1760..2288c5c557 100644
--- a/include/linux/posix-timers.h
+++ b/include/linux/posix-timers.h
@@ -118,6 +118,8 @@ struct k_clock {
 extern struct k_clock clock_posix_cpu;
 extern struct k_clock clock_posix_dynamic;
 
+#ifdef CONFIG_POSIX_TIMERS
+
 void posix_timers_register_clock(const clockid_t clock_id, struct k_clock *new_clock);
 
 /* function to call to trigger timer event */
@@ -131,8 +133,30 @@ void posix_cpu_timers_exit_group(struct task_struct *task);
 void set_process_cpu_timer(struct task_struct *task, unsigned int clock_idx,
   cputime_t *newval, cputime_t *oldval);
 
-long clock_nanosleep_restart(struct restart_block *restart_block);
-
 void update_rlimit_cpu(struct task_struct *task, unsigned long rlim_new);
 
+#else
+
+#include 
+
+static inline void posix_timers_register_clock(const clockid_t clock_id,
+  struct k_clock *new_clock) {}
+static inline int posix_timer_event(struct k_itimer *timr, int si_private)
+{ return 0; }
+static inline void run_posix_cpu_timers(struct task_struct *task) {}
+static inline void posix_cpu_timers_exit(struct task_struct *task)
+{
+   add_device_randomness((const void*) &task->se.sum_exec_runtime,
+ sizeof(unsigned long long));
+}
+static inline void posix_cpu_timers_exit_group(struct task_struct *task) {}
+static inline void set_process_cpu_timer(struct task_struct *task,
+   unsigned int clock_idx, cputime_t *newval, cputime_t *oldval) {}
+static inline void update_rlimit_cpu(struct task_struct *task,
+unsigned long rlim_new) {}
+
+#endif
+
+long clock_nanosleep_restart(struct restart_block *restart_block);
+
 #endif
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 348f51b0ec..ad716d5559 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2946,8 +2946,13 @@ static inline void exit_thread(struct task_struct *tsk)
 extern void exit_files(struct task_struct *);
 extern void __cleanup_sighand(struct sighand_struct *);
 
+#ifdef CONFIG_POSIX_TIMERS
 extern void exit_itimers(struct signal_struct *);
 extern void flush_itimer_signals(void);
+#else
+static inline void exit_itimers(struct signal_struct *s) {}
+static inline void flush_itimer_signals(void) {}
+#endif
 
 extern void do_group_exit(int);
 
@@ -3450,7 +3455,12 @@ static __always_inline bool need_resched(void)
  * Thread group CPU time accounting.
  */
 void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times);
+#ifdef CONFIG_POSIX_TIMERS
 void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times);
+#else
+static inline void thread_group_cputimer(struct task_struct *tsk,
+struct task_cputime *times) {}
+#endif
 
 /*
  * Reevaluate whether the task has signals pending delivery.
diff --git a/init/Kconfig b/init/Kconfig
index 34407f15e6..351d422252 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1445,6 +1445,23 @@ config SYSCTL_SYSCALL
 
  If unsure say N here.
 
+config POSIX_TIMERS
+   bool "Posix Clocks & timers" if EXPERT
+   default y
+   help
+ This adds native support for POSIX timers to the kernel.
+ Most embedded systems may have no use for them and therefore they
+ can be configured out to reduce the size of the kernel image.
+

[PATCH v2 3/5] kconfig: regenerate *.c_shipped files after previous changes

2016-10-25 Thread Nicolas Pitre

Signed-off-by: Nicolas Pitre 
---
 scripts/kconfig/zconf.hash.c_shipped |  228 ++---
 scripts/kconfig/zconf.tab.c_shipped  | 1631 --
 2 files changed, 888 insertions(+), 971 deletions(-)

diff --git a/scripts/kconfig/zconf.hash.c_shipped b/scripts/kconfig/zconf.hash.c_shipped
index 360a62df2b..bf7f1378b3 100644
--- a/scripts/kconfig/zconf.hash.c_shipped
+++ b/scripts/kconfig/zconf.hash.c_shipped
@@ -32,7 +32,7 @@
 struct kconf_id;
 
 static const struct kconf_id *kconf_id_lookup(register const char *str, register unsigned int len);
-/* maximum key range = 71, duplicates = 0 */
+/* maximum key range = 72, duplicates = 0 */
 
 #ifdef __GNUC__
 __inline
@@ -46,32 +46,32 @@ kconf_id_hash (register const char *str, register unsigned int len)
 {
   static const unsigned char asso_values[] =
 {
-  73, 73, 73, 73, 73, 73, 73, 73, 73, 73,
-  73, 73, 73, 73, 73, 73, 73, 73, 73, 73,
-  73, 73, 73, 73, 73, 73, 73, 73, 73, 73,
-  73, 73, 73, 73, 73, 73, 73, 73, 73, 73,
-  73, 73, 73, 73, 73,  0, 73, 73, 73, 73,
-  73, 73, 73, 73, 73, 73, 73, 73, 73, 73,
-  73, 73, 73, 73, 73, 73, 73, 73, 73, 73,
-  73, 73, 73, 73, 73, 73, 73, 73, 73, 73,
-  73, 73, 73, 73, 73, 73, 73, 73, 73, 73,
-  73, 73, 73, 73, 73, 73, 73,  5, 25, 25,
-   0,  0,  0,  5,  0,  0, 73, 73,  5,  0,
-  10,  5, 45, 73, 20, 20,  0, 15, 15, 73,
-  20,  5, 73, 73, 73, 73, 73, 73, 73, 73,
-  73, 73, 73, 73, 73, 73, 73, 73, 73, 73,
-  73, 73, 73, 73, 73, 73, 73, 73, 73, 73,
-  73, 73, 73, 73, 73, 73, 73, 73, 73, 73,
-  73, 73, 73, 73, 73, 73, 73, 73, 73, 73,
-  73, 73, 73, 73, 73, 73, 73, 73, 73, 73,
-  73, 73, 73, 73, 73, 73, 73, 73, 73, 73,
-  73, 73, 73, 73, 73, 73, 73, 73, 73, 73,
-  73, 73, 73, 73, 73, 73, 73, 73, 73, 73,
-  73, 73, 73, 73, 73, 73, 73, 73, 73, 73,
-  73, 73, 73, 73, 73, 73, 73, 73, 73, 73,
-  73, 73, 73, 73, 73, 73, 73, 73, 73, 73,
-  73, 73, 73, 73, 73, 73, 73, 73, 73, 73,
-  73, 73, 73, 73, 73, 73
+  74, 74, 74, 74, 74, 74, 74, 74, 74, 74,
+  74, 74, 74, 74, 74, 74, 74, 74, 74, 74,
+  74, 74, 74, 74, 74, 74, 74, 74, 74, 74,
+  74, 74, 74, 74, 74, 74, 74, 74, 74, 74,
+  74, 74, 74, 74, 74,  0, 74, 74, 74, 74,
+  74, 74, 74, 74, 74, 74, 74, 74, 74, 74,
+  74, 74, 74, 74, 74, 74, 74, 74, 74, 74,
+  74, 74, 74, 74, 74, 74, 74, 74, 74, 74,
+  74, 74, 74, 74, 74, 74, 74, 74, 74, 74,
+  74, 74, 74, 74, 74, 74, 74,  0, 20, 10,
+   0,  0,  0, 30,  0,  0, 74, 74,  5, 15,
+   0, 25, 40, 74, 15,  0,  0, 10, 35, 74,
+  10,  0, 74, 74, 74, 74, 74, 74, 74, 74,
+  74, 74, 74, 74, 74, 74, 74, 74, 74, 74,
+  74, 74, 74, 74, 74, 74, 74, 74, 74, 74,
+  74, 74, 74, 74, 74, 74, 74, 74, 74, 74,
+  74, 74, 74, 74, 74, 74, 74, 74, 74, 74,
+  74, 74, 74, 74, 74, 74, 74, 74, 74, 74,
+  74, 74, 74, 74, 74, 74, 74, 74, 74, 74,
+  74, 74, 74, 74, 74, 74, 74, 74, 74, 74,
+  74, 74, 74, 74, 74, 74, 74, 74, 74, 74,
+  74, 74, 74, 74, 74, 74, 74, 74, 74, 74,
+  74, 74, 74, 74, 74, 74, 74, 74, 74, 74,
+  74, 74, 74, 74, 74, 74, 74, 74, 74, 74,
+  74, 74, 74, 74, 74, 74, 74, 74, 74, 74,
+  74, 74, 74, 74, 74, 74
 };
   register int hval = len;
 
@@ -97,33 +97,35 @@ struct kconf_id_strings_t
 char kconf_id_strings_str8[sizeof("tristate")];
 char kconf_id_strings_str9[sizeof("endchoice")];
 char kconf_id_strings_str10[sizeof("---help---")];
+char kconf_id_strings_str11[sizeof("select")];
 char kconf_id_strings_str12[sizeof("def_tristate")];
 char kconf_id_strings_str13[sizeof("def_bool")];
 char kconf_id_strings_str14[sizeof("defconfig_list")];
-char kconf_id_strings_str17[sizeof("on")];
-char kconf_id_strings_str18[sizeof("optional")];
-char kconf_id_strings_str21[sizeof("option")];
-char kconf_id_strings_str22[sizeof("endmenu")];
-char kconf_id_strings_str23[sizeof("mainmenu")];
-char kconf_id_strings_str25[sizeof("menuconfig")];
-char kconf_id_strings_str27[sizeof("modules")];
-char kconf_id_strings_str28[sizeof("allnoconfig_y")];
+char kconf_id_strings_str16[sizeof("source")];
+char kconf_id_strings_str17[sizeof("endmenu")];
+char kconf_id_strings_str18[sizeof("allnoconfig_y")];
+char kconf_id_strings_str20[sizeof("range")];
+char kconf_id_strings_str22[sizeof("modules")];
+char kconf_id_strings_str23[sizeof("hex")];
+char kconf_id_strings_str27[sizeof("on")];
 char kconf_id_strings_str29[sizeof("menu")];
-char kconf_id_strings_str31[sizeof("select")];
+char kconf_id_strings_str31[sizeof("option")];
 char kconf_id_strings_str32[sizeof("comment")];
-char kconf_id_strings_str33[sizeof("env")];
-char kconf_id_strings_str35[sizeof("range")];
-char kconf_id_strings_str36[sizeof("choice")];
-char kconf_id_strings_str39[sizeof("bool")];
-char kconf_id_strings_str41[sizeof("source")];
+char kconf_id_string

[PATCH v2 1/5] kconfig: introduce the "imply" keyword

2016-10-25 Thread Nicolas Pitre

The "imply" keyword is a weak version of "select" where the target
config symbol can still be turned off, avoiding those pitfalls that come
with the "select" keyword.

This is useful e.g. with multiple drivers that want to indicate their
ability to hook into a given subsystem while still being able to
configure that subsystem out and keep those drivers selected.

Currently, the same effect can almost be achieved with:

    config DRIVER_A
        tristate

    config DRIVER_B
        tristate

    config DRIVER_C
        tristate

    config DRIVER_D
        tristate

    [...]

    config SUBSYSTEM_X
        tristate
        default DRIVER_A || DRIVER_B || DRIVER_C || DRIVER_D || [...]

This is unwieldy to maintain, especially with a large number of drivers.
Furthermore, there is no easy way to restrict the choice for SUBSYSTEM_X
to y or n, excluding m, when some drivers are built-in. The "select"
keyword allows for excluding m, but it excludes n as well. Hence
this "imply" keyword.  The above becomes:

    config DRIVER_A
        tristate
        imply SUBSYSTEM_X

    config DRIVER_B
        tristate
        imply SUBSYSTEM_X

    [...]

    config SUBSYSTEM_X
        tristate

This is much cleaner, and way more flexible than "select". SUBSYSTEM_X
can still be configured out, and it can be set as a module when none of
the drivers are selected or all of them are also modular.
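
The rules stated above, together with the FOO/BAR/BAZ table this patch adds to kconfig-language.txt, can be modelled in a few lines. This is a hedged sketch, not the kconfig implementation in symbol.c: the enum and function names are hypothetical, tristate values are ordered n < m < y, several "imply" lines are assumed to combine like the tristate "||" operator (a maximum), and a direct dependency caps the value. For SUBSYSTEM_X, which has no "depends on", pass TRI_Y as the dependency.

```c
#include <stdbool.h>
#include <stddef.h>

/* Kconfig tristate values, ordered so that n < m < y. */
enum tristate { TRI_N = 0, TRI_M = 1, TRI_Y = 2 };

/* Lower limit from all implying symbols: tristate "||" is max(). */
static enum tristate implied_lower(const enum tristate *implying, size_t n)
{
	enum tristate lower = TRI_N;

	for (size_t i = 0; i < n; i++)
		if (implying[i] > lower)
			lower = implying[i];
	return lower;
}

/*
 * Is 'val' a legal choice for the implied symbol?  'dep' is its direct
 * dependency (an upper limit); n always stays available.
 */
static bool imply_allows(enum tristate lower, enum tristate dep,
			 enum tristate val)
{
	if (val > dep)
		return false;
	return val == TRI_N || val >= lower;
}

/* The default follows the implying symbols, capped by the dependency. */
static enum tristate imply_default(enum tristate lower, enum tristate dep)
{
	return lower < dep ? lower : dep;
}
```

For example, with one built-in driver and one modular driver the lower limit is y, so SUBSYSTEM_X may be y or n but not m, matching the "Y/n" row of the table.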

Signed-off-by: Nicolas Pitre 
Reviewed-by: Josh Triplett 
---
 Documentation/kbuild/kconfig-language.txt | 28 
 scripts/kconfig/expr.h|  2 ++
 scripts/kconfig/menu.c| 55 ++-
 scripts/kconfig/symbol.c  | 24 +-
 scripts/kconfig/zconf.gperf   |  1 +
 scripts/kconfig/zconf.y   | 16 +++--
 6 files changed, 107 insertions(+), 19 deletions(-)

diff --git a/Documentation/kbuild/kconfig-language.txt b/Documentation/kbuild/kconfig-language.txt
index 069fcb3eef..5ee0dd3c85 100644
--- a/Documentation/kbuild/kconfig-language.txt
+++ b/Documentation/kbuild/kconfig-language.txt
@@ -113,6 +113,33 @@ applicable everywhere (see syntax).
That will limit the usefulness but on the other hand avoid
the illegal configurations all over.
 
+- weak reverse dependencies: "imply"  ["if" ]
+  This is similar to "select" as it enforces a lower limit on another
+  symbol except that the "implied" config symbol's value may still be
+  set to n from a direct dependency or with a visible prompt.
+  Given the following example:
+
+  config FOO
+   tristate
+   imply BAZ
+
+  config BAZ
+   tristate
+   depends on BAR
+
+  The following values are possible:
+
+   FOO   BAR   BAZ's default   choice for BAZ
+   ---   ---   -------------   --------------
+   n     y     n               N/m/y
+   m     y     m               M/y/n
+   y     y     y               Y/n
+   y     n     *               N
+
+  This is useful e.g. with multiple drivers that want to indicate their
+  ability to hook into a given subsystem while still being able to
+  configure that subsystem out and keep those drivers selected.
+
 - limiting menu display: "visible if" 
   This attribute is only applicable to menu blocks, if the condition is
   false, the menu block is not displayed to the user (the symbols
@@ -481,6 +508,7 @@ historical issues resolved through these different solutions.
   b) Match dependency semantics:
b1) Swap all "select FOO" to "depends on FOO" or,
b2) Swap all "depends on FOO" to "select FOO"
+  c) Consider the use of "imply" instead of "select"
 
 The resolution to a) can be tested with the sample Kconfig file
 Documentation/kbuild/Kconfig.recursion-issue-01 through the removal
diff --git a/scripts/kconfig/expr.h b/scripts/kconfig/expr.h
index 973b6f7333..a73f762c48 100644
--- a/scripts/kconfig/expr.h
+++ b/scripts/kconfig/expr.h
@@ -85,6 +85,7 @@ struct symbol {
struct property *prop;
struct expr_value dir_dep;
struct expr_value rev_dep;
+   struct expr_value implied;
 };
 
 #define for_all_symbols(i, sym) for (i = 0; i < SYMBOL_HASHSIZE; i++) for (sym = symbol_hash[i]; sym; sym = sym->next) if (sym->type != S_OTHER)
@@ -136,6 +137,7 @@ enum prop_type {
P_DEFAULT,  /* default y */
P_CHOICE,   /* choice value */
P_SELECT,   /* select BAR */
+   P_IMPLY,/* imply BAR */
P_RANGE,/* range 7..100 (for a symbol) */
P_ENV,  /* value from environment variable */
P_SYMBOL,   /* where a symbol is defined */
diff --git a/scripts/kconfig/menu.c b/scripts/kconfig/menu.c
index aed678e8a7..e9357931b4 100644
--- a/scripts/kconfig/menu.c
+++ b/scripts/kconfig/menu.c
@@ -233,6 +233,8 @@ static void sym_check_prop(struct symbol *sym)
 {
struct property *prop;
struct symbol *sym2;
+   char

Re: [PATCH -next 1/2] Input: synaptics-rmi4 - add support for F55 sensor tuning

2016-10-25 Thread Guenter Roeck


On 10/25/2016 11:26 AM, Andrew Duggan wrote:

On 10/24/2016 08:13 PM, Guenter Roeck wrote:

Hi Andrew,

On 10/24/2016 05:59 PM, Andrew Duggan wrote:

Hi Guenter,

I have a couple of comments below.



Thanks a lot for the feedback.


On 09/30/2016 08:22 PM, Guenter Roeck wrote:

Sensor tuning support is needed to determine the number of enabled
tx and rx electrodes for use in F54 functions.

The number of enabled electrodes is not identical to the total number
of electrodes as reported with F55:Query0 and F55:Query1. It has to be
calculated by analyzing F55:Ctrl1 (sensor receiver assignment) and
F55:Ctrl2 (sensor transmitter assignment).
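
Counting the enabled electrodes then amounts to walking the receiver or transmitter assignment table and skipping unused entries. The sketch below assumes that an unused entry reads as 0xff; that marker value and the function name are assumptions for illustration, not taken from the patch text quoted here.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: an assignment byte of 0xff marks an unused electrode. */
#define F55_UNASSIGNED 0xff

/*
 * Count enabled electrodes in an assignment table such as F55:Ctrl1
 * (receivers) or F55:Ctrl2 (transmitters). 'len' is the total number
 * of electrodes reported by F55:Query0 or F55:Query1.
 */
static unsigned int f55_count_enabled(const uint8_t *assign, size_t len)
{
	unsigned int count = 0;

	for (size_t i = 0; i < len; i++)
		if (assign[i] != F55_UNASSIGNED)
			count++;
	return count;
}
```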

Support for additional sensor tuning functions may be added later.

Signed-off-by: Guenter Roeck 
---
This patch applies to next-20160930.

  drivers/input/rmi4/Kconfig  |   9 +++
  drivers/input/rmi4/Makefile |   1 +
  drivers/input/rmi4/rmi_bus.c|   3 +
  drivers/input/rmi4/rmi_driver.h |   1 +
  drivers/input/rmi4/rmi_f55.c| 127 
  5 files changed, 141 insertions(+)
  create mode 100644 drivers/input/rmi4/rmi_f55.c

diff --git a/drivers/input/rmi4/Kconfig b/drivers/input/rmi4/Kconfig
index 4c8a55857e00..11ede43c9936 100644
--- a/drivers/input/rmi4/Kconfig
+++ b/drivers/input/rmi4/Kconfig
@@ -72,3 +72,12 @@ config RMI4_F54
  Function 54 provides access to various diagnostic features in certain
RMI4 touch sensors.
+
+config RMI4_F55
+bool "RMI4 Function 55 (Sensor tuning)"
+depends on RMI4_CORE
+help
+  Say Y here if you want to add support for RMI4 function 55
+
+  Function 55 provides access to the RMI4 touch sensor tuning
+  mechanism.
diff --git a/drivers/input/rmi4/Makefile b/drivers/input/rmi4/Makefile
index 0bafc8502c4b..96f8e0c21e3b 100644
--- a/drivers/input/rmi4/Makefile
+++ b/drivers/input/rmi4/Makefile
@@ -8,6 +8,7 @@ rmi_core-$(CONFIG_RMI4_F11) += rmi_f11.o
  rmi_core-$(CONFIG_RMI4_F12) += rmi_f12.o
  rmi_core-$(CONFIG_RMI4_F30) += rmi_f30.o
  rmi_core-$(CONFIG_RMI4_F54) += rmi_f54.o
+rmi_core-$(CONFIG_RMI4_F55) += rmi_f55.o
# Transports
  obj-$(CONFIG_RMI4_I2C) += rmi_i2c.o
diff --git a/drivers/input/rmi4/rmi_bus.c b/drivers/input/rmi4/rmi_bus.c
index ef8c747c35e7..82b7d4960858 100644
--- a/drivers/input/rmi4/rmi_bus.c
+++ b/drivers/input/rmi4/rmi_bus.c
@@ -314,6 +314,9 @@ static struct rmi_function_handler *fn_handlers[] = {
  #ifdef CONFIG_RMI4_F54
  &rmi_f54_handler,
  #endif
+#ifdef CONFIG_RMI4_F55
+&rmi_f55_handler,
+#endif
  };
static void __rmi_unregister_function_handlers(int start_idx)
diff --git a/drivers/input/rmi4/rmi_driver.h b/drivers/input/rmi4/rmi_driver.h
index 8dfbebe9bf86..a65cf70f61e2 100644
--- a/drivers/input/rmi4/rmi_driver.h
+++ b/drivers/input/rmi4/rmi_driver.h
@@ -103,4 +103,5 @@ extern struct rmi_function_handler rmi_f11_handler;
  extern struct rmi_function_handler rmi_f12_handler;
  extern struct rmi_function_handler rmi_f30_handler;
  extern struct rmi_function_handler rmi_f54_handler;
+extern struct rmi_function_handler rmi_f55_handler;
  #endif
diff --git a/drivers/input/rmi4/rmi_f55.c b/drivers/input/rmi4/rmi_f55.c
new file mode 100644
index ..268fa904205a
--- /dev/null
+++ b/drivers/input/rmi4/rmi_f55.c
@@ -0,0 +1,127 @@
+/*
+ * Copyright (c) 2012-2015 Synaptics Incorporated
+ * Copyright (C) 2016 Zodiac Inflight Innovations
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#include 
+#include 
+#include 


This is incidental, but I don't think i2c.h needs to be included here since 
this file shouldn't contain anything i2c-specific. It's not that big a deal, but 
I noticed it so I thought I would mention it.



Makes sense. delay.h and input.h seem to be unnecessary too.
I'll remove those if/when I resubmit.


+#include 
+#include 
+#include 
+#include 
+#include "rmi_driver.h"
+
+#define F55_NAME"rmi4_f55"
+
+/* F55 data offsets */
+#define F55_NUM_RX_OFFSET0
+#define F55_NUM_TX_OFFSET1
+#define F55_PHYS_CHAR_OFFSET2
+
+/* Fixed sizes of reports */
+#define F55_QUERY_LEN17


How did you choose the number 17? The number of F55 query registers present will 
depend on how the firmware is configured, so the total length of query registers 
can change. Right now this driver is only using the first three F55 query 
registers, which will always be present, so that's not an issue. But beyond query 
2, not all query registers are guaranteed to be present.



According to the information I have, the maximum size is 17.

Do you have a better idea on how to handle the dynamic length? Or a better 
number?
Should I only read the minimum? Or the number we actually need (3) at this 
point?
Or just name the define F55_QUERY_MAXLEN and change the comment to "maximum size
of report"?



I would just read the three registers which you are using.
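
Following that suggestion, the driver would read only Query0..Query2 and decode them with the offsets already defined in the patch (F55_NUM_RX_OFFSET, F55_NUM_TX_OFFSET, F55_PHYS_CHAR_OFFSET). A user-space sketch of the decoding step, with a hypothetical struct f55_query and F55_QUERY_MIN_LEN, and with the register read itself left out:

```c
#include <stdint.h>

/* Offsets as defined in the patch's rmi_f55.c */
#define F55_NUM_RX_OFFSET	0
#define F55_NUM_TX_OFFSET	1
#define F55_PHYS_CHAR_OFFSET	2
/* Hypothetical: only Query0..Query2 are guaranteed to be present. */
#define F55_QUERY_MIN_LEN	3

struct f55_query {
	uint8_t num_rx_electrodes;
	uint8_t num_tx_electrodes;
	uint8_t phys_char;
};

/* Decode the three guaranteed query registers read into 'buf'. */
static struct f55_query f55_parse_query(const uint8_t buf[F55_QUERY_MIN_LEN])
{
	struct f55_query q = {
		.num_rx_electrodes = buf[F55_NUM_RX_OFFSET],
		.num_tx_electrodes = buf[F55_NUM_TX_OFFSET],
		.phys_char         = buf[F55_PHYS_CHAR_OFFSET],
	};
	return q;
}
```

Sizing the read by what the driver consumes avoids depending on a maximum length that varies with the firmware configuration.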

[PATCH] Change the document about iowait

2016-10-25 Thread Chao Fan

The iowait value read from /proc/stat is not reliable, so this
method of obtaining iowait is not recommended. Note this in the
documentation.
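
For reference, the iowait column is the fifth number on the aggregate "cpu" line of /proc/stat. The sketch below parses it and, given the caveat that the value can decrease, clamps the difference between two samples at zero rather than letting an unsigned subtraction wrap. The function names are hypothetical, and the input is passed as a string so the parsing can be tried without reading /proc.

```c
#include <stdio.h>
#include <string.h>

/*
 * Extract iowait from the aggregate "cpu" line of /proc/stat.
 * Columns: user nice system idle iowait irq softirq steal ...
 * Returns 0 on success, -1 if the line is not the aggregate line.
 */
static int parse_iowait(const char *line, unsigned long long *iowait)
{
	/* Only "cpu " (the sum over all CPUs), not "cpu0", "cpu1", ... */
	if (strncmp(line, "cpu ", 4) != 0)
		return -1;
	/* Skip user, nice, system and idle; store the fifth value. */
	if (sscanf(line + 4, "%*llu %*llu %*llu %*llu %llu", iowait) != 1)
		return -1;
	return 0;
}

/* Delta between two samples, clamped at 0 if the counter went back. */
static unsigned long long iowait_delta(unsigned long long prev,
				       unsigned long long cur)
{
	return cur >= prev ? cur - prev : 0;
}
```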

Signed-off-by: Cao Jin 
Signed-off-by: Chao Fan 
---
 Documentation/filesystems/proc.txt | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 74329fd..71f5096 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -1305,7 +1305,16 @@ second).  The meanings of the columns are as follows, from left to right:
 - nice: niced processes executing in user mode
 - system: processes executing in kernel mode
 - idle: twiddling thumbs
-- iowait: waiting for I/O to complete
+- iowait: In a word, iowait stands for waiting for I/O to complete. But there
+  are several problems:
+  1. The CPU will not wait for I/O to complete; iowait is the time that a task
+     is waiting for I/O to complete. When a CPU goes idle while a task has
+     outstanding I/O, another task will be scheduled on this CPU.
+  2. On a multi-core CPU, the task waiting for I/O to complete is not running
+     on any CPU, so the iowait of each CPU is difficult to calculate.
+  3. The value of the iowait field in /proc/stat will decrease under certain
+     conditions.
+  So, the iowait value read from /proc/stat is not reliable.
 - irq: servicing interrupts
 - softirq: servicing softirqs
 - steal: involuntary wait
-- 
2.7.4

Re: [PATCH] ARM: imx6: Fix GPC probe error path

2016-10-25 Thread Guenter Roeck


On 10/25/2016 10:34 AM, Guenter Roeck wrote:

GPC may fail to instantiate with

imx-gpc: probe of 20dc000.gpc failed with error -22

which is returned from of_genpd_add_provider_onecell(). The error path
does not call pm_genpd_remove(). This results in the following crash
later on.

Unhandled fault: page domain fault (0x01b) at 0x0040
pgd = c0204000
[0040] *pgd=
Internal error: : 1b [#1] SMP ARM
Modules linked in:
CPU: 0 PID: 108 Comm: kworker/0:3 Not tainted 4.9.0-rc2 #8
Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
Workqueue: pm genpd_power_off_work_fn
task: c759ea00 task.stack: c766a000
PC is at mutex_lock+0xc/0x4c
LR is at regulator_disable+0x28/0x64
...
[] (mutex_lock) from [] (regulator_disable+0x28/0x64)
[] (regulator_disable) from [] 
(imx6q_pm_pu_power_off+0x90/0x98)
[] (imx6q_pm_pu_power_off) from [] 
(genpd_poweroff+0x114/0x1d4)
[] (genpd_poweroff) from [] 
(genpd_power_off_work_fn+0x20/0x2c)
[] (genpd_power_off_work_fn) from [] 
(process_one_work+0x138/0x34c)
[] (process_one_work) from [] (worker_thread+0x38/0x510)
[] (worker_thread) from [] (kthread+0xdc/0xf4)
[] (kthread) from [] (ret_from_fork+0x14/0x3c)

This is seen with multi_v7_defconfig and imx6dl-sabrelite.dtb running in
qemu (v2.7 patched to fix a qemu related problem). The error return from
of_genpd_add_provider_onecell() is not seen in v4.8 and may be caused by
a devicetree change (this is a wild guess only), but that is a different
problem.

Fixes: 00eb60a8b4f7 ("ARM: imx6: gpc: Add PU power domain for GPU/VPU")
Cc: Philipp Zabel 
Cc: Arnd Bergmann 
Signed-off-by: Guenter Roeck 
---
Several bisect attempts trying to track down "imx-gpc: probe ... failed
with error -22" point to commit 00e729c93395 ("Merge tag 'armsoc-dt' of
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc"). I have not
been able to track down the real culprit. Part of the problem is that
CONFIG_REGULATOR_ANATOP must be enabled for the problem to be seen, and
CONFIG_ARCH_AT91 causes compile errors for some sequence of commits between
v4.8 and v4.9-rc1. But even after taking this into account, the bisect
results always point to 00e729c93395. If anyone has an idea how to track
down that problem, or what might be causing it, please let me know.



Looking into this some more, it turns out that of_genpd_add_provider_onecell()
now returns an error if one of the provided power domains does not exist.
In this case, the "ARM" power domain does not exist. I don't see where it is
created, so it may well be that this now fails for all imx6 boards with
multi_v7_defconfig. Looking into kernelci.org test results, this is confirmed
for at least imx6dl-riotboard. Overall I think it is quite safe to assume
that all imx6 boards crash with mainline kernels and multi_v7_defconfig.

The change can be tracked down to commit 0159ec67076 ("PM / Domains: Verify
the PM domain is present when adding a provider"). Adding everyone in the
commit log for feedback.

Guenter


 arch/arm/mach-imx/gpc.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/arm/mach-imx/gpc.c b/arch/arm/mach-imx/gpc.c
index 0df062d8b2c9..f3f40045b4c9 100644
--- a/arch/arm/mach-imx/gpc.c
+++ b/arch/arm/mach-imx/gpc.c
@@ -409,6 +409,7 @@ static int imx_gpc_genpd_init(struct device *dev, struct regulator *pu_reg)
 {
struct clk *clk;
int i;
+   int ret;

imx6q_pu_domain.reg = pu_reg;

@@ -431,9 +432,14 @@ static int imx_gpc_genpd_init(struct device *dev, struct regulator *pu_reg)
return 0;

pm_genpd_init(&imx6q_pu_domain.base, NULL, false);
-   return of_genpd_add_provider_onecell(dev->of_node,
-&imx_gpc_onecell_data);
+   ret = of_genpd_add_provider_onecell(dev->of_node,
+   &imx_gpc_onecell_data);
+   if (ret)
+   goto genpd_remove;
+   return 0;

+genpd_remove:
+   pm_genpd_remove(&imx6q_pu_domain.base);
 clk_err:
while (i--)
clk_put(imx6q_pu_domain.clk[i]);
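
The fix follows the usual goto-based unwinding idiom: when a later step fails, the earlier steps are undone in reverse order. Below is a minimal user-space sketch of that shape, assuming hypothetical stand-ins (get_clock(), domain_init(), add_provider() and so on) for the real clock, genpd and provider calls. It only counts calls so that the cleanup can be checked; it is not the driver code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins that only record what happened. */
static int clks_got, clks_put, domains_added, domains_removed;
static bool provider_should_fail;

static void get_clock(void)     { clks_got++; }
static void put_clock(void)     { clks_put++; }
static void domain_init(void)   { domains_added++; }   /* like pm_genpd_init() */
static void domain_remove(void) { domains_removed++; } /* like pm_genpd_remove() */
static int add_provider(void)   { return provider_should_fail ? -EINVAL : 0; }

static int genpd_init_sketch(int nclks)
{
	int i, ret;

	for (i = 0; i < nclks; i++)
		get_clock();

	domain_init();
	ret = add_provider();
	if (ret)
		goto genpd_remove;
	return 0;

genpd_remove:
	domain_remove();	/* the step the original error path skipped */
	while (i--)		/* then release clocks in reverse order */
		put_clock();
	return ret;
}
```

With provider_should_fail set, genpd_init_sketch(3) returns -EINVAL and leaves no registered domain or held clock behind, which is exactly what the missing pm_genpd_remove() call broke.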

Re: [PATCH -next] unicore32: Fix build error

2016-10-25 Thread Xuetao Guan

> Since the oldabi syscall interface was first introduced, the infrastructure
> changed and the patch no longer compiles. See commit f56141e3e2d9a ("all
> arches, signal: move restart_block to struct task_struct") for details.
>
> Fixes: 1ace5d1e3d4b4 ("unicore32-oldabi: add oldabi syscall interface")
> Signed-off-by: Guenter Roeck 

Sorry, I missed your patch when I sent patch-v2 for oldabi,
so I modified the code directly in patch-v2.
I'll add your comment and Signed-off-by to patch-v2.
Thanks, Guenter.

Xuetao

> ---
> Should be merged with the commit introducing the problem if possible.
>
>  arch/unicore32/kernel/signal.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/unicore32/kernel/signal.c b/arch/unicore32/kernel/signal.c
> index 78a66491b108..be75ef8c1e0c 100644
> --- a/arch/unicore32/kernel/signal.c
> +++ b/arch/unicore32/kernel/signal.c
> @@ -115,7 +115,7 @@ asmlinkage int __sys_sigreturn(struct pt_regs *regs)
>   struct sigframe __user *frame;
>
>   /* Always make any pending restarted system calls return -EINTR */
> - current_thread_info()->restart_block.fn = do_no_restart_syscall;
> + current->restart_block.fn = do_no_restart_syscall;
>
>   /*
>* Since we stacked the signal on a 64-bit boundary,
> --
> 2.5.0
>

[PATCH v2 4/5] ptp_clock: allow for it to be optional

2016-10-25 Thread Nicolas Pitre

In order to break the hard dependency between the PTP clock subsystem and
ethernet drivers capable of being clock providers, this patch provides
simple PTP stub functions to allow linkage of those drivers into the
kernel even when the PTP subsystem is configured out. Drivers must be
ready to accept NULL from ptp_clock_register() in that case.
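
A minimal user-space sketch of this stub pattern, assuming a hypothetical SKETCH_HAVE_PTP switch in place of CONFIG_PTP_1588_CLOCK and simplified stand-in types: when the subsystem is configured out, the registration stub still links but returns NULL, and the driver treats NULL as "no PTP support" rather than as a probe failure. Handling of a failed registration is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical switch standing in for CONFIG_PTP_1588_CLOCK. */
#ifndef SKETCH_HAVE_PTP
#define SKETCH_HAVE_PTP 0
#endif

struct ptp_clock_sketch { int index; };

#if SKETCH_HAVE_PTP
static struct ptp_clock_sketch the_clock = { 0 };
static struct ptp_clock_sketch *clock_register(void) { return &the_clock; }
#else
/* Stub used when the subsystem is configured out: links, returns NULL. */
static struct ptp_clock_sketch *clock_register(void) { return NULL; }
#endif

struct nic_sketch {
	struct ptp_clock_sketch *ptp;
	int has_hw_timestamping;
};

/* Driver probe: a NULL clock only disables the PTP feature. */
static int nic_probe(struct nic_sketch *nic)
{
	nic->ptp = clock_register();
	nic->has_hw_timestamping = (nic->ptp != NULL);
	return 0;	/* the network device is usable either way */
}
```

The driver code is identical in both configurations; only the header decides whether a real clock or the NULL-returning stub is linked in.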

And to make it possible for PTP to be configured out, the select statement
in those drivers' Kconfig menu entries is converted to the new "imply"
statement. This way the PTP subsystem may have Kconfig dependencies of
its own, such as POSIX_TIMERS, without making those ethernet
drivers unavailable if POSIX timers are configured out. And when support
for POSIX timers is selected again, the default config option for PTP
clock support will automatically be adjusted accordingly.

The pch_gbe driver is a bit special as it relies on extra code in
drivers/ptp/ptp_pch.c. Therefore we let the make process descend into
drivers/ptp/ even if PTP_1588_CLOCK is unselected.

Signed-off-by: Nicolas Pitre 
Reviewed-by: Josh Triplett 
---
 drivers/Makefile|  2 +-
 drivers/net/ethernet/adi/Kconfig|  2 +-
 drivers/net/ethernet/amd/Kconfig|  2 +-
 drivers/net/ethernet/amd/xgbe/xgbe-main.c   |  6 ++-
 drivers/net/ethernet/broadcom/Kconfig   |  4 +-
 drivers/net/ethernet/cavium/Kconfig |  2 +-
 drivers/net/ethernet/freescale/Kconfig  |  2 +-
 drivers/net/ethernet/intel/Kconfig  | 10 ++--
 drivers/net/ethernet/mellanox/mlx4/Kconfig  |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/Kconfig |  2 +-
 drivers/net/ethernet/renesas/Kconfig|  2 +-
 drivers/net/ethernet/samsung/Kconfig|  2 +-
 drivers/net/ethernet/sfc/Kconfig|  2 +-
 drivers/net/ethernet/stmicro/stmmac/Kconfig |  2 +-
 drivers/net/ethernet/ti/Kconfig |  2 +-
 drivers/net/ethernet/tile/Kconfig   |  2 +-
 drivers/ptp/Kconfig |  8 +--
 include/linux/ptp_clock_kernel.h| 65 -
 18 files changed, 69 insertions(+), 50 deletions(-)

diff --git a/drivers/Makefile b/drivers/Makefile
index f0afdfb3c7..8cfa1ff8f6 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -107,7 +107,7 @@ obj-$(CONFIG_INPUT) += input/
 obj-$(CONFIG_RTC_LIB)  += rtc/
 obj-y  += i2c/ media/
 obj-$(CONFIG_PPS)  += pps/
-obj-$(CONFIG_PTP_1588_CLOCK)   += ptp/
+obj-y  += ptp/
 obj-$(CONFIG_W1)   += w1/
 obj-y  += power/
 obj-$(CONFIG_HWMON)+= hwmon/
diff --git a/drivers/net/ethernet/adi/Kconfig b/drivers/net/ethernet/adi/Kconfig
index 6b94ba6103..98cc8f5350 100644
--- a/drivers/net/ethernet/adi/Kconfig
+++ b/drivers/net/ethernet/adi/Kconfig
@@ -58,7 +58,7 @@ config BFIN_RX_DESC_NUM
 config BFIN_MAC_USE_HWSTAMP
bool "Use IEEE 1588 hwstamp"
depends on BFIN_MAC && BF518
-   select PTP_1588_CLOCK
+   imply PTP_1588_CLOCK
default y
---help---
  To support the IEEE 1588 Precision Time Protocol (PTP), select y here
diff --git a/drivers/net/ethernet/amd/Kconfig b/drivers/net/ethernet/amd/Kconfig
index 0038709fd3..713ea7ad22 100644
--- a/drivers/net/ethernet/amd/Kconfig
+++ b/drivers/net/ethernet/amd/Kconfig
@@ -177,7 +177,7 @@ config AMD_XGBE
depends on ARM64 || COMPILE_TEST
select BITREVERSE
select CRC32
-   select PTP_1588_CLOCK
+   imply PTP_1588_CLOCK
---help---
  This driver supports the AMD 10GbE Ethernet device found on an
  AMD SoC.
diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-main.c b/drivers/net/ethernet/amd/xgbe/xgbe-main.c
index 9de078819a..e10e569c0d 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-main.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-main.c
@@ -773,7 +773,8 @@ static int xgbe_probe(struct platform_device *pdev)
goto err_wq;
}
 
-   xgbe_ptp_register(pdata);
+   if (IS_REACHABLE(CONFIG_PTP_1588_CLOCK))
+   xgbe_ptp_register(pdata);
 
xgbe_debugfs_init(pdata);
 
@@ -812,7 +813,8 @@ static int xgbe_remove(struct platform_device *pdev)
 
xgbe_debugfs_exit(pdata);
 
-   xgbe_ptp_unregister(pdata);
+   if (IS_REACHABLE(CONFIG_PTP_1588_CLOCK))
+   xgbe_ptp_unregister(pdata);
 
flush_workqueue(pdata->an_workqueue);
destroy_workqueue(pdata->an_workqueue);
diff --git a/drivers/net/ethernet/broadcom/Kconfig b/drivers/net/ethernet/broadcom/Kconfig
index bd8c80c0b7..6a8d74aeb6 100644
--- a/drivers/net/ethernet/broadcom/Kconfig
+++ b/drivers/net/ethernet/broadcom/Kconfig
@@ -110,7 +110,7 @@ config TIGON3
depends on PCI
select PHYLIB
select HWMON
-   select PTP_1588_CLOCK
+   imply PTP_1588_CLOCK
---help---
  This driver supports Broadcom T

[no subject]

2016-10-25 Thread Nicolas Pitre

From: Nicolas Pitre 
Subject: [PATCH v2 0/5] make POSIX timers optional with some Kconfig help

Many embedded systems don't need the full POSIX timer support.
Configuring them out provides a nice kernel image size reduction.

When POSIX timers are configured out, the PTP clock subsystem should be
left out as well. However, a bunch of ethernet drivers currently *select*
the latter in their Kconfig entries. Therefore some more work was needed
to break that hard dependency from those drivers without preventing their
usage altogether.

This series therefore also includes kconfig changes implementing a new
keyword, named "imply", that expresses a reverse dependency like "select"
does while still allowing the target config symbol to be disabled
if the user or a direct dependency says so. The "suggest" keyword is
also provided to complement "imply", but without the restrictions of
"imply" or "select".

At this point I'd like to gather ACKs especially from people in the "To"
field. Ideally this would need to go upstream as a single series to avoid
cross subsystem dependency issues, and we should decide which maintainer
tree to use.  Suggestions welcome.

Changes from v1:

- added "suggest" to kconfig for completeness
- various typo fixes
- small "imply" effect visibility fix

The bulk of the diffstat comes from the kconfig lex parser regeneration.

Diffstat:

 Documentation/kbuild/kconfig-language.txt   |   34 +
 drivers/Makefile|2 +-
 drivers/net/ethernet/adi/Kconfig|2 +-
 drivers/net/ethernet/amd/Kconfig|2 +-
 drivers/net/ethernet/amd/xgbe/xgbe-main.c   |6 +-
 drivers/net/ethernet/broadcom/Kconfig   |4 +-
 drivers/net/ethernet/cavium/Kconfig |2 +-
 drivers/net/ethernet/freescale/Kconfig  |2 +-
 drivers/net/ethernet/intel/Kconfig  |   10 +-
 drivers/net/ethernet/mellanox/mlx4/Kconfig  |2 +-
 drivers/net/ethernet/mellanox/mlx5/core/Kconfig |2 +-
 drivers/net/ethernet/renesas/Kconfig|2 +-
 drivers/net/ethernet/samsung/Kconfig|2 +-
 drivers/net/ethernet/sfc/Kconfig|2 +-
 drivers/net/ethernet/stmicro/stmmac/Kconfig |2 +-
 drivers/net/ethernet/ti/Kconfig |2 +-
 drivers/net/ethernet/tile/Kconfig   |2 +-
 drivers/ptp/Kconfig |   10 +-
 include/linux/posix-timers.h|   28 +-
 include/linux/ptp_clock_kernel.h|   65 +-
 include/linux/sched.h   |   10 +
 init/Kconfig|   17 +
 kernel/signal.c |4 +
 kernel/time/Makefile|   10 +-
 kernel/time/posix-stubs.c   |  118 ++
 scripts/kconfig/expr.h  |4 +
 scripts/kconfig/menu.c  |   68 +-
 scripts/kconfig/symbol.c|   42 +-
 scripts/kconfig/zconf.gperf |2 +
 scripts/kconfig/zconf.hash.c_shipped|  228 +--
 scripts/kconfig/zconf.tab.c_shipped | 1631 -
 scripts/kconfig/zconf.y |   28 +-
 32 files changed, 1300 insertions(+), 1045 deletions(-)

[PATCH v2 2/5] kconfig: introduce the "suggest" keyword

2016-10-25 Thread Nicolas Pitre

Similar to "imply" but with no added restrictions on the target symbol's
value. Useful for providing a default value to another symbol.

Suggested by Edward Cree.

Signed-off-by: Nicolas Pitre 
---
 Documentation/kbuild/kconfig-language.txt |  6 ++
 scripts/kconfig/expr.h|  2 ++
 scripts/kconfig/menu.c| 15 ++-
 scripts/kconfig/symbol.c  | 20 +++-
 scripts/kconfig/zconf.gperf   |  1 +
 scripts/kconfig/zconf.y   | 16 ++--
 6 files changed, 56 insertions(+), 4 deletions(-)

diff --git a/Documentation/kbuild/kconfig-language.txt b/Documentation/kbuild/kconfig-language.txt
index 5ee0dd3c85..b7f4f0ca1d 100644
--- a/Documentation/kbuild/kconfig-language.txt
+++ b/Documentation/kbuild/kconfig-language.txt
@@ -140,6 +140,12 @@ applicable everywhere (see syntax).
   ability to hook into a given subsystem while still being able to
   configure that subsystem out and keep those drivers selected.
 
+- even weaker reverse dependencies: "suggest" <symbol> ["if" <expr>]
+  This is similar to "imply" except that this doesn't add any restrictions
+  on the value the suggested symbol may use. In other words this only
+  provides a default for the specified symbol based on the value for the
+  config entry where this is used.
+
 - limiting menu display: "visible if" <expr>
   This attribute is only applicable to menu blocks, if the condition is
   false, the menu block is not displayed to the user (the symbols
diff --git a/scripts/kconfig/expr.h b/scripts/kconfig/expr.h
index a73f762c48..eea3aa3c7a 100644
--- a/scripts/kconfig/expr.h
+++ b/scripts/kconfig/expr.h
@@ -86,6 +86,7 @@ struct symbol {
struct expr_value dir_dep;
struct expr_value rev_dep;
struct expr_value implied;
+   struct expr_value suggested;
 };
 
 #define for_all_symbols(i, sym) for (i = 0; i < SYMBOL_HASHSIZE; i++) for (sym = symbol_hash[i]; sym; sym = sym->next) if (sym->type != S_OTHER)
@@ -138,6 +139,7 @@ enum prop_type {
P_CHOICE,   /* choice value */
P_SELECT,   /* select BAR */
P_IMPLY,/* imply BAR */
+   P_SUGGEST,  /* suggest BAR */
P_RANGE,/* range 7..100 (for a symbol) */
P_ENV,  /* value from environment variable */
P_SYMBOL,   /* where a symbol is defined */
diff --git a/scripts/kconfig/menu.c b/scripts/kconfig/menu.c
index e9357931b4..3abc5c85ac 100644
--- a/scripts/kconfig/menu.c
+++ b/scripts/kconfig/menu.c
@@ -255,7 +255,9 @@ static void sym_check_prop(struct symbol *sym)
break;
case P_SELECT:
case P_IMPLY:
-   use = prop->type == P_SELECT ? "select" : "imply";
+   case P_SUGGEST:
+   use = prop->type == P_SELECT ? "select" :
+ prop->type == P_IMPLY ? "imply" : "suggest";
sym2 = prop_get_symbol(prop);
if (sym->type != S_BOOLEAN && sym->type != S_TRISTATE)
prop_warn(prop,
@@ -341,6 +343,10 @@ void menu_finalize(struct menu *parent)
struct symbol *es = prop_get_symbol(prop);
es->implied.expr = expr_alloc_or(es->implied.expr,
expr_alloc_and(expr_alloc_symbol(menu->sym), expr_copy(dep)));
+   } else if (prop->type == P_SUGGEST) {
+   struct symbol *es = prop_get_symbol(prop);
+   es->suggested.expr = expr_alloc_or(es->suggested.expr,
+   expr_alloc_and(expr_alloc_symbol(menu->sym), expr_copy(dep)));
}
}
}
@@ -687,6 +693,13 @@ static void get_symbol_str(struct gstr *r, struct symbol *sym,
str_append(r, "\n");
}
 
+   get_symbol_props_str(r, sym, P_SUGGEST, _("  Suggests: "));
+   if (sym->suggested.expr) {
+   str_append(r, _("  Suggested by: "));
+   expr_gstr_print(sym->suggested.expr, r);
+   str_append(r, "\n");
+   }
+
str_append(r, "\n\n");
 }
 
diff --git a/scripts/kconfig/symbol.c b/scripts/kconfig/symbol.c
index 20136ffefb..4a8094a63c 100644
--- a/scripts/kconfig/symbol.c
+++ b/scripts/kconfig/symbol.c
@@ -267,6 +267,16 @@ static void sym_calc_visibility(struct symbol *sym)
sym->implied.tri = tri;
sym_set_changed(sym);
}
+   tri = no;
+   if (sym->suggested.expr)
+   tri = expr_calc_value(sym->suggested.expr);
+   tri = EXPR_AND(tri, sym->visible);
+   if (tri == mod && sym_get_type(sym) == S_BOOLEAN)
+   tri = yes;
+   if (sym->suggested.tri != tri) {
+   sym->suggested.tri = tri;
+
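
The sym_calc_visibility() hunk above can be modelled in a few lines of C. This is a sketch under simplifying assumptions, not kconfig itself: each "suggest" source is reduced to its own value and the value of its dependency, tristates are ordered no < mod < yes, and EXPR_AND/EXPR_OR are min/max as in scripts/kconfig/expr.h. The result is only a default; per the documentation hunk, the suggested symbol may still take any value.

```c
#include <assert.h>

/* Tristate values ordered as in kconfig: no < mod < yes. */
typedef enum { no, mod, yes } tristate;

/* Same meaning as EXPR_AND/EXPR_OR in scripts/kconfig/expr.h: min/max. */
#define EXPR_AND(a, b) ((a) > (b) ? (b) : (a))
#define EXPR_OR(a, b)  ((a) > (b) ? (a) : (b))

/* One "suggest" source: the suggesting symbol's value and its dependency. */
struct suggester { tristate value; tristate dep; };

/*
 * OR over all sources of (source && dependency), as menu_finalize()
 * builds it, then AND with the target's visibility. A boolean target
 * cannot hold "mod", so mod is promoted to yes.
 */
static tristate suggested_default(const struct suggester *s, int n,
				  tristate visible, int target_is_bool)
{
	tristate tri = no;
	int i;

	for (i = 0; i < n; i++)
		tri = EXPR_OR(tri, EXPR_AND(s[i].value, s[i].dep));
	tri = EXPR_AND(tri, visible);
	if (tri == mod && target_is_bool)
		tri = yes;
	return tri;
}
```

For instance, one suggester built as a module plus one whose dependency is off yields a default of mod for a tristate target, or yes for a boolean one.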

[PATCH v2] staging: vc04_services: Replace dmac_map_area with dmac_map_sg

2016-10-25 Thread Michael Zoran

The original arm implementation uses dmac_map_area, which is not
portable. Replace it with an architecture-neutral version
which uses dma_map_sg.

As the table below shows, for larger message sizes the dma_map_sg
implementation is faster than the original unportable dmac_map_area
implementation.

Test                  dmac_map_area  dma_map_page  dma_map_sg
                      (us/iter)      (us/iter)     (us/iter)
vchiq_test -b 4 1           51             76            76
vchiq_test -b 8 1           70             82            91
vchiq_test -b 16 1          94            118           121
vchiq_test -b 32 1         146            173           187
vchiq_test -b 64 1         263            328           299
vchiq_test -b 128 1        529            631           595
vchiq_test -b 256 1       2285           2275          2001
vchiq_test -b 512 1       4372           4616          4123

For message sizes >= 64KB, dma_map_sg is faster than dma_map_page.

For message sizes >= 256KB, dma_map_sg is the fastest
implementation.

"Normal" message sizes should be about 1MB, which is above the size
from which this change shows a speed increase.
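
The crossover points quoted above can be checked mechanically. The following C sketch copies the timings from the table (microseconds per iteration) into an array, picks the fastest implementation per message size, and reports the smallest size from which dma_map_sg stays the fastest. All names are illustrative.

```c
#include <assert.h>

/* Per-iteration timings in microseconds, copied from the vchiq_test table. */
struct timing { int size_kb; int map_area; int map_page; int map_sg; };

static const struct timing timings[] = {
	{   4,   51,   76,   76 }, {   8,   70,   82,   91 },
	{  16,   94,  118,  121 }, {  32,  146,  173,  187 },
	{  64,  263,  328,  299 }, { 128,  529,  631,  595 },
	{ 256, 2285, 2275, 2001 }, { 512, 4372, 4616, 4123 },
};

#define NTIMINGS ((int)(sizeof(timings) / sizeof(timings[0])))

/* Returns 0, 1 or 2 for dmac_map_area, dma_map_page or dma_map_sg. */
static int fastest(const struct timing *t)
{
	int best = 0, best_us = t->map_area;

	if (t->map_page < best_us) { best = 1; best_us = t->map_page; }
	if (t->map_sg < best_us)   { best = 2; best_us = t->map_sg; }
	return best;
}

/* Gain of dma_map_sg over dmac_map_area in percent (negative = slower). */
static double sg_gain_percent(const struct timing *t)
{
	return 100.0 * (t->map_area - t->map_sg) / t->map_area;
}

/* Smallest size from which dma_map_sg is fastest for every larger size, or -1. */
static int sg_crossover_kb(void)
{
	int i, from = -1;

	for (i = 0; i < NTIMINGS; i++) {
		if (fastest(&timings[i]) == 2) {
			if (from < 0)
				from = timings[i].size_kb;
		} else {
			from = -1;
		}
	}
	return from;
}
```

With these numbers sg_crossover_kb() returns 256, and at 512KB dma_map_sg is about 5.7% faster than dmac_map_area ((4372 - 4123) / 4372).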

This is v2 of the patch which includes extra WARN_ONs and
incorporates feedback from Eric Anholt .

Signed-off-by: Michael Zoran 
---
 .../interface/vchiq_arm/vchiq_2835_arm.c   | 152 +
 1 file changed, 93 insertions(+), 59 deletions(-)

diff --git a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c
index 32d12e6..a5afcc5 100644
--- a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c
+++ b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_2835_arm.c
@@ -45,13 +45,8 @@
 #include 
 #include 
 
-#define dmac_map_area  __glue(_CACHE,_dma_map_area)
-#define dmac_unmap_area__glue(_CACHE,_dma_unmap_area)
-
 #define TOTAL_SLOTS (VCHIQ_SLOT_ZERO_SLOTS + 2 * 32)
 
-#define VCHIQ_ARM_ADDRESS(x) ((void *)((char *)x + g_virt_to_bus_offset))
-
 #include "vchiq_arm.h"
 #include "vchiq_2835.h"
 #include "vchiq_connected.h"
@@ -73,7 +68,7 @@ static unsigned int g_fragments_size;
 static char *g_fragments_base;
 static char *g_free_fragments;
 static struct semaphore g_free_fragments_sema;
-static unsigned long g_virt_to_bus_offset;
+static struct device *g_dev;
 
 extern int vchiq_arm_log_level;
 
@@ -84,10 +79,11 @@ vchiq_doorbell_irq(int irq, void *dev_id);
 
 static int
 create_pagelist(char __user *buf, size_t count, unsigned short type,
-struct task_struct *task, PAGELIST_T ** ppagelist);
+   struct task_struct *task, PAGELIST_T **ppagelist,
+   dma_addr_t *dma_addr);
 
 static void
-free_pagelist(PAGELIST_T *pagelist, int actual);
+free_pagelist(dma_addr_t dma_addr, PAGELIST_T *pagelist, int actual);
 
 int vchiq_platform_init(struct platform_device *pdev, VCHIQ_STATE_T *state)
 {
@@ -101,8 +97,6 @@ int vchiq_platform_init(struct platform_device *pdev, VCHIQ_STATE_T *state)
int slot_mem_size, frag_mem_size;
int err, irq, i;
 
-   g_virt_to_bus_offset = virt_to_dma(dev, (void *)0);
-
(void)of_property_read_u32(dev->of_node, "cache-line-size",
   &g_cache_line_size);
g_fragments_size = 2 * g_cache_line_size;
@@ -118,7 +112,7 @@ int vchiq_platform_init(struct platform_device *pdev, VCHIQ_STATE_T *state)
return -ENOMEM;
}
 
-   WARN_ON(((int)slot_mem & (PAGE_SIZE - 1)) != 0);
+   WARN_ON(((unsigned long)slot_mem & (PAGE_SIZE - 1)) != 0);
 
vchiq_slot_zero = vchiq_init_slots(slot_mem, slot_mem_size);
if (!vchiq_slot_zero)
@@ -170,6 +164,7 @@ int vchiq_platform_init(struct platform_device *pdev, VCHIQ_STATE_T *state)
return err ? : -ENXIO;
}
 
+   g_dev = dev;
vchiq_log_info(vchiq_arm_log_level,
"vchiq_init - done (slots %pK, phys %pad)",
vchiq_slot_zero, &slot_phys);
@@ -233,6 +228,7 @@ vchiq_prepare_bulk_data(VCHIQ_BULK_T *bulk, VCHI_MEM_HANDLE_T memhandle,
 {
PAGELIST_T *pagelist;
int ret;
+   dma_addr_t dma_addr;
 
WARN_ON(memhandle != VCHI_MEM_HANDLE_INVALID);
 
@@ -241,12 +237,14 @@ vchiq_prepare_bulk_data(VCHIQ_BULK_T *bulk, VCHI_MEM_HANDLE_T memhandle,
? PAGELIST_READ
: PAGELIST_WRITE,
current,
-   &pagelist);
+   &pagelist,
+   &dma_addr);
+
if (ret != 0)
return VCHIQ_ERROR;
 
bulk->handle = memhandle;
-   bulk->data = VCHIQ_ARM_ADDRESS(pagelist);
+   bulk->data = (void *)(unsigned long)dma_addr;
 
/* Store the pagelist address in remote_data, which isn't used by the
   slave. */
@@ -259,7 +257,8 @@ void
 vchiq_complete_bulk(VCHIQ_BULK_T *bulk)
 {
if (bulk && bulk->remote_data && bulk->actual)
-   free_pa

Re: [LKP] [lkp] [perf powerpc] 18d1796d0b: [No primary change]

2016-10-25 Thread Huang, Ying

Peter Zijlstra  writes:

> On Tue, Oct 25, 2016 at 02:40:13PM +0800, kernel test robot wrote:
>> [will-it-scale] perf-stat.branch-miss-rate +7.4% regression 
>> 
>> 
>> FYI, we noticed a +7.4% regression of perf-stat.branch-miss-rate due to commit:
>> 
>> commit 18d1796d0b45762ec6f58c5ed2ad3f7510ffbaa9 ("perf powerpc: Don't call perf_event_disable from atomic context")
>> https://github.com/0day-ci/linux Jiri-Olsa/perf-powerpc-Don-t-call-perf_event_disable-from-atomic-context/20161006-203500
>> 
>> in testcase: will-it-scale
>> on test machine: 32 threads Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz with 64G memory
>> with following parameters:
>> 
>>  test: poll2
>>  cpufreq_governor: performance
>> 
>> Will It Scale takes a testcase and runs it from 1 through to n parallel 
>> copies to see if the testcase will scale. It builds both a process and 
>> threads based test in order to see any differences between the two.
>
>> Details are as below:
>> -->
>> 
>> 
>> To reproduce:
>> 
>> git clone 
>> git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
>> cd lkp-tests
>> bin/lkp install job.yaml  # job file is attached in this email
>> bin/lkp run job.yaml
>> 
>> =========================================================================================
>> compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase:
>>   gcc-6/performance/x86_64-rhel-7.2/debian-x86_64-2016-08-31.cgz/lkp-sb03/poll2/will-it-scale
>> 
>> commit: 
>>   41aad2a6d4 (" perf/core improvements and fixes:")
>>   18d1796d0b ("perf powerpc: Don't call perf_event_disable from atomic context")
>> 
>> 41aad2a6d4fcdda8 18d1796d0b45762ec6f58c5ed2 
>> ---------------- -------------------------- 
>>        fail:runs  %reproduction    fail:runs
>>            |             |             |    
>>          %stddev     %change         %stddev
>>              \          |                \  
>>       0.19 ±  0%      +7.4%       0.21 ±  0%  perf-stat.branch-miss-rate%
>>  9.591e+09 ±  1%      +9.1%  1.047e+10 ±  0%  perf-stat.branch-misses
>>  1.962e+09 ±  0%      +2.3%  2.008e+09 ±  1%  perf-stat.cache-references
>>      51.18 ±  2%      +5.6%      54.06 ±  1%  perf-stat.iTLB-load-miss-rate%
>>   46430577 ±  5%      -6.9%   43241506 ±  2%  perf-stat.iTLB-loads
>>       9.90 ±  4%      +9.3%      10.82 ±  4%  turbostat.Pkg%pc2
>>      62066 ± 24%     +34.7%      83582 ± 11%  numa-meminfo.node1.Active
>>      49531 ± 30%     +42.9%      70778 ± 13%  numa-meminfo.node1.Active(anon)
>>      27883 ±100%    -100.0%       0.00 ± -1%  latency_stats.avg.proc_cgroup_show.proc_single_show.seq_read.__vfs_read.vfs_read.SyS_read.do_syscall_64.return_from_SYSCALL_64
>>      27883 ±100%    -100.0%       0.00 ± -1%  latency_stats.max.proc_cgroup_show.proc_single_show.seq_read.__vfs_read.vfs_read.SyS_read.do_syscall_64.return_from_SYSCALL_64
>>      32685 ± 38%     +88.5%      61603 ±147%  latency_stats.sum.call_rwsem_down_write_failed.path_openat.do_filp_open.do_sys_open.SyS_open.entry_SYSCALL_64_fastpath
>>      27883 ±100%    -100.0%       0.00 ± -1%  latency_stats.sum.proc_cgroup_show.proc_single_show.seq_read.__vfs_read.vfs_read.SyS_read.do_syscall_64.return_from_SYSCALL_64
>>      92795 ±  4%      -8.6%      84853 ±  6%  numa-vmstat.node0.numa_hit
>>      92782 ±  4%      -8.5%      84851 ±  6%  numa-vmstat.node0.numa_local
>>      12381 ± 30%     +42.9%      17694 ± 13%  numa-vmstat.node1.nr_active_anon
>>      12381 ± 30%     +42.9%      17694 ± 13%  numa-vmstat.node1.nr_zone_active_anon
>>      21.80 ± 59%     -69.8%       6.58 ± 83%  sched_debug.cpu.clock.stddev
>>      21.80 ± 59%     -69.8%       6.58 ± 83%  sched_debug.cpu.clock_task.stddev
>>       0.00 ± 23%     -34.3%       0.00 ± 20%  sched_debug.cpu.next_balance.stddev
>>      35829 ±  9%     -18.4%      29221 ±  6%  sched_debug.cpu.nr_switches.max
>>       8361 ±  6%     -13.4%       7243 ±  7%  sched_debug.cpu.nr_switches.stddev
>>       8.43 ± 11%     -25.2%       6.30 ± 12%  sched_debug.cpu.nr_uninterruptible.stddev
>>      18057 ±  6%     -14.3%      15482 ±  8%  sched_debug.cpu.sched_count.stddev
>> 
>
> ARGH... so what is the normal metric for this test and did that change?
> And why can't I still find that? These reports suck!

There is no observable change in the benchmark (will-it-scale)
score. That is what the subject of the mail says: "[No primary
change]". But apparently that is not clear; we will improve the
reports to make it clearer.

> The result doesn't make sense, my gcc inlines the function call, the
> emitted code is very similar to the old code, with exception of one
> extra symbol.
>
> Are you sure this isn't simple run to run variation?

The reported change is perf-stat.branch-miss-rate%

Re: [PATCH] irqchip/gic: Enable gic_set_affinity set more than one cpu

2016-10-25 Thread Cheng Chao



on 10/25/2016 06:09 PM, Marc Zyngier wrote:
> On 15/10/16 08:23, Cheng Chao wrote:
>> On 10/15/2016 01:33 AM, Marc Zyngier wrote:
>>>> on 10/13/2016 11:31 PM, Marc Zyngier wrote:
>>>>> On Thu, 13 Oct 2016 18:57:14 +0800
>>>>> Cheng Chao  wrote:
>>>>>
>>>>>> GIC can distribute an interrupt to more than one cpu,
>>>>>> but now, gic_set_affinity sets only one cpu to handle interrupt.
>>>>>
>>>>> What makes you think this is a good idea? What purpose does it serves?
>>>>> I can only see drawbacks to this: You're waking up more than one CPU,
>>>>> wasting power, adding jitter and clobbering the cache.
>>>>>
>>>>> I assume you see a benefit to that approach, so can you please spell it
>>>>> out?
>>>>>
>>>>
>>>> Ok, You are right, but the performance is another point that we should consider.
>>>>
>>>> We use E1 device to transmit/receive video stream. we find that E1's interrupt is
>>>> only on the one cpu that cause this cpu usage is almost 100%,
>>>> but other cpus is much lower load, so the performance is not good.
>>>> the cpu is 4-core.
>>>
>>> It looks to me like you're barking up the wrong tree. We have
>>> NAPI-enabled network drivers for this exact reason, and adding more
>>> interrupts to an already overloaded system doesn't strike me as going in
>>> the right direction. May I suggest that you look at integrating NAPI
>>> into your E1 driver?
>>>
>>
>> great, NAPI maybe is a good option, I can try to use NAPI. thank you.
>>
>> In other hand, gic_set_affinity sets only one cpu to handle interrupt,
>> that really makes me a little confused, why does GIC's driver not like 
>> the others(MPIC, APIC etc) to support many cpus to handle interrupt?
>>
>> It seems that the GIC's driver constrain too much.
> 
> There are several drawbacks to this:
> - Cache impacts and power efficiency, as already mentioned
> - Not virtualizable (you cannot efficiently implement this in a 
>   hypervisor that emulates a GICv2 distributor)
> - Doesn't scale (you cannot go beyond 8 CPUs)
> 
> I strongly suggest you give NAPI a go, and only then consider
> delivering interrupts to multiple CPUs, because multiple CPU
> delivery is not future proof.
> 

Thanks again, the E1 driver with NAPI is on the right track.

>> I think it is more reasonable to let user decide what to do.
>>
>> If I care about the power etc, then I only echo single cpu to
>> /proc/irq/xx/smp_affinity, but if I expect more than one cpu to handle 
>> one special interrupt, I can echo 'what I expect cpus' to
>> /proc/irq/xx/smp_affinity.
> 
> If that's what you really want, a better patch may be something like this:
> 

I hope the GIC driver can be more flexible, so that gic_set_affinity() is
not constrained to setting only one CPU. The GIC does support distributing
an interrupt to more than one CPU, after all.


> diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
> index d6c404b..b301d72 100644
> --- a/drivers/irqchip/irq-gic.c
> +++ b/drivers/irqchip/irq-gic.c
> @@ -326,20 +326,25 @@ static int gic_set_affinity(struct irq_data *d, const struct cpumask *mask_val,
>  {
>   void __iomem *reg = gic_dist_base(d) + GIC_DIST_TARGET + (gic_irq(d) & ~3);
>   unsigned int cpu, shift = (gic_irq(d) % 4) * 8;
> - u32 val, mask, bit;
> - unsigned long flags;
> + u32 val, mask, bit = 0;
> + unsigned long flags, aff = 0;
>  
> - if (!force)
> - cpu = cpumask_any_and(mask_val, cpu_online_mask);
> - else
> - cpu = cpumask_first(mask_val);
> + for_each_cpu(cpu, mask_val) {
> + if (force) {
> + aff = 1 << cpu;
> + break;
> + }
> +
> + aff |= cpu_online(cpu) << cpu;
> + }
>  
> - if (cpu >= NR_GIC_CPU_IF || cpu >= nr_cpu_ids)
> + if (!aff)
>   return -EINVAL;
>  
>   gic_lock_irqsave(flags);
>   mask = 0xff << shift;
> - bit = gic_cpu_map[cpu] << shift;
> + for_each_set_bit(cpu, &aff, nr_cpu_ids)
> + bit |= gic_cpu_map[cpu] << shift;
>   val = readl_relaxed(reg) & ~mask;
>   writel_relaxed(val | bit, reg);
>   gic_unlock_irqrestore(flags);
> 
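
To make the register arithmetic in the patch above concrete, here is a user-space C sketch of how a GIC distributor target register could be updated for a multi-CPU mask: each 32-bit register holds one 8-bit target field for each of four interrupts, shift = (irq % 4) * 8, and every requested online CPU contributes its CPU interface bit. The cpu_map[] array stands in for gic_cpu_map[]; the force path and locking are left out, and masking off CPUs beyond the eighth is an assumption of this sketch rather than what either patch does.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NR_GIC_CPU_IF 8	/* a GICv2 distributor targets at most 8 CPUs */

/*
 * Compute the new value of the target register holding @irq.
 * Returns -1 (the driver's -EINVAL) if no requested CPU is online.
 */
static int gic_target_update(uint32_t old, unsigned int irq,
			     unsigned long req_mask, unsigned long online_mask,
			     const uint8_t cpu_map[SKETCH_NR_GIC_CPU_IF],
			     uint32_t *new_val)
{
	unsigned int cpu, shift = (irq % 4) * 8;
	uint32_t mask = 0xffu << shift, bits = 0;
	unsigned long aff = req_mask & online_mask;

	aff &= (1UL << SKETCH_NR_GIC_CPU_IF) - 1;
	if (!aff)
		return -1;

	for (cpu = 0; cpu < SKETCH_NR_GIC_CPU_IF; cpu++)
		if (aff & (1UL << cpu))
			bits |= (uint32_t)cpu_map[cpu] << shift;

	/* Only this interrupt's byte changes; the other three are kept. */
	*new_val = (old & ~mask) | bits;
	return 0;
}
```

For IRQ 33 with CPUs 0 and 2 requested and online, and a map where CPU n uses bit n, an all-ones register becomes 0xffff05ff.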

This patch is better than the previous one.
I have added a small check:

diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
index 58e5b4e..b3d0f07 100644
--- a/drivers/irqchip/irq-gic.c
+++ b/drivers/irqchip/irq-gic.c
@@ -326,20 +326,28 @@ static int gic_set_affinity(struct irq_data *d, const struct cpumask *mask_val,
 {
void __iomem *reg = gic_dist_base(d) + GIC_DIST_TARGET + (gic_irq(d) & ~3);
unsigned int cpu, shift = (gic_irq(d) % 4) * 8;
-   u32 val, mask, bit;
-   unsigned long flags;
+   u32 val, mask, bit = 0;
+   unsigned long flags, aff = 0;

-   if (!force)
-   cpu = cpumask_any_and(mask_val, cpu_online_mask);
-   else
-   cpu = cpumask_first(mask_val);
+   for_each_cpu(cpu, mask_val) {
+   if (cpu >= NR_GIC_CPU_IF || cpu >= nr_cpu_ids)
+

[PATCH] include/linux/rtmutex.h: NOOP rt_mutex_destroy if !CONFIG_DEBUG_RT_MUTEXES

2016-10-25 Thread Alex Goins

mutex_destroy() is a no-op inline when CONFIG_DEBUG_MUTEXES is not enabled.
When mutex_destroy() is indirectly used by non-GPL kernel modules through
inline functions such as reservation_object_fini(), users can use a kernel
with CONFIG_DEBUG_MUTEXES disabled to avoid a dependence on the GPL-only
symbol mutex_destroy.

The RT Linux patches replace mutex_destroy with rt_mutex_destroy.
Currently, rt_mutex_destroy is exported GPL-only regardless of whether
mutex debugging is enabled.

This patch aligns rt_mutex_destroy with mutex_destroy by using the same
no-op inline technique. This allows non-GPL modules to access the same
functionality with RT Linux as with regular Linux.

Signed-off-by: Alex Goins 
---
 include/linux/rtmutex.h  | 7 ++-
 kernel/locking/rtmutex.c | 5 ++---
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/include/linux/rtmutex.h b/include/linux/rtmutex.h
index 1abba5c..741e844 100644
--- a/include/linux/rtmutex.h
+++ b/include/linux/rtmutex.h
@@ -56,6 +56,12 @@ struct hrtimer_sleeper;
 #endif
 
 #ifdef CONFIG_DEBUG_RT_MUTEXES
+ extern void rt_mutex_destroy(struct rt_mutex *lock);
+#else
+ static inline void rt_mutex_destroy(struct rt_mutex *lock) {}
+#endif
+
+#ifdef CONFIG_DEBUG_RT_MUTEXES
 # define __DEBUG_RT_MUTEX_INITIALIZER(mutexname) \
, .name = #mutexname, .file = __FILE__, .line = __LINE__
 # define rt_mutex_init(mutex)  __rt_mutex_init(mutex, __func__)
@@ -87,7 +93,6 @@ static inline int rt_mutex_is_locked(struct rt_mutex *lock)
 }
 
 extern void __rt_mutex_init(struct rt_mutex *lock, const char *name);
-extern void rt_mutex_destroy(struct rt_mutex *lock);
 
 extern void rt_mutex_lock(struct rt_mutex *lock);
 extern int rt_mutex_lock_interruptible(struct rt_mutex *lock);
diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 1ec0f48..8d3a80e 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1513,6 +1513,7 @@ bool __sched rt_mutex_futex_unlock(struct rt_mutex *lock,
return rt_mutex_slowunlock(lock, wqh);
 }
 
+#ifdef CONFIG_DEBUG_RT_MUTEXES
 /**
  * rt_mutex_destroy - mark a mutex unusable
  * @lock: the mutex to be destroyed
@@ -1524,12 +1525,10 @@ bool __sched rt_mutex_futex_unlock(struct rt_mutex *lock,
 void rt_mutex_destroy(struct rt_mutex *lock)
 {
WARN_ON(rt_mutex_is_locked(lock));
-#ifdef CONFIG_DEBUG_RT_MUTEXES
lock->magic = NULL;
-#endif
 }
-
 EXPORT_SYMBOL_GPL(rt_mutex_destroy);
+#endif
 
 /**
  * __rt_mutex_init - initialize the rt lock
-- 
1.9.1
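
The technique this patch relies on (an exported out-of-line function when
debugging is enabled, an empty static inline otherwise) can be shown
outside the kernel. The sketch below is a userspace illustration only;
struct demo_lock, demo_lock_destroy() and the DEMO_DEBUG macro are
hypothetical stand-ins for struct rt_mutex, rt_mutex_destroy() and
CONFIG_DEBUG_RT_MUTEXES.

```c
#include <stdio.h>

/* Hypothetical stand-in for struct rt_mutex with just the two fields
 * rt_mutex_destroy() touches in debug builds. */
struct demo_lock {
	int locked;
	const void *magic;
};

#ifdef DEMO_DEBUG
/* Debug build: a real function that checks and scribbles on the lock.
 * In the kernel this variant lives in rtmutex.c and is exported with
 * EXPORT_SYMBOL_GPL. */
static void demo_lock_destroy(struct demo_lock *lock)
{
	if (lock->locked)
		fprintf(stderr, "warning: destroying a locked lock\n");
	lock->magic = NULL;
}
#else
/* Non-debug build: an empty inline (in the kernel, in the header), so
 * callers compile to nothing and reference no exported symbol. */
static inline void demo_lock_destroy(struct demo_lock *lock)
{
	(void)lock;
}
#endif
```

Built without -DDEMO_DEBUG, a call to demo_lock_destroy() generates no
code, so the caller carries no reference to an out-of-line symbol; in the
kernel case that is what lets a non-GPL module avoid the GPL-only export.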

[RFC] HID:hid-lg4ff: Delay to allow wheels to center after plug-in

2016-10-25 Thread Simon Wood

A number of wheels (G27/etc) do a little full right/full left 'dance'
when first plugged in. This patch inserts a delay so that this 'dance'
is completed before we disable (set to zero) the autocenter spring.

A side benefit is that the DFGT, which was confused without the delay,
is now correctly set to 900° rotation mode.

Side effect, and the reason I am sending this as an RFC: this 8s delay
seems to affect other wheels connected at the same time.

With 3 wheels on a hub and the hub then connected to the PC, the wheel
on the right in the video below waits for the G27 to complete its 8s
delay before it does its 'dance' and registers with the system.

https://www.youtube.com/watch?v=xCVpCw_yGgA

I don't know if this is a problem, or if someone here has suggestions
on a better way to implement the delay...
---
 drivers/hid/hid-lg4ff.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/drivers/hid/hid-lg4ff.c b/drivers/hid/hid-lg4ff.c
index af3a8ec..3eee920 100644
--- a/drivers/hid/hid-lg4ff.c
+++ b/drivers/hid/hid-lg4ff.c
@@ -1248,6 +1248,8 @@ int lg4ff_init(struct hid_device *hid)
/* Check if autocentering is available and
 * set the centering force to zero by default */
if (test_bit(FF_AUTOCENTER, dev->ffbit)) {
+   wait_queue_head_t wait;
+
/* Formula Force EX expects different autocentering command */
if ((bcdDevice >> 8) == LG4FF_FFEX_REV_MAJ &&
(bcdDevice & 0xff) == LG4FF_FFEX_REV_MIN)
@@ -1255,6 +1257,14 @@ int lg4ff_init(struct hid_device *hid)
else
dev->ff->set_autocenter = lg4ff_set_autocenter_default;
 
+   /* insert a 8s delay to allow DFGT/G25/G27/G29 wheels to return to center position*/
+   if (lg4ff_devices[i].product_id == USB_DEVICE_ID_LOGITECH_DFGT_WHEEL ||
+   lg4ff_devices[i].product_id == USB_DEVICE_ID_LOGITECH_G25_WHEEL ||
+   lg4ff_devices[i].product_id == USB_DEVICE_ID_LOGITECH_G27_WHEEL ||
+   lg4ff_devices[i].product_id == USB_DEVICE_ID_LOGITECH_G29_WHEEL) {
+   init_waitqueue_head (&wait);
+   wait_event_interruptible_timeout(wait, 0, msecs_to_jiffies(8000));
+   }
dev->ff->set_autocenter(dev, 0);
}
 
-- 
2.7.4
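
The delay in the patch waits on the condition 0, which can never become
true, so wait_event_interruptible_timeout() always runs until its timeout
(or until a signal arrives) and behaves like an interruptible sleep. The
userspace sketch below shows the same idea with poll() on zero
descriptors; wait_on_nothing() is a hypothetical helper, and inside the
kernel a plain sleep would more usually be written with msleep() or
msleep_interruptible().

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <time.h>

/* Whole milliseconds between two CLOCK_MONOTONIC readings. */
static long elapsed_ms(const struct timespec *a, const struct timespec *b)
{
	long long ns = (long long)(b->tv_sec - a->tv_sec) * 1000000000LL +
		       (b->tv_nsec - a->tv_nsec);

	return (long)(ns / 1000000LL);
}

/* Wait on "nothing" with a timeout. poll() with no descriptors has no
 * event that could end the wait early, so it returns only when the
 * timeout expires or a signal interrupts it, much like waiting on a
 * condition that is never true. Returns the measured wait in ms, or -1
 * if poll() failed (for example with EINTR). */
static long wait_on_nothing(int timeout_ms)
{
	struct timespec start, end;

	clock_gettime(CLOCK_MONOTONIC, &start);
	if (poll(NULL, 0, timeout_ms) < 0)
		return -1;
	clock_gettime(CLOCK_MONOTONIC, &end);
	return elapsed_ms(&start, &end);
}
```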

Re: [RFC][PATCH v2] mount: In propagate_umount handle overlapping mount propagation trees

2016-10-25 Thread Eric W. Biederman

Andrei Vagin  writes:

> On Tue, Oct 25, 2016 at 04:45:44PM -0500, Eric W. Biederman wrote:
>> That is certainly interesting.  The problem is that the reason we were
>> going slow is that there were in fact mounts that had not been traversed
>> in the share group.
>
> You are right.
>
>> 
>> And in fact the entire idea of visiting a vfsmount mountpoint pair
>> exactly once is wrong in the face of shadow mounts.  For a vfsmount
>> mountpoint pair that has shadow mounts the number of shadow mounts needs
>> to be decreased by one each time the propagation tree is traversed
>> during unmount. Which means that as far as I can see we have to kill
>> shadow mounts to correctly optimize this code.  Once shadow mounts are
>> gone I don't know of a case where we need your optimization.
>
> Without shadow mounts, it will be hard to preserve predictable behaviour
> for cases like this:
>
> $ unshare --propagation private -m sh test.sh
> + mount -t tmpfs --make-shared  A
> + mkdir A/a
> + mount -t tmpfs  A/a
> + mount --bind A B
> + mount -t tmpfs  B/a
> + grep 
> + cat /proc/self/mountinfo
> 155 123 0:44 / /root/tmp/A rw,relatime shared:70 - tmpfs  rw
> 156 155 0:45 / /root/tmp/A/a rw,relatime shared:71 - tmpfs  rw
> 157 123 0:44 / /root/tmp/B rw,relatime shared:70 - tmpfs  rw
> 158 157 0:46 / /root/tmp/B/a rw,relatime shared:72 - tmpfs  rw
> 159 155 0:46 / /root/tmp/A/a rw,relatime shared:72 - tmpfs  rw
> + umount B/a
> + grep 
> + cat /proc/self/mountinfo
> 155 123 0:44 / /root/tmp/A rw,relatime shared:70 - tmpfs  rw
> 156 155 0:45 / /root/tmp/A/a rw,relatime shared:71 - tmpfs  rw
> 157 123 0:44 / /root/tmp/B rw,relatime shared:70 - tmpfs  rw
>
> X + a - a = X
>
> Maybe we need to add another ID for propagated mounts and when we
> do umount, we will detach only mounts with the same propagation id.
>
> I support the idea of killing shadow mounts. I guess it will help us to
> simplify the algorithm for dumping and restoring a mount tree in CRIU.
>
> Currently it is a big pain for us.

Killing shadow mounts is not exactly a done deal as there are some user
visible effects.  The practical question becomes do we break anything
anyone cares about in userspace.  Answering those practical questions
sucks.

I definitely think we should try to kill shadow mounts because they are
such a big pain to deal with, and only provide very limited value.

So far the only thing I have seen shadow mounts being good for is
preserving unmount behavior in cases where someone has constructed an
artificially evil mount tree. I haven't figured out how any of those
mount trees are actually useful in real life.

Eric
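
Andrei's suggestion, detaching only the mounts that carry the same
propagation ID, can be modelled on his own example. The toy sketch below
assumes (hypothetically) that the peer group shown as shared:N in
mountinfo plays the role of that ID and that a mountpoint can be
identified by its name under the parent; struct toy_mount and
toy_umount() are illustrative names, not kernel code. Unmounting 158
(B/a, shared:72) detaches it and its copy 159 on A/a, while 156 (the
original A/a, shared:71) survives, which matches the mountinfo output
above without relying on shadow mounts.

```c
#include <stddef.h>
#include <string.h>

/* Toy model of one /proc/self/mountinfo line. All names are
 * hypothetical; fs/namespace.c does not store mounts this way. */
struct toy_mount {
	int id;          /* mount ID (first mountinfo column) */
	int parent;      /* parent mount ID (second column) */
	const char *mp;  /* mountpoint name relative to the parent */
	int peer_group;  /* N from "shared:N"; 0 means private */
	int attached;    /* 1 while the mount is in the tree */
};

static struct toy_mount *toy_find(struct toy_mount *m, size_t n, int id)
{
	for (size_t i = 0; i < n; i++)
		if (m[i].id == id)
			return &m[i];
	return NULL;
}

/* Unmount `target`, then propagate: also detach each mount that is in
 * the target's peer group, sits on the same mountpoint name, and whose
 * parent is a peer of the target's parent. A mount from a different
 * peer group at the same place survives. Returns mounts detached. */
static int toy_umount(struct toy_mount *m, size_t n, int target)
{
	struct toy_mount *t = toy_find(m, n, target);
	struct toy_mount *tp;
	int detached = 0;

	if (!t || !t->attached)
		return 0;
	tp = toy_find(m, n, t->parent);

	for (size_t i = 0; i < n; i++) {
		struct toy_mount *p;

		if (!m[i].attached)
			continue;
		if (m[i].id != target) {
			if (t->peer_group == 0 || m[i].peer_group != t->peer_group)
				continue;
			if (strcmp(m[i].mp, t->mp) != 0)
				continue;
			p = toy_find(m, n, m[i].parent);
			if (!p || !tp || tp->peer_group == 0 ||
			    p->peer_group != tp->peer_group)
				continue;
		}
		m[i].attached = 0;
		detached++;
	}
	return detached;
}
```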

Re: bio linked list corruption.

2016-10-25 Thread Linus Torvalds

On Tue, Oct 25, 2016 at 6:33 PM, Linus Torvalds
 wrote:
>
> Completely untested. Maybe there's some reason we can't write to the
> whole thing like that?

That hack boots and seems to work for me, but doesn't show anything.

Dave, mind just trying that oneliner?

   Linus

Re: taint/module: Clean up global and module taint flags handling

2016-10-25 Thread Rusty Russell

Jiri Kosina  writes:
> On Fri, 23 Sep 2016, Jessica Yu wrote:
>
>> Hm, quick question, which tree would this patch go to? Though the
>> cleanup is for modules, there is an indirect cross-tree dependency
>> (taint_flag.module needs to be true for TAINT_LIVEPATCH for Josh's
>> patch to still work as intended). The least complicated thing to do
>> would be to just take this through the livepatch tree (with Rusty's
>> approval :-)), no?
>
> I don't want to be sneaking this behind Rusty's back, but he hasn't 
> responded so far.

I finally side-stepped this by appointing Jessica maintainer, thus her
Reviewed-by is sufficient for the module tree.

Lazy, huh?

Sorry for the delay,
Rusty.

> It's not vitally super-crucial to have this present in this very pull 
> request, so I am currently putting this on hold wrt. the upcoming merge 
> window pull request, and we'll then proceed afterwards once Rusty 
> expresses his (n)ack. If this doesn't happen during the coming weeks, 
> I'll pick this up myself.
>
> Thanks,
>
> -- 
> Jiri Kosina
> SUSE Labs

Re: [RFC PATCH 1/2] module: Ensure a module's state is set accordingly during module coming cleanup code

2016-10-25 Thread Rusty Russell

Aaron Tomlin  writes:
> In load_module(), in the event of an error (e.g. unknown module
> parameter(s) specified), we perform some "module coming" cleanup
> operations. At this point the module is still in a "formed" state
> when it is actually going away.
>
> This patch updates the module's state to ensure that anyone on the
> module_notify_list waiting for a module-going-away notification will
> be notified accordingly.

I recall a similar proposal before.

I've audited all the subscribers to check they didn't look at
mod->state; they seem OK.

We actually do this in the init-failed path, so this should be OK.

Acked-by: Rusty Russell 

Thanks,
Rusty.

> Signed-off-by: Aaron Tomlin 
> ---
>  kernel/module.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/kernel/module.c b/kernel/module.c
> index f57dd63..ff93ab8 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -3708,6 +3708,7 @@ static int load_module(struct load_info *info, const char __user *uargs,
>   sysfs_cleanup:
>   mod_sysfs_teardown(mod);
>   coming_cleanup:
> + mod->state = MODULE_STATE_GOING;
>   blocking_notifier_call_chain(&module_notify_list,
>MODULE_STATE_GOING, mod);
>   klp_module_going(mod);
> -- 
> 2.5.5
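
The point of the patch is ordering: set mod->state before calling the
notifier chain, so a subscriber that inspects the state sees one that
agrees with the MODULE_STATE_GOING event. A userspace sketch with
hypothetical names (struct toy_module, toy_subscriber(),
toy_coming_cleanup()) and a single subscriber in place of
module_notify_list:

```c
/* Hypothetical mirror of a few module states. */
enum toy_state { TOY_UNFORMED, TOY_COMING, TOY_LIVE, TOY_GOING };

struct toy_module {
	enum toy_state state;
	int inconsistent_events; /* events that disagreed with state */
};

/* A subscriber that does look at mod->state and records whenever the
 * event it is told about disagrees with the module's recorded state. */
static void toy_subscriber(enum toy_state event, struct toy_module *mod)
{
	if (mod->state != event)
		mod->inconsistent_events++;
}

/* The error path at "coming_cleanup:". With set_state_first != 0 it
 * follows the patch: mark the module GOING first, then notify. */
static void toy_coming_cleanup(struct toy_module *mod, int set_state_first)
{
	if (set_state_first)
		mod->state = TOY_GOING;
	toy_subscriber(TOY_GOING, mod);
}
```

Rusty's audit found that the in-tree subscribers do not read mod->state,
so the mismatch counted here only matters for a subscriber that does.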

Re: [RFC PATCH 2/2] module: When modifying a module's text ignore modules which are going away too

2016-10-25 Thread Rusty Russell

Aaron Tomlin  writes:
> By default, when modifying the access permissions of a module's core
> and init pages, we only ignore modules that are unformed. There is no
> reason not to extend this to modules which are going away too.

Well, it depends on all the callers (i.e. ftrace): are they also ignoring
modules which are going away?

Otherwise, we set MODULE_STATE_GOING, ftrace walks all the modules and
this one is still RO...

Thanks,
Rusty.

> This patch makes both set_all_modules_text_rw() and
> set_all_modules_text_ro() skip modules which are going away too.
>
> Signed-off-by: Aaron Tomlin 
> ---
>  kernel/module.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/module.c b/kernel/module.c
> index ff93ab8..09c386b 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -1953,7 +1953,8 @@ void set_all_modules_text_rw(void)
>  
>   mutex_lock(&module_mutex);
>   list_for_each_entry_rcu(mod, &modules, list) {
> - if (mod->state == MODULE_STATE_UNFORMED)
> + if (mod->state == MODULE_STATE_UNFORMED ||
> + mod->state == MODULE_STATE_GOING)
>   continue;
>  
>   frob_text(&mod->core_layout, set_memory_rw);
> @@ -1969,7 +1970,8 @@ void set_all_modules_text_ro(void)
>  
>   mutex_lock(&module_mutex);
>   list_for_each_entry_rcu(mod, &modules, list) {
> - if (mod->state == MODULE_STATE_UNFORMED)
> + if (mod->state == MODULE_STATE_UNFORMED ||
> + mod->state == MODULE_STATE_GOING)
>   continue;
>  
>   frob_text(&mod->core_layout, set_memory_ro);
> -- 
> 2.5.5
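
The patched filter, and the consequence Rusty points out, can be seen in
a small model: a module in the going state is skipped by the walk and so
stays read-only. All names below (toy_skip(), toy_set_all_text_rw(), the
toy_state values) are hypothetical; only the skip condition mirrors the
patch.

```c
#include <stddef.h>

enum toy_state { TOY_LIVE, TOY_COMING, TOY_GOING, TOY_UNFORMED };

struct toy_module {
	enum toy_state state;
	int text_ro; /* 1 if the text pages are read-only */
};

/* The patched filter: skip modules that are not fully formed or are
 * already on their way out. */
static int toy_skip(const struct toy_module *mod)
{
	return mod->state == TOY_UNFORMED || mod->state == TOY_GOING;
}

/* Rough analogue of set_all_modules_text_rw(): make every module that
 * is not skipped writable and return how many were changed. */
static size_t toy_set_all_text_rw(struct toy_module *mods, size_t n)
{
	size_t changed = 0;

	for (size_t i = 0; i < n; i++) {
		if (toy_skip(&mods[i]))
			continue;
		mods[i].text_ro = 0;
		changed++;
	}
	return changed;
}
```

A caller that expects every module's text to be writable after the walk
would therefore also need to skip going modules, which is the question
Rusty raises about ftrace.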

Re: bio linked list corruption.

2016-10-25 Thread Linus Torvalds

On Tue, Oct 25, 2016 at 5:27 PM, Dave Jones  wrote:
>
> DaveC: Do these look like real problems, or is this more "looks like
> random memory corruption" ?  It's been a while since I did some stress
> testing on XFS, so these might not be new..

Andy, do you think we could just do some poisoning of the stack as we
free it, to see if that catches anything?

Something truly stupid like just

--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -218,6 +218,7 @@ static inline void free_thread_stack(struct task_struct *tsk)
unsigned long flags;
int i;

+   memset(tsk->stack_vm_area->addr, 0xd0, THREAD_SIZE);
local_irq_save(flags);
for (i = 0; i < NR_CACHED_STACKS; i++) {
if (this_cpu_read(cached_stacks[i]))

or similar?

It seems like DaveJ had an easier time triggering these problems with
the stack cache, but they clearly didn't go away when the stack cache
was disabled. So maybe the stack cache just made the reuse more likely
and faster, making the problem show up faster too. But if we actively
poison things, we'll corrupt the free'd stack *immediately* if there
is some stale use..

Completely untested. Maybe there's some reason we can't write to the
whole thing like that?

 Linus
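
Linus's experiment poisons a stack with 0xd0 at free time so that a
stale user sees garbage immediately. The userspace sketch below models a
small stack cache with hypothetical names and sizes (TOY_STACK_SIZE,
TOY_NR_CACHED, toy_free_stack(), toy_alloc_stack()); the check for
disturbed poison on reuse is an addition in the style of slab poisoning
and is not part of the quoted one-liner.

```c
#include <stddef.h>
#include <string.h>

#define TOY_STACK_SIZE 64   /* hypothetical; THREAD_SIZE is much larger */
#define TOY_NR_CACHED   2   /* hypothetical stand-in for NR_CACHED_STACKS */
#define TOY_POISON   0xd0   /* same byte as the fork.c experiment */

static unsigned char *toy_cache[TOY_NR_CACHED];

/* Free a stack: poison all of it first, so any stale user reads garbage
 * at once, then park it in the cache if there is room.
 * Returns 1 if cached, 0 if the cache was full. */
static int toy_free_stack(unsigned char *stack)
{
	memset(stack, TOY_POISON, TOY_STACK_SIZE);
	for (int i = 0; i < TOY_NR_CACHED; i++) {
		if (!toy_cache[i]) {
			toy_cache[i] = stack;
			return 1;
		}
	}
	return 0;
}

/* Reuse a cached stack, first checking that nobody wrote to it while it
 * was free: *corrupt is set to 1 if the poison was disturbed, else 0.
 * Returns NULL if the cache is empty. */
static unsigned char *toy_alloc_stack(int *corrupt)
{
	for (int i = 0; i < TOY_NR_CACHED; i++) {
		unsigned char *s = toy_cache[i];

		if (!s)
			continue;
		toy_cache[i] = NULL;
		*corrupt = 0;
		for (size_t j = 0; j < TOY_STACK_SIZE; j++) {
			if (s[j] != TOY_POISON) {
				*corrupt = 1;
				break;
			}
		}
		return s;
	}
	return NULL;
}
```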

Re: [RFC PATCH] PCI/ACPI: xgene: Add ECAM quirk for X-Gene PCIe controller

2016-10-25 Thread Duc Dang

On Wed, Sep 21, 2016 at 2:22 PM, Bjorn Helgaas  wrote:
> On Mon, Sep 19, 2016 at 06:07:37PM -0700, Duc Dang wrote:
>> On Mon, Sep 19, 2016 at 1:06 PM, Bjorn Helgaas  wrote:
>> > On Sat, Sep 17, 2016 at 07:24:38AM -0700, Duc Dang wrote:
>
>> This patch only adds the ability for the X-Gene PCIe controller to be
>> probed in ACPI boot mode. All the 'quirk' code added is for ECAM to
>> work with X-Gene.
>>
>> > I sort of expected this to also remove, for example, the seemingly
>> > identical xgene_pcie_config_read32() in drivers/pci/host/pci-xgene.c.
>> > Actually, a bunch of this code seems to be duplicated from there.  It
>> > doesn't seem like we should end up with all this duplicated code.
>> >
>> > I'd really like it better if all this could get folded into
>> > pci-xgene.c so we don't end up with more files.
>>
>> I am still debating whether to put this X-Gene ECAM quirk code into
>> drivers/acpi or keep it here in drivers/pci/host. But given the
>> direction of other quirk patches for ThunderX and HiSi, it seems the
>> quirk will stay in drivers/pci/host. I can definitely fold the new
>> quirk code into pci-xgene.c as you suggested and eliminate the
>> identical one.
>
> I like Tomasz's patches, where the MCFG quirk itself is in
> acpi/pci_mcfg.c, and it uses config accessors exported from
> drivers/pci/host.

Yes, I removed the new file and folded the quirk code into pci-xgene.c.

>
> I do not want to end up with duplicate accessors.  The mapping
> functions and accessors should be the same whether we're booting with
> DT or ACPI.
>
> I think a patch to add ACPI support should only contain:
>
>   - acpi/pci_mcfg.c quirks to fix incorrect ACPI MCFG resources or use
> special accessors,
>
>   - pnp/quirks.c quirks to compensate for missing ACPI _CRS for the
> ECAM regions, and
>
>   - pci-xgene.c code to derive the csr_base and cfg_base.  Today we
> get that from DT, but the _CRS producer/consumer mess means we
> don't have a good way to get it from ACPI, so you'll need some
> sort of quirk for this.

The new quirk code (v2 patch) follows this direction, but I have not
found a good way to introduce a quirk into pnp/quirks.c yet. Just to
clarify a little bit, our ACPI table provides ECAM base address from
_CBA method. The missing piece is the controller register base address
(csr_base) that I need to get from a hard-coded resource array.

I also owe you the rework for Configuration Request Retry Status
workaround, but it will need to be done in a separate patch set for
both DT and ACPI.
>
>> >> +struct xgene_pcie_acpi_root {
>> >> + void __iomem *csr_base;
>> >> + u32 version;
>> >> +};
>> >
>> > I think this should be folded into struct xgene_pcie_port so we don't
>> > have to allocate and manage it separately.
>>
>> I will need to look into this more. When booting with ACPI mode, the
>> code in pci-xgene.c is not used (except the cfg read/write functions
>> that are shared with ECAM quirk code), so putting these into
>> xgene_pcie_port will force ECAM quirk code to allocate this structure
>> as well.
>
> This information is needed whether booting with DT or ACPI, so we
> should use the existing xgene_pcie_port.csr_base and initialize it
> differently depending on which we're using.

The new ECAM quirk code also allocates struct xgene_pcie_port, so I
got rid of the xgene_pcie_acpi_root struct.

>
>> >> + default:
>> >> + return -ENODEV;
>> >> + }
>> >> +
>> >> + xgene_root->csr_base = ioremap(csr_base, XGENE_CSR_LENGTH);
>> >
>> > There should be a request_region() somewhere, too.  Ideal would be to
>> > use devm_ioremap_resource(), but I don't know where this apparent
>> > resource is coming from.
>>
>> Yes, I will use request_region/devm_ioremap_resource here.
>
> We're not *adding* any new resources that need ioremapping; all we're
> doing is changing the *source* of the resource, so we should use the
> same devm_ioremap_resource() you already have in xgene_pcie_map_reg().
> You might have to refactor that slightly so we can lookup the resource
> via either DT or ACPI (you'll probably actually use a quirk since ACPI
> doesn't have a good mechanism for this), and then use the same call to
> devm_ioremap_resource().

I changed to use devm_ioremap_resource to map the csr_base from the
fixed resource array defined for each X-Gene SoC. The output of 'cat
/proc/iomem' for the PCIe port on the Mustang board looks like the following:

1f2b-1f2b : PNP0A08:00
e04000-e07fff : PCI Bus :00
  e04000-e0401f : PCI Bus :01
e04000-e0400f : :01:00.0
  e04000-e0400f : mlx4_core
e04010-e0401f : :01:00.0
e0d000-e0dfff : PCI ECAM
f0-ff : PCI Bus :00
  f0-f001ff : PCI Bus :01
f0-f001ff : :01:00.0
  f0-f001ff : mlx4_core

Is this what you expect? Or you are looking for something else?

Regards,
Duc Dang.
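
The v2 patch repeats the same acpi_disabled test in
xgene_pcie_get_cfg_base(), xgene_pcie_set_rtdid_reg() and
xgene_pcie_config_read32() to find the port: in DT mode bus->sysdata is
the port, in ACPI mode it is a struct pci_config_window whose priv points
at the port. A single helper would keep that knowledge in one place. The
sketch below compiles in userspace against mock types; the structs are
cut-down stand-ins for the kernel ones, acpi_disabled is a mock flag, and
xgene_pcie_bus_to_port() is a hypothetical name.

```c
#include <stddef.h>

/* Mock types: the real kernel structs have many more fields. */
struct xgene_pcie_port { void *csr_base; };
struct pci_config_window { void *priv; };
struct pci_bus { void *sysdata; };

/* Mock of the kernel's acpi_disabled flag (nonzero means DT boot). */
static int acpi_disabled = 1;

/* Hypothetical helper: DT probe stores the port directly in
 * bus->sysdata; the ACPI/ECAM path stores a pci_config_window whose
 * priv points at the port. */
static struct xgene_pcie_port *xgene_pcie_bus_to_port(struct pci_bus *bus)
{
	struct pci_config_window *cfg;

	if (acpi_disabled)
		return bus->sysdata;
	cfg = bus->sysdata;
	return cfg->priv;
}
```

Each accessor could then begin with port = xgene_pcie_bus_to_port(bus);
instead of repeating the branch.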

[RFC PATCH v2 1/1] PCI/ACPI: xgene: Add ECAM quirk for X-Gene PCIe controller

2016-10-25 Thread Duc Dang

PCIe controllers in X-Gene SoCs are not ECAM compliant: software
needs to configure an additional controller register to address a
device at bus:dev:function.

This patch depends on "ECAM quirks handling for ARM64 platforms"
series (http://www.spinics.net/lists/arm-kernel/msg530692.html,
the series was also modified by Bjorn) to address the limitation
above for X-Gene PCIe controller.

The quirk will only be applied for X-Gene PCIe MCFG table with
OEM revision 1, 2, 3 or 4 (PCIe controller v1 and v2 on X-Gene SoCs).

Signed-off-by: Duc Dang 
---
v2 changes:
1. Get rid of pci-xgene-ecam.c file and fold quirk code into pci-xgene.c
2. Redefine fixup array for X-Gene
3. Use devm_ioremap_resource to map csr_base

 drivers/acpi/pci_mcfg.c  |  30 
 drivers/pci/host/pci-xgene.c | 165 ++-
 include/linux/pci-ecam.h |   5 ++
 3 files changed, 197 insertions(+), 3 deletions(-)

diff --git a/drivers/acpi/pci_mcfg.c b/drivers/acpi/pci_mcfg.c
index bb2c508..9dfc937 100644
--- a/drivers/acpi/pci_mcfg.c
+++ b/drivers/acpi/pci_mcfg.c
@@ -96,6 +96,36 @@ struct mcfg_fixup {
THUNDER_ECAM_MCFG(2, 12),
THUNDER_ECAM_MCFG(2, 13),
 #endif
+#ifdef CONFIG_PCI_XGENE
+#define XGENE_V1_ECAM_MCFG(rev, seg) \
+   {"APM   ", "XGENE   ", rev, seg, MCFG_BUS_ANY, \
+   &xgene_v1_pcie_ecam_ops }
+#define XGENE_V2_1_ECAM_MCFG(rev, seg) \
+   {"APM   ", "XGENE   ", rev, seg, MCFG_BUS_ANY, \
+   &xgene_v2_1_pcie_ecam_ops }
+#define XGENE_V2_2_ECAM_MCFG(rev, seg) \
+   {"APM   ", "XGENE   ", rev, seg, MCFG_BUS_ANY, \
+   &xgene_v2_2_pcie_ecam_ops }
+
+   /* X-Gene SoC with v1 PCIe controller */
+   XGENE_V1_ECAM_MCFG(1, 0),
+   XGENE_V1_ECAM_MCFG(1, 1),
+   XGENE_V1_ECAM_MCFG(1, 2),
+   XGENE_V1_ECAM_MCFG(1, 3),
+   XGENE_V1_ECAM_MCFG(1, 4),
+   XGENE_V1_ECAM_MCFG(2, 0),
+   XGENE_V1_ECAM_MCFG(2, 1),
+   XGENE_V1_ECAM_MCFG(2, 2),
+   XGENE_V1_ECAM_MCFG(2, 3),
+   XGENE_V1_ECAM_MCFG(2, 4),
+   /* X-Gene SoC with v2.1 PCIe controller */
+   XGENE_V2_1_ECAM_MCFG(3, 0),
+   XGENE_V2_1_ECAM_MCFG(3, 1),
+   /* X-Gene SoC with v2.2 PCIe controller */
+   XGENE_V2_2_ECAM_MCFG(4, 0),
+   XGENE_V2_2_ECAM_MCFG(4, 1),
+   XGENE_V2_2_ECAM_MCFG(4, 2),
+#endif
 };
 
 static char mcfg_oem_id[ACPI_OEM_ID_SIZE];
diff --git a/drivers/pci/host/pci-xgene.c b/drivers/pci/host/pci-xgene.c
index 1de23d7..d6aa642 100644
--- a/drivers/pci/host/pci-xgene.c
+++ b/drivers/pci/host/pci-xgene.c
@@ -27,6 +27,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 
@@ -64,6 +66,7 @@
 /* PCIe IP version */
 #define XGENE_PCIE_IP_VER_UNKN 0
 #define XGENE_PCIE_IP_VER_1 1
+#define XGENE_PCIE_IP_VER_2 2
 
 struct xgene_pcie_port {
struct device_node  *node;
@@ -97,7 +100,15 @@ static inline u32 pcie_bar_low_val(u32 addr, u32 flags)
  */
 static void __iomem *xgene_pcie_get_cfg_base(struct pci_bus *bus)
 {
-   struct xgene_pcie_port *port = bus->sysdata;
+   struct pci_config_window *cfg;
+   struct xgene_pcie_port *port;
+
+   if (acpi_disabled)
+   port = bus->sysdata;
+   else {
+   cfg = bus->sysdata;
+   port = cfg->priv;
+   }
 
if (bus->number >= (bus->primary + 1))
return port->cfg_base + AXI_EP_CFG_ACCESS;
@@ -111,10 +122,18 @@ static void __iomem *xgene_pcie_get_cfg_base(struct pci_bus *bus)
  */
 static void xgene_pcie_set_rtdid_reg(struct pci_bus *bus, uint devfn)
 {
-   struct xgene_pcie_port *port = bus->sysdata;
+   struct pci_config_window *cfg;
+   struct xgene_pcie_port *port;
unsigned int b, d, f;
u32 rtdid_val = 0;
 
+   if (acpi_disabled)
+   port = bus->sysdata;
+   else {
+   cfg = bus->sysdata;
+   port = cfg->priv;
+   }
+
b = bus->number;
d = PCI_SLOT(devfn);
f = PCI_FUNC(devfn);
@@ -158,7 +177,15 @@ static void __iomem *xgene_pcie_map_bus(struct pci_bus *bus, unsigned int devfn,
 static int xgene_pcie_config_read32(struct pci_bus *bus, unsigned int devfn,
int where, int size, u32 *val)
 {
-   struct xgene_pcie_port *port = bus->sysdata;
+   struct pci_config_window *cfg;
+   struct xgene_pcie_port *port;
+
+   if (acpi_disabled)
+   port = bus->sysdata;
+   else {
+   cfg = bus->sysdata;
+   port = cfg->priv;
+   }
 
if (pci_generic_config_read32(bus, devfn, where & ~0x3, 4, val) !=
PCIBIOS_SUCCESSFUL)
@@ -189,6 +216,138 @@ static int xgene_pcie_config_read32(struct pci_bus *bus, unsigned int devfn,
.write = pci_generic_config_write32,
 };
 
+#ifdef CONFIG_ACPI
+static struct resource xgene_v1_csr_res[] = {
+   [0] = DEFINE_RES_MEM(0x1f2bUL, SZ_64K),
+   [1] = DEFINE

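The fixup entries added to pci_mcfg.c are keyed on the MCFG OEM ID, OEM
table ID, OEM revision and PCI segment. The userspace sketch below mirrors
the X-Gene entries from the patch to show which (revision, segment) pairs
would select which accessors; struct toy_fixup and toy_mcfg_match() are
hypothetical simplifications (the bus range and the real pci_ecam_ops
pointers are replaced by a label string), and the 6- and 8-byte
comparisons assume the ACPI OEM ID and OEM table ID field widths.

```c
#include <stddef.h>
#include <string.h>

#define TOY_OEM_ID_LEN       6  /* width of the MCFG OEM ID field */
#define TOY_OEM_TABLE_ID_LEN 8  /* width of the MCFG OEM table ID field */

/* Hypothetical, cut-down fixup entry: ops is just a label here. */
struct toy_fixup {
	char oem_id[TOY_OEM_ID_LEN + 1];
	char oem_table_id[TOY_OEM_TABLE_ID_LEN + 1];
	int oem_revision;
	int segment;
	const char *ops;
};

#define TOY_XGENE(rev, seg, label) { "APM   ", "XGENE   ", rev, seg, label }

static const struct toy_fixup toy_fixups[] = {
	/* v1 controller: OEM revisions 1 and 2, segments 0-4 */
	TOY_XGENE(1, 0, "v1"), TOY_XGENE(1, 1, "v1"), TOY_XGENE(1, 2, "v1"),
	TOY_XGENE(1, 3, "v1"), TOY_XGENE(1, 4, "v1"),
	TOY_XGENE(2, 0, "v1"), TOY_XGENE(2, 1, "v1"), TOY_XGENE(2, 2, "v1"),
	TOY_XGENE(2, 3, "v1"), TOY_XGENE(2, 4, "v1"),
	/* v2.1 controller: OEM revision 3, segments 0-1 */
	TOY_XGENE(3, 0, "v2.1"), TOY_XGENE(3, 1, "v2.1"),
	/* v2.2 controller: OEM revision 4, segments 0-2 */
	TOY_XGENE(4, 0, "v2.2"), TOY_XGENE(4, 1, "v2.2"), TOY_XGENE(4, 2, "v2.2"),
};

/* Return the accessor label for an MCFG header, or NULL if no fixup
 * applies. oem_id and table_id must point at full-width fields. */
static const char *toy_mcfg_match(const char *oem_id, const char *table_id,
				  int rev, int seg)
{
	for (size_t i = 0; i < sizeof(toy_fixups) / sizeof(toy_fixups[0]); i++) {
		const struct toy_fixup *f = &toy_fixups[i];

		if (!memcmp(f->oem_id, oem_id, TOY_OEM_ID_LEN) &&
		    !memcmp(f->oem_table_id, table_id, TOY_OEM_TABLE_ID_LEN) &&
		    f->oem_revision == rev && f->segment == seg)
			return f->ops;
	}
	return NULL;
}
```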