date:20171109

Re: [PATCHv3 1/1] locking/qspinlock/x86: Avoid test-and-set when PV_DEDICATED is set

2017-11-09 Thread Peter Zijlstra

On Fri, Nov 10, 2017 at 10:07:56AM +0800, Wanpeng Li wrote:

> >> Also, you should not put cpumask_t on stack, that's 'broken'.
> 
> Thanks for pointing this out. I found a useful comment in arch/x86/kernel/irq.c:
> 
> /* These two declarations are only used in check_irq_vectors_for_cpu_disable()
>  * below, which is protected by stop_machine().  Putting them on the stack
>  * results in a stack frame overflow.  Dynamically allocating could result in a
>  * failure so declare these two cpumasks as global.
>  */
> static struct cpumask affinity_new, online_new;

That code no longer exists.. Also not entirely sure how it would be
helpful.

What you probably want to do is have a per-cpu cpumask, since
flush_tlb_others() is called with preemption disabled. But you probably
don't want an unconditionally allocated one, since most kernels will not
in fact be PV.

So you'll want something like:

static DEFINE_PER_CPU(cpumask_var_t, __pv_tlb_mask);

And then you need something like:

for_each_possible_cpu(cpu) {
	zalloc_cpumask_var_node(per_cpu_ptr(&__pv_tlb_mask, cpu),
				GFP_KERNEL, cpu_to_node(cpu));
}

before you set the pv-op or so.

[PATCH 0/4] i2c: mpc: Clean up clock selection

2017-11-09 Thread Arseny Solokha

This series cleans up I2C clock selection for Freescale/NXP MPC SoCs during
the controller initialization for cases when clock settings are not to be
preserved from the bootloader.

Patch 1/4 fixes division by zero which happens during controller
initialization when (1) clock frequency is not specified in the Device
Tree, (2) preservation of clock settings from the bootloader is not
requested, and (3) the clock prescaler (which may actually depend
on the POR configuration) is not explicitly specified. It simply moves
obtaining the prescaler value before the clock computation.

Patch 2/4 unifies obtaining the prescaler value for MPC8544 with other
SoCs. It moves the relevant code to the helper function introduced
in commit 8ce795cb0c6b ("i2c: mpc: assign the correct prescaler from SVR")
and also adds handling of MPC8533, which is similar to MPC8544 in this regard.

Patch 3/4 fixes checking the relevant bit in a controller's register used
for selecting the prescaler value for MPC8533 and MPC8544.

Patch 4/4 removes the facility for setting the clock prescaler value
at compile time. This facility is not used in the majority of cases. Getting
the prescaler value at run time currently covers more SoCs. Hardcoding it
is also wrong for some SoCs as it can be configured on board during POR.

Arseny Solokha (4):
  i2c: mpc: get MPC8xxx I2C clock prescaler before using it in
calculations
  i2c: mpc: unify obtaining the MPC8533/44 I2C clock prescaler w/
MPC8xxx
  i2c: mpc: fix PORDEVSR2 mask for MPC8533/44
  i2c: mpc: always determine I2C clock prescaler at runtime

 drivers/i2c/busses/i2c-mpc.c | 72 ++--
 1 file changed, 30 insertions(+), 42 deletions(-)

-- 
2.15.0

[PATCH 3/4] i2c: mpc: fix PORDEVSR2 mask for MPC8533/44

2017-11-09 Thread Arseny Solokha

According to the reference manuals for the corresponding SoCs, SEC
frequency ratio configuration is indicated by bit 26 of the POR Device
Status Register 2. Consequently, SEC_CFG bit should be tested by mask 0x20,
not 0x80. Testing the wrong bit leads to selection of wrong I2C clock
prescaler on those SoCs.
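
The mask follows from numbering register bits from the most significant end
(bit 0 is the MSB), which is the convention the reference manuals are assumed
to use here. A minimal sketch of the conversion; ppc_bit_mask32() is a
hypothetical helper and not part of the driver:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of i2c-mpc.c): convert a bit index in
 * MSB-0 numbering (bit 0 is the most significant bit of a 32-bit
 * register) into the corresponding mask.
 */
static uint32_t ppc_bit_mask32(unsigned int bit)
{
	assert(bit < 32);
	return UINT32_C(1) << (31 - bit);
}
```

With this numbering, bit 26 maps to 0x20, while the old mask 0x80 corresponds
to bit 24.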

Signed-off-by: Arseny Solokha 
---
 drivers/i2c/busses/i2c-mpc.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/i2c/busses/i2c-mpc.c b/drivers/i2c/busses/i2c-mpc.c
index f47916466b82..8d60db0080f6 100644
--- a/drivers/i2c/busses/i2c-mpc.c
+++ b/drivers/i2c/busses/i2c-mpc.c
@@ -332,14 +332,18 @@ static u32 mpc_i2c_get_sec_cfg_8xxx(void)
if (prop) {
/*
 * Map and check POR Device Status Register 2
-* (PORDEVSR2) at 0xE0014
+* (PORDEVSR2) at 0xE0014. Note that while MPC8533
+* and MPC8544 indicate SEC frequency ratio
+* configuration as bit 26 in PORDEVSR2, other MPC8xxx
+* parts may store it differently or may not have it
+* at all.
 */
reg = ioremap(get_immrbase() + *prop + 0x14, 0x4);
if (!reg)
printk(KERN_ERR
   "Error: couldn't map PORDEVSR2\n");
else
-   val = in_be32(reg) & 0x0080; /* sec-cfg */
+   val = in_be32(reg) & 0x0020; /* sec-cfg */
iounmap(reg);
}
}
-- 
2.15.0

[PATCH 2/4] i2c: mpc: unify obtaining the MPC8533/44 I2C clock prescaler w/ MPC8xxx

2017-11-09 Thread Arseny Solokha

Commit 8ce795cb0c6b ("i2c: mpc: assign the correct prescaler from SVR")
introduced the common helper function for obtaining the actual clock
prescaler value for MPC85xx. However, getting the prescaler for MPC8544,
which depends on the SEC frequency ratio on this platform, has always been
performed separately based on the corresponding Device Tree configuration.

Move special handling of MPC8544 into that common helper. Make it dependent
on the SoC version and not on Device Tree compatible node, as is the case
with all other SoCs. Handle MPC8533 the same way, since it is similar
to MPC8544 in this regard according to AN2919, "Determining the I2C
Frequency Divider Ratio for SCL".

Signed-off-by: Arseny Solokha 
---
 drivers/i2c/busses/i2c-mpc.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/i2c/busses/i2c-mpc.c b/drivers/i2c/busses/i2c-mpc.c
index bf0c86d41f1a..f47916466b82 100644
--- a/drivers/i2c/busses/i2c-mpc.c
+++ b/drivers/i2c/busses/i2c-mpc.c
@@ -350,7 +350,11 @@ static u32 mpc_i2c_get_sec_cfg_8xxx(void)
 
 static u32 mpc_i2c_get_prescaler_8xxx(void)
 {
-   /* mpc83xx and mpc82xx all have prescaler 1 */
+   /*
+* According to the AN2919 all MPC824x have prescaler 1, while MPC83xx
+* may have prescaler 1, 2, or 3, depending on the power-on
+* configuration.
+*/
u32 prescaler = 1;
 
/* mpc85xx */
@@ -367,6 +371,10 @@ static u32 mpc_i2c_get_prescaler_8xxx(void)
|| (SVR_SOC_VER(svr) == SVR_8610))
/* the above 85xx SoCs have prescaler 1 */
prescaler = 1;
+   else if ((SVR_SOC_VER(svr) == SVR_8533)
+   || (SVR_SOC_VER(svr) == SVR_8544))
+   /* the above 85xx SoCs have prescaler 3 or 2 */
+   prescaler = mpc_i2c_get_sec_cfg_8xxx() ? 3 : 2;
else
/* all the other 85xx have prescaler 2 */
prescaler = 2;
@@ -383,8 +391,6 @@ static int mpc_i2c_get_fdr_8xxx(struct device_node *node, u32 clock,
int i;
 
/* Determine proper divider value */
-   if (of_device_is_compatible(node, "fsl,mpc8544-i2c"))
-   prescaler = mpc_i2c_get_sec_cfg_8xxx() ? 3 : 2;
if (!prescaler)
prescaler = mpc_i2c_get_prescaler_8xxx();
 
-- 
2.15.0

[PATCH 4/4] i2c: mpc: always determine I2C clock prescaler at runtime

2017-11-09 Thread Arseny Solokha

Remove the facility for setting the prescaler value at compile time
entirely. It was only used for two SoCs, duplicating the actual value
for one of them and setting sometimes bogus value for another. Make all
MPC8xxx SoCs obtain their actual I2C clock prescaler from a single place
in the code.

Signed-off-by: Arseny Solokha 
---
 drivers/i2c/busses/i2c-mpc.c | 52 +---
 1 file changed, 15 insertions(+), 37 deletions(-)

diff --git a/drivers/i2c/busses/i2c-mpc.c b/drivers/i2c/busses/i2c-mpc.c
index 8d60db0080f6..e0f059687c2d 100644
--- a/drivers/i2c/busses/i2c-mpc.c
+++ b/drivers/i2c/busses/i2c-mpc.c
@@ -78,9 +78,7 @@ struct mpc_i2c_divider {
 };
 
 struct mpc_i2c_data {
-   void (*setup)(struct device_node *node, struct mpc_i2c *i2c,
- u32 clock, u32 prescaler);
-   u32 prescaler;
+   void (*setup)(struct device_node *node, struct mpc_i2c *i2c, u32 clock);
 };
 
 static inline void writeccr(struct mpc_i2c *i2c, u32 x)
@@ -201,7 +199,7 @@ static const struct mpc_i2c_divider mpc_i2c_dividers_52xx[] = {
 };
 
 static int mpc_i2c_get_fdr_52xx(struct device_node *node, u32 clock,
- int prescaler, u32 *real_clk)
+ u32 *real_clk)
 {
const struct mpc_i2c_divider *div = NULL;
unsigned int pvr = mfspr(SPRN_PVR);
@@ -236,7 +234,7 @@ static int mpc_i2c_get_fdr_52xx(struct device_node *node, u32 clock,
 
 static void mpc_i2c_setup_52xx(struct device_node *node,
 struct mpc_i2c *i2c,
-u32 clock, u32 prescaler)
+u32 clock)
 {
int ret, fdr;
 
@@ -246,7 +244,7 @@ static void mpc_i2c_setup_52xx(struct device_node *node,
return;
}
 
-   ret = mpc_i2c_get_fdr_52xx(node, clock, prescaler, &i2c->real_clk);
+   ret = mpc_i2c_get_fdr_52xx(node, clock, &i2c->real_clk);
fdr = (ret >= 0) ? ret : 0x3f; /* backward compatibility */
 
writeb(fdr & 0xff, i2c->base + MPC_I2C_FDR);
@@ -258,7 +256,7 @@ static void mpc_i2c_setup_52xx(struct device_node *node,
 #else /* !(CONFIG_PPC_MPC52xx || CONFIG_PPC_MPC512x) */
 static void mpc_i2c_setup_52xx(struct device_node *node,
 struct mpc_i2c *i2c,
-u32 clock, u32 prescaler)
+u32 clock)
 {
 }
 #endif /* CONFIG_PPC_MPC52xx || CONFIG_PPC_MPC512x */
@@ -266,7 +264,7 @@ static void mpc_i2c_setup_52xx(struct device_node *node,
 #ifdef CONFIG_PPC_MPC512x
 static void mpc_i2c_setup_512x(struct device_node *node,
 struct mpc_i2c *i2c,
-u32 clock, u32 prescaler)
+u32 clock)
 {
struct device_node *node_ctrl;
void __iomem *ctrl;
@@ -289,12 +287,12 @@ static void mpc_i2c_setup_512x(struct device_node *node,
}
 
/* The clock setup for the 52xx works also fine for the 512x */
-   mpc_i2c_setup_52xx(node, i2c, clock, prescaler);
+   mpc_i2c_setup_52xx(node, i2c, clock);
 }
 #else /* CONFIG_PPC_MPC512x */
 static void mpc_i2c_setup_512x(struct device_node *node,
 struct mpc_i2c *i2c,
-u32 clock, u32 prescaler)
+u32 clock)
 {
 }
 #endif /* CONFIG_PPC_MPC512x */
@@ -388,16 +386,13 @@ static u32 mpc_i2c_get_prescaler_8xxx(void)
 }
 
 static int mpc_i2c_get_fdr_8xxx(struct device_node *node, u32 clock,
- u32 prescaler, u32 *real_clk)
+ u32 *real_clk)
 {
const struct mpc_i2c_divider *div = NULL;
+   u32 prescaler = mpc_i2c_get_prescaler_8xxx();
u32 divider;
int i;
 
-   /* Determine proper divider value */
-   if (!prescaler)
-   prescaler = mpc_i2c_get_prescaler_8xxx();
-
if (clock == MPC_I2C_CLOCK_LEGACY) {
/* see below - default fdr = 0x1031 -> div = 16 * 3072 */
*real_clk = fsl_get_sys_freq() / prescaler / (16 * 3072);
@@ -425,7 +420,7 @@ static int mpc_i2c_get_fdr_8xxx(struct device_node *node, u32 clock,
 
 static void mpc_i2c_setup_8xxx(struct device_node *node,
 struct mpc_i2c *i2c,
-u32 clock, u32 prescaler)
+u32 clock)
 {
int ret, fdr;
 
@@ -436,7 +431,7 @@ static void mpc_i2c_setup_8xxx(struct device_node *node,
return;
}
 
-   ret = mpc_i2c_get_fdr_8xxx(node, clock, prescaler, &i2c->real_clk);
+   ret = mpc_i2c_get_fdr_8xxx(node, clock, &i2c->real_clk);
fdr = (ret >= 0) ? ret : 0x1031; /* backward compatibility */
 
writeb(fdr & 0xff,

Re: [PATCH] perf evsel: Fix incorrect precise_ip in default event name

2017-11-09 Thread zhangmengting


Hi Namhyung,

Yeah, you are right. I'll send a new patch later.

Thanks,

Mengting Zhang


On 2017/11/10 14:30, Namhyung Kim wrote:

Hello,

On Fri, Nov 10, 2017 at 01:49:06PM +0800, Mengting Zhang wrote:

When no event is specified with -e option, perf will specify a
"cycles" event with the highest level of precision available in
perf_event_attr.precise_ip as the default event. But the evsel name
shows an incorrect precise ip, fix it.

For example, with a highest precision perf_event_attr.precise_ip = 2,
the evsel name "cycles:ppp" shows a wrong precision available.

Before:
$./perf record sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.014 MB perf.data (21 samples) ]
$./perf evlist -v
cycles:ppp: size: 112, { sample_period, sample_freq }: 4000,
sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1,
comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 2,
sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1

After:
$./perf record sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.014 MB perf.data (16 samples) ]
$./perf evlist -v
cycles:pp: size: 112, { sample_period, sample_freq }: 4000,
sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1,
comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 2,
sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1

Signed-off-by: Mengting Zhang 
---
  tools/perf/util/evsel.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 0dccdb8..94cf11d 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -312,7 +312,7 @@ struct perf_evsel *perf_evsel__new_cycles(bool precise)
if (asprintf(&evsel->name, "cycles%s%s%.*s",
 (attr.precise_ip || attr.exclude_kernel) ? ":" : "",
 attr.exclude_kernel ? "u" : "",
-attr.precise_ip ? attr.precise_ip + 1 : 0, "ppp") < 0)
+attr.precise_ip ? attr.precise_ip : 0, "ppp") < 0)

I think you don't need to check value of the precise_ip anymore.
The following should be ok:

 attr.precise_ip, "ppp") < 0)
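
The suggestion relies on the precision argument of "%.*s", which limits how
many characters of "ppp" are printed, so a precise_ip of 0 yields an empty
suffix without a separate check. A minimal userspace sketch;
format_cycles_name() is a hypothetical stand-in for the asprintf() call
above, not perf code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for the name construction in
 * perf_evsel__new_cycles(). precise_ip is assumed to be in 0..3, as in
 * perf_event_attr; a negative precision would be treated by printf as
 * if no precision had been given.
 */
static int format_cycles_name(char *buf, size_t len,
			      unsigned int precise_ip, bool exclude_kernel)
{
	assert(precise_ip <= 3);
	return snprintf(buf, len, "cycles%s%s%.*s",
			(precise_ip || exclude_kernel) ? ":" : "",
			exclude_kernel ? "u" : "",
			(int)precise_ip, "ppp");
}
```

Under these assumptions, precise_ip = 2 produces "cycles:pp" and
precise_ip = 0 without exclude_kernel produces plain "cycles".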

Thanks,
Namhyung



[PATCH 1/4] i2c: mpc: get MPC8xxx I2C clock prescaler before using it in calculations

2017-11-09 Thread Arseny Solokha

Obtaining the actual I2C clock prescaler value in mpc_i2c_setup_8xxx() only
happens when the clock parameter is set to something other than
MPC_I2C_CLOCK_LEGACY. When the clock parameter is exactly
MPC_I2C_CLOCK_LEGACY, the prescaler parameter is used in arithmetic
division as provided by the caller, resulting in a division by zero
for the majority of processors supported by the module.

Avoid division by zero by obtaining the actual I2C clock prescaler
in mpc_i2c_setup_8xxx() unconditionally regardless of the passed clock
value.
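
The ordering issue can be illustrated in isolation. In the sketch below,
legacy_real_clk() and lookup_prescaler() are hypothetical userspace stand-ins
for the legacy branch of mpc_i2c_get_fdr_8xxx() and for
mpc_i2c_get_prescaler_8xxx(); the fixed return value of 2 is an assumption
for illustration only:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for mpc_i2c_get_prescaler_8xxx(). */
static uint32_t lookup_prescaler(void)
{
	return 2; /* assumed value, for illustration only */
}

/*
 * Hypothetical model of the legacy branch: resolve a zero prescaler
 * before it is used as a divisor.
 */
static uint32_t legacy_real_clk(uint32_t sys_freq, uint32_t prescaler)
{
	if (!prescaler)
		prescaler = lookup_prescaler();

	assert(prescaler != 0);
	/* default fdr = 0x1031 -> div = 16 * 3072 */
	return sys_freq / prescaler / (16 * 3072);
}
```

Calling legacy_real_clk() with a prescaler of 0 no longer divides by zero,
because the lookup happens before the division.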

Signed-off-by: Arseny Solokha 
---
 drivers/i2c/busses/i2c-mpc.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/i2c/busses/i2c-mpc.c b/drivers/i2c/busses/i2c-mpc.c
index 96caf378b1dc..bf0c86d41f1a 100644
--- a/drivers/i2c/busses/i2c-mpc.c
+++ b/drivers/i2c/busses/i2c-mpc.c
@@ -382,18 +382,18 @@ static int mpc_i2c_get_fdr_8xxx(struct device_node *node, u32 clock,
u32 divider;
int i;
 
-   if (clock == MPC_I2C_CLOCK_LEGACY) {
-   /* see below - default fdr = 0x1031 -> div = 16 * 3072 */
-   *real_clk = fsl_get_sys_freq() / prescaler / (16 * 3072);
-   return -EINVAL;
-   }
-
/* Determine proper divider value */
if (of_device_is_compatible(node, "fsl,mpc8544-i2c"))
prescaler = mpc_i2c_get_sec_cfg_8xxx() ? 3 : 2;
if (!prescaler)
prescaler = mpc_i2c_get_prescaler_8xxx();
 
+   if (clock == MPC_I2C_CLOCK_LEGACY) {
+   /* see below - default fdr = 0x1031 -> div = 16 * 3072 */
+   *real_clk = fsl_get_sys_freq() / prescaler / (16 * 3072);
+   return -EINVAL;
+   }
+
divider = fsl_get_sys_freq() / clock / prescaler;
 
pr_debug("I2C: src_clock=%d clock=%d divider=%d\n",
-- 
2.15.0

Re: [PATCH v3 1/2] PM / domains: Rework governor code to be more consistent

2017-11-09 Thread Ulf Hansson

On 7 November 2017 at 02:23, Rafael J. Wysocki  wrote:
> From: Rafael J. Wysocki 
>
> The genpd governor currently uses negative PM QoS values to indicate
> the "no suspend" condition and 0 as "no restriction", but it doesn't
> use them consistently.  Moreover, it tries to refresh QoS values for
> already suspended devices in a quite questionable way.
>
> For the above reasons, rework it to be a bit more consistent.
>
> First off, note that dev_pm_qos_read_value() in
> dev_update_qos_constraint() and __default_power_down_ok() is
> evaluated for devices in suspend.  Moreover, that only happens if the
> effective_constraint_ns value for them is negative (meaning "no
> suspend").  It is not evaluated in any other cases, so effectively
> the QoS values are only updated for devices in suspend that should
> not have been suspended in the first place.  In all of the other
> cases, the QoS values taken into account are the effective ones from
> the time before the device has been suspended, so generally devices
> need to be resumed and suspended again for new QoS values to take
> effect anyway.  Thus evaluating dev_update_qos_constraint() in
> those two places doesn't make sense at all, so drop it.
>
> Second, initialize effective_constraint_ns to 0 ("no constraint")
> rather than to (-1) ("no suspend"), which makes more sense in
> general and in case effective_constraint_ns is never updated
> (the device is in suspend all the time or it is never suspended)
> it doesn't affect the device's parent and so on.
>
> Finally, rework default_suspend_ok() to explicitly handle the
> "no restriction" and "no suspend" special cases.
>
> Also add WARN_ON() around checks that should never trigger.
>
> Signed-off-by: Rafael J. Wysocki 
> Tested-by: Geert Uytterhoeven 

Acked-by: Ulf Hansson 

Kind regards
Uffe

> ---
>
> v2 -> v3: Take children that don't belong to genpd power domains into
>   account in dev_update_qos_constraint().
>
> ---
>  drivers/base/power/domain.c  |2
>  drivers/base/power/domain_governor.c |   71 ---
>  2 files changed, 50 insertions(+), 23 deletions(-)
>
> Index: linux-pm/drivers/base/power/domain.c
> ===
> --- linux-pm.orig/drivers/base/power/domain.c
> +++ linux-pm/drivers/base/power/domain.c
> @@ -1331,7 +1331,7 @@ static struct generic_pm_domain_data *ge
>
> gpd_data->base.dev = dev;
> gpd_data->td.constraint_changed = true;
> -   gpd_data->td.effective_constraint_ns = -1;
> +   gpd_data->td.effective_constraint_ns = 0;
> gpd_data->nb.notifier_call = genpd_dev_pm_qos_notifier;
>
> spin_lock_irq(&dev->power.lock);
> Index: linux-pm/drivers/base/power/domain_governor.c
> ===
> --- linux-pm.orig/drivers/base/power/domain_governor.c
> +++ linux-pm/drivers/base/power/domain_governor.c
> @@ -14,22 +14,33 @@
>  static int dev_update_qos_constraint(struct device *dev, void *data)
>  {
> s64 *constraint_ns_p = data;
> -   s32 constraint_ns = -1;
> +   s64 constraint_ns;
>
> -   if (dev->power.subsys_data && dev->power.subsys_data->domain_data)
> +   if (dev->power.subsys_data && dev->power.subsys_data->domain_data) {
> +   /*
> +* Only take suspend-time QoS constraints of devices into
> +* account, because constraints updated after the device has
> +* been suspended are not guaranteed to be taken into account
> +* anyway.  In order for them to take effect, the device has to
> +* be resumed and suspended again.
> +*/
> constraint_ns = dev_gpd_data(dev)->td.effective_constraint_ns;
> -
> -   if (constraint_ns < 0) {
> +   } else {
> +   /*
> +* The child is not in a domain and there's no info on its
> +* suspend/resume latencies, so assume them to be negligible and
> +* take its current PM QoS constraint (that's the only thing
> +* known at this point anyway).
> +*/
> constraint_ns = dev_pm_qos_read_value(dev);
> -   constraint_ns *= NSEC_PER_USEC;
> +   if (constraint_ns > 0)
> +   constraint_ns *= NSEC_PER_USEC;
> }
> +
> +   /* 0 means "no constraint" */
> if (constraint_ns == 0)
> return 0;
>
> -   /*
> -* constraint_ns cannot be negative here, because the device has been
> -* suspended.
> -*/
> if (constraint_ns < *constraint_ns_p || *constraint_ns_p == 0)
> *constraint_ns_p = constraint_ns;
>
> @@ -76,14 +87,32 @@ static bool default_suspend_ok(struct de
> device_for_each_child(dev, &constraint_ns,
>   dev_update_qos_constraint);
>
> -

Re: [PATCH v4 2/2] PM / QoS: Fix device resume latency framework

2017-11-09 Thread Ulf Hansson

On 7 November 2017 at 11:33, Rafael J. Wysocki  wrote:
> From: Rafael J. Wysocki 
>
> The special value of 0 for device resume latency PM QoS means
> "no restriction", but there are two problems with that.
>
> First, device resume latency PM QoS requests with 0 as the
> value are always put in front of requests with positive
> values in the priority lists used internally by the PM QoS
> framework, causing 0 to be chosen as an effective constraint
> value.  However, that 0 is then interpreted as "no restriction"
> effectively overriding the other requests with specific
> restrictions which is incorrect.
>
> Second, the users of device resume latency PM QoS have no
> way to specify that *any* resume latency at all should be
> avoided, which is an artificial limitation in general.
>
> To address these issues, modify device resume latency PM QoS to
> use S32_MAX as the "no constraint" value and 0 as the "no
> latency at all" one and rework its users (the cpuidle menu
> governor, the genpd QoS governor and the runtime PM framework)
> to follow these changes.
>
> Also add a special "n/a" value to the corresponding user space I/F
> to allow user space to indicate that it cannot accept any resume
> latencies at all for the given device.
>
> Fixes: 85dc0b8a4019 (PM / QoS: Make it possible to expose PM QoS latency constraints)
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=197323
> Reported-by: Reinette Chatre 
> Signed-off-by: Rafael J. Wysocki 
> Tested-by: Reinette Chatre 
> Tested-by: Geert Uytterhoeven 

Acked-by: Ulf Hansson 

Kind regards
Uffe
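
The first problem and its fix can be sketched in userspace. The effective
constraint is modeled here as the minimum over all requests, which is an
assumption about how the priority list behaves for this QoS class;
effective_resume_latency() is a hypothetical helper, and the constant mirrors
the S32_MAX "no constraint" value from the description:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* "No constraint" encoded as S32_MAX, as in the patch description. */
#define PM_QOS_RESUME_LATENCY_NO_CONSTRAINT INT32_MAX

/*
 * Hypothetical model: the effective constraint is the smallest
 * requested value. With "no constraint" at the top of the range it can
 * never override a real limit, and 0 means "no latency at all".
 */
static int32_t effective_resume_latency(const int32_t *req, size_t n)
{
	int32_t eff = PM_QOS_RESUME_LATENCY_NO_CONSTRAINT;
	size_t i;

	for (i = 0; i < n; i++) {
		assert(req[i] >= 0);
		if (req[i] < eff)
			eff = req[i];
	}
	return eff;
}
```

In this model a 100 us request combined with a "no constraint" request yields
100 us, whereas with 0 as "no restriction" the minimum would have been 0 and
the real limit would have been lost.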

> ---
>
> As noticed by Ramesh, the v3 had issues with an overlooked value
> conversion and a stale comment, so here goes a v4.
>
> ---
>  Documentation/ABI/testing/sysfs-devices-power |4 +-
>  drivers/base/cpu.c|3 +
>  drivers/base/power/domain.c   |2 -
>  drivers/base/power/domain_governor.c  |   40 ++
>  drivers/base/power/qos.c  |5 ++-
>  drivers/base/power/runtime.c  |2 -
>  drivers/base/power/sysfs.c|   25 +---
>  drivers/cpuidle/governors/menu.c  |4 +-
>  include/linux/pm_qos.h|   26 +++-
>  9 files changed, 68 insertions(+), 43 deletions(-)
>
> Index: linux-pm/drivers/base/power/sysfs.c
> ===
> --- linux-pm.orig/drivers/base/power/sysfs.c
> +++ linux-pm/drivers/base/power/sysfs.c
> @@ -218,7 +218,14 @@ static ssize_t pm_qos_resume_latency_sho
>   struct device_attribute *attr,
>   char *buf)
>  {
> -   return sprintf(buf, "%d\n", dev_pm_qos_requested_resume_latency(dev));
> +   s32 value = dev_pm_qos_requested_resume_latency(dev);
> +
> +   if (value == 0)
> +   return sprintf(buf, "n/a\n");
> +   else if (value == PM_QOS_RESUME_LATENCY_NO_CONSTRAINT)
> +   value = 0;
> +
> +   return sprintf(buf, "%d\n", value);
>  }
>
>  static ssize_t pm_qos_resume_latency_store(struct device *dev,
> @@ -228,11 +235,21 @@ static ssize_t pm_qos_resume_latency_sto
> s32 value;
> int ret;
>
> -   if (kstrtos32(buf, 0, &value))
> -   return -EINVAL;
> +   if (!kstrtos32(buf, 0, &value)) {
> +   /*
> +* Prevent users from writing negative or "no constraint" values
> +* directly.
> +*/
> +   if (value < 0 || value == PM_QOS_RESUME_LATENCY_NO_CONSTRAINT)
> +   return -EINVAL;
>
> -   if (value < 0)
> +   if (value == 0)
> +   value = PM_QOS_RESUME_LATENCY_NO_CONSTRAINT;
> +   } else if (!strcmp(buf, "n/a") || !strcmp(buf, "n/a\n")) {
> +   value = 0;
> +   } else {
> return -EINVAL;
> +   }
>
> ret = dev_pm_qos_update_request(dev->power.qos->resume_latency_req,
> value);
> Index: linux-pm/include/linux/pm_qos.h
> ===
> --- linux-pm.orig/include/linux/pm_qos.h
> +++ linux-pm/include/linux/pm_qos.h
> @@ -28,16 +28,19 @@ enum pm_qos_flags_status {
> PM_QOS_FLAGS_ALL,
>  };
>
> -#define PM_QOS_DEFAULT_VALUE -1
> +#define PM_QOS_DEFAULT_VALUE   (-1)
> +#define PM_QOS_LATENCY_ANY S32_MAX
> +#define PM_QOS_LATENCY_ANY_NS  ((s64)PM_QOS_LATENCY_ANY * NSEC_PER_USEC)
>
>  #define PM_QOS_CPU_DMA_LAT_DEFAULT_VALUE   (2000 * USEC_PER_SEC)
>  #define PM_QOS_NETWORK_LAT_DEFAULT_VALUE   (2000 * USEC_PER_SEC)
>  #define PM_QOS_NETWORK_THROUGHPUT_DEFAULT_VALUE0
>  #define PM_QOS_MEMORY_BANDWIDTH_DEFAULT_VALUE  0
> -#define PM_QOS_RESUME_LATENCY_DEFAULT_VALUE0
> +#define PM_QOS_RESUME_LATENCY_DEFAULT_VALUEPM_QOS_LATENCY_ANY
> +#define PM_QOS_RESU

[PATCH] fixup! kvm: arm debug: introduce helper for single-step

2017-11-09 Thread Alex Bennée

---
 arch/arm/include/asm/kvm_host.h | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index a2e881d6108e..26a1ea6c6542 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -286,7 +286,10 @@ static inline void kvm_arm_setup_debug(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arm_clear_debug(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu) {}
 static inline bool kvm_arm_handle_step_debug(struct kvm_vcpu *vcpu,
-struct kvm_run *run) {}
+struct kvm_run *run)
+{
+   return false;
+}
 
 int kvm_arm_vcpu_arch_set_attr(struct kvm_vcpu *vcpu,
   struct kvm_device_attr *attr);
-- 
2.14.2

[git pull] Input updates for v4.14-rc8

2017-11-09 Thread Dmitry Torokhov

Hi Linus,

Please pull from:

git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input.git for-linus

to receive updates for the input subsystem. You will get:

- a new ACPI ID for Elan touchpad found in yet another Ideapad model;

- Synaptics RMI4 will allow binding to controllers reporting SMB version
  3 (note that we are not adding any new ACPI IDs to the Synaptics PS/2
  driver so unless user explicitly enables intertouch support there is
  no user-visible change);

- a fixup to TSC 2004/5 touchscreen driver to mark input devices as
  "direct" to help userspace identify the type of device they are
  dealing with.

Changelog:
-

Kai-Heng Feng (1):
  Input: elan_i2c - add ELAN060C to the ACPI table

Martin Kepplinger (1):
  Input: tsc200x-core - set INPUT_PROP_DIRECT

Yiannis Marangos (1):
  Input: synaptics-rmi4 - RMI4 can also use SMBUS version 3

Diffstat:


 drivers/input/mouse/elan_i2c_core.c  | 1 +
 drivers/input/rmi4/rmi_smbus.c   | 4 ++--
 drivers/input/touchscreen/tsc200x-core.c | 1 +
 3 files changed, 4 insertions(+), 2 deletions(-)

Thanks.


-- 
Dmitry

[f2fs-dev] [PATCH] f2fs: validate before set/clear free nat bitmap

2017-11-09 Thread LiFan

In flush_nat_entries, all dirty nats will be flushed and if
their new address isn't NULL_ADDR, their bitmaps will be updated.
The free_nid_count of the bitmaps will be increased regardless
of whether the nats have already been occupied before.
This could lead to a wrong free_nid_count.
So this patch checks the status of the bits before actually
setting/clearing them.
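
The idea can be sketched in userspace: test the bit first and adjust the
counter only on a real transition. The struct and helper names below are
hypothetical and model a single 64-entry bitmap rather than the f2fs data
structures; the "build" special case from the patch is omitted:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one NAT block's free-nid bitmap and counter. */
struct free_nid_map {
	uint64_t bits;
	unsigned int count;
};

/*
 * Set or clear one bit, touching the counter only when the bit
 * actually changes state.
 */
static void update_free_nid(struct free_nid_map *m, unsigned int ofs, bool set)
{
	uint64_t mask;

	assert(ofs < 64);
	mask = UINT64_C(1) << ofs;

	if (set) {
		if (m->bits & mask)
			return; /* already free, do not count twice */
		m->bits |= mask;
		m->count++;
	} else {
		if (!(m->bits & mask))
			return; /* already in use, nothing to undo */
		m->bits &= ~mask;
		m->count--;
	}
}
```

Repeating the same set or clear operation leaves the counter unchanged, which
is the property the patch is after.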

Signed-off-by: Fan li 
---
 fs/f2fs/node.c | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index d234c6e..b965a53 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -1906,15 +1906,18 @@ static void update_free_nid_bitmap(struct f2fs_sb_info *sbi, nid_t nid,
if (!test_bit_le(nat_ofs, nm_i->nat_block_bitmap))
return;
 
-   if (set)
+   if (set) {
+   if (test_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]))
+   return;
__set_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]);
-   else
-   __clear_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]);
-
-   if (set)
nm_i->free_nid_count[nat_ofs]++;
-   else if (!build)
-   nm_i->free_nid_count[nat_ofs]--;
+   } else {
+   if (!test_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]))
+   return;
+   __clear_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]);
+   if (!build)
+   nm_i->free_nid_count[nat_ofs]--;
+   }
 }
 
 static void scan_nat_page(struct f2fs_sb_info *sbi,
-- 
2.7.4

Re: [PATCH v2] locking/lockdep: Revise Documentation/locking/crossrelease.txt

2017-11-09 Thread Byungchul Park


On 11/10/2017 4:30 PM, Ingo Molnar wrote:


* Byungchul Park  wrote:


 Event C depends on event A.
 Event A depends on event B.
 Event B depends on event C.
  
-   NOTE: Precisely speaking, a dependency is one between whether a

-   waiter for an event can be woken up and whether another waiter for
-   another event can be woken up. However from now on, we will describe
-   a dependency as if it's one between an event and another event for
-   simplicity.


Why was this explanation removed?


-Lockdep tries to detect a deadlock by checking dependencies created by
-lock operations, acquire and release. Waiting for a lock corresponds to
-waiting for an event, and releasing a lock corresponds to triggering an
-event in the previous section.
+Lockdep tries to detect a deadlock by checking circular dependencies
+created by lock operations, acquire and release, which are wait and
+event respectively.


What? You changed a readable paragraph into an unreadable one.

Sorry, this text needs to be acked by someone with good English skills, and I
don't have the time right now to fix it all up. Please send minimal, obvious
typo/grammar fixes only.


I will send one including minimal fixes at the next spin.

--
Thanks,
Byungchul

[PATCH RFC] kbuild: fixes in Makefile.lib

2017-11-09 Thread Cao jin

Commit

  cf4f21938e13e ("kbuild: Allow to specify composite modules with modname-m")

added modname-m support, but did not update the corresponding multi-objs-m
definition.

Commit 551559e13af1c ("kbuild: implement modules.order") does not filter
the subdirs listed in obj-m. Unless the subdirs are identical between
obj-y and obj-m, I think it will miss something.

But until now, no one has complained about it, so I guess no one has
triggered it.

Signed-off-by: Cao jin 
---
I found these 2 points, which I think might be wrong, during code
inspection. So far they do not seem to have caused anything bad, so I am
not sure this is a problem :)

 scripts/Makefile.lib | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
index 580e605118e4..3209f303213b 100644
--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -22,7 +22,7 @@ lib-y := $(filter-out $(obj-y), $(sort $(lib-y) $(lib-m)))
 # Determine modorder.
 # Unfortunately, we don't have information about ordering between -y
 # and -m subdirs.  Just put -y's first.
-modorder   := $(patsubst %/,%/modules.order, $(filter %/, $(obj-y)) $(obj-m:.o=.ko))
+modorder   := $(patsubst %/,%/modules.order, $(filter %/, $(obj-y) $(obj-m)) $(obj-m:.o=.ko))
 
 # Handle objects in subdirs
 # ---
@@ -49,7 +49,7 @@ single-used-m := $(sort $(filter-out $(multi-used-m),$(obj-m)))
 # Build list of the parts of our composite objects, our composite
 # objects depend on those (obviously)
 multi-objs-y := $(foreach m, $(multi-used-y), $($(m:.o=-objs)) $($(m:.o=-y)))
-multi-objs-m := $(foreach m, $(multi-used-m), $($(m:.o=-objs)) $($(m:.o=-y)))
+multi-objs-m := $(foreach m, $(multi-used-m), $($(m:.o=-objs)) $($(m:.o=-y)) $($(m:.o=-m)))
 multi-objs   := $(multi-objs-y) $(multi-objs-m)
 
 # $(subdir-obj-y) is the list of objects in $(obj-y) which uses dir/ to
-- 
2.13.6

Re: [PATCH v5 15/37] tracing: Add variable support to hist triggers

2017-11-09 Thread Namhyung Kim

Hi Tom,

On Thu, Nov 09, 2017 at 02:33:46PM -0600, Tom Zanussi wrote:
> Add support for saving the value of a current event's event field by
> assigning it to a variable that can be read by a subsequent event.
> 
> The basic syntax for saving a variable is to simply prefix a unique
> variable name not corresponding to any keyword along with an '=' sign
> to any event field.
> 
> Both keys and values can be saved and retrieved in this way:
> 
> # echo 'hist:keys=next_pid:vals=$ts0:ts0=$common_timestamp ...
> # echo 'hist:timer_pid=common_pid:key=$timer_pid ...'
> 
> If a variable isn't a key variable or prefixed with 'vals=', the
> associated event field will be saved in a variable but won't be summed
> as a value:
> 
> # echo 'hist:keys=next_pid:ts1=$common_timestamp:...
> 
> Multiple variables can be assigned at the same time:
> 
> # echo 'hist:keys=pid:vals=$ts0,$b,field2:ts0=$common_timestamp,b=field1 ...
> 
> Multiple (or single) variables can also be assigned at the same time
> using separate assignments:
> 
> # echo 'hist:keys=pid:vals=$ts0:ts0=$common_timestamp:b=field1:c=field2 ...
> 
> Variables set as above can be used by being referenced from another
> event, as described in a subsequent patch.
> 
> Signed-off-by: Tom Zanussi 
> Signed-off-by: Baohong Liu 
> ---

[SNIP]
> +static int parse_var_defs(struct hist_trigger_data *hist_data)
> +{
> + char *s, *str, *var_name, *field_str;
> + unsigned int i, j, n_vars = 0;
> + int ret = 0;
> +
> + for (i = 0; i < hist_data->attrs->n_assignments; i++) {
> + str = hist_data->attrs->assignment_str[i];
> + for (j = 0; j < TRACING_MAP_VARS_MAX; j++) {
> + field_str = strsep(&str, ",");
> + if (!field_str)
> + break;
> +
> + var_name = strsep(&field_str, "=");
> + if (!var_name || !field_str) {
> + ret = -EINVAL;
> + goto free;
> + }
> +
> + s = kstrdup(var_name, GFP_KERNEL);
> + if (!s) {
> + ret = -ENOMEM;
> + goto free;
> + }
> + hist_data->attrs->var_defs.name[n_vars] = s;
> +
> + s = kstrdup(field_str, GFP_KERNEL);
> + if (!s) {
> + kfree(hist_data->attrs->var_defs.name[n_vars]);
> + ret = -ENOMEM;
> + goto free;
> + }
> + hist_data->attrs->var_defs.expr[n_vars++] = s;
> +
> + hist_data->attrs->var_defs.n_vars = n_vars;
> +
> + if (n_vars == TRACING_MAP_VARS_MAX)
> + goto free;

This will silently discard all variables.  Why not return an error?
Also I think the check should be moved to the beginning of this block.

Thanks,
Namhyung


> + }
> + }
> +
> + return ret;
> + free:
> + free_var_defs(hist_data);
> +
> + return ret;
> +}
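
For reference, a minimal userspace sketch of the shape suggested above: the
limit is checked at the top of the block and reported as an error instead of
jumping to the cleanup path with ret == 0. This is not the kernel code.
VARS_MAX, struct var_defs, dup_range() and parse_var_list() are hypothetical
stand-ins, and malloc()/free() replace kstrdup()/kfree().

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define VARS_MAX 4	/* hypothetical stand-in for TRACING_MAP_VARS_MAX */

struct var_defs {
	unsigned int n_vars;
	char *name[VARS_MAX];
	char *expr[VARS_MAX];
};

/* Duplicate the first len bytes of s as a NUL-terminated string. */
static char *dup_range(const char *s, size_t len)
{
	char *p = malloc(len + 1);

	if (p) {
		memcpy(p, s, len);
		p[len] = '\0';
	}
	return p;
}

static void free_var_defs(struct var_defs *defs)
{
	unsigned int i;

	for (i = 0; i < defs->n_vars; i++) {
		free(defs->name[i]);
		free(defs->expr[i]);
	}
	defs->n_vars = 0;
}

/* Parse "a=x,b=y" into defs. Returns 0 or a negative errno value. */
static int parse_var_list(struct var_defs *defs, const char *str)
{
	int ret;

	while (*str) {
		size_t len = strcspn(str, ",");
		const char *eq = memchr(str, '=', len);
		char *name, *expr;

		/* Check the limit first and report it to the caller. */
		if (defs->n_vars == VARS_MAX) {
			ret = -EINVAL;
			goto err;
		}

		/* Each item needs a non-empty name followed by '='. */
		if (!eq || eq == str) {
			ret = -EINVAL;
			goto err;
		}

		name = dup_range(str, (size_t)(eq - str));
		expr = dup_range(eq + 1, len - (size_t)(eq - str) - 1);
		if (!name || !expr) {
			free(name);
			free(expr);
			ret = -ENOMEM;
			goto err;
		}

		defs->name[defs->n_vars] = name;
		defs->expr[defs->n_vars] = expr;
		defs->n_vars++;

		/* Step over this item and the separator, if any. */
		str += len;
		if (*str == ',')
			str++;
	}

	return 0;
err:
	free_var_defs(defs);
	return ret;
}
```

With this layout, a list holding exactly VARS_MAX assignments is accepted and
only one more assignment is rejected with -EINVAL.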

[RFC PATCH] mm: fix device-dax pud write-faults triggered by get_user_pages()

2017-11-09 Thread Dan Williams

Currently only get_user_pages_fast() can safely handle the writable gup
case due to its use of pud_access_permitted() to check whether the pud
entry is writable. In the gup slow path pud_write() is used instead of
pud_access_permitted() and to date it has been unimplemented: it just
calls BUG().

kernel BUG at ./include/linux/hugetlb.h:244!
[..]
RIP: 0010:follow_devmap_pud+0x482/0x490
[..]
Call Trace:
 follow_page_mask+0x28c/0x6e0
 __get_user_pages+0xe4/0x6c0
 get_user_pages_unlocked+0x130/0x1b0
 get_user_pages_fast+0x89/0xb0
 iov_iter_get_pages_alloc+0x114/0x4a0
 nfs_direct_read_schedule_iovec+0xd2/0x350
 ? nfs_start_io_direct+0x63/0x70
 nfs_file_direct_read+0x1e0/0x250
 nfs_file_read+0x90/0xc0

Use pud_access_permitted() to implement pud_write(). A later cleanup can
remove {pte,pmd,pud}_write and replace them with
{pte,pmd,pud}_access_permitted() directly so that we only have one set of
helpers for these kinds of checks. For now, implementing pud_write()
simplifies -stable backports.

Cc: 
Cc: Dave Hansen 
Fixes: a00cc7d9dd93 ("mm, x86: add support for PUD-sized transparent hugepages")
Signed-off-by: Dan Williams 
---

Sending this as RFC for opinion on whether this should just be a
pud_flags() & _PAGE_RW check, like pmd_write, or pud_access_permitted()
that also takes protection keys into account.

 include/linux/hugetlb.h |3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index fbf5b31d47ee..6a142b240ef7 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -242,8 +242,7 @@ static inline int pgd_write(pgd_t pgd)
 #ifndef pud_write
 static inline int pud_write(pud_t pud)
 {
-   BUG();
-   return 0;
+   return pud_access_permitted(pud, WRITE);
 }
 #endif
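
To illustrate the RFC question, here is a simplified userspace model of the
two candidate checks. It is only a sketch: the MODEL_* names and both
functions are made up for this example and are not the kernel helpers. The
bit positions follow the x86 layout (present/RW/user in bits 0-2, protection
key in bits 59-62, two PKRU bits per key), which is an assumption of the
model rather than something taken from this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model-only definitions; assumed to mirror the x86 entry layout. */
#define MODEL_PAGE_PRESENT	(1ull << 0)
#define MODEL_PAGE_RW		(1ull << 1)
#define MODEL_PAGE_USER		(1ull << 2)
#define MODEL_PKEY_SHIFT	59
#define MODEL_PKEY_MASK		0xfull

/* Option 1: only look at the RW flag, as a pmd_write()-style check would. */
static bool model_write_flag_only(uint64_t entry)
{
	return (entry & MODEL_PAGE_RW) != 0;
}

/* Option 2: also require present/user and honour the protection key. */
static bool model_access_permitted(uint64_t entry, bool write, uint32_t pkru)
{
	uint64_t need = MODEL_PAGE_PRESENT | MODEL_PAGE_USER;
	unsigned int pkey = (entry >> MODEL_PKEY_SHIFT) & MODEL_PKEY_MASK;
	uint32_t access_disable = 1u << (2 * pkey);
	uint32_t write_disable = 1u << (2 * pkey + 1);

	if (write)
		need |= MODEL_PAGE_RW;
	if ((entry & need) != need)
		return false;
	if (pkru & access_disable)
		return false;
	if (write && (pkru & write_disable))
		return false;
	return true;
}
```

In this model the two checks disagree for an entry that has the RW flag set
but whose protection key is write-disabled in PKRU.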

Re: [PATCH v2] locking/lockdep: Revise Documentation/locking/crossrelease.txt

2017-11-09 Thread Ingo Molnar


* Byungchul Park  wrote:

> Event C depends on event A.
> Event A depends on event B.
> Event B depends on event C.
>  
> -   NOTE: Precisely speaking, a dependency is one between whether a
> -   waiter for an event can be woken up and whether another waiter for
> -   another event can be woken up. However from now on, we will describe
> -   a dependency as if it's one between an event and another event for
> -   simplicity.

Why was this explanation removed?

> -Lockdep tries to detect a deadlock by checking dependencies created by
> -lock operations, acquire and release. Waiting for a lock corresponds to
> -waiting for an event, and releasing a lock corresponds to triggering an
> -event in the previous section.
> +Lockdep tries to detect a deadlock by checking circular dependencies
> +created by lock operations, acquire and release, which are wait and
> +event respectively.

What? You changed a readable paragraph into an unreadable one.

Sorry, this text needs to be acked by someone with good English skills, and I 
don't have the time right now to fix it all up. Please send minimal, obvious
typo/grammar fixes only.

Thanks,

Ingo

Re: [PATCH] x86: use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz" again

2017-11-09 Thread Ingo Molnar


* Rafael J. Wysocki  wrote:

> Hi Linus,
> 
> On 11/9/2017 11:38 AM, WANG Chao wrote:
> > Commit 941f5f0f6ef5 (x86: CPU: Fix up "cpu MHz" in /proc/cpuinfo) caused
> > a serious performance issue when reading from /proc/cpuinfo on system
> > with aperfmperf.
> > 
> > For each cpu, arch_freq_get_on_cpu() sleeps 20ms to get its frequency.
> > On a system with 64 cpus, it takes 1.5s to finish running `cat
> > /proc/cpuinfo`, while it previously was done in 15ms.
> 
> Honestly, I'm not sure what to do to address this ATM.
> 
> The last requested frequency is only available in the non-HWP case, so it
> cannot be used universally.

This is a serious regression that needs to be fixed ASAP, because the slowdown
is utterly ridiculous on a 120 CPU system:

  fomalhaut:~> time cat /proc/cpuinfo  >/dev/null

  real    0m2.689s
  user    0m0.001s
  sys     0m0.007s

Thanks,

Ingo
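
A back-of-the-envelope check of the numbers in this thread, assuming the only
cost is the 20ms sleep per CPU quoted above (the function name is made up for
illustration):

```c
/* Lower bound, in ms, for one read if every CPU costs one sleep. */
static unsigned int cpuinfo_read_ms(unsigned int nr_cpus, unsigned int sleep_ms)
{
	return nr_cpus * sleep_ms;
}
```

With sleep_ms = 20 this gives 1280ms for 64 CPUs and 2400ms for 120 CPUs,
which is in the same range as the 1.5s and 2.689s reported here.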

Re: [alsa-devel] [RFC PATCH v2 7/7] sound: core: Avoid using timespec for struct snd_timer_tread

2017-11-09 Thread Takashi Iwai

On Fri, 10 Nov 2017 00:20:10 +0100,
Arnd Bergmann wrote:
> 
> On Thu, Nov 9, 2017 at 7:11 PM, Takashi Iwai  wrote:
> > On Thu, 09 Nov 2017 18:01:47 +0100,
> > Arnd Bergmann wrote:
> >>
> >> On Thu, Nov 9, 2017 at 5:52 PM, Takashi Iwai  wrote:
> >> >
> >> > IOW, is there any macro indicating the 64bit user time_t?
> >>
> >> There is a macro defined by the C library, but so far we have not
> >> started relying on it in kernel headers, because there is no guarantee
> >> that this symbol is visible before sys/time.h has been included,
> >> and there are some cases where it's possible to include a kernel
> >> header before sys/time.h.
> >>
> >> In case of sound/asound.h, that should be no problem since we rely
> >> on having seen the definition on 'struct timeval' already today, and
> >> that must come from sys/time.h. Then we just need to make sure that
> >> all C libraries define the same macro.
> >>
> >> Are you sure about the switch()/case problem? I thought that worked
> >> in C99, the only problem would be using the macro outside of a
> >> function, e.g. as initializer for a variable
> >
> > Hmm, OK it seems working.
> >
> > But, honestly speaking, it's too scary.  I'm OK if it were only in
> > the kernel local code.  But it's the API/ABI definition, which is
> > referred by user-space...
> >
> > A more solid condition would be really appreciated.
> 
> I understand your concern here and agree it's really ugly. It did take us
> many attempts to come up with this trick for other cases, so my initial
> reaction would be to use the same thing everywhere since I know
> it works,  but we can use #ifdef instead if you prefer that. I think we
> can use a single #ifdef variant to cover all cases, but I'd have to think
> about the x32 and x86-32 some more here. With this trick, we can
> make user space with new glibc use data structures that are compatible
> with 64-bit kernels and avoid the additional translation helpers:
> 
> enum {
>   SNDRV_PCM_MMAP_OFFSET_DATA = 0x,
>   SNDRV_PCM_MMAP_OFFSET_CONTROL = 0x8100,
> #if (__BITS_PER_LONG == 64) || !defined(__USE_TIME_BITS64)

Yeah, it's definitely better, more understandable!


thanks,

Takashi

Re: [PATCH] x86, pkeys: update documentation about availability

2017-11-09 Thread Ingo Molnar


* Dave Hansen  wrote:

> On 11/09/2017 10:12 PM, Ingo Molnar wrote:
> > 
> > * Dave Hansen  wrote:
> > 
> >>
> >> From: Dave Hansen 
> >>
> >> Now that CPUs that implement Memory Protection Keys are publicly
> >> available we can be a bit less oblique about where it is available.
> >>
> >> Signed-off-by: Dave Hansen 
> >> ---
> >>
> >>  b/Documentation/x86/protection-keys.txt |9 +++--
> >>  1 file changed, 7 insertions(+), 2 deletions(-)
> >>
> >> diff -puN Documentation/x86/protection-keys.txt~pkeys-update 
> >> Documentation/x86/protection-keys.txt
> >> --- a/Documentation/x86/protection-keys.txt~pkeys-update   2017-11-09 
> >> 10:36:53.381467202 -0800
> >> +++ b/Documentation/x86/protection-keys.txt2017-11-09 
> >> 10:43:15.527466249 -0800
> >> @@ -1,5 +1,10 @@
> >> -Memory Protection Keys for Userspace (PKU aka PKEYs) is a CPU feature
> >> -which will be found on future Intel CPUs.
> >> +Memory Protection Keys for Userspace (PKU aka PKEYs) is a feature
> >> +which is found on Intel's Skylake "Scalable Processor" Server CPUs.
> >> +It will be available in future non-server parts.
> >> +
> >> +For anyone wishing to test or use this feature, it is available in
> >> +Amazon's EC2 C5 instances and is known to work there using an Ubuntu
> >> +17.04 image.
> >>  
> >>  Memory Protection Keys provides a mechanism for enforcing page-based
> >>  protections, but without requiring modification of the page tables
> > 
> > Could we please first fix the pkeys self-test? One of the testcases doesn't 
> > build 
> > at all:
> > 
> >  gcc -m32 -o /home/mingo/tip/tools/testing/selftests/x86/protection_keys_32 
> > -O2 -g -std=gnu99 -pthread -Wall -no-pie  protection_keys.c -lrt -ldl -lm
> >  In file included from /usr/include/signal.h:57:0,
> >   from protection_keys.c:33:
> >  protection_keys.c: In function ‘signal_handler’:
> >  protection_keys.c:253:6: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or 
> > ‘__attribute__’ 
> >  before ‘.’ token
> >u64 si_pkey;
> 
> That's odd.  I build them all the time.  I compiled it just now with
> 4.14-rc8 and gcc 4.8.4.
> 
> I wonder if this is more fallout from the glibc headers getting updated
> to now contain pkey-related stuff.  si_pkey might be getting #defined
> over for the siginfo si_pkey.
> 
> What distro are you seeing this on?

Latest Ubuntu, 17.10:

  triton:~/tip> cat /etc/os-release 
  NAME="Ubuntu"
  VERSION="17.10 (Artful Aardvark)"

  triton:~/tip> apt-file find /usr/include/signal.h
  libc6-dev: /usr/include/signal.h

  triton:~/tip> dpkg -l libc6-dev
  Desired=Unknown/Install/Remove/Purge/Hold
  | Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
  |/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
  ||/ Name            Version       Architecture Description
  +++-===============-=============-============-=====================================================
  ii  libc6-dev:amd64 2.26-0ubuntu2 amd64        GNU C Library: Development Libraries and Header Files


> > plus, on a related note, the MPX testcase produces annoying warnings:
> > 
> >  gcc -m32 -o /home/mingo/tip/tools/testing/selftests/x86/mpx-mini-test_32 
> > -O2 -g -std=gnu99 -pthread -Wall -no-pie  mpx-mini-test.c -lrt -ldl -lm
> >  mpx-mini-test.c: In function ‘insn_test_failed’:
> >  mpx-mini-test.c:1406:3: warning: array subscript is above array bounds 
> >  [-Warray-bounds]
> > printf("bte[1]: %lx\n", bte->contents[1]);
> 
> This is kinda a weird structure:
> 
> > struct mpx_bt_entry {
> > union {
> > char x[MPX_BOUNDS_TABLE_ENTRY_SIZE_BYTES];
> > unsigned long contents[1];
> > };
> > } __attribute__((packed));
> 
> I guess it should either be contents[0] or
> contents[MPX_BOUNDS_TABLE_ENTRY_SIZE_BYTES/sizeof(long)].  But, the
> warning is harmless at least.
> 
> What gcc is this, btw?  I must be behind the times.

gcc version 7.2.0 (Ubuntu 7.2.0-8ubuntu3) 

Thanks,

Ingo
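
A minimal sketch of the second option mentioned above, sizing contents[] so
that it covers the whole entry. The value used for
MPX_BOUNDS_TABLE_ENTRY_SIZE_BYTES is an assumption for illustration only (the
selftest has its own definition), and MPX_BT_ENTRY_WORDS is a hypothetical
helper macro.

```c
#include <stddef.h>

#define MPX_BOUNDS_TABLE_ENTRY_SIZE_BYTES 32	/* assumed value, illustration only */

struct mpx_bt_entry {
	union {
		char x[MPX_BOUNDS_TABLE_ENTRY_SIZE_BYTES];
		/* One slot per unsigned long in the entry, so [1] is in bounds. */
		unsigned long contents[MPX_BOUNDS_TABLE_ENTRY_SIZE_BYTES /
				       sizeof(unsigned long)];
	};
} __attribute__((packed));

/* Number of unsigned long words that can be indexed in contents[]. */
#define MPX_BT_ENTRY_WORDS \
	(sizeof(((struct mpx_bt_entry *)0)->contents) / sizeof(unsigned long))
```

The struct size stays the same as before because x[] already spans the full
entry; only the declared bound of contents[] changes.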

Re: [PATCH 1/2] sched/swait: allow swake_up() to return

2017-11-09 Thread Peter Xu

On Thu, Nov 09, 2017 at 11:06:53AM +0100, Paolo Bonzini wrote:
> On 09/11/2017 10:18, Peter Xu wrote:
> > Let swake_up() to return whether any of the waiters is waked up. One use
> > case of it would be:
> > 
> >   if (swait_active(wq)) {
> > swake_up(wq);
> > // do something when waiter is waked up
> > waked_up++;
> >   }
> > 
> > Logically it's possible that when reaching swake_up() the wait queue is
> > not active any more, and here doing something like waked_up++ would be
> > inaccurate.  To correct it, we need an atomic version of it.
> > 
> > With this patch, we can simply re-write it into:
> > 
> >   if (swake_up(wq)) {
> > // do something when waiter is waked up
> > waked_up++;
> >   }
> > 
> > After all we are checking swait_active() inside swake_up() too.
> 
> Better subject:
> 
> sched/swait: make swake_up() return whether there were any waiters
> 
> I like this patch.

I'll see what PeterZ would like me to do next, or I can drop this patch
and send another cleanup which is part of patch 2.  Thanks for the
positive feedback and comments. :-)

-- 
Peter Xu

Re: [PATCH v2 0/4] KVM: Paravirt remote TLB flush

2017-11-09 Thread Wanpeng Li

2017-11-10 15:04 GMT+08:00 Wanpeng Li :
> Remote flushing APIs do a busy wait, which is fine in the bare-metal
> scenario. But within the guest, the vcpus might have been pre-empted
> or blocked. In this scenario, the initiator vcpu would end up
> busy-waiting for a long time.
>
> This patch set implements para-virt flush tlbs making sure that it
> does not wait for vcpus that are sleeping. And all the sleeping vcpus
> flush the tlb on guest enter. Idea was discussed here:
> https://lkml.org/lkml/2012/2/20/157
>
> The best result is achieved when we're overcommitting the host by running
> multiple vCPUs on each pCPU. In this case PV tlb flush avoids touching
> vCPUs which are not scheduled and avoids the wait on the main CPU.
>
> In addition, thanks for commit 9e52fc2b50d ("x86/mm: Enable RCU based
> page table freeing (CONFIG_HAVE_RCU_TABLE_FREE=y)")
>
> Test on a Haswell i7 desktop 4 cores (2HT), so 8 pCPUs, running ebizzy in
> one linux guest.
>
> ebizzy -M
>            vanilla   optimized     boost
>  8 vCPUs     10152       10083    -0.68%
> 16 vCPUs      1224        4866    297.5%
> 24 vCPUs      1109        3871      249%
> 32 vCPUs      1025        3375    229.3%

v1 -> v2:
 * a new CPUID feature bit
 * fix cmpxchg check
 * use kvm_vcpu_flush_tlb() to get the statistics right
 * just OR the KVM_VCPU_PREEMPTED in kvm_steal_time_set_preempted
 * add a new bool argument to kvm_x86_ops->tlb_flush
 * __cpumask_clear_cpu() instead of cpumask_clear_cpu()
 * not put cpumask_t on stack
 * rebase the patchset against "locking/qspinlock/x86: Avoid
test-and-set when PV_DEDICATED is set" v3

>
> Wanpeng Li (4):
>   KVM: Add vCPU running/preempted state
>   KVM: Add paravirt remote TLB flush
>   KVM: X86: introduce invalidate_gpa argument to tlb flush
>   KVM: Add flush_on_enter before guest enter
>
>  Documentation/virtual/kvm/cpuid.txt  | 10 ++
>  arch/x86/include/asm/kvm_host.h  |  2 +-
>  arch/x86/include/uapi/asm/kvm_para.h |  6 ++
>  arch/x86/kernel/kvm.c| 35 ++-
>  arch/x86/kvm/cpuid.c |  3 ++-
>  arch/x86/kvm/svm.c   | 14 +++---
>  arch/x86/kvm/vmx.c   | 21 +++--
>  arch/x86/kvm/x86.c   | 24 +++-
>  8 files changed, 86 insertions(+), 29 deletions(-)
>
> --
> 2.7.4
>

Re: [PATCH 1/2] sched/swait: allow swake_up() to return

2017-11-09 Thread Peter Xu

On Thu, Nov 09, 2017 at 11:23:03AM +0100, Peter Zijlstra wrote:
> On Thu, Nov 09, 2017 at 05:18:53PM +0800, Peter Xu wrote:
> > Let swake_up() to return whether any of the waiters is waked up. One use
> > case of it would be:
> > 
> >   if (swait_active(wq)) {
> > swake_up(wq);
> > // do something when waiter is waked up
> > waked_up++;
> >   }
> 
> The word is 'woken', and no that doesn't work. All it says is that there
> was a waiter, not that you were the one to wake it. Another concurrent
> wakeup might have done so.

Yes.  Or IIUC the waiter can be calling finish_swait() somehow so it
removed itself from the list before being woken.

> 
> > 
> > Logically it's possible that when reaching swake_up() the wait queue is
> > not active any more, and here doing something like waked_up++ would be
> > inaccurate.  To correct it, we need an atomic version of it.
> > 
> > With this patch, we can simply re-write it into:
> > 
> >   if (swake_up(wq)) {
> > // do something when waiter is waked up
> > waked_up++;
> >   }
> > 
> > After all we are checking swait_active() inside swake_up() too.
> 
> We're not in fact; you've been staring at old code; see commit:
> 
>   35a2897c2a30 ("sched/wait: Remove the lockless swait_active() check in 
> swake_up*()")

I thought the tree was new enough, but obviously I was wrong...
Thanks for the pointer.

> 
> 
> Also, you're changing the interface relative to the regular wait
> interface. The two should be similar wherever possible.

Indeed.

I came to this when reading kvm_vcpu_wake_up(), so that only affects
some statistic which may not be that critical.  However I don't know
whether there would be any other real use case that we would like to
know exactly whether a call to [s]wake_up() has really done something
or just returned with a NOP.

Anyway, please let me know if you think the same change to wake_up()
would be meaningful, otherwise I can drop this patch and post another
KVM-only one to clean up the redundant callers of swait_active(),
since even if we dropped that list check in 35a2897c2a30, we'll do
that again in swake_up_locked().

And after knowing 35a2897c2a30, I do think that calling swait_active()
before swake_up() is not good since that call is without a lock as
well, just like what can happen before 35a2897c2a30.

(I am not 100% sure whether I fully understand the problem mentioned
 in 35a2897c2a30, but I think it's the memory barrier in the
 lock/unlock that matters.)

Thanks,

-- 
Peter Xu

[PATCH v2 1/4] KVM: Add vCPU running/preempted state

2017-11-09 Thread Wanpeng Li

From: Wanpeng Li 

This patch reuses the preempted field in kvm_steal_time, and exports
the vcpu running/pre-empted information from the host to the guest. This
enables the guest to send IPIs only to running vcpus and set a flag for
pre-empted vcpus, which avoids waiting for vcpus that are not running.

Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Signed-off-by: Wanpeng Li 
---
 arch/x86/include/uapi/asm/kvm_para.h | 3 +++
 arch/x86/kernel/kvm.c| 2 +-
 arch/x86/kvm/x86.c   | 4 ++--
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/uapi/asm/kvm_para.h 
b/arch/x86/include/uapi/asm/kvm_para.h
index a965e5b0..ff23ce9 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -50,6 +50,9 @@ struct kvm_steal_time {
__u32 pad[11];
 };
 
+#define KVM_VCPU_NOT_PREEMPTED  (0 << 0)
+#define KVM_VCPU_PREEMPTED  (1 << 0)
+
 #define KVM_CLOCK_PAIRING_WALLCLOCK 0
 struct kvm_clock_pairing {
__s64 sec;
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 8bb9594..1b1b641 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -608,7 +608,7 @@ __visible bool __kvm_vcpu_is_preempted(long cpu)
 {
struct kvm_steal_time *src = &per_cpu(steal_time, cpu);
 
-   return !!src->preempted;
+   return !!(src->preempted & KVM_VCPU_PREEMPTED);
 }
 PV_CALLEE_SAVE_REGS_THUNK(__kvm_vcpu_is_preempted);
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d61dcce3..46d4158 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2116,7 +2116,7 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
&vcpu->arch.st.steal, sizeof(struct kvm_steal_time
return;
 
-   vcpu->arch.st.steal.preempted = 0;
+   vcpu->arch.st.steal.preempted = KVM_VCPU_NOT_PREEMPTED;
 
if (vcpu->arch.st.steal.version & 1)
vcpu->arch.st.steal.version += 1;  /* first time write, random 
junk */
@@ -2887,7 +2887,7 @@ static void kvm_steal_time_set_preempted(struct kvm_vcpu 
*vcpu)
if (!(vcpu->arch.st.msr_val & KVM_MSR_ENABLED))
return;
 
-   vcpu->arch.st.steal.preempted = 1;
+   vcpu->arch.st.steal.preempted = KVM_VCPU_PREEMPTED;
 
kvm_write_guest_offset_cached(vcpu->kvm, &vcpu->arch.st.stime,
&vcpu->arch.st.steal.preempted,
-- 
2.7.4

[PATCH v2 2/4] KVM: Add paravirt remote TLB flush

2017-11-09 Thread Wanpeng Li

From: Wanpeng Li 

Remote flushing APIs do a busy wait, which is fine in the bare-metal
scenario. But within the guest, the vcpus might have been pre-empted
or blocked. In this scenario, the initiator vcpu would end up
busy-waiting for a long time.

This patch set implements para-virt flush tlbs making sure that it
does not wait for vcpus that are sleeping. And all the sleeping vcpus
flush the tlb on guest enter.

The best result is achieved when we're overcommitting the host by running
multiple vCPUs on each pCPU. In this case PV tlb flush avoids touching
vCPUs which are not scheduled and avoids the wait on the main CPU.

Test on a Haswell i7 desktop 4 cores (2HT), so 8 pCPUs, running ebizzy in 
one linux guest.

ebizzy -M 
           vanilla   optimized     boost
 8 vCPUs     10152       10083    -0.68%
16 vCPUs      1224        4866    297.5%
24 vCPUs      1109        3871      249%
32 vCPUs      1025        3375    229.3%

Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Signed-off-by: Wanpeng Li 
---
 Documentation/virtual/kvm/cpuid.txt  |  4 
 arch/x86/include/uapi/asm/kvm_para.h |  2 ++
 arch/x86/kernel/kvm.c| 31 +++
 3 files changed, 37 insertions(+)

diff --git a/Documentation/virtual/kvm/cpuid.txt 
b/Documentation/virtual/kvm/cpuid.txt
index 117066a..9693fcc 100644
--- a/Documentation/virtual/kvm/cpuid.txt
+++ b/Documentation/virtual/kvm/cpuid.txt
@@ -60,6 +60,10 @@ KVM_FEATURE_PV_DEDICATED   || 8 || guest checks 
this feature bit
||   || mizations such as usage of
||   || qspinlocks.
 --
+KVM_FEATURE_PV_TLB_FLUSH   || 9 || guest checks this feature bit
+   ||   || before enabling paravirtualized
+   ||   || tlb flush.
+--
 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||24 || host will warn if no guest-side
||   || per-cpu warps are expected in
||   || kvmclock.
diff --git a/arch/x86/include/uapi/asm/kvm_para.h 
b/arch/x86/include/uapi/asm/kvm_para.h
index 9ead1ed..a028479 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -25,6 +25,7 @@
 #define KVM_FEATURE_PV_EOI 6
 #define KVM_FEATURE_PV_UNHALT  7
 #define KVM_FEATURE_PV_DEDICATED   8
+#define KVM_FEATURE_PV_TLB_FLUSH   9
 
 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
@@ -53,6 +54,7 @@ struct kvm_steal_time {
 
 #define KVM_VCPU_NOT_PREEMPTED  (0 << 0)
 #define KVM_VCPU_PREEMPTED  (1 << 0)
+#define KVM_VCPU_SHOULD_FLUSH   (1 << 1)
 
 #define KVM_CLOCK_PAIRING_WALLCLOCK 0
 struct kvm_clock_pairing {
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 66ed3bc..50f4b6a 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -465,6 +465,33 @@ static void __init kvm_apf_trap_init(void)
update_intr_gate(X86_TRAP_PF, async_page_fault);
 }
 
+static cpumask_t flushmask;
+
+static void kvm_flush_tlb_others(const struct cpumask *cpumask,
+   const struct flush_tlb_info *info)
+{
+   u8 state;
+   int cpu;
+   struct kvm_steal_time *src;
+
+   cpumask_copy(&flushmask, cpumask);
+   /*
+* We have to call flush only on online vCPUs. And
+* queue flush_on_enter for pre-empted vCPUs
+*/
+   for_each_cpu(cpu, cpumask) {
+   src = &per_cpu(steal_time, cpu);
+   state = src->preempted;
+   if ((state & KVM_VCPU_PREEMPTED)) {
+   if (cmpxchg(&src->preempted, state, state |
+   KVM_VCPU_SHOULD_FLUSH) == state)
+   __cpumask_clear_cpu(cpu, &flushmask);
+   }
+   }
+
+   native_flush_tlb_others(&flushmask, info);
+}
+
 void __init kvm_guest_init(void)
 {
int i;
@@ -484,6 +511,10 @@ void __init kvm_guest_init(void)
pv_time_ops.steal_clock = kvm_steal_clock;
}
 
+   if (kvm_para_has_feature(KVM_FEATURE_PV_TLB_FLUSH) &&
+   !kvm_para_has_feature(KVM_FEATURE_PV_DEDICATED))
+   pv_mmu_ops.flush_tlb_others = kvm_flush_tlb_others;
+
if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
apic_set_eoi_write(kvm_guest_apic_eoi_write);
 
-- 
2.7.4
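
The guest-side loop in this patch can be modelled in userspace roughly as
follows. This is only a sketch under stated assumptions: C11 atomics stand in
for the kernel's cmpxchg(), a plain bitmask stands in for struct cpumask, and
NR_VCPUS, preempted[] and pv_filter_flush_mask() are hypothetical names.

```c
#include <stdatomic.h>
#include <stdint.h>

#define KVM_VCPU_PREEMPTED	(1 << 0)
#define KVM_VCPU_SHOULD_FLUSH	(1 << 1)

#define NR_VCPUS 8	/* hypothetical, keeps the mask in one word */

/* Stand-in for the per-cpu steal_time.preempted byte shared with the host. */
static _Atomic uint8_t preempted[NR_VCPUS];

/*
 * Return the subset of 'mask' (one bit per vCPU) that still needs an IPI.
 * A preempted vCPU gets KVM_VCPU_SHOULD_FLUSH set instead, but only if its
 * state did not change between the read and the compare-and-exchange.
 */
static unsigned int pv_filter_flush_mask(unsigned int mask)
{
	unsigned int cpu, flushmask = mask;

	for (cpu = 0; cpu < NR_VCPUS; cpu++) {
		uint8_t state;

		if (!(mask & (1u << cpu)))
			continue;

		state = atomic_load(&preempted[cpu]);
		if (!(state & KVM_VCPU_PREEMPTED))
			continue;

		/* On failure the vCPU stays in the mask and gets the IPI. */
		if (atomic_compare_exchange_strong(&preempted[cpu], &state,
				(uint8_t)(state | KVM_VCPU_SHOULD_FLUSH)))
			flushmask &= ~(1u << cpu);
	}

	return flushmask;
}
```

If the host clears the preempted state between the load and the exchange, the
exchange fails and the vCPU is flushed the normal way, so no flush is lost.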

[PATCH v2 3/4] KVM: X86: introduce invalidate_gpa argument to tlb flush

2017-11-09 Thread Wanpeng Li

From: Wanpeng Li 

Introduce a new bool invalidate_gpa argument to kvm_x86_ops->tlb_flush;
it will be used by later patches to flush only the guest tlb.

Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Signed-off-by: Wanpeng Li 
---
 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/svm.c  | 14 +++---
 arch/x86/kvm/vmx.c  | 21 +++--
 arch/x86/kvm/x86.c  |  6 +++---
 4 files changed, 22 insertions(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c73e493..b4f7bb1 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -952,7 +952,7 @@ struct kvm_x86_ops {
unsigned long (*get_rflags)(struct kvm_vcpu *vcpu);
void (*set_rflags)(struct kvm_vcpu *vcpu, unsigned long rflags);
 
-   void (*tlb_flush)(struct kvm_vcpu *vcpu);
+   void (*tlb_flush)(struct kvm_vcpu *vcpu, bool invalidate_gpa);
 
void (*run)(struct kvm_vcpu *vcpu);
int (*handle_exit)(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 0e68f0b..efaf95f 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -285,7 +285,7 @@ static int vgif = true;
 module_param(vgif, int, 0444);
 
 static void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0);
-static void svm_flush_tlb(struct kvm_vcpu *vcpu);
+static void svm_flush_tlb(struct kvm_vcpu *vcpu, bool invalidate_gpa);
 static void svm_complete_interrupts(struct vcpu_svm *svm);
 
 static int nested_svm_exit_handled(struct vcpu_svm *svm);
@@ -2032,7 +2032,7 @@ static int svm_set_cr4(struct kvm_vcpu *vcpu, unsigned 
long cr4)
return 1;
 
if (npt_enabled && ((old_cr4 ^ cr4) & X86_CR4_PGE))
-   svm_flush_tlb(vcpu);
+   svm_flush_tlb(vcpu, true);
 
vcpu->arch.cr4 = cr4;
if (!npt_enabled)
@@ -2368,7 +2368,7 @@ static void nested_svm_set_tdp_cr3(struct kvm_vcpu *vcpu,
 
svm->vmcb->control.nested_cr3 = __sme_set(root);
mark_dirty(svm->vmcb, VMCB_NPT);
-   svm_flush_tlb(vcpu);
+   svm_flush_tlb(vcpu, true);
 }
 
 static void nested_svm_inject_npf_exit(struct kvm_vcpu *vcpu,
@@ -3033,7 +3033,7 @@ static bool nested_svm_vmrun(struct vcpu_svm *svm)
svm->nested.intercept_exceptions = 
nested_vmcb->control.intercept_exceptions;
svm->nested.intercept= nested_vmcb->control.intercept;
 
-   svm_flush_tlb(&svm->vcpu);
+   svm_flush_tlb(&svm->vcpu, true);
svm->vmcb->control.int_ctl = nested_vmcb->control.int_ctl | 
V_INTR_MASKING_MASK;
if (nested_vmcb->control.int_ctl & V_INTR_MASKING_MASK)
svm->vcpu.arch.hflags |= HF_VINTR_MASK;
@@ -4755,7 +4755,7 @@ static int svm_set_tss_addr(struct kvm *kvm, unsigned int 
addr)
return 0;
 }
 
-static void svm_flush_tlb(struct kvm_vcpu *vcpu)
+static void svm_flush_tlb(struct kvm_vcpu *vcpu, bool invalidate_gpa)
 {
struct vcpu_svm *svm = to_svm(vcpu);
 
@@ -5046,7 +5046,7 @@ static void svm_set_cr3(struct kvm_vcpu *vcpu, unsigned 
long root)
 
svm->vmcb->save.cr3 = __sme_set(root);
mark_dirty(svm->vmcb, VMCB_CR);
-   svm_flush_tlb(vcpu);
+   svm_flush_tlb(vcpu, true);
 }
 
 static void set_tdp_cr3(struct kvm_vcpu *vcpu, unsigned long root)
@@ -5060,7 +5060,7 @@ static void set_tdp_cr3(struct kvm_vcpu *vcpu, unsigned 
long root)
svm->vmcb->save.cr3 = kvm_read_cr3(vcpu);
mark_dirty(svm->vmcb, VMCB_CR);
 
-   svm_flush_tlb(vcpu);
+   svm_flush_tlb(vcpu, true);
 }
 
 static int is_disabled(void)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index e5bea5e..17d13d2 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4113,9 +4113,10 @@ static void exit_lmode(struct kvm_vcpu *vcpu)
 
 #endif
 
-static inline void __vmx_flush_tlb(struct kvm_vcpu *vcpu, int vpid)
+static inline void __vmx_flush_tlb(struct kvm_vcpu *vcpu, int vpid,
+   bool invalidate_gpa)
 {
-   if (enable_ept) {
+   if (enable_ept && (invalidate_gpa || !enable_vpid)) {
if (!VALID_PAGE(vcpu->arch.mmu.root_hpa))
return;
ept_sync_context(construct_eptp(vcpu, vcpu->arch.mmu.root_hpa));
@@ -4124,15 +4125,15 @@ static inline void __vmx_flush_tlb(struct kvm_vcpu 
*vcpu, int vpid)
}
 }
 
-static void vmx_flush_tlb(struct kvm_vcpu *vcpu)
+static void vmx_flush_tlb(struct kvm_vcpu *vcpu, bool invalidate_gpa)
 {
-   __vmx_flush_tlb(vcpu, to_vmx(vcpu)->vpid);
+   __vmx_flush_tlb(vcpu, to_vmx(vcpu)->vpid, invalidate_gpa);
 }
 
 static void vmx_flush_tlb_ept_only(struct kvm_vcpu *vcpu)
 {
if (enable_ept)
-   vmx_flush_tlb(vcpu);
+   vmx_flush_tlb(vcpu, true);
 }
 
 static void vmx_decache_cr0_guest_bits(struct kvm_vcpu *vcpu)
@@ -4330,7 +4331,7 @@ static void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned 
long cr3)
ept_load_pdptrs(vcpu);
}
 
-

[PATCH v2 0/4] KVM: Paravirt remote TLB flush

2017-11-09 Thread Wanpeng Li

Remote flushing APIs do a busy wait, which is fine in the bare-metal
scenario. But within the guest, the vcpus might have been pre-empted
or blocked. In this scenario, the initiator vcpu would end up
busy-waiting for a long time.

This patch set implements para-virt flush tlbs making sure that it
does not wait for vcpus that are sleeping. And all the sleeping vcpus
flush the tlb on guest enter. Idea was discussed here:
https://lkml.org/lkml/2012/2/20/157

The best result is achieved when we're overcommitting the host by running
multiple vCPUs on each pCPU. In this case PV tlb flush avoids touching
vCPUs which are not scheduled and avoids the wait on the main CPU.

In addition, thanks for commit 9e52fc2b50d ("x86/mm: Enable RCU based 
page table freeing (CONFIG_HAVE_RCU_TABLE_FREE=y)")

Test on a Haswell i7 desktop 4 cores (2HT), so 8 pCPUs, running ebizzy in 
one linux guest.

ebizzy -M 
           vanilla   optimized     boost
 8 vCPUs     10152       10083    -0.68%
16 vCPUs      1224        4866    297.5%
24 vCPUs      1109        3871      249%
32 vCPUs      1025        3375    229.3%

Wanpeng Li (4):
  KVM: Add vCPU running/preempted state
  KVM: Add paravirt remote TLB flush
  KVM: X86: introduce invalidate_gpa argument to tlb flush
  KVM: Add flush_on_enter before guest enter

 Documentation/virtual/kvm/cpuid.txt  | 10 ++
 arch/x86/include/asm/kvm_host.h  |  2 +-
 arch/x86/include/uapi/asm/kvm_para.h |  6 ++
 arch/x86/kernel/kvm.c| 35 ++-
 arch/x86/kvm/cpuid.c |  3 ++-
 arch/x86/kvm/svm.c   | 14 +++---
 arch/x86/kvm/vmx.c   | 21 +++--
 arch/x86/kvm/x86.c   | 24 +++-
 8 files changed, 86 insertions(+), 29 deletions(-)

-- 
2.7.4

[PATCH v2 4/4] KVM: Add flush_on_enter before guest enter

2017-11-09 Thread Wanpeng Li

PV-Flush guest would indicate to flush on enter, flush the TLB before
entering and exiting the guest.

Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Signed-off-by: Wanpeng Li 
---
 arch/x86/kvm/cpuid.c |  3 ++-
 arch/x86/kvm/x86.c   | 22 ++
 2 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 0099e10..2724a5c 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -594,7 +594,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 
*entry, u32 function,
 (1 << KVM_FEATURE_ASYNC_PF) |
 (1 << KVM_FEATURE_PV_EOI) |
 (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
-(1 << KVM_FEATURE_PV_UNHALT);
+(1 << KVM_FEATURE_PV_UNHALT) |
+(1 << KVM_FEATURE_PV_TLB_FLUSH);
 
if (sched_info_on())
entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2b2cc99..7e80be4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2107,6 +2107,12 @@ static void kvmclock_reset(struct kvm_vcpu *vcpu)
vcpu->arch.pv_time_enabled = false;
 }
 
+static void kvm_vcpu_flush_tlb(struct kvm_vcpu *vcpu, bool invalidate_gpa)
+{
+   ++vcpu->stat.tlb_flush;
+   kvm_x86_ops->tlb_flush(vcpu, invalidate_gpa);
+}
+
 static void record_steal_time(struct kvm_vcpu *vcpu)
 {
if (!(vcpu->arch.st.msr_val & KVM_MSR_ENABLED))
@@ -2116,7 +2122,13 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
&vcpu->arch.st.steal, sizeof(struct kvm_steal_time
return;
 
-   vcpu->arch.st.steal.preempted = KVM_VCPU_NOT_PREEMPTED;
+   if (xchg(&vcpu->arch.st.steal.preempted, KVM_VCPU_NOT_PREEMPTED) ==
+   (KVM_VCPU_SHOULD_FLUSH | KVM_VCPU_PREEMPTED))
+   /*
+* Do TLB_FLUSH before entering the guest, its passed
+* the stage of request checking
+*/
+   kvm_vcpu_flush_tlb(vcpu, false);
 
if (vcpu->arch.st.steal.version & 1)
vcpu->arch.st.steal.version += 1;  /* first time write, random 
junk */
@@ -2887,7 +2899,7 @@ static void kvm_steal_time_set_preempted(struct kvm_vcpu 
*vcpu)
if (!(vcpu->arch.st.msr_val & KVM_MSR_ENABLED))
return;
 
-   vcpu->arch.st.steal.preempted = KVM_VCPU_PREEMPTED;
+   vcpu->arch.st.steal.preempted |= KVM_VCPU_PREEMPTED;
 
kvm_write_guest_offset_cached(vcpu->kvm, &vcpu->arch.st.stime,
&vcpu->arch.st.steal.preempted,
@@ -6737,12 +6749,6 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
kvm_x86_ops->load_eoi_exitmap(vcpu, eoi_exit_bitmap);
 }
 
-static void kvm_vcpu_flush_tlb(struct kvm_vcpu *vcpu, bool invalidate_gpa)
-{
-   ++vcpu->stat.tlb_flush;
-   kvm_x86_ops->tlb_flush(vcpu, invalidate_gpa);
-}
-
 void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
 {
struct page *page = NULL;
-- 
2.7.4
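
As a reading aid for the record_steal_time() hunk above, here is a rough
userspace model of the preempted-flag handshake, written with C11 atomics
instead of the kernel's xchg(). Everything in it is an assumption made for
illustration: the flag values, the struct, and in particular the guest-side
helper, which is not part of this patch. It is a sketch of the idea, not the
KVM implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag values are assumptions for this sketch, not copied from the patch. */
#define VCPU_NOT_PREEMPTED 0u
#define VCPU_PREEMPTED     (1u << 0)
#define VCPU_SHOULD_FLUSH  (1u << 1)

/* Hypothetical stand-in for the shared steal-time area. */
struct steal_model {
	atomic_uint preempted;     /* shared between "host" and "guest" */
	unsigned int tlb_flushes;  /* counts flushes done on the guest's behalf */
};

static void steal_model_init(struct steal_model *st)
{
	assert(st != NULL);
	atomic_init(&st->preempted, VCPU_NOT_PREEMPTED);
	st->tlb_flushes = 0;
}

/* Host side: the vCPU is scheduled out, so mark it preempted. */
static void host_set_preempted(struct steal_model *st)
{
	atomic_fetch_or(&st->preempted, VCPU_PREEMPTED);
}

/*
 * Guest side (assumed, not in this patch): instead of sending an IPI to a
 * preempted vCPU, ask the host to flush when that vCPU runs again.
 * Returns true if the flush was deferred, false if an IPI is still needed.
 */
static bool guest_request_flush(struct steal_model *st)
{
	unsigned int expected = VCPU_PREEMPTED;

	return atomic_compare_exchange_strong(&st->preempted, &expected,
					      VCPU_PREEMPTED | VCPU_SHOULD_FLUSH);
}

/* Host side: the vCPU is about to run again; consume the flags atomically. */
static void host_record_steal_time(struct steal_model *st)
{
	unsigned int old = atomic_exchange(&st->preempted, VCPU_NOT_PREEMPTED);

	if (old == (VCPU_SHOULD_FLUSH | VCPU_PREEMPTED))
		st->tlb_flushes++;  /* stands in for kvm_vcpu_flush_tlb() */
}
```

The point of the exchange is that reading the old flags and clearing them
happen as one step, so a flush request that arrives just before the vCPU is
rescheduled is either seen by the host or rejected on the guest side.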

Re: [PATCH] pstore: use ktime_get_real_fast_ns() instead of __getnstimeofday()

2017-11-09 Thread Thomas Gleixner

On Thu, 9 Nov 2017, Kees Cook wrote:
> On Thu, Nov 9, 2017 at 4:46 PM, Thomas Gleixner  wrote:
> > On Fri, 10 Nov 2017, Arnd Bergmann wrote:
> >> On Fri, Nov 10, 2017 at 12:00 AM, Thomas Gleixner wrote:
> >> > Hmm, no. None of the regular accessor functions can be called from NMI
> >> > context safely.
> >>
> >> Right, that's what I mean: it must not get called from NMI context, but it
> >> currently is, at least for this case:
> >>
> >> NMI handler:
> >>   something bad
> >> panic()
> >>   kmsg_dump()
> >> pstore_dump()
> >>pstore_record_init()
> >>  __getnstimeofday()
> >>
> >> I should probably add that to the changelog text ;-)
> >
> > Indeed.
> 
> Er, so, is this safe to call there? I've had to fix this a few times
> now, so if using ktime_get_real_fast_ns() can be used here (and
> doesn't return 0) then this is easily an improvement over the existing
> "maybe read 0" case pstore has now.

ktime_get_real_fast_ns() is NMI safe and returns

  before timekeeping_suspend():   correct time

  after timekeeping_suspend():    timestamp which was frozen in
                                  timekeeping_suspend()

  after timekeeping_resume():     correct time

Thanks,

tglx

Re: [PATCH v17 5/6] vfio: ABI for mdev display dma-buf operation

2017-11-09 Thread Gerd Hoffmann

On Thu, Nov 09, 2017 at 01:54:57PM -0700, Alex Williamson wrote:
> On Thu, 9 Nov 2017 19:35:14 +0100
> Gerd Hoffmann  wrote:
> 
> >   Hi,
> > 
> > > struct vfio_device_gfx_plane_info lacks the head field we've been
> > > discussing.  Thanks,  
> > 
> > Adding multihead support turned out to not be that easy.  There are
> > corner cases like a single framebuffer spanning both heads.  Also it
> > would be useful to somehow hint to the guest which heads it should use.
> > 
> > In short:  Proper multihead support is more complex than just adding a
> > head field for later use.  So in a short private discussion with Tina we
> > came to the conclusion that it will be better to add multihead support to
> > the API when the first driver wants to use it, so we can actually test the
> > interface and make sure we didn't miss anything.  Adding an incomplete
> > multihead API now doesn't help anybody.
> 
> Do you think we can enable multi-head and preserve backwards
> compatibility within this API proposed here?

Yes, I think we can.  Adding new fields is possible thanks to the argsz
field at the start of the struct, so we can easily add the new fields (head,
framebuffer rectangle, whatever else is needed).  If the new fields are
not present the driver can simply assume head=0.

Does the driver set argsz too?  If so userspace can detect whether the
driver supports the multihead API extension (before going to probe for
head=1) that way.  If not we probably need an additional probe flag for
that.  But in any case I'm confident this is solvable.

Passing hints about the display configuration to the guest needs a new
ioctl, so we don't have compatibility issues there.

cheers,
  Gerd
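
To make the argsz argument concrete, here is a minimal userspace sketch of
how a driver could accept both the current struct and a later one that grows
a head field. The struct layouts and the function name are hypothetical and
are not the vfio_device_gfx_plane_info ABI; only the size-check pattern is
the point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical "today" layout: argsz first, no head field. */
struct plane_info_v1 {
	uint32_t argsz;
	uint32_t flags;
	uint32_t width;
	uint32_t height;
};

/* Hypothetical extended layout: same prefix, head appended at the end. */
struct plane_info_v2 {
	uint32_t argsz;
	uint32_t flags;
	uint32_t width;
	uint32_t height;
	uint32_t head;
};

/* Offset of the first byte after a member. */
#define END_OF(type, member) \
	(offsetof(type, member) + sizeof(((type *)0)->member))

/*
 * Driver side: work out which head the caller is asking about.
 * Old callers pass a short struct, so the driver falls back to head 0.
 * Returns 0 on success, -1 if argsz is too small even for the v1 layout.
 */
static int driver_requested_head(const void *arg, uint32_t *head)
{
	uint32_t argsz;

	assert(arg != NULL && head != NULL);
	memcpy(&argsz, arg, sizeof(argsz));

	if (argsz < sizeof(struct plane_info_v1))
		return -1;

	if (argsz >= END_OF(struct plane_info_v2, head))
		memcpy(head,
		       (const char *)arg + offsetof(struct plane_info_v2, head),
		       sizeof(*head));
	else
		*head = 0;

	return 0;
}
```

An old userspace binary passing the short struct gets head 0, and a new one
passing the longer struct gets the head it asked for, without a new ioctl.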

[PATCH V4 06/12] clk: sprd: add divider clock support

2017-11-09 Thread Chunyan Zhang

Divider support is also needed by the sprd composite clocks, so
provide a set of helpers that can be reused there.

Signed-off-by: Chunyan Zhang 
---
 drivers/clk/sprd/Makefile |   1 +
 drivers/clk/sprd/div.c| 100 ++
 drivers/clk/sprd/div.h|  79 
 3 files changed, 180 insertions(+)
 create mode 100644 drivers/clk/sprd/div.c
 create mode 100644 drivers/clk/sprd/div.h
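
Note for readers: the recalc_rate and set_rate helpers in the diff below boil
down to reading and rewriting a bit field described by shift and width. The
following standalone sketch shows that arithmetic in plain C. The function
names are made up for this note, and unlike the set_rate helper it also masks
the new value before merging it, which is a choice of this sketch rather than
something the patch does.

```c
#include <assert.h>
#include <stdint.h>

/* Mask covering `width` bits starting at bit `shift`. */
static uint32_t field_mask(unsigned int shift, unsigned int width)
{
	assert(width > 0 && shift + width <= 32);

	uint32_t low = (width == 32) ? UINT32_MAX
				     : ((UINT32_C(1) << width) - 1);
	return low << shift;
}

/* Extract the divider field, as the recalc_rate helper does. */
static uint32_t field_get(uint32_t reg, unsigned int shift, unsigned int width)
{
	return (reg & field_mask(shift, width)) >> shift;
}

/* Clear the field and merge in a new value, as the set_rate helper does. */
static uint32_t field_set(uint32_t reg, unsigned int shift,
			  unsigned int width, uint32_t val)
{
	uint32_t mask = field_mask(shift, width);

	return (reg & ~mask) | ((val << shift) & mask);
}
```

For example, with shift 8 and width 3 the mask is 0x700, so writing 5 into a
register holding 0xffffffff leaves 0xfffffdff and reads back as 5.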

diff --git a/drivers/clk/sprd/Makefile b/drivers/clk/sprd/Makefile
index cee36b5..80e6039 100644
--- a/drivers/clk/sprd/Makefile
+++ b/drivers/clk/sprd/Makefile
@@ -3,3 +3,4 @@ obj-$(CONFIG_SPRD_COMMON_CLK)   += clk-sprd.o
 clk-sprd-y += common.o
 clk-sprd-y += gate.o
 clk-sprd-y += mux.o
+clk-sprd-y += div.o
diff --git a/drivers/clk/sprd/div.c b/drivers/clk/sprd/div.c
new file mode 100644
index 000..3e08dcd
--- /dev/null
+++ b/drivers/clk/sprd/div.c
@@ -0,0 +1,100 @@
+/*
+ * Spreadtrum divider clock driver
+ *
+ * Copyright (C) 2017 Spreadtrum, Inc.
+ * Author: Chunyan Zhang 
+ *
+ * SPDX-License-Identifier: GPL-2.0
+ */
+
+#include 
+
+#include "div.h"
+
+DEFINE_SPINLOCK(sprd_div_lock);
+EXPORT_SYMBOL_GPL(sprd_div_lock);
+
+long sprd_div_helper_round_rate(struct sprd_clk_common *common,
+   const struct sprd_div_internal *div,
+   unsigned long rate,
+   unsigned long *parent_rate)
+{
+   return divider_round_rate(&common->hw, rate, parent_rate,
+ NULL, div->width, 0);
+}
+EXPORT_SYMBOL_GPL(sprd_div_helper_round_rate);
+
+static long sprd_div_round_rate(struct clk_hw *hw, unsigned long rate,
+   unsigned long *parent_rate)
+{
+   struct sprd_div *cd = hw_to_sprd_div(hw);
+
+   return sprd_div_helper_round_rate(&cd->common, &cd->div,
+ rate, parent_rate);
+}
+
+unsigned long sprd_div_helper_recalc_rate(struct sprd_clk_common *common,
+ const struct sprd_div_internal *div,
+ unsigned long parent_rate)
+{
+   unsigned long val;
+   unsigned int reg;
+
+   sprd_regmap_read(common->regmap, common->reg, &reg);
+   val = reg >> div->shift;
+   val &= (1 << div->width) - 1;
+
+   return divider_recalc_rate(&common->hw, parent_rate, val, NULL, 0);
+}
+EXPORT_SYMBOL_GPL(sprd_div_helper_recalc_rate);
+
+static unsigned long sprd_div_recalc_rate(struct clk_hw *hw,
+ unsigned long parent_rate)
+{
+   struct sprd_div *cd = hw_to_sprd_div(hw);
+
+   return sprd_div_helper_recalc_rate(&cd->common, &cd->div, parent_rate);
+}
+
+int sprd_div_helper_set_rate(const struct sprd_clk_common *common,
+const struct sprd_div_internal *div,
+unsigned long rate,
+unsigned long parent_rate)
+{
+   unsigned long flags;
+   unsigned long val;
+   unsigned int reg;
+
+   val = divider_get_val(rate, parent_rate, NULL,
+ div->width, 0);
+
+   spin_lock_irqsave(common->lock, flags);
+
+   sprd_regmap_read(common->regmap, common->reg, &reg);
+   reg &= ~GENMASK(div->width + div->shift - 1, div->shift);
+
+   sprd_regmap_write(common->regmap, common->reg,
+ reg | (val << div->shift));
+
+   spin_unlock_irqrestore(common->lock, flags);
+
+   return 0;
+
+}
+EXPORT_SYMBOL_GPL(sprd_div_helper_set_rate);
+
+static int sprd_div_set_rate(struct clk_hw *hw, unsigned long rate,
+unsigned long parent_rate)
+{
+   struct sprd_div *cd = hw_to_sprd_div(hw);
+
+   return sprd_div_helper_set_rate(&cd->common, &cd->div,
+   rate, parent_rate);
+}
+
+const struct clk_ops sprd_div_ops = {
+   .recalc_rate = sprd_div_recalc_rate,
+   .round_rate = sprd_div_round_rate,
+   .set_rate = sprd_div_set_rate,
+};
+EXPORT_SYMBOL_GPL(sprd_div_ops);
diff --git a/drivers/clk/sprd/div.h b/drivers/clk/sprd/div.h
new file mode 100644
index 000..fa47773
--- /dev/null
+++ b/drivers/clk/sprd/div.h
@@ -0,0 +1,79 @@
+/*
+ * Spreadtrum divider clock driver
+ *
+ * Copyright (C) 2017 Spreadtrum, Inc.
+ * Author: Chunyan Zhang 
+ *
+ * SPDX-License-Identifier: GPL-2.0
+ */
+
+#ifndef _SPRD_DIV_H_
+#define _SPRD_DIV_H_
+
+#include "common.h"
+
+/**
+ * struct sprd_div_internal - Internal divider description
+ * @shift: Bit offset of the divider in its register
+ * @width: Width of the divider field in its register
+ *
+ * That structure represents a single divider, and is meant to be
+ * embedded in other structures representing the various clock
+ * classes.
+ */
+struct sprd_div_internal {
+   u8  shift;
+   u8  width;
+};
+
+#define _SPRD_DIV_CLK(_shift, _width)  \
+   {

[PATCH V4 12/12] arm64: dts: add clocks for SC9860

2017-11-09 Thread Chunyan Zhang

Some clocks on SC9860 share an address area with syscon devices; those
clock nodes carry a 'sprd,syscon' property that refers to the syscon
device. The other clock nodes have a reg property that indicates their
address ranges.

Signed-off-by: Chunyan Zhang 
---
 arch/arm64/boot/dts/sprd/sc9860.dtsi | 115 +++
 arch/arm64/boot/dts/sprd/whale2.dtsi |   2 +-
 2 files changed, 116 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/boot/dts/sprd/sc9860.dtsi b/arch/arm64/boot/dts/sprd/sc9860.dtsi
index 7b7d8ce..bf03da4 100644
--- a/arch/arm64/boot/dts/sprd/sc9860.dtsi
+++ b/arch/arm64/boot/dts/sprd/sc9860.dtsi
@@ -7,6 +7,7 @@
  */
 
 #include 
+#include 
 #include "whale2.dtsi"
 
 / {
@@ -183,6 +184,120 @@
};
 
soc {
+   pmu_gate: pmu-gate {
+   compatible = "sprd,sc9860-pmu-gate";
+   sprd,syscon = <&pmu_regs>; /* 0x402b */
+   clocks = <&ext_26m>;
+   #clock-cells = <1>;
+   };
+
+   pll: pll {
+   compatible = "sprd,sc9860-pll";
+   sprd,syscon = <&ana_regs>; /* 0x4040 */
+   clocks = <&pmu_gate 0>;
+   #clock-cells = <1>;
+   };
+
+   ap_clk: clock-controller@2000 {
+   compatible = "sprd,sc9860-ap-clk";
+   reg = <0 0x2000 0 0x400>;
+   clocks = <&ext_26m>, <&pll 0>,
+<&pmu_gate 0>;
+   #clock-cells = <1>;
+   };
+
+   aon_prediv: aon-prediv {
+   compatible = "sprd,sc9860-aon-prediv";
+   reg = <0 0x402d 0 0x400>;
+   clocks = <&ext_26m>, <&pll 0>,
+<&pmu_gate 0>;
+   #clock-cells = <1>;
+   };
+
+   apahb_gate: apahb-gate {
+   compatible = "sprd,sc9860-apahb-gate";
+   sprd,syscon = <&ap_ahb_regs>; /* 0x2021 */
+   clocks = <&aon_prediv 0>;
+   #clock-cells = <1>;
+   };
+
+   aon_gate: aon-gate {
+   compatible = "sprd,sc9860-aon-gate";
+   sprd,syscon = <&aon_regs>; /* 0x402e */
+   clocks = <&aon_prediv 0>;
+   #clock-cells = <1>;
+   };
+
+   aonsecure_clk: clock-controller@4088 {
+   compatible = "sprd,sc9860-aonsecure-clk";
+   reg = <0 0x4088 0 0x400>;
+   clocks = <&ext_26m>, <&pll 0>;
+   #clock-cells = <1>;
+   };
+
+   agcp_gate: agcp-gate {
+   compatible = "sprd,sc9860-agcp-gate";
+   sprd,syscon = <&agcp_regs>; /* 0x415e */
+   clocks = <&aon_prediv 0>;
+   #clock-cells = <1>;
+   };
+
+   gpu_clk: clock-controller@6020 {
+   compatible = "sprd,sc9860-gpu-clk";
+   reg = <0 0x6020 0 0x400>;
+   clocks = <&pll 0>;
+   #clock-cells = <1>;
+   };
+
+   vsp_clk: clock-controller@6100 {
+   compatible = "sprd,sc9860-vsp-clk";
+   reg = <0 0x6100 0 0x400>;
+   clocks = <&ext_26m>, <&pll 0>;
+   #clock-cells = <1>;
+   };
+
+   vsp_gate: vsp-gate {
+   compatible = "sprd,sc9860-vsp-gate";
+   sprd,syscon = <&vsp_regs>; /* 0x6110 */
+   clocks = <&vsp_clk 0>;
+   #clock-cells = <1>;
+   };
+
+   cam_clk: clock-controller@6200 {
+   compatible = "sprd,sc9860-cam-clk";
+   reg = <0 0x6200 0 0x4000>;
+   clocks = <&ext_26m>, <&pll 0>;
+   #clock-cells = <1>;
+   };
+
+   cam_gate: cam-gate {
+   compatible = "sprd,sc9860-cam-gate";
+   sprd,syscon = <&cam_regs>; /* 0x6210 */
+   clocks = <&cam_clk 0>;
+   #clock-cells = <1>;
+   };
+
+   disp_clk: clock-controller@6300 {
+   compatible = "sprd,sc9860-disp-clk";
+   reg = <0 0x6300 0 0x400>;
+   clocks = <&ext_26m>, <&pll 0>;
+   #clock-cells = <1>;
+   };
+
+   disp_gate: disp-gate {
+   compatible = "sprd,sc9860-disp-gate";
+   sprd,syscon = <&disp_regs>; /* 0x6310 */
+

[PATCH V4 10/12] clk: sprd: add clocks support for SC9860

2017-11-09 Thread Chunyan Zhang

This patch adds the list of clocks for Spreadtrum's SC9860 SoC,
together with clock initialization code.

Signed-off-by: Chunyan Zhang 
---
 drivers/clk/sprd/Kconfig  |   10 +
 drivers/clk/sprd/Makefile |3 +
 drivers/clk/sprd/sc9860-clk.c | 1987 +
 3 files changed, 2000 insertions(+)
 create mode 100644 drivers/clk/sprd/sc9860-clk.c

diff --git a/drivers/clk/sprd/Kconfig b/drivers/clk/sprd/Kconfig
index 67a3287..8789247 100644
--- a/drivers/clk/sprd/Kconfig
+++ b/drivers/clk/sprd/Kconfig
@@ -2,3 +2,13 @@ config SPRD_COMMON_CLK
tristate "Clock support for Spreadtrum SoCs"
depends on ARCH_SPRD || COMPILE_TEST
default ARCH_SPRD
+
+if SPRD_COMMON_CLK
+
+# SoC Drivers
+
+config SPRD_SC9860_CLK
+   tristate "Support for the Spreadtrum SC9860 clocks"
+   depends on (ARM64 && ARCH_SPRD) || COMPILE_TEST
+   default ARM64 && ARCH_SPRD
+endif
diff --git a/drivers/clk/sprd/Makefile b/drivers/clk/sprd/Makefile
index d693969..b0d81e5 100644
--- a/drivers/clk/sprd/Makefile
+++ b/drivers/clk/sprd/Makefile
@@ -6,3 +6,6 @@ clk-sprd-y  += mux.o
 clk-sprd-y += div.o
 clk-sprd-y += composite.o
 clk-sprd-y += pll.o
+
+## SoC support
+obj-$(CONFIG_SPRD_SC9860_CLK)  += sc9860-clk.o
diff --git a/drivers/clk/sprd/sc9860-clk.c b/drivers/clk/sprd/sc9860-clk.c
new file mode 100644
index 000..caf7194
--- /dev/null
+++ b/drivers/clk/sprd/sc9860-clk.c
@@ -0,0 +1,1987 @@
+/*
+ * Spreadtrum SC9860 clock driver
+ *
+ * Copyright (C) 2017 Spreadtrum, Inc.
+ * Author: Chunyan Zhang 
+ *
+ * SPDX-License-Identifier: GPL-2.0
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#include "common.h"
+#include "composite.h"
+#include "div.h"
+#include "gate.h"
+#include "mux.h"
+#include "pll.h"
+
+static CLK_FIXED_RATE(ext_rco_100m, "ext-rco-100m", 0, 1, 0);
+static CLK_FIXED_RATE(ext_32k, "ext-32k", 0, 32768, 0);
+
+static CLK_FIXED_FACTOR(fac_4m,"fac-4m",   "ext-26m",
+   6, 1, 0);
+static CLK_FIXED_FACTOR(fac_2m,"fac-2m",   "ext-26m",
+   13, 1, 0);
+static CLK_FIXED_FACTOR(fac_1m,"fac-1m",   "ext-26m",
+   26, 1, 0);
+static CLK_FIXED_FACTOR(fac_250k,  "fac-250k", "ext-26m",
+   104, 1, 0);
+static CLK_FIXED_FACTOR(fac_rpll0_26m, "rpll0-26m","ext-26m",
+   1, 1, 0);
+static CLK_FIXED_FACTOR(fac_rpll1_26m, "rpll1-26m","ext-26m",
+   1, 1, 0);
+static CLK_FIXED_FACTOR(fac_rco_25m,   "rco-25m",  "ext-rc0-100m",
+   4, 1, 0);
+static CLK_FIXED_FACTOR(fac_rco_4m,"rco-4m",   "ext-rc0-100m",
+   25, 1, 0);
+static CLK_FIXED_FACTOR(fac_rco_2m,"rco-2m",   "ext-rc0-100m",
+   50, 1, 0);
+static CLK_FIXED_FACTOR(fac_3k2,   "fac-3k2",  "ext-32k",
+   10, 1, 0);
+static CLK_FIXED_FACTOR(fac_1k,"fac-1k",   "ext-32k",
+   32, 1, 0);
+
+static SPRD_SC_GATE_CLK(mpll0_gate,"mpll0-gate",   "ext-26m", 0xb0,
+0x1000, BIT(2), CLK_IGNORE_UNUSED, 0);
+static SPRD_SC_GATE_CLK(mpll1_gate,"mpll1-gate",   "ext-26m", 0xb0,
+0x1000, BIT(18), CLK_IGNORE_UNUSED, 0);
+static SPRD_SC_GATE_CLK(dpll0_gate,"dpll0-gate",   "ext-26m", 0xb4,
+0x1000, BIT(2), CLK_IGNORE_UNUSED, 0);
+static SPRD_SC_GATE_CLK(dpll1_gate,"dpll1-gate",   "ext-26m", 0xb4,
+0x1000, BIT(18), CLK_IGNORE_UNUSED, 0);
+static SPRD_SC_GATE_CLK(ltepll0_gate,  "ltepll0-gate", "ext-26m", 0xb8,
+0x1000, BIT(2), CLK_IGNORE_UNUSED, 0);
+static SPRD_SC_GATE_CLK(twpll_gate,"twpll-gate",   "ext-26m", 0xbc,
+0x1000, BIT(2), CLK_IGNORE_UNUSED, 0);
+static SPRD_SC_GATE_CLK(ltepll1_gate,  "ltepll1-gate", "ext-26m", 0x10c,
+0x1000, BIT(2), CLK_IGNORE_UNUSED, 0);
+static SPRD_SC_GATE_CLK(rpll0_gate,"rpll0-gate",   "ext-26m", 0x16c,
+0x1000, BIT(2), 0, 0);
+static SPRD_SC_GATE_CLK(rpll1_gate,"rpll1-gate",   "ext-26m", 0x16c,
+0x1000, BIT(18), 0, 0);
+static SPRD_SC_GATE_CLK(cppll_gate,"cppll-gate",   "ext-26m", 0x2b4,
+0x1000, BIT(2), CLK_IGNORE_UNUSED, 0);
+static SPRD_SC_GATE_CLK(gpll_gate, "gpll-gate","ext-26m", 0x32c,
+   0x1000, BIT(0), CLK_IGNORE_UNUSED, CLK_GATE_SET_TO_DISABLE);
+
+static struct sprd_clk_common *sc9860_pmu_gate_clks[] = {
+   /* address base is 0x402b */
+   &mpll0_gate.common,
+   &mpll1_gate.common,
+   &dpll0_gate.common,
+   &dpll1_gate.common,
+

[PATCH V4 09/12] clk: sprd: Add dt-bindings include file for SC9860

2017-11-09 Thread Chunyan Zhang

This file defines all SC9860 clock indexes; it should be included in any
device tree that has a device using these clocks.

Signed-off-by: Chunyan Zhang 
---
 include/dt-bindings/clock/sprd,sc9860-clk.h | 408 
 1 file changed, 408 insertions(+)
 create mode 100644 include/dt-bindings/clock/sprd,sc9860-clk.h

diff --git a/include/dt-bindings/clock/sprd,sc9860-clk.h b/include/dt-bindings/clock/sprd,sc9860-clk.h
new file mode 100644
index 000..48e6052
--- /dev/null
+++ b/include/dt-bindings/clock/sprd,sc9860-clk.h
@@ -0,0 +1,408 @@
+/*
+ * Spreadtrum SC9860 platform clocks
+ *
+ * Copyright (C) 2017, Spreadtrum Communications Inc.
+ *
+ * SPDX-License-Identifier: (GPL-2.0+ OR MIT)
+ */
+
+#ifndef _DT_BINDINGS_CLK_SC9860_H_
+#define _DT_BINDINGS_CLK_SC9860_H_
+
+#define CLK_EXT_RCO_100M    0
+#define CLK_EXT_32K         1
+#define CLK_FAC_4M          2
+#define CLK_FAC_2M          3
+#define CLK_FAC_1M          4
+#define CLK_FAC_250K        5
+#define CLK_FAC_RPLL0_26M   6
+#define CLK_FAC_RPLL1_26M   7
+#define CLK_FAC_RCO25M      8
+#define CLK_FAC_RCO4M       9
+#define CLK_FAC_RCO2M       10
+#define CLK_FAC_3K2         11
+#define CLK_FAC_1K          12
+#define CLK_MPLL0_GATE      13
+#define CLK_MPLL1_GATE      14
+#define CLK_DPLL0_GATE      15
+#define CLK_DPLL1_GATE      16
+#define CLK_LTEPLL0_GATE    17
+#define CLK_TWPLL_GATE      18
+#define CLK_LTEPLL1_GATE    19
+#define CLK_RPLL0_GATE      20
+#define CLK_RPLL1_GATE      21
+#define CLK_CPPLL_GATE      22
+#define CLK_GPLL_GATE       23
+#define CLK_PMU_GATE_NUM   (CLK_GPLL_GATE + 1)
+
+#define CLK_MPLL0           0
+#define CLK_MPLL1           1
+#define CLK_DPLL0           2
+#define CLK_DPLL1           3
+#define CLK_RPLL0           4
+#define CLK_RPLL1           5
+#define CLK_TWPLL           6
+#define CLK_LTEPLL0         7
+#define CLK_LTEPLL1         8
+#define CLK_GPLL            9
+#define CLK_CPPLL           10
+#define CLK_GPLL_42M5       11
+#define CLK_TWPLL_768M      12
+#define CLK_TWPLL_384M      13
+#define CLK_TWPLL_192M      14
+#define CLK_TWPLL_96M       15
+#define CLK_TWPLL_48M       16
+#define CLK_TWPLL_24M       17
+#define CLK_TWPLL_12M       18
+#define CLK_TWPLL_512M      19
+#define CLK_TWPLL_256M      20
+#define CLK_TWPLL_128M      21
+#define CLK_TWPLL_64M       22
+#define CLK_TWPLL_307M2     23
+#define CLK_TWPLL_153M6     24
+#define CLK_TWPLL_76M8      25
+#define CLK_TWPLL_51M2      26
+#define CLK_TWPLL_38M4      27
+#define CLK_TWPLL_19M2      28
+#define CLK_L0_614M4        29
+#define CLK_L0_409M6        30
+#define CLK_L0_38M          31
+#define CLK_L1_38M          32
+#define CLK_RPLL0_192M      33
+#define CLK_RPLL0_96M       34
+#define CLK_RPLL0_48M       35
+#define CLK_RPLL1_468M      36
+#define CLK_RPLL1_192M      37
+#define CLK_RPLL1_96M       38
+#define CLK_RPLL1_64M       39
+#define CLK_RPLL1_48M       40
+#define CLK_DPLL0_50M       41
+#define CLK_DPLL1_50M       42
+#define CLK_CPPLL_50M       43
+#define CLK_M0_39M          44
+#define CLK_M1_63M          45
+#define CLK_PLL_NUM(CLK_M1_63M + 1)
+
+
+#define CLK_AP_APB          0
+#define CLK_AP_USB3         1
+#define CLK_UART0           2
+#define CLK_UART1           3
+#define CLK_UART2           4
+#define CLK_UART3           5
+#define CLK_UART4           6
+#define CLK_I2C0            7
+#define CLK_I2C1            8
+#define CLK_I2C2            9
+#define CLK_I2C3            10
+#define CLK_I2C4            11
+#define CLK_I2C5            12
+#define CLK_SPI0            13
+#define CLK_SPI1            14
+#define CLK_SPI2            15
+#define CLK_SPI3            16
+#define CLK_IIS0            17
+#define CLK_IIS1            18
+#define CLK_IIS2            19
+#define CLK_IIS3            20
+#define CLK_AP_CLK_NUM (CLK_IIS3 + 1)
+
+#define CLK_AON_APB         0
+#define CLK_AUX0            1
+#define CLK_AUX1            2
+#define CLK_AUX2

Re: [RFC PATCH 0/2] apply write hints to select the type of segments

2017-11-09 Thread Chao Yu

On 2017/11/10 8:23, Hyunchul Lee wrote:
> Hello, Chao
> 
> On 11/09/2017 06:12 PM, Chao Yu wrote:
>> On 2017/11/9 13:51, Hyunchul Lee wrote:
>>> From: Hyunchul Lee 
>>>
>>> Using write hints[1], applications can inform the kernel of the lifetime
>>> of the data written to devices, and [2] reported that the write hints
>>> patch decreased writes in NAND by 25%.
>>>
>>> These hints help F2FS to determine the following:
>>>   1) the segment types where the data will be written.
>>>   2) the hints that will be passed down to devices with the data of segments.
>>>
>>> This patch set implements the first mapping, from write hints to segment
>>> types, as shown below.
>>>
>>>   hints               segment type
>>>   -----               ------------
>>>   WRITE_LIFE_SHORT    CURSEG_COLD_DATA
>>>   WRITE_LIFE_EXTREME  CURSEG_HOT_DATA
>>>   others              CURSEG_WARM_DATA
>>>
>>> The F2FS policy for hot/cold separation takes precedence over these hints,
>>> and hints are not applied to in-place updates.
>>
>> Could we disable IPU if the file/inode has a write hint?
>>
> 
> I am afraid that this has side effects. For example, this could cause
> out-of-place updates even when there are not enough free segments.
> I can write a patch that handles these situations. But I wonder whether
> this is required, and I am not sure which IPU policies can be disabled.

Oh, as I replied in another thread, I think IPU just affects the filesystem's
hot/cold separation, rather than this feature. So I think it will be okay
to not consider it.

> 
>>>
>>> Until the second mapping is implemented, write hints are not passed down
>>> to devices, because it is better for all the data of a segment to have
>>> the same hint.
>>>
>>> [1]: c75b1d9421f80f4143e389d2d50ddfc8a28c8c35
>>> [2]: https://lwn.net/Articles/726477/
>>
>> Could you write a patch to support passing write hints to the block layer
>> for buffered writes, as in the commit below:
>> 0127251c45ae ("ext4: add support for passing in write hints for buffered writes")
>>
> 
> Sure I will. I wrote it already ;)

Cool, ;)

> I think that data from the same segment should be passed down with the same
> hint, and the following mapping is reasonable. I wonder what your opinion
> about it is.
> 
>   segment type       hints
>   ------------       -----
>   CURSEG_COLD_DATA   WRITE_LIFE_EXTREME
>   CURSEG_HOT_DATA    WRITE_LIFE_SHORT
>   CURSEG_COLD_NODE   WRITE_LIFE_NORMAL

We have WRITE_LIFE_LONG defined rather than WRITE_LIFE_NORMAL in fs.h?

>   CURSEG_HOT_NODE    WRITE_LIFE_MEDIUM

As far as I know, in the cell phone scenario, data of meta_inode is hottest,
then hot data, then warm node, and cold node should be coldest. So I suggest
we define the mapping as below:

META_DATA               WRITE_LIFE_SHORT
HOT_DATA & WARM_NODE    WRITE_LIFE_MEDIUM
HOT_NODE & WARM_DATA    WRITE_LIFE_LONG
COLD_NODE & COLD_DATA   WRITE_LIFE_EXTREME
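
Written as code, the mapping suggested in the table above could look roughly
like the sketch below. The enums are local to this example and only mirror
the names used in this thread; they are not the f2fs or fs.h definitions, and
the function name is hypothetical.

```c
#include <assert.h>

/* Sketch-local segment types, named after the ones discussed here. */
enum seg_type {
	SEG_HOT_DATA,
	SEG_WARM_DATA,
	SEG_COLD_DATA,
	SEG_HOT_NODE,
	SEG_WARM_NODE,
	SEG_COLD_NODE,
	SEG_META,
	SEG_TYPE_COUNT
};

/* Sketch-local lifetime hints, ordered from shortest to longest lifetime. */
enum life_hint {
	LIFE_NOT_SET,
	LIFE_NONE,
	LIFE_SHORT,
	LIFE_MEDIUM,
	LIFE_LONG,
	LIFE_EXTREME
};

/* Map a segment type to the hint passed down with that segment's data. */
static enum life_hint seg_type_to_hint(enum seg_type type)
{
	assert((int)type >= 0 && (int)type < SEG_TYPE_COUNT);

	switch (type) {
	case SEG_META:
		return LIFE_SHORT;	/* hottest */
	case SEG_HOT_DATA:
	case SEG_WARM_NODE:
		return LIFE_MEDIUM;
	case SEG_HOT_NODE:
	case SEG_WARM_DATA:
		return LIFE_LONG;
	case SEG_COLD_NODE:
	case SEG_COLD_DATA:
		return LIFE_EXTREME;	/* coldest */
	default:
		return LIFE_NOT_SET;
	}
}
```

Keeping the mapping in one function means every block of a given segment
type goes down with the same hint, which is the property asked for earlier
in this thread.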

Thanks,

>   others WRITE_LIFE_NONE
>  
>> Thanks,
>>
>>>
>>> Hyunchul Lee (2):
>>>   f2fs: apply write hints to select the type of segments for buffered
>>> write
>>>   f2fs: apply write hints to select the type of segment for direct write
>>>
>>>  fs/f2fs/data.c| 101 
>>> --
>>>  fs/f2fs/f2fs.h|   1 +
>>>  fs/f2fs/segment.c |  14 +++-
>>>  3 files changed, 74 insertions(+), 42 deletions(-)
>>>
>>
>>
> 
> Thanks
> 
> .
>

[PATCH V4 11/12] arm64: dts: add syscon for whale2 platform

2017-11-09 Thread Chunyan Zhang

Some clocks on SC9860 share an address area with syscon devices;
the definitions of those clocks in the DT will reference the
proper syscon node.

Signed-off-by: Chunyan Zhang 
---
 arch/arm64/boot/dts/sprd/whale2.dtsi | 46 +++-
 1 file changed, 45 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/boot/dts/sprd/whale2.dtsi b/arch/arm64/boot/dts/sprd/whale2.dtsi
index 7c217c5..6ea3a75 100644
--- a/arch/arm64/boot/dts/sprd/whale2.dtsi
+++ b/arch/arm64/boot/dts/sprd/whale2.dtsi
@@ -17,6 +17,51 @@
#size-cells = <2>;
ranges;
 
+   ap_ahb_regs: syscon@2021 {
+   compatible = "syscon";
+   reg = <0 0x2021 0 0x1>;
+   };
+
+   pmu_regs: syscon@402b {
+   compatible = "syscon";
+   reg = <0 0x402b 0 0x1>;
+   };
+
+   aon_regs: syscon@402e {
+   compatible = "syscon";
+   reg = <0 0x402e 0 0x1>;
+   };
+
+   ana_regs: syscon@4040 {
+   compatible = "syscon";
+   reg = <0 0x4040 0 0x1>;
+   };
+
+   agcp_regs: syscon@415e {
+   compatible = "syscon";
+   reg = <0 0x415e 0 0x100>;
+   };
+
+   vsp_regs: syscon@6110 {
+   compatible = "syscon";
+   reg = <0 0x6110 0 0x1>;
+   };
+
+   cam_regs: syscon@6210 {
+   compatible = "syscon";
+   reg = <0 0x6210 0 0x1>;
+   };
+
+   disp_regs: syscon@6310 {
+   compatible = "syscon";
+   reg = <0 0x6310 0 0x1>;
+   };
+
+   ap_apb_regs: syscon@70b0 {
+   compatible = "syscon";
+   reg = <0 0x70b0 0 0x4>;
+   };
+
ap-apb {
compatible = "simple-bus";
#address-cells = <1>;
@@ -59,7 +104,6 @@
status = "disabled";
};
};
-
};
 
ext_26m: ext-26m {
-- 
2.7.4

[PATCH v2] checkpatch: Fix checks for Kconfig help text

2017-11-09 Thread Leo Yan

If a patch has a Kconfig section, the script variable '$is_start' is
set by the first 'config' line and the variable '$is_end' is set by
the second 'config' line. But patches often have only one 'config'
line, so '$is_end' is never set; as a result the condition below never
holds and the check for the Kconfig description is skipped:

if ($is_start && $is_end && $length < $min_conf_desc_length) {
..
}

By the time the script reaches this condition, the Kconfig section
has been parsed completely, whether or not '$is_end' is true. So
remove '$is_end' from the condition.

Another change lowers '$min_conf_desc_length' from 4 to 1, so that the
check passes if the Kconfig description has at least one line.

Signed-off-by: Leo Yan 
---
 scripts/checkpatch.pl | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
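
Reading aid: the behaviour change described in the changelog can be modelled
in a few lines. The sketch below is C rather than Perl and only mirrors the
two boolean conditions; the function names are made up, and it does not
reproduce how checkpatch.pl counts help-text lines.

```c
#include <assert.h>
#include <stdbool.h>

/* Old condition: warn only if a second 'config' line was also seen. */
static bool warns_old(bool is_start, bool is_end, int length, int min_len)
{
	assert(length >= 0 && min_len >= 0);
	return is_start && is_end && length < min_len;
}

/* New condition: one 'config' line is enough to run the length check. */
static bool warns_new(bool is_start, int length, int min_len)
{
	assert(length >= 0 && min_len >= 0);
	return is_start && length < min_len;
}
```

With a single 'config' entry and an empty help text, the old condition stays
silent because is_end is false, while the new one warns; a one-line help text
passes the new check because the minimum is now 1.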

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 3453df9..ba724b0 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -51,7 +51,7 @@ my $configuration_file = ".checkpatch.conf";
 my $max_line_length = 80;
 my $ignore_perl_version = 0;
 my $minimum_perl_version = 5.10.0;
-my $min_conf_desc_length = 4;
+my $min_conf_desc_length = 1;
 my $spelling_file = "$D/spelling.txt";
 my $codespell = 0;
 my $codespellfile = "/usr/share/codespell/dictionary.txt";
@@ -2796,7 +2796,7 @@ sub process {
}
$length++;
}
-   if ($is_start && $is_end && $length < $min_conf_desc_length) {
+   if ($is_start && $length < $min_conf_desc_length) {
WARN("CONFIG_DESCRIPTION",
 "please write a paragraph that describes the config symbol fully\n" . $herecurr);
}
-- 
2.7.4

[PATCH V4 07/12] clk: sprd: add composite clock support

2017-11-09 Thread Chunyan Zhang

This patch introduces a composite clock driver for Spreadtrum's SoCs.
The composite clock simply combines the functions of the divider
and mux clocks.

Signed-off-by: Chunyan Zhang 
---
 drivers/clk/sprd/Makefile|  1 +
 drivers/clk/sprd/composite.c | 65 
 drivers/clk/sprd/composite.h | 55 +
 3 files changed, 121 insertions(+)
 create mode 100644 drivers/clk/sprd/composite.c
 create mode 100644 drivers/clk/sprd/composite.h

diff --git a/drivers/clk/sprd/Makefile b/drivers/clk/sprd/Makefile
index 80e6039..2262e76 100644
--- a/drivers/clk/sprd/Makefile
+++ b/drivers/clk/sprd/Makefile
@@ -4,3 +4,4 @@ clk-sprd-y  += common.o
 clk-sprd-y += gate.o
 clk-sprd-y += mux.o
 clk-sprd-y += div.o
+clk-sprd-y += composite.o
diff --git a/drivers/clk/sprd/composite.c b/drivers/clk/sprd/composite.c
new file mode 100644
index 000..30d5b36
--- /dev/null
+++ b/drivers/clk/sprd/composite.c
@@ -0,0 +1,65 @@
+/*
+ * Spreadtrum composite clock driver
+ *
+ * Copyright (C) 2017 Spreadtrum, Inc.
+ * Author: Chunyan Zhang 
+ *
+ * SPDX-License-Identifier: GPL-2.0
+ */
+
+#include 
+
+#include "composite.h"
+
+DEFINE_SPINLOCK(sprd_comp_lock);
+EXPORT_SYMBOL_GPL(sprd_comp_lock);
+
+static long sprd_comp_round_rate(struct clk_hw *hw, unsigned long rate,
+   unsigned long *parent_rate)
+{
+   struct sprd_comp *cc = hw_to_sprd_comp(hw);
+
+   return sprd_div_helper_round_rate(&cc->common, &cc->div,
+rate, parent_rate);
+}
+
+static unsigned long sprd_comp_recalc_rate(struct clk_hw *hw,
+ unsigned long parent_rate)
+{
+   struct sprd_comp *cc = hw_to_sprd_comp(hw);
+
+   return sprd_div_helper_recalc_rate(&cc->common, &cc->div, parent_rate);
+}
+
+static int sprd_comp_set_rate(struct clk_hw *hw, unsigned long rate,
+unsigned long parent_rate)
+{
+   struct sprd_comp *cc = hw_to_sprd_comp(hw);
+
+   return sprd_div_helper_set_rate(&cc->common, &cc->div,
+  rate, parent_rate);
+}
+
+static u8 sprd_comp_get_parent(struct clk_hw *hw)
+{
+   struct sprd_comp *cc = hw_to_sprd_comp(hw);
+
+   return sprd_mux_helper_get_parent(&cc->common, &cc->mux);
+}
+
+static int sprd_comp_set_parent(struct clk_hw *hw, u8 index)
+{
+   struct sprd_comp *cc = hw_to_sprd_comp(hw);
+
+   return sprd_mux_helper_set_parent(&cc->common, &cc->mux, index);
+}
+
+const struct clk_ops sprd_comp_ops = {
+   .get_parent = sprd_comp_get_parent,
+   .set_parent = sprd_comp_set_parent,
+
+   .round_rate = sprd_comp_round_rate,
+   .recalc_rate= sprd_comp_recalc_rate,
+   .set_rate   = sprd_comp_set_rate,
+};
+EXPORT_SYMBOL_GPL(sprd_comp_ops);
diff --git a/drivers/clk/sprd/composite.h b/drivers/clk/sprd/composite.h
new file mode 100644
index 000..a9bd68d
--- /dev/null
+++ b/drivers/clk/sprd/composite.h
@@ -0,0 +1,55 @@
+/*
+ * Spreadtrum composite clock driver
+ *
+ * Copyright (C) 2017 Spreadtrum, Inc.
+ * Author: Chunyan Zhang 
+ *
+ * SPDX-License-Identifier: GPL-2.0
+ */
+
+#ifndef _SPRD_COMPOSITE_H_
+#define _SPRD_COMPOSITE_H_
+
+#include "common.h"
+#include "mux.h"
+#include "div.h"
+
+struct sprd_comp {
+   struct sprd_mux_ssel    mux;
+   struct sprd_div_internal        div;
+   struct sprd_clk_common  common;
+};
+
+#define SPRD_COMP_CLK_TABLE(_struct, _name, _parent, _reg, _table, \
+   _mshift, _mwidth, _dshift, _dwidth, _flags) \
+   struct sprd_comp _struct = {\
+   .mux= _SPRD_MUX_CLK(_mshift, _mwidth, _table),  \
+   .div= _SPRD_DIV_CLK(_dshift, _dwidth),  \
+   .common = { \
+   .regmap = NULL, \
+   .reg= _reg, \
+   .lock   = &sprd_comp_lock,  \
+   .hw.init = CLK_HW_INIT_PARENTS(_name,   \
+  _parent, \
+  &sprd_comp_ops,  \
+  _flags), \
+}  \
+   }
+
+#define SPRD_COMP_CLK(_struct, _name, _parent, _reg, _mshift,  \
+   _mwidth, _dshift, _dwidth, _flags)  \
+   SPRD_COMP_CLK_TABLE(_struct, _name, _parent, _reg,  \
+   NULL, _mshift, _mwidth, \
+   _dshift, _dwidth, _flags)
+
+static inline struct sprd_comp *hw_to_sprd_comp(const struct clk_hw *hw)
+{
+   struct sprd_clk_common *common = hw_to_sprd_clk_common(hw);

[PATCH V4 02/12] dt-bindings: Add Spreadtrum clock binding documentation

2017-11-09 Thread Chunyan Zhang

Introduce a new binding with its documentation for Spreadtrum clock
sub-framework.

Signed-off-by: Chunyan Zhang 
---
 Documentation/devicetree/bindings/clock/sprd.txt | 63 
 1 file changed, 63 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/clock/sprd.txt

diff --git a/Documentation/devicetree/bindings/clock/sprd.txt b/Documentation/devicetree/bindings/clock/sprd.txt
new file mode 100644
index 000..e9d179e
--- /dev/null
+++ b/Documentation/devicetree/bindings/clock/sprd.txt
@@ -0,0 +1,63 @@
+Spreadtrum Clock Binding
+
+
+Required properties:
+- compatible: should contain the following compatible strings:
+   - "sprd,sc9860-pmu-gate"
+   - "sprd,sc9860-pll"
+   - "sprd,sc9860-ap-clk"
+   - "sprd,sc9860-aon-prediv"
+   - "sprd,sc9860-apahb-gate"
+   - "sprd,sc9860-aon-gate"
+   - "sprd,sc9860-aonsecure-clk"
+   - "sprd,sc9860-agcp-gate"
+   - "sprd,sc9860-gpu-clk"
+   - "sprd,sc9860-vsp-clk"
+   - "sprd,sc9860-vsp-gate"
+   - "sprd,sc9860-cam-clk"
+   - "sprd,sc9860-cam-gate"
+   - "sprd,sc9860-disp-clk"
+   - "sprd,sc9860-disp-gate"
+   - "sprd,sc9860-apapb-gate"
+
+- #clock-cells: must be 1
+
+- clocks : Should be the input parent clock(s) phandle for the clock. This
+  property simply shows which clock group the clocks'
+  parents are in, since each clk node represents many clocks
+  which are defined in the driver.  The detailed dependency
+  relationship (i.e. how many parents there are and which ones)
+  is implemented in driver code.
+
+Optional properties:
+
+- reg: Contains the registers' base address and length. It must be configured
+   only if there is no 'sprd,syscon' property under the node.
+
+- sprd,syscon: phandle to the syscon which is in the same address area as
+  the clock, so that we can get the regmap for the clocks from
+  the syscon device.
+
+Example:
+
+   pmu_gate: pmu-gate {
+   compatible = "sprd,sc9860-pmu-gate";
+   sprd,syscon = <&pmu_regs>;
+   clocks = <&ext_26m>;
+   #clock-cells = <1>;
+   };
+
+   pll: pll {
+   compatible = "sprd,sc9860-pll";
+   sprd,syscon = <&ana_regs>;
+   clocks = <&pmu_gate 0>;
+   #clock-cells = <1>;
+   };
+
+   ap_clk: clock-controller@2000 {
+   compatible = "sprd,sc9860-ap-clk";
+   reg = <0 0x2000 0 0x400>;
+   clocks = <&ext_26m>, <&pll 0>,
+<&pmu_gate 0>;
+   #clock-cells = <1>;
+   };
-- 
2.7.4

[PATCH V4 03/12] clk: sprd: Add common infrastructure

2017-11-09 Thread Chunyan Zhang

Add Spreadtrum's clock driver framework together with common
structures and interface functions.

Signed-off-by: Chunyan Zhang 
---
 drivers/clk/Kconfig   |   1 +
 drivers/clk/Makefile  |   1 +
 drivers/clk/sprd/Kconfig  |   4 ++
 drivers/clk/sprd/Makefile |   3 ++
 drivers/clk/sprd/common.c | 113 ++
 drivers/clk/sprd/common.h |  54 ++
 6 files changed, 176 insertions(+)
 create mode 100644 drivers/clk/sprd/Kconfig
 create mode 100644 drivers/clk/sprd/Makefile
 create mode 100644 drivers/clk/sprd/common.c
 create mode 100644 drivers/clk/sprd/common.h

diff --git a/drivers/clk/Kconfig b/drivers/clk/Kconfig
index 1c4e1aa..ce1a32be 100644
--- a/drivers/clk/Kconfig
+++ b/drivers/clk/Kconfig
@@ -236,6 +236,7 @@ source "drivers/clk/mvebu/Kconfig"
 source "drivers/clk/qcom/Kconfig"
 source "drivers/clk/renesas/Kconfig"
 source "drivers/clk/samsung/Kconfig"
+source "drivers/clk/sprd/Kconfig"
 source "drivers/clk/sunxi-ng/Kconfig"
 source "drivers/clk/tegra/Kconfig"
 source "drivers/clk/ti/Kconfig"
diff --git a/drivers/clk/Makefile b/drivers/clk/Makefile
index c99f363..fa33891 100644
--- a/drivers/clk/Makefile
+++ b/drivers/clk/Makefile
@@ -84,6 +84,7 @@ obj-$(CONFIG_COMMON_CLK_SAMSUNG)  += samsung/
 obj-$(CONFIG_ARCH_SIRF)+= sirf/
 obj-$(CONFIG_ARCH_SOCFPGA) += socfpga/
 obj-$(CONFIG_PLAT_SPEAR)   += spear/
+obj-$(CONFIG_ARCH_SPRD)+= sprd/
 obj-$(CONFIG_ARCH_STI) += st/
 obj-$(CONFIG_ARCH_SUNXI)   += sunxi/
 obj-$(CONFIG_ARCH_SUNXI)   += sunxi-ng/
diff --git a/drivers/clk/sprd/Kconfig b/drivers/clk/sprd/Kconfig
new file mode 100644
index 000..67a3287
--- /dev/null
+++ b/drivers/clk/sprd/Kconfig
@@ -0,0 +1,4 @@
+config SPRD_COMMON_CLK
+   tristate "Clock support for Spreadtrum SoCs"
+   depends on ARCH_SPRD || COMPILE_TEST
+   default ARCH_SPRD
diff --git a/drivers/clk/sprd/Makefile b/drivers/clk/sprd/Makefile
new file mode 100644
index 000..74f4b80
--- /dev/null
+++ b/drivers/clk/sprd/Makefile
@@ -0,0 +1,3 @@
+obj-$(CONFIG_SPRD_COMMON_CLK)  += clk-sprd.o
+
+clk-sprd-y += common.o
diff --git a/drivers/clk/sprd/common.c b/drivers/clk/sprd/common.c
new file mode 100644
index 000..c003f09
--- /dev/null
+++ b/drivers/clk/sprd/common.c
@@ -0,0 +1,113 @@
+/*
+ * Spreadtrum clock infrastructure
+ *
+ * Copyright (C) 2017 Spreadtrum, Inc.
+ * Author: Chunyan Zhang 
+ *
+ * SPDX-License-Identifier: GPL-2.0
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "common.h"
+
+static const struct regmap_config sprdclk_regmap_config = {
+   .reg_bits   = 32,
+   .reg_stride = 4,
+   .val_bits   = 32,
+   .max_register   = 0x,
+   .fast_io= true,
+};
+
+static void sprd_clk_set_regmap(const struct sprd_clk_desc *desc,
+struct regmap *regmap)
+{
+   int i;
+   struct sprd_clk_common *cclk;
+
+   for (i = 0; i < desc->num_clk_clks; i++) {
+   cclk = desc->clk_clks[i];
+   if (!cclk)
+   continue;
+
+   cclk->regmap = regmap;
+   }
+}
+
+int sprd_clk_regmap_init(struct platform_device *pdev,
+const struct sprd_clk_desc *desc)
+{
+   void __iomem *base;
+   struct device_node *node = pdev->dev.of_node;
+   struct regmap *regmap = NULL;
+
+   if (of_find_property(node, "sprd,syscon", NULL)) {
+   regmap = syscon_regmap_lookup_by_phandle(node, "sprd,syscon");
+   if (IS_ERR(regmap)) {
+   pr_err("%s: failed to get syscon regmap\n", __func__);
+   return PTR_ERR(regmap);
+   }
+   } else {
+   base = of_iomap(node, 0);
+   regmap = devm_regmap_init_mmio(&pdev->dev, base,
+  &sprdclk_regmap_config);
+   if (IS_ERR(regmap)) {
+   pr_err("failed to init regmap.\n");
+   return PTR_ERR(regmap);
+   }
+   }
+
+   sprd_clk_set_regmap(desc, regmap);
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(sprd_clk_regmap_init);
+
+int sprd_clk_probe(struct device_node *node,
+  struct clk_hw_onecell_data *clkhw)
+{
+   int i, ret = 0;
+   struct clk_hw *hw;
+
+   for (i = 0; i < clkhw->num; i++) {
+
+   hw = clkhw->hws[i];
+
+   if (!hw)
+   continue;
+
+   ret = clk_hw_register(NULL, hw);
+   if (ret) {
+   pr_err("Couldn't register clock %d - %s\n",
+  i, hw->init->name);
+   goto err_clk_unreg;
+   }
+   }
+
+   ret = of_clk_add_hw_provider(node, of_clk_hw_onecell_get,
+clkhw);
+   if (ret) {
+   p

[PATCH V4 08/12] clk: sprd: add adjustable pll support

2017-11-09 Thread Chunyan Zhang

Introduced a common adjustable pll clock driver for Spreadtrum SoCs.

Signed-off-by: Chunyan Zhang 
---
 drivers/clk/sprd/Makefile |   1 +
 drivers/clk/sprd/pll.c| 268 ++
 drivers/clk/sprd/pll.h| 110 +++
 3 files changed, 379 insertions(+)
 create mode 100644 drivers/clk/sprd/pll.c
 create mode 100644 drivers/clk/sprd/pll.h

diff --git a/drivers/clk/sprd/Makefile b/drivers/clk/sprd/Makefile
index 2262e76..d693969 100644
--- a/drivers/clk/sprd/Makefile
+++ b/drivers/clk/sprd/Makefile
@@ -5,3 +5,4 @@ clk-sprd-y  += gate.o
 clk-sprd-y += mux.o
 clk-sprd-y += div.o
 clk-sprd-y += composite.o
+clk-sprd-y += pll.o
diff --git a/drivers/clk/sprd/pll.c b/drivers/clk/sprd/pll.c
new file mode 100644
index 000..1fd8d32
--- /dev/null
+++ b/drivers/clk/sprd/pll.c
@@ -0,0 +1,268 @@
+/*
+ * Spreadtrum pll clock driver
+ *
+ * Copyright (C) 2015~2017 Spreadtrum, Inc.
+ * Author: Chunyan Zhang 
+ *
+ * SPDX-License-Identifier: GPL-2.0
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include "pll.h"
+
+#define CLK_PLL_1M 100
+#define CLK_PLL_10M(CLK_PLL_1M * 10)
+
+#define pindex(pll, member)\
+   (pll->factors[member].shift / (8 * sizeof(pll->regs_num)))
+
+#define pshift(pll, member)\
+   (pll->factors[member].shift % (8 * sizeof(pll->regs_num)))
+
+#define pwidth(pll, member)\
+   pll->factors[member].width
+
+#define pmask(pll, member) \
+   ((pwidth(pll, member)) ?\
+   GENMASK(pwidth(pll, member) + pshift(pll, member) - 1,  \
+   pshift(pll, member)) : 0)
+
+#define pinternal(pll, cfg, member)\
+   (cfg[pindex(pll, member)] & pmask(pll, member))
+
+#define pinternal_val(pll, cfg, member)\
+   (pinternal(pll, cfg, member) >> pshift(pll, member))
+
+static inline unsigned int
+sprd_pll_read(const struct sprd_pll *pll, u8 index)
+{
+   const struct sprd_clk_common *common = &pll->common;
+   unsigned int val = 0;
+
+   if (WARN_ON(index >= pll->regs_num))
+   return 0;
+
+   sprd_regmap_read(common->regmap, common->reg + index * 4, &val);
+
+   return val;
+}
+
+static inline void
+sprd_pll_write(const struct sprd_pll *pll, u8 index,
+ u32 msk, u32 val)
+{
+   const struct sprd_clk_common *common = &pll->common;
+   unsigned int offset, reg;
+   int ret = 0;
+
+   if (WARN_ON(index >= pll->regs_num))
+   return;
+
+   offset = common->reg + index * 4;
+   ret = sprd_regmap_read(common->regmap, offset, &reg);
+   if (!ret)
+   sprd_regmap_write(common->regmap, offset, (reg & ~msk) | val);
+}
+
+static unsigned long pll_get_refin(const struct sprd_pll *pll)
+{
+   u32 shift, mask, index, refin_id = 3;
+   const unsigned long refin[4] = { 2, 4, 13, 26 };
+
+   if (pwidth(pll, PLL_REFIN)) {
+   index = pindex(pll, PLL_REFIN);
+   shift = pshift(pll, PLL_REFIN);
+   mask = pmask(pll, PLL_REFIN);
+   refin_id = (sprd_pll_read(pll, index) & mask) >> shift;
+   if (refin_id > 3)
+   refin_id = 3;
+   }
+
+   return refin[refin_id];
+}
+
+static u32 pll_get_ibias(u64 rate, const u64 *table)
+{
+   u32 i, num = table[0];
+
+   for (i = 1; i < num + 1; i++)
+   if (rate <= table[i])
+   break;
+
+   return (i == num + 1) ? num : i;
+}
+
+static unsigned long _sprd_pll_recalc_rate(const struct sprd_pll *pll,
+  unsigned long parent_rate)
+{
+   u32 *cfg;
+   u32 i, mask, regs_num = pll->regs_num;
+   unsigned long rate, nint, kint = 0;
+   u64 refin;
+   u16 k1, k2;
+
+   cfg = kcalloc(regs_num, sizeof(*cfg), GFP_KERNEL);
+   if (!cfg)
+   return -ENOMEM;
+
+   for (i = 0; i < regs_num; i++)
+   cfg[i] = sprd_pll_read(pll, i);
+
+   refin = pll_get_refin(pll);
+
+   if (pinternal(pll, cfg, PLL_PREDIV))
+   refin = refin * 2;
+
+   if (pwidth(pll, PLL_POSTDIV) &&
+   ((pll->fflag == 1 && pinternal(pll, cfg, PLL_POSTDIV)) ||
+(!pll->fflag && !pinternal(pll, cfg, PLL_POSTDIV))))
+   refin = refin / 2;
+
+   if (!pinternal(pll, cfg, PLL_DIV_S)) {
+   rate = refin * pinternal_val(pll, cfg, PLL_N) * CLK_PLL_10M;
+   } else {
+   nint = pinternal_val(pll, cfg, PLL_NINT);
+   if (pinternal(pll, cfg, PLL_SDM_EN))
+   kint = pinternal_val(pll, cfg, PLL_KINT);
+
+   mask = pmask(pll, PLL_KINT);
+
+   k1 = pll->k1;
+   k2 = pll->k2;
+   rate = DIV_ROUND_CLOSEST_ULL(refin * kint * k1,
+((mask >> __ffs(mask)) + 1)) *
+

[PATCH V4 05/12] clk: sprd: add mux clock support

2017-11-09 Thread Chunyan Zhang

This patch adds clock multiplexer support for Spreadtrum platforms.
Mux clocks also appear inside sprd composite clocks, so the patch
provides two helpers that can be reused later on.
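
For readers following along, the read side of such a helper can be sketched in plain userspace C as below. This is only an illustration under assumptions: `mux_field` and `mux_decode_parent` are hypothetical names, the register value is passed in directly instead of being read through regmap, and the field width is assumed to be smaller than 32. The table semantics follow the lookup in sprd_mux_helper_get_parent() from this patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the select-field description in the patch. */
struct mux_field {
	uint8_t shift;        /* bit offset of the select field */
	uint8_t width;        /* width of the select field in bits (< 32) */
	const uint8_t *table; /* optional: lowest register value per parent */
};

/*
 * Decode the parent index from a raw register value.
 * Without a table the field value is the index itself; with a table,
 * entry i is the first register value that selects parent i.
 */
static unsigned int mux_decode_parent(uint32_t reg, const struct mux_field *mux,
				      unsigned int num_parents)
{
	uint32_t val = (reg >> mux->shift) & ((1u << mux->width) - 1);
	unsigned int i;

	if (!mux->table)
		return val;

	for (i = 0; i + 1 < num_parents; i++)
		if (val >= mux->table[i] && val < mux->table[i + 1])
			return i;

	return num_parents - 1;
}
```

With a table of { 0, 2, 5 } and three parents, register values 0..1 map to parent 0, values 2..4 to parent 1, and anything from 5 upward to parent 2.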

Signed-off-by: Chunyan Zhang 
---
 drivers/clk/sprd/Makefile |  1 +
 drivers/clk/sprd/mux.c| 86 +++
 drivers/clk/sprd/mux.h| 78 ++
 3 files changed, 165 insertions(+)
 create mode 100644 drivers/clk/sprd/mux.c
 create mode 100644 drivers/clk/sprd/mux.h

diff --git a/drivers/clk/sprd/Makefile b/drivers/clk/sprd/Makefile
index 8cd5592..cee36b5 100644
--- a/drivers/clk/sprd/Makefile
+++ b/drivers/clk/sprd/Makefile
@@ -2,3 +2,4 @@ obj-$(CONFIG_SPRD_COMMON_CLK)   += clk-sprd.o
 
 clk-sprd-y += common.o
 clk-sprd-y += gate.o
+clk-sprd-y += mux.o
diff --git a/drivers/clk/sprd/mux.c b/drivers/clk/sprd/mux.c
new file mode 100644
index 000..dacb5b4
--- /dev/null
+++ b/drivers/clk/sprd/mux.c
@@ -0,0 +1,86 @@
+/*
+ * Spreadtrum multiplexer clock driver
+ *
+ * Copyright (C) 2017 Spreadtrum, Inc.
+ * Author: Chunyan Zhang 
+ *
+ * SPDX-License-Identifier: GPL-2.0
+ */
+
+#include 
+#include 
+#include 
+
+#include "mux.h"
+
+DEFINE_SPINLOCK(sprd_mux_lock);
+EXPORT_SYMBOL_GPL(sprd_mux_lock);
+
+u8 sprd_mux_helper_get_parent(const struct sprd_clk_common *common,
+ const struct sprd_mux_ssel *mux)
+{
+   unsigned int reg;
+   u8 parent;
+   int num_parents;
+   int i;
+
+   sprd_regmap_read(common->regmap, common->reg, &reg);
+   parent = reg >> mux->shift;
+   parent &= (1 << mux->width) - 1;
+
+   if (!mux->table)
+   return parent;
+
+   num_parents = clk_hw_get_num_parents(&common->hw);
+
+   for (i = 0; i < num_parents - 1; i++)
+   if (parent >= mux->table[i] && parent < mux->table[i + 1])
+   return i;
+
+   return num_parents - 1;
+}
+EXPORT_SYMBOL_GPL(sprd_mux_helper_get_parent);
+
+static u8 sprd_mux_get_parent(struct clk_hw *hw)
+{
+   struct sprd_mux *cm = hw_to_sprd_mux(hw);
+
+   return sprd_mux_helper_get_parent(&cm->common, &cm->mux);
+}
+
+int sprd_mux_helper_set_parent(const struct sprd_clk_common *common,
+  const struct sprd_mux_ssel *mux,
+  u8 index)
+{
+   unsigned long flags = 0;
+   unsigned int reg;
+
+   if (mux->table)
+   index = mux->table[index];
+
+   spin_lock_irqsave(common->lock, flags);
+
+   sprd_regmap_read(common->regmap, common->reg, &reg);
+   reg &= ~GENMASK(mux->width + mux->shift - 1, mux->shift);
+   sprd_regmap_write(common->regmap, common->reg,
+ reg | (index << mux->shift));
+
+   spin_unlock_irqrestore(common->lock, flags);
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(sprd_mux_helper_set_parent);
+
+static int sprd_mux_set_parent(struct clk_hw *hw, u8 index)
+{
+   struct sprd_mux *cm = hw_to_sprd_mux(hw);
+
+   return sprd_mux_helper_set_parent(&cm->common, &cm->mux, index);
+}
+
+const struct clk_ops sprd_mux_ops = {
+   .get_parent = sprd_mux_get_parent,
+   .set_parent = sprd_mux_set_parent,
+   .determine_rate = __clk_mux_determine_rate,
+};
+EXPORT_SYMBOL_GPL(sprd_mux_ops);
diff --git a/drivers/clk/sprd/mux.h b/drivers/clk/sprd/mux.h
new file mode 100644
index 000..72a3f78
--- /dev/null
+++ b/drivers/clk/sprd/mux.h
@@ -0,0 +1,78 @@
+/*
+ * Spreadtrum multiplexer clock driver
+ *
+ * Copyright (C) 2017 Spreadtrum, Inc.
+ * Author: Chunyan Zhang 
+ *
+ * SPDX-License-Identifier: GPL-2.0
+ */
+
+#ifndef _SPRD_MUX_H_
+#define _SPRD_MUX_H_
+
+#include "common.h"
+
+/**
+ * struct sprd_mux_ssel - Mux clock's source select bits in its register
+ * @shift: Bit offset of the source select field in its register
+ * @width: Width of the source select field in its register
+ * @table: For some mux clocks, not all sources are used on some special
+ *chips, this matches the value of mux clock's register and the
+ *sources which are used for this mux clock
+ */
+struct sprd_mux_ssel {
+   u8  shift;
+   u8  width;
+   const u8*table;
+};
+
+struct sprd_mux {
+   struct sprd_mux_ssel mux;
+   struct sprd_clk_common  common;
+};
+
+#define _SPRD_MUX_CLK(_shift, _width, _table)  \
+   {   \
+   .shift  = _shift,   \
+   .width  = _width,   \
+   .table  = _table,   \
+   }
+
+#define SPRD_MUX_CLK_TABLE(_struct, _name, _parents, _table,   \
+_reg, _shift, _width,  \
+_flags)\
+   struct sprd_mux _struct = { \
+   .mux= _SPRD_MUX_CLK(_shift, _width, _table),\
+

[PATCH V4 01/12] drivers: move clock common macros out from vendor directories

2017-11-09 Thread Chunyan Zhang

These macros are used by more than one SoC vendor platform. To avoid
keeping many copies of the same code, this patch moves them to the common
clock directory, which every clock driver can access.
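
The pattern these macros share (a pointer to a compound literal used as a static initializer) can be tried outside the kernel with a sketch like the one below. It is only an approximation under assumptions: `struct init_data`, `INIT_ONE`, `INIT_MANY` and the clock names are hypothetical stand-ins, not the kernel's struct clk_init_data or the macros added by this patch.

```c
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical, trimmed-down stand-in for struct clk_init_data. */
struct init_data {
	const char *name;
	const char * const *parent_names;
	unsigned int num_parents;
};

/* Single parent: wrap the one name in an anonymous array. */
#define INIT_ONE(_name, _parent)				\
	(&(struct init_data) {					\
		.name		= _name,			\
		.parent_names	= (const char *[]) { _parent },	\
		.num_parents	= 1,				\
	})

/* Several parents: pass a real array so ARRAY_SIZE() can count it. */
#define INIT_MANY(_name, _parents)				\
	(&(struct init_data) {					\
		.name		= _name,			\
		.parent_names	= _parents,			\
		.num_parents	= ARRAY_SIZE(_parents),		\
	})

/* Made-up clock names, for illustration only. */
static const char * const mux_parents[] = { "ext-26m", "pll0", "pll1" };

static const struct init_data *one  = INIT_ONE("gate-a", "ext-26m");
static const struct init_data *many = INIT_MANY("mux-a", mux_parents);
```

Because compound literals at file scope have static storage duration, the pointers stay valid for the lifetime of the program. Note that INIT_MANY only works when its argument is an array and not a pointer, since ARRAY_SIZE() relies on sizeof.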

Signed-off-by: Chunyan Zhang 
---
This patchset also adds a few common clock macros to drivers/clk/clk_common.h
which are generally useful for all vendors' clock drivers. sunxi-ng, zte and
sprd (added in this patchset) use them (or part of them) at present. Once this
patch is merged, I can help remove the duplicated code under the vendors'
respective directories.
---
 drivers/clk/clk_common.h | 60 
 1 file changed, 60 insertions(+)
 create mode 100644 drivers/clk/clk_common.h

diff --git a/drivers/clk/clk_common.h b/drivers/clk/clk_common.h
new file mode 100644
index 000..21e93d2
--- /dev/null
+++ b/drivers/clk/clk_common.h
@@ -0,0 +1,60 @@
+/*
+ * drivers/clk/clk_common.h
+ *
+ * SPDX-License-Identifier: GPL-2.0
+ */
+
+#ifndef _CLK_COMMON_H_
+#define _CLK_COMMON_H_
+
+#include 
+
+#define CLK_HW_INIT(_name, _parent, _ops, _flags)  \
+   (&(struct clk_init_data) {  \
+   .flags  = _flags,   \
+   .name   = _name,\
+   .parent_names   = (const char *[]) { _parent }, \
+   .num_parents= 1,\
+   .ops= _ops, \
+   })
+
+#define CLK_HW_INIT_PARENTS(_name, _parents, _ops, _flags) \
+   (&(struct clk_init_data) {  \
+   .flags  = _flags,   \
+   .name   = _name,\
+   .parent_names   = _parents, \
+   .num_parents= ARRAY_SIZE(_parents), \
+   .ops= _ops, \
+   })
+
+#define CLK_HW_INIT_NO_PARENT(_name, _ops, _flags) \
+   (&(struct clk_init_data) {  \
+   .flags  = _flags,   \
+   .name   = _name,\
+   .parent_names   = NULL, \
+   .num_parents= 0,\
+   .ops= _ops, \
+   })
+
+#define CLK_FIXED_FACTOR(_struct, _name, _parent,  \
+   _div, _mult, _flags)\
+   struct clk_fixed_factor _struct = { \
+   .div= _div, \
+   .mult   = _mult,\
+   .hw.init= CLK_HW_INIT(_name,\
+ _parent,  \
+ &clk_fixed_factor_ops,\
+ _flags),  \
+   }
+
+#define CLK_FIXED_RATE(_struct, _name, _flags, \
+  _fixed_rate, _fixed_accuracy)\
+   struct clk_fixed_rate _struct = {   \
+   .fixed_rate = _fixed_rate,  \
+   .fixed_accuracy = _fixed_accuracy,  \
+   .hw.init= CLK_HW_INIT_NO_PARENT(_name,  \
+ &clk_fixed_rate_ops,  \
+   _flags),\
+   }
+
+#endif /* _CLK_COMMON_H_ */
-- 
2.7.4

[PATCH V4 04/12] clk: sprd: add gate clock support

2017-11-09 Thread Chunyan Zhang

Some clocks on the Spreadtrum's SoCs are just simple gates. Add
support for those clocks.
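
As a rough userspace illustration of the plain (non set/clear) gate logic in this patch, the register update and the enabled check can be written as pure functions. This is a sketch under assumptions: `GATE_SET_TO_DISABLE`, `gate_apply` and `gate_is_enabled` are hypothetical names, and the register value is passed in and returned rather than accessed through regmap under a spinlock.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag: when set, writing 1 to the bit gates the clock. */
#define GATE_SET_TO_DISABLE 0x1u

/* Return the register value after enabling or disabling a plain gate. */
static uint32_t gate_apply(uint32_t reg, uint32_t enable_mask,
			   unsigned int flags, bool enable)
{
	bool set = (flags & GATE_SET_TO_DISABLE) != 0;

	/* For a normal gate, enable means "set"; the flag inverts that. */
	set ^= enable;

	return set ? (reg | enable_mask) : (reg & ~enable_mask);
}

/* Mirror of the is_enabled logic: undo the inversion, then test the bit. */
static bool gate_is_enabled(uint32_t reg, uint32_t enable_mask,
			    unsigned int flags)
{
	if (flags & GATE_SET_TO_DISABLE)
		reg ^= enable_mask;

	return (reg & enable_mask) != 0;
}
```

The XOR keeps a single code path for both polarities: enabling a normal gate sets the bit, while enabling a set-to-disable gate clears it.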

Signed-off-by: Chunyan Zhang 
---
 drivers/clk/sprd/Makefile |   1 +
 drivers/clk/sprd/gate.c   | 124 ++
 drivers/clk/sprd/gate.h   |  63 +++
 3 files changed, 188 insertions(+)
 create mode 100644 drivers/clk/sprd/gate.c
 create mode 100644 drivers/clk/sprd/gate.h

diff --git a/drivers/clk/sprd/Makefile b/drivers/clk/sprd/Makefile
index 74f4b80..8cd5592 100644
--- a/drivers/clk/sprd/Makefile
+++ b/drivers/clk/sprd/Makefile
@@ -1,3 +1,4 @@
 obj-$(CONFIG_SPRD_COMMON_CLK)  += clk-sprd.o
 
 clk-sprd-y += common.o
+clk-sprd-y += gate.o
diff --git a/drivers/clk/sprd/gate.c b/drivers/clk/sprd/gate.c
new file mode 100644
index 000..fa0d9ee
--- /dev/null
+++ b/drivers/clk/sprd/gate.c
@@ -0,0 +1,124 @@
+/*
+ * Spreadtrum gate clock driver
+ *
+ * Copyright (C) 2017 Spreadtrum, Inc.
+ * Author: Chunyan Zhang 
+ *
+ * SPDX-License-Identifier: GPL-2.0
+ */
+
+#include 
+#include 
+
+#include "gate.h"
+
+DEFINE_SPINLOCK(sprd_gate_lock);
+EXPORT_SYMBOL_GPL(sprd_gate_lock);
+
+static void clk_gate_toggle(const struct sprd_gate *sg, bool en)
+{
+   const struct sprd_clk_common *common = &sg->common;
+   unsigned long flags = 0;
+   unsigned int reg;
+   bool set = sg->flags & CLK_GATE_SET_TO_DISABLE ? true : false;
+
+   set ^= en;
+
+   spin_lock_irqsave(common->lock, flags);
+
+   sprd_regmap_read(common->regmap, common->reg, &reg);
+
+   if (set)
+   reg |= sg->enable_mask;
+   else
+   reg &= ~sg->enable_mask;
+
+   sprd_regmap_write(common->regmap, common->reg, reg);
+
+   spin_unlock_irqrestore(common->lock, flags);
+}
+
+static void clk_sc_gate_toggle(const struct sprd_gate *sg, bool en)
+{
+   const struct sprd_clk_common *common = &sg->common;
+   unsigned long flags = 0;
+   bool set = sg->flags & CLK_GATE_SET_TO_DISABLE ? 1 : 0;
+   unsigned int offset;
+
+   set ^= en;
+
+   /*
+* Each set/clear gate clock has three registers:
+* common->reg  - base register
+* common->reg + offset - set register
+* common->reg + 2 * offset - clear register
+*/
+   offset = set ? sg->sc_offset : sg->sc_offset * 2;
+
+   spin_lock_irqsave(common->lock, flags);
+   sprd_regmap_write(common->regmap, common->reg + offset,
+ sg->enable_mask);
+   spin_unlock_irqrestore(common->lock, flags);
+}
+
+static void sprd_gate_disable(struct clk_hw *hw)
+{
+   struct sprd_gate *sg = hw_to_sprd_gate(hw);
+
+   clk_gate_toggle(sg, false);
+}
+
+static int sprd_gate_enable(struct clk_hw *hw)
+{
+   struct sprd_gate *sg = hw_to_sprd_gate(hw);
+
+   clk_gate_toggle(sg, true);
+
+   return 0;
+}
+
+static void sprd_sc_gate_disable(struct clk_hw *hw)
+{
+   struct sprd_gate *sg = hw_to_sprd_gate(hw);
+
+   clk_sc_gate_toggle(sg, false);
+}
+
+static int sprd_sc_gate_enable(struct clk_hw *hw)
+{
+   struct sprd_gate *sg = hw_to_sprd_gate(hw);
+
+   clk_sc_gate_toggle(sg, true);
+
+   return 0;
+}
+static int sprd_gate_is_enabled(struct clk_hw *hw)
+{
+   struct sprd_gate *sg = hw_to_sprd_gate(hw);
+   struct sprd_clk_common *common = &sg->common;
+   unsigned int reg;
+
+   sprd_regmap_read(common->regmap, common->reg, &reg);
+
+   if (sg->flags & CLK_GATE_SET_TO_DISABLE)
+   reg ^= sg->enable_mask;
+
+   reg &= sg->enable_mask;
+
+   return reg ? 1 : 0;
+}
+
+const struct clk_ops sprd_gate_ops = {
+   .disable= sprd_gate_disable,
+   .enable = sprd_gate_enable,
+   .is_enabled = sprd_gate_is_enabled,
+};
+EXPORT_SYMBOL_GPL(sprd_gate_ops);
+
+const struct clk_ops sprd_sc_gate_ops = {
+   .disable= sprd_sc_gate_disable,
+   .enable = sprd_sc_gate_enable,
+   .is_enabled = sprd_gate_is_enabled,
+};
+EXPORT_SYMBOL_GPL(sprd_sc_gate_ops);
+
diff --git a/drivers/clk/sprd/gate.h b/drivers/clk/sprd/gate.h
new file mode 100644
index 000..dad8ba0
--- /dev/null
+++ b/drivers/clk/sprd/gate.h
@@ -0,0 +1,63 @@
+/*
+ * Spreadtrum gate clock driver
+ *
+ * Copyright (C) 2017 Spreadtrum, Inc.
+ * Author: Chunyan Zhang 
+ *
+ * SPDX-License-Identifier: GPL-2.0
+ */
+
+#ifndef _SPRD_GATE_H_
+#define _SPRD_GATE_H_
+
+#include "common.h"
+
+struct sprd_gate {
+   u32 enable_mask;
+   u16 flags;
+   u16 sc_offset;
+
+   struct sprd_clk_common  common;
+};
+
+#define SPRD_SC_GATE_CLK_OPS(_struct, _name, _parent, _reg, _sc_offset,   \
+_enable_mask, _flags, _gate_flags, _ops)   \
+   struct sprd_gate _struct = {\
+   .enable_mask= _enable_mask, \
+   .sc_offset  = _sc_of

Re: n900 in next-20170901

2017-11-09 Thread Joonsoo Kim

On Thu, Nov 09, 2017 at 10:23:40PM -0800, Tony Lindgren wrote:
> * Tony Lindgren  [171109 22:19]:
> > * Tony Lindgren  [171110 03:28]:
> > > Then I'll follow up on cleaning up save_secure_ram_context later.
> > 
> > Here's a better version, the static mapping did not get used.. It
> > just moved the area so it happened to work. It needs to be set
> > up as MT_MEMORY_RWX_NONCACHED instead.
> 

I see a better version now. Hmm... I guess that it also has the
problem that I mentioned on first version.

> And FYI, here's what I currently have for the follow-up patch,
> but that can wait a bit.

Okay. So, this patch should be applied on the top of above better version?

Thanks.

[PATCH V4 00/12] add clock driver for Spreadtrum platforms

2017-11-09 Thread Chunyan Zhang

This series adds Spreadtrum clock support together with its binding
documentation and devicetree data.

Any comments would be greatly appreciated.

This patchset also adds a few common clock macros to drivers/clk/clk_common.h
which are generally useful for all vendors' clock drivers. sunxi-ng, zte and
sprd (added in this patchset) use them (or part of them) at present. Once this
patchset is merged, I can help remove the duplicated code under the vendors'
respective directories.


Thanks,
Chunyan

Changes from V3: (https://lkml.org/lkml/2017/11/2/61)
* Addressed comments from Julien Thierry:
  - Cleaned up the if branch of sprd_mux_helper_get_parent();
  - Separated the gate clock macros and ops for the two modes (i.e. sc_gate
    and gate);
  - Separated the mux clock macros with/without table, and made the same
    changes for the composite clock.
* Switched the function name from _endisable to _toggle;
* Fixed Kbuild test error:
  - Exported sprd_clk_regmap_init(), which will be used in other module(s);
* Made the function sprd_clk_set_regmap() static, and removed its
  declaration from the include file;
* Addressed comments from Rob:
  - Separate the dt-binding include file from the driver patch;
  - Documented more for the property "clocks"
* Changed the syscon device names;
* Changed the name of 'sprd_mux_internal' to 'sprd_mux_ssel'
  

Changes from V2: (http://lkml.iu.edu/hypermail/linux/kernel/1707.1/01504.html)
* Switch to use regmap to access registers;
* Split all clocks into 16 separate nodes, each belonging to a single address
  area;
* Rearranged the order of clock declarations in sc9860-clk.c, sorted by
  address area;
* Added syscon device tree nodes which are referenced by the clock nodes
  located in the same address area as the syscon device;
* Revised the binding documentation according to the dt modification.

Changes from V1: (https://lkml.org/lkml/2017/6/17/356)
* Address Stephen's comments:
  - Switch to use platform device driver instead of the DT probing mechanism.
  - Move the common clock macro out from vendor directory, but need to remove
    those overlap code from other vendors (such as sunxi-ng) once this get
    merged.
  - Add support to be built as a module.
  - Add 'sprd_' prefix for all spin locks used in these drivers.
  - Mark input parameter of sprd_x with const.
  - Remove unreasonable dependencies to CONFIG_64BIT.
  - Add readl() after writing the same register.
  - Remove CLK_IS_BASIC which is no longer used.
  - Remove unnecessary CLK_IGNORE_UNUSED when defining a clock.
  - Change to expose all clock index.
  - Use clk_ instead of ccu.
  - Add Kconfig for sprd clocks.
  - Move the fixed clocks out from the soc node.
  - Switch to use 64-bit math in pll driver instead of 32-bit math.
* Revise binding documentation according to dt modification.
* Rename sc9860.c to sc9860-clk.c


Chunyan Zhang (12):
  drivers: move clock common macros out from vendor directories
  dt-bindings: Add Spreadtrum clock binding documentation
  clk: sprd: Add common infrastructure
  clk: sprd: add gate clock support
  clk: sprd: add mux clock support
  clk: sprd: add divider clock support
  clk: sprd: add composite clock support
  clk: sprd: add adjustable pll support
  clk: sprd: Add dt-bindings include file for SC9860
  clk: sprd: add clocks support for SC9860
  arm64: dts: add syscon for whale2 platform
  arm64: dts: add clocks for SC9860

 Documentation/devicetree/bindings/clock/sprd.txt |   63 +
 arch/arm64/boot/dts/sprd/sc9860.dtsi |  115 ++
 arch/arm64/boot/dts/sprd/whale2.dtsi |   48 +-
 drivers/clk/Kconfig  |1 +
 drivers/clk/Makefile |1 +
 drivers/clk/clk_common.h |   60 +
 drivers/clk/sprd/Kconfig |   14 +
 drivers/clk/sprd/Makefile|   11 +
 drivers/clk/sprd/common.c|  113 ++
 drivers/clk/sprd/common.h|   54 +
 drivers/clk/sprd/composite.c |   65 +
 drivers/clk/sprd/composite.h |   55 +
 drivers/clk/sprd/div.c   |  100 ++
 drivers/clk/sprd/div.h   |   79 +
 drivers/clk/sprd/gate.c  |  124 ++
 drivers/clk/sprd/gate.h  |   63 +
 drivers/clk/sprd/mux.c   |   86 +
 drivers/clk/sprd/mux.h   |   78 +
 drivers/clk/sprd/pll.c   |  268 +++
 drivers/clk/sprd/pll.h   |  110 ++
 drivers/clk/sprd/sc9860-clk.c| 1987 ++
 include/dt-bindings/clock/sprd,sc9860-clk.h  |  408 +
 22 files changed, 3901 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/clock/sprd.txt
 create mode 100644 drivers/clk/clk_common.h
 create mode 100644 drivers/clk/sprd/Kco

Re: [PATCH] x86, pkeys: update documentation about availability

2017-11-09 Thread Dave Hansen

On 11/09/2017 10:12 PM, Ingo Molnar wrote:
> 
> * Dave Hansen  wrote:
> 
>>
>> From: Dave Hansen 
>>
>> Now that CPUs that implement Memory Protection Keys are publicly
>> available we can be a bit less oblique about where it is available.
>>
>> Signed-off-by: Dave Hansen 
>> ---
>>
>>  b/Documentation/x86/protection-keys.txt |9 +++--
>>  1 file changed, 7 insertions(+), 2 deletions(-)
>>
>> diff -puN Documentation/x86/protection-keys.txt~pkeys-update Documentation/x86/protection-keys.txt
>> --- a/Documentation/x86/protection-keys.txt~pkeys-update 2017-11-09 10:36:53.381467202 -0800
>> +++ b/Documentation/x86/protection-keys.txt  2017-11-09 10:43:15.527466249 -0800
>> @@ -1,5 +1,10 @@
>> -Memory Protection Keys for Userspace (PKU aka PKEYs) is a CPU feature
>> -which will be found on future Intel CPUs.
>> +Memory Protection Keys for Userspace (PKU aka PKEYs) is a feature
>> +which is found on Intel's Skylake "Scalable Processor" Server CPUs.
>> +It will be available in future non-server parts.
>> +
>> +For anyone wishing to test or use this feature, it is available in
>> +Amazon's EC2 C5 instances and is known to work there using an Ubuntu
>> +17.04 image.
>>  
>>  Memory Protection Keys provides a mechanism for enforcing page-based
>>  protections, but without requiring modification of the page tables
> 
> Could we please first fix the pkeys self-test? One of the testcases doesn't
> build at all:
> 
>  gcc -m32 -o /home/mingo/tip/tools/testing/selftests/x86/protection_keys_32 
> -O2 -g -std=gnu99 -pthread -Wall -no-pie  protection_keys.c -lrt -ldl -lm
>  In file included from /usr/include/signal.h:57:0,
>   from protection_keys.c:33:
>  protection_keys.c: In function ‘signal_handler’:
>  protection_keys.c:253:6: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or 
> ‘__attribute__’ 
>  before ‘.’ token
>u64 si_pkey;

That's odd.  I build them all the time.  I compiled it just now with
4.14-rc8 and gcc 4.8.4.

I wonder if this is more fallout from the glibc headers getting updated
to now contain pkey-related stuff.  si_pkey might be getting #defined
over for the siginfo si_pkey.

What distro are you seeing this on?

> plus, on a related note, the MPX testcase produces annoying warnings:
> 
>  gcc -m32 -o /home/mingo/tip/tools/testing/selftests/x86/mpx-mini-test_32 -O2 
> -g -std=gnu99 -pthread -Wall -no-pie  mpx-mini-test.c -lrt -ldl -lm
>  mpx-mini-test.c: In function ‘insn_test_failed’:
>  mpx-mini-test.c:1406:3: warning: array subscript is above array bounds 
>  [-Warray-bounds]
> printf("bte[1]: %lx\n", bte->contents[1]);

This is kinda a weird structure:

> struct mpx_bt_entry {
> union {
> char x[MPX_BOUNDS_TABLE_ENTRY_SIZE_BYTES];
> unsigned long contents[1];
> };
> } __attribute__((packed));

I guess it should either be contents[0] or
contents[MPX_BOUNDS_TABLE_ENTRY_SIZE_BYTES/sizeof(long)].  But, the
warning is harmless at least.
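
A minimal sketch of the second option, assuming the entry size is a multiple of sizeof(long): the value 16 given to MPX_BOUNDS_TABLE_ENTRY_SIZE_BYTES here is a placeholder for illustration (the selftest defines its own), and `bt_entry_words` is a hypothetical helper that is not part of the test.

```c
#include <stddef.h>

/* Assumed value for illustration only; the selftest defines its own. */
#define MPX_BOUNDS_TABLE_ENTRY_SIZE_BYTES 16

struct mpx_bt_entry {
	union {
		char x[MPX_BOUNDS_TABLE_ENTRY_SIZE_BYTES];
		/* Size the array so every word of the entry is in bounds. */
		unsigned long contents[MPX_BOUNDS_TABLE_ENTRY_SIZE_BYTES /
				       sizeof(unsigned long)];
	};
} __attribute__((packed));

/* Number of addressable words in one entry. */
static size_t bt_entry_words(void)
{
	return sizeof(((struct mpx_bt_entry *)0)->contents) /
	       sizeof(unsigned long);
}
```

With the array sized this way, an access such as bte->contents[1] is within the declared bounds, so the compiler has no reason to emit -Warray-bounds for it.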

What gcc is this, btw?  I must be behind the times.

Re: [PATCH] checkpatch: Fix checks for Kconfig help text

2017-11-09 Thread Leo Yan

On Fri, Nov 10, 2017 at 02:32:37PM +0800, Leo Yan wrote:
> If a patch has a Kconfig section with only one 'config' entry, the variable
> '$is_start' is set by that first 'config' line, while '$is_end' would only be
> set by a second 'config' line. Patches often have only one 'config' line, so
> there is no chance to set '$is_end'; as a result the condition below is never
> true and the check for the Kconfig description is skipped:

Sorry for the bad commit log, I will send v2 for this.

>   if ($is_start && $is_end && $length < $min_conf_desc_length) {
>   ..
>   }
> 
> When the script reaches this condition, parsing of the Kconfig section
> has completed, whether or not '$is_end' is true. So remove '$is_end'
> from the condition.
> 
> Another change is to lower '$min_conf_desc_length' from 4 to 1, so the
> check passes if the Kconfig description has at least one line.
> 
> Signed-off-by: Leo Yan 
> ---
>  scripts/checkpatch.pl | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> index 3453df9..ba724b0 100755
> --- a/scripts/checkpatch.pl
> +++ b/scripts/checkpatch.pl
> @@ -51,7 +51,7 @@ my $configuration_file = ".checkpatch.conf";
>  my $max_line_length = 80;
>  my $ignore_perl_version = 0;
>  my $minimum_perl_version = 5.10.0;
> -my $min_conf_desc_length = 4;
> +my $min_conf_desc_length = 1;
>  my $spelling_file = "$D/spelling.txt";
>  my $codespell = 0;
>  my $codespellfile = "/usr/share/codespell/dictionary.txt";
> @@ -2796,7 +2796,7 @@ sub process {
>   }
>   $length++;
>   }
> - if ($is_start && $is_end && $length < 
> $min_conf_desc_length) {
> + if ($is_start && $length < $min_conf_desc_length) {
>   WARN("CONFIG_DESCRIPTION",
>"please write a paragraph that describes 
> the config symbol fully\n" . $herecurr);
>   }
> -- 
> 2.7.4
>

[tip:x86/urgent] x86/debug: Handle warnings before the notifier chain, to fix KGDB crash

2017-11-09 Thread tip-bot for Alexander Shishkin

Commit-ID:  a8d6c1bd62ffefb075c9d3570f07659e2a36ecb3
Gitweb: https://git.kernel.org/tip/a8d6c1bd62ffefb075c9d3570f07659e2a36ecb3
Author: Alexander Shishkin 
AuthorDate: Mon, 24 Jul 2017 13:04:28 +0300
Committer:  Ingo Molnar 
CommitDate: Fri, 10 Nov 2017 07:16:23 +0100

x86/debug: Handle warnings before the notifier chain, to fix KGDB crash

Commit:

  9a93848fe787 ("x86/debug: Implement __WARN() using UD0")

turned warnings into UD0, but the fixup code only runs after the
notify_die() chain. This is a problem, in particular, with kgdb,
which kicks in as if it was a BUG().

Fix this by running the fixup code before the notifier chain in
the invalid op handler path.

Signed-off-by: Alexander Shishkin 
Tested-by: Ilya Dryomov 
Acked-by: Daniel Thompson 
Cc: Jason Wessel 
Cc: Arjan van de Ven 
Cc: Borislav Petkov 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Richard Weinberger 
Cc: Thomas Gleixner 
Cc:  # v4.12+
Link: 
http://lkml.kernel.org/r/20170724100428.19173-1-alexander.shish...@linux.intel.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/kernel/traps.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 67db4f4..5a6b8f8 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -209,9 +209,6 @@ do_trap_no_signal(struct task_struct *tsk, int trapnr, char *str,
if (fixup_exception(regs, trapnr))
return 0;
 
-   if (fixup_bug(regs, trapnr))
-   return 0;
-
tsk->thread.error_code = error_code;
tsk->thread.trap_nr = trapnr;
die(str, regs, error_code);
@@ -292,6 +289,13 @@ static void do_error_trap(struct pt_regs *regs, long error_code, char *str,
 
RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
 
+   /*
+* WARN*()s end up here; fix them up before we call the
+* notifier chain.
+*/
+   if (!user_mode(regs) && fixup_bug(regs, trapnr))
+   return;
+
if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, signr) !=
NOTIFY_STOP) {
cond_local_irq_enable(regs);

Re: [PATCHv4 2/3] ARMv8: layerscape: add the pcie ep function support

2017-11-09 Thread Kishon Vijay Abraham I

Hi,

On Friday 10 November 2017 09:18 AM, Bao Xiaowei wrote:
> Add PCIe controller EP function support for Layerscape, based on the
> PCIe EP framework.
> 
> Signed-off-by: Bao Xiaowei 
> ---
>  v2:
>  - fix the issue of ioremap being used without a matching iounmap
>  - optimize the code structure
>  - add code comments
>  v3:
>  - fix the failure to request the msi outbound window
>  v4:
>  - optimize the code, adjust the format
> 
>  drivers/pci/dwc/pci-layerscape.c | 120 ---
>  1 file changed, 113 insertions(+), 7 deletions(-)

$subject should begin with
PCI: layerscape:
> 
> diff --git a/drivers/pci/dwc/pci-layerscape.c b/drivers/pci/dwc/pci-layerscape.c
> index 87fa486bee2c..6f3e434599e0 100644
> --- a/drivers/pci/dwc/pci-layerscape.c
> +++ b/drivers/pci/dwc/pci-layerscape.c
> @@ -34,7 +34,12 @@
>  /* PEX Internal Configuration Registers */
>  #define PCIE_STRFMR1 0x71c /* Symbol Timer & Filter Mask Register1 */
>  
> +#define PCIE_DBI2_BASE   0x1000  /* DBI2 base address*/

The base address should come from dt.
> +#define PCIE_MSI_MSG_DATA_OFF0x5c/* MSI Data register address*/
> +#define PCIE_MSI_OB_SIZE 4096
> +#define PCIE_MSI_ADDR_OFFSET (1024 * 1024)
>  #define PCIE_IATU_NUM6
> +#define PCIE_EP_ADDR_SPACE_SIZE 0x1
>  
>  struct ls_pcie_drvdata {
>   u32 lut_offset;
> @@ -44,12 +49,20 @@ struct ls_pcie_drvdata {
>   const struct dw_pcie_ops *dw_pcie_ops;
>  };
>  
> +struct ls_pcie_ep {
> + dma_addr_t msi_phys_addr;
> + void __iomem *msi_virt_addr;
> + u64 msi_msg_addr;
> + u16 msi_msg_data;
> +};
> +
>  struct ls_pcie {
>   struct dw_pcie *pci;
>   void __iomem *lut;
>   struct regmap *scfg;
>   const struct ls_pcie_drvdata *drvdata;
>   int index;
> + struct ls_pcie_ep *pcie_ep;
>  };
>  
>  #define to_ls_pcie(x)dev_get_drvdata((x)->dev)
> @@ -263,6 +276,99 @@ static const struct of_device_id ls_pcie_of_match[] = {
>   { },
>  };
>  
> +static void ls_pcie_raise_msi_irq(struct ls_pcie_ep *pcie_ep)
> +{
> + iowrite32(pcie_ep->msi_msg_data, pcie_ep->msi_virt_addr);
> +}
> +
> +static int ls_pcie_raise_irq(struct dw_pcie_ep *ep,
> + enum pci_epc_irq_type type, u8 interrupt_num)
> +{
> + struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> + struct ls_pcie *pcie = to_ls_pcie(pci);
> + struct ls_pcie_ep *pcie_ep = pcie->pcie_ep;
> + u32 free_win;
> +
> + /* get the msi message address and msi message data */
> + pcie_ep->msi_msg_addr = ioread32(pci->dbi_base + MSI_MESSAGE_ADDR_L32) |
> + (((u64)ioread32(pci->dbi_base + MSI_MESSAGE_ADDR_U32)) << 32);
> + pcie_ep->msi_msg_data = ioread16(pci->dbi_base + PCIE_MSI_MSG_DATA_OFF);
> +
> + /* request and config the outband window for msi */
> + free_win = find_first_zero_bit(&ep->ob_window_map,
> + sizeof(ep->ob_window_map));
> + if (free_win >= ep->num_ob_windows) {
> + dev_err(pci->dev, "no free outbound window\n");
> + return -ENOMEM;
> + }
> +
> + dw_pcie_prog_outbound_atu(pci, free_win, PCIE_ATU_TYPE_MEM,
> + pcie_ep->msi_phys_addr,
> + pcie_ep->msi_msg_addr,
> + PCIE_MSI_OB_SIZE);
> +
> + set_bit(free_win, &ep->ob_window_map);

This custom logic is not required. You can use [1] instead

[1] -> https://lkml.org/lkml/2017/11/3/318
> +
> + /* generate the msi interrupt */
> + ls_pcie_raise_msi_irq(pcie_ep);
> +
> + /* release the outband window of msi */
> + dw_pcie_disable_atu(pci, free_win, DW_PCIE_REGION_OUTBOUND);
> + clear_bit(free_win, &ep->ob_window_map);
> +
> + return 0;
> +}
> +
> +static struct dw_pcie_ep_ops pcie_ep_ops = {
> + .raise_irq = ls_pcie_raise_irq,
> +};
> +
> +static int __init ls_add_pcie_ep(struct ls_pcie *pcie,
> + struct platform_device *pdev)
> +{
> + struct dw_pcie *pci = pcie->pci;
> + struct device *dev = pci->dev;
> + struct dw_pcie_ep *ep;
> + struct ls_pcie_ep *pcie_ep;
> + struct resource *cfg_res;
> + int ret;
> +
> + ep = &pci->ep;
> + ep->ops = &pcie_ep_ops;
> +
> + pcie_ep = devm_kzalloc(dev, sizeof(*pcie_ep), GFP_KERNEL);
> + if (!pcie_ep)
> + return -ENOMEM;
> +
> + pcie->pcie_ep = pcie_ep;
> +
> + cfg_res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "config");
> + if (cfg_res) {
> + ep->phys_base = cfg_res->start;
> + ep->addr_size = PCIE_EP_ADDR_SPACE_SIZE;
> + } else {
> + dev_err(dev, "missing *config* space\n");
> + return -ENODEV;
> + }
> +
> + pcie_ep->msi_phys_addr = ep->phys_base + PCIE_MSI_ADDR_OFFSET;
> +
> + pcie_ep->msi_virt_addr = ioremap(pcie_ep->msi_phys_addr,
> + PCIE_MSI_OB_SI

Re: [PATCH] perf evsel: Fix incorrect precise_ip in default event name

2017-11-09 Thread Namhyung Kim

Hello,

On Fri, Nov 10, 2017 at 01:49:06PM +0800, Mengting Zhang wrote:
> When no event is specified with -e option, perf will specify a
> "cycles" event with the highest level of precision available in
> perf_event_attr.precise_ip as the default event. But the evsel name
> shows an incorrect precise ip, fix it.
> 
> For example, with a highest precision perf_event_attr.precise_ip = 2,
> the evsel name "cycles:ppp" shows a wrong precision available.
> 
> Before:
> $./perf record sleep 1
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.014 MB perf.data (21 samples) ]
> $./perf evlist -v
> cycles:ppp: size: 112, { sample_period, sample_freq }: 4000, 
> sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1,
> comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 2,
> sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1
> 
> After:
> $./perf record sleep 1
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.014 MB perf.data (16 samples) ]
> $./perf evlist -v
> cycles:pp: size: 112, { sample_period, sample_freq }: 4000,
> sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1,
> comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 2,
> sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1
> 
> Signed-off-by: Mengting Zhang 
> ---
>  tools/perf/util/evsel.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index 0dccdb8..94cf11d 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -312,7 +312,7 @@ struct perf_evsel *perf_evsel__new_cycles(bool precise)
>   if (asprintf(&evsel->name, "cycles%s%s%.*s",
>(attr.precise_ip || attr.exclude_kernel) ? ":" : "",
>attr.exclude_kernel ? "u" : "",
> -  attr.precise_ip ? attr.precise_ip + 1 : 0, "ppp") < 0)
> +  attr.precise_ip ? attr.precise_ip : 0, "ppp") < 0)

I think you don't need to check value of the precise_ip anymore.
The following should be ok:

 attr.precise_ip, "ppp") < 0)

Thanks,
Namhyung
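
A minimal sketch of why the ternary is redundant, assuming standard C
printf semantics: in "%.*s" the precision caps the number of characters
printed, so a precision of 0 already yields an empty suffix. The helper
name cycles_name() is hypothetical and is not part of perf; it only
mirrors the format string from the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not perf code): build the default event name the
 * same way the quoted asprintf() call does, passing precise_ip directly
 * as the precision for "%.*s".
 */
static int cycles_name(char *buf, size_t len, int precise_ip,
		       int exclude_kernel)
{
	return snprintf(buf, len, "cycles%s%s%.*s",
			(precise_ip || exclude_kernel) ? ":" : "",
			exclude_kernel ? "u" : "",
			precise_ip, "ppp");
}
```

With precise_ip = 2 this yields "cycles:pp", and with precise_ip = 0 the
"ppp" string contributes nothing, which is the behaviour the ternary was
guarding for.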

[PATCH] checkpatch: Fix checks for Kconfig help text

2017-11-09 Thread Leo Yan

If a patch has a Kconfig section with only one 'config' entry, the
variable '$is_start' is set by the first 'config' line, but '$is_end'
is only set by a second 'config' line. Patches often have only one
'config' line, so '$is_end' never gets set; as a result the condition
below is never true and the check of the Kconfig description is skipped:

if ($is_start && $is_end && $length < $min_conf_desc_length) {
..
}

By the time the script reaches this condition, parsing of the Kconfig
section has completed, whether or not '$is_end' is true. So remove
'$is_end' from the condition.

Also change '$min_conf_desc_length' from 4 to 1, so that the check
passes if the Kconfig description has at least one line.

Signed-off-by: Leo Yan 
---
 scripts/checkpatch.pl | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 3453df9..ba724b0 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -51,7 +51,7 @@ my $configuration_file = ".checkpatch.conf";
 my $max_line_length = 80;
 my $ignore_perl_version = 0;
 my $minimum_perl_version = 5.10.0;
-my $min_conf_desc_length = 4;
+my $min_conf_desc_length = 1;
 my $spelling_file = "$D/spelling.txt";
 my $codespell = 0;
 my $codespellfile = "/usr/share/codespell/dictionary.txt";
@@ -2796,7 +2796,7 @@ sub process {
}
$length++;
}
-   if ($is_start && $is_end && $length < $min_conf_desc_length) {
+   if ($is_start && $length < $min_conf_desc_length) {
WARN("CONFIG_DESCRIPTION",
 "please write a paragraph that describes the config symbol fully\n" . $herecurr);
}
-- 
2.7.4
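
The effect of dropping '$is_end' can be sketched in C, under the
assumption that the two conditions in the patch are the only thing that
matters. The function names warns_before() and warns_after() are
hypothetical; checkpatch itself is Perl and has no such helpers.

```c
#include <assert.h>
#include <stdbool.h>

/* Condition before the patch: a second 'config' line must have set is_end. */
static bool warns_before(bool is_start, bool is_end, int length, int min_len)
{
	return is_start && is_end && length < min_len;
}

/* Condition after the patch: is_end is no longer consulted. */
static bool warns_after(bool is_start, int length, int min_len)
{
	return is_start && length < min_len;
}
```

For a patch with a single 'config' entry and an empty help text,
warns_before(true, false, 0, 4) is false, so nothing is reported, while
warns_after(true, 0, 1) is true.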

Re: n900 in next-20170901

2017-11-09 Thread Joonsoo Kim

On Thu, Nov 09, 2017 at 07:26:10PM -0800, Tony Lindgren wrote:
> * Joonsoo Kim  [171110 00:10]:
> > On Thu, Nov 09, 2017 at 07:08:54AM -0800, Tony Lindgren wrote:
> > > Hmm OK. Does your first patch above now have the initcall issue too?
> > > It boots if I make that also subsys_initcall and then I get:
> > 
> > > [2.078094] vmalloc_pool_init: DMA: get vmalloc area: d001
> > 
> > Yes, first patch has the initcall issue and it's intentional in order
> > to check the theory. I checked following log for this.
> > 
> > - Boot failure
> > SRAM_ADDR: omap_map_sram: P: 0x40208000 - 0x4020f000
> > SRAM_ADDR: omap_map_sram: V: 0xd005 - 0xd0057000
> > 
> > - Boot success
> > SRAM_ADDR: omap_map_sram: P: 0x40208000 - 0x4020f000
> > SRAM_ADDR: omap_map_sram: V: 0xd0008000 - 0xd000f000
> > 
> > When failure, virtual address for sram is higher than normal one due
> > to vmalloc area allocation in __dma_alloc_remap(). If it is deferred,
> > virtual address is the same with success case and then the system work.
> > 
> > So, my next theory is that there is n900 specific assumption that sram
> > should have that address. Could you check if any working tree for n900
> > which doesn't have my CMA series work or not with adding
> > "arm/dma: vmalloc area allocation"?
> 
> Oh I see, sorry I was not following you earlier. So you mean that
> by adding the vmalloc_pool_init() initcall the va mapping for SRAM
> changes.

Exactly.

> 
> And yes, save_secure_ram_context seems to be doing some sketchy
> virt to phys calculation with sram_phy_addr_mask. Here's a small
> patch to fix that for your CMA series, maybe you can merge it
> with your series to avoid breaking booting for git bisect.
> 
> Then I'll follow up on cleaning up save_secure_ram_context later.

Thanks for the patch. However, the patch should be modified. See below.

> Regards,
> 
> Tony
> 
> 8< -
> >From tony Mon Sep 17 00:00:00 2001
> From: Tony Lindgren 
> Date: Thu, 9 Nov 2017 17:05:34 -0800
> Subject: [PATCH] ARM: OMAP2+: Add static SRAM mapping for
>  save_secure_ram_context
> 
> With the CMA changes from Joonsoo Kim , it
> was noticed that n900 stopped booting. After investigating it turned
> out that n900 save_secure_ram_context does some whacky virtual to
> physical address translation for the SRAM data address.
> 
> Let's fix this for CMA changes by adding a static mapping for SRAM
> on omap3. Then we can follow up with a patch to clean up the address
> translation in save_secure_ram_context later on.
> 
> Debugged-by: Joonsoo Kim 
> Signed-off-by: Tony Lindgren 
> ---
>  arch/arm/mach-omap2/io.c| 6 ++
>  arch/arm/mach-omap2/iomap.h | 4 
>  2 files changed, 10 insertions(+)
> 
> diff --git a/arch/arm/mach-omap2/io.c b/arch/arm/mach-omap2/io.c
> --- a/arch/arm/mach-omap2/io.c
> +++ b/arch/arm/mach-omap2/io.c
> @@ -139,6 +139,12 @@ static struct map_desc omap243x_io_desc[] __initdata = {
>  
>  #ifdef   CONFIG_ARCH_OMAP3
>  static struct map_desc omap34xx_io_desc[] __initdata = {
> + {
> + .virtual= OMAP34XX_SRAM_VIRT,
> + .pfn= __phys_to_pfn(OMAP34XX_SRAM_PHYS),
> + .length = OMAP34XX_SRAM_SIZE,
> + .type   = MT_DEVICE
> + },
>   {
>   .virtual= L3_34XX_VIRT,
>   .pfn= __phys_to_pfn(L3_34XX_PHYS),
> diff --git a/arch/arm/mach-omap2/iomap.h b/arch/arm/mach-omap2/iomap.h
> --- a/arch/arm/mach-omap2/iomap.h
> +++ b/arch/arm/mach-omap2/iomap.h
> @@ -123,6 +123,10 @@
>   * VPOM3430 was not working for Int controller
>   */
>  
> +#define OMAP34XX_SRAM_PHYS   0x4020
> +#define OMAP34XX_SRAM_VIRT   0xd001
> +#define OMAP34XX_SRAM_SIZE   0x1

For my testing environment, vmalloc address space is started at
roughly 0xe000 so 0xd001 would not be valid. And, PHYS
can be different according to the system type. Maybe either
OMAP3_SRAM_PUB_PA or OMAP3_SRAM_PA. It seems that SIZE and TYPE should
be considered, too. My understanding is correct?

Thanks.

Re: [PATCH 24/30] x86, kaiser: disable native VSYSCALL

2017-11-09 Thread Dave Hansen

On 11/09/2017 06:25 PM, Andy Lutomirski wrote:
> Here are two proposals to address this without breaking vsyscalls.
> 
> 1. Set NX on low mappings that are _PAGE_USER.  Don't set NX on high
> mappings but, optionally, warn if you see _PAGE_USER on any address
> that isn't the vsyscall page.
> 
> 2. Ignore _PAGE_USER entirely and just mark the EFI mm as special so
> KAISER doesn't muck with it.

These are totally doable.  But, what's the big deal with breaking native
vsyscall?  We can still do the emulation so nothing breaks: it is just slow.

Re: [PATCH 3/4] kbuild: create object directories simpler and faster

2017-11-09 Thread Ingo Molnar


* Masahiro Yamada  wrote:

> For the out-of-tree build, scripts/Makefile.build creates output
> directories, but this operation is not efficient.
> 
> scripts/Makefile.lib calculates obj-dirs as follows:
> 
>   obj-dirs := $(dir $(multi-objs) $(obj-y))
> 
> Please notice $(sort ...) is not used here.  Usually the resulted
> obj-dirs is as many "./" as objects.
> 
> For those duplicated paths, the following command is invoked.
> 
>   _dummy := $(foreach d,$(obj-dirs), $(shell [ -d $(d) ] || mkdir -p $(d)))
> 
> Then, the costly shell command is run over and over again.
> 
> I see many points for optimization:
> 
> [1] Use $(sort ...) to cut down duplicated paths before passing them
> to system call
> [2] Use single $(shell ...) instead of repeating it with $(foreach ...)
> This will reduce forking.
> [3] We can calculate obj-dirs more simply.  Most of objects are already
> accumulated in $(targets).  So, $(dir $(targets)) is fine and more
> comprehensive.
> 
> I also removed bad code in arch/x86/entry/vdso/Makefile.  This is now
> really unnecessary.
> 
> Signed-off-by: Masahiro Yamada 
> ---
> 
>  arch/x86/entry/vdso/Makefile |  4 
>  scripts/Makefile.build   | 15 ++-
>  scripts/Makefile.host| 11 ---
>  scripts/Makefile.lib |  5 -
>  4 files changed, 6 insertions(+), 29 deletions(-)

I love not just the speedup, but the diffstat as well ;-)

Acked-by: Ingo Molnar 

Thanks,

Ingo

Re: n900 in next-20170901

2017-11-09 Thread Tony Lindgren

* Tony Lindgren  [171109 22:19]:
> * Tony Lindgren  [171110 03:28]:
> > Then I'll follow up on cleaning up save_secure_ram_context later.
> 
> Here's a better version, the static mapping did not get used.. It
> just moved the area so it happened to work. It needs to be set
> up as MT_MEMORY_RWX_NONCACHED instead.

And FYI, here's what I currently have for the follow-up patch,
but that can wait a bit.

Regards,

Tony

8< 
diff --git a/arch/arm/mach-omap2/sleep34xx.S b/arch/arm/mach-omap2/sleep34xx.S
--- a/arch/arm/mach-omap2/sleep34xx.S
+++ b/arch/arm/mach-omap2/sleep34xx.S
@@ -45,7 +45,6 @@
 #define PM_PWSTCTRL_MPU_P  OMAP3430_PRM_BASE + MPU_MOD + OMAP2_PM_PWSTCTRL
 #define CM_IDLEST1_CORE_V  OMAP34XX_CM_REGADDR(CORE_MOD, CM_IDLEST1)
 #define CM_IDLEST_CKGEN_V  OMAP34XX_CM_REGADDR(PLL_MOD, CM_IDLEST)
-#define SRAM_BASE_POMAP3_SRAM_PA
 #define CONTROL_STAT   OMAP343X_CTRL_BASE + OMAP343X_CONTROL_STATUS
 #define CONTROL_MEM_RTA_CTRL   (OMAP343X_CTRL_BASE +\
OMAP36XX_CONTROL_MEM_RTA_CTRL)
@@ -103,10 +102,8 @@ ENTRY(save_secure_ram_context)
stmfd   sp!, {r4 - r11, lr} @ save registers on stack
adr r3, api_params  @ r3 points to parameters
str r0, [r3,#0x4]   @ r0 has sdram address
-   ldr r12, high_mask
-   and r3, r3, r12
-   ldr r12, sram_phy_addr_mask
-   orr r3, r3, r12
+   ldr r12, sram_phys_offset   @ load sram physical offset
+   sub r3, r3, r12 @ parameters physical address
mov r0, #25 @ set service ID for PPA
mov r12, r0 @ copy secure service ID in r12
mov r1, #0  @ set task id for ROM code in r1
@@ -121,10 +118,8 @@ ENTRY(save_secure_ram_context)
nop
ldmfd   sp!, {r4 - r11, pc}
.align
-sram_phy_addr_mask:
-   .word   SRAM_BASE_P
-high_mask:
-   .word   0x
+sram_phys_offset:
+   .word   OMAP34XX_SRAM_VIRT - OMAP34XX_SRAM_PHYS
 api_params:
.word   0x4, 0x0, 0x0, 0x1, 0x1
 ENDPROC(save_secure_ram_context)
@@ -521,7 +516,7 @@ pm_pwstctrl_mpu:
 scratchpad_base:
.word   SCRATCHPAD_BASE_P
 sram_base:
-   .word   SRAM_BASE_P + 0x8000
+   .word   OMAP34XX_SRAM_PHYS + 0x8000
 control_stat:
.word   CONTROL_STAT
 control_mem_rta:
diff --git a/arch/arm/mach-omap2/sram.c b/arch/arm/mach-omap2/sram.c
--- a/arch/arm/mach-omap2/sram.c
+++ b/arch/arm/mach-omap2/sram.c
@@ -31,7 +31,7 @@
 #include "sram.h"
 
 #define OMAP2_SRAM_PUB_PA  (OMAP2_SRAM_PA + 0xf800)
-#define OMAP3_SRAM_PUB_PA   (OMAP3_SRAM_PA + 0x8000)
+#define OMAP3_SRAM_PUB_PA   (OMAP34XX_SRAM_PHYS + 0x8000)
 
 #define SRAM_BOOTLOADER_SZ 0x00
 
@@ -105,7 +105,7 @@ static void __init omap_detect_sram(void)
}
} else {
if (cpu_is_omap34xx()) {
-   omap_sram_start = OMAP3_SRAM_PA;
+   omap_sram_start = OMAP34XX_SRAM_PHYS;
omap_sram_size = 0x1; /* 64K */
} else {
omap_sram_start = OMAP2_SRAM_PA;
diff --git a/arch/arm/mach-omap2/sram.h b/arch/arm/mach-omap2/sram.h
--- a/arch/arm/mach-omap2/sram.h
+++ b/arch/arm/mach-omap2/sram.h
@@ -59,4 +59,3 @@ static inline void omap_push_sram_idle(void) {}
  * Used by the SRAM management code and the idle sleep code.
  */
 #define OMAP2_SRAM_PA  0x4020
-#define OMAP3_SRAM_PA   0x4020
-- 
2.15.0

Re: [PATCH 30/31] dt-bindings: nds32 CPU Bindings

2017-11-09 Thread Greentime Hu

2017-11-09 21:57 GMT+08:00 Rob Herring :
> On Thu, Nov 9, 2017 at 3:39 AM, Greentime Hu  wrote:
>> 2017-11-08 21:18 GMT+08:00 Rob Herring :
>>> Please Cc the DT list on bindings.
>>
>> Sorry. I am not sure what you mean.
>> Do you mean add devicet...@vger.kernel.org to cc list?
>
> Yes. Use get_maintainers.pl as a guide.

Roger that! Thanks!

>>> On Tue, Nov 7, 2017 at 11:55 PM, Greentime Hu  wrote:
>>>> From: Greentime Hu 
>>>
>
>>>> +   device_type = "cpu";
>>>> +   compatible = "andestech,n13", "andestech,n15";
>>>
>>> n13 is a superset of n15?
>>
>> No, they are independent ones.
>
> Then having both is not valid. The strings should be in order of best
> match to worst match where worst match is typically either older
> implementations of IP blocks or generic'ish strings such as "ns16550"
> for a UART.

Thanks.
I would like to explain it more clearly.
They are independent ones in implementations.
They are implemented based on the same nds32 ISA and architecture spec
with different configurations
like cache size, page size, cache type(VIPT/PIPT), pipeline stages...
Most of them are compatible.
They use the same toolchain to build vmlinux which can run on
different nds32 cores.

Re: [PATCHv4 1/3] ARMv8: dts: ls1046a: add the property of IB and OB

2017-11-09 Thread Kishon Vijay Abraham I

Hi Bao,

On Friday 10 November 2017 09:18 AM, Bao Xiaowei wrote:
> Add the property of inbound and outbound windows number for ep
> driver.
> 
> Signed-off-by: Bao Xiaowei 
> Acked-by: Minghuan Lian 
> ---
>  v2:
>  - no change
>  v3:
>  - modify the commit message
>  v4:
>  - no change
> 
>  arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi | 6 ++
>  1 file changed, 6 insertions(+)

$subject should start with something like
arm64: dts: ls1046a: **
> 
> diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
> index 06b5e12d04d8..f8332669663c 100644
> --- a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
> +++ b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
> @@ -674,6 +674,8 @@
>   device_type = "pci";
>   dma-coherent;
>   num-lanes = <4>;
> + num-ib-windows = <6>;
> + num-ob-windows = <6>;

EP specific properties shouldn't be added in RC dt node. Ideally you should
have a separate dt node for RC and EP.

Thanks
Kishon

Re: n900 in next-20170901

2017-11-09 Thread Tony Lindgren

* Tony Lindgren  [171110 03:28]:
> * Joonsoo Kim  [171110 00:10]:
> > On Thu, Nov 09, 2017 at 07:08:54AM -0800, Tony Lindgren wrote:
> > > Hmm OK. Does your first patch above now have the initcall issue too?
> > > It boots if I make that also subsys_initcall and then I get:
> > 
> > > [2.078094] vmalloc_pool_init: DMA: get vmalloc area: d001
> > 
> > Yes, first patch has the initcall issue and it's intentional in order
> > to check the theory. I checked following log for this.
> > 
> > - Boot failure
> > SRAM_ADDR: omap_map_sram: P: 0x40208000 - 0x4020f000
> > SRAM_ADDR: omap_map_sram: V: 0xd005 - 0xd0057000
> > 
> > - Boot success
> > SRAM_ADDR: omap_map_sram: P: 0x40208000 - 0x4020f000
> > SRAM_ADDR: omap_map_sram: V: 0xd0008000 - 0xd000f000
> > 
> > When failure, virtual address for sram is higher than normal one due
> > to vmalloc area allocation in __dma_alloc_remap(). If it is deferred,
> > virtual address is the same with success case and then the system work.
> > 
> > So, my next theory is that there is n900 specific assumption that sram
> > should have that address. Could you check if any working tree for n900
> > which doesn't have my CMA series work or not with adding
> > "arm/dma: vmalloc area allocation"?
> 
> Oh I see, sorry I was not following you earlier. So you mean that
> by adding the vmalloc_pool_init() initcall the va mapping for SRAM
> changes.
> 
> And yes, save_secure_ram_context seems to be doing some sketchy
> virt to phys calculation with sram_phy_addr_mask. Here's a small
> patch to fix that for your CMA series, maybe you can merge it
> with your series to avoid breaking booting for git bisect.
> 
> Then I'll follow up on cleaning up save_secure_ram_context later.

Here's a better version, the static mapping did not get used.. It
just moved the area so it happened to work. It needs to be set
up as MT_MEMORY_RWX_NONCACHED instead.

Regards,

Tony

8< ---
>From tony Mon Sep 17 00:00:00 2001
From: Tony Lindgren 
Date: Thu, 9 Nov 2017 17:05:34 -0800
Subject: [PATCH] ARM: OMAP2+: Add static SRAM mapping for
 save_secure_ram_context

With the CMA changes from Joonsoo Kim , it
was noticed that n900 stopped booting. After investigating it turned
out that n900 save_secure_ram_context does some whacky virtual to
physical address translation for the SRAM data address.

Let's fix this for CMA changes by adding a static mapping for SRAM
on omap3. Then we can follow up with a patch to clean up the address
translation in save_secure_ram_context later on.

Debugged-by: Joonsoo Kim 
Signed-off-by: Tony Lindgren 
---
 arch/arm/mach-omap2/io.c| 6 ++
 arch/arm/mach-omap2/iomap.h | 4 
 2 files changed, 10 insertions(+)

diff --git a/arch/arm/mach-omap2/io.c b/arch/arm/mach-omap2/io.c
--- a/arch/arm/mach-omap2/io.c
+++ b/arch/arm/mach-omap2/io.c
@@ -139,6 +139,12 @@ static struct map_desc omap243x_io_desc[] __initdata = {
 
 #ifdef CONFIG_ARCH_OMAP3
 static struct map_desc omap34xx_io_desc[] __initdata = {
+   {
+   .virtual= OMAP34XX_SRAM_VIRT,
+   .pfn= __phys_to_pfn(OMAP34XX_SRAM_PHYS),
+   .length = OMAP34XX_SRAM_SIZE,
+   .type   = MT_MEMORY_RWX_NONCACHED
+   },
{
.virtual= L3_34XX_VIRT,
.pfn= __phys_to_pfn(L3_34XX_PHYS),
diff --git a/arch/arm/mach-omap2/iomap.h b/arch/arm/mach-omap2/iomap.h
--- a/arch/arm/mach-omap2/iomap.h
+++ b/arch/arm/mach-omap2/iomap.h
@@ -123,6 +123,10 @@
  * VPOM3430 was not working for Int controller
  */
 
+#define OMAP34XX_SRAM_PHYS 0x4020
+#define OMAP34XX_SRAM_VIRT 0xd001
+#define OMAP34XX_SRAM_SIZE 0x1
+
 #define L4_PER_34XX_PHYS   L4_PER_34XX_BASE
/* 0x4900 --> 0xfb00 */
 #define L4_PER_34XX_VIRT   (L4_PER_34XX_PHYS + OMAP2_L4_IO_OFFSET)
-- 
2.15.0

Re: KGDB/KDB treats WARN*() as Oops on x86 since 4.12

2017-11-09 Thread Ingo Molnar


* Ilya Dryomov  wrote:

> On Fri, Oct 13, 2017 at 4:59 PM, Daniel Thompson
>  wrote:
> > On 09/10/17 13:24, Ilya Dryomov wrote:
> >>
> >> Hi Jason,
> >>
> >> Starting with 4.12, WARN*() is implemented with ud0, generating an
> >> Invalid Opcode exception.  KGDB/KDB gets entered as if it were an Oops,
> >> making KGDB/KDB rather hard to use, particularly on testing kernels.
> >>
> >> Alexander posted a fix a while back, but Peter seems to be waiting for
> >> your ack.  Could you please weigh in?
> >>
> >>[PATCH] x86/debug: Handle warnings before the notifier chain
> >>https://patchwork.kernel.org/patch/9859065/
> >
> >
> > Hmnnn... IIRC arm64 code has been also been blocked for a couple of releases
> > whilst Will D. waited for an ack that never came.
> >
> > My own reading of the code is that the patch in question restores the status
> > quo, that there will still be mechanisms to provoke entry to kdb/kgdb during
> > a warning (breakpoint on __warn, engage panic_on_warn, etc) and that these
> > are not obviously recursive[1].
> >
> > Put another way I'm happy to dig the patch out of my mail archive and throw
> > in an Acked-By: but since I have no official role within kdb/kgdb (I'm just
> > an interested bystander) it might not be enough for Peter.
> >
> >
> > Daniel.
> >
> >
> > [1] I'm not a huge x86 expert so correct me if I am wrong but I think
> > its ok for us to trap here providing its for a different reason.
> 
> Hi Peter, Ingo,
> 
> Could you please consider taking Alexander's patch for 4.15?  Jason
> never replied to any of our pings and hasn't been actively involved
> with kgdb recently.  In the meantime, this regression makes running
> e.g. xfstests runs with kgdb enabled pretty much impossible.

Ok, agreed, I picked the fix up into tip:x86/urgent, with a -stable
backporting tag, and will try to get it to Linus for v4.15 (it will also
get backported to v4.14 which is affected as well).

Thanks,

Ingo

Re: [PATCH] x86, pkeys: update documentation about availability

2017-11-09 Thread Ingo Molnar


* Dave Hansen  wrote:

> 
> From: Dave Hansen 
> 
> Now that CPUs that implement Memory Protection Keys are publicly
> available we can be a bit less oblique about where it is available.
> 
> Signed-off-by: Dave Hansen 
> ---
> 
>  b/Documentation/x86/protection-keys.txt |9 +++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff -puN Documentation/x86/protection-keys.txt~pkeys-update Documentation/x86/protection-keys.txt
> --- a/Documentation/x86/protection-keys.txt~pkeys-update  2017-11-09 10:36:53.381467202 -0800
> +++ b/Documentation/x86/protection-keys.txt   2017-11-09 10:43:15.527466249 -0800
> @@ -1,5 +1,10 @@
> -Memory Protection Keys for Userspace (PKU aka PKEYs) is a CPU feature
> -which will be found on future Intel CPUs.
> +Memory Protection Keys for Userspace (PKU aka PKEYs) is a feature
> +which is found on Intel's Skylake "Scalable Processor" Server CPUs.
> +It will be avalable in future non-server parts.
> +
> +For anyone wishing to test or use this feature, it is available in
> +Amazon's EC2 C5 instances and is known to work there using an Ubuntu
> +17.04 image.
>  
>  Memory Protection Keys provides a mechanism for enforcing page-based
>  protections, but without requiring modification of the page tables

Could we please first fix the pkeys self-test? One of the testcases doesn't
build at all:

 gcc -m32 -o /home/mingo/tip/tools/testing/selftests/x86/protection_keys_32 -O2 -g -std=gnu99 -pthread -Wall -no-pie  protection_keys.c -lrt -ldl -lm
 In file included from /usr/include/signal.h:57:0,
  from protection_keys.c:33:
 protection_keys.c: In function ‘signal_handler’:
 protection_keys.c:253:6: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘.’ token
   u64 si_pkey;
   ^

plus, on a related note, the MPX testcase produces annoying warnings:

 gcc -m32 -o /home/mingo/tip/tools/testing/selftests/x86/mpx-mini-test_32 -O2 -g -std=gnu99 -pthread -Wall -no-pie  mpx-mini-test.c -lrt -ldl -lm
 mpx-mini-test.c: In function ‘insn_test_failed’:
 mpx-mini-test.c:1406:3: warning: array subscript is above array bounds [-Warray-bounds]
   printf("bte[1]: %lx\n", bte->contents[1]);
   ^
 mpx-mini-test.c:1407:3: warning: array subscript is above array bounds [-Warray-bounds]
   printf("bte[2]: %lx\n", bte->contents[2]);
   ^
 mpx-mini-test.c:1408:3: warning: array subscript is above array bounds [-Warray-bounds]
   printf("bte[3]: %lx\n", bte->contents[3]);
   ^

Thanks,

Ingo

Re: [01/18] x86/asm/64: Remove the restore_c_regs_and_iret label

2017-11-09 Thread kemi

LKP-tools reports some performance regressions and improvements for this
patch series, tested on an Intel Atom processor. I am posting the data
here for your reference.

Branch:x86/entry_consolidation
Commit id:
 base:50da9d439392fdd91601d36e7f05728265bff262
 head:69af865668fdb86a95e4e948b1f48b2689d60b73
Benchmark suite:will-it-scale
Download link:https://github.com/antonblanchard/will-it-scale/tree/master/tests
Metrics:
 will-it-scale.per_process_ops=processes/nr_cpu
 will-it-scale.per_thread_ops=threads/nr_cpu

tbox:lkp-avoton3(nr_cpu=8,memory=16G)
CPU: Intel(R) Atom(TM) CPU  C2750  @ 2.40GHz
Performance regression with will-it-scale benchmark suite:
testcase        base     change  head     metric
eventfd1        1505677  -5.9%   1416132  will-it-scale.per_process_ops
                1352716  -3.0%   1311943  will-it-scale.per_thread_ops
lseek2          7306698  -4.3%   6991473  will-it-scale.per_process_ops
                4906388  -3.6%   4730531  will-it-scale.per_thread_ops
lseek1          7355365  -4.2%   7046224  will-it-scale.per_process_ops
                4928961  -3.7%   4748791  will-it-scale.per_thread_ops
getppid1        8479806  -4.1%   8129026  will-it-scale.per_process_ops
                8515252  -4.1%   8162076  will-it-scale.per_thread_ops
lock1           1054249  -3.2%   1020895  will-it-scale.per_process_ops
                989145   -2.6%   963578   will-it-scale.per_thread_ops
dup1            2675825  -3.0%   2596257  will-it-scale.per_process_ops
futex3          4986520  -2.8%   4846640  will-it-scale.per_process_ops
                5009388  -2.7%   4875126  will-it-scale.per_thread_ops
futex4          3932936  -2.0%   3854240  will-it-scale.per_process_ops
                3950138  -2.0%   3872615  will-it-scale.per_thread_ops
futex1          2941886  -1.8%   2888912  will-it-scale.per_process_ops
futex2          2500203  -1.6%   2461065  will-it-scale.per_process_ops
                1534692  -2.3%   1499532  will-it-scale.per_thread_ops
malloc1         61314    -1.0%   60725    will-it-scale.per_process_ops
                19996    -1.5%   19688    will-it-scale.per_thread_ops

Performance improvement with will-it-scale benchmark suite:
testcase        base     change  head     metric
context_switch1 176376   +1.6%   179152   will-it-scale.per_process_ops
                180703   +1.9%   184209   will-it-scale.per_thread_ops
page_fault2     179716   +2.5%   184272   will-it-scale.per_process_ops
                146890   +2.8%   150989   will-it-scale.per_thread_ops
page_fault3     666953   +3.7%   691735   will-it-scale.per_process_ops
                464641   +5.0%   487952   will-it-scale.per_thread_ops
unix1           483094   +4.4%   504201   will-it-scale.per_process_ops
                450055   +7.5%   483637   will-it-scale.per_thread_ops
read2           575887   +5.0%   604440   will-it-scale.per_process_ops
                500319   +5.2%   526361   will-it-scale.per_thread_ops
poll1           4614597  +5.4%   4864022  will-it-scale.per_process_ops
                3981551  +5.8%   4213409  will-it-scale.per_thread_ops
pwrite2         383344   +5.7%   405151   will-it-scale.per_process_ops
                367006   +5.0%   385209   will-it-scale.per_thread_ops
sched_yield     3011191  +6.0%   3191710  will-it-scale.per_process_ops
                3024171  +6.1%   3208197  will-it-scale.per_thread_ops
pipe1           755487   +6.2%   802622   will-it-scale.per_process_ops
                705136   +8.8%   766950   will-it-scale.per_thread_ops
pwrite3         422850   +6.6%   450660   will-it-scale.per_process_ops
                413370   +3.7%   428704   will-it-scale.per_thread_ops
readseek1       972102   +6.7%   1036852  will-it-scale.per_process_ops
                844877   +6.6%   900686   will-it-scale.per_thread_ops
pwrite1         981310   +6.8%   1047809  will-it-scale.per_process_ops
                94

Re: [PATCHv3 1/1] locking/qspinlock/x86: Avoid test-and-set when PV_DEDICATED is set

2017-11-09 Thread Wanpeng Li

2017-11-10 0:00 GMT+08:00 Radim Krcmar :
> 2017-11-09 20:43+0800, Wanpeng Li:
>> 2017-11-07 4:26 GMT+08:00 Eduardo Valentin :
>> > Currently, the existing qspinlock implementation will fallback to
>> > test-and-set if the hypervisor has not set the PV_UNHALT flag.
>> >
>> > This patch gives the opportunity to guest kernels to select
>> > between test-and-set and the regular queue fair lock implementation
>> > based on the PV_DEDICATED KVM feature flag. When the PV_DEDICATED
>> > flag is not set, the code will still fall back to test-and-set,
>> > but when the PV_DEDICATED flag is set, the code will use
>> > the regular queue spinlock implementation.
>> >
>> > With this patch, when in autoselect mode, the guest will
>> > use the default spinlock implementation based on host feature
>> > flags as follows:
>> >
>> > PV_DEDICATED = 1, PV_UNHALT = anything: default is qspinlock
>> > PV_DEDICATED = 0, PV_UNHALT = 1: default is pvqspinlock
>> > PV_DEDICATED = 0, PV_UNHALT = 0: default is tas
>> >
>> > Cc: Paolo Bonzini 
>> > Cc: "Radim Krčmář" 
>> > Cc: Jonathan Corbet 
>> > Cc: Thomas Gleixner 
>> > Cc: Ingo Molnar 
>> > Cc: "H. Peter Anvin" 
>> > Cc: x...@kernel.org
>> > Cc: Peter Zijlstra 
>> > Cc: Waiman Long 
>> > Cc: k...@vger.kernel.org
>> > Cc: linux-...@vger.kernel.org
>> > Cc: linux-kernel@vger.kernel.org
>> > Cc: Jan H. Schoenherr 
>> > Cc: Anthony Liguori 
>> > Suggested-by: Matt Wilson 
>> > Signed-off-by: Eduardo Valentin 
>> > ---
>>
>> You should also add a cpuid flag in kvm part.
>
> It is better without that.  The flag has no dependency on KVM (kernel
> hypervisor) code.

Do you mean -cpu host, +，I think it will result in "warning: host
doesn't support requested feature: CPUID.4001H:eax.XX"

Regards,
Wanpeng Li
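
For readers following the thread, the autoselect table quoted above can be sketched as a small decision function. This is only an illustration of the three rows in the commit message; the enum and function names are made up here and are not the kernel's identifiers.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; not taken from the kernel sources. */
enum lock_impl {
	LOCK_QSPINLOCK,   /* regular queued spinlock */
	LOCK_PVQSPINLOCK, /* paravirtualized queued spinlock */
	LOCK_TAS          /* test-and-set fallback */
};

/*
 * Mirror the table from the patch description:
 *   PV_DEDICATED = 1, PV_UNHALT = anything -> qspinlock
 *   PV_DEDICATED = 0, PV_UNHALT = 1        -> pvqspinlock
 *   PV_DEDICATED = 0, PV_UNHALT = 0        -> tas
 */
static enum lock_impl select_lock(bool pv_dedicated, bool pv_unhalt)
{
	if (pv_dedicated)
		return LOCK_QSPINLOCK;
	if (pv_unhalt)
		return LOCK_PVQSPINLOCK;
	return LOCK_TAS;
}
```

Checking PV_DEDICATED first is what makes the PV_UNHALT value irrelevant in the first row of the table.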

Re: Kernel crash in free_pipe_info()

2017-11-09 Thread Simon Brewer

On 1 November 2017 at 14:19, Cong Wang  wrote:
> On Mon, Oct 30, 2017 at 7:08 PM, Linus Torvalds
>  wrote:
>> On Mon, Oct 30, 2017 at 6:19 PM, Cong Wang  wrote:
>>>
>>> 1. The faulty addresses are all near 0001, with one exception
>>> of null (which is the most recent one)
>>
>> Well, they're at 8(%rax), except for that last case.
>>
>> And in every case (_including_ that last case), %rax has a very
>> interesting pattern.. That's the (bad) buf->ops pointer that  was
>> loaded from the somehow corrupted "buf".
>>
>> The values in all cases are
>>
>> fffa
>> fffd
>> fff1
>> fff7
>> fff4
>> fffa
>> fffd
>> fffd
>> fffa
>> ffe8
>> fff1
>> fff7
>>
>> which kind of looks like a 32-bit error value. So we have (n, val, (errno)):
>>
>>   1 -24 (EMFILE)
>>   2 -15 (ENOTBLK)
>>   1 -12 (ENOMEM)
>>   2 -9 (EBADF)
>>   3 -6 (ENXIO)
>>   3 -3 (ESRCH)
>>
>> none of which makes any sense to me, but it's an interesting pattern
>> nonetheless.
>
>
> Yeah, good find!
>
>
>>
>>> 2. R12 register, which should map to the local variable 'i', is always 0x8
>>> at the time of crash.
>>
>> So _if_ this is some kind of use-after-free thing, and the allocation
>> got re-used for something else, that might just be related to whatever
>> ends up being the offset that is filled in with the (int) error
>> number.
>>
>> Except the offset is that %r12*0x28+0x10, so we're talking a byte
>> offset of 336 bytes into the allocation, and apparently the eight
>> previous (0-7) iterations were fine.
>>
>> Which is really odd.
>>
>> I'm not seeing anything that makes sense. I'll have to think about this.
>>
>> I'm assuming you don't have slub debugging enabled, and no way to
>> enable it and try to catch this?
>
> We enable it at compile-time but not at run-time:
>
> CONFIG_SLUB_DEBUG=y
> CONFIG_SLUB=y
> CONFIG_SLUB_CPU_PARTIAL=y
> # CONFIG_SLUB_DEBUG_ON is not set
> # CONFIG_SLUB_STATS is not set
>
> I can try to manually add slub_debug in boot parameters, but still
> have no idea how and when can trigger this bug again.
>
>
> Thanks!

This looks familiar...

https://github.com/moby/moby/issues/34472

From the bug report:
"In particular, it looks like either docker-containerd or
docker-containerd-shim (the log is cut off) has a pipe open that is
causing a kernel BUG when attempting to kill the process. Fun times."
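
As a worked check of the arithmetic discussed in the quoted analysis, here is a minimal sketch. The stride 0x28 and field offset 0x10 come from the %r12*0x28+0x10 addressing mentioned above; the helper names are hypothetical, and since the full register values are elided in the archive, any value fed to low32_as_int() is illustrative only.

```c
#include <stddef.h>
#include <stdint.h>

/* Taken from the %r12*0x28+0x10 addressing quoted in the thread. */
#define BUF_STRIDE       0x28u
#define OPS_FIELD_OFFSET 0x10u

/* Byte offset of the 'ops' pointer of slot i within the buffer array. */
static size_t ops_byte_offset(size_t i)
{
	return i * BUF_STRIDE + OPS_FIELD_OFFSET;
}

/*
 * Interpret the low 32 bits of a 64-bit value as a signed int, which is
 * how a value such as 0x...fffffffa reads as -6 (-ENXIO on Linux).
 * Written without relying on implementation-defined narrowing casts.
 */
static int low32_as_int(uint64_t v)
{
	uint32_t u = (uint32_t)(v & 0xffffffffu);

	if (u & 0x80000000u)
		return -(int)(0xffffffffu - u) - 1;
	return (int)u;
}
```

With i = 8 the offset is 8 * 0x28 + 0x10 = 0x150, i.e. 336 bytes into the allocation.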

Re: [PATCH v2 2/4] kaslr: select the memory region in immovable node to process

2017-11-09 Thread Chao Fan

On Fri, Nov 10, 2017 at 11:14:37AM +0800, Baoquan He wrote:
>On 11/10/17 at 11:03am, Chao Fan wrote:
>> On Thu, Nov 09, 2017 at 04:21:32PM +0800, Baoquan He wrote:
>> >Hi Chao,
>> >
>> >On 11/01/17 at 07:32pm, Chao Fan wrote:
>> >> Compare the region of memmap entry and immovable_mem, then choose the
>> >> intersection to process_mem_region.
>> >> 
>> >> Since the interrelationship between e820 or efi entries and memory
>> >> region in immovable_mem is different:
>> >
>> >Could you paste a bootlog with efi=debug specified in cmdline on the
>> >system you tested? I want to check what kind of intersection between
>> >them. The adding makes code pretty ugly, want to make sure if we have
>> >to do like this.
>> Hi Baoquan,
>> 
>> Here is a machine with efi.
>

Here is a log for e820, also 10 nodes in this machine.

Thanks,
Chao Fan

>Thanks, do you have the whole boot log? I want to have a look at e820.
>And this is a special system, or a customized system? I mean you just
>customize the firmware for better testing to cover kinds of cases.
>
>If it's too big, please attach it and send to me privately.
>
>Anyway, seems your considering about the intersection is right.
>
>Thanks
>Baoquan
>> 
>> The memory information in SRAT from dmesg:
>> [0.00] ACPI: SRAT: Node 0 PXM 0 [mem 0x-0x0009]
>> [0.00] ACPI: SRAT: Node 0 PXM 0 [mem 0x0010-0x1f3f]
>> [0.00] ACPI: SRAT: Node 1 PXM 1 [mem 0x1f40-0x3e7f]
>> [0.00] ACPI: SRAT: Node 2 PXM 2 [mem 0x3e80-0x5dbf]
>> [0.00] ACPI: SRAT: Node 3 PXM 3 [mem 0x5dc0-0x7cff]
>> [0.00] ACPI: SRAT: Node 4 PXM 4 [mem 0x7d00-0x9c3f]
>> [0.00] ACPI: SRAT: Node 5 PXM 5 [mem 0x9c40-0xbb7f]
>> [0.00] ACPI: SRAT: Node 6 PXM 6 [mem 0xbb80-0xbfff]
>> [0.00] ACPI: SRAT: Node 6 PXM 6 [mem 0x1-0x11abf]
>> [0.00] ACPI: SRAT: Node 7 PXM 7 [mem 0x11ac0-0x139ff]
>> [0.00] ACPI: SRAT: Node 8 PXM 8 [mem 0x13a00-0x1593f]
>> [0.00] ACPI: SRAT: Node 9 PXM 9 [mem 0x15940-0x1787f]
>> 
>> There are 10 nodes, and 500M memory in every node.
>> And node0 and node 6 has two parts.
>> 
>> 
>> Here is the efi mem:
>> [0.00] efi: mem00: [Boot Code  |   |  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x-0x0fff] (0MB)
>> [0.00] efi: mem01: [Loader Data|   |  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x1000-0x1fff] (0MB)
>> [0.00] efi: mem02: [Conventional Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x2000-0x0009] (0MB)
>> [0.00] efi: mem03: [Conventional Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x0010-0x00805fff] (7MB)
>> [0.00] efi: mem04: [Boot Data  |   |  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x00806000-0x00806fff] (0MB)
>> [0.00] efi: mem05: [Conventional Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x00807000-0x0081] (0MB)
>> [0.00] efi: mem06: [Boot Data  |   |  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x0082-0x012f] (10MB)
>> [0.00] efi: mem07: [Conventional Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x0130-0x01ff] (13MB)
>> [0.00] efi: mem08: [Loader Data|   |  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x0200-0x036e3fff] (22MB)
>> (From mem00 to mem08, belongs to node0)
>> [0.00] efi: mem09: [Conventional Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x036e4000-0x3d626fff] (927MB)
>> (mem09 has part of node0 and part of node1, but not the whole of node0 and node1)
>> [0.00] efi: mem10: [Loader Data|   |  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x3d627000-0x3fff] (41MB)
>> (part of node1 and part of node2)
>> [0.00] efi: mem11: [Conventional Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x4000-0x8c92dfff] (1225MB)
>> [0.00] efi: mem12: [Loader Data|   |  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x8c92e000-0xbbfbdfff] (758MB)
>> [0.00] efi: mem13: [Boot Data  |   |  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0xbbfbe000-0xbbfddfff] (0MB)
>> [0.00] efi: mem14: [Conventional Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0xbbfde000-0xbe350fff] (35MB)
>> [0.00] efi: mem15: [Loader Data|   |  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0xbe351000-0xbe579fff] (2MB)
>> [0.00] efi: mem16: [Loader Code|   |  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0xbe57a000-0xbe6a0fff] (1MB)
>> [0.00] efi: mem17: [Boot Data  |   |  |  |  |  |  |  |   |WB|WT|WC|UC] ra
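
The "choose the intersection" step described in the patch summary can be sketched as a plain interval intersection. This is a minimal sketch, not the code from the patch: struct mem_range and range_intersect() are hypothetical names, and ranges are assumed to be half-open [start, start + size).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical range type: covers [start, start + size). */
struct mem_range {
	uint64_t start;
	uint64_t size;
};

/*
 * Compute the overlap of a memmap entry and an immovable memory region.
 * Returns false when the two do not overlap, so the caller can skip the
 * entry; otherwise stores the overlapping part in *out.
 */
static bool range_intersect(const struct mem_range *entry,
			    const struct mem_range *immovable,
			    struct mem_range *out)
{
	uint64_t entry_end = entry->start + entry->size;
	uint64_t imm_end = immovable->start + immovable->size;
	uint64_t start = entry->start > immovable->start ?
			 entry->start : immovable->start;
	uint64_t end = entry_end < imm_end ? entry_end : imm_end;

	if (start >= end)
		return false;

	out->start = start;
	out->size = end - start;
	return true;
}
```

An entry like mem09 above, which straddles two nodes, would yield one intersection per immovable region it overlaps when this is applied to each region in turn.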

[PATCH] perf evsel: Fix incorrect precise_ip in default event name

2017-11-09 Thread Mengting Zhang

When no event is specified with the -e option, perf creates a default
"cycles" event with the highest level of precision available in
perf_event_attr.precise_ip. But the evsel name shows one more 'p'
modifier than the actual precise_ip value. Fix it.

For example, when the highest available precision is
perf_event_attr.precise_ip = 2, the evsel name is "cycles:ppp", which
wrongly suggests a precision of 3.

Before:
$./perf record sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.014 MB perf.data (21 samples) ]
$./perf evlist -v
cycles:ppp: size: 112, { sample_period, sample_freq }: 4000, 
sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1,
comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 2,
sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1

After:
$./perf record sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.014 MB perf.data (16 samples) ]
$./perf evlist -v
cycles:pp: size: 112, { sample_period, sample_freq }: 4000,
sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1,
comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 2,
sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1

Signed-off-by: Mengting Zhang 
---
 tools/perf/util/evsel.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 0dccdb8..94cf11d 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -312,7 +312,7 @@ struct perf_evsel *perf_evsel__new_cycles(bool precise)
if (asprintf(&evsel->name, "cycles%s%s%.*s",
 (attr.precise_ip || attr.exclude_kernel) ? ":" : "",
 attr.exclude_kernel ? "u" : "",
-attr.precise_ip ? attr.precise_ip + 1 : 0, "ppp") < 0)
+attr.precise_ip ? attr.precise_ip : 0, "ppp") < 0)
goto error_free;
 out:
return evsel;
-- 
1.7.12.4
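
The effect of the one-line change can be reproduced in userspace with the same format string. The sketch below is an illustration only: cycles_name() and its off_by_one switch are hypothetical helpers, not perf code. The "%.*s" precision argument limits how many characters of "ppp" are printed, so passing precise_ip + 1 prints one 'p' too many.

```c
#include <stdio.h>

/*
 * Build the default event name with the format string from the patch.
 * off_by_one != 0 reproduces the old behaviour (precise_ip + 1),
 * off_by_one == 0 the fixed one. Hypothetical helper for illustration.
 */
static int cycles_name(char *buf, size_t len, int precise_ip,
		       int exclude_kernel, int off_by_one)
{
	int nr_p = precise_ip ? precise_ip + (off_by_one ? 1 : 0) : 0;

	return snprintf(buf, len, "cycles%s%s%.*s",
			(precise_ip || exclude_kernel) ? ":" : "",
			exclude_kernel ? "u" : "",
			nr_p, "ppp");
}
```

With precise_ip = 2 the old computation yields "cycles:ppp" and the fixed one "cycles:pp", matching the Before/After output in the commit message.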

Re: [PATCH net] rds: ib: Fix NULL pointer dereference in debug code

2017-11-09 Thread David Miller

From: Håkon Bugge 
Date: Tue,  7 Nov 2017 16:33:34 +0100

> rds_ib_recv_refill() is a function that refills an IB receive
> queue. It can be called from both the CQE handler (tasklet) and a
> worker thread.
> 
> Just after the call to ib_post_recv(), a debug message is printed with
> rdsdebug():
> 
> ret = ib_post_recv(ic->i_cm_id->qp, &recv->r_wr, &failed_wr);
> rdsdebug("recv %p ibinc %p page %p addr %lu ret %d\n", recv,
>  recv->r_ibinc, sg_page(&recv->r_frag->f_sg),
>  (long) ib_sg_dma_address(
> ic->i_cm_id->device,
> &recv->r_frag->f_sg),
> ret);
> 
> Now consider an invocation of rds_ib_recv_refill() from the worker
> thread, which is preemptible. Further, assume that the worker thread
> is preempted between the ib_post_recv() and rdsdebug() statements.
> 
> Then, if the preemption is due to a receive CQE event, the
> rds_ib_recv_cqe_handler() will be invoked. This function processes
> receive completions, including freeing up data structures, such as the
> recv->r_frag.
> 
> In this scenario, rds_ib_recv_cqe_handler() will process the receive
> WR posted above. That implies, that the recv->r_frag has been freed
> before the above rdsdebug() statement has been executed. When it is
> later executed, we will have a NULL pointer dereference:
 ...
> This bug was provoked by compiling rds out-of-tree with
> EXTRA_CFLAGS="-DRDS_DEBUG -DDEBUG" and inserting an artificial delay
> between the ib_post_recv() and rdsdebug() statements:
> 
>  /* XXX when can this fail? */
>  ret = ib_post_recv(ic->i_cm_id->qp, &recv->r_wr, &failed_wr);
> + if (can_wait)
> + usleep_range(1000, 5000);
>  rdsdebug("recv %p ibinc %p page %p addr %lu ret %d\n", recv,
>   recv->r_ibinc, sg_page(&recv->r_frag->f_sg),
>   (long) ib_sg_dma_address(
> 
> The fix is simply to move the rdsdebug() statement up before the
> ib_post_recv() and remove the printing of ret, which is taken care of
> anyway by the non-debug code.
> 
> Signed-off-by: Håkon Bugge 
> Reviewed-by: Knut Omang 
> Reviewed-by: Wei Lin Guay 

Applied, thank you.

Re: [PATCH v2 2/2] ARM: sun8i: bananapi-m3: Enable dwmac-sun8i

2017-11-09 Thread Corentin Labbe

On Fri, Nov 10, 2017 at 11:48:11AM +0800, Chen-Yu Tsai wrote:
> On Thu, Nov 9, 2017 at 4:29 PM, Corentin Labbe
>  wrote:
> > The dwmac-sun8i hardware is present on the bananapi m3.
> > It uses an external PHY rtl8211e via RGMII.
> >
> > This patch creates the needed emac and phy nodes.
> >
> > Signed-off-by: Corentin Labbe 
> > ---
> >  arch/arm/boot/dts/sun8i-a83t-bananapi-m3.dts | 18 ++
> >  1 file changed, 18 insertions(+)
> >
> > diff --git a/arch/arm/boot/dts/sun8i-a83t-bananapi-m3.dts b/arch/arm/boot/dts/sun8i-a83t-bananapi-m3.dts
> > index c606af3dbfed..45bdd5c17829 100644
> > --- a/arch/arm/boot/dts/sun8i-a83t-bananapi-m3.dts
> > +++ b/arch/arm/boot/dts/sun8i-a83t-bananapi-m3.dts
> > @@ -52,6 +52,7 @@
> > compatible = "sinovoip,bpi-m3", "allwinner,sun8i-a83t";
> >
> > aliases {
> > +   ethernet0 = &emac;
> > serial0 = &uart0;
> > };
> >
> > @@ -88,6 +89,23 @@
> > /* TODO GL830 USB-to-SATA bridge downstream w/ GPIO power controls */
> >  };
> >
> > +&emac {
> > +   pinctrl-names = "default";
> > +   pinctrl-0 = <&emac_rgmii_pins>;
> > +   phy-handle = <&ext_rgmii_phy>;
> > +   phy-mode = "rgmii";
> 
> Schematics say PHY is powered by DC1SW from the PMIC.
> Not sure why you don't need that. Have you tested your patch?

Tested on 4.14.0-rc5-next-20171018+.
I will try to check which U-Boot is used; perhaps it's an old U-Boot with some
PMIC hack.

Thanks
Regards

Re: [PATCH] MAINTAINERS: Add Lorenzo Pieralisi for PCI host bridge drivers

2017-11-09 Thread Kishon Vijay Abraham I

Hi,

On Thursday 09 November 2017 08:35 PM, Bjorn Helgaas wrote:
> On Thu, Nov 09, 2017 at 11:28:36AM +0530, Kishon Vijay Abraham I wrote:
>> Hi Bjorn,
>>
>> On Thursday 09 November 2017 01:56 AM, Bjorn Helgaas wrote:
>>> On Wed, Nov 08, 2017 at 02:15:10PM -0600, Bjorn Helgaas wrote:
>>>> From: Bjorn Helgaas
>>>>
>>>> Add Lorenzo Pieralisi as maintainer for PCI native host bridge drivers and
>>>> the endpoint driver framework.
>>>>
>>>> Signed-off-by: Bjorn Helgaas
>>>
>>> This is on my for-linus branch, and I intend to merge it for v4.14.
>>
>> There is already an entry for PCI endpoint in the MAINTAINERS file. Can
>> Lorenzo be added there?
>>
>> PCI ENDPOINT SUBSYSTEM
>> M:  Kishon Vijay Abraham I
>> L:  linux-...@vger.kernel.org
>> T:  git git://git.kernel.org/pub/scm/linux/kernel/git/kishon/pci-endpoint.git
>> S:  Supported
>> F:  drivers/pci/endpoint/
>> F:  drivers/misc/pci_endpoint_test.c
>> F:  tools/pci/
> 
> Right, thanks, I forgot all about this separate entry.  I added Lorenzo
> there, resulting in the patch below.
> 
> My practice has been that all the PCI patches (everything in
> drivers/pci plus some include and x86/pci stuff) have been merged via
> my tree.
> 
> This includes things in drivers/pci/{host,dwc,endpoint,switch}, which
> are non-core things and usually specific to a chipset.  I try to
> ensure they have individual maintainers designated, and I ask for
> their acks for non-trivial changes because I have no specs and no
> hardware for testing them.  But I think it's still good to have one
> person look over them all to try to keep some consistency across them
> because they are all quite similar.
> 
> So my hope is that Lorenzo can take over that oversight role from me,
> not that he would replace any of those designated maintainers.
> 
> Ideally, this will be transparent to patch submitters except that they
> should add Lorenzo to the "To:" line (keeping linux-pci and other
> interested parties).

Makes sense.

I'm also wondering whether we should change the tree in PCI ENDPOINT SUBSYSTEM to
git git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git or maybe
git git://git.kernel.org/pub/scm/linux/kernel/git/lpieralisi/pci.git/,
since most of the endpoint patches also modify the controller drivers.

Thanks
Kishon

[PATCHv2 1/2] capability: introduce sysctl for controlled user-ns capability whitelist

2017-11-09 Thread Mahesh Bandewar

From: Mahesh Bandewar 

Add a sysctl variable kernel.controlled_userns_caps_whitelist. This
takes input as capability mask expressed as two comma separated hex
u32 words. The mask, however, is stored in kernel as kernel_cap_t type.

Any capabilities that are not part of this mask will be controlled and
will not be allowed to processes in controlled user-ns.

Signed-off-by: Mahesh Bandewar 
---
v2:
  Rebase
v1:
  Initial submission

 Documentation/sysctl/kernel.txt | 21 ++
 include/linux/capability.h  |  3 +++
 kernel/capability.c | 47 +
 kernel/sysctl.c |  5 +
 4 files changed, 76 insertions(+)

diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index 694968c7523c..a1d39dbae847 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -25,6 +25,7 @@ show up in /proc/sys/kernel:
 - bootloader_version[ X86 only ]
 - callhome  [ S390 only ]
 - cap_last_cap
+- controlled_userns_caps_whitelist
 - core_pattern
 - core_pipe_limit
 - core_uses_pid
@@ -187,6 +188,26 @@ CAP_LAST_CAP from the kernel.
 
 ==
 
+controlled_userns_caps_whitelist
+
+Capability mask that is whitelisted for "controlled" user namespaces.
+Any capability that is missing from this mask will not be allowed to
+any process that is attached to a controlled-userns. e.g. if CAP_NET_RAW
+is not part of this mask, then processes running inside any controlled
+userns's will not be allowed to perform action that needs CAP_NET_RAW
+capability. However, processes that are attached to a parent user-ns
+hierarchy that is *not* controlled and has CAP_NET_RAW can continue
+performing those actions. User-namespaces are marked "controlled" at
+the time of their creation based on the capabilities of the creator.
+A process that does not have CAP_SYS_ADMIN will create user-namespaces
+that are controlled.
+
+The value is expressed as two comma separated hex words (u32). This
+sysctl is available in init-ns and users with CAP_SYS_ADMIN in init-ns
+are allowed to make changes.
+
+==
+
 core_pattern:
 
 core_pattern is used to specify a core dumpfile pattern name.
diff --git a/include/linux/capability.h b/include/linux/capability.h
index f640dcbc880c..7d79a4689625 100644
--- a/include/linux/capability.h
+++ b/include/linux/capability.h
@@ -14,6 +14,7 @@
 #define _LINUX_CAPABILITY_H
 
 #include 
+#include 
 
 
 #define _KERNEL_CAPABILITY_VERSION _LINUX_CAPABILITY_VERSION_3
@@ -248,6 +249,8 @@ extern bool ptracer_capable(struct task_struct *tsk, struct user_namespace *ns);
 
 /* audit system wants to get cap info from files as well */
 extern int get_vfs_caps_from_disk(const struct dentry *dentry, struct cpu_vfs_cap_data *cpu_caps);
+int proc_douserns_caps_whitelist(struct ctl_table *table, int write,
+void __user *buff, size_t *lenp, loff_t *ppos);
 
 extern int cap_convert_nscap(struct dentry *dentry, void **ivalue, size_t size);
 
diff --git a/kernel/capability.c b/kernel/capability.c
index 1e1c0236f55b..4a859b7d4902 100644
--- a/kernel/capability.c
+++ b/kernel/capability.c
@@ -29,6 +29,8 @@ EXPORT_SYMBOL(__cap_empty_set);
 
 int file_caps_enabled = 1;
 
+kernel_cap_t controlled_userns_caps_whitelist = CAP_FULL_SET;
+
 static int __init file_caps_disable(char *str)
 {
file_caps_enabled = 0;
@@ -507,3 +509,48 @@ bool ptracer_capable(struct task_struct *tsk, struct user_namespace *ns)
rcu_read_unlock();
return (ret == 0);
 }
+
+/* Controlled-userns capabilities routines */
+#ifdef CONFIG_SYSCTL
+int proc_douserns_caps_whitelist(struct ctl_table *table, int write,
+void __user *buff, size_t *lenp, loff_t *ppos)
+{
+   DECLARE_BITMAP(caps_bitmap, CAP_LAST_CAP);
+   struct ctl_table caps_table;
+   char tbuf[NAME_MAX];
+   int ret;
+
+   ret = bitmap_from_u32array(caps_bitmap, CAP_LAST_CAP,
+  controlled_userns_caps_whitelist.cap,
+  _KERNEL_CAPABILITY_U32S);
+   if (ret != CAP_LAST_CAP)
+   return -1;
+
+   scnprintf(tbuf, NAME_MAX, "%*pb", CAP_LAST_CAP, caps_bitmap);
+
+   caps_table.data = tbuf;
+   caps_table.maxlen = NAME_MAX;
+   caps_table.mode = table->mode;
+   ret = proc_dostring(&caps_table, write, buff, lenp, ppos);
+   if (ret)
+   return ret;
+   if (write) {
+   kernel_cap_t tmp;
+
+   if (!capable(CAP_SYS_ADMIN))
+   return -EPERM;
+
+   ret = bitmap_parse_user(buff, *lenp, caps_bitmap, CAP_LAST_CAP);
+   if (ret)
+   return ret;
+
+   ret = bitmap_to_u32array(tmp.cap, _KERNEL_CAPABILITY_U32S,
+caps_b

[PATCHv2 2/2] userns: control capabilities of some user namespaces

2017-11-09 Thread Mahesh Bandewar

From: Mahesh Bandewar 

With this new notion of "controlled" user-namespaces, the controlled
user-namespaces are marked at the time of their creation while the
capabilities of processes that belong to them are controlled using the
global mask.

Init-user-ns is always uncontrolled and a process that has SYS_ADMIN
that belongs to uncontrolled user-ns can create another (child) user-
namespace that is uncontrolled. Any other process (that either does
not have SYS_ADMIN or belongs to a controlled user-ns) can only
create a user-ns that is controlled.

global-capability-whitelist (controlled_userns_caps_whitelist) is used
at the capability check-time and keeps the semantics for the processes
that belong to uncontrolled user-ns as it is. Processes that belong to
controlled user-ns however are subjected to different checks-

   (a) if the capability in question is controlled and process belongs
   to controlled user-ns, then it's always denied.
   (b) if the capability in question is NOT controlled then fall back
   to the traditional check.

Signed-off-by: Mahesh Bandewar 
---
v2:
  Don't recalculate user-ns flags for every setns() call.
v1:
  Initial submission.

 include/linux/capability.h |  1 +
 include/linux/user_namespace.h | 20 
 kernel/capability.c|  5 +
 kernel/user_namespace.c|  4 
 security/commoncap.c   |  8 
 5 files changed, 38 insertions(+)

diff --git a/include/linux/capability.h b/include/linux/capability.h
index 7d79a4689625..a1fd9e460379 100644
--- a/include/linux/capability.h
+++ b/include/linux/capability.h
@@ -251,6 +251,7 @@ extern bool ptracer_capable(struct task_struct *tsk, struct user_namespace *ns);
 extern int get_vfs_caps_from_disk(const struct dentry *dentry, struct cpu_vfs_cap_data *cpu_caps);
 int proc_douserns_caps_whitelist(struct ctl_table *table, int write,
  void __user *buff, size_t *lenp, loff_t *ppos);
+bool is_capability_controlled(int cap);
 
 extern int cap_convert_nscap(struct dentry *dentry, void **ivalue, size_t size);
 
diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index 3fe714da7f5a..647f825c7b5f 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -23,6 +23,7 @@ struct uid_gid_map {  /* 64 bytes -- 1 cache line */
 };
 
 #define USERNS_SETGROUPS_ALLOWED 1UL
+#define USERNS_CONTROLLED   2UL
 
 #define USERNS_INIT_FLAGS USERNS_SETGROUPS_ALLOWED
 
@@ -103,6 +104,16 @@ static inline void put_user_ns(struct user_namespace *ns)
__put_user_ns(ns);
 }
 
+static inline bool is_user_ns_controlled(const struct user_namespace *ns)
+{
+   return ns->flags & USERNS_CONTROLLED;
+}
+
+static inline void mark_user_ns_controlled(struct user_namespace *ns)
+{
+   ns->flags |= USERNS_CONTROLLED;
+}
+
 struct seq_operations;
 extern const struct seq_operations proc_uid_seq_operations;
 extern const struct seq_operations proc_gid_seq_operations;
@@ -161,6 +172,15 @@ static inline struct ns_common *ns_get_owner(struct ns_common *ns)
 {
return ERR_PTR(-EPERM);
 }
+
+static inline bool is_user_ns_controlled(const struct user_namespace *ns)
+{
+   return false;
+}
+
+static inline void mark_user_ns_controlled(struct user_namespace *ns)
+{
+}
 #endif
 
 #endif /* _LINUX_USER_H */
diff --git a/kernel/capability.c b/kernel/capability.c
index 4a859b7d4902..bffe249922de 100644
--- a/kernel/capability.c
+++ b/kernel/capability.c
@@ -511,6 +511,11 @@ bool ptracer_capable(struct task_struct *tsk, struct user_namespace *ns)
 }
 
 /* Controlled-userns capabilities routines */
+bool is_capability_controlled(int cap)
+{
+   return !cap_raised(controlled_userns_caps_whitelist, cap);
+}
+
 #ifdef CONFIG_SYSCTL
 int proc_douserns_caps_whitelist(struct ctl_table *table, int write,
 void __user *buff, size_t *lenp, loff_t *ppos)
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index c490f1e4313b..600c7dcb9ff7 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -139,6 +139,10 @@ int create_user_ns(struct cred *new)
goto fail_keyring;
 
set_cred_user_ns(new, ns);
+   if (!ns_capable(parent_ns, CAP_SYS_ADMIN) ||
+   is_user_ns_controlled(parent_ns))
+   mark_user_ns_controlled(ns);
+
return 0;
 fail_keyring:
 #ifdef CONFIG_PERSISTENT_KEYRINGS
diff --git a/security/commoncap.c b/security/commoncap.c
index fc46f5b85251..89103f16ac37 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -73,6 +73,14 @@ int cap_capable(const struct cred *cred, struct user_namespace *targ_ns,
 {
struct user_namespace *ns = targ_ns;
 
+   /* If the capability is controlled and user-ns that process
+* belongs-to is 'controlled' then return EPERM and no need
+* to check the user-ns hierarchy.
+*/
+   if (is_user_ns_controlled(cred->user_ns) &&
+

[PATCHv2 0/2] capability controlled user-namespaces

2017-11-09 Thread Mahesh Bandewar

From: Mahesh Bandewar 

TL;DR version
-------------
Creating a sandbox environment with namespaces is challenging
considering what these sandboxed processes can engage in, e.g.
CVE-2017-6074, CVE-2017-7184, CVE-2017-7308, just to name a few.
The current form of user-namespaces, however, if changed a bit, can
allow us to create a sandbox environment without locking down
user-namespaces.

Detailed version
----------------

Problem
-------
User-namespaces in the current form have increased the attack surface as
any process can acquire capabilities which are not available to them (by
default) by performing combination of clone()/unshare()/setns() syscalls.

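#define _GNU_SOURCE
#include <sched.h>        /* unshare(), CLONE_NEWUSER, CLONE_NEWNET */
#include <stdio.h>        /* printf(), perror() */
#include <unistd.h>       /* close() */
#include <sys/socket.h>   /* socket(), AF_INET6, SOCK_RAW */
#include <netinet/in.h>   /* IPPROTO_RAW */

int main(int ac, char **av)
{
	int sock = -1;

	/* Without CAP_NET_RAW this is expected to fail. */
	printf("Attempting to open RAW socket before unshare()...\n");
	sock = socket(AF_INET6, SOCK_RAW, IPPROTO_RAW);
	if (sock < 0) {
		perror("socket() SOCK_RAW failed: ");
	} else {
		printf("Successfully opened RAW-Sock before unshare().\n");
		close(sock);
		sock = -1;
	}

	/* Enter new user and network namespaces. */
	if (unshare(CLONE_NEWUSER | CLONE_NEWNET) < 0) {
		perror("unshare() failed: ");
		return 1;
	}

	/* The same call is now made from inside the new namespaces. */
	printf("Attempting to open RAW socket after unshare()...\n");
	sock = socket(AF_INET6, SOCK_RAW, IPPROTO_RAW);
	if (sock < 0) {
		perror("socket() SOCK_RAW failed: ");
	} else {
		printf("Successfully opened RAW-Sock after unshare().\n");
		close(sock);
		sock = -1;
	}

	return 0;
}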

The above example shows how easy it is to acquire the NET_RAW capability.
Once it is acquired, a process with malicious intent could take advantage
of the above-mentioned issues or similar ones, discovered or undiscovered.
Note that this is just an example and the problem/solution is not limited
to the NET_RAW capability *only*.

The easiest fix one can apply here is to lock-down user-namespaces which
many of the distros do (i.e. don't allow users to create user namespaces),
but unfortunately that prevents everyone from using them.

Approach
--------
Introduce a notion of 'controlled' user-namespaces. Every process on
the host is allowed to create user-namespaces (governed by the limit
imposed by the per-ns sysctl); however, user-namespaces created by
sandboxed processes are marked as 'controlled'. This mark is used at the
time of the capability check in conjunction with a global capability
whitelist. If a capability is not whitelisted, processes that belong to
controlled user-namespaces will not be allowed to use it.

Once a user-ns is marked as 'controlled', all its child
user-namespaces are marked as 'controlled' too.

The global whitelist is a list of capabilities governed by a sysctl
that a (privileged) user in init-ns can modify; it applies to all
controlled user-namespaces on the host.

Marking user-namespaces controlled without modifying the whitelist is
equivalent to the current behavior. The default value of the whitelist
includes all capabilities so that compatibility is maintained. However,
it gives admins a fine-grained ability to control various capabilities
system-wide without locking down user-namespaces.

Please see individual patches in this series.

Mahesh Bandewar (2):
  capability: introduce sysctl for controlled user-ns capability whitelist
  userns: control capabilities of some user namespaces

 Documentation/sysctl/kernel.txt | 21 +
 include/linux/capability.h  |  4 
 include/linux/user_namespace.h  | 20 
 kernel/capability.c | 52 +
 kernel/sysctl.c |  5 
 kernel/user_namespace.c |  4 
 security/commoncap.c|  8 +++
 7 files changed, 114 insertions(+)

-- 
2.15.0.448.gf294e3d99a-goog
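
The rules described in this cover letter can be modelled in a few lines of userspace C. This is a simplified sketch for discussion, not the kernel code from the series: struct userns_model, the 64-bit caps_whitelist and the function names are all hypothetical, and MODEL_CAP_NET_RAW simply reuses the number 13 that CAP_NET_RAW has in the uapi header.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_CAP_NET_RAW 13 /* same number as CAP_NET_RAW */

/* Hypothetical stand-in for the 'controlled' flag on a user namespace. */
struct userns_model {
	bool controlled;
};

/* Default: every capability is whitelisted, i.e. current behaviour. */
static uint64_t caps_whitelist = ~0ULL;

/* A capability is "controlled" when it is missing from the whitelist. */
static bool cap_is_controlled(int cap)
{
	if (cap < 0 || cap > 63)
		return true;
	return !(caps_whitelist & (1ULL << cap));
}

/*
 * Rule (a): a controlled capability is always denied to a process in a
 * controlled user-ns. Returning false means rule (b) applies and the
 * traditional capability check decides.
 */
static bool denied_by_whitelist(const struct userns_model *ns, int cap)
{
	return ns->controlled && cap_is_controlled(cap);
}

/*
 * Creation rule: a new user-ns is controlled unless its creator has
 * SYS_ADMIN and the parent user-ns is itself uncontrolled.
 */
static bool child_is_controlled(bool creator_has_sys_admin,
				const struct userns_model *parent)
{
	return !creator_has_sys_admin || parent->controlled;
}
```

With the default whitelist nothing is denied by rule (a), which is why marking namespaces as controlled is compatible with existing setups until an admin removes a capability from the mask.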

Re: [kernel-hardening] Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces

2017-11-09 Thread महेश बंडेवार

On Fri, Nov 10, 2017 at 1:46 PM, Serge E. Hallyn  wrote:
> Quoting Eric W. Biederman (ebied...@xmission.com):
>> single sandbox.  I am not at all certain that the capabilities is the
>> proper place to limit code reachability.
>
> Right, I keep having this gut feeling that there is another way we
> should be doing that.  Maybe based on ksplice or perf, or maybe more
> based on subsystems.  And I hope someone pursues that.  But I can't put
> my finger on it, and meanwhile the capability checks obviously *are* in
> fact gates...
>
Well, I don't mind if there is a better solution available. The
proposed solution does not add much code or complexity: it uses a
bit and a sysctl, and will be sitting dormant. When we have a complete
solution, this addition should not be a burden to maintain because of
its non-invasive footprint.

I will push the next version of the patch-set that implements Serge's finding.

Thanks,
--mahesh..

[PS: I'll soon be traveling again and moving to an area where
connectivity will be scarce / unreliable. So please expect a lot more
delays in my responses.]

> -serge

linux-next: Tree for Nov 10

2017-11-09 Thread Stephen Rothwell

Hi all,

Changes since 20171109:

The powerpc tree still had its build failure for which I applied a
patch.

The net-next tree gained a conflict against Linus' tree.

The tip tree lost its build failure but gained a conflict against Linus'
tree.

The rcu tree gained a conflict against the tip tree.

The gpio tree gained a conflict against Linus' tree.

The akpm-current tree gained a conflict against the tip tree.

Non-merge commits (relative to Linus' tree): 11746
 11057 files changed, 546043 insertions(+), 263268 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a
multi_v7_defconfig for arm and a native build of tools/perf. After
the final fixups (if any), I do an x86_64 modules_install followed by
builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit),
ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc
and sparc64 defconfig. And finally, a simple boot test of the powerpc
pseries_le_defconfig kernel in qemu (with and without kvm enabled).

Below is a summary of the state of the merge.

I am currently merging 272 trees (counting Linus' and 42 trees of bug
fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (3fefc31843cf Merge tag 'pm-final-4.14' of 
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm)
Merging fixes/master (820bf5c419e4 Merge tag 'scsi-fixes' of 
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi)
Merging kbuild-current/fixes (bb3f38c3c5b7 kbuild: clang: fix build failures 
with sparse check)
Merging arc-current/for-curr (92d44128241f ARCv2: Accomodate HS48 MMUv5 by 
relaxing MMU ver checking)
Merging arm-current/fixes (b9dd05c7002e ARM: 8720/1: ensure dump_instr() checks 
addr_limit)
Merging m68k-current/for-linus (558d5ad276c9 m68k/mac: Avoid soft-lockup 
warning after mach_power_off)
Merging metag-fixes/fixes (b884a190afce metag/usercopy: Add missing fixups)
Merging powerpc-fixes/fixes (7ecb37f62fe5 powerpc/perf: Fix core-imc hotplug 
callback failure during imc initialization)
Merging sparc/master (23198ddffb6c sparc32: Add cmpxchg64().)
Merging fscrypt-current/for-stable (42d97eb0ade3 fscrypt: fix renaming and 
linking special files)
Merging net/master (6a1728024745 Merge branch 'master' of 
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec)
Merging ipsec/master (c9f3f813d462 xfrm: Fix stack-out-of-bounds read in 
xfrm_state_find.)
Merging netfilter/master (7400bb4b5800 netfilter: nf_reject_ipv4: Fix 
use-after-free in send_reset)
Merging ipvs/master (f7fb77fc1235 netfilter: nft_compat: check extension hook 
mask only if set)
Merging wireless-drivers/master (a6127b4440d1 Merge tag 
'iwlwifi-for-kalle-2017-10-06' of 
git://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-fixes)
Merging mac80211/master (9618aec3349b Merge tag 'mac80211-for-davem-2017-10-25' 
of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211)
Merging sound-current/for-linus (75ee94b20b46 ALSA: hda - fix headset mic 
problem for Dell machines with alc274)
Merging pci-current/for-linus (6b7be529634b MAINTAINERS: Add Lorenzo Pieralisi 
for PCI host bridge drivers)
Merging driver-core.current/driver-core-linus (39dae59d66ac Linux 4.14-rc8)
Merging tty.current/tty-linus (8a5776a5f498 Linux 4.14-rc4)
Merging usb.current/usb-linus (bb176f67090c Linux 4.14-rc6)
Merging usb-gadget-fixes/fixes (7c80f9e4a588 usb: usbtest: fix NULL pointer 
dereference)
Merging usb-serial-fixes/usb-linus (0b07194bb55e Linux 4.14-rc7)
Merging usb-chipidea-fixes/ci-for-usb-stable (cbb22ebcfb99 usb: chipidea: core: 
check before accessing ci_role in ci_role_show)
Merging phy/fixes (2fb850092fd9 phy: rockchip-typec: Check for errors from 
tcphy_phy_init())
Merging s

Re: [PATCH resend 1/2] capability: introduce sysctl for controlled user-ns capability whitelist

2017-11-09 Thread महेश बंडेवार

On Fri, Nov 10, 2017 at 1:30 PM, Serge E. Hallyn  wrote:
> Quoting Mahesh Bandewar (महेश बंडेवार) (mahe...@google.com):
> ...
>> >>
>> >>  ==
>> >>
>> >> +controlled_userns_caps_whitelist
>> >> +
>> >> +Capability mask that is whitelisted for "controlled" user namespaces.
>> >> +Any capability that is missing from this mask will not be allowed to
>> >> +any process that is attached to a controlled-userns. e.g. if CAP_NET_RAW
>> >> +is not part of this mask, then processes running inside any controlled
>> >> +userns's will not be allowed to perform action that needs CAP_NET_RAW
>> >> +capability. However, processes that are attached to a parent user-ns
>> >> +hierarchy that is *not* controlled and has CAP_NET_RAW can continue
>> >> +performing those actions. User-namespaces are marked "controlled" at
>> >> +the time of their creation based on the capabilities of the creator.
>> >> +A process that does not have CAP_SYS_ADMIN will create user-namespaces
>> >> +that are controlled.
>> >
>> > Hm.  I think that's fine (the way 'controlled' user namespaces are
>> > defined), but that is design decision in itself, and should perhaps be
>> > discussed.
>> >
>> > Did you consider other ways?  What about using CAP_SETPCAP?
>> >
>> I did try other ways e.g. using another bounding-set etc. but
>> eventually settled with this approach because of main two properties -
>
> No, I meant did you try other ways of defining a controlled user
> namespace, other than one which is created by a task lacking
> CAP_SYS_ADMIN?
>
CAP_SYS_ADMIN is the capability that has been used for deciding who can or
cannot create namespaces, so I didn't want to create another model that
may not be compatible with the current model, which is well understood.
Hence, no.

> ...
>
>> >> +The value is expressed as two comma separated hex words (u32). This
>> >
>> > Why comma separated?  whitespace ok?  Leading 0x ok?  What is the
>> > default at boot?  (Obviously the patch tells me, I'm asking for it
>> > to be spelled out in the doc)
>> >
>> I tried multiple ways including representing capabilities in
>> string/name form for better readability but didn't want to add
>> additional complexities of dealing with strings and possible
>> string-related-issues for this. Also didn't want to reinvent the new
>> form so settled with something that is widely used (cpu
>> bounding/affinity/irq mapping etc.) and is capable of handling growing
>> bit set (currently 37 but possibly more later).
>
> Ok, thanks.
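
The "two comma separated hex words (u32)" format discussed above can be sketched in userspace C as follows. This is only an illustration, not the patch's parser: the word order (high word first, as in the kernel's cpumask hex output) is an assumption, and the helper names parse_hex_u32(), parse_cap_mask() and cap_allowed() are hypothetical. The sketch is deliberately strict, so whitespace and a leading "0x" are rejected; the patch may behave differently.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>

#define CAP_NET_RAW 13	/* value from <linux/capability.h> */

/* Consume 1 to 8 hex digits at *p; advance *p past them. */
static int parse_hex_u32(const char **p, uint32_t *out)
{
	uint32_t v = 0;
	int digits = 0;

	while (isxdigit((unsigned char)**p)) {
		int c = tolower((unsigned char)**p);

		if (++digits > 8)
			return -1;
		v = (v << 4) | (uint32_t)(c <= '9' ? c - '0' : c - 'a' + 10);
		(*p)++;
	}
	if (!digits)
		return -1;
	*out = v;
	return 0;
}

/*
 * Parse "hhhhhhhh,llllllll" into a 64-bit mask.
 * ASSUMPTION: the first word holds bits 63..32, the second bits 31..0.
 * Returns 0 on success, -1 on malformed input.
 */
static int parse_cap_mask(const char *s, uint64_t *mask)
{
	uint32_t hi, lo;

	if (parse_hex_u32(&s, &hi) || *s++ != ',')
		return -1;
	if (parse_hex_u32(&s, &lo) || *s != '\0')
		return -1;
	*mask = ((uint64_t)hi << 32) | lo;
	return 0;
}

/* Test whether capability number 'cap' is present in the mask. */
static int cap_allowed(uint64_t mask, int cap)
{
	assert(cap >= 0 && cap < 64);
	return (int)((mask >> cap) & 1);
}
```

With this layout, a mask of "0000003f,ffffdfff" has bit 13 cleared, so cap_allowed(mask, CAP_NET_RAW) returns 0 while every other capability below 38 remains set.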

Re: [PATCH 03/14] soundwire: Add Master registration

2017-11-09 Thread Vinod Koul

On Thu, Nov 09, 2017 at 09:14:16PM +, Srinivas Kandagatla wrote:
> 
> 
> On 19/10/17 04:03, Vinod Koul wrote:
> 
> >+/**
> >+ * sdw_add_bus_master: add a bus Master instance
> >+ *
> >+ * @bus: bus instance
> >+ *
> >+ * Initializes the bus instance, read properties and create child
> >+ * devices.
> >+ */
> 
> Some of the exported functions are missing kerneldocs.
> Is it something you plan to add in next version of the patcheset?

I thought most were, will double check to be sure.

> 
> >+int sdw_add_bus_master(struct sdw_bus *bus)
> >+{
> >+int ret;
> >+
> >+if (!bus->dev) {
> >+pr_err("SoundWire bus has no device");
> >+return -ENODEV;
> >+}
> >+
> >+mutex_init(&bus->bus_lock);
> >+INIT_LIST_HEAD(&bus->slaves);
> >+
> >+/*
> >+ * SDW is an enumerable bus, but devices can be powered off. So,
> >+ * they won't be able to report as present.
> >+ *
> >+ * Create Slave devices based on Slaves described in
> >+ * the respective firmware (ACPI/DT)
> >+ */
> >+
> >+if (IS_ENABLED(CONFIG_ACPI) && bus->dev && ACPI_HANDLE(bus->dev))
> >+ret = sdw_acpi_find_slaves(bus);
> >+else if (IS_ENABLED(CONFIG_OF) && bus->dev && bus->dev->of_node)
> >+ret = sdw_of_find_slaves(bus);
> >+else
> bus->dev is already checked in the start of the function, do we need to
> check once again ?

yes already fixed, thanks

-- 
~Vinod

Re: [PATCH] drivers: hv: balloon: remove extraneous assignment to region_start

2017-11-09 Thread Stephen Hemminger

On Wed, 18 Oct 2017 12:52:12 +0100
Colin King  wrote:

> From: Colin Ian King 
> 
> The variable region_start is assigned twice, the first value is
> never read and redundant, so can be removed.  We can clean up the
> code further by assigning rg_start directly rather than using the
> temporary variable region_start which can then be removed. Cleans
> up the clang warning:
> 
> drivers/hv/hv_balloon.c:976:3: warning: Value stored to 'region_start'
> is never read
> 
> Signed-off-by: Colin Ian King 

LGTM

Acked-by: Stephen Hemminger

Re: [PATCH 02/14] soundwire: Add SoundWire bus type

2017-11-09 Thread Vinod Koul

On Thu, Nov 09, 2017 at 09:14:07PM +, Srinivas Kandagatla wrote:
> 
> 
> On 19/10/17 04:03, Vinod Koul wrote:
> >This adds the base SoundWire bus type, bus and driver registration.
> >along with changes to module device table for new SoundWire
> >device type.
> >
> >Signed-off-by: Sanyog Kale 
> >Signed-off-by: Vinod Koul 
> >---
> 
> >+++ b/drivers/soundwire/Kconfig
> >@@ -0,0 +1,22 @@
> >+#
> >+# SoundWire subsystem configuration
> >+#
> >+
> >+menuconfig SOUNDWIRE
> >+bool "SoundWire support"
> 
> Any reason why this subsystem can not be build as module?

This is not the subsystem symbol but the menu. SOUNDWIRE_BUS can be built as a module.

> 
> >+---help---
> >+  SoundWire is a 2-Pin interface with data and clock line ratified
> >+  by the MIPI Alliance. SoundWire is used for transporting data
> >+  typically related to audio functions. SoundWire interface is
> 
> >+#ifndef __SDW_BUS_H
> >+#define __SDW_BUS_H
> >+
> >+#include 
> >+#include 
> >+#include 
> >+#include 
> >+#include 
> Do you need these headers here?

Yes :) I will double check though


> 
> >+#include 
> >+
> >+int sdw_slave_modalias(struct sdw_slave *slave, char *buf, size_t size);
> >+
> >+#endif /* __SDW_BUS_H */
> >diff --git a/drivers/soundwire/bus_type.c b/drivers/soundwire/bus_type.c
> >new file mode 100644
> >index ..a14d1de80afa
> >--- /dev/null
> >+++ b/drivers/soundwire/bus_type.c
> >
> >+#include 
> >+#include 
> >+#include 
> >+#include 
> >+#include 
> >+#include 
> >+#include 
> >+#include 
> >+#include "bus.h"
> >+
> >+/**
> >+ * sdw_get_device_id: find the matching SoundWire device id
> >+ *
> function name should end with () - according to kernel doc.

Ah, thanks for pointing that out, will add.

> 
> >+ * @slave: SoundWire Slave device
> >+ * @drv: SoundWire Slave Driver
> >+ *
> >+ * The match is done by comparing the mfg_id and part_id from the
> >+ * struct sdw_device_id. class_id is unused, as it is a placeholder
> >+ * in MIPI Spec.
> >+ */
> 
> BTW, This is a static private function, why are we adding kernel doc for
> this?

The match is an important routine, and documenting it helps people understand
the logic. More doc is better, right :)
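
The matching logic described in the kerneldoc above (walk a table terminated by a zero mfg_id, compare mfg_id and part_id) can be sketched in plain userspace C. The struct dev_id type and match_id() below are simplified, hypothetical stand-ins for struct sdw_device_id and sdw_get_device_id(); they are not the SoundWire API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for struct sdw_device_id. */
struct dev_id {
	uint16_t mfg_id;
	uint16_t part_id;
};

/*
 * Return the first table entry whose mfg_id and part_id both match
 * 'dev', or NULL. The table ends at the first entry with mfg_id == 0,
 * so a driver with no table (NULL) matches nothing.
 */
static const struct dev_id *match_id(const struct dev_id *table,
				     const struct dev_id *dev)
{
	const struct dev_id *id;

	assert(dev != NULL);
	for (id = table; id && id->mfg_id; id++) {
		if (id->mfg_id == dev->mfg_id && id->part_id == dev->part_id)
			return id;
	}
	return NULL;
}
```

Because the sentinel is a zero mfg_id, a real manufacturer ID of 0 could never be matched by this scheme; that is a property of the sketch and, as far as the quoted code shows, of the patch as well.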

> 
> >+static const struct sdw_device_id *
> >+sdw_get_device_id(struct sdw_slave *slave, struct sdw_driver *drv)
> >+{
> >+const struct sdw_device_id *id = drv->id_table;
> >+
> >+while (id && id->mfg_id) {
> >+if (slave->id.mfg_id == id->mfg_id &&
> >+slave->id.part_id == id->part_id) {
> >+return id;
> >+}
> >+id++;
> >+}
> >+
> >+return NULL;
> >+}
> >+
> >+static int sdw_bus_match(struct device *dev, struct device_driver *ddrv)
> >+{
> >+struct sdw_slave *slave = dev_to_sdw_dev(dev);
> >+struct sdw_driver *drv = drv_to_sdw_driver(ddrv);
> >+
> >+return !!sdw_get_device_id(slave, drv);
> >+}
> >+
> >+int sdw_slave_modalias(struct sdw_slave *slave, char *buf, size_t size)
> >+{
> >+/* modalias is sdw:mp */
> >+
> >+return snprintf(buf, size, "sdw:m%04Xp%04X\n",
> >+slave->id.mfg_id, slave->id.part_id);
> >+}
> >+
> >+static int sdw_uevent(struct device *dev, struct kobj_uevent_env *env)
> >+{
> >+struct sdw_slave *slave = dev_to_sdw_dev(dev);
> >+char modalias[32];
> >+
> >+sdw_slave_modalias(slave, modalias, sizeof(modalias));
> >+
> >+if (add_uevent_var(env, "MODALIAS=%s", modalias))
> >+return -ENOMEM;
> >+
> >+return 0;
> >+}
> >+
> >+struct bus_type sdw_bus_type = {
> >+.name = "soundwire",
> >+.match = sdw_bus_match,
> >+.uevent = sdw_uevent,
> >+};
> >+EXPORT_SYMBOL(sdw_bus_type);
> >+
> >+static int sdw_drv_probe(struct device *dev)
> >+{
> >+struct sdw_slave *slave = dev_to_sdw_dev(dev);
> >+struct sdw_driver *drv = drv_to_sdw_driver(dev->driver);
> >+const struct sdw_device_id *id;
> >+int ret;
> >+
> >+id = sdw_get_device_id(slave, drv);
> 
> By this time we must have already matched dev and driver by the ID,
> shouldn't it be just slave->id  here?

I don't think so. We do not have slave->id; we pass the id in probe as an
argument.

> >+if (!id)
> >+return -ENODEV;
> >+
> >+/*
> >+ * attach to power domain but don't turn on (last arg)
> >+ */
> >+ret = dev_pm_domain_attach(dev, false);
> >+if (ret) {
> Shouldn't it just handle the EPROBE_DEFER case and ignore it for other
> errors.

why should we ignore other errors and continue?

> 
> 
> >+dev_err(dev, "Failed to attach PM domain: %d\n", ret);
> >+return ret;
> >+}
> >+
> >+ret = drv->probe(slave, id);
> >+if (ret) {
> >+dev_err(dev, "Probe of %s failed: %d\n", drv->name, ret);
> >+return ret;
> >+}
> 
> 
> What happens if the slave driver is built as module and loaded after the
> slave device is attached to the bus. How does the slave driver get updated
> status in this case?
> 
> We have simila

Re: [PATCH] tcp: Export to userspace the TCP state names for the trace events

2017-11-09 Thread Yafang Shao

2017-11-10 8:57 GMT+08:00 Steven Rostedt :
>
> From: "Steven Rostedt (VMware)" 
>
> The TCP trace events (specifically tcp_set_state) map enums to symbol
> names via __print_symbolic(). But this only works for reading trace events
> from the tracefs trace files. If perf or trace-cmd were to record these
> events, the event format file does not convert the enum names into numbers,
> and you get something like:
>
> __print_symbolic(REC->oldstate,
> { TCP_ESTABLISHED, "TCP_ESTABLISHED" },
> { TCP_SYN_SENT, "TCP_SYN_SENT" },
> { TCP_SYN_RECV, "TCP_SYN_RECV" },
> { TCP_FIN_WAIT1, "TCP_FIN_WAIT1" },
> { TCP_FIN_WAIT2, "TCP_FIN_WAIT2" },
> { TCP_TIME_WAIT, "TCP_TIME_WAIT" },
> { TCP_CLOSE, "TCP_CLOSE" },
> { TCP_CLOSE_WAIT, "TCP_CLOSE_WAIT" },
> { TCP_LAST_ACK, "TCP_LAST_ACK" },
> { TCP_LISTEN, "TCP_LISTEN" },
> { TCP_CLOSING, "TCP_CLOSING" },
> { TCP_NEW_SYN_RECV, "TCP_NEW_SYN_RECV" })
>
> Where trace-cmd and perf do not know the values of those enums.
>
> Use the TRACE_DEFINE_ENUM() macros that will have the trace events convert
> the enum strings into their values at system boot. This will allow perf and
> trace-cmd to see actual numbers and not enums:
>
> __print_symbolic(REC->oldstate,
> { 1, "TCP_ESTABLISHED" },
> { 2, "TCP_SYN_SENT" },
> { 3, "TCP_SYN_RECV" },
> { 4, "TCP_FIN_WAIT1" },
> { 5, "TCP_FIN_WAIT2" },
> { 6, "TCP_TIME_WAIT" },
> { 7, "TCP_CLOSE" },
> { 8, "TCP_CLOSE_WAIT" },
> { 9, "TCP_LAST_ACK" },
> { 10, "TCP_LISTEN" },
> { 11, "TCP_CLOSING" },
> { 12, "TCP_NEW_SYN_RECV" })
>
> Signed-off-by: Steven Rostedt (VMware) 
> ---
>  include/trace/events/tcp.h | 41 -
>  1 file changed, 28 insertions(+), 13 deletions(-)
>
> diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
> index 07a6cbf1..62e5bad7901f 100644
> --- a/include/trace/events/tcp.h
> +++ b/include/trace/events/tcp.h
> @@ -9,21 +9,36 @@
>  #include 
>  #include 
>
> +#define tcp_state_names\
> +   EM(TCP_ESTABLISHED) \
> +   EM(TCP_SYN_SENT)\
> +   EM(TCP_SYN_RECV)\
> +   EM(TCP_FIN_WAIT1)   \
> +   EM(TCP_FIN_WAIT2)   \
> +   EM(TCP_TIME_WAIT)   \
> +   EM(TCP_CLOSE)   \
> +   EM(TCP_CLOSE_WAIT)  \
> +   EM(TCP_LAST_ACK)\
> +   EM(TCP_LISTEN)  \
> +   EM(TCP_CLOSING) \
> +   EMe(TCP_NEW_SYN_RECV)
> +
> +/* enums need to be exported to user space */
> +#undef EM
> +#undef EMe
> +#define EM(a) TRACE_DEFINE_ENUM(a);
> +#define EMe(a)TRACE_DEFINE_ENUM(a);
> +
> +tcp_state_names
> +
> +#undef EM
> +#undef EMe
> +#define EM(a) tcp_state_name(a),
> +#define EMe(a)tcp_state_name(a)
> +
>  #define tcp_state_name(state)  { state, #state }
>  #define show_tcp_state_name(val)   \
> -   __print_symbolic(val,   \
> -   tcp_state_name(TCP_ESTABLISHED),\
> -   tcp_state_name(TCP_SYN_SENT),   \
> -   tcp_state_name(TCP_SYN_RECV),   \
> -   tcp_state_name(TCP_FIN_WAIT1),  \
> -   tcp_state_name(TCP_FIN_WAIT2),  \
> -   tcp_state_name(TCP_TIME_WAIT),  \
> -   tcp_state_name(TCP_CLOSE),  \
> -   tcp_state_name(TCP_CLOSE_WAIT), \
> -   tcp_state_name(TCP_LAST_ACK),   \
> -   tcp_state_name(TCP_LISTEN), \
> -   tcp_state_name(TCP_CLOSING),\
> -   tcp_state_name(TCP_NEW_SYN_RECV))
> +   __print_symbolic(val, tcp_state_names)
>
>  /*
>   * tcp event with arguments sk and skb
> --
> 2.13.6
>

Could the macro tcp_state_name() be renamed?
If  is included in include/net/tcp.h, it will
cause a compile error, because there's another function tcp_state_name()
defined in net/netfilter/ipvs/ip_vs_proto_tcp.c.
static const char * tcp_state_name(int state)
{
	if (state >= IP_VS_TCP_S_LAST)
		return "ERR!";
	return tcp_state_name_table[state] ? tcp_state_name_table[state] : "?";
}


Thanks
Yafang
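
For readers unfamiliar with the EM()/EMe() idiom used in the quoted patch, here is a minimal userspace sketch of the same "define the list once, expand it twice" pattern. TRACE_DEFINE_ENUM() and __print_symbolic() are kernel tracing macros and are not used here; the first expansion below merely counts the entries as a stand-in, and struct sym, NR_STATES and show_state() are hypothetical names. Only three states are listed to keep it short.

```c
#include <assert.h>
#include <stddef.h>

/* Values as shown in the converted event format above. */
enum { TCP_ESTABLISHED = 1, TCP_SYN_SENT, TCP_SYN_RECV };

/* The single source of truth; EMe() marks the last entry. */
#define tcp_state_names		\
	EM(TCP_ESTABLISHED)	\
	EM(TCP_SYN_SENT)	\
	EMe(TCP_SYN_RECV)

/* First expansion: count the entries (stand-in for TRACE_DEFINE_ENUM). */
#define EM(a)	+ 1
#define EMe(a)	+ 1
enum { NR_STATES = 0 tcp_state_names };
#undef EM
#undef EMe

/* Second expansion: build a { value, "name" } table. */
struct sym {
	int val;
	const char *name;
};
#define EM(a)	{ a, #a },
#define EMe(a)	{ a, #a }
static const struct sym tcp_syms[] = { tcp_state_names };
#undef EM
#undef EMe

/* Look up the symbolic name for a state value; NULL if unknown. */
static const char *show_state(int val)
{
	size_t i;

	assert(sizeof(tcp_syms) / sizeof(tcp_syms[0]) == NR_STATES);
	for (i = 0; i < NR_STATES; i++) {
		if (tcp_syms[i].val == val)
			return tcp_syms[i].name;
	}
	return NULL;
}
```

The sketch also shows why the helper macro's name matters: any identifier used in such an expansion lives in the global macro namespace of every file that includes the header, which is the clash raised above.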

Re: [PATCH 2/4] kbuild: remove redundant $(wildcard ...) for cmd_files calculation

2017-11-09 Thread Masahiro Yamada

2017-11-10 13:53 GMT+09:00 Doug Anderson :
> Hi,
>
> On Thu, Nov 9, 2017 at 7:41 AM, Masahiro Yamada
>  wrote:
>> I do not know why $(wildcard ...) needs to be called twice for computing
>> cmd_files.  Remove the first one.
>
> I tried and I can't find any reason for the two calls $(wildcard ...)
> either, so this seems fine to me.
>
>
>> Signed-off-by: Masahiro Yamada 
>> ---
>>
>>  Makefile | 3 +--
>>  scripts/Makefile.build   | 3 +--
>>  scripts/Makefile.headersinst | 3 +--
>>  scripts/Makefile.modpost | 3 +--
>>  4 files changed, 4 insertions(+), 8 deletions(-)
>>
>> diff --git a/Makefile b/Makefile
>> index a7476e6..58dd245 100644
>> --- a/Makefile
>> +++ b/Makefile
>> @@ -1693,8 +1693,7 @@ cmd_crmodverdir = $(Q)mkdir -p $(MODVERDIR) \
>>
>>  # read all saved command lines
>>
>> -targets := $(wildcard $(sort $(targets)))
>> -cmd_files := $(wildcard .*.cmd $(foreach f,$(targets),$(dir $(f)).$(notdir 
>> $(f)).cmd))
>> +cmd_files := $(wildcard .*.cmd $(foreach f,$(sort $(targets)),$(dir 
>> $(f)).$(notdir $(f)).cmd))
>>
>>  ifneq ($(cmd_files),)
>>$(cmd_files): ;  # Do not try to update included dependency files
>> diff --git a/scripts/Makefile.build b/scripts/Makefile.build
>> index 061d0c3..62d5314 100644
>> --- a/scripts/Makefile.build
>> +++ b/scripts/Makefile.build
>> @@ -583,8 +583,7 @@ FORCE:
>>  # optimization, we don't need to read them if the target does not
>>  # exist, we will rebuild anyway in that case.
>>
>> -targets := $(wildcard $(sort $(targets)))
>> -cmd_files := $(wildcard $(foreach f,$(targets),$(dir $(f)).$(notdir 
>> $(f)).cmd))
>> +cmd_files := $(wildcard $(foreach f,$(sort $(targets)),$(dir $(f)).$(notdir 
>> $(f)).cmd))
>>
>>  ifneq ($(cmd_files),)
>>include $(cmd_files)
>> diff --git a/scripts/Makefile.headersinst b/scripts/Makefile.headersinst
>> index 5692d7a..2aa9181 100644
>> --- a/scripts/Makefile.headersinst
>> +++ b/scripts/Makefile.headersinst
>> @@ -114,9 +114,8 @@ $(check-file): scripts/headers_check.pl $(output-files) 
>> FORCE
>>
>>  endif
>>
>> -targets := $(wildcard $(sort $(targets)))
>>  cmd_files := $(wildcard \
>> - $(foreach f,$(targets),$(dir $(f)).$(notdir $(f)).cmd))
>> + $(foreach f,$(sort $$(targets)),$(dir $(f)).$(notdir 
>> $(f)).cmd))
>
> Did you mean the "$$" here before (targets)?  At first glance it seems 
> wrong...


Good catch!
I will fix this.

Thanks!


-- 
Best Regards
Masahiro Yamada

Re: [PATCH 2/4] kbuild: remove redundant $(wildcard ...) for cmd_files calculation

2017-11-09 Thread Doug Anderson

Hi,

On Thu, Nov 9, 2017 at 7:41 AM, Masahiro Yamada
 wrote:
> I do not know why $(wildcard ...) needs to be called twice for computing
> cmd_files.  Remove the first one.

I tried and I can't find any reason for the two calls $(wildcard ...)
either, so this seems fine to me.


> Signed-off-by: Masahiro Yamada 
> ---
>
>  Makefile | 3 +--
>  scripts/Makefile.build   | 3 +--
>  scripts/Makefile.headersinst | 3 +--
>  scripts/Makefile.modpost | 3 +--
>  4 files changed, 4 insertions(+), 8 deletions(-)
>
> diff --git a/Makefile b/Makefile
> index a7476e6..58dd245 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1693,8 +1693,7 @@ cmd_crmodverdir = $(Q)mkdir -p $(MODVERDIR) \
>
>  # read all saved command lines
>
> -targets := $(wildcard $(sort $(targets)))
> -cmd_files := $(wildcard .*.cmd $(foreach f,$(targets),$(dir $(f)).$(notdir 
> $(f)).cmd))
> +cmd_files := $(wildcard .*.cmd $(foreach f,$(sort $(targets)),$(dir 
> $(f)).$(notdir $(f)).cmd))
>
>  ifneq ($(cmd_files),)
>$(cmd_files): ;  # Do not try to update included dependency files
> diff --git a/scripts/Makefile.build b/scripts/Makefile.build
> index 061d0c3..62d5314 100644
> --- a/scripts/Makefile.build
> +++ b/scripts/Makefile.build
> @@ -583,8 +583,7 @@ FORCE:
>  # optimization, we don't need to read them if the target does not
>  # exist, we will rebuild anyway in that case.
>
> -targets := $(wildcard $(sort $(targets)))
> -cmd_files := $(wildcard $(foreach f,$(targets),$(dir $(f)).$(notdir 
> $(f)).cmd))
> +cmd_files := $(wildcard $(foreach f,$(sort $(targets)),$(dir $(f)).$(notdir 
> $(f)).cmd))
>
>  ifneq ($(cmd_files),)
>include $(cmd_files)
> diff --git a/scripts/Makefile.headersinst b/scripts/Makefile.headersinst
> index 5692d7a..2aa9181 100644
> --- a/scripts/Makefile.headersinst
> +++ b/scripts/Makefile.headersinst
> @@ -114,9 +114,8 @@ $(check-file): scripts/headers_check.pl $(output-files) 
> FORCE
>
>  endif
>
> -targets := $(wildcard $(sort $(targets)))
>  cmd_files := $(wildcard \
> - $(foreach f,$(targets),$(dir $(f)).$(notdir $(f)).cmd))
> + $(foreach f,$(sort $$(targets)),$(dir $(f)).$(notdir $(f)).cmd))

Did you mean the "$$" here before (targets)?  At first glance it seems wrong...

Re: [PATCH 10/14] soundwire: Add sysfs for SoundWire DisCo properties

2017-11-09 Thread Vinod Koul

On Thu, Nov 09, 2017 at 09:14:35PM +, Srinivas Kandagatla wrote:
> 
> 
> On 19/10/17 04:03, Vinod Koul wrote:
> >It helps to read the properties for understanding and debug
> >SoundWire systems, so add sysfs files for SoundWire DisCo
> >properties.
> >
> >TODO: Add ABI files for sysfs
> >
> >Signed-off-by: Sanyog Kale 
> >Signed-off-by: Vinod Koul 
> >---
> 
> >diff --git a/drivers/soundwire/bus.c b/drivers/soundwire/bus.c
> >index 6c4f41b64744..e3d7aea18892 100644
> >--- a/drivers/soundwire/bus.c
> >+++ b/drivers/soundwire/bus.c
> >@@ -90,6 +90,8 @@ int sdw_add_bus_master(struct sdw_bus *bus)
> > }
> > }
> >+sdw_sysfs_bus_init(bus);
> >+
> > /*
> >  * SDW is an enumerable bus, but devices can be powered off. So,
> >  * they won't be able to report as present.
> >@@ -119,6 +121,8 @@ static int sdw_delete_slave(struct device *dev, void 
> >*data)
> > struct sdw_slave *slave = dev_to_sdw_dev(dev);
> > struct sdw_bus *bus = slave->bus;
> >+sdw_sysfs_slave_exit(slave);
> >+
> > mutex_lock(&bus->bus_lock);
> > if (!list_empty(&bus->slaves))
> > list_del(&slave->node);
> >@@ -130,6 +134,7 @@ static int sdw_delete_slave(struct device *dev, void 
> >*data)
> >  void sdw_delete_bus_master(struct sdw_bus *bus)
> >  {
> >+sdw_sysfs_bus_init(bus);
> 
> Shouldn't this be sdw_sysfs_bus_exit() here?

Yes, that's right, the fix is in for v2.

-- 
~Vinod

Re: [kernel-hardening] Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces

2017-11-09 Thread Serge E. Hallyn

Quoting Eric W. Biederman (ebied...@xmission.com):
> single sandbox.  I am not at all certain that the capabilities is the
> proper place to limit code reachability.

Right, I keep having this gut feeling that there is another way we
should be doing that.  Maybe based on ksplice or perf, or maybe more
based on subsystems.  And I hope someone pursues that.  But I can't put
my finger on it, and meanwhile the capability checks obviously *are* in
fact gates...

-serge

Re: [PATCH 3/3] VFS: close race between getcwd() and d_move()

2017-11-09 Thread NeilBrown

On Thu, Nov 09 2017, Linus Torvalds wrote:

> On Thu, Nov 9, 2017 at 2:14 PM, NeilBrown  wrote:
>> On Thu, Nov 09 2017, Linus Torvalds wrote:
>>>
>>> How nasty would it be to just expand the calls to __d_drop/__d_rehash
>>> into __d_move itself, and take both hash list locks at the same time
>>> (with the usual ordering and checking if it's the same list, of
>>> course).
>>
>> something like this?
>
> Yes.
>
> This looks nicer to me. Partly because I hate those "pass flags to
> functions that modify their behavior" kinds of patches. I'd rather see
> just straight-line unconditional code with some possible duplication.
...
>
> I also do wonder if we can avoid all the unhash/rehash games entirely
> (and avoid the hash list locking) if it turns out that the dentry and
> target hash lists are the same.

I'm not convinced.  I haven't actually tried it, but the matrix of
possibilities seems a little large.
The source dentry may or may not be hashed (not in the "disconnected
IS_ROOT" case), and the target may or may not want to be rehashed
afterwards (depending on 'exchange').  We could skip the lock
for an exchange if they both had the same hash, but not for a simple
move.

However your description of what it was that you didn't like gave me an
idea - I can take the same approach as my original, but not pass flags
around.
I quite like how this turned out.
Dropping the BUG_ON() in __d_rehash() isn't ideal, maybe we could add
___d_rehash() without the BUG_ON() and call that from __d_rehash?

Thanks,
NeilBrown

From: NeilBrown 
Date: Fri, 10 Nov 2017 15:20:06 +1100
Subject: [PATCH] VFS: close race between getcwd() and d_move()

d_move() will call __d_drop() and then __d_rehash()
on the dentry being moved.  This creates a small window
when the dentry appears to be unhashed.  Many tests
of d_unhashed() are made under ->d_lock and so are safe
from racing with this window, but some aren't.
In particular, getcwd() calls d_unlinked() (which calls
d_unhashed()) without d_lock protection, so it can race.

This race has been seen in practice with lustre, which uses d_move() as
part of name lookup.  See:
   https://jira.hpdd.intel.com/browse/LU-9735
It could race with a regular rename(), and result in ENOENT instead
of either the 'before' or 'after' name.

The race can be demonstrated with a simple program which
has two threads, one renaming a directory back and forth
while another calls getcwd() within that directory: it should never
fail, but does.  See:
  https://patchwork.kernel.org/patch/9455345/

We could fix this race by taking d_lock and rechecking when
d_unhashed() reports true.  Alternatively we can remove the window,
which is the approach this patch takes.

___d_drop() is introduced, which does *not* clear d_hash.pprev
so the dentry still appears to be hashed.  __d_drop() calls
___d_drop(), then clears d_hash.pprev.
__d_move() now uses ___d_drop() and only clears d_hash.pprev
when not rehashing.
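
The idea in the paragraph above can be modelled with a plain singly-linked hash list in userspace. This is a sketch under stated assumptions, not the dcache code: it uses a simplified hlist without the bit lock, it is single-threaded (so it shows only the state a concurrent reader could observe, not the race itself), and the names hnode, hhead, unlink_keep_mark() and unlink_and_mark() are hypothetical stand-ins for the roles of ___d_drop() and __d_drop().

```c
#include <assert.h>
#include <stddef.h>

struct hnode {
	struct hnode *next;
	struct hnode **pprev;	/* NULL means "unhashed" */
};

struct hhead {
	struct hnode *first;
};

/* Same test d_unhashed() relies on: is pprev NULL? */
static int unhashed(const struct hnode *n)
{
	return n->pprev == NULL;
}

static void add_head(struct hnode *n, struct hhead *h)
{
	n->next = h->first;
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/* Role of ___d_drop(): unlink, but leave pprev non-NULL (stale). */
static void unlink_keep_mark(struct hnode *n)
{
	assert(!unhashed(n));
	*n->pprev = n->next;
	if (n->next)
		n->next->pprev = n->pprev;
}

/* Role of __d_drop(): unlink and then mark the node as unhashed. */
static void unlink_and_mark(struct hnode *n)
{
	unlink_keep_mark(n);
	n->pprev = NULL;
}
```

Moving a node with unlink_keep_mark() followed by add_head() never passes through a state where unhashed() returns true, which is the window the patch aims to close; unlink_and_mark() remains the right call when the node is not going to be rehashed.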

Signed-off-by: NeilBrown 
---
 fs/dcache.c | 21 ++---
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index f90141387f01..8c83543f5065 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -468,9 +468,11 @@ static void dentry_lru_add(struct dentry *dentry)
  * d_drop() is used mainly for stuff that wants to invalidate a dentry for some
  * reason (NFS timeouts or autofs deletes).
  *
- * __d_drop requires dentry->d_lock.
+ * __d_drop requires dentry->d_lock
+ * ___d_drop doesn't mark dentry as "unhashed"
+ *   (dentry->d_hash.pprev will be LIST_POISON2, not NULL).
  */
-void __d_drop(struct dentry *dentry)
+static void ___d_drop(struct dentry *dentry)
 {
if (!d_unhashed(dentry)) {
struct hlist_bl_head *b;
@@ -486,12 +488,15 @@ void __d_drop(struct dentry *dentry)

hlist_bl_lock(b);
__hlist_bl_del(&dentry->d_hash);
-   dentry->d_hash.pprev = NULL;
hlist_bl_unlock(b);
/* After this call, in-progress rcu-walk path lookup will fail. 
*/
write_seqcount_invalidate(&dentry->d_seq);
}
 }
+void __d_drop(struct dentry *dentry) {
+   ___d_drop(dentry);
+   dentry->d_hash.pprev = NULL;
+}
 EXPORT_SYMBOL(__d_drop);

 void d_drop(struct dentry *dentry)
@@ -2381,7 +2386,7 @@ EXPORT_SYMBOL(d_delete);
 static void __d_rehash(struct dentry *entry)
 {
struct hlist_bl_head *b = d_hash(entry->d_name.hash);
-   BUG_ON(!d_unhashed(entry));
+
hlist_bl_lock(b);
hlist_bl_add_head_rcu(&entry->d_hash, b);
hlist_bl_unlock(b);
@@ -2818,9 +2823,9 @@ static void __d_move(struct dentry *dentry, struct dentry 
*target,
write_seqcount_begin_nested(&target->d_seq, DENTRY_D_LOCK_NESTED);

/* unhash both */
-   /* __d_drop does write_seqcount_barrier, but they're OK to nest. */
-   __d_drop(dentry);
-   __d_drop(target);
+   /* ___d_drop does write_seqcount_barrier, but they're OK to nest. */
+   ___d_drop(den

Re: linux-next: manual merge of the net-next tree with Linus' tree

2017-11-09 Thread David Miller

From: Stephen Rothwell 
Date: Fri, 10 Nov 2017 10:31:00 +1100

> Hi all,
> 
> Today's linux-next merge of the net-next tree got a conflict in:
> 
>   net/sched/cls_basic.c
>   net/sched/cls_u32.c
> 
> between commits:
> 
>   0b2a59894b76 ("cls_basic: use tcf_exts_get_net() before call_rcu()")
>   35c55fc156d8 ("cls_u32: use tcf_exts_get_net() before call_rcu()")
> 
> from Linus' tree and commit:
> 
>   1d8134fea2eb ("net_sched: use idr to allocate basic filter handles")
> 
> from the net-next tree.

This should be resolved as I've just merged 'net' into 'net-next'.

Re: linux-next: manual merge of the net-next tree with Linus' tree

2017-11-09 Thread Cong Wang

On Thu, Nov 9, 2017 at 3:31 PM, Stephen Rothwell  wrote:
> I fixed it up (I think - see below) and can carry the fix as necessary.
> This is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your tree
> is submitted for merging.  You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any particularly
> complex conflicts.

It looks good to me.

Thanks!

linux-next: manual merge of the akpm-current tree with the tip tree

2017-11-09 Thread Stephen Rothwell

Hi Andrew,

Today's linux-next merge of the akpm-current tree got a conflict in:

  kernel/softirq.c

between commit:

  f71b74bca637 ("irq/softirqs: Use lockdep to assert IRQs are disabled/enabled")

from the tip tree and commit:

  275f9389fa4e ("kmemcheck: rip it out")

from the akpm-current tree.

I fixed it up (the latter removed code modified by the former) and can
carry the fix as necessary. This is now fixed as far as linux-next is
concerned, but any non trivial conflicts should be mentioned to your
upstream maintainer when your tree is submitted for merging.  You may
also want to consider cooperating with the maintainer of the conflicting
tree to minimise any particularly complex conflicts.

-- 
Cheers,
Stephen Rothwell

Re: [kernel-hardening] Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces

2017-11-09 Thread महेश बंडेवार

On Fri, Nov 10, 2017 at 6:58 AM, Eric W. Biederman
 wrote:
> "Mahesh Bandewar (महेश बंडेवार)"  writes:
>
>> [resend response as earlier one failed because of formatting issues]
>>
>> On Thu, Nov 9, 2017 at 12:21 PM, Serge E. Hallyn  wrote:
>>>
>>> On Thu, Nov 09, 2017 at 09:55:41AM +0900, Mahesh Bandewar (महेश बंडेवार) 
>>> wrote:
>>> > On Thu, Nov 9, 2017 at 4:02 AM, Christian Brauner
>>> >  wrote:
>>> > > On Wed, Nov 08, 2017 at 03:09:59AM -0800, Mahesh Bandewar (महेश 
>>> > > बंडेवार) wrote:
>>> > >> Sorry folks I was traveling and seems like lot happened on this 
>>> > >> thread. :p
>>> > >>
>>> > >> I will try to respond to a few of these comments selectively -
>>> > >>
>>> > >> > The thing that makes me hesitate with this set is that it is a
>>> > >> > permanent new feature to address what (I hope) is a temporary
>>> > >> > problem.
>>> > >> I agree this is permanent new feature but it's not solving a temporary
>>> > >> problem. It's impossible to assess what and when new vulnerability
>>> > >> that could show up. I think Daniel summed it up appropriately in his
>>> > >> response
>>> > >>
>>> > >> > Seems like there are two naive ways to do it, the first being to just
>>> > >> > look at all code under ns_capable() plus code called from there.  It
>>> > >> > seems like looking at the result of that could be fruitful.
>>> > >> This is really hard. The main issue is that there were features designed
>>> > >> and developed before user-ns days with an assumption that unprivileged
>>> > >> users will never get certain capabilities which only root user gets.
>>> > >> Now that is not true anymore with user-ns creation with mapping root
>>> > >> for any process. Also at the same time blocking user-ns creation for
>>> > >> everyone is a big-hammer which is not needed too. So it's not that easy
>>> > >> to just perform a code-walk-through and correct those decisions now.
>>> > >>
>>> > >> > It seems to me that the existing control in
>>> > >> > /proc/sys/kernel/unprivileged_userns_clone might be the better duct 
>>> > >> > tape
>>> > >> > in that case.
>>> > >> This solution is essentially blocking unprivileged users from using
>>> > >> the user-namespaces entirely. This is not really a solution that can
>>> > >> work. The solution that this patch-set adds allows unprivileged users
>>> > >> to create user-namespaces. Actually the proposed solution is more
>>> > >> fine-grained approach than the unprivileged_userns_clone solution
>>> > >> since you can selectively block capabilities rather than completely
>>> > >> blocking the functionality.
>>> > >
>>> > > I've been talking to Stéphane today about this and we should also keep 
>>> > > in mind
>>> > > that we have:
>>> > >
>>> > > chb@conventiont|~
>>> > >> ls -al /proc/sys/user/
>>> > > total 0
>>> > > dr-xr-xr-x 1 root root 0 Nov  6 23:32 .
>>> > > dr-xr-xr-x 1 root root 0 Nov  2 22:13 ..
>>> > > -rw-r--r-- 1 root root 0 Nov  8 19:48 max_cgroup_namespaces
>>> > > -rw-r--r-- 1 root root 0 Nov  8 19:48 max_inotify_instances
>>> > > -rw-r--r-- 1 root root 0 Nov  8 19:48 max_inotify_watches
>>> > > -rw-r--r-- 1 root root 0 Nov  8 19:48 max_ipc_namespaces
>>> > > -rw-r--r-- 1 root root 0 Nov  8 19:48 max_mnt_namespaces
>>> > > -rw-r--r-- 1 root root 0 Nov  8 19:48 max_net_namespaces
>>> > > -rw-r--r-- 1 root root 0 Nov  8 19:48 max_pid_namespaces
>>> > > -rw-r--r-- 1 root root 0 Nov  8 19:48 max_user_namespaces
>>> > > -rw-r--r-- 1 root root 0 Nov  8 19:48 max_uts_namespaces
>>> > >
>>> > > These files allow you to limit the number of namespaces that can be
>>> > > created *per namespace type*. So let's say your system runs a bunch of
>>> > > user namespaces, you can do:
>>> > >
>>> > > chb@conventiont|~
>>> > >> echo 0 > /proc/sys/user/max_user_namespaces
>>> > >
>>> > > So that the next time you try to create a user namespace you'd see:
>>> > >
>>> > > chb@conventiont|~
>>> > >> unshare -U
>>> > > unshare: unshare failed: No space left on device
>>> > >
>>> > > So there's not even a need to upstream a new sysctl since we have ways
>>> > > of blocking this.
>>> > >
>>> > I'm not sure how it's solving the problem that my patch-set is addressing?
>>> > I agree though that the need for unprivileged_userns_clone sysctl goes
>>> > away as this is equivalent to setting that sysctl to 0 as you have
>>> > described above.
>>>
>>> oh right that was the reasoning iirc for not needing the other sysctl.
>>>
>>> > However as I mentioned earlier, blocking processes from creating
>>> > user-namespaces is not the solution. Processes should be able to
>>> > create namespaces as they are designed but at the same time we need to
>>> > have controls to 'contain' them if a need arises. Setting max_no to 0
>>> > is not the solution that I'm looking for since it doesn't solve the
>>> > problem.
>>>
>>> well yesterday we were told that was explicitly not the goal, but that was
>>> not by you ... i just mention it to explain why we seem to be walking in
>>> circles a bit.

Re: [PATCH resend 1/2] capability: introduce sysctl for controlled user-ns capability whitelist

2017-11-09 Thread Serge E. Hallyn

Quoting Mahesh Bandewar (महेश बंडेवार) (mahe...@google.com):
...
> >>
> >>  ==
> >>
> >> +controlled_userns_caps_whitelist
> >> +
> >> +Capability mask that is whitelisted for "controlled" user namespaces.
> >> +Any capability that is missing from this mask will not be allowed to
> >> +any process that is attached to a controlled-userns. e.g. if CAP_NET_RAW
> >> +is not part of this mask, then processes running inside any controlled
> >> +userns's will not be allowed to perform action that needs CAP_NET_RAW
> >> +capability. However, processes that are attached to a parent user-ns
> >> +hierarchy that is *not* controlled and has CAP_NET_RAW can continue
> >> +performing those actions. User-namespaces are marked "controlled" at
> >> +the time of their creation based on the capabilities of the creator.
> >> +A process that does not have CAP_SYS_ADMIN will create user-namespaces
> >> +that are controlled.
> >
> > Hm.  I think that's fine (the way 'controlled' user namespaces are
> > defined), but that is a design decision in itself, and should perhaps be
> > discussed.
> >
> > Did you consider other ways?  What about using CAP_SETPCAP?
> >
> I did try other ways e.g. using another bounding-set etc. but
> eventually settled on this approach because of two main properties -

No, I meant did you try other ways of defining a controlled user
namespace, other than one which is created by a task lacking
CAP_SYS_ADMIN?

...

> >> +The value is expressed as two comma separated hex words (u32). This
> >
> > Why comma separated?  whitespace ok?  Leading 0x ok?  What is the
> > default at boot?  (Obviously the patch tells me, I'm asking for it
> > to be spelled out in the doc)
> >
> I tried multiple ways including representing capabilities in
> string/name form for better readability but didn't want to add
> additional complexities of dealing with strings and possible
> string-related-issues for this. Also didn't want to reinvent the new
> form so settled with something that is widely used (cpu
> bounding/affinity/irq mapping etc.) and is capable of handling growing
> bit set (currently 37 but possibly more later).

Ok, thanks.
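
To make the format question concrete, here is a minimal userspace sketch of how a value written as two comma-separated hex u32 words could be turned into a capability mask and tested. It is not taken from the patch. The helper names parse_cap_mask() and cap_allowed() are hypothetical, and the word order (most significant word first, as in the kernel's cpumask text format) is an assumption that the documentation would need to confirm.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Capability numbers from include/uapi/linux/capability.h. */
#define CAP_NET_RAW   13
#define CAP_SYS_ADMIN 21

/*
 * Hypothetical helper: parse "hhhhhhhh,llllllll" into a 64-bit mask.
 * Assumption: the most significant word comes first, as in the
 * cpumask/bitmap text format. Trailing whitespace (e.g. the newline a
 * sysctl read returns) is accepted; anything else is rejected.
 * Returns 0 on success, -1 on malformed input.
 */
static int parse_cap_mask(const char *s, uint64_t *mask)
{
	unsigned int hi, lo;
	char trailing;

	assert(s && mask);
	if (sscanf(s, "%8x,%8x %c", &hi, &lo, &trailing) != 2)
		return -1;
	*mask = ((uint64_t)hi << 32) | lo;
	return 0;
}

/* Hypothetical helper: non-zero if capability number 'cap' is in the mask. */
static int cap_allowed(uint64_t mask, int cap)
{
	return (int)((mask >> cap) & 1);
}
```

With this sketch, a mask of "0000001f,ffffdfff" has bit 13 cleared, so cap_allowed(mask, CAP_NET_RAW) is 0 while CAP_SYS_ADMIN is still reported as allowed. The mask values here are made up for illustration and are not the boot default.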

Re: [PATCH 1/2] blk-throtl: make latency= absolute

2017-11-09 Thread Shaohua Li

On Thu, Nov 09, 2017 at 03:42:58PM -0800, Tejun Heo wrote:
> Hello, Shaohua.
> 
> On Thu, Nov 09, 2017 at 03:12:12PM -0800, Shaohua Li wrote:
> > The percentage latency makes sense, but the absolute latency doesn't to me.
> > A 4k IO latency could be much smaller than 1M IO latency. If we don't add
> > baseline latency, we can't specify a latency target which works for both 4k
> > and 1M IO.
> 
> It isn't adaptive for sure.  I think it's still useful for the
> following reasons.
> 
> 1. The absolute latency target is by nature both workload and device
>    dependent.  For a lot of use cases, coming up with a decent number
>    should be possible.
> 
> 2. There are many use cases which aren't sensitive to the level where
>    they care much about the difference between small and large
>    requests.  e.g. protecting a managerial job so that it doesn't
>    completely stall doesn't require tuning things to that level.  A
>    value which is comfortably higher than usually expected latencies
>    would often be enough (say 100ms).
> 
> 3. It's also useful for verification / testing.

I think the absolute latency would only work for HD. For an SSD, a 4k latency
is probably 60us and a 1M latency is 500us. The disk must be very contended to
make 4k latency reach 500us. Not sensitive doesn't mean no protection. If the
use case sets a rough latency target, say 1ms, there will be no protection for
4k IO at all. The baseline latency is actually pretty reliable for SSD. So I'd
rather keep the baseline latency for SSD but use absolute latency for HD, which
can be done easily by setting DFL_HD_BASELINE_LATENCY to 0.

Thanks,
Shaohua
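
The arithmetic behind this argument can be sketched as follows. This is an illustrative model only, not the blk-throttle code. latency_exceeded() is a hypothetical helper, the 60us and 500us figures are the rough numbers quoted above, and the 100us target is an assumed value picked for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Rough SSD figures quoted in the message above; not measured here. */
#define SSD_4K_BASELINE_US  60
#define SSD_1M_BASELINE_US  500

/*
 * Hypothetical helper: an IO counts as "too slow" once its latency
 * exceeds baseline + target. With baseline_us == 0 this degenerates
 * to a purely absolute latency target.
 */
static bool latency_exceeded(uint64_t lat_us, uint64_t baseline_us,
			     uint64_t target_us)
{
	/* Guard the addition against wrap-around. */
	assert(baseline_us <= UINT64_MAX - target_us);
	return lat_us > baseline_us + target_us;
}
```

With an absolute 1ms target, a 4k IO that has degraded from 60us to 500us is not flagged: latency_exceeded(500, 0, 1000) is false. With the size-specific baseline and a 100us target, the same 4k IO is flagged (500 > 60 + 100) while a 1M IO at 500us is not (500 <= 500 + 100).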

Re: [PATCH] drm: gem_cma_helper.c: Allow importing of contiguous scatterlists with nents > 1

2017-11-09 Thread Laurent Pinchart

Hi Liviu,

Thank you for the patch.

On Wednesday, 1 November 2017 16:14:19 EET Liviu Dudau wrote:
> drm_gem_cma_prime_import_sg_table() will fail if the number of entries
> in the sg_table > 1. However, you can have a device that uses an IOMMU
> engine and can map a discontiguous buffer with multiple entries that
> have consecutive sg_dma_addresses, effectively making it contiguous.
> Allow for that scenario by testing the entries in the sg_table for
> contiguous coverage.
> 
> Reviewed-by: Brian Starkey 
> Signed-off-by: Liviu Dudau 
> ---
> 
> Hi,
> 
> This patch is the only change I need in order to be able to use existing
> IOMMU domain infrastructure with the Mali DP driver. I have tested the
> patch and I know it works correctly for my setup, but I would like to get
> some comments on whether I am on the right path or if CMA really wants to
> see an sg_table with only one entry.

CMA, as the memory allocator, doesn't care as it doesn't even see the sg
table. The drm_gem_cma_helper is badly named as it doesn't depend on CMA; it
should have been called drm_gem_dma_contig_helper or something similar.

The assumption at the base of that helper library is that the memory is DMA 
contiguous. Your patch guarantees that, so it should be fine. I've quickly 
checked the drivers using drm_gem_cma_prime_import_sg_table and none of them 
use cma_obj->sgt, so I think there's no risk of breakage. However, I would 
prefer if you updated the drm_gem_cma_object structure documentation to 
explicitly state that the sgt can contain multiple entries but that those 
entries are guaranteed to have contiguous DMA addresses.

With the documentation update,

Reviewed-by: Laurent Pinchart 

>  drivers/gpu/drm/drm_gem_cma_helper.c | 22 --
>  1 file changed, 20 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_gem_cma_helper.c
> b/drivers/gpu/drm/drm_gem_cma_helper.c index 020e7668dfaba..43b179212052d
> 100644
> --- a/drivers/gpu/drm/drm_gem_cma_helper.c
> +++ b/drivers/gpu/drm/drm_gem_cma_helper.c
> @@ -482,8 +482,26 @@ drm_gem_cma_prime_import_sg_table(struct drm_device
> *dev, {
>   struct drm_gem_cma_object *cma_obj;
> 
> - if (sgt->nents != 1)
> - return ERR_PTR(-EINVAL);
> + if (sgt->nents != 1) {
> + /* check if the entries in the sg_table are contiguous */
> + dma_addr_t next_addr = sg_dma_address(sgt->sgl);
> + struct scatterlist *s;
> + unsigned int i;
> +
> + for_each_sg(sgt->sgl, s, sgt->nents, i) {
> + /*
> +  * sg_dma_address(s) is only valid for entries
> +  * that have sg_dma_len(s) != 0
> +  */
> + if (!sg_dma_len(s))
> + continue;
> +
> + if (sg_dma_address(s) != next_addr)
> + return ERR_PTR(-EINVAL);
> +
> + next_addr = sg_dma_address(s) + sg_dma_len(s);
> + }
> + }
> 
>   /* Create a CMA GEM buffer. */
>   cma_obj = __drm_gem_cma_create(dev, attach->dmabuf->size);

-- 
Regards,

Laurent Pinchart
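
The contiguity test in the quoted hunk can be tried outside the kernel with a small model. The sketch below mirrors the loop from the patch under stated assumptions: struct seg is a hypothetical stand-in for a DMA-mapped scatterlist entry, and segs_dma_contiguous() is an illustrative name, not a DRM helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a DMA-mapped scatterlist entry. */
struct seg {
	uint64_t dma_addr;
	uint32_t dma_len;
};

/*
 * Return true if every entry with a non-zero DMA length starts exactly
 * where the previous one ended. Zero-length entries are skipped, as in
 * the patch, because their DMA address is not meaningful.
 */
static bool segs_dma_contiguous(const struct seg *segs, size_t n)
{
	uint64_t next_addr;
	size_t i;

	assert(segs || n == 0);
	if (n == 0)
		return true;

	next_addr = segs[0].dma_addr;
	for (i = 0; i < n; i++) {
		if (segs[i].dma_len == 0)
			continue;
		if (segs[i].dma_addr != next_addr)
			return false;
		next_addr = segs[i].dma_addr + segs[i].dma_len;
	}
	return true;
}
```

Two entries {0x1000, 0x1000} and {0x2000, 0x800} pass the check, while {0x1000, 0x1000} followed by {0x3000, 0x1000} fails because of the hole at 0x2000.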

Re: [PATCH v2 2/4] kaslr: select the memory region in immovable node to process

2017-11-09 Thread Chao Fan

On Fri, Nov 10, 2017 at 11:14:37AM +0800, Baoquan He wrote:
>On 11/10/17 at 11:03am, Chao Fan wrote:
>> On Thu, Nov 09, 2017 at 04:21:32PM +0800, Baoquan He wrote:
>> >Hi Chao,
>> >
>> >On 11/01/17 at 07:32pm, Chao Fan wrote:
>> >> Compare the region of memmap entry and immovable_mem, then choose the
>> >> intersection to process_mem_region.
>> >> 
>> >> Since the interrelationship between e820 or efi entries and memory
>> >> region in immovable_mem is different:
>> >
>> >Could you paste a bootlog with efi=debug specified in cmdline on the
>> >system you tested? I want to check what kind of intersection between
>> >them. The adding makes code pretty ugly, want to make sure if we have
>> >to do like this.
>> Hi Baoquan,
>> 
>> Here is a machine with efi.
>
>Thanks, do you have the whole boot log? I want to have a look at e820.

No problem, I will attach the whole log as a file.

>And this is a special system, or a customized system? I mean you just

It's a qemu machine, in which I can create more nodes for testing.
I have no suitable physical machine at hand.

>customize the firmware for better testing to cover kinds of cases.

Although the code may be a little ugly, comparing the different
memory regions this way covers more cases.
If you have ideas for improving the code, I would be very grateful.

Thanks,
Chao Fan
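
The intersection step discussed here can be sketched in isolation. This is a minimal model, not the code from the patch. struct mem_range and range_intersect() are hypothetical names, ranges are half-open [start, start + size), and overflow of start + size is not handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical half-open range: [start, start + size). */
struct mem_range {
	uint64_t start;
	uint64_t size;
};

/*
 * Compute the overlap of a firmware memory entry and an immovable
 * region. Returns false (and leaves *out untouched) if they do not
 * overlap; ranges that merely touch count as non-overlapping.
 */
static bool range_intersect(const struct mem_range *a,
			    const struct mem_range *b,
			    struct mem_range *out)
{
	uint64_t a_end, b_end, start, end;

	assert(a && b && out);
	a_end = a->start + a->size;
	b_end = b->start + b->size;
	start = a->start > b->start ? a->start : b->start;
	end = a_end < b_end ? a_end : b_end;

	if (start >= end)
		return false;

	out->start = start;
	out->size = end - start;
	return true;
}
```

An entry that spans parts of two nodes, as mem09 does above, would be clipped to the part that lies inside the immovable region before being handed on for further processing.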

>
>If it's too big, please attach it and send to me privately.
>
>Anyway, seems your considering about the intersection is right.
>
>Thanks
>Baoquan
>> 
>> The memory information in SRAT from dmesg:
>> [0.00] ACPI: SRAT: Node 0 PXM 0 [mem 0x-0x0009]
>> [0.00] ACPI: SRAT: Node 0 PXM 0 [mem 0x0010-0x1f3f]
>> [0.00] ACPI: SRAT: Node 1 PXM 1 [mem 0x1f40-0x3e7f]
>> [0.00] ACPI: SRAT: Node 2 PXM 2 [mem 0x3e80-0x5dbf]
>> [0.00] ACPI: SRAT: Node 3 PXM 3 [mem 0x5dc0-0x7cff]
>> [0.00] ACPI: SRAT: Node 4 PXM 4 [mem 0x7d00-0x9c3f]
>> [0.00] ACPI: SRAT: Node 5 PXM 5 [mem 0x9c40-0xbb7f]
>> [0.00] ACPI: SRAT: Node 6 PXM 6 [mem 0xbb80-0xbfff]
>> [0.00] ACPI: SRAT: Node 6 PXM 6 [mem 0x1-0x11abf]
>> [0.00] ACPI: SRAT: Node 7 PXM 7 [mem 0x11ac0-0x139ff]
>> [0.00] ACPI: SRAT: Node 8 PXM 8 [mem 0x13a00-0x1593f]
>> [0.00] ACPI: SRAT: Node 9 PXM 9 [mem 0x15940-0x1787f]
>> 
>> There are 10 nodes, and 500M memory in every node.
>> And node0 and node 6 has two parts.
>> 
>> 
>> Here is the efi mem:
>> [0.00] efi: mem00: [Boot Code  |   |  |  |  |  |  |  |   
>> |WB|WT|WC|UC] range=[0x-0x0fff] (0MB)
>> [0.00] efi: mem01: [Loader Data|   |  |  |  |  |  |  |   
>> |WB|WT|WC|UC] range=[0x1000-0x1fff] (0MB)
>> [0.00] efi: mem02: [Conventional Memory|   |  |  |  |  |  |  |   
>> |WB|WT|WC|UC] range=[0x2000-0x0009] (0MB)
>> [0.00] efi: mem03: [Conventional Memory|   |  |  |  |  |  |  |   
>> |WB|WT|WC|UC] range=[0x0010-0x00805fff] (7MB)
>> [0.00] efi: mem04: [Boot Data  |   |  |  |  |  |  |  |   
>> |WB|WT|WC|UC] range=[0x00806000-0x00806fff] (0MB)
>> [0.00] efi: mem05: [Conventional Memory|   |  |  |  |  |  |  |   
>> |WB|WT|WC|UC] range=[0x00807000-0x0081] (0MB)
>> [0.00] efi: mem06: [Boot Data  |   |  |  |  |  |  |  |   
>> |WB|WT|WC|UC] range=[0x0082-0x012f] (10MB)
>> [0.00] efi: mem07: [Conventional Memory|   |  |  |  |  |  |  |   
>> |WB|WT|WC|UC] range=[0x0130-0x01ff] (13MB)
>> [0.00] efi: mem08: [Loader Data|   |  |  |  |  |  |  |   
>> |WB|WT|WC|UC] range=[0x0200-0x036e3fff] (22MB)
>> (From mem00 to mem08, belongs to node0)
>> [0.00] efi: mem09: [Conventional Memory|   |  |  |  |  |  |  |   
>> |WB|WT|WC|UC] range=[0x036e4000-0x3d626fff] (927MB)
>> (mem09 has part of node0 and part of node1, but not the whole of node0 and 
>> node1)
>> [0.00] efi: mem10: [Loader Data|   |  |  |  |  |  |  |   
>> |WB|WT|WC|UC] range=[0x3d627000-0x3fff] (41MB)
>> (part of node1 and part of node2)
>> [0.00] efi: mem11: [Conventional Memory|   |  |  |  |  |  |  |   
>> |WB|WT|WC|UC] range=[0x4000-0x8c92dfff] (1225MB)
>> [0.00] efi: mem12: [Loader Data|   |  |  |  |  |  |  |   
>> |WB|WT|WC|UC] range=[0x8c92e000-0xbbfbdfff] (758MB)
>> [0.00] efi: mem13: [Boot Data  |   |  |  |  |  |  |  |   
>> |WB|WT|WC|UC] range=[0xbbfbe000-0xbbfddfff] (0MB)
>> [0.00] efi: mem14: [Conventional Memory|   |  |  |  |  |  |  |   
>> |WB|WT|WC|UC] range=[0xbbfde000-0xbe350fff] (35MB)
>> [0.00] efi: mem15: [Loader Data|   |  |  |  |  |

Re: [PATCH] x86: use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz" again

2017-11-09 Thread WANG Chao

On 11/10/17 at 12:04P, WANG Chao wrote:
> On 11/10/17 at 01:06P, Rafael J. Wysocki wrote:
> > On Thursday, November 9, 2017 11:30:54 PM CET Rafael J. Wysocki wrote:
> > > On Thu, Nov 9, 2017 at 5:06 PM, Rafael J. Wysocki
> > >  wrote:
> > > > Hi Linus,
> > > >
> > > > On 11/9/2017 11:38 AM, WANG Chao wrote:
> > > >>
> > > >> Commit 941f5f0f6ef5 (x86: CPU: Fix up "cpu MHz" in /proc/cpuinfo) 
> > > >> caused
> > > >> a serious performance issue when reading from /proc/cpuinfo on system
> > > >> with aperfmperf.
> > > >>
> > > >> For each cpu, arch_freq_get_on_cpu() sleeps 20ms to get its frequency.
> > > >> On a system with 64 cpus, it takes 1.5s to finish running `cat
> > > >> /proc/cpuinfo`, while it previously was done in 15ms.
> > > >
> > > > Honestly, I'm not sure what to do to address this ATM.
> > > >
> > > > The last requested frequency is only available in the non-HWP case, so 
> > > > it
> > > > cannot be used universally.
> > > 
> > > OK, here's an idea.
> > > 
> > > c_start() can run aperfmperf_snapshot_khz() on all CPUs upfront (say
> > > in parallel), then wait for a while (say 5 ms; the current 20 ms wait
> > > is overkill) and then aperfmperf_snapshot_khz() can be run once on
> > > each CPU in show_cpuinfo() without taking the "stale cache" threshold
> > > into account.
> > > 
> > > I'm going to try that and see how far I can get with it.
> > 
> > Below is what I have.
> > 
> > I ended up using APERFMPERF_REFRESH_DELAY_MS for the delay in
> > aperfmperf_snapshot_all(), because 5 ms tended to add too much
> > variation to the results on my test box.
> > 
> > I think it may be reduced to 10 ms, though.
> > 
> > Chao, can you please try this one and report back?
> 
> Hi, Rafael
> 
> Thanks for the patch. But it doesn't work for me. lscpu takes 1.5s to
> finish on a 64 cpus AMD box with aperfmperf.
> 
> You missed the fact that c_start() will also be called by c_next().
> 
> But I don't think the overall idea is good enough. I think /proc/cpuinfo
> is too general for userspace to be delayed, no matter whether it's 10ms or
> 20ms.
> 
> My point is that cpu MHz is best served from a cached value for quick
> access. If people are looking for a reliable and accurate cpu frequency,
> /proc/cpuinfo is probably a bad idea.
> 
> What do you think?

Could you also explain 941f5f0f6ef5 (x86: CPU: Fix up "cpu MHz" in
/proc/cpuinfo) please? The commit message is not clear to me.

Were there any upstream discussions? I wasn't following this change
upstream, and now I can't find any.

Thanks,
WANG Chao
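
For reference, the numbers in this thread can be reproduced with a small model. The frequency estimate follows the expression visible in the quoted patch (cpu_khz * aperf_delta / mperf_delta). The helper names are hypothetical, and returning 0 for a zero MPERF delta is a choice made for this sketch, not what the kernel function does.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: effective frequency from APERF/MPERF deltas.
 * base_khz plays the role of cpu_khz. Returns 0 if no MPERF ticks
 * elapsed (a choice made for this sketch).
 */
static uint64_t aperfmperf_khz(uint64_t base_khz, uint64_t aperf_delta,
			       uint64_t mperf_delta)
{
	if (mperf_delta == 0)
		return 0;
	/* Guard the multiplication against wrap-around. */
	assert(aperf_delta == 0 || base_khz <= UINT64_MAX / aperf_delta);
	return base_khz * aperf_delta / mperf_delta;
}

/*
 * Hypothetical helper: total time spent sleeping when each CPU is
 * sampled one after another with a fixed per-CPU delay.
 */
static unsigned int serial_sleep_ms(unsigned int ncpus,
				    unsigned int per_cpu_delay_ms)
{
	return ncpus * per_cpu_delay_ms;
}
```

With 64 CPUs and a 20 ms delay per CPU, serial_sleep_ms(64, 20) gives 1280 ms, which accounts for most of the 1.5 s reported for one read of /proc/cpuinfo.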

Re: [PATCH 1/4] kbuild: create directory for make cache only when necessary

2017-11-09 Thread Masahiro Yamada

Hi Douglas,

Thanks for your review.

2017-11-10 2:59 GMT+09:00 Doug Anderson :
> Hi,
>
> On Thu, Nov 9, 2017 at 7:41 AM, Masahiro Yamada
>  wrote:
>> Currently, the existence of $(dir $(make-cache)) is always checked,
>> and created if it is missing.
>>
>> We can avoid unnecessary system calls by some tricks.
>>
>> [1] If KBUILD_SRC is unset, we are building in the source tree.
>> The output directory checks can be entirely skipped.
>> [2] If at least one cache data is found, it means the cache file
>> was included.  Obviously its directory exists.  Skip "mkdir -p".
>> [3] If Makefile does not contain any call of __run-and-store, it will
>> not create a cache file.  No need to create its directory.
>> [4] The "mkdir -p" should be only invoked by the first call of
>> __run-and-store
>>
>> Signed-off-by: Masahiro Yamada 
>> ---
>>
>>  scripts/Kbuild.include | 13 +
>>  1 file changed, 9 insertions(+), 4 deletions(-)
>>
>> diff --git a/scripts/Kbuild.include b/scripts/Kbuild.include
>> index be1c9d6..4fb1be1 100644
>> --- a/scripts/Kbuild.include
>> +++ b/scripts/Kbuild.include
>> @@ -99,18 +99,19 @@ cc-cross-prefix =  \
>>
>>  # Include values from last time
>>  make-cache := $(if $(KBUILD_EXTMOD),$(KBUILD_EXTMOD)/,$(if 
>> $(obj),$(obj)/)).cache.mk
>> -ifeq ($(wildcard $(dir $(make-cache))),)
>> -$(shell mkdir -p '$(dir $(make-cache))')
>> -endif
>>  $(make-cache): ;
>>  -include $(make-cache)
>>
>> +cached-data := $(filter __cached_%, $(.VARIABLES))
>> +
>>  # If cache exceeds 1000 lines, shrink it down to 500.
>> -ifneq ($(word 1000,$(filter __cached_%, $(.VARIABLES))),)
>> +ifneq ($(word 1000,$(cached-data)),)
>>  $(shell tail -n 500 $(make-cache) > $(make-cache).tmp; \
>> mv $(make-cache).tmp $(make-cache))
>>  endif
>>
>> +cache-dir := $(if $(KBUILD_SRC),$(if $(cache-data),,$(dir $(make-cache))))
>
> It wouldn't hurt to add a comment that cache-dir will be blank if we
> don't need to make the cache dir and will contain a directory path
> only if the dir doesn't exist.  Without a comment it could take
> someone quite a while to realize that...


You are right. This is confusing.


Another idea is to use a boolean flag.


For example, as follows:


create-cache-dir := $(if $(KBUILD_SRC),$(if $(cached-data),,1))


 define __run-and-store
 ifeq ($(origin $(1)),undefined)
   $$(eval $(1) := $$(shell $$(2)))
 ifeq ($(create-cache-dir),1)
  $$(shell mkdir -p $(dir $(make-cache)))
  $$(eval create-cache-dir :=)
 endif
   $$(shell echo '$(1) := $$($(1))' >> $(make-cache))
 endif
 endef



Perhaps, this is clearer and self-documenting.



>> +
>>  # Usage: $(call __sanitize-opt,Hello=Hola$(comma)Goodbye Adios)
>>  #
>>  # Convert all '$', ')', '(', '\', '=', ' ', ',', ':' to '_'
>> @@ -136,6 +137,10 @@ __sanitize-opt = $(subst $$,_,$(subst 
>> $(right_paren),_,$(subst $(left_paren),_,$
>>  define __run-and-store
>>  ifeq ($(origin $(1)),undefined)
>>$$(eval $(1) := $$(shell $$(2)))
>> +ifneq ($(cache-dir),)
>> +  $$(shell mkdir -p $(cache-dir))
>
> I _think_ you want some single quotes in there.  AKA:
>
> $$(shell mkdir -p '$(cache-dir)')
>
> That at least matches what the "old" code used to do.  Specifically if
> 'cache-dir' happens to have a space in it then it won't work right
> without the single quotes.  There may be other symbols that your shell
> might interpret in interesting ways, too.


Kbuild always runs in the output directory.

So, 'cache-dir' is always a relative path from the top of the kernel
directory, whether the O= option is given or not.


For kernel source, I do not see any file path containing spaces.

Just in case, I renamed a directory and tested, but
something strange happened in silentoldconfig and it would not work.


Insane people may want to use a file path with spaces
for external modules.

I tested,

 obj-m  := fo o/

but, this would not work either.


It will be difficult to make it work
because $(sort ...) is used in several places
in core makefiles.


So, my conclusion is, it does not work.


> NOTE: I have no idea if the kernel Makefiles work if paths like
> KBUILD_SRC have spaces in them to begin with, but it seems wise to add
> the quotes here anyway.

I have not tested this case.

Probably, this will be less difficult
if we want to allow spaces in KBUILD_SRC.


> ALSO NOTE: I think you could still confuse the kernel Makefiles if
> somehow you had a single quote in your path somehow.  I assume we
> don't care?

Hmm, I do not think this is worth the effort.

Probably, the most reasonable solution is
to ask people not to use special characters in file paths.



>
>> +  $$(eval cache-dir :=)
>> +endif
>>$$(shell echo '$(1) := $$($(1))' >> $(make-cache))
>>  endif
>>  endef
>
> Other than the single quote problem and the suggested comment, this
> seems like a sane optimization to me.  Feel free to add my Reviewed-by
> once those fixes are in place.
>
> -Doug

Re: [PATCH] x86: use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz" again

2017-11-09 Thread WANG Chao

On 11/10/17 at 01:06P, Rafael J. Wysocki wrote:
> On Thursday, November 9, 2017 11:30:54 PM CET Rafael J. Wysocki wrote:
> > On Thu, Nov 9, 2017 at 5:06 PM, Rafael J. Wysocki
> >  wrote:
> > > Hi Linus,
> > >
> > > On 11/9/2017 11:38 AM, WANG Chao wrote:
> > >>
> > >> Commit 941f5f0f6ef5 (x86: CPU: Fix up "cpu MHz" in /proc/cpuinfo) caused
> > >> a serious performance issue when reading from /proc/cpuinfo on system
> > >> with aperfmperf.
> > >>
> > >> For each cpu, arch_freq_get_on_cpu() sleeps 20ms to get its frequency.
> > >> On a system with 64 cpus, it takes 1.5s to finish running `cat
> > >> /proc/cpuinfo`, while it previously was done in 15ms.
> > >
> > > Honestly, I'm not sure what to do to address this ATM.
> > >
> > > The last requested frequency is only available in the non-HWP case, so it
> > > cannot be used universally.
> > 
> > OK, here's an idea.
> > 
> > c_start() can run aperfmperf_snapshot_khz() on all CPUs upfront (say
> > in parallel), then wait for a while (say 5 ms; the current 20 ms wait
> > is overkill) and then aperfmperf_snapshot_khz() can be run once on
> > each CPU in show_cpuinfo() without taking the "stale cache" threshold
> > into account.
> > 
> > I'm going to try that and see how far I can get with it.
> 
> Below is what I have.
> 
> I ended up using APERFMPERF_REFRESH_DELAY_MS for the delay in
> aperfmperf_snapshot_all(), because 5 ms tended to add too much
> variation to the results on my test box.
> 
> I think it may be reduced to 10 ms, though.
> 
> Chao, can you please try this one and report back?

Hi, Rafael

Thanks for the patch. But it doesn't work for me. lscpu takes 1.5s to
finish on a 64 cpus AMD box with aperfmperf.

You missed the fact that c_start() will also be called by c_next().

But I don't think the overall idea is good enough. I think /proc/cpuinfo
is too general for userspace to be delayed, no matter whether it's 10ms or
20ms.

My point is that cpu MHz is best served from a cached value for quick
access. If people are looking for a reliable and accurate cpu frequency,
/proc/cpuinfo is probably a bad idea.

What do you think?

WANG Chao

> 
> 
> ---
>  arch/x86/kernel/cpu/aperfmperf.c |   42 
> ---
>  arch/x86/kernel/cpu/cpu.h|4 +++
>  arch/x86/kernel/cpu/proc.c   |5 +++-
>  3 files changed, 39 insertions(+), 12 deletions(-)
> 
> Index: linux-pm/arch/x86/kernel/cpu/aperfmperf.c
> ===
> --- linux-pm.orig/arch/x86/kernel/cpu/aperfmperf.c
> +++ linux-pm/arch/x86/kernel/cpu/aperfmperf.c
> @@ -14,6 +14,8 @@
>  #include 
>  #include 
>  
> +#include "cpu.h"
> +
>  struct aperfmperf_sample {
>   unsigned intkhz;
>   ktime_t time;
> @@ -38,8 +40,6 @@ static void aperfmperf_snapshot_khz(void
>   u64 aperf, aperf_delta;
>   u64 mperf, mperf_delta;
>   struct aperfmperf_sample *s = this_cpu_ptr(&samples);
> - ktime_t now = ktime_get();
> - s64 time_delta = ktime_ms_delta(now, s->time);
>   unsigned long flags;
>  
>   local_irq_save(flags);
> @@ -57,15 +57,10 @@ static void aperfmperf_snapshot_khz(void
>   if (mperf_delta == 0)
>   return;
>  
> - s->time = now;
> + s->time = ktime_get();
>   s->aperf = aperf;
>   s->mperf = mperf;
> -
> - /* If the previous iteration was too long ago, discard it. */
> - if (time_delta > APERFMPERF_STALE_THRESHOLD_MS)
> - s->khz = 0;
> - else
> - s->khz = div64_u64((cpu_khz * aperf_delta), mperf_delta);
> + s->khz = div64_u64((cpu_khz * aperf_delta), mperf_delta);
>  }
>  
>  unsigned int arch_freq_get_on_cpu(int cpu)
> @@ -82,16 +77,41 @@ unsigned int arch_freq_get_on_cpu(int cp
>   /* Don't bother re-computing within the cache threshold time. */
>   time_delta = ktime_ms_delta(ktime_get(), per_cpu(samples.time, cpu));
>   khz = per_cpu(samples.khz, cpu);
> - if (khz && time_delta < APERFMPERF_CACHE_THRESHOLD_MS)
> + if (time_delta < APERFMPERF_CACHE_THRESHOLD_MS)
>   return khz;
>  
>   smp_call_function_single(cpu, aperfmperf_snapshot_khz, NULL, 1);
>   khz = per_cpu(samples.khz, cpu);
> - if (khz)
> + if (time_delta <= APERFMPERF_STALE_THRESHOLD_MS)
>   return khz;
>  
> + /* If the previous iteration was too long ago, take a new data point. */
>   msleep(APERFMPERF_REFRESH_DELAY_MS);
>   smp_call_function_single(cpu, aperfmperf_snapshot_khz, NULL, 1);
>  
>   return per_cpu(samples.khz, cpu);
>  }
> +
> +void aperfmperf_snapshot_all(void)
> +{
> + if (!cpu_khz)
> + return;
> +
> + if (!static_cpu_has(X86_FEATURE_APERFMPERF))
> + return;
> +
> + smp_call_function_many(cpu_online_mask, aperfmperf_snapshot_khz, NULL, 
> 1);
> + msleep(APERFMPERF_REFRESH_DELAY_MS);
> +}
> +
> +unsigned int aperfmperf_snapshot_cpu(int cpu)
> +{
> + if (!cpu_khz)
> + return 0;
> +
> + if (!static_cpu_has(X

Re: [PATCH 2/3] clk: hisilicon: Add support for Hi3660 stub clocks

2017-11-09 Thread Leo Yan

Hi Julien,

On Fri, Nov 03, 2017 at 05:37:34PM +, Julien Thierry wrote:
> Hi Kaihua,
> 
> On 03/11/17 07:25, Kaihua Zhong wrote:
> >Hi3660 has four stub clocks, which are big and LITTLE cluster clocks,
> >GPU clock and DDR clock.  These clocks ask MCU for frequency scaling
> >by sending message through mailbox.
> >
> >This commit adds support for stub clocks, it requests the dedicated
> >mailbox channel at initialization; then later uses this channel to send
> >message to MCU to execute frequency scaling. The four stub clocks share
> >the same mailbox channel, but every stub clock has its own command id so
> >MCU can distinguish the requirement coming for which clock.
> >
> >A shared memory is used to present effective frequency value, so the
> >clock driver uses I/O mapping for the memory and reads back rate value.
> >
> >Reviewed-by: Leo Yan 
> >Signed-off-by: Kai Zhao 
> >Signed-off-by: Kevin Wang 
> >Signed-off-by: Ruyi Wang 
> >Signed-off-by: Kaihua Zhong 
> >---
> >  drivers/clk/hisilicon/Kconfig|   6 +
> >  drivers/clk/hisilicon/Makefile   |   1 +
> >  drivers/clk/hisilicon/clk-hi3660-stub.c  | 195 
> > +++
> >  include/dt-bindings/clock/hi3660-clock.h |   7 ++
> >  4 files changed, 209 insertions(+)
> >  create mode 100644 drivers/clk/hisilicon/clk-hi3660-stub.c
> >
> >diff --git a/drivers/clk/hisilicon/Kconfig b/drivers/clk/hisilicon/Kconfig
> >index 7098bfd..1bd4355 100644
> >--- a/drivers/clk/hisilicon/Kconfig
> >+++ b/drivers/clk/hisilicon/Kconfig
> >@@ -49,3 +49,9 @@ config STUB_CLK_HI6220
> > default ARCH_HISI
> > help
> >   Build the Hisilicon Hi6220 stub clock driver.
> >+
> >+config STUB_CLK_HI3660
> >+bool "Hi3660 Stub Clock Driver"
> >+depends on COMMON_CLK_HI3660 && MAILBOX
> >+help
> >+  Build the Hisilicon Hi3660 stub clock driver.
> >diff --git a/drivers/clk/hisilicon/Makefile b/drivers/clk/hisilicon/Makefile
> >index 1e4c3dd..0a5b499 100644
> >--- a/drivers/clk/hisilicon/Makefile
> >+++ b/drivers/clk/hisilicon/Makefile
> >@@ -14,3 +14,4 @@ obj-$(CONFIG_COMMON_CLK_HI3798CV200)   += 
> >crg-hi3798cv200.o
> >  obj-$(CONFIG_COMMON_CLK_HI6220)+= clk-hi6220.o
> >  obj-$(CONFIG_RESET_HISI)   += reset.o
> >  obj-$(CONFIG_STUB_CLK_HI6220)  += clk-hi6220-stub.o
> >+obj-$(CONFIG_STUB_CLK_HI3660)   += clk-hi3660-stub.o
> >diff --git a/drivers/clk/hisilicon/clk-hi3660-stub.c 
> >b/drivers/clk/hisilicon/clk-hi3660-stub.c
> >new file mode 100644
> >index 000..0a21c91
> >--- /dev/null
> >+++ b/drivers/clk/hisilicon/clk-hi3660-stub.c
> >@@ -0,0 +1,195 @@
> >+/*
> >+ * Hisilicon clock driver
> >+ *
> >+ * Copyright (c) 2013-2017 Hisilicon Limited.
> >+ * Copyright (c) 2017 Linaro Limited.
> >+ *
> >+ * Author: Kai Zhao 
> >+ * Author: Tao Wang 
> >+ * Author: Leo Yan 
> >+ *
> >+ * This program is free software; you can redistribute it and/or modify
> >+ * it under the terms of the GNU General Public License as published by
> >+ * the Free Software Foundation; either version 2 of the License, or
> >+ * (at your option) any later version.
> >+ *
> >+ * This program is distributed in the hope that it will be useful,
> >+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
> >+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> >+ * GNU General Public License for more details.
> >+ *
> >+ */
> >+
> >+#include 
> >+#include 
> >+#include 
> >+#include 
> >+#include 
> >+#include 
> >+#include 
> >+#include 
> >+#include 
> >+
> >+#define HI3660_STUB_CLOCK_DATA  (0x70)
> >+#define MHZ (1000 * 1000)
> >+
> >+#define DEFINE_CLK_STUB(_id, _cmd, _name)   \
> >+{   \
> >+.id = (_id),\
> >+.cmd = (_cmd),  \
> >+.hw.init = &(struct clk_init_data) {\
> >+.name = #_name, \
> >+.ops = &hi3660_stub_clk_ops,\
> >+.num_parents = 0,   \
> >+.flags = CLK_GET_RATE_NOCACHE,  \
> >+},  \
> >+},
> >+
> >+#define to_stub_clk(_hw) container_of(_hw, struct hi3660_stub_clk, hw)
> >+
> >+struct hi3660_stub_clk_chan {
> >+struct mbox_client cl;
> >+struct mbox_chan *mbox;
> >+};
> >+
> >+struct hi3660_stub_clk {
> >+unsigned int id;
> >+struct device *dev;
> 
> I don't understand why you need to keep this. The only place it is used is
> for the debug message in hi3660_stub_clk_set_rate and you could get the
> device pointer by doing chan->cl.dev since all the stub_clk point to the
> same device.

Kaihua might have missed this email, so I went through all your comments.
We accept them and will spin a new version of the patch.

Thank you for the good suggestions.

Thanks,
Leo Yan

>

[PATCHv4 3/3] ARMv8: pcie: make the DWC EP driver support for layerscape

2017-11-09 Thread Bao Xiaowei

Layerscape PCIe controllers support RC or EP mode. Add EP mode
support in Kconfig so that the driver supports both RC and EP mode; the
driver is able to determine whether a PCIe controller works in RC or EP mode.

Signed-off-by: Bao Xiaowei 
Acked-by: Minghuan Lian 
---
 v2:
 no change
 v3:
 no change
 v4:
 no change

 drivers/pci/dwc/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/pci/dwc/Kconfig b/drivers/pci/dwc/Kconfig
index 22ec82fcdea2..b5f507795779 100644
--- a/drivers/pci/dwc/Kconfig
+++ b/drivers/pci/dwc/Kconfig
@@ -108,6 +108,7 @@ config PCI_LAYERSCAPE
depends on PCI_MSI_IRQ_DOMAIN
select MFD_SYSCON
select PCIE_DW_HOST
+   select PCIE_DW_EP
help
  Say Y here if you want PCIe controller support on Layerscape SoCs.
 
-- 
2.14.1

[PATCHv4 0/3] dts: Add the property of IB and OB

2017-11-09 Thread Bao Xiaowei

Depend on http://patchwork.ozlabs.org/patch/815382/

Bao Xiaowei (3):
  ARMv8: dts: ls1046a: add the property of IB and OB
  ARMv8: layerscape: add the pcie ep function support
  ARMv8: pcie: make the DWC EP driver support for layerscape

 arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi |   6 ++
 drivers/pci/dwc/Kconfig|   1 +
 drivers/pci/dwc/pci-layerscape.c   | 121 +++--
 3 files changed, 121 insertions(+), 7 deletions(-)

-- 
2.14.1

[PATCHv4 2/3] ARMv8: layerscape: add the pcie ep function support

2017-11-09 Thread Bao Xiaowei

Add PCIe controller EP function support for Layerscape, based on the
PCIe EP framework.

Signed-off-by: Bao Xiaowei 
---
 v2:
 - fix the missing iounmap for the ioremap'd region
 - optimize the code structure
 - add code comments
 v3:
 - fix the failure to request the MSI outbound window
 v4:
 - optimize the code, adjust the format

 drivers/pci/dwc/pci-layerscape.c | 120 ---
 1 file changed, 113 insertions(+), 7 deletions(-)

diff --git a/drivers/pci/dwc/pci-layerscape.c b/drivers/pci/dwc/pci-layerscape.c
index 87fa486bee2c..6f3e434599e0 100644
--- a/drivers/pci/dwc/pci-layerscape.c
+++ b/drivers/pci/dwc/pci-layerscape.c
@@ -34,7 +34,12 @@
 /* PEX Internal Configuration Registers */
 #define PCIE_STRFMR1   0x71c /* Symbol Timer & Filter Mask Register1 */
 
+#define PCIE_DBI2_BASE 0x1000  /* DBI2 base address*/
+#define PCIE_MSI_MSG_DATA_OFF  0x5c/* MSI Data register address*/
+#define PCIE_MSI_OB_SIZE   4096
+#define PCIE_MSI_ADDR_OFFSET   (1024 * 1024)
 #define PCIE_IATU_NUM  6
+#define PCIE_EP_ADDR_SPACE_SIZE 0x1
 
 struct ls_pcie_drvdata {
u32 lut_offset;
@@ -44,12 +49,20 @@ struct ls_pcie_drvdata {
const struct dw_pcie_ops *dw_pcie_ops;
 };
 
+struct ls_pcie_ep {
+   dma_addr_t msi_phys_addr;
+   void __iomem *msi_virt_addr;
+   u64 msi_msg_addr;
+   u16 msi_msg_data;
+};
+
 struct ls_pcie {
struct dw_pcie *pci;
void __iomem *lut;
struct regmap *scfg;
const struct ls_pcie_drvdata *drvdata;
int index;
+   struct ls_pcie_ep *pcie_ep;
 };
 
 #define to_ls_pcie(x)  dev_get_drvdata((x)->dev)
@@ -263,6 +276,99 @@ static const struct of_device_id ls_pcie_of_match[] = {
{ },
 };
 
+static void ls_pcie_raise_msi_irq(struct ls_pcie_ep *pcie_ep)
+{
+   iowrite32(pcie_ep->msi_msg_data, pcie_ep->msi_virt_addr);
+}
+
+static int ls_pcie_raise_irq(struct dw_pcie_ep *ep,
+   enum pci_epc_irq_type type, u8 interrupt_num)
+{
+   struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
+   struct ls_pcie *pcie = to_ls_pcie(pci);
+   struct ls_pcie_ep *pcie_ep = pcie->pcie_ep;
+   u32 free_win;
+
+   /* get the msi message address and msi message data */
+   pcie_ep->msi_msg_addr = ioread32(pci->dbi_base + MSI_MESSAGE_ADDR_L32) |
+   (((u64)ioread32(pci->dbi_base + MSI_MESSAGE_ADDR_U32)) << 32);
+   pcie_ep->msi_msg_data = ioread16(pci->dbi_base + PCIE_MSI_MSG_DATA_OFF);
+
+   /* request and configure the outbound window for MSI */
+   free_win = find_first_zero_bit(&ep->ob_window_map,
+   sizeof(ep->ob_window_map));
+   if (free_win >= ep->num_ob_windows) {
+   dev_err(pci->dev, "no free outbound window\n");
+   return -ENOMEM;
+   }
+
+   dw_pcie_prog_outbound_atu(pci, free_win, PCIE_ATU_TYPE_MEM,
+   pcie_ep->msi_phys_addr,
+   pcie_ep->msi_msg_addr,
+   PCIE_MSI_OB_SIZE);
+
+   set_bit(free_win, &ep->ob_window_map);
+
+   /* generate the msi interrupt */
+   ls_pcie_raise_msi_irq(pcie_ep);
+
+   /* release the outbound window used for MSI */
+   dw_pcie_disable_atu(pci, free_win, DW_PCIE_REGION_OUTBOUND);
+   clear_bit(free_win, &ep->ob_window_map);
+
+   return 0;
+}
+
+static struct dw_pcie_ep_ops pcie_ep_ops = {
+   .raise_irq = ls_pcie_raise_irq,
+};
+
+static int __init ls_add_pcie_ep(struct ls_pcie *pcie,
+   struct platform_device *pdev)
+{
+   struct dw_pcie *pci = pcie->pci;
+   struct device *dev = pci->dev;
+   struct dw_pcie_ep *ep;
+   struct ls_pcie_ep *pcie_ep;
+   struct resource *cfg_res;
+   int ret;
+
+   ep = &pci->ep;
+   ep->ops = &pcie_ep_ops;
+
+   pcie_ep = devm_kzalloc(dev, sizeof(*pcie_ep), GFP_KERNEL);
+   if (!pcie_ep)
+   return -ENOMEM;
+
+   pcie->pcie_ep = pcie_ep;
+
+   cfg_res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "config");
+   if (cfg_res) {
+   ep->phys_base = cfg_res->start;
+   ep->addr_size = PCIE_EP_ADDR_SPACE_SIZE;
+   } else {
+   dev_err(dev, "missing *config* space\n");
+   return -ENODEV;
+   }
+
+   pcie_ep->msi_phys_addr = ep->phys_base + PCIE_MSI_ADDR_OFFSET;
+
+   pcie_ep->msi_virt_addr = ioremap(pcie_ep->msi_phys_addr,
+   PCIE_MSI_OB_SIZE);
+   if (!pcie_ep->msi_virt_addr) {
+   dev_err(dev, "failed to map MSI outbound region\n");
+   return -ENOMEM;
+   }
+
+   ret = dw_pcie_ep_init(ep);
+   if (ret) {
+   dev_err(dev, "failed to initialize endpoint\n");
+   return ret;
+   }
+
+   return 0;
+}
+
 static int __init ls_add_pcie_port(struct ls_pcie *pcie)
 {
struct dw
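
A note on the window search in ls_pcie_raise_irq() above: find_first_zero_bit() takes its size argument in bits, while the patch passes sizeof(ep->ob_window_map), which is a byte count. With six outbound windows and an 8-byte map the search still covers all six bits, so the call works, but only by coincidence. The userspace sketch below shows the same search bounded by the window count instead. first_zero_bit() is a simplified single-word stand-in for the kernel helper, and claim_ob_window() is a hypothetical name, not something from the patch.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Simplified single-word stand-in for find_first_zero_bit():
 * returns the index of the first clear bit below nbits, or nbits
 * if every bit in that range is set.
 */
unsigned int first_zero_bit(unsigned long map, unsigned int nbits)
{
	unsigned int i;

	assert(nbits <= BITS_PER_LONG);
	for (i = 0; i < nbits; i++)
		if (!(map & (1UL << i)))
			return i;
	return nbits;
}

/*
 * Hypothetical helper: claim the first free outbound window.
 * The search is bounded by the number of windows (a bit count),
 * not by the size of the map in bytes.
 * Returns the window index, or -1 if all windows are in use.
 */
int claim_ob_window(unsigned long *map, unsigned int num_windows)
{
	unsigned int win = first_zero_bit(*map, num_windows);

	if (win >= num_windows)
		return -1;
	*map |= 1UL << win;
	return (int)win;
}
```

In the driver this would correspond to passing ep->num_ob_windows as the size argument; the existing free_win >= ep->num_ob_windows check then matches the "nothing found" return value directly.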

[PATCHv4 1/3] ARMv8: dts: ls1046a: add the property of IB and OB

2017-11-09 Thread Bao Xiaowei

Add the properties that give the number of inbound and outbound windows
used by the EP driver.

Signed-off-by: Bao Xiaowei 
Acked-by: Minghuan Lian 
---
 v2:
 - no change
 v3:
 - modify the commit message
 v4:
 - no change

 arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
index 06b5e12d04d8..f8332669663c 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
@@ -674,6 +674,8 @@
device_type = "pci";
dma-coherent;
num-lanes = <4>;
+   num-ib-windows = <6>;
+   num-ob-windows = <6>;
bus-range = <0x0 0xff>;
ranges = <0x8100 0x0 0x 0x40 0x0001 0x0 
0x0001   /* downstream I/O */
  0x8200 0x0 0x4000 0x40 0x4000 0x0 
0x4000>; /* non-prefetchable memory */
@@ -699,6 +701,8 @@
device_type = "pci";
dma-coherent;
num-lanes = <2>;
+   num-ib-windows = <6>;
+   num-ob-windows = <6>;
bus-range = <0x0 0xff>;
ranges = <0x8100 0x0 0x 0x48 0x0001 0x0 
0x0001   /* downstream I/O */
  0x8200 0x0 0x4000 0x48 0x4000 0x0 
0x4000>; /* non-prefetchable memory */
@@ -724,6 +728,8 @@
device_type = "pci";
dma-coherent;
num-lanes = <2>;
+   num-ib-windows = <6>;
+   num-ob-windows = <6>;
bus-range = <0x0 0xff>;
ranges = <0x8100 0x0 0x 0x50 0x0001 0x0 
0x0001   /* downstream I/O */
  0x8200 0x0 0x4000 0x50 0x4000 0x0 
0x4000>; /* non-prefetchable memory */
-- 
2.14.1
