Re: [PATCHv3 1/1] locking/qspinlock/x86: Avoid test-and-set when PV_DEDICATED is set
On Fri, Nov 10, 2017 at 10:07:56AM +0800, Wanpeng Li wrote:
>> Also, you should not put cpumask_t on stack, that's 'broken'.
>
> Thanks pointing out this. I found a useful comments in arch/x86/kernel/irq.c:
>
> /* These two declarations are only used in check_irq_vectors_for_cpu_disable()
>  * below, which is protected by stop_machine(). Putting them on the stack
>  * results in a stack frame overflow. Dynamically allocating could result in a
>  * failure so declare these two cpumasks as global.
>  */
> static struct cpumask affinity_new, online_new;

That code no longer exists.. Also not entirely sure how it would be
helpful.

What you probably want to do is have a per-cpu cpumask, since
flush_tlb_others() is called with preemption disabled. But you probably
don't want an unconditionally allocated one, since most kernels will not
in fact be PV.

So you'll want something like:

	static DEFINE_PER_CPU(cpumask_var_t, __pv_tlb_mask);

And then you need something like:

	for_each_possible_cpu(cpu) {
		zalloc_cpumask_var_node(per_cpu_ptr(&__pv_tlb_mask, cpu),
					GFP_KERNEL, cpu_to_node(cpu));
	}

before you set the pv-op or so.
[PATCH 0/4] i2c: mpc: Clean up clock selection
This series cleans up I2C clock selection for Freescale/NXP MPC SoCs
during controller initialization for cases when clock settings are not
to be preserved from the bootloader.

Patch 1/4 fixes a division by zero which happens during controller
initialization when (1) clock frequency is not specified in the Device
Tree, (2) preservation of clock settings from the bootloader is not
requested, and (3) the clock prescaler (which may actually depend on the
POR configuration) is not explicitly specified. It simply moves
obtaining the prescaler value before the clock computation.

Patch 2/4 unifies obtaining the prescaler value for MPC8544 with other
SoCs. It moves the relevant code to the helper function introduced in
commit 8ce795cb0c6b ("i2c: mpc: assign the correct prescaler from SVR")
and also adds handling of MPC8533, which is similar to MPC8544 in this
regard.

Patch 3/4 fixes checking the relevant bit in a controller's register
used for selecting the prescaler value for MPC8533 and MPC8544.

Patch 4/4 removes the facility for setting the clock prescaler value at
compile time. This facility is not used in the majority of cases.
Getting the prescaler value at run time currently covers more SoCs.
Hardcoding it is also wrong for some SoCs as it can be configured on
board during POR.

Arseny Solokha (4):
  i2c: mpc: get MPC8xxx I2C clock prescaler before using it in calculations
  i2c: mpc: unify obtaining the MPC8533/44 I2C clock prescaler w/ MPC8xxx
  i2c: mpc: fix PORDEVSR2 mask for MPC8533/44
  i2c: mpc: always determine I2C clock prescaler at runtime

 drivers/i2c/busses/i2c-mpc.c | 72 ++--
 1 file changed, 30 insertions(+), 42 deletions(-)

-- 
2.15.0
[PATCH 3/4] i2c: mpc: fix PORDEVSR2 mask for MPC8533/44
According to the reference manuals for the corresponding SoCs, SEC
frequency ratio configuration is indicated by bit 26 of the POR Device
Status Register 2. Consequently, the SEC_CFG bit should be tested by mask
0x20, not 0x80. Testing the wrong bit leads to selection of the wrong I2C
clock prescaler on those SoCs.

Signed-off-by: Arseny Solokha
---
 drivers/i2c/busses/i2c-mpc.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/i2c/busses/i2c-mpc.c b/drivers/i2c/busses/i2c-mpc.c
index f47916466b82..8d60db0080f6 100644
--- a/drivers/i2c/busses/i2c-mpc.c
+++ b/drivers/i2c/busses/i2c-mpc.c
@@ -332,14 +332,18 @@ static u32 mpc_i2c_get_sec_cfg_8xxx(void)
 		if (prop) {
 			/*
 			 * Map and check POR Device Status Register 2
-			 * (PORDEVSR2) at 0xE0014
+			 * (PORDEVSR2) at 0xE0014. Note that while MPC8533
+			 * and MPC8544 indicate SEC frequency ratio
+			 * configuration as bit 26 in PORDEVSR2, other MPC8xxx
+			 * parts may store it differently or may not have it
+			 * at all.
 			 */
 			reg = ioremap(get_immrbase() + *prop + 0x14, 0x4);
 			if (!reg)
 				printk(KERN_ERR
 				       "Error: couldn't map PORDEVSR2\n");
 			else
-				val = in_be32(reg) & 0x0080; /* sec-cfg */
+				val = in_be32(reg) & 0x0020; /* sec-cfg */
 			iounmap(reg);
 		}
 	}
-- 
2.15.0
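For readers wondering how "bit 26" turns into mask 0x20: the sketch below assumes the MSB-0 bit numbering used in the Power Architecture reference manuals, where bit 0 of a 32-bit register is the most significant bit. The helper names `ppc_bit32()` and `sec_cfg_is_set()` are made up for illustration and are not part of i2c-mpc.c.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not from the driver: convert a bit number in
 * MSB-0 numbering (bit 0 is the most significant bit) into a mask for
 * a 32-bit register value.
 */
static uint32_t ppc_bit32(unsigned int msb0_bit)
{
	assert(msb0_bit < 32);
	return UINT32_C(1) << (31 - msb0_bit);
}

/* Non-zero when the SEC_CFG bit (bit 26, per the commit message) is set. */
static int sec_cfg_is_set(uint32_t pordevsr2)
{
	return (pordevsr2 & ppc_bit32(26)) != 0;
}
```

Under that assumption, bit 26 maps to 1 << 5, i.e. 0x20, while the old mask 0x80 corresponds to bit 24.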
[PATCH 2/4] i2c: mpc: unify obtaining the MPC8533/44 I2C clock prescaler w/ MPC8xxx
Commit 8ce795cb0c6b ("i2c: mpc: assign the correct prescaler from SVR") introduced the common helper function for obtaining the actual clock prescaler value for MPC85xx. However, getting the prescaler for MPC8544 which depends on the SEC frequency ratio on this platform, has been always performed separately based on the corresponding Device Tree configuration. Move special handling of MPC8544 into that common helper. Make it dependent on the SoC version and not on Device Tree compatible node, as is the case with all other SoCs. Handle MPC8533 the same way which is similar to MPC8544 in this regard, according to AN2919 "Determining the I2C Frequency Divider Ratio for SCL". Signed-off-by: Arseny Solokha --- drivers/i2c/busses/i2c-mpc.c | 12 +--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/drivers/i2c/busses/i2c-mpc.c b/drivers/i2c/busses/i2c-mpc.c index bf0c86d41f1a..f47916466b82 100644 --- a/drivers/i2c/busses/i2c-mpc.c +++ b/drivers/i2c/busses/i2c-mpc.c @@ -350,7 +350,11 @@ static u32 mpc_i2c_get_sec_cfg_8xxx(void) static u32 mpc_i2c_get_prescaler_8xxx(void) { - /* mpc83xx and mpc82xx all have prescaler 1 */ + /* +* According to the AN2919 all MPC824x have prescaler 1, while MPC83xx +* may have prescaler 1, 2, or 3, depending on the power-on +* configuration. +*/ u32 prescaler = 1; /* mpc85xx */ @@ -367,6 +371,10 @@ static u32 mpc_i2c_get_prescaler_8xxx(void) || (SVR_SOC_VER(svr) == SVR_8610)) /* the above 85xx SoCs have prescaler 1 */ prescaler = 1; + else if ((SVR_SOC_VER(svr) == SVR_8533) + || (SVR_SOC_VER(svr) == SVR_8544)) + /* the above 85xx SoCs have prescaler 3 or 2 */ + prescaler = mpc_i2c_get_sec_cfg_8xxx() ? 3 : 2; else /* all the other 85xx have prescaler 2 */ prescaler = 2; @@ -383,8 +391,6 @@ static int mpc_i2c_get_fdr_8xxx(struct device_node *node, u32 clock, int i; /* Determine proper divider value */ - if (of_device_is_compatible(node, "fsl,mpc8544-i2c")) - prescaler = mpc_i2c_get_sec_cfg_8xxx() ? 
3 : 2; if (!prescaler) prescaler = mpc_i2c_get_prescaler_8xxx(); -- 2.15.0
[PATCH 4/4] i2c: mpc: always determine I2C clock prescaler at runtime
Remove the facility for setting the prescaler value at compile time entirely. It was only used for two SoCs, duplicating the actual value for one of them and setting sometimes bogus value for another. Make all MPC8xxx SoCs obtain their actual I2C clock prescaler from a single place in the code. Signed-off-by: Arseny Solokha --- drivers/i2c/busses/i2c-mpc.c | 52 +--- 1 file changed, 15 insertions(+), 37 deletions(-) diff --git a/drivers/i2c/busses/i2c-mpc.c b/drivers/i2c/busses/i2c-mpc.c index 8d60db0080f6..e0f059687c2d 100644 --- a/drivers/i2c/busses/i2c-mpc.c +++ b/drivers/i2c/busses/i2c-mpc.c @@ -78,9 +78,7 @@ struct mpc_i2c_divider { }; struct mpc_i2c_data { - void (*setup)(struct device_node *node, struct mpc_i2c *i2c, - u32 clock, u32 prescaler); - u32 prescaler; + void (*setup)(struct device_node *node, struct mpc_i2c *i2c, u32 clock); }; static inline void writeccr(struct mpc_i2c *i2c, u32 x) @@ -201,7 +199,7 @@ static const struct mpc_i2c_divider mpc_i2c_dividers_52xx[] = { }; static int mpc_i2c_get_fdr_52xx(struct device_node *node, u32 clock, - int prescaler, u32 *real_clk) + u32 *real_clk) { const struct mpc_i2c_divider *div = NULL; unsigned int pvr = mfspr(SPRN_PVR); @@ -236,7 +234,7 @@ static int mpc_i2c_get_fdr_52xx(struct device_node *node, u32 clock, static void mpc_i2c_setup_52xx(struct device_node *node, struct mpc_i2c *i2c, -u32 clock, u32 prescaler) +u32 clock) { int ret, fdr; @@ -246,7 +244,7 @@ static void mpc_i2c_setup_52xx(struct device_node *node, return; } - ret = mpc_i2c_get_fdr_52xx(node, clock, prescaler, &i2c->real_clk); + ret = mpc_i2c_get_fdr_52xx(node, clock, &i2c->real_clk); fdr = (ret >= 0) ? 
ret : 0x3f; /* backward compatibility */ writeb(fdr & 0xff, i2c->base + MPC_I2C_FDR); @@ -258,7 +256,7 @@ static void mpc_i2c_setup_52xx(struct device_node *node, #else /* !(CONFIG_PPC_MPC52xx || CONFIG_PPC_MPC512x) */ static void mpc_i2c_setup_52xx(struct device_node *node, struct mpc_i2c *i2c, -u32 clock, u32 prescaler) +u32 clock) { } #endif /* CONFIG_PPC_MPC52xx || CONFIG_PPC_MPC512x */ @@ -266,7 +264,7 @@ static void mpc_i2c_setup_52xx(struct device_node *node, #ifdef CONFIG_PPC_MPC512x static void mpc_i2c_setup_512x(struct device_node *node, struct mpc_i2c *i2c, -u32 clock, u32 prescaler) +u32 clock) { struct device_node *node_ctrl; void __iomem *ctrl; @@ -289,12 +287,12 @@ static void mpc_i2c_setup_512x(struct device_node *node, } /* The clock setup for the 52xx works also fine for the 512x */ - mpc_i2c_setup_52xx(node, i2c, clock, prescaler); + mpc_i2c_setup_52xx(node, i2c, clock); } #else /* CONFIG_PPC_MPC512x */ static void mpc_i2c_setup_512x(struct device_node *node, struct mpc_i2c *i2c, -u32 clock, u32 prescaler) +u32 clock) { } #endif /* CONFIG_PPC_MPC512x */ @@ -388,16 +386,13 @@ static u32 mpc_i2c_get_prescaler_8xxx(void) } static int mpc_i2c_get_fdr_8xxx(struct device_node *node, u32 clock, - u32 prescaler, u32 *real_clk) + u32 *real_clk) { const struct mpc_i2c_divider *div = NULL; + u32 prescaler = mpc_i2c_get_prescaler_8xxx(); u32 divider; int i; - /* Determine proper divider value */ - if (!prescaler) - prescaler = mpc_i2c_get_prescaler_8xxx(); - if (clock == MPC_I2C_CLOCK_LEGACY) { /* see below - default fdr = 0x1031 -> div = 16 * 3072 */ *real_clk = fsl_get_sys_freq() / prescaler / (16 * 3072); @@ -425,7 +420,7 @@ static int mpc_i2c_get_fdr_8xxx(struct device_node *node, u32 clock, static void mpc_i2c_setup_8xxx(struct device_node *node, struct mpc_i2c *i2c, -u32 clock, u32 prescaler) +u32 clock) { int ret, fdr; @@ -436,7 +431,7 @@ static void mpc_i2c_setup_8xxx(struct device_node *node, return; } - ret = mpc_i2c_get_fdr_8xxx(node, clock, 
prescaler, &i2c->real_clk); + ret = mpc_i2c_get_fdr_8xxx(node, clock, &i2c->real_clk); fdr = (ret >= 0) ? ret : 0x1031; /* backward compatibility */ writeb(fdr & 0xff,
Re: [PATCH] perf evsel: Fix incorrect precise_ip in default event name
Hi Namhyung, Yeah, you are right. I'll send a new patch later. Thanks, Mengting Zhang On 2017/11/10 14:30, Namhyung Kim wrote: Hello, On Fri, Nov 10, 2017 at 01:49:06PM +0800, Mengting Zhang wrote: When no event is specified with -e option, perf will specify a "cycles" event with the highest level of precision available in perf_event_attr.precise_ip as the default event. But the evsel name shows an incorrect precise ip, fix it. For example, with a highest precision perf_event_attr.precise_ip = 2, the evsel name "cycles:ppp" shows a wrong precision available. Before: $./perf record sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.014 MB perf.data (21 samples) ] $./perf evlist -v cycles:ppp: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 2, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1 After: $./perf record sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.014 MB perf.data (16 samples) ] $./perf evlist -v cycles:pp: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 2, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1 Signed-off-by: Mengting Zhang --- tools/perf/util/evsel.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c index 0dccdb8..94cf11d 100644 --- a/tools/perf/util/evsel.c +++ b/tools/perf/util/evsel.c @@ -312,7 +312,7 @@ struct perf_evsel *perf_evsel__new_cycles(bool precise) if (asprintf(&evsel->name, "cycles%s%s%.*s", (attr.precise_ip || attr.exclude_kernel) ? ":" : "", attr.exclude_kernel ? "u" : "", -attr.precise_ip ? attr.precise_ip + 1 : 0, "ppp") < 0) +attr.precise_ip ? 
attr.precise_ip : 0, "ppp") < 0) I think you don't need to check value of the precise_ip anymore. The following should be ok: attr.precise_ip, "ppp") < 0) Thanks, Namhyung .
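The suggestion above relies on the `%.*s` conversion printing at most the given number of characters, so a precision of 0 prints nothing and no separate check is needed. A small userspace sketch of the name formatting follows; `format_cycles_name()` and its buffer-based interface are assumptions made for this example, the real code uses asprintf() on the evsel.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Illustrative stand-in for the name formatting in
 * perf_evsel__new_cycles(); not the actual perf code.
 */
static int format_cycles_name(char *buf, size_t len,
			      int precise_ip, int exclude_kernel)
{
	assert(buf != NULL && len > 0);

	/* "%.*s" prints at most precise_ip characters of "ppp". */
	return snprintf(buf, len, "cycles%s%s%.*s",
			(precise_ip || exclude_kernel) ? ":" : "",
			exclude_kernel ? "u" : "",
			precise_ip, "ppp");
}
```

With precise_ip = 2 this yields "cycles:pp", matching the "After" output in the patch description; with precise_ip = 0 and exclude_kernel = 0 it yields plain "cycles".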
[PATCH 1/4] i2c: mpc: get MPC8xxx I2C clock prescaler before using it in calculations
Obtaining the actual I2C clock prescaler value in mpc_i2c_setup_8xxx() only
happens when the clock parameter is set to something other than
MPC_I2C_CLOCK_LEGACY. When the clock parameter is exactly
MPC_I2C_CLOCK_LEGACY, the prescaler parameter is used in arithmetic division
as provided by the caller, resulting in a division by zero for the majority
of processors supported by the module.

Avoid division by zero by obtaining the actual I2C clock prescaler in
mpc_i2c_setup_8xxx() unconditionally regardless of the passed clock value.

Signed-off-by: Arseny Solokha
---
 drivers/i2c/busses/i2c-mpc.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/i2c/busses/i2c-mpc.c b/drivers/i2c/busses/i2c-mpc.c
index 96caf378b1dc..bf0c86d41f1a 100644
--- a/drivers/i2c/busses/i2c-mpc.c
+++ b/drivers/i2c/busses/i2c-mpc.c
@@ -382,18 +382,18 @@ static int mpc_i2c_get_fdr_8xxx(struct device_node *node, u32 clock,
 	u32 divider;
 	int i;
 
-	if (clock == MPC_I2C_CLOCK_LEGACY) {
-		/* see below - default fdr = 0x1031 -> div = 16 * 3072 */
-		*real_clk = fsl_get_sys_freq() / prescaler / (16 * 3072);
-		return -EINVAL;
-	}
-
 	/* Determine proper divider value */
 	if (of_device_is_compatible(node, "fsl,mpc8544-i2c"))
 		prescaler = mpc_i2c_get_sec_cfg_8xxx() ? 3 : 2;
 	if (!prescaler)
 		prescaler = mpc_i2c_get_prescaler_8xxx();
 
+	if (clock == MPC_I2C_CLOCK_LEGACY) {
+		/* see below - default fdr = 0x1031 -> div = 16 * 3072 */
+		*real_clk = fsl_get_sys_freq() / prescaler / (16 * 3072);
+		return -EINVAL;
+	}
+
 	divider = fsl_get_sys_freq() / clock / prescaler;
 
 	pr_debug("I2C: src_clock=%d clock=%d divider=%d\n",
-- 
2.15.0
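The ordering problem can be reproduced in a few lines of userspace C. This is only a sketch: `detect_prescaler()` stands in for mpc_i2c_get_prescaler_8xxx() and its return value of 2 is made up, and `legacy_real_clk()` is a hypothetical name for the legacy-clock branch.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for mpc_i2c_get_prescaler_8xxx(); the value 2 is made up. */
static uint32_t detect_prescaler(void)
{
	return 2;
}

/*
 * Model of the legacy-clock branch. A prescaler of 0 means "not
 * specified by the caller" and must be resolved before it is used
 * as a divisor.
 */
static uint32_t legacy_real_clk(uint32_t sys_freq, uint32_t prescaler)
{
	/* Resolve the prescaler first ... */
	if (!prescaler)
		prescaler = detect_prescaler();
	assert(prescaler != 0);

	/* ... and only then divide (default fdr -> div = 16 * 3072). */
	return sys_freq / prescaler / (16 * 3072);
}
```

Doing the division before the `if (!prescaler)` check, as the code did before this patch, divides by zero whenever the caller passes 0.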
Re: [PATCH v3 1/2] PM / domains: Rework governor code to be more consistent
On 7 November 2017 at 02:23, Rafael J. Wysocki wrote: > From: Rafael J. Wysocki > > The genpd governor currently uses negative PM QoS values to indicate > the "no suspend" condition and 0 as "no restriction", but it doesn't > use them consistently. Moreover, it tries to refresh QoS values for > already suspended devices in a quite questionable way. > > For the above reasons, rework it to be a bit more consistent. > > First off, note that dev_pm_qos_read_value() in > dev_update_qos_constraint() and __default_power_down_ok() is > evaluated for devices in suspend. Moreover, that only happens if the > effective_constraint_ns value for them is negative (meaning "no > suspend"). It is not evaluated in any other cases, so effectively > the QoS values are only updated for devices in suspend that should > not have been suspended in the first place. In all of the other > cases, the QoS values taken into account are the effective ones from > the time before the device has been suspended, so generally devices > need to be resumed and suspended again for new QoS values to take > effect anyway. Thus evaluating dev_update_qos_constraint() in > those two places doesn't make sense at all, so drop it. > > Second, initialize effective_constraint_ns to 0 ("no constraint") > rather than to (-1) ("no suspend"), which makes more sense in > general and in case effective_constraint_ns is never updated > (the device is in suspend all the time or it is never suspended) > it doesn't affect the device's parent and so on. > > Finally, rework default_suspend_ok() to explicitly handle the > "no restriction" and "no suspend" special cases. > > Also add WARN_ON() around checks that should never trigger. > > Signed-off-by: Rafael J. Wysocki > Tested-by: Geert Uytterhoeven Acked-by: Ulf Hansson Kind regards Uffe > --- > > v2 -> v3: Take children that don't belong to genpd power domains into > account in dev_update_qos_constraint(). 
> > --- > drivers/base/power/domain.c |2 > drivers/base/power/domain_governor.c | 71 > --- > 2 files changed, 50 insertions(+), 23 deletions(-) > > Index: linux-pm/drivers/base/power/domain.c > === > --- linux-pm.orig/drivers/base/power/domain.c > +++ linux-pm/drivers/base/power/domain.c > @@ -1331,7 +1331,7 @@ static struct generic_pm_domain_data *ge > > gpd_data->base.dev = dev; > gpd_data->td.constraint_changed = true; > - gpd_data->td.effective_constraint_ns = -1; > + gpd_data->td.effective_constraint_ns = 0; > gpd_data->nb.notifier_call = genpd_dev_pm_qos_notifier; > > spin_lock_irq(&dev->power.lock); > Index: linux-pm/drivers/base/power/domain_governor.c > === > --- linux-pm.orig/drivers/base/power/domain_governor.c > +++ linux-pm/drivers/base/power/domain_governor.c > @@ -14,22 +14,33 @@ > static int dev_update_qos_constraint(struct device *dev, void *data) > { > s64 *constraint_ns_p = data; > - s32 constraint_ns = -1; > + s64 constraint_ns; > > - if (dev->power.subsys_data && dev->power.subsys_data->domain_data) > + if (dev->power.subsys_data && dev->power.subsys_data->domain_data) { > + /* > +* Only take suspend-time QoS constraints of devices into > +* account, because constraints updated after the device has > +* been suspended are not guaranteed to be taken into account > +* anyway. In order for them to take effect, the device has > to > +* be resumed and suspended again. > +*/ > constraint_ns = dev_gpd_data(dev)->td.effective_constraint_ns; > - > - if (constraint_ns < 0) { > + } else { > + /* > +* The child is not in a domain and there's no info on its > +* suspend/resume latencies, so assume them to be negligible > and > +* take its current PM QoS constraint (that's the only thing > +* known at this point anyway). 
> +*/ > constraint_ns = dev_pm_qos_read_value(dev); > - constraint_ns *= NSEC_PER_USEC; > + if (constraint_ns > 0) > + constraint_ns *= NSEC_PER_USEC; > } > + > + /* 0 means "no constraint" */ > if (constraint_ns == 0) > return 0; > > - /* > -* constraint_ns cannot be negative here, because the device has been > -* suspended. > -*/ > if (constraint_ns < *constraint_ns_p || *constraint_ns_p == 0) > *constraint_ns_p = constraint_ns; > > @@ -76,14 +87,32 @@ static bool default_suspend_ok(struct de > device_for_each_child(dev, &constraint_ns, > dev_update_qos_constraint); > > -
Re: [PATCH v4 2/2] PM / QoS: Fix device resume latency framework
On 7 November 2017 at 11:33, Rafael J. Wysocki wrote: > From: Rafael J. Wysocki > > The special value of 0 for device resume latency PM QoS means > "no restriction", but there are two problems with that. > > First, device resume latency PM QoS requests with 0 as the > value are always put in front of requests with positive > values in the priority lists used internally by the PM QoS > framework, causing 0 to be chosen as an effective constraint > value. However, that 0 is then interpreted as "no restriction" > effectively overriding the other requests with specific > restrictions which is incorrect. > > Second, the users of device resume latency PM QoS have no > way to specify that *any* resume latency at all should be > avoided, which is an artificial limitation in general. > > To address these issues, modify device resume latency PM QoS to > use S32_MAX as the "no constraint" value and 0 as the "no > latency at all" one and rework its users (the cpuidle menu > governor, the genpd QoS governor and the runtime PM framework) > to follow these changes. > > Also add a special "n/a" value to the corresponding user space I/F > to allow user space to indicate that it cannot accept any resume > latencies at all for the given device. > > Fixes: 85dc0b8a4019 (PM / QoS: Make it possible to expose PM QoS latency > constraints) > Link: https://bugzilla.kernel.org/show_bug.cgi?id=197323 > Reported-by: Reinette Chatre > Signed-off-by: Rafael J. Wysocki > Tested-by: Reinette Chatre > Tested-by: Geert Uytterhoeven Acked-by: Ulf Hansson Kind regards Uffe > --- > > As noticed by Ramesh, the v3 had issues with an overlooked value > conversion and a stale comment, so here goes a v4. 
> > --- > Documentation/ABI/testing/sysfs-devices-power |4 +- > drivers/base/cpu.c|3 + > drivers/base/power/domain.c |2 - > drivers/base/power/domain_governor.c | 40 > ++ > drivers/base/power/qos.c |5 ++- > drivers/base/power/runtime.c |2 - > drivers/base/power/sysfs.c| 25 +--- > drivers/cpuidle/governors/menu.c |4 +- > include/linux/pm_qos.h| 26 +++- > 9 files changed, 68 insertions(+), 43 deletions(-) > > Index: linux-pm/drivers/base/power/sysfs.c > === > --- linux-pm.orig/drivers/base/power/sysfs.c > +++ linux-pm/drivers/base/power/sysfs.c > @@ -218,7 +218,14 @@ static ssize_t pm_qos_resume_latency_sho > struct device_attribute *attr, > char *buf) > { > - return sprintf(buf, "%d\n", dev_pm_qos_requested_resume_latency(dev)); > + s32 value = dev_pm_qos_requested_resume_latency(dev); > + > + if (value == 0) > + return sprintf(buf, "n/a\n"); > + else if (value == PM_QOS_RESUME_LATENCY_NO_CONSTRAINT) > + value = 0; > + > + return sprintf(buf, "%d\n", value); > } > > static ssize_t pm_qos_resume_latency_store(struct device *dev, > @@ -228,11 +235,21 @@ static ssize_t pm_qos_resume_latency_sto > s32 value; > int ret; > > - if (kstrtos32(buf, 0, &value)) > - return -EINVAL; > + if (!kstrtos32(buf, 0, &value)) { > + /* > +* Prevent users from writing negative or "no constraint" > values > +* directly. 
> +*/ > + if (value < 0 || value == PM_QOS_RESUME_LATENCY_NO_CONSTRAINT) > + return -EINVAL; > > - if (value < 0) > + if (value == 0) > + value = PM_QOS_RESUME_LATENCY_NO_CONSTRAINT; > + } else if (!strcmp(buf, "n/a") || !strcmp(buf, "n/a\n")) { > + value = 0; > + } else { > return -EINVAL; > + } > > ret = dev_pm_qos_update_request(dev->power.qos->resume_latency_req, > value); > Index: linux-pm/include/linux/pm_qos.h > === > --- linux-pm.orig/include/linux/pm_qos.h > +++ linux-pm/include/linux/pm_qos.h > @@ -28,16 +28,19 @@ enum pm_qos_flags_status { > PM_QOS_FLAGS_ALL, > }; > > -#define PM_QOS_DEFAULT_VALUE -1 > +#define PM_QOS_DEFAULT_VALUE (-1) > +#define PM_QOS_LATENCY_ANY S32_MAX > +#define PM_QOS_LATENCY_ANY_NS ((s64)PM_QOS_LATENCY_ANY * NSEC_PER_USEC) > > #define PM_QOS_CPU_DMA_LAT_DEFAULT_VALUE (2000 * USEC_PER_SEC) > #define PM_QOS_NETWORK_LAT_DEFAULT_VALUE (2000 * USEC_PER_SEC) > #define PM_QOS_NETWORK_THROUGHPUT_DEFAULT_VALUE0 > #define PM_QOS_MEMORY_BANDWIDTH_DEFAULT_VALUE 0 > -#define PM_QOS_RESUME_LATENCY_DEFAULT_VALUE0 > +#define PM_QOS_RESUME_LATENCY_DEFAULT_VALUEPM_QOS_LATENCY_ANY > +#define PM_QOS_RESU
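The user-space mapping in the pm_qos_resume_latency_store() hunk can be modelled outside the kernel as follows. This is a sketch under assumptions: `parse_resume_latency()` is a hypothetical name, strtol() only approximates kstrtos32(), and NO_CONSTRAINT is taken to be S32_MAX as stated in the changelog.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Per the changelog, S32_MAX is the internal "no constraint" value. */
#define NO_CONSTRAINT INT32_MAX

/*
 * Userspace model of the store-side mapping from the user-visible
 * string to the internal value. Returns 0 on success, -EINVAL when
 * the input is rejected.
 */
static int parse_resume_latency(const char *buf, int32_t *out)
{
	char *end;
	long v;

	assert(buf != NULL && out != NULL);

	if (!strcmp(buf, "n/a") || !strcmp(buf, "n/a\n")) {
		*out = 0;	/* no resume latency acceptable at all */
		return 0;
	}

	errno = 0;
	v = strtol(buf, &end, 0);
	if (end == buf || errno == ERANGE)
		return -EINVAL;
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;

	/* Negative values and the raw "no constraint" value are rejected. */
	if (v < 0 || v >= NO_CONSTRAINT)
		return -EINVAL;

	/* The user-visible 0 keeps meaning "no restriction". */
	*out = (v == 0) ? NO_CONSTRAINT : (int32_t)v;
	return 0;
}
```

So writing "0" still means "no restriction" from the user's point of view, while the new "n/a" string is the only way to request the internal value 0.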
[PATCH] fixup! kvm: arm debug: introduce helper for single-step
---
 arch/arm/include/asm/kvm_host.h | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index a2e881d6108e..26a1ea6c6542 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -286,7 +286,10 @@ static inline void kvm_arm_setup_debug(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arm_clear_debug(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu) {}
 static inline bool kvm_arm_handle_step_debug(struct kvm_vcpu *vcpu,
-					     struct kvm_run *run) {}
+					     struct kvm_run *run)
+{
+	return false;
+}
 
 int kvm_arm_vcpu_arch_set_attr(struct kvm_vcpu *vcpu,
 			       struct kvm_device_attr *attr);
-- 
2.14.2
[git pull] Input updates for v4.14-rc8
Hi Linus,

Please pull from:

	git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input.git for-linus

to receive updates for the input subsystem. You will get:

- a new ACPI ID for Elan touchpad found in yet another Ideapad model;
- Synaptics RMI4 will allow binding to controllers reporting SMB version 3
  (note that we are not adding any new ACPI IDs to the Synaptics PS/2
  driver so unless user explicitly enables intertouch support there is no
  user-visible change);
- a fixup to TSC 2004/5 touchscreen driver to mark input devices as
  "direct" to help userspace identify the type of device they are dealing
  with.

Changelog:
---------

Kai-Heng Feng (1):
      Input: elan_i2c - add ELAN060C to the ACPI table

Martin Kepplinger (1):
      Input: tsc200x-core - set INPUT_PROP_DIRECT

Yiannis Marangos (1):
      Input: synaptics-rmi4 - RMI4 can also use SMBUS version 3

Diffstat:

 drivers/input/mouse/elan_i2c_core.c      | 1 +
 drivers/input/rmi4/rmi_smbus.c           | 4 ++--
 drivers/input/touchscreen/tsc200x-core.c | 1 +
 3 files changed, 4 insertions(+), 2 deletions(-)

Thanks.

-- 
Dmitry
[f2fs-dev] [PATCH] f2fs: validate before set/clear free nat bitmap
In flush_nat_entries, all dirty nats will be flushed, and if their new
address isn't NULL_ADDR, their bitmaps will be updated. The free_nid_count
of the bitmaps will be increased regardless of whether the nats have
already been occupied before. This could lead to a wrong free_nid_count.

So this patch checks the status of the bits before actually setting or
clearing them.

Signed-off-by: Fan li
---
 fs/f2fs/node.c | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index d234c6e..b965a53 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -1906,15 +1906,18 @@ static void update_free_nid_bitmap(struct f2fs_sb_info *sbi, nid_t nid,
 	if (!test_bit_le(nat_ofs, nm_i->nat_block_bitmap))
 		return;
 
-	if (set)
+	if (set) {
+		if (test_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]))
+			return;
 		__set_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]);
-	else
-		__clear_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]);
-
-	if (set)
 		nm_i->free_nid_count[nat_ofs]++;
-	else if (!build)
-		nm_i->free_nid_count[nat_ofs]--;
+	} else {
+		if (!test_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]))
+			return;
+		__clear_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]);
+		if (!build)
+			nm_i->free_nid_count[nat_ofs]--;
+	}
 }
 
 static void scan_nat_page(struct f2fs_sb_info *sbi,
-- 
2.7.4
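The effect of checking the bit first can be seen in a small userspace model. Everything here is illustrative: `struct nat_block`, `update_free_nid()` and the 64-entry bitmap are assumptions for the sketch and do not reflect the real f2fs structures or sizes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NIDS_PER_BLOCK 64	/* illustrative size, not the f2fs value */

struct nat_block {
	uint64_t free_bitmap;		/* bit set => nid is free */
	unsigned int free_count;	/* must track the set bits */
};

static void update_free_nid(struct nat_block *blk, unsigned int ofs,
			    bool set, bool build)
{
	uint64_t mask;

	assert(ofs < NIDS_PER_BLOCK);
	mask = UINT64_C(1) << ofs;

	if (set) {
		if (blk->free_bitmap & mask)
			return;		/* already free: leave the count alone */
		blk->free_bitmap |= mask;
		blk->free_count++;
	} else {
		if (!(blk->free_bitmap & mask))
			return;		/* already in use: leave the count alone */
		blk->free_bitmap &= ~mask;
		if (!build)
			blk->free_count--;
	}
}
```

Marking the same nid free twice now bumps the counter only once, which is the invariant the patch is after.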
Re: [PATCH v2] locking/lockdep: Revise Documentation/locking/crossrelease.txt
On 11/10/2017 4:30 PM, Ingo Molnar wrote:
> * Byungchul Park wrote:
>
>> Event C depends on event A.
>> Event A depends on event B.
>> Event B depends on event C.
>>
>> - NOTE: Precisely speaking, a dependency is one between whether a
>> - waiter for an event can be woken up and whether another waiter for
>> - another event can be woken up. However from now on, we will describe
>> - a dependency as if it's one between an event and another event for
>> - simplicity.
>
> Why was this explanation removed?
>
>> -Lockdep tries to detect a deadlock by checking dependencies created by
>> -lock operations, acquire and release. Waiting for a lock corresponds to
>> -waiting for an event, and releasing a lock corresponds to triggering an
>> -event in the previous section.
>> +Lockdep tries to detect a deadlock by checking circular dependencies
>> +created by lock operations, acquire and release, which are wait and
>> +event respectively.
>
> What? You changed a readable paragraph into an unreadable one.
>
> Sorry, this text needs to be acked by someone with good English skills, and I
> don't have the time right now to fix it all up. Please send minimal, obvious
> typo/grammar fixes only.

I will send one including minimal fixes at the next spin.

-- 
Thanks,
Byungchul
[PATCH RFC] kbuild: fixes in Makefile.lib
Commit cf4f21938e13e ("kbuild: Allow to specify composite modules with
modname-m") added modname-m support, but missed updating the corresponding
multi-objs-m definition.

Commit 551559e13af1c ("kbuild: implement modules.order") missed filtering
the subdirs listed in obj-m. Unless the subdirs are exactly identical
between obj-y and obj-m, I think it will miss something. But until now, no
one has complained about it, so I guess no one has triggered it.

Signed-off-by: Cao jin
---
I found these 2 points, which I think might be wrong, during code
inspection, but until now they do not seem to have done anything bad, so I
am not sure this is a problem :)

 scripts/Makefile.lib | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
index 580e605118e4..3209f303213b 100644
--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -22,7 +22,7 @@ lib-y := $(filter-out $(obj-y), $(sort $(lib-y) $(lib-m)))
 # Determine modorder.
 # Unfortunately, we don't have information about ordering between -y
 # and -m subdirs. Just put -y's first.
-modorder := $(patsubst %/,%/modules.order, $(filter %/, $(obj-y)) $(obj-m:.o=.ko))
+modorder := $(patsubst %/,%/modules.order, $(filter %/, $(obj-y) $(obj-m)) $(obj-m:.o=.ko))
 
 # Handle objects in subdirs
 # ---
@@ -49,7 +49,7 @@ single-used-m := $(sort $(filter-out $(multi-used-m),$(obj-m)))
 # Build list of the parts of our composite objects, our composite
 # objects depend on those (obviously)
 multi-objs-y := $(foreach m, $(multi-used-y), $($(m:.o=-objs)) $($(m:.o=-y)))
-multi-objs-m := $(foreach m, $(multi-used-m), $($(m:.o=-objs)) $($(m:.o=-y)))
+multi-objs-m := $(foreach m, $(multi-used-m), $($(m:.o=-objs)) $($(m:.o=-y)) $($(m:.o=-m)))
 multi-objs := $(multi-objs-y) $(multi-objs-m)
 
 # $(subdir-obj-y) is the list of objects in $(obj-y) which uses dir/ to
-- 
2.13.6
Re: [PATCH v5 15/37] tracing: Add variable support to hist triggers
Hi Tom, On Thu, Nov 09, 2017 at 02:33:46PM -0600, Tom Zanussi wrote: > Add support for saving the value of a current event's event field by > assigning it to a variable that can be read by a subsequent event. > > The basic syntax for saving a variable is to simply prefix a unique > variable name not corresponding to any keyword along with an '=' sign > to any event field. > > Both keys and values can be saved and retrieved in this way: > > # echo 'hist:keys=next_pid:vals=$ts0:ts0=$common_timestamp ... > # echo 'hist:timer_pid=common_pid:key=$timer_pid ...' > > If a variable isn't a key variable or prefixed with 'vals=', the > associated event field will be saved in a variable but won't be summed > as a value: > > # echo 'hist:keys=next_pid:ts1=$common_timestamp:... > > Multiple variables can be assigned at the same time: > > # echo 'hist:keys=pid:vals=$ts0,$b,field2:ts0=$common_timestamp,b=field1 > ... > > Multiple (or single) variables can also be assigned at the same time > using separate assignments: > > # echo 'hist:keys=pid:vals=$ts0:ts0=$common_timestamp:b=field1:c=field2 > ... > > Variables set as above can be used by being referenced from another > event, as described in a subsequent patch. 
> > Signed-off-by: Tom Zanussi > Signed-off-by: Baohong Liu > --- [SNIP] > +static int parse_var_defs(struct hist_trigger_data *hist_data) > +{ > + char *s, *str, *var_name, *field_str; > + unsigned int i, j, n_vars = 0; > + int ret = 0; > + > + for (i = 0; i < hist_data->attrs->n_assignments; i++) { > + str = hist_data->attrs->assignment_str[i]; > + for (j = 0; j < TRACING_MAP_VARS_MAX; j++) { > + field_str = strsep(&str, ","); > + if (!field_str) > + break; > + > + var_name = strsep(&field_str, "="); > + if (!var_name || !field_str) { > + ret = -EINVAL; > + goto free; > + } > + > + s = kstrdup(var_name, GFP_KERNEL); > + if (!s) { > + ret = -ENOMEM; > + goto free; > + } > + hist_data->attrs->var_defs.name[n_vars] = s; > + > + s = kstrdup(field_str, GFP_KERNEL); > + if (!s) { > + kfree(hist_data->attrs->var_defs.name[n_vars]); > + ret = -ENOMEM; > + goto free; > + } > + hist_data->attrs->var_defs.expr[n_vars++] = s; > + > + hist_data->attrs->var_defs.n_vars = n_vars; > + > + if (n_vars == TRACING_MAP_VARS_MAX) > + goto free; This will silently discard all variables. Why not returning an error? Also I think it should be moved to the beginning of this block.. Thanks, Namhyung > + } > + } > + > + return ret; > + free: > + free_var_defs(hist_data); > + > + return ret; > +}
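To illustrate the suggestion above (check the limit at the top of the loop body and report an error rather than silently dropping everything), here is a userspace sketch. It only counts definitions instead of duplicating strings; `count_var_defs()` is a hypothetical name, VARS_MAX stands in for TRACING_MAP_VARS_MAX, and the check for empty names is stricter than the quoted kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define VARS_MAX 16	/* stand-in for TRACING_MAP_VARS_MAX */

/*
 * Count "name=expr" definitions in a comma-separated string, on top of
 * the *n_vars already parsed. Returns 0 on success or -EINVAL if a
 * definition is malformed or the variable limit would be exceeded.
 */
static int count_var_defs(const char *str, unsigned int *n_vars)
{
	assert(str != NULL && n_vars != NULL);

	while (*str) {
		size_t len = strcspn(str, ",");
		const char *eq = memchr(str, '=', len);

		/* Check the limit first, before accepting another variable. */
		if (*n_vars == VARS_MAX)
			return -EINVAL;

		/* Need a non-empty name and a non-empty expression. */
		if (!eq || eq == str || eq == str + len - 1)
			return -EINVAL;

		(*n_vars)++;
		str += len;
		if (*str == ',')
			str++;
	}

	return 0;
}
```

With the limit check placed first, a definition beyond the maximum is reported to the caller instead of taking the cleanup path with a zero return value.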
[RFC PATCH] mm: fix device-dax pud write-faults triggered by get_user_pages()
Currently only get_user_pages_fast() can safely handle the writable gup
case due to its use of pud_access_permitted() to check whether the pud
entry is writable. In the gup slow path pud_write() is used instead of
pud_access_permitted() and to date it has been unimplemented: it just
calls BUG().

    kernel BUG at ./include/linux/hugetlb.h:244!
    [..]
    RIP: 0010:follow_devmap_pud+0x482/0x490
    [..]
    Call Trace:
     follow_page_mask+0x28c/0x6e0
     __get_user_pages+0xe4/0x6c0
     get_user_pages_unlocked+0x130/0x1b0
     get_user_pages_fast+0x89/0xb0
     iov_iter_get_pages_alloc+0x114/0x4a0
     nfs_direct_read_schedule_iovec+0xd2/0x350
     ? nfs_start_io_direct+0x63/0x70
     nfs_file_direct_read+0x1e0/0x250
     nfs_file_read+0x90/0xc0

Use pud_access_permitted() to implement pud_write(). A later cleanup can
remove {pte,pmd,pud}_write and replace them with
{pte,pmd,pud}_access_permitted() directly so that we only have one set of
helpers for these kinds of checks. For now, implementing pud_write()
simplifies -stable backports.

Cc:
Cc: Dave Hansen
Fixes: a00cc7d9dd93 ("mm, x86: add support for PUD-sized transparent hugepages")
Signed-off-by: Dan Williams
---

Sending this as RFC for opinion on whether this should just be a
pud_flags() & _PAGE_RW check, like pmd_write, or pud_access_permitted()
that also takes protection keys into account.

 include/linux/hugetlb.h |3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index fbf5b31d47ee..6a142b240ef7 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -242,8 +242,7 @@ static inline int pgd_write(pgd_t pgd)
 #ifndef pud_write
 static inline int pud_write(pud_t pud)
 {
-	BUG();
-	return 0;
+	return pud_access_permitted(pud, WRITE);
 }
 #endif
Re: [PATCH v2] locking/lockdep: Revise Documentation/locking/crossrelease.txt
* Byungchul Park wrote:

>     Event C depends on event A.
>     Event A depends on event B.
>     Event B depends on event C.
>
> -   NOTE: Precisely speaking, a dependency is one between whether a
> -   waiter for an event can be woken up and whether another waiter for
> -   another event can be woken up. However from now on, we will describe
> -   a dependency as if it's one between an event and another event for
> -   simplicity.

Why was this explanation removed?

> -Lockdep tries to detect a deadlock by checking dependencies created by
> -lock operations, acquire and release. Waiting for a lock corresponds to
> -waiting for an event, and releasing a lock corresponds to triggering an
> -event in the previous section.
> +Lockdep tries to detect a deadlock by checking circular dependencies
> +created by lock operations, acquire and release, which are wait and
> +event respectively.

What? You changed a readable paragraph into an unreadable one.

Sorry, this text needs to be acked by someone with good English skills, and I
don't have the time right now to fix it all up. Please send minimal, obvious
typo/grammar fixes only.

Thanks,

	Ingo
Re: [PATCH] x86: use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz" again
* Rafael J. Wysocki wrote:

> Hi Linus,
>
> On 11/9/2017 11:38 AM, WANG Chao wrote:
> > Commit 941f5f0f6ef5 (x86: CPU: Fix up "cpu MHz" in /proc/cpuinfo) caused
> > a serious performance issue when reading from /proc/cpuinfo on system
> > with aperfmperf.
> >
> > For each cpu, arch_freq_get_on_cpu() sleeps 20ms to get its frequency.
> > On a system with 64 cpus, it takes 1.5s to finish running `cat
> > /proc/cpuinfo`, while it previously was done in 15ms.
>
> Honestly, I'm not sure what to do to address this ATM.
>
> The last requested frequency is only available in the non-HWP case, so it
> cannot be used universally.

This is a serious regression that needs to be fixed ASAP, because the slowdown
is utterly ridiculous on a 120 CPU system:

  fomalhaut:~> time cat /proc/cpuinfo >/dev/null

  real    0m2.689s
  user    0m0.001s
  sys     0m0.007s

Thanks,

	Ingo
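As a rough sanity check on the numbers in this thread, the sleep alone puts a floor under the read time: 64 CPUs at 20ms each is 1.28s, in line with the reported 1.5s, and 120 CPUs gives 2.4s, in line with the 2.689s measured above. A trivial sketch of that arithmetic follows; cpuinfo_read_floor_ms() is a made-up helper name, not a kernel function.

```c
#include <assert.h>
#include <limits.h>

/*
 * Lower bound, in milliseconds, for one read of /proc/cpuinfo when every
 * CPU entry costs a fixed sleep.  Everything else the read does comes on
 * top of this.
 */
unsigned int cpuinfo_read_floor_ms(unsigned int nr_cpus,
				   unsigned int per_cpu_sleep_ms)
{
	/* Refuse inputs whose product would overflow. */
	assert(per_cpu_sleep_ms == 0 || nr_cpus <= UINT_MAX / per_cpu_sleep_ms);

	return nr_cpus * per_cpu_sleep_ms;
}
```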
Re: [alsa-devel] [RFC PATCH v2 7/7] sound: core: Avoid using timespec for struct snd_timer_tread
On Fri, 10 Nov 2017 00:20:10 +0100, Arnd Bergmann wrote: > > On Thu, Nov 9, 2017 at 7:11 PM, Takashi Iwai wrote: > > On Thu, 09 Nov 2017 18:01:47 +0100, > > Arnd Bergmann wrote: > >> > >> On Thu, Nov 9, 2017 at 5:52 PM, Takashi Iwai wrote: > >> > > >> > IOW, is there any macro indicating the 64bit user time_t? > >> > >> There is a macro defined by the C library, but so far we have not > >> started relying on it in kernel headers, because there is no guarantee > >> that this symbol is visible before sys/time.h has been included, > >> and there are some cases where it's possible to include a kernel > >> header before sys/time.h. > >> > >> In case of sound/asound.h, that should be no problem since we rely > >> on having seen the definition on 'struct timeval' already today, and > >> that must come from sys/time.h. Then we just need to make sure that > >> all C libraries define the same macro. > >> > >> Are you sure about the switch()/case problem? I thought that worked > >> in C99, the only problem would be using the macro outside of a > >> function, e.g. as initalizer for a variable > > > > Hmm, OK it seems working. > > > > But, honestly speaking, it's too scaring. I'm OK if it were only in > > the kernel local code. But it's the API/ABI definition, which is > > referred by user-space... > > > > A more solid condition would be really appreciated. > > I understand your concern here and agree it's really ugly. It did take us > many attempts to come up with this trick for other cases, so my initial > reaction would be to use the same thing everywhere since I know > it works, but we can use #ifdef instead if you prefer that. I think we > can use a single #ifdef variant to cover all cases, but I'd have to think > about the x32 and x86-32 some more here. 
With this trick, we can > make user space with new glibc use data structures that are compatible > with 64-bit kernels and avoid the additional translation helpers: > > enum { > SNDRV_PCM_MMAP_OFFSET_DATA = 0x, > SNDRV_PCM_MMAP_OFFSET_CONTROL = 0x8100, > #if (__BITS_PER_LONG == 64) || !defined(__USE_TIME_BITS64) Yeah, it's definitely better, more understandable! thanks, Takashi
Re: [PATCH] x86, pkeys: update documentation about availability
* Dave Hansen wrote: > On 11/09/2017 10:12 PM, Ingo Molnar wrote: > > > > * Dave Hansen wrote: > > > >> > >> From: Dave Hansen > >> > >> Now that CPUs that implement Memory Protection Keys are publicly > >> available we can be a bit less oblique about where it is available. > >> > >> Signed-off-by: Dave Hansen > >> --- > >> > >> b/Documentation/x86/protection-keys.txt |9 +++-- > >> 1 file changed, 7 insertions(+), 2 deletions(-) > >> > >> diff -puN Documentation/x86/protection-keys.txt~pkeys-update > >> Documentation/x86/protection-keys.txt > >> --- a/Documentation/x86/protection-keys.txt~pkeys-update 2017-11-09 > >> 10:36:53.381467202 -0800 > >> +++ b/Documentation/x86/protection-keys.txt2017-11-09 > >> 10:43:15.527466249 -0800 > >> @@ -1,5 +1,10 @@ > >> -Memory Protection Keys for Userspace (PKU aka PKEYs) is a CPU feature > >> -which will be found on future Intel CPUs. > >> +Memory Protection Keys for Userspace (PKU aka PKEYs) is a feature > >> +which is found on Intel's Skylake "Scalable Processor" Server CPUs. > >> +It will be avalable in future non-server parts. > >> + > >> +For anyone wishing to test or use this feature, it is available in > >> +Amazon's EC2 C5 instances and is known to work there using an Ubuntu > >> +17.04 image. > >> > >> Memory Protection Keys provides a mechanism for enforcing page-based > >> protections, but without requiring modification of the page tables > > > > Could we please first fix the pkeys self-test? One of the testcases doesn't > > build > > at all: > > > > gcc -m32 -o /home/mingo/tip/tools/testing/selftests/x86/protection_keys_32 > > -O2 -g -std=gnu99 -pthread -Wall -no-pie protection_keys.c -lrt -ldl -lm > > In file included from /usr/include/signal.h:57:0, > > from protection_keys.c:33: > > protection_keys.c: In function ‘signal_handler’: > > protection_keys.c:253:6: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or > > ‘__attribute__’ > > before ‘.’ token > >u64 si_pkey; > > That's odd. I build them all the time. 
I compiled it just now with > 4.14-rc8 and gcc 4.8.4. > > I wonder if this is more fallout from the glibc headers getting updated > to now contain pkey-related stuff. si_pkey might be getting #defined > over for the siginfo si_pkey. > > What distro are you seeing this on? Latest Ubuntu, 17.10: triton:~/tip> cat /etc/os-release NAME="Ubuntu" VERSION="17.10 (Artful Aardvark)" triton:~/tip> apt-file find /usr/include/signal.h libc6-dev: /usr/include/signal.h triton:~/tip> dpkg -l libc6-dev Desired=Unknown/Install/Remove/Purge/Hold | Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend |/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad) ||/ NameVersion Architecture Description +++-===--- ii libc6-dev:amd64 2.26-0ubuntu2amd64 GNU C Library: Development Libraries and Header Files > > plus, on a related note, the MPX testcase produces annoying warnings: > > > > gcc -m32 -o /home/mingo/tip/tools/testing/selftests/x86/mpx-mini-test_32 > > -O2 -g -std=gnu99 -pthread -Wall -no-pie mpx-mini-test.c -lrt -ldl -lm > > mpx-mini-test.c: In function ‘insn_test_failed’: > > mpx-mini-test.c:1406:3: warning: array subscript is above array bounds > > [-Warray-bounds] > > printf("bte[1]: %lx\n", bte->contents[1]); > > This is kinda a weird structure: > > > struct mpx_bt_entry { > > union { > > char x[MPX_BOUNDS_TABLE_ENTRY_SIZE_BYTES]; > > unsigned long contents[1]; > > }; > > } __attribute__((packed)); > > I guess it should either be contents[0] or > contents[MPX_BOUNDS_TABLE_ENTRY_SIZE_BYTE/sizeof(long)]. But, the > warning is harmless at least. > > What gcc is this, btw? I must be behind the times. gcc version 7.2.0 (Ubuntu 7.2.0-8ubuntu3) Thanks, Ingo
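A sketch of the second option mentioned above for the bounds-table entry, sizing contents[] from the entry size so that index 1 is inside the array and -Warray-bounds has nothing to complain about. The entry size of 32 bytes and all names ending in _sketch or starting with SKETCH_ are assumptions made for this example and are not taken from the selftest.

```c
#include <assert.h>

#define SKETCH_BT_ENTRY_SIZE_BYTES 32	/* assumed entry size, for illustration */
#define SKETCH_BT_ENTRY_WORDS \
	(SKETCH_BT_ENTRY_SIZE_BYTES / sizeof(unsigned long))

struct mpx_bt_entry_sketch {
	union {
		char x[SKETCH_BT_ENTRY_SIZE_BYTES];
		/* Sized from the entry, instead of contents[1]. */
		unsigned long contents[SKETCH_BT_ENTRY_WORDS];
	};
} __attribute__((packed));

/* Both views of the union must cover exactly one entry. */
static_assert(sizeof(struct mpx_bt_entry_sketch) == SKETCH_BT_ENTRY_SIZE_BYTES,
	      "entry views must have the same size");

/* Read one word of the entry, with the index checked against the array. */
unsigned long bt_entry_word_sketch(const struct mpx_bt_entry_sketch *bte,
				   unsigned int idx)
{
	assert(bte && idx < SKETCH_BT_ENTRY_WORDS);
	return bte->contents[idx];
}
```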
Re: [PATCH 1/2] sched/swait: allow swake_up() to return
On Thu, Nov 09, 2017 at 11:06:53AM +0100, Paolo Bonzini wrote:
> On 09/11/2017 10:18, Peter Xu wrote:
> > Let swake_up() to return whether any of the waiters is waked up. One use
> > case of it would be:
> >
> >   if (swait_active(wq)) {
> >     swake_up(wq);
> >     // do something when waiter is waked up
> >     waked_up++;
> >   }
> >
> > Logically it's possible that when reaching swake_up() the wait queue is
> > not active any more, and here doing something like waked_up++ would be
> > inaccurate. To correct it, we need an atomic version of it.
> >
> > With this patch, we can simply re-write it into:
> >
> >   if (swake_up(wq)) {
> >     // do something when waiter is waked up
> >     waked_up++;
> >   }
> >
> > After all we are checking swait_active() inside swake_up() too.
>
> Better subject:
>
> sched/swait: make swake_up() return whether there were any waiters
>
> I like this patch.

I'll see how PeterZ would like me to do next, or I can drop this patch
and send another clean up which is part of patch 2.

Thanks for the positive feedback and commenting. :-)

-- 
Peter Xu
Re: [PATCH v2 0/4] KVM: Paravirt remote TLB flush
2017-11-10 15:04 GMT+08:00 Wanpeng Li :
> Remote flushing api's does a busy wait which is fine in bare-metal
> scenario. But with-in the guest, the vcpus might have been pre-empted
> or blocked. In this scenario, the initator vcpu would end up
> busy-waiting for a long amount of time.
>
> This patch set implements para-virt flush tlbs making sure that it
> does not wait for vcpus that are sleeping. And all the sleeping vcpus
> flush the tlb on guest enter. Idea was discussed here:
> https://lkml.org/lkml/2012/2/20/157
>
> The best result is achieved when we're overcommiting the host by running
> multiple vCPUs on each pCPU. In this case PV tlb flush avoids touching
> vCPUs which are not scheduled and avoid the wait on the main CPU.
>
> In addition, thanks for commit 9e52fc2b50d ("x86/mm: Enable RCU based
> page table freeing (CONFIG_HAVE_RCU_TABLE_FREE=y)")
>
> Test on a Haswell i7 desktop 4 cores (2HT), so 8 pCPUs, running ebizzy in
> one linux guest.
>
> ebizzy -M
>               vanilla    optimized     boost
>  8 vCPUs       10152       10083      -0.68%
> 16 vCPUs        1224        4866      297.5%
> 24 vCPUs        1109        3871      249%
> 32 vCPUs        1025        3375      229.3%

v1 -> v2:
 * a new CPUID feature bit
 * fix cmpxchg check
 * use kvm_vcpu_flush_tlb() to get the statistics right
 * just OR the KVM_VCPU_PREEMPTED in kvm_steal_time_set_preempted
 * add a new bool argument to kvm_x86_ops->tlb_flush
 * __cpumask_clear_cpu() instead of cpumask_clear_cpu()
 * not put cpumask_t on stack
 * rebase the patchset against "locking/qspinlock/x86: Avoid
   test-and-set when PV_DEDICATED is set" v3

>
> Wanpeng Li (4):
>   KVM: Add vCPU running/preempted state
>   KVM: Add paravirt remote TLB flush
>   KVM: X86: introduce invalidate_gpa argument to tlb flush
>   KVM: Add flush_on_enter before guest enter
>
>  Documentation/virtual/kvm/cpuid.txt  | 10 ++
>  arch/x86/include/asm/kvm_host.h      |  2 +-
>  arch/x86/include/uapi/asm/kvm_para.h |  6 ++
>  arch/x86/kernel/kvm.c                | 35 ++-
>  arch/x86/kvm/cpuid.c                 |  3 ++-
>  arch/x86/kvm/svm.c                   | 14 +++---
>  arch/x86/kvm/vmx.c                   | 21 +++--
>  arch/x86/kvm/x86.c                   | 24 +++-
>  8 files changed, 86 insertions(+), 29 deletions(-)
>
> --
> 2.7.4
>
Re: [PATCH 1/2] sched/swait: allow swake_up() to return
On Thu, Nov 09, 2017 at 11:23:03AM +0100, Peter Zijlstra wrote: > On Thu, Nov 09, 2017 at 05:18:53PM +0800, Peter Xu wrote: > > Let swake_up() to return whether any of the waiters is waked up. One use > > case of it would be: > > > > if (swait_active(wq)) { > > swake_up(wq); > > // do something when waiter is waked up > > waked_up++; > > } > > The word is 'woken', and no that doesn't work. All it says is that there > was a waiter, not that you were to one to wake it. Another concurrent > wakeup might have done so. Yes. Or IIUC the waiter can be calling finish_swait() somehow so it removed itself from the list before being woken. > > > > > Logically it's possible that when reaching swake_up() the wait queue is > > not active any more, and here doing something like waked_up++ would be > > inaccurate. To correct it, we need an atomic version of it. > > > > With this patch, we can simply re-write it into: > > > > if (swake_up(wq)) { > > // do something when waiter is waked up > > waked_up++; > > } > > > > After all we are checking swait_active() inside swake_up() too. > > We're not in fact; you've been staring at old code; see commit: > > 35a2897c2a30 ("sched/wait: Remove the lockless swait_active() check in > swake_up*()") I thought the tree was new enough, but obviously I was wrong... Thanks for the pointer. > > > Also, you're changing the interface relative to the regular wait > interface. The two should be similar wherever possible. Indeed. I came to this when reading kvm_vcpu_wake_up(), so that only affects some statistic which may not be that critical. However I don't know whether there would be any other real use case that we would like to know exactly whether a call to [s]wake_up() has really done something or just returned with a NOP. 
Anyway, please let me know if you think the same change to wake_up() would be meaningful, otherwise I can drop this patch and post another KVM-only one to clean up the redundant callers of swait_active(), since even if we dropped that list check in 35a2897c2a30, we'll do that again in swake_up_locked(). And after knowing 35a2897c2a30, I do think that calling swait_active() before swake_up() is not good since that call is without a lock as well, just like what can happen before 35a2897c2a30. (I am not 100% sure whether I fully understand the problem mentioned in 35a2897c2a30, but I think it's the memory barrier in the lock/unlock that matters.) Thanks, -- Peter Xu
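To make the point about the lock concrete, here is a small userspace model, not the kernel swait code. The "is there a waiter" check and the dequeue happen under one lock, so the return value describes what this particular call did, which the lockless swait_active() followed by swake_up() pattern cannot promise. The queue is reduced to a counter and the raw spinlock to a C11 atomic_flag; all names are invented for the sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct swait_queue_sketch {
	atomic_flag lock;		/* stand-in for the queue's raw spinlock */
	unsigned int nr_waiters;	/* stand-in for the list of waiting tasks */
};

static void sketch_lock(struct swait_queue_sketch *q)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
		;
}

static void sketch_unlock(struct swait_queue_sketch *q)
{
	atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

void sketch_add_waiter(struct swait_queue_sketch *q)
{
	assert(q);
	sketch_lock(q);
	q->nr_waiters++;
	sketch_unlock(q);
}

/* Dequeue at most one waiter; report whether this call dequeued one. */
bool sketch_wake_one(struct swait_queue_sketch *q)
{
	bool woke = false;

	assert(q);
	sketch_lock(q);
	if (q->nr_waiters > 0) {
		q->nr_waiters--;
		woke = true;
	}
	sketch_unlock(q);

	return woke;
}
```

A caller that only wants accurate statistics would then count on the return value of the wake call and drop the separate, unlocked "active" check.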
[PATCH v2 1/4] KVM: Add vCPU running/preempted state
From: Wanpeng Li This patch reuses the preempted field in kvm_steal_time, and will export the vcpu running/pre-empted information to the guest from host. This will enable guest to intelligently send ipi to running vcpus and set flag for pre-empted vcpus. This will prevent waiting for vcpus that are not running. Cc: Paolo Bonzini Cc: Radim Krčmář Signed-off-by: Wanpeng Li --- arch/x86/include/uapi/asm/kvm_para.h | 3 +++ arch/x86/kernel/kvm.c| 2 +- arch/x86/kvm/x86.c | 4 ++-- 3 files changed, 6 insertions(+), 3 deletions(-) diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h index a965e5b0..ff23ce9 100644 --- a/arch/x86/include/uapi/asm/kvm_para.h +++ b/arch/x86/include/uapi/asm/kvm_para.h @@ -50,6 +50,9 @@ struct kvm_steal_time { __u32 pad[11]; }; +#define KVM_VCPU_NOT_PREEMPTED (0 << 0) +#define KVM_VCPU_PREEMPTED (1 << 0) + #define KVM_CLOCK_PAIRING_WALLCLOCK 0 struct kvm_clock_pairing { __s64 sec; diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c index 8bb9594..1b1b641 100644 --- a/arch/x86/kernel/kvm.c +++ b/arch/x86/kernel/kvm.c @@ -608,7 +608,7 @@ __visible bool __kvm_vcpu_is_preempted(long cpu) { struct kvm_steal_time *src = &per_cpu(steal_time, cpu); - return !!src->preempted; + return !!(src->preempted & KVM_VCPU_PREEMPTED); } PV_CALLEE_SAVE_REGS_THUNK(__kvm_vcpu_is_preempted); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index d61dcce3..46d4158 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2116,7 +2116,7 @@ static void record_steal_time(struct kvm_vcpu *vcpu) &vcpu->arch.st.steal, sizeof(struct kvm_steal_time return; - vcpu->arch.st.steal.preempted = 0; + vcpu->arch.st.steal.preempted = KVM_VCPU_NOT_PREEMPTED; if (vcpu->arch.st.steal.version & 1) vcpu->arch.st.steal.version += 1; /* first time write, random junk */ @@ -2887,7 +2887,7 @@ static void kvm_steal_time_set_preempted(struct kvm_vcpu *vcpu) if (!(vcpu->arch.st.msr_val & KVM_MSR_ENABLED)) return; - vcpu->arch.st.steal.preempted = 
1; + vcpu->arch.st.steal.preempted = KVM_VCPU_PREEMPTED; kvm_write_guest_offset_cached(vcpu->kvm, &vcpu->arch.st.stime, &vcpu->arch.st.steal.preempted, -- 2.7.4
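A short sketch of why the guest-side check switches from !!src->preempted to a mask test: once the field carries more than one flag, a value with only some other bit set must not read as "preempted". The two defines are copied from the patch; the function names are made up, and the value 2 in the usage note merely represents some other flag bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define KVM_VCPU_NOT_PREEMPTED	(0 << 0)
#define KVM_VCPU_PREEMPTED	(1 << 0)

static_assert(KVM_VCPU_NOT_PREEMPTED == 0,
	      "the running state is the all-clear value");

/* Check as it was before the patch: any non-zero value counts. */
bool is_preempted_old_sketch(uint8_t preempted)
{
	return !!preempted;
}

/* Check as patched: only bit 0 decides. */
bool is_preempted_new_sketch(uint8_t preempted)
{
	return !!(preempted & KVM_VCPU_PREEMPTED);
}
```

For the values 0 and 1 both helpers agree; they differ only for values such as 2, where another bit is set while bit 0 is clear.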
[PATCH v2 2/4] KVM: Add paravirt remote TLB flush
From: Wanpeng Li Remote flushing api's does a busy wait which is fine in bare-metal scenario. But with-in the guest, the vcpus might have been pre-empted or blocked. In this scenario, the initator vcpu would end up busy-waiting for a long amount of time. This patch set implements para-virt flush tlbs making sure that it does not wait for vcpus that are sleeping. And all the sleeping vcpus flush the tlb on guest enter. The best result is achieved when we're overcommiting the host by running multiple vCPUs on each pCPU. In this case PV tlb flush avoids touching vCPUs which are not scheduled and avoid the wait on the main CPU. Test on a Haswell i7 desktop 4 cores (2HT), so 8 pCPUs, running ebizzy in one linux guest. ebizzy -M vanillaoptimized boost 8 vCPUs 10152 10083 -0.68% 16 vCPUs12244866 297.5% 24 vCPUs11093871 249% 32 vCPUs10253375 229.3% Cc: Paolo Bonzini Cc: Radim Krčmář Signed-off-by: Wanpeng Li --- Documentation/virtual/kvm/cpuid.txt | 4 arch/x86/include/uapi/asm/kvm_para.h | 2 ++ arch/x86/kernel/kvm.c| 31 +++ 3 files changed, 37 insertions(+) diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt index 117066a..9693fcc 100644 --- a/Documentation/virtual/kvm/cpuid.txt +++ b/Documentation/virtual/kvm/cpuid.txt @@ -60,6 +60,10 @@ KVM_FEATURE_PV_DEDICATED || 8 || guest checks this feature bit || || mizations such as usage of || || qspinlocks. -- +KVM_FEATURE_PV_TLB_FLUSH || 9 || guest checks this feature bit + || || before enabling paravirtualized + || || tlb flush. +-- KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||24 || host will warn if no guest-side || || per-cpu warps are expected in || || kvmclock. 
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h index 9ead1ed..a028479 100644 --- a/arch/x86/include/uapi/asm/kvm_para.h +++ b/arch/x86/include/uapi/asm/kvm_para.h @@ -25,6 +25,7 @@ #define KVM_FEATURE_PV_EOI 6 #define KVM_FEATURE_PV_UNHALT 7 #define KVM_FEATURE_PV_DEDICATED 8 +#define KVM_FEATURE_PV_TLB_FLUSH 9 /* The last 8 bits are used to indicate how to interpret the flags field * in pvclock structure. If no bits are set, all flags are ignored. @@ -53,6 +54,7 @@ struct kvm_steal_time { #define KVM_VCPU_NOT_PREEMPTED (0 << 0) #define KVM_VCPU_PREEMPTED (1 << 0) +#define KVM_VCPU_SHOULD_FLUSH (1 << 1) #define KVM_CLOCK_PAIRING_WALLCLOCK 0 struct kvm_clock_pairing { diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c index 66ed3bc..50f4b6a 100644 --- a/arch/x86/kernel/kvm.c +++ b/arch/x86/kernel/kvm.c @@ -465,6 +465,33 @@ static void __init kvm_apf_trap_init(void) update_intr_gate(X86_TRAP_PF, async_page_fault); } +static cpumask_t flushmask; + +static void kvm_flush_tlb_others(const struct cpumask *cpumask, + const struct flush_tlb_info *info) +{ + u8 state; + int cpu; + struct kvm_steal_time *src; + + cpumask_copy(&flushmask, cpumask); + /* +* We have to call flush only on online vCPUs. 
And +* queue flush_on_enter for pre-empted vCPUs +*/ + for_each_cpu(cpu, cpumask) { + src = &per_cpu(steal_time, cpu); + state = src->preempted; + if ((state & KVM_VCPU_PREEMPTED)) { + if (cmpxchg(&src->preempted, state, state | + KVM_VCPU_SHOULD_FLUSH) == state) + __cpumask_clear_cpu(cpu, &flushmask); + } + } + + native_flush_tlb_others(&flushmask, info); +} + void __init kvm_guest_init(void) { int i; @@ -484,6 +511,10 @@ void __init kvm_guest_init(void) pv_time_ops.steal_clock = kvm_steal_clock; } + if (kvm_para_has_feature(KVM_FEATURE_PV_TLB_FLUSH) && + !kvm_para_has_feature(KVM_FEATURE_PV_DEDICATED)) + pv_mmu_ops.flush_tlb_others = kvm_flush_tlb_others; + if (kvm_para_has_feature(KVM_FEATURE_PV_EOI)) apic_set_eoi_write(kvm_guest_apic_eoi_write); -- 2.7.4
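The per-vCPU step inside the loop of kvm_flush_tlb_others() can be modelled in userspace as follows. This is a sketch under assumptions, not the kernel code: C11 atomics stand in for the kernel's cmpxchg(), and defer_flush_if_preempted_sketch() is a hypothetical name. Like the patch, it makes a single attempt. If the host changed the state in between, the flush is not deferred and the vCPU stays in the IPI mask.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define KVM_VCPU_PREEMPTED	(1 << 0)
#define KVM_VCPU_SHOULD_FLUSH	(1 << 1)

static_assert((KVM_VCPU_PREEMPTED & KVM_VCPU_SHOULD_FLUSH) == 0,
	      "flag bits must not overlap");

/*
 * Returns true if the flush was deferred, i.e. the vCPU was seen as
 * preempted and SHOULD_FLUSH was set atomically on top of that state.
 * Returns false if the caller still has to flush this vCPU by IPI.
 */
bool defer_flush_if_preempted_sketch(_Atomic uint8_t *preempted)
{
	uint8_t state = atomic_load(preempted);

	if (!(state & KVM_VCPU_PREEMPTED))
		return false;

	/* Only succeeds if the state is still what we just read. */
	return atomic_compare_exchange_strong(preempted, &state,
			(uint8_t)(state | KVM_VCPU_SHOULD_FLUSH));
}
```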
[PATCH v2 3/4] KVM: X86: introduce invalidate_gpa argument to tlb flush
From: Wanpeng Li Introduce a new bool invalidate_gpa argument to kvm_x86_ops->tlb_flush, it will be used by later patches to just flush guest tlb. Cc: Paolo Bonzini Cc: Radim Krčmář Signed-off-by: Wanpeng Li --- arch/x86/include/asm/kvm_host.h | 2 +- arch/x86/kvm/svm.c | 14 +++--- arch/x86/kvm/vmx.c | 21 +++-- arch/x86/kvm/x86.c | 6 +++--- 4 files changed, 22 insertions(+), 21 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index c73e493..b4f7bb1 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -952,7 +952,7 @@ struct kvm_x86_ops { unsigned long (*get_rflags)(struct kvm_vcpu *vcpu); void (*set_rflags)(struct kvm_vcpu *vcpu, unsigned long rflags); - void (*tlb_flush)(struct kvm_vcpu *vcpu); + void (*tlb_flush)(struct kvm_vcpu *vcpu, bool invalidate_gpa); void (*run)(struct kvm_vcpu *vcpu); int (*handle_exit)(struct kvm_vcpu *vcpu); diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 0e68f0b..efaf95f 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -285,7 +285,7 @@ static int vgif = true; module_param(vgif, int, 0444); static void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0); -static void svm_flush_tlb(struct kvm_vcpu *vcpu); +static void svm_flush_tlb(struct kvm_vcpu *vcpu, bool invalidate_gpa); static void svm_complete_interrupts(struct vcpu_svm *svm); static int nested_svm_exit_handled(struct vcpu_svm *svm); @@ -2032,7 +2032,7 @@ static int svm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) return 1; if (npt_enabled && ((old_cr4 ^ cr4) & X86_CR4_PGE)) - svm_flush_tlb(vcpu); + svm_flush_tlb(vcpu, true); vcpu->arch.cr4 = cr4; if (!npt_enabled) @@ -2368,7 +2368,7 @@ static void nested_svm_set_tdp_cr3(struct kvm_vcpu *vcpu, svm->vmcb->control.nested_cr3 = __sme_set(root); mark_dirty(svm->vmcb, VMCB_NPT); - svm_flush_tlb(vcpu); + svm_flush_tlb(vcpu, true); } static void nested_svm_inject_npf_exit(struct kvm_vcpu *vcpu, @@ -3033,7 +3033,7 @@ static bool 
nested_svm_vmrun(struct vcpu_svm *svm) svm->nested.intercept_exceptions = nested_vmcb->control.intercept_exceptions; svm->nested.intercept= nested_vmcb->control.intercept; - svm_flush_tlb(&svm->vcpu); + svm_flush_tlb(&svm->vcpu, true); svm->vmcb->control.int_ctl = nested_vmcb->control.int_ctl | V_INTR_MASKING_MASK; if (nested_vmcb->control.int_ctl & V_INTR_MASKING_MASK) svm->vcpu.arch.hflags |= HF_VINTR_MASK; @@ -4755,7 +4755,7 @@ static int svm_set_tss_addr(struct kvm *kvm, unsigned int addr) return 0; } -static void svm_flush_tlb(struct kvm_vcpu *vcpu) +static void svm_flush_tlb(struct kvm_vcpu *vcpu, bool invalidate_gpa) { struct vcpu_svm *svm = to_svm(vcpu); @@ -5046,7 +5046,7 @@ static void svm_set_cr3(struct kvm_vcpu *vcpu, unsigned long root) svm->vmcb->save.cr3 = __sme_set(root); mark_dirty(svm->vmcb, VMCB_CR); - svm_flush_tlb(vcpu); + svm_flush_tlb(vcpu, true); } static void set_tdp_cr3(struct kvm_vcpu *vcpu, unsigned long root) @@ -5060,7 +5060,7 @@ static void set_tdp_cr3(struct kvm_vcpu *vcpu, unsigned long root) svm->vmcb->save.cr3 = kvm_read_cr3(vcpu); mark_dirty(svm->vmcb, VMCB_CR); - svm_flush_tlb(vcpu); + svm_flush_tlb(vcpu, true); } static int is_disabled(void) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index e5bea5e..17d13d2 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -4113,9 +4113,10 @@ static void exit_lmode(struct kvm_vcpu *vcpu) #endif -static inline void __vmx_flush_tlb(struct kvm_vcpu *vcpu, int vpid) +static inline void __vmx_flush_tlb(struct kvm_vcpu *vcpu, int vpid, + bool invalidate_gpa) { - if (enable_ept) { + if (enable_ept && (invalidate_gpa || !enable_vpid)) { if (!VALID_PAGE(vcpu->arch.mmu.root_hpa)) return; ept_sync_context(construct_eptp(vcpu, vcpu->arch.mmu.root_hpa)); @@ -4124,15 +4125,15 @@ static inline void __vmx_flush_tlb(struct kvm_vcpu *vcpu, int vpid) } } -static void vmx_flush_tlb(struct kvm_vcpu *vcpu) +static void vmx_flush_tlb(struct kvm_vcpu *vcpu, bool invalidate_gpa) { - 
__vmx_flush_tlb(vcpu, to_vmx(vcpu)->vpid); + __vmx_flush_tlb(vcpu, to_vmx(vcpu)->vpid, invalidate_gpa); } static void vmx_flush_tlb_ept_only(struct kvm_vcpu *vcpu) { if (enable_ept) - vmx_flush_tlb(vcpu); + vmx_flush_tlb(vcpu, true); } static void vmx_decache_cr0_guest_bits(struct kvm_vcpu *vcpu) @@ -4330,7 +4331,7 @@ static void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3) ept_load_pdptrs(vcpu); } -
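The new condition in __vmx_flush_tlb() boils down to a small decision table, sketched here as a pure function. The condition itself is copied from the hunk above. The fallback to a VPID-context flush is an assumption, since the else branch is not visible in the quoted diff, and the enum and function names are invented.

```c
#include <assert.h>
#include <stdbool.h>

enum flush_kind_sketch {
	FLUSH_EPT_CONTEXT,	/* flush guest-physical mappings as well */
	FLUSH_VPID_CONTEXT,	/* assumed fallback: linear mappings only */
};

enum flush_kind_sketch pick_flush_sketch(bool enable_ept, bool enable_vpid,
					 bool invalidate_gpa)
{
	/*
	 * With EPT, the heavier flush is needed when the caller asks for
	 * GPA invalidation, or when there is no VPID to flush by.
	 */
	if (enable_ept && (invalidate_gpa || !enable_vpid))
		return FLUSH_EPT_CONTEXT;

	return FLUSH_VPID_CONTEXT;
}
```

The only case that changes with the patch is EPT and VPID both enabled with invalidate_gpa false, which now takes the second path.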
[PATCH v2 0/4] KVM: Paravirt remote TLB flush
Remote flushing APIs do a busy wait, which is fine in a bare-metal
scenario. But within the guest, the vCPUs might have been preempted
or blocked. In this scenario, the initiator vCPU would end up
busy-waiting for a long amount of time.

This patch set implements paravirt TLB flush, making sure that it
does not wait for vCPUs that are sleeping. All the sleeping vCPUs
flush the TLB on guest entry. The idea was discussed here:
https://lkml.org/lkml/2012/2/20/157

The best result is achieved when we're overcommitting the host by running
multiple vCPUs on each pCPU. In this case PV TLB flush avoids touching
vCPUs which are not scheduled and avoids the wait on the main CPU.

In addition, thanks to commit 9e52fc2b50d ("x86/mm: Enable RCU based
page table freeing (CONFIG_HAVE_RCU_TABLE_FREE=y)")

Test on a Haswell i7 desktop 4 cores (2HT), so 8 pCPUs, running ebizzy in
one linux guest.

ebizzy -M
              vanilla    optimized     boost
 8 vCPUs       10152       10083      -0.68%
16 vCPUs        1224        4866      297.5%
24 vCPUs        1109        3871      249%
32 vCPUs        1025        3375      229.3%

Wanpeng Li (4):
  KVM: Add vCPU running/preempted state
  KVM: Add paravirt remote TLB flush
  KVM: X86: introduce invalidate_gpa argument to tlb flush
  KVM: Add flush_on_enter before guest enter

 Documentation/virtual/kvm/cpuid.txt  | 10 ++
 arch/x86/include/asm/kvm_host.h      |  2 +-
 arch/x86/include/uapi/asm/kvm_para.h |  6 ++
 arch/x86/kernel/kvm.c                | 35 ++-
 arch/x86/kvm/cpuid.c                 |  3 ++-
 arch/x86/kvm/svm.c                   | 14 +++---
 arch/x86/kvm/vmx.c                   | 21 +++--
 arch/x86/kvm/x86.c                   | 24 +++-
 8 files changed, 86 insertions(+), 29 deletions(-)

-- 
2.7.4
[PATCH v2 4/4] KVM: Add flush_on_enter before guest enter
PV-Flush guest would indicate to flush on enter, flush the TLB before entering and exiting the guest. Cc: Paolo Bonzini Cc: Radim Krčmář Signed-off-by: Wanpeng Li --- arch/x86/kvm/cpuid.c | 3 ++- arch/x86/kvm/x86.c | 22 ++ 2 files changed, 16 insertions(+), 9 deletions(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 0099e10..2724a5c 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -594,7 +594,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, (1 << KVM_FEATURE_ASYNC_PF) | (1 << KVM_FEATURE_PV_EOI) | (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) | -(1 << KVM_FEATURE_PV_UNHALT); +(1 << KVM_FEATURE_PV_UNHALT) | +(1 << KVM_FEATURE_PV_TLB_FLUSH); if (sched_info_on()) entry->eax |= (1 << KVM_FEATURE_STEAL_TIME); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 2b2cc99..7e80be4 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2107,6 +2107,12 @@ static void kvmclock_reset(struct kvm_vcpu *vcpu) vcpu->arch.pv_time_enabled = false; } +static void kvm_vcpu_flush_tlb(struct kvm_vcpu *vcpu, bool invalidate_gpa) +{ + ++vcpu->stat.tlb_flush; + kvm_x86_ops->tlb_flush(vcpu, invalidate_gpa); +} + static void record_steal_time(struct kvm_vcpu *vcpu) { if (!(vcpu->arch.st.msr_val & KVM_MSR_ENABLED)) @@ -2116,7 +2122,13 @@ static void record_steal_time(struct kvm_vcpu *vcpu) &vcpu->arch.st.steal, sizeof(struct kvm_steal_time return; - vcpu->arch.st.steal.preempted = KVM_VCPU_NOT_PREEMPTED; + if (xchg(&vcpu->arch.st.steal.preempted, KVM_VCPU_NOT_PREEMPTED) == + (KVM_VCPU_SHOULD_FLUSH | KVM_VCPU_PREEMPTED)) + /* +* Do TLB_FLUSH before entering the guest, its passed +* the stage of request checking +*/ + kvm_vcpu_flush_tlb(vcpu, false); if (vcpu->arch.st.steal.version & 1) vcpu->arch.st.steal.version += 1; /* first time write, random junk */ @@ -2887,7 +2899,7 @@ static void kvm_steal_time_set_preempted(struct kvm_vcpu *vcpu) if (!(vcpu->arch.st.msr_val & KVM_MSR_ENABLED)) return; - 
vcpu->arch.st.steal.preempted = KVM_VCPU_PREEMPTED; + vcpu->arch.st.steal.preempted |= KVM_VCPU_PREEMPTED; kvm_write_guest_offset_cached(vcpu->kvm, &vcpu->arch.st.stime, &vcpu->arch.st.steal.preempted, @@ -6737,12 +6749,6 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu) kvm_x86_ops->load_eoi_exitmap(vcpu, eoi_exit_bitmap); } -static void kvm_vcpu_flush_tlb(struct kvm_vcpu *vcpu, bool invalidate_gpa) -{ - ++vcpu->stat.tlb_flush; - kvm_x86_ops->tlb_flush(vcpu, invalidate_gpa); -} - void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu) { struct page *page = NULL; -- 2.7.4
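The host side of the handshake in record_steal_time() can be sketched the same way: one atomic exchange both clears the state and reports what was there, so a flush request that raced with guest entry is neither lost nor handled twice. C11 atomic_exchange() stands in for the kernel's xchg() here, and consume_preempted_state_sketch() is a hypothetical name; the comparison is the one from the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define KVM_VCPU_NOT_PREEMPTED	(0 << 0)
#define KVM_VCPU_PREEMPTED	(1 << 0)
#define KVM_VCPU_SHOULD_FLUSH	(1 << 1)

static_assert(KVM_VCPU_NOT_PREEMPTED == 0,
	      "clearing the state must also clear every flag");

/*
 * Mark the vCPU as running again and return true if a TLB flush was
 * requested while it was preempted.
 */
bool consume_preempted_state_sketch(_Atomic uint8_t *preempted)
{
	uint8_t old = atomic_exchange(preempted, KVM_VCPU_NOT_PREEMPTED);

	return old == (KVM_VCPU_SHOULD_FLUSH | KVM_VCPU_PREEMPTED);
}
```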
Re: [PATCH] pstore: use ktime_get_real_fast_ns() instead of __getnstimeofday()
On Thu, 9 Nov 2017, Kees Cook wrote:
> On Thu, Nov 9, 2017 at 4:46 PM, Thomas Gleixner wrote:
> > On Fri, 10 Nov 2017, Arnd Bergmann wrote:
> >> On Fri, Nov 10, 2017 at 12:00 AM, Thomas Gleixner
> >> wrote:
> >> > Hmm, no. None of the regular accessor functions can be called from NMI
> >> > context safely.
> >>
> >> Right, that's what I mean: it must not get called from NMI context, but it
> >> currently is, at least for this case:
> >>
> >> NMI handler:
> >>   something bad
> >>     panic()
> >>       kmsg_dump()
> >>         pstore_dump()
> >>           pstore_record_init()
> >>             __getnstimeofday()
> >>
> >> I should probably add that to the changelog text ;-)
> >
> > Indeed.
>
> Er, so, is this safe to call there? I've had to fix this a few times
> now, so if using ktime_get_real_fast_ns() can be used here (and
> doesn't return 0) then this is easily an improvement over the existing
> "maybe read 0" case pstore has now.

ktime_get_real_fast_ns() is NMI safe and returns

  before timekeeping_suspend():  correct time
  after timekeeping_suspend():   timestamp which was frozen in
                                 timekeeping_suspend()
  after timekeeping_resume():    correct time

Thanks,

	tglx
Re: [PATCH v17 5/6] vfio: ABI for mdev display dma-buf operation
On Thu, Nov 09, 2017 at 01:54:57PM -0700, Alex Williamson wrote:
> On Thu, 9 Nov 2017 19:35:14 +0100
> Gerd Hoffmann wrote:
>
> >   Hi,
> >
> > > struct vfio_device_gfx_plane_info lacks the head field we've been
> > > discussing. Thanks,
> >
> > Adding multihead support turned out to not be that easy. There are
> > corner cases like a single framebuffer spawning both heads. Also it
> > would be useful to somehow hint to the guest which heads it should use.
> >
> > In short: Proper multihead support is more complex than just adding a
> > head field for later use. So in a short private discussion with Tina we
> > came to the conclusion that it will be better add multihead support to
> > the API when the first driver wants use it, so we can actually test the
> > interface and make sure we didn't miss anything. Adding a incomplete
> > multihead API now doesn't help anybody.
>
> Do you think we can enable multi-head and preserve backwards
> compatibility within this API proposed here?

Yes, I think we can. Adding new fields is possible thanks to the argsz
field at the start of the struct, so we can easily add the new fields
(head, framebuffer rectangle, whatever else is needed). If the new
fields are not present the driver can simply assume head=0.

Does the driver set argsz too? If so, userspace can detect whether the
driver supports the multihead API extension (before going to probe for
head=1) that way. If not, we probably need an additional probe flag for
that. But in any case I'm confident this is solvable.

Passing hints about the display configuration to the guest needs a new
ioctl, so we don't have compatibility issues there.

cheers,
  Gerd
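A minimal sketch of the argsz idea, assuming a layout that is NOT the real struct vfio_device_gfx_plane_info: the receiver looks at argsz first and only reads a later field if the caller's struct is large enough to contain it, falling back to head=0 otherwise. The two structs and plane_info_head_sketch() are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical original layout: no head field. */
struct plane_info_v1_sketch {
	uint32_t argsz;
	uint32_t flags;
};

/* Hypothetical extended layout: head appended at the end. */
struct plane_info_v2_sketch {
	uint32_t argsz;
	uint32_t flags;
	uint32_t head;
};

/*
 * Return the head the caller asked for, or 0 if the caller's struct is
 * too small to carry a head field at all.
 */
uint32_t plane_info_head_sketch(const void *arg)
{
	const size_t head_off = offsetof(struct plane_info_v2_sketch, head);
	uint32_t argsz, head;

	assert(arg);

	/* argsz is always the first member, whatever the version. */
	memcpy(&argsz, arg, sizeof(argsz));
	if (argsz < head_off + sizeof(head))
		return 0;

	memcpy(&head, (const char *)arg + head_off, sizeof(head));
	return head;
}
```

Whether the driver writes argsz back, which is the open question above, decides if the same trick also works in the other direction for userspace.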
[PATCH V4 06/12] clk: sprd: add divider clock support
This is a feature that can also be found in sprd composite clocks, so provide a bunch of helpers that can be reused there. Signed-off-by: Chunyan Zhang --- drivers/clk/sprd/Makefile | 1 + drivers/clk/sprd/div.c| 100 ++ drivers/clk/sprd/div.h| 79 3 files changed, 180 insertions(+) create mode 100644 drivers/clk/sprd/div.c create mode 100644 drivers/clk/sprd/div.h diff --git a/drivers/clk/sprd/Makefile b/drivers/clk/sprd/Makefile index cee36b5..80e6039 100644 --- a/drivers/clk/sprd/Makefile +++ b/drivers/clk/sprd/Makefile @@ -3,3 +3,4 @@ obj-$(CONFIG_SPRD_COMMON_CLK) += clk-sprd.o clk-sprd-y += common.o clk-sprd-y += gate.o clk-sprd-y += mux.o +clk-sprd-y += div.o diff --git a/drivers/clk/sprd/div.c b/drivers/clk/sprd/div.c new file mode 100644 index 000..3e08dcd --- /dev/null +++ b/drivers/clk/sprd/div.c @@ -0,0 +1,100 @@ +/* + * Spreadtrum divider clock driver + * + * Copyright (C) 2017 Spreadtrum, Inc. + * Author: Chunyan Zhang + * + * SPDX-License-Identifier: GPL-2.0 + */ + +#include + +#include "div.h" + +DEFINE_SPINLOCK(sprd_div_lock); +EXPORT_SYMBOL_GPL(sprd_div_lock); + +long sprd_div_helper_round_rate(struct sprd_clk_common *common, + const struct sprd_div_internal *div, + unsigned long rate, + unsigned long *parent_rate) +{ + return divider_round_rate(&common->hw, rate, parent_rate, + NULL, div->width, 0); +} +EXPORT_SYMBOL_GPL(sprd_div_helper_round_rate); + +static long sprd_div_round_rate(struct clk_hw *hw, unsigned long rate, + unsigned long *parent_rate) +{ + struct sprd_div *cd = hw_to_sprd_div(hw); + + return sprd_div_helper_round_rate(&cd->common, &cd->div, + rate, parent_rate); +} + +unsigned long sprd_div_helper_recalc_rate(struct sprd_clk_common *common, + const struct sprd_div_internal *div, + unsigned long parent_rate) +{ + unsigned long val; + unsigned int reg; + + sprd_regmap_read(common->regmap, common->reg, &reg); + val = reg >> div->shift; + val &= (1 << div->width) - 1; + + return divider_recalc_rate(&common->hw, parent_rate, val, NULL, 0); +} 
+EXPORT_SYMBOL_GPL(sprd_div_helper_recalc_rate); + +static unsigned long sprd_div_recalc_rate(struct clk_hw *hw, + unsigned long parent_rate) +{ + struct sprd_div *cd = hw_to_sprd_div(hw); + + return sprd_div_helper_recalc_rate(&cd->common, &cd->div, parent_rate); +} + +int sprd_div_helper_set_rate(const struct sprd_clk_common *common, +const struct sprd_div_internal *div, +unsigned long rate, +unsigned long parent_rate) +{ + unsigned long flags; + unsigned long val; + unsigned int reg; + + val = divider_get_val(rate, parent_rate, NULL, + div->width, 0); + + spin_lock_irqsave(common->lock, flags); + + sprd_regmap_read(common->regmap, common->reg, ®); + reg &= ~GENMASK(div->width + div->shift - 1, div->shift); + + sprd_regmap_write(common->regmap, common->reg, + reg | (val << div->shift)); + + spin_unlock_irqrestore(common->lock, flags); + + return 0; + +} +EXPORT_SYMBOL_GPL(sprd_div_helper_set_rate); + +static int sprd_div_set_rate(struct clk_hw *hw, unsigned long rate, +unsigned long parent_rate) +{ + struct sprd_div *cd = hw_to_sprd_div(hw); + + return sprd_div_helper_set_rate(&cd->common, &cd->div, + rate, parent_rate); +} + +const struct clk_ops sprd_div_ops = { + .recalc_rate = sprd_div_recalc_rate, + .round_rate = sprd_div_round_rate, + .set_rate = sprd_div_set_rate, +}; +EXPORT_SYMBOL_GPL(sprd_div_ops); diff --git a/drivers/clk/sprd/div.h b/drivers/clk/sprd/div.h new file mode 100644 index 000..fa47773 --- /dev/null +++ b/drivers/clk/sprd/div.h @@ -0,0 +1,79 @@ +/* + * Spreadtrum divider clock driver + * + * Copyright (C) 2017 Spreadtrum, Inc. 
+ * Author: Chunyan Zhang + * + * SPDX-License-Identifier: GPL-2.0 + */ + +#ifndef _SPRD_DIV_H_ +#define _SPRD_DIV_H_ + +#include "common.h" + +/** + * struct sprd_div_internal - Internal divider description + * @shift: Bit offset of the divider in its register + * @width: Width of the divider field in its register + * + * That structure represents a single divider, and is meant to be + * embedded in other structures representing the various clock + * classes. + */ +struct sprd_div_internal { + u8 shift; + u8 width; +}; + +#define _SPRD_DIV_CLK(_shift, _width) \ + {
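For readers following the recalc_rate/set_rate helpers in the patch above, the shift/width handling can be sketched in plain userspace C. This is only an illustration under stated assumptions: `struct div_field` and the `div_field_*` functions are hypothetical stand-ins, not names from the patch, and the sketch ignores the regmap access and the spinlock that the real helpers use.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct sprd_div_internal from the patch. */
struct div_field {
	uint8_t shift;	/* bit offset of the divider field in the register */
	uint8_t width;	/* width of the divider field in bits */
};

/* In-place mask of the field; assumes width > 0 and shift + width <= 32. */
static uint32_t div_field_mask(const struct div_field *f)
{
	uint32_t low = (f->width >= 32) ? UINT32_MAX
					: ((UINT32_C(1) << f->width) - 1);

	return low << f->shift;
}

/* Mirrors the read path: isolate the field and shift it down to bit 0. */
static uint32_t div_field_get(uint32_t reg, const struct div_field *f)
{
	return (reg & div_field_mask(f)) >> f->shift;
}

/* Mirrors the write path: clear the field, then OR in the new value. */
static uint32_t div_field_set(uint32_t reg, const struct div_field *f,
			      uint32_t val)
{
	uint32_t mask = div_field_mask(f);

	return (reg & ~mask) | ((val << f->shift) & mask);
}
```

Unlike the patch, the sketch also masks the new value before merging it, so an out-of-range value cannot spill into neighbouring bits; whether the driver needs that depends on what divider_get_val() can return.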
[PATCH V4 12/12] arm64: dts: add clocks for SC9860
Some clocks on SC9860 are in the same address area with syscon devices, those are what have a property of 'sprd,syscon' which would refer to syscon devices, others would have a reg property indicated their address ranges. Signed-off-by: Chunyan Zhang --- arch/arm64/boot/dts/sprd/sc9860.dtsi | 115 +++ arch/arm64/boot/dts/sprd/whale2.dtsi | 2 +- 2 files changed, 116 insertions(+), 1 deletion(-) diff --git a/arch/arm64/boot/dts/sprd/sc9860.dtsi b/arch/arm64/boot/dts/sprd/sc9860.dtsi index 7b7d8ce..bf03da4 100644 --- a/arch/arm64/boot/dts/sprd/sc9860.dtsi +++ b/arch/arm64/boot/dts/sprd/sc9860.dtsi @@ -7,6 +7,7 @@ */ #include +#include #include "whale2.dtsi" / { @@ -183,6 +184,120 @@ }; soc { + pmu_gate: pmu-gate { + compatible = "sprd,sc9860-pmu-gate"; + sprd,syscon = <&pmu_regs>; /* 0x402b */ + clocks = <&ext_26m>; + #clock-cells = <1>; + }; + + pll: pll { + compatible = "sprd,sc9860-pll"; + sprd,syscon = <&ana_regs>; /* 0x4040 */ + clocks = <&pmu_gate 0>; + #clock-cells = <1>; + }; + + ap_clk: clock-controller@2000 { + compatible = "sprd,sc9860-ap-clk"; + reg = <0 0x2000 0 0x400>; + clocks = <&ext_26m>, <&pll 0>, +<&pmu_gate 0>; + #clock-cells = <1>; + }; + + aon_prediv: aon-prediv { + compatible = "sprd,sc9860-aon-prediv"; + reg = <0 0x402d 0 0x400>; + clocks = <&ext_26m>, <&pll 0>, +<&pmu_gate 0>; + #clock-cells = <1>; + }; + + apahb_gate: apahb-gate { + compatible = "sprd,sc9860-apahb-gate"; + sprd,syscon = <&ap_ahb_regs>; /* 0x2021 */ + clocks = <&aon_prediv 0>; + #clock-cells = <1>; + }; + + aon_gate: aon-gate { + compatible = "sprd,sc9860-aon-gate"; + sprd,syscon = <&aon_regs>; /* 0x402e */ + clocks = <&aon_prediv 0>; + #clock-cells = <1>; + }; + + aonsecure_clk: clock-controller@4088 { + compatible = "sprd,sc9860-aonsecure-clk"; + reg = <0 0x4088 0 0x400>; + clocks = <&ext_26m>, <&pll 0>; + #clock-cells = <1>; + }; + + agcp_gate: agcp-gate { + compatible = "sprd,sc9860-agcp-gate"; + sprd,syscon = <&agcp_regs>; /* 0x415e */ + clocks = <&aon_prediv 0>; + 
#clock-cells = <1>; + }; + + gpu_clk: clock-controller@6020 { + compatible = "sprd,sc9860-gpu-clk"; + reg = <0 0x6020 0 0x400>; + clocks = <&pll 0>; + #clock-cells = <1>; + }; + + vsp_clk: clock-controller@6100 { + compatible = "sprd,sc9860-vsp-clk"; + reg = <0 0x6100 0 0x400>; + clocks = <&ext_26m>, <&pll 0>; + #clock-cells = <1>; + }; + + vsp_gate: vsp-gate { + compatible = "sprd,sc9860-vsp-gate"; + sprd,syscon = <&vsp_regs>; /* 0x6110 */ + clocks = <&vsp_clk 0>; + #clock-cells = <1>; + }; + + cam_clk: clock-controller@6200 { + compatible = "sprd,sc9860-cam-clk"; + reg = <0 0x6200 0 0x4000>; + clocks = <&ext_26m>, <&pll 0>; + #clock-cells = <1>; + }; + + cam_gate: cam-gate { + compatible = "sprd,sc9860-cam-gate"; + sprd,syscon = <&cam_regs>; /* 0x6210 */ + clocks = <&cam_clk 0>; + #clock-cells = <1>; + }; + + disp_clk: clock-controller@6300 { + compatible = "sprd,sc9860-disp-clk"; + reg = <0 0x6300 0 0x400>; + clocks = <&ext_26m>, <&pll 0>; + #clock-cells = <1>; + }; + + disp_gate: disp-gate { + compatible = "sprd,sc9860-disp-gate"; + sprd,syscon = <&disp_regs>; /* 0x6310 */ +
[PATCH V4 10/12] clk: sprd: add clocks support for SC9860
This patch added the list of clocks for Spreadtrum's SC9860 SoC, together with clock initialization code. Signed-off-by: Chunyan Zhang --- drivers/clk/sprd/Kconfig | 10 + drivers/clk/sprd/Makefile |3 + drivers/clk/sprd/sc9860-clk.c | 1987 + 3 files changed, 2000 insertions(+) create mode 100644 drivers/clk/sprd/sc9860-clk.c diff --git a/drivers/clk/sprd/Kconfig b/drivers/clk/sprd/Kconfig index 67a3287..8789247 100644 --- a/drivers/clk/sprd/Kconfig +++ b/drivers/clk/sprd/Kconfig @@ -2,3 +2,13 @@ config SPRD_COMMON_CLK tristate "Clock support for Spreadtrum SoCs" depends on ARCH_SPRD || COMPILE_TEST default ARCH_SPRD + +if SPRD_COMMON_CLK + +# SoC Drivers + +config SPRD_SC9860_CLK + tristate "Support for the Spreadtrum SC9860 clocks" + depends on (ARM64 && ARCH_SPRD) || COMPILE_TEST + default ARM64 && ARCH_SPRD +endif diff --git a/drivers/clk/sprd/Makefile b/drivers/clk/sprd/Makefile index d693969..b0d81e5 100644 --- a/drivers/clk/sprd/Makefile +++ b/drivers/clk/sprd/Makefile @@ -6,3 +6,6 @@ clk-sprd-y += mux.o clk-sprd-y += div.o clk-sprd-y += composite.o clk-sprd-y += pll.o + +## SoC support +obj-$(CONFIG_SPRD_SC9860_CLK) += sc9860-clk.o diff --git a/drivers/clk/sprd/sc9860-clk.c b/drivers/clk/sprd/sc9860-clk.c new file mode 100644 index 000..caf7194 --- /dev/null +++ b/drivers/clk/sprd/sc9860-clk.c @@ -0,0 +1,1987 @@ +/* + * Spreatrum SC9860 clock driver + * + * Copyright (C) 2017 Spreadtrum, Inc. 
+ * Author: Chunyan Zhang + * + * SPDX-License-Identifier: GPL-2.0 + */ + +#include +#include +#include +#include +#include +#include +#include + +#include + +#include "common.h" +#include "composite.h" +#include "div.h" +#include "gate.h" +#include "mux.h" +#include "pll.h" + +static CLK_FIXED_RATE(ext_rco_100m, "ext-rco-100m", 0, 1, 0); +static CLK_FIXED_RATE(ext_32k, "ext-32k", 0, 32768, 0); + +static CLK_FIXED_FACTOR(fac_4m,"fac-4m", "ext-26m", + 6, 1, 0); +static CLK_FIXED_FACTOR(fac_2m,"fac-2m", "ext-26m", + 13, 1, 0); +static CLK_FIXED_FACTOR(fac_1m,"fac-1m", "ext-26m", + 26, 1, 0); +static CLK_FIXED_FACTOR(fac_250k, "fac-250k", "ext-26m", + 104, 1, 0); +static CLK_FIXED_FACTOR(fac_rpll0_26m, "rpll0-26m","ext-26m", + 1, 1, 0); +static CLK_FIXED_FACTOR(fac_rpll1_26m, "rpll1-26m","ext-26m", + 1, 1, 0); +static CLK_FIXED_FACTOR(fac_rco_25m, "rco-25m", "ext-rc0-100m", + 4, 1, 0); +static CLK_FIXED_FACTOR(fac_rco_4m,"rco-4m", "ext-rc0-100m", + 25, 1, 0); +static CLK_FIXED_FACTOR(fac_rco_2m,"rco-2m", "ext-rc0-100m", + 50, 1, 0); +static CLK_FIXED_FACTOR(fac_3k2, "fac-3k2", "ext-32k", + 10, 1, 0); +static CLK_FIXED_FACTOR(fac_1k,"fac-1k", "ext-32k", + 32, 1, 0); + +static SPRD_SC_GATE_CLK(mpll0_gate,"mpll0-gate", "ext-26m", 0xb0, +0x1000, BIT(2), CLK_IGNORE_UNUSED, 0); +static SPRD_SC_GATE_CLK(mpll1_gate,"mpll1-gate", "ext-26m", 0xb0, +0x1000, BIT(18), CLK_IGNORE_UNUSED, 0); +static SPRD_SC_GATE_CLK(dpll0_gate,"dpll0-gate", "ext-26m", 0xb4, +0x1000, BIT(2), CLK_IGNORE_UNUSED, 0); +static SPRD_SC_GATE_CLK(dpll1_gate,"dpll1-gate", "ext-26m", 0xb4, +0x1000, BIT(18), CLK_IGNORE_UNUSED, 0); +static SPRD_SC_GATE_CLK(ltepll0_gate, "ltepll0-gate", "ext-26m", 0xb8, +0x1000, BIT(2), CLK_IGNORE_UNUSED, 0); +static SPRD_SC_GATE_CLK(twpll_gate,"twpll-gate", "ext-26m", 0xbc, +0x1000, BIT(2), CLK_IGNORE_UNUSED, 0); +static SPRD_SC_GATE_CLK(ltepll1_gate, "ltepll1-gate", "ext-26m", 0x10c, +0x1000, BIT(2), CLK_IGNORE_UNUSED, 0); +static SPRD_SC_GATE_CLK(rpll0_gate,"rpll0-gate", 
"ext-26m", 0x16c, +0x1000, BIT(2), 0, 0); +static SPRD_SC_GATE_CLK(rpll1_gate,"rpll1-gate", "ext-26m", 0x16c, +0x1000, BIT(18), 0, 0); +static SPRD_SC_GATE_CLK(cppll_gate,"cppll-gate", "ext-26m", 0x2b4, +0x1000, BIT(2), CLK_IGNORE_UNUSED, 0); +static SPRD_SC_GATE_CLK(gpll_gate, "gpll-gate","ext-26m", 0x32c, + 0x1000, BIT(0), CLK_IGNORE_UNUSED, CLK_GATE_SET_TO_DISABLE); + +static struct sprd_clk_common *sc9860_pmu_gate_clks[] = { + /* address base is 0x402b */ + &mpll0_gate.common, + &mpll1_gate.common, + &dpll0_gate.common, + &dpll1_gate.common, +
[PATCH V4 09/12] clk: sprd: Add dt-bindings include file for SC9860
This file defines all SC9860 clock indexes. It should be included in any
device tree that has devices using these clocks.

Signed-off-by: Chunyan Zhang
---
 include/dt-bindings/clock/sprd,sc9860-clk.h | 408
 1 file changed, 408 insertions(+)
 create mode 100644 include/dt-bindings/clock/sprd,sc9860-clk.h

diff --git a/include/dt-bindings/clock/sprd,sc9860-clk.h b/include/dt-bindings/clock/sprd,sc9860-clk.h
new file mode 100644
index 000..48e6052
--- /dev/null
+++ b/include/dt-bindings/clock/sprd,sc9860-clk.h
@@ -0,0 +1,408 @@
+/*
+ * Spreadtrum SC9860 platform clocks
+ *
+ * Copyright (C) 2017, Spreadtrum Communications Inc.
+ *
+ * SPDX-License-Identifier: (GPL-2.0+ OR MIT)
+ */
+
+#ifndef _DT_BINDINGS_CLK_SC9860_H_
+#define _DT_BINDINGS_CLK_SC9860_H_
+
+#define	CLK_EXT_RCO_100M	0
+#define	CLK_EXT_32K		1
+#define	CLK_FAC_4M		2
+#define	CLK_FAC_2M		3
+#define	CLK_FAC_1M		4
+#define	CLK_FAC_250K		5
+#define	CLK_FAC_RPLL0_26M	6
+#define	CLK_FAC_RPLL1_26M	7
+#define	CLK_FAC_RCO25M		8
+#define	CLK_FAC_RCO4M		9
+#define	CLK_FAC_RCO2M		10
+#define	CLK_FAC_3K2		11
+#define	CLK_FAC_1K		12
+#define	CLK_MPLL0_GATE		13
+#define	CLK_MPLL1_GATE		14
+#define	CLK_DPLL0_GATE		15
+#define	CLK_DPLL1_GATE		16
+#define	CLK_LTEPLL0_GATE	17
+#define	CLK_TWPLL_GATE		18
+#define	CLK_LTEPLL1_GATE	19
+#define	CLK_RPLL0_GATE		20
+#define	CLK_RPLL1_GATE		21
+#define	CLK_CPPLL_GATE		22
+#define	CLK_GPLL_GATE		23
+#define	CLK_PMU_GATE_NUM	(CLK_GPLL_GATE + 1)
+
+#define	CLK_MPLL0		0
+#define	CLK_MPLL1		1
+#define	CLK_DPLL0		2
+#define	CLK_DPLL1		3
+#define	CLK_RPLL0		4
+#define	CLK_RPLL1		5
+#define	CLK_TWPLL		6
+#define	CLK_LTEPLL0		7
+#define	CLK_LTEPLL1		8
+#define	CLK_GPLL		9
+#define	CLK_CPPLL		10
+#define	CLK_GPLL_42M5		11
+#define	CLK_TWPLL_768M		12
+#define	CLK_TWPLL_384M		13
+#define	CLK_TWPLL_192M		14
+#define	CLK_TWPLL_96M		15
+#define	CLK_TWPLL_48M		16
+#define	CLK_TWPLL_24M		17
+#define	CLK_TWPLL_12M		18
+#define	CLK_TWPLL_512M		19
+#define	CLK_TWPLL_256M		20
+#define	CLK_TWPLL_128M		21
+#define	CLK_TWPLL_64M		22
+#define	CLK_TWPLL_307M2		23
+#define	CLK_TWPLL_153M6		24
+#define	CLK_TWPLL_76M8		25
+#define	CLK_TWPLL_51M2		26
+#define	CLK_TWPLL_38M4		27
+#define	CLK_TWPLL_19M2		28
+#define	CLK_L0_614M4		29
+#define	CLK_L0_409M6		30
+#define	CLK_L0_38M		31
+#define	CLK_L1_38M		32
+#define	CLK_RPLL0_192M		33
+#define	CLK_RPLL0_96M		34
+#define	CLK_RPLL0_48M		35
+#define	CLK_RPLL1_468M		36
+#define	CLK_RPLL1_192M		37
+#define	CLK_RPLL1_96M		38
+#define	CLK_RPLL1_64M		39
+#define	CLK_RPLL1_48M		40
+#define	CLK_DPLL0_50M		41
+#define	CLK_DPLL1_50M		42
+#define	CLK_CPPLL_50M		43
+#define	CLK_M0_39M		44
+#define	CLK_M1_63M		45
+#define	CLK_PLL_NUM		(CLK_M1_63M + 1)
+
+
+#define	CLK_AP_APB		0
+#define	CLK_AP_USB3		1
+#define	CLK_UART0		2
+#define	CLK_UART1		3
+#define	CLK_UART2		4
+#define	CLK_UART3		5
+#define	CLK_UART4		6
+#define	CLK_I2C0		7
+#define	CLK_I2C1		8
+#define	CLK_I2C2		9
+#define	CLK_I2C3		10
+#define	CLK_I2C4		11
+#define	CLK_I2C5		12
+#define	CLK_SPI0		13
+#define	CLK_SPI1		14
+#define	CLK_SPI2		15
+#define	CLK_SPI3		16
+#define	CLK_IIS0		17
+#define	CLK_IIS1		18
+#define	CLK_IIS2		19
+#define	CLK_IIS3		20
+#define	CLK_AP_CLK_NUM		(CLK_IIS3 + 1)
+
+#define	CLK_AON_APB		0
+#define	CLK_AUX0		1
+#define	CLK_AUX1		2
+#define	CLK_AUX2
Re: [RFC PATCH 0/2] apply write hints to select the type of segments
On 2017/11/10 8:23, Hyunchul Lee wrote:
> Hello, Chao
>
> On 11/09/2017 06:12 PM, Chao Yu wrote:
>> On 2017/11/9 13:51, Hyunchul Lee wrote:
>>> From: Hyunchul Lee
>>>
>>> Using write hints[1], applications can inform the life time of the data
>>> written to devices. and this[2] reported that the write hints patch
>>> decreased writes in NAND by 25%.
>>>
>>> This hints help F2FS to determine the followings.
>>> 1) the segment types where the data will be written.
>>> 2) the hints that will be passed down to devices with the data of
>>> segments.
>>>
>>> This patch set implements the first mapping from write hints to segment
>>> types
>>> as shown below.
>>>
>>> hints               segment type
>>> -----               ------------
>>> WRITE_LIFE_SHORT    CURSEG_COLD_DATA
>>> WRITE_LIFE_EXTREME  CURSEG_HOT_DATA
>>> others              CURSEG_WARM_DATA
>>>
>>> The F2FS policy for hot/cold separation has precedence over this hints, And
>>> hints are not applied in in-place update.
>>
>> Could we change to disable IPU if file/inode write hint is existing?
>>
>
> I am afraid that this makes side effects. for example, this could cause
> out-of-place updates even when there are not enough free segments.
> I can write the patch that handles these situations. But I wonder
> that this is required, and I am not sure which IPU policies can be disabled.

Oh, as I replied in another thread, I think IPU only affects the
filesystem's hot/cold separation rather than this feature, so I think it
is okay not to consider it.

>
>>>
>>> Before the second mapping is implemented, write hints are not passed down
>>> to devices. Because it is better that the data of a segment have the same
>>> hint.
>>>
>>> [1]: c75b1d9421f80f4143e389d2d50ddfc8a28c8c35
>>> [2]: https://lwn.net/Articles/726477/
>>
>> Could you write a patch to support passing write hint to block layer for
>> buffered writes as below commit:
>> 0127251c45ae ("ext4: add support for passing in write hints for buffered
>> writes")
>>
>
> Sure I will. I wrote it already ;)

Cool, ;)

> I think that datas from the same segment should be passed down with the same
> hint, and the following mapping is reasonable. I wonder what is your opinion
> about it.
>
> segment type        hints
> ------------        -----
> CURSEG_COLD_DATA    WRITE_LIFE_EXTREME
> CURSEG_HOT_DATA     WRITE_LIFE_SHORT
> CURSEG_COLD_NODE    WRITE_LIFE_NORMAL

We have WRITE_LIFE_LONG defined rather than WRITE_LIFE_NORMAL in fs.h?

> CURSEG_HOT_NODE     WRITE_LIFE_MEDIUM

As far as I know, in the cell phone scenario, data of meta_inode is the
hottest, followed by hot data and warm node; cold node should be the
coldest. So I suggest we define the mapping as below:

META_DATA               WRITE_LIFE_SHORT
HOT_DATA & WARM_NODE    WRITE_LIFE_MEDIUM
HOT_NODE & WARM_DATA    WRITE_LIFE_LONG
COLD_NODE & COLD_DATA   WRITE_LIFE_EXTREME

Thanks,

> others              WRITE_LIFE_NONE
>
>> Thanks,
>>
>>>
>>> Hyunchul Lee (2):
>>>   f2fs: apply write hints to select the type of segments for buffered
>>>     write
>>>   f2fs: apply write hints to select the type of segment for direct write
>>>
>>>  fs/f2fs/data.c    | 101
>>>  fs/f2fs/f2fs.h    |   1
>>>  fs/f2fs/segment.c |  14
>>>  3 files changed, 74 insertions(+), 42 deletions(-)
>>>
>>
>>
>
> Thanks
>
> .
>
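The segment-type-to-hint mapping suggested in the reply above can be written down as a small lookup. This is only a sketch of that proposal, not code from the patch set: the enum names here are local stand-ins (in the kernel the hint values are the WRITE_LIFE_* constants from fs.h, and F2FS uses its own CURSEG_* segment types), and falling back to "none" for anything unlisted is an assumption taken from the "others" row of the quoted table.

```c
/* Local stand-ins for the kernel's WRITE_LIFE_* hint values. */
enum write_life {
	LIFE_NONE,
	LIFE_SHORT,
	LIFE_MEDIUM,
	LIFE_LONG,
	LIFE_EXTREME
};

/* Local stand-ins for the segment/data temperatures named in the thread. */
enum seg_type {
	META_DATA,
	HOT_DATA,
	WARM_DATA,
	COLD_DATA,
	HOT_NODE,
	WARM_NODE,
	COLD_NODE
};

/* Hottest data gets the shortest lifetime hint, coldest the longest. */
static enum write_life seg_type_to_hint(enum seg_type type)
{
	switch (type) {
	case META_DATA:
		return LIFE_SHORT;
	case HOT_DATA:
	case WARM_NODE:
		return LIFE_MEDIUM;
	case HOT_NODE:
	case WARM_DATA:
		return LIFE_LONG;
	case COLD_NODE:
	case COLD_DATA:
		return LIFE_EXTREME;
	}
	return LIFE_NONE;	/* assumed fallback for anything unlisted */
}
```

Keeping one hint per segment type matches the point made earlier in the thread that all data in a segment should be passed down with the same hint.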
[PATCH V4 11/12] arm64: dts: add syscon for whale2 platform
Some clocks on SC9860 share an address area with syscon devices; the DT
definitions of those clocks will reference the proper syscon node.

Signed-off-by: Chunyan Zhang
---
 arch/arm64/boot/dts/sprd/whale2.dtsi | 46 +++-
 1 file changed, 45 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/boot/dts/sprd/whale2.dtsi b/arch/arm64/boot/dts/sprd/whale2.dtsi
index 7c217c5..6ea3a75 100644
--- a/arch/arm64/boot/dts/sprd/whale2.dtsi
+++ b/arch/arm64/boot/dts/sprd/whale2.dtsi
@@ -17,6 +17,51 @@
 		#size-cells = <2>;
 		ranges;
 
+		ap_ahb_regs: syscon@2021 {
+			compatible = "syscon";
+			reg = <0 0x2021 0 0x1>;
+		};
+
+		pmu_regs: syscon@402b {
+			compatible = "syscon";
+			reg = <0 0x402b 0 0x1>;
+		};
+
+		aon_regs: syscon@402e {
+			compatible = "syscon";
+			reg = <0 0x402e 0 0x1>;
+		};
+
+		ana_regs: syscon@4040 {
+			compatible = "syscon";
+			reg = <0 0x4040 0 0x1>;
+		};
+
+		agcp_regs: syscon@415e {
+			compatible = "syscon";
+			reg = <0 0x415e 0 0x100>;
+		};
+
+		vsp_regs: syscon@6110 {
+			compatible = "syscon";
+			reg = <0 0x6110 0 0x1>;
+		};
+
+		cam_regs: syscon@6210 {
+			compatible = "syscon";
+			reg = <0 0x6210 0 0x1>;
+		};
+
+		disp_regs: syscon@6310 {
+			compatible = "syscon";
+			reg = <0 0x6310 0 0x1>;
+		};
+
+		ap_apb_regs: syscon@70b0 {
+			compatible = "syscon";
+			reg = <0 0x70b0 0 0x4>;
+		};
+
 		ap-apb {
 			compatible = "simple-bus";
 			#address-cells = <1>;
@@ -59,7 +104,6 @@
 				status = "disabled";
 			};
 		};
-
 	};
 
 	ext_26m: ext-26m {
-- 
2.7.4
[PATCH v2] checkpatch: Fix checks for Kconfig help text
If a patch has a Kconfig section, the script variable '$is_start' is set
by the first 'config' line and '$is_end' is set by the second 'config'
line. But patches often have only one 'config' line, so '$is_end' is never
set; as a result the condition below is never true and the check of the
Kconfig description is skipped:

  if ($is_start && $is_end && $length < $min_conf_desc_length) { .. }

When the script reaches this condition, parsing of the Kconfig section has
already completed, whether or not '$is_end' is true. So remove '$is_end'
from the condition.

Also change '$min_conf_desc_length' from 4 to 1, so that the check passes
if the Kconfig description has at least one line.

Signed-off-by: Leo Yan
---
 scripts/checkpatch.pl | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 3453df9..ba724b0 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -51,7 +51,7 @@ my $configuration_file = ".checkpatch.conf";
 my $max_line_length = 80;
 my $ignore_perl_version = 0;
 my $minimum_perl_version = 5.10.0;
-my $min_conf_desc_length = 4;
+my $min_conf_desc_length = 1;
 my $spelling_file = "$D/spelling.txt";
 my $codespell = 0;
 my $codespellfile = "/usr/share/codespell/dictionary.txt";
@@ -2796,7 +2796,7 @@ sub process {
 					}
 					$length++;
 				}
-				if ($is_start && $is_end && $length < $min_conf_desc_length) {
+				if ($is_start && $length < $min_conf_desc_length) {
 					WARN("CONFIG_DESCRIPTION",
 					     "please write a paragraph that describes the config symbol fully\n" . $herecurr);
 				}
-- 
2.7.4
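To make the effect of the condition change easier to see, here is a small C model of the two predicates. It is only an illustration: checkpatch itself is Perl, the function names are made up for this note, and the arguments stand for the values the commit message says '$is_start', '$is_end' and '$length' hold once the Kconfig section has been parsed.

```c
#include <stdbool.h>

/* Condition before the patch: also requires that is_end was set. */
static bool warns_before(bool is_start, bool is_end, int length, int min_len)
{
	return is_start && is_end && length < min_len;
}

/* Condition after the patch: is_end no longer takes part. */
static bool warns_after(bool is_start, int length, int min_len)
{
	return is_start && length < min_len;
}
```

For a patch with a single 'config' entry and no help text (is_start set, is_end unset, length 0), the old predicate is false with the old minimum of 4, so nothing is reported; the new predicate is true with the new minimum of 1, and a one-line description (length 1) no longer triggers the warning.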
[PATCH V4 07/12] clk: sprd: add composite clock support
This patch introduced composite clock driver for Spreadtrum's SoCs. The functions of this composite clock simply consists of divider and mux clocks. Signed-off-by: Chunyan Zhang --- drivers/clk/sprd/Makefile| 1 + drivers/clk/sprd/composite.c | 65 drivers/clk/sprd/composite.h | 55 + 3 files changed, 121 insertions(+) create mode 100644 drivers/clk/sprd/composite.c create mode 100644 drivers/clk/sprd/composite.h diff --git a/drivers/clk/sprd/Makefile b/drivers/clk/sprd/Makefile index 80e6039..2262e76 100644 --- a/drivers/clk/sprd/Makefile +++ b/drivers/clk/sprd/Makefile @@ -4,3 +4,4 @@ clk-sprd-y += common.o clk-sprd-y += gate.o clk-sprd-y += mux.o clk-sprd-y += div.o +clk-sprd-y += composite.o diff --git a/drivers/clk/sprd/composite.c b/drivers/clk/sprd/composite.c new file mode 100644 index 000..30d5b36 --- /dev/null +++ b/drivers/clk/sprd/composite.c @@ -0,0 +1,65 @@ +/* + * Spreadtrum composite clock driver + * + * Copyright (C) 2017 Spreadtrum, Inc. + * Author: Chunyan Zhang + * + * SPDX-License-Identifier: GPL-2.0 + */ + +#include + +#include "composite.h" + +DEFINE_SPINLOCK(sprd_comp_lock); +EXPORT_SYMBOL_GPL(sprd_comp_lock); + +static long sprd_comp_round_rate(struct clk_hw *hw, unsigned long rate, + unsigned long *parent_rate) +{ + struct sprd_comp *cc = hw_to_sprd_comp(hw); + + return sprd_div_helper_round_rate(&cc->common, &cc->div, +rate, parent_rate); +} + +static unsigned long sprd_comp_recalc_rate(struct clk_hw *hw, + unsigned long parent_rate) +{ + struct sprd_comp *cc = hw_to_sprd_comp(hw); + + return sprd_div_helper_recalc_rate(&cc->common, &cc->div, parent_rate); +} + +static int sprd_comp_set_rate(struct clk_hw *hw, unsigned long rate, +unsigned long parent_rate) +{ + struct sprd_comp *cc = hw_to_sprd_comp(hw); + + return sprd_div_helper_set_rate(&cc->common, &cc->div, + rate, parent_rate); +} + +static u8 sprd_comp_get_parent(struct clk_hw *hw) +{ + struct sprd_comp *cc = hw_to_sprd_comp(hw); + + return sprd_mux_helper_get_parent(&cc->common, 
&cc->mux); +} + +static int sprd_comp_set_parent(struct clk_hw *hw, u8 index) +{ + struct sprd_comp *cc = hw_to_sprd_comp(hw); + + return sprd_mux_helper_set_parent(&cc->common, &cc->mux, index); +} + +const struct clk_ops sprd_comp_ops = { + .get_parent = sprd_comp_get_parent, + .set_parent = sprd_comp_set_parent, + + .round_rate = sprd_comp_round_rate, + .recalc_rate= sprd_comp_recalc_rate, + .set_rate = sprd_comp_set_rate, +}; +EXPORT_SYMBOL_GPL(sprd_comp_ops); diff --git a/drivers/clk/sprd/composite.h b/drivers/clk/sprd/composite.h new file mode 100644 index 000..a9bd68d --- /dev/null +++ b/drivers/clk/sprd/composite.h @@ -0,0 +1,55 @@ +/* + * Spreadtrum composite clock driver + * + * Copyright (C) 2017 Spreadtrum, Inc. + * Author: Chunyan Zhang + * + * SPDX-License-Identifier: GPL-2.0 + */ + +#ifndef _SPRD_COMPOSITE_H_ +#define _SPRD_COMPOSITE_H_ + +#include "common.h" +#include "mux.h" +#include "div.h" + +struct sprd_comp { + struct sprd_mux_sselmux; + struct sprd_div_internaldiv; + struct sprd_clk_common common; +}; + +#define SPRD_COMP_CLK_TABLE(_struct, _name, _parent, _reg, _table, \ + _mshift, _mwidth, _dshift, _dwidth, _flags) \ + struct sprd_comp _struct = {\ + .mux= _SPRD_MUX_CLK(_mshift, _mwidth, _table), \ + .div= _SPRD_DIV_CLK(_dshift, _dwidth), \ + .common = { \ + .regmap = NULL, \ + .reg= _reg, \ + .lock = &sprd_comp_lock, \ + .hw.init = CLK_HW_INIT_PARENTS(_name, \ + _parent, \ + &sprd_comp_ops, \ + _flags), \ +} \ + } + +#define SPRD_COMP_CLK(_struct, _name, _parent, _reg, _mshift, \ + _mwidth, _dshift, _dwidth, _flags) \ + SPRD_COMP_CLK_TABLE(_struct, _name, _parent, _reg, \ + NULL, _mshift, _mwidth, \ + _dshift, _dwidth, _flags) + +static inline struct sprd_comp *hw_to_sprd_comp(const struct clk_hw *hw) +{ + struct sprd_clk_common *common = hw_to_sprd_clk_common(hw);
[PATCH V4 02/12] dt-bindings: Add Spreadtrum clock binding documentation
Introduce a new binding with its documentation for Spreadtrum clock
sub-framework.

Signed-off-by: Chunyan Zhang
---
 Documentation/devicetree/bindings/clock/sprd.txt | 63
 1 file changed, 63 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/clock/sprd.txt

diff --git a/Documentation/devicetree/bindings/clock/sprd.txt b/Documentation/devicetree/bindings/clock/sprd.txt
new file mode 100644
index 000..e9d179e
--- /dev/null
+++ b/Documentation/devicetree/bindings/clock/sprd.txt
@@ -0,0 +1,63 @@
+Spreadtrum Clock Binding
+
+
+Required properties:
+- compatible: should contain the following compatible strings:
+	- "sprd,sc9860-pmu-gate"
+	- "sprd,sc9860-pll"
+	- "sprd,sc9860-ap-clk"
+	- "sprd,sc9860-aon-prediv"
+	- "sprd,sc9860-apahb-gate"
+	- "sprd,sc9860-aon-gate"
+	- "sprd,sc9860-aonsecure-clk"
+	- "sprd,sc9860-agcp-gate"
+	- "sprd,sc9860-gpu-clk"
+	- "sprd,sc9860-vsp-clk"
+	- "sprd,sc9860-vsp-gate"
+	- "sprd,sc9860-cam-clk"
+	- "sprd,sc9860-cam-gate"
+	- "sprd,sc9860-disp-clk"
+	- "sprd,sc9860-disp-gate"
+	- "sprd,sc9860-apapb-gate"
+
+- #clock-cells: must be 1
+
+- clocks : Should be the input parent clock(s) phandle for the clock, this
+	   property here just simply shows which clock group the clocks'
+	   parents are in, since each clk node would represent many clocks
+	   which are defined in the driver. The detailed dependency
+	   relationship (i.e. how many parents and which are the parents)
+	   are implemented in driver code.
+
+Optional properties:
+
+- reg: Contain the registers base address and length. It must be configured
+       only if no 'sprd,syscon' under the node.
+
+- sprd,syscon: phandle to the syscon which is in the same address area with
+	       the clock, and so we can get regmap for the clocks from the
+	       syscon device.
+
+Example:
+
+	pmu_gate: pmu-gate {
+		compatible = "sprd,sc9860-pmu-gate";
+		sprd,syscon = <&pmu_regs>;
+		clocks = <&ext_26m>;
+		#clock-cells = <1>;
+	};
+
+	pll: pll {
+		compatible = "sprd,sc9860-pll";
+		sprd,syscon = <&ana_regs>;
+		clocks = <&pmu_gate 0>;
+		#clock-cells = <1>;
+	};
+
+	ap_clk: clock-controller@2000 {
+		compatible = "sprd,sc9860-ap-clk";
+		reg = <0 0x2000 0 0x400>;
+		clocks = <&ext_26m>, <&pll 0>,
+			 <&pmu_gate 0>;
+		#clock-cells = <1>;
+	};
-- 
2.7.4
[PATCH V4 03/12] clk: sprd: Add common infrastructure
Added Spreadtrum's clock driver framework together with common structures and interface functions. Signed-off-by: Chunyan Zhang --- drivers/clk/Kconfig | 1 + drivers/clk/Makefile | 1 + drivers/clk/sprd/Kconfig | 4 ++ drivers/clk/sprd/Makefile | 3 ++ drivers/clk/sprd/common.c | 113 ++ drivers/clk/sprd/common.h | 54 ++ 6 files changed, 176 insertions(+) create mode 100644 drivers/clk/sprd/Kconfig create mode 100644 drivers/clk/sprd/Makefile create mode 100644 drivers/clk/sprd/common.c create mode 100644 drivers/clk/sprd/common.h diff --git a/drivers/clk/Kconfig b/drivers/clk/Kconfig index 1c4e1aa..ce1a32be 100644 --- a/drivers/clk/Kconfig +++ b/drivers/clk/Kconfig @@ -236,6 +236,7 @@ source "drivers/clk/mvebu/Kconfig" source "drivers/clk/qcom/Kconfig" source "drivers/clk/renesas/Kconfig" source "drivers/clk/samsung/Kconfig" +source "drivers/clk/sprd/Kconfig" source "drivers/clk/sunxi-ng/Kconfig" source "drivers/clk/tegra/Kconfig" source "drivers/clk/ti/Kconfig" diff --git a/drivers/clk/Makefile b/drivers/clk/Makefile index c99f363..fa33891 100644 --- a/drivers/clk/Makefile +++ b/drivers/clk/Makefile @@ -84,6 +84,7 @@ obj-$(CONFIG_COMMON_CLK_SAMSUNG) += samsung/ obj-$(CONFIG_ARCH_SIRF)+= sirf/ obj-$(CONFIG_ARCH_SOCFPGA) += socfpga/ obj-$(CONFIG_PLAT_SPEAR) += spear/ +obj-$(CONFIG_ARCH_SPRD)+= sprd/ obj-$(CONFIG_ARCH_STI) += st/ obj-$(CONFIG_ARCH_SUNXI) += sunxi/ obj-$(CONFIG_ARCH_SUNXI) += sunxi-ng/ diff --git a/drivers/clk/sprd/Kconfig b/drivers/clk/sprd/Kconfig new file mode 100644 index 000..67a3287 --- /dev/null +++ b/drivers/clk/sprd/Kconfig @@ -0,0 +1,4 @@ +config SPRD_COMMON_CLK + tristate "Clock support for Spreadtrum SoCs" + depends on ARCH_SPRD || COMPILE_TEST + default ARCH_SPRD diff --git a/drivers/clk/sprd/Makefile b/drivers/clk/sprd/Makefile new file mode 100644 index 000..74f4b80 --- /dev/null +++ b/drivers/clk/sprd/Makefile @@ -0,0 +1,3 @@ +obj-$(CONFIG_SPRD_COMMON_CLK) += clk-sprd.o + +clk-sprd-y += common.o diff --git a/drivers/clk/sprd/common.c 
b/drivers/clk/sprd/common.c new file mode 100644 index 000..c003f09 --- /dev/null +++ b/drivers/clk/sprd/common.c @@ -0,0 +1,113 @@ +/* + * Spreadtrum clock infrastructure + * + * Copyright (C) 2017 Spreadtrum, Inc. + * Author: Chunyan Zhang + * + * SPDX-License-Identifier: GPL-2.0 + */ + +#include +#include +#include +#include +#include + +#include "common.h" + +static const struct regmap_config sprdclk_regmap_config = { + .reg_bits = 32, + .reg_stride = 4, + .val_bits = 32, + .max_register = 0x, + .fast_io= true, +}; + +static void sprd_clk_set_regmap(const struct sprd_clk_desc *desc, +struct regmap *regmap) +{ + int i; + struct sprd_clk_common *cclk; + + for (i = 0; i < desc->num_clk_clks; i++) { + cclk = desc->clk_clks[i]; + if (!cclk) + continue; + + cclk->regmap = regmap; + } +} + +int sprd_clk_regmap_init(struct platform_device *pdev, +const struct sprd_clk_desc *desc) +{ + void __iomem *base; + struct device_node *node = pdev->dev.of_node; + struct regmap *regmap = NULL; + + if (of_find_property(node, "sprd,syscon", NULL)) { + regmap = syscon_regmap_lookup_by_phandle(node, "sprd,syscon"); + if (IS_ERR(regmap)) { + pr_err("%s: failed to get syscon regmap\n", __func__); + return PTR_ERR(regmap); + } + } else { + base = of_iomap(node, 0); + regmap = devm_regmap_init_mmio(&pdev->dev, base, + &sprdclk_regmap_config); + if (IS_ERR(regmap)) { + pr_err("failed to init regmap.\n"); + return PTR_ERR(regmap); + } + } + + sprd_clk_set_regmap(desc, regmap); + + return 0; +} +EXPORT_SYMBOL_GPL(sprd_clk_regmap_init); + +int sprd_clk_probe(struct device_node *node, + struct clk_hw_onecell_data *clkhw) +{ + int i, ret = 0; + struct clk_hw *hw; + + for (i = 0; i < clkhw->num; i++) { + + hw = clkhw->hws[i]; + + if (!hw) + continue; + + ret = clk_hw_register(NULL, hw); + if (ret) { + pr_err("Couldn't register clock %d - %s\n", + i, hw->init->name); + goto err_clk_unreg; + } + } + + ret = of_clk_add_hw_provider(node, of_clk_hw_onecell_get, +clkhw); + if (ret) { + p
[PATCH V4 08/12] clk: sprd: add adjustable pll support
Introduced a common adjustable pll clock driver for Spreadtrum SoCs. Signed-off-by: Chunyan Zhang --- drivers/clk/sprd/Makefile | 1 + drivers/clk/sprd/pll.c| 268 ++ drivers/clk/sprd/pll.h| 110 +++ 3 files changed, 379 insertions(+) create mode 100644 drivers/clk/sprd/pll.c create mode 100644 drivers/clk/sprd/pll.h diff --git a/drivers/clk/sprd/Makefile b/drivers/clk/sprd/Makefile index 2262e76..d693969 100644 --- a/drivers/clk/sprd/Makefile +++ b/drivers/clk/sprd/Makefile @@ -5,3 +5,4 @@ clk-sprd-y += gate.o clk-sprd-y += mux.o clk-sprd-y += div.o clk-sprd-y += composite.o +clk-sprd-y += pll.o diff --git a/drivers/clk/sprd/pll.c b/drivers/clk/sprd/pll.c new file mode 100644 index 000..1fd8d32 --- /dev/null +++ b/drivers/clk/sprd/pll.c @@ -0,0 +1,268 @@ +/* + * Spreadtrum pll clock driver + * + * Copyright (C) 2015~2017 Spreadtrum, Inc. + * Author: Chunyan Zhang + * + * SPDX-License-Identifier: GPL-2.0 + */ + +#include +#include +#include +#include + +#include "pll.h" + +#define CLK_PLL_1M 100 +#define CLK_PLL_10M(CLK_PLL_1M * 10) + +#define pindex(pll, member)\ + (pll->factors[member].shift / (8 * sizeof(pll->regs_num))) + +#define pshift(pll, member)\ + (pll->factors[member].shift % (8 * sizeof(pll->regs_num))) + +#define pwidth(pll, member)\ + pll->factors[member].width + +#define pmask(pll, member) \ + ((pwidth(pll, member)) ?\ + GENMASK(pwidth(pll, member) + pshift(pll, member) - 1, \ + pshift(pll, member)) : 0) + +#define pinternal(pll, cfg, member)\ + (cfg[pindex(pll, member)] & pmask(pll, member)) + +#define pinternal_val(pll, cfg, member)\ + (pinternal(pll, cfg, member) >> pshift(pll, member)) + +static inline unsigned int +sprd_pll_read(const struct sprd_pll *pll, u8 index) +{ + const struct sprd_clk_common *common = &pll->common; + unsigned int val = 0; + + if (WARN_ON(index >= pll->regs_num)) + return 0; + + sprd_regmap_read(common->regmap, common->reg + index * 4, &val); + + return val; +} + +static inline void +sprd_pll_write(const struct sprd_pll 
*pll, u8 index, + u32 msk, u32 val) +{ + const struct sprd_clk_common *common = &pll->common; + unsigned int offset, reg; + int ret = 0; + + if (WARN_ON(index >= pll->regs_num)) + return; + + offset = common->reg + index * 4; + ret = sprd_regmap_read(common->regmap, offset, ®); + if (!ret) + sprd_regmap_write(common->regmap, offset, (reg & ~msk) | val); +} + +static unsigned long pll_get_refin(const struct sprd_pll *pll) +{ + u32 shift, mask, index, refin_id = 3; + const unsigned long refin[4] = { 2, 4, 13, 26 }; + + if (pwidth(pll, PLL_REFIN)) { + index = pindex(pll, PLL_REFIN); + shift = pshift(pll, PLL_REFIN); + mask = pmask(pll, PLL_REFIN); + refin_id = (sprd_pll_read(pll, index) & mask) >> shift; + if (refin_id > 3) + refin_id = 3; + } + + return refin[refin_id]; +} + +static u32 pll_get_ibias(u64 rate, const u64 *table) +{ + u32 i, num = table[0]; + + for (i = 1; i < num + 1; i++) + if (rate <= table[i]) + break; + + return (i == num + 1) ? num : i; +} + +static unsigned long _sprd_pll_recalc_rate(const struct sprd_pll *pll, + unsigned long parent_rate) +{ + u32 *cfg; + u32 i, mask, regs_num = pll->regs_num; + unsigned long rate, nint, kint = 0; + u64 refin; + u16 k1, k2; + + cfg = kcalloc(regs_num, sizeof(*cfg), GFP_KERNEL); + if (!cfg) + return -ENOMEM; + + for (i = 0; i < regs_num; i++) + cfg[i] = sprd_pll_read(pll, i); + + refin = pll_get_refin(pll); + + if (pinternal(pll, cfg, PLL_PREDIV)) + refin = refin * 2; + + if (pwidth(pll, PLL_POSTDIV) && + ((pll->fflag == 1 && pinternal(pll, cfg, PLL_POSTDIV)) || +(!pll->fflag && !pinternal(pll, cfg, PLL_POSTDIV + refin = refin / 2; + + if (!pinternal(pll, cfg, PLL_DIV_S)) { + rate = refin * pinternal_val(pll, cfg, PLL_N) * CLK_PLL_10M; + } else { + nint = pinternal_val(pll, cfg, PLL_NINT); + if (pinternal(pll, cfg, PLL_SDM_EN)) + kint = pinternal_val(pll, cfg, PLL_KINT); + + mask = pmask(pll, PLL_KINT); + + k1 = pll->k1; + k2 = pll->k2; + rate = DIV_ROUND_CLOSEST_ULL(refin * kint * k1, +((mask >> __ffs(mask)) + 
1)) * +
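For readers skimming the archive: the pll_get_ibias() search quoted above can be tried out in userspace. The sketch below is only an illustration, not part of the patch. It assumes table[0] holds the number of thresholds and that table[1..num] are sorted in ascending order; the name ibias_lookup() and the threshold values are made up.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Same search as pll_get_ibias() in the patch, with userspace types.
 * Assumption: table[0] is the number of thresholds, and
 * table[1..num] are the thresholds in ascending order.
 */
static uint32_t ibias_lookup(uint64_t rate, const uint64_t *table)
{
	uint32_t i, num = (uint32_t)table[0];

	for (i = 1; i < num + 1; i++)
		if (rate <= table[i])
			break;

	/* Rates above the last threshold clamp to the last entry. */
	return (i == num + 1) ? num : i;
}

/* Hypothetical thresholds, purely for illustration. */
static void ibias_lookup_example(void)
{
	static const uint64_t table[] = { 3, 100, 200, 300 };

	assert(ibias_lookup(50, table) == 1);	/* below first threshold */
	assert(ibias_lookup(100, table) == 1);	/* equal counts as a match */
	assert(ibias_lookup(150, table) == 2);
	assert(ibias_lookup(400, table) == 3);	/* clamped */
}
```

The returned value is a 1-based position in the table, which is why a rate above every threshold maps to num rather than num + 1.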
[PATCH V4 05/12] clk: sprd: add mux clock support
This patch adds clock multiplexer support for Spreadtrum platforms. Mux clocks are also part of the sprd composite clocks, so the patch provides two helpers that can be reused later on. Signed-off-by: Chunyan Zhang --- drivers/clk/sprd/Makefile | 1 + drivers/clk/sprd/mux.c| 86 +++ drivers/clk/sprd/mux.h| 78 ++ 3 files changed, 165 insertions(+) create mode 100644 drivers/clk/sprd/mux.c create mode 100644 drivers/clk/sprd/mux.h diff --git a/drivers/clk/sprd/Makefile b/drivers/clk/sprd/Makefile index 8cd5592..cee36b5 100644 --- a/drivers/clk/sprd/Makefile +++ b/drivers/clk/sprd/Makefile @@ -2,3 +2,4 @@ obj-$(CONFIG_SPRD_COMMON_CLK) += clk-sprd.o clk-sprd-y += common.o clk-sprd-y += gate.o +clk-sprd-y += mux.o diff --git a/drivers/clk/sprd/mux.c b/drivers/clk/sprd/mux.c new file mode 100644 index 000..dacb5b4 --- /dev/null +++ b/drivers/clk/sprd/mux.c @@ -0,0 +1,86 @@ +/* + * Spreadtrum multiplexer clock driver + * + * Copyright (C) 2017 Spreadtrum, Inc. + * Author: Chunyan Zhang + * + * SPDX-License-Identifier: GPL-2.0 + */ + +#include +#include +#include + +#include "mux.h" + +DEFINE_SPINLOCK(sprd_mux_lock); +EXPORT_SYMBOL_GPL(sprd_mux_lock); + +u8 sprd_mux_helper_get_parent(const struct sprd_clk_common *common, + const struct sprd_mux_ssel *mux) +{ + unsigned int reg; + u8 parent; + int num_parents; + int i; + + sprd_regmap_read(common->regmap, common->reg, &reg); + parent = reg >> mux->shift; + parent &= (1 << mux->width) - 1; + + if (!mux->table) + return parent; + + num_parents = clk_hw_get_num_parents(&common->hw); + + for (i = 0; i < num_parents - 1; i++) + if (parent >= mux->table[i] && parent < mux->table[i + 1]) + return i; + + return num_parents - 1; +} +EXPORT_SYMBOL_GPL(sprd_mux_helper_get_parent); + +static u8 sprd_mux_get_parent(struct clk_hw *hw) +{ + struct sprd_mux *cm = hw_to_sprd_mux(hw); + + return sprd_mux_helper_get_parent(&cm->common, &cm->mux); +} + +int sprd_mux_helper_set_parent(const struct sprd_clk_common *common, + const struct sprd_mux_ssel 
*mux, + u8 index) +{ + unsigned long flags = 0; + unsigned int reg; + + if (mux->table) + index = mux->table[index]; + + spin_lock_irqsave(common->lock, flags); + + sprd_regmap_read(common->regmap, common->reg, &reg); + reg &= ~GENMASK(mux->width + mux->shift - 1, mux->shift); + sprd_regmap_write(common->regmap, common->reg, + reg | (index << mux->shift)); + + spin_unlock_irqrestore(common->lock, flags); + + return 0; +} +EXPORT_SYMBOL_GPL(sprd_mux_helper_set_parent); + +static int sprd_mux_set_parent(struct clk_hw *hw, u8 index) +{ + struct sprd_mux *cm = hw_to_sprd_mux(hw); + + return sprd_mux_helper_set_parent(&cm->common, &cm->mux, index); +} + +const struct clk_ops sprd_mux_ops = { + .get_parent = sprd_mux_get_parent, + .set_parent = sprd_mux_set_parent, + .determine_rate = __clk_mux_determine_rate, +}; +EXPORT_SYMBOL_GPL(sprd_mux_ops); diff --git a/drivers/clk/sprd/mux.h b/drivers/clk/sprd/mux.h new file mode 100644 index 000..72a3f78 --- /dev/null +++ b/drivers/clk/sprd/mux.h @@ -0,0 +1,78 @@ +/* + * Spreadtrum multiplexer clock driver + * + * Copyright (C) 2017 Spreadtrum, Inc. 
+ * Author: Chunyan Zhang + * + * SPDX-License-Identifier: GPL-2.0 + */ + +#ifndef _SPRD_MUX_H_ +#define _SPRD_MUX_H_ + +#include "common.h" + +/** + * struct sprd_mux_ssel - Mux clock's source select bits in its register + * @shift: Bit offset of the divider in its register + * @width: Width of the divider field in its register + * @table: For some mux clocks, not all sources are used on some special + *chips, this matches the value of mux clock's register and the + *sources which are used for this mux clock + */ +struct sprd_mux_ssel { + u8 shift; + u8 width; + const u8*table; +}; + +struct sprd_mux { + struct sprd_mux_ssel mux; + struct sprd_clk_common common; +}; + +#define _SPRD_MUX_CLK(_shift, _width, _table) \ + { \ + .shift = _shift, \ + .width = _width, \ + .table = _table, \ + } + +#define SPRD_MUX_CLK_TABLE(_struct, _name, _parents, _table, \ +_reg, _shift, _width, \ +_flags)\ + struct sprd_mux _struct = { \ + .mux= _SPRD_MUX_CLK(_shift, _width, _table),\ +
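As an aside for readers, the two mux helpers in this patch boil down to reading and replacing a `width`-bit field at bit `shift`, plus an optional table lookup. The following is a userspace sketch under stated assumptions, not the driver code: there is no regmap and no locking, width is assumed to be between 1 and 31, and the names mux_field, mux_field_get(), mux_field_set() and mux_table_to_index() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the shift/width pair of struct sprd_mux_ssel. */
struct mux_field {
	uint8_t shift;
	uint8_t width;	/* assumed to be 1..31 in this sketch */
};

static uint32_t mux_field_mask(const struct mux_field *f)
{
	return ((UINT32_C(1) << f->width) - 1) << f->shift;
}

/* Extract the source-select value from a register image. */
static uint8_t mux_field_get(uint32_t reg, const struct mux_field *f)
{
	return (uint8_t)((reg & mux_field_mask(f)) >> f->shift);
}

/* Return the register image with the field replaced by index. */
static uint32_t mux_field_set(uint32_t reg, const struct mux_field *f,
			      uint8_t index)
{
	uint32_t mask = mux_field_mask(f);

	return (reg & ~mask) | (((uint32_t)index << f->shift) & mask);
}

/*
 * Same range search as the table branch of
 * sprd_mux_helper_get_parent(): entry i covers [table[i], table[i + 1]).
 */
static int mux_table_to_index(uint8_t val, const uint8_t *table,
			      int num_parents)
{
	int i;

	for (i = 0; i < num_parents - 1; i++)
		if (val >= table[i] && val < table[i + 1])
			return i;

	return num_parents - 1;
}

static void mux_field_example(void)
{
	const struct mux_field f = { .shift = 4, .width = 3 };
	const uint8_t table[] = { 0, 2, 5 };	/* made-up values */

	assert(mux_field_mask(&f) == 0x70);
	assert(mux_field_get(0x35, &f) == 3);
	assert(mux_field_set(0xFF, &f, 2) == 0xAF);
	assert(mux_table_to_index(3, table, 3) == 1);
}
```

Unlike the patch, mux_field_set() also masks the shifted index, so an out-of-range index cannot spill into neighbouring bits. That is a choice made for this sketch, not something the patch does.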
[PATCH V4 01/12] drivers: move clock common macros out from vendor directories
These macros are used by more than one SoC vendor platform. To avoid having many copies of this code, this patch moves them to the common clock directory, which every clock driver can access. Signed-off-by: Chunyan Zhang --- This patchset also adds a few common clock macros to drivers/clk/clk_common.h, which are generally useful for all vendors' clock drivers; sunxi-ng, zte and sprd (added in this patchset) use them (or part of them) at present. Once this patch is merged, I can help to remove the duplicated code under the vendors' respective directories. --- drivers/clk/clk_common.h | 60 1 file changed, 60 insertions(+) create mode 100644 drivers/clk/clk_common.h diff --git a/drivers/clk/clk_common.h b/drivers/clk/clk_common.h new file mode 100644 index 000..21e93d2 --- /dev/null +++ b/drivers/clk/clk_common.h @@ -0,0 +1,60 @@ +/* + * drivers/clk/clk_common.h + * + * SPDX-License-Identifier: GPL-2.0 + */ + +#ifndef _CLK_COMMON_H_ +#define _CLK_COMMON_H_ + +#include + +#define CLK_HW_INIT(_name, _parent, _ops, _flags) \ + (&(struct clk_init_data) { \ + .flags = _flags, \ + .name = _name,\ + .parent_names = (const char *[]) { _parent }, \ + .num_parents= 1,\ + .ops= _ops, \ + }) + +#define CLK_HW_INIT_PARENTS(_name, _parents, _ops, _flags) \ + (&(struct clk_init_data) { \ + .flags = _flags, \ + .name = _name,\ + .parent_names = _parents, \ + .num_parents= ARRAY_SIZE(_parents), \ + .ops= _ops, \ + }) + +#define CLK_HW_INIT_NO_PARENT(_name, _ops, _flags) \ + (&(struct clk_init_data) { \ + .flags = _flags, \ + .name = _name,\ + .parent_names = NULL, \ + .num_parents= 0,\ + .ops= _ops, \ + }) + +#define CLK_FIXED_FACTOR(_struct, _name, _parent, \ + _div, _mult, _flags)\ + struct clk_fixed_factor _struct = { \ + .div= _div, \ + .mult = _mult,\ + .hw.init= CLK_HW_INIT(_name,\ + _parent, \ + &clk_fixed_factor_ops,\ + _flags), \ + } + +#define CLK_FIXED_RATE(_struct, _name, _flags, \ + _fixed_rate, _fixed_accuracy)\ + struct clk_fixed_rate _struct = { \ + 
.fixed_rate = _fixed_rate, \ + .fixed_accuracy = _fixed_accuracy, \ + .hw.init= CLK_HW_INIT_NO_PARENT(_name, \ + &clk_fixed_rate_ops, \ + _flags),\ + } + +#endif /* _CLK_COMMON_H_ */ -- 2.7.4
[PATCH V4 04/12] clk: sprd: add gate clock support
Some clocks on the Spreadtrum's SoCs are just simple gates. Add support for those clocks. Signed-off-by: Chunyan Zhang --- drivers/clk/sprd/Makefile | 1 + drivers/clk/sprd/gate.c | 124 ++ drivers/clk/sprd/gate.h | 63 +++ 3 files changed, 188 insertions(+) create mode 100644 drivers/clk/sprd/gate.c create mode 100644 drivers/clk/sprd/gate.h diff --git a/drivers/clk/sprd/Makefile b/drivers/clk/sprd/Makefile index 74f4b80..8cd5592 100644 --- a/drivers/clk/sprd/Makefile +++ b/drivers/clk/sprd/Makefile @@ -1,3 +1,4 @@ obj-$(CONFIG_SPRD_COMMON_CLK) += clk-sprd.o clk-sprd-y += common.o +clk-sprd-y += gate.o diff --git a/drivers/clk/sprd/gate.c b/drivers/clk/sprd/gate.c new file mode 100644 index 000..fa0d9ee --- /dev/null +++ b/drivers/clk/sprd/gate.c @@ -0,0 +1,124 @@ +/* + * Spreadtrum gate clock driver + * + * Copyright (C) 2017 Spreadtrum, Inc. + * Author: Chunyan Zhang + * + * SPDX-License-Identifier: GPL-2.0 + */ + +#include +#include + +#include "gate.h" + +DEFINE_SPINLOCK(sprd_gate_lock); +EXPORT_SYMBOL_GPL(sprd_gate_lock); + +static void clk_gate_toggle(const struct sprd_gate *sg, bool en) +{ + const struct sprd_clk_common *common = &sg->common; + unsigned long flags = 0; + unsigned int reg; + bool set = sg->flags & CLK_GATE_SET_TO_DISABLE ? true : false; + + set ^= en; + + spin_lock_irqsave(common->lock, flags); + + sprd_regmap_read(common->regmap, common->reg, &reg); + + if (set) + reg |= sg->enable_mask; + else + reg &= ~sg->enable_mask; + + sprd_regmap_write(common->regmap, common->reg, reg); + + spin_unlock_irqrestore(common->lock, flags); +} + +static void clk_sc_gate_toggle(const struct sprd_gate *sg, bool en) +{ + const struct sprd_clk_common *common = &sg->common; + unsigned long flags = 0; + bool set = sg->flags & CLK_GATE_SET_TO_DISABLE ? 
1 : 0; + unsigned int offset; + + set ^= en; + + /* +* Each set/clear gate clock has three registers: +* common->reg - base register +* common->reg + offset - set register +* common->reg + 2 * offset - clear register +*/ + offset = set ? sg->sc_offset : sg->sc_offset * 2; + + spin_lock_irqsave(common->lock, flags); + sprd_regmap_write(common->regmap, common->reg + offset, + sg->enable_mask); + spin_unlock_irqrestore(common->lock, flags); +} + +static void sprd_gate_disable(struct clk_hw *hw) +{ + struct sprd_gate *sg = hw_to_sprd_gate(hw); + + clk_gate_toggle(sg, false); +} + +static int sprd_gate_enable(struct clk_hw *hw) +{ + struct sprd_gate *sg = hw_to_sprd_gate(hw); + + clk_gate_toggle(sg, true); + + return 0; +} + +static void sprd_sc_gate_disable(struct clk_hw *hw) +{ + struct sprd_gate *sg = hw_to_sprd_gate(hw); + + clk_sc_gate_toggle(sg, false); +} + +static int sprd_sc_gate_enable(struct clk_hw *hw) +{ + struct sprd_gate *sg = hw_to_sprd_gate(hw); + + clk_sc_gate_toggle(sg, true); + + return 0; +} +static int sprd_gate_is_enabled(struct clk_hw *hw) +{ + struct sprd_gate *sg = hw_to_sprd_gate(hw); + struct sprd_clk_common *common = &sg->common; + unsigned int reg; + + sprd_regmap_read(common->regmap, common->reg, &reg); + + if (sg->flags & CLK_GATE_SET_TO_DISABLE) + reg ^= sg->enable_mask; + + reg &= sg->enable_mask; + + return reg ? 1 : 0; +} + +const struct clk_ops sprd_gate_ops = { + .disable= sprd_gate_disable, + .enable = sprd_gate_enable, + .is_enabled = sprd_gate_is_enabled, +}; +EXPORT_SYMBOL_GPL(sprd_gate_ops); + +const struct clk_ops sprd_sc_gate_ops = { + .disable= sprd_sc_gate_disable, + .enable = sprd_sc_gate_enable, + .is_enabled = sprd_gate_is_enabled, +}; +EXPORT_SYMBOL_GPL(sprd_sc_gate_ops); + diff --git a/drivers/clk/sprd/gate.h b/drivers/clk/sprd/gate.h new file mode 100644 index 000..dad8ba0 --- /dev/null +++ b/drivers/clk/sprd/gate.h @@ -0,0 +1,63 @@ +/* + * Spreadtrum gate clock driver + * + * Copyright (C) 2017 Spreadtrum, Inc. 
+ * Author: Chunyan Zhang + * + * SPDX-License-Identifier: GPL-2.0 + */ + +#ifndef _SPRD_GATE_H_ +#define _SPRD_GATE_H_ + +#include "common.h" + +struct sprd_gate { + u32 enable_mask; + u16 flags; + u16 sc_offset; + + struct sprd_clk_common common; +}; + +#define SPRD_SC_GATE_CLK_OPS(_struct, _name, _parent, _reg, _sc_offset, \ +_enable_mask, _flags, _gate_flags, _ops) \ + struct sprd_gate _struct = {\ + .enable_mask= _enable_mask, \ + .sc_offset = _sc_of
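A note on the polarity handling in clk_gate_toggle() and sprd_gate_is_enabled(): both fold the "set bit means disabled" case into a single XOR. The sketch below models only that bit logic on a plain register value. It is an illustration under assumptions, not the driver: GATE_SET_TO_DISABLE is a hypothetical local flag modelled on CLK_GATE_SET_TO_DISABLE, and there is no regmap or spinlock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag for this sketch, modelled on CLK_GATE_SET_TO_DISABLE. */
#define GATE_SET_TO_DISABLE 0x1u

/* Return the register image after enabling (en) or disabling the gate. */
static uint32_t gate_toggle(uint32_t reg, uint32_t mask, unsigned int flags,
			    bool en)
{
	bool set = (flags & GATE_SET_TO_DISABLE) != 0;

	/* For normal gates set == en; for inverted gates set == !en. */
	set ^= en;

	return set ? (reg | mask) : (reg & ~mask);
}

/* Report whether the gate is enabled, honouring the polarity flag. */
static bool gate_is_enabled(uint32_t reg, uint32_t mask, unsigned int flags)
{
	if (flags & GATE_SET_TO_DISABLE)
		reg ^= mask;

	return (reg & mask) != 0;
}

static void gate_example(void)
{
	const uint32_t mask = 0x4;
	uint32_t reg;

	/* Normal polarity: enabling sets the bit. */
	reg = gate_toggle(0x0, mask, 0, true);
	assert(reg == 0x4 && gate_is_enabled(reg, mask, 0));

	/* Inverted polarity: enabling clears the bit. */
	reg = gate_toggle(0x4, mask, GATE_SET_TO_DISABLE, true);
	assert(reg == 0x0 && gate_is_enabled(reg, mask, GATE_SET_TO_DISABLE));
}
```

The set/clear variant in the patch uses the same `set ^= en` step, but then picks a write-only set or clear register instead of doing a read-modify-write.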
Re: n900 in next-20170901
On Thu, Nov 09, 2017 at 10:23:40PM -0800, Tony Lindgren wrote: > * Tony Lindgren [171109 22:19]: > > * Tony Lindgren [171110 03:28]: > > > Then I'll follow up on cleaning up save_secure_ram_context later. > > > > Here's a better version, the static mapping did not get used.. It > > just moved the area so it happened to work. It needs to be set > > up as MT_MEMORY_RWX_NONCACHED instead. > I see a better version now. Hmm... I guess that it also has the problem that I mentioned on the first version. > And FYI, here's what I currently have for the follow-up patch, > but that can wait a bit. Okay. So, should this patch be applied on top of the better version above? Thanks.
[PATCH V4 00/12] add clock driver for Spreadtrum platforms
This series adds Spreadtrum clock support together with its binding documentation and devicetree data. Any comments would be greatly appreciated. This patchset also adds a few common clock macros to drivers/clk/clk_common.h, which are generally useful for all vendors' clock drivers; sunxi-ng, zte and sprd (added in this patchset) use them (or part of them) at present. Once this patchset is merged, I can help to remove the duplicated code under the vendors' respective directories. Thanks, Chunyan Changes from V3: (https://lkml.org/lkml/2017/11/2/61) * Addressed comments from Julien Thierry: - Clean the if branch of sprd_mux_helper_get_parent() - Have the Gate clock macros and ops for both modes (i.e. sc_gate and gate) separate; - Have the Mux clock macros with/without table separate, and same changes for the composite clock. * Switched the function name from _endisable to _toggle; * Fixed Kbuild test error: - Added exporting sprd_clk_regmap_init() which would be used in other module(s); * Made the function sprd_clk_set_regmap() static, and removed its declaration from the include file; * Addressed comments from Rob: - Separate the dt-binding include file from the driver patch; - Documented more for the property "clocks" * Changed the syscon device names; * Changed the name of 'sprd_mux_internal' to 'sprd_mux_ssel' Changes from V2: (http://lkml.iu.edu/hypermail/linux/kernel/1707.1/01504.html) * Switch to use regmap to access registers; * Split all clocks into 16 separate nodes, each of which belongs to a single address area; * Rearranged the order of clock declaration in sc9860-clk.c, sorted them upon the address area; * Added syscon device tree nodes which will be referenced by the nodes of clocks which are in the same address area as the syscon device; * Revised the binding documentation according to the dt modification. 
Changes from V1: (https://lkml.org/lkml/2017/6/17/356) * Address Stephen's comments: - Switch to use platform device driver instead of the DT probing mechanism. - Move the common clock macro out from vendor directory, but need to remove the overlapping code from other vendors (such as sunxi-ng) once this gets merged. - Add support to be built as a module. - Add 'sprd_' prefix for all spin locks used in these drivers. - Mark input parameter of sprd_x with const. - Remove unreasonable dependencies to CONFIG_64BIT. - Add readl() after writing the same register. - Remove CLK_IS_BASIC which is no longer used. - Remove unnecessary CLK_IGNORE_UNUSED when defining a clock. - Change to expose all clock index. - Use clk_ instead of ccu. - Add Kconfig for sprd clocks. - Move the fixed clocks out from the soc node. - Switch to use 64-bit math in pll driver instead of 32-bit math. * Revise binding documentation according to dt modification. * Rename sc9860.c to sc9860-clk.c Chunyan Zhang (12): drivers: move clock common macros out from vendor directories dt-bindings: Add Spreadtrum clock binding documentation clk: sprd: Add common infrastructure clk: sprd: add gate clock support clk: sprd: add mux clock support clk: sprd: add divider clock support clk: sprd: add composite clock support clk: sprd: add adjustable pll support clk: sprd: Add dt-bindings include file for SC9860 clk: sprd: add clocks support for SC9860 arm64: dts: add syscon for whale2 platform arm64: dts: add clocks for SC9860 Documentation/devicetree/bindings/clock/sprd.txt | 63 + arch/arm64/boot/dts/sprd/sc9860.dtsi | 115 ++ arch/arm64/boot/dts/sprd/whale2.dtsi | 48 +- drivers/clk/Kconfig |1 + drivers/clk/Makefile |1 + drivers/clk/clk_common.h | 60 + drivers/clk/sprd/Kconfig | 14 + drivers/clk/sprd/Makefile| 11 + drivers/clk/sprd/common.c| 113 ++ drivers/clk/sprd/common.h| 54 + drivers/clk/sprd/composite.c | 65 + drivers/clk/sprd/composite.h | 55 + drivers/clk/sprd/div.c | 100 ++ drivers/clk/sprd/div.h | 79 + 
drivers/clk/sprd/gate.c | 124 ++ drivers/clk/sprd/gate.h | 63 + drivers/clk/sprd/mux.c | 86 + drivers/clk/sprd/mux.h | 78 + drivers/clk/sprd/pll.c | 268 +++ drivers/clk/sprd/pll.h | 110 ++ drivers/clk/sprd/sc9860-clk.c| 1987 ++ include/dt-bindings/clock/sprd,sc9860-clk.h | 408 + 22 files changed, 3901 insertions(+), 2 deletions(-) create mode 100644 Documentation/devicetree/bindings/clock/sprd.txt create mode 100644 drivers/clk/clk_common.h create mode 100644 drivers/clk/sprd/Kco
Re: [PATCH] x86, pkeys: update documentation about availability
On 11/09/2017 10:12 PM, Ingo Molnar wrote: > > * Dave Hansen wrote: > >> >> From: Dave Hansen >> >> Now that CPUs that implement Memory Protection Keys are publicly >> available we can be a bit less oblique about where it is available. >> >> Signed-off-by: Dave Hansen >> --- >> >> b/Documentation/x86/protection-keys.txt |9 +++-- >> 1 file changed, 7 insertions(+), 2 deletions(-) >> >> diff -puN Documentation/x86/protection-keys.txt~pkeys-update >> Documentation/x86/protection-keys.txt >> --- a/Documentation/x86/protection-keys.txt~pkeys-update 2017-11-09 >> 10:36:53.381467202 -0800 >> +++ b/Documentation/x86/protection-keys.txt 2017-11-09 10:43:15.527466249 >> -0800 >> @@ -1,5 +1,10 @@ >> -Memory Protection Keys for Userspace (PKU aka PKEYs) is a CPU feature >> -which will be found on future Intel CPUs. >> +Memory Protection Keys for Userspace (PKU aka PKEYs) is a feature >> +which is found on Intel's Skylake "Scalable Processor" Server CPUs. >> +It will be avalable in future non-server parts. >> + >> +For anyone wishing to test or use this feature, it is available in >> +Amazon's EC2 C5 instances and is known to work there using an Ubuntu >> +17.04 image. >> >> Memory Protection Keys provides a mechanism for enforcing page-based >> protections, but without requiring modification of the page tables > > Could we please first fix the pkeys self-test? One of the testcases doesn't > build > at all: > > gcc -m32 -o /home/mingo/tip/tools/testing/selftests/x86/protection_keys_32 > -O2 -g -std=gnu99 -pthread -Wall -no-pie protection_keys.c -lrt -ldl -lm > In file included from /usr/include/signal.h:57:0, > from protection_keys.c:33: > protection_keys.c: In function ‘signal_handler’: > protection_keys.c:253:6: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or > ‘__attribute__’ > before ‘.’ token >u64 si_pkey; That's odd. I build them all the time. I compiled it just now with 4.14-rc8 and gcc 4.8.4. 
I wonder if this is more fallout from the glibc headers getting updated to now contain pkey-related stuff. si_pkey might be getting #defined over for the siginfo si_pkey. What distro are you seeing this on? > plus, on a related note, the MPX testcase produces annoying warnings: > > gcc -m32 -o /home/mingo/tip/tools/testing/selftests/x86/mpx-mini-test_32 -O2 > -g -std=gnu99 -pthread -Wall -no-pie mpx-mini-test.c -lrt -ldl -lm > mpx-mini-test.c: In function ‘insn_test_failed’: > mpx-mini-test.c:1406:3: warning: array subscript is above array bounds > [-Warray-bounds] > printf("bte[1]: %lx\n", bte->contents[1]); This is kinda a weird structure: > struct mpx_bt_entry { > union { > char x[MPX_BOUNDS_TABLE_ENTRY_SIZE_BYTES]; > unsigned long contents[1]; > }; > } __attribute__((packed)); I guess it should either be contents[0] or contents[MPX_BOUNDS_TABLE_ENTRY_SIZE_BYTES/sizeof(long)]. But, the warning is harmless at least. What gcc is this, btw? I must be behind the times.
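To make the second option above concrete, here is a rough userspace sketch of a union whose word array covers the whole entry, so that indexing past element 0 stays within the declared bounds. It is not the selftest's definition: BT_ENTRY_SIZE_BYTES is an assumed value standing in for MPX_BOUNDS_TABLE_ENTRY_SIZE_BYTES, and the names bt_entry and bt_entry_word() are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed size for this sketch; the real constant lives in the selftest. */
#define BT_ENTRY_SIZE_BYTES 32
#define BT_ENTRY_WORDS (BT_ENTRY_SIZE_BYTES / sizeof(unsigned long))

struct bt_entry {
	union {
		char x[BT_ENTRY_SIZE_BYTES];
		/* Sized to cover the whole entry instead of contents[1]. */
		unsigned long contents[BT_ENTRY_WORDS];
	};
} __attribute__((packed));

/* Bounds-checked word read; returns 0 on success, -1 if idx is too large. */
static int bt_entry_word(const struct bt_entry *e, size_t idx,
			 unsigned long *out)
{
	if (idx >= BT_ENTRY_WORDS)
		return -1;

	*out = e->contents[idx];
	return 0;
}

static void bt_entry_example(void)
{
	struct bt_entry e;
	unsigned long v = 0;

	memset(&e, 0, sizeof(e));
	e.contents[1] = 0x1234;

	assert(sizeof(e) == BT_ENTRY_SIZE_BYTES);
	assert(bt_entry_word(&e, 1, &v) == 0 && v == 0x1234);
	assert(bt_entry_word(&e, BT_ENTRY_WORDS, &v) == -1);
}
```

With the array declared at its real length, an access like contents[1] no longer trips -Warray-bounds, and the union keeps the same size as the char array.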
Re: [PATCH] checkpatch: Fix checks for Kconfig help text
On Fri, Nov 10, 2017 at 02:32:37PM +0800, Leo Yan wrote: > If one patch has Kconfig section with only one 'config', then variable > '$is_start' will be set by first 'config' line and '$is_end' set by the > second 'config' line. But patches often has only one 'config' line so > we have no chance to set '$is_end', as result below condition is invalid > and it skips check for Kconfig description: Sorry for the bad commit log, I will send v2 for this. > if ($is_start && $is_end && $length < $min_conf_desc_length) { > .. > } > > When script runs to this condition sentence it means the Kconfig > section parsing has been completed, whatever '$is_end' is true > or not. So removes '$is_end' from condition sentence. > > Another change is to change '$min_conf_desc_length' from 4 to 1; so can > pass the check if the Kconfig description has at least one line. > > Signed-off-by: Leo Yan > --- > scripts/checkpatch.pl | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl > index 3453df9..ba724b0 100755 > --- a/scripts/checkpatch.pl > +++ b/scripts/checkpatch.pl > @@ -51,7 +51,7 @@ my $configuration_file = ".checkpatch.conf"; > my $max_line_length = 80; > my $ignore_perl_version = 0; > my $minimum_perl_version = 5.10.0; > -my $min_conf_desc_length = 4; > +my $min_conf_desc_length = 1; > my $spelling_file = "$D/spelling.txt"; > my $codespell = 0; > my $codespellfile = "/usr/share/codespell/dictionary.txt"; > @@ -2796,7 +2796,7 @@ sub process { > } > $length++; > } > - if ($is_start && $is_end && $length < > $min_conf_desc_length) { > + if ($is_start && $length < $min_conf_desc_length) { > WARN("CONFIG_DESCRIPTION", >"please write a paragraph that describes > the config symbol fully\n" . $herecurr); > } > -- > 2.7.4 >
[tip:x86/urgent] x86/debug: Handle warnings before the notifier chain, to fix KGDB crash
Commit-ID: a8d6c1bd62ffefb075c9d3570f07659e2a36ecb3 Gitweb: https://git.kernel.org/tip/a8d6c1bd62ffefb075c9d3570f07659e2a36ecb3 Author: Alexander Shishkin AuthorDate: Mon, 24 Jul 2017 13:04:28 +0300 Committer: Ingo Molnar CommitDate: Fri, 10 Nov 2017 07:16:23 +0100 x86/debug: Handle warnings before the notifier chain, to fix KGDB crash Commit: 9a93848fe787 ("x86/debug: Implement __WARN() using UD0") turned warnings into UD0, but the fixup code only runs after the notify_die() chain. This is a problem, in particular, with kgdb, which kicks in as if it was a BUG(). Fix this by running the fixup code before the notifier chain in the invalid op handler path. Signed-off-by: Alexander Shishkin Tested-by: Ilya Dryomov Acked-by: Daniel Thompson Cc: Jason Wessel Cc: Arjan van de Ven Cc: Borislav Petkov Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Richard Weinberger Cc: Thomas Gleixner Cc: # v4.12+ Link: http://lkml.kernel.org/r/20170724100428.19173-1-alexander.shish...@linux.intel.com Signed-off-by: Ingo Molnar --- arch/x86/kernel/traps.c | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c index 67db4f4..5a6b8f8 100644 --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -209,9 +209,6 @@ do_trap_no_signal(struct task_struct *tsk, int trapnr, char *str, if (fixup_exception(regs, trapnr)) return 0; - if (fixup_bug(regs, trapnr)) - return 0; - tsk->thread.error_code = error_code; tsk->thread.trap_nr = trapnr; die(str, regs, error_code); @@ -292,6 +289,13 @@ static void do_error_trap(struct pt_regs *regs, long error_code, char *str, RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU"); + /* +* WARN*()s end up here; fix them up before we call the +* notifier chain. +*/ + if (!user_mode(regs) && fixup_bug(regs, trapnr)) + return; + if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, signr) != NOTIFY_STOP) { cond_local_irq_enable(regs);
Re: [PATCHv4 2/3] ARMv8: layerscape: add the pcie ep function support
Hi, On Friday 10 November 2017 09:18 AM, Bao Xiaowei wrote: > Add the pcie controller ep function support of layerscape base on > pcie ep framework. > > Signed-off-by: Bao Xiaowei > --- > v2: > - fix the ioremap function used but no ioumap issue > - optimize the code structure > - add code comments > v3: > - fix the msi outband window request failed issue > v4: > - optimize the code, adjust the format > > drivers/pci/dwc/pci-layerscape.c | 120 > --- > 1 file changed, 113 insertions(+), 7 deletions(-) $subject should begin with PCI: layerscape: > > diff --git a/drivers/pci/dwc/pci-layerscape.c > b/drivers/pci/dwc/pci-layerscape.c > index 87fa486bee2c..6f3e434599e0 100644 > --- a/drivers/pci/dwc/pci-layerscape.c > +++ b/drivers/pci/dwc/pci-layerscape.c > @@ -34,7 +34,12 @@ > /* PEX Internal Configuration Registers */ > #define PCIE_STRFMR1 0x71c /* Symbol Timer & Filter Mask Register1 */ > > +#define PCIE_DBI2_BASE 0x1000 /* DBI2 base address*/ The base address should come from dt. > +#define PCIE_MSI_MSG_DATA_OFF0x5c/* MSI Data register address*/ > +#define PCIE_MSI_OB_SIZE 4096 > +#define PCIE_MSI_ADDR_OFFSET (1024 * 1024) > #define PCIE_IATU_NUM6 > +#define PCIE_EP_ADDR_SPACE_SIZE 0x1 > > struct ls_pcie_drvdata { > u32 lut_offset; > @@ -44,12 +49,20 @@ struct ls_pcie_drvdata { > const struct dw_pcie_ops *dw_pcie_ops; > }; > > +struct ls_pcie_ep { > + dma_addr_t msi_phys_addr; > + void __iomem *msi_virt_addr; > + u64 msi_msg_addr; > + u16 msi_msg_data; > +}; > + > struct ls_pcie { > struct dw_pcie *pci; > void __iomem *lut; > struct regmap *scfg; > const struct ls_pcie_drvdata *drvdata; > int index; > + struct ls_pcie_ep *pcie_ep; > }; > > #define to_ls_pcie(x)dev_get_drvdata((x)->dev) > @@ -263,6 +276,99 @@ static const struct of_device_id ls_pcie_of_match[] = { > { }, > }; > > +static void ls_pcie_raise_msi_irq(struct ls_pcie_ep *pcie_ep) > +{ > + iowrite32(pcie_ep->msi_msg_data, pcie_ep->msi_virt_addr); > +} > + > +static int ls_pcie_raise_irq(struct dw_pcie_ep 
*ep, > + enum pci_epc_irq_type type, u8 interrupt_num) > +{ > + struct dw_pcie *pci = to_dw_pcie_from_ep(ep); > + struct ls_pcie *pcie = to_ls_pcie(pci); > + struct ls_pcie_ep *pcie_ep = pcie->pcie_ep; > + u32 free_win; > + > + /* get the msi message address and msi message data */ > + pcie_ep->msi_msg_addr = ioread32(pci->dbi_base + MSI_MESSAGE_ADDR_L32) | > + (((u64)ioread32(pci->dbi_base + MSI_MESSAGE_ADDR_U32)) << 32); > + pcie_ep->msi_msg_data = ioread16(pci->dbi_base + PCIE_MSI_MSG_DATA_OFF); > + > + /* request and config the outband window for msi */ > + free_win = find_first_zero_bit(&ep->ob_window_map, > + sizeof(ep->ob_window_map)); > + if (free_win >= ep->num_ob_windows) { > + dev_err(pci->dev, "no free outbound window\n"); > + return -ENOMEM; > + } > + > + dw_pcie_prog_outbound_atu(pci, free_win, PCIE_ATU_TYPE_MEM, > + pcie_ep->msi_phys_addr, > + pcie_ep->msi_msg_addr, > + PCIE_MSI_OB_SIZE); > + > + set_bit(free_win, &ep->ob_window_map); This custom logic is not required. You can use [1] instead [1] -> https://lkml.org/lkml/2017/11/3/318 > + > + /* generate the msi interrupt */ > + ls_pcie_raise_msi_irq(pcie_ep); > + > + /* release the outband window of msi */ > + dw_pcie_disable_atu(pci, free_win, DW_PCIE_REGION_OUTBOUND); > + clear_bit(free_win, &ep->ob_window_map); > + > + return 0; > +} > + > +static struct dw_pcie_ep_ops pcie_ep_ops = { > + .raise_irq = ls_pcie_raise_irq, > +}; > + > +static int __init ls_add_pcie_ep(struct ls_pcie *pcie, > + struct platform_device *pdev) > +{ > + struct dw_pcie *pci = pcie->pci; > + struct device *dev = pci->dev; > + struct dw_pcie_ep *ep; > + struct ls_pcie_ep *pcie_ep; > + struct resource *cfg_res; > + int ret; > + > + ep = &pci->ep; > + ep->ops = &pcie_ep_ops; > + > + pcie_ep = devm_kzalloc(dev, sizeof(*pcie_ep), GFP_KERNEL); > + if (!pcie_ep) > + return -ENOMEM; > + > + pcie->pcie_ep = pcie_ep; > + > + cfg_res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "config"); > + if (cfg_res) { > + ep->phys_base 
= cfg_res->start; > + ep->addr_size = PCIE_EP_ADDR_SPACE_SIZE; > + } else { > + dev_err(dev, "missing *config* space\n"); > + return -ENODEV; > + } > + > + pcie_ep->msi_phys_addr = ep->phys_base + PCIE_MSI_ADDR_OFFSET; > + > + pcie_ep->msi_virt_addr = ioremap(pcie_ep->msi_phys_addr, > + PCIE_MSI_OB_SI
Re: [PATCH] perf evsel: Fix incorrect precise_ip in default event name
Hello, On Fri, Nov 10, 2017 at 01:49:06PM +0800, Mengting Zhang wrote: > When no event is specified with -e option, perf will specify a > "cycles" event with the highest level of precision available in > perf_event_attr.precise_ip as the default event. But the evsel name > shows an incorrect precise ip, fix it. > > For example, with a highest precision perf_event_attr.precise_ip = 2, > the evsel name "cycles:ppp" shows a wrong precision available. > > Before: > $./perf record sleep 1 > [ perf record: Woken up 1 times to write data ] > [ perf record: Captured and wrote 0.014 MB perf.data (21 samples) ] > $./perf evlist -v > cycles:ppp: size: 112, { sample_period, sample_freq }: 4000, > sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1, > comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 2, > sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1 > > After: > $./perf record sleep 1 > [ perf record: Woken up 1 times to write data ] > [ perf record: Captured and wrote 0.014 MB perf.data (16 samples) ] > $./perf evlist -v > cycles:pp: size: 112, { sample_period, sample_freq }: 4000, > sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1, > comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 2, > sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1 > > Signed-off-by: Mengting Zhang > --- > tools/perf/util/evsel.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c > index 0dccdb8..94cf11d 100644 > --- a/tools/perf/util/evsel.c > +++ b/tools/perf/util/evsel.c > @@ -312,7 +312,7 @@ struct perf_evsel *perf_evsel__new_cycles(bool precise) > if (asprintf(&evsel->name, "cycles%s%s%.*s", >(attr.precise_ip || attr.exclude_kernel) ? ":" : "", >attr.exclude_kernel ? "u" : "", > - attr.precise_ip ? attr.precise_ip + 1 : 0, "ppp") < 0) > + attr.precise_ip ? attr.precise_ip : 0, "ppp") < 0) I think you don't need to check value of the precise_ip anymore. 
The following should be ok: attr.precise_ip, "ppp") < 0) Thanks, Namhyung
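For anyone unfamiliar with the format string in question: `%.*s` takes an int precision followed by the string, and a precision of 0 prints nothing, which is why the extra `? :` check is redundant. A small userspace sketch, with a hypothetical helper name cycles_name() and snprintf() standing in for asprintf():

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the default event name the way the quoted format string does.
 * precise_ip selects how many 'p' characters are taken from "ppp";
 * it is assumed to be in the range 0..3 here.
 */
static int cycles_name(char *buf, size_t len, unsigned int precise_ip,
		       int exclude_kernel)
{
	return snprintf(buf, len, "cycles%s%s%.*s",
			(precise_ip || exclude_kernel) ? ":" : "",
			exclude_kernel ? "u" : "",
			(int)precise_ip, "ppp");
}

static void cycles_name_example(void)
{
	char buf[32];

	cycles_name(buf, sizeof(buf), 2, 0);
	assert(strcmp(buf, "cycles:pp") == 0);

	/* Precision 0 prints no 'p' at all, without a special case. */
	cycles_name(buf, sizeof(buf), 0, 0);
	assert(strcmp(buf, "cycles") == 0);
}
```

The cast to int matters because the `*` precision argument must be an int; passing a wider or unsigned bitfield-derived value directly relies on default promotions.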
[PATCH] checkpatch: Fix checks for Kconfig help text
If a patch has a Kconfig section with only one 'config' entry, the variable '$is_start' is set by the first 'config' line, while '$is_end' would only be set by a second 'config' line. Patches often have only one 'config' line, so '$is_end' never gets set; as a result the condition below is never true and the check for the Kconfig description is skipped: if ($is_start && $is_end && $length < $min_conf_desc_length) { .. } By the time the script reaches this condition, parsing of the Kconfig section has completed, whether or not '$is_end' is true. So remove '$is_end' from the condition. Also change '$min_conf_desc_length' from 4 to 1, so that the check passes if the Kconfig description has at least one line. Signed-off-by: Leo Yan --- scripts/checkpatch.pl | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl index 3453df9..ba724b0 100755 --- a/scripts/checkpatch.pl +++ b/scripts/checkpatch.pl @@ -51,7 +51,7 @@ my $configuration_file = ".checkpatch.conf"; my $max_line_length = 80; my $ignore_perl_version = 0; my $minimum_perl_version = 5.10.0; -my $min_conf_desc_length = 4; +my $min_conf_desc_length = 1; my $spelling_file = "$D/spelling.txt"; my $codespell = 0; my $codespellfile = "/usr/share/codespell/dictionary.txt"; @@ -2796,7 +2796,7 @@ sub process { } $length++; } - if ($is_start && $is_end && $length < $min_conf_desc_length) { + if ($is_start && $length < $min_conf_desc_length) { WARN("CONFIG_DESCRIPTION", "please write a paragraph that describes the config symbol fully\n" . $herecurr); } -- 2.7.4
Re: n900 in next-20170901
On Thu, Nov 09, 2017 at 07:26:10PM -0800, Tony Lindgren wrote: > * Joonsoo Kim [171110 00:10]: > > On Thu, Nov 09, 2017 at 07:08:54AM -0800, Tony Lindgren wrote: > > > Hmm OK. Does your first patch above now have the initcall issue too? > > > It boots if I make that also subsys_initcall and then I get: > > > > > [2.078094] vmalloc_pool_init: DMA: get vmalloc area: d001 > > > > Yes, first patch has the initcall issue and it's intentional in order > > to check the theory. I checked following log for this. > > > > - Boot failure > > SRAM_ADDR: omap_map_sram: P: 0x40208000 - 0x4020f000 > > SRAM_ADDR: omap_map_sram: V: 0xd005 - 0xd0057000 > > > > - Boot success > > SRAM_ADDR: omap_map_sram: P: 0x40208000 - 0x4020f000 > > SRAM_ADDR: omap_map_sram: V: 0xd0008000 - 0xd000f000 > > > > When failure, virtual address for sram is higher than normal one due > > to vmalloc area allocation in __dma_alloc_remap(). If it is deferred, > > virtual address is the same with success case and then the system work. > > > > So, my next theory is that there is n900 specific assumption that sram > > should have that address. Could you check if any working tree for n900 > > which doesn't have my CMA series work or not with adding > > "arm/dma: vmalloc area allocation"? > > Oh I see, sorry I was not following you earlier. So you mean that > by adding the vmalloc_pool_init() initcall the va mapping for SRAM > changes. Exactly. > > And yes, save_secure_ram_context seems to be doing some sketchy > virt to phys calculation with sram_phy_addr_mask. Here's a small > patch to fix that for your CMA series, maybe you can merge it > with your series to avoid breaking booting for git bisect. > > Then I'll follow up on cleaning up save_secure_ram_context later. Thanks for the patch. However, the patch should be modified. See below. 
> Regards, > > Tony > > 8< - > >From tony Mon Sep 17 00:00:00 2001 > From: Tony Lindgren > Date: Thu, 9 Nov 2017 17:05:34 -0800 > Subject: [PATCH] ARM: OMAP2+: Add static SRAM mapping for > save_secure_ram_context > > With the CMA changes from Joonsoo Kim , it > was noticed that n900 stopped booting. After investigating it turned > out that n900 save_secure_ram_context does some whacky virtual to > physical address translation for the SRAM data address. > > Let's fix this for CMA changes by adding a static mapping for SRAM > on omap3. Then we can follow up with a patch to clean up the address > translation in save_secure_ram_context later on. > > Debugged-by: Joonsoo Kim > Signed-off-by: Tony Lindgren > --- > arch/arm/mach-omap2/io.c| 6 ++ > arch/arm/mach-omap2/iomap.h | 4 > 2 files changed, 10 insertions(+) > > diff --git a/arch/arm/mach-omap2/io.c b/arch/arm/mach-omap2/io.c > --- a/arch/arm/mach-omap2/io.c > +++ b/arch/arm/mach-omap2/io.c > @@ -139,6 +139,12 @@ static struct map_desc omap243x_io_desc[] __initdata = { > > #ifdef CONFIG_ARCH_OMAP3 > static struct map_desc omap34xx_io_desc[] __initdata = { > + { > + .virtual= OMAP34XX_SRAM_VIRT, > + .pfn= __phys_to_pfn(OMAP34XX_SRAM_PHYS), > + .length = OMAP34XX_SRAM_SIZE, > + .type = MT_DEVICE > + }, > { > .virtual= L3_34XX_VIRT, > .pfn= __phys_to_pfn(L3_34XX_PHYS), > diff --git a/arch/arm/mach-omap2/iomap.h b/arch/arm/mach-omap2/iomap.h > --- a/arch/arm/mach-omap2/iomap.h > +++ b/arch/arm/mach-omap2/iomap.h > @@ -123,6 +123,10 @@ > * VPOM3430 was not working for Int controller > */ > > +#define OMAP34XX_SRAM_PHYS 0x4020 > +#define OMAP34XX_SRAM_VIRT 0xd001 > +#define OMAP34XX_SRAM_SIZE 0x1 For my testing environment, vmalloc address space is started at roughly 0xe000 so 0xd001 would not be valid. And, PHYS can be different according to the system type. Maybe either OMAP3_SRAM_PUB_PA or OMAP3_SRAM_PA. It seems that SIZE and TYPE should be considered, too. My understanding is correct? Thanks.
Re: [PATCH 24/30] x86, kaiser: disable native VSYSCALL
On 11/09/2017 06:25 PM, Andy Lutomirski wrote: > Here are two proposals to address this without breaking vsyscalls. > > 1. Set NX on low mappings that are _PAGE_USER. Don't set NX on high > mappings but, optionally, warn if you see _PAGE_USER on any address > that isn't the vsyscall page. > > 2. Ignore _PAGE_USER entirely and just mark the EFI mm as special so > KAISER doesn't muck with it. These are totally doable. But, what's the big deal with breaking native vsyscall? We can still do the emulation so nothing breaks: it is just slow.
Re: [PATCH 3/4] kbuild: create object directories simpler and faster
* Masahiro Yamada wrote: > For the out-of-tree build, scripts/Makefile.build creates output > directories, but this operation is not efficient. > > scripts/Makefile.lib calculates obj-dirs as follows: > > obj-dirs := $(dir $(multi-objs) $(obj-y)) > > Please notice $(sort ...) is not used here. Usually the resulted > obj-dirs is as many "./" as objects. > > For those duplicated paths, the following command is invoked. > > _dummy := $(foreach d,$(obj-dirs), $(shell [ -d $(d) ] || mkdir -p $(d))) > > Then, the costly shell command is run over and over again. > > I see many points for optimization: > > [1] Use $(sort ...) to cut down duplicated paths before passing them > to system call > [2] Use single $(shell ...) instead of repeating it with $(foreach ...) > This will reduce forking. > [3] We can calculate obj-dirs more simply. Most of objects are already > accumulated in $(targets). So, $(dir $(targets)) is fine and more > comprehensive. > > I also removed bad code in arch/x86/entry/vdso/Makefile. This is now > really unnecessary. > > Signed-off-by: Masahiro Yamada > --- > > arch/x86/entry/vdso/Makefile | 4 > scripts/Makefile.build | 15 ++- > scripts/Makefile.host| 11 --- > scripts/Makefile.lib | 5 - > 4 files changed, 6 insertions(+), 29 deletions(-) I love not just the speedup, but the diffstat as well ;-) Acked-by: Ingo Molnar Thanks, Ingo
Re: n900 in next-20170901
* Tony Lindgren [171109 22:19]: > * Tony Lindgren [171110 03:28]: > > Then I'll follow up on cleaning up save_secure_ram_context later. > > Here's a better version, the static mapping did not get used.. It > just moved the area so it happened to work. It needs to be set > up as MT_MEMORY_RWX_NONCACHED instead. And FYI, here's what I currently have for the follow-up patch, but that can wait a bit. Regards, Tony 8< diff --git a/arch/arm/mach-omap2/sleep34xx.S b/arch/arm/mach-omap2/sleep34xx.S --- a/arch/arm/mach-omap2/sleep34xx.S +++ b/arch/arm/mach-omap2/sleep34xx.S @@ -45,7 +45,6 @@ #define PM_PWSTCTRL_MPU_P OMAP3430_PRM_BASE + MPU_MOD + OMAP2_PM_PWSTCTRL #define CM_IDLEST1_CORE_V OMAP34XX_CM_REGADDR(CORE_MOD, CM_IDLEST1) #define CM_IDLEST_CKGEN_V OMAP34XX_CM_REGADDR(PLL_MOD, CM_IDLEST) -#define SRAM_BASE_P OMAP3_SRAM_PA #define CONTROL_STAT OMAP343X_CTRL_BASE + OMAP343X_CONTROL_STATUS #define CONTROL_MEM_RTA_CTRL (OMAP343X_CTRL_BASE +\ OMAP36XX_CONTROL_MEM_RTA_CTRL) @@ -103,10 +102,8 @@ ENTRY(save_secure_ram_context) stmfd sp!, {r4 - r11, lr} @ save registers on stack adr r3, api_params @ r3 points to parameters str r0, [r3,#0x4] @ r0 has sdram address - ldr r12, high_mask - and r3, r3, r12 - ldr r12, sram_phy_addr_mask - orr r3, r3, r12 + ldr r12, sram_phys_offset @ load sram physical offset + sub r3, r3, r12 @ parameters physical address mov r0, #25 @ set service ID for PPA mov r12, r0 @ copy secure service ID in r12 mov r1, #0 @ set task id for ROM code in r1 @@ -121,10 +118,8 @@ ENTRY(save_secure_ram_context) nop ldmfd sp!, {r4 - r11, pc} .align -sram_phy_addr_mask: - .word SRAM_BASE_P -high_mask: - .word 0x +sram_phys_offset: + .word OMAP34XX_SRAM_VIRT - OMAP34XX_SRAM_PHYS api_params: .word 0x4, 0x0, 0x0, 0x1, 0x1 ENDPROC(save_secure_ram_context) @@ -521,7 +516,7 @@ pm_pwstctrl_mpu: scratchpad_base: .word SCRATCHPAD_BASE_P sram_base: - .word SRAM_BASE_P + 0x8000 + .word OMAP34XX_SRAM_PHYS + 0x8000 control_stat: .word CONTROL_STAT control_mem_rta: diff --git a/arch/arm/mach-omap2/sram.c b/arch/arm/mach-omap2/sram.c --- a/arch/arm/mach-omap2/sram.c +++ b/arch/arm/mach-omap2/sram.c @@ -31,7 +31,7 @@ #include "sram.h" #define OMAP2_SRAM_PUB_PA (OMAP2_SRAM_PA + 0xf800) -#define OMAP3_SRAM_PUB_PA (OMAP3_SRAM_PA + 0x8000) +#define OMAP3_SRAM_PUB_PA (OMAP34XX_SRAM_PHYS + 0x8000) #define SRAM_BOOTLOADER_SZ 0x00 @@ -105,7 +105,7 @@ static void __init omap_detect_sram(void) } } else { if (cpu_is_omap34xx()) { - omap_sram_start = OMAP3_SRAM_PA; + omap_sram_start = OMAP34XX_SRAM_PHYS; omap_sram_size = 0x1; /* 64K */ } else { omap_sram_start = OMAP2_SRAM_PA; diff --git a/arch/arm/mach-omap2/sram.h b/arch/arm/mach-omap2/sram.h --- a/arch/arm/mach-omap2/sram.h +++ b/arch/arm/mach-omap2/sram.h @@ -59,4 +59,3 @@ static inline void omap_push_sram_idle(void) {} * Used by the SRAM management code and the idle sleep code. */ #define OMAP2_SRAM_PA 0x4020 -#define OMAP3_SRAM_PA 0x4020 -- 2.15.0
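To illustrate why the and/orr translation breaks once the SRAM mapping moves, and why subtracting a fixed offset does not, here is a rough sketch in C. All constants and names are assumptions for illustration: the base 0x40200000 is picked to agree with the complete addresses in the boot logs quoted earlier in the thread (P: 0x40208000, V: 0xd0008000), and the 16-bit mask is hypothetical because the mask value is cut off in the patch above.

```c
#include <stdint.h>

/* Assumed values for illustration only, see the note above. */
#define SKETCH_SRAM_PHYS 0x40200000u /* assumed SRAM physical base */
#define SKETCH_LOW_MASK  0x0000ffffu /* hypothetical mask: keep the low 16 bits */

/*
 * Mask-and-OR translation, modelled on the removed and/orr pair.
 * Only right when the low bits of the virtual address happen to
 * equal the offset into SRAM.
 */
uint32_t sketch_virt_to_phys_mask(uint32_t virt)
{
	return (virt & SKETCH_LOW_MASK) | SKETCH_SRAM_PHYS;
}

/*
 * Offset translation, modelled on the added sub instruction.
 * virt_base is the virtual address that maps SKETCH_SRAM_PHYS;
 * right for any address inside that fixed linear mapping.
 */
uint32_t sketch_virt_to_phys_offset(uint32_t virt, uint32_t virt_base)
{
	return virt - (virt_base - SKETCH_SRAM_PHYS);
}
```

Under these assumptions both functions agree for a mapping based at 0xd0000000, but if the mapping is based at a hypothetical 0xd0048000 the masked version drops the 0x8000 part of the offset while the subtraction still lands on the right physical address.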
Re: [PATCH 30/31] dt-bindings: nds32 CPU Bindings
2017-11-09 21:57 GMT+08:00 Rob Herring : > On Thu, Nov 9, 2017 at 3:39 AM, Greentime Hu wrote: >> 2017-11-08 21:18 GMT+08:00 Rob Herring : >>> Please Cc the DT list on bindings. >> >> Sorry. I am not sure what you mean. >> Do you mean add devicet...@vger.kernel.org to cc list? > > Yes. Use get_maintainers.pl as a guide. Roger that! Thanks! >>> On Tue, Nov 7, 2017 at 11:55 PM, Greentime Hu wrote: From: Greentime Hu >>> > + device_type = "cpu"; + compatible = "andestech,n13", "andestech,n15"; >>> >>> n13 is a superset of n15? >> >> No, they are independent ones. > > Then having both is not valid. The strings should be in order of best > match to worst match where worst match is typically either older > implementations of IP blocks or generic'ish strings such as "ns16550" > for a UART. Thanks. I would like to explain it more clearly. They are independent ones in implementations. They are implemented based on the same nds32 ISA and architecture spec with different configurations like cache size, page size, cache type(VIPT/PIPT), pipeline stages... Most of them are compatible. They use the same toolchain to build vmlinux which can run on different nds32 cores.
Re: [PATCHv4 1/3] ARMv8: dts: ls1046a: add the property of IB and OB
Hi Bao, On Friday 10 November 2017 09:18 AM, Bao Xiaowei wrote: > Add the property of inbound and outbound windows number for ep > driver. > > Signed-off-by: Bao Xiaowei > Acked-by: Minghuan Lian > --- > v2: > - no change > v3: > - modify the commit message > v4: > - no change > > arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi | 6 ++ > 1 file changed, 6 insertions(+) $subject should start with something like arm64: dts: ls1046a: ** > > diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi > b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi > index 06b5e12d04d8..f8332669663c 100644 > --- a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi > +++ b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi > @@ -674,6 +674,8 @@ > device_type = "pci"; > dma-coherent; > num-lanes = <4>; > + num-ib-windows = <6>; > + num-ob-windows = <6>; EP specific properties shouldn't be added in RC dt node. Ideally you should have a separate dt node for RC and EP. Thanks Kishon
Re: n900 in next-20170901
* Tony Lindgren [171110 03:28]: > * Joonsoo Kim [171110 00:10]: > > On Thu, Nov 09, 2017 at 07:08:54AM -0800, Tony Lindgren wrote: > > > Hmm OK. Does your first patch above now have the initcall issue too? > > > It boots if I make that also subsys_initcall and then I get: > > > > > [2.078094] vmalloc_pool_init: DMA: get vmalloc area: d001 > > > > Yes, first patch has the initcall issue and it's intentional in order > > to check the theory. I checked following log for this. > > > > - Boot failure > > SRAM_ADDR: omap_map_sram: P: 0x40208000 - 0x4020f000 > > SRAM_ADDR: omap_map_sram: V: 0xd005 - 0xd0057000 > > > > - Boot success > > SRAM_ADDR: omap_map_sram: P: 0x40208000 - 0x4020f000 > > SRAM_ADDR: omap_map_sram: V: 0xd0008000 - 0xd000f000 > > > > When failure, virtual address for sram is higher than normal one due > > to vmalloc area allocation in __dma_alloc_remap(). If it is deferred, > > virtual address is the same with success case and then the system work. > > > > So, my next theory is that there is n900 specific assumption that sram > > should have that address. Could you check if any working tree for n900 > > which doesn't have my CMA series work or not with adding > > "arm/dma: vmalloc area allocation"? > > Oh I see, sorry I was not following you earlier. So you mean that > by adding the vmalloc_pool_init() initcall the va mapping for SRAM > changes. > > And yes, save_secure_ram_context seems to be doing some sketchy > virt to phys calculation with sram_phy_addr_mask. Here's a small > patch to fix that for your CMA series, maybe you can merge it > with your series to avoid breaking booting for git bisect. > > Then I'll follow up on cleaning up save_secure_ram_context later. Here's a better version, the static mapping did not get used.. It just moved the area so it happened to work. It needs to be set up as MT_MEMORY_RWX_NONCACHED instead. 
Regards, Tony 8< --- >From tony Mon Sep 17 00:00:00 2001 From: Tony Lindgren Date: Thu, 9 Nov 2017 17:05:34 -0800 Subject: [PATCH] ARM: OMAP2+: Add static SRAM mapping for save_secure_ram_context With the CMA changes from Joonsoo Kim , it was noticed that n900 stopped booting. After investigating it turned out that n900 save_secure_ram_context does some whacky virtual to physical address translation for the SRAM data address. Let's fix this for CMA changes by adding a static mapping for SRAM on omap3. Then we can follow up with a patch to clean up the address translation in save_secure_ram_context later on. Debugged-by: Joonsoo Kim Signed-off-by: Tony Lindgren --- arch/arm/mach-omap2/io.c| 6 ++ arch/arm/mach-omap2/iomap.h | 4 2 files changed, 10 insertions(+) diff --git a/arch/arm/mach-omap2/io.c b/arch/arm/mach-omap2/io.c --- a/arch/arm/mach-omap2/io.c +++ b/arch/arm/mach-omap2/io.c @@ -139,6 +139,12 @@ static struct map_desc omap243x_io_desc[] __initdata = { #ifdef CONFIG_ARCH_OMAP3 static struct map_desc omap34xx_io_desc[] __initdata = { + { + .virtual= OMAP34XX_SRAM_VIRT, + .pfn= __phys_to_pfn(OMAP34XX_SRAM_PHYS), + .length = OMAP34XX_SRAM_SIZE, + .type = MT_MEMORY_RWX_NONCACHED + }, { .virtual= L3_34XX_VIRT, .pfn= __phys_to_pfn(L3_34XX_PHYS), diff --git a/arch/arm/mach-omap2/iomap.h b/arch/arm/mach-omap2/iomap.h --- a/arch/arm/mach-omap2/iomap.h +++ b/arch/arm/mach-omap2/iomap.h @@ -123,6 +123,10 @@ * VPOM3430 was not working for Int controller */ +#define OMAP34XX_SRAM_PHYS 0x4020 +#define OMAP34XX_SRAM_VIRT 0xd001 +#define OMAP34XX_SRAM_SIZE 0x1 + #define L4_PER_34XX_PHYS L4_PER_34XX_BASE /* 0x4900 --> 0xfb00 */ #define L4_PER_34XX_VIRT (L4_PER_34XX_PHYS + OMAP2_L4_IO_OFFSET) -- 2.15.0
Re: KGDB/KDB treats WARN*() as Oops on x86 since 4.12
* Ilya Dryomov wrote: > On Fri, Oct 13, 2017 at 4:59 PM, Daniel Thompson > wrote: > > On 09/10/17 13:24, Ilya Dryomov wrote: > >> > >> Hi Jason, > >> > >> Starting with 4.12, WARN*() is implemented with ud0, generating an > >> Invalid Opcode exception. KGDB/KDB gets entered as if it were an Oops, > >> making KGDB/KDB rather hard to use, particularly on testing kernels. > >> > >> Alexander posted a fix a while back, but Peter seems to be waiting for > >> your ack. Could you please weigh in? > >> > >>[PATCH] x86/debug: Handle warnings before the notifier chain > >>https://patchwork.kernel.org/patch/9859065/ > > > > > > Hmnnn... IIRC arm64 code has been also been blocked for a couple of releases > > whilst Will D. waited for an ack that never came. > > > > My own reading of the code is that the patch in question restores the status > > quo, that there will still be mechanisms to provoke entry to kdb/kgdb during > > a warning (breakpoint on __warn, engage panic_on_warn, etc) and that these > > are not obviously recursive[1]. > > > > Put another way I'm happy to dig the patch out of my mail archive and throw > > in an Acked-By: but since I have no official role within kdb/kgdb (I'm just > > an interested bystander) it might not be enough for Peter. > > > > > > Daniel. > > > > > > [1] I'm not a huge x86 expert so correct me if I am wrong but I think > > its ok for us to trap here providing its for a different reason. > > Hi Peter, Ingo, > > Could you please consider taking Alexander's patch for 4.15? Jason > never replied to any of our pings and hasn't been actively involved > with kgdb recently. In the meantime, this regression makes running > e.g. xfstests runs with kgdb enabled pretty much impossible. Ok, agreed, I picked the fix up into tip:x86/urgent, with a -stable backporting tag, and will try to get it to Linus for v4.15 (it will also get backported to v4.14 which is affected as well). Thanks, Ingo
Re: [PATCH] x86, pkeys: update documentation about availability
* Dave Hansen wrote: > > From: Dave Hansen > > Now that CPUs that implement Memory Protection Keys are publicly > available we can be a bit less oblique about where it is available. > > Signed-off-by: Dave Hansen > --- > > b/Documentation/x86/protection-keys.txt |9 +++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff -puN Documentation/x86/protection-keys.txt~pkeys-update > Documentation/x86/protection-keys.txt > --- a/Documentation/x86/protection-keys.txt~pkeys-update 2017-11-09 > 10:36:53.381467202 -0800 > +++ b/Documentation/x86/protection-keys.txt 2017-11-09 10:43:15.527466249 > -0800 > @@ -1,5 +1,10 @@ > -Memory Protection Keys for Userspace (PKU aka PKEYs) is a CPU feature > -which will be found on future Intel CPUs. > +Memory Protection Keys for Userspace (PKU aka PKEYs) is a feature > +which is found on Intel's Skylake "Scalable Processor" Server CPUs. > +It will be avalable in future non-server parts. > + > +For anyone wishing to test or use this feature, it is available in > +Amazon's EC2 C5 instances and is known to work there using an Ubuntu > +17.04 image. > > Memory Protection Keys provides a mechanism for enforcing page-based > protections, but without requiring modification of the page tables Could we please first fix the pkeys self-test? 
One of the testcases doesn't build at all: gcc -m32 -o /home/mingo/tip/tools/testing/selftests/x86/protection_keys_32 -O2 -g -std=gnu99 -pthread -Wall -no-pie protection_keys.c -lrt -ldl -lm In file included from /usr/include/signal.h:57:0, from protection_keys.c:33: protection_keys.c: In function ‘signal_handler’: protection_keys.c:253:6: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘.’ token u64 si_pkey; ^ plus, on a related note, the MPX testcase produces annoying warnings: gcc -m32 -o /home/mingo/tip/tools/testing/selftests/x86/mpx-mini-test_32 -O2 -g -std=gnu99 -pthread -Wall -no-pie mpx-mini-test.c -lrt -ldl -lm mpx-mini-test.c: In function ‘insn_test_failed’: mpx-mini-test.c:1406:3: warning: array subscript is above array bounds [-Warray-bounds] printf("bte[1]: %lx\n", bte->contents[1]); ^ mpx-mini-test.c:1407:3: warning: array subscript is above array bounds [-Warray-bounds] printf("bte[2]: %lx\n", bte->contents[2]); ^ mpx-mini-test.c:1408:3: warning: array subscript is above array bounds [-Warray-bounds] printf("bte[3]: %lx\n", bte->contents[3]); ^ Thanks, Ingo
Re: [01/18] x86/asm/64: Remove the restore_c_regs_and_iret label
Some performance regression/improvement is reported by LKP-tools for this patch series tested with Intel Atom processor. So, post the data here for your reference.

Branch: x86/entry_consolidation
Commit id:
  base: 50da9d439392fdd91601d36e7f05728265bff262
  head: 69af865668fdb86a95e4e948b1f48b2689d60b73
Benchmark suite: will-it-scale
Download link: https://github.com/antonblanchard/will-it-scale/tree/master/tests
Metrics:
  will-it-scale.per_process_ops=processes/nr_cpu
  will-it-scale.per_thread_ops=threads/nr_cpu
tbox: lkp-avoton3 (nr_cpu=8, memory=16G)
CPU: Intel(R) Atom(TM) CPU C2750 @ 2.40GHz

Performance regression with will-it-scale benchmark suite:

testcase         base      change  head      metric
eventfd1         1505677   -5.9%   1416132   will-it-scale.per_process_ops
                 1352716   -3.0%   1311943   will-it-scale.per_thread_ops
lseek2           7306698   -4.3%   6991473   will-it-scale.per_process_ops
                 4906388   -3.6%   4730531   will-it-scale.per_thread_ops
lseek1           7355365   -4.2%   7046224   will-it-scale.per_process_ops
                 4928961   -3.7%   4748791   will-it-scale.per_thread_ops
getppid1         8479806   -4.1%   8129026   will-it-scale.per_process_ops
                 8515252   -4.1%   8162076   will-it-scale.per_thread_ops
lock1            1054249   -3.2%   1020895   will-it-scale.per_process_ops
                 989145    -2.6%   963578    will-it-scale.per_thread_ops
dup1             2675825   -3.0%   2596257   will-it-scale.per_process_ops
futex3           4986520   -2.8%   4846640   will-it-scale.per_process_ops
                 5009388   -2.7%   4875126   will-it-scale.per_thread_ops
futex4           3932936   -2.0%   3854240   will-it-scale.per_process_ops
                 3950138   -2.0%   3872615   will-it-scale.per_thread_ops
futex1           2941886   -1.8%   2888912   will-it-scale.per_process_ops
futex2           2500203   -1.6%   2461065   will-it-scale.per_process_ops
                 1534692   -2.3%   1499532   will-it-scale.per_thread_ops
malloc1          61314     -1.0%   60725     will-it-scale.per_process_ops
                 19996     -1.5%   19688     will-it-scale.per_thread_ops

Performance improvement with will-it-scale benchmark suite:

testcase         base      change  head      metric
context_switch1  176376    +1.6%   179152    will-it-scale.per_process_ops
                 180703    +1.9%   184209    will-it-scale.per_thread_ops
page_fault2      179716    +2.5%   184272    will-it-scale.per_process_ops
                 146890    +2.8%   150989    will-it-scale.per_thread_ops
page_fault3      666953    +3.7%   691735    will-it-scale.per_process_ops
                 464641    +5.0%   487952    will-it-scale.per_thread_ops
unix1            483094    +4.4%   504201    will-it-scale.per_process_ops
                 450055    +7.5%   483637    will-it-scale.per_thread_ops
read2            575887    +5.0%   604440    will-it-scale.per_process_ops
                 500319    +5.2%   526361    will-it-scale.per_thread_ops
poll1            4614597   +5.4%   4864022   will-it-scale.per_process_ops
                 3981551   +5.8%   4213409   will-it-scale.per_thread_ops
pwrite2          383344    +5.7%   405151    will-it-scale.per_process_ops
                 367006    +5.0%   385209    will-it-scale.per_thread_ops
sched_yield      3011191   +6.0%   3191710   will-it-scale.per_process_ops
                 3024171   +6.1%   3208197   will-it-scale.per_thread_ops
pipe1            755487    +6.2%   802622    will-it-scale.per_process_ops
                 705136    +8.8%   766950    will-it-scale.per_thread_ops
pwrite3          422850    +6.6%   450660    will-it-scale.per_process_ops
                 413370    +3.7%   428704    will-it-scale.per_thread_ops
readseek1        972102    +6.7%   1036852   will-it-scale.per_process_ops
                 844877    +6.6%   900686    will-it-scale.per_thread_ops
pwrite1          981310    +6.8%   1047809   will-it-scale.per_process_ops
                 94
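As a sanity check on the "change" column, the figures look consistent with a plain relative difference of head against base. The report does not state the formula, so the helper below is an assumption, and its name is made up for this sketch.

```c
/*
 * Relative change of head versus base, in percent. Assumed to be how
 * the "change" column is derived; the report does not say so itself.
 */
double sketch_percent_change(double base, double head)
{
	return (head - base) / base * 100.0;
}
```

For example, the lseek2 per_process_ops pair (7306698, 6991473) comes out at roughly -4.3, and the pipe1 per_thread_ops pair (705136, 766950) at roughly +8.8, in line with the reported values.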
Re: [PATCHv3 1/1] locking/qspinlock/x86: Avoid test-and-set when PV_DEDICATED is set
2017-11-10 0:00 GMT+08:00 Radim Krcmar : > 2017-11-09 20:43+0800, Wanpeng Li: >> 2017-11-07 4:26 GMT+08:00 Eduardo Valentin : >> > Currently, the existing qspinlock implementation will fallback to >> > test-and-set if the hypervisor has not set the PV_UNHALT flag. >> > >> > This patch gives the opportunity to guest kernels to select >> > between test-and-set and the regular queueu fair lock implementation >> > based on the PV_DEDICATED KVM feature flag. When the PV_DEDICATED >> > flag is not set, the code will still fall back to test-and-set, >> > but when the PV_DEDICATED flag is set, the code will use >> > the regular queue spinlock implementation. >> > >> > With this patch, when in autoselect mode, the guest will >> > use the default spinlock implementation based on host feature >> > flags as follows: >> > >> > PV_DEDICATED = 1, PV_UNHALT = anything: default is qspinlock >> > PV_DEDICATED = 0, PV_UNHALT = 1: default is pvqspinlock >> > PV_DEDICATED = 0, PV_UNHALT = 0: default is tas >> > >> > Cc: Paolo Bonzini >> > Cc: "Radim Krčmář" >> > Cc: Jonathan Corbet >> > Cc: Thomas Gleixner >> > Cc: Ingo Molnar >> > Cc: "H. Peter Anvin" >> > Cc: x...@kernel.org >> > Cc: Peter Zijlstra >> > Cc: Waiman Long >> > Cc: k...@vger.kernel.org >> > Cc: linux-...@vger.kernel.org >> > Cc: linux-kernel@vger.kernel.org >> > Cc: Jan H. Schoenherr >> > Cc: Anthony Liguori >> > Suggested-by: Matt Wilson >> > Signed-off-by: Eduardo Valentin >> > --- >> >> You should also add a cpuid flag in kvm part. > > It is better without that. The flag has no dependency on KVM (kernel > hypervisor) code. Do you mean -cpu host, +,I think it will result in "warning: host doesn't support requested feature: CPUID.4001H:eax.XX" Regards, Wanpeng Li
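The autoselect table from the patch description can be written down as a small decision function. This is only a sketch of the table as quoted; the enum and function names are hypothetical and do not come from the patch.

```c
#include <stdbool.h>

/* Hypothetical names; they only mirror the three outcomes in the table. */
enum sketch_lock_impl {
	SKETCH_QSPINLOCK,   /* regular queued spinlock */
	SKETCH_PVQSPINLOCK, /* paravirt queued spinlock */
	SKETCH_TAS          /* test-and-set fallback */
};

/* Default lock choice in autoselect mode, per the quoted table. */
enum sketch_lock_impl sketch_default_lock(bool pv_dedicated, bool pv_unhalt)
{
	if (pv_dedicated)
		return SKETCH_QSPINLOCK;   /* PV_UNHALT is ignored here */
	if (pv_unhalt)
		return SKETCH_PVQSPINLOCK;
	return SKETCH_TAS;
}
```

Checking PV_DEDICATED first is what makes the "PV_UNHALT = anything" row fall out naturally.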
Re: Kernel crash in free_pipe_info()
On 1 November 2017 at 14:19, Cong Wang wrote: > On Mon, Oct 30, 2017 at 7:08 PM, Linus Torvalds > wrote: >> On Mon, Oct 30, 2017 at 6:19 PM, Cong Wang wrote: >>> >>> 1. The faulty addresses are all near 0001, with one exception >>> of null (which is the most recent one) >> >> Well, they're at 8(%rax), except for that last case. >> >> And in every case (_including_ that last case), %rax has a very >> interesting pattern.. That's the (bad) buf->ops pointer that was >> loaded from the somehow corrupted "buf". >> >> The values in all cases are >> >> fffa >> fffd >> fff1 >> fff7 >> fff4 >> fffa >> fffd >> fffd >> fffa >> ffe8 >> fff1 >> fff7 >> >> which kind of looks like a 32-bit error value. So we have (n, val, (errno)): >> >> 1 -24 (EMFILE) >> 2 -15 (ENOTBLK) >> 1 -12 (ENOMEM) >> 2 -9 (EBADF) >> 3 -6 (ENXIO) >> 3 -3 (ESRCH) >> >> none of which makes any sense to me, but it's an interesting pattern >> nonetheless. > > > Yeah, good find! > > >> >>> 2. R12 register, which should map to the local vairable 'i', is always 0x8 >>> at the time of crash. >> >> So _if_ this is some kind of use-after-free thing, and the allocation >> got re-used for something else, that might just be related to whatever >> ends up being the offset that is filled in with the (int) error >> number. >> >> Except the offset is that %r12*0x28+0x10, so we're talking a byte >> offset of 330 bytes into the allocation, and apparently the eight >> previous (0-7) iterations were fine. >> >> Which is really odd. >> >> I'm not seeing anything that makes sense. I'll have to think about this. >> >> I'm assuming you don't have slub debugging enabled, and no way to >> enable it and try to catch this? 
> > We enable it at compile-time but not at run-time: > > CONFIG_SLUB_DEBUG=y > CONFIG_SLUB=y > CONFIG_SLUB_CPU_PARTIAL=y > # CONFIG_SLUB_DEBUG_ON is not set > # CONFIG_SLUB_STATS is not set > > I can try to manually add slub_debug in boot parameters, but still > have no idea how and when can trigger this bug again. > > > Thanks! This looks familiar... https://github.com/moby/moby/issues/34472 From the bug report: "In particular, it looks like either docker-containerd or docker-containerd-shim (the log is cut off) has a pipe open that is causing a kernel BUG when attempting to kill the process. Fun times."
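The "looks like a 32-bit error value" observation earlier in this thread can be reproduced with a one-line cast. A minimal sketch follows; the function name is made up, and because the register values are cut off in this archive, any full-width inputs you feed it are your own assumption, chosen to be consistent with the decimal values quoted in the thread.

```c
#include <stdint.h>

/*
 * Reinterpret the low 32 bits of a register value as a signed int.
 * The unsigned-to-signed conversion is implementation-defined in older
 * C standards; mainstream compilers wrap as two's complement.
 */
int32_t sketch_low32_as_errno(uint64_t reg)
{
	return (int32_t)(uint32_t)reg;
}
```

For instance, a low word of 0xfffffffa decodes to -6 and 0xffffffe8 to -24, which lines up with the ENXIO and EMFILE entries in the list.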
Re: [PATCH v2 2/4] kaslr: select the memory region in immovable node to process
On Fri, Nov 10, 2017 at 11:14:37AM +0800, Baoquan He wrote: >On 11/10/17 at 11:03am, Chao Fan wrote: >> On Thu, Nov 09, 2017 at 04:21:32PM +0800, Baoquan He wrote: >> >Hi Chao, >> > >> >On 11/01/17 at 07:32pm, Chao Fan wrote: >> >> Compare the region of memmap entry and immovable_mem, then choose the >> >> intersection to process_mem_region. >> >> >> >> Since the interrelationship between e820 or efi entries and memory >> >> region in immovable_mem is different: >> > >> >Could you paste a bootlog with efi=debug specified in cmdline on the >> >system you tested? I want to check what kind of intersection between >> >them. The adding makes code pretty ugly, want to make sure if we have >> >to do like this. >> Hi Baoquan, >> >> Here is a machine with efi. > Here is a log for e820, also 10 nodes in this machine. Thanks, Chao Fan >Thanks, do you have the whole boot log? I want to have a look at e820. >And this is a special system, or a customized system? I mean you just >customize the firmware for better testing to cover kinds of cases. > >If it's too big, please attach it and send to me privately. > >Anyway, seems your considering about the intersection is right. > >Thanks >Baoquan >> >> The memory information in SRAT from dmesg: >> [0.00] ACPI: SRAT: Node 0 PXM 0 [mem 0x-0x0009] >> [0.00] ACPI: SRAT: Node 0 PXM 0 [mem 0x0010-0x1f3f] >> [0.00] ACPI: SRAT: Node 1 PXM 1 [mem 0x1f40-0x3e7f] >> [0.00] ACPI: SRAT: Node 2 PXM 2 [mem 0x3e80-0x5dbf] >> [0.00] ACPI: SRAT: Node 3 PXM 3 [mem 0x5dc0-0x7cff] >> [0.00] ACPI: SRAT: Node 4 PXM 4 [mem 0x7d00-0x9c3f] >> [0.00] ACPI: SRAT: Node 5 PXM 5 [mem 0x9c40-0xbb7f] >> [0.00] ACPI: SRAT: Node 6 PXM 6 [mem 0xbb80-0xbfff] >> [0.00] ACPI: SRAT: Node 6 PXM 6 [mem 0x1-0x11abf] >> [0.00] ACPI: SRAT: Node 7 PXM 7 [mem 0x11ac0-0x139ff] >> [0.00] ACPI: SRAT: Node 8 PXM 8 [mem 0x13a00-0x1593f] >> [0.00] ACPI: SRAT: Node 9 PXM 9 [mem 0x15940-0x1787f] >> >> There are 10 nodes, and 500M memory in every node. 
>> And node0 and node 6 has two parts. >> >> >> Here is the efi mem: >> [0.00] efi: mem00: [Boot Code | | | | | | | | >> |WB|WT|WC|UC] range=[0x-0x0fff] (0MB) >> [0.00] efi: mem01: [Loader Data| | | | | | | | >> |WB|WT|WC|UC] range=[0x1000-0x1fff] (0MB) >> [0.00] efi: mem02: [Conventional Memory| | | | | | | | >> |WB|WT|WC|UC] range=[0x2000-0x0009] (0MB) >> [0.00] efi: mem03: [Conventional Memory| | | | | | | | >> |WB|WT|WC|UC] range=[0x0010-0x00805fff] (7MB) >> [0.00] efi: mem04: [Boot Data | | | | | | | | >> |WB|WT|WC|UC] range=[0x00806000-0x00806fff] (0MB) >> [0.00] efi: mem05: [Conventional Memory| | | | | | | | >> |WB|WT|WC|UC] range=[0x00807000-0x0081] (0MB) >> [0.00] efi: mem06: [Boot Data | | | | | | | | >> |WB|WT|WC|UC] range=[0x0082-0x012f] (10MB) >> [0.00] efi: mem07: [Conventional Memory| | | | | | | | >> |WB|WT|WC|UC] range=[0x0130-0x01ff] (13MB) >> [0.00] efi: mem08: [Loader Data| | | | | | | | >> |WB|WT|WC|UC] range=[0x0200-0x036e3fff] (22MB) >> (From mem00 to mem08, belongs to node0) >> [0.00] efi: mem09: [Conventional Memory| | | | | | | | >> |WB|WT|WC|UC] range=[0x036e4000-0x3d626fff] (927MB) >> (mem09 has part of node0 and part of node1, but not the whole of node0 and >> node1) >> [0.00] efi: mem10: [Loader Data| | | | | | | | >> |WB|WT|WC|UC] range=[0x3d627000-0x3fff] (41MB) >> (part of node1 and part of node2) >> [0.00] efi: mem11: [Conventional Memory| | | | | | | | >> |WB|WT|WC|UC] range=[0x4000-0x8c92dfff] (1225MB) >> [0.00] efi: mem12: [Loader Data| | | | | | | | >> |WB|WT|WC|UC] range=[0x8c92e000-0xbbfbdfff] (758MB) >> [0.00] efi: mem13: [Boot Data | | | | | | | | >> |WB|WT|WC|UC] range=[0xbbfbe000-0xbbfddfff] (0MB) >> [0.00] efi: mem14: [Conventional Memory| | | | | | | | >> |WB|WT|WC|UC] range=[0xbbfde000-0xbe350fff] (35MB) >> [0.00] efi: mem15: [Loader Data| | | | | | | | >> |WB|WT|WC|UC] range=[0xbe351000-0xbe579fff] (2MB) >> [0.00] efi: mem16: [Loader Code| | | | | | | | >> |WB|WT|WC|UC] range=[0xbe57a000-0xbe6a0fff] (1MB) >> [0.00] 
efi: mem17: [Boot Data | | | | | | | | >> |WB|WT|WC|UC] ra
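The intersection being discussed, between one firmware memory map entry and one node's memory range, can be sketched as below. This is not the code from the patch: the struct and function names are hypothetical, and ranges are taken as half-open [start, end).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical half-open range [start, end); not a kernel type. */
struct sketch_region {
	uint64_t start;
	uint64_t end;
};

/*
 * Store the overlap of a and b in *out and return true, or return
 * false (leaving *out untouched) when the ranges do not overlap.
 */
bool sketch_intersect(struct sketch_region a, struct sketch_region b,
		      struct sketch_region *out)
{
	uint64_t start = a.start > b.start ? a.start : b.start;
	uint64_t end = a.end < b.end ? a.end : b.end;

	if (start >= end)
		return false;
	out->start = start;
	out->end = end;
	return true;
}
```

Taking mem09 from the log above as [0x036e4000, 0x3d627000) and assuming a second node that covers [0x1f400000, 0x3e800000) (the SRAT addresses are cut off in this archive, so that range is a guess based on the 500M-per-node remark), the overlap would be [0x1f400000, 0x3d627000), i.e. only part of the entry is usable for that node.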
[PATCH] perf evsel: Fix incorrect precise_ip in default event name
When no event is specified with -e option, perf will specify a "cycles" event with the highest level of precision available in perf_event_attr.precise_ip as the default event. But the evsel name shows an incorrect precise ip, fix it. For example, with a highest precision perf_event_attr.precise_ip = 2, the evsel name "cycles:ppp" shows a wrong precision available. Before: $./perf record sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.014 MB perf.data (21 samples) ] $./perf evlist -v cycles:ppp: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 2, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1 After: $./perf record sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.014 MB perf.data (16 samples) ] $./perf evlist -v cycles:pp: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 2, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1 Signed-off-by: Mengting Zhang --- tools/perf/util/evsel.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c index 0dccdb8..94cf11d 100644 --- a/tools/perf/util/evsel.c +++ b/tools/perf/util/evsel.c @@ -312,7 +312,7 @@ struct perf_evsel *perf_evsel__new_cycles(bool precise) if (asprintf(&evsel->name, "cycles%s%s%.*s", (attr.precise_ip || attr.exclude_kernel) ? ":" : "", attr.exclude_kernel ? "u" : "", -attr.precise_ip ? attr.precise_ip + 1 : 0, "ppp") < 0) +attr.precise_ip ? attr.precise_ip : 0, "ppp") < 0) goto error_free; out: return evsel; -- 1.7.12.4
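The one-character fix relies on how the "%.*s" conversion works: the int argument is a precision that caps how many characters of "ppp" are printed. Below is a standalone sketch of the patched format string using snprintf; the function name and its parameters are made up for illustration and are not perf code.

```c
#include <stdio.h>

/*
 * Build "cycles[:][u][p...]" with the same format string as the patch.
 * precise_ip is passed directly as the precision for "ppp", so a value
 * of 2 prints "pp" and a value of 0 prints nothing.
 */
int sketch_cycles_name(char *buf, size_t len, int precise_ip, int exclude_kernel)
{
	return snprintf(buf, len, "cycles%s%s%.*s",
			(precise_ip || exclude_kernel) ? ":" : "",
			exclude_kernel ? "u" : "",
			precise_ip, "ppp");
}
```

With precise_ip = 2 this yields "cycles:pp"; the old "precise_ip + 1" argument printed one 'p' too many, which is the "cycles:ppp" seen in the Before output.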
Re: [PATCH net] rds: ib: Fix NULL pointer dereference in debug code
From: Håkon Bugge
Date: Tue, 7 Nov 2017 16:33:34 +0100

> rds_ib_recv_refill() is a function that refills an IB receive
> queue. It can be called from both the CQE handler (tasklet) and a
> worker thread.
>
> Just after the call to ib_post_recv(), a debug message is printed with
> rdsdebug():
>
>	ret = ib_post_recv(ic->i_cm_id->qp, &recv->r_wr, &failed_wr);
>	rdsdebug("recv %p ibinc %p page %p addr %lu ret %d\n", recv,
>		 recv->r_ibinc, sg_page(&recv->r_frag->f_sg),
>		 (long) ib_sg_dma_address(
>			ic->i_cm_id->device,
>			&recv->r_frag->f_sg),
>		 ret);
>
> Now consider an invocation of rds_ib_recv_refill() from the worker
> thread, which is preemptible. Further, assume that the worker thread
> is preempted between the ib_post_recv() and rdsdebug() statements.
>
> Then, if the preemption is due to a receive CQE event, the
> rds_ib_recv_cqe_handler() will be invoked. This function processes
> receive completions, including freeing up data structures, such as the
> recv->r_frag.
>
> In this scenario, rds_ib_recv_cqe_handler() will process the receive
> WR posted above. That implies that the recv->r_frag has been freed
> before the above rdsdebug() statement has been executed. When it is
> later executed, we will have a NULL pointer dereference:
...
> This bug was provoked by compiling rds out-of-tree with
> EXTRA_CFLAGS="-DRDS_DEBUG -DDEBUG" and inserting an artificial delay
> between the ib_post_recv() and rdsdebug() statements:
>
>	/* XXX when can this fail? */
>	ret = ib_post_recv(ic->i_cm_id->qp, &recv->r_wr, &failed_wr);
> +	if (can_wait)
> +		usleep_range(1000, 5000);
>	rdsdebug("recv %p ibinc %p page %p addr %lu ret %d\n", recv,
>		 recv->r_ibinc, sg_page(&recv->r_frag->f_sg),
>		 (long) ib_sg_dma_address(
>
> The fix is simply to move the rdsdebug() statement up before the
> ib_post_recv() and remove the printing of ret, which is taken care of
> anyway by the non-debug code.
>
> Signed-off-by: Håkon Bugge
> Reviewed-by: Knut Omang
> Reviewed-by: Wei Lin Guay

Applied, thank you.
Re: [PATCH v2 2/2] ARM: sun8i: bananapi-m3: Enable dwmac-sun8i
On Fri, Nov 10, 2017 at 11:48:11AM +0800, Chen-Yu Tsai wrote:
> On Thu, Nov 9, 2017 at 4:29 PM, Corentin Labbe wrote:
> > The dwmac-sun8i hardware is present on the bananapi m3
> > It uses an external PHY rtl8211e via RGMII.
> >
> > This patch creates the needed emac and phy nodes.
> >
> > Signed-off-by: Corentin Labbe
> > ---
> >  arch/arm/boot/dts/sun8i-a83t-bananapi-m3.dts | 18 ++
> >  1 file changed, 18 insertions(+)
> >
> > diff --git a/arch/arm/boot/dts/sun8i-a83t-bananapi-m3.dts b/arch/arm/boot/dts/sun8i-a83t-bananapi-m3.dts
> > index c606af3dbfed..45bdd5c17829 100644
> > --- a/arch/arm/boot/dts/sun8i-a83t-bananapi-m3.dts
> > +++ b/arch/arm/boot/dts/sun8i-a83t-bananapi-m3.dts
> > @@ -52,6 +52,7 @@
> >  	compatible = "sinovoip,bpi-m3", "allwinner,sun8i-a83t";
> >
> >  	aliases {
> > +		ethernet0 = &emac;
> >  		serial0 = &uart0;
> >  	};
> >
> > @@ -88,6 +89,23 @@
> >  	/* TODO GL830 USB-to-SATA bridge downstream w/ GPIO power controls */
> >  };
> >
> > +&emac {
> > +	pinctrl-names = "default";
> > +	pinctrl-0 = <&emac_rgmii_pins>;
> > +	phy-handle = <&ext_rgmii_phy>;
> > +	phy-mode = "rgmii";
>
> Schematics say PHY is powered by DC1SW from the PMIC.
> Not sure why you don't need that. Have you tested your patch?

Tested on 4.14.0-rc5-next-20171018+
I will try to check which uboot is used, perhaps it's an old uboot with some PMIC hack.

Thanks
Regards
Re: [PATCH] MAINTAINERS: Add Lorenzo Pieralisi for PCI host bridge drivers
Hi, On Thursday 09 November 2017 08:35 PM, Bjorn Helgaas wrote: > On Thu, Nov 09, 2017 at 11:28:36AM +0530, Kishon Vijay Abraham I wrote: >> Hi Bjorn, >> >> On Thursday 09 November 2017 01:56 AM, Bjorn Helgaas wrote: >>> On Wed, Nov 08, 2017 at 02:15:10PM -0600, Bjorn Helgaas wrote: From: Bjorn Helgaas Add Lorenzo Pieralisi as maintainer for PCI native host bridge drivers and the endpoint driver framework. Signed-off-by: Bjorn Helgaas >>> >>> This is on my for-linus branch, and I intend to merge it for v4.14. >> >> There is already an entry for PCI endpoint in MAINTAINERS file. Can Lorenzo >> be >> added there? >> >> PCI ENDPOINT SUBSYSTEM >> M: Kishon Vijay Abraham I >> L: linux-...@vger.kernel.org >> T: git >> git://git.kernel.org/pub/scm/linux/kernel/git/kishon/pci-endpoint.git >> S: Supported >> F: drivers/pci/endpoint/ >> F: drivers/misc/pci_endpoint_test.c >> F: tools/pci/ > > Right, thanks, I forgot all about this separate entry. I added Lorenzo > there, resulting in the patch below. > > My practice has been that all the PCI patches (everything in > drivers/pci plus some include and x86/pci stuff) have been merged via > my tree. > > This includes things in drivers/pci/{host,dwc,endpoint,switch}, which > are non-core things and usually specific to a chipset. I try to > ensure they have individual maintainers designated, and I ask for > their acks for non-trivial changes because I have no specs and no > hardware for testing them. But I think it's still good to have one > person look over them all to try to keep some consistency across them > because they are all quite similar. > > So my hope is that Lorenzo can take over that oversight role from me, > not that he would replace any of those designated maintainers. > > Ideally, this will be transparent to patch submitters except that they > should add Lorenzo to the "To:" line (keeping linux-pci and other > interested parties). Makes sense. 
I'm also wondering whether we should change the tree in PCI ENDPOINT SUBSYSTEM to git git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git or maybe git git://git.kernel.org/pub/scm/linux/kernel/git/lpieralisi/pci.git/, since most of the endpoint patches also modify the controller drivers. Thanks Kishon
[PATCHv2 1/2] capability: introduce sysctl for controlled user-ns capability whitelist
From: Mahesh Bandewar Add a sysctl variable kernel.controlled_userns_caps_whitelist. This takes as input a capability mask expressed as two comma-separated hex u32 words. The mask, however, is stored in kernel as kernel_cap_t type. Any capabilities that are not part of this mask will be controlled and will not be allowed to processes in controlled user-ns. Signed-off-by: Mahesh Bandewar --- v2: Rebase v1: Initial submission Documentation/sysctl/kernel.txt | 21 ++ include/linux/capability.h | 3 +++ kernel/capability.c | 47 + kernel/sysctl.c | 5 + 4 files changed, 76 insertions(+) diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt index 694968c7523c..a1d39dbae847 100644 --- a/Documentation/sysctl/kernel.txt +++ b/Documentation/sysctl/kernel.txt @@ -25,6 +25,7 @@ show up in /proc/sys/kernel: - bootloader_version[ X86 only ] - callhome [ S390 only ] - cap_last_cap +- controlled_userns_caps_whitelist - core_pattern - core_pipe_limit - core_uses_pid @@ -187,6 +188,26 @@ CAP_LAST_CAP from the kernel. == +controlled_userns_caps_whitelist + +Capability mask that is whitelisted for "controlled" user namespaces. +Any capability that is missing from this mask will not be allowed to +any process that is attached to a controlled-userns. e.g. if CAP_NET_RAW +is not part of this mask, then processes running inside any controlled +userns's will not be allowed to perform action that needs CAP_NET_RAW +capability. However, processes that are attached to a parent user-ns +hierarchy that is *not* controlled and has CAP_NET_RAW can continue +performing those actions. User-namespaces are marked "controlled" at +the time of their creation based on the capabilities of the creator. +A process that does not have CAP_SYS_ADMIN will create user-namespaces +that are controlled. + +The value is expressed as two comma separated hex words (u32). This +sysctl is available in init-ns and users with CAP_SYS_ADMIN in init-ns +are allowed to make changes. 
+ +== + core_pattern: core_pattern is used to specify a core dumpfile pattern name. diff --git a/include/linux/capability.h b/include/linux/capability.h index f640dcbc880c..7d79a4689625 100644 --- a/include/linux/capability.h +++ b/include/linux/capability.h @@ -14,6 +14,7 @@ #define _LINUX_CAPABILITY_H #include +#include #define _KERNEL_CAPABILITY_VERSION _LINUX_CAPABILITY_VERSION_3 @@ -248,6 +249,8 @@ extern bool ptracer_capable(struct task_struct *tsk, struct user_namespace *ns); /* audit system wants to get cap info from files as well */ extern int get_vfs_caps_from_disk(const struct dentry *dentry, struct cpu_vfs_cap_data *cpu_caps); +int proc_douserns_caps_whitelist(struct ctl_table *table, int write, +void __user *buff, size_t *lenp, loff_t *ppos); extern int cap_convert_nscap(struct dentry *dentry, void **ivalue, size_t size); diff --git a/kernel/capability.c b/kernel/capability.c index 1e1c0236f55b..4a859b7d4902 100644 --- a/kernel/capability.c +++ b/kernel/capability.c @@ -29,6 +29,8 @@ EXPORT_SYMBOL(__cap_empty_set); int file_caps_enabled = 1; +kernel_cap_t controlled_userns_caps_whitelist = CAP_FULL_SET; + static int __init file_caps_disable(char *str) { file_caps_enabled = 0; @@ -507,3 +509,48 @@ bool ptracer_capable(struct task_struct *tsk, struct user_namespace *ns) rcu_read_unlock(); return (ret == 0); } + +/* Controlled-userns capabilities routines */ +#ifdef CONFIG_SYSCTL +int proc_douserns_caps_whitelist(struct ctl_table *table, int write, +void __user *buff, size_t *lenp, loff_t *ppos) +{ + DECLARE_BITMAP(caps_bitmap, CAP_LAST_CAP); + struct ctl_table caps_table; + char tbuf[NAME_MAX]; + int ret; + + ret = bitmap_from_u32array(caps_bitmap, CAP_LAST_CAP, + controlled_userns_caps_whitelist.cap, + _KERNEL_CAPABILITY_U32S); + if (ret != CAP_LAST_CAP) + return -1; + + scnprintf(tbuf, NAME_MAX, "%*pb", CAP_LAST_CAP, caps_bitmap); + + caps_table.data = tbuf; + caps_table.maxlen = NAME_MAX; + caps_table.mode = table->mode; + ret = 
proc_dostring(&caps_table, write, buff, lenp, ppos); + if (ret) + return ret; + if (write) { + kernel_cap_t tmp; + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + + ret = bitmap_parse_user(buff, *lenp, caps_bitmap, CAP_LAST_CAP); + if (ret) + return ret; + + ret = bitmap_to_u32array(tmp.cap, _KERNEL_CAPABILITY_U32S, +caps_b
[PATCHv2 2/2] userns: control capabilities of some user namespaces
From: Mahesh Bandewar With this new notion of "controlled" user-namespaces, the controlled user-namespaces are marked at the time of their creation while the capabilities of processes that belong to them are controlled using the global mask. Init-user-ns is always uncontrolled and a process that has SYS_ADMIN that belongs to uncontrolled user-ns can create another (child) user- namespace that is uncontrolled. Any other process (that either does not have SYS_ADMIN or belongs to a controlled user-ns) can only create a user-ns that is controlled. global-capability-whitelist (controlled_userns_caps_whitelist) is used at the capability check-time and keeps the semantics for the processes that belong to uncontrolled user-ns as it is. Processes that belong to controlled user-ns however are subjected to different checks- (a) if the capability in question is controlled and process belongs to controlled user-ns, then it's always denied. (b) if the capability in question is NOT controlled then fall back to the traditional check. Signed-off-by: Mahesh Bandewar --- v2: Don't recalculate user-ns flags for every setns() call. v1: Initial submission. 
include/linux/capability.h | 1 + include/linux/user_namespace.h | 20 kernel/capability.c| 5 + kernel/user_namespace.c| 4 security/commoncap.c | 8 5 files changed, 38 insertions(+) diff --git a/include/linux/capability.h b/include/linux/capability.h index 7d79a4689625..a1fd9e460379 100644 --- a/include/linux/capability.h +++ b/include/linux/capability.h @@ -251,6 +251,7 @@ extern bool ptracer_capable(struct task_struct *tsk, struct user_namespace *ns); extern int get_vfs_caps_from_disk(const struct dentry *dentry, struct cpu_vfs_cap_data *cpu_caps); int proc_douserns_caps_whitelist(struct ctl_table *table, int write, void __user *buff, size_t *lenp, loff_t *ppos); +bool is_capability_controlled(int cap); extern int cap_convert_nscap(struct dentry *dentry, void **ivalue, size_t size); diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h index 3fe714da7f5a..647f825c7b5f 100644 --- a/include/linux/user_namespace.h +++ b/include/linux/user_namespace.h @@ -23,6 +23,7 @@ struct uid_gid_map { /* 64 bytes -- 1 cache line */ }; #define USERNS_SETGROUPS_ALLOWED 1UL +#define USERNS_CONTROLLED 2UL #define USERNS_INIT_FLAGS USERNS_SETGROUPS_ALLOWED @@ -103,6 +104,16 @@ static inline void put_user_ns(struct user_namespace *ns) __put_user_ns(ns); } +static inline bool is_user_ns_controlled(const struct user_namespace *ns) +{ + return ns->flags & USERNS_CONTROLLED; +} + +static inline void mark_user_ns_controlled(struct user_namespace *ns) +{ + ns->flags |= USERNS_CONTROLLED; +} + struct seq_operations; extern const struct seq_operations proc_uid_seq_operations; extern const struct seq_operations proc_gid_seq_operations; @@ -161,6 +172,15 @@ static inline struct ns_common *ns_get_owner(struct ns_common *ns) { return ERR_PTR(-EPERM); } + +static inline bool is_user_ns_controlled(const struct user_namespace *ns) +{ + return false; +} + +static inline void mark_user_ns_controlled(struct user_namespace *ns) +{ +} #endif #endif /* _LINUX_USER_H */ diff --git 
a/kernel/capability.c b/kernel/capability.c index 4a859b7d4902..bffe249922de 100644 --- a/kernel/capability.c +++ b/kernel/capability.c @@ -511,6 +511,11 @@ bool ptracer_capable(struct task_struct *tsk, struct user_namespace *ns) } /* Controlled-userns capabilities routines */ +bool is_capability_controlled(int cap) +{ + return !cap_raised(controlled_userns_caps_whitelist, cap); +} + #ifdef CONFIG_SYSCTL int proc_douserns_caps_whitelist(struct ctl_table *table, int write, void __user *buff, size_t *lenp, loff_t *ppos) diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c index c490f1e4313b..600c7dcb9ff7 100644 --- a/kernel/user_namespace.c +++ b/kernel/user_namespace.c @@ -139,6 +139,10 @@ int create_user_ns(struct cred *new) goto fail_keyring; set_cred_user_ns(new, ns); + if (!ns_capable(parent_ns, CAP_SYS_ADMIN) || + is_user_ns_controlled(parent_ns)) + mark_user_ns_controlled(ns); + return 0; fail_keyring: #ifdef CONFIG_PERSISTENT_KEYRINGS diff --git a/security/commoncap.c b/security/commoncap.c index fc46f5b85251..89103f16ac37 100644 --- a/security/commoncap.c +++ b/security/commoncap.c @@ -73,6 +73,14 @@ int cap_capable(const struct cred *cred, struct user_namespace *targ_ns, { struct user_namespace *ns = targ_ns; + /* If the capability is controlled and user-ns that process +* belongs-to is 'controlled' then return EPERM and no need +* to check the user-ns hierarchy. +*/ + if (is_user_ns_controlled(cred->user_ns) && +
[PATCHv2 0/2] capability controlled user-namespaces
From: Mahesh Bandewar

TL;DR version
-------------
Creating a sandbox environment with namespaces is challenging
considering what these sandboxed processes can engage into. e.g.
CVE-2017-6074, CVE-2017-7184, CVE-2017-7308 etc. just to name a few.
Current form of user-namespaces, however, if changed a bit can allow
us to create a sandbox environment without locking down
user-namespaces.

Detailed version
----------------

Problem
-------
User-namespaces in the current form have increased the attack surface as
any process can acquire capabilities which are not available to them (by
default) by performing a combination of clone()/unshare()/setns() syscalls.

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <sched.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <netinet/in.h>

    int main(int ac, char **av)
    {
        int sock = -1;

        printf("Attempting to open RAW socket before unshare()...\n");
        sock = socket(AF_INET6, SOCK_RAW, IPPROTO_RAW);
        if (sock < 0) {
            perror("socket() SOCK_RAW failed: ");
        } else {
            printf("Successfully opened RAW-Sock before unshare().\n");
            close(sock);
            sock = -1;
        }

        if (unshare(CLONE_NEWUSER | CLONE_NEWNET) < 0) {
            perror("unshare() failed: ");
            return 1;
        }

        printf("Attempting to open RAW socket after unshare()...\n");
        sock = socket(AF_INET6, SOCK_RAW, IPPROTO_RAW);
        if (sock < 0) {
            perror("socket() SOCK_RAW failed: ");
        } else {
            printf("Successfully opened RAW-Sock after unshare().\n");
            close(sock);
            sock = -1;
        }

        return 0;
    }

The above example shows how easy it is to acquire NET_RAW capabilities
and once acquired, these processes could take advantage of the
above-mentioned or similar issues discovered/undiscovered with malicious
intent. Note that this is just an example and the problem/solution is not
limited to NET_RAW capability *only*.

The easiest fix one can apply here is to lock-down user-namespaces which
many of the distros do (i.e. don't allow users to create user namespaces),
but unfortunately that prevents everyone from using them.

Approach
--------
Introduce a notion of 'controlled' user-namespaces. Every process on
the host is allowed to create user-namespaces (governed by the limit
imposed by per-ns sysctl) however, mark user-namespaces created by
sandboxed processes as 'controlled'. Use this 'mark' at the time of
capability check in conjunction with a global capability whitelist.
If the capability is not whitelisted, processes that belong to
controlled user-namespaces will not be allowed.

Once a user-ns is marked as 'controlled', all its child user-namespaces
are marked as 'controlled' too.

A global whitelist is a list of capabilities governed by the sysctl
which is available to a (privileged) user in init-ns to modify while
it's applicable to all controlled user-namespaces on the host.

Marking user-namespaces controlled without modifying the whitelist is
equivalent to the current behavior. The default value of the whitelist
includes all capabilities so that compatibility is maintained. However
it gives admins fine-grained ability to control various capabilities
system wide without locking down user-namespaces.

Please see individual patches in this series.

Mahesh Bandewar (2):
  capability: introduce sysctl for controlled user-ns capability whitelist
  userns: control capabilities of some user namespaces

 Documentation/sysctl/kernel.txt | 21 +
 include/linux/capability.h      | 4
 include/linux/user_namespace.h  | 20
 kernel/capability.c             | 52 +
 kernel/sysctl.c                 | 5
 kernel/user_namespace.c         | 4
 security/commoncap.c            | 8 +++
 7 files changed, 114 insertions(+)

-- 
2.15.0.448.gf294e3d99a-goog
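The marking and check rules described in the cover letter can be sketched in plain userspace C. This is only an illustration under stated assumptions: the sketch_* names and the single 64-bit whitelist are hypothetical stand-ins (the patches use struct user_namespace flags and kernel_cap_t), and SKETCH_CAP_NET_RAW simply mirrors the value of CAP_NET_RAW (13) from the uapi header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_CAP_NET_RAW 13	/* same number as CAP_NET_RAW */

/* Hypothetical stand-in for the USERNS_CONTROLLED flag on a user-ns. */
struct sketch_userns {
	bool controlled;
};

/* Default whitelist: every capability allowed, so behavior is unchanged. */
static uint64_t sketch_whitelist = ~0ULL;

/* A capability is "controlled" when it is missing from the whitelist. */
static bool sketch_cap_is_controlled(int cap)
{
	return !(sketch_whitelist & (1ULL << cap));
}

/*
 * Marking rule at creation time: the child is controlled if the creator
 * lacks SYS_ADMIN or the parent user-ns is itself controlled.
 */
static void sketch_create_userns(struct sketch_userns *child,
				 const struct sketch_userns *parent,
				 bool creator_has_sys_admin)
{
	child->controlled = !creator_has_sys_admin || parent->controlled;
}

/*
 * Check-time rule (a): deny outright when both the namespace and the
 * capability are controlled. Otherwise the traditional check would run.
 */
static bool sketch_denied_by_whitelist(const struct sketch_userns *ns, int cap)
{
	return ns->controlled && sketch_cap_is_controlled(cap);
}
```

With the default whitelist nothing is denied by this rule; clearing bit 13 would deny NET_RAW only to processes in controlled namespaces, which is the compatibility property the cover letter relies on.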
Re: [kernel-hardening] Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces
On Fri, Nov 10, 2017 at 1:46 PM, Serge E. Hallyn wrote:
> Quoting Eric W. Biederman (ebied...@xmission.com):
>> single sandbox. I am not at all certain that the capabilities is the
>> proper place to limit code reachability.
>
> Right, I keep having this gut feeling that there is another way we
> should be doing that. Maybe based on ksplice or perf, or maybe more
> based on subsystems. And I hope someone pursues that. But I can't put
> my finger on it, and meanwhile the capability checks obviously *are* in
> fact gates...
>
Well, I don't mind if there is a better solution available. The
proposed solution does not add much code or complexity: it uses one bit
and a sysctl, and will be sitting dormant. When we have a complete
solution, this addition should not be a burden to maintain because of
its non-invasive footprint.

I will push the next version of the patch-set that implements Serge's finding.

Thanks,
--mahesh..

[PS: I'll be soon traveling again and moving to an area where
connectivity will be scarce / unreliable. So please expect a lot more
delays in my responses.]

> -serge
linux-next: Tree for Nov 10
Hi all, Changes since 20171109: The powerpc tree still had its build failure for which I applied a patch The net-next tree gained a conflict against Linus' tree. The tip tree lost its build failure but gained a conflict against Linus' tree. The rcu tree gained a conflict against the tip tree. The gpio tree gained a conflict against Linus' tree. The akpm-current tree gained a conflict against the tip tree. Non-merge commits (relative to Linus' tree): 11746 11057 files changed, 546043 insertions(+), 263268 deletions(-) I have created today's linux-next tree at git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git (patches at http://www.kernel.org/pub/linux/kernel/next/ ). If you are tracking the linux-next tree using git, you should not use "git pull" to do so as that will try to merge the new linux-next release with the old one. You should use "git fetch" and checkout or reset to the new master. You can see which trees have been included by looking in the Next/Trees file in the source. There are also quilt-import.log and merge.log files in the Next directory. Between each merge, the tree was built with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a multi_v7_defconfig for arm and a native build of tools/perf. After the final fixups (if any), I do an x86_64 modules_install followed by builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc and sparc64 defconfig. And finally, a simple boot test of the powerpc pseries_le_defconfig kernel in qemu (with and without kvm enabled). Below is a summary of the state of the merge. I am currently merging 272 trees (counting Linus' and 42 trees of bug fix patches pending for the current merge release). Stats about the size of the tree over time can be seen at http://neuling.org/linux-next-size.html . Status of my local build tests will be at http://kisskb.ellerman.id.au/linux-next . 
If maintainers want to give advice about cross compilers/configs that work, we are always open to add more builds. Thanks to Randy Dunlap for doing many randconfig builds. And to Paul Gortmaker for triage and bug fixes. -- Cheers, Stephen Rothwell $ git checkout master $ git reset --hard stable Merging origin/master (3fefc31843cf Merge tag 'pm-final-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm) Merging fixes/master (820bf5c419e4 Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi) Merging kbuild-current/fixes (bb3f38c3c5b7 kbuild: clang: fix build failures with sparse check) Merging arc-current/for-curr (92d44128241f ARCv2: Accomodate HS48 MMUv5 by relaxing MMU ver checking) Merging arm-current/fixes (b9dd05c7002e ARM: 8720/1: ensure dump_instr() checks addr_limit) Merging m68k-current/for-linus (558d5ad276c9 m68k/mac: Avoid soft-lockup warning after mach_power_off) Merging metag-fixes/fixes (b884a190afce metag/usercopy: Add missing fixups) Merging powerpc-fixes/fixes (7ecb37f62fe5 powerpc/perf: Fix core-imc hotplug callback failure during imc initialization) Merging sparc/master (23198ddffb6c sparc32: Add cmpxchg64().) Merging fscrypt-current/for-stable (42d97eb0ade3 fscrypt: fix renaming and linking special files) Merging net/master (6a1728024745 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec) Merging ipsec/master (c9f3f813d462 xfrm: Fix stack-out-of-bounds read in xfrm_state_find.) 
Merging netfilter/master (7400bb4b5800 netfilter: nf_reject_ipv4: Fix use-after-free in send_reset) Merging ipvs/master (f7fb77fc1235 netfilter: nft_compat: check extension hook mask only if set) Merging wireless-drivers/master (a6127b4440d1 Merge tag 'iwlwifi-for-kalle-2017-10-06' of git://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-fixes) Merging mac80211/master (9618aec3349b Merge tag 'mac80211-for-davem-2017-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211) Merging sound-current/for-linus (75ee94b20b46 ALSA: hda - fix headset mic problem for Dell machines with alc274) Merging pci-current/for-linus (6b7be529634b MAINTAINERS: Add Lorenzo Pieralisi for PCI host bridge drivers) Merging driver-core.current/driver-core-linus (39dae59d66ac Linux 4.14-rc8) Merging tty.current/tty-linus (8a5776a5f498 Linux 4.14-rc4) Merging usb.current/usb-linus (bb176f67090c Linux 4.14-rc6) Merging usb-gadget-fixes/fixes (7c80f9e4a588 usb: usbtest: fix NULL pointer dereference) Merging usb-serial-fixes/usb-linus (0b07194bb55e Linux 4.14-rc7) Merging usb-chipidea-fixes/ci-for-usb-stable (cbb22ebcfb99 usb: chipidea: core: check before accessing ci_role in ci_role_show) Merging phy/fixes (2fb850092fd9 phy: rockchip-typec: Check for errors from tcphy_phy_init()) Merging s
Re: [PATCH resend 1/2] capability: introduce sysctl for controlled user-ns capability whitelist
On Fri, Nov 10, 2017 at 1:30 PM, Serge E. Hallyn wrote: > Quoting Mahesh Bandewar (महेश बंडेवार) (mahe...@google.com): > ... >> >> >> >> == >> >> >> >> +controlled_userns_caps_whitelist >> >> + >> >> +Capability mask that is whitelisted for "controlled" user namespaces. >> >> +Any capability that is missing from this mask will not be allowed to >> >> +any process that is attached to a controlled-userns. e.g. if CAP_NET_RAW >> >> +is not part of this mask, then processes running inside any controlled >> >> +userns's will not be allowed to perform action that needs CAP_NET_RAW >> >> +capability. However, processes that are attached to a parent user-ns >> >> +hierarchy that is *not* controlled and has CAP_NET_RAW can continue >> >> +performing those actions. User-namespaces are marked "controlled" at >> >> +the time of their creation based on the capabilities of the creator. >> >> +A process that does not have CAP_SYS_ADMIN will create user-namespaces >> >> +that are controlled. >> > >> > Hm. I think that's fine (the way 'controlled' user namespaces are >> > defined), but that is design decision in itself, and should perhaps be >> > discussed. >> > >> > Did you consider other ways? What about using CAP_SETPCAP? >> > >> I did try other ways e.g. using another bounding-set etc. but >> eventually settled with this approach because of main two properties - > > No, I meant did you try other ways of defining a controlled user > namespace, other than one which is created by a task lacking > CAP_SYS_ADMIN? > SYS_ADMIN is the capability that has been used for deciding who can or cannot create namespaces, so didn't want to create another model that may not be compatible with current model which is well understood hence no. > ... > >> >> +The value is expressed as two comma separated hex words (u32). This >> > >> > Why comma separated? whitespace ok? Leading 0x ok? What is the >> > default at boot? 
(Obviously the patch tells me, I'm asking for it >> > to be spelled out in the doc) >> > >> I tried multiple ways including representing capabilities in >> string/name form for better readability but didn't want to add >> additional complexities of dealing with strings and possible >> string-related-issues for this. Also didn't want to reinvent the new >> form so settled with something that is widely used (cpu >> bounding/affinity/irq mapping etc.) and is capable of handling growing >> bit set (currently 37 but possibly more later). > > Ok, thanks.
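To make the "two comma separated hex words" format discussed above concrete, here is a small userspace sketch that turns such a string into a 64-bit mask. It is an assumption-laden illustration: the sketch_* helpers are hypothetical, the most-significant-word-first order follows the usual kernel bitmap string convention, and nothing here claims to reproduce what bitmap_parse_user() accepts or rejects.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse one hex word of at most 32 bits and advance *p past it.
 * Note: strtoul() also accepts leading whitespace and a "0x" prefix;
 * whether the sysctl does is exactly the question raised in the thread.
 */
static int sketch_parse_word(const char **p, uint32_t *out)
{
	char *end;
	unsigned long v;

	errno = 0;
	v = strtoul(*p, &end, 16);
	if (end == *p || errno || v > UINT32_MAX)
		return -1;
	*out = (uint32_t)v;
	*p = end;
	return 0;
}

/* Parse "HI,LO" (optionally newline-terminated) into a 64-bit mask. */
static int sketch_parse_caps_mask(const char *s, uint64_t *mask)
{
	uint32_t hi, lo;

	if (sketch_parse_word(&s, &hi) || *s++ != ',' ||
	    sketch_parse_word(&s, &lo))
		return -1;
	if (*s == '\n')
		s++;
	if (*s != '\0')
		return -1;

	*mask = ((uint64_t)hi << 32) | lo;
	return 0;
}
```

Under these assumptions, a string such as "0,ffffdfff" would describe a whitelist with bit 13 (CAP_NET_RAW) cleared in the low word.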
Re: [PATCH 03/14] soundwire: Add Master registration
On Thu, Nov 09, 2017 at 09:14:16PM +, Srinivas Kandagatla wrote:
>
>
> On 19/10/17 04:03, Vinod Koul wrote:
>
> >+/**
> >+ * sdw_add_bus_master: add a bus Master instance
> >+ *
> >+ * @bus: bus instance
> >+ *
> >+ * Initializes the bus instance, read properties and create child
> >+ * devices.
> >+ */
>
> Some of the exported functions are missing kerneldocs.
> Is it something you plan to add in next version of the patchset?

I thought most were, will double check to be sure.

>
> >+int sdw_add_bus_master(struct sdw_bus *bus)
> >+{
> >+	int ret;
> >+
> >+	if (!bus->dev) {
> >+		pr_err("SoundWire bus has no device");
> >+		return -ENODEV;
> >+	}
> >+
> >+	mutex_init(&bus->bus_lock);
> >+	INIT_LIST_HEAD(&bus->slaves);
> >+
> >+	/*
> >+	 * SDW is an enumerable bus, but devices can be powered off. So,
> >+	 * they won't be able to report as present.
> >+	 *
> >+	 * Create Slave devices based on Slaves described in
> >+	 * the respective firmware (ACPI/DT)
> >+	 */
> >+
> >+	if (IS_ENABLED(CONFIG_ACPI) && bus->dev && ACPI_HANDLE(bus->dev))
> >+		ret = sdw_acpi_find_slaves(bus);
> >+	else if (IS_ENABLED(CONFIG_OF) && bus->dev && bus->dev->of_node)
> >+		ret = sdw_of_find_slaves(bus);
> >+	else
> bus->dev is already checked in the start of the function, do we need to
> check once again ?

yes already fixed, thanks

-- 
~Vinod
Re: [PATCH] drivers: hv: balloon: remove extraneous assignment to region_start
On Wed, 18 Oct 2017 12:52:12 +0100 Colin King wrote: > From: Colin Ian King > > The variable region_start is assigned twice, the first value is > never read and redundant, so can be removed. We can clean up the > code further by assigning rg_start directly rather than using the > temporary variable region_start which can then be removed. Cleans > up the clang warning: > > drivers/hv/hv_balloon.c:976:3: warning: Value stored to 'region_start' > is never read > > Signed-off-by: Colin Ian King LGTM Acked-by: Stephen Hemminger
Re: [PATCH 02/14] soundwire: Add SoundWire bus type
On Thu, Nov 09, 2017 at 09:14:07PM +, Srinivas Kandagatla wrote: > > > On 19/10/17 04:03, Vinod Koul wrote: > >This adds the base SoundWire bus type, bus and driver registration. > >along with changes to module device table for new SoundWire > >device type. > > > >Signed-off-by: Sanyog Kale > >Signed-off-by: Vinod Koul > >--- > > >+++ b/drivers/soundwire/Kconfig > >@@ -0,0 +1,22 @@ > >+# > >+# SoundWire subsystem configuration > >+# > >+ > >+menuconfig SOUNDWIRE > >+bool "SoundWire support" > > Any reason why this subsystem can not be build as module? This is not subsystem symbol but the menu. The SOUNDWIRE_BUS can be module. > > >+---help--- > >+ SoundWire is a 2-Pin interface with data and clock line ratified > >+ by the MIPI Alliance. SoundWire is used for transporting data > >+ typically related to audio functions. SoundWire interface is > > >+#ifndef __SDW_BUS_H > >+#define __SDW_BUS_H > >+ > >+#include > >+#include > >+#include > >+#include > >+#include > Do you need these headers here? Yes :) I will double check though > > >+#include > >+ > >+int sdw_slave_modalias(struct sdw_slave *slave, char *buf, size_t size); > >+ > >+#endif /* __SDW_BUS_H */ > >diff --git a/drivers/soundwire/bus_type.c b/drivers/soundwire/bus_type.c > >new file mode 100644 > >index ..a14d1de80afa > >--- /dev/null > >+++ b/drivers/soundwire/bus_type.c > > > >+#include > >+#include > >+#include > >+#include > >+#include > >+#include > >+#include > >+#include > >+#include "bus.h" > >+ > >+/** > >+ * sdw_get_device_id: find the matching SoundWire device id > >+ * > function name should end with () - according to kernel doc. ah thanks for pointing will add > > >+ * @slave: SoundWire Slave device > >+ * @drv: SoundWire Slave Driver > >+ * > >+ * The match is done by comparing the mfg_id and part_id from the > >+ * struct sdw_device_id. class_id is unused, as it is a placeholder > >+ * in MIPI Spec. > >+ */ > > BTW, This is a static private function, why are we adding kernel doc for > this? 
the match is an important routine and helps people understand the logic hence documentation. More doc is better right :) > > >+static const struct sdw_device_id * > >+sdw_get_device_id(struct sdw_slave *slave, struct sdw_driver *drv) > >+{ > >+const struct sdw_device_id *id = drv->id_table; > >+ > >+while (id && id->mfg_id) { > >+if (slave->id.mfg_id == id->mfg_id && > >+slave->id.part_id == id->part_id) { > >+return id; > >+} > >+id++; > >+} > >+ > >+return NULL; > >+} > >+ > >+static int sdw_bus_match(struct device *dev, struct device_driver *ddrv) > >+{ > >+struct sdw_slave *slave = dev_to_sdw_dev(dev); > >+struct sdw_driver *drv = drv_to_sdw_driver(ddrv); > >+ > >+return !!sdw_get_device_id(slave, drv); > >+} > >+ > >+int sdw_slave_modalias(struct sdw_slave *slave, char *buf, size_t size) > >+{ > >+/* modalias is sdw:mp */ > >+ > >+return snprintf(buf, size, "sdw:m%04Xp%04X\n", > >+slave->id.mfg_id, slave->id.part_id); > >+} > >+ > >+static int sdw_uevent(struct device *dev, struct kobj_uevent_env *env) > >+{ > >+struct sdw_slave *slave = dev_to_sdw_dev(dev); > >+char modalias[32]; > >+ > >+sdw_slave_modalias(slave, modalias, sizeof(modalias)); > >+ > >+if (add_uevent_var(env, "MODALIAS=%s", modalias)) > >+return -ENOMEM; > >+ > >+return 0; > >+} > >+ > >+struct bus_type sdw_bus_type = { > >+.name = "soundwire", > >+.match = sdw_bus_match, > >+.uevent = sdw_uevent, > >+}; > >+EXPORT_SYMBOL(sdw_bus_type); > >+ > >+static int sdw_drv_probe(struct device *dev) > >+{ > >+struct sdw_slave *slave = dev_to_sdw_dev(dev); > >+struct sdw_driver *drv = drv_to_sdw_driver(dev->driver); > >+const struct sdw_device_id *id; > >+int ret; > >+ > >+id = sdw_get_device_id(slave, drv); > > By this time we must have already matched dev and driver by the ID, > shouldn't it be just slave->id here? 
I don't think so we do not have slave->id, we pass the id in probe as an argument > >+if (!id) > >+return -ENODEV; > >+ > >+/* > >+ * attach to power domain but don't turn on (last arg) > >+ */ > >+ret = dev_pm_domain_attach(dev, false); > >+if (ret) { > Shouldn't it just handle the EPROBE_DEFER case and ignore it for other > errors. why should we ignore other errors and continue? > > > >+dev_err(dev, "Failed to attach PM domain: %d\n", ret); > >+return ret; > >+} > >+ > >+ret = drv->probe(slave, id); > >+if (ret) { > >+dev_err(dev, "Probe of %s failed: %d\n", drv->name, ret); > >+return ret; > >+} > > > What happens if the slave driver is built as module and loaded after the > slave device is attached to the bus. How does the slave driver get updated > status in this case? > > We have simila
Re: [PATCH] tcp: Export to userspace the TCP state names for the trace events
2017-11-10 8:57 GMT+08:00 Steven Rostedt : > > From: "Steven Rostedt (VMware)" > > The TCP trace events (specifically tcp_set_state), maps emums to symbol > names via __print_symbolic(). But this only works for reading trace events > from the tracefs trace files. If perf or trace-cmd were to record these > events, the event format file does not convert the enum names into numbers, > and you get something like: > > __print_symbolic(REC->oldstate, > { TCP_ESTABLISHED, "TCP_ESTABLISHED" }, > { TCP_SYN_SENT, "TCP_SYN_SENT" }, > { TCP_SYN_RECV, "TCP_SYN_RECV" }, > { TCP_FIN_WAIT1, "TCP_FIN_WAIT1" }, > { TCP_FIN_WAIT2, "TCP_FIN_WAIT2" }, > { TCP_TIME_WAIT, "TCP_TIME_WAIT" }, > { TCP_CLOSE, "TCP_CLOSE" }, > { TCP_CLOSE_WAIT, "TCP_CLOSE_WAIT" }, > { TCP_LAST_ACK, "TCP_LAST_ACK" }, > { TCP_LISTEN, "TCP_LISTEN" }, > { TCP_CLOSING, "TCP_CLOSING" }, > { TCP_NEW_SYN_RECV, "TCP_NEW_SYN_RECV" }) > > Where trace-cmd and perf do not know the values of those enums. > > Use the TRACE_DEFINE_ENUM() macros that will have the trace events convert > the enum strings into their values at system boot. 
This will allow perf and > trace-cmd to see actual numbers and not enums: > > __print_symbolic(REC->oldstate, > { 1, "TCP_ESTABLISHED" }, > { 2, "TCP_SYN_SENT" }, > { 3, "TCP_SYN_RECV" }, > { 4, "TCP_FIN_WAIT1" }, > { 5, "TCP_FIN_WAIT2" }, > { 6, "TCP_TIME_WAIT" }, > { 7, "TCP_CLOSE" }, > { 8, "TCP_CLOSE_WAIT" }, > { 9, "TCP_LAST_ACK" }, > { 10, "TCP_LISTEN" }, > { 11, "TCP_CLOSING" }, > { 12, "TCP_NEW_SYN_RECV" }) > > Signed-off-by: Steven Rostedt (VMware) > --- > include/trace/events/tcp.h | 41 - > 1 file changed, 28 insertions(+), 13 deletions(-) > > diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h > index 07a6cbf1..62e5bad7901f 100644 > --- a/include/trace/events/tcp.h > +++ b/include/trace/events/tcp.h > @@ -9,21 +9,36 @@ > #include > #include > > +#define tcp_state_names\ > + EM(TCP_ESTABLISHED) \ > + EM(TCP_SYN_SENT)\ > + EM(TCP_SYN_RECV)\ > + EM(TCP_FIN_WAIT1) \ > + EM(TCP_FIN_WAIT2) \ > + EM(TCP_TIME_WAIT) \ > + EM(TCP_CLOSE) \ > + EM(TCP_CLOSE_WAIT) \ > + EM(TCP_LAST_ACK)\ > + EM(TCP_LISTEN) \ > + EM(TCP_CLOSING) \ > + EMe(TCP_NEW_SYN_RECV) > + > +/* enums need to be exported to user space */ > +#undef EM > +#undef EMe > +#define EM(a) TRACE_DEFINE_ENUM(a); > +#define EMe(a)TRACE_DEFINE_ENUM(a); > + > +tcp_state_names > + > +#undef EM > +#undef EMe > +#define EM(a) tcp_state_name(a), > +#define EMe(a)tcp_state_name(a) > + > #define tcp_state_name(state) { state, #state } > #define show_tcp_state_name(val) \ > - __print_symbolic(val, \ > - tcp_state_name(TCP_ESTABLISHED),\ > - tcp_state_name(TCP_SYN_SENT), \ > - tcp_state_name(TCP_SYN_RECV), \ > - tcp_state_name(TCP_FIN_WAIT1), \ > - tcp_state_name(TCP_FIN_WAIT2), \ > - tcp_state_name(TCP_TIME_WAIT), \ > - tcp_state_name(TCP_CLOSE), \ > - tcp_state_name(TCP_CLOSE_WAIT), \ > - tcp_state_name(TCP_LAST_ACK), \ > - tcp_state_name(TCP_LISTEN), \ > - tcp_state_name(TCP_CLOSING),\ > - tcp_state_name(TCP_NEW_SYN_RECV)) > + __print_symbolic(val, tcp_state_names) > > /* > * tcp event with 
arguments sk and skb > -- > 2.13.6 >

Could the macro tcp_state_name() be renamed? If it is included in include/net/tcp.h, it will cause a compile error, because there is another function named tcp_state_name() defined in net/netfilter/ipvs/ip_vs_proto_tcp.c:

static const char *tcp_state_name(int state)
{
	if (state >= IP_VS_TCP_S_LAST)
		return "ERR!";
	return tcp_state_name_table[state] ? tcp_state_name_table[state] : "?";
}

Thanks
Yafang
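For readers unfamiliar with the EM()/EMe() construct in the patch quoted above, it is the usual "X-macro" idiom: one list of names is expanded several times with different definitions of EM and EMe. The following is only a userspace sketch of that idiom, not the kernel code. The enum, its values, and every demo_* name are made up for illustration, and the second expansion (an entry counter) merely stands in for TRACE_DEFINE_ENUM(), which exists only in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's TCP state enum (values assumed). */
enum demo_state {
	DEMO_ESTABLISHED = 1,
	DEMO_SYN_SENT,
	DEMO_CLOSE,
};

/* The single list of names; EMe() marks the last entry (no trailing comma). */
#define demo_state_names \
	EM(DEMO_ESTABLISHED) \
	EM(DEMO_SYN_SENT) \
	EMe(DEMO_CLOSE)

struct demo_sym {
	int value;
	const char *name;
};

/* First expansion: build { value, "NAME" } pairs, as __print_symbolic() takes. */
#define EM(a)	{ a, #a },
#define EMe(a)	{ a, #a }
static const struct demo_sym demo_syms[] = { demo_state_names };
#undef EM
#undef EMe

/* Second expansion of the same list: count the entries (0 + 1 + 1 + 1). */
#define EM(a)	+ 1
#define EMe(a)	+ 1
enum { DEMO_NR_STATES = 0 demo_state_names };
#undef EM
#undef EMe

/* Look up the symbolic name for a value; "?" if it is not in the list. */
const char *demo_state_str(int value)
{
	size_t i;

	for (i = 0; i < sizeof(demo_syms) / sizeof(demo_syms[0]); i++)
		if (demo_syms[i].value == value)
			return demo_syms[i].name;
	return "?";
}

/* Usage: both expansions stay in sync because they come from one list. */
void demo_self_check(void)
{
	assert((size_t)DEMO_NR_STATES == sizeof(demo_syms) / sizeof(demo_syms[0]));
	assert(strcmp(demo_state_str(DEMO_CLOSE), "DEMO_CLOSE") == 0);
}
```

Because the helper macros are ordinary preprocessor names, they can collide with functions of the same name in any file that ends up including the header, which is the concern raised in the reply above.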
Re: [PATCH 2/4] kbuild: remove redundant $(wildcard ...) for cmd_files calculation
2017-11-10 13:53 GMT+09:00 Doug Anderson : > Hi, > > On Thu, Nov 9, 2017 at 7:41 AM, Masahiro Yamada > wrote: >> I do not why $(wildcard ...) needs to be called twice for computing >> cmd_files. Remove the first one. > > I tried and I can't find any reason for the two calls $(wildcard ...) > either, so this seems fine to me. > > >> Signed-off-by: Masahiro Yamada >> --- >> >> Makefile | 3 +-- >> scripts/Makefile.build | 3 +-- >> scripts/Makefile.headersinst | 3 +-- >> scripts/Makefile.modpost | 3 +-- >> 4 files changed, 4 insertions(+), 8 deletions(-) >> >> diff --git a/Makefile b/Makefile >> index a7476e6..58dd245 100644 >> --- a/Makefile >> +++ b/Makefile >> @@ -1693,8 +1693,7 @@ cmd_crmodverdir = $(Q)mkdir -p $(MODVERDIR) \ >> >> # read all saved command lines >> >> -targets := $(wildcard $(sort $(targets))) >> -cmd_files := $(wildcard .*.cmd $(foreach f,$(targets),$(dir $(f)).$(notdir >> $(f)).cmd)) >> +cmd_files := $(wildcard .*.cmd $(foreach f,$(sort $(targets)),$(dir >> $(f)).$(notdir $(f)).cmd)) >> >> ifneq ($(cmd_files),) >>$(cmd_files): ; # Do not try to update included dependency files >> diff --git a/scripts/Makefile.build b/scripts/Makefile.build >> index 061d0c3..62d5314 100644 >> --- a/scripts/Makefile.build >> +++ b/scripts/Makefile.build >> @@ -583,8 +583,7 @@ FORCE: >> # optimization, we don't need to read them if the target does not >> # exist, we will rebuild anyway in that case. 
>> >> -targets := $(wildcard $(sort $(targets))) >> -cmd_files := $(wildcard $(foreach f,$(targets),$(dir $(f)).$(notdir >> $(f)).cmd)) >> +cmd_files := $(wildcard $(foreach f,$(sort $(targets)),$(dir $(f)).$(notdir >> $(f)).cmd)) >> >> ifneq ($(cmd_files),) >>include $(cmd_files) >> diff --git a/scripts/Makefile.headersinst b/scripts/Makefile.headersinst >> index 5692d7a..2aa9181 100644 >> --- a/scripts/Makefile.headersinst >> +++ b/scripts/Makefile.headersinst >> @@ -114,9 +114,8 @@ $(check-file): scripts/headers_check.pl $(output-files) >> FORCE >> >> endif >> >> -targets := $(wildcard $(sort $(targets))) >> cmd_files := $(wildcard \ >> - $(foreach f,$(targets),$(dir $(f)).$(notdir $(f)).cmd)) >> + $(foreach f,$(sort $$(targets)),$(dir $(f)).$(notdir >> $(f)).cmd)) > > Did you mean the "$$" here before (targets)? At first glance it seems > wrong... Good catch! I will fix this. Thanks! -- Best Regards Masahiro Yamada
Re: [PATCH 2/4] kbuild: remove redundant $(wildcard ...) for cmd_files calculation
Hi, On Thu, Nov 9, 2017 at 7:41 AM, Masahiro Yamada wrote: > I do not why $(wildcard ...) needs to be called twice for computing > cmd_files. Remove the first one. I tried and I can't find any reason for the two calls $(wildcard ...) either, so this seems fine to me. > Signed-off-by: Masahiro Yamada > --- > > Makefile | 3 +-- > scripts/Makefile.build | 3 +-- > scripts/Makefile.headersinst | 3 +-- > scripts/Makefile.modpost | 3 +-- > 4 files changed, 4 insertions(+), 8 deletions(-) > > diff --git a/Makefile b/Makefile > index a7476e6..58dd245 100644 > --- a/Makefile > +++ b/Makefile > @@ -1693,8 +1693,7 @@ cmd_crmodverdir = $(Q)mkdir -p $(MODVERDIR) \ > > # read all saved command lines > > -targets := $(wildcard $(sort $(targets))) > -cmd_files := $(wildcard .*.cmd $(foreach f,$(targets),$(dir $(f)).$(notdir > $(f)).cmd)) > +cmd_files := $(wildcard .*.cmd $(foreach f,$(sort $(targets)),$(dir > $(f)).$(notdir $(f)).cmd)) > > ifneq ($(cmd_files),) >$(cmd_files): ; # Do not try to update included dependency files > diff --git a/scripts/Makefile.build b/scripts/Makefile.build > index 061d0c3..62d5314 100644 > --- a/scripts/Makefile.build > +++ b/scripts/Makefile.build > @@ -583,8 +583,7 @@ FORCE: > # optimization, we don't need to read them if the target does not > # exist, we will rebuild anyway in that case. 
> > -targets := $(wildcard $(sort $(targets))) > -cmd_files := $(wildcard $(foreach f,$(targets),$(dir $(f)).$(notdir > $(f)).cmd)) > +cmd_files := $(wildcard $(foreach f,$(sort $(targets)),$(dir $(f)).$(notdir > $(f)).cmd)) > > ifneq ($(cmd_files),) >include $(cmd_files) > diff --git a/scripts/Makefile.headersinst b/scripts/Makefile.headersinst > index 5692d7a..2aa9181 100644 > --- a/scripts/Makefile.headersinst > +++ b/scripts/Makefile.headersinst > @@ -114,9 +114,8 @@ $(check-file): scripts/headers_check.pl $(output-files) > FORCE > > endif > > -targets := $(wildcard $(sort $(targets))) > cmd_files := $(wildcard \ > - $(foreach f,$(targets),$(dir $(f)).$(notdir $(f)).cmd)) > + $(foreach f,$(sort $$(targets)),$(dir $(f)).$(notdir $(f)).cmd)) Did you mean the "$$" here before (targets)? At first glance it seems wrong...
Re: [PATCH 10/14] soundwire: Add sysfs for SoundWire DisCo properties
On Thu, Nov 09, 2017 at 09:14:35PM +, Srinivas Kandagatla wrote: > > > On 19/10/17 04:03, Vinod Koul wrote: > >It helps to read the properties for understanding and debug > >SoundWire systems, so add sysfs files for SoundWire DisCo > >properties. > > > >TODO: Add ABI files for sysfs > > > >Signed-off-by: Sanyog Kale > >Signed-off-by: Vinod Koul > >--- > > >diff --git a/drivers/soundwire/bus.c b/drivers/soundwire/bus.c > >index 6c4f41b64744..e3d7aea18892 100644 > >--- a/drivers/soundwire/bus.c > >+++ b/drivers/soundwire/bus.c > >@@ -90,6 +90,8 @@ int sdw_add_bus_master(struct sdw_bus *bus) > > } > > } > >+sdw_sysfs_bus_init(bus); > >+ > > /* > > * SDW is an enumerable bus, but devices can be powered off. So, > > * they won't be able to report as present. > >@@ -119,6 +121,8 @@ static int sdw_delete_slave(struct device *dev, void > >*data) > > struct sdw_slave *slave = dev_to_sdw_dev(dev); > > struct sdw_bus *bus = slave->bus; > >+sdw_sysfs_slave_exit(slave); > >+ > > mutex_lock(&bus->bus_lock); > > if (!list_empty(&bus->slaves)) > > list_del(&slave->node); > >@@ -130,6 +134,7 @@ static int sdw_delete_slave(struct device *dev, void > >*data) > > void sdw_delete_bus_master(struct sdw_bus *bus) > > { > >+sdw_sysfs_bus_init(bus); > > Shouldn't this be sdw_sysfs_bus_exit() here? Yes, that's right; it is fixed for v2. -- ~Vinod
Re: [kernel-hardening] Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces
Quoting Eric W. Biederman (ebied...@xmission.com): > single sandbox. I am not at all certain that the capabilities is the > proper place to limit code reachability. Right, I keep having this gut feeling that there is another way we should be doing that. Maybe based on ksplice or perf, or maybe more based on subsystems. And I hope someone pursues that. But I can't put my finger on it, and meanwhile the capability checks obviously *are* in fact gates... -serge
Re: [PATCH 3/3] VFS: close race between getcwd() and d_move()
On Thu, Nov 09 2017, Linus Torvalds wrote: > On Thu, Nov 9, 2017 at 2:14 PM, NeilBrown wrote: >> On Thu, Nov 09 2017, Linus Torvalds wrote: >>> >>> How nasty would it be to just expand the calls to __d_drop/__d_rehash >>> into __d_move itself, and take both has list locks at the same time >>> (with the usual ordering and checking if it's the same list, of >>> course). >> >> something like this? > > Yes. > > This looks nicer to me. Partly because I hate those "pass flags to > functions that modify their behavior" kinds of patches. I'd rather see > just straight-line unconditional code with some possible duplication. ... > > I also do wonder if we can avoid all the unhash/rehash games entirely > (and avoid the hash list locking) if it turns out that the dentry and > target hash lists are the same. I'm not convinced. I haven't actually tried it, but the matrix of possibilities seems a little large. The source dentry may or may not be hashed (not in the "disconnected IS_ROOT" case), and the target may or may not want to be rehashed afterwards (depending on 'exchange'). We could skip the lock for an exchange if they both had the same hash, but not for a simple move. However your description of what it was that you didn't like gave me an idea - I can take the same approach as my original, but not pass flags around. I quite like how this turned out. Dropping the BUG_ON() in d_rehash() isn't ideal, maybe we could add ___d_rehash() without the BUG_ON() and call that from __d_rehash? Thanks, NeilBrown From: NeilBrown Date: Fri, 10 Nov 2017 15:20:06 +1100 Subject: [PATCH] VFS: close race between getcwd() and d_move() d_move() will call __d_drop() and then __d_rehash() on the dentry being moved. This creates a small window when the dentry appears to be unhashed. Many tests of d_unhashed() are made under ->d_lock and so are safe from racing with this window, but some aren't. 
In particular, getcwd() calls d_unlinked() (which calls d_unhashed()) without d_lock protection, so it can race. This race has been seen in practice with lustre, which uses d_move() as part of name lookup. See: https://jira.hpdd.intel.com/browse/LU-9735 It could race with a regular rename(), and result in ENOENT instead of either the 'before' or 'after' name. The race can be demonstrated with a simple program which has two threads, one renaming a directory back and forth while another calls getcwd() within that directory: it should never fail, but does. See: https://patchwork.kernel.org/patch/9455345/ We could fix this race by taking d_lock and rechecking when d_unhashed() reports true. Alternatively we can remove the window, which is the approach this patch takes. ___d_drop() is introduced, which does *not* clear d_hash.pprev so the dentry still appears to be hashed. __d_drop() calls ___d_drop(), then clears d_hash.pprev. __d_move() now uses ___d_drop() and only clears d_hash.pprev when not rehashing. Signed-off-by: NeilBrown --- fs/dcache.c | 21 ++--- 1 file changed, 14 insertions(+), 7 deletions(-) diff --git a/fs/dcache.c b/fs/dcache.c index f90141387f01..8c83543f5065 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -468,9 +468,11 @@ static void dentry_lru_add(struct dentry *dentry) * d_drop() is used mainly for stuff that wants to invalidate a dentry for some * reason (NFS timeouts or autofs deletes). * - * __d_drop requires dentry->d_lock. + * __d_drop requires dentry->d_lock + * ___d_drop doesn't mark dentry as "unhashed" + * (dentry->d_hash.pprev will be LIST_POISON2, not NULL). */ -void __d_drop(struct dentry *dentry) +static void ___d_drop(struct dentry *dentry) { if (!d_unhashed(dentry)) { struct hlist_bl_head *b; @@ -486,12 +488,15 @@ void __d_drop(struct dentry *dentry) hlist_bl_lock(b); __hlist_bl_del(&dentry->d_hash); - dentry->d_hash.pprev = NULL; hlist_bl_unlock(b); /* After this call, in-progress rcu-walk path lookup will fail. 
*/ write_seqcount_invalidate(&dentry->d_seq); } } +void __d_drop(struct dentry *dentry) { + ___d_drop(dentry); + dentry->d_hash.pprev = NULL; +} EXPORT_SYMBOL(__d_drop); void d_drop(struct dentry *dentry) @@ -2381,7 +2386,7 @@ EXPORT_SYMBOL(d_delete); static void __d_rehash(struct dentry *entry) { struct hlist_bl_head *b = d_hash(entry->d_name.hash); - BUG_ON(!d_unhashed(entry)); + hlist_bl_lock(b); hlist_bl_add_head_rcu(&entry->d_hash, b); hlist_bl_unlock(b); @@ -2818,9 +2823,9 @@ static void __d_move(struct dentry *dentry, struct dentry *target, write_seqcount_begin_nested(&target->d_seq, DENTRY_D_LOCK_NESTED); /* unhash both */ - /* __d_drop does write_seqcount_barrier, but they're OK to nest. */ - __d_drop(dentry); - __d_drop(target); + /* ___d_drop does write_seqcount_barrier, but they're OK to nest. */ + ___d_drop(den
Re: linux-next: manual merge of the net-next tree with Linus' tree
From: Stephen Rothwell Date: Fri, 10 Nov 2017 10:31:00 +1100 > Hi all, > > Today's linux-next merge of the net-next tree got a conflict in: > > net/sched/cls_basic.c > net/sched/cls_u32.c > > between commits: > > 0b2a59894b76 ("cls_basic: use tcf_exts_get_net() before call_rcu()") > 35c55fc156d8 ("cls_u32: use tcf_exts_get_net() before call_rcu()") > > from Linus' tree and commit: > > 1d8134fea2eb ("net_sched: use idr to allocate basic filter handles") > > from the net-next tree. This should be resolved as I've just merged 'net' into 'net-next'.
Re: linux-next: manual merge of the net-next tree with Linus' tree
On Thu, Nov 9, 2017 at 3:31 PM, Stephen Rothwell wrote: > I fixed it up (I think - see below) and can carry the fix as necessary. > This is now fixed as far as linux-next is concerned, but any non trivial > conflicts should be mentioned to your upstream maintainer when your tree > is submitted for merging. You may also want to consider cooperating > with the maintainer of the conflicting tree to minimise any particularly > complex conflicts. It looks good to me. Thanks!
linux-next: manual merge of the akpm-current tree with the tip tree
Hi Andrew, Today's linux-next merge of the akpm-current tree got a conflict in: kernel/softirq.c between commit: f71b74bca637 ("irq/softirqs: Use lockdep to assert IRQs are disabled/enabled") from the tip tree and commit: 275f9389fa4e ("kmemcheck: rip it out") from the akpm-current tree. I fixed it up (the latter removed code modified by the former) and can carry the fix as necessary. This is now fixed as far as linux-next is concerned, but any non trivial conflicts should be mentioned to your upstream maintainer when your tree is submitted for merging. You may also want to consider cooperating with the maintainer of the conflicting tree to minimise any particularly complex conflicts. -- Cheers, Stephen Rothwell
Re: [kernel-hardening] Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces
On Fri, Nov 10, 2017 at 6:58 AM, Eric W. Biederman wrote: > "Mahesh Bandewar (महेश बंडेवार)" writes: > >> [resend response as earlier one failed because of formatting issues] >> >> On Thu, Nov 9, 2017 at 12:21 PM, Serge E. Hallyn wrote: >>> >>> On Thu, Nov 09, 2017 at 09:55:41AM +0900, Mahesh Bandewar (महेश बंडेवार) >>> wrote: >>> > On Thu, Nov 9, 2017 at 4:02 AM, Christian Brauner >>> > wrote: >>> > > On Wed, Nov 08, 2017 at 03:09:59AM -0800, Mahesh Bandewar (महेश >>> > > बंडेवार) wrote: >>> > >> Sorry folks I was traveling and seems like lot happened on this >>> > >> thread. :p >>> > >> >>> > >> I will try to response few of these comments selectively - >>> > >> >>> > >> > The thing that makes me hesitate with this set is that it is a >>> > >> > permanent new feature to address what (I hope) is a temporary >>> > >> > problem. >>> > >> I agree this is permanent new feature but it's not solving a temporary >>> > >> problem. It's impossible to assess what and when new vulnerability >>> > >> that could show up. I think Daniel summed it up appropriately in his >>> > >> response >>> > >> >>> > >> > Seems like there are two naive ways to do it, the first being to just >>> > >> > look at all code under ns_capable() plus code called from there. It >>> > >> > seems like looking at the result of that could be fruitful. >>> > >> This is really hard. The main issue that there were features designed >>> > >> and developed before user-ns days with an assumption that unprivileged >>> > >> users will never get certain capabilities which only root user gets. >>> > >> Now that is not true anymore with user-ns creation with mapping root >>> > >> for any process. Also at the same time blocking user-ns creation for >>> > >> eveyone is a big-hammer which is not needed too. So it's not that easy >>> > >> to just perform a code-walk-though and correct those decisions now. 
>>> > >> >>> > >> > It seems to me that the existing control in >>> > >> > /proc/sys/kernel/unprivileged_userns_clone might be the better duct >>> > >> > tape >>> > >> > in that case. >>> > >> This solution is essentially blocking unprivileged users from using >>> > >> the user-namespaces entirely. This is not really a solution that can >>> > >> work. The solution that this patch-set adds allows unprivileged users >>> > >> to create user-namespaces. Actually the proposed solution is more >>> > >> fine-grained approach than the unprivileged_userns_clone solution >>> > >> since you can selectively block capabilities rather than completely >>> > >> blocking the functionality. >>> > > >>> > > I've been talking to Stéphane today about this and we should also keep >>> > > in mind >>> > > that we have: >>> > > >>> > > chb@conventiont|~ >>> > >> ls -al /proc/sys/user/ >>> > > total 0 >>> > > dr-xr-xr-x 1 root root 0 Nov 6 23:32 . >>> > > dr-xr-xr-x 1 root root 0 Nov 2 22:13 .. >>> > > -rw-r--r-- 1 root root 0 Nov 8 19:48 max_cgroup_namespaces >>> > > -rw-r--r-- 1 root root 0 Nov 8 19:48 max_inotify_instances >>> > > -rw-r--r-- 1 root root 0 Nov 8 19:48 max_inotify_watches >>> > > -rw-r--r-- 1 root root 0 Nov 8 19:48 max_ipc_namespaces >>> > > -rw-r--r-- 1 root root 0 Nov 8 19:48 max_mnt_namespaces >>> > > -rw-r--r-- 1 root root 0 Nov 8 19:48 max_net_namespaces >>> > > -rw-r--r-- 1 root root 0 Nov 8 19:48 max_pid_namespaces >>> > > -rw-r--r-- 1 root root 0 Nov 8 19:48 max_user_namespaces >>> > > -rw-r--r-- 1 root root 0 Nov 8 19:48 max_uts_namespaces >>> > > >>> > > These files allow you to limit the number of namespaces that can be >>> > > created >>> > > *per namespace* type. 
So let's say your system runs a bunch of user >>> > > namespaces >>> > > you can do: >>> > > >>> > > chb@conventiont|~ >>> > >> echo 0 > /proc/sys/user/max_user_namespaces >>> > > >>> > > So that the next time you try to create a user namespaces you'd see: >>> > > >>> > > chb@conventiont|~ >>> > >> unshare -U >>> > > unshare: unshare failed: No space left on device >>> > > >>> > > So there's not even a need to upstream a new sysctl since we have ways >>> > > of >>> > > blocking this. >>> > > >>> > I'm not sure how it's solving the problem that my patch-set is addressing? >>> > I agree though that the need for unprivileged_userns_clone sysctl goes >>> > away as this is equivalent to setting that sysctl to 0 as you have >>> > described above. >>> >>> oh right that was the reasoning iirc for not needing the other sysctl. >>> >>> > However as I mentioned earlier, blocking processes from creating >>> > user-namespaces is not the solution. Processes should be able to >>> > create namespaces as they are designed but at the same time we need to >>> > have controls to 'contain' them if a need arise. Setting max_no to 0 >>> > is not the solution that I'm looking for since it doesn't solve the >>> > problem. >>> >>> well yesterday we were told that was explicitly not the goal, but that was >>> not by you ... i just mention it to explain why we seem to be walking in >>> circles a bit.
Re: [PATCH resend 1/2] capability: introduce sysctl for controlled user-ns capability whitelist
Quoting Mahesh Bandewar (महेश बंडेवार) (mahe...@google.com): ... > >> > >> == > >> > >> +controlled_userns_caps_whitelist > >> + > >> +Capability mask that is whitelisted for "controlled" user namespaces. > >> +Any capability that is missing from this mask will not be allowed to > >> +any process that is attached to a controlled-userns. e.g. if CAP_NET_RAW > >> +is not part of this mask, then processes running inside any controlled > >> +userns's will not be allowed to perform action that needs CAP_NET_RAW > >> +capability. However, processes that are attached to a parent user-ns > >> +hierarchy that is *not* controlled and has CAP_NET_RAW can continue > >> +performing those actions. User-namespaces are marked "controlled" at > >> +the time of their creation based on the capabilities of the creator. > >> +A process that does not have CAP_SYS_ADMIN will create user-namespaces > >> +that are controlled. > > > > Hm. I think that's fine (the way 'controlled' user namespaces are > > defined), but that is design decision in itself, and should perhaps be > > discussed. > > > > Did you consider other ways? What about using CAP_SETPCAP? > > > I did try other ways e.g. using another bounding-set etc. but > eventually settled with this approach because of main two properties - No, I meant did you try other ways of defining a controlled user namespace, other than one which is created by a task lacking CAP_SYS_ADMIN? ... > >> +The value is expressed as two comma separated hex words (u32). This > > > > Why comma separated? whitespace ok? Leading 0x ok? What is the > > default at boot? (Obviously the patch tells me, I'm asking for it > > to be spelled out in the doc) > > > I tried multiple ways including representing capabilities in > string/name form for better readability but didn't want to add > additional complexities of dealing with strings and possible > string-related-issues for this. 
Also didn't want to reinvent the new > form so settled with something that is widely used (cpu > bounding/affinity/irq mapping etc.) and is capable of handling growing > bit set (currently 37 but possibly more later). Ok, thanks.
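The questions above about the value format (comma, whitespace, leading 0x) are easier to discuss against something concrete. Below is a rough userspace sketch of parsing "two comma separated hex words (u32)" into one 64-bit mask. It is not the parser from the patch. It assumes the first word carries the upper 32 bits, as in the kernel's existing bitmap formats, and the function names are hypothetical. A leading 0x is accepted here only because strtoul() accepts it; whether the proposed sysctl does is not stated in the thread.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse one hex word that must fit in 32 bits; advance *p past it. */
static int parse_hex_u32(const char **p, uint32_t *out)
{
	char *end;
	unsigned long v;

	/* Reject leading whitespace and signs, which strtoul() would allow. */
	if (!isxdigit((unsigned char)**p))
		return -1;

	errno = 0;
	v = strtoul(*p, &end, 16);
	if (end == *p || errno == ERANGE || v > UINT32_MAX)
		return -1;

	*out = (uint32_t)v;
	*p = end;
	return 0;
}

/*
 * Parse "hi,lo" into a 64-bit mask. ASSUMPTION: the first word is the
 * upper half. One trailing newline is tolerated, as for a sysctl write.
 * Returns 0 on success, -1 on malformed input.
 */
int parse_cap_mask(const char *s, uint64_t *mask)
{
	uint32_t hi, lo;

	assert(s != NULL && mask != NULL);

	if (parse_hex_u32(&s, &hi) != 0)
		return -1;
	if (*s != ',')
		return -1;
	s++;
	if (parse_hex_u32(&s, &lo) != 0)
		return -1;
	if (*s == '\n')
		s++;
	if (*s != '\0')
		return -1;

	*mask = ((uint64_t)hi << 32) | lo;
	return 0;
}
```

With this sketch a whitespace separator is rejected and an oversized word is an error, which is one possible answer to the questions above, not necessarily the one the patch implements.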
Re: [PATCH 1/2] blk-throtl: make latency= absolute
On Thu, Nov 09, 2017 at 03:42:58PM -0800, Tejun Heo wrote: > Hello, Shaohua. > > On Thu, Nov 09, 2017 at 03:12:12PM -0800, Shaohua Li wrote: > > The percentage latency makes sense, but the absolute latency doesn't to me. > > A > > 4k IO latency could be much smaller than 1M IO latency. If we don't add > > baseline latency, we can't specify a latency target which works for both 4k > > and > > 1M IO. > > It isn't adaptive for sure. I think it's still useful for the > following reasons. > > 1. The absolute latency target is by nature both workload and device >dependent. For a lot of use cases, coming up with a decent number >should be possible. > > 2. There are many use cases which aren't sensitive to the level where >they care much about the difference between small and large >requests. e.g. protecting a managerial job so that it doesn't >completely stall doesn't require tuning things to that level. A >value which is comfortably higher than usually expected latencies >would often be enough (say 100ms). > > 3. It's also useful for verification / testing. I think the absolute latency would only work for HD. For an SSD, a 4k latency is probably 60us and a 1M latency is 500us. The disk must be very contended to make 4k latency reach 500us. Not sensitive doesn't mean no protection. If the use case sets a rough latency target, say 1ms, there will be no protection for 4k IO at all. The baseline latency is pretty reliable for SSD actually. So I'd rather keep the baseline latency for SSD but use absolute latency for HD, which can be done easily by setting DFL_HD_BASELINE_LATENCY to 0. Thanks, Shaohua
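To make the numbers in this exchange concrete, here is a small sketch comparing the two kinds of check. It is not blk-throttle code. It assumes the baseline-relative check has the form "observed > baseline + target", which is one reading of "add baseline latency" above, and all function names are invented. The 60us, 500us and 1ms figures are the rough estimates from the mail, and the 100us relative target is an arbitrary value chosen for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* All latencies are in microseconds. */

/* Absolute check: one threshold regardless of request size. */
bool demo_over_absolute(unsigned int observed, unsigned int target)
{
	return observed > target;
}

/* Baseline-relative check: threshold shifts with the per-size baseline. */
bool demo_over_relative(unsigned int observed, unsigned int baseline,
			unsigned int target)
{
	return observed > baseline + target;
}

/* Usage with the SSD estimates quoted in the mail. */
void demo_latency_check(void)
{
	const unsigned int base_4k = 60, base_1m = 500;

	/* A 4k IO slowed to 500us is still under a 1ms absolute target. */
	assert(!demo_over_absolute(500, 1000));

	/* With a 100us target on top of the baseline it is flagged... */
	assert(demo_over_relative(500, base_4k, 100));

	/* ...while a 1M IO at 550us is within its own allowance. */
	assert(!demo_over_relative(550, base_1m, 100));
}
```

Setting the baseline to 0 in the relative check reduces it to the absolute one, which matches the suggestion of using DFL_HD_BASELINE_LATENCY = 0 for HD.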
Re: [PATCH] drm: gem_cma_helper.c: Allow importing of contiguous scatterlists with nents > 1
Hi Liviu, Thank you for the patch. On Wednesday, 1 November 2017 16:14:19 EET Liviu Dudau wrote: > drm_gem_cma_prime_import_sg_table() will fail if the number of entries > in the sg_table > 1. However, you can have a device that uses an IOMMU > engine and can map a discontiguous buffer with multiple entries that > have consecutive sg_dma_addresses, effectively making it contiguous. > Allow for that scenario by testing the entries in the sg_table for > contiguous coverage. > > Reviewed-by: Brian Starkey > Signed-off-by: Liviu Dudau > --- > > Hi, > > This patch is the only change I need in order to be able to use existing > IOMMU domain infrastructure with the Mali DP driver. I have tested the > patch and I know it works correctly for my setup, but I would like to get > some comments on whether I am on the right path or if CMA really wants to > see an sg_table with only one entry. CMA, as the memory allocator, doesn't care as it doesn't even see the sg table. The drm_gem_cma_helper is badly named as it doesn't depend on CMA, it should have been called drm_gem_dma_contig_helper or something similar. The assumption at the base of that helper library is that the memory is DMA contiguous. Your patch guarantees that, so it should be fine. I've quickly checked the drivers using drm_gem_cma_prime_import_sg_table and none of them use cma_obj->sgt, so I think there's no risk of breakage. However, I would prefer if you updated the drm_gem_cma_object structure documentation to explicitly state that the sgt can contain multiple entries but that those entries are guaranteed to have contiguous DMA addresses. 
With the documentation update, Reviewed-by: Laurent Pinchart > drivers/gpu/drm/drm_gem_cma_helper.c | 22 -- > 1 file changed, 20 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/drm_gem_cma_helper.c > b/drivers/gpu/drm/drm_gem_cma_helper.c index 020e7668dfaba..43b179212052d > 100644 > --- a/drivers/gpu/drm/drm_gem_cma_helper.c > +++ b/drivers/gpu/drm/drm_gem_cma_helper.c > @@ -482,8 +482,26 @@ drm_gem_cma_prime_import_sg_table(struct drm_device > *dev, { > struct drm_gem_cma_object *cma_obj; > > - if (sgt->nents != 1) > - return ERR_PTR(-EINVAL); > + if (sgt->nents != 1) { > + /* check if the entries in the sg_table are contiguous */ > + dma_addr_t next_addr = sg_dma_address(sgt->sgl); > + struct scatterlist *s; > + unsigned int i; > + > + for_each_sg(sgt->sgl, s, sgt->nents, i) { > + /* > + * sg_dma_address(s) is only valid for entries > + * that have sg_dma_len(s) != 0 > + */ > + if (!sg_dma_len(s)) > + continue; > + > + if (sg_dma_address(s) != next_addr) > + return ERR_PTR(-EINVAL); > + > + next_addr = sg_dma_address(s) + sg_dma_len(s); > + } > + } > > /* Create a CMA GEM buffer. */ > cma_obj = __drm_gem_cma_create(dev, attach->dmabuf->size); -- Regards, Laurent Pinchart
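The contiguity test in the quoted hunk can be tried outside the kernel with a plain array standing in for the scatterlist. This is only a sketch of the same loop: struct demo_seg and demo_segs_contiguous() are invented names, the fields are stand-ins for sg_dma_address() and sg_dma_len(), and, like the patch, the expected address is seeded from the first entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one mapped scatterlist entry. */
struct demo_seg {
	uint64_t dma_addr;	/* plays the role of sg_dma_address(s) */
	uint32_t dma_len;	/* plays the role of sg_dma_len(s) */
};

/*
 * Return true if every non-empty segment starts exactly where the
 * previous non-empty one ended, i.e. the device sees one contiguous
 * range even though there are several entries.
 */
bool demo_segs_contiguous(const struct demo_seg *segs, size_t n)
{
	uint64_t next_addr;
	size_t i;

	assert(segs != NULL || n == 0);
	if (n == 0)
		return true;

	/* Seeded from the first entry, mirroring the quoted patch. */
	next_addr = segs[0].dma_addr;

	for (i = 0; i < n; i++) {
		/* The address is only meaningful for non-empty entries. */
		if (segs[i].dma_len == 0)
			continue;

		if (segs[i].dma_addr != next_addr)
			return false;

		next_addr = segs[i].dma_addr + segs[i].dma_len;
	}
	return true;
}
```

For example, entries at 0x1000 (length 0x1000) and 0x2000 (length 0x800) pass, while 0x1000 followed by 0x3000 fails; in the patch the failing case corresponds to returning ERR_PTR(-EINVAL).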
Re: [PATCH v2 2/4] kaslr: select the memory region in immovable node to process
On Fri, Nov 10, 2017 at 11:14:37AM +0800, Baoquan He wrote: >On 11/10/17 at 11:03am, Chao Fan wrote: >> On Thu, Nov 09, 2017 at 04:21:32PM +0800, Baoquan He wrote: >> >Hi Chao, >> > >> >On 11/01/17 at 07:32pm, Chao Fan wrote: >> >> Compare the region of memmap entry and immovable_mem, then choose the >> >> intersection to process_mem_region. >> >> >> >> Since the interrelationship between e820 or efi entries and memory >> >> region in immovable_mem is different: >> > >> >Could you paste a bootlog with efi=debug specified in cmdline on the >> >system you tested? I want to check what kind of intersection between >> >them. The adding makes code pretty ugly, want to make sure if we have >> >to do like this. >> Hi Baoquan, >> >> Here is a machine with efi. > >Thanks, do you have the whole boot log? I want to have a look at e820. No problem, I will attach the whole log. >And this is a special system, or a customized system? I mean you just It's a qemu machine, in which I can create more nodes to test. I have no suitable host machine at hand. >customize the firmware for better testing to cover kinds of cases. Although the code may be a little ugly, after comparing the different memory regions, this method and logic cover more cases. If you have ideas to improve the code, I would be glad to hear them. Thank you very much! Thanks, Chao Fan > >If it's too big, please attach it and send to me privately. > >Anyway, seems your considering about the intersection is right. 
> >Thanks >Baoquan >> >> The memory information in SRAT from dmesg: >> [0.00] ACPI: SRAT: Node 0 PXM 0 [mem 0x-0x0009] >> [0.00] ACPI: SRAT: Node 0 PXM 0 [mem 0x0010-0x1f3f] >> [0.00] ACPI: SRAT: Node 1 PXM 1 [mem 0x1f40-0x3e7f] >> [0.00] ACPI: SRAT: Node 2 PXM 2 [mem 0x3e80-0x5dbf] >> [0.00] ACPI: SRAT: Node 3 PXM 3 [mem 0x5dc0-0x7cff] >> [0.00] ACPI: SRAT: Node 4 PXM 4 [mem 0x7d00-0x9c3f] >> [0.00] ACPI: SRAT: Node 5 PXM 5 [mem 0x9c40-0xbb7f] >> [0.00] ACPI: SRAT: Node 6 PXM 6 [mem 0xbb80-0xbfff] >> [0.00] ACPI: SRAT: Node 6 PXM 6 [mem 0x1-0x11abf] >> [0.00] ACPI: SRAT: Node 7 PXM 7 [mem 0x11ac0-0x139ff] >> [0.00] ACPI: SRAT: Node 8 PXM 8 [mem 0x13a00-0x1593f] >> [0.00] ACPI: SRAT: Node 9 PXM 9 [mem 0x15940-0x1787f] >> >> There are 10 nodes, and 500M memory in every node. >> And node0 and node 6 has two parts. >> >> >> Here is the efi mem: >> [0.00] efi: mem00: [Boot Code | | | | | | | | >> |WB|WT|WC|UC] range=[0x-0x0fff] (0MB) >> [0.00] efi: mem01: [Loader Data| | | | | | | | >> |WB|WT|WC|UC] range=[0x1000-0x1fff] (0MB) >> [0.00] efi: mem02: [Conventional Memory| | | | | | | | >> |WB|WT|WC|UC] range=[0x2000-0x0009] (0MB) >> [0.00] efi: mem03: [Conventional Memory| | | | | | | | >> |WB|WT|WC|UC] range=[0x0010-0x00805fff] (7MB) >> [0.00] efi: mem04: [Boot Data | | | | | | | | >> |WB|WT|WC|UC] range=[0x00806000-0x00806fff] (0MB) >> [0.00] efi: mem05: [Conventional Memory| | | | | | | | >> |WB|WT|WC|UC] range=[0x00807000-0x0081] (0MB) >> [0.00] efi: mem06: [Boot Data | | | | | | | | >> |WB|WT|WC|UC] range=[0x0082-0x012f] (10MB) >> [0.00] efi: mem07: [Conventional Memory| | | | | | | | >> |WB|WT|WC|UC] range=[0x0130-0x01ff] (13MB) >> [0.00] efi: mem08: [Loader Data| | | | | | | | >> |WB|WT|WC|UC] range=[0x0200-0x036e3fff] (22MB) >> (From mem00 to mem08, belongs to node0) >> [0.00] efi: mem09: [Conventional Memory| | | | | | | | >> |WB|WT|WC|UC] range=[0x036e4000-0x3d626fff] (927MB) >> (mem09 has part of node0 and part of node1, but not the whole of node0 and >> 
node1) >> [0.00] efi: mem10: [Loader Data| | | | | | | | >> |WB|WT|WC|UC] range=[0x3d627000-0x3fff] (41MB) >> (part of node1 and part of node2) >> [0.00] efi: mem11: [Conventional Memory| | | | | | | | >> |WB|WT|WC|UC] range=[0x4000-0x8c92dfff] (1225MB) >> [0.00] efi: mem12: [Loader Data| | | | | | | | >> |WB|WT|WC|UC] range=[0x8c92e000-0xbbfbdfff] (758MB) >> [0.00] efi: mem13: [Boot Data | | | | | | | | >> |WB|WT|WC|UC] range=[0xbbfbe000-0xbbfddfff] (0MB) >> [0.00] efi: mem14: [Conventional Memory| | | | | | | | >> |WB|WT|WC|UC] range=[0xbbfde000-0xbe350fff] (35MB) >> [0.00] efi: mem15: [Loader Data| | | | | |
Re: [PATCH] x86: use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz" again
On 11/10/17 at 12:04P, WANG Chao wrote: > On 11/10/17 at 01:06P, Rafael J. Wysocki wrote: > > On Thursday, November 9, 2017 11:30:54 PM CET Rafael J. Wysocki wrote: > > > On Thu, Nov 9, 2017 at 5:06 PM, Rafael J. Wysocki > > > wrote: > > > > Hi Linus, > > > > > > > > On 11/9/2017 11:38 AM, WANG Chao wrote: > > > >> > > > >> Commit 941f5f0f6ef5 (x86: CPU: Fix up "cpu MHz" in /proc/cpuinfo) > > > >> caused > > > >> a serious performance issue when reading from /proc/cpuinfo on system > > > >> with aperfmperf. > > > >> > > > >> For each cpu, arch_freq_get_on_cpu() sleeps 20ms to get its frequency. > > > >> On a system with 64 cpus, it takes 1.5s to finish running `cat > > > >> /proc/cpuinfo`, while it previously was done in 15ms. > > > > > > > > Honestly, I'm not sure what to do to address this ATM. > > > > > > > > The last requested frequency is only available in the non-HWP case, so > > > > it > > > > cannot be used universally. > > > > > > OK, here's an idea. > > > > > > c_start() can run aperfmperf_snapshot_khz() on all CPUs upfront (say > > > in parallel), then wait for a while (say 5 ms; the current 20 ms wait > > > is overkill) and then aperfmperf_snapshot_khz() can be run once on > > > each CPU in show_cpuinfo() without taking the "stale cache" threshold > > > into account. > > > > > > I'm going to try that and see how far I can get with it. > > > > Below is what I have. > > > > I ended up using APERFMPERF_REFRESH_DELAY_MS for the delay in > > aperfmperf_snapshot_all(), because 5 ms tended to add too much > > variation to the results on my test box. > > > > I think it may be reduced to 10 ms, though. > > > > Chao, can you please try this one and report back? > > Hi, Rafael > > Thanks for the patch. But it doesn't work for me. lscpu takes 1.5s to > finish on a 64 cpus AMD box with aperfmperf. > > You missed the fact that c_start() will also be called by c_next(). > > But I don't think the overall idea is good enough. 
I think /proc/cpuinfo > is too general for userspace to be delayed, no matter whether it's 10ms or 20ms. > > My point is that cpu MHz is best served from a cached value for quick access. If > people are looking for a reliable and accurate cpu frequency, > /proc/cpuinfo is probably a bad idea. > > What do you think? Could you also explain 941f5f0f6ef5 (x86: CPU: Fix up "cpu MHz" in /proc/cpuinfo) please? The commit message is not clear to me. Are there any upstream discussions? I wasn't following this change upstream, and now I can't find any. Thanks, WANG Chao
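To put rough numbers on the report above, here is a minimal userspace sketch, not the kernel code. The helper names estimate_khz() and serial_sleep_ms() are made up for illustration. The first mirrors the cpu_khz * aperf_delta / mperf_delta computation used by the aperfmperf code in this thread; the second only multiplies the per-CPU sleep by the CPU count, assuming the sleeps happen one after another.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/*
 * Hypothetical helper: scale the base frequency by the ratio of the
 * APERF and MPERF deltas. Returns 0 when no MPERF ticks elapsed, so the
 * caller never divides by zero. The multiplication can overflow for
 * very large deltas; this sketch does not guard against that.
 */
uint64_t estimate_khz(uint64_t base_khz, uint64_t aperf_delta,
		      uint64_t mperf_delta)
{
	if (mperf_delta == 0)
		return 0;
	return base_khz * aperf_delta / mperf_delta;
}

/*
 * Hypothetical helper: total time spent sleeping if every CPU is
 * sampled serially with the same delay.
 */
unsigned int serial_sleep_ms(unsigned int ncpus, unsigned int delay_ms)
{
	/* Precondition: the product must fit in an unsigned int. */
	assert(delay_ms == 0 || ncpus <= UINT_MAX / delay_ms);
	return ncpus * delay_ms;
}
```

With 64 CPUs and a 20 ms delay, serial_sleep_ms(64, 20) gives 1280 ms, which accounts for most of the reported 1.5 s.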
Re: [PATCH 1/4] kbuild: create directory for make cache only when necessary
Hi Douglas, Thanks for your review. 2017-11-10 2:59 GMT+09:00 Doug Anderson : > Hi, > > On Thu, Nov 9, 2017 at 7:41 AM, Masahiro Yamada > wrote: >> Currently, the existence of $(dir $(make-cache)) is always checked, >> and created if it is missing. >> >> We can avoid unnecessary system calls by some tricks. >> >> [1] If KBUILD_SRC is unset, we are building in the source tree. >> The output directory checks can be entirely skipped. >> [2] If at least one cache data is found, it means the cache file >> was included. Obiously its directory exists. Skip "mkdir -p". >> [3] If Makefile does not contain any call of __run-and-store, it will >> not create a cache file. No need to create its directory. >> [4] The "mkdir -p" should be only invoked by the first call of >> __run-and-store >> >> Signed-off-by: Masahiro Yamada >> --- >> >> scripts/Kbuild.include | 13 + >> 1 file changed, 9 insertions(+), 4 deletions(-) >> >> diff --git a/scripts/Kbuild.include b/scripts/Kbuild.include >> index be1c9d6..4fb1be1 100644 >> --- a/scripts/Kbuild.include >> +++ b/scripts/Kbuild.include >> @@ -99,18 +99,19 @@ cc-cross-prefix = \ >> >> # Include values from last time >> make-cache := $(if $(KBUILD_EXTMOD),$(KBUILD_EXTMOD)/,$(if >> $(obj),$(obj)/)).cache.mk >> -ifeq ($(wildcard $(dir $(make-cache))),) >> -$(shell mkdir -p '$(dir $(make-cache))') >> -endif >> $(make-cache): ; >> -include $(make-cache) >> >> +cached-data := $(filter __cached_%, $(.VARIABLES)) >> + >> # If cache exceeds 1000 lines, shrink it down to 500. >> -ifneq ($(word 1000,$(filter __cached_%, $(.VARIABLES))),) >> +ifneq ($(word 1000,$(cached-data)),) >> $(shell tail -n 500 $(make-cache) > $(make-cache).tmp; \ >> mv $(make-cache).tmp $(make-cache)) >> endif >> >> +cache-dir := $(if $(KBUILD_SRC),$(if $(cache-data),,$(dir $(make-cache > > It wouldn't hurt to add a comment that cache-dir will be blank if we > don't need to make the cache dir and will contain a directory path > only if the dir doesn't exist. 
Without a comment it could take > someone quite a while to realize that...

You are right. This is confusing. Another idea is to use a boolean flag, for example as follows:

create-cache-dir := $(if $(KBUILD_SRC),$(if $(cached-data),,1))

define __run-and-store
ifeq ($(origin $(1)),undefined)
  $$(eval $(1) := $$(shell $$(2)))
ifeq ($(create-cache-dir),1)
  $$(shell mkdir -p $(dir $(make-cache)))
  $$(eval create-cache-dir :=)
endif
  $$(shell echo '$(1) := $$($(1))' >> $(make-cache))
endif
endef

Perhaps this is clearer and self-documenting.

>> + >> # Usage: $(call __sanitize-opt,Hello=Hola$(comma)Goodbye Adios) >> # >> # Convert all '$', ')', '(', '\', '=', ' ', ',', ':' to '_' >> @@ -136,6 +137,10 @@ __sanitize-opt = $(subst $$,_,$(subst >> $(right_paren),_,$(subst $(left_paren),_,$ >> define __run-and-store >> ifeq ($(origin $(1)),undefined) >>$$(eval $(1) := $$(shell $$(2))) >> +ifneq ($(cache-dir),) >> + $$(shell mkdir -p $(cache-dir)) > > I _think_ you want some single quotes in there. AKA: > > $$(shell mkdir -p '$(cache-dir)') > > That at least matches what the "old" code used to do. Specifically if > 'cache-dir' happens to have a space in it then it won't work right > without the single quotes. There may be other symbols that your shell > might interpret in interesting ways, too.

Kbuild always runs in the output directory, so 'cache-dir' is always a path relative to the top of the kernel directory, whether or not the O= option is given. In the kernel source, I do not see any file path containing spaces. Just in case, I renamed a directory and tested, but something strange happened in silentoldconfig and it would not work. Insane people may want to use a file path with spaces for external modules. I tested obj-m := fo o/ but this would not work either. It will be difficult to make it work because $(sort ...) is used in several places in the core makefiles. So, my conclusion is that paths with spaces do not work.
> NOTE: I have no idea if the kernel Makefiles work if paths like > KBUILD_SRC have spaces in them to begin with, but it seems wise to add > the quotes here anyway. I have not tested this case. Probably, it would be less difficult to allow spaces in KBUILD_SRC. > ALSO NOTE: I think you could still confuse the kernel Makefiles if > somehow you had a single quote in your path somehow. I assume we > don't care? Hmm, I do not think this is worth the effort. Probably the most reasonable solution is: please do not use special characters in file paths. > >> + $$(eval cache-dir :=) >> +endif >>$$(shell echo '$(1) := $$($(1))' >> $(make-cache)) >> endif >> endef > > Other than the single quote problem and the suggested comment, this > seems like a sane optimization to me. Feel free to add my Reviewed-by > once those fixes are in place. > > -Doug
Re: [PATCH] x86: use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz" again
On 11/10/17 at 01:06P, Rafael J. Wysocki wrote: > On Thursday, November 9, 2017 11:30:54 PM CET Rafael J. Wysocki wrote: > > On Thu, Nov 9, 2017 at 5:06 PM, Rafael J. Wysocki > > wrote: > > > Hi Linus, > > > > > > On 11/9/2017 11:38 AM, WANG Chao wrote: > > >> > > >> Commit 941f5f0f6ef5 (x86: CPU: Fix up "cpu MHz" in /proc/cpuinfo) caused > > >> a serious performance issue when reading from /proc/cpuinfo on system > > >> with aperfmperf. > > >> > > >> For each cpu, arch_freq_get_on_cpu() sleeps 20ms to get its frequency. > > >> On a system with 64 cpus, it takes 1.5s to finish running `cat > > >> /proc/cpuinfo`, while it previously was done in 15ms. > > > > > > Honestly, I'm not sure what to do to address this ATM. > > > > > > The last requested frequency is only available in the non-HWP case, so it > > > cannot be used universally. > > > > OK, here's an idea. > > > > c_start() can run aperfmperf_snapshot_khz() on all CPUs upfront (say > > in parallel), then wait for a while (say 5 ms; the current 20 ms wait > > is overkill) and then aperfmperf_snapshot_khz() can be run once on > > each CPU in show_cpuinfo() without taking the "stale cache" threshold > > into account. > > > > I'm going to try that and see how far I can get with it. > > Below is what I have. > > I ended up using APERFMPERF_REFRESH_DELAY_MS for the delay in > aperfmperf_snapshot_all(), because 5 ms tended to add too much > variation to the results on my test box. > > I think it may be reduced to 10 ms, though. > > Chao, can you please try this one and report back? Hi, Rafael Thanks for the patch. But it doesn't work for me. lscpu takes 1.5s to finish on a 64-cpu AMD box with aperfmperf. You missed the fact that c_start() will also be called by c_next(). But I don't think the overall idea is good enough. I think /proc/cpuinfo is too general for userspace to be delayed, no matter whether it's 10ms or 20ms. My point is that cpu MHz is best served from a cached value for quick access. 
If people are looking for a reliable and accurate cpu frequency, /proc/cpuinfo is probably a bad idea. What do you think? WANG Chao > > > --- > arch/x86/kernel/cpu/aperfmperf.c | 42 > --- > arch/x86/kernel/cpu/cpu.h | 4 +++ > arch/x86/kernel/cpu/proc.c | 5 +++- > 3 files changed, 39 insertions(+), 12 deletions(-) > > Index: linux-pm/arch/x86/kernel/cpu/aperfmperf.c > === > --- linux-pm.orig/arch/x86/kernel/cpu/aperfmperf.c > +++ linux-pm/arch/x86/kernel/cpu/aperfmperf.c > @@ -14,6 +14,8 @@ > #include > #include > > +#include "cpu.h" > + > struct aperfmperf_sample { > unsigned int khz; > ktime_t time; > @@ -38,8 +40,6 @@ static void aperfmperf_snapshot_khz(void > u64 aperf, aperf_delta; > u64 mperf, mperf_delta; > struct aperfmperf_sample *s = this_cpu_ptr(&samples); > - ktime_t now = ktime_get(); > - s64 time_delta = ktime_ms_delta(now, s->time); > unsigned long flags; > > local_irq_save(flags); > @@ -57,15 +57,10 @@ static void aperfmperf_snapshot_khz(void > if (mperf_delta == 0) > return; > > - s->time = now; > + s->time = ktime_get(); > s->aperf = aperf; > s->mperf = mperf; > - > - /* If the previous iteration was too long ago, discard it. */ > - if (time_delta > APERFMPERF_STALE_THRESHOLD_MS) > - s->khz = 0; > - else > - s->khz = div64_u64((cpu_khz * aperf_delta), mperf_delta); > + s->khz = div64_u64((cpu_khz * aperf_delta), mperf_delta); > } > > unsigned int arch_freq_get_on_cpu(int cpu) > @@ -82,16 +77,41 @@ unsigned int arch_freq_get_on_cpu(int cp > /* Don't bother re-computing within the cache threshold time. 
*/ > time_delta = ktime_ms_delta(ktime_get(), per_cpu(samples.time, cpu)); > khz = per_cpu(samples.khz, cpu); > - if (khz && time_delta < APERFMPERF_CACHE_THRESHOLD_MS) > + if (time_delta < APERFMPERF_CACHE_THRESHOLD_MS) > return khz; > > smp_call_function_single(cpu, aperfmperf_snapshot_khz, NULL, 1); > khz = per_cpu(samples.khz, cpu); > - if (khz) > + if (time_delta <= APERFMPERF_STALE_THRESHOLD_MS) > return khz; > > + /* If the previous iteration was too long ago, take a new data point. */ > msleep(APERFMPERF_REFRESH_DELAY_MS); > smp_call_function_single(cpu, aperfmperf_snapshot_khz, NULL, 1); > > return per_cpu(samples.khz, cpu); > } > + > +void aperfmperf_snapshot_all(void) > +{ > + if (!cpu_khz) > + return; > + > + if (!static_cpu_has(X86_FEATURE_APERFMPERF)) > + return; > + > + smp_call_function_many(cpu_online_mask, aperfmperf_snapshot_khz, NULL, > 1); > + msleep(APERFMPERF_REFRESH_DELAY_MS); > +} > + > +unsigned int aperfmperf_snapshot_cpu(int cpu) > +{ > + if (!cpu_khz) > + return 0; > + > + if (!static_cpu_has(X
Re: [PATCH 2/3] clk: hisilicon: Add support for Hi3660 stub clocks
Hi Julien, On Fri, Nov 03, 2017 at 05:37:34PM +, Julien Thierry wrote: > Hi Kaihua, > > On 03/11/17 07:25, Kaihua Zhong wrote: > >Hi3660 has four stub clocks, which are big and LITTLE cluster clocks, > >GPU clock and DDR clock. These clocks ask MCU for frequency scaling > >by sending message through mailbox. > > > >This commit adds support for stub clocks, it requests the dedicated > >mailbox channel at initialization; then later uses this channel to send > >message to MCU to execute frequency scaling. The four stub clocks share > >the same mailbox channel, but every stub clock has its own command id so > >MCU can distinguish the requirement coming for which clock. > > > >A shared memory is used to present effective frequency value, so the > >clock driver uses I/O mapping for the memory and reads back rate value. > > > >Reviewed-by: Leo Yan > >Signed-off-by: Kai Zhao > >Signed-off-by: Kevin Wang > >Signed-off-by: Ruyi Wang > >Signed-off-by: Kaihua Zhong > >--- > > drivers/clk/hisilicon/Kconfig| 6 + > > drivers/clk/hisilicon/Makefile | 1 + > > drivers/clk/hisilicon/clk-hi3660-stub.c | 195 > > +++ > > include/dt-bindings/clock/hi3660-clock.h | 7 ++ > > 4 files changed, 209 insertions(+) > > create mode 100644 drivers/clk/hisilicon/clk-hi3660-stub.c > > > >diff --git a/drivers/clk/hisilicon/Kconfig b/drivers/clk/hisilicon/Kconfig > >index 7098bfd..1bd4355 100644 > >--- a/drivers/clk/hisilicon/Kconfig > >+++ b/drivers/clk/hisilicon/Kconfig > >@@ -49,3 +49,9 @@ config STUB_CLK_HI6220 > > default ARCH_HISI > > help > > Build the Hisilicon Hi6220 stub clock driver. > >+ > >+config STUB_CLK_HI3660 > >+bool "Hi3660 Stub Clock Driver" > >+depends on COMMON_CLK_HI3660 && MAILBOX > >+help > >+ Build the Hisilicon Hi3660 stub clock driver. 
> >diff --git a/drivers/clk/hisilicon/Makefile b/drivers/clk/hisilicon/Makefile > >index 1e4c3dd..0a5b499 100644 > >--- a/drivers/clk/hisilicon/Makefile > >+++ b/drivers/clk/hisilicon/Makefile > >@@ -14,3 +14,4 @@ obj-$(CONFIG_COMMON_CLK_HI3798CV200) += > >crg-hi3798cv200.o > > obj-$(CONFIG_COMMON_CLK_HI6220)+= clk-hi6220.o > > obj-$(CONFIG_RESET_HISI) += reset.o > > obj-$(CONFIG_STUB_CLK_HI6220) += clk-hi6220-stub.o > >+obj-$(CONFIG_STUB_CLK_HI3660) += clk-hi3660-stub.o > >diff --git a/drivers/clk/hisilicon/clk-hi3660-stub.c > >b/drivers/clk/hisilicon/clk-hi3660-stub.c > >new file mode 100644 > >index 000..0a21c91 > >--- /dev/null > >+++ b/drivers/clk/hisilicon/clk-hi3660-stub.c > >@@ -0,0 +1,195 @@ > >+/* > >+ * Hisilicon clock driver > >+ * > >+ * Copyright (c) 2013-2017 Hisilicon Limited. > >+ * Copyright (c) 2017 Linaro Limited. > >+ * > >+ * Author: Kai Zhao > >+ * Author: Tao Wang > >+ * Author: Leo Yan > >+ * > >+ * This program is free software; you can redistribute it and/or modify > >+ * it under the terms of the GNU General Public License as published by > >+ * the Free Software Foundation; either version 2 of the License, or > >+ * (at your option) any later version. > >+ * > >+ * This program is distributed in the hope that it will be useful, > >+ * but WITHOUT ANY WARRANTY; without even the implied warranty of > >+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > >+ * GNU General Public License for more details. 
> >+ * > >+ */ > >+ > >+#include > >+#include > >+#include > >+#include > >+#include > >+#include > >+#include > >+#include > >+#include > >+ > >+#define HI3660_STUB_CLOCK_DATA (0x70) > >+#define MHZ (1000 * 1000) > >+ > >+#define DEFINE_CLK_STUB(_id, _cmd, _name) \ > >+{ \ > >+.id = (_id), \ > >+.cmd = (_cmd), \ > >+.hw.init = &(struct clk_init_data) { \ > >+.name = #_name, \ > >+.ops = &hi3660_stub_clk_ops, \ > >+.num_parents = 0, \ > >+.flags = CLK_GET_RATE_NOCACHE, \ > >+}, \ > >+}, > >+ > >+#define to_stub_clk(_hw) container_of(_hw, struct hi3660_stub_clk, hw) > >+ > >+struct hi3660_stub_clk_chan { > >+struct mbox_client cl; > >+struct mbox_chan *mbox; > >+}; > >+ > >+struct hi3660_stub_clk { > >+unsigned int id; > >+struct device *dev; > > I don't understand why you need to keep this. The only place it is used is > for the debug message in hi3660_stub_clk_set_rate and you could get the > device pointer by doing chan->cl.dev since all the stub_clk point to the > same device. Kaihua might miss this email, so I checked all your comments; we accept them and will spin a new version of the patch. Thank you for the good suggestions. Thanks, Leo Yan
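The review suggestion above can be sketched in userspace C as follows. This is only an illustration under assumptions, not the driver: the struct definitions are cut-down stand-ins, the names stub_clk, stub_chan, stub_clk_id() and stub_clk_dev() are hypothetical, and it assumes a single channel object shared by all stub clocks, which the quoted patch does not show. The point is that the per-clock dev pointer can go away if the device is reached through the shared channel's mbox_client.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Cut-down, hypothetical stand-ins for the kernel types. */
struct device { const char *name; };
struct clk_hw { int unused; };
struct mbox_client { struct device *dev; };

struct stub_clk_chan {
	struct mbox_client cl;
};

struct stub_clk {
	unsigned int id;
	struct clk_hw hw;	/* no per-clock struct device pointer */
};

#define to_stub_clk(_hw) container_of(_hw, struct stub_clk, hw)

/* Assumption: one channel shared by every stub clock. */
struct stub_clk_chan stub_chan;

/* Recover the clock id from the embedded clk_hw. */
unsigned int stub_clk_id(struct clk_hw *hw)
{
	assert(hw != NULL);
	return to_stub_clk(hw)->id;
}

/* The device comes from the shared channel, not from the clock. */
struct device *stub_clk_dev(void)
{
	return stub_chan.cl.dev;
}
```

A debug message in a set_rate callback could then use stub_clk_dev() together with stub_clk_id(hw) instead of a dev field stored in every clock.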
[PATCHv4 3/3] ARMv8: pcie: make the DWC EP driver support for layerscape
Layerscape PCIe controllers support RC or EP mode. Add EP mode support in Kconfig so that the driver supports both RC and EP mode; the driver is able to determine whether a PCIe controller works in RC or EP mode. Signed-off-by: Bao Xiaowei Acked-by: Minghuan Lian --- v2: no change v3: no change v4: no change drivers/pci/dwc/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/pci/dwc/Kconfig b/drivers/pci/dwc/Kconfig index 22ec82fcdea2..b5f507795779 100644 --- a/drivers/pci/dwc/Kconfig +++ b/drivers/pci/dwc/Kconfig @@ -108,6 +108,7 @@ config PCI_LAYERSCAPE depends on PCI_MSI_IRQ_DOMAIN select MFD_SYSCON select PCIE_DW_HOST + select PCIE_DW_EP help Say Y here if you want PCIe controller support on Layerscape SoCs. -- 2.14.1
[PATCHv4 0/3] dts: Add the property of IB and OB
This series depends on http://patchwork.ozlabs.org/patch/815382/ Bao Xiaowei (3): ARMv8: dts: ls1046a: add the property of IB and OB ARMv8: layerscape: add the pcie ep function support ARMv8: pcie: make the DWC EP driver support for layerscape arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi | 6 ++ drivers/pci/dwc/Kconfig | 1 + drivers/pci/dwc/pci-layerscape.c | 121 +++-- 3 files changed, 121 insertions(+), 7 deletions(-) -- 2.14.1
[PATCHv4 2/3] ARMv8: layerscape: add the pcie ep function support
Add the pcie controller ep function support of layerscape base on pcie ep framework. Signed-off-by: Bao Xiaowei --- v2: - fix the ioremap function used but no ioumap issue - optimize the code structure - add code comments v3: - fix the msi outband window request failed issue v4: - optimize the code, adjust the format drivers/pci/dwc/pci-layerscape.c | 120 --- 1 file changed, 113 insertions(+), 7 deletions(-) diff --git a/drivers/pci/dwc/pci-layerscape.c b/drivers/pci/dwc/pci-layerscape.c index 87fa486bee2c..6f3e434599e0 100644 --- a/drivers/pci/dwc/pci-layerscape.c +++ b/drivers/pci/dwc/pci-layerscape.c @@ -34,7 +34,12 @@ /* PEX Internal Configuration Registers */ #define PCIE_STRFMR1 0x71c /* Symbol Timer & Filter Mask Register1 */ +#define PCIE_DBI2_BASE 0x1000 /* DBI2 base address*/ +#define PCIE_MSI_MSG_DATA_OFF 0x5c/* MSI Data register address*/ +#define PCIE_MSI_OB_SIZE 4096 +#define PCIE_MSI_ADDR_OFFSET (1024 * 1024) #define PCIE_IATU_NUM 6 +#define PCIE_EP_ADDR_SPACE_SIZE 0x1 struct ls_pcie_drvdata { u32 lut_offset; @@ -44,12 +49,20 @@ struct ls_pcie_drvdata { const struct dw_pcie_ops *dw_pcie_ops; }; +struct ls_pcie_ep { + dma_addr_t msi_phys_addr; + void __iomem *msi_virt_addr; + u64 msi_msg_addr; + u16 msi_msg_data; +}; + struct ls_pcie { struct dw_pcie *pci; void __iomem *lut; struct regmap *scfg; const struct ls_pcie_drvdata *drvdata; int index; + struct ls_pcie_ep *pcie_ep; }; #define to_ls_pcie(x) dev_get_drvdata((x)->dev) @@ -263,6 +276,99 @@ static const struct of_device_id ls_pcie_of_match[] = { { }, }; +static void ls_pcie_raise_msi_irq(struct ls_pcie_ep *pcie_ep) +{ + iowrite32(pcie_ep->msi_msg_data, pcie_ep->msi_virt_addr); +} + +static int ls_pcie_raise_irq(struct dw_pcie_ep *ep, + enum pci_epc_irq_type type, u8 interrupt_num) +{ + struct dw_pcie *pci = to_dw_pcie_from_ep(ep); + struct ls_pcie *pcie = to_ls_pcie(pci); + struct ls_pcie_ep *pcie_ep = pcie->pcie_ep; + u32 free_win; + + /* get the msi message address and msi message data */ + 
pcie_ep->msi_msg_addr = ioread32(pci->dbi_base + MSI_MESSAGE_ADDR_L32) | + (((u64)ioread32(pci->dbi_base + MSI_MESSAGE_ADDR_U32)) << 32); + pcie_ep->msi_msg_data = ioread16(pci->dbi_base + PCIE_MSI_MSG_DATA_OFF); + + /* request and config the outband window for msi */ + free_win = find_first_zero_bit(&ep->ob_window_map, + sizeof(ep->ob_window_map)); + if (free_win >= ep->num_ob_windows) { + dev_err(pci->dev, "no free outbound window\n"); + return -ENOMEM; + } + + dw_pcie_prog_outbound_atu(pci, free_win, PCIE_ATU_TYPE_MEM, + pcie_ep->msi_phys_addr, + pcie_ep->msi_msg_addr, + PCIE_MSI_OB_SIZE); + + set_bit(free_win, &ep->ob_window_map); + + /* generate the msi interrupt */ + ls_pcie_raise_msi_irq(pcie_ep); + + /* release the outband window of msi */ + dw_pcie_disable_atu(pci, free_win, DW_PCIE_REGION_OUTBOUND); + clear_bit(free_win, &ep->ob_window_map); + + return 0; +} + +static struct dw_pcie_ep_ops pcie_ep_ops = { + .raise_irq = ls_pcie_raise_irq, +}; + +static int __init ls_add_pcie_ep(struct ls_pcie *pcie, + struct platform_device *pdev) +{ + struct dw_pcie *pci = pcie->pci; + struct device *dev = pci->dev; + struct dw_pcie_ep *ep; + struct ls_pcie_ep *pcie_ep; + struct resource *cfg_res; + int ret; + + ep = &pci->ep; + ep->ops = &pcie_ep_ops; + + pcie_ep = devm_kzalloc(dev, sizeof(*pcie_ep), GFP_KERNEL); + if (!pcie_ep) + return -ENOMEM; + + pcie->pcie_ep = pcie_ep; + + cfg_res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "config"); + if (cfg_res) { + ep->phys_base = cfg_res->start; + ep->addr_size = PCIE_EP_ADDR_SPACE_SIZE; + } else { + dev_err(dev, "missing *config* space\n"); + return -ENODEV; + } + + pcie_ep->msi_phys_addr = ep->phys_base + PCIE_MSI_ADDR_OFFSET; + + pcie_ep->msi_virt_addr = ioremap(pcie_ep->msi_phys_addr, + PCIE_MSI_OB_SIZE); + if (!pcie_ep->msi_virt_addr) { + dev_err(dev, "failed to map MSI outbound region\n"); + return -ENOMEM; + } + + ret = dw_pcie_ep_init(ep); + if (ret) { + dev_err(dev, "failed to initialize endpoint\n"); + 
return ret; + } + + return 0; +} + static int __init ls_add_pcie_port(struct ls_pcie *pcie) { struct dw
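Two small steps in ls_pcie_raise_irq() above can be sketched in plain userspace C. This is an illustration under assumptions, not the driver code: msi_addr() and first_free_window() are hypothetical names. The first combines two 32-bit register reads into one 64-bit MSI address, as the patch does. The second searches a window bitmap with an explicit bit count. As far as I know, the kernel's find_first_zero_bit() takes its size argument in bits, so the sizeof() (a byte count) passed in the patch may deserve a second look; passing the number of outbound windows would be the bit-count equivalent.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Combine the low and high 32-bit halves into one 64-bit address. */
uint64_t msi_addr(uint32_t lo, uint32_t hi)
{
	return (uint64_t)lo | ((uint64_t)hi << 32);
}

/*
 * Return the index of the first clear bit below nbits, or nbits when
 * every window in that range is already in use.
 */
unsigned int first_free_window(unsigned long map, unsigned int nbits)
{
	unsigned int i;

	/* Precondition: nbits must not exceed the width of the map. */
	assert(nbits <= sizeof(map) * CHAR_BIT);

	for (i = 0; i < nbits; i++)
		if (!(map & (1UL << i)))
			return i;
	return nbits;
}
```

A caller would treat a return value equal to nbits as "no free outbound window", matching the free_win >= ep->num_ob_windows check in the patch.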
[PATCHv4 1/3] ARMv8: dts: ls1046a: add the property of IB and OB
Add properties giving the number of inbound and outbound windows, for use by the EP driver. Signed-off-by: Bao Xiaowei Acked-by: Minghuan Lian --- v2: - no change v3: - modify the commit message v4: - no change arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi | 6 ++ 1 file changed, 6 insertions(+) diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi index 06b5e12d04d8..f8332669663c 100644 --- a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi +++ b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi @@ -674,6 +674,8 @@ device_type = "pci"; dma-coherent; num-lanes = <4>; + num-ib-windows = <6>; + num-ob-windows = <6>; bus-range = <0x0 0xff>; ranges = <0x8100 0x0 0x 0x40 0x0001 0x0 0x0001 /* downstream I/O */ 0x8200 0x0 0x4000 0x40 0x4000 0x0 0x4000>; /* non-prefetchable memory */ @@ -699,6 +701,8 @@ device_type = "pci"; dma-coherent; num-lanes = <2>; + num-ib-windows = <6>; + num-ob-windows = <6>; bus-range = <0x0 0xff>; ranges = <0x8100 0x0 0x 0x48 0x0001 0x0 0x0001 /* downstream I/O */ 0x8200 0x0 0x4000 0x48 0x4000 0x0 0x4000>; /* non-prefetchable memory */ @@ -724,6 +728,8 @@ device_type = "pci"; dma-coherent; num-lanes = <2>; + num-ib-windows = <6>; + num-ob-windows = <6>; bus-range = <0x0 0xff>; ranges = <0x8100 0x0 0x 0x50 0x0001 0x0 0x0001 /* downstream I/O */ 0x8200 0x0 0x4000 0x50 0x4000 0x0 0x4000>; /* non-prefetchable memory */ -- 2.14.1