[PATCH V3 08/10] OPP: Configure all required OPPs
Now that all the infrastructure is in place to support multiple required
OPPs, let's switch over to using it. A new internal routine,
_set_required_opps(), takes care of updating the performance state for
all the required OPPs. With this, performance state updates are
supported even when the end device needs to configure regulators as
well, which wasn't the case earlier.

The pstates were earlier stored in the end device's OPP structures; that
also changes now, as those values are stored in the genpd's OPP
structures instead. We therefore switch over to using
pm_genpd_opp_to_performance_state() instead of
of_genpd_opp_to_performance_state() to get the performance state for the
genpd OPPs. The routine _generic_set_opp_domain() is not required
anymore and is removed.

On errors we don't try to recover by reverting to the old settings, as
things are really complex now and the calls here should never fail
unless there is a bug. There is no point increasing the complexity for
code which will never be executed.

Signed-off-by: Viresh Kumar
---
 drivers/opp/core.c | 113 ++---
 drivers/opp/of.c   |   5 +-
 2 files changed, 68 insertions(+), 50 deletions(-)

diff --git a/drivers/opp/core.c b/drivers/opp/core.c
index cef2ccda355d..0eaa954b3f6c 100644
--- a/drivers/opp/core.c
+++ b/drivers/opp/core.c
@@ -548,44 +548,6 @@ _generic_set_opp_clk_only(struct device *dev, struct clk *clk,
 	return ret;
 }
 
-static inline int
-_generic_set_opp_domain(struct device *dev, struct clk *clk,
-			unsigned long old_freq, unsigned long freq,
-			unsigned int old_pstate, unsigned int new_pstate)
-{
-	int ret;
-
-	/* Scaling up? Scale domain performance state before frequency */
-	if (freq > old_freq) {
-		ret = dev_pm_genpd_set_performance_state(dev, new_pstate);
-		if (ret)
-			return ret;
-	}
-
-	ret = _generic_set_opp_clk_only(dev, clk, old_freq, freq);
-	if (ret)
-		goto restore_domain_state;
-
-	/* Scaling down? Scale domain performance state after frequency */
-	if (freq < old_freq) {
-		ret = dev_pm_genpd_set_performance_state(dev, new_pstate);
-		if (ret)
-			goto restore_freq;
-	}
-
-	return 0;
-
-restore_freq:
-	if (_generic_set_opp_clk_only(dev, clk, freq, old_freq))
-		dev_err(dev, "%s: failed to restore old-freq (%lu Hz)\n",
-			__func__, old_freq);
-restore_domain_state:
-	if (freq > old_freq)
-		dev_pm_genpd_set_performance_state(dev, old_pstate);
-
-	return ret;
-}
-
 static int _generic_set_opp_regulator(const struct opp_table *opp_table,
 				      struct device *dev,
 				      unsigned long old_freq,
@@ -663,6 +625,56 @@ static int _set_opp_custom(const struct opp_table *opp_table,
 	return opp_table->set_opp(data);
 }
 
+/* This is only called for PM domain for now */
+static int _set_required_opps(struct device *dev,
+			      struct opp_table *opp_table,
+			      struct dev_pm_opp *opp)
+{
+	struct opp_table **required_opp_tables = opp_table->required_opp_tables;
+	struct device **genpd_virt_devs = opp_table->genpd_virt_devs;
+	unsigned int pstate;
+	int i, ret = 0;
+
+	if (!required_opp_tables)
+		return 0;
+
+	/* Single genpd case */
+	if (!genpd_virt_devs) {
+		pstate = opp->required_opps[0]->pstate;
+		ret = dev_pm_genpd_set_performance_state(dev, pstate);
+		if (ret) {
+			dev_err(dev, "Failed to set performance state of %s: %d (%d)\n",
+				dev_name(dev), pstate, ret);
+		}
+		return ret;
+	}
+
+	/* Multiple genpd case */
+
+	/*
+	 * Acquire genpd_virt_dev_lock to make sure we don't use a genpd_dev
+	 * after it is freed from another thread.
+	 */
+	mutex_lock(&opp_table->genpd_virt_dev_lock);
+
+	for (i = 0; i < opp_table->required_opp_count; i++) {
+		pstate = opp->required_opps[i]->pstate;
+
+		if (!genpd_virt_devs[i])
+			continue;
+
+		ret = dev_pm_genpd_set_performance_state(genpd_virt_devs[i],
+							 pstate);
+		if (ret) {
+			dev_err(dev, "Failed to set performance state of %s: %d (%d)\n",
+				dev_name(genpd_virt_devs[i]), pstate, ret);
+			break;
+		}
+	}
+	mutex_unlock(&opp_table->genpd_virt_dev_lock);
+
+	return ret;
+}
+
 /**
  * dev_pm_opp_set_rate() - Configure new OPP based on frequency
  * @dev:	device for which we do this operation
@@ -730,6 +742,13 @@ int dev_pm_opp_set_rate(struct device *dev, unsigned long
[PATCH V3 05/10] OPP: Populate OPPs from "required-opps" property
An earlier commit populated the OPP tables from the "required-opps"
property; this commit populates the individual OPPs. This is repeated
for each OPP in the OPP table, and the populated OPPs will be used by
later commits.

Reviewed-by: Ulf Hansson
Signed-off-by: Viresh Kumar
---
 drivers/opp/core.c |  1 +
 drivers/opp/of.c   | 81 --
 drivers/opp/opp.h  |  6
 3 files changed, 86 insertions(+), 2 deletions(-)

diff --git a/drivers/opp/core.c b/drivers/opp/core.c
index 85174a5c4850..02a69a62dac8 100644
--- a/drivers/opp/core.c
+++ b/drivers/opp/core.c
@@ -976,6 +976,7 @@ static void _opp_kref_release(struct kref *kref)
 	 * frequency/voltage list.
 	 */
 	blocking_notifier_call_chain(&opp_table->head, OPP_EVENT_REMOVE, opp);
+	_of_opp_free_required_opps(opp_table, opp);
 	opp_debug_remove_one(opp);
 	list_del(&opp->node);
 	kfree(opp);
diff --git a/drivers/opp/of.c b/drivers/opp/of.c
index b5605196122a..ffaeefef98ce 100644
--- a/drivers/opp/of.c
+++ b/drivers/opp/of.c
@@ -259,6 +259,77 @@ void _of_clear_opp_table(struct opp_table *opp_table)
 	_opp_table_free_required_tables(opp_table);
 }
 
+/*
+ * Release all resources previously acquired with a call to
+ * _of_opp_alloc_required_opps().
+ */
+void _of_opp_free_required_opps(struct opp_table *opp_table,
+				struct dev_pm_opp *opp)
+{
+	struct dev_pm_opp **required_opps = opp->required_opps;
+	int i;
+
+	if (!required_opps)
+		return;
+
+	for (i = 0; i < opp_table->required_opp_count; i++) {
+		if (!required_opps[i])
+			break;
+
+		/* Put the reference back */
+		dev_pm_opp_put(required_opps[i]);
+	}
+
+	kfree(required_opps);
+	opp->required_opps = NULL;
+}
+
+/* Populate all required OPPs which are part of "required-opps" list */
+static int _of_opp_alloc_required_opps(struct opp_table *opp_table,
+				       struct dev_pm_opp *opp)
+{
+	struct dev_pm_opp **required_opps;
+	struct opp_table *required_table;
+	struct device_node *np;
+	int i, ret, count = opp_table->required_opp_count;
+
+	if (!count)
+		return 0;
+
+	required_opps = kcalloc(count, sizeof(*required_opps), GFP_KERNEL);
+	if (!required_opps)
+		return -ENOMEM;
+
+	opp->required_opps = required_opps;
+
+	for (i = 0; i < count; i++) {
+		required_table = opp_table->required_opp_tables[i];
+
+		np = of_parse_required_opp(opp->np, i);
+		if (unlikely(!np)) {
+			ret = -ENODEV;
+			goto free_required_opps;
+		}
+
+		required_opps[i] = _find_opp_of_np(required_table, np);
+		of_node_put(np);
+
+		if (!required_opps[i]) {
+			pr_err("%s: Unable to find required OPP node: %pOF (%d)\n",
+			       __func__, opp->np, i);
+			ret = -ENODEV;
+			goto free_required_opps;
+		}
+	}
+
+	return 0;
+
+free_required_opps:
+	_of_opp_free_required_opps(opp_table, opp);
+
+	return ret;
+}
+
 static bool _opp_is_supported(struct device *dev, struct opp_table *opp_table,
 			      struct device_node *np)
 {
@@ -503,6 +574,10 @@ static struct dev_pm_opp *_opp_add_static_v2(struct opp_table *opp_table,
 	new_opp->dynamic = false;
 	new_opp->available = true;
 
+	ret = _of_opp_alloc_required_opps(opp_table, new_opp);
+	if (ret)
+		goto free_opp;
+
 	if (!of_property_read_u32(np, "clock-latency-ns", &val))
 		new_opp->clock_latency_ns = val;
 
@@ -510,14 +585,14 @@ static struct dev_pm_opp *_opp_add_static_v2(struct opp_table *opp_table,
 	ret = opp_parse_supplies(new_opp, dev, opp_table);
 	if (ret)
-		goto free_opp;
+		goto free_required_opps;
 
 	ret = _opp_add(dev, new_opp, opp_table, rate_not_available);
 	if (ret) {
 		/* Don't return error for duplicate OPPs */
 		if (ret == -EBUSY)
 			ret = 0;
-		goto free_opp;
+		goto free_required_opps;
 	}
 
 	/* OPP to select on device suspend */
@@ -547,6 +622,8 @@ static struct dev_pm_opp *_opp_add_static_v2(struct opp_table *opp_table,
 	blocking_notifier_call_chain(&opp_table->head, OPP_EVENT_ADD, new_opp);
 
 	return new_opp;
 
+free_required_opps:
+	_of_opp_free_required_opps(opp_table, new_opp);
 free_opp:
 	_opp_free(new_opp);
 
diff --git a/drivers/opp/opp.h b/drivers/opp/opp.h
index 024e1be23d37..24b340ad18d1 100644
--- a/drivers/opp/opp.h
+++ b/drivers/opp/opp.h
@@ -63,6 +63,7 @@ extern struct list_head opp_tables;
  * @supplies:	Power supplies
[PATCH V3 02/10] OPP: Identify and mark genpd OPP tables
We need to handle genpd OPP tables differently; this is already the case
at one location and will be extended going forward. Add another field to
the OPP table to record whether the table belongs to a genpd.

Reviewed-by: Ulf Hansson
Signed-off-by: Viresh Kumar
---
 drivers/opp/of.c  | 6 --
 drivers/opp/opp.h | 2 ++
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/opp/of.c b/drivers/opp/of.c
index 5a4b47958073..5f114cd3d88c 100644
--- a/drivers/opp/of.c
+++ b/drivers/opp/of.c
@@ -92,6 +92,9 @@ void _of_init_opp_table(struct opp_table *opp_table, struct device *dev,
 		of_property_read_u32(np, "voltage-tolerance",
 				     &opp_table->voltage_tolerance_v1);
 
+		if (of_find_property(np, "#power-domain-cells", NULL))
+			opp_table->is_genpd = true;
+
 		/* Get OPP table node */
 		opp_np = _opp_of_get_opp_desc_node(np, index);
 		of_node_put(np);
@@ -326,8 +329,7 @@ static struct dev_pm_opp *_opp_add_static_v2(struct opp_table *opp_table,
 	ret = of_property_read_u64(np, "opp-hz", &rate);
 	if (ret < 0) {
 		/* "opp-hz" is optional for devices like power domains. */
-		if (!of_find_property(dev->of_node, "#power-domain-cells",
-				      NULL)) {
+		if (!opp_table->is_genpd) {
 			dev_err(dev, "%s: opp-hz not found\n", __func__);
 			goto free_opp;
 		}
diff --git a/drivers/opp/opp.h b/drivers/opp/opp.h
index 9c6544b4f4f9..cdb0c2b095e2 100644
--- a/drivers/opp/opp.h
+++ b/drivers/opp/opp.h
@@ -140,6 +140,7 @@ enum opp_table_access {
 * @regulators: Supply regulators
 * @regulator_count: Number of power supply regulators
 * @genpd_performance_state: Device's power domain support performance state.
+ * @is_genpd: Marks if the OPP table belongs to a genpd.
 * @set_opp: Platform specific set_opp callback
 * @set_opp_data: Data to be passed to set_opp callback
 * @dentry:	debugfs dentry pointer of the real device directory (not links).
@@ -178,6 +179,7 @@ struct opp_table {
 	struct regulator **regulators;
 	unsigned int regulator_count;
 	bool genpd_performance_state;
+	bool is_genpd;
 
 	int (*set_opp)(struct dev_pm_set_opp_data *data);
 	struct dev_pm_set_opp_data *set_opp_data;
-- 
2.19.1.568.g152ad8e3369a
[PATCH V3 07/10] OPP: Add dev_pm_opp_{set|put}_genpd_virt_dev() helper
Multiple generic power domains for a consumer device are supported with
the help of virtual devices, which are created for each consumer
device - genpd pair. These are the device structures which are attached
to the power domain and are required by the OPP core to set the
performance state of the genpd.

The helpers added by this commit are required to be called once for each
of these virtual devices. They are required only if multiple domains are
available for a device; otherwise the actual device structure is used
instead by the OPP core.

The new helpers also support the complex cases where the consumer device
doesn't always require all the domains. For example, a camera may
require only one power domain during normal operations but two during
high-resolution operations. The consumer driver can call
dev_pm_opp_put_genpd_virt_dev(high_resolution_genpd_virt_dev) while
operating in normal mode, when it has no performance requirements from
the genpd which manages the high-resolution power requirements, and call
dev_pm_opp_set_genpd_virt_dev(high_resolution_genpd_virt_dev) again once
it switches back to high-resolution mode.

The new helpers differ from the other OPP set/put helpers in that they
can be called with OPPs already initialized for the table, as we may
need to call them on the fly because of the complex case explained
above. For this reason the genpd virt_dev structure may be used in
parallel while the new helpers are running, and a new mutex is added to
protect against that. We didn't use the existing opp_table->lock mutex,
as that is widely used in the OPP core and we will need this lock in the
dev_pm_opp_set_rate() helper while changing OPPs; we need to make sure
there isn't much contention there, as that's the hotpath.

Signed-off-by: Viresh Kumar
---
 drivers/opp/core.c     | 88 ++
 drivers/opp/of.c       | 16 +++-
 drivers/opp/opp.h      |  4 ++
 include/linux/pm_opp.h |  8
 4 files changed, 115 insertions(+), 1 deletion(-)

diff --git a/drivers/opp/core.c b/drivers/opp/core.c
index 02a69a62dac8..cef2ccda355d 100644
--- a/drivers/opp/core.c
+++ b/drivers/opp/core.c
@@ -823,6 +823,7 @@ static struct opp_table *_allocate_opp_table(struct device *dev, int index)
 		return NULL;
 
 	mutex_init(&opp_table->lock);
+	mutex_init(&opp_table->genpd_virt_dev_lock);
 	INIT_LIST_HEAD(&opp_table->dev_list);
 
 	opp_dev = _add_opp_dev(dev, opp_table);
@@ -920,6 +921,7 @@ static void _opp_table_kref_release(struct kref *kref)
 		_remove_opp_dev(opp_dev, opp_table);
 	}
 
+	mutex_destroy(&opp_table->genpd_virt_dev_lock);
 	mutex_destroy(&opp_table->lock);
 	list_del(&opp_table->node);
 	kfree(opp_table);
@@ -1602,6 +1604,92 @@ void dev_pm_opp_unregister_set_opp_helper(struct opp_table *opp_table)
 }
 EXPORT_SYMBOL_GPL(dev_pm_opp_unregister_set_opp_helper);
 
+/**
+ * dev_pm_opp_set_genpd_virt_dev - Set virtual genpd device for an index
+ * @dev: Consumer device for which the genpd device is getting set.
+ * @virt_dev: virtual genpd device.
+ * @index: index.
+ *
+ * Multiple generic power domains for a device are supported with the help of
+ * virtual genpd devices, which are created for each consumer device - genpd
+ * pair. These are the device structures which are attached to the power domain
+ * and are required by the OPP core to set the performance state of the genpd.
+ *
+ * This helper will normally be called by the consumer driver of the device
+ * "dev", as only that has details of the genpd devices.
+ *
+ * This helper needs to be called once for each of those virtual devices, but
+ * only if multiple domains are available for a device. Otherwise the original
+ * device structure will be used instead by the OPP core.
+ */
+struct opp_table *dev_pm_opp_set_genpd_virt_dev(struct device *dev,
+						struct device *virt_dev,
+						int index)
+{
+	struct opp_table *opp_table;
+
+	opp_table = dev_pm_opp_get_opp_table(dev);
+	if (!opp_table)
+		return ERR_PTR(-ENOMEM);
+
+	mutex_lock(&opp_table->genpd_virt_dev_lock);
+
+	if (unlikely(!opp_table->genpd_virt_devs ||
+		     index >= opp_table->required_opp_count ||
+		     opp_table->genpd_virt_devs[index])) {
+
+		dev_err(dev, "Invalid request to set required device\n");
+		dev_pm_opp_put_opp_table(opp_table);
+		mutex_unlock(&opp_table->genpd_virt_dev_lock);
+
+		return ERR_PTR(-EINVAL);
+	}
+
+	opp_table->genpd_virt_devs[index] = virt_dev;
+	mutex_unlock(&opp_table->genpd_virt_dev_lock);
+
+	return opp_table;
+}
+
+/**
+ * dev_pm_opp_put_genpd_virt_dev() - Releases resources blocked for genpd device.
+ * @opp_table: OPP table returned by
[PATCH V3 06/10] PM / Domains: Add genpd_opp_to_performance_state()
The OPP core currently stores the performance state in the consumer
device's OPP table, but that is going to change going forward: the
performance state will instead be stored directly in the genpd's OPP
table. For that we need to get the performance state for the genpd's
device structure (genpd->dev) instead of the consumer device's
structure. Add a new helper to do that.

Signed-off-by: Viresh Kumar
---
 drivers/base/power/domain.c | 32
 include/linux/pm_domain.h   |  9 +
 2 files changed, 41 insertions(+)

diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
index fe9b0527b161..7be8c94c6b7f 100644
--- a/drivers/base/power/domain.c
+++ b/drivers/base/power/domain.c
@@ -2520,6 +2520,38 @@ int of_genpd_parse_idle_states(struct device_node *dn,
 }
 EXPORT_SYMBOL_GPL(of_genpd_parse_idle_states);
 
+/**
+ * pm_genpd_opp_to_performance_state - Gets performance state of the genpd from its OPP node.
+ *
+ * @genpd_dev: Genpd's device for which the performance-state needs to be found.
+ * @opp: struct dev_pm_opp of the OPP for which we need to find performance
+ *	 state.
+ *
+ * Returns performance state encoded in the OPP of the genpd. This calls
+ * platform specific genpd->opp_to_performance_state() callback to translate
+ * power domain OPP to performance state.
+ *
+ * Returns performance state on success and 0 on failure.
+ */
+unsigned int pm_genpd_opp_to_performance_state(struct device *genpd_dev,
+					       struct dev_pm_opp *opp)
+{
+	struct generic_pm_domain *genpd = NULL;
+	int state;
+
+	genpd = container_of(genpd_dev, struct generic_pm_domain, dev);
+
+	if (unlikely(!genpd->opp_to_performance_state))
+		return 0;
+
+	genpd_lock(genpd);
+	state = genpd->opp_to_performance_state(genpd, opp);
+	genpd_unlock(genpd);
+
+	return state;
+}
+EXPORT_SYMBOL_GPL(pm_genpd_opp_to_performance_state);
+
 /**
  * of_genpd_opp_to_performance_state- Gets performance state of device's
  * power domain corresponding to a DT node's "required-opps" property.
diff --git a/include/linux/pm_domain.h b/include/linux/pm_domain.h
index 3b5d7280e52e..4f803f934308 100644
--- a/include/linux/pm_domain.h
+++ b/include/linux/pm_domain.h
@@ -258,6 +258,8 @@ int of_genpd_add_subdomain(struct of_phandle_args *parent,
 struct generic_pm_domain *of_genpd_remove_last(struct device_node *np);
 int of_genpd_parse_idle_states(struct device_node *dn,
 			       struct genpd_power_state **states, int *n);
+unsigned int pm_genpd_opp_to_performance_state(struct device *genpd_dev,
+					       struct dev_pm_opp *opp);
 unsigned int of_genpd_opp_to_performance_state(struct device *dev,
 					       struct device_node *np);
 
@@ -299,6 +301,13 @@ static inline int of_genpd_parse_idle_states(struct device_node *dn,
 	return -ENODEV;
 }
 
+static inline unsigned int
+pm_genpd_opp_to_performance_state(struct device *genpd_dev,
+				  struct dev_pm_opp *opp)
+{
+	return 0;
+}
+
 static inline unsigned int
 of_genpd_opp_to_performance_state(struct device *dev,
 				  struct device_node *np)
-- 
2.19.1.568.g152ad8e3369a
[PATCH V3 09/10] OPP: Rename and relocate of_genpd_opp_to_performance_state()
The OPP core already has the performance state values for each of the
genpd's OPPs, so there is no need to call the genpd callback again to
get the performance state for the case where the end device doesn't have
an OPP table and has the "required-opps" property directly in its node.

This commit renames of_genpd_opp_to_performance_state() as
of_get_required_opp_performance_state() and moves it to the OPP core, as
it is all about OPP stuff now.

Signed-off-by: Viresh Kumar
---
 drivers/base/power/domain.c | 48 -
 drivers/opp/of.c            | 44 ++
 include/linux/pm_domain.h   |  9 ---
 include/linux/pm_opp.h      |  5
 4 files changed, 49 insertions(+), 57 deletions(-)

diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
index 7be8c94c6b7f..8e554e6a82a2 100644
--- a/drivers/base/power/domain.c
+++ b/drivers/base/power/domain.c
@@ -2552,54 +2552,6 @@ unsigned int pm_genpd_opp_to_performance_state(struct device *genpd_dev,
 }
 EXPORT_SYMBOL_GPL(pm_genpd_opp_to_performance_state);
 
-/**
- * of_genpd_opp_to_performance_state- Gets performance state of device's
- * power domain corresponding to a DT node's "required-opps" property.
- *
- * @dev: Device for which the performance-state needs to be found.
- * @np: DT node where the "required-opps" property is present. This can be
- *	the device node itself (if it doesn't have an OPP table) or a node
- *	within the OPP table of a device (if device has an OPP table).
- *
- * Returns performance state corresponding to the "required-opps" property of
- * a DT node. This calls platform specific genpd->opp_to_performance_state()
- * callback to translate power domain OPP to performance state.
- *
- * Returns performance state on success and 0 on failure.
- */
-unsigned int of_genpd_opp_to_performance_state(struct device *dev,
-					       struct device_node *np)
-{
-	struct generic_pm_domain *genpd;
-	struct dev_pm_opp *opp;
-	int state = 0;
-
-	genpd = dev_to_genpd(dev);
-	if (IS_ERR(genpd))
-		return 0;
-
-	if (unlikely(!genpd->set_performance_state))
-		return 0;
-
-	genpd_lock(genpd);
-
-	opp = of_dev_pm_opp_find_required_opp(&genpd->dev, np);
-	if (IS_ERR(opp)) {
-		dev_err(dev, "Failed to find required OPP: %ld\n",
-			PTR_ERR(opp));
-		goto unlock;
-	}
-
-	state = genpd->opp_to_performance_state(genpd, opp);
-	dev_pm_opp_put(opp);
-
-unlock:
-	genpd_unlock(genpd);
-
-	return state;
-}
-EXPORT_SYMBOL_GPL(of_genpd_opp_to_performance_state);
-
 static int __init genpd_bus_init(void)
 {
 	return bus_register(&genpd_bus_type);
diff --git a/drivers/opp/of.c b/drivers/opp/of.c
index 4e494720ac25..0755ee307b1a 100644
--- a/drivers/opp/of.c
+++ b/drivers/opp/of.c
@@ -969,6 +969,50 @@ int dev_pm_opp_of_get_sharing_cpus(struct device *cpu_dev,
 }
 EXPORT_SYMBOL_GPL(dev_pm_opp_of_get_sharing_cpus);
 
+/**
+ * of_get_required_opp_performance_state() - Search for required OPP and return its performance state.
+ * @np: Node that contains the "required-opps" property.
+ * @index: Index of the phandle to parse.
+ *
+ * Returns the performance state of the OPP pointed out by the "required-opps"
+ * property at @index in @np.
+ *
+ * Return: Positive performance state on success, otherwise 0 on errors.
+ */
+unsigned int of_get_required_opp_performance_state(struct device_node *np,
+						   int index)
+{
+	struct dev_pm_opp *opp;
+	struct device_node *required_np;
+	struct opp_table *opp_table;
+	unsigned int pstate = 0;
+
+	required_np = of_parse_required_opp(np, index);
+	if (!required_np)
+		return 0;
+
+	opp_table = _find_table_of_opp_np(required_np);
+	if (IS_ERR(opp_table)) {
+		pr_err("%s: Failed to find required OPP table %pOF: %ld\n",
+		       __func__, np, PTR_ERR(opp_table));
+		goto put_required_np;
+	}
+
+	opp = _find_opp_of_np(opp_table, required_np);
+	if (opp) {
+		pstate = opp->pstate;
+		dev_pm_opp_put(opp);
+	}
+
+	dev_pm_opp_put_opp_table(opp_table);
+
+put_required_np:
+	of_node_put(required_np);
+
+	return pstate;
+}
+EXPORT_SYMBOL_GPL(of_get_required_opp_performance_state);
+
 /**
  * of_dev_pm_opp_find_required_opp() - Search for required OPP.
  * @dev: The device whose OPP node is referenced by the 'np' DT node.
diff --git a/include/linux/pm_domain.h b/include/linux/pm_domain.h
index 4f803f934308..642036952553 100644
--- a/include/linux/pm_domain.h
+++ b/include/linux/pm_domain.h
@@ -260,8 +260,6 @@ int of_genpd_parse_idle_states(struct device_node *dn,
 			       struct genpd_power_state **states, int *n);
 unsigned int
[PATCH V3 10/10] OPP: Remove of_dev_pm_opp_find_required_opp()
This isn't used anymore; remove it.

Signed-off-by: Viresh Kumar
---
 drivers/opp/of.c       | 54 --
 include/linux/pm_opp.h |  5
 2 files changed, 59 deletions(-)

diff --git a/drivers/opp/of.c b/drivers/opp/of.c
index 0755ee307b1a..da31874c7fe9 100644
--- a/drivers/opp/of.c
+++ b/drivers/opp/of.c
@@ -1013,60 +1013,6 @@ unsigned int of_get_required_opp_performance_state(struct device_node *np,
 }
 EXPORT_SYMBOL_GPL(of_get_required_opp_performance_state);
 
-/**
- * of_dev_pm_opp_find_required_opp() - Search for required OPP.
- * @dev: The device whose OPP node is referenced by the 'np' DT node.
- * @np: Node that contains the "required-opps" property.
- *
- * Returns the OPP of the device 'dev', whose phandle is present in the "np"
- * node. Although the "required-opps" property supports having multiple
- * phandles, this helper routine only parses the very first phandle in the list.
- *
- * Return: Matching opp, else returns ERR_PTR in case of error and should be
- * handled using IS_ERR.
- *
- * The callers are required to call dev_pm_opp_put() for the returned OPP after
- * use.
- */
-struct dev_pm_opp *of_dev_pm_opp_find_required_opp(struct device *dev,
-						   struct device_node *np)
-{
-	struct dev_pm_opp *temp_opp, *opp = ERR_PTR(-ENODEV);
-	struct device_node *required_np;
-	struct opp_table *opp_table;
-
-	opp_table = _find_opp_table(dev);
-	if (IS_ERR(opp_table))
-		return ERR_CAST(opp_table);
-
-	required_np = of_parse_phandle(np, "required-opps", 0);
-	if (unlikely(!required_np)) {
-		dev_err(dev, "Unable to parse required-opps\n");
-		goto put_opp_table;
-	}
-
-	mutex_lock(&opp_table->lock);
-
-	list_for_each_entry(temp_opp, &opp_table->opp_list, node) {
-		if (temp_opp->available && temp_opp->np == required_np) {
-			opp = temp_opp;
-
-			/* Increment the reference count of OPP */
-			dev_pm_opp_get(opp);
-			break;
-		}
-	}
-
-	mutex_unlock(&opp_table->lock);
-
-	of_node_put(required_np);
-put_opp_table:
-	dev_pm_opp_put_opp_table(opp_table);
-
-	return opp;
-}
-EXPORT_SYMBOL_GPL(of_dev_pm_opp_find_required_opp);
-
 /**
  * dev_pm_opp_get_of_node() - Gets the DT node corresponding to an opp
  * @opp: opp for which DT node has to be returned for
diff --git a/include/linux/pm_opp.h b/include/linux/pm_opp.h
index 889bb347fbd9..2b2c3fd985ab 100644
--- a/include/linux/pm_opp.h
+++ b/include/linux/pm_opp.h
@@ -313,7 +313,6 @@ int dev_pm_opp_of_cpumask_add_table(const struct cpumask *cpumask);
 void dev_pm_opp_of_cpumask_remove_table(const struct cpumask *cpumask);
 int dev_pm_opp_of_get_sharing_cpus(struct device *cpu_dev, struct cpumask *cpumask);
 struct device_node *dev_pm_opp_of_get_opp_desc_node(struct device *dev);
-struct dev_pm_opp *of_dev_pm_opp_find_required_opp(struct device *dev, struct device_node *np);
 struct device_node *dev_pm_opp_get_of_node(struct dev_pm_opp *opp);
 unsigned int of_get_required_opp_performance_state(struct device_node *np, int index);
 #else
@@ -350,10 +349,6 @@ static inline struct device_node *dev_pm_opp_of_get_opp_desc_node(struct device *dev)
 {
 	return NULL;
 }
 
-static inline struct dev_pm_opp *of_dev_pm_opp_find_required_opp(struct device *dev,
-								 struct device_node *np)
-{
-	return NULL;
-}
 static inline struct device_node *dev_pm_opp_get_of_node(struct dev_pm_opp *opp)
 {
 	return NULL;
 }
-- 
2.19.1.568.g152ad8e3369a
[PATCH V3 08/10] OPP: Configure all required OPPs
Now that all the infrastructure is in place to support multiple required OPPs, lets switch over to using it. A new internal routine _set_required_opps() takes care of updating performance state for all the required OPPs. With this the performance state updates are supported even when the end device needs to configure regulators as well, that wasn't the case earlier. The pstates were earlier stored in the end device's OPP structures, that also changes now as those values are stored in the genpd's OPP structures. And so we switch over to using pm_genpd_opp_to_performance_state() instead of of_genpd_opp_to_performance_state() to get performance state for the genpd OPPs. The routine _generic_set_opp_domain() is not required anymore and is removed. On errors we don't try to recover by reverting to old settings as things are really complex now and the calls here should never really fail unless there is a bug. There is no point increasing the complexity, for code which will never be executed. Signed-off-by: Viresh Kumar --- drivers/opp/core.c | 113 ++--- drivers/opp/of.c | 5 +- 2 files changed, 68 insertions(+), 50 deletions(-) diff --git a/drivers/opp/core.c b/drivers/opp/core.c index cef2ccda355d..0eaa954b3f6c 100644 --- a/drivers/opp/core.c +++ b/drivers/opp/core.c @@ -548,44 +548,6 @@ _generic_set_opp_clk_only(struct device *dev, struct clk *clk, return ret; } -static inline int -_generic_set_opp_domain(struct device *dev, struct clk *clk, - unsigned long old_freq, unsigned long freq, - unsigned int old_pstate, unsigned int new_pstate) -{ - int ret; - - /* Scaling up? Scale domain performance state before frequency */ - if (freq > old_freq) { - ret = dev_pm_genpd_set_performance_state(dev, new_pstate); - if (ret) - return ret; - } - - ret = _generic_set_opp_clk_only(dev, clk, old_freq, freq); - if (ret) - goto restore_domain_state; - - /* Scaling down? 
Scale domain performance state after frequency */ - if (freq < old_freq) { - ret = dev_pm_genpd_set_performance_state(dev, new_pstate); - if (ret) - goto restore_freq; - } - - return 0; - -restore_freq: - if (_generic_set_opp_clk_only(dev, clk, freq, old_freq)) - dev_err(dev, "%s: failed to restore old-freq (%lu Hz)\n", - __func__, old_freq); -restore_domain_state: - if (freq > old_freq) - dev_pm_genpd_set_performance_state(dev, old_pstate); - - return ret; -} - static int _generic_set_opp_regulator(const struct opp_table *opp_table, struct device *dev, unsigned long old_freq, @@ -663,6 +625,56 @@ static int _set_opp_custom(const struct opp_table *opp_table, return opp_table->set_opp(data); } +/* This is only called for PM domain for now */ +static int _set_required_opps(struct device *dev, + struct opp_table *opp_table, + struct dev_pm_opp *opp) +{ + struct opp_table **required_opp_tables = opp_table->required_opp_tables; + struct device **genpd_virt_devs = opp_table->genpd_virt_devs; + unsigned int pstate; + int i, ret = 0; + + if (!required_opp_tables) + return 0; + + /* Single genpd case */ + if (!genpd_virt_devs) { + pstate = opp->required_opps[0]->pstate; + ret = dev_pm_genpd_set_performance_state(dev, pstate); + if (ret) { + dev_err(dev, "Failed to set performance state of %s: %d (%d)\n", + dev_name(dev), pstate, ret); + } + return ret; + } + + /* Multiple genpd case */ + + /* +* Acquire genpd_virt_dev_lock to make sure we don't use a genpd_dev +* after it is freed from another thread. 
+*/ + mutex_lock(_table->genpd_virt_dev_lock); + + for (i = 0; i < opp_table->required_opp_count; i++) { + pstate = opp->required_opps[i]->pstate; + + if (!genpd_virt_devs[i]) + continue; + + ret = dev_pm_genpd_set_performance_state(genpd_virt_devs[i], pstate); + if (ret) { + dev_err(dev, "Failed to set performance rate of %s: %d (%d)\n", + dev_name(genpd_virt_devs[i]), pstate, ret); + break; + } + } + mutex_unlock(_table->genpd_virt_dev_lock); + + return ret; +} + /** * dev_pm_opp_set_rate() - Configure new OPP based on frequency * @dev:device for which we do this operation @@ -730,6 +742,13 @@ int dev_pm_opp_set_rate(struct device *dev, unsigned long
[PATCH V3 05/10] OPP: Populate OPPs from "required-opps" property
An earlier commit populated the OPP tables from the "required-opps" property, this commit populates the individual OPPs. This is repeated for each OPP in the OPP table and these populated OPPs will be used by later commits. Reviewed-by: Ulf Hansson Signed-off-by: Viresh Kumar --- drivers/opp/core.c | 1 + drivers/opp/of.c | 81 -- drivers/opp/opp.h | 6 3 files changed, 86 insertions(+), 2 deletions(-) diff --git a/drivers/opp/core.c b/drivers/opp/core.c index 85174a5c4850..02a69a62dac8 100644 --- a/drivers/opp/core.c +++ b/drivers/opp/core.c @@ -976,6 +976,7 @@ static void _opp_kref_release(struct kref *kref) * frequency/voltage list. */ blocking_notifier_call_chain(_table->head, OPP_EVENT_REMOVE, opp); + _of_opp_free_required_opps(opp_table, opp); opp_debug_remove_one(opp); list_del(>node); kfree(opp); diff --git a/drivers/opp/of.c b/drivers/opp/of.c index b5605196122a..ffaeefef98ce 100644 --- a/drivers/opp/of.c +++ b/drivers/opp/of.c @@ -259,6 +259,77 @@ void _of_clear_opp_table(struct opp_table *opp_table) _opp_table_free_required_tables(opp_table); } +/* + * Release all resources previously acquired with a call to + * _of_opp_alloc_required_opps(). 
+ */ +void _of_opp_free_required_opps(struct opp_table *opp_table, + struct dev_pm_opp *opp) +{ + struct dev_pm_opp **required_opps = opp->required_opps; + int i; + + if (!required_opps) + return; + + for (i = 0; i < opp_table->required_opp_count; i++) { + if (!required_opps[i]) + break; + + /* Put the reference back */ + dev_pm_opp_put(required_opps[i]); + } + + kfree(required_opps); + opp->required_opps = NULL; +} + +/* Populate all required OPPs which are part of "required-opps" list */ +static int _of_opp_alloc_required_opps(struct opp_table *opp_table, + struct dev_pm_opp *opp) +{ + struct dev_pm_opp **required_opps; + struct opp_table *required_table; + struct device_node *np; + int i, ret, count = opp_table->required_opp_count; + + if (!count) + return 0; + + required_opps = kcalloc(count, sizeof(*required_opps), GFP_KERNEL); + if (!required_opps) + return -ENOMEM; + + opp->required_opps = required_opps; + + for (i = 0; i < count; i++) { + required_table = opp_table->required_opp_tables[i]; + + np = of_parse_required_opp(opp->np, i); + if (unlikely(!np)) { + ret = -ENODEV; + goto free_required_opps; + } + + required_opps[i] = _find_opp_of_np(required_table, np); + of_node_put(np); + + if (!required_opps[i]) { + pr_err("%s: Unable to find required OPP node: %pOF (%d)\n", + __func__, opp->np, i); + ret = -ENODEV; + goto free_required_opps; + } + } + + return 0; + +free_required_opps: + _of_opp_free_required_opps(opp_table, opp); + + return ret; +} + static bool _opp_is_supported(struct device *dev, struct opp_table *opp_table, struct device_node *np) { @@ -503,6 +574,10 @@ static struct dev_pm_opp *_opp_add_static_v2(struct opp_table *opp_table, new_opp->dynamic = false; new_opp->available = true; + ret = _of_opp_alloc_required_opps(opp_table, new_opp); + if (ret) + goto free_opp; + if (!of_property_read_u32(np, "clock-latency-ns", )) new_opp->clock_latency_ns = val; @@ -510,14 +585,14 @@ static struct dev_pm_opp *_opp_add_static_v2(struct opp_table 
*opp_table, ret = opp_parse_supplies(new_opp, dev, opp_table); if (ret) - goto free_opp; + goto free_required_opps; ret = _opp_add(dev, new_opp, opp_table, rate_not_available); if (ret) { /* Don't return error for duplicate OPPs */ if (ret == -EBUSY) ret = 0; - goto free_opp; + goto free_required_opps; } /* OPP to select on device suspend */ @@ -547,6 +622,8 @@ static struct dev_pm_opp *_opp_add_static_v2(struct opp_table *opp_table, blocking_notifier_call_chain(_table->head, OPP_EVENT_ADD, new_opp); return new_opp; +free_required_opps: + _of_opp_free_required_opps(opp_table, new_opp); free_opp: _opp_free(new_opp); diff --git a/drivers/opp/opp.h b/drivers/opp/opp.h index 024e1be23d37..24b340ad18d1 100644 --- a/drivers/opp/opp.h +++ b/drivers/opp/opp.h @@ -63,6 +63,7 @@ extern struct list_head opp_tables; * @supplies: Power supplies
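The cleanup in _of_opp_free_required_opps() relies on the array being filled front to back, so the loop can stop at the first NULL slot when allocation failed midway. The ordering can be sketched with a standalone userspace model (toy structures and names, not kernel code; the real code drops dev_pm_opp references):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins for struct dev_pm_opp and its required_opps array. */
struct toy_opp { int refcount; };

struct toy_opp_list {
	struct toy_opp **required_opps; /* like opp->required_opps */
	int count;                      /* like opp_table->required_opp_count */
};

static void toy_opp_put(struct toy_opp *opp) { opp->refcount--; }

/*
 * Mirrors _of_opp_free_required_opps(): slots are populated in order,
 * so a NULL entry means nothing after it was filled and the loop can
 * stop early. Safe to call after a partially failed allocation.
 */
void toy_free_required_opps(struct toy_opp_list *l)
{
	int i;

	if (!l->required_opps)
		return;

	for (i = 0; i < l->count; i++) {
		if (!l->required_opps[i])
			break;			/* rest is unpopulated */

		toy_opp_put(l->required_opps[i]);	/* put the reference back */
	}

	free(l->required_opps);
	l->required_opps = NULL;
}
```

The same helper therefore serves both the error path of _of_opp_alloc_required_opps() and the regular release path in _opp_kref_release().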
[PATCH V3 02/10] OPP: Identify and mark genpd OPP tables
We need to handle genpd OPP tables differently; this is already the case at one location and will be extended going forward. Add another field to the OPP table to check if the table belongs to a genpd or not. Reviewed-by: Ulf Hansson Signed-off-by: Viresh Kumar --- drivers/opp/of.c | 6 -- drivers/opp/opp.h | 2 ++ 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/opp/of.c b/drivers/opp/of.c index 5a4b47958073..5f114cd3d88c 100644 --- a/drivers/opp/of.c +++ b/drivers/opp/of.c @@ -92,6 +92,9 @@ void _of_init_opp_table(struct opp_table *opp_table, struct device *dev, of_property_read_u32(np, "voltage-tolerance", &opp_table->voltage_tolerance_v1); + if (of_find_property(np, "#power-domain-cells", NULL)) + opp_table->is_genpd = true; + /* Get OPP table node */ opp_np = _opp_of_get_opp_desc_node(np, index); of_node_put(np); @@ -326,8 +329,7 @@ static struct dev_pm_opp *_opp_add_static_v2(struct opp_table *opp_table, ret = of_property_read_u64(np, "opp-hz", &rate); if (ret < 0) { /* "opp-hz" is optional for devices like power domains. */ - if (!of_find_property(dev->of_node, "#power-domain-cells", - NULL)) { + if (!opp_table->is_genpd) { dev_err(dev, "%s: opp-hz not found\n", __func__); goto free_opp; } diff --git a/drivers/opp/opp.h b/drivers/opp/opp.h index 9c6544b4f4f9..cdb0c2b095e2 100644 --- a/drivers/opp/opp.h +++ b/drivers/opp/opp.h @@ -140,6 +140,7 @@ enum opp_table_access { * @regulators: Supply regulators * @regulator_count: Number of power supply regulators * @genpd_performance_state: Device's power domain support performance state. + * @is_genpd: Marks if the OPP table belongs to a genpd. * @set_opp: Platform specific set_opp callback * @set_opp_data: Data to be passed to set_opp callback * @dentry: debugfs dentry pointer of the real device directory (not links). 
@@ -178,6 +179,7 @@ struct opp_table { struct regulator **regulators; unsigned int regulator_count; bool genpd_performance_state; + bool is_genpd; int (*set_opp)(struct dev_pm_set_opp_data *data); struct dev_pm_set_opp_data *set_opp_data; -- 2.19.1.568.g152ad8e3369a
[PATCH V3 07/10] OPP: Add dev_pm_opp_{set|put}_genpd_virt_dev() helper
Multiple generic power domains for a consumer device are supported with the help of virtual devices, which are created for each consumer device - genpd pair. These are the device structures which are attached to the power domain and are required by the OPP core to set the performance state of the genpd. The helpers added by this commit need to be called once for each of these virtual devices, but only if multiple domains are available for a device; otherwise the OPP core uses the actual device structure instead. The new helpers also support the complex cases where the consumer device doesn't always require all the domains. For example, a camera may require only one power domain during normal operations but two during high resolution operations. The consumer driver can call dev_pm_opp_put_genpd_virt_dev(high_resolution_genpd_virt_dev) while it is operating in the normal mode and has no performance requirements from the genpd which manages the high resolution power requirements, and can later call dev_pm_opp_set_genpd_virt_dev(high_resolution_genpd_virt_dev) once it switches back to the high resolution mode. The new helpers differ from the other OPP set/put helpers in that they can be called with OPPs already initialized for the table, as the complex case above may require calling them on the fly. It is therefore possible that the genpd virt_dev structure is used in parallel while the new helpers are running, and a new mutex is added to protect against that. We didn't reuse the existing opp_table->lock mutex as it is widely used in the OPP core and will also be taken in the dev_pm_opp_set_rate() helper while changing OPPs, which is the hotpath where contention must be kept low. 
Signed-off-by: Viresh Kumar --- drivers/opp/core.c | 88 ++ drivers/opp/of.c | 16 +++- drivers/opp/opp.h | 4 ++ include/linux/pm_opp.h | 8 4 files changed, 115 insertions(+), 1 deletion(-) diff --git a/drivers/opp/core.c b/drivers/opp/core.c index 02a69a62dac8..cef2ccda355d 100644 --- a/drivers/opp/core.c +++ b/drivers/opp/core.c @@ -823,6 +823,7 @@ static struct opp_table *_allocate_opp_table(struct device *dev, int index) return NULL; mutex_init(&opp_table->lock); + mutex_init(&opp_table->genpd_virt_dev_lock); INIT_LIST_HEAD(&opp_table->dev_list); opp_dev = _add_opp_dev(dev, opp_table); @@ -920,6 +921,7 @@ static void _opp_table_kref_release(struct kref *kref) _remove_opp_dev(opp_dev, opp_table); } + mutex_destroy(&opp_table->genpd_virt_dev_lock); mutex_destroy(&opp_table->lock); list_del(&opp_table->node); kfree(opp_table); @@ -1602,6 +1604,92 @@ void dev_pm_opp_unregister_set_opp_helper(struct opp_table *opp_table) } EXPORT_SYMBOL_GPL(dev_pm_opp_unregister_set_opp_helper); +/** + * dev_pm_opp_set_genpd_virt_dev - Set virtual genpd device for an index + * @dev: Consumer device for which the genpd device is getting set. + * @virt_dev: virtual genpd device. + * @index: index. + * + * Multiple generic power domains for a device are supported with the help of + * virtual genpd devices, which are created for each consumer device - genpd + * pair. These are the device structures which are attached to the power domain + * and are required by the OPP core to set the performance state of the genpd. + * + * This helper will normally be called by the consumer driver of the device + * "dev", as only that has details of the genpd devices. + * + * This helper needs to be called once for each of those virtual devices, but + * only if multiple domains are available for a device. Otherwise the original + * device structure will be used instead by the OPP core. 
+ */ +struct opp_table *dev_pm_opp_set_genpd_virt_dev(struct device *dev, + struct device *virt_dev, + int index) +{ + struct opp_table *opp_table; + + opp_table = dev_pm_opp_get_opp_table(dev); + if (!opp_table) + return ERR_PTR(-ENOMEM); + + mutex_lock(&opp_table->genpd_virt_dev_lock); + + if (unlikely(!opp_table->genpd_virt_devs || +index >= opp_table->required_opp_count || +opp_table->genpd_virt_devs[index])) { + + dev_err(dev, "Invalid request to set required device\n"); + dev_pm_opp_put_opp_table(opp_table); + mutex_unlock(&opp_table->genpd_virt_dev_lock); + + return ERR_PTR(-EINVAL); + } + + opp_table->genpd_virt_devs[index] = virt_dev; + mutex_unlock(&opp_table->genpd_virt_dev_lock); + + return opp_table; +} + +/** + * dev_pm_opp_put_genpd_virt_dev() - Releases resources blocked for genpd device. + * @opp_table: OPP table returned by
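The slot bookkeeping that dev_pm_opp_set_genpd_virt_dev() performs under genpd_virt_dev_lock can be modelled in a standalone sketch (toy structure and function names, not the kernel API; the lock and the ERR_PTR plumbing are omitted):

```c
#include <assert.h>
#include <stddef.h>

/* One slot per required OPP table, like opp_table->genpd_virt_devs. */
struct virt_dev_table {
	void **genpd_virt_devs;
	int required_opp_count;
};

/*
 * Returns 0 on success, -1 for the same conditions the helper rejects:
 * no virtual-device array was allocated, the index is out of range, or
 * the slot is already occupied. The real code does this check while
 * holding genpd_virt_dev_lock.
 */
int toy_set_genpd_virt_dev(struct virt_dev_table *t, void *virt_dev, int index)
{
	if (!t->genpd_virt_devs ||
	    index >= t->required_opp_count ||
	    t->genpd_virt_devs[index])
		return -1;

	t->genpd_virt_devs[index] = virt_dev;
	return 0;
}
```

Rejecting an already-set slot is what makes the set/put pairing in the camera example safe: a second set without an intervening put is an error rather than a silent overwrite.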
[PATCH V3 06/10] PM / Domains: Add genpd_opp_to_performance_state()
The OPP core currently stores the performance state in the consumer device's OPP table, but that is going to change going forward and performance state will rather be set directly in the genpd's OPP table. For that we need to get the performance state for genpd's device structure (genpd->dev) instead of the consumer device's structure. Add a new helper to do that. Signed-off-by: Viresh Kumar --- drivers/base/power/domain.c | 32 include/linux/pm_domain.h | 9 + 2 files changed, 41 insertions(+) diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c index fe9b0527b161..7be8c94c6b7f 100644 --- a/drivers/base/power/domain.c +++ b/drivers/base/power/domain.c @@ -2520,6 +2520,38 @@ int of_genpd_parse_idle_states(struct device_node *dn, } EXPORT_SYMBOL_GPL(of_genpd_parse_idle_states); +/** + * pm_genpd_opp_to_performance_state - Gets performance state of the genpd from its OPP node. + * + * @genpd_dev: Genpd's device for which the performance-state needs to be found. + * @opp: struct dev_pm_opp of the OPP for which we need to find performance + * state. + * + * Returns performance state encoded in the OPP of the genpd. This calls + * platform specific genpd->opp_to_performance_state() callback to translate + * power domain OPP to performance state. + * + * Returns performance state on success and 0 on failure. + */ +unsigned int pm_genpd_opp_to_performance_state(struct device *genpd_dev, + struct dev_pm_opp *opp) +{ + struct generic_pm_domain *genpd = NULL; + int state; + + genpd = container_of(genpd_dev, struct generic_pm_domain, dev); + + if (unlikely(!genpd->opp_to_performance_state)) + return 0; + + genpd_lock(genpd); + state = genpd->opp_to_performance_state(genpd, opp); + genpd_unlock(genpd); + + return state; +} +EXPORT_SYMBOL_GPL(pm_genpd_opp_to_performance_state); + /** * of_genpd_opp_to_performance_state- Gets performance state of device's * power domain corresponding to a DT node's "required-opps" property. 
diff --git a/include/linux/pm_domain.h b/include/linux/pm_domain.h index 3b5d7280e52e..4f803f934308 100644 --- a/include/linux/pm_domain.h +++ b/include/linux/pm_domain.h @@ -258,6 +258,8 @@ int of_genpd_add_subdomain(struct of_phandle_args *parent, struct generic_pm_domain *of_genpd_remove_last(struct device_node *np); int of_genpd_parse_idle_states(struct device_node *dn, struct genpd_power_state **states, int *n); +unsigned int pm_genpd_opp_to_performance_state(struct device *genpd_dev, + struct dev_pm_opp *opp); unsigned int of_genpd_opp_to_performance_state(struct device *dev, struct device_node *np); @@ -299,6 +301,13 @@ static inline int of_genpd_parse_idle_states(struct device_node *dn, return -ENODEV; } +static inline unsigned int +pm_genpd_opp_to_performance_state(struct device *genpd_dev, + struct dev_pm_opp *opp) +{ + return 0; +} + static inline unsigned int of_genpd_opp_to_performance_state(struct device *dev, struct device_node *np) -- 2.19.1.568.g152ad8e3369a
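The interesting step in pm_genpd_opp_to_performance_state() is recovering the enclosing generic_pm_domain from its embedded device via container_of(). A standalone sketch of that pattern, with reduced stand-in structures rather than the real genpd types:

```c
#include <assert.h>
#include <stddef.h>

/* Userspace re-statement of the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct toy_device { int id; };

struct toy_genpd {
	int max_state;
	struct toy_device dev;	/* embedded, like genpd->dev */
};

/*
 * Like the genpd_dev -> genpd step in pm_genpd_opp_to_performance_state():
 * given a pointer to the embedded device, compute the address of the
 * structure that contains it.
 */
struct toy_genpd *dev_to_toy_genpd(struct toy_device *d)
{
	return container_of(d, struct toy_genpd, dev);
}
```

This is why the new helper can take genpd->dev directly instead of a consumer device: the device structure is embedded in the genpd, so no lookup through the PM domain pointers is needed.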
[PATCH V3 09/10] OPP: Rename and relocate of_genpd_opp_to_performance_state()
The OPP core already has the performance state values for each of the genpd's OPPs and there is no need to call the genpd callback again to get the performance state for the case where the end device doesn't have an OPP table and has the "required-opps" property directly in its node. This commit renames of_genpd_opp_to_performance_state() as of_get_required_opp_performance_state() and moves it to the OPP core, as it is all about OPP stuff now. Signed-off-by: Viresh Kumar --- drivers/base/power/domain.c | 48 - drivers/opp/of.c| 44 ++ include/linux/pm_domain.h | 9 --- include/linux/pm_opp.h | 5 4 files changed, 49 insertions(+), 57 deletions(-) diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c index 7be8c94c6b7f..8e554e6a82a2 100644 --- a/drivers/base/power/domain.c +++ b/drivers/base/power/domain.c @@ -2552,54 +2552,6 @@ unsigned int pm_genpd_opp_to_performance_state(struct device *genpd_dev, } EXPORT_SYMBOL_GPL(pm_genpd_opp_to_performance_state); -/** - * of_genpd_opp_to_performance_state- Gets performance state of device's - * power domain corresponding to a DT node's "required-opps" property. - * - * @dev: Device for which the performance-state needs to be found. - * @np: DT node where the "required-opps" property is present. This can be - * the device node itself (if it doesn't have an OPP table) or a node - * within the OPP table of a device (if device has an OPP table). - * - * Returns performance state corresponding to the "required-opps" property of - * a DT node. This calls platform specific genpd->opp_to_performance_state() - * callback to translate power domain OPP to performance state. - * - * Returns performance state on success and 0 on failure. 
- */ -unsigned int of_genpd_opp_to_performance_state(struct device *dev, - struct device_node *np) -{ - struct generic_pm_domain *genpd; - struct dev_pm_opp *opp; - int state = 0; - - genpd = dev_to_genpd(dev); - if (IS_ERR(genpd)) - return 0; - - if (unlikely(!genpd->set_performance_state)) - return 0; - - genpd_lock(genpd); - - opp = of_dev_pm_opp_find_required_opp(>dev, np); - if (IS_ERR(opp)) { - dev_err(dev, "Failed to find required OPP: %ld\n", - PTR_ERR(opp)); - goto unlock; - } - - state = genpd->opp_to_performance_state(genpd, opp); - dev_pm_opp_put(opp); - -unlock: - genpd_unlock(genpd); - - return state; -} -EXPORT_SYMBOL_GPL(of_genpd_opp_to_performance_state); - static int __init genpd_bus_init(void) { return bus_register(_bus_type); diff --git a/drivers/opp/of.c b/drivers/opp/of.c index 4e494720ac25..0755ee307b1a 100644 --- a/drivers/opp/of.c +++ b/drivers/opp/of.c @@ -969,6 +969,50 @@ int dev_pm_opp_of_get_sharing_cpus(struct device *cpu_dev, } EXPORT_SYMBOL_GPL(dev_pm_opp_of_get_sharing_cpus); +/** + * of_get_required_opp_performance_state() - Search for required OPP and return its performance state. + * @np: Node that contains the "required-opps" property. + * @index: Index of the phandle to parse. + * + * Returns the performance state of the OPP pointed out by the "required-opps" + * property at @index in @np. + * + * Return: Positive performance state on success, otherwise 0 on errors. 
+ */ +unsigned int of_get_required_opp_performance_state(struct device_node *np, + int index) +{ + struct dev_pm_opp *opp; + struct device_node *required_np; + struct opp_table *opp_table; + unsigned int pstate = 0; + + required_np = of_parse_required_opp(np, index); + if (!required_np) + return -ENODEV; + + opp_table = _find_table_of_opp_np(required_np); + if (IS_ERR(opp_table)) { + pr_err("%s: Failed to find required OPP table %pOF: %ld\n", + __func__, np, PTR_ERR(opp_table)); + goto put_required_np; + } + + opp = _find_opp_of_np(opp_table, required_np); + if (opp) { + pstate = opp->pstate; + dev_pm_opp_put(opp); + } + + dev_pm_opp_put_opp_table(opp_table); + +put_required_np: + of_node_put(required_np); + + return pstate; +} +EXPORT_SYMBOL_GPL(of_get_required_opp_performance_state); + /** * of_dev_pm_opp_find_required_opp() - Search for required OPP. * @dev: The device whose OPP node is referenced by the 'np' DT node. diff --git a/include/linux/pm_domain.h b/include/linux/pm_domain.h index 4f803f934308..642036952553 100644 --- a/include/linux/pm_domain.h +++ b/include/linux/pm_domain.h @@ -260,8 +260,6 @@ int of_genpd_parse_idle_states(struct device_node *dn, struct genpd_power_state **states, int *n); unsigned int
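The lookup performed by of_get_required_opp_performance_state() reduces to: resolve the index'th "required-opps" phandle, find the OPP carrying that node, and read the pstate the OPP core already stored, falling back to 0 when anything is missing. A standalone model of that flow (toy types, DT nodes replaced by opaque pointers):

```c
#include <assert.h>
#include <stddef.h>

/* Toy OPP: the DT node it was parsed from, plus its stored pstate. */
struct toy_opp { const void *np; unsigned int pstate; };

/*
 * required[] stands in for the resolved "required-opps" phandles of a
 * node; opps[] stands in for all OPPs known to the core (the kernel
 * scans per-table lists instead, like _find_opp_of_np()).
 * Returns the pstate, or 0 when the phandle or OPP cannot be found.
 */
unsigned int toy_required_opp_pstate(const struct toy_opp *opps, int nr,
				     const void *const *required, int nr_req,
				     int index)
{
	int i;

	if (index >= nr_req)
		return 0;		/* no such phandle */

	for (i = 0; i < nr; i++)
		if (opps[i].np == required[index])
			return opps[i].pstate;

	return 0;			/* no OPP for that node */
}
```

This is the reason the genpd callback no longer needs to run here: the translation already happened once when the genpd's OPP table was populated, and the stored value is simply read back.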
[PATCH V3 10/10] OPP: Remove of_dev_pm_opp_find_required_opp()
This isn't used anymore, remove it. Signed-off-by: Viresh Kumar --- drivers/opp/of.c | 54 -- include/linux/pm_opp.h | 5 2 files changed, 59 deletions(-) diff --git a/drivers/opp/of.c b/drivers/opp/of.c index 0755ee307b1a..da31874c7fe9 100644 --- a/drivers/opp/of.c +++ b/drivers/opp/of.c @@ -1013,60 +1013,6 @@ unsigned int of_get_required_opp_performance_state(struct device_node *np, } EXPORT_SYMBOL_GPL(of_get_required_opp_performance_state); -/** - * of_dev_pm_opp_find_required_opp() - Search for required OPP. - * @dev: The device whose OPP node is referenced by the 'np' DT node. - * @np: Node that contains the "required-opps" property. - * - * Returns the OPP of the device 'dev', whose phandle is present in the "np" - * node. Although the "required-opps" property supports having multiple - * phandles, this helper routine only parses the very first phandle in the list. - * - * Return: Matching opp, else returns ERR_PTR in case of error and should be - * handled using IS_ERR. - * - * The callers are required to call dev_pm_opp_put() for the returned OPP after - * use. 
- */ -struct dev_pm_opp *of_dev_pm_opp_find_required_opp(struct device *dev, - struct device_node *np) -{ - struct dev_pm_opp *temp_opp, *opp = ERR_PTR(-ENODEV); - struct device_node *required_np; - struct opp_table *opp_table; - - opp_table = _find_opp_table(dev); - if (IS_ERR(opp_table)) - return ERR_CAST(opp_table); - - required_np = of_parse_phandle(np, "required-opps", 0); - if (unlikely(!required_np)) { - dev_err(dev, "Unable to parse required-opps\n"); - goto put_opp_table; - } - - mutex_lock(&opp_table->lock); - - list_for_each_entry(temp_opp, &opp_table->opp_list, node) { - if (temp_opp->available && temp_opp->np == required_np) { - opp = temp_opp; - - /* Increment the reference count of OPP */ - dev_pm_opp_get(opp); - break; - } - } - - mutex_unlock(&opp_table->lock); - - of_node_put(required_np); -put_opp_table: - dev_pm_opp_put_opp_table(opp_table); - - return opp; -} -EXPORT_SYMBOL_GPL(of_dev_pm_opp_find_required_opp); - /** * dev_pm_opp_get_of_node() - Gets the DT node corresponding to an opp * @opp: opp for which DT node has to be returned for diff --git a/include/linux/pm_opp.h b/include/linux/pm_opp.h index 889bb347fbd9..2b2c3fd985ab 100644 --- a/include/linux/pm_opp.h +++ b/include/linux/pm_opp.h @@ -313,7 +313,6 @@ int dev_pm_opp_of_cpumask_add_table(const struct cpumask *cpumask); void dev_pm_opp_of_cpumask_remove_table(const struct cpumask *cpumask); int dev_pm_opp_of_get_sharing_cpus(struct device *cpu_dev, struct cpumask *cpumask); struct device_node *dev_pm_opp_of_get_opp_desc_node(struct device *dev); -struct dev_pm_opp *of_dev_pm_opp_find_required_opp(struct device *dev, struct device_node *np); struct device_node *dev_pm_opp_get_of_node(struct dev_pm_opp *opp); unsigned int of_get_required_opp_performance_state(struct device_node *np, int index); #else @@ -350,10 +349,6 @@ static inline struct device_node *dev_pm_opp_of_get_opp_desc_node(struct device return NULL; } -static inline struct dev_pm_opp *of_dev_pm_opp_find_required_opp(struct device *dev, 
struct device_node *np) -{ - return NULL; -} static inline struct device_node *dev_pm_opp_get_of_node(struct dev_pm_opp *opp) { return NULL; -- 2.19.1.568.g152ad8e3369a
[PATCH V3 04/10] OPP: Populate required opp tables from "required-opps" property
The current implementation works only for the case where a single phandle is present in the "required-opps" property, while DT allows multiple phandles to be present there. This patch adds new infrastructure to parse all the phandles present in the "required-opps" property and save pointers to the required OPPs' OPP tables. These will be used by later commits. Reviewed-by: Ulf Hansson Signed-off-by: Viresh Kumar --- drivers/opp/core.c | 2 + drivers/opp/of.c | 147 + drivers/opp/opp.h | 8 +++ 3 files changed, 157 insertions(+) diff --git a/drivers/opp/core.c b/drivers/opp/core.c index ebb3b648e0fd..85174a5c4850 100644 --- a/drivers/opp/core.c +++ b/drivers/opp/core.c @@ -901,6 +901,8 @@ static void _opp_table_kref_release(struct kref *kref) struct opp_table *opp_table = container_of(kref, struct opp_table, kref); struct opp_device *opp_dev, *temp; + _of_clear_opp_table(opp_table); + /* Release clk */ if (!IS_ERR(opp_table->clk)) clk_put(opp_table->clk); diff --git a/drivers/opp/of.c b/drivers/opp/of.c index 5f114cd3d88c..b5605196122a 100644 --- a/drivers/opp/of.c +++ b/drivers/opp/of.c @@ -73,6 +73,147 @@ struct opp_table *_managed_opp(struct device *dev, int index) return managed_table; } +/* The caller must call dev_pm_opp_put() after the OPP is used */ +static struct dev_pm_opp *_find_opp_of_np(struct opp_table *opp_table, + struct device_node *opp_np) +{ + struct dev_pm_opp *opp; + + lockdep_assert_held(&opp_table_lock); + + mutex_lock(&opp_table->lock); + + list_for_each_entry(opp, &opp_table->opp_list, node) { + if (opp->np == opp_np) { + dev_pm_opp_get(opp); + mutex_unlock(&opp_table->lock); + return opp; + } + } + + mutex_unlock(&opp_table->lock); + + return NULL; +} + +static struct device_node *of_parse_required_opp(struct device_node *np, +int index) +{ + struct device_node *required_np; + + required_np = of_parse_phandle(np, "required-opps", index); + if (unlikely(!required_np)) { + pr_err("%s: Unable to parse required-opps: %pOF, index: %d\n", + __func__, np, index); + } + + 
return required_np; +} + +/* The caller must call dev_pm_opp_put_opp_table() after the table is used */ +static struct opp_table *_find_table_of_opp_np(struct device_node *opp_np) +{ + struct opp_table *opp_table; + struct dev_pm_opp *opp; + + lockdep_assert_held(&opp_table_lock); + + list_for_each_entry(opp_table, &opp_tables, node) { + opp = _find_opp_of_np(opp_table, opp_np); + if (opp) { + dev_pm_opp_put(opp); + _get_opp_table_kref(opp_table); + return opp_table; + } + } + + return ERR_PTR(-ENODEV); +} + +/* Free resources previously acquired by _opp_table_alloc_required_tables() */ +static void _opp_table_free_required_tables(struct opp_table *opp_table) +{ + struct opp_table **required_opp_tables = opp_table->required_opp_tables; + int i; + + if (!required_opp_tables) + return; + + for (i = 0; i < opp_table->required_opp_count; i++) { + if (IS_ERR_OR_NULL(required_opp_tables[i])) + break; + + dev_pm_opp_put_opp_table(required_opp_tables[i]); + } + + kfree(required_opp_tables); + + opp_table->required_opp_count = 0; + opp_table->required_opp_tables = NULL; +} + +/* + * Populate all devices and opp tables which are part of "required-opps" list. + * Checking only the first OPP node should be enough. 
+ */ +static void _opp_table_alloc_required_tables(struct opp_table *opp_table, +struct device *dev, +struct device_node *opp_np) +{ + struct opp_table **required_opp_tables; + struct device_node *required_np, *np; + int count, i; + + /* Traversing the first OPP node is all we need */ + np = of_get_next_available_child(opp_np, NULL); + if (!np) { + dev_err(dev, "Empty OPP table\n"); + return; + } + + count = of_count_phandle_with_args(np, "required-opps", NULL); + if (!count) + goto put_np; + + required_opp_tables = kcalloc(count, sizeof(*required_opp_tables), + GFP_KERNEL); + if (!required_opp_tables) + goto put_np; + + opp_table->required_opp_tables = required_opp_tables; + opp_table->required_opp_count = count; + + for (i = 0; i < count; i++) { + required_np = of_parse_required_opp(np, i); + if
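The core of _find_table_of_opp_np() above is a scan of every known OPP table for the one containing a given DT node. The search shape can be sketched standalone (toy arrays in place of the kernel's linked lists and locking; names are invented):

```c
#include <assert.h>
#include <stddef.h>

/* Toy OPP table: the DT nodes of the OPPs it holds. */
struct toy_table {
	const void **opp_nodes;
	int nr;
};

/*
 * Mirrors _find_table_of_opp_np(): walk all tables and return the one
 * whose OPP list contains opp_np. Returns the table index, or -1 where
 * the kernel returns ERR_PTR(-ENODEV). The kernel walks the global
 * opp_tables list under opp_table_lock and takes a kref on the match.
 */
int toy_find_table_of_np(const struct toy_table *tables, int nr_tables,
			 const void *opp_np)
{
	int t, i;

	for (t = 0; t < nr_tables; t++)
		for (i = 0; i < tables[t].nr; i++)
			if (tables[t].opp_nodes[i] == opp_np)
				return t;

	return -1;
}
```

Because every phandle in "required-opps" points into some OPP table, resolving each phandle this way is what lets the code cache one required_opp_tables[] pointer per phandle.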
[PATCH V3 03/10] OPP: Separate out custom OPP handler specific code
Create a separate routine to take care of custom set_opp() handler specific stuff. Reviewed-by: Ulf Hansson Signed-off-by: Viresh Kumar --- drivers/opp/core.c | 67 +++--- 1 file changed, 40 insertions(+), 27 deletions(-) diff --git a/drivers/opp/core.c b/drivers/opp/core.c index 2c2df4e4fc14..ebb3b648e0fd 100644 --- a/drivers/opp/core.c +++ b/drivers/opp/core.c @@ -635,6 +635,34 @@ static int _generic_set_opp_regulator(const struct opp_table *opp_table, return ret; } +static int _set_opp_custom(const struct opp_table *opp_table, + struct device *dev, unsigned long old_freq, + unsigned long freq, + struct dev_pm_opp_supply *old_supply, + struct dev_pm_opp_supply *new_supply) +{ + struct dev_pm_set_opp_data *data; + int size; + + data = opp_table->set_opp_data; + data->regulators = opp_table->regulators; + data->regulator_count = opp_table->regulator_count; + data->clk = opp_table->clk; + data->dev = dev; + + data->old_opp.rate = old_freq; + size = sizeof(*old_supply) * opp_table->regulator_count; + if (IS_ERR(old_supply)) + memset(data->old_opp.supplies, 0, size); + else + memcpy(data->old_opp.supplies, old_supply, size); + + data->new_opp.rate = freq; + memcpy(data->new_opp.supplies, new_supply, size); + + return opp_table->set_opp(data); +} + /** * dev_pm_opp_set_rate() - Configure new OPP based on frequency * @dev:device for which we do this operation @@ -649,7 +677,7 @@ int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq) unsigned long freq, old_freq; struct dev_pm_opp *old_opp, *opp; struct clk *clk; - int ret, size; + int ret; if (unlikely(!target_freq)) { dev_err(dev, "%s: Invalid target frequency %lu\n", __func__, @@ -702,8 +730,17 @@ int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq) dev_dbg(dev, "%s: switching OPP: %lu Hz --> %lu Hz\n", __func__, old_freq, freq); - /* Only frequency scaling */ - if (!opp_table->regulators) { + if (opp_table->set_opp) { + ret = _set_opp_custom(opp_table, dev, old_freq, freq, + 
IS_ERR(old_opp) ? NULL : old_opp->supplies, + opp->supplies); + } else if (opp_table->regulators) { + ret = _generic_set_opp_regulator(opp_table, dev, old_freq, freq, +IS_ERR(old_opp) ? NULL : old_opp->supplies, +opp->supplies); + } else { + /* Only frequency scaling */ + /* * We don't support devices with both regulator and * domain performance-state for now. @@ -714,30 +751,6 @@ int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq) opp->pstate); else ret = _generic_set_opp_clk_only(dev, clk, old_freq, freq); - } else if (!opp_table->set_opp) { - ret = _generic_set_opp_regulator(opp_table, dev, old_freq, freq, -IS_ERR(old_opp) ? NULL : old_opp->supplies, -opp->supplies); - } else { - struct dev_pm_set_opp_data *data; - - data = opp_table->set_opp_data; - data->regulators = opp_table->regulators; - data->regulator_count = opp_table->regulator_count; - data->clk = clk; - data->dev = dev; - - data->old_opp.rate = old_freq; - size = sizeof(*opp->supplies) * opp_table->regulator_count; - if (IS_ERR(old_opp)) - memset(data->old_opp.supplies, 0, size); - else - memcpy(data->old_opp.supplies, old_opp->supplies, size); - - data->new_opp.rate = freq; - memcpy(data->new_opp.supplies, opp->supplies, size); - - ret = opp_table->set_opp(data); } dev_pm_opp_put(opp); -- 2.19.1.568.g152ad8e3369a
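The detail worth noting in _set_opp_custom() is how the old supplies are filled: zeroed when there is no valid old OPP (the IS_ERR case on the first transition) and copied otherwise, while the new supplies are always copied. A standalone sketch of that rule, with toy types in place of dev_pm_opp_supply and the NULL-means-no-old-OPP convention the caller uses:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for struct dev_pm_opp_supply. */
struct toy_supply { unsigned long u_volt; };

/*
 * Mirrors the supply handling in _set_opp_custom(): old_src == NULL
 * plays the role of IS_ERR(old_opp), meaning the previous state is
 * unknown and the old supplies must read as all-zero.
 */
void toy_fill_opp_data(struct toy_supply *old_dst, struct toy_supply *new_dst,
		       const struct toy_supply *old_src,
		       const struct toy_supply *new_src, int count)
{
	size_t size = sizeof(*old_dst) * count;

	if (!old_src)
		memset(old_dst, 0, size);	/* no old OPP yet */
	else
		memcpy(old_dst, old_src, size);

	memcpy(new_dst, new_src, size);
}
```

Keeping this in one helper is what lets dev_pm_opp_set_rate() shrink to a three-way dispatch between custom handler, regulator path, and clock-only path.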
[PATCH V3 03/10] OPP: Separate out custom OPP handler specific code
Create a separate routine to take care of custom set_opp() handler specific stuff.

Reviewed-by: Ulf Hansson
Signed-off-by: Viresh Kumar
---
 drivers/opp/core.c | 67 +++---
 1 file changed, 40 insertions(+), 27 deletions(-)

diff --git a/drivers/opp/core.c b/drivers/opp/core.c
index 2c2df4e4fc14..ebb3b648e0fd 100644
--- a/drivers/opp/core.c
+++ b/drivers/opp/core.c
@@ -635,6 +635,34 @@ static int _generic_set_opp_regulator(const struct opp_table *opp_table,
 	return ret;
 }
 
+static int _set_opp_custom(const struct opp_table *opp_table,
+			   struct device *dev, unsigned long old_freq,
+			   unsigned long freq,
+			   struct dev_pm_opp_supply *old_supply,
+			   struct dev_pm_opp_supply *new_supply)
+{
+	struct dev_pm_set_opp_data *data;
+	int size;
+
+	data = opp_table->set_opp_data;
+	data->regulators = opp_table->regulators;
+	data->regulator_count = opp_table->regulator_count;
+	data->clk = opp_table->clk;
+	data->dev = dev;
+
+	data->old_opp.rate = old_freq;
+	size = sizeof(*old_supply) * opp_table->regulator_count;
+	if (IS_ERR(old_supply))
+		memset(data->old_opp.supplies, 0, size);
+	else
+		memcpy(data->old_opp.supplies, old_supply, size);
+
+	data->new_opp.rate = freq;
+	memcpy(data->new_opp.supplies, new_supply, size);
+
+	return opp_table->set_opp(data);
+}
+
 /**
  * dev_pm_opp_set_rate() - Configure new OPP based on frequency
  * @dev:	device for which we do this operation
@@ -649,7 +677,7 @@ int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq)
 	unsigned long freq, old_freq;
 	struct dev_pm_opp *old_opp, *opp;
 	struct clk *clk;
-	int ret, size;
+	int ret;
 
 	if (unlikely(!target_freq)) {
 		dev_err(dev, "%s: Invalid target frequency %lu\n", __func__,
@@ -702,8 +730,17 @@ int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq)
 	dev_dbg(dev, "%s: switching OPP: %lu Hz --> %lu Hz\n", __func__,
 		old_freq, freq);
 
-	/* Only frequency scaling */
-	if (!opp_table->regulators) {
+	if (opp_table->set_opp) {
+		ret = _set_opp_custom(opp_table, dev, old_freq, freq,
+				      IS_ERR(old_opp) ? NULL : old_opp->supplies,
+				      opp->supplies);
+	} else if (opp_table->regulators) {
+		ret = _generic_set_opp_regulator(opp_table, dev, old_freq, freq,
+						 IS_ERR(old_opp) ? NULL : old_opp->supplies,
+						 opp->supplies);
+	} else {
+		/* Only frequency scaling */
+
 		/*
 		 * We don't support devices with both regulator and
 		 * domain performance-state for now.
@@ -714,30 +751,6 @@ int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq)
 			opp->pstate);
 	else
 		ret = _generic_set_opp_clk_only(dev, clk, old_freq, freq);
-	} else if (!opp_table->set_opp) {
-		ret = _generic_set_opp_regulator(opp_table, dev, old_freq, freq,
-						 IS_ERR(old_opp) ? NULL : old_opp->supplies,
-						 opp->supplies);
-	} else {
-		struct dev_pm_set_opp_data *data;
-
-		data = opp_table->set_opp_data;
-		data->regulators = opp_table->regulators;
-		data->regulator_count = opp_table->regulator_count;
-		data->clk = clk;
-		data->dev = dev;
-
-		data->old_opp.rate = old_freq;
-		size = sizeof(*opp->supplies) * opp_table->regulator_count;
-		if (IS_ERR(old_opp))
-			memset(data->old_opp.supplies, 0, size);
-		else
-			memcpy(data->old_opp.supplies, old_opp->supplies, size);
-
-		data->new_opp.rate = freq;
-		memcpy(data->new_opp.supplies, opp->supplies, size);
-
-		ret = opp_table->set_opp(data);
 	}
 
 	dev_pm_opp_put(opp);
-- 
2.19.1.568.g152ad8e3369a
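The refactor above reduces dev_pm_opp_set_rate() to a three-way dispatch: a platform-provided set_opp() handler if one is registered, otherwise the generic regulator path, otherwise clock-only scaling. Below is a minimal userspace sketch of that dispatch shape; the stub names and return codes are illustrative stand-ins, not the kernel's actual functions or signatures.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins for the three OPP configuration paths; the real
 * kernel helpers take an opp_table, device, frequencies and supplies. */
typedef int (*set_opp_fn)(unsigned long old_freq, unsigned long new_freq);

struct opp_table {
	set_opp_fn set_opp;	/* optional custom platform handler */
	int regulator_count;	/* > 0 when supplies must be scaled too */
};

static int set_opp_regulator(unsigned long o, unsigned long n)
{
	(void)o; (void)n;
	return 2;	/* marker: regulator-aware path taken */
}

static int set_opp_clk_only(unsigned long o, unsigned long n)
{
	(void)o; (void)n;
	return 3;	/* marker: clock-only path taken */
}

/* Mirrors the dispatch order the patch introduces: custom handler first,
 * then regulator-aware scaling, then plain clock scaling. */
static int set_rate(struct opp_table *t, unsigned long old_freq,
		    unsigned long new_freq)
{
	if (t->set_opp)
		return t->set_opp(old_freq, new_freq);
	else if (t->regulator_count)
		return set_opp_regulator(old_freq, new_freq);
	return set_opp_clk_only(old_freq, new_freq);
}
```

The point of the restructuring is that the custom-handler bookkeeping (the dev_pm_set_opp_data fill-in) now lives in one routine instead of being inlined in the dispatch.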
[PATCH V3 01/10] PM / Domains: Rename genpd virtual devices as virt_dev
There are several struct device instances that the genpd core handles. The most common one is the consumer device structure, which is named (correctly) as "dev" within the genpd core. The second one is the genpd's device structure, referenced as genpd->dev. The third one is the virtual device structures created by the genpd core to represent the consumer device for the multiple power domain case, currently named genpd_dev. The naming of these virtual devices isn't very clear or readable and looks more like the genpd->dev. Rename the virtual device instances within the genpd core as "virt_dev".

Signed-off-by: Viresh Kumar
---
 drivers/base/power/domain.c | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
index 7f38a92b444a..fe9b0527b161 100644
--- a/drivers/base/power/domain.c
+++ b/drivers/base/power/domain.c
@@ -2338,7 +2338,7 @@ EXPORT_SYMBOL_GPL(genpd_dev_pm_attach);
 struct device *genpd_dev_pm_attach_by_id(struct device *dev,
					 unsigned int index)
 {
-	struct device *genpd_dev;
+	struct device *virt_dev;
 	int num_domains;
 	int ret;
 
@@ -2352,31 +2352,31 @@ struct device *genpd_dev_pm_attach_by_id(struct device *dev,
 		return NULL;
 
 	/* Allocate and register device on the genpd bus. */
-	genpd_dev = kzalloc(sizeof(*genpd_dev), GFP_KERNEL);
-	if (!genpd_dev)
+	virt_dev = kzalloc(sizeof(*virt_dev), GFP_KERNEL);
+	if (!virt_dev)
 		return ERR_PTR(-ENOMEM);
 
-	dev_set_name(genpd_dev, "genpd:%u:%s", index, dev_name(dev));
-	genpd_dev->bus = &genpd_bus_type;
-	genpd_dev->release = genpd_release_dev;
+	dev_set_name(virt_dev, "genpd:%u:%s", index, dev_name(dev));
+	virt_dev->bus = &genpd_bus_type;
+	virt_dev->release = genpd_release_dev;
 
-	ret = device_register(genpd_dev);
+	ret = device_register(virt_dev);
 	if (ret) {
-		kfree(genpd_dev);
+		kfree(virt_dev);
 		return ERR_PTR(ret);
 	}
 
 	/* Try to attach the device to the PM domain at the specified index. */
-	ret = __genpd_dev_pm_attach(genpd_dev, dev->of_node, index, false);
+	ret = __genpd_dev_pm_attach(virt_dev, dev->of_node, index, false);
 	if (ret < 1) {
-		device_unregister(genpd_dev);
+		device_unregister(virt_dev);
 		return ret ? ERR_PTR(ret) : NULL;
 	}
 
-	pm_runtime_enable(genpd_dev);
-	genpd_queue_power_off_work(dev_to_genpd(genpd_dev));
+	pm_runtime_enable(virt_dev);
+	genpd_queue_power_off_work(dev_to_genpd(virt_dev));
 
-	return genpd_dev;
+	return virt_dev;
 }
 EXPORT_SYMBOL_GPL(genpd_dev_pm_attach_by_id);
-- 
2.19.1.568.g152ad8e3369a
[PATCH V3 00/10] OPP: Support multiple power-domains per device
Hello,

This series improves the OPP core (and a bit of the genpd core as well) to support multiple phandles in the "required-opps" property, which are only used for multiple power-domains per device for now. We still don't propagate the changes to master domains for the sub-domains, but this patchset is an important stepping stone for that to happen.

Tested on Hikey960 after faking some power domains for the CPUs.

V2->V3:
- Gained an additional patch to properly name the virtual device (1/10).
- Renamed the virtual device as virt_dev instead of genpd_dev.
- A few helpers renamed accordingly as well.
- Use container_of() instead of traversing the list of genpds.
- Included tags received from Ulf.

V1->V2:
- Had a discussion at Linaro Connect with Ulf regarding the changes V1 made in the genpd core and his objections to them. Many changes were made based on his suggestions.
- The OPP core still needs the virtual device pointers to set the performance state for multiple domains, but the genpd core doesn't provide them automatically to the OPP core. One of the reasons behind this is to give more power to the consumer drivers, which may not want to enable all the genpds at once.
- The consumer drivers now need to call the dev_pm_opp_{set|put}_genpd_device() APIs in order to set/reset these virtual device pointers.
- More locking is put in place to protect the genpd device pointers in the OPP core.
- Reorganized the code in many places to make it less redundant.

--
viresh

Viresh Kumar (10):
  PM / Domains: Rename genpd virtual devices as virt_dev
  OPP: Identify and mark genpd OPP tables
  OPP: Separate out custom OPP handler specific code
  OPP: Populate required opp tables from "required-opps" property
  OPP: Populate OPPs from "required-opps" property
  PM / Domains: Add genpd_opp_to_performance_state()
  OPP: Add dev_pm_opp_{set|put}_genpd_virt_dev() helper
  OPP: Configure all required OPPs
  OPP: Rename and relocate of_genpd_opp_to_performance_state()
  OPP: Remove of_dev_pm_opp_find_required_opp()

 drivers/base/power/domain.c |  70
 drivers/opp/core.c          | 261 ++
 drivers/opp/of.c            | 313 +++-
 drivers/opp/opp.h           |  20 +++
 include/linux/pm_domain.h   |   8 +-
 include/linux/pm_opp.h      |  16 +-
 6 files changed, 526 insertions(+), 162 deletions(-)

--
2.19.1.568.g152ad8e3369a
[tip:x86/urgent] x86/cpufeatures: Enumerate MOVDIR64B instruction
Commit-ID:  ace6485a03266cc3c198ce8e927a1ce0ce139699
Gitweb:     https://git.kernel.org/tip/ace6485a03266cc3c198ce8e927a1ce0ce139699
Author:     Fenghua Yu
AuthorDate: Wed, 24 Oct 2018 14:57:17 -0700
Committer:  Ingo Molnar
CommitDate: Thu, 25 Oct 2018 07:42:48 +0200

x86/cpufeatures: Enumerate MOVDIR64B instruction

MOVDIR64B moves 64 bytes as a direct store with 64-byte write atomicity. Direct store is implemented by using write combining (WC) to write data directly into memory without caching it.

In low-latency offload (e.g. Non-Volatile Memory, etc.), MOVDIR64B writes work descriptors (and data in some cases) to device-hosted work queues atomically, without cache pollution.

Availability of the MOVDIR64B instruction is indicated by the presence of the CPUID feature flag MOVDIR64B (CPUID.0x07.0x0:ECX[bit 28]).

Please check the latest Intel Architecture Instruction Set Extensions and Future Features Programming Reference for more details on the CPUID feature MOVDIR64B flag.

Signed-off-by: Fenghua Yu
Cc: Andy Lutomirski
Cc: Ashok Raj
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Ravi V Shankar
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/1540418237-125817-3-git-send-email-fenghua...@intel.com
Signed-off-by: Ingo Molnar
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 90934ee7b79a..28c4a502b419 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -332,6 +332,7 @@
 #define X86_FEATURE_RDPID		(16*32+22) /* RDPID instruction */
 #define X86_FEATURE_CLDEMOTE		(16*32+25) /* CLDEMOTE instruction */
 #define X86_FEATURE_MOVDIRI		(16*32+27) /* MOVDIRI instruction */
+#define X86_FEATURE_MOVDIR64B		(16*32+28) /* MOVDIR64B instruction */
 
 /* AMD-defined CPU features, CPUID level 0x80000007 (EBX), word 17 */
 #define X86_FEATURE_OVERFLOW_RECOV	(17*32+ 0) /* MCA overflow recovery support */
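As the commit message states, the feature is reported in CPUID.(EAX=07H, ECX=0):ECX bit 28. A small, hedged sketch of testing that bit, given a raw ECX value obtained elsewhere (e.g. via a compiler cpuid intrinsic); the helper name is made up for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPUID.(EAX=07H, ECX=0):ECX[28] indicates MOVDIR64B, per the commit above. */
#define MOVDIR64B_ECX_BIT 28u

/* Returns true when the MOVDIR64B bit is set in the given ECX value. */
static bool has_movdir64b(uint32_t ecx)
{
	return (ecx >> MOVDIR64B_ECX_BIT) & 1u;
}
```

On a real system the ECX value would come from executing CPUID with EAX=7, ECX=0; the bit test itself is the same either way.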
[tip:x86/urgent] x86/cpufeatures: Enumerate MOVDIRI instruction
Commit-ID:  33823f4d63f7a010653d219800539409a78ef4be
Gitweb:     https://git.kernel.org/tip/33823f4d63f7a010653d219800539409a78ef4be
Author:     Fenghua Yu
AuthorDate: Wed, 24 Oct 2018 14:57:16 -0700
Committer:  Ingo Molnar
CommitDate: Thu, 25 Oct 2018 07:42:48 +0200

x86/cpufeatures: Enumerate MOVDIRI instruction

MOVDIRI moves a doubleword or quadword from a register to memory through a direct store, which is implemented by using write combining (WC) to write data directly into memory without caching it.

Programmable agents can handle streaming offload (e.g. high-speed packet processing in networking). Hardware implements a doorbell (tail pointer) register that is updated by software when adding new work elements to the streaming-offload work queue. MOVDIRI can be used for the doorbell write, which is a 4-byte or 8-byte uncachable write to MMIO. MOVDIRI has lower overhead than other ways to write the doorbell.

Availability of the MOVDIRI instruction is indicated by the presence of the CPUID feature flag MOVDIRI (CPUID.0x07.0x0:ECX[bit 27]).

Please check the latest Intel Architecture Instruction Set Extensions and Future Features Programming Reference for more details on the CPUID feature MOVDIRI flag.

Signed-off-by: Fenghua Yu
Cc: Andy Lutomirski
Cc: Ashok Raj
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Ravi V Shankar
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/1540418237-125817-2-git-send-email-fenghua...@intel.com
Signed-off-by: Ingo Molnar
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 89a048c2faec..90934ee7b79a 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -331,6 +331,7 @@
 #define X86_FEATURE_LA57		(16*32+16) /* 5-level page tables */
 #define X86_FEATURE_RDPID		(16*32+22) /* RDPID instruction */
 #define X86_FEATURE_CLDEMOTE		(16*32+25) /* CLDEMOTE instruction */
+#define X86_FEATURE_MOVDIRI		(16*32+27) /* MOVDIRI instruction */
 
 /* AMD-defined CPU features, CPUID level 0x80000007 (EBX), word 17 */
 #define X86_FEATURE_OVERFLOW_RECOV	(17*32+ 0) /* MCA overflow recovery support */
Re: [PATCH] MAINTAINERS: Add x86 early-quirks.c file pattern to PCI subsystem
* Bjorn Helgaas wrote:

> From: Bjorn Helgaas
>
> arch/x86/kernel/early-quirks.c contains special PCI quirks that need to
> run even before the usual DECLARE_PCI_FIXUP_EARLY() quirks. These have
> typically been merged by the x86 maintainers, which is fine, but PCI folks
> should at least see what's happening, so add a file pattern to the PCI
> subsystem entry.
>
> Signed-off-by: Bjorn Helgaas
> ---
>  MAINTAINERS | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 4ece30f15777..63cb7f3dbbb4 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -11249,6 +11249,7 @@ F:	include/uapi/linux/pci*
>  F:	lib/pci*
>  F:	arch/x86/pci/
>  F:	arch/x86/kernel/quirks.c
> +F:	arch/x86/kernel/early-quirks.c

Good idea!

Acked-by: Ingo Molnar

Thanks,

	Ingo
Re: [PATCH] mm: convert totalram_pages, totalhigh_pages and managed_pages to atomic.
On 2018-10-23 11:43, Julia Lawall wrote:
> On Tue, 23 Oct 2018, Michal Hocko wrote:
>
> > [Trimmed CC list + Julia - there is indeed no need to CC everybody
> > maintaining a file you are updating for a change like this]
> >
> > On Tue 23-10-18 10:16:51, Arun Sudhilal wrote:
> > > On Mon, Oct 22, 2018 at 11:41 PM Michal Hocko wrote:
> > > > On Mon 22-10-18 22:53:22, Arun KS wrote:
> > > > > Remove managed_page_count_lock spinlock and instead use atomic
> > > > > variables.
> > > >
> > > > I assume this has been auto-generated. If yes, it would be better to
> > > > mention the script so that people can review it and regenerate for
> > > > comparison. Such a large change is hard to review manually.
> > >
> > > Hello Michal,
> > >
> > > Changes were made partially with a script. For totalram_pages and
> > > totalhigh_pages:
> > >
> > >   find dir -type f -exec sed -i 's/totalram_pages/atomic_long_read(\&totalram_pages)/g' {} \;
> > >   find dir -type f -exec sed -i 's/totalhigh_pages/atomic_long_read(\&totalhigh_pages)/g' {} \;
> > >
> > > For managed_pages it was mostly manual edits after using:
> > >
> > >   find mm/ -type f -exec sed -i 's/zone->managed_pages/atomic_long_read(\&zone->managed_pages)/g' {} \;
> >
> > I guess we should be able to use coccinelle for this kind of change and
> > reduce the amount of manual intervention to an absolute minimum.
>
> Coccinelle looks like it would be desirable, especially in case the word
> zone is not always used. Arun, please feel free to contact me if you want
> to try it and need help.

Hello Julia,

I was able to come up with a .cocci for replacing managed_pages:

@@
struct zone *z;
@@
(
z->managed_pages = ...
|
- z->managed_pages
+ atomic_long_read(&z->managed_pages)
)

@@
struct zone *z;
expression e1;
@@
- z->managed_pages = e1
+ atomic_long_set(&z->managed_pages, e1)

@@
expression e, e1;
@@
- e->managed_pages += e1
+ atomic_long_add(e1, &e->managed_pages)

@@
expression z;
@@
- z.managed_pages
+ atomic_long_read(&z.managed_pages)

But I'm not able to use the same method for unsigned long variables
(totalram_pages):

@@
unsigned long totalram_pages;
@@
(
totalram_pages = ...
|
- totalram_pages
+ atomic_long_read(&totalram_pages)
)

This throws an error:

spatch test1.cocci mm/page_alloc.c
init_defs_builtins: /usr/lib/coccinelle/standard.h
HANDLING: mm/page_alloc.c
previous modification:
MINUS
>>> atomic_long_read( starting on line 1:totalram_pages)
According to environment 1:
  rule starting on line 1.totalram_pages -> page_idx ^ (1 << order)
current modification:
MINUS
>>> atomic_long_read( starting on line 1:totalram_pages)
According to environment 1:
  rule starting on line 1.totalram_pages -> page_idx
Fatal error: exception Failure("rule starting on line 1: already tagged token:
C code context
File "mm/internal.h", line 175, column 8, charpos = 5368
around = 'page_idx', whole content = return page_idx ^ (1 << order);")

Regards,
Arun

> julia
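A note on the failure above: declaring `unsigned long totalram_pages;` in the metavariable header makes `totalram_pages` a *metavariable* that matches any unsigned long expression (which is why spatch binds it to `page_idx`), rather than the specific global variable. One plausible fix, shown here as an untested sketch, is to bind the name as a literal symbol instead of a metavariable:

```
@@
symbol totalram_pages;
@@
(
totalram_pages = ...
|
- totalram_pages
+ atomic_long_read(&totalram_pages)
)
```

With `symbol`, the rule matches only the identifier `totalram_pages` itself, and the first disjunction branch still skips plain assignments so writes can be converted to atomic_long_set() by a separate rule.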
Linux kernel crash
A week ago I upgraded to the latest Ubuntu distribution (Linux thunderbird 4.18.0-10-generic #11-Ubuntu SMP Thu Oct 11 15:13:55 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux). Whenever I run "shutdown -h now" or "reboot", I receive an immediate kernel crash with a dump that contains: "Code: Bad RIP value". What is the best way to collect this information and send it upstream? I don't mind learning to build and install the kernel if that will help debug this.

sps
Re: [PATCH 1/2] mm/zsmalloc.c: check encoded object value overflow for PAE
On (10/24/18 22:27), Rafael David Tinoco wrote:
>  static unsigned long location_to_obj(struct page *page, unsigned int obj_idx)
>  {
> -	unsigned long obj;
> +	unsigned long obj, pfn;
> +
> +	pfn = page_to_pfn(page);
> +
> +	if (unlikely(OBJ_OVERFLOW(pfn)))
> +		BUG();

The trend these days is to have fewer BUG/BUG_ON()s in the kernel.

	-ss
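For context on the check under review: zsmalloc packs a (pfn, obj_idx) pair into a single unsigned long, roughly obj = (pfn << OBJ_INDEX_BITS) | obj_idx, so a PFN wider than the bits left over after the index shift cannot be encoded unambiguously. A hedged sketch of such an overflow predicate follows; the OBJ_INDEX_BITS value here is a made-up stand-in, not the kernel's actual (size-class-dependent) value:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative constants, not the kernel's. */
#define BITS_PER_LONG	(8 * sizeof(unsigned long))
#define OBJ_INDEX_BITS	12UL

/* True when pfn has bits that would be shifted out of the encoded
 * object value, i.e. (pfn << OBJ_INDEX_BITS) would lose information. */
static bool obj_overflow(unsigned long pfn)
{
	return pfn >= (1UL << (BITS_PER_LONG - OBJ_INDEX_BITS));
}
```

The review comment above is about what to do when the predicate fires: a WARN-and-fail path is generally preferred over BUG(), which kills the machine.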
Re: [GIT PULL] Qualcomm ARM64 DT updates for 4.20
On Fri, Oct 5, 2018 at 7:59 PM Andy Gross wrote:
>
> On Tue, Oct 02, 2018 at 11:30:29AM +0200, Arnd Bergmann wrote:
> > On Sun, Sep 30, 2018 at 8:38 PM Andy Gross wrote:
> > >
> > > The following changes since commit 5b394b2ddf0347bef56e50c69a58773c94343ff3:
> > >
> > >   Linux 4.19-rc1 (2018-08-26 14:11:59 -0700)
> > >
> > > are available in the git repository at:
> > >
> > >   git://git.kernel.org/pub/scm/linux/kernel/git/agross/linux.git tags/qcom-arm64-for-4.20
> > >
> > > for you to fetch changes up to 6db0483cf622fc690e32e0804f8b86061f13962a:
> > >
> > >   Revert "dt-bindings: thermal: qcom-spmi-temp-alarm: Fix documentation of 'reg'" (2018-09-30 12:59:52 -0500)
> > >
> > > Qualcomm ARM64 Updates for v4.20
> > >
> > > * Update Coresight for MSM8916
> > > * Switch to use mailbox for smp2p and smd on MSM8996
> > > * Add dispcc, dsp, USB, regulator, and other nodes for SDM845
> > > * Drop model/compatible from MSM8916 and MSM8996
> > > * Add compat for db820c
> > > * Add MSM8998 SoC and board support along with associated nodes
> > > * Add RESIN/PON for Qualcomm PM8916 and PM8994
> >
> > Pulled into next/dt, thanks.
> >
> > There is a small chance that the two reverts end up causing more
> > problems than they help: they might conflict with the patches merged
> > into the iio branch, or they could end up being applied on top of
> > the iio branch and revert the changes during the release. I think
> > it will be fine, but it's obviously better to avoid this.
>
> Agreed, I'll check in on this once -rc1 hits to make sure they weren't
> removed.

Andy,

While you acked the tsens thermal DT patches[1] so they could go through the thermal tree[2] to avoid a dependency, Eduardo would prefer the DT changes for tsens to go through the arch tree. To avoid a scenario where these DT changes don't get merged until the next merge window, would you or the arm-soc maintainers consider merging them in -rc2 (after the thermal-soc tree gets merged)?

For convenience, I've provided just the DT changes with all tags in a separate branch on top of v4.19 here:
https://git.linaro.org/people/amit.kucheria/kernel.git/log/?h=up/thermal/tsens-preirq-cleanup-v4

Regards,
Amit

[1] Here is a reference to the patch series with Andy's acks:
https://lore.kernel.org/lkml/cover.1536744310.git.amit.kuche...@linaro.org/T/
[2] Here is a link to Eduardo's tree containing a subset of the above series (I don't yet see a pull request):
https://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal.git/log/?h=linus
Re: [PATCH anybus v1 2/4] dt-bindings: anybus-bridge: document devicetree binding.
On Wed, 24 Oct 2018, Sven Van Asbroeck wrote: > This patch adds devicetree binding documentation for the > Arcx anybus bridge. > > Signed-off-by: Sven Van Asbroeck > --- > .../bindings/mfd/arcx,anybus-bridge.txt | 37 +++ > .../devicetree/bindings/vendor-prefixes.txt | 1 + > 2 files changed, 38 insertions(+) > create mode 100644 > Documentation/devicetree/bindings/mfd/arcx,anybus-bridge.txt > > diff --git a/Documentation/devicetree/bindings/mfd/arcx,anybus-bridge.txt > b/Documentation/devicetree/bindings/mfd/arcx,anybus-bridge.txt > new file mode 100644 > index ..3c0399c4ed45 > --- /dev/null > +++ b/Documentation/devicetree/bindings/mfd/arcx,anybus-bridge.txt > @@ -0,0 +1,37 @@ > +* Arcx anybus bridge > + > +This chip communicates with the SoC over the WEIM bus. It is > +expected that its Device Tree node is specified as the child of a node > +corresponding to the WEIM bus used for communication. > + > +Required properties: > + > + - compatible : The following chip-specific string: > +"arcx,anybus-bridge" > + > + - reg : > + weim memory area where the cpld registers are located, followed by: > + weim memory area of the first anybus-s slot, followed by: > + weim memory area of the second anybus-s slot. > + > + - enable-gpios : the gpio connected to the bridge's 'enable gpio'. > + > + - pwms : the pwm connected to the bridge's 'pwm input'. > + > + - irq-gpios : the gpios connected to the bridge's interrupt lines. > + note that there is no need to provide the 'interrupts' property here. > + > +Example of usage: > + > + { > + bridge@0,0 { I haven't seen this syntax before. It doesn't mean it's wrong, but it will need Rob et al. to cast an eye. > + compatible = "arcx,anybus-bridge"; > + reg = <0 0 0x100>, <0 0x40 0x800>, <1 0x40 0x800>; > + fsl,weim-cs-timing = <0x024400b1 0x1010 0x20081100 > + 0x 0xa240 0x>; This needs to be documented. > + enable-gpios = < 2 GPIO_ACTIVE_HIGH>; > + pwms = < 0 571>; > + irq-gpios = < 1 GPIO_ACTIVE_HIGH>, > + < 5 GPIO_ACTIVE_HIGH>; Tabbing. 
> + }; > +}; > diff --git a/Documentation/devicetree/bindings/vendor-prefixes.txt > b/Documentation/devicetree/bindings/vendor-prefixes.txt > index 2c3fc512e746..1bf07b20a8af 100644 > --- a/Documentation/devicetree/bindings/vendor-prefixes.txt > +++ b/Documentation/devicetree/bindings/vendor-prefixes.txt > @@ -35,6 +35,7 @@ aptina Aptina Imaging > arasan Arasan Chip Systems > archermind ArcherMind Technology (Nanjing) Co., Ltd. > arctic Arctic Sand > +arcx Arcx/Archronix Inc. > ariesAries Embedded GmbH > arm ARM Ltd. > armadeus ARMadeus Systems SARL -- Lee Jones [李琼斯] Linaro Services Technical Lead Linaro.org │ Open source software for ARM SoCs Follow Linaro: Facebook | Twitter | Blog
Re: [PATCH 2/2] mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask
On 10/25/18 1:17 AM, Andrew Morton wrote: > On Mon, 22 Oct 2018 15:27:54 +0200 Vlastimil Babka wrote: > >>> : Moreover the original code allowed to trigger >>> : WARN_ON_ONCE(policy->mode == MPOL_BIND && (gfp & __GFP_THISNODE)); >>> : in policy_node if the requested node (e.g. cpu local one) was outside of >>> : the mbind nodemask. This is not possible now. We haven't heard about any >>> : such warning yet so it is unlikely that it happens but still a signal of >>> : a wrong code layering. >> >> Ah, as I said in the other mail, I think it's inaccurate, the warning >> was not possible to hit. >> >> There's also a slight difference wrt MPOL_BIND. The previous code would >> avoid using __GFP_THISNODE if the local node was outside of >> policy_nodemask(). After your patch __GFP_THISNODE is avoided for all >> MPOL_BIND policies. So there's a difference that if local node is >> actually allowed by the bind policy's nodemask, previously >> __GFP_THISNODE would be added, but now it won't be. I don't think it >> matters that much though, but maybe the changelog could say that >> (instead of the inaccurate note about warning). Note the other policy >> where nodemask is relevant is MPOL_INTERLEAVE, and that's unchanged by >> this patch. > > So the above could go into the changelog, yes? Yeah. >> When that's addressed, you can add > > What is it that you'd like to see addressed? Purely changelog updates? Right. >> Acked-by: Vlastimil Babka > > Thanks. >
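The MPOL_BIND difference Vlastimil points out can be reduced to a pair of predicates. This is a deliberately simplified model of when __GFP_THISNODE gets added for a THP allocation — the real decision in alloc_hugepage_direct_gfpmask()/policy_node() depends on more inputs (defrag mode, direct reclaim), so the names and signatures below are illustrative only:

```c
#include <assert.h>
#include <stdbool.h>

enum mpol_mode { MPOL_DEFAULT, MPOL_BIND, MPOL_INTERLEAVE };

/* Before the patch: __GFP_THISNODE was only dropped when the local
 * node fell outside the bind policy's nodemask. */
static bool adds_thisnode_before(enum mpol_mode mode, bool local_node_in_mask)
{
	return !(mode == MPOL_BIND && !local_node_in_mask);
}

/* After the patch: __GFP_THISNODE is avoided for all MPOL_BIND
 * policies, whether or not the local node is allowed by the mask. */
static bool adds_thisnode_after(enum mpol_mode mode, bool local_node_in_mask)
{
	(void)local_node_in_mask;
	return mode != MPOL_BIND;
}
```

The only input where the two disagree is MPOL_BIND with the local node inside the nodemask — exactly the case the review suggests spelling out in the changelog.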
Re: in_compat_syscall() returns from kernel thread for X86_32.
> On Oct 24, 2018, at 8:46 PM, NeilBrown wrote: > >> On Wed, Oct 24 2018, Theodore Y. Ts'o wrote: >> >>> On Wed, Oct 24, 2018 at 12:47:57PM +1100, NeilBrown wrote: >>> >>> I doubt it was copied - more likely independent evolution. >>> But on reflection, I see that it is probably reasonable that it >>> shouldn't be used this way - or at all in this context. >>> I'll try to understand what the issues really are and see if I can >>> find a solution that doesn't depend on this interface. >>> Thanks for your help. >> >> At least for ext4, the primary problem is that we want to use a 64-bit >> telldir/seekdir cookie if all 64 bits will make it to user space, and >> a 32-bit telldir cookie if only 32 bits will make it to userspace. This >> impacts NFS as well because if there are people who are still using >> NFSv2, which has 32-bit directory offsets, we need to constrain the >> telldir/seekdir cookies we give to NFS to be a 32-bit hash as opposed to a >> 64-bit hash. > > NFSd uses FMODE_32BITHASH or FMODE_64BITHASH to allow ext4 to do the > right thing. FMODE_32BITHASH is set for NFSv2 only. > > Maybe sys_getdents needs to set FMODE_32BITHASH, and sys_getdents64 needs > to set FMODE_64BITHASH - or something like that. It’s possible for a 32-bit process and a 64-bit process to share a directory fd, so I don’t think it’s quite that simple. One option would be to add .llseek and .getdents flags or entire new compat operations to indicate that the caller expects 32-bit offsets. I wonder how overlayfs interacts with this whole mess.
Re: in_compat_syscall() returns from kernel thread for X86_32.
On Wed, Oct 24 2018, Theodore Y. Ts'o wrote: > On Wed, Oct 24, 2018 at 12:47:57PM +1100, NeilBrown wrote: >> >> I doubt it was copied - more likely independent evolution. >> But on reflection, I see that it is probably reasonable that it >> shouldn't be used this way - or at all in this context. >> I'll try to understand what the issues really are and see if I can >> find a solution that doesn't depend on this interface. >> Thanks for your help. > > At least for ext4, the primary problem is that we want to use a 64-bit > telldir/seekdir cookie if all 64 bits will make it to user space, and > a 32-bit telldir cookie if only 32 bits will make it to userspace. This > impacts NFS as well because if there are people who are still using > NFSv2, which has 32-bit directory offsets, we need to constrain the > telldir/seekdir cookies we give to NFS to be a 32-bit hash as opposed to a > 64-bit hash. NFSd uses FMODE_32BITHASH or FMODE_64BITHASH to allow ext4 to do the right thing. FMODE_32BITHASH is set for NFSv2 only. Maybe sys_getdents needs to set FMODE_32BITHASH, and sys_getdents64 needs to set FMODE_64BITHASH - or something like that. For lustre it is a bit more complex. The internal "inode number" is 128 bits and we (sort of) hash it to 32 or 64 bits. cp_compat_stat() just says -EOVERFLOW if we give a 64 bit number when 32 are expected, and there is no flag to say "this is a 32-bit 'stat' request". But I need to dig into exactly what that "sort-of" means - maybe there is an answer in there. NeilBrown
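The FMODE_32BITHASH mechanism NFSd relies on can be sketched as a small helper showing how a filesystem like ext4 might pick the cookie width. The flag values and helper name below are illustrative stand-ins, not the real fs.h definitions:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-ins for the FMODE_* bits in linux/fs.h; what
 * matters here is the selection logic, not the constant values. */
#define FMODE_32BITHASH 0x200u
#define FMODE_64BITHASH 0x400u

/* Hand out a telldir/seekdir cookie no wider than the caller can pass
 * back: with FMODE_32BITHASH set (NFSv2), only 32 bits survive the
 * round trip, so the filesystem truncates its 64-bit hash. */
static uint64_t telldir_cookie(unsigned int f_mode, uint64_t hash64)
{
	if (f_mode & FMODE_32BITHASH)
		return hash64 & 0xffffffffull;
	return hash64;
}
```

The open question in the thread is who sets the flag when the caller is a 32-bit getdents user rather than NFSd — the helper itself stays the same either way.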
[PATCH v4 1/4] nds32: Fix bug in bitfield.h
From: Nickhu There are two bitfield bugs for the performance counters in bitfield.h: PFM_CTL_offSEL1 21 --> 16 PFM_CTL_offSEL2 27 --> 22 This commit fixes them. Signed-off-by: Nickhu --- arch/nds32/include/asm/bitfield.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/nds32/include/asm/bitfield.h b/arch/nds32/include/asm/bitfield.h index 8e84fc385b94..19b2841219ad 100644 --- a/arch/nds32/include/asm/bitfield.h +++ b/arch/nds32/include/asm/bitfield.h @@ -692,8 +692,8 @@ #define PFM_CTL_offKU1 13 /* Enable user mode event counting for PFMC1 */ #define PFM_CTL_offKU2 14 /* Enable user mode event counting for PFMC2 */ #define PFM_CTL_offSEL0 15 /* The event selection for PFMC0 */ -#define PFM_CTL_offSEL1 21 /* The event selection for PFMC1 */ -#define PFM_CTL_offSEL2 27 /* The event selection for PFMC2 */ +#define PFM_CTL_offSEL1 16 /* The event selection for PFMC1 */ +#define PFM_CTL_offSEL2 22 /* The event selection for PFMC2 */ /* bit 28:31 reserved */ #define PFM_CTL_mskEN0 ( 0x01 << PFM_CTL_offEN0 ) -- 2.17.0
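The corrected offsets can be sanity-checked arithmetically. In this sketch the selector fields are assumed to be 6 bits wide — inferred from the "bit 28:31 reserved" comment in the hunk, not quoted from the SPAv3 manual — which shows that SEL1=16/SEL2=22 keep both fields below bit 28, while the old SEL2=27 would have spilled into the reserved bits:

```c
#include <assert.h>
#include <stdint.h>

/* Corrected selector offsets from the patch.  Two assumed-6-bit
 * selector fields at 16 and 22 fill bits 16-27 exactly, matching the
 * "bit 28:31 reserved" comment in bitfield.h. */
#define PFM_CTL_offSEL1	16
#define PFM_CTL_offSEL2	22
#define PFM_CTL_mskSEL1	(0x3fu << PFM_CTL_offSEL1)
#define PFM_CTL_mskSEL2	(0x3fu << PFM_CTL_offSEL2)

/* Pack event selections for PFMC1/PFMC2 into a PFM_CTL image. */
static uint32_t pfm_ctl_pack(uint32_t ctl, uint32_t ev1, uint32_t ev2)
{
	ctl &= ~(PFM_CTL_mskSEL1 | PFM_CTL_mskSEL2);
	ctl |= (ev1 << PFM_CTL_offSEL1) & PFM_CTL_mskSEL1;
	ctl |= (ev2 << PFM_CTL_offSEL2) & PFM_CTL_mskSEL2;
	return ctl;
}
```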
[PATCH v4 4/4] nds32: Add document for NDS32 PMU.
From: Nickhu The document for how to add NDS32 PMU in devicetree. Signed-off-by: Nickhu Reviewed-by: Rob Herring --- .../devicetree/bindings/perf/nds32v3-pmu.txt| 17 + 1 file changed, 17 insertions(+) create mode 100644 Documentation/devicetree/bindings/perf/nds32v3-pmu.txt diff --git a/Documentation/devicetree/bindings/perf/nds32v3-pmu.txt b/Documentation/devicetree/bindings/perf/nds32v3-pmu.txt new file mode 100644 index ..1bd15785b4ae --- /dev/null +++ b/Documentation/devicetree/bindings/perf/nds32v3-pmu.txt @@ -0,0 +1,17 @@ +* NDS32 Performance Monitor Units + +NDS32 core have a PMU for counting cpu and cache events like cache misses. +The NDS32 PMU representation in the device tree should be done as under: + +Required properties: + +- compatible : + "andestech,nds32v3-pmu" + +- interrupts : The interrupt number for NDS32 PMU is 13. + +Example: +pmu{ + compatible = "andestech,nds32v3-pmu"; + interrupts = <13>; +} -- 2.17.0
[PATCH v4 2/4] nds32: Perf porting
From: Nickhu This commit ports perf to nds32. 1. Raw events: Raw events start with 'r'. Usage: perf stat -e rXYZ ./app X: the index of the performance counter. YZ: the index (converted to hexadecimal) of the event. Example: 'perf stat -e r101 ./app' means counter 1 will count the instruction event. The counter and event indices can be found in the "Andes System Privilege Architecture Version 3 Manual", or you can run 'perf list' to find the symbolic names of raw events. 2. Perf mmap2: Fix an unexpected perf mmap2() page fault. When mmap2() is called by the perf application, you may encounter "failed to write." with return value -EFAULT. This is due to a page fault caused by "reading" the buffer from the mapped legal address region in order to write it to the descriptor. The page-fault handler returns VM_FAULT_SIGBUS, which should not happen here (since this is a read request). See kernel/events/core.c:perf_mmap_fault(...): if "(vmf->pgoff && (vmf->flags & FAULT_FLAG_WRITE))" evaluates to true, VM_FAULT_SIGBUS is returned. However, this is not a write request; the flag indicating why the page fault happened is wrong. Furthermore, NDS32 SPAv3 cannot detect whether an access is a read or a write; it only knows whether it is an instruction fetch or a data access. Therefore, by removing the wrong flag assignment (the hardware cannot report the reason anyway), we can fix this bug. 3. Multiple perf events mapped to the same counter: When multiple events map to the same counter, the counter counts inaccurately. This is because each counter can only count one event at a time, so events that map to the same counter have to take turns in each context. There are two solutions: 1. Print an error message when multiple events map to the same counter. But printing the error message would make the program hang in a loop, and LTP (the Linux Test Project) would fail when the program hangs. 2. 
Don't print the error message; LTP passes. But the user needs to know not to count events that map to the same counter, or the results will be inaccurate. We choose method 2. Signed-off-by: Nickhu --- arch/nds32/Kconfig | 1 + arch/nds32/boot/dts/ae3xx.dts | 5 + arch/nds32/include/asm/Kbuild | 1 + arch/nds32/include/asm/perf_event.h | 16 + arch/nds32/include/asm/pmu.h | 386 ++ arch/nds32/include/asm/stacktrace.h | 39 + arch/nds32/kernel/Makefile | 3 +- arch/nds32/kernel/perf_event_cpu.c | 1223 + arch/nds32/mm/fault.c | 13 +- tools/include/asm/barrier.h | 2 + tools/perf/arch/nds32/Build | 1 + tools/perf/arch/nds32/util/Build | 1 + tools/perf/arch/nds32/util/header.c | 29 + tools/perf/pmu-events/arch/nds32/mapfile.csv | 15 + .../pmu-events/arch/nds32/n13/atcpmu.json | 290 15 files changed, 2019 insertions(+), 6 deletions(-) create mode 100644 arch/nds32/include/asm/perf_event.h create mode 100644 arch/nds32/include/asm/pmu.h create mode 100644 arch/nds32/include/asm/stacktrace.h create mode 100644 arch/nds32/kernel/perf_event_cpu.c create mode 100644 tools/perf/arch/nds32/Build create mode 100644 tools/perf/arch/nds32/util/Build create mode 100644 tools/perf/arch/nds32/util/header.c create mode 100644 tools/perf/pmu-events/arch/nds32/mapfile.csv create mode 100644 tools/perf/pmu-events/arch/nds32/n13/atcpmu.json diff --git a/arch/nds32/Kconfig b/arch/nds32/Kconfig index 7068f341133d..dd448d431f5a 100644 --- a/arch/nds32/Kconfig +++ b/arch/nds32/Kconfig @@ -31,6 +31,7 @@ config NDS32 select HAVE_DEBUG_KMEMLEAK select HAVE_MEMBLOCK select HAVE_REGS_AND_STACK_ACCESS_API + select HAVE_PERF_EVENTS select IRQ_DOMAIN select LOCKDEP_SUPPORT select MODULES_USE_ELF_RELA diff --git a/arch/nds32/boot/dts/ae3xx.dts b/arch/nds32/boot/dts/ae3xx.dts index bb39749a6673..16a9f54a805e 100644 --- a/arch/nds32/boot/dts/ae3xx.dts +++ b/arch/nds32/boot/dts/ae3xx.dts @@ -82,4 +82,9 @@ interrupts = <18>; }; }; + + pmu { + compatible = 
"andestech,nds32v3-pmu"; + interrupts= <13>; + }; }; diff --git a/arch/nds32/include/asm/Kbuild
Re: [PATCH v1 2/2] x86/hyperv: make HvNotifyLongSpinWait hypercall
[PATCH v4 0/4] nds32: Perf support
These four commits add perf support for nds32. There are three performance counters in nds32, and each of them can count different events. You can use 'perf list' to show the available events.

Changes in V2:
1. Change the definition 'PFM_CTL_xxx' to array form.
2. Simplify the PMU driver.
3. Stop all counters when handling the irq caused by performance counter overflow.
4. Rename the compatible string in devicetree.

Changes in V3:
Fix the typo in Documentation/devicetree/bindings/nds32/pmu.txt.

Changes in V4:
Move 'Documentation/devicetree/bindings/nds32/pmu.txt' to 'Documentation/devicetree/bindings/perf/nds32v3-pmu.txt'.

Nickhu (4):
  nds32: Fix bug in bitfield.h
  nds32: Perf porting
  nds32: Add perf call-graph support.
  nds32: Add document for NDS32 PMU.

 .../devicetree/bindings/perf/nds32v3-pmu.txt |   17 +
 arch/nds32/Kconfig                           |    1 +
 arch/nds32/boot/dts/ae3xx.dts                |    5 +
 arch/nds32/include/asm/Kbuild                |    1 +
 arch/nds32/include/asm/bitfield.h            |    4 +-
 arch/nds32/include/asm/perf_event.h          |   16 +
 arch/nds32/include/asm/pmu.h                 |  386 +
 arch/nds32/include/asm/stacktrace.h          |   39 +
 arch/nds32/kernel/Makefile                   |    3 +-
 arch/nds32/kernel/perf_event_cpu.c           | 1522 +
 arch/nds32/mm/fault.c                        |   13 +-
 tools/include/asm/barrier.h                  |    2 +
 tools/perf/arch/nds32/Build                  |    1 +
 tools/perf/arch/nds32/util/Build             |    1 +
 tools/perf/arch/nds32/util/header.c          |   29 +
 tools/perf/pmu-events/arch/nds32/mapfile.csv |   15 +
 .../pmu-events/arch/nds32/n13/atcpmu.json    |  290
 17 files changed, 2337 insertions(+), 8 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/perf/nds32v3-pmu.txt
 create mode 100644 arch/nds32/include/asm/perf_event.h
 create mode 100644 arch/nds32/include/asm/pmu.h
 create mode 100644 arch/nds32/include/asm/stacktrace.h
 create mode 100644 arch/nds32/kernel/perf_event_cpu.c
 create mode 100644 tools/perf/arch/nds32/Build
 create mode 100644 tools/perf/arch/nds32/util/Build
 create mode 100644 tools/perf/arch/nds32/util/header.c
 create mode 100644 tools/perf/pmu-events/arch/nds32/mapfile.csv
 create mode 100644 tools/perf/pmu-events/arch/nds32/n13/atcpmu.json
--
2.17.0
[PATCH v4 3/4] nds32: Add perf call-graph support.
From: Nickhu

The perf call-graph option can trace the callchain between functions. This commit adds the perf callchain for nds32. There are a kernel callchain and a user callchain. The kernel callchain can trace the functions in kernel space. There are two types of user callchain: one for when the 'optimize for size' config is set, and another for when it is not. The difference between the two types is the index of the frame-pointer in the user stack. For example:

With optimize for size:
User Stack:
	------
	| lp |
	------
	| gp |
	------
	| fp |
	------

Without optimize for size:
User Stack:
1. non-leaf function:
	------
	| lp |
	------
	| fp |
	------

2. leaf function:
	------
	| fp |
	------

Signed-off-by: Nickhu
---
 arch/nds32/kernel/perf_event_cpu.c | 299 +
 1 file changed, 299 insertions(+)

diff --git a/arch/nds32/kernel/perf_event_cpu.c b/arch/nds32/kernel/perf_event_cpu.c
index a6e723d0fdbc..5e00ce54d0ff 100644
--- a/arch/nds32/kernel/perf_event_cpu.c
+++ b/arch/nds32/kernel/perf_event_cpu.c
@@ -1193,6 +1193,305 @@ static int __init register_pmu_driver(void)
 device_initcall(register_pmu_driver);
 
+/*
+ * References: arch/nds32/kernel/traps.c:__dump()
+ * You will need to know the NDS ABI first.
+ */
+static int unwind_frame_kernel(struct stackframe *frame)
+{
+	int graph = 0;
+#ifdef CONFIG_FRAME_POINTER
+	/* 0x3 means misalignment */
+	if (!kstack_end((void *)frame->fp) &&
+	    !((unsigned long)frame->fp & 0x3) &&
+	    ((unsigned long)frame->fp >= TASK_SIZE)) {
+		/*
+		 * The array index is based on the ABI; the graph below
+		 * illustrates the reasons.
+		 * Function call procedure: "smw" and "lmw" will always
+		 * update SP and FP for you automatically.
+		 *
+		 * Stack                                  Relative Address
+		 * |    |                                        0
+		 * ------
+		 * | LP | <-- SP(before smw) <-- FP(after smw)  -1
+		 * ------
+		 * | FP |                                       -2
+		 * ------
+		 * |    | <-- SP(after smw)                     -3
+		 */
+		frame->lp = ((unsigned long *)frame->fp)[-1];
+		frame->fp = ((unsigned long *)frame->fp)[FP_OFFSET];
+		/* make sure CONFIG_FUNCTION_GRAPH_TRACER is turned on */
+		if (__kernel_text_address(frame->lp))
+			frame->lp = ftrace_graph_ret_addr
+					(NULL, &graph, frame->lp, NULL);
+
+		return 0;
+	} else {
+		return -EPERM;
+	}
+#else
+	/*
+	 * You can refer to arch/nds32/kernel/traps.c:__dump()
+	 * Treat "sp" as "fp", but the "sp" is one frame ahead of "fp".
+	 * And, the "sp" is not always correct.
+	 *
+	 * Stack                       Relative Address
+	 * |    |                            0
+	 * ------
+	 * | LP | <-- SP(before smw)        -1
+	 * ------
+	 * |    | <-- SP(after smw)         -2
+	 * ------
+	 */
+	if (!kstack_end((void *)frame->sp)) {
+		frame->lp = ((unsigned long *)frame->sp)[1];
+		/* TODO: How to deal with the case where the value in the
+		 * first "sp" is not correct?
+		 */
+		if (__kernel_text_address(frame->lp))
+			frame->lp = ftrace_graph_ret_addr
+					(tsk, &graph, frame->lp, NULL);
+
+		frame->sp = ((unsigned long *)frame->sp) + 1;
+
+		return 0;
+	} else {
+		return -EPERM;
+	}
+#endif
+}
+
+static void notrace
+walk_stackframe(struct stackframe *frame,
+		int (*fn_record)(struct stackframe *, void *),
+		void *data)
+{
+	while (1) {
+		int ret;
+
+		if (fn_record(frame, data))
+			break;
+
+		ret = unwind_frame_kernel(frame);
+		if (ret < 0)
+			break;
+	}
+}
+
+/*
+ * Gets called by walk_stackframe() for every stackframe. This will be called
+ * whilst unwinding the stackframe and is like a subroutine return so we use
+ * the PC.
+ */
+static int callchain_trace(struct stackframe *fr, void *data)
+{
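The frame layout in the unwinder's comment can be modeled in ordinary userspace C. The sketch below is illustrative only — an array stands in for the kernel stack, and the struct is a simplified stand-in for nds32's struct stackframe — but it shows why LP lives at fp[-1] and the caller's FP at fp[-2]:

```c
#include <assert.h>
#include <stddef.h>

/* Offset of the saved FP relative to the current FP, per the layout above. */
#define FP_OFFSET (-2)

/* Simplified stand-in for the kernel's struct stackframe. */
struct stackframe {
	unsigned long fp;	/* frame pointer */
	unsigned long lp;	/* link pointer (return address) */
};

/*
 * One unwind step: FP points just past the saved LP/FP pair, so the
 * saved LP is at fp[-1] and the caller's FP at fp[FP_OFFSET].
 */
static void unwind_one(struct stackframe *frame)
{
	unsigned long *fp = (unsigned long *)frame->fp;

	frame->lp = fp[-1];
	frame->fp = fp[FP_OFFSET];
}
```

The real unwinder additionally validates the pointer (kstack_end(), alignment, TASK_SIZE) and filters the LP through ftrace_graph_ret_addr(); this model only demonstrates the indexing.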
Re: [PATCH v1 2/2] x86/hyperv: make HvNotifyLongSpinWait hypercall
Hi, Michael,

Thanks a lot for the review and comments! Let us sync with Hyper-V team to confirm these suspicious points.

BRs,
Sun Yi

On 18-10-24 16:53:00, Michael Kelley wrote:
> From: Yi Sun  Sent: Friday, October 19, 2018 6:14 AM
> >
> > The HvNotifyLongSpinWait hypercall (HVCALL_NOTIFY_LONG_SPIN_WAIT)
> > is used by a guest OS to notify the hypervisor that the calling
> > virtual processor is attempting to acquire a resource that is
> > potentially held by another virtual processor within the same
> > Virtual Machine. This scheduling hint improves the scalability of
> > VMs with more than one virtual processor on Hyper-V.
> >
> > Per MSFT TLFS, the retry number (SpinWaitInfo) is sent to hypervisor
> > only when the retry number exceeds the recommended number. If
> > recommended number is 0x, never retry.
>
> The HvNotifyLongSpinWait hypercall should be understood to be
> advisory only. As you noted, it is a scheduling hint to the
> hypervisor that some virtual CPU in the VM holds a spin lock. Even
> though Linux knows which vCPU holds the spin lock, the hypercall
> does not provide a way to give that information to Hyper-V. The
> hypercall always returns immediately.
>
> The "retry number" is a bit mis-named in the Hyper-V Top Level
> Functional Spec (TLFS). It is essentially a threshold value. Hyper-V is
> saying "don't bother to advise me about the spin lock until you have
> done a certain number of spins." This threshold prevents
> over-notifying Hyper-V such that the notification becomes somewhat
> meaningless. It's not immediately clear to me why the hypercall passes
> that value as an input, but maybe it lets the Hyper-V scheduler prioritize
> among vCPUs based on how many times they have spun for a lock. I
> think we were told that current Hyper-V implementations ignore this
> input value anyway.
>
> I believe the description of the sentinel value 0x in the
> Hyper-V TLFS is incorrect. Because it is the max possible threshold
> value, that value in the EBX register just means to not ever bother to
> notify. The description should be "0x indicates never to notify."
> The value does *not* indicate anything about retrying to obtain the
> spin lock.

I will send mail to Hyper-V team to clarify these.

> >  static bool __initdata hv_pvspin = true;
> >
> > +bool hv_notify_long_spin_wait(int retry_num)
>
> retry_num should probably be declared as unsigned int. You
> don't want it to be treated as a negative number if the high
> order bit is set.

Yes, I should declare it as 'unsigned int'. Thanks!

> > +{
> > +	/*
> > +	 * Per MSFT TLFS, the SpinWaitInfo is sent to hypervisor only when
> > +	 * the retry number exceeds the recommended number.
> > +	 *
> > +	 * If recommended number is 0x, never retry.
> > +	 */
> > +	if (ms_hyperv.num_spin_retry == HYPERV_SPINLOCK_RETRY_NEVER)
> > +		return false;
> > +
> > +	if ((0 == retry_num % ms_hyperv.num_spin_retry) && retry_num)
>
> I don't know if the "%" function is right here. Your implementation will
> notify Hyper-V on every multiple of num_spin_retry. The alternative is to
> notify once when the threshold is exceeded, and never again for this
> particular attempt to obtain a spin lock. We should check with the Hyper-V
> team for which approach they expect to be used.
>
> > +		hv_do_fast_hypercall8(HVCALL_NOTIFY_LONG_SPIN_WAIT,
> > +				      retry_num);
>
> The Hyper-V TLFS seems to be inconsistent on whether the input parameter
> is 32-bits or 64-bits. In one place it is typed as UINT64, but in another place
> it is shown as only 4 bytes. Need to clear this up with the Hyper-V team as
> well.
>
> > +
> > +	return true;
>
> I don't see a need for this function to return true vs. false. Any calling code
> should not change its behavior based on num_spin_retry. This function will
> either notify Hyper-V or not notify Hyper-V, depending on whether the number
> of attempts to obtain the spinlock meets the threshold. But calling code will
> do the same thing regardless of whether such a notification is made.
>
> Michael
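For readers following the threshold discussion, the two notification policies Michael contrasts can be sketched in plain C. This is a hypothetical userspace model — the constant NUM_SPIN_RETRY stands in for ms_hyperv.num_spin_retry — not the actual kernel code:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical threshold; the real value comes from ms_hyperv.num_spin_retry. */
#define NUM_SPIN_RETRY 64

/* Policy A (as in the patch): notify on every multiple of the threshold. */
static bool should_notify_every_multiple(unsigned int retry_num)
{
	return retry_num != 0 && (retry_num % NUM_SPIN_RETRY) == 0;
}

/* Policy B (the alternative): notify exactly once, when the threshold is first reached. */
static bool should_notify_once(unsigned int retry_num)
{
	return retry_num == NUM_SPIN_RETRY;
}
```

Policy A fires repeatedly for a long-held lock, while Policy B fires at most once per acquisition attempt; which behavior Hyper-V expects is exactly the open question in the review.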
Re: [PATCHv2 1/2] x86/mm: Move LDT remap out of KASLR region on 5-level paging
Hi Kirill,

Thanks for making this patchset. I have small concerns, please see the inline comments.

On 10/24/18 at 03:51pm, Kirill A. Shutemov wrote:
> On 5-level paging LDT remap area is placed in the middle of
> KASLR randomization region and it can overlap with direct mapping,
> vmalloc or vmap area.
>
> Let's move LDT just before direct mapping which makes it safe for KASLR.
> This also allows us to unify layout between 4- and 5-level paging.

In the crash utility and makedumpfile, which are used to analyze system memory content, PAGE_OFFSET is hardcoded as below in the non-KASLR case:

#define PAGE_OFFSET_2_6_27 0x8800

It seems this time they will need to add another value, one each for 4-level and 5-level, since the 5-level code also exists in stable kernels. Surely this doesn't matter much.

> We don't touch 4 pgd slot gap just before the direct mapping reserved
> for a hypervisor, but move direct mapping by one slot instead.
>
> The LDT mapping is per-mm, so we cannot move it into P4D page table next
> to CPU_ENTRY_AREA without complicating PGD table allocation for 5-level
> paging.

Here, as discussed in the private thread, at first you also agreed to put it in the p4d entry next to CPU_ENTRY_AREA, but you finally changed your mind; there must be some reason you found when you implemented and investigated further. Could you please say more about how it would complicate PGD table allocation for 5-level paging? Or give a use case where it would complicate things? Sorry, I still don't get the point. Really appreciate it.

Thanks
Baoquan
Re: KASAN: use-after-free Read in task_is_descendant
Oleg Nesterov wrote:
> On 10/22, Tetsuo Handa wrote:
> > > And again, I do not know how/if yama ensures that child is rcu-protected, perhaps
> > > task_is_descendant() needs to check pid_alive(child) right after
> > > rcu_read_lock() ?
> >
> > Since the caller (ptrace() path) called get_task_struct(child), child itself can't be
> > released. Do we still need pid_alive(child) ?
>
> get_task_struct(child) can only ensure that this task_struct can't be freed.

The report says that it is a use-after-free read at

	walker = rcu_dereference(walker->real_parent);

which means that walker was already released.

> Suppose that this child exits after get_task_struct(), then its real_parent exits
> too and calls call_rcu(delayed_put_task_struct).
>
> Now, when task_is_descendant() is called, rcu_read_lock() can happen after rcu gp,
> iow child->parent can be already freed/reused/unmapped.
>
> We need to ensure that child is still protected by RCU.

I wonder whether the pid_alive() test helps. We can get

[ 40.620318] parent or walker is dead.
[ 40.624146] tracee is dead.

messages using the patch and reproducer below.

--
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 99cfddd..0d9d786 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -385,6 +385,7 @@ static int ptrace_attach(struct task_struct *task, long request,
 	if (mutex_lock_interruptible(&task->signal->cred_guard_mutex))
 		goto out;
 
+	schedule_timeout_killable(HZ);
 	task_lock(task);
 	retval = __ptrace_may_access(task, PTRACE_MODE_ATTACH_REALCREDS);
 	task_unlock(task);
diff --git a/security/yama/yama_lsm.c b/security/yama/yama_lsm.c
index ffda91a..a231ec6 100644
--- a/security/yama/yama_lsm.c
+++ b/security/yama/yama_lsm.c
@@ -283,6 +283,11 @@ static int task_is_descendant(struct task_struct *parent,
 		return 0;
 
 	rcu_read_lock();
+	if (!pid_alive(parent) || !pid_alive(walker)) {
+		rcu_read_unlock();
+		printk("parent or walker is dead.\n");
+		return 0;
+	}
 	if (!thread_group_leader(parent))
 		parent = rcu_dereference(parent->group_leader);
 	while (walker->pid > 0) {
@@ -315,6 +320,10 @@ static int ptracer_exception_found(struct task_struct *tracer,
 	bool found = false;
 
 	rcu_read_lock();
+	if (!pid_alive(tracee)) {
+		printk("tracee is dead.\n");
+		goto unlock;
+	}
 	/*
 	 * If there's already an active tracing relationship, then make an
--
--
#include <unistd.h>
#include <poll.h>
#include <sys/ptrace.h>

int main(int argc, char *argv[])
{
	if (fork() == 0) {
		ptrace(PTRACE_ATTACH, getppid(), NULL, NULL);
		_exit(0);
	}
	poll(NULL, 0, 100);
	return 0;
}
--

But since "child" has at least one reference, reading "child->real_parent" should be safe. Therefore, I think that bailing out due to pid_alive(child) == false (like the above patch does) cannot avoid this problem...
Re: [PATCH 08/17] prmem: struct page: track vmap_area
On Thu, Oct 25, 2018 at 02:01:02AM +0300, Igor Stoppa wrote:
> > > @@ -1747,6 +1750,10 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
> > >  	if (!addr)
> > >  		return NULL;
> > > +	va = __find_vmap_area((unsigned long)addr);
> > > +	for (i = 0; i < va->vm->nr_pages; i++)
> > > +		va->vm->pages[i]->area = va;
> >
> > I don't like it that you're calling this for _every_ vmalloc() caller
> > when most of them will never use this. Perhaps have page->va be initially
> > NULL and then cache the lookup in it when it's accessed for the first time.
>
> If __find_vmap_area() was part of the API, this loop could be left out from
> __vmalloc_node_range() and the user of the allocation could initialize the
> field, if needed.
>
> What is the reason for keeping __find_vmap_area() private?

Well, for one, you're walking the rbtree without holding the spinlock, so you're going to get crashes. I don't see why we shouldn't export find_vmap_area() though.

Another way we could approach this is to embed the vmap_area in the vm_struct. It'd require a bit of juggling of the alloc/free paths in vmalloc, but it might be worthwhile.
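The lazy-caching idea in the quoted review — leave page->area NULL at allocation time and resolve it on first access — can be sketched outside the kernel. Everything here is a hypothetical stand-in (the opaque types and the lookup stub are not the real vmalloc internals); the point is only the cache-on-first-use pattern:

```c
#include <assert.h>
#include <stddef.h>

struct vmap_area;			/* opaque stand-in for the kernel type */
struct page { struct vmap_area *area; };

static int lookup_calls;		/* counts simulated rbtree lookups */

/* Stand-in for find_vmap_area(); pretends every page maps to one area. */
static struct vmap_area *find_vmap_area_stub(void)
{
	lookup_calls++;
	return (struct vmap_area *)0x1;	/* dummy non-NULL handle */
}

/* Lazily resolve and cache the owning vmap_area for a page. */
static struct vmap_area *page_vmap_area(struct page *p)
{
	if (!p->area)
		p->area = find_vmap_area_stub();
	return p->area;
}
```

Only the first access pays for the lookup; vmalloc() callers that never touch page->area pay nothing, which is the review's objection to doing the loop eagerly. (In the kernel the lookup would also need the vmap_area spinlock, per the crash concern above.)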
[PATCH] mmc: sdhci: Convert sdhci_allocate_bounce_buffer() to return void
The function sdhci_allocate_bounce_buffer() always returns zero at present, so there's no need to have a return value; that will also make the error path easier.

CC: Linus Walleij
Signed-off-by: Chunyan Zhang
---
 drivers/mmc/host/sdhci.c | 15 +--
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
index 1b3fbd9..e94f4aa 100644
--- a/drivers/mmc/host/sdhci.c
+++ b/drivers/mmc/host/sdhci.c
@@ -3408,7 +3408,7 @@ void __sdhci_read_caps(struct sdhci_host *host, u16 *ver, u32 *caps, u32 *caps1)
 }
 EXPORT_SYMBOL_GPL(__sdhci_read_caps);
 
-static int sdhci_allocate_bounce_buffer(struct sdhci_host *host)
+static void sdhci_allocate_bounce_buffer(struct sdhci_host *host)
 {
 	struct mmc_host *mmc = host->mmc;
 	unsigned int max_blocks;
@@ -3446,7 +3446,7 @@ static int sdhci_allocate_bounce_buffer(struct sdhci_host *host)
 		 * Exiting with zero here makes sure we proceed with
 		 * mmc->max_segs == 1.
 		 */
-		return 0;
+		return;
 	}
 
 	host->bounce_addr = dma_map_single(mmc->parent,
@@ -3456,7 +3456,7 @@ static int sdhci_allocate_bounce_buffer(struct sdhci_host *host)
 	ret = dma_mapping_error(mmc->parent, host->bounce_addr);
 	if (ret)
 		/* Again fall back to max_segs == 1 */
-		return 0;
+		return;
 
 	host->bounce_buffer_size = bounce_size;
 
 	/* Lie about this since we're bouncing */
@@ -3466,8 +3466,6 @@ static int sdhci_allocate_bounce_buffer(struct sdhci_host *host)
 	pr_info("%s bounce up to %u segments into one, max segment size %u bytes\n",
 		mmc_hostname(mmc), max_blocks, bounce_size);
-
-	return 0;
 }
 
 int sdhci_setup_host(struct sdhci_host *host)
@@ -3987,12 +3985,9 @@ int sdhci_setup_host(struct sdhci_host *host)
 	 */
 	mmc->max_blk_count = (host->quirks & SDHCI_QUIRK_NO_MULTIBLOCK) ? 1 : 65535;
 
-	if (mmc->max_segs == 1) {
+	if (mmc->max_segs == 1)
 		/* This may alter mmc->*_blk_* parameters */
-		ret = sdhci_allocate_bounce_buffer(host);
-		if (ret)
-			return ret;
-	}
+		sdhci_allocate_bounce_buffer(host);
 
 	return 0;
--
2.7.4
Re: [PATCH 2/2] perf tests: Add a test for the ARM 32-bit [vectors] page
On Wed, Oct 24, 2018 at 05:09:05PM -0700, Florian Fainelli wrote:
> perf on ARM requires CONFIG_KUSER_HELPERS to be turned on to allow some
> independance with respect to the ARM CPU being used. Add a test which
> tries to locate the [vectors] page, created when CONFIG_KUSER_HELPERS is
> turned on to help asses the system's health.

Hi Florian

I've suffered the pain from missing CONFIG_KUSER_HELPERS. The segfaults give little clue as to what is going wrong, and gdb is not much use either. What I don't see here is any clue pointing to CONFIG_KUSER_HELPERS. If the test fails, could you print a message suggesting CONFIG_KUSER_HELPERS on ARM?

	Andrew
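For reference, locating the [vectors] page presumably boils down to scanning /proc/self/maps for the "[vectors]" entry. A hypothetical userspace helper (a sketch, not the actual perf test code) might look like:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Return true if a /proc/<pid>/maps-style buffer contains the [vectors]
 * mapping that CONFIG_KUSER_HELPERS creates on 32-bit ARM.
 */
static bool maps_has_vectors_page(const char *maps)
{
	return strstr(maps, "[vectors]") != NULL;
}
```

On a 32-bit ARM kernel built with CONFIG_KUSER_HELPERS, feeding the real contents of /proc/self/maps to this helper should return true; a false result is exactly the case where a "rebuild with CONFIG_KUSER_HELPERS" hint would be printed.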
Re: [PATCH v6 1/6] dt-bindings: power: Introduce one property to present the battery internal resistance
Hi Sebastian,

On 22 October 2018 at 15:43, Baolin Wang wrote:
> The internal resistance of a battery is not a constant in its life cycle,
> this varies over the age of the battery or temperature and so on. But we
> just want use one constant battery internal resistance to estimate the
> battery capacity. Thus this patch introduces one property to present
> the battery factory internal resistance for battery information.
>
> Signed-off-by: Baolin Wang
> Reviewed-by: Linus Walleij
> ---
> Changes from v5:
> - None.
>
> Changes from v4:
> - None.
>
> Changes from v3:
> - Split binding into one separate patch.
> - Add LinusW reviewed tag.
>
> Changes from v2:
> - Rename the property.
> - Improve the commit message.
>
> Changes from v1:
> - New patch in v2.
> ---

I think this v6 patch set has addressed your comments on v5, so could you apply this patch set if there are no other comments? I hope this driver can be merged into 4.20 finally. Thanks.

--
Baolin Wang
Best Regards
Re: [PATCH 5/5] rtc: sc27xx: Always read normal alarm when registering RTC device
Hi Alexandre, On 25 October 2018 at 08:34, Alexandre Belloni wrote: > Hello, > > On 18/10/2018 16:52:30+0800, Baolin Wang wrote: >> When registering one RTC device, it will check to see if there is an >> alarm already set in RTC hardware by reading RTC alarm, at this time >> we should always read the normal alarm put in always-on region by >> checking the rtc->registered flag. >> >> Signed-off-by: Baolin Wang >> --- >> drivers/rtc/rtc-sc27xx.c |8 ++-- >> 1 file changed, 6 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/rtc/rtc-sc27xx.c b/drivers/rtc/rtc-sc27xx.c >> index 72bb002..b4eb3b3 100644 >> --- a/drivers/rtc/rtc-sc27xx.c >> +++ b/drivers/rtc/rtc-sc27xx.c >> @@ -415,10 +415,14 @@ static int sprd_rtc_read_alarm(struct device *dev, >> struct rtc_wkalrm *alrm) >> u32 val; >> >> /* >> - * If aie_timer is enabled, we should get the normal alarm time. >> + * Before RTC device is registered, it will check to see if there is an >> + * alarm already set in RTC hardware, and we always read the normal >> + * alarm at this time. >> + * >> + * Or if aie_timer is enabled, we should get the normal alarm time. >>* Otherwise we should get auxiliary alarm time. >>*/ >> - if (rtc->rtc && rtc->rtc->aie_timer.enabled == 0) >> + if (rtc->rtc && rtc->rtc->registered && rtc->rtc->aie_timer.enabled == >> 0) > > Note that the driver should not access rtc->registered and > rtc->aie_timer.enabled and this is a bit fragile. > But, on the other hand, I currently don't have anything better to > suggest. I was also planning to add an in-kernel API for multiple alarms > but I'm not sure it will actually help in your case. Yes, I understand your concern. I will be glad to help test if you introduce new APIs for multiple alarms to see if they can help in our case. Thanks. -- Baolin Wang Best Regards
[PATCH] kvm/x86 : remove unused struct definition
The structure svm_init_data is never used, so remove it. Signed-off-by: Peng Hao --- arch/x86/kvm/svm.c | 5 - 1 file changed, 5 deletions(-) diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 61ccfb1..5c7dc8b 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -675,11 +675,6 @@ struct svm_cpu_data { static DEFINE_PER_CPU(struct svm_cpu_data *, svm_data); -struct svm_init_data { - int cpu; - int r; -}; - static const u32 msrpm_ranges[] = {0, 0xc000, 0xc001}; #define NUM_MSR_MAPS ARRAY_SIZE(msrpm_ranges) -- 1.8.3.1
[PATCH v3 1/3] dt-bindings: Input: new bindings for MSM vibrator
This patch adds the device tree bindings for the vibrator found on various Qualcomm MSM SOCs. Signed-off-by: Brian Masney --- .../bindings/input/msm-vibrator.txt | 36 +++ 1 file changed, 36 insertions(+) create mode 100644 Documentation/devicetree/bindings/input/msm-vibrator.txt diff --git a/Documentation/devicetree/bindings/input/msm-vibrator.txt b/Documentation/devicetree/bindings/input/msm-vibrator.txt new file mode 100644 index ..8dcf014ef2e5 --- /dev/null +++ b/Documentation/devicetree/bindings/input/msm-vibrator.txt @@ -0,0 +1,36 @@ +* Device tree bindings for the Qualcomm MSM vibrator + +Required properties: + + - compatible: Should be one of + "qcom,msm8226-vibrator" + "qcom,msm8974-vibrator" + - reg: the base address and length of the IO memory for the registers. + - pinctrl-names: set to default. + - pinctrl-0: phandles pointing to pin configuration nodes. See + Documentation/devicetree/bindings/pinctrl/pinctrl-bindings.txt + - clock-names: set to pwm + - clocks: phandle of the clock. See +Documentation/devicetree/bindings/clock/clock-bindings.txt + - enable-gpios: GPIO that enables the vibrator. + +Optional properties: + + - vcc-supply: phandle to the regulator that provides power to the sensor. + +Example from a LG Nexus 5 (hammerhead) phone: + +vibrator@fd8c3450 { + reg = <0xfd8c3450 0x400>; + compatible = "qcom,msm8974-vibrator"; + + vcc-supply = <_l19>; + + clocks = < CAMSS_GP1_CLK>; + clock-names = "pwm"; + + enable-gpios = < 60 GPIO_ACTIVE_HIGH>; + + pinctrl-names = "default"; + pinctrl-0 = <_pin>; +}; -- 2.17.2
[PATCH v3 0/3] ARM: qcom: add vibrator support for various MSM SOCs
This patch set adds support for the vibrator found on various Qualcomm MSM SOCs. This is based on work from: Jonathan Marek from qcom,pwm-vibrator.c in the PostmarketOS repo: https://gitlab.com/postmarketOS/linux-postmarketos/commit/7647fb36cb1cbd060f8b52087a68ab93583292b5 Jongrak Kwon and Devin Kim from msm_pwm_vibrator.c in the downstream Android 3.4.0 sources: https://android.googlesource.com/kernel/msm/+/android-msm-lenok-3.10-lollipop-wear-release/drivers/misc/msm_pwm_vibrator.c Driver was tested on a LG Nexus 5 (hammerhead) phone using rumble-test: https://git.collabora.com/cgit/user/sre/rumble-test.git/plain/rumble-test.c Changes since v2 - Moved from pwm to input subsystem based on feedback from https://lore.kernel.org/lkml/20181012114749.GC31561@ulmo/. I previously wired this into the input system via pwm-vibra. Changes since v1 - Update device tree binding document based on feedback from Rob Herring. - Changed the driver description to: Qualcomm PWM driver for the MSM vibrator. Brian Masney (3): dt-bindings: Input: new bindings for MSM vibrator Input: add new vibrator driver for various MSM SOCs ARM: dts: qcom: msm8974-hammerhead: add device tree bindings for vibrator .../bindings/input/msm-vibrator.txt | 36 +++ .../qcom-msm8974-lge-nexus5-hammerhead.dts| 31 ++ drivers/input/misc/Kconfig| 10 + drivers/input/misc/Makefile | 1 + drivers/input/misc/msm-vibrator.c | 276 ++ 5 files changed, 354 insertions(+) create mode 100644 Documentation/devicetree/bindings/input/msm-vibrator.txt create mode 100644 drivers/input/misc/msm-vibrator.c -- 2.17.2
[PATCH v3 2/3] Input: add new vibrator driver for various MSM SOCs
This patch adds a new vibrator driver that supports various Qualcomm MSM SOCs. Driver was tested on a LG Nexus 5 (hammerhead) phone. Signed-off-by: Brian Masney --- drivers/input/misc/Kconfig| 10 ++ drivers/input/misc/Makefile | 1 + drivers/input/misc/msm-vibrator.c | 276 ++ 3 files changed, 287 insertions(+) create mode 100644 drivers/input/misc/msm-vibrator.c diff --git a/drivers/input/misc/Kconfig b/drivers/input/misc/Kconfig index ca59a2be9bc5..e39aef84f357 100644 --- a/drivers/input/misc/Kconfig +++ b/drivers/input/misc/Kconfig @@ -117,6 +117,16 @@ config INPUT_E3X0_BUTTON To compile this driver as a module, choose M here: the module will be called e3x0_button. +config INPUT_MSM_VIBRATOR + tristate "Qualcomm MSM vibrator driver" + select INPUT_FF_MEMLESS + help + Support for the vibrator that is found on various Qualcomm MSM + SOCs. + + To compile this driver as a module, choose M here: the module + will be called msm_vibrator. + config INPUT_PCSPKR tristate "PC Speaker support" depends on PCSPKR_PLATFORM diff --git a/drivers/input/misc/Makefile b/drivers/input/misc/Makefile index 9d0f9d1ff68f..96a6419cb1f2 100644 --- a/drivers/input/misc/Makefile +++ b/drivers/input/misc/Makefile @@ -48,6 +48,7 @@ obj-$(CONFIG_INPUT_MAX8925_ONKEY) += max8925_onkey.o obj-$(CONFIG_INPUT_MAX8997_HAPTIC) += max8997_haptic.o obj-$(CONFIG_INPUT_MC13783_PWRBUTTON) += mc13783-pwrbutton.o obj-$(CONFIG_INPUT_MMA8450)+= mma8450.o +obj-$(CONFIG_INPUT_MSM_VIBRATOR) += msm-vibrator.o obj-$(CONFIG_INPUT_PALMAS_PWRBUTTON) += palmas-pwrbutton.o obj-$(CONFIG_INPUT_PCAP) += pcap_keys.o obj-$(CONFIG_INPUT_PCF50633_PMU) += pcf50633-input.o diff --git a/drivers/input/misc/msm-vibrator.c b/drivers/input/misc/msm-vibrator.c new file mode 100644 index ..bb862f30a780 --- /dev/null +++ b/drivers/input/misc/msm-vibrator.c @@ -0,0 +1,276 @@ +// SPDX-License-Identifier: GPL-2.0+ +/* + * Qualcomm MSM vibrator driver + * + * Copyright (c) 2018 Brian Masney + * + * Based on qcom,pwm-vibrator.c from: + * 
Copyright (c) 2018 Jonathan Marek + * + * Based on msm_pwm_vibrator.c from downstream Android sources: + * Copyright (C) 2009-2014 LGE, Inc. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define REG_CMD_RCGR 0x00 +#define REG_CFG_RCGR 0x04 +#define REG_M 0x08 +#define REG_N 0x0C +#define REG_D 0x10 +#define REG_CBCR 0x24 +#define MMSS_CC_M_DEFAULT 1 + +struct msm_vibrator { + struct input_dev *input; + struct mutex mutex; + struct work_struct worker; + void __iomem *base; + struct regulator *vcc; + struct clk *clk; + struct gpio_desc *enable_gpio; + u16 magnitude; + bool enabled; +}; + +#define msm_vibrator_write(msm_vibrator, offset, value) \ + writel((value), (void __iomem *)((msm_vibrator)->base + (offset))) + +static int msm_vibrator_start(struct msm_vibrator *vibrator) +{ + int d_reg_val, ret = 0; + + mutex_lock(>mutex); + + if (!vibrator->enabled) { + ret = clk_set_rate(vibrator->clk, 24000); + if (ret) { + dev_err(>input->dev, + "Failed to set clock rate: %d\n", ret); + goto unlock; + } + + ret = clk_prepare_enable(vibrator->clk); + if (ret) { + dev_err(>input->dev, + "Failed to enable clock: %d\n", ret); + goto unlock; + } + + ret = regulator_enable(vibrator->vcc); + if (ret) { + dev_err(>input->dev, + "Failed to enable regulator: %d\n", ret); + clk_disable(vibrator->clk); + goto unlock; + } + + gpiod_set_value_cansleep(vibrator->enable_gpio, 1); + + vibrator->enabled = true; + } + + d_reg_val = 127 - ((126 * vibrator->magnitude) / 0x); + msm_vibrator_write(vibrator, REG_CFG_RCGR, + (2 << 12) | /* dual edge mode */ + (0 << 8) | /* cxo */ + (7 << 0)); + msm_vibrator_write(vibrator, REG_M, 1); + msm_vibrator_write(vibrator, REG_N, 128); + msm_vibrator_write(vibrator, REG_D, d_reg_val); + msm_vibrator_write(vibrator, REG_CMD_RCGR, 1); + msm_vibrator_write(vibrator, REG_CBCR, 1); + +unlock: + mutex_unlock(>mutex); + + return ret; +} + +static void msm_vibrator_stop(struct msm_vibrator *vibrator) +{ + 
mutex_lock(>mutex); + + if (vibrator->enabled) { +
Re: TSC to Mono-raw Drift
On Wed, Oct 24, 2018 at 04:51:13PM +0200, Miroslav Lichvar wrote: > On Tue, Oct 23, 2018 at 11:31:00AM -0700, John Stultz wrote: > > On Fri, Oct 19, 2018 at 3:36 PM, John Stultz wrote: > > > On Fri, Oct 19, 2018 at 1:50 PM, Thomas Gleixner > > > wrote: > > >> On Fri, 19 Oct 2018, John Stultz wrote: > > >>> We might be able to reduce the degree in this case, but I worry the > > >>> extra complexity may only cause problems for others. > > >> > > >> Is it really that complex to add a fixed correction value periodically? > > The error is too large to be corrected by stepping on clock updates. > For a typical TSC frequency we have multiplier in the range of few > millions, so that's a frequency error of up to few hundred ppb. In the For this example, a i5-6600 CPU @ 3.30GHz I measured 90 ppb. That is the largest I've seen since measuring this on a few platforms. 20-40 PPB seems more typical. > old days when the clock was updated 1000 times per second that would > be hidden in the resolution of the clock, but now with tickless > kernels those steps would be very noticeable. > > If the multiplier was adjusted in the same way as the non-raw clock, > there wouldn't be any steps in time, but there would be steps in > frequency and the time error would be proportional to the update > interval. For a clock updated once per second that's an error of up to > a few hundreds of nanoseconds. > > I agree with John. I think the raw monotonic clock should be stable. > > It would help if we better understood the use case. If I needed a There may be other future use cases, but the most immediate use is timed GPIO. Current shipping hardware has a timed GPIO device that uses ART/TSC directly to schedule output events and time-stamp input events. There isn't a device clock like PHC, as an example. An example application is synchronizing TGPIO across two networked devices sync'd with PTP. 
It's possible to use the existing PHC interface to compute the network time in terms of raw monotonic and vice versa. Now we need to add a TSC to monotonic raw relation in order to interact with the TGPIO hardware. To me this doesn't make sense because they use the same hardware clock. > clock that ticks in an exact proportion to the tsc, why wouldn't I use > the tsc directly? Is this about having a fall back in case the tsc I think this would work OK. This would need some additional API to expose the relation between monotonic raw and TSC. In the kernel that would probably mean plumbing ktime_get_snapshot() to the application. Or, another option: add TSC as another clock to the get_device_system_crosststamp() interface. The second option would be more precise, but for the proposed application (or similar) would require changes to the supporting stacks like PTP. > cannot be used (e.g. due to unsynchronized CPUs)? No. > > If the frequency error was exported, it could be compensated where > necessary. Maybe that would work for the original poster? This would be the same as plumbing ktime_get_snapshot() except that the math would be done in the kernel. > > A better fix might be to modify the calculation of time to use a > second multiplier, effectively increasing its resolution. However, I'm not sure I'm understanding. Like cascading transforms? While this would increase the precision, I think it would still drift over days. We could probably fix up every second though. > that would slow down all users of the clock. Couldn't some clocksources specify an additional multiplier/precision and others use lower precision? > > -- > Miroslav Lichvar
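To illustrate where the drift being discussed comes from: the cycles-to-ns conversion uses an integer mult/shift pair, and rounding mult to an integer is a permanent frequency offset. A rough sketch of that error in ppb (this mirrors the idea behind the kernel's clocks_calc_mult_shift(), not its exact code):

```c
#include <stdint.h>

/*
 * The kernel converts cycles to nanoseconds as
 *     ns = (cycles * mult) >> shift
 * with an ideal mult of 1e9 * 2^shift / freq. Because mult must be an
 * integer, the rounding is a fixed frequency error. Return that error
 * in parts per billion for a given clock frequency and shift.
 */
static long long mult_rounding_error_ppb(uint64_t freq_hz, unsigned int shift)
{
	uint64_t num = 1000000000ULL << shift;		/* 1e9 * 2^shift */
	uint64_t mult = (num + freq_hz / 2) / freq_hz;	/* round to nearest */

	/* relative error of mult vs. the ideal ratio, scaled to ppb */
	long long diff = (long long)(mult * freq_hz) - (long long)num;

	return diff * 1000000000LL / (long long)num;
}
```

For a 3.3 GHz TSC with shift = 24 this lands in the tens of ppb, which is consistent with the 20-90 ppb figures measured above; a larger shift shrinks the error at the cost of overflowing the 64-bit multiply sooner.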
[PATCH v3 3/3] ARM: dts: qcom: msm8974-hammerhead: add device tree bindings for vibrator
This patch adds device tree bindings for the vibrator found on the LG Nexus 5 (hammerhead) phone. Signed-off-by: Brian Masney --- .../qcom-msm8974-lge-nexus5-hammerhead.dts| 31 +++ 1 file changed, 31 insertions(+) diff --git a/arch/arm/boot/dts/qcom-msm8974-lge-nexus5-hammerhead.dts b/arch/arm/boot/dts/qcom-msm8974-lge-nexus5-hammerhead.dts index ed8f064d0895..de956eea31ac 100644 --- a/arch/arm/boot/dts/qcom-msm8974-lge-nexus5-hammerhead.dts +++ b/arch/arm/boot/dts/qcom-msm8974-lge-nexus5-hammerhead.dts @@ -5,6 +5,7 @@ #include #include #include +#include / { model = "LGE MSM 8974 HAMMERHEAD"; @@ -268,6 +269,36 @@ input-enable; }; }; + + vibrator_pin: vibrator { + pwm { + pins = "gpio27"; + function = "gp1_clk"; + + drive-strength = <6>; + bias-disable; + }; + + enable { + pins = "gpio60"; + function = "gpio"; + }; + }; + }; + + vibrator@fd8c3450 { + compatible = "qcom,msm8974-vibrator"; + reg = <0xfd8c3450 0x400>; + + vcc-supply = <_l19>; + + clocks = < CAMSS_GP1_CLK>; + clock-names = "pwm"; + + enable-gpios = < 60 GPIO_ACTIVE_HIGH>; + + pinctrl-names = "default"; + pinctrl-0 = <_pin>; }; sdhci@f9824900 { -- 2.17.2
[PATCH 2/2] mm/zsmalloc.c: fix zsmalloc ARM LPAE support
Since commit 02390b87a945 ("mm/zsmalloc: Prepare to variable
MAX_PHYSMEM_BITS"), an architecture has to define this value in order to
guarantee that zsmalloc will be able to encode and decode the obj value
properly. Similar to that change, this one sets the value for ARM LPAE,
fixing a possible null-ptr-deref in zs_map_object() when using ARM LPAE
and HIGHMEM pages located above the 4GB watermark.

Link: https://bugs.linaro.org/show_bug.cgi?id=3765#c17
Signed-off-by: Rafael David Tinoco
---
 arch/arm/include/asm/pgtable-3level-types.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm/include/asm/pgtable-3level-types.h b/arch/arm/include/asm/pgtable-3level-types.h
index 921aa30259c4..bd4994f98700 100644
--- a/arch/arm/include/asm/pgtable-3level-types.h
+++ b/arch/arm/include/asm/pgtable-3level-types.h
@@ -67,4 +67,6 @@ typedef pteval_t pgprot_t;

 #endif	/* STRICT_MM_TYPECHECKS */

+#define MAX_POSSIBLE_PHYSMEM_BITS 36
+
 #endif	/* _ASM_PGTABLE_3LEVEL_TYPES_H */
--
2.19.1
[PATCH 1/2] mm/zsmalloc.c: check encoded object value overflow for PAE
On 32-bit systems, zsmalloc uses HIGHMEM and, when PAE is enabled, the
physical frame number might be so big that zsmalloc obj encoding (to
location) will break IF the architecture does not re-define
MAX_PHYSMEM_BITS, causing:

[  497.097843] ==
[  497.102365] BUG: KASAN: null-ptr-deref in zs_map_object+0xa4/0x2bc
[  497.105933] Read of size 4 at addr by task mkfs.ext4/623
[  497.109684]
[  497.110722] CPU: 2 PID: 623 Comm: mkfs.ext4 Not tainted 4.19.0-rc8-00017-g8239bc6d3307-dirty #15
[  497.116098] Hardware name: Generic DT based system
[  497.119094] [] (unwind_backtrace) from [] (show_stack+0x20/0x24)
[  497.123819] [] (show_stack) from [] (dump_stack+0xbc/0xe8)
[  497.128299] [] (dump_stack) from [] (kasan_report+0x248/0x390)
[  497.132928] [] (kasan_report) from [] (__asan_load4+0x78/0xb4)
[  497.137530] [] (__asan_load4) from [] (zs_map_object+0xa4/0x2bc)
[  497.142335] [] (zs_map_object) from [] (zram_bvec_rw.constprop.2+0x324/0x8e4 [zram])
[  497.148294] [] (zram_bvec_rw.constprop.2 [zram]) from [] (zram_make_request+0x234/0x46c [zram])
[  497.154653] [] (zram_make_request [zram]) from [] (generic_make_request+0x304/0x63c)
[  497.160413] [] (generic_make_request) from [] (submit_bio+0x4c/0x1c8)
[  497.165379] [] (submit_bio) from [] (submit_bh_wbc.constprop.15+0x238/0x26c)
[  497.170775] [] (submit_bh_wbc.constprop.15) from [] (__block_write_full_page+0x524/0x76c)
[  497.176776] [] (__block_write_full_page) from [] (block_write_full_page+0x1bc/0x1d4)
[  497.182549] [] (block_write_full_page) from [] (blkdev_writepage+0x24/0x28)
[  497.187849] [] (blkdev_writepage) from [] (__writepage+0x44/0x78)
[  497.192633] [] (__writepage) from [] (write_cache_pages+0x3b8/0x800)
[  497.197616] [] (write_cache_pages) from [] (generic_writepages+0x74/0xa0)
[  497.202807] [] (generic_writepages) from [] (blkdev_writepages+0x18/0x1c)
[  497.208022] [] (blkdev_writepages) from [] (do_writepages+0x68/0x134)
[  497.213002] [] (do_writepages) from [] (__filemap_fdatawrite_range+0xb0/0xf4)
[  497.218447] [] (__filemap_fdatawrite_range) from [] (file_write_and_wait_range+0x64/0xd0)
[  497.224416] [] (file_write_and_wait_range) from [] (blkdev_fsync+0x54/0x84)
[  497.229749] [] (blkdev_fsync) from [] (vfs_fsync_range+0x70/0xd4)
[  497.234549] [] (vfs_fsync_range) from [] (do_fsync+0x4c/0x80)
[  497.239159] [] (do_fsync) from [] (sys_fsync+0x1c/0x20)
[  497.243407] [] (sys_fsync) from [] (ret_fast_syscall+0x0/0x2c)

like in this ARM 32-bit (LPAE enabled) example.

That occurs because, if not set, MAX_POSSIBLE_PHYSMEM_BITS will default
to BITS_PER_LONG (32) in most cases, and, for PAE, _PFN_BITS will be
wrong: which may cause the obj variable to overflow if the PFN is
HIGHMEM and references any page above the 4GB watermark.

This commit exposes the BUG IF the architecture supports PAE AND does
not explicitly set MAX_POSSIBLE_PHYSMEM_BITS to a supported value.

Link: https://bugs.linaro.org/show_bug.cgi?id=3765#c17
Signed-off-by: Rafael David Tinoco
---
 mm/zsmalloc.c | 18 --
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 9da65552e7ca..9c3ff8c2ccbc 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -119,6 +119,15 @@
 #define OBJ_INDEX_BITS	(BITS_PER_LONG - _PFN_BITS - OBJ_TAG_BITS)
 #define OBJ_INDEX_MASK	((_AC(1, UL) << OBJ_INDEX_BITS) - 1)

+/*
+ * When using PAE, the obj encoding might overflow if arch does
+ * not re-define MAX_PHYSMEM_BITS, since zsmalloc uses HIGHMEM.
+ * This checks for a future bad page access, when de-coding obj.
+ */
+#define OBJ_OVERFLOW(_pfn) \
+	(((unsigned long long) _pfn << (OBJ_INDEX_BITS + OBJ_TAG_BITS)) >= \
+	((_AC(1, ULL)) << MAX_POSSIBLE_PHYSMEM_BITS) ? 1 : 0)
+
 #define FULLNESS_BITS	2
 #define CLASS_BITS	8
 #define ISOLATED_BITS	3
@@ -871,9 +880,14 @@ static void obj_to_location(unsigned long obj, struct page **page,
  */
 static unsigned long location_to_obj(struct page *page, unsigned int obj_idx)
 {
-	unsigned long obj;
+	unsigned long obj, pfn;
+
+	pfn = page_to_pfn(page);
+
+	if (unlikely(OBJ_OVERFLOW(pfn)))
+		BUG();

-	obj = page_to_pfn(page) << OBJ_INDEX_BITS;
+	obj = pfn << OBJ_INDEX_BITS;
 	obj |= obj_idx & OBJ_INDEX_MASK;
 	obj <<= OBJ_TAG_BITS;
--
2.19.1
Re: [PATCH -V6 14/21] swap: Support to move swap account for PMD swap mapping
Daniel Jordan writes:

> On Wed, Oct 10, 2018 at 03:19:17PM +0800, Huang Ying wrote:
>> +static struct page *mc_handle_swap_pmd(struct vm_area_struct *vma,
>> +		pmd_t pmd, swp_entry_t *entry)
>> +{
>
> Got
>
>   /home/dbbench/linux/mm/memcontrol.c:4719:21: warning: 'mc_handle_swap_pmd' defined but not used [-Wunused-function]
>    static struct page *mc_handle_swap_pmd(struct vm_area_struct *vma,
>
> when
>
>   # CONFIG_TRANSPARENT_HUGEPAGE is not set

Thanks for pointing this out. Will fix it in the next version.

Best Regards,
Huang, Ying
Re: [PATCH v3 4/4] nds32: Add document for NDS32 PMU.
Hi Rob,

On Thu, Oct 25, 2018 at 02:32:48AM +0800, Rob Herring wrote:
> On Wed, Oct 24, 2018 at 11:32:40AM +0800, Nickhu wrote:
> > The document for how to add NDS32 PMU
> > in devicetree.
> >
> > Signed-off-by: Nickhu
> > ---
> >  Documentation/devicetree/bindings/nds32/pmu.txt | 17 +
> >  1 file changed, 17 insertions(+)
> >  create mode 100644 Documentation/devicetree/bindings/nds32/pmu.txt
>
> Can you move to bindings/perf/ with other similar h/w. With that,
>
> Reviewed-by: Rob Herring

Sure, thanks!

Nick
Re: [PATCH -V6 06/21] swap: Support PMD swap mapping when splitting huge PMD
Daniel Jordan writes:

> On Wed, Oct 10, 2018 at 03:19:09PM +0800, Huang Ying wrote:
>> +#ifdef CONFIG_THP_SWAP
>> +/*
>> + * The corresponding page table shouldn't be changed under us, that
>> + * is, the page table lock should be held.
>> + */
>> +int split_swap_cluster_map(swp_entry_t entry)
>> +{
>> +	struct swap_info_struct *si;
>> +	struct swap_cluster_info *ci;
>> +	unsigned long offset = swp_offset(entry);
>> +
>> +	VM_BUG_ON(!IS_ALIGNED(offset, SWAPFILE_CLUSTER));
>> +	si = _swap_info_get(entry);
>> +	if (!si)
>> +		return -EBUSY;
>
> I think this return value doesn't get used anywhere?

Yes. And the error is only possible if the page table is corrupted. So
maybe add a VM_BUG_ON() in its caller __split_huge_swap_pmd()?

Best Regards,
Huang, Ying
RE: [PATCH V1 3/3] ARM: dts: sabreauto: Add flexcan support
-----Original Message-----
From: Fabio Estevam [mailto:feste...@gmail.com]
Sent: October 24, 2018 19:20
To: Joakim Zhang
Cc: Shawn Guo ; Sascha Hauer ; Sascha Hauer ; Fabio Estevam ;
    dl-linux-imx ; Rob Herring ; moderated list:ARM/FREESCALE IMX /
    MXC ARM ARCHITECTURE ; open list:OPEN FIRMWARE AND FLATTENED
    DEVICE TREE BINDINGS ; linux-kernel ; A.s. Dong
Subject: Re: [PATCH V1 3/3] ARM: dts: sabreauto: Add flexcan support

Hi Joakim,

On Wed, Oct 24, 2018 at 8:15 AM Joakim Zhang wrote:
> diff --git a/arch/arm/boot/dts/imx6dl-sabreauto-flexcan1.dts
> b/arch/arm/boot/dts/imx6dl-sabreauto-flexcan1.dts
> new file mode 100644
> index ..963e0b559d60
> --- /dev/null
> +++ b/arch/arm/boot/dts/imx6dl-sabreauto-flexcan1.dts
> @@ -0,0 +1,19 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2013 Freescale Semiconductor, Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.

You can remove this paragraph as you are using SPDX tag.

> +++ b/arch/arm/boot/dts/imx6q-sabreauto-flexcan1.dts
> @@ -0,0 +1,19 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2013 Freescale Semiconductor, Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */

Ditto.

> + {
> +	pinctrl-names = "default";
> +	pinctrl-0 = <_flexcan1>;
> +	pinctrl-assert-gpios = <_b 3 GPIO_ACTIVE_HIGH>; /* TX */

This property does not exist in upstream.

Hi Fabio,

I will modify what you have commented. Thanks a lot.

Best Regards,
Joakim Zhang
Re: [PATCH 1/2] perf: Add munmap callback
On Wed, Oct 24, 2018 at 5:34 PM Peter Zijlstra wrote:
>
> On Wed, Oct 24, 2018 at 05:25:59PM -0700, Stephane Eranian wrote:
> > On Wed, Oct 24, 2018 at 5:23 PM Peter Zijlstra wrote:
> > > That is actually a different problem. And you're right, we never did
> > > fix that.
> >
> > it is a different problem but the solution is the same:
> > PERF_RECORD_UNMAP!
>
> But he's not actually doing PERF_RECORD_UNMAP, nor is his perf_munmap()
> hook sufficient to actually generate those.
>
You're right. But I saw that and thought, this is going in the right
direction: tracking munmap. I agree with you that we should use this
opportunity to track unmap for his purpose but also the issue I raised,
which needs a new record type based on the new unmap hook.

> Now I agree that if he's going to do an munmap hook, he should do it
> 'right' and at the very least allow for PERF_RECORD_UNMAP to be done,
> but ideally simply pick up and finish that patch we had back then.

Agreed.
Re: [PATCH -V6 00/21] swap: Swapout/swapin THP in one piece
Daniel Jordan writes:

> On Wed, Oct 24, 2018 at 11:31:42AM +0800, Huang, Ying wrote:
>> Hi, Daniel,
>>
>> Daniel Jordan writes:
>>
>> > On Wed, Oct 10, 2018 at 03:19:03PM +0800, Huang Ying wrote:
>> >> And for all, Any comment is welcome!
>> >>
>> >> This patchset is based on the 2018-10-3 head of mmotm/master.
>> >
>> > There seems to be some infrequent memory corruption with THPs that
>> > have been swapped out: page contents differ after swapin.
>>
>> Thanks a lot for testing this! I know there were big effort behind
>> this and it definitely will improve the quality of the patchset
>> greatly!
>
> You're welcome! Hopefully I'll have more results and tests to share in
> the next two weeks.
>
>> > Reproducer at the bottom. Part of some tests I'm writing, had to
>> > separate it a little hack-ily. Basically it writes the word offset
>> > _at_ each word offset in a memory blob, tries to push it to swap,
>> > and verifies the offset is the same after swapin.
>> >
>> > I ran with THP enabled=always. THP swapin_enabled could be always or
>> > never, it happened with both. Every time swapping occurred, a single
>> > THP-sized chunk in the middle of the blob had different offsets.
>> > Example:
>> >
>> > ** > word corruption gap
>> > ** corruption detected 14929920 bytes in (got 15179776, expected 14929920) **
>> > ** corruption detected 14929928 bytes in (got 15179784, expected 14929928) **
>> > ** corruption detected 14929936 bytes in (got 15179792, expected 14929936) **
>> > ...pattern continues...
>> > ** corruption detected 17027048 bytes in (got 15179752, expected 17027048) **
>> > ** corruption detected 17027056 bytes in (got 15179760, expected 17027056) **
>> > ** corruption detected 17027064 bytes in (got 15179768, expected 17027064) **
>>
>> 15179776 < 15179xxx <= 17027064
>>
>> 15179776 % 4096 = 0
>>
>> And 15179776 = 15179768 + 8
>>
>> So I guess we have some alignment bug. Could you try the patches
>> attached? It deal with some alignment issue.
>
> That fixed it. And removed three lines of code. Nice :)

Thanks! I will merge the fixes into the patchset.

Best Regards,
Huang, Ying