[PATCH V3 08/10] OPP: Configure all required OPPs

2018-10-24 Thread Viresh Kumar
Now that all the infrastructure is in place to support multiple required
OPPs, let's switch over to using it.

A new internal routine, _set_required_opps(), takes care of updating the
performance state for all the required OPPs. With this, performance state
updates are supported even when the end device also needs to configure
regulators, which wasn't the case earlier.

The pstates were earlier stored in the end device's OPP structures; that
also changes now, as those values are stored in the genpd's OPP
structures instead. We therefore switch over to using
pm_genpd_opp_to_performance_state() instead of
of_genpd_opp_to_performance_state() to get the performance state of the
genpd OPPs.

The routine _generic_set_opp_domain() is not required anymore and is
removed.

On errors we don't try to recover by reverting to the old settings, as
things are quite complex now and the calls here should never fail unless
there is a bug. There is no point adding complexity for code which will
never be executed.
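
For reference, a rough illustrative sketch (not the verbatim patch; the
wrapper name and surrounding details here are made up) of how
_set_required_opps() is ordered around the frequency change inside
dev_pm_opp_set_rate(), mirroring the ordering of the removed
_generic_set_opp_domain():

/* Illustrative only: required OPPs are configured before the frequency
 * when scaling up, and after it when scaling down. */
static int _set_opp_ordering_sketch(struct device *dev,
                                    struct opp_table *opp_table,
                                    struct dev_pm_opp *opp,
                                    unsigned long old_freq, unsigned long freq)
{
        int ret = 0;

        /* Scaling up? Configure required OPPs before the frequency change. */
        if (freq > old_freq) {
                ret = _set_required_opps(dev, opp_table, opp);
                if (ret)
                        return ret;
        }

        /* ... clock/regulator/custom set_opp work happens here ... */

        /* Scaling down? Configure required OPPs after the frequency change. */
        if (freq < old_freq)
                ret = _set_required_opps(dev, opp_table, opp);

        return ret;
}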

Signed-off-by: Viresh Kumar 
---
 drivers/opp/core.c | 113 ++---
 drivers/opp/of.c   |   5 +-
 2 files changed, 68 insertions(+), 50 deletions(-)

diff --git a/drivers/opp/core.c b/drivers/opp/core.c
index cef2ccda355d..0eaa954b3f6c 100644
--- a/drivers/opp/core.c
+++ b/drivers/opp/core.c
@@ -548,44 +548,6 @@ _generic_set_opp_clk_only(struct device *dev, struct clk *clk,
return ret;
 }
 
-static inline int
-_generic_set_opp_domain(struct device *dev, struct clk *clk,
-   unsigned long old_freq, unsigned long freq,
-   unsigned int old_pstate, unsigned int new_pstate)
-{
-   int ret;
-
-   /* Scaling up? Scale domain performance state before frequency */
-   if (freq > old_freq) {
-   ret = dev_pm_genpd_set_performance_state(dev, new_pstate);
-   if (ret)
-   return ret;
-   }
-
-   ret = _generic_set_opp_clk_only(dev, clk, old_freq, freq);
-   if (ret)
-   goto restore_domain_state;
-
-   /* Scaling down? Scale domain performance state after frequency */
-   if (freq < old_freq) {
-   ret = dev_pm_genpd_set_performance_state(dev, new_pstate);
-   if (ret)
-   goto restore_freq;
-   }
-
-   return 0;
-
-restore_freq:
-   if (_generic_set_opp_clk_only(dev, clk, freq, old_freq))
-   dev_err(dev, "%s: failed to restore old-freq (%lu Hz)\n",
-   __func__, old_freq);
-restore_domain_state:
-   if (freq > old_freq)
-   dev_pm_genpd_set_performance_state(dev, old_pstate);
-
-   return ret;
-}
-
 static int _generic_set_opp_regulator(const struct opp_table *opp_table,
  struct device *dev,
  unsigned long old_freq,
@@ -663,6 +625,56 @@ static int _set_opp_custom(const struct opp_table *opp_table,
return opp_table->set_opp(data);
 }
 
+/* This is only called for PM domain for now */
+static int _set_required_opps(struct device *dev,
+ struct opp_table *opp_table,
+ struct dev_pm_opp *opp)
+{
+   struct opp_table **required_opp_tables = opp_table->required_opp_tables;
+   struct device **genpd_virt_devs = opp_table->genpd_virt_devs;
+   unsigned int pstate;
+   int i, ret = 0;
+
+   if (!required_opp_tables)
+   return 0;
+
+   /* Single genpd case */
+   if (!genpd_virt_devs) {
+   pstate = opp->required_opps[0]->pstate;
+   ret = dev_pm_genpd_set_performance_state(dev, pstate);
+   if (ret) {
+   dev_err(dev, "Failed to set performance state of %s: %d (%d)\n",
+   dev_name(dev), pstate, ret);
+   }
+   return ret;
+   }
+
+   /* Multiple genpd case */
+
+   /*
+* Acquire genpd_virt_dev_lock to make sure we don't use a genpd_dev
+* after it is freed from another thread.
+*/
+   mutex_lock(&opp_table->genpd_virt_dev_lock);
+
+   for (i = 0; i < opp_table->required_opp_count; i++) {
+   pstate = opp->required_opps[i]->pstate;
+
+   if (!genpd_virt_devs[i])
+   continue;
+
+   ret = dev_pm_genpd_set_performance_state(genpd_virt_devs[i], pstate);
+   if (ret) {
+   dev_err(dev, "Failed to set performance rate of %s: %d (%d)\n",
+   dev_name(genpd_virt_devs[i]), pstate, ret);
+   break;
+   }
+   }
+   mutex_unlock(&opp_table->genpd_virt_dev_lock);
+
+   return ret;
+}
+
 /**
  * dev_pm_opp_set_rate() - Configure new OPP based on frequency
  * @dev:device for which we do this operation
@@ -730,6 +742,13 @@ int dev_pm_opp_set_rate(struct device *dev, unsigned long 

[PATCH V3 05/10] OPP: Populate OPPs from "required-opps" property

2018-10-24 Thread Viresh Kumar
An earlier commit populated the OPP tables from the "required-opps"
property; this commit populates the individual OPPs. This is repeated for
each OPP in the OPP table, and the populated OPPs will be used by later
commits.
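
For illustration, a minimal sketch (a hypothetical debug helper, not part
of this patch) of what the populated array means: entry i of
opp->required_opps points at the OPP, inside
opp_table->required_opp_tables[i], that the i-th "required-opps" phandle of
this OPP's DT node refers to:

static void _dump_required_opps(struct opp_table *opp_table,
                                struct dev_pm_opp *opp)
{
        int i;

        /* Walk the per-OPP array filled by _of_opp_alloc_required_opps(). */
        for (i = 0; i < opp_table->required_opp_count; i++)
                pr_debug("required OPP %d: %pOF\n", i,
                         opp->required_opps[i]->np);
}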

Reviewed-by: Ulf Hansson 
Signed-off-by: Viresh Kumar 
---
 drivers/opp/core.c |  1 +
 drivers/opp/of.c   | 81 --
 drivers/opp/opp.h  |  6 
 3 files changed, 86 insertions(+), 2 deletions(-)

diff --git a/drivers/opp/core.c b/drivers/opp/core.c
index 85174a5c4850..02a69a62dac8 100644
--- a/drivers/opp/core.c
+++ b/drivers/opp/core.c
@@ -976,6 +976,7 @@ static void _opp_kref_release(struct kref *kref)
 * frequency/voltage list.
 */
blocking_notifier_call_chain(&opp_table->head, OPP_EVENT_REMOVE, opp);
+   _of_opp_free_required_opps(opp_table, opp);
opp_debug_remove_one(opp);
list_del(&opp->node);
kfree(opp);
diff --git a/drivers/opp/of.c b/drivers/opp/of.c
index b5605196122a..ffaeefef98ce 100644
--- a/drivers/opp/of.c
+++ b/drivers/opp/of.c
@@ -259,6 +259,77 @@ void _of_clear_opp_table(struct opp_table *opp_table)
_opp_table_free_required_tables(opp_table);
 }
 
+/*
+ * Release all resources previously acquired with a call to
+ * _of_opp_alloc_required_opps().
+ */
+void _of_opp_free_required_opps(struct opp_table *opp_table,
+   struct dev_pm_opp *opp)
+{
+   struct dev_pm_opp **required_opps = opp->required_opps;
+   int i;
+
+   if (!required_opps)
+   return;
+
+   for (i = 0; i < opp_table->required_opp_count; i++) {
+   if (!required_opps[i])
+   break;
+
+   /* Put the reference back */
+   dev_pm_opp_put(required_opps[i]);
+   }
+
+   kfree(required_opps);
+   opp->required_opps = NULL;
+}
+
+/* Populate all required OPPs which are part of "required-opps" list */
+static int _of_opp_alloc_required_opps(struct opp_table *opp_table,
+  struct dev_pm_opp *opp)
+{
+   struct dev_pm_opp **required_opps;
+   struct opp_table *required_table;
+   struct device_node *np;
+   int i, ret, count = opp_table->required_opp_count;
+
+   if (!count)
+   return 0;
+
+   required_opps = kcalloc(count, sizeof(*required_opps), GFP_KERNEL);
+   if (!required_opps)
+   return -ENOMEM;
+
+   opp->required_opps = required_opps;
+
+   for (i = 0; i < count; i++) {
+   required_table = opp_table->required_opp_tables[i];
+
+   np = of_parse_required_opp(opp->np, i);
+   if (unlikely(!np)) {
+   ret = -ENODEV;
+   goto free_required_opps;
+   }
+
+   required_opps[i] = _find_opp_of_np(required_table, np);
+   of_node_put(np);
+
+   if (!required_opps[i]) {
+   pr_err("%s: Unable to find required OPP node: %pOF (%d)\n",
+  __func__, opp->np, i);
+   ret = -ENODEV;
+   goto free_required_opps;
+   }
+   }
+
+   return 0;
+
+free_required_opps:
+   _of_opp_free_required_opps(opp_table, opp);
+
+   return ret;
+}
+
 static bool _opp_is_supported(struct device *dev, struct opp_table *opp_table,
  struct device_node *np)
 {
@@ -503,6 +574,10 @@ static struct dev_pm_opp *_opp_add_static_v2(struct opp_table *opp_table,
new_opp->dynamic = false;
new_opp->available = true;
 
+   ret = _of_opp_alloc_required_opps(opp_table, new_opp);
+   if (ret)
+   goto free_opp;
+
if (!of_property_read_u32(np, "clock-latency-ns", &val))
new_opp->clock_latency_ns = val;
 
@@ -510,14 +585,14 @@ static struct dev_pm_opp *_opp_add_static_v2(struct opp_table *opp_table,
 
ret = opp_parse_supplies(new_opp, dev, opp_table);
if (ret)
-   goto free_opp;
+   goto free_required_opps;
 
ret = _opp_add(dev, new_opp, opp_table, rate_not_available);
if (ret) {
/* Don't return error for duplicate OPPs */
if (ret == -EBUSY)
ret = 0;
-   goto free_opp;
+   goto free_required_opps;
}
 
/* OPP to select on device suspend */
@@ -547,6 +622,8 @@ static struct dev_pm_opp *_opp_add_static_v2(struct opp_table *opp_table,
blocking_notifier_call_chain(&opp_table->head, OPP_EVENT_ADD, new_opp);
return new_opp;
 
+free_required_opps:
+   _of_opp_free_required_opps(opp_table, new_opp);
 free_opp:
_opp_free(new_opp);
 
diff --git a/drivers/opp/opp.h b/drivers/opp/opp.h
index 024e1be23d37..24b340ad18d1 100644
--- a/drivers/opp/opp.h
+++ b/drivers/opp/opp.h
@@ -63,6 +63,7 @@ extern struct list_head opp_tables;
  * @supplies:  Power supplies 

[PATCH V3 02/10] OPP: Identify and mark genpd OPP tables

2018-10-24 Thread Viresh Kumar
We need to handle genpd OPP tables differently; this is already the case at
one location and will be extended going forward. Add another field to the
OPP table to indicate whether the table belongs to a genpd.

Reviewed-by: Ulf Hansson 
Signed-off-by: Viresh Kumar 
---
 drivers/opp/of.c  | 6 --
 drivers/opp/opp.h | 2 ++
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/opp/of.c b/drivers/opp/of.c
index 5a4b47958073..5f114cd3d88c 100644
--- a/drivers/opp/of.c
+++ b/drivers/opp/of.c
@@ -92,6 +92,9 @@ void _of_init_opp_table(struct opp_table *opp_table, struct device *dev,
of_property_read_u32(np, "voltage-tolerance",
 &opp_table->voltage_tolerance_v1);
 
+   if (of_find_property(np, "#power-domain-cells", NULL))
+   opp_table->is_genpd = true;
+
/* Get OPP table node */
opp_np = _opp_of_get_opp_desc_node(np, index);
of_node_put(np);
@@ -326,8 +329,7 @@ static struct dev_pm_opp *_opp_add_static_v2(struct opp_table *opp_table,
ret = of_property_read_u64(np, "opp-hz", &rate);
if (ret < 0) {
/* "opp-hz" is optional for devices like power domains. */
-   if (!of_find_property(dev->of_node, "#power-domain-cells",
- NULL)) {
+   if (!opp_table->is_genpd) {
dev_err(dev, "%s: opp-hz not found\n", __func__);
goto free_opp;
}
diff --git a/drivers/opp/opp.h b/drivers/opp/opp.h
index 9c6544b4f4f9..cdb0c2b095e2 100644
--- a/drivers/opp/opp.h
+++ b/drivers/opp/opp.h
@@ -140,6 +140,7 @@ enum opp_table_access {
  * @regulators: Supply regulators
  * @regulator_count: Number of power supply regulators
  * @genpd_performance_state: Device's power domain support performance state.
+ * @is_genpd: Marks if the OPP table belongs to a genpd.
  * @set_opp: Platform specific set_opp callback
  * @set_opp_data: Data to be passed to set_opp callback
  * @dentry:debugfs dentry pointer of the real device directory (not links).
@@ -178,6 +179,7 @@ struct opp_table {
struct regulator **regulators;
unsigned int regulator_count;
bool genpd_performance_state;
+   bool is_genpd;
 
int (*set_opp)(struct dev_pm_set_opp_data *data);
struct dev_pm_set_opp_data *set_opp_data;
-- 
2.19.1.568.g152ad8e3369a



[PATCH V3 07/10] OPP: Add dev_pm_opp_{set|put}_genpd_virt_dev() helper

2018-10-24 Thread Viresh Kumar
Multiple generic power domains for a consumer device are supported with
the help of virtual devices, which are created for each consumer device
- genpd pair. These are the device structures which are attached to the
power domain and are required by the OPP core to set the performance
state of the genpd.

The helpers added by this commit need to be called once for each of these
virtual devices. They are required only if multiple domains are available
for a device; otherwise the actual device structure will be used by the OPP
core instead.

The new helpers also support the complex cases where the consumer device
wouldn't always require all the domains. For example, a camera may
require only one power domain during normal operations but two during
high resolution operations. The consumer driver can call
dev_pm_opp_put_genpd_virt_dev(high_resolution_genpd_virt_dev) if it is
currently operating in the normal mode and doesn't have any performance
requirements from the genpd which manages high resolution power
requirements. The consumer driver can later call
dev_pm_opp_set_genpd_virt_dev(high_resolution_genpd_virt_dev) once it
switches back to the high resolution mode.

The new helpers differ from the other OPP set/put helpers in that they can
be called even after the OPPs for the table have been initialized, as we
may need to call them on the fly because of the complex case explained
above. For this reason the genpd virt_dev structures may be used in
parallel while the new helpers are running, and a new mutex is added to
protect against that. We don't reuse the existing opp_table->lock mutex, as
that is widely used in the OPP core; we will also need this new lock in the
dev_pm_opp_set_rate() path while changing OPPs, and contention must be kept
low there since that is the hotpath.
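
For illustration, a minimal consumer-side sketch (the driver names are
hypothetical and not part of this patch) of how a device with two power
domains is expected to use the new helper; it assumes the virtual genpd
devices are obtained via something like dev_pm_domain_attach_by_id(), and
error handling is simplified:

struct foo_priv {
        struct device *virt_dev[2];
        struct opp_table *opp_table[2];
};

static int foo_attach_domains(struct device *dev, struct foo_priv *priv)
{
        int i;

        for (i = 0; i < 2; i++) {
                /* One virtual device per genpd of the consumer. */
                priv->virt_dev[i] = dev_pm_domain_attach_by_id(dev, i);
                if (IS_ERR_OR_NULL(priv->virt_dev[i]))
                        return -ENODEV;

                /* Tell the OPP core which device votes on domain i. */
                priv->opp_table[i] = dev_pm_opp_set_genpd_virt_dev(dev,
                                                priv->virt_dev[i], i);
                if (IS_ERR(priv->opp_table[i]))
                        return PTR_ERR(priv->opp_table[i]);
        }

        return 0;
}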

Signed-off-by: Viresh Kumar 
---
 drivers/opp/core.c | 88 ++
 drivers/opp/of.c   | 16 +++-
 drivers/opp/opp.h  |  4 ++
 include/linux/pm_opp.h |  8 
 4 files changed, 115 insertions(+), 1 deletion(-)

diff --git a/drivers/opp/core.c b/drivers/opp/core.c
index 02a69a62dac8..cef2ccda355d 100644
--- a/drivers/opp/core.c
+++ b/drivers/opp/core.c
@@ -823,6 +823,7 @@ static struct opp_table *_allocate_opp_table(struct device *dev, int index)
return NULL;
 
mutex_init(&opp_table->lock);
+   mutex_init(&opp_table->genpd_virt_dev_lock);
INIT_LIST_HEAD(&opp_table->dev_list);
 
opp_dev = _add_opp_dev(dev, opp_table);
@@ -920,6 +921,7 @@ static void _opp_table_kref_release(struct kref *kref)
_remove_opp_dev(opp_dev, opp_table);
}
 
+   mutex_destroy(&opp_table->genpd_virt_dev_lock);
mutex_destroy(&opp_table->lock);
list_del(&opp_table->node);
kfree(opp_table);
@@ -1602,6 +1604,92 @@ void dev_pm_opp_unregister_set_opp_helper(struct opp_table *opp_table)
 }
 EXPORT_SYMBOL_GPL(dev_pm_opp_unregister_set_opp_helper);
 
+/**
+ * dev_pm_opp_set_genpd_virt_dev - Set virtual genpd device for an index
+ * @dev: Consumer device for which the genpd device is getting set.
+ * @virt_dev: virtual genpd device.
+ * @index: index.
+ *
+ * Multiple generic power domains for a device are supported with the help of
+ * virtual genpd devices, which are created for each consumer device - genpd
+ * pair. These are the device structures which are attached to the power domain
+ * and are required by the OPP core to set the performance state of the genpd.
+ *
+ * This helper will normally be called by the consumer driver of the device
+ * "dev", as only that has details of the genpd devices.
+ *
+ * This helper needs to be called once for each of those virtual devices, but
+ * only if multiple domains are available for a device. Otherwise the original
+ * device structure will be used instead by the OPP core.
+ */
+struct opp_table *dev_pm_opp_set_genpd_virt_dev(struct device *dev,
+   struct device *virt_dev,
+   int index)
+{
+   struct opp_table *opp_table;
+
+   opp_table = dev_pm_opp_get_opp_table(dev);
+   if (!opp_table)
+   return ERR_PTR(-ENOMEM);
+
+   mutex_lock(&opp_table->genpd_virt_dev_lock);
+
+   if (unlikely(!opp_table->genpd_virt_devs ||
+index >= opp_table->required_opp_count ||
+opp_table->genpd_virt_devs[index])) {
+
+   dev_err(dev, "Invalid request to set required device\n");
+   dev_pm_opp_put_opp_table(opp_table);
+   mutex_unlock(&opp_table->genpd_virt_dev_lock);
+
+   return ERR_PTR(-EINVAL);
+   }
+
+   opp_table->genpd_virt_devs[index] = virt_dev;
+   mutex_unlock(&opp_table->genpd_virt_dev_lock);
+
+   return opp_table;
+}
+
+/**
+ * dev_pm_opp_put_genpd_virt_dev() - Releases resources blocked for genpd device.
+ * @opp_table: OPP table returned by 

[PATCH V3 06/10] PM / Domains: Add genpd_opp_to_performance_state()

2018-10-24 Thread Viresh Kumar
The OPP core currently stores the performance state in the consumer
device's OPP table, but that is going to change: going forward, the
performance state will instead be stored directly in the genpd's OPP table.

For that we need to get the performance state for genpd's device
structure (genpd->dev) instead of the consumer device's structure. Add a
new helper to do that.
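
For illustration, this is roughly how the OPP core is expected to consume
the new helper when it adds a static OPP to a table marked is_genpd (a
paraphrased sketch, not a verbatim hunk from this series):

/* Inside _opp_add_static_v2(), for OPP tables that belong to a genpd. */
if (opp_table->is_genpd)
        new_opp->pstate = pm_genpd_opp_to_performance_state(dev, new_opp);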

Signed-off-by: Viresh Kumar 
---
 drivers/base/power/domain.c | 32 
 include/linux/pm_domain.h   |  9 +
 2 files changed, 41 insertions(+)

diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
index fe9b0527b161..7be8c94c6b7f 100644
--- a/drivers/base/power/domain.c
+++ b/drivers/base/power/domain.c
@@ -2520,6 +2520,38 @@ int of_genpd_parse_idle_states(struct device_node *dn,
 }
 EXPORT_SYMBOL_GPL(of_genpd_parse_idle_states);
 
+/**
+ * pm_genpd_opp_to_performance_state - Gets performance state of the genpd from its OPP node.
+ *
+ * @genpd_dev: Genpd's device for which the performance-state needs to be found.
+ * @opp: struct dev_pm_opp of the OPP for which we need to find performance
+ * state.
+ *
+ * Returns performance state encoded in the OPP of the genpd. This calls
+ * platform specific genpd->opp_to_performance_state() callback to translate
+ * power domain OPP to performance state.
+ *
+ * Returns performance state on success and 0 on failure.
+ */
+unsigned int pm_genpd_opp_to_performance_state(struct device *genpd_dev,
+  struct dev_pm_opp *opp)
+{
+   struct generic_pm_domain *genpd = NULL;
+   int state;
+
+   genpd = container_of(genpd_dev, struct generic_pm_domain, dev);
+
+   if (unlikely(!genpd->opp_to_performance_state))
+   return 0;
+
+   genpd_lock(genpd);
+   state = genpd->opp_to_performance_state(genpd, opp);
+   genpd_unlock(genpd);
+
+   return state;
+}
+EXPORT_SYMBOL_GPL(pm_genpd_opp_to_performance_state);
+
 /**
  * of_genpd_opp_to_performance_state- Gets performance state of device's
  * power domain corresponding to a DT node's "required-opps" property.
diff --git a/include/linux/pm_domain.h b/include/linux/pm_domain.h
index 3b5d7280e52e..4f803f934308 100644
--- a/include/linux/pm_domain.h
+++ b/include/linux/pm_domain.h
@@ -258,6 +258,8 @@ int of_genpd_add_subdomain(struct of_phandle_args *parent,
 struct generic_pm_domain *of_genpd_remove_last(struct device_node *np);
 int of_genpd_parse_idle_states(struct device_node *dn,
   struct genpd_power_state **states, int *n);
+unsigned int pm_genpd_opp_to_performance_state(struct device *genpd_dev,
+  struct dev_pm_opp *opp);
 unsigned int of_genpd_opp_to_performance_state(struct device *dev,
struct device_node *np);
 
@@ -299,6 +301,13 @@ static inline int of_genpd_parse_idle_states(struct device_node *dn,
return -ENODEV;
 }
 
+static inline unsigned int
+pm_genpd_opp_to_performance_state(struct device *genpd_dev,
+ struct dev_pm_opp *opp)
+{
+   return 0;
+}
+
 static inline unsigned int
 of_genpd_opp_to_performance_state(struct device *dev,
  struct device_node *np)
-- 
2.19.1.568.g152ad8e3369a



[PATCH V3 09/10] OPP: Rename and relocate of_genpd_opp_to_performance_state()

2018-10-24 Thread Viresh Kumar
The OPP core already has the performance state values for each of the
genpd's OPPs, so there is no need to call the genpd callback again to get
the performance state for the case where the end device doesn't have an OPP
table and has the "required-opps" property directly in its node.

This commit renames of_genpd_opp_to_performance_state() to
of_get_required_opp_performance_state() and moves it to the OPP core, as it
is now purely about OPPs.
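
For illustration, a minimal caller sketch (hypothetical driver, not part of
this patch): a device whose own node carries "required-opps" can translate
the phandle at a given index into a performance state and vote for it
directly:

static int foo_set_boot_pstate(struct device *dev)
{
        unsigned int pstate;

        /* Index 0: first phandle in the node's "required-opps" list. */
        pstate = of_get_required_opp_performance_state(dev->of_node, 0);
        if (!pstate)
                return -ENODEV;

        return dev_pm_genpd_set_performance_state(dev, pstate);
}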

Signed-off-by: Viresh Kumar 
---
 drivers/base/power/domain.c | 48 -
 drivers/opp/of.c| 44 ++
 include/linux/pm_domain.h   |  9 ---
 include/linux/pm_opp.h  |  5 
 4 files changed, 49 insertions(+), 57 deletions(-)

diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
index 7be8c94c6b7f..8e554e6a82a2 100644
--- a/drivers/base/power/domain.c
+++ b/drivers/base/power/domain.c
@@ -2552,54 +2552,6 @@ unsigned int pm_genpd_opp_to_performance_state(struct device *genpd_dev,
 }
 EXPORT_SYMBOL_GPL(pm_genpd_opp_to_performance_state);
 
-/**
- * of_genpd_opp_to_performance_state- Gets performance state of device's
- * power domain corresponding to a DT node's "required-opps" property.
- *
- * @dev: Device for which the performance-state needs to be found.
- * @np: DT node where the "required-opps" property is present. This can be
- * the device node itself (if it doesn't have an OPP table) or a node
- * within the OPP table of a device (if device has an OPP table).
- *
- * Returns performance state corresponding to the "required-opps" property of
- * a DT node. This calls platform specific genpd->opp_to_performance_state()
- * callback to translate power domain OPP to performance state.
- *
- * Returns performance state on success and 0 on failure.
- */
-unsigned int of_genpd_opp_to_performance_state(struct device *dev,
-  struct device_node *np)
-{
-   struct generic_pm_domain *genpd;
-   struct dev_pm_opp *opp;
-   int state = 0;
-
-   genpd = dev_to_genpd(dev);
-   if (IS_ERR(genpd))
-   return 0;
-
-   if (unlikely(!genpd->set_performance_state))
-   return 0;
-
-   genpd_lock(genpd);
-
-   opp = of_dev_pm_opp_find_required_opp(&genpd->dev, np);
-   if (IS_ERR(opp)) {
-   dev_err(dev, "Failed to find required OPP: %ld\n",
-   PTR_ERR(opp));
-   goto unlock;
-   }
-
-   state = genpd->opp_to_performance_state(genpd, opp);
-   dev_pm_opp_put(opp);
-
-unlock:
-   genpd_unlock(genpd);
-
-   return state;
-}
-EXPORT_SYMBOL_GPL(of_genpd_opp_to_performance_state);
-
 static int __init genpd_bus_init(void)
 {
return bus_register(_bus_type);
diff --git a/drivers/opp/of.c b/drivers/opp/of.c
index 4e494720ac25..0755ee307b1a 100644
--- a/drivers/opp/of.c
+++ b/drivers/opp/of.c
@@ -969,6 +969,50 @@ int dev_pm_opp_of_get_sharing_cpus(struct device *cpu_dev,
 }
 EXPORT_SYMBOL_GPL(dev_pm_opp_of_get_sharing_cpus);
 
+/**
+ * of_get_required_opp_performance_state() - Search for required OPP and return its performance state.
+ * @np: Node that contains the "required-opps" property.
+ * @index: Index of the phandle to parse.
+ *
+ * Returns the performance state of the OPP pointed out by the "required-opps"
+ * property at @index in @np.
+ *
+ * Return: Positive performance state on success, otherwise 0 on errors.
+ */
+unsigned int of_get_required_opp_performance_state(struct device_node *np,
+  int index)
+{
+   struct dev_pm_opp *opp;
+   struct device_node *required_np;
+   struct opp_table *opp_table;
+   unsigned int pstate = 0;
+
+   required_np = of_parse_required_opp(np, index);
+   if (!required_np)
+   return -ENODEV;
+
+   opp_table = _find_table_of_opp_np(required_np);
+   if (IS_ERR(opp_table)) {
+   pr_err("%s: Failed to find required OPP table %pOF: %ld\n",
+  __func__, np, PTR_ERR(opp_table));
+   goto put_required_np;
+   }
+
+   opp = _find_opp_of_np(opp_table, required_np);
+   if (opp) {
+   pstate = opp->pstate;
+   dev_pm_opp_put(opp);
+   }
+
+   dev_pm_opp_put_opp_table(opp_table);
+
+put_required_np:
+   of_node_put(required_np);
+
+   return pstate;
+}
+EXPORT_SYMBOL_GPL(of_get_required_opp_performance_state);
+
 /**
  * of_dev_pm_opp_find_required_opp() - Search for required OPP.
  * @dev: The device whose OPP node is referenced by the 'np' DT node.
diff --git a/include/linux/pm_domain.h b/include/linux/pm_domain.h
index 4f803f934308..642036952553 100644
--- a/include/linux/pm_domain.h
+++ b/include/linux/pm_domain.h
@@ -260,8 +260,6 @@ int of_genpd_parse_idle_states(struct device_node *dn,
   struct genpd_power_state **states, int *n);
 unsigned int 

[PATCH V3 10/10] OPP: Remove of_dev_pm_opp_find_required_opp()

2018-10-24 Thread Viresh Kumar
This isn't used anymore, remove it.

Signed-off-by: Viresh Kumar 
---
 drivers/opp/of.c   | 54 --
 include/linux/pm_opp.h |  5 
 2 files changed, 59 deletions(-)

diff --git a/drivers/opp/of.c b/drivers/opp/of.c
index 0755ee307b1a..da31874c7fe9 100644
--- a/drivers/opp/of.c
+++ b/drivers/opp/of.c
@@ -1013,60 +1013,6 @@ unsigned int of_get_required_opp_performance_state(struct device_node *np,
 }
 EXPORT_SYMBOL_GPL(of_get_required_opp_performance_state);
 
-/**
- * of_dev_pm_opp_find_required_opp() - Search for required OPP.
- * @dev: The device whose OPP node is referenced by the 'np' DT node.
- * @np: Node that contains the "required-opps" property.
- *
- * Returns the OPP of the device 'dev', whose phandle is present in the "np"
- * node. Although the "required-opps" property supports having multiple
- * phandles, this helper routine only parses the very first phandle in the list.
- *
- * Return: Matching opp, else returns ERR_PTR in case of error and should be
- * handled using IS_ERR.
- *
- * The callers are required to call dev_pm_opp_put() for the returned OPP after
- * use.
- */
-struct dev_pm_opp *of_dev_pm_opp_find_required_opp(struct device *dev,
-  struct device_node *np)
-{
-   struct dev_pm_opp *temp_opp, *opp = ERR_PTR(-ENODEV);
-   struct device_node *required_np;
-   struct opp_table *opp_table;
-
-   opp_table = _find_opp_table(dev);
-   if (IS_ERR(opp_table))
-   return ERR_CAST(opp_table);
-
-   required_np = of_parse_phandle(np, "required-opps", 0);
-   if (unlikely(!required_np)) {
-   dev_err(dev, "Unable to parse required-opps\n");
-   goto put_opp_table;
-   }
-
-   mutex_lock(&opp_table->lock);
-
-   list_for_each_entry(temp_opp, &opp_table->opp_list, node) {
-   if (temp_opp->available && temp_opp->np == required_np) {
-   opp = temp_opp;
-
-   /* Increment the reference count of OPP */
-   dev_pm_opp_get(opp);
-   break;
-   }
-   }
-
-   mutex_unlock(&opp_table->lock);
-
-   of_node_put(required_np);
-put_opp_table:
-   dev_pm_opp_put_opp_table(opp_table);
-
-   return opp;
-}
-EXPORT_SYMBOL_GPL(of_dev_pm_opp_find_required_opp);
-
 /**
  * dev_pm_opp_get_of_node() - Gets the DT node corresponding to an opp
  * @opp:   opp for which DT node has to be returned for
diff --git a/include/linux/pm_opp.h b/include/linux/pm_opp.h
index 889bb347fbd9..2b2c3fd985ab 100644
--- a/include/linux/pm_opp.h
+++ b/include/linux/pm_opp.h
@@ -313,7 +313,6 @@ int dev_pm_opp_of_cpumask_add_table(const struct cpumask *cpumask);
 void dev_pm_opp_of_cpumask_remove_table(const struct cpumask *cpumask);
 int dev_pm_opp_of_get_sharing_cpus(struct device *cpu_dev, struct cpumask *cpumask);
 struct device_node *dev_pm_opp_of_get_opp_desc_node(struct device *dev);
-struct dev_pm_opp *of_dev_pm_opp_find_required_opp(struct device *dev, struct device_node *np);
 struct device_node *dev_pm_opp_get_of_node(struct dev_pm_opp *opp);
 unsigned int of_get_required_opp_performance_state(struct device_node *np, int index);
 #else
@@ -350,10 +349,6 @@ static inline struct device_node *dev_pm_opp_of_get_opp_desc_node(struct device
return NULL;
 }
 
-static inline struct dev_pm_opp *of_dev_pm_opp_find_required_opp(struct device *dev, struct device_node *np)
-{
-   return NULL;
-}
 static inline struct device_node *dev_pm_opp_get_of_node(struct dev_pm_opp *opp)
 {
return NULL;
-- 
2.19.1.568.g152ad8e3369a



[PATCH V3 04/10] OPP: Populate required opp tables from "required-opps" property

2018-10-24 Thread Viresh Kumar
The current implementation works only for the case where a single
phandle is present in the "required-opps" property, while DT allows
multiple phandles to be present there.

This patch adds new infrastructure to parse all the phandles present in the
"required-opps" property and save pointers to the required OPPs' OPP
tables. These will be used by later commits.

Reviewed-by: Ulf Hansson 
Signed-off-by: Viresh Kumar 
---
 drivers/opp/core.c |   2 +
 drivers/opp/of.c   | 147 +
 drivers/opp/opp.h  |   8 +++
 3 files changed, 157 insertions(+)

diff --git a/drivers/opp/core.c b/drivers/opp/core.c
index ebb3b648e0fd..85174a5c4850 100644
--- a/drivers/opp/core.c
+++ b/drivers/opp/core.c
@@ -901,6 +901,8 @@ static void _opp_table_kref_release(struct kref *kref)
struct opp_table *opp_table = container_of(kref, struct opp_table, kref);
struct opp_device *opp_dev, *temp;
 
+   _of_clear_opp_table(opp_table);
+
/* Release clk */
if (!IS_ERR(opp_table->clk))
clk_put(opp_table->clk);
diff --git a/drivers/opp/of.c b/drivers/opp/of.c
index 5f114cd3d88c..b5605196122a 100644
--- a/drivers/opp/of.c
+++ b/drivers/opp/of.c
@@ -73,6 +73,147 @@ struct opp_table *_managed_opp(struct device *dev, int index)
return managed_table;
 }
 
+/* The caller must call dev_pm_opp_put() after the OPP is used */
+static struct dev_pm_opp *_find_opp_of_np(struct opp_table *opp_table,
+ struct device_node *opp_np)
+{
+   struct dev_pm_opp *opp;
+
+   lockdep_assert_held(&opp_table_lock);
+
+   mutex_lock(&opp_table->lock);
+
+   list_for_each_entry(opp, &opp_table->opp_list, node) {
+   if (opp->np == opp_np) {
+   dev_pm_opp_get(opp);
+   mutex_unlock(&opp_table->lock);
+   return opp;
+   }
+   }
+
+   mutex_unlock(&opp_table->lock);
+
+   return NULL;
+}
+
+static struct device_node *of_parse_required_opp(struct device_node *np,
+int index)
+{
+   struct device_node *required_np;
+
+   required_np = of_parse_phandle(np, "required-opps", index);
+   if (unlikely(!required_np)) {
+   pr_err("%s: Unable to parse required-opps: %pOF, index: %d\n",
+  __func__, np, index);
+   }
+
+   return required_np;
+}
+
+/* The caller must call dev_pm_opp_put_opp_table() after the table is used */
+static struct opp_table *_find_table_of_opp_np(struct device_node *opp_np)
+{
+   struct opp_table *opp_table;
+   struct dev_pm_opp *opp;
+
+   lockdep_assert_held(&opp_table_lock);
+
+   list_for_each_entry(opp_table, &opp_tables, node) {
+   opp = _find_opp_of_np(opp_table, opp_np);
+   if (opp) {
+   dev_pm_opp_put(opp);
+   _get_opp_table_kref(opp_table);
+   return opp_table;
+   }
+   }
+
+   return ERR_PTR(-ENODEV);
+}
+
+/* Free resources previously acquired by _opp_table_alloc_required_tables() */
+static void _opp_table_free_required_tables(struct opp_table *opp_table)
+{
+   struct opp_table **required_opp_tables = opp_table->required_opp_tables;
+   int i;
+
+   if (!required_opp_tables)
+   return;
+
+   for (i = 0; i < opp_table->required_opp_count; i++) {
+   if (IS_ERR_OR_NULL(required_opp_tables[i]))
+   break;
+
+   dev_pm_opp_put_opp_table(required_opp_tables[i]);
+   }
+
+   kfree(required_opp_tables);
+
+   opp_table->required_opp_count = 0;
+   opp_table->required_opp_tables = NULL;
+}
+
+/*
+ * Populate all devices and opp tables which are part of "required-opps" list.
+ * Checking only the first OPP node should be enough.
+ */
+static void _opp_table_alloc_required_tables(struct opp_table *opp_table,
+struct device *dev,
+struct device_node *opp_np)
+{
+   struct opp_table **required_opp_tables;
+   struct device_node *required_np, *np;
+   int count, i;
+
+   /* Traversing the first OPP node is all we need */
+   np = of_get_next_available_child(opp_np, NULL);
+   if (!np) {
+   dev_err(dev, "Empty OPP table\n");
+   return;
+   }
+
+   count = of_count_phandle_with_args(np, "required-opps", NULL);
+   if (!count)
+   goto put_np;
+
+   required_opp_tables = kcalloc(count, sizeof(*required_opp_tables),
+ GFP_KERNEL);
+   if (!required_opp_tables)
+   goto put_np;
+
+   opp_table->required_opp_tables = required_opp_tables;
+   opp_table->required_opp_count = count;
+
+   for (i = 0; i < count; i++) {
+   required_np = of_parse_required_opp(np, i);
+   if 

[PATCH V3 03/10] OPP: Separate out custom OPP handler specific code

2018-10-24 Thread Viresh Kumar
Create a separate routine to take care of custom set_opp() handler
specific stuff.

Reviewed-by: Ulf Hansson 
Signed-off-by: Viresh Kumar 
---
 drivers/opp/core.c | 67 +++---
 1 file changed, 40 insertions(+), 27 deletions(-)

diff --git a/drivers/opp/core.c b/drivers/opp/core.c
index 2c2df4e4fc14..ebb3b648e0fd 100644
--- a/drivers/opp/core.c
+++ b/drivers/opp/core.c
@@ -635,6 +635,34 @@ static int _generic_set_opp_regulator(const struct 
opp_table *opp_table,
return ret;
 }
 
+static int _set_opp_custom(const struct opp_table *opp_table,
+  struct device *dev, unsigned long old_freq,
+  unsigned long freq,
+  struct dev_pm_opp_supply *old_supply,
+  struct dev_pm_opp_supply *new_supply)
+{
+   struct dev_pm_set_opp_data *data;
+   int size;
+
+   data = opp_table->set_opp_data;
+   data->regulators = opp_table->regulators;
+   data->regulator_count = opp_table->regulator_count;
+   data->clk = opp_table->clk;
+   data->dev = dev;
+
+   data->old_opp.rate = old_freq;
+   size = sizeof(*old_supply) * opp_table->regulator_count;
+   if (IS_ERR(old_supply))
+   memset(data->old_opp.supplies, 0, size);
+   else
+   memcpy(data->old_opp.supplies, old_supply, size);
+
+   data->new_opp.rate = freq;
+   memcpy(data->new_opp.supplies, new_supply, size);
+
+   return opp_table->set_opp(data);
+}
+
 /**
  * dev_pm_opp_set_rate() - Configure new OPP based on frequency
  * @dev:device for which we do this operation
@@ -649,7 +677,7 @@ int dev_pm_opp_set_rate(struct device *dev, unsigned long 
target_freq)
unsigned long freq, old_freq;
struct dev_pm_opp *old_opp, *opp;
struct clk *clk;
-   int ret, size;
+   int ret;
 
if (unlikely(!target_freq)) {
dev_err(dev, "%s: Invalid target frequency %lu\n", __func__,
@@ -702,8 +730,17 @@ int dev_pm_opp_set_rate(struct device *dev, unsigned long 
target_freq)
dev_dbg(dev, "%s: switching OPP: %lu Hz --> %lu Hz\n", __func__,
old_freq, freq);
 
-   /* Only frequency scaling */
-   if (!opp_table->regulators) {
+   if (opp_table->set_opp) {
+   ret = _set_opp_custom(opp_table, dev, old_freq, freq,
+ IS_ERR(old_opp) ? NULL : 
old_opp->supplies,
+ opp->supplies);
+   } else if (opp_table->regulators) {
+   ret = _generic_set_opp_regulator(opp_table, dev, old_freq, freq,
+IS_ERR(old_opp) ? NULL : 
old_opp->supplies,
+opp->supplies);
+   } else {
+   /* Only frequency scaling */
+
/*
 * We don't support devices with both regulator and
 * domain performance-state for now.
@@ -714,30 +751,6 @@ int dev_pm_opp_set_rate(struct device *dev, unsigned long 
target_freq)
  opp->pstate);
else
ret = _generic_set_opp_clk_only(dev, clk, old_freq, 
freq);
-   } else if (!opp_table->set_opp) {
-   ret = _generic_set_opp_regulator(opp_table, dev, old_freq, freq,
-IS_ERR(old_opp) ? NULL : 
old_opp->supplies,
-opp->supplies);
-   } else {
-   struct dev_pm_set_opp_data *data;
-
-   data = opp_table->set_opp_data;
-   data->regulators = opp_table->regulators;
-   data->regulator_count = opp_table->regulator_count;
-   data->clk = clk;
-   data->dev = dev;
-
-   data->old_opp.rate = old_freq;
-   size = sizeof(*opp->supplies) * opp_table->regulator_count;
-   if (IS_ERR(old_opp))
-   memset(data->old_opp.supplies, 0, size);
-   else
-   memcpy(data->old_opp.supplies, old_opp->supplies, size);
-
-   data->new_opp.rate = freq;
-   memcpy(data->new_opp.supplies, opp->supplies, size);
-
-   ret = opp_table->set_opp(data);
}
 
dev_pm_opp_put(opp);
-- 
2.19.1.568.g152ad8e3369a



[PATCH V3 01/10] PM / Domains: Rename genpd virtual devices as virt_dev

2018-10-24 Thread Viresh Kumar
There are several struct device instances that genpd core handles. The
most common one is the consumer device structure, which is named
(correctly) as "dev" within genpd core. The second one is the genpd's
device structure, referenced as genpd->dev. The third one is the virtual
device structures created by the genpd core to represent the consumer
device for multiple power domain case, currently named as genpd_dev. The
naming of these virtual devices isn't very clear or readable and it
looks more like the genpd->dev.

Rename the virtual device instances within the genpd core as "virt_dev".

Signed-off-by: Viresh Kumar 
---
 drivers/base/power/domain.c | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
index 7f38a92b444a..fe9b0527b161 100644
--- a/drivers/base/power/domain.c
+++ b/drivers/base/power/domain.c
@@ -2338,7 +2338,7 @@ EXPORT_SYMBOL_GPL(genpd_dev_pm_attach);
 struct device *genpd_dev_pm_attach_by_id(struct device *dev,
 unsigned int index)
 {
-   struct device *genpd_dev;
+   struct device *virt_dev;
int num_domains;
int ret;
 
@@ -2352,31 +2352,31 @@ struct device *genpd_dev_pm_attach_by_id(struct device 
*dev,
return NULL;
 
/* Allocate and register device on the genpd bus. */
-   genpd_dev = kzalloc(sizeof(*genpd_dev), GFP_KERNEL);
-   if (!genpd_dev)
+   virt_dev = kzalloc(sizeof(*virt_dev), GFP_KERNEL);
+   if (!virt_dev)
return ERR_PTR(-ENOMEM);
 
-   dev_set_name(genpd_dev, "genpd:%u:%s", index, dev_name(dev));
-   genpd_dev->bus = &genpd_bus_type;
-   genpd_dev->release = genpd_release_dev;
+   dev_set_name(virt_dev, "genpd:%u:%s", index, dev_name(dev));
+   virt_dev->bus = &genpd_bus_type;
+   virt_dev->release = genpd_release_dev;
 
-   ret = device_register(genpd_dev);
+   ret = device_register(virt_dev);
if (ret) {
-   kfree(genpd_dev);
+   kfree(virt_dev);
return ERR_PTR(ret);
}
 
/* Try to attach the device to the PM domain at the specified index. */
-   ret = __genpd_dev_pm_attach(genpd_dev, dev->of_node, index, false);
+   ret = __genpd_dev_pm_attach(virt_dev, dev->of_node, index, false);
if (ret < 1) {
-   device_unregister(genpd_dev);
+   device_unregister(virt_dev);
return ret ? ERR_PTR(ret) : NULL;
}
 
-   pm_runtime_enable(genpd_dev);
-   genpd_queue_power_off_work(dev_to_genpd(genpd_dev));
+   pm_runtime_enable(virt_dev);
+   genpd_queue_power_off_work(dev_to_genpd(virt_dev));
 
-   return genpd_dev;
+   return virt_dev;
 }
 EXPORT_SYMBOL_GPL(genpd_dev_pm_attach_by_id);
 
-- 
2.19.1.568.g152ad8e3369a
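
A minimal consumer-side sketch, for readers following the series, of how the
virtual devices renamed above are obtained in the first place. This is not
part of the patch; the function and variable names below are illustrative,
and only genpd_dev_pm_attach_by_id() (whose return convention is visible in
the diff: NULL for fewer than two domains, ERR_PTR() on failure) and
dev_pm_domain_detach() are existing kernel interfaces:

	/* Hypothetical driver with two power-domains listed in its DT node. */
	static int my_consumer_attach_domains(struct device *dev,
					      struct device **pd0,
					      struct device **pd1)
	{
		*pd0 = genpd_dev_pm_attach_by_id(dev, 0);
		if (IS_ERR_OR_NULL(*pd0))
			return *pd0 ? PTR_ERR(*pd0) : -ENODEV;

		*pd1 = genpd_dev_pm_attach_by_id(dev, 1);
		if (IS_ERR_OR_NULL(*pd1)) {
			/* Drop the first virtual device on failure. */
			dev_pm_domain_detach(*pd0, false);
			return *pd1 ? PTR_ERR(*pd1) : -ENODEV;
		}

		return 0;
	}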



[PATCH V3 00/10] OPP: Support multiple power-domains per device

2018-10-24 Thread Viresh Kumar
Hello,

This series improves the OPP core (and a bit of genpd core as well) to
support multiple phandles in the "required-opps" property, which are
only used for multiple power-domains per device for now.

We still don't propagate the changes to master domains for the
sub-domains, but this patchset is an important stepping stone for that
to happen.

Tested on Hikey960 after faking some power domains for CPUs.

V2->V3:
- Gained an additional patch to properly name the virtual device (1/10)
- Renamed the virtual device as virt_dev instead of genpd_dev.
- Few helpers renamed as well accordingly.
- Use container_of() instead of traversing list of genpd.
- Included tags received from Ulf.

V1->V2:
- Had a discussion at Linaro connect with Ulf regarding the changes V1
  did in the genpd core and what his objections are to them. Based on
  his suggestions many changes are made.
- The OPP core still needs the virtual device pointers to set the
  performance state for multiple domains, but the genpd core doesn't
  provide them automatically to the OPP core. One of the reasons behind
  this is to give more power to the consumer drivers which may not want
  to enable all the genpds at once.
- The consumer drivers would now need to call the APIs
  dev_pm_opp_{set|put}_genpd_device() in order to set/reset these
  virtual device pointers.
- More locking is put in place to protect the genpd device pointers in
  OPP core.
- Reorg of the code at many places to make code less redundant.

--
viresh

Viresh Kumar (10):
  PM / Domains: Rename genpd virtual devices as virt_dev
  OPP: Identify and mark genpd OPP tables
  OPP: Separate out custom OPP handler specific code
  OPP: Populate required opp tables from "required-opps" property
  OPP: Populate OPPs from "required-opps" property
  PM / Domains: Add genpd_opp_to_performance_state()
  OPP: Add dev_pm_opp_{set|put}_genpd_virt_dev() helper
  OPP: Configure all required OPPs
  OPP: Rename and relocate of_genpd_opp_to_performance_state()
  OPP: Remove of_dev_pm_opp_find_required_opp()

 drivers/base/power/domain.c |  70 
 drivers/opp/core.c  | 261 ++
 drivers/opp/of.c| 313 +++-
 drivers/opp/opp.h   |  20 +++
 include/linux/pm_domain.h   |   8 +-
 include/linux/pm_opp.h  |  16 +-
 6 files changed, 526 insertions(+), 162 deletions(-)

-- 
2.19.1.568.g152ad8e3369a



[tip:x86/urgent] x86/cpufeatures: Enumerate MOVDIR64B instruction

2018-10-24 Thread tip-bot for Fenghua Yu
Commit-ID:  ace6485a03266cc3c198ce8e927a1ce0ce139699
Gitweb: https://git.kernel.org/tip/ace6485a03266cc3c198ce8e927a1ce0ce139699
Author: Fenghua Yu 
AuthorDate: Wed, 24 Oct 2018 14:57:17 -0700
Committer:  Ingo Molnar 
CommitDate: Thu, 25 Oct 2018 07:42:48 +0200

x86/cpufeatures: Enumerate MOVDIR64B instruction

MOVDIR64B moves 64-bytes as direct-store with 64-bytes write atomicity.
Direct store is implemented by using write combining (WC) for writing
data directly into memory without caching the data.

In low latency offload (e.g. Non-Volatile Memory, etc), MOVDIR64B writes
work descriptors (and data in some cases) to device-hosted work-queues
atomically without cache pollution.

Availability of the MOVDIR64B instruction is indicated by the
presence of the CPUID feature flag MOVDIR64B (CPUID.0x07.0x0:ECX[bit 28]).

Please check the latest Intel Architecture Instruction Set Extensions
and Future Features Programming Reference for more details on the CPUID
feature MOVDIR64B flag.

Signed-off-by: Fenghua Yu 
Cc: Andy Lutomirski 
Cc: Ashok Raj 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Denys Vlasenko 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Ravi V Shankar 
Cc: Thomas Gleixner 
Link: 
http://lkml.kernel.org/r/1540418237-125817-3-git-send-email-fenghua...@intel.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/cpufeatures.h 
b/arch/x86/include/asm/cpufeatures.h
index 90934ee7b79a..28c4a502b419 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -332,6 +332,7 @@
 #define X86_FEATURE_RDPID  (16*32+22) /* RDPID instruction */
 #define X86_FEATURE_CLDEMOTE   (16*32+25) /* CLDEMOTE instruction */
 #define X86_FEATURE_MOVDIRI(16*32+27) /* MOVDIRI instruction */
+#define X86_FEATURE_MOVDIR64B  (16*32+28) /* MOVDIR64B instruction */
 
 /* AMD-defined CPU features, CPUID level 0x80000007 (EBX), word 17 */
 #define X86_FEATURE_OVERFLOW_RECOV (17*32+ 0) /* MCA overflow recovery 
support */
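
A minimal usage sketch, assuming a hypothetical driver: only the feature flag
added above and boot_cpu_has()/memcpy_toio() are real kernel interfaces; the
movdir64b_copy() helper is purely illustrative (this commit adds no such
helper), standing in for an inline-asm wrapper around the instruction:

	/* Illustrative only: use the 64-byte atomic direct store when present. */
	static void submit_work_descriptor(void __iomem *portal, const void *desc)
	{
		if (boot_cpu_has(X86_FEATURE_MOVDIR64B))
			movdir64b_copy(portal, desc);	/* hypothetical wrapper */
		else
			memcpy_toio(portal, desc, 64);	/* fallback, not atomic */
	}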


[tip:x86/urgent] x86/cpufeatures: Enumerate MOVDIRI instruction

2018-10-24 Thread tip-bot for Fenghua Yu
Commit-ID:  33823f4d63f7a010653d219800539409a78ef4be
Gitweb: https://git.kernel.org/tip/33823f4d63f7a010653d219800539409a78ef4be
Author: Fenghua Yu 
AuthorDate: Wed, 24 Oct 2018 14:57:16 -0700
Committer:  Ingo Molnar 
CommitDate: Thu, 25 Oct 2018 07:42:48 +0200

x86/cpufeatures: Enumerate MOVDIRI instruction

MOVDIRI moves doubleword or quadword from register to memory through
direct store which is implemented by using write combining (WC) for
writing data directly into memory without caching the data.

Programmable agents can handle streaming offload (e.g. high speed packet
processing in network). Hardware implements a doorbell (tail pointer)
register that is updated by software when adding new work-elements to
the streaming offload work-queue.

MOVDIRI can be used as the doorbell write which is a 4-byte or 8-byte
uncachable write to MMIO. MOVDIRI has lower overhead than other ways
to write the doorbell.

Availability of the MOVDIRI instruction is indicated by the presence of
the CPUID feature flag MOVDIRI(CPUID.0x07.0x0:ECX[bit 27]).

Please check the latest Intel Architecture Instruction Set Extensions
and Future Features Programming Reference for more details on the CPUID
feature MOVDIRI flag.

Signed-off-by: Fenghua Yu 
Cc: Andy Lutomirski 
Cc: Ashok Raj 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Denys Vlasenko 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Ravi V Shankar 
Cc: Thomas Gleixner 
Link: 
http://lkml.kernel.org/r/1540418237-125817-2-git-send-email-fenghua...@intel.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/cpufeatures.h 
b/arch/x86/include/asm/cpufeatures.h
index 89a048c2faec..90934ee7b79a 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -331,6 +331,7 @@
 #define X86_FEATURE_LA57   (16*32+16) /* 5-level page tables */
 #define X86_FEATURE_RDPID  (16*32+22) /* RDPID instruction */
 #define X86_FEATURE_CLDEMOTE   (16*32+25) /* CLDEMOTE instruction */
+#define X86_FEATURE_MOVDIRI(16*32+27) /* MOVDIRI instruction */
 
 /* AMD-defined CPU features, CPUID level 0x80000007 (EBX), word 17 */
 #define X86_FEATURE_OVERFLOW_RECOV (17*32+ 0) /* MCA overflow recovery 
support */


Re: [PATCH] MAINTAINERS: Add x86 early-quirks.c file pattern to PCI subsystem

2018-10-24 Thread Ingo Molnar


* Bjorn Helgaas  wrote:

> From: Bjorn Helgaas 
> 
> arch/x86/kernel/early-quirks.c contains special PCI quirks that need to
> run even before the usual DECLARE_PCI_FIXUP_EARLY() quirks.  These have
> typically been merged by the x86 maintainers, which is fine, but PCI folks
> should at least see what's happening, so add a file pattern to the PCI
> subsystem entry.
> 
> Signed-off-by: Bjorn Helgaas 
> ---
>  MAINTAINERS |1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 4ece30f15777..63cb7f3dbbb4 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -11249,6 +11249,7 @@ F:include/uapi/linux/pci*
>  F:   lib/pci*
>  F:   arch/x86/pci/
>  F:   arch/x86/kernel/quirks.c
> +F:   arch/x86/kernel/early-quirks.c

Good idea!

Acked-by: Ingo Molnar 

Thanks,

Ingo


Re: [PATCH] mm: convert totalram_pages, totalhigh_pages and managed_pages to atomic.

2018-10-24 Thread Arun KS

On 2018-10-23 11:43, Julia Lawall wrote:

On Tue, 23 Oct 2018, Michal Hocko wrote:

[Trimmed CC list + Julia - there is indeed no need to CC everybody 
maintain a

file you are updating for the change like this]

On Tue 23-10-18 10:16:51, Arun Sudhilal wrote:
> On Mon, Oct 22, 2018 at 11:41 PM Michal Hocko  wrote:
> >
> > On Mon 22-10-18 22:53:22, Arun KS wrote:
> > > Remove managed_page_count_lock spinlock and instead use atomic
> > > variables.
> >
>
> Hello Michal,
> > I assume this has been auto-generated. If yes, it would be better to
> > mention the script so that people can review it and regenerate for
> > comparision. Such a large change is hard to review manually.
>
> Changes were made partially with script.  For totalram_pages and
> totalhigh_pages,
>
> find dir -type f -exec sed -i
> 's/totalram_pages/atomic_long_read(\&totalram_pages)/g' {} \;
> find dir -type f -exec sed -i
> 's/totalhigh_pages/atomic_long_read(\&totalhigh_pages)/g' {} \;
>
> For managed_pages it was mostly manual edits after using,
> find mm/ -type f -exec sed -i
> 's/zone->managed_pages/atomic_long_read(\&zone->managed_pages)/g' {}
> \;

I guess we should be able to use coccinelle for this kind of change 
and

reduce the amount of manual intervention to absolute minimum.


Coccinelle looks like it would be desirable, especially in case the 
word

zone is not always used.

Arun, please feel free to contact me if you want to try it and need 
help.

Hello Julia,

I was able to come up .cocci for replacing managed_pages,

@@
struct zone *z;
@@
(
z->managed_pages = ...
|
- z->managed_pages
+ atomic_long_read(&z->managed_pages)
)

@@
struct zone *z;
expression e1;
@@
- z->managed_pages = e1
+ atomic_long_set(&z->managed_pages, e1)

@@
expression e,e1;
@@
- e->managed_pages += e1
+ atomic_long_add(e1, &e->managed_pages)

@@
expression z;
@@
- z.managed_pages
+ atomic_long_read(&z.managed_pages)

But I'm not able to use the same method for unsigned long
variables (totalram_pages)


@@
unsigned long totalram_pages;
@@
(
totalram_pages = ...
|
-totalram_pages
+atomic_long_read(&totalram_pages)
)

This throws error,

spatch test1.cocci mm/page_alloc.c
init_defs_builtins: /usr/lib/coccinelle/standard.h
HANDLING: mm/page_alloc.c

previous modification:
MINUS
  >>> atomic_long_read( starting on line 1:totalram_pages)

According to environment 1:
   rule starting on line 1.totalram_pages -> page_idx ^ (1 <<
order)

current modification:
MINUS
  >>> atomic_long_read( starting on line 1:totalram_pages)

According to environment 1:
   rule starting on line 1.totalram_pages -> page_idx

Fatal error: exception Failure("rule starting on line 1: already tagged 
token:

C code context
File "mm/internal.h", line 175, column 8,  charpos = 5368
around = 'page_idx', whole content =return page_idx ^ (1 << 
order);")


Regards,
Arun




julia


Linux kernel crash

2018-10-24 Thread Stephen & Linda Smith
A week ago I upgraded to the latest Ubuntu distribution (Linux thunderbird 
4.18.0-10-generic #11-Ubuntu SMP Thu Oct 11 15:13:55 UTC 2018 x86_64 x86_64 
x86_64 GNU/Linux).   

Whenever I run "shutdown -h now" or "reboot" I receive an immediate kernel 
crash with a dump that has:

"Code: Bad RIP value"

What is the best way of collecting and sending information to upstream?  I 
don't mind learning to build and install the kernel if that will help debug 
this.

sps






Re: [PATCH 1/2] mm/zsmalloc.c: check encoded object value overflow for PAE

2018-10-24 Thread Sergey Senozhatsky
On (10/24/18 22:27), Rafael David Tinoco wrote:
>  static unsigned long location_to_obj(struct page *page, unsigned int obj_idx)
>  {
> - unsigned long obj;
> + unsigned long obj, pfn;
> +
> + pfn = page_to_pfn(page);
> +
> + if (unlikely(OBJ_OVERFLOW(pfn)))
> + BUG();

The trend these days is to have less BUG/BUG_ON-s in the kernel.

-ss
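
For illustration, a minimal sketch of a non-fatal alternative, assuming the
OBJ_OVERFLOW() macro proposed in the patch under review and mirroring
zsmalloc's existing pfn/index/tag encoding; whether an invalid handle is an
acceptable failure mode here is for the patch author to judge:

	static unsigned long location_to_obj(struct page *page, unsigned int obj_idx)
	{
		unsigned long pfn = page_to_pfn(page);
		unsigned long obj;

		/* Warn once and hand back an invalid handle instead of halting. */
		if (WARN_ON_ONCE(OBJ_OVERFLOW(pfn)))
			return 0;

		obj = pfn << OBJ_INDEX_BITS;
		obj |= obj_idx & OBJ_INDEX_MASK;
		obj <<= OBJ_TAG_BITS;

		return obj;
	}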


Re: [GIT PULL] Qualcomm ARM64 DT updates for 4.20

2018-10-24 Thread Amit Kucheria
On Fri, Oct 5, 2018 at 7:59 PM Andy Gross  wrote:
>
> On Tue, Oct 02, 2018 at 11:30:29AM +0200, Arnd Bergmann wrote:
> > On Sun, Sep 30, 2018 at 8:38 PM Andy Gross  wrote:
> > >
> > > The following changes since commit 
> > > 5b394b2ddf0347bef56e50c69a58773c94343ff3:
> > >
> > >   Linux 4.19-rc1 (2018-08-26 14:11:59 -0700)
> > >
> > > are available in the git repository at:
> > >
> > >   git://git.kernel.org/pub/scm/linux/kernel/git/agross/linux.git 
> > > tags/qcom-arm64-for-4.20
> > >
> > > for you to fetch changes up to 6db0483cf622fc690e32e0804f8b86061f13962a:
> > >
> > >   Revert "dt-bindings: thermal: qcom-spmi-temp-alarm: Fix documentation 
> > > of 'reg'" (2018-09-30 12:59:52 -0500)
> > >
> > > 
> > > Qualcomm ARM64 Updates for v4.20
> > >
> > > * Update Coresight for MSM8916
> > > * Switch to use mailbox for smp2p and smd on MSM8996
> > > * Add dispcc, dsp, USB, regulator, and other nodes for SDM845
> > > * Drop model/compatible from MSM8916 and MSM8996
> > > * Add compat for db820c
> > > * Add MSM8998 SoC and board support along with associated nodes
> > > * Add RESIN/PON for Qualcomm PM8916 and PM8994
> >
> > Pulled into next/dt, thanks.
> >
> > There is a small chance that the two reverts end up causing more
> > problems than they help: they might conflict with the patches merged
> > into the iio branch, or they could end up being applied on top of
> > the iio branch and revert the changes during the release. I think
> > it will be fine, but it's obviously better to avoid this.
> >
>
> Agreed, I'll check in on this once -rc1 hits to make sure they weren't 
> removed.

Andy,

While you acked the tsens thermal DT patches[1] so they could go
through the thermal tree[2] to avoid a dependency, Eduardo would
prefer the DT changes for tsens to go through the arch tree.

To avoid a scenario where these DT changes don't get merged until the
next merge window, would you or arm-soc maintainers consider merging
them in -rc2 (after the thermal-soc tree gets merged)?

For convenience, I've provided just the DT changes with all tags in a
separate branch on top of v4.19 here:
https://git.linaro.org/people/amit.kucheria/kernel.git/log/?h=up/thermal/tsens-preirq-cleanup-v4

Regards,
Amit
[1] Here is a reference to the patch series with Andy's acks:
https://lore.kernel.org/lkml/cover.1536744310.git.amit.kuche...@linaro.org/T/
[2] Here is a link to Eduardo's tree containing a subset of the above
series (I don't yet see a pull request):
https://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal.git/log/?h=linus


Re: [PATCH anybus v1 2/4] dt-bindings: anybus-bridge: document devicetree binding.

2018-10-24 Thread Lee Jones
On Wed, 24 Oct 2018, Sven Van Asbroeck wrote:

> This patch adds devicetree binding documentation for the
> Arcx anybus bridge.
> 
> Signed-off-by: Sven Van Asbroeck 
> ---
>  .../bindings/mfd/arcx,anybus-bridge.txt   | 37 +++
>  .../devicetree/bindings/vendor-prefixes.txt   |  1 +
>  2 files changed, 38 insertions(+)
>  create mode 100644 
> Documentation/devicetree/bindings/mfd/arcx,anybus-bridge.txt
> 
> diff --git a/Documentation/devicetree/bindings/mfd/arcx,anybus-bridge.txt 
> b/Documentation/devicetree/bindings/mfd/arcx,anybus-bridge.txt
> new file mode 100644
> index ..3c0399c4ed45
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/mfd/arcx,anybus-bridge.txt
> @@ -0,0 +1,37 @@
> +* Arcx anybus bridge
> +
> +This chip communicates with the SoC over the WEIM bus. It is
> +expected that its Device Tree node is specified as the child of a node
> +corresponding to the WEIM bus used for communication.
> +
> +Required properties:
> +
> +  - compatible : The following chip-specific string:
> +"arcx,anybus-bridge"
> +
> +  - reg :
> + weim memory area where the cpld registers are located,  followed by:
> + weim memory area of the first  anybus-s slot,   followed by:
> + weim memory area of the second anybus-s slot.
> +
> +  - enable-gpios : the gpio connected to the bridge's 'enable gpio'.
> +
> +  - pwms : the pwm connected to the bridge's 'pwm input'.
> +
> +  - irq-gpios : the gpios connected to the bridge's interrupt lines.
> + note that there is no need to provide the 'interrupts' property here.
> +
> +Example of usage:
> +
> +&weim {
> + bridge@0,0 {

I haven't seen this syntax before.

It doesn't mean it's wrong, but will needs Rob et. al to cast an eye.

> + compatible = "arcx,anybus-bridge";
> + reg = <0 0 0x100>, <0 0x40 0x800>, <1 0x40 0x800>;
> + fsl,weim-cs-timing = <0x024400b1 0x1010 0x20081100
> + 0x 0xa240 0x>;

This needs to be documented.

> + enable-gpios = < 2 GPIO_ACTIVE_HIGH>;
> + pwms = < 0 571>;
> + irq-gpios = < 1 GPIO_ACTIVE_HIGH>,
> + < 5 GPIO_ACTIVE_HIGH>;

Tabbing.

> + };
> +};
> diff --git a/Documentation/devicetree/bindings/vendor-prefixes.txt 
> b/Documentation/devicetree/bindings/vendor-prefixes.txt
> index 2c3fc512e746..1bf07b20a8af 100644
> --- a/Documentation/devicetree/bindings/vendor-prefixes.txt
> +++ b/Documentation/devicetree/bindings/vendor-prefixes.txt
> @@ -35,6 +35,7 @@ aptina  Aptina Imaging
>  arasan   Arasan Chip Systems
>  archermind ArcherMind Technology (Nanjing) Co., Ltd.
>  arctic   Arctic Sand
> +arcx Arcx/Archronix Inc.
>  ariesAries Embedded GmbH
>  arm  ARM Ltd.
>  armadeus ARMadeus Systems SARL

-- 
Lee Jones [李琼斯]
Linaro Services Technical Lead
Linaro.org │ Open source software for ARM SoCs
Follow Linaro: Facebook | Twitter | Blog


Re: [PATCH 2/2] mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask

2018-10-24 Thread Vlastimil Babka
On 10/25/18 1:17 AM, Andrew Morton wrote:
> On Mon, 22 Oct 2018 15:27:54 +0200 Vlastimil Babka  wrote:
> 
>>> : Moreover the original code allowed to trigger
>>> :   WARN_ON_ONCE(policy->mode == MPOL_BIND && (gfp & __GFP_THISNODE));
>>> : in policy_node if the requested node (e.g. cpu local one) was outside of
>>> : the mbind nodemask. This is not possible now. We haven't heard about any
>>> : such warning yet so it is unlikely that it happens but still a signal of
>>> : a wrong code layering.
>>
>> Ah, as I said in the other mail, I think it's inaccurate, the warning
>> was not possible to hit.
>>
>> There's also a slight difference wrt MPOL_BIND. The previous code would
>> avoid using __GFP_THISNODE if the local node was outside of
>> policy_nodemask(). After your patch __GFP_THISNODE is avoided for all
>> MPOL_BIND policies. So there's a difference that if local node is
>> actually allowed by the bind policy's nodemask, previously
>> __GFP_THISNODE would be added, but now it won't be. I don't think it
>> matters that much though, but maybe the changelog could say that
>> (instead of the inaccurate note about warning). Note the other policy
>> where nodemask is relevant is MPOL_INTERLEAVE, and that's unchanged by
>> this patch.
> 
> So the above could go into the changelog, yes?

Yeah.

>> When that's addressed, you can add
> 
> What is it that you'd like to see addressed?  Purely changelog updates?

Right.

>> Acked-by: Vlastimil Babka 
> 
> Thanks.
> 



Re: in_compat_syscall() returns from kernel thread for X86_32.

2018-10-24 Thread Andy Lutomirski



> On Oct 24, 2018, at 8:46 PM, NeilBrown  wrote:
> 
>> On Wed, Oct 24 2018, Theodore Y. Ts'o wrote:
>> 
>>> On Wed, Oct 24, 2018 at 12:47:57PM +1100, NeilBrown wrote:
>>> 
>>> I doubt it was copied - more likely independent evolution.
>>> But on reflection, I see that it is probably reasonable that it
>>> shouldn't be used this way - or at all in this context.
>>> I'll try to understand what the issues really are and see if I can
>>> find a solution that doesn't depend on this interface.
>>> Thanks for your help.
>> 
>> At least for ext4, the primary problem is that we want to use a 64-bit
>> telldir/seekdir cookie if all 64-bits will make it to user space, and
>> a 32-bit telldir cookie if only 32 bits will make it userspace.  This
>> impacts NFS as well because if there are people who are still using
>> NFSv2, which has 32-bit directory offsets, we need to constrain the
>> telldir/seekdir cookies we give to NFS to be a 32 has as opposed to a
>> 64-bit hash.
> 
> NFSd uses FMODE_32BITHASH or FMODE64BITHASH to allow ext4 to do the
> right thing.  FMODE_32BITHASH is set for NFSv2 only.
> 
> Maybe sys_getdents needs to set FMODE_32BITHASH, and sys_getdent64 needs
> to set FMODE_64BITHASH - or something like that.

It’s possible for a 32-bit process and a 64-bit process to share a directory 
fd, so I don’t think it’s quite that simple.

One option would be to add .llseek and .getdents flags or entire new compat 
operations to indicate that the caller expects 32-bit offsets.

I wonder how overlayfs interacts with this whole mess.

Re: in_compat_syscall() returns from kernel thread for X86_32.

2018-10-24 Thread NeilBrown
On Wed, Oct 24 2018, Theodore Y. Ts'o wrote:

> On Wed, Oct 24, 2018 at 12:47:57PM +1100, NeilBrown wrote:
>> 
>> I doubt it was copied - more likely independent evolution.
>> But on reflection, I see that it is probably reasonable that it
>> shouldn't be used this way - or at all in this context.
>> I'll try to understand what the issues really are and see if I can
>> find a solution that doesn't depend on this interface.
>> Thanks for your help.
>
> At least for ext4, the primary problem is that we want to use a 64-bit
> telldir/seekdir cookie if all 64-bits will make it to user space, and
> a 32-bit telldir cookie if only 32 bits will make it to userspace.  This
> impacts NFS as well because if there are people who are still using
> NFSv2, which has 32-bit directory offsets, we need to constrain the
> telldir/seekdir cookies we give to NFS to be a 32-bit hash as opposed to a
> 64-bit hash.

NFSd uses FMODE_32BITHASH or FMODE_64BITHASH to allow ext4 to do the
right thing.  FMODE_32BITHASH is set for NFSv2 only.

Maybe sys_getdents needs to set FMODE_32BITHASH, and sys_getdents64 needs
to set FMODE_64BITHASH - or something like that.

For lustre it is a bit more complex.  The internal "inode number" is 128
bits and we (sort of) hash it to 32 or 64 bits.  cp_compat_stat() just
says -EOVERFLOW if we give a 64 bit number when 32 are expected, and
there is no flag to say "this is a 32-bit 'stat' request".

But I need to dig into exactly what that "sort-of" means - maybe there
is an answer in there.
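
For reference, the cp_compat_stat() behaviour mentioned above boils down to
roughly this check in fs/stat.c (paraphrased, not quoted from this thread):

	tmp.st_ino = stat->ino;
	/* refuse to silently truncate an inode number that does not
	 * fit into the 32-bit compat st_ino */
	if (sizeof(tmp.st_ino) < sizeof(stat->ino) &&
	    tmp.st_ino != stat->ino)
		return -EOVERFLOW;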

NeilBrown




[PATCH v4 1/4] nds32: Fix bug in bitfield.h

2018-10-24 Thread Nick Hu
From: Nickhu 

There are two bitfield bugs for the performance counters
in bitfield.h:

PFM_CTL_offSEL1 21 --> 16
PFM_CTL_offSEL2 27 --> 22

This commit fixes them.

Signed-off-by: Nickhu 
---
 arch/nds32/include/asm/bitfield.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/nds32/include/asm/bitfield.h 
b/arch/nds32/include/asm/bitfield.h
index 8e84fc385b94..19b2841219ad 100644
--- a/arch/nds32/include/asm/bitfield.h
+++ b/arch/nds32/include/asm/bitfield.h
@@ -692,8 +692,8 @@
 #define PFM_CTL_offKU1 13  /* Enable user mode event counting for 
PFMC1 */
 #define PFM_CTL_offKU2 14  /* Enable user mode event counting for 
PFMC2 */
 #define PFM_CTL_offSEL015  /* The event selection for 
PFMC0 */
-#define PFM_CTL_offSEL121  /* The event selection for 
PFMC1 */
-#define PFM_CTL_offSEL227  /* The event selection for 
PFMC2 */
+#define PFM_CTL_offSEL116  /* The event selection for 
PFMC1 */
+#define PFM_CTL_offSEL222  /* The event selection for 
PFMC2 */
 /* bit 28:31 reserved */
 
 #define PFM_CTL_mskEN0 ( 0x01  << PFM_CTL_offEN0 )
-- 
2.17.0



[PATCH v4 4/4] nds32: Add document for NDS32 PMU.

2018-10-24 Thread Nick Hu
From: Nickhu 

This document describes how to add the NDS32 PMU
in the devicetree.

Signed-off-by: Nickhu 
Reviewed-by: Rob Herring 
---
 .../devicetree/bindings/perf/nds32v3-pmu.txt| 17 +
 1 file changed, 17 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/perf/nds32v3-pmu.txt

diff --git a/Documentation/devicetree/bindings/perf/nds32v3-pmu.txt 
b/Documentation/devicetree/bindings/perf/nds32v3-pmu.txt
new file mode 100644
index ..1bd15785b4ae
--- /dev/null
+++ b/Documentation/devicetree/bindings/perf/nds32v3-pmu.txt
@@ -0,0 +1,17 @@
+* NDS32 Performance Monitor Units
+
+NDS32 core have a PMU for counting cpu and cache events like cache misses.
+The NDS32 PMU representation in the device tree should be done as under:
+
+Required properties:
+
+- compatible :
+   "andestech,nds32v3-pmu"
+
+- interrupts : The interrupt number for NDS32 PMU is 13.
+
+Example:
+pmu{
+   compatible = "andestech,nds32v3-pmu";
+   interrupts = <13>;
+}
-- 
2.17.0



[PATCH v4 2/4] nds32: Perf porting

2018-10-24 Thread Nick Hu
From: Nickhu 

This commit ports perf to nds32.

1. Raw events:
The raw events start with 'r'.
Usage:
perf stat -e rXYZ ./app
X: the index of the performance counter.
YZ: the index (in hexadecimal) of the event.

Example:
'perf stat -e r101 ./app' means counter 1 will count the instruction
event.

The indexes of the counters and events can be found in the
"Andes System Privilege Architecture Version 3 Manual".

Or you can run 'perf list' to find the symbolic names of the raw events.

2. Perf mmap2:

Fix an unexpected perf mmap2() page fault.

When mmap2() is called by the perf application, you can hit the
condition "failed to write." with return value -EFAULT.

This is due to the page fault caused by "reading" the buffer
from the mapped legal address region in order to write it to the
descriptor. The page fault handler returns VM_FAULT_SIGBUS, which
should not happen here, since this is a read request.

You can refer to kernel/events/core.c:perf_mmap_fault(...):
if "(vmf->pgoff && (vmf->flags & FAULT_FLAG_WRITE))" evaluates
as true, you will get VM_FAULT_SIGBUS as the return value.

However, this is not a write request. The flag which indicates
why the page fault happened is wrong.

Furthermore, NDS32 SPAv3 is not able to detect whether an access is a
read or a write. It only knows whether it is an instruction fetch or a
data access.

Therefore, by removing the wrong flag assignment (the hardware is not
able to report the reason anyway), we can fix this bug.

3. Perf multiple events mapped to the same counter.

When multiple events map to the same counter, the counter counts
inaccurately. This is because each counter only counts one event at a
time. So when multiple events map to the same counter, they have to
take turns in each context.

There are two solutions:
1. Print an error message when multiple events map to the same counter.
But printing the error message would make the program hang in a loop,
and the ltp (Linux Test Project) tests would then fail.

2. Don't print the error message; the ltp tests pass. But the user
needs to know not to count events which map to the same counter, or
the results will be inaccurate.

We chose method 2.

Signed-off-by: Nickhu 
---
 arch/nds32/Kconfig|1 +
 arch/nds32/boot/dts/ae3xx.dts |5 +
 arch/nds32/include/asm/Kbuild |1 +
 arch/nds32/include/asm/perf_event.h   |   16 +
 arch/nds32/include/asm/pmu.h  |  386 ++
 arch/nds32/include/asm/stacktrace.h   |   39 +
 arch/nds32/kernel/Makefile|3 +-
 arch/nds32/kernel/perf_event_cpu.c| 1223 +
 arch/nds32/mm/fault.c |   13 +-
 tools/include/asm/barrier.h   |2 +
 tools/perf/arch/nds32/Build   |1 +
 tools/perf/arch/nds32/util/Build  |1 +
 tools/perf/arch/nds32/util/header.c   |   29 +
 tools/perf/pmu-events/arch/nds32/mapfile.csv  |   15 +
 .../pmu-events/arch/nds32/n13/atcpmu.json |  290 
 15 files changed, 2019 insertions(+), 6 deletions(-)
 create mode 100644 arch/nds32/include/asm/perf_event.h
 create mode 100644 arch/nds32/include/asm/pmu.h
 create mode 100644 arch/nds32/include/asm/stacktrace.h
 create mode 100644 arch/nds32/kernel/perf_event_cpu.c
 create mode 100644 tools/perf/arch/nds32/Build
 create mode 100644 tools/perf/arch/nds32/util/Build
 create mode 100644 tools/perf/arch/nds32/util/header.c
 create mode 100644 tools/perf/pmu-events/arch/nds32/mapfile.csv
 create mode 100644 tools/perf/pmu-events/arch/nds32/n13/atcpmu.json

diff --git a/arch/nds32/Kconfig b/arch/nds32/Kconfig
index 7068f341133d..dd448d431f5a 100644
--- a/arch/nds32/Kconfig
+++ b/arch/nds32/Kconfig
@@ -31,6 +31,7 @@ config NDS32
select HAVE_DEBUG_KMEMLEAK
select HAVE_MEMBLOCK
select HAVE_REGS_AND_STACK_ACCESS_API
+   select HAVE_PERF_EVENTS
select IRQ_DOMAIN
select LOCKDEP_SUPPORT
select MODULES_USE_ELF_RELA
diff --git a/arch/nds32/boot/dts/ae3xx.dts b/arch/nds32/boot/dts/ae3xx.dts
index bb39749a6673..16a9f54a805e 100644
--- a/arch/nds32/boot/dts/ae3xx.dts
+++ b/arch/nds32/boot/dts/ae3xx.dts
@@ -82,4 +82,9 @@
interrupts = <18>;
};
};
+
+   pmu {
+   compatible = "andestech,nds32v3-pmu";
+   interrupts= <13>;
+   };
 };
diff --git a/arch/nds32/include/asm/Kbuild 

Re: [PATCH v1 2/2] x86/hyperv: make HvNotifyLongSpinWait hypercall

2018-10-24 Thread Yi Sun
Hi, Michael,

Thanks a lot for the review and comments! Let us sync with the Hyper-V
team to confirm these open points.

BRs,
Sun Yi

On 18-10-24 16:53:00, Michael Kelley wrote:
> From: Yi Sun   Sent: Friday, October 19, 2018 6:14 
> AM
> > 
> > The HvNotifyLongSpinWait hypercall (HVCALL_NOTIFY_LONG_SPIN_WAIT)
> > is used by a guest OS to notify the hypervisor that the calling
> > virtual processor is attempting to acquire a resource that is
> > potentially held by another virtual processor within the same
> > Virtual Machine. This scheduling hint improves the scalability of
> > VMs with more than one virtual processor on Hyper-V.
> > 
> > Per MSFT TLFS, the retry number (SpinWaitInfo) is sent to hypervisor
> > only when the retry number exceeds the recommended number. If
> > recommended number is 0x, never retry.
> 
> The HvNotifyLongSpinWait hypercall should be understood to be
> advisory only.  As you noted, it is a scheduling hint to the
> hypervisor that some virtual CPU in the VM holds a spin lock.  Even
> though Linux knows which vCPU holds the spin lock, the hypercall
> does not provide a way to give that information to Hyper-V.  The
> hypercall always returns immediately.
> 
> The "retry number" is a bit mis-named in the Hyper-V Top Level
> Functional Spec (TLFS).  It is essentially a threshold value.  Hyper-V is
> saying "don't bother to advise me about the spin lock until you have
> done a certain number of spins."  This threshold prevents
> over-notifying Hyper-V such that the notification becomes somewhat
> meaningless.   It's not immediately clear to me why the hypercall passes
> that value as an input, but maybe it lets the Hyper-V scheduler prioritize
> among vCPUs based on how many times they have spun for a lock.  I
> think we were told that current Hyper-V implementations ignore this
> input value anyway.
> 
> I believe the description of the sentinel value 0x in the
> Hyper-V TLFS is incorrect.  Because it is the max possible threshold
> value, that value in the EBX register just means to not ever bother to
> notify.   The description should be "0x indicates never to notify."
> The value does *not* indicate anything about retrying to obtain the
> spin lock.
> 
I will send mail to Hyper-V team to clarify these.

> >  static bool __initdata hv_pvspin = true;
> > 
> > +bool hv_notify_long_spin_wait(int retry_num)
> 
> retry_num should probably be declared as unsigned int.  You
> don't want it to be treated as a negative number if the high
> order bit is set.
> 
Yes, I should declare it as 'unsigned int'. Thanks!

> > +{
> > +   /*
> > +* Per MSFT TLFS, the SpinWaitInfo is sent to hypervisor only when
> > +* the retry number exceeds the recommended number.
> > +*
> > +* If recommended number is 0x, never retry.
> > +*/
> > +   if (ms_hyperv.num_spin_retry == HYPERV_SPINLOCK_RETRY_NEVER)
> > +   return false;
> > +
> > +   if ((0 == retry_num % ms_hyperv.num_spin_retry) && retry_num)
> 
> I don't know if the "%" function is right here.  Your implementation will
> notify Hyper-V on every multiple of num_spin_retry.   The alternative is to
> notify once when the threshold is exceeded, and never again for this
> particular attempt to obtain a spin lock.  We should check with the Hyper-V
> team for which approach they expect to be used.
> 
> > +   hv_do_fast_hypercall8(HVCALL_NOTIFY_LONG_SPIN_WAIT,
> > + retry_num);
> 
> The Hyper-V TLFS seems to be inconsistent on whether the input parameter
> is 32-bits or 64-bits.   In one place it is typed as UINT64, but in another 
> place
> it is shown as only 4 bytes.  Need to clear this up with the Hyper-V team as
> well.
> 
> > +
> > +   return true;
> 
> I don't see a need for this function to return true vs. false.  Any calling 
> code
> should not change its behavior based on num_spin_retry.   This function will
> either notify Hyper-V or not notify Hyper-V, depending on whether the number
> of attempts to obtain the spinlock meets the threshold.  But calling code will
> do the same thing regardless of whether such a notification is made. 
> 
> Michael
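
To make the two notification strategies above concrete, here is a minimal
sketch of the "notify once when the threshold is first crossed" alternative,
folding in the other review comments (unsigned retry count, void return).
HVCALL_NOTIFY_LONG_SPIN_WAIT and hv_do_fast_hypercall8() are existing
Hyper-V interfaces, while ms_hyperv.num_spin_retry and
HYPERV_SPINLOCK_RETRY_NEVER come from the patch under review, so this is
illustrative only, not a proposed final implementation:

#include <asm/mshyperv.h>

static void hv_notify_long_spin_wait_once(unsigned int retry_num)
{
	/* Sentinel recommendation: never notify the hypervisor at all. */
	if (ms_hyperv.num_spin_retry == HYPERV_SPINLOCK_RETRY_NEVER)
		return;

	/*
	 * Notify exactly once, when the threshold is first reached,
	 * instead of on every multiple of num_spin_retry.
	 */
	if (retry_num == ms_hyperv.num_spin_retry)
		hv_do_fast_hypercall8(HVCALL_NOTIFY_LONG_SPIN_WAIT, retry_num);
}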


[PATCH v4 0/4] nds32: Perf support

2018-10-24 Thread Nick Hu
These four commits add perf support for nds32.
There are three performance counters in nds32, and
each of them can count different events. You can
use 'perf list' to show the available events.

Changes in V2:
1. Change the definition 'PFM_CTL_xxx' to
array form.

2. Simplify the PMU driver.

3. Stop all counters when handling irq
caused by performance counters overflow.

4. Rename the compatible string in
devicetree.

Changes in V3:
Fix the typo in Documentation/devicetree/
bindings/nds32/pmu.txt.

Changes in V4:
Move 'Documentation/devicetree/bindings/nds32/pmu.txt'
to 'Documentation/devicetree/bindings/perf/nds32v3-pmu.txt'.

Nickhu (4):
  nds32: Fix bug in bitfield.h
  nds32: Perf porting
  nds32: Add perf call-graph support.
  nds32: Add document for NDS32 PMU.

 .../devicetree/bindings/perf/nds32v3-pmu.txt  |   17 +
 arch/nds32/Kconfig|1 +
 arch/nds32/boot/dts/ae3xx.dts |5 +
 arch/nds32/include/asm/Kbuild |1 +
 arch/nds32/include/asm/bitfield.h |4 +-
 arch/nds32/include/asm/perf_event.h   |   16 +
 arch/nds32/include/asm/pmu.h  |  386 +
 arch/nds32/include/asm/stacktrace.h   |   39 +
 arch/nds32/kernel/Makefile|3 +-
 arch/nds32/kernel/perf_event_cpu.c| 1522 +
 arch/nds32/mm/fault.c |   13 +-
 tools/include/asm/barrier.h   |2 +
 tools/perf/arch/nds32/Build   |1 +
 tools/perf/arch/nds32/util/Build  |1 +
 tools/perf/arch/nds32/util/header.c   |   29 +
 tools/perf/pmu-events/arch/nds32/mapfile.csv  |   15 +
 .../pmu-events/arch/nds32/n13/atcpmu.json |  290 
 17 files changed, 2337 insertions(+), 8 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/perf/nds32v3-pmu.txt
 create mode 100644 arch/nds32/include/asm/perf_event.h
 create mode 100644 arch/nds32/include/asm/pmu.h
 create mode 100644 arch/nds32/include/asm/stacktrace.h
 create mode 100644 arch/nds32/kernel/perf_event_cpu.c
 create mode 100644 tools/perf/arch/nds32/Build
 create mode 100644 tools/perf/arch/nds32/util/Build
 create mode 100644 tools/perf/arch/nds32/util/header.c
 create mode 100644 tools/perf/pmu-events/arch/nds32/mapfile.csv
 create mode 100644 tools/perf/pmu-events/arch/nds32/n13/atcpmu.json

-- 
2.17.0



[PATCH v4 3/4] nds32: Add perf call-graph support.

2018-10-24 Thread Nick Hu
From: Nickhu 

The perf call-graph option can trace the callchain
between functions. This commit adds perf callchain support
for nds32. There are kernel callchains and user callchains.
The kernel callchain can trace functions in kernel space.
There are two types of user callchain: one for when the
'optimize for size' config is set, and another for when it
is not. The difference between the two types is that the
index of the frame pointer in the user stack is not the same.

For example:
With optimize for size:
User Stack:
-
|   lp  |
-
|   gp  |
-
|   fp  |

Without optimize for size:
User Stack:
1. non-leaf function:
-
|   lp  |
-
|   fp  |

2. leaf function:
-
|   fp  |

Signed-off-by: Nickhu 
---
 arch/nds32/kernel/perf_event_cpu.c | 299 +
 1 file changed, 299 insertions(+)

diff --git a/arch/nds32/kernel/perf_event_cpu.c 
b/arch/nds32/kernel/perf_event_cpu.c
index a6e723d0fdbc..5e00ce54d0ff 100644
--- a/arch/nds32/kernel/perf_event_cpu.c
+++ b/arch/nds32/kernel/perf_event_cpu.c
@@ -1193,6 +1193,305 @@ static int __init register_pmu_driver(void)
 
 device_initcall(register_pmu_driver);
 
+/*
+ * References: arch/nds32/kernel/traps.c:__dump()
+ * You will need to know the NDS ABI first.
+ */
+static int unwind_frame_kernel(struct stackframe *frame)
+{
+   int graph = 0;
+#ifdef CONFIG_FRAME_POINTER
+   /* 0x3 means misalignment */
+   if (!kstack_end((void *)frame->fp) &&
+   !((unsigned long)frame->fp & 0x3) &&
+   ((unsigned long)frame->fp >= TASK_SIZE)) {
+   /*
+*  The array index is based on the ABI, the below graph
+*  illustrate the reasons.
+*  Function call procedure: "smw" and "lmw" will always
+*  update SP and FP for you automatically.
+*
+*  Stack Relative Address
+*  |  |  0
+*  
+*  |LP| <-- SP(before smw)  <-- FP(after smw)   -1
+*  
+*  |FP| -2
+*  
+*  |  | <-- SP(after smw)   -3
+*/
+   frame->lp = ((unsigned long *)frame->fp)[-1];
+   frame->fp = ((unsigned long *)frame->fp)[FP_OFFSET];
+   /* make sure CONFIG_FUNCTION_GRAPH_TRACER is turned on */
+   if (__kernel_text_address(frame->lp))
+   frame->lp = ftrace_graph_ret_addr
+   (NULL, &graph, frame->lp, NULL);
+
+   return 0;
+   } else {
+   return -EPERM;
+   }
+#else
+   /*
+* You can refer to arch/nds32/kernel/traps.c:__dump()
+* Treat "sp" as "fp", but the "sp" is one frame ahead of "fp".
+* And, the "sp" is not always correct.
+*
+*   Stack Relative Address
+*   |  |  0
+*   
+*   |LP| <-- SP(before smw)  -1
+*   
+*   |  | <-- SP(after smw)   -2
+*   
+*/
+   if (!kstack_end((void *)frame->sp)) {
+   frame->lp = ((unsigned long *)frame->sp)[1];
+   /* TODO: How to deal with the value in first
+* "sp" is not correct?
+*/
+   if (__kernel_text_address(frame->lp))
+   frame->lp = ftrace_graph_ret_addr
+   (tsk, &graph, frame->lp, NULL);
+
+   frame->sp = ((unsigned long *)frame->sp) + 1;
+
+   return 0;
+   } else {
+   return -EPERM;
+   }
+#endif
+}
+
+static void notrace
+walk_stackframe(struct stackframe *frame,
+   int (*fn_record)(struct stackframe *, void *),
+   void *data)
+{
+   while (1) {
+   int ret;
+
+   if (fn_record(frame, data))
+   break;
+
+   ret = unwind_frame_kernel(frame);
+   if (ret < 0)
+   break;
+   }
+}
+
+/*
+ * Gets called by walk_stackframe() for every stackframe. This will be called
+ * whist unwinding the stackframe and is like a subroutine return so we use
+ * the PC.
+ */
+static int callchain_trace(struct stackframe *fr, void *data)
+{

Re: [PATCHv2 1/2] x86/mm: Move LDT remap out of KASLR region on 5-level paging

2018-10-24 Thread Baoquan He
Hi Kirill,

Thanks for making this patchset. I have small concerns, please see the
inline comments.

On 10/24/18 at 03:51pm, Kirill A. Shutemov wrote:
> On 5-level paging LDT remap area is placed in the middle of
> KASLR randomization region and it can overlap with direct mapping,
> vmalloc or vmap area.
> 
> Let's move LDT just before direct mapping which makes it safe for KASLR.
> This also allows us to unify layout between 4- and 5-level paging.

In crash utility and makedumpfile which are used to analyze system
memory content, PAGE_OFFSET is hardcoded as below in non-KASLR case:

#define PAGE_OFFSET_2_6_27 0x8800

It seems that this time they need to add another value, for both 4-level
and 5-level, since the 5-level code also exists in stable kernels. Surely
this doesn't matter much.

> 
> We don't touch 4 pgd slot gap just before the direct mapping reserved
> for a hypervisor, but move direct mapping by one slot instead.
> 
> The LDT mapping is per-mm, so we cannot move it into P4D page table next
> to CPU_ENTRY_AREA without complicating PGD table allocation for 5-level
> paging.

As discussed in the private thread, at first you also agreed to put it in
a p4d entry next to CPU_ENTRY_AREA, but finally you changed your mind;
there must be some reasons you found while implementing and investigating
further. Could you please say more about how it would complicate PGD table
allocation for 5-level paging? Or give a use case where it would
complicate things?

Sorry, I still don't get the point. I would really
appreciate an explanation.

Thanks
Baoquan


Re: KASAN: use-after-free Read in task_is_descendant

2018-10-24 Thread Tetsuo Handa
Oleg Nesterov wrote:
> On 10/22, Tetsuo Handa wrote:
> > > And again, I do not know how/if yama ensures that child is rcu-protected, 
> > > perhaps
> > > task_is_descendant() needs to check pid_alive(child) right after 
> > > rcu_read_lock() ?
> >
> > Since the caller (ptrace() path) called get_task_struct(child), child 
> > itself can't be
> > released. Do we still need pid_alive(child) ?
> 
> get_task_struct(child) can only ensure that this task_struct can't be freed.

The report says that it is a use-after-free read at

  walker = rcu_dereference(walker->real_parent);

which means that walker was already released.

> 
> Suppose that this child exits after get_task_struct(), then its real_parent 
> exits
> too and calls call_rcu(delayed_put_task_struct).
> 
> Now, when task_is_descendant() is called, rcu_read_lock() can happen after 
> rcu gp,
> iow child->parent can be already freed/reused/unmapped.
> 
> We need to ensure that child is still protected by RCU.

I wonder whether pid_alive() test helps.

We can get

[   40.620318] parent or walker is dead.
[   40.624146] tracee is dead.

messages using below patch and reproducer.

--
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 99cfddd..0d9d786 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -385,6 +385,7 @@ static int ptrace_attach(struct task_struct *task, long 
request,
if (mutex_lock_interruptible(&task->signal->cred_guard_mutex))
goto out;
 
+   schedule_timeout_killable(HZ);
task_lock(task);
retval = __ptrace_may_access(task, PTRACE_MODE_ATTACH_REALCREDS);
task_unlock(task);
diff --git a/security/yama/yama_lsm.c b/security/yama/yama_lsm.c
index ffda91a..a231ec6 100644
--- a/security/yama/yama_lsm.c
+++ b/security/yama/yama_lsm.c
@@ -283,6 +283,11 @@ static int task_is_descendant(struct task_struct *parent,
return 0;
 
rcu_read_lock();
+   if (!pid_alive(parent) || !pid_alive(walker)) {
+   rcu_read_unlock();
+   printk("parent or walker is dead.\n");
+   return 0;
+   }
if (!thread_group_leader(parent))
parent = rcu_dereference(parent->group_leader);
while (walker->pid > 0) {
@@ -315,6 +320,10 @@ static int ptracer_exception_found(struct task_struct 
*tracer,
bool found = false;
 
rcu_read_lock();
+   if (!pid_alive(tracee)) {
+   printk("tracee is dead.\n");
+   goto unlock;
+   }
 
/*
 * If there's already an active tracing relationship, then make an
--

--
#include <sys/ptrace.h>
#include <unistd.h>
#include <poll.h>

int main(int argc, char *argv[])
{
if (fork() == 0) {
ptrace(PTRACE_ATTACH, getppid(), NULL, NULL);
_exit(0);
}
poll(NULL, 0, 100);
return 0;
}
--

But since "child" has at least one reference, reading "child->real_parent" 
should
be safe. Therefore, I think that bailing out due to pid_is_alive(child) == false
(like above patch does) cannot avoid this problem...
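
To make the window described above easier to see, the sequence can be drawn
as a timeline (illustration only, not code to merge anywhere):

/*
 *   ptrace()/yama path                     child and its parent
 *   ------------------                     --------------------
 *   get_task_struct(child)
 *                                          child exits
 *                                          real_parent exits and calls
 *                                          call_rcu(delayed_put_task_struct)
 *                                          ... grace period elapses ...
 *                                          parent's task_struct is freed
 *   rcu_read_lock()
 *   walker = child
 *   walker = rcu_dereference(walker->real_parent)   <-- use-after-free
 */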



Re: [PATCH 08/17] prmem: struct page: track vmap_area

2018-10-24 Thread Matthew Wilcox
On Thu, Oct 25, 2018 at 02:01:02AM +0300, Igor Stoppa wrote:
> > > @@ -1747,6 +1750,10 @@ void *__vmalloc_node_range(unsigned long size, 
> > > unsigned long align,
> > >   if (!addr)
> > >   return NULL;
> > > + va = __find_vmap_area((unsigned long)addr);
> > > + for (i = 0; i < va->vm->nr_pages; i++)
> > > + va->vm->pages[i]->area = va;
> > 
> > I don't like it that you're calling this for _every_ vmalloc() caller
> > when most of them will never use this.  Perhaps have page->va be initially
> > NULL and then cache the lookup in it when it's accessed for the first time.
> > 
> 
> If __find_vmap_area() was part of the API, this loop could be left out from
> __vmalloc_node_range() and the user of the allocation could initialize the
> field, if needed.
> 
> What is the reason for keeping __find_vmap_area() private?

Well, for one, you're walking the rbtree without holding the spinlock,
so you're going to get crashes.  I don't see why we shouldn't export
find_vmap_area() though.

Another way we could approach this is to embed the vmap_area in the
vm_struct.  It'd require a bit of juggling of the alloc/free paths in
vmalloc, but it might be worthwhile.
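
A hedged sketch of what the quoted hunk could look like with a locked lookup
instead; find_vmap_area() is the existing mm/vmalloc.c helper that takes
vmap_area_lock internally (currently static, hence the export question), and
page->area is the field added by the patch under discussion, so this is
illustrative only:

	struct vmap_area *va;
	unsigned int i;

	/* locked rbtree lookup instead of the unprotected __find_vmap_area() */
	va = find_vmap_area((unsigned long)addr);
	if (va && va->vm) {
		for (i = 0; i < va->vm->nr_pages; i++)
			va->vm->pages[i]->area = va;
	}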


[PATCH] mmc: sdhci: Convert sdhci_allocate_bounce_buffer() to return void

2018-10-24 Thread Chunyan Zhang
The function sdhci_allocate_bounce_buffer() always returns zero at
present, so there is no need for a return value; this also makes the
error path simpler.

CC: Linus Walleij 
Signed-off-by: Chunyan Zhang 
---
 drivers/mmc/host/sdhci.c | 15 +--
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
index 1b3fbd9..e94f4aa 100644
--- a/drivers/mmc/host/sdhci.c
+++ b/drivers/mmc/host/sdhci.c
@@ -3408,7 +3408,7 @@ void __sdhci_read_caps(struct sdhci_host *host, u16 *ver, 
u32 *caps, u32 *caps1)
 }
 EXPORT_SYMBOL_GPL(__sdhci_read_caps);
 
-static int sdhci_allocate_bounce_buffer(struct sdhci_host *host)
+static void sdhci_allocate_bounce_buffer(struct sdhci_host *host)
 {
struct mmc_host *mmc = host->mmc;
unsigned int max_blocks;
@@ -3446,7 +3446,7 @@ static int sdhci_allocate_bounce_buffer(struct sdhci_host 
*host)
 * Exiting with zero here makes sure we proceed with
 * mmc->max_segs == 1.
 */
-   return 0;
+   return;
}
 
host->bounce_addr = dma_map_single(mmc->parent,
@@ -3456,7 +3456,7 @@ static int sdhci_allocate_bounce_buffer(struct sdhci_host 
*host)
ret = dma_mapping_error(mmc->parent, host->bounce_addr);
if (ret)
/* Again fall back to max_segs == 1 */
-   return 0;
+   return;
host->bounce_buffer_size = bounce_size;
 
/* Lie about this since we're bouncing */
@@ -3466,8 +3466,6 @@ static int sdhci_allocate_bounce_buffer(struct sdhci_host 
*host)
 
pr_info("%s bounce up to %u segments into one, max segment size %u 
bytes\n",
mmc_hostname(mmc), max_blocks, bounce_size);
-
-   return 0;
 }
 
 int sdhci_setup_host(struct sdhci_host *host)
@@ -3987,12 +3985,9 @@ int sdhci_setup_host(struct sdhci_host *host)
 */
mmc->max_blk_count = (host->quirks & SDHCI_QUIRK_NO_MULTIBLOCK) ? 1 : 
65535;
 
-   if (mmc->max_segs == 1) {
+   if (mmc->max_segs == 1)
/* This may alter mmc->*_blk_* parameters */
-   ret = sdhci_allocate_bounce_buffer(host);
-   if (ret)
-   return ret;
-   }
+   sdhci_allocate_bounce_buffer(host);
 
return 0;
 
-- 
2.7.4



Re: [PATCH 2/2] perf tests: Add a test for the ARM 32-bit [vectors] page

2018-10-24 Thread Andrew Lunn
On Wed, Oct 24, 2018 at 05:09:05PM -0700, Florian Fainelli wrote:
> perf on ARM requires CONFIG_KUSER_HELPERS to be turned on to allow some
> independence with respect to the ARM CPU being used. Add a test which
> tries to locate the [vectors] page, created when CONFIG_KUSER_HELPERS is
> turned on, to help assess the system's health.

Hi Florian

I've suffered the pain of a missing CONFIG_KUSER_HELPERS. The
segfaults give little clue as to what is going wrong, and gdb is not
much use either.

What I don't see here is any clue to CONFIG_KUSER_HELPERS. If the test
fails, could you print a message suggesting CONFIG_KUSER_HELPERS on
ARM?

Andrew
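
A minimal sketch of the kind of hint being asked for, assuming the test keeps
a flag after scanning /proc/self/maps; the flag name is made up, while
pr_err() and TEST_OK/TEST_FAIL are the usual perf-test helpers:

	if (!found_vectors_page) {
		pr_err("No [vectors] page found in /proc/self/maps.\n"
		       "Hint: on 32-bit ARM, perf needs a kernel built with "
		       "CONFIG_KUSER_HELPERS=y.\n");
		return TEST_FAIL;
	}

	return TEST_OK;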


Re: [PATCH v6 1/6] dt-bindings: power: Introduce one property to present the battery internal resistance

2018-10-24 Thread Baolin Wang
Hi Sebastian,

On 22 October 2018 at 15:43, Baolin Wang  wrote:
> The internal resistance of a battery is not a constant over its life cycle;
> it varies with the age of the battery, temperature and so on. But we
> just want to use one constant battery internal resistance to estimate the
> battery capacity. Thus this patch introduces one property to describe
> the battery's factory internal resistance in the battery information.
>
> Signed-off-by: Baolin Wang 
> Reviewed-by: Linus Walleij 
> ---
> Changes from v5:
>  - None.
>
> Changes from v4:
>  - None.
>
> Changes from v3:
>  - Split binding into one separate patch.
>  - Add LinusW reviewed tag.
>
> Changes from v2:
>  - Rename the property.
>  - Improve the commit message.
>
> Changes from v1:
>  - New patch in v2.
> ---

I think this v6 patch set has addressed your comments on v5, so could
you apply it if there are no other comments? I hope this driver can
finally be merged in 4.20. Thanks.

-- 
Baolin Wang
Best Regards


Re: [PATCH 5/5] rtc: sc27xx: Always read normal alarm when registering RTC device

2018-10-24 Thread Baolin Wang
Hi Alexandre,

On 25 October 2018 at 08:34, Alexandre Belloni
 wrote:
> Hello,
>
> On 18/10/2018 16:52:30+0800, Baolin Wang wrote:
>> When registering an RTC device, the core will check to see if there is an
>> alarm already set in the RTC hardware by reading the RTC alarm; at this time
>> we should always read the normal alarm put in the always-on region, by
>> checking the rtc->registered flag.
>>
>> Signed-off-by: Baolin Wang 
>> ---
>>  drivers/rtc/rtc-sc27xx.c |8 ++--
>>  1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/rtc/rtc-sc27xx.c b/drivers/rtc/rtc-sc27xx.c
>> index 72bb002..b4eb3b3 100644
>> --- a/drivers/rtc/rtc-sc27xx.c
>> +++ b/drivers/rtc/rtc-sc27xx.c
>> @@ -415,10 +415,14 @@ static int sprd_rtc_read_alarm(struct device *dev, 
>> struct rtc_wkalrm *alrm)
>>   u32 val;
>>
>>   /*
>> -  * If aie_timer is enabled, we should get the normal alarm time.
>> +  * Before RTC device is registered, it will check to see if there is an
>> +  * alarm already set in RTC hardware, and we always read the normal
>> +  * alarm at this time.
>> +  *
>> +  * Or if aie_timer is enabled, we should get the normal alarm time.
>>* Otherwise we should get auxiliary alarm time.
>>*/
>> - if (rtc->rtc && rtc->rtc->aie_timer.enabled == 0)
>> + if (rtc->rtc && rtc->rtc->registered && rtc->rtc->aie_timer.enabled == 
>> 0)
>
> Note that the driver should not access rtc->registered and
> rtc->aie_timer.enabled and this is a bit fragile.
> But, on the other hand, I currently don't have anything better to
> suggest. I was also planning to add an in-kernel API for multiple alarms
> but I'm not sure it will actually help in your case.

Yes, I understand your concern. I will be glad to help test any new APIs
for multiple alarms you introduce, to see if they can help in our
case. Thanks.

-- 
Baolin Wang
Best Regards


[PATCH] kvm/x86 : remove unused struct definition

2018-10-24 Thread Peng Hao
The structure svm_init_data is never used, so remove it.

Signed-off-by: Peng Hao 
---
 arch/x86/kvm/svm.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 61ccfb1..5c7dc8b 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -675,11 +675,6 @@ struct svm_cpu_data {
 
 static DEFINE_PER_CPU(struct svm_cpu_data *, svm_data);
 
-struct svm_init_data {
-   int cpu;
-   int r;
-};
-
 static const u32 msrpm_ranges[] = {0, 0xc000, 0xc001};
 
 #define NUM_MSR_MAPS ARRAY_SIZE(msrpm_ranges)
-- 
1.8.3.1



Re: investments proposal

2018-10-24 Thread Mr.Gabson Richard




--
Dear Sir,

We are contacting your firm with a keen interest to consider a possible
collaboration through investments in viable projects globally. We are
an investment firm responsible for investing part of our client surplus
financial resources through a globally diversified investment strategy,
targeting positive capital returns through an expansive portfolio of
highly diversified asset classes and active investment management
strategies.

Currently we wish to re-invest through project financing and investment
opportunities globally. We are ready to invest in project developments
and business ventures that can generate at least 10% ROI per annum. Our
offer for investment is based on a purely loan/debt financing basis of
2.5% interest rate per annum throughout the duration of the
collaboration.

We look forward to reading from you on possible projects and investments
suitable for possible collaboration.

Best regards,

Mr.Gabson Richard
Email:mr_gabsonrich...@aol.com


[PATCH v3 1/3] dt-bindings: Input: new bindings for MSM vibrator

2018-10-24 Thread Brian Masney
This patch adds the device tree bindings for the vibrator found on
various Qualcomm MSM SOCs.

Signed-off-by: Brian Masney 
---
 .../bindings/input/msm-vibrator.txt   | 36 +++
 1 file changed, 36 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/input/msm-vibrator.txt

diff --git a/Documentation/devicetree/bindings/input/msm-vibrator.txt 
b/Documentation/devicetree/bindings/input/msm-vibrator.txt
new file mode 100644
index ..8dcf014ef2e5
--- /dev/null
+++ b/Documentation/devicetree/bindings/input/msm-vibrator.txt
@@ -0,0 +1,36 @@
+* Device tree bindings for the Qualcomm MSM vibrator
+
+Required properties:
+
+  - compatible: Should be one of
+   "qcom,msm8226-vibrator"
+   "qcom,msm8974-vibrator"
+  - reg: the base address and length of the IO memory for the registers.
+  - pinctrl-names: set to default.
+  - pinctrl-0: phandles pointing to pin configuration nodes. See
+   Documentation/devicetree/bindings/pinctrl/pinctrl-bindings.txt
+  - clock-names: set to pwm
+  - clocks: phandle of the clock. See
+Documentation/devicetree/bindings/clock/clock-bindings.txt
+  - enable-gpios: GPIO that enables the vibrator.
+
+Optional properties:
+
+  - vcc-supply: phandle to the regulator that provides power to the vibrator.
+
+Example from a LG Nexus 5 (hammerhead) phone:
+
+vibrator@fd8c3450 {
+   reg = <0xfd8c3450 0x400>;
+   compatible = "qcom,msm8974-vibrator";
+
+   vcc-supply = <_l19>;
+
+   clocks = < CAMSS_GP1_CLK>;
+   clock-names = "pwm";
+
+   enable-gpios = < 60 GPIO_ACTIVE_HIGH>;
+
+   pinctrl-names = "default";
+   pinctrl-0 = <_pin>;
+};
-- 
2.17.2



[PATCH v3 0/3] ARM: qcom: add vibrator support for various MSM SOCs

2018-10-24 Thread Brian Masney
This patch set adds support for the vibrator found on various Qualcomm
MSM SOCs. This is based on work from:

Jonathan Marek from qcom,pwm-vibrator.c in the PostmarketOS repo:
https://gitlab.com/postmarketOS/linux-postmarketos/commit/7647fb36cb1cbd060f8b52087a68ab93583292b5

Jongrak Kwon and Devin Kim from msm_pwm_vibrator.c in the downstream
Android 3.4.0 sources:
https://android.googlesource.com/kernel/msm/+/android-msm-lenok-3.10-lollipop-wear-release/drivers/misc/msm_pwm_vibrator.c

Driver was tested on a LG Nexus 5 (hammerhead) phone using rumble-test:
https://git.collabora.com/cgit/user/sre/rumble-test.git/plain/rumble-test.c

Changes since v2
- Moved from pwm to input subsystem based on feedback from
  https://lore.kernel.org/lkml/20181012114749.GC31561@ulmo/. I
  previously wired this into the input system via pwm-vibra.

Changes since v1
- Update device tree binding document based on feedback from Rob
  Herring.
- Changed the driver description to: Qualcomm PWM driver for the MSM
  vibrator.

Brian Masney (3):
  dt-bindings: Input: new bindings for MSM vibrator
  Input: add new vibrator driver for various MSM SOCs
  ARM: dts: qcom: msm8974-hammerhead: add device tree bindings for
vibrator

 .../bindings/input/msm-vibrator.txt   |  36 +++
 .../qcom-msm8974-lge-nexus5-hammerhead.dts|  31 ++
 drivers/input/misc/Kconfig|  10 +
 drivers/input/misc/Makefile   |   1 +
 drivers/input/misc/msm-vibrator.c | 276 ++
 5 files changed, 354 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/input/msm-vibrator.txt
 create mode 100644 drivers/input/misc/msm-vibrator.c

-- 
2.17.2



[PATCH v3 2/3] Input: add new vibrator driver for various MSM SOCs

2018-10-24 Thread Brian Masney
This patch adds a new vibrator driver that supports various Qualcomm
MSM SOCs. Driver was tested on a LG Nexus 5 (hammerhead) phone.

Signed-off-by: Brian Masney 
---
 drivers/input/misc/Kconfig|  10 ++
 drivers/input/misc/Makefile   |   1 +
 drivers/input/misc/msm-vibrator.c | 276 ++
 3 files changed, 287 insertions(+)
 create mode 100644 drivers/input/misc/msm-vibrator.c

diff --git a/drivers/input/misc/Kconfig b/drivers/input/misc/Kconfig
index ca59a2be9bc5..e39aef84f357 100644
--- a/drivers/input/misc/Kconfig
+++ b/drivers/input/misc/Kconfig
@@ -117,6 +117,16 @@ config INPUT_E3X0_BUTTON
  To compile this driver as a module, choose M here: the
  module will be called e3x0_button.
 
+config INPUT_MSM_VIBRATOR
+   tristate "Qualcomm MSM vibrator driver"
+   select INPUT_FF_MEMLESS
+   help
+ Support for the vibrator that is found on various Qualcomm MSM
+ SOCs.
+
+ To compile this driver as a module, choose M here: the module
+ will be called msm_vibrator.
+
 config INPUT_PCSPKR
tristate "PC Speaker support"
depends on PCSPKR_PLATFORM
diff --git a/drivers/input/misc/Makefile b/drivers/input/misc/Makefile
index 9d0f9d1ff68f..96a6419cb1f2 100644
--- a/drivers/input/misc/Makefile
+++ b/drivers/input/misc/Makefile
@@ -48,6 +48,7 @@ obj-$(CONFIG_INPUT_MAX8925_ONKEY) += max8925_onkey.o
 obj-$(CONFIG_INPUT_MAX8997_HAPTIC) += max8997_haptic.o
 obj-$(CONFIG_INPUT_MC13783_PWRBUTTON)  += mc13783-pwrbutton.o
 obj-$(CONFIG_INPUT_MMA8450)+= mma8450.o
+obj-$(CONFIG_INPUT_MSM_VIBRATOR)   += msm-vibrator.o
 obj-$(CONFIG_INPUT_PALMAS_PWRBUTTON)   += palmas-pwrbutton.o
 obj-$(CONFIG_INPUT_PCAP)   += pcap_keys.o
 obj-$(CONFIG_INPUT_PCF50633_PMU)   += pcf50633-input.o
diff --git a/drivers/input/misc/msm-vibrator.c 
b/drivers/input/misc/msm-vibrator.c
new file mode 100644
index ..bb862f30a780
--- /dev/null
+++ b/drivers/input/misc/msm-vibrator.c
@@ -0,0 +1,276 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Qualcomm MSM vibrator driver
+ *
+ * Copyright (c) 2018 Brian Masney 
+ *
+ * Based on qcom,pwm-vibrator.c from:
+ * Copyright (c) 2018 Jonathan Marek 
+ *
+ * Based on msm_pwm_vibrator.c from downstream Android sources:
+ * Copyright (C) 2009-2014 LGE, Inc.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define REG_CMD_RCGR   0x00
+#define REG_CFG_RCGR   0x04
+#define REG_M  0x08
+#define REG_N  0x0C
+#define REG_D  0x10
+#define REG_CBCR   0x24
+#define MMSS_CC_M_DEFAULT  1
+
+struct msm_vibrator {
+   struct input_dev *input;
+   struct mutex mutex;
+   struct work_struct worker;
+   void __iomem *base;
+   struct regulator *vcc;
+   struct clk *clk;
+   struct gpio_desc *enable_gpio;
+   u16 magnitude;
+   bool enabled;
+};
+
+#define msm_vibrator_write(msm_vibrator, offset, value) \
+   writel((value), (void __iomem *)((msm_vibrator)->base + (offset)))
+
+static int msm_vibrator_start(struct msm_vibrator *vibrator)
+{
+   int d_reg_val, ret = 0;
+
+   mutex_lock(&vibrator->mutex);
+
+   if (!vibrator->enabled) {
+   ret = clk_set_rate(vibrator->clk, 24000);
+   if (ret) {
+   dev_err(&vibrator->input->dev,
+   "Failed to set clock rate: %d\n", ret);
+   goto unlock;
+   }
+
+   ret = clk_prepare_enable(vibrator->clk);
+   if (ret) {
+   dev_err(&vibrator->input->dev,
+   "Failed to enable clock: %d\n", ret);
+   goto unlock;
+   }
+
+   ret = regulator_enable(vibrator->vcc);
+   if (ret) {
+   dev_err(&vibrator->input->dev,
+   "Failed to enable regulator: %d\n", ret);
+   clk_disable(vibrator->clk);
+   goto unlock;
+   }
+
+   gpiod_set_value_cansleep(vibrator->enable_gpio, 1);
+
+   vibrator->enabled = true;
+   }
+
+   d_reg_val = 127 - ((126 * vibrator->magnitude) / 0xffff);
+   msm_vibrator_write(vibrator, REG_CFG_RCGR,
+  (2 << 12) | /* dual edge mode */
+  (0 << 8) |  /* cxo */
+  (7 << 0));
+   msm_vibrator_write(vibrator, REG_M, 1);
+   msm_vibrator_write(vibrator, REG_N, 128);
+   msm_vibrator_write(vibrator, REG_D, d_reg_val);
+   msm_vibrator_write(vibrator, REG_CMD_RCGR, 1);
+   msm_vibrator_write(vibrator, REG_CBCR, 1);
+
+unlock:
+   mutex_unlock(&vibrator->mutex);
+
+   return ret;
+}
+
+static void msm_vibrator_stop(struct msm_vibrator *vibrator)
+{
+   mutex_lock(&vibrator->mutex);
+
+   if (vibrator->enabled) {
+   
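
A quick back-of-the-envelope check of the duty calculation in
msm_vibrator_start() above, written as a stand-alone program. It assumes the
divisor is the full 16-bit force-feedback magnitude range, 0xffff; this is an
illustration of the arithmetic only, not driver code:

#include <stdio.h>

/*
 * Mirror of the D register value computed above: it scales linearly from
 * 127 at zero magnitude down to 1 at full (0xffff) magnitude.
 */
static int d_reg_for(unsigned int magnitude)
{
        return 127 - ((126 * magnitude) / 0xffff);
}

int main(void)
{
        printf("magnitude 0x0000 -> D = %d\n", d_reg_for(0x0000)); /* 127 */
        printf("magnitude 0x8000 -> D = %d\n", d_reg_for(0x8000)); /*  64 */
        printf("magnitude 0xffff -> D = %d\n", d_reg_for(0xffff)); /*   1 */
        return 0;
}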

Re: TSC to Mono-raw Drift

2018-10-24 Thread Christopher Hall
On Wed, Oct 24, 2018 at 04:51:13PM +0200, Miroslav Lichvar wrote:
> On Tue, Oct 23, 2018 at 11:31:00AM -0700, John Stultz wrote:
> > On Fri, Oct 19, 2018 at 3:36 PM, John Stultz  wrote:
> > > On Fri, Oct 19, 2018 at 1:50 PM, Thomas Gleixner  
> > > wrote:
> > >> On Fri, 19 Oct 2018, John Stultz wrote:
> > >>> We might be able to reduce the degree in this case, but I worry the
> > >>> extra complexity may only cause problems for others.
> > >>
> > >> Is it really that complex to add a fixed correction value periodically?
> 
> The error is too large to be corrected by stepping on clock updates.
> For a typical TSC frequency we have multiplier in the range of few
> millions, so that's a frequency error of up to few hundred ppb. In the

For this example, an i5-6600 CPU @ 3.30GHz, I measured 90 ppb. That is the
largest I've seen since measuring this on a few platforms; 20-40 ppb seems
more typical.
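
To put those numbers in perspective (my arithmetic, not a measurement from
the thread): a constant frequency error of 90 ppb accumulates to roughly
7.8 ms per day, and 20-40 ppb to about 1.7-3.5 ms per day. For example:

#include <stdio.h>

int main(void)
{
        /* Accumulated time offset per day for a fixed fractional frequency error. */
        const double seconds_per_day = 86400.0;
        const double ppb[] = { 20.0, 40.0, 90.0 };

        for (int i = 0; i < 3; i++)
                printf("%5.1f ppb -> %.2f ms/day\n",
                       ppb[i], ppb[i] * 1e-9 * seconds_per_day * 1e3);

        return 0;
}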

> old days when the clock was updated 1000 times per second that would
> be hidden in the resolution of the clock, but now with tickless
> kernels those steps would be very noticeable.
> 
> If the multiplier was adjusted in the same way as the non-raw clock,
> there wouldn't be any steps in time, but there would be steps in
> frequency and the time error would be proportional to the update
> interval. For a clock updated once per second that's an error of up to
> a few hundreds of nanoseconds.
> 
> I agree with John. I think the raw monotonic clock should be stable.
> 
> It would help if we better understood the use case. If I needed a

There may be other future use cases, but the most immediate use is timed
GPIO. Current shipping hardware has a timed GPIO device that uses ART/TSC
directly to schedule output events and time-stamp input events. There isn't
a device clock like a PHC, for example.

An example application is synchronizing TGPIO across two networked devices
sync'd with PTP. It's possible to use the existing PHC interface to
compute the network time in terms of raw monotonic and vice versa.

Now we need to add a TSC to monotonic raw relation in order to interact
with the TGPIO hardware. To me this doesn't make sense because they use the
same hardware clock.

> clock that ticks in an exact proportion to the tsc, why wouldn't I use
> the tsc directly? Is this about having a fall back in case the tsc

I think this would work OK. This would need some additional API to expose the
relation between monotonic raw and TSC. In the kernel that would probably mean
plumbing ktime_get_snapshot() to the application. Or, another option:
add TSC as another clock to the get_device_system_crosststamp() interface.

The second option would be more precise, but for the proposed application
(or similar) would require changes to the supporting stacks like PTP.

> cannot be used (e.g. due to unsynchronized CPUs)?

No.
> 
> If the frequency error was exported, it could be compensated where
> necessary. Maybe that would work for the original poster?

This would be the same as plumbing ktime_get_snapshot() except that the
math would be done in the kernel.

> 
> A better fix might be to modify the calculation of time to use a
> second multiplier, effectively increasing its resolution. However,

I'm not sure I'm understanding. Like cascading transforms? While this
would increase the precision, I think it would still drift over days. We
could probably fix up every second though.

> that would slow down all users of the clock.

Couldn't some clocksources specify an additional multiplier/precision and
others use lower precision?

> 
> -- 
> Miroslav Lichvar


[PATCH v3 3/3] ARM: dts: qcom: msm8974-hammerhead: add device tree bindings for vibrator

2018-10-24 Thread Brian Masney
This patch adds device tree bindings for the vibrator found on
the LG Nexus 5 (hammerhead) phone.

Signed-off-by: Brian Masney 
---
 .../qcom-msm8974-lge-nexus5-hammerhead.dts| 31 +++
 1 file changed, 31 insertions(+)

diff --git a/arch/arm/boot/dts/qcom-msm8974-lge-nexus5-hammerhead.dts 
b/arch/arm/boot/dts/qcom-msm8974-lge-nexus5-hammerhead.dts
index ed8f064d0895..de956eea31ac 100644
--- a/arch/arm/boot/dts/qcom-msm8974-lge-nexus5-hammerhead.dts
+++ b/arch/arm/boot/dts/qcom-msm8974-lge-nexus5-hammerhead.dts
@@ -5,6 +5,7 @@
 #include 
 #include 
 #include 
+#include 
 
 / {
model = "LGE MSM 8974 HAMMERHEAD";
@@ -268,6 +269,36 @@
input-enable;
};
};
+
+   vibrator_pin: vibrator {
+   pwm {
+   pins = "gpio27";
+   function = "gp1_clk";
+
+   drive-strength = <6>;
+   bias-disable;
+   };
+
+   enable {
+   pins = "gpio60";
+   function = "gpio";
+   };
+   };
+   };
+
+   vibrator@fd8c3450 {
+   compatible = "qcom,msm8974-vibrator";
+   reg = <0xfd8c3450 0x400>;
+
+   vcc-supply = <_l19>;
+
+   clocks = < CAMSS_GP1_CLK>;
+   clock-names = "pwm";
+
+   enable-gpios = < 60 GPIO_ACTIVE_HIGH>;
+
+   pinctrl-names = "default";
+   pinctrl-0 = <_pin>;
};
 
sdhci@f9824900 {
-- 
2.17.2



[PATCH 2/2] mm/zsmalloc.c: fix zsmalloc ARM LPAE support

2018-10-24 Thread Rafael David Tinoco
Since commit 02390b87a945 ("mm/zsmalloc: Prepare to variable
MAX_PHYSMEM_BITS"), an architecture has to define this value in order to
guarantee that zsmalloc will be able to encode and decode the obj value
properly.

Similar to that change, this one sets the value for ARM LPAE, fixing a
possible null-ptr-deref in zs_map_object() when using ARM LPAE and
HIGHMEM pages located above the 4GB watermark.

Link: https://bugs.linaro.org/show_bug.cgi?id=3765#c17
Signed-off-by: Rafael David Tinoco 
---
 arch/arm/include/asm/pgtable-3level-types.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm/include/asm/pgtable-3level-types.h 
b/arch/arm/include/asm/pgtable-3level-types.h
index 921aa30259c4..bd4994f98700 100644
--- a/arch/arm/include/asm/pgtable-3level-types.h
+++ b/arch/arm/include/asm/pgtable-3level-types.h
@@ -67,4 +67,6 @@ typedef pteval_t pgprot_t;
 
 #endif /* STRICT_MM_TYPECHECKS */
 
+#define MAX_POSSIBLE_PHYSMEM_BITS  36
+
 #endif /* _ASM_PGTABLE_3LEVEL_TYPES_H */
-- 
2.19.1



[PATCH 1/2] mm/zsmalloc.c: check encoded object value overflow for PAE

2018-10-24 Thread Rafael David Tinoco
On 32-bit systems, zsmalloc uses HIGHMEM and, when PAE is enabled, the
physical frame number might be so big that the zsmalloc obj encoding (of the
location) will break IF the architecture does not re-define
MAX_PHYSMEM_BITS, causing:

[  497.097843] 
==
[  497.102365] BUG: KASAN: null-ptr-deref in zs_map_object+0xa4/0x2bc
[  497.105933] Read of size 4 at addr  by task mkfs.ext4/623
[  497.109684]
[  497.110722] CPU: 2 PID: 623 Comm: mkfs.ext4 Not tainted 
4.19.0-rc8-00017-g8239bc6d3307-dirty #15
[  497.116098] Hardware name: Generic DT based system
[  497.119094] [] (unwind_backtrace) from [] 
(show_stack+0x20/0x24)
[  497.123819] [] (show_stack) from [] 
(dump_stack+0xbc/0xe8)
[  497.128299] [] (dump_stack) from [] 
(kasan_report+0x248/0x390)
[  497.132928] [] (kasan_report) from [] 
(__asan_load4+0x78/0xb4)
[  497.137530] [] (__asan_load4) from [] 
(zs_map_object+0xa4/0x2bc)
[  497.142335] [] (zs_map_object) from [] 
(zram_bvec_rw.constprop.2+0x324/0x8e4 [zram])
[  497.148294] [] (zram_bvec_rw.constprop.2 [zram]) from [] 
(zram_make_request+0x234/0x46c [zram])
[  497.154653] [] (zram_make_request [zram]) from [] 
(generic_make_request+0x304/0x63c)
[  497.160413] [] (generic_make_request) from [] 
(submit_bio+0x4c/0x1c8)
[  497.165379] [] (submit_bio) from [] 
(submit_bh_wbc.constprop.15+0x238/0x26c)
[  497.170775] [] (submit_bh_wbc.constprop.15) from [] 
(__block_write_full_page+0x524/0x76c)
[  497.176776] [] (__block_write_full_page) from [] 
(block_write_full_page+0x1bc/0x1d4)
[  497.182549] [] (block_write_full_page) from [] 
(blkdev_writepage+0x24/0x28)
[  497.187849] [] (blkdev_writepage) from [] 
(__writepage+0x44/0x78)
[  497.192633] [] (__writepage) from [] 
(write_cache_pages+0x3b8/0x800)
[  497.197616] [] (write_cache_pages) from [] 
(generic_writepages+0x74/0xa0)
[  497.202807] [] (generic_writepages) from [] 
(blkdev_writepages+0x18/0x1c)
[  497.208022] [] (blkdev_writepages) from [] 
(do_writepages+0x68/0x134)
[  497.213002] [] (do_writepages) from [] 
(__filemap_fdatawrite_range+0xb0/0xf4)
[  497.218447] [] (__filemap_fdatawrite_range) from [] 
(file_write_and_wait_range+0x64/0xd0)
[  497.224416] [] (file_write_and_wait_range) from [] 
(blkdev_fsync+0x54/0x84)
[  497.229749] [] (blkdev_fsync) from [] 
(vfs_fsync_range+0x70/0xd4)
[  497.234549] [] (vfs_fsync_range) from [] 
(do_fsync+0x4c/0x80)
[  497.239159] [] (do_fsync) from [] (sys_fsync+0x1c/0x20)
[  497.243407] [] (sys_fsync) from [] 
(ret_fast_syscall+0x0/0x2c)

as in this ARM 32-bit (LPAE enabled) example.

That occurs because, if not set, MAX_POSSIBLE_PHYSMEM_BITS will default
to BITS_PER_LONG (32) in most cases, and, for PAE, _PFN_BITS will be
wrong, which may cause the obj variable to overflow if the PFN is in HIGHMEM
and references a page above the 4GB watermark.

This commit exposes the BUG IF the architecture supports PAE AND does
not explicitly set MAX_POSSIBLE_PHYSMEM_BITS to a supported value.
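
To make the arithmetic concrete, here is a stand-alone sketch of the same
overflow check. The page size, the OBJ_TAG_BITS value and the default
MAX_POSSIBLE_PHYSMEM_BITS below are assumptions for illustration: with 4 KiB
pages that leaves 20 bits for the PFN, so any page at or above the 4 GiB
boundary no longer fits in the encoding.

#include <stdio.h>

#define BITS_PER_LONG                   32
#define PAGE_SHIFT                      12      /* assumed 4 KiB pages */
#define OBJ_TAG_BITS                    1       /* assumed */
#define MAX_POSSIBLE_PHYSMEM_BITS       32      /* assumed default: BITS_PER_LONG */
#define PFN_BITS                        (MAX_POSSIBLE_PHYSMEM_BITS - PAGE_SHIFT)
#define OBJ_INDEX_BITS                  (BITS_PER_LONG - PFN_BITS - OBJ_TAG_BITS)

/* Same test as the OBJ_OVERFLOW() macro in the patch, done in plain C. */
static int obj_overflows(unsigned long long pfn)
{
        return (pfn << (OBJ_INDEX_BITS + OBJ_TAG_BITS)) >=
               (1ULL << MAX_POSSIBLE_PHYSMEM_BITS);
}

int main(void)
{
        /* The last PFN below 4 GiB is fine, the first one at 4 GiB overflows. */
        printf("pfn 0x000fffff overflows: %d\n", obj_overflows(0x000fffffULL));
        printf("pfn 0x00100000 overflows: %d\n", obj_overflows(0x00100000ULL));
        return 0;
}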

Link: https://bugs.linaro.org/show_bug.cgi?id=3765#c17
Signed-off-by: Rafael David Tinoco 
---
 mm/zsmalloc.c | 18 --
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 9da65552e7ca..9c3ff8c2ccbc 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -119,6 +119,15 @@
 #define OBJ_INDEX_BITS (BITS_PER_LONG - _PFN_BITS - OBJ_TAG_BITS)
 #define OBJ_INDEX_MASK ((_AC(1, UL) << OBJ_INDEX_BITS) - 1)
 
+/*
+ * When using PAE, the obj encoding might overflow if arch does
+ * not re-define MAX_PHYSMEM_BITS, since zsmalloc uses HIGHMEM.
+ * This checks for a future bad page access, when de-coding obj.
+ */
+#define OBJ_OVERFLOW(_pfn) \
+   (((unsigned long long) _pfn << (OBJ_INDEX_BITS + OBJ_TAG_BITS)) >= \
+   ((_AC(1, ULL)) << MAX_POSSIBLE_PHYSMEM_BITS) ? 1 : 0)
+
 #define FULLNESS_BITS  2
 #define CLASS_BITS 8
 #define ISOLATED_BITS  3
@@ -871,9 +880,14 @@ static void obj_to_location(unsigned long obj, struct page 
**page,
  */
 static unsigned long location_to_obj(struct page *page, unsigned int obj_idx)
 {
-   unsigned long obj;
+   unsigned long obj, pfn;
+
+   pfn = page_to_pfn(page);
+
+   if (unlikely(OBJ_OVERFLOW(pfn)))
+   BUG();
 
-   obj = page_to_pfn(page) << OBJ_INDEX_BITS;
+   obj = pfn << OBJ_INDEX_BITS;
obj |= obj_idx & OBJ_INDEX_MASK;
obj <<= OBJ_TAG_BITS;
 
-- 
2.19.1



Re: [PATCH -V6 14/21] swap: Support to move swap account for PMD swap mapping

2018-10-24 Thread Huang, Ying
Daniel Jordan  writes:

> On Wed, Oct 10, 2018 at 03:19:17PM +0800, Huang Ying wrote:
>> +static struct page *mc_handle_swap_pmd(struct vm_area_struct *vma,
>> +pmd_t pmd, swp_entry_t *entry)
>> +{
>
> Got
> /home/dbbench/linux/mm/memcontrol.c:4719:21: warning: ‘mc_handle_swap_pmd’ 
> defined but not used [-Wunused-function]
>  static struct page *mc_handle_swap_pmd(struct vm_area_struct *vma,
> when
> # CONFIG_TRANSPARENT_HUGEPAGE is not set

Thanks for pointing this out.  Will fix it in the next version.

Best Regards,
Huang, Ying


Re: [PATCH v3 4/4] nds32: Add document for NDS32 PMU.

2018-10-24 Thread Nick Hu
Hi Rob,

On Thu, Oct 25, 2018 at 02:32:48AM +0800, Rob Herring wrote:
> On Wed, Oct 24, 2018 at 11:32:40AM +0800, Nickhu wrote:
> > The document for how to add NDS32 PMU
> > in devicetree.
> > 
> > Signed-off-by: Nickhu 
> > ---
> >  Documentation/devicetree/bindings/nds32/pmu.txt | 17 +
> >  1 file changed, 17 insertions(+)
> >  create mode 100644 Documentation/devicetree/bindings/nds32/pmu.txt
> 
> Can you move to bindings/perf/ with other similar h/w. With that,
> 
> Reviewed-by: Rob Herring 

Sure, thanks!
Nick


Re: [PATCH -V6 06/21] swap: Support PMD swap mapping when splitting huge PMD

2018-10-24 Thread Huang, Ying
Daniel Jordan  writes:

> On Wed, Oct 10, 2018 at 03:19:09PM +0800, Huang Ying wrote:
>> +#ifdef CONFIG_THP_SWAP
>> +/*
>> + * The corresponding page table shouldn't be changed under us, that
>> + * is, the page table lock should be held.
>> + */
>> +int split_swap_cluster_map(swp_entry_t entry)
>> +{
>> +struct swap_info_struct *si;
>> +struct swap_cluster_info *ci;
>> +unsigned long offset = swp_offset(entry);
>> +
>> +VM_BUG_ON(!IS_ALIGNED(offset, SWAPFILE_CLUSTER));
>> +si = _swap_info_get(entry);
>> +if (!si)
>> +return -EBUSY;
>
> I think this return value doesn't get used anywhere?

Yes.  And the error is only possible if the page table is corrupted.  So
maybe add a VM_BUG_ON() in its caller __split_huge_swap_pmd()?

Best Regards,
Huang, Ying


RE: [PATCH V1 3/3] ARM: dts: sabreauto: Add flexcan support

2018-10-24 Thread Joakim Zhang

-Original Message-
From: Fabio Estevam [mailto:feste...@gmail.com] 
Sent: October 24, 2018, 19:20
To: Joakim Zhang 
Cc: Shawn Guo ; Sascha Hauer ; 
Sascha Hauer ; Fabio Estevam ; 
dl-linux-imx ; Rob Herring ; moderated 
list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE 
; open list:OPEN FIRMWARE AND FLATTENED 
DEVICE TREE BINDINGS ; linux-kernel 
; A.s. Dong 
Subject: Re: [PATCH V1 3/3] ARM: dts: sabreauto: Add flexcan support

Hi Joakim,

On Wed, Oct 24, 2018 at 8:15 AM Joakim Zhang  wrote:

> diff --git a/arch/arm/boot/dts/imx6dl-sabreauto-flexcan1.dts 
> b/arch/arm/boot/dts/imx6dl-sabreauto-flexcan1.dts
> new file mode 100644
> index ..963e0b559d60
> --- /dev/null
> +++ b/arch/arm/boot/dts/imx6dl-sabreauto-flexcan1.dts
> @@ -0,0 +1,19 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2013 Freescale Semiconductor, Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.

You can remove this paragraph as you are using SPDX tag.

> +++ b/arch/arm/boot/dts/imx6q-sabreauto-flexcan1.dts
> @@ -0,0 +1,19 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2013 Freescale Semiconductor, Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */

Ditto.

> + {
> +   pinctrl-names = "default";
> +   pinctrl-0 = <_flexcan1>;
> +   pinctrl-assert-gpios = <_b 3 GPIO_ACTIVE_HIGH>; /* TX */

This property does not exist in upstream.


Hi Fabio,

I will fix the points you commented on. Thanks a lot.

Best Regards,
Joakim Zhang


Re: [PATCH 1/2] perf: Add munmap callback

2018-10-24 Thread Stephane Eranian
On Wed, Oct 24, 2018 at 5:34 PM Peter Zijlstra  wrote:
>
> On Wed, Oct 24, 2018 at 05:25:59PM -0700, Stephane Eranian wrote:
> > On Wed, Oct 24, 2018 at 5:23 PM Peter Zijlstra  wrote:
>
> > > That is actually a different problem. And you're right, we never did fix
> > > that.
> > >
> > it is a different problem but the solution is the same: PERF_RECORD_UNMAP!
>
> But he's not actually doing PERF_RECORD_UNMAP, nor is his perf_munmap()
> hook sufficient to actually generate those.
>
You're right. But I saw that and thought: this is going in the right
direction, tracking munmap.
I agree with you that we should use this opportunity to track unmap
for his purpose, but also for the issue I raised, which needs a new
record type based on the new unmap hook.

> Now I agree that if he's going to do an munmap hook, he should do it
> 'right' and at the very least allow for PERF_RECORD_UNMAP to be done,
> but ideally simply pick up and finish that patch we had back then.
Agreed.


Re: [PATCH -V6 00/21] swap: Swapout/swapin THP in one piece

2018-10-24 Thread Huang, Ying
Daniel Jordan  writes:

> On Wed, Oct 24, 2018 at 11:31:42AM +0800, Huang, Ying wrote:
>> Hi, Daniel,
>> 
>> Daniel Jordan  writes:
>> 
>> > On Wed, Oct 10, 2018 at 03:19:03PM +0800, Huang Ying wrote:
>> >> And for all, Any comment is welcome!
>> >> 
>> >> This patchset is based on the 2018-10-3 head of mmotm/master.
>> >
>> > There seems to be some infrequent memory corruption with THPs that have 
>> > been
>> > swapped out: page contents differ after swapin.
>> 
>> Thanks a lot for testing this!  I know there was a big effort behind this
>> and it definitely will improve the quality of the patchset greatly!
>
> You're welcome!  Hopefully I'll have more results and tests to share in the
> next two weeks.
>
>> 
>> > Reproducer at the bottom.  Part of some tests I'm writing, had to separate 
>> > it a
>> > little hack-ily.  Basically it writes the word offset _at_ each word 
>> > offset in
>> > a memory blob, tries to push it to swap, and verifies the offset is the 
>> > same
>> > after swapin.
>> >
>> > I ran with THP enabled=always.  THP swapin_enabled could be always or 
>> > never, it
>> > happened with both.  Every time swapping occurred, a single THP-sized 
>> > chunk in
>> > the middle of the blob had different offsets.  Example:
>> >
>> > ** > word corruption gap
>> > ** corruption detected 14929920 bytes in (got 15179776, expected 14929920) 
>> > **
>> > ** corruption detected 14929928 bytes in (got 15179784, expected 14929928) 
>> > **
>> > ** corruption detected 14929936 bytes in (got 15179792, expected 14929936) 
>> > **
>> > ...pattern continues...
>> > ** corruption detected 17027048 bytes in (got 15179752, expected 17027048) 
>> > **
>> > ** corruption detected 17027056 bytes in (got 15179760, expected 17027056) 
>> > **
>> > ** corruption detected 17027064 bytes in (got 15179768, expected 17027064) 
>> > **
>> 
>> 15179776 < 15179xxx <= 17027064
>> 
>> 15179776 % 4096 = 0
>> 
>> And 15179776 = 15179768 + 8
>> 
>> So I guess we have some alignment bug.  Could you try the patches
>> attached?  They deal with some alignment issue.
>
> That fixed it.  And removed three lines of code.  Nice :)

Thanks!  I will merge the fixes into the patchset.
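
For anyone following the arithmetic in the quoted analysis: the corrupted
range reported above runs from byte 14929920 up to 17027064 + 8 = 17027072,
i.e. exactly 2 MiB (one THP), which matches the "single THP-sized chunk"
observation. A trivial check:

#include <stdio.h>

int main(void)
{
        /* First corrupted offset, and last corrupted offset plus the word size. */
        unsigned long start = 14929920UL;
        unsigned long end = 17027064UL + 8;

        printf("gap = %lu bytes = %lu MiB\n", end - start, (end - start) >> 20);
        return 0;
}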

Best Regards,
Huang, Ying

