[RFC PATCH 0/2] Fix rockchip IOMMU driver vs PM issues

2014-12-11 Thread Tomasz Figa
On Rockchip SoCs, the IOMMUs are located inside the same power domains as
IP blocks using them. This means that as soon as we runtime suspend such
IP block, causing the power domain to be powered off, the IOMMU would also
be powered off, losing all its state.

This means that whenever the power domain is being powered off, the IOMMU
driver needs to be able to deinitialize the hardware and whenever the domain
is being powered on, it needs to restore all the state, so the consumer
device is able to perform memory transactions.

The solution proposed here revives the idea of PM domain notifiers submitted
originally by Samsung's Sylwester Nawrocki and Marek Szyprowski in [1].
The main benefit of this idea that it is very simple, adding just 84 lines
of code, but effective and also useful for other purposes, what can be seen
in thread [2] and [3]. Moreover, it lets us avoid adding device specific
code (in this case specific to single IOMMU type) to power domain drivers,
which can be used for more devices than just IOMMU and which may vary
between platforms using the same IOMMU driver.

The IOMMU-specific part of the solution simply uses the pre-power-off and
post-power-on notifications to do the necessary setup without the need
of any action from inside drivers of devices behind the IOMMU, so all the
code specific to this particular IOMMU type stays outside other drivers
that should not rely on being run with a particular type of IOMMU (and
often even SoC).

Tested with a custom board based on RK3288 SoC.

[1] http://thread.gmane.org/gmane.linux.kernel.samsung-soc/36079/focus=346956
[2] 
http://thread.gmane.org/gmane.linux.power-management.general/51993/focus=39158
[3] http://marc.info/?l=linux-pmm=136448756308203w=2

Sylwester Nawrocki (1):
  pm: Add PM domain notifications

Tomasz Figa (1):
  iommu: rockchip: Handle system-wide and runtime PM

 Documentation/power/notifiers.txt |  14 +++
 drivers/base/power/domain.c   |  50 +
 drivers/iommu/rockchip-iommu.c| 208 +++---
 include/linux/pm_domain.h |  20 
 4 files changed, 256 insertions(+), 36 deletions(-)

-- 
2.2.0.rc0.207.ga3a616c

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH 1/2] pm: Add PM domain notifications

2014-12-11 Thread Tomasz Figa
From: Sylwester Nawrocki s.nawro...@samsung.com

This patch adds notifiers to the runtime PM/genpd subsystem. It is now
possible to register a notifier, which will be called before and after
the generic power domain subsystem calls the power domain's power_on
and power_off callbacks.

Signed-off-by: Sylwester Nawrocki s.nawro...@samsung.com
[tf...@chromium.org: rebased]
Signed-off-by: Tomasz Figa tf...@chromium.org
---
 Documentation/power/notifiers.txt | 14 +++
 drivers/base/power/domain.c   | 50 +++
 include/linux/pm_domain.h | 20 
 3 files changed, 84 insertions(+)

diff --git a/Documentation/power/notifiers.txt 
b/Documentation/power/notifiers.txt
index a81fa25..62303f6 100644
--- a/Documentation/power/notifiers.txt
+++ b/Documentation/power/notifiers.txt
@@ -53,3 +53,17 @@ NULL).  To register and/or unregister a suspend notifier use 
the functions
 register_pm_notifier() and unregister_pm_notifier(), respectively, defined in
 include/linux/suspend.h .  If you don't need to unregister the notifier, you 
can
 also use the pm_notifier() macro defined in include/linux/suspend.h .
+
+Power Domain notifiers
+--
+
+The power domain notifiers allow subsystems or drivers to register for power
+domain on/off notifications, should they need to perform any actions right
+before or right after the power domain on/off.  The device must be already
+added to a power domain before its subsystem or driver registers the notifier.
+Following events are supported:
+
+PM_GENPD_POWER_ON_PREPARE  The power domain is about to turn on.
+PM_GENPD_POST_POWER_ON The power domain has just turned on.
+PM_GENPD_POWER_OFF_PREPARE The power domain is about to turn off.
+PM_GENPD_POST_POWER_OFFThe power domain has just turned off.
diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
index 6a103a3..0d6f84e 100644
--- a/drivers/base/power/domain.c
+++ b/drivers/base/power/domain.c
@@ -68,6 +68,45 @@ static struct generic_pm_domain *pm_genpd_lookup_name(const 
char *domain_name)
return genpd;
 }
 
+int pm_genpd_register_notifier(struct device *dev, struct notifier_block *nb)
+{
+   struct pm_domain_data *pdd;
+   int ret = -EINVAL;
+
+   spin_lock_irq(dev-power.lock);
+   if (dev-power.subsys_data) {
+   pdd = dev-power.subsys_data-domain_data;
+   ret = blocking_notifier_chain_register(pdd-notify_chain_head,
+  nb);
+   }
+   spin_unlock_irq(dev-power.lock);
+   return ret;
+}
+EXPORT_SYMBOL_GPL(pm_genpd_register_notifier);
+
+void pm_genpd_unregister_notifier(struct device *dev, struct notifier_block 
*nb)
+{
+   struct pm_domain_data *pdd;
+
+   spin_lock_irq(dev-power.lock);
+   if (dev-power.subsys_data) {
+   pdd = dev-power.subsys_data-domain_data;
+   blocking_notifier_chain_unregister(pdd-notify_chain_head, nb);
+   }
+   spin_unlock_irq(dev-power.lock);
+}
+EXPORT_SYMBOL_GPL(pm_genpd_unregister_notifier);
+
+static void pm_genpd_notifier_call(unsigned long event,
+  struct generic_pm_domain *genpd)
+{
+   struct pm_domain_data *pdd;
+
+   list_for_each_entry(pdd, genpd-dev_list, list_node)
+   blocking_notifier_call_chain(pdd-notify_chain_head,
+event, pdd-dev);
+}
+
 struct generic_pm_domain *dev_to_genpd(struct device *dev)
 {
if (IS_ERR_OR_NULL(dev-pm_domain))
@@ -161,12 +200,17 @@ static int genpd_power_on(struct generic_pm_domain *genpd)
if (!genpd-power_on)
return 0;
 
+   pm_genpd_notifier_call(PM_GENPD_POWER_ON_PREPARE, genpd);
+
time_start = ktime_get();
ret = genpd-power_on(genpd);
if (ret)
return ret;
 
elapsed_ns = ktime_to_ns(ktime_sub(ktime_get(), time_start));
+
+   pm_genpd_notifier_call(PM_GENPD_POST_POWER_ON, genpd);
+
if (elapsed_ns = genpd-power_on_latency_ns)
return ret;
 
@@ -188,12 +232,17 @@ static int genpd_power_off(struct generic_pm_domain 
*genpd)
if (!genpd-power_off)
return 0;
 
+   pm_genpd_notifier_call(PM_GENPD_POWER_OFF_PREPARE, genpd);
+
time_start = ktime_get();
ret = genpd-power_off(genpd);
if (ret == -EBUSY)
return ret;
 
elapsed_ns = ktime_to_ns(ktime_sub(ktime_get(), time_start));
+
+   pm_genpd_notifier_call(PM_GENPD_POST_POWER_OFF, genpd);
+
if (elapsed_ns = genpd-power_off_latency_ns)
return ret;
 
@@ -1466,6 +1515,7 @@ int __pm_genpd_add_device(struct generic_pm_domain 
*genpd, struct device *dev,
genpd-attach_dev(genpd, dev);
 
mutex_lock(gpd_data-lock);
+   BLOCKING_INIT_NOTIFIER_HEAD(gpd_data-base.notify_chain_head);
gpd_data-base.dev = dev;
   

Re: [RFC PATCH 1/2] pm: Add PM domain notifications

2014-12-11 Thread Tomasz Figa
Hi Sylwester,

On Thu, Dec 11, 2014 at 7:36 PM, Sylwester Nawrocki
s.nawro...@samsung.com wrote:

 Hi Tomasz,

 On 11/12/14 09:26, Tomasz Figa wrote:
  From: Sylwester Nawrocki s.nawro...@samsung.com
 
  This patch adds notifiers to the runtime PM/genpd subsystem. It is now
  possible to register a notifier, which will be called before and after
  the generic power domain subsystem calls the power domain's power_on
  and power_off callbacks.
 
  Signed-off-by: Sylwester Nawrocki s.nawro...@samsung.com
  [tf...@chromium.org: rebased]
  Signed-off-by: Tomasz Figa tf...@chromium.org

 Not sure if you've noticed it, I posted an updated version of this patch
 recently [1]. The notifiers list is moved to struct generic_pm_domain
 there and it also allows to register a notifier for selected power domain
 by name.
[snip]
 [1] http://www.spinics.net/lists/linux-samsung-soc/msg38549.html

Ah, haven't noticed, sorry. The API using devices looks the same, so I
guess we can simply have patch 2/2 of this series applied on top of
your patch.

By the way, look-up by name (presumably hardcoded somewhere?) sounds a
bit strange to me. What was the reason for it to be added?

Best regards,
Tomasz
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH 2/2] iommu: rockchip: Handle system-wide and runtime PM

2014-12-11 Thread Tomasz Figa
This patch modifies the rockchip-iommu driver to consider state of
the power domain the IOMMU is located in. When the power domain
is powered off, the IOMMU cannot be accessed and register programming
must be deferred until the power domain becomes enabled.

The solution implemented in this patch uses power domain notifications
to perform necessary IOMMU initialization. Race with runtime PM core
is avoided by protecting code accessing the hardware, including startup
and shutdown functions, with a spinlock with a check for power state
inside.

Signed-off-by: Tomasz Figa tf...@chromium.org
---
 drivers/iommu/rockchip-iommu.c | 208 ++---
 1 file changed, 172 insertions(+), 36 deletions(-)

diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
index b2023af..9120655 100644
--- a/drivers/iommu/rockchip-iommu.c
+++ b/drivers/iommu/rockchip-iommu.c
@@ -20,6 +20,8 @@
 #include linux/of.h
 #include linux/of_platform.h
 #include linux/platform_device.h
+#include linux/pm_domain.h
+#include linux/pm_runtime.h
 #include linux/slab.h
 #include linux/spinlock.h
 
@@ -88,6 +90,9 @@ struct rk_iommu {
int irq;
struct list_head node; /* entry in rk_iommu_domain.iommus */
struct iommu_domain *domain; /* domain to which iommu is attached */
+   struct notifier_block genpd_nb;
+   spinlock_t hw_lock; /* lock for register access */
+   bool is_powered; /* power domain is on and register clock is enabled */
 };
 
 static inline void rk_table_flush(u32 *va, unsigned int count)
@@ -283,6 +288,9 @@ static void rk_iommu_zap_lines(struct rk_iommu *iommu, 
dma_addr_t iova,
   size_t size)
 {
dma_addr_t iova_end = iova + size;
+
+   assert_spin_locked(iommu-hw_lock);
+
/*
 * TODO(djkurtz): Figure out when it is more efficient to shootdown the
 * entire iotlb rather than iterate over individual iovas.
@@ -293,11 +301,15 @@ static void rk_iommu_zap_lines(struct rk_iommu *iommu, 
dma_addr_t iova,
 
 static bool rk_iommu_is_stall_active(struct rk_iommu *iommu)
 {
+   assert_spin_locked(iommu-hw_lock);
+
return rk_iommu_read(iommu, RK_MMU_STATUS)  RK_MMU_STATUS_STALL_ACTIVE;
 }
 
 static bool rk_iommu_is_paging_enabled(struct rk_iommu *iommu)
 {
+   assert_spin_locked(iommu-hw_lock);
+
return rk_iommu_read(iommu, RK_MMU_STATUS) 
 RK_MMU_STATUS_PAGING_ENABLED;
 }
@@ -306,6 +318,8 @@ static int rk_iommu_enable_stall(struct rk_iommu *iommu)
 {
int ret;
 
+   assert_spin_locked(iommu-hw_lock);
+
if (rk_iommu_is_stall_active(iommu))
return 0;
 
@@ -327,6 +341,8 @@ static int rk_iommu_disable_stall(struct rk_iommu *iommu)
 {
int ret;
 
+   assert_spin_locked(iommu-hw_lock);
+
if (!rk_iommu_is_stall_active(iommu))
return 0;
 
@@ -344,6 +360,8 @@ static int rk_iommu_enable_paging(struct rk_iommu *iommu)
 {
int ret;
 
+   assert_spin_locked(iommu-hw_lock);
+
if (rk_iommu_is_paging_enabled(iommu))
return 0;
 
@@ -361,6 +379,8 @@ static int rk_iommu_disable_paging(struct rk_iommu *iommu)
 {
int ret;
 
+   assert_spin_locked(iommu-hw_lock);
+
if (!rk_iommu_is_paging_enabled(iommu))
return 0;
 
@@ -379,6 +399,8 @@ static int rk_iommu_force_reset(struct rk_iommu *iommu)
int ret;
u32 dte_addr;
 
+   assert_spin_locked(iommu-hw_lock);
+
/*
 * Check if register DTE_ADDR is working by writing DTE_ADDR_DUMMY
 * and verifying that upper 5 nybbles are read back.
@@ -401,6 +423,50 @@ static int rk_iommu_force_reset(struct rk_iommu *iommu)
return ret;
 }
 
+static int rk_iommu_startup(struct rk_iommu *iommu)
+{
+   struct iommu_domain *domain = iommu-domain;
+   struct rk_iommu_domain *rk_domain;
+   phys_addr_t dte_addr;
+   int ret;
+
+   assert_spin_locked(iommu-hw_lock);
+
+   ret = rk_iommu_enable_stall(iommu);
+   if (ret)
+   return ret;
+
+   ret = rk_iommu_force_reset(iommu);
+   if (ret)
+   return ret;
+
+   rk_domain = domain-priv;
+   dte_addr = virt_to_phys(rk_domain-dt);
+   rk_iommu_write(iommu, RK_MMU_DTE_ADDR, dte_addr);
+   rk_iommu_command(iommu, RK_MMU_CMD_ZAP_CACHE);
+   rk_iommu_write(iommu, RK_MMU_INT_MASK, RK_MMU_IRQ_MASK);
+
+   ret = rk_iommu_enable_paging(iommu);
+   if (ret)
+   return ret;
+
+   rk_iommu_disable_stall(iommu);
+
+   return ret;
+}
+
+static void rk_iommu_shutdown(struct rk_iommu *iommu)
+{
+   assert_spin_locked(iommu-hw_lock);
+
+   /* Ignore error while disabling, just keep going */
+   rk_iommu_enable_stall(iommu);
+   rk_iommu_disable_paging(iommu);
+   rk_iommu_write(iommu, RK_MMU_INT_MASK, 0);
+   rk_iommu_write(iommu, RK_MMU_DTE_ADDR, 0);
+   rk_iommu_disable_stall(iommu);
+}
+

Re: [RFC PATCH 2/2] iommu: rockchip: Handle system-wide and runtime PM

2014-12-11 Thread Tomasz Figa
Hi Ulf,

On Thu, Dec 11, 2014 at 8:58 PM, Ulf Hansson ulf.hans...@linaro.org wrote:
 On 11 December 2014 at 09:26, Tomasz Figa tf...@chromium.org wrote:
 This patch modifies the rockchip-iommu driver to consider state of
 the power domain the IOMMU is located in. When the power domain
 is powered off, the IOMMU cannot be accessed and register programming
 must be deferred until the power domain becomes enabled.

[snip]

 @@ -988,11 +1107,28 @@ static int rk_iommu_probe(struct platform_device 
 *pdev)
 return -ENXIO;
 }

 +   pm_runtime_no_callbacks(dev);
 +   pm_runtime_enable(dev);
 +
 +   /* Synchronize state of the domain with driver data. */
 +   pm_runtime_get_sync(dev);
 +   iommu-is_powered = true;

 Doesn't the runtime PM status reflect the value of is_powered, thus
 why do you need to have a copy of it? Could it perpahps be that you
 try to cope with the case when CONFIG_PM is unset?


It's worth noting that this driver fully relies on status of other
devices in the power domain the IOMMU is in and does not enforce the
status on its own. So in general, as far as my understanding of PM
runtime subsystem, the status of the IOMMU device will be always
suspended, because nobody will call pm_runtime_get() on it (except the
get and put pair in probe). So is_powered is here to track status of
the domain, not the device. Feel free to suggest a better way, though.

Best regards,
Tomasz
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH 2/2] iommu: rockchip: Handle system-wide and runtime PM

2014-12-11 Thread Ulf Hansson
On 11 December 2014 at 09:26, Tomasz Figa tf...@chromium.org wrote:
 This patch modifies the rockchip-iommu driver to consider state of
 the power domain the IOMMU is located in. When the power domain
 is powered off, the IOMMU cannot be accessed and register programming
 must be deferred until the power domain becomes enabled.

 The solution implemented in this patch uses power domain notifications
 to perform necessary IOMMU initialization. Race with runtime PM core
 is avoided by protecting code accessing the hardware, including startup
 and shutdown functions, with a spinlock with a check for power state
 inside.

 Signed-off-by: Tomasz Figa tf...@chromium.org
 ---
  drivers/iommu/rockchip-iommu.c | 208 
 ++---
  1 file changed, 172 insertions(+), 36 deletions(-)

 diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
 index b2023af..9120655 100644
 --- a/drivers/iommu/rockchip-iommu.c
 +++ b/drivers/iommu/rockchip-iommu.c
 @@ -20,6 +20,8 @@
  #include linux/of.h
  #include linux/of_platform.h
  #include linux/platform_device.h
 +#include linux/pm_domain.h
 +#include linux/pm_runtime.h
  #include linux/slab.h
  #include linux/spinlock.h

 @@ -88,6 +90,9 @@ struct rk_iommu {
 int irq;
 struct list_head node; /* entry in rk_iommu_domain.iommus */
 struct iommu_domain *domain; /* domain to which iommu is attached */
 +   struct notifier_block genpd_nb;
 +   spinlock_t hw_lock; /* lock for register access */
 +   bool is_powered; /* power domain is on and register clock is enabled 
 */
  };

  static inline void rk_table_flush(u32 *va, unsigned int count)
 @@ -283,6 +288,9 @@ static void rk_iommu_zap_lines(struct rk_iommu *iommu, 
 dma_addr_t iova,
size_t size)
  {
 dma_addr_t iova_end = iova + size;
 +
 +   assert_spin_locked(iommu-hw_lock);
 +
 /*
  * TODO(djkurtz): Figure out when it is more efficient to shootdown 
 the
  * entire iotlb rather than iterate over individual iovas.
 @@ -293,11 +301,15 @@ static void rk_iommu_zap_lines(struct rk_iommu *iommu, 
 dma_addr_t iova,

  static bool rk_iommu_is_stall_active(struct rk_iommu *iommu)
  {
 +   assert_spin_locked(iommu-hw_lock);
 +
 return rk_iommu_read(iommu, RK_MMU_STATUS)  
 RK_MMU_STATUS_STALL_ACTIVE;
  }

  static bool rk_iommu_is_paging_enabled(struct rk_iommu *iommu)
  {
 +   assert_spin_locked(iommu-hw_lock);
 +
 return rk_iommu_read(iommu, RK_MMU_STATUS) 
  RK_MMU_STATUS_PAGING_ENABLED;
  }
 @@ -306,6 +318,8 @@ static int rk_iommu_enable_stall(struct rk_iommu *iommu)
  {
 int ret;

 +   assert_spin_locked(iommu-hw_lock);
 +
 if (rk_iommu_is_stall_active(iommu))
 return 0;

 @@ -327,6 +341,8 @@ static int rk_iommu_disable_stall(struct rk_iommu *iommu)
  {
 int ret;

 +   assert_spin_locked(iommu-hw_lock);
 +
 if (!rk_iommu_is_stall_active(iommu))
 return 0;

 @@ -344,6 +360,8 @@ static int rk_iommu_enable_paging(struct rk_iommu *iommu)
  {
 int ret;

 +   assert_spin_locked(iommu-hw_lock);
 +
 if (rk_iommu_is_paging_enabled(iommu))
 return 0;

 @@ -361,6 +379,8 @@ static int rk_iommu_disable_paging(struct rk_iommu *iommu)
  {
 int ret;

 +   assert_spin_locked(iommu-hw_lock);
 +
 if (!rk_iommu_is_paging_enabled(iommu))
 return 0;

 @@ -379,6 +399,8 @@ static int rk_iommu_force_reset(struct rk_iommu *iommu)
 int ret;
 u32 dte_addr;

 +   assert_spin_locked(iommu-hw_lock);
 +
 /*
  * Check if register DTE_ADDR is working by writing DTE_ADDR_DUMMY
  * and verifying that upper 5 nybbles are read back.
 @@ -401,6 +423,50 @@ static int rk_iommu_force_reset(struct rk_iommu *iommu)
 return ret;
  }

 +static int rk_iommu_startup(struct rk_iommu *iommu)
 +{
 +   struct iommu_domain *domain = iommu-domain;
 +   struct rk_iommu_domain *rk_domain;
 +   phys_addr_t dte_addr;
 +   int ret;
 +
 +   assert_spin_locked(iommu-hw_lock);
 +
 +   ret = rk_iommu_enable_stall(iommu);
 +   if (ret)
 +   return ret;
 +
 +   ret = rk_iommu_force_reset(iommu);
 +   if (ret)
 +   return ret;
 +
 +   rk_domain = domain-priv;
 +   dte_addr = virt_to_phys(rk_domain-dt);
 +   rk_iommu_write(iommu, RK_MMU_DTE_ADDR, dte_addr);
 +   rk_iommu_command(iommu, RK_MMU_CMD_ZAP_CACHE);
 +   rk_iommu_write(iommu, RK_MMU_INT_MASK, RK_MMU_IRQ_MASK);
 +
 +   ret = rk_iommu_enable_paging(iommu);
 +   if (ret)
 +   return ret;
 +
 +   rk_iommu_disable_stall(iommu);
 +
 +   return ret;
 +}
 +
 +static void rk_iommu_shutdown(struct rk_iommu *iommu)
 +{
 +   assert_spin_locked(iommu-hw_lock);
 +
 +   /* Ignore error while disabling, just keep going */
 +   

Re: [RFC PATCH 2/2] iommu: rockchip: Handle system-wide and runtime PM

2014-12-11 Thread Ulf Hansson
On 11 December 2014 at 13:42, Tomasz Figa tf...@chromium.org wrote:
 Hi Ulf,

 On Thu, Dec 11, 2014 at 8:58 PM, Ulf Hansson ulf.hans...@linaro.org wrote:
 On 11 December 2014 at 09:26, Tomasz Figa tf...@chromium.org wrote:
 This patch modifies the rockchip-iommu driver to consider state of
 the power domain the IOMMU is located in. When the power domain
 is powered off, the IOMMU cannot be accessed and register programming
 must be deferred until the power domain becomes enabled.

 [snip]

 @@ -988,11 +1107,28 @@ static int rk_iommu_probe(struct platform_device 
 *pdev)
 return -ENXIO;
 }

 +   pm_runtime_no_callbacks(dev);
 +   pm_runtime_enable(dev);
 +
 +   /* Synchronize state of the domain with driver data. */
 +   pm_runtime_get_sync(dev);
 +   iommu-is_powered = true;

 Doesn't the runtime PM status reflect the value of is_powered, thus
 why do you need to have a copy of it? Could it perpahps be that you
 try to cope with the case when CONFIG_PM is unset?


 It's worth noting that this driver fully relies on status of other
 devices in the power domain the IOMMU is in and does not enforce the
 status on its own. So in general, as far as my understanding of PM
 runtime subsystem, the status of the IOMMU device will be always
 suspended, because nobody will call pm_runtime_get() on it (except the
 get and put pair in probe). So is_powered is here to track status of
 the domain, not the device. Feel free to suggest a better way, though.

I see, thanks for clarifying. I think it makes sense as is, I have no
better suggestion.

Kind regards
Uffe
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH 1/2] pm: Add PM domain notifications

2014-12-11 Thread Ulf Hansson
On 11 December 2014 at 14:54, Sylwester Nawrocki s.nawro...@samsung.com wrote:
 On 11/12/14 12:04, Tomasz Figa wrote:
 ...
  On 11/12/14 09:26, Tomasz Figa wrote:
   From: Sylwester Nawrocki s.nawro...@samsung.com
  
   This patch adds notifiers to the runtime PM/genpd subsystem. It is now
   possible to register a notifier, which will be called before and after
   the generic power domain subsystem calls the power domain's power_on
   and power_off callbacks.
  
   Signed-off-by: Sylwester Nawrocki s.nawro...@samsung.com
   [tf...@chromium.org: rebased]
   Signed-off-by: Tomasz Figa tf...@chromium.org
 
  Not sure if you've noticed it, I posted an updated version of this patch
  recently [1]. The notifiers list is moved to struct generic_pm_domain
  there and it also allows to register a notifier for selected power domain
  by name.
 [snip]
  [1] http://www.spinics.net/lists/linux-samsung-soc/msg38549.html

 Ah, haven't noticed, sorry. The API using devices looks the same, so I
 guess we can simply have patch 2/2 of this series applied on top of
 your patch.

 Yes, that should work.

 By the way, look-up by name (presumably hardcoded somewhere?) sounds a
 bit strange to me. What was the reason for it to be added?

 Yes, that might not be a very elegant approach. We initially used it
 to implement power domain on/off sequence per specific domain and SoC,
 since it appeared resistant to generalize.  I.e. the control register
 write sequences are different per domain and per SoC (exynos).
 So we named the domains in the device tree in that way:

 pm_domains: pm-domains@10024000 {
 compatible = samsung,exynos4415-pd;
 reg-names = cam, tv, mfc, g3d,
 lcd0, isp0, isp1;
 reg = 0x10024000 0x20, 0x10024020 0x20,
   0x10024040 0x20, 0x10024060 0x20,
   0x10024080 0x20, 0x100240A0 0x20,
   0x100240E0 0x20;
 #power-domain-cells = 1;
 };

 and then, for example, in the exynos CMU_ISP{0, 1} (clock controller)
 driver registered for notification on isp0 and isp1 power domains
 (isp1 is a sub-domain of isp0 and the consumer devices are normally
 attached to isp1).

 We have been investigating if we could do without the notification
 at the clocks driver side, then the all SoC/power domain specific code
 would end up in the exynos power domain driver. But I'm afraid it's
 not going to work for all SoCs. Anyway lookup by name might be not
 needed.

Regarding lookup by name, let's please move away from those APIs. I
am planning to remove all name based API from the genpd API as soon as
I can.

If you need to fetch domains, this might help you:
http://www.spinics.net/lists/arm-kernel/msg383743.html

Kind regards
Uffe
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [v2 17/25] KVM: kvm-vfio: User API for VT-d Posted-Interrupts

2014-12-11 Thread Alex Williamson
On Thu, 2014-12-11 at 05:55 +, Wu, Feng wrote:
 
  -Original Message-
  From: Alex Williamson [mailto:alex.william...@redhat.com]
  Sent: Monday, December 08, 2014 1:21 PM
  To: Wu, Feng
  Cc: Eric Auger; t...@linutronix.de; mi...@redhat.com; h...@zytor.com;
  x...@kernel.org; g...@kernel.org; pbonz...@redhat.com;
  dw...@infradead.org; j...@8bytes.org; jiang@linux.intel.com;
  linux-ker...@vger.kernel.org; iommu@lists.linux-foundation.org;
  k...@vger.kernel.org
  Subject: Re: [v2 17/25] KVM: kvm-vfio: User API for VT-d Posted-Interrupts
  
  On Mon, 2014-12-08 at 04:58 +, Wu, Feng wrote:
  
-Original Message-
From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org]
  On
Behalf Of Eric Auger
Sent: Thursday, December 04, 2014 10:05 PM
To: Wu, Feng; t...@linutronix.de; mi...@redhat.com; h...@zytor.com;
x...@kernel.org; g...@kernel.org; pbonz...@redhat.com;
dw...@infradead.org; j...@8bytes.org; alex.william...@redhat.com;
jiang@linux.intel.com
Cc: linux-ker...@vger.kernel.org; iommu@lists.linux-foundation.org;
k...@vger.kernel.org
Subject: Re: [v2 17/25] KVM: kvm-vfio: User API for VT-d 
Posted-Interrupts
   
Hi Feng,
On 12/03/2014 08:39 AM, Feng Wu wrote:
 This patch adds and documents a new attribute
 KVM_DEV_VFIO_DEVICE_POSTING_IRQ in KVM_DEV_VFIO_DEVICE
  group.
 This new attribute is used for VT-d Posted-Interrupts.

 When guest OS changes the interrupt configuration for an
 assigned device, such as, MSI/MSIx data/address fields,
 QEMU will use this IRQ attribute to tell KVM to update the
 related IRTE according the VT-d Posted-Interrrupts Specification,
 such as, the guest vector should be updated in the related IRTE.

 Signed-off-by: Feng Wu feng...@intel.com
 ---
  Documentation/virtual/kvm/devices/vfio.txt |9 +
  include/uapi/linux/kvm.h   |   10 ++
  2 files changed, 19 insertions(+), 0 deletions(-)

 diff --git a/Documentation/virtual/kvm/devices/vfio.txt
b/Documentation/virtual/kvm/devices/vfio.txt
 index f7aff29..41e12b7 100644
 --- a/Documentation/virtual/kvm/devices/vfio.txt
 +++ b/Documentation/virtual/kvm/devices/vfio.txt
 @@ -42,3 +42,12 @@ activated before VFIO_DEVICE_SET_IRQS has
  been
called to trigger the IRQ
  or associate an eventfd to it. Unforwarding can only be called while 
 the
  signaling has been disabled with VFIO_DEVICE_SET_IRQS. If this
  condition
is
  not satisfied, the command returns an -EBUSY.
 +
 +  KVM_DEV_VFIO_DEVICE_POSTING_IRQ: Use posted interrtups
mechanism to post
typo
 +   the IRQ to guests.
 +For this attribute, kvm_device_attr.addr points to a kvm_vfio_dev_irq
struct.
 +
 +When guest OS changes the interrupt configuration for an assigned
  device,
 +such as, MSI/MSIx data/address fields, QEMU will use this IRQ 
 attribute
 +to tell KVM to update the related IRTE according the VT-d
Posted-Interrrupts
 +Specification, such as, the guest vector should be updated in the 
 related
IRTE.
For my curiosity are there any restrictions about the instant at which
the change can be done?
I do not get here how you deactivate the posting?
  
   The current method is if the hardware supports interrupts posting, we will
   use it instead of interrupts remapping, since it has good performance. Why
   do I need deactivate interrupts posting?
  
   Here is the reply to Alex for the same question:
   In fact, I don't think we need to stop the posted-interrupts. For setting
   posted interrupts, we update the related IRTE according to the new
   format. If the guest reboots, or unload the drivers, or some other
   operations, the msi/msix will be disabled first, in this path, the irq
   will be disabled the related IRTE is not used anymore.
  
  Right, and I'm still not sure I agree with that reasoning.  We need to
  build the kernel interface to be generic, not tailored for a specific
  userspace.  I don't really feel comfortable having something that can't
  be disabled via a similar path to it being enabled.  For instance, what
  about a dynamic debug interface where we want to enable tracing and see
  each interrupt injected into the guest.  At that point we'd want to
  disabled posted interrupts and direct KVM injection and route via QEMU.
  Thanks,
  
  Alex
 
 I am not quite understand why we need to debug the software
 delivery path for interrupt when PI is used, in this case, the software
 injection code will have no chance to execute. If we don't want the use
 PI, we can disable it from kernel command line.

Well, first off, I think it's just good interface design that if we
introduce the ability to enable something we also provide the ability to
disable it without making assumption about how a specific userspace is

Re: [RFC PATCH 2/2] iommu: rockchip: Handle system-wide and runtime PM

2014-12-11 Thread Ulf Hansson
On 11 December 2014 at 16:31, Kevin Hilman khil...@kernel.org wrote:
 [+ Laurent Pinchart]

 Tomasz Figa tf...@chromium.org writes:

 On Thu, Dec 11, 2014 at 8:58 PM, Ulf Hansson ulf.hans...@linaro.org wrote:

 [...]

 @@ -988,11 +1107,28 @@ static int rk_iommu_probe(struct platform_device 
 *pdev)
 return -ENXIO;
 }

 +   pm_runtime_no_callbacks(dev);
 +   pm_runtime_enable(dev);
 +
 +   /* Synchronize state of the domain with driver data. */
 +   pm_runtime_get_sync(dev);
 +   iommu-is_powered = true;

 Doesn't the runtime PM status reflect the value of is_powered, thus
 why do you need to have a copy of it? Could it perpahps be that you
 try to cope with the case when CONFIG_PM is unset?


 It's worth noting that this driver fully relies on status of other
 devices in the power domain the IOMMU is in and does not enforce the
 status on its own. So in general, as far as my understanding of PM
 runtime subsystem, the status of the IOMMU device will be always
 suspended, because nobody will call pm_runtime_get() on it (except the
 get and put pair in probe). So is_powered is here to track status of
 the domain, not the device. Feel free to suggest a better way, though.

 I still don't like these notifiers.  I think they add ways to bypass
 having proper runtime PM implemented for devices/subsystems.

I do agree, but I haven't found another good solution to the problem.


 From a high-level, the IOMMU is just another device inside the PM
 domain, so ideally it should be doing it's own _get() and _put() calls
 so the PM domain code would just do the right thing without the need for
 notifiers.

As I understand it, the IOMMU (or for other similar cases) shouldn't
be doing any get() and put() at all because there are no IO API to
serve request from.

In principle we could consider these kind devices as parent devices
to those other devices that needs them. Then runtime PM core would
take care of things for us, right!?

Now, I am not so sure using the parent approach is actually viable,
since it will likely have other complications, but I haven't
thoroughly thought it though yet.


 No knowing a lot about the IOMMU API, I'm guessing the reason you're not
 doing that is because the IOMMU API currently doesn't have an easy way
 to keep track of *active* users so it's not obvious where to put those
 _get and _put calls.  If that doesn't exist, perhaps a simple
 iommu_get() and iommu_put() API needs to be introduced (which inside the
 IOMMU core would just do runtime PM calls) so that users of the IOMMU
 could inform the subsystem that the IOMMU is needed and it should not be
 powered off.

 I Cc'd Laurent because I know he's thought about this before from the
 IOMMU side, and not sure if he came up with a solution.

Cool, let's see then.

Kind regards
Uffe
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 2/2] iommu/vt-d: Only remove domain when device is removed

2014-12-11 Thread Jerry Hoemann
On Tue, Dec 09, 2014 at 01:15:25PM +0100, Joerg Roedel wrote:
 On Thu, Nov 06, 2014 at 09:16:05AM -0700, Alex Williamson wrote:
  But the domains are unlinked from device_domain_list using
  unlink_domain_info() which is called from both domain_remove_dev_info()
  and domain_remove_one_dev_info() which are both part of that more
  likely, unlikely branch in intel_iommu_attach_device().  So it seems
  like any time we switch a device from the DMA-API to the IOMMU-API, we
  lose the reference to the domain.  Is that incorrect?  I'll try to test.
 
 Okay, I thought a while about that and it looks like a real fix needs a
 rewrite of the domain handling code in the VT-d driver to better handle
 domain lifetime. We'll get this for free when we add default domains and
 more domain handling logic to the iommu core, so I think we don't need
 to start rewriting the VT-d driver for this.
 But for the time being, here is a simple fix for the leak in
 iommu_attach_domain:


Joerg,

This patch doesn't seem to be fixing the memory leak.

I am testing with a 3.18.0-rc7 kernel applied to a RHEL 7.0 system.

I added printk in free_domain_mem and alloc_domain to first reproduce
Alex's observation.  I created a VM and assigned a PCI NIC w/ no associated
RMRR to the VM.

The pattern I see is when starting the VM is:
alloc_domain - X

When powering off the VM:
free_domain_mem(X)
alloc_domain - Y

I then applied the patch below and I still see the same pattern of two
alloc_domain and one free_domain_mem when powering on/off the VM.

I added some additional instrumentation and I do not see the new call
to domain_exit being executed.  (See inline comments below.)



Jerry


 
 From d65b236d0f27fe3ef7ac4d12cceb0da67aec86ce Mon Sep 17 00:00:00 2001
 From: Joerg Roedel jroe...@suse.de
 Date: Tue, 9 Dec 2014 12:56:45 +0100
 Subject: [PATCH] iommu/vt-d: Fix dmar_domain leak in iommu_attach_device
 
 Since commit 1196c2f a domain is only destroyed in the
 notifier path if it is hot-unplugged. This caused a
 domain leakage in iommu_attach_device when a driver was
 unbound from the device and bound to VFIO. In this case the
 device is attached to a new domain and unlinked from the old
 domain. At this point nothing points to the old domain
 anymore and its memory is leaked.
 Fix this by explicitly freeing the old domain in
 iommu_attach_domain.
 
 Fixes: 1196c2f 'iommu/vt-d: Only remove domain when device is removed'
 Signed-off-by: Joerg Roedel jroe...@suse.de
 ---
  drivers/iommu/intel-iommu.c | 7 +--
  1 file changed, 5 insertions(+), 2 deletions(-)
 
 diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
 index 1232336..9ef8e89 100644
 --- a/drivers/iommu/intel-iommu.c
 +++ b/drivers/iommu/intel-iommu.c
 @@ -4424,10 +4424,13 @@ static int intel_iommu_attach_device(struct 
 iommu_domain *domain,
  
   old_domain = find_domain(dev);
   if (old_domain) {
 - if (domain_type_is_vm_or_si(dmar_domain))
 + if (domain_type_is_vm_or_si(dmar_domain)) {


JAH  This path is executed when starting the VM.


   domain_remove_one_dev_info(old_domain, dev);
 - else
 + } else {


JAH  I don't see this path being executed.

   domain_remove_dev_info(old_domain);
 + if (list_empty(old_domain-devices))
 + domain_exit(old_domain);
 + }
   }
   }
  
 -- 
 1.8.4.5
 
 ___
 iommu mailing list
 iommu@lists.linux-foundation.org
 https://lists.linuxfoundation.org/mailman/listinfo/iommu

-- 


Jerry HoemannSoftware Engineer  Hewlett-Packard

3404 E Harmony Rd. MS 36phone:  (970) 898-1022
Ft. Collins, CO 80528   FAX:(970) 898-0707
email:  jerry.hoem...@hp.com

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH 2/2] iommu: rockchip: Handle system-wide and runtime PM

2014-12-11 Thread Rafael J. Wysocki
On Thursday, December 11, 2014 04:51:37 PM Ulf Hansson wrote:
 On 11 December 2014 at 16:31, Kevin Hilman khil...@kernel.org wrote:
  [+ Laurent Pinchart]
 
  Tomasz Figa tf...@chromium.org writes:
 
  On Thu, Dec 11, 2014 at 8:58 PM, Ulf Hansson ulf.hans...@linaro.org 
  wrote:
 
  [...]
 
  @@ -988,11 +1107,28 @@ static int rk_iommu_probe(struct platform_device 
  *pdev)
  return -ENXIO;
  }
 
  +   pm_runtime_no_callbacks(dev);
  +   pm_runtime_enable(dev);
  +
  +   /* Synchronize state of the domain with driver data. */
  +   pm_runtime_get_sync(dev);
  +   iommu-is_powered = true;
 
  Doesn't the runtime PM status reflect the value of is_powered, thus
  why do you need to have a copy of it? Could it perpahps be that you
  try to cope with the case when CONFIG_PM is unset?
 
 
  It's worth noting that this driver fully relies on status of other
  devices in the power domain the IOMMU is in and does not enforce the
  status on its own. So in general, as far as my understanding of PM
  runtime subsystem, the status of the IOMMU device will be always
  suspended, because nobody will call pm_runtime_get() on it (except the
  get and put pair in probe). So is_powered is here to track status of
  the domain, not the device. Feel free to suggest a better way, though.
 
  I still don't like these notifiers.  I think they add ways to bypass
  having proper runtime PM implemented for devices/subsystems.
 
 I do agree, but I haven't found another good solution to the problem.

For the record, I'm not liking this mostly because it fixes a generic problem
in a way that's hidden in the genpd code and very indirect.

  From a high-level, the IOMMU is just another device inside the PM
  domain, so ideally it should be doing it's own _get() and _put() calls
  so the PM domain code would just do the right thing without the need for
  notifiers.
 
 As I understand it, the IOMMU (or for other similar cases) shouldn't
 be doing any get() and put() at all because there are no IO API to
 serve request from.
 
 In principle we could consider these kind devices as parent devices
 to those other devices that needs them. Then runtime PM core would
 take care of things for us, right!?
 
 Now, I am not so sure using the parent approach is actually viable,
 since it will likely have other complications, but I haven't
 thoroughly thought it though yet.

That actually need not be a parent.

What's needed in this case is to do a pm_runtime_get_sync() on a device
depended on every time a dependent device is runtime-resumed (and analogously
for suspending).

The core doesn't have a way to do that, but it looks like we'll need to add
it anyway for various reasons (ACPI _DEP is one of them as I mentioned some
time ago, but people dismissed it basically as not their problem).


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 0/5] iommu/vt-d: Fix crash dump failure caused by legacy DMA/IO

2014-12-11 Thread Li, ZhenHua

Sorry I have no plan yet.
Could you send me your logs on your AMD system?

Thanks
Zhenhua
On 12/10/2014 04:46 PM, Baoquan He wrote:

Hi Joerg, ZhenHua,

This issue happens on AMD iommu too, do you have any plans or
thoughts on that?

Thanks
Baoquan

On 11/17/14 at 02:38pm, Joerg Roedel wrote:

On Fri, Nov 14, 2014 at 02:27:44PM +0800, Li, ZhenHua wrote:

I am working following  your directions:

1.  If the VT-d driver finds the IOMMU enabled, it reuses its root entry
table, and do NOT disable-enable iommu. Other data will be copied.

2. When a device driver issues the first dma_map command for a
device, we assign a new and empty page-table, thus removing all
mappings from the old kernel for the device.

Please let me know if I get something wrong.


Yes, this sounds right. Happily waiting for patches :)


Joerg



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH 2/2] iommu: rockchip: Handle system-wide and runtime PM

2014-12-11 Thread Tomasz Figa
Hi Rafael,

On Fri, Dec 12, 2014 at 5:48 AM, Rafael J. Wysocki r...@rjwysocki.net wrote:
 On Thursday, December 11, 2014 04:51:37 PM Ulf Hansson wrote:
 On 11 December 2014 at 16:31, Kevin Hilman khil...@kernel.org wrote:
  [+ Laurent Pinchart]
 
  Tomasz Figa tf...@chromium.org writes:
 
  On Thu, Dec 11, 2014 at 8:58 PM, Ulf Hansson ulf.hans...@linaro.org 
  wrote:
 
  [...]
 
  @@ -988,11 +1107,28 @@ static int rk_iommu_probe(struct platform_device 
  *pdev)
  return -ENXIO;
  }
 
  +   pm_runtime_no_callbacks(dev);
  +   pm_runtime_enable(dev);
  +
  +   /* Synchronize state of the domain with driver data. */
  +   pm_runtime_get_sync(dev);
  +   iommu-is_powered = true;
 
  Doesn't the runtime PM status reflect the value of is_powered, thus
  why do you need to have a copy of it? Could it perpahps be that you
  try to cope with the case when CONFIG_PM is unset?
 
 
  It's worth noting that this driver fully relies on status of other
  devices in the power domain the IOMMU is in and does not enforce the
  status on its own. So in general, as far as my understanding of PM
  runtime subsystem, the status of the IOMMU device will be always
  suspended, because nobody will call pm_runtime_get() on it (except the
  get and put pair in probe). So is_powered is here to track status of
  the domain, not the device. Feel free to suggest a better way, though.
 
  I still don't like these notifiers.  I think they add ways to bypass
  having proper runtime PM implemented for devices/subsystems.

 I do agree, but I haven't found another good solution to the problem.

 For the record, I'm not liking this mostly because it fixes a generic 
 problem
 in a way that's hidden in the genpd code and very indirect.

Well, that's true. This is indeed a generic problem of PM dependencies
between devices (other than those represented by parent-child
relation), which in fact doesn't have much to do with genpd, but
rather with those devices directly. It is just that genpd is the most
convenient location to solve this in current code and in a simple way.
In other words, I see this solution as a reasonable way to get the
problem solved quickly for now, so that we can start thinking about a
more elegant solution.


  From a high-level, the IOMMU is just another device inside the PM
  domain, so ideally it should be doing it's own _get() and _put() calls
  so the PM domain code would just do the right thing without the need for
  notifiers.

 As I understand it, the IOMMU (or for other similar cases) shouldn't
 be doing any get() and put() at all because there are no IO API to
 serve request from.

 In principle we could consider these kind devices as parent devices
 to those other devices that needs them. Then runtime PM core would
 take care of things for us, right!?

 Now, I am not so sure using the parent approach is actually viable,
 since it will likely have other complications, but I haven't
 thoroughly thought it though yet.

 That actually need not be a parent.

 What's needed in this case is to do a pm_runtime_get_sync() on a device
 depended on every time a dependent device is runtime-resumed (and analogously
 for suspending).

 The core doesn't have a way to do that, but it looks like we'll need to add
 it anyway for various reasons (ACPI _DEP is one of them as I mentioned some
 time ago, but people dismissed it basically as not their problem).

Let me show you our exact use case.

We have a power domain, which contains an IOMMU and an IP block, which
can do bus transactions through that IOMMU. Of course the IP block is
not aware of the IOMMU, because this is just an integration detail and
on other platforms using the same IP block the IOMMU might not be
there. This implies that the driver for this IP block should not be
aware of the IOMMU either, which, on the buffer allocation and mapping
side, is achieved by DMA mapping subsystem. We would also want the
IOMMU to be fully transparent to that driver on PM side.

Now, for PM related details, they are located in the same power
domain, which means that whenever the power domain is turned off, the
CPU can't access their registers and all the hardware state is gone.
To make the case more interesting, there is no point in programming
the IOMMU unless the device using it is powered on. Similarly, there
is no point in powering the domain on to just access the IOMMU. This
implies that the power domain should be fully controlled by the
stand-alone IP block, while the peripheral IOMMU shouldn't affect its
state and its driver only respond to operations performed by driver of
that stand-alone IP block.

A solution like below could work for the above:

- There is a per-device list of peripheral devices, which need to be
powered on for this device to work.
- Whenever the IOMMU subsystem/driver binds an IOMMU to a device, it
adds the IOMMU device to the list of peripheral devices of that device
(and similarly for removal).
- A runtime PM