Re: [PATCH 4/4] dax: "Hotplug" persistent memory for use like normal RAM

2019-01-17 Thread Yanmin Zhang

On 2019/1/17 下午11:17, Dave Hansen wrote:

On 1/17/19 12:19 AM, Yanmin Zhang wrote:



I didn't try pmem and I am wondering it's slower than DRAM.
Should a flag, such like _GFP_PMEM, be added to distinguish it from
DRAM?


Absolutely not. :)

Agree.



We already have performance-differentiated memory, and lots of ways to
enumerate and select it in the kernel (all of our NUMA infrastructure).

Kernel does manage memory like what you say.
My question is: with your patch, PMEM becomes normal RAM, then there is
a chance for kernel to allocate PMEM as DMA buffer.
Some super speed devices like 10Giga NIC, USB (SSIC connecting modem), 
might not work well if DMA buffer is in PMEM as it's slower than DRAM.


Should your patchset consider it?





PMEM is also just the first of many "kinds" of memory that folks want to
build in systems and use a "RAM".  We literally don't have space to put
a flag in for each type.




Re: [PATCH 4/4] dax: "Hotplug" persistent memory for use like normal RAM

2019-01-17 Thread Yanmin Zhang

On 2019/1/17 上午2:19, Dave Hansen wrote:

From: Dave Hansen 

Currently, a persistent memory region is "owned" by a device driver,
either the "Direct DAX" or "Filesystem DAX" drivers.  These drivers
allow applications to explicitly use persistent memory, generally
by being modified to use special, new libraries.

However, this limits persistent memory use to applications which
*have* been modified.  To make it more broadly usable, this driver
"hotplugs" memory into the kernel, to be managed ad used just like
normal RAM would be.

To make this work, management software must remove the device from
being controlled by the "Device DAX" infrastructure:

echo -n dax0.0 > /sys/bus/dax/drivers/device_dax/remove_id
echo -n dax0.0 > /sys/bus/dax/drivers/device_dax/unbind

and then bind it to this new driver:

echo -n dax0.0 > /sys/bus/dax/drivers/kmem/new_id
echo -n dax0.0 > /sys/bus/dax/drivers/kmem/bind

After this, there will be a number of new memory sections visible
in sysfs that can be onlined, or that may get onlined by existing
udev-initiated memory hotplug rules.

Note: this inherits any existing NUMA information for the newly-
added memory from the persistent memory device that came from the
firmware.  On Intel platforms, the firmware has guarantees that
require each socket's persistent memory to be in a separate
memory-only NUMA node.  That means that this patch is not expected
to create NUMA nodes, but will simply hotplug memory into existing
nodes.

There is currently some metadata at the beginning of pmem regions.
The section-size memory hotplug restrictions, plus this small
reserved area can cause the "loss" of a section or two of capacity.
This should be fixable in follow-on patches.  But, as a first step,
losing 256MB of memory (worst case) out of hundreds of gigabytes
is a good tradeoff vs. the required code to fix this up precisely.

Signed-off-by: Dave Hansen 
Cc: Dan Williams 
Cc: Dave Jiang 
Cc: Ross Zwisler 
Cc: Vishal Verma 
Cc: Tom Lendacky 
Cc: Andrew Morton 
Cc: Michal Hocko 
Cc: linux-nvd...@lists.01.org
Cc: linux-kernel@vger.kernel.org
Cc: linux...@kvack.org
Cc: Huang Ying 
Cc: Fengguang Wu 
Cc: Borislav Petkov 
Cc: Bjorn Helgaas 
Cc: Yaowei Bai 
Cc: Takashi Iwai 
---

  b/drivers/dax/Kconfig  |5 ++
  b/drivers/dax/Makefile |1
  b/drivers/dax/kmem.c   |   93 
+
  3 files changed, 99 insertions(+)

diff -puN drivers/dax/Kconfig~dax-kmem-try-4 drivers/dax/Kconfig
--- a/drivers/dax/Kconfig~dax-kmem-try-42019-01-08 09:54:44.051694874 
-0800
+++ b/drivers/dax/Kconfig   2019-01-08 09:54:44.056694874 -0800
@@ -32,6 +32,11 @@ config DEV_DAX_PMEM
  
  	  Say M if unsure
  
+config DEV_DAX_KMEM

+   def_bool y
+   depends on DEV_DAX_PMEM   # Needs DEV_DAX_PMEM infrastructure
+   depends on MEMORY_HOTPLUG # for add_memory() and friends
+
  config DEV_DAX_PMEM_COMPAT
tristate "PMEM DAX: support the deprecated /sys/class/dax interface"
depends on DEV_DAX_PMEM
diff -puN /dev/null drivers/dax/kmem.c
--- /dev/null   2018-12-03 08:41:47.355756491 -0800
+++ b/drivers/dax/kmem.c2019-01-08 09:54:44.056694874 -0800
@@ -0,0 +1,93 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 2016-2018 Intel Corporation. All rights reserved. */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "dax-private.h"
+#include "bus.h"
+
+int dev_dax_kmem_probe(struct device *dev)
+{
+   struct dev_dax *dev_dax = to_dev_dax(dev);
+   struct resource *res = _dax->region->res;
+   resource_size_t kmem_start;
+   resource_size_t kmem_size;
+   struct resource *new_res;
+   int numa_node;
+   int rc;
+
+   /* Hotplug starting at the beginning of the next block: */
+   kmem_start = ALIGN(res->start, memory_block_size_bytes());
+
+   kmem_size = resource_size(res);
+   /* Adjust the size down to compensate for moving up kmem_start: */
+kmem_size -= kmem_start - res->start;
+   /* Align the size down to cover only complete blocks: */
+   kmem_size &= ~(memory_block_size_bytes() - 1);
+
+   new_res = devm_request_mem_region(dev, kmem_start, kmem_size,
+ dev_name(dev));
+
+   if (!new_res) {
+   printk("could not reserve region %016llx -> %016llx\n",
+   kmem_start, kmem_start+kmem_size);
+   return -EBUSY;
+   }
+
+   /*
+* Set flags appropriate for System RAM.  Leave ..._BUSY clear
+* so that add_memory() can add a child resource.
+*/
+   new_res->flags = IORESOURCE_SYSTEM_RAM;
+   new_res->name = dev_name(dev);
+
+   numa_node = dev_dax->target_node;
+   if (numa_node < 0) {
+   pr_warn_once("bad numa_node: %d, forcing to 0\n", numa_node);
+   numa_node = 0;
+   }
+
+   rc = 

Re: [PATCH] pnp: Bypass the calling to pnp_stop_dev at suspend when there is a protocol suspend

2014-01-05 Thread Yanmin Zhang
On 二, 2013-12-24 at 09:35 +0800, shuox@intel.com wrote:
> From: Zhang Yanmin 
> 
> pnp pnp_bus_suspend/_resume have an issue.
> pnp_bus_suspend calls pnp_stop_dev to disable the device. With ACPI,
> pnp_stop_dev turns off the dev usually. Then,
> pnp_bus_suspend=>pnp_dev->protocol->suspend accesses the device and
> suspend it again.
> 
> pnp_bus_resume has the similar issue.
> 
> Another issue is firmware might just provide _DIS, but no_STS method.
> 
> The patch fixes it by adding a checking. If there is
> pnp_dev->protocol->suspend, pnp_bus_suspend doesn't call pnp_stop_dev.
> Do the similar thing for _resume.
Rafael,

What's your idea about this patch?

We hit the issue when enabling Android on a latest tablet. After
suspend-to-ram wakeup, serial console doesn't work.
This serial port is bound by pnpcore driver.
At suspending, 
static int __pnp_bus_suspend(struct device *dev, pm_message_t state)
{
...

if (pnp_can_disable(pnp_dev)) {
error = pnp_stop_dev(pnp_dev);
if (error)
return error;
}

if (pnp_dev->protocol->suspend)
pnp_dev->protocol->suspend(pnp_dev, state);
return 0;
}

pnp_stop_dev calls dev->protocol->disable.
As for ACPI device, that disable callback calls _DIS. Based
on ACPI spec, driver need turn off the device before disabling it
by _DIS. That means, after pnp_stop_dev returns, the device is at OFF
state.

Then, __pnp_bus_suspend calls pnp_dev->protocol->suspend, which
continues to access the device while the device is at OFF.

Our firmware just provides _DIS for the device. There is no _STS
method. Then, after wakeup, the device doesn't work.

But just like what the patch points out, pnp_dev->protocol->suspend
continues to access the device while the device is at OFF. It's not safe.

The patch looks like a workaround. Another possible fix is to just
delete the calling of pnp_stop_dev in function __pnp_bus_suspend, as
suspend is not equal to _disable_.
The deletion might be a little intrusive. That's why we sent a workaround
patch to LKML.

Which one is better?

Thanks,
Yanmin

> 
> Signed-off-by: Zhang Yanmin 
> Signed-off-by: Liu ShuoX 
> ---
>  drivers/pnp/driver.c | 6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/pnp/driver.c b/drivers/pnp/driver.c
> index f748cc8..2512e47 100644
> --- a/drivers/pnp/driver.c
> +++ b/drivers/pnp/driver.c
> @@ -176,7 +176,7 @@ static int __pnp_bus_suspend(struct device *dev, 
> pm_message_t state)
>   return error;
>   }
>  
> - if (pnp_can_disable(pnp_dev)) {
> + if (pnp_can_disable(pnp_dev) && !pnp_dev->protocol->suspend) {
>   error = pnp_stop_dev(pnp_dev);
>   if (error)
>   return error;
> @@ -215,9 +215,7 @@ static int pnp_bus_resume(struct device *dev)
>   error = pnp_dev->protocol->resume(pnp_dev);
>   if (error)
>   return error;
> - }
> -
> - if (pnp_can_write(pnp_dev)) {
> + } else if (pnp_can_write(pnp_dev)) {
>   error = pnp_start_dev(pnp_dev);
>   if (error)
>   return error;


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] pnp: Bypass the calling to pnp_stop_dev at suspend when there is a protocol suspend

2014-01-05 Thread Yanmin Zhang
On 二, 2013-12-24 at 09:35 +0800, shuox@intel.com wrote:
 From: Zhang Yanmin yanmin.zh...@intel.com
 
 pnp pnp_bus_suspend/_resume have an issue.
 pnp_bus_suspend calls pnp_stop_dev to disable the device. With ACPI,
 pnp_stop_dev turns off the dev usually. Then,
 pnp_bus_suspend=pnp_dev-protocol-suspend accesses the device and
 suspend it again.
 
 pnp_bus_resume has the similar issue.
 
 Another issue is firmware might just provide _DIS, but no_STS method.
 
 The patch fixes it by adding a checking. If there is
 pnp_dev-protocol-suspend, pnp_bus_suspend doesn't call pnp_stop_dev.
 Do the similar thing for _resume.
Rafael,

What's your idea about this patch?

We hit the issue when enabling Android on a latest tablet. After
suspend-to-ram wakeup, serial console doesn't work.
This serial port is bound by pnpcore driver.
At suspending, 
static int __pnp_bus_suspend(struct device *dev, pm_message_t state)
{
...

if (pnp_can_disable(pnp_dev)) {
error = pnp_stop_dev(pnp_dev);
if (error)
return error;
}

if (pnp_dev-protocol-suspend)
pnp_dev-protocol-suspend(pnp_dev, state);
return 0;
}

pnp_stop_dev calls dev-protocol-disable.
As for ACPI device, that disable callback calls _DIS. Based
on ACPI spec, driver need turn off the device before disabling it
by _DIS. That means, after pnp_stop_dev returns, the device is at OFF
state.

Then, __pnp_bus_suspend calls pnp_dev-protocol-suspend, which
continues to access the device while the device is at OFF.

Our firmware just provides _DIS for the device. There is no _STS
method. Then, after wakeup, the device doesn't work.

But just like what the patch points out, pnp_dev-protocol-suspend
continues to access the device while the device is at OFF. It's not safe.

The patch looks like a workaround. Another possible fix is to just
delete the calling of pnp_stop_dev in function __pnp_bus_suspend, as
suspend is not equal to _disable_.
The deletion might be a little intrusive. That's why we sent a workaround
patch to LKML.

Which one is better?

Thanks,
Yanmin

 
 Signed-off-by: Zhang Yanmin yanmin.zh...@intel.com
 Signed-off-by: Liu ShuoX shuox@intel.com
 ---
  drivers/pnp/driver.c | 6 ++
  1 file changed, 2 insertions(+), 4 deletions(-)
 
 diff --git a/drivers/pnp/driver.c b/drivers/pnp/driver.c
 index f748cc8..2512e47 100644
 --- a/drivers/pnp/driver.c
 +++ b/drivers/pnp/driver.c
 @@ -176,7 +176,7 @@ static int __pnp_bus_suspend(struct device *dev, 
 pm_message_t state)
   return error;
   }
  
 - if (pnp_can_disable(pnp_dev)) {
 + if (pnp_can_disable(pnp_dev)  !pnp_dev-protocol-suspend) {
   error = pnp_stop_dev(pnp_dev);
   if (error)
   return error;
 @@ -215,9 +215,7 @@ static int pnp_bus_resume(struct device *dev)
   error = pnp_dev-protocol-resume(pnp_dev);
   if (error)
   return error;
 - }
 -
 - if (pnp_can_write(pnp_dev)) {
 + } else if (pnp_can_write(pnp_dev)) {
   error = pnp_start_dev(pnp_dev);
   if (error)
   return error;


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] PM: avoid 'autosleep' in shutdown progress

2013-07-14 Thread Yanmin Zhang
On Sat, 2013-07-13 at 13:56 +0200, Pavel Machek wrote:
> On Fri 2013-07-12 10:37:33, Alan Stern wrote:
> > On Fri, 12 Jul 2013, Yanmin Zhang wrote:
> > 
> > > On Thu, 2013-07-11 at 16:03 +0800, shuox@intel.com wrote:
> > > > From: Liu ShuoX 
> > > > 
> > > > In shutdown progress, system is possible to do power transition
> > > > (such as suspend-to-ram) in parallel. It is unreasonable. So,
> > > > fixes it by adding a system_state checking and queue try_to_suspend
> > > > again when system status is not running.
> > > > 
> > > > Signed-off-by: Liu ShuoX 
> > > > ---
> > > >  kernel/power/autosleep.c |3 ++-
> > > >  1 files changed, 2 insertions(+), 1 deletions(-)
> > > > 
> > > Without this patch, we hit an hang issue on Android.
> > > 
> > > Scenario:
> > > Kernel starts shutdown and calls all device driver's shutdown callback.
> > > When a driver's shutdown is called, the last wakelock is released and
> > > suspend-to-ram starts. However, as some driver's shut down callbacks
> > > already shut down devices and disable runtime pm, the suspend-to-ram
> > > calls driver's suspend callback without noticing that device is already
> > > off and causes crash.
> > > We know the drivers should be fixed, but can we also change generic
> > > codes a little to make it stronger?
> > 
> > Indeed, this does seem like the sort of thing we want to have.  
> > Allowing an "automatic" or "opportunistic" sleep in the middle of a
> > shutdown makes no sense at all.  In fact, it might be a good idea to
> > disable system sleep completely at this time -- not even a write to
> > /sys/power/state should be allowed to interrupt a shutdown.
> 
> I'm not completely sure, but as long as userland is running, we should
> have system_state == RUNNING, no?
No. The reboot/shutdown is started by application processes, then kernel
changes system_state quickly. See kernel_restart_prepare, 
kernel_shutdown_prepare.

But indeed, it's a good question. We hit many other issues around rebooting.
When system is rebooting, user space processes are still running. There is no
freeze step. Some processes might jump in to access devices. To avoid it, we
have to add some tricks in some device drivers. It's a little crazy.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] PM: avoid 'autosleep' in shutdown progress

2013-07-14 Thread Yanmin Zhang
On Sat, 2013-07-13 at 13:56 +0200, Pavel Machek wrote:
 On Fri 2013-07-12 10:37:33, Alan Stern wrote:
  On Fri, 12 Jul 2013, Yanmin Zhang wrote:
  
   On Thu, 2013-07-11 at 16:03 +0800, shuox@intel.com wrote:
From: Liu ShuoX shuox@intel.com

In shutdown progress, system is possible to do power transition
(such as suspend-to-ram) in parallel. It is unreasonable. So,
fixes it by adding a system_state checking and queue try_to_suspend
again when system status is not running.

Signed-off-by: Liu ShuoX shuox@intel.com
---
 kernel/power/autosleep.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

   Without this patch, we hit an hang issue on Android.
   
   Scenario:
   Kernel starts shutdown and calls all device driver's shutdown callback.
   When a driver's shutdown is called, the last wakelock is released and
   suspend-to-ram starts. However, as some driver's shut down callbacks
   already shut down devices and disable runtime pm, the suspend-to-ram
   calls driver's suspend callback without noticing that device is already
   off and causes crash.
   We know the drivers should be fixed, but can we also change generic
   codes a little to make it stronger?
  
  Indeed, this does seem like the sort of thing we want to have.  
  Allowing an automatic or opportunistic sleep in the middle of a
  shutdown makes no sense at all.  In fact, it might be a good idea to
  disable system sleep completely at this time -- not even a write to
  /sys/power/state should be allowed to interrupt a shutdown.
 
 I'm not completely sure, but as long as userland is running, we should
 have system_state == RUNNING, no?
No. The reboot/shutdown is started by application processes, then kernel
changes system_state quickly. See kernel_restart_prepare, 
kernel_shutdown_prepare.

But indeed, it's a good question. We hit many other issues around rebooting.
When system is rebooting, user space processes are still running. There is no
freeze step. Some processes might jump in to access devices. To avoid it, we
have to add some tricks in some device drivers. It's a little crazy.


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] PM: avoid 'autosleep' in shutdown progress

2013-07-12 Thread Yanmin Zhang
On Thu, 2013-07-11 at 16:03 +0800, shuox@intel.com wrote:
> From: Liu ShuoX 
> 
> In shutdown progress, system is possible to do power transition
> (such as suspend-to-ram) in parallel. It is unreasonable. So,
> fixes it by adding a system_state checking and queue try_to_suspend
> again when system status is not running.
> 
> Signed-off-by: Liu ShuoX 
> ---
>  kernel/power/autosleep.c |3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
> 
Without this patch, we hit an hang issue on Android.

Scenario:
Kernel starts shutdown and calls all device driver's shutdown callback.
When a driver's shutdown is called, the last wakelock is released and
suspend-to-ram starts. However, as some driver's shut down callbacks
already shut down devices and disable runtime pm, the suspend-to-ram
calls driver's suspend callback without noticing that device is already
off and causes crash.
We know the drivers should be fixed, but can we also change generic
codes a little to make it stronger?

> diff --git a/kernel/power/autosleep.c b/kernel/power/autosleep.c
> index c6422ff..9012ecf 100644
> --- a/kernel/power/autosleep.c
> +++ b/kernel/power/autosleep.c
> @@ -32,7 +32,8 @@ static void try_to_suspend(struct work_struct *work)
>  
>   mutex_lock(_lock);
>  
> - if (!pm_save_wakeup_count(initial_count)) {
> + if (!pm_save_wakeup_count(initial_count) ||
> + system_state != SYSTEM_RUNNING) {
>   mutex_unlock(_lock);
>   goto out;
>   }


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] PM: avoid 'autosleep' in shutdown progress

2013-07-12 Thread Yanmin Zhang
On Thu, 2013-07-11 at 16:03 +0800, shuox@intel.com wrote:
 From: Liu ShuoX shuox@intel.com
 
 In shutdown progress, system is possible to do power transition
 (such as suspend-to-ram) in parallel. It is unreasonable. So,
 fixes it by adding a system_state checking and queue try_to_suspend
 again when system status is not running.
 
 Signed-off-by: Liu ShuoX shuox@intel.com
 ---
  kernel/power/autosleep.c |3 ++-
  1 files changed, 2 insertions(+), 1 deletions(-)
 
Without this patch, we hit an hang issue on Android.

Scenario:
Kernel starts shutdown and calls all device driver's shutdown callback.
When a driver's shutdown is called, the last wakelock is released and
suspend-to-ram starts. However, as some driver's shut down callbacks
already shut down devices and disable runtime pm, the suspend-to-ram
calls driver's suspend callback without noticing that device is already
off and causes crash.
We know the drivers should be fixed, but can we also change generic
codes a little to make it stronger?

 diff --git a/kernel/power/autosleep.c b/kernel/power/autosleep.c
 index c6422ff..9012ecf 100644
 --- a/kernel/power/autosleep.c
 +++ b/kernel/power/autosleep.c
 @@ -32,7 +32,8 @@ static void try_to_suspend(struct work_struct *work)
  
   mutex_lock(autosleep_lock);
  
 - if (!pm_save_wakeup_count(initial_count)) {
 + if (!pm_save_wakeup_count(initial_count) ||
 + system_state != SYSTEM_RUNNING) {
   mutex_unlock(autosleep_lock);
   goto out;
   }


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/2] Run callback of device_prepare/complete consistently

2013-06-07 Thread Yanmin Zhang
On Sat, 2013-06-08 at 03:52 +0200, Rafael J. Wysocki wrote:
> On Saturday, June 08, 2013 09:36:03 AM Yanmin Zhang wrote:
> > On Sat, 2013-06-08 at 03:30 +0200, Rafael J. Wysocki wrote:
> > > On Friday, June 07, 2013 06:16:25 PM Greg KH wrote:
> > > > On Sat, Jun 08, 2013 at 08:42:12AM +0800, Yanmin Zhang wrote:
> > > > > On Fri, 2013-06-07 at 12:36 +0200, Rafael J. Wysocki wrote:
> > > > > > On Friday, June 07, 2013 04:20:30 PM shuox@intel.com wrote:
> > > > > > > dpm_run_callback is used in other stages of power states changing.
> > > > > > > It provides debug info message and time measurement when call 
> > > > > > > these
> > > > > > > callback. We also want to benefit ->prepare and ->complete.
> > > > > > > 
> > > > > > > [PATCH 1/2] PM: use dpm_run_callback in device_prepare
> > > > > > > [PATCH 2/2] PM: add dpm_run_callback_void and use it in 
> > > > > > > device_complete
> > > > > > 
> > > > > > Is this an "Oh, why don't we do that?" series, or is it useful for 
> > > > > > anything
> > > > > > in practice?  I'm asking, because we haven't added that stuff to 
> > > > > > start with
> > > > > > since we didn't see why it would be useful to anyone.
> > > > > > 
> > > > > > And while patch [1/2] reduces the code size (by 1 line), so I can 
> > > > > > see some
> > > > > > (tiny) benefit from applying it, patch [2/2] adds more code and is 
> > > > > > there any
> > > > > > paractical reason?
> > > > > Sometimes, suspend-to-ram path spends too much time (either suspend 
> > > > > slowly
> > > > > or wakeup slowly) and we need optimize it.
> > > > > With the 2 patches, we could collect initcall_debug printk info and 
> > > > > manually
> > > > > check what prepare/complete callbacks consume too much time.
> > > > 
> > > > But initcall information is for initialization stuff, not suspend/resume
> > > > things, right?  Doesn't the existing tools for parsing this choke if it
> > > > sees the information at suspend/resume time?
> > > 
> > > We've been using that for suspend/resume for quite some time too, but not
> > > for the prepare/complete phases (because we still believe that's not 
> > > really
> > > useful for them). 
> > > 
> > > Well, I'll be handling patches changing code under drivers/base/power,
> > > I promise. :-)
> > > 
> > > I've been doing that for quite a few years now ...
> > Yes, indeed. Power is one of the most important features on embedded 
> > devices.
> > Lots of smart phones don't really go through the full cycles of 
> > suspend-to-ram.
> > We are following the full steps of the suspend.
> 
> But if you go through the code, you'll see that alomost no drivers actually
> implemet .prepare() and .complete().  Some subsystems do, but they really 
> don't
> take too much time to execute.  Which means that your patches with
> initcall_debug will add quite a pile of useless garbage to the kernel log
Does that mean we need add more log levels around such info instead of just 
having or
not having?

Yanmin


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] irq: add a new function irq_in_progress to check pending IRQ handlers

2013-06-07 Thread Yanmin Zhang
On Fri, 2013-06-07 at 08:08 -0700, Greg KH wrote:
> On Fri, Jun 07, 2013 at 12:54:55PM +0800, Yanmin Zhang wrote:
> > On Thu, 2013-06-06 at 21:19 -0700, Greg KH wrote:
> > > On Fri, Jun 07, 2013 at 11:18:06AM +0800, Yanmin Zhang wrote:
> > > > On Thu, 2013-06-06 at 20:08 -0700, Greg KH wrote:
> > > > > On Fri, Jun 07, 2013 at 10:37:52AM +0800, Yanmin Zhang wrote:
> > > > > > On Thu, 2013-06-06 at 18:02 -0700, Greg KH wrote:
> > > > > > > On Fri, Jun 07, 2013 at 08:53:29AM +0800, Yanmin Zhang wrote:
> > > > > > > > On Thu, 2013-06-06 at 15:18 +0200, Thomas Gleixner wrote:
> > > > > > > > > On Thu, 6 Jun 2013, shuox@intel.com wrote:
> > > > > > > > > > From: Zhang Yanmin 
> > > > > > > > > > 
> > > > > > > > > > synchronize_irq waits pending IRQ handlers to be finished. 
> > > > > > > > > > If using this
> > > > > > > > > > function while holding a resource, the IRQ handler may 
> > > > > > > > > > cause deadlock.
> > > > > > > > > > 
> > > > > > > > > > Here we add a new function irq_in_progress which doesn't 
> > > > > > > > > > wait for the handlers
> > > > > > > > > > to be finished.
> > > > > > > > > > 
> > > > > > > > > > A typical use case at suspend-to-ram:
> > > > > > > > > > 
> > > > > > > > > > device driver's irq handler is complicated and might hold a 
> > > > > > > > > > mutex at rare cases.
> > > > > > > > > > Its suspend function is called and a suspended flag is set.
> > > > > > > > > > In case its IRQ handler is running, suspend function calls 
> > > > > > > > > > irq_in_progress. if
> > > > > > > > > > handler is running, abort suspend.
> > > > > > > > > > The irq handler checks the suspended flag. If the device is 
> > > > > > > > > > suspended, irq handler
> > > > > > > > > > either ignores the interrupt, or wakes up the whole system, 
> > > > > > > > > > and the driver's
> > > > > > > > > > resume function could deal with the delayed interrupt 
> > > > > > > > > > handling.
> > > > > > > > > 
> > > > > > > > > This is as wrong as it can be. Fix the driver instead of 
> > > > > > > > > hacking racy
> > > > > > > > > functions into the core code.
> > > > > > > > > 
> > > > > > > > > So your problem looks like this:
> > > > > > > > > 
> > > > > > > > > CPU 0 CPU 1
> > > > > > > > > irq_handler_thread()  suspend()
> > > > > > > > >.  mutex_lock();
> > > > > > > > >mutex_lock();synchronize_irq();
> > > > > > > > > 
> > > > > > > > > So why needs the mutex to be taken before synchronize_irq()? 
> > > > > > > > > Why not
> > > > > > > > > doing the obvious?
> > > > > > > > > 
> > > > > > > > > suspend()
> > > > > > > > >   disable_irq(); (Implies synchronize_irq)
> > > > > > > > >   mutex_lock();
> > > > > > > > >   
> > > > > > > > >   mutex_unlock();
> > > > > > > > >   enable_irq();
> > > > > > > > Thanks for the kind comment.
> > > > > > > > 
> > > > > > > > We do consider your solution before and it works well indeed 
> > > > > > > > with some specific
> > > > > > > > simple drivers. For example, some drives use GPIO pin as 
> > > > > > > > interrupt source.
> > > > > > > > 
> > > > > > > > On one specific platform, disable_irq would really disable the 
> > > > > > > > irq at RTE entry,
> > > > > > > > 

Re: [PATCH 0/2] Run callback of device_prepare/complete consistently

2013-06-07 Thread Yanmin Zhang
On Sat, 2013-06-08 at 03:30 +0200, Rafael J. Wysocki wrote:
> On Friday, June 07, 2013 06:16:25 PM Greg KH wrote:
> > On Sat, Jun 08, 2013 at 08:42:12AM +0800, Yanmin Zhang wrote:
> > > On Fri, 2013-06-07 at 12:36 +0200, Rafael J. Wysocki wrote:
> > > > On Friday, June 07, 2013 04:20:30 PM shuox@intel.com wrote:
> > > > > dpm_run_callback is used in other stages of power states changing.
> > > > > It provides debug info message and time measurement when call these
> > > > > callback. We also want to benefit ->prepare and ->complete.
> > > > > 
> > > > > [PATCH 1/2] PM: use dpm_run_callback in device_prepare
> > > > > [PATCH 2/2] PM: add dpm_run_callback_void and use it in 
> > > > > device_complete
> > > > 
> > > > Is this an "Oh, why don't we do that?" series, or is it useful for 
> > > > anything
> > > > in practice?  I'm asking, because we haven't added that stuff to start 
> > > > with
> > > > since we didn't see why it would be useful to anyone.
> > > > 
> > > > And while patch [1/2] reduces the code size (by 1 line), so I can see 
> > > > some
> > > > (tiny) benefit from applying it, patch [2/2] adds more code and is 
> > > > there any
> > > > paractical reason?
> > > Sometimes, suspend-to-ram path spends too much time (either suspend slowly
> > > or wakeup slowly) and we need optimize it.
> > > With the 2 patches, we could collect initcall_debug printk info and 
> > > manually
> > > check what prepare/complete callbacks consume too much time.
> > 
> > But initcall information is for initialization stuff, not suspend/resume
> > things, right?  Doesn't the existing tools for parsing this choke if it
> > sees the information at suspend/resume time?
> 
> We've been using that for suspend/resume for quite some time too, but not
> for the prepare/complete phases (because we still believe that's not really
> useful for them). 
> 
> Well, I'll be handling patches changing code under drivers/base/power,
> I promise. :-)
> 
> I've been doing that for quite a few years now ...
Yes, indeed. Power is one of the most important features on embedded devices.
Lots of smart phones don't really go through the full cycles of suspend-to-ram.
We are following the full steps of the suspend.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/2] Run callback of device_prepare/complete consistently

2013-06-07 Thread Yanmin Zhang
On Fri, 2013-06-07 at 18:16 -0700, Greg KH wrote:
> On Sat, Jun 08, 2013 at 08:42:12AM +0800, Yanmin Zhang wrote:
> > On Fri, 2013-06-07 at 12:36 +0200, Rafael J. Wysocki wrote:
> > > On Friday, June 07, 2013 04:20:30 PM shuox@intel.com wrote:
> > > > dpm_run_callback is used in other stages of power states changing.
> > > > It provides debug info message and time measurement when call these
> > > > callback. We also want to benefit ->prepare and ->complete.
> > > > 
> > > > [PATCH 1/2] PM: use dpm_run_callback in device_prepare
> > > > [PATCH 2/2] PM: add dpm_run_callback_void and use it in device_complete
> > > 
> > > Is this an "Oh, why don't we do that?" series, or is it useful for 
> > > anything
> > > in practice?  I'm asking, because we haven't added that stuff to start 
> > > with
> > > since we didn't see why it would be useful to anyone.
> > > 
> > > And while patch [1/2] reduces the code size (by 1 line), so I can see some
> > > (tiny) benefit from applying it, patch [2/2] adds more code and is there 
> > > any
> > > paractical reason?
> > Sometimes, suspend-to-ram path spends too much time (either suspend slowly
> > or wakeup slowly) and we need optimize it.
> > With the 2 patches, we could collect initcall_debug printk info and manually
> > check what prepare/complete callbacks consume too much time.
> 
> But initcall information is for initialization stuff, not suspend/resume
> things, right? 
initcall_debug also controls shutdown callbacks, and suspend/resume callbacks.
See __device_suspend=>dpm_run_callback. It's very useful when we want to 
optimize
suspend-to-ram, or debug some hard issues which happen when async suspend is 
enabled.

>  Doesn't the existing tools for parsing this choke if it
> sees the information at suspend/resume time?
Current kernel doesn't print out info around prepare/complete.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] PM: use dpm_run_callback in device_prepare

2013-06-07 Thread Yanmin Zhang
On Fri, 2013-06-07 at 18:15 -0700, Greg KH wrote:
> On Sat, Jun 08, 2013 at 08:43:55AM +0800, Yanmin Zhang wrote:
> > On Fri, 2013-06-07 at 10:37 -0700, Greg KH wrote:
> > > On Fri, Jun 07, 2013 at 04:20:31PM +0800, shuox@intel.com wrote:
> > > > From: ShuoX Liu 
> > > > 
> > > > dpm_run_callback could show more debug info around prepare stage.
> > > 
> > > Why?  Who needs this?  What problem does it solve?
> > > 
> > > Without answers to this, why would you expect us to accept such a
> > > change?
> > It's to provide more debug info and developers can quickly find what
> > prepare/complete callback consumes too much time.
> 
> What exact debug info is this providing, and why wouldn't you say all of
> this in the original changelog information?
Thanks for the reminder.

Below are examples, after applying the patches. We can see clearly
how much time every device prepare callback consumes.

[  600.384779] sr 1:0:0:0: preparing bus suspend
[  600.384780] call 1:0:0:0+ returned 0 after 2 usecs
[  600.384785] calling  2-1+ @ 2556, parent: usb2
[  600.384788] usb 2-1: preparing type suspend
[  600.384789] call 2-1+ returned 0 after 2 usecs
[  600.384794] calling  1-1.1+ @ 2556, parent: 1-1
[  600.384797] usb 1-1.1: preparing type suspend
[  600.384799] call 1-1.1+ returned 0 after 2 usecs
[  600.384802] calling  1-1.3+ @ 2556, parent: 1-1
[  600.384804] usb 1-1.3: preparing type suspend
[  600.384805] call 1-1.3+ returned 0 after 2 usecs
[  600.384809] calling  1-1.4+ @ 2556, parent: 1-1
[  600.384811] usb 1-1.4: preparing type suspend
[  600.384813] call 1-1.4+ returned 0 after 2 usecs
[  600.384816] calling  1-1.5+ @ 2556, parent: 1-1
[  600.384819] usb 1-1.5: preparing type suspend, may wakeup
[  600.384820] call 1-1.5+ returned 0 after 2 usecs
[  600.384823] calling  1-1.4.1+ @ 2556, parent: 1-1.4
[  600.384825] usb 1-1.4.1: preparing type suspend
[  600.384827] call 1-1.4.1+ returned 0 after 2 usecs
[  600.384829] calling  1-1.4.2+ @ 2556, parent: 1-1.4
[  600.384831] usb 1-1.4.2: preparing type suspend
[  600.384833] call 1-1.4.2+ returned 0 after 2 usecs
[  600.384865] PM: prepare suspend of devices complete after 0.471 msecs

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/2] Run callback of device_prepare/complete consistently

2013-06-07 Thread Yanmin Zhang
On Sat, 2013-06-08 at 02:54 +0200, Rafael J. Wysocki wrote:
> On Saturday, June 08, 2013 08:42:12 AM Yanmin Zhang wrote:
> > On Fri, 2013-06-07 at 12:36 +0200, Rafael J. Wysocki wrote:
> > > On Friday, June 07, 2013 04:20:30 PM shuox@intel.com wrote:
> > > > dpm_run_callback is used in other stages of power states changing.
> > > > It provides debug info message and time measurement when call these
> > > > callback. We also want to benefit ->prepare and ->complete.
> > > > 
> > > > [PATCH 1/2] PM: use dpm_run_callback in device_prepare
> > > > [PATCH 2/2] PM: add dpm_run_callback_void and use it in device_complete
> > > 
> > > Is this an "Oh, why don't we do that?" series, or is it useful for 
> > > anything
> > > in practice?  I'm asking, because we haven't added that stuff to start 
> > > with
> > > since we didn't see why it would be useful to anyone.
> > > 
> > > And while patch [1/2] reduces the code size (by 1 line), so I can see some
> > > (tiny) benefit from applying it, patch [2/2] adds more code and is there 
> > > any
> > > paractical reason?
> > Sometimes, suspend-to-ram path spends too much time (either suspend slowly
> > or wakeup slowly) and we need optimize it.
> > With the 2 patches, we could collect initcall_debug printk info and manually
> > check what prepare/complete callbacks consume too much time.
> 
> Well, can you point me to a single driver where prepare/complete causes this
> type of problems to happen?

That's a good question. We are enabling Android mobiles and need optimize
suspend-to-ram. In the current upstream, we don't find a driver's 
prepare/complete
callback spends too much time.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] PM: use dpm_run_callback in device_prepare

2013-06-07 Thread Yanmin Zhang
On Fri, 2013-06-07 at 10:37 -0700, Greg KH wrote:
> On Fri, Jun 07, 2013 at 04:20:31PM +0800, shuox@intel.com wrote:
> > From: ShuoX Liu 
> > 
> > dpm_run_callback could show more debug info around prepare stage.
> 
> Why?  Who needs this?  What problem does it solve?
> 
> Without answers to this, why would you expect us to accept such a
> change?
It's to provide more debug info and developers can quickly find what
prepare/complete callback consumes too much time.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/2] Run callback of device_prepare/complete consistently

2013-06-07 Thread Yanmin Zhang
On Fri, 2013-06-07 at 12:36 +0200, Rafael J. Wysocki wrote:
> On Friday, June 07, 2013 04:20:30 PM shuox@intel.com wrote:
> > dpm_run_callback is used in other stages of power states changing.
> > It provides debug info message and time measurement when call these
> > callback. We also want to benefit ->prepare and ->complete.
> > 
> > [PATCH 1/2] PM: use dpm_run_callback in device_prepare
> > [PATCH 2/2] PM: add dpm_run_callback_void and use it in device_complete
> 
> Is this an "Oh, why don't we do that?" series, or is it useful for anything
> in practice?  I'm asking, because we haven't added that stuff to start with
> since we didn't see why it would be useful to anyone.
> 
> And while patch [1/2] reduces the code size (by 1 line), so I can see some
> (tiny) benefit from applying it, patch [2/2] adds more code and is there any
> paractical reason?
Sometimes, suspend-to-ram path spends too much time (either suspend slowly
or wakeup slowly) and we need optimize it.
With the 2 patches, we could collect initcall_debug printk info and manually
check what prepare/complete callbacks consume too much time.

Thanks,
Yanmin


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/2] Run callback of device_prepare/complete consistently

2013-06-07 Thread Yanmin Zhang
On Fri, 2013-06-07 at 12:36 +0200, Rafael J. Wysocki wrote:
 On Friday, June 07, 2013 04:20:30 PM shuox@intel.com wrote:
  dpm_run_callback is used in other stages of power states changing.
  It provides debug info message and time measurement when call these
  callback. We also want to benefit -prepare and -complete.
  
  [PATCH 1/2] PM: use dpm_run_callback in device_prepare
  [PATCH 2/2] PM: add dpm_run_callback_void and use it in device_complete
 
 Is this an Oh, why don't we do that? series, or is it useful for anything
 in practice?  I'm asking, because we haven't added that stuff to start with
 since we didn't see why it would be useful to anyone.
 
 And while patch [1/2] reduces the code size (by 1 line), so I can see some
 (tiny) benefit from applying it, patch [2/2] adds more code and is there any
 paractical reason?
Sometimes, suspend-to-ram path spends too much time (either suspend slowly
or wakeup slowly) and we need optimize it.
With the 2 patches, we could collect initcall_debug printk info and manually
check what prepare/complete callbacks consume too much time.

Thanks,
Yanmin


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] PM: use dpm_run_callback in device_prepare

2013-06-07 Thread Yanmin Zhang
On Fri, 2013-06-07 at 10:37 -0700, Greg KH wrote:
 On Fri, Jun 07, 2013 at 04:20:31PM +0800, shuox@intel.com wrote:
  From: ShuoX Liu shuox@intel.com
  
  dpm_run_callback could show more debug info around prepare stage.
 
 Why?  Who needs this?  What problem does it solve?
 
 Without answers to this, why would you expect us to accept such a
 change?
It's to provide more debug info and developers can quickly find what
prepare/complete callback consumes too much time.


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/2] Run callback of device_prepare/complete consistently

2013-06-07 Thread Yanmin Zhang
On Sat, 2013-06-08 at 02:54 +0200, Rafael J. Wysocki wrote:
 On Saturday, June 08, 2013 08:42:12 AM Yanmin Zhang wrote:
  On Fri, 2013-06-07 at 12:36 +0200, Rafael J. Wysocki wrote:
   On Friday, June 07, 2013 04:20:30 PM shuox@intel.com wrote:
dpm_run_callback is used in other stages of power states changing.
It provides debug info message and time measurement when call these
callback. We also want to benefit -prepare and -complete.

[PATCH 1/2] PM: use dpm_run_callback in device_prepare
[PATCH 2/2] PM: add dpm_run_callback_void and use it in device_complete
   
   Is this an Oh, why don't we do that? series, or is it useful for 
   anything
   in practice?  I'm asking, because we haven't added that stuff to start 
   with
   since we didn't see why it would be useful to anyone.
   
   And while patch [1/2] reduces the code size (by 1 line), so I can see some
   (tiny) benefit from applying it, patch [2/2] adds more code and is there 
   any
   paractical reason?
  Sometimes, suspend-to-ram path spends too much time (either suspend slowly
  or wakeup slowly) and we need optimize it.
  With the 2 patches, we could collect initcall_debug printk info and manually
  check what prepare/complete callbacks consume too much time.
 
 Well, can you point me to a single driver where prepare/complete causes this
 type of problems to happen?

That's a good question. We are enabling Android mobiles and need optimize
suspend-to-ram. In the current upstream, we don't find a driver's 
prepare/complete
callback spends too much time.


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] PM: use dpm_run_callback in device_prepare

2013-06-07 Thread Yanmin Zhang
On Fri, 2013-06-07 at 18:15 -0700, Greg KH wrote:
 On Sat, Jun 08, 2013 at 08:43:55AM +0800, Yanmin Zhang wrote:
  On Fri, 2013-06-07 at 10:37 -0700, Greg KH wrote:
   On Fri, Jun 07, 2013 at 04:20:31PM +0800, shuox@intel.com wrote:
From: ShuoX Liu shuox@intel.com

dpm_run_callback could show more debug info around prepare stage.
   
   Why?  Who needs this?  What problem does it solve?
   
   Without answers to this, why would you expect us to accept such a
   change?
  It's to provide more debug info and developers can quickly find what
  prepare/complete callback consumes too much time.
 
 What exact debug info is this providing, and why wouldn't you say all of
 this in the original changelog information?
Thanks for the reminder.

Below are examples, after applying the patches. We can see clearly
how much time every device prepare callback consumes.

[  600.384779] sr 1:0:0:0: preparing bus suspend
[  600.384780] call 1:0:0:0+ returned 0 after 2 usecs
[  600.384785] calling  2-1+ @ 2556, parent: usb2
[  600.384788] usb 2-1: preparing type suspend
[  600.384789] call 2-1+ returned 0 after 2 usecs
[  600.384794] calling  1-1.1+ @ 2556, parent: 1-1
[  600.384797] usb 1-1.1: preparing type suspend
[  600.384799] call 1-1.1+ returned 0 after 2 usecs
[  600.384802] calling  1-1.3+ @ 2556, parent: 1-1
[  600.384804] usb 1-1.3: preparing type suspend
[  600.384805] call 1-1.3+ returned 0 after 2 usecs
[  600.384809] calling  1-1.4+ @ 2556, parent: 1-1
[  600.384811] usb 1-1.4: preparing type suspend
[  600.384813] call 1-1.4+ returned 0 after 2 usecs
[  600.384816] calling  1-1.5+ @ 2556, parent: 1-1
[  600.384819] usb 1-1.5: preparing type suspend, may wakeup
[  600.384820] call 1-1.5+ returned 0 after 2 usecs
[  600.384823] calling  1-1.4.1+ @ 2556, parent: 1-1.4
[  600.384825] usb 1-1.4.1: preparing type suspend
[  600.384827] call 1-1.4.1+ returned 0 after 2 usecs
[  600.384829] calling  1-1.4.2+ @ 2556, parent: 1-1.4
[  600.384831] usb 1-1.4.2: preparing type suspend
[  600.384833] call 1-1.4.2+ returned 0 after 2 usecs
[  600.384865] PM: prepare suspend of devices complete after 0.471 msecs

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/2] Run callback of device_prepare/complete consistently

2013-06-07 Thread Yanmin Zhang
On Fri, 2013-06-07 at 18:16 -0700, Greg KH wrote:
 On Sat, Jun 08, 2013 at 08:42:12AM +0800, Yanmin Zhang wrote:
  On Fri, 2013-06-07 at 12:36 +0200, Rafael J. Wysocki wrote:
   On Friday, June 07, 2013 04:20:30 PM shuox@intel.com wrote:
dpm_run_callback is used in other stages of power states changing.
It provides debug info message and time measurement when call these
callback. We also want to benefit -prepare and -complete.

[PATCH 1/2] PM: use dpm_run_callback in device_prepare
[PATCH 2/2] PM: add dpm_run_callback_void and use it in device_complete
   
   Is this an Oh, why don't we do that? series, or is it useful for 
   anything
   in practice?  I'm asking, because we haven't added that stuff to start 
   with
   since we didn't see why it would be useful to anyone.
   
   And while patch [1/2] reduces the code size (by 1 line), so I can see some
   (tiny) benefit from applying it, patch [2/2] adds more code and is there 
   any
   paractical reason?
  Sometimes, suspend-to-ram path spends too much time (either suspend slowly
  or wakeup slowly) and we need optimize it.
  With the 2 patches, we could collect initcall_debug printk info and manually
  check what prepare/complete callbacks consume too much time.
 
 But initcall information is for initialization stuff, not suspend/resume
 things, right? 
initcall_debug also controls shutdown callbacks, and suspend/resume callbacks.
See __device_suspend=dpm_run_callback. It's very useful when we want to 
optimize
suspend-to-ram, or debug some hard issues which happen when async suspend is 
enabled.

  Doesn't the existing tools for parsing this choke if it
 sees the information at suspend/resume time?
Current kernel doesn't print out info around prepare/complete.


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/2] Run callback of device_prepare/complete consistently

2013-06-07 Thread Yanmin Zhang
On Sat, 2013-06-08 at 03:30 +0200, Rafael J. Wysocki wrote:
 On Friday, June 07, 2013 06:16:25 PM Greg KH wrote:
  On Sat, Jun 08, 2013 at 08:42:12AM +0800, Yanmin Zhang wrote:
   On Fri, 2013-06-07 at 12:36 +0200, Rafael J. Wysocki wrote:
On Friday, June 07, 2013 04:20:30 PM shuox@intel.com wrote:
 dpm_run_callback is used in other stages of power states changing.
 It provides debug info message and time measurement when call these
 callback. We also want to benefit -prepare and -complete.
 
 [PATCH 1/2] PM: use dpm_run_callback in device_prepare
 [PATCH 2/2] PM: add dpm_run_callback_void and use it in 
 device_complete

Is this an Oh, why don't we do that? series, or is it useful for 
anything
in practice?  I'm asking, because we haven't added that stuff to start 
with
since we didn't see why it would be useful to anyone.

And while patch [1/2] reduces the code size (by 1 line), so I can see 
some
(tiny) benefit from applying it, patch [2/2] adds more code and is 
there any
paractical reason?
   Sometimes, suspend-to-ram path spends too much time (either suspend slowly
   or wakeup slowly) and we need optimize it.
   With the 2 patches, we could collect initcall_debug printk info and 
   manually
   check what prepare/complete callbacks consume too much time.
  
  But initcall information is for initialization stuff, not suspend/resume
  things, right?  Doesn't the existing tools for parsing this choke if it
  sees the information at suspend/resume time?
 
 We've been using that for suspend/resume for quite some time too, but not
 for the prepare/complete phases (because we still believe that's not really
 useful for them). 
 
 Well, I'll be handling patches changing code under drivers/base/power,
 I promise. :-)
 
 I've been doing that for quite a few years now ...
Yes, indeed. Power is one of the most important features on embedded devices.
Lots of smart phones don't really go through the full cycles of suspend-to-ram.
We are following the full steps of the suspend.


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] irq: add a new function irq_in_progress to check pending IRQ handlers

2013-06-07 Thread Yanmin Zhang
On Fri, 2013-06-07 at 08:08 -0700, Greg KH wrote:
 On Fri, Jun 07, 2013 at 12:54:55PM +0800, Yanmin Zhang wrote:
  On Thu, 2013-06-06 at 21:19 -0700, Greg KH wrote:
   On Fri, Jun 07, 2013 at 11:18:06AM +0800, Yanmin Zhang wrote:
On Thu, 2013-06-06 at 20:08 -0700, Greg KH wrote:
 On Fri, Jun 07, 2013 at 10:37:52AM +0800, Yanmin Zhang wrote:
  On Thu, 2013-06-06 at 18:02 -0700, Greg KH wrote:
   On Fri, Jun 07, 2013 at 08:53:29AM +0800, Yanmin Zhang wrote:
On Thu, 2013-06-06 at 15:18 +0200, Thomas Gleixner wrote:
 On Thu, 6 Jun 2013, shuox@intel.com wrote:
  From: Zhang Yanmin yanmin.zh...@intel.com
  
  synchronize_irq waits pending IRQ handlers to be finished. 
  If using this
  function while holding a resource, the IRQ handler may 
  cause deadlock.
  
  Here we add a new function irq_in_progress which doesn't 
  wait for the handlers
  to be finished.
  
  A typical use case at suspend-to-ram:
  
  device driver's irq handler is complicated and might hold a 
  mutex at rare cases.
  Its suspend function is called and a suspended flag is set.
  In case its IRQ handler is running, suspend function calls 
  irq_in_progress. if
  handler is running, abort suspend.
  The irq handler checks the suspended flag. If the device is 
  suspended, irq handler
  either ignores the interrupt, or wakes up the whole system, 
  and the driver's
  resume function could deal with the delayed interrupt 
  handling.
 
 This is as wrong as it can be. Fix the driver instead of 
 hacking racy
 functions into the core code.
 
 So your problem looks like this:
 
 CPU 0 CPU 1
 irq_handler_thread()  suspend()
.  mutex_lock(m);
mutex_lock(m);synchronize_irq();
 
 So why needs the mutex to be taken before synchronize_irq()? 
 Why not
 doing the obvious?
 
 suspend()
   disable_irq(); (Implies synchronize_irq)
   mutex_lock(m);
   
   mutex_unlock(m);
   enable_irq();
Thanks for the kind comment.

We do consider your solution before and it works well indeed 
with some specific
simple drivers. For example, some drives use GPIO pin as 
interrupt source.

On one specific platform, disable_irq would really disable the 
irq at RTE entry,
which means we lose the wakeup capability of this device.
synchronize_irq can be another solution. But we did hit 'DPM 
device timeout' issue
reported by dpm_wd_handler at suspend-to-ram.

The driver is complicated. We couldn't change too many 
functions to optimize it.
In addition, we have to use the driver instead of throwing it 
away.
   
   What is preventing you from rewriting it to work properly?
  It's complicated.
 
 That sounds like your issue, not ours, so please don't push the 
 problem
 onto someone else.  Take ownership of the driver, fix it up, and all
 will be good.
 
 
With irq_in_progress, we can resolve this issue and it does 
work, although it
looks like ugly.
   
   Don't paper over driver bugs in the core kernel, fix the driver.
  It's hard to say it's a driver bug. Could generic kernel provide 
  some
  flexibility for such complicated drivers?
 
 Please post the code for the driver, and then we will be glad to
 continue the dicussion.
Greg,

Thanks for your enthusiasm. It's a private driver for products.
   
   What do you mean private driver?  All drivers need to be merged into
   the mainline kernel, it saves you time and money, and we will fix your
   bugs for you.
   
   You know that, your bosses know that, your management knows that, so why
   are you trying to hide things?
   
   totally confused,
  They are embedded device drivers, highly depending on specific devices 
  which runs
  its own firmware in devices. Here the kernel drivers run at AP side.
 
 That's no different from loads of drivers that we have in the kernel
 today, no need to keep them from being merged, please submit them.
It need managers' approval and is beyond my capability.

 
  One example is Graphics driver, which is very big and coding is not 
  friendly. Kernel
  experts can raise tons of questions against the driver, but we have to make 
  the driver
  work well on real products.
 
 So you are saying that kernel experts don't ask questions that
 actually make drivers work well on real products?
Obviously, I didn't say that. Pls. also remove the , as I really
think I'm

Re: [PATCH 0/2] Run callback of device_prepare/complete consistently

2013-06-07 Thread Yanmin Zhang
On Sat, 2013-06-08 at 03:52 +0200, Rafael J. Wysocki wrote:
 On Saturday, June 08, 2013 09:36:03 AM Yanmin Zhang wrote:
  On Sat, 2013-06-08 at 03:30 +0200, Rafael J. Wysocki wrote:
   On Friday, June 07, 2013 06:16:25 PM Greg KH wrote:
On Sat, Jun 08, 2013 at 08:42:12AM +0800, Yanmin Zhang wrote:
 On Fri, 2013-06-07 at 12:36 +0200, Rafael J. Wysocki wrote:
  On Friday, June 07, 2013 04:20:30 PM shuox@intel.com wrote:
   dpm_run_callback is used in other stages of power states changing.
   It provides debug info message and time measurement when call 
   these
   callback. We also want to benefit -prepare and -complete.
   
   [PATCH 1/2] PM: use dpm_run_callback in device_prepare
   [PATCH 2/2] PM: add dpm_run_callback_void and use it in 
   device_complete
  
  Is this an Oh, why don't we do that? series, or is it useful for 
  anything
  in practice?  I'm asking, because we haven't added that stuff to 
  start with
  since we didn't see why it would be useful to anyone.
  
  And while patch [1/2] reduces the code size (by 1 line), so I can 
  see some
  (tiny) benefit from applying it, patch [2/2] adds more code and is 
  there any
  paractical reason?
 Sometimes, suspend-to-ram path spends too much time (either suspend 
 slowly
 or wakeup slowly) and we need optimize it.
 With the 2 patches, we could collect initcall_debug printk info and 
 manually
 check what prepare/complete callbacks consume too much time.

But initcall information is for initialization stuff, not suspend/resume
things, right?  Doesn't the existing tools for parsing this choke if it
sees the information at suspend/resume time?
   
   We've been using that for suspend/resume for quite some time too, but not
   for the prepare/complete phases (because we still believe that's not 
   really
   useful for them). 
   
   Well, I'll be handling patches changing code under drivers/base/power,
   I promise. :-)
   
   I've been doing that for quite a few years now ...
  Yes, indeed. Power is one of the most important features on embedded 
  devices.
  Lots of smart phones don't really go through the full cycles of 
  suspend-to-ram.
  We are following the full steps of the suspend.
 
 But if you go through the code, you'll see that alomost no drivers actually
 implemet .prepare() and .complete().  Some subsystems do, but they really 
 don't
 take too much time to execute.  Which means that your patches with
 initcall_debug will add quite a pile of useless garbage to the kernel log
Does that mean we need add more log levels around such info instead of just 
having or
not having?

Yanmin


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] irq: add a new function irq_in_progress to check pending IRQ handlers

2013-06-06 Thread Yanmin Zhang
On Thu, 2013-06-06 at 21:19 -0700, Greg KH wrote:
> On Fri, Jun 07, 2013 at 11:18:06AM +0800, Yanmin Zhang wrote:
> > On Thu, 2013-06-06 at 20:08 -0700, Greg KH wrote:
> > > On Fri, Jun 07, 2013 at 10:37:52AM +0800, Yanmin Zhang wrote:
> > > > On Thu, 2013-06-06 at 18:02 -0700, Greg KH wrote:
> > > > > On Fri, Jun 07, 2013 at 08:53:29AM +0800, Yanmin Zhang wrote:
> > > > > > On Thu, 2013-06-06 at 15:18 +0200, Thomas Gleixner wrote:
> > > > > > > On Thu, 6 Jun 2013, shuox@intel.com wrote:
> > > > > > > > From: Zhang Yanmin 
> > > > > > > > 
> > > > > > > > synchronize_irq waits pending IRQ handlers to be finished. If 
> > > > > > > > using this
> > > > > > > > function while holding a resource, the IRQ handler may cause 
> > > > > > > > deadlock.
> > > > > > > > 
> > > > > > > > Here we add a new function irq_in_progress which doesn't wait 
> > > > > > > > for the handlers
> > > > > > > > to be finished.
> > > > > > > > 
> > > > > > > > A typical use case at suspend-to-ram:
> > > > > > > > 
> > > > > > > > device driver's irq handler is complicated and might hold a 
> > > > > > > > mutex at rare cases.
> > > > > > > > Its suspend function is called and a suspended flag is set.
> > > > > > > > In case its IRQ handler is running, suspend function calls 
> > > > > > > > irq_in_progress. if
> > > > > > > > handler is running, abort suspend.
> > > > > > > > The irq handler checks the suspended flag. If the device is 
> > > > > > > > suspended, irq handler
> > > > > > > > either ignores the interrupt, or wakes up the whole system, and 
> > > > > > > > the driver's
> > > > > > > > resume function could deal with the delayed interrupt handling.
> > > > > > > 
> > > > > > > This is as wrong as it can be. Fix the driver instead of hacking 
> > > > > > > racy
> > > > > > > functions into the core code.
> > > > > > > 
> > > > > > > So your problem looks like this:
> > > > > > > 
> > > > > > > CPU 0 CPU 1
> > > > > > > irq_handler_thread()  suspend()
> > > > > > >.  mutex_lock();
> > > > > > >mutex_lock();synchronize_irq();
> > > > > > > 
> > > > > > > So why needs the mutex to be taken before synchronize_irq()? Why 
> > > > > > > not
> > > > > > > doing the obvious?
> > > > > > > 
> > > > > > > suspend()
> > > > > > >   disable_irq(); (Implies synchronize_irq)
> > > > > > >   mutex_lock();
> > > > > > >   
> > > > > > >   mutex_unlock();
> > > > > > >   enable_irq();
> > > > > > Thanks for the kind comment.
> > > > > > 
> > > > > > We do consider your solution before and it works well indeed with 
> > > > > > some specific
> > > > > > simple drivers. For example, some drives use GPIO pin as interrupt 
> > > > > > source.
> > > > > > 
> > > > > > On one specific platform, disable_irq would really disable the irq 
> > > > > > at RTE entry,
> > > > > > which means we lose the wakeup capability of this device.
> > > > > > synchronize_irq can be another solution. But we did hit 'DPM device 
> > > > > > timeout' issue
> > > > > > reported by dpm_wd_handler at suspend-to-ram.
> > > > > > 
> > > > > > The driver is complicated. We couldn't change too many functions to 
> > > > > > optimize it.
> > > > > > In addition, we have to use the driver instead of throwing it away.
> > > > > 
> > > > > What is preventing you from rewriting it to work properly?
> > > > It's complicated.
> > > 
> > > That sounds like your issue, not ours, so please don't push the problem
> &

Re: [PATCH] irq: add a new function irq_in_progress to check pending IRQ handlers

2013-06-06 Thread Yanmin Zhang
On Thu, 2013-06-06 at 20:08 -0700, Greg KH wrote:
> On Fri, Jun 07, 2013 at 10:37:52AM +0800, Yanmin Zhang wrote:
> > On Thu, 2013-06-06 at 18:02 -0700, Greg KH wrote:
> > > On Fri, Jun 07, 2013 at 08:53:29AM +0800, Yanmin Zhang wrote:
> > > > On Thu, 2013-06-06 at 15:18 +0200, Thomas Gleixner wrote:
> > > > > On Thu, 6 Jun 2013, shuox@intel.com wrote:
> > > > > > From: Zhang Yanmin 
> > > > > > 
> > > > > > synchronize_irq waits pending IRQ handlers to be finished. If using 
> > > > > > this
> > > > > > function while holding a resource, the IRQ handler may cause 
> > > > > > deadlock.
> > > > > > 
> > > > > > Here we add a new function irq_in_progress which doesn't wait for 
> > > > > > the handlers
> > > > > > to be finished.
> > > > > > 
> > > > > > A typical use case at suspend-to-ram:
> > > > > > 
> > > > > > device driver's irq handler is complicated and might hold a mutex 
> > > > > > at rare cases.
> > > > > > Its suspend function is called and a suspended flag is set.
> > > > > > In case its IRQ handler is running, suspend function calls 
> > > > > > irq_in_progress. if
> > > > > > handler is running, abort suspend.
> > > > > > The irq handler checks the suspended flag. If the device is 
> > > > > > suspended, irq handler
> > > > > > either ignores the interrupt, or wakes up the whole system, and the 
> > > > > > driver's
> > > > > > resume function could deal with the delayed interrupt handling.
> > > > > 
> > > > > This is as wrong as it can be. Fix the driver instead of hacking racy
> > > > > functions into the core code.
> > > > > 
> > > > > So your problem looks like this:
> > > > > 
> > > > > CPU 0 CPU 1
> > > > > irq_handler_thread()  suspend()
> > > > >.  mutex_lock();
> > > > >mutex_lock();synchronize_irq();
> > > > > 
> > > > > So why needs the mutex to be taken before synchronize_irq()? Why not
> > > > > doing the obvious?
> > > > > 
> > > > > suspend()
> > > > >   disable_irq(); (Implies synchronize_irq)
> > > > >   mutex_lock();
> > > > >   
> > > > >   mutex_unlock();
> > > > >   enable_irq();
> > > > Thanks for the kind comment.
> > > > 
> > > > We do consider your solution before and it works well indeed with some 
> > > > specific
> > > > simple drivers. For example, some drives use GPIO pin as interrupt 
> > > > source.
> > > > 
> > > > On one specific platform, disable_irq would really disable the irq at 
> > > > RTE entry,
> > > > which means we lose the wakeup capability of this device.
> > > > synchronize_irq can be another solution. But we did hit 'DPM device 
> > > > timeout' issue
> > > > reported by dpm_wd_handler at suspend-to-ram.
> > > > 
> > > > The driver is complicated. We couldn't change too many functions to 
> > > > optimize it.
> > > > In addition, we have to use the driver instead of throwing it away.
> > > 
> > > What is preventing you from rewriting it to work properly?
> > It's complicated.
> 
> That sounds like your issue, not ours, so please don't push the problem
> onto someone else.  Take ownership of the driver, fix it up, and all
> will be good.
> 
> 
> > > > With irq_in_progress, we can resolve this issue and it does work, 
> > > > although it
> > > > looks like ugly.
> > > 
> > > Don't paper over driver bugs in the core kernel, fix the driver.
> > It's hard to say it's a driver bug. Could generic kernel provide some
> > flexibility for such complicated drivers?
> 
> Please post the code for the driver, and then we will be glad to
> continue the dicussion.
Greg,

Thanks for your enthusiasm. It's a private driver for products.

Let's stop here and I wouldn't push the patch to upstream.

Thanks all.
Yanmin

> 
> As for "complicated", just make it "uncomplicated", it's just code :)


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] irq: add a new function irq_in_progress to check pending IRQ handlers

2013-06-06 Thread Yanmin Zhang
On Thu, 2013-06-06 at 18:02 -0700, Greg KH wrote:
> On Fri, Jun 07, 2013 at 08:53:29AM +0800, Yanmin Zhang wrote:
> > On Thu, 2013-06-06 at 15:18 +0200, Thomas Gleixner wrote:
> > > On Thu, 6 Jun 2013, shuox@intel.com wrote:
> > > > From: Zhang Yanmin 
> > > > 
> > > > synchronize_irq waits pending IRQ handlers to be finished. If using this
> > > > function while holding a resource, the IRQ handler may cause deadlock.
> > > > 
> > > > Here we add a new function irq_in_progress which doesn't wait for the 
> > > > handlers
> > > > to be finished.
> > > > 
> > > > A typical use case at suspend-to-ram:
> > > > 
> > > > device driver's irq handler is complicated and might hold a mutex at 
> > > > rare cases.
> > > > Its suspend function is called and a suspended flag is set.
> > > > In case its IRQ handler is running, suspend function calls 
> > > > irq_in_progress. if
> > > > handler is running, abort suspend.
> > > > The irq handler checks the suspended flag. If the device is suspended, 
> > > > irq handler
> > > > either ignores the interrupt, or wakes up the whole system, and the 
> > > > driver's
> > > > resume function could deal with the delayed interrupt handling.
> > > 
> > > This is as wrong as it can be. Fix the driver instead of hacking racy
> > > functions into the core code.
> > > 
> > > So your problem looks like this:
> > > 
> > > CPU 0 CPU 1
> > > irq_handler_thread()  suspend()
> > >.  mutex_lock();
> > >mutex_lock();synchronize_irq();
> > > 
> > > So why needs the mutex to be taken before synchronize_irq()? Why not
> > > doing the obvious?
> > > 
> > > suspend()
> > >   disable_irq(); (Implies synchronize_irq)
> > >   mutex_lock();
> > >   
> > >   mutex_unlock();
> > >   enable_irq();
> > Thanks for the kind comment.
> > 
> > We do consider your solution before and it works well indeed with some 
> > specific
> > simple drivers. For example, some drives use GPIO pin as interrupt source.
> > 
> > On one specific platform, disable_irq would really disable the irq at RTE 
> > entry,
> > which means we lose the wakeup capability of this device.
> > synchronize_irq can be another solution. But we did hit 'DPM device 
> > timeout' issue
> > reported by dpm_wd_handler at suspend-to-ram.
> > 
> > The driver is complicated. We couldn't change too many functions to 
> > optimize it.
> > In addition, we have to use the driver instead of throwing it away.
> 
> What is preventing you from rewriting it to work properly?
It's complicated.

> 
> > With irq_in_progress, we can resolve this issue and it does work, although 
> > it
> > looks like ugly.
> 
> Don't paper over driver bugs in the core kernel, fix the driver.
It's hard to say it's a driver bug. Could generic kernel provide some 
flexibility
for such complicated drivers?

For example, any driver's suspend can return error and the whole suspend-to-ram
aborts. Can we say the driver has a bug? If so, why not to change all 
suspend/resume
callbacks to return VOID?
Current kernel already allows drivers to abort suspend-to-ram at some rare 
corner cases.
Of course, if the abort happens frequently, it's a bug and we need fix it in 
driver.
If it happens rarely, can we provide some flexibility for the driver?

Thanks,
Yanmin


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] irq: add a new function irq_in_progress to check pending IRQ handlers

2013-06-06 Thread Yanmin Zhang
On Thu, 2013-06-06 at 15:18 +0200, Thomas Gleixner wrote:
> On Thu, 6 Jun 2013, shuox@intel.com wrote:
> > From: Zhang Yanmin 
> > 
> > synchronize_irq waits pending IRQ handlers to be finished. If using this
> > function while holding a resource, the IRQ handler may cause deadlock.
> > 
> > Here we add a new function irq_in_progress which doesn't wait for the 
> > handlers
> > to be finished.
> > 
> > A typical use case at suspend-to-ram:
> > 
> > device driver's irq handler is complicated and might hold a mutex at rare 
> > cases.
> > Its suspend function is called and a suspended flag is set.
> > In case its IRQ handler is running, suspend function calls irq_in_progress. 
> > if
> > handler is running, abort suspend.
> > The irq handler checks the suspended flag. If the device is suspended, irq 
> > handler
> > either ignores the interrupt, or wakes up the whole system, and the driver's
> > resume function could deal with the delayed interrupt handling.
> 
> This is as wrong as it can be. Fix the driver instead of hacking racy
> functions into the core code.
> 
> So your problem looks like this:
> 
> CPU 0 CPU 1
> irq_handler_thread()  suspend()
>.  mutex_lock();
>mutex_lock();synchronize_irq();
> 
> So why needs the mutex to be taken before synchronize_irq()? Why not
> doing the obvious?
> 
> suspend()
>   disable_irq(); (Implies synchronize_irq)
>   mutex_lock();
>   
>   mutex_unlock();
>   enable_irq();
Thanks for the kind comment.

We do consider your solution before and it works well indeed with some specific
simple drivers. For example, some drives use GPIO pin as interrupt source.

On one specific platform, disable_irq would really disable the irq at RTE entry,
which means we lose the wakeup capability of this device.
synchronize_irq can be another solution. But we did hit 'DPM device timeout' 
issue
reported by dpm_wd_handler at suspend-to-ram.

The driver is complicated. We couldn't change too many functions to optimize it.
In addition, we have to use the driver instead of throwing it away.

With irq_in_progress, we can resolve this issue and it does work, although it
looks like ugly.

Yanmin


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Subject : [ PATCH ] pci-reset-error_state-to-pci_channel_io_normal-at-report_slot_reset

2013-06-06 Thread Yanmin Zhang
On Wed, 2013-06-05 at 07:30 -0600, Bjorn Helgaas wrote:
> On Tue, Jun 4, 2013 at 6:38 PM, Yanmin Zhang
>  wrote:
> > On Tue, 2013-06-04 at 12:04 -0600, Bjorn Helgaas wrote:
> >> I'm not sure where we are with this patch.  I think Joseph initially
> >> reported a problem (though I haven't actually seen that), and this
> >> patch fixed it, so it seems like there's something we want to do here.
> > Yes, indeed. We checked Powerpc error handling and plan to use the
> > similar method to deal with the issue.
> >
> > Sorry for replying very late. I move to Android smartphone area and am
> > very busy on many top issues.
> 
> OK.  Can you point us at a bugzilla or email archive of Joseph's
> original problem report, or post it to this thread?  Then maybe
> somebody else can pick it up and make progress on this.

Joseph sent email to my outlook emailbox. I changed it a little to delete 
company
sensitive product name.

===Joseph's original email to me
Hi Tom and Yanmin,
 
Hope this email can reach you well. I am working on a driver's PCI error 
recovery with AER. I have a question with one of my observations from platform 
AER driver's behavior. As your names and emails are listed to the PCIe AER 
driver coming with the kernel, I send this question to you:
 
During injecting AER Non-Correctable/Fatal error and PCIe error recovery 
process, I observed the following from our Low Level Driver (LLD):
 
1. The LLD's error_detected() callback got called and the PCI channel state 
passed in as "pci_channel_io_frozen", as expected;
 
2. The LLD's error_detected() callback function returned with 
PCI_ERS_RESULT_NEED_RESET, requesting a PCI slot reset;
 
3. The LLD's slot_reset() callback got called and it attempted to re-initialize 
the device hardware for the recovery. However, the PCI slot state was still in 
"pci_channel_io_frozen" and the pci_channel_offline() helper routine returned 
1. This is not expected, and it actually prevented LLD in performing needed 
access to memory mapped I/O space for preparing the device for recovery;
 
4. Later, the LLD's resume() callback got called and the PCI slot state was set 
to "pci_channel_io_normal";
 
In the upstream Linux kernel 3.7.0's pci-error-recovery.txt, "STEP 4 Slot 
Reset", it stated after platform has performed PCI slot reset and then calls 
LLD's slot_reset() callback:
 
"This call gives drivers the chance to re-initialize the hardware (re-download 
firmware, etc.).  At this point, the driver may assume that the card is in a 
fresh state and is fully functional. The slot is unfrozen and the driver has 
full access to PCI config space, memory mapped I/O space and DMA. Interrupts 
(Legacy, MSI, or MSI-X) will also be available."
 
I looked into the kernel 3.7.0's AER driver's aerdrv_core.c and see that the 
PCI slot state was set to "pci_channel_io_frozen" on AER_FATAL serverity, and 
only set back to "pci_channel_io_normal" in report_resume() function. The PCI 
slot state was not set to "pci_channel_io_normal" when invoking 
report_slot_reset().
 
As a comparison, I also perform the EEH error recovery with the LLD driver on a 
PPC platform, which indeed set the PCI slot state to "pci_channel_io_normal" 
when calling LLD's slot_reset() callback function.
 
Can you please verify this platform AER driver's behavior is intended and will 
stay with the AER platform support, or it should be changed to be consistent 
with the PCI error recovery procedure described in the pci-error-recovery.txt 
documentation? I also noticed that before invoking the 
broadcast_error_message() function with "slot_reset", there is a comment in the 
3.7.0 kernel source:
 
/*
 * TODO: Should call platform-specific
 * functions to reset slot before calling
 * drivers' slot_reset callbacks?
 */
And I do see that only the PCIe Link_Reset was performed, no PCI fundamental 
reset was performed even the LLD set the PCIe device's "pdev->needs_freset = 
1", is this the area that later will be added for performing needed PCI hot 
reset or fundamental reset at this stage?
 
Your timely response is very appreciated. Thanks in advance and please let me 
know if you think I should redirect the question to someone else.
 
Best Regards,
Joseph Liu, Ph.D.
Senior Principal Engineer
Emulex Corporation


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Subject : [ PATCH ] pci-reset-error_state-to-pci_channel_io_normal-at-report_slot_reset

2013-06-06 Thread Yanmin Zhang
On Wed, 2013-06-05 at 07:30 -0600, Bjorn Helgaas wrote:
 On Tue, Jun 4, 2013 at 6:38 PM, Yanmin Zhang
 yanmin_zh...@linux.intel.com wrote:
  On Tue, 2013-06-04 at 12:04 -0600, Bjorn Helgaas wrote:
  I'm not sure where we are with this patch.  I think Joseph initially
  reported a problem (though I haven't actually seen that), and this
  patch fixed it, so it seems like there's something we want to do here.
  Yes, indeed. We checked Powerpc error handling and plan to use the
  similar method to deal with the issue.
 
  Sorry for replying very late. I move to Android smartphone area and am
  very busy on many top issues.
 
 OK.  Can you point us at a bugzilla or email archive of Joseph's
 original problem report, or post it to this thread?  Then maybe
 somebody else can pick it up and make progress on this.

Joseph sent email to my outlook emailbox. I changed it a little to delete 
company
sensitive product name.

===Joseph's original email to me
Hi Tom and Yanmin,
 
Hope this email can reach you well. I am working on a driver's PCI error 
recovery with AER. I have a question with one of my observations from platform 
AER driver's behavior. As your names and emails are listed to the PCIe AER 
driver coming with the kernel, I send this question to you:
 
During injecting AER Non-Correctable/Fatal error and PCIe error recovery 
process, I observed the following from our Low Level Driver (LLD):
 
1. The LLD's error_detected() callback got called and the PCI channel state 
passed in as pci_channel_io_frozen, as expected;
 
2. The LLD's error_detected() callback function returned with 
PCI_ERS_RESULT_NEED_RESET, requesting a PCI slot reset;
 
3. The LLD's slot_reset() callback got called and it attempted to re-initialize 
the device hardware for the recovery. However, the PCI slot state was still in 
pci_channel_io_frozen and the pci_channel_offline() helper routine returned 
1. This is not expected, and it actually prevented LLD in performing needed 
access to memory mapped I/O space for preparing the device for recovery;
 
4. Later, the LLD's resume() callback got called and the PCI slot state was set 
to pci_channel_io_normal;
 
In the upstream Linux kernel 3.7.0's pci-error-recovery.txt, STEP 4 Slot 
Reset, it stated after platform has performed PCI slot reset and then calls 
LLD's slot_reset() callback:
 
This call gives drivers the chance to re-initialize the hardware (re-download 
firmware, etc.).  At this point, the driver may assume that the card is in a 
fresh state and is fully functional. The slot is unfrozen and the driver has 
full access to PCI config space, memory mapped I/O space and DMA. Interrupts 
(Legacy, MSI, or MSI-X) will also be available.
 
I looked into the kernel 3.7.0's AER driver's aerdrv_core.c and see that the 
PCI slot state was set to pci_channel_io_frozen on AER_FATAL serverity, and 
only set back to pci_channel_io_normal in report_resume() function. The PCI 
slot state was not set to pci_channel_io_normal when invoking 
report_slot_reset().
 
As a comparison, I also perform the EEH error recovery with the LLD driver on a 
PPC platform, which indeed set the PCI slot state to pci_channel_io_normal 
when calling LLD's slot_reset() callback function.
 
Can you please verify this platform AER driver's behavior is intended and will 
stay with the AER platform support, or it should be changed to be consistent 
with the PCI error recovery procedure described in the pci-error-recovery.txt 
documentation? I also noticed that before invoking the 
broadcast_error_message() function with slot_reset, there is a comment in the 
3.7.0 kernel source:
 
/*
 * TODO: Should call platform-specific
 * functions to reset slot before calling
 * drivers' slot_reset callbacks?
 */
And I do see that only the PCIe Link_Reset was performed, no PCI fundamental 
reset was performed even the LLD set the PCIe device's pdev-needs_freset = 
1, is this the area that later will be added for performing needed PCI hot 
reset or fundamental reset at this stage?
 
Your timely response is very appreciated. Thanks in advance and please let me 
know if you think I should redirect the question to someone else.
 
Best Regards,
Joseph Liu, Ph.D.
Senior Principal Engineer
Emulex Corporation


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] irq: add a new function irq_in_progress to check pending IRQ handlers

2013-06-06 Thread Yanmin Zhang
On Thu, 2013-06-06 at 15:18 +0200, Thomas Gleixner wrote:
 On Thu, 6 Jun 2013, shuox@intel.com wrote:
  From: Zhang Yanmin yanmin.zh...@intel.com
  
  synchronize_irq waits pending IRQ handlers to be finished. If using this
  function while holding a resource, the IRQ handler may cause deadlock.
  
  Here we add a new function irq_in_progress which doesn't wait for the 
  handlers
  to be finished.
  
  A typical use case at suspend-to-ram:
  
  device driver's irq handler is complicated and might hold a mutex at rare 
  cases.
  Its suspend function is called and a suspended flag is set.
  In case its IRQ handler is running, suspend function calls irq_in_progress. 
  if
  handler is running, abort suspend.
  The irq handler checks the suspended flag. If the device is suspended, irq 
  handler
  either ignores the interrupt, or wakes up the whole system, and the driver's
  resume function could deal with the delayed interrupt handling.
 
 This is as wrong as it can be. Fix the driver instead of hacking racy
 functions into the core code.
 
 So your problem looks like this:
 
 CPU 0 CPU 1
 irq_handler_thread()  suspend()
.  mutex_lock(m);
mutex_lock(m);synchronize_irq();
 
 So why needs the mutex to be taken before synchronize_irq()? Why not
 doing the obvious?
 
 suspend()
   disable_irq(); (Implies synchronize_irq)
   mutex_lock(m);
   
   mutex_unlock(m);
   enable_irq();
Thanks for the kind comment.

We do consider your solution before and it works well indeed with some specific
simple drivers. For example, some drives use GPIO pin as interrupt source.

On one specific platform, disable_irq would really disable the irq at RTE entry,
which means we lose the wakeup capability of this device.
synchronize_irq can be another solution. But we did hit 'DPM device timeout' 
issue
reported by dpm_wd_handler at suspend-to-ram.

The driver is complicated. We couldn't change too many functions to optimize it.
In addition, we have to use the driver instead of throwing it away.

With irq_in_progress, we can resolve this issue and it does work, although it
looks like ugly.

Yanmin


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] irq: add a new function irq_in_progress to check pending IRQ handlers

2013-06-06 Thread Yanmin Zhang
On Thu, 2013-06-06 at 18:02 -0700, Greg KH wrote:
 On Fri, Jun 07, 2013 at 08:53:29AM +0800, Yanmin Zhang wrote:
  On Thu, 2013-06-06 at 15:18 +0200, Thomas Gleixner wrote:
   On Thu, 6 Jun 2013, shuox@intel.com wrote:
From: Zhang Yanmin yanmin.zh...@intel.com

synchronize_irq waits pending IRQ handlers to be finished. If using this
function while holding a resource, the IRQ handler may cause deadlock.

Here we add a new function irq_in_progress which doesn't wait for the 
handlers
to be finished.

A typical use case at suspend-to-ram:

device driver's irq handler is complicated and might hold a mutex at 
rare cases.
Its suspend function is called and a suspended flag is set.
In case its IRQ handler is running, suspend function calls 
irq_in_progress. if
handler is running, abort suspend.
The irq handler checks the suspended flag. If the device is suspended, 
irq handler
either ignores the interrupt, or wakes up the whole system, and the 
driver's
resume function could deal with the delayed interrupt handling.
   
   This is as wrong as it can be. Fix the driver instead of hacking racy
   functions into the core code.
   
   So your problem looks like this:
   
   CPU 0 CPU 1
   irq_handler_thread()  suspend()
  .  mutex_lock(m);
  mutex_lock(m);synchronize_irq();
   
   So why needs the mutex to be taken before synchronize_irq()? Why not
   doing the obvious?
   
   suspend()
 disable_irq(); (Implies synchronize_irq)
 mutex_lock(m);
 
 mutex_unlock(m);
 enable_irq();
  Thanks for the kind comment.
  
  We do consider your solution before and it works well indeed with some 
  specific
  simple drivers. For example, some drives use GPIO pin as interrupt source.
  
  On one specific platform, disable_irq would really disable the irq at RTE 
  entry,
  which means we lose the wakeup capability of this device.
  synchronize_irq can be another solution. But we did hit 'DPM device 
  timeout' issue
  reported by dpm_wd_handler at suspend-to-ram.
  
  The driver is complicated. We couldn't change too many functions to 
  optimize it.
  In addition, we have to use the driver instead of throwing it away.
 
 What is preventing you from rewriting it to work properly?
It's complicated.

 
  With irq_in_progress, we can resolve this issue and it does work, although 
  it
  looks like ugly.
 
 Don't paper over driver bugs in the core kernel, fix the driver.
It's hard to say it's a driver bug. Could generic kernel provide some 
flexibility
for such complicated drivers?

For example, any driver's suspend can return error and the whole suspend-to-ram
aborts. Can we say the driver has a bug? If so, why not to change all 
suspend/resume
callbacks to return VOID?
Current kernel already allows drivers to abort suspend-to-ram at some rare 
corner cases.
Of course, if the abort happens frequently, it's a bug and we need fix it in 
driver.
If it happens rarely, can we provide some flexibility for the driver?

Thanks,
Yanmin


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] irq: add a new function irq_in_progress to check pending IRQ handlers

2013-06-06 Thread Yanmin Zhang
On Thu, 2013-06-06 at 20:08 -0700, Greg KH wrote:
 On Fri, Jun 07, 2013 at 10:37:52AM +0800, Yanmin Zhang wrote:
  On Thu, 2013-06-06 at 18:02 -0700, Greg KH wrote:
   On Fri, Jun 07, 2013 at 08:53:29AM +0800, Yanmin Zhang wrote:
On Thu, 2013-06-06 at 15:18 +0200, Thomas Gleixner wrote:
 On Thu, 6 Jun 2013, shuox@intel.com wrote:
  From: Zhang Yanmin yanmin.zh...@intel.com
  
  synchronize_irq waits pending IRQ handlers to be finished. If using 
  this
  function while holding a resource, the IRQ handler may cause 
  deadlock.
  
  Here we add a new function irq_in_progress which doesn't wait for 
  the handlers
  to be finished.
  
  A typical use case at suspend-to-ram:
  
  device driver's irq handler is complicated and might hold a mutex 
  at rare cases.
  Its suspend function is called and a suspended flag is set.
  In case its IRQ handler is running, suspend function calls 
  irq_in_progress. if
  handler is running, abort suspend.
  The irq handler checks the suspended flag. If the device is 
  suspended, irq handler
  either ignores the interrupt, or wakes up the whole system, and the 
  driver's
  resume function could deal with the delayed interrupt handling.
 
 This is as wrong as it can be. Fix the driver instead of hacking racy
 functions into the core code.
 
 So your problem looks like this:
 
 CPU 0 CPU 1
 irq_handler_thread()  suspend()
.  mutex_lock(m);
mutex_lock(m);synchronize_irq();
 
 So why needs the mutex to be taken before synchronize_irq()? Why not
 doing the obvious?
 
 suspend()
   disable_irq(); (Implies synchronize_irq)
   mutex_lock(m);
   
   mutex_unlock(m);
   enable_irq();
Thanks for the kind comment.

We do consider your solution before and it works well indeed with some 
specific
simple drivers. For example, some drives use GPIO pin as interrupt 
source.

On one specific platform, disable_irq would really disable the irq at 
RTE entry,
which means we lose the wakeup capability of this device.
synchronize_irq can be another solution. But we did hit 'DPM device 
timeout' issue
reported by dpm_wd_handler at suspend-to-ram.

The driver is complicated. We couldn't change too many functions to 
optimize it.
In addition, we have to use the driver instead of throwing it away.
   
   What is preventing you from rewriting it to work properly?
  It's complicated.
 
 That sounds like your issue, not ours, so please don't push the problem
 onto someone else.  Take ownership of the driver, fix it up, and all
 will be good.
 
 
With irq_in_progress, we can resolve this issue and it does work, 
although it
looks like ugly.
   
   Don't paper over driver bugs in the core kernel, fix the driver.
  It's hard to say it's a driver bug. Could generic kernel provide some
  flexibility for such complicated drivers?
 
 Please post the code for the driver, and then we will be glad to
 continue the dicussion.
Greg,

Thanks for your enthusiasm. It's a private driver for products.

Let's stop here and I wouldn't push the patch to upstream.

Thanks all.
Yanmin

 
 As for complicated, just make it uncomplicated, it's just code :)


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] irq: add a new function irq_in_progress to check pending IRQ handlers

2013-06-06 Thread Yanmin Zhang
On Thu, 2013-06-06 at 21:19 -0700, Greg KH wrote:
 On Fri, Jun 07, 2013 at 11:18:06AM +0800, Yanmin Zhang wrote:
  On Thu, 2013-06-06 at 20:08 -0700, Greg KH wrote:
   On Fri, Jun 07, 2013 at 10:37:52AM +0800, Yanmin Zhang wrote:
On Thu, 2013-06-06 at 18:02 -0700, Greg KH wrote:
 On Fri, Jun 07, 2013 at 08:53:29AM +0800, Yanmin Zhang wrote:
  On Thu, 2013-06-06 at 15:18 +0200, Thomas Gleixner wrote:
   On Thu, 6 Jun 2013, shuox@intel.com wrote:
From: Zhang Yanmin yanmin.zh...@intel.com

synchronize_irq waits pending IRQ handlers to be finished. If 
using this
function while holding a resource, the IRQ handler may cause 
deadlock.

Here we add a new function irq_in_progress which doesn't wait 
for the handlers
to be finished.

A typical use case at suspend-to-ram:

device driver's irq handler is complicated and might hold a 
mutex at rare cases.
Its suspend function is called and a suspended flag is set.
In case its IRQ handler is running, suspend function calls 
irq_in_progress. if
handler is running, abort suspend.
The irq handler checks the suspended flag. If the device is 
suspended, irq handler
either ignores the interrupt, or wakes up the whole system, and 
the driver's
resume function could deal with the delayed interrupt handling.
   
   This is as wrong as it can be. Fix the driver instead of hacking 
   racy
   functions into the core code.
   
   So your problem looks like this:
   
   CPU 0 CPU 1
   irq_handler_thread()  suspend()
  .  mutex_lock(m);
  mutex_lock(m);synchronize_irq();
   
   So why needs the mutex to be taken before synchronize_irq()? Why 
   not
   doing the obvious?
   
   suspend()
 disable_irq(); (Implies synchronize_irq)
 mutex_lock(m);
 
 mutex_unlock(m);
 enable_irq();
  Thanks for the kind comment.
  
  We do consider your solution before and it works well indeed with 
  some specific
  simple drivers. For example, some drives use GPIO pin as interrupt 
  source.
  
  On one specific platform, disable_irq would really disable the irq 
  at RTE entry,
  which means we lose the wakeup capability of this device.
  synchronize_irq can be another solution. But we did hit 'DPM device 
  timeout' issue
  reported by dpm_wd_handler at suspend-to-ram.
  
  The driver is complicated. We couldn't change too many functions to 
  optimize it.
  In addition, we have to use the driver instead of throwing it away.
 
 What is preventing you from rewriting it to work properly?
It's complicated.
   
   That sounds like your issue, not ours, so please don't push the problem
   onto someone else.  Take ownership of the driver, fix it up, and all
   will be good.
   
   
  With irq_in_progress, we can resolve this issue and it does work, 
  although it
  looks like ugly.
 
 Don't paper over driver bugs in the core kernel, fix the driver.
It's hard to say it's a driver bug. Could generic kernel provide some
flexibility for such complicated drivers?
   
   Please post the code for the driver, and then we will be glad to
   continue the dicussion.
  Greg,
  
  Thanks for your enthusiasm. It's a private driver for products.
 
 What do you mean private driver?  All drivers need to be merged into
 the mainline kernel, it saves you time and money, and we will fix your
 bugs for you.
 
 You know that, your bosses know that, your management knows that, so why
 are you trying to hide things?
 
 totally confused,
They are embedded device drivers, highly depending on specific devices which 
runs
its own firmware in devices. Here the kernel drivers run at AP side.

One example is Graphics driver, which is very big and coding is not friendly. 
Kernel
experts can raise tons of questions against the driver, but we have to make the 
driver
work well on real products.

Another reason is drivers need workaround many hardware issues. That makes it
hard to implement kernel drivers in good shape sometimes.

We need support all cases.

We fixed lots of hard issues on embedded products and think if kernel could be 
more
flexible to support complicated cases.

Anyway, thanks.

Yanmin


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Subject : [ PATCH ] pci-reset-error_state-to-pci_channel_io_normal-at-report_slot_reset

2013-06-04 Thread Yanmin Zhang
On Tue, 2013-06-04 at 12:04 -0600, Bjorn Helgaas wrote:
> I'm not sure where we are with this patch.  I think Joseph initially
> reported a problem (though I haven't actually seen that), and this
> patch fixed it, so it seems like there's something we want to do here.
Yes, indeed. We checked Powerpc error handling and plan to use the
similar method to deal with the issue.

Sorry for replying very late. I move to Android smartphone area and am
very busy on many top issues.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Subject : [ PATCH ] pci-reset-error_state-to-pci_channel_io_normal-at-report_slot_reset

2013-06-04 Thread Yanmin Zhang
On Tue, 2013-06-04 at 12:04 -0600, Bjorn Helgaas wrote:
 I'm not sure where we are with this patch.  I think Joseph initially
 reported a problem (though I haven't actually seen that), and this
 patch fixed it, so it seems like there's something we want to do here.
Yes, indeed. We checked Powerpc error handling and plan to use the
similar method to deal with the issue.

Sorry for replying very late. I move to Android smartphone area and am
very busy on many top issues.


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Subject : [ PATCH ] pci-reset-error_state-to-pci_channel_io_normal-at-report_slot_reset

2013-05-21 Thread Yanmin Zhang
On Mon, 2013-05-20 at 10:37 -0500, Linas Vepstas wrote:
> I think Joe Liu has it right.  I'm going to top-post because things
> are a bit tangled below.  I urge a review of
> /Documentation/PCI/pci-error-recovery.txt, as that gives the details.
> 
> The intended sequence is that, after an error, the device driver gets
> a shot at running some diagnostics & dumps, and then the pci
> bridges/controllers/ports/links are reset (by platform code, viz. aer
> in this case) to a state resembling a fresh power-on.  Then the
> .slot_reset() callback is called on the device driver, to tell the
> driver "hey everything upstream is now working, go set yourself up for
> normal ops."  Thus, in particular, one should have pdev->error_state =
> pci_channel_io_normal; before .slot_reset() is called, and the PCI
> config space contents should resemble a fresh power-on state (and
> **NOT** some other saved state!)
> 
> If the device driver wanted to leave the card in a dead state, it had
> several opportunities to say so, earlier in the callback sequence.  If
> the driver wanted to fiddle with the card with the old PCI config
> space, it already had a chance to do that too, before the
> bridges/controllers/ports/links were fully reset by the platform.   By
> the time that .slot_reset() is being called, both the platform and the
> device driver are expecting smooth sailing ahead.
Yes. It's flexible for drivers to do that in many callbacks. AER framework
provides such flexibility.

> 
> So, looking at the original patch, it seems reasonable. 
I agree.

>  My impression
> is that maybe the AER driver had been doing not quite the right thing
> for a long time.
Pls. provide evidence/facts. The new patch is to facilitate device driver
implementation. It doesn't mean current AER driver is incorrect. We need
a tradeoff.

Just like what Bjoin says, we shouldn't change error_state to 
pci_channel_io_normal
before we really recover the hardware. The patch changes it just because
drivers might call some functions to recover the devices, while such functions
need (error_state==pci_channel_io_normal).

> 
> -- Linas Vepstas


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Subject : [ PATCH ] pci-reset-error_state-to-pci_channel_io_normal-at-report_slot_reset

2013-05-21 Thread Yanmin Zhang
On Mon, 2013-05-20 at 16:48 -0600, Bjorn Helgaas wrote:
> On Fri, Apr 26, 2013 at 06:28:59AM +, Zhang, LongX wrote:
> > From: Zhang Long 
> > 
> > Specific pci device drivers might have many functions to call
> > pci_channel_offline to check device states. When slot_reset happens,
> > drivers' slot_reset callback might call such functions and eventually
> > abort the reset.
> > 
> > The patch resets pdev->error_state to pci_channel_io_normal at
> > the begining of report_slot_reset.
> > 
> > Thank Liu Joseph for pointing it out.
> > 
> > Signed-off-by: Zhang Yanmin 
> > Signed-off-by: Zhang Long 
> > ---
> >  drivers/pci/pcie/aer/aerdrv_core.c |1 +
> >  drivers/pci/pcie/portdrv_pci.c |   12 +---
> >  2 files changed, 6 insertions(+), 7 deletions(-)
Thank all for the kind comments.

> > 
> > diff --git a/drivers/pci/pcie/aer/aerdrv_core.c 
> > b/drivers/pci/pcie/aer/aerdrv_core.c
> > index 564d97f..c61fd44 100644
> > --- a/drivers/pci/pcie/aer/aerdrv_core.c
> > +++ b/drivers/pci/pcie/aer/aerdrv_core.c
> > @@ -286,6 +286,7 @@ static int report_slot_reset(struct pci_dev *dev, void 
> > *data)
> > result_data = (struct aer_broadcast_data *) data;
> >  
> > device_lock(>dev);
> > +   dev->error_state = pci_channel_io_normal;
> > if (!dev->driver ||
> > !dev->driver->err_handler ||
> > !dev->driver->err_handler->slot_reset)
> > diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
> > index ed4d094..7abefd9 100644
> > --- a/drivers/pci/pcie/portdrv_pci.c
> > +++ b/drivers/pci/pcie/portdrv_pci.c
> > @@ -332,13 +332,11 @@ static pci_ers_result_t 
> > pcie_portdrv_slot_reset(struct pci_dev *dev)
> > pci_ers_result_t status = PCI_ERS_RESULT_RECOVERED;
> > int retval;
> >  
> > -   /* If fatal, restore cfg space for possible link reset at upstream */
> > -   if (dev->error_state == pci_channel_io_frozen) {
> > -   dev->state_saved = true;
> > -   pci_restore_state(dev);
> > -   pcie_portdrv_restore_config(dev);
> > -   pci_enable_pcie_error_reporting(dev);
> > -   }
> > +   /* restore cfg space for possible link reset at upstream */
> > +   dev->state_saved = true;
It's indeed a dirty trick to change it to true. The reason is suspend. The port 
would
suspend/resume at suspend-to-ram. When the port is suspended, PCI power 
framework calls
pci_save_state. When the port is resumed, PCI framework calls pci_restore_state 
and
dev->state_saved is set to false.
A better solution is to add double space in 
pci_dev->saved_config_space/saved_cap_space,
which seems consume unnecessary resource.

> > +   pci_restore_state(dev);
> > +   pcie_portdrv_restore_config(dev);
> > +   pci_enable_pcie_error_reporting(dev);
> >  
> > /* get true return value from  */
> > retval = device_for_each_child(>dev, , slot_reset_iter);
> I think this patch changes the behavior in the case of a non-fatal error
> where one of the .error_detected() methods returned
> PCI_ERS_RESULT_NEED_RESET.  In that case, pcie_portdrv_slot_reset()
> previously did not restore config space, but after your patch, it *will*
> restore it.  We need an explanation of why this is safe.
AER standard doesn't define such behavior like if we need reset a slot.
When we implemented the feature in kernel, we assumed at non-fatal error,
config space is still good.

With the new patch, we change the behavior a little.

> 
> I think you should split this into two patches: the first would remove the
> "if (dev->error_state == pci_channel_io_frozen)" test from portdrv_pci.c
The first patch would depend on the 2nd patch as 2nd patch resets error_state 
to pci_channel_io_normal.

> and explain the reason, and the second would make the aerdrv_core.c change.
> 
> I'm also concerned that in that same case (a non-fatal error where one of
> the .error_detected() methods returned PCI_ERS_RESULT_NEED_RESET), I don't
> think we actually *do* any kind of device reset.  This isn't related to
> your patch, of course, so if you resolve the config space restore question,
> we can deal with the reset question later.
It's a good question. At the beginning when we enabled AER feature in kernel,
we didn't really implement the real device reset. It's in the TODO list.

Yanmin


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Subject : [ PATCH ] pci-reset-error_state-to-pci_channel_io_normal-at-report_slot_reset

2013-05-21 Thread Yanmin Zhang
On Mon, 2013-05-20 at 16:48 -0600, Bjorn Helgaas wrote:
 On Fri, Apr 26, 2013 at 06:28:59AM +, Zhang, LongX wrote:
  From: Zhang Long longx.zh...@intel.com
  
  Specific pci device drivers might have many functions to call
  pci_channel_offline to check device states. When slot_reset happens,
  drivers' slot_reset callback might call such functions and eventually
  abort the reset.
  
  The patch resets pdev-error_state to pci_channel_io_normal at
  the begining of report_slot_reset.
  
  Thank Liu Joseph for pointing it out.
  
  Signed-off-by: Zhang Yanmin yanmin_zh...@linux.intel.com
  Signed-off-by: Zhang Long longx.zh...@intel.com
  ---
   drivers/pci/pcie/aer/aerdrv_core.c |1 +
   drivers/pci/pcie/portdrv_pci.c |   12 +---
   2 files changed, 6 insertions(+), 7 deletions(-)
Thank all for the kind comments.

  
  diff --git a/drivers/pci/pcie/aer/aerdrv_core.c 
  b/drivers/pci/pcie/aer/aerdrv_core.c
  index 564d97f..c61fd44 100644
  --- a/drivers/pci/pcie/aer/aerdrv_core.c
  +++ b/drivers/pci/pcie/aer/aerdrv_core.c
  @@ -286,6 +286,7 @@ static int report_slot_reset(struct pci_dev *dev, void 
  *data)
  result_data = (struct aer_broadcast_data *) data;
   
  device_lock(dev-dev);
  +   dev-error_state = pci_channel_io_normal;
  if (!dev-driver ||
  !dev-driver-err_handler ||
  !dev-driver-err_handler-slot_reset)
  diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
  index ed4d094..7abefd9 100644
  --- a/drivers/pci/pcie/portdrv_pci.c
  +++ b/drivers/pci/pcie/portdrv_pci.c
  @@ -332,13 +332,11 @@ static pci_ers_result_t 
  pcie_portdrv_slot_reset(struct pci_dev *dev)
  pci_ers_result_t status = PCI_ERS_RESULT_RECOVERED;
  int retval;
   
  -   /* If fatal, restore cfg space for possible link reset at upstream */
  -   if (dev-error_state == pci_channel_io_frozen) {
  -   dev-state_saved = true;
  -   pci_restore_state(dev);
  -   pcie_portdrv_restore_config(dev);
  -   pci_enable_pcie_error_reporting(dev);
  -   }
  +   /* restore cfg space for possible link reset at upstream */
  +   dev-state_saved = true;
It's indeed a dirty trick to change it to true. The reason is suspend. The port 
would
suspend/resume at suspend-to-ram. When the port is suspended, PCI power 
framework calls
pci_save_state. When the port is resumed, PCI framework calls pci_restore_state 
and
dev-state_saved is set to false.
A better solution is to add double space in 
pci_dev-saved_config_space/saved_cap_space,
which seems consume unnecessary resource.

  +   pci_restore_state(dev);
  +   pcie_portdrv_restore_config(dev);
  +   pci_enable_pcie_error_reporting(dev);
   
  /* get true return value from status */
  retval = device_for_each_child(dev-dev, status, slot_reset_iter);
 I think this patch changes the behavior in the case of a non-fatal error
 where one of the .error_detected() methods returned
 PCI_ERS_RESULT_NEED_RESET.  In that case, pcie_portdrv_slot_reset()
 previously did not restore config space, but after your patch, it *will*
 restore it.  We need an explanation of why this is safe.
AER standard doesn't define such behavior like if we need reset a slot.
When we implemented the feature in kernel, we assumed at non-fatal error,
config space is still good.

With the new patch, we change the behavior a little.

 
 I think you should split this into two patches: the first would remove the
 if (dev-error_state == pci_channel_io_frozen) test from portdrv_pci.c
The first patch would depend on the 2nd patch as 2nd patch resets error_state 
to pci_channel_io_normal.

 and explain the reason, and the second would make the aerdrv_core.c change.
 
 I'm also concerned that in that same case (a non-fatal error where one of
 the .error_detected() methods returned PCI_ERS_RESULT_NEED_RESET), I don't
 think we actually *do* any kind of device reset.  This isn't related to
 your patch, of course, so if you resolve the config space restore question,
 we can deal with the reset question later.
It's a good question. At the beginning when we enabled AER feature in kernel,
we didn't really implement the real device reset. It's in the TODO list.

Yanmin


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Subject : [ PATCH ] pci-reset-error_state-to-pci_channel_io_normal-at-report_slot_reset

2013-05-21 Thread Yanmin Zhang
On Mon, 2013-05-20 at 10:37 -0500, Linas Vepstas wrote:
 I think Joe Liu has it right.  I'm going to top-post because things
 are a bit tangled below.  I urge a review of
 /Documentation/PCI/pci-error-recovery.txt, as that gives the details.
 
 The intended sequence is that, after an error, the device driver gets
 a shot at running some diagnostics  dumps, and then the pci
 bridges/controllers/ports/links are reset (by platform code, viz. aer
 in this case) to a state resembling a fresh power-on.  Then the
 .slot_reset() callback is called on the device driver, to tell the
 driver hey everything upstream is now working, go set yourself up for
 normal ops.  Thus, in particular, one should have pdev-error_state =
 pci_channel_io_normal; before .slot_reset() is called, and the PCI
 config space contents should resemble a fresh power-on state (and
 **NOT** some other saved state!)
 
 If the device driver wanted to leave the card in a dead state, it had
 several opportunities to say so, earlier in the callback sequence.  If
 the driver wanted to fiddle with the card with the old PCI config
 space, it already had a chance to do that too, before the
 bridges/controllers/ports/links were fully reset by the platform.   By
 the time that .slot_reset() is being called, both the platform and the
 device driver are expecting smooth sailing ahead.
Yes. It's flexible for drivers to do that in many callbacks. AER framework
provides such flexibility.

 
 So, looking at the original patch, it seems reasonable. 
I agree.

  My impression
 is that maybe the AER driver had been doing not quite the right thing
 for a long time.
Pls. provide evidence/facts. The new patch is to facilitate device driver
implementation. It doesn't mean current AER driver is incorrect. We need
a tradeoff.

Just like what Bjoin says, we shouldn't change error_state to 
pci_channel_io_normal
before we really recover the hardware. The patch changes it just because
drivers might call some functions to recover the devices, while such functions
need (error_state==pci_channel_io_normal).

 
 -- Linas Vepstas


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Subject : [ PATCH ] pci-reset-error_state-to-pci_channel_io_normal-at-report_slot_reset

2013-05-06 Thread Yanmin Zhang
On Fri, 2013-05-03 at 13:00 -0500, Linas Vepstas wrote:
> Greg,
> 
> I've been receiving (and reading!) these messages; I replied that I am
> not maintaining a tree, and have no way of testing these patches (no
> access to hardware.)  I believe it unlikely that my situation will
> change, and so I should probably be removed from the maintainers file.
> 
> No acked-by or anything; the patches are not "obviously correct" by
> visual inspection; they may be right, but would require deeper
> thinking about what is actually happening than I'm capable of at this
> time; I'm a bit rusty with this code base, and might miss something
> important
powerpc uses the similar method. See function eeh_report_reset.
We worked out the patch after checking powerpc codes.

Joseph Liu helped test the patch and the patch does work well.

> .
> 
> -- Linas
> 
> p.s. you didn't see my earlier reply because I forgot to hit 'plain text 
> reply'
> 
> 
> On 2 May 2013 22:13, Yanmin Zhang  wrote:
> >
> > On Thu, 2013-05-02 at 19:00 -0700, Greg Kroah-Hartman wrote:
> > > On Fri, May 03, 2013 at 08:33:00AM +0800, Yanmin Zhang wrote:
> > > > On Wed, 2013-05-01 at 20:20 -0500, Linas Vepstas wrote:
> > > > > Hi,
> > > > >
> > > > > On 1 May 2013 19:30, Yanmin Zhang 
> > > > > wrote:
> > > > > On Fri, 2013-04-26 at 06:28 +, Zhang, LongX wrote:
> > > > > > From: Zhang Long 
> > > > > >
> > > > > > Specific pci device drivers might have many functions to
> > > > > call
> > > > > > pci_channel_offline to check device states. When slot_reset
> > > > > happens,
> > > > > > drivers' slot_reset callback might call such functions and
> > > > > eventually
> > > > > > abort the reset.
> > > > > >
> > > > > > The patch resets pdev->error_state to pci_channel_io_normal
> > > > > at
> > > > > > the begining of report_slot_reset.
> > > > > >
> > > > > > Thank Liu Joseph for pointing it out.
> > > > >
> > > > > Linas, Bjorn,
> > > > >
> > > > > Would you like to merge the patch to your testing tree?
> > > > > The patch resolves one issue pointed out by Joseph.
> > > > >
> > > > >
> > > > > I'm not maintaining a tree at this time, and am not able to test.
> > > > > Sorry.
> > > > Thanks Linas.
> > > >
> > > > Greg,
> > > >
> > > > Would you like to merge it into your testing tree? Joseph Liu tested
> > > > the patch and it does work.
> > >
> > > You do know about the scripts/get_maintainer.pl script, right?
> > >
> > > Hint, try it out :)
> > Greg,
> >
> > Thanks. We did send to the right people, Linas and Bjorn.
> >
> > scripts/get_maintainer.pl 
> > 0001-pci-reset-error_state-to-pci_channel_io_normal-at-re.patch
> > Bjorn Helgaas  (supporter:PCI 
> > SUBSYSTEM,commit_signer:7/8=88%,commit_signer:10/11=91%)
> > Linas Vepstas  (commit_signer:2/8=25%)
> > Yijing Wang  
> > (commit_signer:2/8=25%,commit_signer:2/11=18%)
> > Jiang Liu  
> > (commit_signer:2/8=25%,commit_signer:2/11=18%)
> > Stephen Hemminger  (commit_signer:1/8=12%)
> > "Rafael J. Wysocki"  (commit_signer:6/11=55%)
> > Huang Ying  (commit_signer:5/11=45%)
> > linux-...@vger.kernel.org (open list:PCI SUBSYSTEM)
> > linux-kernel@vger.kernel.org (open list)
> >
> >
> > I remember Jesse was maintaining PCI subsystem and he responded quickly.
> >
> > Yanmin
> >
> >


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Subject : [ PATCH ] pci-reset-error_state-to-pci_channel_io_normal-at-report_slot_reset

2013-05-06 Thread Yanmin Zhang
On Fri, 2013-05-03 at 13:00 -0500, Linas Vepstas wrote:
 Greg,
 
 I've been receiving (and reading!) these messages; I replied that I am
 not maintaining a tree, and have no way of testing these patches (no
 access to hardware.)  I believe it unlikely that my situation will
 change, and so I should probably be removed from the maintainers file.
 
 No acked-by or anything; the patches are not obviously correct by
 visual inspection; they may be right, but would require deeper
 thinking about what is actually happening than I'm capable of at this
 time; I'm a bit rusty with this code base, and might miss something
 important
powerpc uses the similar method. See function eeh_report_reset.
We worked out the patch after checking powerpc codes.

Joseph Liu helped test the patch and the patch does work well.

 .
 
 -- Linas
 
 p.s. you didn't see my earlier reply because I forgot to hit 'plain text 
 reply'
 
 
 On 2 May 2013 22:13, Yanmin Zhang yanmin_zh...@linux.intel.com wrote:
 
  On Thu, 2013-05-02 at 19:00 -0700, Greg Kroah-Hartman wrote:
   On Fri, May 03, 2013 at 08:33:00AM +0800, Yanmin Zhang wrote:
On Wed, 2013-05-01 at 20:20 -0500, Linas Vepstas wrote:
 Hi,

 On 1 May 2013 19:30, Yanmin Zhang yanmin_zh...@linux.intel.com
 wrote:
 On Fri, 2013-04-26 at 06:28 +, Zhang, LongX wrote:
  From: Zhang Long longx.zh...@intel.com
 
  Specific pci device drivers might have many functions to
 call
  pci_channel_offline to check device states. When slot_reset
 happens,
  drivers' slot_reset callback might call such functions and
 eventually
  abort the reset.
 
  The patch resets pdev-error_state to pci_channel_io_normal
 at
  the begining of report_slot_reset.
 
  Thank Liu Joseph for pointing it out.

 Linas, Bjorn,

 Would you like to merge the patch to your testing tree?
 The patch resolves one issue pointed out by Joseph.


 I'm not maintaining a tree at this time, and am not able to test.
 Sorry.
Thanks Linas.
   
Greg,
   
Would you like to merge it into your testing tree? Joseph Liu tested
the patch and it does work.
  
   You do know about the scripts/get_maintainer.pl script, right?
  
   Hint, try it out :)
  Greg,
 
  Thanks. We did send to the right people, Linas and Bjorn.
 
  scripts/get_maintainer.pl 
  0001-pci-reset-error_state-to-pci_channel_io_normal-at-re.patch
  Bjorn Helgaas bhelg...@google.com (supporter:PCI 
  SUBSYSTEM,commit_signer:7/8=88%,commit_signer:10/11=91%)
  Linas Vepstas linasveps...@gmail.com (commit_signer:2/8=25%)
  Yijing Wang wangyij...@huawei.com 
  (commit_signer:2/8=25%,commit_signer:2/11=18%)
  Jiang Liu jiang@huawei.com 
  (commit_signer:2/8=25%,commit_signer:2/11=18%)
  Stephen Hemminger shemmin...@vyatta.com (commit_signer:1/8=12%)
  Rafael J. Wysocki rafael.j.wyso...@intel.com (commit_signer:6/11=55%)
  Huang Ying ying.hu...@intel.com (commit_signer:5/11=45%)
  linux-...@vger.kernel.org (open list:PCI SUBSYSTEM)
  linux-kernel@vger.kernel.org (open list)
 
 
  I remember Jesse was maintaining PCI subsystem and he responded quickly.
 
  Yanmin
 
 


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Subject : [ PATCH ] pci-reset-error_state-to-pci_channel_io_normal-at-report_slot_reset

2013-05-02 Thread Yanmin Zhang
On Thu, 2013-05-02 at 19:00 -0700, Greg Kroah-Hartman wrote:
> On Fri, May 03, 2013 at 08:33:00AM +0800, Yanmin Zhang wrote:
> > On Wed, 2013-05-01 at 20:20 -0500, Linas Vepstas wrote:
> > > Hi,
> > > 
> > > On 1 May 2013 19:30, Yanmin Zhang 
> > > wrote:
> > > On Fri, 2013-04-26 at 06:28 +, Zhang, LongX wrote:
> > > > From: Zhang Long 
> > > >
> > > > Specific pci device drivers might have many functions to
> > > call
> > > > pci_channel_offline to check device states. When slot_reset
> > > happens,
> > > > drivers' slot_reset callback might call such functions and
> > > eventually
> > > > abort the reset.
> > > >
> > > > The patch resets pdev->error_state to pci_channel_io_normal
> > > at
> > > > the begining of report_slot_reset.
> > > >
> > > > Thank Liu Joseph for pointing it out.
> > > 
> > > Linas, Bjorn,
> > > 
> > > Would you like to merge the patch to your testing tree?
> > > The patch resolves one issue pointed out by Joseph.
> > > 
> > > 
> > > I'm not maintaining a tree at this time, and am not able to test.
> > > Sorry.
> > Thanks Linas.
> > 
> > Greg,
> > 
> > Would you like to merge it into your testing tree? Joseph Liu tested
> > the patch and it does work.
> 
> You do know about the scripts/get_maintainer.pl script, right?
> 
> Hint, try it out :)
Greg,

Thanks. We did send to the right people, Linas and Bjorn.

scripts/get_maintainer.pl 
0001-pci-reset-error_state-to-pci_channel_io_normal-at-re.patch
Bjorn Helgaas  (supporter:PCI 
SUBSYSTEM,commit_signer:7/8=88%,commit_signer:10/11=91%)
Linas Vepstas  (commit_signer:2/8=25%)
Yijing Wang  
(commit_signer:2/8=25%,commit_signer:2/11=18%)
Jiang Liu  (commit_signer:2/8=25%,commit_signer:2/11=18%)
Stephen Hemminger  (commit_signer:1/8=12%)
"Rafael J. Wysocki"  (commit_signer:6/11=55%)
Huang Ying  (commit_signer:5/11=45%)
linux-...@vger.kernel.org (open list:PCI SUBSYSTEM)
linux-kernel@vger.kernel.org (open list)


I remember Jesse was maintaining PCI subsystem and he responded quickly.

Yanmin


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Subject : [ PATCH ] pci-reset-error_state-to-pci_channel_io_normal-at-report_slot_reset

2013-05-02 Thread Yanmin Zhang
On Wed, 2013-05-01 at 20:20 -0500, Linas Vepstas wrote:
> Hi,
> 
> On 1 May 2013 19:30, Yanmin Zhang 
> wrote:
> On Fri, 2013-04-26 at 06:28 +, Zhang, LongX wrote:
> > From: Zhang Long 
> >
> > Specific pci device drivers might have many functions to
> call
> > pci_channel_offline to check device states. When slot_reset
> happens,
> > drivers' slot_reset callback might call such functions and
> eventually
> > abort the reset.
> >
> > The patch resets pdev->error_state to pci_channel_io_normal
> at
> > the begining of report_slot_reset.
> >
> > Thank Liu Joseph for pointing it out.
> 
> Linas, Bjorn,
> 
> Would you like to merge the patch to your testing tree?
> The patch resolves one issue pointed out by Joseph.
> 
> 
> I'm not maintaining a tree at this time, and am not able to test.
> Sorry.
Thanks Linas.

Greg,

Would you like to merge it into your testing tree? Joseph Liu tested
the patch and it does work.

Thanks,
Yanmin



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Subject : [ PATCH ] pci-reset-error_state-to-pci_channel_io_normal-at-report_slot_reset

2013-05-02 Thread Yanmin Zhang
On Wed, 2013-05-01 at 20:20 -0500, Linas Vepstas wrote:
 Hi,
 
 On 1 May 2013 19:30, Yanmin Zhang yanmin_zh...@linux.intel.com
 wrote:
 On Fri, 2013-04-26 at 06:28 +, Zhang, LongX wrote:
  From: Zhang Long longx.zh...@intel.com
 
  Specific pci device drivers might have many functions to
 call
  pci_channel_offline to check device states. When slot_reset
 happens,
  drivers' slot_reset callback might call such functions and
 eventually
  abort the reset.
 
  The patch resets pdev-error_state to pci_channel_io_normal
 at
  the begining of report_slot_reset.
 
  Thank Liu Joseph for pointing it out.
 
 Linas, Bjorn,
 
 Would you like to merge the patch to your testing tree?
 The patch resolves one issue pointed out by Joseph.
 
 
 I'm not maintaining a tree at this time, and am not able to test.
 Sorry.
Thanks Linas.

Greg,

Would you like to merge it into your testing tree? Joseph Liu tested
the patch and it does work.

Thanks,
Yanmin



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Subject : [ PATCH ] pci-reset-error_state-to-pci_channel_io_normal-at-report_slot_reset

2013-05-02 Thread Yanmin Zhang
On Thu, 2013-05-02 at 19:00 -0700, Greg Kroah-Hartman wrote:
 On Fri, May 03, 2013 at 08:33:00AM +0800, Yanmin Zhang wrote:
  On Wed, 2013-05-01 at 20:20 -0500, Linas Vepstas wrote:
   Hi,
   
   On 1 May 2013 19:30, Yanmin Zhang yanmin_zh...@linux.intel.com
   wrote:
   On Fri, 2013-04-26 at 06:28 +, Zhang, LongX wrote:
From: Zhang Long longx.zh...@intel.com
   
Specific pci device drivers might have many functions to
   call
pci_channel_offline to check device states. When slot_reset
   happens,
drivers' slot_reset callback might call such functions and
   eventually
abort the reset.
   
The patch resets pdev-error_state to pci_channel_io_normal
   at
the begining of report_slot_reset.
   
Thank Liu Joseph for pointing it out.
   
   Linas, Bjorn,
   
   Would you like to merge the patch to your testing tree?
   The patch resolves one issue pointed out by Joseph.
   
   
   I'm not maintaining a tree at this time, and am not able to test.
   Sorry.
  Thanks Linas.
  
  Greg,
  
  Would you like to merge it into your testing tree? Joseph Liu tested
  the patch and it does work.
 
 You do know about the scripts/get_maintainer.pl script, right?
 
 Hint, try it out :)
Greg,

Thanks. We did send to the right people, Linas and Bjorn.

scripts/get_maintainer.pl 
0001-pci-reset-error_state-to-pci_channel_io_normal-at-re.patch
Bjorn Helgaas bhelg...@google.com (supporter:PCI 
SUBSYSTEM,commit_signer:7/8=88%,commit_signer:10/11=91%)
Linas Vepstas linasveps...@gmail.com (commit_signer:2/8=25%)
Yijing Wang wangyij...@huawei.com 
(commit_signer:2/8=25%,commit_signer:2/11=18%)
Jiang Liu jiang@huawei.com (commit_signer:2/8=25%,commit_signer:2/11=18%)
Stephen Hemminger shemmin...@vyatta.com (commit_signer:1/8=12%)
Rafael J. Wysocki rafael.j.wyso...@intel.com (commit_signer:6/11=55%)
Huang Ying ying.hu...@intel.com (commit_signer:5/11=45%)
linux-...@vger.kernel.org (open list:PCI SUBSYSTEM)
linux-kernel@vger.kernel.org (open list)


I remember Jesse was maintaining PCI subsystem and he responded quickly.

Yanmin


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Subject : [ PATCH ] pci-reset-error_state-to-pci_channel_io_normal-at-report_slot_reset

2013-05-01 Thread Yanmin Zhang
On Fri, 2013-04-26 at 06:28 +, Zhang, LongX wrote:
> From: Zhang Long 
> 
> Specific pci device drivers might have many functions to call
> pci_channel_offline to check device states. When slot_reset happens,
> drivers' slot_reset callback might call such functions and eventually
> abort the reset.
> 
> The patch resets pdev->error_state to pci_channel_io_normal at
> the begining of report_slot_reset.
> 
> Thank Liu Joseph for pointing it out.
Linas, Bjorn,

Would you like to merge the patch to your testing tree?
The patch resolves one issue pointed out by Joseph.

Thanks,
Yanmin


> 
> Signed-off-by: Zhang Yanmin 
> Signed-off-by: Zhang Long 
> ---
>  drivers/pci/pcie/aer/aerdrv_core.c |1 +
>  drivers/pci/pcie/portdrv_pci.c |   12 +---
>  2 files changed, 6 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/pci/pcie/aer/aerdrv_core.c 
> b/drivers/pci/pcie/aer/aerdrv_core.c
> index 564d97f..c61fd44 100644
> --- a/drivers/pci/pcie/aer/aerdrv_core.c
> +++ b/drivers/pci/pcie/aer/aerdrv_core.c
> @@ -286,6 +286,7 @@ static int report_slot_reset(struct pci_dev *dev, void 
> *data)
>   result_data = (struct aer_broadcast_data *) data;
>  
>   device_lock(>dev);
> + dev->error_state = pci_channel_io_normal;
>   if (!dev->driver ||
>   !dev->driver->err_handler ||
>   !dev->driver->err_handler->slot_reset)
> diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
> index ed4d094..7abefd9 100644
> --- a/drivers/pci/pcie/portdrv_pci.c
> +++ b/drivers/pci/pcie/portdrv_pci.c
> @@ -332,13 +332,11 @@ static pci_ers_result_t pcie_portdrv_slot_reset(struct 
> pci_dev *dev)
>   pci_ers_result_t status = PCI_ERS_RESULT_RECOVERED;
>   int retval;
>  
> - /* If fatal, restore cfg space for possible link reset at upstream */
> - if (dev->error_state == pci_channel_io_frozen) {
> - dev->state_saved = true;
> - pci_restore_state(dev);
> - pcie_portdrv_restore_config(dev);
> - pci_enable_pcie_error_reporting(dev);
> - }
> + /* restore cfg space for possible link reset at upstream */
> + dev->state_saved = true;
> + pci_restore_state(dev);
> + pcie_portdrv_restore_config(dev);
> + pci_enable_pcie_error_reporting(dev);
>  
>   /* get true return value from  */
>   retval = device_for_each_child(>dev, , slot_reset_iter);


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Subject : [ PATCH ] pci-reset-error_state-to-pci_channel_io_normal-at-report_slot_reset

2013-05-01 Thread Yanmin Zhang
On Fri, 2013-04-26 at 06:28 +, Zhang, LongX wrote:
 From: Zhang Long longx.zh...@intel.com
 
 Specific pci device drivers might have many functions to call
 pci_channel_offline to check device states. When slot_reset happens,
 drivers' slot_reset callback might call such functions and eventually
 abort the reset.
 
 The patch resets pdev-error_state to pci_channel_io_normal at
 the begining of report_slot_reset.
 
 Thank Liu Joseph for pointing it out.
Linas, Bjorn,

Would you like to merge the patch to your testing tree?
The patch resolves one issue pointed out by Joseph.

Thanks,
Yanmin


 
 Signed-off-by: Zhang Yanmin yanmin_zh...@linux.intel.com
 Signed-off-by: Zhang Long longx.zh...@intel.com
 ---
  drivers/pci/pcie/aer/aerdrv_core.c |1 +
  drivers/pci/pcie/portdrv_pci.c |   12 +---
  2 files changed, 6 insertions(+), 7 deletions(-)
 
 diff --git a/drivers/pci/pcie/aer/aerdrv_core.c 
 b/drivers/pci/pcie/aer/aerdrv_core.c
 index 564d97f..c61fd44 100644
 --- a/drivers/pci/pcie/aer/aerdrv_core.c
 +++ b/drivers/pci/pcie/aer/aerdrv_core.c
 @@ -286,6 +286,7 @@ static int report_slot_reset(struct pci_dev *dev, void 
 *data)
   result_data = (struct aer_broadcast_data *) data;
  
   device_lock(dev-dev);
 + dev-error_state = pci_channel_io_normal;
   if (!dev-driver ||
   !dev-driver-err_handler ||
   !dev-driver-err_handler-slot_reset)
 diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
 index ed4d094..7abefd9 100644
 --- a/drivers/pci/pcie/portdrv_pci.c
 +++ b/drivers/pci/pcie/portdrv_pci.c
 @@ -332,13 +332,11 @@ static pci_ers_result_t pcie_portdrv_slot_reset(struct 
 pci_dev *dev)
   pci_ers_result_t status = PCI_ERS_RESULT_RECOVERED;
   int retval;
  
 - /* If fatal, restore cfg space for possible link reset at upstream */
 - if (dev-error_state == pci_channel_io_frozen) {
 - dev-state_saved = true;
 - pci_restore_state(dev);
 - pcie_portdrv_restore_config(dev);
 - pci_enable_pcie_error_reporting(dev);
 - }
 + /* restore cfg space for possible link reset at upstream */
 + dev-state_saved = true;
 + pci_restore_state(dev);
 + pcie_portdrv_restore_config(dev);
 + pci_enable_pcie_error_reporting(dev);
  
   /* get true return value from status */
   retval = device_for_each_child(dev-dev, status, slot_reset_iter);


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2] output the cpu number when printking.

2012-12-24 Thread Yanmin Zhang
On Mon, 2012-12-24 at 09:55 -0800, Greg KH wrote:
> On Mon, Dec 24, 2012 at 01:01:55PM +0800, he, bo wrote:
> > From: "he, bo" 
> > 
> > We often hit kernel panic issues on SMP machines because processes race
> > on multiple cpu. By adding a new parameter printk.cpu, kernel prints
> > cpu number at printk information line. It’s useful to debug what cpus
> > are racing.
> 
> How useful is this really for normal developers?
It's very useful to debug race conditions under SMP environment.
We applied the patch to our Android build image on our smartphones.

>   We complained when
> this option was proposed by the ARM developers who were, for the first
> time, handling more than one processor and the issues involved with
> that.  You are enabling this as a default option, for all developers,
> and almost no one will ever need it.
At least, I am a developer to use it every day. :). I need debug various
weird issues and tried many debug tools.

Anyway, we would change the default to _disable_.

> 
> So, if you really want this, don't enable this by default.  Also, go
> back and read the old thread about this option and why it was rejeted
> previously.
Thanks for your kind reminder. I googled it. Is the link
like https://lkml.org/lkml/2012/8/3/121?
Or http://lkml.indiana.edu/hypermail/linux/kernel/1208.0/02097.html?

Above links raise a good question that we might be able to use
ftrace (trace_printk) instead of to extend printk.
1) There are indeed lots of debug methods/tools in kernel because of many
excellent developers' contributions. However, the most frequently-used tool
is still printk.
2) ftrace (I use it often) is good when system mostly doesn't crash.
If system crashes, we couldn't get ftrace info sometimes.
When we debug some hard issues on a multi-core device, the system often
crashes when Graphics driver accesses registers incorrectly. There is no chance
for kernel to dump ftrace or panic info. With printk, we find that's because
2 cpu are racing. One cpu turns off Graphics while another cpu is accessing it.
Of course, we also applied the Android persistent ram patch, so after rebooting,
we still could see the printk buffer of previous booting.
3) Current drivers usually have many dev_dbg statements. We could enable
dynamic debug control to get such info quickly. With the printk cpu number
patch, we quickly enhance them to dump info about potential races among cpu.
With ftrace, we need change all these dev_dbg to trace_printk and that's not
the original objective of driver developers.

Mostly, the patch is to reuse all current printk statements in kernel with
a little overhead.

Thanks for your kind comments.

Yanmin


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2] output the cpu number when printking.

2012-12-24 Thread Yanmin Zhang
On Mon, 2012-12-24 at 09:55 -0800, Greg KH wrote:
 On Mon, Dec 24, 2012 at 01:01:55PM +0800, he, bo wrote:
  From: he, bo bo...@intel.com
  
  We often hit kernel panic issues on SMP machines because processes race
  on multiple cpu. By adding a new parameter printk.cpu, kernel prints
  cpu number at printk information line. It’s useful to debug what cpus
  are racing.
 
 How useful is this really for normal developers?
It's very useful to debug race conditions under SMP environment.
We applied the patch to our Android build image on our smartphones.

   We complained when
 this option was proposed by the ARM developers who were, for the first
 time, handling more than one processor and the issues involved with
 that.  You are enabling this as a default option, for all developers,
 and almost no one will ever need it.
At least, I am a developer to use it every day. :). I need debug various
weird issues and tried many debug tools.

Anyway, we would change the default to _disable_.

 
 So, if you really want this, don't enable this by default.  Also, go
 back and read the old thread about this option and why it was rejeted
 previously.
Thanks for your kind reminder. I googled it. Is the link
like https://lkml.org/lkml/2012/8/3/121?
Or http://lkml.indiana.edu/hypermail/linux/kernel/1208.0/02097.html?

Above links raise a good question that we might be able to use
ftrace (trace_printk) instead of to extend printk.
1) There are indeed lots of debug methods/tools in kernel because of many
excellent developers' contributions. However, the most frequently-used tool
is still printk.
2) ftrace (I use it often) is good when system mostly doesn't crash.
If system crashes, we couldn't get ftrace info sometimes.
When we debug some hard issues on a multi-core device, the system often
crashes when Graphics driver accesses registers incorrectly. There is no chance
for kernel to dump ftrace or panic info. With printk, we find that's because
2 cpu are racing. One cpu turns off Graphics while another cpu is accessing it.
Of course, we also applied the Android persistent ram patch, so after rebooting,
we still could see the printk buffer of previous booting.
3) Current drivers usually have many dev_dbg statements. We could enable
dynamic debug control to get such info quickly. With the printk cpu number
patch, we quickly enhance them to dump info about potential races among cpu.
With ftrace, we need change all these dev_dbg to trace_printk and that's not
the original objective of driver developers.

Mostly, the patch is to reuse all current printk statements in kernel with
a little overhead.

Thanks for your kind comments.

Yanmin


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2] hrtimer: Printing timer info when hitting BUG_ON()

2012-11-05 Thread Yanmin Zhang
On Tue, 2012-10-30 at 18:27 +0800, Chuansheng Liu wrote:
> We encounted one BUG_ON() issue at function __run_hrtimer(),
> but the panic info is not enough to find out which hrtimer
> users use the hrtimer wrongly.
> (in this BUG_ON case, it is callback running at the same time
> hrtimer_start() is executed on different context.)
> 
> We can print basic timer info before BUG(), it is easy to find
> out which hrtimer user has fault, the output info is like below:
> 
> <3>[   41.118552] timer info: state/3, func/test_timer_cb
> <3>[   41.118618] timer stats: pid/786(sh), site/hrtimer_start
> <0>[   41.118670] [ cut here ]
> <2>[   41.118691] kernel BUG at 
> /root/r4_code/hardware/intel/linux-2.6/kernel/hrtimer.c:1260!
> <0>[   41.118715] invalid opcode:  [#1] PREEMPT SMP
> 
> The infos of callback/state are very helpful for debugging this BUG_ON().
It's a useful patch.

Thomas,

Could you merge it into your testing tree, so other developers could also
benefit from it?

Thanks all.
Yanmin

> 
> Signed-off-by: Li Fei 
> Signed-off-by: liu chuansheng 
> ---
>  kernel/hrtimer.c |   21 -
>  1 files changed, 20 insertions(+), 1 deletions(-)
> 
> diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
> index 6db7a5e..0618eaf 100644
> --- a/kernel/hrtimer.c
> +++ b/kernel/hrtimer.c
> @@ -1204,6 +1204,22 @@ int hrtimer_get_res(const clockid_t which_clock, 
> struct timespec *tp)
>  }
>  EXPORT_SYMBOL_GPL(hrtimer_get_res);
>  
> +/*
> + * dump_hrtimer_callinfo - print hrtimer information including:
> + * state, callback function, pid and start_site.
> +*/
> +static void dump_hrtimer_callinfo(struct hrtimer *timer)
> +{
> +
> + pr_err("timer info: state/%lx, func/%pf\n",
> + timer->state, timer->function);
> +
> +#ifdef CONFIG_TIMER_STATS
> + pr_err("timer stats: pid/%d(%s), site/%pf\n",
> + timer->start_pid, timer->start_comm, timer->start_site);
> +#endif
> +}
> +
>  static void __run_hrtimer(struct hrtimer *timer, ktime_t *now)
>  {
>   struct hrtimer_clock_base *base = timer->base;
> @@ -1235,7 +1251,10 @@ static void __run_hrtimer(struct hrtimer *timer, 
> ktime_t *now)
>* hrtimer_start_range_ns() or in hrtimer_interrupt()
>*/
>   if (restart != HRTIMER_NORESTART) {
> - BUG_ON(timer->state != HRTIMER_STATE_CALLBACK);
> + if (timer->state != HRTIMER_STATE_CALLBACK) {
> + dump_hrtimer_callinfo(timer);
> + BUG();
> + }
>   enqueue_hrtimer(timer, base);
>   }
>  


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2] hrtimer: Printing timer info when hitting BUG_ON()

2012-11-05 Thread Yanmin Zhang
On Tue, 2012-10-30 at 18:27 +0800, Chuansheng Liu wrote:
 We encounted one BUG_ON() issue at function __run_hrtimer(),
 but the panic info is not enough to find out which hrtimer
 users use the hrtimer wrongly.
 (in this BUG_ON case, it is callback running at the same time
 hrtimer_start() is executed on different context.)
 
 We can print basic timer info before BUG(), it is easy to find
 out which hrtimer user has fault, the output info is like below:
 
 3[   41.118552] timer info: state/3, func/test_timer_cb
 3[   41.118618] timer stats: pid/786(sh), site/hrtimer_start
 0[   41.118670] [ cut here ]
 2[   41.118691] kernel BUG at 
 /root/r4_code/hardware/intel/linux-2.6/kernel/hrtimer.c:1260!
 0[   41.118715] invalid opcode:  [#1] PREEMPT SMP
 
 The infos of callback/state are very helpful for debugging this BUG_ON().
It's a useful patch.

Thomas,

Could you merge it into your testing tree, so other developers could also
benefit from it?

Thanks all.
Yanmin

 
 Signed-off-by: Li Fei fei...@intel.com
 Signed-off-by: liu chuansheng chuansheng@intel.com
 ---
  kernel/hrtimer.c |   21 -
  1 files changed, 20 insertions(+), 1 deletions(-)
 
 diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
 index 6db7a5e..0618eaf 100644
 --- a/kernel/hrtimer.c
 +++ b/kernel/hrtimer.c
 @@ -1204,6 +1204,22 @@ int hrtimer_get_res(const clockid_t which_clock, 
 struct timespec *tp)
  }
  EXPORT_SYMBOL_GPL(hrtimer_get_res);
  
 +/*
 + * dump_hrtimer_callinfo - print hrtimer information including:
 + * state, callback function, pid and start_site.
 +*/
 +static void dump_hrtimer_callinfo(struct hrtimer *timer)
 +{
 +
 + pr_err(timer info: state/%lx, func/%pf\n,
 + timer-state, timer-function);
 +
 +#ifdef CONFIG_TIMER_STATS
 + pr_err(timer stats: pid/%d(%s), site/%pf\n,
 + timer-start_pid, timer-start_comm, timer-start_site);
 +#endif
 +}
 +
  static void __run_hrtimer(struct hrtimer *timer, ktime_t *now)
  {
   struct hrtimer_clock_base *base = timer-base;
 @@ -1235,7 +1251,10 @@ static void __run_hrtimer(struct hrtimer *timer, 
 ktime_t *now)
* hrtimer_start_range_ns() or in hrtimer_interrupt()
*/
   if (restart != HRTIMER_NORESTART) {
 - BUG_ON(timer-state != HRTIMER_STATE_CALLBACK);
 + if (timer-state != HRTIMER_STATE_CALLBACK) {
 + dump_hrtimer_callinfo(timer);
 + BUG();
 + }
   enqueue_hrtimer(timer, base);
   }
  


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH] hrtimer:__run_hrtimer races with enqueue_hrtimer

2012-10-29 Thread Yanmin Zhang
On Fri, 2012-10-26 at 14:09 +0200, Thomas Gleixner wrote:
> On Fri, 26 Oct 2012, Zhang, Yanmin wrote:
> > >From: Thomas Gleixner [mailto:t...@linutronix.de]
> > >Your code is returning HRTIMER_RESTART from the timer callback and at
> > >the same time it starts the timer from some other context. That's what
> > >needs to be fixed.
> > 
> > The timer user should fix it. But could we also change hrtimer to
> > make it more stable?  At least, instead of panic, could we print
> > some information and go ahead to let kernel continue?
> 
> That's unfortunately not possible. At this point the timer might be
> already corrupted.
> 
> CPU0  CPU 1
> 
> timer expires
>   callback runs   
> hrtimer_start()
>  expiry value is set
>  hrtimer_enqueue()
> 
>hrtimer_forward()
>   expiry value is set
> 
>   return HRTIMER_RESTART  
> 
> So while we can prevent the double enqueue, we have no way to deal
> with the corrupted expiry value and the inconsistent RB tree. We can
> give better debugging information, but we can't pretend that
> everything is nice and cool.
> 
> If we really want to do something about it which keeps the machine
> alive, then we need to 
> 
>1) dequeue the timer
>2) run a consistency check over the rbtree
>3) enqueue the timer
> 
> Not sure if that's worth the trouble.
I strongly agree with you. Such checking might cause more trouble than
what we could benefit from it.

Could we add more dumping, especially about the timer address and callback
function? Next time, developers could find the bad timer quickly.



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH] hrtimer:__run_hrtimer races with enqueue_hrtimer

2012-10-29 Thread Yanmin Zhang
On Fri, 2012-10-26 at 14:09 +0200, Thomas Gleixner wrote:
 On Fri, 26 Oct 2012, Zhang, Yanmin wrote:
  From: Thomas Gleixner [mailto:t...@linutronix.de]
  Your code is returning HRTIMER_RESTART from the timer callback and at
  the same time it starts the timer from some other context. That's what
  needs to be fixed.
  
  The timer user should fix it. But could we also change hrtimer to
  make it more stable?  At least, instead of panic, could we print
  some information and go ahead to let kernel continue?
 
 That's unfortunately not possible. At this point the timer might be
 already corrupted.
 
 CPU0  CPU 1
 
 timer expires
   callback runs   
 hrtimer_start()
  expiry value is set
  hrtimer_enqueue()
 
hrtimer_forward()
   expiry value is set
 
   return HRTIMER_RESTART  
 
 So while we can prevent the double enqueue, we have no way to deal
 with the corrupted expiry value and the inconsistent RB tree. We can
 give better debugging information, but we can't pretend that
 everything is nice and cool.
 
 If we really want to do something about it which keeps the machine
 alive, then we need to 
 
1) dequeue the timer
2) run a consistency check over the rbtree
3) enqueue the timer
 
 Not sure if that's worth the trouble.
I strongly agree with you. Such checking might cause more trouble than
what we could benefit from it.

Could we add more dumping, especially about the timer address and callback
function? Next time, developers could find the bad timer quickly.



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH V2] hrtimer:__run_hrtimer races with enqueue_hrtimer

2012-10-26 Thread Yanmin Zhang
V2 adds a checking HRTIMER_STATE_ENQUEUED. If HRTIMER_STATE_ENQUEUED is set,
We don't call enqueue_hrtimer. It should be rare that HRTIMER_STATE_ENQUEUED is 
set.

---

We hit a kernel panic at __run_hrtimer=>BUG_ON(timer->state != 
HRTIMER_STATE_CALLBACK).
<2>[   10.226053, 3] kernel BUG at 
/home/android/xiaobing/ymz/r4/hardware/intel/linux-2.6/kernel/hrtimer.c:1228!
<0>[   10.235682, 3] invalid opcode:  [#1] PREEMPT SMP
<4>[   10.240716, 3] Modules linked in: wl12xx_sdio wl12xx mac80211 cfg80211 
compat btwilink rmi4(C) fmdrv_chr st_drv matrix(C)
<4>[   10.251651, 3]
<4>[   10.253391, 3] Pid: 68, comm: kworker/3:4 Tainted: GWC  
3.0.34-140430-g2af538d #45 Intel Corporation CloverTrail/FFRD
<4>[   10.264674, 3] EIP: 0060:[] EFLAGS: 00010002 CPU: 3
<4>[   10.270411, 3] EIP is at __run_hrtimer+0xbd/0x240
<4>[   10.275091, 3] EAX: 0001 EBX: f67fb6b8 ECX: f57b4000 EDX: 7301
<4>[   10.281602, 3] ESI: c1d614c0 EDI: f67fb680 EBP: f57b5dd8 ESP: f57b5da8
<4>[   10.288113, 3]  DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
<0>[   10.293754, 3] Process kworker/3:4 (pid: 68, ti=f57b4000 task=f57aa730 
task.ti=f57b4000)
<0>[   10.301827, 3] Stack:
<4>[   10.304083, 3]   c1afef40 f57b5dd8 c167a6e0 f67fb680 20b366e3 
f67fb6b8 f57b5e14
<4>[   10.312069, 3]  0001 f67fb6b8 0001 f67fb680 f57b5e28 c126d1e5 
f57b5e08 c126f325
<4>[   10.320055, 3]   86b9868d 0001 86b9868d 0001 0003 
 7fff
<0>[   10.328041, 3] Call Trace:
<4>[   10.330742, 3]  [] ? gburst_thread_stop.isra.25+0x40/0x40
<4>[   10.336988, 3]  [] hrtimer_interrupt+0xd5/0x250
<4>[   10.342368, 3]  [] ? sched_clock_cpu+0xe5/0x150
<4>[   10.347753, 3]  [] smp_apic_timer_interrupt+0x54/0x88
<4>[   10.353654, 3]  [] ? trace_hardirqs_off_thunk+0xc/0x14
<4>[   10.359643, 3]  [] apic_timer_interrupt+0x2f/0x34
<4>[   10.365199, 3]  [] ? sub_preempt_count+0x1f/0x50
<4>[   10.370669, 3]  [] delay_tsc+0x3a/0xc0
<6>[   10.371589, 0] android_work: did not send uevent (0 0   (null))
<4>[   10.381171, 3]  [] __const_udelay+0x23/0x30
<4>[   10.386207, 3]  [] mdfld_dsi_send_dcs+0x12a/0x5d0
<4>[   10.391760, 3]  [] ? _raw_spin_unlock_irqrestore+0x26/0x50
<4>[   10.398101, 3]  [] ? ospm_power_using_hw_begin+0xa1/0x350
<4>[   10.399053, 3]  [] ? __mutex_lock_slowpath+0x1ff/0x2f0
<4>[   10.399069, 3]  [] mdfld_dbi_update_panel+0x21e/0x2d0
<4>[   10.399085, 3]  [] mdfld_te_handler_work+0x71/0x80
<4>[   10.399099, 3]  [] process_one_work+0xfe/0x3f0
<4>[   10.399114, 3]  [] ? mdfld_async_flip_te_handler+0xf0/0xf0

Basically, __run_hrtimer has a race with enqueue_hrtimer. When __run_hrtimer 
calls
the timer callback fn, another thread might call enqueue_hrtimer or 
hrtimer_start
to requeue it, and the timer->state is equal to 
HRTIMER_STATE_CALLBACK|HRTIMER_STATE_ENQUEUED,
which causes the BUG_ON(timer->state != HRTIMER_STATE_CALLBACK) checking fails.

The patch fixes it by checking only bit HRTIMER_STATE_CALLBACK and also bypass
enqueue_hrtimer when the timer is queued.

Signed-off-by: Yanmin Zhang 
Signed-off-by: He Bo 
---
 kernel/hrtimer.c |5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index 6db7a5e..535cdbb 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -1235,8 +1235,9 @@ static void __run_hrtimer(struct hrtimer *timer, ktime_t 
*now)
 * hrtimer_start_range_ns() or in hrtimer_interrupt()
 */
if (restart != HRTIMER_NORESTART) {
-   BUG_ON(timer->state != HRTIMER_STATE_CALLBACK);
-   enqueue_hrtimer(timer, base);
+   BUG_ON(!(timer->state & HRTIMER_STATE_CALLBACK));
+   if (likely(!(timer->state & HRTIMER_STATE_ENQUEUED)))
+   enqueue_hrtimer(timer, base);
}
 
WARN_ON_ONCE(!(timer->state & HRTIMER_STATE_CALLBACK));
-- 
1.7.4.1



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] hrtimer:__run_hrtimer races with enqueue_hrtimer

2012-10-26 Thread Yanmin Zhang
On Fri, 2012-10-26 at 10:51 +0800, he, bo wrote:
> From: Yanmin Zhang 
> 
> We hit a kernel panic at __run_hrtimer=>BUG_ON(timer->state != 
> HRTIMER_STATE_CALLBACK).
> <2>[   10.226053, 3] kernel BUG at 
> /home/android/xiaobing/ymz/r4/hardware/intel/linux-2.6/kernel/hrtimer.c:1228!
> <0>[   10.235682, 3] invalid opcode:  [#1] PREEMPT SMP
> <4>[   10.240716, 3] Modules linked in: wl12xx_sdio wl12xx mac80211 cfg80211 
> compat btwilink rmi4(C) fmdrv_chr st_drv matrix(C)
> <4>[   10.251651, 3]
> <4>[   10.253391, 3] Pid: 68, comm: kworker/3:4 Tainted: GWC  
> 3.0.34-140430-g2af538d #45 Intel Corporation CloverTrail/FFRD
> <4>[   10.264674, 3] EIP: 0060:[] EFLAGS: 00010002 CPU: 3
> <4>[   10.270411, 3] EIP is at __run_hrtimer+0xbd/0x240
> <4>[   10.275091, 3] EAX: 0001 EBX: f67fb6b8 ECX: f57b4000 EDX: 7301
> <4>[   10.281602, 3] ESI: c1d614c0 EDI: f67fb680 EBP: f57b5dd8 ESP: f57b5da8
> <4>[   10.288113, 3]  DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
> <0>[   10.293754, 3] Process kworker/3:4 (pid: 68, ti=f57b4000 task=f57aa730 
> task.ti=f57b4000)
> <0>[   10.301827, 3] Stack:
> <4>[   10.304083, 3]   c1afef40 f57b5dd8 c167a6e0 f67fb680 20b366e3 
> f67fb6b8 f57b5e14
> <4>[   10.312069, 3]  0001 f67fb6b8 0001 f67fb680 f57b5e28 c126d1e5 
> f57b5e08 c126f325
> <4>[   10.320055, 3]   86b9868d 0001 86b9868d 0001 0003 
>  7fff
> <0>[   10.328041, 3] Call Trace:
> <4>[   10.330742, 3]  [] ? gburst_thread_stop.isra.25+0x40/0x40
> <4>[   10.336988, 3]  [] hrtimer_interrupt+0xd5/0x250
> <4>[   10.342368, 3]  [] ? sched_clock_cpu+0xe5/0x150
> <4>[   10.347753, 3]  [] smp_apic_timer_interrupt+0x54/0x88
> <4>[   10.353654, 3]  [] ? trace_hardirqs_off_thunk+0xc/0x14
> <4>[   10.359643, 3]  [] apic_timer_interrupt+0x2f/0x34
> <4>[   10.365199, 3]  [] ? sub_preempt_count+0x1f/0x50
> <4>[   10.370669, 3]  [] delay_tsc+0x3a/0xc0
> <6>[   10.371589, 0] android_work: did not send uevent (0 0   (null))
> <4>[   10.381171, 3]  [] __const_udelay+0x23/0x30
> <4>[   10.386207, 3]  [] mdfld_dsi_send_dcs+0x12a/0x5d0
> <4>[   10.391760, 3]  [] ? _raw_spin_unlock_irqrestore+0x26/0x50
> <4>[   10.398101, 3]  [] ? ospm_power_using_hw_begin+0xa1/0x350
> <4>[   10.399053, 3]  [] ? __mutex_lock_slowpath+0x1ff/0x2f0
> <4>[   10.399069, 3]  [] mdfld_dbi_update_panel+0x21e/0x2d0
> <4>[   10.399085, 3]  [] mdfld_te_handler_work+0x71/0x80
> <4>[   10.399099, 3]  [] process_one_work+0xfe/0x3f0
> <4>[   10.399114, 3]  [] ? mdfld_async_flip_te_handler+0xf0/0xf0
> 
> Basically, __run_hrtimer has a race with enqueue_hrtimer. When __run_hrtimer 
> calls
> the timer callback fn, another thread might call enqueue_hrtimer or 
> hrtimer_start
> to requeue it, and the timer->state is equal to 
> HRTIMER_STATE_CALLBACK|HRTIMER_STATE_ENQUEUED,
> which causes the BUG_ON(timer->state != HRTIMER_STATE_CALLBACK) checking 
> fails.
> 
> The patch fixes it by checking only bit HRTIMER_STATE_CALLBACK.
The patch has an issue that enqueue_hrtimer doesn't check if the timer is 
queued.
I will send a new patch.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] hrtimer:__run_hrtimer races with enqueue_hrtimer

2012-10-26 Thread Yanmin Zhang
On Fri, 2012-10-26 at 10:51 +0800, he, bo wrote:
 From: Yanmin Zhang yanmin.zh...@intel.com
 
 We hit a kernel panic at __run_hrtimer=BUG_ON(timer-state != 
 HRTIMER_STATE_CALLBACK).
 2[   10.226053, 3] kernel BUG at 
 /home/android/xiaobing/ymz/r4/hardware/intel/linux-2.6/kernel/hrtimer.c:1228!
 0[   10.235682, 3] invalid opcode:  [#1] PREEMPT SMP
 4[   10.240716, 3] Modules linked in: wl12xx_sdio wl12xx mac80211 cfg80211 
 compat btwilink rmi4(C) fmdrv_chr st_drv matrix(C)
 4[   10.251651, 3]
 4[   10.253391, 3] Pid: 68, comm: kworker/3:4 Tainted: GWC  
 3.0.34-140430-g2af538d #45 Intel Corporation CloverTrail/FFRD
 4[   10.264674, 3] EIP: 0060:[c126c7ed] EFLAGS: 00010002 CPU: 3
 4[   10.270411, 3] EIP is at __run_hrtimer+0xbd/0x240
 4[   10.275091, 3] EAX: 0001 EBX: f67fb6b8 ECX: f57b4000 EDX: 7301
 4[   10.281602, 3] ESI: c1d614c0 EDI: f67fb680 EBP: f57b5dd8 ESP: f57b5da8
 4[   10.288113, 3]  DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
 0[   10.293754, 3] Process kworker/3:4 (pid: 68, ti=f57b4000 task=f57aa730 
 task.ti=f57b4000)
 0[   10.301827, 3] Stack:
 4[   10.304083, 3]   c1afef40 f57b5dd8 c167a6e0 f67fb680 20b366e3 
 f67fb6b8 f57b5e14
 4[   10.312069, 3]  0001 f67fb6b8 0001 f67fb680 f57b5e28 c126d1e5 
 f57b5e08 c126f325
 4[   10.320055, 3]   86b9868d 0001 86b9868d 0001 0003 
  7fff
 0[   10.328041, 3] Call Trace:
 4[   10.330742, 3]  [c167a6e0] ? gburst_thread_stop.isra.25+0x40/0x40
 4[   10.336988, 3]  [c126d1e5] hrtimer_interrupt+0xd5/0x250
 4[   10.342368, 3]  [c126f325] ? sched_clock_cpu+0xe5/0x150
 4[   10.347753, 3]  [c1871d44] smp_apic_timer_interrupt+0x54/0x88
 4[   10.353654, 3]  [c1496558] ? trace_hardirqs_off_thunk+0xc/0x14
 4[   10.359643, 3]  [c186be9f] apic_timer_interrupt+0x2f/0x34
 4[   10.365199, 3]  [c186e60f] ? sub_preempt_count+0x1f/0x50
 4[   10.370669, 3]  [c149558a] delay_tsc+0x3a/0xc0
 6[   10.371589, 0] android_work: did not send uevent (0 0   (null))
 4[   10.381171, 3]  [c14954e3] __const_udelay+0x23/0x30
 4[   10.386207, 3]  [c16d043a] mdfld_dsi_send_dcs+0x12a/0x5d0
 4[   10.391760, 3]  [c186b6c6] ? _raw_spin_unlock_irqrestore+0x26/0x50
 4[   10.398101, 3]  [c16af431] ? ospm_power_using_hw_begin+0xa1/0x350
 4[   10.399053, 3]  [c186a49f] ? __mutex_lock_slowpath+0x1ff/0x2f0
 4[   10.399069, 3]  [c16bd59e] mdfld_dbi_update_panel+0x21e/0x2d0
 4[   10.399085, 3]  [c16b1ae1] mdfld_te_handler_work+0x71/0x80
 4[   10.399099, 3]  [c12642be] process_one_work+0xfe/0x3f0
 4[   10.399114, 3]  [c16b1a70] ? mdfld_async_flip_te_handler+0xf0/0xf0
 
 Basically, __run_hrtimer has a race with enqueue_hrtimer. When __run_hrtimer 
 calls
 the timer callback fn, another thread might call enqueue_hrtimer or 
 hrtimer_start
 to requeue it, and the timer-state is equal to 
 HRTIMER_STATE_CALLBACK|HRTIMER_STATE_ENQUEUED,
 which causes the BUG_ON(timer-state != HRTIMER_STATE_CALLBACK) checking 
 fails.
 
 The patch fixes it by checking only bit HRTIMER_STATE_CALLBACK.
The patch has an issue that enqueue_hrtimer doesn't check if the timer is 
queued.
I will send a new patch.


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH V2] hrtimer:__run_hrtimer races with enqueue_hrtimer

2012-10-26 Thread Yanmin Zhang
V2 adds a checking HRTIMER_STATE_ENQUEUED. If HRTIMER_STATE_ENQUEUED is set,
We don't call enqueue_hrtimer. It should be rare that HRTIMER_STATE_ENQUEUED is 
set.

---

We hit a kernel panic at __run_hrtimer=BUG_ON(timer-state != 
HRTIMER_STATE_CALLBACK).
2[   10.226053, 3] kernel BUG at 
/home/android/xiaobing/ymz/r4/hardware/intel/linux-2.6/kernel/hrtimer.c:1228!
0[   10.235682, 3] invalid opcode:  [#1] PREEMPT SMP
4[   10.240716, 3] Modules linked in: wl12xx_sdio wl12xx mac80211 cfg80211 
compat btwilink rmi4(C) fmdrv_chr st_drv matrix(C)
4[   10.251651, 3]
4[   10.253391, 3] Pid: 68, comm: kworker/3:4 Tainted: GWC  
3.0.34-140430-g2af538d #45 Intel Corporation CloverTrail/FFRD
4[   10.264674, 3] EIP: 0060:[c126c7ed] EFLAGS: 00010002 CPU: 3
4[   10.270411, 3] EIP is at __run_hrtimer+0xbd/0x240
4[   10.275091, 3] EAX: 0001 EBX: f67fb6b8 ECX: f57b4000 EDX: 7301
4[   10.281602, 3] ESI: c1d614c0 EDI: f67fb680 EBP: f57b5dd8 ESP: f57b5da8
4[   10.288113, 3]  DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
0[   10.293754, 3] Process kworker/3:4 (pid: 68, ti=f57b4000 task=f57aa730 
task.ti=f57b4000)
0[   10.301827, 3] Stack:
4[   10.304083, 3]   c1afef40 f57b5dd8 c167a6e0 f67fb680 20b366e3 
f67fb6b8 f57b5e14
4[   10.312069, 3]  0001 f67fb6b8 0001 f67fb680 f57b5e28 c126d1e5 
f57b5e08 c126f325
4[   10.320055, 3]   86b9868d 0001 86b9868d 0001 0003 
 7fff
0[   10.328041, 3] Call Trace:
4[   10.330742, 3]  [c167a6e0] ? gburst_thread_stop.isra.25+0x40/0x40
4[   10.336988, 3]  [c126d1e5] hrtimer_interrupt+0xd5/0x250
4[   10.342368, 3]  [c126f325] ? sched_clock_cpu+0xe5/0x150
4[   10.347753, 3]  [c1871d44] smp_apic_timer_interrupt+0x54/0x88
4[   10.353654, 3]  [c1496558] ? trace_hardirqs_off_thunk+0xc/0x14
4[   10.359643, 3]  [c186be9f] apic_timer_interrupt+0x2f/0x34
4[   10.365199, 3]  [c186e60f] ? sub_preempt_count+0x1f/0x50
4[   10.370669, 3]  [c149558a] delay_tsc+0x3a/0xc0
6[   10.371589, 0] android_work: did not send uevent (0 0   (null))
4[   10.381171, 3]  [c14954e3] __const_udelay+0x23/0x30
4[   10.386207, 3]  [c16d043a] mdfld_dsi_send_dcs+0x12a/0x5d0
4[   10.391760, 3]  [c186b6c6] ? _raw_spin_unlock_irqrestore+0x26/0x50
4[   10.398101, 3]  [c16af431] ? ospm_power_using_hw_begin+0xa1/0x350
4[   10.399053, 3]  [c186a49f] ? __mutex_lock_slowpath+0x1ff/0x2f0
4[   10.399069, 3]  [c16bd59e] mdfld_dbi_update_panel+0x21e/0x2d0
4[   10.399085, 3]  [c16b1ae1] mdfld_te_handler_work+0x71/0x80
4[   10.399099, 3]  [c12642be] process_one_work+0xfe/0x3f0
4[   10.399114, 3]  [c16b1a70] ? mdfld_async_flip_te_handler+0xf0/0xf0

Basically, __run_hrtimer has a race with enqueue_hrtimer. When __run_hrtimer 
calls
the timer callback fn, another thread might call enqueue_hrtimer or 
hrtimer_start
to requeue it, and the timer-state is equal to 
HRTIMER_STATE_CALLBACK|HRTIMER_STATE_ENQUEUED,
which causes the BUG_ON(timer-state != HRTIMER_STATE_CALLBACK) checking fails.

The patch fixes it by checking only bit HRTIMER_STATE_CALLBACK and also bypass
enqueue_hrtimer when the timer is queued.

Signed-off-by: Yanmin Zhang yanmin_zh...@linux.intel.com
Signed-off-by: He Bo bo...@intel.com
---
 kernel/hrtimer.c |5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index 6db7a5e..535cdbb 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -1235,8 +1235,9 @@ static void __run_hrtimer(struct hrtimer *timer, ktime_t 
*now)
 * hrtimer_start_range_ns() or in hrtimer_interrupt()
 */
if (restart != HRTIMER_NORESTART) {
-   BUG_ON(timer-state != HRTIMER_STATE_CALLBACK);
-   enqueue_hrtimer(timer, base);
+   BUG_ON(!(timer-state  HRTIMER_STATE_CALLBACK));
+   if (likely(!(timer-state  HRTIMER_STATE_ENQUEUED)))
+   enqueue_hrtimer(timer, base);
}
 
WARN_ON_ONCE(!(timer-state  HRTIMER_STATE_CALLBACK));
-- 
1.7.4.1



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] drivers-core: move the calling to device_pm_remove behind the calling to bus_remove_device

2012-10-24 Thread Yanmin Zhang
On Thu, 2012-10-25 at 00:33 +0200, Rafael J. Wysocki wrote:
> On Wednesday 24 of October 2012 15:13:27 Greg Kroah-Hartman wrote:
> > On Wed, Oct 24, 2012 at 09:46:19PM +0200, Rafael J. Wysocki wrote:
> > > On Tuesday 16 of October 2012 23:28:08 zhanglong wrote:
> > > > We hit an hang issue when removing a mmc device on Medfield Android 
> > > > phone by sysfs interface.
> > > > 
> > > > device_pm_remove will call pm_runtime_remove which would disable
> > > > runtime PM of the device. After that pm_runtime_get* or
> > > > pm_runtime_put* will be ignored. So if we disable the runtime PM
> > > > before device really be removed, drivers' _remove callback may
> > > > access HW even pm_runtime_get* fails. That is bad.
> > > > 
> > > > Consider below call sequence when removing a device:
> > > > device_del => device_pm_remove
> > > >  => class_intf->remove_dev(dev, class_intf)  => 
> > > > pm_runtime_get_sync/put_sync
> > > >  => bus_remove_device => device_release_driver => 
> > > > pm_runtime_get_sync/put_sync
> > > > 
> > > > remove_dev might call pm_runtime_get_sync/put_sync.
> > > > Then, generic device_release_driver also calls 
> > > > pm_runtime_get_sync/put_sync.
> > > > Since device_del => device_pm_remove firstly, later _get_sync wouldn't 
> > > > really wake up the device. 
> > > > 
> > > > I git log -p to find the patch which moves the calling to 
> > > > device_pm_remove ahead.
> > > > It's below patch:
> > > > 
> > > > commit  775b64d2b6ca37697de925f70799c710aab5849a
> > > > Author: Rafael J. Wysocki 
> > > > Date:   Sat Jan 12 20:40:46 2008 +0100
> > > > 
> > > >  PM: Acquire device locks on suspend
> > > > 
> > > >  This patch reorganizes the way suspend and resume notifications are
> > > >  sent to drivers.  The major changes are that now the PM core 
> > > > acquires
> > > >  every device semaphore before calling the methods, and calls to
> > > >  device_add() during suspends will fail, while calls to device_del()
> > > >  during suspends will block.
> > > > 
> > > >  It also provides a way to safely remove a suspended device with the
> > > >  help of the PM core, by using the device_pm_schedule_removal() 
> > > > callback
> > > >  introduced specifically for this purpose, and updates two drivers 
> > > > (msr
> > > >  and cpuid) that need to use it. 
> > > > 
> > > > 
> > > > As device_pm_schedule_removal is deleted by another patch, we need also 
> > > > revert other parts of the patch,
> > > > i.e. move the calling of device_pm_remove after the calling to 
> > > > bus_remove_device.
> > > > 
> > > > Signed-off-by: LongX Zhang 
> > > 
> > > Greg, do you see any potential problems with this patch?
> > 
> > No, no objection from me:
> > 
> > Acked-by: Greg Kroah-Hartman 
> 
> OK, thanks!
> 
> Applied to linux-pm.git/linux-next as v3.8 material.
Rafael,

As usual, I really appreciate it!

Yanmin


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] drivers-core: move the calling to device_pm_remove behind the calling to bus_remove_device

2012-10-24 Thread Yanmin Zhang
On Thu, 2012-10-25 at 00:33 +0200, Rafael J. Wysocki wrote:
 On Wednesday 24 of October 2012 15:13:27 Greg Kroah-Hartman wrote:
  On Wed, Oct 24, 2012 at 09:46:19PM +0200, Rafael J. Wysocki wrote:
   On Tuesday 16 of October 2012 23:28:08 zhanglong wrote:
We hit an hang issue when removing a mmc device on Medfield Android 
phone by sysfs interface.

device_pm_remove will call pm_runtime_remove which would disable
runtime PM of the device. After that pm_runtime_get* or
pm_runtime_put* will be ignored. So if we disable the runtime PM
before device really be removed, drivers' _remove callback may
access HW even pm_runtime_get* fails. That is bad.

Consider below call sequence when removing a device:
device_del = device_pm_remove
 = class_intf-remove_dev(dev, class_intf)  = 
pm_runtime_get_sync/put_sync
 = bus_remove_device = device_release_driver = 
pm_runtime_get_sync/put_sync

remove_dev might call pm_runtime_get_sync/put_sync.
Then, generic device_release_driver also calls 
pm_runtime_get_sync/put_sync.
Since device_del = device_pm_remove firstly, later _get_sync wouldn't 
really wake up the device. 

I git log -p to find the patch which moves the calling to 
device_pm_remove ahead.
It's below patch:

commit  775b64d2b6ca37697de925f70799c710aab5849a
Author: Rafael J. Wysocki r...@sisk.pl
Date:   Sat Jan 12 20:40:46 2008 +0100

 PM: Acquire device locks on suspend

 This patch reorganizes the way suspend and resume notifications are
 sent to drivers.  The major changes are that now the PM core 
acquires
 every device semaphore before calling the methods, and calls to
 device_add() during suspends will fail, while calls to device_del()
 during suspends will block.

 It also provides a way to safely remove a suspended device with the
 help of the PM core, by using the device_pm_schedule_removal() 
callback
 introduced specifically for this purpose, and updates two drivers 
(msr
 and cpuid) that need to use it. 


As device_pm_schedule_removal is deleted by another patch, we need also 
revert other parts of the patch,
i.e. move the calling of device_pm_remove after the calling to 
bus_remove_device.

Signed-off-by: LongX Zhang longx.zh...@intel.com
   
   Greg, do you see any potential problems with this patch?
  
  No, no objection from me:
  
  Acked-by: Greg Kroah-Hartman gre...@linuxfoundation.org
 
 OK, thanks!
 
 Applied to linux-pm.git/linux-next as v3.8 material.
Rafael,

As usual, I really appreciate it!

Yanmin


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] drivers-core: move the calling to device_pm_remove behind the calling to bus_remove_device

2012-10-21 Thread Yanmin Zhang
On Tue, 2012-10-16 at 23:28 +0800, zhanglong wrote:
> We hit an hang issue when removing a mmc device on Medfield Android phone by 
> sysfs interface.
> 
> device_pm_remove will call pm_runtime_remove which would disable
> runtime PM of the device. After that pm_runtime_get* or
> pm_runtime_put* will be ignored. So if we disable the runtime PM
> before device really be removed, drivers' _remove callback may
> access HW even pm_runtime_get* fails. That is bad.
> 
> Consider below call sequence when removing a device:
> device_del => device_pm_remove
>  => class_intf->remove_dev(dev, class_intf)  => 
> pm_runtime_get_sync/put_sync
>  => bus_remove_device => device_release_driver => 
> pm_runtime_get_sync/put_sync
> 
> remove_dev might call pm_runtime_get_sync/put_sync.
> Then, generic device_release_driver also calls pm_runtime_get_sync/put_sync.
> Since device_del => device_pm_remove firstly, later _get_sync wouldn't really 
> wake up the device. 
> 
> I git log -p to find the patch which moves the calling to device_pm_remove 
> ahead.
> It's below patch:
> 
> commit  775b64d2b6ca37697de925f70799c710aab5849a
> Author: Rafael J. Wysocki 
> Date:   Sat Jan 12 20:40:46 2008 +0100
> 
>  PM: Acquire device locks on suspend
> 
>  This patch reorganizes the way suspend and resume notifications are
>  sent to drivers.  The major changes are that now the PM core acquires
>  every device semaphore before calling the methods, and calls to
>  device_add() during suspends will fail, while calls to device_del()
>  during suspends will block.
> 
>  It also provides a way to safely remove a suspended device with the
>  help of the PM core, by using the device_pm_schedule_removal() callback
>  introduced specifically for this purpose, and updates two drivers (msr
>  and cpuid) that need to use it. 
> 
> 
> As device_pm_schedule_removal is deleted by another patch, we need also 
> revert other parts of the patch,
> i.e. move the calling of device_pm_remove after the calling to 
> bus_remove_device.
> 
> Signed-off-by: LongX Zhang 
Rafael,

With the patch V2, is there anything else we could improve before it can
be merged into your testing tree?

Thanks,
Yanmin

> ---
>  drivers/base/core.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index abea76c..150a415 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -1180,7 +1180,6 @@ void device_del(struct device *dev)
> if (dev->bus)
> blocking_notifier_call_chain(>bus->p->bus_notifier,
>  BUS_NOTIFY_DEL_DEVICE, dev);
> -   device_pm_remove(dev);
> dpm_sysfs_remove(dev);
> if (parent)
> klist_del(>p->knode_parent);
> @@ -1205,6 +1204,7 @@ void device_del(struct device *dev)
> device_remove_file(dev, _attr);
> device_remove_attrs(dev);
> bus_remove_device(dev);
> +   device_pm_remove(dev);
> driver_deferred_probe_del(dev);
> 
> /* Notify the platform of the removal, in case they


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] drivers-core: move the calling to device_pm_remove behind the calling to bus_remove_device

2012-10-21 Thread Yanmin Zhang
On Tue, 2012-10-16 at 23:28 +0800, zhanglong wrote:
 We hit an hang issue when removing a mmc device on Medfield Android phone by 
 sysfs interface.
 
 device_pm_remove will call pm_runtime_remove which would disable
 runtime PM of the device. After that pm_runtime_get* or
 pm_runtime_put* will be ignored. So if we disable the runtime PM
 before device really be removed, drivers' _remove callback may
 access HW even pm_runtime_get* fails. That is bad.
 
 Consider below call sequence when removing a device:
 device_del = device_pm_remove
  = class_intf-remove_dev(dev, class_intf)  = 
 pm_runtime_get_sync/put_sync
  = bus_remove_device = device_release_driver = 
 pm_runtime_get_sync/put_sync
 
 remove_dev might call pm_runtime_get_sync/put_sync.
 Then, generic device_release_driver also calls pm_runtime_get_sync/put_sync.
 Since device_del = device_pm_remove firstly, later _get_sync wouldn't really 
 wake up the device. 
 
 I git log -p to find the patch which moves the calling to device_pm_remove 
 ahead.
 It's below patch:
 
 commit  775b64d2b6ca37697de925f70799c710aab5849a
 Author: Rafael J. Wysocki r...@sisk.pl
 Date:   Sat Jan 12 20:40:46 2008 +0100
 
  PM: Acquire device locks on suspend
 
  This patch reorganizes the way suspend and resume notifications are
  sent to drivers.  The major changes are that now the PM core acquires
  every device semaphore before calling the methods, and calls to
  device_add() during suspends will fail, while calls to device_del()
  during suspends will block.
 
  It also provides a way to safely remove a suspended device with the
  help of the PM core, by using the device_pm_schedule_removal() callback
  introduced specifically for this purpose, and updates two drivers (msr
  and cpuid) that need to use it. 
 
 
 As device_pm_schedule_removal is deleted by another patch, we need also 
 revert other parts of the patch,
 i.e. move the calling of device_pm_remove after the calling to 
 bus_remove_device.
 
 Signed-off-by: LongX Zhang longx.zh...@intel.com
Rafael,

With the patch V2, is there anything else we could improve before it can
be merged into your testing tree?

Thanks,
Yanmin

 ---
  drivers/base/core.c |2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/drivers/base/core.c b/drivers/base/core.c
 index abea76c..150a415 100644
 --- a/drivers/base/core.c
 +++ b/drivers/base/core.c
 @@ -1180,7 +1180,6 @@ void device_del(struct device *dev)
 if (dev-bus)
 blocking_notifier_call_chain(dev-bus-p-bus_notifier,
  BUS_NOTIFY_DEL_DEVICE, dev);
 -   device_pm_remove(dev);
 dpm_sysfs_remove(dev);
 if (parent)
 klist_del(dev-p-knode_parent);
 @@ -1205,6 +1204,7 @@ void device_del(struct device *dev)
 device_remove_file(dev, uevent_attr);
 device_remove_attrs(dev);
 bus_remove_device(dev);
 +   device_pm_remove(dev);
 driver_deferred_probe_del(dev);
 
 /* Notify the platform of the removal, in case they


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Subject: [PATCH] drivers-core: move device_pm_remove behind bus_remove_device

2012-10-17 Thread Yanmin Zhang
On Mon, 2012-10-15 at 22:59 +0200, Rafael J. Wysocki wrote:
> On Monday 15 of October 2012 15:39:49 Yanmin Zhang wrote:
> > On Fri, 2012-09-21 at 01:58 +, Zhang, LongX wrote:
> > > From: LongX Zhang 
> > > 
> > > device_pm_remove will call pm_runtime_remove which would disable
> > > runtime PM of the device. After that pm_runtime_get* or
> > > pm_runtime_put* will be ingored. So if we disable the runtime PM
> > > before device really be removed, drivers' _remove callback may
> > > access HW even pm_runtime_get* fails. That is bad.
> > The background about the patch: We hit an hang issue when removing a mmc
> > device on Medfield Android phone by sysfs interface.
> > 
> > Consider below call sequence when removing a device:
> > device_del => device_pm_remove
> >=> class_intf->remove_dev(dev, class_intf)  => 
> > pm_runtime_get_sync/put_sync
> >=> bus_remove_device => device_release_driver => 
> > pm_runtime_get_sync/put_sync
> > 
> > remove_dev might call pm_runtime_get_sync/put_sync.
> > Then, generic device_release_driver also calls pm_runtime_get_sync/put_sync.
> > Since device_del => device_pm_remove firstly, later _get_sync wouldn't 
> > really
> > wake up the device.
> > 
> > I git log -p to find the patch which moves the calling to device_pm_remove 
> > ahead.
> > It's below patch:
> > 
> > commit 775b64d2b6ca37697de925f70799c710aab5849a
> > Author: Rafael J. Wysocki 
> > Date:   Sat Jan 12 20:40:46 2008 +0100
> > 
> > PM: Acquire device locks on suspend
> > 
> > This patch reorganizes the way suspend and resume notifications are
> > sent to drivers.  The major changes are that now the PM core acquires
> > every device semaphore before calling the methods, and calls to
> > device_add() during suspends will fail, while calls to device_del()
> > during suspends will block.
> > 
> > It also provides a way to safely remove a suspended device with the
> > help of the PM core, by using the device_pm_schedule_removal() callback
> > introduced specifically for this purpose, and updates two drivers (msr
> > and cpuid) that need to use it.
> > 
> > 
> > As device_pm_schedule_removal is deleted by another patch, we need also 
> > revert
> > other parts of the patch, i.e. move the calling of device_pm_remove after
> > the calling to bus_remove_device.
> > 
> > > 
> > > Signed-off-by: LongX Zhang 
> 
> Can you please resend the patch?  I lost the track of it unfortunately.
Long sent V2 out. See https://lkml.org/lkml/2012/10/16/594.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Subject: [PATCH] drivers-core: move device_pm_remove behind bus_remove_device

2012-10-17 Thread Yanmin Zhang
On Mon, 2012-10-15 at 22:59 +0200, Rafael J. Wysocki wrote:
 On Monday 15 of October 2012 15:39:49 Yanmin Zhang wrote:
  On Fri, 2012-09-21 at 01:58 +, Zhang, LongX wrote:
   From: LongX Zhang longx.zh...@intel.com
   
   device_pm_remove will call pm_runtime_remove which would disable
   runtime PM of the device. After that pm_runtime_get* or
   pm_runtime_put* will be ingored. So if we disable the runtime PM
   before device really be removed, drivers' _remove callback may
   access HW even pm_runtime_get* fails. That is bad.
  The background about the patch: We hit an hang issue when removing a mmc
  device on Medfield Android phone by sysfs interface.
  
  Consider below call sequence when removing a device:
  device_del = device_pm_remove
 = class_intf-remove_dev(dev, class_intf)  = 
  pm_runtime_get_sync/put_sync
 = bus_remove_device = device_release_driver = 
  pm_runtime_get_sync/put_sync
  
  remove_dev might call pm_runtime_get_sync/put_sync.
  Then, generic device_release_driver also calls pm_runtime_get_sync/put_sync.
  Since device_del = device_pm_remove firstly, later _get_sync wouldn't 
  really
  wake up the device.
  
  I git log -p to find the patch which moves the calling to device_pm_remove 
  ahead.
  It's below patch:
  
  commit 775b64d2b6ca37697de925f70799c710aab5849a
  Author: Rafael J. Wysocki r...@sisk.pl
  Date:   Sat Jan 12 20:40:46 2008 +0100
  
  PM: Acquire device locks on suspend
  
  This patch reorganizes the way suspend and resume notifications are
  sent to drivers.  The major changes are that now the PM core acquires
  every device semaphore before calling the methods, and calls to
  device_add() during suspends will fail, while calls to device_del()
  during suspends will block.
  
  It also provides a way to safely remove a suspended device with the
  help of the PM core, by using the device_pm_schedule_removal() callback
  introduced specifically for this purpose, and updates two drivers (msr
  and cpuid) that need to use it.
  
  
  As device_pm_schedule_removal is deleted by another patch, we need also 
  revert
  other parts of the patch, i.e. move the calling of device_pm_remove after
  the calling to bus_remove_device.
  
   
   Signed-off-by: LongX Zhang longx.zh...@intel.com
 
 Can you please resend the patch?  I lost the track of it unfortunately.
Long sent V2 out. See https://lkml.org/lkml/2012/10/16/594.


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Subject: [PATCH] drivers-core: move device_pm_remove behind bus_remove_device

2012-10-15 Thread Yanmin Zhang
On Fri, 2012-09-21 at 01:58 +, Zhang, LongX wrote:
> From: LongX Zhang 
> 
> device_pm_remove will call pm_runtime_remove which would disable
> runtime PM of the device. After that pm_runtime_get* or
> pm_runtime_put* will be ingored. So if we disable the runtime PM
> before device really be removed, drivers' _remove callback may
> access HW even pm_runtime_get* fails. That is bad.
The background about the patch: We hit an hang issue when removing a mmc
device on Medfield Android phone by sysfs interface.

Consider below call sequence when removing a device:
device_del => device_pm_remove
   => class_intf->remove_dev(dev, class_intf)  => 
pm_runtime_get_sync/put_sync
   => bus_remove_device => device_release_driver => 
pm_runtime_get_sync/put_sync

remove_dev might call pm_runtime_get_sync/put_sync.
Then, generic device_release_driver also calls pm_runtime_get_sync/put_sync.
Since device_del => device_pm_remove firstly, later _get_sync wouldn't really
wake up the device.

I git log -p to find the patch which moves the calling to device_pm_remove 
ahead.
It's below patch:

commit 775b64d2b6ca37697de925f70799c710aab5849a
Author: Rafael J. Wysocki 
Date:   Sat Jan 12 20:40:46 2008 +0100

PM: Acquire device locks on suspend

This patch reorganizes the way suspend and resume notifications are
sent to drivers.  The major changes are that now the PM core acquires
every device semaphore before calling the methods, and calls to
device_add() during suspends will fail, while calls to device_del()
during suspends will block.

It also provides a way to safely remove a suspended device with the
help of the PM core, by using the device_pm_schedule_removal() callback
introduced specifically for this purpose, and updates two drivers (msr
and cpuid) that need to use it.


As device_pm_schedule_removal is deleted by another patch, we need also revert
other parts of the patch, i.e. move the calling of device_pm_remove after
the calling to bus_remove_device.

> 
> Signed-off-by: LongX Zhang 
> ---
>  drivers/base/core.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 5e6e00b..81ea7f2 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -1169,7 +1169,6 @@ void device_del(struct device *dev)
>   if (dev->bus)
>   blocking_notifier_call_chain(>bus->p->bus_notifier,
>BUS_NOTIFY_DEL_DEVICE, dev);
> - device_pm_remove(dev);
>   dpm_sysfs_remove(dev);
>   if (parent)
>   klist_del(>p->knode_parent);
> @@ -1194,6 +1193,7 @@ void device_del(struct device *dev)
>   device_remove_file(dev, _attr);
>   device_remove_attrs(dev);
>   bus_remove_device(dev);
> + device_pm_remove(dev);
>   driver_deferred_probe_del(dev);
>  
>   /*


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Subject: [PATCH] drivers-core: move device_pm_remove behind bus_remove_device

2012-10-15 Thread Yanmin Zhang
On Fri, 2012-09-21 at 01:58 +, Zhang, LongX wrote:
 From: LongX Zhang longx.zh...@intel.com
 
 device_pm_remove will call pm_runtime_remove which would disable
 runtime PM of the device. After that pm_runtime_get* or
 pm_runtime_put* will be ingored. So if we disable the runtime PM
 before device really be removed, drivers' _remove callback may
 access HW even pm_runtime_get* fails. That is bad.
The background about the patch: We hit an hang issue when removing a mmc
device on Medfield Android phone by sysfs interface.

Consider below call sequence when removing a device:
device_del = device_pm_remove
   = class_intf-remove_dev(dev, class_intf)  = 
pm_runtime_get_sync/put_sync
   = bus_remove_device = device_release_driver = 
pm_runtime_get_sync/put_sync

remove_dev might call pm_runtime_get_sync/put_sync.
Then, generic device_release_driver also calls pm_runtime_get_sync/put_sync.
Since device_del = device_pm_remove firstly, later _get_sync wouldn't really
wake up the device.

I git log -p to find the patch which moves the calling to device_pm_remove 
ahead.
It's below patch:

commit 775b64d2b6ca37697de925f70799c710aab5849a
Author: Rafael J. Wysocki r...@sisk.pl
Date:   Sat Jan 12 20:40:46 2008 +0100

PM: Acquire device locks on suspend

This patch reorganizes the way suspend and resume notifications are
sent to drivers.  The major changes are that now the PM core acquires
every device semaphore before calling the methods, and calls to
device_add() during suspends will fail, while calls to device_del()
during suspends will block.

It also provides a way to safely remove a suspended device with the
help of the PM core, by using the device_pm_schedule_removal() callback
introduced specifically for this purpose, and updates two drivers (msr
and cpuid) that need to use it.


As device_pm_schedule_removal is deleted by another patch, we need also revert
other parts of the patch, i.e. move the calling of device_pm_remove after
the calling to bus_remove_device.

 
 Signed-off-by: LongX Zhang longx.zh...@intel.com
 ---
  drivers/base/core.c |2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/drivers/base/core.c b/drivers/base/core.c
 index 5e6e00b..81ea7f2 100644
 --- a/drivers/base/core.c
 +++ b/drivers/base/core.c
 @@ -1169,7 +1169,6 @@ void device_del(struct device *dev)
   if (dev-bus)
   blocking_notifier_call_chain(dev-bus-p-bus_notifier,
BUS_NOTIFY_DEL_DEVICE, dev);
 - device_pm_remove(dev);
   dpm_sysfs_remove(dev);
   if (parent)
   klist_del(dev-p-knode_parent);
 @@ -1194,6 +1193,7 @@ void device_del(struct device *dev)
   device_remove_file(dev, uevent_attr);
   device_remove_attrs(dev);
   bus_remove_device(dev);
 + device_pm_remove(dev);
   driver_deferred_probe_del(dev);
  
   /*


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86/fixup_irq: using the cpu_online_mask instead of cpu_all_mask

2012-08-19 Thread Yanmin Zhang
On Tue, 2012-08-14 at 06:55 +, Liu, Chuansheng wrote:
> From: liu chuansheng 
> Subject: [PATCH] x86/fixup_irq: using the cpu_online_mask instead of 
> cpu_all_mask
> 
> When one CPU is going down, and this CPU is the last one in irq affinity,
> current code is setting cpu_all_mask as the new affinity for that irq.
> 
> But for some system the firmware maybe send the interrupt to each CPU
> in irq affinity averagely, and cpu_all_mask include all CPUs.
> 
> Here replacing cpu_all_mask with cpu_online_mask, it is more reasonable
> and fittable.
It's a good finding. The issue exists on Medfield Android mobile. 


> 
> Signed-off-by: liu chuansheng 
> ---
>  arch/x86/kernel/irq.c |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
> index 7ad683d..d44f782 100644
> --- a/arch/x86/kernel/irq.c
> +++ b/arch/x86/kernel/irq.c
> @@ -270,7 +270,7 @@ void fixup_irqs(void)
>  
> if (cpumask_any_and(affinity, cpu_online_mask) >= nr_cpu_ids) 
> {
> break_affinity = 1;
> -   affinity = cpu_all_mask;
> +   affinity = cpu_online_mask;
> }
>  
> chip = irq_data_get_irq_chip(data);


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86/fixup_irq: using the cpu_online_mask instead of cpu_all_mask

2012-08-19 Thread Yanmin Zhang
On Tue, 2012-08-14 at 06:55 +, Liu, Chuansheng wrote:
 From: liu chuansheng chuansheng@intel.com
 Subject: [PATCH] x86/fixup_irq: using the cpu_online_mask instead of 
 cpu_all_mask
 
 When one CPU is going down, and this CPU is the last one in irq affinity,
 current code is setting cpu_all_mask as the new affinity for that irq.
 
 But for some system the firmware maybe send the interrupt to each CPU
 in irq affinity averagely, and cpu_all_mask include all CPUs.
 
 Here replacing cpu_all_mask with cpu_online_mask, it is more reasonable
 and fittable.
It's a good finding. The issue exists on Medfield Android mobile. 


 
 Signed-off-by: liu chuansheng chuansheng@intel.com
 ---
  arch/x86/kernel/irq.c |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)
 
 diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
 index 7ad683d..d44f782 100644
 --- a/arch/x86/kernel/irq.c
 +++ b/arch/x86/kernel/irq.c
 @@ -270,7 +270,7 @@ void fixup_irqs(void)
  
 if (cpumask_any_and(affinity, cpu_online_mask) = nr_cpu_ids) 
 {
 break_affinity = 1;
 -   affinity = cpu_all_mask;
 +   affinity = cpu_online_mask;
 }
  
 chip = irq_data_get_irq_chip(data);


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[Fwd: [PATCH] x86/smp: Fix cpuN startup panic]

2012-08-09 Thread Yanmin Zhang
Peter,

What's your opinion about the patch? We hit it when enabling Medfield Android 
mobile.
This patch would put AP to a loop.

Another method to fix it is just to enlarge the wait time, for example, from 
2HZ to 10HZ.

Yanmin

 Forwarded Message 
> From: Chen, LinX Z 
> To: linux-kernel@vger.kernel.org
> Cc: mi...@redhat.com, t...@linutronix.de, h...@zytor.com,
> yanmin_zh...@linux.intel.com
> Subject: [PATCH] x86/smp: Fix cpuN startup panic
> Date: Tue, 07 Aug 2012 18:50:40 +0900
> 
> From: Lin Chen 
> 
> We hit a panic while doing cpu hotplug test.
> <0>[  627.982857] Kernel panic - not syncing: smp_callin: CPU1 started up but 
> did not get a callout!
> <0>[  627.982864]
> <4>[  627.982876] Pid: 0, comm: kworker/0:1 Tainted: G ...
> <4>[  627.982883] Call Trace:
> <4>[  627.982903]  [] panic+0x66/0x16c
> <4>[  627.982918]  [] ? default_get_apic_id+0x1c/0x40
> <4>[  627.982931]  [] start_secondary+0xda/0x252
> 
> During BSP bootup AP, it is possible that BSP be preempted before
> finishing STARTUP sequence of AP(set cpu_callout_mask) which maybe cause
> AP busy wait for it. At present, AP will wait for 2 seconds then panic.
> 
> This patch let AP waits until BSP finish the startup sequence and gives
> WARNING when BSP is preempted more than 2 seconds.
> 
> Signed-off-by: Yanmin Zhang 
> Signed-off-by: Lin Chen 
> ---
>   arch/x86/kernel/smpboot.c |   11 ++-
>   1 files changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 7c5a8c3..a9e3379 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -165,19 +165,20 @@ static void __cpuinit smp_callin(void)
>* Waiting 2s total for startup (udelay is not yet working)
>*/
>   timeout = jiffies + 2*HZ;
> - while (time_before(jiffies, timeout)) {
> + while (1) {
>   /*
>* Has the boot CPU finished it's STARTUP sequence?
>*/
>   if (cpumask_test_cpu(cpuid, cpu_callout_mask))
>   break;
>   cpu_relax();
> + if (!time_before(jiffies, timeout)) {
> + WARN(1, "%s: CPU%d started up but did not get a 
> callout!\n",
> + __func__, cpuid);
> + timeout = jiffies + 2*HZ;
> + }
>   }
> 
> - if (!time_before(jiffies, timeout)) {
> - panic("%s: CPU%d started up but did not get a callout!\n",
> -   __func__, cpuid);
> - }
> 
>   /*
>* the boot CPU has finished the init stage and is spinning
> -- 
> 1.7.1


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[Fwd: [PATCH] x86/smp: Fix cpuN startup panic]

2012-08-09 Thread Yanmin Zhang
Peter,

What's your opinion about the patch? We hit it when enabling Medfield Android 
mobile.
This patch would put AP to a loop.

Another method to fix it is just to enlarge the wait time, for example, from 
2HZ to 10HZ.

Yanmin

 Forwarded Message 
 From: Chen, LinX Z linx.z.c...@intel.com
 To: linux-kernel@vger.kernel.org
 Cc: mi...@redhat.com, t...@linutronix.de, h...@zytor.com,
 yanmin_zh...@linux.intel.com
 Subject: [PATCH] x86/smp: Fix cpuN startup panic
 Date: Tue, 07 Aug 2012 18:50:40 +0900
 
 From: Lin Chen linx.z.c...@intel.com
 
 We hit a panic while doing cpu hotplug test.
 0[  627.982857] Kernel panic - not syncing: smp_callin: CPU1 started up but 
 did not get a callout!
 0[  627.982864]
 4[  627.982876] Pid: 0, comm: kworker/0:1 Tainted: G ...
 4[  627.982883] Call Trace:
 4[  627.982903]  [c18f2977] panic+0x66/0x16c
 4[  627.982918]  [c12234cc] ? default_get_apic_id+0x1c/0x40
 4[  627.982931]  [c18ef96d] start_secondary+0xda/0x252
 
 During BSP bootup AP, it is possible that BSP be preempted before
 finishing STARTUP sequence of AP(set cpu_callout_mask) which maybe cause
 AP busy wait for it. At present, AP will wait for 2 seconds then panic.
 
 This patch let AP waits until BSP finish the startup sequence and gives
 WARNING when BSP is preempted more than 2 seconds.
 
 Signed-off-by: Yanmin Zhang yanmin_zh...@linux.intel.com
 Signed-off-by: Lin Chen linx.z.c...@intel.com
 ---
   arch/x86/kernel/smpboot.c |   11 ++-
   1 files changed, 6 insertions(+), 5 deletions(-)
 
 diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
 index 7c5a8c3..a9e3379 100644
 --- a/arch/x86/kernel/smpboot.c
 +++ b/arch/x86/kernel/smpboot.c
 @@ -165,19 +165,20 @@ static void __cpuinit smp_callin(void)
* Waiting 2s total for startup (udelay is not yet working)
*/
   timeout = jiffies + 2*HZ;
 - while (time_before(jiffies, timeout)) {
 + while (1) {
   /*
* Has the boot CPU finished it's STARTUP sequence?
*/
   if (cpumask_test_cpu(cpuid, cpu_callout_mask))
   break;
   cpu_relax();
 + if (!time_before(jiffies, timeout)) {
 + WARN(1, %s: CPU%d started up but did not get a 
 callout!\n,
 + __func__, cpuid);
 + timeout = jiffies + 2*HZ;
 + }
   }
 
 - if (!time_before(jiffies, timeout)) {
 - panic(%s: CPU%d started up but did not get a callout!\n,
 -   __func__, cpuid);
 - }
 
   /*
* the boot CPU has finished the init stage and is spinning
 -- 
 1.7.1


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86/smp: Fix cpuN startup panic

2012-08-07 Thread Yanmin Zhang
On Wed, 2012-08-08 at 00:33 +0800, Jiang Liu wrote:
> On 08/07/2012 05:50 PM, Chen, LinX Z wrote:
> > From: Lin Chen 
> > 
> > We hit a panic while doing cpu hotplug test.
> > <0>[  627.982857] Kernel panic - not syncing: smp_callin: CPU1 started up 
> > but did not get a callout!
> > <0>[  627.982864]
> > <4>[  627.982876] Pid: 0, comm: kworker/0:1 Tainted: G ...
> > <4>[  627.982883] Call Trace:
> > <4>[  627.982903]  [] panic+0x66/0x16c
> > <4>[  627.982918]  [] ? default_get_apic_id+0x1c/0x40
> > <4>[  627.982931]  [] start_secondary+0xda/0x252
> > 
> > During BSP bootup AP, it is possible that BSP be preempted before
> > finishing STARTUP sequence of AP(set cpu_callout_mask) which maybe cause
> > AP busy wait for it. At present, AP will wait for 2 seconds then panic.
> > 
> > This patch let AP waits until BSP finish the startup sequence and gives
> > WARNING when BSP is preempted more than 2 seconds.
> > 
> > Signed-off-by: Yanmin Zhang 
> > Signed-off-by: Lin Chen 
> > ---
> >  arch/x86/kernel/smpboot.c |   11 ++-
> >  1 files changed, 6 insertions(+), 5 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> > index 7c5a8c3..a9e3379 100644
> > --- a/arch/x86/kernel/smpboot.c
> > +++ b/arch/x86/kernel/smpboot.c
> > @@ -165,19 +165,20 @@ static void __cpuinit smp_callin(void)
> >   * Waiting 2s total for startup (udelay is not yet working)
> >   */
> >  timeout = jiffies + 2*HZ;
> > -while (time_before(jiffies, timeout)) {
> > +while (1) {
> Hi Yanmin,
> 
>   Seems a little risky, what if a slave CPU can't be booted due to 
> hardware errors?
Slave CPU runs the loop. Basically, there is a handshake between BSP and AP.
The patch doesn't change BSP codes. So when slave CPU fails, BSP still goes 
ahead
and kernel still works.
 
>   Regards!
>   Gerry
> 
> >  /*
> >   * Has the boot CPU finished it's STARTUP sequence?
> >   */
> >  if (cpumask_test_cpu(cpuid, cpu_callout_mask))
> >  break;
> >  cpu_relax();
> > +if (!time_before(jiffies, timeout)) {
> > +WARN(1, "%s: CPU%d started up but did not get a callout!\n",
> > +__func__, cpuid);
> > +timeout = jiffies + 2*HZ;
> > +}
> >  }
> > 
> > -if (!time_before(jiffies, timeout)) {
> > -panic("%s: CPU%d started up but did not get a callout!\n",
> > -  __func__, cpuid);
> > -}
> > 
> >  /*
> >   * the boot CPU has finished the init stage and is spinning
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86/smp: Fix cpuN startup panic

2012-08-07 Thread Yanmin Zhang
On Wed, 2012-08-08 at 00:33 +0800, Jiang Liu wrote:
 On 08/07/2012 05:50 PM, Chen, LinX Z wrote:
  From: Lin Chen linx.z.c...@intel.com
  
  We hit a panic while doing cpu hotplug test.
  0[  627.982857] Kernel panic - not syncing: smp_callin: CPU1 started up 
  but did not get a callout!
  0[  627.982864]
  4[  627.982876] Pid: 0, comm: kworker/0:1 Tainted: G ...
  4[  627.982883] Call Trace:
  4[  627.982903]  [c18f2977] panic+0x66/0x16c
  4[  627.982918]  [c12234cc] ? default_get_apic_id+0x1c/0x40
  4[  627.982931]  [c18ef96d] start_secondary+0xda/0x252
  
  During BSP bootup AP, it is possible that BSP be preempted before
  finishing STARTUP sequence of AP(set cpu_callout_mask) which maybe cause
  AP busy wait for it. At present, AP will wait for 2 seconds then panic.
  
  This patch let AP waits until BSP finish the startup sequence and gives
  WARNING when BSP is preempted more than 2 seconds.
  
  Signed-off-by: Yanmin Zhang yanmin_zh...@linux.intel.com
  Signed-off-by: Lin Chen linx.z.c...@intel.com
  ---
   arch/x86/kernel/smpboot.c |   11 ++-
   1 files changed, 6 insertions(+), 5 deletions(-)
  
  diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
  index 7c5a8c3..a9e3379 100644
  --- a/arch/x86/kernel/smpboot.c
  +++ b/arch/x86/kernel/smpboot.c
  @@ -165,19 +165,20 @@ static void __cpuinit smp_callin(void)
* Waiting 2s total for startup (udelay is not yet working)
*/
   timeout = jiffies + 2*HZ;
  -while (time_before(jiffies, timeout)) {
  +while (1) {
 Hi Yanmin,
 
   Seems a little risky, what if a slave CPU can't be booted due to 
 hardware errors?
Slave CPU runs the loop. Basically, there is a handshake between BSP and AP.
The patch doesn't change BSP codes. So when slave CPU fails, BSP still goes 
ahead
and kernel still works.
 
   Regards!
   Gerry
 
   /*
* Has the boot CPU finished it's STARTUP sequence?
*/
   if (cpumask_test_cpu(cpuid, cpu_callout_mask))
   break;
   cpu_relax();
  +if (!time_before(jiffies, timeout)) {
  +WARN(1, %s: CPU%d started up but did not get a callout!\n,
  +__func__, cpuid);
  +timeout = jiffies + 2*HZ;
  +}
   }
  
  -if (!time_before(jiffies, timeout)) {
  -panic(%s: CPU%d started up but did not get a callout!\n,
  -  __func__, cpuid);
  -}
  
   /*
* the boot CPU has finished the init stage and is spinning
 


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/