RE: [PATCH 1/2 v2] pci-hyperv: properly handle pci bus remove

2016-10-04 Thread KY Srinivasan


> -----Original Message-----
> From: Bjorn Helgaas [mailto:helg...@kernel.org]
> Sent: Tuesday, September 27, 2016 12:30 PM
> To: Long Li 
> Cc: KY Srinivasan ; Haiyang Zhang
> ; Bjorn Helgaas ;
> de...@linuxdriverproject.org; linux-...@vger.kernel.org; linux-
> ker...@vger.kernel.org; Long Li 
> Subject: Re: [PATCH 1/2 v2] pci-hyperv: properly handle pci bus remove
> 
> On Wed, Sep 14, 2016 at 07:10:01PM -0700, Long Li wrote:
> > From: Long Li 
> >
> > hv_pci_devices_present is called in hv_pci_remove when we remove a PCI
> > device from host (e.g. by disabling SRIOV on a device). In hv_pci_remove,
> > the bus is already removed before the call, so we don't need to rescan the
> > bus in the workqueue scheduled from hv_pci_devices_present. By
> > introducing status hv_pcibus_removed, we can avoid this situation.
> >
> > The patch fixes the following kernel panic.
> >
> > [  383.853124] Workqueue: events pci_devices_present_work [pci_hyperv]
> > [  383.853124] task: 88007f5f8000 ti: 88007f60 task.ti:
> > 88007f60
> > [  383.853124] RIP: 0010:[]  []
> > pci_is_pcie+0x6/0x20
> > [  383.853124] RSP: 0018:88007f603d38  EFLAGS: 00010206
> > [  383.853124] RAX: 88007f5f8000 RBX: 642f3d4854415056 RCX:
> > 88007f603fd8
> > [  383.853124] RDX:  RSI:  RDI:
> > 642f3d4854415056
> > [  383.853124] RBP: 88007f603d68 R08: 0246 R09:
> > a045eb9e
> > [  383.853124] R10: 88007b419a80 R11: ea0001c0ef40 R12:
> > 880003ee1c00
> > [  383.853124] R13: 63702f30303a3137 R14:  R15:
> > 0246
> > [  383.853124] FS:  () GS:88007b40()
> > knlGS:
> > [  383.853124] CS:  0010 DS:  ES:  CR0: 80050033
> > [  383.853124] CR2: 7f68b3f52350 CR3: 03546000 CR4:
> > 000406f0
> > [  383.853124] DR0:  DR1:  DR2:
> > 
> > [  383.853124] DR3:  DR6: 0ff0 DR7:
> > 0400
> > [  383.853124] Stack:
> > [  383.853124]  88007f603d68 8134db17 0008
> > 880003ee1c00
> > [  383.853124]  63702f30303a3137 880003d8edb8 88007f603da0
> > 8134ee2d
> > [  383.853124]  880003d8ed00 88007f603dd8 880075fec320
> > 880003d8edb8
> > [  383.853124] Call Trace:
> > [  383.853124]  [] ? pci_scan_slot+0x27/0x140
> > [  383.853124]  [] pci_scan_child_bus+0x3d/0x150
> > [  383.853124]  []
> > pci_devices_present_work+0x3ea/0x400 [pci_hyperv]
> > [  383.853124]  [] process_one_work+0x17b/0x470
> > [  383.853124]  [] worker_thread+0x126/0x410
> > [  383.853124]  [] ? rescuer_thread+0x460/0x460
> > [  383.853124]  [] kthread+0xcf/0xe0
> > [  383.853124]  [] ?
> > kthread_create_on_node+0x140/0x140
> > [  383.853124]  [] ret_from_fork+0x58/0x90
> > [  383.853124]  [] ?
> > kthread_create_on_node+0x140/0x140
> > [  383.853124] Code: 89 e5 5d 25 f0 00 00 00 c1 f8 04 c3 66 0f 1f 84 00
> > 00 00 00 00 66 66 66 66 90 55 0f b6 47 4a 48 89 e5 5d c3 90 66 66 66 66
> > 90 55 <80> 7f 4a 00 48 89 e5 5d 0f 95 c0 c3 0f 1f 40 00 66 2e 0f 1f 84
> > [  383.853124] RIP  [] pci_is_pcie+0x6/0x20
> > [  383.853124]  RSP 
> 
> Personally, I would remove the timestamps and addresses from this trace
> because I don't think they contribute to diagnosing the problem.
> 
> > Signed-off-by: Long Li 

Acked-by: KY Srinivasan 

Thanks,

K. Y
> 
> I'm ready to apply these but am waiting for an ack from the maintainers
> listed in MAINTAINERS (feel free to update that if it's out of date).
> 
> > ---
> >  drivers/pci/host/pci-hyperv.c | 20 +---
> >  1 file changed, 17 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/pci/host/pci-hyperv.c b/drivers/pci/host/pci-hyperv.c
> > index a8deeca..4a37598 100644
> > --- a/drivers/pci/host/pci-hyperv.c
> > +++ b/drivers/pci/host/pci-hyperv.c
> > @@ -348,6 +348,7 @@ enum hv_pcibus_state {
> > hv_pcibus_init = 0,
> > hv_pcibus_probed,
> > hv_pcibus_installed,
> > +   hv_pcibus_removed,
> > hv_pcibus_maximum
> >  };
> >
> > @@ -1481,13 +1482,24 @@ static void pci_devices_present_work(struct work_struct *work)
> > put_pcichild(hpdev, hv_pcidev_ref_initial);
> > }
> >
> > -   /* Tell the core to rescan bus because there may have been changes. */
> > -   if (hbus->state == hv

RE: [PATCH 1/2 v2] pci-hyperv: properly handle pci bus remove

2016-09-27 Thread Long Li


> -----Original Message-----
> From: Bjorn Helgaas [mailto:helg...@kernel.org]
> Sent: Tuesday, September 27, 2016 12:30 PM
> To: Long Li 
> Cc: KY Srinivasan ; Haiyang Zhang
> ; Bjorn Helgaas ;
> de...@linuxdriverproject.org; linux-...@vger.kernel.org; linux-
> ker...@vger.kernel.org; Long Li 
> Subject: Re: [PATCH 1/2 v2] pci-hyperv: properly handle pci bus remove
> 
> On Wed, Sep 14, 2016 at 07:10:01PM -0700, Long Li wrote:
> > From: Long Li 
> >
> > hv_pci_devices_present is called in hv_pci_remove when we remove a PCI
> > device from host (e.g. by disabling SRIOV on a device). In hv_pci_remove,
> > the bus is already removed before the call, so we don't need to rescan the
> > bus in the workqueue scheduled from hv_pci_devices_present. By
> > introducing status hv_pcibus_removed, we can avoid this situation.
> >
> > The patch fixes the following kernel panic.
> >
> > [  383.853124] Workqueue: events pci_devices_present_work [pci_hyperv]
> > [  383.853124] task: 88007f5f8000 ti: 88007f60 task.ti:
> > 88007f60
> > [  383.853124] RIP: 0010:[]  []
> > pci_is_pcie+0x6/0x20
> > [  383.853124] RSP: 0018:88007f603d38  EFLAGS: 00010206
> > [  383.853124] RAX: 88007f5f8000 RBX: 642f3d4854415056 RCX:
> > 88007f603fd8
> > [  383.853124] RDX:  RSI:  RDI:
> > 642f3d4854415056
> > [  383.853124] RBP: 88007f603d68 R08: 0246 R09:
> > a045eb9e
> > [  383.853124] R10: 88007b419a80 R11: ea0001c0ef40 R12:
> > 880003ee1c00
> > [  383.853124] R13: 63702f30303a3137 R14:  R15:
> > 0246
> > [  383.853124] FS:  () GS:88007b40()
> > knlGS:
> > [  383.853124] CS:  0010 DS:  ES:  CR0: 80050033
> > [  383.853124] CR2: 7f68b3f52350 CR3: 03546000 CR4:
> > 000406f0
> > [  383.853124] DR0:  DR1:  DR2:
> > 
> > [  383.853124] DR3:  DR6: 0ff0 DR7:
> > 0400
> > [  383.853124] Stack:
> > [  383.853124]  88007f603d68 8134db17 0008
> > 880003ee1c00
> > [  383.853124]  63702f30303a3137 880003d8edb8 88007f603da0
> > 8134ee2d
> > [  383.853124]  880003d8ed00 88007f603dd8 880075fec320
> > 880003d8edb8
> > [  383.853124] Call Trace:
> > [  383.853124]  [] ? pci_scan_slot+0x27/0x140
> > [  383.853124]  [] pci_scan_child_bus+0x3d/0x150
> > [  383.853124]  []
> > pci_devices_present_work+0x3ea/0x400 [pci_hyperv]
> > [  383.853124]  [] process_one_work+0x17b/0x470
> > [  383.853124]  [] worker_thread+0x126/0x410
> > [  383.853124]  [] ? rescuer_thread+0x460/0x460
> > [  383.853124]  [] kthread+0xcf/0xe0
> > [  383.853124]  [] ?
> > kthread_create_on_node+0x140/0x140
> > [  383.853124]  [] ret_from_fork+0x58/0x90
> > [  383.853124]  [] ?
> > kthread_create_on_node+0x140/0x140
> > [  383.853124] Code: 89 e5 5d 25 f0 00 00 00 c1 f8 04 c3 66 0f 1f 84 00
> > 00 00 00 00 66 66 66 66 90 55 0f b6 47 4a 48 89 e5 5d c3 90 66 66 66 66
> > 90 55 <80> 7f 4a 00 48 89 e5 5d 0f 95 c0 c3 0f 1f 40 00 66 2e 0f 1f 84
> > [  383.853124] RIP  [] pci_is_pcie+0x6/0x20
> > [  383.853124]  RSP 
> 
> Personally, I would remove the timestamps and addresses from this trace
> because I don't think they contribute to diagnosing the problem.

Thanks Bjorn. I will remove those kernel traces and send a v3 patch.

> 
> > Signed-off-by: Long Li 
> 
> I'm ready to apply these but am waiting for an ack from the maintainers listed
> in MAINTAINERS (feel free to update that if it's out of date).
> 
> > ---
> >  drivers/pci/host/pci-hyperv.c | 20 +---
> >  1 file changed, 17 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/pci/host/pci-hyperv.c b/drivers/pci/host/pci-hyperv.c
> > index a8deeca..4a37598 100644
> > --- a/drivers/pci/host/pci-hyperv.c
> > +++ b/drivers/pci/host/pci-hyperv.c
> > @@ -348,6 +348,7 @@ enum hv_pcibus_state {
> > hv_pcibus_init = 0,
> > hv_pcibus_probed,
> > hv_pcibus_installed,
> > +   hv_pcibus_removed,
> > hv_pcibus_maximum
> >  };
> >
> > @@ -1481,13 +1482,24 @@ static void pci_devices_present_work(struct work_struct *work)
> > put_pcichild(hpdev, hv_pcidev_ref_initial);
> > }
> >
> > -   /* Tell the core to rescan bus because there may have been changes. */
>

Re: [PATCH 1/2 v2] pci-hyperv: properly handle pci bus remove

2016-09-27 Thread Bjorn Helgaas
On Wed, Sep 14, 2016 at 07:10:01PM -0700, Long Li wrote:
> From: Long Li 
> 
> hv_pci_devices_present is called in hv_pci_remove when we remove a PCI device 
> from host (e.g. by disabling SRIOV on a device). In hv_pci_remove, the bus is 
> already removed before the call, so we don't need to rescan the bus in the 
> workqueue scheduled from hv_pci_devices_present. By introducing status 
> hv_pcibus_removed, we can avoid this situation.
> 
> The patch fixes the following kernel panic.
> 
> [  383.853124] Workqueue: events pci_devices_present_work [pci_hyperv]
> [  383.853124] task: 88007f5f8000 ti: 88007f60 task.ti:
> 88007f60
> [  383.853124] RIP: 0010:[]  []
> pci_is_pcie+0x6/0x20
> [  383.853124] RSP: 0018:88007f603d38  EFLAGS: 00010206
> [  383.853124] RAX: 88007f5f8000 RBX: 642f3d4854415056 RCX:
> 88007f603fd8
> [  383.853124] RDX:  RSI:  RDI:
> 642f3d4854415056
> [  383.853124] RBP: 88007f603d68 R08: 0246 R09:
> a045eb9e
> [  383.853124] R10: 88007b419a80 R11: ea0001c0ef40 R12:
> 880003ee1c00
> [  383.853124] R13: 63702f30303a3137 R14:  R15:
> 0246
> [  383.853124] FS:  () GS:88007b40()
> knlGS:
> [  383.853124] CS:  0010 DS:  ES:  CR0: 80050033
> [  383.853124] CR2: 7f68b3f52350 CR3: 03546000 CR4:
> 000406f0
> [  383.853124] DR0:  DR1:  DR2:
> 
> [  383.853124] DR3:  DR6: 0ff0 DR7:
> 0400
> [  383.853124] Stack:
> [  383.853124]  88007f603d68 8134db17 0008
> 880003ee1c00
> [  383.853124]  63702f30303a3137 880003d8edb8 88007f603da0
> 8134ee2d
> [  383.853124]  880003d8ed00 88007f603dd8 880075fec320
> 880003d8edb8
> [  383.853124] Call Trace:
> [  383.853124]  [] ? pci_scan_slot+0x27/0x140
> [  383.853124]  [] pci_scan_child_bus+0x3d/0x150
> [  383.853124]  []
> pci_devices_present_work+0x3ea/0x400 [pci_hyperv]
> [  383.853124]  [] process_one_work+0x17b/0x470
> [  383.853124]  [] worker_thread+0x126/0x410
> [  383.853124]  [] ? rescuer_thread+0x460/0x460
> [  383.853124]  [] kthread+0xcf/0xe0
> [  383.853124]  [] ?
> kthread_create_on_node+0x140/0x140
> [  383.853124]  [] ret_from_fork+0x58/0x90
> [  383.853124]  [] ?
> kthread_create_on_node+0x140/0x140
> [  383.853124] Code: 89 e5 5d 25 f0 00 00 00 c1 f8 04 c3 66 0f 1f 84 00
> 00 00 00 00 66 66 66 66 90 55 0f b6 47 4a 48 89 e5 5d c3 90 66 66 66 66
> 90 55 <80> 7f 4a 00 48 89 e5 5d 0f 95 c0 c3 0f 1f 40 00 66 2e 0f 1f 84
> [  383.853124] RIP  [] pci_is_pcie+0x6/0x20
> [  383.853124]  RSP 

Personally, I would remove the timestamps and addresses from this trace
because I don't think they contribute to diagnosing the problem.
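
As a rough illustration of the trimming suggested above, here is a minimal C sketch that skips a leading printk timestamp such as "[  383.853124] " on a log line. The helper name skip_printk_timestamp is hypothetical and not part of any kernel tool; it handles only the timestamp prefix and leaves bracketed addresses and module names untouched.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper: return a pointer just past a leading
 * "[<spaces><digits>.<digits>] " timestamp, or the unchanged
 * line if it does not start with one.
 */
static const char *skip_printk_timestamp(const char *line)
{
	const char *p = line;

	assert(line != NULL);

	if (*p != '[')
		return line;
	p++;
	while (*p == ' ')
		p++;

	/* Seconds: at least one digit is required. */
	if (!isdigit((unsigned char)*p))
		return line;
	while (isdigit((unsigned char)*p))
		p++;

	if (*p != '.')
		return line;
	p++;

	/* Fractional part: at least one digit is required. */
	if (!isdigit((unsigned char)*p))
		return line;
	while (isdigit((unsigned char)*p))
		p++;

	if (*p != ']')
		return line;
	p++;

	/* Drop the single space that follows the closing bracket. */
	if (*p == ' ')
		p++;
	return p;
}
```

Passing each line of the trace through this helper before pasting it into a commit log would leave lines such as "Call Trace:" and "pci_scan_child_bus+0x3d/0x150" without the timestamp column.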

> Signed-off-by: Long Li 

I'm ready to apply these but am waiting for an ack from the maintainers
listed in MAINTAINERS (feel free to update that if it's out of date).
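
For readers following the thread, the state handling described in the commit message can be sketched outside the kernel as below. The enum values are taken from the patch; struct fake_hbus, enum work_action and devices_present_action are hypothetical stand-ins used only for illustration, and the real driver calls pci_scan_child_bus() and survey_child_resources() rather than returning an action code.

```c
#include <assert.h>
#include <stddef.h>

/* States as listed in the patch. */
enum hv_pcibus_state {
	hv_pcibus_init = 0,
	hv_pcibus_probed,
	hv_pcibus_installed,
	hv_pcibus_removed,
	hv_pcibus_maximum
};

/* Hypothetical: what the work item would do for a given state. */
enum work_action {
	ACTION_RESCAN_BUS,	/* stands in for pci_scan_child_bus() */
	ACTION_SURVEY,		/* stands in for survey_child_resources() */
	ACTION_NONE		/* bus already removed: touch nothing */
};

/* Hypothetical stand-in for the driver's per-bus structure. */
struct fake_hbus {
	enum hv_pcibus_state state;
};

/*
 * Mirror of the switch in the patch: only an installed bus is
 * rescanned, so a work item that runs after the bus was removed
 * never touches the freed PCI bus.
 */
static enum work_action devices_present_action(const struct fake_hbus *hbus)
{
	assert(hbus != NULL);

	switch (hbus->state) {
	case hv_pcibus_installed:
		return ACTION_RESCAN_BUS;
	case hv_pcibus_init:
	case hv_pcibus_probed:
		return ACTION_SURVEY;
	default:
		return ACTION_NONE;
	}
}
```

With the old if/else, any state other than hv_pcibus_installed fell through to the survey path; with the switch, a removed bus takes the default branch and the work item does nothing with it.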

> ---
>  drivers/pci/host/pci-hyperv.c | 20 +---
>  1 file changed, 17 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/pci/host/pci-hyperv.c b/drivers/pci/host/pci-hyperv.c
> index a8deeca..4a37598 100644
> --- a/drivers/pci/host/pci-hyperv.c
> +++ b/drivers/pci/host/pci-hyperv.c
> @@ -348,6 +348,7 @@ enum hv_pcibus_state {
>  	hv_pcibus_init = 0,
>  	hv_pcibus_probed,
>  	hv_pcibus_installed,
> +	hv_pcibus_removed,
>  	hv_pcibus_maximum
>  };
>  
> @@ -1481,13 +1482,24 @@ static void pci_devices_present_work(struct work_struct *work)
>  		put_pcichild(hpdev, hv_pcidev_ref_initial);
>  	}
>  
> -	/* Tell the core to rescan bus because there may have been changes. */
> -	if (hbus->state == hv_pcibus_installed) {
> +	switch (hbus->state) {
> +	case hv_pcibus_installed:
> +		/*
> +		 * Tell the core to rescan bus
> +		 * because there may have been changes.
> +		 */
>  		pci_lock_rescan_remove();
>  		pci_scan_child_bus(hbus->pci_bus);
>  		pci_unlock_rescan_remove();
> -	} else {
> +		break;
> +
> +	case hv_pcibus_init:
> +	case hv_pcibus_probed:
>  		survey_child_resources(hbus);
> +		break;
> +
> +	default:
> +		break;
>  	}
>  
>  	up(&hbus->enum_sem);
> @@ -2163,6 +2175,7 @@ static int hv_pci_probe(struct hv_device *hdev,
>  	hbus = kzalloc(sizeof(*hbus), GFP_KERNEL);
>  	if (!hbus)
>  		return -ENOMEM;
> +	hbus->state = hv_pcibus_init;
>  
>  	/*
>  	 * The PCI bus "domain" is what is called "segment" in ACPI and
> @@ -2305,6 +2318,7 @@ static int hv_pci_remove(struct hv_device *hdev)
>  		pci_stop_root_bus(hbus->pci_bus);
>  		pci_remove_root_bus(hbus->pci_bus);
>  		pci_unlock_rescan_remove();
> +		hbus->state =