Re: [Very RFC 40/46] powernv/npu: Don't drop refcount when looking up GPU pci_devs
On Wed, 27 Nov 2019 10:47:45 +0100 Frederic Barrat wrote: > > > Le 27/11/2019 à 10:33, Greg Kurz a écrit : > > On Wed, 27 Nov 2019 10:10:13 +0100 > > Frederic Barrat wrote: > > > >> > >> > >> Le 27/11/2019 à 09:24, Greg Kurz a écrit : > >>> On Wed, 27 Nov 2019 18:09:40 +1100 > >>> Alexey Kardashevskiy wrote: > >>> > > > On 20/11/2019 12:28, Oliver O'Halloran wrote: > > The comment here implies that we don't need to take a ref to the pci_dev > > because the ioda_pe will always have one. This implies that the current > > expection is that the pci_dev for an NPU device will *never* be torn > > down since the ioda_pe having a ref to the device will prevent the > > release function from being called. > > > > In other words, the desired behaviour here appears to be leaking a ref. > > > > Nice! > > > There is a history: https://patchwork.ozlabs.org/patch/1088078/ > > We did not fix anything in particular then, we do not seem to be fixing > anything now (in other words - we cannot test it in a normal natural > way). I'd drop this one. > > >>> > >>> Yeah, I didn't fix anything at the time. Just reverted to the ref > >>> count behavior we had before: > >>> > >>> https://patchwork.ozlabs.org/patch/829172/ > >>> > >>> Frederic recently posted his take on the same topic from the OpenCAPI > >>> point of view: > >>> > >>> http://patchwork.ozlabs.org/patch/1198947/ > >>> > >>> He seems to indicate the NPU devices as the real culprit because > >>> nobody ever cared for them to be removable. Fixing that seems be > >>> a chore nobody really wants to address obviously... :-\ > >> > >> > >> I had taken a stab at not leaking a ref for the nvlink devices and do > >> the proper thing regarding ref counting (i.e. fixing all the callers of > >> get_pci_dev() to drop the reference when they were done). With that, I > >> could see that the ref count of the nvlink devices could drop to 0 > >> (calling remove for the device in /sys) and that the devices could go away. > >> > >> But then, I realized it's not necessarily desirable at this point. There > >> are several comments in the code saying the npu devices (for nvlink) > >> don't go away, there's no device release callback defined when it seems > >> there should be, at least to handle releasing PEs All in all, it > >> seems that some work would be needed. And if it hasn't been required by > >> now... > >> > > > > If everyone is ok with leaking a reference in the NPU case, I guess > > this isn't a problem. But if we move forward with Oliver's patch, a > > pci_dev_put() would be needed for OpenCAPI, correct ? > > > No, these code paths are nvlink-only. > Oh yes indeed. Then this patch and yours fit well together :) >Fred > > > > >> Fred > >> > >> > > > > > > Signed-off-by: Oliver O'Halloran > > --- > >arch/powerpc/platforms/powernv/npu-dma.c | 11 +++ > >1 file changed, 3 insertions(+), 8 deletions(-) > > > > diff --git a/arch/powerpc/platforms/powernv/npu-dma.c > > b/arch/powerpc/platforms/powernv/npu-dma.c > > index 72d3749da02c..2eb6e6d45a98 100644 > > --- a/arch/powerpc/platforms/powernv/npu-dma.c > > +++ b/arch/powerpc/platforms/powernv/npu-dma.c > > @@ -28,15 +28,10 @@ static struct pci_dev *get_pci_dev(struct > > device_node *dn) > > break; > > > > /* > > -* pci_get_domain_bus_and_slot() increased the reference count > > of > > -* the PCI device, but callers don't need that actually as the > > PE > > -* already holds a reference to the device. Since callers aren't > > -* aware of the reference count change, call pci_dev_put() now > > to > > -* avoid leaks. > > +* NB: for_each_pci_dev() elevates the pci_dev refcount. > > +* Caller is responsible for dropping the ref when it's > > +* finished with it. > > */ > > - if (pdev) > > - pci_dev_put(pdev); > > - > > return pdev; > >} > > > > > > >>> > >> > > >
Re: [Very RFC 40/46] powernv/npu: Don't drop refcount when looking up GPU pci_devs
On Wed, 27 Nov 2019 10:10:13 +0100 Frederic Barrat wrote: > > > Le 27/11/2019 à 09:24, Greg Kurz a écrit : > > On Wed, 27 Nov 2019 18:09:40 +1100 > > Alexey Kardashevskiy wrote: > > > >> > >> > >> On 20/11/2019 12:28, Oliver O'Halloran wrote: > >>> The comment here implies that we don't need to take a ref to the pci_dev > >>> because the ioda_pe will always have one. This implies that the current > >>> expection is that the pci_dev for an NPU device will *never* be torn > >>> down since the ioda_pe having a ref to the device will prevent the > >>> release function from being called. > >>> > >>> In other words, the desired behaviour here appears to be leaking a ref. > >>> > >>> Nice! > >> > >> > >> There is a history: https://patchwork.ozlabs.org/patch/1088078/ > >> > >> We did not fix anything in particular then, we do not seem to be fixing > >> anything now (in other words - we cannot test it in a normal natural > >> way). I'd drop this one. > >> > > > > Yeah, I didn't fix anything at the time. Just reverted to the ref > > count behavior we had before: > > > > https://patchwork.ozlabs.org/patch/829172/ > > > > Frederic recently posted his take on the same topic from the OpenCAPI > > point of view: > > > > http://patchwork.ozlabs.org/patch/1198947/ > > > > He seems to indicate the NPU devices as the real culprit because > > nobody ever cared for them to be removable. Fixing that seems be > > a chore nobody really wants to address obviously... :-\ > > > I had taken a stab at not leaking a ref for the nvlink devices and do > the proper thing regarding ref counting (i.e. fixing all the callers of > get_pci_dev() to drop the reference when they were done). With that, I > could see that the ref count of the nvlink devices could drop to 0 > (calling remove for the device in /sys) and that the devices could go away. > > But then, I realized it's not necessarily desirable at this point. There > are several comments in the code saying the npu devices (for nvlink) > don't go away, there's no device release callback defined when it seems > there should be, at least to handle releasing PEs All in all, it > seems that some work would be needed. And if it hasn't been required by > now... > If everyone is ok with leaking a reference in the NPU case, I guess this isn't a problem. But if we move forward with Oliver's patch, a pci_dev_put() would be needed for OpenCAPI, correct ? >Fred > > > >> > >> > >>> > >>> Signed-off-by: Oliver O'Halloran > >>> --- > >>> arch/powerpc/platforms/powernv/npu-dma.c | 11 +++ > >>> 1 file changed, 3 insertions(+), 8 deletions(-) > >>> > >>> diff --git a/arch/powerpc/platforms/powernv/npu-dma.c > >>> b/arch/powerpc/platforms/powernv/npu-dma.c > >>> index 72d3749da02c..2eb6e6d45a98 100644 > >>> --- a/arch/powerpc/platforms/powernv/npu-dma.c > >>> +++ b/arch/powerpc/platforms/powernv/npu-dma.c > >>> @@ -28,15 +28,10 @@ static struct pci_dev *get_pci_dev(struct device_node > >>> *dn) > >>> break; > >>> > >>> /* > >>> - * pci_get_domain_bus_and_slot() increased the reference count of > >>> - * the PCI device, but callers don't need that actually as the PE > >>> - * already holds a reference to the device. Since callers aren't > >>> - * aware of the reference count change, call pci_dev_put() now to > >>> - * avoid leaks. > >>> + * NB: for_each_pci_dev() elevates the pci_dev refcount. > >>> + * Caller is responsible for dropping the ref when it's > >>> + * finished with it. > >>>*/ > >>> - if (pdev) > >>> - pci_dev_put(pdev); > >>> - > >>> return pdev; > >>> } > >>> > >>> > >> > > >
Re: [Very RFC 40/46] powernv/npu: Don't drop refcount when looking up GPU pci_devs
On Wed, 27 Nov 2019 20:40:00 +1100 "Oliver O'Halloran" wrote: > On Wed, Nov 27, 2019 at 8:34 PM Greg Kurz wrote: > > > > > > If everyone is ok with leaking a reference in the NPU case, I guess > > this isn't a problem. But if we move forward with Oliver's patch, a > > pci_dev_put() would be needed for OpenCAPI, correct ? > > Yes, but I think that's fair enough. By convention it's the callers > responsibility to drop the ref when it calls a function that returns a > refcounted object. Doing anything else creates a race condition since > the object's count could drop to zero before the caller starts using > it. > Sure, you're right, especially with Frederic's patch that drops the pci_dev_get(dev) in pnv_ioda_setup_dev_PE(). > Oliver
Re: [Very RFC 40/46] powernv/npu: Don't drop refcount when looking up GPU pci_devs
Le 27/11/2019 à 10:33, Greg Kurz a écrit : On Wed, 27 Nov 2019 10:10:13 +0100 Frederic Barrat wrote: Le 27/11/2019 à 09:24, Greg Kurz a écrit : On Wed, 27 Nov 2019 18:09:40 +1100 Alexey Kardashevskiy wrote: On 20/11/2019 12:28, Oliver O'Halloran wrote: The comment here implies that we don't need to take a ref to the pci_dev because the ioda_pe will always have one. This implies that the current expection is that the pci_dev for an NPU device will *never* be torn down since the ioda_pe having a ref to the device will prevent the release function from being called. In other words, the desired behaviour here appears to be leaking a ref. Nice! There is a history: https://patchwork.ozlabs.org/patch/1088078/ We did not fix anything in particular then, we do not seem to be fixing anything now (in other words - we cannot test it in a normal natural way). I'd drop this one. Yeah, I didn't fix anything at the time. Just reverted to the ref count behavior we had before: https://patchwork.ozlabs.org/patch/829172/ Frederic recently posted his take on the same topic from the OpenCAPI point of view: http://patchwork.ozlabs.org/patch/1198947/ He seems to indicate the NPU devices as the real culprit because nobody ever cared for them to be removable. Fixing that seems be a chore nobody really wants to address obviously... :-\ I had taken a stab at not leaking a ref for the nvlink devices and do the proper thing regarding ref counting (i.e. fixing all the callers of get_pci_dev() to drop the reference when they were done). With that, I could see that the ref count of the nvlink devices could drop to 0 (calling remove for the device in /sys) and that the devices could go away. But then, I realized it's not necessarily desirable at this point. There are several comments in the code saying the npu devices (for nvlink) don't go away, there's no device release callback defined when it seems there should be, at least to handle releasing PEs All in all, it seems that some work would be needed. And if it hasn't been required by now... If everyone is ok with leaking a reference in the NPU case, I guess this isn't a problem. But if we move forward with Oliver's patch, a pci_dev_put() would be needed for OpenCAPI, correct ? No, these code paths are nvlink-only. Fred Fred Signed-off-by: Oliver O'Halloran --- arch/powerpc/platforms/powernv/npu-dma.c | 11 +++ 1 file changed, 3 insertions(+), 8 deletions(-) diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c index 72d3749da02c..2eb6e6d45a98 100644 --- a/arch/powerpc/platforms/powernv/npu-dma.c +++ b/arch/powerpc/platforms/powernv/npu-dma.c @@ -28,15 +28,10 @@ static struct pci_dev *get_pci_dev(struct device_node *dn) break; /* -* pci_get_domain_bus_and_slot() increased the reference count of -* the PCI device, but callers don't need that actually as the PE -* already holds a reference to the device. Since callers aren't -* aware of the reference count change, call pci_dev_put() now to -* avoid leaks. +* NB: for_each_pci_dev() elevates the pci_dev refcount. +* Caller is responsible for dropping the ref when it's +* finished with it. */ - if (pdev) - pci_dev_put(pdev); - return pdev; }
Re: [Very RFC 40/46] powernv/npu: Don't drop refcount when looking up GPU pci_devs
On Wed, Nov 27, 2019 at 8:34 PM Greg Kurz wrote: > > > If everyone is ok with leaking a reference in the NPU case, I guess > this isn't a problem. But if we move forward with Oliver's patch, a > pci_dev_put() would be needed for OpenCAPI, correct ? Yes, but I think that's fair enough. By convention it's the callers responsibility to drop the ref when it calls a function that returns a refcounted object. Doing anything else creates a race condition since the object's count could drop to zero before the caller starts using it. Oliver
Re: [Very RFC 40/46] powernv/npu: Don't drop refcount when looking up GPU pci_devs
Le 27/11/2019 à 09:24, Greg Kurz a écrit : On Wed, 27 Nov 2019 18:09:40 +1100 Alexey Kardashevskiy wrote: On 20/11/2019 12:28, Oliver O'Halloran wrote: The comment here implies that we don't need to take a ref to the pci_dev because the ioda_pe will always have one. This implies that the current expection is that the pci_dev for an NPU device will *never* be torn down since the ioda_pe having a ref to the device will prevent the release function from being called. In other words, the desired behaviour here appears to be leaking a ref. Nice! There is a history: https://patchwork.ozlabs.org/patch/1088078/ We did not fix anything in particular then, we do not seem to be fixing anything now (in other words - we cannot test it in a normal natural way). I'd drop this one. Yeah, I didn't fix anything at the time. Just reverted to the ref count behavior we had before: https://patchwork.ozlabs.org/patch/829172/ Frederic recently posted his take on the same topic from the OpenCAPI point of view: http://patchwork.ozlabs.org/patch/1198947/ He seems to indicate the NPU devices as the real culprit because nobody ever cared for them to be removable. Fixing that seems be a chore nobody really wants to address obviously... :-\ I had taken a stab at not leaking a ref for the nvlink devices and do the proper thing regarding ref counting (i.e. fixing all the callers of get_pci_dev() to drop the reference when they were done). With that, I could see that the ref count of the nvlink devices could drop to 0 (calling remove for the device in /sys) and that the devices could go away. But then, I realized it's not necessarily desirable at this point. There are several comments in the code saying the npu devices (for nvlink) don't go away, there's no device release callback defined when it seems there should be, at least to handle releasing PEs All in all, it seems that some work would be needed. And if it hasn't been required by now... Fred Signed-off-by: Oliver O'Halloran --- arch/powerpc/platforms/powernv/npu-dma.c | 11 +++ 1 file changed, 3 insertions(+), 8 deletions(-) diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c index 72d3749da02c..2eb6e6d45a98 100644 --- a/arch/powerpc/platforms/powernv/npu-dma.c +++ b/arch/powerpc/platforms/powernv/npu-dma.c @@ -28,15 +28,10 @@ static struct pci_dev *get_pci_dev(struct device_node *dn) break; /* -* pci_get_domain_bus_and_slot() increased the reference count of -* the PCI device, but callers don't need that actually as the PE -* already holds a reference to the device. Since callers aren't -* aware of the reference count change, call pci_dev_put() now to -* avoid leaks. +* NB: for_each_pci_dev() elevates the pci_dev refcount. +* Caller is responsible for dropping the ref when it's +* finished with it. */ - if (pdev) - pci_dev_put(pdev); - return pdev; }
Re: [Very RFC 40/46] powernv/npu: Don't drop refcount when looking up GPU pci_devs
On Wed, 27 Nov 2019 18:09:40 +1100 Alexey Kardashevskiy wrote: > > > On 20/11/2019 12:28, Oliver O'Halloran wrote: > > The comment here implies that we don't need to take a ref to the pci_dev > > because the ioda_pe will always have one. This implies that the current > > expection is that the pci_dev for an NPU device will *never* be torn > > down since the ioda_pe having a ref to the device will prevent the > > release function from being called. > > > > In other words, the desired behaviour here appears to be leaking a ref. > > > > Nice! > > > There is a history: https://patchwork.ozlabs.org/patch/1088078/ > > We did not fix anything in particular then, we do not seem to be fixing > anything now (in other words - we cannot test it in a normal natural > way). I'd drop this one. > Yeah, I didn't fix anything at the time. Just reverted to the ref count behavior we had before: https://patchwork.ozlabs.org/patch/829172/ Frederic recently posted his take on the same topic from the OpenCAPI point of view: http://patchwork.ozlabs.org/patch/1198947/ He seems to indicate the NPU devices as the real culprit because nobody ever cared for them to be removable. Fixing that seems be a chore nobody really wants to address obviously... :-\ > > > > > > Signed-off-by: Oliver O'Halloran > > --- > > arch/powerpc/platforms/powernv/npu-dma.c | 11 +++ > > 1 file changed, 3 insertions(+), 8 deletions(-) > > > > diff --git a/arch/powerpc/platforms/powernv/npu-dma.c > > b/arch/powerpc/platforms/powernv/npu-dma.c > > index 72d3749da02c..2eb6e6d45a98 100644 > > --- a/arch/powerpc/platforms/powernv/npu-dma.c > > +++ b/arch/powerpc/platforms/powernv/npu-dma.c > > @@ -28,15 +28,10 @@ static struct pci_dev *get_pci_dev(struct device_node > > *dn) > > break; > > > > /* > > -* pci_get_domain_bus_and_slot() increased the reference count of > > -* the PCI device, but callers don't need that actually as the PE > > -* already holds a reference to the device. Since callers aren't > > -* aware of the reference count change, call pci_dev_put() now to > > -* avoid leaks. > > +* NB: for_each_pci_dev() elevates the pci_dev refcount. > > +* Caller is responsible for dropping the ref when it's > > +* finished with it. > > */ > > - if (pdev) > > - pci_dev_put(pdev); > > - > > return pdev; > > } > > > > >
Re: [Very RFC 40/46] powernv/npu: Don't drop refcount when looking up GPU pci_devs
On 20/11/2019 12:28, Oliver O'Halloran wrote: > The comment here implies that we don't need to take a ref to the pci_dev > because the ioda_pe will always have one. This implies that the current > expection is that the pci_dev for an NPU device will *never* be torn > down since the ioda_pe having a ref to the device will prevent the > release function from being called. > > In other words, the desired behaviour here appears to be leaking a ref. > > Nice! There is a history: https://patchwork.ozlabs.org/patch/1088078/ We did not fix anything in particular then, we do not seem to be fixing anything now (in other words - we cannot test it in a normal natural way). I'd drop this one. > > Signed-off-by: Oliver O'Halloran > --- > arch/powerpc/platforms/powernv/npu-dma.c | 11 +++ > 1 file changed, 3 insertions(+), 8 deletions(-) > > diff --git a/arch/powerpc/platforms/powernv/npu-dma.c > b/arch/powerpc/platforms/powernv/npu-dma.c > index 72d3749da02c..2eb6e6d45a98 100644 > --- a/arch/powerpc/platforms/powernv/npu-dma.c > +++ b/arch/powerpc/platforms/powernv/npu-dma.c > @@ -28,15 +28,10 @@ static struct pci_dev *get_pci_dev(struct device_node *dn) > break; > > /* > - * pci_get_domain_bus_and_slot() increased the reference count of > - * the PCI device, but callers don't need that actually as the PE > - * already holds a reference to the device. Since callers aren't > - * aware of the reference count change, call pci_dev_put() now to > - * avoid leaks. > + * NB: for_each_pci_dev() elevates the pci_dev refcount. > + * Caller is responsible for dropping the ref when it's > + * finished with it. >*/ > - if (pdev) > - pci_dev_put(pdev); > - > return pdev; > } > > -- Alexey
[Very RFC 40/46] powernv/npu: Don't drop refcount when looking up GPU pci_devs
The comment here implies that we don't need to take a ref to the pci_dev because the ioda_pe will always have one. This implies that the current expection is that the pci_dev for an NPU device will *never* be torn down since the ioda_pe having a ref to the device will prevent the release function from being called. In other words, the desired behaviour here appears to be leaking a ref. Nice! Signed-off-by: Oliver O'Halloran --- arch/powerpc/platforms/powernv/npu-dma.c | 11 +++ 1 file changed, 3 insertions(+), 8 deletions(-) diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c index 72d3749da02c..2eb6e6d45a98 100644 --- a/arch/powerpc/platforms/powernv/npu-dma.c +++ b/arch/powerpc/platforms/powernv/npu-dma.c @@ -28,15 +28,10 @@ static struct pci_dev *get_pci_dev(struct device_node *dn) break; /* -* pci_get_domain_bus_and_slot() increased the reference count of -* the PCI device, but callers don't need that actually as the PE -* already holds a reference to the device. Since callers aren't -* aware of the reference count change, call pci_dev_put() now to -* avoid leaks. +* NB: for_each_pci_dev() elevates the pci_dev refcount. +* Caller is responsible for dropping the ref when it's +* finished with it. */ - if (pdev) - pci_dev_put(pdev); - return pdev; } -- 2.21.0