Re: [Q] Why does kexec use device_shutdown rather than ubind them

2014-01-17 Thread Benjamin Herrenschmidt
On Fri, 2014-01-17 at 09:13 -0500, Vivek Goyal wrote:
> On Fri, Jan 17, 2014 at 04:59:13PM +1100, Benjamin Herrenschmidt wrote:
> > On Thu, 2014-01-16 at 20:52 -0800, Eric W. Biederman wrote:
> > > 
> > > I think we have largely survived until now because kdump is so popular
> > > and kdump winds up having to reinitialize devices from any random
> > > state.
> > 
> > kdump also doesn't care too much if the device is still DMA'ing to the
> > old kernel memory :-)
> 
> In principle kdump does not care about ongoing DMAs but in practice it
> is giving us some headaches with IOMMU. Various kinds of issues crop up
> during IOMMU initialization in the second kernel while DMA is ongoing and
> unfortunately no good solution has made it upstream yet.
> 
> Well, ongoing DMA and IOMMU seem to be orthogonal to using ->remove()
> in kexec. So I will stop here. :-)

Right, it's an orthogonal problem. I think hot resetting the bus might
solve it too, though. It's even worse on ppc because the resulting iommu
errors trigger those "EEH freezes" that we have here, blocking the devices
out etc...
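
For readers unfamiliar with the term, a hot reset is typically issued by
pulsing the Secondary Bus Reset bit in the Bridge Control register of the
bridge above the devices. The following is only a minimal user-space sketch
of that pulse, not kernel code. The register offset and bit value mirror
the kernel's pci_regs.h; the toy_* names, the delay hook and the delay
lengths are assumptions made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Values as defined in the kernel's include/uapi/linux/pci_regs.h. */
#define PCI_BRIDGE_CONTROL        0x3e  /* offset of the Bridge Control register */
#define PCI_BRIDGE_CTL_BUS_RESET  0x40  /* Secondary Bus Reset bit */

/* Hypothetical stand-in for a PCI-to-PCI bridge; not a kernel structure. */
struct toy_bridge {
    uint16_t bridge_control;  /* models the register at PCI_BRIDGE_CONTROL */
    unsigned resets_seen;     /* counts 0 -> 1 transitions of the reset bit */
};

typedef void (*delay_ms_fn)(unsigned ms);

/* Fake delay hook so the sketch runs instantly; it only records the time. */
static unsigned toy_total_delay_ms;
static void toy_fake_delay(unsigned ms)
{
    toy_total_delay_ms += ms;
}

static void toy_write_bridge_control(struct toy_bridge *b, uint16_t val)
{
    if (!(b->bridge_control & PCI_BRIDGE_CTL_BUS_RESET) &&
        (val & PCI_BRIDGE_CTL_BUS_RESET))
        b->resets_seen++;
    b->bridge_control = val;
}

/* Assert the reset bit, hold it, deassert it, then let devices settle. */
static void toy_hot_reset(struct toy_bridge *b, delay_ms_fn delay_ms)
{
    uint16_t ctrl;

    assert(b != 0 && delay_ms != 0);
    ctrl = b->bridge_control;

    toy_write_bridge_control(b, (uint16_t)(ctrl | PCI_BRIDGE_CTL_BUS_RESET));
    delay_ms(2);     /* illustrative hold time with reset asserted */
    toy_write_bridge_control(b, (uint16_t)(ctrl & ~PCI_BRIDGE_CTL_BUS_RESET));
    delay_ms(1000);  /* illustrative settle time before touching devices */
}
```

Everything below the bridge is reset by the pulse, so any in-flight DMA
stops regardless of what state the old driver left the device in. The other
bits of the register are left as they were.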

Ben.

> Thanks
> Vivek
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/




RE: [Q] Why does kexec use device_shutdown rather than ubind them

2014-01-17 Thread Sumner, William
Vivek, Ben, Eric,

Please take a look at a proposed patch to intel-iommu: "[PATCHv3 0/6] Crashdump 
Accepting Active IOMMU"

This is specifically for kdump; however, would some small variation of this 
technique be applicable to kexec ?

Thanks,
Bill

-----Original Message-----
From: kexec [mailto:kexec-boun...@lists.infradead.org] On Behalf Of Vivek Goyal
Sent: Friday, January 17, 2014 8:13 AM
To: Benjamin Herrenschmidt
Cc: Matthew Garrett; ke...@lists.infradead.org; linux-kernel@vger.kernel.org; 
Eric W. Biederman; Andrew Morton; Linus Torvalds
Subject: Re: [Q] Why does kexec use device_shutdown rather than ubind them

On Fri, Jan 17, 2014 at 04:59:13PM +1100, Benjamin Herrenschmidt wrote:
> On Thu, 2014-01-16 at 20:52 -0800, Eric W. Biederman wrote:
> > 
> > I think we have largely survived until now because kdump is so popular
> > and kdump winds up having to reinitialize devices from any random
> > state.
> 
> kdump also doesn't care too much if the device is still DMA'ing to the
> old kernel memory :-)

In principle kdump does not care about ongoing DMAs but in practice it
is giving us some headaches with IOMMU. Various kinds of issues crop up
during IOMMU initialization in the second kernel while DMA is ongoing and
unfortunately no good solution has made it upstream yet.

Well, ongoing DMA and IOMMU seem to be orthogonal to using ->remove()
in kexec. So I will stop here. :-)

Thanks
Vivek

___
kexec mailing list
ke...@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [Q] Why does kexec use device_shutdown rather than ubind them

2014-01-17 Thread Vivek Goyal
On Fri, Jan 17, 2014 at 04:59:13PM +1100, Benjamin Herrenschmidt wrote:
> On Thu, 2014-01-16 at 20:52 -0800, Eric W. Biederman wrote:
> > 
> > I think we have largely survived until now because kdump is so popular
> > and kdump winds up having to reinitialize devices from any random
> > state.
> 
> kdump also doesn't care too much if the device is still DMA'ing to the
> old kernel memory :-)

In principle kdump does not care about ongoing DMAs but in practice it
is giving us some headaches with IOMMU. Various kinds of issues crop up
during IOMMU initialization in the second kernel while DMA is ongoing and
unfortunately no good solution has made it upstream yet.

Well, ongoing DMA and IOMMU seem to be orthogonal to using ->remove()
in kexec. So I will stop here. :-)

Thanks
Vivek


Re: [Q] Why does kexec use device_shutdown rather than ubind them

2014-01-16 Thread Benjamin Herrenschmidt
On Thu, 2014-01-16 at 20:52 -0800, Eric W. Biederman wrote:
> 
> I think we have largely survived until now because kdump is so popular
> and kdump winds up having to reinitialize devices from any random
> state.

kdump also doesn't care too much if the device is still DMA'ing to the
old kernel memory :-)

> But like I said I am all for reducing the burden on device driver
> developers.

Right. I'm experimenting with a variant of device_shutdown() that tries
remove() first and, if it doesn't exist but shutdown() does, calls that
instead (is that ever the case ?). I'm keeping this kexec-specific for now.
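
To make the idea concrete, here is a minimal user-space sketch of that
"remove() first, shutdown() as fallback" policy. It is not the actual
patch. The toy_* structures and the kexec_quiesce_* names are hypothetical
stand-ins for the driver model, and the backward walk is only meant to
mirror how device_shutdown() visits the device list.

```c
#include <assert.h>
#include <stddef.h>

struct toy_device;

/* Hypothetical, simplified driver: either callback may be NULL. */
struct toy_driver {
    void (*remove)(struct toy_device *dev);
    void (*shutdown)(struct toy_device *dev);
};

/* Hypothetical, simplified device. */
struct toy_device {
    const struct toy_driver *drv;  /* NULL when no driver is bound */
    int removed;                   /* set by toy_remove() */
    int shut_down;                 /* set by toy_shutdown() */
};

enum quiesce_result { QUIESCE_NONE, QUIESCE_REMOVED, QUIESCE_SHUTDOWN };

/* Example callbacks that only record which path was taken. */
static void toy_remove(struct toy_device *dev)   { dev->removed = 1; }
static void toy_shutdown(struct toy_device *dev) { dev->shut_down = 1; }

/* Prefer remove(); fall back to shutdown() only when remove() is absent. */
static enum quiesce_result kexec_quiesce_one(struct toy_device *dev)
{
    assert(dev != NULL);

    if (!dev->drv)
        return QUIESCE_NONE;

    if (dev->drv->remove) {
        dev->drv->remove(dev);
        dev->drv = NULL;           /* the device is now unbound */
        return QUIESCE_REMOVED;
    }

    if (dev->drv->shutdown) {
        dev->drv->shutdown(dev);
        return QUIESCE_SHUTDOWN;
    }

    return QUIESCE_NONE;
}

/* Walk the array backward so later-registered devices are handled first. */
static void kexec_quiesce_all(struct toy_device *devs, size_t n)
{
    size_t i;

    for (i = n; i-- > 0; )
        kexec_quiesce_one(&devs[i]);
}
```

A driver that implements both callbacks only ever sees remove() here, which
is the point of the experiment. shutdown() is reached only for the
(possibly non-existent) drivers that lack remove().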

I'll try to hammer that on some of our machines to see if it breaks
anything. I think it's a much better approach for kexec.

As for actual machine shutdown, we *might* have some corner cases where
shutdown is actually different from remove for good reasons, so that
will have to be investigated a bit more in depth.

I'll post my results when I have them.

Cheers,
Ben.




Re: [Q] Why does kexec use device_shutdown rather than ubind them

2014-01-16 Thread Eric W. Biederman
Benjamin Herrenschmidt <b...@kernel.crashing.org> writes:

> Hi Folks !
>
> Sorry for the semi-random CC list, not sure who owns kexec nowadays. So
> we are working on a new crop of power servers for which the bootloader
> is going to be using kexec.
>
> As expected, we've been chasing a number of reliability issues mostly
> due to drivers not behaving properly, such as leaving devices DMA'ing or
> in a state that upsets the new kernel etc...
>
> So far our approach has been to fix the drivers one by one, adding the
> shutdown() method when it's missing, etc...
>
> But that led me to wonder ... why shutdown() in the first place ? The
> semantic of shutdown() is that we are going to power the machine off. In
> some cases, that method will actively participate in the shutdown,
> powering things off, spinning disk down, etc.
>
> It doesn't have the semantic of "put the device into a clean state for a
> new driver to pick up". It's also rarely implemented.
>
> On the other hand, the remove() routine is almost everywhere, and is
> already well understood as needing to leave the device in a clean state,
> as it's often used for rmmod (often by the driver developer
> him/herself), more likely to be tested in a condition that doesn't
> involve having the machine off immediately afterward but on the contrary
> in a condition where a new driver can come and try to pick the device
> up.
>
> Additionally, remove() is also what KVM does when assigning devices to
> guest, ie, the original driver is unbound from the host, and VFIO is
> bound in its place.
>
> So we have common purpose with kexec (somewhat) and possibly common (and
> better) testing coverage with remove() than with shutdown().
>
> I plan to experiment a bit in our bootloader see if that makes a
> difference, maybe doing a first pass of unbind for anything that can be
> unbound, and shutdown for the rest.
>
> (I'll probably also sneak in a PCIe hot reset at the end but that's more
> platform specific).
>
> Any opinion ?

When I was young, kexec was a new idea, and the device model was just
getting methods, an argument was made that using remove in the reboot
path was a bad idea because it does more than is necessary, so we needed
shutdown.

I wanted to use remove at the time.

There is a lot of silliness now, but my inclination is to remove
shutdown entirely and replace it with remove.  I certainly would not
have heartburn with doing so just for kexec, but getting the whole
reboot path at this point would make me very happy, and then we could
kill a little-used, rarely implemented method from the kernel.

Given that we have found empirically that flipping the bus master enable
bit to off helps enormously for stability, but doing so generically
occasionally has unfortunate consequences, why not?
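
For reference, the bus master enable bit is bit 2 of the 16-bit PCI command
register. Below is a minimal user-space sketch of clearing it on a toy
config space. The offset and mask mirror the kernel's pci_regs.h; the toy_*
helpers are hypothetical and only model the read-modify-write.

```c
#include <assert.h>
#include <stdint.h>

/* Values as defined in the kernel's include/uapi/linux/pci_regs.h. */
#define PCI_COMMAND         0x04    /* offset of the 16-bit command register */
#define PCI_COMMAND_MASTER  0x0004  /* Bus Master Enable bit */

/* Hypothetical toy device: just 256 bytes of config space. */
struct toy_pci_dev {
    uint8_t config[256];
};

/* Config space is little-endian; read a 16-bit word at 'off'. */
static uint16_t toy_read_config_word(const struct toy_pci_dev *dev, unsigned off)
{
    assert(off + 1 < sizeof(dev->config));
    return (uint16_t)(dev->config[off] | (dev->config[off + 1] << 8));
}

static void toy_write_config_word(struct toy_pci_dev *dev, unsigned off,
                                  uint16_t val)
{
    assert(off + 1 < sizeof(dev->config));
    dev->config[off] = (uint8_t)(val & 0xff);
    dev->config[off + 1] = (uint8_t)(val >> 8);
}

/* Clear Bus Master Enable so the device can no longer initiate DMA. */
static void toy_clear_master(struct toy_pci_dev *dev)
{
    uint16_t cmd = toy_read_config_word(dev, PCI_COMMAND);

    if (cmd & PCI_COMMAND_MASTER)
        toy_write_config_word(dev, PCI_COMMAND,
                              (uint16_t)(cmd & ~PCI_COMMAND_MASTER));
}
```

Only the one bit is touched; I/O and memory decoding stay as they were. The
sketch says nothing about which devices tolerate having the bit cleared
generically, which is where the unfortunate consequences come in.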

I think we have largely survived until now because kdump is so popular
and kdump winds up having to reinitialize devices from any random state.

But like I said I am all for reducing the burden on device driver developers.

Eric

