RE: [PATCH v1 2/2] vfio/pci: Emulate PASID/PRI capability for VFs

2020-04-14 Thread Tian, Kevin
> From: Alex Williamson 
> Sent: Wednesday, April 15, 2020 8:36 AM
> 
> On Tue, 14 Apr 2020 23:57:33 +
> "Tian, Kevin"  wrote:
> 
> > > From: Alex Williamson 
> > > Sent: Tuesday, April 14, 2020 11:24 PM
> > >
> > > On Tue, 14 Apr 2020 03:42:42 +
> > > "Tian, Kevin"  wrote:
> > >
> > > > > From: Alex Williamson 
> > > > > Sent: Tuesday, April 14, 2020 11:29 AM
> > > > >
> > > > > On Tue, 14 Apr 2020 02:40:58 +
> > > > > "Tian, Kevin"  wrote:
> > > > >
> > > > > > > From: Alex Williamson 
> > > > > > > Sent: Tuesday, April 14, 2020 3:21 AM
> > > > > > >
> > > > > > > On Mon, 13 Apr 2020 08:05:33 +
> > > > > > > "Tian, Kevin"  wrote:
> > > > > > >
> > > > > > > > > From: Tian, Kevin
> > > > > > > > > Sent: Monday, April 13, 2020 3:55 PM
> > > > > > > > >
> > > > > > > > > > From: Raj, Ashok 
> > > > > > > > > > Sent: Monday, April 13, 2020 11:11 AM
> > > > > > > > > >
> > > > > > > > > > On Wed, Apr 08, 2020 at 10:19:40AM -0600, Alex Williamson
> > > wrote:
> > > > > > > > > > > On Tue, 7 Apr 2020 21:00:21 -0700
> > > > > > > > > > > "Raj, Ashok"  wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi Alex
> > > > > > > > > > > >
> > > > > > > > > > > > + Bjorn
> > > > > > > > > > >
> > > > > > > > > > >  + Don
> > > > > > > > > > >
> > > > > > > > > > > > FWIW I can't understand why PCI SIG went different ways
> > > with
> > > > > ATS,
> > > > > > > > > > > > where its enumerated on PF and VF. But for PASID and
> PRI its
> > > > > only
> > > > > > > > > > > > in PF.
> > > > > > > > > > > >
> > > > > > > > > > > > I'm checking with our internal SIG reps to followup on
> that.
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, Apr 07, 2020 at 09:58:01AM -0600, Alex
> Williamson
> > > > > wrote:
> > > > > > > > > > > > > > Is there vendor guarantee that hidden registers will
> locate
> > > at
> > > > > the
> > > > > > > > > > > > > > same offset between PF and VF config space?
> > > > > > > > > > > > >
> > > > > > > > > > > > > I'm not sure if the spec really precludes hidden 
> > > > > > > > > > > > > registers,
> > > but
> > > > > the
> > > > > > > > > > > > > fact that these registers are explicitly outside of 
> > > > > > > > > > > > > the
> > > capability
> > > > > > > > > > > > > chain implies they're only intended for device 
> > > > > > > > > > > > > specific
> use,
> > > so
> > > > > I'd
> > > > > > > say
> > > > > > > > > > > > > there are no guarantees about anything related to 
> > > > > > > > > > > > > these
> > > > > registers.
> > > > > > > > > > > >
> > > > > > > > > > > > As you had suggested in the other thread, we could
> consider
> > > > > > > > > > > > using the same offset as in PF, but even that's a better
> guess
> > > > > > > > > > > > still not reliable.
> > > > > > > > > > > >
> > > > > > > > > > > > The other option is to maybe extend driver ops in the PF
> to
> > > > > expose
> > > > > > > > > > > > where the offsets should be. Sort of adding the quirk in
> the
> > > > > > > > > > > > implementation.
> > > > > > > > > > > >
> > > > > > > > > > > > I'm not sure how prevalent are PASID and PRI in VF
> devices. If
> > > > > SIG is
> > > > > > > > > > resisting
> > > > > > > > > > > > making VF's first class citizen, we might ask them to 
> > > > > > > > > > > > add
> > > some
> > > > > > > verbiage
> > > > > > > > > > > > to suggest leave the same offsets as PF open to help
> > > emulation
> > > > > > > software.
> > > > > > > > > > >
> > > > > > > > > > > Even if we know where to expose these capabilities on the
> VF,
> > > it's
> > > > > not
> > > > > > > > > > > clear to me how we can actually virtualize the capability
> itself.
> > > If
> > > > > > > > > > > the spec defines, for example, an enable bit as r/w then
> > > software
> > > > > that
> > > > > > > > > > > interacts with that register expects the bit is settable.
> There's
> > > no
> > > > > > > > > > > protocol for "try to set the bit and re-read it to see if 
> > > > > > > > > > > the
> > > hardware
> > > > > > > > > > > accepted it".  Therefore a capability with a fixed enable 
> > > > > > > > > > > bit
> > > > > > > > > > > representing the state of the PF, not settable by the VF, 
> > > > > > > > > > > is
> > > > > > > > > > > disingenuous to the spec.
> > > > > > > > > >
> > > > > > > > > > I think we are all in violent agreement. A lot of times the 
> > > > > > > > > > pci
> spec
> > > > > gets
> > > > > > > > > > defined several years ahead of real products and no one
> > > > > remembers
> > > > > > > > > > the justification on why they restricted things the way they
> did.
> > > > > > > > > >
> > > > > > > > > > Maybe someone early product wasn't quite exposing these
> > > features
> > > > > to
> > > > > > > the
> > > > > > > > > > VF
> > > > > > > > > > and hence the spec is bug compatible :-)
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > If what we're trying to do is expose that PASID and PRI 
> > > > > > > > > > > are
> > > enabled
> > > > > on
> > > > > > > > > > > the 

Re: [PATCH v1 2/2] vfio/pci: Emulate PASID/PRI capability for VFs

2020-04-14 Thread Alex Williamson
On Tue, 14 Apr 2020 23:57:33 +
"Tian, Kevin"  wrote:

> > From: Alex Williamson 
> > Sent: Tuesday, April 14, 2020 11:24 PM
> > 
> > On Tue, 14 Apr 2020 03:42:42 +
> > "Tian, Kevin"  wrote:
> >   
> > > > From: Alex Williamson 
> > > > Sent: Tuesday, April 14, 2020 11:29 AM
> > > >
> > > > On Tue, 14 Apr 2020 02:40:58 +
> > > > "Tian, Kevin"  wrote:
> > > >  
> > > > > > From: Alex Williamson 
> > > > > > Sent: Tuesday, April 14, 2020 3:21 AM
> > > > > >
> > > > > > On Mon, 13 Apr 2020 08:05:33 +
> > > > > > "Tian, Kevin"  wrote:
> > > > > >  
> > > > > > > > From: Tian, Kevin
> > > > > > > > Sent: Monday, April 13, 2020 3:55 PM
> > > > > > > >  
> > > > > > > > > From: Raj, Ashok 
> > > > > > > > > Sent: Monday, April 13, 2020 11:11 AM
> > > > > > > > >
> > > > > > > > > On Wed, Apr 08, 2020 at 10:19:40AM -0600, Alex Williamson  
> > wrote:  
> > > > > > > > > > On Tue, 7 Apr 2020 21:00:21 -0700
> > > > > > > > > > "Raj, Ashok"  wrote:
> > > > > > > > > >  
> > > > > > > > > > > Hi Alex
> > > > > > > > > > >
> > > > > > > > > > > + Bjorn  
> > > > > > > > > >
> > > > > > > > > >  + Don
> > > > > > > > > >  
> > > > > > > > > > > FWIW I can't understand why PCI SIG went different ways  
> > with  
> > > > ATS,  
> > > > > > > > > > > where its enumerated on PF and VF. But for PASID and PRI 
> > > > > > > > > > > its  
> > > > only  
> > > > > > > > > > > in PF.
> > > > > > > > > > >
> > > > > > > > > > > I'm checking with our internal SIG reps to followup on 
> > > > > > > > > > > that.
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Apr 07, 2020 at 09:58:01AM -0600, Alex Williamson 
> > > > > > > > > > >  
> > > > wrote:  
> > > > > > > > > > > > > Is there vendor guarantee that hidden registers will 
> > > > > > > > > > > > > locate  
> > at  
> > > > the  
> > > > > > > > > > > > > same offset between PF and VF config space?  
> > > > > > > > > > > >
> > > > > > > > > > > > I'm not sure if the spec really precludes hidden 
> > > > > > > > > > > > registers,  
> > but  
> > > > the  
> > > > > > > > > > > > fact that these registers are explicitly outside of the 
> > > > > > > > > > > >  
> > capability  
> > > > > > > > > > > > chain implies they're only intended for device specific 
> > > > > > > > > > > > use,  
> > so  
> > > > I'd  
> > > > > > say  
> > > > > > > > > > > > there are no guarantees about anything related to these 
> > > > > > > > > > > >  
> > > > registers.  
> > > > > > > > > > >
> > > > > > > > > > > As you had suggested in the other thread, we could 
> > > > > > > > > > > consider
> > > > > > > > > > > using the same offset as in PF, but even that's a better 
> > > > > > > > > > > guess
> > > > > > > > > > > still not reliable.
> > > > > > > > > > >
> > > > > > > > > > > The other option is to maybe extend driver ops in the PF 
> > > > > > > > > > > to  
> > > > expose  
> > > > > > > > > > > where the offsets should be. Sort of adding the quirk in 
> > > > > > > > > > > the
> > > > > > > > > > > implementation.
> > > > > > > > > > >
> > > > > > > > > > > I'm not sure how prevalent are PASID and PRI in VF 
> > > > > > > > > > > devices. If  
> > > > SIG is  
> > > > > > > > > resisting  
> > > > > > > > > > > making VF's first class citizen, we might ask them to add 
> > > > > > > > > > >  
> > some  
> > > > > > verbiage  
> > > > > > > > > > > to suggest leave the same offsets as PF open to help  
> > emulation  
> > > > > > software.  
> > > > > > > > > >
> > > > > > > > > > Even if we know where to expose these capabilities on the 
> > > > > > > > > > VF,  
> > it's  
> > > > not  
> > > > > > > > > > clear to me how we can actually virtualize the capability 
> > > > > > > > > > itself.  
> > If  
> > > > > > > > > > the spec defines, for example, an enable bit as r/w then  
> > software  
> > > > that  
> > > > > > > > > > interacts with that register expects the bit is settable.  
> > > > > > > > > > There's  
> > no  
> > > > > > > > > > protocol for "try to set the bit and re-read it to see if 
> > > > > > > > > > the  
> > hardware  
> > > > > > > > > > accepted it".  Therefore a capability with a fixed enable 
> > > > > > > > > > bit
> > > > > > > > > > representing the state of the PF, not settable by the VF, is
> > > > > > > > > > disingenuous to the spec.  
> > > > > > > > >
> > > > > > > > > I think we are all in violent agreement. A lot of times the 
> > > > > > > > > pci spec  
> > > > gets  
> > > > > > > > > defined several years ahead of real products and no one  
> > > > remembers  
> > > > > > > > > the justification on why they restricted things the way they 
> > > > > > > > > did.
> > > > > > > > >
> > > > > > > > > Maybe someone early product wasn't quite exposing these  
> > features  
> > > > to  
> > > > > > the  
> > > > > > > > > VF
> > > > > > > > > and hence the spec is bug compatible :-)
> > > > > > > > >  
> > > > > > > > > >
> > > > > > > > > > If what we're trying to do is expose that PASID and PRI are 
> > > > > > > > > >  

RE: [PATCH v1 2/2] vfio/pci: Emulate PASID/PRI capability for VFs

2020-04-14 Thread Tian, Kevin
> From: Alex Williamson 
> Sent: Tuesday, April 14, 2020 11:24 PM
> 
> On Tue, 14 Apr 2020 03:42:42 +
> "Tian, Kevin"  wrote:
> 
> > > From: Alex Williamson 
> > > Sent: Tuesday, April 14, 2020 11:29 AM
> > >
> > > On Tue, 14 Apr 2020 02:40:58 +
> > > "Tian, Kevin"  wrote:
> > >
> > > > > From: Alex Williamson 
> > > > > Sent: Tuesday, April 14, 2020 3:21 AM
> > > > >
> > > > > On Mon, 13 Apr 2020 08:05:33 +
> > > > > "Tian, Kevin"  wrote:
> > > > >
> > > > > > > From: Tian, Kevin
> > > > > > > Sent: Monday, April 13, 2020 3:55 PM
> > > > > > >
> > > > > > > > From: Raj, Ashok 
> > > > > > > > Sent: Monday, April 13, 2020 11:11 AM
> > > > > > > >
> > > > > > > > On Wed, Apr 08, 2020 at 10:19:40AM -0600, Alex Williamson
> wrote:
> > > > > > > > > On Tue, 7 Apr 2020 21:00:21 -0700
> > > > > > > > > "Raj, Ashok"  wrote:
> > > > > > > > >
> > > > > > > > > > Hi Alex
> > > > > > > > > >
> > > > > > > > > > + Bjorn
> > > > > > > > >
> > > > > > > > >  + Don
> > > > > > > > >
> > > > > > > > > > FWIW I can't understand why PCI SIG went different ways
> with
> > > ATS,
> > > > > > > > > > where its enumerated on PF and VF. But for PASID and PRI its
> > > only
> > > > > > > > > > in PF.
> > > > > > > > > >
> > > > > > > > > > I'm checking with our internal SIG reps to followup on that.
> > > > > > > > > >
> > > > > > > > > > On Tue, Apr 07, 2020 at 09:58:01AM -0600, Alex Williamson
> > > wrote:
> > > > > > > > > > > > Is there vendor guarantee that hidden registers will 
> > > > > > > > > > > > locate
> at
> > > the
> > > > > > > > > > > > same offset between PF and VF config space?
> > > > > > > > > > >
> > > > > > > > > > > I'm not sure if the spec really precludes hidden 
> > > > > > > > > > > registers,
> but
> > > the
> > > > > > > > > > > fact that these registers are explicitly outside of the
> capability
> > > > > > > > > > > chain implies they're only intended for device specific 
> > > > > > > > > > > use,
> so
> > > I'd
> > > > > say
> > > > > > > > > > > there are no guarantees about anything related to these
> > > registers.
> > > > > > > > > >
> > > > > > > > > > As you had suggested in the other thread, we could consider
> > > > > > > > > > using the same offset as in PF, but even that's a better 
> > > > > > > > > > guess
> > > > > > > > > > still not reliable.
> > > > > > > > > >
> > > > > > > > > > The other option is to maybe extend driver ops in the PF to
> > > expose
> > > > > > > > > > where the offsets should be. Sort of adding the quirk in the
> > > > > > > > > > implementation.
> > > > > > > > > >
> > > > > > > > > > I'm not sure how prevalent are PASID and PRI in VF devices. 
> > > > > > > > > > If
> > > SIG is
> > > > > > > > resisting
> > > > > > > > > > making VF's first class citizen, we might ask them to add
> some
> > > > > verbiage
> > > > > > > > > > to suggest leave the same offsets as PF open to help
> emulation
> > > > > software.
> > > > > > > > >
> > > > > > > > > Even if we know where to expose these capabilities on the VF,
> it's
> > > not
> > > > > > > > > clear to me how we can actually virtualize the capability 
> > > > > > > > > itself.
> If
> > > > > > > > > the spec defines, for example, an enable bit as r/w then
> software
> > > that
> > > > > > > > > interacts with that register expects the bit is settable.  
> > > > > > > > > There's
> no
> > > > > > > > > protocol for "try to set the bit and re-read it to see if the
> hardware
> > > > > > > > > accepted it".  Therefore a capability with a fixed enable bit
> > > > > > > > > representing the state of the PF, not settable by the VF, is
> > > > > > > > > disingenuous to the spec.
> > > > > > > >
> > > > > > > > I think we are all in violent agreement. A lot of times the pci 
> > > > > > > > spec
> > > gets
> > > > > > > > defined several years ahead of real products and no one
> > > remembers
> > > > > > > > the justification on why they restricted things the way they 
> > > > > > > > did.
> > > > > > > >
> > > > > > > > Maybe someone early product wasn't quite exposing these
> features
> > > to
> > > > > the
> > > > > > > > VF
> > > > > > > > and hence the spec is bug compatible :-)
> > > > > > > >
> > > > > > > > >
> > > > > > > > > If what we're trying to do is expose that PASID and PRI are
> enabled
> > > on
> > > > > > > > > the PF to a VF driver, maybe duplicating the PF capabilities 
> > > > > > > > > on
> the
> > > VF
> > > > > > > > > without the ability to control it is not the right approach.
> Maybe
> > > we
> > > > > > > >
> > > > > > > > As long as the capability enable is only provided when the PF 
> > > > > > > > has
> > > > > enabled
> > > > > > > > the feature. Then it seems the hardware seems to do the right
> thing.
> > > > > > > >
> > > > > > > > Assume we expose PASID/PRI only when PF has enabled it. It will
> be
> > > the
> > > > > > > > case since the PF driver needs to exist, and IOMMU would have
> set
> > > the
> > > > > > > > PASID/PRI/ATS on PF.
> > > > > > > >
> > > > > > > > 

Re: [PATCH v1 2/2] vfio/pci: Emulate PASID/PRI capability for VFs

2020-04-14 Thread Alex Williamson
On Tue, 14 Apr 2020 03:42:42 +
"Tian, Kevin"  wrote:

> > From: Alex Williamson 
> > Sent: Tuesday, April 14, 2020 11:29 AM
> > 
> > On Tue, 14 Apr 2020 02:40:58 +
> > "Tian, Kevin"  wrote:
> >   
> > > > From: Alex Williamson 
> > > > Sent: Tuesday, April 14, 2020 3:21 AM
> > > >
> > > > On Mon, 13 Apr 2020 08:05:33 +
> > > > "Tian, Kevin"  wrote:
> > > >  
> > > > > > From: Tian, Kevin
> > > > > > Sent: Monday, April 13, 2020 3:55 PM
> > > > > >  
> > > > > > > From: Raj, Ashok 
> > > > > > > Sent: Monday, April 13, 2020 11:11 AM
> > > > > > >
> > > > > > > On Wed, Apr 08, 2020 at 10:19:40AM -0600, Alex Williamson wrote:  
> > > > > > > > On Tue, 7 Apr 2020 21:00:21 -0700
> > > > > > > > "Raj, Ashok"  wrote:
> > > > > > > >  
> > > > > > > > > Hi Alex
> > > > > > > > >
> > > > > > > > > + Bjorn  
> > > > > > > >
> > > > > > > >  + Don
> > > > > > > >  
> > > > > > > > > FWIW I can't understand why PCI SIG went different ways with  
> > ATS,  
> > > > > > > > > where its enumerated on PF and VF. But for PASID and PRI its  
> > only  
> > > > > > > > > in PF.
> > > > > > > > >
> > > > > > > > > I'm checking with our internal SIG reps to followup on that.
> > > > > > > > >
> > > > > > > > > On Tue, Apr 07, 2020 at 09:58:01AM -0600, Alex Williamson  
> > wrote:  
> > > > > > > > > > > Is there vendor guarantee that hidden registers will 
> > > > > > > > > > > locate at  
> > the  
> > > > > > > > > > > same offset between PF and VF config space?  
> > > > > > > > > >
> > > > > > > > > > I'm not sure if the spec really precludes hidden registers, 
> > > > > > > > > > but  
> > the  
> > > > > > > > > > fact that these registers are explicitly outside of the 
> > > > > > > > > > capability
> > > > > > > > > > chain implies they're only intended for device specific 
> > > > > > > > > > use, so  
> > I'd  
> > > > say  
> > > > > > > > > > there are no guarantees about anything related to these  
> > registers.  
> > > > > > > > >
> > > > > > > > > As you had suggested in the other thread, we could consider
> > > > > > > > > using the same offset as in PF, but even that's a better guess
> > > > > > > > > still not reliable.
> > > > > > > > >
> > > > > > > > > The other option is to maybe extend driver ops in the PF to  
> > expose  
> > > > > > > > > where the offsets should be. Sort of adding the quirk in the
> > > > > > > > > implementation.
> > > > > > > > >
> > > > > > > > > I'm not sure how prevalent are PASID and PRI in VF devices. 
> > > > > > > > > If  
> > SIG is  
> > > > > > > resisting  
> > > > > > > > > making VF's first class citizen, we might ask them to add 
> > > > > > > > > some  
> > > > verbiage  
> > > > > > > > > to suggest leave the same offsets as PF open to help 
> > > > > > > > > emulation  
> > > > software.  
> > > > > > > >
> > > > > > > > Even if we know where to expose these capabilities on the VF, 
> > > > > > > > it's  
> > not  
> > > > > > > > clear to me how we can actually virtualize the capability 
> > > > > > > > itself.  If
> > > > > > > > the spec defines, for example, an enable bit as r/w then 
> > > > > > > > software  
> > that  
> > > > > > > > interacts with that register expects the bit is settable.  
> > > > > > > > There's no
> > > > > > > > protocol for "try to set the bit and re-read it to see if the 
> > > > > > > > hardware
> > > > > > > > accepted it".  Therefore a capability with a fixed enable bit
> > > > > > > > representing the state of the PF, not settable by the VF, is
> > > > > > > > disingenuous to the spec.  
> > > > > > >
> > > > > > > I think we are all in violent agreement. A lot of times the pci 
> > > > > > > spec  
> > gets  
> > > > > > > defined several years ahead of real products and no one  
> > remembers  
> > > > > > > the justification on why they restricted things the way they did.
> > > > > > >
> > > > > > > Maybe someone early product wasn't quite exposing these features  
> > to  
> > > > the  
> > > > > > > VF
> > > > > > > and hence the spec is bug compatible :-)
> > > > > > >  
> > > > > > > >
> > > > > > > > If what we're trying to do is expose that PASID and PRI are 
> > > > > > > > enabled  
> > on  
> > > > > > > > the PF to a VF driver, maybe duplicating the PF capabilities on 
> > > > > > > > the  
> > VF  
> > > > > > > > without the ability to control it is not the right approach.  
> > > > > > > > Maybe  
> > we  
> > > > > > >
> > > > > > > As long as the capability enable is only provided when the PF has 
> > > > > > >  
> > > > enabled  
> > > > > > > the feature. Then it seems the hardware seems to do the right 
> > > > > > > thing.
> > > > > > >
> > > > > > > Assume we expose PASID/PRI only when PF has enabled it. It will 
> > > > > > > be  
> > the  
> > > > > > > case since the PF driver needs to exist, and IOMMU would have set 
> > > > > > >  
> > the  
> > > > > > > PASID/PRI/ATS on PF.
> > > > > > >
> > > > > > > If the emulation is purely spoofing the capability. Once vIOMMU  
> > driver  
> > > 

RE: [PATCH v1 2/2] vfio/pci: Emulate PASID/PRI capability for VFs

2020-04-13 Thread Tian, Kevin
> From: Alex Williamson 
> Sent: Tuesday, April 14, 2020 11:29 AM
> 
> On Tue, 14 Apr 2020 02:40:58 +
> "Tian, Kevin"  wrote:
> 
> > > From: Alex Williamson 
> > > Sent: Tuesday, April 14, 2020 3:21 AM
> > >
> > > On Mon, 13 Apr 2020 08:05:33 +
> > > "Tian, Kevin"  wrote:
> > >
> > > > > From: Tian, Kevin
> > > > > Sent: Monday, April 13, 2020 3:55 PM
> > > > >
> > > > > > From: Raj, Ashok 
> > > > > > Sent: Monday, April 13, 2020 11:11 AM
> > > > > >
> > > > > > On Wed, Apr 08, 2020 at 10:19:40AM -0600, Alex Williamson wrote:
> > > > > > > On Tue, 7 Apr 2020 21:00:21 -0700
> > > > > > > "Raj, Ashok"  wrote:
> > > > > > >
> > > > > > > > Hi Alex
> > > > > > > >
> > > > > > > > + Bjorn
> > > > > > >
> > > > > > >  + Don
> > > > > > >
> > > > > > > > FWIW I can't understand why PCI SIG went different ways with
> ATS,
> > > > > > > > where its enumerated on PF and VF. But for PASID and PRI its
> only
> > > > > > > > in PF.
> > > > > > > >
> > > > > > > > I'm checking with our internal SIG reps to followup on that.
> > > > > > > >
> > > > > > > > On Tue, Apr 07, 2020 at 09:58:01AM -0600, Alex Williamson
> wrote:
> > > > > > > > > > Is there vendor guarantee that hidden registers will locate 
> > > > > > > > > > at
> the
> > > > > > > > > > same offset between PF and VF config space?
> > > > > > > > >
> > > > > > > > > I'm not sure if the spec really precludes hidden registers, 
> > > > > > > > > but
> the
> > > > > > > > > fact that these registers are explicitly outside of the 
> > > > > > > > > capability
> > > > > > > > > chain implies they're only intended for device specific use, 
> > > > > > > > > so
> I'd
> > > say
> > > > > > > > > there are no guarantees about anything related to these
> registers.
> > > > > > > >
> > > > > > > > As you had suggested in the other thread, we could consider
> > > > > > > > using the same offset as in PF, but even that's a better guess
> > > > > > > > still not reliable.
> > > > > > > >
> > > > > > > > The other option is to maybe extend driver ops in the PF to
> expose
> > > > > > > > where the offsets should be. Sort of adding the quirk in the
> > > > > > > > implementation.
> > > > > > > >
> > > > > > > > I'm not sure how prevalent are PASID and PRI in VF devices. If
> SIG is
> > > > > > resisting
> > > > > > > > making VF's first class citizen, we might ask them to add some
> > > verbiage
> > > > > > > > to suggest leave the same offsets as PF open to help emulation
> > > software.
> > > > > > >
> > > > > > > Even if we know where to expose these capabilities on the VF, it's
> not
> > > > > > > clear to me how we can actually virtualize the capability itself. 
> > > > > > >  If
> > > > > > > the spec defines, for example, an enable bit as r/w then software
> that
> > > > > > > interacts with that register expects the bit is settable.  
> > > > > > > There's no
> > > > > > > protocol for "try to set the bit and re-read it to see if the 
> > > > > > > hardware
> > > > > > > accepted it".  Therefore a capability with a fixed enable bit
> > > > > > > representing the state of the PF, not settable by the VF, is
> > > > > > > disingenuous to the spec.
> > > > > >
> > > > > > I think we are all in violent agreement. A lot of times the pci spec
> gets
> > > > > > defined several years ahead of real products and no one
> remembers
> > > > > > the justification on why they restricted things the way they did.
> > > > > >
> > > > > > Maybe someone early product wasn't quite exposing these features
> to
> > > the
> > > > > > VF
> > > > > > and hence the spec is bug compatible :-)
> > > > > >
> > > > > > >
> > > > > > > If what we're trying to do is expose that PASID and PRI are 
> > > > > > > enabled
> on
> > > > > > > the PF to a VF driver, maybe duplicating the PF capabilities on 
> > > > > > > the
> VF
> > > > > > > without the ability to control it is not the right approach.  
> > > > > > > Maybe
> we
> > > > > >
> > > > > > As long as the capability enable is only provided when the PF has
> > > enabled
> > > > > > the feature. Then it seems the hardware seems to do the right thing.
> > > > > >
> > > > > > Assume we expose PASID/PRI only when PF has enabled it. It will be
> the
> > > > > > case since the PF driver needs to exist, and IOMMU would have set
> the
> > > > > > PASID/PRI/ATS on PF.
> > > > > >
> > > > > > If the emulation is purely spoofing the capability. Once vIOMMU
> driver
> > > > > > enables PASID, the context entries for the VF are completely
> > > independent
> > > > > > from the PF context entries.
> > > > > >
> > > > > > vIOMMU would enable PASID, and we just spoof the PASID
> capability.
> > > > > >
> > > > > > If vIOMMU or guest for some reason does disable_pasid(), then the
> > > > > > vIOMMU driver can disaable PASID on the VF context entries. So the
> VF
> > > > > > although the capability is blanket enabled on PF, IOMMU gaurantees
> > > the
> > > > > > transactions are blocked.
> > > > > >
> > > > > >
> > > > > > In the interim, it seems 

Re: [PATCH v1 2/2] vfio/pci: Emulate PASID/PRI capability for VFs

2020-04-13 Thread Alex Williamson
On Tue, 14 Apr 2020 02:40:58 +
"Tian, Kevin"  wrote:

> > From: Alex Williamson 
> > Sent: Tuesday, April 14, 2020 3:21 AM
> > 
> > On Mon, 13 Apr 2020 08:05:33 +
> > "Tian, Kevin"  wrote:
> >   
> > > > From: Tian, Kevin
> > > > Sent: Monday, April 13, 2020 3:55 PM
> > > >  
> > > > > From: Raj, Ashok 
> > > > > Sent: Monday, April 13, 2020 11:11 AM
> > > > >
> > > > > On Wed, Apr 08, 2020 at 10:19:40AM -0600, Alex Williamson wrote:  
> > > > > > On Tue, 7 Apr 2020 21:00:21 -0700
> > > > > > "Raj, Ashok"  wrote:
> > > > > >  
> > > > > > > Hi Alex
> > > > > > >
> > > > > > > + Bjorn  
> > > > > >
> > > > > >  + Don
> > > > > >  
> > > > > > > FWIW I can't understand why PCI SIG went different ways with ATS,
> > > > > > > where its enumerated on PF and VF. But for PASID and PRI its only
> > > > > > > in PF.
> > > > > > >
> > > > > > > I'm checking with our internal SIG reps to followup on that.
> > > > > > >
> > > > > > > On Tue, Apr 07, 2020 at 09:58:01AM -0600, Alex Williamson wrote:  
> > > > > > > > > Is there vendor guarantee that hidden registers will locate 
> > > > > > > > > at the
> > > > > > > > > same offset between PF and VF config space?  
> > > > > > > >
> > > > > > > > I'm not sure if the spec really precludes hidden registers, but 
> > > > > > > > the
> > > > > > > > fact that these registers are explicitly outside of the 
> > > > > > > > capability
> > > > > > > > chain implies they're only intended for device specific use, so 
> > > > > > > > I'd  
> > say  
> > > > > > > > there are no guarantees about anything related to these 
> > > > > > > > registers.  
> > > > > > >
> > > > > > > As you had suggested in the other thread, we could consider
> > > > > > > using the same offset as in PF, but even that's a better guess
> > > > > > > still not reliable.
> > > > > > >
> > > > > > > The other option is to maybe extend driver ops in the PF to expose
> > > > > > > where the offsets should be. Sort of adding the quirk in the
> > > > > > > implementation.
> > > > > > >
> > > > > > > I'm not sure how prevalent are PASID and PRI in VF devices. If 
> > > > > > > SIG is  
> > > > > resisting  
> > > > > > > making VF's first class citizen, we might ask them to add some  
> > verbiage  
> > > > > > > to suggest leave the same offsets as PF open to help emulation  
> > software.  
> > > > > >
> > > > > > Even if we know where to expose these capabilities on the VF, it's 
> > > > > > not
> > > > > > clear to me how we can actually virtualize the capability itself.  
> > > > > > If
> > > > > > the spec defines, for example, an enable bit as r/w then software 
> > > > > > that
> > > > > > interacts with that register expects the bit is settable.  There's 
> > > > > > no
> > > > > > protocol for "try to set the bit and re-read it to see if the 
> > > > > > hardware
> > > > > > accepted it".  Therefore a capability with a fixed enable bit
> > > > > > representing the state of the PF, not settable by the VF, is
> > > > > > disingenuous to the spec.  
> > > > >
> > > > > I think we are all in violent agreement. A lot of times the pci spec 
> > > > > gets
> > > > > defined several years ahead of real products and no one remembers
> > > > > the justification on why they restricted things the way they did.
> > > > >
> > > > > Maybe someone early product wasn't quite exposing these features to  
> > the  
> > > > > VF
> > > > > and hence the spec is bug compatible :-)
> > > > >  
> > > > > >
> > > > > > If what we're trying to do is expose that PASID and PRI are enabled 
> > > > > > on
> > > > > > the PF to a VF driver, maybe duplicating the PF capabilities on the 
> > > > > > VF
> > > > > > without the ability to control it is not the right approach.  Maybe 
> > > > > > we  
> > > > >
> > > > > As long as the capability enable is only provided when the PF has  
> > enabled  
> > > > > the feature. Then it seems the hardware seems to do the right thing.
> > > > >
> > > > > Assume we expose PASID/PRI only when PF has enabled it. It will be the
> > > > > case since the PF driver needs to exist, and IOMMU would have set the
> > > > > PASID/PRI/ATS on PF.
> > > > >
> > > > > If the emulation is purely spoofing the capability. Once vIOMMU driver
> > > > > enables PASID, the context entries for the VF are completely  
> > independent  
> > > > > from the PF context entries.
> > > > >
> > > > > vIOMMU would enable PASID, and we just spoof the PASID capability.
> > > > >
> > > > > If vIOMMU or guest for some reason does disable_pasid(), then the
> > > > > vIOMMU driver can disaable PASID on the VF context entries. So the VF
> > > > > although the capability is blanket enabled on PF, IOMMU gaurantees  
> > the  
> > > > > transactions are blocked.
> > > > >
> > > > >
> > > > > In the interim, it seems like the intent of the virtual capability
> > > > > can be honored via help from the IOMMU for the controlling aspect..
> > > > >
> > > > > Did i miss anything?  
> > > >
> > > > Above works for emulating the 

RE: [PATCH v1 2/2] vfio/pci: Emulate PASID/PRI capability for VFs

2020-04-13 Thread Tian, Kevin
> From: Alex Williamson 
> Sent: Tuesday, April 14, 2020 3:21 AM
> 
> On Mon, 13 Apr 2020 08:05:33 +
> "Tian, Kevin"  wrote:
> 
> > > From: Tian, Kevin
> > > Sent: Monday, April 13, 2020 3:55 PM
> > >
> > > > From: Raj, Ashok 
> > > > Sent: Monday, April 13, 2020 11:11 AM
> > > >
> > > > On Wed, Apr 08, 2020 at 10:19:40AM -0600, Alex Williamson wrote:
> > > > > On Tue, 7 Apr 2020 21:00:21 -0700
> > > > > "Raj, Ashok"  wrote:
> > > > >
> > > > > > Hi Alex
> > > > > >
> > > > > > + Bjorn
> > > > >
> > > > >  + Don
> > > > >
> > > > > > FWIW I can't understand why PCI SIG went different ways with ATS,
> > > > > > where its enumerated on PF and VF. But for PASID and PRI its only
> > > > > > in PF.
> > > > > >
> > > > > > I'm checking with our internal SIG reps to followup on that.
> > > > > >
> > > > > > On Tue, Apr 07, 2020 at 09:58:01AM -0600, Alex Williamson wrote:
> > > > > > > > Is there vendor guarantee that hidden registers will locate at 
> > > > > > > > the
> > > > > > > > same offset between PF and VF config space?
> > > > > > >
> > > > > > > I'm not sure if the spec really precludes hidden registers, but 
> > > > > > > the
> > > > > > > fact that these registers are explicitly outside of the capability
> > > > > > > chain implies they're only intended for device specific use, so 
> > > > > > > I'd
> say
> > > > > > > there are no guarantees about anything related to these registers.
> > > > > >
> > > > > > As you had suggested in the other thread, we could consider
> > > > > > using the same offset as in PF, but even that's a better guess
> > > > > > still not reliable.
> > > > > >
> > > > > > The other option is to maybe extend driver ops in the PF to expose
> > > > > > where the offsets should be. Sort of adding the quirk in the
> > > > > > implementation.
> > > > > >
> > > > > > I'm not sure how prevalent are PASID and PRI in VF devices. If SIG 
> > > > > > is
> > > > resisting
> > > > > > making VF's first class citizen, we might ask them to add some
> verbiage
> > > > > > to suggest leave the same offsets as PF open to help emulation
> software.
> > > > >
> > > > > Even if we know where to expose these capabilities on the VF, it's not
> > > > > clear to me how we can actually virtualize the capability itself.  If
> > > > > the spec defines, for example, an enable bit as r/w then software that
> > > > > interacts with that register expects the bit is settable.  There's no
> > > > > protocol for "try to set the bit and re-read it to see if the hardware
> > > > > accepted it".  Therefore a capability with a fixed enable bit
> > > > > representing the state of the PF, not settable by the VF, is
> > > > > disingenuous to the spec.
> > > >
> > > > I think we are all in violent agreement. A lot of times the pci spec 
> > > > gets
> > > > defined several years ahead of real products and no one remembers
> > > > the justification on why they restricted things the way they did.
> > > >
> > > > Maybe someone early product wasn't quite exposing these features to
> the
> > > > VF
> > > > and hence the spec is bug compatible :-)
> > > >
> > > > >
> > > > > If what we're trying to do is expose that PASID and PRI are enabled on
> > > > > the PF to a VF driver, maybe duplicating the PF capabilities on the VF
> > > > > without the ability to control it is not the right approach.  Maybe we
> > > >
> > > > As long as the capability enable is only provided when the PF has
> enabled
> > > > the feature. Then it seems the hardware seems to do the right thing.
> > > >
> > > > Assume we expose PASID/PRI only when PF has enabled it. It will be the
> > > > case since the PF driver needs to exist, and IOMMU would have set the
> > > > PASID/PRI/ATS on PF.
> > > >
> > > > If the emulation is purely spoofing the capability. Once vIOMMU driver
> > > > enables PASID, the context entries for the VF are completely
> independent
> > > > from the PF context entries.
> > > >
> > > > vIOMMU would enable PASID, and we just spoof the PASID capability.
> > > >
> > > > If vIOMMU or guest for some reason does disable_pasid(), then the
> > > > vIOMMU driver can disaable PASID on the VF context entries. So the VF
> > > > although the capability is blanket enabled on PF, IOMMU gaurantees
> the
> > > > transactions are blocked.
> > > >
> > > >
> > > > In the interim, it seems like the intent of the virtual capability
> > > > can be honored via help from the IOMMU for the controlling aspect..
> > > >
> > > > Did i miss anything?
> > >
> > > Above works for emulating the enable bit (under the assumption that
> > > PF driver won't disable pasid when vf is assigned). However, there are
> > > also "Execute permission enable" and "Privileged mode enable" bits in
> > > PASID control registers. I don't know how those bits could be cleanly
> > > emulated when the guest writes a value different from PF's...
> >
> > sent too quick. the IOMMU also includes control bits for allowing/
> > blocking execute requests and supervisor requests. We can rely 

Re: [PATCH v1 2/2] vfio/pci: Emulate PASID/PRI capability for VFs

2020-04-13 Thread Alex Williamson
On Thu, 9 Apr 2020 09:35:33 +0200
Jean-Philippe Brucker  wrote:

> On Wed, Apr 08, 2020 at 10:19:40AM -0600, Alex Williamson wrote:
> > On Tue, 7 Apr 2020 21:00:21 -0700
> > "Raj, Ashok"  wrote:
> >   
> > > Hi Alex
> > > 
> > > + Bjorn  
> > 
> >  + Don
> >   
> > > FWIW I can't understand why PCI SIG went different ways with ATS, 
> > > where its enumerated on PF and VF. But for PASID and PRI its only
> > > in PF. 
> > > 
> > > I'm checking with our internal SIG reps to followup on that.
> > > 
> > > On Tue, Apr 07, 2020 at 09:58:01AM -0600, Alex Williamson wrote:  
> > > > > Is there vendor guarantee that hidden registers will locate at the
> > > > > same offset between PF and VF config space? 
> > > > 
> > > > I'm not sure if the spec really precludes hidden registers, but the
> > > > fact that these registers are explicitly outside of the capability
> > > > chain implies they're only intended for device specific use, so I'd say
> > > > there are no guarantees about anything related to these registers.
> > > 
> > > As you had suggested in the other thread, we could consider
> > > using the same offset as in PF, but even that's a better guess
> > > still not reliable.
> > > 
> > > The other option is to maybe extend driver ops in the PF to expose
> > > where the offsets should be. Sort of adding the quirk in the 
> > > implementation. 
> > > 
> > > I'm not sure how prevalent are PASID and PRI in VF devices. If SIG is 
> > > resisting 
> > > making VF's first class citizen, we might ask them to add some verbiage
> > > to suggest leave the same offsets as PF open to help emulation software.  
> > 
> > Even if we know where to expose these capabilities on the VF, it's not
> > clear to me how we can actually virtualize the capability itself.  If
> > the spec defines, for example, an enable bit as r/w then software that
> > interacts with that register expects the bit is settable.  There's no
> > protocol for "try to set the bit and re-read it to see if the hardware
> > accepted it".  Therefore a capability with a fixed enable bit
> > representing the state of the PF, not settable by the VF, is
> > disingenuous to the spec.  
> 
> Would it be OK to implement a lock down mechanism for the PF PASID
> capability, preventing changes to the PF cap when the VF is in use by
> VFIO?  The emulation would still break the spec: since the PF cap would
> always be enabled the VF configuration bits would have no effect, but it
> seems preferable to having the Enable bit not enable anything.

I think we absolutely need some mechanism to make sure the PF driver
doesn't change the PASID enable bit while we're using it.  A PASID user
registration perhaps.  And yes, that doesn't necessarily map to being
able to actually disable PASID, but it sounds like Kevin and Ashok have
some ideas how the emulation could use the IOMMU settings to achieve
that more precisely.

> > If what we're trying to do is expose that PASID and PRI are enabled on
> > the PF to a VF driver, maybe duplicating the PF capabilities on the VF
> > without the ability to control it is not the right approach.  Maybe we
> > need new capabilities exposing these as slave features that cannot be
> > controlled?  We could define our own vendor capability for this, but of
> > course we have both the where to put it in config space issue, as well
> > as the issue of trying to push an ad-hoc standard.  vfio could expose
> > these as device features rather than emulating capabilities, but that
> > still leaves a big gap between vfio in the hypervisor and the driver in
> > the guest VM.  That might still help push the responsibility and policy
> > for how to expose it to the VM as a userspace problem though.  
> 
> Userspace might have more difficulty working around the issues mentioned
> in this thread. They would still need a guarantee that the PF PASID
> configuration doesn't change at runtime, and they wouldn't have the
> ability to talk to a vendor driver to figure out where to place the fake
> PASID capability.

Couldn't we do this with the DEVICE_FEATURE ioctl we just added?  A
user could set a PASID user or get a capability offset, where vfio-pci
would plumb these through to whatever PF/vendor driver provides that
interface, just as if it was implemented in the vfio-pci driver.  But
that still lets us leave the policy of inserting a capability and using
the IOMMU to implement the bits to the user/QEMU.  Thanks,

Alex

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v1 2/2] vfio/pci: Emulate PASID/PRI capability for VFs

2020-04-13 Thread Alex Williamson
On Mon, 13 Apr 2020 08:05:33 +
"Tian, Kevin"  wrote:

> > From: Tian, Kevin
> > Sent: Monday, April 13, 2020 3:55 PM
> >   
> > > From: Raj, Ashok 
> > > Sent: Monday, April 13, 2020 11:11 AM
> > >
> > > On Wed, Apr 08, 2020 at 10:19:40AM -0600, Alex Williamson wrote:  
> > > > On Tue, 7 Apr 2020 21:00:21 -0700
> > > > "Raj, Ashok"  wrote:
> > > >  
> > > > > Hi Alex
> > > > >
> > > > > + Bjorn  
> > > >
> > > >  + Don
> > > >  
> > > > > FWIW I can't understand why PCI SIG went different ways with ATS,
> > > > > where its enumerated on PF and VF. But for PASID and PRI its only
> > > > > in PF.
> > > > >
> > > > > I'm checking with our internal SIG reps to followup on that.
> > > > >
> > > > > On Tue, Apr 07, 2020 at 09:58:01AM -0600, Alex Williamson wrote:  
> > > > > > > Is there vendor guarantee that hidden registers will locate at the
> > > > > > > same offset between PF and VF config space?  
> > > > > >
> > > > > > I'm not sure if the spec really precludes hidden registers, but the
> > > > > > fact that these registers are explicitly outside of the capability
> > > > > > chain implies they're only intended for device specific use, so I'd 
> > > > > > say
> > > > > > there are no guarantees about anything related to these registers.  
> > > > >
> > > > > As you had suggested in the other thread, we could consider
> > > > > using the same offset as in PF, but even that's a better guess
> > > > > still not reliable.
> > > > >
> > > > > The other option is to maybe extend driver ops in the PF to expose
> > > > > where the offsets should be. Sort of adding the quirk in the
> > > > > implementation.
> > > > >
> > > > > I'm not sure how prevalent are PASID and PRI in VF devices. If SIG is 
> > > > >  
> > > resisting  
> > > > > making VF's first class citizen, we might ask them to add some 
> > > > > verbiage
> > > > > to suggest leave the same offsets as PF open to help emulation 
> > > > > software.  
> > > >
> > > > Even if we know where to expose these capabilities on the VF, it's not
> > > > clear to me how we can actually virtualize the capability itself.  If
> > > > the spec defines, for example, an enable bit as r/w then software that
> > > > interacts with that register expects the bit is settable.  There's no
> > > > protocol for "try to set the bit and re-read it to see if the hardware
> > > > accepted it".  Therefore a capability with a fixed enable bit
> > > > representing the state of the PF, not settable by the VF, is
> > > > disingenuous to the spec.  
> > >
> > > I think we are all in violent agreement. A lot of times the pci spec gets
> > > defined several years ahead of real products and no one remembers
> > > the justification on why they restricted things the way they did.
> > >
> > > Maybe someone early product wasn't quite exposing these features to the
> > > VF
> > > and hence the spec is bug compatible :-)
> > >  
> > > >
> > > > If what we're trying to do is expose that PASID and PRI are enabled on
> > > > the PF to a VF driver, maybe duplicating the PF capabilities on the VF
> > > > without the ability to control it is not the right approach.  Maybe we  
> > >
> > > As long as the capability enable is only provided when the PF has enabled
> > > the feature. Then it seems the hardware seems to do the right thing.
> > >
> > > Assume we expose PASID/PRI only when PF has enabled it. It will be the
> > > case since the PF driver needs to exist, and IOMMU would have set the
> > > PASID/PRI/ATS on PF.
> > >
> > > If the emulation is purely spoofing the capability. Once vIOMMU driver
> > > enables PASID, the context entries for the VF are completely independent
> > > from the PF context entries.
> > >
> > > vIOMMU would enable PASID, and we just spoof the PASID capability.
> > >
> > > If vIOMMU or guest for some reason does disable_pasid(), then the
> > > vIOMMU driver can disaable PASID on the VF context entries. So the VF
> > > although the capability is blanket enabled on PF, IOMMU gaurantees the
> > > transactions are blocked.
> > >
> > >
> > > In the interim, it seems like the intent of the virtual capability
> > > can be honored via help from the IOMMU for the controlling aspect..
> > >
> > > Did i miss anything?  
> > 
> > Above works for emulating the enable bit (under the assumption that
> > PF driver won't disable pasid when vf is assigned). However, there are
> > also "Execute permission enable" and "Privileged mode enable" bits in
> > PASID control registers. I don't know how those bits could be cleanly
> > emulated when the guest writes a value different from PF's...  
> 
> sent too quick. the IOMMU also includes control bits for allowing/
> blocking execute requests and supervisor requests. We can rely on 
> IOMMU to block those requests to emulate the disabled cases of
> all three control bits in the pasid cap.


So if the emulation of the PASID capability takes into account the
IOMMU configuration to back that emulation, shouldn't we do that
emulation in the 

Re: [PATCH v1 2/2] vfio/pci: Emulate PASID/PRI capability for VFs

2020-04-13 Thread Alex Williamson
On Sun, 12 Apr 2020 20:29:31 -0700
"Raj, Ashok"  wrote:

> Hi Alex
> 
> Going through the PCIe Spec, there seems a lot of such capabilities
> that are different between PF and VF. Some that make sense
> and some don't.
> 
> 
> On Sun, Apr 12, 2020 at 08:10:43PM -0700, Raj, Ashok wrote:
> >   
> > > 
> > > I agree though, I don't know why the SIG would preclude implementing
> > > per VF control of these features.  Thanks,
> > >   
> 
> For e.g. 
> 
> VF doesn't have I/O and Mem space enables, but has BME

VFs don't have I/O, so I/O enable is irrelevant.  The memory enable bit
is emulated, so it doesn't really do anything from the VM perspective.
The hypervisor could provide more emulation around this, but it hasn't
proven necessary.

> Interrupt Status

VFs don't have INTx, so this is irrelevant.

> Correctable Error Reporting
> Almost all of Device Control Register.

Are we doing anything to virtualize these for VFs?  I think we've
addressed access control to these for PFs, but I don't see that we try
to virtualize them for the VF.

> So it seems like there is a ton of them we have to deal with today for 
> VF's. How do we manage to emulate them without any support for them 
> in VF's? 

The memory enable bit is just access to the MMIO space of the device,
the hypervisor could choose to do more, but currently emulating the bit
itself is sufficient.  This doesn't really affect the device, just
access to the device.  The device control registers, I don't think
we've had a need to virtualize them yet and I think we'd run into many
of the same questions.  If your point is that there exists gaps in the
spec that make things difficult to virtualize, I won't argue with you
there.  MPS is a nearby one that's difficult to virtualize on the PF
since its setting needs to take entire communication channels into
account.

So far though we aren't inventing new capabilities to add to VF config
space and pretending they work, we're just stumbling on what the VF
exposes whether on bare metal or in a VM.  Thanks,

Alex

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v1 2/2] vfio/pci: Emulate PASID/PRI capability for VFs

2020-04-13 Thread Tian, Kevin
> From: Tian, Kevin
> Sent: Monday, April 13, 2020 3:55 PM
> 
> > From: Raj, Ashok 
> > Sent: Monday, April 13, 2020 11:11 AM
> >
> > On Wed, Apr 08, 2020 at 10:19:40AM -0600, Alex Williamson wrote:
> > > On Tue, 7 Apr 2020 21:00:21 -0700
> > > "Raj, Ashok"  wrote:
> > >
> > > > Hi Alex
> > > >
> > > > + Bjorn
> > >
> > >  + Don
> > >
> > > > FWIW I can't understand why PCI SIG went different ways with ATS,
> > > > where its enumerated on PF and VF. But for PASID and PRI its only
> > > > in PF.
> > > >
> > > > I'm checking with our internal SIG reps to followup on that.
> > > >
> > > > On Tue, Apr 07, 2020 at 09:58:01AM -0600, Alex Williamson wrote:
> > > > > > Is there vendor guarantee that hidden registers will locate at the
> > > > > > same offset between PF and VF config space?
> > > > >
> > > > > I'm not sure if the spec really precludes hidden registers, but the
> > > > > fact that these registers are explicitly outside of the capability
> > > > > chain implies they're only intended for device specific use, so I'd 
> > > > > say
> > > > > there are no guarantees about anything related to these registers.
> > > >
> > > > As you had suggested in the other thread, we could consider
> > > > using the same offset as in PF, but even that's a better guess
> > > > still not reliable.
> > > >
> > > > The other option is to maybe extend driver ops in the PF to expose
> > > > where the offsets should be. Sort of adding the quirk in the
> > > > implementation.
> > > >
> > > > I'm not sure how prevalent are PASID and PRI in VF devices. If SIG is
> > resisting
> > > > making VF's first class citizen, we might ask them to add some verbiage
> > > > to suggest leave the same offsets as PF open to help emulation software.
> > >
> > > Even if we know where to expose these capabilities on the VF, it's not
> > > clear to me how we can actually virtualize the capability itself.  If
> > > the spec defines, for example, an enable bit as r/w then software that
> > > interacts with that register expects the bit is settable.  There's no
> > > protocol for "try to set the bit and re-read it to see if the hardware
> > > accepted it".  Therefore a capability with a fixed enable bit
> > > representing the state of the PF, not settable by the VF, is
> > > disingenuous to the spec.
> >
> > I think we are all in violent agreement. A lot of times the pci spec gets
> > defined several years ahead of real products and no one remembers
> > the justification on why they restricted things the way they did.
> >
> > Maybe someone early product wasn't quite exposing these features to the
> > VF
> > and hence the spec is bug compatible :-)
> >
> > >
> > > If what we're trying to do is expose that PASID and PRI are enabled on
> > > the PF to a VF driver, maybe duplicating the PF capabilities on the VF
> > > without the ability to control it is not the right approach.  Maybe we
> >
> > As long as the capability enable is only provided when the PF has enabled
> > the feature. Then it seems the hardware seems to do the right thing.
> >
> > Assume we expose PASID/PRI only when PF has enabled it. It will be the
> > case since the PF driver needs to exist, and IOMMU would have set the
> > PASID/PRI/ATS on PF.
> >
> > If the emulation is purely spoofing the capability. Once vIOMMU driver
> > enables PASID, the context entries for the VF are completely independent
> > from the PF context entries.
> >
> > vIOMMU would enable PASID, and we just spoof the PASID capability.
> >
> > If vIOMMU or guest for some reason does disable_pasid(), then the
> > vIOMMU driver can disaable PASID on the VF context entries. So the VF
> > although the capability is blanket enabled on PF, IOMMU gaurantees the
> > transactions are blocked.
> >
> >
> > In the interim, it seems like the intent of the virtual capability
> > can be honored via help from the IOMMU for the controlling aspect..
> >
> > Did i miss anything?
> 
> Above works for emulating the enable bit (under the assumption that
> PF driver won't disable pasid when vf is assigned). However, there are
> also "Execute permission enable" and "Privileged mode enable" bits in
> PASID control registers. I don't know how those bits could be cleanly
> emulated when the guest writes a value different from PF's...

sent too quick. the IOMMU also includes control bits for allowing/
blocking execute requests and supervisor requests. We can rely on 
IOMMU to block those requests to emulate the disabled cases of
all three control bits in the pasid cap.

Thanks
Kevin

> 
> Similar problem also exists when talking about PRI emulation, e.g.
> to enable PRI the software usually waits until the 'stopped' bit
> is set (indicating all previously issued requests have completed). How
> to emulate this bit accurately when one guest toggles the enable bit
> while the PF and other VFs are actively issuing page requests through
> the shared page request interface? from pcie spec I didn't find a way
> to catch when all previously-issued 

RE: [PATCH v1 2/2] vfio/pci: Emulate PASID/PRI capability for VFs

2020-04-13 Thread Tian, Kevin
> From: Raj, Ashok 
> Sent: Monday, April 13, 2020 11:11 AM
> 
> On Wed, Apr 08, 2020 at 10:19:40AM -0600, Alex Williamson wrote:
> > On Tue, 7 Apr 2020 21:00:21 -0700
> > "Raj, Ashok"  wrote:
> >
> > > Hi Alex
> > >
> > > + Bjorn
> >
> >  + Don
> >
> > > FWIW I can't understand why PCI SIG went different ways with ATS,
> > > where its enumerated on PF and VF. But for PASID and PRI its only
> > > in PF.
> > >
> > > I'm checking with our internal SIG reps to followup on that.
> > >
> > > On Tue, Apr 07, 2020 at 09:58:01AM -0600, Alex Williamson wrote:
> > > > > Is there vendor guarantee that hidden registers will locate at the
> > > > > same offset between PF and VF config space?
> > > >
> > > > I'm not sure if the spec really precludes hidden registers, but the
> > > > fact that these registers are explicitly outside of the capability
> > > > chain implies they're only intended for device specific use, so I'd say
> > > > there are no guarantees about anything related to these registers.
> > >
> > > As you had suggested in the other thread, we could consider
> > > using the same offset as in PF, but even that's a better guess
> > > still not reliable.
> > >
> > > The other option is to maybe extend driver ops in the PF to expose
> > > where the offsets should be. Sort of adding the quirk in the
> > > implementation.
> > >
> > > I'm not sure how prevalent are PASID and PRI in VF devices. If SIG is
> resisting
> > > making VF's first class citizen, we might ask them to add some verbiage
> > > to suggest leave the same offsets as PF open to help emulation software.
> >
> > Even if we know where to expose these capabilities on the VF, it's not
> > clear to me how we can actually virtualize the capability itself.  If
> > the spec defines, for example, an enable bit as r/w then software that
> > interacts with that register expects the bit is settable.  There's no
> > protocol for "try to set the bit and re-read it to see if the hardware
> > accepted it".  Therefore a capability with a fixed enable bit
> > representing the state of the PF, not settable by the VF, is
> > disingenuous to the spec.
> 
> I think we are all in violent agreement. A lot of times the pci spec gets
> defined several years ahead of real products and no one remembers
> the justification on why they restricted things the way they did.
> 
> Maybe someone early product wasn't quite exposing these features to the
> VF
> and hence the spec is bug compatible :-)
> 
> >
> > If what we're trying to do is expose that PASID and PRI are enabled on
> > the PF to a VF driver, maybe duplicating the PF capabilities on the VF
> > without the ability to control it is not the right approach.  Maybe we
> 
> As long as the capability enable is only provided when the PF has enabled
> the feature. Then it seems the hardware seems to do the right thing.
> 
> Assume we expose PASID/PRI only when PF has enabled it. It will be the
> case since the PF driver needs to exist, and IOMMU would have set the
> PASID/PRI/ATS on PF.
> 
> If the emulation is purely spoofing the capability. Once vIOMMU driver
> enables PASID, the context entries for the VF are completely independent
> from the PF context entries.
> 
> vIOMMU would enable PASID, and we just spoof the PASID capability.
> 
> If vIOMMU or guest for some reason does disable_pasid(), then the
> vIOMMU driver can disaable PASID on the VF context entries. So the VF
> although the capability is blanket enabled on PF, IOMMU gaurantees the
> transactions are blocked.
> 
> 
> In the interim, it seems like the intent of the virtual capability
> can be honored via help from the IOMMU for the controlling aspect..
> 
> Did i miss anything?

Above works for emulating the enable bit (under the assumption that 
PF driver won't disable pasid when vf is assigned). However, there are 
also "Execute permission enable" and "Privileged mode enable" bits in 
PASID control registers. I don't know how those bits could be cleanly 
emulated when the guest writes a value different from PF's...

Similar problem also exists when talking about PRI emulation, e.g. 
to enable PRI the software usually waits until the 'stopped' bit
is set (indicating all previously issued requests have completed). How
to emulate this bit accurately when one guest toggles the enable bit
while the PF and other VFs are actively issuing page requests through
the shared page request interface? from pcie spec I didn't find a way
to catch when all previously-issued requests from a specific VF have
completed. Can a conservative big-enough timeout value help here?
I don't know... similar puzzle also exists for emulating the 'reset'
control bit which is supposed to clear the pending request state for
the whole page request interface.

I feel the main problem in pcie spec is that, while they invent SR-IOV
to address I/O virtualization requirement (where strict isolation is
required), they blurred the boundary by leaving some shared resource 
cross PF and VFs which imply 

Re: [PATCH v1 2/2] vfio/pci: Emulate PASID/PRI capability for VFs

2020-04-12 Thread Raj, Ashok
Hi Alex

Going through the PCIe Spec, there seems a lot of such capabilities
that are different between PF and VF. Some that make sense
and some don't.


On Sun, Apr 12, 2020 at 08:10:43PM -0700, Raj, Ashok wrote:
> 
> > 
> > I agree though, I don't know why the SIG would preclude implementing
> > per VF control of these features.  Thanks,
> > 

For e.g. 

VF doesn't have I/O and Mem space enables, but has BME
Interrupt Status
Correctable Error Reporting
Almost all of Device Control Register.

So it seems like there is a ton of them we have to deal with today for 
VF's. How do we manage to emulate them without any support for them 
in VF's? 


> 
> Cheers,
> Ashok
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v1 2/2] vfio/pci: Emulate PASID/PRI capability for VFs

2020-04-12 Thread Raj, Ashok
On Wed, Apr 08, 2020 at 10:19:40AM -0600, Alex Williamson wrote:
> On Tue, 7 Apr 2020 21:00:21 -0700
> "Raj, Ashok"  wrote:
> 
> > Hi Alex
> > 
> > + Bjorn
> 
>  + Don
> 
> > FWIW I can't understand why PCI SIG went different ways with ATS, 
> > where its enumerated on PF and VF. But for PASID and PRI its only
> > in PF. 
> > 
> > I'm checking with our internal SIG reps to followup on that.
> > 
> > On Tue, Apr 07, 2020 at 09:58:01AM -0600, Alex Williamson wrote:
> > > > Is there vendor guarantee that hidden registers will locate at the
> > > > same offset between PF and VF config space?   
> > > 
> > > I'm not sure if the spec really precludes hidden registers, but the
> > > fact that these registers are explicitly outside of the capability
> > > chain implies they're only intended for device specific use, so I'd say
> > > there are no guarantees about anything related to these registers.  
> > 
> > As you had suggested in the other thread, we could consider
> > using the same offset as in PF, but even that's a better guess
> > still not reliable.
> > 
> > The other option is to maybe extend driver ops in the PF to expose
> > where the offsets should be. Sort of adding the quirk in the 
> > implementation. 
> > 
> > I'm not sure how prevalent are PASID and PRI in VF devices. If SIG is 
> > resisting 
> > making VF's first class citizen, we might ask them to add some verbiage
> > to suggest leave the same offsets as PF open to help emulation software.
> 
> Even if we know where to expose these capabilities on the VF, it's not
> clear to me how we can actually virtualize the capability itself.  If
> the spec defines, for example, an enable bit as r/w then software that
> interacts with that register expects the bit is settable.  There's no
> protocol for "try to set the bit and re-read it to see if the hardware
> accepted it".  Therefore a capability with a fixed enable bit
> representing the state of the PF, not settable by the VF, is
> disingenuous to the spec.

I think we are all in violent agreement. A lot of times the pci spec gets
defined several years ahead of real products and no one remembers
the justification on why they restricted things the way they did.

Maybe someone early product wasn't quite exposing these features to the VF
and hence the spec is bug compatible :-)

> 
> If what we're trying to do is expose that PASID and PRI are enabled on
> the PF to a VF driver, maybe duplicating the PF capabilities on the VF
> without the ability to control it is not the right approach.  Maybe we

As long as the capability enable is only provided when the PF has enabled
the feature. Then it seems the hardware seems to do the right thing.

Assume we expose PASID/PRI only when PF has enabled it. It will be the
case since the PF driver needs to exist, and IOMMU would have set the 
PASID/PRI/ATS on PF.

If the emulation is purely spoofing the capability. Once vIOMMU driver
enables PASID, the context entries for the VF are completely independent
from the PF context entries. 

vIOMMU would enable PASID, and we just spoof the PASID capability.

If vIOMMU or guest for some reason does disable_pasid(), then the 
vIOMMU driver can disaable PASID on the VF context entries. So the VF
although the capability is blanket enabled on PF, IOMMU gaurantees the
transactions are blocked.


In the interim, it seems like the intent of the virtual capability
can be honored via help from the IOMMU for the controlling aspect.. 

Did i miss anything? 

> need new capabilities exposing these as slave features that cannot be
> controlled?  We could define our own vendor capability for this, but of
> course we have both the where to put it in config space issue, as well
> as the issue of trying to push an ad-hoc standard.  vfio could expose
> these as device features rather than emulating capabilities, but that
> still leaves a big gap between vfio in the hypervisor and the driver in
> the guest VM.  That might still help push the responsibility and policy
> for how to expose it to the VM as a userspace problem though.

I think this is a good long term solution, but if the vIOMMU implenentations
can carry us for the time being, we can probably defer them unless
we are stuck.

> 
> I agree though, I don't know why the SIG would preclude implementing
> per VF control of these features.  Thanks,
> 

Cheers,
Ashok
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v1 2/2] vfio/pci: Emulate PASID/PRI capability for VFs

2020-04-09 Thread Jean-Philippe Brucker
On Wed, Apr 08, 2020 at 10:19:40AM -0600, Alex Williamson wrote:
> On Tue, 7 Apr 2020 21:00:21 -0700
> "Raj, Ashok"  wrote:
> 
> > Hi Alex
> > 
> > + Bjorn
> 
>  + Don
> 
> > FWIW I can't understand why PCI SIG went different ways with ATS, 
> > where its enumerated on PF and VF. But for PASID and PRI its only
> > in PF. 
> > 
> > I'm checking with our internal SIG reps to followup on that.
> > 
> > On Tue, Apr 07, 2020 at 09:58:01AM -0600, Alex Williamson wrote:
> > > > Is there vendor guarantee that hidden registers will locate at the
> > > > same offset between PF and VF config space?   
> > > 
> > > I'm not sure if the spec really precludes hidden registers, but the
> > > fact that these registers are explicitly outside of the capability
> > > chain implies they're only intended for device specific use, so I'd say
> > > there are no guarantees about anything related to these registers.  
> > 
> > As you had suggested in the other thread, we could consider
> > using the same offset as in PF, but even that's a better guess
> > still not reliable.
> > 
> > The other option is to maybe extend driver ops in the PF to expose
> > where the offsets should be. Sort of adding the quirk in the 
> > implementation. 
> > 
> > I'm not sure how prevalent are PASID and PRI in VF devices. If SIG is 
> > resisting 
> > making VF's first class citizen, we might ask them to add some verbiage
> > to suggest leave the same offsets as PF open to help emulation software.
> 
> Even if we know where to expose these capabilities on the VF, it's not
> clear to me how we can actually virtualize the capability itself.  If
> the spec defines, for example, an enable bit as r/w then software that
> interacts with that register expects the bit is settable.  There's no
> protocol for "try to set the bit and re-read it to see if the hardware
> accepted it".  Therefore a capability with a fixed enable bit
> representing the state of the PF, not settable by the VF, is
> disingenuous to the spec.

Would it be OK to implement a lock down mechanism for the PF PASID
capability, preventing changes to the PF cap when the VF is in use by
VFIO?  The emulation would still break the spec: since the PF cap would
always be enabled the VF configuration bits would have no effect, but it
seems preferable to having the Enable bit not enable anything.

> 
> If what we're trying to do is expose that PASID and PRI are enabled on
> the PF to a VF driver, maybe duplicating the PF capabilities on the VF
> without the ability to control it is not the right approach.  Maybe we
> need new capabilities exposing these as slave features that cannot be
> controlled?  We could define our own vendor capability for this, but of
> course we have both the where to put it in config space issue, as well
> as the issue of trying to push an ad-hoc standard.  vfio could expose
> these as device features rather than emulating capabilities, but that
> still leaves a big gap between vfio in the hypervisor and the driver in
> the guest VM.  That might still help push the responsibility and policy
> for how to expose it to the VM as a userspace problem though.

Userspace might have more difficulty working around the issues mentioned
in this thread. They would still need a guarantee that the PF PASID
configuration doesn't change at runtime, and they wouldn't have the
ability to talk to a vendor driver to figure out where to place the fake
PASID capability.

Thanks,
Jean

> 
> I agree though, I don't know why the SIG would preclude implementing
> per VF control of these features.  Thanks,
> 
> Alex
> 
> > > FWIW, vfio started out being more strict about restricting config space
> > > access to defined capabilities, until...
> > > 
> > > commit a7d1ea1c11b33bda2691f3294b4d735ed635535a
> > > Author: Alex Williamson 
> > > Date:   Mon Apr 1 09:04:12 2013 -0600
> > >   
> > 
> > Cheers,
> > Ashok
> > 
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v1 2/2] vfio/pci: Emulate PASID/PRI capability for VFs

2020-04-08 Thread Raj, Ashok
Hi Alex

On Wed, Apr 08, 2020 at 10:19:40AM -0600, Alex Williamson wrote:
> On Tue, 7 Apr 2020 21:00:21 -0700
> "Raj, Ashok"  wrote:
> 
> > Hi Alex
> > 
> > + Bjorn
> 
>  + Don
> 
> > FWIW I can't understand why PCI SIG went different ways with ATS, 
> > where its enumerated on PF and VF. But for PASID and PRI its only
> > in PF. 
> > 
> > I'm checking with our internal SIG reps to followup on that.
> > 
> > On Tue, Apr 07, 2020 at 09:58:01AM -0600, Alex Williamson wrote:
> > > > Is there vendor guarantee that hidden registers will locate at the
> > > > same offset between PF and VF config space?   
> > > 
> > > I'm not sure if the spec really precludes hidden registers, but the
> > > fact that these registers are explicitly outside of the capability
> > > chain implies they're only intended for device specific use, so I'd say
> > > there are no guarantees about anything related to these registers.  
> > 
> > As you had suggested in the other thread, we could consider
> > using the same offset as in PF, but even that's a better guess
> > still not reliable.
> > 
> > The other option is to maybe extend driver ops in the PF to expose
> > where the offsets should be. Sort of adding the quirk in the 
> > implementation. 
> > 
> > I'm not sure how prevalent are PASID and PRI in VF devices. If SIG is 
> > resisting 
> > making VF's first class citizen, we might ask them to add some verbiage
> > to suggest leave the same offsets as PF open to help emulation software.
> 
> Even if we know where to expose these capabilities on the VF, it's not
> clear to me how we can actually virtualize the capability itself.  If
> the spec defines, for example, an enable bit as r/w then software that
> interacts with that register expects the bit is settable.  There's no
> protocol for "try to set the bit and re-read it to see if the hardware
> accepted it".  Therefore a capability with a fixed enable bit
> representing the state of the PF, not settable by the VF, is
> disingenuous to the spec.
> 
> If what we're trying to do is expose that PASID and PRI are enabled on
> the PF to a VF driver, maybe duplicating the PF capabilities on the VF
> without the ability to control it is not the right approach.  Maybe we
> need new capabilities exposing these as slave features that cannot be
> controlled?  We could define our own vendor capability for this, but of

The other option is to say, VFIO would never emulate these
fake capablities. If exposing a VF with PASID/PRI is required
the PF driver would simply wrap it into a VDCM like model which we do
today for Scalable IOV devices. So PF handles all aspects of this
interface.

I also like the suggestion you propose, maybe an offset where these
capabilities are exposed to VF's. Maybe have an architected DEVCAPx
which exposes these RO capabilities. No control, and the 
offset should be preserved by the SIG, so VMM can have a safe place
to stash them.

> course we have both the where to put it in config space issue, as well
> as the issue of trying to push an ad-hoc standard.  vfio could expose
> these as device features rather than emulating capabilities, but that
> still leaves a big gap between vfio in the hypervisor and the driver in
> the guest VM.  That might still help push the responsibility and policy
> for how to expose it to the VM as a userspace problem though.
> 
> I agree though, I don't know why the SIG would preclude implementing
> per VF control of these features.  Thanks,

Even if we ask SIG for clarification, it might affect today's devices
So might not be useful to solve our current situation.

Ashok

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v1 2/2] vfio/pci: Emulate PASID/PRI capability for VFs

2020-04-08 Thread Alex Williamson
On Tue, 7 Apr 2020 21:00:21 -0700
"Raj, Ashok"  wrote:

> Hi Alex
> 
> + Bjorn

 + Don

> FWIW I can't understand why PCI SIG went different ways with ATS, 
> where its enumerated on PF and VF. But for PASID and PRI its only
> in PF. 
> 
> I'm checking with our internal SIG reps to followup on that.
> 
> On Tue, Apr 07, 2020 at 09:58:01AM -0600, Alex Williamson wrote:
> > > Is there vendor guarantee that hidden registers will locate at the
> > > same offset between PF and VF config space?   
> > 
> > I'm not sure if the spec really precludes hidden registers, but the
> > fact that these registers are explicitly outside of the capability
> > chain implies they're only intended for device specific use, so I'd say
> > there are no guarantees about anything related to these registers.  
> 
> As you had suggested in the other thread, we could consider
> using the same offset as in PF, but even that's a better guess
> still not reliable.
> 
> The other option is to maybe extend driver ops in the PF to expose
> where the offsets should be. Sort of adding the quirk in the 
> implementation. 
> 
> I'm not sure how prevalent are PASID and PRI in VF devices. If SIG is 
> resisting 
> making VF's first class citizen, we might ask them to add some verbiage
> to suggest leave the same offsets as PF open to help emulation software.

Even if we know where to expose these capabilities on the VF, it's not
clear to me how we can actually virtualize the capability itself.  If
the spec defines, for example, an enable bit as r/w then software that
interacts with that register expects the bit is settable.  There's no
protocol for "try to set the bit and re-read it to see if the hardware
accepted it".  Therefore a capability with a fixed enable bit
representing the state of the PF, not settable by the VF, is
disingenuous to the spec.

If what we're trying to do is expose that PASID and PRI are enabled on
the PF to a VF driver, maybe duplicating the PF capabilities on the VF
without the ability to control it is not the right approach.  Maybe we
need new capabilities exposing these as slave features that cannot be
controlled?  We could define our own vendor capability for this, but of
course we have both the where to put it in config space issue, as well
as the issue of trying to push an ad-hoc standard.  vfio could expose
these as device features rather than emulating capabilities, but that
still leaves a big gap between vfio in the hypervisor and the driver in
the guest VM.  That might still help push the responsibility and policy
for how to expose it to the VM as a userspace problem though.

I agree though, I don't know why the SIG would preclude implementing
per VF control of these features.  Thanks,

Alex

> > FWIW, vfio started out being more strict about restricting config space
> > access to defined capabilities, until...
> > 
> > commit a7d1ea1c11b33bda2691f3294b4d735ed635535a
> > Author: Alex Williamson 
> > Date:   Mon Apr 1 09:04:12 2013 -0600
> >   
> 
> Cheers,
> Ashok
> 

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v1 2/2] vfio/pci: Emulate PASID/PRI capability for VFs

2020-04-07 Thread Raj, Ashok
Hi Alex

+ Bjorn

FWIW I can't understand why PCI SIG went different ways with ATS, 
where its enumerated on PF and VF. But for PASID and PRI its only
in PF. 

I'm checking with our internal SIG reps to followup on that.

On Tue, Apr 07, 2020 at 09:58:01AM -0600, Alex Williamson wrote:
> > Is there vendor guarantee that hidden registers will locate at the
> > same offset between PF and VF config space? 
> 
> I'm not sure if the spec really precludes hidden registers, but the
> fact that these registers are explicitly outside of the capability
> chain implies they're only intended for device specific use, so I'd say
> there are no guarantees about anything related to these registers.

As you had suggested in the other thread, we could consider
using the same offset as in PF, but even that's a better guess
still not reliable.

The other option is to maybe extend driver ops in the PF to expose
where the offsets should be. Sort of adding the quirk in the 
implementation. 

I'm not sure how prevalent are PASID and PRI in VF devices. If SIG is resisting 
making VF's first class citizen, we might ask them to add some verbiage
to suggest leave the same offsets as PF open to help emulation software.


> 
> FWIW, vfio started out being more strict about restricting config space
> access to defined capabilities, until...
> 
> commit a7d1ea1c11b33bda2691f3294b4d735ed635535a
> Author: Alex Williamson 
> Date:   Mon Apr 1 09:04:12 2013 -0600
> 

Cheers,
Ashok

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v1 2/2] vfio/pci: Emulate PASID/PRI capability for VFs

2020-04-07 Thread Tian, Kevin
> From: Alex Williamson 
> Sent: Tuesday, April 7, 2020 11:58 PM
> 
> On Tue, 7 Apr 2020 04:26:23 +
> "Tian, Kevin"  wrote:
> 
> > > From: Alex Williamson 
> > > Sent: Saturday, April 4, 2020 1:26 AM
> > [...]
> > > > > > +   if (!pasid_cap.control_reg.paside) {
> > > > > > +   pr_debug("%s: its PF's PASID capability is not
> enabled\n",
> > > > > > +   dev_name(>pdev->dev));
> > > > > > +   ret = 0;
> > > > > > +   goto out;
> > > > > > +   }
> > > > >
> > > > > What happens if the PF's PASID gets disabled while we're using it??
> > > >
> > > > This is actually the open I highlighted in cover letter. Per the reply
> > > > from Baolu, this seems to be an open for bare-metal all the same.
> > > > https://lkml.org/lkml/2020/3/31/95
> > >
> > > Seems that needs to get sorted out before we can expose this.  Maybe
> > > some sort of registration with the PF driver that PASID is being used
> > > by a VF so it cannot be disabled?
> >
> > I guess we may do vSVA for PF first, and then adding VF vSVA later
> > given above additional need. It's not necessarily to enable both
> > in one step.
> >
> > [...]
> > > > > > @@ -1604,6 +1901,18 @@ static int vfio_ecap_init(struct
> > > vfio_pci_device *vdev)
> > > > > > if (!ecaps)
> > > > > > *(u32 *)>vconfig[PCI_CFG_SPACE_SIZE] = 0;
> > > > > >
> > > > > > +#ifdef CONFIG_PCI_ATS
> > > > > > +   if (pdev->is_virtfn) {
> > > > > > +   struct pci_dev *physfn = pdev->physfn;
> > > > > > +
> > > > > > +   ret = vfio_pci_add_emulated_cap_for_vf(vdev,
> > > > > > +   physfn, epos_max, prev);
> > > > > > +   if (ret)
> > > > > > +   pr_info("%s, failed to add special caps for
> VF %s\n",
> > > > > > +   __func__, dev_name(>pdev-
> >dev));
> > > > > > +   }
> > > > > > +#endif
> > > > >
> > > > > I can only imagine that we should place the caps at the same location
> > > > > they exist on the PF, we don't know what hidden registers might be
> > > > > hiding in config space.
> >
> > Is there vendor guarantee that hidden registers will locate at the
> > same offset between PF and VF config space?
> 
> I'm not sure if the spec really precludes hidden registers, but the
> fact that these registers are explicitly outside of the capability
> chain implies they're only intended for device specific use, so I'd say
> there are no guarantees about anything related to these registers.
> 
> FWIW, vfio started out being more strict about restricting config space
> access to defined capabilities, until...
> 
> commit a7d1ea1c11b33bda2691f3294b4d735ed635535a
> Author: Alex Williamson 
> Date:   Mon Apr 1 09:04:12 2013 -0600
> 
> vfio-pci: Enable raw access to unassigned config space
> 
> Devices like be2net hide registers between the gaps in capabilities
> and architected regions of PCI config space.  Our choices to support
> such devices is to either build an ever growing and unmanageable white
> list or rely on hardware isolation to protect us.  These registers are
> really no different than MMIO or I/O port space registers, which we
> don't attempt to regulate, so treat PCI config space in the same way.
> 
> > > > but we are not sure whether the same location is available on VF. In
> > > > this patch, it actually places the emulated cap physically behind the
> > > > cap which lays farthest (its offset is largest) within VF's config space
> > > > as the PCIe caps are linked in a chain.
> > >
> > > But, as we've found on Broadcom NICs (iirc), hardware developers have a
> > > nasty habit of hiding random registers in PCI config space, outside of
> > > defined capabilities.  I feel like IGD might even do this too, is that
> > > true?  So I don't think we can guarantee that just because a section of
> > > config space isn't part of a defined capability that its unused.  It
> > > only means that it's unused by common code, but it might have device
> > > specific purposes.  So of the PCIe spec indicates that VFs cannot
> > > include these capabilities and virtialization software needs to
> > > emulate them, we need somewhere safe to place them in config space,
> and
> > > simply placing them off the end of known capabilities doesn't give me
> > > any confidence.  Also, hardware has no requirement to make compact
> use
> > > of extended config space.  The first capability must be at 0x100, the
> > > very next capability could consume all the way to the last byte of the
> > > 4K extended range, and the next link in the chain could be somewhere in
> > > the middle.  Thanks,
> > >
> >
> > Then what would be a viable option? Vendor nasty habit implies
> > no standard, thus I don't see how VFIO can find a safe location
> > by itself. Also curious how those hidden registers are identified
> > by VFIO and employed with proper r/w policy today. If sort of quirks
> > are used, then could such quirk way be extended to also carry

Re: [PATCH v1 2/2] vfio/pci: Emulate PASID/PRI capability for VFs

2020-04-07 Thread Alex Williamson
On Tue, 7 Apr 2020 04:26:23 +
"Tian, Kevin"  wrote:

> > From: Alex Williamson 
> > Sent: Saturday, April 4, 2020 1:26 AM  
> [...]
> > > > > + if (!pasid_cap.control_reg.paside) {
> > > > > + pr_debug("%s: its PF's PASID capability is not 
> > > > > enabled\n",
> > > > > + dev_name(>pdev->dev));
> > > > > + ret = 0;
> > > > > + goto out;
> > > > > + }  
> > > >
> > > > What happens if the PF's PASID gets disabled while we're using it??  
> > >
> > > This is actually the open I highlighted in cover letter. Per the reply
> > > from Baolu, this seems to be an open for bare-metal all the same.
> > > https://lkml.org/lkml/2020/3/31/95  
> > 
> > Seems that needs to get sorted out before we can expose this.  Maybe
> > some sort of registration with the PF driver that PASID is being used
> > by a VF so it cannot be disabled?  
> 
> I guess we may do vSVA for PF first, and then adding VF vSVA later
> given above additional need. It's not necessarily to enable both
> in one step.
> 
> [...]
> > > > > @@ -1604,6 +1901,18 @@ static int vfio_ecap_init(struct  
> > vfio_pci_device *vdev)  
> > > > >   if (!ecaps)
> > > > >   *(u32 *)>vconfig[PCI_CFG_SPACE_SIZE] = 0;
> > > > >
> > > > > +#ifdef CONFIG_PCI_ATS
> > > > > + if (pdev->is_virtfn) {
> > > > > + struct pci_dev *physfn = pdev->physfn;
> > > > > +
> > > > > + ret = vfio_pci_add_emulated_cap_for_vf(vdev,
> > > > > + physfn, epos_max, prev);
> > > > > + if (ret)
> > > > > + pr_info("%s, failed to add special caps for VF 
> > > > > %s\n",
> > > > > + __func__, dev_name(>pdev->dev));
> > > > > + }
> > > > > +#endif  
> > > >
> > > > I can only imagine that we should place the caps at the same location
> > > > they exist on the PF, we don't know what hidden registers might be
> > > > hiding in config space.  
> 
> Is there vendor guarantee that hidden registers will locate at the
> same offset between PF and VF config space? 

I'm not sure if the spec really precludes hidden registers, but the
fact that these registers are explicitly outside of the capability
chain implies they're only intended for device specific use, so I'd say
there are no guarantees about anything related to these registers.

FWIW, vfio started out being more strict about restricting config space
access to defined capabilities, until...

commit a7d1ea1c11b33bda2691f3294b4d735ed635535a
Author: Alex Williamson 
Date:   Mon Apr 1 09:04:12 2013 -0600

vfio-pci: Enable raw access to unassigned config space

Devices like be2net hide registers between the gaps in capabilities
and architected regions of PCI config space.  Our choices to support
such devices is to either build an ever growing and unmanageable white
list or rely on hardware isolation to protect us.  These registers are
really no different than MMIO or I/O port space registers, which we
don't attempt to regulate, so treat PCI config space in the same way.

> > > but we are not sure whether the same location is available on VF. In
> > > this patch, it actually places the emulated cap physically behind the
> > > cap which lays farthest (its offset is largest) within VF's config space
> > > as the PCIe caps are linked in a chain.  
> > 
> > But, as we've found on Broadcom NICs (iirc), hardware developers have a
> > nasty habit of hiding random registers in PCI config space, outside of
> > defined capabilities.  I feel like IGD might even do this too, is that
> > true?  So I don't think we can guarantee that just because a section of
> > config space isn't part of a defined capability that its unused.  It
> > only means that it's unused by common code, but it might have device
> > specific purposes.  So of the PCIe spec indicates that VFs cannot
> > include these capabilities and virtialization software needs to
> > emulate them, we need somewhere safe to place them in config space, and
> > simply placing them off the end of known capabilities doesn't give me
> > any confidence.  Also, hardware has no requirement to make compact use
> > of extended config space.  The first capability must be at 0x100, the
> > very next capability could consume all the way to the last byte of the
> > 4K extended range, and the next link in the chain could be somewhere in
> > the middle.  Thanks,
> >   
> 
> Then what would be a viable option? Vendor nasty habit implies
> no standard, thus I don't see how VFIO can find a safe location
> by itself. Also curious how those hidden registers are identified
> by VFIO and employed with proper r/w policy today. If sort of quirks
> are used, then could such quirk way be extended to also carry
> the information about vendor specific safe location? When no
> such quirk info is provided (the majority case), VFIO then finds
> out a free location to carry the new cap.

See above 

RE: [PATCH v1 2/2] vfio/pci: Emulate PASID/PRI capability for VFs

2020-04-06 Thread Tian, Kevin
> From: Alex Williamson 
> Sent: Saturday, April 4, 2020 1:26 AM
[...]
> > > > +   if (!pasid_cap.control_reg.paside) {
> > > > +   pr_debug("%s: its PF's PASID capability is not 
> > > > enabled\n",
> > > > +   dev_name(>pdev->dev));
> > > > +   ret = 0;
> > > > +   goto out;
> > > > +   }
> > >
> > > What happens if the PF's PASID gets disabled while we're using it??
> >
> > This is actually the open I highlighted in cover letter. Per the reply
> > from Baolu, this seems to be an open for bare-metal all the same.
> > https://lkml.org/lkml/2020/3/31/95
> 
> Seems that needs to get sorted out before we can expose this.  Maybe
> some sort of registration with the PF driver that PASID is being used
> by a VF so it cannot be disabled?

I guess we may do vSVA for PF first, and then adding VF vSVA later
given above additional need. It's not necessarily to enable both
in one step.

[...]
> > > > @@ -1604,6 +1901,18 @@ static int vfio_ecap_init(struct
> vfio_pci_device *vdev)
> > > > if (!ecaps)
> > > > *(u32 *)>vconfig[PCI_CFG_SPACE_SIZE] = 0;
> > > >
> > > > +#ifdef CONFIG_PCI_ATS
> > > > +   if (pdev->is_virtfn) {
> > > > +   struct pci_dev *physfn = pdev->physfn;
> > > > +
> > > > +   ret = vfio_pci_add_emulated_cap_for_vf(vdev,
> > > > +   physfn, epos_max, prev);
> > > > +   if (ret)
> > > > +   pr_info("%s, failed to add special caps for VF 
> > > > %s\n",
> > > > +   __func__, dev_name(>pdev->dev));
> > > > +   }
> > > > +#endif
> > >
> > > I can only imagine that we should place the caps at the same location
> > > they exist on the PF, we don't know what hidden registers might be
> > > hiding in config space.

Is there vendor guarantee that hidden registers will locate at the
same offset between PF and VF config space? 

> >
> > but we are not sure whether the same location is available on VF. In
> > this patch, it actually places the emulated cap physically behind the
> > cap which lays farthest (its offset is largest) within VF's config space
> > as the PCIe caps are linked in a chain.
> 
> But, as we've found on Broadcom NICs (iirc), hardware developers have a
> nasty habit of hiding random registers in PCI config space, outside of
> defined capabilities.  I feel like IGD might even do this too, is that
> true?  So I don't think we can guarantee that just because a section of
> config space isn't part of a defined capability that its unused.  It
> only means that it's unused by common code, but it might have device
> specific purposes.  So of the PCIe spec indicates that VFs cannot
> include these capabilities and virtialization software needs to
> emulate them, we need somewhere safe to place them in config space, and
> simply placing them off the end of known capabilities doesn't give me
> any confidence.  Also, hardware has no requirement to make compact use
> of extended config space.  The first capability must be at 0x100, the
> very next capability could consume all the way to the last byte of the
> 4K extended range, and the next link in the chain could be somewhere in
> the middle.  Thanks,
> 

Then what would be a viable option? Vendor nasty habit implies
no standard, thus I don't see how VFIO can find a safe location
by itself. Also curious how those hidden registers are identified
by VFIO and employed with proper r/w policy today. If sort of quirks
are used, then could such quirk way be extended to also carry
the information about vendor specific safe location? When no
such quirk info is provided (the majority case), VFIO then finds
out a free location to carry the new cap.

Thanks
Kevin
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v1 2/2] vfio/pci: Emulate PASID/PRI capability for VFs

2020-04-03 Thread Alex Williamson
On Fri, 3 Apr 2020 07:53:55 +
"Liu, Yi L"  wrote:

> Hi Alex,
> 
> > From: Alex Williamson 
> > Sent: Friday, April 3, 2020 7:00 AM
> > To: Liu, Yi L 
> > Subject: Re: [PATCH v1 2/2] vfio/pci: Emulate PASID/PRI capability for VFs
> > 
> > On Sun, 22 Mar 2020 05:33:14 -0700
> > "Liu, Yi L"  wrote:
> >   
> > > From: Liu Yi L 
> > >
> > > Per PCIe r5.0, sec 9.3.7.14, if a PF implements the PASID Capability, the
> > > PF PASID configuration is shared by its VFs.  VFs must not implement their
> > > own PASID Capability.
> > >
> > > Per PCIe r5.0, sec 9.3.7.11, VFs must not implement the PRI Capability. If
> > > the PF implements PRI, it is shared by the VFs.
> > >
> > > On bare metal, it has been fixed by below efforts.
> > > to PASID/PRI are
> > > https://lkml.org/lkml/2019/9/5/996
> > > https://lkml.org/lkml/2019/9/5/995
> > >
> > > This patch adds emulated PASID/PRI capabilities for VFs when assigned to
> > > VMs via vfio-pci driver. This is required for enabling vSVA on 
> > > pass-through
> > > VFs as VFs have no PASID/PRI capability structure in its configure space.
> > >
> > > Cc: Kevin Tian 
> > > CC: Jacob Pan 
> > > Cc: Alex Williamson 
> > > Cc: Eric Auger 
> > > Cc: Jean-Philippe Brucker 
> > > Signed-off-by: Liu Yi L 
> > > ---
> > >  drivers/vfio/pci/vfio_pci_config.c | 325  
> > -  
> > >  1 file changed, 323 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/vfio/pci/vfio_pci_config.c 
> > > b/drivers/vfio/pci/vfio_pci_config.c
> > > index 4b9af99..84b4ea0 100644
> > > --- a/drivers/vfio/pci/vfio_pci_config.c
> > > +++ b/drivers/vfio/pci/vfio_pci_config.c
> > > @@ -1509,11 +1509,304 @@ static int vfio_cap_init(struct vfio_pci_device 
> > > *vdev)
> > >   return 0;
> > >  }
> > >
> > > +static int vfio_fill_custom_vconfig_bytes(struct vfio_pci_device *vdev,
> > > + int offset, uint8_t *data, int size)
> > > +{
> > > + int ret = 0, data_offset = 0;
> > > +
> > > + while (size) {
> > > + int filled;
> > > +
> > > + if (size >= 4 && !(offset % 4)) {
> > > + __le32 *dwordp = (__le32 *)>vconfig[offset];
> > > + u32 dword;
> > > +
> > > + memcpy(, data + data_offset, 4);
> > > + *dwordp = cpu_to_le32(dword);  
> > 
> > Why wouldn't we do:
> > 
> > *dwordp = cpu_to_le32(*(u32 *)(data + data_offset));
> > 
> > or better yet, increment data on each iteration for:
> > 
> > *dwordp = cpu_to_le32(*(u32 *)data);  
> 
> I'll refine it.
> 
> > vfio_fill_vconfig_bytes() does almost this same thing, getting
> > the data
> > from config space rather than a buffer, so please figure out how to
> > avoid duplicating the logic.  
> 
> This patch is to emulate the PASID/PRI capability for VF. And per
> my understanding, the cap data from PF's config space cannot be
> filled to VF's vconfig directly. Take the control register in PASID
> capability structure as an example. If PASID cap is enabled in PF,
> the PASID enable bit in the control register will be set. If the cap
> data is filled to VF's vconfig directly, then guest will see a default
> enabled PASID capability in the VF. I guess it is not good. But, I may
> be wrong. If no such requirement, I can use vfio_fill_vconfig_bytes()
> directly, no need to do copy and modify and then copy.
> 
> Also, if still needs to do the copy and modification, I may modify
> the vfio_fill_vconfig_bytes() to pass in the buffer and an indicator
> to tell vfio_fill_vconfig_bytes() whether fetch data from buffer or
> from config space.

Why is vconfig not your buffer?  We're building config space emulation
before the user has access to the device, so it's really not an issue
to copy the PF config into the VF vconfig, then modify the enable bit
in the vconfig space.

> > > + filled = 4;
> > > + } else if (size >= 2 && !(offset % 2)) {
> > > + __le16 *wordp = (__le16 *)>vconfig[offset];
> > > + u16 word;
> > > +
> > > + memcpy(, data + data_offset, 2);
> > > + *wordp = cpu_to_le16(word);
> > > + filled = 2;
> > > + } else {

RE: [PATCH v1 2/2] vfio/pci: Emulate PASID/PRI capability for VFs

2020-04-03 Thread Liu, Yi L
Hi Alex,

> From: Alex Williamson 
> Sent: Friday, April 3, 2020 7:00 AM
> To: Liu, Yi L 
> Subject: Re: [PATCH v1 2/2] vfio/pci: Emulate PASID/PRI capability for VFs
> 
> On Sun, 22 Mar 2020 05:33:14 -0700
> "Liu, Yi L"  wrote:
> 
> > From: Liu Yi L 
> >
> > Per PCIe r5.0, sec 9.3.7.14, if a PF implements the PASID Capability, the
> > PF PASID configuration is shared by its VFs.  VFs must not implement their
> > own PASID Capability.
> >
> > Per PCIe r5.0, sec 9.3.7.11, VFs must not implement the PRI Capability. If
> > the PF implements PRI, it is shared by the VFs.
> >
> > On bare metal, it has been fixed by below efforts.
> > to PASID/PRI are
> > https://lkml.org/lkml/2019/9/5/996
> > https://lkml.org/lkml/2019/9/5/995
> >
> > This patch adds emulated PASID/PRI capabilities for VFs when assigned to
> > VMs via vfio-pci driver. This is required for enabling vSVA on pass-through
> > VFs as VFs have no PASID/PRI capability structure in its configure space.
> >
> > Cc: Kevin Tian 
> > CC: Jacob Pan 
> > Cc: Alex Williamson 
> > Cc: Eric Auger 
> > Cc: Jean-Philippe Brucker 
> > Signed-off-by: Liu Yi L 
> > ---
> >  drivers/vfio/pci/vfio_pci_config.c | 325
> -
> >  1 file changed, 323 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/vfio/pci/vfio_pci_config.c 
> > b/drivers/vfio/pci/vfio_pci_config.c
> > index 4b9af99..84b4ea0 100644
> > --- a/drivers/vfio/pci/vfio_pci_config.c
> > +++ b/drivers/vfio/pci/vfio_pci_config.c
> > @@ -1509,11 +1509,304 @@ static int vfio_cap_init(struct vfio_pci_device 
> > *vdev)
> > return 0;
> >  }
> >
> > +static int vfio_fill_custom_vconfig_bytes(struct vfio_pci_device *vdev,
> > +   int offset, uint8_t *data, int size)
> > +{
> > +   int ret = 0, data_offset = 0;
> > +
> > +   while (size) {
> > +   int filled;
> > +
> > +   if (size >= 4 && !(offset % 4)) {
> > +   __le32 *dwordp = (__le32 *)>vconfig[offset];
> > +   u32 dword;
> > +
> > +   memcpy(, data + data_offset, 4);
> > +   *dwordp = cpu_to_le32(dword);
> 
> Why wouldn't we do:
> 
> *dwordp = cpu_to_le32(*(u32 *)(data + data_offset));
> 
> or better yet, increment data on each iteration for:
> 
> *dwordp = cpu_to_le32(*(u32 *)data);
> 
> vfio_fill_vconfig_bytes() does almost this same thing, getting the data
> from config space rather than a buffer, so please figure out how to
> avoid duplicating the logic.

Got another alternative. I may use the vfio_fill_vconfig_bytes()
to fill the cap data from PF's config space into VF's vconfig.
And after that, I can further modify the data in vconfig. e.g.
zero the control reg of pasid cap. would it make more sense?

> > +   filled = 4;
> > +   } else if (size >= 2 && !(offset % 2)) {
> > +   __le16 *wordp = (__le16 *)>vconfig[offset];
> > +   u16 word;
> > +
> > +   memcpy(, data + data_offset, 2);
> > +   *wordp = cpu_to_le16(word);
> > +   filled = 2;
> > +   } else {
> > +   u8 *byte = >vconfig[offset];
> > +
> > +   memcpy(byte, data + data_offset, 1);
[...]
> 
> > +
> > +   memset(map + epos, vpasid_cap.id, len);
> 
> See below.
> 
> > +   ret = vfio_fill_custom_vconfig_bytes(vdev, epos,
> > +   (u8 *)_cap, len);
> > +   if (!ret) {
> > +   /*
> > +* Successfully filled in PASID cap, update
> > +* the next offset in previous cap header,
> > +* and also update caller about the offset
> > +* of next cap if any.
> > +*/
> > +   u32 val = epos;
> > +   **prevp &= cpu_to_le32(~(0xffcU << 20));
> > +   **prevp |= cpu_to_le32(val << 20);
> > +   *prevp = (__le32 *)>vconfig[epos];
> > +   *next = epos + len;
> 
> Could we make this any more complicated?

I'm not sure if adding comments addressed this comment. After adding
new cap in vconfig, it needs to update the cap.next field of prior cap.
And in case of further add other cap, this function needs to update the
prevp pointer to the address of the newly added cap.

Regards,
Yi Liu

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v1 2/2] vfio/pci: Emulate PASID/PRI capability for VFs

2020-04-03 Thread Liu, Yi L
Hi Alex,

> From: Alex Williamson 
> Sent: Friday, April 3, 2020 7:00 AM
> To: Liu, Yi L 
> Subject: Re: [PATCH v1 2/2] vfio/pci: Emulate PASID/PRI capability for VFs
> 
> On Sun, 22 Mar 2020 05:33:14 -0700
> "Liu, Yi L"  wrote:
> 
> > From: Liu Yi L 
> >
> > Per PCIe r5.0, sec 9.3.7.14, if a PF implements the PASID Capability, the
> > PF PASID configuration is shared by its VFs.  VFs must not implement their
> > own PASID Capability.
> >
> > Per PCIe r5.0, sec 9.3.7.11, VFs must not implement the PRI Capability. If
> > the PF implements PRI, it is shared by the VFs.
> >
> > On bare metal, it has been fixed by below efforts.
> > to PASID/PRI are
> > https://lkml.org/lkml/2019/9/5/996
> > https://lkml.org/lkml/2019/9/5/995
> >
> > This patch adds emulated PASID/PRI capabilities for VFs when assigned to
> > VMs via vfio-pci driver. This is required for enabling vSVA on pass-through
> > VFs as VFs have no PASID/PRI capability structure in its configure space.
> >
> > Cc: Kevin Tian 
> > CC: Jacob Pan 
> > Cc: Alex Williamson 
> > Cc: Eric Auger 
> > Cc: Jean-Philippe Brucker 
> > Signed-off-by: Liu Yi L 
> > ---
> >  drivers/vfio/pci/vfio_pci_config.c | 325
> -
> >  1 file changed, 323 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/vfio/pci/vfio_pci_config.c 
> > b/drivers/vfio/pci/vfio_pci_config.c
> > index 4b9af99..84b4ea0 100644
> > --- a/drivers/vfio/pci/vfio_pci_config.c
> > +++ b/drivers/vfio/pci/vfio_pci_config.c
> > @@ -1509,11 +1509,304 @@ static int vfio_cap_init(struct vfio_pci_device 
> > *vdev)
> > return 0;
> >  }
> >
> > +static int vfio_fill_custom_vconfig_bytes(struct vfio_pci_device *vdev,
> > +   int offset, uint8_t *data, int size)
> > +{
> > +   int ret = 0, data_offset = 0;
> > +
> > +   while (size) {
> > +   int filled;
> > +
> > +   if (size >= 4 && !(offset % 4)) {
> > +   __le32 *dwordp = (__le32 *)>vconfig[offset];
> > +   u32 dword;
> > +
> > +   memcpy(, data + data_offset, 4);
> > +   *dwordp = cpu_to_le32(dword);
> 
> Why wouldn't we do:
> 
> *dwordp = cpu_to_le32(*(u32 *)(data + data_offset));
> 
> or better yet, increment data on each iteration for:
> 
> *dwordp = cpu_to_le32(*(u32 *)data);

I'll refine it.

> vfio_fill_vconfig_bytes() does almost this same thing, getting
> the data
> from config space rather than a buffer, so please figure out how to
> avoid duplicating the logic.

This patch is to emulate the PASID/PRI capability for VF. And per
my understanding, the cap data from PF's config space cannot be
filled to VF's vconfig directly. Take the control register in PASID
capability structure as an example. If PASID cap is enabled in PF,
the PASID enable bit in the control register will be set. If the cap
data is filled to VF's vconfig directly, then guest will see a default
enabled PASID capability in the VF. I guess it is not good. But, I may
be wrong. If no such requirement, I can use vfio_fill_vconfig_bytes()
directly, no need to do copy and modify and then copy.

Also, if still needs to do the copy and modification, I may modify
the vfio_fill_vconfig_bytes() to pass in the buffer and an indicator
to tell vfio_fill_vconfig_bytes() whether fetch data from buffer or
from config space.

> > +   filled = 4;
> > +   } else if (size >= 2 && !(offset % 2)) {
> > +   __le16 *wordp = (__le16 *)>vconfig[offset];
> > +   u16 word;
> > +
> > +   memcpy(, data + data_offset, 2);
> > +   *wordp = cpu_to_le16(word);
> > +   filled = 2;
> > +   } else {
> > +   u8 *byte = >vconfig[offset];
> > +
> > +   memcpy(byte, data + data_offset, 1);
> 
> This one is really silly.
> 
> vdev->vconfig[offset] = *data;

got it.

> > +   filled = 1;
> > +   }
> > +
> > +   offset += filled;
> > +   data_offset += filled;
> > +   size -= filled;
> > +   }
> > +
> > +   return ret;
> > +}
> > +
> > +static int vfio_pci_get_ecap_content(struct pci_dev *pdev,
> > +   int cap, int cap_len, u8 *content)
> > +{
> > +   int pos, offset, len = cap_len, ret = 0;
> > +
> > +   pos = pci_find_ext_capability(pdev, cap);
> > +   

Re: [PATCH v1 2/2] vfio/pci: Emulate PASID/PRI capability for VFs

2020-04-02 Thread Alex Williamson
On Sun, 22 Mar 2020 05:33:14 -0700
"Liu, Yi L"  wrote:

> From: Liu Yi L 
> 
> Per PCIe r5.0, sec 9.3.7.14, if a PF implements the PASID Capability, the
> PF PASID configuration is shared by its VFs.  VFs must not implement their
> own PASID Capability.
> 
> Per PCIe r5.0, sec 9.3.7.11, VFs must not implement the PRI Capability. If
> the PF implements PRI, it is shared by the VFs.
> 
> On bare metal, it has been fixed by below efforts.
> to PASID/PRI are
> https://lkml.org/lkml/2019/9/5/996
> https://lkml.org/lkml/2019/9/5/995
> 
> This patch adds emulated PASID/PRI capabilities for VFs when assigned to
> VMs via vfio-pci driver. This is required for enabling vSVA on pass-through
> VFs as VFs have no PASID/PRI capability structure in its configure space.
> 
> Cc: Kevin Tian 
> CC: Jacob Pan 
> Cc: Alex Williamson 
> Cc: Eric Auger 
> Cc: Jean-Philippe Brucker 
> Signed-off-by: Liu Yi L 
> ---
>  drivers/vfio/pci/vfio_pci_config.c | 325 
> -
>  1 file changed, 323 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci_config.c 
> b/drivers/vfio/pci/vfio_pci_config.c
> index 4b9af99..84b4ea0 100644
> --- a/drivers/vfio/pci/vfio_pci_config.c
> +++ b/drivers/vfio/pci/vfio_pci_config.c
> @@ -1509,11 +1509,304 @@ static int vfio_cap_init(struct vfio_pci_device 
> *vdev)
>   return 0;
>  }
>  
> +static int vfio_fill_custom_vconfig_bytes(struct vfio_pci_device *vdev,
> + int offset, uint8_t *data, int size)
> +{
> + int ret = 0, data_offset = 0;
> +
> + while (size) {
> + int filled;
> +
> + if (size >= 4 && !(offset % 4)) {
> + __le32 *dwordp = (__le32 *)>vconfig[offset];
> + u32 dword;
> +
> + memcpy(, data + data_offset, 4);
> + *dwordp = cpu_to_le32(dword);

Why wouldn't we do:

*dwordp = cpu_to_le32(*(u32 *)(data + data_offset));

or better yet, increment data on each iteration for:

*dwordp = cpu_to_le32(*(u32 *)data);

vfio_fill_vconfig_bytes() does almost this same thing, getting the data
from config space rather than a buffer, so please figure out how to
avoid duplicating the logic.

> + filled = 4;
> + } else if (size >= 2 && !(offset % 2)) {
> + __le16 *wordp = (__le16 *)>vconfig[offset];
> + u16 word;
> +
> + memcpy(, data + data_offset, 2);
> + *wordp = cpu_to_le16(word);
> + filled = 2;
> + } else {
> + u8 *byte = >vconfig[offset];
> +
> + memcpy(byte, data + data_offset, 1);

This one is really silly.

vdev->vconfig[offset] = *data;

> + filled = 1;
> + }
> +
> + offset += filled;
> + data_offset += filled;
> + size -= filled;
> + }
> +
> + return ret;
> +}
> +
> +static int vfio_pci_get_ecap_content(struct pci_dev *pdev,
> + int cap, int cap_len, u8 *content)
> +{
> + int pos, offset, len = cap_len, ret = 0;
> +
> + pos = pci_find_ext_capability(pdev, cap);
> + if (!pos)
> + return -EINVAL;
> +
> + offset = 0;
> + while (len) {
> + int fetched;
> +
> + if (len >= 4 && !(pos % 4)) {
> + u32 *dwordp = (u32 *) (content + offset);
> + u32 dword;
> + __le32 *dwptr = (__le32 *) 
> +
> + ret = pci_read_config_dword(pdev, pos, );
> + if (ret)
> + return ret;
> + *dwordp = le32_to_cpu(*dwptr);

WHAT???  pci_read_config_dword() returns cpu endian data!

> + fetched = 4;
> + } else if (len >= 2 && !(pos % 2)) {
> + u16 *wordp = (u16 *) (content + offset);
> + u16 word;
> + __le16 *wptr = (__le16 *) 
> +
> + ret = pci_read_config_word(pdev, pos, );
> + if (ret)
> + return ret;
> + *wordp = le16_to_cpu(*wptr);
> + fetched = 2;
> + } else {
> + u8 *byte = (u8 *) (content + offset);
> +
> + ret = pci_read_config_byte(pdev, pos, byte);
> + if (ret)
> + return ret;
> + fetched = 1;
> + }
> +
> + pos += fetched;
> + offset += fetched;
> + len -= fetched;
> + }
> +
> + return ret;
> +}
> +
> +struct vfio_pci_pasid_cap_data {
> + u32 id:16;
> + u32 version:4;
> + u32 next:12;
> + union {
> + u16 cap_reg_val;
> + struct {
> + u16 rsv1:1;
> + u16 execs:1;
> +