Re: [PATCH 3/3] intel-iommu: PASID support

2022-01-16 Thread Jason Wang
On Fri, Jan 14, 2022 at 8:59 PM Liu Yi L  wrote:
>
> On 2022/1/14 15:22, Jason Wang wrote:
> > On Fri, Jan 14, 2022 at 3:13 PM Peter Xu  wrote:
> >>
> >> On Fri, Jan 14, 2022 at 01:58:07PM +0800, Jason Wang wrote:
> > Right, but I think you meant to do this only when scalable mode is 
> > disabled.
> 
>  Yes IMHO it will definitely suite for !scalable case since that's 
>  exactly what
>  we did before.  What I'm also wondering is even if scalable is enabled 
>  but no
>  "real" pasid is used, so if all the translations go through the default 
>  pasid
>  that stored in the device context entry, then maybe we can ignore 
>  checking it.
>  The latter is the "hacky" part mentioned above.
> >>>
> >>> The problem I see is that we can't know what PASID is used as default
> >>> without reading the context entry?
> >>
> >> Can the default NO_PASID being used in mixture of !NO_PASID use case on the
> >> same device?  If that's possible, then I agree..
> >
> > My understanding is that it is possible.
> >
> >>
> >> My previous idea should be based on the fact that if NO_PASID is used on 
> >> one
> >> device, then all translations will be based on NO_PASID, but now I'm not 
> >> sure
> >> of it.
> >
> > Actually, what I meant is:
> >
> > device 1 using transactions without PASID with RID2PASID 1
> > device 2 using transactions without PASID with RID2PASID 2
>
> Interesting series, Jason.
>
> haven't read through all your code yet. Just a quick comment. The
> RID2PASID1 and RID2PASID2 may be the same one. Vt-d spec has defined a RPS
> bit in ecap register. If it is reported as 0, that means the RID_PASID
> (previously it is called RID2PASID :-)) field of scalable mode context
> entry is not supported, a PASID value of 0 will be used for transactions
> wihout PASID. So in the code, you may check the RPS bit to see if the
> RID_PASID value are the same for all devices.

Good to know this, will spend some time to implement this (probably on
top). That can speed up the IOTLB lookup somehow

Thanks

>
> Regards,
> Yi Liu
>
> > Then we can't assume a default pasid here.
> >
> >>
> >>>
> 
>  The other thing to mention is, if we postpone the iotlb lookup to be 
>  after
>  context entry, then logically we can have per-device iotlb, that means 
>  we can
>  replace IntelIOMMUState.iotlb with VTDAddressSpace.iotlb in the future, 
>  too,
>  which can also be more efficient.
> >>>
> >>> Right but we still need to limit the total slots and ATS is a better
> >>> way to deal with the IOTLB bottleneck actually.
> >>
> >> I think it depends on how the iotlb ghash is implemented.  Logically I 
> >> think if
> >> we can split the cache to per-device it'll be slightly better because we 
> >> don't
> >> need to iterate over iotlbs of other devices when lookup anymore; meanwhile
> >> each iotlb takes less space too (no devfn needed anymore).
> >
> > So we've already used sid in the IOTLB hash, I wonder how much we can
> > gain form this.
> >
> > Thanks
> >
> >>
> >> Thanks,
> >>
> >> --
> >> Peter Xu
> >>
> >
>
> --
> Regards,
> Yi Liu
>




Re: [PATCH 3/3] intel-iommu: PASID support

2022-01-14 Thread Liu Yi L

On 2022/1/14 15:22, Jason Wang wrote:

On Fri, Jan 14, 2022 at 3:13 PM Peter Xu  wrote:


On Fri, Jan 14, 2022 at 01:58:07PM +0800, Jason Wang wrote:

Right, but I think you meant to do this only when scalable mode is disabled.


Yes IMHO it will definitely suite for !scalable case since that's exactly what
we did before.  What I'm also wondering is even if scalable is enabled but no
"real" pasid is used, so if all the translations go through the default pasid
that stored in the device context entry, then maybe we can ignore checking it.
The latter is the "hacky" part mentioned above.


The problem I see is that we can't know what PASID is used as default
without reading the context entry?


Can the default NO_PASID being used in mixture of !NO_PASID use case on the
same device?  If that's possible, then I agree..


My understanding is that it is possible.



My previous idea should be based on the fact that if NO_PASID is used on one
device, then all translations will be based on NO_PASID, but now I'm not sure
of it.


Actually, what I meant is:

device 1 using transactions without PASID with RID2PASID 1
device 2 using transactions without PASID with RID2PASID 2


Interesting series, Jason.

haven't read through all your code yet. Just a quick comment. The 
RID2PASID1 and RID2PASID2 may be the same one. Vt-d spec has defined a RPS 
bit in ecap register. If it is reported as 0, that means the RID_PASID 
(previously it is called RID2PASID :-)) field of scalable mode context 
entry is not supported, a PASID value of 0 will be used for transactions 
wihout PASID. So in the code, you may check the RPS bit to see if the 
RID_PASID value are the same for all devices.


Regards,
Yi Liu


Then we can't assume a default pasid here.







The other thing to mention is, if we postpone the iotlb lookup to be after
context entry, then logically we can have per-device iotlb, that means we can
replace IntelIOMMUState.iotlb with VTDAddressSpace.iotlb in the future, too,
which can also be more efficient.


Right but we still need to limit the total slots and ATS is a better
way to deal with the IOTLB bottleneck actually.


I think it depends on how the iotlb ghash is implemented.  Logically I think if
we can split the cache to per-device it'll be slightly better because we don't
need to iterate over iotlbs of other devices when lookup anymore; meanwhile
each iotlb takes less space too (no devfn needed anymore).


So we've already used sid in the IOTLB hash, I wonder how much we can
gain form this.

Thanks



Thanks,

--
Peter Xu





--
Regards,
Yi Liu



Re: [PATCH 3/3] intel-iommu: PASID support

2022-01-14 Thread Jason Wang
On Fri, Jan 14, 2022 at 3:45 PM Peter Xu  wrote:
>
> On Fri, Jan 14, 2022 at 03:22:16PM +0800, Jason Wang wrote:
> > On Fri, Jan 14, 2022 at 3:13 PM Peter Xu  wrote:
> > >
> > > On Fri, Jan 14, 2022 at 01:58:07PM +0800, Jason Wang wrote:
> > > > > > Right, but I think you meant to do this only when scalable mode is 
> > > > > > disabled.
> > > > >
> > > > > Yes IMHO it will definitely suite for !scalable case since that's 
> > > > > exactly what
> > > > > we did before.  What I'm also wondering is even if scalable is 
> > > > > enabled but no
> > > > > "real" pasid is used, so if all the translations go through the 
> > > > > default pasid
> > > > > that stored in the device context entry, then maybe we can ignore 
> > > > > checking it.
> > > > > The latter is the "hacky" part mentioned above.
> > > >
> > > > The problem I see is that we can't know what PASID is used as default
> > > > without reading the context entry?
> > >
> > > Can the default NO_PASID being used in mixture of !NO_PASID use case on 
> > > the
> > > same device?  If that's possible, then I agree..
> >
> > My understanding is that it is possible.
>
> OK.
>
> >
> > >
> > > My previous idea should be based on the fact that if NO_PASID is used on 
> > > one
> > > device, then all translations will be based on NO_PASID, but now I'm not 
> > > sure
> > > of it.
> >
> > Actually, what I meant is:
> >
> > device 1 using transactions without PASID with RID2PASID 1
> > device 2 using transactions without PASID with RID2PASID 2
> >
> > Then we can't assume a default pasid here.
>
> This seems fine, because "device N" is still part of the equation when looking
> up, so we won't lookup wrong.

Right.

>
> But yeah.. it could not really work anyway.
>
> >
> > >
> > > >
> > > > >
> > > > > The other thing to mention is, if we postpone the iotlb lookup to be 
> > > > > after
> > > > > context entry, then logically we can have per-device iotlb, that 
> > > > > means we can
> > > > > replace IntelIOMMUState.iotlb with VTDAddressSpace.iotlb in the 
> > > > > future, too,
> > > > > which can also be more efficient.
> > > >
> > > > Right but we still need to limit the total slots and ATS is a better
> > > > way to deal with the IOTLB bottleneck actually.
> > >
> > > I think it depends on how the iotlb ghash is implemented.  Logically I 
> > > think if
> > > we can split the cache to per-device it'll be slightly better because we 
> > > don't
> > > need to iterate over iotlbs of other devices when lookup anymore; 
> > > meanwhile
> > > each iotlb takes less space too (no devfn needed anymore).
> >
> > So we've already used sid in the IOTLB hash, I wonder how much we can
> > gain form this.
>
> I think at least we can shrink iotlb structures, e.g.:
>
> struct vtd_iotlb_key {
> uint16_t sid;   <-- not needed
> uint32_t pasid; <-- not needed
> uint64_t gfn;
> uint32_t level;
> };

I don't get why PASID is not needed.

Thanks

>
> struct VTDIOTLBEntry {
> uint64_t gfn;
> uint16_t domain_id;
> uint32_t pasid; <-- not needed
> uint64_t slpte;
> uint64_t mask;
> uint8_t access_flags;
> };
>
> Thanks,
>
> --
> Peter Xu
>




Re: [PATCH 3/3] intel-iommu: PASID support

2022-01-14 Thread Peter Xu
On Fri, Jan 14, 2022 at 03:22:16PM +0800, Jason Wang wrote:
> On Fri, Jan 14, 2022 at 3:13 PM Peter Xu  wrote:
> >
> > On Fri, Jan 14, 2022 at 01:58:07PM +0800, Jason Wang wrote:
> > > > > Right, but I think you meant to do this only when scalable mode is 
> > > > > disabled.
> > > >
> > > > Yes IMHO it will definitely suite for !scalable case since that's 
> > > > exactly what
> > > > we did before.  What I'm also wondering is even if scalable is enabled 
> > > > but no
> > > > "real" pasid is used, so if all the translations go through the default 
> > > > pasid
> > > > that stored in the device context entry, then maybe we can ignore 
> > > > checking it.
> > > > The latter is the "hacky" part mentioned above.
> > >
> > > The problem I see is that we can't know what PASID is used as default
> > > without reading the context entry?
> >
> > Can the default NO_PASID being used in mixture of !NO_PASID use case on the
> > same device?  If that's possible, then I agree..
> 
> My understanding is that it is possible.

OK.

> 
> >
> > My previous idea should be based on the fact that if NO_PASID is used on one
> > device, then all translations will be based on NO_PASID, but now I'm not 
> > sure
> > of it.
> 
> Actually, what I meant is:
> 
> device 1 using transactions without PASID with RID2PASID 1
> device 2 using transactions without PASID with RID2PASID 2
> 
> Then we can't assume a default pasid here.

This seems fine, because "device N" is still part of the equation when looking
up, so we won't lookup wrong.

But yeah.. it could not really work anyway.

> 
> >
> > >
> > > >
> > > > The other thing to mention is, if we postpone the iotlb lookup to be 
> > > > after
> > > > context entry, then logically we can have per-device iotlb, that means 
> > > > we can
> > > > replace IntelIOMMUState.iotlb with VTDAddressSpace.iotlb in the future, 
> > > > too,
> > > > which can also be more efficient.
> > >
> > > Right but we still need to limit the total slots and ATS is a better
> > > way to deal with the IOTLB bottleneck actually.
> >
> > I think it depends on how the iotlb ghash is implemented.  Logically I 
> > think if
> > we can split the cache to per-device it'll be slightly better because we 
> > don't
> > need to iterate over iotlbs of other devices when lookup anymore; meanwhile
> > each iotlb takes less space too (no devfn needed anymore).
> 
> So we've already used sid in the IOTLB hash, I wonder how much we can
> gain form this.

I think at least we can shrink iotlb structures, e.g.:

struct vtd_iotlb_key {
uint16_t sid;   <-- not needed
uint32_t pasid; <-- not needed
uint64_t gfn;
uint32_t level;
};

struct VTDIOTLBEntry {
uint64_t gfn;
uint16_t domain_id;
uint32_t pasid; <-- not needed
uint64_t slpte;
uint64_t mask;
uint8_t access_flags;
};

Thanks,

-- 
Peter Xu




Re: [PATCH 3/3] intel-iommu: PASID support

2022-01-14 Thread Jason Wang
On Fri, Jan 14, 2022 at 3:13 PM Peter Xu  wrote:
>
> On Fri, Jan 14, 2022 at 01:58:07PM +0800, Jason Wang wrote:
> > > > Right, but I think you meant to do this only when scalable mode is 
> > > > disabled.
> > >
> > > Yes IMHO it will definitely suite for !scalable case since that's exactly 
> > > what
> > > we did before.  What I'm also wondering is even if scalable is enabled 
> > > but no
> > > "real" pasid is used, so if all the translations go through the default 
> > > pasid
> > > that stored in the device context entry, then maybe we can ignore 
> > > checking it.
> > > The latter is the "hacky" part mentioned above.
> >
> > The problem I see is that we can't know what PASID is used as default
> > without reading the context entry?
>
> Can the default NO_PASID being used in mixture of !NO_PASID use case on the
> same device?  If that's possible, then I agree..

My understanding is that it is possible.

>
> My previous idea should be based on the fact that if NO_PASID is used on one
> device, then all translations will be based on NO_PASID, but now I'm not sure
> of it.

Actually, what I meant is:

device 1 using transactions without PASID with RID2PASID 1
device 2 using transactions without PASID with RID2PASID 2

Then we can't assume a default pasid here.

>
> >
> > >
> > > The other thing to mention is, if we postpone the iotlb lookup to be after
> > > context entry, then logically we can have per-device iotlb, that means we 
> > > can
> > > replace IntelIOMMUState.iotlb with VTDAddressSpace.iotlb in the future, 
> > > too,
> > > which can also be more efficient.
> >
> > Right but we still need to limit the total slots and ATS is a better
> > way to deal with the IOTLB bottleneck actually.
>
> I think it depends on how the iotlb ghash is implemented.  Logically I think 
> if
> we can split the cache to per-device it'll be slightly better because we don't
> need to iterate over iotlbs of other devices when lookup anymore; meanwhile
> each iotlb takes less space too (no devfn needed anymore).

So we've already used sid in the IOTLB hash, I wonder how much we can
gain form this.

Thanks

>
> Thanks,
>
> --
> Peter Xu
>




Re: [PATCH 3/3] intel-iommu: PASID support

2022-01-13 Thread Peter Xu
On Fri, Jan 14, 2022 at 01:58:07PM +0800, Jason Wang wrote:
> > > Right, but I think you meant to do this only when scalable mode is 
> > > disabled.
> >
> > Yes IMHO it will definitely suite for !scalable case since that's exactly 
> > what
> > we did before.  What I'm also wondering is even if scalable is enabled but 
> > no
> > "real" pasid is used, so if all the translations go through the default 
> > pasid
> > that stored in the device context entry, then maybe we can ignore checking 
> > it.
> > The latter is the "hacky" part mentioned above.
> 
> The problem I see is that we can't know what PASID is used as default
> without reading the context entry?

Can the default NO_PASID being used in mixture of !NO_PASID use case on the
same device?  If that's possible, then I agree..

My previous idea should be based on the fact that if NO_PASID is used on one
device, then all translations will be based on NO_PASID, but now I'm not sure
of it.

> 
> >
> > The other thing to mention is, if we postpone the iotlb lookup to be after
> > context entry, then logically we can have per-device iotlb, that means we 
> > can
> > replace IntelIOMMUState.iotlb with VTDAddressSpace.iotlb in the future, too,
> > which can also be more efficient.
> 
> Right but we still need to limit the total slots and ATS is a better
> way to deal with the IOTLB bottleneck actually.

I think it depends on how the iotlb ghash is implemented.  Logically I think if
we can split the cache to per-device it'll be slightly better because we don't
need to iterate over iotlbs of other devices when lookup anymore; meanwhile
each iotlb takes less space too (no devfn needed anymore).

Thanks,

-- 
Peter Xu




Re: [PATCH 3/3] intel-iommu: PASID support

2022-01-13 Thread Jason Wang
On Fri, Jan 14, 2022 at 11:31 AM Peter Xu  wrote:
>
> On Fri, Jan 14, 2022 at 10:47:44AM +0800, Jason Wang wrote:
> >
> > 在 2022/1/13 下午1:06, Peter Xu 写道:
> > > On Wed, Jan 05, 2022 at 12:19:45PM +0800, Jason Wang wrote:
> > > > @@ -1725,11 +1780,16 @@ static bool 
> > > > vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
> > > >   cc_entry->context_cache_gen = s->context_cache_gen;
> > > >   }
> > > > +/* Try to fetch slpte form IOTLB */
> > > > +if ((pasid == PCI_NO_PASID) && s->root_scalable) {
> > > > +pasid = VTD_CE_GET_RID2PASID(&ce);
> > > > +}
> > > > +
> > > >   /*
> > > >* We don't need to translate for pass-through context entries.
> > > >* Also, let's ignore IOTLB caching as well for PT devices.
> > > >*/
> > > > -if (vtd_dev_pt_enabled(s, &ce)) {
> > > > +if (vtd_dev_pt_enabled(s, &ce, pasid)) {
> > > >   entry->iova = addr & VTD_PAGE_MASK_4K;
> > > >   entry->translated_addr = entry->iova;
> > > >   entry->addr_mask = ~VTD_PAGE_MASK_4K;
> > > > @@ -1750,14 +1810,24 @@ static bool 
> > > > vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
> > > >   return true;
> > > >   }
> > > > +iotlb_entry = vtd_lookup_iotlb(s, source_id, addr, pasid);
> > > > +if (iotlb_entry) {
> > > > +trace_vtd_iotlb_page_hit(source_id, addr, iotlb_entry->slpte,
> > > > + iotlb_entry->domain_id);
> > > > +slpte = iotlb_entry->slpte;
> > > > +access_flags = iotlb_entry->access_flags;
> > > > +page_mask = iotlb_entry->mask;
> > > > +goto out;
> > > > +}
> > > IIUC the iotlb lookup moved down just because the pasid==NO_PASID case 
> > > then
> > > we'll need to fetch the default pasid from the context entry.  That looks
> > > reasonable.
> > >
> > > It's just a bit of pity because logically it'll slow down iotlb hits due 
> > > to
> > > context entry operations.  When NO_PASID we could have looked up iotlb 
> > > without
> > > checking pasid at all, assuming that "default pasid" will always match.  
> > > But
> > > that is a little bit hacky.
> >
> >
> > Right, but I think you meant to do this only when scalable mode is disabled.
>
> Yes IMHO it will definitely suite for !scalable case since that's exactly what
> we did before.  What I'm also wondering is even if scalable is enabled but no
> "real" pasid is used, so if all the translations go through the default pasid
> that stored in the device context entry, then maybe we can ignore checking it.
> The latter is the "hacky" part mentioned above.

The problem I see is that we can't know what PASID is used as default
without reading the context entry?

>
> The other thing to mention is, if we postpone the iotlb lookup to be after
> context entry, then logically we can have per-device iotlb, that means we can
> replace IntelIOMMUState.iotlb with VTDAddressSpace.iotlb in the future, too,
> which can also be more efficient.

Right but we still need to limit the total slots and ATS is a better
way to deal with the IOTLB bottleneck actually.

>
> Not sure whether Michael will have a preference, for me I think either way can
> be done on top.
>
> >
> >
> > >
> > > vIOMMU seems to be mostly used for assigned devices and dpdk in 
> > > production in
> > > the future due to its slowness otherwise.. so maybe not a big deal at all.
> > >
> > > [...]
> > >
> > > > @@ -2011,7 +2083,52 @@ static void 
> > > > vtd_iotlb_page_invalidate(IntelIOMMUState *s, uint16_t domain_id,
> > > >   vtd_iommu_lock(s);
> > > >   g_hash_table_foreach_remove(s->iotlb, vtd_hash_remove_by_page, 
> > > > &info);
> > > >   vtd_iommu_unlock(s);
> > > > -vtd_iotlb_page_invalidate_notify(s, domain_id, addr, am);
> > > > +vtd_iotlb_page_invalidate_notify(s, domain_id, addr, am, 
> > > > PCI_NO_PASID);
> > > > +}
> > > > +
> > > > +static void vtd_iotlb_page_pasid_invalidate(IntelIOMMUState *s,
> > > > +uint16_t domain_id,
> > > > +hwaddr addr, uint8_t am,
> > > > +uint32_t pasid)
> > > > +{
> > > > +VTDIOTLBPageInvInfo info;
> > > > +
> > > > +trace_vtd_inv_desc_iotlb_pasid_pages(domain_id, addr, am, pasid);
> > > > +
> > > > +assert(am <= VTD_MAMV);
> > > > +info.domain_id = domain_id;
> > > > +info.addr = addr;
> > > > +info.mask = ~((1 << am) - 1);
> > > > +info.pasid = pasid;
> > > > +vtd_iommu_lock(s);
> > > > +g_hash_table_foreach_remove(s->iotlb, 
> > > > vtd_hash_remove_by_page_pasid, &info);
> > > > +vtd_iommu_unlock(s);
> > > > +vtd_iotlb_page_invalidate_notify(s, domain_id, addr, am, pasid);
> > > Hmm, I think indeed we need a notification, but it'll be unnecessary for
> > > e.g. vfio map notifiers, because this is 1st level invalidation and at 
> > > least so
> > > far vfio map notifiers are rewalking o

Re: [PATCH 3/3] intel-iommu: PASID support

2022-01-13 Thread Peter Xu
On Fri, Jan 14, 2022 at 10:47:44AM +0800, Jason Wang wrote:
> 
> 在 2022/1/13 下午1:06, Peter Xu 写道:
> > On Wed, Jan 05, 2022 at 12:19:45PM +0800, Jason Wang wrote:
> > > @@ -1725,11 +1780,16 @@ static bool 
> > > vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
> > >   cc_entry->context_cache_gen = s->context_cache_gen;
> > >   }
> > > +/* Try to fetch slpte form IOTLB */
> > > +if ((pasid == PCI_NO_PASID) && s->root_scalable) {
> > > +pasid = VTD_CE_GET_RID2PASID(&ce);
> > > +}
> > > +
> > >   /*
> > >* We don't need to translate for pass-through context entries.
> > >* Also, let's ignore IOTLB caching as well for PT devices.
> > >*/
> > > -if (vtd_dev_pt_enabled(s, &ce)) {
> > > +if (vtd_dev_pt_enabled(s, &ce, pasid)) {
> > >   entry->iova = addr & VTD_PAGE_MASK_4K;
> > >   entry->translated_addr = entry->iova;
> > >   entry->addr_mask = ~VTD_PAGE_MASK_4K;
> > > @@ -1750,14 +1810,24 @@ static bool 
> > > vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
> > >   return true;
> > >   }
> > > +iotlb_entry = vtd_lookup_iotlb(s, source_id, addr, pasid);
> > > +if (iotlb_entry) {
> > > +trace_vtd_iotlb_page_hit(source_id, addr, iotlb_entry->slpte,
> > > + iotlb_entry->domain_id);
> > > +slpte = iotlb_entry->slpte;
> > > +access_flags = iotlb_entry->access_flags;
> > > +page_mask = iotlb_entry->mask;
> > > +goto out;
> > > +}
> > IIUC the iotlb lookup moved down just because the pasid==NO_PASID case then
> > we'll need to fetch the default pasid from the context entry.  That looks
> > reasonable.
> > 
> > It's just a bit of pity because logically it'll slow down iotlb hits due to
> > context entry operations.  When NO_PASID we could have looked up iotlb 
> > without
> > checking pasid at all, assuming that "default pasid" will always match.  But
> > that is a little bit hacky.
> 
> 
> Right, but I think you meant to do this only when scalable mode is disabled.

Yes IMHO it will definitely suite for !scalable case since that's exactly what
we did before.  What I'm also wondering is even if scalable is enabled but no
"real" pasid is used, so if all the translations go through the default pasid
that stored in the device context entry, then maybe we can ignore checking it.
The latter is the "hacky" part mentioned above.

The other thing to mention is, if we postpone the iotlb lookup to be after
context entry, then logically we can have per-device iotlb, that means we can
replace IntelIOMMUState.iotlb with VTDAddressSpace.iotlb in the future, too,
which can also be more efficient.

Not sure whether Michael will have a preference, for me I think either way can
be done on top.

> 
> 
> > 
> > vIOMMU seems to be mostly used for assigned devices and dpdk in production 
> > in
> > the future due to its slowness otherwise.. so maybe not a big deal at all.
> > 
> > [...]
> > 
> > > @@ -2011,7 +2083,52 @@ static void 
> > > vtd_iotlb_page_invalidate(IntelIOMMUState *s, uint16_t domain_id,
> > >   vtd_iommu_lock(s);
> > >   g_hash_table_foreach_remove(s->iotlb, vtd_hash_remove_by_page, 
> > > &info);
> > >   vtd_iommu_unlock(s);
> > > -vtd_iotlb_page_invalidate_notify(s, domain_id, addr, am);
> > > +vtd_iotlb_page_invalidate_notify(s, domain_id, addr, am, 
> > > PCI_NO_PASID);
> > > +}
> > > +
> > > +static void vtd_iotlb_page_pasid_invalidate(IntelIOMMUState *s,
> > > +uint16_t domain_id,
> > > +hwaddr addr, uint8_t am,
> > > +uint32_t pasid)
> > > +{
> > > +VTDIOTLBPageInvInfo info;
> > > +
> > > +trace_vtd_inv_desc_iotlb_pasid_pages(domain_id, addr, am, pasid);
> > > +
> > > +assert(am <= VTD_MAMV);
> > > +info.domain_id = domain_id;
> > > +info.addr = addr;
> > > +info.mask = ~((1 << am) - 1);
> > > +info.pasid = pasid;
> > > +vtd_iommu_lock(s);
> > > +g_hash_table_foreach_remove(s->iotlb, vtd_hash_remove_by_page_pasid, 
> > > &info);
> > > +vtd_iommu_unlock(s);
> > > +vtd_iotlb_page_invalidate_notify(s, domain_id, addr, am, pasid);
> > Hmm, I think indeed we need a notification, but it'll be unnecessary for
> > e.g. vfio map notifiers, because this is 1st level invalidation and at 
> > least so
> > far vfio map notifiers are rewalking only the 2nd level page table, so 
> > it'll be
> > destined to be a no-op and pure overhead.
> 
> 
> Right, consider we don't implement l1 and we don't have a 1st level
> abstraction in neither vhost nor vfio, we can simply remove this.

We probably still need the real pasid invalidation parts in the future?  Either
for vhost (if vhost will going to cache pasid-based translations), or for
compatible assigned devices in the future where the HW can cache it.

I'm not sure what's the best way t

Re: [PATCH 3/3] intel-iommu: PASID support

2022-01-13 Thread Jason Wang



在 2022/1/13 下午1:06, Peter Xu 写道:

On Wed, Jan 05, 2022 at 12:19:45PM +0800, Jason Wang wrote:

@@ -1725,11 +1780,16 @@ static bool vtd_do_iommu_translate(VTDAddressSpace 
*vtd_as, PCIBus *bus,
  cc_entry->context_cache_gen = s->context_cache_gen;
  }
  
+/* Try to fetch slpte form IOTLB */

+if ((pasid == PCI_NO_PASID) && s->root_scalable) {
+pasid = VTD_CE_GET_RID2PASID(&ce);
+}
+
  /*
   * We don't need to translate for pass-through context entries.
   * Also, let's ignore IOTLB caching as well for PT devices.
   */
-if (vtd_dev_pt_enabled(s, &ce)) {
+if (vtd_dev_pt_enabled(s, &ce, pasid)) {
  entry->iova = addr & VTD_PAGE_MASK_4K;
  entry->translated_addr = entry->iova;
  entry->addr_mask = ~VTD_PAGE_MASK_4K;
@@ -1750,14 +1810,24 @@ static bool vtd_do_iommu_translate(VTDAddressSpace 
*vtd_as, PCIBus *bus,
  return true;
  }
  
+iotlb_entry = vtd_lookup_iotlb(s, source_id, addr, pasid);

+if (iotlb_entry) {
+trace_vtd_iotlb_page_hit(source_id, addr, iotlb_entry->slpte,
+ iotlb_entry->domain_id);
+slpte = iotlb_entry->slpte;
+access_flags = iotlb_entry->access_flags;
+page_mask = iotlb_entry->mask;
+goto out;
+}

IIUC the iotlb lookup moved down just because the pasid==NO_PASID case then
we'll need to fetch the default pasid from the context entry.  That looks
reasonable.

It's just a bit of pity because logically it'll slow down iotlb hits due to
context entry operations.  When NO_PASID we could have looked up iotlb without
checking pasid at all, assuming that "default pasid" will always match.  But
that is a little bit hacky.



Right, but I think you meant to do this only when scalable mode is disabled.




vIOMMU seems to be mostly used for assigned devices and dpdk in production in
the future due to its slowness otherwise.. so maybe not a big deal at all.

[...]


@@ -2011,7 +2083,52 @@ static void vtd_iotlb_page_invalidate(IntelIOMMUState 
*s, uint16_t domain_id,
  vtd_iommu_lock(s);
  g_hash_table_foreach_remove(s->iotlb, vtd_hash_remove_by_page, &info);
  vtd_iommu_unlock(s);
-vtd_iotlb_page_invalidate_notify(s, domain_id, addr, am);
+vtd_iotlb_page_invalidate_notify(s, domain_id, addr, am, PCI_NO_PASID);
+}
+
+static void vtd_iotlb_page_pasid_invalidate(IntelIOMMUState *s,
+uint16_t domain_id,
+hwaddr addr, uint8_t am,
+uint32_t pasid)
+{
+VTDIOTLBPageInvInfo info;
+
+trace_vtd_inv_desc_iotlb_pasid_pages(domain_id, addr, am, pasid);
+
+assert(am <= VTD_MAMV);
+info.domain_id = domain_id;
+info.addr = addr;
+info.mask = ~((1 << am) - 1);
+info.pasid = pasid;
+vtd_iommu_lock(s);
+g_hash_table_foreach_remove(s->iotlb, vtd_hash_remove_by_page_pasid, 
&info);
+vtd_iommu_unlock(s);
+vtd_iotlb_page_invalidate_notify(s, domain_id, addr, am, pasid);

Hmm, I think indeed we need a notification, but it'll be unnecessary for
e.g. vfio map notifiers, because this is 1st level invalidation and at least so
far vfio map notifiers are rewalking only the 2nd level page table, so it'll be
destined to be a no-op and pure overhead.



Right, consider we don't implement l1 and we don't have a 1st level 
abstraction in neither vhost nor vfio, we can simply remove this.






+}
+
+static void vtd_iotlb_pasid_invalidate(IntelIOMMUState *s, uint16_t domain_id,
+   uint32_t pasid)
+{
+VTDIOTLBPageInvInfo info;
+VTDAddressSpace *vtd_as;
+VTDContextEntry ce;
+
+trace_vtd_inv_desc_iotlb_pasid(domain_id, pasid);
+
+info.domain_id = domain_id;
+info.pasid = pasid;
+vtd_iommu_lock(s);
+g_hash_table_foreach_remove(s->iotlb, vtd_hash_remove_by_pasid, &info);
+vtd_iommu_unlock(s);
+
+QLIST_FOREACH(vtd_as, &s->vtd_as_with_notifiers, next) {
+if (!vtd_dev_to_context_entry(s, pci_bus_num(vtd_as->bus),
+  vtd_as->devfn, &ce) &&
+domain_id == vtd_get_domain_id(s, &ce, vtd_as->pasid) &&
+pasid == vtd_as->pasid) {
+vtd_sync_shadow_page_table(vtd_as);

Do we need to rewalk the shadow pgtable (which is the 2nd level, afaict) even
if we got the 1st level pgtable invalidated?



Seems not and this makes me think to remove the whole PASID based 
invalidation logic since they are for L1 which is not implemented in 
this series.






+}
+}
  }

The rest looks mostly good to me; thanks.



Thanks









Re: [PATCH 3/3] intel-iommu: PASID support

2022-01-12 Thread Michael S. Tsirkin
On Thu, Jan 13, 2022 at 01:06:09PM +0800, Peter Xu wrote:
> On Wed, Jan 05, 2022 at 12:19:45PM +0800, Jason Wang wrote:
> > @@ -1725,11 +1780,16 @@ static bool vtd_do_iommu_translate(VTDAddressSpace 
> > *vtd_as, PCIBus *bus,
> >  cc_entry->context_cache_gen = s->context_cache_gen;
> >  }
> >  
> > +/* Try to fetch slpte form IOTLB */
> > +if ((pasid == PCI_NO_PASID) && s->root_scalable) {
> > +pasid = VTD_CE_GET_RID2PASID(&ce);
> > +}
> > +
> >  /*
> >   * We don't need to translate for pass-through context entries.
> >   * Also, let's ignore IOTLB caching as well for PT devices.
> >   */
> > -if (vtd_dev_pt_enabled(s, &ce)) {
> > +if (vtd_dev_pt_enabled(s, &ce, pasid)) {
> >  entry->iova = addr & VTD_PAGE_MASK_4K;
> >  entry->translated_addr = entry->iova;
> >  entry->addr_mask = ~VTD_PAGE_MASK_4K;
> > @@ -1750,14 +1810,24 @@ static bool vtd_do_iommu_translate(VTDAddressSpace 
> > *vtd_as, PCIBus *bus,
> >  return true;
> >  }
> >  
> > +iotlb_entry = vtd_lookup_iotlb(s, source_id, addr, pasid);
> > +if (iotlb_entry) {
> > +trace_vtd_iotlb_page_hit(source_id, addr, iotlb_entry->slpte,
> > + iotlb_entry->domain_id);
> > +slpte = iotlb_entry->slpte;
> > +access_flags = iotlb_entry->access_flags;
> > +page_mask = iotlb_entry->mask;
> > +goto out;
> > +}
> 
> IIUC the iotlb lookup moved down just because the pasid==NO_PASID case then
> we'll need to fetch the default pasid from the context entry.  That looks
> reasonable.
> 
> It's just a bit of pity because logically it'll slow down iotlb hits due to
> context entry operations.  When NO_PASID we could have looked up iotlb without
> checking pasid at all, assuming that "default pasid" will always match.  But
> that is a little bit hacky.

Maybe that's not a bad idea for an optimization.


> vIOMMU seems to be mostly used for assigned devices and dpdk in production in
> the future due to its slowness otherwise.. so maybe not a big deal at all.
> 
> [...]
> 
> > @@ -2011,7 +2083,52 @@ static void 
> > vtd_iotlb_page_invalidate(IntelIOMMUState *s, uint16_t domain_id,
> >  vtd_iommu_lock(s);
> >  g_hash_table_foreach_remove(s->iotlb, vtd_hash_remove_by_page, &info);
> >  vtd_iommu_unlock(s);
> > -vtd_iotlb_page_invalidate_notify(s, domain_id, addr, am);
> > +vtd_iotlb_page_invalidate_notify(s, domain_id, addr, am, PCI_NO_PASID);
> > +}
> > +
> > +static void vtd_iotlb_page_pasid_invalidate(IntelIOMMUState *s,
> > +uint16_t domain_id,
> > +hwaddr addr, uint8_t am,
> > +uint32_t pasid)
> > +{
> > +VTDIOTLBPageInvInfo info;
> > +
> > +trace_vtd_inv_desc_iotlb_pasid_pages(domain_id, addr, am, pasid);
> > +
> > +assert(am <= VTD_MAMV);
> > +info.domain_id = domain_id;
> > +info.addr = addr;
> > +info.mask = ~((1 << am) - 1);
> > +info.pasid = pasid;
> > +vtd_iommu_lock(s);
> > +g_hash_table_foreach_remove(s->iotlb, vtd_hash_remove_by_page_pasid, 
> > &info);
> > +vtd_iommu_unlock(s);
> > +vtd_iotlb_page_invalidate_notify(s, domain_id, addr, am, pasid);
> 
> Hmm, I think indeed we need a notification, but it'll be unnecessary for
> e.g. vfio map notifiers, because this is 1st level invalidation and at least 
> so
> far vfio map notifiers are rewalking only the 2nd level page table, so it'll 
> be
> destined to be a no-op and pure overhead.
> 
> > +}
> > +
> > +static void vtd_iotlb_pasid_invalidate(IntelIOMMUState *s, uint16_t 
> > domain_id,
> > +   uint32_t pasid)
> > +{
> > +VTDIOTLBPageInvInfo info;
> > +VTDAddressSpace *vtd_as;
> > +VTDContextEntry ce;
> > +
> > +trace_vtd_inv_desc_iotlb_pasid(domain_id, pasid);
> > +
> > +info.domain_id = domain_id;
> > +info.pasid = pasid;
> > +vtd_iommu_lock(s);
> > +g_hash_table_foreach_remove(s->iotlb, vtd_hash_remove_by_pasid, &info);
> > +vtd_iommu_unlock(s);
> > +
> > +QLIST_FOREACH(vtd_as, &s->vtd_as_with_notifiers, next) {
> > +if (!vtd_dev_to_context_entry(s, pci_bus_num(vtd_as->bus),
> > +  vtd_as->devfn, &ce) &&
> > +domain_id == vtd_get_domain_id(s, &ce, vtd_as->pasid) &&
> > +pasid == vtd_as->pasid) {
> > +vtd_sync_shadow_page_table(vtd_as);
> 
> Do we need to rewalk the shadow pgtable (which is the 2nd level, afaict) even
> if we got the 1st level pgtable invalidated?
> 
> > +}
> > +}
> >  }
> 
> The rest looks mostly good to me; thanks.
> 
> -- 
> Peter Xu




Re: [PATCH 3/3] intel-iommu: PASID support

2022-01-12 Thread Peter Xu
On Wed, Jan 05, 2022 at 12:19:45PM +0800, Jason Wang wrote:
> @@ -1725,11 +1780,16 @@ static bool vtd_do_iommu_translate(VTDAddressSpace 
> *vtd_as, PCIBus *bus,
>  cc_entry->context_cache_gen = s->context_cache_gen;
>  }
>  
> +/* Try to fetch slpte form IOTLB */
> +if ((pasid == PCI_NO_PASID) && s->root_scalable) {
> +pasid = VTD_CE_GET_RID2PASID(&ce);
> +}
> +
>  /*
>   * We don't need to translate for pass-through context entries.
>   * Also, let's ignore IOTLB caching as well for PT devices.
>   */
> -if (vtd_dev_pt_enabled(s, &ce)) {
> +if (vtd_dev_pt_enabled(s, &ce, pasid)) {
>  entry->iova = addr & VTD_PAGE_MASK_4K;
>  entry->translated_addr = entry->iova;
>  entry->addr_mask = ~VTD_PAGE_MASK_4K;
> @@ -1750,14 +1810,24 @@ static bool vtd_do_iommu_translate(VTDAddressSpace 
> *vtd_as, PCIBus *bus,
>  return true;
>  }
>  
> +iotlb_entry = vtd_lookup_iotlb(s, source_id, addr, pasid);
> +if (iotlb_entry) {
> +trace_vtd_iotlb_page_hit(source_id, addr, iotlb_entry->slpte,
> + iotlb_entry->domain_id);
> +slpte = iotlb_entry->slpte;
> +access_flags = iotlb_entry->access_flags;
> +page_mask = iotlb_entry->mask;
> +goto out;
> +}

IIUC the iotlb lookup moved down just because the pasid==NO_PASID case then
we'll need to fetch the default pasid from the context entry.  That looks
reasonable.

It's just a bit of pity because logically it'll slow down iotlb hits due to
context entry operations.  When NO_PASID we could have looked up iotlb without
checking pasid at all, assuming that "default pasid" will always match.  But
that is a little bit hacky.

vIOMMU seems to be mostly used for assigned devices and dpdk in production in
the future due to its slowness otherwise.. so maybe not a big deal at all.

[...]

> @@ -2011,7 +2083,52 @@ static void vtd_iotlb_page_invalidate(IntelIOMMUState 
> *s, uint16_t domain_id,
>  vtd_iommu_lock(s);
>  g_hash_table_foreach_remove(s->iotlb, vtd_hash_remove_by_page, &info);
>  vtd_iommu_unlock(s);
> -vtd_iotlb_page_invalidate_notify(s, domain_id, addr, am);
> +vtd_iotlb_page_invalidate_notify(s, domain_id, addr, am, PCI_NO_PASID);
> +}
> +
> +static void vtd_iotlb_page_pasid_invalidate(IntelIOMMUState *s,
> +uint16_t domain_id,
> +hwaddr addr, uint8_t am,
> +uint32_t pasid)
> +{
> +VTDIOTLBPageInvInfo info;
> +
> +trace_vtd_inv_desc_iotlb_pasid_pages(domain_id, addr, am, pasid);
> +
> +assert(am <= VTD_MAMV);
> +info.domain_id = domain_id;
> +info.addr = addr;
> +info.mask = ~((1 << am) - 1);
> +info.pasid = pasid;
> +vtd_iommu_lock(s);
> +g_hash_table_foreach_remove(s->iotlb, vtd_hash_remove_by_page_pasid, 
> &info);
> +vtd_iommu_unlock(s);
> +vtd_iotlb_page_invalidate_notify(s, domain_id, addr, am, pasid);

Hmm, I think indeed we need a notification, but it'll be unnecessary for
e.g. vfio map notifiers, because this is 1st level invalidation and at least so
far vfio map notifiers are rewalking only the 2nd level page table, so it'll be
destined to be a no-op and pure overhead.

> +}
> +
> +static void vtd_iotlb_pasid_invalidate(IntelIOMMUState *s, uint16_t 
> domain_id,
> +   uint32_t pasid)
> +{
> +VTDIOTLBPageInvInfo info;
> +VTDAddressSpace *vtd_as;
> +VTDContextEntry ce;
> +
> +trace_vtd_inv_desc_iotlb_pasid(domain_id, pasid);
> +
> +info.domain_id = domain_id;
> +info.pasid = pasid;
> +vtd_iommu_lock(s);
> +g_hash_table_foreach_remove(s->iotlb, vtd_hash_remove_by_pasid, &info);
> +vtd_iommu_unlock(s);
> +
> +QLIST_FOREACH(vtd_as, &s->vtd_as_with_notifiers, next) {
> +if (!vtd_dev_to_context_entry(s, pci_bus_num(vtd_as->bus),
> +  vtd_as->devfn, &ce) &&
> +domain_id == vtd_get_domain_id(s, &ce, vtd_as->pasid) &&
> +pasid == vtd_as->pasid) {
> +vtd_sync_shadow_page_table(vtd_as);

Do we need to rewalk the shadow pgtable (which is the 2nd level, afaict) even
if we got the 1st level pgtable invalidated?

> +}
> +}
>  }

The rest looks mostly good to me; thanks.

-- 
Peter Xu




[PATCH 3/3] intel-iommu: PASID support

2022-01-04 Thread Jason Wang
This patch introduce ECAP_PASID via "x-pasid-mode". Based on the
existing support for scalable mode, we need to implement the following
missing parts:

1) tag VTDAddressSpace with PASID and support IOMMU/DMA translation
   with PASID
2) tag IOTLB with PASID
3) PASID cache and its flush

For simplicity, PASID cache is not implemented so we can simply
implement the PASID cache flush as a nop. This could be added in the
future.

Note that though PASID based IOMMU translation is ready but no device
can issue PASID DMA right now. In this case, PCI_NO_PASID is used as
PASID to identify the address w/ PASID. vtd_find_add_as() has been
extended to provision address space with PASID which could be utilized
by the future extension of PCI core to allow device model to use PASID
based DMA translation.

This feature would be useful for:

1) prototyping PASID support for devices like virtio
2) future vPASID work

Signed-off-by: Jason Wang 
---
 hw/i386/intel_iommu.c  | 355 +
 hw/i386/intel_iommu_internal.h |  14 +-
 hw/i386/trace-events   |   2 +
 include/hw/i386/intel_iommu.h  |   6 +-
 include/hw/pci/pci_bus.h   |   2 +
 5 files changed, 289 insertions(+), 90 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 58c682097b..04ae604ea1 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -64,6 +64,14 @@
 struct vtd_as_key {
 PCIBus *bus;
 uint8_t devfn;
+uint32_t pasid;
+};
+
+struct vtd_iotlb_key {
+uint16_t sid;
+uint32_t pasid;
+uint64_t gfn;
+uint32_t level;
 };
 
 static void vtd_address_space_refresh_all(IntelIOMMUState *s);
@@ -193,15 +201,36 @@ static inline gboolean 
vtd_as_has_map_notifier(VTDAddressSpace *as)
 }
 
 /* GHashTable functions */
-static gboolean vtd_uint64_equal(gconstpointer v1, gconstpointer v2)
+static gboolean vtd_iotlb_equal(gconstpointer v1, gconstpointer v2)
+{
+const struct vtd_iotlb_key *key1 = v1;
+const struct vtd_iotlb_key *key2 = v2;
+
+return key1->sid == key2->sid &&
+   key1->pasid == key2->pasid &&
+   key1->level == key2->level &&
+   key1->gfn == key2->gfn;
+}
+
+static guint vtd_iotlb_hash(gconstpointer v)
+{
+const struct vtd_iotlb_key *key = v;
+
+return key->gfn | ((key->sid) << VTD_IOTLB_SID_SHIFT) |
+   (key->level) << VTD_IOTLB_LVL_SHIFT |
+   (key->pasid) << VTD_IOTLB_PASID_SHIFT;
+}
+
+static gboolean vtd_as_equal(gconstpointer v1, gconstpointer v2)
 {
 const struct vtd_as_key *key1 = v1;
 const struct vtd_as_key *key2 = v2;
 
-return (key1->bus == key2->bus) && (key1->devfn == key2->devfn);
+return (key1->bus == key2->bus) && (key1->devfn == key2->devfn) &&
+   (key1->pasid == key2->pasid);
 }
 
-static guint vtd_uint64_hash(gconstpointer v)
+static guint vtd_as_hash(gconstpointer v)
 {
 const struct vtd_as_key *key = v;
 guint value = (guint)(uintptr_t)key->bus;
@@ -241,6 +270,29 @@ static gboolean vtd_hash_remove_by_page(gpointer key, 
gpointer value,
  (entry->gfn == gfn_tlb));
 }
 
+static gboolean vtd_hash_remove_by_page_pasid(gpointer key, gpointer value,
+  gpointer user_data)
+{
+VTDIOTLBEntry *entry = (VTDIOTLBEntry *)value;
+VTDIOTLBPageInvInfo *info = (VTDIOTLBPageInvInfo *)user_data;
+uint64_t gfn = (info->addr >> VTD_PAGE_SHIFT_4K) & info->mask;
+uint64_t gfn_tlb = (info->addr & entry->mask) >> VTD_PAGE_SHIFT_4K;
+return (entry->domain_id == info->domain_id) &&
+   (entry->pasid == info->pasid) &&
+(((entry->gfn & info->mask) == gfn) ||
+ (entry->gfn == gfn_tlb));
+}
+
+static gboolean vtd_hash_remove_by_pasid(gpointer key, gpointer value,
+ gpointer user_data)
+{
+VTDIOTLBEntry *entry = (VTDIOTLBEntry *)value;
+VTDIOTLBPageInvInfo *info = (VTDIOTLBPageInvInfo *)user_data;
+
+return (entry->domain_id == info->domain_id) &&
+   (entry->pasid == info->pasid);
+}
+
 /* Reset all the gen of VTDAddressSpace to zero and set the gen of
  * IntelIOMMUState to 1.  Must be called with IOMMU lock held.
  */
@@ -281,13 +333,6 @@ static void vtd_reset_caches(IntelIOMMUState *s)
 vtd_iommu_unlock(s);
 }
 
-static uint64_t vtd_get_iotlb_key(uint64_t gfn, uint16_t source_id,
-  uint32_t level)
-{
-return gfn | ((uint64_t)(source_id) << VTD_IOTLB_SID_SHIFT) |
-   ((uint64_t)(level) << VTD_IOTLB_LVL_SHIFT);
-}
-
 static uint64_t vtd_get_iotlb_gfn(hwaddr addr, uint32_t level)
 {
 return (addr & vtd_slpt_level_page_mask(level)) >> VTD_PAGE_SHIFT_4K;
@@ -295,15 +340,17 @@ static uint64_t vtd_get_iotlb_gfn(hwaddr addr, uint32_t 
level)
 
 /* Must be called with IOMMU lock held */
 static VTDIOTLBEntry *vtd_lookup_iotlb(IntelIOMMUState *s, uint16_t source_id,
-   hwaddr addr)
+   hwad