Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-04-16 Thread Auger Eric
Hi Jason,

On 4/16/21 4:34 PM, Jason Gunthorpe wrote:
> On Fri, Apr 16, 2021 at 04:26:19PM +0200, Auger Eric wrote:
> 
>> This was largely done at several conferences, including Plumbers and KVM
>> Forum, over several years. API docs were also shared on the ML. I don't
>> remember any objection being raised at the time.
> 
> I don't think anyone objects to the high level ideas, but
> implementation does matter. I don't think anyone presented "hey we
> will tunnel an uAPI through VFIO to the IOMMU subsystem" - did they?

At minimum
https://events19.linuxfoundation.cn/wp-content/uploads/2017/11/Shared-Virtual-Memory-in-KVM_Yi-Liu.pdf

But most obviously everything is documented in
Documentation/userspace-api/iommu.rst where the VFIO tunneling is
clearly stated ;-)

But well let's work together to design a better and more elegant
solution then.

Thanks

Eric
> 
> Look at the fairly simple IMS situation, for example. This was
> presented at plumbers too, and the slides were great - but the
> implementation was too hacky. It required a major rework of the x86
> interrupt handling before it was OK.
> 
> Jason
> 



Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-04-16 Thread Auger Eric
Hi,
On 4/16/21 4:05 PM, Jason Gunthorpe wrote:
> On Fri, Apr 16, 2021 at 03:38:02PM +0200, Auger Eric wrote:
> 
>> The redesign requirement came pretty late in the development process.
>> The iommu user API has been upstream for a while, and the VFIO interfaces
>> were submitted a long time ago and have been under review for quite some
>> time. Redesigning everything with a different API, undefined at this
>> point, is a major setback for our work and will have a large impact on
>> the introduction of features companies are looking forward to, hence our
>> frustration.
> 
> I will answer both you and Jacob at once.
> 
> This is uAPI, once it is set it can never be changed.
> 
> The kernel process and philosophy is to invest heavily in uAPI
> development and review to converge on the best uAPI possible.
> 
> Many past submissions have taken a long time to get this right, there
> are several high profile uAPI examples.
> 
> Do you think this case is so special, or the concerns so minor, that it
> should get to bypass all of the normal process?

It is not my intent to bypass any process. I am just trying to
understand what needs to be redesigned and for what use case.
> 
> Ask yourself, is anyone advocating for the current direction on
> technical merits alone?
> 
> Certainly the patches I last saw were completely disgusting from a
> uAPI design perspective.
> 
> It was against the development process to organize this work the way
> it was done. Merging a wack of dead code to the kernel to support a
> uAPI vision that was never clearly articulated was a big mistake.
> 
> Start from the beginning. Invest heavily in defining a high quality
> uAPI. Clearly describe the uAPI to all stakeholders.
This was largely done at several conferences, including Plumbers and KVM
Forum, over several years. API docs were also shared on the ML. I don't
remember any objection being raised at the time.

> Break up the
> implementation into patch series without dead code. Make the
> patches. Remove the dead code this group has already added.
> 
> None of this should be a surprise. The VDPA discussion and related
> "what is a mdev" over a year ago made it pretty clear VFIO is not the
> exclusive user of "IOMMU in userspace" and that places limits on what
> kind of uAPIs expansion it should experience going forward.
Maybe clear for you but most probably not for many other stakeholders.

Anyway I do not intend to further argue and I will be happy to learn
from you and work with you, Jacob, Liu and all other stakeholders to
define a better integration.

Thanks

Eric
> 
> Jason
> 



Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-04-16 Thread Auger Eric
Hi Jason,

On 4/16/21 1:07 AM, Jason Gunthorpe wrote:
> On Thu, Apr 15, 2021 at 03:11:19PM +0200, Auger Eric wrote:
>> Hi Jason,
>>
>> On 4/1/21 6:03 PM, Jason Gunthorpe wrote:
>>> On Thu, Apr 01, 2021 at 02:08:17PM +, Liu, Yi L wrote:
>>>
>>>> DMA page faults are delivered to root-complex via page request message and
>>>> it is per-device according to PCIe spec. Page request handling flow is:
>>>>
>>>> 1) iommu driver receives a page request from device
>>>> 2) iommu driver parses the page request message. Get the RID, PASID, faulted
>>>>    page and requested permissions etc.
>>>> 3) iommu driver triggers fault handler registered by device driver with
>>>>    iommu_report_device_fault()
>>>
>>> This seems confused.
>>>
>>> The PASID should define how to handle the page fault, not the driver.
>>
>> In my series I don't use PASID at all. I am just enabling nested stage
>> and the guest uses a single context. I don't allocate any user PASID at
>> any point.
>>
>> When there is a fault at physical level (a stage 1 fault that concerns
>> the guest), it needs to be reported and injected into the
>> guest. The vfio pci driver registers a fault handler with the iommu layer
>> and in that fault handler it fills a circ buffer and triggers an eventfd
>> that is listened to by the VFIO-PCI QEMU device. The latter retrieves
>> the fault from the mmapped circ buffer, knows which vIOMMU it is
>> attached to, and passes the fault to the vIOMMU.
>> Then the vIOMMU triggers an IRQ in the guest.
>>
>> We are reusing the existing concepts from VFIO, region, IRQ to do that.
>>
>> For that use case, would you also use /dev/ioasid?
> 
> /dev/ioasid could do all the things you described vfio-pci as doing,
> it can even do them the same way you just described.
> 
> Stated another way, do you plan to duplicate all of this code someday
> for vfio-cxl? What about for vfio-platform? ARM SMMU can be hooked to
> platform devices, right?
vfio regions and IRQ related APIs are common user interfaces exposed by
all vfio drivers, including platform. Then the actual circular buffer
implementation details can be put in a common lib.

As for the thin vfio iommu wrappers, the ones you don't like, they are
implemented in type1 code.

Maybe the need for /dev/ioasid is more pressing for PASID management, but
for the nested use case that is not obvious to me, and your various
replies did not make it crystal clear where that use case belongs.

The redesign requirement came pretty late in the development process.
The iommu user API has been upstream for a while, and the VFIO interfaces
were submitted a long time ago and have been under review for quite some
time. Redesigning everything with a different API, undefined at this
point, is a major setback for our work and will have a large impact on
the introduction of features companies are looking forward to, hence our
frustration.

Thanks

Eric


> 
> I feel what you guys are struggling with is some choice in the iommu
> kernel APIs that cause the events to be delivered to the pci_device
> owner, not the PASID owner.
> 
> That feels solvable.
> 
> Jason
> 



Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-04-15 Thread Auger Eric
Hi Jason,

On 4/1/21 6:03 PM, Jason Gunthorpe wrote:
> On Thu, Apr 01, 2021 at 02:08:17PM +, Liu, Yi L wrote:
> 
>> DMA page faults are delivered to root-complex via page request message and
>> it is per-device according to PCIe spec. Page request handling flow is:
>>
>> 1) iommu driver receives a page request from device
>> 2) iommu driver parses the page request message. Get the RID, PASID, faulted
>>    page and requested permissions etc.
>> 3) iommu driver triggers fault handler registered by device driver with
>>    iommu_report_device_fault()
> 
> This seems confused.
> 
> The PASID should define how to handle the page fault, not the driver.

In my series I don't use PASID at all. I am just enabling nested stage
and the guest uses a single context. I don't allocate any user PASID at
any point.

When there is a fault at physical level (a stage 1 fault that concerns
the guest), it needs to be reported and injected into the
guest. The vfio pci driver registers a fault handler with the iommu layer
and in that fault handler it fills a circ buffer and triggers an eventfd
that is listened to by the VFIO-PCI QEMU device. The latter retrieves
the fault from the mmapped circ buffer, knows which vIOMMU it is
attached to, and passes the fault to the vIOMMU.
Then the vIOMMU triggers an IRQ in the guest.

We are reusing the existing concepts from VFIO, region, IRQ to do that.
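
As a rough illustration of the flow described above (not the code from the
series), the sketch below models the circular buffer as a fixed-size ring:
the fault handler is the producer and advances head, userspace is the
consumer and advances tail after the eventfd fires. The names fault_ring,
fault_record, ring_push and ring_pop, the record layout and the ring size
are all hypothetical; the eventfd signalling and the mmap of the region are
left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_ENTRIES 4u /* hypothetical size, must be a power of two */

/* Hypothetical fault record; the real layout is defined by the series. */
struct fault_record {
	uint64_t addr;  /* faulting address */
	uint32_t flags; /* requested permissions */
};

struct fault_ring {
	uint32_t head; /* next slot the producer (fault handler) writes */
	uint32_t tail; /* next slot the consumer (userspace) reads */
	struct fault_record slot[RING_ENTRIES];
};

/* Producer side: returns false when the ring is full and the record is dropped. */
static bool ring_push(struct fault_ring *r, struct fault_record rec)
{
	assert(r);
	if (r->head - r->tail == RING_ENTRIES)
		return false;
	r->slot[r->head & (RING_ENTRIES - 1)] = rec;
	r->head++;
	return true;
}

/* Consumer side: returns false when there is nothing left to read. */
static bool ring_pop(struct fault_ring *r, struct fault_record *out)
{
	assert(r && out);
	if (r->head == r->tail)
		return false;
	*out = r->slot[r->tail & (RING_ENTRIES - 1)];
	r->tail++;
	return true;
}
```

Free-running unsigned indices keep the full and empty tests unambiguous
without sacrificing a slot; a real shared-memory ring would additionally
need memory barriers between writing a record and publishing head.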

For that use case, would you also use /dev/ioasid?

Thanks

Eric
> 
> I don't remember any device specific actions in ATS, so what is the
> driver supposed to do?
> 
>> 4) device driver's fault handler signals an event FD to notify userspace to
>>    fetch the information about the page fault. If it's the VM case, inject
>>    the page fault into the VM and let the guest resolve it.
> 
> If the PASID is set to 'report page fault to userspace' then some
> event should come out of /dev/ioasid, or be reported to a linked
> eventfd, or whatever.
> 
> If the PASID is set to 'SVM' then the fault should be passed to
> handle_mm_fault
> 
> And so on.
> 
> Userspace chooses what happens based on how they configure the PASID
> through /dev/ioasid.
> 
> Why would a device driver get involved here?
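
A minimal sketch of the model argued for above, in which the fault routing
is a property of the PASID as configured by userspace and no device driver
is consulted. Everything here is hypothetical: the enum names, the table
layout and the function names are made up for illustration, and
ACTION_HANDLE_MM_FAULT merely stands in for a call to handle_mm_fault().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-PASID fault policies, named after the two cases above. */
enum pasid_fault_mode {
	PASID_FAULT_REPORT_USER, /* queue an event for userspace */
	PASID_FAULT_SVM,         /* resolve against the bound mm */
};

enum fault_action {
	ACTION_QUEUE_EVENT,     /* e.g. signal a linked eventfd */
	ACTION_HANDLE_MM_FAULT, /* stands in for handle_mm_fault() */
	ACTION_REJECT,          /* PASID was never configured */
};

#define MAX_PASID 8u /* hypothetical table size */

struct pasid_table {
	int configured[MAX_PASID]; /* non-zero once userspace set the PASID up */
	enum pasid_fault_mode mode[MAX_PASID];
};

/* Configuration step driven by userspace. */
static void pasid_configure(struct pasid_table *t, uint32_t pasid,
			    enum pasid_fault_mode mode)
{
	assert(t && pasid < MAX_PASID);
	t->configured[pasid] = 1;
	t->mode[pasid] = mode;
}

/* The routing decision depends only on the PASID, not on a device driver. */
static enum fault_action route_fault(const struct pasid_table *t, uint32_t pasid)
{
	assert(t);
	if (pasid >= MAX_PASID || !t->configured[pasid])
		return ACTION_REJECT;
	return t->mode[pasid] == PASID_FAULT_SVM ? ACTION_HANDLE_MM_FAULT
						 : ACTION_QUEUE_EVENT;
}
```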
> 
>> Eric has sent below series for the page fault reporting for VM with passthru
>> device.
>> https://lore.kernel.org/kvm/20210223210625.604517-5-eric.au...@redhat.com/
> 
> It certainly should not be in vfio pci. Everything using a PASID needs
> this infrastructure, VDPA, mdev, PCI, CXL, etc.
> 
> Jason
> 



Re: [PATCH v12 01/13] vfio: VFIO_IOMMU_SET_PASID_TABLE

2021-04-11 Thread Auger Eric
Hi Zenghui,

On 4/7/21 11:33 AM, Zenghui Yu wrote:
> Hi Eric,
> 
> On 2021/2/24 5:06, Eric Auger wrote:
>> +/*
>> + * VFIO_IOMMU_SET_PASID_TABLE - _IOWR(VFIO_TYPE, VFIO_BASE + 18,
>> + *    struct vfio_iommu_type1_set_pasid_table)
>> + *
>> + * The SET operation passes a PASID table to the host while the
>> + * UNSET operation detaches the one currently programmed. Setting
>> + * a table while another is already programmed replaces the old table.
> 
> It looks to me that this description doesn't match the IOMMU part.

Yep that's misleading.

I replaced it with:

 It is allowed to "SET" the table several times without un-setting as
 long as the table config does not stay IOMMU_PASID_CONFIG_TRANSLATE.
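
The rule can be sketched as a tiny state machine. This is an illustration
under assumptions, not the driver code: the enum values, struct s1_state
and set_pasid_table() are hypothetical names, the -EINVAL return value is
assumed, and so is the idea that a BYPASS or ABORT config clears the
"table set" flag.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the three table configs. */
enum table_config { CFG_TRANSLATE, CFG_BYPASS, CFG_ABORT };

struct s1_state {
	bool set; /* true while a TRANSLATE table is programmed */
};

/*
 * Repeated SETs are accepted unless both the current and the new config
 * are TRANSLATE, i.e. an S1 <-> S1 transition.
 */
static int set_pasid_table(struct s1_state *s, enum table_config cfg)
{
	assert(s);
	if (cfg == CFG_TRANSLATE) {
		if (s->set)
			return -EINVAL; /* assumed error code */
		s->set = true;
		return 0;
	}
	s->set = false; /* assumed: BYPASS and ABORT drop the table */
	return 0;
}
```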

> 
> [v14,05/13] iommu/smmuv3: Implement attach/detach_pasid_table
> 
> |    case IOMMU_PASID_CONFIG_TRANSLATE:
> |        /* we do not support S1 <-> S1 transitions */
> |        if (smmu_domain->s1_cfg.set)
> |            goto out;
> 
> Maybe I've misread something?
> 
> 
> Thanks,
> Zenghui
> 

Thanks

Eric



Re: [PATCH v14 08/13] dma-iommu: Implement NESTED_MSI cookie

2021-04-10 Thread Auger Eric
Hi Zenghui,

On 4/7/21 9:39 AM, Zenghui Yu wrote:
> Hi Eric,
> 
> On 2021/2/24 4:56, Eric Auger wrote:
>> Up to now, when the type was UNMANAGED, we used to
>> allocate IOVA pages within a reserved IOVA MSI range.
>>
>> If both the host and the guest are exposed with SMMUs, each
>> would allocate an IOVA. The guest allocates an IOVA (gIOVA)
>> to map onto the guest MSI doorbell (gDB). The Host allocates
>> another IOVA (hIOVA) to map onto the physical doorbell (hDB).
>>
>> So we end up with 2 unrelated mappings, at S1 and S2:
>>          S1            S2
>> gIOVA    ->    gDB
>>              hIOVA    ->    hDB
>>
>> The PCI device would be programmed with hIOVA.
>> No stage 1 mapping would exist, causing the MSIs to fault.
>>
>> iommu_dma_bind_guest_msi() allows passing gIOVA/gDB
>> to the host so that gIOVA can be used by the host instead of
>> re-allocating a new hIOVA.
>>
>>          S1            S2
>> gIOVA    ->    gDB    ->    hDB
>>
>> this time, the PCI device can be programmed with the gIOVA MSI
>> doorbell which is correctly mapped through both stages.
>>
>> Nested mode is not compatible with HW MSI regions as in that
>> case gDB and hDB should have a 1-1 mapping. This check will
>> be done when attaching each device to the IOMMU domain.
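
The two diagrams above can be checked with a toy model of a nested walk,
where each stage is a table of single-page mappings and a missing entry
models a fault. This is only a sketch: struct mapping, translate() and
nested_translate() are hypothetical names and bear no relation to the
actual SMMU page table code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct mapping {
	uint64_t in;  /* input address of the stage */
	uint64_t out; /* output address of the stage */
};

/* Look up addr in one stage; returning false models a translation fault. */
static bool translate(const struct mapping *tbl, size_t n,
		      uint64_t addr, uint64_t *out)
{
	assert(tbl && out);
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].in == addr) {
			*out = tbl[i].out;
			return true;
		}
	}
	return false;
}

/* Nested walk: stage 1 (guest owned) first, then stage 2 (host owned). */
static bool nested_translate(const struct mapping *s1, size_t n1,
			     const struct mapping *s2, size_t n2,
			     uint64_t iova, uint64_t *pa)
{
	uint64_t ipa;

	return translate(s1, n1, iova, &ipa) && translate(s2, n2, ipa, pa);
}
```

With S1 = { gIOVA -> gDB } and S2 = { hIOVA -> hDB }, a device programmed
with hIOVA faults at stage 1. Once S2 is { gDB -> hDB } instead, a device
programmed with gIOVA resolves to hDB through both stages.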
>>
>> Signed-off-by: Eric Auger 
> 
> [...]
> 
>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>> index f659395e7959..d25eb7cecaa7 100644
>> --- a/drivers/iommu/dma-iommu.c
>> +++ b/drivers/iommu/dma-iommu.c
>> @@ -19,6 +19,7 @@
>>   #include 
>>   #include 
>>   #include 
>> +#include 
> 
> Duplicated include.
sure
> 
>>   #include 
>>   #include 
>>   #include 
>> @@ -29,12 +30,15 @@
>>   struct iommu_dma_msi_page {
>>   struct list_head    list;
>>   dma_addr_t    iova;
>> +    dma_addr_t    gpa;
>>   phys_addr_t    phys;
>> +    size_t    s1_granule;
>>   };
>>     enum iommu_dma_cookie_type {
>>   IOMMU_DMA_IOVA_COOKIE,
>>   IOMMU_DMA_MSI_COOKIE,
>> +    IOMMU_DMA_NESTED_MSI_COOKIE,
>>   };
>>     struct iommu_dma_cookie {
>> @@ -46,6 +50,7 @@ struct iommu_dma_cookie {
>>   dma_addr_t    msi_iova;
> 
> msi_iova is unused in the nested mode, but we still set it to the start
> address of the RESV_SW_MSI region (in iommu_get_msi_cookie()), which
> looks a bit strange to me.
I agree with you
> 
>>   };
>>   struct list_head    msi_page_list;
>> +    spinlock_t    msi_lock;
> 
> Should msi_lock be grabbed everywhere msi_page_list is populated?
> Especially in iommu_dma_get_msi_page(), which can be invoked from the
> irqchip driver.
Yes I agree
> 
>>     /* Domain for flush queue callback; NULL if flush queue not in
>> use */
>>   struct iommu_domain    *fq_domain;
>> @@ -87,6 +92,7 @@ static struct iommu_dma_cookie *cookie_alloc(enum
>> iommu_dma_cookie_type type)
>>     cookie = kzalloc(sizeof(*cookie), GFP_KERNEL);
>>   if (cookie) {
>> +    spin_lock_init(&cookie->msi_lock);
>>   INIT_LIST_HEAD(&cookie->msi_page_list);
>>   cookie->type = type;
>>   }
>> @@ -120,14 +126,17 @@ EXPORT_SYMBOL(iommu_get_dma_cookie);
>>    *
>>    * Users who manage their own IOVA allocation and do not want DMA
>> API support,
>>    * but would still like to take advantage of automatic MSI
>> remapping, can use
>> - * this to initialise their own domain appropriately. Users should
>> reserve a
>> + * this to initialise their own domain appropriately. Users may
>> reserve a
>>    * contiguous IOVA region, starting at @base, large enough to
>> accommodate the
>>    * number of PAGE_SIZE mappings necessary to cover every MSI
>> doorbell address
>> - * used by the devices attached to @domain.
>> + * used by the devices attached to @domain. The other way round is to
>> provide
>> + * usable iova pages through the iommu_dma_bind_doorbell API (nested
>> stages
> 
> s/iommu_dma_bind_doorbell/iommu_dma_bind_guest_msi/ ?
correct
> 
>> + * use case)
>>    */
>>   int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base)
>>   {
>>   struct iommu_dma_cookie *cookie;
>> +    int nesting, ret;
>>     if (domain->type != IOMMU_DOMAIN_UNMANAGED)
>>   return -EINVAL;
>> @@ -135,7 +144,12 @@ int iommu_get_msi_cookie(struct iommu_domain
>> *domain, dma_addr_t base)
>>   if (domain->iova_cookie)
>>   return -EEXIST;
>>   -    cookie = cookie_alloc(IOMMU_DMA_MSI_COOKIE);
>> +    ret =  iommu_domain_get_attr(domain, DOMAIN_ATTR_NESTING, &nesting);
> 
> Redundant space.
yep
> 
>> +    if (!ret && nesting)
>> +    cookie = cookie_alloc(IOMMU_DMA_NESTED_MSI_COOKIE);
>> +    else
>> +    cookie = cookie_alloc(IOMMU_DMA_MSI_COOKIE);
>> +
>>   if (!cookie)
>>   return -ENOMEM;
>>   @@ -156,6 +170,7 @@ void iommu_put_dma_cookie(struct iommu_domain
>> *domain)
>>   {
>>   struct iommu_dma_cookie *cookie = domain->iova_cookie;
>>   struct iommu_dma_msi_page *msi, *tmp;
>> +    bool s2_unmap = false;
>>     if (!cookie)

Re: [PATCH v14 06/13] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs

2021-04-09 Thread Auger Eric
Hi Kunkun,

On 4/9/21 6:48 AM, Kunkun Jiang wrote:
> Hi Eric,
> 
> On 2021/4/8 20:30, Auger Eric wrote:
>> Hi Kunkun,
>>
>> On 4/1/21 2:37 PM, Kunkun Jiang wrote:
>>> Hi Eric,
>>>
>>> On 2021/2/24 4:56, Eric Auger wrote:
>>>> With nested stage support, soon we will need to invalidate
>>>> S1 contexts and ranges tagged with an unmanaged asid, this
>>>> latter being managed by the guest. So let's introduce 2 helpers
>>>> that allow to invalidate with externally managed ASIDs
>>>>
>>>> Signed-off-by: Eric Auger 
>>>>
>>>> ---
>>>>
>>>> v13 -> v14
>>>> - Actually send the NH_ASID command (reported by Xingang Wang)
>>>> ---
>>>>    drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 38
>>>> -
>>>>    1 file changed, 29 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>> index 5579ec4fccc8..4c19a1114de4 100644
>>>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>> @@ -1843,9 +1843,9 @@ int arm_smmu_atc_inv_domain(struct
>>>> arm_smmu_domain *smmu_domain, int ssid,
>>>>    }
>>>>      /* IO_PGTABLE API */
>>>> -static void arm_smmu_tlb_inv_context(void *cookie)
>>>> +static void __arm_smmu_tlb_inv_context(struct arm_smmu_domain
>>>> *smmu_domain,
>>>> +   int ext_asid)
>>>>    {
>>>> -    struct arm_smmu_domain *smmu_domain = cookie;
>>>>    struct arm_smmu_device *smmu = smmu_domain->smmu;
>>>>    struct arm_smmu_cmdq_ent cmd;
>>>>    @@ -1856,7 +1856,13 @@ static void arm_smmu_tlb_inv_context(void
>>>> *cookie)
>>>>     * insertion to guarantee those are observed before the TLBI.
>>>> Do be
>>>>     * careful, 007.
>>>>     */
>>>> -    if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
>>>> +    if (ext_asid >= 0) { /* guest stage 1 invalidation */
>>>> +    cmd.opcode    = CMDQ_OP_TLBI_NH_ASID;
>>>> +    cmd.tlbi.asid    = ext_asid;
>>>> +    cmd.tlbi.vmid    = smmu_domain->s2_cfg.vmid;
>>>> +    arm_smmu_cmdq_issue_cmd(smmu, &cmd);
>>>> +    arm_smmu_cmdq_issue_sync(smmu);
>>>> +    } else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
>>>>    arm_smmu_tlb_inv_asid(smmu, smmu_domain->s1_cfg.cd.asid);
>>>>    } else {
>>>>    cmd.opcode    = CMDQ_OP_TLBI_S12_VMALL;
>>>> @@ -1867,6 +1873,13 @@ static void arm_smmu_tlb_inv_context(void
>>>> *cookie)
>>>>    arm_smmu_atc_inv_domain(smmu_domain, 0, 0, 0);
>>>>    }
>>>>    +static void arm_smmu_tlb_inv_context(void *cookie)
>>>> +{
>>>> +    struct arm_smmu_domain *smmu_domain = cookie;
>>>> +
>>>> +    __arm_smmu_tlb_inv_context(smmu_domain, -1);
>>>> +}
>>>> +
>>>>    static void __arm_smmu_tlb_inv_range(struct arm_smmu_cmdq_ent *cmd,
>>>>     unsigned long iova, size_t size,
>>>>     size_t granule,
>>>> @@ -1926,9 +1939,10 @@ static void __arm_smmu_tlb_inv_range(struct
>>>> arm_smmu_cmdq_ent *cmd,
>>>>    arm_smmu_cmdq_batch_submit(smmu, );
>>>>    }
>>>>    
>>> Here is the part of code in __arm_smmu_tlb_inv_range():
>>>>  if (smmu->features & ARM_SMMU_FEAT_RANGE_INV) {
>>>>  /* Get the leaf page size */
>>>>  tg = __ffs(smmu_domain->domain.pgsize_bitmap);
>>>>
>>>>  /* Convert page size of 12,14,16 (log2) to 1,2,3 */
>>>>  cmd->tlbi.tg = (tg - 10) / 2;
>>>>
>>>>  /* Determine what level the granule is at */
>>>>  cmd->tlbi.ttl = 4 - ((ilog2(granule) - 3) / (tg - 3));
>>>>
>>>>  num_pages = size >> tg;
>>>>  }
>>> When pSMMU supports RIL, we get the leaf page size by
>>> __ffs(smmu_domain->
>>> domain.pgsize_bitmap). In nested mode, it is determined by host
>>> PAGE_SIZE. If
>>> the host kernel and guest kernel has different translation

Re: [PATCH v14 06/13] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs

2021-04-08 Thread Auger Eric
Hi Kunkun,

On 4/1/21 2:37 PM, Kunkun Jiang wrote:
> Hi Eric,
> 
> On 2021/2/24 4:56, Eric Auger wrote:
>> With nested stage support, soon we will need to invalidate
>> S1 contexts and ranges tagged with an unmanaged asid, this
>> latter being managed by the guest. So let's introduce 2 helpers
>> that allow to invalidate with externally managed ASIDs
>>
>> Signed-off-by: Eric Auger 
>>
>> ---
>>
>> v13 -> v14
>> - Actually send the NH_ASID command (reported by Xingang Wang)
>> ---
>>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 38 -
>>   1 file changed, 29 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> index 5579ec4fccc8..4c19a1114de4 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> @@ -1843,9 +1843,9 @@ int arm_smmu_atc_inv_domain(struct
>> arm_smmu_domain *smmu_domain, int ssid,
>>   }
>>     /* IO_PGTABLE API */
>> -static void arm_smmu_tlb_inv_context(void *cookie)
>> +static void __arm_smmu_tlb_inv_context(struct arm_smmu_domain
>> *smmu_domain,
>> +   int ext_asid)
>>   {
>> -    struct arm_smmu_domain *smmu_domain = cookie;
>>   struct arm_smmu_device *smmu = smmu_domain->smmu;
>>   struct arm_smmu_cmdq_ent cmd;
>>   @@ -1856,7 +1856,13 @@ static void arm_smmu_tlb_inv_context(void
>> *cookie)
>>    * insertion to guarantee those are observed before the TLBI. Do be
>>    * careful, 007.
>>    */
>> -    if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
>> +    if (ext_asid >= 0) { /* guest stage 1 invalidation */
>> +    cmd.opcode    = CMDQ_OP_TLBI_NH_ASID;
>> +    cmd.tlbi.asid    = ext_asid;
>> +    cmd.tlbi.vmid    = smmu_domain->s2_cfg.vmid;
>> +    arm_smmu_cmdq_issue_cmd(smmu, &cmd);
>> +    arm_smmu_cmdq_issue_sync(smmu);
>> +    } else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
>>   arm_smmu_tlb_inv_asid(smmu, smmu_domain->s1_cfg.cd.asid);
>>   } else {
>>   cmd.opcode    = CMDQ_OP_TLBI_S12_VMALL;
>> @@ -1867,6 +1873,13 @@ static void arm_smmu_tlb_inv_context(void *cookie)
>>   arm_smmu_atc_inv_domain(smmu_domain, 0, 0, 0);
>>   }
>>   +static void arm_smmu_tlb_inv_context(void *cookie)
>> +{
>> +    struct arm_smmu_domain *smmu_domain = cookie;
>> +
>> +    __arm_smmu_tlb_inv_context(smmu_domain, -1);
>> +}
>> +
>>   static void __arm_smmu_tlb_inv_range(struct arm_smmu_cmdq_ent *cmd,
>>    unsigned long iova, size_t size,
>>    size_t granule,
>> @@ -1926,9 +1939,10 @@ static void __arm_smmu_tlb_inv_range(struct
>> arm_smmu_cmdq_ent *cmd,
>>   arm_smmu_cmdq_batch_submit(smmu, );
>>   }
>>   
> Here is the part of code in __arm_smmu_tlb_inv_range():
>>     if (smmu->features & ARM_SMMU_FEAT_RANGE_INV) {
>>     /* Get the leaf page size */
>>     tg = __ffs(smmu_domain->domain.pgsize_bitmap);
>>
>>     /* Convert page size of 12,14,16 (log2) to 1,2,3 */
>>     cmd->tlbi.tg = (tg - 10) / 2;
>>
>>     /* Determine what level the granule is at */
>>     cmd->tlbi.ttl = 4 - ((ilog2(granule) - 3) / (tg - 3));
>>
>>     num_pages = size >> tg;
>>     }
> When pSMMU supports RIL, we get the leaf page size by
> __ffs(smmu_domain->domain.pgsize_bitmap). In nested mode, it is determined
> by the host PAGE_SIZE. If the host kernel and guest kernel have different
> translation granules (e.g. host 16K, guest 4K), __arm_smmu_tlb_inv_range()
> will issue an incorrect tlbi command.
> 
> Do you have any idea about this issue?

I think this is the same issue as the one reported by Chenxiang

https://lore.kernel.org/lkml/15938ed5-2095-e903-a290-333c29901...@hisilicon.com/

In case RIL is not supported by the host, next version will use the
smallest pSMMU supported page size, as done in __arm_smmu_tlb_inv_range
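
The granule mismatch can be reproduced with plain arithmetic. The sketch
below replays the computation from the quoted snippet in userspace;
lowest_bit() and log2_floor() are hypothetical stand-ins for the kernel's
__ffs() and ilog2(), and struct tlbi_fields and range_fields() are made up
for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the lowest set bit, like the kernel's __ffs(); x must be non-zero. */
static unsigned int lowest_bit(uint64_t x)
{
	unsigned int n = 0;

	assert(x != 0);
	while (!(x & 1)) {
		x >>= 1;
		n++;
	}
	return n;
}

/* floor(log2(x)), like the kernel's ilog2(); x must be non-zero. */
static unsigned int log2_floor(uint64_t x)
{
	unsigned int n = 0;

	assert(x != 0);
	while ((x >>= 1) != 0)
		n++;
	return n;
}

struct tlbi_fields {
	unsigned int tg;    /* encoded translation granule */
	unsigned int ttl;   /* level the invalidated granule sits at */
	uint64_t num_pages; /* range length in leaf pages */
};

/* Same arithmetic as the snippet quoted from __arm_smmu_tlb_inv_range(). */
static struct tlbi_fields range_fields(uint64_t pgsize_bitmap,
				       uint64_t granule, uint64_t size)
{
	unsigned int tg = lowest_bit(pgsize_bitmap);
	struct tlbi_fields f;

	f.tg = (tg - 10) / 2;
	f.ttl = 4 - ((log2_floor(granule) - 3) / (tg - 3));
	f.num_pages = size >> tg;
	return f;
}
```

With a 4K leaf size and a 4K granule, a 64K range gives tg = 1, ttl = 3 and
16 pages. With a 16K host leaf size and a 4K guest granule, a 4K range
gives tg = 2, ttl = 4 and 0 pages: the level is past the leaf level 3 and
the range is empty, which matches the incorrect command reported above.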

Thanks

Eric

> 
> Best Regards,
> Kunkun Jiang
>> -static void arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t
>> size,
>> -  size_t granule, bool leaf,
>> -  struct arm_smmu_domain *smmu_domain)
>> +static void
>> +arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t size,
>> +  size_t granule, bool leaf, int ext_asid,
>> +  struct arm_smmu_domain *smmu_domain)
>>   {
>>   struct arm_smmu_cmdq_ent cmd = {
>>   .tlbi = {
>> @@ -1936,7 +1950,12 @@ static void
>> arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t size,
>>   },
>>   };
>>   -    if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
>> +    if (ext_asid >= 0) {  /* guest stage 1 invalidation */
>> +    cmd.opcode    = smmu_domain->smmu->features &
>> ARM_SMMU_FEAT_E2H ?
>> +  CMDQ_OP_TLBI_EL2_VA : CMDQ_OP_TLBI_NH_VA;
>> +    cmd.tlbi.asid    = ext_asid;
>> +    cmd.tlbi.vmid    = smmu_domain->s2_cfg.vmid;
>> 

Re: [PATCH v6 9/9] KVM: selftests: aarch64/vgic-v3 init sequence tests

2021-04-07 Thread Auger Eric
Hi Drew,

On 4/6/21 5:09 PM, Andrew Jones wrote:
> 
> Hi Eric,
> 
> It looks like Marc already picked this patch up, but, FWIW, here's
> a few more comments you may consider.

I will send a fixup patch on top of the one taken by Marc. A few comments
below.
> 
> On Mon, Apr 05, 2021 at 06:39:41PM +0200, Eric Auger wrote:
>> The tests exercise the VGIC_V3 device creation including the
>> associated KVM_DEV_ARM_VGIC_GRP_ADDR group attributes:
>>
>> - KVM_VGIC_V3_ADDR_TYPE_DIST/REDIST
>> - KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION
>>
>> Some other tests are dedicated to the KVM_DEV_ARM_VGIC_GRP_REDIST_REGS group
>> and especially the GICR_TYPER read. The goal was to test the case
>> recently fixed by commit 23bde34771f1
>> ("KVM: arm64: vgic-v3: Drop the reporting of GICR_TYPER.Last for userspace").
>>
>> The API under test can be found at
>> Documentation/virt/kvm/devices/arm-vgic-v3.rst
>>
>> Signed-off-by: Eric Auger 
>>
>> ---
>>
>> v4 -> v5:
>> - simplify the last bit tests given the simpler interpretation
>>   of the spec
>>
>> v3 -> v4:
>> - update .gitignore
>> - More vgic-mmio-v3.c change into the previous patch
>> - rename fuzz_dist_rdist into test_dist_rdist
>> - cleanup in run_vcpu and guest_code
>> - max_ipa_bits is global
>> - s/fuzz/subtest
>> - added test_kvm_device,
>> - moved ucall_init() just before the cpu run
>> - use vm_create_default_with_vcpus
>> - use vm_gic struct, vm_gic_create, vm_gic_destroy
>> - rewrite util.c helpers to comply with the usual style
>> ---
>>  tools/testing/selftests/kvm/.gitignore|   1 +
>>  tools/testing/selftests/kvm/Makefile  |   1 +
>>  .../testing/selftests/kvm/aarch64/vgic_init.c | 585 ++
>>  .../testing/selftests/kvm/include/kvm_util.h  |   9 +
>>  tools/testing/selftests/kvm/lib/kvm_util.c|  77 +++
>>  5 files changed, 673 insertions(+)
>>  create mode 100644 tools/testing/selftests/kvm/aarch64/vgic_init.c
>>
>> diff --git a/tools/testing/selftests/kvm/.gitignore 
>> b/tools/testing/selftests/kvm/.gitignore
>> index 7bd7e776c266..bb862f91f640 100644
>> --- a/tools/testing/selftests/kvm/.gitignore
>> +++ b/tools/testing/selftests/kvm/.gitignore
>> @@ -1,6 +1,7 @@
>>  # SPDX-License-Identifier: GPL-2.0-only
>>  /aarch64/get-reg-list
>>  /aarch64/get-reg-list-sve
>> +/aarch64/vgic_init
>>  /s390x/memop
>>  /s390x/resets
>>  /s390x/sync_regs_test
>> diff --git a/tools/testing/selftests/kvm/Makefile 
>> b/tools/testing/selftests/kvm/Makefile
>> index 67eebb53235f..2fd4801de9ca 100644
>> --- a/tools/testing/selftests/kvm/Makefile
>> +++ b/tools/testing/selftests/kvm/Makefile
>> @@ -78,6 +78,7 @@ TEST_GEN_PROGS_x86_64 += steal_time
>>  
>>  TEST_GEN_PROGS_aarch64 += aarch64/get-reg-list
>>  TEST_GEN_PROGS_aarch64 += aarch64/get-reg-list-sve
>> +TEST_GEN_PROGS_aarch64 += aarch64/vgic_init
>>  TEST_GEN_PROGS_aarch64 += demand_paging_test
>>  TEST_GEN_PROGS_aarch64 += dirty_log_test
>>  TEST_GEN_PROGS_aarch64 += dirty_log_perf_test
>> diff --git a/tools/testing/selftests/kvm/aarch64/vgic_init.c 
>> b/tools/testing/selftests/kvm/aarch64/vgic_init.c
>> new file mode 100644
>> index ..be1a7c0d0527
>> --- /dev/null
>> +++ b/tools/testing/selftests/kvm/aarch64/vgic_init.c
>> @@ -0,0 +1,585 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * vgic init sequence tests
>> + *
>> + * Copyright (C) 2020, Red Hat, Inc.
>> + */
>> +#define _GNU_SOURCE
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +
>> +#include "test_util.h"
>> +#include "kvm_util.h"
>> +#include "processor.h"
>> +
>> +#define NR_VCPUS 4
>> +
>> +#define REDIST_REGION_ATTR_ADDR(count, base, flags, index) 
>> (((uint64_t)(count) << 52) | \
>> +((uint64_t)((base) >> 16) << 16) | ((uint64_t)(flags) << 12) | index)
>> +#define REG_OFFSET(vcpu, offset) (((uint64_t)vcpu << 32) | offset)
>> +
>> +#define GICR_TYPER 0x8
>> +
>> +struct vm_gic {
>> +struct kvm_vm *vm;
>> +int gic_fd;
>> +};
>> +
>> +int max_ipa_bits;
> 
> static
done
> 
>> +
>> +/* helper to access a redistributor register */
>> +static int access_redist_reg(int gicv3_fd, int vcpu, int offset,
>> + uint32_t *val, bool write)
>> +{
>> +uint64_t attr = REG_OFFSET(vcpu, offset);
>> +
>> +return _kvm_device_access(gicv3_fd, KVM_DEV_ARM_VGIC_GRP_REDIST_REGS,
>> +  attr, val, write);
>> +}
>> +
>> +/* dummy guest code */
>> +static void guest_code(void)
>> +{
>> +GUEST_SYNC(0);
>> +GUEST_SYNC(1);
>> +GUEST_SYNC(2);
>> +GUEST_DONE();
>> +}
>> +
>> +/* we don't want to assert on run execution, hence that helper */
>> +static int run_vcpu(struct kvm_vm *vm, uint32_t vcpuid)
>> +{
>> +int ret;
>> +
>> +vcpu_args_set(vm, vcpuid, 1);
> 
> You don't need the above vcpu_args_set call since guest_code doesn't take
> any arguments.
ok
> 
>> +ret = _vcpu_ioctl(vm, vcpuid, KVM_RUN, NULL);
>> +get_ucall(vm, vcpuid, NULL);
> 
> You're not checking the result of get_ucall, so there's no need for the
> call.

Re: [PATCH v5 0/8] KVM/ARM: Some vgic fixes and init sequence KVM selftests

2021-04-05 Thread Auger Eric
Hi Marc,

On 4/5/21 12:12 PM, Marc Zyngier wrote:
> Hi Eric,
> 
> On Sun, 04 Apr 2021 18:22:35 +0100,
> Eric Auger  wrote:
>>
>> While writing vgic v3 init sequence KVM selftests I noticed some
>> relatively minor issues. This was also the opportunity to try to
>> fix the issue recently reported by Zenghui, related to the RDIST_TYPER
>> last bit emulation. The final patch is a first batch of VGIC init
>> sequence selftests. Of course they can be augmented with a lot more
>> register access tests, but let's try to move forward incrementally ...
>>
>> Best Regards
>>
>> Eric
>>
>> This series can be found at:
>> https://github.com/eauger/linux/tree/vgic_kvmselftests_v5
>>
>> History:
>> v4 -> v5:
>> - rewrite the last bit detection according to Marc's
>>   interpretation of the spec and modify the kvm selftests
>>   accordingly
> 
> Have you dropped v4's patch #1? It did seem to fix an actual issue,
> didn't it?

Hum no that was not my intent :-( Resending ...

Eric
> 
> Thanks,
> 
>   M.
> 



Re: [PATCH v5 7/8] KVM: arm64: vgic-v3: Expose GICR_TYPER.Last for userspace

2021-04-05 Thread Auger Eric
Hi Marc,

On 4/5/21 12:10 PM, Marc Zyngier wrote:
> On Sun, 04 Apr 2021 18:22:42 +0100,
> Eric Auger  wrote:
>>
>> Commit 23bde34771f1 ("KVM: arm64: vgic-v3: Drop the
>> reporting of GICR_TYPER.Last for userspace") temporarily fixed
>> a bug identified when attempting to access the GICR_TYPER
>> register before the redistributor region setting, but dropped
>> the support of the LAST bit.
>>
>> Emulating the GICR_TYPER.Last bit still makes sense for
>> architecture compliance though. This patch restores its support
>> (if the redistributor region was set) while keeping the code safe.
>>
>> We introduce a new helper, vgic_mmio_vcpu_rdist_is_last() which
>> computes whether a redistributor is the highest one of a series
>> of contiguous redistributor pages.
>>
>> With this new implementation we do not need to have a uaccess
>> read accessor anymore.
>>
>> Signed-off-by: Eric Auger 
>>
>> ---
>>
>> v4 -> v5:
>> - redist region list now is sorted by @base
>> - change the implementation according to Marc's understanding of
>>   the spec
>> ---
>>  arch/arm64/kvm/vgic/vgic-mmio-v3.c | 58 +-
>>  include/kvm/arm_vgic.h |  1 +
>>  2 files changed, 34 insertions(+), 25 deletions(-)
>>
>> diff --git a/arch/arm64/kvm/vgic/vgic-mmio-v3.c 
>> b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
>> index e1ed0c5a8eaa..03a253785700 100644
>> --- a/arch/arm64/kvm/vgic/vgic-mmio-v3.c
>> +++ b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
>> @@ -251,45 +251,52 @@ static void vgic_mmio_write_v3r_ctlr(struct kvm_vcpu 
>> *vcpu,
>>  vgic_enable_lpis(vcpu);
>>  }
>>  
>> +static bool vgic_mmio_vcpu_rdist_is_last(struct kvm_vcpu *vcpu)
>> +{
>> +struct vgic_dist *vgic = &vcpu->kvm->arch.vgic;
>> +struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
>> +struct vgic_redist_region *iter, *rdreg = vgic_cpu->rdreg;
>> +
>> +if (!rdreg)
>> +return false;
>> +
>> +if (vgic_cpu->rdreg_index < rdreg->free_index - 1) {
>> +return false;
>> +} else if (rdreg->count && vgic_cpu->rdreg_index == (rdreg->count - 1)) 
>> {
>> +struct list_head *rd_regions = &vgic->rd_regions;
>> +gpa_t end = rdreg->base + rdreg->count * 
>> KVM_VGIC_V3_REDIST_SIZE;
>> +
>> +/*
>> + * the rdist is the last one of the redist region,
>> + * check whether there is no other contiguous rdist region
>> + */
>> +list_for_each_entry(iter, rd_regions, list) {
>> +if (iter->base == end && iter->free_index > 0)
>> +return false;
>> +}
> 
> In the above notes, you state that the region list is now sorted by
> base address, but I really can't see what sorts that list. And the
> lines above indicate that you are still iterating over the whole RD
> regions.
> 
> It's not a big deal (the code is now simple enough), but that's just
> to confirm that I understand what is going on here.

Sorry, I should have removed the notes. I made the change but then
noticed that the list was already sorted by redistributor region index,
as the API forbids registering rdist regions in non-ascending index
order. So sorting by base address ended up causing more trouble than it
solved.

Thanks

Eric
> 
> Thanks,
> 
>   M.
> 
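
Editor's sketch: the "is last" decision quoted above can be modelled outside the kernel as follows. This is a simplified illustration, not the kernel code: the struct, the REDIST_SIZE value and the final "return true" are assumptions made for the model, and only the two comparisons are taken from the quoted diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for KVM_VGIC_V3_REDIST_SIZE; the value is only used by this model. */
#define REDIST_SIZE 0x20000ULL

/* Simplified model of one redistributor region (not the kernel struct). */
struct rd_region {
    uint64_t base;       /* guest physical base address */
    uint32_t count;      /* number of redistributors the region can hold */
    uint32_t free_index; /* number of redistributors already mapped */
};

/*
 * Returns true when the redistributor at position rdreg_index inside
 * regions[self] has no mapped redistributor following it contiguously.
 */
static bool rdist_is_last(const struct rd_region *regions, size_t nr,
                          size_t self, uint32_t rdreg_index)
{
    const struct rd_region *rdreg = &regions[self];
    uint64_t end;
    size_t i;

    /* The redistributor itself must already be mapped in its region. */
    assert(rdreg->free_index > 0 && rdreg_index < rdreg->free_index);

    /* Another redistributor is mapped after this one in the same region. */
    if (rdreg_index < rdreg->free_index - 1)
        return false;

    /* Last slot of a sized region: look for a populated contiguous region. */
    if (rdreg->count && rdreg_index == rdreg->count - 1) {
        end = rdreg->base + rdreg->count * REDIST_SIZE;
        for (i = 0; i < nr; i++) {
            if (regions[i].base == end && regions[i].free_index > 0)
                return false;
        }
    }

    /* Assumed fall-through: nothing mapped after this redistributor. */
    return true;
}
```

Because the loop only compares base addresses, it does not depend on how the region list is ordered, which is consistent with the point made in the reply about sorting.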



Re: [PATCH v4 7/8] KVM: arm64: vgic-v3: Expose GICR_TYPER.Last for userspace

2021-04-01 Thread Auger Eric
Hi Marc,

On 4/1/21 7:30 PM, Marc Zyngier wrote:
> On Thu, 01 Apr 2021 18:03:25 +0100,
> Auger Eric  wrote:
>>
>> Hi Marc,
>>
>> On 4/1/21 3:42 PM, Marc Zyngier wrote:
>>> Hi Eric,
>>>
>>> On Thu, 01 Apr 2021 09:52:37 +0100,
>>> Eric Auger  wrote:
>>>>
>>>> Commit 23bde34771f1 ("KVM: arm64: vgic-v3: Drop the
>>>> reporting of GICR_TYPER.Last for userspace") temporarily fixed
>>>> a bug identified when attempting to access the GICR_TYPER
>>>> register before the redistributor region setting, but dropped
>>>> the support of the LAST bit.
>>>>
>>>> Emulating the GICR_TYPER.Last bit still makes sense for
>>>> architecture compliance though. This patch restores its support
>>>> (if the redistributor region was set) while keeping the code safe.
>>>>
>>>> We introduce a new helper, vgic_mmio_vcpu_rdist_is_last() which
>>>> computes whether a redistributor is the highest one of a series
>>>> of redistributor contributor pages.
>>>>
>>>> The spec says "Indicates whether this Redistributor is the
>>>> highest-numbered Redistributor in a series of contiguous
>>>> Redistributor pages."
>>>>
>>>> The code is a bit convulated since there is no guarantee
>>>
>>> nit: convoluted
>>>
>>>> redistributors are added in a given reditributor region in
>>>> ascending order. In that case the current implementation was
>>>> wrong. Also redistributor regions can be contiguous
>>>> and registered in non increasing base address order.
>>>>
>>>> So the index of redistributors are stored in an array within
>>>> the redistributor region structure.
>>>>
>>>> With this new implementation we do not need to have a uaccess
>>>> read accessor anymore.
>>>>
>>>> Signed-off-by: Eric Auger 
>>>
>>> This patch also hurt my head, a lot more than the first one.  See
>>> below.
>>>
>>>> ---
>>>>  arch/arm64/kvm/vgic/vgic-init.c|  7 +--
>>>>  arch/arm64/kvm/vgic/vgic-mmio-v3.c | 97 --
>>>>  arch/arm64/kvm/vgic/vgic.h |  1 +
>>>>  include/kvm/arm_vgic.h |  3 +
>>>>  4 files changed, 73 insertions(+), 35 deletions(-)
>>>>
>>>> diff --git a/arch/arm64/kvm/vgic/vgic-init.c 
>>>> b/arch/arm64/kvm/vgic/vgic-init.c
>>>> index cf6faa0aeddb2..61150c34c268c 100644
>>>> --- a/arch/arm64/kvm/vgic/vgic-init.c
>>>> +++ b/arch/arm64/kvm/vgic/vgic-init.c
>>>> @@ -190,6 +190,7 @@ int kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu)
>>>>int i;
>>>>  
>>>>vgic_cpu->rd_iodev.base_addr = VGIC_ADDR_UNDEF;
>>>> +  vgic_cpu->index = vcpu->vcpu_id;
>>>
>>> Is it so that vgic_cpu->index is always equal to vcpu_id? If so, why
>>> do we need another field? We can always get to the vcpu using a
>>> container_of().
>>>
>>>>  
>>>>INIT_LIST_HEAD(&vgic_cpu->ap_list_head);
>>>>raw_spin_lock_init(&vgic_cpu->ap_list_lock);
>>>> @@ -338,10 +339,8 @@ static void kvm_vgic_dist_destroy(struct kvm *kvm)
>>>>dist->vgic_dist_base = VGIC_ADDR_UNDEF;
>>>>  
>>>>if (dist->vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3) {
>>>> -  list_for_each_entry_safe(rdreg, next, &dist->rd_regions, list) {
>>>> -  list_del(&rdreg->list);
>>>> -  kfree(rdreg);
>>>> -  }
>>>> +  list_for_each_entry_safe(rdreg, next, &dist->rd_regions, list)
>>>> +  vgic_v3_free_redist_region(rdreg);
>>>
>>> Consider moving the introduction of vgic_v3_free_redist_region() into
>>> a separate patch. On its own, that's a good readability improvement.
>>>
>>>>INIT_LIST_HEAD(&dist->rd_regions);
>>>>} else {
>>>>dist->vgic_cpu_base = VGIC_ADDR_UNDEF;
>>>> diff --git a/arch/arm64/kvm/vgic/vgic-mmio-v3.c 
>>>> b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
>>>> index 987e366c80008..f6a7eed1d6adb 100644
>>>> --- a/arch/arm64/kvm/vgic/vgic-mmio-v3.c
>>>> +++ b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
>>>> @@ -251,45 +251,57 @@ static void vgic_mmio_write_v3r_ctlr(struct kvm_vcpu 
>>>> *vcpu,
>>>> 

Re: [PATCH v14 07/13] iommu/smmuv3: Implement cache_invalidate

2021-04-01 Thread Auger Eric
Hi Zenghui,

On 4/1/21 8:11 AM, Zenghui Yu wrote:
> Hi Eric,
> 
> On 2021/2/24 4:56, Eric Auger wrote:
>> +static int
>> +arm_smmu_cache_invalidate(struct iommu_domain *domain, struct device
>> *dev,
>> +  struct iommu_cache_invalidate_info *inv_info)
>> +{
>> +    struct arm_smmu_cmdq_ent cmd = {.opcode = CMDQ_OP_TLBI_NSNH_ALL};
>> +    struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>> +    struct arm_smmu_device *smmu = smmu_domain->smmu;
>> +
>> +    if (smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
>> +    return -EINVAL;
>> +
>> +    if (!smmu)
>> +    return -EINVAL;
>> +
>> +    if (inv_info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
>> +    return -EINVAL;
>> +
>> +    if (inv_info->cache & IOMMU_CACHE_INV_TYPE_PASID ||
> 
> I didn't find any code where we would emulate the CFGI_CD{_ALL} commands
> for guest and invalidate the stale CD entries on the physical side. Is
> PASID-cache type designed for that effect?
Yes it is. PASID-cache matches the CD table.
> 
>> +    inv_info->cache & IOMMU_CACHE_INV_TYPE_DEV_IOTLB) {
>> +    return -ENOENT;
>> +    }
>> +
>> +    if (!(inv_info->cache & IOMMU_CACHE_INV_TYPE_IOTLB))
>> +    return -EINVAL;
>> +
>> +    /* IOTLB invalidation */
>> +
>> +    switch (inv_info->granularity) {
>> +    case IOMMU_INV_GRANU_PASID:
>> +    {
>> +    struct iommu_inv_pasid_info *info =
>> +    &inv_info->granu.pasid_info;
>> +
>> +    if (info->flags & IOMMU_INV_ADDR_FLAGS_PASID)
>> +    return -ENOENT;
>> +    if (!(info->flags & IOMMU_INV_PASID_FLAGS_ARCHID))
>> +    return -EINVAL;
>> +
>> +    __arm_smmu_tlb_inv_context(smmu_domain, info->archid);
>> +    return 0;
>> +    }
>> +    case IOMMU_INV_GRANU_ADDR:
>> +    {
>> +    struct iommu_inv_addr_info *info = &inv_info->granu.addr_info;
>> +    size_t size = info->nb_granules * info->granule_size;
>> +    bool leaf = info->flags & IOMMU_INV_ADDR_FLAGS_LEAF;
>> +
>> +    if (info->flags & IOMMU_INV_ADDR_FLAGS_PASID)
>> +    return -ENOENT;
>> +
>> +    if (!(info->flags & IOMMU_INV_ADDR_FLAGS_ARCHID))
>> +    break;
>> +
>> +    arm_smmu_tlb_inv_range_domain(info->addr, size,
>> +  info->granule_size, leaf,
>> +  info->archid, smmu_domain);
>> +
>> +    arm_smmu_cmdq_issue_sync(smmu);
> 
> There is no need to issue one more SYNC.
Hum, yes, I did not notice the sync was already issued by
arm_smmu_cmdq_issue_cmdlist().

Thanks!

Eric
> 
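
Editor's sketch: the cache-type checks at the top of the quoted function can be read as a small decision table. The model below is illustrative only; the INV_TYPE_* names and values are stand-ins chosen for the sketch (the real flags are defined in the iommu uAPI header), while the return codes follow the quoted diff.

```c
#include <errno.h>
#include <stdint.h>

/* Model flag values for this sketch only. */
#define INV_TYPE_IOTLB     (1u << 0)
#define INV_TYPE_DEV_IOTLB (1u << 1)
#define INV_TYPE_PASID     (1u << 2)

/* Mirrors the order of the checks in the quoted patch. */
static int check_cache_type(uint32_t cache)
{
    /* PASID-cache and device-IOTLB invalidations are not handled here. */
    if (cache & (INV_TYPE_PASID | INV_TYPE_DEV_IOTLB))
        return -ENOENT;

    /* Anything else must at least target the IOTLB. */
    if (!(cache & INV_TYPE_IOTLB))
        return -EINVAL;

    return 0; /* the caller would go on to the granularity switch */
}
```

Note that a request combining the IOTLB flag with the PASID flag is rejected with -ENOENT in this ordering, since the first test fires before the IOTLB flag is looked at.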



Re: [PATCH v14 13/13] iommu/smmuv3: Accept configs with more than one context descriptor

2021-04-01 Thread Auger Eric
Hi Shameer,
On 4/1/21 2:38 PM, Shameerali Kolothum Thodi wrote:
> 
> 
>> -----Original Message-----
>> From: Auger Eric [mailto:eric.au...@redhat.com]
>> Sent: 01 April 2021 12:49
>> To: yuzenghui 
>> Cc: eric.auger@gmail.com; io...@lists.linux-foundation.org;
>> linux-kernel@vger.kernel.org; k...@vger.kernel.org;
>> kvm...@lists.cs.columbia.edu; w...@kernel.org; m...@kernel.org;
>> robin.mur...@arm.com; j...@8bytes.org; alex.william...@redhat.com;
>> t...@semihalf.com; zhukeqian ;
>> jacob.jun@linux.intel.com; yi.l@intel.com; wangxingang
>> ; jiangkunkun ;
>> jean-phili...@linaro.org; zhangfei@linaro.org; zhangfei@gmail.com;
>> vivek.gau...@arm.com; Shameerali Kolothum Thodi
>> ; nicoleots...@gmail.com;
>> lushenming ; vse...@nvidia.com; Wanghaibin (D)
>> 
>> Subject: Re: [PATCH v14 13/13] iommu/smmuv3: Accept configs with more than
>> one context descriptor
>>
>> Hi Zenghui,
>>
>> On 3/30/21 11:23 AM, Zenghui Yu wrote:
>>> Hi Eric,
>>>
>>> On 2021/2/24 4:56, Eric Auger wrote:
>>>> In preparation for vSVA, let's accept userspace provided configs
>>>> with more than one CD. We check the max CD against the host iommu
>>>> capability and also the format (linear versus 2 level).
>>>>
>>>> Signed-off-by: Eric Auger 
>>>> Signed-off-by: Shameer Kolothum
>> 
>>>> ---
>>>>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 13 -
>>>>   1 file changed, 8 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>> index 332d31c0680f..ab74a0289893 100644
>>>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>> @@ -3038,14 +3038,17 @@ static int
>> arm_smmu_attach_pasid_table(struct
>>>> iommu_domain *domain,
>>>>   if (smmu_domain->s1_cfg.set)
>>>>   goto out;
>>>>   -    /*
>>>> - * we currently support a single CD so s1fmt and s1dss
>>>> - * fields are also ignored
>>>> - */
>>>> -    if (cfg->pasid_bits)
>>>> +    list_for_each_entry(master, &smmu_domain->devices,
>>>> domain_head) {
>>>> +    if (cfg->pasid_bits > master->ssid_bits)
>>>> +    goto out;
>>>> +    }
>>>> +    if (cfg->vendor_data.smmuv3.s1fmt ==
>>>> STRTAB_STE_0_S1FMT_64K_L2 &&
>>>> +    !(smmu->features &
>> ARM_SMMU_FEAT_2_LVL_CDTAB))
>>>>   goto out;
>>>>     smmu_domain->s1_cfg.cdcfg.cdtab_dma = cfg->base_ptr;
>>>> +    smmu_domain->s1_cfg.s1cdmax = cfg->pasid_bits;
>>>> +    smmu_domain->s1_cfg.s1fmt =
>> cfg->vendor_data.smmuv3.s1fmt;
>>>
>>> And what about the SIDSS field?
>>>
>> I added this patch upon Shameer's request, to be more vSVA friendly.
>> However this series does not really target multiple CD support. At the
>> moment the driver only supports STRTAB_STE_1_S1DSS_SSID0 (0x2) I think.
>> At this moment maybe I can only check the s1dss field is 0x2. Or simply
>> remove this patch?
>>
>> Thoughts?
> 
> Right. This was useful for vSVA tests. But yes, to properly support multiple 
> CDs
> we need to pass the S1DSS from Qemu. And that requires further changes.
> So I think it's better to remove this patch and reject S1CDMAX != 0 cases.
OK I will remove it

Thanks

Eric
> 
> Thanks,
> Shameer
>
>>
>> Eric
> 



Re: [PATCH v14 13/13] iommu/smmuv3: Accept configs with more than one context descriptor

2021-04-01 Thread Auger Eric
Hi Zenghui,

On 3/30/21 11:23 AM, Zenghui Yu wrote:
> Hi Eric,
> 
> On 2021/2/24 4:56, Eric Auger wrote:
>> In preparation for vSVA, let's accept userspace provided configs
>> with more than one CD. We check the max CD against the host iommu
>> capability and also the format (linear versus 2 level).
>>
>> Signed-off-by: Eric Auger 
>> Signed-off-by: Shameer Kolothum 
>> ---
>>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 13 -
>>   1 file changed, 8 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> index 332d31c0680f..ab74a0289893 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> @@ -3038,14 +3038,17 @@ static int arm_smmu_attach_pasid_table(struct
>> iommu_domain *domain,
>>   if (smmu_domain->s1_cfg.set)
>>   goto out;
>>   -    /*
>> - * we currently support a single CD so s1fmt and s1dss
>> - * fields are also ignored
>> - */
>> -    if (cfg->pasid_bits)
>> +    list_for_each_entry(master, &smmu_domain->devices,
>> domain_head) {
>> +    if (cfg->pasid_bits > master->ssid_bits)
>> +    goto out;
>> +    }
>> +    if (cfg->vendor_data.smmuv3.s1fmt ==
>> STRTAB_STE_0_S1FMT_64K_L2 &&
>> +    !(smmu->features & ARM_SMMU_FEAT_2_LVL_CDTAB))
>>   goto out;
>>     smmu_domain->s1_cfg.cdcfg.cdtab_dma = cfg->base_ptr;
>> +    smmu_domain->s1_cfg.s1cdmax = cfg->pasid_bits;
>> +    smmu_domain->s1_cfg.s1fmt = cfg->vendor_data.smmuv3.s1fmt;
> 
> And what about the SIDSS field?
> 
I added this patch upon Shameer's request, to be more vSVA friendly.
However this series does not really target multiple CD support. At the
moment the driver only supports STRTAB_STE_1_S1DSS_SSID0 (0x2) I think.
At this moment maybe I can only check the s1dss field is 0x2. Or simply
remove this patch?

Thoughts?

Eric
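
Editor's sketch: the two checks added by the quoted diff (PASID width against every attached master, and 2-level CD table support on the host) can be modelled as below. All type and parameter names are hypothetical and exist only for this illustration; the S1DSS question raised in the review is deliberately not covered.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the guest-provided stage 1 configuration (illustrative names). */
struct s1_cfg_model {
    uint8_t pasid_bits; /* number of PASID bits requested by the guest */
    bool two_level;     /* guest asks for a 2-level CD table format */
};

/* Returns true when the configuration passes both checks from the diff. */
static bool s1_cfg_acceptable(const struct s1_cfg_model *cfg,
                              const uint8_t *master_ssid_bits,
                              size_t nr_masters,
                              bool host_has_2lvl_cdtab)
{
    size_t i;

    /* Every attached master must support at least the requested width. */
    for (i = 0; i < nr_masters; i++) {
        if (cfg->pasid_bits > master_ssid_bits[i])
            return false;
    }

    /* A 2-level table is only usable when the host SMMU supports it. */
    if (cfg->two_level && !host_has_2lvl_cdtab)
        return false;

    return true;
}
```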



Re: [PATCH v4 1/8] KVM: arm64: vgic-v3: Fix some error codes when setting RDIST base

2021-04-01 Thread Auger Eric
Hi Marc,

On 4/1/21 12:52 PM, Marc Zyngier wrote:
> Hi Eric,
> 
> On Thu, 01 Apr 2021 09:52:31 +0100,
> Eric Auger  wrote:
>>
>> KVM_DEV_ARM_VGIC_GRP_ADDR group doc says we should return
>> -EEXIST in case the base address of the redist is already set.
>> We currently return -EINVAL.
>>
>> However we need to return -EINVAL in case a legacy REDIST address
>> is attempted to be set while REDIST_REGIONS were set. This case
>> is discriminated by looking at the count field.
>>
>> Signed-off-by: Eric Auger 
>>
>> ---
>>
>> v1 -> v2:
>> - simplify the check sequence
>> ---
>>  arch/arm64/kvm/vgic/vgic-mmio-v3.c | 15 +++
>>  1 file changed, 7 insertions(+), 8 deletions(-)
>>
>> diff --git a/arch/arm64/kvm/vgic/vgic-mmio-v3.c 
>> b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
>> index 15a6c98ee92f0..013b737b658f8 100644
>> --- a/arch/arm64/kvm/vgic/vgic-mmio-v3.c
>> +++ b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
>> @@ -791,10 +791,6 @@ static int vgic_v3_insert_redist_region(struct kvm 
>> *kvm, uint32_t index,
>>  size_t size = count * KVM_VGIC_V3_REDIST_SIZE;
>>  int ret;
>>  
>> -/* single rdist region already set ?*/
>> -if (!count && !list_empty(rd_regions))
>> -return -EINVAL;
>> -
>>  /* cross the end of memory ? */
>>  if (base + size < base)
>>  return -EINVAL;
>> @@ -805,11 +801,14 @@ static int vgic_v3_insert_redist_region(struct kvm 
>> *kvm, uint32_t index,
>>  } else {
>>  rdreg = list_last_entry(rd_regions,
>>  struct vgic_redist_region, list);
>> -if (index != rdreg->index + 1)
>> -return -EINVAL;
>>  
>> -/* Cannot add an explicitly sized regions after legacy region */
>> -if (!rdreg->count)
>> +if ((!count) != (!rdreg->count))
>> +return -EINVAL; /* Mix REDIST and REDIST_REGION */
> 
> Urgh... The triple negation killed me. Can we come up with a more
> intuitive expression? Something like:

Yes sometimes I can be "different" ;-)
> 
>   /* Don't mix single region and discrete redist regions */
>   if (!count && rdreg->count)
>   return -EINVAL;
> 
> Does it capture what you want to express?

yes it does!

Thanks

Eric
> 
> Thanks,
> 
>   M.
> 
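
Editor's sketch: the two forms of the "mix" check discussed above can be compared side by side. The function names are made up for this illustration; count == 0 stands for a legacy single region, as described in the commit message. The forms agree except when a sized region follows a legacy one, and whether that case is caught elsewhere is not visible in this excerpt.

```c
#include <stdbool.h>
#include <stdint.h>

/* Form used in the patch: true whenever exactly one of the counts is zero. */
static bool mix_check_patch(uint32_t count, uint32_t last_count)
{
    return (!count) != (!last_count);
}

/* Form suggested in the review: true only for legacy after sized regions. */
static bool mix_check_review(uint32_t count, uint32_t last_count)
{
    return !count && last_count;
}
```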



Re: [PATCH v4 7/8] KVM: arm64: vgic-v3: Expose GICR_TYPER.Last for userspace

2021-04-01 Thread Auger Eric
Hi Marc,

On 4/1/21 3:42 PM, Marc Zyngier wrote:
> Hi Eric,
> 
> On Thu, 01 Apr 2021 09:52:37 +0100,
> Eric Auger  wrote:
>>
>> Commit 23bde34771f1 ("KVM: arm64: vgic-v3: Drop the
>> reporting of GICR_TYPER.Last for userspace") temporarily fixed
>> a bug identified when attempting to access the GICR_TYPER
>> register before the redistributor region setting, but dropped
>> the support of the LAST bit.
>>
>> Emulating the GICR_TYPER.Last bit still makes sense for
>> architecture compliance though. This patch restores its support
>> (if the redistributor region was set) while keeping the code safe.
>>
>> We introduce a new helper, vgic_mmio_vcpu_rdist_is_last() which
>> computes whether a redistributor is the highest one of a series
>> of redistributor contributor pages.
>>
>> The spec says "Indicates whether this Redistributor is the
>> highest-numbered Redistributor in a series of contiguous
>> Redistributor pages."
>>
>> The code is a bit convulated since there is no guarantee
> 
> nit: convoluted
> 
>> redistributors are added in a given reditributor region in
>> ascending order. In that case the current implementation was
>> wrong. Also redistributor regions can be contiguous
>> and registered in non increasing base address order.
>>
>> So the index of redistributors are stored in an array within
>> the redistributor region structure.
>>
>> With this new implementation we do not need to have a uaccess
>> read accessor anymore.
>>
>> Signed-off-by: Eric Auger 
> 
> This patch also hurt my head, a lot more than the first one.  See
> below.
> 
>> ---
>>  arch/arm64/kvm/vgic/vgic-init.c|  7 +--
>>  arch/arm64/kvm/vgic/vgic-mmio-v3.c | 97 --
>>  arch/arm64/kvm/vgic/vgic.h |  1 +
>>  include/kvm/arm_vgic.h |  3 +
>>  4 files changed, 73 insertions(+), 35 deletions(-)
>>
>> diff --git a/arch/arm64/kvm/vgic/vgic-init.c 
>> b/arch/arm64/kvm/vgic/vgic-init.c
>> index cf6faa0aeddb2..61150c34c268c 100644
>> --- a/arch/arm64/kvm/vgic/vgic-init.c
>> +++ b/arch/arm64/kvm/vgic/vgic-init.c
>> @@ -190,6 +190,7 @@ int kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu)
>>  int i;
>>  
>>  vgic_cpu->rd_iodev.base_addr = VGIC_ADDR_UNDEF;
>> +vgic_cpu->index = vcpu->vcpu_id;
> 
> Is it so that vgic_cpu->index is always equal to vcpu_id? If so, why
> do we need another field? We can always get to the vcpu using a
> container_of().
> 
>>  
>>  INIT_LIST_HEAD(&vgic_cpu->ap_list_head);
>>  raw_spin_lock_init(&vgic_cpu->ap_list_lock);
>> @@ -338,10 +339,8 @@ static void kvm_vgic_dist_destroy(struct kvm *kvm)
>>  dist->vgic_dist_base = VGIC_ADDR_UNDEF;
>>  
>>  if (dist->vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3) {
>> -list_for_each_entry_safe(rdreg, next, &dist->rd_regions, list) {
>> -list_del(&rdreg->list);
>> -kfree(rdreg);
>> -}
>> +list_for_each_entry_safe(rdreg, next, &dist->rd_regions, list)
>> +vgic_v3_free_redist_region(rdreg);
> 
> Consider moving the introduction of vgic_v3_free_redist_region() into
> a separate patch. On its own, that's a good readability improvement.
> 
>>  INIT_LIST_HEAD(&dist->rd_regions);
>>  } else {
>>  dist->vgic_cpu_base = VGIC_ADDR_UNDEF;
>> diff --git a/arch/arm64/kvm/vgic/vgic-mmio-v3.c 
>> b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
>> index 987e366c80008..f6a7eed1d6adb 100644
>> --- a/arch/arm64/kvm/vgic/vgic-mmio-v3.c
>> +++ b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
>> @@ -251,45 +251,57 @@ static void vgic_mmio_write_v3r_ctlr(struct kvm_vcpu 
>> *vcpu,
>>  vgic_enable_lpis(vcpu);
>>  }
>>  
>> +static bool vgic_mmio_vcpu_rdist_is_last(struct kvm_vcpu *vcpu)
>> +{
>> +struct vgic_dist *vgic = &vcpu->kvm->arch.vgic;
>> +struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
>> +struct vgic_redist_region *rdreg = vgic_cpu->rdreg;
>> +
>> +if (!rdreg)
>> +return false;
>> +
>> +if (rdreg->count && vgic_cpu->rdreg_index == (rdreg->count - 1)) {
>> +/* check whether there is no other contiguous rdist region */
>> +struct list_head *rd_regions = &vgic->rd_regions;
>> +struct vgic_redist_region *iter;
>> +
>> +list_for_each_entry(iter, rd_regions, list) {
>> +if (iter->base == rdreg->base + rdreg->count * 
>> KVM_VGIC_V3_REDIST_SIZE &&
>> +iter->free_index > 0) {
>> +/* check the first rdist index of this region, if any */
>> +if (vgic_cpu->index < iter->rdist_indices[0])
>> +return false;
> 
> rdist_indices[] contains the vcpu_id of the vcpu associated with a
> given RD in the region. At this stage, you have established that there
> is another region that is contiguous with the one associated with our
> vcpu. You also know that this adjacent region has a vcpu mapped in
> (free_index isn't 0). Isn't that enough to declare that our vcpu isn't
> last?  I 

Re: [PATCH v14 06/13] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs

2021-04-01 Thread Auger Eric
Hi Zenghui,

On 3/30/21 11:17 AM, Zenghui Yu wrote:
> On 2021/2/24 4:56, Eric Auger wrote:
>> @@ -1936,7 +1950,12 @@ static void
>> arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t size,
>>   },
>>   };
>>   -    if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
>> +    if (ext_asid >= 0) {  /* guest stage 1 invalidation */
>> +    cmd.opcode    = smmu_domain->smmu->features &
>> ARM_SMMU_FEAT_E2H ?
>> +  CMDQ_OP_TLBI_EL2_VA : CMDQ_OP_TLBI_NH_VA;
> 
> If I understand it correctly, the true nested mode effectively gives us
> a *NS-EL1* StreamWorld. We should therefore use CMDQ_OP_TLBI_NH_VA to
> invalidate the stage-1 NS-EL1 entries, no matter E2H is selected or not.
> 

Yes, at the moment you're right. Support for nested virt may induce some
changes here but we are not there yet. I will fix it and add a comment.
Thank you!

Best Regards

Eric



Re: [PATCH v3 8/8] KVM: selftests: aarch64/vgic-v3 init sequence tests

2021-03-31 Thread Auger Eric
Hi Drew,

On 3/22/21 7:32 PM, Andrew Jones wrote:
> On Fri, Mar 12, 2021 at 06:32:02PM +0100, Eric Auger wrote:
>> The tests exercise the VGIC_V3 device creation including the
>> associated KVM_DEV_ARM_VGIC_GRP_ADDR group attributes:
>>
>> - KVM_VGIC_V3_ADDR_TYPE_DIST/REDIST
>> - KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION
>>
>> Some other tests dedicate to KVM_DEV_ARM_VGIC_GRP_REDIST_REGS group
>> and especially the GICR_TYPER read. The goal was to test the case
>> recently fixed by commit 23bde34771f1
>> ("KVM: arm64: vgic-v3: Drop the reporting of GICR_TYPER.Last for userspace").
>>
>> The API under test can be found at
>> Documentation/virt/kvm/devices/arm-vgic-v3.rst
>>
>> Signed-off-by: Eric Auger 
>> ---
>>  arch/arm64/kvm/vgic/vgic-mmio-v3.c|   2 +-
>>  tools/testing/selftests/kvm/Makefile  |   1 +
>>  .../testing/selftests/kvm/aarch64/vgic_init.c | 672 ++
>>  .../testing/selftests/kvm/include/kvm_util.h  |   5 +
>>  tools/testing/selftests/kvm/lib/kvm_util.c|  51 ++
>>  5 files changed, 730 insertions(+), 1 deletion(-)
>>  create mode 100644 tools/testing/selftests/kvm/aarch64/vgic_init.c
>>
>> diff --git a/arch/arm64/kvm/vgic/vgic-mmio-v3.c 
>> b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
>> index 652998ed0b55..f6a7eed1d6ad 100644
>> --- a/arch/arm64/kvm/vgic/vgic-mmio-v3.c
>> +++ b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
>> @@ -260,7 +260,7 @@ static bool vgic_mmio_vcpu_rdist_is_last(struct kvm_vcpu 
>> *vcpu)
>>  if (!rdreg)
>>  return false;
>>  
>> -if (!rdreg->count && vgic_cpu->rdreg_index == (rdreg->count - 1)) {
>> +if (rdreg->count && vgic_cpu->rdreg_index == (rdreg->count - 1)) {
> 
> I guess this is an accidental change?
this change should be squashed into the previous patch
> 
>>  /* check whether there is no other contiguous rdist region */
>>  struct list_head *rd_regions = &vgic->rd_regions;
>>  struct vgic_redist_region *iter;
>> diff --git a/tools/testing/selftests/kvm/Makefile 
>> b/tools/testing/selftests/kvm/Makefile
>> index a6d61f451f88..4e548d7ab0ab 100644
>> --- a/tools/testing/selftests/kvm/Makefile
>> +++ b/tools/testing/selftests/kvm/Makefile
>> @@ -75,6 +75,7 @@ TEST_GEN_PROGS_x86_64 += steal_time
>>  
>>  TEST_GEN_PROGS_aarch64 += aarch64/get-reg-list
>>  TEST_GEN_PROGS_aarch64 += aarch64/get-reg-list-sve
>> +TEST_GEN_PROGS_aarch64 += aarch64/vgic_init
>>  TEST_GEN_PROGS_aarch64 += demand_paging_test
>>  TEST_GEN_PROGS_aarch64 += dirty_log_test
>>  TEST_GEN_PROGS_aarch64 += dirty_log_perf_test
> 
> Missing .gitignore change
OK
> 
>> diff --git a/tools/testing/selftests/kvm/aarch64/vgic_init.c 
>> b/tools/testing/selftests/kvm/aarch64/vgic_init.c
>> new file mode 100644
>> index ..12205ab9fb10
>> --- /dev/null
>> +++ b/tools/testing/selftests/kvm/aarch64/vgic_init.c
>> @@ -0,0 +1,672 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * vgic init sequence tests
>> + *
>> + * Copyright (C) 2020, Red Hat, Inc.
>> + */
>> +#define _GNU_SOURCE
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +
>> +#include "test_util.h"
>> +#include "kvm_util.h"
>> +#include "processor.h"
>> +
>> +#define NR_VCPUS4
>> +
>> +#define REDIST_REGION_ATTR_ADDR(count, base, flags, index) 
>> (((uint64_t)(count) << 52) | \
>> +((uint64_t)((base) >> 16) << 16) | ((uint64_t)(flags) << 12) | index)
>> +#define REG_OFFSET(vcpu, offset) (((uint64_t)vcpu << 32) | offset)
>> +
>> +#define GICR_TYPER 0x8
>> +
>> +/* helper to access a redistributor register */
>> +static int access_redist_reg(int gicv3_fd, int vcpu, int offset,
>> + uint32_t *val, bool write)
>> +{
>> +uint64_t attr = REG_OFFSET(vcpu, offset);
>> +
>> +return kvm_device_access(gicv3_fd, KVM_DEV_ARM_VGIC_GRP_REDIST_REGS,
>> + attr, val, write);
>> +}
>> +
>> +/* dummy guest code */
>> +static void guest_code(int cpu)
> 
> cpu is unused, no need for it
sure
> 
>> +{
>> +GUEST_SYNC(0);
>> +GUEST_SYNC(1);
>> +GUEST_SYNC(2);
>> +GUEST_DONE();
>> +}
>> +
>> +/* we don't want to assert on run execution, hence that helper */
>> +static int run_vcpu(struct kvm_vm *vm, uint32_t vcpuid)
>> +{
>> +static int run;
>> +struct ucall uc;
>> +int ret;
>> +
>> +vcpu_args_set(vm, vcpuid, 1, vcpuid);
> 
> The cpu index is unused, so you don't need to pass it in
removed
> 
>> +ret = _vcpu_ioctl(vm, vcpuid, KVM_RUN, NULL);
>> +get_ucall(vm, vcpuid, &uc);
> 
> uc is unused, so you can pass NULL for it
OK
> 
>> +run++;
> 
> What good is this counter? Nobody reads it.
removed
> 
>> +
>> +if (ret)
>> +return -errno;
>> +return 0;
>> +}
>> +
>> +/**
>> + * Helper routine that performs KVM device tests in general and
>> + * especially ARM_VGIC_V3 ones. Eventually the ARM_VGIC_V3
>> + * device gets created, a legacy RDIST region is set at @0x0
>> + * and a DIST region is set @0x6
>> + */
>> +int fuzz_dist_rdist(struct kvm_vm *vm)
> 
> 
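
Editor's sketch: the attribute packing done by the REDIST_REGION_ATTR_ADDR and REG_OFFSET macros quoted above, rewritten as functions so the bit layout is easier to follow. The function names are invented for this illustration; the shifts are the ones from the quoted macros, with every argument widened to 64 bits before shifting.

```c
#include <stdint.h>

/* count in bits 63:52, 64K-aligned base, flags from bit 12, index in the low bits. */
static uint64_t redist_region_attr(uint64_t count, uint64_t base,
                                   uint64_t flags, uint64_t index)
{
    return (count << 52) | ((base >> 16) << 16) | (flags << 12) | index;
}

/* vcpu number in the upper 32 bits, register offset in the lower 32 bits. */
static uint64_t reg_offset(uint64_t vcpu, uint64_t offset)
{
    return (vcpu << 32) | offset;
}
```

For instance, two redistributors at base 0x200000 with index 1 pack to 0x0020000000200001, and GICR_TYPER (offset 0x8) of vcpu 3 is addressed as 0x300000008.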

Re: [PATCH 1/4] vfio/type1: fix a couple of spelling mistakes

2021-03-26 Thread Auger Eric
Hi,

On 3/26/21 9:35 AM, Zhen Lei wrote:
> There are several spelling mistakes, as follows:
> userpsace ==> userspace
> Accouting ==> Accounting
> exlude ==> exclude
Reviewed-by: Eric Auger 

Thanks

Eric
> 
> Signed-off-by: Zhen Lei 
> ---
>  drivers/vfio/vfio_iommu_type1.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index be07664af74..21cf1d123036c82 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -16,7 +16,7 @@
>   * IOMMU to support the IOMMU API and have few to no restrictions around
>   * the IOVA range that can be mapped.  The Type1 IOMMU is currently
>   * optimized for relatively static mappings of a userspace process with
> - * userpsace pages pinned into memory.  We also assume devices and IOMMU
> + * userspace pages pinned into memory.  We also assume devices and IOMMU
>   * domains are PCI based as the IOMMU API is still centered around a
>   * device/bus interface rather than a group interface.
>   */
> @@ -871,7 +871,7 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
>  
>   /*
>* If iommu capable domain exist in the container then all pages are
> -  * already pinned and accounted. Accouting should be done if there is no
> +  * already pinned and accounted. Accounting should be done if there is 
> no
>* iommu capable domain in the container.
>*/
>   do_accounting = !IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu);
> @@ -2171,7 +2171,7 @@ static int vfio_iommu_resv_exclude(struct list_head 
> *iova,
>   continue;
>   /*
>* Insert a new node if current node overlaps with the
> -  * reserve region to exlude that from valid iova range.
> +  * reserve region to exclude that from valid iova range.
>* Note that, new node is inserted before the current
>* node and finally the current node is deleted keeping
>* the list updated and sorted.
> 



Re: [PATCH 2/4] vfio/mdev: Fix spelling mistake "interal" -> "internal"

2021-03-26 Thread Auger Eric



On 3/26/21 9:35 AM, Zhen Lei wrote:
> There is a spelling mistake in a comment, fix it.
> 
> Signed-off-by: Zhen Lei 
Reviewed-by: Eric Auger 
> ---
>  drivers/vfio/mdev/mdev_private.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/vfio/mdev/mdev_private.h 
> b/drivers/vfio/mdev/mdev_private.h
> index 7d922950caaf3c1..4d62b76c473409d 100644
> --- a/drivers/vfio/mdev/mdev_private.h
> +++ b/drivers/vfio/mdev/mdev_private.h
> @@ -1,6 +1,6 @@
>  /* SPDX-License-Identifier: GPL-2.0-only */
>  /*
> - * Mediated device interal definitions
> + * Mediated device internal definitions
>   *
>   * Copyright (c) 2016, NVIDIA CORPORATION. All rights reserved.
>   * Author: Neo Jia 
> 



Re: [PATCH 3/4] vfio/pci: fix a couple of spelling mistakes

2021-03-26 Thread Auger Eric



On 3/26/21 9:35 AM, Zhen Lei wrote:
> There are several spelling mistakes, as follows:
> permision ==> permission
> thru ==> through
> presense ==> presence
> 
> Signed-off-by: Zhen Lei 
Reviewed-by: Eric Auger 

Eric
> ---
>  drivers/vfio/pci/vfio_pci.c | 2 +-
>  drivers/vfio/pci/vfio_pci_config.c  | 2 +-
>  drivers/vfio/pci/vfio_pci_nvlink2.c | 4 ++--
>  3 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index 65e7e6b44578c29..d2ab8b5bc8a86fe 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -2409,7 +2409,7 @@ static int __init vfio_pci_init(void)
>  {
>   int ret;
>  
> - /* Allocate shared config space permision data used by all devices */
> + /* Allocate shared config space permission data used by all devices */
>   ret = vfio_pci_init_perm_bits();
>   if (ret)
>   return ret;
> diff --git a/drivers/vfio/pci/vfio_pci_config.c 
> b/drivers/vfio/pci/vfio_pci_config.c
> index a402adee8a21558..d57f037f65b85d4 100644
> --- a/drivers/vfio/pci/vfio_pci_config.c
> +++ b/drivers/vfio/pci/vfio_pci_config.c
> @@ -101,7 +101,7 @@
>  /*
>   * Read/Write Permission Bits - one bit for each bit in capability
>   * Any field can be read if it exists, but what is read depends on
> - * whether the field is 'virtualized', or just pass thru to the
> + * whether the field is 'virtualized', or just pass through to the
>   * hardware.  Any virtualized field is also virtualized for writes.
>   * Writes are only permitted if they have a 1 bit here.
>   */
> diff --git a/drivers/vfio/pci/vfio_pci_nvlink2.c 
> b/drivers/vfio/pci/vfio_pci_nvlink2.c
> index 9adcf6a8f888575..f276624fec79f68 100644
> --- a/drivers/vfio/pci/vfio_pci_nvlink2.c
> +++ b/drivers/vfio/pci/vfio_pci_nvlink2.c
> @@ -219,7 +219,7 @@ int vfio_pci_nvdia_v100_nvlink2_init(struct 
> vfio_pci_device *vdev)
>   unsigned long events = VFIO_GROUP_NOTIFY_SET_KVM;
>  
>   /*
> -  * PCI config space does not tell us about NVLink presense but
> +  * PCI config space does not tell us about NVLink presence but
>* platform does, use this.
>*/
>   npu_dev = pnv_pci_get_npu_dev(vdev->pdev, 0);
> @@ -402,7 +402,7 @@ int vfio_pci_ibm_npu2_init(struct vfio_pci_device *vdev)
>   u32 link_speed = 0xff;
>  
>   /*
> -  * PCI config space does not tell us about NVLink presense but
> +  * PCI config space does not tell us about NVLink presence but
>* platform does, use this.
>*/
>   if (!pnv_pci_get_gpu_dev(vdev->pdev))
> 



Re: [PATCH 4/4] vfio/platform: Fix spelling mistake "registe" -> "register"

2021-03-26 Thread Auger Eric
Hi,

On 3/26/21 9:35 AM, Zhen Lei wrote:
> There is a spelling mistake in a comment, fix it.
> 
> Signed-off-by: Zhen Lei 
Acked-by: Eric Auger 

Thanks

Eric

> ---
>  drivers/vfio/platform/reset/vfio_platform_calxedaxgmac.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/vfio/platform/reset/vfio_platform_calxedaxgmac.c 
> b/drivers/vfio/platform/reset/vfio_platform_calxedaxgmac.c
> index 09a9453b75c5592..63cc7f0b2e4a437 100644
> --- a/drivers/vfio/platform/reset/vfio_platform_calxedaxgmac.c
> +++ b/drivers/vfio/platform/reset/vfio_platform_calxedaxgmac.c
> @@ -26,7 +26,7 @@
>  #define XGMAC_DMA_CONTROL   0x0f18  /* Ctrl (Operational Mode) */
>  #define XGMAC_DMA_INTR_ENA  0x0f1c  /* Interrupt Enable */
>  
> -/* DMA Control registe defines */
> +/* DMA Control register defines */
>  #define DMA_CONTROL_ST  0x2000  /* Start/Stop Transmission */
>  #define DMA_CONTROL_SR  0x0002  /* Start/Stop Receive */
>  
> 



Re: [Linuxarm] Re: [PATCH v14 07/13] iommu/smmuv3: Implement cache_invalidate

2021-03-22 Thread Auger Eric
Hi Chenxiang,

On 3/22/21 7:40 AM, chenxiang (M) wrote:
> Hi Eric,
> 
> 
> On 2021/3/20 1:36, Auger Eric wrote:
>> Hi Chenxiang,
>>
>> On 3/4/21 8:55 AM, chenxiang (M) wrote:
>>> Hi Eric,
>>>
>>>
>>> On 2021/2/24 4:56, Eric Auger wrote:
>>>> Implement domain-selective, pasid selective and page-selective
>>>> IOTLB invalidations.
>>>>
>>>> Signed-off-by: Eric Auger 
>>>>
>>>> ---
>>>>
>>>> v13 -> v14:
>>>> - Add domain invalidation
>>>> - do global inval when asid is not provided with addr
>>>>    granularity
>>>>
>>>> v7 -> v8:
>>>> - ASID based invalidation using iommu_inv_pasid_info
>>>> - check ARCHID/PASID flags in addr based invalidation
>>>> - use __arm_smmu_tlb_inv_context and __arm_smmu_tlb_inv_range_nosync
>>>>
>>>> v6 -> v7
>>>> - check the uapi version
>>>>
>>>> v3 -> v4:
>>>> - adapt to changes in the uapi
>>>> - add support for leaf parameter
>>>> - do not use arm_smmu_tlb_inv_range_nosync or arm_smmu_tlb_inv_context
>>>>    anymore
>>>>
>>>> v2 -> v3:
>>>> - replace __arm_smmu_tlb_sync by arm_smmu_cmdq_issue_sync
>>>>
>>>> v1 -> v2:
>>>> - properly pass the asid
>>>> ---
>>>>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 74
>>>> +
>>>>   1 file changed, 74 insertions(+)
>>>>
>>>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>> index 4c19a1114de4..df3adc49111c 100644
>>>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>> @@ -2949,6 +2949,79 @@ static void
>>>> arm_smmu_detach_pasid_table(struct iommu_domain *domain)
>>>>   mutex_unlock(&smmu_domain->init_mutex);
>>>>   }
>>>>   +static int
>>>> +arm_smmu_cache_invalidate(struct iommu_domain *domain, struct
>>>> device *dev,
>>>> +  struct iommu_cache_invalidate_info *inv_info)
>>>> +{
>>>> +    struct arm_smmu_cmdq_ent cmd = {.opcode = CMDQ_OP_TLBI_NSNH_ALL};
>>>> +    struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>>>> +    struct arm_smmu_device *smmu = smmu_domain->smmu;
>>>> +
>>>> +    if (smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
>>>> +    return -EINVAL;
>>>> +
>>>> +    if (!smmu)
>>>> +    return -EINVAL;
>>>> +
>>>> +    if (inv_info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
>>>> +    return -EINVAL;
>>>> +
>>>> +    if (inv_info->cache & IOMMU_CACHE_INV_TYPE_PASID ||
>>>> +    inv_info->cache & IOMMU_CACHE_INV_TYPE_DEV_IOTLB) {
>>>> +    return -ENOENT;
>>>> +    }
>>>> +
>>>> +    if (!(inv_info->cache & IOMMU_CACHE_INV_TYPE_IOTLB))
>>>> +    return -EINVAL;
>>>> +
>>>> +    /* IOTLB invalidation */
>>>> +
>>>> +    switch (inv_info->granularity) {
>>>> +    case IOMMU_INV_GRANU_PASID:
>>>> +    {
>>>> +    struct iommu_inv_pasid_info *info =
>>>> +    &inv_info->granu.pasid_info;
>>>> +
>>>> +    if (info->flags & IOMMU_INV_ADDR_FLAGS_PASID)
>>>> +    return -ENOENT;
>>>> +    if (!(info->flags & IOMMU_INV_PASID_FLAGS_ARCHID))
>>>> +    return -EINVAL;
>>>> +
>>>> +    __arm_smmu_tlb_inv_context(smmu_domain, info->archid);
>>>> +    return 0;
>>>> +    }
>>>> +    case IOMMU_INV_GRANU_ADDR:
>>>> +    {
>>>> +    struct iommu_inv_addr_info *info = &inv_info->granu.addr_info;
>>>> +    size_t size = info->nb_granules * info->granule_size;
>>>> +    bool leaf = info->flags & IOMMU_INV_ADDR_FLAGS_LEAF;
>>>> +
>>>> +    if (info->flags & IOMMU_INV_ADDR_FLAGS_PASID)
>>>> +    return -ENOENT;
>>>> +
>>>> +    if (!(info->flags & IOMMU_INV_ADDR_FLAGS_ARCHID))
>>>> +    break;
>>>> +
>>>> +    arm_smmu_tlb_inv_range_domai

Re: [PATCH v14 07/13] iommu/smmuv3: Implement cache_invalidate

2021-03-19 Thread Auger Eric
Hi Chenxiang,

On 3/4/21 8:55 AM, chenxiang (M) wrote:
> Hi Eric,
> 
> 
> On 2021/2/24 4:56, Eric Auger wrote:
>> Implement domain-selective, pasid selective and page-selective
>> IOTLB invalidations.
>>
>> Signed-off-by: Eric Auger 
>>
>> ---
>>
>> v13 -> v14:
>> - Add domain invalidation
>> - do global inval when asid is not provided with addr
>>   granularity
>>
>> v7 -> v8:
>> - ASID based invalidation using iommu_inv_pasid_info
>> - check ARCHID/PASID flags in addr based invalidation
>> - use __arm_smmu_tlb_inv_context and __arm_smmu_tlb_inv_range_nosync
>>
>> v6 -> v7
>> - check the uapi version
>>
>> v3 -> v4:
>> - adapt to changes in the uapi
>> - add support for leaf parameter
>> - do not use arm_smmu_tlb_inv_range_nosync or arm_smmu_tlb_inv_context
>>   anymore
>>
>> v2 -> v3:
>> - replace __arm_smmu_tlb_sync by arm_smmu_cmdq_issue_sync
>>
>> v1 -> v2:
>> - properly pass the asid
>> ---
>>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 74 +
>>  1 file changed, 74 insertions(+)
>>
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> index 4c19a1114de4..df3adc49111c 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> @@ -2949,6 +2949,79 @@ static void arm_smmu_detach_pasid_table(struct 
>> iommu_domain *domain)
>>  mutex_unlock(&smmu_domain->init_mutex);
>>  }
>>  
>> +static int
>> +arm_smmu_cache_invalidate(struct iommu_domain *domain, struct device *dev,
>> +  struct iommu_cache_invalidate_info *inv_info)
>> +{
>> +struct arm_smmu_cmdq_ent cmd = {.opcode = CMDQ_OP_TLBI_NSNH_ALL};
>> +struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>> +struct arm_smmu_device *smmu = smmu_domain->smmu;
>> +
>> +if (smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
>> +return -EINVAL;
>> +
>> +if (!smmu)
>> +return -EINVAL;
>> +
>> +if (inv_info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
>> +return -EINVAL;
>> +
>> +if (inv_info->cache & IOMMU_CACHE_INV_TYPE_PASID ||
>> +inv_info->cache & IOMMU_CACHE_INV_TYPE_DEV_IOTLB) {
>> +return -ENOENT;
>> +}
>> +
>> +if (!(inv_info->cache & IOMMU_CACHE_INV_TYPE_IOTLB))
>> +return -EINVAL;
>> +
>> +/* IOTLB invalidation */
>> +
>> +switch (inv_info->granularity) {
>> +case IOMMU_INV_GRANU_PASID:
>> +{
>> +struct iommu_inv_pasid_info *info =
>> +&inv_info->granu.pasid_info;
>> +
>> +if (info->flags & IOMMU_INV_ADDR_FLAGS_PASID)
>> +return -ENOENT;
>> +if (!(info->flags & IOMMU_INV_PASID_FLAGS_ARCHID))
>> +return -EINVAL;
>> +
>> +__arm_smmu_tlb_inv_context(smmu_domain, info->archid);
>> +return 0;
>> +}
>> +case IOMMU_INV_GRANU_ADDR:
>> +{
>> +struct iommu_inv_addr_info *info = &inv_info->granu.addr_info;
>> +size_t size = info->nb_granules * info->granule_size;
>> +bool leaf = info->flags & IOMMU_INV_ADDR_FLAGS_LEAF;
>> +
>> +if (info->flags & IOMMU_INV_ADDR_FLAGS_PASID)
>> +return -ENOENT;
>> +
>> +if (!(info->flags & IOMMU_INV_ADDR_FLAGS_ARCHID))
>> +break;
>> +
>> +arm_smmu_tlb_inv_range_domain(info->addr, size,
>> +  info->granule_size, leaf,
>> +  info->archid, smmu_domain);
> 
> Is it possible to add a check before the function to make sure that
> info->granule_size can be recognized by the SMMU?
> There is a scenario which will cause a TLBI issue: the RIL feature is enabled
> on the guest but disabled on the host, so the host
> only invalidates 4K/2M/1G one at a time. But from QEMU,
> info->nb_granules is always 1 and info->granule_size = size,
> so if size is not equal to 4K, 2M or 1G (for example size = granule_size
> is 5M), only part of the requested size is invalidated.
> 
> I think maybe we can add a check here: if RIL is not enabled and
> size is not a granule_size (4K/2M/1G) supported by the
> SMMU hardware, can we simply use the smallest granule_size
> supported by hardware all the time?
> 
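To make the 5M case concrete, here is a minimal user-space sketch (not driver code) of splitting a range into block sizes the hardware can invalidate one at a time. The helper names pick_granule() and count_invalidations() and the fixed 4K/2M/1G set are assumptions for illustration only; a real implementation would presumably derive the sizes from the domain's page-size bitmap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SZ_4K 0x1000ULL
#define SZ_2M 0x200000ULL
#define SZ_1G 0x40000000ULL

/* Block sizes assumed to be usable without RIL, largest first. */
static const uint64_t hw_granules[] = { SZ_1G, SZ_2M, SZ_4K };

/* Largest supported block that is aligned at addr and fits in remaining. */
static uint64_t pick_granule(uint64_t addr, uint64_t remaining)
{
	size_t i;

	for (i = 0; i < sizeof(hw_granules) / sizeof(hw_granules[0]); i++) {
		uint64_t g = hw_granules[i];

		if (!(addr & (g - 1)) && remaining >= g)
			return g;
	}
	/* Not reached: the caller keeps addr 4K aligned and remaining >= 4K. */
	return SZ_4K;
}

/* Number of single-block invalidations needed to cover [addr, addr + size). */
static unsigned int count_invalidations(uint64_t addr, uint64_t size)
{
	/* Round the range out to 4K boundaries so nothing is left uncovered. */
	uint64_t start = addr & ~(SZ_4K - 1);
	uint64_t end = (addr + size + SZ_4K - 1) & ~(SZ_4K - 1);
	unsigned int n = 0;

	while (start < end) {
		uint64_t g = pick_granule(start, end - start);

		assert(g >= SZ_4K);
		start += g; /* one invalidation command per block */
		n++;
	}
	return n;
}
```

With this split, a 5M request at a 2M-aligned address becomes two 2M blocks plus 256 4K pages, instead of a single command carrying a size the hardware does not recognize.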
>> +
>> +arm_smmu_cmdq_issue_sync(smmu);
>> +return 0;
>> +}
>> +case IOMMU_INV_GRANU_DOMAIN:
>> +break;
> 
> I check the qemu code
> (https://github.com/eauger/qemu/tree/v5.2.0-2stage-rfcv8), for opcode
> CMD_TLBI_NH_ALL or CMD_TLBI_NSNH_ALL from guest OS
> it calls smmu_inv_notifiers_all() to unmap all notifiers of all mr's,
> but it does not seem to set event.entry.granularity, which I think should be
> set to IOMMU_INV_GRAN_ADDR.
this is because IOMMU_INV_GRAN_ADDR = 0. But for clarity I should rather
set it explicitly ;-)
> BTW, for opcode CMD_TLBI_NH_ALL or CMD_TLBI_NSNH_ALL, it needs to unmap
> size = 0x1 on 48bit 

Re: [PATCH v14 00/13] SMMUv3 Nested Stage Setup (IOMMU part)

2021-03-19 Thread Auger Eric
Hi Krishna,

On 3/18/21 1:16 AM, Krishna Reddy wrote:
> Tested-by: Krishna Reddy 
> 
> Validated nested translations with NVMe PCI device assigned to Guest VM. 
> Tested with both v12 and v13 of Jean-Philippe's patches as base.

Many thanks for that.
> 
>> This is based on Jean-Philippe's
>> [PATCH v12 00/10] iommu: I/O page faults for SMMUv3
>> https://lore.kernel.org/linux-arm-kernel/YBfij71tyYvh8LhB@myrica/T/
> 
> With Jean-Philippe's V13, Patch 12 of this series has a conflict that had to 
> be resolved manually.

Yep I will respin accordingly.

Best Regards

Eric
> 
> -KR
> 
> 



Re: [PATCH v14 05/13] iommu/smmuv3: Implement attach/detach_pasid_table

2021-03-19 Thread Auger Eric
Hi Keqian,

On 3/2/21 9:35 AM, Keqian Zhu wrote:
> Hi Eric,
> 
> On 2021/2/24 4:56, Eric Auger wrote:
>> On attach_pasid_table() we program STE S1 related info set
>> by the guest into the actual physical STEs. At minimum
>> we need to program the context descriptor GPA and compute
>> whether the stage1 is translated/bypassed or aborted.
>>
>> On detach, the stage 1 config is unset and the abort flag is
>> unset.
>>
>> Signed-off-by: Eric Auger 
>>
> [...]
> 
>> +
>> +/*
>> + * we currently support a single CD so s1fmt and s1dss
>> + * fields are also ignored
>> + */
>> +if (cfg->pasid_bits)
>> +goto out;
>> +
>> +smmu_domain->s1_cfg.cdcfg.cdtab_dma = cfg->base_ptr;
> only the "cdtab_dma" field of "cdcfg" is set, we are not able to locate a 
> specific cd using arm_smmu_get_cd_ptr().
> 
> Maybe we'd better use a specialized function to fill other fields of "cdcfg" 
> or add a sanity check in arm_smmu_get_cd_ptr()
> to prevent calling it under nested mode?
> 
> As now we just call arm_smmu_get_cd_ptr() during finalise_s1(), no problem 
> found. Just a suggestion ;-)

Forgive me for the delay. Yes, I can indeed make sure that code is not
called in nested mode. Please could you detail why you would need to
call arm_smmu_get_cd_ptr()?
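
A guard of the kind suggested above could look like the following sketch. It is stand-alone C with mock types: mock_s1_cfg, guest_owned and mock_get_cd_ptr() are hypothetical names, not the driver's, and the CD size is an assumption of the mock.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CD_DWORDS 8 /* assumed context descriptor size, in 64-bit words */

struct mock_s1_cfg {
	uint64_t *cdtab;      /* host-side CD table, NULL when the guest owns it */
	uint64_t cdtab_dma;   /* table address programmed into the STE */
	unsigned int max_cds; /* number of entries reachable through cdtab */
	int guest_owned;      /* nested mode: only cdtab_dma is meaningful */
};

/* Return the CD for ssid, or NULL when the host cannot walk the table. */
static uint64_t *mock_get_cd_ptr(struct mock_s1_cfg *cfg, unsigned int ssid)
{
	assert(cfg != NULL);

	/* In nested mode the table lives in guest memory: refuse the lookup. */
	if (cfg->guest_owned || !cfg->cdtab)
		return NULL;
	if (ssid >= cfg->max_cds)
		return NULL;
	return cfg->cdtab + (size_t)ssid * CD_DWORDS;
}
```

Callers would then have to handle a NULL return, which makes an accidental lookup in nested mode fail loudly rather than dereference an unset table.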

Thanks

Eric
> 
> Thanks,
> Keqian
> 
> 
>> +smmu_domain->s1_cfg.set = true;
>> +smmu_domain->abort = false;
>> +break;
>> +default:
>> +goto out;
>> +}
>> +spin_lock_irqsave(&smmu_domain->devices_lock, flags);
>> +list_for_each_entry(master, &smmu_domain->devices, domain_head)
>> +arm_smmu_install_ste_for_dev(master);
>> +spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
>> +ret = 0;
>> +out:
>> +mutex_unlock(&smmu_domain->init_mutex);
>> +return ret;
>> +}
>> +
>> +static void arm_smmu_detach_pasid_table(struct iommu_domain *domain)
>> +{
>> +struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>> +struct arm_smmu_master *master;
>> +unsigned long flags;
>> +
>> +mutex_lock(&smmu_domain->init_mutex);
>> +
>> +if (smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
>> +goto unlock;
>> +
>> +smmu_domain->s1_cfg.set = false;
>> +smmu_domain->abort = false;
>> +
>> +spin_lock_irqsave(&smmu_domain->devices_lock, flags);
>> +list_for_each_entry(master, &smmu_domain->devices, domain_head)
>> +arm_smmu_install_ste_for_dev(master);
>> +spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
>> +
>> +unlock:
>> +mutex_unlock(&smmu_domain->init_mutex);
>> +}
>> +
>>  static bool arm_smmu_dev_has_feature(struct device *dev,
>>   enum iommu_dev_features feat)
>>  {
>> @@ -2939,6 +3026,8 @@ static struct iommu_ops arm_smmu_ops = {
>>  .of_xlate   = arm_smmu_of_xlate,
>>  .get_resv_regions   = arm_smmu_get_resv_regions,
>>  .put_resv_regions   = generic_iommu_put_resv_regions,
>> +.attach_pasid_table = arm_smmu_attach_pasid_table,
>> +.detach_pasid_table = arm_smmu_detach_pasid_table,
>>  .dev_has_feat   = arm_smmu_dev_has_feature,
>>  .dev_feat_enabled   = arm_smmu_dev_feature_enabled,
>>  .dev_enable_feat= arm_smmu_dev_enable_feature,
>>
> 



Re: [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part)

2021-03-16 Thread Auger Eric
Hi Krishna,
On 3/15/21 7:04 PM, Krishna Reddy wrote:
> Tested-by: Krishna Reddy 
> 
>> 1) pass the guest stage 1 configuration
> 
> Validated Nested SMMUv3 translations for NVMe PCIe device from Guest VM along 
> with patch series "v11 SMMUv3 Nested Stage Setup (VFIO part)" and QEMU patch 
> series "vSMMUv3/pSMMUv3 2 stage VFIO integration" from v5.2.0-2stage-rfcv8. 
> NVMe PCIe device is functional with 2-stage translations and no issues 
> observed.
Thank you very much for your testing efforts. For your info, there are
more recent kernel series:
[PATCH v14 00/13] SMMUv3 Nested Stage Setup (IOMMU part) (Feb 23)
[PATCH v12 00/13] SMMUv3 Nested Stage Setup (VFIO part) (Feb 23)

working along with QEMU RFC
[RFC v8 00/28] vSMMUv3/pSMMUv3 2 stage VFIO integration (Feb 25)

If you have cycles to test with those, this would be highly appreciated.

Thanks

Eric
> 
> -KR
> 



Re: [PATCH 5/9] KVM: arm: move has_run_once after the map_resources

2021-03-12 Thread Auger Eric
Hi Alexandru,

On 1/20/21 4:56 PM, Alexandru Elisei wrote:
> Hi Eric,
> 
> On 1/14/21 10:02 AM, Auger Eric wrote:
>> Hi Alexandru,
>>
>> On 1/12/21 3:55 PM, Alexandru Elisei wrote:
>>> Hi Eric,
>>>
>>> On 12/12/20 6:50 PM, Eric Auger wrote:
>>>> has_run_once is set to true at the beginning of
>>>> kvm_vcpu_first_run_init(). This generally is not an issue
>>>> except when exercising the code with KVM selftests. Indeed,
>>>> if kvm_vgic_map_resources() fails due to erroneous user settings,
>>>> has_run_once is set and this prevents the test from
>>>> continuing. This patch moves the assignment after the
>>>> kvm_vgic_map_resources().
>>>>
>>>> Signed-off-by: Eric Auger 
>>>> ---
>>>>  arch/arm64/kvm/arm.c | 4 ++--
>>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
>>>> index c0ffb019ca8b..331fae6bff31 100644
>>>> --- a/arch/arm64/kvm/arm.c
>>>> +++ b/arch/arm64/kvm/arm.c
>>>> @@ -540,8 +540,6 @@ static int kvm_vcpu_first_run_init(struct kvm_vcpu 
>>>> *vcpu)
>>>>if (!kvm_arm_vcpu_is_finalized(vcpu))
>>>>return -EPERM;
>>>>  
>>>> -  vcpu->arch.has_run_once = true;
>>>> -
>>>>if (likely(irqchip_in_kernel(kvm))) {
>>>>/*
>>>> * Map the VGIC hardware resources before running a vcpu the
>>>> @@ -560,6 +558,8 @@ static int kvm_vcpu_first_run_init(struct kvm_vcpu 
>>>> *vcpu)
>>>>static_branch_inc(&userspace_irqchip_in_use);
>>>>}
>>>>  
>>>> +  vcpu->arch.has_run_once = true;
>>> I have a few concerns regarding this:
>>>
>>> 1. Moving has_run_once = true here seems very arbitrary to me - 
>>> kvm_timer_enable()
>>> and kvm_arm_pmu_v3_enable(), below it, can both fail because of erroneous 
>>> user
>>> values. If there's a reason why the assignment cannot be moved at the end 
>>> of the
>>> function, I think it should be clearly stated in a comment for the people 
>>> who
>>> might be tempted to write similar tests for the timer or pmu.
>> Setting has_run_once = true at the entry of the function looks to me
>> even more arbitrary. I agree with you that eventually has_run_once may
> 
> Or it could be it's there to prevent the user from calling
> kvm_vgic_map_resources() a second time after it failed. This is what I'm 
> concerned
> about and I think deserves more investigation.

I have reworked my kvm selftests to live without that change.
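
For the record, the ordering question discussed in this thread can be reproduced with a small stand-alone model. The names below (mock_vcpu, mock_map_resources(), first_run_init_early(), first_run_init_late()) are made up for illustration and only mimic the shape of kvm_vcpu_first_run_init().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct mock_vcpu {
	bool has_run_once;
	bool bad_user_config; /* stands in for an erroneous user setting */
};

/* Fallible step, standing in for kvm_vgic_map_resources(). */
static int mock_map_resources(const struct mock_vcpu *vcpu)
{
	assert(vcpu != NULL);
	return vcpu->bad_user_config ? -EINVAL : 0;
}

/* Flag set on entry: a failed first run still marks the vcpu as run. */
static int first_run_init_early(struct mock_vcpu *vcpu)
{
	vcpu->has_run_once = true;
	return mock_map_resources(vcpu);
}

/* Flag set after the fallible step: a failed first run can be retried. */
static int first_run_init_late(struct mock_vcpu *vcpu)
{
	int ret = mock_map_resources(vcpu);

	if (ret)
		return ret;
	vcpu->has_run_once = true;
	return 0;
}
```

Whether retrying after a failure is actually safe is exactly the open question raised above; the model only shows what each ordering permits.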

Thanks

Eric
> 
> Thanks,
> Alex
>> be moved at the very end but maybe this can be done later once timer,
>> pmu tests have been written
>>> 2. There are many ways that kvm_vgic_map_resources() can fail, other than
>>> incorrect user settings. I started digging into how
>>> kvm_vgic_map_resources()->vgic_v2_map_resources() can fail for a VGIC V2 
>>> and this
>>> is what I managed to find before I gave up:
>>>
>>> * vgic_init() can fail in:
>>>     - kvm_vgic_dist_init()
>>>     - vgic_v3_init()
>>>     - kvm_vgic_setup_default_irq_routing()
>>> * vgic_register_dist_iodev() can fail in:
>>>     - vgic_v3_init_dist_iodev()
>>>     - kvm_io_bus_register_dev()(*)
>>> * kvm_phys_addr_ioremap() can fail in:
>>>     - kvm_mmu_topup_memory_cache()
>>>     - kvm_pgtable_stage2_map()
>> I changed the commit msg so that "incorrect user settings" sounds as an
>> example.
>>> So if any of the functions below fail, are we 100% sure it is safe to allow 
>>> the
>>> user to execute kvm_vgic_map_resources() again?
>> I think additional tests will confirm this. However at the moment,
>> moving the assignment, which does not look wrong to me, allows to
>> greatly simplify the tests so I would tend to say that it is worth it.
>>> (*) It looks to me like kvm_io_bus_register_dev() doesn't take into account 
>>> a
>>> caller that tries to register the same device address range and it will 
>>> create
>>> another identical range. Is this intentional? Is it a bug that should be 
>>> fixed? Or
>>> am I misunderstanding the function?
>> doesn't kvm_io_bus_cmp() do the check?
>>
>> Thanks
>>
>> Eric
>>> Thanks,
>>> Alex
>>>> +
>>>>ret = kvm_timer_enable(vcpu);
>>>>if (ret)
>>>>return ret;
> 



Re: [PATCH 8/9] KVM: arm64: vgic-v3: Expose GICR_TYPER.Last for userspace

2021-03-12 Thread Auger Eric
Hi Alexandru,

On 1/20/21 5:13 PM, Alexandru Elisei wrote:
> Hi Eric,
> 
> On 1/14/21 10:16 AM, Auger Eric wrote:
>> Hi Alexandru,
>>
>> On 1/12/21 6:02 PM, Alexandru Elisei wrote:
>>> Hi Eric,
>>>
>>> On 12/12/20 6:50 PM, Eric Auger wrote:
>>>> Commit 23bde34771f1 ("KVM: arm64: vgic-v3: Drop the
>>>> reporting of GICR_TYPER.Last for userspace") temporarily fixed
>>>> a bug identified when attempting to access the GICR_TYPER
>>>> register before the redistributor region setting but dropped
>>>> the support of the LAST bit. This patch restores its
>>>> support (if the redistributor region was set) while keeping the
>>>> code safe.
>>> I suppose the reason for emulating GICR_TYPER.Last is for architecture 
>>> compliance,
>>> right? I think that should be in the commit message.
>> OK added this in the commit msg.
>>>> Signed-off-by: Eric Auger 
>>>> ---
>>>>  arch/arm64/kvm/vgic/vgic-mmio-v3.c | 7 ++-
>>>>  include/kvm/arm_vgic.h | 1 +
>>>>  2 files changed, 7 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/arch/arm64/kvm/vgic/vgic-mmio-v3.c 
>>>> b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
>>>> index 581f0f49..2f9ef6058f6e 100644
>>>> --- a/arch/arm64/kvm/vgic/vgic-mmio-v3.c
>>>> +++ b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
>>>> @@ -277,6 +277,8 @@ static unsigned long 
>>>> vgic_uaccess_read_v3r_typer(struct kvm_vcpu *vcpu,
>>>> gpa_t addr, unsigned int len)
>>>>  {
>>>>unsigned long mpidr = kvm_vcpu_get_mpidr_aff(vcpu);
>>>> +  struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
>>>> +  struct vgic_redist_region *rdreg = vgic_cpu->rdreg;
>>>>int target_vcpu_id = vcpu->vcpu_id;
>>>>u64 value;
>>>>  
>>>> @@ -286,7 +288,9 @@ static unsigned long 
>>>> vgic_uaccess_read_v3r_typer(struct kvm_vcpu *vcpu,
>>>>if (vgic_has_its(vcpu->kvm))
>>>>value |= GICR_TYPER_PLPIS;
>>>>  
>>>> -  /* reporting of the Last bit is not supported for userspace */
>>>> +  if (rdreg && (vgic_cpu->rdreg_index == (rdreg->free_index - 1)))
>>>> +  value |= GICR_TYPER_LAST;
>>>> +
>>>>return extract_bytes(value, addr & 7, len);
>>>>  }
>>>>  
>>>> @@ -714,6 +718,7 @@ int vgic_register_redist_iodev(struct kvm_vcpu *vcpu)
>>>>return -EINVAL;
>>>>  
>>>>vgic_cpu->rdreg = rdreg;
>>>> +  vgic_cpu->rdreg_index = rdreg->free_index;
>>> What happens if the next redistributor region we register has the base 
>>> address
>>> adjacent to this one?
>>>
>>> I'm really not familiar with the code, but is it not possible to create two
>>> Redistributor regions (via
>>> KVM_DEV_ARM_VGIC_GRP_ADDR(KVM_VGIC_V3_ADDR_TYPE_REDIST)) where the second
>>> Redistributor region start address is immediately after the last 
>>> Redistributor in
>>> the preceding region?
>> KVM_VGIC_V3_ADDR_TYPE_REDIST only allows to create a single rdist
>> region. Only KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION allows to register
>> several of them.
>>
>> with KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION, it is possible to register
>> adjacent rdist regions. vgic_v3_rdist_free_slot() previously returned
>> the 1st rdist region where enough space remains for inserting the new
>> reg. We put the rdist at the free index there.
>>
>> But maybe I misunderstood your question?
> 
> Yes, I think you did a good job at answering my poorly worded question.
> 
> This is the case I am concerned about:
> 
> 1. Userspace sets first redistributor base address to 0x0 via
> KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION(count = 1, base = 0x0, flags = 0, index = 
> 0).
> 
> 2. Userspace sets second redistributor base address to 0x0 + 128K, immediately
> following the previous Redistributor.
> 
> In that case the two Redistributors will be represented by two separate struct
> vgic_redist_region, but they are adjacent to one another and represent one
> contiguous memory region.
> 
> From what I understand from your patch, GICR_TYPER.Last will be set for both
> Redistributors, when it should be set only for the second Redistributor. Does 
> any
> of that make sense?

Please forgive me for not having replied before on this thread.

This is a valid concern
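
One way to picture the expected behaviour is the stand-alone model below. It is only a sketch: mock_rdreg, has_adjacent_successor() and rd_is_last() are hypothetical names, and every region is assumed to be fully populated, which the real code cannot assume.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RD_SIZE 0x20000ULL /* 128K per redistributor, as in the example above */

struct mock_rdreg {
	uint64_t base;
	unsigned int count; /* number of redistributors in the region */
};

/* True if some other region starts exactly where region r ends. */
static bool has_adjacent_successor(const struct mock_rdreg *regs, size_t nr,
				   size_t r)
{
	uint64_t end = regs[r].base + regs[r].count * RD_SIZE;
	size_t i;

	for (i = 0; i < nr; i++)
		if (i != r && regs[i].count && regs[i].base == end)
			return true;
	return false;
}

/* Last is set only for the final redistributor of a contiguous run. */
static bool rd_is_last(const struct mock_rdreg *regs, size_t nr, size_t r,
		       unsigned int index)
{
	assert(r < nr && index < regs[r].count);

	if (index != regs[r].count - 1)
		return false;
	return !has_adjacent_successor(regs, nr, r);
}
```

In the two-region case described above (bases 0x0 and 0x0 + 128K, one redistributor each), this model reports Last only for the second redistributor.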

Re: [PATCH v12 03/13] vfio: VFIO_IOMMU_SET_MSI_BINDING

2021-03-08 Thread Auger Eric
Hi Jean,

On 3/5/21 11:45 AM, Jean-Philippe Brucker wrote:
> Hi,
> 
> On Tue, Feb 23, 2021 at 10:06:15PM +0100, Eric Auger wrote:
>> This patch adds the VFIO_IOMMU_SET_MSI_BINDING ioctl which aims
>> to (un)register the guest MSI binding to the host. This latter
>> then can use those stage 1 bindings to build a nested stage
>> binding targeting the physical MSIs.
> 
> Now that RMR is in the IORT spec, could it be used for the nested MSI
> problem?  For virtio-iommu tables I was planning to do it like this:
> 
> MSI is mapped at stage-2 with an arbitrary IPA->doorbell PA. We report
> this IPA to userspace through iommu_groups/X/reserved_regions. No change
> there. Then to the guest we report a reserved identity mapping at IPA
> (using RMR, an equivalent DT binding, or probed RESV_MEM for
> virtio-iommu).

Is there any DT binding equivalent?

> The guest creates that mapping at stage-1, and that's it.
> Unless I overlooked something we'd only reuse existing infrastructure and
> avoid the SET_MSI_BINDING interface.

Yes at first glance I think this should work. The guest SMMU driver will
continue allocating IOVA for MSIs but I think that's not an issue as
they won't be used.

For the SMMU case this makes the guest behavior different from the
baremetal one though. Typically you will never get any S1 fault. Also
the S1 mapping is static and direct.

I will prototype this too.

Thanks

Eric
> 
> Thanks,
> Jean
> 



Re: [PATCH v14 00/13] SMMUv3 Nested Stage Setup (IOMMU part)

2021-02-25 Thread Auger Eric
Hi Shameer, all

On 2/23/21 9:56 PM, Eric Auger wrote:
> This series brings the IOMMU part of HW nested paging support
> in the SMMUv3. The VFIO part is submitted separately.
> 
> This is based on Jean-Philippe's
> [PATCH v12 00/10] iommu: I/O page faults for SMMUv3
> https://lore.kernel.org/linux-arm-kernel/YBfij71tyYvh8LhB@myrica/T/
> 
> The IOMMU API is extended to support 2 new API functionalities:
> 1) pass the guest stage 1 configuration
> 2) pass stage 1 MSI bindings
> 
> Then those capabilities get implemented in the SMMUv3 driver.
> 
> The virtualizer passes information through the VFIO user API
> which cascades them to the iommu subsystem. This allows the guest
> to own stage 1 tables and context descriptors (so-called PASID
> table) while the host owns stage 2 tables and main configuration
> structures (STE).
> 
> Best Regards
> 
> Eric
> 
> This series can be found at:
> https://github.com/eauger/linux/tree/v5.11-stallv12-2stage-v14
> (including the VFIO part in its last version: v12)

As committed, I have rebased the iommu + vfio part on top of Jean's
sva/current (5.11-rc4).

https://github.com/eauger/linux/tree/jean_sva_current_2stage_v14

I have not tested the SVA bits but I have tested there is no regression
from my pov.

From the QEMU perspective it works off the shelf with that branch, but if
you want to use other SVA related IOCTLs please remember to update the
linux headers.

Again thank you to all of you who reviewed and tested the previous version.

Thanks

Eric
> 
> The VFIO series is sent separately.
> 
> History:
> 
> Previous version (v13):
> https://github.com/eauger/linux/tree/5.10-rc4-2stage-v13
> 
> v13 -> v14:
> - Took into account all received comments I think. Great
>   thanks to all the testers for their effort and sometimes
>   fixes. I am really grateful to you!
> - numerous fixes including guest running in
>   noiommu, iommu.strict=0, iommu.passthrough=on,
>   enable_unsafe_noiommu_mode
> 
> v12 -> v13:
> - fixed compilation issue with CONFIG_ARM_SMMU_V3_SVA
>   reported by Shameer. This urged me to revisit patch 4 into
>   iommu/smmuv3: Allow s1 and s2 configs to coexist where
>   s1_cfg and s2_cfg are not dynamically allocated anymore.
>   Instead I use a new set field in existing structs
> - fixed 2 others config checks
> - Updated "iommu/arm-smmu-v3: Maintain a SID->device structure"
>   according to the last version
> 
> v11 -> v12:
> - rebase on top of v5.10-rc4
> 
> Eric Auger (13):
>   iommu: Introduce attach/detach_pasid_table API
>   iommu: Introduce bind/unbind_guest_msi
>   iommu/smmuv3: Allow s1 and s2 configs to coexist
>   iommu/smmuv3: Get prepared for nested stage support
>   iommu/smmuv3: Implement attach/detach_pasid_table
>   iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs
>   iommu/smmuv3: Implement cache_invalidate
>   dma-iommu: Implement NESTED_MSI cookie
>   iommu/smmuv3: Nested mode single MSI doorbell per domain enforcement
>   iommu/smmuv3: Enforce incompatibility between nested mode and HW MSI
> regions
>   iommu/smmuv3: Implement bind/unbind_guest_msi
>   iommu/smmuv3: report additional recoverable faults
>   iommu/smmuv3: Accept configs with more than one context descriptor
> 
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 444 ++--
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  14 +-
>  drivers/iommu/dma-iommu.c   | 142 ++-
>  drivers/iommu/iommu.c   | 106 +
>  include/linux/dma-iommu.h   |  16 +
>  include/linux/iommu.h   |  47 +++
>  include/uapi/linux/iommu.h  |  54 +++
>  7 files changed, 781 insertions(+), 42 deletions(-)
> 



Re: [PATCH v11 04/13] vfio/pci: Add VFIO_REGION_TYPE_NESTED region type

2021-02-23 Thread Auger Eric
Hi Shenming,

On 2/23/21 1:45 PM, Shenming Lu wrote:
>> +static int vfio_pci_dma_fault_init(struct vfio_pci_device *vdev)
>> +{
>> +struct vfio_region_dma_fault *header;
>> +struct iommu_domain *domain;
>> +size_t size;
>> +bool nested;
>> +int ret;
>> +
>> +domain = iommu_get_domain_for_dev(&vdev->pdev->dev);
>> +ret = iommu_domain_get_attr(domain, DOMAIN_ATTR_NESTING, &nested);
>> +if (ret || !nested)
>> +return ret;
> 
> Hi Eric,
> 
> It seems that the type of nested should be int, the use of bool might trigger
> a panic in arm_smmu_domain_get_attr().

Thank you. That's fixed now.
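
For anyone hitting the same issue, the sketch below shows why the storage type matters. It is a stand-alone mock: mock_domain_get_attr(), MOCK_ATTR_NESTING and query_nesting() are hypothetical names, and the int-sized store is assumed to mirror what the SMMU callback does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum mock_attr { MOCK_ATTR_NESTING };

/* Like a get_attr() callback: the callee decides the width of the store. */
static int mock_domain_get_attr(enum mock_attr attr, void *data)
{
	assert(data != NULL);

	switch (attr) {
	case MOCK_ATTR_NESTING:
		*(int *)data = 1; /* writes sizeof(int) bytes through data */
		return 0;
	default:
		return -ENODEV;
	}
}

/* Returns 1 if nesting is reported, 0 if not, or a negative errno. */
static int query_nesting(void)
{
	/* Must be int: a bool here would be overrun by the store above. */
	int nested = 0;
	int ret = mock_domain_get_attr(MOCK_ATTR_NESTING, &nested);

	return ret ? ret : nested;
}
```

Because the attribute is passed as a void pointer, the compiler cannot flag the mismatch, so the caller has to match the callee's type by hand.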

Best Regards

Eric
> 
> Thanks,
> Shenming
> 



Re: [PATCH v11 01/13] vfio: VFIO_IOMMU_SET_PASID_TABLE

2021-02-22 Thread Auger Eric
Hi Keqian,

On 2/22/21 1:20 PM, Keqian Zhu wrote:
> Hi Eric,
> 
> On 2021/2/22 18:53, Auger Eric wrote:
>> Hi Keqian,
>>
>> On 2/2/21 1:34 PM, Keqian Zhu wrote:
>>> Hi Eric,
>>>
>>> On 2020/11/16 19:00, Eric Auger wrote:
>>>> From: "Liu, Yi L" 
>>>>
>>>> This patch adds a VFIO_IOMMU_SET_PASID_TABLE ioctl
>>>> which aims to pass the virtual iommu guest configuration
>>>> to the host. This latter takes the form of the so-called
>>>> PASID table.
>>>>
>>>> Signed-off-by: Jacob Pan 
>>>> Signed-off-by: Liu, Yi L 
>>>> Signed-off-by: Eric Auger 
>>>>
>>>> ---
>>>> v11 -> v12:
>>>> - use iommu_uapi_set_pasid_table
>>>> - check SET and UNSET are not set simultaneously (Zenghui)
>>>>
>>>> v8 -> v9:
>>>> - Merge VFIO_IOMMU_ATTACH/DETACH_PASID_TABLE into a single
>>>>   VFIO_IOMMU_SET_PASID_TABLE ioctl.
>>>>
>>>> v6 -> v7:
>>>> - add a comment related to VFIO_IOMMU_DETACH_PASID_TABLE
>>>>
>>>> v3 -> v4:
>>>> - restore ATTACH/DETACH
>>>> - add unwind on failure
>>>>
>>>> v2 -> v3:
>>>> - s/BIND_PASID_TABLE/SET_PASID_TABLE
>>>>
>>>> v1 -> v2:
>>>> - s/BIND_GUEST_STAGE/BIND_PASID_TABLE
>>>> - remove the struct device arg
>>>> ---
>>>>  drivers/vfio/vfio_iommu_type1.c | 65 +
>>>>  include/uapi/linux/vfio.h   | 19 ++
>>>>  2 files changed, 84 insertions(+)
>>>>
>>>> diff --git a/drivers/vfio/vfio_iommu_type1.c 
>>>> b/drivers/vfio/vfio_iommu_type1.c
>>>> index 67e827638995..87ddd9e882dc 100644
>>>> --- a/drivers/vfio/vfio_iommu_type1.c
>>>> +++ b/drivers/vfio/vfio_iommu_type1.c
>>>> @@ -2587,6 +2587,41 @@ static int vfio_iommu_iova_build_caps(struct 
>>>> vfio_iommu *iommu,
>>>>return ret;
>>>>  }
>>>>  
>>>> +static void
>>>> +vfio_detach_pasid_table(struct vfio_iommu *iommu)
>>>> +{
>>>> +  struct vfio_domain *d;
>>>> +
>>>> +  mutex_lock(&iommu->lock);
>>>> +  list_for_each_entry(d, &iommu->domain_list, next)
>>>> +  iommu_detach_pasid_table(d->domain);
>>>> +
>>>> +  mutex_unlock(&iommu->lock);
>>>> +}
>>>> +
>>>> +static int
>>>> +vfio_attach_pasid_table(struct vfio_iommu *iommu, unsigned long arg)
>>>> +{
>>>> +  struct vfio_domain *d;
>>>> +  int ret = 0;
>>>> +
>>>> +  mutex_lock(&iommu->lock);
>>>> +
>>>> +  list_for_each_entry(d, &iommu->domain_list, next) {
>>>> +  ret = iommu_uapi_attach_pasid_table(d->domain, (void __user 
>>>> *)arg);
>>> This design is not very clear to me. This assumes all iommu_domains share 
>>> the same pasid table.
>>>
>>> As I understand, it's reasonable when there is only one group in the 
>>> domain, and only one domain in the vfio_iommu.
>>> If more than one group in the vfio_iommu, the guest may put them into 
>>> different guest iommu_domain, then they have different pasid table.
>>>
>>> Is this the use scenario?
>>
>> The vfio_iommu is attached to a container. All the groups within a
>> container share the same set of page tables (linux
>> Documentation/driver-api/vfio.rst). So to me if you want to use
>> different pasid tables, the groups need to be attached to different
>> containers. Does that make sense to you?
> OK, so this is what I understand about the design. A little question is that 
> when
> we perform attach_pasid_table on a container, maybe we ought to do a sanity
> check to make sure that only one group is in this container, instead of
> iterating all domain?
> 
> To be frank, my main concern is that if we put each group into different 
> container
> under nested mode, then we give up the possibility that they can share stage2 
> page tables,
> which saves host memory and reduces the time of preparing environment for VM.

Referring to the QEMU integration, when you use a virtual IOMMU, there
is generally one VFIO container per viommu protected device
(AddressSpace), independently of whether nested stage is being used. I
think the exception is if you put 2 assigned devices behind a virtual
PCIe to PCI bridge (pcie-pci-bridge), in that case they hav

Re: [PATCH v11 01/13] vfio: VFIO_IOMMU_SET_PASID_TABLE

2021-02-22 Thread Auger Eric
Hi Keqian,

On 2/2/21 1:34 PM, Keqian Zhu wrote:
> Hi Eric,
> 
> On 2020/11/16 19:00, Eric Auger wrote:
>> From: "Liu, Yi L" 
>>
>> This patch adds a VFIO_IOMMU_SET_PASID_TABLE ioctl
>> which aims to pass the virtual iommu guest configuration
>> to the host. This latter takes the form of the so-called
>> PASID table.
>>
>> Signed-off-by: Jacob Pan 
>> Signed-off-by: Liu, Yi L 
>> Signed-off-by: Eric Auger 
>>
>> ---
>> v11 -> v12:
>> - use iommu_uapi_set_pasid_table
>> - check SET and UNSET are not set simultaneously (Zenghui)
>>
>> v8 -> v9:
>> - Merge VFIO_IOMMU_ATTACH/DETACH_PASID_TABLE into a single
>>   VFIO_IOMMU_SET_PASID_TABLE ioctl.
>>
>> v6 -> v7:
>> - add a comment related to VFIO_IOMMU_DETACH_PASID_TABLE
>>
>> v3 -> v4:
>> - restore ATTACH/DETACH
>> - add unwind on failure
>>
>> v2 -> v3:
>> - s/BIND_PASID_TABLE/SET_PASID_TABLE
>>
>> v1 -> v2:
>> - s/BIND_GUEST_STAGE/BIND_PASID_TABLE
>> - remove the struct device arg
>> ---
>>  drivers/vfio/vfio_iommu_type1.c | 65 +
>>  include/uapi/linux/vfio.h   | 19 ++
>>  2 files changed, 84 insertions(+)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c 
>> b/drivers/vfio/vfio_iommu_type1.c
>> index 67e827638995..87ddd9e882dc 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -2587,6 +2587,41 @@ static int vfio_iommu_iova_build_caps(struct 
>> vfio_iommu *iommu,
>>  return ret;
>>  }
>>  
>> +static void
>> +vfio_detach_pasid_table(struct vfio_iommu *iommu)
>> +{
>> +struct vfio_domain *d;
>> +
>> +mutex_lock(&iommu->lock);
>> +list_for_each_entry(d, &iommu->domain_list, next)
>> +iommu_detach_pasid_table(d->domain);
>> +
>> +mutex_unlock(&iommu->lock);
>> +}
>> +
>> +static int
>> +vfio_attach_pasid_table(struct vfio_iommu *iommu, unsigned long arg)
>> +{
>> +struct vfio_domain *d;
>> +int ret = 0;
>> +
>> +mutex_lock(&iommu->lock);
>> +
>> +list_for_each_entry(d, &iommu->domain_list, next) {
>> +ret = iommu_uapi_attach_pasid_table(d->domain, (void __user 
>> *)arg);
> This design is not very clear to me. This assumes all iommu_domains share the 
> same pasid table.
> 
> As I understand, it's reasonable when there is only one group in the domain, 
> and only one domain in the vfio_iommu.
> If more than one group in the vfio_iommu, the guest may put them into 
> different guest iommu_domain, then they have different pasid table.
> 
> Is this the use scenario?

The vfio_iommu is attached to a container. All the groups within a
container share the same set of page tables (see
Documentation/driver-api/vfio.rst in the Linux tree). So to me, if you
want to use different PASID tables, the groups need to be attached to
different containers. Does that make sense to you?
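
To make the container semantics concrete, here is a minimal userspace
sketch of "apply one PASID table to every domain of the container, and
unwind on failure". It is only a model: struct model_domain,
model_attach_one() and model_attach_all() are hypothetical names, not
kernel API, and the fail_attach knob exists purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one iommu domain inside a container. */
struct model_domain {
	int has_pasid_table;	/* 1 once a PASID table is attached */
	int fail_attach;	/* illustration knob: force attach failure */
};

/* Attach the (single, shared) PASID table to one domain. */
static int model_attach_one(struct model_domain *d)
{
	if (d->fail_attach)
		return -1;
	d->has_pasid_table = 1;
	return 0;
}

/*
 * Attach the same PASID table to every domain of the container.
 * If one attach fails, detach from the domains already handled so
 * the container ends up in the state it had before the call.
 */
static int model_attach_all(struct model_domain *doms, size_t n)
{
	size_t i;

	assert(doms != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (model_attach_one(&doms[i])) {
			/* unwind: walk back over domains 0..i-1 */
			while (i-- > 0)
				doms[i].has_pasid_table = 0;
			return -1;
		}
	}
	return 0;
}
```

Since there is exactly one table argument for the whole loop, two
groups that need different PASID tables cannot be expressed in this
model without a second container, which is the point made above.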

Thanks

Eric
> 
> Thanks,
> Keqian
> 
>> +if (ret)
>> +goto unwind;
>> +}
>> +goto unlock;
>> +unwind:
>> +list_for_each_entry_continue_reverse(d, &iommu->domain_list, next) {
>> +iommu_detach_pasid_table(d->domain);
>> +}
>> +unlock:
>> +mutex_unlock(&iommu->lock);
>> +return ret;
>> +}
>> +
>>  static int vfio_iommu_migration_build_caps(struct vfio_iommu *iommu,
>> struct vfio_info_cap *caps)
>>  {
>> @@ -2747,6 +2782,34 @@ static int vfio_iommu_type1_unmap_dma(struct 
>> vfio_iommu *iommu,
>>  -EFAULT : 0;
>>  }
>>  
>> +static int vfio_iommu_type1_set_pasid_table(struct vfio_iommu *iommu,
>> +unsigned long arg)
>> +{
>> +struct vfio_iommu_type1_set_pasid_table spt;
>> +unsigned long minsz;
>> +int ret = -EINVAL;
>> +
>> +minsz = offsetofend(struct vfio_iommu_type1_set_pasid_table, flags);
>> +
>> +if (copy_from_user(&spt, (void __user *)arg, minsz))
>> +return -EFAULT;
>> +
>> +if (spt.argsz < minsz)
>> +return -EINVAL;
>> +
>> +if (spt.flags & VFIO_PASID_TABLE_FLAG_SET &&
>> +spt.flags & VFIO_PASID_TABLE_FLAG_UNSET)
>> +return -EINVAL;
>> +
>> +if (spt.flags & VFIO_PASID_TABLE_FLAG_SET)
>> +ret = vfio_attach_pasid_table(iommu, arg + minsz);
>> +else if (spt.flags & VFIO_PASID_TABLE_FLAG_UNSET) {
>> +vfio_detach_pasid_table(iommu);
>> +ret = 0;
>> +}
>> +return ret;
>> +}
>> +
>>  static int vfio_iommu_type1_dirty_pages(struct vfio_iommu *iommu,
>>  unsigned long arg)
>>  {
>> @@ -2867,6 +2930,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>>  return vfio_iommu_type1_unmap_dma(iommu, arg);
>>  case VFIO_IOMMU_DIRTY_PAGES:
>>  return vfio_iommu_type1_dirty_pages(iommu, arg);
>> +case VFIO_IOMMU_SET_PASID_TABLE:
>> +return vfio_iommu_type1_set_pasid_table(iommu, arg);
>>  default:
>>  return -ENOTTY;
>>  }
>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>> index 2f313a238a8f..78ce3ce6c331 100644
>> 

Re: [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part)

2021-02-21 Thread Auger Eric
Hi Shameer,
On 1/8/21 6:05 PM, Shameerali Kolothum Thodi wrote:
> Hi Eric,
> 
>> -Original Message-
>> From: Eric Auger [mailto:eric.au...@redhat.com]
>> Sent: 18 November 2020 11:22
>> To: eric.auger@gmail.com; eric.au...@redhat.com;
>> io...@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
>> k...@vger.kernel.org; kvm...@lists.cs.columbia.edu; w...@kernel.org;
>> j...@8bytes.org; m...@kernel.org; robin.mur...@arm.com;
>> alex.william...@redhat.com
>> Cc: jean-phili...@linaro.org; zhangfei@linaro.org;
>> zhangfei@gmail.com; vivek.gau...@arm.com; Shameerali Kolothum
>> Thodi ;
>> jacob.jun@linux.intel.com; yi.l@intel.com; t...@semihalf.com;
>> nicoleots...@gmail.com; yuzenghui 
>> Subject: [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part)
>>
>> This series brings the IOMMU part of HW nested paging support
>> in the SMMUv3. The VFIO part is submitted separately.
>>
>> The IOMMU API is extended to support 2 new API functionalities:
>> 1) pass the guest stage 1 configuration
>> 2) pass stage 1 MSI bindings
>>
>> Then those capabilities gets implemented in the SMMUv3 driver.
>>
>> The virtualizer passes information through the VFIO user API
>> which cascades them to the iommu subsystem. This allows the guest
>> to own stage 1 tables and context descriptors (so-called PASID
>> table) while the host owns stage 2 tables and main configuration
>> structures (STE).
> 
> I am seeing an issue with Guest testpmd run with this series.
> I have two different setups and testpmd works fine with the
> first one but not with the second.
> 
> 1). Guest doesn't have kernel driver built-in for pass-through dev.
> 
> root@ubuntu:/# lspci -v
> ...
> 00:02.0 Ethernet controller: Huawei Technologies Co., Ltd. Device a22e (rev 
> 21)
> Subsystem: Huawei Technologies Co., Ltd. Device 
> Flags: fast devsel
> Memory at 800010 (64-bit, prefetchable) [disabled] [size=64K]
> Memory at 80 (64-bit, prefetchable) [disabled] [size=1M]
> Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00
> Capabilities: [a0] MSI-X: Enable- Count=67 Masked-
> Capabilities: [b0] Power Management version 3
> Capabilities: [100] Access Control Services
> Capabilities: [300] Transaction Processing Hints
> 
> root@ubuntu:/# echo vfio-pci > 
> /sys/bus/pci/devices/:00:02.0/driver_override
> root@ubuntu:/# echo :00:02.0 > /sys/bus/pci/drivers_probe
> 
> root@ubuntu:/mnt/dpdk/build/app# ./testpmd -w :00:02.0 --file-prefix 
> socket0  -l 0-1 -n 2 -- -i
> EAL: Detected 8 lcore(s)
> EAL: Detected 1 NUMA nodes
> EAL: Multi-process socket /var/run/dpdk/socket0/mp_socket
> EAL: Selected IOVA mode 'VA'
> EAL: No available hugepages reported in hugepages-32768kB
> EAL: No available hugepages reported in hugepages-64kB
> EAL: No available hugepages reported in hugepages-1048576kB
> EAL: Probing VFIO support...
> EAL: VFIO support initialized
> EAL:   Invalid NUMA socket, default to 0
> EAL:   using IOMMU type 1 (Type 1)
> EAL: Probe PCI driver: net_hns3_vf (19e5:a22e) device: :00:02.0 (socket 0)
> EAL: No legacy callbacks, legacy socket not created
> Interactive-mode selected
> testpmd: create a new mbuf pool : n=155456, size=2176, 
> socket=0
> testpmd: preferred mempool ops selected: ring_mp_mc
> 
> Warning! port-topology=paired and odd forward ports number, the last port 
> will pair with itself.
> 
> Configuring Port 0 (socket 0)
> Port 0: 8E:A6:8C:43:43:45
> Checking link statuses...
> Done
> testpmd>
> 
> 2). Guest have kernel driver built-in for pass-through dev.
> 
> root@ubuntu:/# lspci -v
> ...
> 00:02.0 Ethernet controller: Huawei Technologies Co., Ltd. Device a22e (rev 
> 21)
> Subsystem: Huawei Technologies Co., Ltd. Device 
> Flags: bus master, fast devsel, latency 0
> Memory at 800010 (64-bit, prefetchable) [size=64K]
> Memory at 80 (64-bit, prefetchable) [size=1M]
> Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00
> Capabilities: [a0] MSI-X: Enable+ Count=67 Masked-
> Capabilities: [b0] Power Management version 3
> Capabilities: [100] Access Control Services
> Capabilities: [300] Transaction Processing Hints
> Kernel driver in use: hns3
> 
> root@ubuntu:/# echo vfio-pci > 
> /sys/bus/pci/devices/:00:02.0/driver_override
> root@ubuntu:/# echo :00:02.0 > /sys/bus/pci/drivers/hns3/unbind
> root@ubuntu:/# echo :00:02.0 > /sys/bus/pci/drivers_probe
> 
> root@ubuntu:/mnt/dpdk/build/app# ./testpmd -w :00:02.0 --file-prefix 
> socket0 -l 0-1 -n 2 -- -i
> EAL: Detected 8 lcore(s)
> EAL: Detected 1 NUMA nodes
> EAL: Multi-process socket /var/run/dpdk/socket0/mp_socket
> EAL: Selected IOVA mode 'VA'
> EAL: No available hugepages reported in hugepages-32768kB
> EAL: No available hugepages reported in hugepages-64kB
> EAL: No available hugepages reported in hugepages-1048576kB
> EAL: Probing VFIO support...
> EAL: VFIO support initialized
> EAL:   Invalid NUMA socket, default to 0
> EAL:   using IOMMU type 1 (Type 1)
> EAL: Probe 

Re: [PATCH v11 12/13] vfio/pci: Register a DMA fault response region

2021-02-18 Thread Auger Eric
Hi Shameer,

On 2/18/21 11:36 AM, Shameerali Kolothum Thodi wrote:
> Hi Eric,
> 
>>> -Original Message-
>>> From: Eric Auger [mailto:eric.au...@redhat.com]
>>> Sent: 16 November 2020 11:00
>>> To: eric.auger@gmail.com; eric.au...@redhat.com;
>>> io...@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
>>> k...@vger.kernel.org; kvm...@lists.cs.columbia.edu; w...@kernel.org;
>>> j...@8bytes.org; m...@kernel.org; robin.mur...@arm.com;
>>> alex.william...@redhat.com
>>> Cc: jean-phili...@linaro.org; zhangfei@linaro.org;
>>> zhangfei@gmail.com; vivek.gau...@arm.com; Shameerali Kolothum
>>> Thodi ;
>>> jacob.jun@linux.intel.com; yi.l@intel.com; t...@semihalf.com;
>>> nicoleots...@gmail.com; yuzenghui 
>>> Subject: [PATCH v11 12/13] vfio/pci: Register a DMA fault response
>>> region
>>>
>>> In preparation for vSVA, let's register a DMA fault response region,
>>> where the userspace will push the page responses and increment the
>>> head of the buffer. The kernel will pop those responses and inject
>>> them on iommu side.
>>>
>>> Signed-off-by: Eric Auger 
>>> ---
>>>  drivers/vfio/pci/vfio_pci.c | 114 +---
>>>  drivers/vfio/pci/vfio_pci_private.h |   5 ++
>>>  drivers/vfio/pci/vfio_pci_rdwr.c|  39 ++
>>>  include/uapi/linux/vfio.h   |  32 
>>>  4 files changed, 181 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
>>> index 65a83fd0e8c0..e9a904ce3f0d 100644
>>> --- a/drivers/vfio/pci/vfio_pci.c
>>> +++ b/drivers/vfio/pci/vfio_pci.c
>>> @@ -318,9 +318,20 @@ static void vfio_pci_dma_fault_release(struct
>>> vfio_pci_device *vdev,
>>> kfree(vdev->fault_pages);
>>>  }
>>>
>>> -static int vfio_pci_dma_fault_mmap(struct vfio_pci_device *vdev,
>>> -  struct vfio_pci_region *region,
>>> -  struct vm_area_struct *vma)
>>> +static void
>>> +vfio_pci_dma_fault_response_release(struct vfio_pci_device *vdev,
>>> +   struct vfio_pci_region *region) {
>>> +   if (vdev->dma_fault_response_wq)
>>> +   destroy_workqueue(vdev->dma_fault_response_wq);
>>> +   kfree(vdev->fault_response_pages);
>>> +   vdev->fault_response_pages = NULL;
>>> +}
>>> +
>>> +static int __vfio_pci_dma_fault_mmap(struct vfio_pci_device *vdev,
>>> +struct vfio_pci_region *region,
>>> +struct vm_area_struct *vma,
>>> +u8 *pages)
>>>  {
>>> u64 phys_len, req_len, pgoff, req_start;
>>> unsigned long long addr;
>>> @@ -333,14 +344,14 @@ static int vfio_pci_dma_fault_mmap(struct
>>> vfio_pci_device *vdev,
>>> ((1U << (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT)) - 1);
>>> req_start = pgoff << PAGE_SHIFT;
>>>
>>> -   /* only the second page of the producer fault region is mmappable */
>>> +   /* only the second page of the fault region is mmappable */
>>> if (req_start < PAGE_SIZE)
>>> return -EINVAL;
>>>
>>> if (req_start + req_len > phys_len)
>>> return -EINVAL;
>>>
>>> -   addr = virt_to_phys(vdev->fault_pages);
>>> +   addr = virt_to_phys(pages);
>>> vma->vm_private_data = vdev;
>>> vma->vm_pgoff = (addr >> PAGE_SHIFT) + pgoff;
>>>
>>> @@ -349,13 +360,29 @@ static int vfio_pci_dma_fault_mmap(struct
>>> vfio_pci_device *vdev,
>>> return ret;
>>>  }
>>>
>>> -static int vfio_pci_dma_fault_add_capability(struct vfio_pci_device *vdev,
>>> -struct vfio_pci_region *region,
>>> -struct vfio_info_cap *caps)
>>> +static int vfio_pci_dma_fault_mmap(struct vfio_pci_device *vdev,
>>> +  struct vfio_pci_region *region,
>>> +  struct vm_area_struct *vma)
>>> +{
>>> +   return __vfio_pci_dma_fault_mmap(vdev, region, vma,
>>> vdev->fault_pages);
>>> +}
>>> +
>>> +static int
>>> +vfio_pci_dma_fault_response_mmap(struct vfio_pci_device *vdev,
>>> +   struct vfio_pci_region *region,
>>> +   struct vm_area_struct *vma)
>>> +{
>>> +   return __vfio_pci_dma_fault_mmap(vdev, region, vma,
>>> vdev->fault_response_pages);
>>> +}
>>> +
>>> +static int __vfio_pci_dma_fault_add_capability(struct vfio_pci_device 
>>> *vdev,
>>> +  struct vfio_pci_region *region,
>>> +  struct vfio_info_cap *caps,
>>> +  u32 cap_id)
>>>  {
>>> struct vfio_region_info_cap_sparse_mmap *sparse = NULL;
>>> struct vfio_region_info_cap_fault cap = {
>>> -   .header.id = VFIO_REGION_INFO_CAP_DMA_FAULT,
>>> +   .header.id = cap_id,
>>> .header.version = 1,
>>> .version = 1,
>>> };
>>> @@ -383,6 +410,14 @@ static int
>>> vfio_pci_dma_fault_add_capability(struct
>>> 

Re: [PATCH v13 02/15] iommu: Introduce bind/unbind_guest_msi

2021-02-18 Thread Auger Eric
Hi Keqian,

On 2/18/21 9:43 AM, Keqian Zhu wrote:
> Hi Eric,
> 
> On 2021/2/12 16:55, Auger Eric wrote:
>> Hi Keqian,
>>
>> On 2/1/21 12:52 PM, Keqian Zhu wrote:
>>> Hi Eric,
>>>
>>> On 2020/11/18 19:21, Eric Auger wrote:
>>>> On ARM, MSI are translated by the SMMU. An IOVA is allocated
>>>> for each MSI doorbell. If both the host and the guest are exposed
>>>> with SMMUs, we end up with 2 different IOVAs allocated by each.
>>>> guest allocates an IOVA (gIOVA) to map onto the guest MSI
>>>> doorbell (gDB). The Host allocates another IOVA (hIOVA) to map
>>>> onto the physical doorbell (hDB).
>>>>
>>>> So we end up with 2 untied mappings:
>>>>          S1            S2
>>>> gIOVA    ->    gDB
>>>>               hIOVA    ->    hDB
>>>>
>>>> Currently the PCI device is programmed by the host with hIOVA
>>>> as MSI doorbell. So this does not work.
>>>>
>>>> This patch introduces an API to pass gIOVA/gDB to the host so
>>>> that gIOVA can be reused by the host instead of re-allocating
>>>> a new IOVA. So the goal is to create the following nested mapping:
>>> Does the gDB can be reused under non-nested mode?
>>
>> Under non nested mode the hIOVA is allocated within the MSI reserved
>> region exposed by the SMMU driver, [0x800, 80f]. see
>> iommu_dma_prepare_msi/iommu_dma_get_msi_page in dma_iommu.c. this hIOVA
>> is programmed in the physical device so that the physical SMMU
>> translates it into the physical doorbell (hDB = host physical ITS
> So, AFAIU, under non-nested mode, at smmu side, we reuse the workflow of 
> non-virtualization scenario.
Without virtualization, the host kernel also transparently allocates an
IOVA to map the doorbell. With standard passthrough without a vIOMMU,
the IOVA window is different (MSI RESV region).

Thanks

Eric
> 
>> doorbell). The gDB is not used at pIOMMU programming level. It is only
>> used when setting up the KVM irq route.
>>
>> Hope this answers your question.
> Thanks for your explanation!
>>
> 
> Thanks,
> Keqian
> 
>>>
>>>>
>>>>          S1            S2
>>>> gIOVA    ->    gDB     ->    hDB
>>>>
>>>> and program the PCI device with gIOVA MSI doorbell.
>>>>
>>>> In case we have several devices attached to this nested domain
>>>> (devices belonging to the same group), they cannot be isolated
>>>> on guest side either. So they should also end up in the same domain
>>>> on guest side. We will enforce that all the devices attached to
>>>> the host iommu domain use the same physical doorbell and similarly
>>>> a single virtual doorbell mapping gets registered (1 single
>>>> virtual doorbell is used on guest as well).
>>>>
>>> [...]
>>>
>>>> + *
>>>> + * The associated IOVA can be reused by the host to create a nested
>>>> + * stage2 binding mapping translating into the physical doorbell used
>>>> + * by the devices attached to the domain.
>>>> + *
>>>> + * All devices within the domain must share the same physical doorbell.
>>>> + * A single MSI GIOVA/GPA mapping can be attached to an iommu_domain.
>>>> + */
>>>> +
>>>> +int iommu_bind_guest_msi(struct iommu_domain *domain,
>>>> +   dma_addr_t giova, phys_addr_t gpa, size_t size)
>>>> +{
>>>> +  if (unlikely(!domain->ops->bind_guest_msi))
>>>> +  return -ENODEV;
>>>> +
>>>> +  return domain->ops->bind_guest_msi(domain, giova, gpa, size);
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(iommu_bind_guest_msi);
>>>> +
>>>> +void iommu_unbind_guest_msi(struct iommu_domain *domain,
>>>> +  dma_addr_t iova)
>>> nit: s/iova/giova
>> sure
>>>
>>>> +{
>>>> +  if (unlikely(!domain->ops->unbind_guest_msi))
>>>> +  return;
>>>> +
>>>> +  domain->ops->unbind_guest_msi(domain, iova);
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(iommu_unbind_guest_msi);
>>>> +
>>> [...]
>>>
>>> Thanks,
>>> Keqian
>>>
>>
>> Thanks
>>
>> Eric
>>
>> .
>>
> 



Re: [PATCH v13 07/15] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs

2021-02-15 Thread Auger Eric
Hi Shameer,

On 12/3/20 7:42 PM, Shameerali Kolothum Thodi wrote:
> Hi Eric,
> 
>> -Original Message-
>> From: kvmarm-boun...@lists.cs.columbia.edu
>> [mailto:kvmarm-boun...@lists.cs.columbia.edu] On Behalf Of Auger Eric
>> Sent: 01 December 2020 13:59
>> To: wangxingang 
>> Cc: Xieyingtai ; jean-phili...@linaro.org;
>> k...@vger.kernel.org; m...@kernel.org; j...@8bytes.org; w...@kernel.org;
>> io...@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
>> vivek.gau...@arm.com; alex.william...@redhat.com;
>> zhangfei@linaro.org; robin.mur...@arm.com;
>> kvm...@lists.cs.columbia.edu; eric.auger@gmail.com
>> Subject: Re: [PATCH v13 07/15] iommu/smmuv3: Allow stage 1 invalidation with
>> unmanaged ASIDs
>>
>> Hi Xingang,
>>
>> On 12/1/20 2:33 PM, Xingang Wang wrote:
>>> Hi Eric
>>>
>>> On  Wed, 18 Nov 2020 12:21:43, Eric Auger wrote:
>>>> @@ -1710,7 +1710,11 @@ static void arm_smmu_tlb_inv_context(void
>> *cookie)
>>>> * insertion to guarantee those are observed before the TLBI. Do be
>>>> * careful, 007.
>>>> */
>>>> -  if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
>>>> +  if (ext_asid >= 0) { /* guest stage 1 invalidation */
>>>> +  cmd.opcode  = CMDQ_OP_TLBI_NH_ASID;
>>>> +  cmd.tlbi.asid   = ext_asid;
>>>> +  cmd.tlbi.vmid   = smmu_domain->s2_cfg.vmid;
>>>> +  } else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
>>>
>>> Found a problem here, the cmd for guest stage 1 invalidation is built,
>>> but it is not delivered to smmu.
>>>
>>
>> Thank you for the report. I will fix that soon. With that fixed, have
>> you been able to run vSVA on top of the series. Do you need other stuff
>> to be fixed at SMMU level? 
> 
> I am seeing another issue with this series. This is when you have the vSMMU
> in non-strict mode(iommu.strict=0). Any network pass-through dev with iperf 
> run 
> will be enough to reproduce the issue. It may randomly stop/hang.
> 
> It looks like the .flush_iotlb_all from guest is not propagated down to the 
> host
> correctly. I have a temp hack to fix this in Qemu wherein CMDQ_OP_TLBI_NH_ASID
> will result in a CACHE_INVALIDATE with IOMMU_INV_GRANU_PASID flag and archid
> set.

Thank you for the analysis. Indeed the NH_ASID was not properly handled
as asid info was not passed down. I fixed domain invalidation and added
asid based invalidation.

Thanks

Eric
> 
> Please take a look and let me know. 
> 
> As I am going to respin soon, please let me
>> know what is the best branch to rebase to alleviate your integration.
> 
> Please find the latest kernel and Qemu branch with vSVA support added here,
> 
> https://github.com/hisilicon/kernel-dev/tree/5.10-rc4-2stage-v13-vsva
> https://github.com/hisilicon/qemu/tree/v5.2.0-rc1-2stage-rfcv7-vsva
> 
> I have done some basic minimum vSVA tests on a HiSilicon D06 board with
> a zip dev that supports STALL. All looks good so far apart from the issues
> that have been already reported/discussed.
> 
> The kernel branch is actually a rebase of sva/uacce related patches from a
> Linaro branch here,
> 
> https://github.com/Linaro/linux-kernel-uadk/tree/uacce-devel-5.10
> 
> I think going forward it will be good(if possible) to respin your series on 
> top of
> a sva branch with STALL/PRI support added. 
> 
> Hi Jean/zhangfei,
> Is it possible to have a branch with minimum required SVA/UACCE related 
> patches
> that are already public and can be a "stable" candidate for future respin of 
> Eric's series?
> Please share your thoughts.
> 
> Thanks,
> Shameer 
> 
>> Best Regards
>>
>> Eric
>>
>> ___
>> kvmarm mailing list
>> kvm...@lists.cs.columbia.edu
>> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
> 



Re: [PATCH 2/2] iommu: arm-smmu-v3: Report domain nesting info reuqired for stage1

2021-02-12 Thread Auger Eric
Hi Vivek,

On 2/12/21 11:58 AM, Vivek Gautam wrote:
> Update nested domain information required for stage1 page table.
> 
> Signed-off-by: Vivek Gautam 
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 16 ++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index c11dd3940583..728018921fae 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -2555,6 +2555,7 @@ static int arm_smmu_domain_nesting_info(struct 
> arm_smmu_domain *smmu_domain,
>   void *data)
>  {
>   struct iommu_nesting_info *info = (struct iommu_nesting_info *)data;
> + struct arm_smmu_device *smmu = smmu_domain->smmu;
>   unsigned int size;
>  
>   if (!info || smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
> @@ -2571,9 +2572,20 @@ static int arm_smmu_domain_nesting_info(struct 
> arm_smmu_domain *smmu_domain,
>   return 0;
>   }
>  
> - /* report an empty iommu_nesting_info for now */
> - memset(info, 0x0, size);
> + /* Update the nesting info as required for stage1 page tables */
> + info->addr_width = smmu->ias;
> + info->format = IOMMU_PASID_FORMAT_ARM_SMMU_V3;
> + info->features = IOMMU_NESTING_FEAT_BIND_PGTBL |
> +  IOMMU_NESTING_FEAT_PAGE_RESP |
IOMMU_NESTING_FEAT_PAGE_RESP definition is missing too

Eric
> +  IOMMU_NESTING_FEAT_CACHE_INVLD;
> + info->pasid_bits = smmu->ssid_bits;
> + info->vendor.smmuv3.asid_bits = smmu->asid_bits;
> + info->vendor.smmuv3.pgtbl_fmt = ARM_64_LPAE_S1;
> + memset(&info->padding, 0x0, 12);
> + memset(&info->vendor.smmuv3.padding, 0x0, 9);
> +
>   info->argsz = size;
> +
>   return 0;
>  }
>  
> 



Re: [PATCH 2/2] iommu: arm-smmu-v3: Report domain nesting info reuqired for stage1

2021-02-12 Thread Auger Eric
Hi Vivek,

On 2/12/21 11:58 AM, Vivek Gautam wrote:
> Update nested domain information required for stage1 page table.

s/reuqired/required in the commit title
> 
> Signed-off-by: Vivek Gautam 
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 16 ++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index c11dd3940583..728018921fae 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -2555,6 +2555,7 @@ static int arm_smmu_domain_nesting_info(struct 
> arm_smmu_domain *smmu_domain,
>   void *data)
>  {
>   struct iommu_nesting_info *info = (struct iommu_nesting_info *)data;
> + struct arm_smmu_device *smmu = smmu_domain->smmu;
>   unsigned int size;
>  
>   if (!info || smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
> @@ -2571,9 +2572,20 @@ static int arm_smmu_domain_nesting_info(struct 
> arm_smmu_domain *smmu_domain,
>   return 0;
>   }
>  
> - /* report an empty iommu_nesting_info for now */
> - memset(info, 0x0, size);
> + /* Update the nesting info as required for stage1 page tables */
> + info->addr_width = smmu->ias;
> + info->format = IOMMU_PASID_FORMAT_ARM_SMMU_V3;
> + info->features = IOMMU_NESTING_FEAT_BIND_PGTBL |
I understood IOMMU_NESTING_FEAT_BIND_PGTBL advertises the requirement to
bind tables per PASID, i.e. passing iommu_gpasid_bind_data.
In the ARM case I guess you plan to use the attach/detach_pasid_table
API with the iommu_pasid_table_config struct. So I understood we should
add a new feature here.
> +  IOMMU_NESTING_FEAT_PAGE_RESP |
> +  IOMMU_NESTING_FEAT_CACHE_INVLD;
> + info->pasid_bits = smmu->ssid_bits;
> + info->vendor.smmuv3.asid_bits = smmu->asid_bits;
> + info->vendor.smmuv3.pgtbl_fmt = ARM_64_LPAE_S1;
> + memset(&info->padding, 0x0, 12);
> + memset(&info->vendor.smmuv3.padding, 0x0, 9);
> +
>   info->argsz = size;
> +
spurious new line
>   return 0;
>  }
>  
> 



Re: [PATCH 1/2] iommu: Report domain nesting info for arm-smmu-v3

2021-02-12 Thread Auger Eric
Hi Vivek,
On 2/12/21 11:58 AM, Vivek Gautam wrote:
> Add a vendor specific structure for domain nesting info for
> arm smmu-v3, and necessary info fields required to populate
> stage1 page tables.
> 
> Signed-off-by: Vivek Gautam 
> ---
>  include/uapi/linux/iommu.h | 31 +--
>  1 file changed, 25 insertions(+), 6 deletions(-)
> 
> diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
> index 4d3d988fa353..5f059bcf7720 100644
> --- a/include/uapi/linux/iommu.h
> +++ b/include/uapi/linux/iommu.h
> @@ -323,7 +323,8 @@ struct iommu_gpasid_bind_data {
>  #define IOMMU_GPASID_BIND_VERSION_1  1
>   __u32 version;
>  #define IOMMU_PASID_FORMAT_INTEL_VTD 1
> -#define IOMMU_PASID_FORMAT_LAST  2
> +#define IOMMU_PASID_FORMAT_ARM_SMMU_V3   2
> +#define IOMMU_PASID_FORMAT_LAST  3
>   __u32 format;
>   __u32 addr_width;
>  #define IOMMU_SVA_GPASID_VAL (1 << 0) /* guest PASID valid */
> @@ -409,6 +410,21 @@ struct iommu_nesting_info_vtd {
>   __u64   ecap_reg;
>  };
>  
> +/*
> + * struct iommu_nesting_info_arm_smmuv3 - Arm SMMU-v3 nesting info.
> + */
> +struct iommu_nesting_info_arm_smmuv3 {
> + __u32   flags;
> + __u16   asid_bits;
> +
> + /* Arm LPAE page table format as per kernel */
> +#define ARM_PGTBL_32_LPAE_S1 (0x0)
> +#define ARM_PGTBL_64_LPAE_S1 (0x2)
Shouldn't it be a bitfield instead, as both can be supported (the actual
driver only supports the 64b table format though)? Does it match
IDR0.TTF?
> + __u8pgtbl_fmt;
So I understand this API is supposed to allow VFIO to expose this info
early enough to userspace to help configure the vIOMMU and avoid
errors later on. I wonder how far we want to go down this path. What
about the other caps that impact STE/CD validity? There may be others...

SMMU_IDR0.CD2L (support of 2-level CD tables)
SMMU_IDR0.TTENDIAN (endianness)
SMMU_IDR0.HTTU (if 0 forbids HA/HD setting in the CD)
SMMU_IDR3.STT (impacts T0SZ)
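
To illustrate the bitfield suggestion, a minimal sketch of how a
pgtbl_fmt bitmask could be derived from SMMU_IDR0.TTF (bits [3:2],
where 0b01 is AArch32 LPAE, 0b10 is AArch64 and 0b11 is both). The
MODEL_* names and the bit assignment are assumptions made for this
example only; they are not part of the posted patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical uAPI bits: one bit per supported stage 1 table format. */
#define MODEL_PGTBL_FMT_32_LPAE_S1	(1u << 0)
#define MODEL_PGTBL_FMT_64_LPAE_S1	(1u << 1)

/* SMMU_IDR0.TTF occupies bits [3:2] of the register. */
#define MODEL_IDR0_TTF_SHIFT	2
#define MODEL_IDR0_TTF_MASK	0x3u

/* Translate the TTF field into the hypothetical format bitmask. */
static uint8_t model_pgtbl_fmts_from_idr0(uint32_t idr0)
{
	uint32_t ttf = (idr0 >> MODEL_IDR0_TTF_SHIFT) & MODEL_IDR0_TTF_MASK;

	switch (ttf) {
	case 1:	/* AArch32 (LPAE) only */
		return MODEL_PGTBL_FMT_32_LPAE_S1;
	case 2:	/* AArch64 only */
		return MODEL_PGTBL_FMT_64_LPAE_S1;
	case 3:	/* both formats */
		return MODEL_PGTBL_FMT_32_LPAE_S1 | MODEL_PGTBL_FMT_64_LPAE_S1;
	default: /* 0 is reserved: report nothing */
		return 0;
	}
}

/* Userspace-side check: is the wanted format advertised? */
static int model_fmt_supported(uint8_t fmts, uint8_t want)
{
	assert(want != 0);
	return (fmts & want) == want;
}
```

With this bit assignment the mask is numerically equal to the TTF
value, so a host supporting both formats is reported without needing a
third enum value.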

Thanks

Eric

> +
> + __u8padding[9];
> +};
> +
>  /*
>   * struct iommu_nesting_info - Information for nesting-capable IOMMU.
>   *  userspace should check it before using
> @@ -445,11 +461,13 @@ struct iommu_nesting_info_vtd {
>   * +---+--+
>   *
>   * data struct types defined for @format:
> - * ++=+
> - * | @format| data struct |
> - * ++=+
> - * | IOMMU_PASID_FORMAT_INTEL_VTD   | struct iommu_nesting_info_vtd   |
> - * ++-+
> + * ++==+
> + * | @format| data struct  |
> + * ++==+
> + * | IOMMU_PASID_FORMAT_INTEL_VTD   | struct iommu_nesting_info_vtd|
> + * +---+---+
> + * | IOMMU_PASID_FORMAT_ARM_SMMU_V3 | struct iommu_nesting_info_arm_smmuv3 |
> + * ++--+
>   *
>   */
>  struct iommu_nesting_info {
> @@ -466,6 +484,7 @@ struct iommu_nesting_info {
>   /* Vendor specific data */
>   union {
>   struct iommu_nesting_info_vtd vtd;
> + struct iommu_nesting_info_arm_smmuv3 smmuv3;
>   } vendor;
>  };
>  
> 



Re: [PATCH v13 02/15] iommu: Introduce bind/unbind_guest_msi

2021-02-12 Thread Auger Eric
Hi Keqian,

On 2/1/21 12:52 PM, Keqian Zhu wrote:
> Hi Eric,
> 
> On 2020/11/18 19:21, Eric Auger wrote:
>> On ARM, MSI are translated by the SMMU. An IOVA is allocated
>> for each MSI doorbell. If both the host and the guest are exposed
>> with SMMUs, we end up with 2 different IOVAs allocated by each.
>> guest allocates an IOVA (gIOVA) to map onto the guest MSI
>> doorbell (gDB). The Host allocates another IOVA (hIOVA) to map
>> onto the physical doorbell (hDB).
>>
>> So we end up with 2 untied mappings:
>>          S1            S2
>> gIOVA    ->    gDB
>>               hIOVA    ->    hDB
>>
>> Currently the PCI device is programmed by the host with hIOVA
>> as MSI doorbell. So this does not work.
>>
>> This patch introduces an API to pass gIOVA/gDB to the host so
>> that gIOVA can be reused by the host instead of re-allocating
>> a new IOVA. So the goal is to create the following nested mapping:
> Does the gDB can be reused under non-nested mode?

Under non-nested mode the hIOVA is allocated within the MSI reserved
region exposed by the SMMU driver, [0x800, 80f]. See
iommu_dma_prepare_msi/iommu_dma_get_msi_page in dma-iommu.c. This hIOVA
is programmed in the physical device so that the physical SMMU
translates it into the physical doorbell (hDB = host physical ITS
doorbell). The gDB is not used at pIOMMU programming level. It is only
used when setting up the KVM irq route.

Hope this answers your question.

> 
>>
>>          S1            S2
>> gIOVA    ->    gDB     ->    hDB
>>
>> and program the PCI device with gIOVA MSI doorbell.
>>
>> In case we have several devices attached to this nested domain
>> (devices belonging to the same group), they cannot be isolated
>> on guest side either. So they should also end up in the same domain
>> on guest side. We will enforce that all the devices attached to
>> the host iommu domain use the same physical doorbell and similarly
>> a single virtual doorbell mapping gets registered (1 single
>> virtual doorbell is used on guest as well).
>>
> [...]
> 
>> + *
>> + * The associated IOVA can be reused by the host to create a nested
>> + * stage2 binding mapping translating into the physical doorbell used
>> + * by the devices attached to the domain.
>> + *
>> + * All devices within the domain must share the same physical doorbell.
>> + * A single MSI GIOVA/GPA mapping can be attached to an iommu_domain.
>> + */
>> +
>> +int iommu_bind_guest_msi(struct iommu_domain *domain,
>> + dma_addr_t giova, phys_addr_t gpa, size_t size)
>> +{
>> +if (unlikely(!domain->ops->bind_guest_msi))
>> +return -ENODEV;
>> +
>> +return domain->ops->bind_guest_msi(domain, giova, gpa, size);
>> +}
>> +EXPORT_SYMBOL_GPL(iommu_bind_guest_msi);
>> +
>> +void iommu_unbind_guest_msi(struct iommu_domain *domain,
>> +dma_addr_t iova)
> nit: s/iova/giova
sure
> 
>> +{
>> +if (unlikely(!domain->ops->unbind_guest_msi))
>> +return;
>> +
>> +domain->ops->unbind_guest_msi(domain, iova);
>> +}
>> +EXPORT_SYMBOL_GPL(iommu_unbind_guest_msi);
>> +
> [...]
> 
> Thanks,
> Keqian
> 

Thanks

Eric



Re: [PATCH v13 05/15] iommu/smmuv3: Get prepared for nested stage support

2021-02-11 Thread Auger Eric
Hi Keqian,

On 2/2/21 8:14 AM, Keqian Zhu wrote:
> Hi Eric,
> 
> On 2020/11/18 19:21, Eric Auger wrote:
>> When nested stage translation is setup, both s1_cfg and
>> s2_cfg are set.
>>
>> We introduce a new smmu domain abort field that will be set
>> upon guest stage1 configuration passing.
>>
>> arm_smmu_write_strtab_ent() is modified to write both stage
>> fields in the STE and deal with the abort field.
>>
>> In nested mode, only stage 2 is "finalized" as the host does
>> not own/configure the stage 1 context descriptor; guest does.
>>
>> Signed-off-by: Eric Auger 
>>
>> ---
>> v10 -> v11:
>> - Fix an issue reported by Shameer when switching from with vSMMU
>>   to without vSMMU. Despite the spec does not seem to mention it
>>   seems to be needed to reset the 2 high 64b when switching from
>>   S1+S2 cfg to S1 only. Especially dst[3] needs to be reset (S2TTB).
>>   On some implementations, if the S2TTB is not reset, this causes
>>   a C_BAD_STE error
>> ---
>>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 64 +
>>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  2 +
>>  2 files changed, 56 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> index 18ac5af1b284..412ea1bafa50 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> @@ -1181,8 +1181,10 @@ static void arm_smmu_write_strtab_ent(struct 
>> arm_smmu_master *master, u32 sid,
>>   * three cases at the moment:
>>   *
>>   * 1. Invalid (all zero) -> bypass/fault (init)
>> - * 2. Bypass/fault -> translation/bypass (attach)
>> - * 3. Translation/bypass -> bypass/fault (detach)
>> + * 2. Bypass/fault -> single stage translation/bypass (attach)
>> + * 3. Single or nested stage Translation/bypass -> bypass/fault (detach)
>> + * 4. S2 -> S1 + S2 (attach_pasid_table)
>> + * 5. S1 + S2 -> S2 (detach_pasid_table)
> 
> The following line "BUG_ON(ste_live && !nested);" forbids this transform.

Yes, as pointed out by Kunkun, there is always an abort in between. I
will restore the original comment.

> And I have a look at the 6th patch, the transform seems S1 + S2 -> abort.
> So after detach, the status is not the same as that before attach. Does it
> match our expectation?

Indeed, at detach time I think I should reset the abort flag, as it is
no longer imposed by the guest.

Thanks!

Eric
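
For readers following the STE discussion above, the abort/bypass/translate
choice made by the quoted hunk can be sketched in isolation. This is only an
illustrative user-space model, not the driver code: the enum, the
domain_state struct and ste_cfg_for() are hypothetical names, and the
S1/S2/nested selection at the end is an assumption about the part of the
patch that is cut off in this archive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the STE Config field encodings. */
enum ste_cfg {
	STE_CFG_ABORT,
	STE_CFG_BYPASS,
	STE_CFG_S1_TRANS,
	STE_CFG_S2_TRANS,
	STE_CFG_NESTED,
};

/* Hypothetical, simplified view of the state the driver consults. */
struct domain_state {
	bool has_domain;	/* false models smmu_domain == NULL */
	bool s1_set;		/* models s1_cfg->set */
	bool s2_set;		/* models s2_cfg->set */
	bool abort;		/* models smmu_domain->abort */
};

/* Pick the STE config the way the quoted hunk does for abort/bypass. */
static enum ste_cfg ste_cfg_for(const struct domain_state *st,
				bool disable_bypass)
{
	bool abort, translate;

	assert(st != NULL);

	/* Without a domain, disable_bypass decides; otherwise the guest does. */
	abort = st->has_domain ? st->abort : disable_bypass;
	translate = st->has_domain && (st->s1_set || st->s2_set);

	if (abort)
		return STE_CFG_ABORT;
	if (!translate)
		return STE_CFG_BYPASS;

	/* Assumed continuation: both stages set means nested translation. */
	if (st->s1_set && st->s2_set)
		return STE_CFG_NESTED;
	return st->s1_set ? STE_CFG_S1_TRANS : STE_CFG_S2_TRANS;
}
```

In this model a nested domain whose guest requested an abort always yields
STE_CFG_ABORT, which is why the abort flag has to be cleared again at detach
time for the domain to return to plain S2 translation.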


> 
>>   *
>>   * Given that we can't update the STE atomically and the SMMU
>>   * doesn't read the thing in a defined order, that leaves us
>> @@ -1193,7 +1195,8 @@ static void arm_smmu_write_strtab_ent(struct 
>> arm_smmu_master *master, u32 sid,
>>   * 3. Update Config, sync
>>   */
>>  u64 val = le64_to_cpu(dst[0]);
>> -bool ste_live = false;
>> +bool s1_live = false, s2_live = false, ste_live;
>> +bool abort, nested = false, translate = false;
>>  struct arm_smmu_device *smmu = NULL;
>>  struct arm_smmu_s1_cfg *s1_cfg;
>>  struct arm_smmu_s2_cfg *s2_cfg;
>> @@ -1233,6 +1236,8 @@ static void arm_smmu_write_strtab_ent(struct 
>> arm_smmu_master *master, u32 sid,
>>  default:
>>  break;
>>  }
>> +nested = s1_cfg->set && s2_cfg->set;
>> +translate = s1_cfg->set || s2_cfg->set;
>>  }
>>  
>>  if (val & STRTAB_STE_0_V) {
>> @@ -1240,23 +1245,36 @@ static void arm_smmu_write_strtab_ent(struct 
>> arm_smmu_master *master, u32 sid,
>>  case STRTAB_STE_0_CFG_BYPASS:
>>  break;
>>  case STRTAB_STE_0_CFG_S1_TRANS:
>> +s1_live = true;
>> +break;
>>  case STRTAB_STE_0_CFG_S2_TRANS:
>> -ste_live = true;
>> +s2_live = true;
>> +break;
>> +case STRTAB_STE_0_CFG_NESTED:
>> +s1_live = true;
>> +s2_live = true;
>>  break;
>>  case STRTAB_STE_0_CFG_ABORT:
>> -BUG_ON(!disable_bypass);
>>  break;
>>  default:
>>  BUG(); /* STE corruption */
>>  }
>>  }
>>  
>> +ste_live = s1_live || s2_live;
>> +
>>  /* Nuke the existing STE_0 value, as we're going to rewrite it */
>>  val = STRTAB_STE_0_V;
>>  
>>  /* Bypass/fault */
>> -if (!smmu_domain || !(s1_cfg->set || s2_cfg->set)) {
>> -if (!smmu_domain && disable_bypass)
>> +
>> +if (!smmu_domain)
>> +abort = disable_bypass;
>> +else
>> +abort = smmu_domain->abort;
>> +
>> +if (abort || !translate) {
>> +if (abort)
>>  val |= FIELD_PREP(STRTAB_STE_0_CFG, 
>> STRTAB_STE_0_CFG_ABORT);
>>  else
>>  val |= FIELD_PREP(STRTAB_STE_0_CFG, 
>> 

Re: [PATCH v13 06/15] iommu/smmuv3: Implement attach/detach_pasid_table

2021-02-11 Thread Auger Eric
Hi Keqian,

On 2/2/21 9:03 AM, Keqian Zhu wrote:
> Hi Eric,
> 
> On 2020/11/18 19:21, Eric Auger wrote:
>> On attach_pasid_table() we program STE S1 related info set
>> by the guest into the actual physical STEs. At minimum
>> we need to program the context descriptor GPA and compute
>> whether the stage1 is translated/bypassed or aborted.
>>
>> Signed-off-by: Eric Auger 
>>
>> ---
>> v7 -> v8:
>> - remove smmu->features check, now done on domain finalize
>>
>> v6 -> v7:
>> - check versions and comment the fact we don't need to take
>>   into account s1dss and s1fmt
>> v3 -> v4:
>> - adapt to changes in iommu_pasid_table_config
>> - different programming convention at s1_cfg/s2_cfg/ste.abort
>>
>> v2 -> v3:
>> - callback now is named set_pasid_table and struct fields
>>   are laid out differently.
>>
>> v1 -> v2:
>> - invalidate the STE before changing them
>> - hold init_mutex
>> - handle new fields
>> ---
>>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 89 +
>>  1 file changed, 89 insertions(+)
>>
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> index 412ea1bafa50..805acdc18a3a 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> @@ -2661,6 +2661,93 @@ static void arm_smmu_get_resv_regions(struct device 
>> *dev,
>>  iommu_dma_get_resv_regions(dev, head);
>>  }
>>  
>> +static int arm_smmu_attach_pasid_table(struct iommu_domain *domain,
>> +   struct iommu_pasid_table_config *cfg)
>> +{
>> +struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>> +struct arm_smmu_master *master;
>> +struct arm_smmu_device *smmu;
>> +unsigned long flags;
>> +int ret = -EINVAL;
>> +
>> +if (cfg->format != IOMMU_PASID_FORMAT_SMMUV3)
>> +return -EINVAL;
>> +
>> +if (cfg->version != PASID_TABLE_CFG_VERSION_1 ||
>> +cfg->vendor_data.smmuv3.version != PASID_TABLE_SMMUV3_CFG_VERSION_1)
>> +return -EINVAL;
>> +
>> +mutex_lock(&smmu_domain->init_mutex);
>> +
>> +smmu = smmu_domain->smmu;
>> +
>> +if (!smmu)
>> +goto out;
>> +
>> +if (smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
>> +goto out;
>> +
>> +switch (cfg->config) {
>> +case IOMMU_PASID_CONFIG_ABORT:
>> +smmu_domain->s1_cfg.set = false;
>> +smmu_domain->abort = true;
>> +break;
>> +case IOMMU_PASID_CONFIG_BYPASS:
>> +smmu_domain->s1_cfg.set = false;
>> +smmu_domain->abort = false;
> I didn't test it, but it seems that this will cause BUG() in 
> arm_smmu_write_strtab_ent().
> At the line "BUG_ON(ste_live && !nested);". Maybe I miss something?

No, you are fully correct. Shameer hit the BUG_ON() when booting the
guest with iommu.passthrough = 1, so I removed that BUG_ON(). The legacy
BUG_ON(ste_live) is still there in the form of BUG_ON(s1_live).

Thanks!

Eric
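
The switch statement under review maps each guest PASID table config onto two
pieces of domain state. A minimal sketch of that mapping, assuming nothing
beyond what the quoted hunk shows, is given below. The enum values,
struct s1_state and apply_pasid_config() are hypothetical names used for
illustration only; locking, the pasid_bits check and STE installation are
left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the IOMMU_PASID_CONFIG_* values. */
enum pasid_config {
	PASID_CONFIG_TRANSLATE = 1,
	PASID_CONFIG_BYPASS,
	PASID_CONFIG_ABORT,
};

/* Hypothetical model of the two fields the switch updates. */
struct s1_state {
	bool s1_set;	/* models smmu_domain->s1_cfg.set */
	bool abort;	/* models smmu_domain->abort */
};

/* Returns 0 on success or -EINVAL, mirroring the quoted switch. */
static int apply_pasid_config(struct s1_state *st, enum pasid_config cfg)
{
	assert(st != NULL);

	switch (cfg) {
	case PASID_CONFIG_ABORT:
		st->s1_set = false;
		st->abort = true;
		return 0;
	case PASID_CONFIG_BYPASS:
		/* Stage 1 bypassed: only stage 2 remains live. */
		st->s1_set = false;
		st->abort = false;
		return 0;
	case PASID_CONFIG_TRANSLATE:
		/* S1 <-> S1 transitions are not supported. */
		if (st->s1_set)
			return -EINVAL;
		st->s1_set = true;
		st->abort = false;
		return 0;
	default:
		return -EINVAL;
	}
}
```

The BYPASS case is the one discussed above: it leaves a domain with only
stage 2 set, so a live-STE check that demands nested mode would fire on it.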


> 
>> +break;
>> +case IOMMU_PASID_CONFIG_TRANSLATE:
>> +/* we do not support S1 <-> S1 transitions */
>> +if (smmu_domain->s1_cfg.set)
>> +goto out;
>> +
>> +/*
>> + * we currently support a single CD so s1fmt and s1dss
>> + * fields are also ignored
>> + */
>> +if (cfg->pasid_bits)
>> +goto out;
>> +
>> +smmu_domain->s1_cfg.cdcfg.cdtab_dma = cfg->base_ptr;
>> +smmu_domain->s1_cfg.set = true;
>> +smmu_domain->abort = false;
>> +break;
>> +default:
>> +goto out;
>> +}
>> +spin_lock_irqsave(&smmu_domain->devices_lock, flags);
>> +list_for_each_entry(master, &smmu_domain->devices, domain_head)
>> +arm_smmu_install_ste_for_dev(master);
>> +spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
>> +ret = 0;
>> +out:
>> +mutex_unlock(&smmu_domain->init_mutex);
>> +return ret;
>> +}
>> +
> [...]
> 
> Thanks,
> Keqian
> 



Re: [RFC v4 3/3] vfio: platform: reset: add msi support

2021-02-08 Thread Auger Eric
Hi Vikas,

On 1/29/21 6:24 PM, Vikas Gupta wrote:
> Add msi support for Broadcom FlexRm device.
> 
> Signed-off-by: Vikas Gupta 
> ---
>  .../platform/reset/vfio_platform_bcmflexrm.c  | 72 ++-
>  1 file changed, 70 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/vfio/platform/reset/vfio_platform_bcmflexrm.c 
> b/drivers/vfio/platform/reset/vfio_platform_bcmflexrm.c
> index 96064ef8f629..6ca4ca12575b 100644
> --- a/drivers/vfio/platform/reset/vfio_platform_bcmflexrm.c
> +++ b/drivers/vfio/platform/reset/vfio_platform_bcmflexrm.c
> @@ -21,7 +21,9 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
> +#include 
>  
>  #include "../vfio_platform_private.h"
>  
> @@ -33,6 +35,9 @@
>  #define RING_VER 0x000
>  #define RING_CONTROL 0x034
>  #define RING_FLUSH_DONE  0x038
> +#define RING_MSI_ADDR_LS 0x03c
> +#define RING_MSI_ADDR_MS 0x040
> +#define RING_MSI_DATA_VALUE  0x064
>  
>  /* Register RING_CONTROL fields */
>  #define CONTROL_FLUSH_SHIFT  5
> @@ -105,8 +110,71 @@ static int vfio_platform_bcmflexrm_reset(struct 
> vfio_platform_device *vdev)
>   return ret;
>  }
>  
> -module_vfio_reset_handler("brcm,iproc-flexrm-mbox",
> -   vfio_platform_bcmflexrm_reset);
> +static u32 bcm_num_msi(struct vfio_platform_device *vdev)
Please use the same prefix as for reset, i.e. vfio_platform_bcmflexrm_get_msi
> +{
> + struct vfio_platform_region *reg = &vdev->regions[0];
> +
> + return (reg->size / RING_REGS_SIZE);
> +}
> +
> +static void bcm_write_msi(struct vfio_platform_device *vdev,
vfio_platform_bcmflexrm_msi_write?
> + struct msi_desc *desc,
> + struct msi_msg *msg)
> +{
> + int i;
> + int hwirq = -1;
> + int msi_src;
> + void __iomem *ring;
> + struct vfio_platform_region *reg = &vdev->regions[0];
> +
> + if (!reg)
> + return;
Why do you need this check? For this to be called, the SET_IRQ ioctl must
have been called. This can only happen after vfio_platform_open, which
first calls vfio_platform_regions_init and then vfio_platform_irq_init.
> +
> + for (i = 0; i < vdev->num_irqs; i++)
> + if (vdev->irqs[i].type == VFIO_IRQ_TYPE_MSI)
> + hwirq = vdev->irqs[i].ctx[0].hwirq;
nit: It would have been easier to record in vdev the last index of wired
interrupts and just add the index of the MSI
> +
> + if (hwirq < 0)
> + return;
> +
> + msi_src = desc->irq - hwirq;
> +
> + if (!reg->ioaddr) {
> + reg->ioaddr = ioremap(reg->addr, reg->size);
> + if (!reg->ioaddr)
pr_warn_once("")?
> + return;
> + }
> +
> + ring = reg->ioaddr + msi_src * RING_REGS_SIZE;
> +
> + writel_relaxed(msg->address_lo, ring + RING_MSI_ADDR_LS);
> + writel_relaxed(msg->address_hi, ring + RING_MSI_ADDR_MS);
> + writel_relaxed(msg->data, ring + RING_MSI_DATA_VALUE);
> +}
> +
> +static struct vfio_platform_reset_node vfio_platform_bcmflexrm_reset_node = {
> + .owner = THIS_MODULE,
> + .compat = "brcm,iproc-flexrm-mbox",
> + .of_reset = vfio_platform_bcmflexrm_reset,
> + .of_get_msi = bcm_num_msi,
> + .of_msi_write = bcm_write_msi
> +};
> +
> +static int __init vfio_platform_bcmflexrm_reset_module_init(void)
> +{
> + __vfio_platform_register_reset(&vfio_platform_bcmflexrm_reset_node);
> +
> + return 0;
> +}
> +
> +static void __exit vfio_platform_bcmflexrm_reset_module_exit(void)
> +{
> + vfio_platform_unregister_reset("brcm,iproc-flexrm-mbox",
> +vfio_platform_bcmflexrm_reset);
> +}
> +
> +module_init(vfio_platform_bcmflexrm_reset_module_init);
> +module_exit(vfio_platform_bcmflexrm_reset_module_exit);
>  
>  MODULE_LICENSE("GPL v2");
>  MODULE_AUTHOR("Anup Patel ");
> 

I think you should now move the whole series from RFC to PATCH.

Thanks

Eric
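
To illustrate the nit about recording the first MSI index, here is a minimal
sketch of the ring lookup done in bcm_write_msi() once that index is known
up front. It is a user-space model under stated assumptions: the value of
RING_REGS_SIZE is not shown in the quoted hunk and is assumed here, and
flexrm_num_msi() and flexrm_msi_ring_base() are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed per-ring register window size; not shown in the quoted hunk. */
#define RING_REGS_SIZE		0x10000u

/* Register offsets taken from the quoted patch. */
#define RING_MSI_ADDR_LS	0x03c
#define RING_MSI_ADDR_MS	0x040
#define RING_MSI_DATA_VALUE	0x064

/* One MSI per ring: the count is the region size divided by the window. */
static uint32_t flexrm_num_msi(size_t region_size)
{
	return (uint32_t)(region_size / RING_REGS_SIZE);
}

/*
 * Compute the byte offset of the ring that owns @irq, given the Linux irq
 * number of the first MSI. Returns 0 on success, -1 if @irq is out of range.
 */
static int flexrm_msi_ring_base(size_t region_size, int first_msi_irq,
				int irq, size_t *base)
{
	int msi_src;

	assert(base != NULL);

	msi_src = irq - first_msi_irq;
	if (msi_src < 0 || (uint32_t)msi_src >= flexrm_num_msi(region_size))
		return -1;

	*base = (size_t)msi_src * RING_REGS_SIZE;
	return 0;
}
```

With the first MSI irq stored once, the per-write loop over vdev->irqs goes
away, and the address and data registers are simply the returned base plus
RING_MSI_ADDR_LS, RING_MSI_ADDR_MS and RING_MSI_DATA_VALUE.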



Re: [RFC v4 2/3] vfio/platform: change cleanup order

2021-02-08 Thread Auger Eric
Hi Vikas,

On 1/29/21 6:24 PM, Vikas Gupta wrote:
> In the case of msi, vendor specific msi module may require
> region access to handle msi cleanup so we need to cleanup region
> after irq cleanup only.
> 
> Signed-off-by: Vikas Gupta 
Acked-by: Eric Auger 

Thanks

Eric

> ---
>  drivers/vfio/platform/vfio_platform_common.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/vfio/platform/vfio_platform_common.c 
> b/drivers/vfio/platform/vfio_platform_common.c
> index f2b1f0c3bfcc..1cc040e3ed1f 100644
> --- a/drivers/vfio/platform/vfio_platform_common.c
> +++ b/drivers/vfio/platform/vfio_platform_common.c
> @@ -243,8 +243,8 @@ static void vfio_platform_release(void *device_data)
>   WARN_ON(1);
>   }
>   pm_runtime_put(vdev->device);
> - vfio_platform_regions_cleanup(vdev);
>   vfio_platform_irq_cleanup(vdev);
> + vfio_platform_regions_cleanup(vdev);
>   }
>  
>   mutex_unlock(&driver_lock);
> 



Re: [RFC v4 1/3] vfio/platform: add support for msi

2021-02-08 Thread Auger Eric
Hi Vikas,

On 1/29/21 6:24 PM, Vikas Gupta wrote:
> MSI support for platform devices. MSI is added
s/MSI support/ Add MSI support
> as a single 'index' with 'count' as the number of
> MSI(s) supported by the devices.
as a single 'index' following the last wired irq index, with count.

It allows eventfds to be associated with MSIs.

If MSI is supported, specialization callbacks need to be implemented in
the reset module (of_get_msi and of_msi_write).
> 
> Signed-off-by: Vikas Gupta 
> ---
>  drivers/vfio/platform/Kconfig |   1 +
>  drivers/vfio/platform/vfio_platform_common.c  |  95 ++-
>  drivers/vfio/platform/vfio_platform_irq.c | 253 --
>  drivers/vfio/platform/vfio_platform_private.h |  29 ++
>  include/uapi/linux/vfio.h |  24 ++
>  5 files changed, 373 insertions(+), 29 deletions(-)
> 
> diff --git a/drivers/vfio/platform/Kconfig b/drivers/vfio/platform/Kconfig
> index dc1a3c44f2c6..d4bbc9f27763 100644
> --- a/drivers/vfio/platform/Kconfig
> +++ b/drivers/vfio/platform/Kconfig
> @@ -3,6 +3,7 @@ config VFIO_PLATFORM
>   tristate "VFIO support for platform devices"
>   depends on VFIO && EVENTFD && (ARM || ARM64)
>   select VFIO_VIRQFD
> + select GENERIC_MSI_IRQ_DOMAIN
>   help
> Support for platform devices with VFIO. This is required to make
> use of platform devices present on the system using the VFIO
> diff --git a/drivers/vfio/platform/vfio_platform_common.c 
> b/drivers/vfio/platform/vfio_platform_common.c
> index fb4b385191f2..f2b1f0c3bfcc 100644
> --- a/drivers/vfio/platform/vfio_platform_common.c
> +++ b/drivers/vfio/platform/vfio_platform_common.c
> @@ -16,6 +16,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "vfio_platform_private.h"
>  
> @@ -28,23 +29,22 @@
>  static LIST_HEAD(reset_list);
>  static DEFINE_MUTEX(driver_lock);
>  
> -static vfio_platform_reset_fn_t vfio_platform_lookup_reset(const char 
> *compat,
> - struct module **module)
> +static void vfio_platform_lookup_reset(const char *compat,
> +struct module **module,
> +struct vfio_platform_reset_node **node)
nit: I would prefer this function directly returns a
struct vfio_platform_reset_node *
>  {
>   struct vfio_platform_reset_node *iter;
> - vfio_platform_reset_fn_t reset_fn = NULL;
>  
>   mutex_lock(&driver_lock);
>   list_for_each_entry(iter, &reset_list, link) {
>   if (!strcmp(iter->compat, compat) &&
>   try_module_get(iter->owner)) {
>   *module = iter->owner;
> - reset_fn = iter->of_reset;
> + *node = iter;
>   break;
>   }
>   }
>   mutex_unlock(&driver_lock);
> - return reset_fn;
>  }
>  
>  static int vfio_platform_acpi_probe(struct vfio_platform_device *vdev,
> @@ -112,15 +112,23 @@ static bool vfio_platform_has_reset(struct vfio_platform_device *vdev)
> 
>  static int vfio_platform_get_reset(struct vfio_platform_device *vdev)
>  {
> + struct vfio_platform_reset_node *node = NULL;
> +
>   if (VFIO_PLATFORM_IS_ACPI(vdev))
>   return vfio_platform_acpi_has_reset(vdev) ? 0 : -ENOENT;
>  
> - vdev->of_reset = vfio_platform_lookup_reset(vdev->compat,
> - &vdev->reset_module);
> - if (!vdev->of_reset) {
> + vfio_platform_lookup_reset(vdev->compat, &vdev->reset_module,
> + &node);
> + if (!node) {
>   request_module("vfio-reset:%s", vdev->compat);
> - vdev->of_reset = vfio_platform_lookup_reset(vdev->compat,
> - &vdev->reset_module);
> + vfio_platform_lookup_reset(vdev->compat, &vdev->reset_module,
> + &node);
> + }
> +
> + if (node) {
> + vdev->of_reset = node->of_reset;
> + vdev->of_get_msi = node->of_get_msi;
> + vdev->of_msi_write = node->of_msi_write;
>   }
>  
>   return vdev->of_reset ? 0 : -ENOENT;
> @@ -343,9 +351,16 @@ static long vfio_platform_ioctl(void *device_data,
>  
>   } else if (cmd == VFIO_DEVICE_GET_IRQ_INFO) {
>   struct vfio_irq_info info;
> + struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
> + int ext_irq_index = vdev->num_irqs - vdev->num_ext_irqs;
> + unsigned long capsz;
> + u32 index;
>  
>   minsz = offsetofend(struct vfio_irq_info, count);
>  
> + /* For backward compatibility, cannot require this */
> + capsz = offsetofend(struct vfio_irq_info, cap_offset);
> +
>   if (copy_from_user(, (void __user *)arg, minsz))
>   return -EFAULT;
>  
> @@ -355,8 +370,53 @@ static long vfio_platform_ioctl(void *device_data,
>   if (info.index >= 

Re: [PATCH v3 4/5] amba: Make the remove callback return void

2021-02-03 Thread Auger Eric
Hi Uwe,

On 1/26/21 5:58 PM, Uwe Kleine-König wrote:
> All amba drivers return 0 in their remove callback. Together with the
> driver core ignoring the return value anyhow, it doesn't make sense to
> return a value here.
> 
> Change the remove prototype to return void, which makes it explicit that
> returning an error value doesn't work as expected. This simplifies changing
> the core remove callback to return void, too.
> 
> Reviewed-by: Ulf Hansson 
> Reviewed-by: Arnd Bergmann 
> Acked-by: Alexandre Belloni 
> Acked-by: Dmitry Torokhov 
> Acked-by: Krzysztof Kozlowski  # for drivers/memory
> Acked-by: Mark Brown 
> Acked-by: Dmitry Torokhov 
> Acked-by: Linus Walleij 
> Signed-off-by: Uwe Kleine-König 
> ---
>  drivers/amba/bus.c | 5 ++---
>  drivers/char/hw_random/nomadik-rng.c   | 3 +--
>  drivers/dma/pl330.c| 3 +--
>  drivers/gpu/drm/pl111/pl111_drv.c  | 4 +---
>  drivers/hwtracing/coresight/coresight-catu.c   | 3 +--
>  drivers/hwtracing/coresight/coresight-cpu-debug.c  | 4 +---
>  drivers/hwtracing/coresight/coresight-cti-core.c   | 4 +---
>  drivers/hwtracing/coresight/coresight-etb10.c  | 4 +---
>  drivers/hwtracing/coresight/coresight-etm3x-core.c | 4 +---
>  drivers/hwtracing/coresight/coresight-etm4x-core.c | 4 +---
>  drivers/hwtracing/coresight/coresight-funnel.c | 4 ++--
>  drivers/hwtracing/coresight/coresight-replicator.c | 4 ++--
>  drivers/hwtracing/coresight/coresight-stm.c| 4 +---
>  drivers/hwtracing/coresight/coresight-tmc-core.c   | 4 +---
>  drivers/hwtracing/coresight/coresight-tpiu.c   | 4 +---
>  drivers/i2c/busses/i2c-nomadik.c   | 4 +---
>  drivers/input/serio/ambakmi.c  | 3 +--
>  drivers/memory/pl172.c | 4 +---
>  drivers/memory/pl353-smc.c | 4 +---
>  drivers/mmc/host/mmci.c| 4 +---
>  drivers/rtc/rtc-pl030.c| 4 +---
>  drivers/rtc/rtc-pl031.c| 4 +---
>  drivers/spi/spi-pl022.c| 5 ++---
>  drivers/tty/serial/amba-pl010.c| 4 +---
>  drivers/tty/serial/amba-pl011.c| 3 +--
>  drivers/vfio/platform/vfio_amba.c  | 3 +--
>  drivers/video/fbdev/amba-clcd.c| 4 +---
>  drivers/watchdog/sp805_wdt.c   | 4 +---
>  include/linux/amba/bus.h   | 2 +-
>  sound/arm/aaci.c   | 4 +---
>  30 files changed, 34 insertions(+), 80 deletions(-)
> 
> diff --git a/drivers/amba/bus.c b/drivers/amba/bus.c
> index 8c4a42df47c6..48b5d4b4e889 100644
> --- a/drivers/amba/bus.c
> +++ b/drivers/amba/bus.c
> @@ -300,11 +300,10 @@ static int amba_remove(struct device *dev)
>  {
>   struct amba_device *pcdev = to_amba_device(dev);
>   struct amba_driver *drv = to_amba_driver(dev->driver);
> - int ret = 0;
>  
>   pm_runtime_get_sync(dev);
>   if (drv->remove)
> - ret = drv->remove(pcdev);
> + drv->remove(pcdev);
>   pm_runtime_put_noidle(dev);
>  
>   /* Undo the runtime PM settings in amba_probe() */
> @@ -315,7 +314,7 @@ static int amba_remove(struct device *dev)
>   amba_put_disable_pclk(pcdev);
>   dev_pm_domain_detach(dev, true);
>  
> - return ret;
> + return 0;
>  }
>  
>  static void amba_shutdown(struct device *dev)
> diff --git a/drivers/char/hw_random/nomadik-rng.c 
> b/drivers/char/hw_random/nomadik-rng.c
> index b0ded41eb865..67947a19aa22 100644
> --- a/drivers/char/hw_random/nomadik-rng.c
> +++ b/drivers/char/hw_random/nomadik-rng.c
> @@ -69,11 +69,10 @@ static int nmk_rng_probe(struct amba_device *dev, const 
> struct amba_id *id)
>   return ret;
>  }
>  
> -static int nmk_rng_remove(struct amba_device *dev)
> +static void nmk_rng_remove(struct amba_device *dev)
>  {
>   amba_release_regions(dev);
>   clk_disable(rng_clk);
> - return 0;
>  }
>  
>  static const struct amba_id nmk_rng_ids[] = {
> diff --git a/drivers/dma/pl330.c b/drivers/dma/pl330.c
> index bc0f66af0f11..fd8d2bc3be9f 100644
> --- a/drivers/dma/pl330.c
> +++ b/drivers/dma/pl330.c
> @@ -3195,7 +3195,7 @@ pl330_probe(struct amba_device *adev, const struct 
> amba_id *id)
>   return ret;
>  }
>  
> -static int pl330_remove(struct amba_device *adev)
> +static void pl330_remove(struct amba_device *adev)
>  {
>   struct pl330_dmac *pl330 = amba_get_drvdata(adev);
>   struct dma_pl330_chan *pch, *_p;
> @@ -3235,7 +3235,6 @@ static int pl330_remove(struct amba_device *adev)
>  
>   if (pl330->rstc)
>   reset_control_assert(pl330->rstc);
> - return 0;
>  }
>  
>  static const struct amba_id pl330_ids[] = {
> diff --git a/drivers/gpu/drm/pl111/pl111_drv.c 
> b/drivers/gpu/drm/pl111/pl111_drv.c
> index 40e6708fbbe2..1fb5eacefd2d 100644
> --- 

Re: [PATCH v13 03/15] iommu/arm-smmu-v3: Maintain a SID->device structure

2021-02-01 Thread Auger Eric
Hi Keqian,

On 2/1/21 1:26 PM, Keqian Zhu wrote:
> Hi Eric,
> 
> On 2020/11/18 19:21, Eric Auger wrote:
>> From: Jean-Philippe Brucker 
>>
>> When handling faults from the event or PRI queue, we need to find the
>> struct device associated to a SID. Add a rb_tree to keep track of SIDs.
>>
>> Signed-off-by: Jean-Philippe Brucker 
> [...]
> 
>>  }
>>  
>> +static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
>> +  struct arm_smmu_master *master)
>> +{
>> +int i;
>> +int ret = 0;
>> +struct arm_smmu_stream *new_stream, *cur_stream;
>> +struct rb_node **new_node, *parent_node = NULL;
>> +struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(master->dev);
>> +
>> +master->streams = kcalloc(fwspec->num_ids,
>> +  sizeof(struct arm_smmu_stream), GFP_KERNEL);
>> +if (!master->streams)
>> +return -ENOMEM;
>> +master->num_streams = fwspec->num_ids;
> This is not rolled back on failure.
> 
>> +
>> +mutex_lock(&smmu->streams_mutex);
>> +for (i = 0; i < fwspec->num_ids && !ret; i++) {
> Checking ret here makes it hard to determine the start index of the rollback.
> 
> If we fail here, the start index is (i-2).
> If we fail in the loop, the start index is (i-1).
> 
>> +u32 sid = fwspec->ids[i];
>> +
>> +new_stream = &master->streams[i];
>> +new_stream->id = sid;
>> +new_stream->master = master;
>> +
>> +/*
>> + * Check the SIDs are in range of the SMMU and our stream table
>> + */
>> +if (!arm_smmu_sid_in_range(smmu, sid)) {
>> +ret = -ERANGE;
>> +break;
>> +}
>> +
>> +/* Ensure l2 strtab is initialised */
>> +if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB) {
>> +ret = arm_smmu_init_l2_strtab(smmu, sid);
>> +if (ret)
>> +break;
>> +}
>> +
>> +/* Insert into SID tree */
>> +new_node = &(smmu->streams.rb_node);
>> +while (*new_node) {
>> +cur_stream = rb_entry(*new_node, struct arm_smmu_stream,
>> +  node);
>> +parent_node = *new_node;
>> +if (cur_stream->id > new_stream->id) {
>> +new_node = &((*new_node)->rb_left);
>> +} else if (cur_stream->id < new_stream->id) {
>> +new_node = &((*new_node)->rb_right);
>> +} else {
>> +dev_warn(master->dev,
>> + "stream %u already in tree\n",
>> + cur_stream->id);
>> +ret = -EINVAL;
>> +break;
>> +}
>> +}
>> +
>> +if (!ret) {
>> +rb_link_node(&new_stream->node, parent_node, new_node);
>> +rb_insert_color(&new_stream->node, &smmu->streams);
>> +}
>> +}
>> +
>> +if (ret) {
>> +for (; i > 0; i--)
> should be (i >= 0)?
> And the start index seems not correct.
> 
>> +rb_erase(&master->streams[i].node, &smmu->streams);
>> +kfree(master->streams);
>> +}
>> +mutex_unlock(&smmu->streams_mutex);
>> +
>> +return ret;
>> +}
>> +
>> +static void arm_smmu_remove_master(struct arm_smmu_master *master)
>> +{
>> +int i;
>> +struct arm_smmu_device *smmu = master->smmu;
>> +struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(master->dev);
>> +
>> +if (!smmu || !master->streams)
>> +return;
>> +
>> +mutex_lock(&smmu->streams_mutex);
>> +for (i = 0; i < fwspec->num_ids; i++)
>> +rb_erase(&master->streams[i].node, &smmu->streams);
>> +mutex_unlock(&smmu->streams_mutex);
>> +
>> +kfree(master->streams);
>> +}
>> +
>>  static struct iommu_ops arm_smmu_ops;
>>  
>>  static struct iommu_device *arm_smmu_probe_device(struct device *dev)
>>  {
>> -int i, ret;
>> +int ret;
>>  struct arm_smmu_device *smmu;
>>  struct arm_smmu_master *master;
>>  struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
>> @@ -2331,27 +2447,12 @@ static struct iommu_device 
>> *arm_smmu_probe_device(struct device *dev)
>>  
>>  master->dev = dev;
>>  master->smmu = smmu;
>> -master->sids = fwspec->ids;
>> -master->num_sids = fwspec->num_ids;
>>  INIT_LIST_HEAD(&master->bonds);
>>  dev_iommu_priv_set(dev, master);
>>  
>> -/* Check the SIDs are in range of the SMMU and our stream table */
>> -for (i = 0; i < master->num_sids; i++) {
>> -u32 sid = master->sids[i];
>> -
>> -if (!arm_smmu_sid_in_range(smmu, sid)) {
>> -ret = -ERANGE;
>> -goto err_free_master;
>> -}
>> -
>> -/* Ensure l2 strtab is initialised */
>> -
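
The rollback concern raised in the review comments above (which index to
start unwinding from, and whether index 0 is covered) can be shown with a
small stand-alone model. This sketch deliberately swaps the rb-tree for a
flat presence array so that it stays self-contained; MAX_SID,
struct sid_registry and registry_insert_all() are hypothetical names and
not part of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SID 64	/* assumed bound, stands in for arm_smmu_sid_in_range() */

/* Flat stand-in for the SID rb-tree: present[sid] means "in the tree". */
struct sid_registry {
	bool present[MAX_SID];
};

/*
 * Insert all @n stream IDs, or none of them. On failure at index i, only
 * entries [0, i) were inserted, so exactly those are removed again.
 */
static int registry_insert_all(struct sid_registry *reg,
			       const unsigned int *sids, size_t n)
{
	size_t i;
	int ret = 0;

	assert(reg != NULL && (sids != NULL || n == 0));

	for (i = 0; i < n; i++) {
		unsigned int sid = sids[i];

		if (sid >= MAX_SID) {
			ret = -ERANGE;
			break;
		}
		if (reg->present[sid]) {
			ret = -EINVAL;	/* already registered */
			break;
		}
		reg->present[sid] = true;
	}

	if (ret) {
		/* Unwind from i - 1 down to 0; index i itself never went in. */
		while (i--)
			reg->present[sids[i]] = false;
	}

	return ret;
}
```

Breaking out of the loop instead of testing ret in the loop condition keeps
i pointing at the failing entry, so the unwind range is the same wherever the
failure happens.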

Re: [PATCH v13 01/15] iommu: Introduce attach/detach_pasid_table API

2021-02-01 Thread Auger Eric
Hi Keqian,

On 2/1/21 12:27 PM, Keqian Zhu wrote:
> Hi Eric,
> 
> On 2020/11/18 19:21, Eric Auger wrote:
>> In virtualization use case, when a guest is assigned
>> a PCI host device, protected by a virtual IOMMU on the guest,
>> the physical IOMMU must be programmed to be consistent with
>> the guest mappings. If the physical IOMMU supports two
>> translation stages it makes sense to program guest mappings
>> onto the first stage/level (ARM/Intel terminology) while the host
>> owns the stage/level 2.
>>
>> In that case, it is mandated to trap on guest configuration
>> settings and pass those to the physical iommu driver.
>>
>> This patch adds a new API to the iommu subsystem that allows
>> to set/unset the pasid table information.
>>
>> A generic iommu_pasid_table_config struct is introduced in
>> a new iommu.h uapi header. This is going to be used by the VFIO
>> user API.
>>
>> Signed-off-by: Jean-Philippe Brucker 
>> Signed-off-by: Liu, Yi L 
>> Signed-off-by: Ashok Raj 
>> Signed-off-by: Jacob Pan 
>> Signed-off-by: Eric Auger 
>>
>> ---
>>
>> v12 -> v13:
>> - Fix config check
>>
>> v11 -> v12:
>> - add argsz, name the union
>> ---
>>  drivers/iommu/iommu.c  | 68 ++
>>  include/linux/iommu.h  | 21 
>>  include/uapi/linux/iommu.h | 54 ++
>>  3 files changed, 143 insertions(+)
>>
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index b53446bb8c6b..978fe34378fb 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -2171,6 +2171,74 @@ int iommu_uapi_sva_unbind_gpasid(struct iommu_domain 
>> *domain, struct device *dev
>>  }
>>  EXPORT_SYMBOL_GPL(iommu_uapi_sva_unbind_gpasid);
>>  
>> +int iommu_attach_pasid_table(struct iommu_domain *domain,
>> + struct iommu_pasid_table_config *cfg)
>> +{
>> +if (unlikely(!domain->ops->attach_pasid_table))
>> +return -ENODEV;
>> +
>> +return domain->ops->attach_pasid_table(domain, cfg);
>> +}
> miss export symbol?
yes we do
> 
>> +
>> +int iommu_uapi_attach_pasid_table(struct iommu_domain *domain,
>> +  void __user *uinfo)
>> +{
>> +struct iommu_pasid_table_config pasid_table_data = { 0 };
>> +u32 minsz;
>> +
>> +if (unlikely(!domain->ops->attach_pasid_table))
>> +return -ENODEV;
>> +
>> +/*
>> + * No new spaces can be added before the variable sized union, the
>> + * minimum size is the offset to the union.
>> + */
>> +minsz = offsetof(struct iommu_pasid_table_config, vendor_data);
>> +
>> +/* Copy minsz from user to get flags and argsz */
>> +if (copy_from_user(&pasid_table_data, uinfo, minsz))
>> +return -EFAULT;
>> +
>> +/* Fields before the variable size union are mandatory */
>> +if (pasid_table_data.argsz < minsz)
>> +return -EINVAL;
>> +
>> +/* PASID and address granu require additional info beyond minsz */
>> +if (pasid_table_data.version != PASID_TABLE_CFG_VERSION_1)
>> +return -EINVAL;
>> +if (pasid_table_data.format == IOMMU_PASID_FORMAT_SMMUV3 &&
>> +pasid_table_data.argsz <
>> +offsetofend(struct iommu_pasid_table_config, 
>> vendor_data.smmuv3))
>> +return -EINVAL;
>> +
>> +/*
>> + * User might be using a newer UAPI header which has a larger data
>> + * size, we shall support the existing flags within the current
>> + * size. Copy the remaining user data _after_ minsz but not more
>> + * than the current kernel supported size.
>> + */
>> +if (copy_from_user((void *)&pasid_table_data + minsz, uinfo + minsz,
>> +   min_t(u32, pasid_table_data.argsz, 
>> sizeof(pasid_table_data)) - minsz))
>> +return -EFAULT;
>> +
>> +/* Now the argsz is validated, check the content */
>> +if (pasid_table_data.config < IOMMU_PASID_CONFIG_TRANSLATE ||
>> +pasid_table_data.config > IOMMU_PASID_CONFIG_ABORT)
>> +return -EINVAL;
>> +
>> +return domain->ops->attach_pasid_table(domain, &pasid_table_data);
>> +}
>> +EXPORT_SYMBOL_GPL(iommu_uapi_attach_pasid_table);
>> +
>> +void iommu_detach_pasid_table(struct iommu_domain *domain)
>> +{
>> +if (unlikely(!domain->ops->detach_pasid_table))
>> +return;
>> +
>> +domain->ops->detach_pasid_table(domain);
>> +}
>> +EXPORT_SYMBOL_GPL(iommu_detach_pasid_table);
>> +
>>  static void __iommu_detach_device(struct iommu_domain *domain,
>>struct device *dev)
>>  {
>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>> index b95a6f8db6ff..464fcbecf841 100644
>> --- a/include/linux/iommu.h
>> +++ b/include/linux/iommu.h
>> @@ -223,6 +223,8 @@ struct iommu_iotlb_gather {
>>   * @cache_invalidate: invalidate translation caches
>>   * @sva_bind_gpasid: bind guest pasid and mm
>>   * @sva_unbind_gpasid: unbind guest pasid and mm
>> + * @attach_pasid_table: attach a pasid table
>> + * 
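
The argsz handling in iommu_uapi_attach_pasid_table() quoted above follows a
two-step copy: first the fixed header, then as much of the remainder as both
sides understand. Below is a minimal user-space sketch of that pattern, with
memcpy() standing in for copy_from_user(). struct demo_cfg and
demo_copy_cfg() are hypothetical; they are not the real
iommu_pasid_table_config layout, and the version and format checks are left
out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical config: fixed header followed by a vendor-specific tail. */
struct demo_cfg {
	uint32_t argsz;		/* size the caller claims to have filled in */
	uint32_t version;
	uint64_t base_ptr;
	struct {
		uint32_t a;
		uint32_t b;
	} vendor;
};

/*
 * Copy a caller-provided config into @dst. @src must hold at least the
 * fixed header and at least argsz bytes. Returns 0 or -EINVAL.
 */
static int demo_copy_cfg(struct demo_cfg *dst, const void *src)
{
	const size_t minsz = offsetof(struct demo_cfg, vendor);
	size_t copy_len;

	assert(dst != NULL && src != NULL);

	/* Zero first so fields the caller did not supply read as 0. */
	memset(dst, 0, sizeof(*dst));

	/* Step 1: copy the mandatory header to learn argsz. */
	memcpy(dst, src, minsz);
	if (dst->argsz < minsz)
		return -EINVAL;

	/* Step 2: copy the tail, capped at what this side knows about. */
	copy_len = dst->argsz < sizeof(*dst) ? dst->argsz : sizeof(*dst);
	memcpy((char *)dst + minsz, (const char *)src + minsz,
	       copy_len - minsz);

	return 0;
}
```

A caller built against a newer, larger layout passes a bigger argsz and has
its extra bytes ignored, while a caller that only fills the header gets a
zeroed vendor part.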

Re: [PATCH RFC v1 00/15] iommu/virtio: Nested stage support with Arm

2021-01-26 Thread Auger Eric
Hi Vivek,

On 1/21/21 6:34 PM, Vivek Kumar Gautam wrote:
> Hi Eric,
> 
> 
> On 1/19/21 2:33 PM, Auger Eric wrote:
>> Hi Vivek,
>>
>> On 1/15/21 1:13 PM, Vivek Gautam wrote:
>>> This patch-series aims at enabling Nested stage translation in guests
>>> using virtio-iommu as the paravirtualized iommu. The backend is
>>> supported
>>> with Arm SMMU-v3 that provides nested stage-1 and stage-2 translation.
>>>
>>> This series derives its purpose from various efforts happening to add
>>> support for Shared Virtual Addressing (SVA) in host and guest. On Arm,
>>> most of the support for SVA has already landed. The support for nested
>>> stage translation and fault reporting to guest has been proposed [1].
>>> The related changes required in VFIO [2] framework have also been put
>>> forward.
>>>
>>> This series proposes changes in virtio-iommu to program PASID tables
>>> and related stage-1 page tables. A simple iommu-pasid-table library
>>> is added for this purpose that interacts with vendor drivers to
>>> allocate and populate PASID tables.
>>> In Arm SMMUv3 we propose to pull the Context Descriptor (CD) management
>>> code out of the arm-smmu-v3 driver and add that as a glue vendor layer
>>> to support allocating CD tables, and populating them with right values.
>>> These CD tables are essentially the PASID tables and contain stage-1
>>> page table configurations too.
>>> A request to setup these CD tables come from virtio-iommu driver using
>>> the iommu-pasid-table library when running on Arm. The virtio-iommu
>>> then pass these PASID tables to the host using the right virtio backend
>>> and support in VMM.
>>>
>>> For testing we have added necessary support in kvmtool. The changes in
>>> kvmtool are based on virtio-iommu development branch by Jean-Philippe
>>> Brucker [3].
>>>
>>> The tested kernel branch contains following in the order bottom to top
>>> on the git hash -
>>> a) v5.11-rc3
>>> b) arm-smmu-v3 [1] and vfio [2] changes from Eric to add nested page
>>>     table support for Arm.
>>> c) Smmu test engine patches from Jean-Philippe's branch [4]
>>> d) This series
>>> e) Domain nesting info patches [5][6][7].
>>> f) Changes to add arm-smmu-v3 specific nesting info (to be sent to
>>>     the list).
>>>
>>> This kernel is tested on Neoverse reference software stack with
>>> Fixed virtual platform. Public version of the software stack and
>>> FVP is available here[8][9].
>>>
>>> A big thanks to Jean-Philippe for his contributions towards this work
>>> and for his valuable guidance.
>>>
>>> [1]
>>> https://lore.kernel.org/linux-iommu/20201118112151.25412-1-eric.au...@redhat.com/T/
>>>
>>> [2]
>>> https://lore.kernel.org/kvmarm/20201116110030.32335-12-eric.au...@redhat.com/T/
>>>
>>> [3] https://jpbrucker.net/git/kvmtool/log/?h=virtio-iommu/devel
>>> [4] https://jpbrucker.net/git/linux/log/?h=sva/smmute
>>> [5]
>>> https://lore.kernel.org/kvm/1599734733-6431-2-git-send-email-yi.l@intel.com/
>>>
>>> [6]
>>> https://lore.kernel.org/kvm/1599734733-6431-3-git-send-email-yi.l@intel.com/
>>>
>>> [7]
>>> https://lore.kernel.org/kvm/1599734733-6431-4-git-send-email-yi.l@intel.com/
>>>
>>> [8]
>>> https://developer.arm.com/tools-and-software/open-source-software/arm-platforms-software/arm-ecosystem-fvps
>>>
>>> [9]
>>> https://git.linaro.org/landing-teams/working/arm/arm-reference-platforms.git/about/docs/rdn1edge/user-guide.rst
>>>
>>
>> Could you share a public branch where we could find all the kernel
>> pieces.
>>
>> Thank you in advance
> 
> Apologies for the delay. It took a bit of time to sort things out for a
> public branch.
> The branch is available in my github now. Please have a look.
> 
> https://github.com/vivek-arm/linux/tree/5.11-rc3-nested-pgtbl-arm-smmuv3-virtio-iommu

No problem. Thank you for the link.

Best Regards

Eric
> 
> 
> 
> Thanks and regards
> Vivek
> 
>>
>> Best Regards
>>
>> Eric
>>>
>>> Jean-Philippe Brucker (6):
>>>    iommu/virtio: Add headers for table format probing
>>>    iommu/virtio: Add table format probing
>>>    iommu/virtio: Add headers for binding pasid table in iommu
>>>    iommu/virtio: Add support for INVALIDATE request
>>>    iommu/virtio: Attach 

Re: [RFC v3 2/2] vfio/platform: msi: add Broadcom platform devices

2021-01-20 Thread Auger Eric
Hi Alex,

On 1/19/21 11:45 PM, Alex Williamson wrote:
> On Fri, 15 Jan 2021 10:24:33 +0100
> Auger Eric  wrote:
> 
>> Hi Vikas,
>> On 1/15/21 7:35 AM, Vikas Gupta wrote:
>>> Hi Eric,
>>>
>>> On Tue, Jan 12, 2021 at 2:52 PM Auger Eric  wrote:  
>>>>
>>>> Hi Vikas,
>>>>
>>>> On 12/14/20 6:45 PM, Vikas Gupta wrote:  
>>>>> Add msi support for Broadcom platform devices
>>>>>
>>>>> Signed-off-by: Vikas Gupta 
>>>>> ---
>>>>>  drivers/vfio/platform/Kconfig |  1 +
>>>>>  drivers/vfio/platform/Makefile|  1 +
>>>>>  drivers/vfio/platform/msi/Kconfig |  9 
>>>>>  drivers/vfio/platform/msi/Makefile|  2 +
>>>>>  .../vfio/platform/msi/vfio_platform_bcmplt.c  | 49 +++
>>>>>  5 files changed, 62 insertions(+)
>>>>>  create mode 100644 drivers/vfio/platform/msi/Kconfig
>>>>>  create mode 100644 drivers/vfio/platform/msi/Makefile
>>>>>  create mode 100644 drivers/vfio/platform/msi/vfio_platform_bcmplt.c  
>>>> what does plt mean?  
>>> This (plt) is a generic name for Broadcom platform devices, which we
>>> plan to add to this file. Currently we have only one in this file.
>>> Do you think this name does not sound good here?
>>
>> We have the VFIO_PLATFORM_BCMFLEXRM_RESET config, which also applies to
>> the vfio flex-rm platform device.
>>
>> I think it would be more homogeneous to have VFIO_PLATFORM_BCMFLEXRM_MSI
>> in case we keep a separate msi module.
>>
>> Also, in the reset dir we have vfio_platform_bcmflexrm.c.
>>
>>
>>>>>
>>>>> diff --git a/drivers/vfio/platform/Kconfig b/drivers/vfio/platform/Kconfig
>>>>> index dc1a3c44f2c6..7b8696febe61 100644
>>>>> --- a/drivers/vfio/platform/Kconfig
>>>>> +++ b/drivers/vfio/platform/Kconfig
>>>>> @@ -21,3 +21,4 @@ config VFIO_AMBA
>>>>> If you don't know what to do here, say N.
>>>>>
>>>>>  source "drivers/vfio/platform/reset/Kconfig"
>>>>> +source "drivers/vfio/platform/msi/Kconfig"
>>>>> diff --git a/drivers/vfio/platform/Makefile 
>>>>> b/drivers/vfio/platform/Makefile
>>>>> index 3f3a24e7c4ef..9ccdcdbf0e7e 100644
>>>>> --- a/drivers/vfio/platform/Makefile
>>>>> +++ b/drivers/vfio/platform/Makefile
>>>>> @@ -5,6 +5,7 @@ vfio-platform-y := vfio_platform.o
>>>>>  obj-$(CONFIG_VFIO_PLATFORM) += vfio-platform.o
>>>>>  obj-$(CONFIG_VFIO_PLATFORM) += vfio-platform-base.o
>>>>>  obj-$(CONFIG_VFIO_PLATFORM) += reset/
>>>>> +obj-$(CONFIG_VFIO_PLATFORM) += msi/
>>>>>
>>>>>  vfio-amba-y := vfio_amba.o
>>>>>
>>>>> diff --git a/drivers/vfio/platform/msi/Kconfig 
>>>>> b/drivers/vfio/platform/msi/Kconfig
>>>>> new file mode 100644
>>>>> index ..54d6b70e1e32
>>>>> --- /dev/null
>>>>> +++ b/drivers/vfio/platform/msi/Kconfig
>>>>> @@ -0,0 +1,9 @@
>>>>> +# SPDX-License-Identifier: GPL-2.0-only
>>>>> +config VFIO_PLATFORM_BCMPLT_MSI
>>>>> + tristate "MSI support for Broadcom platform devices"
>>>>> + depends on VFIO_PLATFORM && (ARCH_BCM_IPROC || COMPILE_TEST)
>>>>> + default ARCH_BCM_IPROC
>>>>> + help
>>>>> +   Enables the VFIO platform driver to handle msi for Broadcom 
>>>>> devices
>>>>> +
>>>>> +   If you don't know what to do here, say N.
>>>>> diff --git a/drivers/vfio/platform/msi/Makefile 
>>>>> b/drivers/vfio/platform/msi/Makefile
>>>>> new file mode 100644
>>>>> index ..27422d45cecb
>>>>> --- /dev/null
>>>>> +++ b/drivers/vfio/platform/msi/Makefile
>>>>> @@ -0,0 +1,2 @@
>>>>> +# SPDX-License-Identifier: GPL-2.0
>>>>> +obj-$(CONFIG_VFIO_PLATFORM_BCMPLT_MSI) += vfio_platform_bcmplt.o
>>>>> diff --git a/drivers/vfio/platform/msi/vfio_platform_bcmplt.c 
>>>>> b/drivers/vfio/platform/msi/vfio_platform_bcmplt.c
>>>>> new file mode 100644
>>>>> index ..a074b5e92d77
>>>>> --- /dev/null
>>>>> +++ b/

Re: [PATCH RFC v1 00/15] iommu/virtio: Nested stage support with Arm

2021-01-19 Thread Auger Eric
Hi Vivek,

On 1/15/21 1:13 PM, Vivek Gautam wrote:
> This patch-series aims at enabling Nested stage translation in guests
> using virtio-iommu as the paravirtualized iommu. The backend is supported
> with Arm SMMU-v3 that provides nested stage-1 and stage-2 translation.
> 
> This series derives its purpose from various efforts happening to add
> support for Shared Virtual Addressing (SVA) in host and guest. On Arm,
> most of the support for SVA has already landed. The support for nested
> stage translation and fault reporting to guest has been proposed [1].
> The related changes required in VFIO [2] framework have also been put
> forward.
> 
> This series proposes changes in virtio-iommu to program PASID tables
> and related stage-1 page tables. A simple iommu-pasid-table library
> is added for this purpose that interacts with vendor drivers to
> allocate and populate PASID tables.
> In Arm SMMUv3 we propose to pull the Context Descriptor (CD) management
> code out of the arm-smmu-v3 driver and add that as a glue vendor layer
> to support allocating CD tables, and populating them with right values.
> These CD tables are essentially the PASID tables and contain stage-1
> page table configurations too.
> A request to set up these CD tables comes from the virtio-iommu driver using
> the iommu-pasid-table library when running on Arm. The virtio-iommu
> then passes these PASID tables to the host using the right virtio backend
> and support in VMM.
> 
> For testing we have added necessary support in kvmtool. The changes in
> kvmtool are based on virtio-iommu development branch by Jean-Philippe
> Brucker [3].
> 
> The tested kernel branch contains the following, in order from bottom
> to top of the git history:
> a) v5.11-rc3
> b) arm-smmu-v3 [1] and vfio [2] changes from Eric to add nested page
>    table support for Arm.
> c) Smmu test engine patches from Jean-Philippe's branch [4]
> d) This series
> e) Domain nesting info patches [5][6][7].
> f) Changes to add arm-smmu-v3 specific nesting info (to be sent to
>    the list).
> 
> This kernel is tested on Neoverse reference software stack with
> Fixed virtual platform. Public version of the software stack and
> FVP is available here[8][9].
> 
> A big thanks to Jean-Philippe for his contributions towards this work
> and for his valuable guidance.
> 
> [1] 
> https://lore.kernel.org/linux-iommu/20201118112151.25412-1-eric.au...@redhat.com/T/
> [2] 
> https://lore.kernel.org/kvmarm/20201116110030.32335-12-eric.au...@redhat.com/T/
> [3] https://jpbrucker.net/git/kvmtool/log/?h=virtio-iommu/devel
> [4] https://jpbrucker.net/git/linux/log/?h=sva/smmute
> [5] 
> https://lore.kernel.org/kvm/1599734733-6431-2-git-send-email-yi.l@intel.com/
> [6] 
> https://lore.kernel.org/kvm/1599734733-6431-3-git-send-email-yi.l@intel.com/
> [7] 
> https://lore.kernel.org/kvm/1599734733-6431-4-git-send-email-yi.l@intel.com/
> [8] 
> https://developer.arm.com/tools-and-software/open-source-software/arm-platforms-software/arm-ecosystem-fvps
> [9] 
> https://git.linaro.org/landing-teams/working/arm/arm-reference-platforms.git/about/docs/rdn1edge/user-guide.rst

Could you share a public branch where we could find all the kernel pieces?

Thank you in advance

Best Regards

Eric
> 
> Jean-Philippe Brucker (6):
>   iommu/virtio: Add headers for table format probing
>   iommu/virtio: Add table format probing
>   iommu/virtio: Add headers for binding pasid table in iommu
>   iommu/virtio: Add support for INVALIDATE request
>   iommu/virtio: Attach Arm PASID tables when available
>   iommu/virtio: Add support for Arm LPAE page table format
> 
> Vivek Gautam (9):
>   iommu/arm-smmu-v3: Create a Context Descriptor library
>   iommu: Add a simple PASID table library
>   iommu/arm-smmu-v3: Update drivers to work with iommu-pasid-table
>   iommu/arm-smmu-v3: Update CD base address info for user-space
>   iommu/arm-smmu-v3: Set sync op from consumer driver of cd-lib
>   iommu: Add asid_bits to arm smmu-v3 stage1 table info
>   iommu/virtio: Update table format probing header
>   iommu/virtio: Prepare to add attach pasid table infrastructure
>   iommu/virtio: Update fault type and reason info for viommu fault
> 
>  drivers/iommu/arm/arm-smmu-v3/Makefile|   2 +-
>  .../arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c  | 283 +++
>  .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |  16 +-
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 268 +--
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |   4 +-
>  drivers/iommu/iommu-pasid-table.h | 140 
>  drivers/iommu/virtio-iommu.c  | 692 +-
>  include/uapi/linux/iommu.h|   2 +-
>  include/uapi/linux/virtio_iommu.h | 158 +++-
>  9 files changed, 1303 insertions(+), 262 deletions(-)
>  create mode 100644 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c
>  create mode 100644 drivers/iommu/iommu-pasid-table.h
> 



Re: [RFC v3 1/2] vfio/platform: add support for msi

2021-01-15 Thread Auger Eric
Hi Vikas,

On 1/15/21 7:26 AM, Vikas Gupta wrote:
> Hi Eric,
> 
> On Tue, Jan 12, 2021 at 2:30 PM Auger Eric  wrote:
>>
>> Hi Vikas,
>>
>> On 1/5/21 6:53 AM, Vikas Gupta wrote:
>>> On Tue, Dec 22, 2020 at 10:57 PM Auger Eric  wrote:
>>>>
>>>> Hi Vikas,
>>>>
>>>> On 12/14/20 6:45 PM, Vikas Gupta wrote:
>>>>> MSI support for platform devices. The MSI block
>>>>> is added as an extended IRQ which exports caps
>>>>> VFIO_IRQ_INFO_CAP_TYPE and VFIO_IRQ_INFO_CAP_MSI_DESCS.
>>>>>
>>>>> Signed-off-by: Vikas Gupta 
>>>>> ---
>>>>>  drivers/vfio/platform/vfio_platform_common.c  | 179 +++-
>>>>>  drivers/vfio/platform/vfio_platform_irq.c | 260 +-
>>>>>  drivers/vfio/platform/vfio_platform_private.h |  32 +++
>>>>>  include/uapi/linux/vfio.h |  44 +++
>>>>>  4 files changed, 496 insertions(+), 19 deletions(-)
>>>>>
>>>>> diff --git a/drivers/vfio/platform/vfio_platform_common.c 
>>>>> b/drivers/vfio/platform/vfio_platform_common.c
>>>>> index fb4b385191f2..c936852f35d7 100644
>>>>> --- a/drivers/vfio/platform/vfio_platform_common.c
>>>>> +++ b/drivers/vfio/platform/vfio_platform_common.c
>>>>> @@ -16,6 +16,7 @@
>>>>>  #include 
>>>>>  #include 
>>>>>  #include 
>>>>> +#include 
>>>>>
>>>>>  #include "vfio_platform_private.h"
>>>>>
>>>>> @@ -26,6 +27,8 @@
>>>>>  #define VFIO_PLATFORM_IS_ACPI(vdev) ((vdev)->acpihid != NULL)
>>>>>
>>>>>  static LIST_HEAD(reset_list);
>>>>> +/* devices having MSI support */
>>>> nit: for devices using MSIs?
>>>>> +static LIST_HEAD(msi_list);
>>>>>  static DEFINE_MUTEX(driver_lock);
>>>>>
>>>>>  static vfio_platform_reset_fn_t vfio_platform_lookup_reset(const char 
>>>>> *compat,
>>>>> @@ -47,6 +50,25 @@ static vfio_platform_reset_fn_t 
>>>>> vfio_platform_lookup_reset(const char *compat,
>>>>>   return reset_fn;
>>>>>  }
>>>>>
>>>>> +static bool vfio_platform_lookup_msi(struct vfio_platform_device *vdev)
>>>>> +{
>>>>> + bool has_msi = false;
>>>>> + struct vfio_platform_msi_node *iter;
>>>>> +
>>>>> + mutex_lock(&driver_lock);
>>>>> + list_for_each_entry(iter, &msi_list, link) {
>>>>> + if (!strcmp(iter->compat, vdev->compat) &&
>>>>> + try_module_get(iter->owner)) {
>>>>> + vdev->msi_module = iter->owner;
>>>>> + vdev->of_get_msi = iter->of_get_msi;
>>>>> + has_msi = true;
>>>>> + break;
>>>>> + }
>>>>> + }
>>>>> + mutex_unlock(&driver_lock);
>>>>> + return has_msi;
>>>>> +}
>>>>> +
>>>>>  static int vfio_platform_acpi_probe(struct vfio_platform_device *vdev,
>>>>>   struct device *dev)
>>>>>  {
>>>>> @@ -126,6 +148,19 @@ static int vfio_platform_get_reset(struct 
>>>>> vfio_platform_device *vdev)
>>>>>   return vdev->of_reset ? 0 : -ENOENT;
>>>>>  }
>>>>>
>>>>> +static int vfio_platform_get_msi(struct vfio_platform_device *vdev)
>>>>> +{
>>>>> + bool has_msi;
>>>>> +
>>>>> + has_msi = vfio_platform_lookup_msi(vdev);
>>>>> + if (!has_msi) {
>>>>> + request_module("vfio-msi:%s", vdev->compat);
>>>>> + has_msi = vfio_platform_lookup_msi(vdev);
>>>>> + }
>>>>> +
>>>>> + return has_msi ? 0 : -ENOENT;
>>>>> +}
>>>>> +
>>>>>  static void vfio_platform_put_reset(struct vfio_platform_device *vdev)
>>>>>  {
>>>>>   if (VFIO_PLATFORM_IS_ACPI(vdev))
>>>>> @@ -135,6 +170,12 @@ static void vfio_platform_put_reset(struct 
>>>>> vfio_platform_device *vdev)
>>>>>   module_put(vdev->reset_module);
>
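
The lookup-then-load flow of vfio_platform_get_msi() quoted above can be
sketched in plain userspace C. This is only an illustration under stated
assumptions: struct msi_node, fake_request_module() and the compatible
string are hypothetical stand-ins, and the locking and try_module_get()
reference counting done by the patch are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct vfio_platform_msi_node. */
struct msi_node {
	const char *compat;
	struct msi_node *next;
};

/* List of registered MSI helpers (msi_list in the patch). */
static struct msi_node *msi_list;

/* What a helper module would do from its init routine. */
static void register_msi_node(struct msi_node *node)
{
	node->next = msi_list;
	msi_list = node;
}

/* Walk the list looking for a helper matching the compatible string. */
static bool lookup_msi(const char *compat)
{
	for (struct msi_node *it = msi_list; it; it = it->next)
		if (!strcmp(it->compat, compat))
			return true;
	return false;
}

/* Example helper; the compatible string is only an illustration. */
static struct msi_node flexrm_node = { .compat = "brcm,iproc-flexrm-mbox" };

/* Hypothetical stand-in for request_module(): "loads" the one known helper. */
static void fake_request_module(const char *compat)
{
	if (!strcmp(compat, flexrm_node.compat) && !lookup_msi(compat))
		register_msi_node(&flexrm_node);
}

/* Same shape as vfio_platform_get_msi(): look up, load on miss, look up again. */
static int get_msi(const char *compat)
{
	bool has_msi = lookup_msi(compat);

	if (!has_msi) {
		fake_request_module(compat);
		has_msi = lookup_msi(compat);
	}
	return has_msi ? 0 : -ENOENT;
}
```

The second lookup is what lets a helper that was not yet loaded be picked
up on first use; a device with no matching helper still ends with -ENOENT.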

Re: [RFC v3 2/2] vfio/platform: msi: add Broadcom platform devices

2021-01-15 Thread Auger Eric
Hi Vikas,
On 1/15/21 7:35 AM, Vikas Gupta wrote:
> Hi Eric,
> 
> On Tue, Jan 12, 2021 at 2:52 PM Auger Eric  wrote:
>>
>> Hi Vikas,
>>
>> On 12/14/20 6:45 PM, Vikas Gupta wrote:
>>> Add msi support for Broadcom platform devices
>>>
>>> Signed-off-by: Vikas Gupta 
>>> ---
>>>  drivers/vfio/platform/Kconfig |  1 +
>>>  drivers/vfio/platform/Makefile|  1 +
>>>  drivers/vfio/platform/msi/Kconfig |  9 
>>>  drivers/vfio/platform/msi/Makefile|  2 +
>>>  .../vfio/platform/msi/vfio_platform_bcmplt.c  | 49 +++
>>>  5 files changed, 62 insertions(+)
>>>  create mode 100644 drivers/vfio/platform/msi/Kconfig
>>>  create mode 100644 drivers/vfio/platform/msi/Makefile
>>>  create mode 100644 drivers/vfio/platform/msi/vfio_platform_bcmplt.c
>> what does plt mean?
> This (plt) is a generic name for Broadcom platform devices, which we
> plan to add to this file. Currently we have only one in this file.
> Do you think this name does not sound good here?

We have the VFIO_PLATFORM_BCMFLEXRM_RESET config, which also applies to
the vfio flex-rm platform device.

I think it would be more homogeneous to have VFIO_PLATFORM_BCMFLEXRM_MSI
in case we keep a separate msi module.

Also, in the reset dir we have vfio_platform_bcmflexrm.c.


>>>
>>> diff --git a/drivers/vfio/platform/Kconfig b/drivers/vfio/platform/Kconfig
>>> index dc1a3c44f2c6..7b8696febe61 100644
>>> --- a/drivers/vfio/platform/Kconfig
>>> +++ b/drivers/vfio/platform/Kconfig
>>> @@ -21,3 +21,4 @@ config VFIO_AMBA
>>> If you don't know what to do here, say N.
>>>
>>>  source "drivers/vfio/platform/reset/Kconfig"
>>> +source "drivers/vfio/platform/msi/Kconfig"
>>> diff --git a/drivers/vfio/platform/Makefile b/drivers/vfio/platform/Makefile
>>> index 3f3a24e7c4ef..9ccdcdbf0e7e 100644
>>> --- a/drivers/vfio/platform/Makefile
>>> +++ b/drivers/vfio/platform/Makefile
>>> @@ -5,6 +5,7 @@ vfio-platform-y := vfio_platform.o
>>>  obj-$(CONFIG_VFIO_PLATFORM) += vfio-platform.o
>>>  obj-$(CONFIG_VFIO_PLATFORM) += vfio-platform-base.o
>>>  obj-$(CONFIG_VFIO_PLATFORM) += reset/
>>> +obj-$(CONFIG_VFIO_PLATFORM) += msi/
>>>
>>>  vfio-amba-y := vfio_amba.o
>>>
>>> diff --git a/drivers/vfio/platform/msi/Kconfig 
>>> b/drivers/vfio/platform/msi/Kconfig
>>> new file mode 100644
>>> index ..54d6b70e1e32
>>> --- /dev/null
>>> +++ b/drivers/vfio/platform/msi/Kconfig
>>> @@ -0,0 +1,9 @@
>>> +# SPDX-License-Identifier: GPL-2.0-only
>>> +config VFIO_PLATFORM_BCMPLT_MSI
>>> + tristate "MSI support for Broadcom platform devices"
>>> + depends on VFIO_PLATFORM && (ARCH_BCM_IPROC || COMPILE_TEST)
>>> + default ARCH_BCM_IPROC
>>> + help
>>> +   Enables the VFIO platform driver to handle msi for Broadcom devices
>>> +
>>> +   If you don't know what to do here, say N.
>>> diff --git a/drivers/vfio/platform/msi/Makefile 
>>> b/drivers/vfio/platform/msi/Makefile
>>> new file mode 100644
>>> index ..27422d45cecb
>>> --- /dev/null
>>> +++ b/drivers/vfio/platform/msi/Makefile
>>> @@ -0,0 +1,2 @@
>>> +# SPDX-License-Identifier: GPL-2.0
>>> +obj-$(CONFIG_VFIO_PLATFORM_BCMPLT_MSI) += vfio_platform_bcmplt.o
>>> diff --git a/drivers/vfio/platform/msi/vfio_platform_bcmplt.c 
>>> b/drivers/vfio/platform/msi/vfio_platform_bcmplt.c
>>> new file mode 100644
>>> index ..a074b5e92d77
>>> --- /dev/null
>>> +++ b/drivers/vfio/platform/msi/vfio_platform_bcmplt.c
>>> @@ -0,0 +1,49 @@
>>> +// SPDX-License-Identifier: GPL-2.0
>>> +/*
>>> + * Copyright 2020 Broadcom.
>>> + */
>>> +
>>> +#include 
>>> +#include 
>>> +#include 
>>> +#include 
>>> +#include 
>>> +
>>> +#include "../vfio_platform_private.h"
>>> +
>>> +#define RING_SIZE (64 << 10)
>>> +
>>> +#define RING_MSI_ADDR_LS 0x03c
>>> +#define RING_MSI_ADDR_MS 0x040
>>> +#define RING_MSI_DATA_VALUE  0x064
>> Those 3 defines would not be needed anymore with that implementation option.
>>> +
>>> +static u32 bcm_num_msi(struct vfio_platform_device *vdev)
>>> +{
>>> + struct vfio_platform_region *reg = &vdev->regions[0];
>>

Re: [PATCH v13 07/15] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs

2021-01-14 Thread Auger Eric
Hi Jean,

On 1/14/21 6:33 PM, Jean-Philippe Brucker wrote:
> Hi Eric,
> 
> On Thu, Jan 14, 2021 at 05:58:27PM +0100, Auger Eric wrote:
>>>>  The uacce-devel branches from
>>>>> https://github.com/Linaro/linux-kernel-uadk do provide this at the moment
>>>>> (they track the latest sva/zip-devel branch
>>>>> https://jpbrucker.net/git/linux/ which is roughly based on mainline.)
>> As I plan to respin shortly, could you please confirm that the best
>> branch to rebase on is still that one (uacce-devel from the
>> linux-kernel-uadk git repo)? Is it up to date? The commits there seem
>> quite old.
> 
> Right I meant the uacce-devel-X branches. The uacce-devel-5.11 branch
> currently has the latest patches

OK thanks!

Eric
> 
> Thanks,
> Jean
> 



Re: [PATCH v13 07/15] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs

2021-01-14 Thread Auger Eric
Hi Shameer, Jean-Philippe,

On 12/4/20 11:23 AM, Auger Eric wrote:
> Hi Shameer, Jean-Philippe,
> 
> On 12/4/20 11:20 AM, Shameerali Kolothum Thodi wrote:
>> Hi Jean,
>>
>>> -Original Message-
>>> From: Jean-Philippe Brucker [mailto:jean-phili...@linaro.org]
>>> Sent: 04 December 2020 09:54
>>> To: Shameerali Kolothum Thodi 
>>> Cc: Auger Eric ; wangxingang
>>> ; Xieyingtai ;
>>> k...@vger.kernel.org; m...@kernel.org; j...@8bytes.org; w...@kernel.org;
>>> io...@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
>>> vivek.gau...@arm.com; alex.william...@redhat.com;
>>> zhangfei@linaro.org; robin.mur...@arm.com;
>>> kvm...@lists.cs.columbia.edu; eric.auger@gmail.com; Zengtao (B)
>>> ; qubingbing 
>>> Subject: Re: [PATCH v13 07/15] iommu/smmuv3: Allow stage 1 invalidation with
>>> unmanaged ASIDs
>>>
>>> Hi Shameer,
>>>
>>> On Thu, Dec 03, 2020 at 06:42:57PM +, Shameerali Kolothum Thodi wrote:
>>>> Hi Jean/zhangfei,
>>>> Is it possible to have a branch with minimum required SVA/UACCE related
>>> patches
>>>> that are already public and can be a "stable" candidate for future respin 
>>>> of
>>> Eric's series?
>>>> Please share your thoughts.
>>>
>>> By "stable" you mean a fixed branch with the latest SVA/UACCE patches
>>> based on mainline? 
>>
>> Yes. 
>>
>>  The uacce-devel branches from
>>> https://github.com/Linaro/linux-kernel-uadk do provide this at the moment
>>> (they track the latest sva/zip-devel branch
>>> https://jpbrucker.net/git/linux/ which is roughly based on mainline.)
As I plan to respin shortly, could you please confirm that the best
branch to rebase on is still that one (uacce-devel from the
linux-kernel-uadk git repo)? Is it up to date? The commits there seem
quite old.

Thanks

Eric
>>
>> Thanks. 
>>
>> Hi Eric,
>>
>> Could you please take a look at the above branches and see whether it make 
>> sense
>> to rebase on top of either of those?
>>
>> From vSVA point of view, it will be less rebase hassle if we can do that.
> 
> Sure. I will rebase on top of this ;-)
> 
> Thanks
> 
> Eric
>>
>> Thanks,
>> Shameer
>>
>>> Thanks,
>>> Jean
>>
> 
> ___
> iommu mailing list
> io...@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu
> 



Re: [PATCH 8/9] KVM: arm64: vgic-v3: Expose GICR_TYPER.Last for userspace

2021-01-14 Thread Auger Eric
Hi Alexandru,

On 1/12/21 6:02 PM, Alexandru Elisei wrote:
> Hi Eric,
> 
> On 12/12/20 6:50 PM, Eric Auger wrote:
>> Commit 23bde34771f1 ("KVM: arm64: vgic-v3: Drop the
>> reporting of GICR_TYPER.Last for userspace") temporarily fixed
>> a bug identified when attempting to access the GICR_TYPER
>> register before the redistributor region setting but dropped
>> the support of the LAST bit. This patch restores its
>> support (if the redistributor region was set) while keeping the
>> code safe.
> 
> I suppose the reason for emulating GICR_TYPER.Last is for architecture 
> compliance,
> right? I think that should be in the commit message.
OK, I added this to the commit msg.
> 
>>
>> Signed-off-by: Eric Auger 
>> ---
>>  arch/arm64/kvm/vgic/vgic-mmio-v3.c | 7 ++-
>>  include/kvm/arm_vgic.h | 1 +
>>  2 files changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/kvm/vgic/vgic-mmio-v3.c 
>> b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
>> index 581f0f49..2f9ef6058f6e 100644
>> --- a/arch/arm64/kvm/vgic/vgic-mmio-v3.c
>> +++ b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
>> @@ -277,6 +277,8 @@ static unsigned long vgic_uaccess_read_v3r_typer(struct 
>> kvm_vcpu *vcpu,
>>   gpa_t addr, unsigned int len)
>>  {
>>  unsigned long mpidr = kvm_vcpu_get_mpidr_aff(vcpu);
>> +struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
>> +struct vgic_redist_region *rdreg = vgic_cpu->rdreg;
>>  int target_vcpu_id = vcpu->vcpu_id;
>>  u64 value;
>>  
>> @@ -286,7 +288,9 @@ static unsigned long vgic_uaccess_read_v3r_typer(struct 
>> kvm_vcpu *vcpu,
>>  if (vgic_has_its(vcpu->kvm))
>>  value |= GICR_TYPER_PLPIS;
>>  
>> -/* reporting of the Last bit is not supported for userspace */
>> +if (rdreg && (vgic_cpu->rdreg_index == (rdreg->free_index - 1)))
>> +value |= GICR_TYPER_LAST;
>> +
>>  return extract_bytes(value, addr & 7, len);
>>  }
>>  
>> @@ -714,6 +718,7 @@ int vgic_register_redist_iodev(struct kvm_vcpu *vcpu)
>>  return -EINVAL;
>>  
>>  vgic_cpu->rdreg = rdreg;
>> +vgic_cpu->rdreg_index = rdreg->free_index;
> 
> What happens if the next redistributor region we register has the base address
> adjacent to this one?
> 
> I'm really not familiar with the code, but is it not possible to create two
> Redistributor regions (via
> KVM_DEV_ARM_VGIC_GRP_ADDR(KVM_VGIC_V3_ADDR_TYPE_REDIST)) where the second
> Redistributor region start address is immediately after the last 
> Redistributor in
> the preceding region?
KVM_VGIC_V3_ADDR_TYPE_REDIST only allows creating a single rdist
region. Only KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION allows registering
several of them.

With KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION, it is possible to register
adjacent rdist regions. vgic_v3_rdist_free_slot() previously returned
the 1st rdist region where enough space remains for inserting the new
reg. We put the rdist at the free index there.

But maybe I misunderstood your question?

Thanks

Eric
> 
> Thanks,
> Alex
>>  
>>  rd_base = rdreg->base + rdreg->free_index * KVM_VGIC_V3_REDIST_SIZE;
>>  
>> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
>> index a8d8fdcd3723..596c069263a7 100644
>> --- a/include/kvm/arm_vgic.h
>> +++ b/include/kvm/arm_vgic.h
>> @@ -322,6 +322,7 @@ struct vgic_cpu {
>>   */
>>  struct vgic_io_device   rd_iodev;
>>  struct vgic_redist_region *rdreg;
>> +u32 rdreg_index;
>>  
>>  /* Contains the attributes and gpa of the LPI pending tables. */
>>  u64 pendbaser;
> 
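
The GICR_TYPER.Last check from the patch above can be modelled in a few
lines of standalone C. This is a sketch, not the kernel code: the structs
are trimmed-down hypothetical versions of vgic_redist_region and vgic_cpu,
and register_redist() assumes that free_index is bumped once per registered
redistributor, which the quoted hunks do not show.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Trimmed-down stand-in for struct vgic_redist_region. */
struct redist_region {
	uint32_t free_index;	/* next unused redistributor slot */
};

/* Trimmed-down stand-in for the fields added to struct vgic_cpu. */
struct vcpu_gic {
	struct redist_region *rdreg;	/* NULL until a region is assigned */
	uint32_t rdreg_index;		/* slot taken at registration time */
};

/* Record the slot the vCPU gets, then consume it (assumed behaviour). */
static void register_redist(struct vcpu_gic *cpu, struct redist_region *r)
{
	cpu->rdreg = r;
	cpu->rdreg_index = r->free_index;
	r->free_index++;
}

/* Same condition as the patch: Last is set for the most recent slot only. */
static bool typer_last(const struct vcpu_gic *cpu)
{
	return cpu->rdreg && cpu->rdreg_index == cpu->rdreg->free_index - 1;
}
```

Note that this only compares slots within one region. It says nothing
about a second region placed right after the first one, which is the case
raised in the review.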



Re: [PATCH 5/9] KVM: arm: move has_run_once after the map_resources

2021-01-14 Thread Auger Eric
Hi Alexandru,

On 1/12/21 3:55 PM, Alexandru Elisei wrote:
> Hi Eric,
> 
> On 12/12/20 6:50 PM, Eric Auger wrote:
>> has_run_once is set to true at the beginning of
>> kvm_vcpu_first_run_init(). This generally is not an issue
>> except when exercising the code with KVM selftests. Indeed,
>> if kvm_vgic_map_resources() fails due to erroneous user settings,
>> has_run_once is set and this prevents from continuing
>> executing the test. This patch moves the assignment after the
>> kvm_vgic_map_resources().
>>
>> Signed-off-by: Eric Auger 
>> ---
>>  arch/arm64/kvm/arm.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
>> index c0ffb019ca8b..331fae6bff31 100644
>> --- a/arch/arm64/kvm/arm.c
>> +++ b/arch/arm64/kvm/arm.c
>> @@ -540,8 +540,6 @@ static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
>>  if (!kvm_arm_vcpu_is_finalized(vcpu))
>>  return -EPERM;
>>  
>> -vcpu->arch.has_run_once = true;
>> -
>>  if (likely(irqchip_in_kernel(kvm))) {
>>  /*
>>   * Map the VGIC hardware resources before running a vcpu the
>> @@ -560,6 +558,8 @@ static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
>>  static_branch_inc(&userspace_irqchip_in_use);
>>  }
>>  
>> +vcpu->arch.has_run_once = true;
> 
> I have a few concerns regarding this:
> 
> 1. Moving has_run_once = true here seems very arbitrary to me - 
> kvm_timer_enable()
> and kvm_arm_pmu_v3_enable(), below it, can both fail because of erroneous user
> values. If there's a reason why the assignment cannot be moved at the end of 
> the
> function, I think it should be clearly stated in a comment for the people who
> might be tempted to write similar tests for the timer or pmu.

Setting has_run_once = true at the entry of the function looks even more
arbitrary to me. I agree with you that eventually has_run_once may
be moved to the very end, but maybe this can be done later, once the
timer and pmu tests have been written.
> 
> 2. There are many ways that kvm_vgic_map_resources() can fail, other than
> incorrect user settings. I started digging into how
> kvm_vgic_map_resources()->vgic_v2_map_resources() can fail for a VGIC V2 and 
> this
> is what I managed to find before I gave up:
> 
> * vgic_init() can fail in:
>     - kvm_vgic_dist_init()
>     - vgic_v3_init()
>     - kvm_vgic_setup_default_irq_routing()
> * vgic_register_dist_iodev() can fail in:
>     - vgic_v3_init_dist_iodev()
>     - kvm_io_bus_register_dev()(*)
> * kvm_phys_addr_ioremap() can fail in:
>     - kvm_mmu_topup_memory_cache()
>     - kvm_pgtable_stage2_map()

I changed the commit msg so that "incorrect user settings" reads as an
example.
> 
> So if any of the functions below fail, are we 100% sure it is safe to allow 
> the
> user to execute kvm_vgic_map_resources() again?

I think additional tests will confirm this. However, at the moment,
moving the assignment, which does not look wrong to me, allows the
tests to be greatly simplified, so I would tend to say that it is worth it.
> 
> (*) It looks to me like kvm_io_bus_register_dev() doesn't take into account a
> caller that tries to register the same device address range and it will create
> another identical range. Is this intentional? Is it a bug that should be 
> fixed? Or
> am I misunderstanding the function?

Doesn't kvm_io_bus_cmp() do the check?

Thanks

Eric
> 
> Thanks,
> Alex
>> +
>>  ret = kvm_timer_enable(vcpu);
>>  if (ret)
>>  return ret;
> 
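
Why the position of the has_run_once assignment matters for the selftests
can be shown with a small standalone model. It is a sketch under
assumptions: struct vcpu, bad_user_config and map_resources() are
hypothetical, and the early return on has_run_once is assumed to match
what kvm_vcpu_first_run_init() does before the quoted hunk.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, minimal model of the per-vCPU state involved. */
struct vcpu {
	bool has_run_once;
	bool bad_user_config;	/* makes map_resources() fail */
};

/* Stand-in for kvm_vgic_map_resources(). */
static int map_resources(const struct vcpu *v)
{
	return v->bad_user_config ? -EINVAL : 0;
}

/* Ordering proposed by the patch: set the flag only after the mapping. */
static int first_run_init(struct vcpu *v)
{
	int ret;

	if (v->has_run_once)
		return 0;	/* init already done, nothing to retry */

	ret = map_resources(v);
	if (ret)
		return ret;	/* flag still clear: a later call retries */

	v->has_run_once = true;
	return 0;
}
```

With the flag set at function entry instead, the second call would return
0 without ever mapping the resources, so a test could not fix its settings
and continue.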



Re: [PATCH 1/9] KVM: arm64: vgic-v3: Fix some error codes when setting RDIST base

2021-01-14 Thread Auger Eric
Hi Alexandru,

On 1/6/21 5:32 PM, Alexandru Elisei wrote:
> Hi Eric,
> 
> On 12/12/20 6:50 PM, Eric Auger wrote:
>> KVM_DEV_ARM_VGIC_GRP_ADDR group doc says we should return
>> -EEXIST in case the base address of the redist is already set.
>> We currently return -EINVAL.
>>
>> However we need to return -EINVAL in case a legacy REDIST address
>> is attempted to be set while REDIST_REGIONS were set. This case
>> is discriminated by looking at the count field.
>>
>> Signed-off-by: Eric Auger 
>> ---
>>  arch/arm64/kvm/vgic/vgic-mmio-v3.c | 9 +++--
>>  1 file changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/arm64/kvm/vgic/vgic-mmio-v3.c 
>> b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
>> index 15a6c98ee92f..8e8a862def76 100644
>> --- a/arch/arm64/kvm/vgic/vgic-mmio-v3.c
>> +++ b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
>> @@ -792,8 +792,13 @@ static int vgic_v3_insert_redist_region(struct kvm 
>> *kvm, uint32_t index,
>>  int ret;
>>  
>>  /* single rdist region already set ?*/
>> -if (!count && !list_empty(rd_regions))
>> -return -EINVAL;
>> +if (!count && !list_empty(rd_regions)) {
>> +rdreg = list_last_entry(rd_regions,
>> +   struct vgic_redist_region, list);
>> +if (rdreg->count)
>> +return -EINVAL; /* Mixing REDIST and REDIST_REGION API 
>> */
>> +return -EEXIST;
>> +}
> 
> A few instructions below:
> 
>     if (list_empty(rd_regions)) {
>         [..]
>     } else {
>         rdreg = list_last_entry(rd_regions,
>                     struct vgic_redist_region, list);
>         [..]
> 
>         /* Cannot add an explicitly sized regions after legacy region */
>         if (!rdreg->count)
>             return -EINVAL;
>     }
> 
> Isn't this testing for the same thing, but using the opposite condition? Or 
> am I
> misunderstanding the code (quite likely)?
The 1st test sequence handles the case where the legacy
KVM_VGIC_V3_ADDR_TYPE_REDIST is used (!count), while the second handles
the case where REDIST_REGION is used. Nevertheless, I think this can
be simplified into:

if (list_empty(rd_regions)) {
	if (index != 0)
		return -EINVAL;
} else {
	rdreg = list_last_entry(rd_regions,
				struct vgic_redist_region, list);

	if ((!count) != (!rdreg->count))
		return -EINVAL; /* Mix REDIST and REDIST_REGION */

	if (!count)
		return -EEXIST;

	if (index != rdreg->index + 1)
		return -EINVAL;
}

> 
> Looks to me like 
> KVM_DEV_ARM_VGIC_GRP_ADDR(KVM_VGIC_V3_ADDR_TYPE_REDIST{,_REGION})
> used to return -EEXIST (from vgic_check_ioaddr()) before commit ccc27bf5be7b7
> ("KVM: arm/arm64: Helper to register a new redistributor region") which added 
> the
> vgic_v3_insert_redist_region() function, so bringing back the -EEXIST return 
> code
> looks the right thing to me.

OK thank you for the detailed study.

Eric
> 
> Thanks,
> Alex
>>  
>>  /* cross the end of memory ? */
>>  if (base + size < base)
> 
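
The simplified check proposed in this reply can be exercised on its own to
see which error code each case yields. The sketch below is a standalone
model: struct last_region is a hypothetical summary of the last entry of
rd_regions (present is false when the list is empty), and
check_new_region() is an assumed name, not a kernel function.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of the last registered redistributor region. */
struct last_region {
	bool present;	/* false when no region is registered yet */
	uint32_t index;
	uint32_t count;	/* 0 means the legacy single-region API was used */
};

/* Mirrors the simplified sequence: 0 if the new region may be inserted. */
static int check_new_region(const struct last_region *last,
			    uint32_t index, uint32_t count)
{
	if (!last->present)
		return index != 0 ? -EINVAL : 0;

	/* Mixing the legacy REDIST and the REDIST_REGION APIs. */
	if ((!count) != (!last->count))
		return -EINVAL;

	/* Legacy base address already set. */
	if (!count)
		return -EEXIST;

	/* Region indexes must be contiguous. */
	if (index != last->index + 1)
		return -EINVAL;

	return 0;
}
```

Setting the legacy address twice gives -EEXIST, while any mix of the two
APIs gives -EINVAL, which is the behaviour described in the commit
message.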



Re: [PATCH 3/9] KVM: arm64: vgic-v3: Fix error handling in vgic_v3_set_redist_base()

2021-01-13 Thread Auger Eric
Hi Marc,

On 12/28/20 4:35 PM, Marc Zyngier wrote:
> Hi Eric,
> 
> On Sat, 12 Dec 2020 18:50:04 +,
> Eric Auger  wrote:
>>
>> vgic_register_all_redist_iodevs may succeed while
>> vgic_register_all_redist_iodevs fails. For example this can happen
> 
> The same function cannot both fail and succeed ;-) Can you shed some
> light on what you had in mind?

Damn, I meant vgic_v3_insert_redist_region() can be successful and then
vgic_register_all_redist_iodevs() fails due to detection of overlap.
> 
>> while adding a redistributor region overlapping a dist region. The
>> failure only is detected on vgic_register_all_redist_iodevs when
>> vgic_v3_check_base() gets called.
>>
>> In such a case, remove the newly added redistributor region and free
>> it.
>>
>> Signed-off-by: Eric Auger 
>> ---
>>  arch/arm64/kvm/vgic/vgic-mmio-v3.c | 8 +++-
>>  1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/kvm/vgic/vgic-mmio-v3.c 
>> b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
>> index 8e8a862def76..581f0f49 100644
>> --- a/arch/arm64/kvm/vgic/vgic-mmio-v3.c
>> +++ b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
>> @@ -866,8 +866,14 @@ int vgic_v3_set_redist_base(struct kvm *kvm, u32 index, 
>> u64 addr, u32 count)
>>   * afterwards will register the iodevs when needed.
>>   */
>>  ret = vgic_register_all_redist_iodevs(kvm);
>> -if (ret)
>> +if (ret) {
>> +struct vgic_redist_region *rdreg =
>> +vgic_v3_rdist_region_from_index(kvm, index);
>> +
> 
> nit: consider splitting declaration and assignment so that we avoid
> the line split if you insist on the 80 character limit.
Sure

Thanks

Eric
> 
>> +list_del(&rdreg->list);
>> +kfree(rdreg);
>>  return ret;
>> +}
>>  
>>  return 0;
>>  }
>> -- 
>> 2.21.3
>>
>>
> 
> Thanks,
> 
>   M.
> 
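
The insert-then-roll-back pattern of this patch can be sketched with a
plain singly linked list. Everything below is an assumption made for
illustration: the struct, the overlaps_dist flag standing in for the
overlap detected by vgic_v3_check_base(), and the helper names.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct vgic_redist_region. */
struct region {
	struct region *next;
	unsigned int index;
	bool overlaps_dist;	/* models an overlap found at registration */
};

static struct region *regions;

/* First step: add the region to the list (can only fail on allocation). */
static int insert_region(unsigned int index, bool overlaps_dist)
{
	struct region *r = malloc(sizeof(*r));

	if (!r)
		return -ENOMEM;
	r->index = index;
	r->overlaps_dist = overlaps_dist;
	r->next = regions;
	regions = r;
	return 0;
}

/* Second step: fails if any listed region overlaps the distributor. */
static int register_all_iodevs(void)
{
	for (const struct region *r = regions; r; r = r->next)
		if (r->overlaps_dist)
			return -EINVAL;
	return 0;
}

static struct region *region_from_index(unsigned int index)
{
	for (struct region *r = regions; r; r = r->next)
		if (r->index == index)
			return r;
	return NULL;
}

static void unlink_region(struct region *victim)
{
	struct region **pp = &regions;

	while (*pp && *pp != victim)
		pp = &(*pp)->next;
	if (*pp)
		*pp = victim->next;
}

static unsigned int region_count(void)
{
	unsigned int n = 0;

	for (const struct region *r = regions; r; r = r->next)
		n++;
	return n;
}

/* Same shape as the patched function: undo the insert if step two fails. */
static int set_redist_base(unsigned int index, bool overlaps_dist)
{
	struct region *r;
	int ret;

	ret = insert_region(index, overlaps_dist);
	if (ret)
		return ret;

	ret = register_all_iodevs();
	if (ret) {
		r = region_from_index(index);
		unlink_region(r);
		free(r);
		return ret;
	}
	return 0;
}
```

Without the rollback, the failed region would stay in the list and every
later registration attempt would keep failing on it.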



Re: [PATCH 6/9] docs: kvm: devices/arm-vgic-v3: enhance KVM_DEV_ARM_VGIC_CTRL_INIT doc

2021-01-13 Thread Auger Eric
Hi Alexandru,

On 1/12/21 4:39 PM, Alexandru Elisei wrote:
> Hi Eric,
> 
> On 12/12/20 6:50 PM, Eric Auger wrote:
>> kvm_arch_vcpu_precreate() returns -EBUSY if the vgic is
>> already initialized. So let's document that KVM_DEV_ARM_VGIC_CTRL_INIT
>> must be called after all vcpu creations.
> 
> Checked and this is indeed the case,
> kvm_vm_ioctl_create_vcpu()->kvm_arch_vcpu_precreate() returns -EBUSY if
> vgic_initialized() is true.
thanks!
> 
>>
>> Signed-off-by: Eric Auger 
>> ---
>>  Documentation/virt/kvm/devices/arm-vgic-v3.rst | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/Documentation/virt/kvm/devices/arm-vgic-v3.rst 
>> b/Documentation/virt/kvm/devices/arm-vgic-v3.rst
>> index 5dd3bff51978..322de6aebdec 100644
>> --- a/Documentation/virt/kvm/devices/arm-vgic-v3.rst
>> +++ b/Documentation/virt/kvm/devices/arm-vgic-v3.rst
>> @@ -228,7 +228,7 @@ Groups:
>>  
>>  KVM_DEV_ARM_VGIC_CTRL_INIT
>>request the initialization of the VGIC, no additional parameter in
>> -  kvm_device_attr.addr.
>> +  kvm_device_attr.addr. Must be called after all vcpu creations.
> 
> Nitpick here: the document writes VCPU with all caps. This also sounds a bit
> weird, I think something like "Must be called after all VCPUs have been 
> created"
> is clearer.
I took your suggestion.

Thanks

Eric
> 
> Thanks,
> Alex
>>  KVM_DEV_ARM_VGIC_SAVE_PENDING_TABLES
>>save all LPI pending bits into guest RAM pending tables.
>>  
> 
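
The ordering constraint documented by this patch can be modeled with a toy state machine. This is a hedged userspace sketch, not the KVM code: struct toy_vm and the toy_* functions are hypothetical, and only the behavior stated in the thread (VCPU creation returns -EBUSY once the VGIC is initialized) is modeled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the two bits of VM state that matter here. */
struct toy_vm {
	int nr_vcpus;
	bool vgic_initialized;
};

/* Mirrors the precreate check: no new VCPU once the VGIC is initialized. */
static int toy_create_vcpu(struct toy_vm *vm)
{
	if (vm->vgic_initialized)
		return -EBUSY;
	vm->nr_vcpus++;
	return 0;
}

/* Stand-in for KVM_DEV_ARM_VGIC_CTRL_INIT. */
static void toy_vgic_ctrl_init(struct toy_vm *vm)
{
	vm->vgic_initialized = true;
}

/* Correct ordering: create every VCPU first, then initialize the VGIC. */
static void toy_bring_up(struct toy_vm *vm, int nr_vcpus)
{
	for (int i = 0; i < nr_vcpus; i++) {
		int ret = toy_create_vcpu(vm);

		assert(ret == 0);
		(void)ret;
	}
	toy_vgic_ctrl_init(vm);
}
```

Any toy_create_vcpu() call issued after toy_bring_up() fails with -EBUSY and leaves the VCPU count unchanged.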



Re: [PATCH 4/9] KVM: arm/arm64: vgic: Reset base address on kvm_vgic_dist_destroy()

2021-01-13 Thread Auger Eric
Hi Marc,

On 12/28/20 4:41 PM, Marc Zyngier wrote:
> On Sat, 12 Dec 2020 18:50:05 +,
> Eric Auger  wrote:
>>
>> On vgic_dist_destroy(), the addresses are not reset. However for
>> kvm selftest purpose this would allow to continue the test execution
>> even after a failure when running KVM_RUN. So let's reset the
>> base addresses.
>>
>> Signed-off-by: Eric Auger 
>> ---
>>  arch/arm64/kvm/vgic/vgic-init.c | 7 +--
>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/arm64/kvm/vgic/vgic-init.c 
>> b/arch/arm64/kvm/vgic/vgic-init.c
>> index 32e32d67a127..6147bed56b1b 100644
>> --- a/arch/arm64/kvm/vgic/vgic-init.c
>> +++ b/arch/arm64/kvm/vgic/vgic-init.c
>> @@ -335,14 +335,16 @@ static void kvm_vgic_dist_destroy(struct kvm *kvm)
>>  kfree(dist->spis);
>>  dist->spis = NULL;
>>  dist->nr_spis = 0;
>> +dist->vgic_dist_base = VGIC_ADDR_UNDEF;
>>  
>> -if (kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3) {
>> +if (dist->vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3) {
>>  list_for_each_entry_safe(rdreg, next, &dist->rd_regions, list) {
>>  list_del(&rdreg->list);
>>  kfree(rdreg);
>>  }
>>  INIT_LIST_HEAD(&dist->rd_regions);
>> -}
>> +} else
>> +kvm->arch.vgic.vgic_cpu_base = VGIC_ADDR_UNDEF;
> 
> Since you have converted the hunk above to use dist->, you could do
> the same thing here. And the coding style dictates that you need {} on
> the else side as well.
sure

Thanks

Eric
> 
>>
>>  if (vgic_has_its(kvm))
>>  vgic_lpi_translation_cache_destroy(kvm);
>> @@ -362,6 +364,7 @@ void kvm_vgic_vcpu_destroy(struct kvm_vcpu *vcpu)
>>  vgic_flush_pending_lpis(vcpu);
>>  
>>  INIT_LIST_HEAD(&vgic_cpu->ap_list_head);
>> +vgic_cpu->rd_iodev.base_addr = VGIC_ADDR_UNDEF;
>>  }
>>  
>>  /* To be called with kvm->lock held */
>> -- 
>> 2.21.3
>>
>>
> 
> Thanks,
> 
>   M.
> 



Re: [PATCH 2/9] KVM: arm64: Fix KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION read

2021-01-13 Thread Auger Eric
Hi Alexandru,

On 1/6/21 6:12 PM, Alexandru Elisei wrote:
> Hi Eric,
> 
> The patch looks correct to me. kvm_vgic_addr() masks out all the bits except 
> index
> from addr, so we don't need to do it in vgic_get_common_attr():
> 
> Reviewed-by: Alexandru Elisei 
> 
> One nitpick below.
> 
> On 12/12/20 6:50 PM, Eric Auger wrote:
>> The doc says:
>> "The characteristics of a specific redistributor region can
>>  be read by presetting the index field in the attr data.
>>  Only valid for KVM_DEV_TYPE_ARM_VGIC_V3"
>>
>> Unfortunately the existing code fails to read the input attr data
>> and thus the index always is 0.
> 
> addr is allocated on the stack, I don't think it will always be 0.
I removed this statement in the commit message. Thanks!

Eric
> 
> Thanks,
> Alex
>>
>> Fixes: 04c110932225 ("KVM: arm/arm64: Implement 
>> KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION")
>> Cc: sta...@vger.kernel.org#v4.17+
>> Signed-off-by: Eric Auger 
>> ---
>>  arch/arm64/kvm/vgic/vgic-kvm-device.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/arch/arm64/kvm/vgic/vgic-kvm-device.c 
>> b/arch/arm64/kvm/vgic/vgic-kvm-device.c
>> index 44419679f91a..2f66cf247282 100644
>> --- a/arch/arm64/kvm/vgic/vgic-kvm-device.c
>> +++ b/arch/arm64/kvm/vgic/vgic-kvm-device.c
>> @@ -226,6 +226,9 @@ static int vgic_get_common_attr(struct kvm_device *dev,
>>  u64 addr;
>>  unsigned long type = (unsigned long)attr->attr;
>>  
>> +if (copy_from_user(&addr, uaddr, sizeof(addr)))
>> +return -EFAULT;
>> +
>>  r = kvm_vgic_addr(dev->kvm, type, &addr, false);
>>  if (r)
>>  return (r == -ENODEV) ? -ENXIO : r;
> 
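
The bug fixed here is an in/out attribute whose input half was never read. A minimal userspace sketch of the pattern follows, assuming a made-up encoding (index in the low 12 bits of a 64-bit word) and made-up region bases; memcpy() stands in for copy_from_user()/copy_to_user(), and all toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_INDEX_MASK	0xfffULL	/* hypothetical: index in the low 12 bits */
#define TOY_NR_REGIONS	2

/* Hypothetical region bases, aligned so the index bits are free. */
static const uint64_t toy_region_base[TOY_NR_REGIONS] = {
	0x08000000ULL, 0x09000000ULL,
};

/*
 * "ubuf" plays the role of the userspace buffer: on entry it carries the
 * preset index, on success it is overwritten with base | index.
 */
static int toy_get_region(void *ubuf)
{
	uint64_t addr;
	uint64_t index;

	assert(ubuf != NULL);

	/* The step the patch adds: without it, addr is uninitialized stack data. */
	memcpy(&addr, ubuf, sizeof(addr));

	index = addr & TOY_INDEX_MASK;
	if (index >= TOY_NR_REGIONS)
		return -1;

	addr = toy_region_base[index] | index;
	memcpy(ubuf, &addr, sizeof(addr));	/* stand-in for copy_to_user() */
	return 0;
}
```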



Re: [PATCH 7/9] KVM: arm64: Simplify argument passing to vgic_uaccess_[read|write]

2021-01-13 Thread Auger Eric
Hi Alexandru,
On 1/12/21 5:16 PM, Alexandru Elisei wrote:
> Hi Eric,
> 
> On 1/12/21 4:04 PM, Alexandru Elisei wrote:
>> Hi Eric,
>>
>> On 12/12/20 6:50 PM, Eric Auger wrote:
>>> Instead of converting the vgic_io_device handle to a kvm_io_device
>>> handled and then do the oppositive, pass a vgic_io_device pointer all
>>> along the call chain.
>> To me, it looks like the commit message describes what the patch does 
>> instead of
>> why it does it.
>>
>> What are "vgic_io_device handle" and "kvm_io_device handled"?
Yes unfortunate typo, sorry.
> 
> Sorry, I think I got it now. You were referring to the argument types struct
> vgic_io_device and struct kvm_io_device. The patch looks like a very good 
> cleanup.
> 
> How about changing the commit message to something like this (feel free to
> ignore/change it if you think of something else):
> 
> vgic_uaccess() takes a struct vgic_io_device argument, converts it to a struct
> kvm_io_device and passes it to the read/write accessor functions, which 
> convert it
> back to a struct vgic_io_device. Avoid the indirection by passing the struct
> vgic_io_device argument directly to vgic_uaccess_{read,write}.
I reworded the commit message as you suggested.

Thanks

Eric
> 
> Thanks,
> Alex
>>
>> Thanks,
>> Alex
>>> Signed-off-by: Eric Auger 
>>> ---
>>>  arch/arm64/kvm/vgic/vgic-mmio.c | 10 --
>>>  1 file changed, 4 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/arch/arm64/kvm/vgic/vgic-mmio.c 
>>> b/arch/arm64/kvm/vgic/vgic-mmio.c
>>> index b2d73fc0d1ef..48c6067fc5ec 100644
>>> --- a/arch/arm64/kvm/vgic/vgic-mmio.c
>>> +++ b/arch/arm64/kvm/vgic/vgic-mmio.c
>>> @@ -938,10 +938,9 @@ vgic_get_mmio_region(struct kvm_vcpu *vcpu, struct 
>>> vgic_io_device *iodev,
>>> return region;
>>>  }
>>>  
>>> -static int vgic_uaccess_read(struct kvm_vcpu *vcpu, struct kvm_io_device 
>>> *dev,
>>> +static int vgic_uaccess_read(struct kvm_vcpu *vcpu, struct vgic_io_device 
>>> *iodev,
>>>  gpa_t addr, u32 *val)
>>>  {
>>> -   struct vgic_io_device *iodev = kvm_to_vgic_iodev(dev);
>>> const struct vgic_register_region *region;
>>> struct kvm_vcpu *r_vcpu;
>>>  
>>> @@ -960,10 +959,9 @@ static int vgic_uaccess_read(struct kvm_vcpu *vcpu, 
>>> struct kvm_io_device *dev,
>>> return 0;
>>>  }
>>>  
>>> -static int vgic_uaccess_write(struct kvm_vcpu *vcpu, struct kvm_io_device 
>>> *dev,
>>> +static int vgic_uaccess_write(struct kvm_vcpu *vcpu, struct vgic_io_device 
>>> *iodev,
>>>   gpa_t addr, const u32 *val)
>>>  {
>>> -   struct vgic_io_device *iodev = kvm_to_vgic_iodev(dev);
>>> const struct vgic_register_region *region;
>>> struct kvm_vcpu *r_vcpu;
>>>  
>>> @@ -986,9 +984,9 @@ int vgic_uaccess(struct kvm_vcpu *vcpu, struct 
>>> vgic_io_device *dev,
>>>  bool is_write, int offset, u32 *val)
>>>  {
>>> if (is_write)
>>> -   return vgic_uaccess_write(vcpu, &dev->dev, offset, val);
>>> +   return vgic_uaccess_write(vcpu, dev, offset, val);
>>> else
>>> -   return vgic_uaccess_read(vcpu, &dev->dev, offset, val);
>>> +   return vgic_uaccess_read(vcpu, dev, offset, val);
>>>  }
>>>  
>>>  static int dispatch_mmio_read(struct kvm_vcpu *vcpu, struct kvm_io_device 
>>> *dev,
>> ___
>> kvmarm mailing list
>> kvm...@lists.cs.columbia.edu
>> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
> 
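
The round trip removed by this cleanup (take the embedded generic handle, then convert it back to the containing structure) can be shown with a small userspace sketch. The structures and toy_* names below are hypothetical stand-ins for struct kvm_io_device, struct vgic_io_device and kvm_to_vgic_iodev(); only the container-of technique itself is meant to match.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical generic handle, embedded in the specialized device. */
struct toy_io_device {
	int id;
};

struct toy_vgic_io_device {
	uint32_t regs[4];
	struct toy_io_device dev;
};

/* container_of-style conversion from the embedded member to its parent. */
static struct toy_vgic_io_device *toy_to_vgic_iodev(struct toy_io_device *dev)
{
	return (struct toy_vgic_io_device *)
		((char *)dev - offsetof(struct toy_vgic_io_device, dev));
}

/* Before: callee receives the generic handle and converts it back. */
static uint32_t toy_read_via_generic(struct toy_io_device *dev, int idx)
{
	struct toy_vgic_io_device *iodev = toy_to_vgic_iodev(dev);

	assert(idx >= 0 && idx < 4);
	return iodev->regs[idx];
}

/* After: callee receives the specialized pointer directly. */
static uint32_t toy_read_direct(const struct toy_vgic_io_device *iodev, int idx)
{
	assert(idx >= 0 && idx < 4);
	return iodev->regs[idx];
}
```

Both accessors return the same value; the direct form simply skips a conversion whose result the caller already had.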



Re: [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part)

2021-01-13 Thread Auger Eric
Hi Shameer,

On 1/8/21 6:05 PM, Shameerali Kolothum Thodi wrote:
> Hi Eric,
> 
>> -Original Message-
>> From: Eric Auger [mailto:eric.au...@redhat.com]
>> Sent: 18 November 2020 11:22
>> To: eric.auger@gmail.com; eric.au...@redhat.com;
>> io...@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
>> k...@vger.kernel.org; kvm...@lists.cs.columbia.edu; w...@kernel.org;
>> j...@8bytes.org; m...@kernel.org; robin.mur...@arm.com;
>> alex.william...@redhat.com
>> Cc: jean-phili...@linaro.org; zhangfei@linaro.org;
>> zhangfei@gmail.com; vivek.gau...@arm.com; Shameerali Kolothum
>> Thodi ;
>> jacob.jun@linux.intel.com; yi.l@intel.com; t...@semihalf.com;
>> nicoleots...@gmail.com; yuzenghui 
>> Subject: [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part)
>>
>> This series brings the IOMMU part of HW nested paging support
>> in the SMMUv3. The VFIO part is submitted separately.
>>
>> The IOMMU API is extended to support 2 new API functionalities:
>> 1) pass the guest stage 1 configuration
>> 2) pass stage 1 MSI bindings
>>
>> Then those capabilities get implemented in the SMMUv3 driver.
>>
>> The virtualizer passes information through the VFIO user API
>> which cascades them to the iommu subsystem. This allows the guest
>> to own stage 1 tables and context descriptors (so-called PASID
>> table) while the host owns stage 2 tables and main configuration
>> structures (STE).
> 
> I am seeing an issue with Guest testpmd run with this series.
> I have two different setups and testpmd works fine with the
> first one but not with the second.
> 
> 1). Guest doesn't have kernel driver built-in for pass-through dev.
> 
> root@ubuntu:/# lspci -v
> ...
> 00:02.0 Ethernet controller: Huawei Technologies Co., Ltd. Device a22e (rev 
> 21)
> Subsystem: Huawei Technologies Co., Ltd. Device 
> Flags: fast devsel
> Memory at 800010 (64-bit, prefetchable) [disabled] [size=64K]
> Memory at 80 (64-bit, prefetchable) [disabled] [size=1M]
> Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00
> Capabilities: [a0] MSI-X: Enable- Count=67 Masked-
> Capabilities: [b0] Power Management version 3
> Capabilities: [100] Access Control Services
> Capabilities: [300] Transaction Processing Hints
> 
> root@ubuntu:/# echo vfio-pci > 
> /sys/bus/pci/devices/:00:02.0/driver_override
> root@ubuntu:/# echo :00:02.0 > /sys/bus/pci/drivers_probe
> 
> root@ubuntu:/mnt/dpdk/build/app# ./testpmd -w :00:02.0 --file-prefix 
> socket0  -l 0-1 -n 2 -- -i
> EAL: Detected 8 lcore(s)
> EAL: Detected 1 NUMA nodes
> EAL: Multi-process socket /var/run/dpdk/socket0/mp_socket
> EAL: Selected IOVA mode 'VA'
> EAL: No available hugepages reported in hugepages-32768kB
> EAL: No available hugepages reported in hugepages-64kB
> EAL: No available hugepages reported in hugepages-1048576kB
> EAL: Probing VFIO support...
> EAL: VFIO support initialized
> EAL:   Invalid NUMA socket, default to 0
> EAL:   using IOMMU type 1 (Type 1)
> EAL: Probe PCI driver: net_hns3_vf (19e5:a22e) device: :00:02.0 (socket 0)
> EAL: No legacy callbacks, legacy socket not created
> Interactive-mode selected
> testpmd: create a new mbuf pool : n=155456, size=2176, 
> socket=0
> testpmd: preferred mempool ops selected: ring_mp_mc
> 
> Warning! port-topology=paired and odd forward ports number, the last port 
> will pair with itself.
> 
> Configuring Port 0 (socket 0)
> Port 0: 8E:A6:8C:43:43:45
> Checking link statuses...
> Done
> testpmd>
> 
> 2). Guest has kernel driver built-in for pass-through dev.
> 
> root@ubuntu:/# lspci -v
> ...
> 00:02.0 Ethernet controller: Huawei Technologies Co., Ltd. Device a22e (rev 
> 21)
> Subsystem: Huawei Technologies Co., Ltd. Device 
> Flags: bus master, fast devsel, latency 0
> Memory at 800010 (64-bit, prefetchable) [size=64K]
> Memory at 80 (64-bit, prefetchable) [size=1M]
> Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00
> Capabilities: [a0] MSI-X: Enable+ Count=67 Masked-
> Capabilities: [b0] Power Management version 3
> Capabilities: [100] Access Control Services
> Capabilities: [300] Transaction Processing Hints
> Kernel driver in use: hns3
> 
> root@ubuntu:/# echo vfio-pci > 
> /sys/bus/pci/devices/:00:02.0/driver_override
> root@ubuntu:/# echo :00:02.0 > /sys/bus/pci/drivers/hns3/unbind
> root@ubuntu:/# echo :00:02.0 > /sys/bus/pci/drivers_probe
> 
> root@ubuntu:/mnt/dpdk/build/app# ./testpmd -w :00:02.0 --file-prefix 
> socket0 -l 0-1 -n 2 -- -i
> EAL: Detected 8 lcore(s)
> EAL: Detected 1 NUMA nodes
> EAL: Multi-process socket /var/run/dpdk/socket0/mp_socket
> EAL: Selected IOVA mode 'VA'
> EAL: No available hugepages reported in hugepages-32768kB
> EAL: No available hugepages reported in hugepages-64kB
> EAL: No available hugepages reported in hugepages-1048576kB
> EAL: Probing VFIO support...
> EAL: VFIO support initialized
> EAL:   Invalid NUMA socket, default to 0
> EAL:   using IOMMU type 1 (Type 1)
> EAL: Probe 

Re: [RFC v3 2/2] vfio/platform: msi: add Broadcom platform devices

2021-01-12 Thread Auger Eric
Hi Vikas,

On 12/14/20 6:45 PM, Vikas Gupta wrote:
> Add msi support for Broadcom platform devices
> 
> Signed-off-by: Vikas Gupta 
> ---
>  drivers/vfio/platform/Kconfig |  1 +
>  drivers/vfio/platform/Makefile|  1 +
>  drivers/vfio/platform/msi/Kconfig |  9 
>  drivers/vfio/platform/msi/Makefile|  2 +
>  .../vfio/platform/msi/vfio_platform_bcmplt.c  | 49 +++
>  5 files changed, 62 insertions(+)
>  create mode 100644 drivers/vfio/platform/msi/Kconfig
>  create mode 100644 drivers/vfio/platform/msi/Makefile
>  create mode 100644 drivers/vfio/platform/msi/vfio_platform_bcmplt.c
what does plt mean?
> 
> diff --git a/drivers/vfio/platform/Kconfig b/drivers/vfio/platform/Kconfig
> index dc1a3c44f2c6..7b8696febe61 100644
> --- a/drivers/vfio/platform/Kconfig
> +++ b/drivers/vfio/platform/Kconfig
> @@ -21,3 +21,4 @@ config VFIO_AMBA
> If you don't know what to do here, say N.
>  
>  source "drivers/vfio/platform/reset/Kconfig"
> +source "drivers/vfio/platform/msi/Kconfig"
> diff --git a/drivers/vfio/platform/Makefile b/drivers/vfio/platform/Makefile
> index 3f3a24e7c4ef..9ccdcdbf0e7e 100644
> --- a/drivers/vfio/platform/Makefile
> +++ b/drivers/vfio/platform/Makefile
> @@ -5,6 +5,7 @@ vfio-platform-y := vfio_platform.o
>  obj-$(CONFIG_VFIO_PLATFORM) += vfio-platform.o
>  obj-$(CONFIG_VFIO_PLATFORM) += vfio-platform-base.o
>  obj-$(CONFIG_VFIO_PLATFORM) += reset/
> +obj-$(CONFIG_VFIO_PLATFORM) += msi/
>  
>  vfio-amba-y := vfio_amba.o
>  
> diff --git a/drivers/vfio/platform/msi/Kconfig 
> b/drivers/vfio/platform/msi/Kconfig
> new file mode 100644
> index ..54d6b70e1e32
> --- /dev/null
> +++ b/drivers/vfio/platform/msi/Kconfig
> @@ -0,0 +1,9 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +config VFIO_PLATFORM_BCMPLT_MSI
> + tristate "MSI support for Broadcom platform devices"
> + depends on VFIO_PLATFORM && (ARCH_BCM_IPROC || COMPILE_TEST)
> + default ARCH_BCM_IPROC
> + help
> +   Enables the VFIO platform driver to handle msi for Broadcom devices
> +
> +   If you don't know what to do here, say N.
> diff --git a/drivers/vfio/platform/msi/Makefile 
> b/drivers/vfio/platform/msi/Makefile
> new file mode 100644
> index ..27422d45cecb
> --- /dev/null
> +++ b/drivers/vfio/platform/msi/Makefile
> @@ -0,0 +1,2 @@
> +# SPDX-License-Identifier: GPL-2.0
> +obj-$(CONFIG_VFIO_PLATFORM_BCMPLT_MSI) += vfio_platform_bcmplt.o
> diff --git a/drivers/vfio/platform/msi/vfio_platform_bcmplt.c 
> b/drivers/vfio/platform/msi/vfio_platform_bcmplt.c
> new file mode 100644
> index ..a074b5e92d77
> --- /dev/null
> +++ b/drivers/vfio/platform/msi/vfio_platform_bcmplt.c
> @@ -0,0 +1,49 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright 2020 Broadcom.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "../vfio_platform_private.h"
> +
> +#define RING_SIZE(64 << 10)
> +
> +#define RING_MSI_ADDR_LS 0x03c
> +#define RING_MSI_ADDR_MS 0x040
> +#define RING_MSI_DATA_VALUE  0x064
Those 3 defines would not be needed anymore with that implementation option.
> +
> +static u32 bcm_num_msi(struct vfio_platform_device *vdev)
> +{
> + struct vfio_platform_region *reg = &vdev->regions[0];
> +
> + return (reg->size / RING_SIZE);
> +}
> +
> +static struct vfio_platform_msi_node vfio_platform_bcmflexrm_msi_node = {
> + .owner = THIS_MODULE,
> + .compat = "brcm,iproc-flexrm-mbox",
> + .of_get_msi = bcm_num_msi,
> +};
> +
> +static int __init vfio_platform_bcmflexrm_msi_module_init(void)
> +{
> + __vfio_platform_register_msi(&vfio_platform_bcmflexrm_msi_node);
> +
> + return 0;
> +}
> +
> +static void __exit vfio_platform_bcmflexrm_msi_module_exit(void)
> +{
> + vfio_platform_unregister_msi("brcm,iproc-flexrm-mbox");
> +}
> +
> +module_init(vfio_platform_bcmflexrm_msi_module_init);
> +module_exit(vfio_platform_bcmflexrm_msi_module_exit);
One thing I would like to discuss with Alex.

As the reset module is mandated (except if reset_required is forced to
0), I am wondering if we shouldn't try to turn the reset module into a
"specialization" module and put the msi hooks there. I am afraid we may
end up having modules for each and every vfio platform feature
specialization. At the moment that's fully bearable but I can't predict
what's next.

As the mandated feature is the reset capability maybe we could just keep
the config/module name terminology, tune the kconfig help message to
mention the msi support in case of flex-rm?

What do you think?

Thanks

Eric




> +
> +MODULE_LICENSE("GPL v2");
> +MODULE_AUTHOR("Broadcom");
> 
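
The per-device "specialization" hook discussed in this message boils down to a registry keyed by the compatible string, plus a callback that derives the MSI count from the first region (region size divided by the 64 KiB ring size, as in bcm_num_msi()). Below is a rough userspace sketch under those assumptions; the toy_* names and the singly linked registry are hypothetical, and locking and module refcounting are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RING_SIZE (64u << 10)	/* one ring per 64 KiB, as in the patch */

/* Hypothetical registry node: compat string plus an MSI-count callback. */
struct toy_msi_node {
	const char *compat;
	uint32_t (*get_num_msi)(uint64_t region_size);
	struct toy_msi_node *next;
};

static struct toy_msi_node *toy_msi_nodes;

static void toy_register_msi(struct toy_msi_node *n)
{
	assert(n != NULL && n->compat != NULL);
	n->next = toy_msi_nodes;
	toy_msi_nodes = n;
}

static void toy_unregister_msi(const char *compat)
{
	struct toy_msi_node **pp = &toy_msi_nodes;

	while (*pp) {
		if (!strcmp((*pp)->compat, compat)) {
			*pp = (*pp)->next;	/* unlink the matching node */
			return;
		}
		pp = &(*pp)->next;
	}
}

static const struct toy_msi_node *toy_lookup_msi(const char *compat)
{
	const struct toy_msi_node *n;

	for (n = toy_msi_nodes; n; n = n->next)
		if (!strcmp(n->compat, compat))
			return n;
	return NULL;
}

/* One MSI per ring: number of rings that fit in the region. */
static uint32_t toy_flexrm_num_msi(uint64_t region_size)
{
	return (uint32_t)(region_size / RING_SIZE);
}
```

Folding this hook into the existing reset module, as suggested above, would mean one node type carrying both the reset and the MSI callbacks instead of two separate registries.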



Re: [RFC v3 1/2] vfio/platform: add support for msi

2021-01-12 Thread Auger Eric
Hi Vikas,

On 1/5/21 6:53 AM, Vikas Gupta wrote:
> On Tue, Dec 22, 2020 at 10:57 PM Auger Eric  wrote:
>>
>> Hi Vikas,
>>
>> On 12/14/20 6:45 PM, Vikas Gupta wrote:
>>> MSI support for platform devices. The MSI block
>>> is added as an extended IRQ which exports caps
>>> VFIO_IRQ_INFO_CAP_TYPE and VFIO_IRQ_INFO_CAP_MSI_DESCS.
>>>
>>> Signed-off-by: Vikas Gupta 
>>> ---
>>>  drivers/vfio/platform/vfio_platform_common.c  | 179 +++-
>>>  drivers/vfio/platform/vfio_platform_irq.c | 260 +-
>>>  drivers/vfio/platform/vfio_platform_private.h |  32 +++
>>>  include/uapi/linux/vfio.h |  44 +++
>>>  4 files changed, 496 insertions(+), 19 deletions(-)
>>>
>>> diff --git a/drivers/vfio/platform/vfio_platform_common.c 
>>> b/drivers/vfio/platform/vfio_platform_common.c
>>> index fb4b385191f2..c936852f35d7 100644
>>> --- a/drivers/vfio/platform/vfio_platform_common.c
>>> +++ b/drivers/vfio/platform/vfio_platform_common.c
>>> @@ -16,6 +16,7 @@
>>>  #include 
>>>  #include 
>>>  #include 
>>> +#include 
>>>
>>>  #include "vfio_platform_private.h"
>>>
>>> @@ -26,6 +27,8 @@
>>>  #define VFIO_PLATFORM_IS_ACPI(vdev) ((vdev)->acpihid != NULL)
>>>
>>>  static LIST_HEAD(reset_list);
>>> +/* devices having MSI support */
>> nit: for devices using MSIs?
>>> +static LIST_HEAD(msi_list);
>>>  static DEFINE_MUTEX(driver_lock);
>>>
>>>  static vfio_platform_reset_fn_t vfio_platform_lookup_reset(const char 
>>> *compat,
>>> @@ -47,6 +50,25 @@ static vfio_platform_reset_fn_t 
>>> vfio_platform_lookup_reset(const char *compat,
>>>   return reset_fn;
>>>  }
>>>
>>> +static bool vfio_platform_lookup_msi(struct vfio_platform_device *vdev)
>>> +{
>>> + bool has_msi = false;
>>> + struct vfio_platform_msi_node *iter;
>>> +
>>> + mutex_lock(&driver_lock);
>>> + list_for_each_entry(iter, &msi_list, link) {
>>> + if (!strcmp(iter->compat, vdev->compat) &&
>>> + try_module_get(iter->owner)) {
>>> + vdev->msi_module = iter->owner;
>>> + vdev->of_get_msi = iter->of_get_msi;
>>> + has_msi = true;
>>> + break;
>>> + }
>>> + }
>>> + mutex_unlock(&driver_lock);
>>> + return has_msi;
>>> +}
>>> +
>>>  static int vfio_platform_acpi_probe(struct vfio_platform_device *vdev,
>>>   struct device *dev)
>>>  {
>>> @@ -126,6 +148,19 @@ static int vfio_platform_get_reset(struct 
>>> vfio_platform_device *vdev)
>>>   return vdev->of_reset ? 0 : -ENOENT;
>>>  }
>>>
>>> +static int vfio_platform_get_msi(struct vfio_platform_device *vdev)
>>> +{
>>> + bool has_msi;
>>> +
>>> + has_msi = vfio_platform_lookup_msi(vdev);
>>> + if (!has_msi) {
>>> + request_module("vfio-msi:%s", vdev->compat);
>>> + has_msi = vfio_platform_lookup_msi(vdev);
>>> + }
>>> +
>>> + return has_msi ? 0 : -ENOENT;
>>> +}
>>> +
>>>  static void vfio_platform_put_reset(struct vfio_platform_device *vdev)
>>>  {
>>>   if (VFIO_PLATFORM_IS_ACPI(vdev))
>>> @@ -135,6 +170,12 @@ static void vfio_platform_put_reset(struct 
>>> vfio_platform_device *vdev)
>>>   module_put(vdev->reset_module);
>>>  }
>>>
>>> +static void vfio_platform_put_msi(struct vfio_platform_device *vdev)
>>> +{
>>> + if (vdev->of_get_msi)
>>> + module_put(vdev->msi_module);
>>> +}
>>> +
>>>  static int vfio_platform_regions_init(struct vfio_platform_device *vdev)
>>>  {
>>>   int cnt = 0, i;
>>> @@ -343,9 +384,17 @@ static long vfio_platform_ioctl(void *device_data,
>>>
>>>   } else if (cmd == VFIO_DEVICE_GET_IRQ_INFO) {
>>>   struct vfio_irq_info info;
>>> + struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
>>> + struct vfio_irq_info_cap_msi *msi_info = NULL;
>>> + int ext_irq_index = vdev->num_irqs - vdev->num_ext_irqs;
>>> + unsigned 

Re: [RFC v3 1/2] vfio/platform: add support for msi

2020-12-22 Thread Auger Eric
Hi Vikas,

On 12/14/20 6:45 PM, Vikas Gupta wrote:
> MSI support for platform devices. The MSI block
> is added as an extended IRQ which exports caps
> VFIO_IRQ_INFO_CAP_TYPE and VFIO_IRQ_INFO_CAP_MSI_DESCS.
> 
> Signed-off-by: Vikas Gupta 
> ---
>  drivers/vfio/platform/vfio_platform_common.c  | 179 +++-
>  drivers/vfio/platform/vfio_platform_irq.c | 260 +-
>  drivers/vfio/platform/vfio_platform_private.h |  32 +++
>  include/uapi/linux/vfio.h |  44 +++
>  4 files changed, 496 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/vfio/platform/vfio_platform_common.c 
> b/drivers/vfio/platform/vfio_platform_common.c
> index fb4b385191f2..c936852f35d7 100644
> --- a/drivers/vfio/platform/vfio_platform_common.c
> +++ b/drivers/vfio/platform/vfio_platform_common.c
> @@ -16,6 +16,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "vfio_platform_private.h"
>  
> @@ -26,6 +27,8 @@
>  #define VFIO_PLATFORM_IS_ACPI(vdev) ((vdev)->acpihid != NULL)
>  
>  static LIST_HEAD(reset_list);
> +/* devices having MSI support */
nit: for devices using MSIs?
> +static LIST_HEAD(msi_list);
>  static DEFINE_MUTEX(driver_lock);
>  
>  static vfio_platform_reset_fn_t vfio_platform_lookup_reset(const char 
> *compat,
> @@ -47,6 +50,25 @@ static vfio_platform_reset_fn_t 
> vfio_platform_lookup_reset(const char *compat,
>   return reset_fn;
>  }
>  
> +static bool vfio_platform_lookup_msi(struct vfio_platform_device *vdev)
> +{
> + bool has_msi = false;
> + struct vfio_platform_msi_node *iter;
> +
> + mutex_lock(&driver_lock);
> + list_for_each_entry(iter, &msi_list, link) {
> + if (!strcmp(iter->compat, vdev->compat) &&
> + try_module_get(iter->owner)) {
> + vdev->msi_module = iter->owner;
> + vdev->of_get_msi = iter->of_get_msi;
> + has_msi = true;
> + break;
> + }
> + }
> + mutex_unlock(&driver_lock);
> + return has_msi;
> +}
> +
>  static int vfio_platform_acpi_probe(struct vfio_platform_device *vdev,
>   struct device *dev)
>  {
> @@ -126,6 +148,19 @@ static int vfio_platform_get_reset(struct 
> vfio_platform_device *vdev)
>   return vdev->of_reset ? 0 : -ENOENT;
>  }
>  
> +static int vfio_platform_get_msi(struct vfio_platform_device *vdev)
> +{
> + bool has_msi;
> +
> + has_msi = vfio_platform_lookup_msi(vdev);
> + if (!has_msi) {
> + request_module("vfio-msi:%s", vdev->compat);
> + has_msi = vfio_platform_lookup_msi(vdev);
> + }
> +
> + return has_msi ? 0 : -ENOENT;
> +}
> +
>  static void vfio_platform_put_reset(struct vfio_platform_device *vdev)
>  {
>   if (VFIO_PLATFORM_IS_ACPI(vdev))
> @@ -135,6 +170,12 @@ static void vfio_platform_put_reset(struct 
> vfio_platform_device *vdev)
>   module_put(vdev->reset_module);
>  }
>  
> +static void vfio_platform_put_msi(struct vfio_platform_device *vdev)
> +{
> + if (vdev->of_get_msi)
> + module_put(vdev->msi_module);
> +}
> +
>  static int vfio_platform_regions_init(struct vfio_platform_device *vdev)
>  {
>   int cnt = 0, i;
> @@ -343,9 +384,17 @@ static long vfio_platform_ioctl(void *device_data,
>  
>   } else if (cmd == VFIO_DEVICE_GET_IRQ_INFO) {
>   struct vfio_irq_info info;
> + struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
> + struct vfio_irq_info_cap_msi *msi_info = NULL;
> + int ext_irq_index = vdev->num_irqs - vdev->num_ext_irqs;
> + unsigned long capsz;
> + u32 index;
>  
>   minsz = offsetofend(struct vfio_irq_info, count);
>  
> + /* For backward compatibility, cannot require this */
> + capsz = offsetofend(struct vfio_irq_info, cap_offset);
> +
>   if (copy_from_user(&info, (void __user *)arg, minsz))
>   return -EFAULT;
>  
> @@ -355,8 +404,94 @@ static long vfio_platform_ioctl(void *device_data,
>   if (info.index >= vdev->num_irqs)
>   return -EINVAL;
>  
> - info.flags = vdev->irqs[info.index].flags;
> - info.count = vdev->irqs[info.index].count;
> + if (info.argsz >= capsz)
> + minsz = capsz;
> +
> + index = info.index;
> +
> + info.flags = vdev->irqs[index].flags;
> + info.count = vdev->irqs[index].count;
> +
> + if (ext_irq_index - index == VFIO_EXT_IRQ_MSI) {
> + struct vfio_irq_info_cap_type cap_type = {
> + .header.id = VFIO_IRQ_INFO_CAP_TYPE,
> + .header.version = 1 };
> + struct vfio_platform_irq *irq;
> + size_t msi_info_size;
> + int num_msgs;
> + int ret;
> + int i;
> +
> +  
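
The "For backward compatibility, cannot require this" comment in the hunk above refers to the argsz convention: older callers pass a structure that ends at count, newer ones also provide cap_offset, and the handler must only touch what the caller declared. A minimal userspace sketch of that negotiation follows; struct toy_irq_info is a hypothetical five-field layout, memcpy() stands in for the user copies, and the reply contents are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical info structure; cap_offset is the later extension. */
struct toy_irq_info {
	uint32_t argsz;
	uint32_t flags;
	uint32_t index;
	uint32_t count;
	uint32_t cap_offset;
};

#define toy_offsetofend(T, m) (offsetof(T, m) + sizeof(((T *)0)->m))

/* Returns the number of bytes written back to "ubuf", or -1 on error. */
static long toy_get_irq_info(void *ubuf)
{
	struct toy_irq_info info;
	size_t minsz = toy_offsetofend(struct toy_irq_info, count);
	size_t capsz = toy_offsetofend(struct toy_irq_info, cap_offset);

	assert(ubuf != NULL);

	/* Read only the part every caller is guaranteed to provide. */
	memset(&info, 0, sizeof(info));
	memcpy(&info, ubuf, minsz);
	if (info.argsz < minsz)
		return -1;

	/* Newer callers declare room for cap_offset; older ones do not. */
	if (info.argsz >= capsz) {
		minsz = capsz;
		info.cap_offset = 0;	/* no capability chain in this sketch */
	}

	info.count = 1;			/* made-up reply */
	memcpy(ubuf, &info, minsz);	/* never write past what was declared */
	return (long)minsz;
}
```

An old caller with argsz equal to 16 gets 16 bytes back and its memory beyond count is left alone; a new caller with the full structure size also gets cap_offset filled in.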

Re: [RFC PATCH v1 3/4] KVM: arm64: GICv4.1: Restore VLPI's pending state to physical side

2020-12-16 Thread Auger Eric
Hi Shenming,

On 12/1/20 1:15 PM, Shenming Lu wrote:
> On 2020/12/1 19:50, Marc Zyngier wrote:
>> On 2020-12-01 11:40, Shenming Lu wrote:
>>> On 2020/12/1 18:55, Marc Zyngier wrote:
>>>> On 2020-11-30 07:23, Shenming Lu wrote:
>>>>
>>>> Hi Shenming,
>>>>
>>>>> We are pondering over this problem these days, but still don't get a
>>>>> good solution...
>>>>> Could you give us some advice on this?
>>>>>
>>>>> Or could we move the restoring of the pending states (include the sync
>>>>> from guest RAM and the transfer to HW) to the GIC VM state change handler,
>>>>> which is completely corresponding to save_pending_tables (more symmetric?)
>>>>> and don't expose GICv4...
>>>>
>>>> What is "the GIC VM state change handler"? Is that a QEMU thing?
>>>
>>> Yeah, it is a QEMU thing...
>>>
>>>> We don't really have that concept in KVM, so I'd appreciate if you could
>>>> be a bit more explicit on this.
>>>
>>> My thought is to add a new interface (to QEMU) for the restoring of
>>> the pending states, which is completely corresponding to
>>> KVM_DEV_ARM_VGIC_SAVE_PENDING_TABLES...
>>> And it is called from the GIC VM state change handler in QEMU, which
>>> is happening after the restoring (call kvm_vgic_v4_set_forwarding())
>>> but before the starting (running) of the VFIO device.
>>
>> Right, that makes sense. I still wonder how much the GIC save/restore
>> stuff differs from other architectures that implement similar features,
>> such as x86 with VT-D.
> 
> I am not familiar with it...
> 
>>
>> It is obviously too late to change the userspace interface, but I wonder
>> whether we missed something at the time.
> 
> The interface seems to be really asymmetrical?...

in qemu d5aa0c229a ("hw/intc/arm_gicv3_kvm: Implement pending table
save") commit message, it is traced:

"There is no explicit restore as the tables are implicitly sync'ed
on ITS table restore and on LPI enable at redistributor level."

At that time there was no real justification behind adding the RESTORE
fellow attr.

Maybe a stupid question but isn't it possible to unset the forwarding
when saving and rely on VFIO to automatically restore it when resuming
on destination?

Thanks

Eric


> 
> Or is there a possibility that we could know which irq is hw before the VFIO
> device calls kvm_vgic_v4_set_forwarding()?
> 
> Thanks,
> Shenming
> 
>>
>> Thanks,
>>
>>     M.
> 



Re: [RFC v2 1/1] vfio/platform: add support for msi

2020-12-11 Thread Auger Eric
Hi Vikas,

On 12/10/20 8:34 AM, Vikas Gupta wrote:
> Hi Eric,
> 
> On Tue, Dec 8, 2020 at 2:13 AM Auger Eric  wrote:
>>
>> Hi Vikas,
>>
>> On 12/3/20 3:50 PM, Vikas Gupta wrote:
>>> Hi Eric,
>>>
>>> On Wed, Dec 2, 2020 at 8:14 PM Auger Eric  wrote:
>>>>
>>>> Hi Vikas,
>>>>
>>>> On 11/24/20 5:16 PM, Vikas Gupta wrote:
>>>>> MSI support for platform devices.
>>>>>
>>>>> Signed-off-by: Vikas Gupta 
>>>>> ---
>>>>>  drivers/vfio/platform/vfio_platform_common.c  |  99 ++-
>>>>>  drivers/vfio/platform/vfio_platform_irq.c | 260 +-
>>>>>  drivers/vfio/platform/vfio_platform_private.h |  16 ++
>>>>>  include/uapi/linux/vfio.h |  43 +++
>>>>>  4 files changed, 401 insertions(+), 17 deletions(-)
>>>>>
>>>>> diff --git a/drivers/vfio/platform/vfio_platform_common.c 
>>>>> b/drivers/vfio/platform/vfio_platform_common.c
>>>>> index c0771a9567fb..b0bfc0f4ee1f 100644
>>>>> --- a/drivers/vfio/platform/vfio_platform_common.c
>>>>> +++ b/drivers/vfio/platform/vfio_platform_common.c
>>>>> @@ -16,6 +16,7 @@
>>>>>  #include 
>>>>>  #include 
>>>>>  #include 
>>>>> +#include 
>>>>>
>>>>>  #include "vfio_platform_private.h"
>>>>>
>>>>> @@ -344,9 +345,16 @@ static long vfio_platform_ioctl(void *device_data,
>>>>>
>>>>>   } else if (cmd == VFIO_DEVICE_GET_IRQ_INFO) {
>>>>>   struct vfio_irq_info info;
>>>>> + struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
>>>>> + struct vfio_irq_info_cap_msi *msi_info = NULL;
>>>>> + unsigned long capsz;
>>>>> + int ext_irq_index = vdev->num_irqs - vdev->num_ext_irqs;
>>>>>
>>>>>   minsz = offsetofend(struct vfio_irq_info, count);
>>>>>
>>>>> + /* For backward compatibility, cannot require this */
>>>>> + capsz = offsetofend(struct vfio_irq_info, cap_offset);
>>>>> +
>>>>>   if (copy_from_user(&info, (void __user *)arg, minsz))
>>>>>   return -EFAULT;
>>>>>
>>>>> @@ -356,9 +364,89 @@ static long vfio_platform_ioctl(void *device_data,
>>>>>   if (info.index >= vdev->num_irqs)
>>>>>   return -EINVAL;
>>>>>
>>>>> + if (info.argsz >= capsz)
>>>>> + minsz = capsz;
>>>>> +
>>>>> + if (info.index == ext_irq_index) {
>>>> nit: in case we add new ext indices afterwards, I would check info.index
>>>> -  ext_irq_index against a VFIO_EXT_IRQ_MSI define.
>>>>> + struct vfio_irq_info_cap_type cap_type = {
>>>>> + .header.id = VFIO_IRQ_INFO_CAP_TYPE,
>>>>> + .header.version = 1 };
>>>>> + int i;
>>>>> + int ret;
>>>>> + int num_msgs;
>>>>> + size_t msi_info_size;
>>>>> + struct vfio_platform_irq *irq;
>>>> nit: I think generally the opposite order (length) is chosen. This also
>>>> would better match the existing style in this file
>>> Ok will fix it
>>>>> +
>>>>> + info.index = array_index_nospec(info.index,
>>>>> + vdev->num_irqs);
>>>>> +
>>>>> + irq = &vdev->irqs[info.index];
>>>>> +
>>>>> + info.flags = irq->flags;
>>>> I think this can be removed given [*]
>>> Sure.
>>>>> + cap_type.type = irq->type;
>>>>> + cap_type.subtype = irq->subtype;
>>>>> +
>>>>> + ret = vfio_info_add_capability(&caps,
>>>>> +&cap_type.header,
>>>>> +sizeof(cap_type));
>>>&g

Re: [RFC v2 1/1] vfio/platform: add support for msi

2020-12-07 Thread Auger Eric
Hi Vikas,

On 12/3/20 3:50 PM, Vikas Gupta wrote:
> Hi Eric,
> 
> On Wed, Dec 2, 2020 at 8:14 PM Auger Eric  wrote:
>>
>> Hi Vikas,
>>
>> On 11/24/20 5:16 PM, Vikas Gupta wrote:
>>> MSI support for platform devices.
>>>
>>> Signed-off-by: Vikas Gupta 
>>> ---
>>>  drivers/vfio/platform/vfio_platform_common.c  |  99 ++-
>>>  drivers/vfio/platform/vfio_platform_irq.c | 260 +-
>>>  drivers/vfio/platform/vfio_platform_private.h |  16 ++
>>>  include/uapi/linux/vfio.h |  43 +++
>>>  4 files changed, 401 insertions(+), 17 deletions(-)
>>>
>>> diff --git a/drivers/vfio/platform/vfio_platform_common.c 
>>> b/drivers/vfio/platform/vfio_platform_common.c
>>> index c0771a9567fb..b0bfc0f4ee1f 100644
>>> --- a/drivers/vfio/platform/vfio_platform_common.c
>>> +++ b/drivers/vfio/platform/vfio_platform_common.c
>>> @@ -16,6 +16,7 @@
>>>  #include 
>>>  #include 
>>>  #include 
>>> +#include 
>>>
>>>  #include "vfio_platform_private.h"
>>>
>>> @@ -344,9 +345,16 @@ static long vfio_platform_ioctl(void *device_data,
>>>
>>>   } else if (cmd == VFIO_DEVICE_GET_IRQ_INFO) {
>>>   struct vfio_irq_info info;
>>> + struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
>>> + struct vfio_irq_info_cap_msi *msi_info = NULL;
>>> + unsigned long capsz;
>>> + int ext_irq_index = vdev->num_irqs - vdev->num_ext_irqs;
>>>
>>>   minsz = offsetofend(struct vfio_irq_info, count);
>>>
>>> + /* For backward compatibility, cannot require this */
>>> + capsz = offsetofend(struct vfio_irq_info, cap_offset);
>>> +
>>>   if (copy_from_user(&info, (void __user *)arg, minsz))
>>>   return -EFAULT;
>>>
>>> @@ -356,9 +364,89 @@ static long vfio_platform_ioctl(void *device_data,
>>>   if (info.index >= vdev->num_irqs)
>>>   return -EINVAL;
>>>
>>> + if (info.argsz >= capsz)
>>> + minsz = capsz;
>>> +
>>> + if (info.index == ext_irq_index) {
>> nit: in case we add new ext indices afterwards, I would check info.index
>> -  ext_irq_index against a VFIO_EXT_IRQ_MSI define.
>>> + struct vfio_irq_info_cap_type cap_type = {
>>> + .header.id = VFIO_IRQ_INFO_CAP_TYPE,
>>> + .header.version = 1 };
>>> + int i;
>>> + int ret;
>>> + int num_msgs;
>>> + size_t msi_info_size;
>>> + struct vfio_platform_irq *irq;
>> nit: I think generally the opposite order (length) is chosen. This also
>> would better match the existing style in this file
> Ok will fix it
>>> +
>>> + info.index = array_index_nospec(info.index,
>>> + vdev->num_irqs);
>>> +
>>> + irq = &vdev->irqs[info.index];
>>> +
>>> + info.flags = irq->flags;
>> I think this can be removed given [*]
> Sure.
>>> + cap_type.type = irq->type;
>>> + cap_type.subtype = irq->subtype;
>>> +
>>> + ret = vfio_info_add_capability(&caps,
>>> +&cap_type.header,
>>> +sizeof(cap_type));
>>> + if (ret)
>>> + return ret;
>>> +
>>> + num_msgs = irq->num_ctx;
>> do we want to return the cap even if !num_ctx?
> If num_ctx = 0 then VFIO_IRQ_INFO_CAP_MSI_DESCS should not be written.
> I'll take care of the same.
>>> +
>>> + msi_info_size = struct_size(msi_info, msgs, num_msgs);
>>> +
>>> + msi_info = kzalloc(msi_info_size, GFP_KERNEL);
>>> + if (!msi_info) {
>>> + kfree(caps.buf);
>>> + return -ENOMEM;
>>> + }
>>> +
>>> + msi_info->header.id = VFIO_IRQ_INFO_CAP_MSI_DESCS

Re: [PATCH v13 07/15] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs

2020-12-04 Thread Auger Eric
Hi Shameer, Jean-Philippe,

On 12/4/20 11:20 AM, Shameerali Kolothum Thodi wrote:
> Hi Jean,
> 
>> -Original Message-
>> From: Jean-Philippe Brucker [mailto:jean-phili...@linaro.org]
>> Sent: 04 December 2020 09:54
>> To: Shameerali Kolothum Thodi 
>> Cc: Auger Eric ; wangxingang
>> ; Xieyingtai ;
>> k...@vger.kernel.org; m...@kernel.org; j...@8bytes.org; w...@kernel.org;
>> io...@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
>> vivek.gau...@arm.com; alex.william...@redhat.com;
>> zhangfei@linaro.org; robin.mur...@arm.com;
>> kvm...@lists.cs.columbia.edu; eric.auger@gmail.com; Zengtao (B)
>> ; qubingbing 
>> Subject: Re: [PATCH v13 07/15] iommu/smmuv3: Allow stage 1 invalidation with
>> unmanaged ASIDs
>>
>> Hi Shameer,
>>
>> On Thu, Dec 03, 2020 at 06:42:57PM +, Shameerali Kolothum Thodi wrote:
>>> Hi Jean/zhangfei,
>>> Is it possible to have a branch with minimum required SVA/UACCE related
>> patches
>>> that are already public and can be a "stable" candidate for future respin of
>> Eric's series?
>>> Please share your thoughts.
>>
>> By "stable" you mean a fixed branch with the latest SVA/UACCE patches
>> based on mainline? 
> 
> Yes. 
> 
>  The uacce-devel branches from
>> https://github.com/Linaro/linux-kernel-uadk do provide this at the moment
>> (they track the latest sva/zip-devel branch
>> https://jpbrucker.net/git/linux/ which is roughly based on mainline.)
> 
> Thanks. 
> 
> Hi Eric,
> 
> Could you please take a look at the above branches and see whether it makes
> sense to rebase on top of either of those?
> 
> From vSVA point of view, it will be less rebase hassle if we can do that.

Sure. I will rebase on top of this ;-)

Thanks

Eric
> 
> Thanks,
> Shameer
> 
>> Thanks,
>> Jean
> 



Re: [PATCH v13 05/15] iommu/smmuv3: Get prepared for nested stage support

2020-12-03 Thread Auger Eric
Hi Kunkun,

On 12/3/20 1:32 PM, Kunkun Jiang wrote:
> Hi Eric,
> 
> On 2020/11/18 19:21, Eric Auger wrote:
>> When nested stage translation is setup, both s1_cfg and
>> s2_cfg are set.
>>
>> We introduce a new smmu domain abort field that will be set
>> upon guest stage1 configuration passing.
>>
>> arm_smmu_write_strtab_ent() is modified to write both stage
>> fields in the STE and deal with the abort field.
>>
>> In nested mode, only stage 2 is "finalized" as the host does
>> not own/configure the stage 1 context descriptor; guest does.
>>
>> Signed-off-by: Eric Auger 
>>
>> ---
>> v10 -> v11:
>> - Fix an issue reported by Shameer when switching from with vSMMU
>>   to without vSMMU. Although the spec does not seem to mention it,
>>   it seems necessary to reset the 2 high 64b when switching from
>>   S1+S2 cfg to S1 only. Especially dst[3] needs to be reset (S2TTB).
>>   On some implementations, if the S2TTB is not reset, this causes
>>   a C_BAD_STE error
>> ---
>>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 64 +
>>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  2 +
>>  2 files changed, 56 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> index 18ac5af1b284..412ea1bafa50 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> @@ -1181,8 +1181,10 @@ static void arm_smmu_write_strtab_ent(struct 
>> arm_smmu_master *master, u32 sid,
>>   * three cases at the moment:
> Now, it should be *five cases*.
>>   *
>>   * 1. Invalid (all zero) -> bypass/fault (init)
>> - * 2. Bypass/fault -> translation/bypass (attach)
>> - * 3. Translation/bypass -> bypass/fault (detach)
>> + * 2. Bypass/fault -> single stage translation/bypass (attach)
>> + * 3. Single or nested stage Translation/bypass -> bypass/fault (detach)
>> + * 4. S2 -> S1 + S2 (attach_pasid_table)
> 
> I was testing this series on one of our hardware boards with SMMUv3, and
> I found that while trying to "attach_pasid_table", the sequence of STE
> (host) config (bit[3:1]) is "S2->abort->S1 + S2". This is because the
> maintenance is "Write everything apart from dword 0, sync, write dword 0,
> sync" when we update the STE (guest). Does the sequence meet your
> expectation?

yes it does. I will fix the comments accordingly.

Is there anything to correct in the code or was it functional?

Thanks

Eric
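
For readers following the ordering discussed above, here is a minimal sketch
in C of the "write everything apart from dword 0, sync, write dword 0, sync"
sequence. It is a toy single-threaded model and not the driver code:
struct ste_model, ste_sync() and ste_update() are hypothetical names, the
8-dword entry size is assumed from the SMMUv3 STE being 64 bytes, and the
sync is only counted rather than issued to hardware.

```c
#include <assert.h>
#include <stdint.h>

#define STE_DWORDS 8	/* assumed: one 64-byte STE as eight 64-bit words */

/* Hypothetical model of one stream table entry plus a sync counter. */
struct ste_model {
	uint64_t data[STE_DWORDS];
	unsigned int syncs;
};

/* Stand-in for the real sync; here it only records that one happened. */
static void ste_sync(struct ste_model *ste)
{
	ste->syncs++;
}

/*
 * Update an entry so that dword 0 (which carries V and Config) is the
 * last word to change: first dwords 1..7, sync, then dword 0, sync.
 */
static void ste_update(struct ste_model *ste, const uint64_t *new_ste)
{
	unsigned int i;

	assert(ste != 0 && new_ste != 0);

	for (i = 1; i < STE_DWORDS; i++)
		ste->data[i] = new_ste[i];
	ste_sync(ste);

	ste->data[0] = new_ste[0];
	ste_sync(ste);
}
```

With this ordering an observer never sees the new dword 0 paired with stale
upper dwords, which is the property the quoted comment relies on.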
> 
>> + * 5. S1 + S2 -> S2 (detach_pasid_table)
>>   *
>>   * Given that we can't update the STE atomically and the SMMU
>>   * doesn't read the thing in a defined order, that leaves us
>> @@ -1193,7 +1195,8 @@ static void arm_smmu_write_strtab_ent(struct 
>> arm_smmu_master *master, u32 sid,
>>   * 3. Update Config, sync
>>   */
>>  u64 val = le64_to_cpu(dst[0]);
>> -bool ste_live = false;
>> +bool s1_live = false, s2_live = false, ste_live;
>> +bool abort, nested = false, translate = false;
>>  struct arm_smmu_device *smmu = NULL;
>>  struct arm_smmu_s1_cfg *s1_cfg;
>>  struct arm_smmu_s2_cfg *s2_cfg;
>> @@ -1233,6 +1236,8 @@ static void arm_smmu_write_strtab_ent(struct 
>> arm_smmu_master *master, u32 sid,
>>  default:
>>  break;
>>  }
>> +nested = s1_cfg->set && s2_cfg->set;
>> +translate = s1_cfg->set || s2_cfg->set;
>>  }
>>  
>>  if (val & STRTAB_STE_0_V) {
>> @@ -1240,23 +1245,36 @@ static void arm_smmu_write_strtab_ent(struct 
>> arm_smmu_master *master, u32 sid,
>>  case STRTAB_STE_0_CFG_BYPASS:
>>  break;
>>  case STRTAB_STE_0_CFG_S1_TRANS:
>> +s1_live = true;
>> +break;
>>  case STRTAB_STE_0_CFG_S2_TRANS:
>> -ste_live = true;
>> +s2_live = true;
>> +break;
>> +case STRTAB_STE_0_CFG_NESTED:
>> +s1_live = true;
>> +s2_live = true;
>>  break;
>>  case STRTAB_STE_0_CFG_ABORT:
>> -BUG_ON(!disable_bypass);
>>  break;
>>  default:
>>  BUG(); /* STE corruption */
>>  }
>>  }
>>  
>> +ste_live = s1_live || s2_live;
>> +
>>  /* Nuke the existing STE_0 value, as we're going to rewrite it */
>>  val = STRTAB_STE_0_V;
>>  
>>  /* Bypass/fault */
>> -if (!smmu_domain || !(s1_cfg->set || s2_cfg->set)) {
>> -if (!smmu_domain && disable_bypass)
>> +
>> +if (!smmu_domain)
>> +abort = disable_bypass;
>> +else
>> +abort = smmu_domain->abort;
>> +
>> +if (abort || !translate) {
>> +if (abort)
>>  val |= FIELD_PREP(STRTAB_STE_0_CFG, 
>> STRTAB_STE_0_CFG_ABORT);
>>  else
>>  val 

Re: [PATCH v1 2/5] vfio: platform: Switch to use platform_get_mem_or_io_resource()

2020-12-03 Thread Auger Eric
Hi Andy,

On 10/27/20 6:58 PM, Andy Shevchenko wrote:
> Switch to use new platform_get_mem_or_io_resource() instead of
> home grown analogue.
> 
> Cc: Eric Auger 
> Cc: Alex Williamson 
> Cc: Cornelia Huck 
> Cc: k...@vger.kernel.org
> Signed-off-by: Andy Shevchenko 
Acked-by: Eric Auger 

Thanks

Eric
> ---
>  drivers/vfio/platform/vfio_platform.c | 13 +
>  1 file changed, 1 insertion(+), 12 deletions(-)
> 
> diff --git a/drivers/vfio/platform/vfio_platform.c 
> b/drivers/vfio/platform/vfio_platform.c
> index 1e2769010089..84afafb6941b 100644
> --- a/drivers/vfio/platform/vfio_platform.c
> +++ b/drivers/vfio/platform/vfio_platform.c
> @@ -25,19 +25,8 @@ static struct resource *get_platform_resource(struct 
> vfio_platform_device *vdev,
> int num)
>  {
>   struct platform_device *dev = (struct platform_device *) vdev->opaque;
> - int i;
>  
> - for (i = 0; i < dev->num_resources; i++) {
> - struct resource *r = &dev->resource[i];
> -
> - if (resource_type(r) & (IORESOURCE_MEM|IORESOURCE_IO)) {
> - if (!num)
> - return r;
> -
> - num--;
> - }
> - }
> - return NULL;
> + return platform_get_mem_or_io_resource(dev, num);
>  }
>  
>  static int get_platform_irq(struct vfio_platform_device *vdev, int i)
> 



Re: [PATCH v2 3/5] vfio: platform: simplify device removal

2020-12-02 Thread Auger Eric
Hi Uwe,

On 11/24/20 2:31 PM, Uwe Kleine-König wrote:
> From: Uwe Kleine-König 
> 
> vfio_platform_remove_common() cannot return NULL in
> vfio_amba_remove() as the latter is only called if vfio_amba_probe()
> returned success.
> 
> Diagnosed-by: Arnd Bergmann 
> Signed-off-by: Uwe Kleine-König 
Acked-by: Eric Auger 

Thanks

Eric
> ---
>  drivers/vfio/platform/vfio_amba.c | 14 +-
>  1 file changed, 5 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/vfio/platform/vfio_amba.c 
> b/drivers/vfio/platform/vfio_amba.c
> index 9636a2afaecd..7b3ebf1558e1 100644
> --- a/drivers/vfio/platform/vfio_amba.c
> +++ b/drivers/vfio/platform/vfio_amba.c
> @@ -73,16 +73,12 @@ static int vfio_amba_probe(struct amba_device *adev, 
> const struct amba_id *id)
>  
>  static int vfio_amba_remove(struct amba_device *adev)
>  {
> - struct vfio_platform_device *vdev;
> -
> - vdev = vfio_platform_remove_common(&adev->dev);
> - if (vdev) {
> - kfree(vdev->name);
> - kfree(vdev);
> - return 0;
> - }
> + struct vfio_platform_device *vdev =
> + vfio_platform_remove_common(&adev->dev);
>  
> - return -EINVAL;
> + kfree(vdev->name);
> + kfree(vdev);
> + return 0;
>  }
>  
>  static const struct amba_id pl330_ids[] = {
> 



Re: [RFC v2 1/1] vfio/platform: add support for msi

2020-12-02 Thread Auger Eric
Hi Vikas,

On 11/24/20 5:16 PM, Vikas Gupta wrote:
> MSI support for platform devices.
> 
> Signed-off-by: Vikas Gupta 
> ---
>  drivers/vfio/platform/vfio_platform_common.c  |  99 ++-
>  drivers/vfio/platform/vfio_platform_irq.c | 260 +-
>  drivers/vfio/platform/vfio_platform_private.h |  16 ++
>  include/uapi/linux/vfio.h |  43 +++
>  4 files changed, 401 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/vfio/platform/vfio_platform_common.c 
> b/drivers/vfio/platform/vfio_platform_common.c
> index c0771a9567fb..b0bfc0f4ee1f 100644
> --- a/drivers/vfio/platform/vfio_platform_common.c
> +++ b/drivers/vfio/platform/vfio_platform_common.c
> @@ -16,6 +16,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "vfio_platform_private.h"
>  
> @@ -344,9 +345,16 @@ static long vfio_platform_ioctl(void *device_data,
>  
>   } else if (cmd == VFIO_DEVICE_GET_IRQ_INFO) {
>   struct vfio_irq_info info;
> + struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
> + struct vfio_irq_info_cap_msi *msi_info = NULL;
> + unsigned long capsz;
> + int ext_irq_index = vdev->num_irqs - vdev->num_ext_irqs;
>  
>   minsz = offsetofend(struct vfio_irq_info, count);
>  
> + /* For backward compatibility, cannot require this */
> + capsz = offsetofend(struct vfio_irq_info, cap_offset);
> +
>   if (copy_from_user(&info, (void __user *)arg, minsz))
>   return -EFAULT;
>  
> @@ -356,9 +364,89 @@ static long vfio_platform_ioctl(void *device_data,
>   if (info.index >= vdev->num_irqs)
>   return -EINVAL;
>  
> + if (info.argsz >= capsz)
> + minsz = capsz;
> +
> + if (info.index == ext_irq_index) {
nit: in case we add new ext indices afterwards, I would check info.index
-  ext_irq_index against a VFIO_EXT_IRQ_MSI define.
> + struct vfio_irq_info_cap_type cap_type = {
> + .header.id = VFIO_IRQ_INFO_CAP_TYPE,
> + .header.version = 1 };
> + int i;
> + int ret;
> + int num_msgs;
> + size_t msi_info_size;
> + struct vfio_platform_irq *irq;
nit: I think generally the opposite order (length) is chosen. This also
would better match the existing style in this file
> +
> + info.index = array_index_nospec(info.index,
> + vdev->num_irqs);
> +
> + irq = &vdev->irqs[info.index];
> +
> + info.flags = irq->flags;
I think this can be removed given [*]
> + cap_type.type = irq->type;
> + cap_type.subtype = irq->subtype;
> +
> + ret = vfio_info_add_capability(&caps,
> +&cap_type.header,
> +sizeof(cap_type));
> + if (ret)
> + return ret;
> +
> + num_msgs = irq->num_ctx;
do we want to return the cap even if !num_ctx?
> +
> + msi_info_size = struct_size(msi_info, msgs, num_msgs);
> +
> + msi_info = kzalloc(msi_info_size, GFP_KERNEL);
> + if (!msi_info) {
> + kfree(caps.buf);
> + return -ENOMEM;
> + }
> +
> + msi_info->header.id = VFIO_IRQ_INFO_CAP_MSI_DESCS;
> + msi_info->header.version = 1;
> + msi_info->nr_msgs = num_msgs;
> +
> + for (i = 0; i < num_msgs; i++) {
> + struct vfio_irq_ctx *ctx = &irq->ctx[i];
> +
> + msi_info->msgs[i].addr_lo = ctx->msg.address_lo;
> + msi_info->msgs[i].addr_hi = ctx->msg.address_hi;
> + msi_info->msgs[i].data = ctx->msg.data;
> + }
> +
> + ret = vfio_info_add_capability(&caps, &msi_info->header,
> +msi_info_size);
> + if (ret) {
> + kfree(msi_info);
> + kfree(caps.buf);
> + return ret;
> + }
> + }
> +
>   info.flags = vdev->irqs[info.index].flags;
[*]
>   info.count = vdev->irqs[info.index].count;
>  
> + if (caps.size) {
> + info.flags |= VFIO_IRQ_INFO_FLAG_CAPS;
> + if (info.argsz < sizeof(info) + caps.size) {
> + info.argsz = sizeof(info) + caps.size;
> + info.cap_offset = 0;
> +   
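
The argsz handling in the hunk above can be sketched in isolation. The
following is a minimal model in C, assuming a stand-in struct irq_info with
the extra cap_offset field from the RFC; irq_info_finish() and the flag
value are hypothetical, and the "fits" branch (cap_offset set right after
the fixed struct) is an assumption, since the quoted patch is cut off before
that point.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the extended struct vfio_irq_info. */
struct irq_info {
	uint32_t argsz;
	uint32_t flags;
	uint32_t index;
	uint32_t count;
	uint32_t cap_offset;
};

#define IRQ_INFO_FLAG_CAPS (1u << 4)	/* illustrative value only */

/*
 * Model of the reply path once the capability chain has been built.
 * Returns 1 if the chain fits in the buffer userspace announced via
 * argsz, 0 if there is nothing to copy or the buffer is too small.
 */
static int irq_info_finish(struct irq_info *info, uint32_t caps_size)
{
	assert(info != 0);

	if (!caps_size)
		return 0;

	info->flags |= IRQ_INFO_FLAG_CAPS;
	if (info->argsz < sizeof(*info) + caps_size) {
		/* Too small: report the size needed, expose no chain. */
		info->argsz = (uint32_t)(sizeof(*info) + caps_size);
		info->cap_offset = 0;
		return 0;
	}

	/* Assumed: the chain is placed right after the fixed struct. */
	info->cap_offset = (uint32_t)sizeof(*info);
	return 1;
}
```

Userspace would typically call once with argsz = sizeof(info), read back the
larger argsz, then retry with a buffer of that size.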

Re: [RFC, v2 0/1] msi support for platform devices

2020-12-02 Thread Auger Eric
Hi Vikas,
On 11/24/20 5:16 PM, Vikas Gupta wrote:
> This RFC adds support for MSI for platform devices.
> MSI block is added as an ext irq along with the existing
> wired interrupt implementation.
> 
> Changes from:
> -
>  v1 to v2:
>   1) IRQ allocation has been implemented as below:
>  
>  |IRQ-0|IRQ-1||IRQ-n|MSI|
>  
>   MSI block has msi contexts and its implemneted
it is implemented
>   as ext irq.
> 
>   2) Removed vendor specific module for msi handling so
>  previously patch2 and patch3 are not required.
> 
>   3) MSI related data is exported to userspace using 'caps'.
>      Please note that the VFIO_IRQ_INFO_CAP_TYPE implementation in
>      include/uapi/linux/vfio.h is taken from Eric's patch
> 
> https://patchwork.kernel.org/project/kvm/patch/20201116110030.32335-8-eric.au...@redhat.com/
So do you mean that by exposing the vectors, now you do not need the msi
module anymore?


Thanks

Eric
> 
> 
>  v0 to v1:
>i)  Removed MSI device flag VFIO_DEVICE_FLAGS_MSI.
>ii) Add MSI(s) at the end of the irq list of platform IRQs.
>MSI(s) with first entry of MSI block has count and flag
>information.
>IRQ list: Allocation for IRQs + MSIs are allocated as below
>Example: if there are 'n' IRQs and 'k' MSIs
>---
>|IRQ-0|IRQ-1||IRQ-n|MSI-0|MSI-1|MSI-2|..|MSI-k|
>---
>MSI-0 will have count=k set and flags set accordingly.
> 
> Vikas Gupta (1):
>   vfio/platform: add support for msi
> 
>  drivers/vfio/platform/vfio_platform_common.c  |  99 ++-
>  drivers/vfio/platform/vfio_platform_irq.c | 260 +-
>  drivers/vfio/platform/vfio_platform_private.h |  16 ++
>  include/uapi/linux/vfio.h |  43 +++
>  4 files changed, 401 insertions(+), 17 deletions(-)
> 



Re: [PATCH v13 07/15] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs

2020-12-01 Thread Auger Eric
Hi Xingang,

On 12/1/20 2:33 PM, Xingang Wang wrote:
> Hi Eric
> 
> On  Wed, 18 Nov 2020 12:21:43, Eric Auger wrote:
>> @@ -1710,7 +1710,11 @@ static void arm_smmu_tlb_inv_context(void *cookie)
>>   * insertion to guarantee those are observed before the TLBI. Do be
>>   * careful, 007.
>>   */
>> -if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
>> +if (ext_asid >= 0) { /* guest stage 1 invalidation */
>> +cmd.opcode  = CMDQ_OP_TLBI_NH_ASID;
>> +cmd.tlbi.asid   = ext_asid;
>> +cmd.tlbi.vmid   = smmu_domain->s2_cfg.vmid;
>> +} else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
> 
> Found a problem here, the cmd for guest stage 1 invalidation is built,
> but it is not delivered to smmu.
> 

Thank you for the report. I will fix that soon. With that fixed, have
you been able to run vSVA on top of the series? Do you need other stuff
to be fixed at SMMU level? As I am going to respin soon, please let me
know which branch is best to rebase on to ease your integration.

Best Regards

Eric



Re: [PATCH v11 08/13] vfio/pci: Add framework for custom interrupt indices

2020-11-24 Thread Auger Eric
Hi Shameer, Qubingbing
On 11/23/20 1:51 PM, Shameerali Kolothum Thodi wrote:
> Hi Eric,
> 
>> -Original Message-
>> From: Eric Auger [mailto:eric.au...@redhat.com]
>> Sent: 16 November 2020 11:00
>> To: eric.auger@gmail.com; eric.au...@redhat.com;
>> io...@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
>> k...@vger.kernel.org; kvm...@lists.cs.columbia.edu; w...@kernel.org;
>> j...@8bytes.org; m...@kernel.org; robin.mur...@arm.com;
>> alex.william...@redhat.com
>> Cc: jean-phili...@linaro.org; zhangfei@linaro.org;
>> zhangfei@gmail.com; vivek.gau...@arm.com; Shameerali Kolothum
>> Thodi ;
>> jacob.jun@linux.intel.com; yi.l@intel.com; t...@semihalf.com;
>> nicoleots...@gmail.com; yuzenghui 
>> Subject: [PATCH v11 08/13] vfio/pci: Add framework for custom interrupt
>> indices
>>
>> Implement IRQ capability chain infrastructure. All interrupt
>> indexes beyond VFIO_PCI_NUM_IRQS are handled as extended
>> interrupts. They are registered with a specific type/subtype
>> and supported flags.
>>
>> Signed-off-by: Eric Auger 
>> ---
>>  drivers/vfio/pci/vfio_pci.c | 99 +++--
>>  drivers/vfio/pci/vfio_pci_intrs.c   | 62 ++
>>  drivers/vfio/pci/vfio_pci_private.h | 14 
>>  3 files changed, 157 insertions(+), 18 deletions(-)
>>
>> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
>> index 2a6cc1a87323..93e03a4a5f32 100644
>> --- a/drivers/vfio/pci/vfio_pci.c
>> +++ b/drivers/vfio/pci/vfio_pci.c
>> @@ -608,6 +608,14 @@ static void vfio_pci_disable(struct vfio_pci_device
>> *vdev)
>>
>>  WARN_ON(iommu_unregister_device_fault_handler(&vdev->pdev->dev));
>>
>> +for (i = 0; i < vdev->num_ext_irqs; i++)
>> +vfio_pci_set_irqs_ioctl(vdev, VFIO_IRQ_SET_DATA_NONE |
>> +VFIO_IRQ_SET_ACTION_TRIGGER,
>> +VFIO_PCI_NUM_IRQS + i, 0, 0, NULL);
>> +vdev->num_ext_irqs = 0;
>> +kfree(vdev->ext_irqs);
>> +vdev->ext_irqs = NULL;
>> +
>>  /* Device closed, don't need mutex here */
>>  list_for_each_entry_safe(ioeventfd, ioeventfd_tmp,
>>   &vdev->ioeventfds_list, next) {
>> @@ -823,6 +831,9 @@ static int vfio_pci_get_irq_count(struct vfio_pci_device
>> *vdev, int irq_type)
>>  return 1;
>>  } else if (irq_type == VFIO_PCI_REQ_IRQ_INDEX) {
>>  return 1;
>> +} else if (irq_type >= VFIO_PCI_NUM_IRQS &&
>> +   irq_type < VFIO_PCI_NUM_IRQS + vdev->num_ext_irqs) {
>> +return 1;
>>  }
>>
>>  return 0;
>> @@ -1008,7 +1019,7 @@ static long vfio_pci_ioctl(void *device_data,
>>  info.flags |= VFIO_DEVICE_FLAGS_RESET;
>>
>>  info.num_regions = VFIO_PCI_NUM_REGIONS + vdev->num_regions;
>> -info.num_irqs = VFIO_PCI_NUM_IRQS;
>> +info.num_irqs = VFIO_PCI_NUM_IRQS + vdev->num_ext_irqs;
>>
>>  if (IS_ENABLED(CONFIG_VFIO_PCI_ZDEV)) {
>>  int ret = vfio_pci_info_zdev_add_caps(vdev, &caps);
>> @@ -1187,36 +1198,87 @@ static long vfio_pci_ioctl(void *device_data,
>>
>>  } else if (cmd == VFIO_DEVICE_GET_IRQ_INFO) {
>>  struct vfio_irq_info info;
>> +struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
>> +unsigned long capsz;
>>
>>  minsz = offsetofend(struct vfio_irq_info, count);
>>
>> +/* For backward compatibility, cannot require this */
>> +capsz = offsetofend(struct vfio_irq_info, cap_offset);
>> +
>>  if (copy_from_user(&info, (void __user *)arg, minsz))
>>  return -EFAULT;
>>
>> -if (info.argsz < minsz || info.index >= VFIO_PCI_NUM_IRQS)
>> +if (info.argsz < minsz ||
>> +info.index >= VFIO_PCI_NUM_IRQS + vdev->num_ext_irqs)
>>  return -EINVAL;
>>
>> -switch (info.index) {
>> -case VFIO_PCI_INTX_IRQ_INDEX ... VFIO_PCI_MSIX_IRQ_INDEX:
>> -case VFIO_PCI_REQ_IRQ_INDEX:
>> -break;
>> -case VFIO_PCI_ERR_IRQ_INDEX:
>> -if (pci_is_pcie(vdev->pdev))
>> -break;
>> -fallthrough;
>> -default:
>> -return -EINVAL;
>> -}
>> +if (info.argsz >= capsz)
>> +minsz = capsz;
>>
>>  info.flags = VFIO_IRQ_INFO_EVENTFD;
>>
>> -info.count = vfio_pci_get_irq_count(vdev, info.index);
>> -
>> -if (info.index == VFIO_PCI_INTX_IRQ_INDEX)
>> +switch (info.index) {
>> +case VFIO_PCI_INTX_IRQ_INDEX:
>>  info.flags |= (VFIO_IRQ_INFO_MASKABLE |
>> VFIO_IRQ_INFO_AUTOMASKED);
>> -else
>> +break;
>> +case VFIO_PCI_MSI_IRQ_INDEX ... VFIO_PCI_MSIX_IRQ_INDEX:
>> +

Re: [PATCH v13 01/15] iommu: Introduce attach/detach_pasid_table API

2020-11-19 Thread Auger Eric
Hi Jacob,
On 11/18/20 5:19 PM, Jacob Pan wrote:
> Hi Eric,
> 
> On Wed, 18 Nov 2020 12:21:37 +0100, Eric Auger 
> wrote:
> 
>> In virtualization use case, when a guest is assigned
>> a PCI host device, protected by a virtual IOMMU on the guest,
>> the physical IOMMU must be programmed to be consistent with
>> the guest mappings. If the physical IOMMU supports two
>> translation stages it makes sense to program guest mappings
>> onto the first stage/level (ARM/Intel terminology) while the host
>> owns the stage/level 2.
>>
>> In that case, it is mandated to trap on guest configuration
>> settings and pass those to the physical iommu driver.
>>
>> This patch adds a new API to the iommu subsystem that allows
>> to set/unset the pasid table information.
>>
>> A generic iommu_pasid_table_config struct is introduced in
>> a new iommu.h uapi header. This is going to be used by the VFIO
>> user API.
>>
>> Signed-off-by: Jean-Philippe Brucker 
>> Signed-off-by: Liu, Yi L 
>> Signed-off-by: Ashok Raj 
>> Signed-off-by: Jacob Pan 
>> Signed-off-by: Eric Auger 
>>
>> ---
>>
>> v12 -> v13:
>> - Fix config check
>>
>> v11 -> v12:
>> - add argsz, name the union
>> ---
>>  drivers/iommu/iommu.c  | 68 ++
>>  include/linux/iommu.h  | 21 
>>  include/uapi/linux/iommu.h | 54 ++
>>  3 files changed, 143 insertions(+)
>>
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index b53446bb8c6b..978fe34378fb 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -2171,6 +2171,74 @@ int iommu_uapi_sva_unbind_gpasid(struct
>> iommu_domain *domain, struct device *dev }
>>  EXPORT_SYMBOL_GPL(iommu_uapi_sva_unbind_gpasid);
>>  
>> +int iommu_attach_pasid_table(struct iommu_domain *domain,
>> + struct iommu_pasid_table_config *cfg)
>> +{
>> +if (unlikely(!domain->ops->attach_pasid_table))
>> +return -ENODEV;
>> +
>> +return domain->ops->attach_pasid_table(domain, cfg);
>> +}
>> +
>> +int iommu_uapi_attach_pasid_table(struct iommu_domain *domain,
>> +  void __user *uinfo)
>> +{
>> +struct iommu_pasid_table_config pasid_table_data = { 0 };
>> +u32 minsz;
>> +
>> +if (unlikely(!domain->ops->attach_pasid_table))
>> +return -ENODEV;
>> +
>> +/*
>> + * No new spaces can be added before the variable sized union,
>> the
>> + * minimum size is the offset to the union.
>> + */
>> +minsz = offsetof(struct iommu_pasid_table_config, vendor_data);
>> +
>> +/* Copy minsz from user to get flags and argsz */
>> +if (copy_from_user(&pasid_table_data, uinfo, minsz))
>> +return -EFAULT;
>> +
>> +/* Fields before the variable size union are mandatory */
>> +if (pasid_table_data.argsz < minsz)
>> +return -EINVAL;
>> +
>> +/* PASID and address granu require additional info beyond minsz
>> */
>> +if (pasid_table_data.version != PASID_TABLE_CFG_VERSION_1)
>> +return -EINVAL;
>> +if (pasid_table_data.format == IOMMU_PASID_FORMAT_SMMUV3 &&
>> +pasid_table_data.argsz <
>> +offsetofend(struct iommu_pasid_table_config,
>> vendor_data.smmuv3))
>> +return -EINVAL;
>> +
>> +/*
>> + * User might be using a newer UAPI header which has a larger
>> data
>> + * size, we shall support the existing flags within the current
>> + * size. Copy the remaining user data _after_ minsz but not more
>> + * than the current kernel supported size.
>> + */
>> +if (copy_from_user((void *)&pasid_table_data + minsz, uinfo +
>> minsz,
>> +   min_t(u32, pasid_table_data.argsz,
>> sizeof(pasid_table_data)) - minsz))
>> +return -EFAULT;
>> +
>> +/* Now the argsz is validated, check the content */
>> +if (pasid_table_data.config < IOMMU_PASID_CONFIG_TRANSLATE ||
>> +pasid_table_data.config > IOMMU_PASID_CONFIG_ABORT)
>> +return -EINVAL;
>> +
>> +return domain->ops->attach_pasid_table(domain,
>> &pasid_table_data);
>> +}
>> +EXPORT_SYMBOL_GPL(iommu_uapi_attach_pasid_table);
>> +
>> +void iommu_detach_pasid_table(struct iommu_domain *domain)
>> +{
>> +if (unlikely(!domain->ops->detach_pasid_table))
>> +return;
>> +
>> +domain->ops->detach_pasid_table(domain);
>> +}
>> +EXPORT_SYMBOL_GPL(iommu_detach_pasid_table);
>> +
>>  static void __iommu_detach_device(struct iommu_domain *domain,
>>struct device *dev)
>>  {
>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>> index b95a6f8db6ff..464fcbecf841 100644
>> --- a/include/linux/iommu.h
>> +++ b/include/linux/iommu.h
>> @@ -223,6 +223,8 @@ struct iommu_iotlb_gather {
>>   * @cache_invalidate: invalidate translation caches
>>   * @sva_bind_gpasid: bind guest pasid and mm
>>   * @sva_unbind_gpasid: unbind guest pasid and mm
>> + * @attach_pasid_table: attach a pasid table
>> + * 
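
The two-stage copy in iommu_uapi_attach_pasid_table() quoted above can be
sketched outside the kernel. This is a minimal model in C, assuming memcpy()
in place of copy_from_user(); struct cfg_model and cfg_copy_in() are
hypothetical names and the field layout is illustrative, not the uapi one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical config struct: fixed header followed by vendor data. */
struct cfg_model {
	uint32_t argsz;
	uint32_t version;
	uint64_t base_ptr;
	uint8_t vendor_data[16];
};

/*
 * Stage 1: copy the fixed header to learn argsz.
 * Stage 2: copy the remainder, but never more than this side knows about.
 * Returns 0 on success, -1 if argsz does not cover the fixed header.
 */
static int cfg_copy_in(struct cfg_model *dst, const unsigned char *user)
{
	const size_t minsz = offsetof(struct cfg_model, vendor_data);
	size_t total;

	assert(dst != 0 && user != 0);

	memset(dst, 0, sizeof(*dst));
	memcpy(dst, user, minsz);
	if (dst->argsz < minsz)
		return -1;

	/* A newer caller may pass a larger argsz; clamp to our own size. */
	total = dst->argsz < sizeof(*dst) ? dst->argsz : sizeof(*dst);
	memcpy((unsigned char *)dst + minsz, user + minsz, total - minsz);
	return 0;
}
```

The clamp is what lets an older kernel accept a structure from a newer
header while ignoring the fields it does not understand.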

Re: [RFC, v1 0/3] msi support for platform devices

2020-11-18 Thread Auger Eric
Hi Vikas,

On 11/17/20 5:36 PM, Vikas Gupta wrote:
> Hi Eric,
> 
> On Tue, Nov 17, 2020 at 1:55 PM Auger Eric  wrote:
>>
>> Hi Vikas,
>>
>> On 11/17/20 9:05 AM, Auger Eric wrote:
>>> Hi Vikas,
>>>
>>> On 11/17/20 7:25 AM, Vikas Gupta wrote:
>>>> Hi Eric,
>>>>
>>>> On Mon, Nov 16, 2020 at 6:44 PM Auger Eric  wrote:
>>>>>
>>>>> Hi Vikas,
>>>>>
>>>>> On 11/13/20 6:24 PM, Vikas Gupta wrote:
>>>>>> Hi Eric,
>>>>>>
>>>>>> On Fri, Nov 13, 2020 at 12:10 AM Auger Eric  
>>>>>> wrote:
>>>>>>>
>>>>>>> Hi Vikas,
>>>>>>>
>>>>>>> On 11/12/20 6:58 PM, Vikas Gupta wrote:
>>>>>>>> This RFC adds support for MSI for platform devices.
>>>>>>>> a) MSI(s) is/are added in addition to the normal interrupts.
>>>>>>>> b) The vendor specific MSI configuration can be done using
>>>>>>>>callbacks which is implemented as msi module.
>>>>>>>> c) Adds a msi handling module for the Broadcom platform devices.
>>>>>>>>
>>>>>>>> Changes from:
>>>>>>>> -
>>>>>>>>  v0 to v1:
>>>>>>>>i)  Removed MSI device flag VFIO_DEVICE_FLAGS_MSI.
>>>>>>>>ii) Add MSI(s) at the end of the irq list of platform IRQs.
>>>>>>>>MSI(s) with first entry of MSI block has count and flag
>>>>>>>>information.
>>>>>>>>IRQ list: Allocation for IRQs + MSIs are allocated as below
>>>>>>>>Example: if there are 'n' IRQs and 'k' MSIs
>>>>>>>>---
>>>>>>>>|IRQ-0|IRQ-1||IRQ-n|MSI-0|MSI-1|MSI-2|..|MSI-k|
>>>>>>>>---
>>>>>>> I have not taken time yet to look at your series, but to me you should 
>>>>>>> have
>>>>>>> |IRQ-0|IRQ-1||IRQ-n|MSI|MSIX
>>>>>>> then for setting a given MSIX (i) you would select the MSIx index and
>>>>>>> then set start=i count=1.
>>>>>>
>>>>>> As per your suggestion, we should have, if there are n-IRQs, k-MSIXs
>>>>>> and m-MSIs, allocation of IRQs should be done as below
>>>>>>
>>>>>> |IRQ0|IRQ1|..|IRQ-(n-1)|MSI|MSIX|
>>>>>>  ||
>>>>>>  |
>>>>>> |MSIX0||MSIX1||MSXI2||MSIX-(k-1)|
>>>>>>  
>>>>>> |MSI0||MSI1||MSI2||MSI-(m-1)|
>>>>> No I really meant this list of indices: 
>>>>> IRQ0|IRQ1|..|IRQ-(n-1)|MSI|MSIX|
>>>>> and potentially later on IRQ0|IRQ1|..|IRQ-(n-1)|MSI|MSIX| ERR| REQ
>>>>> if ERR/REQ were to be added.
>>>> I agree on this. Actually the map I drew incorrectly above but wanted
>>>> to demonstrate the same. It was a child-parent relationship for MSI
>>>> and its members and similarly for MSIX as well.
>>>>>
>>>>> I think the userspace could query the total number of indices using
>>>>> VFIO_DEVICE_GET_INFO and retrieve num_irqs (corresponding to the n wire
>>>>> interrupts + MSI index + MSIX index)
>>>>>
>>>>> Then userspace can loop on all the indices using
>>>>> VFIO_DEVICE_GET_IRQ_INFO. For each index it uses count to determine the
>>>>> first indices related to wire interrupts (count = 1). Then comes the MSI
>>>>> index, and after that the MSIX index. If any of those is supported, count >1,
>>>>> otherwise count=0. The only thing I am dubious about is can the device
>>>>> use a single MSI/MSIX? Because my hypothesis here is we use count to
>>>>> discriminate between wire first indices and other indices.
>>>> I believe count can be one as well, especially for ERR/REQ as you
>>>> mentioned above.
>>> Given ERR and REQ indices would follow MSI and MSIX ones, MSI index
>>> could be recognized by the first index whose count != 1. But indeed I am
>>> not sure th

Re: [PATCH] KVM: arm64: vgic-v3: Drop the reporting of GICR_TYPER.Last for userspace

2020-11-17 Thread Auger Eric
Hi Zenghui,

On 11/17/20 4:16 PM, Zenghui Yu wrote:
> It was recently reported that if GICR_TYPER is accessed before the RD base
> address is set, we'll suffer from the unset @rdreg dereferencing. Oops...
> 
>   gpa_t last_rdist_typer = rdreg->base + GICR_TYPER +
>   (rdreg->free_index - 1) * KVM_VGIC_V3_REDIST_SIZE;
> 
> It's "expected" that users will access registers in the redistributor if
> the RD has been properly configured (e.g., the RD base address is set). But
> it hasn't yet been covered by the existing documentation.
> 
> Per discussion on the list [1], the reporting of the GICR_TYPER.Last bit
> for userspace never actually worked. And it's difficult for us to emulate
> it correctly given that userspace has the flexibility to access it any
> time. Let's just drop the reporting of the Last bit for userspace for now
> (userspace should have full knowledge about it anyway) and it at least
> prevents kernel from panic ;-)
> 
> [1] 
> https://lore.kernel.org/kvmarm/c20865a267e44d1e2c0d52ce4e012...@kernel.org/
> 
> Fixes: ba7b3f1275fd ("KVM: arm/arm64: Revisit Redistributor TYPER last bit 
> computation")
> Reported-by: Keqian Zhu 
> Signed-off-by: Zenghui Yu 
Given the state of last bit, it looks sensible atm.

Reviewed-by: Eric Auger 

Thanks

Eric


> ---
> 
> This may be the easiest way to fix the issue and to get the fix backported
> to stable tree. There is still some work that can be done since (at least) we
> have code duplicates between the MMIO and uaccess callbacks.
> 
>  arch/arm64/kvm/vgic/vgic-mmio-v3.c | 22 --
>  1 file changed, 20 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/kvm/vgic/vgic-mmio-v3.c 
> b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
> index 52d6f24f65dc..15a6c98ee92f 100644
> --- a/arch/arm64/kvm/vgic/vgic-mmio-v3.c
> +++ b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
> @@ -273,6 +273,23 @@ static unsigned long vgic_mmio_read_v3r_typer(struct 
> kvm_vcpu *vcpu,
>   return extract_bytes(value, addr & 7, len);
>  }
>  
> +static unsigned long vgic_uaccess_read_v3r_typer(struct kvm_vcpu *vcpu,
> +  gpa_t addr, unsigned int len)
> +{
> + unsigned long mpidr = kvm_vcpu_get_mpidr_aff(vcpu);
> + int target_vcpu_id = vcpu->vcpu_id;
> + u64 value;
> +
> + value = (u64)(mpidr & GENMASK(23, 0)) << 32;
> + value |= ((target_vcpu_id & 0xffff) << 8);
> +
> + if (vgic_has_its(vcpu->kvm))
> + value |= GICR_TYPER_PLPIS;
> +
> + /* reporting of the Last bit is not supported for userspace */
> + return extract_bytes(value, addr & 7, len);
> +}
> +
>  static unsigned long vgic_mmio_read_v3r_iidr(struct kvm_vcpu *vcpu,
>gpa_t addr, unsigned int len)
>  {
> @@ -593,8 +610,9 @@ static const struct vgic_register_region 
> vgic_v3_rd_registers[] = {
>   REGISTER_DESC_WITH_LENGTH(GICR_IIDR,
>   vgic_mmio_read_v3r_iidr, vgic_mmio_write_wi, 4,
>   VGIC_ACCESS_32bit),
> - REGISTER_DESC_WITH_LENGTH(GICR_TYPER,
> - vgic_mmio_read_v3r_typer, vgic_mmio_write_wi, 8,
> + REGISTER_DESC_WITH_LENGTH_UACCESS(GICR_TYPER,
> + vgic_mmio_read_v3r_typer, vgic_mmio_write_wi,
> + vgic_uaccess_read_v3r_typer, vgic_mmio_uaccess_write_wi, 8,
>   VGIC_ACCESS_64bit | VGIC_ACCESS_32bit),
>   REGISTER_DESC_WITH_LENGTH(GICR_WAKER,
>   vgic_mmio_read_raz, vgic_mmio_write_wi, 4,
> 
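
For readers following the patch, the value built by the new uaccess
callback can be sketched in plain userspace C. This is only a minimal
sketch under stated assumptions, not the kernel code: uaccess_typer() and
its parameters are hypothetical names, extract_bytes() is a stand-in with
the same job as the kernel helper, and the bit positions follow the GICv3
GICR_TYPER layout (PLPIS is bit 0, Last is bit 4, Processor_Number is the
16-bit field at bits [23:8]). The Last bit is deliberately never set.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions from the GICv3 GICR_TYPER register layout. */
#define GICR_TYPER_PLPIS (1ULL << 0)
#define GICR_TYPER_LAST  (1ULL << 4)

/* Stand-in for the kernel helper: return 'len' bytes of 'data' starting
 * at byte 'offset'. */
static uint64_t extract_bytes(uint64_t data, unsigned int offset,
			      unsigned int len)
{
	uint64_t mask;

	assert(len >= 1 && len <= 8 && offset < 8);
	mask = (len == 8) ? ~0ULL : ((1ULL << (len * 8)) - 1);
	return (data >> (offset * 8)) & mask;
}

/* Hypothetical userspace model of the uaccess GICR_TYPER read. */
static uint64_t uaccess_typer(uint32_t mpidr_aff, unsigned int vcpu_id,
			      int has_its, uint64_t addr, unsigned int len)
{
	uint64_t value;

	/* 24 affinity bits land at bits [55:32]. */
	value = (uint64_t)(mpidr_aff & 0xffffffu) << 32;
	/* Processor_Number, bits [23:8]. */
	value |= (uint64_t)(vcpu_id & 0xffff) << 8;
	if (has_its)
		value |= GICR_TYPER_PLPIS;

	/* GICR_TYPER_LAST is intentionally left clear for userspace. */
	return extract_bytes(value, addr & 7, len);
}
```

A 64-bit read at offset 0x8 returns the whole value, while 32-bit reads at
0x8 and 0xc return the low and high halves respectively.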



Re: [PATCH v12 04/15] iommu/smmuv3: Dynamically allocate s1_cfg and s2_cfg

2020-11-17 Thread Auger Eric
Hi Shameer,

On 11/17/20 12:39 PM, Shameerali Kolothum Thodi wrote:
> Hi Eric,
> 
>> -----Original Message-----
>> From: Eric Auger [mailto:eric.au...@redhat.com]
>> Sent: 16 November 2020 10:43
>> To: eric.auger@gmail.com; eric.au...@redhat.com;
>> io...@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
>> k...@vger.kernel.org; kvm...@lists.cs.columbia.edu; w...@kernel.org;
>> j...@8bytes.org; m...@kernel.org; robin.mur...@arm.com
>> Cc: jean-phili...@linaro.org; zhangfei@linaro.org;
>> zhangfei@gmail.com; vivek.gau...@arm.com; Shameerali Kolothum
>> Thodi ;
>> alex.william...@redhat.com; jacob.jun@linux.intel.com;
>> yi.l@intel.com; t...@semihalf.com; nicoleots...@gmail.com
>> Subject: [PATCH v12 04/15] iommu/smmuv3: Dynamically allocate s1_cfg and
>> s2_cfg
>>
>> In preparation for the introduction of nested stages
>> let's turn s1_cfg and s2_cfg fields into pointers which are
>> dynamically allocated depending on the smmu_domain stage.
> 
> This will break compilation if CONFIG_ARM_SMMU_V3_SVA is enabled,
> because of:
> https://github.com/eauger/linux/blob/5.10-rc4-2stage-v12/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c#L40
> 
> Do we really need to make these pointers?

Thanks for reporting. I think I can do this differently. Working on it now.

Thanks

Eric
> 
> Thanks,
> Shameer
>  
>> In nested mode, both stages will coexist and s1_cfg will
>> be allocated when the guest configuration gets passed.
>>
>> Signed-off-by: Eric Auger 
>> ---
>>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 83 -
>>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  6 +-
>>  2 files changed, 48 insertions(+), 41 deletions(-)
>>
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> index d828d6cbeb0e..4baf9fafe462 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> @@ -953,9 +953,9 @@ static __le64 *arm_smmu_get_cd_ptr(struct
>> arm_smmu_domain *smmu_domain,
>>  unsigned int idx;
>>  struct arm_smmu_l1_ctx_desc *l1_desc;
>>  struct arm_smmu_device *smmu = smmu_domain->smmu;
>> -struct arm_smmu_ctx_desc_cfg *cdcfg = &smmu_domain->s1_cfg.cdcfg;
>> +struct arm_smmu_ctx_desc_cfg *cdcfg =
>> &smmu_domain->s1_cfg->cdcfg;
>>
>> -if (smmu_domain->s1_cfg.s1fmt == STRTAB_STE_0_S1FMT_LINEAR)
>> +if (smmu_domain->s1_cfg->s1fmt == STRTAB_STE_0_S1FMT_LINEAR)
>>  return cdcfg->cdtab + ssid * CTXDESC_CD_DWORDS;
>>
>>  idx = ssid >> CTXDESC_SPLIT;
>> @@ -990,7 +990,7 @@ int arm_smmu_write_ctx_desc(struct
>> arm_smmu_domain *smmu_domain, int ssid,
>>  __le64 *cdptr;
>>  struct arm_smmu_device *smmu = smmu_domain->smmu;
>>
>> -if (WARN_ON(ssid >= (1 << smmu_domain->s1_cfg.s1cdmax)))
>> +if (WARN_ON(ssid >= (1 << smmu_domain->s1_cfg->s1cdmax)))
>>  return -E2BIG;
>>
>>  cdptr = arm_smmu_get_cd_ptr(smmu_domain, ssid);
>> @@ -1056,7 +1056,7 @@ static int arm_smmu_alloc_cd_tables(struct
>> arm_smmu_domain *smmu_domain)
>>  size_t l1size;
>>  size_t max_contexts;
>>  struct arm_smmu_device *smmu = smmu_domain->smmu;
>> -struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
>> +struct arm_smmu_s1_cfg *cfg = smmu_domain->s1_cfg;
>>  struct arm_smmu_ctx_desc_cfg *cdcfg = &cfg->cdcfg;
>>
>>  max_contexts = 1 << cfg->s1cdmax;
>> @@ -1104,7 +1104,7 @@ static void arm_smmu_free_cd_tables(struct
>> arm_smmu_domain *smmu_domain)
>>  int i;
>>  size_t size, l1size;
>>  struct arm_smmu_device *smmu = smmu_domain->smmu;
>> -struct arm_smmu_ctx_desc_cfg *cdcfg = &smmu_domain->s1_cfg.cdcfg;
>> +struct arm_smmu_ctx_desc_cfg *cdcfg =
>> &smmu_domain->s1_cfg->cdcfg;
>>
>>  if (cdcfg->l1_desc) {
>>  size = CTXDESC_L2_ENTRIES * (CTXDESC_CD_DWORDS << 3);
>> @@ -1211,17 +1211,8 @@ static void arm_smmu_write_strtab_ent(struct
>> arm_smmu_master *master, u32 sid,
>>  }
>>
>>  if (smmu_domain) {
>> -switch (smmu_domain->stage) {
>> -case ARM_SMMU_DOMAIN_S1:
>> -s1_cfg = &smmu_domain->s1_cfg;
>> -break;
>> -case ARM_SMMU_DOMAIN_S2:
>> -case ARM_SMMU_DOMAIN_NESTED:
>> -s2_cfg = &smmu_domain->s2_cfg;
>> -break;
>> -default:
>> -break;
>> -}
>> +s1_cfg = smmu_domain->s1_cfg;
>> +s2_cfg = smmu_domain->s2_cfg;
>>  }
>>
>>  if (val & STRTAB_STE_0_V) {
>> @@ -1664,10 +1655,10 @@ static void arm_smmu_tlb_inv_context(void
>> *cookie)
>>   * careful, 007.
>>   */
>>  if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
>> -arm_smmu_tlb_inv_asid(smmu, smmu_domain->s1_cfg.cd.asid);
>> +arm_smmu_tlb_inv_asid(smmu, smmu_domain->s1_cfg->cd.asid);
>>  } else {
>>  cmd.opcode  = CMDQ_OP_TLBI_S12_VMALL;
>> -cmd.tlbi.vmid   = smmu_domain->s2_cfg.vmid;
>> +

Re: [PATCH 1/2] KVM: arm64: vgic: Forbid invalid userspace Redistributor accesses

2020-11-17 Thread Auger Eric
Hi Marc,

On 11/17/20 9:49 AM, Marc Zyngier wrote:
> Hi Zenghui,
> 
> On 2020-11-16 14:57, Zenghui Yu wrote:
>> Hi Marc,
>>
>> On 2020/11/16 22:10, Marc Zyngier wrote:
>>>> My take is that only if the "[Re]Distributor base address" is specified
>>>> in the system memory map, will the user-provided kvm_device_attr.offset
>>>> make sense. And we can then handle the access to the register which is
>>>> defined by "base address + offset".
>>>
>>> I'd tend to agree, but it is just that this is a large change at -rc4.
>>> I'd rather have a quick fix for 5.10, and a more invasive change for
>>> 5.11,
>>> spanning all the possible vgic devices.
>>
>> So you prefer fixing it by "return a value that doesn't have the Last
>> bit set" for v5.10? I'm ok with it and can send v2 for it.
> 
> Cool. Thanks for that.
> 
>> Btw, looking again at the way we handle the user-reading of GICR_TYPER
>>
>> vgic_mmio_read_v3r_typer(vcpu, addr, len)
>>
>> it seems that @addr is actually the *offset* of GICR_TYPER (0x0008) and
>> @addr is unlikely to be equal to last_rdist_typer, which is the *GPA* of
>> the last RD. Looks like the user-reading of GICR_TYPER.Last is always
>> broken?
> 
> I think you are right. Somehow, we don't seem to track the index of
> the RD in the region, so we can never compute the address of the RD
> even if the base address is set.
> 
> Let's drop the reporting of Last for userspace for now, as it never
> worked. If you post a patch addressing that quickly, I'll get it to
> Paolo by the end of the week (there's another fix that needs merging).
> 
> Eric: do we have any test covering the userspace API?

So as this issue seems related to the changes made when implementing the
multiple RDIST regions, I volunteer to write those KVM selftests :-)

Thanks

Eric

> 
> Thanks,
> 
>     M.
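
The mismatch discussed above can be sketched in a few lines of userspace
C. This is a hedged illustration, not the kernel implementation: struct
rdist_region and reports_last() are hypothetical names, and the base
address used in the usage note is made up. Only the formula for
last_rdist_typer comes from the code quoted in this thread, with
KVM_VGIC_V3_REDIST_SIZE taken as two 64K frames per redistributor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GICR_TYPER              0x0008
#define KVM_VGIC_V3_REDIST_SIZE (2 * 0x10000ULL) /* RD + SGI frames */

/* Hypothetical stand-in for the redistributor region bookkeeping. */
struct rdist_region {
	uint64_t base;           /* guest physical base of the region */
	unsigned int free_index; /* number of redistributors placed so far */
};

/* GPA of GICR_TYPER in the last redistributor of the region. */
static uint64_t last_rdist_typer(const struct rdist_region *r)
{
	assert(r->free_index >= 1);
	return r->base + GICR_TYPER +
	       (uint64_t)(r->free_index - 1) * KVM_VGIC_V3_REDIST_SIZE;
}

/* The Last bit is reported only when 'addr' equals that GPA. */
static bool reports_last(const struct rdist_region *r, uint64_t addr)
{
	return addr == last_rdist_typer(r);
}
```

On the MMIO path 'addr' is a guest physical address, so the comparison can
match. On the userspace path 'addr' is the register offset (0x0008), which
cannot equal the GPA unless the region sits at address 0 with a single
redistributor, hence the Last bit was never reported there.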



Re: [PATCH 1/2] KVM: arm64: vgic: Forbid invalid userspace Redistributor accesses

2020-11-17 Thread Auger Eric
Hi Zenghui,

On 11/17/20 9:49 AM, Marc Zyngier wrote:
> Hi Zenghui,
> 
> On 2020-11-16 14:57, Zenghui Yu wrote:
>> Hi Marc,
>>
>> On 2020/11/16 22:10, Marc Zyngier wrote:
>>>> My take is that only if the "[Re]Distributor base address" is specified
>>>> in the system memory map, will the user-provided kvm_device_attr.offset
>>>> make sense. And we can then handle the access to the register which is
>>>> defined by "base address + offset".
>>>
>>> I'd tend to agree, but it is just that this is a large change at -rc4.
>>> I'd rather have a quick fix for 5.10, and a more invasive change for
>>> 5.11,
>>> spanning all the possible vgic devices.
>>
>> So you prefer fixing it by "return a value that doesn't have the Last
>> bit set" for v5.10? I'm ok with it and can send v2 for it.
> 
> Cool. Thanks for that.
> 
>> Btw, looking again at the way we handle the user-reading of GICR_TYPER
>>
>> vgic_mmio_read_v3r_typer(vcpu, addr, len)
>>
>> it seems that @addr is actually the *offset* of GICR_TYPER (0x0008) and
>> @addr is unlikely to be equal to last_rdist_typer, which is the *GPA* of
>> the last RD. Looks like the user-reading of GICR_TYPER.Last is always
>> broken?
> 
> I think you are right. Somehow, we don't seem to track the index of
> the RD in the region, so we can never compute the address of the RD
> even if the base address is set.
> 
> Let's drop the reporting of Last for userspace for now, as it never
> worked. If you post a patch addressing that quickly, I'll get it to
> Paolo by the end of the week (there's another fix that needs merging).
> 
> Eric: do we have any test covering the userspace API?
No, there are no KVM selftests for that yet.

Thanks

Eric

> 
> Thanks,
> 
>     M.



Re: [PATCH v11 00/13] SMMUv3 Nested Stage Setup (IOMMU part)

2020-11-17 Thread Auger Eric
Hi Shameer,

On 5/13/20 5:57 PM, Shameerali Kolothum Thodi wrote:
> Hi Eric,
> 
>> -----Original Message-----
>> From: Auger Eric [mailto:eric.au...@redhat.com]
>> Sent: 13 May 2020 14:29
>> To: Shameerali Kolothum Thodi ;
>> Zhangfei Gao ; eric.auger@gmail.com;
>> io...@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
>> k...@vger.kernel.org; kvm...@lists.cs.columbia.edu; w...@kernel.org;
>> j...@8bytes.org; m...@kernel.org; robin.mur...@arm.com
>> Cc: jean-phili...@linaro.org; alex.william...@redhat.com;
>> jacob.jun@linux.intel.com; yi.l@intel.com; peter.mayd...@linaro.org;
>> t...@semihalf.com; bbhush...@marvell.com
>> Subject: Re: [PATCH v11 00/13] SMMUv3 Nested Stage Setup (IOMMU part)
>>
> [...]
> 
>>>>> Yes that's normal this series is not meant to support vSVM at this stage.
>>>>>
>>>>> I intend to add the missing pieces during the next weeks.
>>>>
>>>> Thanks for that. I have made an attempt to add the vSVA based on
>>>> your v10 + JPBs sva patches. The host kernel and Qemu changes can
>>>> be found here[1][2].
>>>>
>>>> This basically adds multiple pasid support on top of your changes.
>>>> I have done some basic sanity testing and we have some initial success
>>>> with the zip vf dev on our D06 platform. Please note that the STALL event 
>>>> is
>>>> not yet supported though, but works fine if we mlock() guest usr mem.
>>>
>>> I have added STALL support for our vSVA prototype and it seems to be
>>> working(on our hardware). I have updated the kernel and qemu branches
>> with
>>> the same[1][2]. I should warn you though that these are prototype code and I
>> am pretty
>>> much re-using the VFIO_IOMMU_SET_PASID_TABLE interface for almost
>> everything.
>>> But thought of sharing, in case if it is useful somehow!.
>>
>> Thank you again for sharing the POC. I looked at the kernel and QEMU
>> branches.
>>
>> Here are some preliminary comments:
>> - "arm-smmu-v3: Reset S2TTB while switching back from nested stage": as
>> you mentioned, S2TTB reset is now featured in v11
> 
> Yes.
> 
>> - "arm-smmu-v3: Add support for multiple pasid in nested mode": I could
>> easily integrate this into my series. Update the iommu api first and
>> pass multiple CD info in a separate patch
> 
> Ok.
in v12, I added
[PATCH v12 14/15] iommu/smmuv3: Accept configs with more than one
context descriptor

I don't think you need to add s1cdmax addition as we already have
pasid_bits which should do the job.

>> - "arm-smmu-v3: Add support to Invalidate CD": CD invalidation should be
>> cascaded to host through the PASID cache invalidation uapi (no pb you
>> warned us for the POC you simply used VFIO_IOMMU_SET_PASID_TABLE). I
>> think I should add this support in my original series although it does
>> not seem to trigger any issue up to now.
> 
> Agree. Cache invalidation uapi is a better interface for this. Also I don’t 
> think
> this matters for non-vsva cases as Guest kernel table/CD(pasid 0) will never
> get invalidated. 
in v12 I added [PATCH v12 15/15] iommu/smmuv3: Add PASID cache
invalidation per PASID. I have not tested it though.
> 
>> - "arm-smmu-v3: Remove duplication of fault propagation". I understand
>> the transcode is done somewhere else with SVA but we still need to do it
>> if a single CD is used, right? I will review the SVA code to better
>> understand.

Since I have rebased on 5.10-rc4, you will still have this duplication to
handle.
> 
> Hmm..not sure. Need to take another look to see whether we need a special
> handling for single CD or not.
> 
>> - for the STALL response injection I would tend to use a new VFIO region
>> for responses. At the moment there is a single VFIO region for reporting
>> the fault.

In v12 I added a new VFIO region to inject your fault. It was tested
with dummy event injection, so it should work properly.

If we clearly identify all the public dependencies needed for vSVA/ARM, I
can help you respin on top of them.

Thanks

Eric
> 
> Sure. That will be much cleaner and probably improve the context switch
> latency. Another thing I noted with STALL is that pasid_valid flag needs to be
> taken care in the SVA kernel path. 
> 
> "iommu: Remove pasid validity check for STALL model page response msg"
> Not sure this one is a proper way to handle this.
>  
>> On QEMU side:
>> - I am currently working on 3.2 range invalidation support which is
>> needed for DPDK/VFIO
>> - While at it I will look at how to incrementally introduce some of the
>> features you need in this series.
> 
> Ok. 
> 
> Thanks for taking a look at the POC.
> 
> Cheers,
> Shameer
> 



Re: [RFC, v1 0/3] msi support for platform devices

2020-11-17 Thread Auger Eric
Hi Vikas,

On 11/17/20 9:05 AM, Auger Eric wrote:
> Hi Vikas,
> 
> On 11/17/20 7:25 AM, Vikas Gupta wrote:
>> Hi Eric,
>>
>> On Mon, Nov 16, 2020 at 6:44 PM Auger Eric  wrote:
>>>
>>> Hi Vikas,
>>>
>>> On 11/13/20 6:24 PM, Vikas Gupta wrote:
>>>> Hi Eric,
>>>>
>>>> On Fri, Nov 13, 2020 at 12:10 AM Auger Eric  wrote:
>>>>>
>>>>> Hi Vikas,
>>>>>
>>>>> On 11/12/20 6:58 PM, Vikas Gupta wrote:
>>>>>> This RFC adds support for MSI for platform devices.
>>>>>> a) MSI(s) is/are added in addition to the normal interrupts.
>>>>>> b) The vendor specific MSI configuration can be done using
>>>>>>callbacks which is implemented as msi module.
>>>>>> c) Adds a msi handling module for the Broadcom platform devices.
>>>>>>
>>>>>> Changes from:
>>>>>> -
>>>>>>  v0 to v1:
>>>>>>i)  Removed MSI device flag VFIO_DEVICE_FLAGS_MSI.
>>>>>>ii) Add MSI(s) at the end of the irq list of platform IRQs.
>>>>>>MSI(s) with first entry of MSI block has count and flag
>>>>>>information.
>>>>>>IRQ list: Allocation for IRQs + MSIs are allocated as below
>>>>>>Example: if there are 'n' IRQs and 'k' MSIs
>>>>>>---
>>>>>>|IRQ-0|IRQ-1||IRQ-n|MSI-0|MSI-1|MSI-2|..|MSI-k|
>>>>>>---
>>>>> I have not taken time yet to look at your series, but to me you should 
>>>>> have
>>>>> |IRQ-0|IRQ-1||IRQ-n|MSI|MSIX
>>>>> then for setting a given MSIX (i) you would select the MSIx index and
>>>>> then set start=i count=1.
>>>>
>>>> As per your suggestion, we should have, if there are n-IRQs, k-MSIXs
>>>> and m-MSIs, allocation of IRQs should be done as below
>>>>
>>>> |IRQ0|IRQ1|..|IRQ-(n-1)|MSI|MSIX|
>>>>  ||
>>>>  |
>>>> |MSIX0||MSIX1||MSIX2||MSIX-(k-1)|
>>>>  
>>>> |MSI0||MSI1||MSI2||MSI-(m-1)|
>>> No I really meant this list of indices: IRQ0|IRQ1|..|IRQ-(n-1)|MSI|MSIX|
>>> and potentially later on IRQ0|IRQ1|..|IRQ-(n-1)|MSI|MSIX| ERR| REQ
>>> if ERR/REQ were to be added.
>> I agree on this. Actually the map I drew incorrectly above but wanted
>> to demonstrate the same. It was a child-parent relationship for MSI
>> and its members and similarly for MSIX as well.
>>>
>>> I think the userspace could query the total number of indices using
>>> VFIO_DEVICE_GET_INFO and retrieve num_irqs (corresponding to the n wire
>>> interrupts + MSI index + MSIX index)
>>>
>>> Then userspace can loop on all the indices using
>>> VFIO_DEVICE_GET_IRQ_INFO. For each index it uses count to determine the
>>> first indices related to wire interrupts (count = 1). Then comes the MSI
>>> index, and after the MSI index. If any of those is supported, count >1,
>>> otherwise count=0. The only thing I am dubious about is can the device
>>> use a single MSI/MSIX? Because my hypothesis here is we use count to
>>> discriminate between wire first indices and other indices.
>> I believe count can be one as well, especially for ERR/REQ as you
>> mentioned above.
> Given ERR and REQ indices would follow MSI and MSIX ones, MSI index
> could be recognized by the first index whose count != 1. But indeed I am
> not sure the number of supported vectors cannot be 1. In your case it is
> induced by the size of the ring so it is OK but for other devices this
> may be different.
> 
>> I think we can not rely on the count > 1. Now, this is
>> blocking and we are not left with options unless we consider adding
>> more enums in flags in vfio_irq_info to tell userspace that particular
>> index is wired, MSI, MSIX etc. for the platform device.
>> What do you think?
> If count is not reliable to discriminate the first n wired interrupts
> from the subsequent MSI and MSIX indices, Alex suggested adding a
> capability extension in the vfio_irq_info structure. Something similar
> to what was done for vfio_region_info.
> 
> Such kind of thing was attempted in
> http

Re: [RFC, v1 0/3] msi support for platform devices

2020-11-17 Thread Auger Eric
Hi Vikas,

On 11/17/20 7:25 AM, Vikas Gupta wrote:
> Hi Eric,
> 
> On Mon, Nov 16, 2020 at 6:44 PM Auger Eric  wrote:
>>
>> Hi Vikas,
>>
>> On 11/13/20 6:24 PM, Vikas Gupta wrote:
>>> Hi Eric,
>>>
>>> On Fri, Nov 13, 2020 at 12:10 AM Auger Eric  wrote:
>>>>
>>>> Hi Vikas,
>>>>
>>>> On 11/12/20 6:58 PM, Vikas Gupta wrote:
>>>>> This RFC adds support for MSI for platform devices.
>>>>> a) MSI(s) is/are added in addition to the normal interrupts.
>>>>> b) The vendor specific MSI configuration can be done using
>>>>>callbacks which is implemented as msi module.
>>>>> c) Adds a msi handling module for the Broadcom platform devices.
>>>>>
>>>>> Changes from:
>>>>> -
>>>>>  v0 to v1:
>>>>>i)  Removed MSI device flag VFIO_DEVICE_FLAGS_MSI.
>>>>>ii) Add MSI(s) at the end of the irq list of platform IRQs.
>>>>>MSI(s) with first entry of MSI block has count and flag
>>>>>information.
>>>>>IRQ list: Allocation for IRQs + MSIs are allocated as below
>>>>>Example: if there are 'n' IRQs and 'k' MSIs
>>>>>---
>>>>>|IRQ-0|IRQ-1||IRQ-n|MSI-0|MSI-1|MSI-2|..|MSI-k|
>>>>>---
>>>> I have not taken time yet to look at your series, but to me you should have
>>>> |IRQ-0|IRQ-1||IRQ-n|MSI|MSIX
>>>> then for setting a given MSIX (i) you would select the MSIx index and
>>>> then set start=i count=1.
>>>
>>> As per your suggestion, we should have, if there are n-IRQs, k-MSIXs
>>> and m-MSIs, allocation of IRQs should be done as below
>>>
>>> |IRQ0|IRQ1|..|IRQ-(n-1)|MSI|MSIX|
>>>  ||
>>>  |
>>> |MSIX0||MSIX1||MSIX2||MSIX-(k-1)|
>>>  
>>> |MSI0||MSI1||MSI2||MSI-(m-1)|
>> No I really meant this list of indices: IRQ0|IRQ1|..|IRQ-(n-1)|MSI|MSIX|
>> and potentially later on IRQ0|IRQ1|..|IRQ-(n-1)|MSI|MSIX| ERR| REQ
>> if ERR/REQ were to be added.
> I agree on this. Actually the map I drew incorrectly above but wanted
> to demonstrate the same. It was a child-parent relationship for MSI
> and its members and similarly for MSIX as well.
>>
>> I think the userspace could query the total number of indices using
>> VFIO_DEVICE_GET_INFO and retrieve num_irqs (corresponding to the n wire
>> interrupts + MSI index + MSIX index)
>>
>> Then userspace can loop on all the indices using
>> VFIO_DEVICE_GET_IRQ_INFO. For each index it uses count to determine the
>> first indices related to wire interrupts (count = 1). Then comes the MSI
>> index, and after the MSI index. If any of those is supported, count >1,
>> otherwise count=0. The only thing I am dubious about is can the device
>> use a single MSI/MSIX? Because my hypothesis here is we use count to
>> discriminate between wire first indices and other indices.
> I believe count can be one as well, especially for ERR/REQ as you
> mentioned above.
Given ERR and REQ indices would follow MSI and MSIX ones, MSI index
could be recognized by the first index whose count != 1. But indeed I am
not sure the number of supported vectors cannot be 1. In your case it is
induced by the size of the ring so it is OK but for other devices this
may be different.

> I think we can not rely on the count > 1. Now, this is
> blocking and we are not left with options unless we consider adding
> more enums in flags in vfio_irq_info to tell userspace that particular
> index is wired, MSI, MSIX etc. for the platform device.
> What do you think?
If count is not reliable to discriminate the first n wired interrupts
from the subsequent MSI and MSIX indices, Alex suggested adding a
capability extension in the vfio_irq_info structure. Something similar
to what was done for vfio_region_info.

Such kind of thing was attempted in
https://lore.kernel.org/kvmarm/20201116110030.32335-8-eric.au...@redhat.com/T/#u

` [PATCH v11 07/13] vfio: Use capability chains to handle device
specific irq
` [PATCH v11 08/13] vfio/pci: Add framework for custom interrupt indices
` [PATCH v11 09/13] vfio: Add new IRQ for DMA fault reporting

Note this has not been reviewed yet.
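
To make the capability idea concrete, here is a minimal sketch of how
userspace might walk such a chain, assuming it would follow the same
convention as the existing vfio_region_info chain: each capability starts
with a header carrying an id, a version and the offset of the next
capability from the start of the info buffer, with 0 ending the chain.
struct cap_header mirrors the layout of struct vfio_info_cap_header, but
find_cap(), put_cap() and the ids used in the usage note are hypothetical,
since no IRQ capability ids existed upstream at the time of this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same layout as struct vfio_info_cap_header in the VFIO uAPI. */
struct cap_header {
	uint16_t id;
	uint16_t version;
	uint32_t next; /* offset of next capability, 0 terminates the chain */
};

/* Return the offset of the capability with the given id, or -1. */
static long find_cap(const uint8_t *buf, size_t buf_len, uint32_t first,
		     uint16_t id)
{
	/* Bound the walk so a malformed, looping chain cannot spin forever. */
	size_t max_hops = buf_len / sizeof(struct cap_header);
	uint32_t off = first;
	size_t hops;

	for (hops = 0; off != 0 && hops < max_hops; hops++) {
		struct cap_header hdr;

		if (off > buf_len - sizeof(hdr))
			return -1; /* header would run past the buffer */
		memcpy(&hdr, buf + off, sizeof(hdr));
		if (hdr.id == id)
			return (long)off;
		off = hdr.next;
	}
	return -1;
}

/* Test helper (hypothetical): write one capability header into buf. */
static void put_cap(uint8_t *buf, uint32_t off, uint16_t id, uint32_t next)
{
	struct cap_header hdr = { .id = id, .version = 1, .next = next };

	memcpy(buf + off, &hdr, sizeof(hdr));
}
```

With a per-index capability of this kind, userspace would no longer need
to infer the interrupt type from count.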

Thanks

Eric

>>
>>
>>
>>> With this implementation user space can know that, at

Re: [RFC, v1 0/3] msi support for platform devices

2020-11-16 Thread Auger Eric
Hi Vikas,

On 11/13/20 6:24 PM, Vikas Gupta wrote:
> Hi Eric,
> 
> On Fri, Nov 13, 2020 at 12:10 AM Auger Eric  wrote:
>>
>> Hi Vikas,
>>
>> On 11/12/20 6:58 PM, Vikas Gupta wrote:
>>> This RFC adds support for MSI for platform devices.
>>> a) MSI(s) is/are added in addition to the normal interrupts.
>>> b) The vendor specific MSI configuration can be done using
>>>callbacks which is implemented as msi module.
>>> c) Adds a msi handling module for the Broadcom platform devices.
>>>
>>> Changes from:
>>> -
>>>  v0 to v1:
>>>i)  Removed MSI device flag VFIO_DEVICE_FLAGS_MSI.
>>>ii) Add MSI(s) at the end of the irq list of platform IRQs.
>>>MSI(s) with first entry of MSI block has count and flag
>>>information.
>>>IRQ list: Allocation for IRQs + MSIs are allocated as below
>>>Example: if there are 'n' IRQs and 'k' MSIs
>>>---
>>>|IRQ-0|IRQ-1||IRQ-n|MSI-0|MSI-1|MSI-2|..|MSI-k|
>>>---
>> I have not taken time yet to look at your series, but to me you should have
>> |IRQ-0|IRQ-1||IRQ-n|MSI|MSIX
>> then for setting a given MSIX (i) you would select the MSIx index and
>> then set start=i count=1.
> 
> As per your suggestion, we should have, if there are n-IRQs, k-MSIXs
> and m-MSIs, allocation of IRQs should be done as below
> 
> |IRQ0|IRQ1|..|IRQ-(n-1)|MSI|MSIX|
>  ||
>  |
> |MSIX0||MSIX1||MSIX2||MSIX-(k-1)|
>  |MSI0||MSI1||MSI2||MSI-(m-1)|
No I really meant this list of indices: IRQ0|IRQ1|..|IRQ-(n-1)|MSI|MSIX|
and potentially later on IRQ0|IRQ1|..|IRQ-(n-1)|MSI|MSIX| ERR| REQ
if ERR/REQ were to be added.

I think the userspace could query the total number of indices using
VFIO_DEVICE_GET_INFO and retrieve num_irqs (corresponding to the n wire
interrupts + MSI index + MSIX index)

Then userspace can loop on all the indices using
VFIO_DEVICE_GET_IRQ_INFO. For each index it uses count to determine the
first indices related to wire interrupts (count = 1). Then comes the MSI
index, and after it the MSIX index. If any of those is supported, count > 1,
otherwise count = 0. The only thing I am dubious about is whether the device
can use a single MSI/MSIX, because my hypothesis here is that we use count
to discriminate between the first wire indices and the other indices.
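
The count-based heuristic can be sketched as follows. This is only an
illustration under stated assumptions: the counts array stands in for the
count field that userspace would read from struct vfio_irq_info for each
index via VFIO_DEVICE_GET_IRQ_INFO, and first_non_wired_index() is a
hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the first index whose count differs from 1, i.e. the index the
 * heuristic would treat as the MSI index. Returns n if every index has
 * count == 1.
 */
static size_t first_non_wired_index(const unsigned int *counts, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (counts[i] != 1)
			break;
	return i;
}
```

With three wires followed by an MSI index exposing 8 vectors and an
unsupported MSIX index, the counts are {1, 1, 1, 8, 0} and the helper
returns 3 as expected. If the MSI index exposes a single vector, the
counts become {1, 1, 1, 1, 0} and the helper returns 4, so the MSI index
is mistaken for a wire. That is exactly the case the hypothesis above
cannot handle.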



> With this implementation user space can know that, at indexes n and
> n+1, edge triggered interrupts are present.
Note that wired interrupts can also be edge-triggered.
>We may add an element in vfio_platform_irq itself to allocate MSIs/MSIXs
>struct vfio_platform_irq{
>.
>.
>struct vfio_platform_irq *block; => this points to the block
> allocation for MSIs/MSIXs and all msi/msix are type of IRQs.
As wired interrupts and MSI interrupts coexist, I would store in vdev an
array of wired interrupts (the existing vdev->irqs) and a new array for
MSI(x) as done in the PCI code.

vdev->ctx = kcalloc(nvec, sizeof(struct vfio_pci_irq_ctx), GFP_KERNEL);

Does it make sense?

>};
>  OR
> Another structure can be defined in 'vfio_pci_private.h'
> struct vfio_msi_ctx {
> struct eventfd_ctx  *trigger;
> char*name;
> };
> and
> struct vfio_platform_irq {
>   .
>   .
>   struct vfio_msi_ctx *block; => this points to the block allocation
> for MSIs/MSIXs
> };
> Which of the above two options sounds OK to you? Please suggest.
> 
>> to me individual MSIs are encoded in the subindex and not in the index.
>> The index just selects the "type" of interrupt.
>>
>> For PCI you just have:
>> VFIO_PCI_INTX_IRQ_INDEX,
>> VFIO_PCI_MSI_IRQ_INDEX, -> MSI index and then you play with
>> start/count
>> VFIO_PCI_MSIX_IRQ_INDEX,
>> VFIO_PCI_ERR_IRQ_INDEX,
>> VFIO_PCI_REQ_IRQ_INDEX,
>>
>> (include/uapi/linux/vfio.h)
> 
> In pci case, type of interrupts is fixed so they can be 'indexed' by
> these enums but for VFIO platform user space will need to iterate all
> (num_irqs) indexes to know at which indexes edge triggered interrupts
> are present.
Indeed, but can't you loop over all indices until count != 1? At
that point you know you have finished enumerating the wires. This holds
only if the MSI(x) count != 1, of course.

Thanks

Eric

> 
> Thanks,
> Vikas
>>
>> Thanks
>>
>>

Re: [PATCH v10 04/11] vfio/pci: Add VFIO_REGION_TYPE_NESTED region type

2020-11-13 Thread Auger Eric
Hi Zenghui,
On 9/24/20 10:23 AM, Zenghui Yu wrote:
> Hi Eric,
> 
> On 2020/3/21 0:19, Eric Auger wrote:
>> Add a new specific DMA_FAULT region aiming to expose nested mode
>> translation faults.
>>
>> The region has a ring buffer that contains the actual fault
>> records plus a header allowing to handle it (tail/head indices,
>> max capacity, entry size). At the moment the region is dimensioned
>> for 512 fault records.
>>
>> Signed-off-by: Eric Auger 
> 
> [...]
> 
>> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
>> index 379a02c36e37..586b89debed5 100644
>> --- a/drivers/vfio/pci/vfio_pci.c
>> +++ b/drivers/vfio/pci/vfio_pci.c
>> @@ -260,6 +260,69 @@ int vfio_pci_set_power_state(struct
>> vfio_pci_device *vdev, pci_power_t state)
>>   return ret;
>>   }
>>   +static void vfio_pci_dma_fault_release(struct vfio_pci_device *vdev,
>> +   struct vfio_pci_region *region)
>> +{
>> +}
>> +
>> +static int vfio_pci_dma_fault_add_capability(struct vfio_pci_device
>> *vdev,
>> + struct vfio_pci_region *region,
>> + struct vfio_info_cap *caps)
>> +{
>> +    struct vfio_region_info_cap_fault cap = {
>> +    .header.id = VFIO_REGION_INFO_CAP_DMA_FAULT,
>> +    .header.version = 1,
>> +    .version = 1,
>> +    };
>> +    return vfio_info_add_capability(caps, &cap.header, sizeof(cap));
>> +}
>> +
>> +static const struct vfio_pci_regops vfio_pci_dma_fault_regops = {
>> +    .rw    = vfio_pci_dma_fault_rw,
>> +    .release    = vfio_pci_dma_fault_release,
>> +    .add_capability = vfio_pci_dma_fault_add_capability,
>> +};
>> +
>> +#define DMA_FAULT_RING_LENGTH 512
>> +
>> +static int vfio_pci_init_dma_fault_region(struct vfio_pci_device *vdev)
>> +{
>> +    struct vfio_region_dma_fault *header;
>> +    size_t size;
>> +    int ret;
>> +
>> +    mutex_init(&vdev->fault_queue_lock);
>> +
>> +    /*
>> + * We provision 1 page for the header and space for
>> + * DMA_FAULT_RING_LENGTH fault records in the ring buffer.
>> + */
>> +    size = ALIGN(sizeof(struct iommu_fault) *
>> + DMA_FAULT_RING_LENGTH, PAGE_SIZE) + PAGE_SIZE;
>> +
>> +    vdev->fault_pages = kzalloc(size, GFP_KERNEL);
>> +    if (!vdev->fault_pages)
>> +    return -ENOMEM;
>> +
>> +    ret = vfio_pci_register_dev_region(vdev,
>> +    VFIO_REGION_TYPE_NESTED,
>> +    VFIO_REGION_SUBTYPE_NESTED_DMA_FAULT,
>> +    &vfio_pci_dma_fault_regops, size,
>> +    VFIO_REGION_INFO_FLAG_READ | VFIO_REGION_INFO_FLAG_WRITE,
>> +    vdev->fault_pages);
>> +    if (ret)
>> +    goto out;
>> +
>> +    header = (struct vfio_region_dma_fault *)vdev->fault_pages;
>> +    header->entry_size = sizeof(struct iommu_fault);
>> +    header->nb_entries = DMA_FAULT_RING_LENGTH;
>> +    header->offset = sizeof(struct vfio_region_dma_fault);
>> +    return 0;
>> +out:
>> +    kfree(vdev->fault_pages);
>> +    return ret;
>> +}
>> +
>>   static int vfio_pci_enable(struct vfio_pci_device *vdev)
>>   {
>>   struct pci_dev *pdev = vdev->pdev;
>> @@ -358,6 +421,10 @@ static int vfio_pci_enable(struct vfio_pci_device
>> *vdev)
>>   }
>>   }
>>   +    ret = vfio_pci_init_dma_fault_region(vdev);
>> +    if (ret)
>> +    goto disable_exit;
>> +
>>   vfio_pci_probe_mmaps(vdev);
>>     return 0;
>> @@ -1383,6 +1450,7 @@ static void vfio_pci_remove(struct pci_dev *pdev)
>>     vfio_iommu_group_put(pdev->dev.iommu_group, &pdev->dev);
>>   kfree(vdev->region);
>> +    kfree(vdev->fault_pages);
>>   mutex_destroy(&vdev->ioeventfds_lock);
>>     if (!disable_idle_d3)
>> diff --git a/drivers/vfio/pci/vfio_pci_private.h
>> b/drivers/vfio/pci/vfio_pci_private.h
>> index 8a2c7607d513..a392f50e3a99 100644
>> --- a/drivers/vfio/pci/vfio_pci_private.h
>> +++ b/drivers/vfio/pci/vfio_pci_private.h
>> @@ -119,6 +119,8 @@ struct vfio_pci_device {
>>   int    ioeventfds_nr;
>>   struct eventfd_ctx    *err_trigger;
>>   struct eventfd_ctx    *req_trigger;
>> +    u8    *fault_pages;
> 
> What's the reason to use 'u8'? It doesn't match the type of header, nor
> the type of ring buffer.
Actually it matches the other buffers in struct vfio_pci_device:
u8  *pci_config_map;
u8  *vconfig;

fault_pages is the VA of the ring buffer. In the header, offset is the
offset of the ring with respect to the start of the region.

> 
>> +    struct mutex    fault_queue_lock;
>>   struct list_head    dummy_resources_list;
>>   struct mutex    ioeventfds_lock;
>>   struct list_head    ioeventfds_list;
>> @@ -150,6 +152,14 @@ extern ssize_t vfio_pci_vga_rw(struct
>> vfio_pci_device *vdev, char __user *buf,
>>   extern long vfio_pci_ioeventfd(struct vfio_pci_device *vdev, loff_t
>> offset,
>>  uint64_t data, int count, int fd);
>>   +struct vfio_pci_fault_abi {
>> +    u32 entry_size;
>> +};
> 
> This is not used by this patch (and the whole series).
removed
> 
>> +
>> +extern size_t 

Re: [PATCH v10 05/11] vfio/pci: Register an iommu fault handler

2020-11-13 Thread Auger Eric
Hi Zenghui,

On 9/24/20 10:49 AM, Zenghui Yu wrote:
> Hi Eric,
> 
> On 2020/3/21 0:19, Eric Auger wrote:
>> Register an IOMMU fault handler which records faults in
>> the DMA FAULT region ring buffer. In a subsequent patch, we
>> will add the signaling of a specific eventfd to allow the
>> userspace to be notified whenever a new fault has shown up.
>>
>> Signed-off-by: Eric Auger 
> 
> [...]
> 
>> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
>> index 586b89debed5..69595c240baf 100644
>> --- a/drivers/vfio/pci/vfio_pci.c
>> +++ b/drivers/vfio/pci/vfio_pci.c
>> @@ -27,6 +27,7 @@
>>   #include 
>>   #include 
>>   #include 
>> +#include 
>>     #include "vfio_pci_private.h"
>>   @@ -283,6 +284,38 @@ static const struct vfio_pci_regops
>> vfio_pci_dma_fault_regops = {
>>   .add_capability = vfio_pci_dma_fault_add_capability,
>>   };
>>   +int vfio_pci_iommu_dev_fault_handler(struct iommu_fault *fault,
>> void *data)
>> +{
>> +    struct vfio_pci_device *vdev = (struct vfio_pci_device *)data;
>> +    struct vfio_region_dma_fault *reg =
>> +    (struct vfio_region_dma_fault *)vdev->fault_pages;
>> +    struct iommu_fault *new =
>> +    (struct iommu_fault *)(vdev->fault_pages + reg->offset +
>> +    reg->head * reg->entry_size);
> 
> Shouldn't 'reg->head' be protected under the fault_queue_lock? Otherwise
> things may change behind our backs...
>
> We shouldn't take any assumption about how IOMMU driver would report the
> fault (serially or in parallel), I think.

Yes, I modified the locking.
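
To illustrate the point raised above, here is a minimal user-space sketch
(not the actual respin) of a producer that samples head only once the lock
is held. The struct names, the pthread mutex, and fault_ring_push() are
assumptions made for illustration; only the ring length of 512 and the
"(head + 1) % size" update come from the patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define RING_LENGTH 512 /* mirrors DMA_FAULT_RING_LENGTH in the patch */

/* Hypothetical stand-in for struct iommu_fault. */
struct fault_record {
	uint32_t type;
	uint64_t addr;
};

/* Hypothetical user-space model of the fault ring and its lock. */
struct fault_ring {
	pthread_mutex_t lock;
	uint32_t head; /* producer index */
	uint32_t tail; /* consumer index */
	struct fault_record entries[RING_LENGTH];
};

static void fault_ring_init(struct fault_ring *ring)
{
	pthread_mutex_init(&ring->lock, NULL);
	ring->head = 0;
	ring->tail = 0;
}

/* Returns 0 on success, -1 when the ring is full. */
static int fault_ring_push(struct fault_ring *ring,
			   const struct fault_record *rec)
{
	uint32_t head, next;
	int ret = 0;

	pthread_mutex_lock(&ring->lock);

	/* head is read only after the lock is taken, so two concurrent
	 * producers can never pick the same slot. */
	head = ring->head;
	next = (head + 1) % RING_LENGTH;

	if (next == ring->tail) {
		/* One slot always stays empty to tell full from empty. */
		ret = -1;
		goto unlock;
	}

	ring->entries[head] = *rec;
	ring->head = next;
unlock:
	pthread_mutex_unlock(&ring->lock);
	return ret;
}
```

The destination slot is derived from the head value read inside the
critical section, rather than being computed at function entry.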

Thanks

Eric
> 
>> +    int head, tail, size;
>> +    int ret = 0;
>> +
>> +    if (fault->type != IOMMU_FAULT_DMA_UNRECOV)
>> +    return -ENOENT;
>> +
>> +    mutex_lock(&vdev->fault_queue_lock);
>> +
>> +    head = reg->head;
>> +    tail = reg->tail;
>> +    size = reg->nb_entries;
>> +
>> +    if (CIRC_SPACE(head, tail, size) < 1) {
>> +    ret = -ENOSPC;
>> +    goto unlock;
>> +    }
>> +
>> +    *new = *fault;
>> +    reg->head = (head + 1) % size;
>> +unlock:
>> +    mutex_unlock(&vdev->fault_queue_lock);
>> +    return ret;
>> +}
>> +
>>   #define DMA_FAULT_RING_LENGTH 512
>>     static int vfio_pci_init_dma_fault_region(struct vfio_pci_device
>> *vdev)
>> @@ -317,6 +350,13 @@ static int vfio_pci_init_dma_fault_region(struct
>> vfio_pci_device *vdev)
>>   header->entry_size = sizeof(struct iommu_fault);
>>   header->nb_entries = DMA_FAULT_RING_LENGTH;
>>   header->offset = sizeof(struct vfio_region_dma_fault);
>> +
>> +    ret = iommu_register_device_fault_handler(&vdev->pdev->dev,
>> +    vfio_pci_iommu_dev_fault_handler,
>> +    vdev);
>> +    if (ret)
>> +    goto out;
>> +
>>   return 0;
>>   out:
>>   kfree(vdev->fault_pages);
> 
> 
> Thanks,
> Zenghui
> 



Re: [RFC, v1 0/3] msi support for platform devices

2020-11-12 Thread Auger Eric
Hi Vikas,

On 11/12/20 6:58 PM, Vikas Gupta wrote:
> This RFC adds support for MSI for platform devices.
> a) MSI(s) is/are added in addition to the normal interrupts.
> b) The vendor specific MSI configuration can be done using
>callbacks which is implemented as msi module.
> c) Adds a msi handling module for the Broadcom platform devices.
> 
> Changes from:
> -
>  v0 to v1:
>i)  Removed MSI device flag VFIO_DEVICE_FLAGS_MSI.
>ii) Add MSI(s) at the end of the irq list of platform IRQs.
>MSI(s) with first entry of MSI block has count and flag
>information.
>IRQ list: Allocation for IRQs + MSIs are allocated as below
>Example: if there are 'n' IRQs and 'k' MSIs
>---
>|IRQ-0|IRQ-1||IRQ-n|MSI-0|MSI-1|MSI-2|..|MSI-k|
>---
I have not taken time yet to look at your series, but to me you should have
|IRQ-0|IRQ-1|...|IRQ-n|MSI|MSIX|
Then, to set a given MSI-X vector i, you would select the MSIX index and
set start=i, count=1.
To me, individual MSIs are encoded in the subindex and not in the index.
The index just selects the "type" of interrupt.

For PCI you just have:
VFIO_PCI_INTX_IRQ_INDEX,
VFIO_PCI_MSI_IRQ_INDEX, -> MSI index and then you play with
start/count
VFIO_PCI_MSIX_IRQ_INDEX,
VFIO_PCI_ERR_IRQ_INDEX,
VFIO_PCI_REQ_IRQ_INDEX,

(include/uapi/linux/vfio.h)
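
As a rough sketch of the index/subindex split, assuming the uapi header is
available to userspace: the helper below (its name is made up for this
example) fills a struct vfio_irq_set that attaches one eventfd to a single
vector. The index selects the interrupt type and start/count select the
vector within it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <linux/vfio.h>

/*
 * Build a VFIO_DEVICE_SET_IRQS argument that wires one eventfd to
 * subindex `vector` of interrupt index `index`.
 * The helper name is hypothetical; the struct and flags come from
 * include/uapi/linux/vfio.h. The caller frees the result.
 */
static struct vfio_irq_set *build_single_vector_set(uint32_t index,
						    uint32_t vector,
						    int32_t efd)
{
	size_t argsz = sizeof(struct vfio_irq_set) + sizeof(efd);
	struct vfio_irq_set *set = calloc(1, argsz);

	if (!set)
		return NULL;

	set->argsz = (uint32_t)argsz;
	set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
	set->index = index;  /* which "type" of interrupt */
	set->start = vector; /* first subindex */
	set->count = 1;      /* exactly one vector */
	memcpy(set->data, &efd, sizeof(efd));

	return set;
}
```

The result would then be passed to ioctl(device_fd, VFIO_DEVICE_SET_IRQS, set),
for instance with index = VFIO_PCI_MSIX_IRQ_INDEX on a PCI device.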

Thanks

Eric
>MSI-0 will have count=k set and flags set accordingly.
> 
> Vikas Gupta (3):
>   vfio/platform: add support for msi
>   vfio/platform: change cleanup order
>   vfio/platform: add Broadcom msi module
> 
>  drivers/vfio/platform/Kconfig |   1 +
>  drivers/vfio/platform/Makefile|   1 +
>  drivers/vfio/platform/msi/Kconfig |   9 +
>  drivers/vfio/platform/msi/Makefile|   2 +
>  .../vfio/platform/msi/vfio_platform_bcmplt.c  |  74 ++
>  drivers/vfio/platform/vfio_platform_common.c  |  86 ++-
>  drivers/vfio/platform/vfio_platform_irq.c | 238 +-
>  drivers/vfio/platform/vfio_platform_private.h |  23 ++
>  8 files changed, 419 insertions(+), 15 deletions(-)
>  create mode 100644 drivers/vfio/platform/msi/Kconfig
>  create mode 100644 drivers/vfio/platform/msi/Makefile
>  create mode 100644 drivers/vfio/platform/msi/vfio_platform_bcmplt.c
> 



Re: [RFC, v0 1/3] vfio/platform: add support for msi

2020-11-09 Thread Auger Eric
Hi Vikas,

On 11/9/20 7:41 AM, Vikas Gupta wrote:
> Hi Alex,
> 
> On Fri, Nov 6, 2020 at 8:42 AM Alex Williamson
>  wrote:
>>
>> On Fri, 6 Nov 2020 08:24:26 +0530
>> Vikas Gupta  wrote:
>>
>>> Hi Alex,
>>>
>>> On Thu, Nov 5, 2020 at 12:38 PM Alex Williamson
>>>  wrote:

>>>> On Thu,  5 Nov 2020 11:32:55 +0530
>>>> Vikas Gupta  wrote:
>>>>
>>>>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>>>>> index 2f313a238a8f..aab051e8338d 100644
>>>>> --- a/include/uapi/linux/vfio.h
>>>>> +++ b/include/uapi/linux/vfio.h
>>>>> @@ -203,6 +203,7 @@ struct vfio_device_info {
>>>>>  #define VFIO_DEVICE_FLAGS_AP (1 << 5)/* vfio-ap device */
>>>>>  #define VFIO_DEVICE_FLAGS_FSL_MC (1 << 6)/* vfio-fsl-mc device */
>>>>>  #define VFIO_DEVICE_FLAGS_CAPS   (1 << 7)/* Info supports caps */
>>>>> +#define VFIO_DEVICE_FLAGS_MSI(1 << 8)/* Device supports msi */
>>>>>   __u32   num_regions;/* Max region index + 1 */
>>>>>   __u32   num_irqs;   /* Max IRQ index + 1 */
>>>>>   __u32   cap_offset; /* Offset within info struct of first cap */
>>>>
>>>> This doesn't make any sense to me, MSIs are just edge triggered
>>>> interrupts to userspace, so why isn't this fully described via
>>>> VFIO_DEVICE_GET_IRQ_INFO?  If we do need something new to describe it,
>>>> this seems incomplete, which indexes are MSI (IRQ_INFO can describe
>>>> that)?  We also already support MSI with vfio-pci, so a global flag for
>>>> the device advertising this still seems wrong.  Thanks,
>>>>
>>>> Alex

>>> Since VFIO platform uses indexes for IRQ numbers so I think MSI(s)
>>> cannot be described using indexes.
>>
>> That would be news for vfio-pci which has been describing MSIs with
>> sub-indexes within indexes since vfio started.
>>
>>> In the patch set there is no difference between MSI and normal
>>> interrupt for VFIO_DEVICE_GET_IRQ_INFO.
>>
>> Then what exactly is a global device flag indicating?  Does it indicate
>> all IRQs are MSI?
> 
> No, it's not indicating that all are MSI.
> The rationale behind adding the flag to tell user-space that platform
> device supports MSI as well. As you mentioned recently added
> capabilities can help on this, I`ll go through that.
> 
>>
>>> The patch set adds MSI(s), say as an extension, to the normal
>>> interrupts and handled accordingly.
>>
>> So we have both "normal" IRQs and MSIs?  How does the user know which
>> indexes are which?
> 
> With this patch set, I think this is missing and user space cannot
> know that particular index is MSI interrupt.
> For platform devices there is no such mechanism, like index and
> sub-indexes to differentiate between legacy, MSI or MSIX as it’s there
> in PCI.
Why can't you use the count field (as per vfio_pci_get_irq_count())?
> I believe for a particular IRQ index if the flag
> VFIO_IRQ_INFO_NORESIZE is used then user space can know which IRQ
> index has MSI(s). Does it make sense?
I don't think it is the same semantics.

Thanks

Eric
> Suggestions on this would be helpful.
> 
> Thanks,
> Vikas
>>
>>> Do you see this is a violation? If
>>
>> Seems pretty unclear and dubious use of a global device flag.
>>
>>> yes, then we`ll think of other possible ways to support MSI for the
>>> platform devices.
>>> Macro VFIO_DEVICE_FLAGS_MSI can be changed to any other name if it
>>> collides with an already supported vfio-pci or if not necessary, we
>>> can remove this flag.
>>
>> If nothing else you're using a global flag to describe a platform
>> device specific augmentation.  We've recently added capabilities on the
>> device info return that would be more appropriate for this, but
>> fundamentally I don't understand why the irq info isn't sufficient.
>> Thanks,
>>
>> Alex
>>



Re: [RFC, v0 1/3] vfio/platform: add support for msi

2020-11-09 Thread Auger Eric
Hi Vikas,

On 11/6/20 3:54 AM, Vikas Gupta wrote:
> Hi Alex,
> 
> On Thu, Nov 5, 2020 at 12:38 PM Alex Williamson
>  wrote:
>>
>> On Thu,  5 Nov 2020 11:32:55 +0530
>> Vikas Gupta  wrote:
>>
>>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>>> index 2f313a238a8f..aab051e8338d 100644
>>> --- a/include/uapi/linux/vfio.h
>>> +++ b/include/uapi/linux/vfio.h
>>> @@ -203,6 +203,7 @@ struct vfio_device_info {
>>>  #define VFIO_DEVICE_FLAGS_AP (1 << 5)/* vfio-ap device */
>>>  #define VFIO_DEVICE_FLAGS_FSL_MC (1 << 6)/* vfio-fsl-mc device */
>>>  #define VFIO_DEVICE_FLAGS_CAPS   (1 << 7)/* Info supports caps */
>>> +#define VFIO_DEVICE_FLAGS_MSI(1 << 8)/* Device supports msi */
>>>   __u32   num_regions;/* Max region index + 1 */
>>>   __u32   num_irqs;   /* Max IRQ index + 1 */
>>>   __u32   cap_offset; /* Offset within info struct of first cap */
>>
>> This doesn't make any sense to me, MSIs are just edge triggered
>> interrupts to userspace, so why isn't this fully described via
>> VFIO_DEVICE_GET_IRQ_INFO?  If we do need something new to describe it,
>> this seems incomplete, which indexes are MSI (IRQ_INFO can describe
>> that)?  We also already support MSI with vfio-pci, so a global flag for
>> the device advertising this still seems wrong.  Thanks,
>>
>> Alex
>>
> Since VFIO platform uses indexes for IRQ numbers so I think MSI(s)
> cannot be described using indexes.
> In the patch set there is no difference between MSI and normal
> interrupt for VFIO_DEVICE_GET_IRQ_INFO.
In vfio_platform_irq_init() we first iterate over the normal interrupts using
get_irq(). Can't we add an MSI index at the end of this list with
vdev->irqs[i].count > 1 and set vdev->num_irqs accordingly?

Thanks

Eric
> The patch set adds MSI(s), say as an extension, to the normal
> interrupts and handled accordingly. Do you see this is a violation? If
> yes, then we`ll think of other possible ways to support MSI for the
> platform devices.
> Macro VFIO_DEVICE_FLAGS_MSI can be changed to any other name if it
> collides with an already supported vfio-pci or if not necessary, we
> can remove this flag.
> 
> Thanks,
> Vikas
> 



Re: [PATCH v10 01/11] vfio: VFIO_IOMMU_SET_PASID_TABLE

2020-10-27 Thread Auger Eric
Hi Shameer,

On 10/27/20 1:20 PM, Shameerali Kolothum Thodi wrote:
> Hi Eric,
> 
>> -----Original Message-----
>> From: iommu [mailto:iommu-boun...@lists.linux-foundation.org] On Behalf Of
>> Auger Eric
>> Sent: 23 September 2020 12:47
>> To: yuzenghui ; eric.auger@gmail.com;
>> io...@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
>> k...@vger.kernel.org; kvm...@lists.cs.columbia.edu; j...@8bytes.org;
>> alex.william...@redhat.com; jacob.jun@linux.intel.com;
>> yi.l@intel.com; robin.mur...@arm.com
>> Subject: Re: [PATCH v10 01/11] vfio: VFIO_IOMMU_SET_PASID_TABLE
> 
> ...
> 
>>> Besides, before going through the whole series [1][2], I'd like to
>>> know if this is the latest version of your Nested-Stage-Setup work in
>>> case I had missed something.
>>>
>>> [1]
>>> https://lore.kernel.org/r/20200320161911.27494-1-eric.au...@redhat.com
>>> [2]
>>> https://lore.kernel.org/r/20200414150607.28488-1-eric.au...@redhat.com
>>
>> yes those 2 series are the last ones. Thank you for reviewing.
>>
>> FYI, I intend to respin within a week or 2 on top of Jacob's  [PATCH v10 0/7]
>> IOMMU user API enhancement. 
> 
> Thanks for that. Also is there any plan to respin the related Qemu series as well?
> I know dual stage interface proposals are still under discussion, but it would be
> nice to have a testable solution based on new interfaces for ARM64 as well.
> Happy to help with any tests or verifications.

Yes, the QEMU series will be respun as well. That's at the top of my
todo list right now.

Thanks

Eric
> 
> Please let me know.
> 
> Thanks,
> Shameer
>   
> 



Re: [PATCH v6 10/10] vfio/fsl-mc: Add support for device reset

2020-10-10 Thread Auger Eric
Hi Diana,

On 10/5/20 7:36 PM, Diana Craciun wrote:
> Currently only resetting the DPRC container is supported which
> will reset all the objects inside it. Resetting individual
> objects is possible from the userspace by issuing commands
> towards MC firmware.
> 
> Signed-off-by: Diana Craciun 
Reviewed-by: Eric Auger 

Eric
> ---
>  drivers/vfio/fsl-mc/vfio_fsl_mc.c | 18 +-
>  1 file changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/vfio/fsl-mc/vfio_fsl_mc.c b/drivers/vfio/fsl-mc/vfio_fsl_mc.c
> index d95568cd8021..d009f873578c 100644
> --- a/drivers/vfio/fsl-mc/vfio_fsl_mc.c
> +++ b/drivers/vfio/fsl-mc/vfio_fsl_mc.c
> @@ -217,6 +217,10 @@ static long vfio_fsl_mc_ioctl(void *device_data, unsigned int cmd,
>   return -EINVAL;
>  
>   info.flags = VFIO_DEVICE_FLAGS_FSL_MC;
> +
> + if (is_fsl_mc_bus_dprc(mc_dev))
> + info.flags |= VFIO_DEVICE_FLAGS_RESET;
> +
>   info.num_regions = mc_dev->obj_desc.region_count;
>   info.num_irqs = mc_dev->obj_desc.irq_count;
>  
> @@ -299,7 +303,19 @@ static long vfio_fsl_mc_ioctl(void *device_data, unsigned int cmd,
>   }
>   case VFIO_DEVICE_RESET:
>   {
> - return -ENOTTY;
> + int ret;
> + struct fsl_mc_device *mc_dev = vdev->mc_dev;
> +
> + /* reset is supported only for the DPRC */
> + if (!is_fsl_mc_bus_dprc(mc_dev))
> + return -ENOTTY;
> +
> + ret = dprc_reset_container(mc_dev->mc_io, 0,
> +mc_dev->mc_handle,
> +mc_dev->obj_desc.id,
> +DPRC_RESET_OPTION_NON_RECURSIVE);
> + return ret;
> +
>   }
>   default:
>   return -ENOTTY;
> 



Re: [PATCH v6 01/10] vfio/fsl-mc: Add VFIO framework skeleton for fsl-mc devices

2020-10-10 Thread Auger Eric
Hi Diana,

On 10/5/20 7:36 PM, Diana Craciun wrote:
> From: Bharat Bhushan 
> 
> DPAA2 (Data Path Acceleration Architecture) consists in
> mechanisms for processing Ethernet packets, queue management,
> accelerators, etc.
> 
> The Management Complex (mc) is a hardware entity that manages the DPAA2
> hardware resources. It provides an object-based abstraction for software
> drivers to use the DPAA2 hardware. The MC mediates operations such as
> create, discover, destroy of DPAA2 objects.
> The MC provides memory-mapped I/O command interfaces (MC portals) which
> DPAA2 software drivers use to operate on DPAA2 objects.
> 
> A DPRC is a container object that holds other types of DPAA2 objects.
> Each object in the DPRC is a Linux device and bound to a driver.
> The MC-bus driver is a platform driver (different from PCI or platform
> bus). The DPRC driver does runtime management of a bus instance. It
> performs the initial scan of the DPRC and handles changes in the DPRC
> configuration (adding/removing objects).
> 
> All objects inside a container share the same hardware isolation
> context, meaning that only an entire DPRC can be assigned to
> a virtual machine.
> When a container is assigned to a virtual machine, all the objects
> within that container are assigned to that virtual machine.
> The DPRC container assigned to the virtual machine is not allowed
> to change contents (add/remove objects) by the guest. The restriction
> is set by the host and enforced by the mc hardware.
> 
> The DPAA2 objects can be directly assigned to the guest. However
> the MC portals (the memory mapped command interface to the MC) need
> to be emulated because there are commands that configure the
> interrupts and the isolation IDs which are virtual in the guest.
> 
> Example:
> echo vfio-fsl-mc > /sys/bus/fsl-mc/devices/dprc.2/driver_override
> echo dprc.2 > /sys/bus/fsl-mc/drivers/vfio-fsl-mc/bind
> 
> The dprc.2 is bound to the VFIO driver and all the objects within
> dprc.2 are going to be bound to the VFIO driver.
> 
> This patch adds the infrastructure for VFIO support for fsl-mc
> devices. Subsequent patches will add support for binding and secure
> assigning these devices using VFIO.
> 
> More details about the DPAA2 objects can be found here:
> Documentation/networking/device_drivers/freescale/dpaa2/overview.rst
> 
> Signed-off-by: Bharat Bhushan 
> Signed-off-by: Diana Craciun 
Reviewed-by: Eric Auger 

Thanks

Eric

> ---
>  MAINTAINERS   |   6 +
>  drivers/vfio/Kconfig  |   1 +
>  drivers/vfio/Makefile |   1 +
>  drivers/vfio/fsl-mc/Kconfig   |   9 ++
>  drivers/vfio/fsl-mc/Makefile  |   4 +
>  drivers/vfio/fsl-mc/vfio_fsl_mc.c | 157 ++
>  drivers/vfio/fsl-mc/vfio_fsl_mc_private.h |  14 ++
>  include/uapi/linux/vfio.h |   1 +
>  8 files changed, 193 insertions(+)
>  create mode 100644 drivers/vfio/fsl-mc/Kconfig
>  create mode 100644 drivers/vfio/fsl-mc/Makefile
>  create mode 100644 drivers/vfio/fsl-mc/vfio_fsl_mc.c
>  create mode 100644 drivers/vfio/fsl-mc/vfio_fsl_mc_private.h
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 33b27e62ce19..1046f4065ac1 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -18258,6 +18258,12 @@ F:   drivers/vfio/
>  F:   include/linux/vfio.h
>  F:   include/uapi/linux/vfio.h
>  
> +VFIO FSL-MC DRIVER
> +M:   Diana Craciun 
> +L:   k...@vger.kernel.org
> +S:   Maintained
> +F:   drivers/vfio/fsl-mc/
> +
>  VFIO MEDIATED DEVICE DRIVERS
>  M:   Kirti Wankhede 
>  L:   k...@vger.kernel.org
> diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
> index fd17db9b432f..5533df91b257 100644
> --- a/drivers/vfio/Kconfig
> +++ b/drivers/vfio/Kconfig
> @@ -47,4 +47,5 @@ menuconfig VFIO_NOIOMMU
>  source "drivers/vfio/pci/Kconfig"
>  source "drivers/vfio/platform/Kconfig"
>  source "drivers/vfio/mdev/Kconfig"
> +source "drivers/vfio/fsl-mc/Kconfig"
>  source "virt/lib/Kconfig"
> diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
> index de67c4725cce..fee73f3d9480 100644
> --- a/drivers/vfio/Makefile
> +++ b/drivers/vfio/Makefile
> @@ -9,3 +9,4 @@ obj-$(CONFIG_VFIO_SPAPR_EEH) += vfio_spapr_eeh.o
>  obj-$(CONFIG_VFIO_PCI) += pci/
>  obj-$(CONFIG_VFIO_PLATFORM) += platform/
>  obj-$(CONFIG_VFIO_MDEV) += mdev/
> +obj-$(CONFIG_VFIO_FSL_MC) += fsl-mc/
> diff --git a/drivers/vfio/fsl-mc/Kconfig b/drivers/vfio/fsl-mc/Kconfig
> new file mode 100644
> index ..b1a527d6b6f2
> --- /dev/null
> +++ b/drivers/vfio/fsl-mc/Kconfig
> @@ -0,0 +1,9 @@
> +config VFIO_FSL_MC
> + tristate "VFIO support for QorIQ DPAA2 fsl-mc bus devices"
> + depends on VFIO && FSL_MC_BUS && EVENTFD
> + help
> +   Driver to enable support for the VFIO QorIQ DPAA2 fsl-mc
> +   (Management Complex) devices. This is required to passthrough
> +   fsl-mc bus devices using the VFIO framework.
> +
> +   If you don't know what to do here, say N.
> diff 

Re: [PATCH v6 02/10] vfio/fsl-mc: Scan DPRC objects on vfio-fsl-mc driver bind

2020-10-10 Thread Auger Eric
Hi Diana,

On 10/5/20 7:36 PM, Diana Craciun wrote:
> The DPRC (Data Path Resource Container) device is a bus device and has
> child devices attached to it. When the vfio-fsl-mc driver is probed
> the DPRC is scanned and the child devices discovered and initialized.
> 
> Signed-off-by: Bharat Bhushan 
> Signed-off-by: Diana Craciun 
Reviewed-by: Eric Auger 

Thanks

Eric

> ---
>  drivers/vfio/fsl-mc/vfio_fsl_mc.c | 91 +++
>  drivers/vfio/fsl-mc/vfio_fsl_mc_private.h |  1 +
>  2 files changed, 92 insertions(+)
> 
> diff --git a/drivers/vfio/fsl-mc/vfio_fsl_mc.c b/drivers/vfio/fsl-mc/vfio_fsl_mc.c
> index a7a483a1e90b..594760203268 100644
> --- a/drivers/vfio/fsl-mc/vfio_fsl_mc.c
> +++ b/drivers/vfio/fsl-mc/vfio_fsl_mc.c
> @@ -15,6 +15,8 @@
>  
>  #include "vfio_fsl_mc_private.h"
>  
> +static struct fsl_mc_driver vfio_fsl_mc_driver;
> +
>  static int vfio_fsl_mc_open(void *device_data)
>  {
>   if (!try_module_get(THIS_MODULE))
> @@ -84,6 +86,80 @@ static const struct vfio_device_ops vfio_fsl_mc_ops = {
>   .mmap   = vfio_fsl_mc_mmap,
>  };
>  
> +static int vfio_fsl_mc_bus_notifier(struct notifier_block *nb,
> + unsigned long action, void *data)
> +{
> + struct vfio_fsl_mc_device *vdev = container_of(nb,
> + struct vfio_fsl_mc_device, nb);
> + struct device *dev = data;
> + struct fsl_mc_device *mc_dev = to_fsl_mc_device(dev);
> + struct fsl_mc_device *mc_cont = to_fsl_mc_device(mc_dev->dev.parent);
> +
> + if (action == BUS_NOTIFY_ADD_DEVICE &&
> + vdev->mc_dev == mc_cont) {
> + mc_dev->driver_override = kasprintf(GFP_KERNEL, "%s",
> + vfio_fsl_mc_ops.name);
> + if (!mc_dev->driver_override)
> + dev_warn(dev, "VFIO_FSL_MC: Setting driver override for device in dprc %s failed\n",
> +  dev_name(&mc_cont->dev));
> + else
> + dev_info(dev, "VFIO_FSL_MC: Setting driver override for device in dprc %s\n",
> +  dev_name(&mc_cont->dev));
> + } else if (action == BUS_NOTIFY_BOUND_DRIVER &&
> + vdev->mc_dev == mc_cont) {
> + struct fsl_mc_driver *mc_drv = to_fsl_mc_driver(dev->driver);
> +
> + if (mc_drv && mc_drv != &vfio_fsl_mc_driver)
> + dev_warn(dev, "VFIO_FSL_MC: Object %s bound to driver %s while DPRC bound to vfio-fsl-mc\n",
> +  dev_name(dev), mc_drv->driver.name);
> + }
> +
> + return 0;
> +}
> +
> +static int vfio_fsl_mc_init_device(struct vfio_fsl_mc_device *vdev)
> +{
> + struct fsl_mc_device *mc_dev = vdev->mc_dev;
> + int ret;
> +
> + /* Non-dprc devices share mc_io from parent */
> + if (!is_fsl_mc_bus_dprc(mc_dev)) {
> + struct fsl_mc_device *mc_cont = to_fsl_mc_device(mc_dev->dev.parent);
> +
> + mc_dev->mc_io = mc_cont->mc_io;
> + return 0;
> + }
> +
> + vdev->nb.notifier_call = vfio_fsl_mc_bus_notifier;
> + ret = bus_register_notifier(&fsl_mc_bus_type, &vdev->nb);
> + if (ret)
> + return ret;
> +
> + /* open DPRC, allocate a MC portal */
> + ret = dprc_setup(mc_dev);
> + if (ret) {
> + dev_err(&mc_dev->dev, "VFIO_FSL_MC: Failed to setup DPRC (%d)\n", ret);
> + goto out_nc_unreg;
> + }
> +
> + ret = dprc_scan_container(mc_dev, false);
> + if (ret) {
> + dev_err(&mc_dev->dev, "VFIO_FSL_MC: Container scanning failed (%d)\n", ret);
> + goto out_dprc_cleanup;
> + }
> +
> + return 0;
> +
> +out_dprc_cleanup:
> + dprc_remove_devices(mc_dev, NULL, 0);
> + dprc_cleanup(mc_dev);
> +out_nc_unreg:
> + bus_unregister_notifier(&fsl_mc_bus_type, &vdev->nb);
> + vdev->nb.notifier_call = NULL;
> +
> + return ret;
> +}
> +
>  static int vfio_fsl_mc_probe(struct fsl_mc_device *mc_dev)
>  {
>   struct iommu_group *group;
> @@ -110,8 +186,15 @@ static int vfio_fsl_mc_probe(struct fsl_mc_device *mc_dev)
>   dev_err(dev, "VFIO_FSL_MC: Failed to add to vfio group\n");
>   goto out_group_put;
>   }
> +
> + ret = vfio_fsl_mc_init_device(vdev);
> + if (ret)
> + goto out_group_dev;
> +
>   return 0;
>  
> +out_group_dev:
> + vfio_del_group_dev(dev);
>  out_group_put:
>   vfio_iommu_group_put(group, dev);
>   return ret;
> @@ -126,6 +209,14 @@ static int vfio_fsl_mc_remove(struct fsl_mc_device *mc_dev)
>   if (!vdev)
>   return -EINVAL;
>  
> + if (is_fsl_mc_bus_dprc(mc_dev)) {
> + dprc_remove_devices(mc_dev, NULL, 0);
> + dprc_cleanup(mc_dev);
> + }
> +
> + if (vdev->nb.notifier_call)
> + bus_unregister_notifier(&fsl_mc_bus_type, &vdev->nb);
> +
>   vfio_iommu_group_put(mc_dev->dev.iommu_group, dev);
>  
>   return 0;
> 
