Re: QEMU SMMUv3 stage 2 translation

2021-05-20 Thread Auger Eric
Hi Shashi,

[ fixing my email address ]

On 5/13/21 4:25 PM, Shashi Mallela wrote:
> Hi,
> 
> Since the current SMMUv3 QEMU implementation only supports stage 1
> translation, I wanted to understand whether the implementation could be
> extended to support stage 2 translation and, if so, what the overall scope
> would be. This is required for sbsa-ref platforms.

Yes, I think this is feasible. This would require some additional
decoding in the STE, and adapting the page table decoding to stage 2
according to the AArch64 Virtual Memory System Architecture.

If you proceed, I would like this code to be isolated from the stage 1
decoding as much as possible to ease maintenance, all the more so since
the stage 1 code is the one likely to be used in production for DPDK
or SVA use cases, let's dream. One of the tricky parts is the internal
TLB modeling (cache and IOTLB). In your case you may not need to
implement such an internal IOTLB for stage 2 entries, as I guess you do
not really target performance, and this is the source of lots of
bugs/headaches ;-)

Thanks

Eric
> .
> Thanks
> Shashi
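
The "additional decoding in the STE" mentioned above can be pictured with a
small sketch. This is not QEMU code: it assumes the SMMUv3 layout where bit 0
of the first STE word is the valid bit and bits [3:1] hold Config, with
Config bit 2 clear meaning abort, bit 0 enabling stage 1 and bit 1 enabling
stage 2. The names SteAction and decode_ste_config are hypothetical, and
returning None for an invalid STE is a simplification of the real fault
reporting.

```python
from enum import Enum


class SteAction(Enum):
    """Hypothetical classification of what an STE asks the SMMU to do."""
    ABORT = "abort"
    BYPASS = "bypass"
    STAGE1 = "stage 1 only"
    STAGE2 = "stage 2 only"
    NESTED = "stage 1 + stage 2"


def decode_ste_config(ste_word0):
    """Classify an STE from its first 32-bit word.

    Assumed layout: V at bit 0, Config at bits [3:1].
    """
    if not ste_word0 & 0x1:
        # Invalid STE; real hardware records an event, the sketch returns None.
        return None
    config = (ste_word0 >> 1) & 0x7
    if not config & 0x4:
        return SteAction.ABORT
    s1_enabled = bool(config & 0x1)
    s2_enabled = bool(config & 0x2)
    if s1_enabled and s2_enabled:
        return SteAction.NESTED
    if s2_enabled:
        return SteAction.STAGE2
    if s1_enabled:
        return SteAction.STAGE1
    return SteAction.BYPASS
```

In such a model, only the STAGE2 and NESTED outcomes would need the new
stage 2 page table walk, which is one way to keep it separate from the
existing stage 1 path.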




Re: [PATCH] hw/arm/smmuv3: Another range invalidation fix

2021-05-10 Thread Auger Eric
Hi Peter,

On 5/10/21 1:31 PM, Peter Maydell wrote:
> On Wed, 21 Apr 2021 at 18:29, Eric Auger  wrote:
>>
>> 6d9cd115b9 ("hw/arm/smmuv3: Enforce invalidation on a power of two range")
>> failed to completely fix misalignment issues with range
>> invalidation. For instance, invalidation patterns like "invalidate 32
>> 4kB pages starting from 0xff395000" are not correctly handled, because
>> the previous fix only made sure the number of invalidated
>> pages was a power of 2 but did not properly handle the case where the
>> start address was not aligned with the range. This can be noticed when
>> booting a Fedora 33 with protected virtio-blk-pci.
>>
>> Signed-off-by: Eric Auger 
>> Fixes: 6d9cd115b9 ("hw/arm/smmuv3: Enforce invalidation on a power of two 
>> range")
>>
>> ---
>>
>> This bug was found with SMMU RIL avocado-qemu acceptance tests
>> ---
>>  hw/arm/smmuv3.c | 49 +
>>  1 file changed, 25 insertions(+), 24 deletions(-)
>>
>> diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
>> index 8705612535..16f285a566 100644
>> --- a/hw/arm/smmuv3.c
>> +++ b/hw/arm/smmuv3.c
>> @@ -856,43 +856,44 @@ static void smmuv3_inv_notifiers_iova(SMMUState *s, 
>> int asid, dma_addr_t iova,
>>
>>  static void smmuv3_s1_range_inval(SMMUState *s, Cmd *cmd)
>>  {
>> -uint8_t scale = 0, num = 0, ttl = 0;
>> -dma_addr_t addr = CMD_ADDR(cmd);
>> +dma_addr_t end, addr = CMD_ADDR(cmd);
>>  uint8_t type = CMD_TYPE(cmd);
>>  uint16_t vmid = CMD_VMID(cmd);
>> +uint8_t scale = CMD_SCALE(cmd);
>> +uint8_t num = CMD_NUM(cmd);
>> +uint8_t ttl = CMD_TTL(cmd);
>>  bool leaf = CMD_LEAF(cmd);
>>  uint8_t tg = CMD_TG(cmd);
>> -uint64_t first_page = 0, last_page;
>> -uint64_t num_pages = 1;
>> +uint64_t num_pages;
>> +uint8_t granule;
>>  int asid = -1;
>>
>> -if (tg) {
>> -scale = CMD_SCALE(cmd);
>> -num = CMD_NUM(cmd);
>> -ttl = CMD_TTL(cmd);
>> -num_pages = (num + 1) * BIT_ULL(scale);
>> -}
>> -
>>  if (type == SMMU_CMD_TLBI_NH_VA) {
>>  asid = CMD_ASID(cmd);
>>  }
>>
>> +if (!tg) {
>> +trace_smmuv3_s1_range_inval(vmid, asid, addr, tg, 1, ttl, leaf);
>> +smmuv3_inv_notifiers_iova(s, asid, addr, tg, 1);
>> +smmu_iotlb_inv_iova(s, asid, addr, tg, 1, ttl);
>> +}
> 
> Is this intended to fall through ?
hum no it isn't. I will fix that.

Thanks

Eric
> 
>> +
>> +/* RIL in use */
>> +
>> +num_pages = (num + 1) * BIT_ULL(scale);
>> +granule = tg * 2 + 10;
>> +
>>  /* Split invalidations into ^2 range invalidations */
>> -last_page = num_pages - 1;
>> -while (num_pages) {
>> -uint8_t granule = tg * 2 + 10;
>> -uint64_t mask, count;
>> +end = addr + (num_pages << granule) - 1;
>>
>> -mask = dma_aligned_pow2_mask(first_page, last_page, 64 - granule);
>> -count = mask + 1;
>> +while (addr != end + 1) {
>> +uint64_t mask = dma_aligned_pow2_mask(addr, end, 64);
>>
>> -trace_smmuv3_s1_range_inval(vmid, asid, addr, tg, count, ttl, leaf);
>> -smmuv3_inv_notifiers_iova(s, asid, addr, tg, count);
>> -smmu_iotlb_inv_iova(s, asid, addr, tg, count, ttl);
>> -
>> -num_pages -= count;
>> -first_page += count;
>> -addr += count * BIT_ULL(granule);
>> +num_pages = (mask + 1) >> granule;
>> +trace_smmuv3_s1_range_inval(vmid, asid, addr, tg, num_pages, ttl, 
>> leaf);
>> +smmuv3_inv_notifiers_iova(s, asid, addr, tg, num_pages);
>> +smmu_iotlb_inv_iova(s, asid, addr, tg, num_pages, ttl);
>> +addr += mask + 1;
>>  }
>>  }
> 
> thanks
> -- PMM
> 
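
To make the splitting logic of the patch easier to follow, here is a minimal
Python model of it. It is only a sketch: aligned_pow2_mask is a simplified
stand-in written for this note (it is not QEMU's dma_aligned_pow2_mask),
range_inval_chunks is a hypothetical name, the start address is assumed to be
granule-aligned, and the early return for tg == 0 reflects the fall-through
fix agreed on above rather than the patch as posted.

```python
def aligned_pow2_mask(start, end):
    """Return size - 1 of the largest power-of-two block that starts at
    `start`, is naturally aligned, and does not go past `end` (inclusive)."""
    remaining = end - start + 1
    # Largest power of two that fits in the remaining length.
    size_limit = 1 << (remaining.bit_length() - 1)
    # Largest power of two that `start` is aligned to (no limit for 0).
    align_limit = (start & -start) if start else size_limit
    return min(size_limit, align_limit) - 1


def range_inval_chunks(addr, tg, num, scale):
    """Split a range invalidation into (address, page count) chunks."""
    if tg == 0:
        # No range invalidation in use: a single entry, then stop
        # (assumed early return, no fall-through).
        return [(addr, 1)]

    num_pages = (num + 1) << scale
    granule = tg * 2 + 10          # tg=1 -> 4kB, tg=2 -> 16kB, tg=3 -> 64kB
    end = addr + (num_pages << granule) - 1

    chunks = []
    while addr != end + 1:
        mask = aligned_pow2_mask(addr, end)
        chunks.append((addr, (mask + 1) >> granule))
        addr += mask + 1
    return chunks
```

For the case from the commit message, range_inval_chunks(0xff395000, tg=1,
num=0, scale=5) splits the 32 pages into chunks of 1, 2, 8, 16, 4 and 1
pages, each aligned on its own size.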




Re: [PATCH] virtio-gpu: handle partial maps properly

2021-05-06 Thread Auger Eric
Hi Gerd,

On 5/6/21 11:10 AM, Gerd Hoffmann wrote:
> dma_memory_map() may map only a part of the request.  Happens if the
> request can't be mapped in one go, for example due to a iommu creating
> a linear dma mapping for scattered physical pages.  Should that be the
> case virtio-gpu must call dma_memory_map() again with the remaining
> range instead of simply throwing an error.
> 
> Note that this change implies the number of iov entries may differ from
> the number of mapping entries sent by the guest.  Therefore the iov_len
> bookkeeping needs some updates too, we have to explicitly pass around
> the iov length now.
> 
> Reported-by: Auger Eric 
> Signed-off-by: Gerd Hoffmann 
> ---
>  include/hw/virtio/virtio-gpu.h |  3 +-
>  hw/display/virtio-gpu-3d.c |  7 ++--
>  hw/display/virtio-gpu.c| 75 --
>  3 files changed, 51 insertions(+), 34 deletions(-)
> 
> diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
> index fae149235c58..0d15af41d96d 100644
> --- a/include/hw/virtio/virtio-gpu.h
> +++ b/include/hw/virtio/virtio-gpu.h
> @@ -209,7 +209,8 @@ void virtio_gpu_get_edid(VirtIOGPU *g,
>  int virtio_gpu_create_mapping_iov(VirtIOGPU *g,
>struct virtio_gpu_resource_attach_backing 
> *ab,
>struct virtio_gpu_ctrl_command *cmd,
> -  uint64_t **addr, struct iovec **iov);
> +  uint64_t **addr, struct iovec **iov,
> +  uint32_t *niov);
>  void virtio_gpu_cleanup_mapping_iov(VirtIOGPU *g,
>  struct iovec *iov, uint32_t count);
>  void virtio_gpu_process_cmdq(VirtIOGPU *g);
> diff --git a/hw/display/virtio-gpu-3d.c b/hw/display/virtio-gpu-3d.c
> index d98964858e13..72c14d91324b 100644
> --- a/hw/display/virtio-gpu-3d.c
> +++ b/hw/display/virtio-gpu-3d.c
> @@ -283,22 +283,23 @@ static void virgl_resource_attach_backing(VirtIOGPU *g,
>  {
>  struct virtio_gpu_resource_attach_backing att_rb;
>  struct iovec *res_iovs;
> +uint32_t res_niov;
>  int ret;
>  
>  VIRTIO_GPU_FILL_CMD(att_rb);
>  trace_virtio_gpu_cmd_res_back_attach(att_rb.resource_id);
>  
> -ret = virtio_gpu_create_mapping_iov(g, &att_rb, cmd, NULL, &res_iovs);
> +ret = virtio_gpu_create_mapping_iov(g, &att_rb, cmd, NULL, &res_iovs, 
> &res_niov);
>  if (ret != 0) {
>  cmd->error = VIRTIO_GPU_RESP_ERR_UNSPEC;
>  return;
>  }
>  
>  ret = virgl_renderer_resource_attach_iov(att_rb.resource_id,
> - res_iovs, att_rb.nr_entries);
> + res_iovs, res_niov);
>  
>  if (ret != 0)
> -virtio_gpu_cleanup_mapping_iov(g, res_iovs, att_rb.nr_entries);
> +virtio_gpu_cleanup_mapping_iov(g, res_iovs, res_niov);
>  }
>  
>  static void virgl_resource_detach_backing(VirtIOGPU *g,
> diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
> index c9f5e36fd076..1dd3648f32a3 100644
> --- a/hw/display/virtio-gpu.c
> +++ b/hw/display/virtio-gpu.c
> @@ -608,11 +608,12 @@ static void virtio_gpu_set_scanout(VirtIOGPU *g,
>  int virtio_gpu_create_mapping_iov(VirtIOGPU *g,
>struct virtio_gpu_resource_attach_backing 
> *ab,
>struct virtio_gpu_ctrl_command *cmd,
> -  uint64_t **addr, struct iovec **iov)
> +  uint64_t **addr, struct iovec **iov,
> +  uint32_t *niov)
>  {
>  struct virtio_gpu_mem_entry *ents;
>  size_t esize, s;
> -int i;
> +int e, v;
>  
>  if (ab->nr_entries > 16384) {
>  qemu_log_mask(LOG_GUEST_ERROR,
> @@ -633,37 +634,53 @@ int virtio_gpu_create_mapping_iov(VirtIOGPU *g,
>  return -1;
>  }
>  
> -*iov = g_malloc0(sizeof(struct iovec) * ab->nr_entries);
> +*iov = NULL;
>  if (addr) {
> -*addr = g_malloc0(sizeof(uint64_t) * ab->nr_entries);
> +*addr = NULL;
>  }
> -for (i = 0; i < ab->nr_entries; i++) {
> -uint64_t a = le64_to_cpu(ents[i].addr);
> -uint32_t l = le32_to_cpu(ents[i].length);
> -hwaddr len = l;
> -(*iov)[i].iov_base = dma_memory_map(VIRTIO_DEVICE(g)->dma_as,
> -a, &len, 
> DMA_DIRECTION_TO_DEVICE);
> -(*iov)[i].iov_len = len;
> -if (addr) {
> -(*addr)[i] = a;
> -}
> -if (!(*iov)[i].iov_base || len != l) {
> -qemu_log_mask(LOG_
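
The behaviour described in the commit message (call the mapping function
again with the remaining range when only part of a request is mapped) can be
sketched as follows. This is a Python model, not the virtio-gpu code:
create_mapping_iov and page_limited_map are hypothetical names, and the map
function is assumed to return a (base, mapped_length) pair where
mapped_length may be shorter than requested.

```python
def create_mapping_iov(entries, map_fn):
    """Build an iov list from guest (addr, length) entries.

    Returns None on a mapping failure. A real implementation would also
    unmap what was already mapped before giving up.
    """
    iov = []
    for addr, length in entries:
        while length > 0:
            base, mapped = map_fn(addr, length)
            if base is None or mapped == 0:
                return None
            iov.append((base, mapped))
            # Retry with whatever part of the entry is still unmapped.
            addr += mapped
            length -= mapped
    return iov


def page_limited_map(addr, length, page_size=0x1000):
    """Toy mapper that never maps across a page boundary."""
    mapped = min(length, page_size - (addr % page_size))
    return addr, mapped
```

With this toy mapper, a single guest entry (0x1800, 0x2000) yields three iov
entries, which is why the iov count has to be tracked separately from the
number of entries sent by the guest.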

Re: [RFC v9 15/29] vfio: Set up nested stage mappings

2021-04-29 Thread Auger Eric
Hi Kunkun,

On 4/28/21 11:51 AM, Kunkun Jiang wrote:
> Hi Eric,
> 
> On 2021/4/27 3:16, Auger Eric wrote:
>> Hi Kunkun,
>>
>> On 4/15/21 4:03 AM, Kunkun Jiang wrote:
>>> Hi Eric,
>>>
>>> On 2021/4/14 16:05, Auger Eric wrote:
>>>> Hi Kunkun,
>>>>
>>>> On 4/14/21 3:45 AM, Kunkun Jiang wrote:
>>>>> On 2021/4/13 20:57, Auger Eric wrote:
>>>>>> Hi Kunkun,
>>>>>>
>>>>>> On 4/13/21 2:10 PM, Kunkun Jiang wrote:
>>>>>>> Hi Eric,
>>>>>>>
>>>>>>> On 2021/4/11 20:08, Eric Auger wrote:
>>>>>>>> In nested mode, legacy vfio_iommu_map_notify cannot be used as
>>>>>>>> there is no "caching" mode and we do not trap on map.
>>>>>>>>
>>>>>>>> On Intel, vfio_iommu_map_notify was used to DMA map the RAM
>>>>>>>> through the host single stage.
>>>>>>>>
>>>>>>>> With nested mode, we need to set up the stage 2 and the stage 1
>>>>>>>> separately. This patch introduces a prereg_listener to set up
>>>>>>>> the stage 2 mapping.
>>>>>>>>
>>>>>>>> The stage 1 mapping, owned by the guest, is passed to the host
>>>>>>>> when the guest invalidates the stage 1 configuration, through
>>>>>>>> a dedicated PCIPASIDOps callback. Guest IOTLB invalidations
>>>>>>>> are cascaded down to the host through another IOMMU MR UNMAP
>>>>>>>> notifier.
>>>>>>>>
>>>>>>>> Signed-off-by: Eric Auger 
>>>>>>>>
>>>>>>>> ---
>>>>>>>>
>>>>>>>> v7 -> v8:
>>>>>>>> - properly handle new IOMMUTLBEntry fields and especially
>>>>>>>>   propagate DOMAIN and PASID based invalidations
>>>>>>>>
>>>>>>>> v6 -> v7:
>>>>>>>> - remove PASID based invalidation
>>>>>>>>
>>>>>>>> v5 -> v6:
>>>>>>>> - add error_report_err()
>>>>>>>> - remove the abort in case of nested stage case
>>>>>>>>
>>>>>>>> v4 -> v5:
>>>>>>>> - use VFIO_IOMMU_SET_PASID_TABLE
>>>>>>>> - use PCIPASIDOps for config notification
>>>>>>>>
>>>>>>>> v3 -> v4:
>>>>>>>> - use iommu_inv_pasid_info for ASID invalidation
>>>>>>>>
>>>>>>>> v2 -> v3:
>>>>>>>> - use VFIO_IOMMU_ATTACH_PASID_TABLE
>>>>>>>> - new user API
>>>>>>>> - handle leaf
>>>>>>>>
>>>>>>>> v1 -> v2:
>>>>>>>> - adapt to uapi changes
>>>>>>>> - pass the asid
>>>>>>>> - pass IOMMU_NOTIFIER_S1_CFG when initializing the config notifier
>>>>>>>> ---
>>>>>>>>  hw/vfio/common.c | 139
>>>>>>>> +--
>>>>>>>>  hw/vfio/pci.c    |  21 +++
>>>>>>>>  hw/vfio/trace-events |   2 +
>>>>>>>>  3 files changed, 157 insertions(+), 5 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>>>>>>>> index 0cd7ef2139..e369d451e7 100644
>>>>>>>> --- a/hw/vfio/common.c
>>>>>>>> +++ b/hw/vfio/common.c
>>>>>>>> @@ -595,6 +595,73 @@ static bool vfio_get_xlat_addr(IOMMUTLBEntry
>>>>>>>> *iotlb, void **vaddr,
>>>>>>>>  return true;
>>>>>>>>  }
>>>>>>>>  +/* Propagate a guest IOTLB invalidation to the host (nested
>>>>>>>> mode) */
>>>>>>>> +static void vfio_iommu_unmap_notify(IOMMUNotifier *n,
>>>>>>>> IOMMUTLBEntry
>>>>>>>> *iotlb)
>>>>>>>> +{
>>>>>>>> +    VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
>>>>>>>> +    struct vfio_iommu_type1_cache_inval

Re: [RFC v9 15/29] vfio: Set up nested stage mappings

2021-04-26 Thread Auger Eric
Hi Kunkun,

On 4/15/21 4:03 AM, Kunkun Jiang wrote:
> Hi Eric,
> 
> On 2021/4/14 16:05, Auger Eric wrote:
>> Hi Kunkun,
>>
>> On 4/14/21 3:45 AM, Kunkun Jiang wrote:
>>> On 2021/4/13 20:57, Auger Eric wrote:
>>>> Hi Kunkun,
>>>>
>>>> On 4/13/21 2:10 PM, Kunkun Jiang wrote:
>>>>> Hi Eric,
>>>>>
>>>>> On 2021/4/11 20:08, Eric Auger wrote:
>>>>>> In nested mode, legacy vfio_iommu_map_notify cannot be used as
>>>>>> there is no "caching" mode and we do not trap on map.
>>>>>>
>>>>>> On Intel, vfio_iommu_map_notify was used to DMA map the RAM
>>>>>> through the host single stage.
>>>>>>
>>>>>> With nested mode, we need to set up the stage 2 and the stage 1
>>>>>> separately. This patch introduces a prereg_listener to set up
>>>>>> the stage 2 mapping.
>>>>>>
>>>>>> The stage 1 mapping, owned by the guest, is passed to the host
>>>>>> when the guest invalidates the stage 1 configuration, through
>>>>>> a dedicated PCIPASIDOps callback. Guest IOTLB invalidations
>>>>>> are cascaded down to the host through another IOMMU MR UNMAP
>>>>>> notifier.
>>>>>>
>>>>>> Signed-off-by: Eric Auger 
>>>>>>
>>>>>> ---
>>>>>>
>>>>>> v7 -> v8:
>>>>>> - properly handle new IOMMUTLBEntry fields and especially
>>>>>>  propagate DOMAIN and PASID based invalidations
>>>>>>
>>>>>> v6 -> v7:
>>>>>> - remove PASID based invalidation
>>>>>>
>>>>>> v5 -> v6:
>>>>>> - add error_report_err()
>>>>>> - remove the abort in case of nested stage case
>>>>>>
>>>>>> v4 -> v5:
>>>>>> - use VFIO_IOMMU_SET_PASID_TABLE
>>>>>> - use PCIPASIDOps for config notification
>>>>>>
>>>>>> v3 -> v4:
>>>>>> - use iommu_inv_pasid_info for ASID invalidation
>>>>>>
>>>>>> v2 -> v3:
>>>>>> - use VFIO_IOMMU_ATTACH_PASID_TABLE
>>>>>> - new user API
>>>>>> - handle leaf
>>>>>>
>>>>>> v1 -> v2:
>>>>>> - adapt to uapi changes
>>>>>> - pass the asid
>>>>>> - pass IOMMU_NOTIFIER_S1_CFG when initializing the config notifier
>>>>>> ---
>>>>>>     hw/vfio/common.c | 139
>>>>>> +--
>>>>>>     hw/vfio/pci.c    |  21 +++
>>>>>>     hw/vfio/trace-events |   2 +
>>>>>>     3 files changed, 157 insertions(+), 5 deletions(-)
>>>>>>
>>>>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>>>>>> index 0cd7ef2139..e369d451e7 100644
>>>>>> --- a/hw/vfio/common.c
>>>>>> +++ b/hw/vfio/common.c
>>>>>> @@ -595,6 +595,73 @@ static bool vfio_get_xlat_addr(IOMMUTLBEntry
>>>>>> *iotlb, void **vaddr,
>>>>>>     return true;
>>>>>>     }
>>>>>>     +/* Propagate a guest IOTLB invalidation to the host (nested
>>>>>> mode) */
>>>>>> +static void vfio_iommu_unmap_notify(IOMMUNotifier *n, IOMMUTLBEntry
>>>>>> *iotlb)
>>>>>> +{
>>>>>> +    VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
>>>>>> +    struct vfio_iommu_type1_cache_invalidate ustruct = {};
>>>>>> +    VFIOContainer *container = giommu->container;
>>>>>> +    int ret;
>>>>>> +
>>>>>> +    assert(iotlb->perm == IOMMU_NONE);
>>>>>> +
>>>>>> +    ustruct.argsz = sizeof(ustruct);
>>>>>> +    ustruct.flags = 0;
>>>>>> +    ustruct.info.argsz = sizeof(struct iommu_cache_invalidate_info);
>>>>>> +    ustruct.info.version = IOMMU_CACHE_INVALIDATE_INFO_VERSION_1;
>>>>>> +    ustruct.info.cache = IOMMU_CACHE_INV_TYPE_IOTLB;
>>>>>> +
>>>>>> +    switch (iotlb->granularity) {
>>>>>> +    case IOMMU_INV_GRAN_DOMAIN:
>>>>>> +    ustruct.info.granularit

Re: [RFC v9 15/29] vfio: Set up nested stage mappings

2021-04-26 Thread Auger Eric
Hi Kunkun,

On 4/14/21 3:45 AM, Kunkun Jiang wrote:
> On 2021/4/13 20:57, Auger Eric wrote:
>> Hi Kunkun,
>>
>> On 4/13/21 2:10 PM, Kunkun Jiang wrote:
>>> Hi Eric,
>>>
>>> On 2021/4/11 20:08, Eric Auger wrote:
>>>> In nested mode, legacy vfio_iommu_map_notify cannot be used as
>>>> there is no "caching" mode and we do not trap on map.
>>>>
>>>> On Intel, vfio_iommu_map_notify was used to DMA map the RAM
>>>> through the host single stage.
>>>>
>>>> With nested mode, we need to set up the stage 2 and the stage 1
>>>> separately. This patch introduces a prereg_listener to set up
>>>> the stage 2 mapping.
>>>>
>>>> The stage 1 mapping, owned by the guest, is passed to the host
>>>> when the guest invalidates the stage 1 configuration, through
>>>> a dedicated PCIPASIDOps callback. Guest IOTLB invalidations
>>>> are cascaded down to the host through another IOMMU MR UNMAP
>>>> notifier.
>>>>
>>>> Signed-off-by: Eric Auger 
>>>>
>>>> ---
>>>>
>>>> v7 -> v8:
>>>> - properly handle new IOMMUTLBEntry fields and especially
>>>>     propagate DOMAIN and PASID based invalidations
>>>>
>>>> v6 -> v7:
>>>> - remove PASID based invalidation
>>>>
>>>> v5 -> v6:
>>>> - add error_report_err()
>>>> - remove the abort in case of nested stage case
>>>>
>>>> v4 -> v5:
>>>> - use VFIO_IOMMU_SET_PASID_TABLE
>>>> - use PCIPASIDOps for config notification
>>>>
>>>> v3 -> v4:
>>>> - use iommu_inv_pasid_info for ASID invalidation
>>>>
>>>> v2 -> v3:
>>>> - use VFIO_IOMMU_ATTACH_PASID_TABLE
>>>> - new user API
>>>> - handle leaf
>>>>
>>>> v1 -> v2:
>>>> - adapt to uapi changes
>>>> - pass the asid
>>>> - pass IOMMU_NOTIFIER_S1_CFG when initializing the config notifier
>>>> ---
>>>>    hw/vfio/common.c | 139
>>>> +--
>>>>    hw/vfio/pci.c    |  21 +++
>>>>    hw/vfio/trace-events |   2 +
>>>>    3 files changed, 157 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>>>> index 0cd7ef2139..e369d451e7 100644
>>>> --- a/hw/vfio/common.c
>>>> +++ b/hw/vfio/common.c
>>>> @@ -595,6 +595,73 @@ static bool vfio_get_xlat_addr(IOMMUTLBEntry
>>>> *iotlb, void **vaddr,
>>>>    return true;
>>>>    }
>>>>    +/* Propagate a guest IOTLB invalidation to the host (nested
>>>> mode) */
>>>> +static void vfio_iommu_unmap_notify(IOMMUNotifier *n, IOMMUTLBEntry
>>>> *iotlb)
>>>> +{
>>>> +    VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
>>>> +    struct vfio_iommu_type1_cache_invalidate ustruct = {};
>>>> +    VFIOContainer *container = giommu->container;
>>>> +    int ret;
>>>> +
>>>> +    assert(iotlb->perm == IOMMU_NONE);
>>>> +
>>>> +    ustruct.argsz = sizeof(ustruct);
>>>> +    ustruct.flags = 0;
>>>> +    ustruct.info.argsz = sizeof(struct iommu_cache_invalidate_info);
>>>> +    ustruct.info.version = IOMMU_CACHE_INVALIDATE_INFO_VERSION_1;
>>>> +    ustruct.info.cache = IOMMU_CACHE_INV_TYPE_IOTLB;
>>>> +
>>>> +    switch (iotlb->granularity) {
>>>> +    case IOMMU_INV_GRAN_DOMAIN:
>>>> +    ustruct.info.granularity = IOMMU_INV_GRANU_DOMAIN;
>>>> +    break;
>>>> +    case IOMMU_INV_GRAN_PASID:
>>>> +    {
>>>> +    struct iommu_inv_pasid_info *pasid_info;
>>>> +    int archid = -1;
>>>> +
>>>> +    pasid_info = &ustruct.info.granu.pasid_info;
>>>> +    ustruct.info.granularity = IOMMU_INV_GRANU_PASID;
>>>> +    if (iotlb->flags & IOMMU_INV_FLAGS_ARCHID) {
>>>> +    pasid_info->flags |= IOMMU_INV_ADDR_FLAGS_ARCHID;
>>>> +    archid = iotlb->arch_id;
>>>> +    }
>>>> +    pasid_info->archid = archid;
>>>> +    trace_vfio_iommu_asid_inv_iotlb(archid);
>>>> +    break;
>>>> +    }
>>>> +    case IOMMU_INV_GRAN_ADDR:
>>>

Re: [PATCH 2/3] Acceptance Tests: move definition of distro checksums to the framework

2021-04-22 Thread Auger Eric
Hi Cleber,

On 4/15/21 12:14 AM, Cleber Rosa wrote:
> Instead of having, by default, the checksum in the tests, and the
> definition of tests in the framework, let's keep them together.
> 
> A central definition for distributions is available, and it should
> allow other known distros to be added more easily.
> 
> No behavior change is expected here, and tests can still define
> a distro_checksum value if for some reason they want to override
> the known distribution information.
> 
> Signed-off-by: Cleber Rosa 
> ---
>  tests/acceptance/avocado_qemu/__init__.py | 34 +--
>  tests/acceptance/boot_linux.py|  8 --
>  2 files changed, 32 insertions(+), 10 deletions(-)
> 
> diff --git a/tests/acceptance/avocado_qemu/__init__.py 
> b/tests/acceptance/avocado_qemu/__init__.py
> index aae1e5bbc9..97093614d9 100644
> --- a/tests/acceptance/avocado_qemu/__init__.py
> +++ b/tests/acceptance/avocado_qemu/__init__.py
> @@ -299,6 +299,30 @@ def ssh_command(self, command):
>  return stdout_lines, stderr_lines
>  
>  
> +#: A collection of known distros and their respective image checksum
> +KNOWN_DISTROS = {
> +'fedora': {
> +'31': {
> +'x86_64':
> +{'checksum': 
> 'e3c1b309d9203604922d6e255c2c5d098a309c2d46215d8fc026954f3c5c27a0'},
> +'aarch64':
> +{'checksum': 
> '1e18d9c0cf734940c4b5d5ec592facaed2af0ad0329383d5639c997fdf16fe49'},
> +'ppc64':
> +{'checksum': 
> '7c3528b85a3df4b2306e892199a9e1e43f991c506f2cc390dc4efa2026ad2f58'},
> +'s390x':
> +{'checksum': 
> '4caaab5a434fd4d1079149a072fdc7891e354f834d355069ca982fdcaf5a122d'},
> +}
> +}
> +}
Assuming we may add other data such as kernel params and maybe a pxeboot URL,
this may grow rapidly. Maybe we should put that in a different file?
> +
> +
> +def get_known_distro_checksum(distro, distro_version, arch):
> +try:
> +return 
> KNOWN_DISTROS.get(distro).get(distro_version).get(arch).get('checksum')
> +except AttributeError:
> +return None
> +
> +
>  class LinuxTest(Test, LinuxSSHMixIn):
>  """Facilitates having a cloud-image Linux based available.
>  
> @@ -348,14 +372,20 @@ def download_boot(self):
>  vmimage.QEMU_IMG = qemu_img
>  
>  self.log.info('Downloading/preparing boot image')
> +distro = 'fedora'
> +distro_version = '31'
> +known_distro_checksum = get_known_distro_checksum(distro,
> +  distro_version,
> +  self.arch)
> +distro_checksum = self.distro_checksum or known_distro_checksum
>  # Fedora 31 only provides ppc64le images
>  image_arch = self.arch
>  if image_arch == 'ppc64':
>  image_arch = 'ppc64le'
>  try:
>  boot = vmimage.get(
> -'fedora', arch=image_arch, version='31',
> -checksum=self.distro_checksum,
> +distro, arch=image_arch, version=distro_version,
> +checksum=distro_checksum,
>  algorithm='sha256',
>  cache_dir=self.cache_dirs[0],
>  snapshot_dir=self.workdir)
> diff --git a/tests/acceptance/boot_linux.py b/tests/acceptance/boot_linux.py
> index c7bc3a589e..9e618c6daa 100644
> --- a/tests/acceptance/boot_linux.py
> +++ b/tests/acceptance/boot_linux.py
> @@ -20,8 +20,6 @@ class BootLinuxX8664(LinuxTest):
>  :avocado: tags=arch:x86_64
>  """
>  
> -distro_checksum = 
> 'e3c1b309d9203604922d6e255c2c5d098a309c2d46215d8fc026954f3c5c27a0'
> -
>  def test_pc_i440fx_tcg(self):
>  """
>  :avocado: tags=machine:pc
> @@ -66,8 +64,6 @@ class BootLinuxAarch64(LinuxTest):
>  :avocado: tags=machine:gic-version=2
>  """
>  
> -distro_checksum = 
> '1e18d9c0cf734940c4b5d5ec592facaed2af0ad0329383d5639c997fdf16fe49'
> -
>  def add_common_args(self):
>  self.vm.add_args('-bios',
>   os.path.join(BUILD_DIR, 'pc-bios',
> @@ -119,8 +115,6 @@ class BootLinuxPPC64(LinuxTest):
>  :avocado: tags=arch:ppc64
>  """
>  
> -distro_checksum = 
> '7c3528b85a3df4b2306e892199a9e1e43f991c506f2cc390dc4efa2026ad2f58'
> -
>  def test_pseries_tcg(self):
>  """
>  :avocado: tags=machine:pseries
> @@ -136,8 +130,6 @@ class BootLinuxS390X(LinuxTest):
>  :avocado: tags=arch:s390x
>  """
>  
> -distro_checksum = 
> '4caaab5a434fd4d1079149a072fdc7891e354f834d355069ca982fdcaf5a122d'
> -
>  @skipIf(os.getenv('GITLAB_CI'), 'Running on GitLab')
>  def test_s390_ccw_virtio_tcg(self):
>  """
> 
Thanks

Eric
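
For reference, the lookup and override behaviour of the patch can be tried
standalone with the sketch below. The table is trimmed to two of the Fedora 31
checksums from the patch, and effective_checksum is a hypothetical helper
that mirrors the "self.distro_checksum or known_distro_checksum" line.

```python
KNOWN_DISTROS = {
    'fedora': {
        '31': {
            'x86_64': {'checksum': 'e3c1b309d9203604922d6e255c2c5d098a'
                                   '309c2d46215d8fc026954f3c5c27a0'},
            'aarch64': {'checksum': '1e18d9c0cf734940c4b5d5ec592facaed2'
                                    'af0ad0329383d5639c997fdf16fe49'},
        }
    }
}


def get_known_distro_checksum(distro, distro_version, arch):
    """Return the known checksum, or None if any level is missing."""
    try:
        return (KNOWN_DISTROS.get(distro)
                .get(distro_version).get(arch).get('checksum'))
    except AttributeError:
        # A missing key makes .get() return None, and None has no .get().
        return None


def effective_checksum(override, distro, distro_version, arch):
    """A test-provided checksum wins over the known one."""
    return override or get_known_distro_checksum(distro, distro_version, arch)
```

If the table later moves to a separate file, only the definition of
KNOWN_DISTROS would have to change; the lookup itself would stay the same.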




Re: [PATCH 1/3] Acceptance Tests: rename attribute holding the distro image checksum

2021-04-19 Thread Auger Eric
Hi Cleber,

On 4/15/21 12:14 AM, Cleber Rosa wrote:
> This renames the attribute that holds the checksum for the Linux
> distribution image used.
> 
> The current name of the attribute is not very descriptive.  Also, in
> preparation for making the distribution used configurable, which will
user configurable
> add distro related parameters, attributes and tags, let's make the
> naming of those more uniform.
> 
> Signed-off-by: Cleber Rosa 
Reviewed-by: Eric Auger 

Thanks

Eric

> ---
>  tests/acceptance/avocado_qemu/__init__.py | 4 ++--
>  tests/acceptance/boot_linux.py| 8 
>  2 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/tests/acceptance/avocado_qemu/__init__.py 
> b/tests/acceptance/avocado_qemu/__init__.py
> index 1062a851b9..aae1e5bbc9 100644
> --- a/tests/acceptance/avocado_qemu/__init__.py
> +++ b/tests/acceptance/avocado_qemu/__init__.py
> @@ -307,7 +307,7 @@ class LinuxTest(Test, LinuxSSHMixIn):
>  """
>  
>  timeout = 900
> -chksum = None
> +distro_checksum = None
>  username = 'root'
>  password = 'password'
>  
> @@ -355,7 +355,7 @@ def download_boot(self):
>  try:
>  boot = vmimage.get(
>  'fedora', arch=image_arch, version='31',
> -checksum=self.chksum,
> +checksum=self.distro_checksum,
>  algorithm='sha256',
>  cache_dir=self.cache_dirs[0],
>  snapshot_dir=self.workdir)
> diff --git a/tests/acceptance/boot_linux.py b/tests/acceptance/boot_linux.py
> index 314370fd1f..c7bc3a589e 100644
> --- a/tests/acceptance/boot_linux.py
> +++ b/tests/acceptance/boot_linux.py
> @@ -20,7 +20,7 @@ class BootLinuxX8664(LinuxTest):
>  :avocado: tags=arch:x86_64
>  """
>  
> -chksum = 
> 'e3c1b309d9203604922d6e255c2c5d098a309c2d46215d8fc026954f3c5c27a0'
> +distro_checksum = 
> 'e3c1b309d9203604922d6e255c2c5d098a309c2d46215d8fc026954f3c5c27a0'
>  
>  def test_pc_i440fx_tcg(self):
>  """
> @@ -66,7 +66,7 @@ class BootLinuxAarch64(LinuxTest):
>  :avocado: tags=machine:gic-version=2
>  """
>  
> -chksum = 
> '1e18d9c0cf734940c4b5d5ec592facaed2af0ad0329383d5639c997fdf16fe49'
> +distro_checksum = 
> '1e18d9c0cf734940c4b5d5ec592facaed2af0ad0329383d5639c997fdf16fe49'
>  
>  def add_common_args(self):
>  self.vm.add_args('-bios',
> @@ -119,7 +119,7 @@ class BootLinuxPPC64(LinuxTest):
>  :avocado: tags=arch:ppc64
>  """
>  
> -chksum = 
> '7c3528b85a3df4b2306e892199a9e1e43f991c506f2cc390dc4efa2026ad2f58'
> +distro_checksum = 
> '7c3528b85a3df4b2306e892199a9e1e43f991c506f2cc390dc4efa2026ad2f58'
>  
>  def test_pseries_tcg(self):
>  """
> @@ -136,7 +136,7 @@ class BootLinuxS390X(LinuxTest):
>  :avocado: tags=arch:s390x
>  """
>  
> -chksum = 
> '4caaab5a434fd4d1079149a072fdc7891e354f834d355069ca982fdcaf5a122d'
> +distro_checksum = 
> '4caaab5a434fd4d1079149a072fdc7891e354f834d355069ca982fdcaf5a122d'
>  
>  @skipIf(os.getenv('GITLAB_CI'), 'Running on GitLab')
>  def test_s390_ccw_virtio_tcg(self):
> 




Re: [PATCH v3 11/11] tests/acceptance/virtiofs_submounts.py: fix setup of SSH pubkey

2021-04-19 Thread Auger Eric
Hi Cleber,

On 4/12/21 6:46 AM, Cleber Rosa wrote:
> The public key argument should be a path to a file, and not the
> public key data.
> 
> Reported-by: Wainer dos Santos Moschetta 
> Signed-off-by: Cleber Rosa 
> ---
>  tests/acceptance/virtiofs_submounts.py | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tests/acceptance/virtiofs_submounts.py 
> b/tests/acceptance/virtiofs_submounts.py
> index d77ee356740..21ad7d792e7 100644
> --- a/tests/acceptance/virtiofs_submounts.py
> +++ b/tests/acceptance/virtiofs_submounts.py
> @@ -195,7 +195,7 @@ def setUp(self):
>  
>  self.run(('ssh-keygen', '-N', '', '-t', 'ed25519', '-f', 
> self.ssh_key))
>  
> -pubkey = open(self.ssh_key + '.pub').read()
> +pubkey = self.ssh_key + '.pub'
>  
>  super(VirtiofsSubmountsTest, self).setUp(pubkey)
>  
> 
Reviewed-by: Eric Auger 

Thanks

Eric
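
The difference between passing the key path and the key data can be shown
with a small standalone sketch. load_pubkey is a hypothetical stand-in for a
consumer that expects a file path (it is not the avocado_qemu API), and the
key material is a dummy string.

```python
import os
import tempfile


def load_pubkey(pubkey_path):
    """Hypothetical consumer that expects a path and reads the file itself."""
    with open(pubkey_path) as f:
        return f.read().strip()


def demo():
    with tempfile.TemporaryDirectory() as tmpdir:
        ssh_key = os.path.join(tmpdir, 'id_ed25519')
        with open(ssh_key + '.pub', 'w') as f:
            f.write('ssh-ed25519 AAAA-dummy-key test@example\n')

        # Correct: hand over the path, as the fix does.
        good = load_pubkey(ssh_key + '.pub')

        # Incorrect: hand over the file contents, as the old code did.
        with open(ssh_key + '.pub') as f:
            key_data = f.read()
        try:
            load_pubkey(key_data)
            bad = 'unexpectedly worked'
        except OSError:
            bad = 'failed: key data is not a path'
    return good, bad
```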




Re: [PATCH v3 07/11] Acceptance Tests: set up SSH connection by default after boot for LinuxTest

2021-04-19 Thread Auger Eric
Hi Cleber,

On 4/12/21 6:46 AM, Cleber Rosa wrote:
> The LinuxTest specifically targets users that need to interact with Linux
> guests.  So, it makes sense to give a connection by default, and avoid
> requiring it as boiler-plate code.
> 
> Signed-off-by: Cleber Rosa 
> Reviewed-by: Marc-André Lureau 
> Reviewed-by: Willian Rampazzo 
Reviewed-by: Eric Auger 

Thanks

Eric
> ---
>  tests/acceptance/avocado_qemu/__init__.py |  5 -
>  tests/acceptance/boot_linux.py| 18 +-
>  tests/acceptance/virtiofs_submounts.py|  1 -
>  3 files changed, 13 insertions(+), 11 deletions(-)
> 
> diff --git a/tests/acceptance/avocado_qemu/__init__.py 
> b/tests/acceptance/avocado_qemu/__init__.py
> index 25f871f5bc6..1062a851b97 100644
> --- a/tests/acceptance/avocado_qemu/__init__.py
> +++ b/tests/acceptance/avocado_qemu/__init__.py
> @@ -391,7 +391,7 @@ def set_up_cloudinit(self, ssh_pubkey=None):
>  cloudinit_iso = self.prepare_cloudinit(ssh_pubkey)
>  self.vm.add_args('-drive', 'file=%s,format=raw' % cloudinit_iso)
>  
> -def launch_and_wait(self):
> +def launch_and_wait(self, set_up_ssh_connection=True):
>  self.vm.set_console()
>  self.vm.launch()
>  console_drainer = 
> datadrainer.LineLogger(self.vm.console_socket.fileno(),
> @@ -399,3 +399,6 @@ def launch_and_wait(self):
>  console_drainer.start()
>  self.log.info('VM launched, waiting for boot confirmation from 
> guest')
>  cloudinit.wait_for_phone_home(('0.0.0.0', self.phone_home_port), 
> self.name)
> +if set_up_ssh_connection:
> +self.log.info('Setting up the SSH connection')
> +self.ssh_connect(self.username, self.ssh_key)
> diff --git a/tests/acceptance/boot_linux.py b/tests/acceptance/boot_linux.py
> index 0d178038a09..314370fd1f5 100644
> --- a/tests/acceptance/boot_linux.py
> +++ b/tests/acceptance/boot_linux.py
> @@ -29,7 +29,7 @@ def test_pc_i440fx_tcg(self):
>  """
>  self.require_accelerator("tcg")
>  self.vm.add_args("-accel", "tcg")
> -self.launch_and_wait()
> +self.launch_and_wait(set_up_ssh_connection=False)
>  
>  def test_pc_i440fx_kvm(self):
>  """
> @@ -38,7 +38,7 @@ def test_pc_i440fx_kvm(self):
>  """
>  self.require_accelerator("kvm")
>  self.vm.add_args("-accel", "kvm")
> -self.launch_and_wait()
> +self.launch_and_wait(set_up_ssh_connection=False)
>  
>  def test_pc_q35_tcg(self):
>  """
> @@ -47,7 +47,7 @@ def test_pc_q35_tcg(self):
>  """
>  self.require_accelerator("tcg")
>  self.vm.add_args("-accel", "tcg")
> -self.launch_and_wait()
> +self.launch_and_wait(set_up_ssh_connection=False)
>  
>  def test_pc_q35_kvm(self):
>  """
> @@ -56,7 +56,7 @@ def test_pc_q35_kvm(self):
>  """
>  self.require_accelerator("kvm")
>  self.vm.add_args("-accel", "kvm")
> -self.launch_and_wait()
> +self.launch_and_wait(set_up_ssh_connection=False)
>  
>  
>  class BootLinuxAarch64(LinuxTest):
> @@ -85,7 +85,7 @@ def test_virt_tcg(self):
>  self.vm.add_args("-cpu", "max")
>  self.vm.add_args("-machine", "virt,gic-version=2")
>  self.add_common_args()
> -self.launch_and_wait()
> +self.launch_and_wait(set_up_ssh_connection=False)
>  
>  def test_virt_kvm_gicv2(self):
>  """
> @@ -98,7 +98,7 @@ def test_virt_kvm_gicv2(self):
>  self.vm.add_args("-cpu", "host")
>  self.vm.add_args("-machine", "virt,gic-version=2")
>  self.add_common_args()
> -self.launch_and_wait()
> +self.launch_and_wait(set_up_ssh_connection=False)
>  
>  def test_virt_kvm_gicv3(self):
>  """
> @@ -111,7 +111,7 @@ def test_virt_kvm_gicv3(self):
>  self.vm.add_args("-cpu", "host")
>  self.vm.add_args("-machine", "virt,gic-version=3")
>  self.add_common_args()
> -self.launch_and_wait()
> +self.launch_and_wait(set_up_ssh_connection=False)
>  
>  
>  class BootLinuxPPC64(LinuxTest):
> @@ -128,7 +128,7 @@ def test_pseries_tcg(self):
>  """
>  self.require_accelerator("tcg")
>  self.vm.add_args("-accel", "tcg")
> -self.launch_and_wait()
> +self.launch_and_wait(set_up_ssh_connection=False)
>  
>  
>  class BootLinuxS390X(LinuxTest):
> @@ -146,4 +146,4 @@ def test_s390_ccw_virtio_tcg(self):
>  """
>  self.require_accelerator("tcg")
>  self.vm.add_args("-accel", "tcg")
> -self.launch_and_wait()
> +self.launch_and_wait(set_up_ssh_connection=False)
> diff --git a/tests/acceptance/virtiofs_submounts.py 
> b/tests/acceptance/virtiofs_submounts.py
> index e10a935ac4e..e019d3b896b 100644
> --- a/tests/acceptance/virtiofs_submounts.py
> +++ b/tests/acceptance/virtiofs_submounts.py
> @@ -136,7 +136,6 @@ def set_up_virtiofs(self):
>  
>  def launch_vm(self):
>  
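
The opt-out pattern introduced by this patch can be sketched as below.
FakeLinuxTest is a hypothetical stand-in that only records the order of the
steps; it does not launch a VM or open an SSH connection.

```python
class FakeLinuxTest:
    """Records which steps launch_and_wait() would perform."""

    def __init__(self):
        self.events = []

    def launch_and_wait(self, set_up_ssh_connection=True):
        self.events.append('launch')
        self.events.append('wait_for_boot')
        if set_up_ssh_connection:
            # Default behaviour: tests get a connection without boilerplate.
            self.events.append('ssh_connect')


# Tests that interact with the guest rely on the default.
interactive = FakeLinuxTest()
interactive.launch_and_wait()

# Boot-only tests opt out, as boot_linux.py does in the patch.
boot_only = FakeLinuxTest()
boot_only.launch_and_wait(set_up_ssh_connection=False)
```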

Re: [RFC v9 15/29] vfio: Set up nested stage mappings

2021-04-14 Thread Auger Eric
Hi Kunkun,

On 4/14/21 3:45 AM, Kunkun Jiang wrote:
> On 2021/4/13 20:57, Auger Eric wrote:
>> Hi Kunkun,
>>
>> On 4/13/21 2:10 PM, Kunkun Jiang wrote:
>>> Hi Eric,
>>>
>>> On 2021/4/11 20:08, Eric Auger wrote:
>>>> In nested mode, legacy vfio_iommu_map_notify cannot be used as
>>>> there is no "caching" mode and we do not trap on map.
>>>>
>>>> On Intel, vfio_iommu_map_notify was used to DMA map the RAM
>>>> through the host single stage.
>>>>
>>>> With nested mode, we need to set up the stage 2 and the stage 1
>>>> separately. This patch introduces a prereg_listener to set up
>>>> the stage 2 mapping.
>>>>
>>>> The stage 1 mapping, owned by the guest, is passed to the host
>>>> when the guest invalidates the stage 1 configuration, through
>>>> a dedicated PCIPASIDOps callback. Guest IOTLB invalidations
>>>> are cascaded down to the host through another IOMMU MR UNMAP
>>>> notifier.
>>>>
>>>> Signed-off-by: Eric Auger 
>>>>
>>>> ---
>>>>
>>>> v7 -> v8:
>>>> - properly handle new IOMMUTLBEntry fields and especially
>>>>     propagate DOMAIN and PASID based invalidations
>>>>
>>>> v6 -> v7:
>>>> - remove PASID based invalidation
>>>>
>>>> v5 -> v6:
>>>> - add error_report_err()
>>>> - remove the abort in case of nested stage case
>>>>
>>>> v4 -> v5:
>>>> - use VFIO_IOMMU_SET_PASID_TABLE
>>>> - use PCIPASIDOps for config notification
>>>>
>>>> v3 -> v4:
>>>> - use iommu_inv_pasid_info for ASID invalidation
>>>>
>>>> v2 -> v3:
>>>> - use VFIO_IOMMU_ATTACH_PASID_TABLE
>>>> - new user API
>>>> - handle leaf
>>>>
>>>> v1 -> v2:
>>>> - adapt to uapi changes
>>>> - pass the asid
>>>> - pass IOMMU_NOTIFIER_S1_CFG when initializing the config notifier
>>>> ---
>>>>    hw/vfio/common.c | 139
>>>> +--
>>>>    hw/vfio/pci.c    |  21 +++
>>>>    hw/vfio/trace-events |   2 +
>>>>    3 files changed, 157 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>>>> index 0cd7ef2139..e369d451e7 100644
>>>> --- a/hw/vfio/common.c
>>>> +++ b/hw/vfio/common.c
>>>> @@ -595,6 +595,73 @@ static bool vfio_get_xlat_addr(IOMMUTLBEntry
>>>> *iotlb, void **vaddr,
>>>>    return true;
>>>>    }
>>>>    +/* Propagate a guest IOTLB invalidation to the host (nested
>>>> mode) */
>>>> +static void vfio_iommu_unmap_notify(IOMMUNotifier *n, IOMMUTLBEntry
>>>> *iotlb)
>>>> +{
>>>> +    VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
>>>> +    struct vfio_iommu_type1_cache_invalidate ustruct = {};
>>>> +    VFIOContainer *container = giommu->container;
>>>> +    int ret;
>>>> +
>>>> +    assert(iotlb->perm == IOMMU_NONE);
>>>> +
>>>> +    ustruct.argsz = sizeof(ustruct);
>>>> +    ustruct.flags = 0;
>>>> +    ustruct.info.argsz = sizeof(struct iommu_cache_invalidate_info);
>>>> +    ustruct.info.version = IOMMU_CACHE_INVALIDATE_INFO_VERSION_1;
>>>> +    ustruct.info.cache = IOMMU_CACHE_INV_TYPE_IOTLB;
>>>> +
>>>> +    switch (iotlb->granularity) {
>>>> +    case IOMMU_INV_GRAN_DOMAIN:
>>>> +    ustruct.info.granularity = IOMMU_INV_GRANU_DOMAIN;
>>>> +    break;
>>>> +    case IOMMU_INV_GRAN_PASID:
>>>> +    {
>>>> +    struct iommu_inv_pasid_info *pasid_info;
>>>> +    int archid = -1;
>>>> +
>>>> +    pasid_info = &ustruct.info.granu.pasid_info;
>>>> +    ustruct.info.granularity = IOMMU_INV_GRANU_PASID;
>>>> +    if (iotlb->flags & IOMMU_INV_FLAGS_ARCHID) {
>>>> +    pasid_info->flags |= IOMMU_INV_ADDR_FLAGS_ARCHID;
>>>> +    archid = iotlb->arch_id;
>>>> +    }
>>>> +    pasid_info->archid = archid;
>>>> +    trace_vfio_iommu_asid_inv_iotlb(archid);
>>>> +    break;
>>>> +    }
>>>> +    case IOMMU_INV_GRAN_ADDR:
>>>

Re: [RFC v9 15/29] vfio: Set up nested stage mappings

2021-04-13 Thread Auger Eric
Hi Kunkun,

On 4/13/21 2:10 PM, Kunkun Jiang wrote:
> Hi Eric,
> 
> On 2021/4/11 20:08, Eric Auger wrote:
>> In nested mode, legacy vfio_iommu_map_notify cannot be used as
>> there is no "caching" mode and we do not trap on map.
>>
>> On Intel, vfio_iommu_map_notify was used to DMA map the RAM
>> through the host single stage.
>>
>> With nested mode, we need to setup the stage 2 and the stage 1
>> separately. This patch introduces a prereg_listener to setup
>> the stage 2 mapping.
>>
>> The stage 1 mapping, owned by the guest, is passed to the host
>> when the guest invalidates the stage 1 configuration, through
>> a dedicated PCIPASIDOps callback. Guest IOTLB invalidations
>> are cascaded downto the host through another IOMMU MR UNMAP
>> notifier.
>>
>> Signed-off-by: Eric Auger 
>>
>> ---
>>
>> v7 -> v8:
>> - properly handle new IOMMUTLBEntry fields and especially
>>    propagate DOMAIN and PASID based invalidations
>>
>> v6 -> v7:
>> - remove PASID based invalidation
>>
>> v5 -> v6:
>> - add error_report_err()
>> - remove the abort in case of nested stage case
>>
>> v4 -> v5:
>> - use VFIO_IOMMU_SET_PASID_TABLE
>> - use PCIPASIDOps for config notification
>>
>> v3 -> v4:
>> - use iommu_inv_pasid_info for ASID invalidation
>>
>> v2 -> v3:
>> - use VFIO_IOMMU_ATTACH_PASID_TABLE
>> - new user API
>> - handle leaf
>>
>> v1 -> v2:
>> - adapt to uapi changes
>> - pass the asid
>> - pass IOMMU_NOTIFIER_S1_CFG when initializing the config notifier
>> ---
>>   hw/vfio/common.c | 139 +--
>>   hw/vfio/pci.c    |  21 +++
>>   hw/vfio/trace-events |   2 +
>>   3 files changed, 157 insertions(+), 5 deletions(-)
>>
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index 0cd7ef2139..e369d451e7 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -595,6 +595,73 @@ static bool vfio_get_xlat_addr(IOMMUTLBEntry
>> *iotlb, void **vaddr,
>>   return true;
>>   }
>>   +/* Propagate a guest IOTLB invalidation to the host (nested mode) */
>> +static void vfio_iommu_unmap_notify(IOMMUNotifier *n, IOMMUTLBEntry
>> *iotlb)
>> +{
>> +    VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
>> +    struct vfio_iommu_type1_cache_invalidate ustruct = {};
>> +    VFIOContainer *container = giommu->container;
>> +    int ret;
>> +
>> +    assert(iotlb->perm == IOMMU_NONE);
>> +
>> +    ustruct.argsz = sizeof(ustruct);
>> +    ustruct.flags = 0;
>> +    ustruct.info.argsz = sizeof(struct iommu_cache_invalidate_info);
>> +    ustruct.info.version = IOMMU_CACHE_INVALIDATE_INFO_VERSION_1;
>> +    ustruct.info.cache = IOMMU_CACHE_INV_TYPE_IOTLB;
>> +
>> +    switch (iotlb->granularity) {
>> +    case IOMMU_INV_GRAN_DOMAIN:
>> +    ustruct.info.granularity = IOMMU_INV_GRANU_DOMAIN;
>> +    break;
>> +    case IOMMU_INV_GRAN_PASID:
>> +    {
>> +    struct iommu_inv_pasid_info *pasid_info;
>> +    int archid = -1;
>> +
>> +    pasid_info = &ustruct.info.granu.pasid_info;
>> +    ustruct.info.granularity = IOMMU_INV_GRANU_PASID;
>> +    if (iotlb->flags & IOMMU_INV_FLAGS_ARCHID) {
>> +    pasid_info->flags |= IOMMU_INV_ADDR_FLAGS_ARCHID;
>> +    archid = iotlb->arch_id;
>> +    }
>> +    pasid_info->archid = archid;
>> +    trace_vfio_iommu_asid_inv_iotlb(archid);
>> +    break;
>> +    }
>> +    case IOMMU_INV_GRAN_ADDR:
>> +    {
>> +    hwaddr start = iotlb->iova + giommu->iommu_offset;
>> +    struct iommu_inv_addr_info *addr_info;
>> +    size_t size = iotlb->addr_mask + 1;
>> +    int archid = -1;
>> +
>> +    addr_info = &ustruct.info.granu.addr_info;
>> +    ustruct.info.granularity = IOMMU_INV_GRANU_ADDR;
>> +    if (iotlb->leaf) {
>> +    addr_info->flags |= IOMMU_INV_ADDR_FLAGS_LEAF;
>> +    }
>> +    if (iotlb->flags & IOMMU_INV_FLAGS_ARCHID) {
>> +    addr_info->flags |= IOMMU_INV_ADDR_FLAGS_ARCHID;
>> +    archid = iotlb->arch_id;
>> +    }
>> +    addr_info->archid = archid;
>> +    addr_info->addr = start;
>> +    addr_info->granule_size = size;
>> +    addr_info->nb_granules = 1;
>> +    trace_vfio_iommu_addr_inv_iotlb(archid, start, size,
>> +    1, iotlb->leaf);
>> +    break;
>> +    }
> Should we pass a size to  host kernel here, even if vSMMU doesn't support
> RIL or guest kernel doesn't use RIL?
> 
> It will cause TLBI issue in  this scenario: Guest kernel issues a TLBI cmd
> without "range" (tg = 0) to invalidate a 2M huge page. Then qemu passed
> the iova and size (4K) to host kernel. Finally, host kernel issues a
> TLBI cmd
> with "range" (4K) which can not invalidate the TLB entry of 2M huge page.
> (pSMMU supports RIL)

In that case the guest will loop over all 4K pages belonging to the 2M
huge page and invalidate each of them. This should turn into QEMU
notifications for each 4kB page, no? This is totally inefficient, hence
the support of RIL on the guest side and in the QEMU device.

What am I missing?
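
To put a number on it, here is a minimal sketch of the arithmetic, assuming
the guest really issues one non-range invalidation per 4K page. The helper
name is hypothetical; it is neither QEMU nor kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of per-granule invalidations needed to
 * cover a naturally aligned block when no range invalidation is used.
 */
static size_t nb_granule_invalidations(uint64_t block_size,
                                       uint64_t granule_size)
{
    /* Both sizes are expected to be non-zero powers of two. */
    assert(granule_size && !(granule_size & (granule_size - 1)));
    assert(block_size && !(block_size & (block_size - 1)));
    assert(block_size >= granule_size);

    return (size_t)(block_size / granule_size);
}
```

For a 2M huge page and a 4K granule,
nb_granule_invalidations(2 * 1024 * 1024, 4096) gives 512 notifications,
each one carrying nb_granules = 1 as in the hunk above.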

Thanks


Re: [PATCH RFC RESEND v2 4/6] hw/arm/virt-acpi-build: Add explicit idmap info in IORT table

2021-04-13 Thread Auger Eric
Hi Xingang,

On 3/25/21 8:22 AM, Wang Xingang wrote:
> From: Xingang Wang 
> 
> The idmap of smmuv3 and root complex covers the whole RID space for now,
> this patch add explicit idmap info according to root bus number range.
> This add smmuv3 idmap for certain bus which has enabled the iommu property.
> 
> Signed-off-by: Xingang Wang 
> Signed-off-by: Jiahui Cen 
> ---
>  hw/arm/virt-acpi-build.c | 103 ++-
>  1 file changed, 81 insertions(+), 22 deletions(-)
> 
> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> index f5a2b2d4cb..5491036c86 100644
> --- a/hw/arm/virt-acpi-build.c
> +++ b/hw/arm/virt-acpi-build.c
> @@ -44,6 +44,7 @@
>  #include "hw/acpi/tpm.h"
>  #include "hw/pci/pcie_host.h"
>  #include "hw/pci/pci.h"
> +#include "hw/pci/pci_bus.h"
>  #include "hw/pci-host/gpex.h"
>  #include "hw/arm/virt.h"
>  #include "hw/mem/nvdimm.h"
> @@ -237,6 +238,41 @@ static void acpi_dsdt_add_tpm(Aml *scope, 
> VirtMachineState *vms)
>  aml_append(scope, dev);
>  }
>  
> +typedef
> +struct AcpiIortMapping {
> +AcpiIortIdMapping idmap;
> +bool iommu;
> +} AcpiIortMapping;
> +
> +/* For all PCI host bridges, walk and insert DMAR scope */
This comment should rather be in the caller.
Also, DMAR is not ARM terminology.

I would add this comment for the function:
build the ID mapping for a given PCI host bridge
> +static int
> +iort_host_bridges(Object *obj, void *opaque)
> +{
> +GArray *map_blob = opaque;
> +AcpiIortMapping map;
> +AcpiIortIdMapping *idmap = &map.idmap;
> +int bus_num, max_bus;
> +
> +if (object_dynamic_cast(obj, TYPE_PCI_HOST_BRIDGE)) {
> +PCIBus *bus = PCI_HOST_BRIDGE(obj)->bus;
> +
> +if (bus) {
> +bus_num = pci_bus_num(bus);
> +max_bus = pci_root_bus_max_bus(bus);
> +
> +idmap->input_base = cpu_to_le32(bus_num << 8);
> +idmap->id_count = cpu_to_le32((max_bus - bus_num + 1) << 8);
> +idmap->output_base = cpu_to_le32(bus_num << 8);
> +idmap->flags = cpu_to_le32(0);
> +
> +map.iommu = pci_root_bus_has_iommu(bus);
If iommu is not set, we don't need to populate the idmap and we may even
directly continue, i.e. not add the element to map_blob, no?
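
Something along these lines is what I have in mind. This is only a sketch:
struct rid_range and rid_range_for_bus() are hypothetical stand-ins (host
endian, no QOM), and the id_count computation simply mirrors the hunk above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant AcpiIortIdMapping fields. */
struct rid_range {
    uint32_t input_base;
    uint32_t id_count;
    uint32_t output_base;
};

/*
 * Fill @r with the identity RID mapping of buses [bus_num, max_bus].
 * A RID is (bus << 8) | devfn, so each bus contributes 256 IDs.
 * Returns false and leaves @r untouched when the bus is not behind the
 * IOMMU, so the caller can skip the entry altogether.
 */
static bool rid_range_for_bus(int bus_num, int max_bus, bool iommu,
                              struct rid_range *r)
{
    if (!iommu || max_bus < bus_num) {
        return false;
    }

    r->input_base = (uint32_t)bus_num << 8;
    r->id_count = (uint32_t)(max_bus - bus_num + 1) << 8;
    r->output_base = r->input_base;
    return true;
}
```

With this shape the array only ever contains entries that need an SMMU
mapping, and the separate counting loop in build_iort() reduces to the
array length.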
> +g_array_append_val(map_blob, map);
> +}
> +}
> +
> +return 0;
> +}
> +
>  static void
>  build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
>  {
> @@ -247,6 +283,21 @@ build_iort(GArray *table_data, BIOSLinker *linker, 
> VirtMachineState *vms)
>  AcpiIortSmmu3 *smmu;
>  size_t node_size, iort_node_offset, iort_length, smmu_offset = 0;
>  AcpiIortRC *rc;
> +int smmu_mapping_count;
> +GArray *map_blob = g_array_new(false, true, sizeof(AcpiIortMapping));
> +AcpiIortMapping *map;
> +
> +/* pci_for_each_bus(vms->bus, insert_map, map_blob); */
comment to be removed
> +object_child_foreach_recursive(object_get_root(),
> +   iort_host_bridges, map_blob);
> +
> +smmu_mapping_count = 0;
> +for (int i = 0; i < map_blob->len; i++) {
> +map = &g_array_index(map_blob, AcpiIortMapping, i);
> +if (map->iommu) {
> +smmu_mapping_count++;
> +}
> +}
>  
>  iort = acpi_data_push(table_data, sizeof(*iort));
>  
> @@ -280,13 +331,13 @@ build_iort(GArray *table_data, BIOSLinker *linker, 
> VirtMachineState *vms)
>  
>  /* SMMUv3 node */
>  smmu_offset = iort_node_offset + node_size;
> -node_size = sizeof(*smmu) + sizeof(*idmap);
> +node_size = sizeof(*smmu) + sizeof(*idmap) * smmu_mapping_count;
>  iort_length += node_size;
>  smmu = acpi_data_push(table_data, node_size);
>  
>  smmu->type = ACPI_IORT_NODE_SMMU_V3;
>  smmu->length = cpu_to_le16(node_size);
> -smmu->mapping_count = cpu_to_le32(1);
> +smmu->mapping_count = cpu_to_le32(smmu_mapping_count);
>  smmu->mapping_offset = cpu_to_le32(sizeof(*smmu));
>  smmu->base_address = cpu_to_le64(vms->memmap[VIRT_SMMU].base);
>  smmu->flags = cpu_to_le32(ACPI_IORT_SMMU_V3_COHACC_OVERRIDE);
> @@ -295,23 +346,28 @@ build_iort(GArray *table_data, BIOSLinker *linker, 
> VirtMachineState *vms)
>  smmu->gerr_gsiv = cpu_to_le32(irq + 2);
>  smmu->sync_gsiv = cpu_to_le32(irq + 3);
>  
> -/* Identity RID mapping covering the whole input RID range */
> -idmap = &smmu->id_mapping_array[0];
> -idmap->input_base = 0;
> -idmap->id_count = cpu_to_le32(0xFFFF);
> -idmap->output_base = 0;
> -/* output IORT node is the ITS group node (the first node) */
> -idmap->output_reference = cpu_to_le32(iort_node_offset);
> +for (int i = 0, j = 0; i < map_blob->len; i++) {
> +map = &g_array_index(map_blob, AcpiIortMapping, i);
> +
> +if (!map->iommu) {
> +continue;
> +}
> +
> +idmap = 

Re: [PATCH RFC RESEND v2 3/6] hw/pci: Add pci_root_bus_max_bus

2021-04-13 Thread Auger Eric
Hi Xingang,

On 3/25/21 8:22 AM, Wang Xingang wrote:
> From: Xingang Wang 
> 
> This helps to find max bus number of a root bus.
s/max bus number of a root bus/highest bus number of a bridge hierarchy?
> 
> Signed-off-by: Xingang Wang 
> Signed-off-by: Jiahui Cen 
> ---
>  hw/pci/pci.c | 34 ++
>  include/hw/pci/pci.h |  1 +
>  2 files changed, 35 insertions(+)
> 
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index e17aa9075f..c7957cbf7c 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -538,6 +538,40 @@ int pci_bus_num(PCIBus *s)
>  return PCI_BUS_GET_CLASS(s)->bus_num(s);
>  }
>  
> +int pci_root_bus_max_bus(PCIBus *bus)
> +{
> +PCIHostState *host;
> +PCIDevice *dev;
> +int max_bus = 0;
> +int type, devfn;
> +uint8_t subordinate;
> +
> +if (!pci_bus_is_root(bus)) {
> +return 0;
> +}
> +
> +host = PCI_HOST_BRIDGE(BUS(bus)->parent);
> +max_bus = pci_bus_num(host->bus);
> +
> +for (devfn = 0; devfn < ARRAY_SIZE(host->bus->devices); devfn++) {
> +dev = host->bus->devices[devfn];
> +
> +if (!dev) {
> +continue;
> +}
> +
> +type = dev->config[PCI_HEADER_TYPE] & 
> ~PCI_HEADER_TYPE_MULTI_FUNCTION;
Seems there is PCI_DEVICE_GET_CLASS(dev)->is_bridge (see
pci_root_bus_in_range). Can't that be used instead?
> +if (type == PCI_HEADER_TYPE_BRIDGE) {
> +subordinate = dev->config[PCI_SUBORDINATE_BUS];
> +if (subordinate > max_bus) {
> +max_bus = subordinate;
What about the secondary bus number? Is it always lower than the others?
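
On a correctly programmed bridge the secondary number should not exceed the
subordinate one, but if you do not want to rely on that, a sketch like the
following takes the larger of the two. The struct and function names are
hypothetical and only model the two config space registers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a bridge's bus number registers. */
struct bridge_bus_regs {
    uint8_t secondary;    /* bus directly behind the bridge */
    uint8_t subordinate;  /* highest bus number behind the bridge */
};

/*
 * Return the highest bus number reachable from @root_bus, given the
 * bridges that sit directly on it.
 */
static int highest_bus(int root_bus, const struct bridge_bus_regs *bridges,
                       size_t nb_bridges)
{
    int max_bus = root_bus;

    for (size_t i = 0; i < nb_bridges; i++) {
        int candidate = bridges[i].subordinate;

        /* Do not assume secondary <= subordinate. */
        if (bridges[i].secondary > candidate) {
            candidate = bridges[i].secondary;
        }
        if (candidate > max_bus) {
            max_bus = candidate;
        }
    }

    return max_bus;
}
```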

Thanks

Eric
> +}
> +}
> +}
> +
> +return max_bus;
> +}
> +
>  int pci_bus_numa_node(PCIBus *bus)
>  {
>  return PCI_BUS_GET_CLASS(bus)->numa_node(bus);
> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> index 718b5a454a..e0c69534f4 100644
> --- a/include/hw/pci/pci.h
> +++ b/include/hw/pci/pci.h
> @@ -450,6 +450,7 @@ static inline PCIBus *pci_get_bus(const PCIDevice *dev)
>  return PCI_BUS(qdev_get_parent_bus(DEVICE(dev)));
>  }
>  int pci_bus_num(PCIBus *s);
> +int pci_root_bus_max_bus(PCIBus *bus);
>  static inline int pci_dev_bus_num(const PCIDevice *dev)
>  {
>  return pci_bus_num(pci_get_bus(dev));
> 




Re: [PATCH RFC RESEND v2 2/6] hw/pci: Add iommu option for pci root bus

2021-04-12 Thread Auger Eric
Hi Wang,

On 3/25/21 8:22 AM, Wang Xingang wrote:
> From: Xingang Wang 
> 
> This add iommu option for pci root bus, including primary bus
> and pxb root bus. The option is valid only if there is a virtual
> iommu device.
> 
> Signed-off-by: Xingang Wang 
> Signed-off-by: Jiahui Cen 

Same in this patch: I would prefer to invert the logic and use something
like bypass_iommu.

Sorry, it is an invasive change for you. Maybe wait for others' feedback.

> ---
>  hw/arm/virt.c   | 25 +
>  hw/i386/pc.c| 19 +++
>  hw/pci-bridge/pci_expander_bridge.c |  3 +++
>  hw/pci-host/q35.c   |  1 +
>  include/hw/arm/virt.h   |  1 +
>  include/hw/i386/pc.h|  1 +
Also I think this patch would need to be split into several ones (at
least this is what I would do):
- 1 patch for the pci_expander_bridge.c change ("add an iommu bypass prop")
- 1 patch for the virt machine where you add a machine option to bypass the
iommu translation for the root bus
- 1 patch for pc/q35

Indeed the patch title does not reflect the machine changes and
the pxb changes.
>  6 files changed, 50 insertions(+)
> 
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index aa2bbd14e0..446b3b867f 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -1366,6 +1366,7 @@ static void create_pcie(VirtMachineState *vms)
>  }
>  
>  pci = PCI_HOST_BRIDGE(dev);
> +pci->iommu = vms->primary_bus_iommu;
>  vms->bus = pci->bus;
>  if (vms->bus) {
>  for (i = 0; i < nb_nics; i++) {
> @@ -2319,6 +2320,20 @@ static void virt_set_iommu(Object *obj, const char 
> *value, Error **errp)
>  }
>  }

>  
> +static bool virt_get_primary_bus_iommu(Object *obj, Error **errp)
> +{
> +VirtMachineState *vms = VIRT_MACHINE(obj);
> +
> +return vms->primary_bus_iommu;
> +}
> +
> +static void virt_set_primary_bus_iommu(Object *obj, bool value, Error **errp)
> +{
> +VirtMachineState *vms = VIRT_MACHINE(obj);
> +
> +vms->primary_bus_iommu = value;
> +}
> +
>  static CpuInstanceProperties
>  virt_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
>  {
> @@ -2652,6 +2667,13 @@ static void virt_machine_class_init(ObjectClass *oc, 
> void *data)
>"Set the IOMMU type. "
>"Valid values are none and 
> smmuv3");
>  
> +object_class_property_add_bool(oc, "primary_bus_iommu",
> +  virt_get_primary_bus_iommu,
> +  virt_set_primary_bus_iommu);
> +object_class_property_set_description(oc, "primary_bus_iommu",
> +  "Set on/off to enable/disable "
> +  "iommu for primary bus");
> +
>  object_class_property_add_bool(oc, "ras", virt_get_ras,
> virt_set_ras);
>  object_class_property_set_description(oc, "ras",
> @@ -2719,6 +2741,9 @@ static void virt_instance_init(Object *obj)
>  /* Default disallows iommu instantiation */
>  vms->iommu = VIRT_IOMMU_NONE;
>  
> +/* The primary bus is attached to iommu by default */
> +vms->primary_bus_iommu = true;
> +
>  /* Default disallows RAS instantiation */
>  vms->ras = false;
>  
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 8a84b25a03..b64e4bb7f2 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -1529,6 +1529,21 @@ static void pc_machine_set_hpet(Object *obj, bool 
> value, Error **errp)
>  pcms->hpet_enabled = value;
>  }
>  
> +static bool pc_machine_get_primary_bus_iommu(Object *obj, Error **errp)
> +{
> +PCMachineState *pcms = PC_MACHINE(obj);
> +
> +return pcms->primary_bus_iommu;
> +}
> +
> +static void pc_machine_set_primary_bus_iommu(Object *obj, bool value,
> + Error **errp)
> +{
> +PCMachineState *pcms = PC_MACHINE(obj);
> +
> +pcms->primary_bus_iommu = value;
> +}
> +
>  static void pc_machine_get_max_ram_below_4g(Object *obj, Visitor *v,
>  const char *name, void *opaque,
>  Error **errp)
> @@ -1628,6 +1643,7 @@ static void pc_machine_initfn(Object *obj)
>  #ifdef CONFIG_HPET
>  pcms->hpet_enabled = true;
>  #endif
> +pcms->primary_bus_iommu = true;
>  
>  pc_system_flash_create(pcms);
>  pcms->pcspk = isa_new(TYPE_PC_SPEAKER);
> @@ -1752,6 +1768,9 @@ static void pc_machine_class_init(ObjectClass *oc, void 
> *data)
>  object_class_property_add_bool(oc, "hpet",
>  pc_machine_get_hpet, pc_machine_set_hpet);
>  
> +object_class_property_add_bool(oc, "primary_bus_iommu",
> +pc_machine_get_primary_bus_iommu, pc_machine_set_primary_bus_iommu);
> +
>  object_class_property_add(oc, PC_MACHINE_MAX_FW_SIZE, "size",
>  pc_machine_get_max_fw_size, 

Re: [PATCH RFC RESEND v2 1/6] hw/pci/pci_host: Add iommu property for pci host

2021-04-12 Thread Auger Eric
Hi Wang,

On 3/25/21 8:22 AM, Wang Xingang wrote:
> From: Xingang Wang 
> 
> The pci host iommu property is useful to check whether
> the iommu is enabled on the pci root bus.
> 
> Signed-off-by: Xingang Wang 
> Signed-off-by: Jiahui Cen 
> ---
>  hw/pci/pci.c  | 18 +-
>  hw/pci/pci_host.c |  2 ++
>  include/hw/pci/pci.h  |  1 +
>  include/hw/pci/pci_host.h |  1 +
>  4 files changed, 21 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index ac9a24889c..e17aa9075f 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -417,6 +417,22 @@ const char *pci_root_bus_path(PCIDevice *dev)
>  return rootbus->qbus.name;
>  }
>  
> +bool pci_root_bus_has_iommu(PCIBus *bus)
"has_iommu" is misleading as it does not mean an IOMMU is actually
instantiated but rather that, if there is one, it will translate
transactions coming from this primary bus.

I would rather invert the logic and have a

"bypass_iommu" property defaulting to false

and this function named something like pci_root_bus_bypass_iommu.
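
Roughly, and only as a sketch: the structs below are hypothetical,
stripped-down stand-ins for PCIHostState and PCIBus, and dma_is_translated()
is an illustrative name for the check done in
pci_device_iommu_address_space().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for PCIHostState. */
struct host_bridge {
    bool bypass_iommu;      /* defaults to false: translate as usual */
};

/* Hypothetical stand-in for a root PCIBus. */
struct root_bus {
    struct host_bridge *host;
    void *iommu_fn;         /* non-NULL when an IOMMU is instantiated */
};

static bool root_bus_bypass_iommu(const struct root_bus *bus)
{
    return bus->host->bypass_iommu;
}

/* DMA is translated only if an IOMMU exists and the bus does not bypass it. */
static bool dma_is_translated(const struct root_bus *bus)
{
    return bus->iommu_fn != NULL && !root_bus_bypass_iommu(bus);
}
```

The property then says nothing about whether an IOMMU exists; it only
states what this bus does if one is there.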
> +{
> +PCIBus *rootbus = bus;
> +PCIHostState *host_bridge;
> +
> +if (!pci_bus_is_root(bus)) {
> +rootbus = pci_device_root_bus(bus->parent_dev);
> +}
> +
> +host_bridge = PCI_HOST_BRIDGE(rootbus->qbus.parent);
> +
> +assert(host_bridge->bus == rootbus);
> +
> +return host_bridge->iommu;
> +}
> +
>  static void pci_root_bus_init(PCIBus *bus, DeviceState *parent,
>MemoryRegion *address_space_mem,
>MemoryRegion *address_space_io,
> @@ -2716,7 +2732,7 @@ AddressSpace *pci_device_iommu_address_space(PCIDevice 
> *dev)
>  
>  iommu_bus = parent_bus;
>  }
> -if (iommu_bus && iommu_bus->iommu_fn) {
> +if (pci_root_bus_has_iommu(bus) && iommu_bus && iommu_bus->iommu_fn) {
>  return iommu_bus->iommu_fn(bus, iommu_bus->iommu_opaque, devfn);
>  }
>  return &address_space_memory;
> diff --git a/hw/pci/pci_host.c b/hw/pci/pci_host.c
> index 8ca5fadcbd..92ce213b18 100644
> --- a/hw/pci/pci_host.c
> +++ b/hw/pci/pci_host.c
> @@ -222,6 +222,8 @@ const VMStateDescription vmstate_pcihost = {
>  static Property pci_host_properties_common[] = {
>  DEFINE_PROP_BOOL("x-config-reg-migration-enabled", PCIHostState,
>   mig_enabled, true),
> +DEFINE_PROP_BOOL("pci-host-iommu-enabled", PCIHostState,
> + iommu, true),
>  DEFINE_PROP_END_OF_LIST(),
>  };
>  
> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> index 6be4e0c460..718b5a454a 100644
> --- a/include/hw/pci/pci.h
> +++ b/include/hw/pci/pci.h
> @@ -480,6 +480,7 @@ void pci_for_each_bus(PCIBus *bus,
>  
>  PCIBus *pci_device_root_bus(const PCIDevice *d);
>  const char *pci_root_bus_path(PCIDevice *dev);
> +bool pci_root_bus_has_iommu(PCIBus *bus);
>  PCIDevice *pci_find_device(PCIBus *bus, int bus_num, uint8_t devfn);
>  int pci_qdev_find_device(const char *id, PCIDevice **pdev);
>  void pci_bus_get_w64_range(PCIBus *bus, Range *range);
> diff --git a/include/hw/pci/pci_host.h b/include/hw/pci/pci_host.h
> index 52e038c019..64128e3a19 100644
> --- a/include/hw/pci/pci_host.h
> +++ b/include/hw/pci/pci_host.h
> @@ -43,6 +43,7 @@ struct PCIHostState {
>  uint32_t config_reg;
>  bool mig_enabled;
>  PCIBus *bus;
> +bool iommu;
>  
>  QLIST_ENTRY(PCIHostState) next;
>  };
> 
Thanks

Eric




Re: [RFC PATCH 0/3] Add migration support for VFIO PCI devices in SMMUv3 nested stage mode

2021-04-12 Thread Auger Eric
Hi Kunkun,

On 2/19/21 10:42 AM, Kunkun Jiang wrote:
> Hi all,
> 
> Since the SMMUv3's nested translation stages[1] has been introduced by Eric, 
> we
> need to pay attention to the migration of VFIO PCI devices in SMMUv3 nested 
> stage
> mode. At present, it is not yet supported in QEMU. There are two problems in 
> the
> existing framework.
> 
> First, the current way to get dirty pages is not applicable to nested stage 
> mode.
> Because of the "Caching Mode", VTD can map the RAM through the host single 
> stage
> (giova->hpa). "vfio_listener_log_sync" gets dirty pages by transferring 
> "giova"
> to the kernel for the RAM block section of mapped MMIO region. In nested stage
> mode, we setup the stage 2 (gpa->hpa) and the stage 1(giova->gpa) separately. 
> So
> it is inapplicable to get dirty pages by the current way in nested stage mode.
> 
> Second, it also need to pass stage 1 configurations to the destination host 
> after
> the migration. In Eric's patch, it passes the stage 1 configuration to the 
> host on
> each STE update for the devices set the PASID PciOps. The configuration will 
> be
> applied at physical level. But the data of physical level will not be sent to 
> the
> destination host. So we have to pass stage 1 configurations to the destination
> host after the migration.
> 
> This Patch set includes patches as below:
> Patch 1-2:
> - Refactor the vfio_listener_log_sync and added a new function to get dirty 
> pages
> in nested stage mode.
> 
> Patch 3:
> - Added the post_load function to vmstate_smmuv3 for passing stage 1 
> configuration
> to the destination host after the migration.
> 
> @Eric, Could you please add this Patch set to your future version of
> "vSMMUv3/pSMMUv3 2 stage VFIO integration", if you think this Patch set makes 
> sense? :)
First of all, thank you for working on this. As you may have noticed I
sent a new RFC version yesterday (without including this). When time
allows, you may have a look at the comments I posted on your series. I
don't think I can test it at the moment so I would prefer to keep it
separate. Also be aware that the QEMU integration of nested has not
received many comments yet and is likely to evolve. The priority is to
get some R-b's on the kernel pieces, especially the SMMU part. Until this
dependency is resolved, things can't move forward I am afraid.

Thanks

Eric
> 
> Best Regards
> Kunkun Jiang
> 
> [1] [RFC,v7,00/26] vSMMUv3/pSMMUv3 2 stage VFIO integration
> http://patchwork.ozlabs.org/project/qemu-devel/cover/20201116181349.11908-1-eric.au...@redhat.com/
> 
> Kunkun Jiang (3):
>   vfio: Introduce helpers to mark dirty pages of a RAM section
>   vfio: Add vfio_prereg_listener_log_sync in nested stage
>   hw/arm/smmuv3: Post-load stage 1 configurations to the host
> 
>  hw/arm/smmuv3.c | 60 +
>  hw/arm/trace-events |  1 +
>  hw/vfio/common.c| 47 +--
>  3 files changed, 100 insertions(+), 8 deletions(-)
> 




Re: [RFC PATCH 3/3] hw/arm/smmuv3: Post-load stage 1 configurations to the host

2021-04-12 Thread Auger Eric
Hi Kunkun,
On 2/19/21 10:42 AM, Kunkun Jiang wrote:
> In nested mode, we call the set_pasid_table() callback on each STE
> update to pass the guest stage 1 configuration to the host and
> apply it at physical level.
> 
> In the case of live migration, we need to manual call the
s/manual/manually
> set_pasid_table() to load the guest stage 1 configurations to the
> host. If this operation is fail, the migration is fail.
s/If this operation is fail, the migration is fail/If this operation
fails, the migration fails.
> 
> Signed-off-by: Kunkun Jiang 
> ---
>  hw/arm/smmuv3.c | 60 +
>  hw/arm/trace-events |  1 +
>  2 files changed, 61 insertions(+)
> 
> diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
> index 6c6ed84e78..94ca15375c 100644
> --- a/hw/arm/smmuv3.c
> +++ b/hw/arm/smmuv3.c
> @@ -1473,6 +1473,65 @@ static void smmu_realize(DeviceState *d, Error **errp)
>  smmu_init_irq(s, dev);
>  }
>  
> +static int smmuv3_manual_set_pci_device_pasid_table(SMMUDevice *sdev)
Can't you retrieve the associated sid and then call
smmuv3_notify_config_change()?
> +{
> +#ifdef __linux__
> +IOMMUMemoryRegion *mr = &(sdev->iommu);
> +int sid = smmu_get_sid(sdev);
> +SMMUEventInfo event = {.type = SMMU_EVT_NONE, .sid = sid,
> +   .inval_ste_allowed = true};
> +IOMMUConfig iommu_config = {};
> +SMMUTransCfg *cfg;
> +int ret = -1;
> +
> +cfg = smmuv3_get_config(sdev, &event);
> +if (!cfg) {
> +return ret;
> +}
> +
> +iommu_config.pasid_cfg.argsz = sizeof(struct iommu_pasid_table_config);
> +iommu_config.pasid_cfg.version = PASID_TABLE_CFG_VERSION_1;
> +iommu_config.pasid_cfg.format = IOMMU_PASID_FORMAT_SMMUV3;
> +iommu_config.pasid_cfg.base_ptr = cfg->s1ctxptr;
> +iommu_config.pasid_cfg.pasid_bits = 0;
> +iommu_config.pasid_cfg.vendor_data.smmuv3.version = 
> PASID_TABLE_SMMUV3_CFG_VERSION_1;
> +
> +if (cfg->disabled || cfg->bypassed) {
> +iommu_config.pasid_cfg.config = IOMMU_PASID_CONFIG_BYPASS;
> +} else if (cfg->aborted) {
> +iommu_config.pasid_cfg.config = IOMMU_PASID_CONFIG_ABORT;
> +} else {
> +iommu_config.pasid_cfg.config = IOMMU_PASID_CONFIG_TRANSLATE;
> +}
> +
> +ret = pci_device_set_pasid_table(sdev->bus, sdev->devfn, &iommu_config);
> +if (ret) {
> +error_report("Failed to pass PASID table to host for iommu mr %s 
> (%m)",
> + mr->parent_obj.name);
> +}
> +
> +return ret;
> +#endif
> +}
> +
> +static int smmuv3_post_load(void *opaque, int version_id)
> +{
> +SMMUv3State *s3 = opaque;
> +SMMUState *s = &(s3->smmu_state);
> +SMMUDevice *sdev;
> +int ret = 0;
> +
> +QLIST_FOREACH(sdev, &s->devices_with_notifiers, next) {
> +trace_smmuv3_post_load_sdev(sdev->devfn, 
> sdev->iommu.parent_obj.name);
> +ret = smmuv3_manual_set_pci_device_pasid_table(sdev);
> +if (ret) {
> +break;
> +}
> +}
> +
> +return ret;
> +}
> +
>  static const VMStateDescription vmstate_smmuv3_queue = {
>  .name = "smmuv3_queue",
>  .version_id = 1,
> @@ -1491,6 +1550,7 @@ static const VMStateDescription vmstate_smmuv3 = {
>  .version_id = 1,
>  .minimum_version_id = 1,
>  .priority = MIG_PRI_IOMMU,
> +.post_load = smmuv3_post_load,
>  .fields = (VMStateField[]) {
>  VMSTATE_UINT32(features, SMMUv3State),
>  VMSTATE_UINT8(sid_size, SMMUv3State),
> diff --git a/hw/arm/trace-events b/hw/arm/trace-events
> index 35e562ab74..caa864dd72 100644
> --- a/hw/arm/trace-events
> +++ b/hw/arm/trace-events
> @@ -53,4 +53,5 @@ smmuv3_notify_flag_add(const char *iommu) "ADD SMMUNotifier 
> node for iommu mr=%s
>  smmuv3_notify_flag_del(const char *iommu) "DEL SMMUNotifier node for iommu 
> mr=%s"
>  smmuv3_inv_notifiers_iova(const char *name, uint16_t asid, uint64_t iova, 
> uint8_t tg, uint64_t num_pages) "iommu mr=%s asid=%d iova=0x%"PRIx64" tg=%d 
> num_pages=0x%"PRIx64
>  smmuv3_notify_config_change(const char *name, uint8_t config, uint64_t 
> s1ctxptr) "iommu mr=%s config=%d s1ctxptr=0x%"PRIx64
> +smmuv3_post_load_sdev(int devfn, const char *name) "sdev devfn=%d iommu 
> mr=%s"PRIx64
>  
> 
Thanks

Eric




Re: [RFC PATCH 2/3] vfio: Add vfio_prereg_listener_log_sync in nested stage

2021-04-08 Thread Auger Eric
Hi Kunkun,

On 2/19/21 10:42 AM, Kunkun Jiang wrote:
> On Intel, the DMA mapped through the host single stage. Instead
> we set up the stage 2 and stage 1 separately in nested mode as there
> is no "Caching Mode".

You need to rewrite the above sentences: ARM is missing, and the first
sentence lacks a verb.
> 
> Legacy vfio_listener_log_sync cannot be used in nested stage as we
> don't need to pay close attention to stage 1 mapping. This patch adds
> vfio_prereg_listener_log_sync to mark dirty pages in nested mode.
> 
> Signed-off-by: Kunkun Jiang 
> ---
>  hw/vfio/common.c | 25 +
>  1 file changed, 25 insertions(+)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 7c50905856..af333e0dee 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -1216,6 +1216,22 @@ static int 
> vfio_dma_sync_ram_section_dirty_bitmap(VFIOContainer *container,
> int128_get64(section->size), ram_addr);
>  }
>  
> +static void vfio_prereg_listener_log_sync(MemoryListener *listener,
> +  MemoryRegionSection *section)
> +{
> +VFIOContainer *container =
> +container_of(listener, VFIOContainer, prereg_listener);
> +
> +if (!memory_region_is_ram(section->mr) ||
> +!container->dirty_pages_supported) {
> +return;
> +}
> +
> +if (vfio_devices_all_saving(container)) {
I fail to see where this is defined.
> +vfio_dma_sync_ram_section_dirty_bitmap(container, section);
> +}
> +}
> +
>  typedef struct {
>  IOMMUNotifier n;
>  VFIOGuestIOMMU *giommu;
> @@ -1260,6 +1276,14 @@ static int vfio_sync_dirty_bitmap(VFIOContainer 
> *container,
>  if (memory_region_is_iommu(section->mr)) {
>  VFIOGuestIOMMU *giommu;
>  
> +/*
> + * In nested mode, stage 2 and stage 1 are set up separately. We
> + * only need to focus on stage 2 mapping when marking dirty pages.
> + */
> +if (container->iommu_type == VFIO_TYPE1_NESTING_IOMMU) {
> +return 0;
> +}
> +
>  QLIST_FOREACH(giommu, &container->giommu_list, giommu_next) {
>  if (MEMORY_REGION(giommu->iommu) == section->mr &&
>  giommu->n.start == section->offset_within_region) {
> @@ -1312,6 +1336,7 @@ static const MemoryListener vfio_memory_listener = {
>  static MemoryListener vfio_memory_prereg_listener = {
>  .region_add = vfio_prereg_listener_region_add,
>  .region_del = vfio_prereg_listener_region_del,
> +.log_sync = vfio_prereg_listener_log_sync,
>  };
>  
>  static void vfio_listener_release(VFIOContainer *container)
> 
Thanks

Eric




Re: [RFC PATCH 1/3] vfio: Introduce helpers to mark dirty pages of a RAM section

2021-04-08 Thread Auger Eric
Hi Kunkun,

On 2/19/21 10:42 AM, Kunkun Jiang wrote:
> Extract part of the code from vfio_sync_dirty_bitmap to form a
> new helper, which allows to mark dirty pages of a RAM section.
> This helper will be called for nested stage.
> 
> Signed-off-by: Kunkun Jiang 
> ---
>  hw/vfio/common.c | 22 ++
>  1 file changed, 14 insertions(+), 8 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 9225f10722..7c50905856 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -1203,6 +1203,19 @@ err_out:
>  return ret;
>  }
>  
> +static int vfio_dma_sync_ram_section_dirty_bitmap(VFIOContainer *container,
> +  MemoryRegionSection 
> *section)
> +{
> +ram_addr_t ram_addr;
> +
> +ram_addr = memory_region_get_ram_addr(section->mr) +
> +   section->offset_within_region;
> +
> +return vfio_get_dirty_bitmap(container,
> +   
> TARGET_PAGE_ALIGN(section->offset_within_address_space),
> +   int128_get64(section->size), ram_addr);
> +}
> +
>  typedef struct {
>  IOMMUNotifier n;
>  VFIOGuestIOMMU *giommu;
> @@ -1244,8 +1257,6 @@ static void vfio_iommu_map_dirty_notify(IOMMUNotifier 
> *n, IOMMUTLBEntry *iotlb)
>  static int vfio_sync_dirty_bitmap(VFIOContainer *container,
>MemoryRegionSection *section)
>  {
> -ram_addr_t ram_addr;
> -
>  if (memory_region_is_iommu(section->mr)) {
>  VFIOGuestIOMMU *giommu;
>  
> @@ -1274,12 +1285,7 @@ static int vfio_sync_dirty_bitmap(VFIOContainer 
> *container,
>  return 0;
>  }
>  
> -ram_addr = memory_region_get_ram_addr(section->mr) +
> -   section->offset_within_region;
> -
> -return vfio_get_dirty_bitmap(container,
> -   
> TARGET_PAGE_ALIGN(section->offset_within_address_space),
this is now REAL_HOST_PAGE_ALIGN

Thanks

Eric
> -   int128_get64(section->size), ram_addr);
> +return vfio_dma_sync_ram_section_dirty_bitmap(container, section);
>  }
>  
>  static void vfio_listerner_log_sync(MemoryListener *listener,
> 
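
To illustrate the alignment remark above: the helper rounds the section start up to a page boundary, and which page size is used (target page vs. real host page) changes the result. The following is only a minimal sketch of that round-up arithmetic, assuming a power-of-two page size; page_align_up() is a hypothetical name and not a QEMU macro.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not QEMU code): round addr up to the next
 * multiple of page_size. page_size must be a non-zero power of two.
 * Addresses within page_size of UINT64_MAX would wrap; callers are
 * assumed to stay well below that.
 */
static uint64_t page_align_up(uint64_t addr, uint64_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (addr + page_size - 1) & ~(page_size - 1);
}
```

With a 4K page, 0x1001 rounds up to 0x2000; with a 64K page the same address rounds up to 0x10000, which is why the choice of macro matters on hosts whose page size differs from the target page size.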




Re: A question about the translation granule size supported by the vSMMU

2021-04-08 Thread Auger Eric
Hi Kunkun,

On 4/7/21 11:26 AM, Kunkun Jiang wrote:
> Hi Eric,
> 
> On 2021/4/7 3:50, Auger Eric wrote:
>> Hi Kunkun,
>>
>> On 3/27/21 3:24 AM, Kunkun Jiang wrote:
>>> Hi all,
>>>
>>> Recently, I did some tests on SMMU nested mode. Here is
>>> a question about the translation granule size supported by
>>> vSMMU.
>>>
>>> There is such a code in SMMUv3_init_regs():
>>>
>>>>     /* 4K and 64K granule support */
>>>>  s->idr[5] = FIELD_DP32(s->idr[5], IDR5, GRAN4K, 1);
>>>>  s->idr[5] = FIELD_DP32(s->idr[5], IDR5, GRAN64K, 1);
>>>>  s->idr[5] = FIELD_DP32(s->idr[5], IDR5, OAS, SMMU_IDR5_OAS); /* 44
>>>> bits */
>>> Why is the 16K granule not supported? I modified the code
>>> to support it and did not encounter any problems in the
>>> test. Although 4K and 64K minimal granules are "strongly
>>> recommended", I think vSMMU should still support 16K.
>>> Are there other reasons why 16K is not supported here?
>> No, there aren't any. The main reasons were that 16KB support is optional
>> and that supporting it increases the test matrix. Also, it seems quite a
>> few of the machines I have access to do support the 16KB granule. On the
>> others I get
>>
>> "EFI stub: ERROR: This 16 KB granular kernel is not supported by your
>> CPU".
>>
>> Nevertheless I am not opposed to supporting it as it seems to work without
>> trouble. We just need to have an extra look at the implied validity checks,
>> but there shouldn't be much.
>>
>> Thanks
>>
>> Eric
> Yes, you are right. In my opinion, it is necessary to check whether pSMMU
> supports 16K to avoid the situation I mentioned below.
> In SMMU nested mode, if vSMMU supports 16K and set pasid table to
> pSMMU, it may get errors when pSMMU does translation table walk if
> pSMMU doesn't support 16K (not tested). Do you think we need to add
> an interface to get some pSMMU info?
> Maybe my consideration was superfluous.
No, it is not. At the QEMU level we have
memory_region_iommu_set_page_size_mask(), which is called from the VFIO
device. It allows such info to be passed to the IOMMU device (qemu
b91774984249).

The iommu_set_page_size_mask() callback needs to be implemented at the SMMU
QEMU device level. Also, "[PATCH 0/2] Domain nesting info for arm-smmu" may
allow other constraints to be returned from the pSMMU.

Thanks

Eric
> 
> Thanks,
> Kunkun Jiang
>>> When in SMMU nested mode, it may get errors if pSMMU
>>> doesn't support 16K but vSMMU supports 16K. But we
>>> can get some settings of pSMMU to avoid this situation.
>>> I found some discussions between Eric and Linu about
>>> this [1], but this idea does not seem to be implemented.
>>>
>>> [1] https://lists.gnu.org/archive/html/qemu-arm/2017-09/msg00149.html
>>>
>>> Best regards,
>>> Kunkun Jiang
>>>
>>
>> .
> 
> 
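
As a rough illustration of the page size mask idea discussed in this thread: the sketch below assumes the mask is a bitmap in which bit n set means the host can map pages of 2^n bytes, and derives which of the 4K/16K/64K granules a virtual SMMU could safely advertise. All names here (GranuleSupport, granules_from_mask) are hypothetical; this is not the QEMU callback itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type: which translation granules may be advertised. */
typedef struct {
    bool gran4k;
    bool gran16k;
    bool gran64k;
} GranuleSupport;

/* Assumption: bit n of page_size_mask set => 2^n-byte pages are supported. */
static bool mask_has_shift(uint64_t page_size_mask, unsigned shift)
{
    assert(shift < 64);
    return (page_size_mask >> shift) & 1;
}

/* Keep only the granules that the host-side mask allows. */
static GranuleSupport granules_from_mask(uint64_t page_size_mask)
{
    GranuleSupport g;

    g.gran4k = mask_has_shift(page_size_mask, 12);
    g.gran16k = mask_has_shift(page_size_mask, 14);
    g.gran64k = mask_has_shift(page_size_mask, 16);
    return g;
}
```

For a mask with only bits 12 and 16 set, the 16K granule would be dropped, which is the nested-mode situation this thread is worried about.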




Re: [PATCH] hw/arm/smmuv3: Support 16K translation granule

2021-04-07 Thread Auger Eric
Hi Kunkun,

On 3/31/21 8:47 AM, Kunkun Jiang wrote:
> The driver can query some bits in SMMUv3 IDR5 to learn which
> translation granules are supported. Arm recommends that SMMUv3
> implementations support at least 4K and 64K granules. But in
> the vSMMUv3, there seems to be no reason not to support 16K
> translation granule. In addition, if 16K is not supported,
> vSVA will fail to be enabled in the future for 16K guest
> kernels. So it would be better to support it.
> 
> Signed-off-by: Kunkun Jiang 

Looks good to me.
Reviewed-by: Eric Auger 
Tested-by: Eric Auger 

Thanks

Eric
> ---
>  hw/arm/smmuv3.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
> index 3b87324ce2..0a483b0bab 100644
> --- a/hw/arm/smmuv3.c
> +++ b/hw/arm/smmuv3.c
> @@ -259,8 +259,9 @@ static void smmuv3_init_regs(SMMUv3State *s)
>  s->idr[3] = FIELD_DP32(s->idr[3], IDR3, RIL, 1);
>  s->idr[3] = FIELD_DP32(s->idr[3], IDR3, HAD, 1);
>  
> -   /* 4K and 64K granule support */
> +/* 4K, 16K and 64K granule support */
>  s->idr[5] = FIELD_DP32(s->idr[5], IDR5, GRAN4K, 1);
> +s->idr[5] = FIELD_DP32(s->idr[5], IDR5, GRAN16K, 1);
>  s->idr[5] = FIELD_DP32(s->idr[5], IDR5, GRAN64K, 1);
>  s->idr[5] = FIELD_DP32(s->idr[5], IDR5, OAS, SMMU_IDR5_OAS); /* 44 bits 
> */
>  
> @@ -503,7 +504,8 @@ static int decode_cd(SMMUTransCfg *cfg, CD *cd, 
> SMMUEventInfo *event)
>  
>  tg = CD_TG(cd, i);
>  tt->granule_sz = tg2granule(tg, i);
> -if ((tt->granule_sz != 12 && tt->granule_sz != 16) || CD_ENDI(cd)) {
> +if ((tt->granule_sz != 12 && tt->granule_sz != 14 &&
> + tt->granule_sz != 16) || CD_ENDI(cd)) {
>  goto bad_cd;
>  }
>  
> 
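
For readers following the decode_cd() hunk: granule_sz is the log2 of the granule size, so 12, 14 and 16 correspond to 4K, 16K and 64K. A standalone sketch of that check, with hypothetical helper names and without the CD_ENDI() part of the condition:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: accept only the 4K, 16K and 64K granule shifts. */
static bool granule_shift_is_valid(unsigned granule_sz)
{
    return granule_sz == 12 || granule_sz == 14 || granule_sz == 16;
}

/* Size in bytes of a granule given its shift; the shift must be valid. */
static uint64_t granule_bytes(unsigned granule_sz)
{
    assert(granule_shift_is_valid(granule_sz));
    return 1ULL << granule_sz;
}
```

Before the patch, the equivalent of granule_shift_is_valid(14) returned false, so a 16K guest configuration ended up in the bad_cd path.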




Re: A question about the translation granule size supported by the vSMMU

2021-04-06 Thread Auger Eric
Hi Kunkun,

On 3/27/21 3:24 AM, Kunkun Jiang wrote:
> Hi all,
> 
> Recently, I did some tests on SMMU nested mode. Here is
> a question about the translation granule size supported by
> vSMMU.
> 
> There is such a code in SMMUv3_init_regs():
> 
>>    /* 4K and 64K granule support */
>>     s->idr[5] = FIELD_DP32(s->idr[5], IDR5, GRAN4K, 1);
>>     s->idr[5] = FIELD_DP32(s->idr[5], IDR5, GRAN64K, 1);
>>     s->idr[5] = FIELD_DP32(s->idr[5], IDR5, OAS, SMMU_IDR5_OAS); /* 44
>> bits */
> Why is the 16K granule not supported? I modified the code
> to support it and did not encounter any problems in the
> test. Although 4K and 64K minimal granules are "strongly
> recommended", I think vSMMU should still support 16K.
> Are there other reasons why 16K is not supported here?
No, there aren't any. The main reasons were that 16KB support is optional
and that supporting it increases the test matrix. Also, it seems quite a
few of the machines I have access to do support the 16KB granule. On the
others I get

"EFI stub: ERROR: This 16 KB granular kernel is not supported by your CPU".

Nevertheless I am not opposed to supporting it as it seems to work without
trouble. We just need to have an extra look at the implied validity checks,
but there shouldn't be much.

Thanks

Eric
> 
> When in SMMU nested mode, it may get errors if pSMMU
> doesn't support 16K but vSMMU supports 16K. But we
> can get some settings of pSMMU to avoid this situation.
> I found some discussions between Eric and Linu about
> this [1], but this idea does not seem to be implemented.
> 
> [1] https://lists.gnu.org/archive/html/qemu-arm/2017-09/msg00149.html
> 
> Best regards,
> Kunkun Jiang
> 




Re: [PATCH] hw/arm/virt-acpi-build: Fix GSIV values of the {GERR, Sync} interrupts

2021-04-06 Thread Auger Eric
Hi Peter,

On 4/6/21 2:31 PM, Peter Maydell wrote:
> On Tue, 6 Apr 2021 at 13:23, Auger Eric  wrote:
>>
>> Hi Peter,
>>
>> On 4/6/21 12:44 PM, Peter Maydell wrote:
>>> On Tue, 6 Apr 2021 at 11:10, Auger Eric  wrote:
>>>>
>>>> Hi Zenghui,
>>>>
>>>> On 4/2/21 10:47 AM, Zenghui Yu wrote:
>>>>> The GSIV values in SMMUv3 IORT node are not correct as they don't match
>>>>> the SMMUIrq enumeration, which describes the IRQ<->PIN mapping used by
>>>>> our emulated vSMMU.
>>>>>
>>>>> Fixes: a703b4f6c1ee ("hw/arm/virt-acpi-build: Add smmuv3 node in IORT 
>>>>> table")
>>>>> Signed-off-by: Zenghui Yu 
>>>> Acked-by: Eric Auger 
>>>
>>> Eric, when you send an acked-by tag do you mean to say that you've
>>> reviewed the patch, or merely that you think it's basically the
>>> right thing but you haven't actually looked at the details?
>>
>> I mean I have reviewed the patch carefully and I think it is good to go.
>> I thought that as a maintainer for the arm smmu component I was supposed
>> to send an A-b instead of an R-b.
> 
> The usual meaning I think is that "Acked-by" means "I'm the maintainer,
> I've seen this going by, and I'm basically OK with this" (ie it's you
> saying "I'm not NAKing it") -- so it's not as "strong" as a "Reviewed-by"
> tag (which means "I've reviewed it").

Hum OK, I thought it was stronger than the R-b. So in the future, wrt
the SMMU component, I will give an R-b as I always do a proper review.

Thanks

Eric
> 
> thanks
> -- PMM
> 




Re: [PATCH] hw/arm/virt-acpi-build: Fix GSIV values of the {GERR, Sync} interrupts

2021-04-06 Thread Auger Eric
Hi Peter,

On 4/6/21 12:44 PM, Peter Maydell wrote:
> On Tue, 6 Apr 2021 at 11:10, Auger Eric  wrote:
>>
>> Hi Zenghui,
>>
>> On 4/2/21 10:47 AM, Zenghui Yu wrote:
>>> The GSIV values in SMMUv3 IORT node are not correct as they don't match
>>> the SMMUIrq enumeration, which describes the IRQ<->PIN mapping used by
>>> our emulated vSMMU.
>>>
>>> Fixes: a703b4f6c1ee ("hw/arm/virt-acpi-build: Add smmuv3 node in IORT 
>>> table")
>>> Signed-off-by: Zenghui Yu 
>> Acked-by: Eric Auger 
> 
> Eric, when you send an acked-by tag do you mean to say that you've
> reviewed the patch, or merely that you think it's basically the
> right thing but you haven't actually looked at the details?

I mean I have reviewed the patch carefully and I think it is good to go.
I thought that as a maintainer for the arm smmu component I was supposed
to send an A-b instead of an R-b.
> 
> (I ask because if the former I can just put this in target-arm.next,
> but if the latter then I need to dig out the SMMU spec and review
> the patch myself :-))

Yes that's rather the former but obviously if you have some cycles /
interest in the topic I am more than happy to get your opinion too!

Thanks

Eric
> 
> thanks
> -- PMM
> 




Re: [PATCH] hw/arm/virt-acpi-build: Fix GSIV values of the {GERR, Sync} interrupts

2021-04-06 Thread Auger Eric
Hi Zenghui,

On 4/2/21 10:47 AM, Zenghui Yu wrote:
> The GSIV values in SMMUv3 IORT node are not correct as they don't match
> the SMMUIrq enumeration, which describes the IRQ<->PIN mapping used by
> our emulated vSMMU.
> 
> Fixes: a703b4f6c1ee ("hw/arm/virt-acpi-build: Add smmuv3 node in IORT table")
> Signed-off-by: Zenghui Yu 
Acked-by: Eric Auger 

Thanks!

Eric
> ---
>  hw/arm/virt-acpi-build.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> index f5a2b2d4cb..60fe2e65a7 100644
> --- a/hw/arm/virt-acpi-build.c
> +++ b/hw/arm/virt-acpi-build.c
> @@ -292,8 +292,8 @@ build_iort(GArray *table_data, BIOSLinker *linker, 
> VirtMachineState *vms)
>  smmu->flags = cpu_to_le32(ACPI_IORT_SMMU_V3_COHACC_OVERRIDE);
>  smmu->event_gsiv = cpu_to_le32(irq);
>  smmu->pri_gsiv = cpu_to_le32(irq + 1);
> -smmu->gerr_gsiv = cpu_to_le32(irq + 2);
> -smmu->sync_gsiv = cpu_to_le32(irq + 3);
> +smmu->sync_gsiv = cpu_to_le32(irq + 2);
> +smmu->gerr_gsiv = cpu_to_le32(irq + 3);
>  
>  /* Identity RID mapping covering the whole input RID range */
>  idmap = &smmu->id_mapping_array[0];
> 
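
A small sketch of why the two lines are swapped: if each GSIV is derived as base IRQ plus the pin's position in the enumeration, the table and the device model cannot disagree. The enum order below (event queue, PRI queue, command sync, global error) is inferred from the patched offsets; the names are hypothetical stand-ins, not the QEMU definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the IRQ<->PIN enumeration; order assumed. */
typedef enum {
    SKETCH_IRQ_EVTQ,
    SKETCH_IRQ_PRIQ,
    SKETCH_IRQ_CMD_SYNC,
    SKETCH_IRQ_GERROR
} SketchSMMUIrq;

/* Derive the GSIV from the base IRQ and the pin index. */
static uint32_t gsiv_for(uint32_t base_irq, SketchSMMUIrq pin)
{
    assert((unsigned)pin <= (unsigned)SKETCH_IRQ_GERROR);
    return base_irq + (uint32_t)pin;
}
```

With this scheme the sync interrupt lands at base + 2 and the global error interrupt at base + 3, matching the values after the fix.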




Re: [PATCH] hw/arm/smmuv3: Emulate CFGI_STE_RANGE for an aligned range of StreamIDs

2021-04-06 Thread Auger Eric
Hi Zenghui,

On 4/2/21 12:04 PM, Zenghui Yu wrote:
> In emulation of the CFGI_STE_RANGE command, we now take StreamID as the
> start of the invalidation range, regardless of whatever the Range is,
> whilst the spec clearly states that
> 
>  - "Invalidation is performed for an *aligned* range of 2^(Range+1)
> StreamIDs."
> 
>  - "The bottom Range+1 bits of the StreamID parameter are IGNORED,
> aligning the range to its size."
> 
> Take CFGI_ALL (where Range == 31) as an example, if there are some random
> bits in the StreamID field, we'll fail to perform the full invalidation but
> get a strange range (e.g., SMMUSIDRange={.start=1, .end=0}) instead. Rework
> the emulation a bit to get rid of the discrepancy with the spec.
> 
> Signed-off-by: Zenghui Yu 
Acked-by: Eric Auger 

Thanks!

Eric
> ---
>  hw/arm/smmuv3.c | 12 
>  1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
> index 3b87324ce2..8705612535 100644
> --- a/hw/arm/smmuv3.c
> +++ b/hw/arm/smmuv3.c
> @@ -980,16 +980,20 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
>  }
>  case SMMU_CMD_CFGI_STE_RANGE: /* same as SMMU_CMD_CFGI_ALL */
>  {
> -uint32_t start = CMD_SID();
> +uint32_t sid = CMD_SID(), mask;
>  uint8_t range = CMD_STE_RANGE();
> -uint64_t end = start + (1ULL << (range + 1)) - 1;
> -SMMUSIDRange sid_range = {start, end};
> +SMMUSIDRange sid_range;
>  
>  if (CMD_SSEC()) {
>  cmd_error = SMMU_CERROR_ILL;
>  break;
>  }
> -trace_smmuv3_cmdq_cfgi_ste_range(start, end);
> +
> +mask = (1ULL << (range + 1)) - 1;
> +sid_range.start = sid & ~mask;
> +sid_range.end = sid_range.start + mask;
> +
> +trace_smmuv3_cmdq_cfgi_ste_range(sid_range.start, sid_range.end);
>  g_hash_table_foreach_remove(bs->configs, smmuv3_invalidate_ste,
>  &sid_range);
>  break;
> 
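
To make the aligned-range arithmetic from the hunk above concrete, here is a standalone sketch of the same computation. SidRange and ste_range() are hypothetical names used only for illustration; the arithmetic mirrors the patched lines.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the SID range type used in the patch. */
typedef struct {
    uint32_t start;
    uint32_t end;
} SidRange;

/*
 * Compute the aligned range of 2^(range + 1) StreamIDs containing sid.
 * The bottom range + 1 bits of sid are ignored, as the spec text quoted
 * in the commit message requires. range is a 5-bit field, so <= 31.
 */
static SidRange ste_range(uint32_t sid, uint8_t range)
{
    assert(range <= 31);

    /* The 64-bit shift avoids undefined behaviour when range == 31. */
    uint32_t mask = (uint32_t)((1ULL << (range + 1)) - 1);
    SidRange r;

    r.start = sid & ~mask;
    r.end = r.start + mask;
    return r;
}
```

For sid = 0x1235 and range = 3 this gives 0x1230..0x123f. For range = 31 with a stray low bit in sid (the CFGI_ALL case from the commit message) it gives 0..0xffffffff instead of the strange {.start=1, .end=0} range.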




Re: [PATCH for-6.0 1/4] include/hw/boards.h: Document machine_class_allow_dynamic_sysbus_dev()

2021-03-26 Thread Auger Eric
Hi Peter,

On 3/26/21 11:20 AM, Peter Maydell wrote:
> On Fri, 26 Mar 2021 at 09:27, Auger Eric  wrote:
>>
>> Hi Peter,
>>
>> On 3/25/21 4:33 PM, Peter Maydell wrote:
>>> The function machine_class_allow_dynamic_sysbus_dev() is currently
>>> undocumented; add a doc comment.
>>>
>>> Signed-off-by: Peter Maydell 
>>> ---
>>>  include/hw/boards.h | 14 ++
>>>  1 file changed, 14 insertions(+)
>>>
>>> diff --git a/include/hw/boards.h b/include/hw/boards.h
>>> index 4a90549ad85..27106abc11d 100644
>>> --- a/include/hw/boards.h
>>> +++ b/include/hw/boards.h
>>> @@ -36,7 +36,21 @@ void machine_set_cpu_numa_node(MachineState *machine,
>>> const CpuInstanceProperties *props,
>>> Error **errp);
>>>
>>> +/**
>>> + * machine_class_allow_dynamic_sysbus_dev: Add type to list of valid 
>>> devices
>> nit: s/of valid devices/of dynamically instantiable sysbus devices ?
> 
> I was trying to keep the summary line to be one line, which
> doesn't give much space for nuance with a function name this long...

OK no worries

Thanks

Eric
> 
> 
> -- PMM
> 




Re: [PATCH for-6.0 4/4] hw/ppc/e500plat: Only try to add valid dynamic sysbus devices to platform bus

2021-03-26 Thread Auger Eric



On 3/25/21 4:33 PM, Peter Maydell wrote:
> The e500plat machine device plug callback currently calls
> platform_bus_link_device() for any sysbus device.  This is overly
> broad, because platform_bus_link_device() will unconditionally grab
> the IRQs and MMIOs of the device it is passed, whether it was
> intended for the platform bus or not.  Restrict hotpluggability of
> sysbus devices to only those devices on the dynamic sysbus whitelist.
> 
> We were mostly getting away with this because the board creates the
> platform bus as the last device it creates, and so the hotplug
> callback did not do anything for all the sysbus devices created by
> the board itself.  However if the user plugged in a device which
> itself uses a sysbus device internally we would have mishandled this
> and probably asserted. An example of this is:
>  qemu-system-ppc64 -M ppce500 -device macio-oldworld
> 
> This isn't a sensible command because the macio-oldworld device
> is really specific to the 'g3beige' machine, but we now fail
> with a reasonable error message rather than asserting:
> qemu-system-ppc64: Device heathrow is not supported by this machine yet.
> 
> Signed-off-by: Peter Maydell 
Reviewed-by: Eric Auger 

Eric
> ---
>  hw/ppc/e500plat.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/ppc/e500plat.c b/hw/ppc/e500plat.c
> index bddd5e7c48f..fc911bbb7bd 100644
> --- a/hw/ppc/e500plat.c
> +++ b/hw/ppc/e500plat.c
> @@ -48,7 +48,9 @@ static void e500plat_machine_device_plug_cb(HotplugHandler 
> *hotplug_dev,
>  PPCE500MachineState *pms = PPCE500_MACHINE(hotplug_dev);
>  
>  if (pms->pbus_dev) {
> -if (object_dynamic_cast(OBJECT(dev), TYPE_SYS_BUS_DEVICE)) {
> +MachineClass *mc = MACHINE_GET_CLASS(pms);
> +
> +if (device_is_dynamic_sysbus(mc, dev)) {
>  platform_bus_link_device(pms->pbus_dev, SYS_BUS_DEVICE(dev));
>  }
>  }
> @@ -58,7 +60,9 @@ static
>  HotplugHandler *e500plat_machine_get_hotpug_handler(MachineState *machine,
>  DeviceState *dev)
>  {
> -if (object_dynamic_cast(OBJECT(dev), TYPE_SYS_BUS_DEVICE)) {
> +MachineClass *mc = MACHINE_GET_CLASS(machine);
> +
> +if (device_is_dynamic_sysbus(mc, dev)) {
>  return HOTPLUG_HANDLER(machine);
>  }
>  
> 




Re: [PATCH for-6.0 3/4] hw/arm/virt: Only try to add valid dynamic sysbus devices to platform bus

2021-03-26 Thread Auger Eric
Hi Peter,

On 3/25/21 4:33 PM, Peter Maydell wrote:
> The virt machine device plug callback currently calls
> platform_bus_link_device() for any sysbus device.  This is overly
> broad, because platform_bus_link_device() will unconditionally grab
> the IRQs and MMIOs of the device it is passed, whether it was
> intended for the platform bus or not.  Restrict hotpluggability of
> sysbus devices to only those devices on the dynamic sysbus whitelist.
> 
> We were mostly getting away with this because the board creates the
> platform bus as the last device it creates, and so the hotplug
> callback did not do anything for all the sysbus devices created by
> the board itself.  However if the user plugged in a device which
> itself uses a sysbus device internally we would have mishandled this
> and probably asserted.
> 
> Signed-off-by: Peter Maydell 
Reviewed-by: Eric Auger 

Thanks

Eric
> ---
>  hw/arm/virt.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index aa2bbd14e09..8625152a735 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -2443,7 +2443,9 @@ static void virt_machine_device_plug_cb(HotplugHandler 
> *hotplug_dev,
>  VirtMachineState *vms = VIRT_MACHINE(hotplug_dev);
>  
>  if (vms->platform_bus_dev) {
> -if (object_dynamic_cast(OBJECT(dev), TYPE_SYS_BUS_DEVICE)) {
> +MachineClass *mc = MACHINE_GET_CLASS(vms);
> +
> +if (device_is_dynamic_sysbus(mc, dev)) {
>  
> platform_bus_link_device(PLATFORM_BUS_DEVICE(vms->platform_bus_dev),
>   SYS_BUS_DEVICE(dev));
>  }
> @@ -2527,7 +2529,9 @@ static void 
> virt_machine_device_unplug_cb(HotplugHandler *hotplug_dev,
>  static HotplugHandler *virt_machine_get_hotplug_handler(MachineState 
> *machine,
>  DeviceState *dev)
>  {
> -if (object_dynamic_cast(OBJECT(dev), TYPE_SYS_BUS_DEVICE) ||
> +MachineClass *mc = MACHINE_GET_CLASS(machine);
> +
> +if (device_is_dynamic_sysbus(mc, dev) ||
> (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM))) {
>  return HOTPLUG_HANDLER(machine);
>  }
> 




Re: [PATCH for-6.0 2/4] machine: Provide a function to check the dynamic sysbus whitelist

2021-03-26 Thread Auger Eric
Hi Peter,

On 3/25/21 4:33 PM, Peter Maydell wrote:
> Provide a new function device_is_dynamic_sysbus() which checks
> the per-machine whitelist of dynamic sysbus devices and returns
> a boolean result indicating whether the device is whitelisted.
> We can use this in the implementation of validate_sysbus_device(),
> but we will also need it so that machine hotplug callbacks can
> validate devices rather than assuming that any sysbus device
> might be hotpluggable into the platform bus.
> 
> Signed-off-by: Peter Maydell 
Reviewed-by: Eric Auger 

Thanks

Eric

> ---
>  include/hw/boards.h | 24 
>  hw/core/machine.c   | 21 -
>  2 files changed, 40 insertions(+), 5 deletions(-)
> 
> diff --git a/include/hw/boards.h b/include/hw/boards.h
> index 27106abc11d..609112a4e1a 100644
> --- a/include/hw/boards.h
> +++ b/include/hw/boards.h
> @@ -51,6 +51,30 @@ void machine_set_cpu_numa_node(MachineState *machine,
>   */
>  void machine_class_allow_dynamic_sysbus_dev(MachineClass *mc, const char 
> *type);
>  
> +/**
> + * device_is_dynamic_sysbus: test whether device is a dynamic sysbus device
> + * @mc: Machine class
> + * @dev: device to check
> + *
> + * Returns: true if @dev is a sysbus device on the machine's whitelist
> + * of dynamically pluggable sysbus devices; otherwise false.
> + *
> + * This function checks whether @dev is a valid dynamic sysbus device,
> + * by first confirming that it is a sysbus device and then checking it
> + * against the whitelist of permitted dynamic sysbus devices which has
> + * been set up by the machine using machine_class_allow_dynamic_sysbus_dev().
> + *
> + * It is valid to call this with something that is not a subclass of
> + * TYPE_SYS_BUS_DEVICE; the function will return false in this case.
> + * This allows hotplug callback functions to be written as:
> + * if (device_is_dynamic_sysbus(mc, dev)) {
> + * handle dynamic sysbus case;
> + * } else if (some other kind of hotplug) {
> + * handle that;
> + * }
> + */
> +bool device_is_dynamic_sysbus(MachineClass *mc, DeviceState *dev);
> +
>  /*
>   * Checks that backend isn't used, preps it for exclusive usage and
>   * returns migratable MemoryRegion provided by backend.
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index 9935c6ddd56..8d97094736a 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -529,20 +529,31 @@ void 
> machine_class_allow_dynamic_sysbus_dev(MachineClass *mc, const char *type)
>  QAPI_LIST_PREPEND(mc->allowed_dynamic_sysbus_devices, g_strdup(type));
>  }
>  
> -static void validate_sysbus_device(SysBusDevice *sbdev, void *opaque)
> +bool device_is_dynamic_sysbus(MachineClass *mc, DeviceState *dev)
>  {
> -MachineState *machine = opaque;
> -MachineClass *mc = MACHINE_GET_CLASS(machine);
>  bool allowed = false;
>  strList *wl;
> +Object *obj = OBJECT(dev);
> +
> +if (!object_dynamic_cast(obj, TYPE_SYS_BUS_DEVICE)) {
> +return false;
> +}
>  
>  for (wl = mc->allowed_dynamic_sysbus_devices;
>   !allowed && wl;
>   wl = wl->next) {
> -allowed |= !!object_dynamic_cast(OBJECT(sbdev), wl->value);
> +allowed |= !!object_dynamic_cast(obj, wl->value);
>  }
>  
> -if (!allowed) {
> +return allowed;
> +}
> +
> +static void validate_sysbus_device(SysBusDevice *sbdev, void *opaque)
> +{
> +MachineState *machine = opaque;
> +MachineClass *mc = MACHINE_GET_CLASS(machine);
> +
> +if (!device_is_dynamic_sysbus(mc, DEVICE(sbdev))) {
>  error_report("Option '-device %s' cannot be handled by this machine",
>   object_class_get_name(object_get_class(OBJECT(sbdev))));
>  exit(1);
> 
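
A simplified model of the whitelist lookup, for readers without the QOM code at hand. Note the deliberate simplification: the real function uses object_dynamic_cast(), so subtypes of a whitelisted type also match, whereas this sketch only compares type names exactly. The function name and the string-array representation are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model: return true if type appears in the allowed list.
 * Exact string match only; QOM subtype relationships are not modelled.
 */
static bool type_is_allowed(const char *const *allowed, size_t n,
                            const char *type)
{
    assert(type != NULL);

    for (size_t i = 0; i < n; i++) {
        if (strcmp(allowed[i], type) == 0) {
            return true;
        }
    }
    return false;
}
```

A hotplug callback built on such a check returns false for anything that is not on the list, so a device like macio-oldworld is rejected cleanly instead of being linked to the platform bus.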




Re: [PATCH for-6.0 1/4] include/hw/boards.h: Document machine_class_allow_dynamic_sysbus_dev()

2021-03-26 Thread Auger Eric
Hi Peter,

On 3/25/21 4:33 PM, Peter Maydell wrote:
> The function machine_class_allow_dynamic_sysbus_dev() is currently
> undocumented; add a doc comment.
> 
> Signed-off-by: Peter Maydell 
> ---
>  include/hw/boards.h | 14 ++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/include/hw/boards.h b/include/hw/boards.h
> index 4a90549ad85..27106abc11d 100644
> --- a/include/hw/boards.h
> +++ b/include/hw/boards.h
> @@ -36,7 +36,21 @@ void machine_set_cpu_numa_node(MachineState *machine,
> const CpuInstanceProperties *props,
> Error **errp);
>  
> +/**
> + * machine_class_allow_dynamic_sysbus_dev: Add type to list of valid devices
nit: s/of valid devices/of dynamically instantiable sysbus devices ?
> + * @mc: Machine class
> + * @type: type to whitelist (should be a subtype of TYPE_SYS_BUS_DEVICE)
> + *
> + * Add the QOM type @type to the list of devices which are subtypes
> + * of TYPE_SYS_BUS_DEVICE but which are still permitted to be dynamically
> + * created (eg by the user on the command line with -device).
> + * By default if the user tries to create any devices on the command line
> + * that are subtypes of TYPE_SYS_BUS_DEVICE they will get an error message;
> + * for the special cases which are permitted for this machine model, the
> + * machine model class init code must call this function to whitelist them.
> + */
>  void machine_class_allow_dynamic_sysbus_dev(MachineClass *mc, const char 
> *type);
> +
>  /*
>   * Checks that backend isn't used, preps it for exclusive usage and
>   * returns migratable MemoryRegion provided by backend.
> 
Besides that nit,

Reviewed-by: Eric Auger 

Thanks

Eric




Re: [PATCH v2 00/10] Acceptance Test: introduce base class for Linux based tests

2021-03-26 Thread Auger Eric
Hi Wainer,

On 3/25/21 8:45 PM, Wainer dos Santos Moschetta wrote:
> Hi,
> 
> On 3/23/21 7:15 PM, Cleber Rosa wrote:
>> This introduces a base class for tests that need to interact with a
>> Linux guest.  It generalizes the "boot_linux.py" code, already being
>> used by the "virtiofs_submounts.py" and also SSH related code being
>> used by that and "linux_ssh_mips_malta.py".
> 
> I ran the linux_ssh_mips_malta.py tests, they all passed:
> 
> (11/34)
> tests/acceptance/linux_ssh_mips_malta.py:LinuxSSH.test_mips_malta32eb_kernel3_2_0:
> PASS (64.41 s)
> (12/34)
> tests/acceptance/linux_ssh_mips_malta.py:LinuxSSH.test_mips_malta32el_kernel3_2_0:
> PASS (63.43 s)
> (13/34)
> tests/acceptance/linux_ssh_mips_malta.py:LinuxSSH.test_mips_malta64eb_kernel3_2_0:
> PASS (63.76 s)
> (14/34)
> tests/acceptance/linux_ssh_mips_malta.py:LinuxSSH.test_mips_malta64el_kernel3_2_0:
> PASS (62.52 s)
> 
> Then I tried the virtiofs_submounts.py tests, it finishes with error.
> Something like that fixes it:
> 
> diff --git a/tests/acceptance/virtiofs_submounts.py
> b/tests/acceptance/virtiofs_submounts.py
> index d77ee35674..21ad7d792e 100644
> --- a/tests/acceptance/virtiofs_submounts.py
> +++ b/tests/acceptance/virtiofs_submounts.py
> @@ -195,7 +195,7 @@ def setUp(self):
> 
>  self.run(('ssh-keygen', '-N', '', '-t', 'ed25519', '-f',
> self.ssh_key))
> 
> -    pubkey = open(self.ssh_key + '.pub').read()
> +    pubkey = self.ssh_key + '.pub'

Yes, I discovered that too when developing the SMMU test. Thanks for
mentioning it.

Eric
> 
>  super(VirtiofsSubmountsTest, self).setUp(pubkey)
> 
> 
>>
>> While at it, a number of fixes and hopeful improvements to those tests
>> were added.
>>
>> Changes from v1:
>>
>> * Majority of v1 patches have been merged.
>>
>> * New patches:
>>    - Acceptance Tests: make username/password configurable
>>    - Acceptance Tests: set up SSH connection by default after boot for
>> LinuxTest
>>    - tests/acceptance/virtiofs_submounts.py: remove launch_vm()
>>
>> * Allowed for the configuration of the network device type (defaulting
>>    to virtio-net) [Phil]
>>
>> * Fix module name typo (s/qemu.util/qemu.utils/) in the commit message
>>    [John]
>>
>> * Tests based on LinuxTest will have the SSH connection already prepared
>>
>> Cleber Rosa (10):
>>    tests/acceptance/virtiofs_submounts.py: add missing accel tag
>>    tests/acceptance/virtiofs_submounts.py: evaluate string not length
>>    Python: add utility function for retrieving port redirection
>>    Acceptance Tests: move useful ssh methods to base class
>>    Acceptance Tests: add port redirection for ssh by default
>>    Acceptance Tests: make username/password configurable
>>    Acceptance Tests: set up SSH connection by default after boot for
>>  LinuxTest
>>    tests/acceptance/virtiofs_submounts.py: remove launch_vm()
>>    Acceptance Tests: add basic documentation on LinuxTest base class
>>    Acceptance Tests: introduce CPU hotplug test
>>
>>   docs/devel/testing.rst    | 25 
>>   python/qemu/utils.py  | 35 
>>   tests/acceptance/avocado_qemu/__init__.py | 63 +++--
>>   tests/acceptance/hotplug_cpu.py   | 37 
>>   tests/acceptance/info_usernet.py  | 29 ++
>>   tests/acceptance/linux_ssh_mips_malta.py  | 44 ++-
>>   tests/acceptance/virtiofs_submounts.py    | 69 +++
>>   tests/vm/basevm.py    |  7 +--
>>   8 files changed, 198 insertions(+), 111 deletions(-)
>>   create mode 100644 python/qemu/utils.py
>>   create mode 100644 tests/acceptance/hotplug_cpu.py
>>   create mode 100644 tests/acceptance/info_usernet.py
>>




Re: [PATCH v3 6/7] hw/arm/smmuv3: Fix SMMU_CMD_CFGI_STE_RANGE handling

2021-03-25 Thread Auger Eric
Hi Zenghui,

On 3/25/21 3:18 PM, Zenghui Yu wrote:
> On 2021/3/9 18:27, Eric Auger wrote:
>> If the whole SID range (32b) is invalidated (SMMU_CMD_CFGI_ALL),
>> @end overflows and we fail to handle the command properly.
>>
>> Once this gets fixed, the current code really is awkward in the
>> sense it loops over the whole range instead of removing the
>> currently cached configs through a hash table lookup.
>>
>> Fix both the overflow and the lookup.
>>
>> Signed-off-by: Eric Auger 
>> Reviewed-by: Peter Maydell 
> 
> Not much to do with this patch, but maybe we can take the fix a little
> further. We now take StreamID as the start of the invalidation range,
> regardless of whatever the Range is, whilst the spec clearly states that
> "the StreamID parameter (of *CMD_CFGI_ALL* command) is IGNORED". If
> there are some random bits in the StreamID field (who knows), we'll fail
> to perform the full invalidation but get a strange range (e.g.,
> SMMUSIDRange={.start=1, .end=0}) instead.
> 
> And having looked at the spec again, 4.3.2 CMD_CFGI_STE_RANGE:
> 
>  - "Invalidation is performed for an *aligned* range of 2^(Range+1)
>     StreamIDs."
> 
>  - "The bottom Range+1 bits of the StreamID parameter are IGNORED,
>     aligning the range to its size."
> 
> which seems to be some bits that we had never taken into account. And
> what I'm saying is roughly something like below (compile tested), any
> thoughts?
> 
> 
> diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
> index 3b87324ce2..8705612535 100644
> --- a/hw/arm/smmuv3.c
> +++ b/hw/arm/smmuv3.c
> @@ -980,16 +980,20 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
>  }
>  case SMMU_CMD_CFGI_STE_RANGE: /* same as SMMU_CMD_CFGI_ALL */
>  {
> -    uint32_t start = CMD_SID();
> +    uint32_t sid = CMD_SID(), mask;
>  uint8_t range = CMD_STE_RANGE();
> -    uint64_t end = start + (1ULL << (range + 1)) - 1;
> -    SMMUSIDRange sid_range = {start, end};
> +    SMMUSIDRange sid_range;
> 
>  if (CMD_SSEC()) {
>  cmd_error = SMMU_CERROR_ILL;
>  break;
>  }
> -    trace_smmuv3_cmdq_cfgi_ste_range(start, end);
> +
> +    mask = (1ULL << (range + 1)) - 1;
> +    sid_range.start = sid & ~mask;
> +    sid_range.end = sid_range.start + mask;
> +
> +    trace_smmuv3_cmdq_cfgi_ste_range(sid_range.start,
> sid_range.end);
>  g_hash_table_foreach_remove(bs->configs,
> smmuv3_invalidate_ste,
>  &sid_range);
>  break;
> 
Thanks for spotting this discrepancy with the spec. This looks good to
me, please feel free to send the patch.

Thanks

Eric




Re: [PATCH] hw/arm/smmuv3: Drop unused CDM_VALID() and is_cd_valid()

2021-03-25 Thread Auger Eric
Hi Zenghui,

On 3/25/21 3:27 PM, Zenghui Yu wrote:
> They were introduced in commit 9bde7f0674fe ("hw/arm/smmuv3: Implement
> translate callback") but never actually used. Drop them.
> 
> Signed-off-by: Zenghui Yu 
> ---
>  hw/arm/smmuv3-internal.h | 7 ---
>  1 file changed, 7 deletions(-)
> 
> diff --git a/hw/arm/smmuv3-internal.h b/hw/arm/smmuv3-internal.h
> index b6f7e53b7c..3dac5766ca 100644
> --- a/hw/arm/smmuv3-internal.h
> +++ b/hw/arm/smmuv3-internal.h
> @@ -595,13 +595,6 @@ static inline int pa_range(STE *ste)
>  #define CD_A(x)  extract32((x)->word[1], 14, 1)
>  #define CD_AARCH64(x)    extract32((x)->word[1], 9 , 1)
>  
> -#define CDM_VALID(x)    ((x)->word[0] & 0x1)
> -
> -static inline int is_cd_valid(SMMUv3State *s, STE *ste, CD *cd)
> -{
> -return CD_VALID(cd);
> -}
> -
>  /**
>   * tg2granule - Decodes the CD translation granule size field according
>   * to the ttbr in use
> 
Acked-by: Eric Auger 

Thanks

Eric




Re: [PATCH 1/1] avocado_qemu: Add SMMUv3 tests

2021-03-25 Thread Auger Eric
Hi Cleber,

On 3/25/21 3:36 PM, Cleber Rosa wrote:
> On Thu, Mar 25, 2021 at 10:57:12AM +0100, Eric Auger wrote:
>> Add new tests checking the good behavior of the SMMUv3 protecting
>> 2 virtio pci devices (block and net). We check the guest boots and
>> we are able to install a package. Different guest configs are tested:
>> standard, passthrough and strict=0. Given the version of the guest
>> kernel in use (5.3.7 at this moment), range invalidation is not yet
>> tested. This will be handled separately.
>>
>> Signed-off-by: Eric Auger 
>> ---
>>  tests/acceptance/smmu.py | 104 +++
>>  1 file changed, 104 insertions(+)
>>  create mode 100644 tests/acceptance/smmu.py
>>
>> diff --git a/tests/acceptance/smmu.py b/tests/acceptance/smmu.py
>> new file mode 100644
>> index 00..65ecac8f1a
>> --- /dev/null
>> +++ b/tests/acceptance/smmu.py
>> @@ -0,0 +1,104 @@
>> +# SMMUv3 Functional tests
>> +#
>> +# Copyright (c) 2021 Red Hat, Inc.
>> +#
>> +# Author:
>> +#  Eric Auger 
>> +#
>> +# This work is licensed under the terms of the GNU GPL, version 2 or
>> +# later.  See the COPYING file in the top-level directory.
>> +
>> +import os
>> +
>> +from avocado_qemu import LinuxTest, BUILD_DIR
>> +from avocado.utils import ssh
> 
> This import is not needed, given that you're not using it directly,
> but only using the LinuxTest methods that wrap it.
Sure I will remove it
> 
>> +
>> +class SMMU(LinuxTest):
>> +
>> +KERNEL_COMMON_PARAMS = ("root=UUID=b6950a44-9f3c-4076-a9c2-355e8475b0a7 
>> ro "
>> +"earlyprintk=pl011,0x900 ignore_loglevel "
>> +"no_timer_check printk.time=1 rd_NO_PLYMOUTH "
>> +"console=ttyAMA0 ")
>> +IOMMU_ADDON = ',iommu_platform=on,disable-modern=off,disable-legacy=on'
>> +IMAGE = ("https://archives.fedoraproject.org/pub/archive/fedora/"
>> + "linux/releases/31/Everything/aarch64/os/images/pxeboot/")
>> +kernel_path = None
>> +initrd_path = None
>> +kernel_params = None
>> +
>> +def set_up_boot(self):
>> +path = self.download_boot()
>> +self.vm.add_args('-device', 'virtio-blk-pci,bus=pcie.0,scsi=off,' +
>> + 'drive=drv0,id=virtio-disk0,bootindex=1,'
>> + 'werror=stop,rerror=stop' + self.IOMMU_ADDON)
>> +self.vm.add_args('-drive',
>> + 'file=%s,if=none,cache=writethrough,id=drv0' % 
>> path)
>> +
>> +def setUp(self):
>> +super(SMMU, self).setUp(None, 'virtio-net-pci' + self.IOMMU_ADDON)
>> +
>> +def add_common_args(self):
>> +self.vm.add_args("-machine", "virt")
>> +self.vm.add_args('-bios', os.path.join(BUILD_DIR, 'pc-bios',
>> +  'edk2-aarch64-code.fd'))
>> +self.vm.add_args('-device', 'virtio-rng-pci,rng=rng0')
>> +self.vm.add_args('-object',
>> + 'rng-random,id=rng0,filename=/dev/urandom')
>> +
>> +def common_vm_setup(self, custom_kernel=None):
>> +self.require_accelerator("kvm")
>> +self.add_common_args()
> 
> I know you're following the previous test pattern/template, but maybe
> combine add_common_args() and common_vm_setup()?  They seem to be
> doing the same thing.
yep
> 
>> +self.vm.add_args("-accel", "kvm")
>> +self.vm.add_args("-cpu", "host")
>> +self.vm.add_args("-machine", "iommu=smmuv3")
>> +
>> +if custom_kernel is None:
>> +return
>> +
>> +kernel_url = self.IMAGE + 'vmlinuz'
>> +initrd_url = self.IMAGE + 'initrd.img'
>> +self.kernel_path = self.fetch_asset(kernel_url)
>> +self.initrd_path = self.fetch_asset(initrd_url)
>> +
>> +def run_and_check(self):
>> +if self.kernel_path:
>> +self.vm.add_args('-kernel', self.kernel_path,
>> + '-append', self.kernel_params,
>> + '-initrd', self.initrd_path)
>> +self.launch_and_wait()
>> +self.ssh_command('cat /proc/cmdline')
>> +self.ssh_command('dnf -y install numactl-devel')
> 
> Would you expect the package installation to cover significant more
> than, say, a package removal?  Not relying on the distro's package
> repos (and external networking) would be an improvement to the test's
> stability, but I wonder how much functional coverage would be lost.
I noticed that installing this package caused an issue in a specific case
(range invalidation) which is not covered here because the kernel is
too old. I imagine that a package install is more stressful with respect
to the network than a remove?
> 
> FYI, I've tried it with 'dnf -y remove yum' instead, and test times
> are also considerably lower.
> 
>> +
>> +def test_smmu(self):
>> +"""
>> +:avocado: tags=accel:kvm
>> +:avocado: tags=cpu:host
>> +:avocado: tags=smmu
>> +"""
> 
> These tags are 

Re: [PATCH v2 04/10] Acceptance Tests: move useful ssh methods to base class

2021-03-25 Thread Auger Eric
Hi Cleber,

On 3/24/21 11:23 PM, Cleber Rosa wrote:
> On Wed, Mar 24, 2021 at 10:07:31AM +0100, Auger Eric wrote:
>> Hi Cleber,
>>
>> On 3/23/21 11:15 PM, Cleber Rosa wrote:
>>> Both the virtiofs submounts and the linux ssh mips malta tests
>>> contains useful methods related to ssh that deserve to be made
>>> available to other tests.  Let's move them to the base LinuxTest
>> nit: strictly speaking they are moved to another class which is
>> inherited by LinuxTest, right?
>>> class.
>>>
>>> The method that helps with setting up an ssh connection will now
>>> support both key and password based authentication, defaulting to key
>>> based.
>>>
>>> Signed-off-by: Cleber Rosa 
>>> Reviewed-by: Wainer dos Santos Moschetta 
>>> Reviewed-by: Willian Rampazzo 
>>> ---
>>>  tests/acceptance/avocado_qemu/__init__.py | 48 ++-
>>>  tests/acceptance/linux_ssh_mips_malta.py  | 38 ++
>>>  tests/acceptance/virtiofs_submounts.py| 37 -
>>>  3 files changed, 50 insertions(+), 73 deletions(-)
>>>
>>> diff --git a/tests/acceptance/avocado_qemu/__init__.py 
>>> b/tests/acceptance/avocado_qemu/__init__.py
>>> index 83b1741ec8..67f75f66e5 100644
>>> --- a/tests/acceptance/avocado_qemu/__init__.py
>>> +++ b/tests/acceptance/avocado_qemu/__init__.py
>>> @@ -20,6 +20,7 @@
>>>  from avocado.utils import cloudinit
>>>  from avocado.utils import datadrainer
>>>  from avocado.utils import network
>>> +from avocado.utils import ssh
>>>  from avocado.utils import vmimage
>>>  from avocado.utils.path import find_command
>>>  
>>> @@ -43,6 +44,8 @@
>>>  from qemu.accel import kvm_available
>>>  from qemu.accel import tcg_available
>>>  from qemu.machine import QEMUMachine
>>> +from qemu.utils import get_info_usernet_hostfwd_port
>>> +
>>>  
>>>  def is_readable_executable_file(path):
>>>  return os.path.isfile(path) and os.access(path, os.R_OK | os.X_OK)
>>> @@ -253,7 +256,50 @@ def fetch_asset(self, name,
>>>  cancel_on_missing=cancel_on_missing)
>>>  
>>>  
>>> -class LinuxTest(Test):
>>> +class LinuxSSHMixIn:
>>> +"""Contains utility methods for interacting with a guest via SSH."""
>>> +
>>> +def ssh_connect(self, username, credential, credential_is_key=True):
>>> +self.ssh_logger = logging.getLogger('ssh')
>>> +res = self.vm.command('human-monitor-command',
>>> +  command_line='info usernet')
>>> +port = get_info_usernet_hostfwd_port(res)
>>> +self.assertIsNotNone(port)
>>> +self.assertGreater(port, 0)
>>> +self.log.debug('sshd listening on port: %d', port)
>>> +if credential_is_key:
>>> +self.ssh_session = ssh.Session('127.0.0.1', port=port,
>>> +   user=username, key=credential)
>>> +else:
>>> +self.ssh_session = ssh.Session('127.0.0.1', port=port,
>>> +   user=username, 
>>> password=credential)
>>> +for i in range(10):
>>> +try:
>>> +self.ssh_session.connect()
>>> +return
>>> +except:
>>> +time.sleep(4)
>>> +pass
>>> +self.fail('ssh connection timeout')
>>> +
>>> +def ssh_command(self, command):
>>> +self.ssh_logger.info(command)
>>> +result = self.ssh_session.cmd(command)
>>> +stdout_lines = [line.rstrip() for line
>>> +in result.stdout_text.splitlines()]
>>> +for line in stdout_lines:
>>> +self.ssh_logger.info(line)
>>> +stderr_lines = [line.rstrip() for line
>>> +in result.stderr_text.splitlines()]
>>> +for line in stderr_lines:
>>> +self.ssh_logger.warning(line)
>>> +
>>> +self.assertEqual(result.exit_status, 0,
>>> + f'Guest command failed: {command}')
>>> +return stdout_lines, stderr_lines
>>> +
>>> +
>>> +class LinuxTest(Test, LinuxSSHMixIn):
>>>  """Facilitates having a cloud-image Linu

Re: [PATCH v2 05/10] Acceptance Tests: add port redirection for ssh by default

2021-03-24 Thread Auger Eric
Hi Cleber,
On 3/23/21 11:15 PM, Cleber Rosa wrote:
> For users of the LinuxTest class, let's set up the VM with the port
> redirection for SSH, instead of requiring each test to set the same
> arguments.
> 
> Signed-off-by: Cleber Rosa 
> ---
>  tests/acceptance/avocado_qemu/__init__.py | 4 +++-
>  tests/acceptance/virtiofs_submounts.py| 4 
>  2 files changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/tests/acceptance/avocado_qemu/__init__.py 
> b/tests/acceptance/avocado_qemu/__init__.py
> index 67f75f66e5..e75b002c70 100644
> --- a/tests/acceptance/avocado_qemu/__init__.py
> +++ b/tests/acceptance/avocado_qemu/__init__.py
> @@ -309,10 +309,12 @@ class LinuxTest(Test, LinuxSSHMixIn):
>  timeout = 900
>  chksum = None
>  
> -def setUp(self, ssh_pubkey=None):
> +def setUp(self, ssh_pubkey=None, network_device_type='virtio-net'):
I would be interested in testing with HW bridging too, when a bridge is
available. Do you think we could have the netdev configurable too?
This would be helpful to test vhost for instance.

With respect to the network device type, I am currently working on an SMMU
test and I need to call the parent setUp() with the following args now:

super(IOMMU, self).setUp(pubkey,
'virtio-net-pci,iommu_platform=on,disable-modern=off,disable-legacy=on')

It works, but I am not sure you had this kind of scenario in mind?
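
To make the question concrete, here is a small sketch of what the
-device argument ends up looking like in that case. netdev_args() is a
hypothetical stand-in that only copies the string formatting from the
quoted setUp(); it is not part of avocado_qemu:

```python
def netdev_args(network_device_type='virtio-net'):
    """Build the -netdev/-device arguments the way the quoted setUp() does."""
    return ['-netdev', 'user,id=vnet,hostfwd=:127.0.0.1:0-:22',
            '-device', '%s,netdev=vnet' % network_device_type]


# Device properties simply ride along in the "type" string.
iommu_addon = ',iommu_platform=on,disable-modern=off,disable-legacy=on'
print(netdev_args('virtio-net-pci' + iommu_addon)[-1])
```

So the parameter is effectively "device type plus any extra properties",
which works but is perhaps more than its name suggests.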

Thanks

Eric

>  super(LinuxTest, self).setUp()
>  self.vm.add_args('-smp', '2')
>  self.vm.add_args('-m', '1024')
> +self.vm.add_args('-netdev', 'user,id=vnet,hostfwd=:127.0.0.1:0-:22',
> + '-device', '%s,netdev=vnet' % network_device_type)
>  self.set_up_boot()
>  if ssh_pubkey is None:
>  ssh_pubkey, self.ssh_key = self.set_up_existing_ssh_keys()
> diff --git a/tests/acceptance/virtiofs_submounts.py 
> b/tests/acceptance/virtiofs_submounts.py
> index bed8ce44df..e10a935ac4 100644
> --- a/tests/acceptance/virtiofs_submounts.py
> +++ b/tests/acceptance/virtiofs_submounts.py
> @@ -207,10 +207,6 @@ def setUp(self):
>  self.vm.add_args('-kernel', vmlinuz,
>   '-append', 'console=ttyS0 root=/dev/sda1')
>  
> -# Allow us to connect to SSH
> -self.vm.add_args('-netdev', 'user,id=vnet,hostfwd=:127.0.0.1:0-:22',
> - '-device', 'virtio-net,netdev=vnet')
> -
>  self.require_accelerator("kvm")
>  self.vm.add_args('-accel', 'kvm')
>  
> 




Re: [PATCH v2 10/10] Acceptance Tests: introduce CPU hotplug test

2021-03-24 Thread Auger Eric
Hi Cleber,

On 3/23/21 11:15 PM, Cleber Rosa wrote:
> Even though there are qtest based tests for hotplugging CPUs (from
> which this test took some inspiration from), this one adds checks
> from a Linux guest point of view.
> 
> It should also serve as an example for tests that follow a similar
> pattern and need to interact with QEMU (via qmp) and with the Linux
> guest via SSH.
> 
> Signed-off-by: Cleber Rosa 
> Reviewed-by: Marc-André Lureau 
> Reviewed-by: Willian Rampazzo 
Reviewed-by: Eric Auger 

Eric
> ---
>  tests/acceptance/hotplug_cpu.py | 37 +
>  1 file changed, 37 insertions(+)
>  create mode 100644 tests/acceptance/hotplug_cpu.py
> 
> diff --git a/tests/acceptance/hotplug_cpu.py b/tests/acceptance/hotplug_cpu.py
> new file mode 100644
> index 00..6374bf1b54
> --- /dev/null
> +++ b/tests/acceptance/hotplug_cpu.py
> @@ -0,0 +1,37 @@
> +# Functional test that hotplugs a CPU and checks it on a Linux guest
> +#
> +# Copyright (c) 2021 Red Hat, Inc.
> +#
> +# Author:
> +#  Cleber Rosa 
> +#
> +# This work is licensed under the terms of the GNU GPL, version 2 or
> +# later.  See the COPYING file in the top-level directory.
> +
> +from avocado_qemu import LinuxTest
> +
> +
> +class HotPlugCPU(LinuxTest):
> +
> +def test(self):
> +"""
> +:avocado: tags=arch:x86_64
> +:avocado: tags=machine:q35
> +:avocado: tags=accel:kvm
> +"""
> +self.require_accelerator('kvm')
> +self.vm.add_args('-accel', 'kvm')
> +self.vm.add_args('-cpu', 'Haswell')
> +self.vm.add_args('-smp', '1,sockets=1,cores=2,threads=1,maxcpus=2')
> +self.launch_and_wait()
> +
> +self.ssh_command('test -e /sys/devices/system/cpu/cpu0')
> +with self.assertRaises(AssertionError):
> +self.ssh_command('test -e /sys/devices/system/cpu/cpu1')
> +
> +self.vm.command('device_add',
> +driver='Haswell-x86_64-cpu',
> +socket_id=0,
> +core_id=1,
> +thread_id=0)
> +self.ssh_command('test -e /sys/devices/system/cpu/cpu1')
> 




Re: [PATCH v2 09/10] Acceptance Tests: add basic documentation on LinuxTest base class

2021-03-24 Thread Auger Eric
Hi,

On 3/23/21 11:15 PM, Cleber Rosa wrote:
> Signed-off-by: Cleber Rosa 
> Reviewed-by: Marc-André Lureau 
> Reviewed-by: Willian Rampazzo 
Reviewed-by: Eric Auger 

Eric
> ---
>  docs/devel/testing.rst | 25 +
>  1 file changed, 25 insertions(+)
> 
> diff --git a/docs/devel/testing.rst b/docs/devel/testing.rst
> index 1da4c4e4c4..ed2a06db28 100644
> --- a/docs/devel/testing.rst
> +++ b/docs/devel/testing.rst
> @@ -810,6 +810,31 @@ and hypothetical example follows:
>  At test "tear down", ``avocado_qemu.Test`` handles all the QEMUMachines
>  shutdown.
>  
> +The ``avocado_qemu.LinuxTest`` base test class
> +~~
> +
> +The ``avocado_qemu.LinuxTest`` is further specialization of the
> +``avocado_qemu.Test`` class, so it contains all the characteristics of
> +the later plus some extra features.
> +
> +First of all, this base class is intended for tests that need to
> +interact with a fully booted and operational Linux guest.  The most
> +basic example looks like this:
> +
> +.. code::
> +
> +  from avocado_qemu import LinuxTest
> +
> +
> +  class SomeTest(LinuxTest):
> +
> +  def test(self):
> +  self.launch_and_wait()
> +  self.ssh_command('some_command_to_be_run_in_the_guest')
> +
> +Please refer to tests that use ``avocado_qemu.LinuxTest`` under
> +``tests/acceptance`` for more examples.
> +
>  QEMUMachine
>  ~~~
>  
> 




Re: [PATCH v2 08/10] tests/acceptance/virtiofs_submounts.py: remove launch_vm()

2021-03-24 Thread Auger Eric
Hi,

On 3/23/21 11:15 PM, Cleber Rosa wrote:
> The LinuxTest class' launch_and_wait() method now behaves the same way
> as this test's custom launch_vm(), so let's just use the upper layer
> (common) method.
> 
> Signed-off-by: Cleber Rosa 
Reviewed-by: Eric Auger 

Eric
> ---
>  tests/acceptance/virtiofs_submounts.py | 13 +
>  1 file changed, 5 insertions(+), 8 deletions(-)
> 
> diff --git a/tests/acceptance/virtiofs_submounts.py 
> b/tests/acceptance/virtiofs_submounts.py
> index e019d3b896..d77ee35674 100644
> --- a/tests/acceptance/virtiofs_submounts.py
> +++ b/tests/acceptance/virtiofs_submounts.py
> @@ -134,9 +134,6 @@ def set_up_virtiofs(self):
>   '-numa',
>   'node,memdev=mem')
>  
> -def launch_vm(self):
> -self.launch_and_wait()
> -
>  def set_up_nested_mounts(self):
>  scratch_dir = os.path.join(self.shared_dir, 'scratch')
>  try:
> @@ -225,7 +222,7 @@ def test_pre_virtiofsd_set_up(self):
>  self.set_up_nested_mounts()
>  
>  self.set_up_virtiofs()
> -self.launch_vm()
> +self.launch_and_wait()
>  self.mount_in_guest()
>  self.check_in_guest()
>  
> @@ -235,14 +232,14 @@ def test_pre_launch_set_up(self):
>  
>  self.set_up_nested_mounts()
>  
> -self.launch_vm()
> +self.launch_and_wait()
>  self.mount_in_guest()
>  self.check_in_guest()
>  
>  def test_post_launch_set_up(self):
>  self.set_up_shared_dir()
>  self.set_up_virtiofs()
> -self.launch_vm()
> +self.launch_and_wait()
>  
>  self.set_up_nested_mounts()
>  
> @@ -252,7 +249,7 @@ def test_post_launch_set_up(self):
>  def test_post_mount_set_up(self):
>  self.set_up_shared_dir()
>  self.set_up_virtiofs()
> -self.launch_vm()
> +self.launch_and_wait()
>  self.mount_in_guest()
>  
>  self.set_up_nested_mounts()
> @@ -265,7 +262,7 @@ def test_two_runs(self):
>  self.set_up_nested_mounts()
>  
>  self.set_up_virtiofs()
> -self.launch_vm()
> +self.launch_and_wait()
>  self.mount_in_guest()
>  self.check_in_guest()
>  
> 




Re: [PATCH v2 07/10] Acceptance Tests: set up SSH connection by default after boot for LinuxTest

2021-03-24 Thread Auger Eric
Hi Cleber,

On 3/23/21 11:15 PM, Cleber Rosa wrote:
> The LinuxTest specifically targets users that need to interact with Linux
> guests.  So, it makes sense to give a connection by default, and avoid
> requiring it as boiler-plate code.
> 
> Signed-off-by: Cleber Rosa 
> ---
>  tests/acceptance/avocado_qemu/__init__.py | 5 -
>  tests/acceptance/virtiofs_submounts.py| 1 -
>  2 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/tests/acceptance/avocado_qemu/__init__.py 
> b/tests/acceptance/avocado_qemu/__init__.py
> index 535f63a48d..4960142bcc 100644
> --- a/tests/acceptance/avocado_qemu/__init__.py
> +++ b/tests/acceptance/avocado_qemu/__init__.py
> @@ -390,7 +390,7 @@ def set_up_cloudinit(self, ssh_pubkey=None):
>  cloudinit_iso = self.prepare_cloudinit(ssh_pubkey)
>  self.vm.add_args('-drive', 'file=%s,format=raw' % cloudinit_iso)
>  
> -def launch_and_wait(self):
> +def launch_and_wait(self, set_up_ssh_connection=True):
>  self.vm.set_console()
>  self.vm.launch()
>  console_drainer = 
> datadrainer.LineLogger(self.vm.console_socket.fileno(),
> @@ -398,3 +398,6 @@ def launch_and_wait(self):
>  console_drainer.start()
>  self.log.info('VM launched, waiting for boot confirmation from 
> guest')
>  cloudinit.wait_for_phone_home(('0.0.0.0', self.phone_home_port), 
> self.name)
> +if set_up_ssh_connection:
> +self.log.info('Setting up the SSH connection')
> +self.ssh_connect(self.username, self.ssh_key)
> diff --git a/tests/acceptance/virtiofs_submounts.py 
> b/tests/acceptance/virtiofs_submounts.py
> index e10a935ac4..e019d3b896 100644
> --- a/tests/acceptance/virtiofs_submounts.py
> +++ b/tests/acceptance/virtiofs_submounts.py
> @@ -136,7 +136,6 @@ def set_up_virtiofs(self):
>  
>  def launch_vm(self):
>  self.launch_and_wait()
> -self.ssh_connect('root', self.ssh_key)
>  
>  def set_up_nested_mounts(self):
>  scratch_dir = os.path.join(self.shared_dir, 'scratch')
> 
What about the launch_and_wait() calls in boot_linux.py? Don't you want to
force the SSH connection off there?
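
What I have in mind is roughly the following. This is a sketch with a
stand-in class (FakeLinuxTest is not real code, it only records whether
the SSH setup was requested), assuming the boot-only tests do not need a
session:

```python
class FakeLinuxTest:
    """Stand-in for LinuxTest; only records whether SSH setup was requested."""

    def __init__(self):
        self.ssh_connected = False

    def ssh_connect(self, username, credential):
        self.ssh_connected = True

    def launch_and_wait(self, set_up_ssh_connection=True):
        # The real method launches the VM and waits for cloud-init here.
        if set_up_ssh_connection:
            self.ssh_connect('root', 'dummy-key')


# A boot-only test would opt out explicitly.
boot_only = FakeLinuxTest()
boot_only.launch_and_wait(set_up_ssh_connection=False)
print(boot_only.ssh_connected)
```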

Thanks

Eric




Re: [PATCH v2 06/10] Acceptance Tests: make username/password configurable

2021-03-24 Thread Auger Eric



On 3/23/21 11:15 PM, Cleber Rosa wrote:
> This makes the username/password used for authentication configurable,
> because some guest operating systems may have restrictions on accounts
> to be used for logins, and it just makes it better documented.
> 
> Signed-off-by: Cleber Rosa 
Reviewed-by: Eric Auger 

Eric
> ---
>  tests/acceptance/avocado_qemu/__init__.py | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/tests/acceptance/avocado_qemu/__init__.py 
> b/tests/acceptance/avocado_qemu/__init__.py
> index e75b002c70..535f63a48d 100644
> --- a/tests/acceptance/avocado_qemu/__init__.py
> +++ b/tests/acceptance/avocado_qemu/__init__.py
> @@ -308,6 +308,8 @@ class LinuxTest(Test, LinuxSSHMixIn):
>  
>  timeout = 900
>  chksum = None
> +username = 'root'
> +password = 'password'
>  
>  def setUp(self, ssh_pubkey=None, network_device_type='virtio-net'):
>  super(LinuxTest, self).setUp()
> @@ -370,8 +372,8 @@ def prepare_cloudinit(self, ssh_pubkey=None):
>  with open(ssh_pubkey) as pubkey:
>  pubkey_content = pubkey.read()
>  cloudinit.iso(cloudinit_iso, self.name,
> -  username='root',
> -  password='password',
> +  username=self.username,
> +  password=self.password,
># QEMU's hard coded usermode router address
>phone_home_host='10.0.2.2',
>phone_home_port=self.phone_home_port,
> 
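
As an illustration of how a test could use this, a subclass would only
need to override the class attributes. The sketch below uses stand-in
classes and a hypothetical 'fedora' account; cloudinit_credentials() is
an invented helper that just shows which values would be handed to
cloud-init:

```python
class LinuxTestDefaults:
    """Stand-in holding the two class attributes added by the patch."""
    username = 'root'
    password = 'password'

    def cloudinit_credentials(self):
        # prepare_cloudinit() in the patch passes these to cloudinit.iso().
        return {'username': self.username, 'password': self.password}


class FedoraUserTest(LinuxTestDefaults):
    username = 'fedora'   # hypothetical guest account


print(FedoraUserTest().cloudinit_credentials())
```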




Re: [PATCH v2 05/10] Acceptance Tests: add port redirection for ssh by default

2021-03-24 Thread Auger Eric
Hi Cleber,

On 3/23/21 11:15 PM, Cleber Rosa wrote:
> For users of the LinuxTest class, let's set up the VM with the port
> redirection for SSH, instead of requiring each test to set the same
This also sets the network device to virtio-net, which may be worth
mentioning here in the commit message.
> arguments.
> 
> Signed-off-by: Cleber Rosa 
Reviewed-by: Eric Auger 

Thanks

Eric

> ---
>  tests/acceptance/avocado_qemu/__init__.py | 4 +++-
>  tests/acceptance/virtiofs_submounts.py| 4 
>  2 files changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/tests/acceptance/avocado_qemu/__init__.py 
> b/tests/acceptance/avocado_qemu/__init__.py
> index 67f75f66e5..e75b002c70 100644
> --- a/tests/acceptance/avocado_qemu/__init__.py
> +++ b/tests/acceptance/avocado_qemu/__init__.py
> @@ -309,10 +309,12 @@ class LinuxTest(Test, LinuxSSHMixIn):
>  timeout = 900
>  chksum = None
>  
> -def setUp(self, ssh_pubkey=None):
> +def setUp(self, ssh_pubkey=None, network_device_type='virtio-net'):
>  super(LinuxTest, self).setUp()
>  self.vm.add_args('-smp', '2')
>  self.vm.add_args('-m', '1024')
> +self.vm.add_args('-netdev', 'user,id=vnet,hostfwd=:127.0.0.1:0-:22',
> + '-device', '%s,netdev=vnet' % network_device_type)
>  self.set_up_boot()
>  if ssh_pubkey is None:
>  ssh_pubkey, self.ssh_key = self.set_up_existing_ssh_keys()
> diff --git a/tests/acceptance/virtiofs_submounts.py 
> b/tests/acceptance/virtiofs_submounts.py
> index bed8ce44df..e10a935ac4 100644
> --- a/tests/acceptance/virtiofs_submounts.py
> +++ b/tests/acceptance/virtiofs_submounts.py
> @@ -207,10 +207,6 @@ def setUp(self):
>  self.vm.add_args('-kernel', vmlinuz,
>   '-append', 'console=ttyS0 root=/dev/sda1')
>  
> -# Allow us to connect to SSH
> -self.vm.add_args('-netdev', 'user,id=vnet,hostfwd=:127.0.0.1:0-:22',
> - '-device', 'virtio-net,netdev=vnet')
> -
>  self.require_accelerator("kvm")
>  self.vm.add_args('-accel', 'kvm')
>  
> 




Re: [PATCH v2 04/10] Acceptance Tests: move useful ssh methods to base class

2021-03-24 Thread Auger Eric
Hi Cleber,

On 3/23/21 11:15 PM, Cleber Rosa wrote:
> Both the virtiofs submounts and the linux ssh mips malta tests
> contains useful methods related to ssh that deserve to be made
> available to other tests.  Let's move them to the base LinuxTest
nit: strictly speaking they are moved to another class which is
inherited by LinuxTest, right?
> class.
> 
> The method that helps with setting up an ssh connection will now
> support both key and password based authentication, defaulting to key
> based.
> 
> Signed-off-by: Cleber Rosa 
> Reviewed-by: Wainer dos Santos Moschetta 
> Reviewed-by: Willian Rampazzo 
> ---
>  tests/acceptance/avocado_qemu/__init__.py | 48 ++-
>  tests/acceptance/linux_ssh_mips_malta.py  | 38 ++
>  tests/acceptance/virtiofs_submounts.py| 37 -
>  3 files changed, 50 insertions(+), 73 deletions(-)
> 
> diff --git a/tests/acceptance/avocado_qemu/__init__.py 
> b/tests/acceptance/avocado_qemu/__init__.py
> index 83b1741ec8..67f75f66e5 100644
> --- a/tests/acceptance/avocado_qemu/__init__.py
> +++ b/tests/acceptance/avocado_qemu/__init__.py
> @@ -20,6 +20,7 @@
>  from avocado.utils import cloudinit
>  from avocado.utils import datadrainer
>  from avocado.utils import network
> +from avocado.utils import ssh
>  from avocado.utils import vmimage
>  from avocado.utils.path import find_command
>  
> @@ -43,6 +44,8 @@
>  from qemu.accel import kvm_available
>  from qemu.accel import tcg_available
>  from qemu.machine import QEMUMachine
> +from qemu.utils import get_info_usernet_hostfwd_port
> +
>  
>  def is_readable_executable_file(path):
>  return os.path.isfile(path) and os.access(path, os.R_OK | os.X_OK)
> @@ -253,7 +256,50 @@ def fetch_asset(self, name,
>  cancel_on_missing=cancel_on_missing)
>  
>  
> -class LinuxTest(Test):
> +class LinuxSSHMixIn:
> +"""Contains utility methods for interacting with a guest via SSH."""
> +
> +def ssh_connect(self, username, credential, credential_is_key=True):
> +self.ssh_logger = logging.getLogger('ssh')
> +res = self.vm.command('human-monitor-command',
> +  command_line='info usernet')
> +port = get_info_usernet_hostfwd_port(res)
> +self.assertIsNotNone(port)
> +self.assertGreater(port, 0)
> +self.log.debug('sshd listening on port: %d', port)
> +if credential_is_key:
> +self.ssh_session = ssh.Session('127.0.0.1', port=port,
> +   user=username, key=credential)
> +else:
> +self.ssh_session = ssh.Session('127.0.0.1', port=port,
> +   user=username, 
> password=credential)
> +for i in range(10):
> +try:
> +self.ssh_session.connect()
> +return
> +except:
> +time.sleep(4)
> +pass
> +self.fail('ssh connection timeout')
> +
> +def ssh_command(self, command):
> +self.ssh_logger.info(command)
> +result = self.ssh_session.cmd(command)
> +stdout_lines = [line.rstrip() for line
> +in result.stdout_text.splitlines()]
> +for line in stdout_lines:
> +self.ssh_logger.info(line)
> +stderr_lines = [line.rstrip() for line
> +in result.stderr_text.splitlines()]
> +for line in stderr_lines:
> +self.ssh_logger.warning(line)
> +
> +self.assertEqual(result.exit_status, 0,
> + f'Guest command failed: {command}')
> +return stdout_lines, stderr_lines
> +
> +
> +class LinuxTest(Test, LinuxSSHMixIn):
>  """Facilitates having a cloud-image Linux based available.
>  
>  For tests that indend to interact with guests, this is a better choice
> diff --git a/tests/acceptance/linux_ssh_mips_malta.py 
> b/tests/acceptance/linux_ssh_mips_malta.py
> index 052008f02d..3f590a081f 100644
> --- a/tests/acceptance/linux_ssh_mips_malta.py
> +++ b/tests/acceptance/linux_ssh_mips_malta.py
> @@ -12,7 +12,7 @@
>  import time
>  
>  from avocado import skipUnless
> -from avocado_qemu import Test
> +from avocado_qemu import Test, LinuxSSHMixIn
>  from avocado_qemu import wait_for_console_pattern
>  from avocado.utils import process
>  from avocado.utils import archive
> @@ -21,7 +21,7 @@
>  from qemu.utils import get_info_usernet_hostfwd_port
Can't you remove this now?
>  
>  
> -class LinuxSSH(Test):
> +class LinuxSSH(Test, LinuxSSHMixIn):
Out of curiosity, why can't it be migrated to a LinuxTest?
>  
>  timeout = 150 # Not for 'configure --enable-debug --enable-debug-tcg'
>  
> @@ -72,41 +72,9 @@ def get_kernel_info(self, endianess, wordsize):
>  def setUp(self):
>  super(LinuxSSH, self).setUp()
>  
> -def ssh_connect(self, username, password):
> -self.ssh_logger = logging.getLogger('ssh')
> -res = 

Re: [PATCH v2 03/10] Python: add utility function for retrieving port redirection

2021-03-24 Thread Auger Eric
Hi Cleber,

On 3/23/21 11:15 PM, Cleber Rosa wrote:
> Slightly different versions for the same utility code are currently
> present on different locations.  This unifies them all, giving
> preference to the version from virtiofs_submounts.py, because of the
> last tweaks added to it.
> 
> While at it, this adds a "qemu.utils" module to host the utility
> function and a test.
> 
> Signed-off-by: Cleber Rosa 
> Reviewed-by: Wainer dos Santos Moschetta 
> ---
>  python/qemu/utils.py | 35 
>  tests/acceptance/info_usernet.py | 29 
>  tests/acceptance/linux_ssh_mips_malta.py | 16 +--
>  tests/acceptance/virtiofs_submounts.py   | 21 --
>  tests/vm/basevm.py   |  7 ++---
>  5 files changed, 78 insertions(+), 30 deletions(-)
>  create mode 100644 python/qemu/utils.py
>  create mode 100644 tests/acceptance/info_usernet.py
> 
> diff --git a/python/qemu/utils.py b/python/qemu/utils.py
> new file mode 100644
> index 00..89a246ab30
> --- /dev/null
> +++ b/python/qemu/utils.py
> @@ -0,0 +1,35 @@
> +"""
> +QEMU utility library
> +
> +This offers miscellaneous utility functions, which may not be easily
> +distinguishable or numerous to be in their own module.
> +"""
> +
> +# Copyright (C) 2021 Red Hat Inc.
> +#
> +# Authors:
> +#  Cleber Rosa 
> +#
> +# This work is licensed under the terms of the GNU GPL, version 2.  See
> +# the COPYING file in the top-level directory.
> +#
> +
> +import re
> +from typing import Optional
> +
> +
> +def get_info_usernet_hostfwd_port(info_usernet_output: str) -> Optional[int]:
> +"""
> +Returns the port given to the hostfwd parameter via info usernet
> +
> +:param info_usernet_output: output generated by hmp command "info 
> usernet"
> +:param info_usernet_output: str
> +:return: the port number allocated by the hostfwd option
> +:rtype: int
> +"""
> +for line in info_usernet_output.split('\r\n'):
> +regex = r'TCP.HOST_FORWARD.*127\.0\.0\.1\s+(\d+)\s+10\.'
> +match = re.search(regex, line)
> +if match is not None:
> +return int(match[1])
> +return None
> diff --git a/tests/acceptance/info_usernet.py 
> b/tests/acceptance/info_usernet.py
> new file mode 100644
> index 00..9c1fd903a0
> --- /dev/null
> +++ b/tests/acceptance/info_usernet.py
> @@ -0,0 +1,29 @@
> +# Test for the hmp command "info usernet"
> +#
> +# Copyright (c) 2021 Red Hat, Inc.
> +#
> +# Author:
> +#  Cleber Rosa 
> +#
> +# This work is licensed under the terms of the GNU GPL, version 2 or
> +# later.  See the COPYING file in the top-level directory.
> +
> +from avocado_qemu import Test
> +
> +from qemu.utils import get_info_usernet_hostfwd_port
> +
> +
> +class InfoUsernet(Test):
> +
> +def test_hostfwd(self):
> +self.vm.add_args('-netdev', 'user,id=vnet,hostfwd=:127.0.0.1:0-:22')
> +self.vm.launch()
> +res = self.vm.command('human-monitor-command',
> +  command_line='info usernet')
> +port = get_info_usernet_hostfwd_port(res)
> +self.assertIsNotNone(port,
> + ('"info usernet" output content does not seem 
> to '
> +  'contain the redirected port'))
> +self.assertGreater(port, 0,
> +   ('Found a redirected port that is not greater 
> than'
> +' zero'))
> diff --git a/tests/acceptance/linux_ssh_mips_malta.py 
> b/tests/acceptance/linux_ssh_mips_malta.py
> index 6dbd02d49d..052008f02d 100644
> --- a/tests/acceptance/linux_ssh_mips_malta.py
> +++ b/tests/acceptance/linux_ssh_mips_malta.py
> @@ -18,6 +18,8 @@
>  from avocado.utils import archive
>  from avocado.utils import ssh
>  
> +from qemu.utils import get_info_usernet_hostfwd_port
> +
>  
>  class LinuxSSH(Test):
>  
> @@ -70,18 +72,14 @@ def get_kernel_info(self, endianess, wordsize):
>  def setUp(self):
>  super(LinuxSSH, self).setUp()
>  
> -def get_portfwd(self):
> +def ssh_connect(self, username, password):
> +self.ssh_logger = logging.getLogger('ssh')
>  res = self.vm.command('human-monitor-command',
>command_line='info usernet')
> -line = res.split('\r\n')[2]
> -port = re.split(r'.*TCP.HOST_FORWARD.*127\.0\.0\.1 (\d+)\s+10\..*',
> -line)[1]
> +port = get_info_usernet_hostfwd_port(res)
> +if not port:
> +self.cancel("Failed to retrieve SSH port")
>  self.log.debug("sshd listening on port:" + port)
> -return port
> -
> -def ssh_connect(self, username, password):
> -self.ssh_logger = logging.getLogger('ssh')
> -port = self.get_portfwd()
>  self.ssh_session = ssh.Session(self.VM_IP, port=int(port),
> user=username, password=password)
>  for i in range(10):
> diff 
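
For anyone wanting to try the new helper outside of a test run, the
sketch below copies its logic and feeds it a made-up sample shaped like
the "info usernet" output (the file descriptor and port numbers are
arbitrary, chosen only for illustration):

```python
import re
from typing import Optional


def get_info_usernet_hostfwd_port(info_usernet_output: str) -> Optional[int]:
    """Same logic as the helper added by the patch."""
    for line in info_usernet_output.split('\r\n'):
        regex = r'TCP.HOST_FORWARD.*127\.0\.0\.1\s+(\d+)\s+10\.'
        match = re.search(regex, line)
        if match is not None:
            return int(match[1])
    return None


# Made-up sample; only the HOST_FORWARD line matters to the regex.
SAMPLE = ('Hub -1 (vnet):\r\n'
          '  Protocol[State]    FD  Source Address  Port   '
          'Dest. Address  Port RecvQ SendQ\r\n'
          '  TCP[HOST_FORWARD]  13       127.0.0.1 36873       '
          '10.0.2.15    22     0     0\r\n')

port = get_info_usernet_hostfwd_port(SAMPLE)
print(port)
```

Note that the helper returns an int (or None), so a caller that builds a
log message with string concatenation, as in the quoted
"sshd listening on port:" + port, would need to format it with %d or
str() instead.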

Re: [PATCH v2 01/10] tests/acceptance/virtiofs_submounts.py: add missing accel tag

2021-03-24 Thread Auger Eric
Hi Cleber,

On 3/23/21 11:15 PM, Cleber Rosa wrote:
> The tag is useful to select tests that depend/use a particular
> feature.
> 
> Signed-off-by: Cleber Rosa 
> Reviewed-by: Wainer dos Santos Moschetta 
> Reviewed-by: Willian Rampazzo 
> ---
>  tests/acceptance/virtiofs_submounts.py | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/tests/acceptance/virtiofs_submounts.py 
> b/tests/acceptance/virtiofs_submounts.py
> index 46fa65392a..5b74ce2929 100644
> --- a/tests/acceptance/virtiofs_submounts.py
> +++ b/tests/acceptance/virtiofs_submounts.py
> @@ -70,6 +70,7 @@ def test_something_that_needs_cmd1_and_cmd2(self):
>  class VirtiofsSubmountsTest(LinuxTest):
>  """
>  :avocado: tags=arch:x86_64
> +:avocado: tags=accel:kvm
>  """
>  
>  def get_portfwd(self):
> 
Reviewed-by: Eric Auger 

Thanks

Eric




Re: [PATCH v2 02/10] tests/acceptance/virtiofs_submounts.py: evaluate string not length

2021-03-24 Thread Auger Eric
Hi Cleber,

On 3/23/21 11:15 PM, Cleber Rosa wrote:
> If the vmlinuz variable is set to anything that evaluates to True,
> then the respective arguments should be set.  If the variable contains
> an empty string, than it will evaluate to False, and the extra
s/than/then
> arguments will not be set.
>
> This keeps the same logic, but improves readability a bit.
> 
> Signed-off-by: Cleber Rosa 
> Reviewed-by: Beraldo Leal 
> ---
>  tests/acceptance/virtiofs_submounts.py | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tests/acceptance/virtiofs_submounts.py 
> b/tests/acceptance/virtiofs_submounts.py
> index 5b74ce2929..ca64b76301 100644
> --- a/tests/acceptance/virtiofs_submounts.py
> +++ b/tests/acceptance/virtiofs_submounts.py
> @@ -251,7 +251,7 @@ def setUp(self):
>  
>  super(VirtiofsSubmountsTest, self).setUp(pubkey)
>  
> -if len(vmlinuz) > 0:
> +if vmlinuz:
>  self.vm.add_args('-kernel', vmlinuz,
>   '-append', 'console=ttyS0 root=/dev/sda1')
>  
> 
Reviewed-by: Eric Auger 

Thanks

Eric
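
A small sketch of why the two conditions in the patch agree for strings,
and where they would differ. The helper names by_length and by_truthiness
are illustrative only and are not part of the test.

```python
def by_length(vmlinuz):
    # Original form: explicit length check.
    return len(vmlinuz) > 0


def by_truthiness(vmlinuz):
    # Patched form: rely on the truth value of the string.
    return bool(vmlinuz)


# For any string, both forms give the same answer.
for value in ("", "/boot/vmlinuz", " "):
    assert by_length(value) == by_truthiness(value)

# The truthiness form also tolerates None, whereas len(None) raises TypeError.
assert by_truthiness(None) is False
```

So the change keeps the behaviour for the string case, as the commit
message says, and is slightly more forgiving if the variable is unset.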




Re: [RFC RESEND PATCH 2/4] hw/pci: Add iommu option for pci root bus

2021-03-14 Thread Auger Eric
Hi Xingang,

On 3/11/21 1:24 PM, Wang Xingang wrote:
> Hi Eric,
> 
> On 2021/3/10 18:24, Auger Eric wrote:
>> Hi Xingang,
>>
>> On 2/27/21 9:33 AM, Wang Xingang wrote:
>>> From: Xingang Wang 
>>>
>>> This add iommu option for pci root bus, including primary bus
>>> and pxb root bus. Default option is set to true, and the option
>>> is valid only if the iommu option for machine is properly set.
>>>
>>> Signed-off-by: Xingang Wang 
>>> Signed-off-by: Jiahui Cen 
>>> ---
>>>   hw/arm/virt.c   | 29 +
>>>   hw/pci-bridge/pci_expander_bridge.c |  6 ++
>>>   hw/pci/pci.c    |  2 +-
>>>   include/hw/arm/virt.h   |  1 +
>>>   4 files changed, 37 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
>>> index 371147f3ae..0c9e549759 100644
>>> --- a/hw/arm/virt.c
>>> +++ b/hw/arm/virt.c
>>> @@ -79,6 +79,7 @@
>>>   #include "hw/virtio/virtio-iommu.h"
>>>   #include "hw/char/pl011.h"
>>>   #include "qemu/guest-random.h"
>>> +#include "include/hw/pci/pci_bus.h"
>>>     #define DEFINE_VIRT_MACHINE_LATEST(major, minor, latest) \
>>>   static void virt_##major##_##minor##_class_init(ObjectClass *oc, \
>>> @@ -1232,6 +1233,10 @@ static void create_smmu(const VirtMachineState
>>> *vms,
>>>     dev = qdev_new("arm-smmuv3");
>>>   +    if (vms->primary_bus_iommu) {
>>> +    bus->flags |= PCI_BUS_IOMMU;
>>> +    }
>>> +
>>>   object_property_set_link(OBJECT(dev), "primary-bus", OBJECT(bus),
>>>    &error_abort);
>>>   sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), &error_fatal);
>>> @@ -2305,6 +2310,20 @@ static void virt_set_iommu(Object *obj, const
>>> char *value, Error **errp)
>>>   }
>>>   }
>>>   +static bool virt_get_primary_bus_iommu(Object *obj, Error **errp)
>>> +{
>>> +    VirtMachineState *vms = VIRT_MACHINE(obj);
>>> +
>>> +    return vms->primary_bus_iommu;
>>> +}
>>> +
>>> +static void virt_set_primary_bus_iommu(Object *obj, bool value,
>>> Error **errp)
>>> +{
>>> +    VirtMachineState *vms = VIRT_MACHINE(obj);
>>> +
>>> +    vms->primary_bus_iommu = value;
>>> +}
>>> +
>>>   static CpuInstanceProperties
>>>   virt_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
>>>   {
>>> @@ -2629,6 +2648,13 @@ static void
>>> virt_machine_class_init(ObjectClass *oc, void *data)
>>>     "Set the IOMMU type. "
>>>     "Valid values are none
>>> and smmuv3");
>>>   +    object_class_property_add_bool(oc, "primary_bus_iommu",
>>> +  virt_get_primary_bus_iommu,
>>> +  virt_set_primary_bus_iommu);
>>> +    object_class_property_set_description(oc, "primary_bus_iommu",
>>> +  "Set on/off to
>>> enable/disable "
>>> +  "iommu for primary bus");
>>> +
>>>   object_class_property_add_bool(oc, "ras", virt_get_ras,
>>>  virt_set_ras);
>>>   object_class_property_set_description(oc, "ras",
>>> @@ -2696,6 +2722,9 @@ static void virt_instance_init(Object *obj)
>>>   /* Default disallows iommu instantiation */
>>>   vms->iommu = VIRT_IOMMU_NONE;
>>>   +    /* Iommu is enabled by default for primary bus */
>>> +    vms->primary_bus_iommu = true;
>>> +
>>>   /* Default disallows RAS instantiation */
>>>   vms->ras = false;
>>>   diff --git a/hw/pci-bridge/pci_expander_bridge.c
>>> b/hw/pci-bridge/pci_expander_bridge.c
>>> index aedded1064..0412656265 100644
>>> --- a/hw/pci-bridge/pci_expander_bridge.c
>>> +++ b/hw/pci-bridge/pci_expander_bridge.c
>>> @@ -57,6 +57,7 @@ struct PXBDev {
>>>     uint8_t bus_nr;
>>>   uint16_t numa_node;
>>> +    bool iommu;
>>>   };
>>>     static PXBDev *convert_to_pxb(PCIDevice *dev)
>>> @@ -254,6 +255,10 @@ static void pxb_dev_realize_common(PCIDevice

Re: [RFC RESEND PATCH 0/4] hw/arm/virt-acpi-build: Introduce iommu option for pci root bus

2021-03-14 Thread Auger Eric
Hi Xingang,

On 3/11/21 12:57 PM, Wang Xingang wrote:
> Hi Eric,
> 
> On 2021/3/10 18:18, Auger Eric wrote:
>> Hi Xingang,
>>
>> On 3/10/21 3:13 AM, Wang Xingang wrote:
>>> Hi Eric,
>>>
>>> On 2021/3/9 22:36, Auger Eric wrote:
>>>> Hi,
>>>> On 2/27/21 9:33 AM, Wang Xingang wrote:
>>>>> From: Xingang Wang 
>>>>>
>>>>> These patches add support for configure iommu on/off for pci root bus,
>>>>> including primary bus and pxb root bus. At present, All root bus
>>>>> will go
>>>>> through iommu when iommu is configured, which is not flexible.
>>>>>
>>>>> So this add option to enable/disable iommu for primary bus and pxb
>>>>> root bus.
>>>>> When iommu is enabled for the root bus, devices attached to it will go
>>>>> through iommu. When iommu is disabled for the root bus, devices
>>>>> will not
>>>>> go through iommu accordingly.
>>>>
>>>> Please could you give an example of the qemu command line for which the
>>>> new option is useful for you. This would help me to understand your
>>>> pcie/pci topology and also make sure I test it with the smmu.

It looks like a guest issue. I have switched to a fedora guest and it
works now with the following command line:

./build/qemu-system-aarch64 -M virt,gic-version=host -cpu host \
-smp 8 -m 16G -display none --enable-kvm -serial \
-drive file=/home/augere/VM/IMAGES/aarch64-vm0-fed30.raw,format=raw,if=none,cache=writethrough,id=drv0 \
-netdev tap,id=nic0,script=/home/augere/TEST/SCRIPTS/qemu-ifup,downscript=/home/augere/TEST/SCRIPTS/qemu-ifdown,vhost=on \
-drive if=pflash,format=raw,file=/home/augere/VM/UEFI/flash0.img,readonly \
-drive if=pflash,format=raw,file=/home/augere/VM/UEFI/flash1.img \
-net none -d guest_errors \
-device virtio-blk-pci,bus=pcie.0,scsi=off,drive=drv0,id=virtio-disk0,bootindex=1,werror=stop,rerror=stop \
-device pxb-pcie,id=bridge1,bus=pcie.0,bus_nr=254 \
-device pcie-root-port,port=0x0,chassis=4,id=pcie.4,bus=bridge1 \
-device virtio-net-pci,bus=pcie.4,netdev=nic0,mac=6a:f5:10:b1:3d:d2

It also works with your patches.

Thanks

Eric

>>>>
>>>> Thank you in advance
>>>>
>>>> Best Regards
>>>>
>>>> Eric
>>>>>
>>>>> Xingang Wang (4):
>>>>>     pci: Add PCI_BUS_IOMMU property
>>>>>     hw/pci: Add iommu option for pci root bus
>>>>>     hw/pci: Add pci_root_bus_max_bus
>>>>>     hw/arm/virt-acpi-build: Add explicit idmap info in IORT table
>>>>>
>>>>>    hw/arm/virt-acpi-build.c    | 92
>>>>> +
>>>>>    hw/arm/virt.c   | 29 +
>>>>>    hw/pci-bridge/pci_expander_bridge.c |  6 ++
>>>>>    hw/pci/pci.c    | 35 ++-
>>>>>    include/hw/arm/virt.h   |  1 +
>>>>>    include/hw/pci/pci.h    |  1 +
>>>>>    include/hw/pci/pci_bus.h    | 13 
>>>>>    7 files changed, 153 insertions(+), 24 deletions(-)
>>>>>
>>>>
>>>> .
>>>>
>>>
>>> Thanks for your advice.
>>>
>>> I test this with the following script, in which i add two options.
>>>
>>> The option `primary_bus_iommu=false(or true)` for `-machine
>>> virt,iommu=smmuv3`, this helps to enable/disable whether primary bus go
>>> through iommu.
>>>
>>> The other option `iommu=false` or `iommu=true` for `-device pxb-pcie`
>>> helps to enable/disable whether pxb root bus go through iommu.
>>>
>>>> #!/bin/sh
>>>>
>>>> /path/to/qemu/build/aarch64-softmmu/qemu-system-aarch64 \
>>>> -enable-kvm \
>>>> -cpu host \
>>>> -kernel /path/to/linux/arch/arm64/boot/Image \
>>>> -m 16G \
>>>> -smp 8,sockets=8,cores=1,threads=1 \
>>>> -machine
>>>> virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3,primary_bus_iommu=false
>>>>
>>>> \
>>>> -drive
>>>> file=./QEMU_EFI-pflash.raw,if=pflash,format=raw,unit=0,readonly=on \
>>>> -device
>>>> pxb-pcie,bus_nr=0x10,id=pci.10,bus=pcie.0,addr=0x3.0x1,iommu=false \
>>>> -device
>>>> pxb-pcie,bus_nr=0x20,id=pci.20,bus=pcie.0,addr=0x3.0x2,iommu=true \
>>>> -device
>>>> pxb-pcie,bus_nr=0x23,id=pci.30,bus=pcie.0,addr=0x3.0x3,iommu=true \
>>>> -de

Re: [PATCH v2 2/2] hw/arm/virt: KVM: The IPA lower bound is 32

2021-03-10 Thread Auger Eric
Hi Drew,

On 3/10/21 2:52 PM, Andrew Jones wrote:
> The virt machine already checks KVM_CAP_ARM_VM_IPA_SIZE to get the
> upper bound of the IPA size. If that bound is lower than the highest
> possible GPA for the machine, then QEMU will error out. However, the
> IPA is set to 40 when the highest GPA is less than or equal to 40,
> even when KVM may support an IPA limit as low as 32. This means KVM
> may fail the VM creation unnecessarily. Additionally, 40 is selected
> with the value 0, which means use the default, and that gets around
> a check in some versions of KVM, causing a difficult to debug fail.
> Always use the IPA size that corresponds to the highest possible GPA,
> unless it's lower than 32, in which case use 32. Also, we must still
> use 0 when KVM only supports the legacy fixed 40 bit IPA.
> 
> Suggested-by: Marc Zyngier 
> Signed-off-by: Andrew Jones 
Reviewed-by: Eric Auger 

Thanks

Eric
> ---

>  hw/arm/virt.c| 23 ---
>  target/arm/kvm.c |  4 +++-
>  target/arm/kvm_arm.h |  6 --
>  3 files changed, 23 insertions(+), 10 deletions(-)
> 
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 371147f3ae9c..3ed94d24d70b 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -2534,27 +2534,36 @@ static HotplugHandler 
> *virt_machine_get_hotplug_handler(MachineState *machine,
>  static int virt_kvm_type(MachineState *ms, const char *type_str)
>  {
>  VirtMachineState *vms = VIRT_MACHINE(ms);
> -int max_vm_pa_size = kvm_arm_get_max_vm_ipa_size(ms);
> -int requested_pa_size;
> +int max_vm_pa_size, requested_pa_size;
> +bool fixed_ipa;
> +
> +max_vm_pa_size = kvm_arm_get_max_vm_ipa_size(ms, &fixed_ipa);
>  
>  /* we freeze the memory map to compute the highest gpa */
>  virt_set_memmap(vms);
>  
>  requested_pa_size = 64 - clz64(vms->highest_gpa);
>  
> +/*
> + * KVM requires the IPA size to be at least 32 bits.
> + */
> +if (requested_pa_size < 32) {
> +requested_pa_size = 32;
> +}
> +
>  if (requested_pa_size > max_vm_pa_size) {
>  error_report("-m and ,maxmem option values "
>   "require an IPA range (%d bits) larger than "
>   "the one supported by the host (%d bits)",
>   requested_pa_size, max_vm_pa_size);
> -   exit(1);
> +exit(1);
>  }
>  /*
> - * By default we return 0 which corresponds to an implicit legacy
> - * 40b IPA setting. Otherwise we return the actual requested PA
> - * logsize
> + * We return the requested PA log size, unless KVM only supports
> + * the implicit legacy 40b IPA setting, in which case the kvm_type
> + * must be 0.
>   */
> -return requested_pa_size > 40 ? requested_pa_size : 0;
> +return fixed_ipa ? 0 : requested_pa_size;
>  }
>  
>  static void virt_machine_class_init(ObjectClass *oc, void *data)
> diff --git a/target/arm/kvm.c b/target/arm/kvm.c
> index 00e124c81239..1fcab0e1d37b 100644
> --- a/target/arm/kvm.c
> +++ b/target/arm/kvm.c
> @@ -230,12 +230,14 @@ bool kvm_arm_pmu_supported(void)
>  return kvm_check_extension(kvm_state, KVM_CAP_ARM_PMU_V3);
>  }
>  
> -int kvm_arm_get_max_vm_ipa_size(MachineState *ms)
> +int kvm_arm_get_max_vm_ipa_size(MachineState *ms, bool *fixed_ipa)
>  {
>  KVMState *s = KVM_STATE(ms->accelerator);
>  int ret;
>  
>  ret = kvm_check_extension(s, KVM_CAP_ARM_VM_IPA_SIZE);
> +*fixed_ipa = ret <= 0;
> +
>  return ret > 0 ? ret : 40;
>  }
>  
> diff --git a/target/arm/kvm_arm.h b/target/arm/kvm_arm.h
> index eb81b7059eb1..d36d76403ff2 100644
> --- a/target/arm/kvm_arm.h
> +++ b/target/arm/kvm_arm.h
> @@ -311,10 +311,12 @@ bool kvm_arm_sve_supported(void);
>  /**
>   * kvm_arm_get_max_vm_ipa_size:
>   * @ms: Machine state handle
> + * @fixed_ipa: True when the IPA limit is fixed at 40. This is the case
> + * for legacy KVM.
>   *
>   * Returns the number of bits in the IPA address space supported by KVM
>   */
> -int kvm_arm_get_max_vm_ipa_size(MachineState *ms);
> +int kvm_arm_get_max_vm_ipa_size(MachineState *ms, bool *fixed_ipa);
>  
>  /**
>   * kvm_arm_sync_mpstate_to_kvm:
> @@ -409,7 +411,7 @@ static inline void kvm_arm_add_vcpu_properties(Object 
> *obj)
>  g_assert_not_reached();
>  }
>  
> -static inline int kvm_arm_get_max_vm_ipa_size(MachineState *ms)
> +static inline int kvm_arm_get_max_vm_ipa_size(MachineState *ms, bool 
> *fixed_ipa)
>  {
>  g_assert_not_reached();
>  }
> 
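
The selection logic described in the commit message can be modelled in a
few lines. This is only a sketch under stated assumptions: cap_value
stands for whatever the KVM_CAP_ARM_VM_IPA_SIZE check returns, the Python
function names mirror the C ones for readability, and the error handling
is replaced by an exception.

```python
def clz64(value):
    """Leading zero bits of a 64-bit value; 64 for 0, like QEMU's clz64()."""
    assert 0 <= value < 1 << 64
    return 64 - value.bit_length()


def max_vm_ipa_size(cap_value):
    """Model of kvm_arm_get_max_vm_ipa_size(): returns (limit, fixed_ipa)."""
    fixed_ipa = cap_value <= 0          # legacy KVM: fixed 40-bit IPA
    return (cap_value if cap_value > 0 else 40), fixed_ipa


def virt_kvm_type(highest_gpa, cap_value):
    """Model of the patched virt_kvm_type(): returns the kvm_type value."""
    max_bits, fixed_ipa = max_vm_ipa_size(cap_value)
    requested = 64 - clz64(highest_gpa)
    requested = max(requested, 32)      # KVM needs at least 32 bits of IPA
    if requested > max_bits:
        raise ValueError("IPA range larger than the host supports")
    # 0 means "use the implicit legacy 40-bit setting".
    return 0 if fixed_ipa else requested


# A 36-bit memory map now asks for 36 bits instead of the default 40.
assert virt_kvm_type((1 << 36) - 1, 48) == 36
# Small memory maps are rounded up to the 32-bit lower bound.
assert virt_kvm_type((1 << 30) - 1, 48) == 32
# Legacy KVM without the capability still gets 0.
assert virt_kvm_type((1 << 36) - 1, 0) == 0
```

With the pre-patch rule (return 0 unless more than 40 bits are needed),
the first case would have returned 0 and therefore requested 40 bits,
which is what may fail on hosts whose limit is below 40.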




Re: [PATCH v2 1/2] accel: kvm: Fix kvm_type invocation

2021-03-10 Thread Auger Eric
Hi Drew,

On 3/10/21 2:52 PM, Andrew Jones wrote:
> Prior to commit f2ce39b4f067 a MachineClass kvm_type method
> only needed to be registered to ensure it would be executed.
> With commit f2ce39b4f067 a kvm-type machine property must also
> be specified. hw/arm/virt relies on the kvm_type method to pass
> its selected IPA limit to KVM, but this is not exposed as a
> machine property. Restore the previous functionality of invoking
> kvm_type when it's present.

Ouch, good catch for this regression
> 
> Fixes: f2ce39b4f067 ("vl: make qemu_get_machine_opts static")
> Signed-off-by: Andrew Jones 
> ---
>  accel/kvm/kvm-all.c | 2 ++
>  include/hw/boards.h | 1 +
>  2 files changed, 3 insertions(+)
> 
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index f88a52393fe0..37b0a1861e72 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -2068,6 +2068,8 @@ static int kvm_init(MachineState *ms)
>  "kvm-type",
> &error_abort);
>  type = mc->kvm_type(ms, kvm_type);
> +} else if (mc->kvm_type) {
> +type = mc->kvm_type(ms, NULL);
>  }
>  
>  do {
> diff --git a/include/hw/boards.h b/include/hw/boards.h
> index a46dfe5d1a6a..327949967609 100644
> --- a/include/hw/boards.h
> +++ b/include/hw/boards.h
> @@ -128,6 +128,7 @@ typedef struct {
>   * @kvm_type:
>   *Return the type of KVM corresponding to the kvm-type string option or
>   *computed based on other criteria such as the host kernel capabilities.
> + *kvm-type may be NULL if it is not needed.
>   * @numa_mem_supported:
>   *true if '--numa node.mem' option is supported and false otherwise
>   * @smp_parse:
> 
Reviewed-by: Eric Auger 

Eric
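
To make the regression concrete, here is a minimal model of the dispatch
in kvm_init() after the fix. The function select_kvm_type and its
arguments are hypothetical stand-ins: kvm_type_property models the
optional "kvm-type" machine property (None when the machine does not
expose it), and the default of 0 is an assumption about the initial value
of the type variable.

```python
def select_kvm_type(kvm_type_method, kvm_type_property):
    """Return the VM type handed to KVM."""
    if kvm_type_property is not None:
        # Machine exposes a "kvm-type" property: pass its string value.
        return kvm_type_method(kvm_type_property)
    if kvm_type_method is not None:
        # Restored behaviour: call the hook even without the property.
        return kvm_type_method(None)
    return 0  # assumed default when the machine has no hook at all


def virt_hook(type_str):
    # Stand-in for hw/arm/virt, which ignores the string and computes
    # the IPA size itself; 36 is just an example value.
    return 36


assert select_kvm_type(virt_hook, None) == 36
assert select_kvm_type(None, None) == 0
```

Without the second branch, the virt machine hook would never run and the
type would stay at its default.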




Re: [RFC RESEND PATCH 1/4] pci: Add PCI_BUS_IOMMU property

2021-03-10 Thread Auger Eric
Hi Xingang,

On 2/27/21 9:33 AM, Wang Xingang wrote:
> From: Xingang Wang 
> 
> This Property can be useful to check whether this bus is attached to iommu.

Strictly speaking this is not a Property (QEMU property) but a flag
> 
> Signed-off-by: Xingang Wang 
> Signed-off-by: Jiahui Cen 
> ---
>  include/hw/pci/pci_bus.h | 13 +
>  1 file changed, 13 insertions(+)
> 
> diff --git a/include/hw/pci/pci_bus.h b/include/hw/pci/pci_bus.h
> index 347440d42c..42109e8a06 100644
> --- a/include/hw/pci/pci_bus.h
> +++ b/include/hw/pci/pci_bus.h
> @@ -24,6 +24,8 @@ enum PCIBusFlags {
>  PCI_BUS_IS_ROOT = 0x0001,
>  /* PCIe extended configuration space is accessible on this bus */
>  PCI_BUS_EXTENDED_CONFIG_SPACE   = 0x0002,
> +/* Iommu is enabled on this bus */
s/Iommu/IOMMU here and elsewhere
> +PCI_BUS_IOMMU   = 0x0004,
>  };
>  
>  struct PCIBus {
> @@ -63,4 +65,15 @@ static inline bool 
> pci_bus_allows_extended_config_space(PCIBus *bus)
>  return !!(bus->flags & PCI_BUS_EXTENDED_CONFIG_SPACE);
>  }
>  
> +static inline bool pci_bus_has_iommu(PCIBus *bus)
> +{
> +PCIBus *root_bus = bus;
> +
> +while (root_bus && !pci_bus_is_root(root_bus)) {
> +root_bus = pci_get_bus(root_bus->parent_dev);
> +}
> +
> +return !!(root_bus->flags & PCI_BUS_IOMMU);
> +}
> +
>  #endif /* QEMU_PCI_BUS_H */
> 
Eric
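
For readers who want to see the lookup in isolation, the walk to the root
bus can be sketched as follows. The Bus class and its parent field are
illustrative assumptions standing in for PCIBus and
pci_get_bus(bus->parent_dev); only the flag values come from the patch.

```python
from dataclasses import dataclass
from typing import Optional

PCI_BUS_IS_ROOT = 0x0001
PCI_BUS_EXTENDED_CONFIG_SPACE = 0x0002
PCI_BUS_IOMMU = 0x0004


@dataclass
class Bus:
    flags: int = 0
    parent: Optional["Bus"] = None  # bus the parent bridge device sits on


def pci_bus_has_iommu(bus):
    root = bus
    # Climb until a bus flagged as root is reached.
    while root is not None and not (root.flags & PCI_BUS_IS_ROOT):
        root = root.parent
    # This sketch also guards the "no root found" case; the quoted C helper
    # tests root_bus in the loop condition but not before the final
    # dereference.
    return root is not None and bool(root.flags & PCI_BUS_IOMMU)


root_with_iommu = Bus(flags=PCI_BUS_IS_ROOT | PCI_BUS_IOMMU)
root_without = Bus(flags=PCI_BUS_IS_ROOT)
child = Bus(parent=Bus(parent=root_with_iommu))

assert pci_bus_has_iommu(child)
assert not pci_bus_has_iommu(Bus(parent=root_without))
```

The point is that the flag lives on the root bus only, so every device
below it inherits the setting.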




Re: [RFC RESEND PATCH 0/4] hw/arm/virt-acpi-build: Introduce iommu option for pci root bus

2021-03-10 Thread Auger Eric
Hi Xingang,

On 3/10/21 3:13 AM, Wang Xingang wrote:
> Hi Eric,
> 
> On 2021/3/9 22:36, Auger Eric wrote:
>> Hi,
>> On 2/27/21 9:33 AM, Wang Xingang wrote:
>>> From: Xingang Wang 
>>>
>>> These patches add support for configure iommu on/off for pci root bus,
>>> including primary bus and pxb root bus. At present, All root bus will go
>>> through iommu when iommu is configured, which is not flexible.
>>>
>>> So this add option to enable/disable iommu for primary bus and pxb
>>> root bus.
>>> When iommu is enabled for the root bus, devices attached to it will go
>>> through iommu. When iommu is disabled for the root bus, devices will not
>>> go through iommu accordingly.
>>
>> Please could you give an example of the qemu command line for which the
>> new option is useful for you. This would help me to understand your
>> pcie/pci topology and also make sure I test it with the smmu.
>>
>> Thank you in advance
>>
>> Best Regards
>>
>> Eric
>>>
>>> Xingang Wang (4):
>>>    pci: Add PCI_BUS_IOMMU property
>>>    hw/pci: Add iommu option for pci root bus
>>>    hw/pci: Add pci_root_bus_max_bus
>>>    hw/arm/virt-acpi-build: Add explicit idmap info in IORT table
>>>
>>>   hw/arm/virt-acpi-build.c    | 92 +
>>>   hw/arm/virt.c   | 29 +
>>>   hw/pci-bridge/pci_expander_bridge.c |  6 ++
>>>   hw/pci/pci.c    | 35 ++-
>>>   include/hw/arm/virt.h   |  1 +
>>>   include/hw/pci/pci.h    |  1 +
>>>   include/hw/pci/pci_bus.h    | 13 
>>>   7 files changed, 153 insertions(+), 24 deletions(-)
>>>
>>
>> .
>>
> 
> Thanks for your advice.
> 
> I test this with the following script, in which i add two options.
> 
> The option `primary_bus_iommu=false(or true)` for `-machine
> virt,iommu=smmuv3`, this helps to enable/disable whether primary bus go
> through iommu.
> 
> The other option `iommu=false` or `iommu=true` for `-device pxb-pcie`
> helps to enable/disable whether pxb root bus go through iommu.
> 
>> #!/bin/sh
>>
>> /path/to/qemu/build/aarch64-softmmu/qemu-system-aarch64 \
>> -enable-kvm \
>> -cpu host \
>> -kernel /path/to/linux/arch/arm64/boot/Image \
>> -m 16G \
>> -smp 8,sockets=8,cores=1,threads=1 \
>> -machine
>> virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3,primary_bus_iommu=false
>> \
>> -drive
>> file=./QEMU_EFI-pflash.raw,if=pflash,format=raw,unit=0,readonly=on \
>> -device
>> pxb-pcie,bus_nr=0x10,id=pci.10,bus=pcie.0,addr=0x3.0x1,iommu=false \
>> -device
>> pxb-pcie,bus_nr=0x20,id=pci.20,bus=pcie.0,addr=0x3.0x2,iommu=true \
>> -device
>> pxb-pcie,bus_nr=0x23,id=pci.30,bus=pcie.0,addr=0x3.0x3,iommu=true \
>> -device
>> pxb-pcie,bus_nr=0x40,id=pci.40,bus=pcie.0,addr=0x3.0x4,iommu=false \
>> -device pcie-pci-bridge,id=pci.11,bus=pci.10,addr=0x1 \
>> -device pcie-pci-bridge,id=pci.21,bus=pci.20,addr=0x1 \
>> -device
>> pcie-root-port,port=0x20,chassis=10,id=pci.2,bus=pcie.0,addr=0x2 \
>> -device
>> pcie-root-port,port=0x20,chassis=11,id=pci.12,bus=pci.10,addr=0x2 \
>> -device
>> pcie-root-port,port=0x20,chassis=19,id=pci.19,bus=pci.11,addr=0x3 \
>> -device
>> pcie-root-port,port=0x20,chassis=12,id=pci.22,bus=pci.20,addr=0x2 \
>> -device
>> pcie-root-port,port=0x20,chassis=13,id=pci.42,bus=pci.40,addr=0x2 \
>> -device virtio-scsi-pci,id=scsi0,bus=pci.12,addr=0x1 \
>> -device vfio-pci,host=b5:00.2,bus=pci.42,addr=0x0,id=acc2 \
>> -net none \
>> -initrd /path/to/rootfs.cpio.gz \
>> -nographic \
>> -append "rdinit=init console=ttyAMA0 earlycon=pl011,0x900 nokaslr" \
> 
> I test the command line with an accelerator. The IORT table will have
> some changes, so only the root bus with iommu=true will go through smmuv3.

Thank you for sharing your command line.

On my end without using ",iommu=smmuv3" and the new options, my guest
crashes.

[0.833665] ACPI: PCI Root Bridge [PC0A] (domain  [bus 0a-0b])
[0.837630] acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM
ClockPM Segments MSI HPX-Type3]
[0.843377] acpi PNP0A08:00: _OSC: platform does not support [LTR]
[0.846796] acpi PNP0A08:00: _OSC: OS now controls [PCIeHotplug PME
AER PCIeCapability]
[0.851082] acpi PNP0A08:00: ECAM area [mem
0x4010a0-0x4010bf] reserved by PNP0C02:00
[0.854742] acpi PNP0A08:00: ECAM at [mem 0x4010a0-0x4010bf]
for [bus 0a-0b]
[0.85956

Re: [RFC RESEND PATCH 2/4] hw/pci: Add iommu option for pci root bus

2021-03-10 Thread Auger Eric
Hi Xingang,

On 2/27/21 9:33 AM, Wang Xingang wrote:
> From: Xingang Wang 
> 
> This add iommu option for pci root bus, including primary bus
> and pxb root bus. Default option is set to true, and the option
> is valid only if the iommu option for machine is properly set.
> 
> Signed-off-by: Xingang Wang 
> Signed-off-by: Jiahui Cen 
> ---
>  hw/arm/virt.c   | 29 +
>  hw/pci-bridge/pci_expander_bridge.c |  6 ++
>  hw/pci/pci.c|  2 +-
>  include/hw/arm/virt.h   |  1 +
>  4 files changed, 37 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 371147f3ae..0c9e549759 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -79,6 +79,7 @@
>  #include "hw/virtio/virtio-iommu.h"
>  #include "hw/char/pl011.h"
>  #include "qemu/guest-random.h"
> +#include "include/hw/pci/pci_bus.h"
>  
>  #define DEFINE_VIRT_MACHINE_LATEST(major, minor, latest) \
>  static void virt_##major##_##minor##_class_init(ObjectClass *oc, \
> @@ -1232,6 +1233,10 @@ static void create_smmu(const VirtMachineState *vms,
>  
>  dev = qdev_new("arm-smmuv3");
>  
> +if (vms->primary_bus_iommu) {
> +bus->flags |= PCI_BUS_IOMMU;
> +}
> +
>  object_property_set_link(OBJECT(dev), "primary-bus", OBJECT(bus),
>   &error_abort);
>  sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), &error_fatal);
> @@ -2305,6 +2310,20 @@ static void virt_set_iommu(Object *obj, const char 
> *value, Error **errp)
>  }
>  }
>  
> +static bool virt_get_primary_bus_iommu(Object *obj, Error **errp)
> +{
> +VirtMachineState *vms = VIRT_MACHINE(obj);
> +
> +return vms->primary_bus_iommu;
> +}
> +
> +static void virt_set_primary_bus_iommu(Object *obj, bool value, Error **errp)
> +{
> +VirtMachineState *vms = VIRT_MACHINE(obj);
> +
> +vms->primary_bus_iommu = value;
> +}
> +
>  static CpuInstanceProperties
>  virt_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
>  {
> @@ -2629,6 +2648,13 @@ static void virt_machine_class_init(ObjectClass *oc, 
> void *data)
>"Set the IOMMU type. "
>"Valid values are none and 
> smmuv3");
>  
> +object_class_property_add_bool(oc, "primary_bus_iommu",
> +  virt_get_primary_bus_iommu,
> +  virt_set_primary_bus_iommu);
> +object_class_property_set_description(oc, "primary_bus_iommu",
> +  "Set on/off to enable/disable "
> +  "iommu for primary bus");
> +
>  object_class_property_add_bool(oc, "ras", virt_get_ras,
> virt_set_ras);
>  object_class_property_set_description(oc, "ras",
> @@ -2696,6 +2722,9 @@ static void virt_instance_init(Object *obj)
>  /* Default disallows iommu instantiation */
>  vms->iommu = VIRT_IOMMU_NONE;
>  
> +/* Iommu is enabled by default for primary bus */
> +vms->primary_bus_iommu = true;
> +
>  /* Default disallows RAS instantiation */
>  vms->ras = false;
>  
> diff --git a/hw/pci-bridge/pci_expander_bridge.c 
> b/hw/pci-bridge/pci_expander_bridge.c
> index aedded1064..0412656265 100644
> --- a/hw/pci-bridge/pci_expander_bridge.c
> +++ b/hw/pci-bridge/pci_expander_bridge.c
> @@ -57,6 +57,7 @@ struct PXBDev {
>  
>  uint8_t bus_nr;
>  uint16_t numa_node;
> +bool iommu;
>  };
>  
>  static PXBDev *convert_to_pxb(PCIDevice *dev)
> @@ -254,6 +255,10 @@ static void pxb_dev_realize_common(PCIDevice *dev, bool 
> pcie, Error **errp)
>  bus->address_space_io = pci_get_bus(dev)->address_space_io;
>  bus->map_irq = pxb_map_irq_fn;
>  
> +if (pxb->iommu) {
> +bus->flags |= PCI_BUS_IOMMU;
> +}
> +
>  PCI_HOST_BRIDGE(ds)->bus = bus;
>  
>  pxb_register_bus(dev, bus, &local_err);
> @@ -301,6 +306,7 @@ static Property pxb_dev_properties[] = {
>  /* Note: 0 is not a legal PXB bus number. */
>  DEFINE_PROP_UINT8("bus_nr", PXBDev, bus_nr, 0),
>  DEFINE_PROP_UINT16("numa_node", PXBDev, numa_node, NUMA_NODE_UNASSIGNED),
> +DEFINE_PROP_BOOL("iommu", PXBDev, iommu, true),
It looks a bit odd to me that we have a property for the PXB-PCIe extra
root complex and not for the GPEX device. Wouldn't it make sense to add
one for the GPEX too? If so, you could still have a machine
option that forces the GPEX property value.
>  DEFINE_PROP_END_OF_LIST(),
>  };
>  
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index a9ebef8a35..dc969989c9 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -2712,7 +2712,7 @@ AddressSpace *pci_device_iommu_address_space(PCIDevice 
> *dev)
>  
>  iommu_bus = parent_bus;
>  }
> -if (iommu_bus && iommu_bus->iommu_fn) {
> +if (pci_bus_has_iommu(bus) && iommu_bus && iommu_bus->iommu_fn) {
>  return iommu_bus->iommu_fn(bus, 

Re: [RFC RESEND PATCH 0/4] hw/arm/virt-acpi-build: Introduce iommu option for pci root bus

2021-03-09 Thread Auger Eric
Hi,
On 2/27/21 9:33 AM, Wang Xingang wrote:
> From: Xingang Wang 
> 
> These patches add support for configure iommu on/off for pci root bus,
> including primary bus and pxb root bus. At present, All root bus will go
> through iommu when iommu is configured, which is not flexible.
> 
> So this add option to enable/disable iommu for primary bus and pxb root bus.
> When iommu is enabled for the root bus, devices attached to it will go
> through iommu. When iommu is disabled for the root bus, devices will not
> go through iommu accordingly.

Please could you give an example of the qemu command line for which the
new option is useful for you. This would help me to understand your
pcie/pci topology and also make sure I test it with the smmu.

Thank you in advance

Best Regards

Eric
> 
> Xingang Wang (4):
>   pci: Add PCI_BUS_IOMMU property
>   hw/pci: Add iommu option for pci root bus
>   hw/pci: Add pci_root_bus_max_bus
>   hw/arm/virt-acpi-build: Add explicit idmap info in IORT table
> 
>  hw/arm/virt-acpi-build.c| 92 +
>  hw/arm/virt.c   | 29 +
>  hw/pci-bridge/pci_expander_bridge.c |  6 ++
>  hw/pci/pci.c| 35 ++-
>  include/hw/arm/virt.h   |  1 +
>  include/hw/pci/pci.h|  1 +
>  include/hw/pci/pci_bus.h| 13 
>  7 files changed, 153 insertions(+), 24 deletions(-)
> 




Re: [PATCH v2 4/7] hw/arm/smmu-common: Fix smmu_iotlb_inv_iova when asid is not set

2021-03-09 Thread Auger Eric
Hi Peter,
On 3/8/21 5:37 PM, Peter Maydell wrote:
> On Thu, 25 Feb 2021 at 09:15, Eric Auger  wrote:
>>
>> If the asid is not set, do not attempt to locate the key directly
>> as all inserted keys have a valid asid.
>>
>> Use g_hash_table_foreach_remove instead.
>>
>> Signed-off-by: Eric Auger 
>> ---
>>  hw/arm/smmu-common.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
>> index 405d5c5325..e9ca3aebb2 100644
>> --- a/hw/arm/smmu-common.c
>> +++ b/hw/arm/smmu-common.c
>> @@ -151,7 +151,7 @@ inline void
>>  smmu_iotlb_inv_iova(SMMUState *s, int asid, dma_addr_t iova,
>>  uint8_t tg, uint64_t num_pages, uint8_t ttl)
>>  {
>> -if (ttl && (num_pages == 1)) {
>> +if (ttl && (num_pages == 1) && (asid >= 0)) {
>>  SMMUIOTLBKey key = smmu_get_iotlb_key(asid, iova, tg, ttl);
>>
>>  g_hash_table_remove(s->iotlb, &key);
> 
> Do we also need to avoid the remove-by-key codepath if
> the tg is not set ?
When TG is not set, TTL is RES0, so I think it is safe.

Thanks

Eric
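
A simplified model of the two code paths may help here. It is a sketch
under assumptions, not the QEMU implementation: the IOTLB is a plain dict
keyed by (asid, iova, tg, level), inv_iova is a hypothetical name, and the
slow path uses a simple range test on the entry's IOVA with an assumed
page size.

```python
def inv_iova(iotlb, asid, iova, tg, num_pages, ttl, page_size=4096):
    """Invalidate entries covering [iova, iova + num_pages * page_size)."""
    if ttl and num_pages == 1 and asid >= 0:
        # Fast path: the exact key is known, remove it directly.
        iotlb.pop((asid, iova, tg, ttl), None)
        return
    # Slow path: scan all entries; a negative asid matches every ASID.
    end = iova + num_pages * page_size
    stale = [key for key in iotlb
             if (asid < 0 or key[0] == asid) and iova <= key[1] < end]
    for key in stale:
        del iotlb[key]


iotlb = {(1, 0x1000, 1, 3): "a", (2, 0x1000, 1, 3): "b", (1, 0x5000, 1, 3): "c"}
inv_iova(iotlb, -1, 0x1000, 1, 1, 3)
# Both ASIDs are invalidated for that IOVA; the unrelated entry survives.
assert list(iotlb) == [(1, 0x5000, 1, 3)]
```

A direct lookup with asid = -1 would build a key that was never inserted,
so nothing would be removed; that is the case the extra check avoids.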
> 
> thanks
> -- PMM
> 




Re: [PATCH v2 1/7] intel_iommu: Fix mask may be uninitialized in vtd_context_device_invalidate

2021-02-25 Thread Auger Eric
Hi Philippe,

On 2/25/21 11:08 AM, Philippe Mathieu-Daudé wrote:
> On 2/25/21 10:14 AM, Eric Auger wrote:
>> With -Werror=maybe-uninitialized configuration we get
>> ../hw/i386/intel_iommu.c: In function ‘vtd_context_device_invalidate’:
>> ../hw/i386/intel_iommu.c:1888:10: error: ‘mask’ may be used
>> uninitialized in this function [-Werror=maybe-uninitialized]
>>  1888 | mask = ~mask;
>>   | ~^~~
>>
>> Add a g_assert_not_reached() to avoid the error.
>>
>> Signed-off-by: Eric Auger 
>> ---
>>  hw/i386/intel_iommu.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
>> index b4f5094259..3206f379f8 100644
>> --- a/hw/i386/intel_iommu.c
>> +++ b/hw/i386/intel_iommu.c
>> @@ -1884,6 +1884,8 @@ static void 
>> vtd_context_device_invalidate(IntelIOMMUState *s,
>>  case 3:
>>  mask = 7;   /* Mask bit 2:0 in the SID field */
>>  break;
>> +default:
>> +g_assert_not_reached();
>>  }
>>  mask = ~mask;
> 
> Unrelated to this patch, but I wonder why we don't directly assign the
> correct value of the mask in the switch cases...

After reading the VT-d spec again, I think this is aligned with the spec
description: FM (function mask) encodes the bits to mask, and the actual
mask is then computed as ~mask.
> 
> Reviewed-by: Philippe Mathieu-Daudé 

Thanks!

Eric
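
As an illustration of "FM encodes the bits to mask", here is a small
sketch. Only the FM = 3 case (value 7, bits 2:0) appears in the quoted
hunk; the other table entries are an assumption based on my reading of
the function mask encoding, and devfn_matches is a hypothetical helper.

```python
# Bits of the function number that are ignored for each FM value.
# Only the FM == 3 row is taken from the quoted code; the rest is assumed.
FM_TO_MASKED_BITS = {0: 0b000, 1: 0b100, 2: 0b110, 3: 0b111}


def devfn_matches(func_mask, source_devfn, candidate_devfn):
    """True if candidate_devfn is covered by an invalidation of source_devfn."""
    masked = FM_TO_MASKED_BITS[func_mask & 3]
    keep = ~masked & 0xFF   # the "mask = ~mask" step, limited to 8 bits
    return (source_devfn & keep) == (candidate_devfn & keep)


# FM = 3 ignores the whole function number: all functions of the device match.
assert devfn_matches(3, 0x08, 0x0F)
# FM = 0 requires an exact match.
assert not devfn_matches(0, 0x08, 0x09)
```

Keeping the "bits to mask" value in the switch and inverting it afterwards
keeps the code close to the wording of the spec.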

> 
>>  
>>
> 




Re: [RFC v7 26/26] vfio/pci: Implement return_page_response page response callback

2021-02-24 Thread Auger Eric
Hi Shameer,

On 2/18/21 2:32 PM, Auger Eric wrote:
> Hi Shameer,
> 
> On 2/18/21 12:46 PM, Shameerali Kolothum Thodi wrote:
>>
>> Hi Eric,
>>
>>> -Original Message-
>>> From: Auger Eric [mailto:eric.au...@redhat.com]
>>> Sent: 18 February 2021 10:42
>>> To: Shameerali Kolothum Thodi ;
>>> eric.auger@gmail.com; qemu-devel@nongnu.org; qemu-...@nongnu.org;
>>> alex.william...@redhat.com
>>> Cc: peter.mayd...@linaro.org; jacob.jun@linux.intel.com; Zengtao (B)
>>> ; jean-phili...@linaro.org; t...@semihalf.com;
>>> pet...@redhat.com; nicoleots...@gmail.com; vivek.gau...@arm.com;
>>> yi.l@intel.com; zhangfei@gmail.com; yuzenghui
>>> ; qubingbing 
>>> Subject: Re: [RFC v7 26/26] vfio/pci: Implement return_page_response page
>>> response callback
>>>
>> [...]
>>
>>>> Also, I just noted that this patch breaks the dev hot add/del
>>>> functionality.
>>>> device_add works fine but device_del is not removing the dev cleanly.
>>>
>>> Thank you for reporting this!
>>>
>>> The test matrix becomes bigger and bigger :-( I need to write some
>>> avocado-vt tests or the like.
>>>
>>> I am currently working on the respin. At the moment I investigate the
>>> DPDK issue that you reported and I was able to reproduce.
>>
>> Ok. Good to know that it is reproducible.
>>
>>> I intend to rebase on top of Jean-Philippe's
>>> [PATCH v12 00/10] iommu: I/O page faults for SMMUv3
>>>
>>> Is that good enough for your SVA integration or do you want me to prepare a
>>> rebase on some extended code?
>>
>> Could you please try to base it on 
>> https://jpbrucker.net/git/linux/log/?h=sva/current
> 
> OK. At least I will provide a branch.

I sent the respin on top of the master branch + Jean-Philippe's
[PATCH v12 00/10] iommu: I/O page faults for SMMUv3,
because I thought it makes more sense to post on master + some nearly
"ready to go" stuff.

Nevertheless I will do my best to prepare, ASAP, a branch based on Jean's
sva/current branch (based on 5.11-rc5).

Thanks

Eric



> 
> Eric
>>
>> I think that has the latest from Jean-Philippe and will be easy to add
>> uacce/zip specific patches to test SVA/vSVA.
>>
>> Thanks,
>> Shameer
>>
>>  
>>> Thanks
>>>
>>> Eric
>>>>
>>>> The below one fixes it. Please check.
>>>>
>>>> Thanks,
>>>> Shameer
>>>>
>>>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>>>> index 797acd9c73..92c1d48316 100644
>>>> --- a/hw/vfio/pci.c
>>>> +++ b/hw/vfio/pci.c
>>>> @@ -3470,6 +3470,7 @@ static void vfio_instance_finalize(Object *obj)
>>>>  vfio_display_finalize(vdev);
>>>>  vfio_bars_finalize(vdev);
>>>>  vfio_region_finalize(&vdev->dma_fault_region);
>>>> +vfio_region_finalize(&vdev->dma_fault_response_region);
>>>>  g_free(vdev->emulated_config_bits);
>>>>  g_free(vdev->rom);
>>>>  /*
>>>> @@ -3491,6 +3492,7 @@ static void vfio_exitfn(PCIDevice *pdev)
>>>>  vfio_unregister_err_notifier(vdev);
>>>>  vfio_unregister_ext_irq_notifiers(vdev);
>>>>  vfio_region_exit(&vdev->dma_fault_region);
>>>> +vfio_region_exit(&vdev->dma_fault_response_region);
>>>>  pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
>>>>  if (vdev->irqchip_change_notifier.notify) {
>>>>
>>> kvm_irqchip_remove_change_notifier(&vdev->irqchip_change_not
>>>>
>>>>
>>>>
>>




Re: [PATCH] virtio-iommu: Handle non power of 2 range invalidations

2021-02-18 Thread Auger Eric
Hi Peter,

On 2/18/21 6:48 PM, Peter Xu wrote:
> On Thu, Feb 18, 2021 at 06:18:22PM +0100, Auger Eric wrote:
>> Hi Peter,
>>
>> On 2/18/21 5:42 PM, Peter Xu wrote:
>>> Eric,
>>>
>>> On Thu, Feb 18, 2021 at 03:16:50PM +0100, Eric Auger wrote:
>>>> @@ -164,12 +166,27 @@ static void 
>>>> virtio_iommu_notify_unmap(IOMMUMemoryRegion *mr, hwaddr virt_start,
>>>>  
>>>>  event.type = IOMMU_NOTIFIER_UNMAP;
>>>>  event.entry.target_as = &address_space_memory;
>>>> -event.entry.addr_mask = virt_end - virt_start;
>>>> -event.entry.iova = virt_start;
>>>>  event.entry.perm = IOMMU_NONE;
>>>>  event.entry.translated_addr = 0;
>>>> +event.entry.addr_mask = mask;
>>>> +event.entry.iova = virt_start;
>>>>  
>>>> -memory_region_notify_iommu(mr, 0, event);
>>>> +if (mask == UINT64_MAX) {
>>>> +memory_region_notify_iommu(mr, 0, event);
>>>> +}
>>>> +
>>>> +size = mask + 1;
>>>> +
>>>> +while (size) {
>>>> +uint8_t highest_bit = 64 - clz64(size) - 1;
>>>
>>> I'm not sure fetching highest bit would work right. E.g., with start=0x11000
>>> and size=0x11000 (then we need to unmap 0x11000-0x22000), current code will
>>> first try to invalidate range (0x11000, 0x10000), that seems still invalid
>>> since 0x11000 is not aligned to 0x10000 page mask.
>>
>> Hum I thought aligning the size was sufficient. Where is it checked exactly?
> 
> I don't remember all the context either.. :)
> 
> Firstly - It makes sense to do that since hardware does it, and emulation code
> would make sense to follow that.
>
> There's some more info where I looked into the src of when vt-d got introduced
> with the similar change:
> 
> https://www.mail-archive.com/qemu-devel@nongnu.org/msg625340.html
> 
> I'm not 100% certain anything will break if we don't use page mask but
> arbitrary length as vhost did in iotlb msg.  However what Yan Zhao reported is
> definitely worse than that since we (vt-d) used to unmap outside the range of
> the range just for page mask alignment.

OK. I read the get_naturally_aligned_size() code again and indeed it
matches the need: each chunk is the largest power-of-2 size that both fits
in the remaining range and keeps the start address aligned. This is a
common pattern anyway. I am not sure either whether this would break with
vhost and vfio notifications, but it is better to use that function in
virtio-iommu and smmu too (I sent a similar patch for smmu).
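
To make the pattern concrete, here is a minimal standalone sketch of
splitting an arbitrary range into naturally aligned power-of-2 chunks. It
is not the QEMU code: struct chunk, pow2_floor(), naturally_aligned_size()
and split_range() are hypothetical names used only for illustration, and
the address width limit that the vt-d helper also applies is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for one invalidation: start address and mask. */
struct chunk {
    uint64_t iova;
    uint64_t addr_mask;
};

/* Largest power of two that is <= x (x must be non-zero). */
static uint64_t pow2_floor(uint64_t x)
{
    uint64_t p = 1;

    assert(x != 0);
    while (x >>= 1) {
        p <<= 1;
    }
    return p;
}

/*
 * Size of the next chunk: the largest power of two that fits in @size
 * and to which @start is aligned.
 */
static uint64_t naturally_aligned_size(uint64_t start, uint64_t size)
{
    uint64_t by_size = pow2_floor(size);
    /* Lowest set bit of start; 0 when start is 0 (aligned to anything). */
    uint64_t by_align = start & (~start + 1);

    if (by_align == 0 || by_align > by_size) {
        return by_size;
    }
    return by_align;
}

/* Fill @out with at most @max chunks covering [start, start + size). */
static size_t split_range(uint64_t start, uint64_t size,
                          struct chunk *out, size_t max)
{
    size_t n = 0;

    while (size && n < max) {
        uint64_t len = naturally_aligned_size(start, size);

        out[n].iova = start;
        out[n].addr_mask = len - 1;
        n++;
        start += len;
        size -= len;
    }
    return n;
}
```

With Peter's example (start=0x11000, size=0x11000) this yields chunks of
0x1000, 0x2000, 0x4000, 0x8000 and 0x2000 bytes, each one aligned on its
own size, instead of a first chunk of 0x10000 at a misaligned address.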

Thank you for reminding me of that piece of code!

Eric
> 
> Thanks,
> 




Re: [PATCH] virtio-iommu: Handle non power of 2 range invalidations

2021-02-18 Thread Auger Eric
Hi Peter,

On 2/18/21 5:42 PM, Peter Xu wrote:
> Eric,
> 
> On Thu, Feb 18, 2021 at 03:16:50PM +0100, Eric Auger wrote:
>> @@ -164,12 +166,27 @@ static void 
>> virtio_iommu_notify_unmap(IOMMUMemoryRegion *mr, hwaddr virt_start,
>>  
>>  event.type = IOMMU_NOTIFIER_UNMAP;
>>  event.entry.target_as = &address_space_memory;
>> -event.entry.addr_mask = virt_end - virt_start;
>> -event.entry.iova = virt_start;
>>  event.entry.perm = IOMMU_NONE;
>>  event.entry.translated_addr = 0;
>> +event.entry.addr_mask = mask;
>> +event.entry.iova = virt_start;
>>  
>> -memory_region_notify_iommu(mr, 0, event);
>> +if (mask == UINT64_MAX) {
>> +memory_region_notify_iommu(mr, 0, event);
>> +}
>> +
>> +size = mask + 1;
>> +
>> +while (size) {
>> +uint8_t highest_bit = 64 - clz64(size) - 1;
> 
> I'm not sure fetching highest bit would work right. E.g., with start=0x11000
> and size=0x11000 (then we need to unmap 0x11000-0x22000), current code will
> first try to invalidate range (0x11000, 0x10000), that seems still invalid
> since 0x11000 is not aligned to 0x10000 page mask.

Hum I thought aligning the size was sufficient. Where is it checked exactly?
> 
> I think the same trick in vtd_address_space_unmap() would work.  If you agree,
> maybe we can generalize that get_naturally_aligned_size() out, but maybe with 
> a
> better name as a helper?

Yep I need to read the code again ;-)

Thank you!

Eric
> 
> Thanks,
> 




Re: [RFC v7 26/26] vfio/pci: Implement return_page_response page response callback

2021-02-18 Thread Auger Eric
Hi Shameer,

On 2/18/21 12:46 PM, Shameerali Kolothum Thodi wrote:
> 
> Hi Eric,
> 
>> -Original Message-
>> From: Auger Eric [mailto:eric.au...@redhat.com]
>> Sent: 18 February 2021 10:42
>> To: Shameerali Kolothum Thodi ;
>> eric.auger@gmail.com; qemu-devel@nongnu.org; qemu-...@nongnu.org;
>> alex.william...@redhat.com
>> Cc: peter.mayd...@linaro.org; jacob.jun@linux.intel.com; Zengtao (B)
>> ; jean-phili...@linaro.org; t...@semihalf.com;
>> pet...@redhat.com; nicoleots...@gmail.com; vivek.gau...@arm.com;
>> yi.l@intel.com; zhangfei@gmail.com; yuzenghui
>> ; qubingbing 
>> Subject: Re: [RFC v7 26/26] vfio/pci: Implement return_page_response page
>> response callback
>>
> [...]
> 
>>> Also, I just noted that this patch breaks the dev hot add/del functionality.
>>> device_add works fine but device_del is not removing the dev cleanly.
>> Thank you for reporting this!
>>
>> The test matrix becomes bigger and bigger :-( I need to write some
>> avocado-vt tests or the like.
>>
>> I am currently working on the respin. At the moment I am investigating the
>> DPDK issue that you reported, which I was able to reproduce.
> 
> Ok. Good to know that it is reproducible.
> 
>> I intend to rebase on top of Jean-Philippe's
>> [PATCH v12 00/10] iommu: I/O page faults for SMMUv3
>>
>> Is that good enough for your SVA integration or do you want me to prepare a
>> rebase on some extended code?
> 
> Could you please try to base it on 
> https://jpbrucker.net/git/linux/log/?h=sva/current

OK. At least I will provide a branch.

Eric
> 
> I think that has the latest from Jean-Philippe and will be easy to add
> uacce/zip specific patches to test SVA/vSVA.
> 
> Thanks,
> Shameer
> 
>  
>> Thanks
>>
>> Eric
>>>
>>> The below one fixes it. Please check.
>>>
>>> Thanks,
>>> Shameer
>>>
>>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>>> index 797acd9c73..92c1d48316 100644
>>> --- a/hw/vfio/pci.c
>>> +++ b/hw/vfio/pci.c
>>> @@ -3470,6 +3470,7 @@ static void vfio_instance_finalize(Object *obj)
>>>  vfio_display_finalize(vdev);
>>>  vfio_bars_finalize(vdev);
>>>  vfio_region_finalize(&vdev->dma_fault_region);
>>> +vfio_region_finalize(&vdev->dma_fault_response_region);
>>>  g_free(vdev->emulated_config_bits);
>>>  g_free(vdev->rom);
>>>  /*
>>> @@ -3491,6 +3492,7 @@ static void vfio_exitfn(PCIDevice *pdev)
>>>  vfio_unregister_err_notifier(vdev);
>>>  vfio_unregister_ext_irq_notifiers(vdev);
>>>  vfio_region_exit(&vdev->dma_fault_region);
>>> +vfio_region_exit(&vdev->dma_fault_response_region);
>>>  pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
>>>  if (vdev->irqchip_change_notifier.notify) {
>>>
>> kvm_irqchip_remove_change_notifier(&vdev->irqchip_change_not
>>>
>>>
>>>
> 




Re: [RFC v7 26/26] vfio/pci: Implement return_page_response page response callback

2021-02-18 Thread Auger Eric
Hi Shameer,

On 2/18/21 11:19 AM, Shameerali Kolothum Thodi wrote:
> Hi Eric,
> 
>> -Original Message-
>> From: Eric Auger [mailto:eric.au...@redhat.com]
>> Sent: 16 November 2020 18:14
>> To: eric.auger@gmail.com; eric.au...@redhat.com;
>> qemu-devel@nongnu.org; qemu-...@nongnu.org;
>> alex.william...@redhat.com
>> Cc: peter.mayd...@linaro.org; jean-phili...@linaro.org; pet...@redhat.com;
>> jacob.jun@linux.intel.com; yi.l@intel.com; Shameerali Kolothum Thodi
>> ; t...@semihalf.com;
>> nicoleots...@gmail.com; yuzenghui ;
>> zhangfei@gmail.com; vivek.gau...@arm.com
>> Subject: [RFC v7 26/26] vfio/pci: Implement return_page_response page
>> response callback
>>
>> This patch implements the page response path. The
>> response is written into the page response ring buffer and then
>> the header's head index is updated. This path is not used
>> by this series. It is introduced here as a POC for vSVA/ARM
>> integration.
>>
>> Signed-off-by: Eric Auger 
>> ---
>>  hw/vfio/pci.h |   2 +
>>  hw/vfio/pci.c | 121
>> ++
>>  2 files changed, 123 insertions(+)
>>
>> diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
>> index 350e9e9005..ce0472611e 100644
>> --- a/hw/vfio/pci.h
>> +++ b/hw/vfio/pci.h
>> @@ -147,6 +147,8 @@ struct VFIOPCIDevice {
>>  VFIOPCIExtIRQ *ext_irqs;
>>  VFIORegion dma_fault_region;
>>  uint32_t fault_tail_index;
>> +VFIORegion dma_fault_response_region;
>> +uint32_t fault_response_head_index;
>>  int (*resetfn)(struct VFIOPCIDevice *);
>>  uint32_t vendor_id;
>>  uint32_t device_id;
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index 4e3495bb60..797acd9c73 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -2631,6 +2631,61 @@ out:
>>  g_free(fault_region_info);
>>  }
>>
>> +static void vfio_init_fault_response_regions(VFIOPCIDevice *vdev, Error
>> **errp)
>> +{
>> +struct vfio_region_info *fault_region_info = NULL;
>> +struct vfio_region_info_cap_fault *cap_fault;
>> +VFIODevice *vbasedev = &vdev->vbasedev;
>> +struct vfio_info_cap_header *hdr;
>> +char *fault_region_name;
>> +int ret;
>> +
>> +ret = vfio_get_dev_region_info(&vdev->vbasedev,
>> +   VFIO_REGION_TYPE_NESTED,
>> +
>> VFIO_REGION_SUBTYPE_NESTED_DMA_FAULT_RESPONSE,
>> +   &fault_region_info);
>> +if (ret) {
>> +goto out;
>> +}
>> +
>> +hdr = vfio_get_region_info_cap(fault_region_info,
>> +
>> VFIO_REGION_INFO_CAP_DMA_FAULT);
> 
> VFIO_REGION_INFO_CAP_DMA_FAULT_RESPONSE ? 
yes!
> 
>> +if (!hdr) {
>> +error_setg(errp, "failed to retrieve DMA FAULT RESPONSE
>> capability");
>> +goto out;
>> +}
>> +cap_fault = container_of(hdr, struct vfio_region_info_cap_fault,
>> + header);
>> +if (cap_fault->version != 1) {
>> +error_setg(errp, "Unsupported DMA FAULT RESPONSE API
>> version %d",
>> +   cap_fault->version);
>> +goto out;
>> +}
>> +
>> +fault_region_name = g_strdup_printf("%s DMA FAULT RESPONSE %d",
>> +vbasedev->name,
>> +fault_region_info->index);
>> +
>> +ret = vfio_region_setup(OBJECT(vdev), vbasedev,
>> +&vdev->dma_fault_response_region,
>> +fault_region_info->index,
>> +fault_region_name);
>> +g_free(fault_region_name);
>> +if (ret) {
>> +error_setg_errno(errp, -ret,
>> + "failed to set up the DMA FAULT RESPONSE
>> region %d",
>> + fault_region_info->index);
>> +goto out;
>> +}
>> +
>> +ret = vfio_region_mmap(&vdev->dma_fault_response_region);
>> +if (ret) {
>> +error_setg_errno(errp, -ret, "Failed to mmap the DMA FAULT
>> RESPONSE queue");
>> +}
>> +out:
>> +g_free(fault_region_info);
>> +}
>> +
>>  static void vfio_populate_device(VFIOPCIDevice *vdev, Error **errp)
>>  {
>>  VFIODevice *vbasedev = &vdev->vbasedev;
>> @@ -2706,6 +2761,12 @@ static void vfio_populate_device(VFIOPCIDevice
>> *vdev, Error **errp)
>>  return;
>>  }
>>
>> +vfio_init_fault_response_regions(vdev, &err);
>> +if (err) {
>> +error_propagate(errp, err);
>> +return;
>> +}
>> +
>>  irq_info.index = VFIO_PCI_ERR_IRQ_INDEX;
>>
>>  ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
>> @@ -2884,8 +2945,68 @@ static int vfio_iommu_set_pasid_table(PCIBus
>> *bus, int32_t devfn,
>>  return ioctl(container->fd, VFIO_IOMMU_SET_PASID_TABLE, );
>>  }
>>
>> +static int vfio_iommu_return_page_response(PCIBus *bus, int32_t devfn,
>> +   IOMMUPageResponse
>> *resp)
>> +{
>> +PCIDevice *pdev = bus->devices[devfn];
>> +VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
>> +struct iommu_page_response 

Re: [PATCH] vhost: Unbreak SMMU and virtio-iommu on dev-iotlb support

2021-02-09 Thread Auger Eric
Hi,

On 2/9/21 4:12 AM, Jason Wang wrote:
> 
> On 2021/2/9 上午2:37, Peter Xu wrote:
>> On Mon, Feb 08, 2021 at 11:21:23AM +0800, Jason Wang wrote:
>>
>> [...]
>>
>>>> I'm not sure I remember it right, but we seem to have similar
>>>> discussion
>>>> previously on "what if the user didn't specify ats=on" - I think at
>>>> that time
>>>> the conclusion was that we ignore the failure since that's not a valid
>>>> configuration for qemu.
>>>
>>> Yes, but I think I was wrong at that time.
>> I can't say you're wrong - I actually still agree with you that at least
>> there's a priority of things we'd do, and this one is not extremely
>> important
>> if that's not a major use case (say, if you will 100% always suggest
>> an user to
>> use ats=on for a viommu enabled vhost).
> 
> 
> Right, but it depends on e.g. how libvirt uses that. As far as I know,
> they do enable ATS. But it would still be an issue if libvirt wants to
> support vIOMMUs other than intel.
> 
> 
>>
>>>> The other issue I'm worried is (I think I mentioned it somewhere,
>>>> but just to
>>>> double confirm): I'd like to make sure SMMU and virtio-iommu are the
>>>> only IOMMU
>>>> platform that will use vhost.
>>>
>>> For upstream, it won't be easy :)
>> Sorry I definitely didn't make myself clear... :)
>>
>> To be explicit, does ppc use vhost kernel too?
> 
> 
> I think the answer is yes.
> 
> 
>>   Since I know at least ppc has
>> its own translation unit and its iommu notifier in qemu, so I'm unsure
>> whether
>> the same patch would break ppc too, because vhost could also ignore
>> all UNMAP
>> sent by the ppc vIOMMU.
> 
> 
> If this is true, we probably need to fix that.
> 
> 
>>
>>>
>>>> Otherwise IIUC we need to fix those vIOMMUs too.
>>>
>>> Right, last time I check AMD IOMMU emulation, it simply trigger
>>> device IOTLB
>>> invalidation during IOTLB invalidation which looks wrong.
>> I did quickly grep IOMMU_NOTIFIER_UNMAP in amd_iommu.c and saw
>> nothing. It
>> seems amd iommu is not ready for any kind of IOMMU notifiers yet.
>>
>> Thanks,
> 
> 
> Right.
> 
> Thanks
> 
> 
>>
> 
> 
I just noticed that the vhost fix now breaks the virtio-iommu/vfio
integration because VFIO registers IOMMU_NOTIFIER_ALL, which includes the
DEV-IOTLB notifier that is now rejected by virtio-iommu's
virtio_iommu_notify_flag_changed().
Is it safe to replace IOMMU_NOTIFIER_ALL by IOMMU_NOTIFIER_IOTLB_EVENTS
in vfio_listener_region_add (hw/vfio/common.c) or shall we also do the
2-step registration? After your confirmation, I can send the patch.
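
To illustrate what I mean by 2-step registration, here is a minimal
standalone sketch. Everything in it is a mock: the NOTIFIER_* values,
mock_viommu_flag_changed() and register_with_fallback() are hypothetical
stand-ins, not the QEMU memory API. It only shows the control flow: try
the full flag set first, and if the vIOMMU rejects it, retry with the
reduced set.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the notifier flags; values are illustrative. */
enum {
    NOTIFIER_UNMAP          = 1 << 0,
    NOTIFIER_MAP            = 1 << 1,
    NOTIFIER_DEVIOTLB_UNMAP = 1 << 2,
    NOTIFIER_IOTLB_EVENTS   = NOTIFIER_UNMAP | NOTIFIER_MAP,
    NOTIFIER_ALL            = NOTIFIER_IOTLB_EVENTS | NOTIFIER_DEVIOTLB_UNMAP,
};

/*
 * Mock of a vIOMMU flag_changed hook: a vIOMMU without dev-iotlb support
 * refuses any registration that asks for dev-iotlb unmap events.
 */
static int mock_viommu_flag_changed(int new_flags, bool supports_deviotlb)
{
    if ((new_flags & NOTIFIER_DEVIOTLB_UNMAP) && !supports_deviotlb) {
        return -EINVAL;
    }
    return 0;
}

/*
 * Step 1: try the wanted flags. Step 2: on failure, retry with the
 * fallback flags. Returns the flags that were accepted, or a negative
 * errno if both attempts failed.
 */
static int register_with_fallback(int wanted, int fallback,
                                  bool supports_deviotlb)
{
    int ret = mock_viommu_flag_changed(wanted, supports_deviotlb);

    if (ret == 0) {
        return wanted;
    }
    ret = mock_viommu_flag_changed(fallback, supports_deviotlb);
    return ret == 0 ? fallback : ret;
}
```

In this model, a listener asking for NOTIFIER_ALL against a vIOMMU without
dev-iotlb support ends up registered with NOTIFIER_IOTLB_EVENTS, which is
the same outcome as requesting the reduced set directly.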

Thanks

Eric




Re: [PATCH] vhost: Unbreak SMMU and virtio-iommu on dev-iotlb support

2021-02-08 Thread Auger Eric
Hi,

On 2/7/21 3:47 PM, Peter Xu wrote:
> Hi, Kevin,
> 
> On Sun, Feb 07, 2021 at 09:04:55AM +0000, Tian, Kevin wrote:
>>> From: Peter Xu
>>> Sent: Friday, February 5, 2021 11:31 PM
>>>
>
>
>> or virtio-iommu
>> since dev-iotlb (or PCIe ATS)
>
>
> We may need to add this in the future.
 added Jean-Philippe in CC
>>>
>>> So that's the part I'm unsure about..  Since everybody is cced so maybe good
>>> time to ask. :)
>>>
>>> The thing is I'm still not clear on whether dev-iotlb is useful for a full
>>> emulation environment and how that should differ from a normal iotlb, since
>>> after all normal iotlb will be attached with device information too.
>>
>> dev-iotlb is useful in two manners. First, it's a functional prerequisite for
>> supporting I/O page faults.
If I understand correctly, the stall model of the ARM SMMU allows IOPF
without dev-iotlb (ATS). However, PRI does indeed require ATS.
> 
> Is this also a hard requirement for virtio-iommu, which is not a real hardware
> after all?
> 
>> Second, it has performance benefit as you don't
>> need to contend the lock of global iotlb.
> 
> Hmm.. are you talking about e.g. vt-d driver or virtio-iommu?
> 
> Assuming it's about vt-d, qi_flush_dev_iotlb() will still call 
> qi_submit_sync()
> and taking the same global QI lock, as I see it, or I could be wrong 
> somewhere.
> I don't see where dev-iotlb has a standalone channel for delivery.
> 
> For virtio-iommu, we haven't defined dev-iotlb, right?
No, there is no such feature at the moment. If my understanding is
correct, this would only make sense when protecting a HW device. In that
case the underlying physical IOMMU would be programmed for ATS.

When protecting a virtio device (incl. vhost), what would be the advantage
over the current standard unmap notifier?

Thanks

Eric
> Sorry I missed things
> when I completely didn't follow virtio-iommu recently - let's say if
> virtio-iommu in the future can support per-dev dev-iotlb queue so it doesn't
> need a global lock, what if we make it still per-device but still delivering
> iotlb message?  Again, it's still a bit unclear to me why a full emulation
> iommu would need that definition of "iotlb" and "dev-iotlb".
> 
>>
>>>
>>> For real hardwares, they make sense because they ask for two things: iotlb 
>>> is
>>> for IOMMU, but dev-iotlb is for the device cache.  For emulation
>>> environment
>>> (virtio-iommu is the case) do we really need that complexity?
>>>
>>> Note that even if there're assigned devices under virtio-iommu in the 
>>> future,
>>> we can still isolate that and iiuc we can easily convert an iotlb (from
>>> virtio-iommu) into a hardware IOMMU dev-iotlb no matter what type of
>>> IOMMU is
>>> underneath the vIOMMU.
>>>
>>
>> Didn't get this point. Hardware dev-iotlb is updated by hardware (between
>> the device and the IOMMU). How could software convert a virtual iotlb
>> entry into hardware dev-iotlb?
> 
> I mean if virtio-iommu must be run in a guest, then we can trap that message
> first, right?  If there're assigned device in the guest, we must convert that
> invalidation to whatever message required for the host, that seems to not
> require the virtio-iommu to have dev-iotlb knowledge, still?
> 
> Thanks,
> 




Re: [PATCH] vhost: Unbreak SMMU and virtio-iommu on dev-iotlb support

2021-02-08 Thread Auger Eric
Hi,

[Adding David and Greg in CC]


On 2/8/21 7:37 PM, Peter Xu wrote:
> On Mon, Feb 08, 2021 at 11:21:23AM +0800, Jason Wang wrote:
> 
> [...]
> 
>>> I'm not sure I remember it right, but we seem to have similar discussion
>>> previously on "what if the user didn't specify ats=on" - I think at that 
>>> time
>>> the conclusion was that we ignore the failure since that's not a valid
>>> configuration for qemu.
>>
>>
>> Yes, but I think I was wrong at that time.
> 
> I can't say you're wrong - I actually still agree with you that at least
> there's a priority of things we'd do, and this one is not extremely important
> if that's not a major use case (say, if you will 100% always suggest an user 
> to
> use ats=on for a viommu enabled vhost).
> 
>>>
>>> The other issue I'm worried is (I think I mentioned it somewhere, but just 
>>> to
>>> double confirm): I'd like to make sure SMMU and virtio-iommu are the only 
>>> IOMMU
>>> platform that will use vhost.
>>
>>
>> For upstream, it won't be easy :)
> 
> Sorry I definitely didn't make myself clear... :)
> 
> To be explicit, does ppc use vhost kernel too?  Since I know at least ppc has
> its own translation unit and its iommu notifier in qemu, so I'm unsure whether
> the same patch would break ppc too, because vhost could also ignore all UNMAP
> sent by the ppc vIOMMU.
> 
>>
>>
>>>Otherwise IIUC we need to fix those vIOMMUs too.
>>
>>
>> Right, last time I check AMD IOMMU emulation, it simply trigger device IOTLB
>> invalidation during IOTLB invalidation which looks wrong.
> 
> I did quickly grep IOMMU_NOTIFIER_UNMAP in amd_iommu.c and saw nothing. It
> seems amd iommu is not ready for any kind of IOMMU notifiers yet.

For context, we experienced a regression with the vsmmuv3/vhost and
virtio-iommu/vhost integration. We wondered whether the ppc viommu is
able to protect vhost devices and whether this relies on legacy
IOMMU_NOTIFIER_UNMAP notifiers, i.e. vhost does not register this
notifier anymore but instead registers a dev-iotlb unmap notifier.

Thanks

Eric
> 
> Thanks,
> 




Re: [PATCH] vhost: Unbreak SMMU and virtio-iommu on dev-iotlb support

2021-02-05 Thread Auger Eric
Hi,

On 2/5/21 4:16 AM, Jason Wang wrote:
> 
> On 2021/2/5 上午3:12, Peter Xu wrote:
>> Previous work on dev-iotlb message broke vhost on either SMMU
> 
> 
> Have a quick git grep and it looks to me v3 support ATS and have command
> for device iotlb (ATC) invalidation.


Yes I will do that. Should not be a big deal.
> 
> 
>> or virtio-iommu
>> since dev-iotlb (or PCIe ATS)
> 
> 
> We may need to add this in the future.
added Jean-Philippe in CC
> 
> 
>> is not yet supported for those archs.
> 
> 
> Rethink about this, it looks to me the point is that we should make
> vhost work when ATS is disabled like what ATS spec defined:
> 
> """
> 
> ATS is enabled through a new Capability and associated configuration
> structure.  To enable ATS, software must detect this Capability and
> enable the Function to issue ATS TLP.  If a Function is not enabled, the
> Function is required not to issue ATS Translation Requests and is
> required to issue all DMA Read and Write Requests with the TLP AT field
> set to “untranslated.”
> 
> """
> 
> Maybe we can add this in the commit log.
> 
> 
>>
>> An initial idea is that we can let IOMMU to export this information to
>> vhost so
>> that vhost would know whether the vIOMMU would support dev-iotlb, then
>> vhost
>> can conditionally register to dev-iotlb or the old iotlb way.  We can
>> work
>> based on some previous patch to introduce PCIIOMMUOps as Yi Liu
>> proposed [1].
>>
>> However it's not as easy as I thought since vhost_iommu_region_add()
>> does not
>> have a PCIDevice context at all since it's completely a backend.  It
>> seems
>> non-trivial to pass over a PCI device to the backend during init. 
>> E.g. when
>> the IOMMU notifier registered hdev->vdev is still NULL.
>>
>> To make the fix smaller and easier, this patch goes the other way to
>> leverage
>> the flag_changed() hook of vIOMMUs so that SMMU and virtio-iommu can
>> trap the
>> dev-iotlb registration and fail it.  Then vhost could try the fallback
>> solution
>> as using UNMAP invalidation for its translations.
>>
>> [1]
>> https://lore.kernel.org/qemu-devel/1599735398-6829-4-git-send-email-yi.l@intel.com/
>>
>>
>> Reported-by: Eric Auger 
>> Fixes: b68ba1ca57677acf870d5ab10579e6105c1f5338
>> Reviewed-by: Eric Auger 
>> Tested-by: Eric Auger 
>> Signed-off-by: Peter Xu 
>> ---
>>   hw/arm/smmuv3.c  |  5 +
>>   hw/virtio/vhost.c    | 13 +++--
>>   hw/virtio/virtio-iommu.c |  5 +
>>   3 files changed, 21 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
>> index 98b99d4fe8e..bd1f97000d9 100644
>> --- a/hw/arm/smmuv3.c
>> +++ b/hw/arm/smmuv3.c
>> @@ -1497,6 +1497,11 @@ static int
>> smmuv3_notify_flag_changed(IOMMUMemoryRegion *iommu,
>>   SMMUv3State *s3 = sdev->smmu;
>>   SMMUState *s = &(s3->smmu_state);
>>   +    if (new & IOMMU_NOTIFIER_DEVIOTLB_UNMAP) {
>> +    error_setg(errp, "SMMUv3 does not support dev-iotlb yet");
>> +    return -EINVAL;
>> +    }
>> +
>>   if (new & IOMMU_NOTIFIER_MAP) {
>>   error_setg(errp,
>>  "device %02x.%02x.%x requires iommu MAP notifier
>> which is "
>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>> index 28c7d781721..6e17d631f77 100644
>> --- a/hw/virtio/vhost.c
>> +++ b/hw/virtio/vhost.c
>> @@ -704,6 +704,7 @@ static void vhost_iommu_region_add(MemoryListener
>> *listener,
>>   Int128 end;
>>   int iommu_idx;
>>   IOMMUMemoryRegion *iommu_mr;
>> +    int ret;
>>     if (!memory_region_is_iommu(section->mr)) {
>>   return;
>> @@ -726,8 +727,16 @@ static void vhost_iommu_region_add(MemoryListener
>> *listener,
>>   iommu->iommu_offset = section->offset_within_address_space -
>>     section->offset_within_region;
>>   iommu->hdev = dev;
>> -    memory_region_register_iommu_notifier(section->mr, &iommu->n,
>> -  &error_fatal);
>> +    ret = memory_region_register_iommu_notifier(section->mr,
>> &iommu->n, NULL);
>> +    if (ret) {
>> +    /*
>> + * Some vIOMMUs do not support dev-iotlb yet.  If so, try to
>> use the
>> + * UNMAP legacy message
>> + */
>> +    iommu->n.notifier_flags = IOMMU_NOTIFIER_UNMAP;
>> +    memory_region_register_iommu_notifier(section->mr, &iommu->n,
>> +  &error_fatal);
>> +    }
>>   QLIST_INSERT_HEAD(&dev->iommu_list, iommu, iommu_next);
>>   /* TODO: can replay help performance here? */
>>   }
>> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
>> index 6b9ef7f6b2b..c2883a2f6c8 100644
>> --- a/hw/virtio/virtio-iommu.c
>> +++ b/hw/virtio/virtio-iommu.c
>> @@ -893,6 +893,11 @@ static int
>> virtio_iommu_notify_flag_changed(IOMMUMemoryRegion *iommu_mr,
>>   IOMMUNotifierFlag new,
>>   Error **errp)
>>   {
>> +    if (new & IOMMU_NOTIFIER_DEVIOTLB_UNMAP) {
>> +    error_setg(errp, 

Re: [PATCH v2] hw/arm/smmuv3: Fix addr_mask for range-based invalidation

2021-01-31 Thread Auger Eric
Hi Zenghui,

On 1/30/21 5:32 AM, Zenghui Yu wrote:
> When handling guest range-based IOTLB invalidation, we should decode the TG
> field into the corresponding translation granule size so that we can pass
> the correct invalidation range to backend. Set @granule to (tg * 2 + 10) to
> properly emulate the architecture.
> 
> Fixes: d52915616c05 ("hw/arm/smmuv3: Get prepared for range invalidation")
> Signed-off-by: Zenghui Yu 
> ---
> * From v1:
>   - Fix the compilation error
> 
>  hw/arm/smmuv3.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
> index bbca0e9f20..98b99d4fe8 100644
> --- a/hw/arm/smmuv3.c
> +++ b/hw/arm/smmuv3.c
> @@ -801,7 +801,7 @@ static void smmuv3_notify_iova(IOMMUMemoryRegion *mr,
>  {
>  SMMUDevice *sdev = container_of(mr, SMMUDevice, iommu);
>  IOMMUTLBEvent event;
> -uint8_t granule = tg;
> +uint8_t granule;
>  
>  if (!tg) {
>  SMMUEventInfo event = {.inval_ste_allowed = true};
> @@ -821,6 +821,8 @@ static void smmuv3_notify_iova(IOMMUMemoryRegion *mr,
>  return;
>  }
>  granule = tt->granule_sz;
> +} else {
> +granule = tg * 2 + 10;
>  }
>  
>  event.type = IOMMU_NOTIFIER_UNMAP;
> 
Acked-by: Eric Auger 

Thanks

Eric
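
For reference, the decoding discussed above can be sketched as follows.
This is a standalone illustration and not the smmuv3.c code:
tg_to_granule_shift() and range_addr_mask() are hypothetical helper names,
and the mask formula assumes the invalidation covers num_pages pages of
the decoded granule. TG=0 is not handled here because in that case the
granule comes from the translation configuration instead.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Decode the non-zero TG field of a range invalidation command into the
 * log2 of the translation granule: 1 -> 12 (4KB), 2 -> 14 (16KB),
 * 3 -> 16 (64KB).
 */
static uint8_t tg_to_granule_shift(uint8_t tg)
{
    assert(tg >= 1 && tg <= 3);
    return (uint8_t)(tg * 2 + 10);
}

/* Address mask covering num_pages pages of the granule selected by tg. */
static uint64_t range_addr_mask(uint8_t tg, uint64_t num_pages)
{
    return num_pages * ((uint64_t)1 << tg_to_granule_shift(tg)) - 1;
}
```

Using the raw TG value as the shift, as the code did before this fix,
would give a mask of 1 for a single 4KB page (1 * (1 << 1) - 1) instead
of 0xfff.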




Re: [PATCH] hw/arm/smmuv3: Fix addr_mask for range-based invalidation

2021-01-29 Thread Auger Eric
Hi Zenghui,

On 1/29/21 1:15 PM, Zenghui Yu wrote:
> Hi Eric,
> 
> On 2021/1/29 5:30, Auger Eric wrote:
>> Hi Zenghui,
>>
>> On 1/28/21 9:25 AM, Auger Eric wrote:
>>> Hi Zenghui,
>>>
>>> On 12/25/20 10:50 AM, Zenghui Yu wrote:
>>>> When performing range-based IOTLB invalidation, we should decode the TG
>>>> field into the corresponding translation granule size so that we can
>>>> pass
>>>> the correct invalidation range to backend. Set @granule to (tg * 2 +
>>>> 10) to
>>>> properly emulate the architecture.
>>>>
>>>> Fixes: d52915616c05 ("hw/arm/smmuv3: Get prepared for range
>>>> invalidation")
>>>> Signed-off-by: Zenghui Yu 
>>>
>>> Good catch! I tested with older guest kernels though. I wonder how I did
>>> not face the bug?
>> Please ignore this wrong comment as this corresponds to recent kernels
>> instead. Still puzzled anyway ;-)
> 
> I noticed this when looking through your nested SMMU series and I didn't
> have much clue about the impact on the real setups.
> 
> I guess we may receive some unexpected fault events with this bug. But I
> think we may miss it for some reasons:
> 
>  - the stale TLB entries happen to be evicted due to heavy traffic
>  - some form of over-invalidation is performed by your implementation
>  - ...
Yep I will further trace things. Anyway thank you for spotting it.
> 
>>>> ---
>>>>   hw/arm/smmuv3.c | 4 +++-
>>>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
>>>> index bbca0e9f20..65231c7d52 100644
>>>> --- a/hw/arm/smmuv3.c
>>>> +++ b/hw/arm/smmuv3.c
>>>> @@ -801,7 +801,7 @@ static void smmuv3_notify_iova(IOMMUMemoryRegion
>>>> *mr,
>>>>   {
>>>>   SMMUDevice *sdev = container_of(mr, SMMUDevice, iommu);
>>>>   IOMMUTLBEvent event;
>>>> -    uint8_t granule = tg;
>>>> +    uint8_t granule;
>>>>     if (!tg) {
>>>>   SMMUEventInfo event = {.inval_ste_allowed = true};
>>>> @@ -821,6 +821,8 @@ static void smmuv3_notify_iova(IOMMUMemoryRegion
>>>> *mr,
>>>>   return;
>>>>   }
>>>>   granule = tt->granule_sz;
>>>> +    } else {
>>>> +    guanule = tg * 2 + 10;
>>> maybe just init granule to this value above while fixing the typo.
> 
> My intention is to initialize @granule to this value explicitly for the
> range-based invalidation case. But I'm okay with either way.
same for me ;-)

Eric
> 
> 
> Thanks,
> Zenghui
> 




Re: [PATCH] hw/arm/smmuv3: Fix addr_mask for range-based invalidation

2021-01-28 Thread Auger Eric
Hi Zenghui,

On 1/28/21 9:25 AM, Auger Eric wrote:
> Hi Zenghui,
> 
> On 12/25/20 10:50 AM, Zenghui Yu wrote:
>> When performing range-based IOTLB invalidation, we should decode the TG
>> field into the corresponding translation granule size so that we can pass
>> the correct invalidation range to backend. Set @granule to (tg * 2 + 10) to
>> properly emulate the architecture.
>>
>> Fixes: d52915616c05 ("hw/arm/smmuv3: Get prepared for range invalidation")
>> Signed-off-by: Zenghui Yu 
> 
> Good catch! I tested with older guest kernels though. I wonder how I did
> not face the bug?
Please ignore this wrong comment as this corresponds to recent kernels
instead. Still puzzled anyway ;-)

Eric
> 
> 
>> ---
>>  hw/arm/smmuv3.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
>> index bbca0e9f20..65231c7d52 100644
>> --- a/hw/arm/smmuv3.c
>> +++ b/hw/arm/smmuv3.c
>> @@ -801,7 +801,7 @@ static void smmuv3_notify_iova(IOMMUMemoryRegion *mr,
>>  {
>>  SMMUDevice *sdev = container_of(mr, SMMUDevice, iommu);
>>  IOMMUTLBEvent event;
>> -uint8_t granule = tg;
>> +uint8_t granule;
>>  
>>  if (!tg) {
>>  SMMUEventInfo event = {.inval_ste_allowed = true};
>> @@ -821,6 +821,8 @@ static void smmuv3_notify_iova(IOMMUMemoryRegion *mr,
>>  return;
>>  }
>>  granule = tt->granule_sz;
>> +} else {
>> +guanule = tg * 2 + 10;
> maybe just init granule to this value above while fixing the typo.
> 
> Thanks
> 
> Eric
>>  }
>>  
>>  event.type = IOMMU_NOTIFIER_UNMAP;
>>




Re: [PATCH] hw/arm/smmuv3: Fix addr_mask for range-based invalidation

2021-01-28 Thread Auger Eric
Hi Zenghui,

On 12/25/20 10:50 AM, Zenghui Yu wrote:
> When performing range-based IOTLB invalidation, we should decode the TG
> field into the corresponding translation granule size so that we can pass
> the correct invalidation range to backend. Set @granule to (tg * 2 + 10) to
> properly emulate the architecture.
> 
> Fixes: d52915616c05 ("hw/arm/smmuv3: Get prepared for range invalidation")
> Signed-off-by: Zenghui Yu 

Good catch! I tested with older guest kernels though. I wonder how I did
not face the bug?


> ---
>  hw/arm/smmuv3.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
> index bbca0e9f20..65231c7d52 100644
> --- a/hw/arm/smmuv3.c
> +++ b/hw/arm/smmuv3.c
> @@ -801,7 +801,7 @@ static void smmuv3_notify_iova(IOMMUMemoryRegion *mr,
>  {
>  SMMUDevice *sdev = container_of(mr, SMMUDevice, iommu);
>  IOMMUTLBEvent event;
> -uint8_t granule = tg;
> +uint8_t granule;
>  
>  if (!tg) {
>  SMMUEventInfo event = {.inval_ste_allowed = true};
> @@ -821,6 +821,8 @@ static void smmuv3_notify_iova(IOMMUMemoryRegion *mr,
>  return;
>  }
>  granule = tt->granule_sz;
> +} else {
> +guanule = tg * 2 + 10;
maybe just init granule to this value above while fixing the typo.

Thanks

Eric
>  }
>  
>  event.type = IOMMU_NOTIFIER_UNMAP;
> 




Re: [RFC v7 09/26] vfio: Force nested if iommu requires it

2020-12-01 Thread Auger Eric
Hi Kunkun,

On 11/28/20 10:01 AM, Kunkun Jiang wrote:
> Hi Eric,
>> @@ -1668,6 +1679,14 @@ static int vfio_connect_container(VFIOGroup *group, 
>> AddressSpace *as,
>>  VFIOContainer *container;
>>  int ret, fd;
>>  VFIOAddressSpace *space;
>> +IOMMUMemoryRegion *iommu_mr;
>> +bool nested = false;
>> +
>> +if (as != &address_space_memory && memory_region_is_iommu(as->root)) {
>> +iommu_mr = IOMMU_MEMORY_REGION(as->root);
>> +memory_region_iommu_get_attr(iommu_mr, IOMMU_ATTR_VFIO_NESTED,
>> + (void *)&nested);
>> +}
>>  
>>  space = vfio_get_address_space(as);
> Is the condition "as != &address_space_memory" needed to determine whether
> a vIOMMU is in place? I think "memory_region_is_iommu(as->root)" is enough.
> 
> Looking forward to your reply.:)

Yes I think so.

Thank you for your report!

Eric
> 
> Thanks,
> 
> Kunkun Jiang
> 




Re: [PATCH] hw/arm/smmuv3: Fix up L1STD_SPAN decoding

2020-11-30 Thread Auger Eric
Hi Kunkun, Peter,

On 11/30/20 12:29 PM, Peter Maydell wrote:
> On Tue, 24 Nov 2020 at 02:37, Kunkun Jiang  wrote:
>>
>> According to the SMMUv3 spec, the SPAN field of Level1 Stream Table
>> Descriptor is 5 bits([4:0]).
>>
>> Fixes: 9bde7f0674f(hw/arm/smmuv3: Implement translate callback)
>> Signed-off-by: Kunkun Jiang 
>> ---
Acked-by: Eric Auger 
> 
> 
> Applied to target-arm.next for 6.0, thanks.
thanks and sorry for the delay

Eric
> 
> -- PMM
> 
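
The point of the fix is simply that SPAN occupies bits [4:0], so the mask must cover five bits. A small standalone sketch of such a field extraction follows; the helper names are hypothetical and this is not the macro used in QEMU.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative bit-field extraction: return 'length' bits of 'value'
 * starting at bit 'start'. This is a local helper written for the
 * example, not a QEMU API.
 */
static uint32_t field32(uint32_t value, unsigned start, unsigned length)
{
    assert(length >= 1 && length <= 32 && start + length <= 32);
    uint32_t mask = (length == 32) ? UINT32_MAX
                                   : ((UINT32_C(1) << length) - 1);
    return (value >> start) & mask;
}

/* SPAN is the low five bits ([4:0]) of the first descriptor word. */
static uint32_t l1std_span(uint32_t word0)
{
    return field32(word0, 0, 5);
}
```

A narrower mask would silently truncate large SPAN values, while a wider one would pick up neighbouring bits of the descriptor.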




Re: [PATCH v2] hw/arm/virt enable support for virtio-mem

2020-11-09 Thread Auger Eric
Hi Jonathan,

On 11/5/20 6:43 PM, Jonathan Cameron wrote:
> Basically a cut and paste job from the x86 support with the exception of
> needing a larger block size as the Memory Block Size (MIN_SECTION_SIZE)
> on ARM64 in Linux is 1G.
> 
> Tested:
> * In full emulation and with KVM on an arm64 server.
> * cold and hotplug for the virtio-mem-pci device.
> * Wide range of memory sizes, added at creation and later.
> * Fairly basic memory usage of memory added.  Seems to function as normal.
> * NUMA setup with virtio-mem-pci devices on each node.
> * Simple migration test.

I would add in the commit message that the hot-unplug of the device is
not supported.
> 
> Related kernel patch just enables the Kconfig item for ARM64 as an
> alternative to x86 in drivers/virtio/Kconfig
> 
> The original patches from David Hildenbrand stated that he thought it should
> work for ARM64 but it wasn't enabled in the kernel [1]
> It appears he was correct and everything 'just works'.
Did you try with 64kB page guest as well?
> 
> The build system related stuff is intended to ensure virtio-mem support is
> not built for arm32 (build will fail due to no defined block size).
> If there is a more elegant way to do this, please point me in the right
> direction.
I guess you meant CONFIG_ARCH_VIRTIO_MEM_SUPPORTED introduction
> 
> [1] https://lore.kernel.org/linux-mm/20191212171137.13872-1-da...@redhat.com/
> 
> Signed-off-by: Jonathan Cameron 
> ---
>  default-configs/devices/aarch64-softmmu.mak |  1 +
>  hw/arm/Kconfig  |  1 +
>  hw/arm/virt.c   | 64 -
>  hw/virtio/Kconfig   |  4 ++
>  hw/virtio/virtio-mem.c  |  2 +
>  5 files changed, 71 insertions(+), 1 deletion(-)
> 
> diff --git a/default-configs/devices/aarch64-softmmu.mak 
> b/default-configs/devices/aarch64-softmmu.mak
> index 958b1e08e4..31d6128a29 100644
> --- a/default-configs/devices/aarch64-softmmu.mak
> +++ b/default-configs/devices/aarch64-softmmu.mak
> @@ -6,3 +6,4 @@ include arm-softmmu.mak
>  CONFIG_XLNX_ZYNQMP_ARM=y
>  CONFIG_XLNX_VERSAL=y
>  CONFIG_SBSA_REF=y
> +CONFIG_ARCH_VIRTIO_MEM_SUPPORTED=y
> diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
> index fdf4464b94..eeae77eee9 100644
> --- a/hw/arm/Kconfig
> +++ b/hw/arm/Kconfig
> @@ -20,6 +20,7 @@ config ARM_VIRT
>  select PLATFORM_BUS
>  select SMBIOS
>  select VIRTIO_MMIO
> +select VIRTIO_MEM_SUPPORTED if ARCH_VIRTIO_MEM_SUPPORTED
>  select ACPI_PCI
>  select MEM_DEVICE
>  select DIMM
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 8abb385d4e..6c96d71106 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -73,9 +73,11 @@
>  #include "hw/acpi/acpi.h"
>  #include "target/arm/internals.h"
>  #include "hw/mem/pc-dimm.h"
> +#include "hw/mem/memory-device.h"
>  #include "hw/mem/nvdimm.h"
>  #include "hw/acpi/generic_event_device.h"
>  #include "hw/virtio/virtio-iommu.h"
> +#include "hw/virtio/virtio-mem-pci.h"
>  #include "hw/char/pl011.h"
>  #include "qemu/guest-random.h"
>  
> @@ -2286,6 +2288,34 @@ static void virt_memory_plug(HotplugHandler 
> *hotplug_dev,
>   dev, &error_abort);
>  }
>  
> +static void virt_virtio_md_pci_pre_plug(HotplugHandler *hotplug_dev,
> +  DeviceState *dev, Error **errp)
> +{
> +HotplugHandler *hotplug_dev2 = qdev_get_bus_hotplug_handler(dev);
> +Error *local_err = NULL;
> +
> +if (!hotplug_dev2 && dev->hotplugged) {
> +/*
> + * Without a bus hotplug handler, we cannot control the plug/unplug
> + * order. We should never reach this point when hotplugging,
> + * however, better add a safety net.
> + */
> +error_setg(errp, "hotplug of virtio-mem based memory devices not"
> +   " supported on this bus.");
> +return;
> +}
> +/*
> + * First, see if we can plug this memory device at all. If that
> + * succeeds, branch of to the actual hotplug handler.
> + */
> +memory_device_pre_plug(MEMORY_DEVICE(dev), MACHINE(hotplug_dev), NULL,
> +   &local_err);
> +if (!local_err && hotplug_dev2) {
> +hotplug_handler_pre_plug(hotplug_dev2, dev, &local_err);
> +}
> +error_propagate(errp, local_err);
> +}
> +
>  static void virt_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,
>  DeviceState *dev, Error **errp)
>  {
> @@ -2293,6 +2323,8 @@ static void 
> virt_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,
>  
>  if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
>  virt_memory_pre_plug(hotplug_dev, dev, errp);
> +} else if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_MEM_PCI)) {
> +virt_virtio_md_pci_pre_plug(hotplug_dev, dev, errp);
>  } else if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_IOMMU_PCI)) {
>  hwaddr db_start = 0, db_end = 0;
>  char 

Re: [PATCH-for-5.2 v3 0/7] util/vfio-helpers: Generic code strengthening

2020-11-03 Thread Auger Eric
Hi Philippe,
On 11/3/20 3:07 AM, Philippe Mathieu-Daudé wrote:
> v3:
> - Extract reviewed patches from
>   "util/vfio-helpers: Allow using multiple MSIX IRQs"
> - Added "Assert offset is aligned to page size"
>   which would have helped debugging:
>   "block/nvme: Fix use of write-only doorbells page on Aarch64 arch"
> 
> Missing review: 7
> 
> Based-on: <20201029093306.1063879-1-phi...@redhat.com>
Tested-by: Eric Auger 

Thanks

Eric
> 
> Philippe Mathieu-Daudé (7):
>   util/vfio-helpers: Improve reporting unsupported IOMMU type
>   util/vfio-helpers: Trace PCI I/O config accesses
>   util/vfio-helpers: Trace PCI BAR region info
>   util/vfio-helpers: Trace where BARs are mapped
>   util/vfio-helpers: Improve DMA trace events
>   util/vfio-helpers: Convert vfio_dump_mapping to trace events
>   util/vfio-helpers: Assert offset is aligned to page size
> 
>  util/vfio-helpers.c | 43 ++-
>  util/trace-events   | 10 --
>  2 files changed, 34 insertions(+), 19 deletions(-)
> 




Re: [PATCH-for-5.2] hw/arm/smmuv3: Fix potential integer overflow (CID 1432363)

2020-10-30 Thread Auger Eric
Hi Philippe,

On 10/30/20 3:46 PM, Philippe Mathieu-Daudé wrote:
> Use the BIT_ULL() macro to ensure we use 64-bit arithmetic.
> This fixes the following Coverity issue (OVERFLOW_BEFORE_WIDEN):
> 
>   CID 1432363 (#1 of 1): Unintentional integer overflow:
> 
>   overflow_before_widen:
> Potentially overflowing expression 1 << scale with type int
> (32 bits, signed) is evaluated using 32-bit arithmetic, and
> then used in a context that expects an expression of type
> hwaddr (64 bits, unsigned).
> 
> Signed-off-by: Philippe Mathieu-Daudé 
Acked-by: Eric Auger 

Thanks!

Eric
> ---
>  hw/arm/smmuv3.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
> index 2017ba7a5a7..22607c37841 100644
> --- a/hw/arm/smmuv3.c
> +++ b/hw/arm/smmuv3.c
> @@ -17,6 +17,7 @@
>   */
>  
>  #include "qemu/osdep.h"
> +#include "qemu/bitops.h"
>  #include "hw/irq.h"
>  #include "hw/sysbus.h"
>  #include "migration/vmstate.h"
> @@ -864,7 +865,7 @@ static void smmuv3_s1_range_inval(SMMUState *s, Cmd *cmd)
>  scale = CMD_SCALE(cmd);
>  num = CMD_NUM(cmd);
>  ttl = CMD_TTL(cmd);
> -num_pages = (num + 1) * (1 << (scale));
> +num_pages = (num + 1) * BIT_ULL(scale);
>  }
>  
>  if (type == SMMU_CMD_TLBI_NH_VA) {
> 
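
To illustrate why the widening has to happen before the shift, here is a minimal standalone sketch. EXAMPLE_BIT_ULL is a local stand-in for QEMU's BIT_ULL() macro and the function name is hypothetical; the 5-bit width of NUM and SCALE is an assumption about the command encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for BIT_ULL(): the constant 1 is already 64-bit wide. */
#define EXAMPLE_BIT_ULL(nr) (1ULL << (nr))

/*
 * Hypothetical helper: number of pages covered by a range invalidation,
 * computed entirely in 64-bit arithmetic. With "1 << scale" the shift
 * would be evaluated as a 32-bit signed int and only widened afterwards,
 * which is what Coverity flags as OVERFLOW_BEFORE_WIDEN.
 */
static uint64_t range_inval_num_pages(uint8_t num, uint8_t scale)
{
    assert(scale < 64);
    return ((uint64_t)num + 1) * EXAMPLE_BIT_ULL(scale);
}
```

Assuming 5-bit NUM and SCALE fields, the largest result is 32 * 2^31 = 2^36, which does not fit in 32 bits.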




Re: [PATCH 00/25] block/nvme: Fix Aarch64 host

2020-10-28 Thread Auger Eric
Hi Philippe,

On 10/27/20 2:55 PM, Philippe Mathieu-Daudé wrote:
> Add a bit of tracing and clean up around, to finally fix a few bugs.
> In particular, restore NVMe on Aarch64 host.
> 
> Eric Auger (4):
>   block/nvme: Change size and alignment of IDENTIFY response buffer
>   block/nvme: Change size and alignment of queue
>   block/nvme: Change size and alignment of prp_list_pages
>   block/nvme: Align iov's va and size on host page size
> 
> Philippe Mathieu-Daudé (21):
>   MAINTAINERS: Cover 'block/nvme.h' file
>   block/nvme: Use hex format to display offset in trace events
>   block/nvme: Report warning with warn_report()
>   block/nvme: Trace controller capabilities
>   block/nvme: Trace nvme_poll_queue() per queue
>   block/nvme: Improve nvme_free_req_queue_wait() trace information
>   block/nvme: Trace queue pair creation/deletion
>   block/nvme: Simplify device reset
>   block/nvme: Move definitions before structure declarations
>   block/nvme: Use unsigned integer for queue counter/size
>   block/nvme: Make nvme_identify() return boolean indicating error
>   block/nvme: Make nvme_init_queue() return boolean indicating error
>   block/nvme: Introduce Completion Queue definitions
>   block/nvme: Use definitions instead of magic values in add_io_queue()
>   block/nvme: Correctly initialize Admin Queue Attributes
>   block/nvme: Simplify ADMIN queue access
>   block/nvme: Simplify nvme_cmd_sync()
>   block/nvme: Pass AioContext argument to nvme_add_io_queue()
>   block/nvme: Set request_alignment at initialization
>   block/nvme: Correct minimum device page size
>   block/nvme: Fix use of write-only doorbells page on Aarch64 arch
> 
>  include/block/nvme.h |  17 ++--
>  block/nvme.c | 208 ---
>  MAINTAINERS  |   2 +
>  block/trace-events   |  30 ---
>  4 files changed, 148 insertions(+), 109 deletions(-)
> 

I have tested the series on ARM with both 4kB and 64kB pages and it
works for me.

Feel free to add:
Tested-by: Eric Auger 

Thanks

Eric




Re: [RFC PATCH 25/25] block/nvme: Fix use of write-only doorbells page on Aarch64 arch

2020-10-28 Thread Auger Eric
Hi,

On 10/27/20 2:55 PM, Philippe Mathieu-Daudé wrote:
> qemu_vfio_pci_map_bar() calls mmap(), and mmap(2) states:
> 
>   'offset' must be a multiple of the page size as returned
>by sysconf(_SC_PAGE_SIZE).
> 
> In commit f68453237b9 we started to use an offset of 4K which
> broke this contract on Aarch64 arch.
> 
> Fix by mapping at offset 0, and accessing doorbells at offset=4K.
> 
> Fixes: f68453237b9 ("block/nvme: Map doorbells pages write-only")
> Reported-by: Eric Auger 
> Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Eric Auger 

Eric

> ---
>  block/nvme.c | 11 +++
>  1 file changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/block/nvme.c b/block/nvme.c
> index c1c52bae44f..ff645eefe6a 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -94,6 +94,7 @@ typedef struct {
>  struct BDRVNVMeState {
>  AioContext *aio_context;
>  QEMUVFIOState *vfio;
> +void *bar0_wo_map;
>  /* Memory mapped registers */
>  volatile struct {
>  uint32_t sq_tail;
> @@ -778,8 +779,10 @@ static int nvme_init(BlockDriverState *bs, const char 
> *device, int namespace,
>  }
>  }
>  
> -s->doorbells = qemu_vfio_pci_map_bar(s->vfio, 0, sizeof(NvmeBar),
> - NVME_DOORBELL_SIZE, PROT_WRITE, 
> errp);
> +s->bar0_wo_map = qemu_vfio_pci_map_bar(s->vfio, 0, 0,
> +   sizeof(NvmeBar) + 
> NVME_DOORBELL_SIZE,
> +   PROT_WRITE, errp);
> +s->doorbells = (void *)((uintptr_t)s->bar0_wo_map + sizeof(NvmeBar));
>  if (!s->doorbells) {
>  ret = -EINVAL;
>  goto out;
> @@ -913,8 +916,8 @@ static void nvme_close(BlockDriverState *bs)
> &s->irq_notifier[MSIX_SHARED_IRQ_IDX],
> false, NULL, NULL);
>  event_notifier_cleanup(&s->irq_notifier[MSIX_SHARED_IRQ_IDX]);
> -qemu_vfio_pci_unmap_bar(s->vfio, 0, (void *)s->doorbells,
> -sizeof(NvmeBar), NVME_DOORBELL_SIZE);
> +qemu_vfio_pci_unmap_bar(s->vfio, 0, s->bar0_wo_map,
> +0, sizeof(NvmeBar) + NVME_DOORBELL_SIZE);
>  qemu_vfio_close(s->vfio);
>  
>  g_free(s->device);
> 
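
The approach in this patch (map from a page-aligned offset, then reach the doorbells by pointer arithmetic) can be sketched generically as below. This is a standalone illustration with hypothetical names, not the QEMU helper; the 4K register block and 4K doorbell sizes used in the usage note are assumptions for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Result of rounding a (offset, len) window down to a page boundary. */
typedef struct {
    uint64_t map_offset; /* page-aligned offset to hand to mmap() */
    uint64_t delta;      /* where the wanted data starts inside the mapping */
    uint64_t map_len;    /* length to map so the whole window is covered */
} AlignedWindow;

/*
 * Hypothetical helper: given the offset and length actually wanted and
 * the host page size, compute a mapping request that satisfies the
 * mmap(2) requirement that 'offset' be a multiple of the page size.
 */
static AlignedWindow aligned_window(uint64_t offset, uint64_t len,
                                    uint64_t page_size)
{
    AlignedWindow w;

    /* Page sizes are powers of two, so masking rounds down. */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    w.map_offset = offset & ~(page_size - 1);
    w.delta = offset - w.map_offset;
    w.map_len = w.delta + len;
    return w;
}
```

For a 4K window at offset 4K on a host with 64kB pages this yields map_offset 0, delta 4K and map_len 8K, which matches mapping BAR0 from offset 0 and adding the register block size to the returned pointer. On a 4kB-page host the same request is already aligned and delta is 0.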




Re: [PATCH 18/25] block/nvme: Pass AioContext argument to nvme_add_io_queue()

2020-10-28 Thread Auger Eric
Hi Philippe,

On 10/27/20 2:55 PM, Philippe Mathieu-Daudé wrote:
> We want to get ride of the BlockDriverState pointer at some point,
s/ride/rid
> so pass aio_context along.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  block/nvme.c | 9 +
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/block/nvme.c b/block/nvme.c
> index 68f0c3f7959..a0871fc2a81 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -644,7 +644,9 @@ static void nvme_handle_event(EventNotifier *n)
>  nvme_poll_queues(s);
>  }
>  
> -static bool nvme_add_io_queue(BlockDriverState *bs, Error **errp)
> +/* Returns true on success, false on failure. */
belongs to another patch, still not a big fan of bool ;-)
> +static bool nvme_add_io_queue(BlockDriverState *bs,
> +  AioContext *aio_context, Error **errp)
>  {
>  BDRVNVMeState *s = bs->opaque;
>  unsigned n = s->queue_count;
> @@ -653,8 +655,7 @@ static bool nvme_add_io_queue(BlockDriverState *bs, Error 
> **errp)
>  unsigned queue_size = NVME_QUEUE_SIZE;
>  
>  assert(n <= UINT16_MAX);
> -q = nvme_create_queue_pair(s, bdrv_get_aio_context(bs),
> -   n, queue_size, errp);
> +q = nvme_create_queue_pair(s, aio_context, n, queue_size, errp);
>  if (!q) {
>  return false;
>  }
> @@ -830,7 +831,7 @@ static int nvme_init(BlockDriverState *bs, const char 
> *device, int namespace,
>  }
>  
>  /* Set up command queues. */
> -if (!nvme_add_io_queue(bs, errp)) {
> +    if (!nvme_add_io_queue(bs, aio_context, errp)) {
>  ret = -EIO;
>  }
>  out:
> 
Besides
Reviewed-by: Eric Auger 

Eric




Re: [PATCH 17/25] block/nvme: Simplify nvme_cmd_sync()

2020-10-28 Thread Auger Eric



On 10/27/20 2:55 PM, Philippe Mathieu-Daudé wrote:
> As all commands use the ADMIN queue, it is pointless to pass
> it as argument each time. Remove the argument.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Eric Auger 

Eric
> ---
>  block/nvme.c | 15 ---
>  1 file changed, 8 insertions(+), 7 deletions(-)
> 
> diff --git a/block/nvme.c b/block/nvme.c
> index 2d3648694b0..68f0c3f7959 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -487,9 +487,10 @@ static void nvme_cmd_sync_cb(void *opaque, int ret)
>  aio_wait_kick();
>  }
>  
> -static int nvme_cmd_sync(BlockDriverState *bs, NVMeQueuePair *q,
> - NvmeCmd *cmd)
> +static int nvme_cmd_sync(BlockDriverState *bs, NvmeCmd *cmd)
>  {
> +BDRVNVMeState *s = bs->opaque;
> +NVMeQueuePair *q = s->queues[INDEX_ADMIN];
>  AioContext *aio_context = bdrv_get_aio_context(bs);
>  NVMeRequest *req;
>  int ret = -EINPROGRESS;
> @@ -534,7 +535,7 @@ static bool nvme_identify(BlockDriverState *bs, int 
> namespace, Error **errp)
>  
>  memset(id, 0, sizeof(*id));
>  cmd.dptr.prp1 = cpu_to_le64(iova);
> -if (nvme_cmd_sync(bs, s->queues[INDEX_ADMIN], &cmd)) {
> +if (nvme_cmd_sync(bs, &cmd)) {
>  error_setg(errp, "Failed to identify controller");
>  goto out;
>  }
> @@ -557,7 +558,7 @@ static bool nvme_identify(BlockDriverState *bs, int 
> namespace, Error **errp)
>  memset(id, 0, sizeof(*id));
>  cmd.cdw10 = 0;
>  cmd.nsid = cpu_to_le32(namespace);
> -if (nvme_cmd_sync(bs, s->queues[INDEX_ADMIN], &cmd)) {
> +if (nvme_cmd_sync(bs, &cmd)) {
>  error_setg(errp, "Failed to identify namespace");
>  goto out;
>  }
> @@ -663,7 +664,7 @@ static bool nvme_add_io_queue(BlockDriverState *bs, Error 
> **errp)
>  .cdw10 = cpu_to_le32(((queue_size - 1) << 16) | n),
>  .cdw11 = cpu_to_le32(NVME_CQ_IEN | NVME_CQ_PC),
>  };
> -if (nvme_cmd_sync(bs, s->queues[INDEX_ADMIN], &cmd)) {
> +if (nvme_cmd_sync(bs, &cmd)) {
>  error_setg(errp, "Failed to create CQ io queue [%u]", n);
>  goto out_error;
>  }
> @@ -673,7 +674,7 @@ static bool nvme_add_io_queue(BlockDriverState *bs, Error 
> **errp)
>  .cdw10 = cpu_to_le32(((queue_size - 1) << 16) | n),
>  .cdw11 = cpu_to_le32(NVME_SQ_PC | (n << 16)),
>  };
> -if (nvme_cmd_sync(bs, s->queues[INDEX_ADMIN], &cmd)) {
> +if (nvme_cmd_sync(bs, &cmd)) {
>  error_setg(errp, "Failed to create SQ io queue [%u]", n);
>  goto out_error;
>  }
> @@ -889,7 +890,7 @@ static int 
> nvme_enable_disable_write_cache(BlockDriverState *bs, bool enable,
>  .cdw11 = cpu_to_le32(enable ? 0x01 : 0x00),
>  };
>  
> -ret = nvme_cmd_sync(bs, s->queues[INDEX_ADMIN], &cmd);
> +ret = nvme_cmd_sync(bs, &cmd);
>  if (ret) {
>  error_setg(errp, "Failed to configure NVMe write cache");
>  }
> 




Re: [PATCH 16/25] block/nvme: Simplify ADMIN queue access

2020-10-28 Thread Auger Eric



On 10/27/20 2:55 PM, Philippe Mathieu-Daudé wrote:
> We don't need to dereference from BDRVNVMeState each time.
> Use a NVMeQueuePair pointer to the admin queue and use it.
double "use"
> The nvme_init() becomes easier to review, matching the style
> of nvme_add_io_queue().
> 
> Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Eric Auger 

Eric

> ---
>  block/nvme.c | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/block/nvme.c b/block/nvme.c
> index d5df30ec074..2d3648694b0 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -699,6 +699,7 @@ static int nvme_init(BlockDriverState *bs, const char 
> *device, int namespace,
>   Error **errp)
>  {
>  BDRVNVMeState *s = bs->opaque;
> +NVMeQueuePair *q;
>  AioContext *aio_context = bdrv_get_aio_context(bs);
>  int ret;
>  uint64_t cap;
> @@ -781,19 +782,18 @@ static int nvme_init(BlockDriverState *bs, const char 
> *device, int namespace,
>  
>  /* Set up admin queue. */
>  s->queues = g_new(NVMeQueuePair *, 1);
> -s->queues[INDEX_ADMIN] = nvme_create_queue_pair(s, aio_context, 0,
> -  NVME_QUEUE_SIZE,
> -  errp);
> -if (!s->queues[INDEX_ADMIN]) {
> +q = nvme_create_queue_pair(s, aio_context, 0, NVME_QUEUE_SIZE, errp);
> +if (!q) {
>  ret = -EINVAL;
>  goto out;
>  }
> +s->queues[INDEX_ADMIN] = q;
>  s->queue_count = 1;
>  QEMU_BUILD_BUG_ON((NVME_QUEUE_SIZE - 1) & 0xF000);
>  regs->aqa = cpu_to_le32(((NVME_QUEUE_SIZE - 1) << AQA_ACQS_SHIFT) |
>  ((NVME_QUEUE_SIZE - 1) << AQA_ASQS_SHIFT));
> -regs->asq = cpu_to_le64(s->queues[INDEX_ADMIN]->sq.iova);
> -regs->acq = cpu_to_le64(s->queues[INDEX_ADMIN]->cq.iova);
> +regs->asq = cpu_to_le64(q->sq.iova);
> +regs->acq = cpu_to_le64(q->cq.iova);
>  
>  /* After setting up all control registers we can enable device now. */
>  regs->cc = cpu_to_le32((ctz32(NVME_CQ_ENTRY_BYTES) << CC_IOCQES_SHIFT) |
> 




Re: [PATCH 15/25] block/nvme: Correctly initialize Admin Queue Attributes

2020-10-28 Thread Auger Eric



On 10/27/20 2:55 PM, Philippe Mathieu-Daudé wrote:
> From the specification chapter 3.1.8 "AQA - Admin Queue Attributes"
> the Admin Submission Queue Size field is a 0’s based value:
> 
>   Admin Submission Queue Size (ASQS):
> 
> Defines the size of the Admin Submission Queue in entries.
> Enabling a controller while this field is cleared to 00h
> produces undefined results. The minimum size of the Admin
> Submission Queue is two entries. The maximum size of the
> Admin Submission Queue is 4096 entries.
> This is a 0’s based value.
> 
> This bug has never been hit because the device initialization
> uses a single command synchronously :)
> 
> Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Eric Auger 


Eric

> ---
>  block/nvme.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/block/nvme.c b/block/nvme.c
> index 2dfcf8c41d7..d5df30ec074 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -789,9 +789,9 @@ static int nvme_init(BlockDriverState *bs, const char 
> *device, int namespace,
>  goto out;
>  }
>  s->queue_count = 1;
> -QEMU_BUILD_BUG_ON(NVME_QUEUE_SIZE & 0xF000);
> -regs->aqa = cpu_to_le32((NVME_QUEUE_SIZE << AQA_ACQS_SHIFT) |
> -(NVME_QUEUE_SIZE << AQA_ASQS_SHIFT));
> +QEMU_BUILD_BUG_ON((NVME_QUEUE_SIZE - 1) & 0xF000);
> +regs->aqa = cpu_to_le32(((NVME_QUEUE_SIZE - 1) << AQA_ACQS_SHIFT) |
> +((NVME_QUEUE_SIZE - 1) << AQA_ASQS_SHIFT));
>  regs->asq = cpu_to_le64(s->queues[INDEX_ADMIN]->sq.iova);
>  regs->acq = cpu_to_le64(s->queues[INDEX_ADMIN]->cq.iova);
>  
> 
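
The 0's based encoding described in this patch can be sketched as a small standalone helper. The EX_* shift values are assumptions taken from the AQA register layout (ASQS in bits 11:0, ACQS in bits 27:16), and the function name is hypothetical; the byte-order conversion done by cpu_to_le32() in the driver is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field positions in the AQA register. */
#define EX_AQA_ASQS_SHIFT 0
#define EX_AQA_ACQS_SHIFT 16

/*
 * Hypothetical helper: build the AQA value from queue sizes given in
 * entries. Both fields are 0's based, so a queue of N entries is
 * programmed as N - 1. The quoted text gives 2 and 4096 as the limits
 * for the submission queue; the same limits are assumed for the
 * completion queue here.
 */
static uint32_t encode_aqa(uint32_t sq_entries, uint32_t cq_entries)
{
    assert(sq_entries >= 2 && sq_entries <= 4096);
    assert(cq_entries >= 2 && cq_entries <= 4096);
    return ((cq_entries - 1) << EX_AQA_ACQS_SHIFT) |
           ((sq_entries - 1) << EX_AQA_ASQS_SHIFT);
}
```

Note that a 4096-entry queue only fits the 12-bit field because of the minus one: 4095 is 0xFFF, whereas 4096 would spill into bit 12, which is what the updated QEMU_BUILD_BUG_ON() guards against.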




Re: [PATCH 14/25] block/nvme: Use definitions instead of magic values in add_io_queue()

2020-10-28 Thread Auger Eric
Hi Philippe,

On 10/27/20 2:55 PM, Philippe Mathieu-Daudé wrote:
> Replace magic values by definitions, and simplify since the
> number of queues will never reach 64K.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Eric Auger 

Eric

> ---
>  block/nvme.c | 9 +
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/block/nvme.c b/block/nvme.c
> index 9324f0bfdc4..2dfcf8c41d7 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -651,6 +651,7 @@ static bool nvme_add_io_queue(BlockDriverState *bs, Error 
> **errp)
>  NvmeCmd cmd;
>  unsigned queue_size = NVME_QUEUE_SIZE;
>  
> +assert(n <= UINT16_MAX);
>  q = nvme_create_queue_pair(s, bdrv_get_aio_context(bs),
> n, queue_size, errp);
>  if (!q) {
> @@ -659,8 +660,8 @@ static bool nvme_add_io_queue(BlockDriverState *bs, Error 
> **errp)
>  cmd = (NvmeCmd) {
>  .opcode = NVME_ADM_CMD_CREATE_CQ,
>  .dptr.prp1 = cpu_to_le64(q->cq.iova),
> -.cdw10 = cpu_to_le32(((queue_size - 1) << 16) | (n & 0xFFFF)),
> -.cdw11 = cpu_to_le32(0x3),
> +.cdw10 = cpu_to_le32(((queue_size - 1) << 16) | n),
> +.cdw11 = cpu_to_le32(NVME_CQ_IEN | NVME_CQ_PC),
>  };
>  if (nvme_cmd_sync(bs, s->queues[INDEX_ADMIN], &cmd)) {
>  error_setg(errp, "Failed to create CQ io queue [%u]", n);
> @@ -669,8 +670,8 @@ static bool nvme_add_io_queue(BlockDriverState *bs, Error 
> **errp)
>  cmd = (NvmeCmd) {
>  .opcode = NVME_ADM_CMD_CREATE_SQ,
>  .dptr.prp1 = cpu_to_le64(q->sq.iova),
> -.cdw10 = cpu_to_le32(((queue_size - 1) << 16) | (n & 0xFFFF)),
> -.cdw11 = cpu_to_le32(0x1 | (n << 16)),
> +.cdw10 = cpu_to_le32(((queue_size - 1) << 16) | n),
> +.cdw11 = cpu_to_le32(NVME_SQ_PC | (n << 16)),
>  };
>  if (nvme_cmd_sync(bs, s->queues[INDEX_ADMIN], &cmd)) {
>  error_setg(errp, "Failed to create SQ io queue [%u]", n);
> 
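
As a reading aid, the command dwords built in this patch can be reproduced with the standalone sketch below. The EX_* constants mirror the values of the flag definitions introduced in patch 13 of the series (PC = 1, IEN = 2); the function names are hypothetical and the values are in host byte order, without the cpu_to_le32() conversion.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the flag values, named EX_* to mark them as examples. */
enum {
    EX_NVME_CQ_PC  = 1, /* physically contiguous */
    EX_NVME_CQ_IEN = 2, /* interrupts enabled */
    EX_NVME_SQ_PC  = 1, /* physically contiguous */
};

/* CDW10 for both create-queue commands: 0's based size and queue id. */
static uint32_t create_queue_cdw10(unsigned queue_size, unsigned qid)
{
    assert(queue_size >= 1 && queue_size <= 65536);
    assert(qid <= UINT16_MAX); /* same bound the patch asserts on n */
    return ((uint32_t)(queue_size - 1) << 16) | qid;
}

/* CDW11 for Create I/O Completion Queue, replacing the magic 0x3. */
static uint32_t create_cq_cdw11(void)
{
    return EX_NVME_CQ_IEN | EX_NVME_CQ_PC;
}

/* CDW11 for Create I/O Submission Queue: flags plus the paired CQ id. */
static uint32_t create_sq_cdw11(unsigned cqid)
{
    assert(cqid <= UINT16_MAX);
    return EX_NVME_SQ_PC | ((uint32_t)cqid << 16);
}
```

Because the queue index is asserted to fit in 16 bits, the explicit 16-bit mask on n becomes redundant, which is the simplification the commit message refers to.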




Re: [PATCH 13/25] block/nvme: Introduce Completion Queue definitions

2020-10-28 Thread Auger Eric
Hi,

On 10/27/20 2:55 PM, Philippe Mathieu-Daudé wrote:
> Rename Submission Queue flags with 'Sq' 
... to differentiate submission queue flags from command queue flags.

and introduce
> Completion Queue flag definitions.

besides
Reviewed-by: Eric Auger 

Thanks

Eric
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  include/block/nvme.h | 17 +++--
>  1 file changed, 11 insertions(+), 6 deletions(-)
> 
> diff --git a/include/block/nvme.h b/include/block/nvme.h
> index 65e68a82c89..079f884a2d3 100644
> --- a/include/block/nvme.h
> +++ b/include/block/nvme.h
> @@ -491,6 +491,11 @@ typedef struct QEMU_PACKED NvmeCreateCq {
>  #define NVME_CQ_FLAGS_PC(cq_flags)  (cq_flags & 0x1)
>  #define NVME_CQ_FLAGS_IEN(cq_flags) ((cq_flags >> 1) & 0x1)
>  
> +enum NvmeFlagsCq {
> +NVME_CQ_PC  = 1,
> +NVME_CQ_IEN = 2,
> +};
> +
>  typedef struct QEMU_PACKED NvmeCreateSq {
>  uint8_t opcode;
>  uint8_t flags;
> @@ -508,12 +513,12 @@ typedef struct QEMU_PACKED NvmeCreateSq {
>  #define NVME_SQ_FLAGS_PC(sq_flags)  (sq_flags & 0x1)
>  #define NVME_SQ_FLAGS_QPRIO(sq_flags)   ((sq_flags >> 1) & 0x3)
>  
> -enum NvmeQueueFlags {
> -NVME_Q_PC   = 1,
> -NVME_Q_PRIO_URGENT  = 0,
> -NVME_Q_PRIO_HIGH= 1,
> -NVME_Q_PRIO_NORMAL  = 2,
> -NVME_Q_PRIO_LOW = 3,
> +enum NvmeFlagsSq {
> +NVME_SQ_PC  = 1,
> +NVME_SQ_PRIO_URGENT = 0,
> +NVME_SQ_PRIO_HIGH   = 1,
> +NVME_SQ_PRIO_NORMAL = 2,
> +NVME_SQ_PRIO_LOW= 3,
>  };
>  
>  typedef struct QEMU_PACKED NvmeIdentify {
> 




Re: [PATCH 12/25] block/nvme: Make nvme_init_queue() return boolean indicating error

2020-10-28 Thread Auger Eric
Hi,
On 10/27/20 2:55 PM, Philippe Mathieu-Daudé wrote:
> Just for consistency, following the example documented since
> commit e3fe3988d7 ("error: Document Error API usage rules"),
> return a boolean value indicating an error is set or not.
> This simplifies a bit nvme_create_queue_pair().
also directly pass errp as the local_err is not requested in our case.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  block/nvme.c | 15 ++-
>  1 file changed, 6 insertions(+), 9 deletions(-)
> 
> diff --git a/block/nvme.c b/block/nvme.c
> index 74994c442e5..9324f0bfdc4 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -160,7 +160,8 @@ static QemuOptsList runtime_opts = {
>  },
>  };
>  
> -static void nvme_init_queue(BDRVNVMeState *s, NVMeQueue *q,
> +/* Returns true on success, false on failure. */
> +static bool nvme_init_queue(BDRVNVMeState *s, NVMeQueue *q,
>  unsigned nentries, size_t entry_bytes, Error 
> **errp)
>  {
>  size_t bytes;
> @@ -171,13 +172,14 @@ static void nvme_init_queue(BDRVNVMeState *s, NVMeQueue 
> *q,
I personally prefer returning a conventional int instead of bool.
>  q->queue = qemu_try_memalign(s->page_size, bytes);
>  if (!q->queue) {
>  error_setg(errp, "Cannot allocate queue");
> -return;
> +return false;
>  }
>  memset(q->queue, 0, bytes);
>  r = qemu_vfio_dma_map(s->vfio, q->queue, bytes, false, &q->iova);
>  if (r) {
>  error_setg(errp, "Cannot map queue");
>  }
> +return r == 0;
It also avoids that kind of conversion and the use of !() in the caller.
>  }
>  
>  static void nvme_free_queue_pair(NVMeQueuePair *q)
> @@ -210,7 +212,6 @@ static NVMeQueuePair 
> *nvme_create_queue_pair(BDRVNVMeState *s,
>   Error **errp)
>  {
>  int i, r;
> -Error *local_err = NULL;
>  NVMeQueuePair *q;
>  uint64_t prp_list_iova;
>  
> @@ -247,16 +248,12 @@ static NVMeQueuePair 
> *nvme_create_queue_pair(BDRVNVMeState *s,
>  req->prp_list_iova = prp_list_iova + i * s->page_size;
>  }
>  
> -nvme_init_queue(s, &q->sq, size, NVME_SQ_ENTRY_BYTES, &local_err);
> -if (local_err) {
> -error_propagate(errp, local_err);
> +if (!nvme_init_queue(s, &q->sq, size, NVME_SQ_ENTRY_BYTES, errp)) {
>  goto fail;
>  }
>  q->sq.doorbell = &s->doorbells[idx * s->doorbell_scale].sq_tail;
>  
> -nvme_init_queue(s, &q->cq, size, NVME_CQ_ENTRY_BYTES, &local_err);
> -if (local_err) {
> -error_propagate(errp, local_err);
> +if (!nvme_init_queue(s, &q->cq, size, NVME_CQ_ENTRY_BYTES, errp)) {
>  goto fail;
>  }
>  q->cq.doorbell = &s->doorbells[idx * s->doorbell_scale].cq_head;
> 
Thanks

Eric
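
The calling convention under discussion (return a boolean, report details through errp, and let the caller test the return value instead of a local error object) can be sketched without any QEMU dependency. Everything here is hypothetical: the error is reduced to a message pointer, and the entry sizes of 64 and 16 bytes are assumptions for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Stand-in for a queue initialiser. Returns true on success; on failure
 * it returns false and, if errp is not NULL, stores a message there.
 */
static bool ex_init_queue(size_t nentries, size_t entry_bytes,
                          size_t *bytes_out, const char **errp)
{
    assert(bytes_out != NULL);
    if (nentries == 0 || entry_bytes == 0) {
        if (errp) {
            *errp = "Cannot allocate queue";
        }
        return false;
    }
    *bytes_out = nentries * entry_bytes;
    return true;
}

/*
 * Caller: errp is forwarded directly, so no local error object and no
 * propagation step are needed; the return value drives control flow.
 */
static bool ex_create_queue_pair(size_t nentries, const char **errp)
{
    size_t sq_bytes, cq_bytes;

    if (!ex_init_queue(nentries, 64, &sq_bytes, errp)) {
        return false;
    }
    if (!ex_init_queue(nentries, 16, &cq_bytes, errp)) {
        return false;
    }
    return true;
}
```

An int-returning variant, as preferred in the review comment above, would return 0 or a negative errno and be tested with "if (ret < 0)"; the errp handling would stay the same.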




Re: [PATCH 11/25] block/nvme: Make nvme_identify() return boolean indicating error

2020-10-28 Thread Auger Eric
Hi,

On 10/27/20 2:55 PM, Philippe Mathieu-Daudé wrote:
> Just for consistency, following the example documented since
> commit e3fe3988d7 ("error: Document Error API usage rules"),
> return a boolean value indicating an error is set or not.
Then I think the returned value should be used by the caller in this patch

Thanks

Eric
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  block/nvme.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/block/nvme.c b/block/nvme.c
> index 8b0fd59c6ea..74994c442e5 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -506,9 +506,11 @@ static int nvme_cmd_sync(BlockDriverState *bs, 
> NVMeQueuePair *q,
>  return ret;
>  }
>  
> -static void nvme_identify(BlockDriverState *bs, int namespace, Error **errp)
> +/* Returns true on success, false on failure. */
> +static bool nvme_identify(BlockDriverState *bs, int namespace, Error **errp)
>  {
>  BDRVNVMeState *s = bs->opaque;
> +bool ret = false;
>  union {
>  NvmeIdCtrl ctrl;
>  NvmeIdNs ns;
> @@ -585,10 +587,13 @@ static void nvme_identify(BlockDriverState *bs, int 
> namespace, Error **errp)
>  goto out;
>  }
>  
> +ret = true;
>  s->blkshift = lbaf->ds;
>  out:
>  qemu_vfio_dma_unmap(s->vfio, id);
>  qemu_vfree(id);
> +
> +return ret;
>  }
>  
>  static bool nvme_poll_queue(NVMeQueuePair *q)
> 




Re: [PATCH 10/25] block/nvme: Use unsigned integer for queue counter/size

2020-10-28 Thread Auger Eric
Hi,

On 10/27/20 2:55 PM, Philippe Mathieu-Daudé wrote:
> We can not have negative queue count/size/index, use unsigned type.
> Rename 'nr_queues' as 'queue_count' to match the spec naming.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Eric Auger 

Thanks

Eric
> ---
>  block/nvme.c   | 38 ++
>  block/trace-events | 10 +-
>  2 files changed, 23 insertions(+), 25 deletions(-)
> 
> diff --git a/block/nvme.c b/block/nvme.c
> index 30075e230ca..8b0fd59c6ea 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -104,7 +104,7 @@ struct BDRVNVMeState {
>   * [1..]: io queues.
>   */
>  NVMeQueuePair **queues;
> -int nr_queues;
> +unsigned queue_count;
>  size_t page_size;
>  /* How many uint32_t elements does each doorbell entry take. */
>  size_t doorbell_scale;
> @@ -161,7 +161,7 @@ static QemuOptsList runtime_opts = {
>  };
>  
>  static void nvme_init_queue(BDRVNVMeState *s, NVMeQueue *q,
> -int nentries, int entry_bytes, Error **errp)
> +unsigned nentries, size_t entry_bytes, Error 
> **errp)
>  {
>  size_t bytes;
>  int r;
> @@ -206,7 +206,7 @@ static void nvme_free_req_queue_cb(void *opaque)
>  
>  static NVMeQueuePair *nvme_create_queue_pair(BDRVNVMeState *s,
>   AioContext *aio_context,
> - int idx, int size,
> + unsigned idx, size_t size,
>   Error **errp)
>  {
>  int i, r;
> @@ -623,7 +623,7 @@ static bool nvme_poll_queues(BDRVNVMeState *s)
>  bool progress = false;
>  int i;
>  
> -for (i = 0; i < s->nr_queues; i++) {
> +for (i = 0; i < s->queue_count; i++) {
>  if (nvme_poll_queue(s->queues[i])) {
>  progress = true;
>  }
> @@ -644,10 +644,10 @@ static void nvme_handle_event(EventNotifier *n)
>  static bool nvme_add_io_queue(BlockDriverState *bs, Error **errp)
>  {
>  BDRVNVMeState *s = bs->opaque;
> -int n = s->nr_queues;
> +unsigned n = s->queue_count;
>  NVMeQueuePair *q;
>  NvmeCmd cmd;
> -int queue_size = NVME_QUEUE_SIZE;
> +unsigned queue_size = NVME_QUEUE_SIZE;
>  
>  q = nvme_create_queue_pair(s, bdrv_get_aio_context(bs),
> n, queue_size, errp);
> @@ -661,7 +661,7 @@ static bool nvme_add_io_queue(BlockDriverState *bs, Error 
> **errp)
>  .cdw11 = cpu_to_le32(0x3),
>  };
>  if (nvme_cmd_sync(bs, s->queues[INDEX_ADMIN], &cmd)) {
> -error_setg(errp, "Failed to create CQ io queue [%d]", n);
> +error_setg(errp, "Failed to create CQ io queue [%u]", n);
>  goto out_error;
>  }
>  cmd = (NvmeCmd) {
> @@ -671,12 +671,12 @@ static bool nvme_add_io_queue(BlockDriverState *bs, 
> Error **errp)
>  .cdw11 = cpu_to_le32(0x1 | (n << 16)),
>  };
>  if (nvme_cmd_sync(bs, s->queues[INDEX_ADMIN], &cmd)) {
> -error_setg(errp, "Failed to create SQ io queue [%d]", n);
> +error_setg(errp, "Failed to create SQ io queue [%u]", n);
>  goto out_error;
>  }
>  s->queues = g_renew(NVMeQueuePair *, s->queues, n + 1);
>  s->queues[n] = q;
> -s->nr_queues++;
> +s->queue_count++;
>  return true;
>  out_error:
>  nvme_free_queue_pair(q);
> @@ -785,7 +785,7 @@ static int nvme_init(BlockDriverState *bs, const char 
> *device, int namespace,
>  ret = -EINVAL;
>  goto out;
>  }
> -s->nr_queues = 1;
> +s->queue_count = 1;
>  QEMU_BUILD_BUG_ON(NVME_QUEUE_SIZE & 0xF000);
>  regs->aqa = cpu_to_le32((NVME_QUEUE_SIZE << AQA_ACQS_SHIFT) |
>  (NVME_QUEUE_SIZE << AQA_ASQS_SHIFT));
> @@ -895,10 +895,9 @@ static int 
> nvme_enable_disable_write_cache(BlockDriverState *bs, bool enable,
>  
>  static void nvme_close(BlockDriverState *bs)
>  {
> -int i;
>  BDRVNVMeState *s = bs->opaque;
>  
> -for (i = 0; i < s->nr_queues; ++i) {
> +for (unsigned i = 0; i < s->queue_count; ++i) {
>  nvme_free_queue_pair(s->queues[i]);
>  }
>  g_free(s->queues);
> @@ -1123,7 +1122,7 @@ static coroutine_fn int 
> nvme_co_prw_aligned(BlockDriverState *bs,
>  };
>  
>  trace_nvme_prw_aligned(s, is_write, offset, bytes, flags, qiov->niov);
> -assert(s->nr_queues > 1);
> +assert(s->queue_count > 1);
>  req = nvme_get_free_req(ioq);
>  assert(req);
>  
> @@ -1233,7 +1232,7 @@ static coroutine_fn int nvme_co_flush(BlockDriverState 
> *bs)
>  .ret = -EINPROGRESS,
>  };
>  
> -assert(s->nr_queues > 1);
> +assert(s->queue_count > 1);
>  req = nvme_get_free_req(ioq);
>  assert(req);
>  nvme_submit_command(ioq, req, , nvme_rw_cb, );
> @@ -1285,7 +1284,7 @@ static coroutine_fn int 
> nvme_co_pwrite_zeroes(BlockDriverState *bs,
>  cmd.cdw12 = cpu_to_le32(cdw12);
>  
>  

Re: [PATCH 09/25] block/nvme: Move definitions before structure declarations

2020-10-28 Thread Auger Eric
Hi,

On 10/27/20 2:55 PM, Philippe Mathieu-Daudé wrote:
> To be able to use some definitions in structure declarations,
> move them earlier. No logical change.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Eric Auger 

Eric

> ---
>  block/nvme.c | 19 ++-
>  1 file changed, 10 insertions(+), 9 deletions(-)
> 
> diff --git a/block/nvme.c b/block/nvme.c
> index be14350f959..30075e230ca 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -41,6 +41,16 @@
>  
>  typedef struct BDRVNVMeState BDRVNVMeState;
>  
> +/* Same index is used for queues and IRQs */
> +#define INDEX_ADMIN 0
> +#define INDEX_IO(n) (1 + n)
> +
> +/* This driver shares a single MSIX IRQ for the admin and I/O queues */
> +enum {
> +MSIX_SHARED_IRQ_IDX = 0,
> +MSIX_IRQ_COUNT = 1
> +};
> +
>  typedef struct {
>  int32_t  head, tail;
>  uint8_t  *queue;
> @@ -81,15 +91,6 @@ typedef struct {
>  QEMUBH  *completion_bh;
>  } NVMeQueuePair;
>  
> -#define INDEX_ADMIN 0
> -#define INDEX_IO(n) (1 + n)
> -
> -/* This driver shares a single MSIX IRQ for the admin and I/O queues */
> -enum {
> -MSIX_SHARED_IRQ_IDX = 0,
> -MSIX_IRQ_COUNT = 1
> -};
> -
>  struct BDRVNVMeState {
>  AioContext *aio_context;
>  QEMUVFIOState *vfio;
> 




Re: [PATCH 04/25] block/nvme: Trace controller capabilities

2020-10-28 Thread Auger Eric
Hi Philippe,
On 10/28/20 11:25 AM, Philippe Mathieu-Daudé wrote:
> On 10/28/20 11:20 AM, Auger Eric wrote:
>> Hi Philippe,
>>
>> On 10/27/20 2:55 PM, Philippe Mathieu-Daudé wrote:
>>> Controllers have different capabilities and report them in the
>>> CAP register. We are particularly interested in the page size
>>> limits.
>>>
>>> Reviewed-by: Stefan Hajnoczi 
>>> Signed-off-by: Philippe Mathieu-Daudé 
>>> ---
>>>  block/nvme.c   | 13 +
>>>  block/trace-events |  2 ++
>>>  2 files changed, 15 insertions(+)
>>>
>>> diff --git a/block/nvme.c b/block/nvme.c
>>> index 6f1d7f9b2a1..361b5772b7a 100644
>>> --- a/block/nvme.c
>>> +++ b/block/nvme.c
>>> @@ -727,6 +727,19 @@ static int nvme_init(BlockDriverState *bs, const char 
>>> *device, int namespace,
>>>   * Initialization". */
>>>  
>>>  cap = le64_to_cpu(regs->cap);
>>> +trace_nvme_controller_capability_raw(cap);
>>> +trace_nvme_controller_capability("Maximum Queue Entries Supported",
>>> + 1 + NVME_CAP_MQES(cap));
>>> +trace_nvme_controller_capability("Contiguous Queues Required",
>>> + NVME_CAP_CQR(cap));
>> I think this should be +1 too (0's based value)
> 
> This is a boolean:
> 
>   Contiguous Queues Required (CQR): This field is set to ‘1’ if
>   the controller requires that I/O Submission Queues and I/O
>   Completion Queues are required to be physically contiguous.
>   This field is cleared to ‘0’ if the controller supports I/O
>   Submission Queues and I/O Completion Queues that are not
>   physically contiguous. If this field is set to ‘1’, then the
>   Physically Contiguous bit (CDW11.PC) in the Create I/O Submission
>   Queue and Create I/O Completion Queue commands shall be set to ‘1’.

Oh, I mixed it up with NCQR :-(

Sorry for the noise.
Reviewed-by: Eric Auger 

Eric

> 
>>> +trace_nvme_controller_capability("Doorbell Stride",
>>> + 2 << (2 + NVME_CAP_DSTRD(cap)));
>>> +trace_nvme_controller_capability("Subsystem Reset Supported",
>>> + NVME_CAP_NSSRS(cap));
>>> +trace_nvme_controller_capability("Memory Page Size Minimum",
>>> + 1 << (12 + NVME_CAP_MPSMIN(cap)));
>>> +trace_nvme_controller_capability("Memory Page Size Maximum",
>>> + 1 << (12 + NVME_CAP_MPSMAX(cap)));
>>>  if (!NVME_CAP_CSS(cap)) {
>>>  error_setg(errp, "Device doesn't support NVMe command set");
>>>  ret = -EINVAL;
>>> diff --git a/block/trace-events b/block/trace-events
>>> index 0955c85c783..b90b07b15fa 100644
>>> --- a/block/trace-events
>>> +++ b/block/trace-events
>>> @@ -134,6 +134,8 @@ qed_aio_write_postfill(void *s, void *acb, uint64_t 
>>> start, size_t len, uint64_t
>>>  qed_aio_write_main(void *s, void *acb, int ret, uint64_t offset, size_t 
>>> len) "s %p acb %p ret %d offset %"PRIu64" len %zu"
>>>  
>>>  # nvme.c
>>> +nvme_controller_capability_raw(uint64_t value) "0x%08"PRIx64
>>> +nvme_controller_capability(const char *desc, uint64_t value) "%s: %"PRIu64
>>>  nvme_kick(void *s, int queue) "s %p queue %d"
>>>  nvme_dma_flush_queue_wait(void *s) "s %p"
>>>  nvme_error(int cmd_specific, int sq_head, int sqid, int cid, int status) 
>>> "cmd_specific %d sq_head %d sqid %d cid %d status 0x%x"
>>>
>> Besides
>> Reviewed-by: Eric Auger 
>>
>> Thanks
>>
>> Eric
>>
> 
> 
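
For reference, the CAP fields discussed in this thread could be decoded roughly as in the sketch below. It is only an illustration: the bit positions are taken from the CAP register layout in the NVMe base specification, and the cap_*() helper names are hypothetical (they are not the NVME_CAP_* macros used by the driver). MQES is a 0's based value, hence the +1, whereas CQR and NSSRS are plain booleans.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers; bit positions per the NVMe CAP register layout. */

/* MQES, bits 15:0, is 0's based: add 1 to get the number of entries. */
static inline uint32_t cap_mqes_entries(uint64_t cap)
{
    return (uint32_t)(cap & 0xffff) + 1;
}

/* CQR, bit 16, is a boolean flag: no +1 here. */
static inline bool cap_cqr(uint64_t cap)
{
    return (cap >> 16) & 0x1;
}

/* DSTRD, bits 35:32: the stride in bytes is 2^(2 + DSTRD). */
static inline uint32_t cap_doorbell_stride_bytes(uint64_t cap)
{
    return 1u << (2 + ((cap >> 32) & 0xf));
}

/* NSSRS, bit 36, is a boolean flag as well. */
static inline bool cap_nssrs(uint64_t cap)
{
    return (cap >> 36) & 0x1;
}

/* MPSMIN, bits 51:48: the minimum page size is 2^(12 + MPSMIN) bytes. */
static inline uint64_t cap_mps_min_bytes(uint64_t cap)
{
    return UINT64_C(1) << (12 + ((cap >> 48) & 0xf));
}

/* MPSMAX, bits 55:52: the maximum page size is 2^(12 + MPSMAX) bytes. */
static inline uint64_t cap_mps_max_bytes(uint64_t cap)
{
    return UINT64_C(1) << (12 + ((cap >> 52) & 0xf));
}
```

With DSTRD = 0 the 1 << (2 + DSTRD) form yields 4 bytes, while the 2 << (2 + DSTRD) form in the quoted hunk would yield 8, so the traced "Doorbell Stride" value may be worth double-checking against the spec.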




Re: [PATCH 06/25] block/nvme: Improve nvme_free_req_queue_wait() trace information

2020-10-28 Thread Auger Eric
Hi,

On 10/27/20 2:55 PM, Philippe Mathieu-Daudé wrote:
> What we want to trace is the block driver state and the queue index.
> 
> Suggested-by: Stefan Hajnoczi 
> Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Eric Auger 

Thanks

Eric

> ---
>  block/nvme.c   | 2 +-
>  block/trace-events | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/block/nvme.c b/block/nvme.c
> index 8d74401ae7a..29d2541b911 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -292,7 +292,7 @@ static NVMeRequest *nvme_get_free_req(NVMeQueuePair *q)
>  
>  while (q->free_req_head == -1) {
>  if (qemu_in_coroutine()) {
> -trace_nvme_free_req_queue_wait(q);
> +trace_nvme_free_req_queue_wait(q->s, q->index);
>  qemu_co_queue_wait(&q->free_req_queue, &q->lock);
>  } else {
>  qemu_mutex_unlock(&q->lock);
> diff --git a/block/trace-events b/block/trace-events
> index 86292f3312b..cc5e2b55cb5 100644
> --- a/block/trace-events
> +++ b/block/trace-events
> @@ -154,7 +154,7 @@ nvme_rw_done(void *s, int is_write, uint64_t offset, 
> uint64_t bytes, int ret) "s
>  nvme_dsm(void *s, uint64_t offset, uint64_t bytes) "s %p offset 0x%"PRIx64" 
> bytes %"PRId64""
>  nvme_dsm_done(void *s, uint64_t offset, uint64_t bytes, int ret) "s %p 
> offset 0x%"PRIx64" bytes %"PRId64" ret %d"
>  nvme_dma_map_flush(void *s) "s %p"
> -nvme_free_req_queue_wait(void *q) "q %p"
> +nvme_free_req_queue_wait(void *s, unsigned q_index) "s %p q #%u"
>  nvme_cmd_map_qiov(void *s, void *cmd, void *req, void *qiov, int entries) "s 
> %p cmd %p req %p qiov %p entries %d"
>  nvme_cmd_map_qiov_pages(void *s, int i, uint64_t page) "s %p page[%d] 
> 0x%"PRIx64
>  nvme_cmd_map_qiov_iov(void *s, int i, void *page, int pages) "s %p iov[%d] 
> %p pages %d"
> 




Re: [PATCH 04/25] block/nvme: Trace controller capabilities

2020-10-28 Thread Auger Eric
Hi Philippe,

On 10/27/20 2:55 PM, Philippe Mathieu-Daudé wrote:
> Controllers have different capabilities and report them in the
> CAP register. We are particularly interested in the page size
> limits.
> 
> Reviewed-by: Stefan Hajnoczi 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  block/nvme.c   | 13 +
>  block/trace-events |  2 ++
>  2 files changed, 15 insertions(+)
> 
> diff --git a/block/nvme.c b/block/nvme.c
> index 6f1d7f9b2a1..361b5772b7a 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -727,6 +727,19 @@ static int nvme_init(BlockDriverState *bs, const char 
> *device, int namespace,
>   * Initialization". */
>  
>  cap = le64_to_cpu(regs->cap);
> +trace_nvme_controller_capability_raw(cap);
> +trace_nvme_controller_capability("Maximum Queue Entries Supported",
> + 1 + NVME_CAP_MQES(cap));
> +trace_nvme_controller_capability("Contiguous Queues Required",
> + NVME_CAP_CQR(cap));
I think this should be +1 too (0's based value)
> +trace_nvme_controller_capability("Doorbell Stride",
> + 2 << (2 + NVME_CAP_DSTRD(cap)));
> +trace_nvme_controller_capability("Subsystem Reset Supported",
> + NVME_CAP_NSSRS(cap));
> +trace_nvme_controller_capability("Memory Page Size Minimum",
> + 1 << (12 + NVME_CAP_MPSMIN(cap)));
> +trace_nvme_controller_capability("Memory Page Size Maximum",
> + 1 << (12 + NVME_CAP_MPSMAX(cap)));
>  if (!NVME_CAP_CSS(cap)) {
>  error_setg(errp, "Device doesn't support NVMe command set");
>  ret = -EINVAL;
> diff --git a/block/trace-events b/block/trace-events
> index 0955c85c783..b90b07b15fa 100644
> --- a/block/trace-events
> +++ b/block/trace-events
> @@ -134,6 +134,8 @@ qed_aio_write_postfill(void *s, void *acb, uint64_t 
> start, size_t len, uint64_t
>  qed_aio_write_main(void *s, void *acb, int ret, uint64_t offset, size_t len) 
> "s %p acb %p ret %d offset %"PRIu64" len %zu"
>  
>  # nvme.c
> +nvme_controller_capability_raw(uint64_t value) "0x%08"PRIx64
> +nvme_controller_capability(const char *desc, uint64_t value) "%s: %"PRIu64
>  nvme_kick(void *s, int queue) "s %p queue %d"
>  nvme_dma_flush_queue_wait(void *s) "s %p"
>  nvme_error(int cmd_specific, int sq_head, int sqid, int cid, int status) 
> "cmd_specific %d sq_head %d sqid %d cid %d status 0x%x"
> 
Besides
Reviewed-by: Eric Auger 

Thanks

Eric




Re: [PATCH 07/25] block/nvme: Trace queue pair creation/deletion

2020-10-28 Thread Auger Eric
Hi Philippe,

On 10/27/20 2:55 PM, Philippe Mathieu-Daudé wrote:
> Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Eric Auger 

Thanks

Eric

> ---
>  block/nvme.c   | 3 +++
>  block/trace-events | 2 ++
>  2 files changed, 5 insertions(+)
> 
> diff --git a/block/nvme.c b/block/nvme.c
> index 29d2541b911..e95d59d3126 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -181,6 +181,7 @@ static void nvme_init_queue(BDRVNVMeState *s, NVMeQueue 
> *q,
>  
>  static void nvme_free_queue_pair(NVMeQueuePair *q)
>  {
> +trace_nvme_free_queue_pair(q->index, q);
>  if (q->completion_bh) {
>  qemu_bh_delete(q->completion_bh);
>  }
> @@ -216,6 +217,8 @@ static NVMeQueuePair 
> *nvme_create_queue_pair(BDRVNVMeState *s,
>  if (!q) {
>  return NULL;
>  }
> +trace_nvme_create_queue_pair(idx, q, size, aio_context,
> + event_notifier_get_fd(s->irq_notifier));
>  q->prp_list_pages = qemu_try_memalign(s->page_size,
>s->page_size * NVME_NUM_REQS);
>  if (!q->prp_list_pages) {
> diff --git a/block/trace-events b/block/trace-events
> index cc5e2b55cb5..f6a0f99df1a 100644
> --- a/block/trace-events
> +++ b/block/trace-events
> @@ -155,6 +155,8 @@ nvme_dsm(void *s, uint64_t offset, uint64_t bytes) "s %p 
> offset 0x%"PRIx64" byte
>  nvme_dsm_done(void *s, uint64_t offset, uint64_t bytes, int ret) "s %p 
> offset 0x%"PRIx64" bytes %"PRId64" ret %d"
>  nvme_dma_map_flush(void *s) "s %p"
>  nvme_free_req_queue_wait(void *s, unsigned q_index) "s %p q #%u"
> +nvme_create_queue_pair(unsigned q_index, void *q, unsigned size, void 
> *aio_context, int fd) "index %u q %p size %u aioctx %p fd %d"
> +nvme_free_queue_pair(unsigned q_index, void *q) "index %u q %p"
>  nvme_cmd_map_qiov(void *s, void *cmd, void *req, void *qiov, int entries) "s 
> %p cmd %p req %p qiov %p entries %d"
>  nvme_cmd_map_qiov_pages(void *s, int i, uint64_t page) "s %p page[%d] 
> 0x%"PRIx64
>  nvme_cmd_map_qiov_iov(void *s, int i, void *page, int pages) "s %p iov[%d] 
> %p pages %d"
> 




Re: [PATCH 05/25] block/nvme: Trace nvme_poll_queue() per queue

2020-10-28 Thread Auger Eric
Hi,

On 10/27/20 2:55 PM, Philippe Mathieu-Daudé wrote:
> As we want to enable multiple queues, report the event
> in each nvme_poll_queue() call, rather than once in
> the callback calling nvme_poll_queues().
> 
> Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Eric Auger 

Thanks

Eric

> ---
>  block/nvme.c   | 2 +-
>  block/trace-events | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/block/nvme.c b/block/nvme.c
> index 361b5772b7a..8d74401ae7a 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -594,6 +594,7 @@ static bool nvme_poll_queue(NVMeQueuePair *q)
>  const size_t cqe_offset = q->cq.head * NVME_CQ_ENTRY_BYTES;
>  NvmeCqe *cqe = (NvmeCqe *)&q->cq.queue[cqe_offset];
>  
> +trace_nvme_poll_queue(q->s, q->index);
>  /*
>   * Do an early check for completions. q->lock isn't needed because
>   * nvme_process_completion() only runs in the event loop thread and
> @@ -684,7 +685,6 @@ static bool nvme_poll_cb(void *opaque)
>  BDRVNVMeState *s = container_of(e, BDRVNVMeState,
>  irq_notifier[MSIX_SHARED_IRQ_IDX]);
>  
> -trace_nvme_poll_cb(s);
>  return nvme_poll_queues(s);
>  }
>  
> diff --git a/block/trace-events b/block/trace-events
> index b90b07b15fa..86292f3312b 100644
> --- a/block/trace-events
> +++ b/block/trace-events
> @@ -145,7 +145,7 @@ nvme_complete_command(void *s, int index, int cid) "s %p 
> queue %d cid %d"
>  nvme_submit_command(void *s, int index, int cid) "s %p queue %d cid %d"
>  nvme_submit_command_raw(int c0, int c1, int c2, int c3, int c4, int c5, int 
> c6, int c7) "%02x %02x %02x %02x %02x %02x %02x %02x"
>  nvme_handle_event(void *s) "s %p"
> -nvme_poll_cb(void *s) "s %p"
> +nvme_poll_queue(void *s, unsigned q_index) "s %p q #%u"
>  nvme_prw_aligned(void *s, int is_write, uint64_t offset, uint64_t bytes, int 
> flags, int niov) "s %p is_write %d offset 0x%"PRIx64" bytes %"PRId64" flags 
> %d niov %d"
>  nvme_write_zeroes(void *s, uint64_t offset, uint64_t bytes, int flags) "s %p 
> offset 0x%"PRIx64" bytes %"PRId64" flags %d"
>  nvme_qiov_unaligned(const void *qiov, int n, void *base, size_t size, int 
> align) "qiov %p n %d base %p size 0x%zx align 0x%x"
> 




Re: [PATCH 03/25] block/nvme: Report warning with warn_report()

2020-10-28 Thread Auger Eric
Hi Philippe,

On 10/27/20 2:55 PM, Philippe Mathieu-Daudé wrote:
> Instead of displaying warning on stderr, use warn_report()
> which also displays it on the monitor.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Eric Auger 

Thanks

Eric

> ---
>  block/nvme.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/block/nvme.c b/block/nvme.c
> index 739a0a700cb..6f1d7f9b2a1 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -399,8 +399,8 @@ static bool nvme_process_completion(NVMeQueuePair *q)
>  }
>  cid = le16_to_cpu(c->cid);
>  if (cid == 0 || cid > NVME_QUEUE_SIZE) {
> -fprintf(stderr, "Unexpected CID in completion queue: %" PRIu32 
> "\n",
> -cid);
> +warn_report("NVMe: Unexpected CID in completion queue: 
> %"PRIu32", "
> +"queue size: %u", cid, NVME_QUEUE_SIZE);
>  continue;
>  }
>  trace_nvme_complete_command(s, q->index, cid);
> 




Re: [PATCH 02/25] block/nvme: Use hex format to display offset in trace events

2020-10-28 Thread Auger Eric
Hi Philippe,

On 10/27/20 2:55 PM, Philippe Mathieu-Daudé wrote:
> Use the same format used for the hw/vfio/ trace events.
> 
> Suggested-by: Eric Auger 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  block/trace-events | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/block/trace-events b/block/trace-events
> index 0e351c3fa3d..0955c85c783 100644
> --- a/block/trace-events
> +++ b/block/trace-events
> @@ -144,13 +144,13 @@ nvme_submit_command(void *s, int index, int cid) "s %p 
> queue %d cid %d"
>  nvme_submit_command_raw(int c0, int c1, int c2, int c3, int c4, int c5, int 
> c6, int c7) "%02x %02x %02x %02x %02x %02x %02x %02x"
>  nvme_handle_event(void *s) "s %p"
>  nvme_poll_cb(void *s) "s %p"
> -nvme_prw_aligned(void *s, int is_write, uint64_t offset, uint64_t bytes, int 
> flags, int niov) "s %p is_write %d offset %"PRId64" bytes %"PRId64" flags %d 
> niov %d"
while we are at it I would change bytes here and below too.

But this can be part of another patch
Reviewed-by: Eric Auger 

Thanks

Eric

> -nvme_write_zeroes(void *s, uint64_t offset, uint64_t bytes, int flags) "s %p 
> offset %"PRId64" bytes %"PRId64" flags %d"
> +nvme_prw_aligned(void *s, int is_write, uint64_t offset, uint64_t bytes, int 
> flags, int niov) "s %p is_write %d offset 0x%"PRIx64" bytes %"PRId64" flags 
> %d niov %d"
> +nvme_write_zeroes(void *s, uint64_t offset, uint64_t bytes, int flags) "s %p 
> offset 0x%"PRIx64" bytes %"PRId64" flags %d"
>  nvme_qiov_unaligned(const void *qiov, int n, void *base, size_t size, int 
> align) "qiov %p n %d base %p size 0x%zx align 0x%x"
> -nvme_prw_buffered(void *s, uint64_t offset, uint64_t bytes, int niov, int 
> is_write) "s %p offset %"PRId64" bytes %"PRId64" niov %d is_write %d"
> -nvme_rw_done(void *s, int is_write, uint64_t offset, uint64_t bytes, int 
> ret) "s %p is_write %d offset %"PRId64" bytes %"PRId64" ret %d"
> -nvme_dsm(void *s, uint64_t offset, uint64_t bytes) "s %p offset %"PRId64" 
> bytes %"PRId64""
> -nvme_dsm_done(void *s, uint64_t offset, uint64_t bytes, int ret) "s %p 
> offset %"PRId64" bytes %"PRId64" ret %d"
> +nvme_prw_buffered(void *s, uint64_t offset, uint64_t bytes, int niov, int 
> is_write) "s %p offset 0x%"PRIx64" bytes %"PRId64" niov %d is_write %d"
> +nvme_rw_done(void *s, int is_write, uint64_t offset, uint64_t bytes, int 
> ret) "s %p is_write %d offset 0x%"PRIx64" bytes %"PRId64" ret %d"
> +nvme_dsm(void *s, uint64_t offset, uint64_t bytes) "s %p offset 0x%"PRIx64" 
> bytes %"PRId64""
> +nvme_dsm_done(void *s, uint64_t offset, uint64_t bytes, int ret) "s %p 
> offset 0x%"PRIx64" bytes %"PRId64" ret %d"
>  nvme_dma_map_flush(void *s) "s %p"
>  nvme_free_req_queue_wait(void *q) "q %p"
>  nvme_cmd_map_qiov(void *s, void *cmd, void *req, void *qiov, int entries) "s 
> %p cmd %p req %p qiov %p entries %d"
> 




Re: [PATCH] arm/trace: Fix hex printing

2020-10-27 Thread Auger Eric
Hi Peter,

On 10/27/20 11:38 AM, Peter Maydell wrote:
> OK; I'll apply this patch to target-arm.next; feel free to send
> a patch updating the other tracepoints to hex.

sure, I will.

Thanks

Eric




Re: [PATCH] arm/trace: Fix hex printing

2020-10-27 Thread Auger Eric
Hi Peter,

On 10/19/20 9:26 PM, Peter Maydell wrote:
> On Wed, 14 Oct 2020 at 20:36, Dr. David Alan Gilbert (git)
>  wrote:
>>
>> From: "Dr. David Alan Gilbert" 
>>
>> Use of 0x%d - make up our mind as 0x%x
>>
>> Signed-off-by: Dr. David Alan Gilbert 
>> ---
>>  hw/arm/trace-events | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/hw/arm/trace-events b/hw/arm/trace-events
>> index c8a4d80f6b..a335ee891d 100644
>> --- a/hw/arm/trace-events
>> +++ b/hw/arm/trace-events
>> @@ -41,7 +41,7 @@ smmuv3_get_cd(uint64_t addr) "CD addr: 0x%"PRIx64
>>  smmuv3_decode_cd(uint32_t oas) "oas=%d"
>>  smmuv3_decode_cd_tt(int i, uint32_t tsz, uint64_t ttb, uint32_t granule_sz, 
>> bool had) "TT[%d]:tsz:%d ttb:0x%"PRIx64" granule_sz:%d had:%d"
>>  smmuv3_cmdq_cfgi_ste(int streamid) "streamid =%d"
>> -smmuv3_cmdq_cfgi_ste_range(int start, int end) "start=0x%d - end=0x%d"
>> +smmuv3_cmdq_cfgi_ste_range(int start, int end) "start=0x%x - end=0x%x"
> 
> Ah, I missed that you'd sent this patch before.
> 
> Eric, do we want to use hex here, or should we go for
> decimal the way we do with (almost) all the other
> tracing of stream IDs (eg mmuv3_cmdq_cfgi_ste in the line before)?
> 
> The other odd-one-out is smmuv3_find_ste which prints a hex
> SID; I think the other tracing of SIDs is always decimal.
I think my preference would be to use hex here and in the other places.

Thanks

Eric
> 
> thanks
> -- PMM
> 
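
To illustrate why the "0x%d" format was misleading, here is a small standalone sketch. The two helpers are hypothetical and only mimic the old and the fixed trace format strings with snprintf(); they are not part of QEMU's trace machinery.

```c
#include <stdio.h>

/* Hypothetical helper using the fixed format: hex digits after "0x". */
static inline int format_ste_range(char *buf, size_t len, int start, int end)
{
    return snprintf(buf, len, "start=0x%x - end=0x%x",
                    (unsigned)start, (unsigned)end);
}

/* Same values with the old format: decimal digits behind a "0x" prefix. */
static inline int format_ste_range_old(char *buf, size_t len,
                                       int start, int end)
{
    return snprintf(buf, len, "start=0x%d - end=0x%d", start, end);
}
```

For the range 16..31 the fixed format gives "start=0x10 - end=0x1f", while the old one gives "start=0x16 - end=0x31", which reads as a different (and wrong) pair of hex stream IDs.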




Re: [PATCH v2 16/19] util/vfio-helpers: Introduce qemu_vfio_pci_msix_init_irqs()

2020-10-26 Thread Auger Eric
Hi Philippe,

On 10/26/20 11:55 AM, Philippe Mathieu-Daudé wrote:
> qemu_vfio_pci_init_irq() allows us to initialize any type of IRQ,
> but only one. Introduce qemu_vfio_pci_msix_init_irqs() which is
> specific to MSIX IRQ type, and allow us to use multiple IRQs
> (thus passing multiple eventfd notifiers).
> All eventfd notifiers are initialized with the special '-1' value
> meaning "un-assigned".
> 
> Reviewed-by: Stefan Hajnoczi 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  include/qemu/vfio-helpers.h |  6 +++-
>  util/vfio-helpers.c | 65 -
>  util/trace-events   |  1 +
>  3 files changed, 70 insertions(+), 2 deletions(-)
> 
> diff --git a/include/qemu/vfio-helpers.h b/include/qemu/vfio-helpers.h
> index 4b97a904e93..492072cba2f 100644
> --- a/include/qemu/vfio-helpers.h
> +++ b/include/qemu/vfio-helpers.h
> @@ -1,11 +1,13 @@
>  /*
>   * QEMU VFIO helpers
>   *
> - * Copyright 2016 - 2018 Red Hat, Inc.
> + * Copyright 2016 - 2020 Red Hat, Inc.
>   *
>   * Authors:
>   *   Fam Zheng 
> + *   Philippe Mathieu-Daudé 
>   *
> + * SPDX-License-Identifier: GPL-2.0-or-later
>   * This work is licensed under the terms of the GNU GPL, version 2 or later.
>   * See the COPYING file in the top-level directory.
>   */
> @@ -29,5 +31,7 @@ void qemu_vfio_pci_unmap_bar(QEMUVFIOState *s, int index, 
> void *bar,
>   uint64_t offset, uint64_t size);
>  int qemu_vfio_pci_init_irq(QEMUVFIOState *s, EventNotifier *e,
> int irq_type, Error **errp);
> +int qemu_vfio_pci_msix_init_irqs(QEMUVFIOState *s,
> + unsigned *irq_count, Error **errp);
>  
>  #endif
> diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
> index 874d76c2a2a..d88e2c7dc1f 100644
> --- a/util/vfio-helpers.c
> +++ b/util/vfio-helpers.c
> @@ -1,11 +1,13 @@
>  /*
>   * VFIO utility
>   *
> - * Copyright 2016 - 2018 Red Hat, Inc.
> + * Copyright 2016 - 2020 Red Hat, Inc.
>   *
>   * Authors:
>   *   Fam Zheng 
> + *   Philippe Mathieu-Daudé 
>   *
> + * SPDX-License-Identifier: GPL-2.0-or-later
>   * This work is licensed under the terms of the GNU GPL, version 2 or later.
>   * See the COPYING file in the top-level directory.
>   */
> @@ -230,6 +232,67 @@ int qemu_vfio_pci_init_irq(QEMUVFIOState *s, 
> EventNotifier *e,
>  return 0;
>  }
>  
> +/**
> + * Initialize device MSIX IRQs and register event notifiers.
> + * @irq_count: pointer to number of MSIX IRQs to initialize
> + *
> + * If the number of IRQs requested exceeds the available on the device,
> + * store the number of available IRQs in @irq_count and return -EOVERFLOW.
> + */
> +int qemu_vfio_pci_msix_init_irqs(QEMUVFIOState *s,
> + unsigned *irq_count, Error **errp)
> +{
> +int r;
> +size_t irq_set_size;
> +struct vfio_irq_set *irq_set;
> +struct vfio_irq_info irq_info = {
> +.argsz = sizeof(irq_info),
> +.index = VFIO_PCI_MSIX_IRQ_INDEX
> +};
> +
> +if (ioctl(s->device, VFIO_DEVICE_GET_IRQ_INFO, &irq_info)) {
> +error_setg_errno(errp, errno, "Failed to get device interrupt info");
> +return -errno;
> +}
> +trace_qemu_vfio_msix_info_irqs(irq_info.count, *irq_count);
> +if (irq_info.count < *irq_count) {
> +error_setg(errp, "Not enough device interrupts available");
> +*irq_count = irq_info.count;
> +return -EOVERFLOW;
> +}
> +if (!(irq_info.flags & VFIO_IRQ_INFO_EVENTFD)) {
> +error_setg(errp, "Device interrupt doesn't support eventfd");
> +return -EINVAL;
> +}
> +
> +irq_set_size = sizeof(*irq_set) + *irq_count * sizeof(int32_t);
> +irq_set = g_malloc0(irq_set_size);
> +
> +/* Get to a known IRQ state */
> +*irq_set = (struct vfio_irq_set) {
> +.argsz = irq_set_size,
> +.flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER,
> +.index = VFIO_PCI_MSIX_IRQ_INDEX,
> +.start = 0,
> +.count = *irq_count,
> +};
> +
> +for (unsigned i = 0; i < *irq_count; i++) {
> +((int32_t *)&irq_set->data)[i] = -1; /* un-assigned: skip */
> +}
> +r = ioctl(s->device, VFIO_DEVICE_SET_IRQS, irq_set);
> +g_free(irq_set);
> +if (r < 0) {
> +error_setg_errno(errp, errno, "Failed to setup device interrupts");
> +return -errno;
> +} else if (r > 0) {
Can this (r > 0) actually happen?

Thanks

Eric
> +error_setg(errp, "Not enough device interrupts available");
> +*irq_count = r;
> +return -EOVERFLOW;
> +}
> +return 0;
> +}
> +
>  static int qemu_vfio_pci_read_config(QEMUVFIOState *s, void *buf,
>   int size, int ofs)
>  {
> diff --git a/util/trace-events b/util/trace-events
> index 3c36def9f30..ec93578b125 100644
> --- a/util/trace-events
> +++ b/util/trace-events
> @@ -87,6 +87,7 @@ qemu_vfio_do_mapping(void *s, void *host, uint64_t iova, 
> size_t size) "s %p host
>  
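
The request sizing and the "-1 means un-assigned" initialisation done in the hunk above can be sketched in isolation as follows. This is only an illustration under stated assumptions: struct irq_set_sketch is a stand-in that mirrors the header-plus-flexible-array shape of struct vfio_irq_set so that the example builds without <linux/vfio.h>, and irq_set_alloc_unassigned() is a hypothetical helper, not an existing function.

```c
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for struct vfio_irq_set: fixed header plus flexible payload. */
struct irq_set_sketch {
    uint32_t argsz;
    uint32_t flags;
    uint32_t index;
    uint32_t start;
    uint32_t count;
    uint8_t  data[];
};

/* Allocate a request for 'count' IRQs with every eventfd slot set to -1. */
static inline struct irq_set_sketch *irq_set_alloc_unassigned(uint32_t count)
{
    size_t size = sizeof(struct irq_set_sketch)
                  + (size_t)count * sizeof(int32_t);
    struct irq_set_sketch *set = calloc(1, size);
    int32_t *fds;

    if (!set) {
        return NULL;
    }
    set->argsz = (uint32_t)size;   /* header plus one int32_t per IRQ */
    set->count = count;
    fds = (int32_t *)set->data;
    for (uint32_t i = 0; i < count; i++) {
        fds[i] = -1;               /* un-assigned */
    }
    return set;
}
```

The caller owns the returned buffer and releases it with free() once the request has been submitted.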

Re: [PATCH v2 18/19] block/nvme: Switch to using the MSIX API

2020-10-26 Thread Auger Eric
Hi Philippe,

On 10/26/20 11:55 AM, Philippe Mathieu-Daudé wrote:
> In preparation of using multiple IRQs, switch to using the recently
> introduced MSIX API. Instead of allocating and assigning IRQ in
> a single step, we now have to use two distinct calls.
> 
> Reviewed-by: Stefan Hajnoczi 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  block/nvme.c | 14 --
>  1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/block/nvme.c b/block/nvme.c
> index 46b09b3a3a7..191678540b6 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -693,6 +693,7 @@ static int nvme_init(BlockDriverState *bs, const char 
> *device, int namespace,
>  size_t device_page_size_min;
>  size_t device_page_size_max;
>  size_t iommu_page_size_min = 4096;
> +unsigned irq_count = MSIX_IRQ_COUNT;
>  
>  qemu_co_mutex_init(&s->dma_map_lock);
>  qemu_co_queue_init(&s->dma_flush_queue);
> @@ -809,8 +810,17 @@ static int nvme_init(BlockDriverState *bs, const char 
> *device, int namespace,
>  }
>  }
>  
> -ret = qemu_vfio_pci_init_irq(s->vfio, s->irq_notifier,
> - VFIO_PCI_MSIX_IRQ_INDEX, errp);
> +ret = qemu_vfio_pci_msix_init_irqs(s->vfio, &irq_count, errp);
> +if (ret) {
> +if (ret == -EOVERFLOW) {
> +error_append_hint(errp, "%u IRQs requested but only %u 
> available\n",
> +  MSIX_IRQ_COUNT, irq_count);
This message could be printed directly in qemu_vfio_pci_msix_init_irqs().
> +}
> +goto out;
> +}
> +
> +ret = qemu_vfio_pci_msix_set_irq(s->vfio, MSIX_SHARED_IRQ_IDX,
> + s->irq_notifier, errp);
>  if (ret) {
>  goto out;
>  }
> 
Thanks

Eric




Re: [PATCH v2 11/19] util/vfio-helpers: Let qemu_vfio_dma_map() propagate Error

2020-10-26 Thread Auger Eric
Hi Philippe,

On 10/26/20 11:54 AM, Philippe Mathieu-Daudé wrote:
> Currently qemu_vfio_dma_map() displays errors on stderr.
> When using management interface, this information is simply
> lost. Pass qemu_vfio_dma_map() an Error* argument so it can
Error ** here, or simply "an error handle"
> propagate the error to callers.
> 
> Reviewed-by: Fam Zheng 
> Reviewed-by: Stefan Hajnoczi 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  include/qemu/vfio-helpers.h |  2 +-
>  block/nvme.c| 14 +++---
>  util/vfio-helpers.c | 12 +++-
>  3 files changed, 15 insertions(+), 13 deletions(-)
> 
> diff --git a/include/qemu/vfio-helpers.h b/include/qemu/vfio-helpers.h
> index 4491c8e1a6e..bde9495b254 100644
> --- a/include/qemu/vfio-helpers.h
> +++ b/include/qemu/vfio-helpers.h
> @@ -18,7 +18,7 @@ typedef struct QEMUVFIOState QEMUVFIOState;
>  QEMUVFIOState *qemu_vfio_open_pci(const char *device, Error **errp);
>  void qemu_vfio_close(QEMUVFIOState *s);
>  int qemu_vfio_dma_map(QEMUVFIOState *s, void *host, size_t size,
> -  bool temporary, uint64_t *iova_list);
> +  bool temporary, uint64_t *iova_list, Error **errp);
>  int qemu_vfio_dma_reset_temporary(QEMUVFIOState *s);
>  void qemu_vfio_dma_unmap(QEMUVFIOState *s, void *host);
>  void *qemu_vfio_pci_map_bar(QEMUVFIOState *s, int index,
> diff --git a/block/nvme.c b/block/nvme.c
> index 3b6d3972ec2..6f1ebdf031f 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -167,9 +167,9 @@ static void nvme_init_queue(BDRVNVMeState *s, NVMeQueue 
> *q,
>  return;
>  }
>  memset(q->queue, 0, bytes);
> -r = qemu_vfio_dma_map(s->vfio, q->queue, bytes, false, &q->iova);
> +r = qemu_vfio_dma_map(s->vfio, q->queue, bytes, false, &q->iova, errp);
>  if (r) {
> -error_setg(errp, "Cannot map queue");
> +error_prepend(errp, "Cannot map queue: ");
>  }
>  }
>  
> @@ -223,7 +223,7 @@ static NVMeQueuePair 
> *nvme_create_queue_pair(BDRVNVMeState *s,
>  q->completion_bh = aio_bh_new(aio_context, nvme_process_completion_bh, 
> q);
>  r = qemu_vfio_dma_map(s->vfio, q->prp_list_pages,
>s->page_size * NVME_NUM_REQS,
> -  false, &prp_list_iova);
> +  false, &prp_list_iova, errp);
>  if (r) {
You may add a matching error_prepend(errp, "") here too, for consistency.
>  goto fail;
>  }
> @@ -514,9 +514,9 @@ static void nvme_identify(BlockDriverState *bs, int 
> namespace, Error **errp)
>  error_setg(errp, "Cannot allocate buffer for identify response");
>  goto out;
>  }
> -r = qemu_vfio_dma_map(s->vfio, id, sizeof(*id), true, &iova);
> +r = qemu_vfio_dma_map(s->vfio, id, sizeof(*id), true, &iova, errp);
>  if (r) {
> -error_setg(errp, "Cannot map buffer for DMA");
> +error_prepend(errp, "Cannot map buffer for DMA: ");
>  goto out;
>  }
>  
> @@ -1003,7 +1003,7 @@ try_map:
>  r = qemu_vfio_dma_map(s->vfio,
>qiov->iov[i].iov_base,
>qiov->iov[i].iov_len,
> -  true, &iova);
> +  true, &iova, NULL);
>  if (r == -ENOMEM && retry) {
>  retry = false;
>  trace_nvme_dma_flush_queue_wait(s);
> @@ -1450,7 +1450,7 @@ static void nvme_register_buf(BlockDriverState *bs, 
> void *host, size_t size)
>  int ret;
>  BDRVNVMeState *s = bs->opaque;
>  
> -ret = qemu_vfio_dma_map(s->vfio, host, size, false, NULL);
> +ret = qemu_vfio_dma_map(s->vfio, host, size, false, NULL, NULL);
>  if (ret) {
>  /* FIXME: we may run out of IOVA addresses after repeated
>   * bdrv_register_buf/bdrv_unregister_buf, because nvme_vfio_dma_unmap
> diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
> index 73f7bfa7540..c03fe0b7156 100644
> --- a/util/vfio-helpers.c
> +++ b/util/vfio-helpers.c
> @@ -462,7 +462,7 @@ static void qemu_vfio_ram_block_added(RAMBlockNotifier *n,
>  {
>  QEMUVFIOState *s = container_of(n, QEMUVFIOState, ram_notifier);
>  trace_qemu_vfio_ram_block_added(s, host, size);
> -qemu_vfio_dma_map(s, host, size, false, NULL);
> +qemu_vfio_dma_map(s, host, size, false, NULL, NULL);
>  }
>  
>  static void qemu_vfio_ram_block_removed(RAMBlockNotifier *n,
> @@ -477,6 +477,7 @@ static void qemu_vfio_ram_block_removed(RAMBlockNotifier 
> *n,
>  
>  static int qemu_vfio_init_ramblock(RAMBlock *rb, void *opaque)
>  {
> +Error *local_err = NULL;
>  void *host_addr = qemu_ram_get_host_addr(rb);
>  ram_addr_t length = qemu_ram_get_used_length(rb);
>  int ret;
> @@ -485,10 +486,11 @@ static int qemu_vfio_init_ramblock(RAMBlock *rb, void 
> *opaque)
>  if (!host_addr) {
>  return 0;
>  }
> -ret = qemu_vfio_dma_map(s, host_addr, length, false, NULL);
> +ret = qemu_vfio_dma_map(s, host_addr, length, false, NULL, &local_err);
>  if (ret) {
> -
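
The propagate-and-prepend pattern this patch moves to can be sketched outside of QEMU as below. Everything here is a toy stand-in under stated assumptions: ToyError and the toy_*() functions are hypothetical and only imitate the shape of the Error ** convention (callee sets the error, caller adds context, a NULL errp means "not interested"); they are not QEMU's error API.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for an error object: just an owned message string. */
typedef struct ToyError {
    char *msg;
} ToyError;

/* Store a new error in *errp; a NULL errp means the caller ignores errors. */
static inline void toy_error_set(ToyError **errp, const char *msg)
{
    size_t len = strlen(msg) + 1;
    ToyError *err;

    if (!errp) {
        return;
    }
    err = malloc(sizeof(*err));
    if (!err) {
        return;
    }
    err->msg = malloc(len);
    if (!err->msg) {
        free(err);
        return;
    }
    memcpy(err->msg, msg, len);
    *errp = err;
}

/* Add context in front of an existing error message, if there is one. */
static inline void toy_error_prepend(ToyError **errp, const char *prefix)
{
    size_t len;
    char *msg;

    if (!errp || !*errp) {
        return;
    }
    len = strlen(prefix) + strlen((*errp)->msg) + 1;
    msg = malloc(len);
    if (!msg) {
        return;
    }
    snprintf(msg, len, "%s%s", prefix, (*errp)->msg);
    free((*errp)->msg);
    (*errp)->msg = msg;
}

static inline void toy_error_free(ToyError *err)
{
    if (err) {
        free(err->msg);
        free(err);
    }
}

/* Low-level helper: reports its own failure reason through errp. */
static inline int toy_dma_map(size_t size, ToyError **errp)
{
    if (size == 0) {
        toy_error_set(errp, "zero-sized mapping");
        return -1;
    }
    return 0;
}

/* Caller: keeps the original reason and only prepends its own context. */
static inline int toy_init_queue(size_t bytes, ToyError **errp)
{
    int r = toy_dma_map(bytes, errp);

    if (r) {
        toy_error_prepend(errp, "Cannot map queue: ");
    }
    return r;
}
```

The point of prepending rather than setting a fresh error in the caller is that the low-level reason survives all the way up to the management interface.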
