RE: [PATCH v6 1/5] iommu: Return -EMEDIUMTYPE for incompatible domain and device/group

2022-09-08 Thread Tian, Kevin
> From: Nicolin Chen 
> Sent: Friday, September 9, 2022 11:17 AM
> 
> On Thu, Sep 08, 2022 at 01:14:42PM -0300, Jason Gunthorpe wrote:
> 
> > > I am wondering if this can be solved by better defining what the return
> > > codes mean and adjust the call-back functions to match the definition.
> > > Something like:
> > >
> > >   -ENODEV : Device not mapped my an IOMMU
> > >   -EBUSY  : Device attached and domain can not be changed
> > >   -EINVAL : Device and domain are incompatible
> > >   ...
> >
> > Yes, this was gone over in a side thread the pros/cons, so lets do
> > it. Nicolin will come with something along these lines.
> 
> I have started this effort by combining this list and the one from
> the side thread:
> 
> @@ -266,6 +266,13 @@ struct iommu_ops {
>  /**
>   * struct iommu_domain_ops - domain specific operations
>   * @attach_dev: attach an iommu domain to a device
> + *  Rules of its return errno:
> + *   ENOMEM  - Out of memory
> + *   EINVAL  - Device and domain are incompatible
> + *   EBUSY   - Device is attached to a domain and cannot be 
> changed

With this definition then probably @attach_dev should not return -EBUSY
at all given it's already checked in the start of __iommu_attach_group():

if (group->domain && group->domain != group->default_domain &&
group->domain != group->blocking_domain)
return -EBUSY;

> + *   ENODEV  - Device or domain is messed up: device is not 
> mapped
> + * to an IOMMU, no domain can attach, and etc.

if domain is messed up then should return -EINVAL given using another domain
might just work. IMHO here -ENODEV should only cover device specific problems
preventing this device from being attached to by any domain.

> + *   - Same behavior as ENODEV, use is discouraged

didn't get the "Same behavior" part. Does it suggest all other errnos should
be converted to ENODEV?

btw what about -ENOSPC? It's sane to allocate some resource in the attach
path while the resource might be not available, e.g.:

intel_iommu_attach_device()
  domain_add_dev_info()
domain_attach_iommu():

int num, ret = -ENOSPC;
...
ndomains = cap_ndoms(iommu->cap);
num = find_first_zero_bit(iommu->domain_ids, ndomains);
if (num >= ndomains) {
pr_err("%s: No free domain ids\n", iommu->name);
goto err_unlock;
}

As discussed in a side thread a note might be added to exempt calling
kAPI outside of the iommu driver. 

>   * @detach_dev: detach an iommu domain from a device
>   * @map: map a physically contiguous memory region to an iommu domain
>   * @map_pages: map a physically contiguous set of pages of the same size
> to
> 
> I am now going through every single return value of ->attach_dev to
> make sure the list above applies. And I will also incorporate things
> like Robin's comments at the AMD IOMMU driver.
> 
> And if the change occurs to be bigger, I guess that separating it to
> be an IOMMU series from this VFIO one might be better.
> 
> Thanks
> Nic
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v3 1/6] udp: allow header check for dodgy GSO_UDP_L4 packets.

2022-09-08 Thread Andrew Melnichenko
Hi all,

On Thu, Sep 8, 2022 at 3:40 AM David Ahern  wrote:
>
> On 9/7/22 6:50 AM, Andrew Melnychenko wrote:
> > diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
> > index 6d1a4bec2614..8e002419b4d5 100644
> > --- a/net/ipv4/udp_offload.c
> > +++ b/net/ipv4/udp_offload.c
> > @@ -387,7 +387,7 @@ static struct sk_buff *udp4_ufo_fragment(struct sk_buff 
> > *skb,
> >   if (!pskb_may_pull(skb, sizeof(struct udphdr)))
> >   goto out;
> >
> > - if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4)
> > + if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4 && !skb_gso_ok(skb, 
> > features | NETIF_F_GSO_ROBUST))
>
> that line needs to be wrapped.

Ok, I'll wrap it.

>
> >   return __udp_gso_segment(skb, features, false);
> >
> >   mss = skb_shinfo(skb)->gso_size;
>
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v3 0/6] TUN/VirtioNet USO features support.

2022-09-08 Thread Andrew Melnichenko
Hi all,

On Thu, Sep 8, 2022 at 3:44 AM David Ahern  wrote:
>
> On 9/7/22 6:50 AM, Andrew Melnychenko wrote:
> > Added new offloads for TUN devices TUN_F_USO4 and TUN_F_USO6.
> > Technically they enable NETIF_F_GSO_UDP_L4
> > (and only if USO4 & USO6 are set simultaneously).
> > It allows the transmission of large UDP packets.
>
> Please spell out USO at least once in the cover letter to make sure the
> acronym is clear.

USO - UDP Segmentation Offload. Ability to split UDP packets into
several segments.
Allows sending/receiving big UDP packets. At some point, it looks like
UFO(fragmentation),
but each segment contains a UDP header.

>
> >
> > Different features USO4 and USO6 are required for qemu where Windows guests 
> > can
> > enable disable USO receives for IPv4 and IPv6 separately.
> > On the other side, Linux can't really differentiate USO4 and USO6, for now.
>
> Why is that and what is needed for Linux to differentiate between the 2
> protocols?

Well, this feature requires for Windows VM guest interaction. Windows may have
a separate configuration for USO4/USO6. Currently, we support Windows guests
with enabled USO4 and USO6 simultaneously.
To implement this on Linux host, will require at least one additional
netdev feature
and changes in Linux network stack. Discussion about that will be in
the future after
some research.

>
> > For now, to enable USO for TUN it requires enabling USO4 and USO6 together.
> > In the future, there would be a mechanism to control UDP_L4 GSO separately.
> >
> > New types for virtio-net already in virtio-net specification:
> > https://github.com/oasis-tcs/virtio-spec/issues/120
> >
> > Test it WIP Qemu https://github.com/daynix/qemu/tree/USOv3
> >
> > Andrew (5):
> >   uapi/linux/if_tun.h: Added new offload types for USO4/6.
> >   driver/net/tun: Added features for USO.
> >   uapi/linux/virtio_net.h: Added USO types.
> >   linux/virtio_net.h: Support USO offload in vnet header.
> >   drivers/net/virtio_net.c: Added USO support.
> >
> > Andrew Melnychenko (1):
> >   udp: allow header check for dodgy GSO_UDP_L4 packets.
> >
> >  drivers/net/tap.c   | 10 --
> >  drivers/net/tun.c   |  8 +++-
> >  drivers/net/virtio_net.c| 19 +++
> >  include/linux/virtio_net.h  |  9 +
> >  include/uapi/linux/if_tun.h |  2 ++
> >  include/uapi/linux/virtio_net.h |  5 +
> >  net/ipv4/udp_offload.c  |  2 +-
> >  7 files changed, 47 insertions(+), 8 deletions(-)
> >
>
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] virtio_bt: Fix alignment in configuration struct

2022-09-08 Thread Michael S. Tsirkin
On Tue, Aug 30, 2022 at 09:45:02AM -0400, Michael S. Tsirkin wrote:
> On Thu, Aug 11, 2022 at 10:02:31AM -0700, Luiz Augusto von Dentz wrote:
> > Hi Michael,
> > 
> > On Thu, Aug 11, 2022 at 1:00 AM Michael S. Tsirkin  wrote:
> > >
> > > On Mon, Aug 08, 2022 at 08:16:11AM -0400, Michael S. Tsirkin wrote:
> > > > On Mon, Aug 08, 2022 at 02:04:43PM +0200, Igor Skalkin wrote:
> > > > > On 8/8/22 01:00, Michael S. Tsirkin wrote:
> > > > >
> > > > > On Mon, Aug 08, 2022 at 12:11:52AM +0200, Igor Skalkin wrote:
> > > > >
> > > > > According to specification [1], "For the device-specific 
> > > > > configuration
> > > > > space, the driver MUST use 8 bit wide accesses for 8 bit wide 
> > > > > fields,
> > > > > 16 bit wide and aligned accesses for 16 bit wide fields and 
> > > > > 32 bit wide
> > > > > and aligned accesses for 32 and 64 bit wide fields.".
> > > > >
> > > > > Current version of the configuration structure:
> > > > >
> > > > > struct virtio_bt_config {
> > > > > __u8  type;
> > > > > __u16 vendor;
> > > > > __u16 msft_opcode;
> > > > > } __attribute__((packed));
> > > > >
> > > > > has both 16bit fields non-aligned.
> > > > >
> > > > > This commit fixes it.
> > > > >
> > > > > [1] 
> > > > > https://ddec1-0-en-ctp.trendmicro.com:443/wis/clicktime/v1/query?url=https%3a%2f%2fdocs.oasis%2dopen.org%2fvirtio%2fvirtio%2fv1.1%2fvirtio%2dv1.1.pdf=d1786ace-e8ea-40e8-9665-96c0949174e5=53c7c7de28b92dfd96e93d9dd61a23e634d2fbec-39b15885ceebe9fda9357320aec1ccbac416a470
> > > > >
> > > > > Signed-off-by: Igor Skalkin 
> > > > >
> > > > > This is all true enough, but the problem is
> > > > > 1. changing uapi like this can't be done, will break userspace
> > > > > 2. the driver has more issues and no one seems to want to
> > > > >maintain it.
> > > > > I posted a patch "Bluetooth: virtio_bt: mark broken" and intend
> > > > > to merge it for this release.
> > > > >
> > > > > This is very sad. We already use this driver in our projects.
> > > >
> > > > Really?  Can you step up to maintain it? Then we can fix the issues
> > > > and it won't be broken.
> > >
> > > Just a reminder that I'm waiting for a response on that.
> > > I just don't know enough about bluetooth.
> > 
> > Just a heads up that Marcel is on vacation, he did mention that he had
> > done some work to update virtio_bt thus why I didn't apply any of the
> > changes yet.
> 
> Any update? when does Marcel return?


Annd ... this is falling between the chairs again?
Guys if anyone wants to use this driver it needs a maintainer.

> > > --
> > > MST
> > >
> > 
> > 
> > -- 
> > Luiz Augusto von Dentz

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Call to discuss vsock netdev/sk_buff [was Re: [PATCH 0/6] virtio/vsock: introduce dgrams, sk_buff, and qdisc]

2022-09-08 Thread Stefano Garzarella

On Thu, Aug 18, 2022 at 02:39:32PM +, Bobby Eshleman wrote:

On Tue, Sep 06, 2022 at 09:26:33AM -0400, Stefan Hajnoczi wrote:

Hi Bobby,
If you are attending Linux Foundation conferences in Dublin, Ireland
next week (Linux Plumbers Conference, Open Source Summit Europe, KVM
Forum, ContainerCon Europe, CloudOpen Europe, etc) then you could meet
Stefano Garzarella and others to discuss this patch series.

Using netdev and sk_buff is a big change to vsock. Discussing your
requirements and the future direction of vsock in person could help.

If you won't be in Dublin, don't worry. You can schedule a video call if
you feel it would be helpful to discuss these topics.

Stefan


Hey Stefan,

That sounds like a great idea! I was unable to make the Dublin trip work
so I think a video call would be best, of course if okay with everyone.


Looking better at the KVM forum sched, I found 1h slot for Sep 15 at 
16:30 UTC.


Could this work for you?

It would be nice to also have HyperV and VMCI people in the call and 
anyone else who is interested of course.


@Dexuan @Bryan @Vishnu can you attend?

@MST @Jason @Stefan if you can be there that would be great, we could 
connect together from Dublin.


Thanks,
Stefano

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v6 1/5] iommu: Return -EMEDIUMTYPE for incompatible domain and device/group

2022-09-08 Thread Joerg Roedel
On Wed, Sep 07, 2022 at 02:10:33PM -0300, Jason Gunthorpe wrote:
> Sure, rust has all sorts of nice things. But the kernel doesn't follow
> rust idioms, and I don't think this is a great place to start
> experimenting with them.

It is actually a great place to start experimenting. The IOMMU
interfaces are rather domain specific and if we get something wrong the
damage is limited to a few callers. There are APIs much more exposed in
the kernel which would be worse for that.

But anyway, I am not insisting on it.

> It has been 3 months since EMEDIUMTYPE was first proposed and 6
> iterations of the series, don't you think it is a bit late in the game
> to try to experiment with rust error handling idioms?

If I am not mistaken, I am the person who gets blamed when crappy IOMMU
code is sent upstream. So it is also up to me to decide in which state
and how close to merging a given patch series is an whether it is
already 'late in the game'.

> So, again, would you be happy with a simple 
> 
>  #define IOMMU_EINCOMPATIBLE_DEVICE xx
> 
> to make it less "re-using random error codes"?

I am wondering if this can be solved by better defining what the return
codes mean and adjust the call-back functions to match the definition.
Something like:

-ENODEV : Device not mapped my an IOMMU
-EBUSY  : Device attached and domain can not be changed
-EINVAL : Device and domain are incompatible
...

That would be much more intuitive than using something obscure like
EMEDIUMTYPE.

Regards,

Joerg
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v1] drm/ttm: Refcount allocated tail pages

2022-09-08 Thread Rob Clark
On Tue, Sep 6, 2022 at 1:01 PM Daniel Vetter  wrote:
>
> On Mon, Aug 15, 2022 at 12:05:19PM +0200, Christian König wrote:
> > Am 15.08.22 um 11:54 schrieb Dmitry Osipenko:
> > > Higher order pages allocated using alloc_pages() aren't refcounted and 
> > > they
> > > need to be refcounted, otherwise it's impossible to map them by KVM. This
> > > patch sets the refcount of the tail pages and fixes the KVM memory mapping
> > > faults.
> > >
> > > Without this change guest virgl driver can't map host buffers into guest
> > > and can't provide OpenGL 4.5 profile support to the guest. The host
> > > mappings are also needed for enabling the Venus driver using host GPU
> > > drivers that are utilizing TTM.
> > >
> > > Based on a patch proposed by Trigger Huang.
> >
> > Well I can't count how often I have repeated this: This is an absolutely
> > clear NAK!
> >
> > TTM pages are not reference counted in the first place and because of this
> > giving them to virgl is illegal.
> >
> > Please immediately stop this completely broken approach. We have discussed
> > this multiple times now.
>
> Yeah we need to get this stuff closed for real by tagging them all with
> VM_IO or VM_PFNMAP asap.
>
> It seems ot be a recurring amount of fun that people try to mmap dma-buf
> and then call get_user_pages on them.
>
> Which just doesn't work. I guess this is also why Rob Clark send out that
> dma-buf patch to expos mapping information (i.e. wc vs wb vs uncached).

No, not really.. my patch was simply so that the VMM side of virtgpu
could send the correct cache mode to the guest when handed a dma-buf
;-)

BR,
-R

>
> There seems to be some serious bonghits going on :-/
> -Daniel
>
> >
> > Regards,
> > Christian.
> >
> > >
> > > Cc: sta...@vger.kernel.org
> > > Cc: Trigger Huang 
> > > Link: 
> > > https://www.collabora.com/news-and-blog/blog/2021/11/26/venus-on-qemu-enabling-new-virtual-vulkan-driver/#qcom1343
> > > Tested-by: Dmitry Osipenko  # AMDGPU (Qemu 
> > > and crosvm)
> > > Signed-off-by: Dmitry Osipenko 
> > > ---
> > >   drivers/gpu/drm/ttm/ttm_pool.c | 25 -
> > >   1 file changed, 24 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/gpu/drm/ttm/ttm_pool.c 
> > > b/drivers/gpu/drm/ttm/ttm_pool.c
> > > index 21b61631f73a..11e92bb149c9 100644
> > > --- a/drivers/gpu/drm/ttm/ttm_pool.c
> > > +++ b/drivers/gpu/drm/ttm/ttm_pool.c
> > > @@ -81,6 +81,7 @@ static struct page *ttm_pool_alloc_page(struct ttm_pool 
> > > *pool, gfp_t gfp_flags,
> > > unsigned long attr = DMA_ATTR_FORCE_CONTIGUOUS;
> > > struct ttm_pool_dma *dma;
> > > struct page *p;
> > > +   unsigned int i;
> > > void *vaddr;
> > > /* Don't set the __GFP_COMP flag for higher order allocations.
> > > @@ -93,8 +94,10 @@ static struct page *ttm_pool_alloc_page(struct 
> > > ttm_pool *pool, gfp_t gfp_flags,
> > > if (!pool->use_dma_alloc) {
> > > p = alloc_pages(gfp_flags, order);
> > > -   if (p)
> > > +   if (p) {
> > > p->private = order;
> > > +   goto ref_tail_pages;
> > > +   }
> > > return p;
> > > }
> > > @@ -120,6 +123,23 @@ static struct page *ttm_pool_alloc_page(struct 
> > > ttm_pool *pool, gfp_t gfp_flags,
> > > dma->vaddr = (unsigned long)vaddr | order;
> > > p->private = (unsigned long)dma;
> > > +
> > > +ref_tail_pages:
> > > +   /*
> > > +* KVM requires mapped tail pages to be refcounted because put_page()
> > > +* is invoked on them in the end of the page fault handling, and thus,
> > > +* tail pages need to be protected from the premature releasing.
> > > +* In fact, KVM page fault handler refuses to map tail pages to guest
> > > +* if they aren't refcounted because hva_to_pfn_remapped() checks the
> > > +* refcount specifically for this case.
> > > +*
> > > +* In particular, unreferenced tail pages result in a KVM "Bad 
> > > address"
> > > +* failure for VMMs that use VirtIO-GPU when guest's Mesa VirGL driver
> > > +* accesses mapped host TTM buffer that contains tail pages.
> > > +*/
> > > +   for (i = 1; i < 1 << order; i++)
> > > +   page_ref_inc(p + i);
> > > +
> > > return p;
> > >   error_free:
> > > @@ -133,6 +153,7 @@ static void ttm_pool_free_page(struct ttm_pool *pool, 
> > > enum ttm_caching caching,
> > >   {
> > > unsigned long attr = DMA_ATTR_FORCE_CONTIGUOUS;
> > > struct ttm_pool_dma *dma;
> > > +   unsigned int i;
> > > void *vaddr;
> > >   #ifdef CONFIG_X86
> > > @@ -142,6 +163,8 @@ static void ttm_pool_free_page(struct ttm_pool *pool, 
> > > enum ttm_caching caching,
> > > if (caching != ttm_cached && !PageHighMem(p))
> > > set_pages_wb(p, 1 << order);
> > >   #endif
> > > +   for (i = 1; i < 1 << order; i++)
> > > +   page_ref_dec(p + i);
> > > if (!pool || !pool->use_dma_alloc) {
> > > __free_pages(p, order);
> >
>
> --
> Daniel Vetter
> Software Engineer, Intel 

Re: [PATCH v6 1/5] iommu: Return -EMEDIUMTYPE for incompatible domain and device/group

2022-09-08 Thread Robin Murphy

On 2022-09-08 01:43, Jason Gunthorpe wrote:

On Wed, Sep 07, 2022 at 08:41:13PM +0100, Robin Murphy wrote:


FWIW, we're now very close to being able to validate dev->iommu against
where the domain came from in core code, and so short-circuit ->attach_dev
entirely if they don't match.


I don't think this is a long term direction. We have systems now with
a number of SMMU blocks and we really are going to see a need that
they share the iommu_domains so we don't have unncessary overheads
from duplicated io page table memory.

So ultimately I'd expect to pass the iommu_domain to the driver and
the driver will decide if the page table memory it represents is
compatible or not. Restricting to only the same iommu instance isn't
good..


Who said IOMMU instance?


Ah, I completely misunderstood what 'dev->iommu' was referring too, OK
I see.


Again, not what I was suggesting. In fact the nature of iommu_attach_group()
already rules out bogus devices getting this far, so all a driver currently
has to worry about is compatibility of a device that it definitely probed
with a domain that it definitely allocated. Therefore, from a caller's point
of view, if attaching to an existing domain returns -EINVAL, try another
domain; multiple different existing domains can be tried, and may also
return -EINVAL for the same or different reasons; the final attempt is to
allocate a fresh domain and attach to that, which should always be nominally
valid and *never* return -EINVAL. If any attempt returns any other error,
bail out down the usual "this should have worked but something went wrong"
path. Even if any driver did have a nonsensical "nothing went wrong, I just
can't attach my device to any of my domains" case, I don't think it would
really need distinguishing from any other general error anyway.


The algorithm you described is exactly what this series does, it just
used EMEDIUMTYPE instead of EINVAL. Changing it to EINVAL is not a
fundamental problem, just a bit more work.

Looking at Nicolin's series there is a bunch of existing errnos that
would still need converting, ie EXDEV, EBUSY, EOPNOTSUPP, EFAULT, and
ENXIO are all returned as codes for 'domain incompatible with device'
in various drivers. So the patch would still look much the same, just
changing them to EINVAL instead of EMEDIUMTYPE.

That leaves the question of the remaining EINVAL's that Nicolin did
not convert to EMEDIUMTYPE.

eg in the AMD driver:

if (!check_device(dev))
return -EINVAL;

iommu = rlookup_amd_iommu(dev);
if (!iommu)
return -EINVAL;

These are all cases of 'something is really wrong with the device or
iommu, everything will fail'. Other drivers are using ENODEV for this
already, so we'd probably have an additional patch changing various
places like that to ENODEV.

This mixture of error codes is the basic reason why a new code was
used, because none of the existing codes are used with any
consistency.

But OK, I'm on board, lets use more common errnos with specific
meaning, that can be documented in a comment someplace:
  ENOMEM - out of memory
  ENODEV - no domain can attach, device or iommu is messed up
  EINVAL - the domain is incompatible with the device
   - Same behavior as ENODEV, use is discouraged.

I think achieving consistency of error codes is a generally desirable
goal, it makes the error code actually useful.

Joerg this is a good bit of work, will you be OK with it?


Thus as long as we can maintain that basic guarantee that attaching
a group to a newly allocated domain can only ever fail for resource
allocation reasons and not some spurious "incompatibility", then we
don't need any obscure trickery, and a single, clear, error code is
in fact enough to say all that needs to be said.


As above, this is not the case, drivers do seem to have error paths
that are unconditional on the domain. Perhaps they are just protective
assertions and never happen.


Right, that's the gist of what I was getting at - I think it's worth 
putting in the effort to audit and fix the drivers so that that *can* be 
the case, then we can have a meaningful error API with standard codes 
effectively for free, rather than just sighing at the existing mess and 
building a slightly esoteric special case on top.


Case in point, the AMD checks quoted above are pointless, since it 
checks the same things in ->probe_device, and if that fails then the 
device won't get a group so there's no way for it to even reach 
->attach_dev any more. I'm sure there's a *lot* of cruft that can be 
cleared out now that per-device and per-domain ops give us this kind of 
inherent robustness.


Cheers,
Robin.


Regardless, it doesn't matter. If they return ENODEV or EINVAL the
VFIO side algorithm will continue to work fine, it just does alot more
work if EINVAL is permanently returned.

Thanks,
Jason

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org

RE: [PATCH v6 1/5] iommu: Return -EMEDIUMTYPE for incompatible domain and device/group

2022-09-08 Thread Tian, Kevin
> From: Tian, Kevin
> Sent: Thursday, September 8, 2022 5:31 PM
> > This mixture of error codes is the basic reason why a new code was
> > used, because none of the existing codes are used with any
> > consistency.
> 
> btw I saw the policy for -EBUSY is also not consistent in this series.
> 
> while it's correct to change -EBUSY to -EMEDIUMTYPE for omap, assuming
> that retrying another fresh domain for the said device should work:
> 
>   if (omap_domain->dev) {
> - dev_err(dev, "iommu domain is already attached\n");
> - ret = -EBUSY;
> + ret = -EMEDIUMTYPE;
>   goto out;
>   }
> 
> the change in tegra-gart doesn't sound correct:
> 
>   if (gart->active_domain && gart->active_domain != domain) {
> - ret = -EBUSY;
> + ret = -EMEDIUMTYPE;
> 
> one device cannot be attached to two domains. This fact doesn't change
> no matter how many domains are tried. In concept this check is
> redundant and should have been done by iommu core, but obviously we
> didn't pay attention to what -EBUSY actually represents in this path.
> 

oops. Above is actually a right retry condition. gart is iommu instead of
device. So in concept retrying gart->active_domain for the device could
work.

So please ignore this comment.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


RE: [PATCH v6 1/5] iommu: Return -EMEDIUMTYPE for incompatible domain and device/group

2022-09-08 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Thursday, September 8, 2022 8:43 AM
> 
> On Wed, Sep 07, 2022 at 08:41:13PM +0100, Robin Murphy wrote:
> 
> > Again, not what I was suggesting. In fact the nature of
> iommu_attach_group()
> > already rules out bogus devices getting this far, so all a driver currently
> > has to worry about is compatibility of a device that it definitely probed
> > with a domain that it definitely allocated. Therefore, from a caller's point
> > of view, if attaching to an existing domain returns -EINVAL, try another
> > domain; multiple different existing domains can be tried, and may also
> > return -EINVAL for the same or different reasons; the final attempt is to
> > allocate a fresh domain and attach to that, which should always be
> nominally
> > valid and *never* return -EINVAL. If any attempt returns any other error,
> > bail out down the usual "this should have worked but something went
> wrong"
> > path. Even if any driver did have a nonsensical "nothing went wrong, I just
> > can't attach my device to any of my domains" case, I don't think it would
> > really need distinguishing from any other general error anyway.
> 
> The algorithm you described is exactly what this series does, it just
> used EMEDIUMTYPE instead of EINVAL. Changing it to EINVAL is not a
> fundamental problem, just a bit more work.
> 
> Looking at Nicolin's series there is a bunch of existing errnos that
> would still need converting, ie EXDEV, EBUSY, EOPNOTSUPP, EFAULT, and
> ENXIO are all returned as codes for 'domain incompatible with device'
> in various drivers. So the patch would still look much the same, just
> changing them to EINVAL instead of EMEDIUMTYPE.
> 
> That leaves the question of the remaining EINVAL's that Nicolin did
> not convert to EMEDIUMTYPE.
> 
> eg in the AMD driver:
> 
>   if (!check_device(dev))
>   return -EINVAL;
> 
>   iommu = rlookup_amd_iommu(dev);
>   if (!iommu)
>   return -EINVAL;
> 
> These are all cases of 'something is really wrong with the device or
> iommu, everything will fail'. Other drivers are using ENODEV for this
> already, so we'd probably have an additional patch changing various
> places like that to ENODEV.
> 
> This mixture of error codes is the basic reason why a new code was
> used, because none of the existing codes are used with any
> consistency.

btw I saw the policy for -EBUSY is also not consistent in this series.

while it's correct to change -EBUSY to -EMEDIUMTYPE for omap, assuming
that retrying another fresh domain for the said device should work:

if (omap_domain->dev) {
-   dev_err(dev, "iommu domain is already attached\n");
-   ret = -EBUSY;
+   ret = -EMEDIUMTYPE;
goto out;
}

the change in tegra-gart doesn't sound correct:

if (gart->active_domain && gart->active_domain != domain) {
-   ret = -EBUSY;
+   ret = -EMEDIUMTYPE;

one device cannot be attached to two domains. This fact doesn't change
no matter how many domains are tried. In concept this check is
redundant and should have been done by iommu core, but obviously we
didn't pay attention to what -EBUSY actually represents in this path.

> 
> But OK, I'm on board, lets use more common errnos with specific
> meaning, that can be documented in a comment someplace:
>  ENOMEM - out of memory
>  ENODEV - no domain can attach, device or iommu is messed up
>  EINVAL - the domain is incompatible with the device
>   - Same behavior as ENODEV, use is discouraged.

There are also cases where common kAPIs are called in the attach
path which may return -EINVAL and random errno, e.g.:

omap_iommu_attach_dev()
  omap_iommu_attach()
iommu_enable()
  pm_runtime_get_sync()
__pm_runtime_resume()
  rpm_resume()
if (dev->power.runtime_error) {
retval = -EINVAL;

viommu_attach_dev()
  viommu_domain_finalise()
ida_alloc_range()
if ((int)min < 0)
return -ENOSPC;

> 
> I think achieving consistency of error codes is a generally desirable
> goal, it makes the error code actually useful.
> 
> Joerg this is a good bit of work, will you be OK with it?
> 
> > Thus as long as we can maintain that basic guarantee that attaching
> > a group to a newly allocated domain can only ever fail for resource
> > allocation reasons and not some spurious "incompatibility", then we
> > don't need any obscure trickery, and a single, clear, error code is
> > in fact enough to say all that needs to be said.
> 
> As above, this is not the case, drivers do seem to have error paths
> that are unconditional on the domain. Perhaps they are just protective
> assertions and never happen.
> 
> Regardless, it doesn't matter. If they return ENODEV or EINVAL the
> VFIO side algorithm will continue to work fine, it just does alot more
> work if EINVAL is permanently returned.
> 

I don't see an elegant option here.

If we care 

Re: [PATCH 0/6] virtio/vsock: introduce dgrams, sk_buff, and qdisc

2022-09-08 Thread Stefano Garzarella

On Thu, Aug 18, 2022 at 02:39:32PM +, Bobby Eshleman wrote:

On Tue, Sep 06, 2022 at 09:26:33AM -0400, Stefan Hajnoczi wrote:

Hi Bobby,
If you are attending Linux Foundation conferences in Dublin, Ireland
next week (Linux Plumbers Conference, Open Source Summit Europe, KVM
Forum, ContainerCon Europe, CloudOpen Europe, etc) then you could meet
Stefano Garzarella and others to discuss this patch series.

Using netdev and sk_buff is a big change to vsock. Discussing your
requirements and the future direction of vsock in person could help.

If you won't be in Dublin, don't worry. You can schedule a video call if
you feel it would be helpful to discuss these topics.

Stefan


Hey Stefan,

That sounds like a great idea!


Yep, I agree!


I was unable to make the Dublin trip work
so I think a video call would be best, of course if okay with everyone.


It will work for me, but I'll be a bit busy in the next 2 weeks:

From Sep 12 to Sep 14 I'll be at KVM Forum, so it may be difficult to 
arrange, but we can try.


Sep 15 I'm not available.
Sep 16 I'm traveling, but early in my morning, so I should be available.

Form Sep 10 to Sep 23 I'll be mostly off, but I can try to find some 
slots if needed.


From Sep 26 I'm back and fully available.

Let's see if others are available and try to find a slot :-)

Thanks,
Stefano

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization