Re: [External] Re: [RFC PATCH] virtio-vsock: add description for datagram type

2021-03-30 Thread Jiang Wang .
Hi Stefano,

I checked the dgram code again. I think assigning a transport for each
packet is doable. The dgram path will be similar to stream in having two
transports.

If there is no other problem, I can submit a Linux kernel patch to support
nested dgram on VMCI first. Then it will be easier for virtio vsock.

Also, I don't think we need to put anything related to this
multiple-transport support in the spec. Let me know if you think otherwise.
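As a rough illustration of per-packet transport selection (all names and the CID-based split below are hypothetical, modeled on how the stream code chooses between guest-to-host and host-to-guest transports — this is not the actual af_vsock API):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define VMADDR_CID_HOST 2u

struct vsock_transport {
    const char *name;
};

/* One guest-to-host and one host-to-guest dgram transport may be
 * registered at a time (illustrative stand-in for the real registry). */
static struct vsock_transport *dgram_g2h;
static struct vsock_transport *dgram_h2g;

static int vsock_dgram_register(struct vsock_transport *t, bool g2h)
{
    struct vsock_transport **slot = g2h ? &dgram_g2h : &dgram_h2g;

    if (*slot)
        return -1;      /* that slot is already taken */
    *slot = t;
    return 0;
}

/* Chosen per packet: CIDs up to the host CID go to the guest-to-host
 * transport, anything else to the host-to-guest one. */
static struct vsock_transport *
vsock_dgram_lookup_transport(unsigned int remote_cid)
{
    return remote_cid <= VMADDR_CID_HOST ? dgram_g2h : dgram_h2g;
}
```

With both transports registered, packets addressed to the host CID would be routed to the guest-to-host transport and packets addressed to guest CIDs to the host-to-guest one, which is what nested-VM support needs.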

Regards,

Jiang

On Tue, Mar 30, 2021 at 11:34 AM Jiang Wang .  wrote:
>
> On Tue, Mar 30, 2021 at 8:32 AM Stefano Garzarella  
> wrote:
> >
> > Hi Jiang,
> >
> > On Fri, Mar 26, 2021 at 04:40:09PM -0700, Jiang Wang . wrote:
> > >Hi Michael and Stefan,
> > >
> > >I thought about this and discussed it with my colleague Cong Wang.
> > >One idea is to make the current asynchronous send_pkt flow synchronous;
> > >then, if the virtqueue is full, the function can return -ENOMEM all the
> > >way back to the caller, and the caller can check the return value of
> > >sendmsg and slow down when necessary.
> > >
> > >In the spec, we can put something like, if the virtqueue is full, the 
> > >caller
> > >should be notified with an error etc.
> > >
> > >In terms of implementation, that means we will remove the current
> > >send_pkt_work for both stream and dgram sockets. Currently, the
> > >code path uses RCU and a work queue, then grabs a mutex in the
> > >work queue function. Since we cannot grab a mutex inside an RCU
> > >critical section, we would have to change RCU to a normal
> > >reference counting mechanism. I think this is doable. The drawback
> > >is that reference counting generally costs a few more cycles than
> > >RCU, so there is a small price to pay. Another option is to use
> > >sleepable RCU (SRCU) and remove the work queue.
> > >
> > >What do you guys think?
> >
> > Another thing that came to mind, related not to the spec but to the Linux
> > implementation, is multi-transport support.
> > When we discussed the initial proposals [1][2], we decided to take a
> > shortcut for DGRAM, since the only transport with DGRAM support was
> > vmci. So for now only a single transport with VSOCK_TRANSPORT_F_DGRAM
> > set can be registered.
> >
> > Since virtio-transport and vhost-transport will also support DGRAM, we
> > need to find a way to allow multiple transports that support DGRAM to be
> > registered together to support nested VMs.
> >
> > Have you already thought about how to solve this problem?
> >
> > We should definitely allow the registration of more transports with
> > VSOCK_TRANSPORT_F_DGRAM, and dynamically choose which one to use when
> > sending a packet.
>
> Got it. We briefly discussed multiple dgram transport support in my old
> email thread. At that time, I was thinking of maybe checking the
> transport for each packet. I did not spend more time on that part after
> that. I will think about it more and get back to you. Thanks
>
> > Thanks,
> > Stefano
> >
> > [1] https://www.spinics.net/lists/netdev/msg570274.html
> > [2] https://www.spinics.net/lists/netdev/msg575792.html
> >
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [External] Re: [RFC PATCH] virtio-vsock: add description for datagram type

2021-03-30 Thread Jiang Wang .
On Tue, Mar 30, 2021 at 8:32 AM Stefano Garzarella  wrote:
>
> Hi Jiang,
>
> On Fri, Mar 26, 2021 at 04:40:09PM -0700, Jiang Wang . wrote:
> >Hi Michael and Stefan,
> >
> >I thought about this and discussed it with my colleague Cong Wang.
> >One idea is to make the current asynchronous send_pkt flow synchronous;
> >then, if the virtqueue is full, the function can return -ENOMEM all the
> >way back to the caller, and the caller can check the return value of
> >sendmsg and slow down when necessary.
> >
> >In the spec, we can put something like, if the virtqueue is full, the caller
> >should be notified with an error etc.
> >
> >In terms of implementation, that means we will remove the current
> >send_pkt_work for both stream and dgram sockets. Currently, the
> >code path uses RCU and a work queue, then grabs a mutex in the
> >work queue function. Since we cannot grab a mutex inside an RCU
> >critical section, we would have to change RCU to a normal
> >reference counting mechanism. I think this is doable. The drawback
> >is that reference counting generally costs a few more cycles than
> >RCU, so there is a small price to pay. Another option is to use
> >sleepable RCU (SRCU) and remove the work queue.
> >
> >What do you guys think?
>
> Another thing that came to mind, related not to the spec but to the Linux
> implementation, is multi-transport support.
> When we discussed the initial proposals [1][2], we decided to take a
> shortcut for DGRAM, since the only transport with DGRAM support was
> vmci. So for now only a single transport with VSOCK_TRANSPORT_F_DGRAM
> set can be registered.
>
> Since virtio-transport and vhost-transport will also support DGRAM, we
> need to find a way to allow multiple transports that support DGRAM to be
> registered together to support nested VMs.
>
> Have you already thought about how to solve this problem?
>
> We should definitely allow the registration of more transports with
> VSOCK_TRANSPORT_F_DGRAM, and dynamically choose which one to use when
> sending a packet.

Got it. We briefly discussed multiple dgram transport support in my old
email thread. At that time, I was thinking of maybe checking the
transport for each packet. I did not spend more time on that part after
that. I will think about it more and get back to you. Thanks

> Thanks,
> Stefano
>
> [1] https://www.spinics.net/lists/netdev/msg570274.html
> [2] https://www.spinics.net/lists/netdev/msg575792.html
>


Re: [External] Re: [RFC PATCH] virtio-vsock: add description for datagram type

2021-03-30 Thread Jiang Wang .
On Tue, Mar 30, 2021 at 3:42 AM Stefan Hajnoczi  wrote:
>
> On Mon, Mar 29, 2021 at 04:22:28PM -0700, Jiang Wang . wrote:
> > On Mon, Mar 29, 2021 at 2:26 AM Stefan Hajnoczi  wrote:
> > >
> > > On Fri, Mar 26, 2021 at 04:40:09PM -0700, Jiang Wang . wrote:
> > > > I thought about this and discussed it with my colleague Cong Wang.
> > > > One idea is to make the current asynchronous send_pkt flow
> > > > synchronous; then, if the virtqueue is full, the function can return
> > > > -ENOMEM all the way back to the caller, and the caller can check the
> > > > return value of sendmsg and slow down when necessary.
> > > >
> > > > In the spec, we can put something like, if the virtqueue is full, the 
> > > > caller
> > > > should be notified with an error etc.
> > > >
> > > > In terms of implementation, that means we will remove the current
> > > > send_pkt_work for both stream and dgram sockets. Currently, the
> > > > code path uses RCU and a work queue, then grabs a mutex in the
> > > > work queue function. Since we cannot grab a mutex inside an RCU
> > > > critical section, we would have to change RCU to a normal
> > > > reference counting mechanism. I think this is doable. The drawback
> > > > is that reference counting generally costs a few more cycles than
> > > > RCU, so there is a small price to pay. Another option is to use
> > > > sleepable RCU (SRCU) and remove the work queue.
> > > >
> > > > What do you guys think?
> > >
> > > I think the tx code path is like this because of reliable delivery.
> > > Maybe a separate datagram rx/tx code path would be simpler?
> >
> > I thought about this too. dgram can have separate rx/tx paths from
> > the stream types. In this case, the_virtio_vsock will still be shared
> > because the virtqueues have to be in one structure. Then
> > the_virtio_vsock will be protected by RCU plus a reference count (or a
> > sleepable RCU).
> >
> > In vhost_vsock_dev_release, it will wait for both the RCU grace period
> > and the reference count to drain, and then free the memory. I think
> > this is doable. Let me know if there is a better way to do it.
> > Btw, I think the dgram tx code path will be quite different from
> > stream, but the dgram rx path will be similar to the stream type.
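A toy model of the teardown ordering described above — stop new users, then wait for existing references before freeing. All names are made up for illustration; the real code would use kref/SRCU primitives rather than a bare counter:

```c
#include <assert.h>

/* Toy stand-in for the_virtio_vsock guarded by a plain reference count. */
struct toy_vsock {
    int refs;
    int alive;
};

static int toy_get(struct toy_vsock *v)
{
    if (!v->alive)
        return -1;      /* device going away: sender must back off */
    v->refs++;
    return 0;
}

static void toy_put(struct toy_vsock *v)
{
    assert(v->refs > 0);
    v->refs--;
}

/* Models vhost_vsock_dev_release(): forbid new users, then the memory can
 * only be freed once every outstanding reference has been dropped. */
static int toy_release(struct toy_vsock *v)
{
    v->alive = 0;
    return v->refs == 0;    /* nonzero when it is safe to free now */
}
```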
> >
> > > Take the datagram tx virtqueue lock, try to add the packet into the
> > > virtqueue, and return -ENOBUFS if the virtqueue is full. Then use the
> > > datagram socket's sndbuf accounting to prevent queuing up too many
> > > packets. When a datagram tx virtqueue buffer is used by the device,
> > > select queued packets for transmission.
> >
> > I am not sure about the last sentence. In the new design, there will
> > be no other queued packets for dgram (except those in the virtqueue).
> > When a dgram tx virtqueue buffer is used by the device, the driver
> > side just needs to free some space. There is no need to trigger more
> > transmission.
>
> This approach drops packets when the virtqueue is full. In order to get
> close to UDP semantics I think the following is necessary:
>
> 1. Packets are placed directly into the tx virtqueue when possible.
> 2. Packets are queued up to sndbuf bytes if the tx virtqueue is full.
> 3. When a full tx virtqueue gets some free space again there is an
>algorithm that selects from among the queued sockets.
>
> This is how unreliable delivery can be made somewhat fair between
> competing threads. I thought you were trying to achieve this (it's
> similar to what UDP does)?

I see. I was thinking of something different, without an extra queue. I
think your approach is better. For step 3, I will just start with a
simple FIFO algorithm first.
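Stefan's three steps could be sketched as a small user-space simulation (sizes, names, and the FIFO selection policy below are illustrative only, not the driver code):

```c
#include <assert.h>

#define VQ_SIZE    4   /* slots in the simulated dgram tx virtqueue */
#define SNDBUF_MAX 8   /* packets a socket may queue while the vq is full */
#define FIFO_SZ    64

struct sim_sock {
    int queued;        /* packets charged to this socket's sndbuf */
};

static int vq_used;    /* occupied virtqueue slots */
static struct sim_sock *fifo[FIFO_SZ];
static int fifo_head, fifo_tail;

/* Steps 1 and 2: place the packet directly in the virtqueue when there is
 * room, otherwise queue it against the socket's sndbuf; return -1 (an
 * ENOBUFS-like error) once even the sndbuf is exhausted. */
static int dgram_send(struct sim_sock *s)
{
    if (vq_used < VQ_SIZE) {
        vq_used++;
        return 0;
    }
    if (s->queued >= SNDBUF_MAX)
        return -1;
    s->queued++;
    fifo[fifo_tail % FIFO_SZ] = s;
    fifo_tail++;
    return 0;
}

/* Step 3: when the device marks a tx buffer as used, free its slot and
 * promote the longest-waiting queued packet (simple FIFO selection). */
static void dgram_tx_done(void)
{
    assert(vq_used > 0);
    vq_used--;
    if (fifo_head != fifo_tail) {
        struct sim_sock *s = fifo[fifo_head % FIFO_SZ];

        fifo_head++;
        s->queued--;
        vq_used++;
    }
}
```

A fancier step-3 policy (e.g. round-robin across sockets) could later replace the FIFO without changing the send path.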


> Stefan


Re: [PATCH 16/18] iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE

2021-03-30 Thread Robin Murphy

On 2021-03-30 14:58, Will Deacon wrote:

On Tue, Mar 30, 2021 at 02:19:38PM +0100, Robin Murphy wrote:

On 2021-03-30 14:11, Will Deacon wrote:

On Tue, Mar 16, 2021 at 04:38:22PM +0100, Christoph Hellwig wrote:

From: Robin Murphy 

Instead, make the global iommu_dma_strict parameter in iommu.c canonical by
exporting helpers to get and set it, and use those directly in the drivers.

This makes sure that the iommu.strict parameter also works for the AMD and
Intel IOMMU drivers on x86.  As those default to lazy flushing, a new
IOMMU_CMD_LINE_STRICT is used to turn the value into a tristate to
represent the default if not overridden by an explicit parameter.

Signed-off-by: Robin Murphy .
[ported on top of the other iommu_attr changes and added a few small
   missing bits]
Signed-off-by: Christoph Hellwig 
---
   drivers/iommu/amd/iommu.c   | 23 +---
   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 50 +---
   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  1 -
   drivers/iommu/arm/arm-smmu/arm-smmu.c   | 27 +
   drivers/iommu/dma-iommu.c   |  9 +--
   drivers/iommu/intel/iommu.c | 64 -
   drivers/iommu/iommu.c   | 27 ++---
   include/linux/iommu.h   |  4 +-
   8 files changed, 40 insertions(+), 165 deletions(-)


I really like this cleanup, but I can't help wonder if it's going in the
wrong direction. With SoCs often having multiple IOMMU instances and a
distinction between "trusted" and "untrusted" devices, then having the
flush-queue enabled on a per-IOMMU or per-domain basis doesn't sound
unreasonable to me, but this change makes it a global property.


The intent here was just to streamline the existing behaviour of stuffing a
global property into a domain attribute then pulling it out again in the
illusion that it was in any way per-domain. We're still checking
dev_is_untrusted() before making an actual decision, and it's not like we
can't add more factors at that point if we want to.


Like I say, the cleanup is great. I'm just wondering whether there's a
better way to express the complicated logic to decide whether or not to use
the flush queue than what we end up with:

if (!cookie->fq_domain && (!dev || !dev_is_untrusted(dev)) &&
domain->ops->flush_iotlb_all && !iommu_get_dma_strict())

which is mixing up globals, device properties and domain properties. The
result is that the driver code ends up just using the global to determine
whether or not to pass IO_PGTABLE_QUIRK_NON_STRICT to the page-table code,
which is a departure from the current way of doing things.


But previously, SMMU only ever saw the global policy piped through the 
domain attribute by iommu_group_alloc_default_domain(), so there's no 
functional change there.


Obviously some of the above checks could be factored out into some kind 
of iommu_use_flush_queue() helper that IOMMU drivers can also call if 
they need to keep in sync. Or maybe we just allow iommu-dma to set 
IO_PGTABLE_QUIRK_NON_STRICT directly via iommu_set_pgtable_quirks() if 
we're treating that as a generic thing now.
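A helper along the lines Robin suggests might look roughly like this (the struct definitions are bare stubs standing in for the real kernel types; this is purely a sketch of the factoring, not actual iommu code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-ins for the kernel types involved. */
struct device { bool untrusted; };
struct iommu_ops { bool has_flush_iotlb_all; };
struct iommu_domain { const struct iommu_ops *ops; };

static bool iommu_dma_strict = true;    /* the global, strict by default */

static bool iommu_get_dma_strict(void)
{
    return iommu_dma_strict;
}

static bool dev_is_untrusted(const struct device *dev)
{
    return dev && dev->untrusted;
}

/* One shared predicate instead of the open-coded condition: use the
 * flush queue (lazy invalidation) only when the global mode is
 * non-strict, the device is trusted, and the domain can flush its
 * whole IOTLB at once. */
static bool iommu_use_flush_queue(const struct iommu_domain *domain,
                                  const struct device *dev)
{
    return !iommu_get_dma_strict() &&
           !dev_is_untrusted(dev) &&
           domain->ops->has_flush_iotlb_all;
}
```

Drivers that need to stay in sync (e.g. to decide whether to set IO_PGTABLE_QUIRK_NON_STRICT) could then call the same helper instead of reading the global directly.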



For example, see the recent patch from Lu Baolu:

https://lore.kernel.org/r/20210225061454.2864009-1-baolu...@linux.intel.com


Erm, this patch is based on that one, it's right there in the context :/


Ah, sorry, I didn't spot that! I was just trying to illustrate that this
is per-device.


Sure, I understand - and I'm just trying to bang home that despite 
appearances it's never actually been treated as such for SMMU, so 
anything that's wrong after this change was already wrong before.


Robin.


Re: [External] Re: [RFC PATCH] virtio-vsock: add description for datagram type

2021-03-30 Thread Stefano Garzarella

Hi Jiang,

On Fri, Mar 26, 2021 at 04:40:09PM -0700, Jiang Wang . wrote:

Hi Michael and Stefan,

I thought about this and discussed it with my colleague Cong Wang.
One idea is to make the current asynchronous send_pkt flow synchronous;
then, if the virtqueue is full, the function can return -ENOMEM all the
way back to the caller, and the caller can check the return value of
sendmsg and slow down when necessary.

In the spec, we can put something like, if the virtqueue is full, the caller
should be notified with an error etc.

In terms of implementation, that means we will remove the current
send_pkt_work for both stream and dgram sockets. Currently, the
code path uses RCU and a work queue, then grabs a mutex in the
work queue function. Since we cannot grab a mutex inside an RCU
critical section, we would have to change RCU to a normal
reference counting mechanism. I think this is doable. The drawback
is that reference counting generally costs a few more cycles than
RCU, so there is a small price to pay. Another option is to use
sleepable RCU (SRCU) and remove the work queue.

What do you guys think?


Another thing that came to mind, related not to the spec but to the Linux
implementation, is multi-transport support.
When we discussed the initial proposals [1][2], we decided to take a 
shortcut for DGRAM, since the only transport with DGRAM support was 
vmci. So for now only a single transport with VSOCK_TRANSPORT_F_DGRAM 
set can be registered.


Since virtio-transport and vhost-transport will also support DGRAM, we
need to find a way to allow multiple transports that support DGRAM to be 
registered together to support nested VMs.


Have you already thought about how to solve this problem?

We should definitely allow the registration of more transports with 
VSOCK_TRANSPORT_F_DGRAM, and dynamically choose which one to use when 
sending a packet.


Thanks,
Stefano

[1] https://www.spinics.net/lists/netdev/msg570274.html
[2] https://www.spinics.net/lists/netdev/msg575792.html



Re: [PATCH 16/18] iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE

2021-03-30 Thread Will Deacon
On Tue, Mar 30, 2021 at 02:19:38PM +0100, Robin Murphy wrote:
> On 2021-03-30 14:11, Will Deacon wrote:
> > On Tue, Mar 16, 2021 at 04:38:22PM +0100, Christoph Hellwig wrote:
> > > From: Robin Murphy 
> > > 
> > > Instead, make the global iommu_dma_strict parameter in iommu.c canonical by
> > > exporting helpers to get and set it, and use those directly in the drivers.
> > >
> > > This makes sure that the iommu.strict parameter also works for the AMD and
> > > Intel IOMMU drivers on x86.  As those default to lazy flushing, a new
> > > IOMMU_CMD_LINE_STRICT is used to turn the value into a tristate to
> > > represent the default if not overridden by an explicit parameter.
> > > 
> > > Signed-off-by: Robin Murphy .
> > > [ported on top of the other iommu_attr changes and added a few small
> > >   missing bits]
> > > Signed-off-by: Christoph Hellwig 
> > > ---
> > >   drivers/iommu/amd/iommu.c   | 23 +---
> > >   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 50 +---
> > >   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  1 -
> > >   drivers/iommu/arm/arm-smmu/arm-smmu.c   | 27 +
> > >   drivers/iommu/dma-iommu.c   |  9 +--
> > >   drivers/iommu/intel/iommu.c | 64 -
> > >   drivers/iommu/iommu.c   | 27 ++---
> > >   include/linux/iommu.h   |  4 +-
> > >   8 files changed, 40 insertions(+), 165 deletions(-)
> > 
> > I really like this cleanup, but I can't help wonder if it's going in the
> > wrong direction. With SoCs often having multiple IOMMU instances and a
> > distinction between "trusted" and "untrusted" devices, then having the
> > flush-queue enabled on a per-IOMMU or per-domain basis doesn't sound
> > unreasonable to me, but this change makes it a global property.
> 
> The intent here was just to streamline the existing behaviour of stuffing a
> global property into a domain attribute then pulling it out again in the
> illusion that it was in any way per-domain. We're still checking
> dev_is_untrusted() before making an actual decision, and it's not like we
> can't add more factors at that point if we want to.

Like I say, the cleanup is great. I'm just wondering whether there's a
better way to express the complicated logic to decide whether or not to use
the flush queue than what we end up with:

if (!cookie->fq_domain && (!dev || !dev_is_untrusted(dev)) &&
domain->ops->flush_iotlb_all && !iommu_get_dma_strict())

which is mixing up globals, device properties and domain properties. The
result is that the driver code ends up just using the global to determine
whether or not to pass IO_PGTABLE_QUIRK_NON_STRICT to the page-table code,
which is a departure from the current way of doing things.

> > For example, see the recent patch from Lu Baolu:
> > 
> > https://lore.kernel.org/r/20210225061454.2864009-1-baolu...@linux.intel.com
> 
> Erm, this patch is based on that one, it's right there in the context :/

Ah, sorry, I didn't spot that! I was just trying to illustrate that this
is per-device.

Will


Re: [virtio-comment] Re: [MASSMAIL KLMS] Re: [virtio-comment] [RFC PATCH v4 2/2] virtio-vsock: SOCK_SEQPACKET description

2021-03-30 Thread Stefan Hajnoczi
On Tue, Mar 30, 2021 at 12:50:06PM +0300, Arseny Krasnov wrote:
> 
> On 30.03.2021 11:55, Stefan Hajnoczi wrote:
> > On Tue, Mar 30, 2021 at 09:15:39AM +0300, Arseny Krasnov wrote:
> >> On 30.03.2021 00:28, Stefano Garzarella wrote:
> >>> On Mon, Mar 29, 2021 at 08:33:27PM +0300, Arseny Krasnov wrote:
>  On 29.03.2021 19:11, Stefan Hajnoczi wrote:
> > On Fri, Mar 26, 2021 at 12:02:50PM +0300, Arseny Krasnov wrote:
> >> @@ -98,6 +102,10 @@ \subsection{Device Operation}\label{sec:Device 
> >> Types / Socket Device / Device Op
> >>  #define VIRTIO_VSOCK_OP_CREDIT_UPDATE  6
> >>  /* Request the peer to send the credit info to us */
> >>  #define VIRTIO_VSOCK_OP_CREDIT_REQUEST 7
> >> +/* Message begin for SOCK_SEQPACKET */
> >> +#define VIRTIO_VSOCK_OP_SEQ_BEGIN  8
> >> +/* Message end for SOCK_SEQPACKET */
> >> +#define VIRTIO_VSOCK_OP_SEQ_END9
> > The struct virtio_vsock_hdr->flags field is le32 and currently unused.
> > Could 24 bits be used for a unique message id and 8 bits for flags? 1
> > flag bit could be used for end-of-message and the remaining 7 bits could
> > be reserved. That way SEQ_BEGIN and SEQ_END are not necessary.  
> > Pressure
> > on the virtqueue would be reduced and performance should be comparable
> > to SOCK_STREAM.
>  Well, my first versions of the SOCK_SEQPACKET implementation worked
>  something like this: I used the flags field of the header as the length
>  of the whole message. I discussed it with Stefano Garzarella, and he
>  said it would be better to use a special "header" in the packet's
>  payload to keep SOCK_SEQPACKET-specific data, instead of reusing the
>  packet's header fields.
> >>> IIRC in the first implementation SEQ_BEGIN was an empty message and we
> >>> hadn't added the msg_id yet. So since we needed to carry both the id
> >>> and the total length, I suggested using the payload to carry this
> >>> extra information.
> >>>
> >>> IIUC what Stefan is suggesting is a bit different, and I think it would
> >>> be cool to implement: we can remove the boundary packets, use only 8
> >>> bits for the flags, and add a new field to reuse the 24 unused bits
> >>> (maybe even 16 bits would be enough).
> >>> At that point we will only use the EOR flag to know the last packet.
> >>>
> >>> The main difference will be that the receiver will know the total size
> >>> only when the last packet is received.
> >>>
> >>> Do you see any issue with that approach?
> >> It will work, except we can't check whether any packet of a message,
> >> other than the last one (with the EOR bit), was dropped, since the
> >> receiver doesn't know the real length of the message. If that is OK,
> >> then I can implement it.
> > The credit mechanism ensures that packets are not dropped, so it's not
> > necessary to check for this condition.
> >
> > By the way, is a unique message ID needed? My understanding is:
> >
> > If two messages are being sent on a socket at the same time, either
> > their order is serialized (whichever message began first) or it is
> > undefined (whichever message completes first).
> 
> If we are talking about the case when two threads write to one socket at
> the same time, in Linux it is possible for the two messages to
> interleave (for vsock). But as far as I know, when a TCP socket is used,
> both 'write()' calls are serialized. Maybe it is a bug in vsock: when
> the first writer runs out of space, it sleeps. Then the second writer
> tries to send something, but as free space is exhausted, it sleeps too.
> Then a credit update is received from the peer. Both senders are woken
> up, but whichever sender grabs the socket lock first continues sending
> its message.
>
> So maybe we can add something like a semaphore to
> net/vmw_vsock/af_vsock.c to serialize the two 'write()' calls: the
> second sender enters the 'write()' path only after the first has left
> it.
>
> My implementation doesn't handle that yet, because I wanted to add the
> semaphore later, in a separate patch.

Yes, that is definitely an issue that the driver needs to take care of
if we don't have unique message IDs. Thanks for explaining!

Stefan
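The split Stefan proposes — 24 bits of message id plus 8 flag bits (one of them EOR) inside the existing le32 flags field — could be packed like this. The layout and macro names are only the proposal under discussion, not part of the spec:

```c
#include <assert.h>
#include <stdint.h>

#define VSOCK_SEQ_EOR       (1u << 0)       /* end-of-message flag bit */
#define VSOCK_SEQ_ID_SHIFT  8
#define VSOCK_SEQ_ID_MASK   0x00ffffffu     /* 24-bit message id */

/* Combine a message id and flag bits into one 32-bit header field. */
static uint32_t vsock_seq_pack(uint32_t msg_id, uint8_t flags)
{
    return ((msg_id & VSOCK_SEQ_ID_MASK) << VSOCK_SEQ_ID_SHIFT) | flags;
}

static uint32_t vsock_seq_msg_id(uint32_t hdr_flags)
{
    return (hdr_flags >> VSOCK_SEQ_ID_SHIFT) & VSOCK_SEQ_ID_MASK;
}

static uint8_t vsock_seq_flags(uint32_t hdr_flags)
{
    return (uint8_t)(hdr_flags & 0xffu);
}
```

The receiver would treat a packet as the end of a message when the EOR bit is set and its id matches the earlier fragments; without an id, the driver has to guarantee that fragments of different messages never interleave, as discussed above.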



Re: [PATCH 16/18] iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE

2021-03-30 Thread Robin Murphy

On 2021-03-30 14:11, Will Deacon wrote:

On Tue, Mar 16, 2021 at 04:38:22PM +0100, Christoph Hellwig wrote:

From: Robin Murphy 

Instead, make the global iommu_dma_strict parameter in iommu.c canonical by
exporting helpers to get and set it, and use those directly in the drivers.

This makes sure that the iommu.strict parameter also works for the AMD and
Intel IOMMU drivers on x86.  As those default to lazy flushing, a new
IOMMU_CMD_LINE_STRICT is used to turn the value into a tristate to
represent the default if not overridden by an explicit parameter.

Signed-off-by: Robin Murphy .
[ported on top of the other iommu_attr changes and added a few small
  missing bits]
Signed-off-by: Christoph Hellwig 
---
  drivers/iommu/amd/iommu.c   | 23 +---
  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 50 +---
  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  1 -
  drivers/iommu/arm/arm-smmu/arm-smmu.c   | 27 +
  drivers/iommu/dma-iommu.c   |  9 +--
  drivers/iommu/intel/iommu.c | 64 -
  drivers/iommu/iommu.c   | 27 ++---
  include/linux/iommu.h   |  4 +-
  8 files changed, 40 insertions(+), 165 deletions(-)


I really like this cleanup, but I can't help wonder if it's going in the
wrong direction. With SoCs often having multiple IOMMU instances and a
distinction between "trusted" and "untrusted" devices, then having the
flush-queue enabled on a per-IOMMU or per-domain basis doesn't sound
unreasonable to me, but this change makes it a global property.


The intent here was just to streamline the existing behaviour of 
stuffing a global property into a domain attribute then pulling it out 
again in the illusion that it was in any way per-domain. We're still 
checking dev_is_untrusted() before making an actual decision, and it's 
not like we can't add more factors at that point if we want to.



For example, see the recent patch from Lu Baolu:

https://lore.kernel.org/r/20210225061454.2864009-1-baolu...@linux.intel.com


Erm, this patch is based on that one, it's right there in the context :/

Thanks,
Robin.


Re: [PATCH 18/18] iommu: remove iommu_domain_{get,set}_attr

2021-03-30 Thread Will Deacon
On Tue, Mar 16, 2021 at 04:38:24PM +0100, Christoph Hellwig wrote:
> Remove the now unused iommu attr infrastructure.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/iommu/iommu.c | 26 --
>  include/linux/iommu.h | 36 
>  2 files changed, 62 deletions(-)

Acked-by: Will Deacon 

Will


Re: [PATCH 17/18] iommu: remove DOMAIN_ATTR_IO_PGTABLE_CFG

2021-03-30 Thread Will Deacon
On Tue, Mar 16, 2021 at 04:38:23PM +0100, Christoph Hellwig wrote:
> Use an explicit set_pgtable_quirks method instead that just passes
> the actual quirk bitmask instead.
> 
> Signed-off-by: Christoph Hellwig 
> Acked-by: Li Yang 
> ---
>  drivers/gpu/drm/msm/adreno/adreno_gpu.c |  5 +-
>  drivers/iommu/arm/arm-smmu/arm-smmu.c   | 64 +
>  drivers/iommu/arm/arm-smmu/arm-smmu.h   |  2 +-
>  drivers/iommu/iommu.c   | 11 +
>  include/linux/io-pgtable.h  |  4 --
>  include/linux/iommu.h   | 12 -
>  6 files changed, 35 insertions(+), 63 deletions(-)

I'm fine with this for now, although there has been talk about passing
things other than boolean flags as page-table quirks. We can cross that
bridge when we get there, so:

Acked-by: Will Deacon 

Will


Re: [PATCH 16/18] iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE

2021-03-30 Thread Will Deacon
On Tue, Mar 16, 2021 at 04:38:22PM +0100, Christoph Hellwig wrote:
> From: Robin Murphy 
> 
> Instead, make the global iommu_dma_strict parameter in iommu.c canonical by
> exporting helpers to get and set it, and use those directly in the drivers.
>
> This makes sure that the iommu.strict parameter also works for the AMD and
> Intel IOMMU drivers on x86.  As those default to lazy flushing, a new
> IOMMU_CMD_LINE_STRICT is used to turn the value into a tristate to
> represent the default if not overridden by an explicit parameter.
> 
> Signed-off-by: Robin Murphy .
> [ported on top of the other iommu_attr changes and added a few small
>  missing bits]
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/iommu/amd/iommu.c   | 23 +---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 50 +---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  1 -
>  drivers/iommu/arm/arm-smmu/arm-smmu.c   | 27 +
>  drivers/iommu/dma-iommu.c   |  9 +--
>  drivers/iommu/intel/iommu.c | 64 -
>  drivers/iommu/iommu.c   | 27 ++---
>  include/linux/iommu.h   |  4 +-
>  8 files changed, 40 insertions(+), 165 deletions(-)

I really like this cleanup, but I can't help wonder if it's going in the
wrong direction. With SoCs often having multiple IOMMU instances and a
distinction between "trusted" and "untrusted" devices, then having the
flush-queue enabled on a per-IOMMU or per-domain basis doesn't sound
unreasonable to me, but this change makes it a global property.

For example, see the recent patch from Lu Baolu:

https://lore.kernel.org/r/20210225061454.2864009-1-baolu...@linux.intel.com

Will


Re: [PATCH 15/18] iommu: remove iommu_set_cmd_line_dma_api and iommu_cmd_line_dma_api

2021-03-30 Thread Will Deacon
On Tue, Mar 16, 2021 at 04:38:21PM +0100, Christoph Hellwig wrote:
> Don't obfuscate the trivial bit flag check.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/iommu/iommu.c | 23 +--
>  1 file changed, 5 insertions(+), 18 deletions(-)

Acked-by: Will Deacon 

Will


Re: [PATCH 14/18] iommu: remove DOMAIN_ATTR_NESTING

2021-03-30 Thread Will Deacon
On Tue, Mar 16, 2021 at 04:38:20PM +0100, Christoph Hellwig wrote:
> Use an explicit enable_nesting method instead.
> 
> Signed-off-by: Christoph Hellwig 
> Acked-by: Li Yang 
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 43 -
>  drivers/iommu/arm/arm-smmu/arm-smmu.c   | 30 +++---
>  drivers/iommu/intel/iommu.c | 31 +--
>  drivers/iommu/iommu.c   | 10 +
>  drivers/vfio/vfio_iommu_type1.c |  5 +--
>  include/linux/iommu.h   |  4 +-
>  6 files changed, 55 insertions(+), 68 deletions(-)

Acked-by: Will Deacon 

Will


Re: [PATCH 13/18] iommu: remove DOMAIN_ATTR_GEOMETRY

2021-03-30 Thread Will Deacon
On Tue, Mar 16, 2021 at 04:38:19PM +0100, Christoph Hellwig wrote:
> The geometry information can be trivially queried from the iommu_domain
> structure.
> 
> Signed-off-by: Christoph Hellwig 
> Acked-by: Li Yang 
> ---
>  drivers/iommu/iommu.c   | 20 +++-
>  drivers/vfio/vfio_iommu_type1.c | 26 --
>  drivers/vhost/vdpa.c| 10 +++---
>  include/linux/iommu.h   |  1 -
>  4 files changed, 18 insertions(+), 39 deletions(-)

Acked-by: Will Deacon 

Will


Re: [PATCH 12/18] iommu: remove DOMAIN_ATTR_PAGING

2021-03-30 Thread Will Deacon
On Tue, Mar 16, 2021 at 04:38:18PM +0100, Christoph Hellwig wrote:
> DOMAIN_ATTR_PAGING is never used.
> 
> Signed-off-by: Christoph Hellwig 
> Acked-by: Li Yang 
> ---
>  drivers/iommu/iommu.c | 5 -
>  include/linux/iommu.h | 1 -
>  2 files changed, 6 deletions(-)

Acked-by: Will Deacon 

Will


Re: [PATCH 11/18] iommu/fsl_pamu: remove the snoop_id field

2021-03-30 Thread Will Deacon
On Tue, Mar 16, 2021 at 04:38:17PM +0100, Christoph Hellwig wrote:
> The snoop_id is always set to ~(u32)0.
> 
> Signed-off-by: Christoph Hellwig 
> Acked-by: Li Yang 
> ---
>  drivers/iommu/fsl_pamu_domain.c | 5 ++---
>  drivers/iommu/fsl_pamu_domain.h | 1 -
>  2 files changed, 2 insertions(+), 4 deletions(-)

pamu_config_ppaace() takes quite a few useless parameters at this stage,
but anyway:

Acked-by: Will Deacon 

Do you know if this driver is actually useful? Once the complexity has been
stripped back, the stubs and default values really stand out.

Will


Re: [PATCH 10/18] iommu/fsl_pamu: enable the liodn when attaching a device

2021-03-30 Thread Will Deacon
On Tue, Mar 16, 2021 at 04:38:16PM +0100, Christoph Hellwig wrote:
> Instead of a separate call to enable all devices from the list, just
> enablde the liodn one the device is attached to the iommu domain.

(typos: "enablde" and "one" probably needs to be "once"?)

> This also remove the DOMAIN_ATTR_FSL_PAMU_ENABLE iommu_attr.
> 
> Signed-off-by: Christoph Hellwig 
> Acked-by: Li Yang 
> ---
>  drivers/iommu/fsl_pamu_domain.c | 47 ++---
>  drivers/iommu/fsl_pamu_domain.h | 10 --
>  drivers/soc/fsl/qbman/qman_portal.c | 11 ---
>  include/linux/iommu.h   |  1 -
>  4 files changed, 3 insertions(+), 66 deletions(-)

Acked-by: Will Deacon 

Will


Re: [PATCH 09/18] iommu/fsl_pamu: merge handle_attach_device into fsl_pamu_attach_device

2021-03-30 Thread Will Deacon
On Tue, Mar 16, 2021 at 04:38:15PM +0100, Christoph Hellwig wrote:
> No good reason to split this functionality over two functions.
> 
> Signed-off-by: Christoph Hellwig 
> Acked-by: Li Yang 
> ---
>  drivers/iommu/fsl_pamu_domain.c | 59 +++--
>  1 file changed, 20 insertions(+), 39 deletions(-)

Acked-by: Will Deacon 

Will


Re: [PATCH 08/18] iommu/fsl_pamu: merge pamu_set_liodn and map_liodn

2021-03-30 Thread Will Deacon
On Tue, Mar 16, 2021 at 04:38:14PM +0100, Christoph Hellwig wrote:
> Merge the two functions that configure the ppaace into a single coherent
> function.  I somehow doubt we need the two pamu_config_ppaace calls,
> but keep the existing behavior just to be on the safe side.
> 
> Signed-off-by: Christoph Hellwig 
> Acked-by: Li Yang 
> ---
>  drivers/iommu/fsl_pamu_domain.c | 65 +
>  1 file changed, 17 insertions(+), 48 deletions(-)
> 
> diff --git a/drivers/iommu/fsl_pamu_domain.c b/drivers/iommu/fsl_pamu_domain.c
> index 40eff4b7bc5d42..4a4944332674f7 100644
> --- a/drivers/iommu/fsl_pamu_domain.c
> +++ b/drivers/iommu/fsl_pamu_domain.c
> @@ -54,25 +54,6 @@ static int __init iommu_init_mempool(void)
>   return 0;
>  }
>  
> -/* Map the DMA window corresponding to the LIODN */
> -static int map_liodn(int liodn, struct fsl_dma_domain *dma_domain)
> -{
> - int ret;
> - struct iommu_domain_geometry *geom = &dma_domain->iommu_domain.geometry;
> - unsigned long flags;
> -
> - spin_lock_irqsave(&iommu_lock, flags);
> - ret = pamu_config_ppaace(liodn, geom->aperture_start,
> -  geom->aperture_end - 1, ~(u32)0,
> -  0, dma_domain->snoop_id, dma_domain->stash_id,
> -  PAACE_AP_PERMS_QUERY | PAACE_AP_PERMS_UPDATE);
> - spin_unlock_irqrestore(&iommu_lock, flags);
> - if (ret)
> - pr_debug("PAACE configuration failed for liodn %d\n", liodn);
> -
> - return ret;
> -}
> -
>  static int update_liodn_stash(int liodn, struct fsl_dma_domain *dma_domain,
> u32 val)
>  {
> @@ -94,11 +75,11 @@ static int update_liodn_stash(int liodn, struct 
> fsl_dma_domain *dma_domain,
>  }
>  
>  /* Set the geometry parameters for a LIODN */
> -static int pamu_set_liodn(int liodn, struct device *dev,
> -   struct fsl_dma_domain *dma_domain,
> -   struct iommu_domain_geometry *geom_attr)
> +static int pamu_set_liodn(struct fsl_dma_domain *dma_domain, struct device 
> *dev,
> +   int liodn)
>  {
> - phys_addr_t window_addr, window_size;
> + struct iommu_domain *domain = &dma_domain->iommu_domain;
> + struct iommu_domain_geometry *geom = &domain->geometry;
>   u32 omi_index = ~(u32)0;
>   unsigned long flags;
>   int ret;
> @@ -110,22 +91,25 @@ static int pamu_set_liodn(int liodn, struct device *dev,
>*/
>   get_ome_index(&omi_index, dev);
>  
> - window_addr = geom_attr->aperture_start;
> - window_size = geom_attr->aperture_end + 1;
> -
>   spin_lock_irqsave(&iommu_lock, flags);
>   ret = pamu_disable_liodn(liodn);
> - if (!ret)
> - ret = pamu_config_ppaace(liodn, window_addr, window_size, 
> omi_index,
> -  0, dma_domain->snoop_id,
> -  dma_domain->stash_id, 0);
> + if (ret)
> + goto out_unlock;
> + ret = pamu_config_ppaace(liodn, geom->aperture_start,
> +  geom->aperture_end - 1, omi_index, 0,
> +  dma_domain->snoop_id, dma_domain->stash_id, 0);
> + if (ret)
> + goto out_unlock;
> + ret = pamu_config_ppaace(liodn, geom->aperture_start,
> +  geom->aperture_end - 1, ~(u32)0,
> +  0, dma_domain->snoop_id, dma_domain->stash_id,
> +  PAACE_AP_PERMS_QUERY | PAACE_AP_PERMS_UPDATE);

There's more '+1' / '-1' confusion here with aperture_end which I'm not
managing to follow. What am I missing?

Will


Re: [PATCH 07/18] iommu/fsl_pamu: replace DOMAIN_ATTR_FSL_PAMU_STASH with a direct call

2021-03-30 Thread Will Deacon
On Tue, Mar 16, 2021 at 04:38:13PM +0100, Christoph Hellwig wrote:
> Add a fsl_pamu_configure_l1_stash API that qman_portal can call directly
> instead of indirecting through the iommu attr API.
> 
> Signed-off-by: Christoph Hellwig 
> Acked-by: Li Yang 
> ---
>  arch/powerpc/include/asm/fsl_pamu_stash.h | 12 +++-
>  drivers/iommu/fsl_pamu_domain.c   | 16 +++-
>  drivers/iommu/fsl_pamu_domain.h   |  2 --
>  drivers/soc/fsl/qbman/qman_portal.c   | 18 +++---
>  include/linux/iommu.h |  1 -
>  5 files changed, 9 insertions(+), 40 deletions(-)

Heh, this thing is so over-engineered.

Acked-by: Will Deacon 

Will


Re: [PATCH 06/18] iommu/fsl_pamu: remove ->domain_window_enable

2021-03-30 Thread Will Deacon
On Tue, Mar 16, 2021 at 04:38:12PM +0100, Christoph Hellwig wrote:
> The only thing that fsl_pamu_window_enable does for the current caller
> is to fill in the prot value in the only dma_window structure, and to
> propagate a few values from the iommu_domain_geometry structure into the
> dma_window.  Remove the dma_window entirely, hardcode the prot value and
> otherwise use the iommu_domain_geometry structure instead.
> 
> Remove the now unused ->domain_window_enable iommu method.
> 
> Signed-off-by: Christoph Hellwig 
> Acked-by: Li Yang 
> ---
>  drivers/iommu/fsl_pamu_domain.c | 182 +++-
>  drivers/iommu/fsl_pamu_domain.h |  17 ---
>  drivers/iommu/iommu.c   |  11 --
>  drivers/soc/fsl/qbman/qman_portal.c |   7 --
>  include/linux/iommu.h   |  17 ---
>  5 files changed, 14 insertions(+), 220 deletions(-)
> 
> diff --git a/drivers/iommu/fsl_pamu_domain.c b/drivers/iommu/fsl_pamu_domain.c
> index e6bdd38fc18409..fd2bc88b690465 100644
> --- a/drivers/iommu/fsl_pamu_domain.c
> +++ b/drivers/iommu/fsl_pamu_domain.c
> @@ -54,34 +54,18 @@ static int __init iommu_init_mempool(void)
>   return 0;
>  }
>  
> -static phys_addr_t get_phys_addr(struct fsl_dma_domain *dma_domain, 
> dma_addr_t iova)
> -{
> - struct dma_window *win_ptr = &dma_domain->win_arr[0];
> - struct iommu_domain_geometry *geom;
> -
> - geom = &dma_domain->iommu_domain.geometry;
> -
> - if (win_ptr->valid)
> - return win_ptr->paddr + (iova & (win_ptr->size - 1));
> -
> - return 0;
> -}
> -
>  /* Map the DMA window corresponding to the LIODN */
>  static int map_liodn(int liodn, struct fsl_dma_domain *dma_domain)
>  {
>   int ret;
> - struct dma_window *wnd = &dma_domain->win_arr[0];
> - phys_addr_t wnd_addr = dma_domain->iommu_domain.geometry.aperture_start;
> + struct iommu_domain_geometry *geom = &dma_domain->iommu_domain.geometry;
>   unsigned long flags;
>  
>   spin_lock_irqsave(&iommu_lock, flags);
> - ret = pamu_config_ppaace(liodn, wnd_addr,
> -  wnd->size,
> -  ~(u32)0,
> -  wnd->paddr >> PAMU_PAGE_SHIFT,
> -  dma_domain->snoop_id, dma_domain->stash_id,
> -  wnd->prot);
> + ret = pamu_config_ppaace(liodn, geom->aperture_start,
> +  geom->aperture_end - 1, ~(u32)0,

You're passing 'geom->aperture_end - 1' as the size here, but the old code
seemed to _add_ 1:

> -static int fsl_pamu_window_enable(struct iommu_domain *domain, u32 wnd_nr,
> -   phys_addr_t paddr, u64 size, int prot)
> -{
> - struct fsl_dma_domain *dma_domain = to_fsl_dma_domain(domain);
> - struct dma_window *wnd;
> - int pamu_prot = 0;
> - int ret;
> - unsigned long flags;
> - u64 win_size;
> -
> - if (prot & IOMMU_READ)
> - pamu_prot |= PAACE_AP_PERMS_QUERY;
> - if (prot & IOMMU_WRITE)
> - pamu_prot |= PAACE_AP_PERMS_UPDATE;
> -
> - spin_lock_irqsave(&dma_domain->domain_lock, flags);
> - if (wnd_nr > 0) {
> - pr_debug("Invalid window index\n");
> - spin_unlock_irqrestore(&dma_domain->domain_lock, flags);
> - return -EINVAL;
> - }
> -
> - win_size = (domain->geometry.aperture_end + 1) >> ilog2(1);

here ^^ when calculating the exclusive upper bound. In other words, I think
'1ULL << 36' used to get passed to pamu_config_ppaace(). Is that an
intentional change?

Will


Re: [PATCH 05/18] iommu/fsl_pamu: remove support for multiple windows

2021-03-30 Thread Will Deacon
On Tue, Mar 16, 2021 at 04:38:11PM +0100, Christoph Hellwig wrote:
> The only domains allocated forces use of a single window.  Remove all
> the code related to multiple window support, as well as the need for
> qman_portal to force a single window.
> 
> Remove the now unused DOMAIN_ATTR_WINDOWS iommu_attr.
> 
> Signed-off-by: Christoph Hellwig 
> Acked-by: Li Yang 
> ---
>  drivers/iommu/fsl_pamu.c| 264 +-
>  drivers/iommu/fsl_pamu.h|  10 +-
>  drivers/iommu/fsl_pamu_domain.c | 275 +---
>  drivers/iommu/fsl_pamu_domain.h |  12 +-
>  drivers/soc/fsl/qbman/qman_portal.c |   7 -
>  include/linux/iommu.h   |   1 -
>  6 files changed, 59 insertions(+), 510 deletions(-)

[...]

> + set_bf(ppaace->impl_attr, PAACE_IA_ATM, PAACE_ATM_WINDOW_XLATE);
> + ppaace->twbah = rpn >> 20;
> + set_bf(ppaace->win_bitfields, PAACE_WIN_TWBAL, rpn);
> + set_bf(ppaace->addr_bitfields, PAACE_AF_AP, prot);
> + set_bf(ppaace->impl_attr, PAACE_IA_WCE, 0);
> + set_bf(ppaace->addr_bitfields, PPAACE_AF_MW, 0);
>   mb();

(I wonder what on Earth that mb() is doing...)

> diff --git a/drivers/iommu/fsl_pamu_domain.h b/drivers/iommu/fsl_pamu_domain.h
> index 53d359d66fe577..b9236fb5a8f82e 100644
> --- a/drivers/iommu/fsl_pamu_domain.h
> +++ b/drivers/iommu/fsl_pamu_domain.h
> @@ -17,23 +17,13 @@ struct dma_window {
>  };
>  
>  struct fsl_dma_domain {
> - /*
> -  * Number of windows assocaited with this domain.
> -  * During domain initialization, it is set to the
> -  * the maximum number of subwindows allowed for a LIODN.
> -  * Minimum value for this is 1 indicating a single PAMU
> -  * window, without any sub windows. Value can be set/
> -  * queried by set_attr/get_attr API for DOMAIN_ATTR_WINDOWS.
> -  * Value can only be set once the geometry has been configured.
> -  */
> - u32 win_cnt;
>   /*
>* win_arr contains information of the configured
>* windows for a domain. This is allocated only
>* when the number of windows for the domain are
>* set.
>*/

The last part of this comment is now stale ^^

> - struct dma_window   *win_arr;
> + struct dma_window   win_arr[1];
>   /* list of devices associated with the domain */
>   struct list_headdevices;
>   /* dma_domain states:

Acked-by: Will Deacon 

Will


Re: [PATCH 04/18] iommu/fsl_pamu: merge iommu_alloc_dma_domain into fsl_pamu_domain_alloc

2021-03-30 Thread Will Deacon
On Tue, Mar 16, 2021 at 04:38:10PM +0100, Christoph Hellwig wrote:
> Keep the functionality to allocate the domain together.
> 
> Signed-off-by: Christoph Hellwig 
> Acked-by: Li Yang 
> ---
>  drivers/iommu/fsl_pamu_domain.c | 34 ++---
>  1 file changed, 10 insertions(+), 24 deletions(-)

Acked-by: Will Deacon 

Will


Re: [PATCH 03/18] iommu/fsl_pamu: remove support for setting DOMAIN_ATTR_GEOMETRY

2021-03-30 Thread Will Deacon
On Tue, Mar 16, 2021 at 04:38:09PM +0100, Christoph Hellwig wrote:
> The default geometry is the same as the one set by qman_port given
> that FSL_PAMU depends on having 64-bit physical and thus DMA addresses.
> 
> Remove the support to update the geometry and remove the now pointless
> geom_size field.
> 
> Signed-off-by: Christoph Hellwig 
> Acked-by: Li Yang 
> ---
>  drivers/iommu/fsl_pamu_domain.c | 55 +++--
>  drivers/iommu/fsl_pamu_domain.h |  6 
>  drivers/soc/fsl/qbman/qman_portal.c | 12 ---
>  3 files changed, 5 insertions(+), 68 deletions(-)

Took me a minute to track down the other magic '36' which ends up in
aperture_end, but I found it eventually so:

Acked-by: Will Deacon 

(It does make me wonder what all this glue was intended to be used for)

Will


Re: [PATCH 02/18] iommu/fsl_pamu: remove fsl_pamu_get_domain_attr

2021-03-30 Thread Will Deacon
On Tue, Mar 16, 2021 at 04:38:08PM +0100, Christoph Hellwig wrote:
> None of the values returned by this function are ever queried.  Also
> remove the DOMAIN_ATTR_FSL_PAMUV1 enum value that is not otherwise used.
> 
> Signed-off-by: Christoph Hellwig 
> Acked-by: Li Yang 
> ---
>  drivers/iommu/fsl_pamu_domain.c | 30 --
>  include/linux/iommu.h   |  4 
>  2 files changed, 34 deletions(-)

Acked-by: Will Deacon 

Will


Re: [PATCH 01/18] iommu: remove the unused domain_window_disable method

2021-03-30 Thread Will Deacon
On Tue, Mar 16, 2021 at 04:38:07PM +0100, Christoph Hellwig wrote:
> domain_window_disable is wired up by fsl_pamu, but never actually called.
> 
> Signed-off-by: Christoph Hellwig 
> Acked-by: Li Yang 
> ---
>  drivers/iommu/fsl_pamu_domain.c | 48 -
>  include/linux/iommu.h   |  2 --
>  2 files changed, 50 deletions(-)

Acked-by: Will Deacon 

Will


Re: [External] Re: [RFC PATCH] virtio-vsock: add description for datagram type

2021-03-30 Thread Stefan Hajnoczi
On Mon, Mar 29, 2021 at 04:22:28PM -0700, Jiang Wang . wrote:
> On Mon, Mar 29, 2021 at 2:26 AM Stefan Hajnoczi  wrote:
> >
> > On Fri, Mar 26, 2021 at 04:40:09PM -0700, Jiang Wang . wrote:
> > > I thought about this and discussed it with my colleague Cong Wang.
> > > One idea is to make current asynchronous send_pkt flow to be synchronous,
> > > then if the virtqueue is full, the function can return  ENOMEM all the 
> > > way back
> > > to the caller and the caller can check the return value of sendmsg
> > > and slow down when necessary.
> > >
> > > In the spec, we can put something like, if the virtqueue is full, the 
> > > caller
> > > should be notified with an error etc.
> > >
> > > In terms of implementation, that means we will remove the current
> > > send_pkt_work for both stream and dgram sockets. Currently, the
> > > code path uses RCU and a work queue, then grab a mutex in the
> > > work queue function. Since we cannot grab mutex when in rcu
> > > critical section, we have to change RCU to a normal reference
> > > counting mechanism. I think this is doable. The drawback is
> > > that the reference counting in general spends a little more
> > > cycles than the RCU, so there is a small price to pay. Another
> > > option is to use Sleepable RCU and remove the work queue.
> > >
> > > What do you guys think?
> >
> > I think the tx code path is like this because of reliable delivery.
> > Maybe a separate datagram rx/tx code path would be simpler?
> 
> I thought about this too.  dgram can have a separate rx/tx
> path from stream types. In this case, the the_virtio_vsock
> will still be shared because the virtqueues have to be in one
> structure. Then the_virtio_vsock will be protected by a rcu
> and a reference counting ( or a sleepable RCU ).
> 
> In vhost_vsock_dev_release, it will wait for both rcu and another
> one to be finished and then free the memory. I think this is
> doable. Let me know if there is a better way to do it.
> Btw, I think dgram tx code path will be quite different from
> stream, but dgram rx path will be similar to stream type.
> 
> > Take the datagram tx virtqueue lock, try to add the packet into the
> > virtqueue, and return -ENOBUFS if the virtqueue is full. Then use the
> > datagram socket's sndbuf accounting to prevent queuing up too many
> > packets. When a datagram tx virtqueue buffer is used by the device,
> > select queued packets for transmission.
> 
> I am not sure about the last sentence. In the new design, there will
> be no other queued packets for dgram (except those in the virtqueue).
> When dgram tx virtqueue is freed by the device, the driver side
> just needs to free some space. No need
> to trigger more transmission.

This approach drops packets when the virtqueue is full. In order to get
close to UDP semantics I think the following is necessary:

1. Packets are placed directly into the tx virtqueue when possible.
2. Packets are queued up to sndbuf bytes if the tx virtqueue is full.
3. When a full tx virtqueue gets some free space again there is an
   algorithm that selects from among the queued sockets.

This is how unreliable delivery can be made somewhat fair between competing
threads. I thought you were trying to achieve this (it's similar to what
UDP does)?

Stefan



Re: [virtio-comment] Re: [MASSMAIL KLMS] Re: [virtio-comment] [RFC PATCH v4 2/2] virtio-vsock: SOCK_SEQPACKET description

2021-03-30 Thread Stefan Hajnoczi
On Tue, Mar 30, 2021 at 09:15:39AM +0300, Arseny Krasnov wrote:
> On 30.03.2021 00:28, Stefano Garzarella wrote:
> > On Mon, Mar 29, 2021 at 08:33:27PM +0300, Arseny Krasnov wrote:
> >> On 29.03.2021 19:11, Stefan Hajnoczi wrote:
> >>> On Fri, Mar 26, 2021 at 12:02:50PM +0300, Arseny Krasnov wrote:
>  @@ -98,6 +102,10 @@ \subsection{Device Operation}\label{sec:Device Types 
>  / Socket Device / Device Op
>   #define VIRTIO_VSOCK_OP_CREDIT_UPDATE  6
>   /* Request the peer to send the credit info to us */
>   #define VIRTIO_VSOCK_OP_CREDIT_REQUEST 7
>  +/* Message begin for SOCK_SEQPACKET */
>  +#define VIRTIO_VSOCK_OP_SEQ_BEGIN  8
>  +/* Message end for SOCK_SEQPACKET */
>  +#define VIRTIO_VSOCK_OP_SEQ_END9
> >>> The struct virtio_vsock_hdr->flags field is le32 and currently unused.
> >>> Could 24 bits be used for a unique message id and 8 bits for flags? 1
> >>> flag bit could be used for end-of-message and the remaining 7 bits could
> >>> be reserved. That way SEQ_BEGIN and SEQ_END are not necessary.  
> >>> Pressure
> >>> on the virtqueue would be reduced and performance should be comparable
> >>> to SOCK_STREAM.
> >> Well, my first versions of SOCK_SEQPACKET implementation, worked
> >> something like this: i used flags field of header as length of whole
> >> message. I discussed it with Stefano Garzarella, and he told that it 
> >> will
> >> be better to use special "header" in packet's payload, to keep some
> >> SOCK_SEQPACKET specific data, instead of reusing packet's header
> >> fields.
> > IIRC in the first implementation SEQ_BEGIN was an empty message and we 
> > didn't added the msg_id yet. So since we needed to carry both id and 
> > total length, I suggested to use the payload to put these extra 
> > information.
> >
> > IIUC what Stefan is suggesting is a bit different and I think it should 
> > be cool to implement: we can remove the boundary packets, use only 8 
> > bits for the flags, and add a new field to reuse the 24 unused bits, 
> > maybe also 16 bits would be enough.
> > At that point we will only use the EOR flag to know the last packet.
> >
> > The main difference will be that the receiver will know the total size 
> > only when the last packet is received.
> >
> > Do you see any issue on that approach?
> 
> It will work, except we can't check that any packet of message,
> 
> except last(with EOR bit) was dropped, since receiver don't know
> 
> real length of message. If it is ok, then i can implement it.

The credit mechanism ensures that packets are not dropped, so it's not
necessary to check for this condition.

By the way, is a unique message ID needed? My understanding is:

If two messages are being sent on a socket at the same time either their
order is serialized (whichever message began first) or it is undefined
(whichever message completes first). I wonder if POSIX specifies this
and if Linux implements it (e.g. with AF_UNIX SOCK_SEQPACKET messages
that are multiple pages long and exceed sndbuf)?

Depending on these semantics maybe we don't need a unique message ID.
Instead the driver transmits messages sequentially. RW packets for
different messages on the same socket will never be interleaved.
Therefore the unique message ID is not needed and just the MSG_EOR flag
is enough to indicate message boundaries.

What do you think?

Stefan



Re: [PATCH v2 00/10] drm: Support simple-framebuffer devices and firmware fbs

2021-03-30 Thread Hans de Goede
Hi,

On 3/30/21 9:09 AM, Thomas Zimmermann wrote:
> Hi
> 
> Am 29.03.21 um 16:50 schrieb Hans de Goede:
>> Hi,
>>
>> On 3/29/21 2:31 PM, Thomas Zimmermann wrote:
>>> Hi
>>>
>>> Am 25.03.21 um 12:29 schrieb Hans de Goede:
 Hi,

 On 3/18/21 11:29 AM, Thomas Zimmermann wrote:
> This patchset adds support for simple-framebuffer platform devices and
> a handover mechanism for native drivers to take over control of the
> hardware.
>
> The new driver, called simpledrm, binds to a simple-framebuffer platform
> device. The kernel's boot code creates such devices for firmware-provided
> framebuffers, such as EFI-GOP or VESA. Typically the BIOS, UEFI or boot
> loader sets up the framebuffers. Description via device tree is also an
> option.
>
> Simpledrm is small enough to be linked into the kernel. The driver's main
> purpose is to provide graphical output during the early phases of the boot
> process, before the native DRM drivers are available. Native drivers are
> typically loaded from an initrd ram disk. Occasionally simpledrm can also
> serve as interim solution on graphics hardware without native DRM driver.
>
> So far distributions rely on fbdev drivers, such as efifb, vesafb or
> simplefb, for early-boot graphical output. However fbdev is deprecated and
> the drivers do not provide DRM interfaces for modern userspace.
>
> Patches 1 and 2 prepare the DRM format helpers for simpledrm.
>
> Patches 3 and 4 add a hand-over mechanism. Simpledrm acquires its
> framebuffer's I/O-memory range and provides a callback function to be
> removed by a native driver. The native driver will remove simpledrm before
> taking over the hardware. The removal is integrated into existing helpers,
> so drivers use it automatically.
>
> Patches 5 to 10 add the simpledrm driver. It's built on simple DRM helpers
> and SHMEM. It supports 16-bit, 24-bit and 32-bit RGB framebuffers. During
> pageflips, SHMEM buffers are copied into the framebuffer memory, similar
> to cirrus or mgag200. The code in patches 8 and 9 handles clocks and
> regulators. It's based on the simplefb drivers, but has been modified for
> DRM.

 Thank you for your work on this, this is very interesting.

> I've also been working on fastboot support (i.e., flicker-free booting).
> This requires state-readout from simpledrm via generic interfaces, as
> outlined in [1]. I do have some prototype code, but it will take a while
> to get this ready. Simpledrm will then support it.
>
> I've tested simpledrm with x86 EFI and VESA framebuffers, which both work
> reliably. The fbdev console and Weston work automatically. Xorg requires
> manual configuration of the device. Xorg's current modesetting driver does
> not work with both, platform and PCI device, for the same physical
> hardware. Once configured, X11 works. I looked into X11, but couldn't see
> an easy way of fixing the problem. With the push towards Wayland+Xwayland
> I expect the problem to become a non-issue soon. Additional testing has
> been reported at [2].
>
> One cosmetic issue is that simpledrm's device file is card0 and the
> native driver's device file is card1. After simpledrm has been kicked out,
> only card1 is left. This does not seem to be a practical problem however.
>
> TODO/IDEAS:
>
>  * provide deferred takeover

 I'm not sure what you mean with this ?  Currently deferred-takeover is
 handled in the fbcon code. Current flickerfree boot works like this
 (assuming a single LCD panel in a laptop):

 1. EFI/GOP sets up the framebuffer, draws a vendor logo
 2. The bootloader runs in silent mode and does not touch anything gfx 
 related
 3. kernel boots, with a loglevel of 3 so only CRIT/EMERG messages are shown
 4. efifb loads; and tells fbcon that a framebuffer is now available for it 
 to "bind"
  to. Since CONFIG_FRAMEBUFFER_CONSOLE_DEFERRED_TAKEOVER=y fbcon defers 
 taking over
  the console and leaves the dummy-console driver in place (unless 
 there have already
  been kernel messages logged, which there shouldn't because loglevel=3)
 5. i915 loads, reads out the hw state, compares this to the preferred-mode 
 for the
  panel which it would set, they match, nothing happens. i915 takes 
 ownership
  of the scanout-buffer set up by the GOP, but leaves it in place.
  i915 also removes the efifb /dev/fb0 and installs its own /dev/fb0 
 fbdev compat
  device, fbcon is notified of this, but is still deferred and leaves 
 the dummy
  console driver in place as console driver.
 6. Plymouth loads, allocates a new scan-out buffer at the panel's 
 preferred resolution,
  plymouth reads the vendor-logo through the 

Re: [virtio-comment] Re: [MASSMAIL KLMS] Re: [virtio-comment] [RFC PATCH v4 2/2] virtio-vsock: SOCK_SEQPACKET description

2021-03-30 Thread Stefano Garzarella

On Tue, Mar 30, 2021 at 09:15:39AM +0300, Arseny Krasnov wrote:


On 30.03.2021 00:28, Stefano Garzarella wrote:

On Mon, Mar 29, 2021 at 08:33:27PM +0300, Arseny Krasnov wrote:

On 29.03.2021 19:11, Stefan Hajnoczi wrote:

On Fri, Mar 26, 2021 at 12:02:50PM +0300, Arseny Krasnov wrote:

This adds description of SOCK_SEQPACKET socket type
support for virtio-vsock.

Signed-off-by: Arseny Krasnov 
---
 virtio-vsock.tex | 65 +++-
 1 file changed, 59 insertions(+), 6 deletions(-)

diff --git a/virtio-vsock.tex b/virtio-vsock.tex
index ad57f9d..c366de7 100644
--- a/virtio-vsock.tex
+++ b/virtio-vsock.tex
@@ -17,6 +17,10 @@ \subsection{Virtqueues}\label{sec:Device Types / Socket 
Device / Virtqueues}
 \subsection{Feature bits}\label{sec:Device Types / Socket Device / Feature 
bits}

 There are currently no feature bits defined for this device.

^ This line is now out of date :)

Ack

+\begin{description}
+\item VIRTIO_VSOCK_F_SEQPACKET (0) SOCK_SEQPACKET socket type is
+supported.
+\end{description}

 \subsection{Device configuration layout}\label{sec:Device Types / Socket 
Device / Device configuration layout}

@@ -98,6 +102,10 @@ \subsection{Device Operation}\label{sec:Device Types / 
Socket Device / Device Op
 #define VIRTIO_VSOCK_OP_CREDIT_UPDATE  6
 /* Request the peer to send the credit info to us */
 #define VIRTIO_VSOCK_OP_CREDIT_REQUEST 7
+/* Message begin for SOCK_SEQPACKET */
+#define VIRTIO_VSOCK_OP_SEQ_BEGIN  8
+/* Message end for SOCK_SEQPACKET */
+#define VIRTIO_VSOCK_OP_SEQ_END9

The struct virtio_vsock_hdr->flags field is le32 and currently unused.
Could 24 bits be used for a unique message id and 8 bits for flags? 1
flag bit could be used for end-of-message and the remaining 7 bits could
be reserved. That way SEQ_BEGIN and SEQ_END are not necessary.
Pressure
on the virtqueue would be reduced and performance should be comparable
to SOCK_STREAM.

Well, my first versions of SOCK_SEQPACKET implementation, worked
something like this: i used flags field of header as length of whole
message. I discussed it with Stefano Garzarella, and he told that it
will
be better to use special "header" in packet's payload, to keep some
SOCK_SEQPACKET specific data, instead of reusing packet's header
fields.

IIRC in the first implementation SEQ_BEGIN was an empty message and we
didn't added the msg_id yet. So since we needed to carry both id and
total length, I suggested to use the payload to put these extra
information.

IIUC what Stefan is suggesting is a bit different and I think it should
be cool to implement: we can remove the boundary packets, use only 8
bits for the flags, and add a new field to reuse the 24 unused bits,
maybe also 16 bits would be enough.
At that point we will only use the EOR flag to know the last packet.

The main difference will be that the receiver will know the total size
only when the last packet is received.

Do you see any issue on that approach?


It will work, except we can't check that any packet of message,

except last(with EOR bit) was dropped, since receiver don't know

real length of message. If it is ok, then i can implement it.



This is true, but where can a packet be lost?

The channel is lossless, so a packet can only be lost by the transmitter or 
the receiver, and only in critical cases, for example when the whole 
system has run out of memory. In that case we can't do much, except 
perhaps reset the connection.


Thanks,
Stefano



Re: [PATCH v2 00/10] drm: Support simple-framebuffer devices and firmware fbs

2021-03-30 Thread Thomas Zimmermann

Hi

Am 29.03.21 um 16:50 schrieb Hans de Goede:

Hi,

On 3/29/21 2:31 PM, Thomas Zimmermann wrote:

Hi

Am 25.03.21 um 12:29 schrieb Hans de Goede:

Hi,

On 3/18/21 11:29 AM, Thomas Zimmermann wrote:

This patchset adds support for simple-framebuffer platform devices and
a handover mechanism for native drivers to take over control of the
hardware.

The new driver, called simpledrm, binds to a simple-framebuffer platform
device. The kernel's boot code creates such devices for firmware-provided
framebuffers, such as EFI-GOP or VESA. Typically the BIOS, UEFI or boot
loader sets up the framebuffers. Description via device tree is also an
option.

Simpledrm is small enough to be linked into the kernel. The driver's main
purpose is to provide graphical output during the early phases of the boot
process, before the native DRM drivers are available. Native drivers are
typically loaded from an initrd ram disk. Occasionally simpledrm can also
serve as interim solution on graphics hardware without native DRM driver.

So far distributions rely on fbdev drivers, such as efifb, vesafb or
simplefb, for early-boot graphical output. However fbdev is deprecated and
the drivers do not provide DRM interfaces for modern userspace.

Patches 1 and 2 prepare the DRM format helpers for simpledrm.

Patches 3 and 4 add a hand-over mechanism. Simpledrm acquires its
framebuffer's I/O-memory range and provides a callback function to be
removed by a native driver. The native driver will remove simpledrm before
taking over the hardware. The removal is integrated into existing helpers,
so drivers use it automatically.

Patches 5 to 10 add the simpledrm driver. It's built on simple DRM helpers
and SHMEM. It supports 16-bit, 24-bit and 32-bit RGB framebuffers. During
pageflips, SHMEM buffers are copied into the framebuffer memory, similar
to cirrus or mgag200. The code in patches 8 and 9 handles clocks and
regulators. It's based on the simplefb drivers, but has been modified for
DRM.


Thank you for your work on this, this is very interesting.


I've also been working on fastboot support (i.e., flicker-free booting).
This requires state-readout from simpledrm via generic interfaces, as
outlined in [1]. I do have some prototype code, but it will take a while
to get this ready. Simpledrm will then support it.

I've tested simpledrm with x86 EFI and VESA framebuffers, which both work
reliably. The fbdev console and Weston work automatically. Xorg requires
manual configuration of the device. Xorg's current modesetting driver does
not work with both a platform and a PCI device for the same physical
hardware. Once configured, X11 works. I looked into X11, but couldn't see
an easy way of fixing the problem. With the push towards Wayland+Xwayland
I expect the problem to become a non-issue soon. Additional testing has
been reported at [2].

One cosmetic issue is that simpledrm's device file is card0 and the
native driver's device file is card1. After simpledrm has been kicked out,
only card1 is left. This does not seem to be a practical problem however.

TODO/IDEAS:

 * provide deferred takeover


I'm not sure what you mean by this? Currently deferred takeover is
handled in the fbcon code. The current flicker-free boot works like this
(assuming a single LCD panel in a laptop):

1. EFI/GOP sets up the framebuffer, draws a vendor logo.
2. The bootloader runs in silent mode and does not touch anything gfx related.
3. The kernel boots with a loglevel of 3, so only CRIT/EMERG messages are shown.
4. efifb loads and tells fbcon that a framebuffer is now available for it to
   "bind" to. Since CONFIG_FRAMEBUFFER_CONSOLE_DEFERRED_TAKEOVER=y, fbcon
   defers taking over the console and leaves the dummy-console driver in
   place (unless there have already been kernel messages logged, which there
   shouldn't be because loglevel=3).
5. i915 loads, reads out the hw state and compares it to the preferred mode
   for the panel which it would set; they match, so nothing happens. i915
   takes ownership of the scanout-buffer set up by the GOP, but leaves it in
   place. i915 also removes the efifb /dev/fb0 and installs its own /dev/fb0
   fbdev compat device; fbcon is notified of this, but is still deferred and
   leaves the dummy console driver in place as console driver.
6. Plymouth loads, allocates a new scanout-buffer at the panel's preferred
   resolution, reads the vendor logo through the BGRT ACPI interface and
   fills the scanout-buffer with the vendor logo + a spinner. Then Plymouth
   installs the new scanout-buffer on the crtc; this is done atomically
   during vsync, so the user sees no changes other than the spinner appearing
   (note the active VT is now in graphical mode).
7. From here on, avoiding flicker is a userspace problem.

AFAICT this should work fine with simplekms too, unless it clears the screen
to black when it binds.
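The kernel-side knobs in the sequence above boil down to a small fragment. This is a sketch of the relevant settings only, not a complete configuration:

```shell
# Kernel configuration (fragment): let fbcon defer taking over the console
# until the first console message, keeping the firmware logo on screen
# (step 4 above).
CONFIG_FRAMEBUFFER_CONSOLE_DEFERRED_TAKEOVER=y

# Kernel command line: keep the boot quiet so fbcon stays deferred
# (step 3 above):
#   loglevel=3 quiet
```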


I forgot to add the code that clears the screen, but that's the case here.

Instead of a