Re: [External] Re: [RFC PATCH] virtio-vsock: add description for datagram type
Hi Stefano, I checked dgram code again. I think assigning transport for each packet is doable. The dgram will be similar to stream to have two transports. If there is no other problem, I can submit a Linux kernel patch to support nested dgram on VMCI first. Then it will be easier for virtio vsock. Also, I don't think we need to put anything related to this multiple- transport support in the spec. Let me know if otherwise. Regards, Jiang On Tue, Mar 30, 2021 at 11:34 AM Jiang Wang . wrote: > > On Tue, Mar 30, 2021 at 8:32 AM Stefano Garzarella > wrote: > > > > Hi Jiang, > > > > On Fri, Mar 26, 2021 at 04:40:09PM -0700, Jiang Wang . wrote: > > >Hi Michael and Stefan, > > > > > >I thought about this and discussed it with my colleague Cong Wang. > > >One idea is to make current asynchronous send_pkt flow to be synchronous, > > >then if the virtqueue is full, the function can return ENOMEM all the way > > >back > > >to the caller and the caller can check the return value of sendmsg > > >and slow down when necessary. > > > > > >In the spec, we can put something like, if the virtqueue is full, the > > >caller > > >should be notified with an error etc. > > > > > >In terms of implementation, that means we will remove the current > > >send_pkt_work for both stream and dgram sockets. Currently, the > > >code path uses RCU and a work queue, then grab a mutex in the > > >work queue function. Since we cannot grab mutex when in rcu > > >critical section, we have to change RCU to a normal reference > > >counting mechanism. I think this is doable. The drawback is > > >that the reference counting in general spends a little more > > >cycles than the RCU, so there is a small price to pay. Another > > >option is to use Sleepable RCU and remove the work queue. > > > > > >What do you guys think? > > > > another thing that came to mind not related to the spec but to the Linux > > implementation, is the multi-transport support. > > When we discussed the initial proposals [1][2], we decided to take a > > shortcut for DGRAM, since the only transport with DGRAM support was > > vmci. So for now only a single transport with VSOCK_TRANSPORT_F_DGRAM > > set can be registered. > > > > Since also virtio-transport and vhost-transport will support DGRAM, we > > need to find a way to allow multiple transports that support DGRAM to be > > registered together to support nested VMs. > > > > Do you have already thought about how to solve this problem? > > > > We should definitely allow the registration of more transports with > > VSOCK_TRANSPORT_F_DGRAM, and dynamically choose which one to use when > > sending a packet. > > Got it. We briefly discussed multiple dgram transport > support in my old email thread. At that time, I was thinking > maybe check for transport for each packet. I did not spend more > time on that part after that. I will think about it more and get > back to you. Thanks > > > Thanks, > > Stefano > > > > [1] https://www.spinics.net/lists/netdev/msg570274.html > > [2] https://www.spinics.net/lists/netdev/msg575792.html > > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [External] Re: [RFC PATCH] virtio-vsock: add description for datagram type
On Tue, Mar 30, 2021 at 8:32 AM Stefano Garzarella wrote: > > Hi Jiang, > > On Fri, Mar 26, 2021 at 04:40:09PM -0700, Jiang Wang . wrote: > >Hi Michael and Stefan, > > > >I thought about this and discussed it with my colleague Cong Wang. > >One idea is to make current asynchronous send_pkt flow to be synchronous, > >then if the virtqueue is full, the function can return ENOMEM all the way > >back > >to the caller and the caller can check the return value of sendmsg > >and slow down when necessary. > > > >In the spec, we can put something like, if the virtqueue is full, the caller > >should be notified with an error etc. > > > >In terms of implementation, that means we will remove the current > >send_pkt_work for both stream and dgram sockets. Currently, the > >code path uses RCU and a work queue, then grab a mutex in the > >work queue function. Since we cannot grab mutex when in rcu > >critical section, we have to change RCU to a normal reference > >counting mechanism. I think this is doable. The drawback is > >that the reference counting in general spends a little more > >cycles than the RCU, so there is a small price to pay. Another > >option is to use Sleepable RCU and remove the work queue. > > > >What do you guys think? > > another thing that came to mind not related to the spec but to the Linux > implementation, is the multi-transport support. > When we discussed the initial proposals [1][2], we decided to take a > shortcut for DGRAM, since the only transport with DGRAM support was > vmci. So for now only a single transport with VSOCK_TRANSPORT_F_DGRAM > set can be registered. > > Since also virtio-transport and vhost-transport will support DGRAM, we > need to find a way to allow multiple transports that support DGRAM to be > registered together to support nested VMs. > > Do you have already thought about how to solve this problem? > > We should definitely allow the registration of more transports with > VSOCK_TRANSPORT_F_DGRAM, and dynamically choose which one to use when > sending a packet. Got it. We briefly discussed multiple dgram transport support in my old email thread. At that time, I was thinking maybe check for transport for each packet. I did not spend more time on that part after that. I will think about it more and get back to you. Thanks > Thanks, > Stefano > > [1] https://www.spinics.net/lists/netdev/msg570274.html > [2] https://www.spinics.net/lists/netdev/msg575792.html > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [External] Re: [RFC PATCH] virtio-vsock: add description for datagram type
On Tue, Mar 30, 2021 at 3:42 AM Stefan Hajnoczi wrote: > > On Mon, Mar 29, 2021 at 04:22:28PM -0700, Jiang Wang . wrote: > > On Mon, Mar 29, 2021 at 2:26 AM Stefan Hajnoczi wrote: > > > > > > On Fri, Mar 26, 2021 at 04:40:09PM -0700, Jiang Wang . wrote: > > > > I thought about this and discussed it with my colleague Cong Wang. > > > > One idea is to make current asynchronous send_pkt flow to be > > > > synchronous, > > > > then if the virtqueue is full, the function can return ENOMEM all the > > > > way back > > > > to the caller and the caller can check the return value of sendmsg > > > > and slow down when necessary. > > > > > > > > In the spec, we can put something like, if the virtqueue is full, the > > > > caller > > > > should be notified with an error etc. > > > > > > > > In terms of implementation, that means we will remove the current > > > > send_pkt_work for both stream and dgram sockets. Currently, the > > > > code path uses RCU and a work queue, then grab a mutex in the > > > > work queue function. Since we cannot grab mutex when in rcu > > > > critical section, we have to change RCU to a normal reference > > > > counting mechanism. I think this is doable. The drawback is > > > > that the reference counting in general spends a little more > > > > cycles than the RCU, so there is a small price to pay. Another > > > > option is to use Sleepable RCU and remove the work queue. > > > > > > > > What do you guys think? > > > > > > I think the tx code path is like this because of reliable delivery. > > > Maybe a separate datagram rx/tx code path would be simpler? > > > > I thought about this too. dgram can have a separate rx/tx > > path from stream types. In this case, the the_virtio_vsock > > will still be shared because the virtqueues have to be in one > > structure. Then the_virtio_vsock will be protected by a rcu > > and a reference counting ( or a sleepable RCU ). > > > > In vhost_vsock_dev_release, it will wait for both rcu and another > > one to be finished and then free the memory. I think this is > > doable. Let me know if there is a better way to do it. > > Btw, I think dgram tx code path will be quite different from > > stream, but dgram rx path will be similar to stream type. > > > > > Take the datagram tx virtqueue lock, try to add the packet into the > > > virtqueue, and return -ENOBUFS if the virtqueue is full. Then use the > > > datagram socket's sndbuf accounting to prevent queuing up too many > > > packets. When a datagram tx virtqueue buffer is used by the device, > > > select queued packets for transmission. > > > > I am not sure about the last sentence. In the new design, there will > > be no other queued packets for dgram (except those in the virtqueue). > > When dgram tx virtqueue is freed by the device, the driver side > > just needs to free some space. No need > > to trigger more transmission. > > This approach drops packets when the virtqueue is full. In order to get > close to UDP semantics I think the following is necessary: > > 1. Packets are placed directly into the tx virtqueue when possible. > 2. Packets are queued up to sndbuf bytes if the tx virtqueue is full. > 3. When a full tx virtqueue gets some free space again there is an >algorithm that selects from among the queued sockets. > > This is how unreliably delivery can be somewhat fair between competing > threads. I thought you were trying to achieve this (it's similar to what > UDP does)? I see. I was thinking something different without an extra queue. I think your approach is better. For step 3, I think I will just start with a simple FIFO algorithm first. > Stefan ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 16/18] iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE
On 2021-03-30 14:58, Will Deacon wrote: On Tue, Mar 30, 2021 at 02:19:38PM +0100, Robin Murphy wrote: On 2021-03-30 14:11, Will Deacon wrote: On Tue, Mar 16, 2021 at 04:38:22PM +0100, Christoph Hellwig wrote: From: Robin Murphy Instead make the global iommu_dma_strict paramete in iommu.c canonical by exporting helpers to get and set it and use those directly in the drivers. This make sure that the iommu.strict parameter also works for the AMD and Intel IOMMU drivers on x86. As those default to lazy flushing a new IOMMU_CMD_LINE_STRICT is used to turn the value into a tristate to represent the default if not overriden by an explicit parameter. Signed-off-by: Robin Murphy . [ported on top of the other iommu_attr changes and added a few small missing bits] Signed-off-by: Christoph Hellwig --- drivers/iommu/amd/iommu.c | 23 +--- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 50 +--- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 1 - drivers/iommu/arm/arm-smmu/arm-smmu.c | 27 + drivers/iommu/dma-iommu.c | 9 +-- drivers/iommu/intel/iommu.c | 64 - drivers/iommu/iommu.c | 27 ++--- include/linux/iommu.h | 4 +- 8 files changed, 40 insertions(+), 165 deletions(-) I really like this cleanup, but I can't help wonder if it's going in the wrong direction. With SoCs often having multiple IOMMU instances and a distinction between "trusted" and "untrusted" devices, then having the flush-queue enabled on a per-IOMMU or per-domain basis doesn't sound unreasonable to me, but this change makes it a global property. The intent here was just to streamline the existing behaviour of stuffing a global property into a domain attribute then pulling it out again in the illusion that it was in any way per-domain. We're still checking dev_is_untrusted() before making an actual decision, and it's not like we can't add more factors at that point if we want to. Like I say, the cleanup is great. I'm just wondering whether there's a better way to express the complicated logic to decide whether or not to use the flush queue than what we end up with: if (!cookie->fq_domain && (!dev || !dev_is_untrusted(dev)) && domain->ops->flush_iotlb_all && !iommu_get_dma_strict()) which is mixing up globals, device properties and domain properties. The result is that the driver code ends up just using the global to determine whether or not to pass IO_PGTABLE_QUIRK_NON_STRICT to the page-table code, which is a departure from the current way of doing things. But previously, SMMU only ever saw the global policy piped through the domain attribute by iommu_group_alloc_default_domain(), so there's no functional change there. Obviously some of the above checks could be factored out into some kind of iommu_use_flush_queue() helper that IOMMU drivers can also call if they need to keep in sync. Or maybe we just allow iommu-dma to set IO_PGTABLE_QUIRK_NON_STRICT directly via iommu_set_pgtable_quirks() if we're treating that as a generic thing now. For example, see the recent patch from Lu Baolu: https://lore.kernel.org/r/20210225061454.2864009-1-baolu...@linux.intel.com Erm, this patch is based on that one, it's right there in the context :/ Ah, sorry, I didn't spot that! I was just trying to illustrate that this is per-device. Sure, I understand - and I'm just trying to bang home that despite appearances it's never actually been treated as such for SMMU, so anything that's wrong after this change was already wrong before. Robin. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [External] Re: [RFC PATCH] virtio-vsock: add description for datagram type
Hi Jiang, On Fri, Mar 26, 2021 at 04:40:09PM -0700, Jiang Wang . wrote: Hi Michael and Stefan, I thought about this and discussed it with my colleague Cong Wang. One idea is to make current asynchronous send_pkt flow to be synchronous, then if the virtqueue is full, the function can return ENOMEM all the way back to the caller and the caller can check the return value of sendmsg and slow down when necessary. In the spec, we can put something like, if the virtqueue is full, the caller should be notified with an error etc. In terms of implementation, that means we will remove the current send_pkt_work for both stream and dgram sockets. Currently, the code path uses RCU and a work queue, then grab a mutex in the work queue function. Since we cannot grab mutex when in rcu critical section, we have to change RCU to a normal reference counting mechanism. I think this is doable. The drawback is that the reference counting in general spends a little more cycles than the RCU, so there is a small price to pay. Another option is to use Sleepable RCU and remove the work queue. What do you guys think? another thing that came to mind not related to the spec but to the Linux implementation, is the multi-transport support. When we discussed the initial proposals [1][2], we decided to take a shortcut for DGRAM, since the only transport with DGRAM support was vmci. So for now only a single transport with VSOCK_TRANSPORT_F_DGRAM set can be registered. Since also virtio-transport and vhost-transport will support DGRAM, we need to find a way to allow multiple transports that support DGRAM to be registered together to support nested VMs. Do you have already thought about how to solve this problem? We should definitely allow the registration of more transports with VSOCK_TRANSPORT_F_DGRAM, and dynamically choose which one to use when sending a packet. Thanks, Stefano [1] https://www.spinics.net/lists/netdev/msg570274.html [2] https://www.spinics.net/lists/netdev/msg575792.html ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 16/18] iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE
On Tue, Mar 30, 2021 at 02:19:38PM +0100, Robin Murphy wrote: > On 2021-03-30 14:11, Will Deacon wrote: > > On Tue, Mar 16, 2021 at 04:38:22PM +0100, Christoph Hellwig wrote: > > > From: Robin Murphy > > > > > > Instead make the global iommu_dma_strict paramete in iommu.c canonical by > > > exporting helpers to get and set it and use those directly in the drivers. > > > > > > This make sure that the iommu.strict parameter also works for the AMD and > > > Intel IOMMU drivers on x86. As those default to lazy flushing a new > > > IOMMU_CMD_LINE_STRICT is used to turn the value into a tristate to > > > represent the default if not overriden by an explicit parameter. > > > > > > Signed-off-by: Robin Murphy . > > > [ported on top of the other iommu_attr changes and added a few small > > > missing bits] > > > Signed-off-by: Christoph Hellwig > > > --- > > > drivers/iommu/amd/iommu.c | 23 +--- > > > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 50 +--- > > > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 1 - > > > drivers/iommu/arm/arm-smmu/arm-smmu.c | 27 + > > > drivers/iommu/dma-iommu.c | 9 +-- > > > drivers/iommu/intel/iommu.c | 64 - > > > drivers/iommu/iommu.c | 27 ++--- > > > include/linux/iommu.h | 4 +- > > > 8 files changed, 40 insertions(+), 165 deletions(-) > > > > I really like this cleanup, but I can't help wonder if it's going in the > > wrong direction. With SoCs often having multiple IOMMU instances and a > > distinction between "trusted" and "untrusted" devices, then having the > > flush-queue enabled on a per-IOMMU or per-domain basis doesn't sound > > unreasonable to me, but this change makes it a global property. > > The intent here was just to streamline the existing behaviour of stuffing a > global property into a domain attribute then pulling it out again in the > illusion that it was in any way per-domain. We're still checking > dev_is_untrusted() before making an actual decision, and it's not like we > can't add more factors at that point if we want to. Like I say, the cleanup is great. I'm just wondering whether there's a better way to express the complicated logic to decide whether or not to use the flush queue than what we end up with: if (!cookie->fq_domain && (!dev || !dev_is_untrusted(dev)) && domain->ops->flush_iotlb_all && !iommu_get_dma_strict()) which is mixing up globals, device properties and domain properties. The result is that the driver code ends up just using the global to determine whether or not to pass IO_PGTABLE_QUIRK_NON_STRICT to the page-table code, which is a departure from the current way of doing things. > > For example, see the recent patch from Lu Baolu: > > > > https://lore.kernel.org/r/20210225061454.2864009-1-baolu...@linux.intel.com > > Erm, this patch is based on that one, it's right there in the context :/ Ah, sorry, I didn't spot that! I was just trying to illustrate that this is per-device. Will ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [virtio-comment] Re: [MASSMAIL KLMS] Re: [virtio-comment] [RFC PATCH v4 2/2] virtio-vsock: SOCK_SEQPACKET description
On Tue, Mar 30, 2021 at 12:50:06PM +0300, Arseny Krasnov wrote: > > On 30.03.2021 11:55, Stefan Hajnoczi wrote: > > On Tue, Mar 30, 2021 at 09:15:39AM +0300, Arseny Krasnov wrote: > >> On 30.03.2021 00:28, Stefano Garzarella wrote: > >>> On Mon, Mar 29, 2021 at 08:33:27PM +0300, Arseny Krasnov wrote: > On 29.03.2021 19:11, Stefan Hajnoczi wrote: > > On Fri, Mar 26, 2021 at 12:02:50PM +0300, Arseny Krasnov wrote: > >> @@ -98,6 +102,10 @@ \subsection{Device Operation}\label{sec:Device > >> Types / Socket Device / Device Op > >> #define VIRTIO_VSOCK_OP_CREDIT_UPDATE 6 > >> /* Request the peer to send the credit info to us */ > >> #define VIRTIO_VSOCK_OP_CREDIT_REQUEST 7 > >> +/* Message begin for SOCK_SEQPACKET */ > >> +#define VIRTIO_VSOCK_OP_SEQ_BEGIN 8 > >> +/* Message end for SOCK_SEQPACKET */ > >> +#define VIRTIO_VSOCK_OP_SEQ_END9 > > The struct virtio_vsock_hdr->flags field is le32 and currently unused. > > Could 24 bits be used for a unique message id and 8 bits for flags? 1 > > flag bit could be used for end-of-message and the remaining 7 bits could > > be reserved. That way SEQ_BEGIN and SEQ_END are not necessary. > > Pressure > > on the virtqueue would be reduced and performance should be comparable > > to SOCK_STREAM. > Well, my first versions of SOCK_SEQPACKET implementation, worked > something like this: i used flags field of header as length of whole > message. I discussed it with Stefano Garzarella, and he told that it > will > be better to use special "header" in packet's payload, to keep some > SOCK_SEQPACKET specific data, instead of reusing packet's header > fields. > >>> IIRC in the first implementation SEQ_BEGIN was an empty message and we > >>> didn't added the msg_id yet. So since we needed to carry both id and > >>> total length, I suggested to use the payload to put these extra > >>> information. > >>> > >>> IIUC what Stefan is suggesting is a bit different and I think it should > >>> be cool to implement: we can remove the boundary packets, use only 8 > >>> bits for the flags, and add a new field to reuse the 24 unused bits, > >>> maybe also 16 bits would be enough. > >>> At that point we will only use the EOR flag to know the last packet. > >>> > >>> The main difference will be that the receiver will know the total size > >>> only when the last packet is received. > >>> > >>> Do you see any issue on that approach? > >> It will work, except we can't check that any packet of message, > >> > >> except last(with EOR bit) was dropped, since receiver don't know > >> > >> real length of message. If it is ok, then i can implement it. > > The credit mechanism ensures that packets are not dropped, so it's not > > necessary to check for this condition. > > > > By the way, is a unique message ID needed? My understanding is: > > > > If two messages are being sent on a socket at the same time either their > > order is serialized (whichever message began first) or it is undefined > > (whichever message completes first). > > If we are talking about case, when two threads writes to one socket at the > same time, > > in Linux it is possible that two message will interleave(for vsock). But as i > know, for example > > when TCP socket is used, both 'write()' calls will be serialized. May be it > is bug of vsock: when > > first writer goes out of space, it will sleep. Then second writer tries to > send something, but > > as free space is over, it will sleep too. Then, credit update is received > from peer. Both sender's > > will be woken up, but sender which grab socket lock first will continue to > send it's message. > > So may be we can add something like semaphore to new/vmw_vsock/af_vsock.c > which will > > serialize two 'write()' calls: second sender enters 'write()' path, only when > first left this path. > > My implementation doesn't care about that, because i wanted to add semaphore > later, by another > > patch. Yes, that is definitely an issue that the driver needs to take care of if we don't have unique message IDs. Thanks for explaining! Stefan signature.asc Description: PGP signature ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 16/18] iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE
On 2021-03-30 14:11, Will Deacon wrote: On Tue, Mar 16, 2021 at 04:38:22PM +0100, Christoph Hellwig wrote: From: Robin Murphy Instead make the global iommu_dma_strict paramete in iommu.c canonical by exporting helpers to get and set it and use those directly in the drivers. This make sure that the iommu.strict parameter also works for the AMD and Intel IOMMU drivers on x86. As those default to lazy flushing a new IOMMU_CMD_LINE_STRICT is used to turn the value into a tristate to represent the default if not overriden by an explicit parameter. Signed-off-by: Robin Murphy . [ported on top of the other iommu_attr changes and added a few small missing bits] Signed-off-by: Christoph Hellwig --- drivers/iommu/amd/iommu.c | 23 +--- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 50 +--- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 1 - drivers/iommu/arm/arm-smmu/arm-smmu.c | 27 + drivers/iommu/dma-iommu.c | 9 +-- drivers/iommu/intel/iommu.c | 64 - drivers/iommu/iommu.c | 27 ++--- include/linux/iommu.h | 4 +- 8 files changed, 40 insertions(+), 165 deletions(-) I really like this cleanup, but I can't help wonder if it's going in the wrong direction. With SoCs often having multiple IOMMU instances and a distinction between "trusted" and "untrusted" devices, then having the flush-queue enabled on a per-IOMMU or per-domain basis doesn't sound unreasonable to me, but this change makes it a global property. The intent here was just to streamline the existing behaviour of stuffing a global property into a domain attribute then pulling it out again in the illusion that it was in any way per-domain. We're still checking dev_is_untrusted() before making an actual decision, and it's not like we can't add more factors at that point if we want to. For example, see the recent patch from Lu Baolu: https://lore.kernel.org/r/20210225061454.2864009-1-baolu...@linux.intel.com Erm, this patch is based on that one, it's right there in the context :/ Thanks, Robin. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 18/18] iommu: remove iommu_domain_{get,set}_attr
On Tue, Mar 16, 2021 at 04:38:24PM +0100, Christoph Hellwig wrote: > Remove the now unused iommu attr infrastructure. > > Signed-off-by: Christoph Hellwig > --- > drivers/iommu/iommu.c | 26 -- > include/linux/iommu.h | 36 > 2 files changed, 62 deletions(-) Acked-by: Will Deacon Will ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 17/18] iommu: remove DOMAIN_ATTR_IO_PGTABLE_CFG
On Tue, Mar 16, 2021 at 04:38:23PM +0100, Christoph Hellwig wrote: > Use an explicit set_pgtable_quirks method instead that just passes > the actual quirk bitmask instead. > > Signed-off-by: Christoph Hellwig > Acked-by: Li Yang > --- > drivers/gpu/drm/msm/adreno/adreno_gpu.c | 5 +- > drivers/iommu/arm/arm-smmu/arm-smmu.c | 64 + > drivers/iommu/arm/arm-smmu/arm-smmu.h | 2 +- > drivers/iommu/iommu.c | 11 + > include/linux/io-pgtable.h | 4 -- > include/linux/iommu.h | 12 - > 6 files changed, 35 insertions(+), 63 deletions(-) I'm fine with this for now, although there has been talk about passing things other than boolean flags as page-table quirks. We can cross that bridge when we get there, so: Acked-by: Will Deacon Will ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 16/18] iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE
On Tue, Mar 16, 2021 at 04:38:22PM +0100, Christoph Hellwig wrote: > From: Robin Murphy > > Instead make the global iommu_dma_strict paramete in iommu.c canonical by > exporting helpers to get and set it and use those directly in the drivers. > > This make sure that the iommu.strict parameter also works for the AMD and > Intel IOMMU drivers on x86. As those default to lazy flushing a new > IOMMU_CMD_LINE_STRICT is used to turn the value into a tristate to > represent the default if not overriden by an explicit parameter. > > Signed-off-by: Robin Murphy . > [ported on top of the other iommu_attr changes and added a few small > missing bits] > Signed-off-by: Christoph Hellwig > --- > drivers/iommu/amd/iommu.c | 23 +--- > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 50 +--- > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 1 - > drivers/iommu/arm/arm-smmu/arm-smmu.c | 27 + > drivers/iommu/dma-iommu.c | 9 +-- > drivers/iommu/intel/iommu.c | 64 - > drivers/iommu/iommu.c | 27 ++--- > include/linux/iommu.h | 4 +- > 8 files changed, 40 insertions(+), 165 deletions(-) I really like this cleanup, but I can't help wonder if it's going in the wrong direction. With SoCs often having multiple IOMMU instances and a distinction between "trusted" and "untrusted" devices, then having the flush-queue enabled on a per-IOMMU or per-domain basis doesn't sound unreasonable to me, but this change makes it a global property. For example, see the recent patch from Lu Baolu: https://lore.kernel.org/r/20210225061454.2864009-1-baolu...@linux.intel.com Will ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 15/18] iommu: remove iommu_set_cmd_line_dma_api and iommu_cmd_line_dma_api
On Tue, Mar 16, 2021 at 04:38:21PM +0100, Christoph Hellwig wrote: > Don't obsfucate the trivial bit flag check. > > Signed-off-by: Christoph Hellwig > --- > drivers/iommu/iommu.c | 23 +-- > 1 file changed, 5 insertions(+), 18 deletions(-) Acked-by: Will Deacon Will ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 14/18] iommu: remove DOMAIN_ATTR_NESTING
On Tue, Mar 16, 2021 at 04:38:20PM +0100, Christoph Hellwig wrote: > Use an explicit enable_nesting method instead. > > Signed-off-by: Christoph Hellwig > Acked-by: Li Yang > --- > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 43 - > drivers/iommu/arm/arm-smmu/arm-smmu.c | 30 +++--- > drivers/iommu/intel/iommu.c | 31 +-- > drivers/iommu/iommu.c | 10 + > drivers/vfio/vfio_iommu_type1.c | 5 +-- > include/linux/iommu.h | 4 +- > 6 files changed, 55 insertions(+), 68 deletions(-) Acked-by: Will Deacon Will ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 13/18] iommu: remove DOMAIN_ATTR_GEOMETRY
On Tue, Mar 16, 2021 at 04:38:19PM +0100, Christoph Hellwig wrote: > The geometry information can be trivially queried from the iommu_domain > struture. > > Signed-off-by: Christoph Hellwig > Acked-by: Li Yang > --- > drivers/iommu/iommu.c | 20 +++- > drivers/vfio/vfio_iommu_type1.c | 26 -- > drivers/vhost/vdpa.c| 10 +++--- > include/linux/iommu.h | 1 - > 4 files changed, 18 insertions(+), 39 deletions(-) Acked-by: Will Deacon Will ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 12/18] iommu: remove DOMAIN_ATTR_PAGING
On Tue, Mar 16, 2021 at 04:38:18PM +0100, Christoph Hellwig wrote: > DOMAIN_ATTR_PAGING is never used. > > Signed-off-by: Christoph Hellwig > Acked-by: Li Yang > --- > drivers/iommu/iommu.c | 5 - > include/linux/iommu.h | 1 - > 2 files changed, 6 deletions(-) Acked-by: Will Deacon Will ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 11/18] iommu/fsl_pamu: remove the snoop_id field
On Tue, Mar 16, 2021 at 04:38:17PM +0100, Christoph Hellwig wrote: > The snoop_id is always set to ~(u32)0. > > Signed-off-by: Christoph Hellwig > Acked-by: Li Yang > --- > drivers/iommu/fsl_pamu_domain.c | 5 ++--- > drivers/iommu/fsl_pamu_domain.h | 1 - > 2 files changed, 2 insertions(+), 4 deletions(-) pamu_config_ppaace() takes quite a few useless parameters at this stage, but anyway: Acked-by: Will Deacon Do you know if this driver is actually useful? Once the complexity has been stripped back, the stubs and default values really stand out. Will ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 10/18] iommu/fsl_pamu: enable the liodn when attaching a device
On Tue, Mar 16, 2021 at 04:38:16PM +0100, Christoph Hellwig wrote: > Instead of a separate call to enable all devices from the list, just > enablde the liodn one the device is attached to the iommu domain. (typos: "enablde" and "one" probably needs to be "once"?) > This also remove the DOMAIN_ATTR_FSL_PAMU_ENABLE iommu_attr. > > Signed-off-by: Christoph Hellwig > Acked-by: Li Yang > --- > drivers/iommu/fsl_pamu_domain.c | 47 ++--- > drivers/iommu/fsl_pamu_domain.h | 10 -- > drivers/soc/fsl/qbman/qman_portal.c | 11 --- > include/linux/iommu.h | 1 - > 4 files changed, 3 insertions(+), 66 deletions(-) Acked-by: Will Deacon Will ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 09/18] iommu/fsl_pamu: merge handle_attach_device into fsl_pamu_attach_device
On Tue, Mar 16, 2021 at 04:38:15PM +0100, Christoph Hellwig wrote: > No good reason to split this functionality over two functions. > > Signed-off-by: Christoph Hellwig > Acked-by: Li Yang > --- > drivers/iommu/fsl_pamu_domain.c | 59 +++-- > 1 file changed, 20 insertions(+), 39 deletions(-) Acked-by: Will Deacon Will ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 08/18] iommu/fsl_pamu: merge pamu_set_liodn and map_liodn
On Tue, Mar 16, 2021 at 04:38:14PM +0100, Christoph Hellwig wrote: > Merge the two fuctions that configure the ppaace into a single coherent > function. I somehow doubt we need the two pamu_config_ppaace calls, > but keep the existing behavior just to be on the safe side. > > Signed-off-by: Christoph Hellwig > Acked-by: Li Yang > --- > drivers/iommu/fsl_pamu_domain.c | 65 + > 1 file changed, 17 insertions(+), 48 deletions(-) > > diff --git a/drivers/iommu/fsl_pamu_domain.c b/drivers/iommu/fsl_pamu_domain.c > index 40eff4b7bc5d42..4a4944332674f7 100644 > --- a/drivers/iommu/fsl_pamu_domain.c > +++ b/drivers/iommu/fsl_pamu_domain.c > @@ -54,25 +54,6 @@ static int __init iommu_init_mempool(void) > return 0; > } > > -/* Map the DMA window corresponding to the LIODN */ > -static int map_liodn(int liodn, struct fsl_dma_domain *dma_domain) > -{ > - int ret; > - struct iommu_domain_geometry *geom = _domain->iommu_domain.geometry; > - unsigned long flags; > - > - spin_lock_irqsave(_lock, flags); > - ret = pamu_config_ppaace(liodn, geom->aperture_start, > - geom->aperture_end - 1, ~(u32)0, > - 0, dma_domain->snoop_id, dma_domain->stash_id, > - PAACE_AP_PERMS_QUERY | PAACE_AP_PERMS_UPDATE); > - spin_unlock_irqrestore(_lock, flags); > - if (ret) > - pr_debug("PAACE configuration failed for liodn %d\n", liodn); > - > - return ret; > -} > - > static int update_liodn_stash(int liodn, struct fsl_dma_domain *dma_domain, > u32 val) > { > @@ -94,11 +75,11 @@ static int update_liodn_stash(int liodn, struct > fsl_dma_domain *dma_domain, > } > > /* Set the geometry parameters for a LIODN */ > -static int pamu_set_liodn(int liodn, struct device *dev, > - struct fsl_dma_domain *dma_domain, > - struct iommu_domain_geometry *geom_attr) > +static int pamu_set_liodn(struct fsl_dma_domain *dma_domain, struct device > *dev, > + int liodn) > { > - phys_addr_t window_addr, window_size; > + struct iommu_domain *domain = _domain->iommu_domain; > + struct iommu_domain_geometry *geom = >geometry; > u32 omi_index = ~(u32)0; > unsigned long flags; > int ret; > @@ -110,22 +91,25 @@ static int pamu_set_liodn(int liodn, struct device *dev, >*/ > get_ome_index(_index, dev); > > - window_addr = geom_attr->aperture_start; > - window_size = geom_attr->aperture_end + 1; > - > spin_lock_irqsave(_lock, flags); > ret = pamu_disable_liodn(liodn); > - if (!ret) > - ret = pamu_config_ppaace(liodn, window_addr, window_size, > omi_index, > - 0, dma_domain->snoop_id, > - dma_domain->stash_id, 0); > + if (ret) > + goto out_unlock; > + ret = pamu_config_ppaace(liodn, geom->aperture_start, > + geom->aperture_end - 1, omi_index, 0, > + dma_domain->snoop_id, dma_domain->stash_id, 0); > + if (ret) > + goto out_unlock; > + ret = pamu_config_ppaace(liodn, geom->aperture_start, > + geom->aperture_end - 1, ~(u32)0, > + 0, dma_domain->snoop_id, dma_domain->stash_id, > + PAACE_AP_PERMS_QUERY | PAACE_AP_PERMS_UPDATE); There's more '+1' / '-1' confusion here with aperture_end which I'm not managing to follow. What am I missing? Will ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 07/18] iommu/fsl_pamu: replace DOMAIN_ATTR_FSL_PAMU_STASH with a direct call
On Tue, Mar 16, 2021 at 04:38:13PM +0100, Christoph Hellwig wrote: > Add a fsl_pamu_configure_l1_stash API that qman_portal can call directly > instead of indirecting through the iommu attr API. > > Signed-off-by: Christoph Hellwig > Acked-by: Li Yang > --- > arch/powerpc/include/asm/fsl_pamu_stash.h | 12 +++- > drivers/iommu/fsl_pamu_domain.c | 16 +++- > drivers/iommu/fsl_pamu_domain.h | 2 -- > drivers/soc/fsl/qbman/qman_portal.c | 18 +++--- > include/linux/iommu.h | 1 - > 5 files changed, 9 insertions(+), 40 deletions(-) Heh, this thing is so over-engineered. Acked-by: Will Deacon Will ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 06/18] iommu/fsl_pamu: remove ->domain_window_enable
On Tue, Mar 16, 2021 at 04:38:12PM +0100, Christoph Hellwig wrote: > The only thing that fsl_pamu_window_enable does for the current caller > is to fill in the prot value in the only dma_window structure, and to > propagate a few values from the iommu_domain_geometry struture into the > dma_window. Remove the dma_window entirely, hardcode the prot value and > otherwise use the iommu_domain_geometry structure instead. > > Remove the now unused ->domain_window_enable iommu method. > > Signed-off-by: Christoph Hellwig > Acked-by: Li Yang > --- > drivers/iommu/fsl_pamu_domain.c | 182 +++- > drivers/iommu/fsl_pamu_domain.h | 17 --- > drivers/iommu/iommu.c | 11 -- > drivers/soc/fsl/qbman/qman_portal.c | 7 -- > include/linux/iommu.h | 17 --- > 5 files changed, 14 insertions(+), 220 deletions(-) > > diff --git a/drivers/iommu/fsl_pamu_domain.c b/drivers/iommu/fsl_pamu_domain.c > index e6bdd38fc18409..fd2bc88b690465 100644 > --- a/drivers/iommu/fsl_pamu_domain.c > +++ b/drivers/iommu/fsl_pamu_domain.c > @@ -54,34 +54,18 @@ static int __init iommu_init_mempool(void) > return 0; > } > > -static phys_addr_t get_phys_addr(struct fsl_dma_domain *dma_domain, > dma_addr_t iova) > -{ > - struct dma_window *win_ptr = _domain->win_arr[0]; > - struct iommu_domain_geometry *geom; > - > - geom = _domain->iommu_domain.geometry; > - > - if (win_ptr->valid) > - return win_ptr->paddr + (iova & (win_ptr->size - 1)); > - > - return 0; > -} > - > /* Map the DMA window corresponding to the LIODN */ > static int map_liodn(int liodn, struct fsl_dma_domain *dma_domain) > { > int ret; > - struct dma_window *wnd = _domain->win_arr[0]; > - phys_addr_t wnd_addr = dma_domain->iommu_domain.geometry.aperture_start; > + struct iommu_domain_geometry *geom = _domain->iommu_domain.geometry; > unsigned long flags; > > spin_lock_irqsave(_lock, flags); > - ret = pamu_config_ppaace(liodn, wnd_addr, > - wnd->size, > - ~(u32)0, > - wnd->paddr >> PAMU_PAGE_SHIFT, > - dma_domain->snoop_id, dma_domain->stash_id, > - wnd->prot); > + ret = pamu_config_ppaace(liodn, geom->aperture_start, > + geom->aperture_end - 1, ~(u32)0, You're passing 'geom->aperture_end - 1' as the size here, but the old code seemed to _add_ 1: > -static int fsl_pamu_window_enable(struct iommu_domain *domain, u32 wnd_nr, > - phys_addr_t paddr, u64 size, int prot) > -{ > - struct fsl_dma_domain *dma_domain = to_fsl_dma_domain(domain); > - struct dma_window *wnd; > - int pamu_prot = 0; > - int ret; > - unsigned long flags; > - u64 win_size; > - > - if (prot & IOMMU_READ) > - pamu_prot |= PAACE_AP_PERMS_QUERY; > - if (prot & IOMMU_WRITE) > - pamu_prot |= PAACE_AP_PERMS_UPDATE; > - > - spin_lock_irqsave(_domain->domain_lock, flags); > - if (wnd_nr > 0) { > - pr_debug("Invalid window index\n"); > - spin_unlock_irqrestore(_domain->domain_lock, flags); > - return -EINVAL; > - } > - > - win_size = (domain->geometry.aperture_end + 1) >> ilog2(1); here ^^ when calculating the exclusive upper bound. In other words, I think '1ULL << 36' used to get passed to pamu_config_ppaace(). Is that an intentional change? Will ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 05/18] iommu/fsl_pamu: remove support for multiple windows
On Tue, Mar 16, 2021 at 04:38:11PM +0100, Christoph Hellwig wrote: > The only domains allocated forces use of a single window. Remove all > the code related to multiple window support, as well as the need for > qman_portal to force a single window. > > Remove the now unused DOMAIN_ATTR_WINDOWS iommu_attr. > > Signed-off-by: Christoph Hellwig > Acked-by: Li Yang > --- > drivers/iommu/fsl_pamu.c| 264 +- > drivers/iommu/fsl_pamu.h| 10 +- > drivers/iommu/fsl_pamu_domain.c | 275 +--- > drivers/iommu/fsl_pamu_domain.h | 12 +- > drivers/soc/fsl/qbman/qman_portal.c | 7 - > include/linux/iommu.h | 1 - > 6 files changed, 59 insertions(+), 510 deletions(-) [...] > + set_bf(ppaace->impl_attr, PAACE_IA_ATM, PAACE_ATM_WINDOW_XLATE); > + ppaace->twbah = rpn >> 20; > + set_bf(ppaace->win_bitfields, PAACE_WIN_TWBAL, rpn); > + set_bf(ppaace->addr_bitfields, PAACE_AF_AP, prot); > + set_bf(ppaace->impl_attr, PAACE_IA_WCE, 0); > + set_bf(ppaace->addr_bitfields, PPAACE_AF_MW, 0); > mb(); (I wonder what on Earth that mb() is doing...) > diff --git a/drivers/iommu/fsl_pamu_domain.h b/drivers/iommu/fsl_pamu_domain.h > index 53d359d66fe577..b9236fb5a8f82e 100644 > --- a/drivers/iommu/fsl_pamu_domain.h > +++ b/drivers/iommu/fsl_pamu_domain.h > @@ -17,23 +17,13 @@ struct dma_window { > }; > > struct fsl_dma_domain { > - /* > - * Number of windows assocaited with this domain. > - * During domain initialization, it is set to the > - * the maximum number of subwindows allowed for a LIODN. > - * Minimum value for this is 1 indicating a single PAMU > - * window, without any sub windows. Value can be set/ > - * queried by set_attr/get_attr API for DOMAIN_ATTR_WINDOWS. > - * Value can only be set once the geometry has been configured. > - */ > - u32 win_cnt; > /* >* win_arr contains information of the configured >* windows for a domain. This is allocated only >* when the number of windows for the domain are >* set. >*/ The last part of this comment is now stale ^^ > - struct dma_window *win_arr; > + struct dma_window win_arr[1]; > /* list of devices associated with the domain */ > struct list_headdevices; > /* dma_domain states: Acked-by: Will Deacon Will ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 04/18] iommu/fsl_pamu: merge iommu_alloc_dma_domain into fsl_pamu_domain_alloc
On Tue, Mar 16, 2021 at 04:38:10PM +0100, Christoph Hellwig wrote: > Keep the functionality to allocate the domain together. > > Signed-off-by: Christoph Hellwig > Acked-by: Li Yang > --- > drivers/iommu/fsl_pamu_domain.c | 34 ++--- > 1 file changed, 10 insertions(+), 24 deletions(-) Acked-by: Will Deacon Will ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 03/18] iommu/fsl_pamu: remove support for setting DOMAIN_ATTR_GEOMETRY
On Tue, Mar 16, 2021 at 04:38:09PM +0100, Christoph Hellwig wrote: > The default geometry is the same as the one set by qman_port given > that FSL_PAMU depends on having 64-bit physical and thus DMA addresses. > > Remove the support to update the geometry and remove the now pointless > geom_size field. > > Signed-off-by: Christoph Hellwig > Acked-by: Li Yang > --- > drivers/iommu/fsl_pamu_domain.c | 55 +++-- > drivers/iommu/fsl_pamu_domain.h | 6 > drivers/soc/fsl/qbman/qman_portal.c | 12 --- > 3 files changed, 5 insertions(+), 68 deletions(-) Took me a minute to track down the other magic '36' which ends up in aperture_end, but I found it eventually so: Acked-by: Will Deacon (It does make me wonder what all this glue was intended to be used for) Will ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 02/18] iommu/fsl_pamu: remove fsl_pamu_get_domain_attr
On Tue, Mar 16, 2021 at 04:38:08PM +0100, Christoph Hellwig wrote: > None of the values returned by this function are ever queried. Also > remove the DOMAIN_ATTR_FSL_PAMUV1 enum value that is not otherwise used. > > Signed-off-by: Christoph Hellwig > Acked-by: Li Yang > --- > drivers/iommu/fsl_pamu_domain.c | 30 -- > include/linux/iommu.h | 4 > 2 files changed, 34 deletions(-) Acked-by: Will Deacon Will ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 01/18] iommu: remove the unused domain_window_disable method
On Tue, Mar 16, 2021 at 04:38:07PM +0100, Christoph Hellwig wrote: > domain_window_disable is wired up by fsl_pamu, but never actually called. > > Signed-off-by: Christoph Hellwig > Acked-by: Li Yang > --- > drivers/iommu/fsl_pamu_domain.c | 48 - > include/linux/iommu.h | 2 -- > 2 files changed, 50 deletions(-) Acked-by: Will Deacon Will ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [External] Re: [RFC PATCH] virtio-vsock: add description for datagram type
On Mon, Mar 29, 2021 at 04:22:28PM -0700, Jiang Wang . wrote: > On Mon, Mar 29, 2021 at 2:26 AM Stefan Hajnoczi wrote: > > > > On Fri, Mar 26, 2021 at 04:40:09PM -0700, Jiang Wang . wrote: > > > I thought about this and discussed it with my colleague Cong Wang. > > > One idea is to make current asynchronous send_pkt flow to be synchronous, > > > then if the virtqueue is full, the function can return ENOMEM all the > > > way back > > > to the caller and the caller can check the return value of sendmsg > > > and slow down when necessary. > > > > > > In the spec, we can put something like, if the virtqueue is full, the > > > caller > > > should be notified with an error etc. > > > > > > In terms of implementation, that means we will remove the current > > > send_pkt_work for both stream and dgram sockets. Currently, the > > > code path uses RCU and a work queue, then grab a mutex in the > > > work queue function. Since we cannot grab mutex when in rcu > > > critical section, we have to change RCU to a normal reference > > > counting mechanism. I think this is doable. The drawback is > > > that the reference counting in general spends a little more > > > cycles than the RCU, so there is a small price to pay. Another > > > option is to use Sleepable RCU and remove the work queue. > > > > > > What do you guys think? > > > > I think the tx code path is like this because of reliable delivery. > > Maybe a separate datagram rx/tx code path would be simpler? > > I thought about this too. dgram can have a separate rx/tx > path from stream types. In this case, the the_virtio_vsock > will still be shared because the virtqueues have to be in one > structure. Then the_virtio_vsock will be protected by a rcu > and a reference counting ( or a sleepable RCU ). > > In vhost_vsock_dev_release, it will wait for both rcu and another > one to be finished and then free the memory. I think this is > doable. Let me know if there is a better way to do it. > Btw, I think dgram tx code path will be quite different from > stream, but dgram rx path will be similar to stream type. > > > Take the datagram tx virtqueue lock, try to add the packet into the > > virtqueue, and return -ENOBUFS if the virtqueue is full. Then use the > > datagram socket's sndbuf accounting to prevent queuing up too many > > packets. When a datagram tx virtqueue buffer is used by the device, > > select queued packets for transmission. > > I am not sure about the last sentence. In the new design, there will > be no other queued packets for dgram (except those in the virtqueue). > When dgram tx virtqueue is freed by the device, the driver side > just needs to free some space. No need > to trigger more transmission. This approach drops packets when the virtqueue is full. In order to get close to UDP semantics I think the following is necessary: 1. Packets are placed directly into the tx virtqueue when possible. 2. Packets are queued up to sndbuf bytes if the tx virtqueue is full. 3. When a full tx virtqueue gets some free space again there is an algorithm that selects from among the queued sockets. This is how unreliably delivery can be somewhat fair between competing threads. I thought you were trying to achieve this (it's similar to what UDP does)? Stefan signature.asc Description: PGP signature ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [virtio-comment] Re: [MASSMAIL KLMS] Re: [virtio-comment] [RFC PATCH v4 2/2] virtio-vsock: SOCK_SEQPACKET description
On Tue, Mar 30, 2021 at 09:15:39AM +0300, Arseny Krasnov wrote: > On 30.03.2021 00:28, Stefano Garzarella wrote: > > On Mon, Mar 29, 2021 at 08:33:27PM +0300, Arseny Krasnov wrote: > >> On 29.03.2021 19:11, Stefan Hajnoczi wrote: > >>> On Fri, Mar 26, 2021 at 12:02:50PM +0300, Arseny Krasnov wrote: > @@ -98,6 +102,10 @@ \subsection{Device Operation}\label{sec:Device Types > / Socket Device / Device Op > #define VIRTIO_VSOCK_OP_CREDIT_UPDATE 6 > /* Request the peer to send the credit info to us */ > #define VIRTIO_VSOCK_OP_CREDIT_REQUEST 7 > +/* Message begin for SOCK_SEQPACKET */ > +#define VIRTIO_VSOCK_OP_SEQ_BEGIN 8 > +/* Message end for SOCK_SEQPACKET */ > +#define VIRTIO_VSOCK_OP_SEQ_END9 > >>> The struct virtio_vsock_hdr->flags field is le32 and currently unused. > >>> Could 24 bits be used for a unique message id and 8 bits for flags? 1 > >>> flag bit could be used for end-of-message and the remaining 7 bits could > >>> be reserved. That way SEQ_BEGIN and SEQ_END are not necessary. > >>> Pressure > >>> on the virtqueue would be reduced and performance should be comparable > >>> to SOCK_STREAM. > >> Well, my first versions of SOCK_SEQPACKET implementation, worked > >> something like this: i used flags field of header as length of whole > >> message. I discussed it with Stefano Garzarella, and he told that it > >> will > >> be better to use special "header" in packet's payload, to keep some > >> SOCK_SEQPACKET specific data, instead of reusing packet's header > >> fields. > > IIRC in the first implementation SEQ_BEGIN was an empty message and we > > didn't added the msg_id yet. So since we needed to carry both id and > > total length, I suggested to use the payload to put these extra > > information. > > > > IIUC what Stefan is suggesting is a bit different and I think it should > > be cool to implement: we can remove the boundary packets, use only 8 > > bits for the flags, and add a new field to reuse the 24 unused bits, > > maybe also 16 bits would be enough. > > At that point we will only use the EOR flag to know the last packet. > > > > The main difference will be that the receiver will know the total size > > only when the last packet is received. > > > > Do you see any issue on that approach? > > It will work, except we can't check that any packet of message, > > except last(with EOR bit) was dropped, since receiver don't know > > real length of message. If it is ok, then i can implement it. The credit mechanism ensures that packets are not dropped, so it's not necessary to check for this condition. By the way, is a unique message ID needed? My understanding is: If two messages are being sent on a socket at the same time either their order is serialized (whichever message began first) or it is undefined (whichever message completes first). I wonder if POSIX specifies this and if Linux implements it (e.g. with AF_UNIX SOCK_SEQPACKET messages that are multiple pages long and exceed sndbuf)? Depending on these semantics maybe we don't need a unique message ID. Instead the driver transmits messages sequentially. RW packets for different messages on the same socket will never be interleaved. Therefore the unique message ID is not needed and just the MSG_EOR flag is enough to indicate message boundaries. What do you think? Stefan signature.asc Description: PGP signature ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v2 00/10] drm: Support simple-framebuffer devices and firmware fbs
Hi, On 3/30/21 9:09 AM, Thomas Zimmermann wrote: > Hi > > Am 29.03.21 um 16:50 schrieb Hans de Goede: >> Hi, >> >> On 3/29/21 2:31 PM, Thomas Zimmermann wrote: >>> Hi >>> >>> Am 25.03.21 um 12:29 schrieb Hans de Goede: Hi, On 3/18/21 11:29 AM, Thomas Zimmermann wrote: > This patchset adds support for simple-framebuffer platform devices and > a handover mechanism for native drivers to take-over control of the > hardware. > > The new driver, called simpledrm, binds to a simple-frambuffer platform > device. The kernel's boot code creates such devices for firmware-provided > framebuffers, such as EFI-GOP or VESA. Typically the BIOS, UEFI or boot > loader sets up the framebuffers. Description via device tree is also an > option. > > Simpledrm is small enough to be linked into the kernel. The driver's main > purpose is to provide graphical output during the early phases of the boot > process, before the native DRM drivers are available. Native drivers are > typically loaded from an initrd ram disk. Occationally simpledrm can also > serve as interim solution on graphics hardware without native DRM driver. > > So far distributions rely on fbdev drivers, such as efifb, vesafb or > simplefb, for early-boot graphical output. However fbdev is deprecated and > the drivers do not provide DRM interfaces for modern userspace. > > Patches 1 and 2 prepare the DRM format helpers for simpledrm. > > Patches 3 and 4 add a hand-over mechanism. Simpledrm acquires it's > framebuffer's I/O-memory range and provides a callback function to be > removed by a native driver. The native driver will remove simpledrm before > taking over the hardware. The removal is integrated into existing helpers, > so drivers use it automatically. > > Patches 5 to 10 add the simpledrm driver. It's build on simple DRM helpers > and SHMEM. It supports 16-bit, 24-bit and 32-bit RGB framebuffers. During > pageflips, SHMEM buffers are copied into the framebuffer memory, similar > to cirrus or mgag200. The code in patches 8 and 9 handles clocks and > regulators. It's based on the simplefb drivers, but has been modified for > DRM. Thank you for your work on this, this is very interesting. > I've also been working on fastboot support (i.e., flicker-free booting). > This requires state-readout from simpledrm via generic interfaces, as > outlined in [1]. I do have some prototype code, but it will take a while > to get this ready. Simpledrm will then support it. > > I've tested simpledrm with x86 EFI and VESA framebuffers, which both work > reliably. The fbdev console and Weston work automatically. Xorg requires > manual configuration of the device. Xorgs current modesetting driver does > not work with both, platform and PCI device, for the same physical > hardware. Once configured, X11 works. I looked into X11, but couldn't see > an easy way of fixing the problem. With the push towards Wayland+Xwayland > I expect the problem to become a non-issue soon. Additional testing has > been reported at [2]. > > One cosmetical issue is that simpledrm's device file is card0 and the > native driver's device file is card1. After simpledrm has been kicked out, > only card1 is left. This does not seem to be a practical problem however. > > TODO/IDEAS: > > * provide deferred takeover I'm not sure what you mean with this ? Currently deferred-takeover is handled in the fbcon code. Current flickerfree boot works like this (assuming a single LCD panel in a laptop): 1. EFI/GOP sets up the framebuffer, draws a vendor logo 2. The bootloader runs in silent mode and does not touch anything gfx related 3. kernel boots, with a loglevel of 3 so only CRIT/EMERG messages are shown 2. efifb loads; and tells fbcon that a framebuffer is now available for it to "bind" to. Since CONFIG_FRAMEBUFFER_CONSOLE_DEFERRED_TAKEOVER=y fbcon defers taking over the console and leaves the dummy-console driver in place (unless there have already been kernel messages logged, which there shouldn't because loglevel=3) 3. i915 loads, reads out the hw state compares this to the preferred-mode for the panel which it would set, they match, nothing happens. i915 takes ownership of the scanout-buffer set up by the GOP, but leaves it in place. i915 also removes the efifb /dev/fb0 and installs its own /dev/fb0 fbdev compat device, fbcon is notified of this, but is still deferred and leaves the dummy console driver in place as console driver. 4. Plymouth loads, allocates a new scan-out buffer at the panel's preferred resolution, plymouth reads the vendor-logo through the
Re: [virtio-comment] Re: [MASSMAIL KLMS] Re: [virtio-comment] [RFC PATCH v4 2/2] virtio-vsock: SOCK_SEQPACKET description
On Tue, Mar 30, 2021 at 09:15:39AM +0300, Arseny Krasnov wrote: On 30.03.2021 00:28, Stefano Garzarella wrote: On Mon, Mar 29, 2021 at 08:33:27PM +0300, Arseny Krasnov wrote: On 29.03.2021 19:11, Stefan Hajnoczi wrote: On Fri, Mar 26, 2021 at 12:02:50PM +0300, Arseny Krasnov wrote: This adds description of SOCK_SEQPACKET socket type support for virtio-vsock. Signed-off-by: Arseny Krasnov --- virtio-vsock.tex | 65 +++- 1 file changed, 59 insertions(+), 6 deletions(-) diff --git a/virtio-vsock.tex b/virtio-vsock.tex index ad57f9d..c366de7 100644 --- a/virtio-vsock.tex +++ b/virtio-vsock.tex @@ -17,6 +17,10 @@ \subsection{Virtqueues}\label{sec:Device Types / Socket Device / Virtqueues} \subsection{Feature bits}\label{sec:Device Types / Socket Device / Feature bits} There are currently no feature bits defined for this device. ^ This line is now out of date :) Ack +\begin{description} +\item VIRTIO_VSOCK_F_SEQPACKET (0) SOCK_SEQPACKET socket type is +supported. +\end{description} \subsection{Device configuration layout}\label{sec:Device Types / Socket Device / Device configuration layout} @@ -98,6 +102,10 @@ \subsection{Device Operation}\label{sec:Device Types / Socket Device / Device Op #define VIRTIO_VSOCK_OP_CREDIT_UPDATE 6 /* Request the peer to send the credit info to us */ #define VIRTIO_VSOCK_OP_CREDIT_REQUEST 7 +/* Message begin for SOCK_SEQPACKET */ +#define VIRTIO_VSOCK_OP_SEQ_BEGIN 8 +/* Message end for SOCK_SEQPACKET */ +#define VIRTIO_VSOCK_OP_SEQ_END9 The struct virtio_vsock_hdr->flags field is le32 and currently unused. Could 24 bits be used for a unique message id and 8 bits for flags? 1 flag bit could be used for end-of-message and the remaining 7 bits could be reserved. That way SEQ_BEGIN and SEQ_END are not necessary. Pressure on the virtqueue would be reduced and performance should be comparable to SOCK_STREAM. Well, my first versions of SOCK_SEQPACKET implementation, worked something like this: i used flags field of header as length of whole message. I discussed it with Stefano Garzarella, and he told that it will be better to use special "header" in packet's payload, to keep some SOCK_SEQPACKET specific data, instead of reusing packet's header fields. IIRC in the first implementation SEQ_BEGIN was an empty message and we didn't added the msg_id yet. So since we needed to carry both id and total length, I suggested to use the payload to put these extra information. IIUC what Stefan is suggesting is a bit different and I think it should be cool to implement: we can remove the boundary packets, use only 8 bits for the flags, and add a new field to reuse the 24 unused bits, maybe also 16 bits would be enough. At that point we will only use the EOR flag to know the last packet. The main difference will be that the receiver will know the total size only when the last packet is received. Do you see any issue on that approach? It will work, except we can't check that any packet of message, except last(with EOR bit) was dropped, since receiver don't know real length of message. If it is ok, then i can implement it. This is true, but where can a packet be lost? The channel is lossless, so it can be lost either by the transmitter or the receiver, but only in critical cases where for example the whole system has run out of memory, but in this case we can't do much, maybe only reset the connection. Thanks, Stefano ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v2 00/10] drm: Support simple-framebuffer devices and firmware fbs
Hi Am 29.03.21 um 16:50 schrieb Hans de Goede: Hi, On 3/29/21 2:31 PM, Thomas Zimmermann wrote: Hi Am 25.03.21 um 12:29 schrieb Hans de Goede: Hi, On 3/18/21 11:29 AM, Thomas Zimmermann wrote: This patchset adds support for simple-framebuffer platform devices and a handover mechanism for native drivers to take-over control of the hardware. The new driver, called simpledrm, binds to a simple-frambuffer platform device. The kernel's boot code creates such devices for firmware-provided framebuffers, such as EFI-GOP or VESA. Typically the BIOS, UEFI or boot loader sets up the framebuffers. Description via device tree is also an option. Simpledrm is small enough to be linked into the kernel. The driver's main purpose is to provide graphical output during the early phases of the boot process, before the native DRM drivers are available. Native drivers are typically loaded from an initrd ram disk. Occationally simpledrm can also serve as interim solution on graphics hardware without native DRM driver. So far distributions rely on fbdev drivers, such as efifb, vesafb or simplefb, for early-boot graphical output. However fbdev is deprecated and the drivers do not provide DRM interfaces for modern userspace. Patches 1 and 2 prepare the DRM format helpers for simpledrm. Patches 3 and 4 add a hand-over mechanism. Simpledrm acquires it's framebuffer's I/O-memory range and provides a callback function to be removed by a native driver. The native driver will remove simpledrm before taking over the hardware. The removal is integrated into existing helpers, so drivers use it automatically. Patches 5 to 10 add the simpledrm driver. It's build on simple DRM helpers and SHMEM. It supports 16-bit, 24-bit and 32-bit RGB framebuffers. During pageflips, SHMEM buffers are copied into the framebuffer memory, similar to cirrus or mgag200. The code in patches 8 and 9 handles clocks and regulators. It's based on the simplefb drivers, but has been modified for DRM. Thank you for your work on this, this is very interesting. I've also been working on fastboot support (i.e., flicker-free booting). This requires state-readout from simpledrm via generic interfaces, as outlined in [1]. I do have some prototype code, but it will take a while to get this ready. Simpledrm will then support it. I've tested simpledrm with x86 EFI and VESA framebuffers, which both work reliably. The fbdev console and Weston work automatically. Xorg requires manual configuration of the device. Xorgs current modesetting driver does not work with both, platform and PCI device, for the same physical hardware. Once configured, X11 works. I looked into X11, but couldn't see an easy way of fixing the problem. With the push towards Wayland+Xwayland I expect the problem to become a non-issue soon. Additional testing has been reported at [2]. One cosmetical issue is that simpledrm's device file is card0 and the native driver's device file is card1. After simpledrm has been kicked out, only card1 is left. This does not seem to be a practical problem however. TODO/IDEAS: * provide deferred takeover I'm not sure what you mean with this ? Currently deferred-takeover is handled in the fbcon code. Current flickerfree boot works like this (assuming a single LCD panel in a laptop): 1. EFI/GOP sets up the framebuffer, draws a vendor logo 2. The bootloader runs in silent mode and does not touch anything gfx related 3. kernel boots, with a loglevel of 3 so only CRIT/EMERG messages are shown 2. efifb loads; and tells fbcon that a framebuffer is now available for it to "bind" to. Since CONFIG_FRAMEBUFFER_CONSOLE_DEFERRED_TAKEOVER=y fbcon defers taking over the console and leaves the dummy-console driver in place (unless there have already been kernel messages logged, which there shouldn't because loglevel=3) 3. i915 loads, reads out the hw state compares this to the preferred-mode for the panel which it would set, they match, nothing happens. i915 takes ownership of the scanout-buffer set up by the GOP, but leaves it in place. i915 also removes the efifb /dev/fb0 and installs its own /dev/fb0 fbdev compat device, fbcon is notified of this, but is still deferred and leaves the dummy console driver in place as console driver. 4. Plymouth loads, allocates a new scan-out buffer at the panel's preferred resolution, plymouth reads the vendor-logo through the BGRT ACPI interface and fills the scanout-buffer with the vendor-logo + a spinner. Then plymouth installs the new scanout-buffer on the crtc, this is done atomically during vsync, so the user sees no changes, other then the spinner appearing (note the active VT is now in graphical mode) 5. From here on not flickering is a userspace problem AFAICT this should work fine with simplekms too, unless it clears the screen to black when it binds. I forgot to add the code that clears the screen, but that's the case here. Instead of a