On Mon, Mar 25, 2024 at 5:52 PM Jonah Palmer <jonah.pal...@oracle.com> wrote: > > > > On 3/22/24 7:18 AM, Eugenio Perez Martin wrote: > > On Thu, Mar 21, 2024 at 4:57 PM Jonah Palmer <jonah.pal...@oracle.com> > > wrote: > >> > >> The goal of these patches is to add support to a variety of virtio and > >> vhost devices for the VIRTIO_F_IN_ORDER transport feature. This feature > >> indicates that all buffers are used by the device in the same order in > >> which they were made available by the driver. > >> > >> These patches attempt to implement a generalized, non-device-specific > >> solution to support this feature. > >> > >> The core feature behind this solution is a buffer mechanism in the form > >> of GLib's GHashTable. The decision behind using a hash table was to > >> leverage their ability for quick lookup, insertion, and removal > >> operations. Given that our keys are simply numbers of an ordered > >> sequence, a hash table seemed like the best choice for a buffer > >> mechanism. > >> > >> --------------------- > >> > >> The strategy behind this implementation is as follows: > >> > >> We know that buffers that are popped from the available ring and enqueued > >> for further processing will always done in the same order in which they > >> were made available by the driver. Given this, we can note their order > >> by assigning the resulting VirtQueueElement a key. This key is a number > >> in a sequence that represents the order in which they were popped from > >> the available ring, relative to the other VirtQueueElements. > >> > >> For example, given 3 "elements" that were popped from the available > >> ring, we assign a key value to them which represents their order (elem0 > >> is popped first, then elem1, then lastly elem2): > >> > >> elem2 -- elem1 -- elem0 ---> Enqueue for processing > >> (key: 2) (key: 1) (key: 0) > >> > >> Then these elements are enqueued for further processing by the host. > >> > >> While most devices will return these completed elements in the same > >> order in which they were enqueued, some devices may not (e.g. > >> virtio-blk). To guarantee that these elements are put on the used ring > >> in the same order in which they were enqueued, we can use a buffering > >> mechanism that keeps track of the next expected sequence number of an > >> element. > >> > >> In other words, if the completed element does not have a key value that > >> matches the next expected sequence number, then we know this element is > >> not in-order and we must stash it away in a hash table until an order > >> can be made. The element's key value is used as the key for placing it > >> in the hash table. > >> > >> If the completed element has a key value that matches the next expected > >> sequence number, then we know this element is in-order and we can push > >> it on the used ring. Then we increment the next expected sequence number > >> and check if the hash table contains an element at this key location. > >> > >> If so, we retrieve this element, push it to the used ring, delete the > >> key-value pair from the hash table, increment the next expected sequence > >> number, and check the hash table again for an element at this new key > >> location. This process is repeated until we're unable to find an element > >> in the hash table to continue the order. > >> > >> So, for example, say the 3 elements we enqueued were completed in the > >> following order: elem1, elem2, elem0. The next expected sequence number > >> is 0: > >> > >> exp-seq-num = 0: > >> > >> elem1 --> elem1.key == exp-seq-num ? --> No, stash it > >> (key: 1) | > >> | > >> v > >> ================ > >> |key: 1 - elem1| > >> ================ > >> --------------------- > >> exp-seq-num = 0: > >> > >> elem2 --> elem2.key == exp-seq-num ? --> No, stash it > >> (key: 2) | > >> | > >> v > >> ================ > >> |key: 1 - elem1| > >> |--------------| > >> |key: 2 - elem2| > >> ================ > >> --------------------- > >> exp-seq-num = 0: > >> > >> elem0 --> elem0.key == exp-seq-num ? --> Yes, push to used ring > >> (key: 0) > >> > >> exp-seq-num = 1: > >> > >> lookup(table, exp-seq-num) != NULL ? --> Yes, push to used ring, > >> remove elem from table > >> | > >> v > >> ================ > >> |key: 2 - elem2| > >> ================ > >> > >> exp-seq-num = 2: > >> > >> lookup(table, exp-seq-num) != NULL ? --> Yes, push to used ring, > >> remove elem from table > >> | > >> v > >> ================ > >> | *empty* | > >> ================ > >> > >> exp-seq-num = 3: > >> > >> lookup(table, exp-seq-num) != NULL ? --> No, done > >> --------------------- > >> > > > > I think to use a hashtable to handle this has an important drawback: > > it hurts performance on the devices that are using right in-order > > because of hash calculus, to benefit devices that are using it badly > > by using descriptors out of order. We should use data structs that are > > as free as possible for the first, and we don't care to worse the > > experience of the devices that enable in_order and they shouldn't. > > > > Right, because if descriptors are coming in in-order, we still search > the (empty) hash table. > > Hmm... what if we introduced a flag to see if we actually should bother > searching the hash table? That way we avoid the cost of searching when > we really don't need to. > > > So I suggest reusing vq->used_elems array vq. At each used descriptor > > written in the used ring, you know the next head is elem->index + > > elem->ndescs, so you can check if that element has been filled or not. > > If used, it needs to be flushed too. If not used, just return. > > > > Of course virtqueue_flush also needs to take this into account. > > > > What do you think, does it make sense to you? > > > > I'm having a bit of trouble understanding the suggestion here. Would you > mind elaborating a bit more for me on this? > > For example, say elem0, elem1, and elem2 were enqueued in-order (elem0 > being first, elem2 last) and then elem2 finishes first, elem1 second, > and elem0 third. Given that these elements finish out-of-order, how > would you handle these out-of-order elements using your suggestion? >
virtqueue_fill is called first with elem2. So vq->used_elems[2 % vq->num] is filled with the needed information of the descriptor: index, len and ndescs. idx function parameter is ignored. Optionally, virtqueue_push is called. It checks if vq->used_elems[vq->used_idx] is valid. valid can be elem->in_num + elem->out_num > 0, and reset them on every used ring write. If it is not valid, this is a no-op. Currently, it is not valid. Same process for elem1. virtqueue_fill is the same for elem0. But now virtqueue_flush gets interesting, as it detects vq->used_elems[0] is used. It scans for the first not-used element, and it finds it is vq->used_elems[3]. So it needs to write an used elem with id = 2 and the corresponding length. Maybe it is interesting to implement ways to improve the look for the last used descriptor, but if any I'd go for a bitmap and always on top of the basis series. The algorithm has not been tested, so maybe I've missed something. Thanks! > Thanks :) > > > Thanks! > > > > > >> Jonah Palmer (8): > >> virtio: Define InOrderVQElement > >> virtio: Create/destroy/reset VirtQueue In-Order hash table > >> virtio: Define order variables > >> virtio: Implement in-order handling for virtio devices > >> virtio-net: in-order handling > >> vhost-svq: in-order handling > >> vhost/vhost-user: Add VIRTIO_F_IN_ORDER to vhost feature bits > >> virtio: Add VIRTIO_F_IN_ORDER property definition > >> > >> hw/block/vhost-user-blk.c | 1 + > >> hw/net/vhost_net.c | 2 + > >> hw/net/virtio-net.c | 6 +- > >> hw/scsi/vhost-scsi.c | 1 + > >> hw/scsi/vhost-user-scsi.c | 1 + > >> hw/virtio/vhost-shadow-virtqueue.c | 15 ++++- > >> hw/virtio/vhost-user-fs.c | 1 + > >> hw/virtio/vhost-user-vsock.c | 1 + > >> hw/virtio/virtio.c | 103 ++++++++++++++++++++++++++++- > >> include/hw/virtio/virtio.h | 20 +++++- > >> net/vhost-vdpa.c | 1 + > >> 11 files changed, 145 insertions(+), 7 deletions(-) > >> > >> -- > >> 2.39.3 > >> > > >