Re: [PATCH net-next v3 0/5] virtio-net tx napi
On Mon, Apr 24, 2017 at 01:49:25PM -0400, Willem de Bruijn wrote:
> From: Willem de Bruijn
>
> Add napi for virtio-net transmit completion processing.

Acked-by: Michael S. Tsirkin

> Changes:
>   v2 -> v3:
>     - convert __netif_tx_trylock to __netif_tx_lock on tx napi poll
>         ensure that the handler always cleans, to avoid deadlock
>     - unconditionally clean in start_xmit
>         avoid adding an unnecessary "if (use_napi)" branch
>     - remove virtqueue_disable_cb in patch 5/5
>         a noop in the common event_idx based loop
>     - document affinity_hint_set constraint
>
>   v1 -> v2:
>     - disable by default
>     - disable unless affinity_hint_set
>         because cache misses add up to a third higher cycle cost,
>         e.g., in TCP_RR tests. This is not limited to the patch
>         that enables tx completion cleaning in rx napi.
>     - use trylock to avoid contention between tx and rx napi
>     - keep interrupts masked during xmit_more (new patch 5/5)
>         this improves cycles especially for multi UDP_STREAM, which
>         does not benefit from cleaning tx completions on rx napi.
>     - move free_old_xmit_skbs (new patch 3/5)
>         to avoid forward declaration
>
>   not changed:
>     - deduplicate virtnet_poll_tx and virtnet_poll_txclean
>         they look similar, but differ too much to make it
>         worthwhile.
>     - delay netif_wake_subqueue for more than 2 + MAX_SKB_FRAGS
>         evaluated, but made no difference
>     - patch 1/5
>
>   RFC -> v1:
>     - dropped vhost interrupt moderation patch:
>         not needed and likely expensive at light load
>     - remove tx napi weight
>         - always clean all tx completions
>         - use boolean to toggle tx-napi, instead
>     - only clean tx in rx if tx-napi is enabled
>         - then clean tx before rx
>     - fix: add missing braces in virtnet_freeze_down
>     - testing: add 4KB TCP_RR + UDP test results
>
> Based on previous patchsets by Jason Wang:
>   [RFC V7 PATCH 0/7] enable tx interrupts for virtio-net
>   http://lkml.iu.edu/hypermail/linux/kernel/1505.3/00245.html
>
> Before commit b0c39dbdc204 ("virtio_net: don't free buffers in xmit
> ring") the virtio-net driver would free transmitted packets on
> transmission of new packets in ndo_start_xmit and, to catch the edge
> case when no new packet is sent, also in a timer at 10HZ.
>
> A timer can cause long stalls. VIRTIO_F_NOTIFY_ON_EMPTY avoids stalls
> due to low free descriptor count. It does not address a stall due to
> low socket SO_SNDBUF. Increasing timer frequency decreases that stall
> time, but increases interrupt rate and, thus, cycle count.
>
> Currently, with no timer, packets are freed only at ndo_start_xmit.
> Latency of consume_skb is now unbounded. To avoid a deadlock if a sock
> reaches SO_SNDBUF, packets are orphaned on tx. This breaks TCP small
> queues.
>
> Reenable TCP small queues by removing the orphan. Instead of using a
> timer, convert the driver to regular tx napi. This does not have the
> unresolved stall issue and does not have any frequency to tune.
>
> By keeping interrupts enabled by default, napi increases tx
> interrupt rate. VIRTIO_F_EVENT_IDX avoids sending an interrupt if
> one is already unacknowledged, so makes this more feasible today.
> Combine that with an optimization that brings interrupt rate
> back in line with the existing version for most workloads:
>
> Tx completion cleaning on rx interrupts elides most explicit tx
> interrupts by relying on the fact that many rx interrupts fire.
>
> Tested by running {1, 10, 100} {TCP, UDP} STREAM, RR, 4K_RR benchmarks
> from a guest to a server on the host, on an x86_64 Haswell. The guest
> runs 4 vCPUs pinned to 4 cores. vhost and the test server are
> pinned to a core each.
>
> All results are the median of 5 runs, with variance well < 10%.
> Used neper (github.com/google/neper) as test process.
>
> Napi increases single stream throughput, but increases cycle cost.
> The optimizations bring this down. The previous patchset saw a
> regression with UDP_STREAM, which does not benefit from cleaning tx
> interrupts in rx napi. This regression is now gone for 10x, 100x.
> Remaining difference is higher 1x TCP_STREAM, lower 1x UDP_STREAM.
>
> The latest results are with process, rx napi and tx napi affine to
> the same core. All numbers are lower than the previous patchset.
>
>              upstream   napi
> TCP_STREAM:
> 1x:
>   Mbps          27816  39805
>   Gcycles         274    285
>
> 10x:
>   Mbps          42947  42531
>   Gcycles         300    296
>
> 100x:
>   Mbps          31830  28042
>   Gcycles         279    269
>
> TCP_RR Latency (us):
> 1x:
>   p50              21     21
>   p99              27     27
>   Gcycles         180    167
>
> 10x:
>   p50              40     39
>   p99              52     52
>   Gcycles
Re: [PATCH net-next v2 2/5] virtio-net: transmit napi
On Mon, Apr 24, 2017 at 1:14 PM, Michael S. Tsirkin wrote:
> On Mon, Apr 24, 2017 at 01:05:45PM -0400, Willem de Bruijn wrote:
>> On Mon, Apr 24, 2017 at 12:40 PM, Michael S. Tsirkin wrote:
>> > On Fri, Apr 21, 2017 at 10:50:12AM -0400, Willem de Bruijn wrote:
>> >> >>> Maybe I was wrong, but according to Michael's comment it looks
>> >> >>> like he want check affinity_hint_set just for speculative tx
>> >> >>> polling on rx napi instead of disabling it at all.
>> >> >>>
>> >> >>> And I'm not convinced this is really needed, driver only provide
>> >> >>> affinity hint instead of affinity, so it's not guaranteed that tx
>> >> >>> and rx interrupt are in the same vcpus.
>> >> >>
>> >> >> You're right. I made the restriction broader than the request, to
>> >> >> really err on the side of caution for the initial merge of napi tx.
>> >> >> And enabling the optimization is always a win over keeping it off,
>> >> >> even without irq affinity.
>> >> >>
>> >> >> The cycle cost is significant without affinity regardless of
>> >> >> whether the optimization is used.
>> >> >
>> >> > Yes, I noticed this in the past too.
>> >> >
>> >> >> Though this is not limited to napi-tx, it is more
>> >> >> pronounced in that mode than without napi.
>> >> >>
>> >> >> 1x TCP_RR for affinity configuration {process, rx_irq, tx_irq}:
>> >> >>
>> >> >> upstream:
>> >> >>
>> >> >> 1,1,1: 28985 Mbps, 278 Gcyc
>> >> >> 1,0,2: 30067 Mbps, 402 Gcyc
>> >> >>
>> >> >> napi tx:
>> >> >>
>> >> >> 1,1,1: 34492 Mbps, 269 Gcyc
>> >> >> 1,0,2: 36527 Mbps, 537 Gcyc (!)
>> >> >> 1,0,1: 36269 Mbps, 394 Gcyc
>> >> >> 1,0,0: 34674 Mbps, 402 Gcyc
>> >> >>
>> >> >> This is a particularly strong example. It is also representative
>> >> >> of most RR tests. It is less pronounced in other streaming tests.
>> >> >> 10x TCP_RR, for instance:
>> >> >>
>> >> >> upstream:
>> >> >>
>> >> >> 1,1,1: 42267 Mbps, 301 Gcyc
>> >> >> 1,0,2: 40663 Mbps, 445 Gcyc
>> >> >>
>> >> >> napi tx:
>> >> >>
>> >> >> 1,1,1: 42420 Mbps, 303 Gcyc
>> >> >> 1,0,2: 42267 Mbps, 431 Gcyc
>> >> >>
>> >> >> These numbers were obtained with the virtqueue_enable_cb_delayed
>> >> >> optimization after xmit_skb, btw. It turns out that moving that
>> >> >> before increases 1x TCP_RR further to ~39 Gbps, at the cost of
>> >> >> reducing 100x TCP_RR a bit.
>> >> >
>> >> > I see, so I think we can leave the affinity hint optimization/check
>> >> > for future investigation:
>> >> >
>> >> > - to avoid endless optimization (e.g we may want to share a single
>> >> >   vector/napi for tx/rx queue pairs in the future) for this series.
>> >> > - tx napi is disabled by default which means we can do optimization
>> >> >   on top.
>> >>
>> >> Okay. I'll drop the vi->affinity_hint_set from the patch set for now.
>> >
>> > I kind of like it, let's be conservative. But I'd prefer a comment
>> > near it explaining why it's there.
>>
>> I don't feel strongly. Was minutes away from sending a v3 with this
>> code reverted, but I'll reinstate it and add a comment. Other planned
>> changes based on Jason's feedback to v2:
>>
>> v2 -> v3:
>>   - convert __netif_tx_trylock to __netif_tx_lock on tx napi poll
>>       ensure that the handler always cleans, to avoid deadlock
>>   - unconditionally clean in start_xmit
>>       avoid adding an unnecessary "if (use_napi)" branch
>>   - remove virtqueue_disable_cb in patch 5/5
>>       a noop in the common event_idx based loop
>
> Makes sense, thanks!

Great. Sent that, thanks.
The actual diff to v2 is quite small:

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index b107ae011632..003143835766 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -986,6 +986,9 @@ static void virtnet_napi_tx_enable(struct virtnet_info *vi,
 	if (!napi->weight)
 		return;
 
+	/* Tx napi touches cachelines on the cpu handling tx interrupts. Only
+	 * enable the feature if this is likely affine with the transmit path.
+	 */
 	if (!vi->affinity_hint_set) {
 		napi->weight = 0;
 		return;
@@ -1131,10 +1134,9 @@ static int virtnet_poll_tx(struct napi_struct *napi, int budget)
 	struct virtnet_info *vi = sq->vq->vdev->priv;
 	struct netdev_queue *txq = netdev_get_tx_queue(vi->dev, vq2txq(sq->vq));
 
-	if (__netif_tx_trylock(txq)) {
-		free_old_xmit_skbs(sq);
-		__netif_tx_unlock(txq);
-	}
+	__netif_tx_lock(txq, raw_smp_processor_id());
+	free_old_xmit_skbs(sq);
+	__netif_tx_unlock(txq);
 
 	virtqueue_napi_complete(napi, sq->vq, 0);
 
@@ -1196,14 +1198,10 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 	bool use_napi = sq->napi.weight;
 
 	/* Free up any pending old buffers before queueing new ones. */
-	if (use_napi) {
-		if
[PATCH net-next v3 4/5] virtio-net: clean tx descriptors from rx napi
From: Willem de Bruijn

Amortize the cost of virtual interrupts by doing both rx and tx work
on reception of a receive interrupt if tx napi is enabled. With
VIRTIO_F_EVENT_IDX, this suppresses most explicit tx completion
interrupts for bidirectional workloads.

Signed-off-by: Willem de Bruijn
---
 drivers/net/virtio_net.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 4ec79e5d7a86..9dd978f34c1f 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1075,12 +1075,33 @@ static void free_old_xmit_skbs(struct send_queue *sq)
 	u64_stats_update_end(&stats->tx_syncp);
 }
 
+static void virtnet_poll_cleantx(struct receive_queue *rq)
+{
+	struct virtnet_info *vi = rq->vq->vdev->priv;
+	unsigned int index = vq2rxq(rq->vq);
+	struct send_queue *sq = &vi->sq[index];
+	struct netdev_queue *txq = netdev_get_tx_queue(vi->dev, index);
+
+	if (!sq->napi.weight)
+		return;
+
+	if (__netif_tx_trylock(txq)) {
+		free_old_xmit_skbs(sq);
+		__netif_tx_unlock(txq);
+	}
+
+	if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
+		netif_tx_wake_queue(txq);
+}
+
 static int virtnet_poll(struct napi_struct *napi, int budget)
 {
 	struct receive_queue *rq =
 		container_of(napi, struct receive_queue, napi);
 	unsigned int received;
 
+	virtnet_poll_cleantx(rq);
+
 	received = virtnet_receive(rq, budget);
 
 	/* Out of packets? */
-- 
2.12.2.816.g281164-goog
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH net-next v3 5/5] virtio-net: keep tx interrupts disabled unless kick
From: Willem de Bruijn

Tx napi mode increases the rate of transmit interrupts. Suppress some
by masking interrupts while more packets are expected. The interrupts
will be reenabled before the last packet is sent.

This optimization reduces the throughput drop with tx napi for
unidirectional flows such as UDP_STREAM that do not benefit from
cleaning tx completions in the receive napi handler.

Signed-off-by: Willem de Bruijn
---
 drivers/net/virtio_net.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 9dd978f34c1f..003143835766 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1200,6 +1200,9 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 	/* Free up any pending old buffers before queueing new ones. */
 	free_old_xmit_skbs(sq);
 
+	if (use_napi && kick)
+		virtqueue_enable_cb_delayed(sq->vq);
+
 	/* timestamp packet in software */
 	skb_tx_timestamp(skb);
-- 
2.12.2.816.g281164-goog
[PATCH net-next v3 2/5] virtio-net: transmit napi
From: Willem de Bruijn

Convert virtio-net to a standard napi tx completion path. This enables
better TCP pacing using TCP small queues and increases single stream
throughput.

The virtio-net driver currently cleans tx descriptors on transmission
of new packets in ndo_start_xmit. Latency depends on new traffic, so
is unbounded. To avoid deadlock when a socket reaches its snd limit,
packets are orphaned on transmission. This breaks socket backpressure,
including TSQ.

Napi increases the number of interrupts generated compared to the
current model, which keeps interrupts disabled as long as the ring has
enough free descriptors. Keep tx napi optional and disabled for now.
Follow-on patches will reduce the interrupt cost.

Signed-off-by: Willem de Bruijn
Signed-off-by: Jason Wang
---
 drivers/net/virtio_net.c | 76 ++--
 1 file changed, 67 insertions(+), 9 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index b9c1df29892c..356d18481ee4 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -33,9 +33,10 @@
 static int napi_weight = NAPI_POLL_WEIGHT;
 module_param(napi_weight, int, 0444);
 
-static bool csum = true, gso = true;
+static bool csum = true, gso = true, napi_tx;
 module_param(csum, bool, 0444);
 module_param(gso, bool, 0444);
+module_param(napi_tx, bool, 0644);
 
 /* FIXME: MTU in config. */
 #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
@@ -86,6 +87,8 @@ struct send_queue {
 
 	/* Name of the send queue: output.$index */
 	char name[40];
+
+	struct napi_struct napi;
 };
 
 /* Internal representation of a receive virtqueue */
@@ -262,12 +265,16 @@ static void virtqueue_napi_complete(struct napi_struct *napi,
 static void skb_xmit_done(struct virtqueue *vq)
 {
 	struct virtnet_info *vi = vq->vdev->priv;
+	struct napi_struct *napi = &vi->sq[vq2txq(vq)].napi;
 
 	/* Suppress further interrupts. */
 	virtqueue_disable_cb(vq);
 
-	/* We were probably waiting for more output buffers. */
-	netif_wake_subqueue(vi->dev, vq2txq(vq));
+	if (napi->weight)
+		virtqueue_napi_schedule(napi, vq);
+	else
+		/* We were probably waiting for more output buffers. */
+		netif_wake_subqueue(vi->dev, vq2txq(vq));
 }
 
 static unsigned int mergeable_ctx_to_buf_truesize(unsigned long mrg_ctx)
@@ -972,6 +979,24 @@ static void virtnet_napi_enable(struct virtqueue *vq, struct napi_struct *napi)
 	local_bh_enable();
 }
 
+static void virtnet_napi_tx_enable(struct virtnet_info *vi,
+				   struct virtqueue *vq,
+				   struct napi_struct *napi)
+{
+	if (!napi->weight)
+		return;
+
+	/* Tx napi touches cachelines on the cpu handling tx interrupts. Only
+	 * enable the feature if this is likely affine with the transmit path.
+	 */
+	if (!vi->affinity_hint_set) {
+		napi->weight = 0;
+		return;
+	}
+
+	return virtnet_napi_enable(vq, napi);
+}
+
 static void refill_work(struct work_struct *work)
 {
 	struct virtnet_info *vi =
@@ -1046,6 +1071,7 @@ static int virtnet_open(struct net_device *dev)
 		if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
 			schedule_delayed_work(&vi->refill, 0);
 		virtnet_napi_enable(vi->rq[i].vq, &vi->rq[i].napi);
+		virtnet_napi_tx_enable(vi, vi->sq[i].vq, &vi->sq[i].napi);
 	}
 
 	return 0;
@@ -1081,6 +1107,24 @@ static void free_old_xmit_skbs(struct send_queue *sq)
 	u64_stats_update_end(&stats->tx_syncp);
 }
 
+static int virtnet_poll_tx(struct napi_struct *napi, int budget)
+{
+	struct send_queue *sq = container_of(napi, struct send_queue, napi);
+	struct virtnet_info *vi = sq->vq->vdev->priv;
+	struct netdev_queue *txq = netdev_get_tx_queue(vi->dev, vq2txq(sq->vq));
+
+	__netif_tx_lock(txq, raw_smp_processor_id());
+	free_old_xmit_skbs(sq);
+	__netif_tx_unlock(txq);
+
+	virtqueue_napi_complete(napi, sq->vq, 0);
+
+	if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
+		netif_tx_wake_queue(txq);
+
+	return 0;
+}
+
 static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
 {
 	struct virtio_net_hdr_mrg_rxbuf *hdr;
@@ -1130,6 +1174,7 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 	int err;
 	struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);
 	bool kick = !skb->xmit_more;
+	bool use_napi = sq->napi.weight;
 
 	/* Free up any pending old buffers before queueing new ones. */
 	free_old_xmit_skbs(sq);
@@ -1152,8 +1197,10 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 	}
 
 	/* Don't wait up for transmitted skbs to be freed. */
-
[PATCH net-next v3 3/5] virtio-net: move free_old_xmit_skbs
From: Willem de Bruijn

An upcoming patch will call free_old_xmit_skbs indirectly from
virtnet_poll. Move the function above this to avoid having to
introduce a forward declaration.

This is a pure move: no code changes.

Signed-off-by: Willem de Bruijn
---
 drivers/net/virtio_net.c | 60
 1 file changed, 30 insertions(+), 30 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 356d18481ee4..4ec79e5d7a86 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1045,6 +1045,36 @@ static int virtnet_receive(struct receive_queue *rq, int budget)
 	return received;
 }
 
+static void free_old_xmit_skbs(struct send_queue *sq)
+{
+	struct sk_buff *skb;
+	unsigned int len;
+	struct virtnet_info *vi = sq->vq->vdev->priv;
+	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
+	unsigned int packets = 0;
+	unsigned int bytes = 0;
+
+	while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
+		pr_debug("Sent skb %p\n", skb);
+
+		bytes += skb->len;
+		packets++;
+
+		dev_kfree_skb_any(skb);
+	}
+
+	/* Avoid overhead when no packets have been processed
+	 * happens when called speculatively from start_xmit.
+	 */
+	if (!packets)
+		return;
+
+	u64_stats_update_begin(&stats->tx_syncp);
+	stats->tx_bytes += bytes;
+	stats->tx_packets += packets;
+	u64_stats_update_end(&stats->tx_syncp);
+}
+
 static int virtnet_poll(struct napi_struct *napi, int budget)
 {
 	struct receive_queue *rq =
@@ -1077,36 +1107,6 @@ static int virtnet_open(struct net_device *dev)
 	return 0;
 }
 
-static void free_old_xmit_skbs(struct send_queue *sq)
-{
-	struct sk_buff *skb;
-	unsigned int len;
-	struct virtnet_info *vi = sq->vq->vdev->priv;
-	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
-	unsigned int packets = 0;
-	unsigned int bytes = 0;
-
-	while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
-		pr_debug("Sent skb %p\n", skb);
-
-		bytes += skb->len;
-		packets++;
-
-		dev_kfree_skb_any(skb);
-	}
-
-	/* Avoid overhead when no packets have been processed
-	 * happens when called speculatively from start_xmit.
-	 */
-	if (!packets)
-		return;
-
-	u64_stats_update_begin(&stats->tx_syncp);
-	stats->tx_bytes += bytes;
-	stats->tx_packets += packets;
-	u64_stats_update_end(&stats->tx_syncp);
-}
-
 static int virtnet_poll_tx(struct napi_struct *napi, int budget)
 {
 	struct send_queue *sq = container_of(napi, struct send_queue, napi);
-- 
2.12.2.816.g281164-goog
[PATCH net-next v3 1/5] virtio-net: napi helper functions
From: Willem de Bruijn

Prepare virtio-net for tx napi by converting existing napi code to
use helper functions. This also deduplicates some logic.

Signed-off-by: Willem de Bruijn
Signed-off-by: Jason Wang
---
 drivers/net/virtio_net.c | 65 ++--
 1 file changed, 35 insertions(+), 30 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 666ada6130ab..b9c1df29892c 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -239,6 +239,26 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
 	return p;
 }
 
+static void virtqueue_napi_schedule(struct napi_struct *napi,
+				    struct virtqueue *vq)
+{
+	if (napi_schedule_prep(napi)) {
+		virtqueue_disable_cb(vq);
+		__napi_schedule(napi);
+	}
+}
+
+static void virtqueue_napi_complete(struct napi_struct *napi,
+				    struct virtqueue *vq, int processed)
+{
+	int opaque;
+
+	opaque = virtqueue_enable_cb_prepare(vq);
+	if (napi_complete_done(napi, processed) &&
+	    unlikely(virtqueue_poll(vq, opaque)))
+		virtqueue_napi_schedule(napi, vq);
+}
+
 static void skb_xmit_done(struct virtqueue *vq)
 {
 	struct virtnet_info *vi = vq->vdev->priv;
@@ -936,27 +956,20 @@ static void skb_recv_done(struct virtqueue *rvq)
 	struct virtnet_info *vi = rvq->vdev->priv;
 	struct receive_queue *rq = &vi->rq[vq2rxq(rvq)];
 
-	/* Schedule NAPI, Suppress further interrupts if successful. */
-	if (napi_schedule_prep(&rq->napi)) {
-		virtqueue_disable_cb(rvq);
-		__napi_schedule(&rq->napi);
-	}
+	virtqueue_napi_schedule(&rq->napi, rvq);
 }
 
-static void virtnet_napi_enable(struct receive_queue *rq)
+static void virtnet_napi_enable(struct virtqueue *vq, struct napi_struct *napi)
 {
-	napi_enable(&rq->napi);
+	napi_enable(napi);
 
 	/* If all buffers were filled by other side before we napi_enabled, we
-	 * won't get another interrupt, so process any outstanding packets
-	 * now.  virtnet_poll wants re-enable the queue, so we disable here.
-	 * We synchronize against interrupts via NAPI_STATE_SCHED */
-	if (napi_schedule_prep(&rq->napi)) {
-		virtqueue_disable_cb(rq->vq);
-		local_bh_disable();
-		__napi_schedule(&rq->napi);
-		local_bh_enable();
-	}
+	 * won't get another interrupt, so process any outstanding packets now.
+	 * Call local_bh_enable after to trigger softIRQ processing.
+	 */
+	local_bh_disable();
+	virtqueue_napi_schedule(napi, vq);
+	local_bh_enable();
 }
 
 static void refill_work(struct work_struct *work)
@@ -971,7 +984,7 @@ static void refill_work(struct work_struct *work)
 		napi_disable(&rq->napi);
 		still_empty = !try_fill_recv(vi, rq, GFP_KERNEL);
-		virtnet_napi_enable(rq);
+		virtnet_napi_enable(rq->vq, &rq->napi);
 
 		/* In theory, this can happen: if we don't get any buffers in
 		 * we will *never* try to fill again.
@@ -1011,21 +1024,13 @@ static int virtnet_poll(struct napi_struct *napi, int budget)
 {
 	struct receive_queue *rq =
 		container_of(napi, struct receive_queue, napi);
-	unsigned int r, received;
+	unsigned int received;
 
 	received = virtnet_receive(rq, budget);
 
 	/* Out of packets? */
-	if (received < budget) {
-		r = virtqueue_enable_cb_prepare(rq->vq);
-		if (napi_complete_done(napi, received)) {
-			if (unlikely(virtqueue_poll(rq->vq, r)) &&
-			    napi_schedule_prep(napi)) {
-				virtqueue_disable_cb(rq->vq);
-				__napi_schedule(napi);
-			}
-		}
-	}
+	if (received < budget)
+		virtqueue_napi_complete(napi, rq->vq, received);
 
 	return received;
 }
@@ -1040,7 +1045,7 @@ static int virtnet_open(struct net_device *dev)
 		/* Make sure we have some buffers: if oom use wq. */
 		if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
 			schedule_delayed_work(&vi->refill, 0);
-		virtnet_napi_enable(&vi->rq[i]);
+		virtnet_napi_enable(vi->rq[i].vq, &vi->rq[i].napi);
 	}
 
 	return 0;
@@ -1747,7 +1752,7 @@ static int virtnet_restore_up(struct virtio_device *vdev)
 			schedule_delayed_work(&vi->refill, 0);
 
 		for (i = 0; i < vi->max_queue_pairs; i++)
-			virtnet_napi_enable(&vi->rq[i]);
+			virtnet_napi_enable(vi->rq[i].vq, &vi->rq[i].napi);
 	}
 
 	netif_device_attach(vi->dev);
--
Re: [PATCH net-next v2 2/5] virtio-net: transmit napi
On Mon, Apr 24, 2017 at 12:40 PM, Michael S. Tsirkin wrote:
> On Fri, Apr 21, 2017 at 10:50:12AM -0400, Willem de Bruijn wrote:
>> >>> Maybe I was wrong, but according to Michael's comment it looks like
>> >>> he want check affinity_hint_set just for speculative tx polling on
>> >>> rx napi instead of disabling it at all.
>> >>>
>> >>> And I'm not convinced this is really needed, driver only provide
>> >>> affinity hint instead of affinity, so it's not guaranteed that tx
>> >>> and rx interrupt are in the same vcpus.
>> >>
>> >> You're right. I made the restriction broader than the request, to
>> >> really err on the side of caution for the initial merge of napi tx.
>> >> And enabling the optimization is always a win over keeping it off,
>> >> even without irq affinity.
>> >>
>> >> The cycle cost is significant without affinity regardless of whether
>> >> the optimization is used.
>> >
>> > Yes, I noticed this in the past too.
>> >
>> >> Though this is not limited to napi-tx, it is more
>> >> pronounced in that mode than without napi.
>> >>
>> >> 1x TCP_RR for affinity configuration {process, rx_irq, tx_irq}:
>> >>
>> >> upstream:
>> >>
>> >> 1,1,1: 28985 Mbps, 278 Gcyc
>> >> 1,0,2: 30067 Mbps, 402 Gcyc
>> >>
>> >> napi tx:
>> >>
>> >> 1,1,1: 34492 Mbps, 269 Gcyc
>> >> 1,0,2: 36527 Mbps, 537 Gcyc (!)
>> >> 1,0,1: 36269 Mbps, 394 Gcyc
>> >> 1,0,0: 34674 Mbps, 402 Gcyc
>> >>
>> >> This is a particularly strong example. It is also representative
>> >> of most RR tests. It is less pronounced in other streaming tests.
>> >> 10x TCP_RR, for instance:
>> >>
>> >> upstream:
>> >>
>> >> 1,1,1: 42267 Mbps, 301 Gcyc
>> >> 1,0,2: 40663 Mbps, 445 Gcyc
>> >>
>> >> napi tx:
>> >>
>> >> 1,1,1: 42420 Mbps, 303 Gcyc
>> >> 1,0,2: 42267 Mbps, 431 Gcyc
>> >>
>> >> These numbers were obtained with the virtqueue_enable_cb_delayed
>> >> optimization after xmit_skb, btw. It turns out that moving that
>> >> before increases 1x TCP_RR further to ~39 Gbps, at the cost of
>> >> reducing 100x TCP_RR a bit.
>> >
>> > I see, so I think we can leave the affinity hint optimization/check
>> > for future investigation:
>> >
>> > - to avoid endless optimization (e.g we may want to share a single
>> >   vector/napi for tx/rx queue pairs in the future) for this series.
>> > - tx napi is disabled by default which means we can do optimization
>> >   on top.
>>
>> Okay. I'll drop the vi->affinity_hint_set from the patch set for now.
>
> I kind of like it, let's be conservative. But I'd prefer a comment
> near it explaining why it's there.

I don't feel strongly. Was minutes away from sending a v3 with this
code reverted, but I'll reinstate it and add a comment. Other planned
changes based on Jason's feedback to v2:

v2 -> v3:
  - convert __netif_tx_trylock to __netif_tx_lock on tx napi poll
      ensure that the handler always cleans, to avoid deadlock
  - unconditionally clean in start_xmit
      avoid adding an unnecessary "if (use_napi)" branch
  - remove virtqueue_disable_cb in patch 5/5
      a noop in the common event_idx based loop
Re: [PATCH net-next v2 5/5] virtio-net: keep tx interrupts disabled unless kick
On Thu, Apr 20, 2017 at 10:03 AM, Willem de Bruijn wrote:
>>> -	if (!use_napi)
>>> +	if (use_napi) {
>>> +		if (kick)
>>> +			virtqueue_enable_cb_delayed(sq->vq);
>>> +		else
>>> +			virtqueue_disable_cb(sq->vq);
>>
>> Since virtqueue_disable_cb() do nothing for event idx. I wonder whether
>> or not just calling enable_cb_delayed() is ok here.
>
> Good point.
>
>> Btw, it does not disable interrupt at all, I propose a patch in the past
>> which can do more than this:
>>
>> https://patchwork.kernel.org/patch/6472601/
>
> Interesting. Yes, let me evaluate that variant.

In initial tests I don't see a significant change, but we can look
into this more closely as a follow-on patch.
Re: [PATCH net-next v2 2/5] virtio-net: transmit napi
On Thu, Apr 20, 2017 at 12:02 PM, Willem de Bruijn wrote:
>>> static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
>>> {
>>> 	struct virtio_net_hdr_mrg_rxbuf *hdr;
>>> @@ -1130,9 +1172,11 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>>> 	int err;
>>> 	struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);
>>> 	bool kick = !skb->xmit_more;
>>> +	bool use_napi = sq->napi.weight;
>>>
>>> 	/* Free up any pending old buffers before queueing new ones. */
>>> -	free_old_xmit_skbs(sq);
>>> +	if (!use_napi)
>>> +		free_old_xmit_skbs(sq);
>>
>> I'm not sure this is best or even correct. Consider we clean xmit
>> packets speculatively in virtnet_poll_tx(), we need call
>> free_old_xmit_skbs() unconditionally. This can also help to reduce the
>> possible of napi rescheduling in virtnet_poll_tx().
>
> Because of the use of trylock there. Absolutely, thanks! Perhaps I
> should only use trylock in the opportunistic clean path from the rx
> softirq and full locking in the tx softirq.
>
> I previously observed that cleaning here would, counterintuitively,
> reduce efficiency. It reverted the improvements of cleaning transmit
> completions from the receive softirq. Going through my data, I did
> not observe this regression anymore on the latest patchset.
>
> Let me test again, with and without the new
> virtqueue_enable_cb_delayed patch. Perhaps that made a
> difference.

Neither cleaning in start_xmit nor converting the napi tx trylock to
lock shows a significant impact on loadtests, whether cpu affine or
not. I'll make both changes, as the first reduces patch size and code
complexity and the second is a more obviously correct codepath than
with trylock.

To be clear, the variant called from the rx napi handler will still
opportunistically use trylock.
Re: [PATCH net-next v2 2/5] virtio-net: transmit napi
>>> Maybe I was wrong, but according to Michael's comment it looks like
>>> he want check affinity_hint_set just for speculative tx polling on rx
>>> napi instead of disabling it at all.
>>>
>>> And I'm not convinced this is really needed, driver only provide
>>> affinity hint instead of affinity, so it's not guaranteed that tx and
>>> rx interrupt are in the same vcpus.
>>
>> You're right. I made the restriction broader than the request, to
>> really err on the side of caution for the initial merge of napi tx.
>> And enabling the optimization is always a win over keeping it off,
>> even without irq affinity.
>>
>> The cycle cost is significant without affinity regardless of whether
>> the optimization is used.
>
> Yes, I noticed this in the past too.
>
>> Though this is not limited to napi-tx, it is more
>> pronounced in that mode than without napi.
>>
>> 1x TCP_RR for affinity configuration {process, rx_irq, tx_irq}:
>>
>> upstream:
>>
>> 1,1,1: 28985 Mbps, 278 Gcyc
>> 1,0,2: 30067 Mbps, 402 Gcyc
>>
>> napi tx:
>>
>> 1,1,1: 34492 Mbps, 269 Gcyc
>> 1,0,2: 36527 Mbps, 537 Gcyc (!)
>> 1,0,1: 36269 Mbps, 394 Gcyc
>> 1,0,0: 34674 Mbps, 402 Gcyc
>>
>> This is a particularly strong example. It is also representative
>> of most RR tests. It is less pronounced in other streaming tests.
>> 10x TCP_RR, for instance:
>>
>> upstream:
>>
>> 1,1,1: 42267 Mbps, 301 Gcyc
>> 1,0,2: 40663 Mbps, 445 Gcyc
>>
>> napi tx:
>>
>> 1,1,1: 42420 Mbps, 303 Gcyc
>> 1,0,2: 42267 Mbps, 431 Gcyc
>>
>> These numbers were obtained with the virtqueue_enable_cb_delayed
>> optimization after xmit_skb, btw. It turns out that moving that before
>> increases 1x TCP_RR further to ~39 Gbps, at the cost of reducing
>> 100x TCP_RR a bit.
>
> I see, so I think we can leave the affinity hint optimization/check for
> future investigation:
>
> - to avoid endless optimization (e.g we may want to share a single
>   vector/napi for tx/rx queue pairs in the future) for this series.
> - tx napi is disabled by default which means we can do optimization
>   on top.

Okay. I'll drop the vi->affinity_hint_set from the patch set for now.
[PATCH v2] net: virtio_net: use new api ethtool_{get|set}_link_ksettings
The ethtool api {get|set}_settings is deprecated. We move this driver
to new api {get|set}_link_ksettings.

Signed-off-by: Philippe Reynes
---
Changelog:
v2:
- remove comment about the missing hardware, I've tested this change
  with qemu

 drivers/net/virtio_net.c | 50 +++--
 1 files changed, 30 insertions(+), 20 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index ea9890d..b0d241d 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1636,47 +1636,57 @@ static void virtnet_get_channels(struct net_device *dev,
 }
 
 /* Check if the user is trying to change anything besides speed/duplex */
-static bool virtnet_validate_ethtool_cmd(const struct ethtool_cmd *cmd)
+static bool
+virtnet_validate_ethtool_cmd(const struct ethtool_link_ksettings *cmd)
 {
-	struct ethtool_cmd diff1 = *cmd;
-	struct ethtool_cmd diff2 = {};
+	struct ethtool_link_ksettings diff1 = *cmd;
+	struct ethtool_link_ksettings diff2 = {};
 
 	/* cmd is always set so we need to clear it, validate the port type
 	 * and also without autonegotiation we can ignore advertising
 	 */
-	ethtool_cmd_speed_set(&diff1, 0);
-	diff2.port = PORT_OTHER;
-	diff1.advertising = 0;
-	diff1.duplex = 0;
-	diff1.cmd = 0;
+	diff1.base.speed = 0;
+	diff2.base.port = PORT_OTHER;
+	ethtool_link_ksettings_zero_link_mode(&diff1, advertising);
+	diff1.base.duplex = 0;
+	diff1.base.cmd = 0;
+	diff1.base.link_mode_masks_nwords = 0;
 
-	return !memcmp(&diff1, &diff2, sizeof(diff1));
+	return !memcmp(&diff1.base, &diff2.base, sizeof(diff1.base)) &&
+		bitmap_empty(diff1.link_modes.supported,
+			     __ETHTOOL_LINK_MODE_MASK_NBITS) &&
+		bitmap_empty(diff1.link_modes.advertising,
+			     __ETHTOOL_LINK_MODE_MASK_NBITS) &&
+		bitmap_empty(diff1.link_modes.lp_advertising,
+			     __ETHTOOL_LINK_MODE_MASK_NBITS);
 }
 
-static int virtnet_set_settings(struct net_device *dev, struct ethtool_cmd *cmd)
+static int virtnet_set_link_ksettings(struct net_device *dev,
+				      const struct ethtool_link_ksettings *cmd)
 {
 	struct virtnet_info *vi = netdev_priv(dev);
 	u32 speed;
 
-	speed = ethtool_cmd_speed(cmd);
+	speed = cmd->base.speed;
 	/* don't allow custom speed and duplex */
 	if (!ethtool_validate_speed(speed) ||
-	    !ethtool_validate_duplex(cmd->duplex) ||
+	    !ethtool_validate_duplex(cmd->base.duplex) ||
 	    !virtnet_validate_ethtool_cmd(cmd))
 		return -EINVAL;
 	vi->speed = speed;
-	vi->duplex = cmd->duplex;
+	vi->duplex = cmd->base.duplex;
 
 	return 0;
 }
 
-static int virtnet_get_settings(struct net_device *dev, struct ethtool_cmd *cmd)
+static int virtnet_get_link_ksettings(struct net_device *dev,
+				      struct ethtool_link_ksettings *cmd)
 {
 	struct virtnet_info *vi = netdev_priv(dev);
 
-	ethtool_cmd_speed_set(cmd, vi->speed);
-	cmd->duplex = vi->duplex;
-	cmd->port = PORT_OTHER;
+	cmd->base.speed = vi->speed;
+	cmd->base.duplex = vi->duplex;
+	cmd->base.port = PORT_OTHER;
 
 	return 0;
 }
@@ -1696,8 +1706,8 @@ static void virtnet_init_settings(struct net_device *dev)
 	.set_channels = virtnet_set_channels,
 	.get_channels = virtnet_get_channels,
 	.get_ts_info = ethtool_op_get_ts_info,
-	.get_settings = virtnet_get_settings,
-	.set_settings = virtnet_set_settings,
+	.get_link_ksettings = virtnet_get_link_ksettings,
+	.set_link_ksettings = virtnet_set_link_ksettings,
 };
 
 static void virtnet_freeze_down(struct virtio_device *vdev)
-- 
1.7.4.4
Re: [PATCH net-next v2 2/5] virtio-net: transmit napi
>> static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
>> {
>> 	struct virtio_net_hdr_mrg_rxbuf *hdr;
>> @@ -1130,9 +1172,11 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>> 	int err;
>> 	struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);
>> 	bool kick = !skb->xmit_more;
>> +	bool use_napi = sq->napi.weight;
>>
>> 	/* Free up any pending old buffers before queueing new ones. */
>> -	free_old_xmit_skbs(sq);
>> +	if (!use_napi)
>> +		free_old_xmit_skbs(sq);
>
> I'm not sure this is best or even correct. Consider we clean xmit packets
> speculatively in virtnet_poll_tx(), we need to call free_old_xmit_skbs()
> unconditionally. This can also help to reduce the possibility of napi
> rescheduling in virtnet_poll_tx(), because of the use of trylock there.

Absolutely, thanks! Perhaps I should only use trylock in the opportunistic
clean path from the rx softirq, and full locking in the tx softirq.

I previously observed that cleaning here would, counterintuitively, reduce
efficiency. It reverted the improvements of cleaning transmit completions
from the receive softirq. Going through my data, I did not observe this
regression anymore on the latest patchset. Let me test again, with and
without the new virtqueue_enable_cb_delayed patch. Perhaps that made a
difference.
Re: [PATCH net-next v2 5/5] virtio-net: keep tx interrupts disabled unless kick
>> -	if (!use_napi)
>> +	if (use_napi) {
>> +		if (kick)
>> +			virtqueue_enable_cb_delayed(sq->vq);
>> +		else
>> +			virtqueue_disable_cb(sq->vq);
>
> Since virtqueue_disable_cb() does nothing for event idx, I wonder
> whether or not just calling virtqueue_enable_cb_delayed() is ok here.

Good point.

> Btw, it does not disable interrupts at all. I proposed a patch in the
> past which can do more than this:
>
> https://patchwork.kernel.org/patch/6472601/

Interesting. Yes, let me evaluate that variant. Thanks for reviewing,

Willem
[PATCH net-next v2 5/5] virtio-net: keep tx interrupts disabled unless kick
From: Willem de Bruijn

Tx napi mode increases the rate of transmit interrupts. Suppress some
by masking interrupts while more packets are expected. The interrupts
will be reenabled before the last packet is sent.

This optimization reduces the throughput drop with tx napi for
unidirectional flows such as UDP_STREAM that do not benefit from
cleaning tx completions in the receive napi handler.

Signed-off-by: Willem de Bruijn
---
 drivers/net/virtio_net.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index b14c82ce0032..b107ae011632 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1196,8 +1196,14 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 	bool use_napi = sq->napi.weight;
 
 	/* Free up any pending old buffers before queueing new ones. */
-	if (!use_napi)
+	if (use_napi) {
+		if (kick)
+			virtqueue_enable_cb_delayed(sq->vq);
+		else
+			virtqueue_disable_cb(sq->vq);
+	} else {
 		free_old_xmit_skbs(sq);
+	}
 
 	/* timestamp packet in software */
 	skb_tx_timestamp(skb);
--
2.12.2.816.g281164-goog
[PATCH net-next v2 4/5] virtio-net: clean tx descriptors from rx napi
From: Willem de Bruijn

Amortize the cost of virtual interrupts by doing both rx and tx work
on reception of a receive interrupt if tx napi is enabled. With
VIRTIO_F_EVENT_IDX, this suppresses most explicit tx completion
interrupts for bidirectional workloads.

Signed-off-by: Willem de Bruijn
---
 drivers/net/virtio_net.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 81a98de864e6..b14c82ce0032 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1072,12 +1072,33 @@ static void free_old_xmit_skbs(struct send_queue *sq)
 	u64_stats_update_end(&stats->tx_syncp);
 }
 
+static void virtnet_poll_cleantx(struct receive_queue *rq)
+{
+	struct virtnet_info *vi = rq->vq->vdev->priv;
+	unsigned int index = vq2rxq(rq->vq);
+	struct send_queue *sq = &vi->sq[index];
+	struct netdev_queue *txq = netdev_get_tx_queue(vi->dev, index);
+
+	if (!sq->napi.weight)
+		return;
+
+	if (__netif_tx_trylock(txq)) {
+		free_old_xmit_skbs(sq);
+		__netif_tx_unlock(txq);
+	}
+
+	if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
+		netif_tx_wake_queue(txq);
+}
+
 static int virtnet_poll(struct napi_struct *napi, int budget)
 {
 	struct receive_queue *rq =
 		container_of(napi, struct receive_queue, napi);
 	unsigned int received;
 
+	virtnet_poll_cleantx(rq);
+
 	received = virtnet_receive(rq, budget);
 
 	/* Out of packets? */
--
2.12.2.816.g281164-goog
Re: [PATCH net-next v2 2/5] virtio-net: transmit napi
On Thu, Apr 20, 2017 at 2:27 AM, Jason Wang wrote:
>
> On 2017年04月19日 04:21, Willem de Bruijn wrote:
>>
>> +static void virtnet_napi_tx_enable(struct virtnet_info *vi,
>> +				   struct virtqueue *vq,
>> +				   struct napi_struct *napi)
>> +{
>> +	if (!napi->weight)
>> +		return;
>> +
>> +	if (!vi->affinity_hint_set) {
>> +		napi->weight = 0;
>> +		return;
>> +	}
>> +
>> +	return virtnet_napi_enable(vq, napi);
>> +}
>> +
>> static void refill_work(struct work_struct *work)
>
> Maybe I was wrong, but according to Michael's comment it looks like he
> wants to check affinity_hint_set just for speculative tx polling on rx
> napi instead of disabling it at all.
>
> And I'm not convinced this is really needed: the driver only provides an
> affinity hint instead of affinity, so it's not guaranteed that tx and rx
> interrupts are on the same vcpus.

You're right. I made the restriction broader than the request, to really
err on the side of caution for the initial merge of napi tx. And enabling
the optimization is always a win over keeping it off, even without irq
affinity.

The cycle cost is significant without affinity regardless of whether the
optimization is used. Though this is not limited to napi-tx, it is more
pronounced in that mode than without napi.

1x TCP_RR for affinity configuration {process, rx_irq, tx_irq}:

upstream:
  1,1,1: 28985 Mbps, 278 Gcyc
  1,0,2: 30067 Mbps, 402 Gcyc

napi tx:
  1,1,1: 34492 Mbps, 269 Gcyc
  1,0,2: 36527 Mbps, 537 Gcyc (!)
  1,0,1: 36269 Mbps, 394 Gcyc
  1,0,0: 34674 Mbps, 402 Gcyc

This is a particularly strong example. It is also representative of most
RR tests. It is less pronounced in other streaming tests. 10x TCP_RR, for
instance:

upstream:
  1,1,1: 42267 Mbps, 301 Gcyc
  1,0,2: 40663 Mbps, 445 Gcyc

napi tx:
  1,1,1: 42420 Mbps, 303 Gcyc
  1,0,2: 42267 Mbps, 431 Gcyc

These numbers were obtained with the virtqueue_enable_cb_delayed
optimization after xmit_skb, btw. It turns out that moving that before
increases 1x TCP_RR further to ~39 Gbps, at the cost of reducing 100x
TCP_RR a bit.
[PATCH net-next v2 3/5] virtio-net: move free_old_xmit_skbs
From: Willem de Bruijn

An upcoming patch will call free_old_xmit_skbs indirectly from
virtnet_poll. Move the function above this to avoid having to introduce
a forward declaration.

This is a pure move: no code changes.

Signed-off-by: Willem de Bruijn
---
 drivers/net/virtio_net.c | 60 ++++++++++++++++++++++----------------------
 1 file changed, 30 insertions(+), 30 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index c173e85dc7b8..81a98de864e6 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1042,6 +1042,36 @@ static int virtnet_receive(struct receive_queue *rq, int budget)
 	return received;
 }
 
+static void free_old_xmit_skbs(struct send_queue *sq)
+{
+	struct sk_buff *skb;
+	unsigned int len;
+	struct virtnet_info *vi = sq->vq->vdev->priv;
+	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
+	unsigned int packets = 0;
+	unsigned int bytes = 0;
+
+	while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
+		pr_debug("Sent skb %p\n", skb);
+
+		bytes += skb->len;
+		packets++;
+
+		dev_kfree_skb_any(skb);
+	}
+
+	/* Avoid overhead when no packets have been processed
+	 * happens when called speculatively from start_xmit.
+	 */
+	if (!packets)
+		return;
+
+	u64_stats_update_begin(&stats->tx_syncp);
+	stats->tx_bytes += bytes;
+	stats->tx_packets += packets;
+	u64_stats_update_end(&stats->tx_syncp);
+}
+
 static int virtnet_poll(struct napi_struct *napi, int budget)
 {
 	struct receive_queue *rq =
@@ -1074,36 +1104,6 @@ static int virtnet_open(struct net_device *dev)
 	return 0;
 }
 
-static void free_old_xmit_skbs(struct send_queue *sq)
-{
-	struct sk_buff *skb;
-	unsigned int len;
-	struct virtnet_info *vi = sq->vq->vdev->priv;
-	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
-	unsigned int packets = 0;
-	unsigned int bytes = 0;
-
-	while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
-		pr_debug("Sent skb %p\n", skb);
-
-		bytes += skb->len;
-		packets++;
-
-		dev_kfree_skb_any(skb);
-	}
-
-	/* Avoid overhead when no packets have been processed
-	 * happens when called speculatively from start_xmit.
-	 */
-	if (!packets)
-		return;
-
-	u64_stats_update_begin(&stats->tx_syncp);
-	stats->tx_bytes += bytes;
-	stats->tx_packets += packets;
-	u64_stats_update_end(&stats->tx_syncp);
-}
-
 static int virtnet_poll_tx(struct napi_struct *napi, int budget)
 {
 	struct send_queue *sq = container_of(napi, struct send_queue, napi);
--
2.12.2.816.g281164-goog
[PATCH net-next v2 1/5] virtio-net: napi helper functions
From: Willem de Bruijn

Prepare virtio-net for tx napi by converting existing napi code to use
helper functions. This also deduplicates some logic.

Signed-off-by: Willem de Bruijn
Signed-off-by: Jason Wang
---
 drivers/net/virtio_net.c | 65 ++++++++++++++++++++++--------------------
 1 file changed, 35 insertions(+), 30 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 666ada6130ab..b9c1df29892c 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -239,6 +239,26 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
 	return p;
 }
 
+static void virtqueue_napi_schedule(struct napi_struct *napi,
+				    struct virtqueue *vq)
+{
+	if (napi_schedule_prep(napi)) {
+		virtqueue_disable_cb(vq);
+		__napi_schedule(napi);
+	}
+}
+
+static void virtqueue_napi_complete(struct napi_struct *napi,
+				    struct virtqueue *vq, int processed)
+{
+	int opaque;
+
+	opaque = virtqueue_enable_cb_prepare(vq);
+	if (napi_complete_done(napi, processed) &&
+	    unlikely(virtqueue_poll(vq, opaque)))
+		virtqueue_napi_schedule(napi, vq);
+}
+
 static void skb_xmit_done(struct virtqueue *vq)
 {
 	struct virtnet_info *vi = vq->vdev->priv;
@@ -936,27 +956,20 @@ static void skb_recv_done(struct virtqueue *rvq)
 	struct virtnet_info *vi = rvq->vdev->priv;
 	struct receive_queue *rq = &vi->rq[vq2rxq(rvq)];
 
-	/* Schedule NAPI, Suppress further interrupts if successful. */
-	if (napi_schedule_prep(&rq->napi)) {
-		virtqueue_disable_cb(rvq);
-		__napi_schedule(&rq->napi);
-	}
+	virtqueue_napi_schedule(&rq->napi, rvq);
 }
 
-static void virtnet_napi_enable(struct receive_queue *rq)
+static void virtnet_napi_enable(struct virtqueue *vq, struct napi_struct *napi)
 {
-	napi_enable(&rq->napi);
+	napi_enable(napi);
 
 	/* If all buffers were filled by other side before we napi_enabled, we
-	 * won't get another interrupt, so process any outstanding packets
-	 * now.  virtnet_poll wants re-enable the queue, so we disable here.
-	 * We synchronize against interrupts via NAPI_STATE_SCHED */
-	if (napi_schedule_prep(&rq->napi)) {
-		virtqueue_disable_cb(rq->vq);
-		local_bh_disable();
-		__napi_schedule(&rq->napi);
-		local_bh_enable();
-	}
+	 * won't get another interrupt, so process any outstanding packets now.
+	 * Call local_bh_enable after to trigger softIRQ processing.
+	 */
+	local_bh_disable();
+	virtqueue_napi_schedule(napi, vq);
+	local_bh_enable();
 }
 
 static void refill_work(struct work_struct *work)
@@ -971,7 +984,7 @@ static void refill_work(struct work_struct *work)
 		napi_disable(&rq->napi);
 		still_empty = !try_fill_recv(vi, rq, GFP_KERNEL);
-		virtnet_napi_enable(rq);
+		virtnet_napi_enable(rq->vq, &rq->napi);
 
 		/* In theory, this can happen: if we don't get any buffers in
 		 * we will *never* try to fill again.
@@ -1011,21 +1024,13 @@ static int virtnet_poll(struct napi_struct *napi, int budget)
 {
 	struct receive_queue *rq =
 		container_of(napi, struct receive_queue, napi);
-	unsigned int r, received;
+	unsigned int received;
 
 	received = virtnet_receive(rq, budget);
 
 	/* Out of packets? */
-	if (received < budget) {
-		r = virtqueue_enable_cb_prepare(rq->vq);
-		if (napi_complete_done(napi, received)) {
-			if (unlikely(virtqueue_poll(rq->vq, r)) &&
-			    napi_schedule_prep(napi)) {
-				virtqueue_disable_cb(rq->vq);
-				__napi_schedule(napi);
-			}
-		}
-	}
+	if (received < budget)
+		virtqueue_napi_complete(napi, rq->vq, received);
 
 	return received;
 }
@@ -1040,7 +1045,7 @@ static int virtnet_open(struct net_device *dev)
 		/* Make sure we have some buffers: if oom use wq. */
 		if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
 			schedule_delayed_work(&vi->refill, 0);
-		virtnet_napi_enable(&vi->rq[i]);
+		virtnet_napi_enable(vi->rq[i].vq, &vi->rq[i].napi);
 	}
 
 	return 0;
@@ -1747,7 +1752,7 @@ static int virtnet_restore_up(struct virtio_device *vdev)
 			schedule_delayed_work(&vi->refill, 0);
 
 		for (i = 0; i < vi->max_queue_pairs; i++)
-			virtnet_napi_enable(&vi->rq[i]);
+			virtnet_napi_enable(vi->rq[i].vq, &vi->rq[i].napi);
 	}
 
 	netif_device_attach(vi->dev);
--
[PATCH net-next v2 0/5] virtio-net tx napi
From: Willem de Bruijn

Add napi for virtio-net transmit completion processing.

Changes:
v1 -> v2:
- disable by default
- disable unless affinity_hint_set
  because cache misses add up to a third higher cycle cost,
  e.g., in TCP_RR tests. This is not limited to the patch
  that enables tx completion cleaning in rx napi.
- use trylock to avoid contention between tx and rx napi
- keep interrupts masked during xmit_more (new patch 5/5)
  this improves cycles especially for multi UDP_STREAM, which
  does not benefit from cleaning tx completions on rx napi.
- move free_old_xmit_skbs (new patch 3/5)
  to avoid forward declaration

not changed:
- deduplicate virtnet_poll_tx and virtnet_poll_txclean
  they look similar, but differ too much to make it worthwhile.
- delay netif_wake_subqueue for more than 2 + MAX_SKB_FRAGS
  evaluated, but made no difference
- patch 1/5

RFC -> v1:
- dropped vhost interrupt moderation patch:
  not needed and likely expensive at light load
- remove tx napi weight
  - always clean all tx completions
  - use boolean to toggle tx-napi, instead
- only clean tx in rx if tx-napi is enabled
  - then clean tx before rx
- fix: add missing braces in virtnet_freeze_down
- testing: add 4KB TCP_RR + UDP test results

Based on previous patchsets by Jason Wang:

[RFC V7 PATCH 0/7] enable tx interrupts for virtio-net
http://lkml.iu.edu/hypermail/linux/kernel/1505.3/00245.html

Before commit b0c39dbdc204 ("virtio_net: don't free buffers in xmit
ring") the virtio-net driver would free transmitted packets on
transmission of new packets in ndo_start_xmit and, to catch the edge
case when no new packet is sent, also in a timer at 10HZ.

A timer can cause long stalls. VIRTIO_F_NOTIFY_ON_EMPTY avoids stalls
due to low free descriptor count. It does not address stalls due to
low socket SO_SNDBUF. Increasing timer frequency decreases that stall
time, but increases interrupt rate and, thus, cycle count.

Currently, with no timer, packets are freed only at ndo_start_xmit.
Latency of consume_skb is now unbounded. To avoid a deadlock if a sock reaches SO_SNDBUF, packets are orphaned on tx. This breaks TCP small queues. Reenable TCP small queues by removing the orphan. Instead of using a timer, convert the driver to regular tx napi. This does not have the unresolved stall issue and does not have any frequency to tune. By keeping interrupts enabled by default, napi increases tx interrupt rate. VIRTIO_F_EVENT_IDX avoids sending an interrupt if one is already unacknowledged, so makes this more feasible today. Combine that with an optimization that brings interrupt rate back in line with the existing version for most workloads: Tx completion cleaning on rx interrupts elides most explicit tx interrupts by relying on the fact that many rx interrupts fire. Tested by running {1, 10, 100} {TCP, UDP} STREAM, RR, 4K_RR benchmarks from a guest to a server on the host, on an x86_64 Haswell. The guest runs 4 vCPUs pinned to 4 cores. vhost and the test server are pinned to a core each. All results are the median of 5 runs, with variance well < 10%. Used neper (github.com/google/neper) as test process. Napi increases single stream throughput, but increases cycle cost. The optimizations bring this down. The previous patchset saw a regression with UDP_STREAM, which does not benefit from cleaning tx interrupts in rx napi. This regression is now gone for 10x, 100x. Remaining difference is higher 1x TCP_STREAM, lower 1x UDP_STREAM. The latest results are with process, rx napi and tx napi affine to the same core. All numbers are lower than the previous patchset. 
             upstream      napi
TCP_STREAM:
1x:
  Mbps          28985     34492
  Gcycles         278       269
10x:
  Mbps          42267     42420
  Gcycles         301       303
100x:
  Mbps          29710     31131
  Gcycles         286       282

TCP_RR Latency (us):
1x:
  p50              22        21
  p99              27        26
  Gcycles         176       171
10x:
  p50              40        40
  p99              51        51
  Gcycles         215       211
100x:
  p50             272       235
  p99             414       360
  Gcycles         228       233

TCP_RR 4K:
1x:
  p50              27        29
  p99              34        36
  Gcycles         184       170
10x:
  p50              67        66
  p99              82        82
  Gcycles         217       218
100x:
  p50             435       485
  p99             813       626
  Gcycles         242       226

UDP_STREAM:
1x:
  Mbps          29802     25600
  Gcycles         305       295
10x:
  Mbps          29901     29684
  Gcycles         287       295
100x:
  Mbps          29952     29863
  Gcycles         336       325

UDP_RR:
1x:
  p50              18        18
  p99              23        22
  Gcycles         183       181
[PATCH net-next v2 2/5] virtio-net: transmit napi
From: Willem de Bruijn

Convert virtio-net to a standard napi tx completion path. This enables
better TCP pacing using TCP small queues and increases single stream
throughput.

The virtio-net driver currently cleans tx descriptors on transmission
of new packets in ndo_start_xmit. Latency depends on new traffic, so is
unbounded. To avoid deadlock when a socket reaches its snd limit,
packets are orphaned on transmission. This breaks socket backpressure,
including TSQ.

Napi increases the number of interrupts generated compared to the
current model, which keeps interrupts disabled as long as the ring has
enough free descriptors. Keep tx napi optional and disabled for now.
Follow-on patches will reduce the interrupt cost.

Signed-off-by: Willem de Bruijn
Signed-off-by: Jason Wang
---
 drivers/net/virtio_net.c | 77 ++++++++++++++++++++++++++++++++++---------
 1 file changed, 67 insertions(+), 10 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index b9c1df29892c..c173e85dc7b8 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -33,9 +33,10 @@
 static int napi_weight = NAPI_POLL_WEIGHT;
 module_param(napi_weight, int, 0444);
 
-static bool csum = true, gso = true;
+static bool csum = true, gso = true, napi_tx;
 module_param(csum, bool, 0444);
 module_param(gso, bool, 0444);
+module_param(napi_tx, bool, 0644);
 
 /* FIXME: MTU in config. */
 #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
@@ -86,6 +87,8 @@ struct send_queue {
 
 	/* Name of the send queue: output.$index */
 	char name[40];
+
+	struct napi_struct napi;
 };
 
 /* Internal representation of a receive virtqueue */
@@ -262,12 +265,16 @@ static void virtqueue_napi_complete(struct napi_struct *napi,
 static void skb_xmit_done(struct virtqueue *vq)
 {
 	struct virtnet_info *vi = vq->vdev->priv;
+	struct napi_struct *napi = &vi->sq[vq2txq(vq)].napi;
 
 	/* Suppress further interrupts. */
 	virtqueue_disable_cb(vq);
 
-	/* We were probably waiting for more output buffers. */
-	netif_wake_subqueue(vi->dev, vq2txq(vq));
+	if (napi->weight)
+		virtqueue_napi_schedule(napi, vq);
+	else
+		/* We were probably waiting for more output buffers. */
+		netif_wake_subqueue(vi->dev, vq2txq(vq));
 }
 
 static unsigned int mergeable_ctx_to_buf_truesize(unsigned long mrg_ctx)
@@ -972,6 +979,21 @@ static void virtnet_napi_enable(struct virtqueue *vq, struct napi_struct *napi)
 	local_bh_enable();
 }
 
+static void virtnet_napi_tx_enable(struct virtnet_info *vi,
+				   struct virtqueue *vq,
+				   struct napi_struct *napi)
+{
+	if (!napi->weight)
+		return;
+
+	if (!vi->affinity_hint_set) {
+		napi->weight = 0;
+		return;
+	}
+
+	return virtnet_napi_enable(vq, napi);
+}
+
 static void refill_work(struct work_struct *work)
 {
 	struct virtnet_info *vi =
@@ -1046,6 +1068,7 @@ static int virtnet_open(struct net_device *dev)
 		if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
 			schedule_delayed_work(&vi->refill, 0);
 		virtnet_napi_enable(vi->rq[i].vq, &vi->rq[i].napi);
+		virtnet_napi_tx_enable(vi, vi->sq[i].vq, &vi->sq[i].napi);
 	}
 
 	return 0;
@@ -1081,6 +1104,25 @@ static void free_old_xmit_skbs(struct send_queue *sq)
 	u64_stats_update_end(&stats->tx_syncp);
 }
 
+static int virtnet_poll_tx(struct napi_struct *napi, int budget)
+{
+	struct send_queue *sq = container_of(napi, struct send_queue, napi);
+	struct virtnet_info *vi = sq->vq->vdev->priv;
+	struct netdev_queue *txq = netdev_get_tx_queue(vi->dev, vq2txq(sq->vq));
+
+	if (__netif_tx_trylock(txq)) {
+		free_old_xmit_skbs(sq);
+		__netif_tx_unlock(txq);
+	}
+
+	virtqueue_napi_complete(napi, sq->vq, 0);
+
+	if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
+		netif_tx_wake_queue(txq);
+
+	return 0;
+}
+
 static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
 {
 	struct virtio_net_hdr_mrg_rxbuf *hdr;
@@ -1130,9 +1172,11 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 	int err;
 	struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);
 	bool kick = !skb->xmit_more;
+	bool use_napi = sq->napi.weight;
 
 	/* Free up any pending old buffers before queueing new ones. */
-	free_old_xmit_skbs(sq);
+	if (!use_napi)
+		free_old_xmit_skbs(sq);
 
 	/* timestamp packet in software */
 	skb_tx_timestamp(skb);
@@ -1152,8 +1196,10 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 	}
 
 	/* Don't wait up for transmitted skbs to be freed. */
-	skb_orphan(skb);
-
[PATCH] tools/virtio: fix build breakage
Commit db932ced55cf ("virtio: add context flag to find vqs") added a new
'context' flag to vring_new_virtqueue(), but the corresponding API in
tools/virtio/ is not updated, causing build errors due to conflicting
declarations.

Bring code in tools/virtio in sync with that in kernel. I have used
'false' for the value of the new boolean 'context' flag as that seems
to be the best way to preserve existing behavior.

Tested with:

$ make -C tools/virtio clean all ARCH=x86

Fixes: db932ced55cf ("virtio: add context flag to find vqs")
Signed-off-by: Sekhar Nori
---
I am not really a virtio user so I have not tested these apart from the
build test. Applies on linux-next of April 11 2017

 tools/virtio/linux/virtio.h | 1 +
 tools/virtio/virtio_test.c  | 2 +-
 tools/virtio/vringh_test.c  | 7 ++++---
 3 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/tools/virtio/linux/virtio.h b/tools/virtio/linux/virtio.h
index 9377c8b4ac16..d8f534025b7f 100644
--- a/tools/virtio/linux/virtio.h
+++ b/tools/virtio/linux/virtio.h
@@ -57,6 +57,7 @@ struct virtqueue *vring_new_virtqueue(unsigned int index,
 				      unsigned int vring_align,
 				      struct virtio_device *vdev,
 				      bool weak_barriers,
+				      bool ctx,
 				      void *pages,
 				      bool (*notify)(struct virtqueue *vq),
 				      void (*callback)(struct virtqueue *vq),
diff --git a/tools/virtio/virtio_test.c b/tools/virtio/virtio_test.c
index e0445898f08f..76d6f583c249 100644
--- a/tools/virtio/virtio_test.c
+++ b/tools/virtio/virtio_test.c
@@ -100,7 +100,7 @@ static void vq_info_add(struct vdev_info *dev, int num)
 	vring_init(&info->vring, num, info->ring, 4096);
 	info->vq = vring_new_virtqueue(info->idx, info->vring.num, 4096, &dev->vdev,
-				       true, info->ring,
+				       true, false, info->ring,
 				       vq_notify, vq_callback, "test");
 	assert(info->vq);
 	info->vq->priv = info;
diff --git a/tools/virtio/vringh_test.c b/tools/virtio/vringh_test.c
index 5f94f5105678..9476c616d064 100644
--- a/tools/virtio/vringh_test.c
+++ b/tools/virtio/vringh_test.c
@@ -314,7 +314,8 @@ static int parallel_test(u64 features,
 		err(1, "Could not set affinity to cpu %u", first_cpu);
 
 	vq = vring_new_virtqueue(0, RINGSIZE, ALIGN, &gvdev.vdev, true,
-				 guest_map, fast_vringh ? no_notify_host
+				 false, guest_map,
+				 fast_vringh ? no_notify_host
 					     : parallel_notify_host,
 				 never_callback_guest, "guest vq");
@@ -479,7 +480,7 @@ int main(int argc, char *argv[])
 	memset(__user_addr_min, 0, vring_size(RINGSIZE, ALIGN));
 
 	/* Set up guest side. */
-	vq = vring_new_virtqueue(0, RINGSIZE, ALIGN, &gvdev.vdev, true,
+	vq = vring_new_virtqueue(0, RINGSIZE, ALIGN, &gvdev.vdev, true, false,
 				 __user_addr_min,
 				 never_notify_host, never_callback_guest,
 				 "guest vq");
@@ -663,7 +664,7 @@ int main(int argc, char *argv[])
 	/* Force creation of direct, which we modify. */
 	__virtio_clear_bit(&gvdev.vdev, VIRTIO_RING_F_INDIRECT_DESC);
 	vq = vring_new_virtqueue(0, RINGSIZE, ALIGN, &gvdev.vdev, true,
-				 __user_addr_min,
+				 false, __user_addr_min,
 				 never_notify_host, never_callback_guest,
 				 "guest vq");
--
2.9.0
Need information on type 2 IOMMU
Hi All,

We have drivers/vfio/vfio_iommu_type1.c. What is a type1 iommu? Is the
name only meaningful with respect to the vfio layer? Is there a type2
IOMMU w.r.t. vfio? If so, what is it?

Regards,
Valmiki
Re: [PATCH net-next 3/3] virtio-net: clean tx descriptors from rx napi
On Fri, Apr 7, 2017 at 5:10 PM, Michael S. Tsirkin wrote:
> On Fri, Apr 07, 2017 at 04:59:58PM -0400, Willem de Bruijn wrote:
>> On Fri, Apr 7, 2017 at 3:28 PM, Michael S. Tsirkin wrote:
>> > On Mon, Apr 03, 2017 at 01:02:13AM -0400, Willem de Bruijn wrote:
>> >> On Sun, Apr 2, 2017 at 10:47 PM, Michael S. Tsirkin wrote:
>> >> > On Sun, Apr 02, 2017 at 04:10:12PM -0400, Willem de Bruijn wrote:
>> >> >> From: Willem de Bruijn
>> >> >>
>> >> >> Amortize the cost of virtual interrupts by doing both rx and tx work
>> >> >> on reception of a receive interrupt if tx napi is enabled. With
>> >> >> VIRTIO_F_EVENT_IDX, this suppresses most explicit tx completion
>> >> >> interrupts for bidirectional workloads.
>> >> >>
>> >> >> Signed-off-by: Willem de Bruijn
>> >
>> > This is a popular approach, but I think this will only work well if tx
>> > and rx interrupts are processed on the same CPU and if tx queue is per
>> > cpu. If they target different CPUs or if tx queue is used from multiple
>> > CPUs they will conflict on the shared locks.
>>
>> Yes. As a result of this discussion I started running a few vcpu
>> affinity tests.
>>
>> > This can even change dynamically as CPUs/queues are reconfigured.
>> > How about adding a flag and skipping the tx poll if there's no match?
>>
>> I suspect that even with the cache invalidations this optimization
>> will be an improvement over handling all tx interrupts in the tx napi
>> handler. I will get the datapoint for that.
>>
>> That said, we can make this conditional. What flag exactly do you
>> propose? Compare raw_smp_processor_id() in the rx softint with one
>> previously stored in the napi tx callback?
>
> I'm not sure. Another idea is to check vi->affinity_hint_set.
> If set we know rq and sq are on the same CPU.

I was not aware of that flag, thanks. Yes, that looks like it should work.
Re: [PATCH net-next 3/3] virtio-net: clean tx descriptors from rx napi
On Fri, Apr 7, 2017 at 3:28 PM, Michael S. Tsirkin wrote: > On Mon, Apr 03, 2017 at 01:02:13AM -0400, Willem de Bruijn wrote: >> On Sun, Apr 2, 2017 at 10:47 PM, Michael S. Tsirkin wrote: >> > On Sun, Apr 02, 2017 at 04:10:12PM -0400, Willem de Bruijn wrote: >> >> From: Willem de Bruijn >> >> >> >> Amortize the cost of virtual interrupts by doing both rx and tx work >> >> on reception of a receive interrupt if tx napi is enabled. With >> >> VIRTIO_F_EVENT_IDX, this suppresses most explicit tx completion >> >> interrupts for bidirectional workloads. >> >> >> >> Signed-off-by: Willem de Bruijn > > This is a popular approach, but I think this will only work well if tx > and rx interrupts are processed on the same CPU and if tx queue is per > cpu. If they target different CPUs or if tx queue is used from multiple > CPUs they will conflict on the shared locks. Yes. As a result of this discussion I started running a few vcpu affinity tests. The data is not complete. In particular, I don't have the data yet to compare having tx and rx irq on the same cpu (0,0) vs on different (0,2) for this patchset. Which is the relevant data to your point. Initial results for unmodified upstream driver at {1, 10, 100}x TCP_STREAM, for irq cpu affinity (rx,tx). Process is always pinned to cpu 1. This is a 4 vcpu system pinned by the host to 4 cores on the same socket. The previously reported results were obtained with the tx irq, rx irq and process on different vcpus (0,2). 
Running all on the same vcpu lowers cycle count considerably:

irq 0,0:
  1x:   throughput_Mbps=29767.14  391,488,924,526 cycles
  10x:  throughput_Mbps=40808.64  424,530,251,896 cycles
  100x: throughput_Mbps=33475.13  414,622,071,167 cycles

irq 0,2:
  1x:   throughput_Mbps=30176.05  395,673,200,747 cycles
  10x:  throughput_Mbps=40729.26  433,948,374,991 cycles
  100x: throughput_Mbps=33758.68  436,291,949,393 cycles

irq 1,1:
  1x:   throughput_Mbps=26635.20  269,071,002,844 cycles
  10x:  throughput_Mbps=42385.05  299,945,944,516 cycles
  100x: throughput_Mbps=33580.98  283,272,895,507 cycles

With this patch set applied, cpu (1,1):
  1x:   throughput_Mbps=34980.76  276,504,805,414 cycles
  10x:  throughput_Mbps=42519.92  298,105,889,785 cycles
  100x: throughput_Mbps=35268.86  296,670,598,712 cycles

I will need to get data for (0,2) vs (0,0). > This can even change dynamically as CPUs/queues are reconfigured. > How about adding a flag and skipping the tx poll if there's no match? I suspect that even with the cache invalidations this optimization will be an improvement over handling all tx interrupts in the tx napi handler. I will get the datapoint for that. That said, we can make this conditional. What flag exactly do you propose? Compare raw_smp_processor_id() in the rx softint with one previously stored in the napi tx callback?
Re: [PATCH net-next 3/3] virtio-net: clean tx descriptors from rx napi
On Sun, Apr 2, 2017 at 10:47 PM, Michael S. Tsirkin wrote: > On Sun, Apr 02, 2017 at 04:10:12PM -0400, Willem de Bruijn wrote: >> From: Willem de Bruijn >> >> Amortize the cost of virtual interrupts by doing both rx and tx work >> on reception of a receive interrupt if tx napi is enabled. With >> VIRTIO_F_EVENT_IDX, this suppresses most explicit tx completion >> interrupts for bidirectional workloads. >> >> Signed-off-by: Willem de Bruijn >> --- >> drivers/net/virtio_net.c | 22 ++ >> 1 file changed, 22 insertions(+) >> >> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c >> index 95d938e82080..af830eb212bf 100644 >> --- a/drivers/net/virtio_net.c >> +++ b/drivers/net/virtio_net.c >> @@ -1030,12 +1030,34 @@ static int virtnet_receive(struct receive_queue *rq, >> int budget) >> return received; >> } >> >> +static void free_old_xmit_skbs(struct send_queue *sq); >> + > > Could you pls re-arrange code to avoid forward declarations? Okay. I'll do the move in a separate patch to simplify review. >> +static void virtnet_poll_cleantx(struct receive_queue *rq) >> +{ >> + struct virtnet_info *vi = rq->vq->vdev->priv; >> + unsigned int index = vq2rxq(rq->vq); >> + struct send_queue *sq = &vi->sq[index]; >> + struct netdev_queue *txq = netdev_get_tx_queue(vi->dev, index); >> + >> + if (!sq->napi.weight) >> + return; >> + >> + __netif_tx_lock(txq, smp_processor_id()); >> + free_old_xmit_skbs(sq); >> + __netif_tx_unlock(txq); >> + >> + if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS) >> + netif_wake_subqueue(vi->dev, vq2txq(sq->vq)); >> +} >> + > > Looks very similar to virtnet_poll_tx. > > I think this might be waking the tx queue too early, so > it will tend to stay almost full for long periods of time. > Why not defer wakeup until queue is at least half empty? I'll test that. Delaying wake-up longer than necessary can cause queue build up at the qdisc and higher tail latency, I imagine. But it may reduce the number of __netif_schedule calls. 
> I wonder whether it's worth it to handle very short queues > correctly - they previously made very slow progress, > now they are never woken up. > > I'm a bit concerned about the cost of these wakeups > and locking. I note that this wake is called basically > every time queue is not full. > > Would it make sense to limit the amount of tx polling? > Maybe use trylock to reduce the conflict with xmit? Yes, that sounds good. I did test that previously and saw no difference then. But when multiple cpus contend for a single txq it should help.
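The two review suggestions — use trylock to avoid contending with the xmit path, and defer the wakeup until the ring is at least half empty — can be modelled in a few lines. This is a userspace sketch with made-up names, not driver code; the trylock is modelled by a plain boolean instead of __netif_tx_trylock().

```c
#include <assert.h>
#include <stdbool.h>

struct txq_model {
    int num_free;   /* free descriptors in the ring */
    int ring_size;  /* total descriptors */
    bool locked;    /* tx lock currently held by the xmit path */
};

/* Trylock variant: skip cleaning when xmit holds the lock; start_xmit
 * will clean on its next invocation. Returns true if completions were
 * reaped.
 */
static bool poll_cleantx(struct txq_model *q, int completed)
{
    if (q->locked)
        return false;        /* contended: leave it to start_xmit */
    q->num_free += completed;
    return true;
}

/* Deferred wakeup: restart the stack only once half the ring is free,
 * instead of on every completion, trading some qdisc backlog for fewer
 * __netif_schedule calls.
 */
static bool should_wake(const struct txq_model *q)
{
    return q->num_free >= q->ring_size / 2;
}
```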
[PATCH net-next 3/3] virtio-net: clean tx descriptors from rx napi
From: Willem de Bruijn

Amortize the cost of virtual interrupts by doing both rx and tx work
on reception of a receive interrupt if tx napi is enabled. With
VIRTIO_F_EVENT_IDX, this suppresses most explicit tx completion
interrupts for bidirectional workloads.

Signed-off-by: Willem de Bruijn
---
 drivers/net/virtio_net.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 95d938e82080..af830eb212bf 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1030,12 +1030,34 @@ static int virtnet_receive(struct receive_queue *rq, int budget)
 	return received;
 }
 
+static void free_old_xmit_skbs(struct send_queue *sq);
+
+static void virtnet_poll_cleantx(struct receive_queue *rq)
+{
+	struct virtnet_info *vi = rq->vq->vdev->priv;
+	unsigned int index = vq2rxq(rq->vq);
+	struct send_queue *sq = &vi->sq[index];
+	struct netdev_queue *txq = netdev_get_tx_queue(vi->dev, index);
+
+	if (!sq->napi.weight)
+		return;
+
+	__netif_tx_lock(txq, smp_processor_id());
+	free_old_xmit_skbs(sq);
+	__netif_tx_unlock(txq);
+
+	if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
+		netif_wake_subqueue(vi->dev, vq2txq(sq->vq));
+}
+
 static int virtnet_poll(struct napi_struct *napi, int budget)
 {
 	struct receive_queue *rq =
 		container_of(napi, struct receive_queue, napi);
 	unsigned int received;
 
+	virtnet_poll_cleantx(rq);
+
 	received = virtnet_receive(rq, budget);
 
 	/* Out of packets? */
--
2.12.2.564.g063fe858b8-goog
Re: [PATCH net-next 2/3] virtio-net: transmit napi
On Sun, Apr 2, 2017 at 10:30 PM, Michael S. Tsirkin wrote: > On Sun, Apr 02, 2017 at 04:10:11PM -0400, Willem de Bruijn wrote: >> From: Willem de Bruijn >> >> Convert virtio-net to a standard napi tx completion path. This enables >> better TCP pacing using TCP small queues and increases single stream >> throughput. >> >> The virtio-net driver currently cleans tx descriptors on transmission >> of new packets in ndo_start_xmit. Latency depends on new traffic, so >> is unbounded. To avoid deadlock when a socket reaches its snd limit, >> packets are orphaned on transmission. This breaks socket backpressure, >> including TSQ. >> >> Napi increases the number of interrupts generated compared to the >> current model, which keeps interrupts disabled as long as the ring >> has enough free descriptors. Keep tx napi optional for now. Follow-on >> patches will reduce the interrupt cost. >> >> Signed-off-by: Willem de Bruijn >> Signed-off-by: Jason Wang >> --- >> drivers/net/virtio_net.c | 63 >> >> 1 file changed, 53 insertions(+), 10 deletions(-) >> >> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c >> index 6aac0ad0d9b2..95d938e82080 100644 >> --- a/drivers/net/virtio_net.c >> +++ b/drivers/net/virtio_net.c >> @@ -33,9 +33,10 @@ >> static int napi_weight = NAPI_POLL_WEIGHT; >> module_param(napi_weight, int, 0444); >> >> -static bool csum = true, gso = true; >> +static bool csum = true, gso = true, napi_tx = true; >> module_param(csum, bool, 0444); >> module_param(gso, bool, 0444); >> +module_param(napi_tx, bool, 0644); >> >> /* FIXME: MTU in config. */ >> #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN) > > Off by default seems safer until we can find better ways > to reduce the overhead, esp for UDP. Okay, I'll change that. I don't have an immediate idea for that unidirectional case.
[PATCH net-next 0/3] virtio-net tx napi
From: Willem de Bruijn

Add napi for virtio-net transmit completion processing. Based on previous patchsets by Jason Wang:

[RFC V7 PATCH 0/7] enable tx interrupts for virtio-net
http://lkml.iu.edu/hypermail/linux/kernel/1505.3/00245.html

Changes:
RFC -> v1:
- dropped vhost interrupt moderation patch: not needed and likely expensive at light load
- remove tx napi weight
- always clean all tx completions
- use boolean to toggle tx-napi, instead
- only clean tx in rx if tx-napi is enabled
- then clean tx before rx
- fix: add missing braces in virtnet_freeze_down
- testing: add 4KB TCP_RR + UDP test results

Before commit b0c39dbdc204 ("virtio_net: don't free buffers in xmit ring") the virtio-net driver would free transmitted packets on transmission of new packets in ndo_start_xmit and, to catch the edge case when no new packet is sent, also in a timer at 10HZ.

A timer can cause long stalls. VIRTIO_F_NOTIFY_ON_EMPTY avoids stalls due to low free descriptor count. It does not address stalls due to low socket SO_SNDBUF. Increasing timer frequency decreases that stall time, but increases interrupt rate and, thus, cycle count.

Currently, with no timer, packets are freed only at ndo_start_xmit. Latency of consume_skb is now unbounded. To avoid a deadlock if a sock reaches SO_SNDBUF, packets are orphaned on tx. This breaks TCP small queues.

Reenable TCP small queues by removing the orphan. Instead of using a timer, convert the driver to regular tx napi. This does not have the unresolved stall issue and does not have any frequency to tune.

By keeping interrupts enabled by default, napi increases tx interrupt rate. VIRTIO_F_EVENT_IDX avoids sending an interrupt if one is already unacknowledged, so makes this more feasible today. Combine that with an optimization that brings interrupt rate back in line with the existing version for most workloads: Tx completion cleaning on rx interrupts elides most explicit tx interrupts by relying on the fact that many rx interrupts fire. 
Tested by running {1, 10, 100} {TCP, UDP} STREAM, RR, 4K_RR benchmarks from a guest to a server on the host, on an x86_64 Haswell. The guest runs 4 vCPUs pinned to 4 cores. vhost and the test server are pinned to a core each. All results are the median of 5 runs, with variance well < 10%. Used neper (github.com/google/neper) as test process. Napi increases single stream throughput, but increases cycle cost. Processing completions on rx interrupts optimization brings this down, especially for bi-directional workloads. UDP_STREAM is unidirectional and continues to see a ~10% lower throughput. Not showing numbers for only the optimization patch. That showed no significant difference with upstream.

             upstream  napi    +at-rx
TCP_STREAM:
1x:
  Mbps       30537     37666   37910
  Gcycles    400       540     405
10x:
  Mbps       41012     39954   40245
  Gcycles    434       546     421
100x:
  Mbps       34088     34172   34245
  Gcycles    435       546     418

TCP_RR Latency (us):
1x:
  p50        24        24      21
  p99        27        27      27
  Gcycles    299       432     308
10x:
  p50        31        31      41
  p99        40        46      52
  Gcycles    346       428     322
100x:
  p50        155       151     310
  p99        334       329     362
  Gcycles    336       421     308

TCP_RR 4K:
1x:
  p50        30        30      27
  p99        34        33      34
  Gcycles    307       437     305
10x:
  p50        63        67      65
  p99        76        77      87
  Gcycles    334       425     315
100x:
  p50        421       497     511
  p99        510       571     773
  Gcycles    350       430     321

UDP_STREAM:
1x:
  Mbps       29802     26360   26608
  Gcycles    305       363     362
10x:
  Mbps       29901     26801   27078
  Gcycles    287       363     360
100x:
  Mbps       29952     26822   27054
  Gcycles    336       351     354

UDP_RR:
1x:
  p50        24        21      19
  p99        27        24      23
  Gcycles    299       431     309
10x:
  p50        31        27      35
  p99        40        35      54
  Gcycles    346       421     325
100x:
  p50        155       153     240
  p99        334       323     462
  Gcycles    336       421     311

UDP_RR 4K:
1x:
  p50        24        25      23
  p99        27        28      30
  Gcycles    299       435     321
10x:
  p50        31        35      48
  p99        40        54      66
  Gcycles    346       451     308
100x:
  p50        155       210     307
  p99        334       451     519
  Gcycles    336       440     297

Note that GSO is enabled, so 4K RR still translates to one packet
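The VIRTIO_F_EVENT_IDX suppression that the cover letter relies on comes down to a single comparison. This is the check as defined for the vring (cf. vring_need_event() in include/uapi/linux/virtio_ring.h and the virtio spec), shown here in plain C: notify only if the ring index advanced past the event index since the last notification, with unsigned 16-bit wraparound making the window test valid across index overflow.

```c
#include <stdint.h>

/* Return nonzero if the other side asked to be notified about this
 * batch of ring updates: the event index falls in (old_idx, new_idx].
 */
static int vring_need_event(uint16_t event_idx, uint16_t new_idx,
                            uint16_t old_idx)
{
    return (uint16_t)(new_idx - event_idx - 1) <
           (uint16_t)(new_idx - old_idx);
}
```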
Re: [PATCH] virtio_console: fix uninitialized variable use
On Wed, 2017-03-29 at 23:27 +0300, Michael S. Tsirkin wrote: > Hi Mike > if you like, pls send me your Signed-off-by and I'll > change the patch to make you an author. Nah, it's perfect as it is. While I was pretty darn sure it was generic, I intentionally posted it as diagnostic info. > More importantly, pls let me know whether this helps > resolve the issues about hibernation for you! Well, it prevents kaboom with VIRTIO_CONSOLE_F_MULTIPORT turned off, but no, it does nada wrt the warnings. -Mike
[PATCH net-next 1/3] virtio-net: napi helper functions
From: Willem de Bruijn

Prepare virtio-net for tx napi by converting existing napi code to use
helper functions. This also deduplicates some logic.

Signed-off-by: Willem de Bruijn
Signed-off-by: Jason Wang
---
 drivers/net/virtio_net.c | 65 ++--
 1 file changed, 35 insertions(+), 30 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index b0d241d110ec..6aac0ad0d9b2 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -239,6 +239,26 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
 	return p;
 }
 
+static void virtqueue_napi_schedule(struct napi_struct *napi,
+				    struct virtqueue *vq)
+{
+	if (napi_schedule_prep(napi)) {
+		virtqueue_disable_cb(vq);
+		__napi_schedule(napi);
+	}
+}
+
+static void virtqueue_napi_complete(struct napi_struct *napi,
+				    struct virtqueue *vq, int processed)
+{
+	int opaque;
+
+	opaque = virtqueue_enable_cb_prepare(vq);
+	if (napi_complete_done(napi, processed) &&
+	    unlikely(virtqueue_poll(vq, opaque)))
+		virtqueue_napi_schedule(napi, vq);
+}
+
 static void skb_xmit_done(struct virtqueue *vq)
 {
 	struct virtnet_info *vi = vq->vdev->priv;
@@ -936,27 +956,20 @@ static void skb_recv_done(struct virtqueue *rvq)
 	struct virtnet_info *vi = rvq->vdev->priv;
 	struct receive_queue *rq = &vi->rq[vq2rxq(rvq)];
 
-	/* Schedule NAPI, Suppress further interrupts if successful. */
-	if (napi_schedule_prep(&rq->napi)) {
-		virtqueue_disable_cb(rvq);
-		__napi_schedule(&rq->napi);
-	}
+	virtqueue_napi_schedule(&rq->napi, rvq);
 }
 
-static void virtnet_napi_enable(struct receive_queue *rq)
+static void virtnet_napi_enable(struct virtqueue *vq, struct napi_struct *napi)
 {
-	napi_enable(&rq->napi);
+	napi_enable(napi);
 
 	/* If all buffers were filled by other side before we napi_enabled, we
-	 * won't get another interrupt, so process any outstanding packets
-	 * now. virtnet_poll wants re-enable the queue, so we disable here.
-	 * We synchronize against interrupts via NAPI_STATE_SCHED */
-	if (napi_schedule_prep(&rq->napi)) {
-		virtqueue_disable_cb(rq->vq);
-		local_bh_disable();
-		__napi_schedule(&rq->napi);
-		local_bh_enable();
-	}
+	 * won't get another interrupt, so process any outstanding packets now.
+	 * Call local_bh_enable after to trigger softIRQ processing.
+	 */
+	local_bh_disable();
+	virtqueue_napi_schedule(napi, vq);
+	local_bh_enable();
 }
 
 static void refill_work(struct work_struct *work)
@@ -971,7 +984,7 @@ static void refill_work(struct work_struct *work)
 		napi_disable(&rq->napi);
 		still_empty = !try_fill_recv(vi, rq, GFP_KERNEL);
-		virtnet_napi_enable(rq);
+		virtnet_napi_enable(rq->vq, &rq->napi);
 
 		/* In theory, this can happen: if we don't get any buffers in
 		 * we will *never* try to fill again.
@@ -1011,21 +1024,13 @@ static int virtnet_poll(struct napi_struct *napi, int budget)
 {
 	struct receive_queue *rq =
 		container_of(napi, struct receive_queue, napi);
-	unsigned int r, received;
+	unsigned int received;
 
 	received = virtnet_receive(rq, budget);
 
 	/* Out of packets? */
-	if (received < budget) {
-		r = virtqueue_enable_cb_prepare(rq->vq);
-		if (napi_complete_done(napi, received)) {
-			if (unlikely(virtqueue_poll(rq->vq, r)) &&
-			    napi_schedule_prep(napi)) {
-				virtqueue_disable_cb(rq->vq);
-				__napi_schedule(napi);
-			}
-		}
-	}
+	if (received < budget)
+		virtqueue_napi_complete(napi, rq->vq, received);
 
 	return received;
 }
@@ -1040,7 +1045,7 @@ static int virtnet_open(struct net_device *dev)
 		/* Make sure we have some buffers: if oom use wq. */
 		if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
 			schedule_delayed_work(&vi->refill, 0);
-		virtnet_napi_enable(&vi->rq[i]);
+		virtnet_napi_enable(vi->rq[i].vq, &vi->rq[i].napi);
 	}
 
 	return 0;
@@ -1747,7 +1752,7 @@ static int virtnet_restore_up(struct virtio_device *vdev)
 			schedule_delayed_work(&vi->refill, 0);
 
 		for (i = 0; i < vi->max_queue_pairs; i++)
-			virtnet_napi_enable(&vi->rq[i]);
+			virtnet_napi_enable(vi->rq[i].vq, &vi->rq[i].napi);
 	}
 
 	netif_device_attach(vi->dev);
--
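The race that virtqueue_napi_complete() guards against — a completion arriving between the last poll and callback re-enable — can be modelled in userspace. This sketch uses illustrative names (the real code uses virtqueue_enable_cb_prepare() and virtqueue_poll()); it shows why the final re-check is needed: without it, a completion that slipped into the window is stranded until the next interrupt, which may never come.

```c
#include <assert.h>
#include <stdbool.h>

struct vq_model {
    int pending;        /* completions sitting in the ring */
    bool cb_enabled;    /* interrupts (callbacks) re-enabled? */
    bool rescheduled;   /* napi was re-armed to pick up late work */
};

/* Re-enable callbacks first, then re-check the ring; if work arrived
 * in the window, disable callbacks again and reschedule napi.
 * Passing recheck=false models the lost-wakeup-prone variant.
 */
static void napi_complete_model(struct vq_model *vq, bool recheck)
{
    vq->cb_enabled = true;          /* ~ virtqueue_enable_cb_prepare */
    if (recheck && vq->pending) {   /* ~ virtqueue_poll: missed work? */
        vq->cb_enabled = false;     /* ~ virtqueue_napi_schedule */
        vq->rescheduled = true;
    }
}
```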
Re: [REGRESSION] 07ec51480b5e ("virtio_pci: use shared interrupts for virtqueues") causes crashes in guest
On 23.03.2017 17:41, Michael S. Tsirkin wrote: > On Thu, Mar 23, 2017 at 03:19:07PM +, Richard W.M. Jones wrote: >> On Thu, Mar 23, 2017 at 01:13:50PM +0800, Jason Wang wrote: >>> >From 312859b596e83a2164a8430343d31fce2a5ad808 Mon Sep 17 00:00:00 2001 >>> From: Jason Wang >>> Date: Thu, 23 Mar 2017 13:07:16 +0800 >>> Subject: [PATCH] virtio_pci: fix out of bound access for msix_names >>> Signed-off-by: Jason Wang >> I tested this, and it does appear to fix the crashes in >> vp_modern_find_vqs. Therefore: >> Tested-by: Richard W.M. Jones > I've queued the fix, thanks everyone! Thx. Feel free to add Tested-by: Thorsten Leemhuis as a kernel with that patch works well in my reboot tests so far. Ciao, Thorsten
[PATCH net-next 2/3] virtio-net: transmit napi
From: Willem de Bruijn

Convert virtio-net to a standard napi tx completion path. This enables
better TCP pacing using TCP small queues and increases single stream
throughput.

The virtio-net driver currently cleans tx descriptors on transmission
of new packets in ndo_start_xmit. Latency depends on new traffic, so
is unbounded. To avoid deadlock when a socket reaches its snd limit,
packets are orphaned on transmission. This breaks socket backpressure,
including TSQ.

Napi increases the number of interrupts generated compared to the
current model, which keeps interrupts disabled as long as the ring
has enough free descriptors. Keep tx napi optional for now. Follow-on
patches will reduce the interrupt cost.

Signed-off-by: Willem de Bruijn
Signed-off-by: Jason Wang
---
 drivers/net/virtio_net.c | 63 
 1 file changed, 53 insertions(+), 10 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 6aac0ad0d9b2..95d938e82080 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -33,9 +33,10 @@
 static int napi_weight = NAPI_POLL_WEIGHT;
 module_param(napi_weight, int, 0444);
 
-static bool csum = true, gso = true;
+static bool csum = true, gso = true, napi_tx = true;
 module_param(csum, bool, 0444);
 module_param(gso, bool, 0444);
+module_param(napi_tx, bool, 0644);
 
 /* FIXME: MTU in config. */
 #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
@@ -86,6 +87,8 @@ struct send_queue {
 
 	/* Name of the send queue: output.$index */
 	char name[40];
+
+	struct napi_struct napi;
 };
 
 /* Internal representation of a receive virtqueue */
@@ -262,12 +265,16 @@ static void virtqueue_napi_complete(struct napi_struct *napi,
 static void skb_xmit_done(struct virtqueue *vq)
 {
 	struct virtnet_info *vi = vq->vdev->priv;
+	struct napi_struct *napi = &vi->sq[vq2txq(vq)].napi;
 
 	/* Suppress further interrupts. */
 	virtqueue_disable_cb(vq);
 
-	/* We were probably waiting for more output buffers. */
-	netif_wake_subqueue(vi->dev, vq2txq(vq));
+	if (napi->weight)
+		virtqueue_napi_schedule(napi, vq);
+	else
+		/* We were probably waiting for more output buffers. */
+		netif_wake_subqueue(vi->dev, vq2txq(vq));
 }
 
 static unsigned int mergeable_ctx_to_buf_truesize(unsigned long mrg_ctx)
@@ -961,6 +968,9 @@ static void skb_recv_done(struct virtqueue *rvq)
 
 static void virtnet_napi_enable(struct virtqueue *vq, struct napi_struct *napi)
 {
+	if (!napi->weight)
+		return;
+
 	napi_enable(napi);
 
 	/* If all buffers were filled by other side before we napi_enabled, we
@@ -1046,6 +1056,7 @@ static int virtnet_open(struct net_device *dev)
 		if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
 			schedule_delayed_work(&vi->refill, 0);
 		virtnet_napi_enable(vi->rq[i].vq, &vi->rq[i].napi);
+		virtnet_napi_enable(vi->sq[i].vq, &vi->sq[i].napi);
 	}
 
 	return 0;
@@ -1081,6 +1092,24 @@ static void free_old_xmit_skbs(struct send_queue *sq)
 	u64_stats_update_end(&stats->tx_syncp);
 }
 
+static int virtnet_poll_tx(struct napi_struct *napi, int budget)
+{
+	struct send_queue *sq = container_of(napi, struct send_queue, napi);
+	struct virtnet_info *vi = sq->vq->vdev->priv;
+	struct netdev_queue *txq = netdev_get_tx_queue(vi->dev, vq2txq(sq->vq));
+
+	__netif_tx_lock(txq, smp_processor_id());
+	free_old_xmit_skbs(sq);
+	__netif_tx_unlock(txq);
+
+	virtqueue_napi_complete(napi, sq->vq, 0);
+
+	if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
+		netif_wake_subqueue(vi->dev, vq2txq(sq->vq));
+
+	return 0;
+}
+
 static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
 {
 	struct virtio_net_hdr_mrg_rxbuf *hdr;
@@ -1130,9 +1159,11 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 	int err;
 	struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);
 	bool kick = !skb->xmit_more;
+	bool use_napi = sq->napi.weight;
 
 	/* Free up any pending old buffers before queueing new ones. */
-	free_old_xmit_skbs(sq);
+	if (!use_napi)
+		free_old_xmit_skbs(sq);
 
 	/* timestamp packet in software */
 	skb_tx_timestamp(skb);
@@ -1152,8 +1183,10 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 	}
 
 	/* Don't wait up for transmitted skbs to be freed. */
-	skb_orphan(skb);
-	nf_reset(skb);
+	if (!use_napi) {
+		skb_orphan(skb);
+		nf_reset(skb);
+	}
 
 	/* If running out of space, stop queue to avoid getting packets that we
 	 * are then unable to transmit.
@@ -1167,7 +1200,8 @@ static netdev_tx_t start_xmit(struct sk_buff
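The 2 + MAX_SKB_FRAGS stop/wake threshold that appears throughout these patches corresponds to the worst-case descriptor count for one skb: when the virtio-net header cannot be pushed into the skb's headroom, it takes one slot, the linear part takes one, and each page fragment takes one. A small sketch of that accounting (illustrative userspace model, not driver code; 17 is the typical MAX_SKB_FRAGS on a 4K-page kernel):

```c
#define MAX_SKB_FRAGS 17   /* typical value on 4K-page kernels */

/* Worst-case descriptor usage for a single skb: header slot + linear
 * part + one slot per page fragment.
 */
static int worst_case_descs(void)
{
    return 1 /* virtio-net header */ + 1 /* linear data */ + MAX_SKB_FRAGS;
}

/* The queue must stop while a worst-case skb could still fail to fit. */
static int must_stop_queue(int num_free)
{
    return num_free < 2 + MAX_SKB_FRAGS;
}
```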
Re: [PATCH net-next v2 2/5] virtio-net: transmit napi
On Mon, Apr 24, 2017 at 01:05:45PM -0400, Willem de Bruijn wrote: > On Mon, Apr 24, 2017 at 12:40 PM, Michael S. Tsirkin wrote: > > On Fri, Apr 21, 2017 at 10:50:12AM -0400, Willem de Bruijn wrote: > >> >>> Maybe I was wrong, but according to Michael's comment it looks like he > >> >>> want > >> >>> check affinity_hint_set just for speculative tx polling on rx napi > >> >>> instead > >> >>> of disabling it at all. > >> >>> > >> >>> And I'm not convinced this is really needed, driver only provide > >> >>> affinity > >> >>> hint instead of affinity, so it's not guaranteed that tx and rx > >> >>> interrupt > >> >>> are in the same vcpus. > >> >> > >> >> You're right. I made the restriction broader than the request, to really > >> >> err > >> >> on the side of caution for the initial merge of napi tx. And enabling > >> >> the optimization is always a win over keeping it off, even without irq > >> >> affinity. > >> >> > >> >> The cycle cost is significant without affinity regardless of whether the > >> >> optimization is used. > >> > > >> > > >> > Yes, I noticed this in the past too. > >> > > >> >> Though this is not limited to napi-tx, it is more > >> >> pronounced in that mode than without napi. > >> >> > >> >> 1x TCP_RR for affinity configuration {process, rx_irq, tx_irq}: > >> >> > >> >> upstream: > >> >> > >> >> 1,1,1: 28985 Mbps, 278 Gcyc > >> >> 1,0,2: 30067 Mbps, 402 Gcyc > >> >> > >> >> napi tx: > >> >> > >> >> 1,1,1: 34492 Mbps, 269 Gcyc > >> >> 1,0,2: 36527 Mbps, 537 Gcyc (!) > >> >> 1,0,1: 36269 Mbps, 394 Gcyc > >> >> 1,0,0: 34674 Mbps, 402 Gcyc > >> >> > >> >> This is a particularly strong example. It is also representative > >> >> of most RR tests. It is less pronounced in other streaming tests. 
> >> >> 10x TCP_RR, for instance: > >> >> > >> >> upstream: > >> >> > >> >> 1,1,1: 42267 Mbps, 301 Gcyc > >> >> 1,0,2: 40663 Mbps, 445 Gcyc > >> >> > >> >> napi tx: > >> >> > >> >> 1,1,1: 42420 Mbps, 303 Gcyc > >> >> 1,0,2: 42267 Mbps, 431 Gcyc > >> >> > >> >> These numbers were obtained with the virtqueue_enable_cb_delayed > >> >> optimization after xmit_skb, btw. It turns out that moving that before > >> >> increases 1x TCP_RR further to ~39 Gbps, at the cost of reducing > >> >> 100x TCP_RR a bit. > >> > > >> > > >> > I see, so I think we can leave the affinity hint optimization/check for > >> > future investigation: > >> > > >> > - to avoid endless optimization (e.g we may want to share a single > >> > vector/napi for tx/rx queue pairs in the future) for this series. > >> > - tx napi is disabled by default which means we can do optimization on > >> > top. > >> > >> Okay. I'll drop the vi->affinity_hint_set from the patch set for now. > > > > I kind of like it, let's be conservative. But I'd prefer a comment > > near it explaining why it's there. > > I don't feel strongly. Was minutes away from sending a v3 with this > code reverted, but I'll reinstate it and add a comment. Other planned > changes based on Jason's feedback to v2: > > v2 -> v3: > - convert __netif_tx_trylock to __netif_tx_lock on tx napi poll > ensure that the handler always cleans, to avoid deadlock > - unconditionally clean in start_xmit > avoid adding an unnecessary "if (use_napi)" branch > - remove virtqueue_disable_cb in patch 5/5 > a noop in the common event_idx based loop Makes sense, thanks! -- MST
Re: [PATCH RFC (resend) net-next 0/6] virtio-net: Add support for virtio-net header extensions
On Thu, Apr 20, 2017 at 11:34:57AM -0400, Vlad Yasevich wrote: > > - For 1.1, do we really want something like vnet header? AFAIK, it was not > > used by modern > > NICs, is this better to pack all meta-data into descriptor itself? This may > > need a some > > changes in tun/macvtap, but looks more PCIE friendly. > > That would really be ideal and I've looked at this. We already have at least 16 unused bits in the used ring (head is 16 bit we are using 32 for it). -- MST
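The "16 unused bits" refer to the id field of a used-ring entry: it carries a descriptor index, which fits in 16 bits because ring sizes are capped at 32768, yet the field is 32 bits wide. A minimal model of the layout (types simplified from the __virtio32 in include/uapi/linux/virtio_ring.h):

```c
#include <stdint.h>

/* Used-ring entry as laid out in the virtio spec: the id (head of the
 * used descriptor chain) occupies 32 bits, but only 16 are ever needed,
 * leaving the upper half available for metadata.
 */
struct vring_used_elem {
    uint32_t id;    /* head of the used descriptor chain */
    uint32_t len;   /* total bytes written into the buffer */
};
```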
Re: [PATCH net-next v2 2/5] virtio-net: transmit napi
On Fri, Apr 21, 2017 at 10:50:12AM -0400, Willem de Bruijn wrote: > >>> Maybe I was wrong, but according to Michael's comment it looks like he > >>> want > >>> check affinity_hint_set just for speculative tx polling on rx napi > >>> instead > >>> of disabling it at all. > >>> > >>> And I'm not convinced this is really needed, driver only provide affinity > >>> hint instead of affinity, so it's not guaranteed that tx and rx interrupt > >>> are in the same vcpus. > >> > >> You're right. I made the restriction broader than the request, to really > >> err > >> on the side of caution for the initial merge of napi tx. And enabling > >> the optimization is always a win over keeping it off, even without irq > >> affinity. > >> > >> The cycle cost is significant without affinity regardless of whether the > >> optimization is used. > > > > > > Yes, I noticed this in the past too. > > > >> Though this is not limited to napi-tx, it is more > >> pronounced in that mode than without napi. > >> > >> 1x TCP_RR for affinity configuration {process, rx_irq, tx_irq}: > >> > >> upstream: > >> > >> 1,1,1: 28985 Mbps, 278 Gcyc > >> 1,0,2: 30067 Mbps, 402 Gcyc > >> > >> napi tx: > >> > >> 1,1,1: 34492 Mbps, 269 Gcyc > >> 1,0,2: 36527 Mbps, 537 Gcyc (!) > >> 1,0,1: 36269 Mbps, 394 Gcyc > >> 1,0,0: 34674 Mbps, 402 Gcyc > >> > >> This is a particularly strong example. It is also representative > >> of most RR tests. It is less pronounced in other streaming tests. > >> 10x TCP_RR, for instance: > >> > >> upstream: > >> > >> 1,1,1: 42267 Mbps, 301 Gcyc > >> 1,0,2: 40663 Mbps, 445 Gcyc > >> > >> napi tx: > >> > >> 1,1,1: 42420 Mbps, 303 Gcyc > >> 1,0,2: 42267 Mbps, 431 Gcyc > >> > >> These numbers were obtained with the virtqueue_enable_cb_delayed > >> optimization after xmit_skb, btw. It turns out that moving that before > >> increases 1x TCP_RR further to ~39 Gbps, at the cost of reducing > >> 100x TCP_RR a bit. 
> > > > I see, so I think we can leave the affinity hint optimization/check for > > future investigation: > > > > - to avoid endless optimization (e.g we may want to share a single > > vector/napi for tx/rx queue pairs in the future) for this series. > > - tx napi is disabled by default which means we can do optimization on top. > > Okay. I'll drop the vi->affinity_hint_set from the patch set for now. I kind of like it, let's be conservative. But I'd prefer a comment near it explaining why it's there. -- MST
Re: [RFC 1/3] virtio-iommu: firmware description of the virtual topology
On 21/04/17 09:43, Tian, Kevin wrote: >> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com] >> Sent: Wednesday, April 19, 2017 2:41 AM >> >> On 18/04/17 10:51, Tian, Kevin wrote: From: Jean-Philippe Brucker Sent: Saturday, April 8, 2017 3:18 AM Unlike other virtio devices, the virtio-iommu doesn't work independently, it is linked to other virtual or assigned devices. So before jumping into device operations, we need to define a way for the guest to discover the virtual IOMMU and the devices it translates. The host must describe the relation between IOMMU and devices to the guest using either device-tree or ACPI. The virtual IOMMU identifies each >>> >>> Do you plan to support both device tree and ACPI? >> >> Yes, with ACPI the topology would be described using IORT nodes. I didn't >> include an example in my driver because DT is sufficient for a prototype >> and is readily available (both in Linux and kvmtool), whereas IORT would >> be quite easy to reuse in Linux, but isn't present in kvmtool at the >> moment. However, both interfaces have to be supported for the virtio- >> iommu >> to be portable. > > 'portable' means whether guest enables ACPI? Sorry, "supported" isn't the right term for what I meant. It is for firmware interface to accommodate devices, not the other way around, so firmware consideration is outside the scope of the virtio-iommu specification and virtio-iommu itself doesn't need to "support" any interface. For the purpose of this particular document however, both popular firmware interfaces (ACPI and DT) must be taken into account. Those are the two interfaces I know about, there might be others. But I figure that a VMM implementing a virtual IOMMU is complex enough to be able to also implement one of these two interfaces, so talking about DT and ACPI should fit all use cases. It also provides two examples for other firmware interfaces that wish to describe the IOMMU topology. 
>>>> virtual device with a 32-bit ID, that we will call "Device ID" in
>>>> this document. Device IDs are not necessarily unique system-wide,
>>>> but they may not overlap within a single virtual IOMMU. Device IDs
>>>> of passed-through devices do not need to match the IDs seen by the
>>>> physical IOMMU. The virtual IOMMU uses virtio-mmio transport
>>>> exclusively, not virtio-pci, because with PCI the IOMMU interface
>>>> would itself be an endpoint, and existing firmware interfaces don't
>>>> allow describing IOMMU<->master relations between PCI endpoints.
>>>
>>> I'm not familiar with the virtio-mmio mechanism. Curious how devices
>>> in virtio-mmio are enumerated today? Could we use that mechanism to
>>> identify vIOMMUs and then invent a purely para-virtualized method to
>>> enumerate devices behind each vIOMMU?
>>
>> Using DT, virtio-mmio devices are described with a "virtio-mmio"
>> compatible node, and with ACPI they use _HID LNRO0005. Since the host
>> already describes available devices to a guest using a firmware
>> interface, I think we should reuse the tools provided by that
>> interface for describing relations between DMA masters and IOMMU.
>
> OK, I didn't realize virtio-mmio is defined to rely on DT for
> enumeration.

Not necessarily DT; you can have virtio-mmio devices in ACPI namespace
as well. QEMU has an example of LNRO0005 with ACPI.

>>> Asking this is because each vendor has its own enumeration methods.
>>> ARM has device tree and ACPI IORT. AMD has ACPI IVRS and device tree
>>> (same format as ARM?). Intel has ACPI DMAR and sub-tables. Your
>>> current proposal looks to follow ARM definitions, which I'm not sure
>>> are extensible enough to cover features defined only in other
>>> vendors' structures.
>>
>> ACPI IORT can be extended to incorporate para-virtualized IOMMUs,
>> regardless of the underlying architecture. It isn't defined solely for
>> the ARM SMMU, but serves a more general purpose of describing a map of
>> device identifiers communicated from one component to another.
>> Both DMAR and IVRS have such a description (respectively DRHD and
>> IVHD), but they are designed for a specific IOMMU, whereas IORT could
>> host other kinds.
>
> I'll take a look at the IORT definition. DRHD includes information more
> than device mapping.

I guess that most information provided by DMAR and others is
IOMMU-specific, and the equivalent for virtio-iommu would fit in virtio
config space. But describing device mapping relative to IOMMUs is the
same problem for all systems. Doing it with a virtio-iommu probing
mechanism would require reinventing a way to identify devices every time
a host wants to add support for a new bus (RID for PCI, base address for
MMIO, others in the future), when firmware would have to provide this
information anyway for bare metal.

>> It seems that all we really need is an interface that says "there is a
>> virtio-iommu at address X, here are the devices it translates and
>> their corresponding IDs",
Re: [RFC 3/3] virtio-iommu: future work
On 21/04/17 09:31, Tian, Kevin wrote:
>> From: Jean-Philippe Brucker
>> Sent: Saturday, April 8, 2017 3:18 AM
>>
>> Here I propose a few ideas for extensions and optimizations. This is
>> all very exploratory, feel free to correct mistakes and suggest more
>> things.
>
> [...]
>>
>> II. Page table sharing
>> ======================
>>
>> 1. Sharing IOMMU page tables
>> ----------------------------
>>
>> VIRTIO_IOMMU_F_PT_SHARING
>>
>> This is independent of the nested mode described in I.2, but relies on
>> a similar feature in the physical IOMMU: having two stages of page
>> tables, one for the host and one for the guest.
>>
>> When this is supported, the guest can manage its own s1 page
>> directory, to avoid sending MAP/UNMAP requests. Feature
>> VIRTIO_IOMMU_F_PT_SHARING allows a driver to give a page directory
>> pointer (pgd) to the host and send invalidations when removing or
>> changing a mapping. In this mode, three requests are used: probe,
>> attach and invalidate. An address space cannot be using the MAP/UNMAP
>> interface and PT_SHARING at the same time.
>>
>> Device and driver first need to negotiate which page table format they
>> will be using. This depends on the physical IOMMU, so the request
>> contains a negotiation part to probe the device capabilities.
>>
>> (1) Driver attaches devices to address spaces as usual, but a flag
>>     VIRTIO_IOMMU_ATTACH_F_PRIVATE (working title) tells the device not
>>     to create page tables for use with the MAP/UNMAP API. The driver
>>     intends to manage the address space itself.
>>
>> (2) Driver sends a PROBE_TABLE request. It sets len > 0 with the size
>>     of the pg_format array.
>>
>> VIRTIO_IOMMU_T_PROBE_TABLE
>>
>> struct virtio_iommu_req_probe_table {
>> 	le32	address_space;
>> 	le32	flags;
>> 	le32	len;
>>
>> 	le32	nr_contexts;
>> 	struct {
>> 		le32	model;
>> 		u8	format[64];
>> 	} pg_format[len];
>> };
>>
>> Introducing a probe request is more flexible than advertising those
>> features in virtio config, because capabilities are dynamic and depend
>> on which devices are attached to an address space. Within a single
>> address space, devices may support different numbers of contexts
>> (PASIDs), and some may not support recoverable faults.
>>
>> (3) Device responds success with all page table formats implemented by
>>     the physical IOMMU in pg_format. 'model' 0 is invalid, so the
>>     driver can initialize the array to 0 and deduce from there which
>>     entries have been filled by the device.
>>
>> Using a probe method seems preferable over trying to attach every
>> possible format until one sticks. For instance, with an ARM guest
>> running on an x86 host, PROBE_TABLE would return the Intel IOMMU page
>> table format, and the guest could use that page table code to handle
>> its mappings, hidden behind the IOMMU API. This requires that the
>> page-table code is reasonably abstracted from the architecture, as is
>> done with drivers/iommu/io-pgtable (an x86 guest could use any format
>> implemented by io-pgtable, for example).
>
> So essentially you need to modify all existing IOMMU drivers to support
> page table sharing in pvIOMMU. After the abstraction is done, the core
> pvIOMMU files can be kept vendor-agnostic. But if we talk about the
> whole pvIOMMU module, it actually includes vendor-specific logic, and
> is thus unlike typical para-virtualized virtio drivers, which are
> completely vendor-agnostic. Is this understanding accurate?

Yes, although the kernel modules would be separate. For Linux on ARM we
already have the page-table logic abstracted in the iommu/io-pgtable
module, because multiple IOMMUs share the same PT formats (SMMUv2,
SMMUv3, Renesas IPMMU, Qcom MSM, Mediatek).
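As a rough illustration of that split, here is a toy, self-contained C
sketch of an io-pgtable-style arrangement: format-specific page-table
code exposes map/unmap/iova_to_phys operations and calls back into the
IOMMU driver to invalidate TLB entries. All names and the one-level
16-entry table are invented for the sketch; this is not the actual Linux
io_pgtable API.

```c
#include <stdint.h>
#include <stddef.h>

#define TOY_PT_ENTRIES 16
#define TOY_PAGE       4096ull

/* Invalidation callbacks: page-table code calls back into the "IOMMU
 * driver" when a TLB entry must be flushed (mirrors iommu_gather_ops). */
struct toy_gather_ops {
	void (*tlb_flush)(void *cookie, uint64_t iova);
};

/* One-level, 16-entry toy page table standing in for a real format
 * implementation. pte bit 0 marks a valid entry. */
struct toy_pt {
	uint64_t pte[TOY_PT_ENTRIES];      /* 0 = not mapped */
	const struct toy_gather_ops *gather;
	void *cookie;
};

static int flush_count;                    /* test hook for the sketch */

static void toy_count_flush(void *cookie, uint64_t iova)
{
	(void)cookie; (void)iova;
	flush_count++;                     /* a driver would flush the IOTLB */
}

static int toy_map(struct toy_pt *pt, uint64_t iova, uint64_t pa)
{
	uint64_t idx = iova / TOY_PAGE;

	if (idx >= TOY_PT_ENTRIES || pt->pte[idx])
		return -1;
	pt->pte[idx] = pa | 1;
	return 0;
}

static int toy_unmap(struct toy_pt *pt, uint64_t iova)
{
	uint64_t idx = iova / TOY_PAGE;

	if (idx >= TOY_PT_ENTRIES || !pt->pte[idx])
		return -1;
	pt->pte[idx] = 0;
	pt->gather->tlb_flush(pt->cookie, iova);  /* call back into driver */
	return 0;
}

static uint64_t toy_iova_to_phys(struct toy_pt *pt, uint64_t iova)
{
	uint64_t idx = iova / TOY_PAGE;

	if (idx >= TOY_PT_ENTRIES || !pt->pte[idx])
		return 0;
	return (pt->pte[idx] & ~1ull) | (iova % TOY_PAGE);
}
```

The point of the sketch is only the direction of the calls: the IOMMU
driver calls down into the page-table ops, and the page-table ops call
back up for invalidation, which is what lets the format code stay
architecture-agnostic.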
It offers a simple interface:

* When attaching devices to an IOMMU domain, the IOMMU driver registers
  its page table format and provides invalidation callbacks.
* On iommu_map/unmap, the IOMMU driver calls into io_pgtable_ops, which
  provide map, unmap and iova_to_phys functions.
* Page table operations call back into the driver via iommu_gather_ops
  when they need to invalidate TLB entries.

Currently only the few flavors of ARM PT formats are implemented, but
other page table formats could be added if they fit this model.

> It also means the host-side pIOMMU driver needs to propagate all
> supported formats through VFIO to the QEMU vIOMMU, meaning such format
> definitions need to be consistently agreed across all those components.

Yes, that's the icky part. We need to define a format that every OS and
hypervisor implementing virtio-iommu can understand (similarly to the
PASID table sharing interface that Yi L is working on for VFIO, although
that one is contained in Linux UAPI and doesn't require other OSes to
know about it).

>> 2. Sharing MMU page tables
>> --------------------------
>>
>> The guest can share process
Re: [RFC 2/3] virtio-iommu: device probing and operations
On 21/04/17 10:02, Tian, Kevin wrote:
>> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
>> Sent: Wednesday, April 19, 2017 2:46 AM
>>
>> On 18/04/17 11:26, Tian, Kevin wrote:
>>>> From: Jean-Philippe Brucker
>>>> Sent: Saturday, April 8, 2017 3:18 AM
>>> [...]
>>>>
>>>> II. Feature bits
>>>> ================
>>>>
>>>> VIRTIO_IOMMU_F_INPUT_RANGE (0)
>>>> Available range of virtual addresses is described in input_range
>>>
>>> Usually only the maximum supported address bits are important.
>>> Curious, do you see situations where the low end of the address space
>>> is not usable (since you have both start/end defined later)?
>>
>> A start address would allow providing something resembling a GART to
>> the guest: an IOMMU with one address space (ioasid_bits=0) and a small
>> IOVA aperture. I'm not sure how useful that would be in practice.
>
> Intel VT-d has no such limitation, which I can tell. :-)
>
>>
>> On a related note, the virtio-iommu itself doesn't provide a
>> per-address-space aperture as it stands. For example, attaching a
>> device to an address space might restrict the available IOVA range for
>> the whole AS if that device cannot write to high memory (above
>> 32-bit). If the guest attempts to map an IOVA outside this window into
>> the device's address space, it should expect the MAP request to fail.
>> And when attaching, if the address space already has mappings outside
>> this window, then ATTACH should fail.
>>
>> This too seems to be something that ought to be communicated by
>> firmware, but bits are missing (I can't find anything equivalent to
>> DT's dma-ranges for PCI root bridges in ACPI tables, for example). In
>> addition, VFIO doesn't communicate any DMA mask for devices, and
>> doesn't check them itself. I guess that the host could find out the
>> DMA mask of devices one way or another, but it is tricky to enforce,
>> so I didn't make this a hard requirement. Although I should probably
>> add a few words about it.
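The per-address-space aperture check described there can be sketched in
a few lines of C. The struct and function names are invented for
illustration; the point is just the rule: a MAP of [iova, iova + size)
must fall entirely inside the aperture that the most restrictive
attached device left for the address space.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical per-address-space aperture. Attaching a device that can
 * only DMA below 4G would shrink 'end' to 0xffffffff for the whole AS. */
struct as_aperture {
	uint64_t start;
	uint64_t end;	/* inclusive */
};

/* Return true if a MAP request for [iova, iova + size) fits inside the
 * aperture; the device would answer a failing status otherwise. */
static bool map_in_aperture(const struct as_aperture *ap,
			    uint64_t iova, uint64_t size)
{
	if (size == 0 || iova + size - 1 < iova)	/* empty or wraps */
		return false;
	return iova >= ap->start && iova + size - 1 <= ap->end;
}
```

The ATTACH-time rule mentioned above is the dual of this one: attaching
a device whose aperture excludes an already-present mapping should fail.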
>
> If there is no such communication on bare metal, then same for pvIOMMU.
>
>>
>>> [...]
>>>>
>>>> 1. Attach device
>>>>
>>>> struct virtio_iommu_req_attach {
>>>> 	le32	address_space;
>>>> 	le32	device;
>>>> 	le32	flags/reserved;
>>>> };
>>>>
>>>> Attach a device to an address space. 'address_space' is an
>>>> identifier unique to the guest. If the address space doesn't exist
>>>> in the IOMMU
>>>
>>> Based on your description this address space ID is per operation,
>>> right? MAP/UNMAP and page-table sharing should have different ID
>>> spaces...
>>
>> I think it's simpler if we keep a single IOASID space per virtio-iommu
>> device, because the maximum number of address spaces (described by
>> ioasid_bits) might be a restriction of the pIOMMU. For page-table
>> sharing you still need to define which devices will share a page
>> directory using ATTACH requests, though that interface is not set in
>> stone.
>
> Got you. Yes, a VM is supposed to consume fewer IOASIDs than physically
> available. It doesn't hurt to have one IOASID space for both IOVA
> map/unmap usages (one IOASID per device) and SVM usages (multiple
> IOASIDs per device). The former is digested by software and the latter
> will be bound to hardware.

Hmm, I'm using address spaces indexed by IOASID for the "classic" IOMMU,
and then contexts indexed by PASID when talking about SVM. So in my mind
an address space can have multiple sub-address-spaces (contexts). The
number of IOASIDs is a limitation of the pIOMMU, and the number of
PASIDs is a limitation of the device. Therefore attaching devices to
address spaces would update the number of available contexts in that
address space. The terminology is not ideal, and I'd be happy to change
it for something clearer.

Thanks,
Jean-Philippe

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH 6/6] drm: fourcc byteorder: virtio restrict to XRGB8888
While wading through the code I've noticed we have a little issue in
virtio: we attach a format to the bo when it is created
(DRM_IOCTL_MODE_CREATE_DUMB), not when we map it as framebuffer
(DRM_IOCTL_MODE_ADDFB).

Easy way out: support a single format only.

Signed-off-by: Gerd Hoffmann
---
 drivers/gpu/drm/virtio/virtgpu_gem.c   | 5 ++++-
 drivers/gpu/drm/virtio/virtgpu_plane.c | 9 +--------
 2 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_gem.c b/drivers/gpu/drm/virtio/virtgpu_gem.c
index 4f2c2dc731..b09e5e5ae4 100644
--- a/drivers/gpu/drm/virtio/virtgpu_gem.c
+++ b/drivers/gpu/drm/virtio/virtgpu_gem.c
@@ -90,7 +90,10 @@ int virtio_gpu_mode_dumb_create(struct drm_file *file_priv,
 	uint32_t resid;
 	uint32_t format;
 
-	pitch = args->width * ((args->bpp + 1) / 8);
+	if (args->bpp != 32)
+		return -EINVAL;
+
+	pitch = args->width * 4;
 	args->size = pitch * args->height;
 	args->size = ALIGN(args->size, PAGE_SIZE);
diff --git a/drivers/gpu/drm/virtio/virtgpu_plane.c b/drivers/gpu/drm/virtio/virtgpu_plane.c
index f40ffc9a70..3a4498a223 100644
--- a/drivers/gpu/drm/virtio/virtgpu_plane.c
+++ b/drivers/gpu/drm/virtio/virtgpu_plane.c
@@ -28,14 +28,7 @@
 #include
 
 static const uint32_t virtio_gpu_formats[] = {
-	DRM_FORMAT_XRGB8888,
-	DRM_FORMAT_ARGB8888,
-	DRM_FORMAT_BGRX8888,
-	DRM_FORMAT_BGRA8888,
-	DRM_FORMAT_RGBX8888,
-	DRM_FORMAT_RGBA8888,
-	DRM_FORMAT_XBGR8888,
-	DRM_FORMAT_ABGR8888,
+	DRM_FORMAT_CPU_XRGB8888,
 };
 
 static const uint32_t virtio_gpu_cursor_formats[] = {
-- 
2.9.3
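For reference, the size computation in the virtgpu_gem.c hunk above
boils down to a few lines. This standalone C sketch mirrors the
post-patch logic (reject anything but 32 bpp, pitch = width * 4, round
the total up to a page); PAGE_SIZE 4096 and the function name are
assumptions of the sketch, not kernel API.

```c
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u
#define TOY_ALIGN(x, a) (((x) + (a) - 1) & ~((a) - 1))

/* Mirror of the dumb-buffer size computation after the patch: only
 * 32 bpp is accepted, so pitch is simply width * 4 and the total size
 * is rounded up to a whole page. Returns -1 (standing in for -EINVAL)
 * for any other depth. */
static int toy_dumb_buffer_size(uint32_t width, uint32_t height,
				uint32_t bpp, uint32_t *size)
{
	if (bpp != 32)
		return -1;
	*size = TOY_ALIGN(width * 4 * height, TOY_PAGE_SIZE);
	return 0;
}
```

Note how restricting to a single 32-bit format makes the old
`(bpp + 1) / 8` pitch expression unnecessary.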
[PATCH 5/6] drm: fourcc byteorder: adapt virtio to drm_mode_legacy_fb_format update
Signed-off-by: Gerd Hoffmann
---
 drivers/gpu/drm/virtio/virtgpu_gem.c   |  2 +-
 drivers/gpu/drm/virtio/virtgpu_plane.c | 31 +------------------------------
 2 files changed, 1 insertion(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_gem.c b/drivers/gpu/drm/virtio/virtgpu_gem.c
index cc025d8fbe..4f2c2dc731 100644
--- a/drivers/gpu/drm/virtio/virtgpu_gem.c
+++ b/drivers/gpu/drm/virtio/virtgpu_gem.c
@@ -99,7 +99,7 @@ int virtio_gpu_mode_dumb_create(struct drm_file *file_priv,
 	if (ret)
 		goto fail;
 
-	format = virtio_gpu_translate_format(DRM_FORMAT_XRGB8888);
+	format = virtio_gpu_translate_format(DRM_FORMAT_CPU_XRGB8888);
 	virtio_gpu_resource_id_get(vgdev, &resid);
 	virtio_gpu_cmd_create_resource(vgdev, resid, format,
 				       args->width, args->height);
diff --git a/drivers/gpu/drm/virtio/virtgpu_plane.c b/drivers/gpu/drm/virtio/virtgpu_plane.c
index adcdbd0abe..f40ffc9a70 100644
--- a/drivers/gpu/drm/virtio/virtgpu_plane.c
+++ b/drivers/gpu/drm/virtio/virtgpu_plane.c
@@ -39,11 +39,7 @@ static const uint32_t virtio_gpu_formats[] = {
 };
 
 static const uint32_t virtio_gpu_cursor_formats[] = {
-#ifdef __BIG_ENDIAN
-	DRM_FORMAT_BGRA8888,
-#else
 	DRM_FORMAT_ARGB8888,
-#endif
 };
 
 uint32_t virtio_gpu_translate_format(uint32_t drm_fourcc)
@@ -51,32 +47,6 @@ uint32_t virtio_gpu_translate_format(uint32_t drm_fourcc)
 	uint32_t format;
 
 	switch (drm_fourcc) {
-#ifdef __BIG_ENDIAN
-	case DRM_FORMAT_XRGB8888:
-		format = VIRTIO_GPU_FORMAT_X8R8G8B8_UNORM;
-		break;
-	case DRM_FORMAT_ARGB8888:
-		format = VIRTIO_GPU_FORMAT_A8R8G8B8_UNORM;
-		break;
-	case DRM_FORMAT_BGRX8888:
-		format = VIRTIO_GPU_FORMAT_B8G8R8X8_UNORM;
-		break;
-	case DRM_FORMAT_BGRA8888:
-		format = VIRTIO_GPU_FORMAT_B8G8R8A8_UNORM;
-		break;
-	case DRM_FORMAT_RGBX8888:
-		format = VIRTIO_GPU_FORMAT_R8G8B8X8_UNORM;
-		break;
-	case DRM_FORMAT_RGBA8888:
-		format = VIRTIO_GPU_FORMAT_R8G8B8A8_UNORM;
-		break;
-	case DRM_FORMAT_XBGR8888:
-		format = VIRTIO_GPU_FORMAT_X8B8G8R8_UNORM;
-		break;
-	case DRM_FORMAT_ABGR8888:
-		format = VIRTIO_GPU_FORMAT_A8B8G8R8_UNORM;
-		break;
-#else
 	case DRM_FORMAT_XRGB8888:
 		format = VIRTIO_GPU_FORMAT_B8G8R8X8_UNORM;
 		break;
@@ -101,7 +71,6 @@ uint32_t virtio_gpu_translate_format(uint32_t drm_fourcc)
 	case DRM_FORMAT_ABGR8888:
 		format = VIRTIO_GPU_FORMAT_R8G8B8A8_UNORM;
 		break;
-#endif
 	default:
 		/*
 		 * This should not happen, we handle everything listed
-- 
2.9.3
[PATCH 4/6] drm: fourcc byteorder: adapt bochs-drm to drm_mode_legacy_fb_format update
Signed-off-by: Gerd Hoffmann
---
 drivers/gpu/drm/bochs/bochs_mm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/bochs/bochs_mm.c b/drivers/gpu/drm/bochs/bochs_mm.c
index 857755ac2d..781d35bdff 100644
--- a/drivers/gpu/drm/bochs/bochs_mm.c
+++ b/drivers/gpu/drm/bochs/bochs_mm.c
@@ -508,7 +508,7 @@ bochs_user_framebuffer_create(struct drm_device *dev,
 		 (mode_cmd->pixel_format >> 16) & 0xff,
 		 (mode_cmd->pixel_format >> 24) & 0xff);
 
-	if (mode_cmd->pixel_format != DRM_FORMAT_XRGB8888)
+	if (mode_cmd->pixel_format != DRM_FORMAT_CPU_XRGB8888)
 		return ERR_PTR(-ENOENT);
 
 	obj = drm_gem_object_lookup(filp, mode_cmd->handles[0]);
-- 
2.9.3