Call For Nominations: HPDC'23 Achievement Award

2023-03-21 Thread Zheng, Mai [E CPE]
==
Call For Nominations: HPDC'23 Achievement Award

In 2012, HPDC established an annual achievement award, which is presented to an 
individual who has made long-lasting, influential contributions to the 
foundations or practice of the field of high-performance parallel and 
distributed computing (HPDC). These contributions may include one or more of 
the following:

*Conceptual advances that have influenced the design or operation of HPDC 
systems or applications;
*Innovative techniques or tools for the design or analysis of HPDC systems or 
applications;
*Design, implementation, and deployment of innovative (components of) HPDC 
systems or applications;
*Analysis of innovative (components of) HPDC systems or applications.

In selecting the achievement award recipient, the Award Selection Committee 
will place particular emphasis on seminal contributions and a sustained record 
of high impact in the field.

-
Past Winners
-
2022: Franck Cappello for his pioneering contributions in methods, tools, and 
testbeds for resilient high performance parallel and distributed computing.
2021: Rosa Badia, for her innovations in parallel task-based programming 
models, workflow applications and systems, and leadership in the high 
performance computing research community.
2020: No winner
2019: Geoffrey Fox, for his foundational contributions to parallel computing, 
high-performance software, the interface between applications and systems, 
contributions to education, and outreach to underrepresented communities.
2018: Satoshi Matsuoka, for his pioneering research in the design, 
implementation, and application of high performance systems and software tools 
for parallel and distributed systems.
2017: David Abramson, for his pioneering research in the design, 
implementation, and application of high performance systems and software tools 
for parallel and distributed systems.
2016: Jack Dongarra, for his long-standing and far-reaching contributions in 
high performance linear algebra and large-scale parallel and distributed 
computing.
2015: Ewa Deelman, for her significant influence, contributions, and 
distinguished use of workflow systems in high-performance computing.
2014: Rich Wolski, for pioneering and high-impact contributions to grid, cloud, 
and parallel computing.
2013: Miron Livny, for his significant contribution and high impact in the area 
of high-throughput computing.
2012: Ian Foster, for his initiative in the creation and development of grid 
computing and his significant contributions to high-performance distributed 
computing in support of the sciences.

-
Achievement Award Talk
-
The award will be presented at the 32nd International Symposium on 
High-Performance Parallel and Distributed Computing in Orlando, Florida, United 
States, June 16 - June 23, 2023. The winner should be available to receive the 
award and present an achievement award talk at the conference.

-
Nomination for the 2023 Award
-
Candidates may nominate a colleague by sending a letter in PDF form to 
dth...@nd.edu. Each nomination received will be retained and considered 
by the committee for three consecutive years. The letter of nomination should 
be about one page and contain:
*The nominee's current professional affiliation(s).
*A brief citation (thirty words or less) precisely stating the most salient 
reason(s) why the nominee is qualified for the award.
*A description of the technical contributions of the nominee and their 
significance and impact.

-
Important Dates
-
Nomination deadline: Wednesday, April 12, 2023

-
2023 Award Selection Committee
-
Ali Butt
Franck Cappello
Kyle Chard
Ningfang Mi
Douglas Thain
Devesh Tiwari
Jon Weissman

The award selection committee is chaired by a member of the HPDC steering 
committee and includes the General Chair(s) and the Program Committee Chair(s) 
of the current HPDC conference, and the previous year's award winner. An award 
committee member cannot be a nominator or be selected as the winner.

More info: 
https://www.hpdc.org/2023/awards/achievement-award/call-for-nominations/
==







Re: [PATCH net-next 0/8] virtio_net: refactor xdp codes

2023-03-21 Thread Xuan Zhuo
On Tue, 21 Mar 2023 23:53:52 -0400, "Michael S. Tsirkin"  
wrote:
> On Wed, Mar 22, 2023 at 11:40:56AM +0800, Xuan Zhuo wrote:
> > On Tue, 21 Mar 2023 23:34:43 -0400, "Michael S. Tsirkin"  
> > wrote:
> > > On Wed, Mar 22, 2023 at 11:03:00AM +0800, Xuan Zhuo wrote:
> > > > Due to historical reasons, the implementation of XDP in virtio-net is
> > > > relatively chaotic. For example, the processing of XDP actions has two
> > > > copies of similar code, such as the page/xdp_page handling.
> > > >
> > > > The purpose of this patch set is to refactor this code and reduce the
> > > > difficulty of subsequent maintenance, so that later developers do not
> > > > introduce new bugs because of complex logical relationships.
> > > >
> > > > In addition, the AF_XDP support that I want to submit later will also
> > > > need to reuse the XDP logic, such as the processing of actions; I don't
> > > > want to introduce new, similar code. This way, I can reuse this code in
> > > > the future.
> > > >
> > > > Please review.
> > > >
> > > > Thanks.
> > >
> > > I really want to see that code make progress though.
> >
> > I want to know: do you mean virtio-net + AF_XDP, or the XDP refactor?
> >
> > > Would it make sense to merge this one through the virtio tree?
> >
> > There are some small problems with merging this patch set through the
> > virtio tree directly.
> >
> > Thanks.
>
> What exactly? Is there a dependency on net-next?

There will be a conflict: I submitted a patch to net earlier, and net-next now includes it.

[1]. 
https://lore.kernel.org/netdev/20230315015223.89137-1-xuanz...@linux.alibaba.com/

There are no other problems.

Thanks.

>
> > >
> > > Then you will have all the pieces in one place and can try to target
> > > the next Linux release.
> > >
> > >
> > > > Xuan Zhuo (8):
> > > >   virtio_net: mergeable xdp: put old page immediately
> > > >   virtio_net: mergeable xdp: introduce mergeable_xdp_prepare
> > > >   virtio_net: introduce virtnet_xdp_handler() to separate the logic of
> > > > run xdp
> > > >   virtio_net: separate the logic of freeing xdp shinfo
> > > >   virtio_net: separate the logic of freeing the rest mergeable buf
> > > >   virtio_net: auto release xdp shinfo
> > > >   virtio_net: introduce receive_mergeable_xdp()
> > > >   virtio_net: introduce receive_small_xdp()
> > > >
> > > >  drivers/net/virtio_net.c | 615 +++
> > > >  1 file changed, 357 insertions(+), 258 deletions(-)
> > > >
> > > > --
> > > > 2.32.0.3.g01195cf9f
> > >
>


Re: [PATCH net-next 0/8] virtio_net: refactor xdp codes

2023-03-21 Thread Michael S. Tsirkin
On Wed, Mar 22, 2023 at 11:40:56AM +0800, Xuan Zhuo wrote:
> On Tue, 21 Mar 2023 23:34:43 -0400, "Michael S. Tsirkin"  
> wrote:
> > On Wed, Mar 22, 2023 at 11:03:00AM +0800, Xuan Zhuo wrote:
> > > Due to historical reasons, the implementation of XDP in virtio-net is
> > > relatively chaotic. For example, the processing of XDP actions has two
> > > copies of similar code, such as the page/xdp_page handling.
> > >
> > > The purpose of this patch set is to refactor this code and reduce the
> > > difficulty of subsequent maintenance, so that later developers do not
> > > introduce new bugs because of complex logical relationships.
> > >
> > > In addition, the AF_XDP support that I want to submit later will also
> > > need to reuse the XDP logic, such as the processing of actions; I don't
> > > want to introduce new, similar code. This way, I can reuse this code in
> > > the future.
> > >
> > > Please review.
> > >
> > > Thanks.
> >
> > I really want to see that code make progress though.
> 
> I want to know: do you mean virtio-net + AF_XDP, or the XDP refactor?
> 
> > Would it make sense to merge this one through the virtio tree?
> 
> There are some small problems with merging this patch set through the
> virtio tree directly.
> 
> Thanks.

What exactly? Is there a dependency on net-next?

> >
> > Then you will have all the pieces in one place and can try to target
> > the next Linux release.
> >
> >
> > > Xuan Zhuo (8):
> > >   virtio_net: mergeable xdp: put old page immediately
> > >   virtio_net: mergeable xdp: introduce mergeable_xdp_prepare
> > >   virtio_net: introduce virtnet_xdp_handler() to separate the logic of
> > > run xdp
> > >   virtio_net: separate the logic of freeing xdp shinfo
> > >   virtio_net: separate the logic of freeing the rest mergeable buf
> > >   virtio_net: auto release xdp shinfo
> > >   virtio_net: introduce receive_mergeable_xdp()
> > >   virtio_net: introduce receive_small_xdp()
> > >
> > >  drivers/net/virtio_net.c | 615 +++
> > >  1 file changed, 357 insertions(+), 258 deletions(-)
> > >
> > > --
> > > 2.32.0.3.g01195cf9f
> >



Re: [PATCH net-next 0/8] virtio_net: refactor xdp codes

2023-03-21 Thread Xuan Zhuo
On Tue, 21 Mar 2023 23:34:43 -0400, "Michael S. Tsirkin"  
wrote:
> On Wed, Mar 22, 2023 at 11:03:00AM +0800, Xuan Zhuo wrote:
> > Due to historical reasons, the implementation of XDP in virtio-net is
> > relatively chaotic. For example, the processing of XDP actions has two
> > copies of similar code, such as the page/xdp_page handling.
> >
> > The purpose of this patch set is to refactor this code and reduce the
> > difficulty of subsequent maintenance, so that later developers do not
> > introduce new bugs because of complex logical relationships.
> >
> > In addition, the AF_XDP support that I want to submit later will also
> > need to reuse the XDP logic, such as the processing of actions; I don't
> > want to introduce new, similar code. This way, I can reuse this code in
> > the future.
> >
> > Please review.
> >
> > Thanks.
>
> I really want to see that code make progress though.

I want to know: do you mean virtio-net + AF_XDP, or the XDP refactor?

> Would it make sense to merge this one through the virtio tree?

There are some small problems with merging this patch set through the
virtio tree directly.

Thanks.

>
> Then you will have all the pieces in one place and can try to target
> the next Linux release.
>
>
> > Xuan Zhuo (8):
> >   virtio_net: mergeable xdp: put old page immediately
> >   virtio_net: mergeable xdp: introduce mergeable_xdp_prepare
> >   virtio_net: introduce virtnet_xdp_handler() to separate the logic of
> > run xdp
> >   virtio_net: separate the logic of freeing xdp shinfo
> >   virtio_net: separate the logic of freeing the rest mergeable buf
> >   virtio_net: auto release xdp shinfo
> >   virtio_net: introduce receive_mergeable_xdp()
> >   virtio_net: introduce receive_small_xdp()
> >
> >  drivers/net/virtio_net.c | 615 +++
> >  1 file changed, 357 insertions(+), 258 deletions(-)
> >
> > --
> > 2.32.0.3.g01195cf9f
>


Re: [PATCH v4 1/2] vdpa/mlx5: Make VIRTIO_NET_F_MRG_RXBUF off by default

2023-03-21 Thread Michael S. Tsirkin
On Tue, Mar 21, 2023 at 11:32:36PM -0400, Michael S. Tsirkin wrote:
> On Tue, Mar 21, 2023 at 01:28:08PM +0200, Eli Cohen wrote:
> > The following patch adds driver support for VIRTIO_NET_F_MRG_RXBUF.
> > 
> > Current firmware versions show degradation in packet rate when using
> > MRG_RXBUF. Users who favor memory saving over packet rate could enable
> > this feature, but we want to keep it off by default.
> > 
> > One can still enable it when creating the vdpa device with the vdpa
> > tool, by providing a feature mask that includes it.
> > 
> > For example:
> > $ vdpa dev add name vdpa0 mgmtdev pci/:86:00.2 device_features 
> > 0x300cb982b
> > 
> > 
> > Acked-by: Jason Wang 
> > Signed-off-by: Eli Cohen 
> 
> 
> I have a question. Could you guys try with the recent XDP mergeable
> buffer rework? Does this address the performance issue you are
> observing?
> 
> Specifically:
> 
> Message-Id: <20230322025701.2955-1-xuanz...@linux.alibaba.com>
> Message-Id: <20230322030308.16046-1-xuanz...@linux.alibaba.com>
> 
> I would very much appreciate it if you posted perf numbers
> in response to that thread - both to help evaluate that
> patchset and this one.
> Thanks!

Oh sorry, I got confused. That is still preparatory work, not real
zero copy. The patchset to try is this:

Message-Id: <20230202110058.130695-1-xuanz...@linux.alibaba.com>




> > ---
> >  drivers/vdpa/mlx5/net/mlx5_vnet.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > index 520646ae7fa0..502c482a93ce 100644
> > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > @@ -3122,6 +3122,8 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev 
> > *v_mdev, const char *name,
> > return -EINVAL;
> > }
> > device_features &= add_config->device_features;
> > +   } else {
> > +   device_features &= ~BIT_ULL(VIRTIO_NET_F_MRG_RXBUF);
> > }
> > if (!(device_features & BIT_ULL(VIRTIO_F_VERSION_1) &&
> >   device_features & BIT_ULL(VIRTIO_F_ACCESS_PLATFORM))) {
> > -- 
> > 2.38.1



Re: [PATCH net-next 0/8] virtio_net: refactor xdp codes

2023-03-21 Thread Michael S. Tsirkin
On Wed, Mar 22, 2023 at 11:03:00AM +0800, Xuan Zhuo wrote:
> Due to historical reasons, the implementation of XDP in virtio-net is
> relatively chaotic. For example, the processing of XDP actions has two
> copies of similar code, such as the page/xdp_page handling.
> 
> The purpose of this patch set is to refactor this code and reduce the
> difficulty of subsequent maintenance, so that later developers do not
> introduce new bugs because of complex logical relationships.
> 
> In addition, the AF_XDP support that I want to submit later will also
> need to reuse the XDP logic, such as the processing of actions; I don't
> want to introduce new, similar code. This way, I can reuse this code in
> the future.
> 
> Please review.
> 
> Thanks.

I really want to see that code make progress though.
Would it make sense to merge this one through the virtio tree?

Then you will have all the pieces in one place and can try to target
the next Linux release.


> Xuan Zhuo (8):
>   virtio_net: mergeable xdp: put old page immediately
>   virtio_net: mergeable xdp: introduce mergeable_xdp_prepare
>   virtio_net: introduce virtnet_xdp_handler() to separate the logic of
> run xdp
>   virtio_net: separate the logic of freeing xdp shinfo
>   virtio_net: separate the logic of freeing the rest mergeable buf
>   virtio_net: auto release xdp shinfo
>   virtio_net: introduce receive_mergeable_xdp()
>   virtio_net: introduce receive_small_xdp()
> 
>  drivers/net/virtio_net.c | 615 +++
>  1 file changed, 357 insertions(+), 258 deletions(-)
> 
> --
> 2.32.0.3.g01195cf9f



Re: [PATCH v4 1/2] vdpa/mlx5: Make VIRTIO_NET_F_MRG_RXBUF off by default

2023-03-21 Thread Michael S. Tsirkin
On Tue, Mar 21, 2023 at 01:28:08PM +0200, Eli Cohen wrote:
> The following patch adds driver support for VIRTIO_NET_F_MRG_RXBUF.
> 
> Current firmware versions show degradation in packet rate when using
> MRG_RXBUF. Users who favor memory saving over packet rate could enable
> this feature, but we want to keep it off by default.
> 
> One can still enable it when creating the vdpa device with the vdpa
> tool, by providing a feature mask that includes it.
> 
> For example:
> $ vdpa dev add name vdpa0 mgmtdev pci/:86:00.2 device_features 0x300cb982b
> 
> 
> Acked-by: Jason Wang 
> Signed-off-by: Eli Cohen 


I have a question. Could you guys try with the recent XDP mergeable
buffer rework? Does this address the performance issue you are
observing?

Specifically:

Message-Id: <20230322025701.2955-1-xuanz...@linux.alibaba.com>
Message-Id: <20230322030308.16046-1-xuanz...@linux.alibaba.com>

I would very much appreciate it if you posted perf numbers
in response to that thread - both to help evaluate that
patchset and this one.
Thanks!

> ---
>  drivers/vdpa/mlx5/net/mlx5_vnet.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index 520646ae7fa0..502c482a93ce 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -3122,6 +3122,8 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev 
> *v_mdev, const char *name,
>   return -EINVAL;
>   }
>   device_features &= add_config->device_features;
> + } else {
> + device_features &= ~BIT_ULL(VIRTIO_NET_F_MRG_RXBUF);
>   }
>   if (!(device_features & BIT_ULL(VIRTIO_F_VERSION_1) &&
> device_features & BIT_ULL(VIRTIO_F_ACCESS_PLATFORM))) {
> -- 
> 2.38.1



[PATCH net-next 8/8] virtio_net: introduce receive_small_xdp()

2023-03-21 Thread Xuan Zhuo
The purpose of this patch is to simplify receive_small() by separating
all of its XDP logic into a dedicated function.

Signed-off-by: Xuan Zhuo 
---
 drivers/net/virtio_net.c | 165 +++
 1 file changed, 97 insertions(+), 68 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 6ecb17e972e1..23b1a6fd1224 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -939,6 +939,96 @@ static struct page *xdp_linearize_page(struct 
receive_queue *rq,
return NULL;
 }
 
+static struct sk_buff *receive_small_xdp(struct net_device *dev,
+struct virtnet_info *vi,
+struct receive_queue *rq,
+struct bpf_prog *xdp_prog,
+void *buf,
+void *ctx,
+unsigned int len,
+unsigned int *xdp_xmit,
+struct virtnet_rq_stats *stats)
+{
+   unsigned int xdp_headroom = (unsigned long)ctx;
+   unsigned int header_offset = VIRTNET_RX_PAD + xdp_headroom;
+   unsigned int headroom = vi->hdr_len + header_offset;
+   struct virtio_net_hdr_mrg_rxbuf *hdr = buf + header_offset;
+   struct page *page = virt_to_head_page(buf);
+   struct page *xdp_page;
+   unsigned int buflen;
+   struct xdp_buff xdp;
+   struct sk_buff *skb;
+   unsigned int delta = 0;
+   unsigned int metasize = 0;
+   void *orig_data;
+   u32 act;
+
+   if (unlikely(hdr->hdr.gso_type))
+   goto err_xdp;
+
+   if (unlikely(xdp_headroom < virtnet_get_headroom(vi))) {
+   int offset = buf - page_address(page) + header_offset;
+   unsigned int tlen = len + vi->hdr_len;
+   int num_buf = 1;
+
+   xdp_headroom = virtnet_get_headroom(vi);
+   header_offset = VIRTNET_RX_PAD + xdp_headroom;
+   headroom = vi->hdr_len + header_offset;
+   buflen = SKB_DATA_ALIGN(GOOD_PACKET_LEN + headroom) +
+   SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+   xdp_page = xdp_linearize_page(rq, &num_buf, page,
+ offset, header_offset,
+ &tlen);
+   if (!xdp_page)
+   goto err_xdp;
+
+   buf = page_address(xdp_page);
+   put_page(page);
+   page = xdp_page;
+   }
+
+   xdp_init_buff(&xdp, buflen, &rq->xdp_rxq);
+   xdp_prepare_buff(&xdp, buf + VIRTNET_RX_PAD + vi->hdr_len,
+xdp_headroom, len, true);
+   orig_data = xdp.data;
+
+   act = virtnet_xdp_handler(xdp_prog, &xdp, dev, xdp_xmit, stats);
+
+   switch (act) {
+   case VIRTNET_XDP_RES_PASS:
+   /* Recalculate length in case bpf program changed it */
+   delta = orig_data - xdp.data;
+   len = xdp.data_end - xdp.data;
+   metasize = xdp.data - xdp.data_meta;
+   break;
+
+   case VIRTNET_XDP_RES_CONSUMED:
+   goto xdp_xmit;
+
+   case VIRTNET_XDP_RES_DROP:
+   goto err_xdp;
+   }
+
+   skb = build_skb(buf, buflen);
+   if (!skb)
+   goto err;
+
+   skb_reserve(skb, headroom - delta);
+   skb_put(skb, len);
+   if (metasize)
+   skb_metadata_set(skb, metasize);
+
+   return skb;
+
+err_xdp:
+   stats->xdp_drops++;
+err:
+   stats->drops++;
+   put_page(page);
+xdp_xmit:
+   return NULL;
+}
+
 static struct sk_buff *receive_small(struct net_device *dev,
 struct virtnet_info *vi,
 struct receive_queue *rq,
@@ -949,15 +1039,11 @@ static struct sk_buff *receive_small(struct net_device 
*dev,
 {
struct sk_buff *skb;
struct bpf_prog *xdp_prog;
-   unsigned int xdp_headroom = (unsigned long)ctx;
-   unsigned int header_offset = VIRTNET_RX_PAD + xdp_headroom;
+   unsigned int header_offset = VIRTNET_RX_PAD;
unsigned int headroom = vi->hdr_len + header_offset;
unsigned int buflen = SKB_DATA_ALIGN(GOOD_PACKET_LEN + headroom) +
  SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
struct page *page = virt_to_head_page(buf);
-   unsigned int delta = 0;
-   struct page *xdp_page;
-   unsigned int metasize = 0;
 
len -= vi->hdr_len;
stats->bytes += len;
@@ -977,57 +1063,9 @@ static struct sk_buff *receive_small(struct net_device 
*dev,
rcu_read_lock();
xdp_prog = rcu_dereference(rq->xdp_prog);
if (xdp_prog) {
-   struct virtio_net_hdr_mrg_rxbuf *hdr = buf + header_offset;
-   struct xdp_buff xdp;
-   void 
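
(The quoted hunk above is truncated in the archive. As a sketch only,
not the literal patch text, the end state is that receive_small() simply
dispatches to the new helper when an XDP program is attached:)

	rcu_read_lock();
	xdp_prog = rcu_dereference(rq->xdp_prog);
	if (xdp_prog) {
		struct sk_buff *skb;

		/* all XDP handling now lives in receive_small_xdp() */
		skb = receive_small_xdp(dev, vi, rq, xdp_prog, buf, ctx,
					len, xdp_xmit, stats);
		rcu_read_unlock();
		return skb;
	}
	rcu_read_unlock();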

[PATCH net-next 6/8] virtio_net: auto release xdp shinfo

2023-03-21 Thread Xuan Zhuo
Make virtnet_build_xdp_buff_mrg() and virtnet_xdp_handler() release the
xdp shinfo automatically, so the caller no longer needs to take care of it.

Signed-off-by: Xuan Zhuo 
---
 drivers/net/virtio_net.c | 29 +
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index a3f2bcb3db27..136131a7868a 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -833,14 +833,14 @@ static int virtnet_xdp_handler(struct bpf_prog *xdp_prog, 
struct xdp_buff *xdp,
stats->xdp_tx++;
xdpf = xdp_convert_buff_to_frame(xdp);
if (unlikely(!xdpf))
-   return VIRTNET_XDP_RES_DROP;
+   goto drop;
 
err = virtnet_xdp_xmit(dev, 1, &xdpf, 0);
if (unlikely(!err)) {
xdp_return_frame_rx_napi(xdpf);
} else if (unlikely(err < 0)) {
trace_xdp_exception(dev, xdp_prog, act);
-   return VIRTNET_XDP_RES_DROP;
+   goto drop;
}
 
*xdp_xmit |= VIRTIO_XDP_TX;
@@ -850,7 +850,7 @@ static int virtnet_xdp_handler(struct bpf_prog *xdp_prog, 
struct xdp_buff *xdp,
stats->xdp_redirects++;
err = xdp_do_redirect(dev, xdp, xdp_prog);
if (err)
-   return VIRTNET_XDP_RES_DROP;
+   goto drop;
 
*xdp_xmit |= VIRTIO_XDP_REDIR;
return VIRTNET_XDP_RES_CONSUMED;
@@ -862,8 +862,12 @@ static int virtnet_xdp_handler(struct bpf_prog *xdp_prog, 
struct xdp_buff *xdp,
trace_xdp_exception(dev, xdp_prog, act);
fallthrough;
case XDP_DROP:
-   return VIRTNET_XDP_RES_DROP;
+   goto drop;
}
+
+drop:
+   put_xdp_frags(xdp);
+   return VIRTNET_XDP_RES_DROP;
 }
 
 static unsigned int virtnet_get_headroom(struct virtnet_info *vi)
@@ -1199,7 +1203,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device 
*dev,
 dev->name, *num_buf,
 virtio16_to_cpu(vi->vdev, hdr->num_buffers));
dev->stats.rx_length_errors++;
-   return -EINVAL;
+   goto err;
}
 
stats->bytes += len;
@@ -1218,7 +1222,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device 
*dev,
pr_debug("%s: rx error: len %u exceeds truesize %lu\n",
 dev->name, len, (unsigned long)(truesize - 
room));
dev->stats.rx_length_errors++;
-   return -EINVAL;
+   goto err;
}
 
frag = &shinfo->frags[shinfo->nr_frags++];
@@ -1233,6 +1237,10 @@ static int virtnet_build_xdp_buff_mrg(struct net_device 
*dev,
 
*xdp_frags_truesize = xdp_frags_truesz;
return 0;
+
+err:
+   put_xdp_frags(xdp);
+   return -EINVAL;
 }
 
 static void *mergeable_xdp_prepare(struct virtnet_info *vi,
@@ -1361,7 +1369,7 @@ static struct sk_buff *receive_mergeable(struct 
net_device *dev,
err = virtnet_build_xdp_buff_mrg(dev, vi, rq, &xdp, data, len, frame_sz,
 &num_buf, &xdp_frags_truesz, stats);
if (unlikely(err))
-   goto err_xdp_frags;
+   goto err_xdp;
 
act = virtnet_xdp_handler(xdp_prog, &xdp, dev, xdp_xmit, stats);
 
@@ -1369,7 +1377,7 @@ static struct sk_buff *receive_mergeable(struct 
net_device *dev,
case VIRTNET_XDP_RES_PASS:
head_skb = build_skb_from_xdp_buff(dev, vi, &xdp, xdp_frags_truesz);
if (unlikely(!head_skb))
-   goto err_xdp_frags;
+   goto err_xdp;
 
rcu_read_unlock();
return head_skb;
@@ -1379,11 +1387,8 @@ static struct sk_buff *receive_mergeable(struct 
net_device *dev,
goto xdp_xmit;
 
case VIRTNET_XDP_RES_DROP:
-   goto err_xdp_frags;
+   goto err_xdp;
}
-err_xdp_frags:
-   put_xdp_frags(&xdp);
-   goto err_xdp;
}
rcu_read_unlock();
 
-- 
2.32.0.3.g01195cf9f
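
(For illustration, a sketch of the caller-side contract this patch
establishes, with names taken from the patch itself: on a DROP result
the frags have already been released inside the helpers, so the caller
only owns the head page.)

	act = virtnet_xdp_handler(xdp_prog, &xdp, dev, xdp_xmit, stats);

	switch (act) {
	case VIRTNET_XDP_RES_PASS:
		/* caller consumes the xdp_buff and builds an skb */
		break;
	case VIRTNET_XDP_RES_CONSUMED:
		return NULL;	/* packet owned by XDP, nothing to free */
	case VIRTNET_XDP_RES_DROP:
		goto err_xdp;	/* frags already freed; put the head page */
	}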



[PATCH net-next 2/8] virtio_net: mergeable xdp: introduce mergeable_xdp_prepare

2023-03-21 Thread Xuan Zhuo
Separate the logic of preparing for XDP out of receive_mergeable().

The purpose of this is to simplify the XDP execution logic.

The main logic here is that when the headroom is insufficient, we need to
allocate a new page and recalculate the offset. Note that if a new page is
allocated, the variable "page" will refer to the new page.

Signed-off-by: Xuan Zhuo 
---
 drivers/net/virtio_net.c | 135 ++-
 1 file changed, 77 insertions(+), 58 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 4d2bf1ce0730..bb426958cdd4 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1162,6 +1162,79 @@ static int virtnet_build_xdp_buff_mrg(struct net_device 
*dev,
return 0;
 }
 
+static void *mergeable_xdp_prepare(struct virtnet_info *vi,
+  struct receive_queue *rq,
+  struct bpf_prog *xdp_prog,
+  void *ctx,
+  unsigned int *frame_sz,
+  int *num_buf,
+  struct page **page,
+  int offset,
+  unsigned int *len,
+  struct virtio_net_hdr_mrg_rxbuf *hdr)
+{
+   unsigned int truesize = mergeable_ctx_to_truesize(ctx);
+   unsigned int headroom = mergeable_ctx_to_headroom(ctx);
+   struct page *xdp_page;
+   unsigned int xdp_room;
+
+   /* Transient failure which in theory could occur if
+* in-flight packets from before XDP was enabled reach
+* the receive path after XDP is loaded.
+*/
+   if (unlikely(hdr->hdr.gso_type))
+   return NULL;
+
+   /* Now XDP core assumes frag size is PAGE_SIZE, but buffers
+* with headroom may add hole in truesize, which
+* make their length exceed PAGE_SIZE. So we disabled the
+* hole mechanism for xdp. See add_recvbuf_mergeable().
+*/
+   *frame_sz = truesize;
+
+   /* This happens when headroom is not enough because
+* of the buffer was prefilled before XDP is set.
+* This should only happen for the first several packets.
+* In fact, vq reset can be used here to help us clean up
+* the prefilled buffers, but many existing devices do not
+* support it, and we don't want to bother users who are
+* using xdp normally.
+*/
+   if (!xdp_prog->aux->xdp_has_frags &&
+   (*num_buf > 1 || headroom < virtnet_get_headroom(vi))) {
+   /* linearize data for XDP */
+   xdp_page = xdp_linearize_page(rq, num_buf,
+ *page, offset,
+ VIRTIO_XDP_HEADROOM,
+ len);
+
+   if (!xdp_page)
+   return NULL;
+   } else if (unlikely(headroom < virtnet_get_headroom(vi))) {
+   xdp_room = SKB_DATA_ALIGN(VIRTIO_XDP_HEADROOM +
+ sizeof(struct skb_shared_info));
+   if (*len + xdp_room > PAGE_SIZE)
+   return NULL;
+
+   xdp_page = alloc_page(GFP_ATOMIC);
+   if (!xdp_page)
+   return NULL;
+
+   memcpy(page_address(xdp_page) + VIRTIO_XDP_HEADROOM,
+  page_address(*page) + offset, *len);
+   } else {
+   return page_address(*page) + offset;
+   }
+
+   *frame_sz = PAGE_SIZE;
+
+   put_page(*page);
+
+   *page = xdp_page;
+
+   return page_address(xdp_page) + VIRTIO_XDP_HEADROOM;
+}
+
 static struct sk_buff *receive_mergeable(struct net_device *dev,
 struct virtnet_info *vi,
 struct receive_queue *rq,
@@ -1181,7 +1254,7 @@ static struct sk_buff *receive_mergeable(struct 
net_device *dev,
unsigned int headroom = mergeable_ctx_to_headroom(ctx);
unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
unsigned int room = SKB_DATA_ALIGN(headroom + tailroom);
-   unsigned int frame_sz, xdp_room;
+   unsigned int frame_sz;
int err;
 
head_skb = NULL;
@@ -1211,65 +1284,11 @@ static struct sk_buff *receive_mergeable(struct 
net_device *dev,
u32 act;
int i;
 
-   /* Transient failure which in theory could occur if
-* in-flight packets from before XDP was enabled reach
-* the receive path after XDP is loaded.
-*/
-   if (unlikely(hdr->hdr.gso_type))
+   data = mergeable_xdp_prepare(vi, rq, xdp_prog, ctx, &frame_sz,
+&num_buf, &page, offset, &len, hdr);
+   if (!data)
goto err_xdp;
 
-  

[PATCH net-next 4/8] virtio_net: separate the logic of freeing xdp shinfo

2023-03-21 Thread Xuan Zhuo
This patch introduces a new function that releases the xdp shinfo.
A subsequent patch will reuse this function.

Signed-off-by: Xuan Zhuo 
---
 drivers/net/virtio_net.c | 27 ---
 1 file changed, 16 insertions(+), 11 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 72b9d6ee4024..09aed60e2f51 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -798,6 +798,21 @@ static int virtnet_xdp_xmit(struct net_device *dev,
return ret;
 }
 
+static void put_xdp_frags(struct xdp_buff *xdp)
+{
+   struct skb_shared_info *shinfo;
+   struct page *xdp_page;
+   int i;
+
+   if (xdp_buff_has_frags(xdp)) {
+   shinfo = xdp_get_shared_info_from_buff(xdp);
+   for (i = 0; i < shinfo->nr_frags; i++) {
xdp_page = skb_frag_page(&shinfo->frags[i]);
+   put_page(xdp_page);
+   }
+   }
+}
+
 static int virtnet_xdp_handler(struct bpf_prog *xdp_prog, struct xdp_buff *xdp,
   struct net_device *dev,
   unsigned int *xdp_xmit,
@@ -1312,12 +1327,9 @@ static struct sk_buff *receive_mergeable(struct 
net_device *dev,
xdp_prog = rcu_dereference(rq->xdp_prog);
if (xdp_prog) {
unsigned int xdp_frags_truesz = 0;
-   struct skb_shared_info *shinfo;
-   struct page *xdp_page;
struct xdp_buff xdp;
void *data;
u32 act;
-   int i;
 
data = mergeable_xdp_prepare(vi, rq, xdp_prog, ctx, &frame_sz,
 &num_buf, &page, offset, &len, hdr);
@@ -1348,14 +1360,7 @@ static struct sk_buff *receive_mergeable(struct 
net_device *dev,
goto err_xdp_frags;
}
 err_xdp_frags:
-   if (xdp_buff_has_frags(&xdp)) {
-   shinfo = xdp_get_shared_info_from_buff(&xdp);
-   for (i = 0; i < shinfo->nr_frags; i++) {
-   xdp_page = skb_frag_page(&shinfo->frags[i]);
-   put_page(xdp_page);
-   }
-   }
-
+   put_xdp_frags(&xdp);
goto err_xdp;
}
rcu_read_unlock();
-- 
2.32.0.3.g01195cf9f



[PATCH net-next 1/8] virtio_net: mergeable xdp: put old page immediately

2023-03-21 Thread Xuan Zhuo
In the mergeable XDP implementation of virtio-net, the code always checks
whether two pages are in use and selects one page to release. This
complicates the processing of each action and is error-prone.

In the entire process, we have these principles:
* If xdp_page is used (PASS, TX, REDIRECT), then we release the old
  page.
* In the drop case, we release two pages: the old page obtained from
  buf is released inside err_xdp, and xdp_page needs to be released by us.

But in fact, when we allocate a new page, we can release the old page
immediately. Then only one page is in use, and we just need to release the
new page in the drop case. On the drop path, err_xdp releases the variable
"page", so we only need to make "page" point to the new xdp_page in
advance.

Signed-off-by: Xuan Zhuo 
---
 drivers/net/virtio_net.c | 15 ++-
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index e2560b6f7980..4d2bf1ce0730 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1245,6 +1245,9 @@ static struct sk_buff *receive_mergeable(struct 
net_device *dev,
if (!xdp_page)
goto err_xdp;
offset = VIRTIO_XDP_HEADROOM;
+
+   put_page(page);
+   page = xdp_page;
} else if (unlikely(headroom < virtnet_get_headroom(vi))) {
xdp_room = SKB_DATA_ALIGN(VIRTIO_XDP_HEADROOM +
  sizeof(struct 
skb_shared_info));
@@ -1259,6 +1262,9 @@ static struct sk_buff *receive_mergeable(struct 
net_device *dev,
   page_address(page) + offset, len);
frame_sz = PAGE_SIZE;
offset = VIRTIO_XDP_HEADROOM;
+
+   put_page(page);
+   page = xdp_page;
} else {
xdp_page = page;
}
@@ -1278,8 +1284,6 @@ static struct sk_buff *receive_mergeable(struct 
net_device *dev,
if (unlikely(!head_skb))
goto err_xdp_frags;
 
-   if (unlikely(xdp_page != page))
-   put_page(page);
rcu_read_unlock();
return head_skb;
case XDP_TX:
@@ -1297,8 +1301,6 @@ static struct sk_buff *receive_mergeable(struct 
net_device *dev,
goto err_xdp_frags;
}
*xdp_xmit |= VIRTIO_XDP_TX;
-   if (unlikely(xdp_page != page))
-   put_page(page);
rcu_read_unlock();
goto xdp_xmit;
case XDP_REDIRECT:
@@ -1307,8 +1309,6 @@ static struct sk_buff *receive_mergeable(struct 
net_device *dev,
if (err)
goto err_xdp_frags;
*xdp_xmit |= VIRTIO_XDP_REDIR;
-   if (unlikely(xdp_page != page))
-   put_page(page);
rcu_read_unlock();
goto xdp_xmit;
default:
@@ -1321,9 +1321,6 @@ static struct sk_buff *receive_mergeable(struct 
net_device *dev,
goto err_xdp_frags;
}
 err_xdp_frags:
-   if (unlikely(xdp_page != page))
-   __free_pages(xdp_page, 0);
-
if (xdp_buff_has_frags(&xdp)) {
shinfo = xdp_get_shared_info_from_buff(&xdp);
for (i = 0; i < shinfo->nr_frags; i++) {
-- 
2.32.0.3.g01195cf9f
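
(A sketch of the invariant after this patch, condensed from the diff
above: once a replacement page is allocated, the old one is dropped on
the spot, so every later path, including err_xdp, manages exactly one
page.)

	xdp_page = alloc_page(GFP_ATOMIC);	/* or xdp_linearize_page() */
	if (!xdp_page)
		goto err_xdp;			/* frees "page" */

	memcpy(page_address(xdp_page) + VIRTIO_XDP_HEADROOM,
	       page_address(page) + offset, len);
	put_page(page);				/* release the old page now */
	page = xdp_page;			/* err_xdp now frees xdp_page */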



[PATCH net-next 7/8] virtio_net: introduce receive_mergeable_xdp()

2023-03-21 Thread Xuan Zhuo
The purpose of this patch is to simplify receive_mergeable() by
separating all of its XDP logic into a dedicated function.

Signed-off-by: Xuan Zhuo 
---
 drivers/net/virtio_net.c | 128 +++
 1 file changed, 76 insertions(+), 52 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 136131a7868a..6ecb17e972e1 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1316,6 +1316,63 @@ static void *mergeable_xdp_prepare(struct virtnet_info 
*vi,
return page_address(xdp_page) + VIRTIO_XDP_HEADROOM;
 }
 
+static struct sk_buff *receive_mergeable_xdp(struct net_device *dev,
+struct virtnet_info *vi,
+struct receive_queue *rq,
+struct bpf_prog *xdp_prog,
+void *buf,
+void *ctx,
+unsigned int len,
+unsigned int *xdp_xmit,
+struct virtnet_rq_stats *stats)
+{
+   struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
+   int num_buf = virtio16_to_cpu(vi->vdev, hdr->num_buffers);
+   struct page *page = virt_to_head_page(buf);
+   int offset = buf - page_address(page);
+   unsigned int xdp_frags_truesz = 0;
+   struct sk_buff *head_skb;
+   unsigned int frame_sz;
+   struct xdp_buff xdp;
+   void *data;
+   u32 act;
+   int err;
+
+   data = mergeable_xdp_prepare(vi, rq, xdp_prog, ctx, &frame_sz,
+&num_buf, &page, offset, &len, hdr);
+   if (!data)
+   goto err_xdp;
+
+   err = virtnet_build_xdp_buff_mrg(dev, vi, rq, &xdp, data, len, frame_sz,
+&num_buf, &xdp_frags_truesz, stats);
+   if (unlikely(err))
+   goto err_xdp;
+
+   act = virtnet_xdp_handler(xdp_prog, &xdp, dev, xdp_xmit, stats);
+
+   switch (act) {
+   case VIRTNET_XDP_RES_PASS:
+   head_skb = build_skb_from_xdp_buff(dev, vi, &xdp, xdp_frags_truesz);
+   if (unlikely(!head_skb))
+   goto err_xdp;
+   return head_skb;
+
+   case VIRTNET_XDP_RES_CONSUMED:
+   return NULL;
+
+   case VIRTNET_XDP_RES_DROP:
+   break;
+   }
+
+err_xdp:
+   put_page(page);
+   mergeable_buf_free(rq, num_buf, dev, stats);
+
+   stats->xdp_drops++;
+   stats->drops++;
+   return NULL;
+}
+
 static struct sk_buff *receive_mergeable(struct net_device *dev,
 struct virtnet_info *vi,
 struct receive_queue *rq,
@@ -1325,18 +1382,16 @@ static struct sk_buff *receive_mergeable(struct 
net_device *dev,
 unsigned int *xdp_xmit,
 struct virtnet_rq_stats *stats)
 {
-   struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
-   int num_buf = virtio16_to_cpu(vi->vdev, hdr->num_buffers);
-   struct page *page = virt_to_head_page(buf);
-   int offset = buf - page_address(page);
-   struct sk_buff *head_skb, *curr_skb;
-   struct bpf_prog *xdp_prog;
unsigned int truesize = mergeable_ctx_to_truesize(ctx);
unsigned int headroom = mergeable_ctx_to_headroom(ctx);
unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
unsigned int room = SKB_DATA_ALIGN(headroom + tailroom);
-   unsigned int frame_sz;
-   int err;
+   struct virtio_net_hdr_mrg_rxbuf *hdr;
+   struct sk_buff *head_skb, *curr_skb;
+   struct bpf_prog *xdp_prog;
+   struct page *page;
+   int num_buf;
+   int offset;
 
head_skb = NULL;
stats->bytes += len - vi->hdr_len;
@@ -1348,51 +1403,24 @@ static struct sk_buff *receive_mergeable(struct 
net_device *dev,
goto err_skb;
}
 
-   if (likely(!vi->xdp_enabled)) {
-   xdp_prog = NULL;
-   goto skip_xdp;
-   }
-
-   rcu_read_lock();
-   xdp_prog = rcu_dereference(rq->xdp_prog);
-   if (xdp_prog) {
-   unsigned int xdp_frags_truesz = 0;
-   struct xdp_buff xdp;
-   void *data;
-   u32 act;
-
-   data = mergeable_xdp_prepare(vi, rq, xdp_prog, ctx, &frame_sz,
-&num_buf, &page, offset, &len, hdr);
-   if (!data)
-   goto err_xdp;
-
-   err = virtnet_build_xdp_buff_mrg(dev, vi, rq, &xdp, data, len, frame_sz,
-&num_buf, &xdp_frags_truesz, stats);
-   if (unlikely(err))
-   goto err_xdp;
-
-   act = virtnet_xdp_handler(xdp_prog, &xdp, dev, xdp_xmit, stats);
-
-   switch (act) {
-   
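
(The quoted hunk is truncated in the archive. A sketch, not the literal
patch text, of how receive_mergeable() ends up dispatching after this
change:)

	if (unlikely(vi->xdp_enabled)) {
		struct bpf_prog *xdp_prog;

		rcu_read_lock();
		xdp_prog = rcu_dereference(rq->xdp_prog);
		if (xdp_prog) {
			head_skb = receive_mergeable_xdp(dev, vi, rq, xdp_prog,
							 buf, ctx, len,
							 xdp_xmit, stats);
			rcu_read_unlock();
			return head_skb;
		}
		rcu_read_unlock();
	}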

[PATCH net-next 0/8] virtio_net: refactor xdp codes

2023-03-21 Thread Xuan Zhuo
Due to historical reasons, the implementation of XDP in virtio-net is
relatively chaotic. For example, the processing of XDP actions has two copies
of similar code, such as the page/xdp_page handling.

The purpose of this patch set is to refactor this code and reduce the
difficulty of subsequent maintenance, so that later developers do not
introduce new bugs because of complex logical relationships.

In addition, the AF_XDP support that I want to submit later will also need to
reuse the XDP logic, such as the processing of actions; I don't want to
introduce new, similar code. This way, I can reuse this code in the future.

Please review.

Thanks.

Xuan Zhuo (8):
  virtio_net: mergeable xdp: put old page immediately
  virtio_net: mergeable xdp: introduce mergeable_xdp_prepare
  virtio_net: introduce virtnet_xdp_handler() to separate the logic of
run xdp
  virtio_net: separate the logic of freeing xdp shinfo
  virtio_net: separate the logic of freeing the rest mergeable buf
  virtio_net: auto release xdp shinfo
  virtio_net: introduce receive_mergeable_xdp()
  virtio_net: introduce receive_small_xdp()

 drivers/net/virtio_net.c | 615 +++
 1 file changed, 357 insertions(+), 258 deletions(-)

--
2.32.0.3.g01195cf9f



[PATCH net-next 3/8] virtio_net: introduce virtnet_xdp_handler() to separate the logic of run xdp

2023-03-21 Thread Xuan Zhuo
At present, we have two similar pieces of logic that run the XDP prog.

Therefore, this patch separates the code that executes XDP, which helps
later maintenance.

Signed-off-by: Xuan Zhuo 
---
 drivers/net/virtio_net.c | 142 +--
 1 file changed, 75 insertions(+), 67 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index bb426958cdd4..72b9d6ee4024 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -301,6 +301,15 @@ struct padded_vnet_hdr {
char padding[12];
 };
 
+enum {
+   /* xdp pass */
+   VIRTNET_XDP_RES_PASS,
+   /* drop packet. the caller needs to release the page. */
+   VIRTNET_XDP_RES_DROP,
+   /* packet is consumed by xdp. the caller needs to do nothing. */
+   VIRTNET_XDP_RES_CONSUMED,
+};
+
 static void virtnet_rq_free_unused_buf(struct virtqueue *vq, void *buf);
 static void virtnet_sq_free_unused_buf(struct virtqueue *vq, void *buf);
 
@@ -789,6 +798,59 @@ static int virtnet_xdp_xmit(struct net_device *dev,
return ret;
 }
 
+static int virtnet_xdp_handler(struct bpf_prog *xdp_prog, struct xdp_buff *xdp,
+  struct net_device *dev,
+  unsigned int *xdp_xmit,
+  struct virtnet_rq_stats *stats)
+{
+   struct xdp_frame *xdpf;
+   int err;
+   u32 act;
+
+   act = bpf_prog_run_xdp(xdp_prog, xdp);
+   stats->xdp_packets++;
+
+   switch (act) {
+   case XDP_PASS:
+   return VIRTNET_XDP_RES_PASS;
+
+   case XDP_TX:
+   stats->xdp_tx++;
+   xdpf = xdp_convert_buff_to_frame(xdp);
+   if (unlikely(!xdpf))
+   return VIRTNET_XDP_RES_DROP;
+
+   err = virtnet_xdp_xmit(dev, 1, &xdpf, 0);
+   if (unlikely(!err)) {
+   xdp_return_frame_rx_napi(xdpf);
+   } else if (unlikely(err < 0)) {
+   trace_xdp_exception(dev, xdp_prog, act);
+   return VIRTNET_XDP_RES_DROP;
+   }
+
+   *xdp_xmit |= VIRTIO_XDP_TX;
+   return VIRTNET_XDP_RES_CONSUMED;
+
+   case XDP_REDIRECT:
+   stats->xdp_redirects++;
+   err = xdp_do_redirect(dev, xdp, xdp_prog);
+   if (err)
+   return VIRTNET_XDP_RES_DROP;
+
+   *xdp_xmit |= VIRTIO_XDP_REDIR;
+   return VIRTNET_XDP_RES_CONSUMED;
+
+   default:
+   bpf_warn_invalid_xdp_action(dev, xdp_prog, act);
+   fallthrough;
+   case XDP_ABORTED:
+   trace_xdp_exception(dev, xdp_prog, act);
+   fallthrough;
+   case XDP_DROP:
+   return VIRTNET_XDP_RES_DROP;
+   }
+}
+
 static unsigned int virtnet_get_headroom(struct virtnet_info *vi)
 {
return vi->xdp_enabled ? VIRTIO_XDP_HEADROOM : 0;
@@ -876,7 +938,6 @@ static struct sk_buff *receive_small(struct net_device *dev,
struct page *page = virt_to_head_page(buf);
unsigned int delta = 0;
struct page *xdp_page;
-   int err;
unsigned int metasize = 0;
 
len -= vi->hdr_len;
@@ -898,7 +959,6 @@ static struct sk_buff *receive_small(struct net_device *dev,
xdp_prog = rcu_dereference(rq->xdp_prog);
if (xdp_prog) {
struct virtio_net_hdr_mrg_rxbuf *hdr = buf + header_offset;
-   struct xdp_frame *xdpf;
struct xdp_buff xdp;
void *orig_data;
u32 act;
@@ -931,46 +991,22 @@ static struct sk_buff *receive_small(struct net_device 
*dev,
xdp_prepare_buff(&xdp, buf + VIRTNET_RX_PAD + vi->hdr_len,
 xdp_headroom, len, true);
orig_data = xdp.data;
-   act = bpf_prog_run_xdp(xdp_prog, &xdp);
-   stats->xdp_packets++;
+
+   act = virtnet_xdp_handler(xdp_prog, &xdp, dev, xdp_xmit, stats);
 
switch (act) {
-   case XDP_PASS:
+   case VIRTNET_XDP_RES_PASS:
/* Recalculate length in case bpf program changed it */
delta = orig_data - xdp.data;
len = xdp.data_end - xdp.data;
metasize = xdp.data - xdp.data_meta;
break;
-   case XDP_TX:
-   stats->xdp_tx++;
-   xdpf = xdp_convert_buff_to_frame(&xdp);
-   if (unlikely(!xdpf))
-   goto err_xdp;
-   err = virtnet_xdp_xmit(dev, 1, &xdpf, 0);
-   if (unlikely(!err)) {
-   xdp_return_frame_rx_napi(xdpf);
-   } else if (unlikely(err < 0)) {
-   trace_xdp_exception(vi->dev, xdp_prog, act);
-   goto err_xdp;
-   }
-   
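
(The remainder of this hunk is truncated in the archive. A sketch of the
missing tail, under the assumption that it mirrors the converted PASS
case shown above: the old XDP_TX/XDP_REDIRECT branches collapse into the
two remaining result codes.)

	case VIRTNET_XDP_RES_CONSUMED:
		rcu_read_unlock();
		goto xdp_xmit;

	case VIRTNET_XDP_RES_DROP:
		goto err_xdp;
	}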

[PATCH net-next 5/8] virtio_net: separate the logic of freeing the rest mergeable buf

2023-03-21 Thread Xuan Zhuo
This patch introduces a new function that frees the remaining mergeable
buffers. A subsequent patch will reuse this function.

Signed-off-by: Xuan Zhuo 
---
 drivers/net/virtio_net.c | 36 
 1 file changed, 24 insertions(+), 12 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 09aed60e2f51..a3f2bcb3db27 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1076,6 +1076,28 @@ static struct sk_buff *receive_big(struct net_device 
*dev,
return NULL;
 }
 
+static void mergeable_buf_free(struct receive_queue *rq, int num_buf,
+  struct net_device *dev,
+  struct virtnet_rq_stats *stats)
+{
+   struct page *page;
+   void *buf;
+   int len;
+
+   while (num_buf-- > 1) {
buf = virtqueue_get_buf(rq->vq, &len);
+   if (unlikely(!buf)) {
+   pr_debug("%s: rx error: %d buffers missing\n",
+dev->name, num_buf);
+   dev->stats.rx_length_errors++;
+   break;
+   }
+   stats->bytes += len;
+   page = virt_to_head_page(buf);
+   put_page(page);
+   }
+}
+
 /* Why not use xdp_build_skb_from_frame() ?
  * XDP core assumes that xdp frags are PAGE_SIZE in length, while in
  * virtio-net there are 2 points that do not match its requirements:
@@ -1436,18 +1458,8 @@ static struct sk_buff *receive_mergeable(struct 
net_device *dev,
stats->xdp_drops++;
 err_skb:
put_page(page);
-   while (num_buf-- > 1) {
-   buf = virtqueue_get_buf(rq->vq, &len);
-   if (unlikely(!buf)) {
-   pr_debug("%s: rx error: %d buffers missing\n",
-dev->name, num_buf);
-   dev->stats.rx_length_errors++;
-   break;
-   }
-   stats->bytes += len;
-   page = virt_to_head_page(buf);
-   put_page(page);
-   }
+   mergeable_buf_free(rq, num_buf, dev, stats);
+
 err_buf:
stats->drops++;
dev_kfree_skb(head_skb);
-- 
2.32.0.3.g01195cf9f



[PATCH vhost v4 10/11] virtio_ring: separate the logic of reset/enable from virtqueue_resize

2023-03-21 Thread Xuan Zhuo
The subsequent reset function will reuse this logic.

Signed-off-by: Xuan Zhuo 
Acked-by: Jason Wang 
---
 drivers/virtio/virtio_ring.c | 58 
 1 file changed, 39 insertions(+), 19 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index a31cf71c1689..bd79d681022b 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2158,6 +2158,43 @@ static int virtqueue_resize_packed(struct virtqueue 
*_vq, u32 num)
return -ENOMEM;
 }
 
+static int virtqueue_disable_and_recycle(struct virtqueue *_vq,
+void (*recycle)(struct virtqueue *vq, 
void *buf))
+{
+   struct vring_virtqueue *vq = to_vvq(_vq);
+   struct virtio_device *vdev = vq->vq.vdev;
+   void *buf;
+   int err;
+
+   if (!vq->we_own_ring)
+   return -EPERM;
+
+   if (!vdev->config->disable_vq_and_reset)
+   return -ENOENT;
+
+   if (!vdev->config->enable_vq_after_reset)
+   return -ENOENT;
+
+   err = vdev->config->disable_vq_and_reset(_vq);
+   if (err)
+   return err;
+
+   while ((buf = virtqueue_detach_unused_buf(_vq)) != NULL)
+   recycle(_vq, buf);
+
+   return 0;
+}
+
+static int virtqueue_enable_after_reset(struct virtqueue *_vq)
+{
+   struct vring_virtqueue *vq = to_vvq(_vq);
+   struct virtio_device *vdev = vq->vq.vdev;
+
+   if (vdev->config->enable_vq_after_reset(_vq))
+   return -EBUSY;
+
+   return 0;
+}
 
 /*
  * Generic functions and exported symbols.
@@ -2728,13 +2765,8 @@ int virtqueue_resize(struct virtqueue *_vq, u32 num,
 void (*recycle)(struct virtqueue *vq, void *buf))
 {
struct vring_virtqueue *vq = to_vvq(_vq);
-   struct virtio_device *vdev = vq->vq.vdev;
-   void *buf;
int err;
 
-   if (!vq->we_own_ring)
-   return -EPERM;
-
if (num > vq->vq.num_max)
return -E2BIG;
 
@@ -2744,28 +2776,16 @@ int virtqueue_resize(struct virtqueue *_vq, u32 num,
if ((vq->packed_ring ? vq->packed.vring.num : vq->split.vring.num) == 
num)
return 0;
 
-   if (!vdev->config->disable_vq_and_reset)
-   return -ENOENT;
-
-   if (!vdev->config->enable_vq_after_reset)
-   return -ENOENT;
-
-   err = vdev->config->disable_vq_and_reset(_vq);
+   err = virtqueue_disable_and_recycle(_vq, recycle);
if (err)
return err;
 
-   while ((buf = virtqueue_detach_unused_buf(_vq)) != NULL)
-   recycle(_vq, buf);
-
if (vq->packed_ring)
err = virtqueue_resize_packed(_vq, num);
else
err = virtqueue_resize_split(_vq, num);
 
-   if (vdev->config->enable_vq_after_reset(_vq))
-   return -EBUSY;
-
-   return err;
+   return virtqueue_enable_after_reset(_vq);
 }
 EXPORT_SYMBOL_GPL(virtqueue_resize);
 
-- 
2.32.0.3.g01195cf9f



[PATCH vhost v4 08/11] virtio_ring: introduce virtqueue_dma_dev()

2023-03-21 Thread Xuan Zhuo
Add virtqueue_dma_dev() to get the DMA device for virtio, so that the
caller can perform DMA operations in advance. The purpose is to keep
memory mapped across multiple add/get buf operations.

Signed-off-by: Xuan Zhuo 
Acked-by: Jason Wang 
---
 drivers/virtio/virtio.c  |  6 ++
 drivers/virtio/virtio_ring.c | 17 +
 include/linux/virtio.h   |  2 ++
 3 files changed, 25 insertions(+)

diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index 3893dc29eb26..11c5035369e2 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -1,4 +1,5 @@
 // SPDX-License-Identifier: GPL-2.0-only
+#include <linux/dma-mapping.h>
 #include <linux/virtio.h>
 #include <linux/spinlock.h>
 #include <linux/virtio_config.h>
@@ -243,6 +244,11 @@ static int virtio_dev_probe(struct device *_d)
u64 driver_features;
u64 driver_features_legacy;
 
+   _d->dma_mask = &_d->coherent_dma_mask;
+   err = dma_set_mask_and_coherent(_d, DMA_BIT_MASK(64));
+   if (err)
+   return err;
+
/* We have a driver! */
virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER);
 
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 3a558ca350c6..6b2334951e5f 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2299,6 +2299,23 @@ int virtqueue_add_inbuf_ctx(struct virtqueue *vq,
 }
 EXPORT_SYMBOL_GPL(virtqueue_add_inbuf_ctx);
 
+/**
+ * virtqueue_dma_dev - get the dma dev
+ * @_vq: the struct virtqueue we're talking about.
+ *
+ * Returns the dma dev. That can been used for dma api.
+ */
+struct device *virtqueue_dma_dev(struct virtqueue *_vq)
+{
+   struct vring_virtqueue *vq = to_vvq(_vq);
+
+   if (vq->use_dma_api)
+   return vring_dma_dev(vq);
+   else
return &vq->vq.vdev->dev;
+}
+EXPORT_SYMBOL_GPL(virtqueue_dma_dev);
+
 /**
  * virtqueue_kick_prepare - first half of split virtqueue_kick call.
  * @_vq: the struct virtqueue
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 2b472514c49b..1fa50191cf0a 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -61,6 +61,8 @@ int virtqueue_add_sgs(struct virtqueue *vq,
  void *data,
  gfp_t gfp);
 
+struct device *virtqueue_dma_dev(struct virtqueue *vq);
+
 bool virtqueue_kick(struct virtqueue *vq);
 
 bool virtqueue_kick_prepare(struct virtqueue *vq);
-- 
2.32.0.3.g01195cf9f
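
(An illustrative driver-side use of the new API, not part of the patch:
map a buffer once against the vq's DMA device and keep reusing the
address across add/get buf cycles.)

	struct device *dma_dev = virtqueue_dma_dev(vq);
	dma_addr_t addr;

	addr = dma_map_single(dma_dev, buf, size, DMA_FROM_DEVICE);
	if (dma_mapping_error(dma_dev, addr))
		return -ENOMEM;

	/* "addr" stays valid across many add/get buf operations and is
	 * unmapped by the driver, not by the virtio core. */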



[PATCH vhost v4 01/11] virtio_ring: split: separate dma codes

2023-03-21 Thread Xuan Zhuo
DMA-related logic is separated out of virtqueue_add_split() into a new
function. The DMA address will be saved as sg->dma_address if
use_dma_api is true, and virtqueue_add_split() will then use it
directly. The unmap operation also becomes simpler.

The purpose of this is to facilitate subsequent support for receiving
DMA addresses mapped by drivers.

Signed-off-by: Xuan Zhuo 
---
 drivers/virtio/virtio_ring.c | 121 +++
 1 file changed, 93 insertions(+), 28 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 41144b5246a8..fe704ca6c813 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -379,6 +379,14 @@ static dma_addr_t vring_map_one_sg(const struct 
vring_virtqueue *vq,
direction);
 }
 
+static dma_addr_t vring_sg_address(struct scatterlist *sg)
+{
+   if (sg->dma_address)
+   return sg->dma_address;
+
+   return (dma_addr_t)sg_phys(sg);
+}
+
 static dma_addr_t vring_map_single(const struct vring_virtqueue *vq,
   void *cpu_addr, size_t size,
   enum dma_data_direction direction)
@@ -520,6 +528,80 @@ static inline unsigned int virtqueue_add_desc_split(struct 
virtqueue *vq,
return next;
 }
 
+static void virtqueue_unmap_sgs(struct vring_virtqueue *vq,
+   struct scatterlist *sgs[],
+   unsigned int total_sg,
+   unsigned int out_sgs,
+   unsigned int in_sgs)
+{
+   struct scatterlist *sg;
+   unsigned int n;
+
+   if (!vq->use_dma_api)
+   return;
+
+   for (n = 0; n < out_sgs; n++) {
+   for (sg = sgs[n]; sg; sg = sg_next(sg)) {
+   if (!sg->dma_address)
+   return;
+
+   dma_unmap_page(vring_dma_dev(vq), sg->dma_address,
+  sg->length, DMA_TO_DEVICE);
+   }
+   }
+
+   for (; n < (out_sgs + in_sgs); n++) {
+   for (sg = sgs[n]; sg; sg = sg_next(sg)) {
+   if (!sg->dma_address)
+   return;
+
+   dma_unmap_page(vring_dma_dev(vq), sg->dma_address,
+  sg->length, DMA_FROM_DEVICE);
+   }
+   }
+}
+
+static int virtqueue_map_sgs(struct vring_virtqueue *vq,
+struct scatterlist *sgs[],
+unsigned int total_sg,
+unsigned int out_sgs,
+unsigned int in_sgs)
+{
+   struct scatterlist *sg;
+   unsigned int n;
+
+   if (!vq->use_dma_api)
+   return 0;
+
+   for (n = 0; n < out_sgs; n++) {
+   for (sg = sgs[n]; sg; sg = sg_next(sg)) {
+   dma_addr_t addr = vring_map_one_sg(vq, sg, 
DMA_TO_DEVICE);
+
+   if (vring_mapping_error(vq, addr))
+   goto err;
+
+   sg->dma_address = addr;
+   }
+   }
+
+   for (; n < (out_sgs + in_sgs); n++) {
+   for (sg = sgs[n]; sg; sg = sg_next(sg)) {
+   dma_addr_t addr = vring_map_one_sg(vq, sg, 
DMA_FROM_DEVICE);
+
+   if (vring_mapping_error(vq, addr))
+   goto err;
+
+   sg->dma_address = addr;
+   }
+   }
+
+   return 0;
+
+err:
+   virtqueue_unmap_sgs(vq, sgs, total_sg, out_sgs, in_sgs);
+   return -ENOMEM;
+}
+
 static inline int virtqueue_add_split(struct virtqueue *_vq,
  struct scatterlist *sgs[],
  unsigned int total_sg,
@@ -532,9 +614,9 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
struct vring_virtqueue *vq = to_vvq(_vq);
struct scatterlist *sg;
struct vring_desc *desc;
-   unsigned int i, n, avail, descs_used, prev, err_idx;
-   int head;
+   unsigned int i, n, avail, descs_used, prev;
bool indirect;
+   int head;
 
START_USE(vq);
 
@@ -586,32 +668,30 @@ static inline int virtqueue_add_split(struct virtqueue 
*_vq,
return -ENOSPC;
}
 
+   if (virtqueue_map_sgs(vq, sgs, total_sg, out_sgs, in_sgs))
+   return -ENOMEM;
+
for (n = 0; n < out_sgs; n++) {
for (sg = sgs[n]; sg; sg = sg_next(sg)) {
-   dma_addr_t addr = vring_map_one_sg(vq, sg, 
DMA_TO_DEVICE);
-   if (vring_mapping_error(vq, addr))
-   goto unmap_release;
-
prev = i;
/* Note that we trust indirect descriptor
 * table since it use stream DMA mapping.
 */
-   
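
(The hunk is truncated here in the archive. A sketch of how the fill
loop presumably continues, based on the helpers introduced above: with
mapping done up front, the loop reads the stored address back via
vring_sg_address().)

	for (sg = sgs[n]; sg; sg = sg_next(sg)) {
		prev = i;
		/* the stored sg->dma_address is used directly */
		i = virtqueue_add_desc_split(_vq, desc, i,
					     vring_sg_address(sg),
					     sg->length,
					     VRING_DESC_F_NEXT,
					     indirect);
	}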

[PATCH vhost v4 06/11] virtio_ring: packed-indirect: support premapped

2023-03-21 Thread Xuan Zhuo
The virtio core only supports virtual addresses today; DMA is completed
inside the virtio core.

In some scenarios (such as AF_XDP), the memory is allocated and the DMA
mapping is completed in advance, so it is necessary for us to support
passing the DMA address to the virtio core.

Drivers can use sg->dma_address to pass the mapped DMA address to the
virtio core. If one sg->dma_address is used, then all sgs must use
sg->dma_address; otherwise, all dma_address fields must be null when
passing sgs to the virtio APIs.

Signed-off-by: Xuan Zhuo 
---
 drivers/virtio/virtio_ring.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index cafb720cbdc1..623ee7442336 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1333,6 +1333,7 @@ static int virtqueue_add_indirect_packed(struct 
vring_virtqueue *vq,
unsigned int i, n;
u16 head, id;
dma_addr_t addr;
+   bool dma_map_internal;
 
head = vq->packed.next_avail_idx;
desc = alloc_indirect_packed(total_sg, gfp);
@@ -1350,7 +1351,8 @@ static int virtqueue_add_indirect_packed(struct 
vring_virtqueue *vq,
id = vq->free_head;
BUG_ON(id == vq->packed.vring.num);
 
-   if (virtqueue_map_sgs(vq, sgs, total_sg, out_sgs, in_sgs))
+   dma_map_internal = !sgs[0]->dma_address;
+   if (dma_map_internal && virtqueue_map_sgs(vq, sgs, total_sg, out_sgs, 
in_sgs))
return -ENOMEM;
 
for (n = 0; n < out_sgs + in_sgs; n++) {
@@ -1412,6 +1414,8 @@ static int virtqueue_add_indirect_packed(struct 
vring_virtqueue *vq,
vq->packed.desc_state[id].data = data;
vq->packed.desc_state[id].indir_desc = desc;
vq->packed.desc_state[id].last = id;
+   vq->packed.desc_state[id].dma_map_internal = dma_map_internal;
+
 
vq->num_added += 1;
 
@@ -1421,7 +1425,8 @@ static int virtqueue_add_indirect_packed(struct 
vring_virtqueue *vq,
return 0;
 
 unmap_release:
-   virtqueue_unmap_sgs(vq, sgs, total_sg, out_sgs, in_sgs);
+   if (dma_map_internal)
+   virtqueue_unmap_sgs(vq, sgs, total_sg, out_sgs, in_sgs);
 
kfree(desc);
 
@@ -1643,7 +1648,7 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
if (!desc)
return;
 
-   if (vq->use_dma_api) {
+   if (vq->use_dma_api && state->dma_map_internal) {
len = vq->packed.desc_extra[id].len;
for (i = 0; i < len / sizeof(struct vring_packed_desc);
i++)
-- 
2.32.0.3.g01195cf9f
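
(An illustrative caller following the rule above, not part of the patch:
a premapped buffer is passed by filling sg->dma_address before the
virtqueue_add_* call; either every sg carries a DMA address or none do.)

	struct scatterlist sg;

	sg_init_one(&sg, buf, len);
	sg.dma_address = premapped_addr;  /* from an earlier dma_map_single() */

	err = virtqueue_add_inbuf(vq, &sg, 1, buf, GFP_ATOMIC);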



[PATCH vhost v4 09/11] virtio_ring: correct the expression of the description of virtqueue_resize()

2023-03-21 Thread Xuan Zhuo
Change "useless" to the more accurate "unused".

Signed-off-by: Xuan Zhuo 
Acked-by: Jason Wang 
---
 drivers/virtio/virtio_ring.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 6b2334951e5f..a31cf71c1689 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2704,7 +2704,7 @@ EXPORT_SYMBOL_GPL(vring_create_virtqueue_dma);
  * virtqueue_resize - resize the vring of vq
  * @_vq: the struct virtqueue we're talking about.
  * @num: new ring num
- * @recycle: callback for recycle the useless buffer
+ * @recycle: callback to recycle unused buffers
  *
  * When it is really necessary to create a new vring, it will set the current 
vq
  * into the reset state. Then call the passed callback to recycle the buffer
-- 
2.32.0.3.g01195cf9f



[PATCH vhost v4 11/11] virtio_ring: introduce virtqueue_reset()

2023-03-21 Thread Xuan Zhuo
Introduce virtqueue_reset() to release all buffers inside the vq.
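
As an illustration of the intended use (editorial sketch; my_free_buf
is a hypothetical driver helper):

	/* Recycle callback: hand each detached buffer back to the driver. */
	static void my_recycle(struct virtqueue *vq, void *buf)
	{
		my_free_buf(buf);
	}

	/* Detach all unused buffers, reinit the ring, re-enable the vq. */
	err = virtqueue_reset(vq, my_recycle);
	if (err)
		return err;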

Signed-off-by: Xuan Zhuo 
Acked-by: Jason Wang 
---
 drivers/virtio/virtio_ring.c | 33 +
 include/linux/virtio.h   |  2 ++
 2 files changed, 35 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index bd79d681022b..38058de7f1b9 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2789,6 +2789,39 @@ int virtqueue_resize(struct virtqueue *_vq, u32 num,
 }
 EXPORT_SYMBOL_GPL(virtqueue_resize);
 
+/**
+ * virtqueue_reset - detach and recycle all unused buffers
+ * @_vq: the struct virtqueue we're talking about.
+ * @recycle: callback to recycle unused buffers
+ *
+ * Caller must ensure we don't call this with other virtqueue operations
+ * at the same time (except where noted).
+ *
+ * Returns zero or a negative error.
+ * 0: success.
+ * -EBUSY: Failed to sync with device, vq may not work properly
+ * -ENOENT: Transport or device not supported
+ * -EPERM: Operation not permitted
+ */
+int virtqueue_reset(struct virtqueue *_vq,
+   void (*recycle)(struct virtqueue *vq, void *buf))
+{
+   struct vring_virtqueue *vq = to_vvq(_vq);
+   int err;
+
+   err = virtqueue_disable_and_recycle(_vq, recycle);
+   if (err)
+   return err;
+
+   if (vq->packed_ring)
+   virtqueue_reinit_packed(vq);
+   else
+   virtqueue_reinit_split(vq);
+
+   return virtqueue_enable_after_reset(_vq);
+}
+EXPORT_SYMBOL_GPL(virtqueue_reset);
+
 /* Only available for split ring */
 struct virtqueue *vring_new_virtqueue(unsigned int index,
  unsigned int num,
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 1fa50191cf0a..22bbd06ef8c8 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -97,6 +97,8 @@ dma_addr_t virtqueue_get_used_addr(struct virtqueue *vq);
 
 int virtqueue_resize(struct virtqueue *vq, u32 num,
 void (*recycle)(struct virtqueue *vq, void *buf));
+int virtqueue_reset(struct virtqueue *vq,
+   void (*recycle)(struct virtqueue *vq, void *buf));
 
 /**
  * struct virtio_device - representation of a device using virtio
-- 
2.32.0.3.g01195cf9f



[PATCH vhost v4 07/11] virtio_ring: update document for virtqueue_add_*

2023-03-21 Thread Xuan Zhuo
Update the documentation of the virtqueue_add_* family of APIs,
allowing callers to use sg->dma_address to pass the dma address to the
virtio core.
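
The documented contract is all-or-none; schematically (editorial
sketch, with sgs as passed to virtqueue_add_sgs()):

	/* Valid: every sg premapped by the driver. */
	sgs[0]->dma_address = addr0;
	sgs[1]->dma_address = addr1;

	/* Valid: no sg premapped; the virtio core maps them itself. */
	sgs[0]->dma_address = 0;
	sgs[1]->dma_address = 0;

	/* Invalid: mixing premapped and unmapped sgs in one call. */
	sgs[0]->dma_address = addr0;
	sgs[1]->dma_address = 0;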

Signed-off-by: Xuan Zhuo 
---
 drivers/virtio/virtio_ring.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 623ee7442336..3a558ca350c6 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2192,6 +2192,10 @@ static inline int virtqueue_add(struct virtqueue *_vq,
  * Caller must ensure we don't call this with other virtqueue operations
  * at the same time (except where noted).
  *
+ * If the caller has done dma map then use sg->dma_address to pass dma address.
+ * If one sg->dma_address is used, then all sgs must use sg->dma_address;
+ * otherwise all sg->dma_address must be NULL.
+ *
  * Returns zero or a negative error (ie. ENOSPC, ENOMEM, EIO).
  */
 int virtqueue_add_sgs(struct virtqueue *_vq,
@@ -2226,6 +2230,10 @@ EXPORT_SYMBOL_GPL(virtqueue_add_sgs);
  * Caller must ensure we don't call this with other virtqueue operations
  * at the same time (except where noted).
  *
+ * If the caller has done dma map then use sg->dma_address to pass dma address.
+ * If one sg->dma_address is used, then all sgs must use sg->dma_address;
+ * otherwise all sg->dma_address must be NULL.
+ *
  * Returns zero or a negative error (ie. ENOSPC, ENOMEM, EIO).
  */
 int virtqueue_add_outbuf(struct virtqueue *vq,
@@ -2248,6 +2256,10 @@ EXPORT_SYMBOL_GPL(virtqueue_add_outbuf);
  * Caller must ensure we don't call this with other virtqueue operations
  * at the same time (except where noted).
  *
+ * If the caller has done dma map then use sg->dma_address to pass dma address.
+ * If one sg->dma_address is used, then all sgs must use sg->dma_address;
+ * otherwise all sg->dma_address must be NULL.
+ *
  * Returns zero or a negative error (ie. ENOSPC, ENOMEM, EIO).
  */
 int virtqueue_add_inbuf(struct virtqueue *vq,
@@ -2271,6 +2283,10 @@ EXPORT_SYMBOL_GPL(virtqueue_add_inbuf);
  * Caller must ensure we don't call this with other virtqueue operations
  * at the same time (except where noted).
  *
+ * If the caller has done dma map then use sg->dma_address to pass dma address.
+ * If one sg->dma_address is used, then all sgs must use sg->dma_address;
+ * otherwise all sg->dma_address must be NULL.
+ *
  * Returns zero or a negative error (ie. ENOSPC, ENOMEM, EIO).
  */
 int virtqueue_add_inbuf_ctx(struct virtqueue *vq,
-- 
2.32.0.3.g01195cf9f



[PATCH vhost v4 02/11] virtio_ring: packed: separate dma codes

2023-03-21 Thread Xuan Zhuo
DMA-related logic is separated out of virtqueue_add_packed(). The DMA
address is saved as sg->dma_address, and virtqueue_add_packed() then
uses it directly. The unmap operation also becomes simpler.

The purpose of this is to facilitate subsequent support for receiving
dma addresses mapped by the drivers.
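
Schematically, the add path changes from map-as-you-fill to
map-first-then-fill (simplified editorial sketch mirroring the hunks
below, not the literal patch code):

	/* Map all sgs up front; on failure nothing is left half-mapped. */
	if (virtqueue_map_sgs(vq, sgs, total_sg, out_sgs, in_sgs))
		return -ENOMEM;

	/* Fill descriptors from the already-mapped addresses. */
	for (n = 0; n < out_sgs + in_sgs; n++) {
		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
			desc[i].addr = cpu_to_le64(vring_sg_address(sg));
			desc[i].len = cpu_to_le32(sg->length);
			i++;
		}
	}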

Signed-off-by: Xuan Zhuo 
---
 drivers/virtio/virtio_ring.c | 37 +++-
 1 file changed, 7 insertions(+), 30 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index fe704ca6c813..42e8c9d44161 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1430,9 +1430,9 @@ static inline int virtqueue_add_packed(struct virtqueue 
*_vq,
struct vring_virtqueue *vq = to_vvq(_vq);
struct vring_packed_desc *desc;
struct scatterlist *sg;
-   unsigned int i, n, c, descs_used, err_idx;
+   unsigned int i, n, c, descs_used;
__le16 head_flags, flags;
-   u16 head, id, prev, curr, avail_used_flags;
+   u16 head, id, prev, curr;
int err;
 
START_USE(vq);
@@ -1461,7 +1461,6 @@ static inline int virtqueue_add_packed(struct virtqueue 
*_vq,
}
 
head = vq->packed.next_avail_idx;
-   avail_used_flags = vq->packed.avail_used_flags;
 
WARN_ON_ONCE(total_sg > vq->packed.vring.num && !vq->indirect);
 
@@ -1479,15 +1478,13 @@ static inline int virtqueue_add_packed(struct virtqueue 
*_vq,
id = vq->free_head;
BUG_ON(id == vq->packed.vring.num);
 
+   if (virtqueue_map_sgs(vq, sgs, total_sg, out_sgs, in_sgs))
+   return -ENOMEM;
+
curr = id;
c = 0;
for (n = 0; n < out_sgs + in_sgs; n++) {
for (sg = sgs[n]; sg; sg = sg_next(sg)) {
-   dma_addr_t addr = vring_map_one_sg(vq, sg, n < out_sgs ?
-   DMA_TO_DEVICE : DMA_FROM_DEVICE);
-   if (vring_mapping_error(vq, addr))
-   goto unmap_release;
-
flags = cpu_to_le16(vq->packed.avail_used_flags |
(++c == total_sg ? 0 : VRING_DESC_F_NEXT) |
(n < out_sgs ? 0 : VRING_DESC_F_WRITE));
@@ -1496,12 +1493,12 @@ static inline int virtqueue_add_packed(struct virtqueue 
*_vq,
else
desc[i].flags = flags;
 
-   desc[i].addr = cpu_to_le64(addr);
+   desc[i].addr = cpu_to_le64(vring_sg_address(sg));
desc[i].len = cpu_to_le32(sg->length);
desc[i].id = cpu_to_le16(id);
 
if (unlikely(vq->use_dma_api)) {
-   vq->packed.desc_extra[curr].addr = addr;
+   vq->packed.desc_extra[curr].addr = 
vring_sg_address(sg);
vq->packed.desc_extra[curr].len = sg->length;
vq->packed.desc_extra[curr].flags =
le16_to_cpu(flags);
@@ -1547,26 +1544,6 @@ static inline int virtqueue_add_packed(struct virtqueue 
*_vq,
END_USE(vq);
 
return 0;
-
-unmap_release:
-   err_idx = i;
-   i = head;
-   curr = vq->free_head;
-
-   vq->packed.avail_used_flags = avail_used_flags;
-
-   for (n = 0; n < total_sg; n++) {
-   if (i == err_idx)
-   break;
-   vring_unmap_extra_packed(vq, &vq->packed.desc_extra[curr]);
-   curr = vq->packed.desc_extra[curr].next;
-   i++;
-   if (i >= vq->packed.vring.num)
-   i = 0;
-   }
-
-   END_USE(vq);
-   return -EIO;
 }
 
 static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
-- 
2.32.0.3.g01195cf9f



[PATCH vhost v4 00/11] virtio core prepares for AF_XDP

2023-03-21 Thread Xuan Zhuo
XDP socket (AF_XDP) is an excellent kernel-bypass networking framework.
The zero-copy feature of xsk (XDP socket) needs to be supported by the
driver, and the performance of zero copy is very good.

ENV: Qemu with vhost.

                 | vhost cpu | Guest APP CPU | Guest Softirq CPU |     PPS
-----------------+-----------+---------------+-------------------+--------
xmit by sockperf |    90%    |     100%      |         -         |  318967
xmit by xsk      |   100%    |      30%      |        33%        | 1192064
recv by sockperf |   100%    |      68%      |       100%        |  692288
recv by xsk      |   100%    |      33%      |        43%        |  771670

Before implementing this function in virtio-net, we first have to make
the virtio core support these features:

1. virtio core support premapped
2. virtio core support reset per-queue
3. introduce DMA APIs to virtio core

Please review.

Thanks.

v4:
 1. rename map_inter to dma_map_internal
 2. fix: excess function parameter 'vq' description in 'virtqueue_dma_dev'

v3:
 1. add map_inter to struct desc state to record whether the virtio core did the dma map

v2:
 1. use sgs[0]->dma_address to judge whether the buffers are premapped
 2. use extra.addr to judge whether to do unmap for a non-indirect desc
 3. use indir_desc to judge whether to do unmap for an indirect desc
 4. rename virtqueue_get_dma_dev to virtqueue_dma_dev

v1:
 1. expose the dma device; do NOT introduce APIs for dma map and sync
 2. split some commits for easier review



Xuan Zhuo (11):
  virtio_ring: split: separate dma codes
  virtio_ring: packed: separate dma codes
  virtio_ring: packed-indirect: separate dma codes
  virtio_ring: split: support premapped
  virtio_ring: packed: support premapped
  virtio_ring: packed-indirect: support premapped
  virtio_ring: update document for virtqueue_add_*
  virtio_ring: introduce virtqueue_dma_dev()
  virtio_ring: correct the expression of the description of
virtqueue_resize()
  virtio_ring: separate the logic of reset/enable from virtqueue_resize
  virtio_ring: introduce virtqueue_reset()

 drivers/virtio/virtio.c  |   6 +
 drivers/virtio/virtio_ring.c | 342 +--
 include/linux/virtio.h   |   4 +
 3 files changed, 255 insertions(+), 97 deletions(-)

--
2.32.0.3.g01195cf9f



[PATCH vhost v4 04/11] virtio_ring: split: support premapped

2023-03-21 Thread Xuan Zhuo
The virtio core only supports virtual addresses; DMA mapping is
completed inside the virtio core.

In some scenarios (such as AF_XDP), the memory is allocated and the
DMA mapping is completed in advance, so it is necessary to support
passing the DMA address to the virtio core.

Drivers can use sg->dma_address to pass the premapped dma address to
the virtio core. If one sg->dma_address is used, then all sgs must use
sg->dma_address; otherwise all dma_address fields must be null when
passing the sgs to the APIs of virtio.

Signed-off-by: Xuan Zhuo 
---
 drivers/virtio/virtio_ring.c | 27 +++
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index c8ed4aef9462..a2a77a0dafe6 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -70,6 +70,7 @@
 struct vring_desc_state_split {
void *data; /* Data for callback. */
struct vring_desc *indir_desc;  /* Indirect descriptor, if any. */
+   bool dma_map_internal;  /* Do dma map internally. */
 };
 
 struct vring_desc_state_packed {
@@ -448,7 +449,7 @@ static void vring_unmap_one_split_indirect(const struct 
vring_virtqueue *vq,
 }
 
 static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
- unsigned int i)
+ unsigned int i, bool dma_map_internal)
 {
struct vring_desc_extra *extra = vq->split.desc_extra;
u16 flags;
@@ -465,6 +466,9 @@ static unsigned int vring_unmap_one_split(const struct 
vring_virtqueue *vq,
 (flags & VRING_DESC_F_WRITE) ?
 DMA_FROM_DEVICE : DMA_TO_DEVICE);
} else {
+   if (!dma_map_internal)
+   goto out;
+
dma_unmap_page(vring_dma_dev(vq),
   extra[i].addr,
   extra[i].len,
@@ -615,7 +619,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
struct scatterlist *sg;
struct vring_desc *desc;
unsigned int i, n, avail, descs_used, prev;
-   bool indirect;
+   bool indirect, dma_map_internal;
int head;
 
START_USE(vq);
@@ -668,7 +672,8 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
return -ENOSPC;
}
 
-   if (virtqueue_map_sgs(vq, sgs, total_sg, out_sgs, in_sgs))
+   dma_map_internal = !sgs[0]->dma_address;
+   if (dma_map_internal && virtqueue_map_sgs(vq, sgs, total_sg, out_sgs, 
in_sgs))
return -ENOMEM;
 
for (n = 0; n < out_sgs; n++) {
@@ -734,6 +739,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
vq->split.desc_state[head].indir_desc = desc;
else
vq->split.desc_state[head].indir_desc = ctx;
+   vq->split.desc_state[head].dma_map_internal = dma_map_internal;
 
/* Put entry in available array (but don't update avail->idx until they
 * do sync). */
@@ -759,7 +765,8 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
return 0;
 
 unmap_release:
-   virtqueue_unmap_sgs(vq, sgs, total_sg, out_sgs, in_sgs);
+   if (dma_map_internal)
+   virtqueue_unmap_sgs(vq, sgs, total_sg, out_sgs, in_sgs);
 
if (indirect)
kfree(desc);
@@ -804,20 +811,22 @@ static void detach_buf_split(struct vring_virtqueue *vq, 
unsigned int head,
 {
unsigned int i, j;
__virtio16 nextflag = cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT);
+   bool dma_map_internal;
 
/* Clear data ptr. */
vq->split.desc_state[head].data = NULL;
+   dma_map_internal = vq->split.desc_state[head].dma_map_internal;
 
/* Put back on free list: unmap first-level descriptors and find end */
i = head;
 
while (vq->split.vring.desc[i].flags & nextflag) {
-   vring_unmap_one_split(vq, i);
+   vring_unmap_one_split(vq, i, dma_map_internal);
i = vq->split.desc_extra[i].next;
vq->vq.num_free++;
}
 
-   vring_unmap_one_split(vq, i);
+   vring_unmap_one_split(vq, i, dma_map_internal);
vq->split.desc_extra[i].next = vq->free_head;
vq->free_head = head;
 
@@ -839,8 +848,10 @@ static void detach_buf_split(struct vring_virtqueue *vq, 
unsigned int head,
VRING_DESC_F_INDIRECT));
BUG_ON(len == 0 || len % sizeof(struct vring_desc));
 
-   for (j = 0; j < len / sizeof(struct vring_desc); j++)
-   vring_unmap_one_split_indirect(vq, &indir_desc[j]);
+   if (dma_map_internal) {
+   for (j = 0; j < len / sizeof(struct vring_desc); j++)
+   vring_unmap_one_split_indirect(vq, 
&indir_desc[j]);
+   }
 
kfree(indir_desc);

[PATCH vhost v4 05/11] virtio_ring: packed: support premapped

2023-03-21 Thread Xuan Zhuo
The virtio core only supports virtual addresses; DMA mapping is
completed inside the virtio core.

In some scenarios (such as AF_XDP), the memory is allocated and the
DMA mapping is completed in advance, so it is necessary to support
passing the DMA address to the virtio core.

Drivers can use sg->dma_address to pass the premapped dma address to
the virtio core. If one sg->dma_address is used, then all sgs must use
sg->dma_address; otherwise all dma_address fields must be null when
passing the sgs to the APIs of virtio.

Signed-off-by: Xuan Zhuo 
---
 drivers/virtio/virtio_ring.c | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index a2a77a0dafe6..cafb720cbdc1 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -78,6 +78,7 @@ struct vring_desc_state_packed {
struct vring_packed_desc *indir_desc; /* Indirect descriptor, if any. */
u16 num;/* Descriptor list length. */
u16 last;   /* The last desc state in a list. */
+   bool dma_map_internal;  /* Do dma map internally. */
 };
 
 struct vring_desc_extra {
@@ -1259,7 +1260,8 @@ static inline u16 packed_last_used(u16 last_used_idx)
 }
 
 static void vring_unmap_extra_packed(const struct vring_virtqueue *vq,
-struct vring_desc_extra *extra)
+struct vring_desc_extra *extra,
+bool dma_map_internal)
 {
u16 flags;
 
@@ -1274,6 +1276,9 @@ static void vring_unmap_extra_packed(const struct 
vring_virtqueue *vq,
 (flags & VRING_DESC_F_WRITE) ?
 DMA_FROM_DEVICE : DMA_TO_DEVICE);
} else {
+   if (!dma_map_internal)
+   return;
+
dma_unmap_page(vring_dma_dev(vq),
   extra->addr, extra->len,
   (flags & VRING_DESC_F_WRITE) ?
@@ -1439,6 +1444,7 @@ static inline int virtqueue_add_packed(struct virtqueue 
*_vq,
unsigned int i, n, c, descs_used;
__le16 head_flags, flags;
u16 head, id, prev, curr;
+   bool dma_map_internal;
int err;
 
START_USE(vq);
@@ -1484,7 +1490,8 @@ static inline int virtqueue_add_packed(struct virtqueue 
*_vq,
id = vq->free_head;
BUG_ON(id == vq->packed.vring.num);
 
-   if (virtqueue_map_sgs(vq, sgs, total_sg, out_sgs, in_sgs))
+   dma_map_internal = !sgs[0]->dma_address;
+   if (dma_map_internal && virtqueue_map_sgs(vq, sgs, total_sg, out_sgs, 
in_sgs))
return -ENOMEM;
 
curr = id;
@@ -1536,6 +1543,7 @@ static inline int virtqueue_add_packed(struct virtqueue 
*_vq,
vq->packed.desc_state[id].data = data;
vq->packed.desc_state[id].indir_desc = ctx;
vq->packed.desc_state[id].last = prev;
+   vq->packed.desc_state[id].dma_map_internal = dma_map_internal;
 
/*
 * A driver MUST NOT make the first descriptor in the list
@@ -1621,7 +1629,8 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
curr = id;
for (i = 0; i < state->num; i++) {
vring_unmap_extra_packed(vq,
-					 &vq->packed.desc_extra[curr]);
+					 &vq->packed.desc_extra[curr],
+state->dma_map_internal);
curr = vq->packed.desc_extra[curr].next;
}
}
-- 
2.32.0.3.g01195cf9f



[PATCH vhost v4 03/11] virtio_ring: packed-indirect: separate dma codes

2023-03-21 Thread Xuan Zhuo
DMA-related logic is separated out of virtqueue_add_indirect_packed().

The DMA address is saved as sg->dma_address, and
virtqueue_add_indirect_packed() then uses it directly. The unmap
operation also becomes simpler.

The purpose of this is to facilitate subsequent support for receiving
dma addresses mapped by the drivers.

Signed-off-by: Xuan Zhuo 
---
 drivers/virtio/virtio_ring.c | 17 ++---
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 42e8c9d44161..c8ed4aef9462 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1314,7 +1314,7 @@ static int virtqueue_add_indirect_packed(struct 
vring_virtqueue *vq,
 {
struct vring_packed_desc *desc;
struct scatterlist *sg;
-   unsigned int i, n, err_idx;
+   unsigned int i, n;
u16 head, id;
dma_addr_t addr;
 
@@ -1334,16 +1334,14 @@ static int virtqueue_add_indirect_packed(struct 
vring_virtqueue *vq,
id = vq->free_head;
BUG_ON(id == vq->packed.vring.num);
 
+   if (virtqueue_map_sgs(vq, sgs, total_sg, out_sgs, in_sgs))
+   return -ENOMEM;
+
for (n = 0; n < out_sgs + in_sgs; n++) {
for (sg = sgs[n]; sg; sg = sg_next(sg)) {
-   addr = vring_map_one_sg(vq, sg, n < out_sgs ?
-   DMA_TO_DEVICE : DMA_FROM_DEVICE);
-   if (vring_mapping_error(vq, addr))
-   goto unmap_release;
-
desc[i].flags = cpu_to_le16(n < out_sgs ?
0 : VRING_DESC_F_WRITE);
-   desc[i].addr = cpu_to_le64(addr);
+   desc[i].addr = cpu_to_le64(vring_sg_address(sg));
desc[i].len = cpu_to_le32(sg->length);
i++;
}
@@ -1407,10 +1405,7 @@ static int virtqueue_add_indirect_packed(struct 
vring_virtqueue *vq,
return 0;
 
 unmap_release:
-   err_idx = i;
-
-   for (i = 0; i < err_idx; i++)
-   vring_unmap_desc_packed(vq, &desc[i]);
+   virtqueue_unmap_sgs(vq, sgs, total_sg, out_sgs, in_sgs);
 
kfree(desc);
 
-- 
2.32.0.3.g01195cf9f



Re: [PATCH] virtio_ring: Suppress tx interrupt when napi_tx disable

2023-03-21 Thread Jason Wang
On Tue, Mar 21, 2023 at 5:00 PM Albert Huang
 wrote:
>
> From: "huangjie.albert" 
>
> fix commit 8d622d21d248 ("virtio: fix up virtio_disable_cb")
>
> if we disable the napi_tx. when we triger a tx interrupt, the

typo should be "trigger"

> vq->event_triggered will be set to true. It will no longer be
> set to false. Unless we explicitly call virtqueue_enable_cb_delayed
> or virtqueue_enable_cb_prepare
>
> if we disable the napi_tx, It will only be called when the tx ring
> buffer is relatively small:
> virtio_net->start_xmit:
> if (sq->vq->num_free < 2+MAX_SKB_FRAGS) {
> netif_stop_subqueue(dev, qnum);
> if (!use_napi &&
> unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
> /* More just got used, free them then recheck. */
> free_old_xmit_skbs(sq, false);
> if (sq->vq->num_free >= 2+MAX_SKB_FRAGS) {
> netif_start_subqueue(dev, qnum);
> virtqueue_disable_cb(sq->vq);
> }

The code example here is out of date; make sure your tree has this:

commit d71ebe8114b4bf622804b810f5e274069060a174
Author: Jason Wang 
Date:   Tue Jan 17 11:47:07 2023 +0800

virtio-net: correctly enable callback during start_xmit

> }
> }
> Because event_triggered is true. Therefore, VRING_AVAIL_F_NO_INTERRUPT or
> VRING_PACKED_EVENT_FLAG_DISABLE will not be set. So we update
> vring_used_event(&vq->split.vring) or vq->packed.vring.driver->off_wrap
> every time we call virtqueue_get_buf_ctx. This will bring more interrupts.

Can you please post how to test with the performance numbers?

>
> if event_triggered is set to true, do not update 
> vring_used_event(&vq->split.vring)
> or vq->packed.vring.driver->off_wrap
>
> Signed-off-by: huangjie.albert 
> ---
>  drivers/virtio/virtio_ring.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 307e139cb11d..f486cccadbeb 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -795,7 +795,8 @@ static void *virtqueue_get_buf_ctx_split(struct virtqueue 
> *_vq,
> /* If we expect an interrupt for the next entry, tell host
>  * by writing event index and flush out the write before
>  * the read in the next get_buf call. */
> -   if (!(vq->split.avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT))
> +   if (!(vq->split.avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)
> +   && (vq->event_triggered == false))

I'm not sure this can work, when event_triggered is true it means
we've got an interrupt, in this case if we want another interrupt for
the next entry, we should update used_event otherwise we will lose
that interrupt?

Thanks
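
For reference, the used_event mechanism works roughly like this
(editorial sketch of the split-ring case, using the code quoted below):

	/* The device raises an interrupt when it marks the entry whose
	 * index equals used_event as used. Skipping this store while an
	 * interrupt is still wanted leaves the device with a stale index,
	 * so the next interrupt may never arrive. */
	virtio_store_mb(vq->weak_barriers,
			&vring_used_event(&vq->split.vring),
			cpu_to_virtio16(_vq->vdev, vq->last_used_idx));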

> virtio_store_mb(vq->weak_barriers,
> &vring_used_event(&vq->split.vring),
> cpu_to_virtio16(_vq->vdev, 
> vq->last_used_idx));
> @@ -1529,7 +1530,8 @@ static void *virtqueue_get_buf_ctx_packed(struct 
> virtqueue *_vq,
>  * by writing event index and flush out the write before
>  * the read in the next get_buf call.
>  */
> -   if (vq->packed.event_flags_shadow == VRING_PACKED_EVENT_FLAG_DESC)
> +   if (vq->packed.event_flags_shadow == VRING_PACKED_EVENT_FLAG_DESC
> +   && (vq->event_triggered == false))
> virtio_store_mb(vq->weak_barriers,
> >packed.vring.driver->off_wrap,
> cpu_to_le16(vq->last_used_idx));
> --
> 2.31.1
>


Re: [PATCH v2 0/6] use canonical ftrace path whenever possible

2023-03-21 Thread Michael S. Tsirkin
On Wed, Feb 15, 2023 at 03:33:44PM -0700, Ross Zwisler wrote:
> Changes in v2:
>  * Dropped patches which were pulled into maintainer trees.
>  * Split BPF patches out into another series targeting bpf-next.
>  * trace-agent now falls back to debugfs if tracefs isn't present.
>  * Added Acked-by from m...@redhat.com to series.
>  * Added a typo fixup for the virtio-trace README.
> 
> Steven, assuming there are no objections, would you feel comfortable
> taking this series through your tree?

for merging up to patch 5 through another tree:

Acked-by: Michael S. Tsirkin 

I'll merge patch 6, no problem.


> ---
> The canonical location for the tracefs filesystem is at /sys/kernel/tracing.
> 
> But, from Documentation/trace/ftrace.rst:
> 
>   Before 4.1, all ftrace tracing control files were within the debugfs
>   file system, which is typically located at /sys/kernel/debug/tracing.
>   For backward compatibility, when mounting the debugfs file system,
>   the tracefs file system will be automatically mounted at:
> 
>   /sys/kernel/debug/tracing
> 
> There are many places where this older debugfs path is still used in
> code comments, selftests, examples and tools, so let's update them to
> avoid confusion.
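
A minimal sketch of the tracefs-first, debugfs-fallback probe this
implies (editorial; not trace-agent's actual code):

	#include <unistd.h>

	static const char *tracing_dir(void)
	{
		if (!access("/sys/kernel/tracing", F_OK))
			return "/sys/kernel/tracing";	/* canonical since 4.1 */
		return "/sys/kernel/debug/tracing";	/* legacy fallback */
	}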
> 
> I've broken up the series as best I could by maintainer or directory,
> and I've only sent people the patches that I think they care about to
> avoid spamming everyone.
> 
> Ross Zwisler (6):
>   tracing: always use canonical ftrace path
>   selftests: use canonical ftrace path
>   leaking_addresses: also skip canonical ftrace path
>   tools/kvm_stat: use canonical ftrace path
>   tools/virtio: use canonical ftrace path
>   tools/virtio: fix typo in README instructions
> 
>  include/linux/kernel.h|  2 +-
>  include/linux/tracepoint.h|  4 ++--
>  kernel/trace/Kconfig  | 20 +--
>  kernel/trace/kprobe_event_gen_test.c  |  2 +-
>  kernel/trace/ring_buffer.c|  2 +-
>  kernel/trace/synth_event_gen_test.c   |  2 +-
>  kernel/trace/trace.c  |  2 +-
>  samples/user_events/example.c |  4 ++--
>  scripts/leaking_addresses.pl  |  1 +
>  scripts/tracing/draw_functrace.py |  6 +++---
>  tools/kvm/kvm_stat/kvm_stat   |  2 +-
>  tools/lib/api/fs/tracing_path.c   |  4 ++--
>  .../testing/selftests/user_events/dyn_test.c  |  2 +-
>  .../selftests/user_events/ftrace_test.c   | 10 +-
>  .../testing/selftests/user_events/perf_test.c |  8 
>  tools/testing/selftests/vm/protection_keys.c  |  4 ++--
>  tools/tracing/latency/latency-collector.c |  2 +-
>  tools/virtio/virtio-trace/README  |  4 ++--
>  tools/virtio/virtio-trace/trace-agent.c   | 12 +++
>  19 files changed, 49 insertions(+), 44 deletions(-)
> 
> -- 
> 2.39.1.637.g21b0678d19-goog



Re: [PATCH] vdpa/mlx5: Remove debugfs file after device unregister

2023-03-21 Thread Jason Wang
On Tue, Mar 21, 2023 at 4:29 PM Eli Cohen  wrote:
>
> > -Original Message-
> > From: Jason Wang 
> > Sent: Tuesday, 21 March 2023 5:23
> > To: Eli Cohen 
> > Cc: m...@redhat.com; si-wei@oracle.com; epere...@redhat.com;
> > virtualization@lists.linux-foundation.org; Parav Pandit
> > 
> > Subject: Re: [PATCH] vdpa/mlx5: Remove debugfs file after device unregister
> >
> > On Mon, Mar 20, 2023 at 5:35 PM Eli Cohen  wrote:
> > >
> > >
> > > On 20/03/2023 11:28, Jason Wang wrote:
> > > > On Mon, Mar 20, 2023 at 5:07 PM Eli Cohen  wrote:
> > > >> On 17/03/2023 5:32, Jason Wang wrote:
> > > >>> On Sun, Mar 12, 2023 at 4:41 PM Eli Cohen  wrote:
> > >  When deleting the vdpa device, the debugfs files need to be removed
> > so
> > >  need to remove debugfs after the device has been unregistered.
> > > 
> > >  This fixes null pointer dereference when someone deletes the device
> > >  after debugfs has been populated.
> > > 
> > >  Fixes: 294221004322 ("vdpa/mlx5: Add debugfs subtree")
> > >  Signed-off-by: Eli Cohen 
> > >  ---
> > > drivers/vdpa/mlx5/net/mlx5_vnet.c | 4 ++--
> > > 1 file changed, 2 insertions(+), 2 deletions(-)
> > > 
> > >  diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > >  index 3858ba1e8975..3f6149f2ffd4 100644
> > >  --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > >  +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > >  @@ -3322,8 +3322,6 @@ static void mlx5_vdpa_dev_del(struct
> > vdpa_mgmt_dev *v_mdev, struct vdpa_device *
> > >    struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
> > >    struct workqueue_struct *wq;
> > > 
> > >  -   mlx5_vdpa_remove_debugfs(ndev->debugfs);
> > >  -   ndev->debugfs = NULL;
> > >    if (ndev->nb_registered) {
> > >    ndev->nb_registered = false;
> > >    mlx5_notifier_unregister(mvdev->mdev, &ndev->nb);
> > >  @@ -3332,6 +3330,8 @@ static void mlx5_vdpa_dev_del(struct
> > vdpa_mgmt_dev *v_mdev, struct vdpa_device *
> > >    mvdev->wq = NULL;
> > >    destroy_workqueue(wq);
> > >    _vdpa_unregister_device(dev);
> > > >>> What if the user tries to access debugfs after
> > _vdpa_unregister_device()?
> > > >> I don't see an issue with that. During the process of deleting a 
> > > >> device,
> > > >> the resources are destroyed and their corresponding debugfs files are
> > > >> also destroyed just prior to destroying the resource.
> > > > For example, rx_flow_table_show() requires the structure mlx5_vdpa_net
> > > > alive, but it seems the structure has been freed after
> > > > _vdpa_unregister_device().
> > > But in this case, rx_flow_table_show() gets called first, so the file
> > > will be removed from the debugfs before
> > >
> > > mlx5_vdpa_net gets released.
> >
> > Just to make sure we are at the same page. I meant:
> >
> > [CPU 0] _vdpa_unregister_device(dev);
>
> If the vdpa is still in use, e.g. a VM is using it, this call will hang.

True, since it will wait for the VM to release the fd. But this is not
the case when the device is bound to the virtio-vdpa driver. In this
case, do we need to synchronize here?

>
> For this call to conclude, the device needs to be reset. During the reset 
> process, all the resources are destroyed, including (for example) the TIR. 
> Just before destroying the TIR, the debugfs entry is removed by calling 
> mlx5_vdpa_remove_rx_flow_table(). So at the time _vdpa_unregister_device() is 
> completed, the debugfs file is not there anymore.

Ok, so in this case we probably don't need a second call to
mlx5_vdpa_remove_debugfs()?

Thanks

>
> > [CPU 1] rx_flow_table_show()
> > [CPU 0] mlx5_vdpa_remove_debugfs()
> >
> > So when CPU 1 tries to access the flow table, the ndev has been
> > released by CPU0 in _vdpa_unregister_device()?
> >
> > Thanks
> >
> > >
> > > >
> > > > If a user tries to access that file after _vdpa_unregister_device()
> > > > but before mlx5_vdpa_remove_debugfs(), will we end up with
> > > > use-after-free?
> > > >
> > > > Thanks
> > > >
> > > >
> > > >>> Thanks
> > > >>>
> > >  +   mlx5_vdpa_remove_debugfs(ndev->debugfs);
> > >  +   ndev->debugfs = NULL;
> > >    mgtdev->ndev = NULL;
> > > }
> > > 
> > >  --
> > >  2.38.1
> > > 
> > >
>


Re: [PATCH vhost v3 04/11] virtio_ring: split: support premapped

2023-03-21 Thread Xuan Zhuo
On Tue, 21 Mar 2023 16:45:21 -0400, "Michael S. Tsirkin"  
wrote:
> On Tue, Mar 21, 2023 at 05:34:59PM +0800, Xuan Zhuo wrote:
> > virtio core only supports virtual addresses, dma is completed in virtio
> > core.
> >
> > In some scenarios (such as the AF_XDP), the memory is allocated
> > and DMA mapping is completed in advance, so it is necessary for us to
> > support passing the DMA address to virtio core.
> >
> > Drivers can use sg->dma_address to pass the mapped dma address to virtio
> > core. If one sg->dma_address is used then all sgs must use
> > sg->dma_address, otherwise all must be null when passing it to the APIs
> > of virtio.
> >
> > Signed-off-by: Xuan Zhuo 
> > ---
> >  drivers/virtio/virtio_ring.c | 27 +++
> >  1 file changed, 19 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index c8ed4aef9462..be2ff96964c3 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -70,6 +70,7 @@
> >  struct vring_desc_state_split {
> > void *data; /* Data for callback. */
> > struct vring_desc *indir_desc;  /* Indirect descriptor, if any. */
> > +   bool map_inter; /* Do dma map internally. */
>
> I prefer a full name. E.g. "dma_map_internal". Eschew abbreviation.


Will fix.

Thanks.


>
>
> >  };
> >
> >  struct vring_desc_state_packed {
> > @@ -448,7 +449,7 @@ static void vring_unmap_one_split_indirect(const struct 
> > vring_virtqueue *vq,
> >  }
> >
> >  static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
> > - unsigned int i)
> > + unsigned int i, bool map_inter)
> >  {
> > struct vring_desc_extra *extra = vq->split.desc_extra;
> > u16 flags;
> > @@ -465,6 +466,9 @@ static unsigned int vring_unmap_one_split(const struct 
> > vring_virtqueue *vq,
> >  (flags & VRING_DESC_F_WRITE) ?
> >  DMA_FROM_DEVICE : DMA_TO_DEVICE);
> > } else {
> > +   if (!map_inter)
> > +   goto out;
> > +
> > dma_unmap_page(vring_dma_dev(vq),
> >extra[i].addr,
> >extra[i].len,
> > @@ -615,7 +619,7 @@ static inline int virtqueue_add_split(struct virtqueue 
> > *_vq,
> > struct scatterlist *sg;
> > struct vring_desc *desc;
> > unsigned int i, n, avail, descs_used, prev;
> > -   bool indirect;
> > +   bool indirect, map_inter;
> > int head;
> >
> > START_USE(vq);
> > @@ -668,7 +672,8 @@ static inline int virtqueue_add_split(struct virtqueue 
> > *_vq,
> > return -ENOSPC;
> > }
> >
> > -   if (virtqueue_map_sgs(vq, sgs, total_sg, out_sgs, in_sgs))
> > +   map_inter = !sgs[0]->dma_address;
> > +   if (map_inter && virtqueue_map_sgs(vq, sgs, total_sg, out_sgs, in_sgs))
> > return -ENOMEM;
> >
> > for (n = 0; n < out_sgs; n++) {
> > @@ -734,6 +739,7 @@ static inline int virtqueue_add_split(struct virtqueue 
> > *_vq,
> > vq->split.desc_state[head].indir_desc = desc;
> > else
> > vq->split.desc_state[head].indir_desc = ctx;
> > +   vq->split.desc_state[head].map_inter = map_inter;
> >
> > /* Put entry in available array (but don't update avail->idx until they
> >  * do sync). */
> > @@ -759,7 +765,8 @@ static inline int virtqueue_add_split(struct virtqueue 
> > *_vq,
> > return 0;
> >
> >  unmap_release:
> > -   virtqueue_unmap_sgs(vq, sgs, total_sg, out_sgs, in_sgs);
> > +   if (map_inter)
> > +   virtqueue_unmap_sgs(vq, sgs, total_sg, out_sgs, in_sgs);
> >
> > if (indirect)
> > kfree(desc);
> > @@ -804,20 +811,22 @@ static void detach_buf_split(struct vring_virtqueue 
> > *vq, unsigned int head,
> >  {
> > unsigned int i, j;
> > __virtio16 nextflag = cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT);
> > +   bool map_inter;
> >
> > /* Clear data ptr. */
> > vq->split.desc_state[head].data = NULL;
> > +   map_inter = vq->split.desc_state[head].map_inter;
> >
> > /* Put back on free list: unmap first-level descriptors and find end */
> > i = head;
> >
> > while (vq->split.vring.desc[i].flags & nextflag) {
> > -   vring_unmap_one_split(vq, i);
> > +   vring_unmap_one_split(vq, i, map_inter);
> > i = vq->split.desc_extra[i].next;
> > vq->vq.num_free++;
> > }
> >
> > -   vring_unmap_one_split(vq, i);
> > +   vring_unmap_one_split(vq, i, map_inter);
> > vq->split.desc_extra[i].next = vq->free_head;
> > vq->free_head = head;
> >
> > @@ -839,8 +848,10 @@ static void detach_buf_split(struct vring_virtqueue 
> > *vq, unsigned int head,
> > VRING_DESC_F_INDIRECT));
> > BUG_ON(len == 0 || len % sizeof(struct vring_desc));
> >
> > -   for (j = 0; j < len / sizeof(struct vring_desc); j++)
> > 

Re: [PATCH v4 1/2] vdpa/mlx5: Make VIRTIO_NET_F_MRG_RXBUF off by default

2023-03-21 Thread Si-Wei Liu





On 3/21/2023 4:28 AM, Eli Cohen wrote:

The following patch adds driver support for VIRTIO_NET_F_MRG_RXBUF.

Current firmware versions show degradation in packet rate when using
MRG_RXBUF. Users who favor memory saving over packet rate could enable
this feature but we want to keep it off by default.

One can still enable it when creating the vdpa device using vdpa tool by
providing features that include it.

For example:
$ vdpa dev add name vdpa0 mgmtdev pci/:86:00.2 device_features 0x300cb982b
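
For reference, VIRTIO_NET_F_MRG_RXBUF is feature bit 15, so whether a
device_features mask includes it can be checked with a small program
(editorial sketch):

	#include <stdint.h>
	#include <stdio.h>

	#define VIRTIO_NET_F_MRG_RXBUF 15

	int main(void)
	{
		uint64_t features = 0x300cb982bULL;	/* mask from the example */

		printf("MRG_RXBUF %s\n",
		       features & (1ULL << VIRTIO_NET_F_MRG_RXBUF) ?
		       "included" : "excluded");
		return 0;
	}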


Acked-by: Jason Wang 
Signed-off-by: Eli Cohen 

Looks good to me.

Reviewed-by: Si-Wei Liu 

---
  drivers/vdpa/mlx5/net/mlx5_vnet.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 520646ae7fa0..502c482a93ce 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -3122,6 +3122,8 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev 
*v_mdev, const char *name,
return -EINVAL;
}
device_features &= add_config->device_features;
+   } else {
+   device_features &= ~BIT_ULL(VIRTIO_NET_F_MRG_RXBUF);
}
if (!(device_features & BIT_ULL(VIRTIO_F_VERSION_1) &&
  device_features & BIT_ULL(VIRTIO_F_ACCESS_PLATFORM))) {




Re: [syzbot] [kernel?] general protection fault in vhost_task_start

2023-03-21 Thread Michael S. Tsirkin
On Tue, Mar 21, 2023 at 01:55:00PM -0400, Michael S. Tsirkin wrote:
> On Tue, Mar 21, 2023 at 12:46:04PM -0500, Mike Christie wrote:
> > On 3/21/23 12:03 PM, syzbot wrote:
> > > RIP: 0010:vhost_task_start+0x22/0x40 kernel/vhost_task.c:115
> > > Code: 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 53 48 89 fb e8 c3 67 2c 00 48 
> > > 8d 7b 70 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 
> > > 75 0a 48 8b 7b 70 5b e9 fe bd 02 00 e8 79 ec 7e 00 eb
> > > RSP: 0018:c90003a9fc38 EFLAGS: 00010207
> > > RAX: dc00 RBX: fff4 RCX: 
> > > RDX: 000c RSI: 81564c8d RDI: 0064
> > > RBP: 88802b21dd40 R08: 0100 R09: 8c917cf3
> > > R10: fff4 R11:  R12: fff4
> > > R13: 888075d000b0 R14: 888075d0 R15: 888075d8
> > > FS:  56247300() GS:8880b980() 
> > > knlGS:
> > > CS:  0010 DS:  ES:  CR0: 80050033
> > > CR2: 7ffe3d8e5ff8 CR3: 215d4000 CR4: 003506f0
> > > DR0:  DR1:  DR2: 
> > > DR3:  DR6: fffe0ff0 DR7: 0400
> > > Call Trace:
> > >  
> > >  vhost_worker_create drivers/vhost/vhost.c:580 [inline]
> > 
> > The return value from vhost_task_create is incorrect if the kzalloc fails.
> > 
> > Christian, here is a fix for what's in your tree. Do you want me to submit
> > a follow up patch like this or a replacement patch for:
> > 
> > commit 77feab3c4156 ("vhost_task: Allow vhost layer to use copy_process")
> > 
> > with the fix rolled into it?
> > 
> 
> 
> 
> > 
> > >From 0677ad6d77722f301ca35e8e0f8fd0cbd5ed8484 Mon Sep 17 00:00:00 2001
> > From: Mike Christie 
> > Date: Tue, 21 Mar 2023 12:39:39 -0500
> > Subject: [PATCH] vhost_task: Fix vhost_task_create return value
> > 
> > vhost_task_create is supposed to return the vhost_task or NULL on
> > failure. This fixes it to return the correct value when the allocation
> > of the struct fails.
> > ---
> >  kernel/vhost_task.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/kernel/vhost_task.c b/kernel/vhost_task.c
> > index 4b8aff160640..b7cbd66f889e 100644
> > --- a/kernel/vhost_task.c
> > +++ b/kernel/vhost_task.c
> > @@ -88,7 +88,7 @@ struct vhost_task *vhost_task_create(int (*fn)(void *), 
> > void *arg,
> >  
> > vtsk = kzalloc(sizeof(*vtsk), GFP_KERNEL);
> > if (!vtsk)
> > -   return ERR_PTR(-ENOMEM);
> > +   return NULL;
> > init_completion(&vtsk->exited);
> > vtsk->data = arg;
> > vtsk->fn = fn;
> > 

Sorry I had nothing to add here. Sent by mistake.



Re: [PATCH vhost v3 04/11] virtio_ring: split: support premapped

2023-03-21 Thread Michael S. Tsirkin
On Tue, Mar 21, 2023 at 05:34:59PM +0800, Xuan Zhuo wrote:
> virtio core only supports virtual addresses, dma is completed in virtio
> core.
> 
> In some scenarios (such as the AF_XDP), the memory is allocated
> and DMA mapping is completed in advance, so it is necessary for us to
> support passing the DMA address to virtio core.
> 
> Drivers can use sg->dma_address to pass the mapped dma address to virtio
> core. If one sg->dma_address is used then all sgs must use
> sg->dma_address, otherwise all must be null when passing it to the APIs
> of virtio.
> 
> Signed-off-by: Xuan Zhuo 
> ---
>  drivers/virtio/virtio_ring.c | 27 +++
>  1 file changed, 19 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index c8ed4aef9462..be2ff96964c3 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -70,6 +70,7 @@
>  struct vring_desc_state_split {
>   void *data; /* Data for callback. */
>   struct vring_desc *indir_desc;  /* Indirect descriptor, if any. */
> + bool map_inter; /* Do dma map internally. */

I prefer a full name. E.g. "dma_map_internal". Eschew abbreviation.


>  };
>  
>  struct vring_desc_state_packed {
> @@ -448,7 +449,7 @@ static void vring_unmap_one_split_indirect(const struct 
> vring_virtqueue *vq,
>  }
>  
>  static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
> -   unsigned int i)
> +   unsigned int i, bool map_inter)
>  {
>   struct vring_desc_extra *extra = vq->split.desc_extra;
>   u16 flags;
> @@ -465,6 +466,9 @@ static unsigned int vring_unmap_one_split(const struct 
> vring_virtqueue *vq,
>(flags & VRING_DESC_F_WRITE) ?
>DMA_FROM_DEVICE : DMA_TO_DEVICE);
>   } else {
> + if (!map_inter)
> + goto out;
> +
>   dma_unmap_page(vring_dma_dev(vq),
>  extra[i].addr,
>  extra[i].len,
> @@ -615,7 +619,7 @@ static inline int virtqueue_add_split(struct virtqueue 
> *_vq,
>   struct scatterlist *sg;
>   struct vring_desc *desc;
>   unsigned int i, n, avail, descs_used, prev;
> - bool indirect;
> + bool indirect, map_inter;
>   int head;
>  
>   START_USE(vq);
> @@ -668,7 +672,8 @@ static inline int virtqueue_add_split(struct virtqueue 
> *_vq,
>   return -ENOSPC;
>   }
>  
> - if (virtqueue_map_sgs(vq, sgs, total_sg, out_sgs, in_sgs))
> + map_inter = !sgs[0]->dma_address;
> + if (map_inter && virtqueue_map_sgs(vq, sgs, total_sg, out_sgs, in_sgs))
>   return -ENOMEM;
>  
>   for (n = 0; n < out_sgs; n++) {
> @@ -734,6 +739,7 @@ static inline int virtqueue_add_split(struct virtqueue 
> *_vq,
>   vq->split.desc_state[head].indir_desc = desc;
>   else
>   vq->split.desc_state[head].indir_desc = ctx;
> + vq->split.desc_state[head].map_inter = map_inter;
>  
>   /* Put entry in available array (but don't update avail->idx until they
>* do sync). */
> @@ -759,7 +765,8 @@ static inline int virtqueue_add_split(struct virtqueue 
> *_vq,
>   return 0;
>  
>  unmap_release:
> - virtqueue_unmap_sgs(vq, sgs, total_sg, out_sgs, in_sgs);
> + if (map_inter)
> + virtqueue_unmap_sgs(vq, sgs, total_sg, out_sgs, in_sgs);
>  
>   if (indirect)
>   kfree(desc);
> @@ -804,20 +811,22 @@ static void detach_buf_split(struct vring_virtqueue 
> *vq, unsigned int head,
>  {
>   unsigned int i, j;
>   __virtio16 nextflag = cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT);
> + bool map_inter;
>  
>   /* Clear data ptr. */
>   vq->split.desc_state[head].data = NULL;
> + map_inter = vq->split.desc_state[head].map_inter;
>  
>   /* Put back on free list: unmap first-level descriptors and find end */
>   i = head;
>  
>   while (vq->split.vring.desc[i].flags & nextflag) {
> - vring_unmap_one_split(vq, i);
> + vring_unmap_one_split(vq, i, map_inter);
>   i = vq->split.desc_extra[i].next;
>   vq->vq.num_free++;
>   }
>  
> - vring_unmap_one_split(vq, i);
> + vring_unmap_one_split(vq, i, map_inter);
>   vq->split.desc_extra[i].next = vq->free_head;
>   vq->free_head = head;
>  
> @@ -839,8 +848,10 @@ static void detach_buf_split(struct vring_virtqueue *vq, 
> unsigned int head,
>   VRING_DESC_F_INDIRECT));
>   BUG_ON(len == 0 || len % sizeof(struct vring_desc));
>  
> - for (j = 0; j < len / sizeof(struct vring_desc); j++)
> - vring_unmap_one_split_indirect(vq, &indir_desc[j]);
> + if (map_inter) {
> + for (j = 0; j < len / sizeof(struct vring_desc); j++)
> +  

Re: [v2,1/8] drm/fbdev-generic: Always use shadow buffering

2023-03-21 Thread Thomas Zimmermann

Hi,

thanks for testing the patchset.

On 21.03.23 at 16:23, Sui jingfeng wrote:


On 2023/3/20 23:07, Thomas Zimmermann wrote:

Remove all codepaths that implement fbdev output directly on GEM
buffers. Always allocate a shadow buffer in system memory and set
up deferred I/O for mmap.

The fbdev code that operated directly on GEM buffers was used by
drivers based on GEM DMA helpers. Those drivers have been migrated
to use fbdev-dma, a dedicated fbdev emulation for DMA memory. All
remaining users of fbdev-generic require shadow buffering.

Memory management of the remaining callers uses TTM, GEM SHMEM
helpers or a variant of GEM DMA helpers that is incompatible with
fbdev-dma. Therefore remove the unused codepaths from fbdev-generic
and simplify the code.

Using a shadow buffer with deferred I/O is probably the best case
for most remaining callers. Some of the TTM-based drivers might
benefit from a dedicated fbdev emulation that operates directly on
the driver's video memory.


I don't understand here. The TTM-based drivers should have performance
equivalent to your implementation, because device memory is typically
very slow for CPU reads, at least on the MIPS and LoongArch
architectures. TTM-based drivers for those platforms also prefer to
render to system RAM first (for fast reading and compositing) and then
blit to the real framebuffer pinned in VRAM.


Right. But in practice, writing the console directly to the video ram is 
perceived as faster. The shadow buffer needs an additional memcpy to the 
video ram. That adds latency to the output. It's more a slowness in 
perception than in actual I/O access.
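
The extra cost is the copy of each damaged rectangle from the shadow to
video memory; minimally (editorial sketch, not the DRM helpers' API):

	/* Copy one damage rectangle from the system-memory shadow to vram.
	 * pitch is the line stride in bytes, cpp the bytes per pixel. */
	static void blit_rect(void *vram, const void *shadow,
			      unsigned int pitch, unsigned int cpp,
			      unsigned int x1, unsigned int y1,
			      unsigned int x2, unsigned int y2)
	{
		size_t off = (size_t)y1 * pitch + (size_t)x1 * cpp;
		size_t len = (size_t)(x2 - x1) * cpp;
		unsigned int y;

		for (y = y1; y < y2; y++, off += pitch)
			memcpy((char *)vram + off, (const char *)shadow + off, len);
	}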


The drawing helpers for system memory have also been slow compared to 
the helper for I/O memory. I've fixed some of it, but I'm not sure if I 
covered all cases. See [1] for the patches.


[1] https://patchwork.freedesktop.org/series/100317/

And finally, with the fbdev buffer in video memory, drivers could 
utilize hardware blitting to further speed up drawing. Although not many 
drivers ever did this.





In turn, I think shmem-helper-based drivers might benefit from a
dedicated fbdev emulation, because for a shmem-helper-based driver you
blit to the shadow of the video memory, and the driver may need another
blit to the ultimate framebuffer. Using a


Indeed. SHMEM has two memcpy operations. If we could remove the fbdev 
shadow buffer and operate on the SHMEM buffer directly, we'd reduce 
overhead.


It would also mean that we'd mmap the SHMEM pages to fbdev's userspace. 
I've not yet managed to implement this successfully, as it would require 
new mmap code in the GEM SHMEM object for this purpose. And so far, I've 
always run into a bug where the fbdev emulation would end up in a
corrupt state.


I've spent time on 'fbdev-shmem' emulation, but given the problems, it
hasn't paid off so far.


Best regards
Thomas


shadow buffer is still acceptable though, but why do you say "the
TTM-based drivers might benefit from a dedicated fbdev emulation"?




Signed-off-by: Thomas Zimmermann 
Reviewed-by: Javier Martinez Canillas 
Acked-by: Zack Rusin 
---
  drivers/gpu/drm/drm_fbdev_generic.c | 184 +---
  1 file changed, 30 insertions(+), 154 deletions(-)

diff --git a/drivers/gpu/drm/drm_fbdev_generic.c 
b/drivers/gpu/drm/drm_fbdev_generic.c

index 4d6325e91565..e48a8e82378d 100644
--- a/drivers/gpu/drm/drm_fbdev_generic.c
+++ b/drivers/gpu/drm/drm_fbdev_generic.c
@@ -11,16 +11,6 @@
  #include 
-static bool drm_fbdev_use_shadow_fb(struct drm_fb_helper *fb_helper)
-{
-    struct drm_device *dev = fb_helper->dev;
-    struct drm_framebuffer *fb = fb_helper->fb;
-
-    return dev->mode_config.prefer_shadow_fbdev ||
-   dev->mode_config.prefer_shadow ||
-   fb->funcs->dirty;
-}
-
  /* @user: 1=userspace, 0=fbcon */
  static int drm_fbdev_fb_open(struct fb_info *info, int user)
  {
@@ -46,115 +36,33 @@ static int drm_fbdev_fb_release(struct fb_info 
*info, int user)

  static void drm_fbdev_fb_destroy(struct fb_info *info)
  {
  struct drm_fb_helper *fb_helper = info->par;
-    void *shadow = NULL;
+    void *shadow = info->screen_buffer;
  if (!fb_helper->dev)
  return;
-    if (info->fbdefio)
-    fb_deferred_io_cleanup(info);
-    if (drm_fbdev_use_shadow_fb(fb_helper))
-    shadow = info->screen_buffer;
-
+    fb_deferred_io_cleanup(info);
  drm_fb_helper_fini(fb_helper);
-
-    if (shadow)
-    vfree(shadow);
-    else if (fb_helper->buffer)
-    drm_client_buffer_vunmap(fb_helper->buffer);
-
+    vfree(shadow);
  drm_client_framebuffer_delete(fb_helper->buffer);
-    drm_client_release(&fb_helper->client);
+    drm_client_release(&fb_helper->client);
  drm_fb_helper_unprepare(fb_helper);
  kfree(fb_helper);
  }
-static int drm_fbdev_fb_mmap(struct fb_info *info, struct 
vm_area_struct *vma)

-{
-    struct drm_fb_helper *fb_helper = info->par;
-
-    if (drm_fbdev_use_shadow_fb(fb_helper))
-   

Re: [syzbot] [kernel?] general protection fault in vhost_task_start

2023-03-21 Thread Michael S. Tsirkin
On Tue, Mar 21, 2023 at 12:46:04PM -0500, Mike Christie wrote:
> On 3/21/23 12:03 PM, syzbot wrote:
> > RIP: 0010:vhost_task_start+0x22/0x40 kernel/vhost_task.c:115
> > Code: 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 53 48 89 fb e8 c3 67 2c 00 48 8d 
> > 7b 70 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 75 
> > 0a 48 8b 7b 70 5b e9 fe bd 02 00 e8 79 ec 7e 00 eb
> > RSP: 0018:c90003a9fc38 EFLAGS: 00010207
> > RAX: dc00 RBX: fff4 RCX: 
> > RDX: 000c RSI: 81564c8d RDI: 0064
> > RBP: 88802b21dd40 R08: 0100 R09: 8c917cf3
> > R10: fff4 R11:  R12: fff4
> > R13: 888075d000b0 R14: 888075d0 R15: 888075d8
> > FS:  56247300() GS:8880b980() knlGS:
> > CS:  0010 DS:  ES:  CR0: 80050033
> > CR2: 7ffe3d8e5ff8 CR3: 215d4000 CR4: 003506f0
> > DR0:  DR1:  DR2: 
> > DR3:  DR6: fffe0ff0 DR7: 0400
> > Call Trace:
> >  
> >  vhost_worker_create drivers/vhost/vhost.c:580 [inline]
> 
> The return value from vhost_task_create is incorrect if the kzalloc fails.
> 
> Christian, here is a fix for what's in your tree. Do you want me to submit
> a follow up patch like this or a replacement patch for:
> 
> commit 77feab3c4156 ("vhost_task: Allow vhost layer to use copy_process")
> 
> with the fix rolled into it?
> 



> 
> >From 0677ad6d77722f301ca35e8e0f8fd0cbd5ed8484 Mon Sep 17 00:00:00 2001
> From: Mike Christie 
> Date: Tue, 21 Mar 2023 12:39:39 -0500
> Subject: [PATCH] vhost_task: Fix vhost_task_create return value
> 
> vhost_task_create is supposed to return the vhost_task or NULL on
> failure. This fixes it to return the correct value when the allocation
> of the struct fails.
> ---
>  kernel/vhost_task.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/vhost_task.c b/kernel/vhost_task.c
> index 4b8aff160640..b7cbd66f889e 100644
> --- a/kernel/vhost_task.c
> +++ b/kernel/vhost_task.c
> @@ -88,7 +88,7 @@ struct vhost_task *vhost_task_create(int (*fn)(void *), 
> void *arg,
>  
>   vtsk = kzalloc(sizeof(*vtsk), GFP_KERNEL);
>   if (!vtsk)
> - return ERR_PTR(-ENOMEM);
> + return NULL;
> init_completion(&vtsk->exited);
>   vtsk->data = arg;
>   vtsk->fn = fn;
> 



Re: [syzbot] [kernel?] general protection fault in vhost_task_start

2023-03-21 Thread Mike Christie
On 3/21/23 12:03 PM, syzbot wrote:
> RIP: 0010:vhost_task_start+0x22/0x40 kernel/vhost_task.c:115
> Code: 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 53 48 89 fb e8 c3 67 2c 00 48 8d 7b 
> 70 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 75 0a 48 
> 8b 7b 70 5b e9 fe bd 02 00 e8 79 ec 7e 00 eb
> RSP: 0018:c90003a9fc38 EFLAGS: 00010207
> RAX: dc00 RBX: fff4 RCX: 
> RDX: 000c RSI: 81564c8d RDI: 0064
> RBP: 88802b21dd40 R08: 0100 R09: 8c917cf3
> R10: fff4 R11:  R12: fff4
> R13: 888075d000b0 R14: 888075d0 R15: 888075d8
> FS:  56247300() GS:8880b980() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 7ffe3d8e5ff8 CR3: 215d4000 CR4: 003506f0
> DR0:  DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0400
> Call Trace:
>  
>  vhost_worker_create drivers/vhost/vhost.c:580 [inline]

The return value from vhost_task_create is incorrect if the kzalloc fails.
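
For context (editorial sketch of the caller convention; argument names
are illustrative): with ERR_PTR(-ENOMEM) the caller's NULL check passes
and the error pointer is dereferenced later, which is the GPF syzbot hit
in vhost_task_start():

	vtsk = vhost_task_create(vhost_worker, worker, name);
	if (!vtsk)			/* never true for ERR_PTR(-ENOMEM) */
		return -ENOMEM;

	vhost_task_start(vtsk);		/* dereferences the error pointer */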

Christian, here is a fix for what's in your tree. Do you want me to submit
a follow up patch like this or a replacement patch for:

commit 77feab3c4156 ("vhost_task: Allow vhost layer to use copy_process")

with the fix rolled into it?



>From 0677ad6d77722f301ca35e8e0f8fd0cbd5ed8484 Mon Sep 17 00:00:00 2001
From: Mike Christie 
Date: Tue, 21 Mar 2023 12:39:39 -0500
Subject: [PATCH] vhost_task: Fix vhost_task_create return value

vhost_task_create is supposed to return the vhost_task or NULL on
failure. This fixes it to return the correct value when the allocation
of the struct fails.
---
 kernel/vhost_task.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/vhost_task.c b/kernel/vhost_task.c
index 4b8aff160640..b7cbd66f889e 100644
--- a/kernel/vhost_task.c
+++ b/kernel/vhost_task.c
@@ -88,7 +88,7 @@ struct vhost_task *vhost_task_create(int (*fn)(void *), void 
*arg,
 
vtsk = kzalloc(sizeof(*vtsk), GFP_KERNEL);
if (!vtsk)
-   return ERR_PTR(-ENOMEM);
+   return NULL;
init_completion(&vtsk->exited);
vtsk->data = arg;
vtsk->fn = fn;




Re: [PATCH v2 0/7] vhost-scsi: Fix crashes and management op hangs

2023-03-21 Thread Mike Christie
On 3/21/23 2:12 AM, Michael S. Tsirkin wrote:
> On Mon, Mar 20, 2023 at 09:29:50PM -0500, michael.chris...@oracle.com wrote:
>> On 3/20/23 9:06 PM, Mike Christie wrote:
>>> The following patches were made over Linus tree.
>>
>> Hi Michael, I see you merged my first version of the patchset in your
>> vhost branch.
>>
>> Do you want me to just send a followup patchset?
>>
>> The major diff between the 2 versions:
>>
>> 1. I added the first 2 patches which fix some bugs in the existing code
>> I found while doing some code review and testing another LIO patchset
>> plus v1.
>>
>> Note: The other day I posted that I thought the 3rd patch in v1 caused
>> the bugs but they were already in the code.
>>
>> 2. In v2 I made one of the patches not need the vhost device lock when
>> unmapping/mapping LUNs, so you can add new LUNs even if one LUN on the same
>> vhost_scsi device was hung.
>>
>> Since these aren't regressions in the existing patches, I can just send those
>> as a followup patchset if that's preferred.
> 
> It's ok, I will drop v1 and replace it with v2.
> Do you feel any of this is needed in this release?
> 

No. It can wait for 6.4. Thanks.


Re: [PATCH v3] virtio: add VIRTIO_F_NOTIFICATION_DATA feature support

2023-03-21 Thread Cornelia Huck
On Tue, Mar 21 2023, "Michael S. Tsirkin"  wrote:

> On Tue, Mar 21, 2023 at 04:30:57PM +0100, Cornelia Huck wrote:
>> On Tue, Mar 21 2023, Viktor Prutyanov  wrote:
>> 
>> > On Tue, Mar 21, 2023 at 5:59 PM Cornelia Huck  wrote:
>> >>
>> >> On Tue, Mar 21 2023, Viktor Prutyanov  wrote:
>> >>
>> >> > According to VirtIO spec v1.2, VIRTIO_F_NOTIFICATION_DATA feature
>> >> > indicates that the driver passes extra data along with the queue
>> >> > notifications.
>> >> >
>> >> > In a split queue case, the extra data is 16-bit available index. In a
>> >> > packed queue case, the extra data is 1-bit wrap counter and 15-bit
>> >> > available index.
>> >> >
>> >> > Add support for this feature for MMIO, PCI and channel I/O transports.
>> >> >
>> >> > Signed-off-by: Viktor Prutyanov 
>> >> > ---
>> >> >  v3: support feature in virtio_ccw, remove VM_NOTIFY, use 
>> >> > avail_idx_shadow,
>> >> > remove byte swap, rename to vring_notification_data
>> >> >  v2: reject the feature in virtio_ccw, replace __le32 with u32
>> >> >
>> >> >  drivers/s390/virtio/virtio_ccw.c   |  4 +++-
>> >> >  drivers/virtio/virtio_mmio.c   | 14 +-
>> >> >  drivers/virtio/virtio_pci_common.c | 10 ++
>> >> >  drivers/virtio/virtio_pci_common.h |  4 
>> >> >  drivers/virtio/virtio_pci_legacy.c |  2 +-
>> >> >  drivers/virtio/virtio_pci_modern.c |  2 +-
>> >> >  drivers/virtio/virtio_ring.c   | 17 +
>> >> >  include/linux/virtio_ring.h|  2 ++
>> >> >  include/uapi/linux/virtio_config.h |  6 ++
>> >> >  9 files changed, 57 insertions(+), 4 deletions(-)
>> >> >
>> >> > diff --git a/drivers/s390/virtio/virtio_ccw.c 
>> >> > b/drivers/s390/virtio/virtio_ccw.c
>> >> > index 954fc31b4bc7..c33172c5b8d5 100644
>> >> > --- a/drivers/s390/virtio/virtio_ccw.c
>> >> > +++ b/drivers/s390/virtio/virtio_ccw.c
>> >> > @@ -396,13 +396,15 @@ static bool virtio_ccw_kvm_notify(struct 
>> >> > virtqueue *vq)
>> >> >   struct virtio_ccw_vq_info *info = vq->priv;
>> >> >   struct virtio_ccw_device *vcdev;
>> >> >   struct subchannel_id schid;
>> >> > + u32 data = __virtio_test_bit(vq->vdev, 
>> >> > VIRTIO_F_NOTIFICATION_DATA) ?
>> >> > + vring_notification_data(vq) : vq->index;
>> >> >
>> >> >   vcdev = to_vc_device(info->vq->vdev);
>> >> >   ccw_device_get_schid(vcdev->cdev, &schid);
>> >> >   BUILD_BUG_ON(sizeof(struct subchannel_id) != sizeof(unsigned 
>> >> > int));
>> >> >   info->cookie = kvm_hypercall3(KVM_S390_VIRTIO_CCW_NOTIFY,
>> >> > *((unsigned int *)&schid),
>> >> > -   vq->index, info->cookie);
>> >> > +   data, info->cookie);
>> >> >   if (info->cookie < 0)
>> >> >   return false;
>> >> >   return true;
>> >> > diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
>> >> > index 3ff746e3f24a..7c16e622c33d 100644
>> >> > --- a/drivers/virtio/virtio_mmio.c
>> >> > +++ b/drivers/virtio/virtio_mmio.c
>> >> > @@ -285,6 +285,16 @@ static bool vm_notify(struct virtqueue *vq)
>> >> >   return true;
>> >> >  }
>> >> >
>> >> > +static bool vm_notify_with_data(struct virtqueue *vq)
>> >> > +{
>> >> > + struct virtio_mmio_device *vm_dev = 
>> >> > to_virtio_mmio_device(vq->vdev);
>> >> > + u32 data = vring_notification_data(vq);
>> >> > +
>> >> > + writel(data, vm_dev->base + VIRTIO_MMIO_QUEUE_NOTIFY);
>> >>
>> >> Can't you simply use the same method as for ccw, i.e. use one callback
>> >> function that simply writes one value or the other?
>> >
>> > The idea is to eliminate the conditional branch induced by feature bit
>> > testing from the notification function. Probably, this can be done in
>> > the same way in ccw.
>> 
>> Hm, how noticeable is that branch? IOW, is it worth making the code less
>> readable for this?
>
> I'm not sure, but these things add up. I'm with Viktor here; let's just
> avoid the branch and not worry about whether it's important or not.
> So let's use the same thing here then? And we can use a subfunction
> to avoid code duplication.

Ok, let's do it that way.

>
>> (In any case, all transports probably should use the same method.)

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
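
As a minimal sketch of how the 32-bit notification value described above
can be composed (illustrative only, written against the spec layout; this
is not the kernel's exact vring_notification_data() implementation):

	/* Low 16 bits: virtqueue index. High 16 bits: the 16-bit available
	 * index (split ring), or the 15-bit available index with the wrap
	 * counter in bit 15 (packed ring). */
	static u32 example_notification_data(u16 vq_index, u16 avail_idx,
					     bool packed, bool wrap_counter)
	{
		u16 extra = packed ? ((avail_idx & 0x7fff) |
				      ((u16)wrap_counter << 15))
				   : avail_idx;

		return ((u32)extra << 16) | vq_index;
	}

The hot-path branch discussed above is then avoided by selecting the
notify callback (vm_notify_with_data vs. vm_notify) once at virtqueue
setup time, instead of testing the feature bit on every notification.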

Re: [PATCH v3] virtio: add VIRTIO_F_NOTIFICATION_DATA feature support

2023-03-21 Thread Michael S. Tsirkin
On Tue, Mar 21, 2023 at 04:30:57PM +0100, Cornelia Huck wrote:
> On Tue, Mar 21 2023, Viktor Prutyanov  wrote:
> 
> > On Tue, Mar 21, 2023 at 5:59 PM Cornelia Huck  wrote:
> >>
> >> On Tue, Mar 21 2023, Viktor Prutyanov  wrote:
> >>
> >> > According to VirtIO spec v1.2, VIRTIO_F_NOTIFICATION_DATA feature
> >> > indicates that the driver passes extra data along with the queue
> >> > notifications.
> >> >
> >> > In a split queue case, the extra data is 16-bit available index. In a
> >> > packed queue case, the extra data is 1-bit wrap counter and 15-bit
> >> > available index.
> >> >
> >> > Add support for this feature for MMIO, PCI and channel I/O transports.
> >> >
> >> > Signed-off-by: Viktor Prutyanov 
> >> > ---
> >> >  v3: support feature in virtio_ccw, remove VM_NOTIFY, use 
> >> > avail_idx_shadow,
> >> > remove byte swap, rename to vring_notification_data
> >> >  v2: reject the feature in virtio_ccw, replace __le32 with u32
> >> >
> >> >  drivers/s390/virtio/virtio_ccw.c   |  4 +++-
> >> >  drivers/virtio/virtio_mmio.c   | 14 +-
> >> >  drivers/virtio/virtio_pci_common.c | 10 ++
> >> >  drivers/virtio/virtio_pci_common.h |  4 
> >> >  drivers/virtio/virtio_pci_legacy.c |  2 +-
> >> >  drivers/virtio/virtio_pci_modern.c |  2 +-
> >> >  drivers/virtio/virtio_ring.c   | 17 +
> >> >  include/linux/virtio_ring.h|  2 ++
> >> >  include/uapi/linux/virtio_config.h |  6 ++
> >> >  9 files changed, 57 insertions(+), 4 deletions(-)
> >> >
> >> > diff --git a/drivers/s390/virtio/virtio_ccw.c 
> >> > b/drivers/s390/virtio/virtio_ccw.c
> >> > index 954fc31b4bc7..c33172c5b8d5 100644
> >> > --- a/drivers/s390/virtio/virtio_ccw.c
> >> > +++ b/drivers/s390/virtio/virtio_ccw.c
> >> > @@ -396,13 +396,15 @@ static bool virtio_ccw_kvm_notify(struct virtqueue 
> >> > *vq)
> >> >   struct virtio_ccw_vq_info *info = vq->priv;
> >> >   struct virtio_ccw_device *vcdev;
> >> >   struct subchannel_id schid;
> >> > + u32 data = __virtio_test_bit(vq->vdev, VIRTIO_F_NOTIFICATION_DATA) 
> >> > ?
> >> > + vring_notification_data(vq) : vq->index;
> >> >
> >> >   vcdev = to_vc_device(info->vq->vdev);
> >> >   ccw_device_get_schid(vcdev->cdev, );
> >> >   BUILD_BUG_ON(sizeof(struct subchannel_id) != sizeof(unsigned int));
> >> >   info->cookie = kvm_hypercall3(KVM_S390_VIRTIO_CCW_NOTIFY,
> >> > *((unsigned int *)),
> >> > -   vq->index, info->cookie);
> >> > +   data, info->cookie);
> >> >   if (info->cookie < 0)
> >> >   return false;
> >> >   return true;
> >> > diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> >> > index 3ff746e3f24a..7c16e622c33d 100644
> >> > --- a/drivers/virtio/virtio_mmio.c
> >> > +++ b/drivers/virtio/virtio_mmio.c
> >> > @@ -285,6 +285,16 @@ static bool vm_notify(struct virtqueue *vq)
> >> >   return true;
> >> >  }
> >> >
> >> > +static bool vm_notify_with_data(struct virtqueue *vq)
> >> > +{
> >> > + struct virtio_mmio_device *vm_dev = 
> >> > to_virtio_mmio_device(vq->vdev);
> >> > + u32 data = vring_notification_data(vq);
> >> > +
> >> > + writel(data, vm_dev->base + VIRTIO_MMIO_QUEUE_NOTIFY);
> >>
> >> Can't you simply use the same method as for ccw, i.e. use one callback
> >> function that simply writes one value or the other?
> >
> > The idea is to eliminate the conditional branch induced by feature bit
> > testing from the notification function. Probably, this can be done in
> > the same way in ccw.
> 
> Hm, how noticeable is that branch? IOW, is it worth making the code less
> readable for this?

I'm not sure, but these things add up. I'm with Viktor here; let's just
avoid the branch and not worry about whether it's important or not.
So let's use the same thing here then? And we can use a subfunction
to avoid code duplication.

> (In any case, all transports probably should use the same method.)

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

[PATCH v3 6/8] vdpa_sim: use kthread worker

2023-03-21 Thread Stefano Garzarella
Let's use our own kthread to run device jobs.
This gives us more flexibility; in particular, we can attach the kthread
to the user address space when vDPA uses the user's VA.

Acked-by: Jason Wang 
Signed-off-by: Stefano Garzarella 
---

Notes:
v3:
- fix `dev` not initialized in the error path [Simon Horman]

 drivers/vdpa/vdpa_sim/vdpa_sim.h |  3 ++-
 drivers/vdpa/vdpa_sim/vdpa_sim.c | 19 +--
 2 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.h b/drivers/vdpa/vdpa_sim/vdpa_sim.h
index acee20faaf6a..ce83f9130a5d 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.h
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.h
@@ -57,7 +57,8 @@ struct vdpasim_dev_attr {
 struct vdpasim {
struct vdpa_device vdpa;
struct vdpasim_virtqueue *vqs;
-   struct work_struct work;
+   struct kthread_worker *worker;
+   struct kthread_work work;
struct vdpasim_dev_attr dev_attr;
/* spinlock to synchronize virtqueue state */
spinlock_t lock;
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index f6329900e61a..1cfa56c52e5a 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -11,8 +11,8 @@
 #include 
 #include 
 #include 
+#include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -127,7 +127,7 @@ static void vdpasim_do_reset(struct vdpasim *vdpasim)
 static const struct vdpa_config_ops vdpasim_config_ops;
 static const struct vdpa_config_ops vdpasim_batch_config_ops;
 
-static void vdpasim_work_fn(struct work_struct *work)
+static void vdpasim_work_fn(struct kthread_work *work)
 {
struct vdpasim *vdpasim = container_of(work, struct vdpasim, work);
 
@@ -170,11 +170,17 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr 
*dev_attr,
 
vdpasim = vdpa_to_sim(vdpa);
vdpasim->dev_attr = *dev_attr;
-   INIT_WORK(&vdpasim->work, vdpasim_work_fn);
+   dev = &vdpasim->vdpa.dev;
+
+   kthread_init_work(&vdpasim->work, vdpasim_work_fn);
+   vdpasim->worker = kthread_create_worker(0, "vDPA sim worker: %s",
+   dev_attr->name);
+   if (IS_ERR(vdpasim->worker))
+   goto err_iommu;
+
spin_lock_init(&vdpasim->lock);
spin_lock_init(&vdpasim->iommu_lock);

-   dev = &vdpasim->vdpa.dev;
dev->dma_mask = &dev->coherent_dma_mask;
if (dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64)))
goto err_iommu;
@@ -223,7 +229,7 @@ EXPORT_SYMBOL_GPL(vdpasim_create);
 
 void vdpasim_schedule_work(struct vdpasim *vdpasim)
 {
-   schedule_work(&vdpasim->work);
+   kthread_queue_work(vdpasim->worker, &vdpasim->work);
 }
 EXPORT_SYMBOL_GPL(vdpasim_schedule_work);
 
@@ -623,7 +629,8 @@ static void vdpasim_free(struct vdpa_device *vdpa)
struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
int i;
 
-   cancel_work_sync(&vdpasim->work);
+   kthread_cancel_work_sync(&vdpasim->work);
+   kthread_destroy_worker(vdpasim->worker);

for (i = 0; i < vdpasim->dev_attr.nvqs; i++) {
vringh_kiov_cleanup(&vdpasim->vqs[i].out_iov);
-- 
2.39.2

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
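
For reference, the kthread_worker lifecycle this patch adopts follows the
general pattern below (a simplified sketch with illustrative names; error
handling trimmed):

	static struct kthread_worker *worker;
	static struct kthread_work work;

	static void work_fn(struct kthread_work *w)
	{
		/* the device job runs here, in the dedicated kthread */
	}

	static int example_setup(void)
	{
		kthread_init_work(&work, work_fn);
		worker = kthread_create_worker(0, "example-worker");
		return IS_ERR(worker) ? PTR_ERR(worker) : 0;
	}

	static void example_teardown(void)
	{
		kthread_cancel_work_sync(&work);  /* wait out a pending job */
		kthread_destroy_worker(worker);
	}

Jobs are then kicked with kthread_queue_work(worker, &work), which is
exactly what vdpasim_schedule_work() does after this patch.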


[PATCH v3 5/8] vdpa_sim: make devices agnostic for work management

2023-03-21 Thread Stefano Garzarella
Let's move work management inside the vdpa_sim core.
This way we can easily change how we manage the work items, without
having to change the devices each time.

Acked-by: Eugenio Pérez Martin 
Acked-by: Jason Wang 
Signed-off-by: Stefano Garzarella 
---
 drivers/vdpa/vdpa_sim/vdpa_sim.h |  3 ++-
 drivers/vdpa/vdpa_sim/vdpa_sim.c | 17 +++--
 drivers/vdpa/vdpa_sim/vdpa_sim_blk.c |  6 ++
 drivers/vdpa/vdpa_sim/vdpa_sim_net.c |  6 ++
 4 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.h b/drivers/vdpa/vdpa_sim/vdpa_sim.h
index 144858636c10..acee20faaf6a 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.h
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.h
@@ -45,7 +45,7 @@ struct vdpasim_dev_attr {
u32 ngroups;
u32 nas;
 
-   work_func_t work_fn;
+   void (*work_fn)(struct vdpasim *vdpasim);
void (*get_config)(struct vdpasim *vdpasim, void *config);
void (*set_config)(struct vdpasim *vdpasim, const void *config);
int (*get_stats)(struct vdpasim *vdpasim, u16 idx,
@@ -78,6 +78,7 @@ struct vdpasim {
 
 struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *attr,
   const struct vdpa_dev_set_config *config);
+void vdpasim_schedule_work(struct vdpasim *vdpasim);
 
 /* TODO: cross-endian support */
 static inline bool vdpasim_is_little_endian(struct vdpasim *vdpasim)
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index 47cdf2a1f5b8..f6329900e61a 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -127,6 +127,13 @@ static void vdpasim_do_reset(struct vdpasim *vdpasim)
 static const struct vdpa_config_ops vdpasim_config_ops;
 static const struct vdpa_config_ops vdpasim_batch_config_ops;
 
+static void vdpasim_work_fn(struct work_struct *work)
+{
+   struct vdpasim *vdpasim = container_of(work, struct vdpasim, work);
+
+   vdpasim->dev_attr.work_fn(vdpasim);
+}
+
 struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
   const struct vdpa_dev_set_config *config)
 {
@@ -163,7 +170,7 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr 
*dev_attr,
 
vdpasim = vdpa_to_sim(vdpa);
vdpasim->dev_attr = *dev_attr;
-   INIT_WORK(&vdpasim->work, dev_attr->work_fn);
+   INIT_WORK(&vdpasim->work, vdpasim_work_fn);
spin_lock_init(&vdpasim->lock);
spin_lock_init(&vdpasim->iommu_lock);
 
@@ -214,6 +221,12 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr 
*dev_attr,
 }
 EXPORT_SYMBOL_GPL(vdpasim_create);
 
+void vdpasim_schedule_work(struct vdpasim *vdpasim)
+{
+   schedule_work(&vdpasim->work);
+}
+EXPORT_SYMBOL_GPL(vdpasim_schedule_work);
+
 static int vdpasim_set_vq_address(struct vdpa_device *vdpa, u16 idx,
  u64 desc_area, u64 driver_area,
  u64 device_area)
@@ -248,7 +261,7 @@ static void vdpasim_kick_vq(struct vdpa_device *vdpa, u16 
idx)
}
 
if (vq->ready)
-   schedule_work(&vdpasim->work);
+   vdpasim_schedule_work(vdpasim);
 }
 
 static void vdpasim_set_vq_cb(struct vdpa_device *vdpa, u16 idx,
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c 
b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
index 5117959bed8a..eb4897c8541e 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
@@ -11,7 +11,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -286,9 +285,8 @@ static bool vdpasim_blk_handle_req(struct vdpasim *vdpasim,
return handled;
 }
 
-static void vdpasim_blk_work(struct work_struct *work)
+static void vdpasim_blk_work(struct vdpasim *vdpasim)
 {
-   struct vdpasim *vdpasim = container_of(work, struct vdpasim, work);
bool reschedule = false;
int i;
 
@@ -326,7 +324,7 @@ static void vdpasim_blk_work(struct work_struct *work)
spin_unlock(&vdpasim->lock);
 
if (reschedule)
-   schedule_work(&vdpasim->work);
+   vdpasim_schedule_work(vdpasim);
 }
 
 static void vdpasim_blk_get_config(struct vdpasim *vdpasim, void *config)
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_net.c 
b/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
index 862f405362de..e61a9ecbfafe 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
@@ -11,7 +11,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -192,9 +191,8 @@ static void vdpasim_handle_cvq(struct vdpasim *vdpasim)
u64_stats_update_end(&net->cq_stats.syncp);
 }
 
-static void vdpasim_net_work(struct work_struct *work)
+static void vdpasim_net_work(struct vdpasim *vdpasim)
 {
-   struct vdpasim *vdpasim = container_of(work, struct vdpasim, work);
struct vdpasim_virtqueue *txq = &vdpasim->vqs[1];
struct vdpasim_virtqueue *rxq = &vdpasim->vqs[0];
struct vdpasim_net *net = sim_to_net(vdpasim);
@@ -260,7 +258,7 @@ static void vdpasim_net_work(struct work_struct *work)
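
The shape of the decoupling, as a plain-C sketch (illustrative only, not
kernel code): devices register a mechanism-agnostic callback, and only
the core knows how that callback gets run:

	struct sim;
	typedef void (*sim_work_fn_t)(struct sim *s);

	struct sim {
		sim_work_fn_t work_fn;	/* set by the device at creation */
		/* the run mechanism (workqueue, kthread, ...) stays hidden */
	};

	static void core_dispatch(struct sim *s)
	{
		s->work_fn(s);	/* invoked by whichever mechanism the core picked */
	}

This is why the kthread-worker patch (6/8) can swap the workqueue for a
kthread worker without touching the blk and net devices.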
   

[PATCH v3 8/8] vdpa_sim: add support for user VA

2023-03-21 Thread Stefano Garzarella
The new "use_va" module parameter (default: true) is used in
vdpa_alloc_device() to inform the vDPA framework that the device
supports VA.

vringh is initialized to use VA only when "use_va" is true and the
user's mm has been bound, i.e. only when the bus supports user VA
(e.g. vhost-vdpa).

vdpasim_mm_work_fn work is used to serialize the binding to a new
address space when the .bind_mm callback is invoked, and unbinding
when the .unbind_mm callback is invoked.

Call mmget_not_zero()/kthread_use_mm() inside the worker function
to pin the address space only as long as needed, following the
documentation of mmget() in include/linux/sched/mm.h:

  * Never use this function to pin this address space for an
  * unbounded/indefinite amount of time.

Signed-off-by: Stefano Garzarella 
---

Notes:
v3:
- called mmget_not_zero() before kthread_use_mm() [Jason]
  As the documentation of mmget() in include/linux/sched/mm.h says:

  * Never use this function to pin this address space for an
  * unbounded/indefinite amount of time.

  I moved mmget_not_zero/kthread_use_mm inside the worker function,
  this way we pin the address space only as long as needed.
  This is similar to what vfio_iommu_type1_dma_rw_chunk() does in
  drivers/vfio/vfio_iommu_type1.c
- simplified the mm bind/unbind [Jason]
- renamed vdpasim_worker_change_mm_sync() [Jason]
- fix commit message (s/default: false/default: true)
v2:
- `use_va` set to true by default [Eugenio]
- supported the new unbind_mm callback [Jason]
- removed the unbind_mm call in vdpasim_do_reset() [Jason]
- avoided to release the lock while call kthread_flush_work() since we
  are now using a mutex to protect the device state

 drivers/vdpa/vdpa_sim/vdpa_sim.h |  1 +
 drivers/vdpa/vdpa_sim/vdpa_sim.c | 80 +++-
 2 files changed, 79 insertions(+), 2 deletions(-)

diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.h b/drivers/vdpa/vdpa_sim/vdpa_sim.h
index 4774292fba8c..3a42887d05d9 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.h
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.h
@@ -59,6 +59,7 @@ struct vdpasim {
struct vdpasim_virtqueue *vqs;
struct kthread_worker *worker;
struct kthread_work work;
+   struct mm_struct *mm_bound;
struct vdpasim_dev_attr dev_attr;
/* mutex to synchronize virtqueue state */
struct mutex mutex;
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index ab4cfb82c237..23c891cdcd54 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -35,10 +35,44 @@ module_param(max_iotlb_entries, int, 0444);
 MODULE_PARM_DESC(max_iotlb_entries,
 "Maximum number of iotlb entries for each address space. 0 
means unlimited. (default: 2048)");
 
+static bool use_va = true;
+module_param(use_va, bool, 0444);
+MODULE_PARM_DESC(use_va, "Enable/disable the device's ability to use VA");
+
 #define VDPASIM_QUEUE_ALIGN PAGE_SIZE
 #define VDPASIM_QUEUE_MAX 256
 #define VDPASIM_VENDOR_ID 0
 
+struct vdpasim_mm_work {
+   struct kthread_work work;
+   struct vdpasim *vdpasim;
+   struct mm_struct *mm_to_bind;
+   int ret;
+};
+
+static void vdpasim_mm_work_fn(struct kthread_work *work)
+{
+   struct vdpasim_mm_work *mm_work =
+   container_of(work, struct vdpasim_mm_work, work);
+   struct vdpasim *vdpasim = mm_work->vdpasim;
+
+   mm_work->ret = 0;
+
+   //TODO: should we attach the cgroup of the mm owner?
+   vdpasim->mm_bound = mm_work->mm_to_bind;
+}
+
+static void vdpasim_worker_change_mm_sync(struct vdpasim *vdpasim,
+ struct vdpasim_mm_work *mm_work)
+{
+   struct kthread_work *work = &mm_work->work;
+
+   kthread_init_work(work, vdpasim_mm_work_fn);
+   kthread_queue_work(vdpasim->worker, work);
+
+   kthread_flush_work(work);
+}
+
 static struct vdpasim *vdpa_to_sim(struct vdpa_device *vdpa)
 {
return container_of(vdpa, struct vdpasim, vdpa);
@@ -59,8 +93,10 @@ static void vdpasim_queue_ready(struct vdpasim *vdpasim, 
unsigned int idx)
 {
struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
uint16_t last_avail_idx = vq->vring.last_avail_idx;
+   bool va_enabled = use_va && vdpasim->mm_bound;
 
-   vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true, false,
+   vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true,
+ va_enabled,
  (struct vring_desc *)(uintptr_t)vq->desc_addr,
  (struct vring_avail *)
  (uintptr_t)vq->driver_addr,
@@ -130,8 +166,20 @@ static const struct vdpa_config_ops 
vdpasim_batch_config_ops;
 static void vdpasim_work_fn(struct kthread_work *work)
 {
struct vdpasim *vdpasim = container_of(work, struct vdpasim, work);
+   struct mm_struct *mm = vdpasim->mm_bound;
+
+   if (mm) {
+
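
A sketch of the worker-side mm handling described in the changelog
(simplified and hedged; not the verbatim patch):

	static void example_work_fn(struct kthread_work *work)
	{
		struct vdpasim *vdpasim = container_of(work, struct vdpasim, work);
		struct mm_struct *mm = vdpasim->mm_bound;

		if (mm) {
			if (!mmget_not_zero(mm))	/* owner already exited */
				return;
			kthread_use_mm(mm);	/* adopt the user address space */
		}

		vdpasim->dev_attr.work_fn(vdpasim);	/* run the device job */

		if (mm) {
			kthread_unuse_mm(mm);
			mmput(mm);	/* drop the pin as soon as possible */
		}
	}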

[PATCH v3 7/8] vdpa_sim: replace the spinlock with a mutex to protect the state

2023-03-21 Thread Stefano Garzarella
The spinlock we use to protect the state of the simulator is sometimes
held for a long time (for example, when devices handle requests).

It also prevents us from calling functions that might sleep (such as
kthread_flush_work() in the next patch), forcing us to release and
retake the lock around such calls.

For these reasons, let's replace the spinlock with a mutex that gives
us more flexibility.

Suggested-by: Jason Wang 
Acked-by: Jason Wang 
Signed-off-by: Stefano Garzarella 
---
 drivers/vdpa/vdpa_sim/vdpa_sim.h |  4 ++--
 drivers/vdpa/vdpa_sim/vdpa_sim.c | 34 ++--
 drivers/vdpa/vdpa_sim/vdpa_sim_blk.c |  4 ++--
 drivers/vdpa/vdpa_sim/vdpa_sim_net.c |  4 ++--
 4 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.h b/drivers/vdpa/vdpa_sim/vdpa_sim.h
index ce83f9130a5d..4774292fba8c 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.h
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.h
@@ -60,8 +60,8 @@ struct vdpasim {
struct kthread_worker *worker;
struct kthread_work work;
struct vdpasim_dev_attr dev_attr;
-   /* spinlock to synchronize virtqueue state */
-   spinlock_t lock;
+   /* mutex to synchronize virtqueue state */
+   struct mutex mutex;
/* virtio config according to device type */
void *config;
struct vhost_iotlb *iommu;
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index 1cfa56c52e5a..ab4cfb82c237 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -178,7 +178,7 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr 
*dev_attr,
if (IS_ERR(vdpasim->worker))
goto err_iommu;
 
-   spin_lock_init(&vdpasim->lock);
+   mutex_init(&vdpasim->mutex);
spin_lock_init(&vdpasim->iommu_lock);

dev->dma_mask = &dev->coherent_dma_mask;
@@ -286,13 +286,13 @@ static void vdpasim_set_vq_ready(struct vdpa_device 
*vdpa, u16 idx, bool ready)
struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
bool old_ready;

-   spin_lock(&vdpasim->lock);
+   mutex_lock(&vdpasim->mutex);
old_ready = vq->ready;
vq->ready = ready;
if (vq->ready && !old_ready) {
vdpasim_queue_ready(vdpasim, idx);
}
-   spin_unlock(&vdpasim->lock);
+   mutex_unlock(&vdpasim->mutex);
 }
 
 static bool vdpasim_get_vq_ready(struct vdpa_device *vdpa, u16 idx)
@@ -310,9 +310,9 @@ static int vdpasim_set_vq_state(struct vdpa_device *vdpa, 
u16 idx,
struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
struct vringh *vrh = &vq->vring;

-   spin_lock(&vdpasim->lock);
+   mutex_lock(&vdpasim->mutex);
vrh->last_avail_idx = state->split.avail_index;
-   spin_unlock(&vdpasim->lock);
+   mutex_unlock(&vdpasim->mutex);
 
return 0;
 }
@@ -409,9 +409,9 @@ static u8 vdpasim_get_status(struct vdpa_device *vdpa)
struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
u8 status;
 
-   spin_lock(&vdpasim->lock);
+   mutex_lock(&vdpasim->mutex);
status = vdpasim->status;
-   spin_unlock(&vdpasim->lock);
+   mutex_unlock(&vdpasim->mutex);
 
return status;
 }
@@ -420,19 +420,19 @@ static void vdpasim_set_status(struct vdpa_device *vdpa, 
u8 status)
 {
struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
 
-   spin_lock(&vdpasim->lock);
+   mutex_lock(&vdpasim->mutex);
vdpasim->status = status;
-   spin_unlock(&vdpasim->lock);
+   mutex_unlock(&vdpasim->mutex);
 }
 
 static int vdpasim_reset(struct vdpa_device *vdpa)
 {
struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
 
-   spin_lock(&vdpasim->lock);
+   mutex_lock(&vdpasim->mutex);
vdpasim->status = 0;
vdpasim_do_reset(vdpasim);
-   spin_unlock(&vdpasim->lock);
+   mutex_unlock(&vdpasim->mutex);
 
return 0;
 }
@@ -441,9 +441,9 @@ static int vdpasim_suspend(struct vdpa_device *vdpa)
 {
struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
 
-   spin_lock(&vdpasim->lock);
+   mutex_lock(&vdpasim->mutex);
vdpasim->running = false;
-   spin_unlock(&vdpasim->lock);
+   mutex_unlock(&vdpasim->mutex);
 
return 0;
 }
@@ -453,7 +453,7 @@ static int vdpasim_resume(struct vdpa_device *vdpa)
struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
int i;
 
-   spin_lock(&vdpasim->lock);
+   mutex_lock(&vdpasim->mutex);
vdpasim->running = true;
 
if (vdpasim->pending_kick) {
@@ -464,7 +464,7 @@ static int vdpasim_resume(struct vdpa_device *vdpa)
vdpasim->pending_kick = false;
}
 
-   spin_unlock(&vdpasim->lock);
+   mutex_unlock(&vdpasim->mutex);
 
return 0;
 }
@@ -536,14 +536,14 @@ static int vdpasim_set_group_asid(struct vdpa_device 
*vdpa, unsigned int group,
 
iommu = &vdpasim->iommu[asid];

-   spin_lock(&vdpasim->lock);
+   mutex_lock(&vdpasim->mutex);

for (i = 0; i < vdpasim->dev_attr.nvqs; i++)
if (vdpasim_get_vq_group(vdpa, i) == group)
vringh_set_iotlb(&vdpasim->vqs[i].vring, iommu,
 &vdpasim->iommu_lock);

-   spin_unlock(&vdpasim->lock);
+   mutex_unlock(&vdpasim->mutex);
 
return 0;
 }
diff --git 
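
The concrete property the mutex buys, as a short sketch (illustrative):
sleeping calls become legal while the lock is held:

	mutex_lock(&vdpasim->mutex);
	/* kthread_flush_work() may sleep: a bug under a spinlock,
	 * fine under a mutex */
	kthread_flush_work(&vdpasim->work);
	mutex_unlock(&vdpasim->mutex);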

[PATCH v3 4/8] vringh: support VA with iotlb

2023-03-21 Thread Stefano Garzarella
vDPA supports the possibility of using user VA in the iotlb messages.
So, let's add support for user VA in vringh so it can be used in the
vDPA simulators.

Signed-off-by: Stefano Garzarella 
---

Notes:
v3:
- refactored avoiding code duplication [Eugenio]
v2:
- replace kmap_atomic() with kmap_local_page() [see previous patch]
- fix cast warnings when build with W=1 C=1

 include/linux/vringh.h|   5 +-
 drivers/vdpa/mlx5/net/mlx5_vnet.c |   2 +-
 drivers/vdpa/vdpa_sim/vdpa_sim.c  |   4 +-
 drivers/vhost/vringh.c| 153 +++---
 4 files changed, 127 insertions(+), 37 deletions(-)

diff --git a/include/linux/vringh.h b/include/linux/vringh.h
index 1991a02c6431..d39b9f2dcba0 100644
--- a/include/linux/vringh.h
+++ b/include/linux/vringh.h
@@ -32,6 +32,9 @@ struct vringh {
/* Can we get away with weak barriers? */
bool weak_barriers;
 
+   /* Use user's VA */
+   bool use_va;
+
/* Last available index we saw (ie. where we're up to). */
u16 last_avail_idx;
 
@@ -279,7 +282,7 @@ void vringh_set_iotlb(struct vringh *vrh, struct 
vhost_iotlb *iotlb,
  spinlock_t *iotlb_lock);
 
 int vringh_init_iotlb(struct vringh *vrh, u64 features,
- unsigned int num, bool weak_barriers,
+ unsigned int num, bool weak_barriers, bool use_va,
  struct vring_desc *desc,
  struct vring_avail *avail,
  struct vring_used *used);
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 520646ae7fa0..dfd0e000217b 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -2537,7 +2537,7 @@ static int setup_cvq_vring(struct mlx5_vdpa_dev *mvdev)
 
if (mvdev->actual_features & BIT_ULL(VIRTIO_NET_F_CTRL_VQ))
err = vringh_init_iotlb(&cvq->vring, mvdev->actual_features,
-   MLX5_CVQ_MAX_ENT, false,
+   MLX5_CVQ_MAX_ENT, false, false,
(struct vring_desc 
*)(uintptr_t)cvq->desc_addr,
(struct vring_avail 
*)(uintptr_t)cvq->driver_addr,
(struct vring_used 
*)(uintptr_t)cvq->device_addr);
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index eea23c630f7c..47cdf2a1f5b8 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -60,7 +60,7 @@ static void vdpasim_queue_ready(struct vdpasim *vdpasim, 
unsigned int idx)
struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
uint16_t last_avail_idx = vq->vring.last_avail_idx;
 
-   vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true,
+   vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true, false,
  (struct vring_desc *)(uintptr_t)vq->desc_addr,
  (struct vring_avail *)
  (uintptr_t)vq->driver_addr,
@@ -92,7 +92,7 @@ static void vdpasim_vq_reset(struct vdpasim *vdpasim,
vq->cb = NULL;
vq->private = NULL;
vringh_init_iotlb(&vq->vring, vdpasim->dev_attr.supported_features,
- VDPASIM_QUEUE_MAX, false, NULL, NULL, NULL);
+ VDPASIM_QUEUE_MAX, false, false, NULL, NULL, NULL);
 
vq->vring.notify = NULL;
 }
diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
index 0ba3ef809e48..72c88519329a 100644
--- a/drivers/vhost/vringh.c
+++ b/drivers/vhost/vringh.c
@@ -1094,10 +1094,18 @@ EXPORT_SYMBOL(vringh_need_notify_kern);
 
 #if IS_REACHABLE(CONFIG_VHOST_IOTLB)
 
+struct iotlb_vec {
+   union {
+   struct iovec *iovec;
+   struct bio_vec *bvec;
+   } iov;
+   size_t count;
+   bool is_iovec;
+};
+
 static int iotlb_translate(const struct vringh *vrh,
   u64 addr, u64 len, u64 *translated,
-  struct bio_vec iov[],
-  int iov_size, u32 perm)
+  struct iotlb_vec *ivec, u32 perm)
 {
struct vhost_iotlb_map *map;
struct vhost_iotlb *iotlb = vrh->iotlb;
@@ -1107,9 +1115,9 @@ static int iotlb_translate(const struct vringh *vrh,
spin_lock(vrh->iotlb_lock);
 
while (len > s) {
-   u64 size, pa, pfn;
+   u64 size;
 
-   if (unlikely(ret >= iov_size)) {
+   if (unlikely(ret >= ivec->count)) {
ret = -ENOBUFS;
break;
}
@@ -1124,10 +1132,22 @@ static int iotlb_translate(const struct vringh *vrh,
}
 
size = map->size - addr + map->start;
-   pa = map->addr + addr - map->start;
-   pfn = pa >> PAGE_SHIFT;
-   bvec_set_page(&iov[ret], pfn_to_page(pfn), min(len - s, size),
-
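
A hedged sketch of how the iotlb_vec union above can be filled per
translated range (illustrative, based on the struct definition shown;
not the verbatim patch):

	u64 io_addr = map->addr + addr - map->start;
	u64 io_len = min(len - s, size);

	if (ivec->is_iovec) {
		/* user VA mode: map->addr holds a userspace address */
		ivec->iov.iovec[ret].iov_base =
			(void __user *)(unsigned long)io_addr;
		ivec->iov.iovec[ret].iov_len = io_len;
	} else {
		/* translated PA mode: record page, length and offset */
		bvec_set_page(&ivec->iov.bvec[ret],
			      pfn_to_page(io_addr >> PAGE_SHIFT),
			      io_len, io_addr & (PAGE_SIZE - 1));
	}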

[PATCH v3 3/8] vringh: replace kmap_atomic() with kmap_local_page()

2023-03-21 Thread Stefano Garzarella
kmap_atomic() is deprecated in favor of kmap_local_page() since commit
f3ba3c710ac5 ("mm/highmem: Provide kmap_local*").

With kmap_local_page() the mappings are per thread, CPU local, can take
page-faults, and can be called from any context (including interrupts).
Furthermore, the tasks can be preempted and, when they are scheduled to
run again, the kernel virtual addresses are restored and still valid.

kmap_atomic() is implemented like a kmap_local_page() which also disables
page-faults and preemption (the latter only for !PREEMPT_RT kernels,
otherwise it only disables migration).

The code within the mappings/un-mappings in getu16_iotlb() and
putu16_iotlb() doesn't depend on the above-mentioned side effects of
kmap_atomic(), so a mere replacement of the old API with the new one
is all that is required (i.e., there is no need to explicitly add calls
to pagefault_disable() and/or preempt_disable()).

This commit reuses a "boiler plate" commit message from Fabio, who has
already made this change in several places.

Cc: "Fabio M. De Francesco" 
Signed-off-by: Stefano Garzarella 
---

Notes:
v3:
- credited Fabio for the commit message
- added reference to the commit that deprecated kmap_atomic() [Jason]
v2:
- added this patch since checkpatch.pl complained about deprecation
  of kmap_atomic() touched by next patch

 drivers/vhost/vringh.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
index a1e27da54481..0ba3ef809e48 100644
--- a/drivers/vhost/vringh.c
+++ b/drivers/vhost/vringh.c
@@ -1220,10 +1220,10 @@ static inline int getu16_iotlb(const struct vringh *vrh,
if (ret < 0)
return ret;
 
-   kaddr = kmap_atomic(iov.bv_page);
+   kaddr = kmap_local_page(iov.bv_page);
from = kaddr + iov.bv_offset;
*val = vringh16_to_cpu(vrh, READ_ONCE(*(__virtio16 *)from));
-   kunmap_atomic(kaddr);
+   kunmap_local(kaddr);
 
return 0;
 }
@@ -1241,10 +1241,10 @@ static inline int putu16_iotlb(const struct vringh *vrh,
if (ret < 0)
return ret;
 
-   kaddr = kmap_atomic(iov.bv_page);
+   kaddr = kmap_local_page(iov.bv_page);
to = kaddr + iov.bv_offset;
WRITE_ONCE(*(__virtio16 *)to, cpu_to_vringh16(vrh, val));
-   kunmap_atomic(kaddr);
+   kunmap_local(kaddr);
 
return 0;
 }
-- 
2.39.2

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v3 2/8] vhost-vdpa: use bind_mm/unbind_mm device callbacks

2023-03-21 Thread Stefano Garzarella
When the user calls the VHOST_SET_OWNER ioctl and the vDPA device
has `use_va` set to true, let's call the bind_mm callback.
In this way we can bind the device to the user address space
and directly use the user VA.

The unbind_mm callback is called during the release after
stopping the device.

Signed-off-by: Stefano Garzarella 
---

Notes:
v3:
- added `case VHOST_SET_OWNER` in vhost_vdpa_unlocked_ioctl() [Jason]
v2:
- call the new unbind_mm callback during the release [Jason]
- avoid to call bind_mm callback after the reset, since the device
  is not detaching it now during the reset

 drivers/vhost/vdpa.c | 31 +++
 1 file changed, 31 insertions(+)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 7be9d9d8f01c..20250c3418b2 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -219,6 +219,28 @@ static int vhost_vdpa_reset(struct vhost_vdpa *v)
return vdpa_reset(vdpa);
 }
 
+static long vhost_vdpa_bind_mm(struct vhost_vdpa *v)
+{
+   struct vdpa_device *vdpa = v->vdpa;
+   const struct vdpa_config_ops *ops = vdpa->config;
+
+   if (!vdpa->use_va || !ops->bind_mm)
+   return 0;
+
+   return ops->bind_mm(vdpa, v->vdev.mm);
+}
+
+static void vhost_vdpa_unbind_mm(struct vhost_vdpa *v)
+{
+   struct vdpa_device *vdpa = v->vdpa;
+   const struct vdpa_config_ops *ops = vdpa->config;
+
+   if (!vdpa->use_va || !ops->unbind_mm)
+   return;
+
+   ops->unbind_mm(vdpa);
+}
+
 static long vhost_vdpa_get_device_id(struct vhost_vdpa *v, u8 __user *argp)
 {
struct vdpa_device *vdpa = v->vdpa;
@@ -709,6 +731,14 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
case VHOST_VDPA_RESUME:
r = vhost_vdpa_resume(v);
break;
+   case VHOST_SET_OWNER:
+   r = vhost_dev_set_owner(d);
+   if (r)
+   break;
+   r = vhost_vdpa_bind_mm(v);
+   if (r)
+   vhost_dev_reset_owner(d, NULL);
+   break;
default:
r = vhost_dev_ioctl(&v->vdev, cmd, argp);
if (r == -ENOIOCTLCMD)
@@ -1287,6 +1317,7 @@ static int vhost_vdpa_release(struct inode *inode, struct 
file *filep)
vhost_vdpa_clean_irq(v);
vhost_vdpa_reset(v);
vhost_dev_stop(&v->vdev);
+   vhost_vdpa_unbind_mm(v);
vhost_vdpa_config_put(v);
vhost_vdpa_cleanup(v);
mutex_unlock(&d->mutex);
-- 
2.39.2

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v3 0/8] vdpa_sim: add support for user VA

2023-03-21 Thread Stefano Garzarella
This series adds support for the use of user virtual addresses in the
vDPA simulator devices.

The main reason for this change is to lift the pinning of all guest
memory, especially for virtio devices implemented in software.

The next step would be to generalize the code in vdpa-sim to allow the
implementation of in-kernel software devices, similar to vhost, but using
vDPA so we can reuse the same software stack (e.g. in QEMU) for both HW
and SW devices.

For example, we have never merged vhost-blk, and lately there has been interest.
So it would be nice to do it directly with vDPA to reuse the same code in the
VMM for both HW and SW vDPA block devices.

The main problem (addressed by this series) was due to the pinning of all
guest memory, which thus prevented the overcommit of guest memory.

Thanks,
Stefano

Changelog listed in each patch.
v2: https://lore.kernel.org/lkml/20230302113421.174582-1-sgarz...@redhat.com/
RFC v1: 
https://lore.kernel.org/lkml/20221214163025.103075-1-sgarz...@redhat.com/

Stefano Garzarella (8):
  vdpa: add bind_mm/unbind_mm callbacks
  vhost-vdpa: use bind_mm/unbind_mm device callbacks
  vringh: replace kmap_atomic() with kmap_local_page()
  vringh: support VA with iotlb
  vdpa_sim: make devices agnostic for work management
  vdpa_sim: use kthread worker
  vdpa_sim: replace the spinlock with a mutex to protect the state
  vdpa_sim: add support for user VA

 drivers/vdpa/vdpa_sim/vdpa_sim.h |  11 +-
 include/linux/vdpa.h |  10 ++
 include/linux/vringh.h   |   5 +-
 drivers/vdpa/mlx5/net/mlx5_vnet.c|   2 +-
 drivers/vdpa/vdpa_sim/vdpa_sim.c | 144 -
 drivers/vdpa/vdpa_sim/vdpa_sim_blk.c |  10 +-
 drivers/vdpa/vdpa_sim/vdpa_sim_net.c |  10 +-
 drivers/vhost/vdpa.c |  31 ++
 drivers/vhost/vringh.c   | 153 +--
 9 files changed, 301 insertions(+), 75 deletions(-)

-- 
2.39.2

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v3 1/8] vdpa: add bind_mm/unbind_mm callbacks

2023-03-21 Thread Stefano Garzarella
These new optional callbacks are used to bind/unbind the device to/from
a specific address space, so the vDPA framework can use VA when they
are implemented.

Suggested-by: Jason Wang 
Signed-off-by: Stefano Garzarella 
---

Notes:
v2:
- removed `struct task_struct *owner` param (unused for now, maybe
  useful to support cgroups) [Jason]
- add unbind_mm callback [Jason]

 include/linux/vdpa.h | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index 43f59ef10cc9..369c21394284 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -290,6 +290,14 @@ struct vdpa_map_file {
  * @vdev: vdpa device
  * @idx: virtqueue index
  * Returns pointer to structure device or error 
(NULL)
+ * @bind_mm:   Bind the device to a specific address space
+ * so the vDPA framework can use VA when this
+ * callback is implemented. (optional)
+ * @vdev: vdpa device
+ * @mm: address space to bind
+ * @unbind_mm: Unbind the device from the address space
+ * bound using the bind_mm callback. (optional)
+ * @vdev: vdpa device
  * @free:  Free resources that belongs to vDPA (optional)
  * @vdev: vdpa device
  */
@@ -351,6 +359,8 @@ struct vdpa_config_ops {
int (*set_group_asid)(struct vdpa_device *vdev, unsigned int group,
  unsigned int asid);
struct device *(*get_vq_dma_dev)(struct vdpa_device *vdev, u16 idx);
+   int (*bind_mm)(struct vdpa_device *vdev, struct mm_struct *mm);
+   void (*unbind_mm)(struct vdpa_device *vdev);
 
/* Free device resources */
void (*free)(struct vdpa_device *vdev);
-- 
2.39.2
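
A sketch of what a VA-capable parent driver might plug in (illustrative
names; the simulator's actual implementation arrives later in this
series):

	static int mydev_bind_mm(struct vdpa_device *vdev, struct mm_struct *mm)
	{
		struct mydev *d = container_of(vdev, struct mydev, vdpa);

		d->mm = mm;	/* remember the address space to use for VA */
		return 0;
	}

	static void mydev_unbind_mm(struct vdpa_device *vdev)
	{
		struct mydev *d = container_of(vdev, struct mydev, vdpa);

		d->mm = NULL;
	}

	static const struct vdpa_config_ops mydev_config_ops = {
		/* ...mandatory ops elided... */
		.bind_mm	= mydev_bind_mm,
		.unbind_mm	= mydev_unbind_mm,
	};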

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v3] virtio: add VIRTIO_F_NOTIFICATION_DATA feature support

2023-03-21 Thread Cornelia Huck
On Tue, Mar 21 2023, Viktor Prutyanov  wrote:

> On Tue, Mar 21, 2023 at 5:59 PM Cornelia Huck  wrote:
>>
>> On Tue, Mar 21 2023, Viktor Prutyanov  wrote:
>>
>> > According to VirtIO spec v1.2, VIRTIO_F_NOTIFICATION_DATA feature
>> > indicates that the driver passes extra data along with the queue
>> > notifications.
>> >
>> > In a split queue case, the extra data is 16-bit available index. In a
>> > packed queue case, the extra data is 1-bit wrap counter and 15-bit
>> > available index.
>> >
>> > Add support for this feature for MMIO, PCI and channel I/O transports.
>> >
>> > Signed-off-by: Viktor Prutyanov 
>> > ---
>> >  v3: support feature in virtio_ccw, remove VM_NOTIFY, use avail_idx_shadow,
>> > remove byte swap, rename to vring_notification_data
>> >  v2: reject the feature in virtio_ccw, replace __le32 with u32
>> >
>> >  drivers/s390/virtio/virtio_ccw.c   |  4 +++-
>> >  drivers/virtio/virtio_mmio.c   | 14 +-
>> >  drivers/virtio/virtio_pci_common.c | 10 ++
>> >  drivers/virtio/virtio_pci_common.h |  4 
>> >  drivers/virtio/virtio_pci_legacy.c |  2 +-
>> >  drivers/virtio/virtio_pci_modern.c |  2 +-
>> >  drivers/virtio/virtio_ring.c   | 17 +
>> >  include/linux/virtio_ring.h|  2 ++
>> >  include/uapi/linux/virtio_config.h |  6 ++
>> >  9 files changed, 57 insertions(+), 4 deletions(-)
>> >
>> > diff --git a/drivers/s390/virtio/virtio_ccw.c 
>> > b/drivers/s390/virtio/virtio_ccw.c
>> > index 954fc31b4bc7..c33172c5b8d5 100644
>> > --- a/drivers/s390/virtio/virtio_ccw.c
>> > +++ b/drivers/s390/virtio/virtio_ccw.c
>> > @@ -396,13 +396,15 @@ static bool virtio_ccw_kvm_notify(struct virtqueue 
>> > *vq)
>> >   struct virtio_ccw_vq_info *info = vq->priv;
>> >   struct virtio_ccw_device *vcdev;
>> >   struct subchannel_id schid;
>> > + u32 data = __virtio_test_bit(vq->vdev, VIRTIO_F_NOTIFICATION_DATA) ?
>> > + vring_notification_data(vq) : vq->index;
>> >
>> >   vcdev = to_vc_device(info->vq->vdev);
>> >   ccw_device_get_schid(vcdev->cdev, &schid);
>> >   BUILD_BUG_ON(sizeof(struct subchannel_id) != sizeof(unsigned int));
>> >   info->cookie = kvm_hypercall3(KVM_S390_VIRTIO_CCW_NOTIFY,
>> > *((unsigned int *)&schid),
>> > -   vq->index, info->cookie);
>> > +   data, info->cookie);
>> >   if (info->cookie < 0)
>> >   return false;
>> >   return true;
>> > diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
>> > index 3ff746e3f24a..7c16e622c33d 100644
>> > --- a/drivers/virtio/virtio_mmio.c
>> > +++ b/drivers/virtio/virtio_mmio.c
>> > @@ -285,6 +285,16 @@ static bool vm_notify(struct virtqueue *vq)
>> >   return true;
>> >  }
>> >
>> > +static bool vm_notify_with_data(struct virtqueue *vq)
>> > +{
>> > + struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vq->vdev);
>> > + u32 data = vring_notification_data(vq);
>> > +
>> > + writel(data, vm_dev->base + VIRTIO_MMIO_QUEUE_NOTIFY);
>>
>> Can't you simply use the same method as for ccw, i.e. use one callback
>> function that simply writes one value or the other?
>
> The idea is to eliminate the conditional branch induced by feature bit
> testing from the notification function. Probably, this can be done in
> the same way in ccw.

Hm, how noticeable is that branch? IOW, is it worth making the code less
readable for this?

(In any case, all transports probably should use the same method.)

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v3] virtio: add VIRTIO_F_NOTIFICATION_DATA feature support

2023-03-21 Thread Cornelia Huck
On Tue, Mar 21 2023, Viktor Prutyanov  wrote:

> According to VirtIO spec v1.2, VIRTIO_F_NOTIFICATION_DATA feature
> indicates that the driver passes extra data along with the queue
> notifications.
>
> In a split queue case, the extra data is 16-bit available index. In a
> packed queue case, the extra data is 1-bit wrap counter and 15-bit
> available index.
>
> Add support for this feature for MMIO, PCI and channel I/O transports.
>
> Signed-off-by: Viktor Prutyanov 
> ---
>  v3: support feature in virtio_ccw, remove VM_NOTIFY, use avail_idx_shadow,
> remove byte swap, rename to vring_notification_data
>  v2: reject the feature in virtio_ccw, replace __le32 with u32
>
>  drivers/s390/virtio/virtio_ccw.c   |  4 +++-
>  drivers/virtio/virtio_mmio.c   | 14 +-
>  drivers/virtio/virtio_pci_common.c | 10 ++
>  drivers/virtio/virtio_pci_common.h |  4 
>  drivers/virtio/virtio_pci_legacy.c |  2 +-
>  drivers/virtio/virtio_pci_modern.c |  2 +-
>  drivers/virtio/virtio_ring.c   | 17 +
>  include/linux/virtio_ring.h|  2 ++
>  include/uapi/linux/virtio_config.h |  6 ++
>  9 files changed, 57 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/s390/virtio/virtio_ccw.c 
> b/drivers/s390/virtio/virtio_ccw.c
> index 954fc31b4bc7..c33172c5b8d5 100644
> --- a/drivers/s390/virtio/virtio_ccw.c
> +++ b/drivers/s390/virtio/virtio_ccw.c
> @@ -396,13 +396,15 @@ static bool virtio_ccw_kvm_notify(struct virtqueue *vq)
>   struct virtio_ccw_vq_info *info = vq->priv;
>   struct virtio_ccw_device *vcdev;
>   struct subchannel_id schid;
> + u32 data = __virtio_test_bit(vq->vdev, VIRTIO_F_NOTIFICATION_DATA) ?
> + vring_notification_data(vq) : vq->index;
>  
>   vcdev = to_vc_device(info->vq->vdev);
>   ccw_device_get_schid(vcdev->cdev, &schid);
>   BUILD_BUG_ON(sizeof(struct subchannel_id) != sizeof(unsigned int));
>   info->cookie = kvm_hypercall3(KVM_S390_VIRTIO_CCW_NOTIFY,
> *((unsigned int *)&schid),
> -   vq->index, info->cookie);
> +   data, info->cookie);
>   if (info->cookie < 0)
>   return false;
>   return true;
> diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> index 3ff746e3f24a..7c16e622c33d 100644
> --- a/drivers/virtio/virtio_mmio.c
> +++ b/drivers/virtio/virtio_mmio.c
> @@ -285,6 +285,16 @@ static bool vm_notify(struct virtqueue *vq)
>   return true;
>  }
>  
> +static bool vm_notify_with_data(struct virtqueue *vq)
> +{
> + struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vq->vdev);
> + u32 data = vring_notification_data(vq);
> +
> + writel(data, vm_dev->base + VIRTIO_MMIO_QUEUE_NOTIFY);

Can't you simply use the same method as for ccw, i.e. use one callback
function that simply writes one value or the other?

> +
> + return true;
> +}
> +
>  /* Notify all virtqueues on an interrupt. */
>  static irqreturn_t vm_interrupt(int irq, void *opaque)
>  {
> @@ -368,6 +378,8 @@ static struct virtqueue *vm_setup_vq(struct virtio_device 
> *vdev, unsigned int in
>   unsigned long flags;
>   unsigned int num;
>   int err;
> + bool (*notify)(struct virtqueue *vq) = __virtio_test_bit(vdev,
> + VIRTIO_F_NOTIFICATION_DATA) ? vm_notify_with_data : vm_notify;
>  
>   if (!name)
>   return NULL;
> @@ -397,7 +409,7 @@ static struct virtqueue *vm_setup_vq(struct virtio_device 
> *vdev, unsigned int in
>  
>   /* Create the vring */
>   vq = vring_create_virtqueue(index, num, VIRTIO_MMIO_VRING_ALIGN, vdev,
> -  true, true, ctx, vm_notify, callback, name);
> +  true, true, ctx, notify, callback, name);
>   if (!vq) {
>   err = -ENOMEM;
>   goto error_new_virtqueue;
> diff --git a/drivers/virtio/virtio_pci_common.c 
> b/drivers/virtio/virtio_pci_common.c
> index a6c86f916dbd..e915c22f2384 100644
> --- a/drivers/virtio/virtio_pci_common.c
> +++ b/drivers/virtio/virtio_pci_common.c
> @@ -43,6 +43,16 @@ bool vp_notify(struct virtqueue *vq)
>   /* we write the queue's selector into the notification register to
>* signal the other end */
>   iowrite16(vq->index, (void __iomem *)vq->priv);
> +
> + return true;
> +}
> +
> +bool vp_notify_with_data(struct virtqueue *vq)
> +{
> + u32 data = vring_notification_data(vq);
> +
> + iowrite32(data, (void __iomem *)vq->priv);

Same for pci.

> +
>   return true;
>  }
>  

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v3] virtio: add VIRTIO_F_NOTIFICATION_DATA feature support

2023-03-21 Thread Michael S. Tsirkin
On Tue, Mar 21, 2023 at 04:44:10PM +0300, Viktor Prutyanov wrote:
> According to VirtIO spec v1.2, VIRTIO_F_NOTIFICATION_DATA feature
> indicates that the driver passes extra data along with the queue
> notifications.
> 
> In a split queue case, the extra data is 16-bit available index. In a
> packed queue case, the extra data is 1-bit wrap counter and 15-bit
> available index.
> 
> Add support for this feature for MMIO, PCI and channel I/O transports.
> 
> Signed-off-by: Viktor Prutyanov 

Can you pls report what was tested and how? Thanks!

> ---
>  v3: support feature in virtio_ccw, remove VM_NOTIFY, use avail_idx_shadow,
> remove byte swap, rename to vring_notification_data
>  v2: reject the feature in virtio_ccw, replace __le32 with u32
> 
>  drivers/s390/virtio/virtio_ccw.c   |  4 +++-
>  drivers/virtio/virtio_mmio.c   | 14 +-
>  drivers/virtio/virtio_pci_common.c | 10 ++
>  drivers/virtio/virtio_pci_common.h |  4 
>  drivers/virtio/virtio_pci_legacy.c |  2 +-
>  drivers/virtio/virtio_pci_modern.c |  2 +-
>  drivers/virtio/virtio_ring.c   | 17 +
>  include/linux/virtio_ring.h|  2 ++
>  include/uapi/linux/virtio_config.h |  6 ++
>  9 files changed, 57 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/s390/virtio/virtio_ccw.c 
> b/drivers/s390/virtio/virtio_ccw.c
> index 954fc31b4bc7..c33172c5b8d5 100644
> --- a/drivers/s390/virtio/virtio_ccw.c
> +++ b/drivers/s390/virtio/virtio_ccw.c
> @@ -396,13 +396,15 @@ static bool virtio_ccw_kvm_notify(struct virtqueue *vq)
>   struct virtio_ccw_vq_info *info = vq->priv;
>   struct virtio_ccw_device *vcdev;
>   struct subchannel_id schid;
> + u32 data = __virtio_test_bit(vq->vdev, VIRTIO_F_NOTIFICATION_DATA) ?
> + vring_notification_data(vq) : vq->index;
>  
>   vcdev = to_vc_device(info->vq->vdev);
>   ccw_device_get_schid(vcdev->cdev, &schid);
>   BUILD_BUG_ON(sizeof(struct subchannel_id) != sizeof(unsigned int));
>   info->cookie = kvm_hypercall3(KVM_S390_VIRTIO_CCW_NOTIFY,
> *((unsigned int *)&schid),
> -   vq->index, info->cookie);
> +   data, info->cookie);
>   if (info->cookie < 0)
>   return false;
>   return true;
> diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> index 3ff746e3f24a..7c16e622c33d 100644
> --- a/drivers/virtio/virtio_mmio.c
> +++ b/drivers/virtio/virtio_mmio.c
> @@ -285,6 +285,16 @@ static bool vm_notify(struct virtqueue *vq)
>   return true;
>  }
>  
> +static bool vm_notify_with_data(struct virtqueue *vq)
> +{
> + struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vq->vdev);
> + u32 data = vring_notification_data(vq);
> +
> + writel(data, vm_dev->base + VIRTIO_MMIO_QUEUE_NOTIFY);
> +
> + return true;
> +}
> +
>  /* Notify all virtqueues on an interrupt. */
>  static irqreturn_t vm_interrupt(int irq, void *opaque)
>  {
> @@ -368,6 +378,8 @@ static struct virtqueue *vm_setup_vq(struct virtio_device 
> *vdev, unsigned int in
>   unsigned long flags;
>   unsigned int num;
>   int err;
> + bool (*notify)(struct virtqueue *vq) = __virtio_test_bit(vdev,
> + VIRTIO_F_NOTIFICATION_DATA) ? vm_notify_with_data : vm_notify;
>  
>   if (!name)
>   return NULL;
> @@ -397,7 +409,7 @@ static struct virtqueue *vm_setup_vq(struct virtio_device 
> *vdev, unsigned int in
>  
>   /* Create the vring */
>   vq = vring_create_virtqueue(index, num, VIRTIO_MMIO_VRING_ALIGN, vdev,
> -  true, true, ctx, vm_notify, callback, name);
> +  true, true, ctx, notify, callback, name);
>   if (!vq) {
>   err = -ENOMEM;
>   goto error_new_virtqueue;
> diff --git a/drivers/virtio/virtio_pci_common.c 
> b/drivers/virtio/virtio_pci_common.c
> index a6c86f916dbd..e915c22f2384 100644
> --- a/drivers/virtio/virtio_pci_common.c
> +++ b/drivers/virtio/virtio_pci_common.c
> @@ -43,6 +43,16 @@ bool vp_notify(struct virtqueue *vq)
>   /* we write the queue's selector into the notification register to
>* signal the other end */
>   iowrite16(vq->index, (void __iomem *)vq->priv);
> +
> + return true;
> +}
> +
> +bool vp_notify_with_data(struct virtqueue *vq)
> +{
> + u32 data = vring_notification_data(vq);
> +
> + iowrite32(data, (void __iomem *)vq->priv);
> +
>   return true;
>  }
>  
> diff --git a/drivers/virtio/virtio_pci_common.h 
> b/drivers/virtio/virtio_pci_common.h
> index 23112d84218f..9a7212dcbb32 100644
> --- a/drivers/virtio/virtio_pci_common.h
> +++ b/drivers/virtio/virtio_pci_common.h
> @@ -105,6 +105,7 @@ static struct virtio_pci_device *to_vp_device(struct 
> virtio_device *vdev)
>  void vp_synchronize_vectors(struct virtio_device *vdev);
>  /* the notify function used when creating a virt queue */
>  bool vp_notify(struct 

Re: [PATCH v2] virtio: add VIRTIO_F_NOTIFICATION_DATA feature support

2023-03-21 Thread Cornelia Huck
On Tue, Mar 21 2023, "Michael S. Tsirkin"  wrote:

> On Tue, Mar 21, 2023 at 12:00:42PM +0300, Viktor Prutyanov wrote:
>> On Tue, Mar 21, 2023 at 5:29 AM Jason Wang  wrote:
>> >
>> > On Tue, Mar 21, 2023 at 7:21 AM Viktor Prutyanov  wrote:
>> > >
>> > > According to VirtIO spec v1.2, VIRTIO_F_NOTIFICATION_DATA feature
>> > > indicates that the driver passes extra data along with the queue
>> > > notifications.
>> > >
>> > > In a split queue case, the extra data is 16-bit available index. In a
>> > > packed queue case, the extra data is 1-bit wrap counter and 15-bit
>> > > available index.
>> > >
>> > > Add support for this feature for MMIO and PCI transports. Channel I/O
>> > > transport will not accept this feature.
>> > >
>> > > Signed-off-by: Viktor Prutyanov 
>> > > ---
>> > >
>> > >  v2: reject the feature in virtio_ccw, replace __le32 with u32
>> > >
>> > >  drivers/s390/virtio/virtio_ccw.c   |  4 +---
>> > >  drivers/virtio/virtio_mmio.c   | 15 ++-
>> > >  drivers/virtio/virtio_pci_common.c | 10 ++
>> > >  drivers/virtio/virtio_pci_common.h |  4 
>> > >  drivers/virtio/virtio_pci_legacy.c |  2 +-
>> > >  drivers/virtio/virtio_pci_modern.c |  2 +-
>> > >  drivers/virtio/virtio_ring.c   | 17 +
>> > >  include/linux/virtio_ring.h|  2 ++
>> > >  include/uapi/linux/virtio_config.h |  6 ++
>> > >  9 files changed, 56 insertions(+), 6 deletions(-)
>> > >
>> > > diff --git a/drivers/s390/virtio/virtio_ccw.c 
>> > > b/drivers/s390/virtio/virtio_ccw.c
>> > > index a10dbe632ef9..d72a59415527 100644
>> > > --- a/drivers/s390/virtio/virtio_ccw.c
>> > > +++ b/drivers/s390/virtio/virtio_ccw.c
>> > > @@ -789,9 +789,7 @@ static u64 virtio_ccw_get_features(struct 
>> > > virtio_device *vdev)
>> > >
>> > >  static void ccw_transport_features(struct virtio_device *vdev)
>> > >  {
>> > > -   /*
>> > > -* Currently nothing to do here.
>> > > -*/
>> > > +   __virtio_clear_bit(vdev, VIRTIO_F_NOTIFICATION_DATA);
>> >
>> > Is there any restriction that prevents us from implementing
>> > VIRTIO_F_NOTIFICATION_DATA? (Spec seems doesn't limit us from this)
>> 
>> Most likely, nothing.
>
> So pls code it up. It's the same format.

FWIW, the notification data needs to go via the third parameter of
kvm_hypercall3() in virtio_ccw_kvm_notify().

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [RFC net-next 0/8] virtio_net: refactor xdp codes

2023-03-21 Thread Michael S. Tsirkin
On Tue, Mar 21, 2023 at 08:20:20PM +0800, Xuan Zhuo wrote:
> On Tue, 21 Mar 2023 19:58:49 +0800, Xuan Zhuo  
> wrote:
> > On Wed, 15 Mar 2023 12:10:34 +0800, Xuan Zhuo  
> > wrote:
> > > Due to historical reasons, the implementation of XDP in virtio-net is
> > > relatively chaotic. For example, the processing of XDP actions has two
> > > copies of similar code, such as the page and xdp_page handling.
> > >
> > > The purpose of this patch set is to refactor this code and reduce the
> > > difficulty of subsequent maintenance, so that subsequent developers will
> > > not introduce new bugs because of complex logical relationships.
> > >
> > > In addition, the AF_XDP support that I want to submit later will also
> > > need to reuse the XDP logic, such as the processing of actions; I don't
> > > want to introduce new, similar code. This way, I can reuse this code in
> > > the future.
> > >
> > > These patches were developed on top of another patch set[1]. I may have
> > > to wait for that to be merged, so this patch set is an RFC.
> >
> >
> > Hi, Jason:
> >
> > Now the patch set[1] is in net-next, so this patch set can be merged
> > into net-next.
> >
> > Please review.
> 
> 
> I do not know why this patch set missed Jason. I sent it normally. ^_^
> 
> Thanks.


repost as non-rfc.

> 
> 
> >
> > Thanks.
> >
> >
> > >
> > > Please review.
> > >
> > > Thanks.
> > >
> > > [1]. 
> > > https://lore.kernel.org/netdev/20230315015223.89137-1-xuanz...@linux.alibaba.com/
> > >
> > >
> > > Xuan Zhuo (8):
> > >   virtio_net: mergeable xdp: put old page immediately
> > >   virtio_net: mergeable xdp: introduce mergeable_xdp_prepare
> > >   virtio_net: introduce virtnet_xdp_handler() to seprate the logic of
> > > run xdp
> > >   virtio_net: separate the logic of freeing xdp shinfo
> > >   virtio_net: separate the logic of freeing the rest mergeable buf
> > >   virtio_net: auto release xdp shinfo
> > >   virtio_net: introduce receive_mergeable_xdp()
> > >   virtio_net: introduce receive_small_xdp()
> > >
> > >  drivers/net/virtio_net.c | 615 +++
> > >  1 file changed, 357 insertions(+), 258 deletions(-)
> > >
> > > --
> > > 2.32.0.3.g01195cf9f
> > >
> > > ___
> > > Virtualization mailing list
> > > Virtualization@lists.linux-foundation.org
> > > https://lists.linuxfoundation.org/mailman/listinfo/virtualization

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [RFC net-next 0/8] virtio_net: refactor xdp codes

2023-03-21 Thread Xuan Zhuo
On Tue, 21 Mar 2023 19:58:49 +0800, Xuan Zhuo  
wrote:
> On Wed, 15 Mar 2023 12:10:34 +0800, Xuan Zhuo  
> wrote:
> > Due to historical reasons, the implementation of XDP in virtio-net is
> > relatively chaotic. For example, the processing of XDP actions has two
> > copies of similar code, such as the page and xdp_page handling.
> >
> > The purpose of this patch set is to refactor this code and reduce the
> > difficulty of subsequent maintenance, so that subsequent developers will
> > not introduce new bugs because of complex logical relationships.
> >
> > In addition, the AF_XDP support that I want to submit later will also
> > need to reuse the XDP logic, such as the processing of actions; I don't
> > want to introduce new, similar code. This way, I can reuse this code in
> > the future.
> >
> > These patches were developed on top of another patch set[1]. I may have
> > to wait for that to be merged, so this patch set is an RFC.
>
>
> Hi, Jason:
>
> Now the patch set[1] is in net-next, so this patch set can be merged
> into net-next.
>
> Please review.


I do not know why this patch set missed Jason. I sent it normally. ^_^

Thanks.



>
> Thanks.
>
>
> >
> > Please review.
> >
> > Thanks.
> >
> > [1]. 
> > https://lore.kernel.org/netdev/20230315015223.89137-1-xuanz...@linux.alibaba.com/
> >
> >
> > Xuan Zhuo (8):
> >   virtio_net: mergeable xdp: put old page immediately
> >   virtio_net: mergeable xdp: introduce mergeable_xdp_prepare
> >   virtio_net: introduce virtnet_xdp_handler() to seprate the logic of
> > run xdp
> >   virtio_net: separate the logic of freeing xdp shinfo
> >   virtio_net: separate the logic of freeing the rest mergeable buf
> >   virtio_net: auto release xdp shinfo
> >   virtio_net: introduce receive_mergeable_xdp()
> >   virtio_net: introduce receive_small_xdp()
> >
> >  drivers/net/virtio_net.c | 615 +++
> >  1 file changed, 357 insertions(+), 258 deletions(-)
> >
> > --
> > 2.32.0.3.g01195cf9f
> >
> > ___
> > Virtualization mailing list
> > Virtualization@lists.linux-foundation.org
> > https://lists.linuxfoundation.org/mailman/listinfo/virtualization
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v2] virtio: add VIRTIO_F_NOTIFICATION_DATA feature support

2023-03-21 Thread Michael S. Tsirkin
On Tue, Mar 21, 2023 at 02:49:13PM +0300, Viktor Prutyanov wrote:
> On Tue, Mar 21, 2023 at 12:23 PM Michael S. Tsirkin  wrote:
> >
> > On Tue, Mar 21, 2023 at 02:21:15AM +0300, Viktor Prutyanov wrote:
> > > According to the VirtIO spec v1.2, the VIRTIO_F_NOTIFICATION_DATA
> > > feature indicates that the driver passes extra data along with the
> > > queue notifications.
> > >
> > > In the split queue case, the extra data is a 16-bit available index.
> > > In the packed queue case, the extra data is a 1-bit wrap counter and
> > > a 15-bit available index.
> > >
> > > Add support for this feature for MMIO and PCI transports. Channel I/O
> > > transport will not accept this feature.
> > >
> > > Signed-off-by: Viktor Prutyanov 
> > > ---
> > >
> > >  v2: reject the feature in virtio_ccw, replace __le32 with u32
> > >
> > >  drivers/s390/virtio/virtio_ccw.c   |  4 +---
> > >  drivers/virtio/virtio_mmio.c   | 15 ++-
> > >  drivers/virtio/virtio_pci_common.c | 10 ++
> > >  drivers/virtio/virtio_pci_common.h |  4 
> > >  drivers/virtio/virtio_pci_legacy.c |  2 +-
> > >  drivers/virtio/virtio_pci_modern.c |  2 +-
> > >  drivers/virtio/virtio_ring.c   | 17 +
> > >  include/linux/virtio_ring.h|  2 ++
> > >  include/uapi/linux/virtio_config.h |  6 ++
> > >  9 files changed, 56 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/drivers/s390/virtio/virtio_ccw.c 
> > > b/drivers/s390/virtio/virtio_ccw.c
> > > index a10dbe632ef9..d72a59415527 100644
> > > --- a/drivers/s390/virtio/virtio_ccw.c
> > > +++ b/drivers/s390/virtio/virtio_ccw.c
> > > @@ -789,9 +789,7 @@ static u64 virtio_ccw_get_features(struct 
> > > virtio_device *vdev)
> > >
> > >  static void ccw_transport_features(struct virtio_device *vdev)
> > >  {
> > > - /*
> > > -  * Currently nothing to do here.
> > > -  */
> > > + __virtio_clear_bit(vdev, VIRTIO_F_NOTIFICATION_DATA);
> > >  }
> > >
> > >  static int virtio_ccw_finalize_features(struct virtio_device *vdev)
> > > diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> > > index 3ff746e3f24a..0e13da17fe0a 100644
> > > --- a/drivers/virtio/virtio_mmio.c
> > > +++ b/drivers/virtio/virtio_mmio.c
> > > @@ -285,6 +285,19 @@ static bool vm_notify(struct virtqueue *vq)
> > >   return true;
> > >  }
> > >
> > > +static bool vm_notify_with_data(struct virtqueue *vq)
> > > +{
> > > + struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vq->vdev);
> > > + u32 data = vring_fill_notification_data(vq);
> > > +
> > > + writel(data, vm_dev->base + VIRTIO_MMIO_QUEUE_NOTIFY);
> > > +
> > > + return true;
> > > +}
> > > +
> > > +#define VM_NOTIFY(vdev) (__virtio_test_bit((vdev), 
> > > VIRTIO_F_NOTIFICATION_DATA) \
> > > + ? vm_notify_with_data : vm_notify)
> > > +
> > >  /* Notify all virtqueues on an interrupt. */
> > >  static irqreturn_t vm_interrupt(int irq, void *opaque)
> > >  {
> > > @@ -397,7 +410,7 @@ static struct virtqueue *vm_setup_vq(struct 
> > > virtio_device *vdev, unsigned int in
> > >
> > >   /* Create the vring */
> > >   vq = vring_create_virtqueue(index, num, VIRTIO_MMIO_VRING_ALIGN, 
> > > vdev,
> > > -  true, true, ctx, vm_notify, callback, 
> > > name);
> > > + true, true, ctx, VM_NOTIFY(vdev), callback, name);
> >
> > I don't see why is this macro useful.
> >
> >
> >
> > >   if (!vq) {
> > >   err = -ENOMEM;
> > >   goto error_new_virtqueue;
> > > diff --git a/drivers/virtio/virtio_pci_common.c 
> > > b/drivers/virtio/virtio_pci_common.c
> > > index a6c86f916dbd..535263abc2bd 100644
> > > --- a/drivers/virtio/virtio_pci_common.c
> > > +++ b/drivers/virtio/virtio_pci_common.c
> > > @@ -43,6 +43,16 @@ bool vp_notify(struct virtqueue *vq)
> > >   /* we write the queue's selector into the notification register to
> > >* signal the other end */
> > >   iowrite16(vq->index, (void __iomem *)vq->priv);
> > > +
> > > + return true;
> > > +}
> > > +
> > > +bool vp_notify_with_data(struct virtqueue *vq)
> > > +{
> > > + u32 data = vring_fill_notification_data(vq);
> > > +
> > > + iowrite32(data, (void __iomem *)vq->priv);
> > > +
> > >   return true;
> > >  }
> > >
> > > diff --git a/drivers/virtio/virtio_pci_common.h 
> > > b/drivers/virtio/virtio_pci_common.h
> > > index 23112d84218f..9a7212dcbb32 100644
> > > --- a/drivers/virtio/virtio_pci_common.h
> > > +++ b/drivers/virtio/virtio_pci_common.h
> > > @@ -105,6 +105,7 @@ static struct virtio_pci_device *to_vp_device(struct 
> > > virtio_device *vdev)
> > >  void vp_synchronize_vectors(struct virtio_device *vdev);
> > >  /* the notify function used when creating a virt queue */
> > >  bool vp_notify(struct virtqueue *vq);
> > > +bool vp_notify_with_data(struct virtqueue *vq);
> > >  /* the config->del_vqs() implementation */
> > >  void vp_del_vqs(struct virtio_device *vdev);
> > >  /* the config->find_vqs() implementation */
> > > 
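To make the notification payload concrete: per VirtIO 1.2 (sec. 2.9), the
32-bit value carries the virtqueue number in the low 16 bits and the ring
position in the high 16 bits. The sketch below is illustrative only: the
helper name and signature are assumptions, standing in for what the patch's
vring_fill_notification_data() computes from the virtqueue state.

#include <linux/types.h>

/* Sketch only: assemble the VIRTIO_F_NOTIFICATION_DATA payload.
 * Split ring:  bits 15:0 = vq index, bits 31:16 = 16-bit avail index.
 * Packed ring: bits 15:0 = vq index, bits 30:16 = next avail offset,
 *              bit 31   = wrap counter.
 */
static u32 example_notification_data(u16 vq_index, u16 next_avail,
                                     bool packed, bool wrap)
{
        if (packed)
                return vq_index | ((u32)(next_avail & 0x7fff) << 16) |
                       ((u32)wrap << 31);

        return vq_index | ((u32)next_avail << 16);
}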

Re: [RFC net-next 0/8] virtio_net: refactor xdp codes

2023-03-21 Thread Xuan Zhuo
On Wed, 15 Mar 2023 12:10:34 +0800, Xuan Zhuo  
wrote:
> For historical reasons, the implementation of XDP in virtio-net is
> relatively chaotic. For example, the processing of XDP actions has two
> copies of similar code, such as the page and xdp_page handling.
>
> The purpose of this patch set is to refactor this code and reduce the
> difficulty of subsequent maintenance, so that subsequent developers will
> not introduce new bugs because of complex logical relationships.
>
> In addition, the AF_XDP support that I want to submit later will also
> need to reuse the XDP logic, such as the processing of actions; I don't
> want to introduce new, similar code. This way, I can reuse this code in
> the future.
>
> These patches are developed on top of another patch set [1]. I may have
> to wait for that one to be merged, so this patch set is an RFC.


Hi Jason,

The patch set [1] has now been merged into net-next, so this patch set
can be merged into net-next.

Please review.

Thanks.
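
Since the series centers on unifying the duplicated XDP action handling, a
rough sketch of what such a shared handler might look like follows. This is
a hypothetical illustration: the function body, the return-code defines, and
the caller contract are assumptions, not code taken from the patches.

#include <linux/filter.h>
#include <trace/events/xdp.h>

/* Illustrative return convention (assumption, not from the series). */
#define XDP_HANDLE_PASS      0  /* hand the buffer to the network stack */
#define XDP_HANDLE_CONSUMED  1  /* buffer taken by XDP_TX or redirect */
#define XDP_HANDLE_DROP     -1  /* caller must free the buffer */

static int virtnet_xdp_handler(struct bpf_prog *xdp_prog, struct xdp_buff *xdp,
                               struct net_device *dev)
{
        u32 act = bpf_prog_run_xdp(xdp_prog, xdp);

        switch (act) {
        case XDP_PASS:
                return XDP_HANDLE_PASS;
        case XDP_TX:
                /* A real handler would build an xdp_frame and transmit it
                 * back out of this device; elided in this sketch. */
                return XDP_HANDLE_CONSUMED;
        case XDP_REDIRECT:
                if (xdp_do_redirect(dev, xdp, xdp_prog))
                        break;          /* redirect failed: drop */
                return XDP_HANDLE_CONSUMED;
        default:
                bpf_warn_invalid_xdp_action(dev, xdp_prog, act);
                fallthrough;
        case XDP_ABORTED:
                trace_xdp_exception(dev, xdp_prog, act);
                fallthrough;
        case XDP_DROP:
                break;
        }

        return XDP_HANDLE_DROP;
}

With one handler, both the small-buffer and mergeable-buffer receive paths
only have to interpret a single return value instead of duplicating the
whole switch.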


>
> Please review.
>
> Thanks.
>
> [1]. 
> https://lore.kernel.org/netdev/20230315015223.89137-1-xuanz...@linux.alibaba.com/
>
>
> Xuan Zhuo (8):
>   virtio_net: mergeable xdp: put old page immediately
>   virtio_net: mergeable xdp: introduce mergeable_xdp_prepare
>   virtio_net: introduce virtnet_xdp_handler() to separate the logic of
> run xdp
>   virtio_net: separate the logic of freeing xdp shinfo
>   virtio_net: separate the logic of freeing the rest mergeable buf
>   virtio_net: auto release xdp shinfo
>   virtio_net: introduce receive_mergeable_xdp()
>   virtio_net: introduce receive_small_xdp()
>
>  drivers/net/virtio_net.c | 615 +++
>  1 file changed, 357 insertions(+), 258 deletions(-)
>
> --
> 2.32.0.3.g01195cf9f
>


[PATCH vhost v3 11/11] virtio_ring: introduce virtqueue_reset()

2023-03-21 Thread Xuan Zhuo
Introduce virtqueue_reset() to release all buffers inside the vq.

Signed-off-by: Xuan Zhuo 
Acked-by: Jason Wang 
---
 drivers/virtio/virtio_ring.c | 33 +
 include/linux/virtio.h   |  2 ++
 2 files changed, 35 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index cbbebe2b2203..f1c68fadc75b 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2789,6 +2789,39 @@ int virtqueue_resize(struct virtqueue *_vq, u32 num,
 }
 EXPORT_SYMBOL_GPL(virtqueue_resize);
 
+/**
+ * virtqueue_reset - detach and recycle all unused buffers
+ * @_vq: the struct virtqueue we're talking about.
+ * @recycle: callback to recycle unused buffers
+ *
+ * Caller must ensure we don't call this with other virtqueue operations
+ * at the same time (except where noted).
+ *
+ * Returns zero or a negative error.
+ * 0: success.
+ * -EBUSY: Failed to sync with device, vq may not work properly
+ * -ENOENT: Transport or device not supported
+ * -EPERM: Operation not permitted
+ */
+int virtqueue_reset(struct virtqueue *_vq,
+   void (*recycle)(struct virtqueue *vq, void *buf))
+{
+   struct vring_virtqueue *vq = to_vvq(_vq);
+   int err;
+
+   err = virtqueue_disable_and_recycle(_vq, recycle);
+   if (err)
+   return err;
+
+   if (vq->packed_ring)
+   virtqueue_reinit_packed(vq);
+   else
+   virtqueue_reinit_split(vq);
+
+   return virtqueue_enable_after_reset(_vq);
+}
+EXPORT_SYMBOL_GPL(virtqueue_reset);
+
 /* Only available for split ring */
 struct virtqueue *vring_new_virtqueue(unsigned int index,
  unsigned int num,
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 1fa50191cf0a..22bbd06ef8c8 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -97,6 +97,8 @@ dma_addr_t virtqueue_get_used_addr(struct virtqueue *vq);
 
 int virtqueue_resize(struct virtqueue *vq, u32 num,
 void (*recycle)(struct virtqueue *vq, void *buf));
+int virtqueue_reset(struct virtqueue *vq,
+   void (*recycle)(struct virtqueue *vq, void *buf));
 
 /**
  * struct virtio_device - representation of a device using virtio
-- 
2.32.0.3.g01195cf9f
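
For context, here is a minimal caller-side sketch of the new API, assuming a
driver that wants to flush a queue; my_free_buf() and my_refill() are
hypothetical driver helpers, not part of this patch.

#include <linux/slab.h>
#include <linux/virtio.h>

static int my_refill(struct virtqueue *vq);     /* hypothetical helper */

/* Sketch only: recycle every unused buffer, then repost fresh ones. */
static void my_free_buf(struct virtqueue *vq, void *buf)
{
        kfree(buf);             /* buffer the device never consumed */
}

static int my_flush_queue(struct virtqueue *vq)
{
        int err;

        err = virtqueue_reset(vq, my_free_buf);
        if (err)                /* -EBUSY, -ENOENT or -EPERM, see above */
                return err;

        return my_refill(vq);   /* queue is empty and re-enabled here */
}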



[PATCH vhost v3 09/11] virtio_ring: correct the expression of the description of virtqueue_resize()

2023-03-21 Thread Xuan Zhuo
Change "useless" to the more accurate "unused".

Signed-off-by: Xuan Zhuo 
Acked-by: Jason Wang 
---
 drivers/virtio/virtio_ring.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index d214a8a2fd4b..6f093528afd0 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2704,7 +2704,7 @@ EXPORT_SYMBOL_GPL(vring_create_virtqueue_dma);
  * virtqueue_resize - resize the vring of vq
  * @_vq: the struct virtqueue we're talking about.
  * @num: new ring num
- * @recycle: callback for recycle the useless buffer
+ * @recycle: callback to recycle unused buffers
  *
  * When it is really necessary to create a new vring, it will set the current 
vq
  * into the reset state. Then call the passed callback to recycle the buffer
-- 
2.32.0.3.g01195cf9f



[PATCH vhost v3 10/11] virtio_ring: separate the logic of reset/enable from virtqueue_resize

2023-03-21 Thread Xuan Zhuo
The subsequent reset function will reuse this logic.

Signed-off-by: Xuan Zhuo 
Acked-by: Jason Wang 
---
 drivers/virtio/virtio_ring.c | 58 
 1 file changed, 39 insertions(+), 19 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 6f093528afd0..cbbebe2b2203 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2158,6 +2158,43 @@ static int virtqueue_resize_packed(struct virtqueue 
*_vq, u32 num)
return -ENOMEM;
 }
 
+static int virtqueue_disable_and_recycle(struct virtqueue *_vq,
+void (*recycle)(struct virtqueue *vq, 
void *buf))
+{
+   struct vring_virtqueue *vq = to_vvq(_vq);
+   struct virtio_device *vdev = vq->vq.vdev;
+   void *buf;
+   int err;
+
+   if (!vq->we_own_ring)
+   return -EPERM;
+
+   if (!vdev->config->disable_vq_and_reset)
+   return -ENOENT;
+
+   if (!vdev->config->enable_vq_after_reset)
+   return -ENOENT;
+
+   err = vdev->config->disable_vq_and_reset(_vq);
+   if (err)
+   return err;
+
+   while ((buf = virtqueue_detach_unused_buf(_vq)) != NULL)
+   recycle(_vq, buf);
+
+   return 0;
+}
+
+static int virtqueue_enable_after_reset(struct virtqueue *_vq)
+{
+   struct vring_virtqueue *vq = to_vvq(_vq);
+   struct virtio_device *vdev = vq->vq.vdev;
+
+   if (vdev->config->enable_vq_after_reset(_vq))
+   return -EBUSY;
+
+   return 0;
+}
 
 /*
  * Generic functions and exported symbols.
@@ -2728,13 +2765,8 @@ int virtqueue_resize(struct virtqueue *_vq, u32 num,
 void (*recycle)(struct virtqueue *vq, void *buf))
 {
struct vring_virtqueue *vq = to_vvq(_vq);
-   struct virtio_device *vdev = vq->vq.vdev;
-   void *buf;
int err;
 
-   if (!vq->we_own_ring)
-   return -EPERM;
-
if (num > vq->vq.num_max)
return -E2BIG;
 
@@ -2744,28 +2776,16 @@ int virtqueue_resize(struct virtqueue *_vq, u32 num,
if ((vq->packed_ring ? vq->packed.vring.num : vq->split.vring.num) == 
num)
return 0;
 
-   if (!vdev->config->disable_vq_and_reset)
-   return -ENOENT;
-
-   if (!vdev->config->enable_vq_after_reset)
-   return -ENOENT;
-
-   err = vdev->config->disable_vq_and_reset(_vq);
+   err = virtqueue_disable_and_recycle(_vq, recycle);
if (err)
return err;
 
-   while ((buf = virtqueue_detach_unused_buf(_vq)) != NULL)
-   recycle(_vq, buf);
-
if (vq->packed_ring)
err = virtqueue_resize_packed(_vq, num);
else
err = virtqueue_resize_split(_vq, num);
 
-   if (vdev->config->enable_vq_after_reset(_vq))
-   return -EBUSY;
-
-   return err;
+   return virtqueue_enable_after_reset(_vq);
 }
 EXPORT_SYMBOL_GPL(virtqueue_resize);
 
-- 
2.32.0.3.g01195cf9f
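
The net effect of the split is that every ring-mutating operation now follows
the same three-step shape. A compact sketch of the call sequence (illustrative
only; it compresses the code above rather than adding new kernel code):

        /* Sketch: park the vq and drain unused buffers ... */
        err = virtqueue_disable_and_recycle(_vq, recycle);
        if (err)
                return err;

        /* ... mutate the ring: resize here, reinit in the reset
         * function that the next patch adds ... */

        /* ... then bring the vq back up. */
        return virtqueue_enable_after_reset(_vq);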



[PATCH vhost v3 05/11] virtio_ring: packed: support premapped

2023-03-21 Thread Xuan Zhuo
virtio core only supports virtual addresses; DMA mapping is completed
inside virtio core.

In some scenarios (such as AF_XDP), the memory is allocated and the DMA
mapping is completed in advance, so it is necessary to support passing
the DMA address to virtio core.

Drivers can use sg->dma_address to pass the mapped DMA address to virtio
core. If one sg->dma_address is used, then all sgs must use
sg->dma_address; otherwise all of them must be NULL when passing them to
the virtio APIs.

Signed-off-by: Xuan Zhuo 
---
 drivers/virtio/virtio_ring.c | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index be2ff96964c3..95209bc61072 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -78,6 +78,7 @@ struct vring_desc_state_packed {
struct vring_packed_desc *indir_desc; /* Indirect descriptor, if any. */
u16 num;/* Descriptor list length. */
u16 last;   /* The last desc state in a list. */
+   bool map_inter; /* Do dma map internally. */
 };
 
 struct vring_desc_extra {
@@ -1259,7 +1260,8 @@ static inline u16 packed_last_used(u16 last_used_idx)
 }
 
 static void vring_unmap_extra_packed(const struct vring_virtqueue *vq,
-struct vring_desc_extra *extra)
+struct vring_desc_extra *extra,
+bool map_inter)
 {
u16 flags;
 
@@ -1274,6 +1276,9 @@ static void vring_unmap_extra_packed(const struct 
vring_virtqueue *vq,
 (flags & VRING_DESC_F_WRITE) ?
 DMA_FROM_DEVICE : DMA_TO_DEVICE);
} else {
+   if (!map_inter)
+   return;
+
dma_unmap_page(vring_dma_dev(vq),
   extra->addr, extra->len,
   (flags & VRING_DESC_F_WRITE) ?
@@ -1439,6 +1444,7 @@ static inline int virtqueue_add_packed(struct virtqueue 
*_vq,
unsigned int i, n, c, descs_used;
__le16 head_flags, flags;
u16 head, id, prev, curr;
+   bool map_inter;
int err;
 
START_USE(vq);
@@ -1484,7 +1490,8 @@ static inline int virtqueue_add_packed(struct virtqueue 
*_vq,
id = vq->free_head;
BUG_ON(id == vq->packed.vring.num);
 
-   if (virtqueue_map_sgs(vq, sgs, total_sg, out_sgs, in_sgs))
+   map_inter = !sgs[0]->dma_address;
+   if (map_inter && virtqueue_map_sgs(vq, sgs, total_sg, out_sgs, in_sgs))
return -ENOMEM;
 
curr = id;
@@ -1536,6 +1543,7 @@ static inline int virtqueue_add_packed(struct virtqueue 
*_vq,
vq->packed.desc_state[id].data = data;
vq->packed.desc_state[id].indir_desc = ctx;
vq->packed.desc_state[id].last = prev;
+   vq->packed.desc_state[id].map_inter = map_inter;
 
/*
 * A driver MUST NOT make the first descriptor in the list
@@ -1621,7 +1629,8 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
curr = id;
for (i = 0; i < state->num; i++) {
vring_unmap_extra_packed(vq,
-                                &vq->packed.desc_extra[curr]);
+                                &vq->packed.desc_extra[curr],
+                                state->map_inter);
curr = vq->packed.desc_extra[curr].next;
}
}
-- 
2.32.0.3.g01195cf9f
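
To illustrate the all-or-nothing rule above, here is a hedged driver-side
sketch of queueing a premapped buffer; the DMA handles are assumed to come
from the driver's own mapping (e.g. an AF_XDP umem), and all names are
illustrative, not taken from the series.

#include <linux/scatterlist.h>
#include <linux/slab.h>
#include <linux/virtio.h>

/* Sketch only: every sg carries a driver-supplied dma_address, so
 * virtio core skips its own map/unmap for this request.
 */
static int example_add_premapped(struct virtqueue *vq, void *cookie,
                                 dma_addr_t hdr_dma, unsigned int hdr_len,
                                 dma_addr_t data_dma, unsigned int data_len)
{
        struct scatterlist sg[2];

        sg_init_table(sg, 2);
        sg[0].dma_address = hdr_dma;    /* premapped by the driver */
        sg[0].length = hdr_len;
        sg[1].dma_address = data_dma;   /* all sgs premapped, none NULL */
        sg[1].length = data_len;

        return virtqueue_add_outbuf(vq, sg, 2, cookie, GFP_ATOMIC);
}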



[PATCH vhost v3 06/11] virtio_ring: packed-indirect: support premapped

2023-03-21 Thread Xuan Zhuo
virtio core only supports virtual addresses; DMA mapping is completed
inside virtio core.

In some scenarios (such as AF_XDP), the memory is allocated and the DMA
mapping is completed in advance, so it is necessary to support passing
the DMA address to virtio core.

Drivers can use sg->dma_address to pass the mapped DMA address to virtio
core. If one sg->dma_address is used, then all sgs must use
sg->dma_address; otherwise all of them must be NULL when passing them to
the virtio APIs.

Signed-off-by: Xuan Zhuo 
---
 drivers/virtio/virtio_ring.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 95209bc61072..e929e61dd145 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1333,6 +1333,7 @@ static int virtqueue_add_indirect_packed(struct 
vring_virtqueue *vq,
unsigned int i, n;
u16 head, id;
dma_addr_t addr;
+   bool map_inter;
 
head = vq->packed.next_avail_idx;
desc = alloc_indirect_packed(total_sg, gfp);
@@ -1350,7 +1351,8 @@ static int virtqueue_add_indirect_packed(struct 
vring_virtqueue *vq,
id = vq->free_head;
BUG_ON(id == vq->packed.vring.num);
 
-   if (virtqueue_map_sgs(vq, sgs, total_sg, out_sgs, in_sgs))
+   map_inter = !sgs[0]->dma_address;
+   if (map_inter && virtqueue_map_sgs(vq, sgs, total_sg, out_sgs, in_sgs))
return -ENOMEM;
 
for (n = 0; n < out_sgs + in_sgs; n++) {
@@ -1412,6 +1414,8 @@ static int virtqueue_add_indirect_packed(struct 
vring_virtqueue *vq,
vq->packed.desc_state[id].data = data;
vq->packed.desc_state[id].indir_desc = desc;
vq->packed.desc_state[id].last = id;
+   vq->packed.desc_state[id].map_inter = map_inter;
+
 
vq->num_added += 1;
 
@@ -1421,7 +1425,8 @@ static int virtqueue_add_indirect_packed(struct 
vring_virtqueue *vq,
return 0;
 
 unmap_release:
-   virtqueue_unmap_sgs(vq, sgs, total_sg, out_sgs, in_sgs);
+   if (map_inter)
+   virtqueue_unmap_sgs(vq, sgs, total_sg, out_sgs, in_sgs);
 
kfree(desc);
 
@@ -1643,7 +1648,7 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
if (!desc)
return;
 
-   if (vq->use_dma_api) {
+   if (vq->use_dma_api && state->map_inter) {
len = vq->packed.desc_extra[id].len;
for (i = 0; i < len / sizeof(struct vring_packed_desc);
i++)
-- 
2.32.0.3.g01195cf9f



[PATCH vhost v3 04/11] virtio_ring: split: support premapped

2023-03-21 Thread Xuan Zhuo
virtio core only supports virtual addresses; DMA mapping is completed
inside virtio core.

In some scenarios (such as AF_XDP), the memory is allocated and the DMA
mapping is completed in advance, so it is necessary to support passing
the DMA address to virtio core.

Drivers can use sg->dma_address to pass the mapped DMA address to virtio
core. If one sg->dma_address is used, then all sgs must use
sg->dma_address; otherwise all of them must be NULL when passing them to
the virtio APIs.

Signed-off-by: Xuan Zhuo 
---
 drivers/virtio/virtio_ring.c | 27 +++
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index c8ed4aef9462..be2ff96964c3 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -70,6 +70,7 @@
 struct vring_desc_state_split {
void *data; /* Data for callback. */
struct vring_desc *indir_desc;  /* Indirect descriptor, if any. */
+   bool map_inter; /* Do dma map internally. */
 };
 
 struct vring_desc_state_packed {
@@ -448,7 +449,7 @@ static void vring_unmap_one_split_indirect(const struct 
vring_virtqueue *vq,
 }
 
 static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
- unsigned int i)
+ unsigned int i, bool map_inter)
 {
struct vring_desc_extra *extra = vq->split.desc_extra;
u16 flags;
@@ -465,6 +466,9 @@ static unsigned int vring_unmap_one_split(const struct 
vring_virtqueue *vq,
 (flags & VRING_DESC_F_WRITE) ?
 DMA_FROM_DEVICE : DMA_TO_DEVICE);
} else {
+   if (!map_inter)
+   goto out;
+
dma_unmap_page(vring_dma_dev(vq),
   extra[i].addr,
   extra[i].len,
@@ -615,7 +619,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
struct scatterlist *sg;
struct vring_desc *desc;
unsigned int i, n, avail, descs_used, prev;
-   bool indirect;
+   bool indirect, map_inter;
int head;
 
START_USE(vq);
@@ -668,7 +672,8 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
return -ENOSPC;
}
 
-   if (virtqueue_map_sgs(vq, sgs, total_sg, out_sgs, in_sgs))
+   map_inter = !sgs[0]->dma_address;
+   if (map_inter && virtqueue_map_sgs(vq, sgs, total_sg, out_sgs, in_sgs))
return -ENOMEM;
 
for (n = 0; n < out_sgs; n++) {
@@ -734,6 +739,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
vq->split.desc_state[head].indir_desc = desc;
else
vq->split.desc_state[head].indir_desc = ctx;
+   vq->split.desc_state[head].map_inter = map_inter;
 
/* Put entry in available array (but don't update avail->idx until they
 * do sync). */
@@ -759,7 +765,8 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
return 0;
 
 unmap_release:
-   virtqueue_unmap_sgs(vq, sgs, total_sg, out_sgs, in_sgs);
+   if (map_inter)
+   virtqueue_unmap_sgs(vq, sgs, total_sg, out_sgs, in_sgs);
 
if (indirect)
kfree(desc);
@@ -804,20 +811,22 @@ static void detach_buf_split(struct vring_virtqueue *vq, 
unsigned int head,
 {
unsigned int i, j;
__virtio16 nextflag = cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT);
+   bool map_inter;
 
/* Clear data ptr. */
vq->split.desc_state[head].data = NULL;
+   map_inter = vq->split.desc_state[head].map_inter;
 
/* Put back on free list: unmap first-level descriptors and find end */
i = head;
 
while (vq->split.vring.desc[i].flags & nextflag) {
-   vring_unmap_one_split(vq, i);
+   vring_unmap_one_split(vq, i, map_inter);
i = vq->split.desc_extra[i].next;
vq->vq.num_free++;
}
 
-   vring_unmap_one_split(vq, i);
+   vring_unmap_one_split(vq, i, map_inter);
vq->split.desc_extra[i].next = vq->free_head;
vq->free_head = head;
 
@@ -839,8 +848,10 @@ static void detach_buf_split(struct vring_virtqueue *vq, 
unsigned int head,
VRING_DESC_F_INDIRECT));
BUG_ON(len == 0 || len % sizeof(struct vring_desc));
 
-   for (j = 0; j < len / sizeof(struct vring_desc); j++)
-   vring_unmap_one_split_indirect(vq, &indir_desc[j]);
+   if (map_inter) {
+   for (j = 0; j < len / sizeof(struct vring_desc); j++)
+   vring_unmap_one_split_indirect(vq, &indir_desc[j]);
+   }
 
kfree(indir_desc);
vq->split.desc_state[head].indir_desc = NULL;
-- 
2.32.0.3.g01195cf9f


[PATCH vhost v3 08/11] virtio_ring: introduce virtqueue_dma_dev()

2023-03-21 Thread Xuan Zhuo
Add virtqueue_dma_dev() to get the DMA device for virtio, so that the
caller can do DMA operations in advance. The purpose is to keep memory
mapped across multiple add/get buf operations.

Signed-off-by: Xuan Zhuo 
Acked-by: Jason Wang 
---
 drivers/virtio/virtio.c  |  6 ++
 drivers/virtio/virtio_ring.c | 17 +
 include/linux/virtio.h   |  2 ++
 3 files changed, 25 insertions(+)

diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index 3893dc29eb26..11c5035369e2 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -1,4 +1,5 @@
 // SPDX-License-Identifier: GPL-2.0-only
+#include 
 #include 
 #include 
 #include 
@@ -243,6 +244,11 @@ static int virtio_dev_probe(struct device *_d)
u64 driver_features;
u64 driver_features_legacy;
 
+   _d->dma_mask = &_d->coherent_dma_mask;
+   err = dma_set_mask_and_coherent(_d, DMA_BIT_MASK(64));
+   if (err)
+   return err;
+
/* We have a driver! */
virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER);
 
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 78ec8a430033..d214a8a2fd4b 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2299,6 +2299,23 @@ int virtqueue_add_inbuf_ctx(struct virtqueue *vq,
 }
 EXPORT_SYMBOL_GPL(virtqueue_add_inbuf_ctx);
 
+/**
+ * virtqueue_dma_dev - get the dma dev
+ * @vq: the struct virtqueue we're talking about.
+ *
+ * Returns the DMA device, which can be used with the DMA API.
+ */
+struct device *virtqueue_dma_dev(struct virtqueue *_vq)
+{
+   struct vring_virtqueue *vq = to_vvq(_vq);
+
+   if (vq->use_dma_api)
+   return vring_dma_dev(vq);
+   else
+   return &vq->vq.vdev->dev;
+}
+EXPORT_SYMBOL_GPL(virtqueue_dma_dev);
+
 /**
  * virtqueue_kick_prepare - first half of split virtqueue_kick call.
  * @_vq: the struct virtqueue
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 2b472514c49b..1fa50191cf0a 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -61,6 +61,8 @@ int virtqueue_add_sgs(struct virtqueue *vq,
  void *data,
  gfp_t gfp);
 
+struct device *virtqueue_dma_dev(struct virtqueue *vq);
+
 bool virtqueue_kick(struct virtqueue *vq);
 
 bool virtqueue_kick_prepare(struct virtqueue *vq);
-- 
2.32.0.3.g01195cf9f
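
A hedged usage sketch of the new helper: map a long-lived buffer once against
the queue's DMA device and keep it mapped across many add/get cycles. Error
handling is trimmed and the function name is illustrative.

#include <linux/dma-mapping.h>
#include <linux/virtio.h>

/* Sketch only: premap a buffer with the vq's DMA device; the returned
 * handle can then be placed in sg->dma_address for repeated
 * virtqueue_add_* calls and unmapped once when no longer needed.
 */
static dma_addr_t example_premap(struct virtqueue *vq, void *buf, size_t len)
{
        struct device *dma_dev = virtqueue_dma_dev(vq);
        dma_addr_t addr;

        addr = dma_map_single(dma_dev, buf, len, DMA_BIDIRECTIONAL);
        if (dma_mapping_error(dma_dev, addr))
                return DMA_MAPPING_ERROR;

        return addr;
}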



[PATCH vhost v3 02/11] virtio_ring: packed: separate dma codes

2023-03-21 Thread Xuan Zhuo
DMA-related logic is separated out of virtqueue_add_packed(). The DMA
address is saved as sg->dma_address, and virtqueue_add_packed() then
uses it directly. The unmap operation becomes simpler.

The purpose of this is to facilitate subsequent support for receiving
DMA addresses mapped by drivers.

Signed-off-by: Xuan Zhuo 
---
 drivers/virtio/virtio_ring.c | 37 +++-
 1 file changed, 7 insertions(+), 30 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index fe704ca6c813..42e8c9d44161 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1430,9 +1430,9 @@ static inline int virtqueue_add_packed(struct virtqueue 
*_vq,
struct vring_virtqueue *vq = to_vvq(_vq);
struct vring_packed_desc *desc;
struct scatterlist *sg;
-   unsigned int i, n, c, descs_used, err_idx;
+   unsigned int i, n, c, descs_used;
__le16 head_flags, flags;
-   u16 head, id, prev, curr, avail_used_flags;
+   u16 head, id, prev, curr;
int err;
 
START_USE(vq);
@@ -1461,7 +1461,6 @@ static inline int virtqueue_add_packed(struct virtqueue 
*_vq,
}
 
head = vq->packed.next_avail_idx;
-   avail_used_flags = vq->packed.avail_used_flags;
 
WARN_ON_ONCE(total_sg > vq->packed.vring.num && !vq->indirect);
 
@@ -1479,15 +1478,13 @@ static inline int virtqueue_add_packed(struct virtqueue 
*_vq,
id = vq->free_head;
BUG_ON(id == vq->packed.vring.num);
 
+   if (virtqueue_map_sgs(vq, sgs, total_sg, out_sgs, in_sgs))
+   return -ENOMEM;
+
curr = id;
c = 0;
for (n = 0; n < out_sgs + in_sgs; n++) {
for (sg = sgs[n]; sg; sg = sg_next(sg)) {
-   dma_addr_t addr = vring_map_one_sg(vq, sg, n < out_sgs ?
-   DMA_TO_DEVICE : DMA_FROM_DEVICE);
-   if (vring_mapping_error(vq, addr))
-   goto unmap_release;
-
flags = cpu_to_le16(vq->packed.avail_used_flags |
(++c == total_sg ? 0 : VRING_DESC_F_NEXT) |
(n < out_sgs ? 0 : VRING_DESC_F_WRITE));
@@ -1496,12 +1493,12 @@ static inline int virtqueue_add_packed(struct virtqueue 
*_vq,
else
desc[i].flags = flags;
 
-   desc[i].addr = cpu_to_le64(addr);
+   desc[i].addr = cpu_to_le64(vring_sg_address(sg));
desc[i].len = cpu_to_le32(sg->length);
desc[i].id = cpu_to_le16(id);
 
if (unlikely(vq->use_dma_api)) {
-   vq->packed.desc_extra[curr].addr = addr;
+   vq->packed.desc_extra[curr].addr = 
vring_sg_address(sg);
vq->packed.desc_extra[curr].len = sg->length;
vq->packed.desc_extra[curr].flags =
le16_to_cpu(flags);
@@ -1547,26 +1544,6 @@ static inline int virtqueue_add_packed(struct virtqueue 
*_vq,
END_USE(vq);
 
return 0;
-
-unmap_release:
-   err_idx = i;
-   i = head;
-   curr = vq->free_head;
-
-   vq->packed.avail_used_flags = avail_used_flags;
-
-   for (n = 0; n < total_sg; n++) {
-   if (i == err_idx)
-   break;
-   vring_unmap_extra_packed(vq, &vq->packed.desc_extra[curr]);
-   curr = vq->packed.desc_extra[curr].next;
-   i++;
-   if (i >= vq->packed.vring.num)
-   i = 0;
-   }
-
-   END_USE(vq);
-   return -EIO;
 }
 
 static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
-- 
2.32.0.3.g01195cf9f



[PATCH vhost v3 03/11] virtio_ring: packed-indirect: separate dma codes

2023-03-21 Thread Xuan Zhuo
DMA-related logic is separated out of virtqueue_add_indirect_packed().

The DMA address is saved as sg->dma_address, and
virtqueue_add_indirect_packed() then uses it directly. The unmap
operation becomes simpler.

The purpose of this is to facilitate subsequent support for receiving
DMA addresses mapped by drivers.

Signed-off-by: Xuan Zhuo 
---
 drivers/virtio/virtio_ring.c | 17 ++---
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 42e8c9d44161..c8ed4aef9462 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1314,7 +1314,7 @@ static int virtqueue_add_indirect_packed(struct 
vring_virtqueue *vq,
 {
struct vring_packed_desc *desc;
struct scatterlist *sg;
-   unsigned int i, n, err_idx;
+   unsigned int i, n;
u16 head, id;
dma_addr_t addr;
 
@@ -1334,16 +1334,14 @@ static int virtqueue_add_indirect_packed(struct 
vring_virtqueue *vq,
id = vq->free_head;
BUG_ON(id == vq->packed.vring.num);
 
+   if (virtqueue_map_sgs(vq, sgs, total_sg, out_sgs, in_sgs))
+   return -ENOMEM;
+
for (n = 0; n < out_sgs + in_sgs; n++) {
for (sg = sgs[n]; sg; sg = sg_next(sg)) {
-   addr = vring_map_one_sg(vq, sg, n < out_sgs ?
-   DMA_TO_DEVICE : DMA_FROM_DEVICE);
-   if (vring_mapping_error(vq, addr))
-   goto unmap_release;
-
desc[i].flags = cpu_to_le16(n < out_sgs ?
0 : VRING_DESC_F_WRITE);
-   desc[i].addr = cpu_to_le64(addr);
+   desc[i].addr = cpu_to_le64(vring_sg_address(sg));
desc[i].len = cpu_to_le32(sg->length);
i++;
}
@@ -1407,10 +1405,7 @@ static int virtqueue_add_indirect_packed(struct 
vring_virtqueue *vq,
return 0;
 
 unmap_release:
-   err_idx = i;
-
-   for (i = 0; i < err_idx; i++)
-   vring_unmap_desc_packed(vq, &desc[i]);
+   virtqueue_unmap_sgs(vq, sgs, total_sg, out_sgs, in_sgs);
 
kfree(desc);
 
-- 
2.32.0.3.g01195cf9f



[PATCH vhost v3 07/11] virtio_ring: update document for virtqueue_add_*

2023-03-21 Thread Xuan Zhuo
Update the documentation of the virtqueue_add_* family of APIs, allowing
callers to use sg->dma_address to pass the DMA address to virtio core.

Signed-off-by: Xuan Zhuo 
---
 drivers/virtio/virtio_ring.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index e929e61dd145..78ec8a430033 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2192,6 +2192,10 @@ static inline int virtqueue_add(struct virtqueue *_vq,
  * Caller must ensure we don't call this with other virtqueue operations
  * at the same time (except where noted).
  *
+ * If the caller has done dma map then use sg->dma_address to pass dma address.
+ * If one sg->dma_address is used, then all sgs must use sg->dma_address;
+ * otherwise all sg->dma_address must be NULL.
+ *
  * Returns zero or a negative error (ie. ENOSPC, ENOMEM, EIO).
  */
 int virtqueue_add_sgs(struct virtqueue *_vq,
@@ -2226,6 +2230,10 @@ EXPORT_SYMBOL_GPL(virtqueue_add_sgs);
  * Caller must ensure we don't call this with other virtqueue operations
  * at the same time (except where noted).
  *
+ * If the caller has done dma map then use sg->dma_address to pass dma address.
+ * If one sg->dma_address is used, then all sgs must use sg->dma_address;
+ * otherwise all sg->dma_address must be NULL.
+ *
  * Returns zero or a negative error (ie. ENOSPC, ENOMEM, EIO).
  */
 int virtqueue_add_outbuf(struct virtqueue *vq,
@@ -2248,6 +2256,10 @@ EXPORT_SYMBOL_GPL(virtqueue_add_outbuf);
  * Caller must ensure we don't call this with other virtqueue operations
  * at the same time (except where noted).
  *
+ * If the caller has done dma map then use sg->dma_address to pass dma address.
+ * If one sg->dma_address is used, then all sgs must use sg->dma_address;
+ * otherwise all sg->dma_address must be NULL.
+ *
  * Returns zero or a negative error (ie. ENOSPC, ENOMEM, EIO).
  */
 int virtqueue_add_inbuf(struct virtqueue *vq,
@@ -2271,6 +2283,10 @@ EXPORT_SYMBOL_GPL(virtqueue_add_inbuf);
  * Caller must ensure we don't call this with other virtqueue operations
  * at the same time (except where noted).
  *
+ * If the caller has done dma map then use sg->dma_address to pass dma address.
+ * If one sg->dma_address is used, then all sgs must use sg->dma_address;
+ * otherwise all sg->dma_address must be NULL.
+ *
  * Returns zero or a negative error (ie. ENOSPC, ENOMEM, EIO).
  */
 int virtqueue_add_inbuf_ctx(struct virtqueue *vq,
-- 
2.32.0.3.g01195cf9f



[PATCH vhost v3 00/11] virtio core prepares for AF_XDP

2023-03-21 Thread Xuan Zhuo
XDP socket (AF_XDP) is an excellent kernel-bypass networking framework. The
zero-copy feature of xsk (XDP socket) needs to be supported by the driver,
and the zero-copy performance is very good.

ENV: Qemu with vhost.

                   vhost CPU | Guest APP CPU | Guest softirq CPU |     PPS
-----------------------------|---------------|-------------------|--------
xmit by sockperf:        90% |          100% |                   |  318967
xmit by xsk:            100% |           30% |               33% | 1192064
recv by sockperf:       100% |           68% |              100% |  692288
recv by xsk:            100% |           33% |               43% |  771670

Before implementing this functionality in virtio-net, we first have to make
virtio core support these features:

1. virtio core support premapped
2. virtio core support reset per-queue
3. introduce DMA APIs to virtio core

Please review.

Thanks.

v3:
 1. add map_inter to the desc state structs to record whether virtio core
    did the DMA map

v2:
 1. use sgs[0]->dma_address to judge whether the buffers are premapped
 2. use extra.addr to judge whether to unmap non-indirect descs
 3. use indir_desc to judge whether to unmap indirect descs
 4. rename virtqueue_get_dma_dev to virtqueue_dma_dev

v1:
 1. expose the DMA device; do NOT introduce APIs for DMA map and sync
 2. split some commits for easier review


Xuan Zhuo (11):
  virtio_ring: split: separate dma codes
  virtio_ring: packed: separate dma codes
  virtio_ring: packed-indirect: separate dma codes
  virtio_ring: split: support premapped
  virtio_ring: packed: support premapped
  virtio_ring: packed-indirect: support premapped
  virtio_ring: update document for virtqueue_add_*
  virtio_ring: introduce virtqueue_dma_dev()
  virtio_ring: correct the expression of the description of
virtqueue_resize()
  virtio_ring: separate the logic of reset/enable from virtqueue_resize
  virtio_ring: introduce virtqueue_reset()

 drivers/virtio/virtio.c  |   6 +
 drivers/virtio/virtio_ring.c | 342 +--
 include/linux/virtio.h   |   4 +
 3 files changed, 255 insertions(+), 97 deletions(-)

--
2.32.0.3.g01195cf9f



[PATCH vhost v3 01/11] virtio_ring: split: separate dma codes

2023-03-21 Thread Xuan Zhuo
DMA-related logic is separated out of virtqueue_add_split() into a new
function. The DMA address is saved as sg->dma_address if use_dma_api is
true, and virtqueue_add_split() then uses it directly. The unmap
operation becomes simpler.

The purpose of this is to facilitate subsequent support for receiving
DMA addresses mapped by drivers.

Signed-off-by: Xuan Zhuo 
---
 drivers/virtio/virtio_ring.c | 121 +++
 1 file changed, 93 insertions(+), 28 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 41144b5246a8..fe704ca6c813 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -379,6 +379,14 @@ static dma_addr_t vring_map_one_sg(const struct 
vring_virtqueue *vq,
direction);
 }
 
+static dma_addr_t vring_sg_address(struct scatterlist *sg)
+{
+   if (sg->dma_address)
+   return sg->dma_address;
+
+   return (dma_addr_t)sg_phys(sg);
+}
+
 static dma_addr_t vring_map_single(const struct vring_virtqueue *vq,
   void *cpu_addr, size_t size,
   enum dma_data_direction direction)
@@ -520,6 +528,80 @@ static inline unsigned int virtqueue_add_desc_split(struct 
virtqueue *vq,
return next;
 }
 
+static void virtqueue_unmap_sgs(struct vring_virtqueue *vq,
+   struct scatterlist *sgs[],
+   unsigned int total_sg,
+   unsigned int out_sgs,
+   unsigned int in_sgs)
+{
+   struct scatterlist *sg;
+   unsigned int n;
+
+   if (!vq->use_dma_api)
+   return;
+
+   for (n = 0; n < out_sgs; n++) {
+   for (sg = sgs[n]; sg; sg = sg_next(sg)) {
+   if (!sg->dma_address)
+   return;
+
+   dma_unmap_page(vring_dma_dev(vq), sg->dma_address,
+  sg->length, DMA_TO_DEVICE);
+   }
+   }
+
+   for (; n < (out_sgs + in_sgs); n++) {
+   for (sg = sgs[n]; sg; sg = sg_next(sg)) {
+   if (!sg->dma_address)
+   return;
+
+   dma_unmap_page(vring_dma_dev(vq), sg->dma_address,
+  sg->length, DMA_FROM_DEVICE);
+   }
+   }
+}
+
+static int virtqueue_map_sgs(struct vring_virtqueue *vq,
+struct scatterlist *sgs[],
+unsigned int total_sg,
+unsigned int out_sgs,
+unsigned int in_sgs)
+{
+   struct scatterlist *sg;
+   unsigned int n;
+
+   if (!vq->use_dma_api)
+   return 0;
+
+   for (n = 0; n < out_sgs; n++) {
+   for (sg = sgs[n]; sg; sg = sg_next(sg)) {
+   dma_addr_t addr = vring_map_one_sg(vq, sg, 
DMA_TO_DEVICE);
+
+   if (vring_mapping_error(vq, addr))
+   goto err;
+
+   sg->dma_address = addr;
+   }
+   }
+
+   for (; n < (out_sgs + in_sgs); n++) {
+   for (sg = sgs[n]; sg; sg = sg_next(sg)) {
+   dma_addr_t addr = vring_map_one_sg(vq, sg, 
DMA_FROM_DEVICE);
+
+   if (vring_mapping_error(vq, addr))
+   goto err;
+
+   sg->dma_address = addr;
+   }
+   }
+
+   return 0;
+
+err:
+   virtqueue_unmap_sgs(vq, sgs, total_sg, out_sgs, in_sgs);
+   return -ENOMEM;
+}
+
 static inline int virtqueue_add_split(struct virtqueue *_vq,
  struct scatterlist *sgs[],
  unsigned int total_sg,
@@ -532,9 +614,9 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
struct vring_virtqueue *vq = to_vvq(_vq);
struct scatterlist *sg;
struct vring_desc *desc;
-   unsigned int i, n, avail, descs_used, prev, err_idx;
-   int head;
+   unsigned int i, n, avail, descs_used, prev;
bool indirect;
+   int head;
 
START_USE(vq);
 
@@ -586,32 +668,30 @@ static inline int virtqueue_add_split(struct virtqueue 
*_vq,
return -ENOSPC;
}
 
+   if (virtqueue_map_sgs(vq, sgs, total_sg, out_sgs, in_sgs))
+   return -ENOMEM;
+
for (n = 0; n < out_sgs; n++) {
for (sg = sgs[n]; sg; sg = sg_next(sg)) {
-   dma_addr_t addr = vring_map_one_sg(vq, sg, 
DMA_TO_DEVICE);
-   if (vring_mapping_error(vq, addr))
-   goto unmap_release;
-
prev = i;
/* Note that we trust indirect descriptor
 * table since it use stream DMA mapping.
 */
-   

Re: [PATCH v3 2/2] vdpa/mlx5: Make VIRTIO_NET_F_MRG_RXBUF off by default

2023-03-21 Thread Michael S. Tsirkin
On Tue, Mar 21, 2023 at 11:24:34AM +0200, Eli Cohen wrote:
> 
> On 20/03/2023 22:02, Michael S. Tsirkin wrote:
> > On Mon, Mar 20, 2023 at 01:49:30PM +0200, Eli Cohen wrote:
> > > One can still enable it when creating the vdpa device with the vdpa
> > > tool by providing features that include it.
> > > 
> > > For example:
> > > $ vdpa dev add name vdpa0 mgmtdev pci/:86:00.2 device_features 
> > > 0x300cb982b
> > > 
> > > Please be aware that this feature was not supported before the previous
> > > patch in this series was introduced, so we don't change the user experience.
> > So I would say the patches have to be reordered to avoid a regression?
> Yes, I can do that.
> > 
> > > Current firmware versions show degradation in packet rate when using
> > > MRG_RXBUF. Users who favor memory saving over packet rate could enable
> > > this feature, but we want to keep it off by default.
> > > 
> > > Signed-off-by: Eli Cohen 
> > OK, and when future firmware (maybe) fixes this up, how will you know
> > it's OK to enable it by default? Some version check, I guess? It would
> > be better if the firmware specified which flags to enable by default ...
> Currently there are no flags for that, so let's just keep the default off.

Right, but I mean, why are mrg buffers slow? It's a firmware bug, right?

> > 
> > > ---
> > >   drivers/vdpa/mlx5/net/mlx5_vnet.c | 2 ++
> > >   1 file changed, 2 insertions(+)
> > > 
> > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > index 5285dd76c793..24397a71d6f3 100644
> > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > @@ -3146,6 +3146,8 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev 
> > > *v_mdev, const char *name,
> > >   return -EINVAL;
> > >   }
> > >   device_features &= add_config->device_features;
> > > + } else {
> > > + device_features &= ~BIT_ULL(VIRTIO_NET_F_MRG_RXBUF);
> > >   }
> > >   if (!(device_features & BIT_ULL(VIRTIO_F_VERSION_1) &&
> > > device_features & BIT_ULL(VIRTIO_F_ACCESS_PLATFORM))) {
> > > -- 
> > > 2.38.1



Re: [PATCH v2] virtio: add VIRTIO_F_NOTIFICATION_DATA feature support

2023-03-21 Thread Michael S. Tsirkin
On Tue, Mar 21, 2023 at 02:21:15AM +0300, Viktor Prutyanov wrote:
> According to the VirtIO spec v1.2, the VIRTIO_F_NOTIFICATION_DATA feature
> indicates that the driver passes extra data along with the queue
> notifications.
>
> In the split queue case, the extra data is a 16-bit available index. In
> the packed queue case, the extra data is a 1-bit wrap counter and a
> 15-bit available index.
> 
> Add support for this feature for MMIO and PCI transports. Channel I/O
> transport will not accept this feature.
> 
> Signed-off-by: Viktor Prutyanov 
> ---
> 
>  v2: reject the feature in virtio_ccw, replace __le32 with u32
> 
>  drivers/s390/virtio/virtio_ccw.c   |  4 +---
>  drivers/virtio/virtio_mmio.c   | 15 ++-
>  drivers/virtio/virtio_pci_common.c | 10 ++
>  drivers/virtio/virtio_pci_common.h |  4 
>  drivers/virtio/virtio_pci_legacy.c |  2 +-
>  drivers/virtio/virtio_pci_modern.c |  2 +-
>  drivers/virtio/virtio_ring.c   | 17 +
>  include/linux/virtio_ring.h|  2 ++
>  include/uapi/linux/virtio_config.h |  6 ++
>  9 files changed, 56 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/s390/virtio/virtio_ccw.c 
> b/drivers/s390/virtio/virtio_ccw.c
> index a10dbe632ef9..d72a59415527 100644
> --- a/drivers/s390/virtio/virtio_ccw.c
> +++ b/drivers/s390/virtio/virtio_ccw.c
> @@ -789,9 +789,7 @@ static u64 virtio_ccw_get_features(struct virtio_device 
> *vdev)
>  
>  static void ccw_transport_features(struct virtio_device *vdev)
>  {
> - /*
> -  * Currently nothing to do here.
> -  */
> + __virtio_clear_bit(vdev, VIRTIO_F_NOTIFICATION_DATA);
>  }
>  
>  static int virtio_ccw_finalize_features(struct virtio_device *vdev)
> diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> index 3ff746e3f24a..0e13da17fe0a 100644
> --- a/drivers/virtio/virtio_mmio.c
> +++ b/drivers/virtio/virtio_mmio.c
> @@ -285,6 +285,19 @@ static bool vm_notify(struct virtqueue *vq)
>   return true;
>  }
>  
> +static bool vm_notify_with_data(struct virtqueue *vq)
> +{
> + struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vq->vdev);
> + u32 data = vring_fill_notification_data(vq);
> +
> + writel(data, vm_dev->base + VIRTIO_MMIO_QUEUE_NOTIFY);
> +
> + return true;
> +}
> +
> +#define VM_NOTIFY(vdev) (__virtio_test_bit((vdev), 
> VIRTIO_F_NOTIFICATION_DATA) \
> + ? vm_notify_with_data : vm_notify)
> +
>  /* Notify all virtqueues on an interrupt. */
>  static irqreturn_t vm_interrupt(int irq, void *opaque)
>  {
> @@ -397,7 +410,7 @@ static struct virtqueue *vm_setup_vq(struct virtio_device 
> *vdev, unsigned int in
>  
>   /* Create the vring */
>   vq = vring_create_virtqueue(index, num, VIRTIO_MMIO_VRING_ALIGN, vdev,
> -  true, true, ctx, vm_notify, callback, name);
> + true, true, ctx, VM_NOTIFY(vdev), callback, name);

I don't see why is this macro useful.



>   if (!vq) {
>   err = -ENOMEM;
>   goto error_new_virtqueue;
> diff --git a/drivers/virtio/virtio_pci_common.c 
> b/drivers/virtio/virtio_pci_common.c
> index a6c86f916dbd..535263abc2bd 100644
> --- a/drivers/virtio/virtio_pci_common.c
> +++ b/drivers/virtio/virtio_pci_common.c
> @@ -43,6 +43,16 @@ bool vp_notify(struct virtqueue *vq)
>   /* we write the queue's selector into the notification register to
>* signal the other end */
>   iowrite16(vq->index, (void __iomem *)vq->priv);
> +
> + return true;
> +}
> +
> +bool vp_notify_with_data(struct virtqueue *vq)
> +{
> + u32 data = vring_fill_notification_data(vq);
> +
> + iowrite32(data, (void __iomem *)vq->priv);
> +
>   return true;
>  }
>  
> diff --git a/drivers/virtio/virtio_pci_common.h 
> b/drivers/virtio/virtio_pci_common.h
> index 23112d84218f..9a7212dcbb32 100644
> --- a/drivers/virtio/virtio_pci_common.h
> +++ b/drivers/virtio/virtio_pci_common.h
> @@ -105,6 +105,7 @@ static struct virtio_pci_device *to_vp_device(struct 
> virtio_device *vdev)
>  void vp_synchronize_vectors(struct virtio_device *vdev);
>  /* the notify function used when creating a virt queue */
>  bool vp_notify(struct virtqueue *vq);
> +bool vp_notify_with_data(struct virtqueue *vq);
>  /* the config->del_vqs() implementation */
>  void vp_del_vqs(struct virtio_device *vdev);
>  /* the config->find_vqs() implementation */
> @@ -114,6 +115,9 @@ int vp_find_vqs(struct virtio_device *vdev, unsigned int 
> nvqs,
>   struct irq_affinity *desc);
>  const char *vp_bus_name(struct virtio_device *vdev);
>  
> +#define VP_NOTIFY(vdev) (__virtio_test_bit((vdev), 
> VIRTIO_F_NOTIFICATION_DATA) \
> + ? vp_notify : vp_notify_with_data)
> +
>  /* Setup the affinity for a virtqueue:
>   * - force the affinity for per vq vector
>   * - OR over all affinities for shared MSI
> diff --git a/drivers/virtio/virtio_pci_legacy.c 
> b/drivers/virtio/virtio_pci_legacy.c
> index 2257f1b3d8ae..b98e994cae48 100644
> --- 

Re: [PATCH v2] virtio: add VIRTIO_F_NOTIFICATION_DATA feature support

2023-03-21 Thread Michael S. Tsirkin
On Tue, Mar 21, 2023 at 12:00:42PM +0300, Viktor Prutyanov wrote:
> On Tue, Mar 21, 2023 at 5:29 AM Jason Wang  wrote:
> >
> > On Tue, Mar 21, 2023 at 7:21 AM Viktor Prutyanov  wrote:
> > >
> > > According to the VirtIO spec v1.2, the VIRTIO_F_NOTIFICATION_DATA
> > > feature indicates that the driver passes extra data along with the
> > > queue notifications.
> > >
> > > In the split queue case, the extra data is a 16-bit available index.
> > > In the packed queue case, the extra data is a 1-bit wrap counter and
> > > a 15-bit available index.
> > >
> > > Add support for this feature for MMIO and PCI transports. Channel I/O
> > > transport will not accept this feature.
> > >
> > > Signed-off-by: Viktor Prutyanov 
> > > ---
> > >
> > >  v2: reject the feature in virtio_ccw, replace __le32 with u32
> > >
> > >  drivers/s390/virtio/virtio_ccw.c   |  4 +---
> > >  drivers/virtio/virtio_mmio.c   | 15 ++-
> > >  drivers/virtio/virtio_pci_common.c | 10 ++
> > >  drivers/virtio/virtio_pci_common.h |  4 
> > >  drivers/virtio/virtio_pci_legacy.c |  2 +-
> > >  drivers/virtio/virtio_pci_modern.c |  2 +-
> > >  drivers/virtio/virtio_ring.c   | 17 +
> > >  include/linux/virtio_ring.h|  2 ++
> > >  include/uapi/linux/virtio_config.h |  6 ++
> > >  9 files changed, 56 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/drivers/s390/virtio/virtio_ccw.c 
> > > b/drivers/s390/virtio/virtio_ccw.c
> > > index a10dbe632ef9..d72a59415527 100644
> > > --- a/drivers/s390/virtio/virtio_ccw.c
> > > +++ b/drivers/s390/virtio/virtio_ccw.c
> > > @@ -789,9 +789,7 @@ static u64 virtio_ccw_get_features(struct 
> > > virtio_device *vdev)
> > >
> > >  static void ccw_transport_features(struct virtio_device *vdev)
> > >  {
> > > -   /*
> > > -* Currently nothing to do here.
> > > -*/
> > > +   __virtio_clear_bit(vdev, VIRTIO_F_NOTIFICATION_DATA);
> >
> > Is there any restriction that prevents us from implementing
> > VIRTIO_F_NOTIFICATION_DATA? (The spec doesn't seem to prohibit it.)
> 
> Most likely, nothing.

So pls code it up. It's the same format.

> >
> > >  }
> > >
> > >  static int virtio_ccw_finalize_features(struct virtio_device *vdev)
> > > diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> > > index 3ff746e3f24a..0e13da17fe0a 100644
> > > --- a/drivers/virtio/virtio_mmio.c
> > > +++ b/drivers/virtio/virtio_mmio.c
> > > @@ -285,6 +285,19 @@ static bool vm_notify(struct virtqueue *vq)
> > > return true;
> > >  }
> > >
> > > +static bool vm_notify_with_data(struct virtqueue *vq)
> > > +{
> > > +   struct virtio_mmio_device *vm_dev = 
> > > to_virtio_mmio_device(vq->vdev);
> > > +   u32 data = vring_fill_notification_data(vq);
> >
> > Can we move this to the initialization?
> 
> This data is new for each notification, because it helps to identify
> the next available index.
> 
> >
> > Thanks
> >
> 
> Thanks,
> Viktor Prutyanov


Re: [RFC PATCH v3] virtio/vsock: allocate multiple skbuffs on tx

2023-03-21 Thread Stefano Garzarella

On Tue, Mar 21, 2023 at 12:31:48AM +0300, Arseniy Krasnov wrote:

This adds a small optimization to the TX path: instead of allocating a
single skbuff on every call to the transport, allocate multiple skbuffs
while credit space allows, thus trying to send as much data as possible
without returning to af_vsock.c.

Signed-off-by: Arseniy Krasnov 
---
Link to v1:
https://lore.kernel.org/netdev/2c52aa26-8181-d37a-bccd-a86bd3cbc...@sberdevices.ru/
Link to v2:
https://lore.kernel.org/netdev/ea5725eb-6cb5-cf15-2938-34e335a44...@sberdevices.ru/

Changelog:
v1 -> v2:
- If something was sent, return the number of bytes sent (even in
  case of error). Return an error only if the first skbuff failed
  to send.

v2 -> v3:
- Handle the case when the transport callback returns an unexpected
  value that is not equal to 'skb->len'; break the loop.
- Don't check 'rest_len' for zero before calling
  'virtio_transport_put_credit()'; decided to add that check directly
  to 'virtio_transport_put_credit()' in a separate patch.

net/vmw_vsock/virtio_transport_common.c | 59 +++--
1 file changed, 45 insertions(+), 14 deletions(-)

diff --git a/net/vmw_vsock/virtio_transport_common.c 
b/net/vmw_vsock/virtio_transport_common.c
index 6564192e7f20..e0b2c6ecbe22 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -196,7 +196,8 @@ static int virtio_transport_send_pkt_info(struct vsock_sock 
*vsk,
const struct virtio_transport *t_ops;
struct virtio_vsock_sock *vvs;
u32 pkt_len = info->pkt_len;
-   struct sk_buff *skb;
+   u32 rest_len;
+   int ret;

info->type = virtio_transport_get_type(sk_vsock(vsk));

@@ -216,10 +217,6 @@ static int virtio_transport_send_pkt_info(struct 
vsock_sock *vsk,

vvs = vsk->trans;

-   /* we can send less than pkt_len bytes */
-   if (pkt_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE)
-   pkt_len = VIRTIO_VSOCK_MAX_PKT_BUF_SIZE;
-
/* virtio_transport_get_credit might return less than pkt_len credit */
pkt_len = virtio_transport_get_credit(vvs, pkt_len);

@@ -227,17 +224,51 @@ static int virtio_transport_send_pkt_info(struct 
vsock_sock *vsk,
if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
return pkt_len;

-   skb = virtio_transport_alloc_skb(info, pkt_len,
-                                    src_cid, src_port,
-                                    dst_cid, dst_port);
-   if (!skb) {
-   virtio_transport_put_credit(vvs, pkt_len);
-   return -ENOMEM;
-   }
+   ret = 0;
+   rest_len = pkt_len;
+
+   do {
+   struct sk_buff *skb;
+   size_t skb_len;
+
+   skb_len = min_t(u32, VIRTIO_VSOCK_MAX_PKT_BUF_SIZE, rest_len);
+
+   skb = virtio_transport_alloc_skb(info, skb_len,
+                                    src_cid, src_port,
+                                    dst_cid, dst_port);
+   if (!skb) {
+   ret = -ENOMEM;
+   break;
+   }
+
+   virtio_transport_inc_tx_pkt(vvs, skb);

-   virtio_transport_inc_tx_pkt(vvs, skb);
+   ret = t_ops->send_pkt(skb);

-   return t_ops->send_pkt(skb);
+   if (ret < 0)
+   break;
+
+   /* Both virtio and vhost 'send_pkt()' return 'skb_len',
+    * but for reliability use 'ret' instead of 'skb_len'.
+    * Also, if a partial send somehow happens (i.e. 'ret' != 'skb_len'),
+    * we break this loop, but still account the returned value
+    * in 'virtio_transport_put_credit()'.
+    */
+   rest_len -= ret;
+
+   if (ret != skb_len) {
+   ret = -EFAULT;


Okay, but `ret` will be overwritten by the check we have before the
return ...


+   break;
+   }
+   } while (rest_len);
+
+   virtio_transport_put_credit(vvs, rest_len);
+
+   /* Return the number of bytes if any data has been sent. */
+   if (rest_len != pkt_len)
+   ret = pkt_len - rest_len;


... here.

Since we don't expect this condition for now, perhaps we can avoid
setting ret to -EFAULT and instead add a WARN_ONCE (breaking the
loop as you did here).

This way we return the partial length as we did before.
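A minimal sketch of that variant (assumed shape, not posted code; it
relies on the existing 'rest_len != pkt_len' check below to return the
partial length):

		if (WARN_ONCE(ret != skb_len,
			      "'send_pkt()' returned %d, expected %zu bytes",
			      ret, skb_len))
			/* Stop here; credit for 'rest_len' is put back below. */
			break;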

Thanks,
Stefano


+
+   return ret;
}

static bool virtio_transport_inc_rx_pkt(struct virtio_vsock_sock *vvs,
--
2.25.1





Re: [RFC PATCH v1 1/3] virtio/vsock: fix header length on skb merging

2023-03-21 Thread Stefano Garzarella

On Mon, Mar 20, 2023 at 09:10:13PM +0300, Arseniy Krasnov wrote:



On 20.03.2023 17:57, Stefano Garzarella wrote:

On Sun, Mar 19, 2023 at 09:51:06PM +0300, Arseniy Krasnov wrote:

This fixes the header length calculation of an skbuff when appending data
to it. When such an skbuff is processed in dequeue callbacks, e.g.
'skb_pull()' is called on it, 'skb->len' is a dynamic value, so it is
impossible to use it in the header: the value in the header must stay
constant for valid credit calculation ('rx_bytes'/'fwd_cnt').

Fixes: 077706165717 ("virtio/vsock: don't use skbuff state to account credit")


I don't understand how this commit introduced this problem; can you
explain it better?

Sorry, it seems I said it slightly wrong. Before 077706165717 the
implementation was buggy, but this exact problem was never triggered
somehow. I checked that with the reproducer from this patch. But since
077706165717 uses the value from the header for the 'rx_bytes'
calculation, the bug became real. So maybe it is not strictly a "Fixes:"
for 077706165717, but a critical addition to it. I'm not sure.


Is it related more to the credit than to the size in the header itself?


It is related more to the size in the header.
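To make that concrete, an editorial illustration with a hypothetical
merged skb (not code from the patch):

	/* Say 4096 bytes were appended and recorded in the header. */
	u32 hdr_len = le32_to_cpu(virtio_vsock_hdr(skb)->len);	/* 4096 */

	skb_pull(skb, 16);	/* a dequeue callback consumes 16 bytes */

	/* skb->len is now 4080, but hdr_len must still read 4096 so the
	 * 'rx_bytes'/'fwd_cnt' credit accounting matches what the peer
	 * was told.
	 */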

Anyway, the patch LGTM, but we should explain better the issue.



Ok, I'll state it more clearly in the commit message.


Okay, if 077706165717 triggered the problem, even if it was there from
before, then IMHO it is okay to use that commit in Fixes.

Please explain it better in the message, so it's clear for everyone ;-)

Thanks,
Stefano



Re: [RFC PATCH v2] virtio/vsock: allocate multiple skbuffs on tx

2023-03-21 Thread Stefano Garzarella

On Mon, Mar 20, 2023 at 09:02:19PM +0300, Arseniy Krasnov wrote:



On 20.03.2023 17:29, Stefano Garzarella wrote:

On Sun, Mar 19, 2023 at 09:46:10PM +0300, Arseniy Krasnov wrote:

This adds a small optimization for the tx path: instead of allocating a
single skbuff on every call to the transport, allocate multiple skbuffs
while credit space allows, thus trying to send as much data as possible
without returning to af_vsock.c.

Signed-off-by: Arseniy Krasnov 
---
Link to v1:
https://lore.kernel.org/netdev/2c52aa26-8181-d37a-bccd-a86bd3cbc...@sberdevices.ru/

Changelog:
v1 -> v2:
- If something was sent, return the number of bytes sent (even in
  case of error). Return an error only if sending the first skbuff
  failed.

net/vmw_vsock/virtio_transport_common.c | 53 ++---
1 file changed, 39 insertions(+), 14 deletions(-)

diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 6564192e7f20..3fdf1433ec28 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -196,7 +196,8 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
const struct virtio_transport *t_ops;
struct virtio_vsock_sock *vvs;
u32 pkt_len = info->pkt_len;
-    struct sk_buff *skb;
+    u32 rest_len;
+    int ret;

info->type = virtio_transport_get_type(sk_vsock(vsk));

@@ -216,10 +217,6 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,

vvs = vsk->trans;

-    /* we can send less than pkt_len bytes */
-    if (pkt_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE)
-    pkt_len = VIRTIO_VSOCK_MAX_PKT_BUF_SIZE;
-
/* virtio_transport_get_credit might return less than pkt_len credit */
pkt_len = virtio_transport_get_credit(vvs, pkt_len);

@@ -227,17 +224,45 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
    return pkt_len;

-    skb = virtio_transport_alloc_skb(info, pkt_len,
-                                     src_cid, src_port,
-                                     dst_cid, dst_port);
-    if (!skb) {
-    virtio_transport_put_credit(vvs, pkt_len);
-    return -ENOMEM;
-    }
+    ret = 0;
+    rest_len = pkt_len;
+
+    do {
+    struct sk_buff *skb;
+    size_t skb_len;
+
+    skb_len = min_t(u32, VIRTIO_VSOCK_MAX_PKT_BUF_SIZE, rest_len);
+
+    skb = virtio_transport_alloc_skb(info, skb_len,
+                                     src_cid, src_port,
+                                     dst_cid, dst_port);
+    if (!skb) {
+    ret = -ENOMEM;
+    break;
+    }
+
+    virtio_transport_inc_tx_pkt(vvs, skb);
+
+    ret = t_ops->send_pkt(skb);
+
+    if (ret < 0)
+    break;

-    virtio_transport_inc_tx_pkt(vvs, skb);
+    rest_len -= skb_len;


t_ops->send_pkt() returns the number of bytes sent. Current
implementations always return `skb_len`, so there should be no problem,
but it would be better to put a comment here, or we should handle the
case where ret != skb_len, to avoid future issues.


Hello, thanks for the review!

I see. I think I'll handle such partial sends (ret != skb_len) as an
error, as it is the only thing to do - we consumed 'skb_len' bytes from
the user's buffer, but 'send_pkt()' returned a different value, so it
would be strange to continue this tx loop as if everything were ok.
Something like this:
+
+       if (ret < 0)
+               break;
+
+       if (ret != skb_len) {
+               ret = -EFAULT; /* or maybe -EIO */
+               break;
+       }


Good for me.






+    } while (rest_len);

-    return t_ops->send_pkt(skb);
+    /* Don't call this function with zero as the argument:
+     * it tries to acquire the spinlock, and a zero argument
+     * makes the call useless.


Good point, can we do the same also for virtio_transport_get_credit()?
(Maybe in a separate patch.)

I'm wondering whether it may be better to do it directly inside the
functions, but I don't have a strong opinion on that since we only call
them here.



I think in this patch I can call 'virtio_transport_put_credit()' without
the 'if', and I'll prepare a separate patch that adds a zero-argument
check to this function.


Yep, I agree.


As I see it, the only function suitable for such an 'if' check is
'virtio_transport_put_credit()'.


Why not even for virtio_transport_get_credit() ?

When we send packets without payload (e.g. VIRTIO_VSOCK_OP_REQUEST,
VIRTIO_VSOCK_OP_SHUTDOWN) we call virtio_transport_get_credit()
with `credit` parameter equal to 0, then we acquire the spinlock but
in the end we do nothing.
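For reference, a sketch of the guard being discussed, based on the
current body of the function (the actual follow-up patch may differ):

void virtio_transport_put_credit(struct virtio_vsock_sock *vvs, u32 credit)
{
	if (!credit)
		return;		/* nothing to account; skip the spinlock */

	spin_lock_bh(&vvs->tx_lock);
	vvs->tx_cnt -= credit;
	spin_unlock_bh(&vvs->tx_lock);
}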


Anyway, this check won't hurt for future use.


Yep, these are minor improvements ;-)

Thanks,
Stefano



Re: [PATCH v2] virtio: add VIRTIO_F_NOTIFICATION_DATA feature support

2023-03-21 Thread Michael S. Tsirkin
On Tue, Mar 21, 2023 at 10:29:11AM +0800, Jason Wang wrote:
> On Tue, Mar 21, 2023 at 7:21 AM Viktor Prutyanov  wrote:
> >
> > According to the VirtIO spec v1.2, the VIRTIO_F_NOTIFICATION_DATA
> > feature indicates that the driver passes extra data along with the
> > queue notifications.
> >
> > In the split queue case, the extra data is the 16-bit available index.
> > In the packed queue case, the extra data is a 1-bit wrap counter and a
> > 15-bit available index.
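As an editorial sketch of that layout (helper and parameter names are
illustrative, not the patch's code; per the spec, the virtqueue index
occupies the low 16 bits and the index/wrap data the high 16 bits):

static u32 example_notification_data(u16 vq_index, u16 next_avail,
				     bool packed, bool wrap_counter)
{
	/* Packed: 1-bit wrap counter in bit 15, 15-bit index below it.
	 * Split: the full 16-bit available index.
	 */
	u16 next = packed ? ((next_avail & 0x7fff) | ((u16)wrap_counter << 15))
			  : next_avail;

	return ((u32)next << 16) | vq_index;	/* value written with the kick */
}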
> >
> > Add support for this feature for MMIO and PCI transports. Channel I/O
> > transport will not accept this feature.
> >
> > Signed-off-by: Viktor Prutyanov 
> > ---
> >
> >  v2: reject the feature in virtio_ccw, replace __le32 with u32
> >
> >  drivers/s390/virtio/virtio_ccw.c   |  4 +---
> >  drivers/virtio/virtio_mmio.c   | 15 ++-
> >  drivers/virtio/virtio_pci_common.c | 10 ++
> >  drivers/virtio/virtio_pci_common.h |  4 
> >  drivers/virtio/virtio_pci_legacy.c |  2 +-
> >  drivers/virtio/virtio_pci_modern.c |  2 +-
> >  drivers/virtio/virtio_ring.c   | 17 +
> >  include/linux/virtio_ring.h|  2 ++
> >  include/uapi/linux/virtio_config.h |  6 ++
> >  9 files changed, 56 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
> > index a10dbe632ef9..d72a59415527 100644
> > --- a/drivers/s390/virtio/virtio_ccw.c
> > +++ b/drivers/s390/virtio/virtio_ccw.c
> > @@ -789,9 +789,7 @@ static u64 virtio_ccw_get_features(struct virtio_device *vdev)
> >
> >  static void ccw_transport_features(struct virtio_device *vdev)
> >  {
> > -   /*
> > -    * Currently nothing to do here.
> > -    */
> > +   __virtio_clear_bit(vdev, VIRTIO_F_NOTIFICATION_DATA);
> 
> Is there any restriction that prevents us from implementing
> VIRTIO_F_NOTIFICATION_DATA? (The spec doesn't seem to prohibit this.)

Right, spec actually tells you what to do.

> >  }
> >
> >  static int virtio_ccw_finalize_features(struct virtio_device *vdev)
> > diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
> > index 3ff746e3f24a..0e13da17fe0a 100644
> > --- a/drivers/virtio/virtio_mmio.c
> > +++ b/drivers/virtio/virtio_mmio.c
> > @@ -285,6 +285,19 @@ static bool vm_notify(struct virtqueue *vq)
> > return true;
> >  }
> >
> > +static bool vm_notify_with_data(struct virtqueue *vq)
> > +{
> > +   struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vq->vdev);
> > +   u32 data = vring_fill_notification_data(vq);
> 
> Can we move this to the initialization?
> 
> Thanks


Re: [PATCH v2 0/7] vhost-scsi: Fix crashes and management op hangs

2023-03-21 Thread Michael S. Tsirkin
On Mon, Mar 20, 2023 at 09:29:50PM -0500, michael.chris...@oracle.com wrote:
> On 3/20/23 9:06 PM, Mike Christie wrote:
> > The following patches were made on top of Linus' tree.
> 
> Hi Michael, I see you merged my first version of the patchset in your
> vhost branch.
> 
> Do you want me to just send a followup patchset?
> 
> The major diff between the 2 versions:
> 
> 1. I added the first 2 patches, which fix some bugs in the existing code
> that I found while doing code review and testing another LIO patchset
> plus v1.
> 
> Note: The other day I posted that I thought the 3rd patch in v1 caused
> the bugs, but they were already in the code.
> 
> 2. In v2 I made one of the patches not need the vhost device lock when
> unmapping/mapping LUNs, so you can add new LUNs even if one LUN on the same
> vhost_scsi device was hung.
> 
> Since these aren't regressions in the existing patches, I can just send
> them as a follow-up patchset if that's preferred.

It's ok, I will drop v1 and replace it with v2.
Do you feel any of this is needed in this release?

-- 
MST
