Re: [PATCH net-next v2 00/12] vsock/virtio: continue MSG_ZEROCOPY support

2023-10-04 Thread Stefano Garzarella

On Wed, Oct 04, 2023 at 07:22:04PM +0300, Arseniy Krasnov wrote:



On 04.10.2023 08:25, Arseniy Krasnov wrote:



On 03.10.2023 19:26, Stefano Garzarella wrote:

Hi Arseniy,

On Sun, Oct 01, 2023 at 12:02:56AM +0300, Arseniy Krasnov wrote:

Hello,

this patchset contains second and third parts of another big patchset
for MSG_ZEROCOPY flag support:
https://lore.kernel.org/netdev/20230701063947.3422088-1-avkras...@sberdevices.ru/

During review of this series, Stefano Garzarella 
suggested to split it for three parts to simplify review and merging:

1) virtio and vhost updates (for fragged skbs) (merged to net-next, see
  link below)
2) AF_VSOCK updates (allows to enable MSG_ZEROCOPY mode and read
  tx completions) and update for Documentation/. <-- this patchset
3) Updates for tests and utils. <-- this patchset

Part 1) was merged:
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=71b263e79370348349553ecdf46f4a69eb436dc7

Head for this patchset is:
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=236f3873b517acfaf949c23bb2d5dec13bfd2da2

Link to v1:
https://lore.kernel.org/netdev/20230922052428.4005676-1-avkras...@salutedevices.com/

Changelog:
v1 -> v2:
* Patchset rebased and tested on new HEAD of net-next (see hash above).
* See per-patch changelog after ---.


Thanks for this new version.
I started to include vsock_uring_test in my test suite and tests are
going well.

I reviewed code patches, I still need to review the tests.
I'll do that by the end of the week, but they looks good!


Thanks for review! Ok, I'll wait for tests review, and then send next
version.


Got your comments from review. I'll update patches by:
1) Trying to avoid touching util.c/util.h


I mean, we can touch it ;-) but for this case it looks like we don't
need most of that functions to be there.

At least for now. If we need them to be used in more places, then it
makes sense.


2) Add new header with functions shared between util vsock_perf and
tests


We can do this also later in another PR as cleanup if you prefer.

Thanks,
Stefano

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] mlx5_vdpa: offer VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK

2023-10-04 Thread Michael S. Tsirkin
On Wed, Oct 04, 2023 at 02:56:53PM +0200, Eugenio Perez Martin wrote:
> On Tue, Jul 4, 2023 at 12:16 PM Michael S. Tsirkin  wrote:
> >
> > On Mon, Jul 03, 2023 at 05:26:02PM -0700, Si-Wei Liu wrote:
> > >
> > >
> > > On 7/3/2023 8:46 AM, Michael S. Tsirkin wrote:
> > > > On Mon, Jul 03, 2023 at 04:25:14PM +0200, Eugenio Pérez wrote:
> > > > > Offer this backend feature as mlx5 is compatible with it. It allows it
> > > > > to do live migration with CVQ, dynamically switching between 
> > > > > passthrough
> > > > > and shadow virtqueue.
> > > > >
> > > > > Signed-off-by: Eugenio Pérez 
> > > > Same comment.
> > > to which?
> > >
> > > -Siwei
> >
> > VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK is too narrow a use-case to commit 
> > to it
> > as a kernel/userspace ABI: what if one wants to start rings in some
> > other specific order?
> > As was discussed on list, a better promise is not to access ring
> > until the 1st kick. vdpa can then do a kick when it wants
> > the device to start accessing rings.
> >
> 
> Friendly ping about this series,
> 
> Now that VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK has been merged for
> vdpa_sim, does it make sense for mlx too?
> 
> Thanks!

For sure. I was just busy with a qemu pull, will handle this next.

> > > >
> > > > > ---
> > > > >   drivers/vdpa/mlx5/net/mlx5_vnet.c | 7 +++
> > > > >   1 file changed, 7 insertions(+)
> > > > >
> > > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> > > > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > index 9138ef2fb2c8..5f309a16b9dc 100644
> > > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > @@ -7,6 +7,7 @@
> > > > >   #include 
> > > > >   #include 
> > > > >   #include 
> > > > > +#include 
> > > > >   #include 
> > > > >   #include 
> > > > >   #include 
> > > > > @@ -2499,6 +2500,11 @@ static void unregister_link_notifier(struct 
> > > > > mlx5_vdpa_net *ndev)
> > > > >   flush_workqueue(ndev->mvdev.wq);
> > > > >   }
> > > > > +static u64 mlx5_vdpa_get_backend_features(const struct vdpa_device 
> > > > > *vdpa)
> > > > > +{
> > > > > + return BIT_ULL(VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK);
> > > > > +}
> > > > > +
> > > > >   static int mlx5_vdpa_set_driver_features(struct vdpa_device *vdev, 
> > > > > u64 features)
> > > > >   {
> > > > >   struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> > > > > @@ -3140,6 +3146,7 @@ static const struct vdpa_config_ops 
> > > > > mlx5_vdpa_ops = {
> > > > >   .get_vq_align = mlx5_vdpa_get_vq_align,
> > > > >   .get_vq_group = mlx5_vdpa_get_vq_group,
> > > > >   .get_device_features = mlx5_vdpa_get_device_features,
> > > > > + .get_backend_features = mlx5_vdpa_get_backend_features,
> > > > >   .set_driver_features = mlx5_vdpa_set_driver_features,
> > > > >   .get_driver_features = mlx5_vdpa_get_driver_features,
> > > > >   .set_config_cb = mlx5_vdpa_set_config_cb,
> > > > > --
> > > > > 2.39.3
> >

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH net-next v2 12/12] test/vsock: io_uring rx/tx tests

2023-10-04 Thread Stefano Garzarella

On Sun, Oct 01, 2023 at 12:03:08AM +0300, Arseniy Krasnov wrote:

This adds set of tests which use io_uring for rx/tx. This test suite is
implemented as separated util like 'vsock_test' and has the same set of
input arguments as 'vsock_test'. These tests only cover cases of data
transmission (no connect/bind/accept etc).

Signed-off-by: Arseniy Krasnov 
---
Changelog:
v1 -> v2:
 * Add 'LDLIBS = -luring' to the target 'vsock_uring_test'.
 * Add 'vsock_uring_test' to the target 'test'.

tools/testing/vsock/Makefile   |   7 +-
tools/testing/vsock/vsock_uring_test.c | 321 +
2 files changed, 326 insertions(+), 2 deletions(-)
create mode 100644 tools/testing/vsock/vsock_uring_test.c

diff --git a/tools/testing/vsock/Makefile b/tools/testing/vsock/Makefile
index 1a26f60a596c..b80e7c7def1e 100644
--- a/tools/testing/vsock/Makefile
+++ b/tools/testing/vsock/Makefile
@@ -1,12 +1,15 @@
# SPDX-License-Identifier: GPL-2.0-only
all: test vsock_perf
-test: vsock_test vsock_diag_test
+test: vsock_test vsock_diag_test vsock_uring_test
vsock_test: vsock_test.o vsock_test_zerocopy.o timeout.o control.o util.o
vsock_diag_test: vsock_diag_test.o timeout.o control.o util.o
vsock_perf: vsock_perf.o

+vsock_uring_test: LDLIBS = -luring
+vsock_uring_test: control.o util.o vsock_uring_test.o timeout.o
+
CFLAGS += -g -O2 -Werror -Wall -I. -I../../include -I../../../usr/include 
-Wno-pointer-sign -fno-strict-overflow -fno-strict-aliasing -fno-common -MMD 
-U_FORTIFY_SOURCE -D_GNU_SOURCE
.PHONY: all test clean
clean:
-   ${RM} *.o *.d vsock_test vsock_diag_test vsock_perf
+   ${RM} *.o *.d vsock_test vsock_diag_test vsock_perf vsock_uring_test
-include *.d
diff --git a/tools/testing/vsock/vsock_uring_test.c 
b/tools/testing/vsock/vsock_uring_test.c
new file mode 100644
index ..725895350697
--- /dev/null
+++ b/tools/testing/vsock/vsock_uring_test.c
@@ -0,0 +1,321 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* io_uring tests for vsock
+ *
+ * Copyright (C) 2023 SberDevices.
+ *
+ * Author: Arseniy Krasnov 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "util.h"
+#include "control.h"
+
+#define PAGE_SIZE  4096
+#define RING_ENTRIES_NUM   4
+
+static struct vsock_test_data test_data_array[] = {


Ah, I see vsock_test_data is used here, but we are using a subset
of fields that are not exposed outside of this file.

So, let's define a custom struct in this file for this
(e.g. struct vsock_io_uring_tests)

The rest LGTM!


+   /* All elements have page aligned base and size. */
+   {
+   .vecs_cnt = 3,
+   {
+   { NULL, PAGE_SIZE },
+   { NULL, 2 * PAGE_SIZE },
+   { NULL, 3 * PAGE_SIZE },
+   }
+   },
+   /* Middle element has both non-page aligned base and size. */
+   {
+   .vecs_cnt = 3,
+   {
+   { NULL, PAGE_SIZE },
+   { (void *)1, 200  },
+   { NULL, 3 * PAGE_SIZE },
+   }
+   }
+};
+
+static void vsock_io_uring_client(const struct test_opts *opts,
+ const struct vsock_test_data *test_data,
+ bool msg_zerocopy)
+{
+   struct io_uring_sqe *sqe;
+   struct io_uring_cqe *cqe;
+   struct io_uring ring;
+   struct iovec *iovec;
+   struct msghdr msg;
+   int fd;
+
+   fd = vsock_stream_connect(opts->peer_cid, 1234);
+   if (fd < 0) {
+   perror("connect");
+   exit(EXIT_FAILURE);
+   }
+
+   if (msg_zerocopy)
+   enable_so_zerocopy(fd);
+
+   iovec = iovec_from_test_data(test_data);
+
+   if (io_uring_queue_init(RING_ENTRIES_NUM, , 0))
+   error(1, errno, "io_uring_queue_init");
+
+   if (io_uring_register_buffers(, iovec, test_data->vecs_cnt))
+   error(1, errno, "io_uring_register_buffers");
+
+   memset(, 0, sizeof(msg));
+   msg.msg_iov = iovec;
+   msg.msg_iovlen = test_data->vecs_cnt;
+   sqe = io_uring_get_sqe();
+
+   if (msg_zerocopy)
+   io_uring_prep_sendmsg_zc(sqe, fd, , 0);
+   else
+   io_uring_prep_sendmsg(sqe, fd, , 0);
+
+   if (io_uring_submit() != 1)
+   error(1, errno, "io_uring_submit");
+
+   if (io_uring_wait_cqe(, ))
+   error(1, errno, "io_uring_wait_cqe");
+
+   io_uring_cqe_seen(, cqe);
+
+   control_writeulong(iovec_hash_djb2(iovec, test_data->vecs_cnt));
+
+   control_writeln("DONE");
+   io_uring_queue_exit();
+   free_iovec_test_data(test_data, iovec);
+   close(fd);
+}
+
+static void vsock_io_uring_server(const struct test_opts *opts,
+ const struct vsock_test_data *test_data)
+{
+   unsigned long remote_hash;
+   unsigned long local_hash;
+ 

Re: [PATCH net-next v2 11/12] test/vsock: MSG_ZEROCOPY support for vsock_perf

2023-10-04 Thread Stefano Garzarella

On Sun, Oct 01, 2023 at 12:03:07AM +0300, Arseniy Krasnov wrote:

To use this option pass '--zc' parameter:


--zerocopy would be better IMHO



./vsock_perf --zc --sender  --port  --bytes 

With this option MSG_ZEROCOPY flag will be passed to the 'send()' call.

Signed-off-by: Arseniy Krasnov 
---
tools/testing/vsock/vsock_perf.c | 143 +--
1 file changed, 134 insertions(+), 9 deletions(-)

diff --git a/tools/testing/vsock/vsock_perf.c b/tools/testing/vsock/vsock_perf.c
index a72520338f84..f0f183f3f9e8 100644
--- a/tools/testing/vsock/vsock_perf.c
+++ b/tools/testing/vsock/vsock_perf.c
@@ -18,6 +18,8 @@
#include 
#include 
#include 
+#include 
+#include 

#define DEFAULT_BUF_SIZE_BYTES  (128 * 1024)
#define DEFAULT_TO_SEND_BYTES   (64 * 1024)
@@ -28,9 +30,18 @@
#define BYTES_PER_GB(1024 * 1024 * 1024ULL)
#define NSEC_PER_SEC(10ULL)

+#ifndef SOL_VSOCK
+#define SOL_VSOCK  287
+#endif
+
+#ifndef VSOCK_RECVERR
+#define VSOCK_RECVERR  1
+#endif
+
static unsigned int port = DEFAULT_PORT;
static unsigned long buf_size_bytes = DEFAULT_BUF_SIZE_BYTES;
static unsigned long vsock_buf_bytes = DEFAULT_VSOCK_BUF_BYTES;
+static bool zerocopy;

static void error(const char *s)
{
@@ -247,15 +258,76 @@ static void run_receiver(unsigned long rcvlowat_bytes)
close(fd);
}

+static void recv_completion(int fd)
+{
+   struct sock_extended_err *serr;
+   char cmsg_data[128];
+   struct cmsghdr *cm;
+   struct msghdr msg = { 0 };
+   ssize_t ret;
+
+   msg.msg_control = cmsg_data;
+   msg.msg_controllen = sizeof(cmsg_data);
+
+   ret = recvmsg(fd, , MSG_ERRQUEUE);
+   if (ret) {
+   fprintf(stderr, "recvmsg: failed to read err: %zi\n", ret);
+   return;
+   }
+
+   cm = CMSG_FIRSTHDR();
+   if (!cm) {
+   fprintf(stderr, "cmsg: no cmsg\n");
+   return;
+   }
+
+   if (cm->cmsg_level != SOL_VSOCK) {
+   fprintf(stderr, "cmsg: unexpected 'cmsg_level'\n");
+   return;
+   }
+
+   if (cm->cmsg_type != VSOCK_RECVERR) {
+   fprintf(stderr, "cmsg: unexpected 'cmsg_type'\n");
+   return;
+   }
+
+   serr = (void *)CMSG_DATA(cm);
+   if (serr->ee_origin != SO_EE_ORIGIN_ZEROCOPY) {
+   fprintf(stderr, "serr: wrong origin\n");
+   return;
+   }
+
+   if (serr->ee_errno) {
+   fprintf(stderr, "serr: wrong error code\n");
+   return;
+   }
+
+   if (zerocopy && (serr->ee_code & SO_EE_CODE_ZEROCOPY_COPIED))
+   fprintf(stderr, "warning: copy instead of zerocopy\n");
+}
+
+static void enable_so_zerocopy(int fd)
+{
+   int val = 1;
+
+   if (setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, , sizeof(val)))
+   error("setsockopt(SO_ZEROCOPY)");
+}


We use enable_so_zerocopy() in a single place, maybe we can put this
code there.

Anyway it seems we are copy & paste some codes from util, etc.

Would make sense create a new header to use on both tests and perf?



+
static void run_sender(int peer_cid, unsigned long to_send_bytes)
{
time_t tx_begin_ns;
time_t tx_total_ns;
size_t total_send;
+   time_t time_in_send;
void *data;
int fd;

-   printf("Run as sender\n");
+   if (zerocopy)
+   printf("Run as sender MSG_ZEROCOPY\n");
+   else
+   printf("Run as sender\n");
+
printf("Connect to %i:%u\n", peer_cid, port);
printf("Send %lu bytes\n", to_send_bytes);
printf("TX buffer %lu bytes\n", buf_size_bytes);
@@ -265,38 +337,82 @@ static void run_sender(int peer_cid, unsigned long 
to_send_bytes)
if (fd < 0)
exit(EXIT_FAILURE);

-   data = malloc(buf_size_bytes);
+   if (zerocopy) {
+   enable_so_zerocopy(fd);

-   if (!data) {
-   fprintf(stderr, "'malloc()' failed\n");
-   exit(EXIT_FAILURE);
+   data = mmap(NULL, buf_size_bytes, PROT_READ | PROT_WRITE,
+   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+   if (data == MAP_FAILED) {
+   perror("mmap");
+   exit(EXIT_FAILURE);
+   }
+   } else {
+   data = malloc(buf_size_bytes);
+
+   if (!data) {
+   fprintf(stderr, "'malloc()' failed\n");
+   exit(EXIT_FAILURE);
+   }
}

memset(data, 0, buf_size_bytes);
total_send = 0;
+   time_in_send = 0;
tx_begin_ns = current_nsec();

while (total_send < to_send_bytes) {
ssize_t sent;
+   size_t rest_bytes;
+   time_t before;

-   sent = write(fd, data, buf_size_bytes);
+   rest_bytes = to_send_bytes - total_send;
+
+   before = current_nsec();
+   sent = send(fd, data, 

Re: [PATCH net-next v2 10/12] test/vsock: MSG_ZEROCOPY flag tests

2023-10-04 Thread Stefano Garzarella

On Sun, Oct 01, 2023 at 12:03:06AM +0300, Arseniy Krasnov wrote:

This adds three tests for MSG_ZEROCOPY feature:
1) SOCK_STREAM tx with different buffers.
2) SOCK_SEQPACKET tx with different buffers.
3) SOCK_STREAM test to read empty error queue of the socket.

Signed-off-by: Arseniy Krasnov 
---
Changelog:
v1 -> v2:
 * Move 'SOL_VSOCK' and 'VSOCK_RECVERR' from 'util.c' to 'util.h'.

tools/testing/vsock/Makefile  |   2 +-
tools/testing/vsock/util.c| 214 +++
tools/testing/vsock/util.h|  27 ++
tools/testing/vsock/vsock_test.c  |  16 ++
tools/testing/vsock/vsock_test_zerocopy.c | 314 ++
tools/testing/vsock/vsock_test_zerocopy.h |  15 ++
6 files changed, 587 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/vsock/vsock_test_zerocopy.c
create mode 100644 tools/testing/vsock/vsock_test_zerocopy.h

diff --git a/tools/testing/vsock/Makefile b/tools/testing/vsock/Makefile
index 21a98ba565ab..1a26f60a596c 100644
--- a/tools/testing/vsock/Makefile
+++ b/tools/testing/vsock/Makefile
@@ -1,7 +1,7 @@
# SPDX-License-Identifier: GPL-2.0-only
all: test vsock_perf
test: vsock_test vsock_diag_test
-vsock_test: vsock_test.o timeout.o control.o util.o
+vsock_test: vsock_test.o vsock_test_zerocopy.o timeout.o control.o util.o
vsock_diag_test: vsock_diag_test.o timeout.o control.o util.o
vsock_perf: vsock_perf.o

diff --git a/tools/testing/vsock/util.c b/tools/testing/vsock/util.c
index 6779d5008b27..2a641ab38f08 100644
--- a/tools/testing/vsock/util.c
+++ b/tools/testing/vsock/util.c
@@ -11,10 +11,14 @@
#include 
#include 
#include 
+#include 
#include 
#include 
#include 
#include 
+#include 
+#include 
+#include 

#include "timeout.h"
#include "control.h"
@@ -444,3 +448,213 @@ unsigned long hash_djb2(const void *data, size_t len)

return hash;
}
+
+void enable_so_zerocopy(int fd)
+{
+   int val = 1;
+
+   if (setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, , sizeof(val))) {
+   perror("setsockopt");
+   exit(EXIT_FAILURE);
+   }
+}
+
+static void *mmap_no_fail(size_t bytes)
+{
+   void *res;
+
+   res = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
+  MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
+   if (res == MAP_FAILED) {
+   perror("mmap");
+   exit(EXIT_FAILURE);
+   }
+
+   return res;
+}
+
+size_t iovec_bytes(const struct iovec *iov, size_t iovnum)
+{
+   size_t bytes;
+   int i;
+
+   for (bytes = 0, i = 0; i < iovnum; i++)
+   bytes += iov[i].iov_len;
+
+   return bytes;
+}
+
+static void iovec_random_init(struct iovec *iov,
+ const struct vsock_test_data *test_data)
+{
+   int i;
+
+   for (i = 0; i < test_data->vecs_cnt; i++) {
+   int j;
+
+   if (test_data->vecs[i].iov_base == MAP_FAILED)
+   continue;
+
+   for (j = 0; j < iov[i].iov_len; j++)
+   ((uint8_t *)iov[i].iov_base)[j] = rand() & 0xff;
+   }
+}
+
+unsigned long iovec_hash_djb2(struct iovec *iov, size_t iovnum)
+{
+   unsigned long hash;
+   size_t iov_bytes;
+   size_t offs;
+   void *tmp;
+   int i;
+
+   iov_bytes = iovec_bytes(iov, iovnum);
+
+   tmp = malloc(iov_bytes);
+   if (!tmp) {
+   perror("malloc");
+   exit(EXIT_FAILURE);
+   }
+
+   for (offs = 0, i = 0; i < iovnum; i++) {
+   memcpy(tmp + offs, iov[i].iov_base, iov[i].iov_len);
+   offs += iov[i].iov_len;
+   }
+
+   hash = hash_djb2(tmp, iov_bytes);
+   free(tmp);
+
+   return hash;
+}
+
+struct iovec *iovec_from_test_data(const struct vsock_test_data *test_data)
+{
+   const struct iovec *test_iovec;
+   struct iovec *iovec;
+   int i;
+
+   iovec = malloc(sizeof(*iovec) * test_data->vecs_cnt);
+   if (!iovec) {
+   perror("malloc");
+   exit(EXIT_FAILURE);
+   }
+
+   test_iovec = test_data->vecs;
+
+   for (i = 0; i < test_data->vecs_cnt; i++) {
+   iovec[i].iov_len = test_iovec[i].iov_len;
+   iovec[i].iov_base = mmap_no_fail(test_iovec[i].iov_len);
+
+   if (test_iovec[i].iov_base != MAP_FAILED &&
+   test_iovec[i].iov_base)
+   iovec[i].iov_base += (uintptr_t)test_iovec[i].iov_base;
+   }
+
+   /* Unmap "invalid" elements. */
+   for (i = 0; i < test_data->vecs_cnt; i++) {
+   if (test_iovec[i].iov_base == MAP_FAILED) {
+   if (munmap(iovec[i].iov_base, iovec[i].iov_len)) {
+   perror("munmap");
+   exit(EXIT_FAILURE);
+   }
+   }
+   }
+
+   iovec_random_init(iovec, test_data);
+
+   return iovec;
+}
+
+void free_iovec_test_data(const struct vsock_test_data *test_data,
+