Re: [PATCH v11 08/10] virtio-gpu: Handle resource blob commands

2024-05-15 Thread Akihiko Odaki

On 2024/05/16 2:15, Dmitry Osipenko wrote:

On 5/15/24 20:04, Akihiko Odaki wrote:




VIRTIO_GPU_CMD_RESOURCE_UNREF should also call
virtio_gpu_virgl_async_unmap_resource_blob(). I guess that's the
original intention of having a separate function for this instead of
inlining its content into virgl_cmd_resource_unmap_blob().


Correct, previous patchset versions unmapped the resource on unref.

In v11 I dropped unmapping from unref to avoid adding an additional
`async_unmap_in_progress` flag, because normally map/unmap will be
balanced by the guest anyway.

The virtio-gpu spec doesn't say that a resource has to be implicitly
unmapped on unref. In the case of a Linux guest, it would actually be a
bug to unref a mapped resource because the guest would continue to map
and use the destroyed resource.



An additional `async_unmap_in_progress` flag should not be necessary, as
explained earlier.


It is a valid design not to issue UNMAP_BLOB before UNREF if the VMM
automatically performs the unmapping operation. A guest needs to ensure
the blob is not mapped in a guest userspace virtual address space, but
that does not require issuing UNMAP_BLOB, which unmaps the blob from
the guest physical address space.


In the case of Linux, virtio_gpu_vram_free() calls virtio_gpu_cmd_unmap() to
issue UNMAP_BLOB before UNREF, which is actually not necessary. Linux
still needs to ensure that the blob is not mapped in a guest userspace
virtual address space, but that is done before virtio_gpu_vram_free()
gets called, and virtio_gpu_cmd_unmap() has nothing to do with that.


It is still good practice for a guest to issue UNMAP_BLOB in such a
case because the spec does not say the VMM will automatically unmap the
blob on UNREF, and that's what Linux does. From the VMM perspective,
it's better to perform the unmapping operation on UNREF because the
spec does not say the guest always issues UNMAP_BLOB before UNREF.




Re: [PATCH v11 08/10] virtio-gpu: Handle resource blob commands

2024-05-15 Thread Akihiko Odaki

On 2024/05/16 2:01, Dmitry Osipenko wrote:

On 5/15/24 19:42, Akihiko Odaki wrote:

It may be better to actually implement unmapping instead of returning an
error for consistency with the iov operation. Apparently crosvm also
unmaps blobs with VIRTIO_GPU_CMD_RESOURCE_UNREF.


Then I'll add back `async_unmap_in_progress` because the resource can be
either mapped or unmapped on unref, and we'll need the flag to know whether
async unmapping has finished before doing the final unmapping of the resource.


Such a situation should already be handled, since an unmapping in progress
blocks all commands (not just VIRTIO_GPU_CMD_RESOURCE_UNMAP_BLOB but
literally all of them, including VIRTIO_GPU_CMD_RESOURCE_UNREF).


The async unmapping consists of 3 parts:

1. begin async unmapping with memory_region_del_subregion() and suspend
2. wait for res->mr to be freed and resume
3. finish the unmapping with the final virgl_renderer_resource_unmap()

Parts 1 and 3 are handled by virtio_gpu_virgl_async_unmap_resource_blob().


VIRTIO_GPU_CMD_RESOURCE_UNMAP_BLOB is different because we know that the
blob is mapped in the first place. Hence we can safely perform part 3,
assuming that parts 1/2 have been completed.

In the case of VIRTIO_GPU_CMD_RESOURCE_UNREF, the blob can be unmapped in
the first place, and we can't do part 3 because it will error out for an
unmapped resource since parts 1/2 were not performed.
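The branching described above can be modeled with a small state machine. This is a hypothetical sketch of the logic only — the names `blob_res`, `virgl_unmap()` and `async_unmap()` stand in for the QEMU/virglrenderer code and are not the real API:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the async-unmap flow described above. */
struct blob_res {
    bool mapped;   /* virgl mapping exists */
    bool has_mr;   /* MemoryRegion still alive (parts 1/2 not done yet) */
};

/* Part 3: final virgl unmap; errors out if the blob was never mapped. */
static int virgl_unmap(struct blob_res *r)
{
    if (!r->mapped) {
        return -EINVAL;  /* this is the failure UNREF would hit */
    }
    r->mapped = false;
    return 0;
}

/* Parts 1+3 combined: if the MR is alive, start the async teardown and
 * suspend command processing; otherwise finish with the final unmap. */
static int async_unmap(struct blob_res *r, bool *suspended)
{
    if (r->has_mr) {
        r->has_mr = false;   /* part 1: del_subregion, wait for MR free */
        *suspended = true;   /* part 2 happens asynchronously */
        return 0;
    }
    *suspended = false;
    return virgl_unmap(r);   /* part 3 */
}
```

A mapped resource first suspends (parts 1/2), then a second call performs part 3; calling this on a never-mapped resource fails, which is exactly the UNREF problem described above.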



VIRTIO_GPU_CMD_RESOURCE_UNREF should also call
virtio_gpu_virgl_async_unmap_resource_blob(). I guess that's the
original intention of having a separate function for this instead of
inlining its content into virgl_cmd_resource_unmap_blob().




Re: [PATCH v11 08/10] virtio-gpu: Handle resource blob commands

2024-05-15 Thread Akihiko Odaki

On 2024/05/16 1:39, Dmitry Osipenko wrote:

On 5/13/24 12:18, Akihiko Odaki wrote:

     static void virgl_cmd_resource_unref(VirtIOGPU *g,
- struct virtio_gpu_ctrl_command *cmd)
+ struct virtio_gpu_ctrl_command *cmd,
+ bool *cmd_suspended)


This parameter is not used, as the function returns an error if the
resource is still mapped.


I missed removing it by accident.


It may be better to actually implement unmapping instead of returning an
error for consistency with the iov operation. Apparently crosvm also
unmaps blobs with VIRTIO_GPU_CMD_RESOURCE_UNREF.


Then I'll add back `async_unmap_in_progress` because the resource can be
either mapped or unmapped on unref, and we'll need the flag to know whether
async unmapping has finished before doing the final unmapping of the resource.


Such a situation should already be handled, since an unmapping in progress
blocks all commands (not just VIRTIO_GPU_CMD_RESOURCE_UNMAP_BLOB but
literally all of them, including VIRTIO_GPU_CMD_RESOURCE_UNREF).




Re: [PATCH v11 01/10] virtio-gpu: Unrealize GL device

2024-05-15 Thread Akihiko Odaki

On 2024/05/16 1:18, Dmitry Osipenko wrote:

On 5/13/24 11:44, Akihiko Odaki wrote:

On 2024/05/12 3:22, Dmitry Osipenko wrote:

Even though the GL GPU doesn't support hotplugging today, free virgl
resources when the GL device is unrealized, for consistency.

Signed-off-by: Dmitry Osipenko 
---
   hw/display/virtio-gpu-gl.c | 11 +++
   hw/display/virtio-gpu-virgl.c  |  9 +
   include/hw/virtio/virtio-gpu.h |  1 +
   3 files changed, 21 insertions(+)

diff --git a/hw/display/virtio-gpu-gl.c b/hw/display/virtio-gpu-gl.c
index e06be60dfbfc..0c0a8d136954 100644
--- a/hw/display/virtio-gpu-gl.c
+++ b/hw/display/virtio-gpu-gl.c
@@ -136,6 +136,16 @@ static Property virtio_gpu_gl_properties[] = {
   DEFINE_PROP_END_OF_LIST(),
   };
   +static void virtio_gpu_gl_device_unrealize(DeviceState *qdev)
+{
+    VirtIOGPU *g = VIRTIO_GPU(qdev);
+    VirtIOGPUGL *gl = VIRTIO_GPU_GL(qdev);
+
+    if (gl->renderer_inited) {
+    virtio_gpu_virgl_deinit(g);
+    }
+}
+
   static void virtio_gpu_gl_class_init(ObjectClass *klass, void *data)
   {
   DeviceClass *dc = DEVICE_CLASS(klass);
@@ -149,6 +159,7 @@ static void virtio_gpu_gl_class_init(ObjectClass *klass, void *data)
   vgc->update_cursor_data = virtio_gpu_gl_update_cursor_data;
     vdc->realize = virtio_gpu_gl_device_realize;
+    vdc->unrealize = virtio_gpu_gl_device_unrealize;
   vdc->reset = virtio_gpu_gl_reset;
   device_class_set_props(dc, virtio_gpu_gl_properties);
   }
diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
index 9f34d0e6619c..b0500eccf8e0 100644
--- a/hw/display/virtio-gpu-virgl.c
+++ b/hw/display/virtio-gpu-virgl.c
@@ -665,3 +665,12 @@ int virtio_gpu_virgl_get_num_capsets(VirtIOGPU *g)
     return capset2_max_ver ? 2 : 1;
   }
+
+void virtio_gpu_virgl_deinit(VirtIOGPU *g)
+{
+    if (g->fence_poll) {


Isn't g->fence_poll always non-NULL when this function is called?


virtio_gpu_virgl_init() is invoked when the first cmd is executed; see
virtio_gpu_gl_handle_ctrl(), which invokes it. Hence g->fence_poll can
be NULL.



But it already checks renderer_inited, doesn't it? And I think it's
better to use a single flag to represent that virgl is enabled
instead of checking several variables (fence_poll and cmdq_resume_bh in
the future).
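The suggestion above — gating teardown on one flag instead of per-member NULL checks — can be sketched as follows. This is a hypothetical model, not the QEMU code; `gl_state`, `virgl_init()` and friends are stand-in names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch: gate all teardown on a single renderer_inited flag
 * instead of checking fence_poll, cmdq_resume_bh, ... individually. */
struct gl_state {
    bool renderer_inited;
    int *fence_poll;       /* stand-ins for timers/BHs created at init */
    int *cmdq_resume_bh;
};

static int timer_storage, bh_storage;

static void virgl_init(struct gl_state *s)
{
    s->fence_poll = &timer_storage;
    s->cmdq_resume_bh = &bh_storage;
    s->renderer_inited = true;
}

static void virgl_deinit(struct gl_state *s)
{
    /* The caller checked renderer_inited once, so everything below may
     * assume init completed; no per-member NULL checks are needed. */
    s->fence_poll = NULL;
    s->cmdq_resume_bh = NULL;
    s->renderer_inited = false;
}

static void device_unrealize(struct gl_state *s)
{
    if (s->renderer_inited) {
        virgl_deinit(s);
    }
}
```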




Re: [PATCH v11 09/10] virtio-gpu: Register capsets dynamically

2024-05-13 Thread Akihiko Odaki

On 2024/05/12 3:22, Dmitry Osipenko wrote:

From: Pierre-Eric Pelloux-Prayer 

virtio_gpu_virgl_get_num_capsets() will return "num_capsets", but we can't
assume that capset_index 1 is always VIRGL2 once we support more capsets,
like the Venus and DRM capsets. Register capsets dynamically to avoid that
problem.

Signed-off-by: Pierre-Eric Pelloux-Prayer 
Signed-off-by: Dmitry Osipenko 
---
  hw/display/virtio-gpu-gl.c |  6 --
  hw/display/virtio-gpu-virgl.c  | 33 +
  include/hw/virtio/virtio-gpu.h |  4 +++-
  3 files changed, 28 insertions(+), 15 deletions(-)

diff --git a/hw/display/virtio-gpu-gl.c b/hw/display/virtio-gpu-gl.c
index 95806999189e..431fc2881a00 100644
--- a/hw/display/virtio-gpu-gl.c
+++ b/hw/display/virtio-gpu-gl.c
@@ -124,8 +124,8 @@ static void virtio_gpu_gl_device_realize(DeviceState *qdev, Error **errp)
  }
  
  g->parent_obj.conf.flags |= (1 << VIRTIO_GPU_FLAG_VIRGL_ENABLED);

-VIRTIO_GPU_BASE(g)->virtio_config.num_capsets =
-virtio_gpu_virgl_get_num_capsets(g);
+g->capset_ids = virtio_gpu_virgl_get_capsets(g);
+VIRTIO_GPU_BASE(g)->virtio_config.num_capsets = g->capset_ids->len;
  
  #ifdef HAVE_VIRGL_CONTEXT_CREATE_WITH_FLAGS

  g->parent_obj.conf.flags |= 1 << VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLED;
@@ -148,6 +148,8 @@ static void virtio_gpu_gl_device_unrealize(DeviceState *qdev)
  if (gl->renderer_inited) {
  virtio_gpu_virgl_deinit(g);
  }
+
+g_array_unref(g->capset_ids);
  }
  
  static void virtio_gpu_gl_class_init(ObjectClass *klass, void *data)

diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
index 3f2e406be3a4..135974431492 100644
--- a/hw/display/virtio-gpu-virgl.c
+++ b/hw/display/virtio-gpu-virgl.c
@@ -590,19 +590,13 @@ static void virgl_cmd_get_capset_info(VirtIOGPU *g,
  VIRTIO_GPU_FILL_CMD(info);
  
   memset(&resp, 0, sizeof(resp));

-if (info.capset_index == 0) {
-resp.capset_id = VIRTIO_GPU_CAPSET_VIRGL;
-virgl_renderer_get_cap_set(resp.capset_id,
-   &resp.capset_max_version,
-   &resp.capset_max_size);
-} else if (info.capset_index == 1) {
-resp.capset_id = VIRTIO_GPU_CAPSET_VIRGL2;
+
+if (info.capset_index < g->capset_ids->len) {
+resp.capset_id = g_array_index(g->capset_ids, uint32_t,
+   info.capset_index);
   virgl_renderer_get_cap_set(resp.capset_id,
  &resp.capset_max_version,
  &resp.capset_max_size);
-} else {
-resp.capset_max_version = 0;
-resp.capset_max_size = 0;
  }
  resp.hdr.type = VIRTIO_GPU_RESP_OK_CAPSET_INFO;
   virtio_gpu_ctrl_response(g, cmd, &resp, sizeof(resp));
@@ -1123,14 +1117,29 @@ int virtio_gpu_virgl_init(VirtIOGPU *g)
  return 0;
  }
  
-int virtio_gpu_virgl_get_num_capsets(VirtIOGPU *g)

+static void virtio_gpu_virgl_add_capset(GArray *capset_ids, uint32_t capset_id)
+{
+g_array_append_val(capset_ids, capset_id);
+}


Is it worthwhile to have a function for this?
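The dynamic lookup introduced by the quoted virgl_cmd_get_capset_info() hunk — index into an array of registered capset IDs, with out-of-range indices answered by an all-zero response — can be sketched in plain C. Hypothetical names stand in for the GArray-based code; the version/size values would really come from virgl_renderer_get_cap_set():

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for registered capset IDs. */
#define CAPSET_VIRGL  1u
#define CAPSET_VIRGL2 2u
#define CAPSET_VENUS  4u

struct capset_info {
    uint32_t capset_id;
    uint32_t max_version;
    uint32_t max_size;
};

/* Answer GET_CAPSET_INFO: a valid index maps to a registered ID;
 * anything out of range yields a zeroed response. */
static struct capset_info get_capset_info(const uint32_t *ids, size_t len,
                                          uint32_t index)
{
    struct capset_info resp = {0};

    if (index < len) {
        resp.capset_id = ids[index];
        resp.max_version = 1;  /* placeholder for virgl-reported values */
        resp.max_size = 64;
    }
    return resp;
}
```

This removes the hard-coded "index 0 is VIRGL, index 1 is VIRGL2" assumption the commit message warns about.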



Re: [PATCH v11 08/10] virtio-gpu: Handle resource blob commands

2024-05-13 Thread Akihiko Odaki

On 2024/05/12 3:22, Dmitry Osipenko wrote:

From: Antonio Caggiano 

Support BLOB resources creation, mapping and unmapping by calling the
new stable virglrenderer 0.10 interface. Only enabled when available and
via the blob config. E.g. -device virtio-vga-gl,blob=true

Signed-off-by: Antonio Caggiano 
Signed-off-by: Xenia Ragiadakou 
Signed-off-by: Huang Rui 
Signed-off-by: Dmitry Osipenko 
---
  hw/display/virtio-gpu-virgl.c  | 274 -
  hw/display/virtio-gpu.c|   4 +-
  include/hw/virtio/virtio-gpu.h |   2 +
  3 files changed, 277 insertions(+), 3 deletions(-)

diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
index 524f15220b7f..3f2e406be3a4 100644
--- a/hw/display/virtio-gpu-virgl.c
+++ b/hw/display/virtio-gpu-virgl.c
@@ -26,6 +26,7 @@
  
  struct virtio_gpu_virgl_resource {

  struct virtio_gpu_simple_resource base;
+MemoryRegion *mr;
  };
  
  static struct virtio_gpu_virgl_resource *

@@ -49,6 +50,117 @@ virgl_get_egl_display(G_GNUC_UNUSED void *cookie)
  }
  #endif
  
+#ifdef HAVE_VIRGL_RESOURCE_BLOB

+struct virtio_gpu_virgl_hostmem_region {
+MemoryRegion mr;
+struct VirtIOGPU *g;
+struct virtio_gpu_virgl_resource *res;
+};
+
+static void virtio_gpu_virgl_resume_cmdq_bh(void *opaque)
+{
+VirtIOGPU *g = opaque;
+
+virtio_gpu_process_cmdq(g);
+}
+
+static void virtio_gpu_virgl_hostmem_region_free(void *obj)
+{
+MemoryRegion *mr = MEMORY_REGION(obj);
+struct virtio_gpu_virgl_hostmem_region *vmr;
+VirtIOGPUBase *b;
+
+vmr = container_of(mr, struct virtio_gpu_virgl_hostmem_region, mr);
+vmr->res->mr = NULL;
+
+b = VIRTIO_GPU_BASE(vmr->g);
+b->renderer_blocked--;
+
+/*
+ * memory_region_unref() is executed from RCU thread context, while
+ * virglrenderer works only on the main-loop thread that's holding GL
+ * context.
+ */
+qemu_bh_schedule(vmr->g->cmdq_resume_bh);
+g_free(vmr);
+}
+
+static int
+virtio_gpu_virgl_map_resource_blob(VirtIOGPU *g,
+   struct virtio_gpu_virgl_resource *res,
+   uint64_t offset)
+{
+struct virtio_gpu_virgl_hostmem_region *vmr;
+VirtIOGPUBase *b = VIRTIO_GPU_BASE(g);
+MemoryRegion *mr;
+uint64_t size;
+void *data;
+int ret;
+
+if (!virtio_gpu_hostmem_enabled(b->conf)) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: hostmem disabled\n", __func__);
+return -EOPNOTSUPP;
+}
+
+ret = virgl_renderer_resource_map(res->base.resource_id, &data, &size);
+if (ret) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: failed to map virgl resource: %s\n",
+  __func__, strerror(-ret));
+return ret;
+}
+
+vmr = g_new0(struct virtio_gpu_virgl_hostmem_region, 1);
+vmr->res = res;
+vmr->g = g;
+
+mr = &vmr->mr;
+memory_region_init_ram_ptr(mr, OBJECT(mr), "blob", size, data);
+memory_region_add_subregion(&b->hostmem, offset, mr);
+memory_region_set_enabled(mr, true);
+
+/*
+ * MR could outlive the resource if MR's reference is held outside of
+ * virtio-gpu. In order to prevent unmapping resource while MR is alive,
+ * and thus, making the data pointer invalid, we will block virtio-gpu
+ * command processing until MR is fully unreferenced and freed.
+ */
+OBJECT(mr)->free = virtio_gpu_virgl_hostmem_region_free;
+
+res->mr = mr;
+
+return 0;
+}
+
+static int
+virtio_gpu_virgl_async_unmap_resource_blob(VirtIOGPU *g,
+   struct virtio_gpu_virgl_resource *res)
+{
+VirtIOGPUBase *b = VIRTIO_GPU_BASE(g);
+MemoryRegion *mr = res->mr;
+int ret;
+
+if (mr) {
+/* render will be unblocked once MR is freed */
+b->renderer_blocked++;
+
+/* memory region owns self res->mr object and frees it by itself */
+memory_region_set_enabled(mr, false);
+memory_region_del_subregion(&b->hostmem, mr);
+object_unparent(OBJECT(mr));
+} else {
+ret = virgl_renderer_resource_unmap(res->base.resource_id);
+if (ret) {
+qemu_log_mask(LOG_GUEST_ERROR,
+  "%s: failed to unmap virgl resource: %s\n",
+  __func__, strerror(-ret));
+return ret;
+}
+}
+
+return 0;
+}
+#endif /* HAVE_VIRGL_RESOURCE_BLOB */
+
  static void virgl_cmd_create_resource_2d(VirtIOGPU *g,
   struct virtio_gpu_ctrl_command *cmd)
  {
@@ -146,7 +258,8 @@ static void virgl_cmd_create_resource_3d(VirtIOGPU *g,
  }
  
  static void virgl_cmd_resource_unref(VirtIOGPU *g,

- struct virtio_gpu_ctrl_command *cmd)
+ struct virtio_gpu_ctrl_command *cmd,
+ bool *cmd_suspended)


This parameter is not used, as the function returns an error if the
resource is still mapped.


It may be better to actually implement unmapping instead of returning an
error for consistency with the iov operation. Apparently crosvm also
unmaps blobs with VIRTIO_GPU_CMD_RESOURCE_UNREF.

Re: [PATCH v11 01/10] virtio-gpu: Unrealize GL device

2024-05-13 Thread Akihiko Odaki

On 2024/05/12 3:22, Dmitry Osipenko wrote:

Even though the GL GPU doesn't support hotplugging today, free virgl
resources when the GL device is unrealized, for consistency.

Signed-off-by: Dmitry Osipenko 
---
  hw/display/virtio-gpu-gl.c | 11 +++
  hw/display/virtio-gpu-virgl.c  |  9 +
  include/hw/virtio/virtio-gpu.h |  1 +
  3 files changed, 21 insertions(+)

diff --git a/hw/display/virtio-gpu-gl.c b/hw/display/virtio-gpu-gl.c
index e06be60dfbfc..0c0a8d136954 100644
--- a/hw/display/virtio-gpu-gl.c
+++ b/hw/display/virtio-gpu-gl.c
@@ -136,6 +136,16 @@ static Property virtio_gpu_gl_properties[] = {
  DEFINE_PROP_END_OF_LIST(),
  };
  
+static void virtio_gpu_gl_device_unrealize(DeviceState *qdev)

+{
+VirtIOGPU *g = VIRTIO_GPU(qdev);
+VirtIOGPUGL *gl = VIRTIO_GPU_GL(qdev);
+
+if (gl->renderer_inited) {
+virtio_gpu_virgl_deinit(g);
+}
+}
+
  static void virtio_gpu_gl_class_init(ObjectClass *klass, void *data)
  {
  DeviceClass *dc = DEVICE_CLASS(klass);
@@ -149,6 +159,7 @@ static void virtio_gpu_gl_class_init(ObjectClass *klass, void *data)
  vgc->update_cursor_data = virtio_gpu_gl_update_cursor_data;
  
  vdc->realize = virtio_gpu_gl_device_realize;

+vdc->unrealize = virtio_gpu_gl_device_unrealize;
  vdc->reset = virtio_gpu_gl_reset;
  device_class_set_props(dc, virtio_gpu_gl_properties);
  }
diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
index 9f34d0e6619c..b0500eccf8e0 100644
--- a/hw/display/virtio-gpu-virgl.c
+++ b/hw/display/virtio-gpu-virgl.c
@@ -665,3 +665,12 @@ int virtio_gpu_virgl_get_num_capsets(VirtIOGPU *g)
  
  return capset2_max_ver ? 2 : 1;

  }
+
+void virtio_gpu_virgl_deinit(VirtIOGPU *g)
+{
+if (g->fence_poll) {


Isn't g->fence_poll always non-NULL when this function is called?


+timer_free(g->fence_poll);
+}
+
+virgl_renderer_cleanup(NULL);
+}
diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
index ed44cdad6b34..b657187159d9 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -336,6 +336,7 @@ void virtio_gpu_virgl_fence_poll(VirtIOGPU *g);
  void virtio_gpu_virgl_reset_scanout(VirtIOGPU *g);
  void virtio_gpu_virgl_reset(VirtIOGPU *g);
  int virtio_gpu_virgl_init(VirtIOGPU *g);
+void virtio_gpu_virgl_deinit(VirtIOGPU *g);
  int virtio_gpu_virgl_get_num_capsets(VirtIOGPU *g);
  
  #endif




Re: [PATCH v8 07/11] virtio-gpu: Support suspension of commands processing

2024-05-10 Thread Akihiko Odaki

On 2024/05/09 21:39, Dmitry Osipenko wrote:

On 5/5/24 09:37, Akihiko Odaki wrote:

On 2024/05/02 4:02, Dmitry Osipenko wrote:

On 4/27/24 08:48, Akihiko Odaki wrote:


The VIRTIO_GPU_FILL_CMD() macro returns void, and this macro is used by
every function processing commands. Changing process_cmd() to return
bool will require changing all those functions. Not worthwhile to
change it, IMO.

The flag reflects the exact command status. The !finished + !suspended
means that the command is fenced, i.e. these flags don't have exactly the
same meaning.


It is not necessary to change the signature of process_cmd(). You can
just refer to !finished. No need to have the suspended flag.


Not sure what you mean. The !finished says that the cmd is fenced;
this fenced command is added to the polling list and the fence is
checked periodically by the fence_poll timer, while the next virgl
commands are executed at the same time.

This is completely different from the suspension, where the whole cmd
processing is blocked until the command is resumed.


!finished means you have not sent a response with
virtio_gpu_ctrl_response(). Currently such a situation only happens when
a fence is requested, and virtio_gpu_process_cmdq() exploits that fact,
but we are adding a new case without a fence.

So we need to add code to check whether we are fencing or not in
virtio_gpu_process_cmdq(). This can be achieved by evaluating the
following expression, as done in virtio_gpu_virgl_process_cmd():
cmd->cmd_hdr.flags & VIRTIO_GPU_FLAG_FENCE


This works, but then I'll add back the res->async_unmap_in_progress
because we need to know whether unmapping has been started.



Isn't the command processing paused when an unmapping operation is in 
progress?
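The distinction argued in this exchange — a fenced command keeps the queue running while a suspended one stops it — can be modeled as a small predicate over the two signals discussed above (!finished and the VIRTIO_GPU_FLAG_FENCE bit). This is a hedged sketch with stand-in types, not the actual virtio_gpu_process_cmdq() code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_GPU_FLAG_FENCE (1u << 0)

struct cmd {
    uint32_t flags;
    bool finished;  /* response already sent via virtio_gpu_ctrl_response() */
};

/* Returns true when the queue must stop: the command neither finished nor
 * requested a fence, i.e. it is suspended (e.g. waiting for async unmap).
 * A fenced command is polled later by the fence_poll timer instead. */
static bool cmd_suspends_queue(const struct cmd *c)
{
    if (c->finished) {
        return false;                            /* done, keep processing */
    }
    return !(c->flags & VIRTIO_GPU_FLAG_FENCE);  /* fenced -> keep going */
}
```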




Re: [PATCH] qemu-keymap: Free xkb allocations

2024-05-05 Thread Akihiko Odaki

On 2024/05/05 19:24, Michael Tokarev wrote:

01.05.2024 10:55, Akihiko Odaki wrote:

This fixes LeakSanitizer complaints with xkbcommon 1.6.0.

Signed-off-by: Akihiko Odaki 
---
  qemu-keymap.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/qemu-keymap.c b/qemu-keymap.c
index 8c80f7a4ed65..7a9f38cf9863 100644
--- a/qemu-keymap.c
+++ b/qemu-keymap.c
@@ -237,6 +237,9 @@ int main(int argc, char *argv[])
  xkb_state_unref(state);
  state = NULL;
+    xkb_keymap_unref(map);
+    xkb_context_unref(ctx);
+


I'd avoid freeing any resources in main() entirely,
since it's much cheaper to let the kernel free the whole
process at exit time than to mess with each chunk of
allocated memory. Dunno how useful it is to "fix" these.

/mjt


This is purely to satisfy LeakSanitizer; the LeakSanitizer complaints
result in a build failure with many noisy logs. I don't add Fixes: tags
for this kind of leak.


Regards,
Akihiko Odaki



Re: [PATCH v8 08/11] virtio-gpu: Handle resource blob commands

2024-05-05 Thread Akihiko Odaki

On 2024/05/02 4:20, Dmitry Osipenko wrote:

On 4/27/24 08:52, Akihiko Odaki wrote:

On 2024/04/24 19:30, Dmitry Osipenko wrote:

On 4/19/24 12:18, Akihiko Odaki wrote:

@@ -61,6 +61,10 @@ struct virtio_gpu_simple_resource {
    int dmabuf_fd;
    uint8_t *remapped;
    +    MemoryRegion *mr;
+    bool async_unmap_completed;
+    bool async_unmap_in_progress;
+


Don't add fields to virtio_gpu_simple_resource but instead create a
struct that embeds virtio_gpu_simple_resource in virtio-gpu-virgl.c.


Please give a justification. I'd rather rename
virtio_gpu_simple_resource s/_simple//. The simple resource already supports
blob, and the added fields are directly related to the blob. I don't see
why another struct is needed.



Because mapping is only implemented in virtio-gpu-gl while blob itself
is implemented also in virtio-gpu.


Rutabaga maps blobs, and it should unmap blobs asynchronously as
well, AFAICT.



Right. It makes sense to put mr in struct virtio_gpu_simple_resource in
preparation for such a situation.

Based on this discussion, I think it is fine to put mr either in struct
virtio_gpu_simple_resource or in a distinct struct. However, if you put mr
in struct virtio_gpu_simple_resource, the logic that manages the
MemoryRegion should also be moved to virtio-gpu.c for consistency.




Re: [PATCH v8 07/11] virtio-gpu: Support suspension of commands processing

2024-05-05 Thread Akihiko Odaki

On 2024/05/02 4:02, Dmitry Osipenko wrote:

On 4/27/24 08:48, Akihiko Odaki wrote:


The VIRTIO_GPU_FILL_CMD() macro returns void, and this macro is used by
every function processing commands. Changing process_cmd() to return
bool will require changing all those functions. Not worthwhile to
change it, IMO.

The flag reflects the exact command status. The !finished + !suspended
means that the command is fenced, i.e. these flags don't have exactly the
same meaning.


It is not necessary to change the signature of process_cmd(). You can
just refer to !finished. No need to have the suspended flag.


Not sure what you mean. The !finished says that the cmd is fenced;
this fenced command is added to the polling list and the fence is
checked periodically by the fence_poll timer, while the next virgl
commands are executed at the same time.

This is completely different from the suspension, where the whole cmd
processing is blocked until the command is resumed.



!finished means you have not sent a response with
virtio_gpu_ctrl_response(). Currently such a situation only happens when
a fence is requested, and virtio_gpu_process_cmdq() exploits that fact,
but we are adding a new case without a fence.

So we need to add code to check whether we are fencing or not in
virtio_gpu_process_cmdq(). This can be achieved by evaluating the
following expression, as done in virtio_gpu_virgl_process_cmd():

cmd->cmd_hdr.flags & VIRTIO_GPU_FLAG_FENCE



Re: [PATCH v9 09/11] virtio-gpu: Register capsets dynamically

2024-05-03 Thread Akihiko Odaki

On 2024/05/02 4:52, Dmitry Osipenko wrote:

On 5/1/24 22:38, Dmitry Osipenko wrote:

On 5/1/24 22:31, Dmitry Osipenko wrote:

On 4/27/24 10:12, Akihiko Odaki wrote:

   int virtio_gpu_virgl_get_num_capsets(VirtIOGPU *g)
   {
   uint32_t capset2_max_ver, capset2_max_size;
+
+    if (g->capset_ids) {


Move capset_ids initialization to virtio_gpu_virgl_init() to save this
conditional.


Capsets are used before virgl is inited. At first the guest queries the
virtio device features and then enables virgl only if a capset is
available, while virgl itself is initialized when the first virtio
command is processed. I.e. it's not possible to move this to
virtio_gpu_virgl_init().


Though no, capsets aren't part of device features. I'll move it to
virtio_gpu_virgl_init, thanks.



The number of capsets actually is part of the generic virtio device
config descriptor. Capset initialization can't be moved without probing
the capsets twice, i.e. it's not worthwhile.



I see. Then I suggest replacing virtio_gpu_virgl_get_num_capsets() with
a function that returns a GArray of capset IDs.
virtio_gpu_gl_device_realize() will assign the returned GArray to
g->capset_ids. virtio_gpu_gl_device_unrealize(), which doesn't exist
yet, will free g->capset_ids later.


This way, you won't need the conditional, and it will be clear that a 
GArray allocation happens in virtio_gpu_gl_device_realize() and is 
matched with the deallocation in virtio_gpu_gl_device_unrealize().




[PATCH] qemu-keymap: Free xkb allocations

2024-05-01 Thread Akihiko Odaki
This fixes LeakSanitizer complaints with xkbcommon 1.6.0.

Signed-off-by: Akihiko Odaki 
---
 qemu-keymap.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/qemu-keymap.c b/qemu-keymap.c
index 8c80f7a4ed65..7a9f38cf9863 100644
--- a/qemu-keymap.c
+++ b/qemu-keymap.c
@@ -237,6 +237,9 @@ int main(int argc, char *argv[])
 xkb_state_unref(state);
 state = NULL;
 
+xkb_keymap_unref(map);
+xkb_context_unref(ctx);
+
 /* add quirks */
 fprintf(outfile,
 "\n"

---
base-commit: c25df57ae8f9fe1c72eee2dab37d76d904ac382e
change-id: 20240501-xkb-258483ccc5d8

Best regards,
-- 
Akihiko Odaki 




Re: [PATCH 0/3] virtio-net: Convert feature properties to OnOffAuto

2024-05-01 Thread Akihiko Odaki

On 2024/04/29 16:05, Michael S. Tsirkin wrote:

On Sun, Apr 28, 2024 at 04:21:06PM +0900, Akihiko Odaki wrote:

Based-on: <20240428-rss-v10-0-73cbaa91a...@daynix.com>
("[PATCH v10 00/18] virtio-net RSS/hash report fixes and improvements")

Some features are not always available, and virtio-net used to disable
them when not available even if the corresponding properties were
explicitly set to "on".

Convert feature properties to OnOffAuto so that the user can explicitly
tell QEMU to automatically select the value by setting them "auto".
QEMU will give an error if they are set "on".

Signed-off-by: Akihiko Odaki 


Should we maybe bite the bullet and allow "auto" for all binary/boolean
properties? Just ignore "auto" if no one cares ATM.


It is not always obvious whether "auto" should be considered as "on" or 
"off" for existing boolean properties. The properties this patch deals 
with are to enable features so "auto" should be considered as "on" if 
possible. However, other properties may mean to disable features, and in 
such a case, "auto" should be considered as "off".


It may still make sense to accept "auto" for all virtio-net feature bits 
for consistency. In particular, I left guest_rsc_ext property boolean 
since nobody cares "auto" for that feature, but this can be converted to 
OnOffAuto.
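The semantics discussed above — an explicit "on" fails when the feature is unavailable, while "auto" silently degrades — can be sketched with a small resolver. This is a hypothetical helper for illustration, not the actual QEMU property API:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum on_off_auto { AUTO, ON, OFF };

/* Resolve a feature property: "on" is a hard requirement and errors out
 * if the host lacks the feature; "auto" enables it only when available. */
static int resolve_feature(enum on_off_auto req, bool available, bool *enabled)
{
    switch (req) {
    case ON:
        if (!available) {
            return -ENOTSUP;  /* explicit "on" must not be downgraded */
        }
        *enabled = true;
        return 0;
    case AUTO:
        *enabled = available;
        return 0;
    case OFF:
    default:
        *enabled = false;
        return 0;
    }
}
```

Note the asymmetry the mail points out: this resolver treats "auto" as "on if possible", which is the right reading for feature-enabling properties but would be wrong for feature-disabling ones.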




Re: [PATCH 2/3] virtio-net: Convert feature properties to OnOffAuto

2024-05-01 Thread Akihiko Odaki

On 2024/05/01 0:02, Yuri Benditovich wrote:

Question:
How will libvirt (as an example) work with this change? In the
existing semantics of a libvirt profile, "on" means "on if possible";
will an existing profile after a qemu update still use "on" with
the meaning "force"?
Typically this is solved by machine type - if libvirt uses
'machine='pc-q35-8.1'' this will be backward-compatible.
How will this change be accepted?


The reasoning here is that this patch only changes the configuration
validation; once the validation passes, QEMU can load the state of a
machine created with an old version of QEMU.


It is still a good idea to add a compatibility property. I'll do so in 
the next version.




Re: [PATCH 1/3] qdev-properties: Add DEFINE_PROP_ON_OFF_AUTO_BIT64()

2024-05-01 Thread Akihiko Odaki

On 2024/04/30 23:41, Yuri Benditovich wrote:

On Sun, Apr 28, 2024 at 10:21 AM Akihiko Odaki  wrote:


DEFINE_PROP_ON_OFF_AUTO_BIT64() corresponds to DEFINE_PROP_ON_OFF_AUTO()
as DEFINE_PROP_BIT64() corresponds to DEFINE_PROP_BOOL(). The difference
is that DEFINE_PROP_ON_OFF_AUTO_BIT64() exposes OnOffAuto instead of
bool.

Signed-off-by: Akihiko Odaki 
---
  include/hw/qdev-properties.h | 18 
  hw/core/qdev-properties.c| 65 +++-
  2 files changed, 82 insertions(+), 1 deletion(-)

diff --git a/include/hw/qdev-properties.h b/include/hw/qdev-properties.h
index 09aa04ca1e27..afec53a48470 100644
--- a/include/hw/qdev-properties.h
+++ b/include/hw/qdev-properties.h
@@ -43,11 +43,22 @@ struct PropertyInfo {
  ObjectPropertyRelease *release;
  };

+/**
+ * struct OnOffAutoBit64 - OnOffAuto storage with 64 elements.
+ * @on_bits: Bitmap of elements with "on".
+ * @auto_bits: Bitmap of elements with "auto".
+ */
+typedef struct OnOffAutoBit64 {
+uint64_t on_bits;
+uint64_t auto_bits;
+} OnOffAutoBit64;
+

  /*** qdev-properties.c ***/

  extern const PropertyInfo qdev_prop_bit;
  extern const PropertyInfo qdev_prop_bit64;
+extern const PropertyInfo qdev_prop_on_off_auto_bit64;
  extern const PropertyInfo qdev_prop_bool;
  extern const PropertyInfo qdev_prop_enum;
  extern const PropertyInfo qdev_prop_uint8;
@@ -100,6 +111,13 @@ extern const PropertyInfo qdev_prop_link;
  .set_default = true,  \
  .defval.u  = (bool)_defval)

+#define DEFINE_PROP_ON_OFF_AUTO_BIT64(_name, _state, _field, _bit, _defval) \
+DEFINE_PROP(_name, _state, _field, qdev_prop_on_off_auto_bit64, \
+OnOffAutoBit64, \
+.bitnr= (_bit), \
+.set_default = true,\
+.defval.i = (OnOffAuto)_defval)
+
  #define DEFINE_PROP_BOOL(_name, _state, _field, _defval) \
  DEFINE_PROP(_name, _state, _field, qdev_prop_bool, bool, \
  .set_default = true, \
diff --git a/hw/core/qdev-properties.c b/hw/core/qdev-properties.c
index 7d6fa726fdf2..b96f54a1b912 100644
--- a/hw/core/qdev-properties.c
+++ b/hw/core/qdev-properties.c
@@ -188,7 +188,8 @@ const PropertyInfo qdev_prop_bit = {

  static uint64_t qdev_get_prop_mask64(Property *prop)
  {
-assert(prop->info == &qdev_prop_bit64);
+assert(prop->info == &qdev_prop_bit64 ||
+   prop->info == &qdev_prop_on_off_auto_bit64);
  return 0x1ull << prop->bitnr;
  }

@@ -233,6 +234,68 @@ const PropertyInfo qdev_prop_bit64 = {
  .set_default_value = set_default_value_bool,
  };

+static void prop_get_on_off_auto_bit64(Object *obj, Visitor *v,
+   const char *name, void *opaque,
+   Error **errp)
+{
+Property *prop = opaque;
+OnOffAutoBit64 *p = object_field_prop_ptr(obj, prop);
+int value;
+uint64_t mask = qdev_get_prop_mask64(prop);
+
+if (p->auto_bits & mask) {
+value = ON_OFF_AUTO_AUTO;
+} else if (p->on_bits & mask) {
+value = ON_OFF_AUTO_ON;
+} else {
+value = ON_OFF_AUTO_OFF;
+}
+
+visit_type_enum(v, name, &value, &OnOffAuto_lookup, errp);
+}
+
+static void prop_set_on_off_auto_bit64(Object *obj, Visitor *v,
+   const char *name, void *opaque,
+   Error **errp)
+{
+Property *prop = opaque;
+OnOffAutoBit64 *p = object_field_prop_ptr(obj, prop);
+int value;
+uint64_t mask = qdev_get_prop_mask64(prop);
+
+if (!visit_type_enum(v, name, &value, &OnOffAuto_lookup, errp)) {
+return;
+}
+
+switch (value) {
+case ON_OFF_AUTO_AUTO:
+p->on_bits &= ~mask;
+p->auto_bits |= mask;
+break;
+
+case ON_OFF_AUTO_ON:
+p->on_bits |= mask;
+p->auto_bits &= ~mask;
+break;
+
+case ON_OFF_AUTO_OFF:
+p->on_bits &= ~mask;
+p->auto_bits &= ~mask;
+break;
+}
+}
+
+const PropertyInfo qdev_prop_on_off_auto_bit64 = {
+.name  = "bool",


Does it mean that the name of this tristate type is "bool"? Or am I missing something?


No, this should be OnOffAuto. Thanks for pointing this out; I'll fix
it in the next version.




Re: [PATCH 1/1] Fix safeAreaInsets not being available on OS X 10.13

2024-04-28 Thread Akihiko Odaki

On 2024/04/28 20:57, Tobias Markus wrote:

Hello,

To prefix this: I previously filed
https://gitlab.com/qemu-project/qemu/-/issues/2314 for this compilation
error, and I'm quite aware that QEMU only supports the two most recent
versions of Mac OS X by default. However, given the small change
required for this to work, I hope you can make an exception and include
the attached patch in QEMU.


It would really help me continue to run QEMU on my legacy MacOS X 10.13 
version.


Hi,

I'm sorry to tell this but I'm not for making an exception.

While this change is indeed small, we already have a lot of code that is
only needed by unsupported macOS versions. It was not added to keep
QEMU working with unsupported macOS versions; it was added when
those versions were still supported. Maintaining all of it is not a
trivial task, so it is now eligible for removal.


Regards,
Akihiko Odaki



Thanks in advance!

Kind regards,
Tobias Markus

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2314
Signed-off-by: Tobias Markus 
---
  ui/cocoa.m | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/ui/cocoa.m b/ui/cocoa.m
index 25e0db9dd0..96992736ef 100644
--- a/ui/cocoa.m
+++ b/ui/cocoa.m
@@ -539,9 +539,11 @@ - (NSSize)fixAspectRatio:(NSSize)max
  - (NSSize) screenSafeAreaSize
  {
  NSSize size = [[[self window] screen] frame].size;
+#if MAC_OS_X_VERSION_MAX_ALLOWED >= MAC_OS_VERSION_14_0
  NSEdgeInsets insets = [[[self window] screen] safeAreaInsets];
  size.width -= insets.left + insets.right;
  size.height -= insets.top + insets.bottom;
+#endif
  return size;
  }





[PATCH 1/2] util/iov: Do not assert offset is in iov

2024-04-28 Thread Akihiko Odaki
iov_from_buf(), iov_to_buf(), iov_memset(), and iov_copy() assert
that the given offset fits in the iov while tolerating the specified
number of bytes to operate with to be greater than the size of iov.
This is inconsistent, so remove the assertions.

Asserting the offset fits in the iov makes sense if it is expected that
there are other operations that process the content before the offset
and the content is processed in order. Under this expectation, the
offset should point to the end of bytes that are previously processed
and fit in the iov. However, this expectation depends on the details of
the caller, and did not hold true in at least one case, requiring code to
check iov_size(), which is added with commit 83ddb3dbba2e
("hw/net/net_tx_pkt: Fix overrun in update_sctp_checksum()").

Adding such a check is inefficient and error-prone. These functions
already tolerate the specified number of bytes to operate with to be
greater than the size of iov to avoid such checks so remove the
assertions to tolerate an invalid offset as well. They return the number
of bytes they operated on, so their callers can still check the returned
value to ensure there is sufficient space at the given offset.

Signed-off-by: Akihiko Odaki 
---
 include/qemu/iov.h | 5 +++--
 util/iov.c | 5 -
 2 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/include/qemu/iov.h b/include/qemu/iov.h
index 63a1c01965d1..33548058d2ee 100644
--- a/include/qemu/iov.h
+++ b/include/qemu/iov.h
@@ -30,7 +30,7 @@ size_t iov_size(const struct iovec *iov, const unsigned int 
iov_cnt);
  * only part of data will be copied, up to the end of the iovec.
  * Number of bytes actually copied will be returned, which is
  *  min(bytes, iov_size(iov)-offset)
- * `Offset' must point to the inside of iovec.
+ * Returns 0 when `offset' points to the outside of iovec.
  */
 size_t iov_from_buf_full(const struct iovec *iov, unsigned int iov_cnt,
  size_t offset, const void *buf, size_t bytes);
@@ -66,11 +66,12 @@ iov_to_buf(const struct iovec *iov, const unsigned int 
iov_cnt,
 /**
  * Set data bytes pointed out by iovec `iov' of size `iov_cnt' elements,
  * starting at byte offset `start', to value `fillc', repeating it
- * `bytes' number of times.  `Offset' must point to the inside of iovec.
+ * `bytes' number of times.
  * If `bytes' is large enough, only last bytes portion of iovec,
  * up to the end of it, will be filled with the specified value.
  * Function return actual number of bytes processed, which is
  * min(size, iov_size(iov) - offset).
+ * Returns 0 when `offset' points to the outside of iovec.
  */
 size_t iov_memset(const struct iovec *iov, const unsigned int iov_cnt,
   size_t offset, int fillc, size_t bytes);
diff --git a/util/iov.c b/util/iov.c
index 7e73948f5e3d..a523b406b7f8 100644
--- a/util/iov.c
+++ b/util/iov.c
@@ -36,7 +36,6 @@ size_t iov_from_buf_full(const struct iovec *iov, unsigned 
int iov_cnt,
 offset -= iov[i].iov_len;
 }
 }
-assert(offset == 0);
 return done;
 }
 
@@ -55,7 +54,6 @@ size_t iov_to_buf_full(const struct iovec *iov, const 
unsigned int iov_cnt,
 offset -= iov[i].iov_len;
 }
 }
-assert(offset == 0);
 return done;
 }
 
@@ -74,7 +72,6 @@ size_t iov_memset(const struct iovec *iov, const unsigned int 
iov_cnt,
 offset -= iov[i].iov_len;
 }
 }
-assert(offset == 0);
 return done;
 }
 
@@ -266,7 +263,6 @@ unsigned iov_copy(struct iovec *dst_iov, unsigned int 
dst_iov_cnt,
 bytes -= len;
 offset = 0;
 }
-assert(offset == 0);
 return j;
 }
 
@@ -337,7 +333,6 @@ size_t qemu_iovec_concat_iov(QEMUIOVector *dst,
 soffset -= src_iov[i].iov_len;
 }
 }
-assert(soffset == 0); /* offset beyond end of src */
 
 return done;
 }

-- 
2.44.0




[PATCH 2/2] Revert "hw/net/net_tx_pkt: Fix overrun in update_sctp_checksum()"

2024-04-28 Thread Akihiko Odaki
This reverts commit 83ddb3dbba2ee0f1767442ae6ee665058aeb1093.

The added check is no longer necessary due to a change of
iov_from_buf().

Signed-off-by: Akihiko Odaki 
---
 hw/net/net_tx_pkt.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/hw/net/net_tx_pkt.c b/hw/net/net_tx_pkt.c
index b7b1de816dc5..2134a18c4c90 100644
--- a/hw/net/net_tx_pkt.c
+++ b/hw/net/net_tx_pkt.c
@@ -141,10 +141,6 @@ bool net_tx_pkt_update_sctp_checksum(struct NetTxPkt *pkt)
 uint32_t csum = 0;
 struct iovec *pl_start_frag = pkt->vec + NET_TX_PKT_PL_START_FRAG;
 
-if (iov_size(pl_start_frag, pkt->payload_frags) < 8 + sizeof(csum)) {
-return false;
-}
-
 if (iov_from_buf(pl_start_frag, pkt->payload_frags, 8, &csum, sizeof(csum)) < sizeof(csum)) {
 return false;
 }

-- 
2.44.0




[PATCH 0/2] util/iov: Do not assert offset is in iov

2024-04-28 Thread Akihiko Odaki
iov_from_buf(), iov_to_buf(), iov_memset(), and iov_copy() assert
that the given offset fits in the iov while tolerating the specified
number of bytes to operate with to be greater than the size of iov.
This is inconsistent, so remove the assertions.

Asserting the offset fits in the iov makes sense if it is expected that
there are other operations that process the content before the offset
and the content is processed in order. Under this expectation, the
offset should point to the end of bytes that are previously processed
and fit in the iov. However, this expectation depends on the details of
the caller, and did not hold true in at least one case, requiring code to
check iov_size(), which is added with commit 83ddb3dbba2e
("hw/net/net_tx_pkt: Fix overrun in update_sctp_checksum()").

Adding such a check is inefficient and error-prone. These functions
already tolerate the specified number of bytes to operate with to be
greater than the size of iov to avoid such checks so remove the
assertions to tolerate an invalid offset as well. They return the number
of bytes they operated on, so their callers can still check the returned
value to ensure there is sufficient space at the given offset.

Signed-off-by: Akihiko Odaki 
---
Akihiko Odaki (2):
  util/iov: Do not assert offset is in iov
  Revert "hw/net/net_tx_pkt: Fix overrun in update_sctp_checksum()"

 include/qemu/iov.h  | 5 +++--
 hw/net/net_tx_pkt.c | 4 
 util/iov.c  | 5 -
 3 files changed, 3 insertions(+), 11 deletions(-)
---
base-commit: fd87be1dada5672f877e03c2ca8504458292c479
change-id: 20240428-iov-a317b02601dc

Best regards,
-- 
Akihiko Odaki 




[PATCH RFC v4 1/7] hw/pci: Do not add ROM BAR for SR-IOV VF

2024-04-28 Thread Akihiko Odaki
A SR-IOV VF cannot have a ROM BAR.

Co-developed-by: Yui Washizu 
Signed-off-by: Akihiko Odaki 
---
 hw/pci/pci.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index cb5ac46e9f27..201ff64e11cc 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2359,6 +2359,14 @@ static void pci_add_option_rom(PCIDevice *pdev, bool 
is_default_rom,
 return;
 }
 
+if (pci_is_vf(pdev)) {
+if (pdev->rom_bar != UINT32_MAX) {
+error_setg(errp, "ROM BAR cannot be enabled for SR-IOV VF");
+}
+
+return;
+}
+
 if (load_file || pdev->romsize == UINT32_MAX) {
 path = qemu_find_file(QEMU_FILE_TYPE_BIOS, pdev->romfile);
 if (path == NULL) {

-- 
2.44.0




[PATCH RFC v4 3/7] pcie_sriov: Ensure PF and VF are mutually exclusive

2024-04-28 Thread Akihiko Odaki
A device cannot be a SR-IOV PF and a VF at the same time.

Signed-off-by: Akihiko Odaki 
---
 hw/pci/pcie_sriov.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c
index 56523ab4e833..ec8fc0757b92 100644
--- a/hw/pci/pcie_sriov.c
+++ b/hw/pci/pcie_sriov.c
@@ -42,6 +42,11 @@ bool pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
 uint8_t *cfg = dev->config + offset;
 uint8_t *wmask;
 
+if (pci_is_vf(dev)) {
+error_setg(errp, "a device cannot be a SR-IOV PF and a VF at the same time");
+return false;
+}
+
 if (total_vfs) {
 uint16_t ari_cap = pcie_find_capability(dev, PCI_EXT_CAP_ID_ARI);
 uint16_t first_vf_devfn = dev->devfn + vf_offset;

-- 
2.44.0




[PATCH RFC v4 7/7] virtio-net: Implement SR-IOV VF

2024-04-28 Thread Akihiko Odaki
A virtio-net device can be added as a SR-IOV VF to another virtio-pci
device that will be the PF.

Signed-off-by: Akihiko Odaki 
---
 hw/virtio/virtio-net-pci.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/virtio/virtio-net-pci.c b/hw/virtio/virtio-net-pci.c
index e03543a70a75..dba4987d6e04 100644
--- a/hw/virtio/virtio-net-pci.c
+++ b/hw/virtio/virtio-net-pci.c
@@ -75,6 +75,7 @@ static void virtio_net_pci_class_init(ObjectClass *klass, 
void *data)
 k->device_id = PCI_DEVICE_ID_VIRTIO_NET;
 k->revision = VIRTIO_PCI_ABI_VERSION;
 k->class_id = PCI_CLASS_NETWORK_ETHERNET;
+k->sriov_vf_user_creatable = true;
 set_bit(DEVICE_CATEGORY_NETWORK, dc->categories);
 device_class_set_props(dc, virtio_net_properties);
 vpciklass->realize = virtio_net_pci_realize;

-- 
2.44.0




[PATCH RFC v4 0/7] virtio-net: add support for SR-IOV emulation

2024-04-28 Thread Akihiko Odaki
Based-on: <20240315-reuse-v9-0-67aa69af4...@daynix.com>
("[PATCH for 9.1 v9 00/11] hw/pci: SR-IOV related fixes and improvements")

Introduction


This series is based on the RFC series submitted by Yui Washizu[1].
See also [2] for the context.

This series enables SR-IOV emulation for virtio-net. It is useful
to test SR-IOV support on the guest, or to expose several vDPA devices
in a VM. vDPA devices can also provide L2 switching feature for
offloading though it is out of scope to allow the guest to configure
such a feature.

The PF side code resides in virtio-pci. The VF side code resides in
the PCI common infrastructure, but it is restricted to work only for
virtio-net-pci because of lack of validation.

User Interface
--

A user can configure a SR-IOV capable virtio-net device by adding
virtio-net-pci functions to a bus. Below is a command line example:
  -netdev user,id=n -netdev user,id=o
  -netdev user,id=p -netdev user,id=q
  -device pcie-root-port,id=b
  -device virtio-net-pci,bus=b,addr=0x0.0x3,netdev=q,sriov-pf=f
  -device virtio-net-pci,bus=b,addr=0x0.0x2,netdev=p,sriov-pf=f
  -device virtio-net-pci,bus=b,addr=0x0.0x1,netdev=o,sriov-pf=f
  -device virtio-net-pci,bus=b,addr=0x0.0x0,netdev=n,id=f

The VFs specify the paired PF with "sriov-pf" property. The PF must be
added after all VFs. It is user's responsibility to ensure that VFs have
function numbers larger than one of the PF, and the function numbers
have a consistent stride.

Keeping VF instances


A problem with SR-IOV emulation is that it needs to hotplug the VFs as
the guest requests. Previously, this behavior was implemented by
realizing and unrealizing VFs at runtime. However, this strategy does
not work well for the proposed virtio-net emulation; in this proposal,
device options passed in the command line must be maintained as VFs
are hotplugged, but they are consumed when the machine starts and not
available after that, which makes realizing VFs at runtime impossible.

As an alternative strategy to runtime realization/unrealization, this
series proposes to reuse the code to power down PCI Express devices.
When a PCI Express device is powered down, it will be hidden from the
guest but will be kept realized. This effectively implements the
behavior we need for the SR-IOV emulation.

Summary
---

Patch 1 disables ROM BAR, which virtio-net-pci enables by default, for
VFs.
Patch 2 makes zero stride valid for 1 VF configuration.
Patches 3 and 4 add validations.
Patch 5 adds user-created SR-IOV VF infrastructure.
Patch 6 makes virtio-pci work as SR-IOV PF for user-created VFs.
Patch 7 allows the user to create SR-IOV VFs with virtio-net-pci.

[1] 
https://patchew.org/QEMU/1689731808-3009-1-git-send-email-yui.wash...@gmail.com/
[2] https://lore.kernel.org/all/5d46f455-f530-4e5e-9ae7-13a2297d4...@daynix.com/

Co-developed-by: Yui Washizu 
Signed-off-by: Akihiko Odaki 
---
Changes in v4:
- Added patch "hw/pci: Fix SR-IOV VF number calculation" to fix division
  by zero reported by Yui Washizu.
- Rebased.
- Link to v3: 
https://lore.kernel.org/r/20240305-sriov-v3-0-abdb75770...@daynix.com

Changes in v3:
- Rebased.
- Link to v2: 
https://lore.kernel.org/r/20231210-sriov-v2-0-b959e8a6d...@daynix.com

Changes in v2:
- Changed to keep VF instances.
- Link to v1: 
https://lore.kernel.org/r/20231202-sriov-v1-0-32b3570f7...@daynix.com

---
Akihiko Odaki (7):
  hw/pci: Do not add ROM BAR for SR-IOV VF
  hw/pci: Fix SR-IOV VF number calculation
  pcie_sriov: Ensure PF and VF are mutually exclusive
  pcie_sriov: Check PCI Express for SR-IOV PF
  pcie_sriov: Allow user to create SR-IOV device
  virtio-pci: Implement SR-IOV PF
  virtio-net: Implement SR-IOV VF

 include/hw/pci/pci_device.h |   6 +-
 include/hw/pci/pcie_sriov.h |  19 +++
 hw/pci/pci.c|  76 +++
 hw/pci/pcie_sriov.c | 298 +++-
 hw/virtio/virtio-net-pci.c  |   1 +
 hw/virtio/virtio-pci.c  |   7 ++
 6 files changed, 323 insertions(+), 84 deletions(-)
---
base-commit: 2ac5458086ab61282f30c2f8bdf2ae9a0a06a75d
change-id: 20231202-sriov-9402fb262be8

Best regards,
-- 
Akihiko Odaki 




[PATCH RFC v4 4/7] pcie_sriov: Check PCI Express for SR-IOV PF

2024-04-28 Thread Akihiko Odaki
SR-IOV requires PCI Express.

Signed-off-by: Akihiko Odaki 
---
 hw/pci/pcie_sriov.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c
index ec8fc0757b92..3af0cc7d560a 100644
--- a/hw/pci/pcie_sriov.c
+++ b/hw/pci/pcie_sriov.c
@@ -42,6 +42,11 @@ bool pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
 uint8_t *cfg = dev->config + offset;
 uint8_t *wmask;
 
+if (!pci_is_express(dev)) {
+error_setg(errp, "PCI Express is required for SR-IOV PF");
+return false;
+}
+
 if (pci_is_vf(dev)) {
 error_setg(errp, "a device cannot be a SR-IOV PF and a VF at the same time");
 return false;

-- 
2.44.0




[PATCH RFC v4 6/7] virtio-pci: Implement SR-IOV PF

2024-04-28 Thread Akihiko Odaki
Allow the user to attach a SR-IOV VF to a virtio-pci PF.

Signed-off-by: Akihiko Odaki 
---
 hw/virtio/virtio-pci.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index eaaf86402cfa..996bb2cbad20 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -2245,6 +2245,12 @@ static void virtio_pci_device_plugged(DeviceState *d, 
Error **errp)
 pci_register_bar(&proxy->pci_dev, proxy->legacy_io_bar_idx,
  PCI_BASE_ADDRESS_SPACE_IO, &proxy->bar);
 }
+
+if (pcie_sriov_pf_init_from_user_created_vfs(&proxy->pci_dev,
+ PCI_CONFIG_SPACE_SIZE,
+ errp)) {
+virtio_add_feature(&proxy->host_features, VIRTIO_F_SR_IOV);
+}
 }
 
 static void virtio_pci_device_unplugged(DeviceState *d)
@@ -2253,6 +2259,7 @@ static void virtio_pci_device_unplugged(DeviceState *d)
 bool modern = virtio_pci_modern(proxy);
 bool modern_pio = proxy->flags & VIRTIO_PCI_FLAG_MODERN_PIO_NOTIFY;
 
+pcie_sriov_pf_exit(&proxy->pci_dev);
 virtio_pci_stop_ioeventfd(proxy);
 
 if (modern) {

-- 
2.44.0




[PATCH RFC v4 5/7] pcie_sriov: Allow user to create SR-IOV device

2024-04-28 Thread Akihiko Odaki
A user can create a SR-IOV device by specifying the PF with the
sriov-pf property of the VFs. The VFs must be added before the PF.

A user-creatable VF must have PCIDeviceClass::sriov_vf_user_creatable
set. Such a VF cannot refer to the PF because it is created before the
PF.

A PF to which user-creatable VFs can be attached calls
pcie_sriov_pf_init_from_user_created_vfs() during realization and
pcie_sriov_pf_exit() when exiting.

Signed-off-by: Akihiko Odaki 
---
 include/hw/pci/pci_device.h |   6 +-
 include/hw/pci/pcie_sriov.h |  19 +++
 hw/pci/pci.c|  62 ++
 hw/pci/pcie_sriov.c | 288 +++-
 4 files changed, 292 insertions(+), 83 deletions(-)

diff --git a/include/hw/pci/pci_device.h b/include/hw/pci/pci_device.h
index 6be0f989ebe0..eefd9d9a7b5a 100644
--- a/include/hw/pci/pci_device.h
+++ b/include/hw/pci/pci_device.h
@@ -37,6 +37,8 @@ struct PCIDeviceClass {
 uint16_t subsystem_id;  /* only for header type = 0 */
 
 const char *romfile;/* rom bar */
+
+bool sriov_vf_user_creatable;
 };
 
 enum PCIReqIDType {
@@ -160,6 +162,8 @@ struct PCIDevice {
 /* ID of standby device in net_failover pair */
 char *failover_pair_id;
 uint32_t acpi_index;
+
+char *sriov_pf;
 };
 
 static inline int pci_intx(PCIDevice *pci_dev)
@@ -192,7 +196,7 @@ static inline int pci_is_express_downstream_port(const 
PCIDevice *d)
 
 static inline int pci_is_vf(const PCIDevice *d)
 {
-return d->exp.sriov_vf.pf != NULL;
+return d->sriov_pf || d->exp.sriov_vf.pf != NULL;
 }
 
 static inline uint32_t pci_config_size(const PCIDevice *d)
diff --git a/include/hw/pci/pcie_sriov.h b/include/hw/pci/pcie_sriov.h
index d576a8c6be19..20626b5605c9 100644
--- a/include/hw/pci/pcie_sriov.h
+++ b/include/hw/pci/pcie_sriov.h
@@ -18,6 +18,7 @@
 struct PCIESriovPF {
 uint8_t vf_bar_type[PCI_NUM_REGIONS];   /* Store type for each VF bar */
 PCIDevice **vf; /* Pointer to an array of num_vfs VF devices */
+bool vf_user_created; /* If VFs are created by user */
 };
 
 struct PCIESriovVF {
@@ -40,6 +41,24 @@ void pcie_sriov_pf_init_vf_bar(PCIDevice *dev, int 
region_num,
 void pcie_sriov_vf_register_bar(PCIDevice *dev, int region_num,
 MemoryRegion *memory);
 
+/**
+ * pcie_sriov_pf_init_from_user_created_vfs() - Initialize PF with user-created
+ *  VFs.
+ * @dev: A PCIe device being realized.
+ * @offset: The offset of the SR-IOV capability.
+ * @errp: pointer to Error*, to store an error if it happens.
+ *
+ * Return:
+ * * true - @dev is initialized as a PCIe SR-IOV PF.
+ * * false - @dev is not initialized because there is no SR-IOV VFs or an error
+ *   occurred.
+ */
+bool pcie_sriov_pf_init_from_user_created_vfs(PCIDevice *dev, uint16_t offset,
+  Error **errp);
+
+bool pcie_sriov_register_device(PCIDevice *dev, Error **errp);
+void pcie_sriov_unregister_device(PCIDevice *dev);
+
 /*
  * Default (minimal) page size support values
  * as required by the SR/IOV standard:
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index dbecb3d4aa42..e79bb8b6b6fa 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -85,6 +85,7 @@ static Property pci_props[] = {
 QEMU_PCIE_ERR_UNC_MASK_BITNR, true),
 DEFINE_PROP_BIT("x-pcie-ari-nextfn-1", PCIDevice, cap_present,
 QEMU_PCIE_ARI_NEXTFN_1_BITNR, false),
+DEFINE_PROP_STRING("sriov-pf", PCIDevice, sriov_pf),
 DEFINE_PROP_END_OF_LIST()
 };
 
@@ -959,13 +960,8 @@ static void pci_init_multifunction(PCIBus *bus, PCIDevice 
*dev, Error **errp)
 dev->config[PCI_HEADER_TYPE] |= PCI_HEADER_TYPE_MULTI_FUNCTION;
 }
 
-/*
- * With SR/IOV and ARI, a device at function 0 need not be a multifunction
- * device, as it may just be a VF that ended up with function 0 in
- * the legacy PCI interpretation. Avoid failing in such cases:
- */
-if (pci_is_vf(dev) &&
-dev->exp.sriov_vf.pf->cap_present & QEMU_PCI_CAP_MULTIFUNCTION) {
+/* SR/IOV is not handled here. */
+if (pci_is_vf(dev)) {
 return;
 }
 
@@ -998,7 +994,8 @@ static void pci_init_multifunction(PCIBus *bus, PCIDevice 
*dev, Error **errp)
 }
 /* function 0 indicates single function, so function > 0 must be NULL */
 for (func = 1; func < PCI_FUNC_MAX; ++func) {
-if (bus->devices[PCI_DEVFN(slot, func)]) {
+PCIDevice *device = bus->devices[PCI_DEVFN(slot, func)];
+if (device && !pci_is_vf(device)) {
 error_setg(errp, "PCI: %x.0 indicates single function, "
"but %x.%x is already populated.",
slot, slot, func);
@@ -1283,6 +1280,7 @@ static void pci_qdev_unrealize(DeviceState *dev)
 
 pci_unregister_io_regions(pci_dev);
 pc

[PATCH RFC v4 2/7] hw/pci: Fix SR-IOV VF number calculation

2024-04-28 Thread Akihiko Odaki
pci_config_get_bar_addr() had a division by vf_stride. vf_stride needs
to be non-zero when there are multiple VFs, but the specification does
not prohibit making it zero when there is only one VF.

Do not perform the division for the first VF to avoid division by zero.

Signed-off-by: Akihiko Odaki 
---
 hw/pci/pci.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 201ff64e11cc..dbecb3d4aa42 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -1437,7 +1437,11 @@ static pcibus_t pci_config_get_bar_addr(PCIDevice *d, 
int reg,
 pci_get_word(pf->config + sriov_cap + PCI_SRIOV_VF_OFFSET);
 uint16_t vf_stride =
 pci_get_word(pf->config + sriov_cap + PCI_SRIOV_VF_STRIDE);
-uint32_t vf_num = (d->devfn - (pf->devfn + vf_offset)) / vf_stride;
+uint32_t vf_num = d->devfn - (pf->devfn + vf_offset);
+
+if (vf_num) {
+vf_num /= vf_stride;
+}
 
 if (type & PCI_BASE_ADDRESS_MEM_TYPE_64) {
 new_addr = pci_get_quad(pf->config + bar);

-- 
2.44.0




[PATCH 3/3] virtio-net: Report RSS warning at device realization

2024-04-28 Thread Akihiko Odaki
Warning about RSS fallback at device realization allows the user to
notice the configuration problem early.

Signed-off-by: Akihiko Odaki 
---
 hw/net/virtio-net.c | 14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 5b6c901915a9..e1a6d43de283 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -817,6 +817,10 @@ static uint64_t virtio_net_get_features(VirtIODevice 
*vdev, uint64_t features,
 
 if (!get_vhost_net(nc->peer)) {
 if (!ebpf_rss_is_loaded(&n->ebpf_rss)) {
+if (on_off_auto_features.on_bits & VIRTIO_NET_F_RSS) {
+warn_report("Can't load eBPF RSS - fallback to software RSS");
+}
+
 virtio_clear_feature(&on_off_auto_features.auto_bits,
  VIRTIO_NET_F_RSS);
 }
@@ -1327,16 +1331,10 @@ static void virtio_net_detach_epbf_rss(VirtIONet *n)
 static void virtio_net_commit_rss_config(VirtIONet *n)
 {
 if (n->rss_data.enabled) {
-n->rss_data.enabled_software_rss = n->rss_data.populate_hash;
+n->rss_data.enabled_software_rss = n->rss_data.populate_hash ||
+   !virtio_net_attach_epbf_rss(n);
 if (n->rss_data.populate_hash) {
 virtio_net_detach_epbf_rss(n);
-} else if (!virtio_net_attach_epbf_rss(n)) {
-if (get_vhost_net(qemu_get_queue(n->nic)->peer)) {
-warn_report("Can't load eBPF RSS for vhost");
-} else {
-warn_report("Can't load eBPF RSS - fallback to software RSS");
-n->rss_data.enabled_software_rss = true;
-}
 }
 
 trace_virtio_net_rss_enable(n->rss_data.hash_types,

-- 
2.44.0




[PATCH 2/3] virtio-net: Convert feature properties to OnOffAuto

2024-04-28 Thread Akihiko Odaki
Some features are not always available, and virtio-net used to disable
them when not available even if the corresponding properties were
explicitly set to "on".

Convert feature properties to OnOffAuto so that the user can explicitly
tell QEMU to automatically select the value by setting them to "auto".
QEMU will give an error if a feature is set to "on" but cannot be
enabled.

Signed-off-by: Akihiko Odaki 
---
 include/hw/virtio/virtio-net.h |   2 +-
 hw/net/virtio-net.c| 247 +
 2 files changed, 152 insertions(+), 97 deletions(-)

diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
index 060c23c04d2d..ff32e30f001b 100644
--- a/include/hw/virtio/virtio-net.h
+++ b/include/hw/virtio/virtio-net.h
@@ -178,7 +178,7 @@ struct VirtIONet {
 uint32_t has_vnet_hdr;
 size_t host_hdr_len;
 size_t guest_hdr_len;
-uint64_t host_features;
+OnOffAutoBit64 host_features;
 uint32_t rsc_timeout;
 uint8_t rsc4_enabled;
 uint8_t rsc6_enabled;
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index c8059dc99bd4..5b6c901915a9 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -750,58 +750,96 @@ static void virtio_net_set_queue_pairs(VirtIONet *n)
 
 static void virtio_net_set_multiqueue(VirtIONet *n, int multiqueue);
 
+static bool virtio_net_clear_features(OnOffAutoBit64 *features,
+  uint64_t clear_bits,
+  const char *reason, Error **errp)
+{
+if (features->on_bits & clear_bits) {
+error_setg(errp, "%s", reason);
+return false;
+}
+
+features->auto_bits &= ~clear_bits;
+
+return true;
+}
+
 static uint64_t virtio_net_get_features(VirtIODevice *vdev, uint64_t features,
 Error **errp)
 {
 VirtIONet *n = VIRTIO_NET(vdev);
 NetClientState *nc = qemu_get_queue(n->nic);
-
-/* Firstly sync all virtio-net possible supported features */
-features |= n->host_features;
-
-virtio_add_feature(&features, VIRTIO_NET_F_MAC);
-
-if (!peer_has_vnet_hdr(n)) {
-virtio_clear_feature(&features, VIRTIO_NET_F_CSUM);
-virtio_clear_feature(&features, VIRTIO_NET_F_HOST_TSO4);
-virtio_clear_feature(&features, VIRTIO_NET_F_HOST_TSO6);
-virtio_clear_feature(&features, VIRTIO_NET_F_HOST_ECN);
-
-virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_CSUM);
-virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_TSO4);
-virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_TSO6);
-virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_ECN);
-
-virtio_clear_feature(&features, VIRTIO_NET_F_HOST_USO);
-virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_USO4);
-virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_USO6);
-
-virtio_clear_feature(&features, VIRTIO_NET_F_HASH_REPORT);
-}
-
-if (!peer_has_vnet_hdr(n) || !peer_has_ufo(n)) {
-virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_UFO);
-virtio_clear_feature(&features, VIRTIO_NET_F_HOST_UFO);
-}
-
-if (!peer_has_uso(n)) {
-virtio_clear_feature(&features, VIRTIO_NET_F_HOST_USO);
-virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_USO4);
-virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_USO6);
+OnOffAutoBit64 on_off_auto_features = n->host_features;
+
+on_off_auto_features.on_bits |= features;
+virtio_add_feature(&on_off_auto_features.auto_bits, VIRTIO_NET_F_MAC);
+
+if (!((peer_has_vnet_hdr(n) ||
+   virtio_net_clear_features(&on_off_auto_features,
+ BIT_ULL(VIRTIO_NET_F_CSUM) |
+ BIT_ULL(VIRTIO_NET_F_HOST_TSO4) |
+ BIT_ULL(VIRTIO_NET_F_HOST_TSO6) |
+ BIT_ULL(VIRTIO_NET_F_HOST_ECN) |
+ BIT_ULL(VIRTIO_NET_F_GUEST_UFO) |
+ BIT_ULL(VIRTIO_NET_F_GUEST_CSUM) |
+ BIT_ULL(VIRTIO_NET_F_GUEST_TSO4) |
+ BIT_ULL(VIRTIO_NET_F_GUEST_TSO6) |
+ BIT_ULL(VIRTIO_NET_F_GUEST_ECN) |
+ BIT_ULL(VIRTIO_NET_F_HOST_UFO) |
+ BIT_ULL(VIRTIO_NET_F_HOST_USO) |
+ BIT_ULL(VIRTIO_NET_F_GUEST_USO4) |
+ BIT_ULL(VIRTIO_NET_F_GUEST_USO6) |
+ BIT_ULL(VIRTIO_NET_F_HASH_REPORT),
+ "A requested feature requires the peer to support virtio-net headers.",
+ errp)) &&
+  (peer_has_ufo(n) ||
+   virtio_net_clear_features(&on_off_auto_features,
+ BIT_ULL(VIRTIO_NET_F_GUEST_UFO) |
+ BIT_ULL(VIRTIO_NET_F_HOST_UFO),
+ "UFO

[PATCH 1/3] qdev-properties: Add DEFINE_PROP_ON_OFF_AUTO_BIT64()

2024-04-28 Thread Akihiko Odaki
DEFINE_PROP_ON_OFF_AUTO_BIT64() corresponds to DEFINE_PROP_ON_OFF_AUTO()
as DEFINE_PROP_BIT64() corresponds to DEFINE_PROP_BOOL(). The difference
is that DEFINE_PROP_ON_OFF_AUTO_BIT64() exposes OnOffAuto instead of
bool.

Signed-off-by: Akihiko Odaki 
---
 include/hw/qdev-properties.h | 18 
 hw/core/qdev-properties.c| 65 +++-
 2 files changed, 82 insertions(+), 1 deletion(-)

diff --git a/include/hw/qdev-properties.h b/include/hw/qdev-properties.h
index 09aa04ca1e27..afec53a48470 100644
--- a/include/hw/qdev-properties.h
+++ b/include/hw/qdev-properties.h
@@ -43,11 +43,22 @@ struct PropertyInfo {
 ObjectPropertyRelease *release;
 };
 
+/**
+ * struct OnOffAutoBit64 - OnOffAuto storage with 64 elements.
+ * @on_bits: Bitmap of elements with "on".
+ * @auto_bits: Bitmap of elements with "auto".
+ */
+typedef struct OnOffAutoBit64 {
+uint64_t on_bits;
+uint64_t auto_bits;
+} OnOffAutoBit64;
+
 
 /*** qdev-properties.c ***/
 
 extern const PropertyInfo qdev_prop_bit;
 extern const PropertyInfo qdev_prop_bit64;
+extern const PropertyInfo qdev_prop_on_off_auto_bit64;
 extern const PropertyInfo qdev_prop_bool;
 extern const PropertyInfo qdev_prop_enum;
 extern const PropertyInfo qdev_prop_uint8;
@@ -100,6 +111,13 @@ extern const PropertyInfo qdev_prop_link;
 .set_default = true,  \
 .defval.u  = (bool)_defval)
 
+#define DEFINE_PROP_ON_OFF_AUTO_BIT64(_name, _state, _field, _bit, _defval) \
+DEFINE_PROP(_name, _state, _field, qdev_prop_on_off_auto_bit64, \
+OnOffAutoBit64, \
+.bitnr= (_bit), \
+.set_default = true,\
+.defval.i = (OnOffAuto)_defval)
+
 #define DEFINE_PROP_BOOL(_name, _state, _field, _defval) \
 DEFINE_PROP(_name, _state, _field, qdev_prop_bool, bool, \
 .set_default = true, \
diff --git a/hw/core/qdev-properties.c b/hw/core/qdev-properties.c
index 7d6fa726fdf2..b96f54a1b912 100644
--- a/hw/core/qdev-properties.c
+++ b/hw/core/qdev-properties.c
@@ -188,7 +188,8 @@ const PropertyInfo qdev_prop_bit = {
 
 static uint64_t qdev_get_prop_mask64(Property *prop)
 {
-assert(prop->info == &qdev_prop_bit64);
+assert(prop->info == &qdev_prop_bit64 ||
+   prop->info == &qdev_prop_on_off_auto_bit64);
 return 0x1ull << prop->bitnr;
 }
 
@@ -233,6 +234,68 @@ const PropertyInfo qdev_prop_bit64 = {
 .set_default_value = set_default_value_bool,
 };
 
+static void prop_get_on_off_auto_bit64(Object *obj, Visitor *v,
+   const char *name, void *opaque,
+   Error **errp)
+{
+Property *prop = opaque;
+OnOffAutoBit64 *p = object_field_prop_ptr(obj, prop);
+int value;
+uint64_t mask = qdev_get_prop_mask64(prop);
+
+if (p->auto_bits & mask) {
+value = ON_OFF_AUTO_AUTO;
+} else if (p->on_bits & mask) {
+value = ON_OFF_AUTO_ON;
+} else {
+value = ON_OFF_AUTO_OFF;
+}
+
+visit_type_enum(v, name, &value, &OnOffAuto_lookup, errp);
+}
+
+static void prop_set_on_off_auto_bit64(Object *obj, Visitor *v,
+   const char *name, void *opaque,
+   Error **errp)
+{
+Property *prop = opaque;
+OnOffAutoBit64 *p = object_field_prop_ptr(obj, prop);
+int value;
+uint64_t mask = qdev_get_prop_mask64(prop);
+
+if (!visit_type_enum(v, name, &value, &OnOffAuto_lookup, errp)) {
+return;
+}
+
+switch (value) {
+case ON_OFF_AUTO_AUTO:
+p->on_bits &= ~mask;
+p->auto_bits |= mask;
+break;
+
+case ON_OFF_AUTO_ON:
+p->on_bits |= mask;
+p->auto_bits &= ~mask;
+break;
+
+case ON_OFF_AUTO_OFF:
+p->on_bits &= ~mask;
+p->auto_bits &= ~mask;
+break;
+}
+}
+
+const PropertyInfo qdev_prop_on_off_auto_bit64 = {
+.name  = "bool",
+.description = "on/off/auto",
+.enum_table = &OnOffAuto_lookup,
+.get = qdev_propinfo_get_enum,
+.set = qdev_propinfo_set_enum,
+.get = prop_get_on_off_auto_bit64,
+.set = prop_set_on_off_auto_bit64,
+.set_default_value = qdev_propinfo_set_default_value_enum,
+};
+
 /* --- bool --- */
 
 static void get_bool(Object *obj, Visitor *v, const char *name, void *opaque,

-- 
2.44.0




[PATCH 0/3] virtio-net: Convert feature properties to OnOffAuto

2024-04-28 Thread Akihiko Odaki
Based-on: <20240428-rss-v10-0-73cbaa91a...@daynix.com>
("[PATCH v10 00/18] virtio-net RSS/hash report fixes and improvements")

Some features are not always available, and virtio-net used to disable
them when not available even if the corresponding properties were
explicitly set to "on".

Convert feature properties to OnOffAuto so that the user can explicitly
tell QEMU to automatically select the value by setting them to "auto".
QEMU will give an error if a feature is set to "on" but cannot be
enabled.

Signed-off-by: Akihiko Odaki 
---
Akihiko Odaki (3):
  qdev-properties: Add DEFINE_PROP_ON_OFF_AUTO_BIT64()
  virtio-net: Convert feature properties to OnOffAuto
  virtio-net: Report RSS warning at device realization

 include/hw/qdev-properties.h   |  18 +++
 include/hw/virtio/virtio-net.h |   2 +-
 hw/core/qdev-properties.c  |  65 ++-
 hw/net/virtio-net.c| 259 +
 4 files changed, 239 insertions(+), 105 deletions(-)
---
base-commit: ec6325eec995018983a3f88f0e78ebf733a47b7e
change-id: 20240428-auto-be0dc010dda5

Best regards,
-- 
Akihiko Odaki 




[PATCH v10 12/18] virtio-net: Unify the logic to update NIC state for RSS

2024-04-28 Thread Akihiko Odaki
The code to attach or detach the eBPF program for RSS was duplicated,
so unify it into one function to save some code.
Signed-off-by: Akihiko Odaki 
---
 hw/net/virtio-net.c | 90 +
 1 file changed, 36 insertions(+), 54 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 1ac9c06f6865..61b49e335dea 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -1232,18 +1232,6 @@ static int virtio_net_handle_announce(VirtIONet *n, 
uint8_t cmd,
 }
 }
 
-static void virtio_net_detach_epbf_rss(VirtIONet *n);
-
-static void virtio_net_disable_rss(VirtIONet *n)
-{
-if (n->rss_data.enabled) {
-trace_virtio_net_rss_disable();
-}
-n->rss_data.enabled = false;
-
-virtio_net_detach_epbf_rss(n);
-}
-
 static bool virtio_net_attach_ebpf_to_backend(NICState *nic, int prog_fd)
 {
 NetClientState *nc = qemu_get_peer(qemu_get_queue(nic), 0);
@@ -1291,6 +1279,40 @@ static void virtio_net_detach_epbf_rss(VirtIONet *n)
 virtio_net_attach_ebpf_to_backend(n->nic, -1);
 }
 
+static void virtio_net_commit_rss_config(VirtIONet *n)
+{
+if (n->rss_data.enabled) {
+n->rss_data.enabled_software_rss = n->rss_data.populate_hash;
+if (n->rss_data.populate_hash) {
+virtio_net_detach_epbf_rss(n);
+} else if (!virtio_net_attach_epbf_rss(n)) {
+if (get_vhost_net(qemu_get_queue(n->nic)->peer)) {
+warn_report("Can't load eBPF RSS for vhost");
+} else {
+warn_report("Can't load eBPF RSS - fallback to software RSS");
+n->rss_data.enabled_software_rss = true;
+}
+}
+
+trace_virtio_net_rss_enable(n->rss_data.hash_types,
+n->rss_data.indirections_len,
+sizeof(n->rss_data.key));
+} else {
+virtio_net_detach_epbf_rss(n);
+trace_virtio_net_rss_disable();
+}
+}
+
+static void virtio_net_disable_rss(VirtIONet *n)
+{
+if (!n->rss_data.enabled) {
+return;
+}
+
+n->rss_data.enabled = false;
+virtio_net_commit_rss_config(n);
+}
+
 static bool virtio_net_load_ebpf_fds(VirtIONet *n)
 {
 int fds[EBPF_RSS_MAX_FDS] = { [0 ... EBPF_RSS_MAX_FDS - 1] = -1};
@@ -1455,28 +1477,7 @@ static uint16_t virtio_net_handle_rss(VirtIONet *n,
 goto error;
 }
 n->rss_data.enabled = true;
-
-if (!n->rss_data.populate_hash) {
-if (!virtio_net_attach_epbf_rss(n)) {
-/* EBPF must be loaded for vhost */
-if (get_vhost_net(qemu_get_queue(n->nic)->peer)) {
-warn_report("Can't load eBPF RSS for vhost");
-goto error;
-}
-/* fallback to software RSS */
-warn_report("Can't load eBPF RSS - fallback to software RSS");
-n->rss_data.enabled_software_rss = true;
-}
-} else {
-/* use software RSS for hash populating */
-/* and detach eBPF if was loaded before */
-virtio_net_detach_epbf_rss(n);
-n->rss_data.enabled_software_rss = true;
-}
-
-trace_virtio_net_rss_enable(n->rss_data.hash_types,
-n->rss_data.indirections_len,
-temp.b);
+virtio_net_commit_rss_config(n);
 return queue_pairs;
 error:
 trace_virtio_net_rss_error(err_msg, err_value);
@@ -3092,26 +3093,7 @@ static int virtio_net_post_load_device(void *opaque, int version_id)
 }
 }
 
-if (n->rss_data.enabled) {
-n->rss_data.enabled_software_rss = n->rss_data.populate_hash;
-if (!n->rss_data.populate_hash) {
-if (!virtio_net_attach_epbf_rss(n)) {
-if (get_vhost_net(qemu_get_queue(n->nic)->peer)) {
-warn_report("Can't post-load eBPF RSS for vhost");
-} else {
-warn_report("Can't post-load eBPF RSS - "
-"fallback to software RSS");
-n->rss_data.enabled_software_rss = true;
-}
-}
-}
-
-trace_virtio_net_rss_enable(n->rss_data.hash_types,
-n->rss_data.indirections_len,
-sizeof(n->rss_data.key));
-} else {
-trace_virtio_net_rss_disable();
-}
+virtio_net_commit_rss_config(n);
 return 0;
 }
 

-- 
2.44.0




[PATCH v10 10/18] virtio-net: Shrink header byte swapping buffer

2024-04-28 Thread Akihiko Odaki
Byte swapping is only performed for the part of the header shared with
the legacy standard, so the buffer only needs to cover that part.

Signed-off-by: Akihiko Odaki 
---
 hw/net/virtio-net.c | 17 ++---
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index ca0fbf7b7654..5aa0527a1921 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -676,11 +676,6 @@ static void virtio_net_set_mrg_rx_bufs(VirtIONet *n, int mergeable_rx_bufs,
 
 n->mergeable_rx_bufs = mergeable_rx_bufs;
 
-/*
- * Note: when extending the vnet header, please make sure to
- * change the vnet header copying logic in virtio_net_flush_tx()
- * as well.
- */
 if (version_1) {
 n->guest_hdr_len = hash_report ?
 sizeof(struct virtio_net_hdr_v1_hash) :
@@ -2752,7 +2747,7 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q)
 ssize_t ret;
 unsigned int out_num;
 struct iovec sg[VIRTQUEUE_MAX_SIZE], sg2[VIRTQUEUE_MAX_SIZE + 1], *out_sg;
-struct virtio_net_hdr_v1_hash vhdr;
+struct virtio_net_hdr vhdr;
 
 elem = virtqueue_pop(q->tx_vq, sizeof(VirtQueueElement));
 if (!elem) {
@@ -2769,18 +2764,18 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q)
 }
 
 if (n->needs_vnet_hdr_swap) {
-if (iov_to_buf(out_sg, out_num, 0, &vhdr, n->guest_hdr_len) <
-n->guest_hdr_len) {
+if (iov_to_buf(out_sg, out_num, 0, &vhdr, sizeof(vhdr)) <
+sizeof(vhdr)) {
 virtio_error(vdev, "virtio-net header incorrect");
 virtqueue_detach_element(q->tx_vq, elem, 0);
 g_free(elem);
 return -EINVAL;
 }
-virtio_net_hdr_swap(vdev, (void *) &vhdr);
+virtio_net_hdr_swap(vdev, &vhdr);
 sg2[0].iov_base = &vhdr;
-sg2[0].iov_len = n->guest_hdr_len;
+sg2[0].iov_len = sizeof(vhdr);
 out_num = iov_copy(&sg2[1], ARRAY_SIZE(sg2) - 1, out_sg, out_num,
-   n->guest_hdr_len, -1);
+   sizeof(vhdr), -1);
 if (out_num == VIRTQUEUE_MAX_SIZE) {
 goto drop;
 }

-- 
2.44.0




[PATCH v10 07/18] virtio-net: Do not propagate ebpf-rss-fds errors

2024-04-28 Thread Akihiko Odaki
Propagating ebpf-rss-fds errors has several problems.

First, it makes device realization fail and disables the fallback to the
conventional eBPF loading.

Second, it leaks memory by making device realization fail without
freeing memory already allocated.

Third, the convention is to set an error when a function returns false,
but virtio_net_load_ebpf_fds() and virtio_net_load_ebpf() return false
without setting an error, which is confusing.

Remove the propagation to fix these problems.

Fixes: 0524ea0510a3 ("ebpf: Added eBPF initialization by fds.")
Signed-off-by: Akihiko Odaki 
---
 hw/net/virtio-net.c | 23 ++-
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index f6112c0ac97d..8ede38aadbbe 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -1329,24 +1329,22 @@ static void virtio_net_detach_epbf_rss(VirtIONet *n)
 virtio_net_attach_ebpf_to_backend(n->nic, -1);
 }
 
-static bool virtio_net_load_ebpf_fds(VirtIONet *n, Error **errp)
+static bool virtio_net_load_ebpf_fds(VirtIONet *n)
 {
 int fds[EBPF_RSS_MAX_FDS] = { [0 ... EBPF_RSS_MAX_FDS - 1] = -1};
 int ret = true;
 int i = 0;
 
-ERRP_GUARD();
-
 if (n->nr_ebpf_rss_fds != EBPF_RSS_MAX_FDS) {
-error_setg(errp,
-  "Expected %d file descriptors but got %d",
-  EBPF_RSS_MAX_FDS, n->nr_ebpf_rss_fds);
+warn_report("Expected %d file descriptors but got %d",
+EBPF_RSS_MAX_FDS, n->nr_ebpf_rss_fds);
return false;
}
 
 for (i = 0; i < n->nr_ebpf_rss_fds; i++) {
-fds[i] = monitor_fd_param(monitor_cur(), n->ebpf_rss_fds[i], errp);
-if (*errp) {
+fds[i] = monitor_fd_param(monitor_cur(), n->ebpf_rss_fds[i],
+  &error_warn);
+if (fds[i] < 0) {
 ret = false;
 goto exit;
 }
@@ -1355,7 +1353,7 @@ static bool virtio_net_load_ebpf_fds(VirtIONet *n, Error **errp)
 ret = ebpf_rss_load_fds(&n->ebpf_rss, fds[0], fds[1], fds[2], fds[3]);
 
 exit:
-if (!ret || *errp) {
+if (!ret) {
 for (i = 0; i < n->nr_ebpf_rss_fds && fds[i] != -1; i++) {
 close(fds[i]);
 }
@@ -1364,13 +1362,12 @@ exit:
 return ret;
 }
 
-static bool virtio_net_load_ebpf(VirtIONet *n, Error **errp)
+static bool virtio_net_load_ebpf(VirtIONet *n)
 {
 bool ret = false;
 
 if (virtio_net_attach_ebpf_to_backend(n->nic, -1)) {
-if (!(n->ebpf_rss_fds
-&& virtio_net_load_ebpf_fds(n, errp))) {
+if (!(n->ebpf_rss_fds && virtio_net_load_ebpf_fds(n))) {
 ret = ebpf_rss_load(&n->ebpf_rss);
 }
 }
@@ -3825,7 +3822,7 @@ static void virtio_net_device_realize(DeviceState *dev, Error **errp)
 net_rx_pkt_init(>rx_pkt);
 
 if (virtio_has_feature(n->host_features, VIRTIO_NET_F_RSS)) {
-virtio_net_load_ebpf(n, errp);
+virtio_net_load_ebpf(n);
 }
 }
 

-- 
2.44.0




[PATCH v10 16/18] ebpf: Return 0 when configuration fails

2024-04-28 Thread Akihiko Odaki
The kernel interprets the returned value as an unsigned 32-bit integer,
so -1 will mean queue 4294967295, which is awkward. Return 0 instead.

Signed-off-by: Akihiko Odaki 
---
 ebpf/rss.bpf.skeleton.h | 1532 +++
 tools/ebpf/rss.bpf.c|2 +-
 2 files changed, 767 insertions(+), 767 deletions(-)

diff --git a/ebpf/rss.bpf.skeleton.h b/ebpf/rss.bpf.skeleton.h
index e41ed8890191..647212e5dd0c 100644
--- a/ebpf/rss.bpf.skeleton.h
+++ b/ebpf/rss.bpf.skeleton.h
@@ -178,786 +178,786 @@ static inline const void *rss_bpf__elf_bytes(size_t *sz)
 {
static const char data[] __attribute__((__aligned__(8))) = "\
 \x7f\x45\x4c\x46\x02\x01\x01\0\0\0\0\0\0\0\0\0\x01\0\xf7\0\x01\0\0\0\0\0\0\0\0\
-\0\0\0\0\0\0\0\0\0\0\0\xb8\x4b\0\0\0\0\0\0\0\0\0\0\x40\0\0\0\0\0\x40\0\x0d\0\
-\x01\0\xbf\x19\0\0\0\0\0\0\xb7\x01\0\0\0\0\0\0\x63\x1a\x54\xff\0\0\0\0\xbf\xa7\
-\0\0\0\0\0\0\x07\x07\0\0\x54\xff\xff\xff\x18\x01\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
-\xbf\x72\0\0\0\0\0\0\x85\0\0\0\x01\0\0\0\xbf\x06\0\0\0\0\0\0\x18\x01\0\0\0\0\0\
-\0\0\0\0\0\0\0\0\0\xbf\x72\0\0\0\0\0\0\x85\0\0\0\x01\0\0\0\xbf\x07\0\0\0\0\0\0\
-\x18\0\0\0\xff\xff\xff\xff\0\0\0\0\0\0\0\0\x15\x06\x4f\x02\0\0\0\0\xbf\x78\0\0\
-\0\0\0\0\x15\x08\x4d\x02\0\0\0\0\x71\x61\0\0\0\0\0\0\x55\x01\x01\0\0\0\0\0\x05\
-\0\x46\x02\0\0\0\0\xb7\x01\0\0\0\0\0\0\x63\x1a\xc8\xff\0\0\0\0\x7b\x1a\xc0\xff\
-\0\0\0\0\x7b\x1a\xb8\xff\0\0\0\0\x7b\x1a\xb0\xff\0\0\0\0\x7b\x1a\xa8\xff\0\0\0\
-\0\x63\x1a\xa0\xff\0\0\0\0\x7b\x1a\x98\xff\0\0\0\0\x7b\x1a\x90\xff\0\0\0\0\x7b\
-\x1a\x88\xff\0\0\0\0\x7b\x1a\x80\xff\0\0\0\0\x7b\x1a\x78\xff\0\0\0\0\x7b\x1a\
-\x70\xff\0\0\0\0\x7b\x1a\x68\xff\0\0\0\0\x7b\x1a\x60\xff\0\0\0\0\x7b\x1a\x58\
-\xff\0\0\0\0\x15\x09\x35\x02\0\0\0\0\x6b\x1a\xd0\xff\0\0\0\0\xbf\xa3\0\0\0\0\0\
-\0\x07\x03\0\0\xd0\xff\xff\xff\xbf\x91\0\0\0\0\0\0\xb7\x02\0\0\x0c\0\0\0\xb7\
-\x04\0\0\x02\0\0\0\xb7\x05\0\0\0\0\0\0\x85\0\0\0\x44\0\0\0\x67\0\0\0\x20\0\0\0\
-\x77\0\0\0\x20\0\0\0\x55\0\x2a\x02\0\0\0\0\xb7\x02\0\0\x10\0\0\0\x69\xa1\xd0\
-\xff\0\0\0\0\xbf\x13\0\0\0\0\0\0\xdc\x03\0\0\x10\0\0\0\x15\x03\x02\0\0\x81\0\0\
-\x55\x03\x0b\0\xa8\x88\0\0\xb7\x02\0\0\x14\0\0\0\xbf\xa3\0\0\0\0\0\0\x07\x03\0\
-\0\xd0\xff\xff\xff\xbf\x91\0\0\0\0\0\0\xb7\x04\0\0\x02\0\0\0\xb7\x05\0\0\0\0\0\
-\0\x85\0\0\0\x44\0\0\0\x67\0\0\0\x20\0\0\0\x77\0\0\0\x20\0\0\0\x55\0\x1a\x02\0\
-\0\0\0\x69\xa1\xd0\xff\0\0\0\0\x15\x01\x18\x02\0\0\0\0\x15\x01\x21\0\x86\xdd\0\
-\0\x7b\x9a\x48\xff\0\0\0\0\x55\x01\xf6\0\x08\0\0\0\xb7\x01\0\0\x01\0\0\0\x73\
-\x1a\x58\xff\0\0\0\0\xb7\x01\0\0\0\0\0\0\x63\x1a\xe0\xff\0\0\0\0\x7b\x1a\xd8\
-\xff\0\0\0\0\x7b\x1a\xd0\xff\0\0\0\0\xbf\xa3\0\0\0\0\0\0\x07\x03\0\0\xd0\xff\
-\xff\xff\x79\xa1\x48\xff\0\0\0\0\xb7\x02\0\0\0\0\0\0\xb7\x04\0\0\x14\0\0\0\xb7\
-\x05\0\0\x01\0\0\0\x85\0\0\0\x44\0\0\0\x67\0\0\0\x20\0\0\0\x77\0\0\0\x20\0\0\0\
-\x55\0\x05\x02\0\0\0\0\x69\xa1\xd6\xff\0\0\0\0\x57\x01\0\0\x3f\xff\0\0\xb7\x04\
-\0\0\x01\0\0\0\x55\x01\x01\0\0\0\0\0\xb7\x04\0\0\0\0\0\0\x61\xa1\xdc\xff\0\0\0\
-\0\x63\x1a\x64\xff\0\0\0\0\x61\xa1\xe0\xff\0\0\0\0\x63\x1a\x68\xff\0\0\0\0\x71\
-\xa9\xd9\xff\0\0\0\0\x71\xa2\xd0\xff\0\0\0\0\x67\x02\0\0\x02\0\0\0\x57\x02\0\0\
-\x3c\0\0\0\x73\x4a\x5e\xff\0\0\0\0\x05\0\xbc\0\0\0\0\0\xb7\x01\0\0\x01\0\0\0\
-\x73\x1a\x59\xff\0\0\0\0\xb7\x01\0\0\0\0\0\0\x7b\x1a\xf0\xff\0\0\0\0\x7b\x1a\
-\xe8\xff\0\0\0\0\x7b\x1a\xe0\xff\0\0\0\0\x7b\x1a\xd8\xff\0\0\0\0\x7b\x1a\xd0\
-\xff\0\0\0\0\xbf\xa3\0\0\0\0\0\0\x07\x03\0\0\xd0\xff\xff\xff\xbf\x91\0\0\0\0\0\
-\0\xb7\x02\0\0\0\0\0\0\xb7\x04\0\0\x28\0\0\0\xb7\x05\0\0\x01\0\0\0\x85\0\0\0\
-\x44\0\0\0\x67\0\0\0\x20\0\0\0\x77\0\0\0\x20\0\0\0\x55\0\xe4\x01\0\0\0\0\xb7\
-\x03\0\0\x28\0\0\0\x7b\x9a\x48\xff\0\0\0\0\x79\xa1\xe0\xff\0\0\0\0\x63\x1a\x6c\
-\xff\0\0\0\0\x77\x01\0\0\x20\0\0\0\x63\x1a\x70\xff\0\0\0\0\x79\xa1\xd8\xff\0\0\
-\0\0\x63\x1a\x64\xff\0\0\0\0\x77\x01\0\0\x20\0\0\0\x63\x1a\x68\xff\0\0\0\0\x79\
-\xa1\xe8\xff\0\0\0\0\x63\x1a\x74\xff\0\0\0\0\x77\x01\0\0\x20\0\0\0\x63\x1a\x78\
-\xff\0\0\0\0\x79\xa1\xf0\xff\0\0\0\0\x63\x1a\x7c\xff\0\0\0\0\x77\x01\0\0\x20\0\
-\0\0\x63\x1a\x80\xff\0\0\0\0\x71\xa9\xd6\xff\0\0\0\0\x25\x09\x93\0\x3c\0\0\0\
-\xb7\x01\0\0\x01\0\0\0\x6f\x91\0\0\0\0\0\0\x18\x02\0\0\x01\0\0\0\0\0\0\0\0\x18\
-\0\x1c\x5f\x21\0\0\0\0\0\0\x55\x01\x01\0\0\0\0\0\x05\0\x8c\0\0\0\0\0\xb7\x01\0\
-\0\0\0\0\0\x6b\x1a\xfe\xff\0\0\0\0\xb7\x02\0\0\x28\0\0\0\xbf\xa1\0\0\0\0\0\0\
-\x07\x01\0\0\x94\xff\xff\xff\x7b\x1a\x20\xff\0\0\0\0\xbf\xa1\0\0\0\0\0\0\x07\
-\x01\0\0\x84\xff\xff\xff\x7b\x1a\x18\xff\0\0\0\0\xb7\x01\0\0\0\0\0\0\x7b\x1a\
-\x38\xff\0\0\0\0\x7b\x7a\x30\xff\0\0\0\0\x7b\x8a\x28\xff\0\0\0\0\xbf\xa3\0\0\0\
-\0\0\0\x07\x03\0\0\xfe\xff\xff\xff\x79\xa1\x48\xff\0\0\0\0\x7b\x2a\x40\xff\0\0\
-\0\0\xb7\x04\0\0\x02\0\0\0\xb7\x05\0\0\x01\0\0\0\x85\0\0\0\x44\0\0\0\x67\0\0\0\
-\x20\0\0\0\x77\0\0\0\x20\0\0\0\x55\0\xb2\x01\0\0\0\0\xbf\x91\0\0\0\0\0\0\x15\
-\x01\x22\0\x3c\0\0\0\x15\x01\x58\0\x2c\0\0\0\x79\xa2\x40\xff\0\0\0\0\x55\x01\
-\x59\0\x2b\0\0\0\xb7\x01\0\0\0\0\0\0\x63\x1a\xf8\xff\0\0\0\0\xbf\xa3\0\0\0\0\0\
-\

[PATCH v10 11/18] virtio-net: Disable RSS on reset

2024-04-28 Thread Akihiko Odaki
RSS is disabled by default.

Fixes: 590790297c ("virtio-net: implement RSS configuration command")
Signed-off-by: Akihiko Odaki 
Reviewed-by: Michael Tokarev 
---
 hw/net/virtio-net.c | 70 +++--
 1 file changed, 36 insertions(+), 34 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 5aa0527a1921..1ac9c06f6865 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -600,40 +600,6 @@ static void virtio_net_queue_enable(VirtIODevice *vdev, uint32_t queue_index)
 }
 }
 
-static void virtio_net_reset(VirtIODevice *vdev)
-{
-VirtIONet *n = VIRTIO_NET(vdev);
-int i;
-
-/* Reset back to compatibility mode */
-n->promisc = 1;
-n->allmulti = 0;
-n->alluni = 0;
-n->nomulti = 0;
-n->nouni = 0;
-n->nobcast = 0;
-/* multiqueue is disabled by default */
-n->curr_queue_pairs = 1;
-timer_del(n->announce_timer.tm);
-n->announce_timer.round = 0;
-n->status &= ~VIRTIO_NET_S_ANNOUNCE;
-
-/* Flush any MAC and VLAN filter table state */
-n->mac_table.in_use = 0;
-n->mac_table.first_multi = 0;
-n->mac_table.multi_overflow = 0;
-n->mac_table.uni_overflow = 0;
-memset(n->mac_table.macs, 0, MAC_TABLE_ENTRIES * ETH_ALEN);
-memcpy(&n->mac[0], &n->nic->conf->macaddr, sizeof(n->mac));
-qemu_format_nic_info_str(qemu_get_queue(n->nic), n->mac);
-memset(n->vlans, 0, MAX_VLAN >> 3);
-
-/* Flush any async TX */
-for (i = 0;  i < n->max_queue_pairs; i++) {
-flush_or_purge_queued_packets(qemu_get_subqueue(n->nic, i));
-}
-}
-
 static void peer_test_vnet_hdr(VirtIONet *n)
 {
 NetClientState *nc = qemu_get_queue(n->nic);
@@ -3861,6 +3827,42 @@ static void virtio_net_device_unrealize(DeviceState *dev)
 virtio_cleanup(vdev);
 }
 
+static void virtio_net_reset(VirtIODevice *vdev)
+{
+VirtIONet *n = VIRTIO_NET(vdev);
+int i;
+
+/* Reset back to compatibility mode */
+n->promisc = 1;
+n->allmulti = 0;
+n->alluni = 0;
+n->nomulti = 0;
+n->nouni = 0;
+n->nobcast = 0;
+/* multiqueue is disabled by default */
+n->curr_queue_pairs = 1;
+timer_del(n->announce_timer.tm);
+n->announce_timer.round = 0;
+n->status &= ~VIRTIO_NET_S_ANNOUNCE;
+
+/* Flush any MAC and VLAN filter table state */
+n->mac_table.in_use = 0;
+n->mac_table.first_multi = 0;
+n->mac_table.multi_overflow = 0;
+n->mac_table.uni_overflow = 0;
+memset(n->mac_table.macs, 0, MAC_TABLE_ENTRIES * ETH_ALEN);
+memcpy(&n->mac[0], &n->nic->conf->macaddr, sizeof(n->mac));
+qemu_format_nic_info_str(qemu_get_queue(n->nic), n->mac);
+memset(n->vlans, 0, MAX_VLAN >> 3);
+
+/* Flush any async TX */
+for (i = 0;  i < n->max_queue_pairs; i++) {
+flush_or_purge_queued_packets(qemu_get_subqueue(n->nic, i));
+}
+
+virtio_net_disable_rss(n);
+}
+
 static void virtio_net_instance_init(Object *obj)
 {
 VirtIONet *n = VIRTIO_NET(obj);

-- 
2.44.0




[PATCH v10 17/18] ebpf: Refactor tun_rss_steering_prog()

2024-04-28 Thread Akihiko Odaki
This saves branches and makes later BPF program changes easier.

Signed-off-by: Akihiko Odaki 
---
 tools/ebpf/rss.bpf.c | 26 +++---
 1 file changed, 11 insertions(+), 15 deletions(-)

diff --git a/tools/ebpf/rss.bpf.c b/tools/ebpf/rss.bpf.c
index 77434435ac15..c989cb3cd82c 100644
--- a/tools/ebpf/rss.bpf.c
+++ b/tools/ebpf/rss.bpf.c
@@ -547,27 +547,23 @@ int tun_rss_steering_prog(struct __sk_buff *skb)
 config = bpf_map_lookup_elem(&tap_rss_map_configurations, &key);
 toe = bpf_map_lookup_elem(&tap_rss_map_toeplitz_key, &key);
 
-if (config && toe) {
-if (!config->redirect) {
-return config->default_queue;
-}
+if (!config || !toe) {
+return 0;
+}
 
-if (calculate_rss_hash(skb, config, toe, &hash)) {
-__u32 table_idx = hash % config->indirections_len;
-__u16 *queue = 0;
+if (config->redirect && calculate_rss_hash(skb, config, toe, &hash)) {
+__u32 table_idx = hash % config->indirections_len;
+__u16 *queue = 0;
 
-queue = bpf_map_lookup_elem(&tap_rss_map_indirection_table,
-&table_idx);
+queue = bpf_map_lookup_elem(&tap_rss_map_indirection_table,
+&table_idx);
 
-if (queue) {
-return *queue;
-}
+if (queue) {
+return *queue;
 }
-
-return config->default_queue;
 }
 
-return 0;
+return config->default_queue;
 }
 
 char _license[] SEC("license") = "GPL v2";

-- 
2.44.0




[PATCH v10 14/18] virtio-net: Do not write hashes to peer buffer

2024-04-28 Thread Akihiko Odaki
The peer buffer is qualified with const and is not meant to be modified.

Signed-off-by: Akihiko Odaki 
---
 hw/net/virtio-net.c | 36 +---
 1 file changed, 17 insertions(+), 19 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 527aac3a0465..c8059dc99bd4 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -1830,16 +1830,9 @@ static uint8_t virtio_net_get_hash_type(bool hasip4,
 return 0xff;
 }
 
-static void virtio_set_packet_hash(const uint8_t *buf, uint8_t report,
-   uint32_t hash)
-{
-struct virtio_net_hdr_v1_hash *hdr = (void *)buf;
-hdr->hash_value = hash;
-hdr->hash_report = report;
-}
-
 static int virtio_net_process_rss(NetClientState *nc, const uint8_t *buf,
-  size_t size)
+  size_t size,
+  struct virtio_net_hdr_v1_hash *hdr)
 {
 VirtIONet *n = qemu_get_nic_opaque(nc);
 unsigned int index = nc->queue_index, new_index = index;
@@ -1870,7 +1863,8 @@ static int virtio_net_process_rss(NetClientState *nc, const uint8_t *buf,
  n->rss_data.hash_types);
 if (net_hash_type > NetPktRssIpV6UdpEx) {
 if (n->rss_data.populate_hash) {
-virtio_set_packet_hash(buf, VIRTIO_NET_HASH_REPORT_NONE, 0);
+hdr->hash_value = VIRTIO_NET_HASH_REPORT_NONE;
+hdr->hash_report = 0;
 }
 return n->rss_data.redirect ? n->rss_data.default_queue : -1;
 }
@@ -1878,7 +1872,8 @@ static int virtio_net_process_rss(NetClientState *nc, const uint8_t *buf,
 hash = net_rx_pkt_calc_rss_hash(pkt, net_hash_type, n->rss_data.key);
 
 if (n->rss_data.populate_hash) {
-virtio_set_packet_hash(buf, reports[net_hash_type], hash);
+hdr->hash_value = hash;
+hdr->hash_report = reports[net_hash_type];
 }
 
 if (n->rss_data.redirect) {
@@ -1898,7 +1893,7 @@ static ssize_t virtio_net_receive_rcu(NetClientState *nc, const uint8_t *buf,
 VirtQueueElement *elems[VIRTQUEUE_MAX_SIZE];
 size_t lens[VIRTQUEUE_MAX_SIZE];
 struct iovec mhdr_sg[VIRTQUEUE_MAX_SIZE];
-struct virtio_net_hdr_mrg_rxbuf mhdr;
+struct virtio_net_hdr_v1_hash extra_hdr;
 unsigned mhdr_cnt = 0;
 size_t offset, i, guest_offset, j;
 ssize_t err;
@@ -1908,7 +1903,7 @@ static ssize_t virtio_net_receive_rcu(NetClientState *nc, const uint8_t *buf,
 }
 
 if (!no_rss && n->rss_data.enabled && n->rss_data.enabled_software_rss) {
-int index = virtio_net_process_rss(nc, buf, size);
+int index = virtio_net_process_rss(nc, buf, size, &extra_hdr);
 if (index >= 0) {
 NetClientState *nc2 = qemu_get_subqueue(n->nic, index);
 return virtio_net_receive_rcu(nc2, buf, size, true);
@@ -1968,15 +1963,17 @@ static ssize_t virtio_net_receive_rcu(NetClientState *nc, const uint8_t *buf,
 if (n->mergeable_rx_bufs) {
 mhdr_cnt = iov_copy(mhdr_sg, ARRAY_SIZE(mhdr_sg),
 sg, elem->in_num,
-offsetof(typeof(mhdr), num_buffers),
-sizeof(mhdr.num_buffers));
+offsetof(typeof(extra_hdr), hdr.num_buffers),
+sizeof(extra_hdr.hdr.num_buffers));
 }
 
 receive_header(n, sg, elem->in_num, buf, size);
 if (n->rss_data.populate_hash) {
-offset = sizeof(mhdr);
+offset = offsetof(typeof(extra_hdr), hash_value);
 iov_from_buf(sg, elem->in_num, offset,
- buf + offset, n->host_hdr_len - sizeof(mhdr));
+ (char *)&extra_hdr + offset,
+ sizeof(extra_hdr.hash_value) +
+ sizeof(extra_hdr.hash_report));
 }
 offset = n->host_hdr_len;
 total += n->guest_hdr_len;
@@ -2022,10 +2019,11 @@ static ssize_t virtio_net_receive_rcu(NetClientState *nc, const uint8_t *buf,
 }
 
 if (mhdr_cnt) {
-virtio_stw_p(vdev, &mhdr.num_buffers, i);
+virtio_stw_p(vdev, &extra_hdr.hdr.num_buffers, i);
 iov_from_buf(mhdr_sg, mhdr_cnt,
  0,
- &mhdr.num_buffers, sizeof mhdr.num_buffers);
+ &extra_hdr.hdr.num_buffers,
+ sizeof extra_hdr.hdr.num_buffers);
 }
 
 for (j = 0; j < i; j++) {

-- 
2.44.0




[PATCH v10 06/18] tap: Shrink zeroed virtio-net header

2024-04-28 Thread Akihiko Odaki
tap prepends a zeroed virtio-net header when writing a packet to a
tap with virtio-net header enabled but not in use. This only happens
when s->host_vnet_hdr_len == sizeof(struct virtio_net_hdr).

Signed-off-by: Akihiko Odaki 
---
 net/tap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/tap.c b/net/tap.c
index 9825518ff1f3..51f7aec39d9e 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -119,7 +119,7 @@ static ssize_t tap_receive_iov(NetClientState *nc, const struct iovec *iov,
 TAPState *s = DO_UPCAST(TAPState, nc, nc);
 const struct iovec *iovp = iov;
 g_autofree struct iovec *iov_copy = NULL;
-struct virtio_net_hdr_mrg_rxbuf hdr = { };
+struct virtio_net_hdr hdr = { };
 
 if (s->host_vnet_hdr_len && !s->using_vnet_hdr) {
 iov_copy = g_new(struct iovec, iovcnt + 1);

-- 
2.44.0




[PATCH v10 15/18] ebpf: Fix RSS error handling

2024-04-28 Thread Akihiko Odaki
calculate_rss_hash() was using hash value 0 to tell whether it had
calculated a hash, but the hash value may legitimately be 0 on a rare
occasion. Use a distinct bool value for correctness.

Fixes: f3fa412de2 ("ebpf: Added eBPF RSS program.")
Signed-off-by: Akihiko Odaki 
---
 ebpf/rss.bpf.skeleton.h | 1210 +++
 tools/ebpf/rss.bpf.c|   20 +-
 2 files changed, 610 insertions(+), 620 deletions(-)

diff --git a/ebpf/rss.bpf.skeleton.h b/ebpf/rss.bpf.skeleton.h
index aed4ef9a0335..e41ed8890191 100644
--- a/ebpf/rss.bpf.skeleton.h
+++ b/ebpf/rss.bpf.skeleton.h
@@ -165,7 +165,7 @@ rss_bpf__create_skeleton(struct rss_bpf *obj)
s->progs[0].prog = >progs.tun_rss_steering_prog;
s->progs[0].link = >links.tun_rss_steering_prog;
 
-   s->data = (void *)rss_bpf__elf_bytes(>data_sz);
+   s->data = rss_bpf__elf_bytes(>data_sz);
 
obj->skeleton = s;
return 0;
@@ -176,194 +176,188 @@ err:
 
 static inline const void *rss_bpf__elf_bytes(size_t *sz)
 {
-   *sz = 20600;
-   return (const void *)"\
+   static const char data[] __attribute__((__aligned__(8))) = "\
 \x7f\x45\x4c\x46\x02\x01\x01\0\0\0\0\0\0\0\0\0\x01\0\xf7\0\x01\0\0\0\0\0\0\0\0\
-\0\0\0\0\0\0\0\0\0\0\0\x38\x4d\0\0\0\0\0\0\0\0\0\0\x40\0\0\0\0\0\x40\0\x0d\0\
-\x01\0\xbf\x19\0\0\0\0\0\0\xb7\x01\0\0\0\0\0\0\x63\x1a\x4c\xff\0\0\0\0\xbf\xa7\
-\0\0\0\0\0\0\x07\x07\0\0\x4c\xff\xff\xff\x18\x01\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
+\0\0\0\0\0\0\0\0\0\0\0\xb8\x4b\0\0\0\0\0\0\0\0\0\0\x40\0\0\0\0\0\x40\0\x0d\0\
+\x01\0\xbf\x19\0\0\0\0\0\0\xb7\x01\0\0\0\0\0\0\x63\x1a\x54\xff\0\0\0\0\xbf\xa7\
+\0\0\0\0\0\0\x07\x07\0\0\x54\xff\xff\xff\x18\x01\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
 \xbf\x72\0\0\0\0\0\0\x85\0\0\0\x01\0\0\0\xbf\x06\0\0\0\0\0\0\x18\x01\0\0\0\0\0\
 \0\0\0\0\0\0\0\0\0\xbf\x72\0\0\0\0\0\0\x85\0\0\0\x01\0\0\0\xbf\x07\0\0\0\0\0\0\
-\x18\0\0\0\xff\xff\xff\xff\0\0\0\0\0\0\0\0\x15\x06\x61\x02\0\0\0\0\xbf\x78\0\0\
-\0\0\0\0\x15\x08\x5f\x02\0\0\0\0\x71\x61\0\0\0\0\0\0\x55\x01\x01\0\0\0\0\0\x05\
-\0\x58\x02\0\0\0\0\xb7\x01\0\0\0\0\0\0\x63\x1a\xc0\xff\0\0\0\0\x7b\x1a\xb8\xff\
-\0\0\0\0\x7b\x1a\xb0\xff\0\0\0\0\x7b\x1a\xa8\xff\0\0\0\0\x7b\x1a\xa0\xff\0\0\0\
-\0\x63\x1a\x98\xff\0\0\0\0\x7b\x1a\x90\xff\0\0\0\0\x7b\x1a\x88\xff\0\0\0\0\x7b\
-\x1a\x80\xff\0\0\0\0\x7b\x1a\x78\xff\0\0\0\0\x7b\x1a\x70\xff\0\0\0\0\x7b\x1a\
-\x68\xff\0\0\0\0\x7b\x1a\x60\xff\0\0\0\0\x7b\x1a\x58\xff\0\0\0\0\x7b\x1a\x50\
-\xff\0\0\0\0\x15\x09\x47\x02\0\0\0\0\x6b\x1a\xc8\xff\0\0\0\0\xbf\xa3\0\0\0\0\0\
-\0\x07\x03\0\0\xc8\xff\xff\xff\xbf\x91\0\0\0\0\0\0\xb7\x02\0\0\x0c\0\0\0\xb7\
+\x18\0\0\0\xff\xff\xff\xff\0\0\0\0\0\0\0\0\x15\x06\x4f\x02\0\0\0\0\xbf\x78\0\0\
+\0\0\0\0\x15\x08\x4d\x02\0\0\0\0\x71\x61\0\0\0\0\0\0\x55\x01\x01\0\0\0\0\0\x05\
+\0\x46\x02\0\0\0\0\xb7\x01\0\0\0\0\0\0\x63\x1a\xc8\xff\0\0\0\0\x7b\x1a\xc0\xff\
+\0\0\0\0\x7b\x1a\xb8\xff\0\0\0\0\x7b\x1a\xb0\xff\0\0\0\0\x7b\x1a\xa8\xff\0\0\0\
+\0\x63\x1a\xa0\xff\0\0\0\0\x7b\x1a\x98\xff\0\0\0\0\x7b\x1a\x90\xff\0\0\0\0\x7b\
+\x1a\x88\xff\0\0\0\0\x7b\x1a\x80\xff\0\0\0\0\x7b\x1a\x78\xff\0\0\0\0\x7b\x1a\
+\x70\xff\0\0\0\0\x7b\x1a\x68\xff\0\0\0\0\x7b\x1a\x60\xff\0\0\0\0\x7b\x1a\x58\
+\xff\0\0\0\0\x15\x09\x35\x02\0\0\0\0\x6b\x1a\xd0\xff\0\0\0\0\xbf\xa3\0\0\0\0\0\
+\0\x07\x03\0\0\xd0\xff\xff\xff\xbf\x91\0\0\0\0\0\0\xb7\x02\0\0\x0c\0\0\0\xb7\
 \x04\0\0\x02\0\0\0\xb7\x05\0\0\0\0\0\0\x85\0\0\0\x44\0\0\0\x67\0\0\0\x20\0\0\0\
-\x77\0\0\0\x20\0\0\0\x55\0\x3c\x02\0\0\0\0\xb7\x02\0\0\x10\0\0\0\x69\xa1\xc8\
+\x77\0\0\0\x20\0\0\0\x55\0\x2a\x02\0\0\0\0\xb7\x02\0\0\x10\0\0\0\x69\xa1\xd0\
 \xff\0\0\0\0\xbf\x13\0\0\0\0\0\0\xdc\x03\0\0\x10\0\0\0\x15\x03\x02\0\0\x81\0\0\
 \x55\x03\x0b\0\xa8\x88\0\0\xb7\x02\0\0\x14\0\0\0\xbf\xa3\0\0\0\0\0\0\x07\x03\0\
-\0\xc8\xff\xff\xff\xbf\x91\0\0\0\0\0\0\xb7\x04\0\0\x02\0\0\0\xb7\x05\0\0\0\0\0\
-\0\x85\0\0\0\x44\0\0\0\x67\0\0\0\x20\0\0\0\x77\0\0\0\x20\0\0\0\x55\0\x2c\x02\0\
-\0\0\0\x69\xa1\xc8\xff\0\0\0\0\x15\x01\x2a\x02\0\0\0\0\x7b\x9a\x38\xff\0\0\0\0\
-\x15\x01\x56\0\x86\xdd\0\0\x55\x01\x3b\0\x08\0\0\0\xb7\x01\0\0\x01\0\0\0\x73\
-\x1a\x50\xff\0\0\0\0\xb7\x01\0\0\0\0\0\0\x63\x1a\xd8\xff\0\0\0\0\x7b\x1a\xd0\
-\xff\0\0\0\0\x7b\x1a\xc8\xff\0\0\0\0\xbf\xa3\0\0\0\0\0\0\x07\x03\0\0\xc8\xff\
-\xff\xff\x79\xa1\x38\xff\0\0\0\0\xb7\x02\0\0\0\0\0\0\xb7\x04\0\0\x14\0\0\0\xb7\
+\0\xd0\xff\xff\xff\xbf\x91\0\0\0\0\0\0\xb7\x04\0\0\x02\0\0\0\xb7\x05\0\0\0\0\0\
+\0\x85\0\0\0\x44\0\0\0\x67\0\0\0\x20\0\0\0\x77\0\0\0\x20\0\0\0\x55\0\x1a\x02\0\
+\0\0\0\x69\xa1\xd0\xff\0\0\0\0\x15\x01\x18\x02\0\0\0\0\x15\x01\x21\0\x86\xdd\0\
+\0\x7b\x9a\x48\xff\0\0\0\0\x55\x01\xf6\0\x08\0\0\0\xb7\x01\0\0\x01\0\0\0\x73\
+\x1a\x58\xff\0\0\0\0\xb7\x01\0\0\0\0\0\0\x63\x1a\xe0\xff\0\0\0\0\x7b\x1a\xd8\
+\xff\0\0\0\0\x7b\x1a\xd0\xff\0\0\0\0\xbf\xa3\0\0\0\0\0\0\x07\x03\0\0\xd0\xff\
+\xff\xff\x79\xa1\x48\xff\0\0\0\0\xb7\x02\0\0\0\0\0\0\xb7\x04\0\0\x14\0\0\0\xb7\
 \x05\0\0\x01\0\0\0\x85\0\0\0\x44\0\0\0\x67\0\0\0\x20\0\0\0\x77\0\0\0\x20\0\0\0\
-\x55\0\x17\x02\0\0\0\0\x69\xa1\xce\xff\0\0

[PATCH v10 18/18] ebpf: Add a separate target for skeleton

2024-04-28 Thread Akihiko Odaki
This generalizes the rule to generate the skeleton and allows adding
another one.

Signed-off-by: Akihiko Odaki 
---
 tools/ebpf/Makefile.ebpf | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/tools/ebpf/Makefile.ebpf b/tools/ebpf/Makefile.ebpf
index 3391e7ce0898..572ca5987ae6 100755
--- a/tools/ebpf/Makefile.ebpf
+++ b/tools/ebpf/Makefile.ebpf
@@ -1,23 +1,24 @@
-OBJS = rss.bpf.o
+SKELETONS = rss.bpf.skeleton.h
 
 LLVM_STRIP ?= llvm-strip
 CLANG ?= clang
 INC_FLAGS = `$(CLANG) -print-file-name=include`
 EXTRA_CFLAGS ?= -O2 -g -target bpf
 
-all: $(OBJS)
+all: $(SKELETONS)
 
 .PHONY: clean
 
 clean:
-   rm -f $(OBJS)
-   rm -f rss.bpf.skeleton.h
+   rm -f $(SKELETONS) $(SKELETONS:%.skeleton.h=%.o)
 
-$(OBJS):  %.o:%.c
+%.o: %.c
$(CLANG) $(INC_FLAGS) \
 -D__KERNEL__ -D__ASM_SYSREG_H \
 -I../include $(LINUXINCLUDE) \
 $(EXTRA_CFLAGS) -c $< -o $@
$(LLVM_STRIP) -g $@
-   bpftool gen skeleton rss.bpf.o > rss.bpf.skeleton.h
-   cp rss.bpf.skeleton.h ../../ebpf/
+
+%.skeleton.h: %.o
+   bpftool gen skeleton $< > $@
+   cp $@ ../../ebpf/

-- 
2.44.0




[PATCH v10 01/18] tap: Remove tap_probe_vnet_hdr_len()

2024-04-28 Thread Akihiko Odaki
It was necessary since a Linux kernel older than 2.6.35 may implement
the virtio-net header but may not allow changing its length. Remove it
since such an old Linux is no longer supported.

Signed-off-by: Akihiko Odaki 
Acked-by: Michael S. Tsirkin 
---
 net/tap_int.h |  1 -
 net/tap-bsd.c |  5 -
 net/tap-linux.c   | 20 
 net/tap-solaris.c |  5 -
 net/tap-stub.c|  5 -
 net/tap.c |  8 ++--
 6 files changed, 2 insertions(+), 42 deletions(-)

diff --git a/net/tap_int.h b/net/tap_int.h
index 9a2175655bb0..8857ff299d22 100644
--- a/net/tap_int.h
+++ b/net/tap_int.h
@@ -35,7 +35,6 @@ ssize_t tap_read_packet(int tapfd, uint8_t *buf, int maxlen);
 
 void tap_set_sndbuf(int fd, const NetdevTapOptions *tap, Error **errp);
 int tap_probe_vnet_hdr(int fd, Error **errp);
-int tap_probe_vnet_hdr_len(int fd, int len);
 int tap_probe_has_ufo(int fd);
 int tap_probe_has_uso(int fd);
 void tap_fd_set_offload(int fd, int csum, int tso4, int tso6, int ecn, int ufo,
diff --git a/net/tap-bsd.c b/net/tap-bsd.c
index 274ea7bd2c3c..b4c84441ba8b 100644
--- a/net/tap-bsd.c
+++ b/net/tap-bsd.c
@@ -217,11 +217,6 @@ int tap_probe_has_uso(int fd)
 return 0;
 }
 
-int tap_probe_vnet_hdr_len(int fd, int len)
-{
-return 0;
-}
-
 void tap_fd_set_vnet_hdr_len(int fd, int len)
 {
 }
diff --git a/net/tap-linux.c b/net/tap-linux.c
index c7e514ecb04b..1226d5fda2d9 100644
--- a/net/tap-linux.c
+++ b/net/tap-linux.c
@@ -185,26 +185,6 @@ int tap_probe_has_uso(int fd)
 return 1;
 }
 
-/* Verify that we can assign given length */
-int tap_probe_vnet_hdr_len(int fd, int len)
-{
-int orig;
-if (ioctl(fd, TUNGETVNETHDRSZ, &orig) == -1) {
-return 0;
-}
-if (ioctl(fd, TUNSETVNETHDRSZ, &len) == -1) {
-return 0;
-}
-/* Restore original length: we can't handle failure. */
-if (ioctl(fd, TUNSETVNETHDRSZ, &orig) == -1) {
-fprintf(stderr, "TUNGETVNETHDRSZ ioctl() failed: %s. Exiting.\n",
-strerror(errno));
-abort();
-return -errno;
-}
-return 1;
-}
-
 void tap_fd_set_vnet_hdr_len(int fd, int len)
 {
 if (ioctl(fd, TUNSETVNETHDRSZ, &len) == -1) {
diff --git a/net/tap-solaris.c b/net/tap-solaris.c
index 08b13af51257..51b7830bef1d 100644
--- a/net/tap-solaris.c
+++ b/net/tap-solaris.c
@@ -221,11 +221,6 @@ int tap_probe_has_uso(int fd)
 return 0;
 }
 
-int tap_probe_vnet_hdr_len(int fd, int len)
-{
-return 0;
-}
-
 void tap_fd_set_vnet_hdr_len(int fd, int len)
 {
 }
diff --git a/net/tap-stub.c b/net/tap-stub.c
index 4b24f61e3a6c..38673434cbd6 100644
--- a/net/tap-stub.c
+++ b/net/tap-stub.c
@@ -52,11 +52,6 @@ int tap_probe_has_uso(int fd)
 return 0;
 }
 
-int tap_probe_vnet_hdr_len(int fd, int len)
-{
-return 0;
-}
-
 void tap_fd_set_vnet_hdr_len(int fd, int len)
 {
 }
diff --git a/net/tap.c b/net/tap.c
index baaa2f7a9ac7..72ae95894ff1 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -259,11 +259,7 @@ static bool tap_has_vnet_hdr(NetClientState *nc)
 
 static bool tap_has_vnet_hdr_len(NetClientState *nc, int len)
 {
-TAPState *s = DO_UPCAST(TAPState, nc, nc);
-
-assert(nc->info->type == NET_CLIENT_DRIVER_TAP);
-
-return !!tap_probe_vnet_hdr_len(s->fd, len);
+return tap_has_vnet_hdr(nc);
 }
 
 static int tap_get_vnet_hdr_len(NetClientState *nc)
@@ -432,7 +428,7 @@ static TAPState *net_tap_fd_init(NetClientState *peer,
  * Make sure host header length is set correctly in tap:
  * it might have been modified by another instance of qemu.
  */
-if (tap_probe_vnet_hdr_len(s->fd, s->host_vnet_hdr_len)) {
+if (vnet_hdr) {
 tap_fd_set_vnet_hdr_len(s->fd, s->host_vnet_hdr_len);
 }
 tap_read_poll(s, true);

-- 
2.44.0




[PATCH v10 08/18] virtio-net: Add only one queue pair when realizing

2024-04-28 Thread Akihiko Odaki
Multiqueue usage is not negotiated yet when realizing. If more than
one queue is added and the guest never requests to enable multiqueue,
the extra queues will not be deleted when unrealizing and will leak.

Fixes: f9d6dbf0bf6e ("virtio-net: remove virtio queues if the guest doesn't support multiqueue")
Signed-off-by: Akihiko Odaki 
---
 hw/net/virtio-net.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 8ede38aadbbe..e33bdbfd84a5 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -3759,9 +3759,7 @@ static void virtio_net_device_realize(DeviceState *dev, Error **errp)
 n->net_conf.tx_queue_size = MIN(virtio_net_max_tx_queue_size(n),
 n->net_conf.tx_queue_size);
 
-for (i = 0; i < n->max_queue_pairs; i++) {
-virtio_net_add_queue(n, i);
-}
+virtio_net_add_queue(n, 0);
 
 n->ctrl_vq = virtio_add_queue(vdev, 64, virtio_net_handle_ctrl);
 qemu_macaddr_default_if_unset(&n->nic_conf.macaddr);

-- 
2.44.0




[PATCH v10 05/18] tap: Call tap_receive_iov() from tap_receive()

2024-04-28 Thread Akihiko Odaki
This removes the duplicated logic found in both tap_receive_iov() and
tap_receive().

Suggested-by: "Zhang, Chen" 
Signed-off-by: Akihiko Odaki 
---
 net/tap.c | 35 +--
 1 file changed, 5 insertions(+), 30 deletions(-)

diff --git a/net/tap.c b/net/tap.c
index 99c59ee46881..9825518ff1f3 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -133,39 +133,14 @@ static ssize_t tap_receive_iov(NetClientState *nc, const 
struct iovec *iov,
 return tap_write_packet(s, iovp, iovcnt);
 }
 
-static ssize_t tap_receive_raw(NetClientState *nc, const uint8_t *buf, size_t 
size)
-{
-TAPState *s = DO_UPCAST(TAPState, nc, nc);
-struct iovec iov[2];
-int iovcnt = 0;
-struct virtio_net_hdr_mrg_rxbuf hdr = { };
-
-if (s->host_vnet_hdr_len) {
-iov[iovcnt].iov_base = &hdr;
-iov[iovcnt].iov_len  = s->host_vnet_hdr_len;
-iovcnt++;
-}
-
-iov[iovcnt].iov_base = (char *)buf;
-iov[iovcnt].iov_len  = size;
-iovcnt++;
-
-return tap_write_packet(s, iov, iovcnt);
-}
-
 static ssize_t tap_receive(NetClientState *nc, const uint8_t *buf, size_t size)
 {
-TAPState *s = DO_UPCAST(TAPState, nc, nc);
-struct iovec iov[1];
-
-if (s->host_vnet_hdr_len && !s->using_vnet_hdr) {
-return tap_receive_raw(nc, buf, size);
-}
-
-iov[0].iov_base = (char *)buf;
-iov[0].iov_len  = size;
+struct iovec iov = {
+.iov_base = (void *)buf,
+.iov_len = size
+};
 
-return tap_write_packet(s, iov, 1);
+return tap_receive_iov(nc, &iov, 1);
 }
 
 #ifndef __sun__

-- 
2.44.0




[PATCH v10 13/18] virtio-net: Always set populate_hash

2024-04-28 Thread Akihiko Odaki
The member is not cleared during reset, so it may have a stale value.

Signed-off-by: Akihiko Odaki 
---
 hw/net/virtio-net.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 61b49e335dea..527aac3a0465 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -651,6 +651,7 @@ static void virtio_net_set_mrg_rx_bufs(VirtIONet *n, int 
mergeable_rx_bufs,
 n->guest_hdr_len = n->mergeable_rx_bufs ?
 sizeof(struct virtio_net_hdr_mrg_rxbuf) :
 sizeof(struct virtio_net_hdr);
+n->rss_data.populate_hash = false;
 }
 
 for (i = 0; i < n->max_queue_pairs; i++) {

-- 
2.44.0




[PATCH v10 09/18] virtio-net: Copy header only when necessary

2024-04-28 Thread Akihiko Odaki
The copied header is only used for byte swapping.

Signed-off-by: Akihiko Odaki 
---
 hw/net/virtio-net.c | 26 --
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index e33bdbfd84a5..ca0fbf7b7654 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -360,7 +360,8 @@ static void virtio_net_vnet_endian_status(VirtIONet *n, 
uint8_t status)
  * can't do it, we fallback onto fixing the headers in the core
  * virtio-net code.
  */
-n->needs_vnet_hdr_swap = virtio_net_set_vnet_endian(vdev, n->nic->ncs,
+n->needs_vnet_hdr_swap = n->has_vnet_hdr &&
+ virtio_net_set_vnet_endian(vdev, n->nic->ncs,
 queue_pairs, true);
 } else if (virtio_net_started(n, vdev->status)) {
 /* After using the device, we need to reset the network backend to
@@ -2767,7 +2768,7 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q)
 return -EINVAL;
 }
 
-if (n->has_vnet_hdr) {
+if (n->needs_vnet_hdr_swap) {
 if (iov_to_buf(out_sg, out_num, 0, &mhdr, n->guest_hdr_len) <
 n->guest_hdr_len) {
 virtio_error(vdev, "virtio-net header incorrect");
@@ -2775,19 +2776,16 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q)
 g_free(elem);
 return -EINVAL;
 }
-if (n->needs_vnet_hdr_swap) {
-virtio_net_hdr_swap(vdev, (void *) &mhdr);
-sg2[0].iov_base = &mhdr;
-sg2[0].iov_len = n->guest_hdr_len;
-out_num = iov_copy(&sg2[1], ARRAY_SIZE(sg2) - 1,
-   out_sg, out_num,
-   n->guest_hdr_len, -1);
-if (out_num == VIRTQUEUE_MAX_SIZE) {
-goto drop;
-}
-out_num += 1;
-out_sg = sg2;
+virtio_net_hdr_swap(vdev, (void *) &mhdr);
+sg2[0].iov_base = &mhdr;
+sg2[0].iov_len = n->guest_hdr_len;
+out_num = iov_copy(&sg2[1], ARRAY_SIZE(sg2) - 1, out_sg, out_num,
+   n->guest_hdr_len, -1);
+if (out_num == VIRTQUEUE_MAX_SIZE) {
+goto drop;
 }
+out_num += 1;
+out_sg = sg2;
 }
 /*
  * If host wants to see the guest header as is, we can

-- 
2.44.0




[PATCH v10 00/18] virtio-net RSS/hash report fixes and improvements

2024-04-28 Thread Akihiko Odaki
This series contains fixes and improvements for virtio-net RSS and hash
reporting feature.

V7 -> V8:
  Reset author email addresses.
  Rebased.

V6 -> V7:
  Dropped patch "virtio-net: Do not clear VIRTIO_NET_F_HASH_REPORT".
  Dropped the changes to remove packet flags.
  Re-introduced tap_receive() and changed it to call tap_receive_iov().
  Removed tap_get_vnet_hdr_len().
  Fixed tap initialization not to call tap_fd_set_vnet_hdr_len() for tap
  without virtio-net header.
  Changed to call error_report() instead of warn_report() for
  programming errors.

V5 -> V6:
  Corrected the message for patch "virtio-net: Return an error when vhost
  cannot enable RSS".
  Removed changes to introduce asserts from "virtio-net: Return an error
  when vhost cannot enable RSS".
  Reorganized patches "virtio-net: Return an error when vhost cannot
  enable RSS" and "virtio-net: Do not clear VIRTIO_NET_F_RSS". This
  version now contains patches "virtio-net: Return an error when vhost
  cannot enable RSS" and "virtio-net: Enable software RSS".
  Rebased.

V4 -> V5:
  Added patch "virtio-net: Do not write hashes to peer buffer".

V3 -> V4:
  Extract patches "tap: Remove tap_receive()" and  "net: Remove flag
  propagation" from "net: Remove receive_raw()".
  Added patch "virtio-net: Always set populate_hash".
  Added patch "virtio-net: Do not clear VIRTIO_NET_F_HASH_REPORT".
  Added patch "ebpf: Use standard section name".
  Added patch "ebpf: Simplify error handling".
  Added patch "ebpf: Return 0 when configuration fails".
  Added patch "ebpf: Refactor tun_rss_steering_prog()".
  Added patch "ebpf: Add a separate target for skeleton".

V2 -> V3:
  Added patch "tap: Remove tap_probe_vnet_hdr_len()".
  Added patch "tap: Remove qemu_using_vnet_hdr()".
  Added patch "net: Move virtio-net header length assertion".
  Added patch "net: Remove receive_raw()".
  Added patch "tap: Shrink zeroed virtio-net header".
  Dropped patch "tap: Fix virtio-net header buffer size".

V1 -> V2:
  Added patch "ebpf: Fix RSS error handling".

Signed-off-by: Akihiko Odaki 
---
Changes in v10:
- Dropped an obsolete statement of patch "virtio-net: Do not write
  hashes to peer buffer". (Yuri Benditovich)
- Dropped patch "virtio-net: Return an error when vhost cannot enable
  RSS" and "virtio-net: Report RSS warning at device realization".
- Link to v9: 
https://lore.kernel.org/r/20240403-rss-v9-0-c6d87e69d...@daynix.com

Changes in v9:
- Added patch "virtio-net: Do not propagate ebpf-rss-fds errors".
- Added patch "virtio-net: Shrink header byte swapping buffer".
- Rebased.
- Link to v8: 
https://lore.kernel.org/r/20231210-rss-v8-0-9553ee714...@daynix.com

---
Akihiko Odaki (18):
  tap: Remove tap_probe_vnet_hdr_len()
  tap: Remove qemu_using_vnet_hdr()
  net: Move virtio-net header length assertion
  net: Remove receive_raw()
  tap: Call tap_receive_iov() from tap_receive()
  tap: Shrink zeroed virtio-net header
  virtio-net: Do not propagate ebpf-rss-fds errors
  virtio-net: Add only one queue pair when realizing
  virtio-net: Copy header only when necessary
  virtio-net: Shrink header byte swapping buffer
  virtio-net: Disable RSS on reset
  virtio-net: Unify the logic to update NIC state for RSS
  virtio-net: Always set populate_hash
  virtio-net: Do not write hashes to peer buffer
  ebpf: Fix RSS error handling
  ebpf: Return 0 when configuration fails
  ebpf: Refactor tun_rss_steering_prog()
  ebpf: Add a separate target for skeleton

 ebpf/rss.bpf.skeleton.h  | 1558 +++---
 include/net/net.h|8 -
 net/tap_int.h|1 -
 hw/net/e1000e.c  |1 -
 hw/net/igb.c |1 -
 hw/net/net_tx_pkt.c  |4 +-
 hw/net/virtio-net.c  |  264 
 hw/net/vmxnet3.c |2 -
 net/dump.c   |4 +-
 net/net.c|   47 +-
 net/netmap.c |5 -
 net/tap-bsd.c|5 -
 net/tap-linux.c  |   20 -
 net/tap-solaris.c|5 -
 net/tap-stub.c   |5 -
 net/tap.c|   77 +--
 tools/ebpf/rss.bpf.c |   44 +-
 tools/ebpf/Makefile.ebpf |   15 +-
 18 files changed, 948 insertions(+), 1118 deletions(-)
---
base-commit: e5c6528dce86d7a9ada7ecf02fcb7b8560955131
change-id: 20231210-rss-e7c98e722253

Best regards,
-- 
Akihiko Odaki 




[PATCH v10 03/18] net: Move virtio-net header length assertion

2024-04-28 Thread Akihiko Odaki
The virtio-net header length assertion should happen for all clients.

Signed-off-by: Akihiko Odaki 
---
 net/net.c | 5 +
 net/tap.c | 3 ---
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/net/net.c b/net/net.c
index bd51037ebfb0..db096765f4b2 100644
--- a/net/net.c
+++ b/net/net.c
@@ -56,6 +56,7 @@
 #include "net/filter.h"
 #include "qapi/string-output-visitor.h"
 #include "qapi/qobject-input-visitor.h"
+#include "standard-headers/linux/virtio_net.h"
 
 /* Net bridge is currently not supported for W32. */
 #if !defined(_WIN32)
@@ -550,6 +551,10 @@ void qemu_set_vnet_hdr_len(NetClientState *nc, int len)
 return;
 }
 
+assert(len == sizeof(struct virtio_net_hdr_mrg_rxbuf) ||
+   len == sizeof(struct virtio_net_hdr) ||
+   len == sizeof(struct virtio_net_hdr_v1_hash));
+
 nc->vnet_hdr_len = len;
 nc->info->set_vnet_hdr_len(nc, len);
 }
diff --git a/net/tap.c b/net/tap.c
index c848844955df..49edf6c2b6e1 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -267,9 +267,6 @@ static void tap_set_vnet_hdr_len(NetClientState *nc, int 
len)
 TAPState *s = DO_UPCAST(TAPState, nc, nc);
 
 assert(nc->info->type == NET_CLIENT_DRIVER_TAP);
-assert(len == sizeof(struct virtio_net_hdr_mrg_rxbuf) ||
-   len == sizeof(struct virtio_net_hdr) ||
-   len == sizeof(struct virtio_net_hdr_v1_hash));
 
 tap_fd_set_vnet_hdr_len(s->fd, len);
 s->host_vnet_hdr_len = len;

-- 
2.44.0




[PATCH v10 02/18] tap: Remove qemu_using_vnet_hdr()

2024-04-28 Thread Akihiko Odaki
Since qemu_set_vnet_hdr_len() is always called when
qemu_using_vnet_hdr() is called, we can merge them and save some code.

For consistency, express that the virtio-net header is not in use by
returning 0 with qemu_get_vnet_hdr_len() instead of having a dedicated
function, qemu_get_using_vnet_hdr().

Signed-off-by: Akihiko Odaki 
---
 include/net/net.h   |  7 ---
 hw/net/e1000e.c |  1 -
 hw/net/igb.c|  1 -
 hw/net/net_tx_pkt.c |  4 ++--
 hw/net/virtio-net.c |  3 ---
 hw/net/vmxnet3.c|  2 --
 net/dump.c  |  4 +---
 net/net.c   | 24 +---
 net/netmap.c|  5 -
 net/tap.c   | 28 +---
 10 files changed, 5 insertions(+), 74 deletions(-)

diff --git a/include/net/net.h b/include/net/net.h
index b1f9b35fcca1..6fe5a0aee833 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -57,8 +57,6 @@ typedef bool (HasUfo)(NetClientState *);
 typedef bool (HasUso)(NetClientState *);
 typedef bool (HasVnetHdr)(NetClientState *);
 typedef bool (HasVnetHdrLen)(NetClientState *, int);
-typedef bool (GetUsingVnetHdr)(NetClientState *);
-typedef void (UsingVnetHdr)(NetClientState *, bool);
 typedef void (SetOffload)(NetClientState *, int, int, int, int, int, int, int);
 typedef int (GetVnetHdrLen)(NetClientState *);
 typedef void (SetVnetHdrLen)(NetClientState *, int);
@@ -88,10 +86,7 @@ typedef struct NetClientInfo {
 HasUso *has_uso;
 HasVnetHdr *has_vnet_hdr;
 HasVnetHdrLen *has_vnet_hdr_len;
-GetUsingVnetHdr *get_using_vnet_hdr;
-UsingVnetHdr *using_vnet_hdr;
 SetOffload *set_offload;
-GetVnetHdrLen *get_vnet_hdr_len;
 SetVnetHdrLen *set_vnet_hdr_len;
 SetVnetLE *set_vnet_le;
 SetVnetBE *set_vnet_be;
@@ -194,8 +189,6 @@ bool qemu_has_ufo(NetClientState *nc);
 bool qemu_has_uso(NetClientState *nc);
 bool qemu_has_vnet_hdr(NetClientState *nc);
 bool qemu_has_vnet_hdr_len(NetClientState *nc, int len);
-bool qemu_get_using_vnet_hdr(NetClientState *nc);
-void qemu_using_vnet_hdr(NetClientState *nc, bool enable);
 void qemu_set_offload(NetClientState *nc, int csum, int tso4, int tso6,
   int ecn, int ufo, int uso4, int uso6);
 int qemu_get_vnet_hdr_len(NetClientState *nc);
diff --git a/hw/net/e1000e.c b/hw/net/e1000e.c
index 7c6f6029518c..d0dde767f6aa 100644
--- a/hw/net/e1000e.c
+++ b/hw/net/e1000e.c
@@ -352,7 +352,6 @@ e1000e_init_net_peer(E1000EState *s, PCIDevice *pci_dev, 
uint8_t *macaddr)
 for (i = 0; i < s->conf.peers.queues; i++) {
 nc = qemu_get_subqueue(s->nic, i);
 qemu_set_vnet_hdr_len(nc->peer, sizeof(struct virtio_net_hdr));
-qemu_using_vnet_hdr(nc->peer, true);
 }
 }
 
diff --git a/hw/net/igb.c b/hw/net/igb.c
index 9b37523d6df8..1224c7ba8e38 100644
--- a/hw/net/igb.c
+++ b/hw/net/igb.c
@@ -349,7 +349,6 @@ igb_init_net_peer(IGBState *s, PCIDevice *pci_dev, uint8_t 
*macaddr)
 for (i = 0; i < s->conf.peers.queues; i++) {
 nc = qemu_get_subqueue(s->nic, i);
 qemu_set_vnet_hdr_len(nc->peer, sizeof(struct virtio_net_hdr));
-qemu_using_vnet_hdr(nc->peer, true);
 }
 }
 
diff --git a/hw/net/net_tx_pkt.c b/hw/net/net_tx_pkt.c
index 2134a18c4c90..903238dca24d 100644
--- a/hw/net/net_tx_pkt.c
+++ b/hw/net/net_tx_pkt.c
@@ -578,7 +578,7 @@ static void net_tx_pkt_sendv(
 {
 NetClientState *nc = opaque;
 
-if (qemu_get_using_vnet_hdr(nc->peer)) {
+if (qemu_get_vnet_hdr_len(nc->peer)) {
 qemu_sendv_packet(nc, virt_iov, virt_iov_cnt);
 } else {
 qemu_sendv_packet(nc, iov, iov_cnt);
@@ -808,7 +808,7 @@ static bool net_tx_pkt_do_sw_fragmentation(struct NetTxPkt 
*pkt,
 
 bool net_tx_pkt_send(struct NetTxPkt *pkt, NetClientState *nc)
 {
-bool offload = qemu_get_using_vnet_hdr(nc->peer);
+bool offload = qemu_get_vnet_hdr_len(nc->peer);
 return net_tx_pkt_send_custom(pkt, offload, net_tx_pkt_sendv, nc);
 }
 
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 58014a92ad19..f6112c0ac97d 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -3794,9 +3794,6 @@ static void virtio_net_device_realize(DeviceState *dev, 
Error **errp)
 
 peer_test_vnet_hdr(n);
 if (peer_has_vnet_hdr(n)) {
-for (i = 0; i < n->max_queue_pairs; i++) {
-qemu_using_vnet_hdr(qemu_get_subqueue(n->nic, i)->peer, true);
-}
 n->host_hdr_len = sizeof(struct virtio_net_hdr);
 } else {
 n->host_hdr_len = 0;
diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
index 707487c63666..63a91877730d 100644
--- a/hw/net/vmxnet3.c
+++ b/hw/net/vmxnet3.c
@@ -2091,8 +2091,6 @@ static void vmxnet3_net_init(VMXNET3State *s)
 if (s->peer_has_vhdr) {
 qemu_set_vnet_hdr_len(qemu_get_queue(s->nic)->peer,
 sizeof(struct virtio_net_hdr));
-
-qemu_using_vnet_hdr(qemu_get_queue(s->nic)->peer, 1);
 }
 
 qemu_format_nic_info_str(qemu_get_queue(s->nic

[PATCH v10 04/18] net: Remove receive_raw()

2024-04-28 Thread Akihiko Odaki
While netmap implements the virtio-net header, it does not implement
receive_raw(). Instead of implementing receive_raw for netmap, add
virtio-net headers in the common code and use receive_iov()/receive()
instead. This also fixes the buffer size for the virtio-net header.

Fixes: fbbdbddec0 ("tap: allow extended virtio header with hash info")
Signed-off-by: Akihiko Odaki 
---
 include/net/net.h |  1 -
 net/net.c | 18 --
 net/tap.c |  1 -
 3 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/include/net/net.h b/include/net/net.h
index 6fe5a0aee833..c8f679761bf9 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -72,7 +72,6 @@ typedef struct NetClientInfo {
 NetClientDriver type;
 size_t size;
 NetReceive *receive;
-NetReceive *receive_raw;
 NetReceiveIOV *receive_iov;
 NetCanReceive *can_receive;
 NetStart *start;
diff --git a/net/net.c b/net/net.c
index db096765f4b2..6938da05e077 100644
--- a/net/net.c
+++ b/net/net.c
@@ -787,11 +787,7 @@ static ssize_t nc_sendv_compat(NetClientState *nc, const 
struct iovec *iov,
 offset = iov_to_buf(iov, iovcnt, 0, buf, offset);
 }
 
-if (flags & QEMU_NET_PACKET_FLAG_RAW && nc->info->receive_raw) {
-ret = nc->info->receive_raw(nc, buffer, offset);
-} else {
-ret = nc->info->receive(nc, buffer, offset);
-}
+ret = nc->info->receive(nc, buffer, offset);
 
 g_free(buf);
 return ret;
@@ -806,6 +802,8 @@ static ssize_t qemu_deliver_packet_iov(NetClientState 
*sender,
 MemReentrancyGuard *owned_reentrancy_guard;
 NetClientState *nc = opaque;
 int ret;
+struct virtio_net_hdr_v1_hash vnet_hdr = { };
+g_autofree struct iovec *iov_copy = NULL;
 
 
 if (nc->link_down) {
@@ -824,7 +822,15 @@ static ssize_t qemu_deliver_packet_iov(NetClientState 
*sender,
 owned_reentrancy_guard->engaged_in_io = true;
 }
 
-if (nc->info->receive_iov && !(flags & QEMU_NET_PACKET_FLAG_RAW)) {
+if ((flags & QEMU_NET_PACKET_FLAG_RAW) && nc->vnet_hdr_len) {
+iov_copy = g_new(struct iovec, iovcnt + 1);
+iov_copy[0].iov_base = &vnet_hdr;
+iov_copy[0].iov_len =  nc->vnet_hdr_len;
+memcpy(&iov_copy[1], iov, iovcnt * sizeof(*iov));
+iov = iov_copy;
+}
+
+if (nc->info->receive_iov) {
 ret = nc->info->receive_iov(nc, iov, iovcnt);
 } else {
 ret = nc_sendv_compat(nc, iov, iovcnt, flags);
diff --git a/net/tap.c b/net/tap.c
index 49edf6c2b6e1..99c59ee46881 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -360,7 +360,6 @@ static NetClientInfo net_tap_info = {
 .type = NET_CLIENT_DRIVER_TAP,
 .size = sizeof(TAPState),
 .receive = tap_receive,
-.receive_raw = tap_receive_raw,
 .receive_iov = tap_receive_iov,
 .poll = tap_poll,
 .cleanup = tap_cleanup,

-- 
2.44.0




Re: [PATCH v9 10/11] virtio-gpu: Support Venus context

2024-04-27 Thread Akihiko Odaki

On 2024/04/26 0:45, Dmitry Osipenko wrote:

From: Antonio Caggiano 

Request Venus when initializing VirGL and if vulkan=true flag is set for
virtio-gpu device.


Naming it vulkan is a bit confusing as there is also the GFXSTREAM_VULKAN 
capset, though virgl does not support it. I think you can just name it venus.




Re: [PATCH v9 11/11] migration/virtio: Add virtio-gpu section

2024-04-27 Thread Akihiko Odaki

On 2024/04/26 0:45, Dmitry Osipenko wrote:

Document virtio-gpu migration specifics.

Suggested-by: Akihiko Odaki 
Signed-off-by: Dmitry Osipenko 
---
  docs/devel/migration/virtio.rst | 7 +++
  1 file changed, 7 insertions(+)

diff --git a/docs/devel/migration/virtio.rst b/docs/devel/migration/virtio.rst
index 611a18b82151..67f5fcfed196 100644
--- a/docs/devel/migration/virtio.rst
+++ b/docs/devel/migration/virtio.rst
@@ -113,3 +113,10 @@ virtio_load() returned (like e.g. code depending on 
features).
  Any extension of the state being migrated should be done in subsections
  added to the core for compatibility reasons. If transport or device specific
  state is added, core needs to invoke a callback from the new subsection.
+
+VirtIO-GPU migration
+
+VirtIO-GPU doesn't adhere to a common virtio migration scheme. It doesn't
+support save/loading of virtio device state, instead it uses generic device
+migration management on top of the virtio core to save/load GPU state.
+Migration of virgl and rutabaga states not supported.


Sorry for the confusion, but I didn't mean to add a subsection to the 
documentation. I was referring to the migration data structure called a 
"subsection", which is documented in docs/devel/migration/main.rst.


Device-specific information is not worth describing here.



Re: [PATCH v9 09/11] virtio-gpu: Register capsets dynamically

2024-04-27 Thread Akihiko Odaki

On 2024/04/26 0:45, Dmitry Osipenko wrote:

From: Pierre-Eric Pelloux-Prayer 

virtio_gpu_virgl_get_num_capsets will return "num_capsets", but we can't
assume that capset_index 1 is always VIRGL2 once we support more capsets,
like Venus and DRM capsets. Register capsets dynamically to avoid that problem.

Signed-off-by: Pierre-Eric Pelloux-Prayer 
Signed-off-by: Dmitry Osipenko 
---
  hw/display/virtio-gpu-virgl.c  | 34 +++---
  include/hw/virtio/virtio-gpu.h |  2 ++
  2 files changed, 25 insertions(+), 11 deletions(-)

diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
index de788df155bf..9aa1fd78f1e1 100644
--- a/hw/display/virtio-gpu-virgl.c
+++ b/hw/display/virtio-gpu-virgl.c
@@ -597,19 +597,13 @@ static void virgl_cmd_get_capset_info(VirtIOGPU *g,
  VIRTIO_GPU_FILL_CMD(info);
  
   memset(&resp, 0, sizeof(resp));

-if (info.capset_index == 0) {
-resp.capset_id = VIRTIO_GPU_CAPSET_VIRGL;
-virgl_renderer_get_cap_set(resp.capset_id,
-   &resp.capset_max_version,
-   &resp.capset_max_size);
-} else if (info.capset_index == 1) {
-resp.capset_id = VIRTIO_GPU_CAPSET_VIRGL2;
+
+if (info.capset_index < g->capset_ids->len) {
+resp.capset_id = g_array_index(g->capset_ids, uint32_t,
+   info.capset_index);
  virgl_renderer_get_cap_set(resp.capset_id,
  &resp.capset_max_version,
  &resp.capset_max_size);
-} else {
-resp.capset_max_version = 0;
-resp.capset_max_size = 0;
  }
  resp.hdr.type = VIRTIO_GPU_RESP_OK_CAPSET_INFO;
   virtio_gpu_ctrl_response(g, cmd, &resp.hdr, sizeof(resp));
@@ -1159,12 +1153,30 @@ int virtio_gpu_virgl_init(VirtIOGPU *g)
  return 0;
  }
  
+static void virtio_gpu_virgl_add_capset(VirtIOGPU *g, uint32_t capset_id)

+{
+g_array_append_val(g->capset_ids, capset_id);
+}
+
  int virtio_gpu_virgl_get_num_capsets(VirtIOGPU *g)
  {
  uint32_t capset2_max_ver, capset2_max_size;
+
+if (g->capset_ids) {


Move capset_ids initialization to virtio_gpu_virgl_init() to save this 
conditional. capset_ids also needs to be freed when the device gets 
unrealized.




Re: [PATCH v9 08/11] virtio-gpu: Resource UUID

2024-04-27 Thread Akihiko Odaki

On 2024/04/26 0:45, Dmitry Osipenko wrote:

From: Antonio Caggiano 

Enable resource UUID feature and implement command resource assign UUID.
UUID feature availability is mandatory for Vulkan Venus context.

UUID is intended for sharing dmabufs between virtio devices on host. Qemu
doesn't have second virtio device for sharing, thus a simple stub UUID
implementation is enough. More complete implementation using global UUID
resource table might become interesting for a multi-gpu cases.


This message needs to be updated to clarify that a VM can have a second 
virtio-gpu device but this implementation does not support sharing 
between two virtio-gpu devices.




Re: [PATCH v9 07/11] virtio-gpu: Handle resource blob commands

2024-04-27 Thread Akihiko Odaki

On 2024/04/26 0:45, Dmitry Osipenko wrote:

From: Antonio Caggiano 

Support BLOB resources creation, mapping and unmapping by calling the
new stable virglrenderer 0.10 interface. Only enabled when available and
via the blob config. E.g. -device virtio-vga-gl,blob=true

Signed-off-by: Antonio Caggiano 
Signed-off-by: Xenia Ragiadakou 
Signed-off-by: Huang Rui 
Signed-off-by: Dmitry Osipenko 
---
  hw/display/virtio-gpu-virgl.c | 268 ++
  hw/display/virtio-gpu.c   |   4 +-
  2 files changed, 271 insertions(+), 1 deletion(-)

diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
index 0feaa9f2c52e..73d4acbf1777 100644
--- a/hw/display/virtio-gpu-virgl.c
+++ b/hw/display/virtio-gpu-virgl.c
@@ -26,6 +26,8 @@
  
  struct virtio_gpu_virgl_resource {

  struct virtio_gpu_simple_resource base;
+bool async_unmap_in_progress;


Why is this flag needed?


+MemoryRegion *mr;
  };
  
  static struct virtio_gpu_virgl_resource *

@@ -49,6 +51,120 @@ virgl_get_egl_display(G_GNUC_UNUSED void *cookie)
  }
  #endif
  
+#ifdef HAVE_VIRGL_RESOURCE_BLOB

+struct virtio_gpu_virgl_hostmem_region {
+MemoryRegion mr;
+struct VirtIOGPU *g;
+struct virtio_gpu_virgl_resource *res;
+};
+
+static void virtio_gpu_virgl_resume_cmdq(void *opaque)
+{
+VirtIOGPU *g = opaque;
+
+virtio_gpu_process_cmdq(g);
+}
+
+static void virtio_gpu_virgl_hostmem_region_free(void *obj)
+{
+MemoryRegion *mr = MEMORY_REGION(obj);
+struct virtio_gpu_virgl_hostmem_region *vmr;
+VirtIOGPUBase *b;
+
+vmr = container_of(mr, struct virtio_gpu_virgl_hostmem_region, mr);
+vmr->res->async_unmap_in_progress = false;
+
+b = VIRTIO_GPU_BASE(vmr->g);
+b->renderer_blocked--;
+
+/*
+ * memory_region_unref() may be executed from RCU thread context, while
+ * virglrenderer works only on the main-loop thread that's holding GL
+ * context.
+ */
+aio_bh_schedule_oneshot(qemu_get_aio_context(),
+virtio_gpu_virgl_resume_cmdq, vmr->g);


Use aio_bh_new() and qemu_bh_schedule() instead to avoid allocating a new 
bottom half on every call.



+g_free(vmr);
+}
+
+static int
+virtio_gpu_virgl_map_resource_blob(VirtIOGPU *g,
+   struct virtio_gpu_virgl_resource *res,
+   uint64_t offset)
+{
+struct virtio_gpu_virgl_hostmem_region *vmr;
+VirtIOGPUBase *b = VIRTIO_GPU_BASE(g);
+MemoryRegion *mr;
+uint64_t size;
+void *data;
+int ret;
+
+if (!virtio_gpu_hostmem_enabled(b->conf)) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: hostmem disabled\n", __func__);
+return -EOPNOTSUPP;
+}
+
+ret = virgl_renderer_resource_map(res->base.resource_id, &data, &size);
+if (ret) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: failed to map virgl resource\n",
+  __func__);


Print strerror(-ret) here instead as printing strerror(EOPNOTSUPP) helps 
little when !virtio_gpu_hostmem_enabled(b->conf).



+return -ret;
+}
+
+vmr = g_new0(struct virtio_gpu_virgl_hostmem_region, 1);
+vmr->res = res;
+vmr->g = g;
+
+mr = &vmr->mr;
+memory_region_init_ram_ptr(mr, OBJECT(mr), "blob", size, data);
+memory_region_add_subregion(&b->hostmem, offset, mr);
+memory_region_set_enabled(mr, true);
+
+/*
+ * Potentially, MR could outlive the resource if MR's reference is held
+ * outside of virtio-gpu. In order to prevent unmapping resource while
+ * MR is alive, and thus, making the data pointer invalid, we will block
+ * virtio-gpu command processing until MR is fully unreferenced and
+ * released.
+ */
+OBJECT(mr)->free = virtio_gpu_virgl_hostmem_region_free;
+
+res->mr = mr;
+
+return 0;
+}
+
+static bool
+virtio_gpu_virgl_unmap_resource_blob(VirtIOGPU *g,
+ struct virtio_gpu_virgl_resource *res)
+{
+VirtIOGPUBase *b = VIRTIO_GPU_BASE(g);
+
+if (!res->async_unmap_in_progress && res->mr) {
+/* memory region owns self res->mr object and frees it by itself */
+MemoryRegion *mr = res->mr;
+res->mr = NULL;
+
+res->async_unmap_in_progress = true;
+
+/* render will be unblocked when MR is freed */
+b->renderer_blocked++;
+
+memory_region_set_enabled(mr, false);
+memory_region_del_subregion(&b->hostmem, mr);
+object_unparent(OBJECT(mr));
+}
+
+if (res->async_unmap_in_progress) {
+return false;
+}
+
+virgl_renderer_resource_unmap(res->base.resource_id);
+
+return true;
+}
+#endif /* HAVE_VIRGL_RESOURCE_BLOB */
+
  static void virgl_cmd_create_resource_2d(VirtIOGPU *g,
   struct virtio_gpu_ctrl_command *cmd)
  {
@@ -162,6 +278,14 @@ static void virgl_cmd_resource_unref(VirtIOGPU *g,
  return;
  }
  
+if (res->mr || cmd->suspended) {

+bool unmapped = 

Re: [PATCH v8 08/11] virtio-gpu: Handle resource blob commands

2024-04-26 Thread Akihiko Odaki

On 2024/04/24 19:30, Dmitry Osipenko wrote:

On 4/19/24 12:18, Akihiko Odaki wrote:

@@ -61,6 +61,10 @@ struct virtio_gpu_simple_resource {
   int dmabuf_fd;
   uint8_t *remapped;
   +    MemoryRegion *mr;
+    bool async_unmap_completed;
+    bool async_unmap_in_progress;
+


Don't add fields to virtio_gpu_simple_resource but instead create a
struct that embeds virtio_gpu_simple_resource in virtio-gpu-virgl.c.


Please give a justification. I'd rather rename
virtio_gpu_simple_resource s/_simple//. Simple resource already supports
blob and the added fields are directly related to the blob. Don't see
why another struct is needed.



Because mapping is only implemented in virtio-gpu-gl while blob itself 
is implemented also in virtio-gpu.




Re: [PATCH v8 07/11] virtio-gpu: Support suspension of commands processing

2024-04-26 Thread Akihiko Odaki

On 2024/04/24 18:43, Dmitry Osipenko wrote:

On 4/19/24 11:53, Akihiko Odaki wrote:

On 2024/04/19 4:00, Dmitry Osipenko wrote:

Add new "suspended" flag to virtio_gpu_ctrl_command telling cmd
processor that it should stop processing commands and retry again
next time until flag is unset.

Signed-off-by: Dmitry Osipenko 


This flag shouldn't be added to virtio_gpu_ctrl_command. suspended is
just !finished in virtio-gpu.c. Only virtio_gpu_virgl_process_cmd()
needs the distinction of suspended and !finished so it is not
appropriate to add this flag the common structure.


The VIRTIO_GPU_FILL_CMD() macro returns void and this macro is used by
every function processing commands. Changing process_cmd() to return
bool will require changing all those functions. Not worthwhile to
change it, IMO.
The flag reflects the exact command status. The !finished + !suspended
means that command is fenced, i.e. these flags don't have exactly same
meaning.


It is not necessary to change the signature of process_cmd(). You can 
just refer to !finished. No need to have the suspended flag.




I'd keep the flag if there are no better suggestions.





Re: [PATCH v2 1/2] net: Provide MemReentrancyGuard * to qemu_new_nic()

2024-04-26 Thread Akihiko Odaki

On 2024/04/24 21:32, Thomas Huth wrote:

On 24/04/2024 12.41, Prasad Pandit wrote:
On Wednesday, 24 April, 2024 at 03:36:01 pm IST, Philippe 
Mathieu-Daudé wrote:

On 1/6/23 05:18, Akihiko Odaki wrote:

Recently MemReentrancyGuard was added to DeviceState to record that the
device is engaging in I/O. The network device backend needs to 
update it

when delivering a packet to a device.
In preparation for such a change, add MemReentrancyGuard * as a
parameter of qemu_new_nic().


An user on IRC asked if this patch is related/fixing CVE-2021-20255,
any clue?


* CVE-2021-20255 bug: infinite recursion is pointing at a different 
fix patch.

   -> https://bugzilla.redhat.com/show_bug.cgi?id=CVE-2021-20255

* And the this patch below has different issue tagged
   
-> https://lists.nongnu.org/archive/html/qemu-devel/2023-05/msg08312.html

   Fixes: CVE-2023-3019


* They look different, former is an infinite recursion issue and the 
latter is a use-after-free one.


I assume the eepro reentrancy issue has been fixed with:

  https://gitlab.com/qemu-project/qemu/-/issues/556
  i.e.:
  https://gitlab.com/qemu-project/qemu/-/commit/c40ca2301c7603524eaddb5308a3


I agree. Commit c40ca2301c7603524eaddb5308a3 should be what fixed 
CVE-2021-20255, not this patch.


Regards,
Akihiko Odaki



Re: [PATCH v8 11/11] virtio-gpu: Support Venus context

2024-04-19 Thread Akihiko Odaki

On 2024/04/19 4:00, Dmitry Osipenko wrote:

From: Antonio Caggiano 

Request Venus when initializing VirGL and if vulkan=true flag is set for
virtio-gpu device.

Signed-off-by: Antonio Caggiano 
Signed-off-by: Huang Rui 
Signed-off-by: Dmitry Osipenko 
---
  hw/display/virtio-gpu-virgl.c  | 14 ++
  hw/display/virtio-gpu.c| 15 +++
  include/hw/virtio/virtio-gpu.h |  3 +++
  meson.build|  1 +
  4 files changed, 33 insertions(+)

diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
index c0e1ca3ff339..2eac09370b84 100644
--- a/hw/display/virtio-gpu-virgl.c
+++ b/hw/display/virtio-gpu-virgl.c
@@ -1095,6 +1095,11 @@ int virtio_gpu_virgl_init(VirtIOGPU *g)
  flags |= VIRGL_RENDERER_D3D11_SHARE_TEXTURE;
  }
  #endif
+#ifdef VIRGL_RENDERER_VENUS
+if (virtio_gpu_venus_enabled(g->parent_obj.conf)) {
+flags |= VIRGL_RENDERER_VENUS | VIRGL_RENDERER_RENDER_SERVER;
+}
+#endif
  
  ret = virgl_renderer_init(g, flags, _gpu_3d_cbs);

  if (ret != 0) {
@@ -1138,5 +1143,14 @@ int virtio_gpu_virgl_get_num_capsets(VirtIOGPU *g)
  virtio_gpu_virgl_add_capset(g, VIRTIO_GPU_CAPSET_VIRGL2);
  }
  
+if (virtio_gpu_venus_enabled(g->parent_obj.conf)) {

+virgl_renderer_get_cap_set(VIRTIO_GPU_CAPSET_VENUS,
+   &capset2_max_ver,
+   &capset2_max_size);
+if (capset2_max_size) {


Now capset2_max_ver and capset2_max_size are misnomers as they are used 
not only for VIRTIO_GPU_CAPSET_VIRGL2 but also VIRTIO_GPU_CAPSET_VENUS. 
Just removing the "capset2_" prefix would be fine.



+virtio_gpu_virgl_add_capset(g, VIRTIO_GPU_CAPSET_VENUS);
+}
+}
+
  return g->num_capsets;
  }
diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index fbf5c0e6b8b7..a811a86dd600 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -1496,6 +1496,19 @@ void virtio_gpu_device_realize(DeviceState *qdev, Error 
**errp)
  #endif
  }
  
+if (virtio_gpu_venus_enabled(g->parent_obj.conf)) {

+#ifdef HAVE_VIRGL_VENUS
+if (!virtio_gpu_blob_enabled(g->parent_obj.conf) ||
+!virtio_gpu_hostmem_enabled(g->parent_obj.conf)) {
+error_setg(errp, "venus requires enabled blob and hostmem options");
+return;
+}
+#else
+error_setg(errp, "old virglrenderer, venus unsupported");
+return;
+#endif
+}
+
   if (!virtio_gpu_base_device_realize(qdev,
  virtio_gpu_handle_ctrl_cb,
  virtio_gpu_handle_cursor_cb,
@@ -1672,6 +1685,8 @@ static Property virtio_gpu_properties[] = {
  DEFINE_PROP_BIT("blob", VirtIOGPU, parent_obj.conf.flags,
  VIRTIO_GPU_FLAG_BLOB_ENABLED, false),
  DEFINE_PROP_SIZE("hostmem", VirtIOGPU, parent_obj.conf.hostmem, 0),
+DEFINE_PROP_BIT("vulkan", VirtIOGPU, parent_obj.conf.flags,
+VIRTIO_GPU_FLAG_VENUS_ENABLED, false),


This property shouldn't be added here because it is specific to 
virtio-gpu-gl.




Re: [PATCH v8 10/11] virtio-gpu: Register capsets dynamically

2024-04-19 Thread Akihiko Odaki

On 2024/04/19 4:00, Dmitry Osipenko wrote:

From: Pierre-Eric Pelloux-Prayer 

virtio_gpu_virgl_get_num_capsets will return "num_capsets", but we can't
assume that capset_index 1 is always VIRGL2 once we support more capsets,
like the Venus and DRM capsets. Register capsets dynamically to avoid that problem.

Signed-off-by: Pierre-Eric Pelloux-Prayer 
Signed-off-by: Dmitry Osipenko 
---
  hw/display/virtio-gpu-virgl.c  | 37 ++
  include/hw/virtio/virtio-gpu.h |  3 +++
  2 files changed, 27 insertions(+), 13 deletions(-)

diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
index eee3816b987f..c0e1ca3ff339 100644
--- a/hw/display/virtio-gpu-virgl.c
+++ b/hw/display/virtio-gpu-virgl.c
@@ -558,19 +558,12 @@ static void virgl_cmd_get_capset_info(VirtIOGPU *g,
  VIRTIO_GPU_FILL_CMD(info);
  
  memset(&resp, 0, sizeof(resp));

-if (info.capset_index == 0) {
-resp.capset_id = VIRTIO_GPU_CAPSET_VIRGL;
-virgl_renderer_get_cap_set(resp.capset_id,
-   &resp.capset_max_version,
-   &resp.capset_max_size);
-} else if (info.capset_index == 1) {
-resp.capset_id = VIRTIO_GPU_CAPSET_VIRGL2;
+
+if (info.capset_index < g->num_capsets) {
+resp.capset_id = g->capset_ids[info.capset_index];
  virgl_renderer_get_cap_set(resp.capset_id,
 &resp.capset_max_version,
 &resp.capset_max_size);
-} else {
-resp.capset_max_version = 0;
-resp.capset_max_size = 0;
  }
  resp.hdr.type = VIRTIO_GPU_RESP_OK_CAPSET_INFO;
  virtio_gpu_ctrl_response(g, cmd, &resp.hdr, sizeof(resp));
@@ -1120,12 +1113,30 @@ int virtio_gpu_virgl_init(VirtIOGPU *g)
  return 0;
  }
  
+static void virtio_gpu_virgl_add_capset(VirtIOGPU *g, uint32_t capset_id)

+{
+g->capset_ids = g_realloc_n(g->capset_ids, g->num_capsets + 1,
+sizeof(*g->capset_ids));
+g->capset_ids[g->num_capsets++] = capset_id;
+}
+
  int virtio_gpu_virgl_get_num_capsets(VirtIOGPU *g)
  {
  uint32_t capset2_max_ver, capset2_max_size;
+
+if (g->num_capsets) {
+return g->num_capsets;
+}
+
+/* VIRGL is always supported. */
+virtio_gpu_virgl_add_capset(g, VIRTIO_GPU_CAPSET_VIRGL);
+
  virgl_renderer_get_cap_set(VIRTIO_GPU_CAPSET_VIRGL2,
-  &capset2_max_ver,
-  &capset2_max_size);
+   &capset2_max_ver,
+   &capset2_max_size);
+if (capset2_max_ver) {
+virtio_gpu_virgl_add_capset(g, VIRTIO_GPU_CAPSET_VIRGL2);
+}
  
-return capset2_max_ver ? 2 : 1;

+return g->num_capsets;
  }
diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
index d2a0d542fbb3..3d7d001a85c5 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -218,6 +218,9 @@ struct VirtIOGPU {
  QTAILQ_HEAD(, VGPUDMABuf) bufs;
  VGPUDMABuf *primary[VIRTIO_GPU_MAX_SCANOUTS];
  } dmabuf;
+
+uint32_t *capset_ids;
+uint32_t num_capsets;


Use GArray.



Re: [PATCH v8 09/11] virtio-gpu: Resource UUID

2024-04-19 Thread Akihiko Odaki

On 2024/04/19 4:00, Dmitry Osipenko wrote:

From: Antonio Caggiano 

Enable resource UUID feature and implement command resource assign UUID.
UUID feature availability is mandatory for Vulkan Venus context.

UUID is intended for sharing dmabufs between virtio devices on the host. QEMU
doesn't have a second virtio device for sharing, thus a simple stub UUID
implementation is enough. A more complete implementation using a global UUID
resource table might become interesting for multi-GPU cases.


Isn't it possible to add two virtio-gpu devices even now?

A new subsection should also be added for migration compatibility; see: 
docs/devel/migration/main.rst




Re: [PATCH v8 08/11] virtio-gpu: Handle resource blob commands

2024-04-19 Thread Akihiko Odaki

On 2024/04/19 4:00, Dmitry Osipenko wrote:

From: Antonio Caggiano 

Support BLOB resources creation, mapping and unmapping by calling the
new stable virglrenderer 0.10 interface. Only enabled when available and
via the blob config. E.g. -device virtio-vga-gl,blob=true

Signed-off-by: Antonio Caggiano 
Signed-off-by: Xenia Ragiadakou 
Signed-off-by: Huang Rui 
Signed-off-by: Dmitry Osipenko 
---
  hw/display/virtio-gpu-virgl.c  | 248 +
  hw/display/virtio-gpu.c|   4 +-
  include/hw/virtio/virtio-gpu.h |   4 +
  3 files changed, 255 insertions(+), 1 deletion(-)

diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
index bb9ee1eba9a0..de132b22f554 100644
--- a/hw/display/virtio-gpu-virgl.c
+++ b/hw/display/virtio-gpu-virgl.c
@@ -32,6 +32,102 @@ virgl_get_egl_display(G_GNUC_UNUSED void *cookie)
  }
  #endif
  
+#ifdef HAVE_VIRGL_RESOURCE_BLOB

+struct virtio_gpu_virgl_hostmem_region {
+MemoryRegion mr;
+VirtIOGPUBase *b;
+struct virtio_gpu_simple_resource *res;
+};
+
+static void virtio_gpu_virgl_hostmem_region_free(void *obj)
+{
+MemoryRegion *mr = MEMORY_REGION(obj);
+struct virtio_gpu_virgl_hostmem_region *vmr;
+
+vmr = container_of(mr, struct virtio_gpu_virgl_hostmem_region, mr);
+vmr->res->async_unmap_in_progress = false;
+vmr->res->async_unmap_completed = true;
+vmr->b->renderer_blocked--;


Resume the command queue processing here.


+
+g_free(vmr);
+}
+
+static int
+virtio_gpu_virgl_map_resource_blob(VirtIOGPU *g,
+   struct virtio_gpu_simple_resource *res,
+   uint64_t offset)
+{
+struct virtio_gpu_virgl_hostmem_region *vmr;
+VirtIOGPUBase *b = VIRTIO_GPU_BASE(g);
+uint64_t size;
+void *data;
+int ret;
+
+if (!virtio_gpu_hostmem_enabled(b->conf)) {
+return -EOPNOTSUPP;


Log a message here instead of picking an error number.


+}
+
+ret = virgl_renderer_resource_map(res->resource_id, &data, &size);
+if (ret) {
+return -ret;
+}
+
+vmr = g_new0(struct virtio_gpu_virgl_hostmem_region, 1);
+MemoryRegion *mr = &vmr->mr;


Mixed declarations are not allowed; see: docs/devel/style.rst


+vmr->res = res;
+vmr->b = b;
+
+memory_region_init_ram_ptr(mr, OBJECT(mr), "blob", size, data);
+memory_region_add_subregion(&b->hostmem, offset, mr);
+memory_region_set_enabled(mr, true);
+
+/*
+ * Potentially, MR could outlive the resource if MR's reference is held
+ * outside of virtio-gpu. In order to prevent unmapping resource while
+ * MR is alive, and thus, making the data pointer invalid, we will block
+ * virtio-gpu command processing until MR is fully unreferenced and
+ * released.
+ */
+OBJECT(mr)->free = virtio_gpu_virgl_hostmem_region_free;
+
+res->mr = mr;
+
+return 0;
+}
+
+static bool
+virtio_gpu_virgl_unmap_resource_blob(VirtIOGPU *g,
+ struct virtio_gpu_simple_resource *res)
+{
+VirtIOGPUBase *b = VIRTIO_GPU_BASE(g);
+
+if (!res->async_unmap_in_progress && !res->async_unmap_completed) {
+/* the memory region owns the res->mr object and frees it by itself */
+MemoryRegion *mr = res->mr;
+res->mr = NULL;
+
+res->async_unmap_in_progress = true;
+
+/* render will be unblocked when MR is freed */
+b->renderer_blocked++;
+
+memory_region_set_enabled(mr, false);
+memory_region_del_subregion(&b->hostmem, mr);
+object_unparent(OBJECT(mr));
+}
+
+if (!res->async_unmap_completed) {


This check is unnecessary as the command processing is blocked until the 
unmap operation completes.



+return false;
+}
+
+virgl_renderer_resource_unmap(res->resource_id);
+res->async_unmap_completed = false;
+
+return true;
+
+}
+#endif /* HAVE_VIRGL_RESOURCE_BLOB */
+
  static void virgl_cmd_create_resource_2d(VirtIOGPU *g,
   struct virtio_gpu_ctrl_command *cmd)
  {
@@ -145,6 +241,14 @@ static void virgl_cmd_resource_unref(VirtIOGPU *g,
  return;
  }
  
+if (res->mr || cmd->suspended) {

+bool unmapped = virtio_gpu_virgl_unmap_resource_blob(g, res);
+cmd->suspended = !unmapped;
+if (cmd->suspended) {
+return;
+}
+}
+
  virgl_renderer_resource_detach_iov(unref.resource_id,
 &res_iovs,
 &num_iovs);
@@ -495,6 +599,141 @@ static void virgl_cmd_get_capset(VirtIOGPU *g,
  }
  
  #ifdef HAVE_VIRGL_RESOURCE_BLOB

+static void virgl_cmd_resource_create_blob(VirtIOGPU *g,
+   struct virtio_gpu_ctrl_command *cmd)
+{
+struct virgl_renderer_resource_create_blob_args virgl_args = { 0 };
+struct virtio_gpu_resource_create_blob cblob;
+struct virtio_gpu_simple_resource *res;
+int ret;
+
+if 

Re: [PATCH v8 07/11] virtio-gpu: Support suspension of commands processing

2024-04-19 Thread Akihiko Odaki

On 2024/04/19 4:00, Dmitry Osipenko wrote:

Add new "suspended" flag to virtio_gpu_ctrl_command telling cmd
processor that it should stop processing commands and retry again
next time until flag is unset.

Signed-off-by: Dmitry Osipenko 


This flag shouldn't be added to virtio_gpu_ctrl_command. suspended is 
just !finished in virtio-gpu.c. Only virtio_gpu_virgl_process_cmd() 
needs the distinction of suspended and !finished so it is not 
appropriate to add this flag the common structure.


Regards,
Akihiko Odaki



Re: [PATCH v9 13/20] virtio-net: Return an error when vhost cannot enable RSS

2024-04-16 Thread Akihiko Odaki

On 2024/04/16 13:00, Jason Wang wrote:

On Mon, Apr 15, 2024 at 10:05 PM Yuri Benditovich
 wrote:


On Wed, Apr 3, 2024 at 2:11 PM Akihiko Odaki  wrote:


vhost requires eBPF for RSS. When eBPF is not available, virtio-net
implicitly disables RSS even if the user explicitly requests it. Return
an error instead of implicitly disabling RSS if RSS is requested but not
available.

Signed-off-by: Akihiko Odaki 
---
  hw/net/virtio-net.c | 97 ++---
  1 file changed, 48 insertions(+), 49 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 61b49e335dea..3d53eba88cfc 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -793,9 +793,6 @@ static uint64_t virtio_net_get_features(VirtIODevice *vdev, 
uint64_t features,
  return features;
  }

-if (!ebpf_rss_is_loaded(&n->ebpf_rss)) {
-virtio_clear_feature(&features, VIRTIO_NET_F_RSS);
-}
  features = vhost_net_get_features(get_vhost_net(nc->peer), features);
  vdev->backend_features = features;

@@ -3591,6 +3588,50 @@ static bool failover_hide_primary_device(DeviceListener 
*listener,
  return qatomic_read(&n->failover_primary_hidden);
  }

+static void virtio_net_device_unrealize(DeviceState *dev)
+{
+VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+VirtIONet *n = VIRTIO_NET(dev);
+int i, max_queue_pairs;
+
+if (virtio_has_feature(n->host_features, VIRTIO_NET_F_RSS)) {
+virtio_net_unload_ebpf(n);
+}
+
+/* This will stop vhost backend if appropriate. */
+virtio_net_set_status(vdev, 0);
+
+g_free(n->netclient_name);
+n->netclient_name = NULL;
+g_free(n->netclient_type);
+n->netclient_type = NULL;
+
+g_free(n->mac_table.macs);
+g_free(n->vlans);
+
+if (n->failover) {
+qobject_unref(n->primary_opts);
+device_listener_unregister(&n->primary_listener);
+migration_remove_notifier(&n->migration_state);
+} else {
+assert(n->primary_opts == NULL);
+}
+
+max_queue_pairs = n->multiqueue ? n->max_queue_pairs : 1;
+for (i = 0; i < max_queue_pairs; i++) {
+virtio_net_del_queue(n, i);
+}
+/* delete also control vq */
+virtio_del_queue(vdev, max_queue_pairs * 2);
+qemu_announce_timer_del(&n->announce_timer, false);
+g_free(n->vqs);
+qemu_del_nic(n->nic);
+virtio_net_rsc_cleanup(n);
+g_free(n->rss_data.indirections_table);
+net_rx_pkt_uninit(n->rx_pkt);
+virtio_cleanup(vdev);
+}
+
  static void virtio_net_device_realize(DeviceState *dev, Error **errp)
  {
  VirtIODevice *vdev = VIRTIO_DEVICE(dev);
@@ -3760,53 +3801,11 @@ static void virtio_net_device_realize(DeviceState *dev, 
Error **errp)

  net_rx_pkt_init(&n->rx_pkt);

-if (virtio_has_feature(n->host_features, VIRTIO_NET_F_RSS)) {
-virtio_net_load_ebpf(n);
-}
-}
-
-static void virtio_net_device_unrealize(DeviceState *dev)
-{
-VirtIODevice *vdev = VIRTIO_DEVICE(dev);
-VirtIONet *n = VIRTIO_NET(dev);
-int i, max_queue_pairs;
-
-if (virtio_has_feature(n->host_features, VIRTIO_NET_F_RSS)) {
-virtio_net_unload_ebpf(n);
+if (virtio_has_feature(n->host_features, VIRTIO_NET_F_RSS) &&
+!virtio_net_load_ebpf(n) && get_vhost_net(nc->peer)) {
+virtio_net_device_unrealize(dev);
+error_setg(errp, "Can't load eBPF RSS for vhost");
  }


As I already mentioned, I think it is an extremely bad idea to
fail to run QEMU due to such a reason as the absence of one feature.
What I suggest is:
1. Redefine rss as tri-state (off|auto|on)
2. Fail to run only if rss is on and not available via ebpf
3. On auto - silently drop it


"Auto" might be promatic for migration compatibility which is hard to
be used by management layers like libvirt. The reason is that there's
no way for libvirt to know if it is supported by device or not.


Certainly auto is not good for migration, but it is useful in the other 
situations. You can still set "on" or "off" if you care migration. I'll 
add "auto" support in the next version.




Thanks


4. The same with 'hash' option - it is not compatible with vhost (at
least at the moment)
5. Reformat the patch as it is hard to review due to replacing
entire procedures, i.e. one patch with replacing without changes,
another one - with real changes.

If this is hard to review only for me - please ignore that.


I'll split this patch accordingly in the next version.

Regards,
Akihiko Odaki



Re: [PATCH v7 07/10] virtio-gpu: Handle resource blob commands

2024-04-15 Thread Akihiko Odaki

On 2024/04/15 17:49, Dmitry Osipenko wrote:

On 4/15/24 11:13, Akihiko Odaki wrote:

On 2024/04/15 17:03, Dmitry Osipenko wrote:

Hello,

On 4/13/24 14:57, Akihiko Odaki wrote:
...

+static void
+virtio_gpu_virgl_unmap_resource_blob(VirtIOGPU *g,
+ struct virtio_gpu_simple_resource *res)
+{
+    VirtIOGPUBase *b = VIRTIO_GPU_BASE(g);
+
+    if (!res->mr) {
+    return;
+    }
+
+    memory_region_set_enabled(res->mr, false);
+    memory_region_del_subregion(&b->hostmem, res->mr);
+
+    /* memory region owns res->mr object and frees it when mr is released */
+    res->mr = NULL;
+
+    virgl_renderer_resource_unmap(res->resource_id);


Hi,

First, thanks for keeping working on this.

This patch has some changes since the previous version, but it is still
vulnerable to the race condition pointed out. The memory region is
asynchronously unmapped from the guest address space, but the backing
memory on the host address space is unmapped synchronously before that.
This results in use-after-free. The whole unmapping operation needs to
be implemented in an asynchronous manner.


Thanks for the clarification! I missed this point from the previous
discussion.

Could you please clarify what do you mean by the "asynchronous manner"?
Virglrenderer API works only in the virtio-gpu-gl context, it can't be
accessed from other places.

The memory_region_del_subregion() should remove the region as long as
nobody references it, isn't it? On Linux guest nobody should reference
hostmem regions besides virtio-gpu device on the unmap, don't know about
other guests.

We can claim it a guest's fault if MR lives after the deletion and in
that case exit Qemu with a noisy error msg or leak resource. WDYT?



We need to be prepared for a faulty guest for reliability and security
as they are common goals of virtualization, and it is nice to have them
after all.

You need to call virgl_renderer_resource_unmap() after the MR actually
gets freed. The virtio-gpu-gl context is just a context with BQL so it
is fine to call virgl functions in most places.


Do you have example of a legit use-case where hostmem MR could outlive
resource mapping?


MR outliving after memory_region_del_subregion() is not a use-case, but 
a situation that occurs due to the limitation of the memory subsystem. 
It is not easy to answer how often such a situation happens.




Turning it into an error condition is much more reasonable to do than
to worry about an edge case that nobody cares about, which can't be tested
easily and is not trivial to support, IMO.

I'm not sure what you mean by turning into an error condition. I doubt 
it's possible to emit errors when someone touches an unmapped region.


Reproducing this issue is not easy, as is often the case for 
use-after-free bugs, but fixing it is not that complicated in my opinion 
since you already have an implementation which asynchronously unmaps the 
region in v6. Here are my suggestions to fix the problems in v6:


- Remove ref member in virgl_gpu_resource, vres_get_ref(), 
vres_put_ref(), and virgl_resource_unmap().


- Change virtio_gpu_virgl_process_cmd(), 
virgl_cmd_resource_unmap_blob(), and virgl_cmd_resource_unref() to 
return a bool, which tells if the command was processed or suspended.


- In virtio_gpu_process_cmdq(), break if the command was suspended.

- In virgl_resource_blob_async_unmap(), call virtio_gpu_gl_block(g, false).

- In virgl_cmd_resource_unmap_blob() and virgl_cmd_resource_unref(), 
call memory_region_del_subregion() and virtio_gpu_gl_block(g, true), and 
tell that the command was suspended if the reference counter of 
MemoryRegion > 0. Free and unmap the MR otherwise.


Regards,
Akihiko Odaki



Re: [PATCH v7 07/10] virtio-gpu: Handle resource blob commands

2024-04-15 Thread Akihiko Odaki

On 2024/04/15 17:03, Dmitry Osipenko wrote:

Hello,

On 4/13/24 14:57, Akihiko Odaki wrote:
...

+static void
+virtio_gpu_virgl_unmap_resource_blob(VirtIOGPU *g,
+ struct virtio_gpu_simple_resource *res)
+{
+    VirtIOGPUBase *b = VIRTIO_GPU_BASE(g);
+
+    if (!res->mr) {
+    return;
+    }
+
+    memory_region_set_enabled(res->mr, false);
+    memory_region_del_subregion(&b->hostmem, res->mr);
+
+    /* memory region owns res->mr object and frees it when mr is released */
+    res->mr = NULL;
+
+    virgl_renderer_resource_unmap(res->resource_id);


Hi,

First, thanks for keeping working on this.

This patch has some changes since the previous version, but it is still
vulnerable to the race condition pointed out. The memory region is
asynchronously unmapped from the guest address space, but the backing
memory on the host address space is unmapped synchronously before that.
This results in use-after-free. The whole unmapping operation needs to
be implemented in an asynchronous manner.


Thanks for the clarification! I missed this point from the previous
discussion.

Could you please clarify what do you mean by the "asynchronous manner"?
Virglrenderer API works only in the virtio-gpu-gl context, it can't be
accessed from other places.

The memory_region_del_subregion() should remove the region as long as
nobody references it, isn't it? On Linux guest nobody should reference
hostmem regions besides virtio-gpu device on the unmap, don't know about
other guests.

We can claim it a guest's fault if MR lives after the deletion and in
that case exit Qemu with a noisy error msg or leak resource. WDYT?



We need to be prepared for a faulty guest for reliability and security 
as they are common goals of virtualization, and it is nice to have them 
after all.


You need to call virgl_renderer_resource_unmap() after the MR actually 
gets freed. The virtio-gpu-gl context is just a context with BQL so it 
is fine to call virgl functions in most places.


Also the command response must be delayed in a similar manner. Currently 
virtio_gpu_virgl_process_cmd() synchronously returns command responses 
so this needs to be changed.


Regards,
Akihiko Odaki



Re: [PATCH v9 17/20] ebpf: Fix RSS error handling

2024-04-14 Thread Akihiko Odaki

On 2024/04/13 21:16, Yuri Benditovich wrote:

On Wed, Apr 3, 2024 at 2:12 PM Akihiko Odaki  wrote:


calculate_rss_hash() was using hash value 0 to tell if it calculated
a hash, but the hash value may be 0 on a rare occasion. Have a
distinct bool value for correctness.


This is an interesting question - whether in reality the hash value
might be 0 or not.
On one hand - this seems like a kind of fix.
On the other hand - this adds computation cycles for each packet, and the
corner case that this targets to fix seems hardly reachable if at all.
An optimistic estimation is 2.5*10^-8 percent of address:address:port triplets.
I would suggest at least to find some proof of the fact that the calculated
hash might be 0 in a real case where source addresses are not random.


Your estimation, which suggests the low probability of error, and the 
fact that an error in RSS is not fatal and only regresses the 
performance, should be sufficient to determine how this patch should be 
handled; this patch is nice to have, but is not worth backporting to 
stable or merging before the 9.0 release.


Regards,
Akihiko Odaki



Re: [PATCH v7 07/10] virtio-gpu: Handle resource blob commands

2024-04-13 Thread Akihiko Odaki

On 2024/04/11 19:19, Dmitry Osipenko wrote:

From: Antonio Caggiano 

Support BLOB resources creation, mapping and unmapping by calling the
new stable virglrenderer 0.10 interface. Only enabled when available and
via the blob config. E.g. -device virtio-vga-gl,blob=true

Signed-off-by: Antonio Caggiano 
Signed-off-by: Xenia Ragiadakou 
Signed-off-by: Huang Rui 
Signed-off-by: Dmitry Osipenko 
---
  hw/display/virtio-gpu-virgl.c  | 196 +
  hw/display/virtio-gpu.c|   4 +-
  include/hw/virtio/virtio-gpu.h |   1 +
  3 files changed, 200 insertions(+), 1 deletion(-)

diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
index c2057b0c2147..ec63f5d698b7 100644
--- a/hw/display/virtio-gpu-virgl.c
+++ b/hw/display/virtio-gpu-virgl.c
@@ -32,6 +32,55 @@ virgl_get_egl_display(G_GNUC_UNUSED void *cookie)
  }
  #endif
  
+#ifdef HAVE_VIRGL_RESOURCE_BLOB

+static int
+virtio_gpu_virgl_map_resource_blob(VirtIOGPU *g,
+   struct virtio_gpu_simple_resource *res,
+   uint64_t offset)
+{
+VirtIOGPUBase *b = VIRTIO_GPU_BASE(g);
+uint64_t size;
+void *data;
+int ret;
+
+if (!virtio_gpu_hostmem_enabled(b->conf)) {
+return -EOPNOTSUPP;
+}
+
+ret = virgl_renderer_resource_map(res->resource_id, &data, &size);
+if (ret) {
+return -ret;
+}
+
+res->mr = g_new0(MemoryRegion, 1);
+memory_region_init_ram_ptr(res->mr, OBJECT(res->mr), "blob",
+   size, data);
+memory_region_add_subregion(&b->hostmem, offset, res->mr);
+memory_region_set_enabled(res->mr, true);
+
+return 0;
+}
+
+static void
+virtio_gpu_virgl_unmap_resource_blob(VirtIOGPU *g,
+ struct virtio_gpu_simple_resource *res)
+{
+VirtIOGPUBase *b = VIRTIO_GPU_BASE(g);
+
+if (!res->mr) {
+return;
+}
+
+memory_region_set_enabled(res->mr, false);
+memory_region_del_subregion(&b->hostmem, res->mr);
+
+/* memory region owns res->mr object and frees it when mr is released */
+res->mr = NULL;
+
+virgl_renderer_resource_unmap(res->resource_id);


Hi,

First, thanks for keeping working on this.

This patch has some changes since the previous version, but it is still 
vulnerable to the race condition pointed out. The memory region is 
asynchronously unmapped from the guest address space, but the backing 
memory on the host address space is unmapped synchronously before that. 
This results in use-after-free. The whole unmapping operation needs to 
be implemented in an asynchronous manner.


Regards,
Akihiko Odaki


+}
+#endif /* HAVE_VIRGL_RESOURCE_BLOB */
+
  static void virgl_cmd_create_resource_2d(VirtIOGPU *g,
   struct virtio_gpu_ctrl_command *cmd)
  {
@@ -145,6 +194,10 @@ static void virgl_cmd_resource_unref(VirtIOGPU *g,
  return;
  }
  
+#ifdef HAVE_VIRGL_RESOURCE_BLOB

+virtio_gpu_virgl_unmap_resource_blob(g, res);
+#endif
+
  virgl_renderer_resource_detach_iov(unref.resource_id,
 _iovs,
 _iovs);
@@ -495,6 +548,140 @@ static void virgl_cmd_get_capset(VirtIOGPU *g,
  }
  
  #ifdef HAVE_VIRGL_RESOURCE_BLOB

+static void virgl_cmd_resource_create_blob(VirtIOGPU *g,
+   struct virtio_gpu_ctrl_command *cmd)
+{
+struct virgl_renderer_resource_create_blob_args virgl_args = { 0 };
+struct virtio_gpu_resource_create_blob cblob;
+struct virtio_gpu_simple_resource *res;
+int ret;
+
+if (!virtio_gpu_blob_enabled(g->parent_obj.conf)) {
+cmd->error = VIRTIO_GPU_RESP_ERR_INVALID_PARAMETER;
+return;
+}
+
+VIRTIO_GPU_FILL_CMD(cblob);
+virtio_gpu_create_blob_bswap(&cblob);
+trace_virtio_gpu_cmd_res_create_blob(cblob.resource_id, cblob.size);
+
+if (cblob.resource_id == 0) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: resource id 0 is not allowed\n",
+  __func__);
+cmd->error = VIRTIO_GPU_RESP_ERR_INVALID_RESOURCE_ID;
+return;
+}
+
+res = virtio_gpu_find_resource(g, cblob.resource_id);
+if (res) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: resource already exists %d\n",
+  __func__, cblob.resource_id);
+cmd->error = VIRTIO_GPU_RESP_ERR_INVALID_RESOURCE_ID;
+return;
+}
+
+res = g_new0(struct virtio_gpu_simple_resource, 1);
+res->resource_id = cblob.resource_id;
+res->blob_size = cblob.size;
+res->dmabuf_fd = -1;
+
+if (cblob.blob_mem != VIRTIO_GPU_BLOB_MEM_HOST3D) {
+ret = virtio_gpu_create_mapping_iov(g, cblob.nr_entries, sizeof(cblob),
+cmd, &res->addrs,
+&res->iov, &res->iov_cnt);
+if (!r

Re: [PATCH-for-9.0? v2] hw/net/net_tx_pkt: Fix overrun in update_sctp_checksum()

2024-04-10 Thread Akihiko Odaki

On 2024/04/10 16:04, Philippe Mathieu-Daudé wrote:

If a fragmented packet size is too short, do not try to
calculate its checksum.

Reproduced using:

   $ cat << EOF | qemu-system-i386 -display none -nodefaults \
   -machine q35,accel=qtest -m 32M \
   -device igb,netdev=net0 \
   -netdev user,id=net0 \
   -qtest stdio
   outl 0xcf8 0x8810
   outl 0xcfc 0xe000
   outl 0xcf8 0x8804
   outw 0xcfc 0x06
   write 0xe403 0x1 0x02
   writel 0xe0003808 0x
   write 0xe000381a 0x1 0x5b
   write 0xe000381b 0x1 0x00
   EOF
   Assertion failed: (offset == 0), function iov_from_buf_full, file 
util/iov.c, line 39.
   #1 0x5575e81e952a in iov_from_buf_full qemu/util/iov.c:39:5
   #2 0x5575e6500768 in net_tx_pkt_update_sctp_checksum 
qemu/hw/net/net_tx_pkt.c:144:9
   #3 0x5575e659f3e1 in igb_setup_tx_offloads qemu/hw/net/igb_core.c:478:11
   #4 0x5575e659f3e1 in igb_tx_pkt_send qemu/hw/net/igb_core.c:552:10
   #5 0x5575e659f3e1 in igb_process_tx_desc qemu/hw/net/igb_core.c:671:17
   #6 0x5575e659f3e1 in igb_start_xmit qemu/hw/net/igb_core.c:903:9
   #7 0x5575e659f3e1 in igb_set_tdt qemu/hw/net/igb_core.c:2812:5
   #8 0x5575e657d6a4 in igb_core_write qemu/hw/net/igb_core.c:4248:9

Cc: qemu-sta...@nongnu.org
Reported-by: Zheyu Ma 
Fixes: f199b13bc1 ("igb: Implement Tx SCTP CSO")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2273
Signed-off-by: Philippe Mathieu-Daudé 


Reviewed-by: Akihiko Odaki 


---
Since v1: check at offset 8 (Akihiko)
---
  hw/net/net_tx_pkt.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/hw/net/net_tx_pkt.c b/hw/net/net_tx_pkt.c
index 2134a18c4c..b7b1de816d 100644
--- a/hw/net/net_tx_pkt.c
+++ b/hw/net/net_tx_pkt.c
@@ -141,6 +141,10 @@ bool net_tx_pkt_update_sctp_checksum(struct NetTxPkt *pkt)
  uint32_t csum = 0;
  struct iovec *pl_start_frag = pkt->vec + NET_TX_PKT_PL_START_FRAG;
  
+if (iov_size(pl_start_frag, pkt->payload_frags) < 8 + sizeof(csum)) {

+return false;
+}
+
  if (iov_from_buf(pl_start_frag, pkt->payload_frags, 8, &csum,
   sizeof(csum)) < sizeof(csum)) {
  return false;
  }




Re: [RFC PATCH-for-9.0?] hw/net/net_tx_pkt: Fix overrun in update_sctp_checksum()

2024-04-10 Thread Akihiko Odaki

On 2024/04/10 3:04, Philippe Mathieu-Daudé wrote:

If a fragmented packet size is too short, do not try to
calculate its checksum.

Reproduced using:

   $ cat << EOF | qemu-system-i386 -display none -nodefaults \
   -machine q35,accel=qtest -m 32M \
   -device igb,netdev=net0 \
   -netdev user,id=net0 \
   -qtest stdio
   outl 0xcf8 0x8810
   outl 0xcfc 0xe000
   outl 0xcf8 0x8804
   outw 0xcfc 0x06
   write 0xe403 0x1 0x02
   writel 0xe0003808 0x
   write 0xe000381a 0x1 0x5b
   write 0xe000381b 0x1 0x00
   EOF
   Assertion failed: (offset == 0), function iov_from_buf_full, file 
util/iov.c, line 39.
   #1 0x5575e81e952a in iov_from_buf_full qemu/util/iov.c:39:5
   #2 0x5575e6500768 in net_tx_pkt_update_sctp_checksum 
qemu/hw/net/net_tx_pkt.c:144:9
   #3 0x5575e659f3e1 in igb_setup_tx_offloads qemu/hw/net/igb_core.c:478:11
   #4 0x5575e659f3e1 in igb_tx_pkt_send qemu/hw/net/igb_core.c:552:10
   #5 0x5575e659f3e1 in igb_process_tx_desc qemu/hw/net/igb_core.c:671:17
   #6 0x5575e659f3e1 in igb_start_xmit qemu/hw/net/igb_core.c:903:9
   #7 0x5575e659f3e1 in igb_set_tdt qemu/hw/net/igb_core.c:2812:5
   #8 0x5575e657d6a4 in igb_core_write qemu/hw/net/igb_core.c:4248:9

Reported-by: Zheyu Ma 
Fixes: f199b13bc1 ("igb: Implement Tx SCTP CSO")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2273
Signed-off-by: Philippe Mathieu-Daudé 
---
No clue this makes sense, but avoids the crash...
---
  hw/net/net_tx_pkt.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/hw/net/net_tx_pkt.c b/hw/net/net_tx_pkt.c
index 2134a18c4c..6a8640157f 100644
--- a/hw/net/net_tx_pkt.c
+++ b/hw/net/net_tx_pkt.c
@@ -141,6 +141,10 @@ bool net_tx_pkt_update_sctp_checksum(struct NetTxPkt *pkt)
  uint32_t csum = 0;
  struct iovec *pl_start_frag = pkt->vec + NET_TX_PKT_PL_START_FRAG;
  
+if (iov_size(pl_start_frag, pkt->payload_frags) < sizeof(csum)) {

+return false;
+}
+


iov_from_buf() uses 8 as offset, so we should check if the size >= 8 + 
sizeof(csum).


However I doubt that it is worth to fix. I think it is fine to remove 
the assertion (i.e., remove the requirement that the offset specified 
for iov_from_buf() must be inside iov and instead let the function() 
return 0 in that case).


Regards,
Akihiko Odaki


  if (iov_from_buf(pl_start_frag, pkt->payload_frags, 8, &csum,
   sizeof(csum)) < sizeof(csum)) {
  return false;
  }




Re: [PATCH v9 16/20] virtio-net: Do not write hashes to peer buffer

2024-04-08 Thread Akihiko Odaki

On 2024/04/08 17:06, Yuri Benditovich wrote:

On Mon, Apr 8, 2024 at 10:57 AM Akihiko Odaki  wrote:


On 2024/04/08 16:54, Yuri Benditovich wrote:

On Mon, Apr 8, 2024 at 10:42 AM Akihiko Odaki  wrote:


On 2024/04/08 16:40, Yuri Benditovich wrote:

On Mon, Apr 8, 2024 at 4:30 AM Akihiko Odaki  wrote:


On 2024/04/08 7:09, Yuri Benditovich wrote:

On Wed, Apr 3, 2024 at 2:12 PM Akihiko Odaki  wrote:


The peer buffer is qualified with const and not meant to be modified.


IMHO, this buffer is not so 'const' (although the prototype states so),
it is allocated in net.c
btw, another procedure in this file also modifies the buffer
(work_around_broken_dhclient)


Right but it has a FIXME comment.




It also prevents enabling VIRTIO_NET_F_HASH_REPORT for peers without
virtio-net header support.


Does it mean _this commit_ prevents enabling VIRTIO_NET_F_HASH_REPORT
for peers without
virtio-net header support? Where?


No, but I meant that this patch fixes such a problem.


No, it does not. Such a problem does not exist in the master, the
hash_report feature
is silently dropped in such case:
https://github.com/qemu/qemu/blob/master/hw/net/virtio-net.c#L816


Well, silently dropping VIRTIO_NET_F_HASH_REPORT is not different from
preventing enabling VIRTIO_NET_F_HASH_REPORT, is it?


But how is your patch involved in it? Should this line be removed from
the commit message?


In the master, VIRTIO_NET_F_HASH_REPORT is silently dropped, but this 
patch changes it to work without dropping the feature, which is worth mentioning.

After applying this series of patches the VIRTIO_NET_F_HASH_REPORT is
dropped _the same way_ as in the master


You are right. I forgot that I dropped the patch "virtio-net: Do not clear 
VIRTIO_NET_F_HASH_REPORT" in v7. I'll drop the line in the next 
version accordingly. Thanks for pointing that out.


Regards,
Akihiko Odaki



Re: [PATCH v9 16/20] virtio-net: Do not write hashes to peer buffer

2024-04-08 Thread Akihiko Odaki

On 2024/04/08 16:54, Yuri Benditovich wrote:

On Mon, Apr 8, 2024 at 10:42 AM Akihiko Odaki  wrote:


On 2024/04/08 16:40, Yuri Benditovich wrote:

On Mon, Apr 8, 2024 at 4:30 AM Akihiko Odaki  wrote:


On 2024/04/08 7:09, Yuri Benditovich wrote:

On Wed, Apr 3, 2024 at 2:12 PM Akihiko Odaki  wrote:


The peer buffer is qualified with const and not meant to be modified.


IMHO, this buffer is not so 'const' (although the prototype states so),
it is allocated in net.c
btw, another procedure in this file also modifies the buffer
(work_around_broken_dhclient)


Right but it has a FIXME comment.




It also prevents enabling VIRTIO_NET_F_HASH_REPORT for peers without
virtio-net header support.


Does it mean _this commit_ prevents enabling VIRTIO_NET_F_HASH_REPORT
for peers without
virtio-net header support? Where?


No, but I meant that this patch fixes such a problem.


No, it does not. Such a problem does not exist in the master, the
hash_report feature
is silently dropped in such case:
https://github.com/qemu/qemu/blob/master/hw/net/virtio-net.c#L816


Well, silently dropping VIRTIO_NET_F_HASH_REPORT is not different from
preventing enabling VIRTIO_NET_F_HASH_REPORT, is it?


But how is your patch involved in it? Should this line be removed from
the commit message?


In the master, VIRTIO_NET_F_HASH_REPORT is silently dropped, but this
patch changes the behavior to work without dropping it, which is worth
mentioning.


Regards,
Akihiko Odaki



Re: [PATCH v9 16/20] virtio-net: Do not write hashes to peer buffer

2024-04-08 Thread Akihiko Odaki

On 2024/04/08 16:40, Yuri Benditovich wrote:

On Mon, Apr 8, 2024 at 4:30 AM Akihiko Odaki  wrote:


On 2024/04/08 7:09, Yuri Benditovich wrote:

On Wed, Apr 3, 2024 at 2:12 PM Akihiko Odaki  wrote:


The peer buffer is qualified with const and not meant to be modified.


IMHO, this buffer is not so 'const' (although the prototype states so),
it is allocated in net.c
btw, another procedure in this file also modifies the buffer
(work_around_broken_dhclient)


Right but it has a FIXME comment.




It also prevents enabling VIRTIO_NET_F_HASH_REPORT for peers without
virtio-net header support.


Does it mean _this commit_ prevents enabling VIRTIO_NET_F_HASH_REPORT
for peers without
virtio-net header support? Where?


No, but I meant that this patch fixes such a problem.


No, it does not. Such a problem does not exist in the master, the
hash_report feature
is silently dropped in such case:
https://github.com/qemu/qemu/blob/master/hw/net/virtio-net.c#L816


Well, silently dropping VIRTIO_NET_F_HASH_REPORT is not different from 
preventing enabling VIRTIO_NET_F_HASH_REPORT, is it?


Regards,
Akihiko Odaki



Re: [PATCH v9 16/20] virtio-net: Do not write hashes to peer buffer

2024-04-07 Thread Akihiko Odaki

On 2024/04/08 7:09, Yuri Benditovich wrote:

On Wed, Apr 3, 2024 at 2:12 PM Akihiko Odaki  wrote:


The peer buffer is qualified with const and not meant to be modified.


IMHO, this buffer is not so 'const' (although the prototype states so),
it is allocated in net.c
btw, another procedure in this file also modifies the buffer
(work_around_broken_dhclient)


Right but it has a FIXME comment.




It also prevents enabling VIRTIO_NET_F_HASH_REPORT for peers without
virtio-net header support.


Does it mean _this commit_ prevents enabling VIRTIO_NET_F_HASH_REPORT
for peers without
virtio-net header support? Where?


No, but I meant that this patch fixes such a problem.

Regards,
Akihiko Odaki



Re: [PATCH v9 13/20] virtio-net: Return an error when vhost cannot enable RSS

2024-04-07 Thread Akihiko Odaki

On 2024/04/08 6:46, Yuri Benditovich wrote:

On Wed, Apr 3, 2024 at 2:11 PM Akihiko Odaki  wrote:


vhost requires eBPF for RSS. When eBPF is not available, virtio-net
implicitly disables RSS even if the user explicitly requests it. Return
an error instead of implicitly disabling RSS if RSS is requested but not
available.

Signed-off-by: Akihiko Odaki 
---
  hw/net/virtio-net.c | 97 ++---
  1 file changed, 48 insertions(+), 49 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 61b49e335dea..3d53eba88cfc 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -793,9 +793,6 @@ static uint64_t virtio_net_get_features(VirtIODevice *vdev, 
uint64_t features,
  return features;
  }

-if (!ebpf_rss_is_loaded(&n->ebpf_rss)) {
-virtio_clear_feature(&features, VIRTIO_NET_F_RSS);
-}
  features = vhost_net_get_features(get_vhost_net(nc->peer), features);
  vdev->backend_features = features;

@@ -3591,6 +3588,50 @@ static bool failover_hide_primary_device(DeviceListener 
*listener,
  return qatomic_read(&n->failover_primary_hidden);
  }

+static void virtio_net_device_unrealize(DeviceState *dev)
+{
+VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+VirtIONet *n = VIRTIO_NET(dev);
+int i, max_queue_pairs;
+
+if (virtio_has_feature(n->host_features, VIRTIO_NET_F_RSS)) {
+virtio_net_unload_ebpf(n);
+}
+
+/* This will stop vhost backend if appropriate. */
+virtio_net_set_status(vdev, 0);
+
+g_free(n->netclient_name);
+n->netclient_name = NULL;
+g_free(n->netclient_type);
+n->netclient_type = NULL;
+
+g_free(n->mac_table.macs);
+g_free(n->vlans);
+
+if (n->failover) {
+qobject_unref(n->primary_opts);
+device_listener_unregister(&n->primary_listener);
+migration_remove_notifier(&n->migration_state);
+} else {
+assert(n->primary_opts == NULL);
+}
+
+max_queue_pairs = n->multiqueue ? n->max_queue_pairs : 1;
+for (i = 0; i < max_queue_pairs; i++) {
+virtio_net_del_queue(n, i);
+}
+/* delete also control vq */
+virtio_del_queue(vdev, max_queue_pairs * 2);
+qemu_announce_timer_del(&n->announce_timer, false);
+g_free(n->vqs);
+qemu_del_nic(n->nic);
+virtio_net_rsc_cleanup(n);
+g_free(n->rss_data.indirections_table);
+net_rx_pkt_uninit(n->rx_pkt);
+virtio_cleanup(vdev);
+}
+
  static void virtio_net_device_realize(DeviceState *dev, Error **errp)
  {
  VirtIODevice *vdev = VIRTIO_DEVICE(dev);
@@ -3760,53 +3801,11 @@ static void virtio_net_device_realize(DeviceState *dev, 
Error **errp)

  net_rx_pkt_init(&n->rx_pkt);

-if (virtio_has_feature(n->host_features, VIRTIO_NET_F_RSS)) {
-virtio_net_load_ebpf(n);
-}
-}
-
-static void virtio_net_device_unrealize(DeviceState *dev)
-{
-VirtIODevice *vdev = VIRTIO_DEVICE(dev);
-VirtIONet *n = VIRTIO_NET(dev);
-int i, max_queue_pairs;
-
-if (virtio_has_feature(n->host_features, VIRTIO_NET_F_RSS)) {
-virtio_net_unload_ebpf(n);
+if (virtio_has_feature(n->host_features, VIRTIO_NET_F_RSS) &&


I disagree with this change of qemu behavior.
 From my point of view:
- this is not a major problem and it should not be a reason to stop VM execution
- it is enough to disable the RSS feature and continue working. Depending on
   other qemu parameters (number of queues, number of cpus) this might be just
   suboptimal. might be a minor problem and might be not a problem at all


The reasoning is that we shouldn't disable what the user explicitly 
requested. c.f., 
https://lore.kernel.org/all/20231102091717-mutt-send-email-...@kernel.org/



- this change defines rss as _only_ feature whose absence breaks the VM start,
   _all_ other features are dropped silently and only rss is not. Why??


I'm following what QEMU does in the other places rather than what it 
does just in virtio-net. I have pointed out virtio-gpu raises errors in 
such a situation. c.f., 
https://lore.kernel.org/all/8880b6f9-f556-46f7-a191-eeec0fe20...@daynix.com



- the series has a title 'Fixes and improvements' . This is not a fix and not an
   improvement, this is significant behavioral change that should be discussed 
in
   light of future plans regarding rss
- I suggest to remove this change from the series, submit it separately
   and discuss from all the sides


We have already discussed these matters; I responded to all past 
replies in the previous versions months ago and received no update after 
that. Let's focus on matters that were not previously pointed out.


Regards,
Akihiko Odaki



[PATCH v9 09/20] virtio-net: Copy header only when necessary

2024-04-03 Thread Akihiko Odaki
The copied header is only used for byte swapping.

Signed-off-by: Akihiko Odaki 
---
 hw/net/virtio-net.c | 26 --
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index e33bdbfd84a5..ca0fbf7b7654 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -360,7 +360,8 @@ static void virtio_net_vnet_endian_status(VirtIONet *n, 
uint8_t status)
  * can't do it, we fallback onto fixing the headers in the core
  * virtio-net code.
  */
-n->needs_vnet_hdr_swap = virtio_net_set_vnet_endian(vdev, n->nic->ncs,
+n->needs_vnet_hdr_swap = n->has_vnet_hdr &&
+ virtio_net_set_vnet_endian(vdev, n->nic->ncs,
 queue_pairs, true);
 } else if (virtio_net_started(n, vdev->status)) {
 /* After using the device, we need to reset the network backend to
@@ -2767,7 +2768,7 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q)
 return -EINVAL;
 }
 
-if (n->has_vnet_hdr) {
+if (n->needs_vnet_hdr_swap) {
 if (iov_to_buf(out_sg, out_num, 0, &vhdr, n->guest_hdr_len) <
 n->guest_hdr_len) {
 virtio_error(vdev, "virtio-net header incorrect");
@@ -2775,19 +2776,16 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q)
 g_free(elem);
 return -EINVAL;
 }
-if (n->needs_vnet_hdr_swap) {
-virtio_net_hdr_swap(vdev, (void *) &vhdr);
-sg2[0].iov_base = &vhdr;
-sg2[0].iov_len = n->guest_hdr_len;
-out_num = iov_copy(&sg2[1], ARRAY_SIZE(sg2) - 1,
-   out_sg, out_num,
-   n->guest_hdr_len, -1);
-if (out_num == VIRTQUEUE_MAX_SIZE) {
-goto drop;
-}
-out_num += 1;
-out_sg = sg2;
+virtio_net_hdr_swap(vdev, (void *) &vhdr);
+sg2[0].iov_base = &vhdr;
+sg2[0].iov_len = n->guest_hdr_len;
+out_num = iov_copy(&sg2[1], ARRAY_SIZE(sg2) - 1, out_sg, out_num,
+   n->guest_hdr_len, -1);
+if (out_num == VIRTQUEUE_MAX_SIZE) {
+goto drop;
 }
+out_num += 1;
+out_sg = sg2;
 }
 /*
  * If host wants to see the guest header as is, we can

-- 
2.44.0




[PATCH v9 12/20] virtio-net: Unify the logic to update NIC state for RSS

2024-04-03 Thread Akihiko Odaki
The code to attach or detach the eBPF program for RSS was duplicated, so
unify it into one function to save some code.

Signed-off-by: Akihiko Odaki 
---
 hw/net/virtio-net.c | 90 +
 1 file changed, 36 insertions(+), 54 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 1ac9c06f6865..61b49e335dea 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -1232,18 +1232,6 @@ static int virtio_net_handle_announce(VirtIONet *n, 
uint8_t cmd,
 }
 }
 
-static void virtio_net_detach_epbf_rss(VirtIONet *n);
-
-static void virtio_net_disable_rss(VirtIONet *n)
-{
-if (n->rss_data.enabled) {
-trace_virtio_net_rss_disable();
-}
-n->rss_data.enabled = false;
-
-virtio_net_detach_epbf_rss(n);
-}
-
 static bool virtio_net_attach_ebpf_to_backend(NICState *nic, int prog_fd)
 {
 NetClientState *nc = qemu_get_peer(qemu_get_queue(nic), 0);
@@ -1291,6 +1279,40 @@ static void virtio_net_detach_epbf_rss(VirtIONet *n)
 virtio_net_attach_ebpf_to_backend(n->nic, -1);
 }
 
+static void virtio_net_commit_rss_config(VirtIONet *n)
+{
+if (n->rss_data.enabled) {
+n->rss_data.enabled_software_rss = n->rss_data.populate_hash;
+if (n->rss_data.populate_hash) {
+virtio_net_detach_epbf_rss(n);
+} else if (!virtio_net_attach_epbf_rss(n)) {
+if (get_vhost_net(qemu_get_queue(n->nic)->peer)) {
+warn_report("Can't load eBPF RSS for vhost");
+} else {
+warn_report("Can't load eBPF RSS - fallback to software RSS");
+n->rss_data.enabled_software_rss = true;
+}
+}
+
+trace_virtio_net_rss_enable(n->rss_data.hash_types,
+n->rss_data.indirections_len,
+sizeof(n->rss_data.key));
+} else {
+virtio_net_detach_epbf_rss(n);
+trace_virtio_net_rss_disable();
+}
+}
+
+static void virtio_net_disable_rss(VirtIONet *n)
+{
+if (!n->rss_data.enabled) {
+return;
+}
+
+n->rss_data.enabled = false;
+virtio_net_commit_rss_config(n);
+}
+
 static bool virtio_net_load_ebpf_fds(VirtIONet *n)
 {
 int fds[EBPF_RSS_MAX_FDS] = { [0 ... EBPF_RSS_MAX_FDS - 1] = -1};
@@ -1455,28 +1477,7 @@ static uint16_t virtio_net_handle_rss(VirtIONet *n,
 goto error;
 }
 n->rss_data.enabled = true;
-
-if (!n->rss_data.populate_hash) {
-if (!virtio_net_attach_epbf_rss(n)) {
-/* EBPF must be loaded for vhost */
-if (get_vhost_net(qemu_get_queue(n->nic)->peer)) {
-warn_report("Can't load eBPF RSS for vhost");
-goto error;
-}
-/* fallback to software RSS */
-warn_report("Can't load eBPF RSS - fallback to software RSS");
-n->rss_data.enabled_software_rss = true;
-}
-} else {
-/* use software RSS for hash populating */
-/* and detach eBPF if was loaded before */
-virtio_net_detach_epbf_rss(n);
-n->rss_data.enabled_software_rss = true;
-}
-
-trace_virtio_net_rss_enable(n->rss_data.hash_types,
-n->rss_data.indirections_len,
-temp.b);
+virtio_net_commit_rss_config(n);
 return queue_pairs;
 error:
 trace_virtio_net_rss_error(err_msg, err_value);
@@ -3092,26 +3093,7 @@ static int virtio_net_post_load_device(void *opaque, int 
version_id)
 }
 }
 
-if (n->rss_data.enabled) {
-n->rss_data.enabled_software_rss = n->rss_data.populate_hash;
-if (!n->rss_data.populate_hash) {
-if (!virtio_net_attach_epbf_rss(n)) {
-if (get_vhost_net(qemu_get_queue(n->nic)->peer)) {
-warn_report("Can't post-load eBPF RSS for vhost");
-} else {
-warn_report("Can't post-load eBPF RSS - "
-"fallback to software RSS");
-n->rss_data.enabled_software_rss = true;
-}
-}
-}
-
-trace_virtio_net_rss_enable(n->rss_data.hash_types,
-n->rss_data.indirections_len,
-sizeof(n->rss_data.key));
-} else {
-trace_virtio_net_rss_disable();
-}
+virtio_net_commit_rss_config(n);
 return 0;
 }
 

-- 
2.44.0




[PATCH v9 20/20] ebpf: Add a separate target for skeleton

2024-04-03 Thread Akihiko Odaki
This generalizes the rule that generates the skeleton and allows adding
another.

Signed-off-by: Akihiko Odaki 
---
 tools/ebpf/Makefile.ebpf | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/tools/ebpf/Makefile.ebpf b/tools/ebpf/Makefile.ebpf
index 3391e7ce0898..572ca5987ae6 100755
--- a/tools/ebpf/Makefile.ebpf
+++ b/tools/ebpf/Makefile.ebpf
@@ -1,23 +1,24 @@
-OBJS = rss.bpf.o
+SKELETONS = rss.bpf.skeleton.h
 
 LLVM_STRIP ?= llvm-strip
 CLANG ?= clang
 INC_FLAGS = `$(CLANG) -print-file-name=include`
 EXTRA_CFLAGS ?= -O2 -g -target bpf
 
-all: $(OBJS)
+all: $(SKELETONS)
 
 .PHONY: clean
 
 clean:
-   rm -f $(OBJS)
-   rm -f rss.bpf.skeleton.h
+   rm -f $(SKELETONS) $(SKELETONS:%.skeleton.h=%.o)
 
-$(OBJS):  %.o:%.c
+%.o: %.c
$(CLANG) $(INC_FLAGS) \
 -D__KERNEL__ -D__ASM_SYSREG_H \
 -I../include $(LINUXINCLUDE) \
 $(EXTRA_CFLAGS) -c $< -o $@
$(LLVM_STRIP) -g $@
-   bpftool gen skeleton rss.bpf.o > rss.bpf.skeleton.h
-   cp rss.bpf.skeleton.h ../../ebpf/
+
+%.skeleton.h: %.o
+   bpftool gen skeleton $< > $@
+   cp $@ ../../ebpf/

-- 
2.44.0




[PATCH v9 15/20] virtio-net: Always set populate_hash

2024-04-03 Thread Akihiko Odaki
The member is not cleared during reset, so it may have a stale value.

Signed-off-by: Akihiko Odaki 
---
 hw/net/virtio-net.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 86929c9e1fad..2de073ce18fd 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -651,6 +651,7 @@ static void virtio_net_set_mrg_rx_bufs(VirtIONet *n, int 
mergeable_rx_bufs,
 n->guest_hdr_len = n->mergeable_rx_bufs ?
 sizeof(struct virtio_net_hdr_mrg_rxbuf) :
 sizeof(struct virtio_net_hdr);
+n->rss_data.populate_hash = false;
 }
 
 for (i = 0; i < n->max_queue_pairs; i++) {

-- 
2.44.0




[PATCH v9 13/20] virtio-net: Return an error when vhost cannot enable RSS

2024-04-03 Thread Akihiko Odaki
vhost requires eBPF for RSS. When eBPF is not available, virtio-net
implicitly disables RSS even if the user explicitly requests it. Return
an error instead of implicitly disabling RSS if RSS is requested but not
available.

Signed-off-by: Akihiko Odaki 
---
 hw/net/virtio-net.c | 97 ++---
 1 file changed, 48 insertions(+), 49 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 61b49e335dea..3d53eba88cfc 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -793,9 +793,6 @@ static uint64_t virtio_net_get_features(VirtIODevice *vdev, 
uint64_t features,
 return features;
 }
 
-if (!ebpf_rss_is_loaded(&n->ebpf_rss)) {
-virtio_clear_feature(&features, VIRTIO_NET_F_RSS);
-}
 features = vhost_net_get_features(get_vhost_net(nc->peer), features);
 vdev->backend_features = features;
 
@@ -3591,6 +3588,50 @@ static bool failover_hide_primary_device(DeviceListener 
*listener,
  return qatomic_read(&n->failover_primary_hidden);
 }
 
+static void virtio_net_device_unrealize(DeviceState *dev)
+{
+VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+VirtIONet *n = VIRTIO_NET(dev);
+int i, max_queue_pairs;
+
+if (virtio_has_feature(n->host_features, VIRTIO_NET_F_RSS)) {
+virtio_net_unload_ebpf(n);
+}
+
+/* This will stop vhost backend if appropriate. */
+virtio_net_set_status(vdev, 0);
+
+g_free(n->netclient_name);
+n->netclient_name = NULL;
+g_free(n->netclient_type);
+n->netclient_type = NULL;
+
+g_free(n->mac_table.macs);
+g_free(n->vlans);
+
+if (n->failover) {
+qobject_unref(n->primary_opts);
+device_listener_unregister(&n->primary_listener);
+migration_remove_notifier(&n->migration_state);
+} else {
+assert(n->primary_opts == NULL);
+}
+
+max_queue_pairs = n->multiqueue ? n->max_queue_pairs : 1;
+for (i = 0; i < max_queue_pairs; i++) {
+virtio_net_del_queue(n, i);
+}
+/* delete also control vq */
+virtio_del_queue(vdev, max_queue_pairs * 2);
+qemu_announce_timer_del(&n->announce_timer, false);
+g_free(n->vqs);
+qemu_del_nic(n->nic);
+virtio_net_rsc_cleanup(n);
+g_free(n->rss_data.indirections_table);
+net_rx_pkt_uninit(n->rx_pkt);
+virtio_cleanup(vdev);
+}
+
 static void virtio_net_device_realize(DeviceState *dev, Error **errp)
 {
 VirtIODevice *vdev = VIRTIO_DEVICE(dev);
@@ -3760,53 +3801,11 @@ static void virtio_net_device_realize(DeviceState *dev, 
Error **errp)
 
  net_rx_pkt_init(&n->rx_pkt);
 
-if (virtio_has_feature(n->host_features, VIRTIO_NET_F_RSS)) {
-virtio_net_load_ebpf(n);
-}
-}
-
-static void virtio_net_device_unrealize(DeviceState *dev)
-{
-VirtIODevice *vdev = VIRTIO_DEVICE(dev);
-VirtIONet *n = VIRTIO_NET(dev);
-int i, max_queue_pairs;
-
-if (virtio_has_feature(n->host_features, VIRTIO_NET_F_RSS)) {
-virtio_net_unload_ebpf(n);
+if (virtio_has_feature(n->host_features, VIRTIO_NET_F_RSS) &&
+!virtio_net_load_ebpf(n) && get_vhost_net(nc->peer)) {
+virtio_net_device_unrealize(dev);
+error_setg(errp, "Can't load eBPF RSS for vhost");
 }
-
-/* This will stop vhost backend if appropriate. */
-virtio_net_set_status(vdev, 0);
-
-g_free(n->netclient_name);
-n->netclient_name = NULL;
-g_free(n->netclient_type);
-n->netclient_type = NULL;
-
-g_free(n->mac_table.macs);
-g_free(n->vlans);
-
-if (n->failover) {
-qobject_unref(n->primary_opts);
-device_listener_unregister(&n->primary_listener);
-migration_remove_notifier(&n->migration_state);
-} else {
-assert(n->primary_opts == NULL);
-}
-
-max_queue_pairs = n->multiqueue ? n->max_queue_pairs : 1;
-for (i = 0; i < max_queue_pairs; i++) {
-virtio_net_del_queue(n, i);
-}
-/* delete also control vq */
-virtio_del_queue(vdev, max_queue_pairs * 2);
-qemu_announce_timer_del(&n->announce_timer, false);
-g_free(n->vqs);
-qemu_del_nic(n->nic);
-virtio_net_rsc_cleanup(n);
-g_free(n->rss_data.indirections_table);
-net_rx_pkt_uninit(n->rx_pkt);
-virtio_cleanup(vdev);
 }
 
 static void virtio_net_reset(VirtIODevice *vdev)

-- 
2.44.0




[PATCH v9 10/20] virtio-net: Shrink header byte swapping buffer

2024-04-03 Thread Akihiko Odaki
Byte swapping is only performed for the part of the header shared with
the legacy standard, so the buffer only needs to cover that part.

Signed-off-by: Akihiko Odaki 
---
 hw/net/virtio-net.c | 17 ++---
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index ca0fbf7b7654..5aa0527a1921 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -676,11 +676,6 @@ static void virtio_net_set_mrg_rx_bufs(VirtIONet *n, int 
mergeable_rx_bufs,
 
 n->mergeable_rx_bufs = mergeable_rx_bufs;
 
-/*
- * Note: when extending the vnet header, please make sure to
- * change the vnet header copying logic in virtio_net_flush_tx()
- * as well.
- */
 if (version_1) {
 n->guest_hdr_len = hash_report ?
 sizeof(struct virtio_net_hdr_v1_hash) :
@@ -2752,7 +2747,7 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q)
 ssize_t ret;
 unsigned int out_num;
 struct iovec sg[VIRTQUEUE_MAX_SIZE], sg2[VIRTQUEUE_MAX_SIZE + 1], 
*out_sg;
-struct virtio_net_hdr_v1_hash vhdr;
+struct virtio_net_hdr vhdr;
 
 elem = virtqueue_pop(q->tx_vq, sizeof(VirtQueueElement));
 if (!elem) {
@@ -2769,18 +2764,18 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q)
 }
 
 if (n->needs_vnet_hdr_swap) {
-if (iov_to_buf(out_sg, out_num, 0, &vhdr, n->guest_hdr_len) <
-n->guest_hdr_len) {
+if (iov_to_buf(out_sg, out_num, 0, &vhdr, sizeof(vhdr)) <
+sizeof(vhdr)) {
 virtio_error(vdev, "virtio-net header incorrect");
 virtqueue_detach_element(q->tx_vq, elem, 0);
 g_free(elem);
 return -EINVAL;
 }
-virtio_net_hdr_swap(vdev, (void *) &vhdr);
+virtio_net_hdr_swap(vdev, &vhdr);
 sg2[0].iov_base = &vhdr;
-sg2[0].iov_len = n->guest_hdr_len;
+sg2[0].iov_len = sizeof(vhdr);
 out_num = iov_copy(&sg2[1], ARRAY_SIZE(sg2) - 1, out_sg, out_num,
-   n->guest_hdr_len, -1);
+   sizeof(vhdr), -1);
 if (out_num == VIRTQUEUE_MAX_SIZE) {
 goto drop;
 }

-- 
2.44.0




[PATCH v9 03/20] net: Move virtio-net header length assertion

2024-04-03 Thread Akihiko Odaki
The virtio-net header length assertion should happen for all clients.

Signed-off-by: Akihiko Odaki 
---
 net/net.c | 5 +
 net/tap.c | 3 ---
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/net/net.c b/net/net.c
index bd51037ebfb0..db096765f4b2 100644
--- a/net/net.c
+++ b/net/net.c
@@ -56,6 +56,7 @@
 #include "net/filter.h"
 #include "qapi/string-output-visitor.h"
 #include "qapi/qobject-input-visitor.h"
+#include "standard-headers/linux/virtio_net.h"
 
 /* Net bridge is currently not supported for W32. */
 #if !defined(_WIN32)
@@ -550,6 +551,10 @@ void qemu_set_vnet_hdr_len(NetClientState *nc, int len)
 return;
 }
 
+assert(len == sizeof(struct virtio_net_hdr_mrg_rxbuf) ||
+   len == sizeof(struct virtio_net_hdr) ||
+   len == sizeof(struct virtio_net_hdr_v1_hash));
+
 nc->vnet_hdr_len = len;
 nc->info->set_vnet_hdr_len(nc, len);
 }
diff --git a/net/tap.c b/net/tap.c
index c848844955df..49edf6c2b6e1 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -267,9 +267,6 @@ static void tap_set_vnet_hdr_len(NetClientState *nc, int 
len)
 TAPState *s = DO_UPCAST(TAPState, nc, nc);
 
 assert(nc->info->type == NET_CLIENT_DRIVER_TAP);
-assert(len == sizeof(struct virtio_net_hdr_mrg_rxbuf) ||
-   len == sizeof(struct virtio_net_hdr) ||
-   len == sizeof(struct virtio_net_hdr_v1_hash));
 
 tap_fd_set_vnet_hdr_len(s->fd, len);
 s->host_vnet_hdr_len = len;

-- 
2.44.0




[PATCH v9 19/20] ebpf: Refactor tun_rss_steering_prog()

2024-04-03 Thread Akihiko Odaki
This saves branches and makes later BPF program changes easier.

Signed-off-by: Akihiko Odaki 
---
 tools/ebpf/rss.bpf.c | 26 +++---
 1 file changed, 11 insertions(+), 15 deletions(-)

diff --git a/tools/ebpf/rss.bpf.c b/tools/ebpf/rss.bpf.c
index 77434435ac15..c989cb3cd82c 100644
--- a/tools/ebpf/rss.bpf.c
+++ b/tools/ebpf/rss.bpf.c
@@ -547,27 +547,23 @@ int tun_rss_steering_prog(struct __sk_buff *skb)
 config = bpf_map_lookup_elem(&tap_rss_map_configurations, &key);
 toe = bpf_map_lookup_elem(&tap_rss_map_toeplitz_key, &key);
 
-if (config && toe) {
-if (!config->redirect) {
-return config->default_queue;
-}
+if (!config || !toe) {
+return 0;
+}
 
-if (calculate_rss_hash(skb, config, toe, &hash)) {
-__u32 table_idx = hash % config->indirections_len;
-__u16 *queue = 0;
+if (config->redirect && calculate_rss_hash(skb, config, toe, &hash)) {
+__u32 table_idx = hash % config->indirections_len;
+__u16 *queue = 0;
 
-queue = bpf_map_lookup_elem(&tap_rss_map_indirection_table,
-&table_idx);
+queue = bpf_map_lookup_elem(&tap_rss_map_indirection_table,
+&table_idx);
 
-if (queue) {
-return *queue;
-}
+if (queue) {
+return *queue;
 }
-
-return config->default_queue;
 }
 
-return 0;
+return config->default_queue;
 }
 
 char _license[] SEC("license") = "GPL v2";

-- 
2.44.0




[PATCH v9 11/20] virtio-net: Disable RSS on reset

2024-04-03 Thread Akihiko Odaki
RSS is disabled by default.

Fixes: 590790297c ("virtio-net: implement RSS configuration command")
Signed-off-by: Akihiko Odaki 
Reviewed-by: Michael Tokarev 
---
 hw/net/virtio-net.c | 70 +++--
 1 file changed, 36 insertions(+), 34 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 5aa0527a1921..1ac9c06f6865 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -600,40 +600,6 @@ static void virtio_net_queue_enable(VirtIODevice *vdev, 
uint32_t queue_index)
 }
 }
 
-static void virtio_net_reset(VirtIODevice *vdev)
-{
-VirtIONet *n = VIRTIO_NET(vdev);
-int i;
-
-/* Reset back to compatibility mode */
-n->promisc = 1;
-n->allmulti = 0;
-n->alluni = 0;
-n->nomulti = 0;
-n->nouni = 0;
-n->nobcast = 0;
-/* multiqueue is disabled by default */
-n->curr_queue_pairs = 1;
-timer_del(n->announce_timer.tm);
-n->announce_timer.round = 0;
-n->status &= ~VIRTIO_NET_S_ANNOUNCE;
-
-/* Flush any MAC and VLAN filter table state */
-n->mac_table.in_use = 0;
-n->mac_table.first_multi = 0;
-n->mac_table.multi_overflow = 0;
-n->mac_table.uni_overflow = 0;
-memset(n->mac_table.macs, 0, MAC_TABLE_ENTRIES * ETH_ALEN);
-memcpy(&n->mac[0], &n->nic->conf->macaddr, sizeof(n->mac));
-qemu_format_nic_info_str(qemu_get_queue(n->nic), n->mac);
-memset(n->vlans, 0, MAX_VLAN >> 3);
-
-/* Flush any async TX */
-for (i = 0;  i < n->max_queue_pairs; i++) {
-flush_or_purge_queued_packets(qemu_get_subqueue(n->nic, i));
-}
-}
-
 static void peer_test_vnet_hdr(VirtIONet *n)
 {
 NetClientState *nc = qemu_get_queue(n->nic);
@@ -3861,6 +3827,42 @@ static void virtio_net_device_unrealize(DeviceState *dev)
 virtio_cleanup(vdev);
 }
 
+static void virtio_net_reset(VirtIODevice *vdev)
+{
+VirtIONet *n = VIRTIO_NET(vdev);
+int i;
+
+/* Reset back to compatibility mode */
+n->promisc = 1;
+n->allmulti = 0;
+n->alluni = 0;
+n->nomulti = 0;
+n->nouni = 0;
+n->nobcast = 0;
+/* multiqueue is disabled by default */
+n->curr_queue_pairs = 1;
+timer_del(n->announce_timer.tm);
+n->announce_timer.round = 0;
+n->status &= ~VIRTIO_NET_S_ANNOUNCE;
+
+/* Flush any MAC and VLAN filter table state */
+n->mac_table.in_use = 0;
+n->mac_table.first_multi = 0;
+n->mac_table.multi_overflow = 0;
+n->mac_table.uni_overflow = 0;
+memset(n->mac_table.macs, 0, MAC_TABLE_ENTRIES * ETH_ALEN);
+memcpy(&n->mac[0], &n->nic->conf->macaddr, sizeof(n->mac));
+qemu_format_nic_info_str(qemu_get_queue(n->nic), n->mac);
+memset(n->vlans, 0, MAX_VLAN >> 3);
+
+/* Flush any async TX */
+for (i = 0;  i < n->max_queue_pairs; i++) {
+flush_or_purge_queued_packets(qemu_get_subqueue(n->nic, i));
+}
+
+virtio_net_disable_rss(n);
+}
+
 static void virtio_net_instance_init(Object *obj)
 {
 VirtIONet *n = VIRTIO_NET(obj);

-- 
2.44.0




[PATCH v9 02/20] tap: Remove qemu_using_vnet_hdr()

2024-04-03 Thread Akihiko Odaki
Since qemu_set_vnet_hdr_len() is always called when
qemu_using_vnet_hdr() is called, we can merge them and save some code.

For consistency, express that the virtio-net header is not in use by
returning 0 with qemu_get_vnet_hdr_len() instead of having a dedicated
function, qemu_get_using_vnet_hdr().

Signed-off-by: Akihiko Odaki 
---
 include/net/net.h   |  7 ---
 hw/net/e1000e.c |  1 -
 hw/net/igb.c|  1 -
 hw/net/net_tx_pkt.c |  4 ++--
 hw/net/virtio-net.c |  3 ---
 hw/net/vmxnet3.c|  2 --
 net/dump.c  |  4 +---
 net/net.c   | 24 +---
 net/netmap.c|  5 -
 net/tap.c   | 28 +---
 10 files changed, 5 insertions(+), 74 deletions(-)

diff --git a/include/net/net.h b/include/net/net.h
index b1f9b35fcca1..6fe5a0aee833 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -57,8 +57,6 @@ typedef bool (HasUfo)(NetClientState *);
 typedef bool (HasUso)(NetClientState *);
 typedef bool (HasVnetHdr)(NetClientState *);
 typedef bool (HasVnetHdrLen)(NetClientState *, int);
-typedef bool (GetUsingVnetHdr)(NetClientState *);
-typedef void (UsingVnetHdr)(NetClientState *, bool);
 typedef void (SetOffload)(NetClientState *, int, int, int, int, int, int, int);
 typedef int (GetVnetHdrLen)(NetClientState *);
 typedef void (SetVnetHdrLen)(NetClientState *, int);
@@ -88,10 +86,7 @@ typedef struct NetClientInfo {
 HasUso *has_uso;
 HasVnetHdr *has_vnet_hdr;
 HasVnetHdrLen *has_vnet_hdr_len;
-GetUsingVnetHdr *get_using_vnet_hdr;
-UsingVnetHdr *using_vnet_hdr;
 SetOffload *set_offload;
-GetVnetHdrLen *get_vnet_hdr_len;
 SetVnetHdrLen *set_vnet_hdr_len;
 SetVnetLE *set_vnet_le;
 SetVnetBE *set_vnet_be;
@@ -194,8 +189,6 @@ bool qemu_has_ufo(NetClientState *nc);
 bool qemu_has_uso(NetClientState *nc);
 bool qemu_has_vnet_hdr(NetClientState *nc);
 bool qemu_has_vnet_hdr_len(NetClientState *nc, int len);
-bool qemu_get_using_vnet_hdr(NetClientState *nc);
-void qemu_using_vnet_hdr(NetClientState *nc, bool enable);
 void qemu_set_offload(NetClientState *nc, int csum, int tso4, int tso6,
   int ecn, int ufo, int uso4, int uso6);
 int qemu_get_vnet_hdr_len(NetClientState *nc);
diff --git a/hw/net/e1000e.c b/hw/net/e1000e.c
index 7c6f6029518c..d0dde767f6aa 100644
--- a/hw/net/e1000e.c
+++ b/hw/net/e1000e.c
@@ -352,7 +352,6 @@ e1000e_init_net_peer(E1000EState *s, PCIDevice *pci_dev, 
uint8_t *macaddr)
 for (i = 0; i < s->conf.peers.queues; i++) {
 nc = qemu_get_subqueue(s->nic, i);
 qemu_set_vnet_hdr_len(nc->peer, sizeof(struct virtio_net_hdr));
-qemu_using_vnet_hdr(nc->peer, true);
 }
 }
 
diff --git a/hw/net/igb.c b/hw/net/igb.c
index 9b37523d6df8..1224c7ba8e38 100644
--- a/hw/net/igb.c
+++ b/hw/net/igb.c
@@ -349,7 +349,6 @@ igb_init_net_peer(IGBState *s, PCIDevice *pci_dev, uint8_t 
*macaddr)
 for (i = 0; i < s->conf.peers.queues; i++) {
 nc = qemu_get_subqueue(s->nic, i);
 qemu_set_vnet_hdr_len(nc->peer, sizeof(struct virtio_net_hdr));
-qemu_using_vnet_hdr(nc->peer, true);
 }
 }
 
diff --git a/hw/net/net_tx_pkt.c b/hw/net/net_tx_pkt.c
index 2134a18c4c90..903238dca24d 100644
--- a/hw/net/net_tx_pkt.c
+++ b/hw/net/net_tx_pkt.c
@@ -578,7 +578,7 @@ static void net_tx_pkt_sendv(
 {
 NetClientState *nc = opaque;
 
-if (qemu_get_using_vnet_hdr(nc->peer)) {
+if (qemu_get_vnet_hdr_len(nc->peer)) {
 qemu_sendv_packet(nc, virt_iov, virt_iov_cnt);
 } else {
 qemu_sendv_packet(nc, iov, iov_cnt);
@@ -808,7 +808,7 @@ static bool net_tx_pkt_do_sw_fragmentation(struct NetTxPkt 
*pkt,
 
 bool net_tx_pkt_send(struct NetTxPkt *pkt, NetClientState *nc)
 {
-bool offload = qemu_get_using_vnet_hdr(nc->peer);
+bool offload = qemu_get_vnet_hdr_len(nc->peer);
 return net_tx_pkt_send_custom(pkt, offload, net_tx_pkt_sendv, nc);
 }
 
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 58014a92ad19..f6112c0ac97d 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -3794,9 +3794,6 @@ static void virtio_net_device_realize(DeviceState *dev, 
Error **errp)
 
 peer_test_vnet_hdr(n);
 if (peer_has_vnet_hdr(n)) {
-for (i = 0; i < n->max_queue_pairs; i++) {
-qemu_using_vnet_hdr(qemu_get_subqueue(n->nic, i)->peer, true);
-}
 n->host_hdr_len = sizeof(struct virtio_net_hdr);
 } else {
 n->host_hdr_len = 0;
diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
index 707487c63666..63a91877730d 100644
--- a/hw/net/vmxnet3.c
+++ b/hw/net/vmxnet3.c
@@ -2091,8 +2091,6 @@ static void vmxnet3_net_init(VMXNET3State *s)
 if (s->peer_has_vhdr) {
 qemu_set_vnet_hdr_len(qemu_get_queue(s->nic)->peer,
 sizeof(struct virtio_net_hdr));
-
-qemu_using_vnet_hdr(qemu_get_queue(s->nic)->peer, 1);
 }
 
 qemu_format_nic_info_str(qemu_get_queue(s->nic

[PATCH v9 05/20] tap: Call tap_receive_iov() from tap_receive()

2024-04-03 Thread Akihiko Odaki
This saves duplicate logic found in both tap_receive_iov() and
tap_receive().

Suggested-by: "Zhang, Chen" 
Signed-off-by: Akihiko Odaki 
---
 net/tap.c | 35 +--
 1 file changed, 5 insertions(+), 30 deletions(-)

diff --git a/net/tap.c b/net/tap.c
index 99c59ee46881..9825518ff1f3 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -133,39 +133,14 @@ static ssize_t tap_receive_iov(NetClientState *nc, const 
struct iovec *iov,
 return tap_write_packet(s, iovp, iovcnt);
 }
 
-static ssize_t tap_receive_raw(NetClientState *nc, const uint8_t *buf, size_t 
size)
-{
-TAPState *s = DO_UPCAST(TAPState, nc, nc);
-struct iovec iov[2];
-int iovcnt = 0;
-struct virtio_net_hdr_mrg_rxbuf hdr = { };
-
-if (s->host_vnet_hdr_len) {
-iov[iovcnt].iov_base = &hdr;
-iov[iovcnt].iov_len  = s->host_vnet_hdr_len;
-iovcnt++;
-}
-
-iov[iovcnt].iov_base = (char *)buf;
-iov[iovcnt].iov_len  = size;
-iovcnt++;
-
-return tap_write_packet(s, iov, iovcnt);
-}
-
 static ssize_t tap_receive(NetClientState *nc, const uint8_t *buf, size_t size)
 {
-TAPState *s = DO_UPCAST(TAPState, nc, nc);
-struct iovec iov[1];
-
-if (s->host_vnet_hdr_len && !s->using_vnet_hdr) {
-return tap_receive_raw(nc, buf, size);
-}
-
-iov[0].iov_base = (char *)buf;
-iov[0].iov_len  = size;
+struct iovec iov = {
+.iov_base = (void *)buf,
+.iov_len = size
+};
 
-return tap_write_packet(s, iov, 1);
+return tap_receive_iov(nc, &iov, 1);
 }
 
 #ifndef __sun__

-- 
2.44.0




[PATCH v9 17/20] ebpf: Fix RSS error handling

2024-04-03 Thread Akihiko Odaki
calculate_rss_hash() was using hash value 0 to tell whether it had
calculated a hash, but 0 is also a valid hash value on rare occasions.
Use a distinct bool flag for correctness.

Fixes: f3fa412de2 ("ebpf: Added eBPF RSS program.")
Signed-off-by: Akihiko Odaki 
---
 ebpf/rss.bpf.skeleton.h | 1210 +++
 tools/ebpf/rss.bpf.c|   20 +-
 2 files changed, 610 insertions(+), 620 deletions(-)

diff --git a/ebpf/rss.bpf.skeleton.h b/ebpf/rss.bpf.skeleton.h
index aed4ef9a0335..e41ed8890191 100644
--- a/ebpf/rss.bpf.skeleton.h
+++ b/ebpf/rss.bpf.skeleton.h
@@ -165,7 +165,7 @@ rss_bpf__create_skeleton(struct rss_bpf *obj)
s->progs[0].prog = &obj->progs.tun_rss_steering_prog;
s->progs[0].link = &obj->links.tun_rss_steering_prog;
 
-   s->data = (void *)rss_bpf__elf_bytes(&s->data_sz);
+   s->data = rss_bpf__elf_bytes(&s->data_sz);
 
obj->skeleton = s;
return 0;
@@ -176,194 +176,188 @@ err:
 
 static inline const void *rss_bpf__elf_bytes(size_t *sz)
 {
-   *sz = 20600;
-   return (const void *)"\
+   static const char data[] __attribute__((__aligned__(8))) = "\
 \x7f\x45\x4c\x46\x02\x01\x01\0\0\0\0\0\0\0\0\0\x01\0\xf7\0\x01\0\0\0\0\0\0\0\0\
-\0\0\0\0\0\0\0\0\0\0\0\x38\x4d\0\0\0\0\0\0\0\0\0\0\x40\0\0\0\0\0\x40\0\x0d\0\
-\x01\0\xbf\x19\0\0\0\0\0\0\xb7\x01\0\0\0\0\0\0\x63\x1a\x4c\xff\0\0\0\0\xbf\xa7\
-\0\0\0\0\0\0\x07\x07\0\0\x4c\xff\xff\xff\x18\x01\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
+\0\0\0\0\0\0\0\0\0\0\0\xb8\x4b\0\0\0\0\0\0\0\0\0\0\x40\0\0\0\0\0\x40\0\x0d\0\
+\x01\0\xbf\x19\0\0\0\0\0\0\xb7\x01\0\0\0\0\0\0\x63\x1a\x54\xff\0\0\0\0\xbf\xa7\
+\0\0\0\0\0\0\x07\x07\0\0\x54\xff\xff\xff\x18\x01\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
 \xbf\x72\0\0\0\0\0\0\x85\0\0\0\x01\0\0\0\xbf\x06\0\0\0\0\0\0\x18\x01\0\0\0\0\0\
 \0\0\0\0\0\0\0\0\0\xbf\x72\0\0\0\0\0\0\x85\0\0\0\x01\0\0\0\xbf\x07\0\0\0\0\0\0\
-\x18\0\0\0\xff\xff\xff\xff\0\0\0\0\0\0\0\0\x15\x06\x61\x02\0\0\0\0\xbf\x78\0\0\
-\0\0\0\0\x15\x08\x5f\x02\0\0\0\0\x71\x61\0\0\0\0\0\0\x55\x01\x01\0\0\0\0\0\x05\
-\0\x58\x02\0\0\0\0\xb7\x01\0\0\0\0\0\0\x63\x1a\xc0\xff\0\0\0\0\x7b\x1a\xb8\xff\
-\0\0\0\0\x7b\x1a\xb0\xff\0\0\0\0\x7b\x1a\xa8\xff\0\0\0\0\x7b\x1a\xa0\xff\0\0\0\
-\0\x63\x1a\x98\xff\0\0\0\0\x7b\x1a\x90\xff\0\0\0\0\x7b\x1a\x88\xff\0\0\0\0\x7b\
-\x1a\x80\xff\0\0\0\0\x7b\x1a\x78\xff\0\0\0\0\x7b\x1a\x70\xff\0\0\0\0\x7b\x1a\
-\x68\xff\0\0\0\0\x7b\x1a\x60\xff\0\0\0\0\x7b\x1a\x58\xff\0\0\0\0\x7b\x1a\x50\
-\xff\0\0\0\0\x15\x09\x47\x02\0\0\0\0\x6b\x1a\xc8\xff\0\0\0\0\xbf\xa3\0\0\0\0\0\
-\0\x07\x03\0\0\xc8\xff\xff\xff\xbf\x91\0\0\0\0\0\0\xb7\x02\0\0\x0c\0\0\0\xb7\
+\x18\0\0\0\xff\xff\xff\xff\0\0\0\0\0\0\0\0\x15\x06\x4f\x02\0\0\0\0\xbf\x78\0\0\
+\0\0\0\0\x15\x08\x4d\x02\0\0\0\0\x71\x61\0\0\0\0\0\0\x55\x01\x01\0\0\0\0\0\x05\
+\0\x46\x02\0\0\0\0\xb7\x01\0\0\0\0\0\0\x63\x1a\xc8\xff\0\0\0\0\x7b\x1a\xc0\xff\
+\0\0\0\0\x7b\x1a\xb8\xff\0\0\0\0\x7b\x1a\xb0\xff\0\0\0\0\x7b\x1a\xa8\xff\0\0\0\
+\0\x63\x1a\xa0\xff\0\0\0\0\x7b\x1a\x98\xff\0\0\0\0\x7b\x1a\x90\xff\0\0\0\0\x7b\
+\x1a\x88\xff\0\0\0\0\x7b\x1a\x80\xff\0\0\0\0\x7b\x1a\x78\xff\0\0\0\0\x7b\x1a\
+\x70\xff\0\0\0\0\x7b\x1a\x68\xff\0\0\0\0\x7b\x1a\x60\xff\0\0\0\0\x7b\x1a\x58\
+\xff\0\0\0\0\x15\x09\x35\x02\0\0\0\0\x6b\x1a\xd0\xff\0\0\0\0\xbf\xa3\0\0\0\0\0\
+\0\x07\x03\0\0\xd0\xff\xff\xff\xbf\x91\0\0\0\0\0\0\xb7\x02\0\0\x0c\0\0\0\xb7\
 \x04\0\0\x02\0\0\0\xb7\x05\0\0\0\0\0\0\x85\0\0\0\x44\0\0\0\x67\0\0\0\x20\0\0\0\
-\x77\0\0\0\x20\0\0\0\x55\0\x3c\x02\0\0\0\0\xb7\x02\0\0\x10\0\0\0\x69\xa1\xc8\
+\x77\0\0\0\x20\0\0\0\x55\0\x2a\x02\0\0\0\0\xb7\x02\0\0\x10\0\0\0\x69\xa1\xd0\
 \xff\0\0\0\0\xbf\x13\0\0\0\0\0\0\xdc\x03\0\0\x10\0\0\0\x15\x03\x02\0\0\x81\0\0\
 \x55\x03\x0b\0\xa8\x88\0\0\xb7\x02\0\0\x14\0\0\0\xbf\xa3\0\0\0\0\0\0\x07\x03\0\
-\0\xc8\xff\xff\xff\xbf\x91\0\0\0\0\0\0\xb7\x04\0\0\x02\0\0\0\xb7\x05\0\0\0\0\0\
-\0\x85\0\0\0\x44\0\0\0\x67\0\0\0\x20\0\0\0\x77\0\0\0\x20\0\0\0\x55\0\x2c\x02\0\
-\0\0\0\x69\xa1\xc8\xff\0\0\0\0\x15\x01\x2a\x02\0\0\0\0\x7b\x9a\x38\xff\0\0\0\0\
-\x15\x01\x56\0\x86\xdd\0\0\x55\x01\x3b\0\x08\0\0\0\xb7\x01\0\0\x01\0\0\0\x73\
-\x1a\x50\xff\0\0\0\0\xb7\x01\0\0\0\0\0\0\x63\x1a\xd8\xff\0\0\0\0\x7b\x1a\xd0\
-\xff\0\0\0\0\x7b\x1a\xc8\xff\0\0\0\0\xbf\xa3\0\0\0\0\0\0\x07\x03\0\0\xc8\xff\
-\xff\xff\x79\xa1\x38\xff\0\0\0\0\xb7\x02\0\0\0\0\0\0\xb7\x04\0\0\x14\0\0\0\xb7\
+\0\xd0\xff\xff\xff\xbf\x91\0\0\0\0\0\0\xb7\x04\0\0\x02\0\0\0\xb7\x05\0\0\0\0\0\
+\0\x85\0\0\0\x44\0\0\0\x67\0\0\0\x20\0\0\0\x77\0\0\0\x20\0\0\0\x55\0\x1a\x02\0\
+\0\0\0\x69\xa1\xd0\xff\0\0\0\0\x15\x01\x18\x02\0\0\0\0\x15\x01\x21\0\x86\xdd\0\
+\0\x7b\x9a\x48\xff\0\0\0\0\x55\x01\xf6\0\x08\0\0\0\xb7\x01\0\0\x01\0\0\0\x73\
+\x1a\x58\xff\0\0\0\0\xb7\x01\0\0\0\0\0\0\x63\x1a\xe0\xff\0\0\0\0\x7b\x1a\xd8\
+\xff\0\0\0\0\x7b\x1a\xd0\xff\0\0\0\0\xbf\xa3\0\0\0\0\0\0\x07\x03\0\0\xd0\xff\
+\xff\xff\x79\xa1\x48\xff\0\0\0\0\xb7\x02\0\0\0\0\0\0\xb7\x04\0\0\x14\0\0\0\xb7\
 \x05\0\0\x01\0\0\0\x85\0\0\0\x44\0\0\0\x67\0\0\0\x20\0\0\0\x77\0\0\0\x20\0\0\0\
-\x55\0\x17\x02\0\0\0\0\x69\xa1\xce\xff\0\0

[PATCH v9 14/20] virtio-net: Report RSS warning at device realization

2024-04-03 Thread Akihiko Odaki
Warning about RSS fallback at device realization allows the user to
notice the configuration problem early.

Signed-off-by: Akihiko Odaki 
---
 hw/net/virtio-net.c | 23 ---
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 3d53eba88cfc..86929c9e1fad 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -1261,10 +1261,12 @@ static bool virtio_net_attach_epbf_rss(VirtIONet *n)
 
if (!ebpf_rss_set_all(&n->ebpf_rss, &config,
   n->rss_data.indirections_table, n->rss_data.key)) {
+error_report("Failed to configure eBPF RSS");
 return false;
 }
 
 if (!virtio_net_attach_ebpf_to_backend(n->nic, n->ebpf_rss.program_fd)) {
+error_report("Failed to attach eBPF to backend");
 return false;
 }
 
@@ -1279,16 +1281,10 @@ static void virtio_net_detach_epbf_rss(VirtIONet *n)
 static void virtio_net_commit_rss_config(VirtIONet *n)
 {
 if (n->rss_data.enabled) {
-n->rss_data.enabled_software_rss = n->rss_data.populate_hash;
+n->rss_data.enabled_software_rss = n->rss_data.populate_hash ||
+   !virtio_net_attach_epbf_rss(n);
 if (n->rss_data.populate_hash) {
 virtio_net_detach_epbf_rss(n);
-} else if (!virtio_net_attach_epbf_rss(n)) {
-if (get_vhost_net(qemu_get_queue(n->nic)->peer)) {
-warn_report("Can't load eBPF RSS for vhost");
-} else {
-warn_report("Can't load eBPF RSS - fallback to software RSS");
-n->rss_data.enabled_software_rss = true;
-}
 }
 
 trace_virtio_net_rss_enable(n->rss_data.hash_types,
@@ -3802,9 +3798,14 @@ static void virtio_net_device_realize(DeviceState *dev, 
Error **errp)
net_rx_pkt_init(&n->rx_pkt);
 
 if (virtio_has_feature(n->host_features, VIRTIO_NET_F_RSS) &&
-!virtio_net_load_ebpf(n) && get_vhost_net(nc->peer)) {
-virtio_net_device_unrealize(dev);
-error_setg(errp, "Can't load eBPF RSS for vhost");
+!virtio_net_load_ebpf(n)) {
+if (get_vhost_net(nc->peer)) {
+error_setg(errp, "Can't load eBPF RSS for vhost");
+virtio_net_device_unrealize(dev);
+return;
+}
+
+warn_report("Can't load eBPF RSS - fallback to software RSS");
 }
 }
 

-- 
2.44.0




[PATCH v9 07/20] virtio-net: Do not propagate ebpf-rss-fds errors

2024-04-03 Thread Akihiko Odaki
Propagating ebpf-rss-fds errors has several problems.

First, it makes device realization fail and disables the fallback to the
conventional eBPF loading.

Second, it leaks memory by making device realization fail without
freeing memory already allocated.

Third, the convention is to set an error when a function returns false,
but virtio_net_load_ebpf_fds() and virtio_net_load_ebpf() return false
without setting an error, which is confusing.

Remove the propagation to fix these problems.

Fixes: 0524ea0510a3 ("ebpf: Added eBPF initialization by fds.")
Signed-off-by: Akihiko Odaki 
---
 hw/net/virtio-net.c | 23 ++-
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index f6112c0ac97d..8ede38aadbbe 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -1329,24 +1329,22 @@ static void virtio_net_detach_epbf_rss(VirtIONet *n)
 virtio_net_attach_ebpf_to_backend(n->nic, -1);
 }
 
-static bool virtio_net_load_ebpf_fds(VirtIONet *n, Error **errp)
+static bool virtio_net_load_ebpf_fds(VirtIONet *n)
 {
 int fds[EBPF_RSS_MAX_FDS] = { [0 ... EBPF_RSS_MAX_FDS - 1] = -1};
 int ret = true;
 int i = 0;
 
-ERRP_GUARD();
-
 if (n->nr_ebpf_rss_fds != EBPF_RSS_MAX_FDS) {
-error_setg(errp,
-  "Expected %d file descriptors but got %d",
-  EBPF_RSS_MAX_FDS, n->nr_ebpf_rss_fds);
+warn_report("Expected %d file descriptors but got %d",
+EBPF_RSS_MAX_FDS, n->nr_ebpf_rss_fds);
return false;
}
 
 for (i = 0; i < n->nr_ebpf_rss_fds; i++) {
-fds[i] = monitor_fd_param(monitor_cur(), n->ebpf_rss_fds[i], errp);
-if (*errp) {
+fds[i] = monitor_fd_param(monitor_cur(), n->ebpf_rss_fds[i],
+  &error_warn);
+if (fds[i] < 0) {
 ret = false;
 goto exit;
 }
@@ -1355,7 +1353,7 @@ static bool virtio_net_load_ebpf_fds(VirtIONet *n, Error 
**errp)
ret = ebpf_rss_load_fds(&n->ebpf_rss, fds[0], fds[1], fds[2], fds[3]);
 
 exit:
-if (!ret || *errp) {
+if (!ret) {
 for (i = 0; i < n->nr_ebpf_rss_fds && fds[i] != -1; i++) {
 close(fds[i]);
 }
@@ -1364,13 +1362,12 @@ exit:
 return ret;
 }
 
-static bool virtio_net_load_ebpf(VirtIONet *n, Error **errp)
+static bool virtio_net_load_ebpf(VirtIONet *n)
 {
 bool ret = false;
 
 if (virtio_net_attach_ebpf_to_backend(n->nic, -1)) {
-if (!(n->ebpf_rss_fds
-&& virtio_net_load_ebpf_fds(n, errp))) {
+if (!(n->ebpf_rss_fds && virtio_net_load_ebpf_fds(n))) {
ret = ebpf_rss_load(&n->ebpf_rss);
 }
 }
@@ -3825,7 +3822,7 @@ static void virtio_net_device_realize(DeviceState *dev, 
Error **errp)
net_rx_pkt_init(&n->rx_pkt);
 
 if (virtio_has_feature(n->host_features, VIRTIO_NET_F_RSS)) {
-virtio_net_load_ebpf(n, errp);
+virtio_net_load_ebpf(n);
 }
 }
 

-- 
2.44.0




[PATCH v9 16/20] virtio-net: Do not write hashes to peer buffer

2024-04-03 Thread Akihiko Odaki
The peer buffer is qualified with const and not meant to be modified.
Writing hashes to it also prevents enabling VIRTIO_NET_F_HASH_REPORT
for peers without virtio-net header support.

Signed-off-by: Akihiko Odaki 
---
 hw/net/virtio-net.c | 36 +---
 1 file changed, 17 insertions(+), 19 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 2de073ce18fd..ff1884564d0d 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -1823,16 +1823,9 @@ static uint8_t virtio_net_get_hash_type(bool hasip4,
 return 0xff;
 }
 
-static void virtio_set_packet_hash(const uint8_t *buf, uint8_t report,
-   uint32_t hash)
-{
-struct virtio_net_hdr_v1_hash *hdr = (void *)buf;
-hdr->hash_value = hash;
-hdr->hash_report = report;
-}
-
 static int virtio_net_process_rss(NetClientState *nc, const uint8_t *buf,
-  size_t size)
+  size_t size,
+  struct virtio_net_hdr_v1_hash *hdr)
 {
 VirtIONet *n = qemu_get_nic_opaque(nc);
 unsigned int index = nc->queue_index, new_index = index;
@@ -1863,7 +1856,8 @@ static int virtio_net_process_rss(NetClientState *nc, 
const uint8_t *buf,
  n->rss_data.hash_types);
 if (net_hash_type > NetPktRssIpV6UdpEx) {
 if (n->rss_data.populate_hash) {
-virtio_set_packet_hash(buf, VIRTIO_NET_HASH_REPORT_NONE, 0);
+hdr->hash_value = VIRTIO_NET_HASH_REPORT_NONE;
+hdr->hash_report = 0;
 }
 return n->rss_data.redirect ? n->rss_data.default_queue : -1;
 }
@@ -1871,7 +1865,8 @@ static int virtio_net_process_rss(NetClientState *nc, 
const uint8_t *buf,
 hash = net_rx_pkt_calc_rss_hash(pkt, net_hash_type, n->rss_data.key);
 
 if (n->rss_data.populate_hash) {
-virtio_set_packet_hash(buf, reports[net_hash_type], hash);
+hdr->hash_value = hash;
+hdr->hash_report = reports[net_hash_type];
 }
 
 if (n->rss_data.redirect) {
@@ -1891,7 +1886,7 @@ static ssize_t virtio_net_receive_rcu(NetClientState *nc, 
const uint8_t *buf,
 VirtQueueElement *elems[VIRTQUEUE_MAX_SIZE];
 size_t lens[VIRTQUEUE_MAX_SIZE];
 struct iovec mhdr_sg[VIRTQUEUE_MAX_SIZE];
-struct virtio_net_hdr_mrg_rxbuf mhdr;
+struct virtio_net_hdr_v1_hash extra_hdr;
 unsigned mhdr_cnt = 0;
 size_t offset, i, guest_offset, j;
 ssize_t err;
@@ -1901,7 +1896,7 @@ static ssize_t virtio_net_receive_rcu(NetClientState *nc, 
const uint8_t *buf,
 }
 
 if (!no_rss && n->rss_data.enabled && n->rss_data.enabled_software_rss) {
-int index = virtio_net_process_rss(nc, buf, size);
+int index = virtio_net_process_rss(nc, buf, size, &extra_hdr);
 if (index >= 0) {
 NetClientState *nc2 = qemu_get_subqueue(n->nic, index);
 return virtio_net_receive_rcu(nc2, buf, size, true);
@@ -1961,15 +1956,17 @@ static ssize_t virtio_net_receive_rcu(NetClientState 
*nc, const uint8_t *buf,
 if (n->mergeable_rx_bufs) {
 mhdr_cnt = iov_copy(mhdr_sg, ARRAY_SIZE(mhdr_sg),
 sg, elem->in_num,
-offsetof(typeof(mhdr), num_buffers),
-sizeof(mhdr.num_buffers));
+offsetof(typeof(extra_hdr), hdr.num_buffers),
+sizeof(extra_hdr.hdr.num_buffers));
 }
 
 receive_header(n, sg, elem->in_num, buf, size);
 if (n->rss_data.populate_hash) {
-offset = sizeof(mhdr);
+offset = offsetof(typeof(extra_hdr), hash_value);
 iov_from_buf(sg, elem->in_num, offset,
- buf + offset, n->host_hdr_len - sizeof(mhdr));
+ (char *)&extra_hdr + offset,
+ sizeof(extra_hdr.hash_value) +
+ sizeof(extra_hdr.hash_report));
 }
 offset = n->host_hdr_len;
 total += n->guest_hdr_len;
@@ -2015,10 +2012,11 @@ static ssize_t virtio_net_receive_rcu(NetClientState 
*nc, const uint8_t *buf,
 }
 
 if (mhdr_cnt) {
-virtio_stw_p(vdev, &mhdr.num_buffers, i);
+virtio_stw_p(vdev, &extra_hdr.hdr.num_buffers, i);
 iov_from_buf(mhdr_sg, mhdr_cnt,
  0,
- &mhdr.num_buffers, sizeof mhdr.num_buffers);
+ &extra_hdr.hdr.num_buffers,
+ sizeof extra_hdr.hdr.num_buffers);
 }
 
 for (j = 0; j < i; j++) {

-- 
2.44.0




[PATCH v9 06/20] tap: Shrink zeroed virtio-net header

2024-04-03 Thread Akihiko Odaki
tap prepends a zeroed virtio-net header when writing a packet to a
tap with virtio-net header enabled but not in use. This only happens
when s->host_vnet_hdr_len == sizeof(struct virtio_net_hdr).

Signed-off-by: Akihiko Odaki 
---
 net/tap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/tap.c b/net/tap.c
index 9825518ff1f3..51f7aec39d9e 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -119,7 +119,7 @@ static ssize_t tap_receive_iov(NetClientState *nc, const 
struct iovec *iov,
 TAPState *s = DO_UPCAST(TAPState, nc, nc);
 const struct iovec *iovp = iov;
 g_autofree struct iovec *iov_copy = NULL;
-struct virtio_net_hdr_mrg_rxbuf hdr = { };
+struct virtio_net_hdr hdr = { };
 
 if (s->host_vnet_hdr_len && !s->using_vnet_hdr) {
 iov_copy = g_new(struct iovec, iovcnt + 1);

-- 
2.44.0




[PATCH v9 18/20] ebpf: Return 0 when configuration fails

2024-04-03 Thread Akihiko Odaki
The kernel interprets the returned value as an unsigned 32-bit integer,
so returning -1 means queue 4294967295, which is awkward. Return 0
instead.

Signed-off-by: Akihiko Odaki 
---
 ebpf/rss.bpf.skeleton.h | 1532 +++
 tools/ebpf/rss.bpf.c|2 +-
 2 files changed, 767 insertions(+), 767 deletions(-)

diff --git a/ebpf/rss.bpf.skeleton.h b/ebpf/rss.bpf.skeleton.h
index e41ed8890191..647212e5dd0c 100644
--- a/ebpf/rss.bpf.skeleton.h
+++ b/ebpf/rss.bpf.skeleton.h
@@ -178,786 +178,786 @@ static inline const void *rss_bpf__elf_bytes(size_t *sz)
 {
static const char data[] __attribute__((__aligned__(8))) = "\
 \x7f\x45\x4c\x46\x02\x01\x01\0\0\0\0\0\0\0\0\0\x01\0\xf7\0\x01\0\0\0\0\0\0\0\0\
-\0\0\0\0\0\0\0\0\0\0\0\xb8\x4b\0\0\0\0\0\0\0\0\0\0\x40\0\0\0\0\0\x40\0\x0d\0\
-\x01\0\xbf\x19\0\0\0\0\0\0\xb7\x01\0\0\0\0\0\0\x63\x1a\x54\xff\0\0\0\0\xbf\xa7\
-\0\0\0\0\0\0\x07\x07\0\0\x54\xff\xff\xff\x18\x01\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
-\xbf\x72\0\0\0\0\0\0\x85\0\0\0\x01\0\0\0\xbf\x06\0\0\0\0\0\0\x18\x01\0\0\0\0\0\
-\0\0\0\0\0\0\0\0\0\xbf\x72\0\0\0\0\0\0\x85\0\0\0\x01\0\0\0\xbf\x07\0\0\0\0\0\0\
-\x18\0\0\0\xff\xff\xff\xff\0\0\0\0\0\0\0\0\x15\x06\x4f\x02\0\0\0\0\xbf\x78\0\0\
-\0\0\0\0\x15\x08\x4d\x02\0\0\0\0\x71\x61\0\0\0\0\0\0\x55\x01\x01\0\0\0\0\0\x05\
-\0\x46\x02\0\0\0\0\xb7\x01\0\0\0\0\0\0\x63\x1a\xc8\xff\0\0\0\0\x7b\x1a\xc0\xff\
-\0\0\0\0\x7b\x1a\xb8\xff\0\0\0\0\x7b\x1a\xb0\xff\0\0\0\0\x7b\x1a\xa8\xff\0\0\0\
-\0\x63\x1a\xa0\xff\0\0\0\0\x7b\x1a\x98\xff\0\0\0\0\x7b\x1a\x90\xff\0\0\0\0\x7b\
-\x1a\x88\xff\0\0\0\0\x7b\x1a\x80\xff\0\0\0\0\x7b\x1a\x78\xff\0\0\0\0\x7b\x1a\
-\x70\xff\0\0\0\0\x7b\x1a\x68\xff\0\0\0\0\x7b\x1a\x60\xff\0\0\0\0\x7b\x1a\x58\
-\xff\0\0\0\0\x15\x09\x35\x02\0\0\0\0\x6b\x1a\xd0\xff\0\0\0\0\xbf\xa3\0\0\0\0\0\
-\0\x07\x03\0\0\xd0\xff\xff\xff\xbf\x91\0\0\0\0\0\0\xb7\x02\0\0\x0c\0\0\0\xb7\
-\x04\0\0\x02\0\0\0\xb7\x05\0\0\0\0\0\0\x85\0\0\0\x44\0\0\0\x67\0\0\0\x20\0\0\0\
-\x77\0\0\0\x20\0\0\0\x55\0\x2a\x02\0\0\0\0\xb7\x02\0\0\x10\0\0\0\x69\xa1\xd0\
-\xff\0\0\0\0\xbf\x13\0\0\0\0\0\0\xdc\x03\0\0\x10\0\0\0\x15\x03\x02\0\0\x81\0\0\
-\x55\x03\x0b\0\xa8\x88\0\0\xb7\x02\0\0\x14\0\0\0\xbf\xa3\0\0\0\0\0\0\x07\x03\0\
-\0\xd0\xff\xff\xff\xbf\x91\0\0\0\0\0\0\xb7\x04\0\0\x02\0\0\0\xb7\x05\0\0\0\0\0\
-\0\x85\0\0\0\x44\0\0\0\x67\0\0\0\x20\0\0\0\x77\0\0\0\x20\0\0\0\x55\0\x1a\x02\0\
-\0\0\0\x69\xa1\xd0\xff\0\0\0\0\x15\x01\x18\x02\0\0\0\0\x15\x01\x21\0\x86\xdd\0\
-\0\x7b\x9a\x48\xff\0\0\0\0\x55\x01\xf6\0\x08\0\0\0\xb7\x01\0\0\x01\0\0\0\x73\
-\x1a\x58\xff\0\0\0\0\xb7\x01\0\0\0\0\0\0\x63\x1a\xe0\xff\0\0\0\0\x7b\x1a\xd8\
-\xff\0\0\0\0\x7b\x1a\xd0\xff\0\0\0\0\xbf\xa3\0\0\0\0\0\0\x07\x03\0\0\xd0\xff\
-\xff\xff\x79\xa1\x48\xff\0\0\0\0\xb7\x02\0\0\0\0\0\0\xb7\x04\0\0\x14\0\0\0\xb7\
-\x05\0\0\x01\0\0\0\x85\0\0\0\x44\0\0\0\x67\0\0\0\x20\0\0\0\x77\0\0\0\x20\0\0\0\
-\x55\0\x05\x02\0\0\0\0\x69\xa1\xd6\xff\0\0\0\0\x57\x01\0\0\x3f\xff\0\0\xb7\x04\
-\0\0\x01\0\0\0\x55\x01\x01\0\0\0\0\0\xb7\x04\0\0\0\0\0\0\x61\xa1\xdc\xff\0\0\0\
-\0\x63\x1a\x64\xff\0\0\0\0\x61\xa1\xe0\xff\0\0\0\0\x63\x1a\x68\xff\0\0\0\0\x71\
-\xa9\xd9\xff\0\0\0\0\x71\xa2\xd0\xff\0\0\0\0\x67\x02\0\0\x02\0\0\0\x57\x02\0\0\
-\x3c\0\0\0\x73\x4a\x5e\xff\0\0\0\0\x05\0\xbc\0\0\0\0\0\xb7\x01\0\0\x01\0\0\0\
-\x73\x1a\x59\xff\0\0\0\0\xb7\x01\0\0\0\0\0\0\x7b\x1a\xf0\xff\0\0\0\0\x7b\x1a\
-\xe8\xff\0\0\0\0\x7b\x1a\xe0\xff\0\0\0\0\x7b\x1a\xd8\xff\0\0\0\0\x7b\x1a\xd0\
-\xff\0\0\0\0\xbf\xa3\0\0\0\0\0\0\x07\x03\0\0\xd0\xff\xff\xff\xbf\x91\0\0\0\0\0\
-\0\xb7\x02\0\0\0\0\0\0\xb7\x04\0\0\x28\0\0\0\xb7\x05\0\0\x01\0\0\0\x85\0\0\0\
-\x44\0\0\0\x67\0\0\0\x20\0\0\0\x77\0\0\0\x20\0\0\0\x55\0\xe4\x01\0\0\0\0\xb7\
-\x03\0\0\x28\0\0\0\x7b\x9a\x48\xff\0\0\0\0\x79\xa1\xe0\xff\0\0\0\0\x63\x1a\x6c\
-\xff\0\0\0\0\x77\x01\0\0\x20\0\0\0\x63\x1a\x70\xff\0\0\0\0\x79\xa1\xd8\xff\0\0\
-\0\0\x63\x1a\x64\xff\0\0\0\0\x77\x01\0\0\x20\0\0\0\x63\x1a\x68\xff\0\0\0\0\x79\
-\xa1\xe8\xff\0\0\0\0\x63\x1a\x74\xff\0\0\0\0\x77\x01\0\0\x20\0\0\0\x63\x1a\x78\
-\xff\0\0\0\0\x79\xa1\xf0\xff\0\0\0\0\x63\x1a\x7c\xff\0\0\0\0\x77\x01\0\0\x20\0\
-\0\0\x63\x1a\x80\xff\0\0\0\0\x71\xa9\xd6\xff\0\0\0\0\x25\x09\x93\0\x3c\0\0\0\
-\xb7\x01\0\0\x01\0\0\0\x6f\x91\0\0\0\0\0\0\x18\x02\0\0\x01\0\0\0\0\0\0\0\0\x18\
-\0\x1c\x5f\x21\0\0\0\0\0\0\x55\x01\x01\0\0\0\0\0\x05\0\x8c\0\0\0\0\0\xb7\x01\0\
-\0\0\0\0\0\x6b\x1a\xfe\xff\0\0\0\0\xb7\x02\0\0\x28\0\0\0\xbf\xa1\0\0\0\0\0\0\
-\x07\x01\0\0\x94\xff\xff\xff\x7b\x1a\x20\xff\0\0\0\0\xbf\xa1\0\0\0\0\0\0\x07\
-\x01\0\0\x84\xff\xff\xff\x7b\x1a\x18\xff\0\0\0\0\xb7\x01\0\0\0\0\0\0\x7b\x1a\
-\x38\xff\0\0\0\0\x7b\x7a\x30\xff\0\0\0\0\x7b\x8a\x28\xff\0\0\0\0\xbf\xa3\0\0\0\
-\0\0\0\x07\x03\0\0\xfe\xff\xff\xff\x79\xa1\x48\xff\0\0\0\0\x7b\x2a\x40\xff\0\0\
-\0\0\xb7\x04\0\0\x02\0\0\0\xb7\x05\0\0\x01\0\0\0\x85\0\0\0\x44\0\0\0\x67\0\0\0\
-\x20\0\0\0\x77\0\0\0\x20\0\0\0\x55\0\xb2\x01\0\0\0\0\xbf\x91\0\0\0\0\0\0\x15\
-\x01\x22\0\x3c\0\0\0\x15\x01\x58\0\x2c\0\0\0\x79\xa2\x40\xff\0\0\0\0\x55\x01\
-\x59\0\x2b\0\0\0\xb7\x01\0\0\0\0\0\0\x63\x1a\xf8\xff\0\0\0\0\xbf\xa3\0\0\0\0\0\
-\

[PATCH v9 04/20] net: Remove receive_raw()

2024-04-03 Thread Akihiko Odaki
While netmap implements the virtio-net header, it does not implement
receive_raw(). Instead of implementing receive_raw() for netmap, add
the virtio-net header in the common code and use receive_iov()/receive()
instead. This also fixes the buffer size for the virtio-net header.

Fixes: fbbdbddec0 ("tap: allow extended virtio header with hash info")
Signed-off-by: Akihiko Odaki 
---
 include/net/net.h |  1 -
 net/net.c | 18 --
 net/tap.c |  1 -
 3 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/include/net/net.h b/include/net/net.h
index 6fe5a0aee833..c8f679761bf9 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -72,7 +72,6 @@ typedef struct NetClientInfo {
 NetClientDriver type;
 size_t size;
 NetReceive *receive;
-NetReceive *receive_raw;
 NetReceiveIOV *receive_iov;
 NetCanReceive *can_receive;
 NetStart *start;
diff --git a/net/net.c b/net/net.c
index db096765f4b2..6938da05e077 100644
--- a/net/net.c
+++ b/net/net.c
@@ -787,11 +787,7 @@ static ssize_t nc_sendv_compat(NetClientState *nc, const 
struct iovec *iov,
 offset = iov_to_buf(iov, iovcnt, 0, buf, offset);
 }
 
-if (flags & QEMU_NET_PACKET_FLAG_RAW && nc->info->receive_raw) {
-ret = nc->info->receive_raw(nc, buffer, offset);
-} else {
-ret = nc->info->receive(nc, buffer, offset);
-}
+ret = nc->info->receive(nc, buffer, offset);
 
 g_free(buf);
 return ret;
@@ -806,6 +802,8 @@ static ssize_t qemu_deliver_packet_iov(NetClientState 
*sender,
 MemReentrancyGuard *owned_reentrancy_guard;
 NetClientState *nc = opaque;
 int ret;
+struct virtio_net_hdr_v1_hash vnet_hdr = { };
+g_autofree struct iovec *iov_copy = NULL;
 
 
 if (nc->link_down) {
@@ -824,7 +822,15 @@ static ssize_t qemu_deliver_packet_iov(NetClientState 
*sender,
 owned_reentrancy_guard->engaged_in_io = true;
 }
 
-if (nc->info->receive_iov && !(flags & QEMU_NET_PACKET_FLAG_RAW)) {
+if ((flags & QEMU_NET_PACKET_FLAG_RAW) && nc->vnet_hdr_len) {
+iov_copy = g_new(struct iovec, iovcnt + 1);
+iov_copy[0].iov_base = &vnet_hdr;
+iov_copy[0].iov_len =  nc->vnet_hdr_len;
+memcpy(&iov_copy[1], iov, iovcnt * sizeof(*iov));
+iov = iov_copy;
+}
+
+if (nc->info->receive_iov) {
 ret = nc->info->receive_iov(nc, iov, iovcnt);
 } else {
 ret = nc_sendv_compat(nc, iov, iovcnt, flags);
diff --git a/net/tap.c b/net/tap.c
index 49edf6c2b6e1..99c59ee46881 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -360,7 +360,6 @@ static NetClientInfo net_tap_info = {
 .type = NET_CLIENT_DRIVER_TAP,
 .size = sizeof(TAPState),
 .receive = tap_receive,
-.receive_raw = tap_receive_raw,
 .receive_iov = tap_receive_iov,
 .poll = tap_poll,
 .cleanup = tap_cleanup,

-- 
2.44.0




[PATCH v9 01/20] tap: Remove tap_probe_vnet_hdr_len()

2024-04-03 Thread Akihiko Odaki
It was necessary since a Linux kernel older than 2.6.35 may implement
the virtio-net header but may not allow changing its length. Remove it
since such an old Linux is no longer supported.

Signed-off-by: Akihiko Odaki 
Acked-by: Michael S. Tsirkin 
---
 net/tap_int.h |  1 -
 net/tap-bsd.c |  5 -
 net/tap-linux.c   | 20 
 net/tap-solaris.c |  5 -
 net/tap-stub.c|  5 -
 net/tap.c |  8 ++--
 6 files changed, 2 insertions(+), 42 deletions(-)

diff --git a/net/tap_int.h b/net/tap_int.h
index 9a2175655bb0..8857ff299d22 100644
--- a/net/tap_int.h
+++ b/net/tap_int.h
@@ -35,7 +35,6 @@ ssize_t tap_read_packet(int tapfd, uint8_t *buf, int maxlen);
 
 void tap_set_sndbuf(int fd, const NetdevTapOptions *tap, Error **errp);
 int tap_probe_vnet_hdr(int fd, Error **errp);
-int tap_probe_vnet_hdr_len(int fd, int len);
 int tap_probe_has_ufo(int fd);
 int tap_probe_has_uso(int fd);
 void tap_fd_set_offload(int fd, int csum, int tso4, int tso6, int ecn, int ufo,
diff --git a/net/tap-bsd.c b/net/tap-bsd.c
index 274ea7bd2c3c..b4c84441ba8b 100644
--- a/net/tap-bsd.c
+++ b/net/tap-bsd.c
@@ -217,11 +217,6 @@ int tap_probe_has_uso(int fd)
 return 0;
 }
 
-int tap_probe_vnet_hdr_len(int fd, int len)
-{
-return 0;
-}
-
 void tap_fd_set_vnet_hdr_len(int fd, int len)
 {
 }
diff --git a/net/tap-linux.c b/net/tap-linux.c
index c7e514ecb04b..1226d5fda2d9 100644
--- a/net/tap-linux.c
+++ b/net/tap-linux.c
@@ -185,26 +185,6 @@ int tap_probe_has_uso(int fd)
 return 1;
 }
 
-/* Verify that we can assign given length */
-int tap_probe_vnet_hdr_len(int fd, int len)
-{
-int orig;
-if (ioctl(fd, TUNGETVNETHDRSZ, &orig) == -1) {
-return 0;
-}
-if (ioctl(fd, TUNSETVNETHDRSZ, &len) == -1) {
-return 0;
-}
-/* Restore original length: we can't handle failure. */
-if (ioctl(fd, TUNSETVNETHDRSZ, &orig) == -1) {
-fprintf(stderr, "TUNGETVNETHDRSZ ioctl() failed: %s. Exiting.\n",
-strerror(errno));
-abort();
-return -errno;
-}
-return 1;
-}
-
 void tap_fd_set_vnet_hdr_len(int fd, int len)
 {
 if (ioctl(fd, TUNSETVNETHDRSZ, ) == -1) {
diff --git a/net/tap-solaris.c b/net/tap-solaris.c
index 08b13af51257..51b7830bef1d 100644
--- a/net/tap-solaris.c
+++ b/net/tap-solaris.c
@@ -221,11 +221,6 @@ int tap_probe_has_uso(int fd)
 return 0;
 }
 
-int tap_probe_vnet_hdr_len(int fd, int len)
-{
-return 0;
-}
-
 void tap_fd_set_vnet_hdr_len(int fd, int len)
 {
 }
diff --git a/net/tap-stub.c b/net/tap-stub.c
index 4b24f61e3a6c..38673434cbd6 100644
--- a/net/tap-stub.c
+++ b/net/tap-stub.c
@@ -52,11 +52,6 @@ int tap_probe_has_uso(int fd)
 return 0;
 }
 
-int tap_probe_vnet_hdr_len(int fd, int len)
-{
-return 0;
-}
-
 void tap_fd_set_vnet_hdr_len(int fd, int len)
 {
 }
diff --git a/net/tap.c b/net/tap.c
index baaa2f7a9ac7..72ae95894ff1 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -259,11 +259,7 @@ static bool tap_has_vnet_hdr(NetClientState *nc)
 
 static bool tap_has_vnet_hdr_len(NetClientState *nc, int len)
 {
-TAPState *s = DO_UPCAST(TAPState, nc, nc);
-
-assert(nc->info->type == NET_CLIENT_DRIVER_TAP);
-
-return !!tap_probe_vnet_hdr_len(s->fd, len);
+return tap_has_vnet_hdr(nc);
 }
 
 static int tap_get_vnet_hdr_len(NetClientState *nc)
@@ -432,7 +428,7 @@ static TAPState *net_tap_fd_init(NetClientState *peer,
  * Make sure host header length is set correctly in tap:
  * it might have been modified by another instance of qemu.
  */
-if (tap_probe_vnet_hdr_len(s->fd, s->host_vnet_hdr_len)) {
+if (vnet_hdr) {
 tap_fd_set_vnet_hdr_len(s->fd, s->host_vnet_hdr_len);
 }
 tap_read_poll(s, true);

-- 
2.44.0




[PATCH v9 08/20] virtio-net: Add only one queue pair when realizing

2024-04-03 Thread Akihiko Odaki
Multiqueue usage is not negotiated yet when realizing. If more than
one queue is added and the guest never requests to enable multiqueue,
the extra queues will not be deleted when unrealizing and will leak.

Fixes: f9d6dbf0bf6e ("virtio-net: remove virtio queues if the guest doesn't 
support multiqueue")
Signed-off-by: Akihiko Odaki 
---
 hw/net/virtio-net.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 8ede38aadbbe..e33bdbfd84a5 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -3759,9 +3759,7 @@ static void virtio_net_device_realize(DeviceState *dev, 
Error **errp)
 n->net_conf.tx_queue_size = MIN(virtio_net_max_tx_queue_size(n),
 n->net_conf.tx_queue_size);
 
-for (i = 0; i < n->max_queue_pairs; i++) {
-virtio_net_add_queue(n, i);
-}
+virtio_net_add_queue(n, 0);
 
 n->ctrl_vq = virtio_add_queue(vdev, 64, virtio_net_handle_ctrl);
qemu_macaddr_default_if_unset(&n->nic_conf.macaddr);

-- 
2.44.0




[PATCH v9 00/20] virtio-net RSS/hash report fixes and improvements

2024-04-03 Thread Akihiko Odaki
This series contains fixes and improvements for virtio-net RSS and hash
reporting feature.

V7 -> V8:
  Reset author email addresses.
  Rebased.

V6 -> V7:
  Dropped patch "virtio-net: Do not clear VIRTIO_NET_F_HASH_REPORT".
  Dropped the changes to remove packet flags.
  Re-introduced tap_receive() and changed it to call tap_receive_iov().
  Removed tap_get_vnet_hdr_len().
  Fixed tap initialization not to call tap_fd_set_vnet_hdr_len() for tap
  without virtio-net header.
  Changed to call error_report() instead of warn_report() for
  programming errors.

V5 -> V6:
  Corrected the message for patch "virtio-net: Return an error when vhost
  cannot enable RSS".
  Removed changes to introduce asserts from "virtio-net: Return an error
  when vhost cannot enable RSS".
  Reorganized patches "virtio-net: Return an error when vhost cannot
  enable RSS" and "virtio-net: Do not clear VIRTIO_NET_F_RSS". This
  version now contains patches "virtio-net: Return an error when vhost
  cannot enable RSS" and "virtio-net: Enable software RSS".
  Rebased.

V4 -> V5:
  Added patch "virtio-net: Do not write hashes to peer buffer".

V3 -> V4:
  Extract patches "tap: Remove tap_receive()" and  "net: Remove flag
  propagation" from "net: Remove receive_raw()".
  Added patch "virtio-net: Always set populate_hash".
  Added patch "virtio-net: Do not clear VIRTIO_NET_F_HASH_REPORT".
  Added patch "ebpf: Use standard section name".
  Added patch "ebpf: Simplify error handling".
  Added patch "ebpf: Return 0 when configuration fails".
  Added patch "ebpf: Refactor tun_rss_steering_prog()".
  Added patch "ebpf: Add a separate target for skeleton".

V2 -> V3:
  Added patch "tap: Remove tap_probe_vnet_hdr_len()".
  Added patch "tap: Remove qemu_using_vnet_hdr()".
  Added patch "net: Move virtio-net header length assertion".
  Added patch "net: Remove receive_raw()".
  Added patch "tap: Shrink zeroed virtio-net header".
  Dropped patch "tap: Fix virtio-net header buffer size".

V1 -> V2:
  Added patch "ebpf: Fix RSS error handling".

Signed-off-by: Akihiko Odaki 
---
Changes in v9:
- Added patch "virtio-net: Do not propagate ebpf-rss-fds errors".
- Added patch "virtio-net: Shrink header byte swapping buffer".
- Rebased.
- Link to v8: 
https://lore.kernel.org/r/20231210-rss-v8-0-9553ee714...@daynix.com

---
Akihiko Odaki (20):
  tap: Remove tap_probe_vnet_hdr_len()
  tap: Remove qemu_using_vnet_hdr()
  net: Move virtio-net header length assertion
  net: Remove receive_raw()
  tap: Call tap_receive_iov() from tap_receive()
  tap: Shrink zeroed virtio-net header
  virtio-net: Do not propagate ebpf-rss-fds errors
  virtio-net: Add only one queue pair when realizing
  virtio-net: Copy header only when necessary
  virtio-net: Shrink header byte swapping buffer
  virtio-net: Disable RSS on reset
  virtio-net: Unify the logic to update NIC state for RSS
  virtio-net: Return an error when vhost cannot enable RSS
  virtio-net: Report RSS warning at device realization
  virtio-net: Always set populate_hash
  virtio-net: Do not write hashes to peer buffer
  ebpf: Fix RSS error handling
  ebpf: Return 0 when configuration fails
  ebpf: Refactor tun_rss_steering_prog()
  ebpf: Add a separate target for skeleton

 ebpf/rss.bpf.skeleton.h  | 1558 +++---
 include/net/net.h|8 -
 net/tap_int.h|1 -
 hw/net/e1000e.c  |1 -
 hw/net/igb.c |1 -
 hw/net/net_tx_pkt.c  |4 +-
 hw/net/virtio-net.c  |  344 +-
 hw/net/vmxnet3.c |2 -
 net/dump.c   |4 +-
 net/net.c|   47 +-
 net/netmap.c |5 -
 net/tap-bsd.c|5 -
 net/tap-linux.c  |   20 -
 net/tap-solaris.c|5 -
 net/tap-stub.c   |    5 -
 net/tap.c|   77 +--
 tools/ebpf/rss.bpf.c |   44 +-
 tools/ebpf/Makefile.ebpf |   15 +-
 18 files changed, 988 insertions(+), 1158 deletions(-)
---
base-commit: e5c6528dce86d7a9ada7ecf02fcb7b8560955131
change-id: 20231210-rss-e7c98e722253

Best regards,
-- 
Akihiko Odaki 




Re: Qemu Display Coacoa Patch Serie Qemu 9.0 RC1

2024-03-28 Thread Akihiko Odaki

On 2024/03/29 6:44, BALATON Zoltan wrote:

On Thu, 28 Mar 2024, Rene Engel wrote:
I wanted to discuss this topic with you again; there was already a 
patch series that worked well under Qemu with
Pegasos2/AmigaOneXe/Same460 and AmigaOs4.1. The option zoom-to-fit=on 
should be used to adjust all resolutions provided by the guest
system to the aspect ratio when there are no Virtio GPU drivers 
available that allow this.


In my opinion it is exactly this option, zoom-to-fit=on, that makes this possible. 
If you don't want to use it, you still have the possibility
to deactivate it. In QEMU 9.0 RC1 not all resolutions are stretched 
as in previous patches (e.g. 640x480/800x600/1024x720), but that
is exactly what we need for the Pegasos2/AmigaOneXe/Same460 machine 
with AmigaOs4.1.


There seems to be a bit of confusion about how this zoom-to-fit option 
is implemented by different -display backends and I'm not sure what is 
the intended behaviour or how other -display backends handle it. Maybe a 
single option is not even enough to describe all possible preferences so 
another one i.e. keep-aspect=true|false may also be needed to cover all 
possible settings (don't zoom, zoom with aspect ratio kept, zoom to fit 
window even if that stretches the picture out of aspect ratio). For 9.0 
probably we should go for consistency with other backends now, as adding 
new options is not possible during freeze and then resolve this afterwards.


It is indeed something that requires a new flag like zoom-interpolation.



There are also problems within the resolutions with the mouse pointer 
where the screen output flickers it currently affects all patch 
series. I would be happy if we could find a solution for all this.


The flicker may be due to the resize algorithm used by macOS not always 
giving the same result. To resolve it, yet another option may be 
needed to not zoom to the full available window but instead keep the zoom 
factor at some integer value to avoid fractional scaling, but I'm not sure 
that's the best way to solve it.


Flickering sounds more like a bug. I would appreciate it if you could 
share a recording.


Regards,
Akihiko Odaki



Regards,
BALATON Zoltan

I'll leave you two videos so you can decide for yourself what would make 
the most sense. They show the zoom behavior that works well for me,
and the behavior with QEMU 9.0 RC1 including the new 
Cocoa patches.


Qemu zoom-to fit=on for all Screenmodes 
working: https://www.youtube.com/watch?v=dnJ3W8egAFY


Qemu 9.0. RC1 zoom-to fit=on not working for all 
Screenmodes: https://www.youtube.com/watch?v=Ddq68ViudrA









[PATCH v2] hw/net/net_tx_pkt: Fix virtio header without checksum offloading

2024-03-27 Thread Akihiko Odaki
It is incorrect to have VIRTIO_NET_HDR_F_NEEDS_CSUM set when
checksum offloading is disabled, so clear the bit.

TCP/UDP checksum is usually offloaded when the peer requires virtio
headers because they can instruct the peer to compute the checksum. However,
igb disables TX checksum offloading when a VF is enabled, regardless of
whether the peer requires virtio headers, because a transmitted packet can
be routed to the VF and it expects the packet to have a proper checksum.
Therefore, it is necessary to have a correct virtio header even when
checksum offloading is disabled.

A real TCP/UDP checksum will be computed and saved in the buffer when
checksum offloading is disabled. The virtio specification requires setting
the packet checksum stored in the buffer to the TCP/UDP pseudo
header checksum when the VIRTIO_NET_HDR_F_NEEDS_CSUM bit is set, so the
bit must be cleared in that case.

Fixes: ffbd2dbd8e64 ("e1000e: Perform software segmentation for loopback")
Buglink: https://issues.redhat.com/browse/RHEL-23067
Signed-off-by: Akihiko Odaki 
---
Changes in v2:
- Dropped VIRTIO_NET_HDR_F_DATA_VALID. (Jason Wang)
- Link to v1: https://lore.kernel.org/r/20240324-tx-v1-1-a3b413574...@daynix.com
---
 hw/net/net_tx_pkt.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/net/net_tx_pkt.c b/hw/net/net_tx_pkt.c
index 2e5f58b3c9cc..2134a18c4c90 100644
--- a/hw/net/net_tx_pkt.c
+++ b/hw/net/net_tx_pkt.c
@@ -833,6 +833,7 @@ bool net_tx_pkt_send_custom(struct NetTxPkt *pkt, bool offload,
 
     if (offload || gso_type == VIRTIO_NET_HDR_GSO_NONE) {
         if (!offload && pkt->virt_hdr.flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
+            pkt->virt_hdr.flags &= ~VIRTIO_NET_HDR_F_NEEDS_CSUM;
             net_tx_pkt_do_sw_csum(pkt, &pkt->vec[NET_TX_PKT_L2HDR_FRAG],
                                   pkt->payload_frags + NET_TX_PKT_PL_START_FRAG - 1,
                                   pkt->payload_len);

---
base-commit: ba49d760eb04630e7b15f423ebecf6c871b8f77b
change-id: 20240324-tx-c57d3c22ad73

Best regards,
-- 
Akihiko Odaki 




[PATCH] configure: Disable -Werror by default if sanitizers are enabled

2024-03-27 Thread Akihiko Odaki
With GCC 13.2.1 I got the following warning when sanitizers are enabled:

../hw/net/rtl8139.c: In function 'rtl8139_io_writeb':
../hw/net/rtl8139.c:2273:17: error: writing 8 bytes into a region of size 0 [-Werror=stringop-overflow=]
 2273 | memcpy(data_to_checksum, saved_ip_header + 12, 8);

The bug is reported upstream and you can find details at:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114494

The GCC documentation states it is not recommended to combine -Werror
and sanitizers so disable -Werror by default if sanitizers are enabled.
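
The resulting default can be sketched as a standalone function 
(hypothetical name and argument order, for illustration only):

```shell
# decide_werror IN_GIT SANITIZERS HOST_OS
# Echoes the werror default the patched configure would pick:
# true only for git builds on select OSes with sanitizers disabled.
decide_werror() {
    in_git=$1
    sanitizers=$2
    host_os=$3
    if [ "$in_git" = yes ] && [ "$sanitizers" = no ] && \
       { [ "$host_os" = linux ] || [ "$host_os" = windows ]; }; then
        echo true
    else
        echo false
    fi
}

decide_werror yes no linux    # git build, no sanitizers -> true
decide_werror yes yes linux   # sanitizers enabled       -> false
```

An explicit --enable-werror/--disable-werror on the command line still 
overrides this default, since the cross file only sets it when unset.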

Signed-off-by: Akihiko Odaki 
---
 configure | 12 
 1 file changed, 12 insertions(+)

diff --git a/configure b/configure
index 3cd736b139f3..3407f9971db1 100755
--- a/configure
+++ b/configure
@@ -262,6 +262,7 @@ unset target_list_exclude
 # The following Meson options are handled manually (still they
 # are included in the automatically generated help message)
 # because they automatically enable/disable other options
+sanitizers=no
 tcg="auto"
 cfi="false"
 
@@ -713,6 +714,10 @@ for opt do
   meson_option_add -Doptimization=0
   default_cflags='-O0 -g'
   ;;
+  --disable-sanitizers) sanitizers=no
+  ;;
+  --enable-sanitizers) sanitizers=yes
+  ;;
   --disable-tcg) tcg="disabled"
   ;;
   --enable-tcg) tcg="enabled"
@@ -1706,9 +1711,15 @@ if test "$skip_meson" = no; then
   echo "cpp_link_args = [$(meson_quote $CXXFLAGS $LDFLAGS $EXTRA_CXXFLAGS $EXTRA_LDFLAGS)]" >> $cross
 
   # Only enable by default for git builds and on select OSes
+  # Also don't enable if sanitizers are enabled. The GCC documentation says:
+  # > Note that sanitizers tend to increase the rate of false positive warnings,
+  # > most notably those around -Wmaybe-uninitialized. We recommend against
+  # > combining -Werror and [the use of] sanitizers.
+  # https://gcc.gnu.org/onlinedocs/gcc-13.2.0/gcc/Instrumentation-Options.html
   echo "# environment defaults, can still be overridden on " >> $cross
   echo "# the command line" >> $cross
   if test -e "$source_path/.git" && \
+  test "$sanitizers" == no && \
   { test "$host_os" = linux || test "$host_os" = "windows"; }; then
   echo 'werror = true' >> $cross
   fi
@@ -1789,6 +1800,7 @@ if test "$skip_meson" = no; then
   test -n "${LIB_FUZZING_ENGINE+xxx}" && meson_option_add "-Dfuzzing_engine=$LIB_FUZZING_ENGINE"
   test "$plugins" = yes && meson_option_add "-Dplugins=true"
   test "$tcg" != enabled && meson_option_add "-Dtcg=$tcg"
+  test "$sanitizers" = yes && meson_option_add -Dsanitizers=true
   run_meson() {
 NINJA=$ninja $meson setup "$@" "$PWD" "$source_path"
   }

---
base-commit: ba49d760eb04630e7b15f423ebecf6c871b8f77b
change-id: 20240327-werror-5d2e974c3d7d

Best regards,
-- 
Akihiko Odaki 



