Re: [PATCH v3 3/3] meson: Drop the .fa library prefix

2024-05-24 Thread Akihiko Odaki

On 2024/05/22 22:45, Paolo Bonzini wrote:

On Wed, May 22, 2024 at 12:49 PM Akihiko Odaki  wrote:

The non-standard .fa library prefix breaks the link source
de-duplication done by Meson so drop it.


Can you show the difference in the command lines?


Without this patch:
clang  -o qemu-io qemu-io.p/qemu-io.c.o -Werror -flto -Wl,--as-needed 
-Wl,--no-undefined -pie -Wl,--whole-archive libblock.fa libcrypto.fa 
libauthz.fa libqom.fa libio.fa libevent-loop-base.fa 
-Wl,--no-whole-archive -fsanitize=cfi-icall 
-fsanitize-cfi-icall-generalize-pointers -fsanitize=undefined 
-fsanitize=address -fstack-protector-strong -Wl,-z,relro -Wl,-z,now 
-fuse-ld=lld -Wl,--start-group libqemuutil.a 
subprojects/libvhost-user/libvhost-user-glib.a 
subprojects/libvhost-user/libvhost-user.a libblock.fa libcrypto.fa 
libauthz.fa libqom.fa libio.fa libevent-loop-base.fa @block.syms 
/usr/lib64/libgio-2.0.so /usr/lib64/libgobject-2.0.so 
/usr/lib64/libglib-2.0.so /usr/lib64/libgmodule-2.0.so -pthread 
/usr/lib64/libgnutls.so -lm /usr/lib64/libpixman-1.so 
/usr/lib64/libzstd.so /usr/lib64/libz.so /usr/lib64/libcurl.so 
/usr/lib64/libssh.so -lbz2 -lpam -Wl,--end-group


With this patch:
clang  -o qemu-io qemu-io.p/qemu-io.c.o -Werror -flto -Wl,--as-needed 
-Wl,--no-undefined -pie -Wl,--whole-archive -Wl,--start-group libblock.a 
libcrypto.a libauthz.a libqom.a libio.a libevent-loop-base.a 
-Wl,--no-whole-archive -fsanitize=cfi-icall 
-fsanitize-cfi-icall-generalize-pointers -fsanitize=undefined 
-fsanitize=address -fstack-protector-strong -Wl,-z,relro -Wl,-z,now 
-fuse-ld=lld libqemuutil.a 
subprojects/libvhost-user/libvhost-user-glib.a 
subprojects/libvhost-user/libvhost-user.a @block.syms 
/usr/lib64/libgio-2.0.so /usr/lib64/libgobject-2.0.so 
/usr/lib64/libglib-2.0.so /usr/lib64/libgmodule-2.0.so -pthread 
/usr/lib64/libgnutls.so -lm /usr/lib64/libpixman-1.so 
/usr/lib64/libzstd.so /usr/lib64/libz.so /usr/lib64/libcurl.so 
/usr/lib64/libssh.so -lbz2 -lpam -Wl,--end-group




One possibility to force de-duplication of objects is to change
"link_whole: foo" to "objects: foo.extract_all_objects(recursive:
false)" in all the declare_dependency() invocations that involve a
'fa' archive.

This completely gets rid of the archives, which now become just a
dummy target. I have gotten reports of "ld" exhausting the limit of
open files when using thin archives (thin archives contain "symbolic
links" to the files with the actual object code, thus reducing disk
usage), and this would also be fixed.

The disadvantage is requiring a bump to Meson 1.1.x as the minimum
required version (the recommended version is 1.2.x because earlier
versions are incompatible with recent macOS). It could be done before
this patch (because then this patch is a total no-op), or after too to
fix the immediate issue with sanitizer builds.


I wrote such a change and applied it after this patch, but it caused
dependencies to be ignored. Please see "[PATCH RFC 0/2] meson: Pass
objects to declare_dependency()", which I sent earlier.


Regards,
Akihiko Odaki



[PATCH RFC 0/2] meson: Pass objects to declare_dependency()

2024-05-24 Thread Akihiko Odaki
qcow.c:584)
>>>   qemu-img.lto.o:(decompress_cluster)
>>> referenced by cloop.c:244 (/home/me/q/var/qemu/build/../block/cloop.c:244)
>>>   qemu-img.lto.o:(cloop_read_block)
>>> referenced 3 more times

ld.lld: error: undefined symbol: inflateEnd
>>> referenced by qcow2-threads.c:169 
>>> (/home/me/q/var/qemu/build/../block/qcow2-threads.c:169)
>>>   qemu-img.lto.o:(qcow2_zlib_decompress.cfi)
>>> referenced by qcow.c:0 (/home/me/q/var/qemu/build/../block/qcow.c:0)
>>>   qemu-img.lto.o:(decompress_cluster)
>>> referenced by qcow.c:0 (/home/me/q/var/qemu/build/../block/qcow.c:0)
>>>   qemu-img.lto.o:(decompress_cluster)
>>> referenced 2 more times

ld.lld: error: undefined symbol: uncompress
>>> referenced by vmdk.c:1958 (/home/me/q/var/qemu/build/../block/vmdk.c:1958)
>>>   qemu-img.lto.o:(vmdk_read_extent)

ld.lld: error: undefined symbol: compress
>>> referenced by vmdk.c:1865 (/home/me/q/var/qemu/build/../block/vmdk.c:1865)
>>>   qemu-img.lto.o:(vmdk_write_extent)

ld.lld: error: undefined symbol: inflateReset
>>> referenced by cloop.c:240 (/home/me/q/var/qemu/build/../block/cloop.c:240)
>>>   qemu-img.lto.o:(cloop_read_block)
>>> referenced by cloop.c:240 (/home/me/q/var/qemu/build/../block/cloop.c:240)
>>>   qemu-img.lto.o:(cloop_read_block)
>>> referenced by dmg.c:658 (/home/me/q/var/qemu/build/../block/dmg.c:658)
>>>   qemu-img.lto.o:(dmg_read_chunk)
>>> referenced 1 more times

ld.lld: error: too many errors emitted, stopping now (use --error-limit=0 to 
see all errors)
clang: error: linker command failed with exit code 1 (use -v to see invocation)
ninja: build stopped: subcommand failed.

Signed-off-by: Akihiko Odaki 
---
Akihiko Odaki (2):
  meson: Pass objects to declare_dependency()
  Revert "meson: Propagate gnutls dependency"

 docs/devel/build-system.rst   |  2 +-
 meson.build   | 31 ---
 block/meson.build |  2 +-
 gdbstub/meson.build   |  4 ++--
 io/meson.build|  2 +-
 storage-daemon/meson.build|  2 +-
 subprojects/libvhost-user/meson.build |  2 +-
 tests/qtest/libqos/meson.build|  2 +-
 ui/meson.build|  2 +-
 9 files changed, 25 insertions(+), 24 deletions(-)
---
base-commit: 4a207ef32de97bc785ced1987eacab7665b49420
change-id: 20240524-objects-3dc07e485b7f

Best regards,
-- 
Akihiko Odaki 




[PATCH RFC 2/2] Revert "meson: Propagate gnutls dependency"

2024-05-24 Thread Akihiko Odaki
This reverts commit 3eacf70bb5a83e4775ad8003cbca63a40f70c8c2.

Signed-off-by: Akihiko Odaki 
---
 meson.build| 4 ++--
 block/meson.build  | 2 +-
 io/meson.build | 2 +-
 storage-daemon/meson.build | 2 +-
 ui/meson.build | 2 +-
 5 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/meson.build b/meson.build
index 0e6fa2e4b777..cd5a24807ec8 100644
--- a/meson.build
+++ b/meson.build
@@ -3518,7 +3518,7 @@ if have_block
 'blockdev-nbd.c',
 'iothread.c',
 'job-qmp.c',
-  ), gnutls)
+  ))
 
   # os-posix.c contains POSIX-specific functions used by qemu-storage-daemon,
   # os-win32.c does not
@@ -4008,7 +4008,7 @@ if have_tools
   qemu_io = executable('qemu-io', files('qemu-io.c'),
  dependencies: [block, qemuutil], install: true)
   qemu_nbd = executable('qemu-nbd', files('qemu-nbd.c'),
-   dependencies: [blockdev, qemuutil, gnutls, selinux],
+   dependencies: [blockdev, qemuutil, selinux],
install: true)
 
   subdir('storage-daemon')
diff --git a/block/meson.build b/block/meson.build
index e1f03fd773e9..0165ac178370 100644
--- a/block/meson.build
+++ b/block/meson.build
@@ -39,7 +39,7 @@ block_ss.add(files(
   'throttle.c',
   'throttle-groups.c',
   'write-threshold.c',
-), zstd, zlib, gnutls)
+), zstd, zlib)
 
 system_ss.add(when: 'CONFIG_TCG', if_true: files('blkreplay.c'))
 system_ss.add(files('block-ram-registrar.c'))
diff --git a/io/meson.build b/io/meson.build
index 283b9b2bdbdf..1164812f9126 100644
--- a/io/meson.build
+++ b/io/meson.build
@@ -13,4 +13,4 @@ io_ss.add(files(
   'dns-resolver.c',
   'net-listener.c',
   'task.c',
-), gnutls)
+))
diff --git a/storage-daemon/meson.build b/storage-daemon/meson.build
index 46267b63e72b..b955949fd6f3 100644
--- a/storage-daemon/meson.build
+++ b/storage-daemon/meson.build
@@ -1,6 +1,6 @@
 qsd_ss = ss.source_set()
 qsd_ss.add(files('qemu-storage-daemon.c'))
-qsd_ss.add(blockdev, chardev, qmp, qom, qemuutil, gnutls)
+qsd_ss.add(blockdev, chardev, qmp, qom, qemuutil)
 
 subdir('qapi')
 
diff --git a/ui/meson.build b/ui/meson.build
index a5ce22a678ba..9358439ceeed 100644
--- a/ui/meson.build
+++ b/ui/meson.build
@@ -43,7 +43,7 @@ vnc_ss.add(files(
   'vnc-jobs.c',
   'vnc-clipboard.c',
 ))
-vnc_ss.add(zlib, jpeg, gnutls)
+vnc_ss.add(zlib, jpeg)
 vnc_ss.add(when: sasl, if_true: files('vnc-auth-sasl.c'))
 system_ss.add_all(when: [vnc, pixman], if_true: vnc_ss)
 system_ss.add(when: vnc, if_false: files('vnc-stubs.c'))

-- 
2.45.1




[PATCH RFC 1/2] meson: Pass objects to declare_dependency()

2024-05-24 Thread Akihiko Odaki
We used to ask declare_dependency() to link_whole static libraries.
If a static library is a thin archive, GNU ld needs to open all object
files referenced by the archive, and sometimes reaches the open file
limit.

Another problem with link_whole is that it does not propagate
dependencies. In particular, gnutls, a dependency of crypto, is not
propagated to its users, and we currently work around the issue by
declaring gnutls as a dependency for each crypto user.

Instead of using link_whole, extract objects included in static
libraries and pass them to declare_dependency(). This requires Meson
1.1.0 or later.

Signed-off-by: Akihiko Odaki 
---
 docs/devel/build-system.rst   |  2 +-
 meson.build   | 27 ++-
 gdbstub/meson.build   |  4 ++--
 subprojects/libvhost-user/meson.build |  2 +-
 tests/qtest/libqos/meson.build|  2 +-
 5 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/docs/devel/build-system.rst b/docs/devel/build-system.rst
index 5baf027b7614..36ad40c76d2a 100644
--- a/docs/devel/build-system.rst
+++ b/docs/devel/build-system.rst
@@ -238,7 +238,7 @@ Subsystem sourcesets:
 libchardev = static_library('chardev', chardev_ss.sources(),
 build_by_default: false)
 
-chardev = declare_dependency(link_whole: libchardev)
+chardev = declare_dependency(objects: 
libchardev.extract_all_objects(recursive: false))
 
 Target-independent emulator sourcesets:
   Various general purpose helper code is compiled only once and
diff --git a/meson.build b/meson.build
index d6549722b50d..0e6fa2e4b777 100644
--- a/meson.build
+++ b/meson.build
@@ -1,4 +1,4 @@
-project('qemu', ['c'], meson_version: '>=0.63.0',
+project('qemu', ['c'], meson_version: '>=1.1.0',
 default_options: ['warning_level=1', 'c_std=gnu11', 'cpp_std=gnu++11', 
'b_colorout=auto',
   'b_staticpic=false', 'stdsplit=false', 
'optimization=2', 'b_pie=true'],
 version: files('VERSION'))
@@ -3456,20 +3456,20 @@ subdir('gdbstub')
 
 if enable_modules
   libmodulecommon = static_library('module-common', files('module-common.c') + 
genh, pic: true, c_args: '-DBUILD_DSO')
-  modulecommon = declare_dependency(link_whole: libmodulecommon, compile_args: 
'-DBUILD_DSO')
+  modulecommon = declare_dependency(objects: 
libmodulecommon.extract_all_objects(recursive: false), compile_args: 
'-DBUILD_DSO')
 endif
 
 qom_ss = qom_ss.apply({})
 libqom = static_library('qom', qom_ss.sources() + genh,
 dependencies: [qom_ss.dependencies()],
 build_by_default: false)
-qom = declare_dependency(link_whole: libqom)
+qom = declare_dependency(objects: libqom.extract_all_objects(recursive: false))
 
 event_loop_base = files('event-loop-base.c')
 event_loop_base = static_library('event-loop-base',
  sources: event_loop_base + genh,
  build_by_default: false)
-event_loop_base = declare_dependency(link_whole: event_loop_base,
+event_loop_base = declare_dependency(objects: 
event_loop_base.extract_all_objects(recursive: false),
  dependencies: [qom])
 
 stub_ss = stub_ss.apply({})
@@ -3703,7 +3703,7 @@ libauthz = static_library('authz', authz_ss.sources() + 
genh,
   dependencies: [authz_ss.dependencies()],
   build_by_default: false)
 
-authz = declare_dependency(link_whole: libauthz,
+authz = declare_dependency(objects: libauthz.extract_all_objects(recursive: 
false),
dependencies: qom)
 
 crypto_ss = crypto_ss.apply({})
@@ -3711,7 +3711,7 @@ libcrypto = static_library('crypto', crypto_ss.sources() 
+ genh,
dependencies: [crypto_ss.dependencies()],
build_by_default: false)
 
-crypto = declare_dependency(link_whole: libcrypto,
+crypto = declare_dependency(objects: libcrypto.extract_all_objects(recursive: 
false),
 dependencies: [authz, qom])
 
 io_ss = io_ss.apply({})
@@ -3720,7 +3720,8 @@ libio = static_library('io', io_ss.sources() + genh,
link_with: libqemuutil,
build_by_default: false)
 
-io = declare_dependency(link_whole: libio, dependencies: [crypto, qom])
+io = declare_dependency(objects: libio.extract_all_objects(recursive: false),
+dependencies: [crypto, qom])
 
 libmigration = static_library('migration', sources: migration_files + genh,
   build_by_default: false)
@@ -3734,7 +3735,7 @@ libblock = static_library('block', block_ss.sources() + 
genh,
   link_depends: block_syms,
   build_by_default: false)
 
-block = declare_dependency(link_whole: [libblock],
+block = declare_dependency(objects: libblock.extract_all_objects(recursive: 

[PATCH v4 3/4] qapi: Do not cast function pointers

2024-05-23 Thread Akihiko Odaki
-fsanitize=undefined complains if function pointers are cast. Such
casts also prevent enabling the strict mode of CFI, which is currently
disabled with -fsanitize-cfi-icall-generalize-pointers.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2346
Signed-off-by: Akihiko Odaki 
---
 include/qapi/clone-visitor.h | 37 -
 qapi/qapi-clone-visitor.c| 30 --
 2 files changed, 28 insertions(+), 39 deletions(-)

diff --git a/include/qapi/clone-visitor.h b/include/qapi/clone-visitor.h
index adf9a788e232..ebc182b034d7 100644
--- a/include/qapi/clone-visitor.h
+++ b/include/qapi/clone-visitor.h
@@ -11,6 +11,7 @@
 #ifndef QAPI_CLONE_VISITOR_H
 #define QAPI_CLONE_VISITOR_H
 
+#include "qapi/error.h"
 #include "qapi/visitor.h"
 
 /*
@@ -20,11 +21,8 @@
  */
 typedef struct QapiCloneVisitor QapiCloneVisitor;
 
-void *qapi_clone(const void *src, bool (*visit_type)(Visitor *, const char *,
- void **, Error **));
-void qapi_clone_members(void *dst, const void *src, size_t sz,
-bool (*visit_type_members)(Visitor *, void *,
-   Error **));
+Visitor *qapi_clone_visitor_new(void);
+Visitor *qapi_clone_members_visitor_new(void);
 
 /*
  * Deep-clone QAPI object @src of the given @type, and return the result.
@@ -32,10 +30,18 @@ void qapi_clone_members(void *dst, const void *src, size_t 
sz,
  * Not usable on QAPI scalars (integers, strings, enums), nor on a
  * QAPI object that references the 'any' type.  Safe when @src is NULL.
  */
-#define QAPI_CLONE(type, src)   \
-((type *)qapi_clone(src,\
-(bool (*)(Visitor *, const char *, void **, \
-  Error **))visit_type_ ## type))
+#define QAPI_CLONE(type, src)   \
+({  \
+Visitor *v_;\
+type *dst_ = (type *) (src); /* Cast away const */  \
+\
+if (dst_) { \
+v_ = qapi_clone_visitor_new();  \
+visit_type_ ## type(v_, NULL, &dst_, &error_abort); \
+visit_free(v_); \
+}   \
+dst_;   \
+})
 
 /*
  * Copy deep clones of @type members from @src to @dst.
@@ -43,9 +49,14 @@ void qapi_clone_members(void *dst, const void *src, size_t 
sz,
  * Not usable on QAPI scalars (integers, strings, enums), nor on a
  * QAPI object that references the 'any' type.
  */
-#define QAPI_CLONE_MEMBERS(type, dst, src)  \
-qapi_clone_members(dst, src, sizeof(type),  \
-   (bool (*)(Visitor *, void *, \
- Error **))visit_type_ ## type ## _members)
+#define QAPI_CLONE_MEMBERS(type, dst, src)\
+({\
+Visitor *v_;  \
+  \
+v_ = qapi_clone_members_visitor_new();\
+*(type *)(dst) = *(src);  \
+visit_type_ ## type ## _members(v_, (type *)(dst), &error_abort); \
+visit_free(v_);   \
+})
 
 #endif
diff --git a/qapi/qapi-clone-visitor.c b/qapi/qapi-clone-visitor.c
index c45c5caa3b89..bbf953698f38 100644
--- a/qapi/qapi-clone-visitor.c
+++ b/qapi/qapi-clone-visitor.c
@@ -149,7 +149,7 @@ static void qapi_clone_free(Visitor *v)
 g_free(v);
 }
 
-static Visitor *qapi_clone_visitor_new(void)
+Visitor *qapi_clone_visitor_new(void)
 {
 QapiCloneVisitor *v;
 
@@ -174,31 +174,9 @@ static Visitor *qapi_clone_visitor_new(void)
 return &v->visitor;
 }
 
-void *qapi_clone(const void *src, bool (*visit_type)(Visitor *, const char *,
- void **, Error **))
+Visitor *qapi_clone_members_visitor_new(void)
 {
-Visitor *v;
-void *dst = (void *) src; /* Cast away const */
-
-if (!src) {
-return NULL;
-}
-
-v = qapi_clone_visitor_new();
-visit_type(v, NULL, &dst, &error_abort);
-visit_free(v);
-return dst;
-}
-
-void qapi_clone_members(void *dst, const void *src, size_t sz,
-bool (*visit_type_members)(Visitor *, void *,
-   Error **))
-{
-Visitor *v;
-
-v =

[PATCH v4 0/4] Fix sanitizer errors with clang 18.1.1

2024-05-23 Thread Akihiko Odaki
I upgraded my Fedora Asahi Remix from 39 to 40 and found new sanitizer
errors with the clang it ships, so here are the fixes.

The patch "meson: Drop the .fa library prefix" may have a broad impact
on the build system, so please tell me if you have any concerns with it.

To: Michael Tokarev 
To: Laurent Vivier 
To: Paolo Bonzini 
To: Marc-André Lureau 
To: Daniel P. Berrangé 
To: Thomas Huth 
To: Philippe Mathieu-Daudé 
To: Alex Bennée 
To: Wainer dos Santos Moschetta 
To: Beraldo Leal 
To: Richard Henderson 
To: Laurent Vivier 
Cc: qemu-devel@nongnu.org
Signed-off-by: Akihiko Odaki 

Changes in v4:
- Fixed function pointer problems instead of ignoring them.
- Made references to allocations static instead of incompletely freeing
  them for qemu-keymap.
- s/prefix/suffix/ for patch "meson: Drop the .fa library suffix".
- Link to v3: 
https://lore.kernel.org/r/20240522-xkb-v3-0-c429de860...@daynix.com

Changes in v3:
- Moved changes that should belong to patch "meson: Drop the .fa library
  prefix" from patch "meson: Add -fno-sanitize=function".
- Link to v2: 
https://lore.kernel.org/r/20240522-xkb-v2-0-67b54fa7c...@daynix.com

Changes in v2:
- Added more patches and converted them to a series.
- Link to v1: 
https://lore.kernel.org/r/20240501-xkb-v1-1-f046d8e11...@daynix.com

---
Akihiko Odaki (4):
  qemu-keymap: Make references to allocations static
  lockable: Do not cast function pointers
  qapi: Do not cast function pointers
  meson: Drop the .fa library suffix

 docs/devel/build-system.rst |  5 -
 meson.build | 17 ++---
 include/qapi/clone-visitor.h| 37 -
 include/qemu/lockable.h | 23 +++
 qapi/qapi-clone-visitor.c   | 30 --
 qemu-keymap.c   |  8 +++-
 stubs/blk-exp-close-all.c   |  2 +-
 .gitlab-ci.d/buildtest-template.yml |  2 --
 .gitlab-ci.d/buildtest.yml  |  2 --
 gdbstub/meson.build |  2 --
 tcg/meson.build |  2 --
 tests/Makefile.include  |  2 +-
 tests/qtest/libqos/meson.build  |  1 -
 13 files changed, 54 insertions(+), 79 deletions(-)
---
base-commit: c25df57ae8f9fe1c72eee2dab37d76d904ac382e
change-id: 20240501-xkb-258483ccc5d8

Best regards,
-- 
Akihiko Odaki 




[PATCH v4 4/4] meson: Drop the .fa library suffix

2024-05-23 Thread Akihiko Odaki
The non-standard .fa library suffix breaks the link source
de-duplication done by Meson so drop it.

The lack of link source de-duplication causes AddressSanitizer to
complain about ODR violations, and makes GNU ld abort when combined with
clang's LTO.

Previously, the non-standard suffix was necessary for fork-fuzzing.
Meson wraps all standard-suffixed libraries with --start-group and
--end-group. This made a fork-fuzz.ld linker script wrapped as well and
broke builds. Commit d2e6f9272d33 ("fuzz: remove fork-fuzzing
scaffolding") dropped fork-fuzzing so we can now restore the standard
suffix.

The occurrences of the suffix were detected and removed by performing
a tree-wide search with 'fa' and .fa (note the quotes and dot).

Signed-off-by: Akihiko Odaki 
---
 docs/devel/build-system.rst |  5 -
 meson.build | 17 ++---
 stubs/blk-exp-close-all.c   |  2 +-
 .gitlab-ci.d/buildtest-template.yml |  2 --
 .gitlab-ci.d/buildtest.yml  |  2 --
 gdbstub/meson.build |  2 --
 tcg/meson.build |  2 --
 tests/Makefile.include  |  2 +-
 tests/qtest/libqos/meson.build  |  1 -
 9 files changed, 4 insertions(+), 31 deletions(-)

diff --git a/docs/devel/build-system.rst b/docs/devel/build-system.rst
index 09caf2f8e199..5baf027b7614 100644
--- a/docs/devel/build-system.rst
+++ b/docs/devel/build-system.rst
@@ -236,15 +236,10 @@ Subsystem sourcesets:
   are then turned into static libraries as follows::
 
 libchardev = static_library('chardev', chardev_ss.sources(),
-name_suffix: 'fa',
 build_by_default: false)
 
 chardev = declare_dependency(link_whole: libchardev)
 
-  As of Meson 0.55.1, the special ``.fa`` suffix should be used for everything
-  that is used with ``link_whole``, to ensure that the link flags are placed
-  correctly in the command line.
-
 Target-independent emulator sourcesets:
   Various general purpose helper code is compiled only once and
   the .o files are linked into all output binaries that need it.
diff --git a/meson.build b/meson.build
index 91a0aa64c640..d6549722b50d 100644
--- a/meson.build
+++ b/meson.build
@@ -3462,14 +3462,12 @@ endif
 qom_ss = qom_ss.apply({})
 libqom = static_library('qom', qom_ss.sources() + genh,
 dependencies: [qom_ss.dependencies()],
-name_suffix: 'fa',
 build_by_default: false)
 qom = declare_dependency(link_whole: libqom)
 
 event_loop_base = files('event-loop-base.c')
 event_loop_base = static_library('event-loop-base',
  sources: event_loop_base + genh,
- name_suffix: 'fa',
  build_by_default: false)
 event_loop_base = declare_dependency(link_whole: event_loop_base,
  dependencies: [qom])
@@ -3703,7 +3701,6 @@ qemu_syms = custom_target('qemu.syms', output: 
'qemu.syms',
 authz_ss = authz_ss.apply({})
 libauthz = static_library('authz', authz_ss.sources() + genh,
   dependencies: [authz_ss.dependencies()],
-  name_suffix: 'fa',
   build_by_default: false)
 
 authz = declare_dependency(link_whole: libauthz,
@@ -3712,7 +3709,6 @@ authz = declare_dependency(link_whole: libauthz,
 crypto_ss = crypto_ss.apply({})
 libcrypto = static_library('crypto', crypto_ss.sources() + genh,
dependencies: [crypto_ss.dependencies()],
-   name_suffix: 'fa',
build_by_default: false)
 
 crypto = declare_dependency(link_whole: libcrypto,
@@ -3722,13 +3718,11 @@ io_ss = io_ss.apply({})
 libio = static_library('io', io_ss.sources() + genh,
dependencies: [io_ss.dependencies()],
link_with: libqemuutil,
-   name_suffix: 'fa',
build_by_default: false)
 
 io = declare_dependency(link_whole: libio, dependencies: [crypto, qom])
 
 libmigration = static_library('migration', sources: migration_files + genh,
-  name_suffix: 'fa',
   build_by_default: false)
 migration = declare_dependency(link_with: libmigration,
dependencies: [zlib, qom, io])
@@ -3738,7 +3732,6 @@ block_ss = block_ss.apply({})
 libblock = static_library('block', block_ss.sources() + genh,
   dependencies: block_ss.dependencies(),
   link_depends: block_syms,
-  name_suffix: 'fa',
   build_by_default: false)
 
 block = declare_dependency(link_whole: [libblock],
@@ -3748,7 +3741,6 @@ block = declare_dependency(link_whole: [libblock],
 blockdev_ss = blockdev_ss.apply({})
 libblockdev = static_library('blockdev', blockdev_

[PATCH v4 2/4] lockable: Do not cast function pointers

2024-05-23 Thread Akihiko Odaki
-fsanitize=undefined complains if function pointers are cast. Such
casts also prevent enabling the strict mode of CFI, which is currently
disabled with -fsanitize-cfi-icall-generalize-pointers.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2345
Signed-off-by: Akihiko Odaki 
---
 include/qemu/lockable.h | 23 +++
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/include/qemu/lockable.h b/include/qemu/lockable.h
index 9823220446d9..c1b097c44879 100644
--- a/include/qemu/lockable.h
+++ b/include/qemu/lockable.h
@@ -43,15 +43,30 @@ qemu_null_lockable(void *x)
 return NULL;
 }
 
+#define QML_FUNC_(name)   \
+static inline void qemu_lockable_ ## name ## _lock(void *x)   \
+{ \
+qemu_ ## name ## _lock(x);\
+} \
+static inline void qemu_lockable_ ## name ## _unlock(void *x) \
+{ \
+qemu_ ## name ## _unlock(x);  \
+}
+
+QML_FUNC_(mutex)
+QML_FUNC_(rec_mutex)
+QML_FUNC_(co_mutex)
+QML_FUNC_(spin)
+
 /*
  * In C, compound literals have the lifetime of an automatic variable.
  * In C++ it would be different, but then C++ wouldn't need QemuLockable
  * either...
  */
-#define QML_OBJ_(x, name) (&(QemuLockable) {\
-.object = (x),  \
-.lock = (QemuLockUnlockFunc *) qemu_ ## name ## _lock,  \
-.unlock = (QemuLockUnlockFunc *) qemu_ ## name ## _unlock   \
+#define QML_OBJ_(x, name) (&(QemuLockable) {\
+.object = (x),  \
+.lock = qemu_lockable_ ## name ## _lock,\
+.unlock = qemu_lockable_ ## name ## _unlock \
 })
 
 /**

-- 
2.45.1




[PATCH v4 1/4] qemu-keymap: Make references to allocations static

2024-05-23 Thread Akihiko Odaki
LeakSanitizer complains about allocations whose references are held
only by automatic variables. It is possible to free them to suppress
the complaints, but it is a chore to make sure they are freed in all
exit paths so make them static instead.

Signed-off-by: Akihiko Odaki 
---
 qemu-keymap.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/qemu-keymap.c b/qemu-keymap.c
index 8c80f7a4ed65..701e4332af87 100644
--- a/qemu-keymap.c
+++ b/qemu-keymap.c
@@ -154,9 +154,9 @@ static xkb_mod_mask_t get_mod(struct xkb_keymap *map, const 
char *name)
 
 int main(int argc, char *argv[])
 {
-struct xkb_context *ctx;
-struct xkb_keymap *map;
-struct xkb_state *state;
+static struct xkb_context *ctx;
+static struct xkb_keymap *map;
+static struct xkb_state *state;
 xkb_mod_index_t mod, mods;
 int rc;
 
@@ -234,8 +234,6 @@ int main(int argc, char *argv[])
 
 state = xkb_state_new(map);
 xkb_keymap_key_for_each(map, walk_map, state);
-xkb_state_unref(state);
-state = NULL;
 
 /* add quirks */
 fprintf(outfile,

-- 
2.45.1




Re: [PATCH v3 1/3] qemu-keymap: Free xkb allocations

2024-05-23 Thread Akihiko Odaki

On 2024/05/22 23:36, Peter Maydell wrote:

On Wed, 22 May 2024 at 12:47, Daniel P. Berrangé  wrote:


On Wed, May 22, 2024 at 12:35:23PM +0100, Peter Maydell wrote:

On Wed, 22 May 2024 at 11:49, Akihiko Odaki  wrote:


This fixes LeakSanitizer complaints with xkbcommon 1.6.0.

Signed-off-by: Akihiko Odaki 
---
  qemu-keymap.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/qemu-keymap.c b/qemu-keymap.c
index 8c80f7a4ed65..7a9f38cf9863 100644
--- a/qemu-keymap.c
+++ b/qemu-keymap.c
@@ -237,6 +237,9 @@ int main(int argc, char *argv[])
  xkb_state_unref(state);
  state = NULL;

+xkb_keymap_unref(map);
+xkb_context_unref(ctx);
+
  /* add quirks */
  fprintf(outfile,
  "\n"


This is surely a sanitizer bug. We're unconditionally about
to exit() the program here, where everything is freed, so nothing
is leaked.


I'm not sure I'd call it a sanitizer bug, rather it's expected behaviour
of sanitizers. Even if you're about to exit, it's important to see info
about all memory that is not freed by that time, since it can reveal
leaks that were ongoing in the process that are valid things to fix.
To make the sanitizers usable you need to get rid of the noise. IOW, you
either have to provide a file to suppress reports of memory that is
expected to remain allocated, or free it despite being about
to exit.  Freeing is the more maintainable strategy, as IME, suppression
files get outdated over time.
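
(For reference, a LeakSanitizer suppression file is just a list of
"leak:<pattern>" lines matched against frames of the reported allocation
stack and passed in via LSAN_OPTIONS; a minimal sketch, with the function
names picked only as examples, not as a proposal for QEMU:

    # lsan.supp
    leak:xkb_context_new
    leak:xkb_keymap_new_from_names

    LSAN_OPTIONS=suppressions=lsan.supp ./qemu-keymap ...
)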


I think if there's still a live variable pointing to the unfreed
memory at point of exit the compiler/sanitizer should be able to
deduce that that's not a real leak. And if you believe that these
really are leaks then you also need to be fixing them on the early
exit paths, like the one where we exit(1) if xkb_keymap_new_from_names()
fails.

I don't object to this change, but I think that if the sanitizer
complains about this kind of thing it's a bug, because it obscures
real leaks.


The sanitizer can certainly be improved to keep the automatic variables
alive when exit() is called, but I'm a bit sympathetic to the sanitizer.


Covering such a case requires the sanitizer to know that exit()
terminates the process. Perhaps the sanitizer can look for
__attribute__((noreturn)) and __builtin_unreachable(), but they may not
be present and are not reliable. I think it is a legitimate design
decision not to try to deal with this kind of situation instead of
partially handling it with attributes and builtin calls.
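
As a minimal illustration of the case being discussed (not QEMU code):

    #include <stdlib.h>

    int main(void)
    {
        /* reachable only through an automatic variable */
        char *buf = malloc(16);

        /* the process terminates here; whether this still counts as a
         * leak is exactly the point of disagreement above */
        exit(buf ? 0 : 1);
    }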


Regards,
Akihiko Odaki



[PATCH v3 3/3] meson: Drop the .fa library prefix

2024-05-22 Thread Akihiko Odaki
The non-standard .fa library prefix breaks the link source
de-duplication done by Meson so drop it.

The lack of link source de-duplication causes AddressSanitizer to
complain about ODR violations, and makes GNU ld abort when combined with
clang's LTO.

Previously, the non-standard prefix was necessary for fork-fuzzing.
Meson wraps all standard-prefixed libraries with --start-group and
--end-group. This made a fork-fuzz.ld linker script wrapped as well and
broke builds. Commit d2e6f9272d33 ("fuzz: remove fork-fuzzing
scaffolding") dropped fork-fuzzing so we can now restore the standard
prefix.

The occurrences of the prefix were detected and removed by performing
a tree-wide search with 'fa' and .fa (note the quotes and dot).

Signed-off-by: Akihiko Odaki 
---
 docs/devel/build-system.rst |  5 -
 meson.build | 17 ++---
 stubs/blk-exp-close-all.c   |  2 +-
 .gitlab-ci.d/buildtest-template.yml |  2 --
 .gitlab-ci.d/buildtest.yml  |  2 --
 gdbstub/meson.build |  2 --
 tcg/meson.build |  2 --
 tests/Makefile.include  |  2 +-
 tests/qtest/libqos/meson.build  |  1 -
 9 files changed, 4 insertions(+), 31 deletions(-)

diff --git a/docs/devel/build-system.rst b/docs/devel/build-system.rst
index 09caf2f8e199..5baf027b7614 100644
--- a/docs/devel/build-system.rst
+++ b/docs/devel/build-system.rst
@@ -236,15 +236,10 @@ Subsystem sourcesets:
   are then turned into static libraries as follows::
 
 libchardev = static_library('chardev', chardev_ss.sources(),
-name_suffix: 'fa',
 build_by_default: false)
 
 chardev = declare_dependency(link_whole: libchardev)
 
-  As of Meson 0.55.1, the special ``.fa`` suffix should be used for everything
-  that is used with ``link_whole``, to ensure that the link flags are placed
-  correctly in the command line.
-
 Target-independent emulator sourcesets:
   Various general purpose helper code is compiled only once and
   the .o files are linked into all output binaries that need it.
diff --git a/meson.build b/meson.build
index 3c3ad0d5f5eb..9eaf218609eb 100644
--- a/meson.build
+++ b/meson.build
@@ -3462,14 +3462,12 @@ endif
 qom_ss = qom_ss.apply({})
 libqom = static_library('qom', qom_ss.sources() + genh,
 dependencies: [qom_ss.dependencies()],
-name_suffix: 'fa',
 build_by_default: false)
 qom = declare_dependency(link_whole: libqom)
 
 event_loop_base = files('event-loop-base.c')
 event_loop_base = static_library('event-loop-base',
  sources: event_loop_base + genh,
- name_suffix: 'fa',
  build_by_default: false)
 event_loop_base = declare_dependency(link_whole: event_loop_base,
  dependencies: [qom])
@@ -3703,7 +3701,6 @@ qemu_syms = custom_target('qemu.syms', output: 
'qemu.syms',
 authz_ss = authz_ss.apply({})
 libauthz = static_library('authz', authz_ss.sources() + genh,
   dependencies: [authz_ss.dependencies()],
-  name_suffix: 'fa',
   build_by_default: false)
 
 authz = declare_dependency(link_whole: libauthz,
@@ -3712,7 +3709,6 @@ authz = declare_dependency(link_whole: libauthz,
 crypto_ss = crypto_ss.apply({})
 libcrypto = static_library('crypto', crypto_ss.sources() + genh,
dependencies: [crypto_ss.dependencies()],
-   name_suffix: 'fa',
build_by_default: false)
 
 crypto = declare_dependency(link_whole: libcrypto,
@@ -3722,13 +3718,11 @@ io_ss = io_ss.apply({})
 libio = static_library('io', io_ss.sources() + genh,
dependencies: [io_ss.dependencies()],
link_with: libqemuutil,
-   name_suffix: 'fa',
build_by_default: false)
 
 io = declare_dependency(link_whole: libio, dependencies: [crypto, qom])
 
 libmigration = static_library('migration', sources: migration_files + genh,
-  name_suffix: 'fa',
   build_by_default: false)
 migration = declare_dependency(link_with: libmigration,
dependencies: [zlib, qom, io])
@@ -3738,7 +3732,6 @@ block_ss = block_ss.apply({})
 libblock = static_library('block', block_ss.sources() + genh,
   dependencies: block_ss.dependencies(),
   link_depends: block_syms,
-  name_suffix: 'fa',
   build_by_default: false)
 
 block = declare_dependency(link_whole: [libblock],
@@ -3748,7 +3741,6 @@ block = declare_dependency(link_whole: [libblock],
 blockdev_ss = blockdev_ss.apply({})
 libblockdev = static_library('blockdev', blockdev_

[PATCH v3 2/3] meson: Add -fno-sanitize=function

2024-05-22 Thread Akihiko Odaki
-fsanitize=function enforces the consistency of function types, but
include/qemu/lockable.h contains function pointer casts, which violate
the rule. We already disable exact type checks for CFI with
-fsanitize-cfi-icall-generalize-pointers, so disable -fsanitize=function
as well.
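
For illustration only (not part of the patch): the kind of call that
-fsanitize=function reports is an indirect call through a pointer whose
type differs from the callee's real type. A minimal sketch with made-up
names, mirroring the lockable.h pattern:

    typedef void LockUnlockFn(void *);

    struct my_mutex { int locked; };

    static void my_mutex_lock(struct my_mutex *m)
    {
        m->locked = 1;
    }

    int main(void)
    {
        struct my_mutex m = { 0 };
        /* the cast compiles, but the callee's type is
         * void (*)(struct my_mutex *), so clang's -fsanitize=function
         * flags the indirect call below at runtime */
        LockUnlockFn *lock = (LockUnlockFn *)my_mutex_lock;
        lock(&m);
        return m.locked ? 0 : 1;
    }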

Signed-off-by: Akihiko Odaki 
---
 meson.build | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/meson.build b/meson.build
index 91a0aa64c640..3c3ad0d5f5eb 100644
--- a/meson.build
+++ b/meson.build
@@ -298,7 +298,7 @@ endforeach
 
 qemu_common_flags = [
   '-D_GNU_SOURCE', '-D_FILE_OFFSET_BITS=64', '-D_LARGEFILE_SOURCE',
-  '-fno-strict-aliasing', '-fno-common', '-fwrapv' ]
+  '-fno-sanitize=function', '-fno-strict-aliasing', '-fno-common', '-fwrapv' ]
 qemu_cflags = []
 qemu_ldflags = []
 

-- 
2.45.1




[PATCH v3 1/3] qemu-keymap: Free xkb allocations

2024-05-22 Thread Akihiko Odaki
This fixes LeakSanitizer complaints with xkbcommon 1.6.0.

Signed-off-by: Akihiko Odaki 
---
 qemu-keymap.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/qemu-keymap.c b/qemu-keymap.c
index 8c80f7a4ed65..7a9f38cf9863 100644
--- a/qemu-keymap.c
+++ b/qemu-keymap.c
@@ -237,6 +237,9 @@ int main(int argc, char *argv[])
 xkb_state_unref(state);
 state = NULL;
 
+xkb_keymap_unref(map);
+xkb_context_unref(ctx);
+
 /* add quirks */
 fprintf(outfile,
 "\n"

-- 
2.45.1




[PATCH v3 0/3] Fix sanitizer errors with clang 18.1.1

2024-05-22 Thread Akihiko Odaki
I upgraded my Fedora Asahi Remix from 39 to 40 and found new sanitizer
errors with the clang it ships, so here are the fixes.

The patch "meson: Drop the .fa library prefix" may have a broad impact
on the build system, so please tell me if you have any concerns with it.

To: Michael Tokarev 
To: Laurent Vivier 
To: Paolo Bonzini 
To: Marc-André Lureau 
To: Daniel P. Berrangé 
To: Thomas Huth 
To: Philippe Mathieu-Daudé 
To: Alex Bennée 
To: Wainer dos Santos Moschetta 
To: Beraldo Leal 
To: Richard Henderson 
To: Laurent Vivier 
Cc: qemu-devel@nongnu.org
Signed-off-by: Akihiko Odaki 

Changes in v3:
- Moved changes that should belong to patch "meson: Drop the .fa library
  prefix" from patch "meson: Add -fno-sanitize=function".
- Link to v2: 
https://lore.kernel.org/r/20240522-xkb-v2-0-67b54fa7c...@daynix.com

Changes in v2:
- Added more patches and converted them to a series.
- Link to v1: 
https://lore.kernel.org/r/20240501-xkb-v1-1-f046d8e11...@daynix.com

---
Akihiko Odaki (3):
  qemu-keymap: Free xkb allocations
  meson: Add -fno-sanitize=function
  meson: Drop the .fa library prefix

 docs/devel/build-system.rst |  5 -
 meson.build | 19 +++
 qemu-keymap.c   |  3 +++
 stubs/blk-exp-close-all.c   |  2 +-
 .gitlab-ci.d/buildtest-template.yml |  2 --
 .gitlab-ci.d/buildtest.yml  |  2 --
 gdbstub/meson.build |  2 --
 tcg/meson.build |  2 --
 tests/Makefile.include  |  2 +-
 tests/qtest/libqos/meson.build  |  1 -
 10 files changed, 8 insertions(+), 32 deletions(-)
---
base-commit: c25df57ae8f9fe1c72eee2dab37d76d904ac382e
change-id: 20240501-xkb-258483ccc5d8

Best regards,
-- 
Akihiko Odaki 




[PATCH v2 0/3] Fix sanitizer errors with clang 18.1.1

2024-05-22 Thread Akihiko Odaki
I upgraded my Fedora Asahi Remix from 39 to 40 and found new sanitizer
errors with the clang it ships, so here are the fixes.

The patch "meson: Drop the .fa library prefix" may have a broad impact
on the build system, so please tell me if you have any concerns with it.

To: Michael Tokarev 
To: Laurent Vivier 
To: Paolo Bonzini 
To: Marc-André Lureau 
To: Daniel P. Berrangé 
To: Thomas Huth 
To: Philippe Mathieu-Daudé 
To: Alex Bennée 
To: Wainer dos Santos Moschetta 
To: Beraldo Leal 
To: Richard Henderson 
To: Laurent Vivier 
Cc: qemu-devel@nongnu.org
Signed-off-by: Akihiko Odaki 

Changes in v2:
- Added more patches and converted them to a series.
- Link to v1: 
https://lore.kernel.org/r/20240501-xkb-v1-1-f046d8e11...@daynix.com

---
Akihiko Odaki (3):
  qemu-keymap: Free xkb allocations
  meson: Add -fno-sanitize=function
  meson: Drop the .fa library prefix

 docs/devel/build-system.rst |  5 -
 meson.build | 19 +++
 qemu-keymap.c   |  3 +++
 stubs/blk-exp-close-all.c   |  2 +-
 .gitlab-ci.d/buildtest-template.yml |  2 --
 .gitlab-ci.d/buildtest.yml  |  2 --
 gdbstub/meson.build |  2 --
 tcg/meson.build |  2 --
 tests/Makefile.include  |  2 +-
 tests/qtest/libqos/meson.build  |  1 -
 10 files changed, 8 insertions(+), 32 deletions(-)
---
base-commit: c25df57ae8f9fe1c72eee2dab37d76d904ac382e
change-id: 20240501-xkb-258483ccc5d8

Best regards,
-- 
Akihiko Odaki 




[PATCH v2 3/3] meson: Drop the .fa library prefix

2024-05-22 Thread Akihiko Odaki
The non-standard .fa library prefix breaks the link source
de-duplication done by Meson so drop it.

The lack of link source de-duplication causes AddressSanitizer to
complain about ODR violations, and makes GNU ld abort when combined with
clang's LTO.

Previously, the non-standard prefix was necessary for fork-fuzzing.
Meson wraps all standard-prefixed libraries with --start-group and
--end-group. This made a fork-fuzz.ld linker script wrapped as well and
broke builds. Commit d2e6f9272d33 ("fuzz: remove fork-fuzzing
scaffolding") dropped fork-fuzzing so we can now restore the standard
prefix.

The occurrences of the prefix were detected and removed by performing
a tree-wide search with 'fa' and .fa (note the quotes and dot).

Signed-off-by: Akihiko Odaki 
---
 docs/devel/build-system.rst | 5 -
 stubs/blk-exp-close-all.c   | 2 +-
 .gitlab-ci.d/buildtest-template.yml | 2 --
 .gitlab-ci.d/buildtest.yml  | 2 --
 gdbstub/meson.build | 2 --
 tcg/meson.build | 2 --
 tests/Makefile.include  | 2 +-
 tests/qtest/libqos/meson.build  | 1 -
 8 files changed, 2 insertions(+), 16 deletions(-)

diff --git a/docs/devel/build-system.rst b/docs/devel/build-system.rst
index 09caf2f8e199..5baf027b7614 100644
--- a/docs/devel/build-system.rst
+++ b/docs/devel/build-system.rst
@@ -236,15 +236,10 @@ Subsystem sourcesets:
   are then turned into static libraries as follows::
 
 libchardev = static_library('chardev', chardev_ss.sources(),
-name_suffix: 'fa',
 build_by_default: false)
 
 chardev = declare_dependency(link_whole: libchardev)
 
-  As of Meson 0.55.1, the special ``.fa`` suffix should be used for everything
-  that is used with ``link_whole``, to ensure that the link flags are placed
-  correctly in the command line.
-
 Target-independent emulator sourcesets:
   Various general purpose helper code is compiled only once and
   the .o files are linked into all output binaries that need it.
diff --git a/stubs/blk-exp-close-all.c b/stubs/blk-exp-close-all.c
index 1c7131676392..2f68e06d7d05 100644
--- a/stubs/blk-exp-close-all.c
+++ b/stubs/blk-exp-close-all.c
@@ -1,7 +1,7 @@
 #include "qemu/osdep.h"
 #include "block/export.h"
 
-/* Only used in programs that support block exports (libblockdev.fa) */
+/* Only used in programs that support block exports (libblockdev.a) */
 void blk_exp_close_all(void)
 {
 }
diff --git a/.gitlab-ci.d/buildtest-template.yml 
b/.gitlab-ci.d/buildtest-template.yml
index 22045add8064..69e468a576ba 100644
--- a/.gitlab-ci.d/buildtest-template.yml
+++ b/.gitlab-ci.d/buildtest-template.yml
@@ -45,10 +45,8 @@
 exclude:
   - build/**/*.p
   - build/**/*.a.p
-  - build/**/*.fa.p
   - build/**/*.c.o
   - build/**/*.c.o.d
-  - build/**/*.fa
 
 .common_test_job_template:
   extends: .base_job_template
diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
index cfdff175c389..c156e6f1d90e 100644
--- a/.gitlab-ci.d/buildtest.yml
+++ b/.gitlab-ci.d/buildtest.yml
@@ -178,10 +178,8 @@ build-previous-qemu:
 exclude:
   - build-previous/**/*.p
   - build-previous/**/*.a.p
-  - build-previous/**/*.fa.p
   - build-previous/**/*.c.o
   - build-previous/**/*.c.o.d
-  - build-previous/**/*.fa
   needs:
 job: amd64-opensuse-leap-container
   variables:
diff --git a/gdbstub/meson.build b/gdbstub/meson.build
index da5721d8452b..c91e398ae726 100644
--- a/gdbstub/meson.build
+++ b/gdbstub/meson.build
@@ -19,13 +19,11 @@ gdb_system_ss = gdb_system_ss.apply({})
 
 libgdb_user = static_library('gdb_user',
  gdb_user_ss.sources() + genh,
- name_suffix: 'fa',
  c_args: '-DCONFIG_USER_ONLY',
  build_by_default: false)
 
 libgdb_system = static_library('gdb_system',
 gdb_system_ss.sources() + genh,
-name_suffix: 'fa',
 build_by_default: false)
 
 gdb_user = declare_dependency(link_whole: libgdb_user)
diff --git a/tcg/meson.build b/tcg/meson.build
index 8251589fd4e9..f941413d5801 100644
--- a/tcg/meson.build
+++ b/tcg/meson.build
@@ -31,7 +31,6 @@ tcg_ss = tcg_ss.apply({})
 
 libtcg_user = static_library('tcg_user',
  tcg_ss.sources() + genh,
- name_suffix: 'fa',
  c_args: '-DCONFIG_USER_ONLY',
  build_by_default: false)
 
@@ -41,7 +40,6 @@ user_ss.add(tcg_user)
 
 libtcg_system = static_library('tcg_system',
 tcg_ss.sources() + genh,
-name_suffix: 'fa',
 c_args: '-DCONFIG_SOFTMMU',
 build_by_default: false)
 
diff --git a/tests/Makefile.include 

[PATCH v2 1/3] qemu-keymap: Free xkb allocations

2024-05-22 Thread Akihiko Odaki
This fixes LeakSanitizer complaints with xkbcommon 1.6.0.

Signed-off-by: Akihiko Odaki 
---
 qemu-keymap.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/qemu-keymap.c b/qemu-keymap.c
index 8c80f7a4ed65..7a9f38cf9863 100644
--- a/qemu-keymap.c
+++ b/qemu-keymap.c
@@ -237,6 +237,9 @@ int main(int argc, char *argv[])
 xkb_state_unref(state);
 state = NULL;
 
+xkb_keymap_unref(map);
+xkb_context_unref(ctx);
+
 /* add quirks */
 fprintf(outfile,
 "\n"

-- 
2.45.1




[PATCH v2 2/3] meson: Add -fno-sanitize=function

2024-05-22 Thread Akihiko Odaki
-fsanitize=function enforces the consistency of function types, but
include/qemu/lockable.h contains function pointer casts, which violate
the rule. We already disable exact type checks for CFI with
-fsanitize-cfi-icall-generalize-pointers, so disable -fsanitize=function
as well.

Signed-off-by: Akihiko Odaki 
---
 meson.build | 19 +++
 1 file changed, 3 insertions(+), 16 deletions(-)

diff --git a/meson.build b/meson.build
index 91a0aa64c640..9eaf218609eb 100644
--- a/meson.build
+++ b/meson.build
@@ -298,7 +298,7 @@ endforeach
 
 qemu_common_flags = [
   '-D_GNU_SOURCE', '-D_FILE_OFFSET_BITS=64', '-D_LARGEFILE_SOURCE',
-  '-fno-strict-aliasing', '-fno-common', '-fwrapv' ]
+  '-fno-sanitize=function', '-fno-strict-aliasing', '-fno-common', '-fwrapv' ]
 qemu_cflags = []
 qemu_ldflags = []
 
@@ -3462,14 +3462,12 @@ endif
 qom_ss = qom_ss.apply({})
 libqom = static_library('qom', qom_ss.sources() + genh,
 dependencies: [qom_ss.dependencies()],
-name_suffix: 'fa',
 build_by_default: false)
 qom = declare_dependency(link_whole: libqom)
 
 event_loop_base = files('event-loop-base.c')
 event_loop_base = static_library('event-loop-base',
  sources: event_loop_base + genh,
- name_suffix: 'fa',
  build_by_default: false)
 event_loop_base = declare_dependency(link_whole: event_loop_base,
  dependencies: [qom])
@@ -3703,7 +3701,6 @@ qemu_syms = custom_target('qemu.syms', output: 
'qemu.syms',
 authz_ss = authz_ss.apply({})
 libauthz = static_library('authz', authz_ss.sources() + genh,
   dependencies: [authz_ss.dependencies()],
-  name_suffix: 'fa',
   build_by_default: false)
 
 authz = declare_dependency(link_whole: libauthz,
@@ -3712,7 +3709,6 @@ authz = declare_dependency(link_whole: libauthz,
 crypto_ss = crypto_ss.apply({})
 libcrypto = static_library('crypto', crypto_ss.sources() + genh,
dependencies: [crypto_ss.dependencies()],
-   name_suffix: 'fa',
build_by_default: false)
 
 crypto = declare_dependency(link_whole: libcrypto,
@@ -3722,13 +3718,11 @@ io_ss = io_ss.apply({})
 libio = static_library('io', io_ss.sources() + genh,
dependencies: [io_ss.dependencies()],
link_with: libqemuutil,
-   name_suffix: 'fa',
build_by_default: false)
 
 io = declare_dependency(link_whole: libio, dependencies: [crypto, qom])
 
 libmigration = static_library('migration', sources: migration_files + genh,
-  name_suffix: 'fa',
   build_by_default: false)
 migration = declare_dependency(link_with: libmigration,
dependencies: [zlib, qom, io])
@@ -3738,7 +3732,6 @@ block_ss = block_ss.apply({})
 libblock = static_library('block', block_ss.sources() + genh,
   dependencies: block_ss.dependencies(),
   link_depends: block_syms,
-  name_suffix: 'fa',
   build_by_default: false)
 
 block = declare_dependency(link_whole: [libblock],
@@ -3748,7 +3741,6 @@ block = declare_dependency(link_whole: [libblock],
 blockdev_ss = blockdev_ss.apply({})
 libblockdev = static_library('blockdev', blockdev_ss.sources() + genh,
  dependencies: blockdev_ss.dependencies(),
- name_suffix: 'fa',
  build_by_default: false)
 
 blockdev = declare_dependency(link_whole: [libblockdev],
@@ -3757,13 +3749,11 @@ blockdev = declare_dependency(link_whole: [libblockdev],
 qmp_ss = qmp_ss.apply({})
 libqmp = static_library('qmp', qmp_ss.sources() + genh,
 dependencies: qmp_ss.dependencies(),
-name_suffix: 'fa',
 build_by_default: false)
 
 qmp = declare_dependency(link_whole: [libqmp])
 
 libchardev = static_library('chardev', chardev_ss.sources() + genh,
-name_suffix: 'fa',
 dependencies: chardev_ss.dependencies(),
 build_by_default: false)
 
@@ -3771,7 +3761,6 @@ chardev = declare_dependency(link_whole: libchardev)
 
 hwcore_ss = hwcore_ss.apply({})
 libhwcore = static_library('hwcore', sources: hwcore_ss.sources() + genh,
-   name_suffix: 'fa',
build_by_default: false)
 hwcore = declare_dependency(link_whole: libhwcore)
 common_ss.add(hwcore)
@@ -3807,8 +3796,7 @@ common_all = static_library('common',
 sources: common_ss.all_sources() + genh,
 include_directories: common_user_inc

Re: [PATCH v12 10/13] virtio-gpu: Move fence_poll timer to VirtIOGPUGL

2024-05-19 Thread Akihiko Odaki

On 2024/05/20 6:27, Dmitry Osipenko wrote:

Move the fence_poll timer to VirtIOGPUGL for consistency with cmdq_resume_bh,
which is used only by the GL device.

Signed-off-by: Dmitry Osipenko 


Thanks for the refactoring.

Please move this before "[PATCH v12 01/13] virtio-gpu: Unrealize GL 
device" so that you don't have to rewrite code introduced by that patch.




Re: [PATCH v12 09/13] virtio-gpu: Handle resource blob commands

2024-05-19 Thread Akihiko Odaki

On 2024/05/20 6:27, Dmitry Osipenko wrote:

From: Antonio Caggiano 

Support BLOB resources creation, mapping and unmapping by calling the
new stable virglrenderer 0.10 interface. Only enabled when available and
via the blob config. E.g. -device virtio-vga-gl,blob=true

Signed-off-by: Antonio Caggiano 
Signed-off-by: Xenia Ragiadakou 
Signed-off-by: Huang Rui 
Signed-off-by: Dmitry Osipenko 
---
  hw/display/virtio-gpu-virgl.c  | 310 -
  hw/display/virtio-gpu.c|   4 +-
  include/hw/virtio/virtio-gpu.h |   2 +
  3 files changed, 312 insertions(+), 4 deletions(-)

diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
index 7d4d2882a5af..63a5a983aad6 100644
--- a/hw/display/virtio-gpu-virgl.c
+++ b/hw/display/virtio-gpu-virgl.c
@@ -26,6 +26,18 @@
  
  struct virtio_gpu_virgl_resource {

  struct virtio_gpu_simple_resource base;
+MemoryRegion *mr;
+
+/*
+ * Used by virgl_cmd_resource_unref() to know whether async
+ * unmapping has been started.  Blob can be both mapped/unmapped
+ * on unref and we shouldn't unmap blob that wasn't mapped in the
+ * first place because it's an error condition.  This flag prevents
+ * performing step 3 of the async unmapping process described in the
+ * comment to virtio_gpu_virgl_async_unmap_resource_blob() if blob
+ * wasn't mapped in the first place on unref.
+ */
+bool async_unmap_in_progress;


I suggest adding a field that tells if mr is deleted to 
virtio_gpu_virgl_hostmem_region instead to minimize the size of 
virtio_gpu_virgl_resource.



  };
  
  static struct virtio_gpu_virgl_resource *

@@ -49,6 +61,128 @@ virgl_get_egl_display(G_GNUC_UNUSED void *cookie)
  }
  #endif
  
+#ifdef HAVE_VIRGL_RESOURCE_BLOB

+struct virtio_gpu_virgl_hostmem_region {
+MemoryRegion mr;
+struct VirtIOGPU *g;
+struct virtio_gpu_virgl_resource *res;
+};
+
+static void virtio_gpu_virgl_resume_cmdq_bh(void *opaque)
+{
+VirtIOGPU *g = opaque;
+
+virtio_gpu_process_cmdq(g);
+}
+
+static void virtio_gpu_virgl_hostmem_region_free(void *obj)
+{
+MemoryRegion *mr = MEMORY_REGION(obj);
+struct virtio_gpu_virgl_hostmem_region *vmr;
+VirtIOGPUBase *b;
+VirtIOGPUGL *gl;
+
+vmr = container_of(mr, struct virtio_gpu_virgl_hostmem_region, mr);
+vmr->res->mr = NULL;
+
+b = VIRTIO_GPU_BASE(vmr->g);
+b->renderer_blocked--;
+
+/*
+ * memory_region_unref() is executed from RCU thread context, while
+ * virglrenderer works only on the main-loop thread that's holding GL
+ * context.
+ */
+gl = VIRTIO_GPU_GL(vmr->g);
+qemu_bh_schedule(gl->cmdq_resume_bh);
+g_free(vmr);
+}
+
+static int
+virtio_gpu_virgl_map_resource_blob(VirtIOGPU *g,
+   struct virtio_gpu_virgl_resource *res,
+   uint64_t offset)
+{
+struct virtio_gpu_virgl_hostmem_region *vmr;
+VirtIOGPUBase *b = VIRTIO_GPU_BASE(g);
+MemoryRegion *mr;
+uint64_t size;
+void *data;
+int ret;
+
+if (!virtio_gpu_hostmem_enabled(b->conf)) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: hostmem disabled\n", __func__);
+return -EOPNOTSUPP;
+}
+
+ret = virgl_renderer_resource_map(res->base.resource_id, &data, &size);
+if (ret) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: failed to map virgl resource: 
%s\n",
+  __func__, strerror(-ret));
+return ret;
+}
+
+vmr = g_new0(struct virtio_gpu_virgl_hostmem_region, 1);
+vmr->res = res;
+vmr->g = g;
+
+mr = &vmr->mr;
+memory_region_init_ram_ptr(mr, OBJECT(mr), "blob", size, data);
+memory_region_add_subregion(&b->hostmem, offset, mr);
+memory_region_set_enabled(mr, true);
+
+/*
+ * MR could outlive the resource if MR's reference is held outside of
+ * virtio-gpu. In order to prevent unmapping resource while MR is alive,
+ * and thus, making the data pointer invalid, we will block virtio-gpu
+ * command processing until MR is fully unreferenced and freed.
+ */
+OBJECT(mr)->free = virtio_gpu_virgl_hostmem_region_free;
+
+res->mr = mr;
+
+return 0;
+}
+
+static int
+virtio_gpu_virgl_async_unmap_resource_blob(VirtIOGPU *g,
+   struct virtio_gpu_virgl_resource 
*res)
+{
+VirtIOGPUBase *b = VIRTIO_GPU_BASE(g);
+MemoryRegion *mr = res->mr;
+int ret;
+
+/*
+ * Perform async unmapping in 3 steps:
+ *
+ * 1. Begin async unmapping with memory_region_del_subregion()
+ *and suspend/block cmd processing.
+ * 2. Wait for res->mr to be freed and cmd processing resumed
+ *asynchronously by virtio_gpu_virgl_hostmem_region_free().
+ * 3. Finish the unmapping with final virgl_renderer_resource_unmap().
+ */
+if (mr) {
+/* render will be unblocked once MR is freed */
+b->renderer_blocked++;
+
+/* memory region owns self res->mr object and frees it by 

Re: [PATCH v2 1/1] riscv, gdbstub.c: fix reg_width in ricsv_gen_dynamic_vector_feature()

2024-05-18 Thread Akihiko Odaki

On 2024/05/18 5:30, Daniel Henrique Barboza wrote:

Commit 33a24910ae changed 'reg_width' to use 'vlenb', i.e. vector length
in bytes, when in this context we want 'reg_width' as the length in
bits.

Fix 'reg_width' back to the value in bits like 7cb59921c05a
("target/riscv/gdbstub.c: use 'vlenb' instead of shifting 'vlen'") set
beforehand.

While we're at it, rename 'reg_width' to 'bitsize' to provide a bit more
clarity about what the variable represents. 'bitsize' is also used in
riscv_gen_dynamic_csr_feature() with the same purpose, i.e. as an input to
gdb_feature_builder_append_reg().

Cc: Akihiko Odaki 
Cc: Alex Bennée 
Reported-by: Robin Dapp 
Fixes: 33a24910ae ("target/riscv: Use GDBFeature for dynamic XML")
Signed-off-by: Daniel Henrique Barboza 
Reviewed-by: LIU Zhiwei 
Acked-by: Alex Bennée 


Reviewed-by: Akihiko Odaki 



Re: [PATCH v11 08/10] virtio-gpu: Handle resource blob commands

2024-05-15 Thread Akihiko Odaki

On 2024/05/16 2:15, Dmitry Osipenko wrote:

On 5/15/24 20:04, Akihiko Odaki wrote:




VIRTIO_GPU_CMD_RESOURCE_UNREF should also call
virtio_gpu_virgl_async_unmap_resource_blob(). I guess that's the
original intention of having a function for this instead of inlining the
content of this function to virgl_cmd_resource_unmap_blob().


Correct, previous patchset versions unmapped the resource on unref.

In v11 I dropped unmapping from unref to avoid adding an additional
`async_unmap_in_progress` flag because normally map/unmap will be
balanced by the guest anyway.

The virtio-gpu spec doesn't say that a resource has to be implicitly
unmapped on unref. In the case of a Linux guest, it would actually be a bug
to unref a mapped resource because the guest would continue to map and use
the destroyed resource.



An additional `async_unmap_in_progress` flag should not be necessary, as
explained earlier.


It is a valid design not to issue UNMAP_BLOB before UNREF if the VMM
automatically performs the unmapping operation. A guest needs to ensure
the blob is not mapped in a guest userspace virtual address space, but
that does not require issuing UNMAP_BLOB, which is to unmap the blob from
the guest physical address space.

In case of Linux, virtio_gpu_vram_free() calls virtio_gpu_cmd_unmap() to 
issue UNMAP_BLOB before UNREF, which is actually not necessary. Linux 
still needs to ensure that the blob is not mapped in a guest userspace 
virtual address space, but that is done before virtio_gpu_vram_free() 
gets called, and virtio_gpu_cmd_unmap() has nothing to do with that.


It is still a good practice for a guest to issue UNMAP_BLOB in such a 
case because the spec does not say the VMM will automatically unmap the 
blob for UNREF, and that's what Linux does. From the VMM perspective, 
it's better to perform the unmapping operation for UNREF because the 
spec does not say the guest always issues UNMAP_BLOB before UNREF.




Re: [PATCH v11 08/10] virtio-gpu: Handle resource blob commands

2024-05-15 Thread Akihiko Odaki

On 2024/05/16 2:01, Dmitry Osipenko wrote:

On 5/15/24 19:42, Akihiko Odaki wrote:

It may be better to actually implement unmapping instead of returning an
error for consistency with the iov operation. Apparently crosvm also
unmaps blobs with VIRTIO_GPU_CMD_RESOURCE_UNREF.


Then I'll add back `async_unmap_in_progress` because the resource can be
both mapped and unmapped on unref, and we'll need a flag to know whether
async unmapping has finished to do the final unmapping of the resource.


Such a situation should be already handled since unmapping in progress
blocks all commands (not just VIRTIO_GPU_CMD_RESOURCE_UNMAP_BLOB but
literally all, including VIRTIO_GPU_CMD_RESOURCE_UNREF).


The async unmapping consists of 3 parts:

1. begin async unmapping with memory_region_del_subregion() and suspend
2. wait for res->mr to be freed and resume
3. finish the unmapping with final virgl_renderer_resource_unmap()

Parts 1 and 3 are handled by  virtio_gpu_virgl_async_unmap_resource_blob()


The VIRTIO_GPU_CMD_RESOURCE_UNMAP_BLOB is different because we know that
the blob is mapped in the first place. Hence we can safely perform part 3,
assuming that parts 1/2 have been completed.

In the case of VIRTIO_GPU_CMD_RESOURCE_UNREF, the blob can be unmapped in
the first place and we can't do part 3 because it will error out for an
unmapped resource since parts 1/2 were not performed.



VIRTIO_GPU_CMD_RESOURCE_UNREF should also call 
virtio_gpu_virgl_async_unmap_resource_blob(). I guess that's the 
original intention of having a function for this instead of inlining the 
content of this function to virgl_cmd_resource_unmap_blob().




Re: [PATCH v11 08/10] virtio-gpu: Handle resource blob commands

2024-05-15 Thread Akihiko Odaki

On 2024/05/16 1:39, Dmitry Osipenko wrote:

On 5/13/24 12:18, Akihiko Odaki wrote:

     static void virgl_cmd_resource_unref(VirtIOGPU *g,
- struct virtio_gpu_ctrl_command
*cmd)
+ struct virtio_gpu_ctrl_command
*cmd,
+ bool *cmd_suspended)


This parameter is not used as it returns an error if the resource is
still mapped.


Missed to remove it by accident


It may be better to actually implement unmapping instead of returning an
error for consistency with the iov operation. Apparently crosvm also
unmaps blobs with VIRTIO_GPU_CMD_RESOURCE_UNREF.


Then I'll add back `async_unmap_in_progress` because the resource can be
both mapped and unmapped on unref, and we'll need a flag to know whether
async unmapping has finished to do the final unmapping of the resource.


Such a situation should be already handled since unmapping in progress 
blocks all commands (not just VIRTIO_GPU_CMD_RESOURCE_UNMAP_BLOB but 
literally all, including VIRTIO_GPU_CMD_RESOURCE_UNREF).




Re: [PATCH v11 01/10] virtio-gpu: Unrealize GL device

2024-05-15 Thread Akihiko Odaki

On 2024/05/16 1:18, Dmitry Osipenko wrote:

On 5/13/24 11:44, Akihiko Odaki wrote:

On 2024/05/12 3:22, Dmitry Osipenko wrote:

Even though GL GPU doesn't support hotplugging today, free virgl
resources when GL device is unrealized. For consistency.

Signed-off-by: Dmitry Osipenko 
---
   hw/display/virtio-gpu-gl.c | 11 +++
   hw/display/virtio-gpu-virgl.c  |  9 +
   include/hw/virtio/virtio-gpu.h |  1 +
   3 files changed, 21 insertions(+)

diff --git a/hw/display/virtio-gpu-gl.c b/hw/display/virtio-gpu-gl.c
index e06be60dfbfc..0c0a8d136954 100644
--- a/hw/display/virtio-gpu-gl.c
+++ b/hw/display/virtio-gpu-gl.c
@@ -136,6 +136,16 @@ static Property virtio_gpu_gl_properties[] = {
   DEFINE_PROP_END_OF_LIST(),
   };
   +static void virtio_gpu_gl_device_unrealize(DeviceState *qdev)
+{
+    VirtIOGPU *g = VIRTIO_GPU(qdev);
+    VirtIOGPUGL *gl = VIRTIO_GPU_GL(qdev);
+
+    if (gl->renderer_inited) {
+    virtio_gpu_virgl_deinit(g);
+    }
+}
+
   static void virtio_gpu_gl_class_init(ObjectClass *klass, void *data)
   {
   DeviceClass *dc = DEVICE_CLASS(klass);
@@ -149,6 +159,7 @@ static void virtio_gpu_gl_class_init(ObjectClass
*klass, void *data)
   vgc->update_cursor_data = virtio_gpu_gl_update_cursor_data;
     vdc->realize = virtio_gpu_gl_device_realize;
+    vdc->unrealize = virtio_gpu_gl_device_unrealize;
   vdc->reset = virtio_gpu_gl_reset;
   device_class_set_props(dc, virtio_gpu_gl_properties);
   }
diff --git a/hw/display/virtio-gpu-virgl.c
b/hw/display/virtio-gpu-virgl.c
index 9f34d0e6619c..b0500eccf8e0 100644
--- a/hw/display/virtio-gpu-virgl.c
+++ b/hw/display/virtio-gpu-virgl.c
@@ -665,3 +665,12 @@ int virtio_gpu_virgl_get_num_capsets(VirtIOGPU *g)
     return capset2_max_ver ? 2 : 1;
   }
+
+void virtio_gpu_virgl_deinit(VirtIOGPU *g)
+{
+    if (g->fence_poll) {


Isn't g->fence_poll always non-NULL when this function is called?


virtio_gpu_virgl_init() is invoked when the first cmd is executed;
please see virtio_gpu_gl_handle_ctrl(), which invokes it. Hence
g->fence_poll can be NULL.



But it already checks renderer_inited, doesn't it? And I think it's
better to use a single flag to represent that virgl is enabled
instead of checking several variables (fence_poll, and cmdq_resume_bh in
the future).
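
As a small illustration of that point (an assumption about the resulting
shape, not code from the series): if unrealize only calls the teardown
helper when renderer_inited is set, the helper itself can free its members
unconditionally:

/* Sketch: the caller already guards on gl->renderer_inited, so fence_poll
 * (and later cmdq_resume_bh) are known to be valid here. */
void virtio_gpu_virgl_deinit(VirtIOGPU *g)
{
    timer_free(g->fence_poll);
    virgl_renderer_cleanup(NULL);
}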




Re: [PATCH v11 09/10] virtio-gpu: Register capsets dynamically

2024-05-13 Thread Akihiko Odaki

On 2024/05/12 3:22, Dmitry Osipenko wrote:

From: Pierre-Eric Pelloux-Prayer 

virtio_gpu_virgl_get_num_capsets will return "num_capsets", but we can't
assume that capset_index 1 is always VIRGL2 once we support more capsets,
like Venus and DRM capsets. Register capsets dynamically to avoid that problem.

Signed-off-by: Pierre-Eric Pelloux-Prayer 
Signed-off-by: Dmitry Osipenko 
---
  hw/display/virtio-gpu-gl.c |  6 --
  hw/display/virtio-gpu-virgl.c  | 33 +
  include/hw/virtio/virtio-gpu.h |  4 +++-
  3 files changed, 28 insertions(+), 15 deletions(-)

diff --git a/hw/display/virtio-gpu-gl.c b/hw/display/virtio-gpu-gl.c
index 95806999189e..431fc2881a00 100644
--- a/hw/display/virtio-gpu-gl.c
+++ b/hw/display/virtio-gpu-gl.c
@@ -124,8 +124,8 @@ static void virtio_gpu_gl_device_realize(DeviceState *qdev, 
Error **errp)
  }
  
  g->parent_obj.conf.flags |= (1 << VIRTIO_GPU_FLAG_VIRGL_ENABLED);

-VIRTIO_GPU_BASE(g)->virtio_config.num_capsets =
-virtio_gpu_virgl_get_num_capsets(g);
+g->capset_ids = virtio_gpu_virgl_get_capsets(g);
+VIRTIO_GPU_BASE(g)->virtio_config.num_capsets = g->capset_ids->len;
  
  #ifdef HAVE_VIRGL_CONTEXT_CREATE_WITH_FLAGS

  g->parent_obj.conf.flags |= 1 << VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLED;
@@ -148,6 +148,8 @@ static void virtio_gpu_gl_device_unrealize(DeviceState 
*qdev)
  if (gl->renderer_inited) {
  virtio_gpu_virgl_deinit(g);
  }
+
+g_array_unref(g->capset_ids);
  }
  
  static void virtio_gpu_gl_class_init(ObjectClass *klass, void *data)

diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
index 3f2e406be3a4..135974431492 100644
--- a/hw/display/virtio-gpu-virgl.c
+++ b/hw/display/virtio-gpu-virgl.c
@@ -590,19 +590,13 @@ static void virgl_cmd_get_capset_info(VirtIOGPU *g,
  VIRTIO_GPU_FILL_CMD(info);
  
memset(&resp, 0, sizeof(resp));

-if (info.capset_index == 0) {
-resp.capset_id = VIRTIO_GPU_CAPSET_VIRGL;
-virgl_renderer_get_cap_set(resp.capset_id,
-   &resp.capset_max_version,
-   &resp.capset_max_size);
-} else if (info.capset_index == 1) {
-resp.capset_id = VIRTIO_GPU_CAPSET_VIRGL2;
+
+if (info.capset_index < g->capset_ids->len) {
+resp.capset_id = g_array_index(g->capset_ids, uint32_t,
+   info.capset_index);
  virgl_renderer_get_cap_set(resp.capset_id,
 &resp.capset_max_version,
 &resp.capset_max_size);
-} else {
-resp.capset_max_version = 0;
-resp.capset_max_size = 0;
  }
  resp.hdr.type = VIRTIO_GPU_RESP_OK_CAPSET_INFO;
virtio_gpu_ctrl_response(g, cmd, &resp, sizeof(resp));
@@ -1123,14 +1117,29 @@ int virtio_gpu_virgl_init(VirtIOGPU *g)
  return 0;
  }
  
-int virtio_gpu_virgl_get_num_capsets(VirtIOGPU *g)

+static void virtio_gpu_virgl_add_capset(GArray *capset_ids, uint32_t capset_id)
+{
+g_array_append_val(capset_ids, capset_id);
+}


Is it worthwhile to have a function for this?



Re: [PATCH v11 08/10] virtio-gpu: Handle resource blob commands

2024-05-13 Thread Akihiko Odaki

On 2024/05/12 3:22, Dmitry Osipenko wrote:

From: Antonio Caggiano 

Support BLOB resources creation, mapping and unmapping by calling the
new stable virglrenderer 0.10 interface. Only enabled when available and
via the blob config. E.g. -device virtio-vga-gl,blob=true

Signed-off-by: Antonio Caggiano 
Signed-off-by: Xenia Ragiadakou 
Signed-off-by: Huang Rui 
Signed-off-by: Dmitry Osipenko 
---
  hw/display/virtio-gpu-virgl.c  | 274 -
  hw/display/virtio-gpu.c|   4 +-
  include/hw/virtio/virtio-gpu.h |   2 +
  3 files changed, 277 insertions(+), 3 deletions(-)

diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
index 524f15220b7f..3f2e406be3a4 100644
--- a/hw/display/virtio-gpu-virgl.c
+++ b/hw/display/virtio-gpu-virgl.c
@@ -26,6 +26,7 @@
  
  struct virtio_gpu_virgl_resource {

  struct virtio_gpu_simple_resource base;
+MemoryRegion *mr;
  };
  
  static struct virtio_gpu_virgl_resource *

@@ -49,6 +50,117 @@ virgl_get_egl_display(G_GNUC_UNUSED void *cookie)
  }
  #endif
  
+#ifdef HAVE_VIRGL_RESOURCE_BLOB

+struct virtio_gpu_virgl_hostmem_region {
+MemoryRegion mr;
+struct VirtIOGPU *g;
+struct virtio_gpu_virgl_resource *res;
+};
+
+static void virtio_gpu_virgl_resume_cmdq_bh(void *opaque)
+{
+VirtIOGPU *g = opaque;
+
+virtio_gpu_process_cmdq(g);
+}
+
+static void virtio_gpu_virgl_hostmem_region_free(void *obj)
+{
+MemoryRegion *mr = MEMORY_REGION(obj);
+struct virtio_gpu_virgl_hostmem_region *vmr;
+VirtIOGPUBase *b;
+
+vmr = container_of(mr, struct virtio_gpu_virgl_hostmem_region, mr);
+vmr->res->mr = NULL;
+
+b = VIRTIO_GPU_BASE(vmr->g);
+b->renderer_blocked--;
+
+/*
+ * memory_region_unref() is executed from RCU thread context, while
+ * virglrenderer works only on the main-loop thread that's holding GL
+ * context.
+ */
+qemu_bh_schedule(vmr->g->cmdq_resume_bh);
+g_free(vmr);
+}
+
+static int
+virtio_gpu_virgl_map_resource_blob(VirtIOGPU *g,
+   struct virtio_gpu_virgl_resource *res,
+   uint64_t offset)
+{
+struct virtio_gpu_virgl_hostmem_region *vmr;
+VirtIOGPUBase *b = VIRTIO_GPU_BASE(g);
+MemoryRegion *mr;
+uint64_t size;
+void *data;
+int ret;
+
+if (!virtio_gpu_hostmem_enabled(b->conf)) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: hostmem disabled\n", __func__);
+return -EOPNOTSUPP;
+}
+
+ret = virgl_renderer_resource_map(res->base.resource_id, &data, &size);
+if (ret) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: failed to map virgl resource: 
%s\n",
+  __func__, strerror(-ret));
+return ret;
+}
+
+vmr = g_new0(struct virtio_gpu_virgl_hostmem_region, 1);
+vmr->res = res;
+vmr->g = g;
+
+mr = &vmr->mr;
+memory_region_init_ram_ptr(mr, OBJECT(mr), "blob", size, data);
+memory_region_add_subregion(&b->hostmem, offset, mr);
+memory_region_set_enabled(mr, true);
+
+/*
+ * MR could outlive the resource if MR's reference is held outside of
+ * virtio-gpu. In order to prevent unmapping resource while MR is alive,
+ * and thus, making the data pointer invalid, we will block virtio-gpu
+ * command processing until MR is fully unreferenced and freed.
+ */
+OBJECT(mr)->free = virtio_gpu_virgl_hostmem_region_free;
+
+res->mr = mr;
+
+return 0;
+}
+
+static int
+virtio_gpu_virgl_async_unmap_resource_blob(VirtIOGPU *g,
+   struct virtio_gpu_virgl_resource 
*res)
+{
+VirtIOGPUBase *b = VIRTIO_GPU_BASE(g);
+MemoryRegion *mr = res->mr;
+int ret;
+
+if (mr) {
+/* render will be unblocked once MR is freed */
+b->renderer_blocked++;
+
+/* memory region owns self res->mr object and frees it by itself */
+memory_region_set_enabled(mr, false);
+memory_region_del_subregion(&b->hostmem, mr);
+object_unparent(OBJECT(mr));
+} else {
+ret = virgl_renderer_resource_unmap(res->base.resource_id);
+if (ret) {
+qemu_log_mask(LOG_GUEST_ERROR,
+  "%s: failed to unmap virgl resource: %s\n",
+  __func__, strerror(-ret));
+return ret;
+}
+}
+
+return 0;
+}
+#endif /* HAVE_VIRGL_RESOURCE_BLOB */
+
  static void virgl_cmd_create_resource_2d(VirtIOGPU *g,
   struct virtio_gpu_ctrl_command *cmd)
  {
@@ -146,7 +258,8 @@ static void virgl_cmd_create_resource_3d(VirtIOGPU *g,
  }
  
  static void virgl_cmd_resource_unref(VirtIOGPU *g,

- struct virtio_gpu_ctrl_command *cmd)
+ struct virtio_gpu_ctrl_command *cmd,
+ bool *cmd_suspended)


This parameter is not used as it returns an error if the resource is 
still mapped.


It may be better to actually 

Re: [PATCH v11 01/10] virtio-gpu: Unrealize GL device

2024-05-13 Thread Akihiko Odaki

On 2024/05/12 3:22, Dmitry Osipenko wrote:

Even though GL GPU doesn't support hotplugging today, free virgl
resources when GL device is unrealized. For consistency.

Signed-off-by: Dmitry Osipenko 
---
  hw/display/virtio-gpu-gl.c | 11 +++
  hw/display/virtio-gpu-virgl.c  |  9 +
  include/hw/virtio/virtio-gpu.h |  1 +
  3 files changed, 21 insertions(+)

diff --git a/hw/display/virtio-gpu-gl.c b/hw/display/virtio-gpu-gl.c
index e06be60dfbfc..0c0a8d136954 100644
--- a/hw/display/virtio-gpu-gl.c
+++ b/hw/display/virtio-gpu-gl.c
@@ -136,6 +136,16 @@ static Property virtio_gpu_gl_properties[] = {
  DEFINE_PROP_END_OF_LIST(),
  };
  
+static void virtio_gpu_gl_device_unrealize(DeviceState *qdev)

+{
+VirtIOGPU *g = VIRTIO_GPU(qdev);
+VirtIOGPUGL *gl = VIRTIO_GPU_GL(qdev);
+
+if (gl->renderer_inited) {
+virtio_gpu_virgl_deinit(g);
+}
+}
+
  static void virtio_gpu_gl_class_init(ObjectClass *klass, void *data)
  {
  DeviceClass *dc = DEVICE_CLASS(klass);
@@ -149,6 +159,7 @@ static void virtio_gpu_gl_class_init(ObjectClass *klass, 
void *data)
  vgc->update_cursor_data = virtio_gpu_gl_update_cursor_data;
  
  vdc->realize = virtio_gpu_gl_device_realize;

+vdc->unrealize = virtio_gpu_gl_device_unrealize;
  vdc->reset = virtio_gpu_gl_reset;
  device_class_set_props(dc, virtio_gpu_gl_properties);
  }
diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
index 9f34d0e6619c..b0500eccf8e0 100644
--- a/hw/display/virtio-gpu-virgl.c
+++ b/hw/display/virtio-gpu-virgl.c
@@ -665,3 +665,12 @@ int virtio_gpu_virgl_get_num_capsets(VirtIOGPU *g)
  
  return capset2_max_ver ? 2 : 1;

  }
+
+void virtio_gpu_virgl_deinit(VirtIOGPU *g)
+{
+if (g->fence_poll) {


Isn't g->fence_poll always non-NULL when this function is called?


+timer_free(g->fence_poll);
+}
+
+virgl_renderer_cleanup(NULL);
+}
diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
index ed44cdad6b34..b657187159d9 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -336,6 +336,7 @@ void virtio_gpu_virgl_fence_poll(VirtIOGPU *g);
  void virtio_gpu_virgl_reset_scanout(VirtIOGPU *g);
  void virtio_gpu_virgl_reset(VirtIOGPU *g);
  int virtio_gpu_virgl_init(VirtIOGPU *g);
+void virtio_gpu_virgl_deinit(VirtIOGPU *g);
  int virtio_gpu_virgl_get_num_capsets(VirtIOGPU *g);
  
  #endif




Re: [PATCH v8 07/11] virtio-gpu: Support suspension of commands processing

2024-05-10 Thread Akihiko Odaki

On 2024/05/09 21:39, Dmitry Osipenko wrote:

On 5/5/24 09:37, Akihiko Odaki wrote:

On 2024/05/02 4:02, Dmitry Osipenko wrote:

On 4/27/24 08:48, Akihiko Odaki wrote:


The VIRTIO_GPU_FILL_CMD() macro returns void and this macro is used by
every function processing commands. Changing process_cmd() to return
bool will require to change all those functions. Not worthwhile to
change it, IMO. >
The flag reflects the exact command status. The !finished + !suspended
means that command is fenced, i.e. these flags don't have exactly same
meaning.


It is not necessary to change the signature of process_cmd(). You can
just refer to !finished. No need to have the suspended flag.


Not sure what you're meaning. The !finished says that cmd is fenced,
this fenced command is added to the polling list and the fence is
checked periodically by the fence_poll timer, meanwhile next virgl
commands are executed in the same time.

This is completely different from the suspension where whole cmd
processing is blocked until command is resumed.



!finished means you have not sent a response with
virtio_gpu_ctrl_response(). Currently such a situation only happens when
a fence is requested and virtio_gpu_process_cmdq() exploits the fact,
but we are adding a new case without a fence.

So we need to add code to check if we are fencing or not in
virtio_gpu_process_cmdq(). This can be achieved by evaluating the
following expression as done in virtio_gpu_virgl_process_cmd():
cmd->cmd_hdr.flags & VIRTIO_GPU_FLAG_FENCE


This works, but then I'll add back the res->async_unmap_in_progress
because we need to know whether unmapping has been started.



Isn't the command processing paused when an unmapping operation is in 
progress?




Re: [PATCH] qemu-keymap: Free xkb allocations

2024-05-05 Thread Akihiko Odaki

On 2024/05/05 19:24, Michael Tokarev wrote:

01.05.2024 10:55, Akihiko Odaki wrote:

This fixes LeakSanitizer complaints with xkbcommon 1.6.0.

Signed-off-by: Akihiko Odaki 
---
  qemu-keymap.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/qemu-keymap.c b/qemu-keymap.c
index 8c80f7a4ed65..7a9f38cf9863 100644
--- a/qemu-keymap.c
+++ b/qemu-keymap.c
@@ -237,6 +237,9 @@ int main(int argc, char *argv[])
  xkb_state_unref(state);
  state = NULL;
+    xkb_keymap_unref(map);
+    xkb_context_unref(ctx);
+


I'd avoid freeing any resources in main() entirely,
since it's much cheaper to free whole process by the
kernel at exit time than to mess with each chunk of
allocated memory.  Dunno how useful it is to "fix"
these.

/mjt


This is purely to satisfy LeakSanitizer; the LeakSanitizer complaints
result in a build failure with many noisy logs. I don't add Fixes: tags
for this kind of leak.


Regards,
Akihiko Odaki



Re: [PATCH v8 08/11] virtio-gpu: Handle resource blob commands

2024-05-05 Thread Akihiko Odaki

On 2024/05/02 4:20, Dmitry Osipenko wrote:

On 4/27/24 08:52, Akihiko Odaki wrote:

On 2024/04/24 19:30, Dmitry Osipenko wrote:

On 4/19/24 12:18, Akihiko Odaki wrote:

@@ -61,6 +61,10 @@ struct virtio_gpu_simple_resource {
    int dmabuf_fd;
    uint8_t *remapped;
    +    MemoryRegion *mr;
+    bool async_unmap_completed;
+    bool async_unmap_in_progress;
+


Don't add fields to virtio_gpu_simple_resource but instead create a
struct that embeds virtio_gpu_simple_resource in virtio-gpu-virgl.c.


Please give a justification. I'd rather rename
virtio_gpu_simple_resource s/_simple//. Simple resource already supports
blob and the added fields are directly related to the blob. Don't see
why another struct is needed.



Because mapping is only implemented in virtio-gpu-gl while blob itself
is implemented also in virtio-gpu.


Rutabaga maps blobs and it should unmap blobs asynchronously as
well, AFAICT.



Right. It makes sense to put mr in struct virtio_gpu_simple_resource in 
preparation for such a situation.


Based on this discussion, I think it is fine to put mr either in struct
virtio_gpu_simple_resource or in a distinct struct. However, if you put mr
in struct virtio_gpu_simple_resource, the logic that manages the
MemoryRegion should also be moved to virtio-gpu.c for consistency.




Re: [PATCH v8 07/11] virtio-gpu: Support suspension of commands processing

2024-05-05 Thread Akihiko Odaki

On 2024/05/02 4:02, Dmitry Osipenko wrote:

On 4/27/24 08:48, Akihiko Odaki wrote:


The VIRTIO_GPU_FILL_CMD() macro returns void and this macro is used by
every function processing commands. Changing process_cmd() to return
bool will require to change all those functions. Not worthwhile to
change it, IMO. >
The flag reflects the exact command status. The !finished + !suspended
means that command is fenced, i.e. these flags don't have exactly same
meaning.


It is not necessary to change the signature of process_cmd(). You can
just refer to !finished. No need to have the suspended flag.


Not sure what you're meaning. The !finished says that cmd is fenced,
this fenced command is added to the polling list and the fence is
checked periodically by the fence_poll timer, meanwhile next virgl
commands are executed in the same time.

This is completely different from the suspension where whole cmd
processing is blocked until command is resumed.



!finished means you have not sent a response with 
virtio_gpu_ctrl_response(). Currently such a situation only happens when 
a fence is requested and virtio_gpu_process_cmdq() exploits the fact, 
but we are adding a new case without a fence.


So we need to add code to check if we are fencing or not in 
virtio_gpu_process_cmdq(). This can be achieved by evaluating the 
following expression as done in virtio_gpu_virgl_process_cmd():

cmd->cmd_hdr.flags & VIRTIO_GPU_FLAG_FENCE
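
A minimal sketch of that check as it could sit in virtio_gpu_process_cmdq()
(illustrative only; the surrounding loop and queue handling stay as in the
existing function):

/* Sketch: right after vgc->process_cmd(g, cmd) has run. A command that is
 * neither finished nor fenced is treated as suspended, so stop processing
 * and leave it queued; a fenced command keeps the current behaviour. */
if (!cmd->finished && !(cmd->cmd_hdr.flags & VIRTIO_GPU_FLAG_FENCE)) {
    break;
}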



Re: [PATCH v9 09/11] virtio-gpu: Register capsets dynamically

2024-05-03 Thread Akihiko Odaki

On 2024/05/02 4:52, Dmitry Osipenko wrote:

On 5/1/24 22:38, Dmitry Osipenko wrote:

On 5/1/24 22:31, Dmitry Osipenko wrote:

On 4/27/24 10:12, Akihiko Odaki wrote:

   int virtio_gpu_virgl_get_num_capsets(VirtIOGPU *g)
   {
   uint32_t capset2_max_ver, capset2_max_size;
+
+    if (g->capset_ids) {


Move capset_ids initialization to virtio_gpu_virgl_init() to save this
conditional.


Capsets are used before virgl is initialized. At first the guest queries the
virtio device features and then enables virgl only if a capset is available,
while virgl itself is initialized when the first virtio command is
processed. I.e. it's not possible to move this to virtio_gpu_virgl_init.


Though no, capsets aren't part of device features. I'll move it to
virtio_gpu_virgl_init, thanks.



The number of capsets actually is part of the generic virtio device config
descriptor. Capset initialization can't be moved without probing the
capsets twice, i.e. it's not worthwhile.



I see. Then I suggest replacing virtio_gpu_virgl_get_num_capsets() with 
a function that returns a GArray of capset IDs.
virtio_gpu_gl_device_realize() will assign the returned GArray to 
g->capset_ids. virtio_gpu_gl_device_unrealize(), which doesn't exist 
yet, will free g->capset_ids later.


This way, you won't need the conditional, and it will be clear that a 
GArray allocation happens in virtio_gpu_gl_device_realize() and is 
matched with the deallocation in virtio_gpu_gl_device_unrealize().
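
A minimal sketch of that shape (the function name follows the suggestion
above; the eventual patch may differ in details):

/* Sketch: probe the capsets once and return a GArray owned by the caller. */
GArray *virtio_gpu_virgl_get_capsets(VirtIOGPU *g)
{
    uint32_t capset2_max_ver, capset2_max_size;
    uint32_t id = VIRTIO_GPU_CAPSET_VIRGL;
    GArray *capset_ids = g_array_new(false, false, sizeof(uint32_t));

    g_array_append_val(capset_ids, id);

    virgl_renderer_get_cap_set(VIRTIO_GPU_CAPSET_VIRGL2,
                               &capset2_max_ver, &capset2_max_size);
    if (capset2_max_ver) {
        id = VIRTIO_GPU_CAPSET_VIRGL2;
        g_array_append_val(capset_ids, id);
    }

    return capset_ids;
}

virtio_gpu_gl_device_realize() would then do
g->capset_ids = virtio_gpu_virgl_get_capsets(g); and
virtio_gpu_gl_device_unrealize() would call g_array_unref(g->capset_ids);.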




[PATCH] qemu-keymap: Free xkb allocations

2024-05-01 Thread Akihiko Odaki
This fixes LeakSanitizer complaints with xkbcommon 1.6.0.

Signed-off-by: Akihiko Odaki 
---
 qemu-keymap.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/qemu-keymap.c b/qemu-keymap.c
index 8c80f7a4ed65..7a9f38cf9863 100644
--- a/qemu-keymap.c
+++ b/qemu-keymap.c
@@ -237,6 +237,9 @@ int main(int argc, char *argv[])
 xkb_state_unref(state);
 state = NULL;
 
+xkb_keymap_unref(map);
+xkb_context_unref(ctx);
+
 /* add quirks */
 fprintf(outfile,
 "\n"

---
base-commit: c25df57ae8f9fe1c72eee2dab37d76d904ac382e
change-id: 20240501-xkb-258483ccc5d8

Best regards,
-- 
Akihiko Odaki 




Re: [PATCH 0/3] virtio-net: Convert feature properties to OnOffAuto

2024-05-01 Thread Akihiko Odaki

On 2024/04/29 16:05, Michael S. Tsirkin wrote:

On Sun, Apr 28, 2024 at 04:21:06PM +0900, Akihiko Odaki wrote:

Based-on: <20240428-rss-v10-0-73cbaa91a...@daynix.com>
("[PATCH v10 00/18] virtio-net RSS/hash report fixes and improvements")

Some features are not always available, and virtio-net used to disable
them when not available even if the corresponding properties were
explicitly set to "on".

Convert feature properties to OnOffAuto so that the user can explicitly
tell QEMU to automatically select the value by setting them "auto".
QEMU will give an error if they are set "on".

Signed-off-by: Akihiko Odaki 


Should we maybe bite the bullet and allow "auto" for all binary/boolean
properties? Just ignore "auto" if no one cares ATM.


It is not always obvious whether "auto" should be considered as "on" or 
"off" for existing boolean properties. The properties this patch deals 
with are to enable features so "auto" should be considered as "on" if 
possible. However, other properties may mean to disable features, and in 
such a case, "auto" should be considered as "off".


It may still make sense to accept "auto" for all virtio-net feature bits 
for consistency. In particular, I left the guest_rsc_ext property boolean
since nobody cares about "auto" for that feature, but it can be converted to
OnOffAuto.




Re: [PATCH 2/3] virtio-net: Convert feature properties to OnOffAuto

2024-05-01 Thread Akihiko Odaki

On 2024/05/01 0:02, Yuri Benditovich wrote:

Question:
How will libvirt (as an example) work with this change? In the
existing semantics of a libvirt profile, "on" means "on if possible";
will using an existing profile after a qemu update still use "on", now
meaning "force"? Typically this is solved by the machine type - if libvirt uses
'machine=pc-q35-8.1' this will be backward-compatible.
How will this change be accepted?


The reasoning here is that this patch only changes the configuration 
validation and, once the validation passes, QEMU can load the state of a 
machine created with an old version of QEMU.


It is still a good idea to add a compatibility property. I'll do so in 
the next version.
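
For the record, the mechanism would look roughly like the sketch below.
The knob name is invented purely for illustration, and which hw_compat_*
section it belongs to depends on the release the change lands in:

/* hw/core/machine.c (sketch): let old machine types keep the previous
 * "on if possible" behaviour via a hypothetical per-device compat knob. */
GlobalProperty hw_compat_9_0[] = {
    /* ... existing entries ... */
    { "virtio-net-device", "x-lenient-features", "on" },
};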




Re: [PATCH 1/3] qdev-properties: Add DEFINE_PROP_ON_OFF_AUTO_BIT64()

2024-05-01 Thread Akihiko Odaki

On 2024/04/30 23:41, Yuri Benditovich wrote:

On Sun, Apr 28, 2024 at 10:21 AM Akihiko Odaki  wrote:


DEFINE_PROP_ON_OFF_AUTO_BIT64() corresponds to DEFINE_PROP_ON_OFF_AUTO()
as DEFINE_PROP_BIT64() corresponds to DEFINE_PROP_BOOL(). The difference
is that DEFINE_PROP_ON_OFF_AUTO_BIT64() exposes OnOffAuto instead of
bool.

Signed-off-by: Akihiko Odaki 
---
  include/hw/qdev-properties.h | 18 
  hw/core/qdev-properties.c| 65 +++-
  2 files changed, 82 insertions(+), 1 deletion(-)

diff --git a/include/hw/qdev-properties.h b/include/hw/qdev-properties.h
index 09aa04ca1e27..afec53a48470 100644
--- a/include/hw/qdev-properties.h
+++ b/include/hw/qdev-properties.h
@@ -43,11 +43,22 @@ struct PropertyInfo {
  ObjectPropertyRelease *release;
  };

+/**
+ * struct OnOffAutoBit64 - OnOffAuto storage with 64 elements.
+ * @on_bits: Bitmap of elements with "on".
+ * @auto_bits: Bitmap of elements with "auto".
+ */
+typedef struct OnOffAutoBit64 {
+uint64_t on_bits;
+uint64_t auto_bits;
+} OnOffAutoBit64;
+

  /*** qdev-properties.c ***/

  extern const PropertyInfo qdev_prop_bit;
  extern const PropertyInfo qdev_prop_bit64;
+extern const PropertyInfo qdev_prop_on_off_auto_bit64;
  extern const PropertyInfo qdev_prop_bool;
  extern const PropertyInfo qdev_prop_enum;
  extern const PropertyInfo qdev_prop_uint8;
@@ -100,6 +111,13 @@ extern const PropertyInfo qdev_prop_link;
  .set_default = true,  \
  .defval.u  = (bool)_defval)

+#define DEFINE_PROP_ON_OFF_AUTO_BIT64(_name, _state, _field, _bit, _defval) \
+DEFINE_PROP(_name, _state, _field, qdev_prop_on_off_auto_bit64, \
+OnOffAutoBit64, \
+.bitnr= (_bit), \
+.set_default = true,\
+.defval.i = (OnOffAuto)_defval)
+
  #define DEFINE_PROP_BOOL(_name, _state, _field, _defval) \
  DEFINE_PROP(_name, _state, _field, qdev_prop_bool, bool, \
  .set_default = true, \
diff --git a/hw/core/qdev-properties.c b/hw/core/qdev-properties.c
index 7d6fa726fdf2..b96f54a1b912 100644
--- a/hw/core/qdev-properties.c
+++ b/hw/core/qdev-properties.c
@@ -188,7 +188,8 @@ const PropertyInfo qdev_prop_bit = {

  static uint64_t qdev_get_prop_mask64(Property *prop)
  {
-assert(prop->info == &qdev_prop_bit64);
+assert(prop->info == &qdev_prop_bit64 ||
+   prop->info == &qdev_prop_on_off_auto_bit64);
  return 0x1ull << prop->bitnr;
  }

@@ -233,6 +234,68 @@ const PropertyInfo qdev_prop_bit64 = {
  .set_default_value = set_default_value_bool,
  };

+static void prop_get_on_off_auto_bit64(Object *obj, Visitor *v,
+   const char *name, void *opaque,
+   Error **errp)
+{
+Property *prop = opaque;
+OnOffAutoBit64 *p = object_field_prop_ptr(obj, prop);
+int value;
+uint64_t mask = qdev_get_prop_mask64(prop);
+
+if (p->auto_bits & mask) {
+value = ON_OFF_AUTO_AUTO;
+} else if (p->on_bits & mask) {
+value = ON_OFF_AUTO_ON;
+} else {
+value = ON_OFF_AUTO_OFF;
+}
+
+visit_type_enum(v, name, &value, &OnOffAuto_lookup, errp);
+}
+
+static void prop_set_on_off_auto_bit64(Object *obj, Visitor *v,
+   const char *name, void *opaque,
+   Error **errp)
+{
+Property *prop = opaque;
+OnOffAutoBit64 *p = object_field_prop_ptr(obj, prop);
+int value;
+uint64_t mask = qdev_get_prop_mask64(prop);
+
+if (!visit_type_enum(v, name, &value, &OnOffAuto_lookup, errp)) {
+return;
+}
+
+switch (value) {
+case ON_OFF_AUTO_AUTO:
+p->on_bits &= ~mask;
+p->auto_bits |= mask;
+break;
+
+case ON_OFF_AUTO_ON:
+p->on_bits |= mask;
+p->auto_bits &= ~mask;
+break;
+
+case ON_OFF_AUTO_OFF:
+p->on_bits &= ~mask;
+p->auto_bits &= ~mask;
+break;
+}
+}
+
+const PropertyInfo qdev_prop_on_off_auto_bit64 = {
+.name  = "bool",


Does it mean that the name of this tristate type is "bool"? Or I miss something?


No, this should be OnOffAuto. Thanks for pointing this out; I'll fix
it in the next version.
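
For reference, the corrected definition would presumably read as below
(also dropping the duplicated .get/.set designated initializers that
appear in the posted hunk):

const PropertyInfo qdev_prop_on_off_auto_bit64 = {
    .name  = "OnOffAuto",
    .description = "on/off/auto",
    .enum_table = &OnOffAuto_lookup,
    .get = prop_get_on_off_auto_bit64,
    .set = prop_set_on_off_auto_bit64,
    .set_default_value = qdev_propinfo_set_default_value_enum,
};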




Re: [PATCH 1/1] Fix safeAreaInsets not being available on OS X 10.13

2024-04-28 Thread Akihiko Odaki

On 2024/04/28 20:57, Tobias Markus wrote:

Hello,

to prefix this: I previously filed 
https://gitlab.com/qemu-project/qemu/-/issues/2314 for this compilation 
error and I'm quite aware that QEMU only supports the most recent two 
versions of Mac OS X by default. However, given the small change 
required for this to work, I hope you can make an exception and include 
the attached patch into QEMU.


It would really help me continue to run QEMU on my legacy MacOS X 10.13 
version.


Hi,

I'm sorry to say this, but I'm not in favor of making an exception.

While this change is indeed small, we already have a lot of code that is
only needed by unsupported macOS versions. It was not added to keep
QEMU working with unsupported macOS versions; it was added when
those versions were still supported. Maintaining all of it is not a
trivial task, so it is now eligible for removal.


Regards,
Akihiko Odaki



Thanks in advance!

Kind regards,
Tobias Markus

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2314
Signed-off-by: Tobias Markus 
---
  ui/cocoa.m | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/ui/cocoa.m b/ui/cocoa.m
index 25e0db9dd0..96992736ef 100644
--- a/ui/cocoa.m
+++ b/ui/cocoa.m
@@ -539,9 +539,11 @@ - (NSSize)fixAspectRatio:(NSSize)max
  - (NSSize) screenSafeAreaSize
  {
  NSSize size = [[[self window] screen] frame].size;
+#if MAC_OS_X_VERSION_MAX_ALLOWED >= MAC_OS_VERSION_14_0
  NSEdgeInsets insets = [[[self window] screen] safeAreaInsets];
  size.width -= insets.left + insets.right;
  size.height -= insets.top + insets.bottom;
+#endif
  return size;
  }





[PATCH 1/2] util/iov: Do not assert offset is in iov

2024-04-28 Thread Akihiko Odaki
iov_from_buf(), iov_to_buf(), iov_memset(), and iov_copy() assert
that the given offset fits in the iov while tolerating the specified
number of bytes to operate on being greater than the size of the iov.
This is inconsistent so remove the assertions.

Asserting the offset fits in the iov makes sense if it is expected that
there are other operations that process the content before the offset
and the content is processed in order. Under this expectation, the
offset should point to the end of bytes that were previously processed
and fit in the iov. However, this expectation depends on the details of
the caller, did not hold true in at least one case, and required code to
check iov_size(), which was added with commit 83ddb3dbba2e
("hw/net/net_tx_pkt: Fix overrun in update_sctp_checksum()").

Adding such a check is inefficient and error-prone. These functions
already tolerate the specified number of bytes being greater than the
size of the iov to avoid such checks, so remove the assertions to
tolerate an invalid offset as well. They return the number of bytes
they operated on, so their callers can still check the returned
value to ensure there is sufficient space at the given offset.

Signed-off-by: Akihiko Odaki 
---
 include/qemu/iov.h | 5 +++--
 util/iov.c | 5 -
 2 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/include/qemu/iov.h b/include/qemu/iov.h
index 63a1c01965d1..33548058d2ee 100644
--- a/include/qemu/iov.h
+++ b/include/qemu/iov.h
@@ -30,7 +30,7 @@ size_t iov_size(const struct iovec *iov, const unsigned int 
iov_cnt);
  * only part of data will be copied, up to the end of the iovec.
  * Number of bytes actually copied will be returned, which is
  *  min(bytes, iov_size(iov)-offset)
- * `Offset' must point to the inside of iovec.
+ * Returns 0 when `offset' points to the outside of iovec.
  */
 size_t iov_from_buf_full(const struct iovec *iov, unsigned int iov_cnt,
  size_t offset, const void *buf, size_t bytes);
@@ -66,11 +66,12 @@ iov_to_buf(const struct iovec *iov, const unsigned int 
iov_cnt,
 /**
  * Set data bytes pointed out by iovec `iov' of size `iov_cnt' elements,
  * starting at byte offset `start', to value `fillc', repeating it
- * `bytes' number of times.  `Offset' must point to the inside of iovec.
+ * `bytes' number of times.
  * If `bytes' is large enough, only last bytes portion of iovec,
  * up to the end of it, will be filled with the specified value.
  * Function return actual number of bytes processed, which is
  * min(size, iov_size(iov) - offset).
+ * Returns 0 when `offset' points to the outside of iovec.
  */
 size_t iov_memset(const struct iovec *iov, const unsigned int iov_cnt,
   size_t offset, int fillc, size_t bytes);
diff --git a/util/iov.c b/util/iov.c
index 7e73948f5e3d..a523b406b7f8 100644
--- a/util/iov.c
+++ b/util/iov.c
@@ -36,7 +36,6 @@ size_t iov_from_buf_full(const struct iovec *iov, unsigned 
int iov_cnt,
 offset -= iov[i].iov_len;
 }
 }
-assert(offset == 0);
 return done;
 }
 
@@ -55,7 +54,6 @@ size_t iov_to_buf_full(const struct iovec *iov, const 
unsigned int iov_cnt,
 offset -= iov[i].iov_len;
 }
 }
-assert(offset == 0);
 return done;
 }
 
@@ -74,7 +72,6 @@ size_t iov_memset(const struct iovec *iov, const unsigned int 
iov_cnt,
 offset -= iov[i].iov_len;
 }
 }
-assert(offset == 0);
 return done;
 }
 
@@ -266,7 +263,6 @@ unsigned iov_copy(struct iovec *dst_iov, unsigned int 
dst_iov_cnt,
 bytes -= len;
 offset = 0;
 }
-assert(offset == 0);
 return j;
 }
 
@@ -337,7 +333,6 @@ size_t qemu_iovec_concat_iov(QEMUIOVector *dst,
 soffset -= src_iov[i].iov_len;
 }
 }
-assert(soffset == 0); /* offset beyond end of src */
 
 return done;
 }

-- 
2.44.0




[PATCH 2/2] Revert "hw/net/net_tx_pkt: Fix overrun in update_sctp_checksum()"

2024-04-28 Thread Akihiko Odaki
This reverts commit 83ddb3dbba2ee0f1767442ae6ee665058aeb1093.

The added check is no longer necessary due to a change of
iov_from_buf().

Signed-off-by: Akihiko Odaki 
---
 hw/net/net_tx_pkt.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/hw/net/net_tx_pkt.c b/hw/net/net_tx_pkt.c
index b7b1de816dc5..2134a18c4c90 100644
--- a/hw/net/net_tx_pkt.c
+++ b/hw/net/net_tx_pkt.c
@@ -141,10 +141,6 @@ bool net_tx_pkt_update_sctp_checksum(struct NetTxPkt *pkt)
 uint32_t csum = 0;
 struct iovec *pl_start_frag = pkt->vec + NET_TX_PKT_PL_START_FRAG;
 
-if (iov_size(pl_start_frag, pkt->payload_frags) < 8 + sizeof(csum)) {
-return false;
-}
-
-if (iov_from_buf(pl_start_frag, pkt->payload_frags, 8, &csum, 
sizeof(csum)) < sizeof(csum)) {
 return false;
 }

-- 
2.44.0




[PATCH 0/2] util/iov: Do not assert offset is in iov

2024-04-28 Thread Akihiko Odaki
iov_from_buf(), iov_to_buf(), iov_memset(), and iov_copy() assert
that the given offset fits in the iov while tolerating the specified
number of bytes to operate on being greater than the size of the iov.
This is inconsistent so remove the assertions.

Asserting the offset fits in the iov makes sense if it is expected that
there are other operations that process the content before the offset
and the content is processed in order. Under this expectation, the
offset should point to the end of bytes that were previously processed
and fit in the iov. However, this expectation depends on the details of
the caller, did not hold true in at least one case, and required code to
check iov_size(), which was added with commit 83ddb3dbba2e
("hw/net/net_tx_pkt: Fix overrun in update_sctp_checksum()").

Adding such a check is inefficient and error-prone. These functions
already tolerate the specified number of bytes being greater than the
size of the iov to avoid such checks, so remove the assertions to
tolerate an invalid offset as well. They return the number of bytes
they operated on, so their callers can still check the returned
value to ensure there is sufficient space at the given offset.

Signed-off-by: Akihiko Odaki 
---
Akihiko Odaki (2):
  util/iov: Do not assert offset is in iov
  Revert "hw/net/net_tx_pkt: Fix overrun in update_sctp_checksum()"

 include/qemu/iov.h  | 5 +++--
 hw/net/net_tx_pkt.c | 4 
 util/iov.c  | 5 -
 3 files changed, 3 insertions(+), 11 deletions(-)
---
base-commit: fd87be1dada5672f877e03c2ca8504458292c479
change-id: 20240428-iov-a317b02601dc

Best regards,
-- 
Akihiko Odaki 
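
As a usage illustration of the intended contract (the same pattern the
net_tx_pkt.c caller relies on once the iov_size() pre-check is reverted;
pl_start_frag and payload_frags are assumed to describe the payload iovec):

/* Sketch: write a 32-bit value at offset 8 of the payload iov and rely on
 * iov_from_buf()'s return value; a short write, including the case where
 * the offset is past the end of the iov, is reported to the caller. */
uint32_t csum = 0;

if (iov_from_buf(pl_start_frag, payload_frags, 8,
                 &csum, sizeof(csum)) < sizeof(csum)) {
    return false;
}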




[PATCH RFC v4 1/7] hw/pci: Do not add ROM BAR for SR-IOV VF

2024-04-28 Thread Akihiko Odaki
A SR-IOV VF cannot have a ROM BAR.

Co-developed-by: Yui Washizu 
Signed-off-by: Akihiko Odaki 
---
 hw/pci/pci.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index cb5ac46e9f27..201ff64e11cc 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2359,6 +2359,14 @@ static void pci_add_option_rom(PCIDevice *pdev, bool 
is_default_rom,
 return;
 }
 
+if (pci_is_vf(pdev)) {
+if (pdev->rom_bar != UINT32_MAX) {
+error_setg(errp, "ROM BAR cannot be enabled for SR-IOV VF");
+}
+
+return;
+}
+
 if (load_file || pdev->romsize == UINT32_MAX) {
 path = qemu_find_file(QEMU_FILE_TYPE_BIOS, pdev->romfile);
 if (path == NULL) {

-- 
2.44.0




[PATCH RFC v4 3/7] pcie_sriov: Ensure PF and VF are mutually exclusive

2024-04-28 Thread Akihiko Odaki
A device cannot be a SR-IOV PF and a VF at the same time.

Signed-off-by: Akihiko Odaki 
---
 hw/pci/pcie_sriov.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c
index 56523ab4e833..ec8fc0757b92 100644
--- a/hw/pci/pcie_sriov.c
+++ b/hw/pci/pcie_sriov.c
@@ -42,6 +42,11 @@ bool pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
 uint8_t *cfg = dev->config + offset;
 uint8_t *wmask;
 
+if (pci_is_vf(dev)) {
+error_setg(errp, "a device cannot be a SR-IOV PF and a VF at the same 
time");
+return false;
+}
+
 if (total_vfs) {
 uint16_t ari_cap = pcie_find_capability(dev, PCI_EXT_CAP_ID_ARI);
 uint16_t first_vf_devfn = dev->devfn + vf_offset;

-- 
2.44.0




[PATCH RFC v4 7/7] virtio-net: Implement SR-IOV VF

2024-04-28 Thread Akihiko Odaki
A virtio-net device can be added as a SR-IOV VF to another virtio-pci
device that will be the PF.

Signed-off-by: Akihiko Odaki 
---
 hw/virtio/virtio-net-pci.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/virtio/virtio-net-pci.c b/hw/virtio/virtio-net-pci.c
index e03543a70a75..dba4987d6e04 100644
--- a/hw/virtio/virtio-net-pci.c
+++ b/hw/virtio/virtio-net-pci.c
@@ -75,6 +75,7 @@ static void virtio_net_pci_class_init(ObjectClass *klass, 
void *data)
 k->device_id = PCI_DEVICE_ID_VIRTIO_NET;
 k->revision = VIRTIO_PCI_ABI_VERSION;
 k->class_id = PCI_CLASS_NETWORK_ETHERNET;
+k->sriov_vf_user_creatable = true;
 set_bit(DEVICE_CATEGORY_NETWORK, dc->categories);
 device_class_set_props(dc, virtio_net_properties);
 vpciklass->realize = virtio_net_pci_realize;

-- 
2.44.0




[PATCH RFC v4 0/7] virtio-net: add support for SR-IOV emulation

2024-04-28 Thread Akihiko Odaki
Based-on: <20240315-reuse-v9-0-67aa69af4...@daynix.com>
("[PATCH for 9.1 v9 00/11] hw/pci: SR-IOV related fixes and improvements")

Introduction


This series is based on the RFC series submitted by Yui Washizu[1].
See also [2] for the context.

This series enables SR-IOV emulation for virtio-net. It is useful
to test SR-IOV support on the guest, or to expose several vDPA devices
in a VM. vDPA devices can also provide an L2 switching feature for
offloading, though allowing the guest to configure such a feature is out
of scope.

The PF side code resides in virtio-pci. The VF side code resides in
the PCI common infrastructure, but it is restricted to work only for
virtio-net-pci because of the lack of validation.

User Interface
--

A user can configure a SR-IOV capable virtio-net device by adding
virtio-net-pci functions to a bus. Below is a command line example:
  -netdev user,id=n -netdev user,id=o
  -netdev user,id=p -netdev user,id=q
  -device pcie-root-port,id=b
  -device virtio-net-pci,bus=b,addr=0x0.0x3,netdev=q,sriov-pf=f
  -device virtio-net-pci,bus=b,addr=0x0.0x2,netdev=p,sriov-pf=f
  -device virtio-net-pci,bus=b,addr=0x0.0x1,netdev=o,sriov-pf=f
  -device virtio-net-pci,bus=b,addr=0x0.0x0,netdev=n,id=f

The VFs specify the paired PF with the "sriov-pf" property. The PF must be
added after all VFs. It is the user's responsibility to ensure that the VFs
have function numbers larger than that of the PF, and that the function
numbers have a consistent stride.

Keeping VF instances


A problem with SR-IOV emulation is that it needs to hotplug the VFs as
the guest requests. Previously, this behavior was implemented by
realizing and unrealizing VFs at runtime. However, this strategy does
not work well for the proposed virtio-net emulation; in this proposal,
device options passed in the command line must be maintained as VFs
are hotplugged, but they are consumed when the machine starts and not
available after that, which makes realizing VFs at runtime impossible.

As an alternative strategy to runtime realization/unrealization, this
series proposes to reuse the code to power down PCI Express devices.
When a PCI Express device is powered down, it will be hidden from the
guest but will be kept realized. This effectively implements the
behavior we need for the SR-IOV emulation.

Summary
---

Patch 1 disables ROM BAR, which virtio-net-pci enables by default, for
VFs.
Patch 2 makes zero stride valid for 1 VF configuration.
Patch 3 and 4 adds validations.
Patch 5 adds user-created SR-IOV VF infrastructure.
Patch 6 makes virtio-pci work as SR-IOV PF for user-created VFs.
Patch 7 allows user to create SR-IOV VFs with virtio-net-pci.

[1] 
https://patchew.org/QEMU/1689731808-3009-1-git-send-email-yui.wash...@gmail.com/
[2] https://lore.kernel.org/all/5d46f455-f530-4e5e-9ae7-13a2297d4...@daynix.com/

Co-developed-by: Yui Washizu 
Signed-off-by: Akihiko Odaki 
---
Changes in v4:
- Added patch "hw/pci: Fix SR-IOV VF number calculation" to fix division
  by zero reported by Yui Washizu.
- Rebased.
- Link to v3: 
https://lore.kernel.org/r/20240305-sriov-v3-0-abdb75770...@daynix.com

Changes in v3:
- Rebased.
- Link to v2: 
https://lore.kernel.org/r/20231210-sriov-v2-0-b959e8a6d...@daynix.com

Changes in v2:
- Changed to keep VF instances.
- Link to v1: 
https://lore.kernel.org/r/20231202-sriov-v1-0-32b3570f7...@daynix.com

---
Akihiko Odaki (7):
  hw/pci: Do not add ROM BAR for SR-IOV VF
  hw/pci: Fix SR-IOV VF number calculation
  pcie_sriov: Ensure PF and VF are mutually exclusive
  pcie_sriov: Check PCI Express for SR-IOV PF
  pcie_sriov: Allow user to create SR-IOV device
  virtio-pci: Implement SR-IOV PF
  virtio-net: Implement SR-IOV VF

 include/hw/pci/pci_device.h |   6 +-
 include/hw/pci/pcie_sriov.h |  19 +++
 hw/pci/pci.c|  76 +++
 hw/pci/pcie_sriov.c | 298 +++-
 hw/virtio/virtio-net-pci.c  |   1 +
 hw/virtio/virtio-pci.c  |   7 ++
 6 files changed, 323 insertions(+), 84 deletions(-)
---
base-commit: 2ac5458086ab61282f30c2f8bdf2ae9a0a06a75d
change-id: 20231202-sriov-9402fb262be8

Best regards,
-- 
Akihiko Odaki 




[PATCH RFC v4 4/7] pcie_sriov: Check PCI Express for SR-IOV PF

2024-04-28 Thread Akihiko Odaki
SR-IOV requires PCI Express.

Signed-off-by: Akihiko Odaki 
---
 hw/pci/pcie_sriov.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c
index ec8fc0757b92..3af0cc7d560a 100644
--- a/hw/pci/pcie_sriov.c
+++ b/hw/pci/pcie_sriov.c
@@ -42,6 +42,11 @@ bool pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
 uint8_t *cfg = dev->config + offset;
 uint8_t *wmask;
 
+if (!pci_is_express(dev)) {
+error_setg(errp, "PCI Express is required for SR-IOV PF");
+return false;
+}
+
 if (pci_is_vf(dev)) {
 error_setg(errp, "a device cannot be a SR-IOV PF and a VF at the same 
time");
 return false;

-- 
2.44.0




[PATCH RFC v4 6/7] virtio-pci: Implement SR-IOV PF

2024-04-28 Thread Akihiko Odaki
Allow user to attach SR-IOV VF to a virtio-pci PF.

Signed-off-by: Akihiko Odaki 
---
 hw/virtio/virtio-pci.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index eaaf86402cfa..996bb2cbad20 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -2245,6 +2245,12 @@ static void virtio_pci_device_plugged(DeviceState *d, 
Error **errp)
pci_register_bar(&proxy->pci_dev, proxy->legacy_io_bar_idx,
 PCI_BASE_ADDRESS_SPACE_IO, &proxy->bar);
 }
+
+if (pcie_sriov_pf_init_from_user_created_vfs(&proxy->pci_dev,
+ PCI_CONFIG_SPACE_SIZE,
+ errp)) {
+virtio_add_feature(&vdev->host_features, VIRTIO_F_SR_IOV);
+}
 }
 
 static void virtio_pci_device_unplugged(DeviceState *d)
@@ -2253,6 +2259,7 @@ static void virtio_pci_device_unplugged(DeviceState *d)
 bool modern = virtio_pci_modern(proxy);
 bool modern_pio = proxy->flags & VIRTIO_PCI_FLAG_MODERN_PIO_NOTIFY;
 
+pcie_sriov_pf_exit(&proxy->pci_dev);
 virtio_pci_stop_ioeventfd(proxy);
 
 if (modern) {

-- 
2.44.0




[PATCH RFC v4 5/7] pcie_sriov: Allow user to create SR-IOV device

2024-04-28 Thread Akihiko Odaki
A user can create a SR-IOV device by specifying the PF with the
sriov-pf property of the VFs. The VFs must be added before the PF.

A user-creatable VF must have PCIDeviceClass::sriov_vf_user_creatable
set. Such a VF cannot refer to the PF because it is created before the
PF.

A PF to which user-creatable VFs can be attached calls
pcie_sriov_pf_init_from_user_created_vfs() during realization and
pcie_sriov_pf_exit() when exiting.

Signed-off-by: Akihiko Odaki 
---
 include/hw/pci/pci_device.h |   6 +-
 include/hw/pci/pcie_sriov.h |  19 +++
 hw/pci/pci.c|  62 ++
 hw/pci/pcie_sriov.c | 288 +++-
 4 files changed, 292 insertions(+), 83 deletions(-)

diff --git a/include/hw/pci/pci_device.h b/include/hw/pci/pci_device.h
index 6be0f989ebe0..eefd9d9a7b5a 100644
--- a/include/hw/pci/pci_device.h
+++ b/include/hw/pci/pci_device.h
@@ -37,6 +37,8 @@ struct PCIDeviceClass {
 uint16_t subsystem_id;  /* only for header type = 0 */
 
 const char *romfile;/* rom bar */
+
+bool sriov_vf_user_creatable;
 };
 
 enum PCIReqIDType {
@@ -160,6 +162,8 @@ struct PCIDevice {
 /* ID of standby device in net_failover pair */
 char *failover_pair_id;
 uint32_t acpi_index;
+
+char *sriov_pf;
 };
 
 static inline int pci_intx(PCIDevice *pci_dev)
@@ -192,7 +196,7 @@ static inline int pci_is_express_downstream_port(const 
PCIDevice *d)
 
 static inline int pci_is_vf(const PCIDevice *d)
 {
-return d->exp.sriov_vf.pf != NULL;
+return d->sriov_pf || d->exp.sriov_vf.pf != NULL;
 }
 
 static inline uint32_t pci_config_size(const PCIDevice *d)
diff --git a/include/hw/pci/pcie_sriov.h b/include/hw/pci/pcie_sriov.h
index d576a8c6be19..20626b5605c9 100644
--- a/include/hw/pci/pcie_sriov.h
+++ b/include/hw/pci/pcie_sriov.h
@@ -18,6 +18,7 @@
 struct PCIESriovPF {
 uint8_t vf_bar_type[PCI_NUM_REGIONS];   /* Store type for each VF bar */
 PCIDevice **vf; /* Pointer to an array of num_vfs VF devices */
+bool vf_user_created; /* If VFs are created by user */
 };
 
 struct PCIESriovVF {
@@ -40,6 +41,24 @@ void pcie_sriov_pf_init_vf_bar(PCIDevice *dev, int 
region_num,
 void pcie_sriov_vf_register_bar(PCIDevice *dev, int region_num,
 MemoryRegion *memory);
 
+/**
+ * pcie_sriov_pf_init_from_user_created_vfs() - Initialize PF with user-created
+ *  VFs.
+ * @dev: A PCIe device being realized.
+ * @offset: The offset of the SR-IOV capability.
+ * @errp: pointer to Error*, to store an error if it happens.
+ *
+ * Return:
+ * * true - @dev is initialized as a PCIe SR-IOV PF.
+ * * false - @dev is not initialized because there is no SR-IOV VFs or an error
+ *   occurred.
+ */
+bool pcie_sriov_pf_init_from_user_created_vfs(PCIDevice *dev, uint16_t offset,
+  Error **errp);
+
+bool pcie_sriov_register_device(PCIDevice *dev, Error **errp);
+void pcie_sriov_unregister_device(PCIDevice *dev);
+
 /*
  * Default (minimal) page size support values
  * as required by the SR/IOV standard:
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index dbecb3d4aa42..e79bb8b6b6fa 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -85,6 +85,7 @@ static Property pci_props[] = {
 QEMU_PCIE_ERR_UNC_MASK_BITNR, true),
 DEFINE_PROP_BIT("x-pcie-ari-nextfn-1", PCIDevice, cap_present,
 QEMU_PCIE_ARI_NEXTFN_1_BITNR, false),
+DEFINE_PROP_STRING("sriov-pf", PCIDevice, sriov_pf),
 DEFINE_PROP_END_OF_LIST()
 };
 
@@ -959,13 +960,8 @@ static void pci_init_multifunction(PCIBus *bus, PCIDevice 
*dev, Error **errp)
 dev->config[PCI_HEADER_TYPE] |= PCI_HEADER_TYPE_MULTI_FUNCTION;
 }
 
-/*
- * With SR/IOV and ARI, a device at function 0 need not be a multifunction
- * device, as it may just be a VF that ended up with function 0 in
- * the legacy PCI interpretation. Avoid failing in such cases:
- */
-if (pci_is_vf(dev) &&
-dev->exp.sriov_vf.pf->cap_present & QEMU_PCI_CAP_MULTIFUNCTION) {
+/* SR/IOV is not handled here. */
+if (pci_is_vf(dev)) {
 return;
 }
 
@@ -998,7 +994,8 @@ static void pci_init_multifunction(PCIBus *bus, PCIDevice 
*dev, Error **errp)
 }
 /* function 0 indicates single function, so function > 0 must be NULL */
 for (func = 1; func < PCI_FUNC_MAX; ++func) {
-if (bus->devices[PCI_DEVFN(slot, func)]) {
+PCIDevice *device = bus->devices[PCI_DEVFN(slot, func)];
+if (device && !pci_is_vf(device)) {
 error_setg(errp, "PCI: %x.0 indicates single function, "
"but %x.%x is already populated.",
slot, slot, func);
@@ -1283,6 +1280,7 @@ static void pci_qdev_unrealize(DeviceState *dev)
 
 pci_unregister_io_regions(pci_dev);
 pc

[PATCH RFC v4 2/7] hw/pci: Fix SR-IOV VF number calculation

2024-04-28 Thread Akihiko Odaki
pci_config_get_bar_addr() had a division by vf_stride. vf_stride needs
to be non-zero when there are multiple VFs, but the specification does
not prohibit making it zero when there is only one VF.

Do not perform the division for the first VF to avoid division by zero.

Signed-off-by: Akihiko Odaki 
---
 hw/pci/pci.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 201ff64e11cc..dbecb3d4aa42 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -1437,7 +1437,11 @@ static pcibus_t pci_config_get_bar_addr(PCIDevice *d, 
int reg,
 pci_get_word(pf->config + sriov_cap + PCI_SRIOV_VF_OFFSET);
 uint16_t vf_stride =
 pci_get_word(pf->config + sriov_cap + PCI_SRIOV_VF_STRIDE);
-uint32_t vf_num = (d->devfn - (pf->devfn + vf_offset)) / vf_stride;
+uint32_t vf_num = d->devfn - (pf->devfn + vf_offset);
+
+if (vf_num) {
+vf_num /= vf_stride;
+}
 
 if (type & PCI_BASE_ADDRESS_MEM_TYPE_64) {
 new_addr = pci_get_quad(pf->config + bar);

-- 
2.44.0




[PATCH 3/3] virtio-net: Report RSS warning at device realization

2024-04-28 Thread Akihiko Odaki
Warning about RSS fallback at device realization allows the user to
notice the configuration problem early.

Signed-off-by: Akihiko Odaki 
---
 hw/net/virtio-net.c | 14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 5b6c901915a9..e1a6d43de283 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -817,6 +817,10 @@ static uint64_t virtio_net_get_features(VirtIODevice 
*vdev, uint64_t features,
 
 if (!get_vhost_net(nc->peer)) {
if (!ebpf_rss_is_loaded(&n->ebpf_rss)) {
+if (on_off_auto_features.on_bits & VIRTIO_NET_F_RSS) {
+warn_report("Can't load eBPF RSS - fallback to software RSS");
+}
+
+virtio_clear_feature(&on_off_auto_features.auto_bits,
  VIRTIO_NET_F_RSS);
 }
@@ -1327,16 +1331,10 @@ static void virtio_net_detach_epbf_rss(VirtIONet *n)
 static void virtio_net_commit_rss_config(VirtIONet *n)
 {
 if (n->rss_data.enabled) {
-n->rss_data.enabled_software_rss = n->rss_data.populate_hash;
+n->rss_data.enabled_software_rss = n->rss_data.populate_hash ||
+   !virtio_net_attach_epbf_rss(n);
 if (n->rss_data.populate_hash) {
 virtio_net_detach_epbf_rss(n);
-} else if (!virtio_net_attach_epbf_rss(n)) {
-if (get_vhost_net(qemu_get_queue(n->nic)->peer)) {
-warn_report("Can't load eBPF RSS for vhost");
-} else {
-warn_report("Can't load eBPF RSS - fallback to software RSS");
-n->rss_data.enabled_software_rss = true;
-}
 }
 
 trace_virtio_net_rss_enable(n->rss_data.hash_types,

-- 
2.44.0




[PATCH 2/3] virtio-net: Convert feature properties to OnOffAuto

2024-04-28 Thread Akihiko Odaki
Some features are not always available, and virtio-net used to disable
them when not available even if the corresponding properties were
explicitly set to "on".

Convert feature properties to OnOffAuto so that the user can explicitly
tell QEMU to automatically select the value by setting them "auto".
QEMU will give an error if they are set "on".

Signed-off-by: Akihiko Odaki 
---
 include/hw/virtio/virtio-net.h |   2 +-
 hw/net/virtio-net.c| 247 +
 2 files changed, 152 insertions(+), 97 deletions(-)

diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
index 060c23c04d2d..ff32e30f001b 100644
--- a/include/hw/virtio/virtio-net.h
+++ b/include/hw/virtio/virtio-net.h
@@ -178,7 +178,7 @@ struct VirtIONet {
 uint32_t has_vnet_hdr;
 size_t host_hdr_len;
 size_t guest_hdr_len;
-uint64_t host_features;
+OnOffAutoBit64 host_features;
 uint32_t rsc_timeout;
 uint8_t rsc4_enabled;
 uint8_t rsc6_enabled;
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index c8059dc99bd4..5b6c901915a9 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -750,58 +750,96 @@ static void virtio_net_set_queue_pairs(VirtIONet *n)
 
 static void virtio_net_set_multiqueue(VirtIONet *n, int multiqueue);
 
+static bool virtio_net_clear_features(OnOffAutoBit64 *features,
+  uint64_t clear_bits,
+  const char *reason, Error **errp)
+{
+if (features->on_bits & clear_bits) {
+error_setg(errp, "%s", reason);
+return false;
+}
+
+features->auto_bits &= ~clear_bits;
+
+return true;
+}
+
 static uint64_t virtio_net_get_features(VirtIODevice *vdev, uint64_t features,
 Error **errp)
 {
 VirtIONet *n = VIRTIO_NET(vdev);
 NetClientState *nc = qemu_get_queue(n->nic);
-
-/* Firstly sync all virtio-net possible supported features */
-features |= n->host_features;
-
-virtio_add_feature(&features, VIRTIO_NET_F_MAC);
-
-if (!peer_has_vnet_hdr(n)) {
-virtio_clear_feature(&features, VIRTIO_NET_F_CSUM);
-virtio_clear_feature(&features, VIRTIO_NET_F_HOST_TSO4);
-virtio_clear_feature(&features, VIRTIO_NET_F_HOST_TSO6);
-virtio_clear_feature(&features, VIRTIO_NET_F_HOST_ECN);
-
-virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_CSUM);
-virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_TSO4);
-virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_TSO6);
-virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_ECN);
-
-virtio_clear_feature(&features, VIRTIO_NET_F_HOST_USO);
-virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_USO4);
-virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_USO6);
-
-virtio_clear_feature(&features, VIRTIO_NET_F_HASH_REPORT);
-}
-
-if (!peer_has_vnet_hdr(n) || !peer_has_ufo(n)) {
-virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_UFO);
-virtio_clear_feature(&features, VIRTIO_NET_F_HOST_UFO);
-}
-
-if (!peer_has_uso(n)) {
-virtio_clear_feature(&features, VIRTIO_NET_F_HOST_USO);
-virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_USO4);
-virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_USO6);
+OnOffAutoBit64 on_off_auto_features = n->host_features;
+
+on_off_auto_features.on_bits |= features;
+virtio_add_feature(&on_off_auto_features.auto_bits, VIRTIO_NET_F_MAC);
+
+if (!((peer_has_vnet_hdr(n) ||
+   virtio_net_clear_features(&on_off_auto_features,
+ BIT_ULL(VIRTIO_NET_F_CSUM) |
+ BIT_ULL(VIRTIO_NET_F_HOST_TSO4) |
+ BIT_ULL(VIRTIO_NET_F_HOST_TSO6) |
+ BIT_ULL(VIRTIO_NET_F_HOST_ECN) |
+ BIT_ULL(VIRTIO_NET_F_GUEST_UFO) |
+ BIT_ULL(VIRTIO_NET_F_GUEST_CSUM) |
+ BIT_ULL(VIRTIO_NET_F_GUEST_TSO4) |
+ BIT_ULL(VIRTIO_NET_F_GUEST_TSO6) |
+ BIT_ULL(VIRTIO_NET_F_GUEST_ECN) |
+ BIT_ULL(VIRTIO_NET_F_HOST_UFO) |
+ BIT_ULL(VIRTIO_NET_F_HOST_USO) |
+ BIT_ULL(VIRTIO_NET_F_GUEST_USO4) |
+ BIT_ULL(VIRTIO_NET_F_GUEST_USO6) |
+ BIT_ULL(VIRTIO_NET_F_HASH_REPORT),
+ "A requested feature requires the peer to 
support virtio-net headers.",
+ errp)) &&
+  (peer_has_ufo(n) ||
+   virtio_net_clear_features(&on_off_auto_features,
+ BIT_ULL(VIRTIO_NET_F_GUEST_UFO) |
+ BIT_ULL(VIRTIO_NET_F_HOST_UFO),
+ "UFO

[PATCH 1/3] qdev-properties: Add DEFINE_PROP_ON_OFF_AUTO_BIT64()

2024-04-28 Thread Akihiko Odaki
DEFINE_PROP_ON_OFF_AUTO_BIT64() corresponds to DEFINE_PROP_ON_OFF_AUTO()
as DEFINE_PROP_BIT64() corresponds to DEFINE_PROP_BOOL(). The difference
is that DEFINE_PROP_ON_OFF_AUTO_BIT64() exposes OnOffAuto instead of
bool.

Signed-off-by: Akihiko Odaki 
---
 include/hw/qdev-properties.h | 18 
 hw/core/qdev-properties.c| 65 +++-
 2 files changed, 82 insertions(+), 1 deletion(-)

diff --git a/include/hw/qdev-properties.h b/include/hw/qdev-properties.h
index 09aa04ca1e27..afec53a48470 100644
--- a/include/hw/qdev-properties.h
+++ b/include/hw/qdev-properties.h
@@ -43,11 +43,22 @@ struct PropertyInfo {
 ObjectPropertyRelease *release;
 };
 
+/**
+ * struct OnOffAutoBit64 - OnOffAuto storage with 64 elements.
+ * @on_bits: Bitmap of elements with "on".
+ * @auto_bits: Bitmap of elements with "auto".
+ */
+typedef struct OnOffAutoBit64 {
+uint64_t on_bits;
+uint64_t auto_bits;
+} OnOffAutoBit64;
+
 
 /*** qdev-properties.c ***/
 
 extern const PropertyInfo qdev_prop_bit;
 extern const PropertyInfo qdev_prop_bit64;
+extern const PropertyInfo qdev_prop_on_off_auto_bit64;
 extern const PropertyInfo qdev_prop_bool;
 extern const PropertyInfo qdev_prop_enum;
 extern const PropertyInfo qdev_prop_uint8;
@@ -100,6 +111,13 @@ extern const PropertyInfo qdev_prop_link;
 .set_default = true,  \
 .defval.u  = (bool)_defval)
 
+#define DEFINE_PROP_ON_OFF_AUTO_BIT64(_name, _state, _field, _bit, _defval) \
+DEFINE_PROP(_name, _state, _field, qdev_prop_on_off_auto_bit64, \
+OnOffAutoBit64, \
+.bitnr= (_bit), \
+.set_default = true,\
+.defval.i = (OnOffAuto)_defval)
+
 #define DEFINE_PROP_BOOL(_name, _state, _field, _defval) \
 DEFINE_PROP(_name, _state, _field, qdev_prop_bool, bool, \
 .set_default = true, \
diff --git a/hw/core/qdev-properties.c b/hw/core/qdev-properties.c
index 7d6fa726fdf2..b96f54a1b912 100644
--- a/hw/core/qdev-properties.c
+++ b/hw/core/qdev-properties.c
@@ -188,7 +188,8 @@ const PropertyInfo qdev_prop_bit = {
 
 static uint64_t qdev_get_prop_mask64(Property *prop)
 {
-assert(prop->info == &qdev_prop_bit64);
+assert(prop->info == &qdev_prop_bit64 ||
+   prop->info == &qdev_prop_on_off_auto_bit64);
 return 0x1ull << prop->bitnr;
 }
 
@@ -233,6 +234,68 @@ const PropertyInfo qdev_prop_bit64 = {
 .set_default_value = set_default_value_bool,
 };
 
+static void prop_get_on_off_auto_bit64(Object *obj, Visitor *v,
+   const char *name, void *opaque,
+   Error **errp)
+{
+Property *prop = opaque;
+OnOffAutoBit64 *p = object_field_prop_ptr(obj, prop);
+int value;
+uint64_t mask = qdev_get_prop_mask64(prop);
+
+if (p->auto_bits & mask) {
+value = ON_OFF_AUTO_AUTO;
+} else if (p->on_bits & mask) {
+value = ON_OFF_AUTO_ON;
+} else {
+value = ON_OFF_AUTO_OFF;
+}
+
+visit_type_enum(v, name, &value, &OnOffAuto_lookup, errp);
+}
+
+static void prop_set_on_off_auto_bit64(Object *obj, Visitor *v,
+   const char *name, void *opaque,
+   Error **errp)
+{
+Property *prop = opaque;
+OnOffAutoBit64 *p = object_field_prop_ptr(obj, prop);
+int value;
+uint64_t mask = qdev_get_prop_mask64(prop);
+
+if (!visit_type_enum(v, name, &value, &OnOffAuto_lookup, errp)) {
+return;
+}
+
+switch (value) {
+case ON_OFF_AUTO_AUTO:
+p->on_bits &= ~mask;
+p->auto_bits |= mask;
+break;
+
+case ON_OFF_AUTO_ON:
+p->on_bits |= mask;
+p->auto_bits &= ~mask;
+break;
+
+case ON_OFF_AUTO_OFF:
+p->on_bits &= ~mask;
+p->auto_bits &= ~mask;
+break;
+}
+}
+
+const PropertyInfo qdev_prop_on_off_auto_bit64 = {
+.name  = "bool",
+.description = "on/off/auto",
+.enum_table = &OnOffAuto_lookup,
+.get = qdev_propinfo_get_enum,
+.set = qdev_propinfo_set_enum,
+.get = prop_get_on_off_auto_bit64,
+.set = prop_set_on_off_auto_bit64,
+.set_default_value = qdev_propinfo_set_default_value_enum,
+};
+
 /* --- bool --- */
 
 static void get_bool(Object *obj, Visitor *v, const char *name, void *opaque,

-- 
2.44.0




[PATCH 0/3] virtio-net: Convert feature properties to OnOffAuto

2024-04-28 Thread Akihiko Odaki
Based-on: <20240428-rss-v10-0-73cbaa91a...@daynix.com>
("[PATCH v10 00/18] virtio-net RSS/hash report fixes and improvements")

Some features are not always available, and virtio-net used to disable
them when not available even if the corresponding properties were
explicitly set to "on".

Convert feature properties to OnOffAuto so that the user can explicitly
tell QEMU to automatically select the value by setting them "auto".
QEMU will give an error if they are set "on".
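
As a rough sketch of the intended realize-time behaviour (function and field
names below are illustrative only, not the actual patch code):

    /* Sketch: fail realization instead of silently dropping the feature. */
    if (!peer_has_vnet_hdr(n)) {
        if (n->csum == ON_OFF_AUTO_ON) {
            error_setg(errp, "csum is on but the peer has no virtio-net header");
            return;
        }
        virtio_clear_feature(&n->host_features, VIRTIO_NET_F_CSUM);
    }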

Signed-off-by: Akihiko Odaki 
---
Akihiko Odaki (3):
  qdev-properties: Add DEFINE_PROP_ON_OFF_AUTO_BIT64()
  virtio-net: Convert feature properties to OnOffAuto
  virtio-net: Report RSS warning at device realization

 include/hw/qdev-properties.h   |  18 +++
 include/hw/virtio/virtio-net.h |   2 +-
 hw/core/qdev-properties.c  |  65 ++-
 hw/net/virtio-net.c| 259 +
 4 files changed, 239 insertions(+), 105 deletions(-)
---
base-commit: ec6325eec995018983a3f88f0e78ebf733a47b7e
change-id: 20240428-auto-be0dc010dda5

Best regards,
-- 
Akihiko Odaki 




[PATCH v10 12/18] virtio-net: Unify the logic to update NIC state for RSS

2024-04-28 Thread Akihiko Odaki
The code to attach or detach the eBPF program for RSS was duplicated, so
unify it into one function to save some code.

Signed-off-by: Akihiko Odaki 
---
 hw/net/virtio-net.c | 90 +
 1 file changed, 36 insertions(+), 54 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 1ac9c06f6865..61b49e335dea 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -1232,18 +1232,6 @@ static int virtio_net_handle_announce(VirtIONet *n, 
uint8_t cmd,
 }
 }
 
-static void virtio_net_detach_epbf_rss(VirtIONet *n);
-
-static void virtio_net_disable_rss(VirtIONet *n)
-{
-if (n->rss_data.enabled) {
-trace_virtio_net_rss_disable();
-}
-n->rss_data.enabled = false;
-
-virtio_net_detach_epbf_rss(n);
-}
-
 static bool virtio_net_attach_ebpf_to_backend(NICState *nic, int prog_fd)
 {
 NetClientState *nc = qemu_get_peer(qemu_get_queue(nic), 0);
@@ -1291,6 +1279,40 @@ static void virtio_net_detach_epbf_rss(VirtIONet *n)
 virtio_net_attach_ebpf_to_backend(n->nic, -1);
 }
 
+static void virtio_net_commit_rss_config(VirtIONet *n)
+{
+if (n->rss_data.enabled) {
+n->rss_data.enabled_software_rss = n->rss_data.populate_hash;
+if (n->rss_data.populate_hash) {
+virtio_net_detach_epbf_rss(n);
+} else if (!virtio_net_attach_epbf_rss(n)) {
+if (get_vhost_net(qemu_get_queue(n->nic)->peer)) {
+warn_report("Can't load eBPF RSS for vhost");
+} else {
+warn_report("Can't load eBPF RSS - fallback to software RSS");
+n->rss_data.enabled_software_rss = true;
+}
+}
+
+trace_virtio_net_rss_enable(n->rss_data.hash_types,
+n->rss_data.indirections_len,
+sizeof(n->rss_data.key));
+} else {
+virtio_net_detach_epbf_rss(n);
+trace_virtio_net_rss_disable();
+}
+}
+
+static void virtio_net_disable_rss(VirtIONet *n)
+{
+if (!n->rss_data.enabled) {
+return;
+}
+
+n->rss_data.enabled = false;
+virtio_net_commit_rss_config(n);
+}
+
 static bool virtio_net_load_ebpf_fds(VirtIONet *n)
 {
 int fds[EBPF_RSS_MAX_FDS] = { [0 ... EBPF_RSS_MAX_FDS - 1] = -1};
@@ -1455,28 +1477,7 @@ static uint16_t virtio_net_handle_rss(VirtIONet *n,
 goto error;
 }
 n->rss_data.enabled = true;
-
-if (!n->rss_data.populate_hash) {
-if (!virtio_net_attach_epbf_rss(n)) {
-/* EBPF must be loaded for vhost */
-if (get_vhost_net(qemu_get_queue(n->nic)->peer)) {
-warn_report("Can't load eBPF RSS for vhost");
-goto error;
-}
-/* fallback to software RSS */
-warn_report("Can't load eBPF RSS - fallback to software RSS");
-n->rss_data.enabled_software_rss = true;
-}
-} else {
-/* use software RSS for hash populating */
-/* and detach eBPF if was loaded before */
-virtio_net_detach_epbf_rss(n);
-n->rss_data.enabled_software_rss = true;
-}
-
-trace_virtio_net_rss_enable(n->rss_data.hash_types,
-n->rss_data.indirections_len,
-temp.b);
+virtio_net_commit_rss_config(n);
 return queue_pairs;
 error:
 trace_virtio_net_rss_error(err_msg, err_value);
@@ -3092,26 +3093,7 @@ static int virtio_net_post_load_device(void *opaque, int 
version_id)
 }
 }
 
-if (n->rss_data.enabled) {
-n->rss_data.enabled_software_rss = n->rss_data.populate_hash;
-if (!n->rss_data.populate_hash) {
-if (!virtio_net_attach_epbf_rss(n)) {
-if (get_vhost_net(qemu_get_queue(n->nic)->peer)) {
-warn_report("Can't post-load eBPF RSS for vhost");
-} else {
-warn_report("Can't post-load eBPF RSS - "
-"fallback to software RSS");
-n->rss_data.enabled_software_rss = true;
-}
-}
-}
-
-trace_virtio_net_rss_enable(n->rss_data.hash_types,
-n->rss_data.indirections_len,
-sizeof(n->rss_data.key));
-} else {
-trace_virtio_net_rss_disable();
-}
+virtio_net_commit_rss_config(n);
 return 0;
 }
 

-- 
2.44.0




[PATCH v10 10/18] virtio-net: Shrink header byte swapping buffer

2024-04-28 Thread Akihiko Odaki
Byte swapping is only performed for the part of the header shared with
the legacy standard, so the buffer only needs to cover that part.
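
For reference, the extended header from the standard headers starts with the
legacy layout, which is why a struct virtio_net_hdr buffer is sufficient here
(abridged copy, shown only for illustration):

    struct virtio_net_hdr_v1_hash {
        struct virtio_net_hdr_v1 hdr;   /* legacy fields come first */
        __le32 hash_value;
        __le16 hash_report;
        __le16 padding;
    };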

Signed-off-by: Akihiko Odaki 
---
 hw/net/virtio-net.c | 17 ++---
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index ca0fbf7b7654..5aa0527a1921 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -676,11 +676,6 @@ static void virtio_net_set_mrg_rx_bufs(VirtIONet *n, int 
mergeable_rx_bufs,
 
 n->mergeable_rx_bufs = mergeable_rx_bufs;
 
-/*
- * Note: when extending the vnet header, please make sure to
- * change the vnet header copying logic in virtio_net_flush_tx()
- * as well.
- */
 if (version_1) {
 n->guest_hdr_len = hash_report ?
 sizeof(struct virtio_net_hdr_v1_hash) :
@@ -2752,7 +2747,7 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q)
 ssize_t ret;
 unsigned int out_num;
 struct iovec sg[VIRTQUEUE_MAX_SIZE], sg2[VIRTQUEUE_MAX_SIZE + 1], 
*out_sg;
-struct virtio_net_hdr_v1_hash vhdr;
+struct virtio_net_hdr vhdr;
 
 elem = virtqueue_pop(q->tx_vq, sizeof(VirtQueueElement));
 if (!elem) {
@@ -2769,18 +2764,18 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q)
 }
 
 if (n->needs_vnet_hdr_swap) {
-if (iov_to_buf(out_sg, out_num, 0, &vhdr, n->guest_hdr_len) <
-n->guest_hdr_len) {
+if (iov_to_buf(out_sg, out_num, 0, &vhdr, sizeof(vhdr)) <
+sizeof(vhdr)) {
 virtio_error(vdev, "virtio-net header incorrect");
 virtqueue_detach_element(q->tx_vq, elem, 0);
 g_free(elem);
 return -EINVAL;
 }
-virtio_net_hdr_swap(vdev, (void *) &vhdr);
+virtio_net_hdr_swap(vdev, &vhdr);
sg2[0].iov_base = &vhdr;
-sg2[0].iov_len = n->guest_hdr_len;
+sg2[0].iov_len = sizeof(vhdr);
out_num = iov_copy(&sg2[1], ARRAY_SIZE(sg2) - 1, out_sg, out_num,
-   n->guest_hdr_len, -1);
+   sizeof(vhdr), -1);
 if (out_num == VIRTQUEUE_MAX_SIZE) {
 goto drop;
 }

-- 
2.44.0




[PATCH v10 07/18] virtio-net: Do not propagate ebpf-rss-fds errors

2024-04-28 Thread Akihiko Odaki
Propagating ebpf-rss-fds errors has several problems.

First, it makes device realization fail and disables the fallback to the
conventional eBPF loading.

Second, it leaks memory by making device realization fail without
freeing memory already allocated.

Third, the convention is to set an error when a function returns false,
but virtio_net_load_ebpf_fds() and virtio_net_load_ebpf() return false
without setting an error, which is confusing.
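
The convention referred to above is, roughly (generic sketch, not code from
this patch; do_something() and some_precondition() are made-up names):

    static bool do_something(Error **errp)
    {
        if (!some_precondition()) {
            error_setg(errp, "some precondition failed");
            return false;   /* returning false implies *errp was set */
        }
        return true;        /* returning true implies no error was set */
    }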

Remove the propagation to fix these problems.

Fixes: 0524ea0510a3 ("ebpf: Added eBPF initialization by fds.")
Signed-off-by: Akihiko Odaki 
---
 hw/net/virtio-net.c | 23 ++-
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index f6112c0ac97d..8ede38aadbbe 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -1329,24 +1329,22 @@ static void virtio_net_detach_epbf_rss(VirtIONet *n)
 virtio_net_attach_ebpf_to_backend(n->nic, -1);
 }
 
-static bool virtio_net_load_ebpf_fds(VirtIONet *n, Error **errp)
+static bool virtio_net_load_ebpf_fds(VirtIONet *n)
 {
 int fds[EBPF_RSS_MAX_FDS] = { [0 ... EBPF_RSS_MAX_FDS - 1] = -1};
 int ret = true;
 int i = 0;
 
-ERRP_GUARD();
-
 if (n->nr_ebpf_rss_fds != EBPF_RSS_MAX_FDS) {
-error_setg(errp,
-  "Expected %d file descriptors but got %d",
-  EBPF_RSS_MAX_FDS, n->nr_ebpf_rss_fds);
+warn_report("Expected %d file descriptors but got %d",
+EBPF_RSS_MAX_FDS, n->nr_ebpf_rss_fds);
return false;
}
 
 for (i = 0; i < n->nr_ebpf_rss_fds; i++) {
-fds[i] = monitor_fd_param(monitor_cur(), n->ebpf_rss_fds[i], errp);
-if (*errp) {
+fds[i] = monitor_fd_param(monitor_cur(), n->ebpf_rss_fds[i],
+  &error_warn);
+if (fds[i] < 0) {
 ret = false;
 goto exit;
 }
@@ -1355,7 +1353,7 @@ static bool virtio_net_load_ebpf_fds(VirtIONet *n, Error 
**errp)
ret = ebpf_rss_load_fds(&n->ebpf_rss, fds[0], fds[1], fds[2], fds[3]);
 
 exit:
-if (!ret || *errp) {
+if (!ret) {
 for (i = 0; i < n->nr_ebpf_rss_fds && fds[i] != -1; i++) {
 close(fds[i]);
 }
@@ -1364,13 +1362,12 @@ exit:
 return ret;
 }
 
-static bool virtio_net_load_ebpf(VirtIONet *n, Error **errp)
+static bool virtio_net_load_ebpf(VirtIONet *n)
 {
 bool ret = false;
 
 if (virtio_net_attach_ebpf_to_backend(n->nic, -1)) {
-if (!(n->ebpf_rss_fds
-&& virtio_net_load_ebpf_fds(n, errp))) {
+if (!(n->ebpf_rss_fds && virtio_net_load_ebpf_fds(n))) {
ret = ebpf_rss_load(&n->ebpf_rss);
 }
 }
@@ -3825,7 +3822,7 @@ static void virtio_net_device_realize(DeviceState *dev, 
Error **errp)
 net_rx_pkt_init(>rx_pkt);
 
 if (virtio_has_feature(n->host_features, VIRTIO_NET_F_RSS)) {
-virtio_net_load_ebpf(n, errp);
+virtio_net_load_ebpf(n);
 }
 }
 

-- 
2.44.0




[PATCH v10 16/18] ebpf: Return 0 when configuration fails

2024-04-28 Thread Akihiko Odaki
The kernel interprets the returned value as an unsigned 32-bit integer,
so -1 will mean queue 4294967295, which is awkward. Return 0 instead.
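
For illustration only (not patch code), the wrap-around in question:

    /* The kernel reads the program's return value as a __u32 queue index,
     * so a returned -1 selects queue 4294967295 instead of signalling an
     * error; returning 0 falls back to queue 0 instead. */
    __u32 queue = (__u32)-1;    /* == 4294967295 */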

Signed-off-by: Akihiko Odaki 
---
 ebpf/rss.bpf.skeleton.h | 1532 +++
 tools/ebpf/rss.bpf.c|2 +-
 2 files changed, 767 insertions(+), 767 deletions(-)

diff --git a/ebpf/rss.bpf.skeleton.h b/ebpf/rss.bpf.skeleton.h
index e41ed8890191..647212e5dd0c 100644
--- a/ebpf/rss.bpf.skeleton.h
+++ b/ebpf/rss.bpf.skeleton.h
@@ -178,786 +178,786 @@ static inline const void *rss_bpf__elf_bytes(size_t *sz)
 {
static const char data[] __attribute__((__aligned__(8))) = "\
 \x7f\x45\x4c\x46\x02\x01\x01\0\0\0\0\0\0\0\0\0\x01\0\xf7\0\x01\0\0\0\0\0\0\0\0\
-\0\0\0\0\0\0\0\0\0\0\0\xb8\x4b\0\0\0\0\0\0\0\0\0\0\x40\0\0\0\0\0\x40\0\x0d\0\
-\x01\0\xbf\x19\0\0\0\0\0\0\xb7\x01\0\0\0\0\0\0\x63\x1a\x54\xff\0\0\0\0\xbf\xa7\
-\0\0\0\0\0\0\x07\x07\0\0\x54\xff\xff\xff\x18\x01\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
-\xbf\x72\0\0\0\0\0\0\x85\0\0\0\x01\0\0\0\xbf\x06\0\0\0\0\0\0\x18\x01\0\0\0\0\0\
-\0\0\0\0\0\0\0\0\0\xbf\x72\0\0\0\0\0\0\x85\0\0\0\x01\0\0\0\xbf\x07\0\0\0\0\0\0\
-\x18\0\0\0\xff\xff\xff\xff\0\0\0\0\0\0\0\0\x15\x06\x4f\x02\0\0\0\0\xbf\x78\0\0\
-\0\0\0\0\x15\x08\x4d\x02\0\0\0\0\x71\x61\0\0\0\0\0\0\x55\x01\x01\0\0\0\0\0\x05\
-\0\x46\x02\0\0\0\0\xb7\x01\0\0\0\0\0\0\x63\x1a\xc8\xff\0\0\0\0\x7b\x1a\xc0\xff\
-\0\0\0\0\x7b\x1a\xb8\xff\0\0\0\0\x7b\x1a\xb0\xff\0\0\0\0\x7b\x1a\xa8\xff\0\0\0\
-\0\x63\x1a\xa0\xff\0\0\0\0\x7b\x1a\x98\xff\0\0\0\0\x7b\x1a\x90\xff\0\0\0\0\x7b\
-\x1a\x88\xff\0\0\0\0\x7b\x1a\x80\xff\0\0\0\0\x7b\x1a\x78\xff\0\0\0\0\x7b\x1a\
-\x70\xff\0\0\0\0\x7b\x1a\x68\xff\0\0\0\0\x7b\x1a\x60\xff\0\0\0\0\x7b\x1a\x58\
-\xff\0\0\0\0\x15\x09\x35\x02\0\0\0\0\x6b\x1a\xd0\xff\0\0\0\0\xbf\xa3\0\0\0\0\0\
-\0\x07\x03\0\0\xd0\xff\xff\xff\xbf\x91\0\0\0\0\0\0\xb7\x02\0\0\x0c\0\0\0\xb7\
-\x04\0\0\x02\0\0\0\xb7\x05\0\0\0\0\0\0\x85\0\0\0\x44\0\0\0\x67\0\0\0\x20\0\0\0\
-\x77\0\0\0\x20\0\0\0\x55\0\x2a\x02\0\0\0\0\xb7\x02\0\0\x10\0\0\0\x69\xa1\xd0\
-\xff\0\0\0\0\xbf\x13\0\0\0\0\0\0\xdc\x03\0\0\x10\0\0\0\x15\x03\x02\0\0\x81\0\0\
-\x55\x03\x0b\0\xa8\x88\0\0\xb7\x02\0\0\x14\0\0\0\xbf\xa3\0\0\0\0\0\0\x07\x03\0\
-\0\xd0\xff\xff\xff\xbf\x91\0\0\0\0\0\0\xb7\x04\0\0\x02\0\0\0\xb7\x05\0\0\0\0\0\
-\0\x85\0\0\0\x44\0\0\0\x67\0\0\0\x20\0\0\0\x77\0\0\0\x20\0\0\0\x55\0\x1a\x02\0\
-\0\0\0\x69\xa1\xd0\xff\0\0\0\0\x15\x01\x18\x02\0\0\0\0\x15\x01\x21\0\x86\xdd\0\
-\0\x7b\x9a\x48\xff\0\0\0\0\x55\x01\xf6\0\x08\0\0\0\xb7\x01\0\0\x01\0\0\0\x73\
-\x1a\x58\xff\0\0\0\0\xb7\x01\0\0\0\0\0\0\x63\x1a\xe0\xff\0\0\0\0\x7b\x1a\xd8\
-\xff\0\0\0\0\x7b\x1a\xd0\xff\0\0\0\0\xbf\xa3\0\0\0\0\0\0\x07\x03\0\0\xd0\xff\
-\xff\xff\x79\xa1\x48\xff\0\0\0\0\xb7\x02\0\0\0\0\0\0\xb7\x04\0\0\x14\0\0\0\xb7\
-\x05\0\0\x01\0\0\0\x85\0\0\0\x44\0\0\0\x67\0\0\0\x20\0\0\0\x77\0\0\0\x20\0\0\0\
-\x55\0\x05\x02\0\0\0\0\x69\xa1\xd6\xff\0\0\0\0\x57\x01\0\0\x3f\xff\0\0\xb7\x04\
-\0\0\x01\0\0\0\x55\x01\x01\0\0\0\0\0\xb7\x04\0\0\0\0\0\0\x61\xa1\xdc\xff\0\0\0\
-\0\x63\x1a\x64\xff\0\0\0\0\x61\xa1\xe0\xff\0\0\0\0\x63\x1a\x68\xff\0\0\0\0\x71\
-\xa9\xd9\xff\0\0\0\0\x71\xa2\xd0\xff\0\0\0\0\x67\x02\0\0\x02\0\0\0\x57\x02\0\0\
-\x3c\0\0\0\x73\x4a\x5e\xff\0\0\0\0\x05\0\xbc\0\0\0\0\0\xb7\x01\0\0\x01\0\0\0\
-\x73\x1a\x59\xff\0\0\0\0\xb7\x01\0\0\0\0\0\0\x7b\x1a\xf0\xff\0\0\0\0\x7b\x1a\
-\xe8\xff\0\0\0\0\x7b\x1a\xe0\xff\0\0\0\0\x7b\x1a\xd8\xff\0\0\0\0\x7b\x1a\xd0\
-\xff\0\0\0\0\xbf\xa3\0\0\0\0\0\0\x07\x03\0\0\xd0\xff\xff\xff\xbf\x91\0\0\0\0\0\
-\0\xb7\x02\0\0\0\0\0\0\xb7\x04\0\0\x28\0\0\0\xb7\x05\0\0\x01\0\0\0\x85\0\0\0\
-\x44\0\0\0\x67\0\0\0\x20\0\0\0\x77\0\0\0\x20\0\0\0\x55\0\xe4\x01\0\0\0\0\xb7\
-\x03\0\0\x28\0\0\0\x7b\x9a\x48\xff\0\0\0\0\x79\xa1\xe0\xff\0\0\0\0\x63\x1a\x6c\
-\xff\0\0\0\0\x77\x01\0\0\x20\0\0\0\x63\x1a\x70\xff\0\0\0\0\x79\xa1\xd8\xff\0\0\
-\0\0\x63\x1a\x64\xff\0\0\0\0\x77\x01\0\0\x20\0\0\0\x63\x1a\x68\xff\0\0\0\0\x79\
-\xa1\xe8\xff\0\0\0\0\x63\x1a\x74\xff\0\0\0\0\x77\x01\0\0\x20\0\0\0\x63\x1a\x78\
-\xff\0\0\0\0\x79\xa1\xf0\xff\0\0\0\0\x63\x1a\x7c\xff\0\0\0\0\x77\x01\0\0\x20\0\
-\0\0\x63\x1a\x80\xff\0\0\0\0\x71\xa9\xd6\xff\0\0\0\0\x25\x09\x93\0\x3c\0\0\0\
-\xb7\x01\0\0\x01\0\0\0\x6f\x91\0\0\0\0\0\0\x18\x02\0\0\x01\0\0\0\0\0\0\0\0\x18\
-\0\x1c\x5f\x21\0\0\0\0\0\0\x55\x01\x01\0\0\0\0\0\x05\0\x8c\0\0\0\0\0\xb7\x01\0\
-\0\0\0\0\0\x6b\x1a\xfe\xff\0\0\0\0\xb7\x02\0\0\x28\0\0\0\xbf\xa1\0\0\0\0\0\0\
-\x07\x01\0\0\x94\xff\xff\xff\x7b\x1a\x20\xff\0\0\0\0\xbf\xa1\0\0\0\0\0\0\x07\
-\x01\0\0\x84\xff\xff\xff\x7b\x1a\x18\xff\0\0\0\0\xb7\x01\0\0\0\0\0\0\x7b\x1a\
-\x38\xff\0\0\0\0\x7b\x7a\x30\xff\0\0\0\0\x7b\x8a\x28\xff\0\0\0\0\xbf\xa3\0\0\0\
-\0\0\0\x07\x03\0\0\xfe\xff\xff\xff\x79\xa1\x48\xff\0\0\0\0\x7b\x2a\x40\xff\0\0\
-\0\0\xb7\x04\0\0\x02\0\0\0\xb7\x05\0\0\x01\0\0\0\x85\0\0\0\x44\0\0\0\x67\0\0\0\
-\x20\0\0\0\x77\0\0\0\x20\0\0\0\x55\0\xb2\x01\0\0\0\0\xbf\x91\0\0\0\0\0\0\x15\
-\x01\x22\0\x3c\0\0\0\x15\x01\x58\0\x2c\0\0\0\x79\xa2\x40\xff\0\0\0\0\x55\x01\
-\x59\0\x2b\0\0\0\xb7\x01\0\0\0\0\0\0\x63\x1a\xf8\xff\0\0\0\0\xbf\xa3\0\0\0\0\0\
-\

[PATCH v10 11/18] virtio-net: Disable RSS on reset

2024-04-28 Thread Akihiko Odaki
RSS is disabled by default.

Fixes: 590790297c ("virtio-net: implement RSS configuration command")
Signed-off-by: Akihiko Odaki 
Reviewed-by: Michael Tokarev 
---
 hw/net/virtio-net.c | 70 +++--
 1 file changed, 36 insertions(+), 34 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 5aa0527a1921..1ac9c06f6865 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -600,40 +600,6 @@ static void virtio_net_queue_enable(VirtIODevice *vdev, 
uint32_t queue_index)
 }
 }
 
-static void virtio_net_reset(VirtIODevice *vdev)
-{
-VirtIONet *n = VIRTIO_NET(vdev);
-int i;
-
-/* Reset back to compatibility mode */
-n->promisc = 1;
-n->allmulti = 0;
-n->alluni = 0;
-n->nomulti = 0;
-n->nouni = 0;
-n->nobcast = 0;
-/* multiqueue is disabled by default */
-n->curr_queue_pairs = 1;
-timer_del(n->announce_timer.tm);
-n->announce_timer.round = 0;
-n->status &= ~VIRTIO_NET_S_ANNOUNCE;
-
-/* Flush any MAC and VLAN filter table state */
-n->mac_table.in_use = 0;
-n->mac_table.first_multi = 0;
-n->mac_table.multi_overflow = 0;
-n->mac_table.uni_overflow = 0;
-memset(n->mac_table.macs, 0, MAC_TABLE_ENTRIES * ETH_ALEN);
-memcpy(&n->mac[0], &n->nic->conf->macaddr, sizeof(n->mac));
-qemu_format_nic_info_str(qemu_get_queue(n->nic), n->mac);
-memset(n->vlans, 0, MAX_VLAN >> 3);
-
-/* Flush any async TX */
-for (i = 0;  i < n->max_queue_pairs; i++) {
-flush_or_purge_queued_packets(qemu_get_subqueue(n->nic, i));
-}
-}
-
 static void peer_test_vnet_hdr(VirtIONet *n)
 {
 NetClientState *nc = qemu_get_queue(n->nic);
@@ -3861,6 +3827,42 @@ static void virtio_net_device_unrealize(DeviceState *dev)
 virtio_cleanup(vdev);
 }
 
+static void virtio_net_reset(VirtIODevice *vdev)
+{
+VirtIONet *n = VIRTIO_NET(vdev);
+int i;
+
+/* Reset back to compatibility mode */
+n->promisc = 1;
+n->allmulti = 0;
+n->alluni = 0;
+n->nomulti = 0;
+n->nouni = 0;
+n->nobcast = 0;
+/* multiqueue is disabled by default */
+n->curr_queue_pairs = 1;
+timer_del(n->announce_timer.tm);
+n->announce_timer.round = 0;
+n->status &= ~VIRTIO_NET_S_ANNOUNCE;
+
+/* Flush any MAC and VLAN filter table state */
+n->mac_table.in_use = 0;
+n->mac_table.first_multi = 0;
+n->mac_table.multi_overflow = 0;
+n->mac_table.uni_overflow = 0;
+memset(n->mac_table.macs, 0, MAC_TABLE_ENTRIES * ETH_ALEN);
+memcpy(&n->mac[0], &n->nic->conf->macaddr, sizeof(n->mac));
+qemu_format_nic_info_str(qemu_get_queue(n->nic), n->mac);
+memset(n->vlans, 0, MAX_VLAN >> 3);
+
+/* Flush any async TX */
+for (i = 0;  i < n->max_queue_pairs; i++) {
+flush_or_purge_queued_packets(qemu_get_subqueue(n->nic, i));
+}
+
+virtio_net_disable_rss(n);
+}
+
 static void virtio_net_instance_init(Object *obj)
 {
 VirtIONet *n = VIRTIO_NET(obj);

-- 
2.44.0




[PATCH v10 17/18] ebpf: Refactor tun_rss_steering_prog()

2024-04-28 Thread Akihiko Odaki
This saves branches and makes later BPF program changes easier.

Signed-off-by: Akihiko Odaki 
---
 tools/ebpf/rss.bpf.c | 26 +++---
 1 file changed, 11 insertions(+), 15 deletions(-)

diff --git a/tools/ebpf/rss.bpf.c b/tools/ebpf/rss.bpf.c
index 77434435ac15..c989cb3cd82c 100644
--- a/tools/ebpf/rss.bpf.c
+++ b/tools/ebpf/rss.bpf.c
@@ -547,27 +547,23 @@ int tun_rss_steering_prog(struct __sk_buff *skb)
config = bpf_map_lookup_elem(&tap_rss_map_configurations, &key);
toe = bpf_map_lookup_elem(&tap_rss_map_toeplitz_key, &key);
 
-if (config && toe) {
-if (!config->redirect) {
-return config->default_queue;
-}
+if (!config || !toe) {
+return 0;
+}
 
-if (calculate_rss_hash(skb, config, toe, &hash)) {
-__u32 table_idx = hash % config->indirections_len;
-__u16 *queue = 0;
+if (config->redirect && calculate_rss_hash(skb, config, toe, &hash)) {
+__u32 table_idx = hash % config->indirections_len;
+__u16 *queue = 0;
 
-queue = bpf_map_lookup_elem(&tap_rss_map_indirection_table,
-&table_idx);
+queue = bpf_map_lookup_elem(&tap_rss_map_indirection_table,
+&table_idx);
 
-if (queue) {
-return *queue;
-}
+if (queue) {
+return *queue;
 }
-
-return config->default_queue;
 }
 
-return 0;
+return config->default_queue;
 }
 
 char _license[] SEC("license") = "GPL v2";

-- 
2.44.0




[PATCH v10 14/18] virtio-net: Do not write hashes to peer buffer

2024-04-28 Thread Akihiko Odaki
The peer buffer is qualified with const and not meant to be modified.

Signed-off-by: Akihiko Odaki 
---
 hw/net/virtio-net.c | 36 +---
 1 file changed, 17 insertions(+), 19 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 527aac3a0465..c8059dc99bd4 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -1830,16 +1830,9 @@ static uint8_t virtio_net_get_hash_type(bool hasip4,
 return 0xff;
 }
 
-static void virtio_set_packet_hash(const uint8_t *buf, uint8_t report,
-   uint32_t hash)
-{
-struct virtio_net_hdr_v1_hash *hdr = (void *)buf;
-hdr->hash_value = hash;
-hdr->hash_report = report;
-}
-
 static int virtio_net_process_rss(NetClientState *nc, const uint8_t *buf,
-  size_t size)
+  size_t size,
+  struct virtio_net_hdr_v1_hash *hdr)
 {
 VirtIONet *n = qemu_get_nic_opaque(nc);
 unsigned int index = nc->queue_index, new_index = index;
@@ -1870,7 +1863,8 @@ static int virtio_net_process_rss(NetClientState *nc, 
const uint8_t *buf,
  n->rss_data.hash_types);
 if (net_hash_type > NetPktRssIpV6UdpEx) {
 if (n->rss_data.populate_hash) {
-virtio_set_packet_hash(buf, VIRTIO_NET_HASH_REPORT_NONE, 0);
+hdr->hash_value = VIRTIO_NET_HASH_REPORT_NONE;
+hdr->hash_report = 0;
 }
 return n->rss_data.redirect ? n->rss_data.default_queue : -1;
 }
@@ -1878,7 +1872,8 @@ static int virtio_net_process_rss(NetClientState *nc, 
const uint8_t *buf,
 hash = net_rx_pkt_calc_rss_hash(pkt, net_hash_type, n->rss_data.key);
 
 if (n->rss_data.populate_hash) {
-virtio_set_packet_hash(buf, reports[net_hash_type], hash);
+hdr->hash_value = hash;
+hdr->hash_report = reports[net_hash_type];
 }
 
 if (n->rss_data.redirect) {
@@ -1898,7 +1893,7 @@ static ssize_t virtio_net_receive_rcu(NetClientState *nc, 
const uint8_t *buf,
 VirtQueueElement *elems[VIRTQUEUE_MAX_SIZE];
 size_t lens[VIRTQUEUE_MAX_SIZE];
 struct iovec mhdr_sg[VIRTQUEUE_MAX_SIZE];
-struct virtio_net_hdr_mrg_rxbuf mhdr;
+struct virtio_net_hdr_v1_hash extra_hdr;
 unsigned mhdr_cnt = 0;
 size_t offset, i, guest_offset, j;
 ssize_t err;
@@ -1908,7 +1903,7 @@ static ssize_t virtio_net_receive_rcu(NetClientState *nc, 
const uint8_t *buf,
 }
 
 if (!no_rss && n->rss_data.enabled && n->rss_data.enabled_software_rss) {
-int index = virtio_net_process_rss(nc, buf, size);
+int index = virtio_net_process_rss(nc, buf, size, &extra_hdr);
 if (index >= 0) {
 NetClientState *nc2 = qemu_get_subqueue(n->nic, index);
 return virtio_net_receive_rcu(nc2, buf, size, true);
@@ -1968,15 +1963,17 @@ static ssize_t virtio_net_receive_rcu(NetClientState 
*nc, const uint8_t *buf,
 if (n->mergeable_rx_bufs) {
 mhdr_cnt = iov_copy(mhdr_sg, ARRAY_SIZE(mhdr_sg),
 sg, elem->in_num,
-offsetof(typeof(mhdr), num_buffers),
-sizeof(mhdr.num_buffers));
+offsetof(typeof(extra_hdr), 
hdr.num_buffers),
+sizeof(extra_hdr.hdr.num_buffers));
 }
 
 receive_header(n, sg, elem->in_num, buf, size);
 if (n->rss_data.populate_hash) {
-offset = sizeof(mhdr);
+offset = offsetof(typeof(extra_hdr), hash_value);
 iov_from_buf(sg, elem->in_num, offset,
- buf + offset, n->host_hdr_len - sizeof(mhdr));
+ (char *)&extra_hdr + offset,
+ sizeof(extra_hdr.hash_value) +
+ sizeof(extra_hdr.hash_report));
 }
 offset = n->host_hdr_len;
 total += n->guest_hdr_len;
@@ -2022,10 +2019,11 @@ static ssize_t virtio_net_receive_rcu(NetClientState 
*nc, const uint8_t *buf,
 }
 
 if (mhdr_cnt) {
-virtio_stw_p(vdev, &mhdr.num_buffers, i);
+virtio_stw_p(vdev, &extra_hdr.hdr.num_buffers, i);
iov_from_buf(mhdr_sg, mhdr_cnt,
 0,
- &mhdr.num_buffers, sizeof mhdr.num_buffers);
+ &extra_hdr.hdr.num_buffers,
+ sizeof extra_hdr.hdr.num_buffers);
 }
 
 for (j = 0; j < i; j++) {

-- 
2.44.0




[PATCH v10 06/18] tap: Shrink zeroed virtio-net header

2024-04-28 Thread Akihiko Odaki
tap prepends a zeroed virtio-net header when writing a packet to a
tap with virtio-net header enabled but not in use. This only happens
when s->host_vnet_hdr_len == sizeof(struct virtio_net_hdr).

Signed-off-by: Akihiko Odaki 
---
 net/tap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/tap.c b/net/tap.c
index 9825518ff1f3..51f7aec39d9e 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -119,7 +119,7 @@ static ssize_t tap_receive_iov(NetClientState *nc, const 
struct iovec *iov,
 TAPState *s = DO_UPCAST(TAPState, nc, nc);
 const struct iovec *iovp = iov;
 g_autofree struct iovec *iov_copy = NULL;
-struct virtio_net_hdr_mrg_rxbuf hdr = { };
+struct virtio_net_hdr hdr = { };
 
 if (s->host_vnet_hdr_len && !s->using_vnet_hdr) {
 iov_copy = g_new(struct iovec, iovcnt + 1);

-- 
2.44.0




[PATCH v10 15/18] ebpf: Fix RSS error handling

2024-04-28 Thread Akihiko Odaki
calculate_rss_hash() was using hash value 0 to tell whether it had
calculated a hash, but the hash value may legitimately be 0 on rare
occasions. Use a distinct bool value for correctness.

Fixes: f3fa412de2 ("ebpf: Added eBPF RSS program.")
Signed-off-by: Akihiko Odaki 
---
 ebpf/rss.bpf.skeleton.h | 1210 +++
 tools/ebpf/rss.bpf.c|   20 +-
 2 files changed, 610 insertions(+), 620 deletions(-)

diff --git a/ebpf/rss.bpf.skeleton.h b/ebpf/rss.bpf.skeleton.h
index aed4ef9a0335..e41ed8890191 100644
--- a/ebpf/rss.bpf.skeleton.h
+++ b/ebpf/rss.bpf.skeleton.h
@@ -165,7 +165,7 @@ rss_bpf__create_skeleton(struct rss_bpf *obj)
s->progs[0].prog = &obj->progs.tun_rss_steering_prog;
s->progs[0].link = &obj->links.tun_rss_steering_prog;
 
-   s->data = (void *)rss_bpf__elf_bytes(&s->data_sz);
+   s->data = rss_bpf__elf_bytes(&s->data_sz);
 
obj->skeleton = s;
return 0;
@@ -176,194 +176,188 @@ err:
 
 static inline const void *rss_bpf__elf_bytes(size_t *sz)
 {
-   *sz = 20600;
-   return (const void *)"\
+   static const char data[] __attribute__((__aligned__(8))) = "\
 \x7f\x45\x4c\x46\x02\x01\x01\0\0\0\0\0\0\0\0\0\x01\0\xf7\0\x01\0\0\0\0\0\0\0\0\
-\0\0\0\0\0\0\0\0\0\0\0\x38\x4d\0\0\0\0\0\0\0\0\0\0\x40\0\0\0\0\0\x40\0\x0d\0\
-\x01\0\xbf\x19\0\0\0\0\0\0\xb7\x01\0\0\0\0\0\0\x63\x1a\x4c\xff\0\0\0\0\xbf\xa7\
-\0\0\0\0\0\0\x07\x07\0\0\x4c\xff\xff\xff\x18\x01\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
+\0\0\0\0\0\0\0\0\0\0\0\xb8\x4b\0\0\0\0\0\0\0\0\0\0\x40\0\0\0\0\0\x40\0\x0d\0\
+\x01\0\xbf\x19\0\0\0\0\0\0\xb7\x01\0\0\0\0\0\0\x63\x1a\x54\xff\0\0\0\0\xbf\xa7\
+\0\0\0\0\0\0\x07\x07\0\0\x54\xff\xff\xff\x18\x01\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
 \xbf\x72\0\0\0\0\0\0\x85\0\0\0\x01\0\0\0\xbf\x06\0\0\0\0\0\0\x18\x01\0\0\0\0\0\
 \0\0\0\0\0\0\0\0\0\xbf\x72\0\0\0\0\0\0\x85\0\0\0\x01\0\0\0\xbf\x07\0\0\0\0\0\0\
-\x18\0\0\0\xff\xff\xff\xff\0\0\0\0\0\0\0\0\x15\x06\x61\x02\0\0\0\0\xbf\x78\0\0\
-\0\0\0\0\x15\x08\x5f\x02\0\0\0\0\x71\x61\0\0\0\0\0\0\x55\x01\x01\0\0\0\0\0\x05\
-\0\x58\x02\0\0\0\0\xb7\x01\0\0\0\0\0\0\x63\x1a\xc0\xff\0\0\0\0\x7b\x1a\xb8\xff\
-\0\0\0\0\x7b\x1a\xb0\xff\0\0\0\0\x7b\x1a\xa8\xff\0\0\0\0\x7b\x1a\xa0\xff\0\0\0\
-\0\x63\x1a\x98\xff\0\0\0\0\x7b\x1a\x90\xff\0\0\0\0\x7b\x1a\x88\xff\0\0\0\0\x7b\
-\x1a\x80\xff\0\0\0\0\x7b\x1a\x78\xff\0\0\0\0\x7b\x1a\x70\xff\0\0\0\0\x7b\x1a\
-\x68\xff\0\0\0\0\x7b\x1a\x60\xff\0\0\0\0\x7b\x1a\x58\xff\0\0\0\0\x7b\x1a\x50\
-\xff\0\0\0\0\x15\x09\x47\x02\0\0\0\0\x6b\x1a\xc8\xff\0\0\0\0\xbf\xa3\0\0\0\0\0\
-\0\x07\x03\0\0\xc8\xff\xff\xff\xbf\x91\0\0\0\0\0\0\xb7\x02\0\0\x0c\0\0\0\xb7\
+\x18\0\0\0\xff\xff\xff\xff\0\0\0\0\0\0\0\0\x15\x06\x4f\x02\0\0\0\0\xbf\x78\0\0\
+\0\0\0\0\x15\x08\x4d\x02\0\0\0\0\x71\x61\0\0\0\0\0\0\x55\x01\x01\0\0\0\0\0\x05\
+\0\x46\x02\0\0\0\0\xb7\x01\0\0\0\0\0\0\x63\x1a\xc8\xff\0\0\0\0\x7b\x1a\xc0\xff\
+\0\0\0\0\x7b\x1a\xb8\xff\0\0\0\0\x7b\x1a\xb0\xff\0\0\0\0\x7b\x1a\xa8\xff\0\0\0\
+\0\x63\x1a\xa0\xff\0\0\0\0\x7b\x1a\x98\xff\0\0\0\0\x7b\x1a\x90\xff\0\0\0\0\x7b\
+\x1a\x88\xff\0\0\0\0\x7b\x1a\x80\xff\0\0\0\0\x7b\x1a\x78\xff\0\0\0\0\x7b\x1a\
+\x70\xff\0\0\0\0\x7b\x1a\x68\xff\0\0\0\0\x7b\x1a\x60\xff\0\0\0\0\x7b\x1a\x58\
+\xff\0\0\0\0\x15\x09\x35\x02\0\0\0\0\x6b\x1a\xd0\xff\0\0\0\0\xbf\xa3\0\0\0\0\0\
+\0\x07\x03\0\0\xd0\xff\xff\xff\xbf\x91\0\0\0\0\0\0\xb7\x02\0\0\x0c\0\0\0\xb7\
 \x04\0\0\x02\0\0\0\xb7\x05\0\0\0\0\0\0\x85\0\0\0\x44\0\0\0\x67\0\0\0\x20\0\0\0\
-\x77\0\0\0\x20\0\0\0\x55\0\x3c\x02\0\0\0\0\xb7\x02\0\0\x10\0\0\0\x69\xa1\xc8\
+\x77\0\0\0\x20\0\0\0\x55\0\x2a\x02\0\0\0\0\xb7\x02\0\0\x10\0\0\0\x69\xa1\xd0\
 \xff\0\0\0\0\xbf\x13\0\0\0\0\0\0\xdc\x03\0\0\x10\0\0\0\x15\x03\x02\0\0\x81\0\0\
 \x55\x03\x0b\0\xa8\x88\0\0\xb7\x02\0\0\x14\0\0\0\xbf\xa3\0\0\0\0\0\0\x07\x03\0\
-\0\xc8\xff\xff\xff\xbf\x91\0\0\0\0\0\0\xb7\x04\0\0\x02\0\0\0\xb7\x05\0\0\0\0\0\
-\0\x85\0\0\0\x44\0\0\0\x67\0\0\0\x20\0\0\0\x77\0\0\0\x20\0\0\0\x55\0\x2c\x02\0\
-\0\0\0\x69\xa1\xc8\xff\0\0\0\0\x15\x01\x2a\x02\0\0\0\0\x7b\x9a\x38\xff\0\0\0\0\
-\x15\x01\x56\0\x86\xdd\0\0\x55\x01\x3b\0\x08\0\0\0\xb7\x01\0\0\x01\0\0\0\x73\
-\x1a\x50\xff\0\0\0\0\xb7\x01\0\0\0\0\0\0\x63\x1a\xd8\xff\0\0\0\0\x7b\x1a\xd0\
-\xff\0\0\0\0\x7b\x1a\xc8\xff\0\0\0\0\xbf\xa3\0\0\0\0\0\0\x07\x03\0\0\xc8\xff\
-\xff\xff\x79\xa1\x38\xff\0\0\0\0\xb7\x02\0\0\0\0\0\0\xb7\x04\0\0\x14\0\0\0\xb7\
+\0\xd0\xff\xff\xff\xbf\x91\0\0\0\0\0\0\xb7\x04\0\0\x02\0\0\0\xb7\x05\0\0\0\0\0\
+\0\x85\0\0\0\x44\0\0\0\x67\0\0\0\x20\0\0\0\x77\0\0\0\x20\0\0\0\x55\0\x1a\x02\0\
+\0\0\0\x69\xa1\xd0\xff\0\0\0\0\x15\x01\x18\x02\0\0\0\0\x15\x01\x21\0\x86\xdd\0\
+\0\x7b\x9a\x48\xff\0\0\0\0\x55\x01\xf6\0\x08\0\0\0\xb7\x01\0\0\x01\0\0\0\x73\
+\x1a\x58\xff\0\0\0\0\xb7\x01\0\0\0\0\0\0\x63\x1a\xe0\xff\0\0\0\0\x7b\x1a\xd8\
+\xff\0\0\0\0\x7b\x1a\xd0\xff\0\0\0\0\xbf\xa3\0\0\0\0\0\0\x07\x03\0\0\xd0\xff\
+\xff\xff\x79\xa1\x48\xff\0\0\0\0\xb7\x02\0\0\0\0\0\0\xb7\x04\0\0\x14\0\0\0\xb7\
 \x05\0\0\x01\0\0\0\x85\0\0\0\x44\0\0\0\x67\0\0\0\x20\0\0\0\x77\0\0\0\x20\0\0\0\
-\x55\0\x17\x02\0\0\0\0\x69\xa1\xce\xff\0\0

[PATCH v10 18/18] ebpf: Add a separate target for skeleton

2024-04-28 Thread Akihiko Odaki
This generalizes the rule that generates the skeleton and allows adding
another one.

Signed-off-by: Akihiko Odaki 
---
 tools/ebpf/Makefile.ebpf | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/tools/ebpf/Makefile.ebpf b/tools/ebpf/Makefile.ebpf
index 3391e7ce0898..572ca5987ae6 100755
--- a/tools/ebpf/Makefile.ebpf
+++ b/tools/ebpf/Makefile.ebpf
@@ -1,23 +1,24 @@
-OBJS = rss.bpf.o
+SKELETONS = rss.bpf.skeleton.h
 
 LLVM_STRIP ?= llvm-strip
 CLANG ?= clang
 INC_FLAGS = `$(CLANG) -print-file-name=include`
 EXTRA_CFLAGS ?= -O2 -g -target bpf
 
-all: $(OBJS)
+all: $(SKELETONS)
 
 .PHONY: clean
 
 clean:
-   rm -f $(OBJS)
-   rm -f rss.bpf.skeleton.h
+   rm -f $(SKELETONS) $(SKELETONS:%.skeleton.h=%.o)
 
-$(OBJS):  %.o:%.c
+%.o: %.c
$(CLANG) $(INC_FLAGS) \
 -D__KERNEL__ -D__ASM_SYSREG_H \
 -I../include $(LINUXINCLUDE) \
 $(EXTRA_CFLAGS) -c $< -o $@
$(LLVM_STRIP) -g $@
-   bpftool gen skeleton rss.bpf.o > rss.bpf.skeleton.h
-   cp rss.bpf.skeleton.h ../../ebpf/
+
+%.skeleton.h: %.o
+   bpftool gen skeleton $< > $@
+   cp $@ ../../ebpf/

-- 
2.44.0




[PATCH v10 01/18] tap: Remove tap_probe_vnet_hdr_len()

2024-04-28 Thread Akihiko Odaki
It was necessary since a Linux kernel older than 2.6.35 may implement the
virtio-net header but may not allow changing its length. Remove it
since such an old kernel is no longer supported.

Signed-off-by: Akihiko Odaki 
Acked-by: Michael S. Tsirkin 
---
 net/tap_int.h |  1 -
 net/tap-bsd.c |  5 -
 net/tap-linux.c   | 20 
 net/tap-solaris.c |  5 -
 net/tap-stub.c|  5 -
 net/tap.c |  8 ++--
 6 files changed, 2 insertions(+), 42 deletions(-)

diff --git a/net/tap_int.h b/net/tap_int.h
index 9a2175655bb0..8857ff299d22 100644
--- a/net/tap_int.h
+++ b/net/tap_int.h
@@ -35,7 +35,6 @@ ssize_t tap_read_packet(int tapfd, uint8_t *buf, int maxlen);
 
 void tap_set_sndbuf(int fd, const NetdevTapOptions *tap, Error **errp);
 int tap_probe_vnet_hdr(int fd, Error **errp);
-int tap_probe_vnet_hdr_len(int fd, int len);
 int tap_probe_has_ufo(int fd);
 int tap_probe_has_uso(int fd);
 void tap_fd_set_offload(int fd, int csum, int tso4, int tso6, int ecn, int ufo,
diff --git a/net/tap-bsd.c b/net/tap-bsd.c
index 274ea7bd2c3c..b4c84441ba8b 100644
--- a/net/tap-bsd.c
+++ b/net/tap-bsd.c
@@ -217,11 +217,6 @@ int tap_probe_has_uso(int fd)
 return 0;
 }
 
-int tap_probe_vnet_hdr_len(int fd, int len)
-{
-return 0;
-}
-
 void tap_fd_set_vnet_hdr_len(int fd, int len)
 {
 }
diff --git a/net/tap-linux.c b/net/tap-linux.c
index c7e514ecb04b..1226d5fda2d9 100644
--- a/net/tap-linux.c
+++ b/net/tap-linux.c
@@ -185,26 +185,6 @@ int tap_probe_has_uso(int fd)
 return 1;
 }
 
-/* Verify that we can assign given length */
-int tap_probe_vnet_hdr_len(int fd, int len)
-{
-int orig;
-if (ioctl(fd, TUNGETVNETHDRSZ, &orig) == -1) {
-return 0;
-}
-if (ioctl(fd, TUNSETVNETHDRSZ, &len) == -1) {
-return 0;
-}
-/* Restore original length: we can't handle failure. */
-if (ioctl(fd, TUNSETVNETHDRSZ, &orig) == -1) {
-fprintf(stderr, "TUNGETVNETHDRSZ ioctl() failed: %s. Exiting.\n",
-strerror(errno));
-abort();
-return -errno;
-}
-return 1;
-}
-
 void tap_fd_set_vnet_hdr_len(int fd, int len)
 {
 if (ioctl(fd, TUNSETVNETHDRSZ, ) == -1) {
diff --git a/net/tap-solaris.c b/net/tap-solaris.c
index 08b13af51257..51b7830bef1d 100644
--- a/net/tap-solaris.c
+++ b/net/tap-solaris.c
@@ -221,11 +221,6 @@ int tap_probe_has_uso(int fd)
 return 0;
 }
 
-int tap_probe_vnet_hdr_len(int fd, int len)
-{
-return 0;
-}
-
 void tap_fd_set_vnet_hdr_len(int fd, int len)
 {
 }
diff --git a/net/tap-stub.c b/net/tap-stub.c
index 4b24f61e3a6c..38673434cbd6 100644
--- a/net/tap-stub.c
+++ b/net/tap-stub.c
@@ -52,11 +52,6 @@ int tap_probe_has_uso(int fd)
 return 0;
 }
 
-int tap_probe_vnet_hdr_len(int fd, int len)
-{
-return 0;
-}
-
 void tap_fd_set_vnet_hdr_len(int fd, int len)
 {
 }
diff --git a/net/tap.c b/net/tap.c
index baaa2f7a9ac7..72ae95894ff1 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -259,11 +259,7 @@ static bool tap_has_vnet_hdr(NetClientState *nc)
 
 static bool tap_has_vnet_hdr_len(NetClientState *nc, int len)
 {
-TAPState *s = DO_UPCAST(TAPState, nc, nc);
-
-assert(nc->info->type == NET_CLIENT_DRIVER_TAP);
-
-return !!tap_probe_vnet_hdr_len(s->fd, len);
+return tap_has_vnet_hdr(nc);
 }
 
 static int tap_get_vnet_hdr_len(NetClientState *nc)
@@ -432,7 +428,7 @@ static TAPState *net_tap_fd_init(NetClientState *peer,
  * Make sure host header length is set correctly in tap:
  * it might have been modified by another instance of qemu.
  */
-if (tap_probe_vnet_hdr_len(s->fd, s->host_vnet_hdr_len)) {
+if (vnet_hdr) {
 tap_fd_set_vnet_hdr_len(s->fd, s->host_vnet_hdr_len);
 }
 tap_read_poll(s, true);

-- 
2.44.0




[PATCH v10 08/18] virtio-net: Add only one queue pair when realizing

2024-04-28 Thread Akihiko Odaki
Multiqueue usage is not yet negotiated when realizing. If more than
one queue is added and the guest never requests to enable multiqueue,
the extra queues will not be deleted when unrealizing and will leak.

Fixes: f9d6dbf0bf6e ("virtio-net: remove virtio queues if the guest doesn't 
support multiqueue")
Signed-off-by: Akihiko Odaki 
---
 hw/net/virtio-net.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 8ede38aadbbe..e33bdbfd84a5 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -3759,9 +3759,7 @@ static void virtio_net_device_realize(DeviceState *dev, 
Error **errp)
 n->net_conf.tx_queue_size = MIN(virtio_net_max_tx_queue_size(n),
 n->net_conf.tx_queue_size);
 
-for (i = 0; i < n->max_queue_pairs; i++) {
-virtio_net_add_queue(n, i);
-}
+virtio_net_add_queue(n, 0);
 
 n->ctrl_vq = virtio_add_queue(vdev, 64, virtio_net_handle_ctrl);
 qemu_macaddr_default_if_unset(>nic_conf.macaddr);

-- 
2.44.0




[PATCH v10 05/18] tap: Call tap_receive_iov() from tap_receive()

2024-04-28 Thread Akihiko Odaki
This removes the duplicate logic found in both tap_receive_iov() and
tap_receive().

Suggested-by: "Zhang, Chen" 
Signed-off-by: Akihiko Odaki 
---
 net/tap.c | 35 +--
 1 file changed, 5 insertions(+), 30 deletions(-)

diff --git a/net/tap.c b/net/tap.c
index 99c59ee46881..9825518ff1f3 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -133,39 +133,14 @@ static ssize_t tap_receive_iov(NetClientState *nc, const 
struct iovec *iov,
 return tap_write_packet(s, iovp, iovcnt);
 }
 
-static ssize_t tap_receive_raw(NetClientState *nc, const uint8_t *buf, size_t 
size)
-{
-TAPState *s = DO_UPCAST(TAPState, nc, nc);
-struct iovec iov[2];
-int iovcnt = 0;
-struct virtio_net_hdr_mrg_rxbuf hdr = { };
-
-if (s->host_vnet_hdr_len) {
-iov[iovcnt].iov_base = &hdr;
-iov[iovcnt].iov_len  = s->host_vnet_hdr_len;
-iovcnt++;
-}
-
-iov[iovcnt].iov_base = (char *)buf;
-iov[iovcnt].iov_len  = size;
-iovcnt++;
-
-return tap_write_packet(s, iov, iovcnt);
-}
-
 static ssize_t tap_receive(NetClientState *nc, const uint8_t *buf, size_t size)
 {
-TAPState *s = DO_UPCAST(TAPState, nc, nc);
-struct iovec iov[1];
-
-if (s->host_vnet_hdr_len && !s->using_vnet_hdr) {
-return tap_receive_raw(nc, buf, size);
-}
-
-iov[0].iov_base = (char *)buf;
-iov[0].iov_len  = size;
+struct iovec iov = {
+.iov_base = (void *)buf,
+.iov_len = size
+};
 
-return tap_write_packet(s, iov, 1);
+return tap_receive_iov(nc, &iov, 1);
 }
 
 #ifndef __sun__

-- 
2.44.0




[PATCH v10 13/18] virtio-net: Always set populate_hash

2024-04-28 Thread Akihiko Odaki
The member is not cleared during reset, so it may have a stale value.

Signed-off-by: Akihiko Odaki 
---
 hw/net/virtio-net.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 61b49e335dea..527aac3a0465 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -651,6 +651,7 @@ static void virtio_net_set_mrg_rx_bufs(VirtIONet *n, int 
mergeable_rx_bufs,
 n->guest_hdr_len = n->mergeable_rx_bufs ?
 sizeof(struct virtio_net_hdr_mrg_rxbuf) :
 sizeof(struct virtio_net_hdr);
+n->rss_data.populate_hash = false;
 }
 
 for (i = 0; i < n->max_queue_pairs; i++) {

-- 
2.44.0




[PATCH v10 09/18] virtio-net: Copy header only when necessary

2024-04-28 Thread Akihiko Odaki
The copied header is only used for byte swapping.

Signed-off-by: Akihiko Odaki 
---
 hw/net/virtio-net.c | 26 --
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index e33bdbfd84a5..ca0fbf7b7654 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -360,7 +360,8 @@ static void virtio_net_vnet_endian_status(VirtIONet *n, 
uint8_t status)
  * can't do it, we fallback onto fixing the headers in the core
  * virtio-net code.
  */
-n->needs_vnet_hdr_swap = virtio_net_set_vnet_endian(vdev, n->nic->ncs,
+n->needs_vnet_hdr_swap = n->has_vnet_hdr &&
+ virtio_net_set_vnet_endian(vdev, n->nic->ncs,
 queue_pairs, true);
 } else if (virtio_net_started(n, vdev->status)) {
 /* After using the device, we need to reset the network backend to
@@ -2767,7 +2768,7 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q)
 return -EINVAL;
 }
 
-if (n->has_vnet_hdr) {
+if (n->needs_vnet_hdr_swap) {
if (iov_to_buf(out_sg, out_num, 0, &vhdr, n->guest_hdr_len) <
 n->guest_hdr_len) {
 virtio_error(vdev, "virtio-net header incorrect");
@@ -2775,19 +2776,16 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q)
 g_free(elem);
 return -EINVAL;
 }
-if (n->needs_vnet_hdr_swap) {
-virtio_net_hdr_swap(vdev, (void *) &vhdr);
-sg2[0].iov_base = &vhdr;
-sg2[0].iov_len = n->guest_hdr_len;
-out_num = iov_copy(&sg2[1], ARRAY_SIZE(sg2) - 1,
-   out_sg, out_num,
-   n->guest_hdr_len, -1);
-if (out_num == VIRTQUEUE_MAX_SIZE) {
-goto drop;
-}
-out_num += 1;
-out_sg = sg2;
+virtio_net_hdr_swap(vdev, (void *) &vhdr);
+sg2[0].iov_base = &vhdr;
+sg2[0].iov_len = n->guest_hdr_len;
+out_num = iov_copy(&sg2[1], ARRAY_SIZE(sg2) - 1, out_sg, out_num,
+   n->guest_hdr_len, -1);
+if (out_num == VIRTQUEUE_MAX_SIZE) {
+goto drop;
 }
+out_num += 1;
+out_sg = sg2;
 }
 /*
  * If host wants to see the guest header as is, we can

-- 
2.44.0




[PATCH v10 00/18] virtio-net RSS/hash report fixes and improvements

2024-04-28 Thread Akihiko Odaki
This series contains fixes and improvements for the virtio-net RSS and
hash reporting features.

V7 -> V8:
  Reset author email addresses.
  Rebased.

V6 -> V7:
  Dropped patch "virtio-net: Do not clear VIRTIO_NET_F_HASH_REPORT".
  Dropped the changes to remove packet flags.
  Re-introduced tap_receive() and changed it to call tap_receive_iov().
  Removed tap_get_vnet_hdr_len().
  Fixed tap initialization not to call tap_fd_set_vnet_hdr_len() for tap
  without virtio-net header.
  Changed to call error_report() instead of warn_report() for
  programming errors.

V5 -> V6:
  Corrected the message for patch "virtio-net: Return an error when vhost
  cannot enable RSS".
  Removed changes to introduce asserts from "virtio-net: Return an error
  when vhost cannot enable RSS".
  Reorganized patches "virtio-net: Return an error when vhost cannot
  enable RSS" and "virtio-net: Do not clear VIRTIO_NET_F_RSS". This
  version now contains patches "virtio-net: Return an error when vhost
  cannot enable RSS" and "virtio-net: Enable software RSS".
  Rebased.

V4 -> V5:
  Added patch "virtio-net: Do not write hashes to peer buffer".

V3 -> V4:
  Extract patches "tap: Remove tap_receive()" and  "net: Remove flag
  propagation" from "net: Remove receive_raw()".
  Added patch "virtio-net: Always set populate_hash".
  Added patch "virtio-net: Do not clear VIRTIO_NET_F_HASH_REPORT".
  Added patch "ebpf: Use standard section name".
  Added patch "ebpf: Simplify error handling".
  Added patch "ebpf: Return 0 when configuration fails".
  Added patch "ebpf: Refactor tun_rss_steering_prog()".
  Added patch "ebpf: Add a separate target for skeleton".

V2 -> V3:
  Added patch "tap: Remove tap_probe_vnet_hdr_len()".
  Added patch "tap: Remove qemu_using_vnet_hdr()".
  Added patch "net: Move virtio-net header length assertion".
  Added patch "net: Remove receive_raw()".
  Added patch "tap: Shrink zeroed virtio-net header".
  Dropped patch "tap: Fix virtio-net header buffer size".

V1 -> V2:
  Added patch "ebpf: Fix RSS error handling".

Signed-off-by: Akihiko Odaki 
---
Changes in v10:
- Dropped an obsolete statement of patch "virtio-net: Do not write
  hashes to peer buffer". (Yuri Benditovich)
- Dropped patch "virtio-net: Return an error when vhost cannot enable
  RSS" and "virtio-net: Report RSS warning at device realization".
- Link to v9: 
https://lore.kernel.org/r/20240403-rss-v9-0-c6d87e69d...@daynix.com

Changes in v9:
- Added patch "virtio-net: Do not propagate ebpf-rss-fds errors".
- Added patch "virtio-net: Shrink header byte swapping buffer".
- Rebased.
- Link to v8: 
https://lore.kernel.org/r/20231210-rss-v8-0-9553ee714...@daynix.com

---
Akihiko Odaki (18):
  tap: Remove tap_probe_vnet_hdr_len()
  tap: Remove qemu_using_vnet_hdr()
  net: Move virtio-net header length assertion
  net: Remove receive_raw()
  tap: Call tap_receive_iov() from tap_receive()
  tap: Shrink zeroed virtio-net header
  virtio-net: Do not propagate ebpf-rss-fds errors
  virtio-net: Add only one queue pair when realizing
  virtio-net: Copy header only when necessary
  virtio-net: Shrink header byte swapping buffer
  virtio-net: Disable RSS on reset
  virtio-net: Unify the logic to update NIC state for RSS
  virtio-net: Always set populate_hash
  virtio-net: Do not write hashes to peer buffer
  ebpf: Fix RSS error handling
  ebpf: Return 0 when configuration fails
  ebpf: Refactor tun_rss_steering_prog()
  ebpf: Add a separate target for skeleton

 ebpf/rss.bpf.skeleton.h  | 1558 +++---
 include/net/net.h|8 -
 net/tap_int.h|1 -
 hw/net/e1000e.c  |1 -
 hw/net/igb.c |1 -
 hw/net/net_tx_pkt.c  |4 +-
 hw/net/virtio-net.c  |  264 
 hw/net/vmxnet3.c |2 -
 net/dump.c   |4 +-
 net/net.c|   47 +-
 net/netmap.c |5 -
 net/tap-bsd.c|5 -
 net/tap-linux.c  |   20 -
 net/tap-solaris.c|    5 -
 net/tap-stub.c   |5 -
 net/tap.c|   77 +--
 tools/ebpf/rss.bpf.c |   44 +-
 tools/ebpf/Makefile.ebpf |   15 +-
 18 files changed, 948 insertions(+), 1118 deletions(-)
---
base-commit: e5c6528dce86d7a9ada7ecf02fcb7b8560955131
change-id: 20231210-rss-e7c98e722253

Best regards,
-- 
Akihiko Odaki 




[PATCH v10 03/18] net: Move virtio-net header length assertion

2024-04-28 Thread Akihiko Odaki
The virtio-net header length assertion should happen for any client.

Signed-off-by: Akihiko Odaki 
---
 net/net.c | 5 +
 net/tap.c | 3 ---
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/net/net.c b/net/net.c
index bd51037ebfb0..db096765f4b2 100644
--- a/net/net.c
+++ b/net/net.c
@@ -56,6 +56,7 @@
 #include "net/filter.h"
 #include "qapi/string-output-visitor.h"
 #include "qapi/qobject-input-visitor.h"
+#include "standard-headers/linux/virtio_net.h"
 
 /* Net bridge is currently not supported for W32. */
 #if !defined(_WIN32)
@@ -550,6 +551,10 @@ void qemu_set_vnet_hdr_len(NetClientState *nc, int len)
 return;
 }
 
+assert(len == sizeof(struct virtio_net_hdr_mrg_rxbuf) ||
+   len == sizeof(struct virtio_net_hdr) ||
+   len == sizeof(struct virtio_net_hdr_v1_hash));
+
 nc->vnet_hdr_len = len;
 nc->info->set_vnet_hdr_len(nc, len);
 }
diff --git a/net/tap.c b/net/tap.c
index c848844955df..49edf6c2b6e1 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -267,9 +267,6 @@ static void tap_set_vnet_hdr_len(NetClientState *nc, int 
len)
 TAPState *s = DO_UPCAST(TAPState, nc, nc);
 
 assert(nc->info->type == NET_CLIENT_DRIVER_TAP);
-assert(len == sizeof(struct virtio_net_hdr_mrg_rxbuf) ||
-   len == sizeof(struct virtio_net_hdr) ||
-   len == sizeof(struct virtio_net_hdr_v1_hash));
 
 tap_fd_set_vnet_hdr_len(s->fd, len);
 s->host_vnet_hdr_len = len;

-- 
2.44.0




[PATCH v10 02/18] tap: Remove qemu_using_vnet_hdr()

2024-04-28 Thread Akihiko Odaki
Since qemu_set_vnet_hdr_len() is always called when
qemu_using_vnet_hdr() is called, we can merge them and save some code.

For consistency, express that the virtio-net header is not in use by
returning 0 with qemu_get_vnet_hdr_len() instead of having a dedicated
function, qemu_get_using_vnet_hdr().

Signed-off-by: Akihiko Odaki 
---
 include/net/net.h   |  7 ---
 hw/net/e1000e.c |  1 -
 hw/net/igb.c|  1 -
 hw/net/net_tx_pkt.c |  4 ++--
 hw/net/virtio-net.c |  3 ---
 hw/net/vmxnet3.c|  2 --
 net/dump.c  |  4 +---
 net/net.c   | 24 +---
 net/netmap.c|  5 -
 net/tap.c   | 28 +---
 10 files changed, 5 insertions(+), 74 deletions(-)

diff --git a/include/net/net.h b/include/net/net.h
index b1f9b35fcca1..6fe5a0aee833 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -57,8 +57,6 @@ typedef bool (HasUfo)(NetClientState *);
 typedef bool (HasUso)(NetClientState *);
 typedef bool (HasVnetHdr)(NetClientState *);
 typedef bool (HasVnetHdrLen)(NetClientState *, int);
-typedef bool (GetUsingVnetHdr)(NetClientState *);
-typedef void (UsingVnetHdr)(NetClientState *, bool);
 typedef void (SetOffload)(NetClientState *, int, int, int, int, int, int, int);
 typedef int (GetVnetHdrLen)(NetClientState *);
 typedef void (SetVnetHdrLen)(NetClientState *, int);
@@ -88,10 +86,7 @@ typedef struct NetClientInfo {
 HasUso *has_uso;
 HasVnetHdr *has_vnet_hdr;
 HasVnetHdrLen *has_vnet_hdr_len;
-GetUsingVnetHdr *get_using_vnet_hdr;
-UsingVnetHdr *using_vnet_hdr;
 SetOffload *set_offload;
-GetVnetHdrLen *get_vnet_hdr_len;
 SetVnetHdrLen *set_vnet_hdr_len;
 SetVnetLE *set_vnet_le;
 SetVnetBE *set_vnet_be;
@@ -194,8 +189,6 @@ bool qemu_has_ufo(NetClientState *nc);
 bool qemu_has_uso(NetClientState *nc);
 bool qemu_has_vnet_hdr(NetClientState *nc);
 bool qemu_has_vnet_hdr_len(NetClientState *nc, int len);
-bool qemu_get_using_vnet_hdr(NetClientState *nc);
-void qemu_using_vnet_hdr(NetClientState *nc, bool enable);
 void qemu_set_offload(NetClientState *nc, int csum, int tso4, int tso6,
   int ecn, int ufo, int uso4, int uso6);
 int qemu_get_vnet_hdr_len(NetClientState *nc);
diff --git a/hw/net/e1000e.c b/hw/net/e1000e.c
index 7c6f6029518c..d0dde767f6aa 100644
--- a/hw/net/e1000e.c
+++ b/hw/net/e1000e.c
@@ -352,7 +352,6 @@ e1000e_init_net_peer(E1000EState *s, PCIDevice *pci_dev, 
uint8_t *macaddr)
 for (i = 0; i < s->conf.peers.queues; i++) {
 nc = qemu_get_subqueue(s->nic, i);
 qemu_set_vnet_hdr_len(nc->peer, sizeof(struct virtio_net_hdr));
-qemu_using_vnet_hdr(nc->peer, true);
 }
 }
 
diff --git a/hw/net/igb.c b/hw/net/igb.c
index 9b37523d6df8..1224c7ba8e38 100644
--- a/hw/net/igb.c
+++ b/hw/net/igb.c
@@ -349,7 +349,6 @@ igb_init_net_peer(IGBState *s, PCIDevice *pci_dev, uint8_t 
*macaddr)
 for (i = 0; i < s->conf.peers.queues; i++) {
 nc = qemu_get_subqueue(s->nic, i);
 qemu_set_vnet_hdr_len(nc->peer, sizeof(struct virtio_net_hdr));
-qemu_using_vnet_hdr(nc->peer, true);
 }
 }
 
diff --git a/hw/net/net_tx_pkt.c b/hw/net/net_tx_pkt.c
index 2134a18c4c90..903238dca24d 100644
--- a/hw/net/net_tx_pkt.c
+++ b/hw/net/net_tx_pkt.c
@@ -578,7 +578,7 @@ static void net_tx_pkt_sendv(
 {
 NetClientState *nc = opaque;
 
-if (qemu_get_using_vnet_hdr(nc->peer)) {
+if (qemu_get_vnet_hdr_len(nc->peer)) {
 qemu_sendv_packet(nc, virt_iov, virt_iov_cnt);
 } else {
 qemu_sendv_packet(nc, iov, iov_cnt);
@@ -808,7 +808,7 @@ static bool net_tx_pkt_do_sw_fragmentation(struct NetTxPkt 
*pkt,
 
 bool net_tx_pkt_send(struct NetTxPkt *pkt, NetClientState *nc)
 {
-bool offload = qemu_get_using_vnet_hdr(nc->peer);
+bool offload = qemu_get_vnet_hdr_len(nc->peer);
 return net_tx_pkt_send_custom(pkt, offload, net_tx_pkt_sendv, nc);
 }
 
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 58014a92ad19..f6112c0ac97d 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -3794,9 +3794,6 @@ static void virtio_net_device_realize(DeviceState *dev, 
Error **errp)
 
 peer_test_vnet_hdr(n);
 if (peer_has_vnet_hdr(n)) {
-for (i = 0; i < n->max_queue_pairs; i++) {
-qemu_using_vnet_hdr(qemu_get_subqueue(n->nic, i)->peer, true);
-}
 n->host_hdr_len = sizeof(struct virtio_net_hdr);
 } else {
 n->host_hdr_len = 0;
diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
index 707487c63666..63a91877730d 100644
--- a/hw/net/vmxnet3.c
+++ b/hw/net/vmxnet3.c
@@ -2091,8 +2091,6 @@ static void vmxnet3_net_init(VMXNET3State *s)
 if (s->peer_has_vhdr) {
 qemu_set_vnet_hdr_len(qemu_get_queue(s->nic)->peer,
 sizeof(struct virtio_net_hdr));
-
-qemu_using_vnet_hdr(qemu_get_queue(s->nic)->peer, 1);
 }
 
 qemu_format_nic_info_str(qemu_get_queue(s->nic

[PATCH v10 04/18] net: Remove receive_raw()

2024-04-28 Thread Akihiko Odaki
While netmap implements the virtio-net header, it does not implement
receive_raw(). Instead of implementing receive_raw() for netmap, add the
virtio-net header in the common code and use receive_iov()/receive()
instead. This also fixes the buffer size for the virtio-net header.

Fixes: fbbdbddec0 ("tap: allow extended virtio header with hash info")
Signed-off-by: Akihiko Odaki 
---
 include/net/net.h |  1 -
 net/net.c | 18 --
 net/tap.c |  1 -
 3 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/include/net/net.h b/include/net/net.h
index 6fe5a0aee833..c8f679761bf9 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -72,7 +72,6 @@ typedef struct NetClientInfo {
 NetClientDriver type;
 size_t size;
 NetReceive *receive;
-NetReceive *receive_raw;
 NetReceiveIOV *receive_iov;
 NetCanReceive *can_receive;
 NetStart *start;
diff --git a/net/net.c b/net/net.c
index db096765f4b2..6938da05e077 100644
--- a/net/net.c
+++ b/net/net.c
@@ -787,11 +787,7 @@ static ssize_t nc_sendv_compat(NetClientState *nc, const 
struct iovec *iov,
 offset = iov_to_buf(iov, iovcnt, 0, buf, offset);
 }
 
-if (flags & QEMU_NET_PACKET_FLAG_RAW && nc->info->receive_raw) {
-ret = nc->info->receive_raw(nc, buffer, offset);
-} else {
-ret = nc->info->receive(nc, buffer, offset);
-}
+ret = nc->info->receive(nc, buffer, offset);
 
 g_free(buf);
 return ret;
@@ -806,6 +802,8 @@ static ssize_t qemu_deliver_packet_iov(NetClientState 
*sender,
 MemReentrancyGuard *owned_reentrancy_guard;
 NetClientState *nc = opaque;
 int ret;
+struct virtio_net_hdr_v1_hash vnet_hdr = { };
+g_autofree struct iovec *iov_copy = NULL;
 
 
 if (nc->link_down) {
@@ -824,7 +822,15 @@ static ssize_t qemu_deliver_packet_iov(NetClientState 
*sender,
 owned_reentrancy_guard->engaged_in_io = true;
 }
 
-if (nc->info->receive_iov && !(flags & QEMU_NET_PACKET_FLAG_RAW)) {
+if ((flags & QEMU_NET_PACKET_FLAG_RAW) && nc->vnet_hdr_len) {
+iov_copy = g_new(struct iovec, iovcnt + 1);
+iov_copy[0].iov_base = &vnet_hdr;
+iov_copy[0].iov_len =  nc->vnet_hdr_len;
+memcpy(&iov_copy[1], iov, iovcnt * sizeof(*iov));
+iov = iov_copy;
+}
+
+if (nc->info->receive_iov) {
 ret = nc->info->receive_iov(nc, iov, iovcnt);
 } else {
 ret = nc_sendv_compat(nc, iov, iovcnt, flags);
diff --git a/net/tap.c b/net/tap.c
index 49edf6c2b6e1..99c59ee46881 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -360,7 +360,6 @@ static NetClientInfo net_tap_info = {
 .type = NET_CLIENT_DRIVER_TAP,
 .size = sizeof(TAPState),
 .receive = tap_receive,
-.receive_raw = tap_receive_raw,
 .receive_iov = tap_receive_iov,
 .poll = tap_poll,
 .cleanup = tap_cleanup,

-- 
2.44.0




Re: [PATCH v9 10/11] virtio-gpu: Support Venus context

2024-04-27 Thread Akihiko Odaki

On 2024/04/26 0:45, Dmitry Osipenko wrote:

From: Antonio Caggiano 

Request Venus when initializing VirGL and if vulkan=true flag is set for
virtio-gpu device.


Naming it vulkan is a bit confusing as there is also the GFXSTREAM_VULKAN
capset, though virgl does not support it. I think you can just name it venus.




Re: [PATCH v9 11/11] migration/virtio: Add virtio-gpu section

2024-04-27 Thread Akihiko Odaki

On 2024/04/26 0:45, Dmitry Osipenko wrote:

Document virtio-gpu migration specifics.

Suggested-by: Akihiko Odaki 
Signed-off-by: Dmitry Osipenko 
---
  docs/devel/migration/virtio.rst | 7 +++
  1 file changed, 7 insertions(+)

diff --git a/docs/devel/migration/virtio.rst b/docs/devel/migration/virtio.rst
index 611a18b82151..67f5fcfed196 100644
--- a/docs/devel/migration/virtio.rst
+++ b/docs/devel/migration/virtio.rst
@@ -113,3 +113,10 @@ virtio_load() returned (like e.g. code depending on 
features).
  Any extension of the state being migrated should be done in subsections
  added to the core for compatibility reasons. If transport or device specific
  state is added, core needs to invoke a callback from the new subsection.
+
+VirtIO-GPU migration
+
+VirtIO-GPU doesn't adhere to a common virtio migration scheme. It doesn't
+support save/loading of virtio device state, instead it uses generic device
+migration management on top of the virtio core to save/load GPU state.
+Migration of virgl and rutabaga states not supported.


Sorry for the confusion, but I didn't mean to add a subsection to the
documentation. I intended to refer to the migration data structure called
a "subsection", which is documented in docs/devel/migration/main.rst.


Device-specific information is not worth describing here.
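
To illustrate the "subsection" mechanism meant above (a generic sketch, all
names made up; see docs/devel/migration/main.rst for the real description):

    static bool vgpu_extra_state_needed(void *opaque)
    {
        VirtIOGPU *g = opaque;
        return g != NULL;   /* made-up condition: migrate only when needed */
    }

    static const VMStateDescription vmstate_virtio_gpu_extra = {
        .name = "virtio-gpu/extra-state",
        .version_id = 1,
        .minimum_version_id = 1,
        .needed = vgpu_extra_state_needed,
        .fields = (const VMStateField[]) {
            VMSTATE_UINT32(some_new_field, VirtIOGPU),  /* hypothetical field */
            VMSTATE_END_OF_LIST()
        }
    };
    /* listed in the parent VMStateDescription's .subsections array */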



Re: [PATCH v9 09/11] virtio-gpu: Register capsets dynamically

2024-04-27 Thread Akihiko Odaki

On 2024/04/26 0:45, Dmitry Osipenko wrote:

From: Pierre-Eric Pelloux-Prayer 

virtio_gpu_virgl_get_num_capsets will return "num_capsets", but we can't
assume that capset_index 1 is always VIRGL2 once we support more capsets,
like Venus and DRM capsets. Register capsets dynamically to avoid that problem.

Signed-off-by: Pierre-Eric Pelloux-Prayer 
Signed-off-by: Dmitry Osipenko 
---
  hw/display/virtio-gpu-virgl.c  | 34 +++---
  include/hw/virtio/virtio-gpu.h |  2 ++
  2 files changed, 25 insertions(+), 11 deletions(-)

diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
index de788df155bf..9aa1fd78f1e1 100644
--- a/hw/display/virtio-gpu-virgl.c
+++ b/hw/display/virtio-gpu-virgl.c
@@ -597,19 +597,13 @@ static void virgl_cmd_get_capset_info(VirtIOGPU *g,
  VIRTIO_GPU_FILL_CMD(info);
  
memset(&resp, 0, sizeof(resp));

-if (info.capset_index == 0) {
-resp.capset_id = VIRTIO_GPU_CAPSET_VIRGL;
-virgl_renderer_get_cap_set(resp.capset_id,
-   &resp.capset_max_version,
-   &resp.capset_max_size);
-} else if (info.capset_index == 1) {
-resp.capset_id = VIRTIO_GPU_CAPSET_VIRGL2;
+
+if (info.capset_index < g->capset_ids->len) {
+resp.capset_id = g_array_index(g->capset_ids, uint32_t,
+   info.capset_index);
  virgl_renderer_get_cap_set(resp.capset_id,
   &resp.capset_max_version,
   &resp.capset_max_size);
-} else {
-resp.capset_max_version = 0;
-resp.capset_max_size = 0;
  }
  resp.hdr.type = VIRTIO_GPU_RESP_OK_CAPSET_INFO;
virtio_gpu_ctrl_response(g, cmd, &resp, sizeof(resp));
@@ -1159,12 +1153,30 @@ int virtio_gpu_virgl_init(VirtIOGPU *g)
  return 0;
  }
  
+static void virtio_gpu_virgl_add_capset(VirtIOGPU *g, uint32_t capset_id)

+{
+g_array_append_val(g->capset_ids, capset_id);
+}
+
  int virtio_gpu_virgl_get_num_capsets(VirtIOGPU *g)
  {
  uint32_t capset2_max_ver, capset2_max_size;
+
+if (g->capset_ids) {


Move capset_ids initialization to virtio_gpu_virgl_init() to save this 
conditional. capset_ids also needs to be freed when the device gets 
unrealized.
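
Roughly (sketch only, not tested):

    /* in virtio_gpu_virgl_init(): */
    g->capset_ids = g_array_new(false, false, sizeof(uint32_t));

    /* ... and on the unrealize path: */
    g_array_unref(g->capset_ids);
    g->capset_ids = NULL;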




Re: [PATCH v9 08/11] virtio-gpu: Resource UUID

2024-04-27 Thread Akihiko Odaki

On 2024/04/26 0:45, Dmitry Osipenko wrote:

From: Antonio Caggiano 

Enable resource UUID feature and implement command resource assign UUID.
UUID feature availability is mandatory for Vulkan Venus context.

UUID is intended for sharing dmabufs between virtio devices on host. Qemu
doesn't have second virtio device for sharing, thus a simple stub UUID
implementation is enough. More complete implementation using global UUID
resource table might become interesting for a multi-gpu cases.


This message needs to be updated to clarify that a VM can have a second 
virtio-gpu device but this implementation does not support sharing 
between two virtio-gpu devices.




Re: [PATCH v9 07/11] virtio-gpu: Handle resource blob commands

2024-04-27 Thread Akihiko Odaki

On 2024/04/26 0:45, Dmitry Osipenko wrote:

From: Antonio Caggiano 

Support BLOB resources creation, mapping and unmapping by calling the
new stable virglrenderer 0.10 interface. Only enabled when available and
via the blob config. E.g. -device virtio-vga-gl,blob=true

Signed-off-by: Antonio Caggiano 
Signed-off-by: Xenia Ragiadakou 
Signed-off-by: Huang Rui 
Signed-off-by: Dmitry Osipenko 
---
  hw/display/virtio-gpu-virgl.c | 268 ++
  hw/display/virtio-gpu.c   |   4 +-
  2 files changed, 271 insertions(+), 1 deletion(-)

diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
index 0feaa9f2c52e..73d4acbf1777 100644
--- a/hw/display/virtio-gpu-virgl.c
+++ b/hw/display/virtio-gpu-virgl.c
@@ -26,6 +26,8 @@
  
  struct virtio_gpu_virgl_resource {

  struct virtio_gpu_simple_resource base;
+bool async_unmap_in_progress;


Why is this flag needed?


+MemoryRegion *mr;
  };
  
  static struct virtio_gpu_virgl_resource *

@@ -49,6 +51,120 @@ virgl_get_egl_display(G_GNUC_UNUSED void *cookie)
  }
  #endif
  
+#ifdef HAVE_VIRGL_RESOURCE_BLOB

+struct virtio_gpu_virgl_hostmem_region {
+MemoryRegion mr;
+struct VirtIOGPU *g;
+struct virtio_gpu_virgl_resource *res;
+};
+
+static void virtio_gpu_virgl_resume_cmdq(void *opaque)
+{
+VirtIOGPU *g = opaque;
+
+virtio_gpu_process_cmdq(g);
+}
+
+static void virtio_gpu_virgl_hostmem_region_free(void *obj)
+{
+MemoryRegion *mr = MEMORY_REGION(obj);
+struct virtio_gpu_virgl_hostmem_region *vmr;
+VirtIOGPUBase *b;
+
+vmr = container_of(mr, struct virtio_gpu_virgl_hostmem_region, mr);
+vmr->res->async_unmap_in_progress = false;
+
+b = VIRTIO_GPU_BASE(vmr->g);
+b->renderer_blocked--;
+
+/*
+ * memory_region_unref() may be executed from RCU thread context, while
+ * virglrenderer works only on the main-loop thread that's holding GL
+ * context.
+ */
+aio_bh_schedule_oneshot(qemu_get_aio_context(),
+virtio_gpu_virgl_resume_cmdq, vmr->g);


Use aio_bh_new() and qemu_bh_schedule() instead to save one-time bottom 
half allocation.
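
A rough sketch of that (the cmdq_resume_bh field is an assumption made up for
illustration):

/* At init time, allocate the bottom half once: */
g->cmdq_resume_bh = aio_bh_new(qemu_get_aio_context(),
                               virtio_gpu_virgl_resume_cmdq, g);

/* In virtio_gpu_virgl_hostmem_region_free(), just schedule it: */
qemu_bh_schedule(vmr->g->cmdq_resume_bh);

/* And release it when the device is unrealized: */
qemu_bh_delete(g->cmdq_resume_bh);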



+g_free(vmr);
+}
+
+static int
+virtio_gpu_virgl_map_resource_blob(VirtIOGPU *g,
+   struct virtio_gpu_virgl_resource *res,
+   uint64_t offset)
+{
+struct virtio_gpu_virgl_hostmem_region *vmr;
+VirtIOGPUBase *b = VIRTIO_GPU_BASE(g);
+MemoryRegion *mr;
+uint64_t size;
+void *data;
+int ret;
+
+if (!virtio_gpu_hostmem_enabled(b->conf)) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: hostmem disabled\n", __func__);
+return -EOPNOTSUPP;
+}
+
+ret = virgl_renderer_resource_map(res->base.resource_id, &data, &size);
+if (ret) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: failed to map virgl resource\n",
+  __func__);


Print strerror(-ret) here instead as printing strerror(EOPNOTSUPP) helps 
little when !virtio_gpu_hostmem_enabled(b->conf).
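
That is, something along these lines (sketch):

qemu_log_mask(LOG_GUEST_ERROR, "%s: failed to map virgl resource: %s\n",
              __func__, strerror(-ret));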



+return -ret;
+}
+
+vmr = g_new0(struct virtio_gpu_virgl_hostmem_region, 1);
+vmr->res = res;
+vmr->g = g;
+
+mr = &vmr->mr;
+memory_region_init_ram_ptr(mr, OBJECT(mr), "blob", size, data);
+memory_region_add_subregion(&b->hostmem, offset, mr);
+memory_region_set_enabled(mr, true);
+
+/*
+ * Potentially, MR could outlive the resource if MR's reference is held
+ * outside of virtio-gpu. In order to prevent unmapping resource while
+ * MR is alive, and thus, making the data pointer invalid, we will block
+ * virtio-gpu command processing until MR is fully unreferenced and
+ * released.
+ */
+OBJECT(mr)->free = virtio_gpu_virgl_hostmem_region_free;
+
+res->mr = mr;
+
+return 0;
+}
+
+static bool
+virtio_gpu_virgl_unmap_resource_blob(VirtIOGPU *g,
+ struct virtio_gpu_virgl_resource *res)
+{
+VirtIOGPUBase *b = VIRTIO_GPU_BASE(g);
+
+if (!res->async_unmap_in_progress && res->mr) {
+/* memory region owns self res->mr object and frees it by itself */
+MemoryRegion *mr = res->mr;
+res->mr = NULL;
+
+res->async_unmap_in_progress = true;
+
+/* render will be unblocked when MR is freed */
+b->renderer_blocked++;
+
+memory_region_set_enabled(mr, false);
+memory_region_del_subregion(&b->hostmem, mr);
+object_unparent(OBJECT(mr));
+}
+
+if (res->async_unmap_in_progress) {
+return false;
+}
+
+virgl_renderer_resource_unmap(res->base.resource_id);
+
+return true;
+}
+#endif /* HAVE_VIRGL_RESOURCE_BLOB */
+
  static void virgl_cmd_create_resource_2d(VirtIOGPU *g,
   struct virtio_gpu_ctrl_command *cmd)
  {
@@ -162,6 +278,14 @@ static void virgl_cmd_resource_unref(VirtIOGPU *g,
  return;
  }
  
+if (res->mr || cmd->suspended) {

+bool unmapped = 

Re: [PATCH v8 08/11] virtio-gpu: Handle resource blob commands

2024-04-26 Thread Akihiko Odaki

On 2024/04/24 19:30, Dmitry Osipenko wrote:

On 4/19/24 12:18, Akihiko Odaki wrote:

@@ -61,6 +61,10 @@ struct virtio_gpu_simple_resource {
   int dmabuf_fd;
   uint8_t *remapped;
   +    MemoryRegion *mr;
+    bool async_unmap_completed;
+    bool async_unmap_in_progress;
+


Don't add fields to virtio_gpu_simple_resource but instead create a
struct that embeds virtio_gpu_simple_resource in virtio-gpu-virgl.c.


Please give a justification. I'd rather rename
virtio_gpu_simple_resource s/_simple//. Simple resource already supports
blob and the added fields are directly related to the blob. Don't see
why another struct is needed.



Because mapping is only implemented in virtio-gpu-gl while blob itself 
is implemented also in virtio-gpu.
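
In other words, the gl-only mapping state would live in a wrapper struct
private to virtio-gpu-virgl.c, roughly like this (a sketch; the lookup helper
name is an assumption):

struct virtio_gpu_virgl_resource {
    struct virtio_gpu_simple_resource base; /* common part shared with virtio-gpu.c */
    MemoryRegion *mr;                       /* blob mapping state, gl-specific */
};

static struct virtio_gpu_virgl_resource *
virtio_gpu_virgl_find_resource(VirtIOGPU *g, uint32_t resource_id)
{
    struct virtio_gpu_simple_resource *res;

    res = virtio_gpu_find_resource(g, resource_id);
    if (!res) {
        return NULL;
    }

    return container_of(res, struct virtio_gpu_virgl_resource, base);
}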




Re: [PATCH v8 07/11] virtio-gpu: Support suspension of commands processing

2024-04-26 Thread Akihiko Odaki

On 2024/04/24 18:43, Dmitry Osipenko wrote:

On 4/19/24 11:53, Akihiko Odaki wrote:

On 2024/04/19 4:00, Dmitry Osipenko wrote:

Add new "suspended" flag to virtio_gpu_ctrl_command telling cmd
processor that it should stop processing commands and retry again
next time until flag is unset.

Signed-off-by: Dmitry Osipenko 


This flag shouldn't be added to virtio_gpu_ctrl_command. suspended is
just !finished in virtio-gpu.c. Only virtio_gpu_virgl_process_cmd()
needs the distinction of suspended and !finished so it is not
appropriate to add this flag the common structure.


The VIRTIO_GPU_FILL_CMD() macro returns void and this macro is used by
every function processing commands. Changing process_cmd() to return
bool will require to change all those functions. Not worthwhile to
change it, IMO. >
The flag reflects the exact command status. The !finished + !suspended
means that command is fenced, i.e. these flags don't have exactly same
meaning.


It is not necessary to change the signature of process_cmd(). You can 
just refer to !finished. No need to have the suspended flag.




I'd keep the flag if there are no better suggestions.





Re: [PATCH v2 1/2] net: Provide MemReentrancyGuard * to qemu_new_nic()

2024-04-26 Thread Akihiko Odaki

On 2024/04/24 21:32, Thomas Huth wrote:

On 24/04/2024 12.41, Prasad Pandit wrote:
On Wednesday, 24 April, 2024 at 03:36:01 pm IST, Philippe 
Mathieu-Daudé wrote:

On 1/6/23 05:18, Akihiko Odaki wrote:

Recently MemReentrancyGuard was added to DeviceState to record that the
device is engaging in I/O. The network device backend needs to 
update it

when delivering a packet to a device.
In preparation for such a change, add MemReentrancyGuard * as a
parameter of qemu_new_nic().


A user on IRC asked if this patch is related to/fixes CVE-2021-20255,
any clue?


* CVE-2021-20255 bug: infinite recursion is pointing at a different 
fix patch.

   -> https://bugzilla.redhat.com/show_bug.cgi?id=CVE-2021-20255

* And this patch below has a different issue tagged
   
-> https://lists.nongnu.org/archive/html/qemu-devel/2023-05/msg08312.html

   Fixes: CVE-2023-3019


* They look different: the former is an infinite recursion issue and the
latter is a use-after-free one.


I assume the eepro reentrancy issue has been fixed with:

  https://gitlab.com/qemu-project/qemu/-/issues/556
  i.e.:
  https://gitlab.com/qemu-project/qemu/-/commit/c40ca2301c7603524eaddb5308a3


I agree. Commit c40ca2301c7603524eaddb5308a3 should be what fixed 
CVE-2021-20255, not this patch.


Regards,
Akihiko Odaki



Re: [PATCH v8 11/11] virtio-gpu: Support Venus context

2024-04-19 Thread Akihiko Odaki

On 2024/04/19 4:00, Dmitry Osipenko wrote:

From: Antonio Caggiano 

Request Venus when initializing VirGL and if vulkan=true flag is set for
virtio-gpu device.

Signed-off-by: Antonio Caggiano 
Signed-off-by: Huang Rui 
Signed-off-by: Dmitry Osipenko 
---
  hw/display/virtio-gpu-virgl.c  | 14 ++
  hw/display/virtio-gpu.c| 15 +++
  include/hw/virtio/virtio-gpu.h |  3 +++
  meson.build|  1 +
  4 files changed, 33 insertions(+)

diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
index c0e1ca3ff339..2eac09370b84 100644
--- a/hw/display/virtio-gpu-virgl.c
+++ b/hw/display/virtio-gpu-virgl.c
@@ -1095,6 +1095,11 @@ int virtio_gpu_virgl_init(VirtIOGPU *g)
  flags |= VIRGL_RENDERER_D3D11_SHARE_TEXTURE;
  }
  #endif
+#ifdef VIRGL_RENDERER_VENUS
+if (virtio_gpu_venus_enabled(g->parent_obj.conf)) {
+flags |= VIRGL_RENDERER_VENUS | VIRGL_RENDERER_RENDER_SERVER;
+}
+#endif
  
  ret = virgl_renderer_init(g, flags, &virtio_gpu_3d_cbs);

  if (ret != 0) {
@@ -1138,5 +1143,14 @@ int virtio_gpu_virgl_get_num_capsets(VirtIOGPU *g)
  virtio_gpu_virgl_add_capset(g, VIRTIO_GPU_CAPSET_VIRGL2);
  }
  
+if (virtio_gpu_venus_enabled(g->parent_obj.conf)) {

+virgl_renderer_get_cap_set(VIRTIO_GPU_CAPSET_VENUS,
+   &capset2_max_ver,
+   &capset2_max_size);
+if (capset2_max_size) {


Now capset2_max_ver and capset2_max_size are misnomers as they are used 
not only for VIRTIO_GPU_CAPSET_VIRGL2 but also VIRTIO_GPU_CAPSET_VENUS. 
Just removing the "capset2_" prefix would be fine.



+virtio_gpu_virgl_add_capset(g, VIRTIO_GPU_CAPSET_VENUS);
+}
+}
+
  return g->num_capsets;
  }
diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index fbf5c0e6b8b7..a811a86dd600 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -1496,6 +1496,19 @@ void virtio_gpu_device_realize(DeviceState *qdev, Error 
**errp)
  #endif
  }
  
+if (virtio_gpu_venus_enabled(g->parent_obj.conf)) {

+#ifdef HAVE_VIRGL_VENUS
+if (!virtio_gpu_blob_enabled(g->parent_obj.conf) ||
+!virtio_gpu_hostmem_enabled(g->parent_obj.conf)) {
+error_setg(errp, "venus requires enabled blob and hostmem 
options");
+return;
+}
+#else
+error_setg(errp, "old virglrenderer, venus unsupported");
+return;
+#endif
+}
+
   if (!virtio_gpu_base_device_realize(qdev,
  virtio_gpu_handle_ctrl_cb,
  virtio_gpu_handle_cursor_cb,
@@ -1672,6 +1685,8 @@ static Property virtio_gpu_properties[] = {
  DEFINE_PROP_BIT("blob", VirtIOGPU, parent_obj.conf.flags,
  VIRTIO_GPU_FLAG_BLOB_ENABLED, false),
  DEFINE_PROP_SIZE("hostmem", VirtIOGPU, parent_obj.conf.hostmem, 0),
+DEFINE_PROP_BIT("vulkan", VirtIOGPU, parent_obj.conf.flags,
+VIRTIO_GPU_FLAG_VENUS_ENABLED, false),


This property shouldn't be added here because it is specific to 
virtio-gpu-gl.




Re: [PATCH v8 10/11] virtio-gpu: Register capsets dynamically

2024-04-19 Thread Akihiko Odaki

On 2024/04/19 4:00, Dmitry Osipenko wrote:

From: Pierre-Eric Pelloux-Prayer 

virtio_gpu_virgl_get_num_capsets will return "num_capsets", but we can't
assume that capset_index 1 is always VIRGL2 once we'll support more capsets,
like Venus and DRM capsets. Register capsets dynamically to avoid that problem.

Signed-off-by: Pierre-Eric Pelloux-Prayer 
Signed-off-by: Dmitry Osipenko 
---
  hw/display/virtio-gpu-virgl.c  | 37 ++
  include/hw/virtio/virtio-gpu.h |  3 +++
  2 files changed, 27 insertions(+), 13 deletions(-)

diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
index eee3816b987f..c0e1ca3ff339 100644
--- a/hw/display/virtio-gpu-virgl.c
+++ b/hw/display/virtio-gpu-virgl.c
@@ -558,19 +558,12 @@ static void virgl_cmd_get_capset_info(VirtIOGPU *g,
  VIRTIO_GPU_FILL_CMD(info);
  
  memset(&resp, 0, sizeof(resp));

-if (info.capset_index == 0) {
-resp.capset_id = VIRTIO_GPU_CAPSET_VIRGL;
-virgl_renderer_get_cap_set(resp.capset_id,
-   &resp.capset_max_version,
-   &resp.capset_max_size);
-} else if (info.capset_index == 1) {
-resp.capset_id = VIRTIO_GPU_CAPSET_VIRGL2;
+
+if (info.capset_index < g->num_capsets) {
+resp.capset_id = g->capset_ids[info.capset_index];
  virgl_renderer_get_cap_set(resp.capset_id,
 &resp.capset_max_version,
 &resp.capset_max_size);
-} else {
-resp.capset_max_version = 0;
-resp.capset_max_size = 0;
  }
  resp.hdr.type = VIRTIO_GPU_RESP_OK_CAPSET_INFO;
  virtio_gpu_ctrl_response(g, cmd, &resp.hdr, sizeof(resp));
@@ -1120,12 +1113,30 @@ int virtio_gpu_virgl_init(VirtIOGPU *g)
  return 0;
  }
  
+static void virtio_gpu_virgl_add_capset(VirtIOGPU *g, uint32_t capset_id)

+{
+g->capset_ids = g_realloc_n(g->capset_ids, g->num_capsets + 1,
+sizeof(*g->capset_ids));
+g->capset_ids[g->num_capsets++] = capset_id;
+}
+
  int virtio_gpu_virgl_get_num_capsets(VirtIOGPU *g)
  {
  uint32_t capset2_max_ver, capset2_max_size;
+
+if (g->num_capsets) {
+return g->num_capsets;
+}
+
+/* VIRGL is always supported. */
+virtio_gpu_virgl_add_capset(g, VIRTIO_GPU_CAPSET_VIRGL);
+
  virgl_renderer_get_cap_set(VIRTIO_GPU_CAPSET_VIRGL2,
-  &capset2_max_ver,
-  &capset2_max_size);
+   &capset2_max_ver,
+   &capset2_max_size);
+if (capset2_max_ver) {
+virtio_gpu_virgl_add_capset(g, VIRTIO_GPU_CAPSET_VIRGL2);
+}
  
-return capset2_max_ver ? 2 : 1;

+return g->num_capsets;
  }
diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
index d2a0d542fbb3..3d7d001a85c5 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -218,6 +218,9 @@ struct VirtIOGPU {
  QTAILQ_HEAD(, VGPUDMABuf) bufs;
  VGPUDMABuf *primary[VIRTIO_GPU_MAX_SCANOUTS];
  } dmabuf;
+
+uint32_t *capset_ids;
+uint32_t num_capsets;


Use GArray.



Re: [PATCH v8 09/11] virtio-gpu: Resource UUID

2024-04-19 Thread Akihiko Odaki

On 2024/04/19 4:00, Dmitry Osipenko wrote:

From: Antonio Caggiano 

Enable resource UUID feature and implement command resource assign UUID.
UUID feature availability is mandatory for Vulkan Venus context.

UUID is intended for sharing dmabufs between virtio devices on the host. QEMU
doesn't have a second virtio device for sharing, thus a simple stub UUID
implementation is enough. A more complete implementation using a global UUID
resource table might become interesting for multi-GPU cases.


Isn't it possible to add two virtio-gpu devices even now?

A new subsection should also be added for migration compatibility; see: 
docs/devel/migration/main.rst
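
Such a subsection typically looks like the following (a sketch with assumed
names, not the actual patch; the .needed() callback keeps the migration stream
compatible when the feature is not in use):

static bool virtio_gpu_resource_uuid_needed(void *opaque)
{
    VirtIOGPU *g = opaque;

    return g->has_resource_uuid; /* assumed guard flag */
}

static const VMStateDescription vmstate_virtio_gpu_resource_uuid = {
    .name = "virtio-gpu/resource-uuid",
    .version_id = 1,
    .minimum_version_id = 1,
    .needed = virtio_gpu_resource_uuid_needed,
    .fields = (const VMStateField[]) {
        /* per-resource UUID state would be described here */
        VMSTATE_END_OF_LIST()
    },
};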




Re: [PATCH v8 08/11] virtio-gpu: Handle resource blob commands

2024-04-19 Thread Akihiko Odaki

On 2024/04/19 4:00, Dmitry Osipenko wrote:

From: Antonio Caggiano 

Support BLOB resources creation, mapping and unmapping by calling the
new stable virglrenderer 0.10 interface. Only enabled when available and
via the blob config. E.g. -device virtio-vga-gl,blob=true

Signed-off-by: Antonio Caggiano 
Signed-off-by: Xenia Ragiadakou 
Signed-off-by: Huang Rui 
Signed-off-by: Dmitry Osipenko 
---
  hw/display/virtio-gpu-virgl.c  | 248 +
  hw/display/virtio-gpu.c|   4 +-
  include/hw/virtio/virtio-gpu.h |   4 +
  3 files changed, 255 insertions(+), 1 deletion(-)

diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
index bb9ee1eba9a0..de132b22f554 100644
--- a/hw/display/virtio-gpu-virgl.c
+++ b/hw/display/virtio-gpu-virgl.c
@@ -32,6 +32,102 @@ virgl_get_egl_display(G_GNUC_UNUSED void *cookie)
  }
  #endif
  
+#ifdef HAVE_VIRGL_RESOURCE_BLOB

+struct virtio_gpu_virgl_hostmem_region {
+MemoryRegion mr;
+VirtIOGPUBase *b;
+struct virtio_gpu_simple_resource *res;
+};
+
+static void virtio_gpu_virgl_hostmem_region_free(void *obj)
+{
+MemoryRegion *mr = MEMORY_REGION(obj);
+struct virtio_gpu_virgl_hostmem_region *vmr;
+
+vmr = container_of(mr, struct virtio_gpu_virgl_hostmem_region, mr);
+vmr->res->async_unmap_in_progress = false;
+vmr->res->async_unmap_completed = true;
+vmr->b->renderer_blocked--;


Resume the command queue processing here.


+
+g_free(vmr);
+}
+
+static int
+virtio_gpu_virgl_map_resource_blob(VirtIOGPU *g,
+   struct virtio_gpu_simple_resource *res,
+   uint64_t offset)
+{
+struct virtio_gpu_virgl_hostmem_region *vmr;
+VirtIOGPUBase *b = VIRTIO_GPU_BASE(g);
+uint64_t size;
+void *data;
+int ret;
+
+if (!virtio_gpu_hostmem_enabled(b->conf)) {
+return -EOPNOTSUPP;


Log a message here instead of picking an error number.


+}
+
+ret = virgl_renderer_resource_map(res->resource_id, &data, &size);
+if (ret) {
+return -ret;
+}
+
+vmr = g_new0(struct virtio_gpu_virgl_hostmem_region, 1);
+MemoryRegion *mr = &vmr->mr;


Mixed declarations are not allowed; see: docs/devel/style.rst


+vmr->res = res;
+vmr->b = b;
+
+memory_region_init_ram_ptr(mr, OBJECT(mr), "blob", size, data);
+memory_region_add_subregion(&b->hostmem, offset, mr);
+memory_region_set_enabled(mr, true);
+
+/*
+ * Potentially, MR could outlive the resource if MR's reference is held
+ * outside of virtio-gpu. In order to prevent unmapping resource while
+ * MR is alive, and thus, making the data pointer invalid, we will block
+ * virtio-gpu command processing until MR is fully unreferenced and
+ * released.
+ */
+OBJECT(mr)->free = virtio_gpu_virgl_hostmem_region_free;
+
+res->mr = mr;
+
+return 0;
+}
+
+static bool
+virtio_gpu_virgl_unmap_resource_blob(VirtIOGPU *g,
+ struct virtio_gpu_simple_resource *res)
+{
+VirtIOGPUBase *b = VIRTIO_GPU_BASE(g);
+
+if (!res->async_unmap_in_progress && !res->async_unmap_completed) {
+/* memory region owns self res->mr object and frees it by itself */
+MemoryRegion *mr = res->mr;
+res->mr = NULL;
+
+res->async_unmap_in_progress = true;
+
+/* render will be unblocked when MR is freed */
+b->renderer_blocked++;
+
+memory_region_set_enabled(mr, false);
+memory_region_del_subregion(&b->hostmem, mr);
+object_unparent(OBJECT(mr));
+}
+
+if (!res->async_unmap_completed) {


This check is unnecessary as the command processing is blocked until the 
unmap operation completes.



+return false;
+}
+
+virgl_renderer_resource_unmap(res->resource_id);
+res->async_unmap_completed = false;
+
+return true;
+
+}
+#endif /* HAVE_VIRGL_RESOURCE_BLOB */
+
  static void virgl_cmd_create_resource_2d(VirtIOGPU *g,
   struct virtio_gpu_ctrl_command *cmd)
  {
@@ -145,6 +241,14 @@ static void virgl_cmd_resource_unref(VirtIOGPU *g,
  return;
  }
  
+if (res->mr || cmd->suspended) {

+bool unmapped = virtio_gpu_virgl_unmap_resource_blob(g, res);
+cmd->suspended = !unmapped;
+if (cmd->suspended) {
+return;
+}
+}
+
  virgl_renderer_resource_detach_iov(unref.resource_id,
  &res_iovs,
  &num_iovs);
@@ -495,6 +599,141 @@ static void virgl_cmd_get_capset(VirtIOGPU *g,
  }
  
  #ifdef HAVE_VIRGL_RESOURCE_BLOB

+static void virgl_cmd_resource_create_blob(VirtIOGPU *g,
+   struct virtio_gpu_ctrl_command *cmd)
+{
+struct virgl_renderer_resource_create_blob_args virgl_args = { 0 };
+struct virtio_gpu_resource_create_blob cblob;
+struct virtio_gpu_simple_resource *res;
+int ret;
+
+if 

Re: [PATCH v8 07/11] virtio-gpu: Support suspension of commands processing

2024-04-19 Thread Akihiko Odaki

On 2024/04/19 4:00, Dmitry Osipenko wrote:

Add new "suspended" flag to virtio_gpu_ctrl_command telling cmd
processor that it should stop processing commands and retry again
next time until flag is unset.

Signed-off-by: Dmitry Osipenko 


This flag shouldn't be added to virtio_gpu_ctrl_command. suspended is 
just !finished in virtio-gpu.c. Only virtio_gpu_virgl_process_cmd() 
needs the distinction of suspended and !finished so it is not 
appropriate to add this flag the common structure.


Regards,
Akihiko Odaki



Re: [PATCH v9 13/20] virtio-net: Return an error when vhost cannot enable RSS

2024-04-16 Thread Akihiko Odaki

On 2024/04/16 13:00, Jason Wang wrote:

On Mon, Apr 15, 2024 at 10:05 PM Yuri Benditovich
 wrote:


On Wed, Apr 3, 2024 at 2:11 PM Akihiko Odaki  wrote:


vhost requires eBPF for RSS. When eBPF is not available, virtio-net
implicitly disables RSS even if the user explicitly requests it. Return
an error instead of implicitly disabling RSS if RSS is requested but not
available.

Signed-off-by: Akihiko Odaki 
---
  hw/net/virtio-net.c | 97 ++---
  1 file changed, 48 insertions(+), 49 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 61b49e335dea..3d53eba88cfc 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -793,9 +793,6 @@ static uint64_t virtio_net_get_features(VirtIODevice *vdev, 
uint64_t features,
  return features;
  }

-if (!ebpf_rss_is_loaded(&n->ebpf_rss)) {
-virtio_clear_feature(&features, VIRTIO_NET_F_RSS);
-}
  features = vhost_net_get_features(get_vhost_net(nc->peer), features);
  vdev->backend_features = features;

@@ -3591,6 +3588,50 @@ static bool failover_hide_primary_device(DeviceListener 
*listener,
  return qatomic_read(&n->failover_primary_hidden);
  }

+static void virtio_net_device_unrealize(DeviceState *dev)
+{
+VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+VirtIONet *n = VIRTIO_NET(dev);
+int i, max_queue_pairs;
+
+if (virtio_has_feature(n->host_features, VIRTIO_NET_F_RSS)) {
+virtio_net_unload_ebpf(n);
+}
+
+/* This will stop vhost backend if appropriate. */
+virtio_net_set_status(vdev, 0);
+
+g_free(n->netclient_name);
+n->netclient_name = NULL;
+g_free(n->netclient_type);
+n->netclient_type = NULL;
+
+g_free(n->mac_table.macs);
+g_free(n->vlans);
+
+if (n->failover) {
+qobject_unref(n->primary_opts);
+device_listener_unregister(&n->primary_listener);
+migration_remove_notifier(&n->migration_state);
+} else {
+assert(n->primary_opts == NULL);
+}
+
+max_queue_pairs = n->multiqueue ? n->max_queue_pairs : 1;
+for (i = 0; i < max_queue_pairs; i++) {
+virtio_net_del_queue(n, i);
+}
+/* delete also control vq */
+virtio_del_queue(vdev, max_queue_pairs * 2);
+qemu_announce_timer_del(&n->announce_timer, false);
+g_free(n->vqs);
+qemu_del_nic(n->nic);
+virtio_net_rsc_cleanup(n);
+g_free(n->rss_data.indirections_table);
+net_rx_pkt_uninit(n->rx_pkt);
+virtio_cleanup(vdev);
+}
+
  static void virtio_net_device_realize(DeviceState *dev, Error **errp)
  {
  VirtIODevice *vdev = VIRTIO_DEVICE(dev);
@@ -3760,53 +3801,11 @@ static void virtio_net_device_realize(DeviceState *dev, 
Error **errp)

  net_rx_pkt_init(&n->rx_pkt);

-if (virtio_has_feature(n->host_features, VIRTIO_NET_F_RSS)) {
-virtio_net_load_ebpf(n);
-}
-}
-
-static void virtio_net_device_unrealize(DeviceState *dev)
-{
-VirtIODevice *vdev = VIRTIO_DEVICE(dev);
-VirtIONet *n = VIRTIO_NET(dev);
-int i, max_queue_pairs;
-
-if (virtio_has_feature(n->host_features, VIRTIO_NET_F_RSS)) {
-virtio_net_unload_ebpf(n);
+if (virtio_has_feature(n->host_features, VIRTIO_NET_F_RSS) &&
+!virtio_net_load_ebpf(n) && get_vhost_net(nc->peer)) {
+virtio_net_device_unrealize(dev);
+error_setg(errp, "Can't load eBPF RSS for vhost");
  }


As I already mentioned, I think this is an extremely bad idea to
fail to run qemu due to such a reason as .absence of one feature.
What I suggest is:
1. Redefine rss as tri-state (off|auto|on)
2. Fail to run only if rss is on and not available via ebpf
3. On auto - silently drop it


"Auto" might be promatic for migration compatibility which is hard to
be used by management layers like libvirt. The reason is that there's
no way for libvirt to know if it is supported by device or not.


Certainly auto is not good for migration, but it is useful in other
situations. You can still set "on" or "off" if you care about migration. I'll
add "auto" support in the next version.




Thanks


4. The same with 'hash' option - it is not compatible with vhost (at
least at the moment)
5. Reformat the patch as it is hard to review it due to replacing
entire procedures, i.e. one patch with replacing without changes,
another one - with real changes. If this is hard to review only for me -
please ignore that.


I'll split this patch accordingly in the next version.

Regards,
Akihiko Odaki



Re: [PATCH v7 07/10] virtio-gpu: Handle resource blob commands

2024-04-15 Thread Akihiko Odaki

On 2024/04/15 17:49, Dmitry Osipenko wrote:

On 4/15/24 11:13, Akihiko Odaki wrote:

On 2024/04/15 17:03, Dmitry Osipenko wrote:

Hello,

On 4/13/24 14:57, Akihiko Odaki wrote:
...

+static void
+virtio_gpu_virgl_unmap_resource_blob(VirtIOGPU *g,
+ struct
virtio_gpu_simple_resource *res)
+{
+    VirtIOGPUBase *b = VIRTIO_GPU_BASE(g);
+
+    if (!res->mr) {
+    return;
+    }
+
+    memory_region_set_enabled(res->mr, false);
+    memory_region_del_subregion(&b->hostmem, res->mr);
+
+    /* memory region owns res->mr object and frees it when mr is
released */
+    res->mr = NULL;
+
+    virgl_renderer_resource_unmap(res->resource_id);


Hi,

First, thanks for keeping working on this.

This patch has some changes since the previous version, but it is still
vulnerable to the race condition pointed out. The memory region is
asynchronously unmapped from the guest address space, but the backing
memory on the host address space is unmapped synchronously before that.
This results in use-after-free. The whole unmapping operation needs to
be implemented in an asynchronous manner.


Thanks for the clarification! I missed this point from the previous
discussion.

Could you please clarify what do you mean by the "asynchronous manner"?
Virglrenderer API works only in the virtio-gpu-gl context, it can't be
accessed from other places.

The memory_region_del_subregion() should remove the region as long as
nobody references it, isn't it? On Linux guest nobody should reference
hostmem regions besides virtio-gpu device on the unmap, don't know about
other guests.

We can claim it a guest's fault if MR lives after the deletion and in
that case exit Qemu with a noisy error msg or leak resource. WDYT?



We need to be prepared for a faulty guest for reliability and security
as they are common goals of virtualization, and it is nice to have them
after all.

You need to call virgl_renderer_resource_unmap() after the MR actually
gets freed. The virtio-gpu-gl context is just a context with BQL so it
is fine to call virgl functions in most places.


Do you have example of a legit use-case where hostmem MR could outlive
resource mapping?


MR outliving after memory_region_del_subregion() is not a use-case, but 
a situation that occurs due to the limitation of the memory subsystem. 
It is not easy to answer how often such a situation happens.




Turning it into an error condition is much more reasonable to do than to
worry about an edge case that nobody cares about, which can't be tested
easily and is not trivial to support, IMO.

I'm not sure what you mean by turning into an error condition. I doubt 
it's possible to emit errors when someone touches an unmapped region.


Reproducing this issue is not easy, as is often the case for
use-after-free bugs, but fixing it is not that complicated in my opinion
since you already have an implementation which asynchronously unmaps the
region in v6. Here are my suggestions to fix the problems in v6:


- Remove ref member in virgl_gpu_resource, vres_get_ref(), 
vres_put_ref(), and virgl_resource_unmap().


- Change virtio_gpu_virgl_process_cmd(), 
virgl_cmd_resource_unmap_blob(), and virgl_cmd_resource_unref() to 
return a bool, which tells if the command was processed or suspended.


- In virtio_gpu_process_cmdq(), break if the command was suspended.

- In virgl_resource_blob_async_unmap(), call virtio_gpu_gl_block(g, false).

- In virgl_cmd_resource_unmap_blob() and virgl_cmd_resource_unref(), 
call memory_region_del_subregion() and virtio_gpu_gl_block(g, true), and 
tell that the command was suspended if the reference counter of 
MemoryRegion > 0. Free and unmap the MR otherwise.


Regards,
Akihiko Odaki



Re: [PATCH v7 07/10] virtio-gpu: Handle resource blob commands

2024-04-15 Thread Akihiko Odaki

On 2024/04/15 17:03, Dmitry Osipenko wrote:

Hello,

On 4/13/24 14:57, Akihiko Odaki wrote:
...

+static void
+virtio_gpu_virgl_unmap_resource_blob(VirtIOGPU *g,
+ struct
virtio_gpu_simple_resource *res)
+{
+    VirtIOGPUBase *b = VIRTIO_GPU_BASE(g);
+
+    if (!res->mr) {
+    return;
+    }
+
+    memory_region_set_enabled(res->mr, false);
+    memory_region_del_subregion(&b->hostmem, res->mr);
+
+    /* memory region owns res->mr object and frees it when mr is
released */
+    res->mr = NULL;
+
+    virgl_renderer_resource_unmap(res->resource_id);


Hi,

First, thanks for keeping working on this.

This patch has some changes since the previous version, but it is still
vulnerable to the race condition pointed out. The memory region is
asynchronously unmapped from the guest address space, but the backing
memory on the host address space is unmapped synchronously before that.
This results in use-after-free. The whole unmapping operation needs to
be implemented in an asynchronous manner.


Thanks for the clarification! I missed this point from the previous
discussion.

Could you please clarify what do you mean by the "asynchronous manner"?
Virglrenderer API works only in the virtio-gpu-gl context, it can't be
accessed from other places.

The memory_region_del_subregion() should remove the region as long as
nobody references it, isn't it? On Linux guest nobody should reference
hostmem regions besides virtio-gpu device on the unmap, don't know about
other guests.

We can claim it a guest's fault if MR lives after the deletion and in
that case exit Qemu with a noisy error msg or leak resource. WDYT?



We need to be prepared for a faulty guest for reliability and security 
as they are common goals of virtualization, and it is nice to have them 
after all.


You need to call virgl_renderer_resource_unmap() after the MR actually 
gets freed. The virtio-gpu-gl context is just a context with BQL so it 
is fine to call virgl functions in most places.


Also the command response must be delayed in a similar manner. Currently 
virtio_gpu_virgl_process_cmd() synchronously returns command responses 
so this needs to be changed.


Regards,
Akihiko Odaki



Re: [PATCH v9 17/20] ebpf: Fix RSS error handling

2024-04-14 Thread Akihiko Odaki

On 2024/04/13 21:16, Yuri Benditovich wrote:

On Wed, Apr 3, 2024 at 2:12 PM Akihiko Odaki  wrote:


calculate_rss_hash() was using hash value 0 to tell if it calculated
a hash, but the hash value may be 0 on a rare occasion. Have a
distinct bool value for correctness.


This is an interesting question: whether in reality the hash value might
be 0 or not.
On one hand - this seems like a kind of fix.
On the other hand - this adds computation cycles for each packet, and the
corner case that this targets to fix seems hardly reachable, if at all.
An optimistic estimation is 2.5*10^-8 percent of address:address:port triplets.
I would suggest at least to find some proof of the fact that the calculated
hash might be 0 in real case where source addresses are not random.


Your estimation, which suggests a low probability of error, and the
fact that an error in RSS is not fatal and only regresses the
performance, should be sufficient to determine how this patch should be
handled; this patch is nice to have, but is not worth backporting to
stable or merging before the 9.0 release.
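
For reference, the quoted figure matches treating the 32-bit hash as uniform
over random inputs (a simplifying assumption):

P(\mathrm{hash} = 0) \approx 2^{-32} \approx 2.3 \times 10^{-10} = 2.3 \times 10^{-8}\,\%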


Regards,
Akihiko Odaki



Re: [PATCH v7 07/10] virtio-gpu: Handle resource blob commands

2024-04-13 Thread Akihiko Odaki

On 2024/04/11 19:19, Dmitry Osipenko wrote:

From: Antonio Caggiano 

Support BLOB resources creation, mapping and unmapping by calling the
new stable virglrenderer 0.10 interface. Only enabled when available and
via the blob config. E.g. -device virtio-vga-gl,blob=true

Signed-off-by: Antonio Caggiano 
Signed-off-by: Xenia Ragiadakou 
Signed-off-by: Huang Rui 
Signed-off-by: Dmitry Osipenko 
---
  hw/display/virtio-gpu-virgl.c  | 196 +
  hw/display/virtio-gpu.c|   4 +-
  include/hw/virtio/virtio-gpu.h |   1 +
  3 files changed, 200 insertions(+), 1 deletion(-)

diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
index c2057b0c2147..ec63f5d698b7 100644
--- a/hw/display/virtio-gpu-virgl.c
+++ b/hw/display/virtio-gpu-virgl.c
@@ -32,6 +32,55 @@ virgl_get_egl_display(G_GNUC_UNUSED void *cookie)
  }
  #endif
  
+#ifdef HAVE_VIRGL_RESOURCE_BLOB

+static int
+virtio_gpu_virgl_map_resource_blob(VirtIOGPU *g,
+   struct virtio_gpu_simple_resource *res,
+   uint64_t offset)
+{
+VirtIOGPUBase *b = VIRTIO_GPU_BASE(g);
+uint64_t size;
+void *data;
+int ret;
+
+if (!virtio_gpu_hostmem_enabled(b->conf)) {
+return -EOPNOTSUPP;
+}
+
+ret = virgl_renderer_resource_map(res->resource_id, &data, &size);
+if (ret) {
+return -ret;
+}
+
+res->mr = g_new0(MemoryRegion, 1);
+memory_region_init_ram_ptr(res->mr, OBJECT(res->mr), "blob",
+   size, data);
+memory_region_add_subregion(&b->hostmem, offset, res->mr);
+memory_region_set_enabled(res->mr, true);
+
+return 0;
+}
+
+static void
+virtio_gpu_virgl_unmap_resource_blob(VirtIOGPU *g,
+ struct virtio_gpu_simple_resource *res)
+{
+VirtIOGPUBase *b = VIRTIO_GPU_BASE(g);
+
+if (!res->mr) {
+return;
+}
+
+memory_region_set_enabled(res->mr, false);
+memory_region_del_subregion(&b->hostmem, res->mr);
+
+/* memory region owns res->mr object and frees it when mr is released */
+res->mr = NULL;
+
+virgl_renderer_resource_unmap(res->resource_id);


Hi,

First, thanks for keeping working on this.

This patch has some changes since the previous version, but it is still 
vulnerable to the race condition pointed out. The memory region is 
asynchronously unmapped from the guest address space, but the backing 
memory on the host address space is unmapped synchronously before that. 
This results in use-after-free. The whole unmapping operation needs to 
be implemented in an asynchronous manner.


Regards,
Akihiko Odaki


+}
+#endif /* HAVE_VIRGL_RESOURCE_BLOB */
+
  static void virgl_cmd_create_resource_2d(VirtIOGPU *g,
   struct virtio_gpu_ctrl_command *cmd)
  {
@@ -145,6 +194,10 @@ static void virgl_cmd_resource_unref(VirtIOGPU *g,
  return;
  }
  
+#ifdef HAVE_VIRGL_RESOURCE_BLOB

+virtio_gpu_virgl_unmap_resource_blob(g, res);
+#endif
+
  virgl_renderer_resource_detach_iov(unref.resource_id,
 &res_iovs,
 &num_iovs);
@@ -495,6 +548,140 @@ static void virgl_cmd_get_capset(VirtIOGPU *g,
  }
  
  #ifdef HAVE_VIRGL_RESOURCE_BLOB

+static void virgl_cmd_resource_create_blob(VirtIOGPU *g,
+   struct virtio_gpu_ctrl_command *cmd)
+{
+struct virgl_renderer_resource_create_blob_args virgl_args = { 0 };
+struct virtio_gpu_resource_create_blob cblob;
+struct virtio_gpu_simple_resource *res;
+int ret;
+
+if (!virtio_gpu_blob_enabled(g->parent_obj.conf)) {
+cmd->error = VIRTIO_GPU_RESP_ERR_INVALID_PARAMETER;
+return;
+}
+
+VIRTIO_GPU_FILL_CMD(cblob);
+virtio_gpu_create_blob_bswap(&cblob);
+trace_virtio_gpu_cmd_res_create_blob(cblob.resource_id, cblob.size);
+
+if (cblob.resource_id == 0) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: resource id 0 is not allowed\n",
+  __func__);
+cmd->error = VIRTIO_GPU_RESP_ERR_INVALID_RESOURCE_ID;
+return;
+}
+
+res = virtio_gpu_find_resource(g, cblob.resource_id);
+if (res) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: resource already exists %d\n",
+  __func__, cblob.resource_id);
+cmd->error = VIRTIO_GPU_RESP_ERR_INVALID_RESOURCE_ID;
+return;
+}
+
+res = g_new0(struct virtio_gpu_simple_resource, 1);
+res->resource_id = cblob.resource_id;
+res->blob_size = cblob.size;
+res->dmabuf_fd = -1;
+
+if (cblob.blob_mem != VIRTIO_GPU_BLOB_MEM_HOST3D) {
+ret = virtio_gpu_create_mapping_iov(g, cblob.nr_entries, sizeof(cblob),
+cmd, &res->addrs,
+&res->iov, &res->iov_cnt);
+if (!r

Re: [PATCH-for-9.0? v2] hw/net/net_tx_pkt: Fix overrun in update_sctp_checksum()

2024-04-10 Thread Akihiko Odaki

On 2024/04/10 16:04, Philippe Mathieu-Daudé wrote:

If a fragmented packet size is too short, do not try to
calculate its checksum.

Reproduced using:

   $ cat << EOF | qemu-system-i386 -display none -nodefaults \
   -machine q35,accel=qtest -m 32M \
   -device igb,netdev=net0 \
   -netdev user,id=net0 \
   -qtest stdio
   outl 0xcf8 0x8810
   outl 0xcfc 0xe000
   outl 0xcf8 0x8804
   outw 0xcfc 0x06
   write 0xe403 0x1 0x02
   writel 0xe0003808 0x
   write 0xe000381a 0x1 0x5b
   write 0xe000381b 0x1 0x00
   EOF
   Assertion failed: (offset == 0), function iov_from_buf_full, file 
util/iov.c, line 39.
   #1 0x5575e81e952a in iov_from_buf_full qemu/util/iov.c:39:5
   #2 0x5575e6500768 in net_tx_pkt_update_sctp_checksum 
qemu/hw/net/net_tx_pkt.c:144:9
   #3 0x5575e659f3e1 in igb_setup_tx_offloads qemu/hw/net/igb_core.c:478:11
   #4 0x5575e659f3e1 in igb_tx_pkt_send qemu/hw/net/igb_core.c:552:10
   #5 0x5575e659f3e1 in igb_process_tx_desc qemu/hw/net/igb_core.c:671:17
   #6 0x5575e659f3e1 in igb_start_xmit qemu/hw/net/igb_core.c:903:9
   #7 0x5575e659f3e1 in igb_set_tdt qemu/hw/net/igb_core.c:2812:5
   #8 0x5575e657d6a4 in igb_core_write qemu/hw/net/igb_core.c:4248:9

Cc: qemu-sta...@nongnu.org
Reported-by: Zheyu Ma 
Fixes: f199b13bc1 ("igb: Implement Tx SCTP CSO")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2273
Signed-off-by: Philippe Mathieu-Daudé 


Reviewed-by: Akihiko Odaki 


---
Since v1: check at offset 8 (Akihiko)
---
  hw/net/net_tx_pkt.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/hw/net/net_tx_pkt.c b/hw/net/net_tx_pkt.c
index 2134a18c4c..b7b1de816d 100644
--- a/hw/net/net_tx_pkt.c
+++ b/hw/net/net_tx_pkt.c
@@ -141,6 +141,10 @@ bool net_tx_pkt_update_sctp_checksum(struct NetTxPkt *pkt)
  uint32_t csum = 0;
  struct iovec *pl_start_frag = pkt->vec + NET_TX_PKT_PL_START_FRAG;
  
+if (iov_size(pl_start_frag, pkt->payload_frags) < 8 + sizeof(csum)) {

+return false;
+}
+
  if (iov_from_buf(pl_start_frag, pkt->payload_frags, 8, , sizeof(csum)) 
< sizeof(csum)) {
  return false;
  }




Re: [RFC PATCH-for-9.0?] hw/net/net_tx_pkt: Fix overrun in update_sctp_checksum()

2024-04-10 Thread Akihiko Odaki

On 2024/04/10 3:04, Philippe Mathieu-Daudé wrote:

If a fragmented packet size is too short, do not try to
calculate its checksum.

Reproduced using:

   $ cat << EOF | qemu-system-i386 -display none -nodefaults \
   -machine q35,accel=qtest -m 32M \
   -device igb,netdev=net0 \
   -netdev user,id=net0 \
   -qtest stdio
   outl 0xcf8 0x8810
   outl 0xcfc 0xe000
   outl 0xcf8 0x8804
   outw 0xcfc 0x06
   write 0xe403 0x1 0x02
   writel 0xe0003808 0x
   write 0xe000381a 0x1 0x5b
   write 0xe000381b 0x1 0x00
   EOF
   Assertion failed: (offset == 0), function iov_from_buf_full, file 
util/iov.c, line 39.
   #1 0x5575e81e952a in iov_from_buf_full qemu/util/iov.c:39:5
   #2 0x5575e6500768 in net_tx_pkt_update_sctp_checksum 
qemu/hw/net/net_tx_pkt.c:144:9
   #3 0x5575e659f3e1 in igb_setup_tx_offloads qemu/hw/net/igb_core.c:478:11
   #4 0x5575e659f3e1 in igb_tx_pkt_send qemu/hw/net/igb_core.c:552:10
   #5 0x5575e659f3e1 in igb_process_tx_desc qemu/hw/net/igb_core.c:671:17
   #6 0x5575e659f3e1 in igb_start_xmit qemu/hw/net/igb_core.c:903:9
   #7 0x5575e659f3e1 in igb_set_tdt qemu/hw/net/igb_core.c:2812:5
   #8 0x5575e657d6a4 in igb_core_write qemu/hw/net/igb_core.c:4248:9

Reported-by: Zheyu Ma 
Fixes: f199b13bc1 ("igb: Implement Tx SCTP CSO")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2273
Signed-off-by: Philippe Mathieu-Daudé 
---
No clue this makes sense, but avoids the crash...
---
  hw/net/net_tx_pkt.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/hw/net/net_tx_pkt.c b/hw/net/net_tx_pkt.c
index 2134a18c4c..6a8640157f 100644
--- a/hw/net/net_tx_pkt.c
+++ b/hw/net/net_tx_pkt.c
@@ -141,6 +141,10 @@ bool net_tx_pkt_update_sctp_checksum(struct NetTxPkt *pkt)
  uint32_t csum = 0;
  struct iovec *pl_start_frag = pkt->vec + NET_TX_PKT_PL_START_FRAG;
  
+if (iov_size(pl_start_frag, pkt->payload_frags) < sizeof(csum)) {

+return false;
+}
+


iov_from_buf() uses 8 as offset, so we should check if the size >= 8 + 
sizeof(csum).


However I doubt that it is worth to fix. I think it is fine to remove 
the assertion (i.e., remove the requirement that the offset specified 
for iov_from_buf() must be inside iov and instead let the function() 
return 0 in that case).
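
That relaxed variant would look roughly like this (a sketch based on the
existing util/iov.c loop, with the assertion dropped so an out-of-range
offset simply copies nothing):

size_t iov_from_buf_full(const struct iovec *iov, unsigned int iov_cnt,
                         size_t offset, const void *buf, size_t bytes)
{
    size_t done;
    unsigned int i;

    for (i = 0, done = 0; (offset || done < bytes) && i < iov_cnt; i++) {
        if (offset < iov[i].iov_len) {
            size_t len = MIN(iov[i].iov_len - offset, bytes - done);

            memcpy((char *)iov[i].iov_base + offset,
                   (const char *)buf + done, len);
            done += len;
            offset = 0;
        } else {
            offset -= iov[i].iov_len;
        }
    }

    /* No assert(offset == 0): if the offset lies beyond the iovec, the loop
     * exits with done == 0 and the caller sees that nothing was copied. */
    return done;
}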


Regards,
Akihiko Odaki


  if (iov_from_buf(pl_start_frag, pkt->payload_frags, 8, , sizeof(csum)) 
< sizeof(csum)) {
  return false;
  }




Re: [PATCH v9 16/20] virtio-net: Do not write hashes to peer buffer

2024-04-08 Thread Akihiko Odaki

On 2024/04/08 17:06, Yuri Benditovich wrote:

On Mon, Apr 8, 2024 at 10:57 AM Akihiko Odaki  wrote:


On 2024/04/08 16:54, Yuri Benditovich wrote:

On Mon, Apr 8, 2024 at 10:42 AM Akihiko Odaki  wrote:


On 2024/04/08 16:40, Yuri Benditovich wrote:

On Mon, Apr 8, 2024 at 4:30 AM Akihiko Odaki  wrote:


On 2024/04/08 7:09, Yuri Benditovich wrote:

On Wed, Apr 3, 2024 at 2:12 PM Akihiko Odaki  wrote:


The peer buffer is qualified with const and not meant to be modified.


IMHO, this buffer is not so 'const' (although the prototype states so),
it is allocated in net.c
btw, another procedure in this file also modifies the buffer
(work_around_broken_dhclient)


Right but it has a FIXME comment.




It also prevents enabling VIRTIO_NET_F_HASH_REPORT for peers without
virtio-net header support.


Does it mean _this commit_ prevents enabling VIRTIO_NET_F_HASH_REPORT
for peers without
virtio-net header support? Where?


No, but I meant that this patch fixes such a problem.


No, it does not. Such a problem does not exist in the master, the
hash_report feature
is silently dropped in such case:
https://github.com/qemu/qemu/blob/master/hw/net/virtio-net.c#L816


Well, silently dropping VIRTIO_NET_F_HASH_REPORT is not different from
preventing enabling VIRTIO_NET_F_HASH_REPORT, is it?


But how is your patch involved in it? Should this line be removed from
the commit message?


In the master, VIRTIO_NET_F_HASH_REPORT is silently dropped, but this
patch will change it to work without dropping it, which is worth mentioning.

After applying this series of patches the VIRTIO_NET_F_HASH_REPORT is
dropped _the same way_ as in the master


You are right. I forgot that I dropped patch "virtio-net: Do not clear 
VIRTIO_NET_F_HASH_REPORT" with v7. I'll drop the line in the next 
version accordingly. Thanks for pointing that out.


Regards,
Akihiko Odaki



Re: [PATCH v9 16/20] virtio-net: Do not write hashes to peer buffer

2024-04-08 Thread Akihiko Odaki

On 2024/04/08 16:54, Yuri Benditovich wrote:

On Mon, Apr 8, 2024 at 10:42 AM Akihiko Odaki  wrote:


On 2024/04/08 16:40, Yuri Benditovich wrote:

On Mon, Apr 8, 2024 at 4:30 AM Akihiko Odaki  wrote:


On 2024/04/08 7:09, Yuri Benditovich wrote:

On Wed, Apr 3, 2024 at 2:12 PM Akihiko Odaki  wrote:


The peer buffer is qualified with const and not meant to be modified.


IMHO, this buffer is not so 'const' (although the prototype states so),
it is allocated in net.c
btw, another procedure in this file also modifies the buffer
(work_around_broken_dhclient)


Right but it has a FIXME comment.




It also prevents enabling VIRTIO_NET_F_HASH_REPORT for peers without
virtio-net header support.


Does it mean _this commit_ prevents enabling VIRTIO_NET_F_HASH_REPORT
for peers without
virtio-net header support? Where?


No, but I meant that this patch fixes such a problem.


No, it does not. Such a problem does not exist in the master, the
hash_report feature
is silently dropped in such case:
https://github.com/qemu/qemu/blob/master/hw/net/virtio-net.c#L816


Well, silently dropping VIRTIO_NET_F_HASH_REPORT is not different from
preventing enabling VIRTIO_NET_F_HASH_REPORT, is it?


But how is your patch involved in it? Should this line be removed from
the commit message?


In the master, VIRTIO_NET_F_HASH_REPORT is silently dropped, but this 
patch will change it to work without dropping it, which is worth mentioning.


Regards,
Akihiko Odaki



Re: [PATCH v9 16/20] virtio-net: Do not write hashes to peer buffer

2024-04-08 Thread Akihiko Odaki

On 2024/04/08 16:40, Yuri Benditovich wrote:

On Mon, Apr 8, 2024 at 4:30 AM Akihiko Odaki  wrote:


On 2024/04/08 7:09, Yuri Benditovich wrote:

On Wed, Apr 3, 2024 at 2:12 PM Akihiko Odaki  wrote:


The peer buffer is qualified with const and not meant to be modified.


IMHO, this buffer is not so 'const' (although the prototype states so),
it is allocated in net.c
btw, another procedure in this file also modifies the buffer
(work_around_broken_dhclient)


Right but it has a FIXME comment.




It also prevents enabling VIRTIO_NET_F_HASH_REPORT for peers without
virtio-net header support.


Does it mean _this commit_ prevents enabling VIRTIO_NET_F_HASH_REPORT
for peers without
virtio-net header support? Where?


No, but I meant that this patch fixes such a problem.


No, it does not. Such a problem does not exist in the master, the
hash_report feature
is silently dropped in such case:
https://github.com/qemu/qemu/blob/master/hw/net/virtio-net.c#L816


Well, silently dropping VIRTIO_NET_F_HASH_REPORT is not different from 
preventing enabling VIRTIO_NET_F_HASH_REPORT, is it?


Regards,
Akihiko Odaki



Re: [PATCH v9 16/20] virtio-net: Do not write hashes to peer buffer

2024-04-07 Thread Akihiko Odaki

On 2024/04/08 7:09, Yuri Benditovich wrote:

On Wed, Apr 3, 2024 at 2:12 PM Akihiko Odaki  wrote:


The peer buffer is qualified with const and not meant to be modified.


IMHO, this buffer is not so 'const' (although the prototype states so),
it is allocated in net.c
btw, another procedure in this file also modifies the buffer
(work_around_broken_dhclient)


Right but it has a FIXME comment.




It also prevents enabling VIRTIO_NET_F_HASH_REPORT for peers without
virtio-net header support.


Does it mean _this commit_ prevents enabling VIRTIO_NET_F_HASH_REPORT
for peers without
virtio-net header support? Where?


No, but I meant that this patch fixes such a problem.

Regards,
Akihiko Odaki



Re: [PATCH v9 13/20] virtio-net: Return an error when vhost cannot enable RSS

2024-04-07 Thread Akihiko Odaki

On 2024/04/08 6:46, Yuri Benditovich wrote:

On Wed, Apr 3, 2024 at 2:11 PM Akihiko Odaki  wrote:


vhost requires eBPF for RSS. When eBPF is not available, virtio-net
implicitly disables RSS even if the user explicitly requests it. Return
an error instead of implicitly disabling RSS if RSS is requested but not
available.

Signed-off-by: Akihiko Odaki 
---
  hw/net/virtio-net.c | 97 ++---
  1 file changed, 48 insertions(+), 49 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 61b49e335dea..3d53eba88cfc 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -793,9 +793,6 @@ static uint64_t virtio_net_get_features(VirtIODevice *vdev, 
uint64_t features,
  return features;
  }

-if (!ebpf_rss_is_loaded(&n->ebpf_rss)) {
-virtio_clear_feature(&features, VIRTIO_NET_F_RSS);
-}
  features = vhost_net_get_features(get_vhost_net(nc->peer), features);
  vdev->backend_features = features;

@@ -3591,6 +3588,50 @@ static bool failover_hide_primary_device(DeviceListener 
*listener,
  return qatomic_read(&n->failover_primary_hidden);
  }

+static void virtio_net_device_unrealize(DeviceState *dev)
+{
+VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+VirtIONet *n = VIRTIO_NET(dev);
+int i, max_queue_pairs;
+
+if (virtio_has_feature(n->host_features, VIRTIO_NET_F_RSS)) {
+virtio_net_unload_ebpf(n);
+}
+
+/* This will stop vhost backend if appropriate. */
+virtio_net_set_status(vdev, 0);
+
+g_free(n->netclient_name);
+n->netclient_name = NULL;
+g_free(n->netclient_type);
+n->netclient_type = NULL;
+
+g_free(n->mac_table.macs);
+g_free(n->vlans);
+
+if (n->failover) {
+qobject_unref(n->primary_opts);
+device_listener_unregister(&n->primary_listener);
+migration_remove_notifier(&n->migration_state);
+} else {
+assert(n->primary_opts == NULL);
+}
+
+max_queue_pairs = n->multiqueue ? n->max_queue_pairs : 1;
+for (i = 0; i < max_queue_pairs; i++) {
+virtio_net_del_queue(n, i);
+}
+/* delete also control vq */
+virtio_del_queue(vdev, max_queue_pairs * 2);
+qemu_announce_timer_del(&n->announce_timer, false);
+g_free(n->vqs);
+qemu_del_nic(n->nic);
+virtio_net_rsc_cleanup(n);
+g_free(n->rss_data.indirections_table);
+net_rx_pkt_uninit(n->rx_pkt);
+virtio_cleanup(vdev);
+}
+
  static void virtio_net_device_realize(DeviceState *dev, Error **errp)
  {
  VirtIODevice *vdev = VIRTIO_DEVICE(dev);
@@ -3760,53 +3801,11 @@ static void virtio_net_device_realize(DeviceState *dev, 
Error **errp)

  net_rx_pkt_init(&n->rx_pkt);

-if (virtio_has_feature(n->host_features, VIRTIO_NET_F_RSS)) {
-virtio_net_load_ebpf(n);
-}
-}
-
-static void virtio_net_device_unrealize(DeviceState *dev)
-{
-VirtIODevice *vdev = VIRTIO_DEVICE(dev);
-VirtIONet *n = VIRTIO_NET(dev);
-int i, max_queue_pairs;
-
-if (virtio_has_feature(n->host_features, VIRTIO_NET_F_RSS)) {
-virtio_net_unload_ebpf(n);
+if (virtio_has_feature(n->host_features, VIRTIO_NET_F_RSS) &&


I disagree with this change of qemu behavior.
 From my point of view:
- this is not a major problem and it should not be a reason to stop VM execution
- it is enough to disable the RSS feature and continue working. Depending on
   other qemu parameters (number of queues, number of cpus) this might be just
   suboptimal. might be a minor problem and might be not a problem at all


The reasoning is that we shouldn't disable what the user explicitly 
requested. c.f., 
https://lore.kernel.org/all/20231102091717-mutt-send-email-...@kernel.org/



- this change defines rss as _only_ feature whose absence breaks the VM start,
   _all_ other features are dropped silently and only rss is not. Why??


I'm following what QEMU does in the other places rather than what it 
does just in virtio-net. I have pointed out virtio-gpu raises errors in 
such a situation. c.f., 
https://lore.kernel.org/all/8880b6f9-f556-46f7-a191-eeec0fe20...@daynix.com



- the series has a title 'Fixes and improvements' . This is not a fix and not an
   improvement, this is significant behavioral change that should be discussed 
in
   light of future plans regarding rss
- I suggest to remove this change from the series, submit it separately
   and discuss from all the sides


We have already discussed these matters; I responded to all past
replies on the previous versions months ago and had no update after
that. Let's focus on matters that were not previously pointed out.


Regards,
Akihiko Odaki



[PATCH v9 09/20] virtio-net: Copy header only when necessary

2024-04-03 Thread Akihiko Odaki
The copied header is only used for byte swapping.

Signed-off-by: Akihiko Odaki 
---
 hw/net/virtio-net.c | 26 --
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index e33bdbfd84a5..ca0fbf7b7654 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -360,7 +360,8 @@ static void virtio_net_vnet_endian_status(VirtIONet *n, 
uint8_t status)
  * can't do it, we fallback onto fixing the headers in the core
  * virtio-net code.
  */
-n->needs_vnet_hdr_swap = virtio_net_set_vnet_endian(vdev, n->nic->ncs,
+n->needs_vnet_hdr_swap = n->has_vnet_hdr &&
+ virtio_net_set_vnet_endian(vdev, n->nic->ncs,
 queue_pairs, true);
 } else if (virtio_net_started(n, vdev->status)) {
 /* After using the device, we need to reset the network backend to
@@ -2767,7 +2768,7 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q)
 return -EINVAL;
 }
 
-if (n->has_vnet_hdr) {
+if (n->needs_vnet_hdr_swap) {
 if (iov_to_buf(out_sg, out_num, 0, , n->guest_hdr_len) <
 n->guest_hdr_len) {
 virtio_error(vdev, "virtio-net header incorrect");
@@ -2775,19 +2776,16 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q)
 g_free(elem);
 return -EINVAL;
 }
-if (n->needs_vnet_hdr_swap) {
-virtio_net_hdr_swap(vdev, (void *) );
-sg2[0].iov_base = 
-sg2[0].iov_len = n->guest_hdr_len;
-out_num = iov_copy([1], ARRAY_SIZE(sg2) - 1,
-   out_sg, out_num,
-   n->guest_hdr_len, -1);
-if (out_num == VIRTQUEUE_MAX_SIZE) {
-goto drop;
-}
-out_num += 1;
-out_sg = sg2;
+virtio_net_hdr_swap(vdev, (void *) );
+sg2[0].iov_base = 
+sg2[0].iov_len = n->guest_hdr_len;
+out_num = iov_copy([1], ARRAY_SIZE(sg2) - 1, out_sg, out_num,
+   n->guest_hdr_len, -1);
+if (out_num == VIRTQUEUE_MAX_SIZE) {
+goto drop;
 }
+out_num += 1;
+out_sg = sg2;
 }
 /*
  * If host wants to see the guest header as is, we can

-- 
2.44.0




[PATCH v9 12/20] virtio-net: Unify the logic to update NIC state for RSS

2024-04-03 Thread Akihiko Odaki
The code to attach or detach the eBPF program to RSS were duplicated so
unify them into one function to save some code.

Signed-off-by: Akihiko Odaki 
---
 hw/net/virtio-net.c | 90 +
 1 file changed, 36 insertions(+), 54 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 1ac9c06f6865..61b49e335dea 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -1232,18 +1232,6 @@ static int virtio_net_handle_announce(VirtIONet *n, 
uint8_t cmd,
 }
 }
 
-static void virtio_net_detach_epbf_rss(VirtIONet *n);
-
-static void virtio_net_disable_rss(VirtIONet *n)
-{
-if (n->rss_data.enabled) {
-trace_virtio_net_rss_disable();
-}
-n->rss_data.enabled = false;
-
-virtio_net_detach_epbf_rss(n);
-}
-
 static bool virtio_net_attach_ebpf_to_backend(NICState *nic, int prog_fd)
 {
 NetClientState *nc = qemu_get_peer(qemu_get_queue(nic), 0);
@@ -1291,6 +1279,40 @@ static void virtio_net_detach_epbf_rss(VirtIONet *n)
 virtio_net_attach_ebpf_to_backend(n->nic, -1);
 }
 
+static void virtio_net_commit_rss_config(VirtIONet *n)
+{
+if (n->rss_data.enabled) {
+n->rss_data.enabled_software_rss = n->rss_data.populate_hash;
+if (n->rss_data.populate_hash) {
+virtio_net_detach_epbf_rss(n);
+} else if (!virtio_net_attach_epbf_rss(n)) {
+if (get_vhost_net(qemu_get_queue(n->nic)->peer)) {
+warn_report("Can't load eBPF RSS for vhost");
+} else {
+warn_report("Can't load eBPF RSS - fallback to software RSS");
+n->rss_data.enabled_software_rss = true;
+}
+}
+
+trace_virtio_net_rss_enable(n->rss_data.hash_types,
+n->rss_data.indirections_len,
+sizeof(n->rss_data.key));
+} else {
+virtio_net_detach_epbf_rss(n);
+trace_virtio_net_rss_disable();
+}
+}
+
+static void virtio_net_disable_rss(VirtIONet *n)
+{
+if (!n->rss_data.enabled) {
+return;
+}
+
+n->rss_data.enabled = false;
+virtio_net_commit_rss_config(n);
+}
+
 static bool virtio_net_load_ebpf_fds(VirtIONet *n)
 {
 int fds[EBPF_RSS_MAX_FDS] = { [0 ... EBPF_RSS_MAX_FDS - 1] = -1};
@@ -1455,28 +1477,7 @@ static uint16_t virtio_net_handle_rss(VirtIONet *n,
 goto error;
 }
 n->rss_data.enabled = true;
-
-if (!n->rss_data.populate_hash) {
-if (!virtio_net_attach_epbf_rss(n)) {
-/* EBPF must be loaded for vhost */
-if (get_vhost_net(qemu_get_queue(n->nic)->peer)) {
-warn_report("Can't load eBPF RSS for vhost");
-goto error;
-}
-/* fallback to software RSS */
-warn_report("Can't load eBPF RSS - fallback to software RSS");
-n->rss_data.enabled_software_rss = true;
-}
-} else {
-/* use software RSS for hash populating */
-/* and detach eBPF if was loaded before */
-virtio_net_detach_epbf_rss(n);
-n->rss_data.enabled_software_rss = true;
-}
-
-trace_virtio_net_rss_enable(n->rss_data.hash_types,
-n->rss_data.indirections_len,
-temp.b);
+virtio_net_commit_rss_config(n);
 return queue_pairs;
 error:
 trace_virtio_net_rss_error(err_msg, err_value);
@@ -3092,26 +3093,7 @@ static int virtio_net_post_load_device(void *opaque, int 
version_id)
 }
 }
 
-if (n->rss_data.enabled) {
-n->rss_data.enabled_software_rss = n->rss_data.populate_hash;
-if (!n->rss_data.populate_hash) {
-if (!virtio_net_attach_epbf_rss(n)) {
-if (get_vhost_net(qemu_get_queue(n->nic)->peer)) {
-warn_report("Can't post-load eBPF RSS for vhost");
-} else {
-warn_report("Can't post-load eBPF RSS - "
-"fallback to software RSS");
-n->rss_data.enabled_software_rss = true;
-}
-}
-}
-
-trace_virtio_net_rss_enable(n->rss_data.hash_types,
-n->rss_data.indirections_len,
-sizeof(n->rss_data.key));
-} else {
-trace_virtio_net_rss_disable();
-}
+virtio_net_commit_rss_config(n);
 return 0;
 }
 

-- 
2.44.0




[PATCH v9 20/20] ebpf: Add a separate target for skeleton

2024-04-03 Thread Akihiko Odaki
This generalizes the rule to generate the skeleton and allows to add
another.

Signed-off-by: Akihiko Odaki 
---
 tools/ebpf/Makefile.ebpf | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/tools/ebpf/Makefile.ebpf b/tools/ebpf/Makefile.ebpf
index 3391e7ce0898..572ca5987ae6 100755
--- a/tools/ebpf/Makefile.ebpf
+++ b/tools/ebpf/Makefile.ebpf
@@ -1,23 +1,24 @@
-OBJS = rss.bpf.o
+SKELETONS = rss.bpf.skeleton.h
 
 LLVM_STRIP ?= llvm-strip
 CLANG ?= clang
 INC_FLAGS = `$(CLANG) -print-file-name=include`
 EXTRA_CFLAGS ?= -O2 -g -target bpf
 
-all: $(OBJS)
+all: $(SKELETONS)
 
 .PHONY: clean
 
 clean:
-   rm -f $(OBJS)
-   rm -f rss.bpf.skeleton.h
+   rm -f $(SKELETONS) $(SKELETONS:%.skeleton.h=%.o)
 
-$(OBJS):  %.o:%.c
+%.o: %.c
$(CLANG) $(INC_FLAGS) \
 -D__KERNEL__ -D__ASM_SYSREG_H \
 -I../include $(LINUXINCLUDE) \
 $(EXTRA_CFLAGS) -c $< -o $@
$(LLVM_STRIP) -g $@
-   bpftool gen skeleton rss.bpf.o > rss.bpf.skeleton.h
-   cp rss.bpf.skeleton.h ../../ebpf/
+
+%.skeleton.h: %.o
+   bpftool gen skeleton $< > $@
+   cp $@ ../../ebpf/

-- 
2.44.0



