Re:Re: [PATCH v1 1/3] softmmu/physmem: fallback to opening guest RAM file as readonly in a MAP_PRIVATE mapping

2023-08-08 Thread ThinerLogoer
At 2023-08-09 05:01:17, "Peter Xu"  wrote:
>On Mon, Aug 07, 2023 at 09:07:32PM +0200, David Hildenbrand wrote:
>> From: Thiner Logoer 
>> 
>> Users may specify
>> * "-mem-path" or
>> * "-object memory-backend-file,share=off,readonly=off"
>> and expect such COW (MAP_PRIVATE) mappings to work, even if the user
>> does not have write permissions to open the file.
>> 
>> For now, we would always fail in that case, always requiring file write
>> permissions. Let's detect when that failure happens and fallback to opening
>> the file readonly.
>> 
>> Warn the user, since there are other use cases where we want the file to
>> be mapped writable: ftruncate() and fallocate() will fail if the file
>> was not opened with write permissions.
>> 
>> Signed-off-by: Thiner Logoer 
>> Co-developed-by: David Hildenbrand 
>> Signed-off-by: David Hildenbrand 
>> ---
>>  softmmu/physmem.c | 26 ++
>>  1 file changed, 18 insertions(+), 8 deletions(-)
>> 
>> diff --git a/softmmu/physmem.c b/softmmu/physmem.c
>> index 3df73542e1..d1ae694b20 100644
>> --- a/softmmu/physmem.c
>> +++ b/softmmu/physmem.c
>> @@ -1289,8 +1289,7 @@ static int64_t get_file_align(int fd)
>>  static int file_ram_open(const char *path,
>>   const char *region_name,
>>   bool readonly,
>> - bool *created,
>> - Error **errp)
>> + bool *created)
>>  {
>>  char *filename;
>>  char *sanitized_name;
>> @@ -1334,10 +1333,7 @@ static int file_ram_open(const char *path,
>>  g_free(filename);
>>  }
>>  if (errno != EEXIST && errno != EINTR) {
>> -error_setg_errno(errp, errno,
>> - "can't open backing store %s for guest RAM",
>> - path);
>> -return -1;
>> +return -errno;
>>  }
>>  /*
>>   * Try again on EINTR and EEXIST.  The latter happens when
>> @@ -1946,9 +1942,23 @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, 
>> MemoryRegion *mr,
>>  bool created;
>>  RAMBlock *block;
>>  
>> -fd = file_ram_open(mem_path, memory_region_name(mr), readonly, ,
>> -   errp);
>> +fd = file_ram_open(mem_path, memory_region_name(mr), readonly, 
>> );
>> +if (fd == -EACCES && !(ram_flags & RAM_SHARED) && !readonly) {
>> +/*
>> + * We can have a writable MAP_PRIVATE mapping of a readonly file.
>> + * However, some operations like ftruncate() or fallocate() might 
>> fail
>> + * later, let's warn the user.
>> + */
>> +fd = file_ram_open(mem_path, memory_region_name(mr), true, 
>> );
>> +if (fd >= 0) {
>> +warn_report("backing store %s for guest RAM (MAP_PRIVATE) 
>> opened"
>> +" readonly because the file is not writable", 
>> mem_path);
>
>I can understand the use case, but this will be slightly unwanted,
>especially the user doesn't yet have a way to predict when will it happen.
>
>Meanwhile this changes the behavior, is it a concern that someone may want
>to rely on current behavior of failing?
>

I am happy to learn if there is some solid evidence supporting this view.

The target of compatibility is "private+discard" which seems itself pathological
practice in early discussion. The only difference in behavior that might be 
unwanted
in your argument lies in "readonly file+private+discard" failure time. Before 
the
patch it fails early, after the patch it fails later and may does additional 
stuff.
Do you think that a bug reporting mechanism which relies on qemu failure
timing is valid?

If someone argues that "readonly file+private+discard" early failure behavior
is all where their system relies, I would be happy to learn why. Actually this
is much easier to solve outside qemu, by checking memory file permission,
compared to the "readonly+private" alternative plan that requires a btrfs.

In the long run the "private+discard" setup may itself be warned and
deprecated, rather than "private+readonly file" which is supported by
linux kernel.

Current patch is already a compromise considering compatibility, as it
always tries open the file in readwrite mode first, to persist behavior on
"readwrite+private+discard" case. Personally I prefer to try opening the file
readonly first, as this will reduce the risk of opening ram file accidentally
with write permission.

However I can accept a second level of compromise, aka adding 
"-disallow-private-discard"
to qemu argument for at least three qemu releases before "private+discard"
get deprecated and removed, if extreme compatibility enthusiast is involved.
With this argument, readonly private is enabled and private
discard fails immediately when discard request is detected, and without this
argument readonly private is disabled and the behavior is unchanged.
This argument is useful also because it helps
finding out 

Re: [PATCH v4 9/9] docs/system: add basic virtio-gpu documentation

2023-08-08 Thread Akihiko Odaki

On 2023/08/09 11:11, Gurchetan Singh wrote:

This adds basic documentation for virtio-gpu.

Suggested-by: Akihiko Odaki 
Signed-off-by: Gurchetan Singh 
---
v2: - Incorporated suggestions by Akihiko Odaki
 - Listed the currently supported capset_names (Bernard)

v3: - Incorporated suggestions by Akihiko Odaki and Alyssa Ross

v4: - Incorporated suggestions by Akihiko Odaki

  docs/system/device-emulation.rst   |   1 +
  docs/system/devices/virtio-gpu.rst | 115 +
  2 files changed, 116 insertions(+)
  create mode 100644 docs/system/devices/virtio-gpu.rst

diff --git a/docs/system/device-emulation.rst b/docs/system/device-emulation.rst
index 4491c4cbf7..1167f3a9f2 100644
--- a/docs/system/device-emulation.rst
+++ b/docs/system/device-emulation.rst
@@ -91,6 +91,7 @@ Emulated Devices
 devices/nvme.rst
 devices/usb.rst
 devices/vhost-user.rst
+   devices/virtio-gpu.rst
 devices/virtio-pmem.rst
 devices/vhost-user-rng.rst
 devices/canokey.rst
diff --git a/docs/system/devices/virtio-gpu.rst 
b/docs/system/devices/virtio-gpu.rst
new file mode 100644
index 00..d56524270d
--- /dev/null
+++ b/docs/system/devices/virtio-gpu.rst
@@ -0,0 +1,115 @@
+..
+   SPDX-License-Identifier: GPL-2.0
+
+virtio-gpu
+==
+
+This document explains the setup and usage of the virtio-gpu device.
+The virtio-gpu device paravirtualizes the GPU and display controller.
+
+Linux kernel support
+
+
+virtio-gpu requires a guest Linux kernel built with the
+``CONFIG_DRM_VIRTIO_GPU`` option.
+
+QEMU virtio-gpu variants
+
+
+QEMU virtio-gpu device variants come in the following form:
+
+ * ``virtio-vga[-BACKEND]``
+ * ``virtio-gpu[-BACKEND][-INTERFACE]``
+ * ``vhost-user-vga``
+ * ``vhost-user-pci``
+
+**Backends:** QEMU provides a 2D virtio-gpu backend, and two accelerated
+backends: virglrenderer ('gl' device label) and rutabaga_gfx ('rutabaga'
+device label).  There is a vhost-user backend that runs the graphics stack
+in a separate process for improved isolation.
+
+**Interfaces:** QEMU further categorizes virtio-gpu device variants based
+on the interface exposed to the guest. The interfaces can be classified
+into VGA and non-VGA variants. The VGA ones are prefixed with virtio-vga
+or vhost-user-vga while the non-VGA ones are prefixed with virtio-gpu or
+vhost-user-gpu.
+
+The VGA ones always use the PCI interface, but for the non-VGA ones, the
+user can further pick between MMIO or PCI. For MMIO, the user can suffix
+the device name with -device, though vhost-user-gpu does not support MMIO.
+For PCI, the user can suffix it with -pci. Without these suffixes, the
+platform default will be chosen.
+
+This document uses the PCI interface in examples.


I think it's better to omit -pci.

By the way you are not adding the aliases for Rutabaga so please do so. 
You can find the table in: softmmu/qdev-monitor.c



+
+virtio-gpu 2d
+-
+
+The default 2D backend only performs 2D operations. The guest needs to
+employ a software renderer for 3D graphics.
+
+Typically, the software renderer is provided by `Mesa`_ or `SwiftShader`_.
+Mesa's implementations (LLVMpipe, Lavapipe and virgl below) work out of box
+on typical modern Linux distributions.
+
+.. parsed-literal::
+-device virtio-gpu-pci
+
+.. _Mesa: https://www.mesa3d.org/
+.. _SwiftShader: https://github.com/google/swiftshader
+
+virtio-gpu virglrenderer
+
+
+When using virgl accelerated graphics mode in the guest, OpenGL API calls
+are translated into an intermediate representation (see `Gallium3D`_). The
+intermediate representation is communicated to the host and the
+`virglrenderer`_ library on the host translates the intermediate
+representation back to OpenGL API calls.
+
+.. parsed-literal::
+-device virtio-gpu-gl-pci
+
+.. _Gallium3D: https://www.freedesktop.org/wiki/Software/gallium/
+.. _virglrenderer: https://gitlab.freedesktop.org/virgl/virglrenderer/
+
+virtio-gpu rutabaga
+---
+
+virtio-gpu can also leverage `rutabaga_gfx`_ to provide `gfxstream`_
+rendering and `Wayland display passthrough`_.  With the gfxstream rendering
+mode, GLES and Vulkan calls are forwarded to the host with minimal
+modification.
+
+The crosvm book provides directions on how to build a `gfxstream-enabled
+rutabaga`_ and launch a `guest Wayland proxy`_.
+
+This device does require host blob support (``hostmem`` field below). The
+``hostmem`` field specifies the size of virtio-gpu host memory window.
+This is typically between 256M and 8G.
+
+At least one capset (see colon separated ``capset_names`` below) must be
+specified when starting the device.  The currently supported
+``capset_names`` are ``gfxstream-vulkan`` and ``cross-domain`` on Linux
+guests. For Android guests, ``gfxstream-gles`` is also supported.
+
+The device will try to auto-detect the wayland socket path if the
+``cross-domain`` capset name is set.  The user may 

Re: [PATCH v4 6/9] gfxstream + rutabaga: add initial support for gfxstream

2023-08-08 Thread Akihiko Odaki

On 2023/08/09 11:11, Gurchetan Singh wrote:

This adds initial support for gfxstream and cross-domain.  Both
features rely on virtio-gpu blob resources and context types, which
are also implemented in this patch.

gfxstream has a long and illustrious history in Android graphics
paravirtualization.  It has been powering graphics in the Android
Studio Emulator for more than a decade, which is the main developer
platform.

Originally conceived by Jesse Hall, it was first known as "EmuGL" [a].
The key design characteristic was a 1:1 threading model and
auto-generation, which fit nicely with the OpenGLES spec.  It also
allowed easy layering with ANGLE on the host, which provides the GLES
implementations on Windows or MacOS enviroments.

gfxstream has traditionally been maintained by a single engineer, and
between 2015 to 2021, the goldfish throne passed to Frank Yang.
Historians often remark this glorious reign ("pax gfxstreama" is the
academic term) was comparable to that of Augustus and both Queen
Elizabeths.  Just to name a few accomplishments in a resplendent
panoply: higher versions of GLES, address space graphics, snapshot
support and CTS compliant Vulkan [b].

One major drawback was the use of out-of-tree goldfish drivers.
Android engineers didn't know much about DRM/KMS and especially TTM so
a simple guest to host pipe was conceived.

Luckily, virtio-gpu 3D started to emerge in 2016 due to the work of
the Mesa/virglrenderer communities.  In 2018, the initial virtio-gpu
port of gfxstream was done by Cuttlefish enthusiast Alistair Delva.
It was a symbol compatible replacement of virglrenderer [c] and named
"AVDVirglrenderer".  This implementation forms the basis of the
current gfxstream host implementation still in use today.

cross-domain support follows a similar arc.  Originally conceived by
Wayland aficionado David Reveman and crosvm enjoyer Zach Reizner in
2018, it initially relied on the downstream "virtio-wl" device.

In 2020 and 2021, virtio-gpu was extended to include blob resources
and multiple timelines by yours truly, features gfxstream/cross-domain
both require to function correctly.

Right now, we stand at the precipice of a truly fantastic possibility:
the Android Emulator powered by upstream QEMU and upstream Linux
kernel.  gfxstream will then be packaged properfully, and app
developers can even fix gfxstream bugs on their own if they encounter
them.

It's been quite the ride, my friends.  Where will gfxstream head next,
nobody really knows.  I wouldn't be surprised if it's around for
another decade, maintained by a new generation of Android graphics
enthusiasts.

Technical details:
   - Very simple initial display integration: just used Pixman
   - Largely, 1:1 mapping of virtio-gpu hypercalls to rutabaga function
 calls

Next steps for Android VMs:
   - The next step would be improving display integration and UI interfaces
 with the goal of the QEMU upstream graphics being in an emulator
 release [d].

Next steps for Linux VMs for display virtualization:
   - For widespread distribution, someone needs to package Sommelier or the
 wayland-proxy-virtwl [e] ideally into Debian main. In addition, newer
 versions of the Linux kernel come with DRM_VIRTIO_GPU_KMS option,
 which allows disabling KMS hypercalls.  If anyone cares enough, it'll
 probably be possible to build a custom VM variant that uses this display
 virtualization strategy.

[a] https://android-review.googlesource.com/c/platform/development/+/34470
[b] 
https://android-review.googlesource.com/q/topic:%22vulkan-hostconnection-start%22
[c] 
https://android-review.googlesource.com/c/device/generic/goldfish-opengl/+/761927
[d] https://developer.android.com/studio/releases/emulator
[e] https://github.com/talex5/wayland-proxy-virtwl

Signed-off-by: Gurchetan Singh 
Tested-by: Alyssa Ross 
---
v1: Incorported various suggestions by Akihiko Odaki and Bernard Berschow
 - Removed GET_VIRTIO_GPU_GL / GET_RUTABAGA macros
 - Used error_report(..)
 - Used g_autofree to fix leaks on error paths
 - Removed unnecessary casts
 - added virtio-gpu-pci-rutabaga.c + virtio-vga-rutabaga.c files

v2: Incorported various suggestions by Akihiko Odaki, Marc-André Lureau and
 Bernard Berschow:
 - Parenthesis in CHECK macro
 - CHECK_RESULT(result, ..) --> CHECK(!result, ..)
 - delay until g->parent_obj.enable = 1
 - Additional cast fixes
 - initialize directly in virtio_gpu_rutabaga_realize(..)
 - add debug callback to hook into QEMU error's APIs

v3: Incorporated feedback from Akihiko Odaki and Alyssa Ross:
 - Autodetect Wayland socket when not explicitly specified
 - Fix map_blob error paths
 - Add comment why we need both `res` and `resource` in create blob
 - Cast and whitespace fixes
 - Big endian check comes before virtio_gpu_rutabaga_init().
 - VirtIOVGARUTABAGA --> VirtIOVGARutabaga

v4: Incorporated feedback from Akihiko Odaki and Alyssa Ross:
  

Re: [PATCH 08/33] Add structs target_freebsd11_nstat and target_freebsd11_statfs to bsd-user/syscall_defs.h

2023-08-08 Thread Warner Losh
On Tue, Aug 8, 2023 at 10:41 PM Warner Losh  wrote:

>
>
> On Tue, Aug 8, 2023 at 9:12 PM Richard Henderson <
> richard.hender...@linaro.org> wrote:
>
>> On 8/8/23 19:51, Warner Losh wrote:
>> >  > +/* __int32_t  st_lspare; */
>> >
>> > Why commented out?
>> >
>> >
>> > I believe that the element was a padding one 
>> >
>> >  > +struct target_freebsd_timespec st_birthtim; /* time of file
>> creation */
>> >
>> > Does that not place st_birthtim at the wrong place?
>> >
>> >
>> > So this winds up in the right place because there's a hole...
>> >
>> > However, having said that, I don't think it should be commented out.
>> It's not
>> > in the bsd-user branch. And the state of the upstream code is such that
>> we can't
>> > run full tests easily on the system calls, so we're making sure they
>> basically
>> > work, but will run the full regression test once some other changes are
>> made
>> > to allow shared libraries to work (many of the calls in this patch are
>> needed
>> > to make 'hello world' work).
>>
>> I think there is not a hole, because the struct is __packed.
>>
>> (Also, QEMU_PACKED vs __packed?)
>
>
> There's a nstat that's an older stat w/o this field. I'm not entirely sure
> __packed should
> be on either one of these, honestly. nstat is quite old, and I'm not at
> all sure what's up
> with it. I think it dates from a time when there was only i386 and then we
> expanded to
> alpha and needed to 'fix' this interface... Not sure why it got the
> freebsd11_ prefix, so
> I'll have to chase those details down to see if this is an extra cut and
> paste or what.
> That may take a little bit to chase down in the logs and in people's
> memory. I think
> that's the back story. Normally, nstat is only defined in the kernel, and
> this is a binary
> interface from ages ago that we likely don't need to implement, but I need
> to confirm
> that, and make sure rust or go don't have some weird, misguided mistake...
>
> But nstat and stat are supposed to be the same, except nstat omits
> st_spare, so that's
> why it's commented out. stat's supposed to be carefully laid out so packed
> or not
> doesn't matter. But testing that would take a little bit as well.
>

OK. Back in 1998, John Dyson added nstat and friends:
commit 1f5621728039a2009fc163d345508d0ee9fae2e9
Author: John Dyson 
Date:   Mon May 11 03:55:28 1998 +

Fix the futimes/undelete/utrace conflict with other BSD's.  Note that
the only common  usage of utrace (the possible problem with this
commit) is with malloc, so this should be a real problem.  Add
the various NetBSD syscalls that allow full emulation of their
development environment.

such emulation hasn't worked since the ELF cut over in FreeSBD 3.2
in 1999,if it even did before that (there's some code to implement these,
but
it's not at all clear to me it ever worked)... This was all preserved when
FreeBSD 11 converted to 64-bit inodes.

So I guess we should preserve it, but I still need to find out the layout
of nstat vs stat on different platforms (though the only systems we ran
on at the time were i386, so after grubbing around with git blame,
I completely retract my 'it was related to alpha' comments: they were
completely wrong). But even so, I guess it was only ever used in
life fire with a.out to run old netbsd a.out files from 25 years ago and
it's relevance is zero since nobody has used it since then and all
maintenance since then in FreeBSD has been basically useless.

I may try to do a regression build of rust w/o it to see if it's used there
or not...

So today I learned...

Warner


Re: [PATCH 08/33] Add structs target_freebsd11_nstat and target_freebsd11_statfs to bsd-user/syscall_defs.h

2023-08-08 Thread Warner Losh
On Tue, Aug 8, 2023 at 9:12 PM Richard Henderson <
richard.hender...@linaro.org> wrote:

> On 8/8/23 19:51, Warner Losh wrote:
> >  > +/* __int32_t  st_lspare; */
> >
> > Why commented out?
> >
> >
> > I believe that the element was a padding one 
> >
> >  > +struct target_freebsd_timespec st_birthtim; /* time of file
> creation */
> >
> > Does that not place st_birthtim at the wrong place?
> >
> >
> > So this winds up in the right place because there's a hole...
> >
> > However, having said that, I don't think it should be commented out.
> It's not
> > in the bsd-user branch. And the state of the upstream code is such that
> we can't
> > run full tests easily on the system calls, so we're making sure they
> basically
> > work, but will run the full regression test once some other changes are
> made
> > to allow shared libraries to work (many of the calls in this patch are
> needed
> > to make 'hello world' work).
>
> I think there is not a hole, because the struct is __packed.
>
> (Also, QEMU_PACKED vs __packed?)


There's a nstat that's an older stat w/o this field. I'm not entirely sure
__packed should
be on either one of these, honestly. nstat is quite old, and I'm not at all
sure what's up
with it. I think it dates from a time when there was only i386 and then we
expanded to
alpha and needed to 'fix' this interface... Not sure why it got the
freebsd11_ prefix, so
I'll have to chase those details down to see if this is an extra cut and
paste or what.
That may take a little bit to chase down in the logs and in people's
memory. I think
that's the back story. Normally, nstat is only defined in the kernel, and
this is a binary
interface from ages ago that we likely don't need to implement, but I need
to confirm
that, and make sure rust or go don't have some weird, misguided mistake...

But nstat and stat are supposed to be the same, except nstat omits
st_spare, so that's
why it's commented out. stat's supposed to be carefully laid out so packed
or not
doesn't matter. But testing that would take a little bit as well.

Warner


> ~
>


Re: [PATCH] qemu/osdep: Remove fallback for MAP_FIXED_NOREPLACE

2023-08-08 Thread Richard Henderson

On 8/8/23 20:50, Michael Tokarev wrote:

08.08.2023 19:44, Richard Henderson пишет:

In order for our emulation of MAP_FIXED_NOREPLACE to succeed within
linux-user target_mmap, we require a non-zero value.  This does not
require host kernel support, merely the bit being defined.


Hm. Should we add something like

  assert(MAP_FIXED_NOREPLACE != 0)

somewhere?


I wouldn't think so.  Where would a zero come from?


r~




Re: [PATCH] qemu/osdep: Remove fallback for MAP_FIXED_NOREPLACE

2023-08-08 Thread Michael Tokarev

08.08.2023 19:44, Richard Henderson пишет:

In order for our emulation of MAP_FIXED_NOREPLACE to succeed within
linux-user target_mmap, we require a non-zero value.  This does not
require host kernel support, merely the bit being defined.


Hm. Should we add something like

 assert(MAP_FIXED_NOREPLACE != 0)

somewhere?

/mjt



Re: [PATCH] qemu/osdep: Remove fallback for MAP_FIXED_NOREPLACE

2023-08-08 Thread Akihiko Odaki

On 2023/08/09 1:44, Richard Henderson wrote:

In order for our emulation of MAP_FIXED_NOREPLACE to succeed within
linux-user target_mmap, we require a non-zero value.  This does not
require host kernel support, merely the bit being defined.

MAP_FIXED_NOREPLACE was added with glibc 2.28.  From repology.org:

   Fedora 36: 2.35
   CentOS 8 (RHEL-8): 2.28
   Debian 11: 2.31
  OpenSUSE Leap 15.4: 2.31
Ubuntu LTS 20.04: 2.31

Reported-by: Akihiko Odaki 
Signed-off-by: Richard Henderson 


Reviewed-by: Akihiko Odaki 



Re: [PATCH v2 7/7] tcg/ppc: Use prefixed instructions for tcg_out_goto_tb

2023-08-08 Thread Richard Henderson

On 8/8/23 19:56, Jordan Niethe wrote:

On Tue, Aug 8, 2023 at 1:02 PM Richard Henderson
 wrote:


When a direct branch is out of range, we can load the destination for
the indirect branch using PLA (for 16GB worth of buffer) and PLD from
the TranslationBlock for everything larger.

This means the patch affects exactly one instruction: B (plus filler),
PLA or PLD.  Which means we can update and execute the patch atomically.


I think the commit message needs to be updated for Nick's changes.


Whoops, yes.


r~



Re: [PATCH 18/33] Implement stat related syscalls

2023-08-08 Thread Richard Henderson

On 8/8/23 20:07, Warner Losh wrote:

So, that's kinda why. I do agree with you that that would be a better 
structure, but
can we use this structure for upstreaming and once we get the other issues 
worked
out, we can do a restructure... We've moved things around a bit, and I'm also 
waiting
for the NetBSD folks that contacted me to finish their efforts before I pull 
the rug out
from them (or they timeout, which isn't quite yet).


Yes, that's fine.  I understand the pain.

In which case have a

Reviewed-by: Richard Henderson 

for patches 18-33.  My only quibble with all of them related to this.


r~



Re: [PATCH 08/33] Add structs target_freebsd11_nstat and target_freebsd11_statfs to bsd-user/syscall_defs.h

2023-08-08 Thread Richard Henderson

On 8/8/23 19:51, Warner Losh wrote:

 > +    /* __int32_t  st_lspare; */

Why commented out?


I believe that the element was a padding one 

 > +    struct target_freebsd_timespec st_birthtim; /* time of file 
creation */

Does that not place st_birthtim at the wrong place?


So this winds up in the right place because there's a hole...

However, having said that, I don't think it should be commented out. It's not
in the bsd-user branch. And the state of the upstream code is such that we can't
run full tests easily on the system calls, so we're making sure they basically
work, but will run the full regression test once some other changes are made
to allow shared libraries to work (many of the calls in this patch are needed
to make 'hello world' work).


I think there is not a hole, because the struct is __packed.

(Also, QEMU_PACKED vs __packed?)


r~



Re: [PATCH 02/33] Disable clang warnings arising from bsd-user/qemu.h

2023-08-08 Thread Warner Losh
On Tue, Aug 8, 2023 at 9:05 PM Richard Henderson <
richard.hender...@linaro.org> wrote:

> On 8/8/23 19:56, Warner Losh wrote:
> > Looking at this now, This bug is in clang 7, from 2018. For FreeBSD, we
> don't support
> > anything older than clang 12 or 13 However, the bug still exists in
> clang 16, the latest.
> > I believe this was also copied verbatim from linux-user, so let's leave
> it and then make
> > this one of the common things as a followup... ok?
>
> Ah.  In which case this should simply be moved to qemu/compiler.h.
>

OK. That we can do. I'll look into it.

Warner


>
> r~
>


Re: [PATCH 18/33] Implement stat related syscalls

2023-08-08 Thread Warner Losh
On Tue, Aug 8, 2023 at 3:53 PM Richard Henderson <
richard.hender...@linaro.org> wrote:

> On 8/7/23 23:08, Karim Taha wrote:
> > From: Stacey Son 
> >
> > Implement the following syscalls:
> > stat(2)
> > lstat(2)
> > fstat(2)
> > fstatat(2)
> > nstat
> > nfstat
> > nlstat
> >
> > Signed-off-by: Stacey Son 
> > Signed-off-by: Karim Taha 
>
> Why are all of these in os-stat.h instead of os-stat.c?
> Is this attempting to avoid clang's warning for unused static inline
> function in a c file?
>

I asked Stacey about this ages ago (why some things are like this) and he'd
indicated it was for better optimization (inlining) in
do_freebsd_syscall(). Other
inlines are in the .h files so that we can define a common set, and include
that
file and have netbsd/openbsd/freebsd pick and choose which ones to use
without
the warnings. The whole thing is a big mess right now, to be honest, but
I've tried
to avoid making structural changes in this area since I didn't want to
introduce bugs
and we have been chasing some weird corruption bug with threads for ages
(though
it might be related to the vm address space use bug that you fixed
recently, I've not
run things to see if that fixed it or not).

So, that's kinda why. I do agree with you that that would be a better
structure, but
can we use this structure for upstreaming and once we get the other issues
worked
out, we can do a restructure... We've moved things around a bit, and I'm
also waiting
for the NetBSD folks that contacted me to finish their efforts before I
pull the rug out
from them (or they timeout, which isn't quite yet).

Warner


> > +/* stat(2) */
> > +static inline abi_long do_freebsd11_stat(abi_long arg1, abi_long arg2)
> > +{
> > +abi_long ret;
> > +void *p;
> > +struct freebsd11_stat st;
> > +
> > +LOCK_PATH(p, arg1);
> > +ret = get_errno(freebsd11_stat(path(p), ));
> > +UNLOCK_PATH(p, arg1);
> > +if (!is_error(ret)) {
> > +ret = h2t_freebsd11_stat(arg2, );
> > +}
> > +return ret;
> > +}
>
> The patch ordering is poor, because freebsd11_stat is used here but not
> introduced until
> patch 23, and do_freebsd11_stat itself is not used until patch 30.
>
> And yet you delay compilation of os-stat.c until patch 29.  Patch 29 is
> either too early
> or too late, depending on the viewpoint.
>
> If os-stat.c compilation was enabled earlier, it would require you to
> order all of the
> patches such that os-stat.c will always compile.
>
> If os-stat.c compilation was enabled later (indeed last), you would not
> need to mark this
> function 'static inline' in order to avoid unused function warnings prior
> to their use in
> patches 30-33.
>
> I prefer the ordering in which os-stat.c always compiles.  This probably
> requires patches
> 23-27 be ordered first, and patches 30-33 be merged with patches 18-22.
> There is no need
> for *any* of these functions to be marked inline -- leave that choice to
> the compiler.
>
>
> r~
>


Re: [PATCH 03/33] Update the definitions of __put_user and __get_user macros

2023-08-08 Thread Richard Henderson

On 8/8/23 19:41, Warner Losh wrote:

I tried to implement both get and put in terms of this, but found that it broke 
the blitz
branch. So why don't we land this as is for bsd-user and then one of us can try 
to
put it into common-user so as to not create too many waves for our GSoC student 
Kariim.
There's already enough to fix in this series... Sound fair?


Yes, that's fine.


r~



Re: [PATCH 02/33] Disable clang warnings arising from bsd-user/qemu.h

2023-08-08 Thread Richard Henderson

On 8/8/23 19:56, Warner Losh wrote:

Looking at this now, This bug is in clang 7, from 2018. For FreeBSD, we don't 
support
anything older than clang 12 or 13 However, the bug still exists in clang 16, 
the latest.
I believe this was also copied verbatim from linux-user, so let's leave it and 
then make
this one of the common things as a followup... ok?


Ah.  In which case this should simply be moved to qemu/compiler.h.


r~



Re: [PATCH 16/33] Implement host-target convertion functions

2023-08-08 Thread Warner Losh
On Tue, Aug 8, 2023 at 3:39 PM Richard Henderson <
richard.hender...@linaro.org> wrote:

> On 8/7/23 23:07, Karim Taha wrote:
> > From: Stacey Son 
> >
> > Implement the stat converstion functions:
> > target_to_host_fcntl_cmd
> >
> > Signed-off-by: Stacey Son 
> > Signed-off-by: Karim Taha 
> > ---
> >   bsd-user/freebsd/os-stat.c | 71 ++
> >   1 file changed, 71 insertions(+)
>
> Which host / guest pairs have varying fcntl constants?
> I thought freebsd had these standardized...
>

Ah, indeed. This can be an identity wrapper, since the only #ifdefs for
these
values are for visibility. So this whole function can be replaced by

abi_long target_to_host_fcntl_cmd(int cmd)
{
return cmd;
}

Warner


Re: [PATCH 02/33] Disable clang warnings arising from bsd-user/qemu.h

2023-08-08 Thread Warner Losh
On Tue, Aug 8, 2023 at 2:50 PM Richard Henderson <
richard.hender...@linaro.org> wrote:

> On 8/7/23 23:07, Karim Taha wrote:
> > +/*
> > + * Tricky points:
> > + * - Use __builtin_choose_expr to avoid type promotion from ?:,
> > + * - Invalid sizes result in a compile time error stemming from
> > + *   the fact that abort has no parameters.
> > + * - It's easier to use the endian-specific unaligned load/store
> > + *   functions than host-endian unaligned load/store plus tswapN.
> > + * - The pragmas are necessary only to silence a clang false-positive
> > + *   warning: see https://bugs.llvm.org/show_bug.cgi?id=39113 .
> > + * - We have to disable -Wpragmas warnings to avoid a complaint about
> > + *   an unknown warning type from older compilers that don't know about
> > + *   -Waddress-of-packed-member.
> > + * - gcc has bugs in its _Pragma() support in some versions, eg
> > + *   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83256 -- so we only
> > + *   include the warning-suppression pragmas for clang
>
> Perhaps s/in some versions/prior to gcc-13/ ?
> At least that's what the bugzilla says, and it
> will help when auditing for compiler versions
> in a few years when gcc-12 is EOL.
>

Looking at this now, This bug is in clang 7, from 2018. For FreeBSD, we
don't support
anything older than clang 12 or 13 However, the bug still exists in clang
16, the latest.
I believe this was also copied verbatim from linux-user, so let's leave it
and then make
this one of the common things as a followup... ok?

Warner

Either way,
> Reviewed-by: Richard Henderson 
>
> r~
>


Re: [PATCH v2 7/7] tcg/ppc: Use prefixed instructions for tcg_out_goto_tb

2023-08-08 Thread Jordan Niethe
On Tue, Aug 8, 2023 at 1:02 PM Richard Henderson
 wrote:
>
> When a direct branch is out of range, we can load the destination for
> the indirect branch using PLA (for 16GB worth of buffer) and PLD from
> the TranslationBlock for everything larger.
>
> This means the patch affects exactly one instruction: B (plus filler),
> PLA or PLD.  Which means we can update and execute the patch atomically.

I think the commit message needs to be updated for Nick's changes.

>
> Signed-off-by: Richard Henderson 
> ---
>  tcg/ppc/tcg-target.c.inc | 31 +++
>  1 file changed, 19 insertions(+), 12 deletions(-)
>
> diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
> index 63fe4ef995..b686a68247 100644
> --- a/tcg/ppc/tcg-target.c.inc
> +++ b/tcg/ppc/tcg-target.c.inc
> @@ -2646,31 +2646,38 @@ static void tcg_out_goto_tb(TCGContext *s, int which)
>  uintptr_t ptr = get_jmp_target_addr(s, which);
>
>  if (USE_REG_TB) {
> +/*
> + * With REG_TB, we must always use indirect branching,
> + * so that the branch destination and TCG_REG_TB match.
> + */
>  ptrdiff_t offset = tcg_tbrel_diff(s, (void *)ptr);
>  tcg_out_mem_long(s, LD, LDX, TCG_REG_TB, TCG_REG_TB, offset);
> -
> -/* TODO: Use direct branches when possible. */
> -set_jmp_insn_offset(s, which);
>  tcg_out32(s, MTSPR | RS(TCG_REG_TB) | CTR);
> -
>  tcg_out32(s, BCCTR | BO_ALWAYS);
>
>  /* For the unlinked case, need to reset TCG_REG_TB.  */
>  set_jmp_reset_offset(s, which);
>  tcg_out_mem_long(s, ADDI, ADD, TCG_REG_TB, TCG_REG_TB,
>   -tcg_current_code_size(s));
> -} else {
> -/* Direct branch will be patched by tb_target_set_jmp_target. */
> -set_jmp_insn_offset(s, which);
> -tcg_out32(s, NOP);
> +return;
> +}
>
> -/* When branch is out of range, fall through to indirect. */
> +/* Direct branch will be patched by tb_target_set_jmp_target. */
> +set_jmp_insn_offset(s, which);
> +tcg_out32(s, NOP);
> +
> +/* When branch is out of range, fall through to indirect. */
> +if (have_isa_3_10) {
> +ptrdiff_t offset = tcg_pcrel_diff_for_prefix(s, (void *)ptr);
> +tcg_out_8ls_d(s, PLD, TCG_REG_TMP1, 0, offset, 1);
> +} else {
>  tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_TMP1, ptr - (int16_t)ptr);
>  tcg_out_ld(s, TCG_TYPE_PTR, TCG_REG_TMP1, TCG_REG_TMP1, 
> (int16_t)ptr);
> -tcg_out32(s, MTSPR | RS(TCG_REG_TMP1) | CTR);
> -tcg_out32(s, BCCTR | BO_ALWAYS);
> -set_jmp_reset_offset(s, which);
>  }
> +
> +tcg_out32(s, MTSPR | RS(TCG_REG_TMP1) | CTR);
> +tcg_out32(s, BCCTR | BO_ALWAYS);
> +set_jmp_reset_offset(s, which);
>  }
>
>  void tb_target_set_jmp_target(const TranslationBlock *tb, int n,
> --
> 2.34.1
>

Thank you for implementing this Richard.

I was able to boot mttcg guests on P9 and P10 hosts.

Tested-by: Jordan Niethe 
Reviewed-by: Jordan Niethe 



Re: [PATCH 10/33] Add struct target_freebsd_fhandle and fcntl flags to bsd-user/syscall_defs.h

2023-08-08 Thread Warner Losh
On Tue, Aug 8, 2023 at 3:26 PM Richard Henderson <
richard.hender...@linaro.org> wrote:

> On 8/7/23 23:07, Karim Taha wrote:
> > +struct target_freebsd_fid {
> > +u_short fid_len;/* len of data in bytes */
> > +u_short fid_data0;  /* force longword align */
> > +charfid_data[TARGET_MAXFIDSZ];  /* data (variable len) */
>
> uint16_t?
>

These were copied from FreeBSD's sys/mount.h... Changing to uint16_t likely
is a good idea, though. I'll handle the logistics of making changes like
this in
bsd-user upstream with Kariim.

Warner


> Otherwise,
> Acked-by: Richard Henderson 
>


Re: [PATCH 08/33] Add structs target_freebsd11_nstat and target_freebsd11_statfs to bsd-user/syscall_defs.h

2023-08-08 Thread Warner Losh
On Tue, Aug 8, 2023 at 3:24 PM Richard Henderson <
richard.hender...@linaro.org> wrote:

> On 8/7/23 23:07, Karim Taha wrote:
> > +uint32_t   st_flags;/* user defined flags for file */
> > +__uint32_t st_gen;  /* file generation number */
>
> Drop the __.
>
> > +/* __int32_t  st_lspare; */
>
> Why commented out?
>

I believe that the element was a padding one 


> > +struct target_freebsd_timespec st_birthtim; /* time of file
> creation */
>
> Does that not place st_birthtim at the wrong place?
>

So this winds up in the right place because there's a hole...

However, having said that, I don't think it should be commented out. It's
not
in the bsd-user branch. And the state of the upstream code is such that we
can't
run full tests easily on the system calls, so we're making sure they
basically
work, but will run the full regression test once some other changes are made
to allow shared libraries to work (many of the calls in this patch are
needed
to make 'hello world' work).

Warner



>
> r~
>


Re: [PATCH 06/33] Add struct target_freebsd11_stat to bsd-user/syscall_defs

2023-08-08 Thread Warner Losh
On Tue, Aug 8, 2023 at 3:23 PM Richard Henderson <
richard.hender...@linaro.org> wrote:

> On 8/7/23 23:07, Karim Taha wrote:
> > +uint32_t   st_flags;/* user defined flags for file */
> > +__uint32_t st_gen;  /* file generation number */
> > +__int32_t  st_lspare;
>
> Oh, drop the __ types.
>

Agreed. The original from the blitz branch copied from FreeBSD's stat.h
file which
used the __ types to avoid namespace pollution... a constraint we don't
have here.

Warner


Re: [PATCH 03/33] Update the definitions of __put_user and __get_user macros

2023-08-08 Thread Warner Losh
On Tue, Aug 8, 2023 at 7:47 PM Richard Henderson <
richard.hender...@linaro.org> wrote:

> On 8/8/23 18:39, Warner Losh wrote:
> >  > +#define __put_user_e(x, hptr, e)
> \
> >  > +do {
> \
> >  > +PRAGMA_DISABLE_PACKED_WARNING;
> \
> >  > +(__builtin_choose_expr(sizeof(*(hptr)) == 1, stb_p,
>\
> >  > +__builtin_choose_expr(sizeof(*(hptr)) == 2,
> stw_##e##_p,\
> >  > +__builtin_choose_expr(sizeof(*(hptr)) == 4,
> stl_##e##_p,\
> >  > +__builtin_choose_expr(sizeof(*(hptr)) == 8, stq_##e##_p,
> abort  \
> >  > +((hptr), (x)), (void)0);
> \
> >  > +PRAGMA_REENABLE_PACKED_WARNING;
>\
> >  > +} while (0)
> >  > +
> >  > +#define __get_user_e(x, hptr, e)
> \
> >  > +do {
> \
> >  > +PRAGMA_DISABLE_PACKED_WARNING;
> \
> >  > +((x) = (typeof(*hptr))(
>\
> >  > +__builtin_choose_expr(sizeof(*(hptr)) == 1, ldub_p,
>\
> >  > +__builtin_choose_expr(sizeof(*(hptr)) == 2,
> lduw_##e##_p,   \
> >  > +__builtin_choose_expr(sizeof(*(hptr)) == 4,
> ldl_##e##_p,\
> >  > +__builtin_choose_expr(sizeof(*(hptr)) == 8, ldq_##e##_p,
> abort  \
> >  > +(hptr)), (void)0);
> \
> >  > +PRAGMA_REENABLE_PACKED_WARNING;
>\
> >  > +} while (0)
> >
> > Hmm.  I guess this works.  The typeof cast in __get_user_e being
> required when
> > sizeof(x) >
> > sizeof(*hptr) in order to get the correct extension.
> >
> >
> > This code was copied 100% from the current linux-user :)
>
> Ha ha indeed!  I should have known.
>

Yea... It's old-school crazy, and I've done a lot of that in my day, but not
here :)


> > Is it clearer with _Generic?
> >
> >   (x) = _Generic(*(hptr),
> >  int8_t: *(int8_t *)(hptr),
> >  uint8_t: *(uint8_t *)(hptr),
> >  int16_t: (int16_t)lduw_##e##_p(hptr),
> >  uint16_t: lduw_##e##_p(hptr),
> >  int32_t: (int32_t)ldl_##e##_p(hptr),
> >  uint32_t: (uint32_t)ldl_##e##_p(hptr),
> >  int64_t: (int64_t)ldq_##e##_p(hptr),
> >  uint64_t: ldq_##e##_p(hptr));
> >
> > In particular I believe the error message will be much prettier.
> >
> >
> > Indeed. That looks cleaner. I'll see if I can test that against the
> latest bsd-user upstream.
>
> I'll see if we can share this code via common-user.
>

It seems to work, though I still need the pragmas for clang 16 (see another
patch for that).

I tried to implement both get and put in terms of this, but found that it
broke the blitz
branch. So why don't we land this as is for bsd-user and then one of us can
try to
put it into common-user so as to not create too many waves for our GSoC
student Kariim.
There's already enough to fix in this series... Sound fair?

Warner


Re: [PATCH] virtio: don't zero out memory region cache for indirect descriptors

2023-08-08 Thread Jason Wang
On Tue, Aug 8, 2023 at 6:28 AM Ilya Maximets  wrote:
>
> Lots of virtio functions that are on a hot path in data transmission
> are initializing indirect descriptor cache at the point of stack
> allocation.  It's a 112 byte structure that is getting zeroed out on
> each call adding unnecessary overhead.  It's going to be correctly
> initialized later via special init function.  The only reason to
> actually initialize right away is the ability to safely destruct it.
> However, we only need to destruct it when it was used, i.e. when a
> desc_cache points to it.
>
> Removing these unnecessary stack initializations improves throughput
> of virtio-net devices in terms of 64B packets per second by 6-14 %
> depending on the case.  Tested with a proposed af-xdp network backend
> and a dpdk testpmd application in the guest, but should be beneficial
> for other virtio devices as well.
>
> Signed-off-by: Ilya Maximets 

Acked-by: Jason Wang 

Btw, we can probably remove MEMORY_REGION_CACHE_INVALID.

Thanks

> ---
>  hw/virtio/virtio.c | 42 +++---
>  1 file changed, 27 insertions(+), 15 deletions(-)
>
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index 309038fd46..a65396e616 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -1071,7 +1071,8 @@ static void virtqueue_split_get_avail_bytes(VirtQueue 
> *vq,
>  VirtIODevice *vdev = vq->vdev;
>  unsigned int idx;
>  unsigned int total_bufs, in_total, out_total;
> -MemoryRegionCache indirect_desc_cache = MEMORY_REGION_CACHE_INVALID;
> +MemoryRegionCache indirect_desc_cache;
> +MemoryRegionCache *desc_cache = NULL;
>  int64_t len = 0;
>  int rc;
>
> @@ -1079,7 +1080,6 @@ static void virtqueue_split_get_avail_bytes(VirtQueue 
> *vq,
>  total_bufs = in_total = out_total = 0;
>
>  while ((rc = virtqueue_num_heads(vq, idx)) > 0) {
> -MemoryRegionCache *desc_cache = >desc;
>  unsigned int num_bufs;
>  VRingDesc desc;
>  unsigned int i;
> @@ -1091,6 +1091,8 @@ static void virtqueue_split_get_avail_bytes(VirtQueue 
> *vq,
>  goto err;
>  }
>
> +desc_cache = >desc;
> +
>  vring_split_desc_read(vdev, , desc_cache, i);
>
>  if (desc.flags & VRING_DESC_F_INDIRECT) {
> @@ -1156,7 +1158,9 @@ static void virtqueue_split_get_avail_bytes(VirtQueue 
> *vq,
>  }
>
>  done:
> -address_space_cache_destroy(_desc_cache);
> +if (desc_cache == _desc_cache) {
> +address_space_cache_destroy(_desc_cache);
> +}
>  if (in_bytes) {
>  *in_bytes = in_total;
>  }
> @@ -1207,8 +1211,8 @@ static void virtqueue_packed_get_avail_bytes(VirtQueue 
> *vq,
>  VirtIODevice *vdev = vq->vdev;
>  unsigned int idx;
>  unsigned int total_bufs, in_total, out_total;
> -MemoryRegionCache *desc_cache;
> -MemoryRegionCache indirect_desc_cache = MEMORY_REGION_CACHE_INVALID;
> +MemoryRegionCache indirect_desc_cache;
> +MemoryRegionCache *desc_cache = NULL;
>  int64_t len = 0;
>  VRingPackedDesc desc;
>  bool wrap_counter;
> @@ -1297,7 +1301,9 @@ static void virtqueue_packed_get_avail_bytes(VirtQueue 
> *vq,
>  vq->shadow_avail_idx = idx;
>  vq->shadow_avail_wrap_counter = wrap_counter;
>  done:
> -address_space_cache_destroy(_desc_cache);
> +if (desc_cache == _desc_cache) {
> +address_space_cache_destroy(_desc_cache);
> +}
>  if (in_bytes) {
>  *in_bytes = in_total;
>  }
> @@ -1487,8 +1493,8 @@ static void *virtqueue_split_pop(VirtQueue *vq, size_t 
> sz)
>  {
>  unsigned int i, head, max;
>  VRingMemoryRegionCaches *caches;
> -MemoryRegionCache indirect_desc_cache = MEMORY_REGION_CACHE_INVALID;
> -MemoryRegionCache *desc_cache;
> +MemoryRegionCache indirect_desc_cache;
> +MemoryRegionCache *desc_cache = NULL;
>  int64_t len;
>  VirtIODevice *vdev = vq->vdev;
>  VirtQueueElement *elem = NULL;
> @@ -1611,7 +1617,9 @@ static void *virtqueue_split_pop(VirtQueue *vq, size_t 
> sz)
>
>  trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
>  done:
> -address_space_cache_destroy(_desc_cache);
> +if (desc_cache == _desc_cache) {
> +address_space_cache_destroy(_desc_cache);
> +}
>
>  return elem;
>
> @@ -1624,8 +1632,8 @@ static void *virtqueue_packed_pop(VirtQueue *vq, size_t 
> sz)
>  {
>  unsigned int i, max;
>  VRingMemoryRegionCaches *caches;
> -MemoryRegionCache indirect_desc_cache = MEMORY_REGION_CACHE_INVALID;
> -MemoryRegionCache *desc_cache;
> +MemoryRegionCache indirect_desc_cache;
> +MemoryRegionCache *desc_cache = NULL;
>  int64_t len;
>  VirtIODevice *vdev = vq->vdev;
>  VirtQueueElement *elem = NULL;
> @@ -1746,7 +1754,9 @@ static void *virtqueue_packed_pop(VirtQueue *vq, size_t 
> sz)
>
>  trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
>  done:
> -address_space_cache_destroy(_desc_cache);
> +

Re: [PATCH v3 0/9] gfxstream + rutabaga_gfx

2023-08-08 Thread Huang Rui
On Wed, Aug 09, 2023 at 09:50:29AM +0800, Gurchetan Singh wrote:
>On Mon, Aug 7, 2023 at 7:24 AM Erico Nunes <[1]ernu...@redhat.com>
>wrote:
> 
>  Hello,
>  On 04/08/2023 01:54, Gurchetan Singh wrote:
>  > Prior versions:
>  >
>  >
>  [2]https://lists.gnu.org/archive/html/qemu-devel/2023-07/msg05801.ht
>  ml
>  >
>  >
>  [3]https://lists.gnu.org/archive/html/qemu-devel/2023-07/msg02341.ht
>  ml
>  >
>  >
>  [4]https://patchew.org/QEMU/20230421011223.718-1-gurchetansingh@chro
>  mium.org/
>  >
>  > Changes since v2:
>  > - Incorporated review feedback
>  >
>  > How to build both rutabaga and gfxstream guest/host libs:
>  >
>  > [5]https://crosvm.dev/book/appendix/rutabaga_gfx.html
>  >
>  > Branch containing this patch series:
>  >
>  >
>  [6]https://gitlab.freedesktop.org/gurchetansingh/qemu-gfxstream/-/co
>  mmits/qemu-gfxstream-v3
>  I tried this on Fedora with a Fedora guest and I was able to get
>  Vulkan
>  headless applications as well as Wayland proxy with sommelier to
>  work.
>  If you don't mind, I have a few questions.
>  I was not able to run Vulkan applications over the Wayland proxy,
>  only
>  unaccelerated apps. This seems to be unsupported yet; is actually
>  unsupported for now or was something missing in my setup?
> 
>Yes, currently this is unsupported.  In the near future, I do imagine
>3D accelerated rendering over cross-domain to be a thing (among many
>context types, not just gfxstream VK).
>Re: using gfxstream VK in Linux distros, depends on your use case.  If
>you are looking for best performance over virtio on open-source Linux
>platforms, perhaps gfxstream Vulkan (or any API virtualization
>solution) is not your best bet.  The Mesa native context work looks
>particularly exciting there:
>[7]https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21658

In fact, we will introduce both venus and virtio native context next step
with qemu. :-)

I am cleanning up the codes will send out the qemu very soon.

Thanks,
Ray

>We are interested in running gfxstream VK in Linux guests, but we
>envisage that for reference and testing.  For all embedded use cases,
>using the host driver in the guest will predominate due to performance
>considerations (either through virtio or HW direct / mediated
>passthru).
> 
>  Also apparently GL/GLES is only supported on Android right now as
>  you
>  mentioned, since on Linux the gfxstream guest only installs the
>  Vulkan
>  library and icd. What is the plan to support GL on Linux; provide
>  gfxstream GL guest libraries later or enable Zink or some other
>  solution?
>  Then if I understand correctly, Mesa virgl is not used at all with
>  the
>  gfxstream solution, so I guess we would need to find a way to ship
>  the
>  gfxstream guest libraries too on distributions?
> 
> 
> 
>  Also I wonder about including all of the the dependencies required
>  to
>  get this to build on distributions as well to enable the feature on
>  distribution-provided qemu, but I guess we can figure this out
>  later...
>  And finally out of curiosity, I see that rutabaga also has a
>  virgl_renderer (and virgl_renderer_next) backend which would then
>  not
>  use gfxstream but virglrenderer instead. I wonder if there would be
>  any
>  benefit/features in enabling that with qemu later compared to the
>  current qemu virtio/virglrenderer implementation (if that would make
>  sense at all)?
> 
>Yeah, maybe later if there's developer interest,  rutabaga FFI can
>build its virglrenderer bindings in a subsequent release.  So far I
>don't have time to test, and the most important use case is gfxstream +
>Android for Emulator.  As ever, patches are welcome.
> 
>  Thanks
>  Erico
> 
> References
> 
>1. mailto:ernu...@redhat.com
>2. https://lists.gnu.org/archive/html/qemu-devel/2023-07/msg05801.html
>3. https://lists.gnu.org/archive/html/qemu-devel/2023-07/msg02341.html
>4. 
> https://patchew.org/QEMU/20230421011223.718-1-gurchetansi...@chromium.org/
>5. https://crosvm.dev/book/appendix/rutabaga_gfx.html
>6. 
> https://gitlab.freedesktop.org/gurchetansingh/qemu-gfxstream/-/commits/qemu-gfxstream-v3
>7. https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21658



Re: [PATCH v5 1/5] ebpf: Added eBPF map update through mmap.

2023-08-08 Thread Jason Wang
On Wed, Aug 9, 2023 at 7:15 AM Andrew Melnichenko  wrote:
>
> Hi all,
>
> On Tue, Aug 8, 2023 at 5:39 AM Jason Wang  wrote:
> >
> > On Thu, Aug 3, 2023 at 5:01 AM Andrew Melnychenko  wrote:
> > >
> > > Changed eBPF map updates through mmaped array.
> > > Mmaped arrays provide direct access to map data.
> > > It should omit using bpf_map_update_elem() call,
> > > which may require capabilities that are not present.
> > >
> > > Signed-off-by: Andrew Melnychenko 
> > > ---
> > >  ebpf/ebpf_rss.c | 117 ++--
> > >  ebpf/ebpf_rss.h |   5 +++
> > >  2 files changed, 99 insertions(+), 23 deletions(-)
> > >
> > > diff --git a/ebpf/ebpf_rss.c b/ebpf/ebpf_rss.c
> > > index cee658c158b..247f5eee1b6 100644
> > > --- a/ebpf/ebpf_rss.c
> > > +++ b/ebpf/ebpf_rss.c
> > > @@ -27,19 +27,83 @@ void ebpf_rss_init(struct EBPFRSSContext *ctx)
> > >  {
> > >  if (ctx != NULL) {
> > >  ctx->obj = NULL;
> > > +ctx->program_fd = -1;
> > > +ctx->map_configuration = -1;
> > > +ctx->map_toeplitz_key = -1;
> > > +ctx->map_indirections_table = -1;
> > > +
> > > +ctx->mmap_configuration = NULL;
> > > +ctx->mmap_toeplitz_key = NULL;
> > > +ctx->mmap_indirections_table = NULL;
> > >  }
> > >  }
> > >
> > >  bool ebpf_rss_is_loaded(struct EBPFRSSContext *ctx)
> > >  {
> > > -return ctx != NULL && ctx->obj != NULL;
> > > +return ctx != NULL && (ctx->obj != NULL || ctx->program_fd != -1);
> > > +}
> > > +
> > > +static bool ebpf_rss_mmap(struct EBPFRSSContext *ctx)
> > > +{
> > > +if (!ebpf_rss_is_loaded(ctx)) {
> > > +return false;
> > > +}
> > > +
> > > +ctx->mmap_configuration = mmap(NULL, qemu_real_host_page_size(),
> > > +   PROT_READ | PROT_WRITE, MAP_SHARED,
> > > +   ctx->map_configuration, 0);
> > > +if (ctx->mmap_configuration == MAP_FAILED) {
> > > +trace_ebpf_error("eBPF RSS", "can not mmap eBPF configuration 
> > > array");
> > > +return false;
> > > +}
> > > +ctx->mmap_toeplitz_key = mmap(NULL, qemu_real_host_page_size(),
> > > +   PROT_READ | PROT_WRITE, MAP_SHARED,
> > > +   ctx->map_toeplitz_key, 0);
> > > +if (ctx->mmap_toeplitz_key == MAP_FAILED) {
> > > +trace_ebpf_error("eBPF RSS", "can not mmap eBPF toeplitz key");
> > > +goto toeplitz_fail;
> > > +}
> > > +ctx->mmap_indirections_table = mmap(NULL, qemu_real_host_page_size(),
> > > +   PROT_READ | PROT_WRITE, MAP_SHARED,
> > > +   ctx->map_indirections_table, 0);
> > > +if (ctx->mmap_indirections_table == MAP_FAILED) {
> > > +trace_ebpf_error("eBPF RSS", "can not mmap eBPF indirection 
> > > table");
> > > +goto indirection_fail;
> > > +}
> > > +
> > > +return true;
> > > +
> > > +indirection_fail:
> > > +munmap(ctx->mmap_toeplitz_key, qemu_real_host_page_size());
> > > +toeplitz_fail:
> > > +munmap(ctx->mmap_configuration, qemu_real_host_page_size());
> > > +
> > > +ctx->mmap_configuration = NULL;
> > > +ctx->mmap_toeplitz_key = NULL;
> > > +ctx->mmap_indirections_table = NULL;
> > > +return false;
> > > +}
> > > +
> > > +static void ebpf_rss_munmap(struct EBPFRSSContext *ctx)
> > > +{
> > > +if (!ebpf_rss_is_loaded(ctx)) {
> > > +return;
> > > +}
> > > +
> > > +munmap(ctx->mmap_indirections_table, qemu_real_host_page_size());
> > > +munmap(ctx->mmap_toeplitz_key, qemu_real_host_page_size());
> > > +munmap(ctx->mmap_configuration, qemu_real_host_page_size());
> > > +
> > > +ctx->mmap_configuration = NULL;
> > > +ctx->mmap_toeplitz_key = NULL;
> > > +ctx->mmap_indirections_table = NULL;
> > >  }
> > >
> > >  bool ebpf_rss_load(struct EBPFRSSContext *ctx)
> > >  {
> > >  struct rss_bpf *rss_bpf_ctx;
> > >
> > > -if (ctx == NULL) {
> > > +if (ctx == NULL || ebpf_rss_is_loaded(ctx)) {
> > >  return false;
> > >  }
> > >
> > > @@ -66,10 +130,18 @@ bool ebpf_rss_load(struct EBPFRSSContext *ctx)
> > >  ctx->map_toeplitz_key = bpf_map__fd(
> > >  rss_bpf_ctx->maps.tap_rss_map_toeplitz_key);
> > >
> > > +if (!ebpf_rss_mmap(ctx)) {
> > > +goto error;
> > > +}
> > > +
> > >  return true;
> > >  error:
> > >  rss_bpf__destroy(rss_bpf_ctx);
> > >  ctx->obj = NULL;
> > > +ctx->program_fd = -1;
> > > +ctx->map_configuration = -1;
> > > +ctx->map_toeplitz_key = -1;
> > > +ctx->map_indirections_table = -1;
> > >
> > >  return false;
> > >  }
> > > @@ -77,15 +149,11 @@ error:
> > >  static bool ebpf_rss_set_config(struct EBPFRSSContext *ctx,
> > >  struct EBPFRSSConfig *config)
> > >  {
> > > -uint32_t map_key = 0;
> > > -
> > >  if (!ebpf_rss_is_loaded(ctx)) {
> > >  return 

[PATCH v4 2/9] virtio-gpu: CONTEXT_INIT feature

2023-08-08 Thread Gurchetan Singh
From: Antonio Caggiano 

The feature can be enabled when a backend wants it.

Signed-off-by: Antonio Caggiano 
Reviewed-by: Marc-André Lureau 
Signed-off-by: Gurchetan Singh 
Tested-by: Alyssa Ross 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Akihiko Odaki 
---
 hw/display/virtio-gpu-base.c   | 3 +++
 include/hw/virtio/virtio-gpu.h | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/hw/display/virtio-gpu-base.c b/hw/display/virtio-gpu-base.c
index ca1fb7b16f..4f2b0ba1f3 100644
--- a/hw/display/virtio-gpu-base.c
+++ b/hw/display/virtio-gpu-base.c
@@ -232,6 +232,9 @@ virtio_gpu_base_get_features(VirtIODevice *vdev, uint64_t 
features,
 if (virtio_gpu_blob_enabled(g->conf)) {
 features |= (1 << VIRTIO_GPU_F_RESOURCE_BLOB);
 }
+if (virtio_gpu_context_init_enabled(g->conf)) {
+features |= (1 << VIRTIO_GPU_F_CONTEXT_INIT);
+}
 
 return features;
 }
diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
index 390c4642b8..8377c365ef 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -93,6 +93,7 @@ enum virtio_gpu_base_conf_flags {
 VIRTIO_GPU_FLAG_EDID_ENABLED,
 VIRTIO_GPU_FLAG_DMABUF_ENABLED,
 VIRTIO_GPU_FLAG_BLOB_ENABLED,
+VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLED,
 };
 
 #define virtio_gpu_virgl_enabled(_cfg) \
@@ -105,6 +106,8 @@ enum virtio_gpu_base_conf_flags {
 (_cfg.flags & (1 << VIRTIO_GPU_FLAG_DMABUF_ENABLED))
 #define virtio_gpu_blob_enabled(_cfg) \
 (_cfg.flags & (1 << VIRTIO_GPU_FLAG_BLOB_ENABLED))
+#define virtio_gpu_context_init_enabled(_cfg) \
+(_cfg.flags & (1 << VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLED))
 
 struct virtio_gpu_base_conf {
 uint32_t max_outputs;
-- 
2.41.0.640.ga95def55d0-goog




[PATCH v4 5/9] gfxstream + rutabaga prep: added need defintions, fields, and options

2023-08-08 Thread Gurchetan Singh
This modifies the common virtio-gpu.h file have the fields and
defintions needed by gfxstream/rutabaga, by VirtioGpuRutabaga.

The command to run these would be:

-device virtio-vga-rutabaga,capset_names=gfxstream-vulkan:cross-domain, \
wayland_socket_path=/run/user/1000/wayland-0,hostmem=8G  \

Signed-off-by: Gurchetan Singh 
Tested-by: Alyssa Ross 
---
v1: void *rutabaga --> struct rutabaga *rutabaga (Akihiko)
have a separate rutabaga device instead of using GL device (Bernard)

v2: VirtioGpuRutabaga --> VirtIOGPURutabaga (Akihiko)
move MemoryRegionInfo into VirtIOGPURutabaga (Akihiko)
remove 'ctx' field (Akihiko)
remove 'rutabaga_active'

 include/hw/virtio/virtio-gpu.h | 28 
 1 file changed, 28 insertions(+)

diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
index 55973e112f..e2a07e68d9 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -38,6 +38,9 @@ OBJECT_DECLARE_SIMPLE_TYPE(VirtIOGPUGL, VIRTIO_GPU_GL)
 #define TYPE_VHOST_USER_GPU "vhost-user-gpu"
 OBJECT_DECLARE_SIMPLE_TYPE(VhostUserGPU, VHOST_USER_GPU)
 
+#define TYPE_VIRTIO_GPU_RUTABAGA "virtio-gpu-rutabaga-device"
+OBJECT_DECLARE_SIMPLE_TYPE(VirtIOGPURutabaga, VIRTIO_GPU_RUTABAGA)
+
 struct virtio_gpu_simple_resource {
 uint32_t resource_id;
 uint32_t width;
@@ -94,6 +97,7 @@ enum virtio_gpu_base_conf_flags {
 VIRTIO_GPU_FLAG_DMABUF_ENABLED,
 VIRTIO_GPU_FLAG_BLOB_ENABLED,
 VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLED,
+VIRTIO_GPU_FLAG_RUTABAGA_ENABLED,
 };
 
 #define virtio_gpu_virgl_enabled(_cfg) \
@@ -108,6 +112,8 @@ enum virtio_gpu_base_conf_flags {
 (_cfg.flags & (1 << VIRTIO_GPU_FLAG_BLOB_ENABLED))
 #define virtio_gpu_context_init_enabled(_cfg) \
 (_cfg.flags & (1 << VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLED))
+#define virtio_gpu_rutabaga_enabled(_cfg) \
+(_cfg.flags & (1 << VIRTIO_GPU_FLAG_RUTABAGA_ENABLED))
 #define virtio_gpu_hostmem_enabled(_cfg) \
 (_cfg.hostmem > 0)
 
@@ -232,6 +238,28 @@ struct VhostUserGPU {
 bool backend_blocked;
 };
 
+#define MAX_SLOTS 4096
+
+struct MemoryRegionInfo {
+int used;
+MemoryRegion mr;
+uint32_t resource_id;
+};
+
+struct rutabaga;
+
+struct VirtIOGPURutabaga {
+struct VirtIOGPU parent_obj;
+
+struct MemoryRegionInfo memory_regions[MAX_SLOTS];
+char *capset_names;
+char *wayland_socket_path;
+char *wsi;
+bool headless;
+uint32_t num_capsets;
+struct rutabaga *rutabaga;
+};
+
 #define VIRTIO_GPU_FILL_CMD(out) do {   \
 size_t s;   \
 s = iov_to_buf(cmd->elem.out_sg, cmd->elem.out_num, 0,  \
-- 
2.41.0.640.ga95def55d0-goog




[PATCH v4 6/9] gfxstream + rutabaga: add initial support for gfxstream

2023-08-08 Thread Gurchetan Singh
This adds initial support for gfxstream and cross-domain.  Both
features rely on virtio-gpu blob resources and context types, which
are also implemented in this patch.

gfxstream has a long and illustrious history in Android graphics
paravirtualization.  It has been powering graphics in the Android
Studio Emulator for more than a decade, which is the main developer
platform.

Originally conceived by Jesse Hall, it was first known as "EmuGL" [a].
The key design characteristic was a 1:1 threading model and
auto-generation, which fit nicely with the OpenGLES spec.  It also
allowed easy layering with ANGLE on the host, which provides the GLES
implementations on Windows or MacOS enviroments.

gfxstream has traditionally been maintained by a single engineer, and
between 2015 to 2021, the goldfish throne passed to Frank Yang.
Historians often remark this glorious reign ("pax gfxstreama" is the
academic term) was comparable to that of Augustus and both Queen
Elizabeths.  Just to name a few accomplishments in a resplendent
panoply: higher versions of GLES, address space graphics, snapshot
support and CTS compliant Vulkan [b].

One major drawback was the use of out-of-tree goldfish drivers.
Android engineers didn't know much about DRM/KMS and especially TTM so
a simple guest to host pipe was conceived.

Luckily, virtio-gpu 3D started to emerge in 2016 due to the work of
the Mesa/virglrenderer communities.  In 2018, the initial virtio-gpu
port of gfxstream was done by Cuttlefish enthusiast Alistair Delva.
It was a symbol compatible replacement of virglrenderer [c] and named
"AVDVirglrenderer".  This implementation forms the basis of the
current gfxstream host implementation still in use today.

cross-domain support follows a similar arc.  Originally conceived by
Wayland aficionado David Reveman and crosvm enjoyer Zach Reizner in
2018, it initially relied on the downstream "virtio-wl" device.

In 2020 and 2021, virtio-gpu was extended to include blob resources
and multiple timelines by yours truly, features gfxstream/cross-domain
both require to function correctly.

Right now, we stand at the precipice of a truly fantastic possibility:
the Android Emulator powered by upstream QEMU and upstream Linux
kernel.  gfxstream will then be packaged properfully, and app
developers can even fix gfxstream bugs on their own if they encounter
them.

It's been quite the ride, my friends.  Where will gfxstream head next,
nobody really knows.  I wouldn't be surprised if it's around for
another decade, maintained by a new generation of Android graphics
enthusiasts.

Technical details:
  - Very simple initial display integration: just used Pixman
  - Largely, 1:1 mapping of virtio-gpu hypercalls to rutabaga function
calls

Next steps for Android VMs:
  - The next step would be improving display integration and UI interfaces
with the goal of the QEMU upstream graphics being in an emulator
release [d].

Next steps for Linux VMs for display virtualization:
  - For widespread distribution, someone needs to package Sommelier or the
wayland-proxy-virtwl [e] ideally into Debian main. In addition, newer
versions of the Linux kernel come with DRM_VIRTIO_GPU_KMS option,
which allows disabling KMS hypercalls.  If anyone cares enough, it'll
probably be possible to build a custom VM variant that uses this display
virtualization strategy.

[a] https://android-review.googlesource.com/c/platform/development/+/34470
[b] 
https://android-review.googlesource.com/q/topic:%22vulkan-hostconnection-start%22
[c] 
https://android-review.googlesource.com/c/device/generic/goldfish-opengl/+/761927
[d] https://developer.android.com/studio/releases/emulator
[e] https://github.com/talex5/wayland-proxy-virtwl

Signed-off-by: Gurchetan Singh 
Tested-by: Alyssa Ross 
---
v1: Incorported various suggestions by Akihiko Odaki and Bernard Berschow
- Removed GET_VIRTIO_GPU_GL / GET_RUTABAGA macros
- Used error_report(..)
- Used g_autofree to fix leaks on error paths
- Removed unnecessary casts
- added virtio-gpu-pci-rutabaga.c + virtio-vga-rutabaga.c files

v2: Incorported various suggestions by Akihiko Odaki, Marc-André Lureau and
Bernard Berschow:
- Parenthesis in CHECK macro
- CHECK_RESULT(result, ..) --> CHECK(!result, ..)
- delay until g->parent_obj.enable = 1
- Additional cast fixes
- initialize directly in virtio_gpu_rutabaga_realize(..)
- add debug callback to hook into QEMU error's APIs

v3: Incorporated feedback from Akihiko Odaki and Alyssa Ross:
- Autodetect Wayland socket when not explicitly specified
- Fix map_blob error paths
- Add comment why we need both `res` and `resource` in create blob
- Cast and whitespace fixes
- Big endian check comes before virtio_gpu_rutabaga_init().
- VirtIOVGARUTABAGA --> VirtIOVGARutabaga

v4: Incorporated feedback from Akihiko Odaki and Alyssa Ross:
- Double checked all casts
- Remove unnecessary parenthesis
- 

[PATCH v4 4/9] virtio-gpu: blob prep

2023-08-08 Thread Gurchetan Singh
From: Antonio Caggiano 

This adds preparatory functions needed to:

 - decode blob cmds
 - tracking iovecs

Signed-off-by: Antonio Caggiano 
Signed-off-by: Dmitry Osipenko 
Signed-off-by: Gurchetan Singh 
Tested-by: Alyssa Ross 
---
 hw/display/virtio-gpu.c  | 10 +++---
 include/hw/virtio/virtio-gpu-bswap.h | 18 ++
 include/hw/virtio/virtio-gpu.h   |  5 +
 3 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index 48ef0d9fad..3e658f1fef 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -33,15 +33,11 @@
 
 #define VIRTIO_GPU_VM_VERSION 1
 
-static struct virtio_gpu_simple_resource*
-virtio_gpu_find_resource(VirtIOGPU *g, uint32_t resource_id);
 static struct virtio_gpu_simple_resource *
 virtio_gpu_find_check_resource(VirtIOGPU *g, uint32_t resource_id,
bool require_backing,
const char *caller, uint32_t *error);
 
-static void virtio_gpu_cleanup_mapping(VirtIOGPU *g,
-   struct virtio_gpu_simple_resource *res);
 static void virtio_gpu_reset_bh(void *opaque);
 
 void virtio_gpu_update_cursor_data(VirtIOGPU *g,
@@ -116,7 +112,7 @@ static void update_cursor(VirtIOGPU *g, struct 
virtio_gpu_update_cursor *cursor)
   cursor->resource_id ? 1 : 0);
 }
 
-static struct virtio_gpu_simple_resource *
+struct virtio_gpu_simple_resource *
 virtio_gpu_find_resource(VirtIOGPU *g, uint32_t resource_id)
 {
 struct virtio_gpu_simple_resource *res;
@@ -904,8 +900,8 @@ void virtio_gpu_cleanup_mapping_iov(VirtIOGPU *g,
 g_free(iov);
 }
 
-static void virtio_gpu_cleanup_mapping(VirtIOGPU *g,
-   struct virtio_gpu_simple_resource *res)
+void virtio_gpu_cleanup_mapping(VirtIOGPU *g,
+struct virtio_gpu_simple_resource *res)
 {
 virtio_gpu_cleanup_mapping_iov(g, res->iov, res->iov_cnt);
 res->iov = NULL;
diff --git a/include/hw/virtio/virtio-gpu-bswap.h 
b/include/hw/virtio/virtio-gpu-bswap.h
index 9124108485..dd1975e2d4 100644
--- a/include/hw/virtio/virtio-gpu-bswap.h
+++ b/include/hw/virtio/virtio-gpu-bswap.h
@@ -63,10 +63,28 @@ virtio_gpu_create_blob_bswap(struct 
virtio_gpu_resource_create_blob *cblob)
 {
 virtio_gpu_ctrl_hdr_bswap(>hdr);
 le32_to_cpus(>resource_id);
+le32_to_cpus(>blob_mem);
 le32_to_cpus(>blob_flags);
+le32_to_cpus(>nr_entries);
+le64_to_cpus(>blob_id);
 le64_to_cpus(>size);
 }
 
+static inline void
+virtio_gpu_map_blob_bswap(struct virtio_gpu_resource_map_blob *mblob)
+{
+virtio_gpu_ctrl_hdr_bswap(>hdr);
+le32_to_cpus(>resource_id);
+le64_to_cpus(>offset);
+}
+
+static inline void
+virtio_gpu_unmap_blob_bswap(struct virtio_gpu_resource_unmap_blob *ublob)
+{
+virtio_gpu_ctrl_hdr_bswap(>hdr);
+le32_to_cpus(>resource_id);
+}
+
 static inline void
 virtio_gpu_scanout_blob_bswap(struct virtio_gpu_set_scanout_blob *ssb)
 {
diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
index de4f624e94..55973e112f 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -257,6 +257,9 @@ void virtio_gpu_base_fill_display_info(VirtIOGPUBase *g,
 void virtio_gpu_base_generate_edid(VirtIOGPUBase *g, int scanout,
struct virtio_gpu_resp_edid *edid);
 /* virtio-gpu.c */
+struct virtio_gpu_simple_resource *
+virtio_gpu_find_resource(VirtIOGPU *g, uint32_t resource_id);
+
 void virtio_gpu_ctrl_response(VirtIOGPU *g,
   struct virtio_gpu_ctrl_command *cmd,
   struct virtio_gpu_ctrl_hdr *resp,
@@ -275,6 +278,8 @@ int virtio_gpu_create_mapping_iov(VirtIOGPU *g,
   uint32_t *niov);
 void virtio_gpu_cleanup_mapping_iov(VirtIOGPU *g,
 struct iovec *iov, uint32_t count);
+void virtio_gpu_cleanup_mapping(VirtIOGPU *g,
+struct virtio_gpu_simple_resource *res);
 void virtio_gpu_process_cmdq(VirtIOGPU *g);
 void virtio_gpu_device_realize(DeviceState *qdev, Error **errp);
 void virtio_gpu_reset(VirtIODevice *vdev);
-- 
2.41.0.640.ga95def55d0-goog




[PATCH v4 1/9] virtio: Add shared memory capability

2023-08-08 Thread Gurchetan Singh
From: "Dr. David Alan Gilbert" 

Define a new capability type 'VIRTIO_PCI_CAP_SHARED_MEMORY_CFG' to allow
defining shared memory regions with sizes and offsets of 2^32 and more.
Multiple instances of the capability are allowed and distinguished
by a device-specific 'id'.

Signed-off-by: Dr. David Alan Gilbert 
Signed-off-by: Antonio Caggiano 
Reviewed-by: Gurchetan Singh 
Signed-off-by: Gurchetan Singh 
Tested-by: Alyssa Ross 
Reviewed-by: Akihiko Odaki 
---
 hw/virtio/virtio-pci.c | 18 ++
 include/hw/virtio/virtio-pci.h |  4 
 2 files changed, 22 insertions(+)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index edbc0daa18..da8c9ea12d 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1435,6 +1435,24 @@ static int virtio_pci_add_mem_cap(VirtIOPCIProxy *proxy,
 return offset;
 }
 
+int virtio_pci_add_shm_cap(VirtIOPCIProxy *proxy,
+   uint8_t bar, uint64_t offset, uint64_t length,
+   uint8_t id)
+{
+struct virtio_pci_cap64 cap = {
+.cap.cap_len = sizeof cap,
+.cap.cfg_type = VIRTIO_PCI_CAP_SHARED_MEMORY_CFG,
+};
+
+cap.cap.bar = bar;
+cap.cap.length = cpu_to_le32(length);
+cap.length_hi = cpu_to_le32(length >> 32);
+cap.cap.offset = cpu_to_le32(offset);
+cap.offset_hi = cpu_to_le32(offset >> 32);
+cap.cap.id = id;
+return virtio_pci_add_mem_cap(proxy, );
+}
+
 static uint64_t virtio_pci_common_read(void *opaque, hwaddr addr,
unsigned size)
 {
diff --git a/include/hw/virtio/virtio-pci.h b/include/hw/virtio/virtio-pci.h
index ab2051b64b..5a3f182f99 100644
--- a/include/hw/virtio/virtio-pci.h
+++ b/include/hw/virtio/virtio-pci.h
@@ -264,4 +264,8 @@ unsigned virtio_pci_optimal_num_queues(unsigned 
fixed_queues);
 void virtio_pci_set_guest_notifier_fd_handler(VirtIODevice *vdev, VirtQueue 
*vq,
   int n, bool assign,
   bool with_irqfd);
+
+int virtio_pci_add_shm_cap(VirtIOPCIProxy *proxy, uint8_t bar, uint64_t offset,
+   uint64_t length, uint8_t id);
+
 #endif
-- 
2.41.0.640.ga95def55d0-goog




[PATCH v4 9/9] docs/system: add basic virtio-gpu documentation

2023-08-08 Thread Gurchetan Singh
This adds basic documentation for virtio-gpu.

Suggested-by: Akihiko Odaki 
Signed-off-by: Gurchetan Singh 
---
v2: - Incorporated suggestions by Akihiko Odaki
- Listed the currently supported capset_names (Bernard)

v3: - Incorporated suggestions by Akihiko Odaki and Alyssa Ross

v4: - Incorporated suggestions by Akihiko Odaki

 docs/system/device-emulation.rst   |   1 +
 docs/system/devices/virtio-gpu.rst | 115 +
 2 files changed, 116 insertions(+)
 create mode 100644 docs/system/devices/virtio-gpu.rst

diff --git a/docs/system/device-emulation.rst b/docs/system/device-emulation.rst
index 4491c4cbf7..1167f3a9f2 100644
--- a/docs/system/device-emulation.rst
+++ b/docs/system/device-emulation.rst
@@ -91,6 +91,7 @@ Emulated Devices
devices/nvme.rst
devices/usb.rst
devices/vhost-user.rst
+   devices/virtio-gpu.rst
devices/virtio-pmem.rst
devices/vhost-user-rng.rst
devices/canokey.rst
diff --git a/docs/system/devices/virtio-gpu.rst 
b/docs/system/devices/virtio-gpu.rst
new file mode 100644
index 00..d56524270d
--- /dev/null
+++ b/docs/system/devices/virtio-gpu.rst
@@ -0,0 +1,115 @@
+..
+   SPDX-License-Identifier: GPL-2.0
+
+virtio-gpu
+==
+
+This document explains the setup and usage of the virtio-gpu device.
+The virtio-gpu device paravirtualizes the GPU and display controller.
+
+Linux kernel support
+
+
+virtio-gpu requires a guest Linux kernel built with the
+``CONFIG_DRM_VIRTIO_GPU`` option.
+
+QEMU virtio-gpu variants
+
+
+QEMU virtio-gpu device variants come in the following form:
+
+ * ``virtio-vga[-BACKEND]``
+ * ``virtio-gpu[-BACKEND][-INTERFACE]``
+ * ``vhost-user-vga``
+ * ``vhost-user-pci``
+
+**Backends:** QEMU provides a 2D virtio-gpu backend, and two accelerated
+backends: virglrenderer ('gl' device label) and rutabaga_gfx ('rutabaga'
+device label).  There is a vhost-user backend that runs the graphics stack
+in a separate process for improved isolation.
+
+**Interfaces:** QEMU further categorizes virtio-gpu device variants based
+on the interface exposed to the guest. The interfaces can be classified
+into VGA and non-VGA variants. The VGA ones are prefixed with virtio-vga
+or vhost-user-vga while the non-VGA ones are prefixed with virtio-gpu or
+vhost-user-gpu.
+
+The VGA ones always use the PCI interface, but for the non-VGA ones, the
+user can further pick between MMIO or PCI. For MMIO, the user can suffix
+the device name with -device, though vhost-user-gpu does not support MMIO.
+For PCI, the user can suffix it with -pci. Without these suffixes, the
+platform default will be chosen.
+
+This document uses the PCI interface in examples.
+
+virtio-gpu 2d
+-
+
+The default 2D backend only performs 2D operations. The guest needs to
+employ a software renderer for 3D graphics.
+
+Typically, the software renderer is provided by `Mesa`_ or `SwiftShader`_.
+Mesa's implementations (LLVMpipe, Lavapipe and virgl below) work out of box
+on typical modern Linux distributions.
+
+.. parsed-literal::
+-device virtio-gpu-pci
+
+.. _Mesa: https://www.mesa3d.org/
+.. _SwiftShader: https://github.com/google/swiftshader
+
+virtio-gpu virglrenderer
+
+
+When using virgl accelerated graphics mode in the guest, OpenGL API calls
+are translated into an intermediate representation (see `Gallium3D`_). The
+intermediate representation is communicated to the host and the
+`virglrenderer`_ library on the host translates the intermediate
+representation back to OpenGL API calls.
+
+.. parsed-literal::
+-device virtio-gpu-gl-pci
+
+.. _Gallium3D: https://www.freedesktop.org/wiki/Software/gallium/
+.. _virglrenderer: https://gitlab.freedesktop.org/virgl/virglrenderer/
+
+virtio-gpu rutabaga
+---
+
+virtio-gpu can also leverage `rutabaga_gfx`_ to provide `gfxstream`_
+rendering and `Wayland display passthrough`_.  With the gfxstream rendering
+mode, GLES and Vulkan calls are forwarded to the host with minimal
+modification.
+
+The crosvm book provides directions on how to build a `gfxstream-enabled
+rutabaga`_ and launch a `guest Wayland proxy`_.
+
+This device does require host blob support (``hostmem`` field below). The
+``hostmem`` field specifies the size of virtio-gpu host memory window.
+This is typically between 256M and 8G.
+
+At least one capset (see colon separated ``capset_names`` below) must be
+specified when starting the device.  The currently supported
+``capset_names`` are ``gfxstream-vulkan`` and ``cross-domain`` on Linux
+guests. For Android guests, ``gfxstream-gles`` is also supported.
+
+The device will try to auto-detect the wayland socket path if the
+``cross-domain`` capset name is set.  The user may optionally specify
+``wayland_socket_path`` for non-standard paths.
+
+The ``wsi`` option can be set to ``surfaceless`` or ``headless``.
+Surfaceless doesn't create a native window surface, but does copy from the
+render 

[PATCH v4 3/9] virtio-gpu: hostmem

2023-08-08 Thread Gurchetan Singh
From: Gerd Hoffmann 

Use VIRTIO_GPU_SHM_ID_HOST_VISIBLE as id for virtio-gpu.

Signed-off-by: Antonio Caggiano 
Tested-by: Alyssa Ross 
Acked-by: Michael S. Tsirkin 
---
 hw/display/virtio-gpu-pci.c| 14 ++
 hw/display/virtio-gpu.c|  1 +
 hw/display/virtio-vga.c| 33 -
 include/hw/virtio/virtio-gpu.h |  5 +
 4 files changed, 44 insertions(+), 9 deletions(-)

diff --git a/hw/display/virtio-gpu-pci.c b/hw/display/virtio-gpu-pci.c
index 93f214ff58..da6a99f038 100644
--- a/hw/display/virtio-gpu-pci.c
+++ b/hw/display/virtio-gpu-pci.c
@@ -33,6 +33,20 @@ static void virtio_gpu_pci_base_realize(VirtIOPCIProxy 
*vpci_dev, Error **errp)
 DeviceState *vdev = DEVICE(g);
 int i;
 
+if (virtio_gpu_hostmem_enabled(g->conf)) {
+vpci_dev->msix_bar_idx = 1;
+vpci_dev->modern_mem_bar_idx = 2;
+memory_region_init(>hostmem, OBJECT(g), "virtio-gpu-hostmem",
+   g->conf.hostmem);
+pci_register_bar(_dev->pci_dev, 4,
+ PCI_BASE_ADDRESS_SPACE_MEMORY |
+ PCI_BASE_ADDRESS_MEM_PREFETCH |
+ PCI_BASE_ADDRESS_MEM_TYPE_64,
+ >hostmem);
+virtio_pci_add_shm_cap(vpci_dev, 4, 0, g->conf.hostmem,
+   VIRTIO_GPU_SHM_ID_HOST_VISIBLE);
+}
+
 virtio_pci_force_virtio_1(vpci_dev);
 if (!qdev_realize(vdev, BUS(_dev->bus), errp)) {
 return;
diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index bbd5c6561a..48ef0d9fad 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -1509,6 +1509,7 @@ static Property virtio_gpu_properties[] = {
  256 * MiB),
 DEFINE_PROP_BIT("blob", VirtIOGPU, parent_obj.conf.flags,
 VIRTIO_GPU_FLAG_BLOB_ENABLED, false),
+DEFINE_PROP_SIZE("hostmem", VirtIOGPU, parent_obj.conf.hostmem, 0),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/display/virtio-vga.c b/hw/display/virtio-vga.c
index e6fb0aa876..c8552ff760 100644
--- a/hw/display/virtio-vga.c
+++ b/hw/display/virtio-vga.c
@@ -115,17 +115,32 @@ static void virtio_vga_base_realize(VirtIOPCIProxy 
*vpci_dev, Error **errp)
 pci_register_bar(_dev->pci_dev, 0,
  PCI_BASE_ADDRESS_MEM_PREFETCH, >vram);
 
-/*
- * Configure virtio bar and regions
- *
- * We use bar #2 for the mmio regions, to be compatible with stdvga.
- * virtio regions are moved to the end of bar #2, to make room for
- * the stdvga mmio registers at the start of bar #2.
- */
-vpci_dev->modern_mem_bar_idx = 2;
-vpci_dev->msix_bar_idx = 4;
 vpci_dev->modern_io_bar_idx = 5;
 
+if (!virtio_gpu_hostmem_enabled(g->conf)) {
+/*
+ * Configure virtio bar and regions
+ *
+ * We use bar #2 for the mmio regions, to be compatible with stdvga.
+ * virtio regions are moved to the end of bar #2, to make room for
+ * the stdvga mmio registers at the start of bar #2.
+ */
+vpci_dev->modern_mem_bar_idx = 2;
+vpci_dev->msix_bar_idx = 4;
+} else {
+vpci_dev->msix_bar_idx = 1;
+vpci_dev->modern_mem_bar_idx = 2;
+memory_region_init(>hostmem, OBJECT(g), "virtio-gpu-hostmem",
+   g->conf.hostmem);
+pci_register_bar(_dev->pci_dev, 4,
+ PCI_BASE_ADDRESS_SPACE_MEMORY |
+ PCI_BASE_ADDRESS_MEM_PREFETCH |
+ PCI_BASE_ADDRESS_MEM_TYPE_64,
+ >hostmem);
+virtio_pci_add_shm_cap(vpci_dev, 4, 0, g->conf.hostmem,
+   VIRTIO_GPU_SHM_ID_HOST_VISIBLE);
+}
+
 if (!(vpci_dev->flags & VIRTIO_PCI_FLAG_PAGE_PER_VQ)) {
 /*
  * with page-per-vq=off there is no padding space we can use
diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
index 8377c365ef..de4f624e94 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -108,12 +108,15 @@ enum virtio_gpu_base_conf_flags {
 (_cfg.flags & (1 << VIRTIO_GPU_FLAG_BLOB_ENABLED))
 #define virtio_gpu_context_init_enabled(_cfg) \
 (_cfg.flags & (1 << VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLED))
+#define virtio_gpu_hostmem_enabled(_cfg) \
+(_cfg.hostmem > 0)
 
 struct virtio_gpu_base_conf {
 uint32_t max_outputs;
 uint32_t flags;
 uint32_t xres;
 uint32_t yres;
+uint64_t hostmem;
 };
 
 struct virtio_gpu_ctrl_command {
@@ -137,6 +140,8 @@ struct VirtIOGPUBase {
 int renderer_blocked;
 int enable;
 
+MemoryRegion hostmem;
+
 struct virtio_gpu_scanout scanout[VIRTIO_GPU_MAX_SCANOUTS];
 
 int enabled_output_bitmask;
-- 
2.41.0.640.ga95def55d0-goog




[PATCH v4 8/9] gfxstream + rutabaga: enable rutabaga

2023-08-08 Thread Gurchetan Singh
This change enables rutabaga to receive virtio-gpu-3d hypercalls
when it is active.

Signed-off-by: Gurchetan Singh 
Tested-by: Alyssa Ross 
---
v3: Whitespace fix (Akihiko)

 hw/display/virtio-gpu-base.c | 3 ++-
 hw/display/virtio-gpu.c  | 5 +++--
 softmmu/qdev-monitor.c   | 3 +++
 softmmu/vl.c | 1 +
 4 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/hw/display/virtio-gpu-base.c b/hw/display/virtio-gpu-base.c
index 4f2b0ba1f3..50c5373b65 100644
--- a/hw/display/virtio-gpu-base.c
+++ b/hw/display/virtio-gpu-base.c
@@ -223,7 +223,8 @@ virtio_gpu_base_get_features(VirtIODevice *vdev, uint64_t 
features,
 {
 VirtIOGPUBase *g = VIRTIO_GPU_BASE(vdev);
 
-if (virtio_gpu_virgl_enabled(g->conf)) {
+if (virtio_gpu_virgl_enabled(g->conf) ||
+virtio_gpu_rutabaga_enabled(g->conf)) {
 features |= (1 << VIRTIO_GPU_F_VIRGL);
 }
 if (virtio_gpu_edid_enabled(g->conf)) {
diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index 3e658f1fef..08e170e029 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -1361,8 +1361,9 @@ void virtio_gpu_device_realize(DeviceState *qdev, Error 
**errp)
 VirtIOGPU *g = VIRTIO_GPU(qdev);
 
 if (virtio_gpu_blob_enabled(g->parent_obj.conf)) {
-if (!virtio_gpu_have_udmabuf()) {
-error_setg(errp, "cannot enable blob resources without udmabuf");
+if (!virtio_gpu_have_udmabuf() &&
+!virtio_gpu_rutabaga_enabled(g->parent_obj.conf)) {
+error_setg(errp, "need udmabuf or rutabaga for blob resources");
 return;
 }
 
diff --git a/softmmu/qdev-monitor.c b/softmmu/qdev-monitor.c
index 74f4e41338..1b8005ae55 100644
--- a/softmmu/qdev-monitor.c
+++ b/softmmu/qdev-monitor.c
@@ -86,6 +86,9 @@ static const QDevAlias qdev_alias_table[] = {
 { "virtio-gpu-pci", "virtio-gpu", QEMU_ARCH_VIRTIO_PCI },
 { "virtio-gpu-gl-device", "virtio-gpu-gl", QEMU_ARCH_VIRTIO_MMIO },
 { "virtio-gpu-gl-pci", "virtio-gpu-gl", QEMU_ARCH_VIRTIO_PCI },
+{ "virtio-gpu-rutabaga-device", "virtio-gpu-rutabaga",
+  QEMU_ARCH_VIRTIO_MMIO },
+{ "virtio-gpu-rutabaga-pci", "virtio-gpu-rutabaga", QEMU_ARCH_VIRTIO_PCI },
 { "virtio-input-host-device", "virtio-input-host", QEMU_ARCH_VIRTIO_MMIO },
 { "virtio-input-host-ccw", "virtio-input-host", QEMU_ARCH_VIRTIO_CCW },
 { "virtio-input-host-pci", "virtio-input-host", QEMU_ARCH_VIRTIO_PCI },
diff --git a/softmmu/vl.c b/softmmu/vl.c
index b0b96f67fa..2f98eefdf3 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -216,6 +216,7 @@ static struct {
 { .driver = "ati-vga",  .flag = _vga   },
 { .driver = "vhost-user-vga",   .flag = _vga   },
 { .driver = "virtio-vga-gl",.flag = _vga   },
+{ .driver = "virtio-vga-rutabaga",  .flag = _vga   },
 };
 
 static QemuOptsList qemu_rtc_opts = {
-- 
2.41.0.640.ga95def55d0-goog




[PATCH v4 7/9] gfxstream + rutabaga: meson support

2023-08-08 Thread Gurchetan Singh
- Add meson detection of rutabaga_gfx
- Build virtio-gpu-rutabaga.c + associated vga/pci files when
  present

Signed-off-by: Gurchetan Singh 
Tested-by: Alyssa Ross 
---
 v3: Fix alignment issues (Akihiko)

 hw/display/meson.build| 22 ++
 meson.build   |  7 +++
 meson_options.txt |  2 ++
 scripts/meson-buildoptions.sh |  3 +++
 4 files changed, 34 insertions(+)

diff --git a/hw/display/meson.build b/hw/display/meson.build
index 413ba4ab24..e362d625dd 100644
--- a/hw/display/meson.build
+++ b/hw/display/meson.build
@@ -79,6 +79,13 @@ if config_all_devices.has_key('CONFIG_VIRTIO_GPU')
  if_true: [files('virtio-gpu-gl.c', 
'virtio-gpu-virgl.c'), pixman, virgl])
 hw_display_modules += {'virtio-gpu-gl': virtio_gpu_gl_ss}
   endif
+
+  if rutabaga.found()
+virtio_gpu_rutabaga_ss = ss.source_set()
+virtio_gpu_rutabaga_ss.add(when: ['CONFIG_VIRTIO_GPU', rutabaga],
+   if_true: [files('virtio-gpu-rutabaga.c'), 
pixman])
+hw_display_modules += {'virtio-gpu-rutabaga': virtio_gpu_rutabaga_ss}
+  endif
 endif
 
 if config_all_devices.has_key('CONFIG_VIRTIO_PCI')
@@ -95,6 +102,12 @@ if config_all_devices.has_key('CONFIG_VIRTIO_PCI')
  if_true: [files('virtio-gpu-pci-gl.c'), pixman])
 hw_display_modules += {'virtio-gpu-pci-gl': virtio_gpu_pci_gl_ss}
   endif
+  if rutabaga.found()
+virtio_gpu_pci_rutabaga_ss = ss.source_set()
+virtio_gpu_pci_rutabaga_ss.add(when: ['CONFIG_VIRTIO_GPU', 
'CONFIG_VIRTIO_PCI', rutabaga],
+   if_true: 
[files('virtio-gpu-pci-rutabaga.c'), pixman])
+hw_display_modules += {'virtio-gpu-pci-rutabaga': 
virtio_gpu_pci_rutabaga_ss}
+  endif
 endif
 
 if config_all_devices.has_key('CONFIG_VIRTIO_VGA')
@@ -113,6 +126,15 @@ if config_all_devices.has_key('CONFIG_VIRTIO_VGA')
   virtio_vga_gl_ss.add(when: 'CONFIG_ACPI', if_true: files('acpi-vga.c'),
 if_false: files('acpi-vga-stub.c'))
   hw_display_modules += {'virtio-vga-gl': virtio_vga_gl_ss}
+
+  if rutabaga.found()
+virtio_vga_rutabaga_ss = ss.source_set()
+virtio_vga_rutabaga_ss.add(when: ['CONFIG_VIRTIO_VGA', rutabaga],
+   if_true: [files('virtio-vga-rutabaga.c'), 
pixman])
+virtio_vga_rutabaga_ss.add(when: 'CONFIG_ACPI', if_true: 
files('acpi-vga.c'),
+if_false: 
files('acpi-vga-stub.c'))
+hw_display_modules += {'virtio-vga-rutabaga': virtio_vga_rutabaga_ss}
+  endif
 endif
 
 system_ss.add(when: 'CONFIG_OMAP', if_true: files('omap_lcdc.c'))
diff --git a/meson.build b/meson.build
index 98e68ef0b1..293f388e53 100644
--- a/meson.build
+++ b/meson.build
@@ -1069,6 +1069,12 @@ if not get_option('virglrenderer').auto() or have_system 
or have_vhost_user_gpu
dependencies: virgl))
   endif
 endif
+rutabaga = not_found
+if not get_option('rutabaga_gfx').auto() or have_system or have_vhost_user_gpu
+  rutabaga = dependency('rutabaga_gfx_ffi',
+ method: 'pkg-config',
+ required: get_option('rutabaga_gfx'))
+endif
 blkio = not_found
 if not get_option('blkio').auto() or have_block
   blkio = dependency('blkio',
@@ -4272,6 +4278,7 @@ summary_info += {'libtasn1':  tasn1}
 summary_info += {'PAM':   pam}
 summary_info += {'iconv support': iconv}
 summary_info += {'virgl support': virgl}
+summary_info += {'rutabaga support':  rutabaga}
 summary_info += {'blkio support': blkio}
 summary_info += {'curl support':  curl}
 summary_info += {'Multipath support': mpathpersist}
diff --git a/meson_options.txt b/meson_options.txt
index aaea5ddd77..dea3bf7d9c 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -224,6 +224,8 @@ option('vmnet', type : 'feature', value : 'auto',
description: 'vmnet.framework network backend support')
 option('virglrenderer', type : 'feature', value : 'auto',
description: 'virgl rendering support')
+option('rutabaga_gfx', type : 'feature', value : 'auto',
+   description: 'rutabaga_gfx support')
 option('png', type : 'feature', value : 'auto',
description: 'PNG support with libpng')
 option('vnc', type : 'feature', value : 'auto',
diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh
index 9da3fe299b..9a95b4f782 100644
--- a/scripts/meson-buildoptions.sh
+++ b/scripts/meson-buildoptions.sh
@@ -154,6 +154,7 @@ meson_options_help() {
   printf "%s\n" '  rbd Ceph block device driver'
   printf "%s\n" '  rdmaEnable RDMA-based migration'
   printf "%s\n" '  replication replication support'
+  printf "%s\n" '  rutabaga-gfxrutabaga_gfx support'
   printf "%s\n" '  sdl SDL user interface'
   printf "%s\n" '  sdl-image   SDL Image support for icons'
   printf "%s\n" '  seccomp 

[PATCH v4 0/9] gfxstream + rutabaga_gfx

2023-08-08 Thread Gurchetan Singh
From: Gurchetan Singh 

Prior versions:

https://lists.gnu.org/archive/html/qemu-devel/2023-08/msg00565.html

https://lists.gnu.org/archive/html/qemu-devel/2023-07/msg05801.html

https://lists.gnu.org/archive/html/qemu-devel/2023-07/msg02341.html

https://patchew.org/QEMU/20230421011223.718-1-gurchetansi...@chromium.org/

Changes since v3:
- Incorporated review feedback

How to build both rutabaga and gfxstream guest/host libs:

https://crosvm.dev/book/appendix/rutabaga_gfx.html

Branch containing this patch series:

https://gitlab.freedesktop.org/gurchetansingh/qemu-gfxstream/-/commits/qemu-gfxstream-v4

Antonio Caggiano (2):
  virtio-gpu: CONTEXT_INIT feature
  virtio-gpu: blob prep

Dr. David Alan Gilbert (1):
  virtio: Add shared memory capability

Gerd Hoffmann (1):
  virtio-gpu: hostmem

Gurchetan Singh (5):
  gfxstream + rutabaga prep: added need defintions, fields, and options
  gfxstream + rutabaga: add initial support for gfxstream
  gfxstream + rutabaga: meson support
  gfxstream + rutabaga: enable rutabaga
  docs/system: add basic virtio-gpu documentation

 docs/system/device-emulation.rst |1 +
 docs/system/devices/virtio-gpu.rst   |  115 +++
 hw/display/meson.build   |   22 +
 hw/display/virtio-gpu-base.c |6 +-
 hw/display/virtio-gpu-pci-rutabaga.c |   48 ++
 hw/display/virtio-gpu-pci.c  |   14 +
 hw/display/virtio-gpu-rutabaga.c | 1116 ++
 hw/display/virtio-gpu.c  |   16 +-
 hw/display/virtio-vga-rutabaga.c |   51 ++
 hw/display/virtio-vga.c  |   33 +-
 hw/virtio/virtio-pci.c   |   18 +
 include/hw/virtio/virtio-gpu-bswap.h |   18 +
 include/hw/virtio/virtio-gpu.h   |   41 +
 include/hw/virtio/virtio-pci.h   |4 +
 meson.build  |7 +
 meson_options.txt|2 +
 scripts/meson-buildoptions.sh|3 +
 softmmu/qdev-monitor.c   |3 +
 softmmu/vl.c |1 +
 19 files changed, 1500 insertions(+), 19 deletions(-)
 create mode 100644 docs/system/devices/virtio-gpu.rst
 create mode 100644 hw/display/virtio-gpu-pci-rutabaga.c
 create mode 100644 hw/display/virtio-gpu-rutabaga.c
 create mode 100644 hw/display/virtio-vga-rutabaga.c

-- 
2.41.0.640.ga95def55d0-goog




[PATCH] powerpc/spapr: Add DEXCR to device tree

2023-08-08 Thread Benjamin Gray
Each DEXCR aspect is allocated a bit in the device tree, using the
68--71 byte range (inclusive). The functionality of the
[P]HASHST/[P]HASHCHK instructions is separately declared in byte 72,
bit 0 (BE).

QEMU supports these features (though the speculation aspects are just
tracked, they don't do anything), so declare this in the device tree.

Signed-off-by: Benjamin Gray 

---

The current design appears to duplicate the previous block and add the
new features after it. I copied that for the 3.10 features, but not sure
how well this scales, so alternatives welcome.
---
 hw/ppc/spapr.c | 33 +
 1 file changed, 33 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 1c8b8d57a7..df76d2dc12 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -244,6 +244,35 @@ static void spapr_dt_pa_features(SpaprMachineState *spapr,
 /* 60: NM atomic, 62: RNG */
 0x80, 0x00, 0x80, 0x00, 0x00, 0x00, /* 60 - 65 */
 };
+uint8_t pa_features_310[] = { 74, 0,
+/* 0: MMU|FPU|SLB|RUN|DABR|NX, 1: fri[nzpm]|DABRX|SPRG3|SLB0|PP110 */
+/* 2: VPM|DS205|PPR|DS202|DS206, 3: LSD|URG, SSO, 5: LE|CFAR|EB|LSQ */
+0xf6, 0x1f, 0xc7, 0xc0, 0x80, 0xf0, /* 0 - 5 */
+/* 6: DS207 */
+0x80, 0x00, 0x00, 0x00, 0x00, 0x00, /* 6 - 11 */
+/* 16: Vector */
+0x00, 0x00, 0x00, 0x00, 0x80, 0x00, /* 12 - 17 */
+/* 18: Vec. Scalar, 20: Vec. XOR, 22: HTM */
+0x80, 0x00, 0x80, 0x00, 0x00, 0x00, /* 18 - 23 */
+/* 24: Ext. Dec, 26: 64 bit ftrs, 28: PM ftrs */
+0x80, 0x00, 0x80, 0x00, 0x80, 0x00, /* 24 - 29 */
+/* 30: MMR, 32: LE atomic, 34: EBB + ext EBB */
+0x80, 0x00, 0x80, 0x00, 0xC0, 0x00, /* 30 - 35 */
+/* 36: SPR SO, 38: Copy/Paste, 40: Radix MMU */
+0x80, 0x00, 0x80, 0x00, 0x80, 0x00, /* 36 - 41 */
+/* 42: PM, 44: PC RA, 46: SC vec'd */
+0x80, 0x00, 0x80, 0x00, 0x80, 0x00, /* 42 - 47 */
+/* 48: SIMD, 50: QP BFP, 52: String */
+0x80, 0x00, 0x80, 0x00, 0x80, 0x00, /* 48 - 53 */
+/* 54: DecFP, 56: DecI, 58: SHA */
+0x80, 0x00, 0x80, 0x00, 0x80, 0x00, /* 54 - 59 */
+/* 60: NM atomic, 62: RNG */
+0x80, 0x00, 0x80, 0x00, 0x00, 0x00, /* 60 - 65 */
+/* 68: DEXCR */
+0x00, 0x00, 0x9E, 0x00, 0x00, 0x00, /* 66 - 71 */
+/* 72: [P]HASHCHK */
+0x80, 0x00, /* 72 - 73 */
+};
 uint8_t *pa_features = NULL;
 size_t pa_size;
 
@@ -259,6 +288,10 @@ static void spapr_dt_pa_features(SpaprMachineState *spapr,
 pa_features = pa_features_300;
 pa_size = sizeof(pa_features_300);
 }
+if (ppc_check_compat(cpu, CPU_POWERPC_LOGICAL_3_10, 0, cpu->compat_pvr)) {
+pa_features = pa_features_310;
+pa_size = sizeof(pa_features_310);
+}
 if (!pa_features) {
 return;
 }
-- 
2.41.0




Re: [PATCH v3 0/9] gfxstream + rutabaga_gfx

2023-08-08 Thread Gurchetan Singh
On Mon, Aug 7, 2023 at 7:24 AM Erico Nunes  wrote:

> Hello,
>
> On 04/08/2023 01:54, Gurchetan Singh wrote:
> > Prior versions:
> >
> > https://lists.gnu.org/archive/html/qemu-devel/2023-07/msg05801.html
> >
> > https://lists.gnu.org/archive/html/qemu-devel/2023-07/msg02341.html
> >
> >
> https://patchew.org/QEMU/20230421011223.718-1-gurchetansi...@chromium.org/
> >
> > Changes since v2:
> > - Incorporated review feedback
> >
> > How to build both rutabaga and gfxstream guest/host libs:
> >
> > https://crosvm.dev/book/appendix/rutabaga_gfx.html
> >
> > Branch containing this patch series:
> >
> >
> https://gitlab.freedesktop.org/gurchetansingh/qemu-gfxstream/-/commits/qemu-gfxstream-v3
>
> I tried this on Fedora with a Fedora guest and I was able to get Vulkan
> headless applications as well as Wayland proxy with sommelier to work.
> If you don't mind, I have a few questions.
> I was not able to run Vulkan applications over the Wayland proxy, only
> unaccelerated apps. This seems to be unsupported yet; is actually
> unsupported for now or was something missing in my setup?
>

Yes, currently this is unsupported.  In the near future, I do imagine 3D
accelerated rendering over cross-domain to be a thing (among many context
types, not just gfxstream VK).

Re: using gfxstream VK in Linux distros, depends on your use case.  If you
are looking for best performance over virtio on open-source Linux
platforms, perhaps gfxstream Vulkan (or any API virtualization solution) is
not your best bet.  The Mesa native context work looks particularly
exciting there:

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21658

We are interested in running gfxstream VK in Linux guests, but we envisage
that for reference and testing.  For all embedded use cases, using the host
driver in the guest will predominate due to performance considerations
(either through virtio or HW direct / mediated passthru).

Also apparently GL/GLES is only supported on Android right now as you
> mentioned, since on Linux the gfxstream guest only installs the Vulkan
> library and icd. What is the plan to support GL on Linux; provide
> gfxstream GL guest libraries later or enable Zink or some other solution?
> Then if I understand correctly, Mesa virgl is not used at all with the
> gfxstream solution, so I guess we would need to find a way to ship the
> gfxstream guest libraries too on distributions?



Also I wonder about including all of the the dependencies required to
> get this to build on distributions as well to enable the feature on
> distribution-provided qemu, but I guess we can figure this out later...
>
> And finally out of curiosity, I see that rutabaga also has a
> virgl_renderer (and virgl_renderer_next) backend which would then not
> use gfxstream but virglrenderer instead. I wonder if there would be any
> benefit/features in enabling that with qemu later compared to the
> current qemu virtio/virglrenderer implementation (if that would make
> sense at all)?
>

Yeah, maybe later if there's developer interest,  rutabaga FFI can build
its virglrenderer bindings in a subsequent release.  So far I don't have
time to test, and the most important use case is gfxstream + Android for
Emulator.  As ever, patches are welcome.

Thanks
>
> Erico
>
>


Re: [PATCH 03/33] Update the definitions of __put_user and __get_user macros

2023-08-08 Thread Richard Henderson

On 8/8/23 18:39, Warner Losh wrote:

 > +#define __put_user_e(x, hptr, e)                                        
    \
 > +    do {                                                                
    \
 > +        PRAGMA_DISABLE_PACKED_WARNING;                                  
    \
 > +        (__builtin_choose_expr(sizeof(*(hptr)) == 1, stb_p,             
    \
 > +        __builtin_choose_expr(sizeof(*(hptr)) == 2, stw_##e##_p,        
    \
 > +        __builtin_choose_expr(sizeof(*(hptr)) == 4, stl_##e##_p,        
    \
 > +        __builtin_choose_expr(sizeof(*(hptr)) == 8, stq_##e##_p, 
abort  \
 > +            ((hptr), (x)), (void)0);                                    
    \
 > +        PRAGMA_REENABLE_PACKED_WARNING;                                 
    \
 > +    } while (0)
 > +
 > +#define __get_user_e(x, hptr, e)                                        
    \
 > +    do {                                                                
    \
 > +        PRAGMA_DISABLE_PACKED_WARNING;                                  
    \
 > +        ((x) = (typeof(*hptr))(                                         
    \
 > +        __builtin_choose_expr(sizeof(*(hptr)) == 1, ldub_p,             
    \
 > +        __builtin_choose_expr(sizeof(*(hptr)) == 2, lduw_##e##_p,       
    \
 > +        __builtin_choose_expr(sizeof(*(hptr)) == 4, ldl_##e##_p,        
    \
 > +        __builtin_choose_expr(sizeof(*(hptr)) == 8, ldq_##e##_p, 
abort  \
 > +            (hptr)), (void)0);                                          
    \
 > +        PRAGMA_REENABLE_PACKED_WARNING;                                 
    \
 > +    } while (0)

Hmm.  I guess this works.  The typeof cast in __get_user_e being required 
when
sizeof(x) >
sizeof(*hptr) in order to get the correct extension.


This code was copied 100% from the current linux-user :) 


Ha ha indeed!  I should have known.



Is it clearer with _Generic?

      (x) = _Generic(*(hptr),
                     int8_t: *(int8_t *)(hptr),
                     uint8_t: *(uint8_t *)(hptr),
                     int16_t: (int16_t)lduw_##e##_p(hptr),
                     uint16_t: lduw_##e##_p(hptr),
                     int32_t: (int32_t)ldl_##e##_p(hptr),
                     uint32_t: (uint32_t)ldl_##e##_p(hptr),
                     int64_t: (int64_t)ldq_##e##_p(hptr),
                     uint64_t: ldq_##e##_p(hptr));

In particular I believe the error message will be much prettier.


Indeed. That looks cleaner. I'll see if I can test that against the latest 
bsd-user upstream.


I'll see if we can share this code via common-user.


r~



Re: [PULL v2] hw/nvme fixes

2023-08-08 Thread Richard Henderson

On 8/8/23 06:35, Klaus Jensen wrote:

From: Klaus Jensen

Hi,

There was a small typo in the last pull. This replaces it.

The following changes since commit 0450cf08976f9036feaded438031b4cba94f6452:

   Merge tag 'fixes-pull-request' ofhttps://gitlab.com/marcandre.lureau/qemu  
into staging (2023-08-07 13:55:00 -0700)

are available in the Git repository at:

   https://gitlab.com/birkelund/qemu.git  tags/nvme-next-pull-request

for you to fetch changes up to ec5a138ce63ce460575a44cf9ec3172c33eb0fd6:

   docs: update hw/nvme documentation for protection information (2023-08-08 
15:28:05 +0200)


hw/nvme fixes


Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/8.1 as 
appropriate.


r~




Re: [PATCH 03/33] Update the definitions of __put_user and __get_user macros

2023-08-08 Thread Warner Losh
On Tue, Aug 8, 2023 at 3:03 PM Richard Henderson <
richard.hender...@linaro.org> wrote:

> On 8/7/23 23:07, Karim Taha wrote:
> > From: Warner Losh 
> >
> > Use __builtin_choose_expr to avoid type promotion from ?:
> > in __put_user_e and __get_user_e macros.
> > Copied from linux-user/qemu.h, originally by Blue Swirl.
> >
> > Signed-off-by: Warner Losh 
> > Signed-off-by: Karim Taha 
> > ---
> >   bsd-user/qemu.h   | 81 ---
> >   bsd-user/signal.c |  5 +--
> >   2 files changed, 35 insertions(+), 51 deletions(-)
> >
> > diff --git a/bsd-user/qemu.h b/bsd-user/qemu.h
> > index dfdfa8dd67..c41ebfe937 100644
> > --- a/bsd-user/qemu.h
> > +++ b/bsd-user/qemu.h
> > @@ -307,50 +307,37 @@ static inline bool access_ok(int type, abi_ulong
> addr, abi_ulong size)
> >   #define PRAGMA_REENABLE_PACKED_WARNING
> >   #endif
> >
> > -#define __put_user(x, hptr)\
> > -({\
> > -int size = sizeof(*hptr);\
> > -switch (size) {\
> > -case 1:\
> > -*(uint8_t *)(hptr) = (uint8_t)(typeof(*hptr))(x);\
> > -break;\
> > -case 2:\
> > -*(uint16_t *)(hptr) = tswap16((typeof(*hptr))(x));\
> > -break;\
> > -case 4:\
> > -*(uint32_t *)(hptr) = tswap32((typeof(*hptr))(x));\
> > -break;\
> > -case 8:\
> > -*(uint64_t *)(hptr) = tswap64((typeof(*hptr))(x));\
> > -break;\
> > -default:\
> > -abort();\
> > -} \
> > -0;\
> > -})
> > +#define __put_user_e(x, hptr, e)
> \
> > +do {
> \
> > +PRAGMA_DISABLE_PACKED_WARNING;
> \
> > +(__builtin_choose_expr(sizeof(*(hptr)) == 1, stb_p,
>  \
> > +__builtin_choose_expr(sizeof(*(hptr)) == 2, stw_##e##_p,
> \
> > +__builtin_choose_expr(sizeof(*(hptr)) == 4, stl_##e##_p,
> \
> > +__builtin_choose_expr(sizeof(*(hptr)) == 8, stq_##e##_p,
> abort  \
> > +((hptr), (x)), (void)0);
> \
> > +PRAGMA_REENABLE_PACKED_WARNING;
>  \
> > +} while (0)
> > +
> > +#define __get_user_e(x, hptr, e)
> \
> > +do {
> \
> > +PRAGMA_DISABLE_PACKED_WARNING;
> \
> > +((x) = (typeof(*hptr))(
>  \
> > +__builtin_choose_expr(sizeof(*(hptr)) == 1, ldub_p,
>  \
> > +__builtin_choose_expr(sizeof(*(hptr)) == 2, lduw_##e##_p,
>  \
> > +__builtin_choose_expr(sizeof(*(hptr)) == 4, ldl_##e##_p,
> \
> > +__builtin_choose_expr(sizeof(*(hptr)) == 8, ldq_##e##_p,
> abort  \
> > +(hptr)), (void)0);
> \
> > +PRAGMA_REENABLE_PACKED_WARNING;
>  \
> > +} while (0)
>
> Hmm.  I guess this works.  The typeof cast in __get_user_e being required
> when sizeof(x) >
> sizeof(*hptr) in order to get the correct extension.
>

This code was copied 100% from the current linux-user :) It was also copied
before I needed
to do _Generic for a different project for something as well, so I didn't
think to switch over
from the old-school, gcc-specific hack to the modern clean form.


> Is it clearer with _Generic?
>
>  (x) = _Generic(*(hptr),
> int8_t: *(int8_t *)(hptr),
> uint8_t: *(uint8_t *)(hptr),
> int16_t: (int16_t)lduw_##e##_p(hptr),
> uint16_t: lduw_##e##_p(hptr),
> int32_t: (int32_t)ldl_##e##_p(hptr),
> uint32_t: (uint32_t)ldl_##e##_p(hptr),
> int64_t: (int64_t)ldq_##e##_p(hptr),
> uint64_t: ldq_##e##_p(hptr));
>
> In particular I believe the error message will be much prettier.
>

Indeed. That looks cleaner. I'll see if I can test that against the latest
bsd-user upstream.

Warner


> Either way,
> Reviewed-by: Richard Henderson 
>
>
> r~
>


Re: [PATCH 0/9] Replace remaining target_ulong in system-mode accel

2023-08-08 Thread Richard Henderson

On 8/7/23 08:56, Anton Johansson wrote:

This patchset replaces the remaining uses of target_ulong in the accel/
directory.  Specifically, the address type of a few kvm/hvf functions
is widened to vaddr, and the address type of the cpu_[st|ld]*()
functions is changed to abi_ptr (which is re-typedef'd to vaddr in
system mode).

As a starting point, my goal is to be able to build cputlb.c once for
system mode, and this is a step in that direction by reducing the
target-dependence of accel/.

* Changes in v2:
 - Removed explicit target_ulong casts from 3rd and 4th patches.


Queued to for 8.2.


r~



Re: [PATCH v2 4/9] target: Use vaddr for hvf_arch_[insert|remove]_hw_breakpoint

2023-08-08 Thread Richard Henderson

On 8/7/23 08:57, Anton Johansson wrote:

Changes the signature of the target-defined functions for
inserting/removing hvf hw breakpoints. The address and length arguments
are now of vaddr type, which both matches the type used internally in
accel/hvf/hvf-all.c and makes the api target-agnostic.

Signed-off-by: Anton Johansson
---
  include/sysemu/hvf.h  | 6 ++
  target/arm/hvf/hvf.c  | 4 ++--
  target/i386/hvf/hvf.c | 4 ++--
  3 files changed, 6 insertions(+), 8 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH v2 3/9] target: Use vaddr for kvm_arch_[insert|remove]_hw_breakpoint

2023-08-08 Thread Richard Henderson

On 8/7/23 08:57, Anton Johansson wrote:

Changes the signature of the target-defined functions for
inserting/removing kvm hw breakpoints. The address and length arguments
are now of vaddr type, which both matches the type used internally in
accel/kvm/kvm-all.c and makes the api target-agnostic.

Signed-off-by: Anton Johansson
---
  include/sysemu/kvm.h   |  6 ++
  target/arm/kvm64.c |  6 ++
  target/i386/kvm/kvm.c  |  8 +++-
  target/ppc/kvm.c   | 13 ++---
  target/s390x/kvm/kvm.c |  6 ++
  5 files changed, 15 insertions(+), 24 deletions(-)


Reviewed-by: Richard Henderson 


r~



[PATCH for-8.1] tests/tcg: Disable filename test for info proc mappings

2023-08-08 Thread Richard Henderson
This test fails when host page size != guest page size,
because qemu may not be able to directly map the file.

Fixes: a6341482695 ("tests/tcg: Add a test for info proc mappings")
Signed-off-by: Richard Henderson 
---
 tests/tcg/multiarch/gdbstub/test-proc-mappings.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tests/tcg/multiarch/gdbstub/test-proc-mappings.py 
b/tests/tcg/multiarch/gdbstub/test-proc-mappings.py
index 7b596ac21b..5e3e5a2fb7 100644
--- a/tests/tcg/multiarch/gdbstub/test-proc-mappings.py
+++ b/tests/tcg/multiarch/gdbstub/test-proc-mappings.py
@@ -33,7 +33,8 @@ def run_test():
 return
 raise
 report(isinstance(mappings, str), "Fetched the mappings from the inferior")
-report("/sha1" in mappings, "Found the test binary name in the mappings")
+# Broken with host page size > guest page size
+# report("/sha1" in mappings, "Found the test binary name in the mappings")
 
 
 def main():
-- 
2.34.1




Re: [PULL 00/14] linux-user image mapping fixes

2023-08-08 Thread Richard Henderson

On 8/8/23 14:08, Richard Henderson wrote:

The following changes since commit 0450cf08976f9036feaded438031b4cba94f6452:

   Merge tag 'fixes-pull-request' ofhttps://gitlab.com/marcandre.lureau/qemu  
into staging (2023-08-07 13:55:00 -0700)

are available in the Git repository at:

   https://gitlab.com/rth7680/qemu.git  tags/pull-lu-20230808

for you to fetch changes up to dd55885516f42f718d0d121c59a5f7be5fdae3e6:

   linux-user: Rewrite non-fixed probe_guest_base (2023-08-08 13:41:55 -0700)


linux-user: Adjust guest image layout vs reserved_va
linux-user: Do not adjust image mapping for host page size
linux-user: Adjust initial brk when interpreter is close to executable
util/selfmap: Rewrite using qemu/interval-tree.h
linux-user: Rewrite probe_guest_base


Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/8.1 as 
appropriate.


r~




Re: [PULL 0/3] fix ppc64le build, fully kill alpha and s390 linux-user

2023-08-08 Thread Richard Henderson

On 8/8/23 11:45, Paolo Bonzini wrote:

The following changes since commit 9400601a689a128c25fa9c21e932562e0eeb7a26:

   Merge tag 'pull-tcg-20230806-3' ofhttps://gitlab.com/rth7680/qemu  into 
staging (2023-08-06 16:47:48 -0700)

are available in the Git repository at:

   https://gitlab.com/bonzini/qemu.git  tags/for-upstream

for you to fetch changes up to 971fac2731e60f2143f35648b14fd2f1b5b2c1af:

   configure: unify case statements for CPU canonicalization (2023-08-08 
20:44:11 +0200)


* cleanup architecture canonicalization once and for all


Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/8.1 as 
appropriate.


r~




[PATCH v6 1/5] ebpf: Added eBPF map update through mmap.

2023-08-08 Thread Andrew Melnychenko
Changed eBPF map updates through mmaped array.
Mmaped arrays provide direct access to map data.
It should omit using bpf_map_update_elem() call,
which may require capabilities that are not present.

Signed-off-by: Andrew Melnychenko 
---
 ebpf/ebpf_rss.c | 117 ++--
 ebpf/ebpf_rss.h |   5 +++
 2 files changed, 99 insertions(+), 23 deletions(-)

diff --git a/ebpf/ebpf_rss.c b/ebpf/ebpf_rss.c
index cee658c158..247f5eee1b 100644
--- a/ebpf/ebpf_rss.c
+++ b/ebpf/ebpf_rss.c
@@ -27,19 +27,83 @@ void ebpf_rss_init(struct EBPFRSSContext *ctx)
 {
 if (ctx != NULL) {
 ctx->obj = NULL;
+ctx->program_fd = -1;
+ctx->map_configuration = -1;
+ctx->map_toeplitz_key = -1;
+ctx->map_indirections_table = -1;
+
+ctx->mmap_configuration = NULL;
+ctx->mmap_toeplitz_key = NULL;
+ctx->mmap_indirections_table = NULL;
 }
 }
 
 bool ebpf_rss_is_loaded(struct EBPFRSSContext *ctx)
 {
-return ctx != NULL && ctx->obj != NULL;
+return ctx != NULL && (ctx->obj != NULL || ctx->program_fd != -1);
+}
+
+static bool ebpf_rss_mmap(struct EBPFRSSContext *ctx)
+{
+if (!ebpf_rss_is_loaded(ctx)) {
+return false;
+}
+
+ctx->mmap_configuration = mmap(NULL, qemu_real_host_page_size(),
+   PROT_READ | PROT_WRITE, MAP_SHARED,
+   ctx->map_configuration, 0);
+if (ctx->mmap_configuration == MAP_FAILED) {
+trace_ebpf_error("eBPF RSS", "can not mmap eBPF configuration array");
+return false;
+}
+ctx->mmap_toeplitz_key = mmap(NULL, qemu_real_host_page_size(),
+   PROT_READ | PROT_WRITE, MAP_SHARED,
+   ctx->map_toeplitz_key, 0);
+if (ctx->mmap_toeplitz_key == MAP_FAILED) {
+trace_ebpf_error("eBPF RSS", "can not mmap eBPF toeplitz key");
+goto toeplitz_fail;
+}
+ctx->mmap_indirections_table = mmap(NULL, qemu_real_host_page_size(),
+   PROT_READ | PROT_WRITE, MAP_SHARED,
+   ctx->map_indirections_table, 0);
+if (ctx->mmap_indirections_table == MAP_FAILED) {
+trace_ebpf_error("eBPF RSS", "can not mmap eBPF indirection table");
+goto indirection_fail;
+}
+
+return true;
+
+indirection_fail:
+munmap(ctx->mmap_toeplitz_key, qemu_real_host_page_size());
+toeplitz_fail:
+munmap(ctx->mmap_configuration, qemu_real_host_page_size());
+
+ctx->mmap_configuration = NULL;
+ctx->mmap_toeplitz_key = NULL;
+ctx->mmap_indirections_table = NULL;
+return false;
+}
+
+static void ebpf_rss_munmap(struct EBPFRSSContext *ctx)
+{
+if (!ebpf_rss_is_loaded(ctx)) {
+return;
+}
+
+munmap(ctx->mmap_indirections_table, qemu_real_host_page_size());
+munmap(ctx->mmap_toeplitz_key, qemu_real_host_page_size());
+munmap(ctx->mmap_configuration, qemu_real_host_page_size());
+
+ctx->mmap_configuration = NULL;
+ctx->mmap_toeplitz_key = NULL;
+ctx->mmap_indirections_table = NULL;
 }
 
 bool ebpf_rss_load(struct EBPFRSSContext *ctx)
 {
 struct rss_bpf *rss_bpf_ctx;
 
-if (ctx == NULL) {
+if (ctx == NULL || ebpf_rss_is_loaded(ctx)) {
 return false;
 }
 
@@ -66,10 +130,18 @@ bool ebpf_rss_load(struct EBPFRSSContext *ctx)
 ctx->map_toeplitz_key = bpf_map__fd(
 rss_bpf_ctx->maps.tap_rss_map_toeplitz_key);
 
+if (!ebpf_rss_mmap(ctx)) {
+goto error;
+}
+
 return true;
 error:
 rss_bpf__destroy(rss_bpf_ctx);
 ctx->obj = NULL;
+ctx->program_fd = -1;
+ctx->map_configuration = -1;
+ctx->map_toeplitz_key = -1;
+ctx->map_indirections_table = -1;
 
 return false;
 }
@@ -77,15 +149,11 @@ error:
 static bool ebpf_rss_set_config(struct EBPFRSSContext *ctx,
 struct EBPFRSSConfig *config)
 {
-uint32_t map_key = 0;
-
 if (!ebpf_rss_is_loaded(ctx)) {
 return false;
 }
-if (bpf_map_update_elem(ctx->map_configuration,
-_key, config, 0) < 0) {
-return false;
-}
+
+memcpy(ctx->mmap_configuration, config, sizeof(*config));
 return true;
 }
 
@@ -93,27 +161,19 @@ static bool ebpf_rss_set_indirections_table(struct 
EBPFRSSContext *ctx,
 uint16_t *indirections_table,
 size_t len)
 {
-uint32_t i = 0;
-
 if (!ebpf_rss_is_loaded(ctx) || indirections_table == NULL ||
len > VIRTIO_NET_RSS_MAX_TABLE_LEN) {
 return false;
 }
 
-for (; i < len; ++i) {
-if (bpf_map_update_elem(ctx->map_indirections_table, ,
-indirections_table + i, 0) < 0) {
-return false;
-}
-}
+memcpy(ctx->mmap_indirections_table, indirections_table,
+sizeof(*indirections_table) * len);
 return true;
 }

[PATCH v6 5/5] ebpf: Updated eBPF program and skeleton.

2023-08-08 Thread Andrew Melnychenko
Updated section name, so libbpf should init/gues proper
program type without specifications during open/load.
Also, added map_flags with explicitly declared BPF_F_MMAPABLE.
Which would require kernel 5.5+.

Signed-off-by: Andrew Melnychenko 
---
 ebpf/rss.bpf.skeleton.h | 1460 ---
 tools/ebpf/rss.bpf.c|5 +-
 2 files changed, 741 insertions(+), 724 deletions(-)

diff --git a/ebpf/rss.bpf.skeleton.h b/ebpf/rss.bpf.skeleton.h
index 18eb2adb12..446df248e1 100644
--- a/ebpf/rss.bpf.skeleton.h
+++ b/ebpf/rss.bpf.skeleton.h
@@ -176,162 +176,162 @@ err:
 
 static inline const void *rss_bpf__elf_bytes(size_t *sz)
 {
-   *sz = 20440;
+   *sz = 20824;
return (const void *)"\
 \x7f\x45\x4c\x46\x02\x01\x01\0\0\0\0\0\0\0\0\0\x01\0\xf7\0\x01\0\0\0\0\0\0\0\0\
-\0\0\0\0\0\0\0\0\0\0\0\x98\x4c\0\0\0\0\0\0\0\0\0\0\x40\0\0\0\0\0\x40\0\x0d\0\
-\x01\0\xbf\x19\0\0\0\0\0\0\xb7\x01\0\0\0\0\0\0\x63\x1a\x54\xff\0\0\0\0\xbf\xa7\
-\0\0\0\0\0\0\x07\x07\0\0\x54\xff\xff\xff\x18\x01\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
+\0\0\0\0\0\0\0\0\0\0\0\x18\x4e\0\0\0\0\0\0\0\0\0\0\x40\0\0\0\0\0\x40\0\x0d\0\
+\x01\0\xbf\x19\0\0\0\0\0\0\xb7\x01\0\0\0\0\0\0\x63\x1a\x4c\xff\0\0\0\0\xbf\xa7\
+\0\0\0\0\0\0\x07\x07\0\0\x4c\xff\xff\xff\x18\x01\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
 \xbf\x72\0\0\0\0\0\0\x85\0\0\0\x01\0\0\0\xbf\x06\0\0\0\0\0\0\x18\x01\0\0\0\0\0\
 \0\0\0\0\0\0\0\0\0\xbf\x72\0\0\0\0\0\0\x85\0\0\0\x01\0\0\0\xbf\x08\0\0\0\0\0\0\
-\x18\0\0\0\xff\xff\xff\xff\0\0\0\0\0\0\0\0\x15\x06\x67\x02\0\0\0\0\xbf\x87\0\0\
-\0\0\0\0\x15\x07\x65\x02\0\0\0\0\x71\x61\0\0\0\0\0\0\x55\x01\x01\0\0\0\0\0\x05\
-\0\x5e\x02\0\0\0\0\xb7\x01\0\0\0\0\0\0\x63\x1a\xc8\xff\0\0\0\0\x7b\x1a\xc0\xff\
-\0\0\0\0\x7b\x1a\xb8\xff\0\0\0\0\x7b\x1a\xb0\xff\0\0\0\0\x7b\x1a\xa8\xff\0\0\0\
-\0\x63\x1a\xa0\xff\0\0\0\0\x7b\x1a\x98\xff\0\0\0\0\x7b\x1a\x90\xff\0\0\0\0\x7b\
-\x1a\x88\xff\0\0\0\0\x7b\x1a\x80\xff\0\0\0\0\x7b\x1a\x78\xff\0\0\0\0\x7b\x1a\
-\x70\xff\0\0\0\0\x7b\x1a\x68\xff\0\0\0\0\x7b\x1a\x60\xff\0\0\0\0\x7b\x1a\x58\
-\xff\0\0\0\0\x15\x09\x4d\x02\0\0\0\0\x6b\x1a\xd0\xff\0\0\0\0\xbf\xa3\0\0\0\0\0\
-\0\x07\x03\0\0\xd0\xff\xff\xff\xbf\x91\0\0\0\0\0\0\xb7\x02\0\0\x0c\0\0\0\xb7\
+\x18\0\0\0\xff\xff\xff\xff\0\0\0\0\0\0\0\0\x15\x06\x64\x02\0\0\0\0\xbf\x87\0\0\
+\0\0\0\0\x15\x07\x62\x02\0\0\0\0\x71\x61\0\0\0\0\0\0\x55\x01\x01\0\0\0\0\0\x05\
+\0\x5b\x02\0\0\0\0\xb7\x01\0\0\0\0\0\0\x63\x1a\xc0\xff\0\0\0\0\x7b\x1a\xb8\xff\
+\0\0\0\0\x7b\x1a\xb0\xff\0\0\0\0\x7b\x1a\xa8\xff\0\0\0\0\x7b\x1a\xa0\xff\0\0\0\
+\0\x63\x1a\x98\xff\0\0\0\0\x7b\x1a\x90\xff\0\0\0\0\x7b\x1a\x88\xff\0\0\0\0\x7b\
+\x1a\x80\xff\0\0\0\0\x7b\x1a\x78\xff\0\0\0\0\x7b\x1a\x70\xff\0\0\0\0\x7b\x1a\
+\x68\xff\0\0\0\0\x7b\x1a\x60\xff\0\0\0\0\x7b\x1a\x58\xff\0\0\0\0\x7b\x1a\x50\
+\xff\0\0\0\0\x15\x09\x4a\x02\0\0\0\0\x6b\x1a\xc8\xff\0\0\0\0\xbf\xa3\0\0\0\0\0\
+\0\x07\x03\0\0\xc8\xff\xff\xff\xbf\x91\0\0\0\0\0\0\xb7\x02\0\0\x0c\0\0\0\xb7\
 \x04\0\0\x02\0\0\0\xb7\x05\0\0\0\0\0\0\x85\0\0\0\x44\0\0\0\x67\0\0\0\x20\0\0\0\
-\x77\0\0\0\x20\0\0\0\x55\0\x42\x02\0\0\0\0\xb7\x02\0\0\x10\0\0\0\x69\xa1\xd0\
+\x77\0\0\0\x20\0\0\0\x55\0\x3f\x02\0\0\0\0\xb7\x02\0\0\x10\0\0\0\x69\xa1\xc8\
 \xff\0\0\0\0\xbf\x13\0\0\0\0\0\0\xdc\x03\0\0\x10\0\0\0\x15\x03\x02\0\0\x81\0\0\
 \x55\x03\x0b\0\xa8\x88\0\0\xb7\x02\0\0\x14\0\0\0\xbf\xa3\0\0\0\0\0\0\x07\x03\0\
-\0\xd0\xff\xff\xff\xbf\x91\0\0\0\0\0\0\xb7\x04\0\0\x02\0\0\0\xb7\x05\0\0\0\0\0\
-\0\x85\0\0\0\x44\0\0\0\x67\0\0\0\x20\0\0\0\x77\0\0\0\x20\0\0\0\x55\0\x32\x02\0\
-\0\0\0\x69\xa1\xd0\xff\0\0\0\0\x15\x01\x30\x02\0\0\0\0\x7b\x7a\x38\xff\0\0\0\0\
-\x7b\x9a\x40\xff\0\0\0\0\x15\x01\x55\0\x86\xdd\0\0\x55\x01\x39\0\x08\0\0\0\xb7\
-\x07\0\0\x01\0\0\0\x73\x7a\x58\xff\0\0\0\0\xb7\x01\0\0\0\0\0\0\x63\x1a\xe0\xff\
-\0\0\0\0\x7b\x1a\xd8\xff\0\0\0\0\x7b\x1a\xd0\xff\0\0\0\0\xbf\xa3\0\0\0\0\0\0\
-\x07\x03\0\0\xd0\xff\xff\xff\x79\xa1\x40\xff\0\0\0\0\xb7\x02\0\0\0\0\0\0\xb7\
+\0\xc8\xff\xff\xff\xbf\x91\0\0\0\0\0\0\xb7\x04\0\0\x02\0\0\0\xb7\x05\0\0\0\0\0\
+\0\x85\0\0\0\x44\0\0\0\x67\0\0\0\x20\0\0\0\x77\0\0\0\x20\0\0\0\x55\0\x2f\x02\0\
+\0\0\0\x69\xa1\xc8\xff\0\0\0\0\x15\x01\x2d\x02\0\0\0\0\x7b\x7a\x30\xff\0\0\0\0\
+\x7b\x9a\x38\xff\0\0\0\0\x15\x01\x55\0\x86\xdd\0\0\x55\x01\x39\0\x08\0\0\0\xb7\
+\x07\0\0\x01\0\0\0\x73\x7a\x50\xff\0\0\0\0\xb7\x01\0\0\0\0\0\0\x63\x1a\xd8\xff\
+\0\0\0\0\x7b\x1a\xd0\xff\0\0\0\0\x7b\x1a\xc8\xff\0\0\0\0\xbf\xa3\0\0\0\0\0\0\
+\x07\x03\0\0\xc8\xff\xff\xff\x79\xa1\x38\xff\0\0\0\0\xb7\x02\0\0\0\0\0\0\xb7\
 \x04\0\0\x14\0\0\0\xb7\x05\0\0\x01\0\0\0\x85\0\0\0\x44\0\0\0\x67\0\0\0\x20\0\0\
-\0\x77\0\0\0\x20\0\0\0\x55\0\x1c\x02\0\0\0\0\x69\xa1\xd6\xff\0\0\0\0\x55\x01\
-\x01\0\0\0\0\0\xb7\x07\0\0\0\0\0\0\x61\xa1\xdc\xff\0\0\0\0\x63\x1a\x64\xff\0\0\
-\0\0\x61\xa1\xe0\xff\0\0\0\0\x63\x1a\x68\xff\0\0\0\0\x71\xa9\xd9\xff\0\0\0\0\
-\x73\x7a\x5e\xff\0\0\0\0\x71\xa1\xd0\xff\0\0\0\0\x67\x01\0\0\x02\0\0\0\x57\x01\
-\0\0\x3c\0\0\0\x7b\x1a\x48\xff\0\0\0\0\xbf\x91\0\0\0\0\0\0\x57\x01\0\0\xff\0\0\
+\0\x77\0\0\0\x20\0\0\0\x55\0\x19\x02\0\0\0\0\x69\xa1\xce\xff\0\0\0\0\x55\x01\

[PATCH v6 4/5] qmp: Added new command to retrieve eBPF blob.

2023-08-08 Thread Andrew Melnychenko
Now, the binary objects may be retrieved by id.
It would require for future qmp commands that may require specific
eBPF blob.

Added command "request-ebpf". This command returns
eBPF program encoded base64. The program taken from the
skeleton and essentially is an ELF object that can be
loaded in the future with libbpf.

The reason to use the command to provide the eBPF object
instead of a separate artifact was to avoid issues related
to finding the eBPF itself. eBPF object is an ELF binary
that contains the eBPF program and eBPF map description(BTF).
Overall, eBPF object should contain the program and enough
metadata to create/load eBPF with libbpf. As the eBPF
maps/program should correspond to QEMU, the eBPF can't
be used from different QEMU build.

The first solution was a helper that comes with QEMU
and loads appropriate eBPF objects. And the issue is
to find a proper helper if the system has several
different QEMUs installed and/or built from the source,
which helpers may not be compatible.

Another issue is QEMU updating while there is a running
QEMU instance. With an updated helper, it may not be
possible to hotplug virtio-net device to the already
running QEMU. Overall, requesting the eBPF object from
QEMU itself solves possible failures with acceptable effort.

Links:
[PATCH 3/5] qmp: Added the helper stamp check.
https://lore.kernel.org/all/20230219162100.174318-4-and...@daynix.com/

Signed-off-by: Andrew Melnychenko 
---
 ebpf/ebpf.c   | 70 +++
 ebpf/ebpf.h   | 31 +++
 ebpf/ebpf_rss.c   |  6 
 ebpf/meson.build  |  2 +-
 qapi/ebpf.json| 56 ++
 qapi/meson.build  |  1 +
 qapi/qapi-schema.json |  1 +
 7 files changed, 166 insertions(+), 1 deletion(-)
 create mode 100644 ebpf/ebpf.c
 create mode 100644 ebpf/ebpf.h
 create mode 100644 qapi/ebpf.json

diff --git a/ebpf/ebpf.c b/ebpf/ebpf.c
new file mode 100644
index 00..ea97c0403e
--- /dev/null
+++ b/ebpf/ebpf.c
@@ -0,0 +1,70 @@
+/*
+ * QEMU eBPF binary declaration routine.
+ *
+ * Developed by Daynix Computing LTD (http://www.daynix.com)
+ *
+ * Authors:
+ *  Andrew Melnychenko 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/queue.h"
+#include "qapi/error.h"
+#include "qapi/qapi-commands-ebpf.h"
+#include "ebpf/ebpf.h"
+
+struct ElfBinaryDataEntry {
+int id;
+const void *data;
+size_t datalen;
+
+QSLIST_ENTRY(ElfBinaryDataEntry) node;
+};
+
+static QSLIST_HEAD(, ElfBinaryDataEntry) ebpf_elf_obj_list =
+QSLIST_HEAD_INITIALIZER();
+
+void ebpf_register_binary_data(int id, const void *data, size_t datalen)
+{
+struct ElfBinaryDataEntry *dataentry = NULL;
+
+dataentry = g_new0(struct ElfBinaryDataEntry, 1);
+dataentry->data = data;
+dataentry->datalen = datalen;
+dataentry->id = id;
+
+QSLIST_INSERT_HEAD(_elf_obj_list, dataentry, node);
+}
+
+const void *ebpf_find_binary_by_id(int id, size_t *sz, Error **errp)
+{
+struct ElfBinaryDataEntry *it = NULL;
+QSLIST_FOREACH(it, _elf_obj_list, node) {
+if (id == it->id) {
+*sz = it->datalen;
+return it->data;
+}
+}
+
+error_setg(errp, "can't find eBPF object with id: %d", id);
+
+return NULL;
+}
+
+EbpfObject *qmp_request_ebpf(EbpfProgramID id, Error **errp)
+{
+EbpfObject *ret = NULL;
+size_t size = 0;
+const void *data = ebpf_find_binary_by_id(id, , errp);
+if (!data) {
+return NULL;
+}
+
+ret = g_new0(EbpfObject, 1);
+ret->object = g_base64_encode(data, size);
+
+return ret;
+}
diff --git a/ebpf/ebpf.h b/ebpf/ebpf.h
new file mode 100644
index 00..b6266b28b8
--- /dev/null
+++ b/ebpf/ebpf.h
@@ -0,0 +1,31 @@
+/*
+ * QEMU eBPF binary declaration routine.
+ *
+ * Developed by Daynix Computing LTD (http://www.daynix.com)
+ *
+ * Authors:
+ *  Andrew Melnychenko 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#ifndef EBPF_H
+#define EBPF_H
+
+struct Error;
+
+void ebpf_register_binary_data(int id, const void *data,
+   size_t datalen);
+const void *ebpf_find_binary_by_id(int id, size_t *sz,
+   struct Error **errp);
+
+#define ebpf_binary_init(id, fn)   \
+static void __attribute__((constructor)) ebpf_binary_init_ ## fn(void) \
+{  \
+size_t datalen = 0;\
+const void *data = fn();   \
+ebpf_register_binary_data(id, data, datalen);  \
+}
+
+#endif /* EBPF_H */
diff --git 

[PATCH v6 2/5] ebpf: Added eBPF initialization by fds.

2023-08-08 Thread Andrew Melnychenko
It allows using file descriptors of eBPF provided
outside of QEMU.
QEMU may be run without capabilities for eBPF and run
RSS program provided by management tool(g.e. libvirt).

Signed-off-by: Andrew Melnychenko 
---
 ebpf/ebpf_rss-stub.c |  6 ++
 ebpf/ebpf_rss.c  | 27 +++
 ebpf/ebpf_rss.h  |  5 +
 3 files changed, 38 insertions(+)

diff --git a/ebpf/ebpf_rss-stub.c b/ebpf/ebpf_rss-stub.c
index e71e229190..8d7fae2ad9 100644
--- a/ebpf/ebpf_rss-stub.c
+++ b/ebpf/ebpf_rss-stub.c
@@ -28,6 +28,12 @@ bool ebpf_rss_load(struct EBPFRSSContext *ctx)
 return false;
 }
 
+bool ebpf_rss_load_fds(struct EBPFRSSContext *ctx, int program_fd,
+   int config_fd, int toeplitz_fd, int table_fd)
+{
+return false;
+}
+
 bool ebpf_rss_set_all(struct EBPFRSSContext *ctx, struct EBPFRSSConfig *config,
   uint16_t *indirections_table, uint8_t *toeplitz_key)
 {
diff --git a/ebpf/ebpf_rss.c b/ebpf/ebpf_rss.c
index 247f5eee1b..24bc6cc409 100644
--- a/ebpf/ebpf_rss.c
+++ b/ebpf/ebpf_rss.c
@@ -146,6 +146,33 @@ error:
 return false;
 }
 
+bool ebpf_rss_load_fds(struct EBPFRSSContext *ctx, int program_fd,
+   int config_fd, int toeplitz_fd, int table_fd)
+{
+if (ctx == NULL || ebpf_rss_is_loaded(ctx)) {
+return false;
+}
+
+if (program_fd < 0 || config_fd < 0 || toeplitz_fd < 0 || table_fd < 0) {
+return false;
+}
+
+ctx->program_fd = program_fd;
+ctx->map_configuration = config_fd;
+ctx->map_toeplitz_key = toeplitz_fd;
+ctx->map_indirections_table = table_fd;
+
+if (!ebpf_rss_mmap(ctx)) {
+ctx->program_fd = -1;
+ctx->map_configuration = -1;
+ctx->map_toeplitz_key = -1;
+ctx->map_indirections_table = -1;
+return false;
+}
+
+return true;
+}
+
 static bool ebpf_rss_set_config(struct EBPFRSSContext *ctx,
 struct EBPFRSSConfig *config)
 {
diff --git a/ebpf/ebpf_rss.h b/ebpf/ebpf_rss.h
index ab08a7266d..239242b0d2 100644
--- a/ebpf/ebpf_rss.h
+++ b/ebpf/ebpf_rss.h
@@ -14,6 +14,8 @@
 #ifndef QEMU_EBPF_RSS_H
 #define QEMU_EBPF_RSS_H
 
+#define EBPF_RSS_MAX_FDS 4
+
 struct EBPFRSSContext {
 void *obj;
 int program_fd;
@@ -41,6 +43,9 @@ bool ebpf_rss_is_loaded(struct EBPFRSSContext *ctx);
 
 bool ebpf_rss_load(struct EBPFRSSContext *ctx);
 
+bool ebpf_rss_load_fds(struct EBPFRSSContext *ctx, int program_fd,
+   int config_fd, int toeplitz_fd, int table_fd);
+
 bool ebpf_rss_set_all(struct EBPFRSSContext *ctx, struct EBPFRSSConfig *config,
   uint16_t *indirections_table, uint8_t *toeplitz_key);
 
-- 
2.40.1




[PATCH v6 0/5] eBPF RSS through QMP support.

2023-08-08 Thread Andrew Melnychenko
This series of patches provides the ability to retrieve eBPF program
through qmp, so management application may load bpf blob with proper 
capabilities.
Now, virtio-net devices can accept eBPF programs and maps through properties
as external file descriptors. Access to the eBPF map is direct through mmap()
call, so it should not require additional capabilities to bpf* calls.
eBPF file descriptors can be passed to QEMU from parent process or by unix
socket with sendfd() qmp command.

Possible solution for libvirt may look like this: 
https://github.com/daynix/libvirt/tree/RSS_eBPF (WIP)

Changes since v5:
 * Refactored ebpf.json

Changes since v4:
 * refactored commit hunks
 * added explicit BPF_F_MMAPABLE declaration

Changes since v3:
 * fixed issue with the build if bpf disabled
 * rebased to the last master
 * refactored according to review

Changes since v2:
 * moved/refactored QMP command
 * refactored virtio-net

Changes since v1:
 * refactored virtio-net
 * moved hunks for ebpf mmap()
 * added qmp enum for eBPF id.

Andrew Melnychenko (5):
  ebpf: Added eBPF map update through mmap.
  ebpf: Added eBPF initialization by fds.
  virtio-net: Added property to load eBPF RSS with fds.
  qmp: Added new command to retrieve eBPF blob.
  ebpf: Updated eBPF program and skeleton.

 ebpf/ebpf.c|   70 ++
 ebpf/ebpf.h|   31 +
 ebpf/ebpf_rss-stub.c   |6 +
 ebpf/ebpf_rss.c|  150 +++-
 ebpf/ebpf_rss.h|   10 +
 ebpf/meson.build   |2 +-
 ebpf/rss.bpf.skeleton.h| 1460 
 hw/net/virtio-net.c|   55 +-
 include/hw/virtio/virtio-net.h |1 +
 qapi/ebpf.json |   56 ++
 qapi/meson.build   |1 +
 qapi/qapi-schema.json  |1 +
 tools/ebpf/rss.bpf.c   |5 +-
 13 files changed, 1094 insertions(+), 754 deletions(-)
 create mode 100644 ebpf/ebpf.c
 create mode 100644 ebpf/ebpf.h
 create mode 100644 qapi/ebpf.json

-- 
2.40.1




[PATCH v6 3/5] virtio-net: Added property to load eBPF RSS with fds.

2023-08-08 Thread Andrew Melnychenko
eBPF RSS program and maps may now be passed during initialization.
Initially was implemented for libvirt to launch qemu without permissions,
and initialized eBPF program through the helper.

Signed-off-by: Andrew Melnychenko 
---
 hw/net/virtio-net.c| 55 ++
 include/hw/virtio/virtio-net.h |  1 +
 2 files changed, 50 insertions(+), 6 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 7102ec4817..f1894f2095 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -42,6 +42,7 @@
 #include "sysemu/sysemu.h"
 #include "trace.h"
 #include "monitor/qdev.h"
+#include "monitor/monitor.h"
 #include "hw/pci/pci_device.h"
 #include "net_rx_pkt.h"
 #include "hw/virtio/vhost.h"
@@ -1304,14 +1305,55 @@ static void virtio_net_detach_epbf_rss(VirtIONet *n)
 virtio_net_attach_ebpf_to_backend(n->nic, -1);
 }
 
-static bool virtio_net_load_ebpf(VirtIONet *n)
+static bool virtio_net_load_ebpf_fds(VirtIONet *n, Error **errp)
 {
-if (!virtio_net_attach_ebpf_to_backend(n->nic, -1)) {
-/* backend does't support steering ebpf */
-return false;
+int fds[EBPF_RSS_MAX_FDS] = { [0 ... EBPF_RSS_MAX_FDS - 1] = -1};
+int nfds = 0;
+int ret = true;
+int i = 0;
+g_auto(GStrv) fds_strs = g_strsplit(n->ebpf_rss_fds, ":", 0);
+
+ERRP_GUARD();
+
+if (g_strv_length(fds_strs) != EBPF_RSS_MAX_FDS) {
+error_setg(errp,
+  "Expected %d file descriptors but got %d",
+  EBPF_RSS_MAX_FDS, g_strv_length(fds_strs));
+   return false;
+   }
+
+for (i = 0; i < nfds; i++) {
+fds[i] = monitor_fd_param(monitor_cur(), fds_strs[i], errp);
+if (*errp) {
+ret = false;
+goto exit;
+}
+}
+
+ret = ebpf_rss_load_fds(>ebpf_rss, fds[0], fds[1], fds[2], fds[3]);
+
+exit:
+if (!ret || *errp) {
+for (i = 0; i < nfds && fds[i] != -1; i++) {
+close(fds[i]);
+}
 }
 
-return ebpf_rss_load(>ebpf_rss);
+return ret;
+}
+
+static bool virtio_net_load_ebpf(VirtIONet *n, Error **errp)
+{
+bool ret = false;
+
+if (virtio_net_attach_ebpf_to_backend(n->nic, -1)) {
+if (!(n->ebpf_rss_fds
+&& virtio_net_load_ebpf_fds(n, errp))) {
+ret = ebpf_rss_load(>ebpf_rss);
+}
+}
+
+return ret;
 }
 
 static void virtio_net_unload_ebpf(VirtIONet *n)
@@ -3741,7 +3783,7 @@ static void virtio_net_device_realize(DeviceState *dev, 
Error **errp)
 net_rx_pkt_init(>rx_pkt);
 
 if (virtio_has_feature(n->host_features, VIRTIO_NET_F_RSS)) {
-virtio_net_load_ebpf(n);
+virtio_net_load_ebpf(n, errp);
 }
 }
 
@@ -3903,6 +3945,7 @@ static Property virtio_net_properties[] = {
 VIRTIO_NET_F_RSS, false),
 DEFINE_PROP_BIT64("hash", VirtIONet, host_features,
 VIRTIO_NET_F_HASH_REPORT, false),
+DEFINE_PROP_STRING("ebpf_rss_fds", VirtIONet, ebpf_rss_fds),
 DEFINE_PROP_BIT64("guest_rsc_ext", VirtIONet, host_features,
 VIRTIO_NET_F_RSC_EXT, false),
 DEFINE_PROP_UINT32("rsc_interval", VirtIONet, rsc_timeout,
diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
index 5f5dcb4572..44faf700b4 100644
--- a/include/hw/virtio/virtio-net.h
+++ b/include/hw/virtio/virtio-net.h
@@ -219,6 +219,7 @@ struct VirtIONet {
 VirtioNetRssData rss_data;
 struct NetRxPkt *rx_pkt;
 struct EBPFRSSContext ebpf_rss;
+char *ebpf_rss_fds;
 };
 
 size_t virtio_net_handle_ctrl_iov(VirtIODevice *vdev,
-- 
2.40.1




Re: [PATCH v5 1/5] ebpf: Added eBPF map update through mmap.

2023-08-08 Thread Andrew Melnichenko
Hi all,

On Tue, Aug 8, 2023 at 5:39 AM Jason Wang  wrote:
>
> On Thu, Aug 3, 2023 at 5:01 AM Andrew Melnychenko  wrote:
> >
> > Changed eBPF map updates through mmaped array.
> > Mmaped arrays provide direct access to map data.
> > It should omit using bpf_map_update_elem() call,
> > which may require capabilities that are not present.
> >
> > Signed-off-by: Andrew Melnychenko 
> > ---
> >  ebpf/ebpf_rss.c | 117 ++--
> >  ebpf/ebpf_rss.h |   5 +++
> >  2 files changed, 99 insertions(+), 23 deletions(-)
> >
> > diff --git a/ebpf/ebpf_rss.c b/ebpf/ebpf_rss.c
> > index cee658c158b..247f5eee1b6 100644
> > --- a/ebpf/ebpf_rss.c
> > +++ b/ebpf/ebpf_rss.c
> > @@ -27,19 +27,83 @@ void ebpf_rss_init(struct EBPFRSSContext *ctx)
> >  {
> >  if (ctx != NULL) {
> >  ctx->obj = NULL;
> > +ctx->program_fd = -1;
> > +ctx->map_configuration = -1;
> > +ctx->map_toeplitz_key = -1;
> > +ctx->map_indirections_table = -1;
> > +
> > +ctx->mmap_configuration = NULL;
> > +ctx->mmap_toeplitz_key = NULL;
> > +ctx->mmap_indirections_table = NULL;
> >  }
> >  }
> >
> >  bool ebpf_rss_is_loaded(struct EBPFRSSContext *ctx)
> >  {
> > -return ctx != NULL && ctx->obj != NULL;
> > +return ctx != NULL && (ctx->obj != NULL || ctx->program_fd != -1);
> > +}
> > +
> > +static bool ebpf_rss_mmap(struct EBPFRSSContext *ctx)
> > +{
> > +if (!ebpf_rss_is_loaded(ctx)) {
> > +return false;
> > +}
> > +
> > +ctx->mmap_configuration = mmap(NULL, qemu_real_host_page_size(),
> > +   PROT_READ | PROT_WRITE, MAP_SHARED,
> > +   ctx->map_configuration, 0);
> > +if (ctx->mmap_configuration == MAP_FAILED) {
> > +trace_ebpf_error("eBPF RSS", "can not mmap eBPF configuration 
> > array");
> > +return false;
> > +}
> > +ctx->mmap_toeplitz_key = mmap(NULL, qemu_real_host_page_size(),
> > +   PROT_READ | PROT_WRITE, MAP_SHARED,
> > +   ctx->map_toeplitz_key, 0);
> > +if (ctx->mmap_toeplitz_key == MAP_FAILED) {
> > +trace_ebpf_error("eBPF RSS", "can not mmap eBPF toeplitz key");
> > +goto toeplitz_fail;
> > +}
> > +ctx->mmap_indirections_table = mmap(NULL, qemu_real_host_page_size(),
> > +   PROT_READ | PROT_WRITE, MAP_SHARED,
> > +   ctx->map_indirections_table, 0);
> > +if (ctx->mmap_indirections_table == MAP_FAILED) {
> > +trace_ebpf_error("eBPF RSS", "can not mmap eBPF indirection 
> > table");
> > +goto indirection_fail;
> > +}
> > +
> > +return true;
> > +
> > +indirection_fail:
> > +munmap(ctx->mmap_toeplitz_key, qemu_real_host_page_size());
> > +toeplitz_fail:
> > +munmap(ctx->mmap_configuration, qemu_real_host_page_size());
> > +
> > +ctx->mmap_configuration = NULL;
> > +ctx->mmap_toeplitz_key = NULL;
> > +ctx->mmap_indirections_table = NULL;
> > +return false;
> > +}
> > +
> > +static void ebpf_rss_munmap(struct EBPFRSSContext *ctx)
> > +{
> > +if (!ebpf_rss_is_loaded(ctx)) {
> > +return;
> > +}
> > +
> > +munmap(ctx->mmap_indirections_table, qemu_real_host_page_size());
> > +munmap(ctx->mmap_toeplitz_key, qemu_real_host_page_size());
> > +munmap(ctx->mmap_configuration, qemu_real_host_page_size());
> > +
> > +ctx->mmap_configuration = NULL;
> > +ctx->mmap_toeplitz_key = NULL;
> > +ctx->mmap_indirections_table = NULL;
> >  }
> >
> >  bool ebpf_rss_load(struct EBPFRSSContext *ctx)
> >  {
> >  struct rss_bpf *rss_bpf_ctx;
> >
> > -if (ctx == NULL) {
> > +if (ctx == NULL || ebpf_rss_is_loaded(ctx)) {
> >  return false;
> >  }
> >
> > @@ -66,10 +130,18 @@ bool ebpf_rss_load(struct EBPFRSSContext *ctx)
> >  ctx->map_toeplitz_key = bpf_map__fd(
> >  rss_bpf_ctx->maps.tap_rss_map_toeplitz_key);
> >
> > +if (!ebpf_rss_mmap(ctx)) {
> > +goto error;
> > +}
> > +
> >  return true;
> >  error:
> >  rss_bpf__destroy(rss_bpf_ctx);
> >  ctx->obj = NULL;
> > +ctx->program_fd = -1;
> > +ctx->map_configuration = -1;
> > +ctx->map_toeplitz_key = -1;
> > +ctx->map_indirections_table = -1;
> >
> >  return false;
> >  }
> > @@ -77,15 +149,11 @@ error:
> >  static bool ebpf_rss_set_config(struct EBPFRSSContext *ctx,
> >  struct EBPFRSSConfig *config)
> >  {
> > -uint32_t map_key = 0;
> > -
> >  if (!ebpf_rss_is_loaded(ctx)) {
> >  return false;
> >  }
> > -if (bpf_map_update_elem(ctx->map_configuration,
> > -_key, config, 0) < 0) {
> > -return false;
> > -}
> > +
> > +memcpy(ctx->mmap_configuration, config, sizeof(*config));
> >  return true;
> >  }
> >
> > @@ -93,27 +161,19 @@ static bool 

Re: [PATCH] gdbstub: Fix client Ctrl-C handling

2023-08-08 Thread Richard Henderson

On 8/1/23 11:40, Matheus Tavares Bernardino wrote:

Hi, Nick.


Nicholas Piggin  wrote:

On Tue Jul 11, 2023 at 9:03 PM AEST, Matheus Tavares Bernardino wrote:

Nicholas Piggin  wrote:

diff --git a/gdbstub/gdbstub.c b/gdbstub/gdbstub.c
index 6911b73c07..ce8b42eb15 100644
--- a/gdbstub/gdbstub.c
+++ b/gdbstub/gdbstub.c
@@ -2051,8 +2051,17 @@ void gdb_read_byte(uint8_t ch)
  return;
  }
  if (runstate_is_running()) {
-/* when the CPU is running, we cannot do anything except stop
-   it when receiving a char */
+/*
+ * When the CPU is running, we cannot do anything except stop
+ * it when receiving a char. This is expected on a Ctrl-C in the
+ * gdb client. Because we are in all-stop mode, gdb sends a
+ * 0x03 byte which is not a usual packet, so we handle it specially
+ * here, but it does expect a stop reply.
+ */
+if (ch != 0x03) {
+warn_report("gdbstub: client sent packet while target running\n");
+}
+gdbserver_state.allow_stop_reply = true;
  vm_stop(RUN_STATE_PAUSED);
  } else
  #endif


Makes sense to me, but shouldn't we send the stop-reply packet only for
Ctrl+C/0x03?


Good question.

I think if we get a character here that's not a 3, we're already in
trouble, and we eat it so even worse. Since we only send a stop packet
back when the vm stops, then if we don't send one now we might never
send it. At least if we send one then the client might have some chance
to get back to a sane state.


I just noticed now (as I was integrating the latest upstream patches
with our downstream qemu-system-hexagon) that this causes the
gdbstub-untimely-packet tcg test to fail.

My first thought was that, if 0x3 is the only valid case where we will
read a char when the cpu is running, perhaps not issuing the stop-reply
isn't that bad as GDB would ignore it anyways. E.g. from a `set debug
remote 1` output:

   Sending packet: $qSupported:multiprocess+;swbreak+;hwbreak+;qRelocInsn+;
fork-events+;vfork-events+;exec-events+;vContSupported+;
   QThreadEvents+;no-resumed+;
   xmlRegisters=i386#6a...
   Packet instead of Ack, ignoring it

So, perhaps, we could do:

diff --git a/gdbstub/gdbstub.c b/gdbstub/gdbstub.c
index f123b40ce7..8af066301a 100644
--- a/gdbstub/gdbstub.c
+++ b/gdbstub/gdbstub.c
@@ -2055,8 +2055,9 @@ void gdb_read_byte(uint8_t ch)
   */
  if (ch != 0x03) {
  warn_report("gdbstub: client sent packet while target running\n");
+} else {
+gdbserver_state.allow_stop_reply = true;
  }
-gdbserver_state.allow_stop_reply = true;
  vm_stop(RUN_STATE_PAUSED);
  } else
  #endif
-- >8 --

Alternatively, since GDB ignores the packet anyways, should we just let
this be and refactor/remove the test?


Ping, Alex and Nick.

I can confirm that Matheus' patch fixes the problem I am seeing.
But I'm also open to removing the test.


r~




CXL volatile memory is not listed

2023-08-08 Thread Maverickk 78
Hello,

I am running qemu-system-x86_64

qemu-system-x86_64 --version
QEMU emulator version 8.0.92 (v8.1.0-rc2-80-g0450cf0897)

qemu-system-x86_64 \
-m 2G,slots=4,maxmem=4G \
-smp 4 \
-machine type=q35,accel=kvm,cxl=on \
-enable-kvm \
-nographic \
-device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52 \
-device cxl-rp,id=rp0,bus=cxl.0,chassis=0,port=0,slot=0 \
-object memory-backend-file,id=mem0,mem-path=/tmp/mem0,size=1G,share=true \
-device cxl-type3,bus=rp0,volatile-memdev=mem0,id=cxl-mem0 \
-M cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.size=1G


I was expecting the CXL memory to be listed in "System Ram", the lsmem
shows only 2G memory which is System RAM, it's not listing the CXL
memory.

Do I need to pass any particular parameter in the kernel command line?

Is there any documentation available? I followed the inputs provided in

https://lore.kernel.org/linux-mm/y+csoehvlkudn...@kroah.com/T/

Is there any documentation/blog listed?



Re: [PATCH v4 38/38] tests/tcg: Add a test for info proc mappings

2023-08-08 Thread Richard Henderson

On 6/30/23 11:04, Alex Bennée wrote:

From: Ilya Leoshkevich 

Add a small test to prevent regressions.

Signed-off-by: Ilya Leoshkevich 
Message-Id: <20230621203627.1808446-9-...@linux.ibm.com>
Signed-off-by: Alex Bennée 

...

+def run_test():
+"""Run through the tests one by one"""
+try:
+mappings = gdb.execute("info proc mappings", False, True)
+except gdb.error as exc:
+exc_str = str(exc)
+if "Not supported on this target." in exc_str:
+# Detect failures due to an outstanding issue with how GDB handles
+# the x86_64 QEMU's target.xml, which does not contain the
+# definition of orig_rax. Skip the test in this case.
+print("SKIP: {}".format(exc_str))
+return
+raise
+report(isinstance(mappings, str), "Fetched the mappings from the inferior")
+report("/sha1" in mappings, "Found the test binary name in the mappings")


This test fails on ppc64 host, or indeed any host with page size != 4k.

When host page size != target page size, and the executable image is small, such as sha1, 
then target_mmap cannot directly map the executable file, but must implement the mmap via 
MAP_ANON + pread.  Which leaves us with no binary name in the host /proc/self/maps for us 
to copy to the artificial guest /proc/self/maps.


One of the very many issues with page size mismatch...

I'm tempted to remove the test, but I suppose we could check host page size in 
python.


r~




Re: [PATCH] qemu/osdep: Remove fallback for MAP_FIXED_NOREPLACE

2023-08-08 Thread Alex Bennée


Richard Henderson  writes:

> In order for our emulation of MAP_FIXED_NOREPLACE to succeed within
> linux-user target_mmap, we require a non-zero value.  This does not
> require host kernel support, merely the bit being defined.
>
> MAP_FIXED_NOREPLACE was added with glibc 2.28.  From repology.org:
>
>   Fedora 36: 2.35
>   CentOS 8 (RHEL-8): 2.28
>   Debian 11: 2.31
>  OpenSUSE Leap 15.4: 2.31
>Ubuntu LTS 20.04: 2.31
>
> Reported-by: Akihiko Odaki 
> Signed-off-by: Richard Henderson 

Reviewed-by: Alex Bennée 

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro



Re: [PATCH 2/2] riscv: zicond: make default

2023-08-08 Thread Vineet Gupta




On 8/8/23 14:06, Daniel Henrique Barboza wrote:

(CCing Alistair and other reviewers)

On 8/8/23 15:17, Vineet Gupta wrote:

Again this helps with better testing and something qemu has been doing
with newer features anyways.

Signed-off-by: Vineet Gupta 
---


Even if we can reach a consensus about removing the experimental (x- 
prefix) status
from an extension that is Frozen instead of ratified, enabling stuff 
in the default
CPUs because it's easier to test is something we would like to avoid. 
The rv64
CPU has a random set of extensions enabled for the most different and 
undocumented
reasons, and users don't know what they'll get because we keep beefing 
up the

generic CPUs arbitrarily.


I understand this position given the arbitrary nature of gazillion 
extensions. However pragmatically things like bitmanip and zicond are so 
fundamental it would be strange for designs to not have them, in a few 
years. Besides these don't compete or conflict with other extensions.
But on face value it is indeed possible for vendors to drop them for 
various reasons or no-reasons.


But having the x- dropped is good enough for our needs as there's 
already mechanisms to enable the toggles from elf attributes.




Starting on QEMU 8.2 we'll have a 'max' CPU type that will enable all 
non-experimental
and non-vendor extensions by default, making it easier for tooling to 
test new
features/extensions. All tooling should consider changing their 
scripts to use the

'max' CPU when it's available.


That would be great.



For now, I fear that gcc and friends will still need to enable 
'zicond' in the command

line via 'zicond=true'.  Thanks,


Thx,
-Vineet



Re: [PATCH 18/33] Implement stat related syscalls

2023-08-08 Thread Richard Henderson

On 8/7/23 23:08, Karim Taha wrote:

From: Stacey Son 

Implement the following syscalls:
stat(2)
lstat(2)
fstat(2)
fstatat(2)
nstat
nfstat
nlstat

Signed-off-by: Stacey Son 
Signed-off-by: Karim Taha 


Why are all of these in os-stat.h instead of os-stat.c?
Is this attempting to avoid clang's warning for unused static inline function 
in a c file?


+/* stat(2) */
+static inline abi_long do_freebsd11_stat(abi_long arg1, abi_long arg2)
+{
+abi_long ret;
+void *p;
+struct freebsd11_stat st;
+
+LOCK_PATH(p, arg1);
+ret = get_errno(freebsd11_stat(path(p), ));
+UNLOCK_PATH(p, arg1);
+if (!is_error(ret)) {
+ret = h2t_freebsd11_stat(arg2, );
+}
+return ret;
+}


The patch ordering is poor, because freebsd11_stat is used here but not introduced until 
patch 23, and do_freebsd11_stat itself is not used until patch 30.


And yet you delay compilation of os-stat.c until patch 29.  Patch 29 is either too early 
or too late, depending on the viewpoint.


If os-stat.c compilation was enabled earlier, it would require you to order all of the 
patches such that os-stat.c will always compile.


If os-stat.c compilation was enabled later (indeed last), you would not need to mark this 
function 'static inline' in order to avoid unused function warnings prior to their use in 
patches 30-33.


I prefer the ordering in which os-stat.c always compiles.  This probably requires patches 
23-27 be ordered first, and patches 30-33 be merged with patches 18-22.  There is no need 
for *any* of these functions to be marked inline -- leave that choice to the compiler.



r~



Re: [PATCH 17/33] Implement h2t_freebsd_stat and h2t_freebsd_statfs functions

2023-08-08 Thread Richard Henderson

On 8/7/23 23:07, Karim Taha wrote:

From: Michal Meloun

They are the 64-bit variants of h2t_freebsd11_stat and
h2t_freebsd11_statfs, respectively

Signed-off-by: Michal Meloun
Signed-off-by: Karim Taha
---
  bsd-user/freebsd/os-stat.c | 82 ++
  1 file changed, 82 insertions(+)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH 16/33] Implement host-target convertion functions

2023-08-08 Thread Richard Henderson

On 8/7/23 23:07, Karim Taha wrote:

From: Stacey Son 

Implement the stat converstion functions:
target_to_host_fcntl_cmd

Signed-off-by: Stacey Son 
Signed-off-by: Karim Taha 
---
  bsd-user/freebsd/os-stat.c | 71 ++
  1 file changed, 71 insertions(+)


Which host / guest pairs have varying fcntl constants?
I thought freebsd had these standardized...


r~



Re: [PATCH 15/33] Implement host-target convertion functions

2023-08-08 Thread Richard Henderson

On 8/7/23 23:07, Karim Taha wrote:

From: Stacey Son 

Implement the stat converstion functions:
h2t_freebds11_statfs

Signed-off-by: Stacey Son 
Signed-off-by: Karim Taha 



Same typo and changes to subject.  Otherwise,

Reviewed-by: Richard Henderson 

r~



Re: [PATCH 14/33] Implement host-target convertion functions

2023-08-08 Thread Richard Henderson

On 8/7/23 23:07, Karim Taha wrote:

From: Stacey Son

Implement the stat converstion functions:
h2t_freebsd_fhandle
t2h_freebsd_fhandle

Signed-off-by: Stacey Son
Signed-off-by: Karim Taha
---
  bsd-user/freebsd/os-stat.c | 37 +
  1 file changed, 37 insertions(+)


Same typo and changes to subject.  Otherwise,

Reviewed-by: Richard Henderson 


r~



Re: [PATCH 13/33] Implement host-target convertion functions

2023-08-08 Thread Richard Henderson

On 8/7/23 23:07, Karim Taha wrote:

From: Stacey Son 

Implement the stat converstion functions:
h2t_freebsd11_stat
h2t_freebsd_nstat


"conversion"

And copy the functions into the subject, so that we don't wind up with 4 commits with 
identical subject.



r~



Re: [PATCH 13/33] Implement host-target convertion functions

2023-08-08 Thread Richard Henderson

On 8/7/23 23:07, Karim Taha wrote:

From: Stacey Son

Implement the stat converstion functions:
h2t_freebsd11_stat
h2t_freebsd_nstat

Signed-off-by: Stacey Son
Signed-off-by: Karim Taha
---
  bsd-user/freebsd/os-stat.c | 94 ++
  1 file changed, 94 insertions(+)
  create mode 100644 bsd-user/freebsd/os-stat.c


Reviewed-by: Richard Henderson 

r~



Re: [PATCH 12/33] Rename target_freebsd_time_t to target_time_t

2023-08-08 Thread Richard Henderson

On 8/7/23 23:07, Karim Taha wrote:

From: Warner Losh

This is necessary for future code using target_time_t, in
bsd-user/syscall_defs.

Signed-off-by: Warner Losh
Signed-off-by: Karim Taha
---
  bsd-user/syscall_defs.h | 8 
  1 file changed, 4 insertions(+), 4 deletions(-)


Reviewed-by: Richard Henderson 


r~



Re: [PATCH 11/33] Define safe_fcntl macro in bsd-user/syscall_defs.h

2023-08-08 Thread Richard Henderson

On 8/7/23 23:07, Karim Taha wrote:

From: Kyle Evans

Signed-off-by: Kyle Evans
Signed-off-by: Karim Taha
---
  bsd-user/syscall_defs.h | 2 ++
  1 file changed, 2 insertions(+)


Reviewed-by: Richard Henderson 


r~



Re: [PATCH 10/33] Add struct target_freebsd_fhandle and fcntl flags to bsd-user/syscall_defs.h

2023-08-08 Thread Richard Henderson

On 8/7/23 23:07, Karim Taha wrote:

+struct target_freebsd_fid {
+u_short fid_len;/* len of data in bytes */
+u_short fid_data0;  /* force longword align */
+charfid_data[TARGET_MAXFIDSZ];  /* data (variable len) */


uint16_t?

Otherwise,
Acked-by: Richard Henderson 


r~



Re: [PATCH 09/33] Add struct target_statfs to bsd-user/syscall_defs.h

2023-08-08 Thread Richard Henderson

On 8/7/23 23:07, Karim Taha wrote:

+struct target_statfs {
+uint32_t f_version; /* structure version number */


Indentation.  Otherwise,

Acked-by: Richard Henderson 


r~



Re: [PATCH 08/33] Add structs target_freebsd11_nstat and target_freebsd11_statfs to bsd-user/syscall_defs.h

2023-08-08 Thread Richard Henderson

On 8/7/23 23:07, Karim Taha wrote:

+uint32_t   st_flags;/* user defined flags for file */
+__uint32_t st_gen;  /* file generation number */


Drop the __.


+/* __int32_t  st_lspare; */


Why commented out?


+struct target_freebsd_timespec st_birthtim; /* time of file creation */


Does that not place st_birthtim at the wrong place?


r~



Re: [PATCH QEMU v3 1/3] tests: Add migration dirty-limit capability test

2023-08-08 Thread Peter Xu
On Thu, Jun 08, 2023 at 12:46:45AM +0800, ~hyman wrote:
> From: Hyman Huang(黄勇) 
> 
> Add migration dirty-limit capability test if kernel support
> dirty ring.
> 
> Migration dirty-limit capability introduce dirty limit
> capability, two parameters: x-vcpu-dirty-limit-period and
> vcpu-dirty-limit are introduced to implement the live
> migration with dirty limit.
> 
> The test case does the following things:
> 1. start src, dst vm and enable dirty-limit capability
> 2. start migrate and set cancel it to check if dirty limit
>stop working.
> 3. restart dst vm
> 4. start migrate and enable dirty-limit capability
> 5. check if migration satisfy the convergence condition
>during pre-switchover phase.
> 
> Note that this test case involves many passes, so it runs
> in slow mode only.
> 
> Signed-off-by: Hyman Huang(黄勇) 

Acked-by: Peter Xu 

-- 
Peter Xu




Re: [PATCH 06/33] Add struct target_freebsd11_stat to bsd-user/syscall_defs

2023-08-08 Thread Richard Henderson

On 8/7/23 23:07, Karim Taha wrote:

+uint32_t   st_flags;/* user defined flags for file */
+__uint32_t st_gen;  /* file generation number */
+__int32_t  st_lspare;


Oh, drop the __ types.


r~



Re: [PATCH 07/33] Add struct target_stat to bsd-user/syscall_defs.h

2023-08-08 Thread Richard Henderson

On 8/7/23 23:07, Karim Taha wrote:

+struct target_stat {
+uint64_t  st_dev;   /* inode's device */
+uint64_t  st_ino;   /* inode's number */
+uint64_t  st_nlink; /* number of hard links */


Indentation of 8 instead of 4.

Otherwise,
Acked-by: Richard Henderson 


r~



Re: [PATCH 06/33] Add struct target_freebsd11_stat to bsd-user/syscall_defs

2023-08-08 Thread Richard Henderson

On 8/7/23 23:07, Karim Taha wrote:

From: Stacey Son

Signed-off-by: Stacey Son
Signed-off-by: Karim Taha
---
  bsd-user/syscall_defs.h | 33 +
  1 file changed, 33 insertions(+)


Acked-by: Richard Henderson 


r~



Re: [PATCH 05/33] Forward declare functions defined in os-stat.c

2023-08-08 Thread Richard Henderson

On 8/7/23 23:07, Karim Taha wrote:

From: Michal Meloun 

Add to bsd-user/freebsd/qemu-os.h the forward declarations
of conversion functions related to stat syscalls.

Signed-off-by: Michal Meloun 
Signed-off-by: Karim Taha 
---
  bsd-user/freebsd/qemu-os.h | 15 +++
  1 file changed, 15 insertions(+)

diff --git a/bsd-user/freebsd/qemu-os.h b/bsd-user/freebsd/qemu-os.h
index 7ef4c55350..12adc50928 100644
--- a/bsd-user/freebsd/qemu-os.h
+++ b/bsd-user/freebsd/qemu-os.h
@@ -32,4 +32,19 @@
  
  struct freebsd11_stat;
  
+/* os-stat.c */

+abi_long h2t_freebsd11_stat(abi_ulong target_addr,
+struct freebsd11_stat *host_st);
+abi_long h2t_freebsd11_nstat(abi_ulong target_addr,
+struct freebsd11_stat *host_st);
+abi_long t2h_freebsd_fhandle(fhandle_t *host_fh, abi_ulong target_addr);
+abi_long h2t_freebsd_fhandle(abi_ulong target_addr, fhandle_t *host_fh);
+abi_long h2t_freebsd11_statfs(abi_ulong target_addr,
+struct freebsd11_statfs *host_statfs);
+abi_long target_to_host_fcntl_cmd(int cmd);
+abi_long h2t_freebsd_stat(abi_ulong target_addr,
+struct stat *host_st);
+abi_long h2t_freebsd_statfs(abi_ulong target_addr,
+struct statfs *host_statfs);
+



This appears to be the patch that matches the subject and comment for patch 4.


r~




Re: [PATCH 04/33] Declarations of h2t and t2h conversion functions.

2023-08-08 Thread Richard Henderson

On 8/7/23 23:07, Karim Taha wrote:

From: Stacey Son 

Declarations of functions that convert between host and target structs.

Signed-off-by: Stacey Son 
Signed-off-by: Karim Taha 
---
  bsd-user/freebsd/qemu-os.h | 35 +++
  bsd-user/qemu.h|  1 +
  2 files changed, 36 insertions(+)
  create mode 100644 bsd-user/freebsd/qemu-os.h

diff --git a/bsd-user/freebsd/qemu-os.h b/bsd-user/freebsd/qemu-os.h
new file mode 100644
index 00..7ef4c55350
--- /dev/null
+++ b/bsd-user/freebsd/qemu-os.h
@@ -0,0 +1,35 @@
+/*
+ *  FreeBSD conversion extern declarations
+ *
+ *  Copyright (c) 2013 Stacey D. Son
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see .
+ */
+
+#ifndef QEMU_OS_H
+#define QEMU_OS_H
+
+/* qemu/osdep.h pulls in the rest */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct freebsd11_stat;
+
+#endif /* QEMU_OS_H */
diff --git a/bsd-user/qemu.h b/bsd-user/qemu.h
index c41ebfe937..1344c3fce6 100644
--- a/bsd-user/qemu.h
+++ b/bsd-user/qemu.h
@@ -38,6 +38,7 @@ extern char **environ;
  #include "exec/gdbstub.h"
  #include "qemu/clang-tsa.h"
  
+#include "qemu-os.h"

  /*
   * This struct is used to hold certain information about the image.  
Basically,
   * it replicates in user space what would be certain task_struct fields in the


The subject and comment do not match the patch, or at least not obviously.
This appears to merely include some extra system headers, not declare any sort of 
coversion functions.



r~



Re: [PATCH 1/2] riscv: zicond: make non-experimental

2023-08-08 Thread Palmer Dabbelt

On Tue, 08 Aug 2023 14:10:54 PDT (-0700), dbarb...@ventanamicro.com wrote:



On 8/8/23 17:52, Palmer Dabbelt wrote:

On Tue, 08 Aug 2023 11:45:49 PDT (-0700), Vineet Gupta wrote:



On 8/8/23 11:29, Richard Henderson wrote:

On 8/8/23 11:17, Vineet Gupta wrote:

zicond is now codegen supported in both llvm and gcc.


It is still not in

https://wiki.riscv.org/display/HOME/Recently+Ratified+Extensions


Right, its been frozen since April though and with support trickling in
rest of tooling it becomes harder to test.
I don't know what exactly QEMU's policy is on this ?


IIUC we'd historically marked stuff as non-experimental when it's frozen, largely because 
ratification is such a nebulous process. There's obviously risk there, but there's risk to 
anything.  Last I can find is 260b594d8a ("RISC-V: Add Zawrs ISA extension support"), 
which specifically calls out Zawrs as frozen and IIUC adds support without the "x-" 
prefix.


If that's the case then I think it's sensible to remove the 'experimental' 
status
of zicond as well.



I can't find anything written down about it, though...


As soon as we agree on an official policy I'll do a doc update. Thanks,


Thanks.  We should probably give Alistair some time to chime in, it's 
still pretty early there.





Daniel







Re: [PATCH 1/2] riscv: zicond: make non-experimental

2023-08-08 Thread Daniel Henrique Barboza




On 8/8/23 17:52, Palmer Dabbelt wrote:

On Tue, 08 Aug 2023 11:45:49 PDT (-0700), Vineet Gupta wrote:



On 8/8/23 11:29, Richard Henderson wrote:

On 8/8/23 11:17, Vineet Gupta wrote:

zicond is now codegen supported in both llvm and gcc.


It is still not in

https://wiki.riscv.org/display/HOME/Recently+Ratified+Extensions


Right, its been frozen since April though and with support trickling in
rest of tooling it becomes harder to test.
I don't know what exactly QEMU's policy is on this ?


IIUC we'd historically marked stuff as non-experimental when it's frozen, largely because 
ratification is such a nebulous process. There's obviously risk there, but there's risk to 
anything.  Last I can find is 260b594d8a ("RISC-V: Add Zawrs ISA extension support"), 
which specifically calls out Zawrs as frozen and IIUC adds support without the "x-" 
prefix.


If that's the case then I think it's sensible to remove the 'experimental' 
status
of zicond as well.



I can't find anything written down about it, though...


As soon as we agree on an official policy I'll do a doc update. Thanks,


Daniel







[PULL 04/14] linux-user: Use MAP_FIXED_NOREPLACE for initial image mmap

2023-08-08 Thread Richard Henderson
Use this as extra protection for the guest mapping over
any qemu host mappings.

Tested-by: Helge Deller 
Reviewed-by: Helge Deller 
Reviewed-by: Akihiko Odaki 
Signed-off-by: Richard Henderson 
---
 linux-user/elfload.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 36e4026f05..1b4bb2d5af 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -3147,8 +3147,11 @@ static void load_elf_image(const char *image_name, int 
image_fd,
 /*
  * Reserve address space for all of this.
  *
- * In the case of ET_EXEC, we supply MAP_FIXED so that we get
- * exactly the address range that is required.
+ * In the case of ET_EXEC, we supply MAP_FIXED_NOREPLACE so that we get
+ * exactly the address range that is required.  Without reserved_va,
+ * the guest address space is not isolated.  We have attempted to avoid
+ * conflict with the host program itself via probe_guest_base, but using
+ * MAP_FIXED_NOREPLACE instead of MAP_FIXED provides an extra check.
  *
  * Otherwise this is ET_DYN, and we are searching for a location
  * that can hold the memory space required.  If the image is
@@ -3160,7 +3163,7 @@ static void load_elf_image(const char *image_name, int 
image_fd,
  */
 load_addr = target_mmap(loaddr, (size_t)hiaddr - loaddr + 1, PROT_NONE,
 MAP_PRIVATE | MAP_ANON | MAP_NORESERVE |
-(ehdr->e_type == ET_EXEC ? MAP_FIXED : 0),
+(ehdr->e_type == ET_EXEC ? MAP_FIXED_NOREPLACE : 
0),
 -1, 0);
 if (load_addr == -1) {
 goto exit_mmap;
-- 
2.34.1




[PULL 08/14] linux-user: Do not adjust zero_bss for host page size

2023-08-08 Thread Richard Henderson
Rely on target_mmap to handle guest vs host page size mismatch.

Tested-by: Helge Deller 
Reviewed-by: Helge Deller 
Reviewed-by: Akihiko Odaki 
Signed-off-by: Richard Henderson 
---
 linux-user/elfload.c | 63 +++-
 1 file changed, 27 insertions(+), 36 deletions(-)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 964b21f997..881fdeb157 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -2211,47 +2211,37 @@ static abi_ulong setup_arg_pages(struct linux_binprm 
*bprm,
 }
 }
 
-/* Map and zero the bss.  We need to explicitly zero any fractional pages
-   after the data section (i.e. bss).  */
-static void zero_bss(abi_ulong elf_bss, abi_ulong last_bss, int prot)
+/**
+ * zero_bss:
+ *
+ * Map and zero the bss.  We need to explicitly zero any fractional pages
+ * after the data section (i.e. bss).  Return false on mapping failure.
+ */
+static bool zero_bss(abi_ulong start_bss, abi_ulong end_bss, int prot)
 {
-uintptr_t host_start, host_map_start, host_end;
+abi_ulong align_bss;
 
-last_bss = TARGET_PAGE_ALIGN(last_bss);
+align_bss = TARGET_PAGE_ALIGN(start_bss);
+end_bss = TARGET_PAGE_ALIGN(end_bss);
 
-/* ??? There is confusion between qemu_real_host_page_size and
-   qemu_host_page_size here and elsewhere in target_mmap, which
-   may lead to the end of the data section mapping from the file
-   not being mapped.  At least there was an explicit test and
-   comment for that here, suggesting that "the file size must
-   be known".  The comment probably pre-dates the introduction
-   of the fstat system call in target_mmap which does in fact
-   find out the size.  What isn't clear is if the workaround
-   here is still actually needed.  For now, continue with it,
-   but merge it with the "normal" mmap that would allocate the bss.  */
+if (start_bss < align_bss) {
+int flags = page_get_flags(start_bss);
 
-host_start = (uintptr_t) g2h_untagged(elf_bss);
-host_end = (uintptr_t) g2h_untagged(last_bss);
-host_map_start = REAL_HOST_PAGE_ALIGN(host_start);
-
-if (host_map_start < host_end) {
-void *p = mmap((void *)host_map_start, host_end - host_map_start,
-   prot, MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
-if (p == MAP_FAILED) {
-perror("cannot mmap brk");
-exit(-1);
+if (!(flags & PAGE_VALID)) {
+/* Map the start of the bss. */
+align_bss -= TARGET_PAGE_SIZE;
+} else if (flags & PAGE_WRITE) {
+/* The page is already mapped writable. */
+memset(g2h_untagged(start_bss), 0, align_bss - start_bss);
+} else {
+/* Read-only zeros? */
+g_assert_not_reached();
 }
 }
 
-/* Ensure that the bss page(s) are valid */
-if ((page_get_flags(last_bss-1) & prot) != prot) {
-page_set_flags(elf_bss & TARGET_PAGE_MASK, last_bss - 1,
-   prot | PAGE_VALID);
-}
-
-if (host_start < host_map_start) {
-memset((void *)host_start, 0, host_map_start - host_start);
-}
+return align_bss >= end_bss ||
+   target_mmap(align_bss, end_bss - align_bss, prot,
+   MAP_FIXED | MAP_PRIVATE | MAP_ANON, -1, 0) != -1;
 }
 
 #if defined(TARGET_ARM)
@@ -3255,8 +3245,9 @@ static void load_elf_image(const char *image_name, int 
image_fd,
 /*
  * If the load segment requests extra zeros (e.g. bss), map it.
  */
-if (eppnt->p_filesz < eppnt->p_memsz) {
-zero_bss(vaddr_ef, vaddr_em, elf_prot);
+if (eppnt->p_filesz < eppnt->p_memsz &&
+!zero_bss(vaddr_ef, vaddr_em, elf_prot)) {
+goto exit_mmap;
 }
 } else if (eppnt->p_memsz != 0) {
 vaddr_len = eppnt->p_memsz + vaddr_po;
-- 
2.34.1




[PULL 14/14] linux-user: Rewrite non-fixed probe_guest_base

2023-08-08 Thread Richard Henderson
Use pgb_addr_set to probe for all of the guest addresses,
not just the main executable.  Handle the identity map
specially and separately from the search.

If /proc/self/maps is available, utilize the full power
of the interval tree search, rather than a linear search
through the address list.

If /proc/self/maps is not available, increase the skip
between probes so that we do not probe every single page
of the host address space.  Choose 1 MiB for 32-bit hosts
(max 4k probes) and 1 GiB for 64-bit hosts (possibly a
large number of probes, but the large step makes it more
likely to find empty space quicker).

Tested-by: Alex Bennée 
Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 linux-user/elfload.c | 311 ---
 1 file changed, 115 insertions(+), 196 deletions(-)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index a5f9dd5b31..ac03beb01b 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -2683,220 +2683,143 @@ static void pgb_fixed(const char *image_name, 
uintptr_t guest_loaddr,
 }
 
 /**
- * pgd_find_hole_fallback: potential mmap address
- * @guest_size: size of available space
- * @brk: location of break
- * @align: memory alignment
+ * pgb_find_fallback:
  *
- * This is a fallback method for finding a hole in the host address
- * space if we don't have the benefit of being able to access
- * /proc/self/map. It can potentially take a very long time as we can
- * only dumbly iterate up the host address space seeing if the
- * allocation would work.
+ * This is a fallback method for finding holes in the host address space
+ * if we don't have the benefit of being able to access /proc/self/map.
+ * It can potentially take a very long time as we can only dumbly iterate
+ * up the host address space seeing if the allocation would work.
  */
-static uintptr_t pgd_find_hole_fallback(uintptr_t guest_size, uintptr_t brk,
-long align, uintptr_t offset)
+static uintptr_t pgb_find_fallback(const PGBAddrs *ga, uintptr_t align,
+   uintptr_t brk)
 {
-uintptr_t base;
+/* TODO: come up with a better estimate of how much to skip. */
+uintptr_t skip = sizeof(uintptr_t) == 4 ? MiB : GiB;
 
-/* Start (aligned) at the bottom and work our way up */
-base = ROUND_UP(mmap_min_addr, align);
-
-while (true) {
-uintptr_t align_start, end;
-align_start = ROUND_UP(base, align);
-end = align_start + guest_size + offset;
-
-/* if brk is anywhere in the range give ourselves some room to grow. */
-if (align_start <= brk && brk < end) {
-base = brk + (16 * MiB);
-continue;
-} else if (align_start + guest_size < align_start) {
-/* we have run out of space */
+for (uintptr_t base = skip; ; base += skip) {
+base = ROUND_UP(base, align);
+if (pgb_try_mmap_set(ga, base, brk)) {
+return base;
+}
+if (base >= -skip) {
 return -1;
-} else {
-int flags = MAP_ANONYMOUS | MAP_PRIVATE | MAP_NORESERVE |
-MAP_FIXED_NOREPLACE;
-void * mmap_start = mmap((void *) align_start, guest_size,
- PROT_NONE, flags, -1, 0);
-if (mmap_start != MAP_FAILED) {
-munmap(mmap_start, guest_size);
-if (mmap_start == (void *) align_start) {
-return (uintptr_t) mmap_start + offset;
-}
-}
-base += qemu_host_page_size;
 }
 }
 }
 
-/* Return value for guest_base, or -1 if no hole found. */
-static uintptr_t pgb_find_hole(uintptr_t guest_loaddr, uintptr_t guest_size,
-   long align, uintptr_t offset)
+static uintptr_t pgb_try_itree(const PGBAddrs *ga, uintptr_t base,
+   IntervalTreeRoot *root)
 {
-IntervalTreeRoot *maps;
-IntervalTreeNode *iter;
-uintptr_t this_start, this_end, next_start, brk;
-intptr_t ret = -1;
+for (int i = ga->nbounds - 1; i >= 0; --i) {
+uintptr_t s = base + ga->bounds[i][0];
+uintptr_t l = base + ga->bounds[i][1];
+IntervalTreeNode *n;
+
+if (l < s) {
+/* Wraparound. Skip to advance S to mmap_min_addr. */
+return mmap_min_addr - s;
+}
+
+n = interval_tree_iter_first(root, s, l);
+if (n != NULL) {
+/* Conflict.  Skip to advance S to LAST + 1. */
+return n->last - s + 1;
+}
+}
+return 0;  /* success */
+}
+
+static uintptr_t pgb_find_itree(const PGBAddrs *ga, IntervalTreeRoot *root,
+uintptr_t align, uintptr_t brk)
+{
+uintptr_t last = mmap_min_addr;
+uintptr_t base, skip;
+
+while (true) {
+base = ROUND_UP(last, align);
+if (base < last) {
+return -1;
+}
+
+skip = 

[PULL 09/14] linux-user: Use zero_bss for PT_LOAD with no file contents too

2023-08-08 Thread Richard Henderson
If p_filesz == 0, then vaddr_ef == vaddr.  We can reuse the
code in zero_bss rather than incompletely duplicating it in
load_elf_image.

Tested-by: Helge Deller 
Reviewed-by: Helge Deller 
Reviewed-by: Akihiko Odaki 
Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 linux-user/elfload.c | 29 -
 1 file changed, 8 insertions(+), 21 deletions(-)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 881fdeb157..e72497c4b4 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -3208,7 +3208,7 @@ static void load_elf_image(const char *image_name, int 
image_fd,
 for (i = 0; i < ehdr->e_phnum; i++) {
 struct elf_phdr *eppnt = phdr + i;
 if (eppnt->p_type == PT_LOAD) {
-abi_ulong vaddr, vaddr_po, vaddr_ps, vaddr_ef, vaddr_em, vaddr_len;
+abi_ulong vaddr, vaddr_po, vaddr_ps, vaddr_ef, vaddr_em;
 int elf_prot = 0;
 
 if (eppnt->p_flags & PF_R) {
@@ -3233,31 +3233,18 @@ static void load_elf_image(const char *image_name, int 
image_fd,
  * but no backing file segment.
  */
 if (eppnt->p_filesz != 0) {
-vaddr_len = eppnt->p_filesz + vaddr_po;
-error = target_mmap(vaddr_ps, vaddr_len, elf_prot,
-MAP_PRIVATE | MAP_FIXED,
+error = target_mmap(vaddr_ps, eppnt->p_filesz + vaddr_po,
+elf_prot, MAP_PRIVATE | MAP_FIXED,
 image_fd, eppnt->p_offset - vaddr_po);
-
 if (error == -1) {
 goto exit_mmap;
 }
+}
 
-/*
- * If the load segment requests extra zeros (e.g. bss), map it.
- */
-if (eppnt->p_filesz < eppnt->p_memsz &&
-!zero_bss(vaddr_ef, vaddr_em, elf_prot)) {
-goto exit_mmap;
-}
-} else if (eppnt->p_memsz != 0) {
-vaddr_len = eppnt->p_memsz + vaddr_po;
-error = target_mmap(vaddr_ps, vaddr_len, elf_prot,
-MAP_PRIVATE | MAP_FIXED | MAP_ANONYMOUS,
--1, 0);
-
-if (error == -1) {
-goto exit_mmap;
-}
+/* If the load segment requests extra zeros (e.g. bss), map it. */
+if (vaddr_ef < vaddr_em &&
+!zero_bss(vaddr_ef, vaddr_em, elf_prot)) {
+goto exit_mmap;
 }
 
 /* Find the full program boundaries.  */
-- 
2.34.1




[PULL 05/14] linux-user: Use elf_et_dyn_base for ET_DYN with interpreter

2023-08-08 Thread Richard Henderson
Follow the lead of the linux kernel in fs/binfmt_elf.c,
in which an ET_DYN executable which uses an interpreter
(usually a PIE executable) is loaded away from where the
interpreter itself will be loaded.

Tested-by: Helge Deller 
Reviewed-by: Helge Deller 
Reviewed-by: Akihiko Odaki 
Signed-off-by: Richard Henderson 
---
 linux-user/elfload.c | 27 +--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 1b4bb2d5af..d1b278d799 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -3107,6 +3107,8 @@ static void load_elf_image(const char *image_name, int 
image_fd,
 }
 }
 
+load_addr = loaddr;
+
 if (pinterp_name != NULL) {
 /*
  * This is the main executable.
@@ -3136,11 +3138,32 @@ static void load_elf_image(const char *image_name, int 
image_fd,
  */
 probe_guest_base(image_name, loaddr, hiaddr);
 } else {
+abi_ulong align;
+
 /*
  * The binary is dynamic, but we still need to
  * select guest_base.  In this case we pass a size.
  */
 probe_guest_base(image_name, 0, hiaddr - loaddr);
+
+/*
+ * Avoid collision with the loader by providing a different
+ * default load address.
+ */
+load_addr += elf_et_dyn_base;
+
+/*
+ * TODO: Better support for mmap alignment is desirable.
+ * Since we do not have complete control over the guest
+ * address space, we prefer the kernel to choose some address
+ * rather than force the use of LOAD_ADDR via MAP_FIXED.
+ * But without MAP_FIXED we cannot guarantee alignment,
+ * only suggest it.
+ */
+align = pow2ceil(info->alignment);
+if (align) {
+load_addr &= -align;
+}
 }
 }
 
@@ -3155,13 +3178,13 @@ static void load_elf_image(const char *image_name, int 
image_fd,
  *
  * Otherwise this is ET_DYN, and we are searching for a location
  * that can hold the memory space required.  If the image is
- * pre-linked, LOADDR will be non-zero, and the kernel should
+ * pre-linked, LOAD_ADDR will be non-zero, and the kernel should
  * honor that address if it happens to be free.
  *
  * In both cases, we will overwrite pages in this range with mappings
  * from the executable.
  */
-load_addr = target_mmap(loaddr, (size_t)hiaddr - loaddr + 1, PROT_NONE,
+load_addr = target_mmap(load_addr, (size_t)hiaddr - loaddr + 1, PROT_NONE,
 MAP_PRIVATE | MAP_ANON | MAP_NORESERVE |
 (ehdr->e_type == ET_EXEC ? MAP_FIXED_NOREPLACE : 
0),
 -1, 0);
-- 
2.34.1




[PULL 06/14] linux-user: Adjust initial brk when interpreter is close to executable

2023-08-08 Thread Richard Henderson
From: Helge Deller 

While we attempt to load a ET_DYN executable far away from
TASK_UNMAPPED_BASE, we are not completely in control of the
address space layout.  If the interpreter lands close to
the executable, leaving insufficient heap space, move brk.

Tested-by: Helge Deller 
Signed-off-by: Helge Deller 
[rth: Re-order after ELF_ET_DYN_BASE patch so that we do not
 "temporarily break" tsan, and also to minimize the changes required.
 Remove image_info.reserve_brk as unused.]
Reviewed-by: Akihiko Odaki 
Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 linux-user/qemu.h|  1 -
 linux-user/elfload.c | 51 +---
 2 files changed, 15 insertions(+), 37 deletions(-)

diff --git a/linux-user/qemu.h b/linux-user/qemu.h
index 2046a23037..4f8b55e2fb 100644
--- a/linux-user/qemu.h
+++ b/linux-user/qemu.h
@@ -30,7 +30,6 @@ struct image_info {
 abi_ulong   start_data;
 abi_ulong   end_data;
 abi_ulong   brk;
-abi_ulong   reserve_brk;
 abi_ulong   start_mmap;
 abi_ulong   start_stack;
 abi_ulong   stack_limit;
diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index d1b278d799..3553a3eaef 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -3110,27 +3110,6 @@ static void load_elf_image(const char *image_name, int 
image_fd,
 load_addr = loaddr;
 
 if (pinterp_name != NULL) {
-/*
- * This is the main executable.
- *
- * Reserve extra space for brk.
- * We hold on to this space while placing the interpreter
- * and the stack, lest they be placed immediately after
- * the data segment and block allocation from the brk.
- *
- * 16MB is chosen as "large enough" without being so large as
- * to allow the result to not fit with a 32-bit guest on a
- * 32-bit host. However some 64 bit guests (e.g. s390x)
- * attempt to place their heap further ahead and currently
- * nothing stops them smashing into QEMUs address space.
- */
-#if TARGET_LONG_BITS == 64
-info->reserve_brk = 32 * MiB;
-#else
-info->reserve_brk = 16 * MiB;
-#endif
-hiaddr += info->reserve_brk;
-
 if (ehdr->e_type == ET_EXEC) {
 /*
  * Make sure that the low address does not conflict with
@@ -3221,7 +3200,8 @@ static void load_elf_image(const char *image_name, int 
image_fd,
 info->end_code = 0;
 info->start_data = -1;
 info->end_data = 0;
-info->brk = 0;
+/* Usual start for brk is after all sections of the main executable. */
+info->brk = TARGET_PAGE_ALIGN(hiaddr);
 info->elf_flags = ehdr->e_flags;
 
 prot_exec = PROT_EXEC;
@@ -3315,9 +3295,6 @@ static void load_elf_image(const char *image_name, int 
image_fd,
 info->end_data = vaddr_ef;
 }
 }
-if (vaddr_em > info->brk) {
-info->brk = vaddr_em;
-}
 #ifdef TARGET_MIPS
 } else if (eppnt->p_type == PT_MIPS_ABIFLAGS) {
 Mips_elf_abiflags_v0 abiflags;
@@ -3646,6 +3623,19 @@ int load_elf_binary(struct linux_binprm *bprm, struct 
image_info *info)
 if (elf_interpreter) {
 load_elf_interp(elf_interpreter, _info, bprm->buf);
 
+/*
+ * While unusual because of ELF_ET_DYN_BASE, if we are unlucky
+ * with the mappings the interpreter can be loaded above but
+ * near the main executable, which can leave very little room
+ * for the heap.
+ * If the current brk has less than 16MB, use the end of the
+ * interpreter.
+ */
+if (interp_info.brk > info->brk &&
+interp_info.load_bias - info->brk < 16 * MiB)  {
+info->brk = interp_info.brk;
+}
+
 /* If the program interpreter is one of these two, then assume
an iBCS2 image.  Otherwise assume a native linux image.  */
 
@@ -3699,17 +3689,6 @@ int load_elf_binary(struct linux_binprm *bprm, struct 
image_info *info)
 bprm->core_dump = _core_dump;
 #endif
 
-/*
- * If we reserved extra space for brk, release it now.
- * The implementation of do_brk in syscalls.c expects to be able
- * to mmap pages in this space.
- */
-if (info->reserve_brk) {
-abi_ulong start_brk = TARGET_PAGE_ALIGN(info->brk);
-abi_ulong end_brk = TARGET_PAGE_ALIGN(info->brk + info->reserve_brk);
-target_munmap(start_brk, end_brk - start_brk);
-}
-
 return 0;
 }
 
-- 
2.34.1




[PULL 10/14] util/selfmap: Rewrite using qemu/interval-tree.h

2023-08-08 Thread Richard Henderson
We will want to be able to search the set of mappings.
For this patch, the two users iterate the tree in order.

Signed-off-by: Richard Henderson 
---
 include/qemu/selfmap.h |  22 
 linux-user/elfload.c   |  14 +++--
 linux-user/syscall.c   |  15 +++---
 util/selfmap.c | 114 +
 4 files changed, 97 insertions(+), 68 deletions(-)

diff --git a/include/qemu/selfmap.h b/include/qemu/selfmap.h
index 3479a2a618..7d938945cb 100644
--- a/include/qemu/selfmap.h
+++ b/include/qemu/selfmap.h
@@ -9,9 +9,10 @@
 #ifndef SELFMAP_H
 #define SELFMAP_H
 
+#include "qemu/interval-tree.h"
+
 typedef struct {
-unsigned long start;
-unsigned long end;
+IntervalTreeNode itree;
 
 /* flags */
 bool is_read;
@@ -19,26 +20,25 @@ typedef struct {
 bool is_exec;
 bool is_priv;
 
-unsigned long offset;
-gchar *dev;
+uint64_t offset;
 uint64_t inode;
-gchar *path;
+const char *path;
+char dev[];
 } MapInfo;
 
-
 /**
  * read_self_maps:
  *
- * Read /proc/self/maps and return a list of MapInfo structures.
+ * Read /proc/self/maps and return a tree of MapInfo structures.
  */
-GSList *read_self_maps(void);
+IntervalTreeRoot *read_self_maps(void);
 
 /**
  * free_self_maps:
- * @info: a GSlist
+ * @info: an interval tree
  *
- * Free a list of MapInfo structures.
+ * Free a tree of MapInfo structures.
  */
-void free_self_maps(GSList *info);
+void free_self_maps(IntervalTreeRoot *root);
 
 #endif /* SELFMAP_H */
diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index e72497c4b4..fb137345f6 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -2620,7 +2620,8 @@ static uintptr_t pgd_find_hole_fallback(uintptr_t 
guest_size, uintptr_t brk,
 static uintptr_t pgb_find_hole(uintptr_t guest_loaddr, uintptr_t guest_size,
long align, uintptr_t offset)
 {
-GSList *maps, *iter;
+IntervalTreeRoot *maps;
+IntervalTreeNode *iter;
 uintptr_t this_start, this_end, next_start, brk;
 intptr_t ret = -1;
 
@@ -2638,12 +2639,15 @@ static uintptr_t pgb_find_hole(uintptr_t guest_loaddr, 
uintptr_t guest_size,
 /* The first hole is before the first map entry. */
 this_start = mmap_min_addr;
 
-for (iter = maps; iter;
- this_start = next_start, iter = g_slist_next(iter)) {
+for (iter = interval_tree_iter_first(maps, 0, -1);
+ iter;
+ this_start = next_start,
+ iter = interval_tree_iter_next(iter, 0, -1)) {
+MapInfo *info = container_of(iter, MapInfo, itree);
 uintptr_t align_start, hole_size;
 
-this_end = ((MapInfo *)iter->data)->start;
-next_start = ((MapInfo *)iter->data)->end;
+this_end = info->itree.start;
+next_start = info->itree.last + 1;
 align_start = ROUND_UP(this_start + offset, align);
 
 /* Skip holes that are too small. */
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 7c2c2f6e2f..a15bce2be2 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -8070,16 +8070,17 @@ static int open_self_maps_1(CPUArchState *cpu_env, int 
fd, bool smaps)
 {
 CPUState *cpu = env_cpu(cpu_env);
 TaskState *ts = cpu->opaque;
-GSList *map_info = read_self_maps();
-GSList *s;
+IntervalTreeRoot *map_info = read_self_maps();
+IntervalTreeNode *s;
 int count;
 
-for (s = map_info; s; s = g_slist_next(s)) {
-MapInfo *e = (MapInfo *) s->data;
+for (s = interval_tree_iter_first(map_info, 0, -1); s;
+ s = interval_tree_iter_next(s, 0, -1)) {
+MapInfo *e = container_of(s, MapInfo, itree);
 
-if (h2g_valid(e->start)) {
-unsigned long min = e->start;
-unsigned long max = e->end;
+if (h2g_valid(e->itree.start)) {
+unsigned long min = e->itree.start;
+unsigned long max = e->itree.last + 1;
 int flags = page_get_flags(h2g(min));
 const char *path;
 
diff --git a/util/selfmap.c b/util/selfmap.c
index 2c14f019ce..4db5b42651 100644
--- a/util/selfmap.c
+++ b/util/selfmap.c
@@ -10,74 +10,98 @@
 #include "qemu/cutils.h"
 #include "qemu/selfmap.h"
 
-GSList *read_self_maps(void)
+IntervalTreeRoot *read_self_maps(void)
 {
-gchar *maps;
-GSList *map_info = NULL;
+IntervalTreeRoot *root;
+gchar *maps, **lines;
+guint i, nlines;
 
-if (g_file_get_contents("/proc/self/maps", , NULL, NULL)) {
-gchar **lines = g_strsplit(maps, "\n", 0);
-int i, entries = g_strv_length(lines);
+if (!g_file_get_contents("/proc/self/maps", , NULL, NULL)) {
+return NULL;
+}
 
-for (i = 0; i < entries; i++) {
-gchar **fields = g_strsplit(lines[i], " ", 6);
-if (g_strv_length(fields) > 4) {
-MapInfo *e = g_new0(MapInfo, 1);
-int errors = 0;
-const char *end;
+root = g_new0(IntervalTreeRoot, 1);
+lines = 

[PULL 12/14] linux-user: Consolidate guest bounds check in probe_guest_base

2023-08-08 Thread Richard Henderson
The three sets of checks are identical, logically.

Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 linux-user/elfload.c | 50 +++-
 1 file changed, 17 insertions(+), 33 deletions(-)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index c14139a5fc..06d81f83b1 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -2525,25 +2525,6 @@ static void pgb_have_guest_base(const char *image_name, 
abi_ulong guest_loaddr,
 exit(EXIT_FAILURE);
 }
 
-/* Sanity check the guest binary. */
-if (reserved_va) {
-if (guest_hiaddr > reserved_va) {
-error_report("%s: requires more than reserved virtual "
- "address space (0x%" PRIx64 " > 0x%lx)",
- image_name, (uint64_t)guest_hiaddr, reserved_va);
-exit(EXIT_FAILURE);
-}
-} else {
-#if HOST_LONG_BITS < TARGET_ABI_BITS
-if ((guest_hiaddr - guest_base) > ~(uintptr_t)0) {
-error_report("%s: requires more virtual address space "
- "than the host can provide (0x%" PRIx64 ")",
- image_name, (uint64_t)guest_hiaddr + 1 - guest_base);
-exit(EXIT_FAILURE);
-}
-#endif
-}
-
 /*
  * Expand the allocation to the entire reserved_va.
  * Exclude the mmap_min_addr hole.
@@ -2694,13 +2675,6 @@ static void pgb_static(const char *image_name, abi_ulong 
orig_loaddr,
 uintptr_t offset = 0;
 uintptr_t addr;
 
-if (hiaddr != orig_hiaddr) {
-error_report("%s: requires virtual address space that the "
- "host cannot provide (0x%" PRIx64 ")",
- image_name, (uint64_t)orig_hiaddr + 1);
-exit(EXIT_FAILURE);
-}
-
 loaddr &= -align;
 if (HI_COMMPAGE) {
 /*
@@ -2766,13 +2740,6 @@ static void pgb_reserved_va(const char *image_name, 
abi_ulong guest_loaddr,
 int flags = MAP_ANONYMOUS | MAP_PRIVATE | MAP_NORESERVE;
 void *addr, *test;
 
-if (guest_hiaddr > reserved_va) {
-error_report("%s: requires more than reserved virtual "
- "address space (0x%" PRIx64 " > 0x%lx)",
- image_name, (uint64_t)guest_hiaddr, reserved_va);
-exit(EXIT_FAILURE);
-}
-
 /* Widen the "image" to the entire reserved address space. */
 pgb_static(image_name, 0, reserved_va, align);
 
@@ -2799,6 +2766,23 @@ void probe_guest_base(const char *image_name, abi_ulong 
guest_loaddr,
 /* In order to use host shmat, we must be able to honor SHMLBA.  */
 uintptr_t align = MAX(SHMLBA, qemu_host_page_size);
 
+/* Sanity check the guest binary. */
+if (reserved_va) {
+if (guest_hiaddr > reserved_va) {
+error_report("%s: requires more than reserved virtual "
+ "address space (0x%" PRIx64 " > 0x%lx)",
+ image_name, (uint64_t)guest_hiaddr, reserved_va);
+exit(EXIT_FAILURE);
+}
+} else {
+if (guest_hiaddr != (uintptr_t)guest_hiaddr) {
+error_report("%s: requires more virtual address space "
+ "than the host can provide (0x%" PRIx64 ")",
+ image_name, (uint64_t)guest_hiaddr + 1);
+exit(EXIT_FAILURE);
+}
+}
+
 if (have_guest_base) {
 pgb_have_guest_base(image_name, guest_loaddr, guest_hiaddr, align);
 } else if (reserved_va) {
-- 
2.34.1




[PULL 11/14] linux-user: Remove duplicate CPU_LOG_PAGE from probe_guest_base

2023-08-08 Thread Richard Henderson
The proper logging for probe_guest_base is in the main function.
There is no need to duplicate that in the subroutines.

Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 linux-user/elfload.c | 19 ---
 1 file changed, 19 deletions(-)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index fb137345f6..c14139a5fc 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -2560,9 +2560,6 @@ static void pgb_have_guest_base(const char *image_name, 
abi_ulong guest_loaddr,
 if (test != addr) {
 pgb_fail_in_use(image_name);
 }
-qemu_log_mask(CPU_LOG_PAGE,
-  "%s: base @ %p for %" PRIu64 " bytes\n",
-  __func__, addr, (uint64_t)guest_hiaddr - guest_loaddr + 1);
 }
 
 /**
@@ -2605,9 +2602,6 @@ static uintptr_t pgd_find_hole_fallback(uintptr_t 
guest_size, uintptr_t brk,
 if (mmap_start != MAP_FAILED) {
 munmap(mmap_start, guest_size);
 if (mmap_start == (void *) align_start) {
-qemu_log_mask(CPU_LOG_PAGE,
-  "%s: base @ %p for %" PRIdPTR" bytes\n",
-  __func__, mmap_start + offset, guest_size);
 return (uintptr_t) mmap_start + offset;
 }
 }
@@ -2689,13 +2683,6 @@ static uintptr_t pgb_find_hole(uintptr_t guest_loaddr, 
uintptr_t guest_size,
 }
 }
 free_self_maps(maps);
-
-if (ret != -1) {
-qemu_log_mask(CPU_LOG_PAGE, "%s: base @ %" PRIxPTR
-  " for %" PRIuPTR " bytes\n",
-  __func__, ret, guest_size);
-}
-
 return ret;
 }
 
@@ -2747,9 +2734,6 @@ static void pgb_static(const char *image_name, abi_ulong 
orig_loaddr,
 }
 
 guest_base = addr;
-
-qemu_log_mask(CPU_LOG_PAGE, "%s: base @ %"PRIxPTR" for %" PRIuPTR" 
bytes\n",
-  __func__, addr, hiaddr - loaddr);
 }
 
 static void pgb_dynamic(const char *image_name, long align)
@@ -2807,9 +2791,6 @@ static void pgb_reserved_va(const char *image_name, 
abi_ulong guest_loaddr,
  reserved_va + 1, test, strerror(errno));
 exit(EXIT_FAILURE);
 }
-
-qemu_log_mask(CPU_LOG_PAGE, "%s: base @ %p for %lu bytes\n",
-  __func__, addr, reserved_va + 1);
 }
 
 void probe_guest_base(const char *image_name, abi_ulong guest_loaddr,
-- 
2.34.1




[PULL 02/14] linux-user: Define TASK_UNMAPPED_BASE in $guest/target_mman.h

2023-08-08 Thread Richard Henderson
Provide default values that are as close as possible to the
values used by the guest's kernel.

Tested-by: Helge Deller 
Reviewed-by: Helge Deller 
Reviewed-by: Akihiko Odaki 
Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 linux-user/aarch64/target_mman.h | 10 ++
 linux-user/alpha/target_mman.h   |  8 
 linux-user/arm/target_mman.h |  8 
 linux-user/cris/target_mman.h|  9 +
 linux-user/hexagon/target_mman.h | 10 ++
 linux-user/hppa/target_mman.h|  3 +++
 linux-user/i386/target_mman.h| 13 +
 linux-user/loongarch64/target_mman.h |  8 
 linux-user/m68k/target_mman.h|  3 +++
 linux-user/microblaze/target_mman.h  |  8 
 linux-user/mips/target_mman.h|  7 +++
 linux-user/nios2/target_mman.h   |  7 +++
 linux-user/openrisc/target_mman.h|  7 +++
 linux-user/ppc/target_mman.h | 13 +
 linux-user/riscv/target_mman.h   |  7 +++
 linux-user/s390x/target_mman.h   | 10 ++
 linux-user/sh4/target_mman.h |  4 
 linux-user/sparc/target_mman.h   | 14 ++
 linux-user/user-mmap.h   | 14 --
 linux-user/x86_64/target_mman.h  | 12 
 linux-user/xtensa/target_mman.h  |  6 ++
 21 files changed, 167 insertions(+), 14 deletions(-)

diff --git a/linux-user/aarch64/target_mman.h b/linux-user/aarch64/target_mman.h
index f721295fe1..4d3eecfb26 100644
--- a/linux-user/aarch64/target_mman.h
+++ b/linux-user/aarch64/target_mman.h
@@ -4,6 +4,16 @@
 #define TARGET_PROT_BTI 0x10
 #define TARGET_PROT_MTE 0x20
 
+/*
+ * arch/arm64/include/asm/processor.h:
+ *
+ * TASK_UNMAPPED_BASE DEFAULT_MAP_WINDOW / 4
+ * DEFAULT_MAP_WINDOW DEFAULT_MAP_WINDOW_64
+ * DEFAULT_MAP_WINDOW_64  UL(1) << VA_BITS_MIN
+ * VA_BITS_MIN48 (unless explicitly configured smaller)
+ */
+#define TASK_UNMAPPED_BASE  (1ull << (48 - 2))
+
 #include "../generic/target_mman.h"
 
 #endif
diff --git a/linux-user/alpha/target_mman.h b/linux-user/alpha/target_mman.h
index 6bb03e7336..c90b493711 100644
--- a/linux-user/alpha/target_mman.h
+++ b/linux-user/alpha/target_mman.h
@@ -20,6 +20,14 @@
 #define TARGET_MS_SYNC 2
 #define TARGET_MS_INVALIDATE 4
 
+/*
+ * arch/alpha/include/asm/processor.h:
+ *
+ * TASK_UNMAPPED_BASE   TASK_SIZE / 2
+ * TASK_SIZE0x400UL
+ */
+#define TASK_UNMAPPED_BASE  0x200ull
+
 #include "../generic/target_mman.h"
 
 #endif
diff --git a/linux-user/arm/target_mman.h b/linux-user/arm/target_mman.h
index e7ba6070fe..76275b2c7e 100644
--- a/linux-user/arm/target_mman.h
+++ b/linux-user/arm/target_mman.h
@@ -1 +1,9 @@
+/*
+ * arch/arm/include/asm/memory.h
+ * TASK_UNMAPPED_BASEALIGN(TASK_SIZE / 3, SZ_16M)
+ * TASK_SIZE CONFIG_PAGE_OFFSET
+ * CONFIG_PAGE_OFFSET0xC000 (default in Kconfig)
+ */
+#define TASK_UNMAPPED_BASE   0x4000
+
 #include "../generic/target_mman.h"
diff --git a/linux-user/cris/target_mman.h b/linux-user/cris/target_mman.h
index e7ba6070fe..9df7b1eda5 100644
--- a/linux-user/cris/target_mman.h
+++ b/linux-user/cris/target_mman.h
@@ -1 +1,10 @@
+/*
+ * arch/cris/include/asm/processor.h:
+ * TASK_UNMAPPED_BASE  (PAGE_ALIGN(TASK_SIZE / 3))
+ *
+ * arch/cris/include/arch-v32/arch/processor.h
+ * TASK_SIZE   0xb000
+ */
+#define TASK_UNMAPPED_BASE TARGET_PAGE_ALIGN(0xb000 / 3)
+
 #include "../generic/target_mman.h"
diff --git a/linux-user/hexagon/target_mman.h b/linux-user/hexagon/target_mman.h
index e7ba6070fe..c5ae336e07 100644
--- a/linux-user/hexagon/target_mman.h
+++ b/linux-user/hexagon/target_mman.h
@@ -1 +1,11 @@
+/*
+ * arch/hexgon/include/asm/processor.h
+ * TASK_UNMAPPED_BASEPAGE_ALIGN(TASK_SIZE / 3)
+ *
+ * arch/hexagon/include/asm/mem-layout.h
+ * TASK_SIZE PAGE_OFFSET
+ * PAGE_OFFSET   0xc000
+ */
+#define TASK_UNMAPPED_BASE   0x4000
+
 #include "../generic/target_mman.h"
diff --git a/linux-user/hppa/target_mman.h b/linux-user/hppa/target_mman.h
index 97f87d042a..6459e7dbdd 100644
--- a/linux-user/hppa/target_mman.h
+++ b/linux-user/hppa/target_mman.h
@@ -24,6 +24,9 @@
 #define TARGET_MS_ASYNC 2
 #define TARGET_MS_INVALIDATE 4
 
+/* arch/parisc/include/asm/processor.h: DEFAULT_MAP_BASE32 */
+#define TASK_UNMAPPED_BASE  0x4000
+
 #include "../generic/target_mman.h"
 
 #endif
diff --git a/linux-user/i386/target_mman.h b/linux-user/i386/target_mman.h
index e7ba6070fe..cc3382007f 100644
--- a/linux-user/i386/target_mman.h
+++ b/linux-user/i386/target_mman.h
@@ -1 +1,14 @@
+/*
+ * arch/x86/include/asm/processor.h:
+ * TASK_UNMAPPED_BASE __TASK_UNMAPPED_BASE(TASK_SIZE_LOW)
+ * __TASK_UNMAPPED_BASE(S)PAGE_ALIGN(S / 3)
+ *
+ * arch/x86/include/asm/page_32_types.h:
+ * TASK_SIZE_LOW  TASK_SIZE
+ * TASK_SIZE  __PAGE_OFFSET
+ * 

[PULL 13/14] linux-user: Rewrite fixed probe_guest_base

2023-08-08 Thread Richard Henderson
Create a set of subroutines to collect a set of guest addresses,
all of which must be mappable on the host.  Use this within the
renamed pgb_fixed subroutine to validate the user's choice of
guest_base specified by the -B command-line option.

Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 linux-user/elfload.c | 188 ---
 1 file changed, 161 insertions(+), 27 deletions(-)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 06d81f83b1..a5f9dd5b31 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -2504,6 +2504,157 @@ static abi_ulong create_elf_tables(abi_ulong p, int 
argc, int envc,
 #endif
 #endif
 
+/**
+ * pgb_try_mmap:
+ * @addr: host start address
+ * @addr_last: host last address
+ * @keep: do not unmap the probe region
+ *
+ * Return 1 if [@addr, @addr_last] is not mapped in the host,
+ * return 0 if it is not available to map, and -1 on mmap error.
+ * If @keep, the region is left mapped on success, otherwise unmapped.
+ */
+static int pgb_try_mmap(uintptr_t addr, uintptr_t addr_last, bool keep)
+{
+size_t size = addr_last - addr + 1;
+void *p = mmap((void *)addr, size, PROT_NONE,
+   MAP_ANONYMOUS | MAP_PRIVATE |
+   MAP_NORESERVE | MAP_FIXED_NOREPLACE, -1, 0);
+int ret;
+
+if (p == MAP_FAILED) {
+return errno == EEXIST ? 0 : -1;
+}
+ret = p == (void *)addr;
+if (!keep || !ret) {
+munmap(p, size);
+}
+return ret;
+}
+
+/**
+ * pgb_try_mmap_skip_brk(uintptr_t addr, uintptr_t size, uintptr_t brk)
+ * @addr: host address
+ * @addr_last: host last address
+ * @brk: host brk
+ *
+ * Like pgb_try_mmap, but additionally reserve some memory following brk.
+ */
+static int pgb_try_mmap_skip_brk(uintptr_t addr, uintptr_t addr_last,
+ uintptr_t brk, bool keep)
+{
+uintptr_t brk_last = brk + 16 * MiB - 1;
+
+/* Do not map anything close to the host brk. */
+if (addr <= brk_last && brk <= addr_last) {
+return 0;
+}
+return pgb_try_mmap(addr, addr_last, keep);
+}
+
+/**
+ * pgb_try_mmap_set:
+ * @ga: set of guest addrs
+ * @base: guest_base
+ * @brk: host brk
+ *
+ * Return true if all @ga can be mapped by the host at @base.
+ * On success, retain the mapping at index 0 for reserved_va.
+ */
+
+typedef struct PGBAddrs {
+uintptr_t bounds[3][2]; /* start/last pairs */
+int nbounds;
+} PGBAddrs;
+
+static bool pgb_try_mmap_set(const PGBAddrs *ga, uintptr_t base, uintptr_t brk)
+{
+for (int i = ga->nbounds - 1; i >= 0; --i) {
+if (pgb_try_mmap_skip_brk(ga->bounds[i][0] + base,
+  ga->bounds[i][1] + base,
+  brk, i == 0 && reserved_va) <= 0) {
+return false;
+}
+}
+return true;
+}
+
+/**
+ * pgb_addr_set:
+ * @ga: output set of guest addrs
+ * @guest_loaddr: guest image low address
+ * @guest_loaddr: guest image high address
+ * @identity: create for identity mapping
+ *
+ * Fill in @ga with the image, COMMPAGE and NULL page.
+ */
+static bool pgb_addr_set(PGBAddrs *ga, abi_ulong guest_loaddr,
+ abi_ulong guest_hiaddr, bool try_identity)
+{
+int n;
+
+/*
+ * With a low commpage, or a guest mapped very low,
+ * we may not be able to use the identity map.
+ */
+if (try_identity) {
+if (LO_COMMPAGE != -1 && LO_COMMPAGE < mmap_min_addr) {
+return false;
+}
+if (guest_loaddr != 0 && guest_loaddr < mmap_min_addr) {
+return false;
+}
+}
+
+memset(ga, 0, sizeof(*ga));
+n = 0;
+
+if (reserved_va) {
+ga->bounds[n][0] = try_identity ? mmap_min_addr : 0;
+ga->bounds[n][1] = reserved_va;
+n++;
+/* LO_COMMPAGE and NULL handled by reserving from 0. */
+} else {
+/* Add any LO_COMMPAGE or NULL page. */
+if (LO_COMMPAGE != -1) {
+ga->bounds[n][0] = 0;
+ga->bounds[n][1] = LO_COMMPAGE + TARGET_PAGE_SIZE - 1;
+n++;
+} else if (!try_identity) {
+ga->bounds[n][0] = 0;
+ga->bounds[n][1] = TARGET_PAGE_SIZE - 1;
+n++;
+}
+
+/* Add the guest image for ET_EXEC. */
+if (guest_loaddr) {
+ga->bounds[n][0] = guest_loaddr;
+ga->bounds[n][1] = guest_hiaddr;
+n++;
+}
+}
+
+/*
+ * Temporarily disable
+ *   "comparison is always false due to limited range of data type"
+ * due to comparison between unsigned and (possible) 0.
+ */
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wtype-limits"
+
+/* Add any HI_COMMPAGE not covered by reserved_va. */
+if (reserved_va < HI_COMMPAGE) {
+ga->bounds[n][0] = HI_COMMPAGE & qemu_host_page_mask;
+ga->bounds[n][1] = HI_COMMPAGE + TARGET_PAGE_SIZE - 1;
+n++;
+}
+
+#pragma GCC diagnostic pop
+

[PULL 00/14] linux-user image mapping fixes

2023-08-08 Thread Richard Henderson
The following changes since commit 0450cf08976f9036feaded438031b4cba94f6452:

  Merge tag 'fixes-pull-request' of https://gitlab.com/marcandre.lureau/qemu 
into staging (2023-08-07 13:55:00 -0700)

are available in the Git repository at:

  https://gitlab.com/rth7680/qemu.git tags/pull-lu-20230808

for you to fetch changes up to dd55885516f42f718d0d121c59a5f7be5fdae3e6:

  linux-user: Rewrite non-fixed probe_guest_base (2023-08-08 13:41:55 -0700)


linux-user: Adjust guest image layout vs reserved_va
linux-user: Do not adjust image mapping for host page size
linux-user: Adjust initial brk when interpreter is close to executable
util/selfmap: Rewrite using qemu/interval-tree.h
linux-user: Rewrite probe_guest_base


Helge Deller (1):
  linux-user: Adjust initial brk when interpreter is close to executable

Richard Henderson (13):
  linux-user: Adjust task_unmapped_base for reserved_va
  linux-user: Define TASK_UNMAPPED_BASE in $guest/target_mman.h
  linux-user: Define ELF_ET_DYN_BASE in $guest/target_mman.h
  linux-user: Use MAP_FIXED_NOREPLACE for initial image mmap
  linux-user: Use elf_et_dyn_base for ET_DYN with interpreter
  linux-user: Do not adjust image mapping for host page size
  linux-user: Do not adjust zero_bss for host page size
  linux-user: Use zero_bss for PT_LOAD with no file contents too
  util/selfmap: Rewrite using qemu/interval-tree.h
  linux-user: Remove duplicate CPU_LOG_PAGE from probe_guest_base
  linux-user: Consolidate guest bounds check in probe_guest_base
  linux-user: Rewrite fixed probe_guest_base
  linux-user: Rewrite non-fixed probe_guest_base

 include/qemu/selfmap.h   |  22 +-
 linux-user/aarch64/target_mman.h |  13 +
 linux-user/alpha/target_mman.h   |  11 +
 linux-user/arm/target_mman.h |  11 +
 linux-user/cris/target_mman.h|  12 +
 linux-user/hexagon/target_mman.h |  13 +
 linux-user/hppa/target_mman.h|   6 +
 linux-user/i386/target_mman.h|  16 +
 linux-user/loongarch64/target_mman.h |  11 +
 linux-user/m68k/target_mman.h|   5 +
 linux-user/microblaze/target_mman.h  |  11 +
 linux-user/mips/target_mman.h|  10 +
 linux-user/nios2/target_mman.h   |  10 +
 linux-user/openrisc/target_mman.h|  10 +
 linux-user/ppc/target_mman.h |  20 +
 linux-user/qemu.h|   1 -
 linux-user/riscv/target_mman.h   |  10 +
 linux-user/s390x/target_mman.h   |  20 +
 linux-user/sh4/target_mman.h |   7 +
 linux-user/sparc/target_mman.h   |  25 ++
 linux-user/user-mmap.h   |  29 +-
 linux-user/x86_64/target_mman.h  |  15 +
 linux-user/xtensa/target_mman.h  |  10 +
 linux-user/elfload.c | 792 +--
 linux-user/main.c|  43 ++
 linux-user/mmap.c|  19 +-
 linux-user/syscall.c |  15 +-
 util/selfmap.c   | 114 +++--
 28 files changed, 803 insertions(+), 478 deletions(-)



[PULL 03/14] linux-user: Define ELF_ET_DYN_BASE in $guest/target_mman.h

2023-08-08 Thread Richard Henderson
Copy each guest kernel's default value, then bound it
against reserved_va or the host address space.

Tested-by: Helge Deller 
Reviewed-by: Helge Deller 
Signed-off-by: Richard Henderson 
---
 linux-user/aarch64/target_mman.h |  3 +++
 linux-user/alpha/target_mman.h   |  3 +++
 linux-user/arm/target_mman.h |  3 +++
 linux-user/cris/target_mman.h|  3 +++
 linux-user/hexagon/target_mman.h |  3 +++
 linux-user/hppa/target_mman.h|  3 +++
 linux-user/i386/target_mman.h|  3 +++
 linux-user/loongarch64/target_mman.h |  3 +++
 linux-user/m68k/target_mman.h|  2 ++
 linux-user/microblaze/target_mman.h  |  3 +++
 linux-user/mips/target_mman.h|  3 +++
 linux-user/nios2/target_mman.h   |  3 +++
 linux-user/openrisc/target_mman.h|  3 +++
 linux-user/ppc/target_mman.h |  7 +++
 linux-user/riscv/target_mman.h   |  3 +++
 linux-user/s390x/target_mman.h   | 10 ++
 linux-user/sh4/target_mman.h |  3 +++
 linux-user/sparc/target_mman.h   | 11 +++
 linux-user/user-mmap.h   | 13 +++--
 linux-user/x86_64/target_mman.h  |  3 +++
 linux-user/xtensa/target_mman.h  |  4 
 linux-user/main.c| 15 +++
 linux-user/mmap.c|  1 +
 23 files changed, 106 insertions(+), 2 deletions(-)

diff --git a/linux-user/aarch64/target_mman.h b/linux-user/aarch64/target_mman.h
index 4d3eecfb26..69ec5d5739 100644
--- a/linux-user/aarch64/target_mman.h
+++ b/linux-user/aarch64/target_mman.h
@@ -14,6 +14,9 @@
  */
 #define TASK_UNMAPPED_BASE  (1ull << (48 - 2))
 
+/* arch/arm64/include/asm/elf.h */
+#define ELF_ET_DYN_BASE TARGET_PAGE_ALIGN((1ull << 48) / 3 * 2)
+
 #include "../generic/target_mman.h"
 
 #endif
diff --git a/linux-user/alpha/target_mman.h b/linux-user/alpha/target_mman.h
index c90b493711..8edfe2b88c 100644
--- a/linux-user/alpha/target_mman.h
+++ b/linux-user/alpha/target_mman.h
@@ -28,6 +28,9 @@
  */
 #define TASK_UNMAPPED_BASE  0x200ull
 
+/* arch/alpha/include/asm/elf.h */
+#define ELF_ET_DYN_BASE (TASK_UNMAPPED_BASE + 0x100)
+
 #include "../generic/target_mman.h"
 
 #endif
diff --git a/linux-user/arm/target_mman.h b/linux-user/arm/target_mman.h
index 76275b2c7e..51005da869 100644
--- a/linux-user/arm/target_mman.h
+++ b/linux-user/arm/target_mman.h
@@ -6,4 +6,7 @@
  */
 #define TASK_UNMAPPED_BASE   0x4000
 
+/* arch/arm/include/asm/elf.h */
+#define ELF_ET_DYN_BASE  0x0040
+
 #include "../generic/target_mman.h"
diff --git a/linux-user/cris/target_mman.h b/linux-user/cris/target_mman.h
index 9df7b1eda5..9ace8ac292 100644
--- a/linux-user/cris/target_mman.h
+++ b/linux-user/cris/target_mman.h
@@ -7,4 +7,7 @@
  */
 #define TASK_UNMAPPED_BASE TARGET_PAGE_ALIGN(0xb000 / 3)
 
+/* arch/cris/include/uapi/asm/elf.h */
+#define ELF_ET_DYN_BASE(TASK_UNMAPPED_BASE * 2)
+
 #include "../generic/target_mman.h"
diff --git a/linux-user/hexagon/target_mman.h b/linux-user/hexagon/target_mman.h
index c5ae336e07..e6b5e2ca36 100644
--- a/linux-user/hexagon/target_mman.h
+++ b/linux-user/hexagon/target_mman.h
@@ -8,4 +8,7 @@
  */
 #define TASK_UNMAPPED_BASE   0x4000
 
+/* arch/hexagon/include/asm/elf.h */
+#define ELF_ET_DYN_BASE  0x0800
+
 #include "../generic/target_mman.h"
diff --git a/linux-user/hppa/target_mman.h b/linux-user/hppa/target_mman.h
index 6459e7dbdd..ccda46e842 100644
--- a/linux-user/hppa/target_mman.h
+++ b/linux-user/hppa/target_mman.h
@@ -27,6 +27,9 @@
 /* arch/parisc/include/asm/processor.h: DEFAULT_MAP_BASE32 */
 #define TASK_UNMAPPED_BASE  0x4000
 
+/* arch/parisc/include/asm/elf.h */
+#define ELF_ET_DYN_BASE (TASK_UNMAPPED_BASE + 0x0100)
+
 #include "../generic/target_mman.h"
 
 #endif
diff --git a/linux-user/i386/target_mman.h b/linux-user/i386/target_mman.h
index cc3382007f..e3b8e1eaa6 100644
--- a/linux-user/i386/target_mman.h
+++ b/linux-user/i386/target_mman.h
@@ -11,4 +11,7 @@
  */
 #define TASK_UNMAPPED_BASE0x4000
 
+/* arch/x86/include/asm/elf.h */
+#define ELF_ET_DYN_BASE   0x0040
+
 #include "../generic/target_mman.h"
diff --git a/linux-user/loongarch64/target_mman.h 
b/linux-user/loongarch64/target_mman.h
index d70e44d44c..8c2a3d5596 100644
--- a/linux-user/loongarch64/target_mman.h
+++ b/linux-user/loongarch64/target_mman.h
@@ -6,4 +6,7 @@
 #define TASK_UNMAPPED_BASE \
 TARGET_PAGE_ALIGN((1ull << TARGET_VIRT_ADDR_SPACE_BITS) / 3)
 
+/* arch/loongarch/include/asm/elf.h */
+#define ELF_ET_DYN_BASE   (TASK_UNMAPPED_BASE * 2)
+
 #include "../generic/target_mman.h"
diff --git a/linux-user/m68k/target_mman.h b/linux-user/m68k/target_mman.h
index d3eceb663b..20cfe750c5 100644
--- a/linux-user/m68k/target_mman.h
+++ b/linux-user/m68k/target_mman.h
@@ -1,4 +1,6 @@
 /* arch/m68k/include/asm/processor.h */
 #define TASK_UNMAPPED_BASE  0xC000
+/* arch/m68k/include/asm/elf.h */
+#define ELF_ET_DYN_BASE 0xD000
 
 

[PULL 01/14] linux-user: Adjust task_unmapped_base for reserved_va

2023-08-08 Thread Richard Henderson
Ensure that the chosen values for mmap_next_start and
task_unmapped_base are within the guest address space.

Tested-by: Helge Deller 
Reviewed-by: Akihiko Odaki 
Signed-off-by: Richard Henderson 
---
 linux-user/user-mmap.h | 34 +-
 linux-user/main.c  | 28 
 linux-user/mmap.c  | 18 +++---
 3 files changed, 64 insertions(+), 16 deletions(-)

diff --git a/linux-user/user-mmap.h b/linux-user/user-mmap.h
index 7265c2c116..2c9d99ed6c 100644
--- a/linux-user/user-mmap.h
+++ b/linux-user/user-mmap.h
@@ -18,6 +18,39 @@
 #ifndef LINUX_USER_USER_MMAP_H
 #define LINUX_USER_USER_MMAP_H
 
+#if HOST_LONG_BITS == 64 && TARGET_ABI_BITS == 64
+#ifdef TARGET_AARCH64
+# define TASK_UNMAPPED_BASE  0x55
+#else
+# define TASK_UNMAPPED_BASE  (1ul << 38)
+#endif
+#else
+#ifdef TARGET_HPPA
+# define TASK_UNMAPPED_BASE  0xfa00
+#else
+# define TASK_UNMAPPED_BASE  0x4000
+#endif
+#endif
+
+/*
+ * Guest parameters for the ADDR_COMPAT_LAYOUT personality
+ * (at present this is the only layout supported by QEMU).
+ *
+ * TASK_UNMAPPED_BASE: For mmap without hint (addr != 0), the search
+ * for unused virtual memory begins at TASK_UNMAPPED_BASE.
+ *
+ * task_unmapped_base: When the guest address space is limited via -R,
+ * the value of TASK_UNMAPPED_BASE is adjusted to fit.
+ */
+extern abi_ulong task_unmapped_base;
+
+/*
+ * mmap_next_start: The base address for the next mmap without hint,
+ * increased after each successful map, starting at task_unmapped_base.
+ * This is an optimization within QEMU and not part of ADDR_COMPAT_LAYOUT.
+ */
+extern abi_ulong mmap_next_start;
+
 int target_mprotect(abi_ulong start, abi_ulong len, int prot);
 abi_long target_mmap(abi_ulong start, abi_ulong len, int prot,
  int flags, int fd, off_t offset);
@@ -26,7 +59,6 @@ abi_long target_mremap(abi_ulong old_addr, abi_ulong old_size,
abi_ulong new_size, unsigned long flags,
abi_ulong new_addr);
 abi_long target_madvise(abi_ulong start, abi_ulong len_in, int advice);
-extern abi_ulong mmap_next_start;
 abi_ulong mmap_find_vma(abi_ulong, abi_ulong, abi_ulong);
 void mmap_fork_start(void);
 void mmap_fork_end(int child);
diff --git a/linux-user/main.c b/linux-user/main.c
index 556956c363..be621dc792 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -821,6 +821,34 @@ int main(int argc, char **argv, char **envp)
 reserved_va = max_reserved_va;
 }
 
+/*
+ * Temporarily disable
+ *   "comparison is always false due to limited range of data type"
+ * due to comparison between (possible) uint64_t and uintptr_t.
+ */
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wtype-limits"
+
+/*
+ * Select an initial value for task_unmapped_base that is in range.
+ */
+if (reserved_va) {
+if (TASK_UNMAPPED_BASE < reserved_va) {
+task_unmapped_base = TASK_UNMAPPED_BASE;
+} else {
+/* The most common default formula is TASK_SIZE / 3. */
+task_unmapped_base = TARGET_PAGE_ALIGN(reserved_va / 3);
+}
+} else if (TASK_UNMAPPED_BASE < UINTPTR_MAX) {
+task_unmapped_base = TASK_UNMAPPED_BASE;
+} else {
+/* 32-bit host: pick something medium size. */
+task_unmapped_base = 0x1000;
+}
+mmap_next_start = task_unmapped_base;
+
+#pragma GCC diagnostic pop
+
 {
 Error *err = NULL;
 if (seed_optarg != NULL) {
diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index eb04fab8ab..84436d45c8 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -299,20 +299,8 @@ static bool mmap_frag(abi_ulong real_start, abi_ulong 
start, abi_ulong last,
 return true;
 }
 
-#if HOST_LONG_BITS == 64 && TARGET_ABI_BITS == 64
-#ifdef TARGET_AARCH64
-# define TASK_UNMAPPED_BASE  0x55
-#else
-# define TASK_UNMAPPED_BASE  (1ul << 38)
-#endif
-#else
-#ifdef TARGET_HPPA
-# define TASK_UNMAPPED_BASE  0xfa00
-#else
-# define TASK_UNMAPPED_BASE  0x4000
-#endif
-#endif
-abi_ulong mmap_next_start = TASK_UNMAPPED_BASE;
+abi_ulong task_unmapped_base;
+abi_ulong mmap_next_start;
 
 /*
  * Subroutine of mmap_find_vma, used when we have pre-allocated
@@ -391,7 +379,7 @@ abi_ulong mmap_find_vma(abi_ulong start, abi_ulong size, 
abi_ulong align)
 
 if ((addr & (align - 1)) == 0) {
 /* Success.  */
-if (start == mmap_next_start && addr >= TASK_UNMAPPED_BASE) {
+if (start == mmap_next_start && addr >= task_unmapped_base) {
 mmap_next_start = addr + size;
 }
 return addr;
-- 
2.34.1




[PULL 07/14] linux-user: Do not adjust image mapping for host page size

2023-08-08 Thread Richard Henderson
Remove TARGET_ELF_EXEC_PAGESIZE, and 3 other TARGET_ELF_PAGE* macros
based off of that.  Rely on target_mmap to handle guest vs host page
size mismatch.

Tested-by: Helge Deller 
Reviewed-by: Helge Deller 
Reviewed-by: Akihiko Odaki 
Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 linux-user/elfload.c | 17 -
 1 file changed, 4 insertions(+), 13 deletions(-)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 3553a3eaef..964b21f997 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -1960,15 +1960,6 @@ struct exec
 #define ZMAGIC 0413
 #define QMAGIC 0314
 
-/* Necessary parameters */
-#define TARGET_ELF_EXEC_PAGESIZE \
-(((eppnt->p_align & ~qemu_host_page_mask) != 0) ? \
- TARGET_PAGE_SIZE : MAX(qemu_host_page_size, TARGET_PAGE_SIZE))
-#define TARGET_ELF_PAGELENGTH(_v) ROUND_UP((_v), TARGET_ELF_EXEC_PAGESIZE)
-#define TARGET_ELF_PAGESTART(_v) ((_v) & \
- ~(abi_ulong)(TARGET_ELF_EXEC_PAGESIZE-1))
-#define TARGET_ELF_PAGEOFFSET(_v) ((_v) & (TARGET_ELF_EXEC_PAGESIZE-1))
-
 #define DLINFO_ITEMS 16
 
 static inline void memcpy_fromfs(void * to, const void * from, unsigned long n)
@@ -3241,8 +3232,8 @@ static void load_elf_image(const char *image_name, int 
image_fd,
 }
 
 vaddr = load_bias + eppnt->p_vaddr;
-vaddr_po = TARGET_ELF_PAGEOFFSET(vaddr);
-vaddr_ps = TARGET_ELF_PAGESTART(vaddr);
+vaddr_po = vaddr & ~TARGET_PAGE_MASK;
+vaddr_ps = vaddr & TARGET_PAGE_MASK;
 
 vaddr_ef = vaddr + eppnt->p_filesz;
 vaddr_em = vaddr + eppnt->p_memsz;
@@ -3252,7 +3243,7 @@ static void load_elf_image(const char *image_name, int 
image_fd,
  * but no backing file segment.
  */
 if (eppnt->p_filesz != 0) {
-vaddr_len = TARGET_ELF_PAGELENGTH(eppnt->p_filesz + vaddr_po);
+vaddr_len = eppnt->p_filesz + vaddr_po;
 error = target_mmap(vaddr_ps, vaddr_len, elf_prot,
 MAP_PRIVATE | MAP_FIXED,
 image_fd, eppnt->p_offset - vaddr_po);
@@ -3268,7 +3259,7 @@ static void load_elf_image(const char *image_name, int 
image_fd,
 zero_bss(vaddr_ef, vaddr_em, elf_prot);
 }
 } else if (eppnt->p_memsz != 0) {
-vaddr_len = TARGET_ELF_PAGELENGTH(eppnt->p_memsz + vaddr_po);
+vaddr_len = eppnt->p_memsz + vaddr_po;
 error = target_mmap(vaddr_ps, vaddr_len, elf_prot,
 MAP_PRIVATE | MAP_FIXED | MAP_ANONYMOUS,
 -1, 0);
-- 
2.34.1




Re: [PATCH 2/2] riscv: zicond: make default

2023-08-08 Thread Daniel Henrique Barboza

(CCing Alistair and other reviewers)

On 8/8/23 15:17, Vineet Gupta wrote:

Again this helps with better testing and something qemu has been doing
with newer features anyways.

Signed-off-by: Vineet Gupta 
---


Even if we can reach a consensus about removing the experimental (x- prefix) 
status
from an extension that is Frozen instead of ratified, enabling stuff in the 
default
CPUs because it's easier to test is something we would like to avoid. The rv64
CPU has a random set of extensions enabled for the most different and 
undocumented
reasons, and users don't know what they'll get because we keep beefing up the
generic CPUs arbitrarily.

Starting on QEMU 8.2 we'll have a 'max' CPU type that will enable all 
non-experimental
and non-vendor extensions by default, making it easier for tooling to test new
features/extensions. All tooling should consider changing their scripts to use 
the
'max' CPU when it's available.

For now, I fear that gcc and friends will still need to enable 'zicond' in the 
command
line via 'zicond=true'.  Thanks,


Daniel



  target/riscv/cpu.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 022bd9d01223..e6e28414b223 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -438,6 +438,7 @@ static void rv64_thead_c906_cpu_init(Object *obj)
  cpu->cfg.ext_xtheadbs = true;
  cpu->cfg.ext_xtheadcmo = true;
  cpu->cfg.ext_xtheadcondmov = true;
+cpu->cfg.ext_zicond = false;
  cpu->cfg.ext_xtheadfmemidx = true;
  cpu->cfg.ext_xtheadmac = true;
  cpu->cfg.ext_xtheadmemidx = true;
@@ -483,6 +484,7 @@ static void rv64_veyron_v1_cpu_init(Object *obj)
  cpu->cfg.ext_zbc = true;
  cpu->cfg.ext_zbs = true;
  cpu->cfg.ext_XVentanaCondOps = true;
+cpu->cfg.ext_zicond = false;
  
  cpu->cfg.mvendorid = VEYRON_V1_MVENDORID;

  cpu->cfg.marchid = VEYRON_V1_MARCHID;
@@ -1816,7 +1818,7 @@ static Property riscv_cpu_extensions[] = {
  DEFINE_PROP_BOOL("zcf", RISCVCPU, cfg.ext_zcf, false),
  DEFINE_PROP_BOOL("zcmp", RISCVCPU, cfg.ext_zcmp, false),
  DEFINE_PROP_BOOL("zcmt", RISCVCPU, cfg.ext_zcmt, false),
-DEFINE_PROP_BOOL("zicond", RISCVCPU, cfg.ext_zicond, false),
+DEFINE_PROP_BOOL("zicond", RISCVCPU, cfg.ext_zicond, true),
  
  /* Vendor-specific custom extensions */

  DEFINE_PROP_BOOL("xtheadba", RISCVCPU, cfg.ext_xtheadba, false),




Re: [PATCH 03/33] Update the definitions of __put_user and __get_user macros

2023-08-08 Thread Richard Henderson

On 8/7/23 23:07, Karim Taha wrote:

From: Warner Losh 

Use __builtin_choose_expr to avoid type promotion from ?:
in __put_user_e and __get_user_e macros.
Copied from linux-user/qemu.h, originally by Blue Swirl.

Signed-off-by: Warner Losh 
Signed-off-by: Karim Taha 
---
  bsd-user/qemu.h   | 81 ---
  bsd-user/signal.c |  5 +--
  2 files changed, 35 insertions(+), 51 deletions(-)

diff --git a/bsd-user/qemu.h b/bsd-user/qemu.h
index dfdfa8dd67..c41ebfe937 100644
--- a/bsd-user/qemu.h
+++ b/bsd-user/qemu.h
@@ -307,50 +307,37 @@ static inline bool access_ok(int type, abi_ulong addr, 
abi_ulong size)
  #define PRAGMA_REENABLE_PACKED_WARNING
  #endif
  
-#define __put_user(x, hptr)\

-({\
-int size = sizeof(*hptr);\
-switch (size) {\
-case 1:\
-*(uint8_t *)(hptr) = (uint8_t)(typeof(*hptr))(x);\
-break;\
-case 2:\
-*(uint16_t *)(hptr) = tswap16((typeof(*hptr))(x));\
-break;\
-case 4:\
-*(uint32_t *)(hptr) = tswap32((typeof(*hptr))(x));\
-break;\
-case 8:\
-*(uint64_t *)(hptr) = tswap64((typeof(*hptr))(x));\
-break;\
-default:\
-abort();\
-} \
-0;\
-})
+#define __put_user_e(x, hptr, e)\
+do {\
+PRAGMA_DISABLE_PACKED_WARNING;  \
+(__builtin_choose_expr(sizeof(*(hptr)) == 1, stb_p, \
+__builtin_choose_expr(sizeof(*(hptr)) == 2, stw_##e##_p,\
+__builtin_choose_expr(sizeof(*(hptr)) == 4, stl_##e##_p,\
+__builtin_choose_expr(sizeof(*(hptr)) == 8, stq_##e##_p, abort  \
+((hptr), (x)), (void)0);\
+PRAGMA_REENABLE_PACKED_WARNING; \
+} while (0)
+
+#define __get_user_e(x, hptr, e)\
+do {\
+PRAGMA_DISABLE_PACKED_WARNING;  \
+((x) = (typeof(*hptr))( \
+__builtin_choose_expr(sizeof(*(hptr)) == 1, ldub_p, \
+__builtin_choose_expr(sizeof(*(hptr)) == 2, lduw_##e##_p,   \
+__builtin_choose_expr(sizeof(*(hptr)) == 4, ldl_##e##_p,\
+__builtin_choose_expr(sizeof(*(hptr)) == 8, ldq_##e##_p, abort  \
+(hptr)), (void)0);  \
+PRAGMA_REENABLE_PACKED_WARNING; \
+} while (0)


Hmm.  I guess this works.  The typeof cast in __get_user_e being required when sizeof(x) > 
sizeof(*hptr) in order to get the correct extension.


Is it clearer with _Generic?

(x) = _Generic(*(hptr),
   int8_t: *(int8_t *)(hptr),
   uint8_t: *(uint8_t *)(hptr),
   int16_t: (int16_t)lduw_##e##_p(hptr),
   uint16_t: lduw_##e##_p(hptr),
   int32_t: (int32_t)ldl_##e##_p(hptr),
   uint32_t: (uint32_t)ldl_##e##_p(hptr),
   int64_t: (int64_t)ldq_##e##_p(hptr),
   uint64_t: ldq_##e##_p(hptr));

In particular I believe the error message will be much prettier.

Either way,
Reviewed-by: Richard Henderson 


r~



Re: [PATCH v1 1/3] softmmu/physmem: fallback to opening guest RAM file as readonly in a MAP_PRIVATE mapping

2023-08-08 Thread Peter Xu
On Mon, Aug 07, 2023 at 09:07:32PM +0200, David Hildenbrand wrote:
> From: Thiner Logoer 
> 
> Users may specify
> * "-mem-path" or
> * "-object memory-backend-file,share=off,readonly=off"
> and expect such COW (MAP_PRIVATE) mappings to work, even if the user
> does not have write permissions to open the file.
> 
> For now, we would always fail in that case, always requiring file write
> permissions. Let's detect when that failure happens and fallback to opening
> the file readonly.
> 
> Warn the user, since there are other use cases where we want the file to
> be mapped writable: ftruncate() and fallocate() will fail if the file
> was not opened with write permissions.
> 
> Signed-off-by: Thiner Logoer 
> Co-developed-by: David Hildenbrand 
> Signed-off-by: David Hildenbrand 
> ---
>  softmmu/physmem.c | 26 ++
>  1 file changed, 18 insertions(+), 8 deletions(-)
> 
> diff --git a/softmmu/physmem.c b/softmmu/physmem.c
> index 3df73542e1..d1ae694b20 100644
> --- a/softmmu/physmem.c
> +++ b/softmmu/physmem.c
> @@ -1289,8 +1289,7 @@ static int64_t get_file_align(int fd)
>  static int file_ram_open(const char *path,
>   const char *region_name,
>   bool readonly,
> - bool *created,
> - Error **errp)
> + bool *created)
>  {
>  char *filename;
>  char *sanitized_name;
> @@ -1334,10 +1333,7 @@ static int file_ram_open(const char *path,
>  g_free(filename);
>  }
>  if (errno != EEXIST && errno != EINTR) {
> -error_setg_errno(errp, errno,
> - "can't open backing store %s for guest RAM",
> - path);
> -return -1;
> +return -errno;
>  }
>  /*
>   * Try again on EINTR and EEXIST.  The latter happens when
> @@ -1946,9 +1942,23 @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, 
> MemoryRegion *mr,
>  bool created;
>  RAMBlock *block;
>  
> -fd = file_ram_open(mem_path, memory_region_name(mr), readonly, ,
> -   errp);
> +fd = file_ram_open(mem_path, memory_region_name(mr), readonly, );
> +if (fd == -EACCES && !(ram_flags & RAM_SHARED) && !readonly) {
> +/*
> + * We can have a writable MAP_PRIVATE mapping of a readonly file.
> + * However, some operations like ftruncate() or fallocate() might 
> fail
> + * later, let's warn the user.
> + */
> +fd = file_ram_open(mem_path, memory_region_name(mr), true, );
> +if (fd >= 0) {
> +warn_report("backing store %s for guest RAM (MAP_PRIVATE) opened"
> +" readonly because the file is not writable", 
> mem_path);

I can understand the use case, but this will be slightly unwanted,
especially the user doesn't yet have a way to predict when will it happen.

Meanwhile this changes the behavior, is it a concern that someone may want
to rely on current behavior of failing?

To think from a higher level of current use case, the ideal solution seems
to me that if the ram file can be put on a file system that supports CoW
itself (like btrfs), we can snapshot that ram file and make it RW for the
qemu instance. Then here it'll be able to open the file.  We'll be able to
keep the interface working as before, meanwhile it'll work with fallocate
or truncations too I assume.

Would that be better instead of changing QEMU?

Thanks,

> +}
> +}
>  if (fd < 0) {
> +error_setg_errno(errp, -fd,
> + "can't open backing store %s for guest RAM",
> + mem_path);
>  return NULL;
>  }
>  
> -- 
> 2.41.0
> 

-- 
Peter Xu




Re: [PATCH 1/2] riscv: zicond: make non-experimental

2023-08-08 Thread Palmer Dabbelt

On Tue, 08 Aug 2023 11:45:49 PDT (-0700), Vineet Gupta wrote:



On 8/8/23 11:29, Richard Henderson wrote:

On 8/8/23 11:17, Vineet Gupta wrote:

zicond is now codegen supported in both llvm and gcc.


It is still not in

https://wiki.riscv.org/display/HOME/Recently+Ratified+Extensions


Right, its been frozen since April though and with support trickling in
rest of tooling it becomes harder to test.
I don't know what exactly QEMU's policy is on this ?


IIUC we'd historically marked stuff as non-experimental when it's 
frozen, largely because ratification is such a nebulous process.  
There's obviously risk there, but there's risk to anything.  Last I can 
find is 260b594d8a ("RISC-V: Add Zawrs ISA extension support"), which 
specifically calls out Zawrs as frozen and IIUC adds support without the 
"x-" prefix.


I can't find anything written down about it, though...



Re: [PATCH 02/33] Disable clang warnings arising from bsd-user/qemu.h

2023-08-08 Thread Richard Henderson

On 8/7/23 23:07, Karim Taha wrote:

+/*
+ * Tricky points:
+ * - Use __builtin_choose_expr to avoid type promotion from ?:,
+ * - Invalid sizes result in a compile time error stemming from
+ *   the fact that abort has no parameters.
+ * - It's easier to use the endian-specific unaligned load/store
+ *   functions than host-endian unaligned load/store plus tswapN.
+ * - The pragmas are necessary only to silence a clang false-positive
+ *   warning: see https://bugs.llvm.org/show_bug.cgi?id=39113 .
+ * - We have to disable -Wpragmas warnings to avoid a complaint about
+ *   an unknown warning type from older compilers that don't know about
+ *   -Waddress-of-packed-member.
+ * - gcc has bugs in its _Pragma() support in some versions, eg
+ *   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83256 -- so we only
+ *   include the warning-suppression pragmas for clang


Perhaps s/in some versions/prior to gcc-13/ ?
At least that's what the bugzilla says, and it
will help when auditing for compiler versions
in a few years when gcc-12 is EOL.

Either way,
Reviewed-by: Richard Henderson 

r~



Re: [PATCH 01/33] Move _WANT_FREEBSD macros to include/qemu/osdep.h

2023-08-08 Thread Richard Henderson

On 8/7/23 23:07, Karim Taha wrote:

From: Warner Losh

move _WANT_FREEBSD macros from bsd-user/freebsd/os-syscall.c to
include/qemu/osdep.h in order to pull some struct defintions needed
later in the build.

Signed-off-by: Warner Losh
Signed-off-by: Karim Taha
---
  bsd-user/freebsd/os-syscall.c | 11 ---
  include/qemu/osdep.h  | 13 +
  2 files changed, 13 insertions(+), 11 deletions(-)


Acked-by: Richard Henderson 



r~



Re: [PATCH for-8.2 v2 1/2] qapi/migration: Deduplicate migration parameter field comments

2023-08-08 Thread Peter Xu
On Sun, Aug 06, 2023 at 11:49:46AM -0400, Peter Xu wrote:
> > I think we have a tradeoff here.  If perpetuating the unclean and ugly
> > use of "" is what it takes to de-triplicate migration parameters, we may
> > decide to accept that.
> 
> I don't think it's a must.  As Dan raised, we can convert str -> StrOrNull
> for MigrationParameters. I assume it won't affect query-migrate-parameters
> anyway OTOH.
> 
> I assume it means there's nothing yet obvious that we overlooked on the
> whole idea.  Let me propose the formal patchset early next week.  It'll be
> mostly the patch I attached but just add those extra logics for StrOrNull,
> so the diffstat might be less attractive but hopefully still good enough to
> be accepted.

The new StrOrNull approach doesn't work with current migration object
properties.. as StrOrNull must be a pointer for @MigrationParameters not
static, and it stops working with offsetof():

../migration/options.c:218:5: error: cannot apply ‘offsetof’ to a non constant 
address
  218 | DEFINE_PROP_STRING("tls-creds", MigrationState, 
parameters.tls_creds->u.s),
  | ^~
../migration/options.c:219:5: error: cannot apply ‘offsetof’ to a non constant 
address
  219 | DEFINE_PROP_STRING("tls-hostname", MigrationState, 
parameters.tls_hostname->u.s),
  | ^~
../migration/options.c:220:5: error: cannot apply ‘offsetof’ to a non constant 
address
  220 | DEFINE_PROP_STRING("tls-authz", MigrationState, 
parameters.tls_authz->u.s),
  | ^~

Any easy way to fix this?  I.e., is there a way to declare StrOrNull (in
MigrationParameters of qapi/migration.json) to be statically allocated
rather than a pointer (just like default behavior of any uint* types)?

Thanks,

-- 
Peter Xu




  1   2   3   >