Re: [PATCH v5] hostmem-file: add offset option

2023-04-03 Thread Markus Armbruster
Alexander Graf  writes:

> Add an option for hostmem-file to start the memory object at an offset
> into the target file. This is useful if multiple memory objects reside
> inside the same target file, such as a device node.
>
> In particular, it's useful to map guest memory directly into /dev/mem
> for experimentation.
>
> To make this work consistently, also fix up all places in QEMU that
> expect fd offsets to be 0.
>
> Signed-off-by: Alexander Graf 

[...]

> diff --git a/qapi/qom.json b/qapi/qom.json
> index a877b879b9..f740f74be3 100644
> --- a/qapi/qom.json
> +++ b/qapi/qom.json
> @@ -635,6 +635,10 @@
>  # specify the required alignment via this option.
>  # 0 selects a default alignment (currently the page size). (default: 0)
>  #
> +# @offset: the offset into the target file that the region starts at. You can
> +#  use this option to back multiple regions with a single file. Must be
> +#  a multiple of the page size. (default: 0) (since 8.1)
> +#
>  # @discard-data: if true, the file contents can be destroyed when QEMU exits,
>  #to avoid unnecessarily flushing data to the backing file. Note
>  #that ``discard-data`` is only an optimization, and QEMU might
> @@ -655,6 +659,7 @@
>  { 'struct': 'MemoryBackendFileProperties',
>'base': 'MemoryBackendProperties',
>'data': { '*align': 'size',
> +'*offset': 'size',
>  '*discard-data': 'bool',
>  'mem-path': 'str',
>  '*pmem': { 'type': 'bool', 'if': 'CONFIG_LIBPMEM' },

Acked-by: Markus Armbruster 

[...]




Re: an issue for device hot-unplug

2023-04-03 Thread Jinpu Wang
Hi Yu,

On Mon, Apr 3, 2023 at 6:59 PM Yu Zhang  wrote:
>
> Dear Laurent,
>
> Thank you for your quick reply. We used qemu-7.1, but it is reproducible with 
> qemu from v6.2 to the recent v8.0 release candidates.
> I found that it's introduced by the commit  9323f892b39 (between v6.2.0-rc2 
> and v6.2.0-rc3).
>
> If it doesn't break anything else, it suffices to remove the line below from 
> acpi_pcihp_device_unplug_request_cb():
>
> pdev->qdev.pending_deleted_event = true;
>
> but you may have a reason to keep it. First of all, I'll open a bug in the 
> bug tracker and let you know.
>
> Best regards,
> Yu Zhang
This patch from Igor Mammedov seems relevant,
https://lore.kernel.org/qemu-devel/20230403131833-mutt-send-email-...@kernel.org/T/#t
Can you try it out?

Regards!
Jinpu
>
> On Mon, Apr 3, 2023 at 6:32 PM Laurent Vivier  wrote:
>>
>> Hi Yu,
>>
>> please open a bug in the bug tracker:
>>
>> https://gitlab.com/qemu/qemu/-/issues
>>
>> It's easier to track the problem.
>>
>> What is the version of QEMU you are using?
>> Could you provide QEMU command line?
>>
>> Thanks,
>> Laurent
>>
>>
>> On 4/3/23 15:24, Yu Zhang wrote:
>> > Dear Laurent,
>> >
>> > recently we run into an issue with the following error:
>> >
>> > command '{ "execute": "device_del", "arguments": { "id": "virtio-diskX" } 
>> > }' for VM "id"
>> > failed ({ "return": {"class": "GenericError", "desc": "Device virtio-diskX 
>> > is already in
>> > the process of unplug"} }).
>> >
>> > The issue is reproducible. With a few seconds delay before hot-unplug, 
>> > hot-unplug just
>> > works fine.
>> >
> >> > After some digging, we found that commit 9323f892b39 may have
> >> > introduced the issue.
>> > --
>> >  failover: fix unplug pending detection
>> >
>> >  Failover needs to detect the end of the PCI unplug to start migration
>> >  after the VFIO card has been unplugged.
>> >
>> >  To do that, a flag is set in pcie_cap_slot_unplug_request_cb() and 
>> > reset in
>> >  pcie_unplug_device().
>> >
>> >  But since
>> >  17858a169508 ("hw/acpi/ich9: Set ACPI PCI hot-plug as default on 
>> > Q35")
>> >  we have switched to ACPI unplug and these functions are not called 
>> > anymore
>> >  and the flag not set. So failover migration is not able to detect if 
>> > card
>> >  is really unplugged and acts as it's done as soon as it's started. So 
>> > it
>> >  doesn't wait the end of the unplug to start the migration. We don't 
>> > see any
>> >  problem when we test that because ACPI unplug is faster than PCIe 
>> > native
>> >  hotplug and when the migration really starts the unplug operation is
>> >  already done.
>> >
>> >  See c000a9bd06ea ("pci: mark device having guest unplug request 
>> > pending")
>> >  a99c4da9fc2a ("pci: mark devices partially unplugged")
>> >
>> >  Signed-off-by: Laurent Vivier 
>> >  Reviewed-by: Ani Sinha <a...@anisinha.ca>
>> >  Message-Id: <2028133225.324937-4-lviv...@redhat.com>
>> >  Reviewed-by: Michael S. Tsirkin 
>> >  Signed-off-by: Michael S. Tsirkin 
>> > --
>> > The purpose is to detect the end of the PCI device hot-unplug.
>> > However, we find the error confusing. How is it possible that a disk "is already in the process
>> > of unplug"
>> > during the first hot-unplug attempt? So far as I know, the issue was also 
>> > encountered by
>> > libvirt, but they simply ignored it:
>> >
>> > https://bugzilla.redhat.com/show_bug.cgi?id=1878659
>> > 
>> >
>> > Hence, a question is: should we have the line below in  
>> > acpi_pcihp_device_unplug_request_cb()?
>> >
>> > pdev->qdev.pending_deleted_event = true;
>> >
>> > It would be great if you as the author could give us a few hints.
>> >
>> > Thank you very much for your reply!
>> >
>> > Sincerely,
>> >
>> > Yu Zhang @ Compute Platform IONOS
>> > 03.04.2023
>>



Re: [PATCH v6 00/25] target/riscv: MSTATUS_SUM + cleanups

2023-04-03 Thread Wu, Fei
On 3/25/2023 6:54 PM, Richard Henderson wrote:
> This builds on Fei and Zhiwei's SUM and TB_FLAGS changes.
> 
>   * Reclaim 5 TB_FLAGS bits, since we nearly ran out.
> 
>   * Using cpu_mmu_index(env, true) is insufficient to implement
> HLVX properly.  While that chooses the correct mmu_idx, it
> does not perform the read with execute permission.
> I add a new tcg interface to perform a read-for-execute with
> an arbitrary mmu_idx.  This is still not 100% compliant, but
> it's closer.
> 
>   * Handle mstatus.MPV in cpu_mmu_index.
>   * Use vsstatus.SUM when required for MMUIdx_S_SUM.
>   * Cleanups for get_physical_address.
> 
> While this passes check-avocado, I'm sure that's insufficient.
> Please have a close look.
> 
I tested stress-ng to get a feeling for the performance gain, although
stress-ng is not designed to be a performance benchmark. Btw, I had to
revert commit 0ee342256af9, which is unrelated to this series, or qemu
exited during the test.
./stress-ng --timeout 5 --metrics-brief --class os --sequential 1

Here are the results. In general most of the tests benefit from this
series, but please note that the results are not necessarily consistent
across multiple runs, and some regressions may not be real; I haven't
checked them one by one.

 master(60ca584b)   master + this   speedup

stressor       bogo ops/s      bogo ops/s
               (usr+sys time)  (usr+sys time)
sigsuspend       19430.09      1492746.34    76.8265
utime             8779.64       271023.89    30.8696
chmod             1728.26        27050.50    15.6519
vdso           23527136.74    246955742.76   10.4966
signal           584521.13      5470775.44    9.35941
sigtrap          822935.76      7190973.63    8.7382
signest          802706.93      6969509.05    8.68251
sockpair         501188.08      4242275.08    8.46444
msg             1627863.38     13557215.89    8.32823
sigpending       551174.68      4575836.91    8.30197
locka           1447750.95     11727762.91    8.10068
lockofd         1460020.77     11562178.66    7.91919
sigsegv          718492.57      5673228.57    7.89602
getrandom        129004.90      1006544.31    7.80237
sigq             892062.12      6828556.43    7.6548
chdir                13.39          100.66    7.51755
timerfd         2074142.37     15395307.29    7.42249
mq               916620.00      6208148.59    6.77287
mutex           1124306.59      7285459.79    6.47996
urandom          104868.58       678510.46    6.4701
pipe            2243935.71     14391093.39    6.41333
loadavg          463874.30      2936816.17    6.33106
fifo             423415.43      2632734.32    6.21785
vm                16726.91        99928.62    5.97412
handle           199246.08      1131172.45    5.67726
fstat              2383.12        13479.35    5.65618
sigrt            405007.13      2143758.11    5.29314
access             8449.17        44145.10    5.22479
sigfd           1506073.95      7408089.06    4.91881
sysinfo           11711.47        54868.08    4.68499
sigio           1672452.59      7564833.33    4.5232
rlimit            26771.83       119476.12    4.46276
xattr               772.25         3412.81    4.41931
udp              595733.08      2495239.72    4.18852
sockfd           260825.22      1061910.05    4.07135
get               13169.56        52788.06    4.00834
getdent          141465.81       564471.43    3.99016
rename            61771.74       246277.28    3.98689
chown             54946.74       212353.58    3.86472
dev                3555.80        13596.14    3.82365
mincore            6617.92        25215.66    3.81021
file-ioctl       105919.35       398122.29    3.75873
link                 15.45           56.02    3.62589
splice           239841.25       865390.06    3.60818
io-uring          45798.90       157006.17    3.42816
filename           7795.98        26238.75    3.36568
sock               1746.96         5850.73    3.34909
vm-splice        953550.50      3188724.62    3.34405
schedpolicy      231915.33       773655.76    3.33594
clock             21878.02        72400.21    3.30927
fcntl             76122.11       245817.92    3.22926
dentry            79533.95       247610.80    3.11327
fpunch            11895.30        36608.97    3.0776
revio            866066.56      2596187.53    2.99768
null            2351038.37      6984334.92    2.97074
mknod             71145.05       203284.26    2.85732
symlink              12.40           35.41    2.85565
fiemap            45437.02       128983.69    2.83874
sleep            100093.89       282540.81    2.82276
dir               99154.72       272727.21    2.75052
timer

Re: [PATCH v8] audio/pwaudio.c: Add Pipewire audio backend for QEMU

2023-04-03 Thread Volker Rümelin

Hi Dorinda,


Hi Volker,

Filling a buffer with zeros to produce silence is still wrong for
unsigned samples. For example, a 0 in SPA_AUDIO_FORMAT_U8 format maps
to -1.0 in SPA_AUDIO_FORMAT_F32.

This is a bug. On a buffer underrun, the buffer filled with silence is
dropped.

What are your suggestions to improve this?



The code in patch v7 handled buffer underruns in playback_on_process()
correctly. I suggest using that part of the code again. It was just
wrong to fill the buffer with zeros for unsigned samples. Christian
suggested using the audio_pcm_info_clear_buf() function instead of
memset(p, 0, n_bytes). If you don't want to use
audio_pcm_info_clear_buf() you could use the code there as a template.


There is no guarantee that guests can produce audio samples fast enough. 
Buffer underruns should therefore be handled properly.



Why don't you need a lock here? Is pw_stream_set_active() thread safe?

I will put a lock there, Thanks.

You only have the three volume levels 2.0, 1.0 and 0.0 while vol[i]
has 256 levels.

Ack.

It's an optimization. Evaluating req = (uint64_t)v->g->dev->timer_period
* v->info.rate * 1 / 2 / 100 * v->frame_size once in qpw_init_out()
vs. a lot of needless evaluations every few milliseconds in the
callback.

Ack

  options. Please

Can you please clarify what you mean here?



I didn't write that. The link was already in your email.

With best regards,
Volker


Thanks,
Dorinda

On Mon, Apr 3, 2023 at 8:51 AM Volker Rümelin  wrote:

On 28.03.23 at 13:56, Dorinda Bassey wrote:

Hi Dorinda,

> Hi Volker,
>
> Thanks for the feedback.
>
>     This term is constant for the lifetime of the playback
stream. It
>     could
>     be precalculated in qpw_init_out().
>
> It's still constant even when precalculated in qpw_init_out().

It's an optimization. Evaluating req = (uint64_t)v->g->dev->timer_period
* v->info.rate * 1 / 2 / 100 * v->frame_size once in qpw_init_out()
vs. a lot of needless evaluations every few milliseconds in the
callback.

With best regards,
Volker

>
>     The if (!v->enabled) block isn't needed. When the guest stops the
>     playback stream, it won't write new samples. After the pipewire
>     ringbuffer is drained, avail is always 0. It's better to drain the
>     ringbuffer, otherwise the first thing you will hear after playback
>     starts again will be stale audio samples.
>
>     You removed the code to play silence on a buffer underrun. I
>     suggest adding it again. Use a trace point with the "simple" trace
>     backend to see how often pipewire now calls the callback in short
>     succession for a disabled stream before giving up. Please read
>     again Marc-André's comments on the v7 version of the pipewire
>     backend. When the guest enables/disables an audio stream, pipewire
>     should know this. It's unnecessary for pipewire to call the
>     callback code for disabled streams. Don't forget to connect the
>     stream with the flag PW_STREAM_FLAG_INACTIVE. Every QEMU audio
>     device enables the stream before playback/recording starts. The
>     pcm_ops functions volume_out and volume_in are missing. Probably
>     SPA_PROP_channelVolumes can be used to adjust the stream volumes.
>     Without these functions the guest can adjust the stream volume and
>     the host has an independent way to adjust the stream volume. This
>     is sometimes irritating.
>
>     The pipewire backend code doesn't use the in|out.name options.
>     Please either remove the name options or add code to connect to
>     the specified source/sink. I would prefer the latter.
>     PW_KEY_TARGET_OBJECT looks promising.
>
> Ack.
>
> Thanks,
> Dorinda.
>
>
>
> On Mon, Mar 20, 2023 at 7:31 AM Volker Rümelin

> wrote:
>
>
>     > diff --git a/audio/trace-events b/audio/trace-events
>     > index e1ab643add..e0acf9ac56 100644
>     > --- a/audio/trace-events
>     > +++ b/audio/trace-events
>     > @@ -18,6 +18,13 @@ dbus_audio_register(const char *s, const char *dir) "sender = %s, dir = %s"
>     >   dbus_audio_put_buffer_out(size_t len) "len = %zu"
>     >   dbus_audio_read(size_t len) "len = %zu"
>     >
>     > +# pwaudio.c
>     > +pw_state_changed(const char *s) "stream state: %s"
>     > +pw_node(int nodeid) "node id: %d"
>     > +pw_read(int32_t avail, uint32_t index, size_t len) "avail=%u index=%u len=%zu"
>     > +pw_write(int32_t filled, int32_t avail, uint32_t index, size_t len) "fill

Re: [PATCH] vfio/migration: Skip log_sync during migration SETUP state

2023-04-03 Thread Avihai Horon



On 04/04/2023 0:36, Alex Williamson wrote:



On Mon, 3 Apr 2023 22:36:42 +0200
Cédric Le Goater  wrote:


On 4/3/23 15:00, Avihai Horon wrote:

Currently, VFIO log_sync can be issued while migration is in SETUP
state. However, doing this log_sync is at best redundant and at worst
can fail.

Redundant -- all RAM is marked dirty in migration SETUP state and is
transferred only after migration is set to ACTIVE state, so doing
log_sync during migration SETUP is pointless.

Can fail -- there is a time window, between setting migration state to
SETUP and starting dirty tracking by RAM save_live_setup handler, during
which dirty tracking is still not started. Any VFIO log_sync call that
is issued during this time window will fail. For example, this error can
be triggered by migrating a VM when a GUI is active, which constantly
calls log_sync.

Fix it by skipping VFIO log_sync while migration is in SETUP state.

Fixes: 758b96b61d5c ("vfio/migrate: Move switch of dirty tracking into 
vfio_memory_listener")
Signed-off-by: Avihai Horon 

migration is still experimental, so this can wait for 8.1. Correct me if not.

Agreed, this doesn't seem nearly catastrophic enough as an experimental
feature that it can't wait for the 8.1 devel cycle to open.


Sure, so let's wait for 8.1 cycle to open.

Thanks!


---
   hw/vfio/common.c | 3 ++-
   1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 4d01ea3515..78358ede27 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -478,7 +478,8 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainer *container)
   VFIODevice *vbasedev;
   MigrationState *ms = migrate_get_current();

-if (!migration_is_setup_or_active(ms->state)) {
+if (ms->state != MIGRATION_STATUS_ACTIVE &&
+ms->state != MIGRATION_STATUS_DEVICE) {
   return false;
   }





Re: [RFC PATCH v2 18/44] target/loongarch: Implement vsat

2023-04-03 Thread Richard Henderson

On 4/3/23 19:11, gaosong wrote:


On 2023/4/4 4:13 AM, Richard Henderson wrote:

On 4/3/23 05:55, gaosong wrote:

Hi, Richard

On 2023/4/1 1:03 PM, Richard Henderson wrote:

On 3/27/23 20:06, Song Gao wrote:

+static void gen_vsat_s(unsigned vece, TCGv_vec t, TCGv_vec a, int64_t imm)
+{
+    TCGv_vec t1;
+    int64_t max  = (1l << imm) - 1;


This needed 1ull, but better to just use

    max = MAKE_64BIT_MASK(0, imm - 1); 

For the signed version, use ll?
I think using MAKE_64BIT_MASK(0, imm - 1) for the signed version is not suitable.


int64_t max = MAKE_64BIT_MASK(0, imm);
int64_t min = ~max;  /* or -1 - max */


The same problem with imm = 0:
MAKE_64BIT_MASK(0, 0) is always 0xffffffffffffffff. :-)


Huh.  Well that's a bug.


r~



Re: [PATCH v6 4/6] target/riscv: Add support for PC-relative translation

2023-04-03 Thread liweiwei



On 2023/4/4 11:12, LIU Zhiwei wrote:


On 2023/4/4 10:06, Weiwei Li wrote:

Add a base pc_save for PC-relative translation(CF_PCREL).
Disable directly syncing pc from tb in riscv_cpu_synchronize_from_tb.
We can get pc-relative address from following formula:
   real_pc = (old)env->pc + diff, where diff = target_pc - ctx->pc_save.
Use gen_get_target_pc to compute target address of auipc and successor
address of jalr and jal.

The existence of CF_PCREL can improve performance with the guest
kernel's address space randomization.  Each guest process maps libc.so
(et al) at a different virtual address, and this allows those
translations to be shared.

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
Reviewed-by: Richard Henderson 
---
  target/riscv/cpu.c  | 29 +--
  target/riscv/insn_trans/trans_rvi.c.inc | 14 ++--


The processing for trans_ebreak is missing.

I wanted to build the PCREL feature on top of the handling of the ctx
pc-related fields, which is why we need the special processing.
For example,


 static bool trans_auipc(DisasContext *ctx, arg_auipc *a)
 {
-    gen_set_gpri(ctx, a->rd, a->imm + ctx->base.pc_next);
+    if (tb_cflags(ctx->cflags) & CF_PCREL) {
+    target_ulong pc_rel = ctx->base.pc_next - ctx->base.pc_first 
+ a->imm;

+    gen_set_gpr_pcrel(ctx, a->rd, cpu_pc, pc_rel);
+    } else {
+    gen_set_gpri(ctx, a->rd, a->imm + ctx->base.pc_next);
+    }
 return true;
 }

+static void gen_set_gpr_pcrel(DisasContext *ctx, int reg_num, TCGv t, 
target_ulong rel)

+{
+    TCGv dest = dest_gpr(ctx, reg_num);
+    tcg_gen_addi_tl(dest, t, rel);
+    gen_set_gpr(ctx, reg_num, dest);
+}
+

But if it is too difficult to reuse the current implementation, your 
implementation is also acceptable to me.


Sorry, I don't get your idea. gen_pc_plus_diff() can do all the above job.

Regards,

Weiwei Li



Zhiwei


  target/riscv/translate.c | 48 -
  3 files changed, 70 insertions(+), 21 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 1e97473af2..646fa31a59 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -658,16 +658,18 @@ static vaddr riscv_cpu_get_pc(CPUState *cs)
  static void riscv_cpu_synchronize_from_tb(CPUState *cs,
    const TranslationBlock *tb)
  {
-    RISCVCPU *cpu = RISCV_CPU(cs);
-    CPURISCVState *env = &cpu->env;
-    RISCVMXL xl = FIELD_EX32(tb->flags, TB_FLAGS, XL);
+    if (!(tb_cflags(tb) & CF_PCREL)) {
+    RISCVCPU *cpu = RISCV_CPU(cs);
+    CPURISCVState *env = &cpu->env;
+    RISCVMXL xl = FIELD_EX32(tb->flags, TB_FLAGS, XL);
  -    tcg_debug_assert(!(cs->tcg_cflags & CF_PCREL));
+    tcg_debug_assert(!(cs->tcg_cflags & CF_PCREL));
  -    if (xl == MXL_RV32) {
-    env->pc = (int32_t) tb->pc;
-    } else {
-    env->pc = tb->pc;
+    if (xl == MXL_RV32) {
+    env->pc = (int32_t) tb->pc;
+    } else {
+    env->pc = tb->pc;
+    }
  }
  }
  @@ -693,11 +695,18 @@ static void riscv_restore_state_to_opc(CPUState *cs,

  RISCVCPU *cpu = RISCV_CPU(cs);
  CPURISCVState *env = &cpu->env;
  RISCVMXL xl = FIELD_EX32(tb->flags, TB_FLAGS, XL);
+    target_ulong pc;
+
+    if (tb_cflags(tb) & CF_PCREL) {
+    pc = (env->pc & TARGET_PAGE_MASK) | data[0];
+    } else {
+    pc = data[0];
+    }
    if (xl == MXL_RV32) {
-    env->pc = (int32_t)data[0];
+    env->pc = (int32_t)pc;
  } else {
-    env->pc = data[0];
+    env->pc = pc;
  }
  env->bins = data[1];
  }
diff --git a/target/riscv/insn_trans/trans_rvi.c.inc b/target/riscv/insn_trans/trans_rvi.c.inc

index cc72864d32..7cbbdac5aa 100644
--- a/target/riscv/insn_trans/trans_rvi.c.inc
+++ b/target/riscv/insn_trans/trans_rvi.c.inc
@@ -38,7 +38,9 @@ static bool trans_lui(DisasContext *ctx, arg_lui *a)
    static bool trans_auipc(DisasContext *ctx, arg_auipc *a)
  {
-    gen_set_gpri(ctx, a->rd, a->imm + ctx->base.pc_next);
+    TCGv target_pc = dest_gpr(ctx, a->rd);
+    gen_pc_plus_diff(target_pc, ctx, a->imm + ctx->base.pc_next);
+    gen_set_gpr(ctx, a->rd, target_pc);
  return true;
  }
  @@ -52,6 +54,7 @@ static bool trans_jalr(DisasContext *ctx, arg_jalr *a)

  {
  TCGLabel *misaligned = NULL;
  TCGv target_pc = tcg_temp_new();
+    TCGv succ_pc = dest_gpr(ctx, a->rd);
    tcg_gen_addi_tl(target_pc, get_gpr(ctx, a->rs1, EXT_NONE), a->imm);

  tcg_gen_andi_tl(target_pc, target_pc, (target_ulong)-2);
@@ -68,7 +71,9 @@ static bool trans_jalr(DisasContext *ctx, arg_jalr *a)
  tcg_gen_brcondi_tl(TCG_COND_NE, t0, 0x0, misaligned);
  }
  -    gen_set_gpri(ctx, a->rd, ctx->pc_succ_insn);
+    gen_pc_plus_diff(succ_pc, ctx, ctx->pc_succ_insn);
+    gen_set_gpr(ctx, a->rd, succ_pc);
+
  tcg_gen_mov_tl(cpu_pc, target_pc);
  lookup_and_goto_ptr(ctx);
  @@ -159,6 +164,7 @@ static bool gen_branch(DisasContext *ctx, arg_b *a, TCGCond cond)

 

Re: [RFC PATCH v2 44/44] target/loongarch: Use {set/get}_gpr replace to cpu_fpr

2023-04-03 Thread Richard Henderson

On 3/27/23 20:06, Song Gao wrote:

Introduce set_fpr() and get_fpr() and remove cpu_fpr.

Signed-off-by: Song Gao
---
  .../loongarch/insn_trans/trans_farith.c.inc   | 72 +++
  target/loongarch/insn_trans/trans_fcmp.c.inc  | 12 ++--
  .../loongarch/insn_trans/trans_fmemory.c.inc  | 37 ++
  target/loongarch/insn_trans/trans_fmov.c.inc  | 31 +---
  target/loongarch/translate.c  | 20 --
  5 files changed, 129 insertions(+), 43 deletions(-)


Reviewed-by: Richard Henderson 

As previously mentioned, patch 2 must be last, because without this patch you will 
generate invalid tcg.



r~



Re: [RFC PATCH v2 43/44] target/loongarch: Implement vldi

2023-04-03 Thread Richard Henderson

On 3/27/23 20:06, Song Gao wrote:

+static bool trans_vldi(DisasContext *ctx, arg_vldi *a)
+{
+int sel, vece;
+uint64_t value;
+CHECK_SXE;
+
+sel = (a->imm >> 12) & 0x1;
+
+if (sel) {
+/* VSETI.D */
+value = vldi_get_value(ctx, a->imm);
+vece = MO_64;
+} else {
+   /*
+* VLDI.B/H/W/D
+*  a->imm bit [11:10] is vece.
+*  a->imm bit [9:0] is value;
+*/
+   value = ((int32_t)(a->imm << 22)) >> 22;
+   vece = (a->imm >> 10) & 0x3;
+}
+
+tcg_gen_gvec_dup_i64(vece, vreg_full_offset(a->vd), 16, 16,
+ tcg_constant_i64(value));
+return true;
+}


I think you should finish this decode in insns.decode,
especially since we are using that for disassembly.


r~



Re: [PATCH v6 3/6] target/riscv: Fix target address to update badaddr

2023-04-03 Thread liweiwei



On 2023/4/4 11:06, LIU Zhiwei wrote:


On 2023/4/4 10:06, Weiwei Li wrote:

Compute the target address before storing it into badaddr
when mis-aligned exception is triggered.
Use a target_pc temp to store the target address to avoid
the confusing operation of updating the target address into
cpu_pc before the misalign check, then updating it into badaddr
and restoring cpu_pc to the current pc if an exception is triggered.

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
Reviewed-by: Richard Henderson 
---
  target/riscv/insn_trans/trans_rvi.c.inc | 23 ---
  target/riscv/translate.c    | 21 ++---
  2 files changed, 26 insertions(+), 18 deletions(-)

diff --git a/target/riscv/insn_trans/trans_rvi.c.inc b/target/riscv/insn_trans/trans_rvi.c.inc

index 4ad54e8a49..cc72864d32 100644
--- a/target/riscv/insn_trans/trans_rvi.c.inc
+++ b/target/riscv/insn_trans/trans_rvi.c.inc
@@ -51,25 +51,30 @@ static bool trans_jal(DisasContext *ctx, arg_jal *a)
  static bool trans_jalr(DisasContext *ctx, arg_jalr *a)
  {
  TCGLabel *misaligned = NULL;
+    TCGv target_pc = tcg_temp_new();
  -    tcg_gen_addi_tl(cpu_pc, get_gpr(ctx, a->rs1, EXT_NONE), a->imm);
-    tcg_gen_andi_tl(cpu_pc, cpu_pc, (target_ulong)-2);
+    tcg_gen_addi_tl(target_pc, get_gpr(ctx, a->rs1, EXT_NONE), a->imm);
+    tcg_gen_andi_tl(target_pc, target_pc, (target_ulong)-2);
+
+    if (get_xl(ctx) == MXL_RV32) {
+    tcg_gen_ext32s_tl(target_pc, target_pc);
+    }

Delete this.


The (sign-extended) target_pc should be used both in updating cpu_pc
and in the following gen_exception_inst_addr_mis().


So we cannot delete it and just reuse the logic in gen_set_pc().

This is also why I use tcg_gen_mov_tl(cpu_pc, target_pc) directly in
the following code.


Regards,

Weiwei Li


-    gen_set_pc(ctx, cpu_pc);
  if (!has_ext(ctx, RVC)) {
  TCGv t0 = tcg_temp_new();
    misaligned = gen_new_label();
-    tcg_gen_andi_tl(t0, cpu_pc, 0x2);
+    tcg_gen_andi_tl(t0, target_pc, 0x2);
  tcg_gen_brcondi_tl(TCG_COND_NE, t0, 0x0, misaligned);
  }
    gen_set_gpri(ctx, a->rd, ctx->pc_succ_insn);
+    tcg_gen_mov_tl(cpu_pc, target_pc);


And we can use the gen_set_pc instead.

I think the reason you want to delete gen_set_pc is to make
gen_set_pc_imm the only API that changes cpu_pc.

This implicitly enhances correctness, but may constrain
extensibility.


Zhiwei


  lookup_and_goto_ptr(ctx);
    if (misaligned) {
  gen_set_label(misaligned);
-    gen_exception_inst_addr_mis(ctx);
+    gen_exception_inst_addr_mis(ctx, target_pc);
  }
  ctx->base.is_jmp = DISAS_NORETURN;
  @@ -153,6 +158,7 @@ static bool gen_branch(DisasContext *ctx, arg_b *a, TCGCond cond)

  TCGLabel *l = gen_new_label();
  TCGv src1 = get_gpr(ctx, a->rs1, EXT_SIGN);
  TCGv src2 = get_gpr(ctx, a->rs2, EXT_SIGN);
+    target_ulong next_pc;
    if (get_xl(ctx) == MXL_RV128) {
  TCGv src1h = get_gprh(ctx, a->rs1);
@@ -169,9 +175,12 @@ static bool gen_branch(DisasContext *ctx, arg_b *a, TCGCond cond)

    gen_set_label(l); /* branch taken */
  -    if (!has_ext(ctx, RVC) && ((ctx->base.pc_next + a->imm) & 0x3)) {
+    next_pc = ctx->base.pc_next + a->imm;
+    if (!has_ext(ctx, RVC) && (next_pc & 0x3)) {
  /* misaligned */
-    gen_exception_inst_addr_mis(ctx);
+    TCGv target_pc = tcg_temp_new();
+    gen_pc_plus_diff(target_pc, ctx, next_pc);
+    gen_exception_inst_addr_mis(ctx, target_pc);
  } else {
  gen_goto_tb(ctx, 0, ctx->base.pc_next + a->imm);
  }
diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index 0ee8ee147d..d434fedb37 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -222,21 +222,18 @@ static void decode_save_opc(DisasContext *ctx)
  ctx->insn_start = NULL;
  }
  -static void gen_set_pc_imm(DisasContext *ctx, target_ulong dest)
+static void gen_pc_plus_diff(TCGv target, DisasContext *ctx,
+  target_ulong dest)
  {
  if (get_xl(ctx) == MXL_RV32) {
  dest = (int32_t)dest;
  }
-    tcg_gen_movi_tl(cpu_pc, dest);
+    tcg_gen_movi_tl(target, dest);
  }
  -static void gen_set_pc(DisasContext *ctx, TCGv dest)
+static void gen_set_pc_imm(DisasContext *ctx, target_ulong dest)
  {
-    if (get_xl(ctx) == MXL_RV32) {
-    tcg_gen_ext32s_tl(cpu_pc, dest);
-    } else {
-    tcg_gen_mov_tl(cpu_pc, dest);
-    }
+    gen_pc_plus_diff(cpu_pc, ctx, dest);
  }
    static void generate_exception(DisasContext *ctx, int excp)
@@ -257,9 +254,9 @@ static void gen_exception_illegal(DisasContext *ctx)
  }
  }
  -static void gen_exception_inst_addr_mis(DisasContext *ctx)
+static void gen_exception_inst_addr_mis(DisasContext *ctx, TCGv target)
  {
-    tcg_gen_st_tl(cpu_pc, cpu_env, offsetof(CPURISCVState, badaddr));
+    tcg_gen_st_tl(target, cpu_env, offsetof(CPURISCVState, badaddr));
  generate_exception(ctx, RISC

Re: [RFC PATCH v2 42/44] target/loongarch: Implement vld vst

2023-04-03 Thread Richard Henderson

On 3/27/23 20:06, Song Gao wrote:

+void HELPER(vld_b)(CPULoongArchState *env, uint32_t vd, target_ulong addr)
+{
+int i;
+VReg *Vd = &(env->fpr[vd].vreg);
+#if !defined(CONFIG_USER_ONLY)
+MemOpIdx oi = make_memop_idx(MO_TE | MO_UNALN, cpu_mmu_index(env, false));
+
+for (i = 0; i < LSX_LEN/8; i++) {
+Vd->B(i) = helper_ret_ldub_mmu(env, addr + i, oi, GETPC());
+}
+#else
+for (i = 0; i < LSX_LEN/8; i++) {
+Vd->B(i) = cpu_ldub_data(env, addr + i);
+}
+#endif
+}


tcg_gen_qemu_ld_i128.


+static inline void ensure_writable_pages(CPULoongArchState *env,
+ target_ulong addr,
+ int mmu_idx,
+ uintptr_t retaddr)
+{
+#ifndef CONFIG_USER_ONLY
+/* FIXME: Probe the actual accesses (pass and use a size) */
+if (unlikely(LSX_PAGESPAN(addr))) {
+/* first page */
+probe_write(env, addr, 0, mmu_idx, retaddr);
+/* second page */
+addr = (addr & TARGET_PAGE_MASK) + TARGET_PAGE_SIZE;
+probe_write(env, addr, 0, mmu_idx, retaddr);
+}
+#endif
+}


Won't be needed with...


+void HELPER(vst_b)(CPULoongArchState *env, uint32_t vd, target_ulong addr)
+{
+int i;
+VReg *Vd = &(env->fpr[vd].vreg);
+int mmu_idx = cpu_mmu_index(env, false);
+
+ensure_writable_pages(env, addr, mmu_idx, GETPC());
+#if !defined(CONFIG_USER_ONLY)
+MemOpIdx oi = make_memop_idx(MO_TE | MO_UNALN, mmu_idx);
+for (i = 0; i < LSX_LEN/8; i++) {
+helper_ret_stb_mmu(env, addr + i, Vd->B(i),  oi, GETPC());
+}
+#else
+for (i = 0; i < LSX_LEN/8; i++) {
+cpu_stb_data(env, addr + i, Vd->B(i));
+}
+#endif
+}


... tcg_gen_qemu_st_i128.


+void HELPER(vldrepl_b)(CPULoongArchState *env, uint32_t vd, target_ulong addr)
+{
+VReg *Vd = &(env->fpr[vd].vreg);
+uint8_t data;
+#if !defined(CONFIG_USER_ONLY)
+MemOpIdx oi = make_memop_idx(MO_TE | MO_8 | MO_UNALN,
+ cpu_mmu_index(env, false));
+data = helper_ret_ldub_mmu(env, addr, oi, GETPC());
+#else
+data = cpu_ldub_data(env, addr);
+#endif
+int i;
+for (i = 0; i < 16; i++) {
+Vd->B(i) = data;
+}
+}


tcg_gen_qemu_ld_i64 + tcg_gen_gvec_dup_i64.


+#define B_PAGESPAN(x) \
+((((x) & ~TARGET_PAGE_MASK) + 8/8 - 1) >= TARGET_PAGE_SIZE)
+
+static inline void ensure_b_writable_pages(CPULoongArchState *env,
+   target_ulong addr,
+   int mmu_idx,
+   uintptr_t retaddr)
+{
+#ifndef CONFIG_USER_ONLY
+/* FIXME: Probe the actual accesses (pass and use a size) */
+if (unlikely(B_PAGESPAN(addr))) {
+/* first page */
+probe_write(env, addr, 0, mmu_idx, retaddr);
+/* second page */
+addr = (addr & TARGET_PAGE_MASK) + TARGET_PAGE_SIZE;
+probe_write(env, addr, 0, mmu_idx, retaddr);
+}
+#endif
+}
+
+void HELPER(vstelm_b)(CPULoongArchState *env,
+  uint32_t vd, target_ulong addr, uint32_t sel)
+{
+VReg *Vd = &(env->fpr[vd].vreg);
+int mmu_idx = cpu_mmu_index(env, false);
+
+ensure_b_writable_pages(env, addr, mmu_idx, GETPC());
+#if !defined(CONFIG_USER_ONLY)
+MemOpIdx oi = make_memop_idx(MO_TE | MO_8 | MO_UNALN,
+ cpu_mmu_index(env, false));
+helper_ret_stb_mmu(env, addr, Vd->B(sel), oi, GETPC());
+#else
+cpu_stb_data(env, addr, Vd->B(sel));
+#endif
+}


What are you doing here?
This is a plain integer store.


r~



Re: [PATCH v6 4/6] target/riscv: Add support for PC-relative translation

2023-04-03 Thread LIU Zhiwei



On 2023/4/4 11:12, LIU Zhiwei wrote:


On 2023/4/4 10:06, Weiwei Li wrote:

Add a base pc_save for PC-relative translation(CF_PCREL).
Disable directly syncing pc from tb in riscv_cpu_synchronize_from_tb.
We can get pc-relative address from following formula:
   real_pc = (old)env->pc + diff, where diff = target_pc - ctx->pc_save.
Use gen_get_target_pc to compute target address of auipc and successor
address of jalr and jal.

The existence of CF_PCREL can improve performance with the guest
kernel's address space randomization.  Each guest process maps libc.so
(et al) at a different virtual address, and this allows those
translations to be shared.

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
Reviewed-by: Richard Henderson 
---
  target/riscv/cpu.c  | 29 +--
  target/riscv/insn_trans/trans_rvi.c.inc | 14 ++--


The processing for trans_ebreak is missing.


Please ignore this comment. It will not influence the codegen.

Zhiwei



I wanted to build the PCREL feature on top of the handling of the ctx
pc-related fields, which is why we need the special processing.
For example,


 static bool trans_auipc(DisasContext *ctx, arg_auipc *a)
 {
-    gen_set_gpri(ctx, a->rd, a->imm + ctx->base.pc_next);
+    if (tb_cflags(ctx->cflags) & CF_PCREL) {
+    target_ulong pc_rel = ctx->base.pc_next - ctx->base.pc_first 
+ a->imm;

+    gen_set_gpr_pcrel(ctx, a->rd, cpu_pc, pc_rel);
+    } else {
+    gen_set_gpri(ctx, a->rd, a->imm + ctx->base.pc_next);
+    }
 return true;
 }

+static void gen_set_gpr_pcrel(DisasContext *ctx, int reg_num, TCGv t, 
target_ulong rel)

+{
+    TCGv dest = dest_gpr(ctx, reg_num);
+    tcg_gen_addi_tl(dest, t, rel);
+    gen_set_gpr(ctx, reg_num, dest);
+}
+

But if it is too difficult to reuse the current implementation, your 
implementation is also acceptable to me.


Zhiwei


  target/riscv/translate.c | 48 -
  3 files changed, 70 insertions(+), 21 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 1e97473af2..646fa31a59 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -658,16 +658,18 @@ static vaddr riscv_cpu_get_pc(CPUState *cs)
  static void riscv_cpu_synchronize_from_tb(CPUState *cs,
    const TranslationBlock *tb)
  {
-    RISCVCPU *cpu = RISCV_CPU(cs);
-    CPURISCVState *env = &cpu->env;
-    RISCVMXL xl = FIELD_EX32(tb->flags, TB_FLAGS, XL);
+    if (!(tb_cflags(tb) & CF_PCREL)) {
+    RISCVCPU *cpu = RISCV_CPU(cs);
+    CPURISCVState *env = &cpu->env;
+    RISCVMXL xl = FIELD_EX32(tb->flags, TB_FLAGS, XL);
  -    tcg_debug_assert(!(cs->tcg_cflags & CF_PCREL));
+    tcg_debug_assert(!(cs->tcg_cflags & CF_PCREL));
  -    if (xl == MXL_RV32) {
-    env->pc = (int32_t) tb->pc;
-    } else {
-    env->pc = tb->pc;
+    if (xl == MXL_RV32) {
+    env->pc = (int32_t) tb->pc;
+    } else {
+    env->pc = tb->pc;
+    }
  }
  }
  @@ -693,11 +695,18 @@ static void 
riscv_restore_state_to_opc(CPUState *cs,

  RISCVCPU *cpu = RISCV_CPU(cs);
  CPURISCVState *env = &cpu->env;
  RISCVMXL xl = FIELD_EX32(tb->flags, TB_FLAGS, XL);
+    target_ulong pc;
+
+    if (tb_cflags(tb) & CF_PCREL) {
+    pc = (env->pc & TARGET_PAGE_MASK) | data[0];
+    } else {
+    pc = data[0];
+    }
    if (xl == MXL_RV32) {
-    env->pc = (int32_t)data[0];
+    env->pc = (int32_t)pc;
  } else {
-    env->pc = data[0];
+    env->pc = pc;
  }
  env->bins = data[1];
  }
diff --git a/target/riscv/insn_trans/trans_rvi.c.inc 
b/target/riscv/insn_trans/trans_rvi.c.inc

index cc72864d32..7cbbdac5aa 100644
--- a/target/riscv/insn_trans/trans_rvi.c.inc
+++ b/target/riscv/insn_trans/trans_rvi.c.inc
@@ -38,7 +38,9 @@ static bool trans_lui(DisasContext *ctx, arg_lui *a)
    static bool trans_auipc(DisasContext *ctx, arg_auipc *a)
  {
-    gen_set_gpri(ctx, a->rd, a->imm + ctx->base.pc_next);
+    TCGv target_pc = dest_gpr(ctx, a->rd);
+    gen_pc_plus_diff(target_pc, ctx, a->imm + ctx->base.pc_next);
+    gen_set_gpr(ctx, a->rd, target_pc);
  return true;
  }
  @@ -52,6 +54,7 @@ static bool trans_jalr(DisasContext *ctx, 
arg_jalr *a)

  {
  TCGLabel *misaligned = NULL;
  TCGv target_pc = tcg_temp_new();
+    TCGv succ_pc = dest_gpr(ctx, a->rd);
    tcg_gen_addi_tl(target_pc, get_gpr(ctx, a->rs1, EXT_NONE), 
a->imm);

  tcg_gen_andi_tl(target_pc, target_pc, (target_ulong)-2);
@@ -68,7 +71,9 @@ static bool trans_jalr(DisasContext *ctx, arg_jalr *a)
  tcg_gen_brcondi_tl(TCG_COND_NE, t0, 0x0, misaligned);
  }
  -    gen_set_gpri(ctx, a->rd, ctx->pc_succ_insn);
+    gen_pc_plus_diff(succ_pc, ctx, ctx->pc_succ_insn);
+    gen_set_gpr(ctx, a->rd, succ_pc);
+
  tcg_gen_mov_tl(cpu_pc, target_pc);
  lookup_and_goto_ptr(ctx);
  @@ -159,6 +164,7 @@ static bool gen_branch(DisasContext *ctx, arg_b 
*a, TCGCond cond)

  TCGv src1 = ge

Re: [PATCH v6 3/6] target/riscv: Fix target address to update badaddr

2023-04-03 Thread LIU Zhiwei



On 2023/4/4 10:06, Weiwei Li wrote:

Compute the target address before storing it into badaddr
when a misaligned exception is triggered.
Use a target_pc temp to store the target address, to avoid
the confusing sequence of updating the target address into
cpu_pc before the misalignment check, then updating it into badaddr
and restoring cpu_pc to the current pc if an exception is triggered.

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
Reviewed-by: Richard Henderson 
---
  target/riscv/insn_trans/trans_rvi.c.inc | 23 ---
  target/riscv/translate.c| 21 ++---
  2 files changed, 26 insertions(+), 18 deletions(-)

diff --git a/target/riscv/insn_trans/trans_rvi.c.inc 
b/target/riscv/insn_trans/trans_rvi.c.inc
index 4ad54e8a49..cc72864d32 100644
--- a/target/riscv/insn_trans/trans_rvi.c.inc
+++ b/target/riscv/insn_trans/trans_rvi.c.inc
@@ -51,25 +51,30 @@ static bool trans_jal(DisasContext *ctx, arg_jal *a)
  static bool trans_jalr(DisasContext *ctx, arg_jalr *a)
  {
  TCGLabel *misaligned = NULL;
+TCGv target_pc = tcg_temp_new();
  
-tcg_gen_addi_tl(cpu_pc, get_gpr(ctx, a->rs1, EXT_NONE), a->imm);

-tcg_gen_andi_tl(cpu_pc, cpu_pc, (target_ulong)-2);
+tcg_gen_addi_tl(target_pc, get_gpr(ctx, a->rs1, EXT_NONE), a->imm);
+tcg_gen_andi_tl(target_pc, target_pc, (target_ulong)-2);
+
+if (get_xl(ctx) == MXL_RV32) {
+tcg_gen_ext32s_tl(target_pc, target_pc);
+}
  

Delete this.

-gen_set_pc(ctx, cpu_pc);
  if (!has_ext(ctx, RVC)) {
  TCGv t0 = tcg_temp_new();
  
  misaligned = gen_new_label();

-tcg_gen_andi_tl(t0, cpu_pc, 0x2);
+tcg_gen_andi_tl(t0, target_pc, 0x2);
  tcg_gen_brcondi_tl(TCG_COND_NE, t0, 0x0, misaligned);
  }
  
  gen_set_gpri(ctx, a->rd, ctx->pc_succ_insn);

+tcg_gen_mov_tl(cpu_pc, target_pc);


And we can use the gen_set_pc instead.

I think the reason you want to delete gen_set_pc is to make
gen_set_pc_imm the only API that changes cpu_pc.


This implicitly enhances correctness, though it may constrain scalability.

Zhiwei


  lookup_and_goto_ptr(ctx);
  
  if (misaligned) {

  gen_set_label(misaligned);
-gen_exception_inst_addr_mis(ctx);
+gen_exception_inst_addr_mis(ctx, target_pc);
  }
  ctx->base.is_jmp = DISAS_NORETURN;
  
@@ -153,6 +158,7 @@ static bool gen_branch(DisasContext *ctx, arg_b *a, TCGCond cond)

  TCGLabel *l = gen_new_label();
  TCGv src1 = get_gpr(ctx, a->rs1, EXT_SIGN);
  TCGv src2 = get_gpr(ctx, a->rs2, EXT_SIGN);
+target_ulong next_pc;
  
  if (get_xl(ctx) == MXL_RV128) {

  TCGv src1h = get_gprh(ctx, a->rs1);
@@ -169,9 +175,12 @@ static bool gen_branch(DisasContext *ctx, arg_b *a, 
TCGCond cond)
  
  gen_set_label(l); /* branch taken */
  
-if (!has_ext(ctx, RVC) && ((ctx->base.pc_next + a->imm) & 0x3)) {

+next_pc = ctx->base.pc_next + a->imm;
+if (!has_ext(ctx, RVC) && (next_pc & 0x3)) {
  /* misaligned */
-gen_exception_inst_addr_mis(ctx);
+TCGv target_pc = tcg_temp_new();
+gen_pc_plus_diff(target_pc, ctx, next_pc);
+gen_exception_inst_addr_mis(ctx, target_pc);
  } else {
  gen_goto_tb(ctx, 0, ctx->base.pc_next + a->imm);
  }
diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index 0ee8ee147d..d434fedb37 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -222,21 +222,18 @@ static void decode_save_opc(DisasContext *ctx)
  ctx->insn_start = NULL;
  }
  
-static void gen_set_pc_imm(DisasContext *ctx, target_ulong dest)

+static void gen_pc_plus_diff(TCGv target, DisasContext *ctx,
+  target_ulong dest)
  {
  if (get_xl(ctx) == MXL_RV32) {
  dest = (int32_t)dest;
  }
-tcg_gen_movi_tl(cpu_pc, dest);
+tcg_gen_movi_tl(target, dest);
  }
  
-static void gen_set_pc(DisasContext *ctx, TCGv dest)

+static void gen_set_pc_imm(DisasContext *ctx, target_ulong dest)
  {
-if (get_xl(ctx) == MXL_RV32) {
-tcg_gen_ext32s_tl(cpu_pc, dest);
-} else {
-tcg_gen_mov_tl(cpu_pc, dest);
-}
+gen_pc_plus_diff(cpu_pc, ctx, dest);
  }
  
  static void generate_exception(DisasContext *ctx, int excp)

@@ -257,9 +254,9 @@ static void gen_exception_illegal(DisasContext *ctx)
  }
  }
  
-static void gen_exception_inst_addr_mis(DisasContext *ctx)

+static void gen_exception_inst_addr_mis(DisasContext *ctx, TCGv target)
  {
-tcg_gen_st_tl(cpu_pc, cpu_env, offsetof(CPURISCVState, badaddr));
+tcg_gen_st_tl(target, cpu_env, offsetof(CPURISCVState, badaddr));
  generate_exception(ctx, RISCV_EXCP_INST_ADDR_MIS);
  }
  
@@ -551,7 +548,9 @@ static void gen_jal(DisasContext *ctx, int rd, target_ulong imm)

  next_pc = ctx->base.pc_next + imm;
  if (!has_ext(ctx, RVC)) {
  if ((next_pc & 0x3) != 0) {
-gen_exception_inst_addr_mis(ctx);
+TCGv target_pc = tcg_temp_new()

Re: [PATCH v6 6/6] target/riscv: Add pointer mask support for instruction fetch

2023-04-03 Thread LIU Zhiwei



On 2023/4/4 10:06, Weiwei Li wrote:

Transform the fetch address in cpu_get_tb_cpu_state() when pointer
mask for instruction is enabled.

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
Reviewed-by: Richard Henderson 
---
  target/riscv/cpu.h|  1 +
  target/riscv/cpu_helper.c | 20 +++-
  target/riscv/csr.c|  2 --
  3 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 638e47c75a..57bd9c3279 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -368,6 +368,7 @@ struct CPUArchState {
  #endif
  target_ulong cur_pmmask;
  target_ulong cur_pmbase;
+bool cur_pminsn;
  
  /* Fields from here on are preserved across CPU reset. */

  QEMUTimer *stimer; /* Internal timer for S-mode interrupt */
diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index f88c503cf4..b683a770fe 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -40,6 +40,19 @@ int riscv_cpu_mmu_index(CPURISCVState *env, bool ifetch)
  #endif
  }
  
+static target_ulong adjust_pc_address(CPURISCVState *env, target_ulong pc)

+{
+target_ulong adjust_pc = pc;
+
+if (env->cur_pminsn) {
+adjust_pc = (adjust_pc & ~env->cur_pmmask) | env->cur_pmbase;
+} else if (env->xl == MXL_RV32) {
+adjust_pc &= UINT32_MAX;
+}
+
+return adjust_pc;
+}
+
  void cpu_get_tb_cpu_state(CPURISCVState *env, target_ulong *pc,
target_ulong *cs_base, uint32_t *pflags)
  {
@@ -48,7 +61,7 @@ void cpu_get_tb_cpu_state(CPURISCVState *env, target_ulong 
*pc,
  
  uint32_t flags = 0;
  
-*pc = env->xl == MXL_RV32 ? env->pc & UINT32_MAX : env->pc;

+*pc = adjust_pc_address(env, env->pc);
  *cs_base = 0;
  
  if (cpu->cfg.ext_zve32f) {

@@ -124,6 +137,7 @@ void cpu_get_tb_cpu_state(CPURISCVState *env, target_ulong 
*pc,
  void riscv_cpu_update_mask(CPURISCVState *env)
  {
  target_ulong mask = -1, base = 0;
+bool insn = false;
  /*
   * TODO: Current RVJ spec does not specify
   * how the extension interacts with XLEN.
@@ -135,18 +149,21 @@ void riscv_cpu_update_mask(CPURISCVState *env)
  if (env->mmte & M_PM_ENABLE) {
  mask = env->mpmmask;
  base = env->mpmbase;
+insn = env->mmte & MMTE_M_PM_INSN;
  }
  break;
  case PRV_S:
  if (env->mmte & S_PM_ENABLE) {
  mask = env->spmmask;
  base = env->spmbase;
+insn = env->mmte & MMTE_S_PM_INSN;
  }
  break;
  case PRV_U:
  if (env->mmte & U_PM_ENABLE) {
  mask = env->upmmask;
  base = env->upmbase;
+insn = env->mmte & MMTE_U_PM_INSN;
  }
  break;
  default:
@@ -161,6 +178,7 @@ void riscv_cpu_update_mask(CPURISCVState *env)
  env->cur_pmmask = mask;
  env->cur_pmbase = base;
  }
+env->cur_pminsn = insn;
  }
  
  #ifndef CONFIG_USER_ONLY

diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index 43b9ad4500..0902b64129 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -3518,8 +3518,6 @@ static RISCVException write_mmte(CPURISCVState *env, int 
csrno,
  /* for machine mode pm.current is hardwired to 1 */
  wpri_val |= MMTE_M_PM_CURRENT;
  
-/* hardwiring pm.instruction bit to 0, since it's not supported yet */

-wpri_val &= ~(MMTE_M_PM_INSN | MMTE_S_PM_INSN | MMTE_U_PM_INSN);


Reviewed-by: LIU Zhiwei 

Zhiwei


  env->mmte = wpri_val | PM_EXT_DIRTY;
  riscv_cpu_update_mask(env);
  




Re: [RESEND PATCH v5 4/6] target/riscv: Add support for PC-relative translation

2023-04-03 Thread LIU Zhiwei



On 2023/4/4 10:13, liweiwei wrote:


On 2023/4/4 09:58, LIU Zhiwei wrote:


On 2023/4/1 20:49, Weiwei Li wrote:

Add a base save_pc for PC-relative translation (CF_PCREL).
Disable the direct pc sync from tb in riscv_cpu_synchronize_from_tb.
Sync pc before it's used or updated from the tb-related pc:
    real_pc = (old)env->pc + target_pc(from tb) - ctx->save_pc
Use gen_get_target_pc to compute the target address of auipc and the
successor address of jalr.

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
Reviewed-by: Richard Henderson 
---
  target/riscv/cpu.c  | 29 +-
  target/riscv/insn_trans/trans_rvi.c.inc | 14 +--
  target/riscv/translate.c    | 53 
+

  3 files changed, 75 insertions(+), 21 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 1e97473af2..646fa31a59 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -658,16 +658,18 @@ static vaddr riscv_cpu_get_pc(CPUState *cs)
  static void riscv_cpu_synchronize_from_tb(CPUState *cs,
    const TranslationBlock *tb)
  {
-    RISCVCPU *cpu = RISCV_CPU(cs);
-    CPURISCVState *env = &cpu->env;
-    RISCVMXL xl = FIELD_EX32(tb->flags, TB_FLAGS, XL);
+    if (!(tb_cflags(tb) & CF_PCREL)) {
+    RISCVCPU *cpu = RISCV_CPU(cs);
+    CPURISCVState *env = &cpu->env;
+    RISCVMXL xl = FIELD_EX32(tb->flags, TB_FLAGS, XL);
  -    tcg_debug_assert(!(cs->tcg_cflags & CF_PCREL));
+    tcg_debug_assert(!(cs->tcg_cflags & CF_PCREL));
  -    if (xl == MXL_RV32) {
-    env->pc = (int32_t) tb->pc;
-    } else {
-    env->pc = tb->pc;
+    if (xl == MXL_RV32) {
+    env->pc = (int32_t) tb->pc;
+    } else {
+    env->pc = tb->pc;
+    }
  }
  }
  @@ -693,11 +695,18 @@ static void 
riscv_restore_state_to_opc(CPUState *cs,

  RISCVCPU *cpu = RISCV_CPU(cs);
  CPURISCVState *env = &cpu->env;
  RISCVMXL xl = FIELD_EX32(tb->flags, TB_FLAGS, XL);
+    target_ulong pc;
+
+    if (tb_cflags(tb) & CF_PCREL) {
+    pc = (env->pc & TARGET_PAGE_MASK) | data[0];
+    } else {
+    pc = data[0];
+    }
    if (xl == MXL_RV32) {
-    env->pc = (int32_t)data[0];
+    env->pc = (int32_t)pc;
  } else {
-    env->pc = data[0];
+    env->pc = pc;
  }
  env->bins = data[1];
  }
diff --git a/target/riscv/insn_trans/trans_rvi.c.inc 
b/target/riscv/insn_trans/trans_rvi.c.inc

index 48c73cfcfe..52ef260eff 100644
--- a/target/riscv/insn_trans/trans_rvi.c.inc
+++ b/target/riscv/insn_trans/trans_rvi.c.inc
@@ -38,7 +38,9 @@ static bool trans_lui(DisasContext *ctx, arg_lui *a)
    static bool trans_auipc(DisasContext *ctx, arg_auipc *a)
  {
-    gen_set_gpri(ctx, a->rd, a->imm + ctx->base.pc_next);
+    TCGv target_pc = dest_gpr(ctx, a->rd);
+    gen_get_target_pc(target_pc, ctx, a->imm + ctx->base.pc_next);
+    gen_set_gpr(ctx, a->rd, target_pc);
  return true;
  }
  @@ -52,6 +54,7 @@ static bool trans_jalr(DisasContext *ctx, 
arg_jalr *a)

  {
  TCGLabel *misaligned = NULL;
  TCGv target_pc = tcg_temp_new();
+    TCGv succ_pc = dest_gpr(ctx, a->rd);
    tcg_gen_addi_tl(target_pc, get_gpr(ctx, a->rs1, EXT_NONE), 
a->imm);

  tcg_gen_andi_tl(target_pc, target_pc, (target_ulong)-2);
@@ -68,7 +71,9 @@ static bool trans_jalr(DisasContext *ctx, arg_jalr 
*a)

  tcg_gen_brcondi_tl(TCG_COND_NE, t0, 0x0, misaligned);
  }
  -    gen_set_gpri(ctx, a->rd, ctx->pc_succ_insn);
+    gen_get_target_pc(succ_pc, ctx, ctx->pc_succ_insn);
+    gen_set_gpr(ctx, a->rd, succ_pc);
+
  tcg_gen_mov_tl(cpu_pc, target_pc);


If pointer masking is enabled, should we adjust the target_pc before it
is written into cpu_pc?


Pointer masking works on effective addresses, so I think it will not affect
the pc register value, only the fetch address.


And I remember one of its applications is memory tagging. If the pc register
were already affected by pointer masking,


the memory tag would not work.


Make sense. Thanks.

Zhiwei



Regards,

Weiwei Li




  lookup_and_goto_ptr(ctx);
  @@ -159,6 +164,7 @@ static bool gen_branch(DisasContext *ctx, 
arg_b *a, TCGCond cond)

  TCGv src1 = get_gpr(ctx, a->rs1, EXT_SIGN);
  TCGv src2 = get_gpr(ctx, a->rs2, EXT_SIGN);
  target_ulong next_pc;
+    target_ulong orig_pc_save = ctx->pc_save;
    if (get_xl(ctx) == MXL_RV128) {
  TCGv src1h = get_gprh(ctx, a->rs1);
@@ -175,6 +181,7 @@ static bool gen_branch(DisasContext *ctx, arg_b 
*a, TCGCond cond)

    gen_set_label(l); /* branch taken */
  +    ctx->pc_save = orig_pc_save;
  next_pc = ctx->base.pc_next + a->imm;
  if (!has_ext(ctx, RVC) && (next_pc & 0x3)) {
  /* misaligned */
@@ -182,8 +189,9 @@ static bool gen_branch(DisasContext *ctx, arg_b 
*a, TCGCond cond)

  gen_get_target_pc(target_pc, ctx, next_pc);
  gen_exception_inst_addr_mis(ctx, target_pc);
  } else {
-    gen_goto_tb(ctx, 0, ctx->base.pc_next + a->

Re: [edk2-devel] On integrating LoongArch EDK2 firmware into QEMU build process

2023-04-03 Thread Chao Li

On 2023/4/3 19:04, Gerd Hoffmann wrote:


On Mon, Apr 03, 2023 at 10:29:52AM +, Michael Brown wrote:

On 03/04/2023 11:13, Chao Li wrote:

This problem is because gcc-12 does not yet support the option
'-mno-explicit-reloc'. This option enables the new relocation
type for LoongArch. This new feature is very important for LoongArch,
because it can reduce the binary size and improve code execution
efficiency, so we turned it on when submitting the code to the EDK2 repo.

Is it possible to produce a _functional_ LoongArch64 EDK2 binary without
this option, even if the resulting binary is less efficient?

MdePkg/Include/IndustryStandard/PeImage.h lists a single loongarch
relocation type only, which I expect being the new type.  So I suspect
the answer is "no" because the edk2 pe loader isn't able to handle the
old relocation type(s).


Yes, the answer is "no", but the opposite is true: the LoongArch
relocation type in MdePkg/Include/IndustryStandard/PeImage.h is the
older one; it appears in that list for compatibility with binaries
using the old relocation type. If you use this type, you have to turn on
the option '-mla-global-with-abs' in gcc, so that all global symbols are
generated as "mark la" sequences, and the PE loader will use this rule to
handle them. This option is mutually exclusive with '-mno-explicit-reloc'.
The new relocation type(s) don't require special type(s) to be expressed in
PeImage.h; the PE loader doesn't need to do anything about relocation, as
the whole relocation process is done in BaseTools/Source/C/GenFw/Elf64Convert.c.



Thanks,
Chao

Re: [RFC PATCH v2 18/44] target/loongarch: Implement vsat

2023-04-03 Thread gaosong



On 2023/4/4 4:13 AM, Richard Henderson wrote:

On 4/3/23 05:55, gaosong wrote:

Hi, Richard

On 2023/4/1 1:03 PM, Richard Henderson wrote:

On 3/27/23 20:06, Song Gao wrote:
+static void gen_vsat_s(unsigned vece, TCGv_vec t, TCGv_vec a, 
int64_t imm)

+{
+    TCGv_vec t1;
+    int64_t max  = (1l << imm) - 1;


This needed 1ull, but better to just use

    max = MAKE_64BIT_MASK(0, imm - 1); 

For the signed version, use ll?
I think using MAKE_64BIT_MASK(0, imm - 1) for the signed version is
not suitable.


int64_t max = MAKE_64BIT_MASK(0, imm);
int64_t min = ~max;  // or -1 - max


The same problem with imm = 0,
MAKE_64BIT_MASK(0, 0) is always  0x. :-)

Thanks.
Song Gao




[PATCH v6 1/6] target/riscv: Fix pointer mask transformation for vector address

2023-04-03 Thread Weiwei Li
actual_address = (requested_address & ~mpmmask) | mpmbase.

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
Reviewed-by: Daniel Henrique Barboza 
Reviewed-by: LIU Zhiwei 
---
 target/riscv/vector_helper.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 2423affe37..a58d82af8c 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -172,7 +172,7 @@ static inline uint32_t vext_get_total_elems(CPURISCVState 
*env, uint32_t desc,
 
 static inline target_ulong adjust_addr(CPURISCVState *env, target_ulong addr)
 {
-return (addr & env->cur_pmmask) | env->cur_pmbase;
+return (addr & ~env->cur_pmmask) | env->cur_pmbase;
 }
 
 /*
-- 
2.25.1




[PATCH v6 0/6] target/riscv: Fix pointer mask related support

2023-04-03 Thread Weiwei Li
This patchset tries to fix some problems in the current implementation of
pointer masking, and adds support for pointer masking of instruction fetch.

The port is available here:
https://github.com/plctlab/plct-qemu/tree/plct-pm-fix-v6

v2:
* drop some erroneous patches
* Add patch 2 and 3 to fix the new problems
* Add patch 4 and 5 to use PC-relative translation for pointer mask for 
instruction fetch

v3:
* use target_pc temp instead of cpu_pc to store into badaddr in patch 3
* use dest_gpr instead of tcg_temp_new() for succ_pc in patch 4
* enable CF_PCREL for system mode in separate patch 5

v4:
* Fix wrong pc_save value for conditional jump in patch 4
* Fix tcg_cflags overwrite problem to make CF_PCREL really work in new patch 5
* Fix tb mis-matched problem in new patch 6

v5:
* use gen_get_target_pc to compute target address of auipc and successor 
address of jalr in patch 4.
* separate tcg related fix patches(5, 6) from this patchset

v6:
* rename gen_get_target_pc as gen_pc_plus_diff in patch 3 and patch 4
* use gen_pc_plus_diff to compute successor address of jal in patch 4
* Move comments from patch 5 to patch 4

Weiwei Li (6):
  target/riscv: Fix pointer mask transformation for vector address
  target/riscv: Update cur_pmmask/base when xl changes
  target/riscv: Fix target address to update badaddr
  target/riscv: Add support for PC-relative translation
  target/riscv: Enable PC-relative translation in system mode
  target/riscv: Add pointer mask support for instruction fetch

 target/riscv/cpu.c  | 31 
 target/riscv/cpu.h  |  1 +
 target/riscv/cpu_helper.c   | 20 +++-
 target/riscv/csr.c  | 11 ++--
 target/riscv/insn_trans/trans_rvi.c.inc | 37 ++
 target/riscv/translate.c| 67 ++---
 target/riscv/vector_helper.c|  2 +-
 7 files changed, 126 insertions(+), 43 deletions(-)

-- 
2.25.1




[PATCH v6 4/6] target/riscv: Add support for PC-relative translation

2023-04-03 Thread Weiwei Li
Add a base pc_save for PC-relative translation (CF_PCREL).
Disable directly syncing pc from the tb in riscv_cpu_synchronize_from_tb.
We can get the pc-relative address from the following formula:
  real_pc = (old)env->pc + diff, where diff = target_pc - ctx->pc_save.
Use gen_pc_plus_diff to compute the target address of auipc and the
successor address of jalr and jal.

The existence of CF_PCREL can improve performance with the guest
kernel's address space randomization.  Each guest process maps libc.so
(et al) at a different virtual address, and this allows those
translations to be shared.

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
Reviewed-by: Richard Henderson 
---
 target/riscv/cpu.c  | 29 +--
 target/riscv/insn_trans/trans_rvi.c.inc | 14 ++--
 target/riscv/translate.c| 48 -
 3 files changed, 70 insertions(+), 21 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 1e97473af2..646fa31a59 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -658,16 +658,18 @@ static vaddr riscv_cpu_get_pc(CPUState *cs)
 static void riscv_cpu_synchronize_from_tb(CPUState *cs,
   const TranslationBlock *tb)
 {
-RISCVCPU *cpu = RISCV_CPU(cs);
-CPURISCVState *env = &cpu->env;
-RISCVMXL xl = FIELD_EX32(tb->flags, TB_FLAGS, XL);
+if (!(tb_cflags(tb) & CF_PCREL)) {
+RISCVCPU *cpu = RISCV_CPU(cs);
+CPURISCVState *env = &cpu->env;
+RISCVMXL xl = FIELD_EX32(tb->flags, TB_FLAGS, XL);
 
-tcg_debug_assert(!(cs->tcg_cflags & CF_PCREL));
+tcg_debug_assert(!(cs->tcg_cflags & CF_PCREL));
 
-if (xl == MXL_RV32) {
-env->pc = (int32_t) tb->pc;
-} else {
-env->pc = tb->pc;
+if (xl == MXL_RV32) {
+env->pc = (int32_t) tb->pc;
+} else {
+env->pc = tb->pc;
+}
 }
 }
 
@@ -693,11 +695,18 @@ static void riscv_restore_state_to_opc(CPUState *cs,
 RISCVCPU *cpu = RISCV_CPU(cs);
 CPURISCVState *env = &cpu->env;
 RISCVMXL xl = FIELD_EX32(tb->flags, TB_FLAGS, XL);
+target_ulong pc;
+
+if (tb_cflags(tb) & CF_PCREL) {
+pc = (env->pc & TARGET_PAGE_MASK) | data[0];
+} else {
+pc = data[0];
+}
 
 if (xl == MXL_RV32) {
-env->pc = (int32_t)data[0];
+env->pc = (int32_t)pc;
 } else {
-env->pc = data[0];
+env->pc = pc;
 }
 env->bins = data[1];
 }
diff --git a/target/riscv/insn_trans/trans_rvi.c.inc 
b/target/riscv/insn_trans/trans_rvi.c.inc
index cc72864d32..7cbbdac5aa 100644
--- a/target/riscv/insn_trans/trans_rvi.c.inc
+++ b/target/riscv/insn_trans/trans_rvi.c.inc
@@ -38,7 +38,9 @@ static bool trans_lui(DisasContext *ctx, arg_lui *a)
 
 static bool trans_auipc(DisasContext *ctx, arg_auipc *a)
 {
-gen_set_gpri(ctx, a->rd, a->imm + ctx->base.pc_next);
+TCGv target_pc = dest_gpr(ctx, a->rd);
+gen_pc_plus_diff(target_pc, ctx, a->imm + ctx->base.pc_next);
+gen_set_gpr(ctx, a->rd, target_pc);
 return true;
 }
 
@@ -52,6 +54,7 @@ static bool trans_jalr(DisasContext *ctx, arg_jalr *a)
 {
 TCGLabel *misaligned = NULL;
 TCGv target_pc = tcg_temp_new();
+TCGv succ_pc = dest_gpr(ctx, a->rd);
 
 tcg_gen_addi_tl(target_pc, get_gpr(ctx, a->rs1, EXT_NONE), a->imm);
 tcg_gen_andi_tl(target_pc, target_pc, (target_ulong)-2);
@@ -68,7 +71,9 @@ static bool trans_jalr(DisasContext *ctx, arg_jalr *a)
 tcg_gen_brcondi_tl(TCG_COND_NE, t0, 0x0, misaligned);
 }
 
-gen_set_gpri(ctx, a->rd, ctx->pc_succ_insn);
+gen_pc_plus_diff(succ_pc, ctx, ctx->pc_succ_insn);
+gen_set_gpr(ctx, a->rd, succ_pc);
+
 tcg_gen_mov_tl(cpu_pc, target_pc);
 lookup_and_goto_ptr(ctx);
 
@@ -159,6 +164,7 @@ static bool gen_branch(DisasContext *ctx, arg_b *a, TCGCond 
cond)
 TCGv src1 = get_gpr(ctx, a->rs1, EXT_SIGN);
 TCGv src2 = get_gpr(ctx, a->rs2, EXT_SIGN);
 target_ulong next_pc;
+target_ulong orig_pc_save = ctx->pc_save;
 
 if (get_xl(ctx) == MXL_RV128) {
 TCGv src1h = get_gprh(ctx, a->rs1);
@@ -175,6 +181,7 @@ static bool gen_branch(DisasContext *ctx, arg_b *a, TCGCond 
cond)
 
 gen_set_label(l); /* branch taken */
 
+ctx->pc_save = orig_pc_save;
 next_pc = ctx->base.pc_next + a->imm;
 if (!has_ext(ctx, RVC) && (next_pc & 0x3)) {
 /* misaligned */
@@ -182,8 +189,9 @@ static bool gen_branch(DisasContext *ctx, arg_b *a, TCGCond 
cond)
 gen_pc_plus_diff(target_pc, ctx, next_pc);
 gen_exception_inst_addr_mis(ctx, target_pc);
 } else {
-gen_goto_tb(ctx, 0, ctx->base.pc_next + a->imm);
+gen_goto_tb(ctx, 0, next_pc);
 }
+ctx->pc_save = -1;
 ctx->base.is_jmp = DISAS_NORETURN;
 
 return true;
diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index d434fedb37..4623749602 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -59,6 +59,7 @@ typedef struct DisasContext

[PATCH v6 5/6] target/riscv: Enable PC-relative translation in system mode

2023-04-03 Thread Weiwei Li
Enable PC-relative translation in system mode by setting the CF_PCREL
flag in tcg_cflags in riscv_cpu_realize().

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
Reviewed-by: LIU Zhiwei 
Reviewed-by: Richard Henderson 
---
 target/riscv/cpu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 646fa31a59..3b562d5d9f 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -1193,6 +1193,8 @@ static void riscv_cpu_realize(DeviceState *dev, Error 
**errp)
 
 
 #ifndef CONFIG_USER_ONLY
+cs->tcg_cflags |= CF_PCREL;
+
 if (cpu->cfg.ext_sstc) {
 riscv_timer_init(cpu);
 }
-- 
2.25.1




[PATCH v6 6/6] target/riscv: Add pointer mask support for instruction fetch

2023-04-03 Thread Weiwei Li
Transform the fetch address in cpu_get_tb_cpu_state() when pointer
masking for instruction fetch is enabled.

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
Reviewed-by: Richard Henderson 
---
 target/riscv/cpu.h|  1 +
 target/riscv/cpu_helper.c | 20 +++-
 target/riscv/csr.c|  2 --
 3 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 638e47c75a..57bd9c3279 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -368,6 +368,7 @@ struct CPUArchState {
 #endif
 target_ulong cur_pmmask;
 target_ulong cur_pmbase;
+bool cur_pminsn;
 
 /* Fields from here on are preserved across CPU reset. */
 QEMUTimer *stimer; /* Internal timer for S-mode interrupt */
diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index f88c503cf4..b683a770fe 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -40,6 +40,19 @@ int riscv_cpu_mmu_index(CPURISCVState *env, bool ifetch)
 #endif
 }
 
+static target_ulong adjust_pc_address(CPURISCVState *env, target_ulong pc)
+{
+target_ulong adjust_pc = pc;
+
+if (env->cur_pminsn) {
+adjust_pc = (adjust_pc & ~env->cur_pmmask) | env->cur_pmbase;
+} else if (env->xl == MXL_RV32) {
+adjust_pc &= UINT32_MAX;
+}
+
+return adjust_pc;
+}
+
 void cpu_get_tb_cpu_state(CPURISCVState *env, target_ulong *pc,
   target_ulong *cs_base, uint32_t *pflags)
 {
@@ -48,7 +61,7 @@ void cpu_get_tb_cpu_state(CPURISCVState *env, target_ulong 
*pc,
 
 uint32_t flags = 0;
 
-*pc = env->xl == MXL_RV32 ? env->pc & UINT32_MAX : env->pc;
+*pc = adjust_pc_address(env, env->pc);
 *cs_base = 0;
 
 if (cpu->cfg.ext_zve32f) {
@@ -124,6 +137,7 @@ void cpu_get_tb_cpu_state(CPURISCVState *env, target_ulong 
*pc,
 void riscv_cpu_update_mask(CPURISCVState *env)
 {
 target_ulong mask = -1, base = 0;
+bool insn = false;
 /*
  * TODO: Current RVJ spec does not specify
  * how the extension interacts with XLEN.
@@ -135,18 +149,21 @@ void riscv_cpu_update_mask(CPURISCVState *env)
 if (env->mmte & M_PM_ENABLE) {
 mask = env->mpmmask;
 base = env->mpmbase;
+insn = env->mmte & MMTE_M_PM_INSN;
 }
 break;
 case PRV_S:
 if (env->mmte & S_PM_ENABLE) {
 mask = env->spmmask;
 base = env->spmbase;
+insn = env->mmte & MMTE_S_PM_INSN;
 }
 break;
 case PRV_U:
 if (env->mmte & U_PM_ENABLE) {
 mask = env->upmmask;
 base = env->upmbase;
+insn = env->mmte & MMTE_U_PM_INSN;
 }
 break;
 default:
@@ -161,6 +178,7 @@ void riscv_cpu_update_mask(CPURISCVState *env)
 env->cur_pmmask = mask;
 env->cur_pmbase = base;
 }
+env->cur_pminsn = insn;
 }
 
 #ifndef CONFIG_USER_ONLY
diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index 43b9ad4500..0902b64129 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -3518,8 +3518,6 @@ static RISCVException write_mmte(CPURISCVState *env, int 
csrno,
 /* for machine mode pm.current is hardwired to 1 */
 wpri_val |= MMTE_M_PM_CURRENT;
 
-/* hardwiring pm.instruction bit to 0, since it's not supported yet */
-wpri_val &= ~(MMTE_M_PM_INSN | MMTE_S_PM_INSN | MMTE_U_PM_INSN);
 env->mmte = wpri_val | PM_EXT_DIRTY;
 riscv_cpu_update_mask(env);
 
-- 
2.25.1




[PATCH v6 2/6] target/riscv: Update cur_pmmask/base when xl changes

2023-04-03 Thread Weiwei Li
write_mstatus() can only change the current xl when in debug mode,
and we need to update cur_pmmask/base in that case.

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
Reviewed-by: LIU Zhiwei 
---
 target/riscv/csr.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index d522efc0b6..43b9ad4500 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -1277,8 +1277,15 @@ static RISCVException write_mstatus(CPURISCVState *env, 
int csrno,
 mstatus = set_field(mstatus, MSTATUS64_SXL, xl);
 }
 env->mstatus = mstatus;
-env->xl = cpu_recompute_xl(env);
 
+/*
+ * Except in debug mode, UXL/SXL can only be modified by higher
+ * privilege mode. So xl will not be changed in normal mode.
+ */
+if (env->debugger) {
+env->xl = cpu_recompute_xl(env);
+riscv_cpu_update_mask(env);
+}
 return RISCV_EXCP_NONE;
 }
 
-- 
2.25.1




[PATCH v6 3/6] target/riscv: Fix target address to update badaddr

2023-04-03 Thread Weiwei Li
Compute the target address before storing it into badaddr
when a misaligned exception is triggered.
Use a target_pc temp to store the target address to avoid
the confusing sequence of updating the target address into
cpu_pc before the misalign check, then updating it into badaddr
and restoring cpu_pc to the current pc if the exception is triggered.

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
Reviewed-by: Richard Henderson 
---
 target/riscv/insn_trans/trans_rvi.c.inc | 23 ---
 target/riscv/translate.c| 21 ++---
 2 files changed, 26 insertions(+), 18 deletions(-)

diff --git a/target/riscv/insn_trans/trans_rvi.c.inc 
b/target/riscv/insn_trans/trans_rvi.c.inc
index 4ad54e8a49..cc72864d32 100644
--- a/target/riscv/insn_trans/trans_rvi.c.inc
+++ b/target/riscv/insn_trans/trans_rvi.c.inc
@@ -51,25 +51,30 @@ static bool trans_jal(DisasContext *ctx, arg_jal *a)
 static bool trans_jalr(DisasContext *ctx, arg_jalr *a)
 {
 TCGLabel *misaligned = NULL;
+TCGv target_pc = tcg_temp_new();
 
-tcg_gen_addi_tl(cpu_pc, get_gpr(ctx, a->rs1, EXT_NONE), a->imm);
-tcg_gen_andi_tl(cpu_pc, cpu_pc, (target_ulong)-2);
+tcg_gen_addi_tl(target_pc, get_gpr(ctx, a->rs1, EXT_NONE), a->imm);
+tcg_gen_andi_tl(target_pc, target_pc, (target_ulong)-2);
+
+if (get_xl(ctx) == MXL_RV32) {
+tcg_gen_ext32s_tl(target_pc, target_pc);
+}
 
-gen_set_pc(ctx, cpu_pc);
 if (!has_ext(ctx, RVC)) {
 TCGv t0 = tcg_temp_new();
 
 misaligned = gen_new_label();
-tcg_gen_andi_tl(t0, cpu_pc, 0x2);
+tcg_gen_andi_tl(t0, target_pc, 0x2);
 tcg_gen_brcondi_tl(TCG_COND_NE, t0, 0x0, misaligned);
 }
 
 gen_set_gpri(ctx, a->rd, ctx->pc_succ_insn);
+tcg_gen_mov_tl(cpu_pc, target_pc);
 lookup_and_goto_ptr(ctx);
 
 if (misaligned) {
 gen_set_label(misaligned);
-gen_exception_inst_addr_mis(ctx);
+gen_exception_inst_addr_mis(ctx, target_pc);
 }
 ctx->base.is_jmp = DISAS_NORETURN;
 
@@ -153,6 +158,7 @@ static bool gen_branch(DisasContext *ctx, arg_b *a, TCGCond 
cond)
 TCGLabel *l = gen_new_label();
 TCGv src1 = get_gpr(ctx, a->rs1, EXT_SIGN);
 TCGv src2 = get_gpr(ctx, a->rs2, EXT_SIGN);
+target_ulong next_pc;
 
 if (get_xl(ctx) == MXL_RV128) {
 TCGv src1h = get_gprh(ctx, a->rs1);
@@ -169,9 +175,12 @@ static bool gen_branch(DisasContext *ctx, arg_b *a, 
TCGCond cond)
 
 gen_set_label(l); /* branch taken */
 
-if (!has_ext(ctx, RVC) && ((ctx->base.pc_next + a->imm) & 0x3)) {
+next_pc = ctx->base.pc_next + a->imm;
+if (!has_ext(ctx, RVC) && (next_pc & 0x3)) {
 /* misaligned */
-gen_exception_inst_addr_mis(ctx);
+TCGv target_pc = tcg_temp_new();
+gen_pc_plus_diff(target_pc, ctx, next_pc);
+gen_exception_inst_addr_mis(ctx, target_pc);
 } else {
 gen_goto_tb(ctx, 0, ctx->base.pc_next + a->imm);
 }
diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index 0ee8ee147d..d434fedb37 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -222,21 +222,18 @@ static void decode_save_opc(DisasContext *ctx)
 ctx->insn_start = NULL;
 }
 
-static void gen_set_pc_imm(DisasContext *ctx, target_ulong dest)
+static void gen_pc_plus_diff(TCGv target, DisasContext *ctx,
+  target_ulong dest)
 {
 if (get_xl(ctx) == MXL_RV32) {
 dest = (int32_t)dest;
 }
-tcg_gen_movi_tl(cpu_pc, dest);
+tcg_gen_movi_tl(target, dest);
 }
 
-static void gen_set_pc(DisasContext *ctx, TCGv dest)
+static void gen_set_pc_imm(DisasContext *ctx, target_ulong dest)
 {
-if (get_xl(ctx) == MXL_RV32) {
-tcg_gen_ext32s_tl(cpu_pc, dest);
-} else {
-tcg_gen_mov_tl(cpu_pc, dest);
-}
+gen_pc_plus_diff(cpu_pc, ctx, dest);
 }
 
 static void generate_exception(DisasContext *ctx, int excp)
@@ -257,9 +254,9 @@ static void gen_exception_illegal(DisasContext *ctx)
 }
 }
 
-static void gen_exception_inst_addr_mis(DisasContext *ctx)
+static void gen_exception_inst_addr_mis(DisasContext *ctx, TCGv target)
 {
-tcg_gen_st_tl(cpu_pc, cpu_env, offsetof(CPURISCVState, badaddr));
+tcg_gen_st_tl(target, cpu_env, offsetof(CPURISCVState, badaddr));
 generate_exception(ctx, RISCV_EXCP_INST_ADDR_MIS);
 }
 
@@ -551,7 +548,9 @@ static void gen_jal(DisasContext *ctx, int rd, target_ulong 
imm)
 next_pc = ctx->base.pc_next + imm;
 if (!has_ext(ctx, RVC)) {
 if ((next_pc & 0x3) != 0) {
-gen_exception_inst_addr_mis(ctx);
+TCGv target_pc = tcg_temp_new();
+gen_pc_plus_diff(target_pc, ctx, next_pc);
+gen_exception_inst_addr_mis(ctx, target_pc);
 return;
 }
 }
-- 
2.25.1




Re: [RESEND PATCH v5 4/6] target/riscv: Add support for PC-relative translation

2023-04-03 Thread LIU Zhiwei



On 2023/4/1 20:49, Weiwei Li wrote:

Add a base save_pc for PC-relative translation (CF_PCREL).
Disable directly syncing pc from the tb in riscv_cpu_synchronize_from_tb.
Sync pc before it's used or updated from the tb-related pc:
real_pc = (old)env->pc + target_pc(from tb) - ctx->save_pc
Use gen_get_target_pc to compute the target address of auipc and the successor
address of jalr.

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
Reviewed-by: Richard Henderson 
---
  target/riscv/cpu.c  | 29 +-
  target/riscv/insn_trans/trans_rvi.c.inc | 14 +--
  target/riscv/translate.c| 53 +
  3 files changed, 75 insertions(+), 21 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 1e97473af2..646fa31a59 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -658,16 +658,18 @@ static vaddr riscv_cpu_get_pc(CPUState *cs)
  static void riscv_cpu_synchronize_from_tb(CPUState *cs,
const TranslationBlock *tb)
  {
-RISCVCPU *cpu = RISCV_CPU(cs);
-CPURISCVState *env = &cpu->env;
-RISCVMXL xl = FIELD_EX32(tb->flags, TB_FLAGS, XL);
+if (!(tb_cflags(tb) & CF_PCREL)) {
+RISCVCPU *cpu = RISCV_CPU(cs);
+CPURISCVState *env = &cpu->env;
+RISCVMXL xl = FIELD_EX32(tb->flags, TB_FLAGS, XL);
  
-tcg_debug_assert(!(cs->tcg_cflags & CF_PCREL));

+tcg_debug_assert(!(cs->tcg_cflags & CF_PCREL));
  
-if (xl == MXL_RV32) {

-env->pc = (int32_t) tb->pc;
-} else {
-env->pc = tb->pc;
+if (xl == MXL_RV32) {
+env->pc = (int32_t) tb->pc;
+} else {
+env->pc = tb->pc;
+}
  }
  }
  
@@ -693,11 +695,18 @@ static void riscv_restore_state_to_opc(CPUState *cs,

  RISCVCPU *cpu = RISCV_CPU(cs);
  CPURISCVState *env = &cpu->env;
  RISCVMXL xl = FIELD_EX32(tb->flags, TB_FLAGS, XL);
+target_ulong pc;
+
+if (tb_cflags(tb) & CF_PCREL) {
+pc = (env->pc & TARGET_PAGE_MASK) | data[0];
+} else {
+pc = data[0];
+}
  
  if (xl == MXL_RV32) {

-env->pc = (int32_t)data[0];
+env->pc = (int32_t)pc;
  } else {
-env->pc = data[0];
+env->pc = pc;
  }
  env->bins = data[1];
  }
diff --git a/target/riscv/insn_trans/trans_rvi.c.inc 
b/target/riscv/insn_trans/trans_rvi.c.inc
index 48c73cfcfe..52ef260eff 100644
--- a/target/riscv/insn_trans/trans_rvi.c.inc
+++ b/target/riscv/insn_trans/trans_rvi.c.inc
@@ -38,7 +38,9 @@ static bool trans_lui(DisasContext *ctx, arg_lui *a)
  
  static bool trans_auipc(DisasContext *ctx, arg_auipc *a)

  {
-gen_set_gpri(ctx, a->rd, a->imm + ctx->base.pc_next);
+TCGv target_pc = dest_gpr(ctx, a->rd);
+gen_get_target_pc(target_pc, ctx, a->imm + ctx->base.pc_next);
+gen_set_gpr(ctx, a->rd, target_pc);
  return true;
  }
  
@@ -52,6 +54,7 @@ static bool trans_jalr(DisasContext *ctx, arg_jalr *a)

  {
  TCGLabel *misaligned = NULL;
  TCGv target_pc = tcg_temp_new();
+TCGv succ_pc = dest_gpr(ctx, a->rd);
  
  tcg_gen_addi_tl(target_pc, get_gpr(ctx, a->rs1, EXT_NONE), a->imm);

  tcg_gen_andi_tl(target_pc, target_pc, (target_ulong)-2);
@@ -68,7 +71,9 @@ static bool trans_jalr(DisasContext *ctx, arg_jalr *a)
  tcg_gen_brcondi_tl(TCG_COND_NE, t0, 0x0, misaligned);
  }
  
-gen_set_gpri(ctx, a->rd, ctx->pc_succ_insn);

+gen_get_target_pc(succ_pc, ctx, ctx->pc_succ_insn);
+gen_set_gpr(ctx, a->rd, succ_pc);
+
  tcg_gen_mov_tl(cpu_pc, target_pc);


If pointer masking is enabled, should we adjust the target_pc before it is 
written into cpu_pc?



  lookup_and_goto_ptr(ctx);
  
@@ -159,6 +164,7 @@ static bool gen_branch(DisasContext *ctx, arg_b *a, TCGCond cond)

  TCGv src1 = get_gpr(ctx, a->rs1, EXT_SIGN);
  TCGv src2 = get_gpr(ctx, a->rs2, EXT_SIGN);
  target_ulong next_pc;
+target_ulong orig_pc_save = ctx->pc_save;
  
  if (get_xl(ctx) == MXL_RV128) {

  TCGv src1h = get_gprh(ctx, a->rs1);
@@ -175,6 +181,7 @@ static bool gen_branch(DisasContext *ctx, arg_b *a, TCGCond 
cond)
  
  gen_set_label(l); /* branch taken */
  
+ctx->pc_save = orig_pc_save;

  next_pc = ctx->base.pc_next + a->imm;
  if (!has_ext(ctx, RVC) && (next_pc & 0x3)) {
  /* misaligned */
@@ -182,8 +189,9 @@ static bool gen_branch(DisasContext *ctx, arg_b *a, TCGCond 
cond)
  gen_get_target_pc(target_pc, ctx, next_pc);
  gen_exception_inst_addr_mis(ctx, target_pc);
  } else {
-gen_goto_tb(ctx, 0, ctx->base.pc_next + a->imm);
+gen_goto_tb(ctx, 0, next_pc);
  }
+ctx->pc_save = -1;
  ctx->base.is_jmp = DISAS_NORETURN;
  
  return true;

diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index 7b5223efc2..2dd594ddae 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -59,6 +59,7 @@ typedef struct DisasContext {
  DisasContextBase bas

Re: [RFC PATCH v2 41/44] target/loongarch: Implement vilvl vilvh vextrins vshuf

2023-04-03 Thread Richard Henderson

On 3/27/23 20:06, Song Gao wrote:

+void HELPER(vshuf_b)(CPULoongArchState *env,
+ uint32_t vd, uint32_t vj, uint32_t vk, uint32_t va)
+{
+int i, m, k;
+VReg temp;
+VReg *Vd = &(env->fpr[vd].vreg);
+VReg *Vj = &(env->fpr[vj].vreg);
+VReg *Vk = &(env->fpr[vk].vreg);
+VReg *Va = &(env->fpr[va].vreg);
+
+m = LSX_LEN/8;
+for (i = 0; i < m ; i++) {
+k = (Va->B(i)& 0x3f) % (2 * m);


Eh?  Double masking?


+temp.B(i) = (Va->B(i) & 0xc0) ? 0 : k < m ? Vk->B(k) : Vj->B(k - m);


Triple masking?

I would have expected something like

k = Va->B(i) % N;
temp.B(i) = (k < m ? Vj : k < 2 * m ? Vk : 0);


+#define VSHUF(NAME, BIT, E)  \
+void HELPER(NAME)(CPULoongArchState *env,\
+  uint32_t vd, uint32_t vj, uint32_t vk) \
+{\
+int i, m, k; \
+VReg temp;   \
+VReg *Vd = &(env->fpr[vd].vreg); \
+VReg *Vj = &(env->fpr[vj].vreg); \
+VReg *Vk = &(env->fpr[vk].vreg); \
+ \
+m = LSX_LEN/BIT; \
+for (i = 0; i < m; i++) {\
+k  = (Vd->E(i) & 0x3f) % (2 * m);\
+temp.E(i) = (Vd->E(i) & 0xc0) ? 0 : k < m ? Vk->E(k) : Vj->E(k - m); \
+}\
+Vd->D(0) = temp.D(0);\
+Vd->D(1) = temp.D(1);\
+}


Likewise.


+#define SHF_POS(i, imm) (((i) & 0xfc) + (((imm) >> (2 * ((i) & 0x03))) & 0x03))
+
+#define VSHUF4I(NAME, BIT, E) \
+void HELPER(NAME)(CPULoongArchState *env, \
+  uint32_t vd, uint32_t vj, uint32_t imm) \
+{ \
+int i;\
+VReg temp;\
+VReg *Vd = &(env->fpr[vd].vreg);  \
+VReg *Vj = &(env->fpr[vj].vreg);  \
+  \
+for (i = 0; i < LSX_LEN/BIT; i++) {   \
+ temp.E(i) = Vj->E(SHF_POS(i, imm));  \
+} \
+Vd->D[0] = temp.D[0]; \
+Vd->D[1] = temp.D[1]; \
+}


Merge SHF_POS unless you expect it to be used again?


+void HELPER(vshuf4i_d)(CPULoongArchState *env,
+   uint32_t vd, uint32_t vj, uint32_t imm)
+{
+VReg *Vd = &(env->fpr[vd].vreg);
+VReg *Vj = &(env->fpr[vj].vreg);
+
+VReg temp;
+temp.D(0) = ((imm & 0x03) == 0x00) ? Vd->D(0):
+((imm & 0x03) == 0x01) ? Vd->D(1):
+((imm & 0x03) == 0x02) ? Vj->D(0): Vj->D(1);
+
+temp.D(1) = ((imm & 0x0c) == 0x00) ? Vd->D(0):
+((imm & 0x0c) == 0x04) ? Vd->D(1):
+((imm & 0x0c) == 0x08) ? Vj->D(0): Vj->D(1);
+
+Vd->D[0] = temp.D[0];
+Vd->D[1] = temp.D[1];
+}


Perhaps

temp.D(0) = (imm & 2 ? Vj : Vd)->D(imm & 1);
temp.D(1) = (imm & 8 ? Vj : Vd)->D((imm >> 2) & 1);


r~



[PATCH qemu v2] sev/i386: Fix error reporting

2023-04-03 Thread Alexey Kardashevskiy
c9f5aaa6bce8 ("sev: Add Error ** to sev_kvm_init()") converted
error_report() to error_setg(); however, it missed one error_report(),
and two other changes added error_report() calls after the conversion.
The result is that the caller - kvm_init() - crashes in error_report_err()
as local_err is NULL.

Follow the pattern and use error_setg instead of error_report.
Remove the __func__ anti-pattern.

Fixes: 9681f8677f26 ("sev/i386: Require in-kernel irqchip support for SEV-ES 
guests")
Fixes: 6b98e96f1842 ("sev/i386: Add initial support for SEV-ES")
Fixes: c9f5aaa6bce8 ("sev: Add Error ** to sev_kvm_init()")
Signed-off-by: Alexey Kardashevskiy 
---
Changes:
v2:
* removed __func__ from afftected lines
---
 target/i386/sev.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index 859e06f6ad..eabb095a69 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -922,7 +922,7 @@ int sev_kvm_init(ConfidentialGuestSupport *cgs, Error 
**errp)
 
 ret = ram_block_discard_disable(true);
 if (ret) {
-error_report("%s: cannot disable RAM discard", __func__);
+error_setg(errp, "Cannot disable RAM discard");
 return -1;
 }
 
@@ -968,15 +968,12 @@ int sev_kvm_init(ConfidentialGuestSupport *cgs, Error 
**errp)
 
 if (sev_es_enabled()) {
 if (!kvm_kernel_irqchip_allowed()) {
-error_report("%s: SEV-ES guests require in-kernel irqchip support",
- __func__);
+error_setg(errp, "SEV-ES guests require in-kernel irqchip 
support");
 goto err;
 }
 
 if (!(status.flags & SEV_STATUS_FLAGS_CONFIG_ES)) {
-error_report("%s: guest policy requires SEV-ES, but "
- "host SEV-ES support unavailable",
- __func__);
+error_setg(errp, "Guest policy requires SEV-ES, but host SEV-ES 
support unavailable");
 goto err;
 }
 cmd = KVM_SEV_ES_INIT;
-- 
2.39.1




[PATCH v2 1/3] qapi/machine-target: refactor machine-target

2023-04-03 Thread Dinah Baum
Move architecture-agnostic data types to their own
file to avoid the "attempt to use poisoned TARGET_*"
error that results when including a qapi header
with commands that aren't defined for all architectures.
This is required to implement enabling `query-cpu-model-expansion`
on all architectures.

Signed-off-by: Dinah Baum 
Reviewed-by: Philippe Mathieu-Daudé 
---
 MAINTAINERS |  1 +
 qapi/machine-target-common.json | 79 +
 qapi/machine-target.json| 73 +-
 qapi/meson.build|  1 +
 4 files changed, 82 insertions(+), 72 deletions(-)
 create mode 100644 qapi/machine-target-common.json

diff --git a/MAINTAINERS b/MAINTAINERS
index ef45b5e71e..fbc4d7be66 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1751,6 +1751,7 @@ F: hw/core/numa.c
 F: hw/cpu/cluster.c
 F: qapi/machine.json
 F: qapi/machine-target.json
+F: qapi/machine-target-common.json
 F: include/hw/boards.h
 F: include/hw/core/cpu.h
 F: include/hw/cpu/cluster.h
diff --git a/qapi/machine-target-common.json b/qapi/machine-target-common.json
new file mode 100644
index 00..1e6da3177d
--- /dev/null
+++ b/qapi/machine-target-common.json
@@ -0,0 +1,79 @@
+# -*- Mode: Python -*-
+# vim: filetype=python
+
+##
+# = Common data types for machine target commands
+##
+
+##
+# @CpuModelInfo:
+#
+# Virtual CPU model.
+#
+# A CPU model consists of the name of a CPU definition, to which
+# delta changes are applied (e.g. features added/removed). Most magic values
+# that an architecture might require should be hidden behind the name.
+# However, if required, architectures can expose relevant properties.
+#
+# @name: the name of the CPU definition the model is based on
+# @props: a dictionary of QOM properties to be applied
+#
+# Since: 2.8
+##
+{ 'struct': 'CpuModelInfo',
+'data': { 'name': 'str',
+  '*props': 'any' } }
+
+##
+# @CpuModelExpansionType:
+#
+# An enumeration of CPU model expansion types.
+#
+# @static: Expand to a static CPU model, a combination of a static base
+#  model name and property delta changes. As the static base model will
+#  never change, the expanded CPU model will be the same, independent 
of
+#  QEMU version, machine type, machine options, and accelerator 
options.
+#  Therefore, the resulting model can be used by tooling without having
+#  to specify a compatibility machine - e.g. when displaying the "host"
+#  model. The @static CPU models are migration-safe.
+
+# @full: Expand all properties. The produced model is not guaranteed to be
+#migration-safe, but allows tooling to get an insight and work with
+#model details.
+#
+# Note: When a non-migration-safe CPU model is expanded in static mode, some
+#   features enabled by the CPU model may be omitted, because they can't be
+#   implemented by a static CPU model definition (e.g. cache info 
passthrough and
+#   PMU passthrough in x86). If you need an accurate representation of the
+#   features enabled by a non-migration-safe CPU model, use @full. If you 
need a
+#   static representation that will keep ABI compatibility even when 
changing QEMU
+#   version or machine-type, use @static (but keep in mind that some 
features may
+#   be omitted).
+#
+# Since: 2.8
+##
+{ 'enum': 'CpuModelExpansionType',
+  'data': [ 'static', 'full' ] }
+
+##
+# @CpuModelCompareResult:
+#
+# An enumeration of CPU model comparison results. The result is usually
+# calculated using e.g. CPU features or CPU generations.
+#
+# @incompatible: If model A is incompatible to model B, model A is not
+#guaranteed to run where model B runs and the other way around.
+#
+# @identical: If model A is identical to model B, model A is guaranteed to run
+# where model B runs and the other way around.
+#
+# @superset: If model A is a superset of model B, model B is guaranteed to run
+#where model A runs. There are no guarantees about the other way.
+#
+# @subset: If model A is a subset of model B, model A is guaranteed to run
+#  where model B runs. There are no guarantees about the other way.
+#
+# Since: 2.8
+##
+{ 'enum': 'CpuModelCompareResult',
+  'data': [ 'incompatible', 'identical', 'superset', 'subset' ] }
diff --git a/qapi/machine-target.json b/qapi/machine-target.json
index 2e267fa458..1cacfde88e 100644
--- a/qapi/machine-target.json
+++ b/qapi/machine-target.json
@@ -4,78 +4,7 @@
 # This work is licensed under the terms of the GNU GPL, version 2 or later.
 # See the COPYING file in the top-level directory.
 
-##
-# @CpuModelInfo:
-#
-# Virtual CPU model.
-#
-# A CPU model consists of the name of a CPU definition, to which
-# delta changes are applied (e.g. features added/removed). Most magic values
-# that an architecture might require should be hidden behind the name.
-# However, if required, architectures can expose relevant properties.
-#
-# @name: the name of the CPU 

[PATCH v2 2/3] cpu, qapi, target/arm, i386, s390x: Generalize query-cpu-model-expansion

2023-04-03 Thread Dinah Baum
This patch enables 'query-cpu-model-expansion' on all
architectures. Only architectures that implement
the command will return results; others will return an
error message, as before.

This patch lays the groundwork for parsing a
-cpu cpu,help option as specified in
https://gitlab.com/qemu-project/qemu/-/issues/1480

Signed-off-by: Dinah Baum 
---
 cpu.c| 20 
 include/exec/cpu-common.h|  8 +
 qapi/machine-target-common.json  | 51 +
 qapi/machine-target.json | 56 
 target/arm/arm-qmp-cmds.c|  7 ++--
 target/arm/cpu.h |  7 +++-
 target/i386/cpu-sysemu.c |  7 ++--
 target/i386/cpu.h|  6 
 target/s390x/cpu.h   |  7 
 target/s390x/cpu_models_sysemu.c |  6 ++--
 10 files changed, 108 insertions(+), 67 deletions(-)

diff --git a/cpu.c b/cpu.c
index 849bac062c..daf4e1ff0d 100644
--- a/cpu.c
+++ b/cpu.c
@@ -292,6 +292,26 @@ void list_cpus(const char *optarg)
 #endif
 }
 
+CpuModelExpansionInfo *get_cpu_model_expansion_info(CpuModelExpansionType type,
+CpuModelInfo *model,
+Error **errp)
+{
+/* XXX: implement cpu_model_expansion for targets that still miss it */
+#if defined(cpu_model_expansion)
+return cpu_model_expansion(type, model, errp);
+#else
+error_setg(errp, "Could not query cpu model information");
+return NULL;
+#endif
+}
+
+CpuModelExpansionInfo *qmp_query_cpu_model_expansion(CpuModelExpansionType 
type,
+ CpuModelInfo *model,
+ Error **errp)
+{
+return get_cpu_model_expansion_info(type, model, errp);
+}
+
 #if defined(CONFIG_USER_ONLY)
 void tb_invalidate_phys_addr(target_ulong addr)
 {
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index 6feaa40ca7..ec6024dfde 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -7,6 +7,8 @@
 #include "exec/hwaddr.h"
 #endif
 
+#include "qapi/qapi-commands-machine-target-common.h"
+
 /**
  * vaddr:
  * Type wide enough to contain any #target_ulong virtual address.
@@ -166,5 +168,11 @@ int cpu_memory_rw_debug(CPUState *cpu, vaddr addr,
 extern int singlestep;
 
 void list_cpus(const char *optarg);
+typedef void (*cpu_model_expansion_func)(CpuModelExpansionType type,
+ CpuModelInfo *model,
+ Error **errp);
+CpuModelExpansionInfo *get_cpu_model_expansion_info(CpuModelExpansionType type,
+CpuModelInfo *model,
+Error **errp);
 
 #endif /* CPU_COMMON_H */
diff --git a/qapi/machine-target-common.json b/qapi/machine-target-common.json
index 1e6da3177d..44713e9935 100644
--- a/qapi/machine-target-common.json
+++ b/qapi/machine-target-common.json
@@ -77,3 +77,54 @@
 ##
 { 'enum': 'CpuModelCompareResult',
   'data': [ 'incompatible', 'identical', 'superset', 'subset' ] }
+
+##
+# @CpuModelExpansionInfo:
+#
+# The result of a cpu model expansion.
+#
+# @model: the expanded CpuModelInfo.
+#
+# Since: 2.8
+##
+{ 'struct': 'CpuModelExpansionInfo',
+  'data': { 'model': 'CpuModelInfo' } }
+
+##
+# @query-cpu-model-expansion:
+#
+# Expands a given CPU model (or a combination of CPU model + additional 
options)
+# to different granularities, allowing tooling to get an understanding what a
+# specific CPU model looks like in QEMU under a certain configuration.
+#
+# This interface can be used to query the "host" CPU model.
+#
+# The data returned by this command may be affected by:
+#
+# * QEMU version: CPU models may look different depending on the QEMU version.
+#   (Except for CPU models reported as "static" in query-cpu-definitions.)
+# * machine-type: CPU model  may look different depending on the machine-type.
+#   (Except for CPU models reported as "static" in query-cpu-definitions.)
+# * machine options (including accelerator): in some architectures, CPU models
+#   may look different depending on machine and accelerator options. (Except 
for
+#   CPU models reported as "static" in query-cpu-definitions.)
+# * "-cpu" arguments and global properties: arguments to the -cpu option and
+#   global properties may affect expansion of CPU models. Using
+#   query-cpu-model-expansion while using these is not advised.
+#
+# Some architectures may not support all expansion types. s390x supports
+# "full" and "static". Arm only supports "full".
+#
+# Returns: a CpuModelExpansionInfo. Returns an error if expanding CPU models is
+#  not supported, if the model cannot be expanded, if the model contains
+#  an unknown CPU definition name, unknown properties or properties
+#  with a wrong type. Also returns an error if an expansion type is
+# 

[PATCH v2 3/3] cpu, qdict, vl: Enable printing options for CPU type

2023-04-03 Thread Dinah Baum
Change parsing of -cpu argument to allow -cpu cpu,help
to print options for the CPU type similar to
how the '-device' option works.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1480

Signed-off-by: Dinah Baum 
---
 cpu.c | 41 +++
 include/exec/cpu-common.h |  2 ++
 include/qapi/qmp/qdict.h  |  2 ++
 qemu-options.hx   |  7 ---
 qobject/qdict.c   |  5 +
 softmmu/vl.c  | 36 --
 6 files changed, 88 insertions(+), 5 deletions(-)

diff --git a/cpu.c b/cpu.c
index daf4e1ff0d..5f8a72e51f 100644
--- a/cpu.c
+++ b/cpu.c
@@ -23,7 +23,9 @@
 #include "exec/target_page.h"
 #include "hw/qdev-core.h"
 #include "hw/qdev-properties.h"
+#include "qemu/cutils.h"
 #include "qemu/error-report.h"
+#include "qemu/qemu-print.h"
 #include "migration/vmstate.h"
 #ifdef CONFIG_USER_ONLY
 #include "qemu.h"
@@ -43,6 +45,8 @@
 #include "trace/trace-root.h"
 #include "qemu/accel.h"
 #include "qemu/plugin.h"
+#include "qapi/qmp/qdict.h"
+#include "qapi/qmp/qobject.h"
 
 uintptr_t qemu_host_page_size;
 intptr_t qemu_host_page_mask;
@@ -312,6 +316,43 @@ CpuModelExpansionInfo *qmp_query_cpu_model_expansion(CpuModelExpansionType type,
 return get_cpu_model_expansion_info(type, model, errp);
 }
 
+void list_cpu_model_expansion(CpuModelExpansionType type,
+  CpuModelInfo *model,
+  Error **errp)
+{
+CpuModelExpansionInfo *expansion_info;
+QDict *qdict;
+QDictEntry *qdict_entry;
+const char *key;
+QObject *obj;
+QType q_type;
+GPtrArray *array;
+int i;
+const char *type_name;
+
+expansion_info = get_cpu_model_expansion_info(type, model, errp);
+if (expansion_info) {
+qdict = qobject_to(QDict, expansion_info->model->props);
+if (qdict) {
+qemu_printf("%s features:\n", model->name);
+array = g_ptr_array_new();
+for (qdict_entry = (QDictEntry *)qdict_first(qdict); qdict_entry;
+ qdict_entry = (QDictEntry *)qdict_next(qdict, qdict_entry)) {
+g_ptr_array_add(array, qdict_entry);
+}
+g_ptr_array_sort(array, (GCompareFunc)dict_key_compare);
+for (i = 0; i < array->len; i++) {
+qdict_entry = array->pdata[i];
+key = qdict_entry_key(qdict_entry);
+obj = qdict_get(qdict, key);
+q_type = qobject_type(obj);
+type_name = QType_str(q_type);
+qemu_printf("  %s=<%s>\n", key, type_name);
+}
+}
+}
+}
+
 #if defined(CONFIG_USER_ONLY)
 void tb_invalidate_phys_addr(target_ulong addr)
 {
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index ec6024dfde..8fc05307ad 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -174,5 +174,7 @@ typedef void (*cpu_model_expansion_func)(CpuModelExpansionType type,
 CpuModelExpansionInfo *get_cpu_model_expansion_info(CpuModelExpansionType type,
 CpuModelInfo *model,
 Error **errp);
+void list_cpu_model_expansion(CpuModelExpansionType type,
+  CpuModelInfo *model, Error **errp);
 
 #endif /* CPU_COMMON_H */
diff --git a/include/qapi/qmp/qdict.h b/include/qapi/qmp/qdict.h
index 82e90fc072..1ff9523a13 100644
--- a/include/qapi/qmp/qdict.h
+++ b/include/qapi/qmp/qdict.h
@@ -68,4 +68,6 @@ const char *qdict_get_try_str(const QDict *qdict, const char *key);
 
 QDict *qdict_clone_shallow(const QDict *src);
 
+int dict_key_compare(QDictEntry **entry1, QDictEntry **entry2);
+
 #endif /* QDICT_H */
diff --git a/qemu-options.hx b/qemu-options.hx
index 59bdf67a2c..10601626b7 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -169,11 +169,12 @@ SRST
 ERST
 
 DEF("cpu", HAS_ARG, QEMU_OPTION_cpu,
-"-cpu cpuselect CPU ('-cpu help' for list)\n", QEMU_ARCH_ALL)
+"-cpu cpuselect CPU ('-cpu help' for list)\n"
+"use '-cpu cpu,help' to print possible properties\n", QEMU_ARCH_ALL)
 SRST
 ``-cpu model``
-Select CPU model (``-cpu help`` for list and additional feature
-selection)
+Select CPU model (``-cpu help`` and ``-cpu cpu,help``) for list and additional feature
+selection
 ERST
 
 DEF("accel", HAS_ARG, QEMU_OPTION_accel,
diff --git a/qobject/qdict.c b/qobject/qdict.c
index 8faff230d3..31407e62f6 100644
--- a/qobject/qdict.c
+++ b/qobject/qdict.c
@@ -447,3 +447,8 @@ void qdict_unref(QDict *q)
 {
 qobject_unref(q);
 }
+
+int dict_key_compare(QDictEntry **entry1, QDictEntry **entry2)
+{
+return g_strcmp0(qdict_entry_key(*entry1), qdict_entry_key(*entry2));
+}
diff --git a/softmmu/vl.c b/softmmu/vl.c
index ea20b23e4c..af6753a7e3 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -500,6 +500,15 @@ static QemuOptsList qemu_action_opts = {
 },
 }

[RESEND PATCH v2 0/3] Enable -cpu ,help

2023-04-03 Thread Dinah Baum
Part 1 is a refactor/code motion patch for
qapi/machine target required for setup of

Part 2 which enables query-cpu-model-expansion
on all architectures

Part 3 implements the ',help' feature

Limitations:
Currently only 'FULL' expansion queries are implemented since
that's the only type enabled on the architectures that
allow feature probing

Unlike the 'device,help' command, default values aren't
printed

Changes since v2: Rebase

Dinah Baum (3):
  qapi/machine-target: refactor machine-target
  cpu, qapi, target/arm, i386, s390x: Generalize
query-cpu-model-expansion
  cpu, qdict, vl: Enable printing options for CPU type

 MAINTAINERS  |   1 +
 cpu.c|  61 +++
 include/exec/cpu-common.h|  10 +++
 include/qapi/qmp/qdict.h |   2 +
 qapi/machine-target-common.json  | 130 +++
 qapi/machine-target.json | 129 +-
 qapi/meson.build |   1 +
 qemu-options.hx  |   7 +-
 qobject/qdict.c  |   5 ++
 softmmu/vl.c |  36 -
 target/arm/arm-qmp-cmds.c|   7 +-
 target/arm/cpu.h |   7 +-
 target/i386/cpu-sysemu.c |   7 +-
 target/i386/cpu.h|   6 ++
 target/s390x/cpu.h   |   7 ++
 target/s390x/cpu_models_sysemu.c |   6 +-
 16 files changed, 278 insertions(+), 144 deletions(-)
 create mode 100644 qapi/machine-target-common.json

-- 
2.30.2




Re: [RFC PATCH v2 40/44] target/loongarch: Implement vreplve vpack vpick

2023-04-03 Thread Richard Henderson

On 3/27/23 20:06, Song Gao wrote:

+static bool trans_vbsll_v(DisasContext *ctx, arg_vv_i *a)
+{
+int ofs;
+TCGv_i64 desthigh, destlow, high, low, t;
+
+CHECK_SXE;
+
+desthigh = tcg_temp_new_i64();
+destlow = tcg_temp_new_i64();
+high = tcg_temp_new_i64();
+low = tcg_temp_new_i64();
+t = tcg_constant_i64(0);
+
+tcg_gen_ld_i64(high, cpu_env,
+   offsetof(CPULoongArchState, fpr[a->vj].vreg.D(1)));
+tcg_gen_ld_i64(low, cpu_env,
+   offsetof(CPULoongArchState, fpr[a->vj].vreg.D(0)));
+
+ofs = ((a->imm) & 0xf) * 8;
+if (ofs < 64) {
+tcg_gen_extract2_i64(desthigh, low, high, 64 -ofs);


high is only used here, therefore the load should be delayed.


+tcg_gen_shli_i64(destlow, low, ofs);
+} else {
+tcg_gen_shli_i64(desthigh, low, ofs -64);
+tcg_gen_mov_i64(destlow, t);


Delay the allocation of destlow into the < 64 block,
then simply assign destlow = tcg_constant_i64(0) here.

Watch the spacing: "ofs - 64".

Similarly for trans_vbsrl_v.

Otherwise,
Reviewed-by: Richard Henderson 


r~



Re: [RFC PATCH v2 39/44] target/loongarch: Implement vinsgr2vr vpickve2gr vreplgr2vr

2023-04-03 Thread Richard Henderson

On 3/27/23 20:06, Song Gao wrote:

This patch includes:
- VINSGR2VR.{B/H/W/D};
- VPICKVE2GR.{B/H/W/D}[U];
- VREPLGR2VR.{B/H/W/D}.

Signed-off-by: Song Gao
---
  target/loongarch/disas.c|  33 ++
  target/loongarch/insn_trans/trans_lsx.c.inc | 110 
  target/loongarch/insns.decode   |  30 ++
  3 files changed, 173 insertions(+)


Reviewed-by: Richard Henderson 

r~



Re: [RFC PATCH v2 38/44] target/loongarch: Implement vbitsel vset

2023-04-03 Thread Richard Henderson

On 3/27/23 20:06, Song Gao wrote:

+static void gen_vbitseli(unsigned vece, TCGv_vec a, TCGv_vec b, int64_t imm)
+{
+TCGv_vec t;
+
+t = tcg_temp_new_vec_matching(a);
+tcg_gen_dupi_vec(vece, t, imm);


tcg_constant_vec_matching.


+void HELPER(vseteqz_v)(CPULoongArchState *env, uint32_t cd, uint32_t vj)
+{
+VReg *Vj = &(env->fpr[vj].vreg);
+env->cf[cd & 0x7] = (Vj->Q(0) == 0);
+}
+
+void HELPER(vsetnez_v)(CPULoongArchState *env, uint32_t cd, uint32_t vj)
+{
+VReg *Vj = &(env->fpr[vj].vreg);
+env->cf[cd & 0x7] = (Vj->Q(0) != 0);
+}


This is trivial inline.


+#define SETANYEQZ(NAME, BIT, E) \
+void HELPER(NAME)(CPULoongArchState *env, uint32_t cd, uint32_t vj) \
+{   \
+int i;  \
+bool ret = false;   \
+VReg *Vj = &(env->fpr[vj].vreg);\
+\
+for (i = 0; i < LSX_LEN/BIT; i++) { \
+ret |= (Vj->E(i) == 0); \
+}   \
+env->cf[cd & 0x7] = ret;\
+}
+SETANYEQZ(vsetanyeqz_b, 8, B)
+SETANYEQZ(vsetanyeqz_h, 16, H)
+SETANYEQZ(vsetanyeqz_w, 32, W)
+SETANYEQZ(vsetanyeqz_d, 64, D)


These could be inlined, though slightly harder.
C.f. target/arm/sve_helper.c, do_match2 (your n == 0).

Anyway, leaving this as-is for now is also ok.
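For context, the do_match2 code Richard points at builds on the classic SWAR "haszero" bit trick for testing whether any byte of a word is zero without a per-element loop. A standalone sketch of just that trick (illustrative only, not QEMU code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Classic SWAR "haszero": true iff any byte of x is zero.
 * (x - 0x01..01) borrows out of exactly the bytes that were zero;
 * masking with ~x and the per-byte high bits makes the test exact. */
static bool any_byte_zero(uint64_t x)
{
    return ((x - 0x0101010101010101ULL) & ~x & 0x8080808080808080ULL) != 0;
}
```

The same idea generalizes to 16/32/64-bit lanes by widening the constants, which is what an inlined vsetanyeqz would amount to.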


r~



Re: [RFC PATCH v2 37/44] target/loongarch: Implement vfcmp

2023-04-03 Thread Richard Henderson

On 3/27/23 20:06, Song Gao wrote:

+static uint64_t vfcmp_common(CPULoongArchState *env,
+ FloatRelation cmp, uint32_t flags)
+{
+bool ret;
+
+switch (cmp) {
+case float_relation_less:
+ret = (flags & FCMP_LT);
+break;
+case float_relation_equal:
+ret = (flags & FCMP_EQ);
+break;
+case float_relation_greater:
+ret = (flags & FCMP_GT);
+break;
+case float_relation_unordered:
+ret = (flags & FCMP_UN);
+break;
+default:
+g_assert_not_reached();
+}
+
+return ret;
+}


Either change the return type to bool, or return {0, -1} here...


+
+#define VFCMP(NAME, BIT, T, E, FN)   \
+void HELPER(NAME)(CPULoongArchState *env,\
+  uint32_t vd, uint32_t vj, uint32_t vk, uint32_t flags) \
+{\
+int i;   \
+VReg t;  \
+VReg *Vd = &(env->fpr[vd].vreg); \
+VReg *Vj = &(env->fpr[vj].vreg); \
+VReg *Vk = &(env->fpr[vk].vreg); \
+ \
+vec_clear_cause(env);\
+for (i = 0; i < LSX_LEN/BIT ; i++) { \
+FloatRelation cmp;   \
+cmp = FN(Vj->E(i), Vk->E(i), &env->fp_status);   \
+t.E(i) = (vfcmp_common(env, cmp, flags)) ? -1 : 0;   \


... and avoid the extra conditional here.
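A standalone sketch of the suggested shape — mapping the relation straight to the 0 / all-ones element value so the caller stores the result directly (the FCMP_*/Relation names mirror the patch, but the code below is illustrative only):

```c
#include <assert.h>
#include <stdint.h>

enum { FCMP_LT = 1, FCMP_EQ = 2, FCMP_GT = 4, FCMP_UN = 8 };
typedef enum { REL_LT, REL_EQ, REL_GT, REL_UN } Relation;

/* Return all-ones if the relation is selected by flags, else 0,
 * so the per-element store needs no second conditional. */
static uint64_t cmp_to_mask(Relation cmp, unsigned flags)
{
    static const unsigned bit[] = { FCMP_LT, FCMP_EQ, FCMP_GT, FCMP_UN };
    return -(uint64_t)!!(flags & bit[cmp]);
}
```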

Otherwise,
Reviewed-by: Richard Henderson 


r~



[PATCH v2 for 8.0?] nbd/server: Request TCP_NODELAY

2023-04-03 Thread Eric Blake
Nagle's algorithm adds latency in order to reduce network packet
overhead on small packets.  But when we are already using corking to
merge smaller packets into transactional requests, the extra delay
from TCP defaults just gets in the way (see recent commit bd2cd4a4).

For reference, qemu as an NBD client already requests TCP_NODELAY (see
nbd_connect() in nbd/client-connection.c); as does libnbd as a client
[1], and nbdkit as a server [2].  Furthermore, the NBD spec recommends
the use of TCP_NODELAY [3].

[1] 
https://gitlab.com/nbdkit/libnbd/-/blob/a48a1142/generator/states-connect.c#L39
[2] https://gitlab.com/nbdkit/nbdkit/-/blob/45b72f5b/server/sockets.c#L430
[3] 
https://github.com/NetworkBlockDevice/nbd/blob/master/doc/proto.md#protocol-phases
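For illustration only (plain POSIX sockets rather than QEMU's QIOChannel API), disabling Nagle's algorithm comes down to a single setsockopt() call, which is roughly what qio_channel_set_delay(ioc, false) translates to for a socket channel:

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Enable (on != 0) or disable TCP_NODELAY on a TCP socket. */
static int set_tcp_nodelay(int fd, int on)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on));
}

/* Read the option back; returns 1/0, or -1 on error. */
static int get_tcp_nodelay(int fd)
{
    int v = 0;
    socklen_t len = sizeof(v);
    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &v, &len) < 0) {
        return -1;
    }
    return v != 0;
}
```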

CC: Florian Westphal 
Signed-off-by: Eric Blake 
Message-Id: <20230327192947.1324372-1-ebl...@redhat.com>
---

v2 fix typo, enhance commit message

Given that corking made it in through Kevin's tree for 8.0-rc2 but
this one did not, but I didn't get any R-b, is there any objection to
me doing a pull request to get this into 8.0-rc3?

 nbd/server.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/nbd/server.c b/nbd/server.c
index 848836d4140..3d8d0d81df2 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -2758,6 +2758,7 @@ void nbd_client_new(QIOChannelSocket *sioc,
 }
 client->tlsauthz = g_strdup(tlsauthz);
 client->sioc = sioc;
+qio_channel_set_delay(QIO_CHANNEL(sioc), false);
 object_ref(OBJECT(client->sioc));
 client->ioc = QIO_CHANNEL(sioc);
 object_ref(OBJECT(client->ioc));

base-commit: efcd0ec14b0fe9ee0ee70277763b2d538d19238d
-- 
2.39.2




Re: [PATCH] MAINTAINERS: Add Eugenio Pérez as vhost-shadow-virtqueue reviewer

2023-04-03 Thread Jason Wang
On Fri, Mar 31, 2023 at 11:04 PM Eugenio Pérez  wrote:
>
> I'd like to be notified on SVQ patches and review them.
>
> Signed-off-by: Eugenio Pérez 

Acked-by: Jason Wang 

Thanks

> ---
>  MAINTAINERS | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index ef45b5e71e..986119e8ab 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -2061,6 +2061,10 @@ F: backends/vhost-user.c
>  F: include/sysemu/vhost-user-backend.h
>  F: subprojects/libvhost-user/
>
> +vhost-shadow-virtqueue
> +R: Eugenio Pérez 
> +F: hw/virtio/vhost-shadow-virtqueue.*
> +
>  virtio
>  M: Michael S. Tsirkin 
>  S: Supported
> --
> 2.31.1
>




[PATCH] block/nfs: avoid BDRV_POLL_WHILE() in nfs_co_get_allocated_file_size()

2023-04-03 Thread Stefan Hajnoczi
Commit 82618d7bc341 ("block: Convert bdrv_get_allocated_file_size() to
co_wrapper") made nfs_get_allocated_file_size() a coroutine. The
coroutine still uses BDRV_POLL_WHILE() to wait for the NFS RPC to
complete.

Take it a step further and yield the coroutine until the RPC completes.
This avoids the blocking, nested event loop and unifies
nfs_co_get_allocated_file_size() with the other coroutine functions that
send RPCs:
- Use nfs_co_init_task() to set up a coroutine NFSRPC task.
- Take client->mutex to protect fd handler state since we're in IO_CODE.
- Use nfs_co_generic_cb() instead of a specialized callback function.
- Yield until the task completes.

Getting rid of BDRV_POLL_WHILE() helps with the multi-queue block layer
effort where we don't want to take the AioContext lock.

This commit passes qemu-iotests/check -nfs, except inactivate-failure,
which also fails before this commit.
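The completion pattern described above can be modeled standalone (all names below are made up for illustration; in QEMU the busy-wait line is a qemu_coroutine_yield() driven by the AioContext event loop, not an actual spin):

```c
#include <assert.h>
#include <stdbool.h>

/* Task in the style of NFSRPC: the callback records the result and
 * flips complete, letting the waiter make progress. */
typedef struct {
    int ret;
    bool complete;
} Task;

/* Mock async RPC: "fires" once when the loop services it. */
typedef struct {
    bool pending;
    int result;
} FakeRpc;

static void generic_cb(Task *task, int ret)
{
    task->ret = ret;
    task->complete = true;
}

/* One event-loop iteration: deliver the RPC completion if it fired. */
static void loop_step(FakeRpc *rpc, Task *task)
{
    if (rpc->pending) {
        rpc->pending = false;
        generic_cb(task, rpc->result);
    }
}

static int wait_for_task(FakeRpc *rpc, Task *task)
{
    while (!task->complete) {
        loop_step(rpc, task); /* qemu_coroutine_yield() in the real code */
    }
    return task->ret;
}
```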

Signed-off-by: Stefan Hajnoczi 
---
 block/nfs.c | 45 +
 1 file changed, 17 insertions(+), 28 deletions(-)

diff --git a/block/nfs.c b/block/nfs.c
index 351dc6ec8d..71062c9b47 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -248,14 +248,18 @@ nfs_co_generic_cb(int ret, struct nfs_context *nfs, void *data,
 {
 NFSRPC *task = private_data;
 task->ret = ret;
-assert(!task->st);
 if (task->ret > 0 && task->iov) {
+assert(!task->st);
 if (task->ret <= task->iov->size) {
 qemu_iovec_from_buf(task->iov, 0, data, task->ret);
 } else {
 task->ret = -EIO;
 }
 }
+if (task->ret == 0 && task->st) {
+assert(!task->iov);
+memcpy(task->st, data, sizeof(struct stat));
+}
 if (task->ret < 0) {
 error_report("NFS Error: %s", nfs_get_error(nfs));
 }
@@ -713,29 +717,10 @@ static int nfs_has_zero_init(BlockDriverState *bs)
 }
 
 #if !defined(_WIN32)
-/* Called (via nfs_service) with QemuMutex held.  */
-static void
-nfs_get_allocated_file_size_cb(int ret, struct nfs_context *nfs, void *data,
-   void *private_data)
-{
-NFSRPC *task = private_data;
-task->ret = ret;
-if (task->ret == 0) {
-memcpy(task->st, data, sizeof(struct stat));
-}
-if (task->ret < 0) {
-error_report("NFS Error: %s", nfs_get_error(nfs));
-}
-
-/* Set task->complete before reading bs->wakeup.  */
-qatomic_mb_set(&task->complete, 1);
-bdrv_wakeup(task->bs);
-}
-
 static int64_t coroutine_fn nfs_co_get_allocated_file_size(BlockDriverState *bs)
 {
 NFSClient *client = bs->opaque;
-NFSRPC task = {0};
+NFSRPC task;
 struct stat st;
 
 if (bdrv_is_read_only(bs) &&
@@ -743,15 +728,19 @@ static int64_t coroutine_fn nfs_co_get_allocated_file_size(BlockDriverState *bs)
 return client->st_blocks * 512;
 }
 
-task.bs = bs;
+nfs_co_init_task(bs, &task);
 task.st = &st;
-if (nfs_fstat_async(client->context, client->fh, nfs_get_allocated_file_size_cb,
-&task) != 0) {
-return -ENOMEM;
-}
+WITH_QEMU_LOCK_GUARD(&client->mutex) {
+if (nfs_fstat_async(client->context, client->fh, nfs_co_generic_cb,
+&task) != 0) {
+return -ENOMEM;
+}
 
-nfs_set_events(client);
-BDRV_POLL_WHILE(bs, !task.complete);
+nfs_set_events(client);
+}
+while (!task.complete) {
+qemu_coroutine_yield();
+}
 
 return (task.ret < 0 ? task.ret : st.st_blocks * 512);
 }
-- 
2.39.2




[PATCH v5] hostmem-file: add offset option

2023-04-03 Thread Alexander Graf
Add an option for hostmem-file to start the memory object at an offset
into the target file. This is useful if multiple memory objects reside
inside the same target file, such as a device node.

In particular, it's useful to map guest memory directly into /dev/mem
for experimentation.

To make this work consistently, also fix up all places in QEMU that
expect fd offsets to be 0.
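At the syscall level, the new option corresponds to passing a non-zero, page-aligned offset to mmap() when mapping the backing file. A standalone sketch of that primitive (not QEMU code; the helper name is made up):

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map len bytes of path starting at offset (which mmap requires to be
 * a multiple of the page size), read/write and shared, as a memory
 * backend would for a file-backed region. */
static void *map_file_at(const char *path, off_t offset, size_t len)
{
    int fd = open(path, O_RDWR);
    if (fd < 0) {
        return NULL;
    }
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, offset);
    close(fd); /* the mapping keeps the file open */
    return p == MAP_FAILED ? NULL : p;
}
```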

Signed-off-by: Alexander Graf 

---

v1 -> v2:

  - add qom documentation
  - propagate offset into truncate, size and alignment checks

v2 -> v3:

  - failed attempt at fixing typo

v3 -> v4:

  - fix typo

v4 -> v5:

  - improve qom doc comment
  - account for fd_offset in more places
---
 backends/hostmem-file.c | 40 +++-
 hw/virtio/vhost-user.c  |  1 +
 include/exec/memory.h   |  2 ++
 include/exec/ram_addr.h |  3 ++-
 include/exec/ramblock.h |  1 +
 qapi/qom.json   |  5 +
 qemu-options.hx |  6 +-
 softmmu/memory.c|  3 ++-
 softmmu/physmem.c   | 17 -
 9 files changed, 69 insertions(+), 9 deletions(-)

diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index 25141283c4..38ea65bec5 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -27,6 +27,7 @@ struct HostMemoryBackendFile {
 
 char *mem_path;
 uint64_t align;
+uint64_t offset;
 bool discard_data;
 bool is_pmem;
 bool readonly;
@@ -58,7 +59,8 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
 ram_flags |= fb->is_pmem ? RAM_PMEM : 0;
 memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), name,
  backend->size, fb->align, ram_flags,
- fb->mem_path, fb->readonly, errp);
+ fb->mem_path, fb->offset, fb->readonly,
+ errp);
 g_free(name);
 #endif
 }
@@ -125,6 +127,36 @@ static void file_memory_backend_get_align(Object *o, Visitor *v,
 fb->align = val;
 }
 
+static void file_memory_backend_get_offset(Object *o, Visitor *v,
+  const char *name, void *opaque,
+  Error **errp)
+{
+HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
+uint64_t val = fb->offset;
+
+visit_type_size(v, name, &val, errp);
+}
+
+static void file_memory_backend_set_offset(Object *o, Visitor *v,
+  const char *name, void *opaque,
+  Error **errp)
+{
+HostMemoryBackend *backend = MEMORY_BACKEND(o);
+HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
+uint64_t val;
+
+if (host_memory_backend_mr_inited(backend)) {
+error_setg(errp, "cannot change property '%s' of %s", name,
+   object_get_typename(o));
+return;
+}
+
+if (!visit_type_size(v, name, &val, errp)) {
+return;
+}
+fb->offset = val;
+}
+
 #ifdef CONFIG_LIBPMEM
 static bool file_memory_backend_get_pmem(Object *o, Error **errp)
 {
@@ -197,6 +229,12 @@ file_backend_class_init(ObjectClass *oc, void *data)
 file_memory_backend_get_align,
 file_memory_backend_set_align,
 NULL, NULL);
+object_class_property_add(oc, "offset", "int",
+file_memory_backend_get_offset,
+file_memory_backend_set_offset,
+NULL, NULL);
+object_class_property_set_description(oc, "offset",
+"Offset into the target file (ex: 1G)");
 #ifdef CONFIG_LIBPMEM
 object_class_property_add_bool(oc, "pmem",
 file_memory_backend_get_pmem, file_memory_backend_set_pmem);
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index e5285df4ba..39dc803b03 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -483,6 +483,7 @@ static MemoryRegion *vhost_user_get_mr_data(uint64_t addr, ram_addr_t *offset,
 assert((uintptr_t)addr == addr);
 mr = memory_region_from_host((void *)(uintptr_t)addr, offset);
 *fd = memory_region_get_fd(mr);
+*offset += mr->ram_block->fd_offset;
 
 return mr;
 }
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 15ade918ba..3b7295fbe2 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1318,6 +1318,7 @@ void memory_region_init_resizeable_ram(MemoryRegion *mr,
  * @ram_flags: RamBlock flags. Supported flags: RAM_SHARED, RAM_PMEM,
  * RAM_NORESERVE,
  * @path: the path in which to allocate the RAM.
+ * @offset: offset within the file referenced by path
  * @readonly: true to open @path for reading, false for read/write.
  * @errp: pointer to Error*, to store an error if it happens.
  *
@@ -1331,6 +1332,7 @@ void memory_region_init_ram_from_file(MemoryRegion *mr,
   uint64_t align,
   uint32_t ram_flags,
   const char *path,
+   

Re: [PATCH v4] hostmem-file: add offset option

2023-04-03 Thread Alexander Graf


On 03.04.23 09:13, David Hildenbrand wrote:



On 01.04.23 19:47, Stefan Hajnoczi wrote:

On Sat, Apr 01, 2023 at 12:42:57PM +, Alexander Graf wrote:

Add an option for hostmem-file to start the memory object at an offset
into the target file. This is useful if multiple memory objects reside
inside the same target file, such as a device node.

In particular, it's useful to map guest memory directly into /dev/mem
for experimentation.

Signed-off-by: Alexander Graf 
Reviewed-by: Stefan Hajnoczi 

---

v1 -> v2:

   - add qom documentation
   - propagate offset into truncate, size and alignment checks

v2 -> v3:

   - failed attempt at fixing typo

v2 -> v4:

   - fix typo
---
  backends/hostmem-file.c | 40 +++-
  include/exec/memory.h   |  2 ++
  include/exec/ram_addr.h |  3 ++-
  qapi/qom.json   |  5 +
  qemu-options.hx |  6 +-
  softmmu/memory.c    |  3 ++-
  softmmu/physmem.c   | 14 ++
  7 files changed, 65 insertions(+), 8 deletions(-)


Reviewed-by: Stefan Hajnoczi 


The change itself looks good to me, but I do think some other QEMU code
that ends up working on the RAMBlock is not prepared yet. Most probably,
because we never ended up using fd with an offset as guest RAM.

We don't seem to be remembering that offset in the RAMBlock. First, I
thought block->offset would be used for that, but that's just the offset
in the ram_addr_t space. Maybe we need a new "block->fd_offset" to
remember the offset (unless I am missing something).

The real offset in the file would be required at least in two cases I
can see (whenever we essentially end up calling mmap() on the fd again):

1) qemu_ram_remap(): We'd have to add the file offset on top of the
calculated offset.



This one is a bit tricky to test, as we're only running into that code 
path with KVM when we see an #MCE. But it's trivial, so I'm confident it 
will work as expected.





2) vhost-user: most probably whenever we set the mmap_offset. For
example, in vhost_user_fill_set_mem_table_msg() we'd similarly have to
add the file_offset on top of the calculated offset.
vhost_user_get_mr_data() should most probably do that.



I agree - adding the offset as part of get_mr_data() is sufficient. I 
have validated it works correctly with QEMU's vhost-user-blk target.


I think the changes are still obvious enough that I'll fold them all 
into a single patch.



Alex





Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879




Re: [PATCH] vfio/migration: Skip log_sync during migration SETUP state

2023-04-03 Thread Alex Williamson
On Mon, 3 Apr 2023 22:36:42 +0200
Cédric Le Goater  wrote:

> On 4/3/23 15:00, Avihai Horon wrote:
> > Currently, VFIO log_sync can be issued while migration is in SETUP
> > state. However, doing this log_sync is at best redundant and at worst
> > can fail.
> > 
> > Redundant -- all RAM is marked dirty in migration SETUP state and is
> > transferred only after migration is set to ACTIVE state, so doing
> > log_sync during migration SETUP is pointless.
> > 
> > Can fail -- there is a time window, between setting migration state to
> > SETUP and starting dirty tracking by RAM save_live_setup handler, during
> > which dirty tracking is still not started. Any VFIO log_sync call that
> > is issued during this time window will fail. For example, this error can
> > be triggered by migrating a VM when a GUI is active, which constantly
> > calls log_sync.
> > 
> > Fix it by skipping VFIO log_sync while migration is in SETUP state.
> > 
> > Fixes: 758b96b61d5c ("vfio/migrate: Move switch of dirty tracking into 
> > vfio_memory_listener")
> > Signed-off-by: Avihai Horon   
> migration is still experimental, so this can wait 8.1. Correct me if not.

Agreed, this doesn't seem nearly catastrophic enough as an experimental
feature that it can't wait for the 8.1 devel cycle to open.  Thanks,

Alex

> > ---
> >   hw/vfio/common.c | 3 ++-
> >   1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> > index 4d01ea3515..78358ede27 100644
> > --- a/hw/vfio/common.c
> > +++ b/hw/vfio/common.c
> > @@ -478,7 +478,8 @@ static bool 
> > vfio_devices_all_dirty_tracking(VFIOContainer *container)
> >   VFIODevice *vbasedev;
> >   MigrationState *ms = migrate_get_current();
> >   
> > -if (!migration_is_setup_or_active(ms->state)) {
> > +if (ms->state != MIGRATION_STATUS_ACTIVE &&
> > +ms->state != MIGRATION_STATUS_DEVICE) {
> >   return false;
> >   }
> > 
> 




Re: [PATCH RESEND 2/2] migration/ram.c: Fix migration with compress enabled

2023-04-03 Thread Peter Xu
On Sun, Apr 02, 2023 at 05:48:38PM +, Lukas Straub wrote:
> Since ec6f3ab9, migration with compress enabled was broken, because
> the compress threads use a dummy QEMUFile which just acts as a
> buffer and that commit accidentally changed it to use the outgoing
> migration channel instead.

Sorry. :(

> Fix this by using the dummy file again in the compress threads.
> 
> Signed-off-by: Lukas Straub 

Reviewed-by: Peter Xu 

Let's also add a Fixes tag to be clear:

ec6f3ab9f4 ("migration: Move last_sent_block into PageSearchStatus")

Even though it may not need to copy stable.

I think this is for 8.0, am I right?  It should be very clear if so,
otherwise it's easy to get overlooked.

-- 
Peter Xu




Re: [PATCH RESEND 1/2] qtest/migration-test.c: Add test with compress enabled

2023-04-03 Thread Peter Xu
On Sun, Apr 02, 2023 at 05:47:45PM +, Lukas Straub wrote:
> There has never been a test for migration with compress enabled.
> 
> Add a suitable test, testing with compress-wait-thread = false
> too.
> 
> iterations = 2 is intentional, so it also tests that no invalid
> thread state is left over from the previous iteration.
> 
> Signed-off-by: Lukas Straub 

Overall looks good to me:

Reviewed-by: Peter Xu 

A few nitpicks below.

> ---
>  tests/qtest/migration-test.c | 67 
>  1 file changed, 67 insertions(+)
> 
> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> index 3b615b0da9..dbcab2e8ae 100644
> --- a/tests/qtest/migration-test.c
> +++ b/tests/qtest/migration-test.c
> @@ -406,6 +406,41 @@ static void migrate_set_parameter_str(QTestState *who, 
> const char *parameter,
>  migrate_check_parameter_str(who, parameter, value);
>  }
>  
> +static long long migrate_get_parameter_bool(QTestState *who,
> +   const char *parameter)
> +{
> +QDict *rsp;
> +int result;
> +
> +rsp = wait_command(who, "{ 'execute': 'query-migrate-parameters' }");
> +result = qdict_get_bool(rsp, parameter);
> +qobject_unref(rsp);
> +return !!result;
> +}
> +
> +static void migrate_check_parameter_bool(QTestState *who, const char 
> *parameter,
> +int value)
> +{
> +int result;
> +
> +result = migrate_get_parameter_bool(who, parameter);
> +g_assert_cmpint(result, ==, value);
> +}
> +
> +static void migrate_set_parameter_bool(QTestState *who, const char 
> *parameter,
> +  int value)
> +{
> +QDict *rsp;
> +
> +rsp = qtest_qmp(who,
> +"{ 'execute': 'migrate-set-parameters',"
> +"'arguments': { %s: %i } }",
> +parameter, value);
> +g_assert(qdict_haskey(rsp, "return"));
> +qobject_unref(rsp);
> +migrate_check_parameter_bool(who, parameter, value);
> +}
> +
>  static void migrate_ensure_non_converge(QTestState *who)
>  {
>  /* Can't converge with 1ms downtime + 3 mbs bandwidth limit */
> @@ -1524,6 +1559,36 @@ static void test_precopy_unix_xbzrle(void)
>  test_precopy_common(&args);
>  }
>  
> +static void *
> +test_migrate_compress_start(QTestState *from,
> +  QTestState *to)
> +{
> +migrate_set_parameter_int(from, "compress-level", 9);
> +migrate_set_parameter_int(from, "compress-threads", 1);
> +migrate_set_parameter_bool(from, "compress-wait-thread", false);

May worth trying both true/false (can split into two tests)?

> +migrate_set_parameter_int(to, "decompress-threads", 1);

Why not set both compress/decompress threads to something >1 to check arace
conditions between the threads?

> +
> +migrate_set_capability(from, "compress", true);
> +migrate_set_capability(to, "compress", true);
> +
> +return NULL;
> +}
> +
> +static void test_precopy_unix_compress(void)
> +{
> +g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
> +MigrateCommon args = {
> +.connect_uri = uri,
> +.listen_uri = uri,
> +

Empty line.

> +.start_hook = test_migrate_compress_start,
> +

Empty line.

> +.iterations = 2,

Maybe move the comment in commit message over here?

> +};
> +
> +test_precopy_common(&args);
> +}
> +
>  static void test_precopy_tcp_plain(void)
>  {
>  MigrateCommon args = {
> @@ -2515,6 +2580,8 @@ int main(int argc, char **argv)
>  qtest_add_func("/migration/bad_dest", test_baddest);
>  qtest_add_func("/migration/precopy/unix/plain", test_precopy_unix_plain);
>  qtest_add_func("/migration/precopy/unix/xbzrle", 
> test_precopy_unix_xbzrle);
> +qtest_add_func("/migration/precopy/unix/compress",
> +   test_precopy_unix_compress);
>  #ifdef CONFIG_GNUTLS
>  qtest_add_func("/migration/precopy/unix/tls/psk",
> test_precopy_unix_tls_psk);
> -- 
> 2.30.2
> 



-- 
Peter Xu




Re: [RFC QEMU PATCH 08/18] virtio-gpu: Initialize Venus

2023-04-03 Thread Dmitry Osipenko
On 3/24/23 16:22, Huang Rui wrote:
> On Thu, Mar 16, 2023 at 07:14:47AM +0800, Dmitry Osipenko wrote:
>> On 3/13/23 18:55, Huang Rui wrote:
>>> On Mon, Mar 13, 2023 at 01:51:03AM +0800, Dmitry Osipenko wrote:
 On 3/12/23 12:22, Huang Rui wrote:
> From: Antonio Caggiano 
>
> Request Venus when initializing VirGL.
>
> Signed-off-by: Antonio Caggiano 
> ---
>  hw/display/virtio-gpu-virgl.c | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
> index fe03dc916f..f5ce206b93 100644
> --- a/hw/display/virtio-gpu-virgl.c
> +++ b/hw/display/virtio-gpu-virgl.c
> @@ -803,7 +803,11 @@ int virtio_gpu_virgl_init(VirtIOGPU *g)
>  {
>  int ret;
>  
> +#ifdef VIRGL_RENDERER_VENUS
> +ret = virgl_renderer_init(g, VIRGL_RENDERER_VENUS, 
> &virtio_gpu_3d_cbs);
> +#else
>  ret = virgl_renderer_init(g, 0, &virtio_gpu_3d_cbs);
> +#endif

 Note that Venus now requires VIRGL_RENDERER_RENDER_SERVER flag to be
 set. Please test the patches with the latest virglrenderer and etc.

 The #ifdef also doesn't allow adding new flags, it should look like:

 #ifdef VIRGL_RENDERER_VENUS
 flags |= VIRGL_RENDERER_RENDER_SERVER;
 #endif

 ret = virgl_renderer_init(g, flags, &virtio_gpu_3d_cbs);
>>>
>>> In fact, we have rebased to the latest virglrenderer:
>>>
>>> We check both VIRGL_RENDERER_RENDER_SERVER or VIRGL_RENDERER_VENUS in
>>> virglrenderer, alternative of them works.
>>>
>>> https://gitlab.freedesktop.org/rui/virglrenderer/-/commit/c1322a8a84379b1ef7939f56c6761b0114716f45
>>
>> All the extra changes you made to virglrenderer that Qemu depends on
>> need to go upstream. Please open all the relevant merge requests. Thanks!
>>
> 
> Dmitry, sorry for the late response. I have created the relevant merge
> requests below:
> 
> Virglrenderer:
> https://gitlab.freedesktop.org/virgl/virglrenderer/-/merge_requests/1068
> 
> Mesa:
> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22108
> 
> I'd appreciate any comments. :-)

Thanks, Ray. I'll try to get to the patches soon.


-- 
Best regards,
Dmitry




Re: [PATCH v2 04/10] linux-user: Add '-one-insn-per-tb' option equivalent to '-singlestep'

2023-04-03 Thread Warner Losh
On Mon, Apr 3, 2023 at 12:35 PM Richard Henderson <
richard.hender...@linaro.org> wrote:

> On 4/3/23 07:46, Peter Maydell wrote:
> > The '-singlestep' option is confusing, because it doesn't actually
> > have anything to do with single-stepping the CPU. What it does do
> > is force TCG emulation to put one guest instruction in each TB,
> > which can be useful in some situations.
> >
> > Create a new command line argument -one-insn-per-tb, so we can
> > document that -singlestep is just a deprecated synonym for it,
> > and eventually perhaps drop it.
> >
> > Signed-off-by: Peter Maydell
> > ---
> >   docs/user/main.rst | 7 ++-
> >   linux-user/main.c  | 9 ++---
> >   2 files changed, 12 insertions(+), 4 deletions(-)
>
> Reviewed-by: Richard Henderson 
>

Reviewed-by: Warner Losh 


> r~
>


Re: [PATCH 01/13] virtio-scsi: avoid race between unplug and transport event

2023-04-03 Thread Philippe Mathieu-Daudé

On 3/4/23 20:29, Stefan Hajnoczi wrote:

Only report a transport reset event to the guest after the SCSIDevice
has been unrealized by qdev_simple_device_unplug_cb().

qdev_simple_device_unplug_cb() sets the SCSIDevice's qdev.realized field
to false so that scsi_device_find/get() no longer see it.

scsi_target_emulate_report_luns() also needs to be updated to filter out
SCSIDevices that are unrealized.

These changes ensure that the guest driver does not see the SCSIDevice
that's being unplugged if it responds very quickly to the transport
reset event.

Signed-off-by: Stefan Hajnoczi 
---
  hw/scsi/scsi-bus.c|  3 ++-
  hw/scsi/virtio-scsi.c | 18 +-
  2 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index c97176110c..f9bd064833 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -487,7 +487,8 @@ static bool scsi_target_emulate_report_luns(SCSITargetReq 
*r)
  DeviceState *qdev = kid->child;
  SCSIDevice *dev = SCSI_DEVICE(qdev);
  
-if (dev->channel == channel && dev->id == id && dev->lun != 0) {

+if (dev->channel == channel && dev->id == id && dev->lun != 0 &&
+qatomic_load_acquire(&dev->qdev.realized)) {


Would this be more useful as a qdev_is_realized() helper?
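The helper Philippe suggests could be sketched roughly as below. The `DeviceState` here is a minimal stand-in with only the field the helper touches; real QEMU code would use the existing qdev type and `qatomic_load_acquire()`, so the names are illustrative.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Minimal stand-in for QEMU's DeviceState (illustration only). */
typedef struct DeviceState {
    atomic_bool realized;
} DeviceState;

/* Wrap the acquire-load so callers such as
 * scsi_target_emulate_report_luns() don't open-code the atomic access. */
static inline bool qdev_is_realized(DeviceState *dev)
{
    return atomic_load_explicit(&dev->realized, memory_order_acquire);
}
```

Callers would then test `qdev_is_realized(&dev->qdev)` instead of loading the field directly.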



Re: [PATCH v2 05/10] bsd-user: Add '-one-insn-per-tb' option equivalent to '-singlestep'

2023-04-03 Thread Warner Losh
On Mon, Apr 3, 2023 at 8:46 AM Peter Maydell 
wrote:

> The '-singlestep' option is confusing, because it doesn't actually
> have anything to do with single-stepping the CPU. What it does do
> is force TCG emulation to put one guest instruction in each TB,
> which can be useful in some situations.
>
> Create a new command line argument -one-insn-per-tb, so we can
> document that -singlestep is just a deprecated synonym for it,
> and eventually perhaps drop it.
>
> Signed-off-by: Peter Maydell 
> ---
> NB: not even compile tested!
> ---
>

It looks good in theory. It may even compile. If it does:

Reviewed-by: Warner Losh 



>  docs/user/main.rst | 7 ++-
>  bsd-user/main.c| 5 +++--
>  2 files changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/docs/user/main.rst b/docs/user/main.rst
> index f9ac701f4b1..f4786353965 100644
> --- a/docs/user/main.rst
> +++ b/docs/user/main.rst
> @@ -247,5 +247,10 @@ Debug options:
>  ``-p pagesize``
> Act as if the host page size was 'pagesize' bytes
>
> +``-one-insn-per-tb``
> +   Run the emulation with one guest instruction per translation block.
> +   This slows down emulation a lot, but can be useful in some situations,
> +   such as when trying to analyse the logs produced by the ``-d`` option.
> +
>  ``-singlestep``
> -   Run the emulation in single step mode.
> +   This is a deprecated synonym for the ``-one-insn-per-tb`` option.
> diff --git a/bsd-user/main.c b/bsd-user/main.c
> index a9e5a127d38..9d604a670b7 100644
> --- a/bsd-user/main.c
> +++ b/bsd-user/main.c
> @@ -162,7 +162,8 @@ static void usage(void)
> "-d item1[,...]enable logging of specified items\n"
> "  (use '-d help' for a list of log items)\n"
> "-D logfilewrite logs to 'logfile' (default stderr)\n"
> -   "-singlestep   always run in singlestep mode\n"
> +   "-one-insn-per-tb  run with one guest instruction per emulated
> TB\n"
> +   "-singlestep   deprecated synonym for -one-insn-per-tb\n"
> "-strace   log system calls\n"
> "-trace
> [[enable=]][,events=][,file=]\n"
> "  specify tracing options\n"
> @@ -385,7 +386,7 @@ int main(int argc, char **argv)
>  (void) envlist_unsetenv(envlist, "LD_PRELOAD");
>  } else if (!strcmp(r, "seed")) {
>  seed_optarg = optarg;
> -} else if (!strcmp(r, "singlestep")) {
> +} else if (!strcmp(r, "singlestep") || !strcmp(r,
> "one-insn-per-tb")) {
>  opt_one_insn_per_tb = true;
>  } else if (!strcmp(r, "strace")) {
>  do_strace = 1;
> --
> 2.34.1
>
>


Re: [PATCH] vfio/migration: Skip log_sync during migration SETUP state

2023-04-03 Thread Cédric Le Goater

On 4/3/23 15:00, Avihai Horon wrote:

Currently, VFIO log_sync can be issued while migration is in SETUP
state. However, doing this log_sync is at best redundant and at worst
can fail.

Redundant -- all RAM is marked dirty in migration SETUP state and is
transferred only after migration is set to ACTIVE state, so doing
log_sync during migration SETUP is pointless.

Can fail -- there is a time window, between setting migration state to
SETUP and starting dirty tracking by RAM save_live_setup handler, during
which dirty tracking is still not started. Any VFIO log_sync call that
is issued during this time window will fail. For example, this error can
be triggered by migrating a VM when a GUI is active, which constantly
calls log_sync.

Fix it by skipping VFIO log_sync while migration is in SETUP state.

Fixes: 758b96b61d5c ("vfio/migrate: Move switch of dirty tracking into 
vfio_memory_listener")
Signed-off-by: Avihai Horon 

VFIO migration is still experimental, so this can wait for 8.1. Correct me if not.

Thanks,

C.


---
  hw/vfio/common.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 4d01ea3515..78358ede27 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -478,7 +478,8 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainer 
*container)
  VFIODevice *vbasedev;
  MigrationState *ms = migrate_get_current();
  
-if (!migration_is_setup_or_active(ms->state)) {

+if (ms->state != MIGRATION_STATUS_ACTIVE &&
+ms->state != MIGRATION_STATUS_DEVICE) {
  return false;
  }
  





Re: [PATCH v3 2/6] hw/isa/piix3: Reuse piix3_realize() in piix3_xen_realize()

2023-04-03 Thread Bernhard Beschow



On 3 April 2023 12:27:14 UTC, Jason Andryuk wrote:
>On Mon, Apr 3, 2023 at 5:33 AM Anthony PERARD  
>wrote:
>>
>> On Sat, Apr 01, 2023 at 10:36:45PM +, Bernhard Beschow wrote:
>> >
>> >
>> > On 30 March 2023 13:00:25 UTC, Anthony PERARD wrote:
>> > >On Sun, Mar 12, 2023 at 01:02:17PM +0100, Bernhard Beschow wrote:
>> > >> This is a preparational patch for the next one to make the following
>> > >> more obvious:
>> > >>
>> > >> First, pci_bus_irqs() is now called twice in case of Xen where the
>> > >> second call overrides the pci_set_irq_fn with the Xen variant.
>> > >
>> > >pci_bus_irqs() does allocates pci_bus->irq_count, so the second call in
>> > >piix3_xen_realize() will leak `pci_bus->irq_count`. Could you look if
>> > >pci_bus_irqs_cleanup() can be called before the second pci_bus_irqs()
>> > >call, or maybe some other way to avoid the leak?
>> >
>> > Thanks for catching this! I'll post a v4.
>> >
>> > I think the most fool-proof way to fix this is to free irq_count just 
>> > before the assignment. pci_bus_irqs_cleanup() would then have to NULL the 
>> > attribute such that pci_bus_irqs() can be called afterwards.
>> >
>> > BTW: I tried running qemu-system-x86_64 with PIIX4 rather than PIIX3 as 
>> > Xen guest with my pc-piix4 branch without success. This branch essentially 
>> > just provides slightly different PCI IDs for PIIX. Does xl or something 
>> > else in Xen check these? If not then this means I'm still missing 
>> > something. Under KVM this branch works just fine. Any idea?
>>
>> Maybe the ACPI tables provided by libxl needs to be updated.
>> Or maybe something in the firmware (SeaBIOS or OVMF/OvmfXen) check the
>> id (I know that the PCI id of the root bus is checked, but I don't know
>> if that's the one that's been changed).
>
>Xen also has hvmloader, which runs before SeaBIOS/OVMF.  Looking at
>tools/firmware/hvmloader/pci.c, it has
>ASSERT((devfn != PCI_ISA_DEVFN) ||
>   ((vendor_id == 0x8086) && (device_id == 0x7000)));
>
>From QEMU, it looks like 0x7000 is PCI_DEVICE_ID_INTEL_82371SB_0, but
>PIIX4 uses 0x7110 (PCI_DEVICE_ID_INTEL_82371AB_0).  Maybe try removing
>that check?
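A relaxed version of the quoted hvmloader assert might look like the sketch below. This is a hypothetical illustration, not a submitted patch; the macro names are made up here and are not hvmloader's, only the vendor/device ID values come from the discussion above.

```c
#include <assert.h>

/* Hypothetical names; 0x7000 is PIIX3's ISA bridge, 0x7110 PIIX4's. */
#define PCI_VENDOR_ID_INTEL     0x8086
#define PCI_DEVICE_ID_PIIX3_ISA 0x7000
#define PCI_DEVICE_ID_PIIX4_ISA 0x7110

static int is_supported_isa_bridge(unsigned vendor_id, unsigned device_id)
{
    return vendor_id == PCI_VENDOR_ID_INTEL &&
           (device_id == PCI_DEVICE_ID_PIIX3_ISA ||
            device_id == PCI_DEVICE_ID_PIIX4_ISA);
}
```

The ASSERT would then accept either bridge at `PCI_ISA_DEVFN` instead of hard-coding 0x7000.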

Sounds promising indeed. I'll give it a try!

Regards,
Bernhard

>
>Regards,
>Jason



Re: [PATCH v10] audio/pwaudio.c: Add Pipewire audio backend for QEMU

2023-04-03 Thread Dorinda Bassey
Hi Volker, Marc.

I have spent a significant amount of time revising the patchset and I'm
eager to see it included in the project. I understand that reviewing the
patches can be a time-consuming process and I appreciate the effort you've
put into providing feedback and guiding me through the revision process.
However I would appreciate any information you can provide on the expected
timeline for merging the patches. Let me know if there's anything else I
can do to help move this process forward.

Regards,
Dorinda.

On Mon, Apr 3, 2023 at 10:21 PM Dorinda Bassey  wrote:

> This commit adds a new audiodev backend to allow QEMU to use Pipewire as
> both an audio sink and source. This backend is available on most systems
>
> Add Pipewire entry points for QEMU Pipewire audio backend
> Add wrappers for QEMU Pipewire audio backend in qpw_pcm_ops()
> qpw_write function returns the current state of the stream to pwaudio
> and Writes some data to the server for playback streams using pipewire
> spa_ringbuffer implementation.
> qpw_read function returns the current state of the stream to pwaudio and
> reads some data from the server for capture streams using pipewire
> spa_ringbuffer implementation. These functions qpw_write and qpw_read
> are called during playback and capture.
> Added some functions that convert pw audio formats to QEMU audio format
> and vice versa which would be needed in the pipewire audio sink and
> source functions qpw_init_in() & qpw_init_out().
> These methods that implement playback and recording will create streams
> for playback and capture that will start processing and will result in
> the on_process callbacks to be called.
> Built a connection to the Pipewire sound system server in the
> qpw_audio_init() method.
>
> Signed-off-by: Dorinda Bassey 
> ---
> v10:
> improve error handling
> fix volume functions
> add locks in enable_in out functions
cleanup in reverse order of initialization
> add triggers for the sync method on the core object
> add waiting function for pw_thread_loop_signal
> improve trace
>
>  audio/audio.c |   3 +
>  audio/audio_template.h|   4 +
>  audio/meson.build |   1 +
>  audio/pwaudio.c   | 906 ++
>  audio/trace-events|   8 +
>  meson.build   |   8 +
>  meson_options.txt |   4 +-
>  qapi/audio.json   |  44 ++
>  qemu-options.hx   |  21 +
>  scripts/meson-buildoptions.sh |   8 +-
>  10 files changed, 1004 insertions(+), 3 deletions(-)
>  create mode 100644 audio/pwaudio.c
>
> diff --git a/audio/audio.c b/audio/audio.c
> index 70b096713c..90c7c49d11 100644
> --- a/audio/audio.c
> +++ b/audio/audio.c
> @@ -2061,6 +2061,9 @@ void audio_create_pdos(Audiodev *dev)
>  #ifdef CONFIG_AUDIO_PA
>  CASE(PA, pa, Pa);
>  #endif
> +#ifdef CONFIG_AUDIO_PIPEWIRE
> +CASE(PIPEWIRE, pipewire, Pipewire);
> +#endif
>  #ifdef CONFIG_AUDIO_SDL
>  CASE(SDL, sdl, Sdl);
>  #endif
> diff --git a/audio/audio_template.h b/audio/audio_template.h
> index e42326c20d..dc0c74aa74 100644
> --- a/audio/audio_template.h
> +++ b/audio/audio_template.h
> @@ -362,6 +362,10 @@ AudiodevPerDirectionOptions *glue(audio_get_pdo_,
> TYPE)(Audiodev *dev)
>  case AUDIODEV_DRIVER_PA:
>  return qapi_AudiodevPaPerDirectionOptions_base(dev->u.pa.TYPE);
>  #endif
> +#ifdef CONFIG_AUDIO_PIPEWIRE
> +case AUDIODEV_DRIVER_PIPEWIRE:
> +return
> qapi_AudiodevPipewirePerDirectionOptions_base(dev->u.pipewire.TYPE);
> +#endif
>  #ifdef CONFIG_AUDIO_SDL
>  case AUDIODEV_DRIVER_SDL:
>  return qapi_AudiodevSdlPerDirectionOptions_base(dev->u.sdl.TYPE);
> diff --git a/audio/meson.build b/audio/meson.build
> index 074ba9..65a49c1a10 100644
> --- a/audio/meson.build
> +++ b/audio/meson.build
> @@ -19,6 +19,7 @@ foreach m : [
>['sdl', sdl, files('sdlaudio.c')],
>['jack', jack, files('jackaudio.c')],
>['sndio', sndio, files('sndioaudio.c')],
> +  ['pipewire', pipewire, files('pwaudio.c')],
>['spice', spice, files('spiceaudio.c')]
>  ]
>if m[1].found()
> diff --git a/audio/pwaudio.c b/audio/pwaudio.c
> new file mode 100644
> index 00..f9da86059f
> --- /dev/null
> +++ b/audio/pwaudio.c
> @@ -0,0 +1,906 @@
> +/*
> + * QEMU Pipewire audio driver
> + *
> + * Copyright (c) 2023 Red Hat Inc.
> + *
> + * Author: Dorinda Bassey   
> + *
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu/module.h"
> +#include "audio.h"
> +#include 
> +#include "qemu/error-report.h"
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +#include "trace.h"
> +
> +#define AUDIO_CAP "pipewire"
> +#define RINGBUFFER_SIZE (1u << 22)
> +#define RINGBUFFER_MASK (RINGBUFFER_SIZE - 1)
> +
> +#include "audio_int.h"
> +
> +typedef struct pwvolume {
> +uint32_t channels;
> +float values[SPA_AUDIO_MAX_CHANNELS];
> +} pwvolume;
> +
> +typedef struct pwau

[PATCH v10] audio/pwaudio.c: Add Pipewire audio backend for QEMU

2023-04-03 Thread Dorinda Bassey
This commit adds a new audiodev backend to allow QEMU to use Pipewire as
both an audio sink and source. This backend is available on most systems

Add Pipewire entry points for QEMU Pipewire audio backend
Add wrappers for QEMU Pipewire audio backend in qpw_pcm_ops()
qpw_write function returns the current state of the stream to pwaudio
and Writes some data to the server for playback streams using pipewire
spa_ringbuffer implementation.
qpw_read function returns the current state of the stream to pwaudio and
reads some data from the server for capture streams using pipewire
spa_ringbuffer implementation. These functions qpw_write and qpw_read
are called during playback and capture.
Added some functions that convert pw audio formats to QEMU audio format
and vice versa which would be needed in the pipewire audio sink and
source functions qpw_init_in() & qpw_init_out().
These methods that implement playback and recording will create streams
for playback and capture that will start processing and will result in
the on_process callbacks to be called.
Built a connection to the Pipewire sound system server in the
qpw_audio_init() method.

Signed-off-by: Dorinda Bassey 
---
v10:
improve error handling
fix volume functions
add locks in enable_in out functions
cleanup in reverse order of initialization
add triggers for the sync method on the core object
add waiting function for pw_thread_loop_signal
improve trace

 audio/audio.c |   3 +
 audio/audio_template.h|   4 +
 audio/meson.build |   1 +
 audio/pwaudio.c   | 906 ++
 audio/trace-events|   8 +
 meson.build   |   8 +
 meson_options.txt |   4 +-
 qapi/audio.json   |  44 ++
 qemu-options.hx   |  21 +
 scripts/meson-buildoptions.sh |   8 +-
 10 files changed, 1004 insertions(+), 3 deletions(-)
 create mode 100644 audio/pwaudio.c

diff --git a/audio/audio.c b/audio/audio.c
index 70b096713c..90c7c49d11 100644
--- a/audio/audio.c
+++ b/audio/audio.c
@@ -2061,6 +2061,9 @@ void audio_create_pdos(Audiodev *dev)
 #ifdef CONFIG_AUDIO_PA
 CASE(PA, pa, Pa);
 #endif
+#ifdef CONFIG_AUDIO_PIPEWIRE
+CASE(PIPEWIRE, pipewire, Pipewire);
+#endif
 #ifdef CONFIG_AUDIO_SDL
 CASE(SDL, sdl, Sdl);
 #endif
diff --git a/audio/audio_template.h b/audio/audio_template.h
index e42326c20d..dc0c74aa74 100644
--- a/audio/audio_template.h
+++ b/audio/audio_template.h
@@ -362,6 +362,10 @@ AudiodevPerDirectionOptions *glue(audio_get_pdo_, 
TYPE)(Audiodev *dev)
 case AUDIODEV_DRIVER_PA:
 return qapi_AudiodevPaPerDirectionOptions_base(dev->u.pa.TYPE);
 #endif
+#ifdef CONFIG_AUDIO_PIPEWIRE
+case AUDIODEV_DRIVER_PIPEWIRE:
+return 
qapi_AudiodevPipewirePerDirectionOptions_base(dev->u.pipewire.TYPE);
+#endif
 #ifdef CONFIG_AUDIO_SDL
 case AUDIODEV_DRIVER_SDL:
 return qapi_AudiodevSdlPerDirectionOptions_base(dev->u.sdl.TYPE);
diff --git a/audio/meson.build b/audio/meson.build
index 074ba9..65a49c1a10 100644
--- a/audio/meson.build
+++ b/audio/meson.build
@@ -19,6 +19,7 @@ foreach m : [
   ['sdl', sdl, files('sdlaudio.c')],
   ['jack', jack, files('jackaudio.c')],
   ['sndio', sndio, files('sndioaudio.c')],
+  ['pipewire', pipewire, files('pwaudio.c')],
   ['spice', spice, files('spiceaudio.c')]
 ]
   if m[1].found()
diff --git a/audio/pwaudio.c b/audio/pwaudio.c
new file mode 100644
index 00..f9da86059f
--- /dev/null
+++ b/audio/pwaudio.c
@@ -0,0 +1,906 @@
+/*
+ * QEMU Pipewire audio driver
+ *
+ * Copyright (c) 2023 Red Hat Inc.
+ *
+ * Author: Dorinda Bassey   
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/module.h"
+#include "audio.h"
+#include 
+#include "qemu/error-report.h"
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include "trace.h"
+
+#define AUDIO_CAP "pipewire"
+#define RINGBUFFER_SIZE (1u << 22)
+#define RINGBUFFER_MASK (RINGBUFFER_SIZE - 1)
+
+#include "audio_int.h"
+
+typedef struct pwvolume {
+uint32_t channels;
+float values[SPA_AUDIO_MAX_CHANNELS];
+} pwvolume;
+
+typedef struct pwaudio {
+Audiodev *dev;
+struct pw_thread_loop *thread_loop;
+struct pw_context *context;
+
+struct pw_core *core;
+struct spa_hook core_listener;
+int last_seq, pending_seq, error;
+} pwaudio;
+
+typedef struct PWVoice {
+pwaudio *g;
+struct pw_stream *stream;
+struct spa_hook stream_listener;
+struct spa_audio_info_raw info;
+uint32_t highwater_mark;
+uint32_t frame_size, req;
+struct spa_ringbuffer ring;
+uint8_t buffer[RINGBUFFER_SIZE];
+
+struct pw_properties *props;
+pwvolume volume;
+bool muted;
+} PWVoice;
+
+typedef struct PWVoiceOut {
+HWVoiceOut hw;
+PWVoice v;
+} PWVoiceOut;
+
+typedef struct PWVoiceIn {
+HWVoiceIn hw;
+PWVoice v;
+} PWVoiceIn;
+
+static void
+stream_destroy(void *data)
+{
+PWVoice *v = (PWVoice *) data;
+ 
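The RINGBUFFER_SIZE/RINGBUFFER_MASK pair in the patch above relies on the size being a power of two, so masking replaces a modulo even after the unsigned index wraps. As a standalone illustration (not patch code):

```c
#include <assert.h>
#include <stdint.h>

#define RINGBUFFER_SIZE (1u << 22)
#define RINGBUFFER_MASK (RINGBUFFER_SIZE - 1)

/* Map a monotonically increasing index into the ring's storage;
 * equivalent to index % RINGBUFFER_SIZE because size is 2^22. */
static uint32_t ring_offset(uint32_t index)
{
    return index & RINGBUFFER_MASK;
}
```

This is the same property pipewire's `spa_ringbuffer` read/write indices depend on.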

Re: [RFC PATCH v2 18/44] target/loongarch: Implement vsat

2023-04-03 Thread Richard Henderson

On 4/3/23 05:55, gaosong wrote:

Hi, Richard

On 2023/4/1 at 1:03 PM, Richard Henderson wrote:

On 3/27/23 20:06, Song Gao wrote:

+static void gen_vsat_s(unsigned vece, TCGv_vec t, TCGv_vec a, int64_t imm)
+{
+    TCGv_vec t1;
+    int64_t max  = (1l << imm) - 1;


This needed 1ull, but better to just use

    max = MAKE_64BIT_MASK(0, imm - 1); 

For the signed version, use ll?
I think using MAKE_64BIT_MASK(0, imm - 1) for the signed version is not suitable.


int64_t max = MAKE_64BIT_MASK(0, imm);
int64_t min = ~max; // or -1 - max
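The suggested bounds can be checked numerically with the small sketch below. `MAKE_64BIT_MASK` is re-defined here only to keep the sketch standalone (QEMU provides it in `include/qemu/bitops.h`); note it also avoids the `1l` overflow of the original `(1l << imm) - 1` when imm is large.

```c
#include <assert.h>
#include <stdint.h>

/* Local re-definition for illustration; QEMU has the real one. */
#define MAKE_64BIT_MASK(shift, length) \
    (((~0ULL) >> (64 - (length))) << (shift))

/* Saturation bounds for vsat.s with immediate imm. */
static int64_t vsat_s_max(int imm) { return MAKE_64BIT_MASK(0, imm); }
static int64_t vsat_s_min(int imm) { return ~vsat_s_max(imm); }
```

For imm = 7 this gives the familiar signed 8-bit range, and it stays correct up to imm = 63 where a plain `1l << imm` would overflow.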





Re: [PATCH v2] hostmem-file: add offset option

2023-04-03 Thread Alexander Graf


On 03.04.23 08:28, Markus Armbruster wrote:


Alexander Graf  writes:


Add an option for hostmem-file to start the memory object at an offset
into the target file. This is useful if multiple memory objects reside
inside the same target file, such as a device node.

In particular, it's useful to map guest memory directly into /dev/mem
for experimentation.

Signed-off-by: Alexander Graf 

[...]


diff --git a/qapi/qom.json b/qapi/qom.json
index a877b879b9..8f5eaa8415 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -635,6 +635,10 @@
  # specify the required alignment via this option.
  # 0 selects a default alignment (currently the page size). (default: 
0)
  #
+# @offset: the offset into the target file that the region starts at. You can
+#  use this option to overload multiple regions into a single fils.

single file

I'm not sure about "to overload multiple regions into a single file".
Maybe "to back multiple regions with a single file".



I like it, I'll use that version here and in the qemu-options.hx file.



Any alignment requirements?



Page size, I'll add it.
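The alignment rule being added to the docs can be sketched as a standalone check. QEMU itself has helpers for this (e.g. page-size queries and alignment macros); `sysconf()` is used here only to keep the sketch self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>

/* An unaligned offset would make the underlying mmap() fail, so the
 * property value must be a multiple of the host page size. */
static bool hostmem_offset_is_valid(uint64_t offset)
{
    uint64_t pagesize = (uint64_t)sysconf(_SC_PAGESIZE);
    return (offset % pagesize) == 0;
}
```

A backend realize function would reject the object with an error before mapping if this returns false.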




What happens when the regions overlap?



It "just works" - same as mapping the same file twice. It's up to the 
user to ensure that nothing bad happens because of that.



Alex





Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879




Re: [RFC PATCH v1 00/26] migration: File based migration with multifd and fixed-ram

2023-04-03 Thread Peter Xu
Hi, Claudio,

Thanks for the context.

On Mon, Apr 03, 2023 at 09:47:26AM +0200, Claudio Fontana wrote:
> Hi, not sure if what is asked here is context in terms of the previous
> upstream discussions or our specific requirement we are trying to bring
> upstream.
>
> In terms of the specific requirement we are trying to bring upstream, we
> need to get libvirt+QEMU VM save and restore functionality to be able to
> transfer VM sizes of ~30 GB (4/8 vcpus) in roughly 5 seconds.  When an
> event trigger happens, the VM needs to be quickly paused and saved to
> disk safely, including datasync, and another VM needs to be restored,
> also in ~5 secs.  For our specific requirement, the VM is never running
> when its data (mostly consisting of RAM) is saved.
>
> I understand that the need to handle also the "live" case comes from
> upstream discussions about solving the "general case", where someone
> might want to do this for "live" VMs, but if helpful I want to highlight
> that it is not part of the specific requirement we are trying to address,
> and for this specific case won't also in the future, as the whole point
> of the trigger is to replace the running VM with another VM, so it cannot
> be kept running.

From what I read so far, that scenario suits exactly what live snapshot
would do with current QEMU - that at least should involve a snapshot on the
disks being used or I can't see how that can be live.  So it looks like a
separate request.

> The reason we are using "migrate" here likely stems from the fact that
> existing libvirt code currently uses QMP migrate to implement the save
> and restore commands.  And in my personal view, I think that reusing the
> existing building blocks (migration, multifd) would be preferable, to
> avoid having to maintain two separate ways to do the same thing.  That
> said, it could be done in a different way, if the performance can keep
> up. Just thinking of reducing the overall effort and also maintenance
> surface.

I would vaguely guess the performance can not only keep up but better than
what the current solution would provide, due to the possibility of (1)
batch handling of continuous guest pages, and (2) completely no dirty
tracking overhead.

For (2), it's not about wr-protect page faults or vmexits due to PML being
full (because vcpus will be stopped anyway..), it's about enabling the
dirty tracking (which already contains overhead, especially when huge pages
are enabled, to split huge pages in EPT pgtables) and all the bitmap
operations QEMU does during live migration even if the VM is not live.

IMHO reusing multifd may or may not be a good idea here, because it'll of
course also complicate multifd code, hence makes multifd harder to
maintain, while not in a good way, because as I mentioned I don't think it
can use much of what multifd provides.

I don't have a strong opinion on the impl (even though I do have a
preference..), but I think at least we should still check on two things:

  - Being crystal clear on the use case above, and double check whether "VM
stop" should be the default operation at the start of the new cmd - we
shouldn't assume the user will be aware of doing this, neither should
we assume the user is aware of the performance implications.

  - Making sure the image layout is well defined, so:

- It'll be extensible in the future, and,

- If someone would like to refactor it to not use the migration thread
  model anymore, the image format, hopefully, can be easy to keep
  untouched so it can be compatible with the current approach.

Just my two cents. I think Juan should have the best grasp on this.

Thanks,

-- 
Peter Xu




Re: [PATCH v2 09/10] qapi/run-state.json: Fix missing newline at end of file

2023-04-03 Thread Richard Henderson

On 4/3/23 07:46, Peter Maydell wrote:

The run-state.json file is missing a trailing newline; add it.

Signed-off-by: Peter Maydell
---
Noticed this because my editor wanted to add the newline
when I touched the file for the following patch...
---
  qapi/run-state.json | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH v2 08/10] hmp: Report 'one-insn-per-tb', not 'single step mode', in 'info status' output

2023-04-03 Thread Richard Henderson

On 4/3/23 07:46, Peter Maydell wrote:

The HMP 'info status' output includes "(single step mode)" when we are
running with TCG one-insn-per-tb enabled. Change this text to
"(one insn per TB)" to match the new command line option names.

We don't need to have a deprecation/transition plan for this, because
we make no guarantees about stability of HMP output.

Signed-off-by: Peter Maydell
---
  softmmu/runstate-hmp-cmds.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH v2 07/10] hmp: Add 'one-insn-per-tb' command equivalent to 'singlestep'

2023-04-03 Thread Richard Henderson

On 4/3/23 07:46, Peter Maydell wrote:

The 'singlestep' HMP command is confusing, because it doesn't
actually have anything to do with single-stepping the CPU.  What it
does do is force TCG emulation to put one guest instruction in each
TB, which can be useful in some situations.

Create a new HMP command  'one-insn-per-tb', so we can document that
'singlestep' is just a deprecated synonym for it, and eventually
perhaps drop it.

We aren't obliged to do deprecate-and-drop for HMP commands,
but it's easy enough to do so, so we do.

Signed-off-by: Peter Maydell
---
  docs/about/deprecated.rst   |  9 +
  include/monitor/hmp.h   |  2 +-
  softmmu/runstate-hmp-cmds.c |  2 +-
  tests/qtest/test-hmp.c  |  1 +
  hmp-commands.hx | 25 +
  5 files changed, 33 insertions(+), 6 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH v2 06/10] Document that -singlestep command line option is deprecated

2023-04-03 Thread Richard Henderson

On 4/3/23 07:46, Peter Maydell wrote:

Document that the -singlestep command line option is now
deprecated, as it is replaced by either the TCG accelerator
property 'one-insn-per-tb' for system emulation or the new
'-one-insn-per-tb' option for usermode emulation, and remove
the only use of the deprecated syntax from a README.

Signed-off-by: Peter Maydell
---
  docs/about/deprecated.rst | 16 
  qemu-options.hx   |  5 +++--
  tcg/tci/README|  2 +-
  3 files changed, 20 insertions(+), 3 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH v2 05/10] bsd-user: Add '-one-insn-per-tb' option equivalent to '-singlestep'

2023-04-03 Thread Richard Henderson

On 4/3/23 07:46, Peter Maydell wrote:

The '-singlestep' option is confusing, because it doesn't actually
have anything to do with single-stepping the CPU. What it does do
is force TCG emulation to put one guest instruction in each TB,
which can be useful in some situations.

Create a new command line argument -one-insn-per-tb, so we can
document that -singlestep is just a deprecated synonym for it,
and eventually perhaps drop it.

Signed-off-by: Peter Maydell
---
NB: not even compile tested!
---
  docs/user/main.rst | 7 ++-
  bsd-user/main.c| 5 +++--
  2 files changed, 9 insertions(+), 3 deletions(-)


Reviewed-by: Richard Henderson 

r~



[PATCH v4 1/2] arm: move KVM breakpoints helpers

2023-04-03 Thread francesco . cagnin
From: Francesco Cagnin 

These helpers will also be used for HVF. Aside from reformatting a
couple of comments for 'checkpatch.pl' and updating meson to compile
'hyp_gdbstub.c', this is just code motion.

Signed-off-by: Francesco Cagnin 
Reviewed-by: Alex Bennée 
Reviewed-by: Peter Maydell 
---
 target/arm/hyp_gdbstub.c | 253 +++
 target/arm/internals.h   |  50 +++
 target/arm/kvm64.c   | 276 ---
 target/arm/meson.build   |   3 +-
 4 files changed, 305 insertions(+), 277 deletions(-)
 create mode 100644 target/arm/hyp_gdbstub.c

diff --git a/target/arm/hyp_gdbstub.c b/target/arm/hyp_gdbstub.c
new file mode 100644
index 00..ebde2899cd
--- /dev/null
+++ b/target/arm/hyp_gdbstub.c
@@ -0,0 +1,253 @@
+/*
+ * ARM implementation of KVM and HVF hooks, 64 bit specific code
+ *
+ * Copyright Mian-M. Hamayun 2013, Virtual Open Systems
+ * Copyright Alex Bennée 2014, Linaro
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "cpu.h"
+#include "internals.h"
+#include "exec/gdbstub.h"
+
+/* Maximum and current break/watch point counts */
+int max_hw_bps, max_hw_wps;
+GArray *hw_breakpoints, *hw_watchpoints;
+
+/**
+ * insert_hw_breakpoint()
+ * @addr: address of breakpoint
+ *
+ * See ARM ARM D2.9.1 for details but here we are only going to create
+ * simple un-linked breakpoints (i.e. we don't chain breakpoints
+ * together to match address and context or vmid). The hardware is
+ * capable of fancier matching but that will require exposing that
+ * fanciness to GDB's interface
+ *
+ * DBGBCR_EL1, Debug Breakpoint Control Registers
+ *
+ *  31  24 23  20 19   16 15 14  13  12   9 8   5 4  3 2  1  0
+ * +--+--+---+-++--+-+--+-+---+
+ * | RES0 |  BT  |  LBN  | SSC | HMC| RES0 | BAS | RES0 | PMC | E |
+ * +--+--+---+-++--+-+--+-+---+
+ *
+ * BT: Breakpoint type (0 = unlinked address match)
+ * LBN: Linked BP number (0 = unused)
+ * SSC/HMC/PMC: Security, Higher and Priv access control (Table D-12)
+ * BAS: Byte Address Select (RES1 for AArch64)
+ * E: Enable bit
+ *
+ * DBGBVR_EL1, Debug Breakpoint Value Registers
+ *
+ *  63  53 52   49 48   2  1 0
+ * +--+---+--+-+
+ * | RESS | VA[52:49] | VA[48:2] | 0 0 |
+ * +--+---+--+-+
+ *
+ * Depending on the addressing mode bits the top bits of the register
+ * are a sign extension of the highest applicable VA bit. Some
+ * versions of GDB don't do it correctly so we ensure they are correct
+ * here so future PC comparisons will work properly.
+ */
+
+int insert_hw_breakpoint(target_ulong addr)
+{
+HWBreakpoint brk = {
+.bcr = 0x1, /* BCR E=1, enable */
+.bvr = sextract64(addr, 0, 53)
+};
+
+if (cur_hw_bps >= max_hw_bps) {
+return -ENOBUFS;
+}
+
+brk.bcr = deposit32(brk.bcr, 1, 2, 0x3);   /* PMC = 11 */
+brk.bcr = deposit32(brk.bcr, 5, 4, 0xf);   /* BAS = RES1 */
+
+g_array_append_val(hw_breakpoints, brk);
+
+return 0;
+}
+
+/**
+ * delete_hw_breakpoint()
+ * @pc: address of breakpoint
+ *
+ * Delete a breakpoint and shuffle any above down
+ */
+
+int delete_hw_breakpoint(target_ulong pc)
+{
+int i;
+for (i = 0; i < hw_breakpoints->len; i++) {
+HWBreakpoint *brk = get_hw_bp(i);
+if (brk->bvr == pc) {
+g_array_remove_index(hw_breakpoints, i);
+return 0;
+}
+}
+return -ENOENT;
+}
+
+/**
+ * insert_hw_watchpoint()
+ * @addr: address of watch point
+ * @len: size of area
+ * @type: type of watch point
+ *
+ * See ARM ARM D2.10. As with the breakpoints we can do some advanced
+ * stuff if we want to. The watch points can be linked with the break
+ * points above to make them context aware. However for simplicity
+ * currently we only deal with simple read/write watch points.
+ *
+ * D7.3.11 DBGWCR_EL1, Debug Watchpoint Control Registers
+ *
+ *  31  29 28   24 23  21  20  19 16 15 14  13   12  5 4   3 2   1  0
+ * +--+---+--++-+-+-+-+-+-+---+
+ * | RES0 |  MASK | RES0 | WT | LBN | SSC | HMC | BAS | LSC | PAC | E |
+ * +--+---+--++-+-+-+-+-+-+---+
+ *
+ * MASK: num bits addr mask (0=none,01/10=res,11=3 bits (8 bytes))
+ * WT: 0 - unlinked, 1 - linked (not currently used)
+ * LBN: Linked BP number (not currently used)
+ * SSC/HMC/PAC: Security, Higher and Priv access control (Table D2-11)
+ * BAS: Byte Address Select
+ * LSC: Load/Store control (01: load, 10: store, 11: both)
+ * E: Enable
+ *
+ * The bottom 2 bits of the value register are masked. Therefore to
+ * break on any sizes smaller than an unaligned word you need to set
+ * MASK=0, BAS=bit per byte in question. For larger regions (^2) you
+ * need to ensure you mask 

[PATCH v4 2/2] hvf: implement guest debugging on Apple Silicon hosts

2023-04-03 Thread francesco . cagnin
From: Francesco Cagnin 

Support is added for single-stepping, software breakpoints, hardware
breakpoints and watchpoints. The code has been structured like the KVM
counterpart (and many parts are basically identical).

Guests can be debugged through the gdbstub.

While guest debugging is enabled, the guest can still read and write the
DBG*_EL1 registers but they don't have any effect.

Signed-off-by: Francesco Cagnin 
---
 accel/hvf/hvf-accel-ops.c | 115 +++
 accel/hvf/hvf-all.c   |  23 ++
 include/sysemu/hvf.h  |  34 ++
 include/sysemu/hvf_int.h  |   1 +
 target/arm/hvf/hvf.c  | 709 +-
 target/i386/hvf/hvf.c |  33 ++
 6 files changed, 913 insertions(+), 2 deletions(-)

diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
index 24913ca9c4..c54301203d 100644
--- a/accel/hvf/hvf-accel-ops.c
+++ b/accel/hvf/hvf-accel-ops.c
@@ -52,6 +52,7 @@
 #include "qemu/main-loop.h"
 #include "exec/address-spaces.h"
 #include "exec/exec-all.h"
+#include "exec/gdbstub.h"
 #include "sysemu/cpus.h"
 #include "sysemu/hvf.h"
 #include "sysemu/hvf_int.h"
@@ -340,12 +341,18 @@ static int hvf_accel_init(MachineState *ms)
 return hvf_arch_init();
 }
 
+static inline int hvf_gdbstub_sstep_flags(void)
+{
+return SSTEP_ENABLE | SSTEP_NOIRQ;
+}
+
 static void hvf_accel_class_init(ObjectClass *oc, void *data)
 {
 AccelClass *ac = ACCEL_CLASS(oc);
 ac->name = "HVF";
 ac->init_machine = hvf_accel_init;
 ac->allowed = &hvf_allowed;
+ac->gdbstub_supported_sstep_flags = hvf_gdbstub_sstep_flags;
 }
 
 static const TypeInfo hvf_accel_type = {
@@ -462,6 +469,108 @@ static void hvf_start_vcpu_thread(CPUState *cpu)
cpu, QEMU_THREAD_JOINABLE);
 }
 
+static int hvf_insert_breakpoint(CPUState *cpu, int type, hwaddr addr, hwaddr len)
+{
+struct hvf_sw_breakpoint *bp;
+int err;
+
+if (type == GDB_BREAKPOINT_SW) {
+bp = hvf_find_sw_breakpoint(cpu, addr);
+if (bp) {
+bp->use_count++;
+return 0;
+}
+
+bp = g_new(struct hvf_sw_breakpoint, 1);
+bp->pc = addr;
+bp->use_count = 1;
+err = hvf_arch_insert_sw_breakpoint(cpu, bp);
+if (err) {
+g_free(bp);
+return err;
+}
+
+QTAILQ_INSERT_HEAD(&cpu->hvf->hvf_sw_breakpoints, bp, entry);
+} else {
+err = hvf_arch_insert_hw_breakpoint(addr, len, type);
+if (err) {
+return err;
+}
+}
+
+CPU_FOREACH(cpu) {
+err = hvf_update_guest_debug(cpu);
+if (err) {
+return err;
+}
+}
+return 0;
+}
+
+static int hvf_remove_breakpoint(CPUState *cpu, int type, hwaddr addr, hwaddr len)
+{
+struct hvf_sw_breakpoint *bp;
+int err;
+
+if (type == GDB_BREAKPOINT_SW) {
+bp = hvf_find_sw_breakpoint(cpu, addr);
+if (!bp) {
+return -ENOENT;
+}
+
+if (bp->use_count > 1) {
+bp->use_count--;
+return 0;
+}
+
+err = hvf_arch_remove_sw_breakpoint(cpu, bp);
+if (err) {
+return err;
+}
+
+QTAILQ_REMOVE(&cpu->hvf->hvf_sw_breakpoints, bp, entry);
+g_free(bp);
+} else {
+err = hvf_arch_remove_hw_breakpoint(addr, len, type);
+if (err) {
+return err;
+}
+}
+
+CPU_FOREACH(cpu) {
+err = hvf_update_guest_debug(cpu);
+if (err) {
+return err;
+}
+}
+return 0;
+}
+
+static void hvf_remove_all_breakpoints(CPUState *cpu)
+{
+struct hvf_sw_breakpoint *bp, *next;
+CPUState *tmpcpu;
+
+QTAILQ_FOREACH_SAFE(bp, &cpu->hvf->hvf_sw_breakpoints, entry, next) {
+if (hvf_arch_remove_sw_breakpoint(cpu, bp) != 0) {
+/* Try harder to find a CPU that currently sees the breakpoint. */
+CPU_FOREACH(tmpcpu)
+{
+if (hvf_arch_remove_sw_breakpoint(tmpcpu, bp) == 0) {
+break;
+}
+}
+}
+QTAILQ_REMOVE(&cpu->hvf->hvf_sw_breakpoints, bp, entry);
+g_free(bp);
+}
+hvf_arch_remove_all_hw_breakpoints();
+
+CPU_FOREACH(cpu) {
+hvf_update_guest_debug(cpu);
+}
+}
+
 static void hvf_accel_ops_class_init(ObjectClass *oc, void *data)
 {
 AccelOpsClass *ops = ACCEL_OPS_CLASS(oc);
@@ -473,6 +582,12 @@ static void hvf_accel_ops_class_init(ObjectClass *oc, void *data)
 ops->synchronize_post_init = hvf_cpu_synchronize_post_init;
 ops->synchronize_state = hvf_cpu_synchronize_state;
 ops->synchronize_pre_loadvm = hvf_cpu_synchronize_pre_loadvm;
+
+ops->insert_breakpoint = hvf_insert_breakpoint;
+ops->remove_breakpoint = hvf_remove_breakpoint;
+ops->remove_all_breakpoints = hvf_remove_all_breakpoints;
+ops->update_guest_debug = hvf_update_guest_debug;
+ops->supports_guest_debug = hvf_arch_supports_guest_debug;
 };
 static cons

[PATCH v4 0/2] Add gdbstub support to HVF

2023-04-03 Thread francesco . cagnin
From: Francesco Cagnin 

This patch series aims to add gdbstub support to HVF (the 'QEMU
accelerator on macOS that employs Hypervisor.framework') on Apple
Silicon hosts.

The proposed implementation, structured like the KVM counterpart,
handles single-stepping, software breakpoints, hardware breakpoints and
hardware watchpoints.

The patch has been most recently tested working on macOS Ventura 13.3
hosts and single-core Linux kernel 5.19 guests with the test script
'tests/guest-debug/test-gdbstub.py' (slightly updated to make it work
with Linux kernels compiled on macOS, see
https://gitlab.com/qemu-project/qemu/-/issues/1489).

The patch still has uninvestigated issues with multi-cores guests (see
thread https://www.mail-archive.com/qemu-devel@nongnu.org/msg932884.html).

v4:
* Add license and copyright comment in 'hyp_gdbstub.c'
* Fix build on x86 macos
* Add architecture specific functions to check guest debug support
* Move include to the top of the file in 'hvf.h'
* Merge patches 2 and 3 from previous patch series
* Re-inject EC_AA64_BKPT into the guest if unhandled
* Add comments explaining how debug registers are handled
* Minor refactor around debug registers for readability
* Other minor changes

Francesco Cagnin (2):
  arm: move KVM breakpoints helpers
  hvf: implement guest debugging on Apple Silicon hosts

 accel/hvf/hvf-accel-ops.c | 115 +++
 accel/hvf/hvf-all.c   |  23 ++
 include/sysemu/hvf.h  |  34 ++
 include/sysemu/hvf_int.h  |   1 +
 target/arm/hvf/hvf.c  | 709 +-
 target/arm/hyp_gdbstub.c  | 253 ++
 target/arm/internals.h|  50 +++
 target/arm/kvm64.c| 276 ---
 target/arm/meson.build|   3 +-
 target/i386/hvf/hvf.c |  33 ++
 10 files changed, 1218 insertions(+), 279 deletions(-)
 create mode 100644 target/arm/hyp_gdbstub.c

-- 
2.40.0




Re: [PATCH] tests/qtest/migration-test: Disable migration/multifd/tcp/plain/cancel

2023-04-03 Thread Peter Maydell
On Wed, 22 Mar 2023 at 20:15, Peter Maydell  wrote:
>
> On Tue, 7 Mar 2023 at 09:53, Peter Maydell  wrote:
> >
> > On Sat, 4 Mar 2023 at 15:39, Peter Maydell  wrote:
> > >
> > > > On Thu, 2 Mar 2023 at 17:22, Peter Maydell  wrote:
> > > >
> > > > migration-test has been flaky for a long time, both in CI and
> > > > otherwise:
> > > >
> > >
> > >
> > > > In the cases where I've looked at the underlying log, this seems to
> > > > be in the migration/multifd/tcp/plain/cancel subtest.  Disable that
> > > > specific subtest by default until somebody can track down the
> > > > underlying cause. Enthusiasts can opt back in by setting
> > > > QEMU_TEST_FLAKY_TESTS=1 in their environment.
> > >
> > > So I'm going to apply this, because hopefully it will improve
> > > the reliability a bit, but it's clearly not all of the
> > > issues with migration-test, because in the course of the
> > > run I was doing to test it before applying it I got this
> > > error from the OpenBSD VM:
> > >
> > >  32/646 qemu:qtest+qtest-aarch64 / qtest-aarch64/migration-test
> > >ERROR  134.73s   killed by signal 6 SIGABRT
> > > ― ✀  
> > > ―
> > > stderr:
> > > qemu-system-aarch64: multifd_send_sync_main: channel 15 has already quit
> > > qemu-system-aarch64: failed to save SaveStateEntry with id(name): 2(ram): -1
> > > qemu-system-aarch64: Failed to connect to '127.0.0.1:19581': Address
> > > already in use
> > > query-migrate shows failed migration: Failed to connect to
> > > '127.0.0.1:19581': Address already in use
> > > **
> > > ERROR:../src/tests/qtest/migration-helpers.c:151:migrate_query_not_failed:
> > > assertion failed: (!g_str_equal(status, "failed"))
> > >
> > > (test program exited with status code -6)
> >
> > Got another repeat of this one today; again, on the OpenBSD VM:
> >
> >  32/646 qemu:qtest+qtest-aarch64 / qtest-aarch64/migration-test
> >ERROR
> >   131.28s   killed by signal 6 SIGABRT
> > ― ✀  
> > ―
> > stderr:
> > qemu-system-aarch64: multifd_send_sync_main: channel 15 has already quit
> > qemu-system-aarch64: failed to save SaveStateEntry with id(name): 2(ram): -1
> > qemu-system-aarch64: Failed to connect to '127.0.0.1:30312': Address
> > already in use
> > query-migrate shows failed migration: Failed to connect to
> > '127.0.0.1:30312': Address already in use
> > **
> > ERROR:../src/tests/qtest/migration-helpers.c:151:migrate_query_not_failed:
> > assertion failed: (!g_str_equal(status, "failed"))
> >
> > (test program exited with status code -6)
> > ――
>
> This one's still here (openbsd VM again):
>
>  37/774 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test
>ERROR  565.42s   killed by signal 6 SIGABRT
> ― ✀  ―
> stderr:
> qemu-system-x86_64: multifd_send_sync_main: channel 15 has already quit
> qemu-system-x86_64: Failed to connect to '127.0.0.1:7856': Address
> already in use
> query-migrate shows failed migration: Failed to connect to
> '127.0.0.1:7856': Address already in use
> **
> ERROR:../src/tests/qtest/migration-helpers.c:151:migrate_query_not_failed:
> assertion failed: (!g_str_equal(status, "failed"))
>
> (test program exited with status code -6)

And again, here on x86 macos:

 32/626 qemu:qtest+qtest-aarch64 / qtest-aarch64/migration-test
   ERROR  413.00s   killed by signal 6 SIGABRT
― ✀  ―
stderr:
qemu-system-aarch64: multifd_send_sync_main: channel 14 has already quit
qemu-system-aarch64: Failed to connect to '127.0.0.1:52689':
Connection reset by peer
query-migrate shows failed migration: Failed to connect to
'127.0.0.1:52689': Connection reset by peer
**
ERROR:../../tests/qtest/migration-helpers.c:151:migrate_query_not_failed:
assertion failed: (!g_str_equal(status, "failed"))

(test program exited with status code -6)
――


If this isn't sufficient output to be able to figure out the
problem, can we add more diagnostics to the test and/or the
migration code so that the test does produce helpful output?

thanks
-- PMM


Re: [PATCH v2 04/10] linux-user: Add '-one-insn-per-tb' option equivalent to '-singlestep'

2023-04-03 Thread Richard Henderson

On 4/3/23 07:46, Peter Maydell wrote:

The '-singlestep' option is confusing, because it doesn't actually
have anything to do with single-stepping the CPU. What it does do
is force TCG emulation to put one guest instruction in each TB,
which can be useful in some situations.

Create a new command line argument -one-insn-per-tb, so we can
document that -singlestep is just a deprecated synonym for it,
and eventually perhaps drop it.

Signed-off-by: Peter Maydell
---
  docs/user/main.rst | 7 ++-
  linux-user/main.c  | 9 ++---
  2 files changed, 12 insertions(+), 4 deletions(-)


Reviewed-by: Richard Henderson 

r~



[PATCH 11/13] block/fuse: take AioContext lock around blk_exp_ref/unref()

2023-04-03 Thread Stefan Hajnoczi
These functions must be called with the AioContext acquired:

  /* Callers must hold exp->ctx lock */
  void blk_exp_ref(BlockExport *exp)
  ...
  /* Callers must hold exp->ctx lock */
  void blk_exp_unref(BlockExport *exp)

Signed-off-by: Stefan Hajnoczi 
---
 block/export/fuse.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/block/export/fuse.c b/block/export/fuse.c
index 06fa41079e..18394f9e07 100644
--- a/block/export/fuse.c
+++ b/block/export/fuse.c
@@ -244,7 +244,9 @@ static void read_from_fuse_export(void *opaque)
 FuseExport *exp = opaque;
 int ret;
 
+aio_context_acquire(exp->common.ctx);
 blk_exp_ref(&exp->common);
+aio_context_release(exp->common.ctx);
 
 do {
 ret = fuse_session_receive_buf(exp->fuse_session, &exp->fuse_buf);
@@ -256,7 +258,9 @@ static void read_from_fuse_export(void *opaque)
 fuse_session_process_buf(exp->fuse_session, &exp->fuse_buf);
 
 out:
+aio_context_acquire(exp->common.ctx);
 blk_exp_unref(&exp->common);
+aio_context_release(exp->common.ctx);
 }
 
 static void fuse_export_shutdown(BlockExport *blk_exp)
-- 
2.39.2




Re: [PATCH v2 03/10] tcg: Use one-insn-per-tb accelerator property in curr_cflags()

2023-04-03 Thread Richard Henderson

On 4/3/23 07:46, Peter Maydell wrote:

  uint32_t curr_cflags(CPUState *cpu)
  {
  uint32_t cflags = cpu->tcg_cflags;
+TCGState *tcgstate = TCG_STATE(current_accel());


As mentioned against the cover, this is a very hot path.

We should try for something less expensive.  Perhaps as simple as

return cpu->tcg_cflags | tcg_cflags_global;

where cpu->tcg_cflags is updated with cpu->singlestep_enabled.


r~



[PATCH 05/13] block/export: wait for vhost-user-blk requests when draining

2023-04-03 Thread Stefan Hajnoczi
Each vhost-user-blk request runs in a coroutine. When the BlockBackend
enters a drained section we need to enter a quiescent state. Currently
any in-flight requests race with bdrv_drained_begin() because it is
unaware of vhost-user-blk requests.

When blk_co_preadv/pwritev()/etc returns it wakes the
bdrv_drained_begin() thread but vhost-user-blk request processing has
not yet finished. The request coroutine continues executing while the
main loop thread thinks it is in a drained section.

One example where this is unsafe is for blk_set_aio_context() where
bdrv_drained_begin() is called before .aio_context_detached() and
.aio_context_attach(). If request coroutines are still running after
bdrv_drained_begin(), then the AioContext could change underneath them
and they race with new requests processed in the new AioContext. This
could lead to virtqueue corruption, for example.

(This example is theoretical, I came across this while reading the
code and have not tried to reproduce it.)

It's easy to make bdrv_drained_begin() wait for in-flight requests: add
a .drained_poll() callback that checks the VuServer's in-flight counter.
VuServer just needs an API that returns true when there are requests in
flight. The in-flight counter needs to be atomic.

Signed-off-by: Stefan Hajnoczi 
---
 include/qemu/vhost-user-server.h |  4 +++-
 block/export/vhost-user-blk-server.c | 19 +++
 util/vhost-user-server.c | 14 ++
 3 files changed, 32 insertions(+), 5 deletions(-)

diff --git a/include/qemu/vhost-user-server.h b/include/qemu/vhost-user-server.h
index bc0ac9ddb6..b1c1cda886 100644
--- a/include/qemu/vhost-user-server.h
+++ b/include/qemu/vhost-user-server.h
@@ -40,8 +40,9 @@ typedef struct {
 int max_queues;
 const VuDevIface *vu_iface;
 
+unsigned int in_flight; /* atomic */
+
 /* Protected by ctx lock */
-unsigned int in_flight;
 bool wait_idle;
 VuDev vu_dev;
 QIOChannel *ioc; /* The I/O channel with the client */
@@ -62,6 +63,7 @@ void vhost_user_server_stop(VuServer *server);
 
 void vhost_user_server_inc_in_flight(VuServer *server);
 void vhost_user_server_dec_in_flight(VuServer *server);
+bool vhost_user_server_has_in_flight(VuServer *server);
 
 void vhost_user_server_attach_aio_context(VuServer *server, AioContext *ctx);
 void vhost_user_server_detach_aio_context(VuServer *server);
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
index e93f2ed6b4..dbf5207162 100644
--- a/block/export/vhost-user-blk-server.c
+++ b/block/export/vhost-user-blk-server.c
@@ -254,6 +254,22 @@ static void vu_blk_exp_request_shutdown(BlockExport *exp)
 vhost_user_server_stop(&vexp->vu_server);
 }
 
+/*
+ * Ensures that bdrv_drained_begin() waits until in-flight requests complete.
+ *
+ * Called with vexp->export.ctx acquired.
+ */
+static bool vu_blk_drained_poll(void *opaque)
+{
+VuBlkExport *vexp = opaque;
+
+return vhost_user_server_has_in_flight(&vexp->vu_server);
+}
+
+static const BlockDevOps vu_blk_dev_ops = {
+.drained_poll  = vu_blk_drained_poll,
+};
+
 static int vu_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
  Error **errp)
 {
@@ -292,6 +308,7 @@ static int vu_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
 vu_blk_initialize_config(blk_bs(exp->blk), &vexp->blkcfg,
  logical_block_size, num_queues);
 
+blk_set_dev_ops(exp->blk, &vu_blk_dev_ops, vexp);
 blk_add_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
  vexp);
 
@@ -299,6 +316,7 @@ static int vu_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
  num_queues, &vu_blk_iface, errp)) {
 blk_remove_aio_context_notifier(exp->blk, blk_aio_attached,
 blk_aio_detach, vexp);
+blk_set_dev_ops(exp->blk, NULL, NULL);
 g_free(vexp->handler.serial);
 return -EADDRNOTAVAIL;
 }
@@ -312,6 +330,7 @@ static void vu_blk_exp_delete(BlockExport *exp)
 
 blk_remove_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
 vexp);
+blk_set_dev_ops(exp->blk, NULL, NULL);
 g_free(vexp->handler.serial);
 }
 
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
index 1622f8cfb3..2e6b640050 100644
--- a/util/vhost-user-server.c
+++ b/util/vhost-user-server.c
@@ -78,17 +78,23 @@ static void panic_cb(VuDev *vu_dev, const char *buf)
 void vhost_user_server_inc_in_flight(VuServer *server)
 {
 assert(!server->wait_idle);
-server->in_flight++;
+qatomic_inc(&server->in_flight);
 }
 
 void vhost_user_server_dec_in_flight(VuServer *server)
 {
-server->in_flight--;
-if (server->wait_idle && !server->in_flight) {
-aio_co_wake(server->co_trip);
+if (qatomic_fetch_dec(&server->in_flight) == 1) {
+if (server->wait_idl

[PATCH 01/13] virtio-scsi: avoid race between unplug and transport event

2023-04-03 Thread Stefan Hajnoczi
Only report a transport reset event to the guest after the SCSIDevice
has been unrealized by qdev_simple_device_unplug_cb().

qdev_simple_device_unplug_cb() sets the SCSIDevice's qdev.realized field
to false so that scsi_device_find/get() no longer see it.

scsi_target_emulate_report_luns() also needs to be updated to filter out
SCSIDevices that are unrealized.

These changes ensure that the guest driver does not see the SCSIDevice
that's being unplugged if it responds very quickly to the transport
reset event.

Signed-off-by: Stefan Hajnoczi 
---
 hw/scsi/scsi-bus.c|  3 ++-
 hw/scsi/virtio-scsi.c | 18 +-
 2 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index c97176110c..f9bd064833 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -487,7 +487,8 @@ static bool scsi_target_emulate_report_luns(SCSITargetReq *r)
 DeviceState *qdev = kid->child;
 SCSIDevice *dev = SCSI_DEVICE(qdev);
 
-if (dev->channel == channel && dev->id == id && dev->lun != 0) {
+if (dev->channel == channel && dev->id == id && dev->lun != 0 &&
+qatomic_load_acquire(&dev->qdev.realized)) {
 store_lun(tmp, dev->lun);
 g_byte_array_append(buf, tmp, 8);
 len += 8;
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 612c525d9d..000961446c 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -1063,15 +1063,6 @@ static void virtio_scsi_hotunplug(HotplugHandler *hotplug_dev, DeviceState *dev,
 SCSIDevice *sd = SCSI_DEVICE(dev);
 AioContext *ctx = s->ctx ?: qemu_get_aio_context();
 
-if (virtio_vdev_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG)) {
-virtio_scsi_acquire(s);
-virtio_scsi_push_event(s, sd,
-   VIRTIO_SCSI_T_TRANSPORT_RESET,
-   VIRTIO_SCSI_EVT_RESET_REMOVED);
-scsi_bus_set_ua(&s->bus, SENSE_CODE(REPORTED_LUNS_CHANGED));
-virtio_scsi_release(s);
-}
-
 aio_disable_external(ctx);
 qdev_simple_device_unplug_cb(hotplug_dev, dev, errp);
 aio_enable_external(ctx);
@@ -1082,6 +1073,15 @@ static void virtio_scsi_hotunplug(HotplugHandler *hotplug_dev, DeviceState *dev,
 blk_set_aio_context(sd->conf.blk, qemu_get_aio_context(), NULL);
 virtio_scsi_release(s);
 }
+
+if (virtio_vdev_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG)) {
+virtio_scsi_acquire(s);
+virtio_scsi_push_event(s, sd,
+   VIRTIO_SCSI_T_TRANSPORT_RESET,
+   VIRTIO_SCSI_EVT_RESET_REMOVED);
+scsi_bus_set_ua(&s->bus, SENSE_CODE(REPORTED_LUNS_CHANGED));
+virtio_scsi_release(s);
+}
 }
 
 static struct SCSIBusInfo virtio_scsi_scsi_info = {
-- 
2.39.2




[PATCH 03/13] block/export: only acquire AioContext once for vhost_user_server_stop()

2023-04-03 Thread Stefan Hajnoczi
vhost_user_server_stop() uses AIO_WAIT_WHILE(). AIO_WAIT_WHILE()
requires that the AioContext lock is acquired exactly once, not
recursively.

Since blk_exp_request_shutdown() already acquires the AioContext it
shouldn't be acquired again in vhost_user_server_stop().

Signed-off-by: Stefan Hajnoczi 
---
 util/vhost-user-server.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
index 40f36ea214..5b6216069c 100644
--- a/util/vhost-user-server.c
+++ b/util/vhost-user-server.c
@@ -346,10 +346,9 @@ static void vu_accept(QIONetListener *listener, QIOChannelSocket *sioc,
 aio_context_release(server->ctx);
 }
 
+/* server->ctx acquired by caller */
 void vhost_user_server_stop(VuServer *server)
 {
-aio_context_acquire(server->ctx);
-
 qemu_bh_delete(server->restart_listener_bh);
 server->restart_listener_bh = NULL;
 
@@ -366,8 +365,6 @@ void vhost_user_server_stop(VuServer *server)
 AIO_WAIT_WHILE(server->ctx, server->co_trip);
 }
 
-aio_context_release(server->ctx);
-
 if (server->listener) {
 qio_net_listener_disconnect(server->listener);
 object_unref(OBJECT(server->listener));
-- 
2.39.2




[PATCH 07/13] virtio: do not set is_external=true on host notifiers

2023-04-03 Thread Stefan Hajnoczi
Host notifiers trigger virtqueue processing. There are critical sections
when new I/O requests must not be submitted because they would cause
interference.

In the past this was solved using aio_set_event_notifier()'s
is_external=true argument, which disables fd monitoring between
aio_disable/enable_external() calls. This API is not multi-queue block
layer friendly because it requires knowledge of the specific AioContext.
In a multi-queue block layer world any thread can submit I/O and we
don't know which AioContexts are currently involved.

virtio-blk and virtio-scsi are the only users that depend on
is_external=true. Both rely on the block layer, where we can take
advantage of the existing request queuing behavior that happens during
drained sections. The block layer's drained sections are the only user
of aio_disable_external().

After this patch the virtqueues will be processed during drained
section, but submitted I/O requests will be queued in the BlockBackend.
Queued requests are resumed when the drained section ends. Therefore,
the BlockBackend is still quiesced during drained sections but we no
longer rely on is_external=true to achieve this.

Note that virtqueues have a finite size, so queuing requests does not
lead to unbounded memory usage.

Signed-off-by: Stefan Hajnoczi 
---
 hw/virtio/virtio.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 98c4819fcc..dcd7aabb4e 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -3491,7 +3491,7 @@ static void virtio_queue_host_notifier_aio_poll_end(EventNotifier *n)
 
 void virtio_queue_aio_attach_host_notifier(VirtQueue *vq, AioContext *ctx)
 {
-aio_set_event_notifier(ctx, &vq->host_notifier, true,
+aio_set_event_notifier(ctx, &vq->host_notifier, false,
virtio_queue_host_notifier_read,
virtio_queue_host_notifier_aio_poll,
virtio_queue_host_notifier_aio_poll_ready);
@@ -3508,14 +3508,14 @@ void virtio_queue_aio_attach_host_notifier(VirtQueue *vq, AioContext *ctx)
  */
 void virtio_queue_aio_attach_host_notifier_no_poll(VirtQueue *vq, AioContext *ctx)
 {
-aio_set_event_notifier(ctx, &vq->host_notifier, true,
+aio_set_event_notifier(ctx, &vq->host_notifier, false,
virtio_queue_host_notifier_read,
NULL, NULL);
 }
 
 void virtio_queue_aio_detach_host_notifier(VirtQueue *vq, AioContext *ctx)
 {
-aio_set_event_notifier(ctx, &vq->host_notifier, true, NULL, NULL, NULL);
+aio_set_event_notifier(ctx, &vq->host_notifier, false, NULL, NULL, NULL);
 /* Test and clear notifier before after disabling event,
  * in case poll callback didn't have time to run. */
 virtio_queue_host_notifier_read(&vq->host_notifier);
-- 
2.39.2




[PATCH 13/13] aio: remove aio_disable_external() API

2023-04-03 Thread Stefan Hajnoczi
All callers now pass is_external=false to aio_set_fd_handler() and
aio_set_event_notifier(). The aio_disable_external() API that
temporarily disables fd handlers that were registered is_external=true
is therefore dead code.

Remove aio_disable_external(), aio_enable_external(), and the
is_external arguments to aio_set_fd_handler() and
aio_set_event_notifier().

The entire test-fdmon-epoll test is removed because its sole purpose was
testing aio_disable_external().

Parts of this patch were generated using the following coccinelle
(https://coccinelle.lip6.fr/) semantic patch:

  @@
  expression ctx, fd, is_external, io_read, io_write, io_poll, io_poll_ready, opaque;
  @@
  - aio_set_fd_handler(ctx, fd, is_external, io_read, io_write, io_poll, io_poll_ready, opaque)
  + aio_set_fd_handler(ctx, fd, io_read, io_write, io_poll, io_poll_ready, opaque)

  @@
  expression ctx, notifier, is_external, io_read, io_poll, io_poll_ready;
  @@
  - aio_set_event_notifier(ctx, notifier, is_external, io_read, io_poll, io_poll_ready)
  + aio_set_event_notifier(ctx, notifier, io_read, io_poll, io_poll_ready)

Signed-off-by: Stefan Hajnoczi 
---
 include/block/aio.h   | 55 --
 util/aio-posix.h  |  1 -
 block.c   |  7 
 block/blkio.c | 15 +++
 block/curl.c  | 10 ++---
 block/export/fuse.c   |  8 ++--
 block/export/vduse-blk.c  | 10 ++---
 block/io.c|  2 -
 block/io_uring.c  |  4 +-
 block/iscsi.c |  3 +-
 block/linux-aio.c |  4 +-
 block/nfs.c   |  5 +--
 block/nvme.c  |  8 ++--
 block/ssh.c   |  4 +-
 block/win32-aio.c |  6 +--
 hw/i386/kvm/xen_xenstore.c|  2 +-
 hw/virtio/virtio.c|  6 +--
 hw/xen/xen-bus.c  |  6 +--
 io/channel-command.c  |  6 +--
 io/channel-file.c |  3 +-
 io/channel-socket.c   |  3 +-
 migration/rdma.c  | 16 
 tests/unit/test-aio.c | 27 +
 tests/unit/test-fdmon-epoll.c | 73 ---
 util/aio-posix.c  | 20 +++---
 util/aio-win32.c  |  8 +---
 util/async.c  |  3 +-
 util/fdmon-epoll.c| 10 -
 util/fdmon-io_uring.c |  8 +---
 util/fdmon-poll.c |  3 +-
 util/main-loop.c  |  7 ++--
 util/qemu-coroutine-io.c  |  7 ++--
 util/vhost-user-server.c  | 11 +++---
 tests/unit/meson.build|  3 --
 34 files changed, 75 insertions(+), 289 deletions(-)
 delete mode 100644 tests/unit/test-fdmon-epoll.c

diff --git a/include/block/aio.h b/include/block/aio.h
index e267d918fd..d4ce01ea08 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -467,7 +467,6 @@ bool aio_poll(AioContext *ctx, bool blocking);
  */
 void aio_set_fd_handler(AioContext *ctx,
 int fd,
-bool is_external,
 IOHandler *io_read,
 IOHandler *io_write,
 AioPollFn *io_poll,
@@ -483,7 +482,6 @@ void aio_set_fd_handler(AioContext *ctx,
  */
 void aio_set_event_notifier(AioContext *ctx,
 EventNotifier *notifier,
-bool is_external,
 EventNotifierHandler *io_read,
 AioPollFn *io_poll,
 EventNotifierHandler *io_poll_ready);
@@ -612,59 +610,6 @@ static inline void aio_timer_init(AioContext *ctx,
  */
 int64_t aio_compute_timeout(AioContext *ctx);
 
-/**
- * aio_disable_external:
- * @ctx: the aio context
- *
- * Disable the further processing of external clients.
- */
-static inline void aio_disable_external(AioContext *ctx)
-{
-qatomic_inc(&ctx->external_disable_cnt);
-}
-
-/**
- * aio_enable_external:
- * @ctx: the aio context
- *
- * Enable the processing of external clients.
- */
-static inline void aio_enable_external(AioContext *ctx)
-{
-int old;
-
-old = qatomic_fetch_dec(&ctx->external_disable_cnt);
-assert(old > 0);
-if (old == 1) {
-/* Kick event loop so it re-arms file descriptors */
-aio_notify(ctx);
-}
-}
-
-/**
- * aio_external_disabled:
- * @ctx: the aio context
- *
- * Return true if the external clients are disabled.
- */
-static inline bool aio_external_disabled(AioContext *ctx)
-{
-return qatomic_read(&ctx->external_disable_cnt);
-}
-
-/**
- * aio_node_check:
- * @ctx: the aio context
- * @is_external: Whether or not the checked node is an external event source.
- *
- * Check if the node's is_external flag is okay to be polled by the ctx at this
- * moment. True means green light.
- */
-static inline bool aio_node_check(AioContext *ctx, bool is_external)
-{
-return !is_external || !qatomic_read(&ctx->external_disable_cnt);
-}
-
 /**
  * aio_co_schedule:
 * @ctx: the aio context

[PATCH 00/13] block: remove aio_disable_external() API

2023-04-03 Thread Stefan Hajnoczi
The aio_disable_external() API temporarily suspends file descriptor monitoring
in the event loop. The block layer uses this to prevent new I/O requests being
submitted from the guest and elsewhere between bdrv_drained_begin() and
bdrv_drained_end().

While the block layer still needs to prevent new I/O requests in drained
sections, the aio_disable_external() API can be replaced with
.drained_begin/end/poll() callbacks that have been added to BdrvChildClass and
BlockDevOps.

This newer .drained_begin/end/poll() approach is attractive because it works
without specifying a specific AioContext. The block layer is moving towards
multi-queue and that means multiple AioContexts may be processing I/O
simultaneously.

The aio_disable_external() API was always somewhat hacky. It suspends
monitoring of all file descriptors that were registered with is_external=true,
even if they have nothing to do with the BlockDriverState graph nodes that are
being drained.
It's better to solve a block layer problem in the block layer than to have an
odd event loop API solution.

That covers the motivation for this change, now on to the specifics of this
series:

While it would be nice if a single conceptual approach could be applied to all
is_external=true file descriptors, I ended up looking at callers on a
case-by-case basis. There are two general ways I migrated code away from
is_external=true:

1. Block exports are typically best off unregistering fds in .drained_begin()
   and registering them again in .drained_end(). The .drained_poll() function
   waits for in-flight requests to finish using a reference counter.

2. Emulated storage controllers like virtio-blk and virtio-scsi are a little
   simpler. They can rely on BlockBackend's request-queuing-during-drain
   feature. Guest I/O request coroutines are suspended in a drained section and
   resume upon the end of the drained section.

The first two virtio-scsi patches were already sent as a separate series. I
included them because they are necessary in order to fully remove
aio_disable_external().

Based-on: 087bc644b7634436ca9d52fe58ba9234e2bef026 (kevin/block-next)

Stefan Hajnoczi (13):
  virtio-scsi: avoid race between unplug and transport event
  virtio-scsi: stop using aio_disable_external() during unplug
  block/export: only acquire AioContext once for
vhost_user_server_stop()
  util/vhost-user-server: rename refcount to in_flight counter
  block/export: wait for vhost-user-blk requests when draining
  block/export: stop using is_external in vhost-user-blk server
  virtio: do not set is_external=true on host notifiers
  hw/xen: do not use aio_set_fd_handler(is_external=true) in
xen_xenstore
  hw/xen: do not set is_external=true on evtchn fds
  block/export: rewrite vduse-blk drain code
  block/fuse: take AioContext lock around blk_exp_ref/unref()
  block/fuse: do not set is_external=true on FUSE fd
  aio: remove aio_disable_external() API

 include/block/aio.h  |  55 ---
 include/qemu/vhost-user-server.h |   8 +-
 util/aio-posix.h |   1 -
 block.c  |   7 --
 block/blkio.c|  15 +--
 block/curl.c |  10 +-
 block/export/fuse.c  |  62 -
 block/export/vduse-blk.c | 132 +++
 block/export/vhost-user-blk-server.c |  73 +--
 block/io.c   |   2 -
 block/io_uring.c |   4 +-
 block/iscsi.c|   3 +-
 block/linux-aio.c|   4 +-
 block/nfs.c  |   5 +-
 block/nvme.c |   8 +-
 block/ssh.c  |   4 +-
 block/win32-aio.c|   6 +-
 hw/i386/kvm/xen_xenstore.c   |   2 +-
 hw/scsi/scsi-bus.c   |   3 +-
 hw/scsi/scsi-disk.c  |   1 +
 hw/scsi/virtio-scsi.c|  21 ++---
 hw/virtio/virtio.c   |   6 +-
 hw/xen/xen-bus.c |   6 +-
 io/channel-command.c |   6 +-
 io/channel-file.c|   3 +-
 io/channel-socket.c  |   3 +-
 migration/rdma.c |  16 ++--
 tests/unit/test-aio.c|  27 +-
 tests/unit/test-fdmon-epoll.c|  73 ---
 util/aio-posix.c |  20 +---
 util/aio-win32.c |   8 +-
 util/async.c |   3 +-
 util/fdmon-epoll.c   |  10 --
 util/fdmon-io_uring.c|   8 +-
 util/fdmon-poll.c|   3 +-
 util/main-loop.c |   7 +-
 util/qemu-coroutine-io.c |   7 +-
 util/vhost-user-server.c |  38 
 tests/unit/meson.build   |   3 -
 39 files changed, 298 insertions(+), 375 deletions(-)
 delete mode 100644 tests/unit/test-fdmon-epoll.c

-- 
2.39.2




[PATCH 12/13] block/fuse: do not set is_external=true on FUSE fd

2023-04-03 Thread Stefan Hajnoczi
This is part of ongoing work to remove the aio_disable_external() API.

Use BlockDevOps .drained_begin/end/poll() instead of
aio_set_fd_handler(is_external=true).

As a side-effect the FUSE export now follows AioContext changes like the
other export types.

Signed-off-by: Stefan Hajnoczi 
---
 block/export/fuse.c | 58 +++--
 1 file changed, 56 insertions(+), 2 deletions(-)

diff --git a/block/export/fuse.c b/block/export/fuse.c
index 18394f9e07..83bccf046b 100644
--- a/block/export/fuse.c
+++ b/block/export/fuse.c
@@ -50,6 +50,7 @@ typedef struct FuseExport {
 
 struct fuse_session *fuse_session;
 struct fuse_buf fuse_buf;
+unsigned int in_flight; /* atomic */
 bool mounted, fd_handler_set_up;
 
 char *mountpoint;
@@ -78,6 +79,42 @@ static void read_from_fuse_export(void *opaque);
 static bool is_regular_file(const char *path, Error **errp);
 
 
+static void fuse_export_drained_begin(void *opaque)
+{
+FuseExport *exp = opaque;
+
+aio_set_fd_handler(exp->common.ctx,
+   fuse_session_fd(exp->fuse_session), false,
+   NULL, NULL, NULL, NULL, NULL);
+exp->fd_handler_set_up = false;
+}
+
+static void fuse_export_drained_end(void *opaque)
+{
+FuseExport *exp = opaque;
+
+/* Refresh AioContext in case it changed */
+exp->common.ctx = blk_get_aio_context(exp->common.blk);
+
+aio_set_fd_handler(exp->common.ctx,
+   fuse_session_fd(exp->fuse_session), false,
+   read_from_fuse_export, NULL, NULL, NULL, exp);
+exp->fd_handler_set_up = true;
+}
+
+static bool fuse_export_drained_poll(void *opaque)
+{
+FuseExport *exp = opaque;
+
+return qatomic_read(&exp->in_flight) > 0;
+}
+
+static const BlockDevOps fuse_export_blk_dev_ops = {
+.drained_begin = fuse_export_drained_begin,
+.drained_end   = fuse_export_drained_end,
+.drained_poll  = fuse_export_drained_poll,
+};
+
 static int fuse_export_create(BlockExport *blk_exp,
   BlockExportOptions *blk_exp_args,
   Error **errp)
@@ -101,6 +138,15 @@ static int fuse_export_create(BlockExport *blk_exp,
 }
 }
 
+blk_set_dev_ops(exp->common.blk, &fuse_export_blk_dev_ops, exp);
+
+/*
+ * We handle draining ourselves using an in-flight counter and by disabling
+ * the FUSE fd handler. Do not queue BlockBackend requests, they need to
+ * complete so the in-flight counter reaches zero.
+ */
+blk_set_disable_request_queuing(exp->common.blk, true);
+
 init_exports_table();
 
 /*
@@ -224,7 +270,7 @@ static int setup_fuse_export(FuseExport *exp, const char *mountpoint,
 g_hash_table_insert(exports, g_strdup(mountpoint), NULL);
 
 aio_set_fd_handler(exp->common.ctx,
-   fuse_session_fd(exp->fuse_session), true,
+   fuse_session_fd(exp->fuse_session), false,
read_from_fuse_export, NULL, NULL, NULL, exp);
 exp->fd_handler_set_up = true;
 
@@ -248,6 +294,8 @@ static void read_from_fuse_export(void *opaque)
 blk_exp_ref(&exp->common);
 aio_context_release(exp->common.ctx);
 
+qatomic_inc(&exp->in_flight);
+
 do {
 ret = fuse_session_receive_buf(exp->fuse_session, &exp->fuse_buf);
 } while (ret == -EINTR);
@@ -258,6 +306,10 @@ static void read_from_fuse_export(void *opaque)
 fuse_session_process_buf(exp->fuse_session, &exp->fuse_buf);
 
 out:
+if (qatomic_fetch_dec(&exp->in_flight) == 1) {
+aio_wait_kick(); /* wake AIO_WAIT_WHILE() */
+}
+
 aio_context_acquire(exp->common.ctx);
 blk_exp_unref(&exp->common);
 aio_context_release(exp->common.ctx);
@@ -272,7 +324,7 @@ static void fuse_export_shutdown(BlockExport *blk_exp)
 
 if (exp->fd_handler_set_up) {
 aio_set_fd_handler(exp->common.ctx,
-   fuse_session_fd(exp->fuse_session), true,
+   fuse_session_fd(exp->fuse_session), false,
NULL, NULL, NULL, NULL, NULL);
 exp->fd_handler_set_up = false;
 }
@@ -291,6 +343,8 @@ static void fuse_export_delete(BlockExport *blk_exp)
 {
 FuseExport *exp = container_of(blk_exp, FuseExport, common);
 
+blk_set_dev_ops(exp->common.blk, NULL, NULL);
+
 if (exp->fuse_session) {
 if (exp->mounted) {
 fuse_session_unmount(exp->fuse_session);
-- 
2.39.2




[PATCH 08/13] hw/xen: do not use aio_set_fd_handler(is_external=true) in xen_xenstore

2023-04-03 Thread Stefan Hajnoczi
There is no need to suspend activity between aio_disable_external() and
aio_enable_external(), which is mainly used for the block layer's drain
operation.

This is part of ongoing work to remove the aio_disable_external() API.

Signed-off-by: Stefan Hajnoczi 
---
 hw/i386/kvm/xen_xenstore.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/i386/kvm/xen_xenstore.c b/hw/i386/kvm/xen_xenstore.c
index 900679af8a..6e81bc8791 100644
--- a/hw/i386/kvm/xen_xenstore.c
+++ b/hw/i386/kvm/xen_xenstore.c
@@ -133,7 +133,7 @@ static void xen_xenstore_realize(DeviceState *dev, Error **errp)
 error_setg(errp, "Xenstore evtchn port init failed");
 return;
 }
-aio_set_fd_handler(qemu_get_aio_context(), xen_be_evtchn_fd(s->eh), true,
+aio_set_fd_handler(qemu_get_aio_context(), xen_be_evtchn_fd(s->eh), false,
xen_xenstore_event, NULL, NULL, NULL, s);
 
 s->impl = xs_impl_create(xen_domid);
-- 
2.39.2




[PATCH 06/13] block/export: stop using is_external in vhost-user-blk server

2023-04-03 Thread Stefan Hajnoczi
vhost-user activity must be suspended during bdrv_drained_begin/end().
This prevents new requests from interfering with whatever is happening
in the drained section.

Previously this was done using aio_set_fd_handler()'s is_external
argument. In a multi-queue block layer world the aio_disable_external()
API cannot be used since multiple AioContext may be processing I/O, not
just one.

Switch to BlockDevOps->drained_begin/end() callbacks.

Signed-off-by: Stefan Hajnoczi 
---
 block/export/vhost-user-blk-server.c | 43 ++--
 util/vhost-user-server.c | 10 +++
 2 files changed, 26 insertions(+), 27 deletions(-)

diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
index dbf5207162..6e1bc196fb 100644
--- a/block/export/vhost-user-blk-server.c
+++ b/block/export/vhost-user-blk-server.c
@@ -207,22 +207,6 @@ static const VuDevIface vu_blk_iface = {
 .process_msg   = vu_blk_process_msg,
 };
 
-static void blk_aio_attached(AioContext *ctx, void *opaque)
-{
-VuBlkExport *vexp = opaque;
-
-vexp->export.ctx = ctx;
-vhost_user_server_attach_aio_context(&vexp->vu_server, ctx);
-}
-
-static void blk_aio_detach(void *opaque)
-{
-VuBlkExport *vexp = opaque;
-
-vhost_user_server_detach_aio_context(&vexp->vu_server);
-vexp->export.ctx = NULL;
-}
-
 static void
 vu_blk_initialize_config(BlockDriverState *bs,
  struct virtio_blk_config *config,
@@ -254,6 +238,25 @@ static void vu_blk_exp_request_shutdown(BlockExport *exp)
 vhost_user_server_stop(&vexp->vu_server);
 }
 
+/* Called with vexp->export.ctx acquired */
+static void vu_blk_drained_begin(void *opaque)
+{
+VuBlkExport *vexp = opaque;
+
+vhost_user_server_detach_aio_context(&vexp->vu_server);
+}
+
+/* Called with vexp->export.blk AioContext acquired */
+static void vu_blk_drained_end(void *opaque)
+{
+VuBlkExport *vexp = opaque;
+
+/* Refresh AioContext in case it changed */
+vexp->export.ctx = blk_get_aio_context(vexp->export.blk);
+
+vhost_user_server_attach_aio_context(&vexp->vu_server, vexp->export.ctx);
+}
+
 /*
  * Ensures that bdrv_drained_begin() waits until in-flight requests complete.
  *
@@ -267,6 +270,8 @@ static bool vu_blk_drained_poll(void *opaque)
 }
 
 static const BlockDevOps vu_blk_dev_ops = {
+.drained_begin = vu_blk_drained_begin,
+.drained_end   = vu_blk_drained_end,
 .drained_poll  = vu_blk_drained_poll,
 };
 
@@ -309,13 +314,9 @@ static int vu_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
  logical_block_size, num_queues);
 
 blk_set_dev_ops(exp->blk, &vu_blk_dev_ops, vexp);
-blk_add_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
- vexp);
 
 if (!vhost_user_server_start(&vexp->vu_server, vu_opts->addr, exp->ctx,
  num_queues, &vu_blk_iface, errp)) {
-blk_remove_aio_context_notifier(exp->blk, blk_aio_attached,
-blk_aio_detach, vexp);
 blk_set_dev_ops(exp->blk, NULL, NULL);
 g_free(vexp->handler.serial);
 return -EADDRNOTAVAIL;
@@ -328,8 +329,6 @@ static void vu_blk_exp_delete(BlockExport *exp)
 {
 VuBlkExport *vexp = container_of(exp, VuBlkExport, export);
 
-blk_remove_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
-vexp);
 blk_set_dev_ops(exp->blk, NULL, NULL);
 g_free(vexp->handler.serial);
 }
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
index 2e6b640050..332aea9306 100644
--- a/util/vhost-user-server.c
+++ b/util/vhost-user-server.c
@@ -278,7 +278,7 @@ set_watch(VuDev *vu_dev, int fd, int vu_evt,
 vu_fd_watch->fd = fd;
 vu_fd_watch->cb = cb;
 qemu_socket_set_nonblock(fd);
-aio_set_fd_handler(server->ioc->ctx, fd, true, kick_handler,
+aio_set_fd_handler(server->ioc->ctx, fd, false, kick_handler,
NULL, NULL, NULL, vu_fd_watch);
 vu_fd_watch->vu_dev = vu_dev;
 vu_fd_watch->pvt = pvt;
@@ -299,7 +299,7 @@ static void remove_watch(VuDev *vu_dev, int fd)
 if (!vu_fd_watch) {
 return;
 }
-aio_set_fd_handler(server->ioc->ctx, fd, true,
+aio_set_fd_handler(server->ioc->ctx, fd, false,
NULL, NULL, NULL, NULL, NULL);
 
 QTAILQ_REMOVE(&server->vu_fd_watches, vu_fd_watch, next);
@@ -362,7 +362,7 @@ void vhost_user_server_stop(VuServer *server)
 VuFdWatch *vu_fd_watch;
 
 QTAILQ_FOREACH(vu_fd_watch, &server->vu_fd_watches, next) {
-aio_set_fd_handler(server->ctx, vu_fd_watch->fd, true,
+aio_set_fd_handler(server->ctx, vu_fd_watch->fd, false,
NULL, NULL, NULL, NULL, vu_fd_watch);
 }
 
@@ -403,7 +403,7 @@ void vhost_user_server_attach_aio_context(VuServer *server, AioContext *ctx)
  

[PATCH 09/13] hw/xen: do not set is_external=true on evtchn fds

2023-04-03 Thread Stefan Hajnoczi
is_external=true suspends fd handlers between aio_disable_external() and
aio_enable_external(). The block layer's drain operation uses this
mechanism to prevent new I/O from sneaking in between
bdrv_drained_begin() and bdrv_drained_end().

The xen-block device actually works fine with is_external=false because
BlockBackend requests are already queued between bdrv_drained_begin()
and bdrv_drained_end(). Since the Xen ring size is finite, request
queuing will stop once the ring is full and memory usage is bounded.
After bdrv_drained_end() the BlockBackend requests will resume and
xen-block's processing will continue.

This is part of ongoing work to remove the aio_disable_external() API.

Signed-off-by: Stefan Hajnoczi 
---
 hw/xen/xen-bus.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/xen/xen-bus.c b/hw/xen/xen-bus.c
index c59850b1de..c4fd26abe1 100644
--- a/hw/xen/xen-bus.c
+++ b/hw/xen/xen-bus.c
@@ -842,11 +842,11 @@ void xen_device_set_event_channel_context(XenDevice *xendev,
 }
 
 if (channel->ctx)
-aio_set_fd_handler(channel->ctx, qemu_xen_evtchn_fd(channel->xeh), true,
+aio_set_fd_handler(channel->ctx, qemu_xen_evtchn_fd(channel->xeh), false,
NULL, NULL, NULL, NULL, NULL);
 
 channel->ctx = ctx;
-aio_set_fd_handler(channel->ctx, qemu_xen_evtchn_fd(channel->xeh), true,
+aio_set_fd_handler(channel->ctx, qemu_xen_evtchn_fd(channel->xeh), false,
xen_device_event, NULL, xen_device_poll, NULL, channel);
 }
 
@@ -920,7 +920,7 @@ void xen_device_unbind_event_channel(XenDevice *xendev,
 
 QLIST_REMOVE(channel, list);
 
-aio_set_fd_handler(channel->ctx, qemu_xen_evtchn_fd(channel->xeh), true,
+aio_set_fd_handler(channel->ctx, qemu_xen_evtchn_fd(channel->xeh), false,
NULL, NULL, NULL, NULL, NULL);
 
 if (qemu_xen_evtchn_unbind(channel->xeh, channel->local_port) < 0) {
-- 
2.39.2




[PATCH 04/13] util/vhost-user-server: rename refcount to in_flight counter

2023-04-03 Thread Stefan Hajnoczi
The VuServer object has a refcount field and ref/unref APIs. The name is
confusing because it's actually an in-flight request counter instead of
a refcount.

Normally a refcount destroys the object upon reaching zero. The VuServer
counter is used to wake up the vhost-user coroutine when there are no
more requests.

Avoid confusion by renaming refcount and ref/unref to in_flight and
inc/dec.

Signed-off-by: Stefan Hajnoczi 
---
 include/qemu/vhost-user-server.h |  6 +++---
 block/export/vhost-user-blk-server.c | 11 +++
 util/vhost-user-server.c | 14 +++---
 3 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/include/qemu/vhost-user-server.h b/include/qemu/vhost-user-server.h
index 25c72433ca..bc0ac9ddb6 100644
--- a/include/qemu/vhost-user-server.h
+++ b/include/qemu/vhost-user-server.h
@@ -41,7 +41,7 @@ typedef struct {
 const VuDevIface *vu_iface;
 
 /* Protected by ctx lock */
-unsigned int refcount;
+unsigned int in_flight;
 bool wait_idle;
 VuDev vu_dev;
 QIOChannel *ioc; /* The I/O channel with the client */
@@ -60,8 +60,8 @@ bool vhost_user_server_start(VuServer *server,
 
 void vhost_user_server_stop(VuServer *server);
 
-void vhost_user_server_ref(VuServer *server);
-void vhost_user_server_unref(VuServer *server);
+void vhost_user_server_inc_in_flight(VuServer *server);
+void vhost_user_server_dec_in_flight(VuServer *server);
 
 void vhost_user_server_attach_aio_context(VuServer *server, AioContext *ctx);
 void vhost_user_server_detach_aio_context(VuServer *server);
diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
index 3409d9e02e..e93f2ed6b4 100644
--- a/block/export/vhost-user-blk-server.c
+++ b/block/export/vhost-user-blk-server.c
@@ -49,7 +49,10 @@ static void vu_blk_req_complete(VuBlkReq *req, size_t in_len)
 free(req);
 }
 
-/* Called with server refcount increased, must decrease before returning */
+/*
+ * Called with server in_flight counter increased, must decrease before
+ * returning.
+ */
 static void coroutine_fn vu_blk_virtio_process_req(void *opaque)
 {
 VuBlkReq *req = opaque;
@@ -67,12 +70,12 @@ static void coroutine_fn vu_blk_virtio_process_req(void *opaque)
 in_num, out_num);
 if (in_len < 0) {
 free(req);
-vhost_user_server_unref(server);
+vhost_user_server_dec_in_flight(server);
 return;
 }
 
 vu_blk_req_complete(req, in_len);
-vhost_user_server_unref(server);
+vhost_user_server_dec_in_flight(server);
 }
 
 static void vu_blk_process_vq(VuDev *vu_dev, int idx)
@@ -94,7 +97,7 @@ static void vu_blk_process_vq(VuDev *vu_dev, int idx)
 Coroutine *co =
 qemu_coroutine_create(vu_blk_virtio_process_req, req);
 
-vhost_user_server_ref(server);
+vhost_user_server_inc_in_flight(server);
 qemu_coroutine_enter(co);
 }
 }
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
index 5b6216069c..1622f8cfb3 100644
--- a/util/vhost-user-server.c
+++ b/util/vhost-user-server.c
@@ -75,16 +75,16 @@ static void panic_cb(VuDev *vu_dev, const char *buf)
 error_report("vu_panic: %s", buf);
 }
 
-void vhost_user_server_ref(VuServer *server)
+void vhost_user_server_inc_in_flight(VuServer *server)
 {
 assert(!server->wait_idle);
-server->refcount++;
+server->in_flight++;
 }
 
-void vhost_user_server_unref(VuServer *server)
+void vhost_user_server_dec_in_flight(VuServer *server)
 {
-server->refcount--;
-if (server->wait_idle && !server->refcount) {
+server->in_flight--;
+if (server->wait_idle && !server->in_flight) {
 aio_co_wake(server->co_trip);
 }
 }
@@ -192,13 +192,13 @@ static coroutine_fn void vu_client_trip(void *opaque)
 /* Keep running */
 }
 
-if (server->refcount) {
+if (server->in_flight) {
 /* Wait for requests to complete before we can unmap the memory */
 server->wait_idle = true;
 qemu_coroutine_yield();
 server->wait_idle = false;
 }
-assert(server->refcount == 0);
+assert(server->in_flight == 0);
 
 vu_deinit(vu_dev);
 
-- 
2.39.2




[PATCH 10/13] block/export: rewrite vduse-blk drain code

2023-04-03 Thread Stefan Hajnoczi
vduse_blk_detach_ctx() waits for in-flight requests using
AIO_WAIT_WHILE(). This is not allowed according to a comment in
bdrv_set_aio_context_commit():

  /*
   * Take the old AioContext when detaching it from bs.
   * At this point, new_context lock is already acquired, and we are now
   * also taking old_context. This is safe as long as bdrv_detach_aio_context
   * does not call AIO_POLL_WHILE().
   */

Use this opportunity to rewrite the drain code in vduse-blk:

- Use the BlockExport refcount so that vduse_blk_exp_delete() is only
  called when there are no more requests in flight.

- Implement .drained_poll() so in-flight request coroutines are stopped
  by the time .bdrv_detach_aio_context() is called.

- Remove AIO_WAIT_WHILE() from vduse_blk_detach_ctx() to solve the
  .bdrv_detach_aio_context() constraint violation. It's no longer
  needed due to the previous changes.

- Always handle the VDUSE file descriptor, even in drained sections. The
  VDUSE file descriptor doesn't submit I/O, so it's safe to handle it in
  drained sections. This ensures that the VDUSE kernel code gets a fast
  response.

- Suspend virtqueue fd handlers in .drained_begin() and resume them in
  .drained_end(). This eliminates the need for the
  aio_set_fd_handler(is_external=true) flag, which is being removed from
  QEMU.

This is a long list but splitting it into individual commits would
probably lead to git bisect failures - the changes are all related.

Signed-off-by: Stefan Hajnoczi 
---
 block/export/vduse-blk.c | 132 +++
 1 file changed, 93 insertions(+), 39 deletions(-)

diff --git a/block/export/vduse-blk.c b/block/export/vduse-blk.c
index f7ae44e3ce..35dc8fcf45 100644
--- a/block/export/vduse-blk.c
+++ b/block/export/vduse-blk.c
@@ -31,7 +31,8 @@ typedef struct VduseBlkExport {
 VduseDev *dev;
 uint16_t num_queues;
 char *recon_file;
-unsigned int inflight;
+unsigned int inflight; /* atomic */
+bool vqs_started;
 } VduseBlkExport;
 
 typedef struct VduseBlkReq {
@@ -41,13 +42,24 @@ typedef struct VduseBlkReq {
 
 static void vduse_blk_inflight_inc(VduseBlkExport *vblk_exp)
 {
-vblk_exp->inflight++;
+if (qatomic_fetch_inc(&vblk_exp->inflight) == 0) {
+/* Prevent export from being deleted */
+aio_context_acquire(vblk_exp->export.ctx);
+blk_exp_ref(&vblk_exp->export);
+aio_context_release(vblk_exp->export.ctx);
+}
 }
 
 static void vduse_blk_inflight_dec(VduseBlkExport *vblk_exp)
 {
-if (--vblk_exp->inflight == 0) {
+if (qatomic_fetch_dec(&vblk_exp->inflight) == 1) {
+/* Wake AIO_WAIT_WHILE() */
 aio_wait_kick();
+
+/* Now the export can be deleted */
+aio_context_acquire(vblk_exp->export.ctx);
+blk_exp_unref(&vblk_exp->export);
+aio_context_release(vblk_exp->export.ctx);
 }
 }
 
@@ -124,8 +136,12 @@ static void vduse_blk_enable_queue(VduseDev *dev, VduseVirtq *vq)
 {
 VduseBlkExport *vblk_exp = vduse_dev_get_priv(dev);
 
+if (!vblk_exp->vqs_started) {
+return; /* vduse_blk_drained_end() will start vqs later */
+}
+
 aio_set_fd_handler(vblk_exp->export.ctx, vduse_queue_get_fd(vq),
-   true, on_vduse_vq_kick, NULL, NULL, NULL, vq);
+   false, on_vduse_vq_kick, NULL, NULL, NULL, vq);
 /* Make sure we don't miss any kick after reconnecting */
 eventfd_write(vduse_queue_get_fd(vq), 1);
 }
@@ -133,9 +149,14 @@ static void vduse_blk_enable_queue(VduseDev *dev, VduseVirtq *vq)
 static void vduse_blk_disable_queue(VduseDev *dev, VduseVirtq *vq)
 {
 VduseBlkExport *vblk_exp = vduse_dev_get_priv(dev);
+int fd = vduse_queue_get_fd(vq);
 
-aio_set_fd_handler(vblk_exp->export.ctx, vduse_queue_get_fd(vq),
-   true, NULL, NULL, NULL, NULL, NULL);
+if (fd < 0) {
+return;
+}
+
+aio_set_fd_handler(vblk_exp->export.ctx, fd, false,
+   NULL, NULL, NULL, NULL, NULL);
 }
 
 static const VduseOps vduse_blk_ops = {
@@ -152,42 +173,19 @@ static void on_vduse_dev_kick(void *opaque)
 
 static void vduse_blk_attach_ctx(VduseBlkExport *vblk_exp, AioContext *ctx)
 {
-int i;
-
 aio_set_fd_handler(vblk_exp->export.ctx, vduse_dev_get_fd(vblk_exp->dev),
-   true, on_vduse_dev_kick, NULL, NULL, NULL,
+   false, on_vduse_dev_kick, NULL, NULL, NULL,
vblk_exp->dev);
 
-for (i = 0; i < vblk_exp->num_queues; i++) {
-VduseVirtq *vq = vduse_dev_get_queue(vblk_exp->dev, i);
-int fd = vduse_queue_get_fd(vq);
-
-if (fd < 0) {
-continue;
-}
-aio_set_fd_handler(vblk_exp->export.ctx, fd, true,
-   on_vduse_vq_kick, NULL, NULL, NULL, vq);
-}
+/* Virtqueues are handled by vduse_blk_drained_end() */
 }
 
 static void vduse_blk_detach_ctx(VduseBlkExport *vblk_exp)
 {
-int i;
-
-for (i = 0; i < vblk_exp

[PATCH 02/13] virtio-scsi: stop using aio_disable_external() during unplug

2023-04-03 Thread Stefan Hajnoczi
This patch is part of an effort to remove the aio_disable_external()
API because it does not fit in a multi-queue block layer world where
many AioContexts may be submitting requests to the same disk.

The SCSI emulation code is already in good shape to stop using
aio_disable_external(). It was only used by commit 9c5aad84da1c
("virtio-scsi: fixed virtio_scsi_ctx_check failed when detaching scsi
disk") to ensure that virtio_scsi_hotunplug() works while the guest
driver is submitting I/O.

Ensure virtio_scsi_hotunplug() is safe as follows:

1. qdev_simple_device_unplug_cb() -> qdev_unrealize() ->
   device_set_realized() calls qatomic_set(&dev->realized, false) so
   that future scsi_device_get() calls return NULL because they exclude
   SCSIDevices with realized=false.

   That means virtio-scsi will reject new I/O requests to this
   SCSIDevice with VIRTIO_SCSI_S_BAD_TARGET even while
   virtio_scsi_hotunplug() is still executing. We are protected against
   new requests!

2. Add a call to scsi_device_purge_requests() from scsi_unrealize() so
   that in-flight requests are cancelled synchronously. This ensures
   that no in-flight requests remain once qdev_simple_device_unplug_cb()
   returns.

Thanks to these two conditions we don't need aio_disable_external()
anymore.

Cc: Zhengui Li 
Signed-off-by: Stefan Hajnoczi 
---
 hw/scsi/scsi-disk.c   | 1 +
 hw/scsi/virtio-scsi.c | 3 ---
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 97c9b1c8cd..e01bd84541 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -2522,6 +2522,7 @@ static void scsi_realize(SCSIDevice *dev, Error **errp)
 
 static void scsi_unrealize(SCSIDevice *dev)
 {
+scsi_device_purge_requests(dev, SENSE_CODE(RESET));
 del_boot_device_lchs(&dev->qdev, NULL);
 }
 
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 000961446c..a02f9233ec 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -1061,11 +1061,8 @@ static void virtio_scsi_hotunplug(HotplugHandler *hotplug_dev, DeviceState *dev,
 VirtIODevice *vdev = VIRTIO_DEVICE(hotplug_dev);
 VirtIOSCSI *s = VIRTIO_SCSI(vdev);
 SCSIDevice *sd = SCSI_DEVICE(dev);
-AioContext *ctx = s->ctx ?: qemu_get_aio_context();
 
-aio_disable_external(ctx);
 qdev_simple_device_unplug_cb(hotplug_dev, dev, errp);
-aio_enable_external(ctx);
 
 if (s->ctx) {
 virtio_scsi_acquire(s);
-- 
2.39.2




Re: [PATCH v4] hostmem-file: add offset option

2023-04-03 Thread David Hildenbrand

On 03.04.23 17:49, Peter Xu wrote:

On Mon, Apr 03, 2023 at 09:13:29AM +0200, David Hildenbrand wrote:

On 01.04.23 19:47, Stefan Hajnoczi wrote:

On Sat, Apr 01, 2023 at 12:42:57PM +, Alexander Graf wrote:

Add an option for hostmem-file to start the memory object at an offset
into the target file. This is useful if multiple memory objects reside
inside the same target file, such as a device node.

In particular, it's useful to map guest memory directly into /dev/mem
for experimentation.

Signed-off-by: Alexander Graf 
Reviewed-by: Stefan Hajnoczi 

---

v1 -> v2:

- add qom documentation
- propagate offset into truncate, size and alignment checks

v2 -> v3:

- failed attempt at fixing typo

v2 -> v4:

- fix typo
---
   backends/hostmem-file.c | 40 +++-
   include/exec/memory.h   |  2 ++
   include/exec/ram_addr.h |  3 ++-
   qapi/qom.json   |  5 +
   qemu-options.hx |  6 +-
   softmmu/memory.c|  3 ++-
   softmmu/physmem.c   | 14 ++
   7 files changed, 65 insertions(+), 8 deletions(-)


Reviewed-by: Stefan Hajnoczi 


The change itself looks good to me, but I do think some other QEMU code that
ends up working on the RAMBlock is not prepared yet. Most probably, because
we never ended up using an fd with an offset as guest RAM.

We don't seem to be remembering that offset in the RAMBlock. First, I
thought block->offset would be used for that, but that's just the offset in
the ram_addr_t space. Maybe we need a new "block->fd_offset" to remember the
offset (unless I am missing something).


I think you're right.



The real offset in the file would be required at least in two cases I can
see (whenever we essentially end up calling mmap() on the fd again):

1) qemu_ram_remap(): We'd have to add the file offset on top of the
calculated offset.

2) vhost-user: most probably whenever we set the mmap_offset. For example,
in vhost_user_fill_set_mem_table_msg() we'd similarly have to add the
file_offset on top of the calculated offset. vhost_user_get_mr_data() should
most probably do that.


I had a patch to add that offset for the upcoming doublemap feature here:

https://lore.kernel.org/all/20230117220914.2062125-8-pet...@redhat.com/

But that was because doublemap wants to map the guest mem twice for other
purposes. I didn't yet notice that the code seems to be already broken
when offset != 0.

Meanwhile, I _think_ we already have an offset != 0 case for a ramblock, since:

 commit ed5d001916dd46ceed6d8850e453bcd7b5db2acb
 Author: Jagannathan Raman 
 Date:   Fri Jan 29 11:46:13 2021 -0500

 multi-process: setup memory manager for remote device

Where there's:

 memory_region_init_ram_from_fd(subregion, NULL,
name, sysmem_info->sizes[region],
RAM_SHARED, msg->fds[region],
sysmem_info->offsets[region],
errp);


Interesting ... maybe so far never used alongside vhost-user.

--
Thanks,

David / dhildenb




Re: [PATCH v2 02/10] softmmu: Don't use 'singlestep' global in QMP and HMP commands

2023-04-03 Thread Richard Henderson

On 4/3/23 07:46, Peter Maydell wrote:

The HMP 'singlestep' command, the QMP 'query-status' command and the
HMP 'info status' command (which is just wrapping the QMP command
implementation) look at the 'singlestep' global variable. Make them
access the new TCG accelerator 'one-insn-per-tb' property instead.

This leaves the HMP and QMP command/field names and output strings
unchanged; we will clean that up later.

Signed-off-by: Peter Maydell
---
  softmmu/runstate-hmp-cmds.c | 18 --
  softmmu/runstate.c  | 10 +-
  2 files changed, 25 insertions(+), 3 deletions(-)


Reviewed-by: Richard Henderson 


r~



Re: [PATCH v2 01/10] make one-insn-per-tb an accel option

2023-04-03 Thread Richard Henderson

On 4/3/23 07:46, Peter Maydell wrote:

This commit adds 'one-insn-per-tb' as a property on the TCG
accelerator object, so you can enable it with
-accel tcg,one-insn-per-tb=on

It has the same behaviour as the existing '-singlestep' command line
option.  We use a different name because 'singlestep' has always been
a confusing choice, because it doesn't have anything to do with
single-stepping the CPU.  What it does do is force TCG emulation to
put one guest instruction in each TB, which can be useful in some
situations (such as analysing debug logs).

The existing '-singlestep' commandline options are decoupled from the
global 'singlestep' variable and instead now are syntactic sugar for
setting the accel property.  (These can then go away after a
deprecation period.)

The global variable remains for the moment as:
  * what the TCG code looks at to change its behaviour
  * what HMP and QMP use to query and set the behaviour

In the following commits we'll clean those up to not directly
look at the global variable.

Signed-off-by: Peter Maydell
---
  accel/tcg/tcg-all.c | 21 +
  bsd-user/main.c |  8 ++--
  linux-user/main.c   |  8 ++--
  softmmu/vl.c| 17 +++--
  qemu-options.hx |  7 +++
  5 files changed, 55 insertions(+), 6 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH] target/mips: tcg: detect out-of-bounds accesses to cpu_gpr and cpu_gpr_hi

2023-04-03 Thread Richard Henderson

On 4/3/23 10:29, Paolo Bonzini wrote:

In some cases (for example gen_compute_branch_nm in
nanomips_translate.c.inc) registers can be unused
on some paths and a negative value is passed in that case:

 gen_compute_branch_nm(ctx, OPC_BPOSGE32, 4, -1, -2,
   imm << 1);

To avoid an out of bounds access in those cases, introduce
assertions.

Signed-off-by: Paolo Bonzini
---
  target/mips/tcg/translate.c | 4 
  1 file changed, 4 insertions(+)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH 04/11] qemu-options: finesse the recommendations around -blockdev

2023-04-03 Thread Markus Armbruster
Thomas Huth  writes:

> On 03/04/2023 16.55, Markus Armbruster wrote:
>> Alex Bennée  writes:
>> 
>>> Markus Armbruster  writes:
>>>
 Alex Bennée  writes:
> ...
>>> I was under the impression things like -hda wouldn't work say on an Arm
>>> machine because you don't know what sort of interface you might be
>>> using and -hda implies IDE. Where is this macro substitution done?
>> 
>> qemu_init() calls drive_add() for all these options.
>> 
>> drive_add(TYPE, INDEX, FILE, OPTSTR) creates a QemuOpts in group
>> "drive".  It sets "if" to if_name[TYPE] unless TYPE is IF_DEFAULT,
>> "index" to INDEX unless it's negative, and "file" to FILE unless it's
>> null.  Then it parses OPTSTR on top.
>> 
>> For -hdX, the call looks like
>> 
>>  drive_add(IF_DEFAULT, popt->index - QEMU_OPTION_hda, optarg,
>>HD_OPTS);
>> 
>> We pass IF_DEFAULT, so "if" remains unset.  "index" is set to 0 for
>> -hda, 1, for -hdb and so forth.  "file" is set to the option argument.
>> Since HD_OPTS is "media=disk", we set "media" to "disk".
>> 
>> The QemuOpts in config group "drive" get passed to drive_new() via
>> drive_init_func().  Unset "if" defaults to the current machine's class's
>> block_default_type.
>> 
>> If a machine doesn't set this member explicitly, it remains zero, which
>> is IF_NONE.  Documented in blockdev.h:
>> 
>>  typedef enum {
>>  IF_DEFAULT = -1,/* for use with drive_add() only */
>>  /*
>>   * IF_NONE must be zero, because we want MachineClass member
>> ---> * block_default_type to default-initialize to IF_NONE
>>   */
>>  IF_NONE = 0,
>>  IF_IDE, IF_SCSI, IF_FLOPPY, IF_PFLASH, IF_MTD, IF_SD, IF_VIRTIO, 
>> IF_XEN,
>>  IF_COUNT
>>  } BlockInterfaceType;
>> 
>> Questions?
>
> How's the average user supposed to know that? Our qemu-options.hx just says: 
> "-hda/-hdb file  use 'file' as IDE hard disk 0/1 image"...

Ancient doc bug.  Should have been updated in commit 2d0d2837dcf
(Support default block interfaces per QEMUMachine) back in 2012.




Re: [PATCH v2 07/10] hmp: Add 'one-insn-per-tb' command equivalent to 'singlestep'

2023-04-03 Thread Dr. David Alan Gilbert
* Peter Maydell (peter.mayd...@linaro.org) wrote:
> The 'singlestep' HMP command is confusing, because it doesn't
> actually have anything to do with single-stepping the CPU.  What it
> does do is force TCG emulation to put one guest instruction in each
> TB, which can be useful in some situations.
> 
> Create a new HMP command  'one-insn-per-tb', so we can document that
> 'singlestep' is just a deprecated synonym for it, and eventually
> perhaps drop it.
> 
> We aren't obliged to do deprecate-and-drop for HMP commands,
> but it's easy enough to do so, so we do.
> 
> Signed-off-by: Peter Maydell 
> ---
>  docs/about/deprecated.rst   |  9 +
>  include/monitor/hmp.h   |  2 +-
>  softmmu/runstate-hmp-cmds.c |  2 +-
>  tests/qtest/test-hmp.c  |  1 +
>  hmp-commands.hx | 25 +
>  5 files changed, 33 insertions(+), 6 deletions(-)
> 
> diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
> index 3c62671dac1..6f5e689aa45 100644
> --- a/docs/about/deprecated.rst
> +++ b/docs/about/deprecated.rst
> @@ -199,6 +199,15 @@ accepted incorrect commands will return an error. Users 
> should make sure that
>  all arguments passed to ``device_add`` are consistent with the documented
>  property types.
>  
> +Human Monitor Protocol (HMP) commands
> +-
> +
> +``singlestep`` (since 8.1)
> +''
> +
> +The ``singlestep`` command has been replaced by the ``one-insn-per-tb``
> +command, which has the same behaviour but a less misleading name.
> +
>  Host Architectures
>  --
>  
> diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h
> index fdb69b7f9ca..13f9a2dedb8 100644
> --- a/include/monitor/hmp.h
> +++ b/include/monitor/hmp.h
> @@ -158,7 +158,7 @@ void hmp_info_vcpu_dirty_limit(Monitor *mon, const QDict 
> *qdict);
>  void hmp_human_readable_text_helper(Monitor *mon,
>  HumanReadableText *(*qmp_handler)(Error 
> **));
>  void hmp_info_stats(Monitor *mon, const QDict *qdict);
> -void hmp_singlestep(Monitor *mon, const QDict *qdict);
> +void hmp_one_insn_per_tb(Monitor *mon, const QDict *qdict);
>  void hmp_watchdog_action(Monitor *mon, const QDict *qdict);
>  void hmp_pcie_aer_inject_error(Monitor *mon, const QDict *qdict);
>  void hmp_info_capture(Monitor *mon, const QDict *qdict);
> diff --git a/softmmu/runstate-hmp-cmds.c b/softmmu/runstate-hmp-cmds.c
> index 127521a483a..76d1399ed85 100644
> --- a/softmmu/runstate-hmp-cmds.c
> +++ b/softmmu/runstate-hmp-cmds.c
> @@ -41,7 +41,7 @@ void hmp_info_status(Monitor *mon, const QDict *qdict)
>  qapi_free_StatusInfo(info);
>  }
>  
> -void hmp_singlestep(Monitor *mon, const QDict *qdict)
> +void hmp_one_insn_per_tb(Monitor *mon, const QDict *qdict)
>  {
>  const char *option = qdict_get_try_str(qdict, "option");
>  AccelState *accel = current_accel();
> diff --git a/tests/qtest/test-hmp.c b/tests/qtest/test-hmp.c
> index b4a920df898..cb3530df722 100644
> --- a/tests/qtest/test-hmp.c
> +++ b/tests/qtest/test-hmp.c
> @@ -64,6 +64,7 @@ static const char *hmp_cmds[] = {
>  "screendump /dev/null",
>  "sendkey x",
>  "singlestep on",
> +"one-insn-per-tb on",

OK, it wouldn't be bad if this list got a bit back into near alphabetic
order.


Reviewed-by: Dr. David Alan Gilbert 


>  "wavcapture /dev/null",
>  "stopcapture 0",
>  "sum 0 512",
> diff --git a/hmp-commands.hx b/hmp-commands.hx
> index bb85ee1d267..9afbb54a515 100644
> --- a/hmp-commands.hx
> +++ b/hmp-commands.hx
> @@ -378,18 +378,35 @@ SRST
>only *tag* as parameter.
>  ERST
>  
> +{
> +.name   = "one-insn-per-tb",
> +.args_type  = "option:s?",
> +.params = "[on|off]",
> +.help   = "run emulation with one guest instruction per 
> translation block",
> +.cmd= hmp_one_insn_per_tb,
> +},
> +
> +SRST
> +``one-insn-per-tb [off]``
> +  Run the emulation with one guest instruction per translation block.
> +  This slows down emulation a lot, but can be useful in some situations,
> +  such as when trying to analyse the logs produced by the ``-d`` option.
> +  This only has an effect when using TCG, not with KVM or other accelerators.
> +
> +  If called with option off, the emulation returns to normal mode.
> +ERST
> +
>  {
>  .name   = "singlestep",
>  .args_type  = "option:s?",
>  .params = "[on|off]",
> -.help   = "run emulation in singlestep mode or switch to normal 
> mode",
> -.cmd= hmp_singlestep,
> +.help   = "deprecated synonym for one-insn-per-tb",
> +.cmd= hmp_one_insn_per_tb,
>  },
>  
>  SRST
>  ``singlestep [off]``
> -  Run the emulation in single step mode.
> -  If called with option off, the emulation returns to normal mode.
> +  This is a deprecated synonym for the one-insn-per-tb command.
>  ERST
>  
>  {
> -- 
> 2.34.1
> 
-- 
Dr. David Alan Gilbert

[PATCH] target/mips: tcg: detect out-of-bounds accesses to cpu_gpr and cpu_gpr_hi

2023-04-03 Thread Paolo Bonzini
In some cases (for example gen_compute_branch_nm in
nanomips_translate.c.inc) registers can be unused
on some paths and a negative value is passed in that case:

gen_compute_branch_nm(ctx, OPC_BPOSGE32, 4, -1, -2,
  imm << 1);

To avoid an out of bounds access in those cases, introduce
assertions.

Signed-off-by: Paolo Bonzini 
---
 target/mips/tcg/translate.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/target/mips/tcg/translate.c b/target/mips/tcg/translate.c
index 1fb4ef712729..999fbb7cc1c0 100644
--- a/target/mips/tcg/translate.c
+++ b/target/mips/tcg/translate.c
@@ -1223,6 +1223,7 @@ static const char regnames_LO[][4] = {
 /* General purpose registers moves. */
 void gen_load_gpr(TCGv t, int reg)
 {
+assert(reg >= 0 && reg < ARRAY_SIZE(cpu_gpr));
 if (reg == 0) {
 tcg_gen_movi_tl(t, 0);
 } else {
@@ -1232,6 +1233,7 @@ void gen_load_gpr(TCGv t, int reg)
 
 void gen_store_gpr(TCGv t, int reg)
 {
+assert(reg >= 0 && reg < ARRAY_SIZE(cpu_gpr));
 if (reg != 0) {
 tcg_gen_mov_tl(cpu_gpr[reg], t);
 }
@@ -1240,6 +1242,7 @@ void gen_store_gpr(TCGv t, int reg)
 #if defined(TARGET_MIPS64)
 void gen_load_gpr_hi(TCGv_i64 t, int reg)
 {
+assert(reg >= 0 && reg < ARRAY_SIZE(cpu_gpr_hi));
 if (reg == 0) {
 tcg_gen_movi_i64(t, 0);
 } else {
@@ -1249,6 +1252,7 @@ void gen_load_gpr_hi(TCGv_i64 t, int reg)
 
 void gen_store_gpr_hi(TCGv_i64 t, int reg)
 {
+assert(reg >= 0 && reg < ARRAY_SIZE(cpu_gpr_hi));
 if (reg != 0) {
 tcg_gen_mov_i64(cpu_gpr_hi[reg], t);
 }
-- 
2.39.2




Re: [PATCH] MAINTAINERS: Remove and change David Gilbert maintainer entries

2023-04-03 Thread Markus Armbruster
"Dr. David Alan Gilbert (git)"  writes:

> From: "Dr. David Alan Gilbert" 
>
> I'm leaving Red Hat next week, so clean up the maintainer entries.
>
> 'virtiofs' is just the device code now, so is pretty small, and
> Stefan is still a maintainer there.
>
> 'migration' still has Juan.
>
> For 'HMP' I'll swing that over to my personal email.
>
> Signed-off-by: Dr. David Alan Gilbert 

Thank you for your distinguished service, and thank you some more for
staying on as HMP maintainer.

Reviewed-by: Markus Armbruster 




Re: [PATCH] acpi: pcihp: make pending delete expire in 5sec

2023-04-03 Thread Michael S. Tsirkin
On Mon, Apr 03, 2023 at 06:16:18PM +0200, Igor Mammedov wrote:
> with Q35 using ACPI PCI hotplug by default, user's request to unplug
> device is ignored when it's issued before guest OS has been booted.
> And any additional attempt to request device hot-unplug afterwards
> results in following error:
> 
>   "Device XYZ is already in the process of unplug"
> 
> arguably it can be considered as a regression introduced by [2],
> before which it was possible to issue unplug request multiple
> times.
> 
> Allowing the pending delete to expire brings ACPI PCI hotplug on par
> with native PCIe unplug behavior [1], which in turn refers
> back to ACPI PCI hotplug's ability to repeat unplug requests.
> 
> PS:
> From ACPI point of view, an unplug request sets the PCI hotplug status
> bit in the GPE0 block. However, depending on the OSPM, status bits may
> be retained (Windows) or cleared (Linux) during the guest's ACPI
> subsystem initialization, and as a result a Linux guest loses the
> plug/unplug event (no SCI generated) if plug/unplug has
> happened before the guest OS initialized GPE register handling.
> I couldn't find any restrictions wrt OSPM clearing GPE status
> bits in the ACPI spec.
> Hence a fallback approach is to let user repeat unplug request
> later at the time when guest OS has booted.
> 
> 1) 18416c62e3 ("pcie: expire pending delete")
> 2)
> Fixes: cce8944cc9ef ("qdev-monitor: Forbid repeated device_del")
> Signed-off-by: Igor Mammedov 

A bit concerned about how this interacts with failover,
and 5sec is a lot of time that I hoped we'd avoid with acpi.
Any better ideas for catching such misbehaving guests?


Also at this point I do not know why we deny unplug requests when
pending_deleted_event is set in qdev core.
Commit log says:

Device unplug can be done asynchronously. Thus, sending the second
device_del before the previous unplug is complete may lead to
unexpected results. On PCIe devices, this cancels the hot-unplug
process.

so it's a work around for an issue in pcie hotplug (and maybe shpc
too?). Maybe we should have put that check in pcie/shpc and
leave acpi alone?




> ---
> CC: m...@redhat.com
> CC: anisi...@redhat.com
> CC: jus...@redhat.com
> CC: kra...@redhat.com
> ---
>  hw/acpi/pcihp.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/hw/acpi/pcihp.c b/hw/acpi/pcihp.c
> index dcfb779a7a..cd4f9fee0a 100644
> --- a/hw/acpi/pcihp.c
> +++ b/hw/acpi/pcihp.c
> @@ -357,6 +357,8 @@ void acpi_pcihp_device_unplug_request_cb(HotplugHandler 
> *hotplug_dev,
>   * acpi_pcihp_eject_slot() when the operation is completed.
>   */
>  pdev->qdev.pending_deleted_event = true;
> +pdev->qdev.pending_deleted_expires_ms =
> +qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 5000; /* 5 secs */
>  s->acpi_pcihp_pci_status[bsel].down |= (1U << slot);
>  acpi_send_event(DEVICE(hotplug_dev), ACPI_PCI_HOTPLUG_STATUS);
>  }
> -- 
> 2.39.1




Re: [PATCH v19 13/21] docs/s390x/cpu topology: document s390x cpu topology

2023-04-03 Thread Pierre Morel



On 4/3/23 19:00, Cédric Le Goater wrote:

On 4/3/23 18:28, Pierre Morel wrote:

Add some basic examples for the definition of cpu topology
in s390x.

Signed-off-by: Pierre Morel 
---
  MAINTAINERS    |   2 +
  docs/devel/index-internals.rst |   1 +
  docs/devel/s390-cpu-topology.rst   | 161 +++
  docs/system/s390x/cpu-topology.rst | 238 +
  docs/system/target-s390x.rst   |   1 +
  5 files changed, 403 insertions(+)
  create mode 100644 docs/devel/s390-cpu-topology.rst
  create mode 100644 docs/system/s390x/cpu-topology.rst

diff --git a/MAINTAINERS b/MAINTAINERS
index de9052f753..fe5638e31d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1660,6 +1660,8 @@ S: Supported
  F: include/hw/s390x/cpu-topology.h
  F: hw/s390x/cpu-topology.c
  F: target/s390x/kvm/cpu_topology.c
+F: docs/devel/s390-cpu-topology.rst
+F: docs/system/s390x/cpu-topology.rst
    X86 Machines
  
diff --git a/docs/devel/index-internals.rst 
b/docs/devel/index-internals.rst

index e1a93df263..6f81df92bc 100644
--- a/docs/devel/index-internals.rst
+++ b/docs/devel/index-internals.rst
@@ -14,6 +14,7 @@ Details about QEMU's various subsystems including 
how to add features to them.

 migration
 multi-process
 reset
+   s390-cpu-topology
 s390-dasd-ipl
 tracing
 vfio-migration
diff --git a/docs/devel/s390-cpu-topology.rst 
b/docs/devel/s390-cpu-topology.rst

new file mode 100644
index 00..0b7bb42079
--- /dev/null
+++ b/docs/devel/s390-cpu-topology.rst
@@ -0,0 +1,161 @@
+QAPI interface for S390 CPU topology
+
+
+Let's start QEMU with the following command:
+
+.. code-block:: bash
+
+ qemu-system-s390x \
+    -enable-kvm \
+    -cpu z14,ctop=on \
+    -smp 1,drawers=3,books=3,sockets=2,cores=2,maxcpus=36 \
+    \
+    -device z14-s390x-cpu,core-id=19,polarization=3 \
+    -device z14-s390x-cpu,core-id=11,polarization=1 \
+    -device z14-s390x-cpu,core-id=112,polarization=3 \
+   ...
+
+and see the result when using the QAPI interface.
+
+Addons to query-cpus-fast
+-
+
+The command query-cpus-fast allows querying the topology tree and
+modifiers for all configured vCPUs.
+
+.. code-block:: QMP
+
+ { "execute": "query-cpus-fast" }
+ {
+  "return": [
+    {
+  "dedicated": false,
+  "thread-id": 536993,
+  "props": {
+    "core-id": 0,
+    "socket-id": 0,
+    "drawer-id": 0,
+    "book-id": 0
+  },
+  "cpu-state": "operating",
+  "entitlement": "medium",
+  "qom-path": "/machine/unattached/device[0]",
+  "cpu-index": 0,
+  "target": "s390x"
+    },
+    {
+  "dedicated": false,
+  "thread-id": 537003,
+  "props": {
+    "core-id": 19,
+    "socket-id": 1,
+    "drawer-id": 0,
+    "book-id": 2
+  },
+  "cpu-state": "operating",
+  "entitlement": "high",
+  "qom-path": "/machine/peripheral-anon/device[0]",
+  "cpu-index": 19,
+  "target": "s390x"
+    },
+    {
+  "dedicated": false,
+  "thread-id": 537004,
+  "props": {
+    "core-id": 11,
+    "socket-id": 1,
+    "drawer-id": 0,
+    "book-id": 1
+  },
+  "cpu-state": "operating",
+  "entitlement": "low",
+  "qom-path": "/machine/peripheral-anon/device[1]",
+  "cpu-index": 11,
+  "target": "s390x"
+    },
+    {
+  "dedicated": true,
+  "thread-id": 537005,
+  "props": {
+    "core-id": 112,
+    "socket-id": 0,
+    "drawer-id": 3,
+    "book-id": 2
+  },
+  "cpu-state": "operating",
+  "entitlement": "high",
+  "qom-path": "/machine/peripheral-anon/device[2]",
+  "cpu-index": 112,
+  "target": "s390x"
+    }
+  ]
+ }
+
+
+QAPI command: set-cpu-topology
+--
+
+The command set-cpu-topology allows modifying the topology tree
+or the topology modifiers of a vCPU in the configuration.
+
+.. code-block:: QMP
+
+    { "execute": "set-cpu-topology",
+  "arguments": {
+ "core-id": 11,
+ "socket-id": 0,
+ "book-id": 0,
+ "drawer-id": 0,
+ "entitlement": "low",
+ "dedicated": false
+  }
+    }
+    {"return": {}}
+
+The core-id parameter is the only non-optional parameter, and every
+unspecified parameter keeps its previous value.
+
+QAPI event CPU_POLARIZATION_CHANGE
+--
+
+When a guest requests a modification of the polarization,
+QEMU sends a CPU_POLARIZATION_CHANGE event.
+
+When requesting the change, the guest only specifies horizontal or
+vertical polarization.
+It is the job of the upper layer to set the dedication and fine grained
+vertical entitlement in response to this event.
+
+Note that a vertically polarized dedicated vCPU can only have a high
+entitlement; this gives 6 possibilities for vCPU polarization:
+
+- Horizontal
+- Horizontal dedicated
+- Vertical low
+- Vertical medium
+- Vertical high
+- Vertical high dedicated

Re: [PATCH for 8.1 v2 5/6] vdpa: move CVQ isolation check to net_init_vhost_vdpa

2023-04-03 Thread Eugenio Perez Martin
On Mon, Apr 3, 2023 at 7:32 AM Jason Wang  wrote:
>
> On Fri, Mar 31, 2023 at 6:12 PM Eugenio Perez Martin
>  wrote:
> >
> > On Fri, Mar 31, 2023 at 10:00 AM Jason Wang  wrote:
> > >
> > >
> > > 在 2023/3/30 18:42, Eugenio Perez Martin 写道:
> > > > On Thu, Mar 30, 2023 at 8:23 AM Jason Wang  wrote:
> > > >> On Thu, Mar 30, 2023 at 2:20 PM Jason Wang  wrote:
> > > >>> On Fri, Mar 24, 2023 at 3:54 AM Eugenio Pérez  
> > > >>> wrote:
> > >  Evaluating it at start time instead of initialization time may make 
> > >  the
> > >  guest capable of dynamically adding or removing migration blockers.
> > > 
> > >  Also, moving to initialization reduces the number of ioctls in the
> > >  migration, reducing failure possibilities.
> > > 
> > >  As a drawback we need to check for CVQ isolation twice: one time 
> > >  with no
> > >  MQ negotiated and another one acking it, as long as the device 
> > >  supports
> > >  it.  This is because Vring ASID / group management is based on vq
> > >  indexes, but we don't know the index of CVQ before negotiating MQ.
> > > >>> We need to fail if we see a device that can isolate cvq without MQ but
> > > >>> not with MQ.
> > > >>>
> > >  Signed-off-by: Eugenio Pérez 
> > >  ---
> > >  v2: Take out the reset of the device from vhost_vdpa_cvq_is_isolated
> > >  ---
> > >    net/vhost-vdpa.c | 194 
> > >  ---
> > >    1 file changed, 151 insertions(+), 43 deletions(-)
> > > 
> > >  diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > >  index 4397c0d4b3..db2c9afcb3 100644
> > >  --- a/net/vhost-vdpa.c
> > >  +++ b/net/vhost-vdpa.c
> > >  @@ -43,6 +43,13 @@ typedef struct VhostVDPAState {
> > > 
> > >    /* The device always have SVQ enabled */
> > >    bool always_svq;
> > >  +
> > >  +/* The device can isolate CVQ in its own ASID if MQ is 
> > >  negotiated */
> > >  +bool cvq_isolated_mq;
> > >  +
> > >  +/* The device can isolate CVQ in its own ASID if MQ is not 
> > >  negotiated */
> > >  +bool cvq_isolated;
> > > >>> As stated above, if we need a device that cvq_isolated_mq^cvq_isolated
> > > >>> == true, we need to fail. This may reduce the complexity of the code?
> > > >>>
> > > >>> Thanks
> > > >> Since we are the mediation layer, Qemu can alway choose to negotiate
> > > >> MQ regardless whether or not it is supported by the guest. In this
> > > >> way, we can have a stable virtqueue index for cvq.
> > > >>
> > > > I think it is a great idea and it simplifies this patch somehow.
> > > > However, we need something like the queue mapping [1] to do so :).
> > > >
> > > > To double confirm:
> > > > * If the device supports MQ, only probe MQ. If not, only probe !MQ.
> > > > * Only store cvq_isolated in VhostVDPAState.
> > > >
> > > > Now, if the device does not negotiate MQ but the device supports MQ:
> > >
> > >
> > > I'm not sure I understand here, if device supports MQ it should accepts
> > > MQ or we can fail the initialization here.
> > >
> >
> > My fault, I wanted to say "if the device offers MQ but the driver does
> > not acks it".
> >
> > >
> > > > * All the requests to queue 3 must be redirected to the last queue in
> > > > the device. That includes set_vq_address, notifiers regions, etc.
> > >
> > >
> > > This also means we will only mediate the case:
> > >
> > > 1) Qemu emulated virtio-net has 1 queue but device support multiple queue
> > >
> > > but not
> > >
> > > 2) Qemu emulated virtio-net has M queue but device support N queue (N>M)
> > >
> >
> > Right.
> >
> > >
> > > >
> > > > I'm totally ok to go this route but it's not immediate.
> > >
> > >
> > > Yes but I mean, we can start from failing the device if
> > > cvq_isolated_mq^cvq_isolated == true
> > >
> >
> > So probe the two cases but set VhostVDPAState->cvq_isolated =
> > cvq_isolated && cvq_mq_isolated then? No map involved that way, and
> > all parents should behave that way.
> >
> > > (or I wonder if we can meet this condition for any existing parents).
> >
> > I don't think so, but I think we need to probe the two anyway.
> > Otherwise we may change the dataplane asid too.
>
> Just to make sure we are at the same page, I meant we could fail the
> initialization of vhost-vDPA is the device:
>
> 1) can isolate cvq in the case of singqueue but not multiqueue
>
> or
>
> 2) can isolate cvq in the case of multiqueue but not single queue
>
> Because I don't think there are any parents that have such a buggy
> implementation.
>

Got it.

Leaving out the queue multiplex for the moment, as it adds complexity
and we can add it on top.

Thanks!




Re: [PATCH v7 3/4] qemu-iotests: test zone append operation

2023-04-03 Thread Stefan Hajnoczi
On Thu, Mar 23, 2023 at 01:19:06PM +0800, Sam Li wrote:
> The patch tests zone append writes by reporting the zone wp after
> the completion of the call. The "zap -p" option can print the sector
> offset value after completion, which should be the start sector
> where the append write begins.
> 
> Signed-off-by: Sam Li 
> ---
>  qemu-io-cmds.c | 75 ++
>  tests/qemu-iotests/tests/zoned | 16 +++
>  tests/qemu-iotests/tests/zoned.out | 16 +++
>  3 files changed, 107 insertions(+)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


Re: [PATCH v7 4/4] block: add some trace events for zone append

2023-04-03 Thread Stefan Hajnoczi
On Thu, Mar 23, 2023 at 01:19:07PM +0800, Sam Li wrote:
> Signed-off-by: Sam Li 
> Reviewed-by: Dmitry Fomichev 
> ---
>  block/file-posix.c | 3 +++
>  block/trace-events | 2 ++
>  2 files changed, 5 insertions(+)

Reviewed-by: Stefan Hajnoczi 




Re: [PATCH v7 2/4] block: introduce zone append write for zoned devices

2023-04-03 Thread Stefan Hajnoczi
On Thu, Mar 23, 2023 at 01:19:05PM +0800, Sam Li wrote:
> diff --git a/block/io.c b/block/io.c
> index 5dbf1e50f2..fe9cabaaf6 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -3152,6 +3152,27 @@ out:
>  return co.ret;
>  }
>  
> +int coroutine_fn bdrv_co_zone_append(BlockDriverState *bs, int64_t *offset,
> +QEMUIOVector *qiov,
> +BdrvRequestFlags flags)
> +{
> +BlockDriver *drv = bs->drv;
> +CoroutineIOCompletion co = {
> +.coroutine = qemu_coroutine_self(),
> +};
> +IO_CODE();
> +
> +bdrv_inc_in_flight(bs);
> +if (!drv || !drv->bdrv_co_zone_append || bs->bl.zoned == BLK_Z_NONE) {
> +co.ret = -ENOTSUP;
> +goto out;
> +}

No bdrv_check_qiov_request()? We need to validate inputs. For example,
code later on assumes that offset / bs.bl.zone_size < bs.bl.nr_zones.

> +co.ret = drv->bdrv_co_zone_append(bs, offset, qiov, flags);
> +out:
> +bdrv_dec_in_flight(bs);
> +return co.ret;
> +}




Re: [PATCH v7 1/4] file-posix: add tracking of the zone write pointers

2023-04-03 Thread Stefan Hajnoczi
On Thu, Mar 23, 2023 at 01:19:04PM +0800, Sam Li wrote:
> Since Linux doesn't have a user API to issue zone append operations to
> zoned devices from user space, the file-posix driver is modified to add
> zone append emulation using regular writes. To do this, the file-posix
> driver tracks the wp location of all zones of the device. It uses an
> array of uint64_t. The most significant bit of each wp location indicates
> if the zone type is conventional zones.
> 
> The zones wp can be changed due to the following operations issued:
> - zone reset: change the wp to the start offset of that zone
> - zone finish: change to the end location of that zone
> - write to a zone
> - zone append
> 
> Signed-off-by: Sam Li 
> ---
>  block/file-posix.c   | 168 ++-
>  include/block/block-common.h |  14 +++
>  include/block/block_int-common.h |   5 +
>  3 files changed, 183 insertions(+), 4 deletions(-)
> 
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 65efe5147e..0fb425dcae 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -1324,6 +1324,85 @@ static int hdev_get_max_segments(int fd, struct stat 
> *st)
>  #endif
>  }
>  
> +#if defined(CONFIG_BLKZONED)
> +/*
> + * If the ra (reset_all) flag > 0, then the wp of that zone should be reset 
> to
> + * the start sector. Else, take the real wp of the device.
> + */
> +static int get_zones_wp(int fd, BlockZoneWps *wps, int64_t offset,
> +unsigned int nrz, int ra) {

Please use bool for true/false and use clear variable names:
int ra -> bool reset_all

> +struct blk_zone *blkz;
> +size_t rep_size;
> +uint64_t sector = offset >> BDRV_SECTOR_BITS;
> +int ret, n = 0, i = 0;
> +rep_size = sizeof(struct blk_zone_report) + nrz * sizeof(struct 
> blk_zone);
> +g_autofree struct blk_zone_report *rep = NULL;
> +
> +rep = g_malloc(rep_size);
> +blkz = (struct blk_zone *)(rep + 1);
> +while (n < nrz) {
> +memset(rep, 0, rep_size);
> +rep->sector = sector;
> +rep->nr_zones = nrz - n;
> +
> +do {
> +ret = ioctl(fd, BLKREPORTZONE, rep);
> +} while (ret != 0 && errno == EINTR);
> +if (ret != 0) {
> +error_report("%d: ioctl BLKREPORTZONE at %" PRId64 " failed %d",
> +fd, offset, errno);
> +return -errno;
> +}
> +
> +if (!rep->nr_zones) {
> +break;
> +}
> +
> +for (i = 0; i < rep->nr_zones; i++, n++) {
> +/*
> + * The wp tracking cares only about sequential writes required 
> and
> + * sequential write preferred zones so that the wp can advance to
> + * the right location.
> + * Use the most significant bit of the wp location to indicate 
> the
> + * zone type: 0 for SWR/SWP zones and 1 for conventional zones.
> + */
> +if (blkz[i].type == BLK_ZONE_TYPE_CONVENTIONAL) {
> +wps->wp[i] &= 1ULL << 63;
> +} else {
> +switch(blkz[i].cond) {
> +case BLK_ZONE_COND_FULL:
> +case BLK_ZONE_COND_READONLY:
> +/* Zone not writable */
> +wps->wp[i] = (blkz[i].start + blkz[i].len) << 
> BDRV_SECTOR_BITS;
> +break;
> +case BLK_ZONE_COND_OFFLINE:
> +/* Zone not writable nor readable */
> +wps->wp[i] = (blkz[i].start) << BDRV_SECTOR_BITS;
> +break;
> +default:
> +if (ra > 0) {
> +wps->wp[i] = blkz[i].start << BDRV_SECTOR_BITS;
> +} else {
> +wps->wp[i] = blkz[i].wp << BDRV_SECTOR_BITS;
> +}
> +break;
> +}
> +}
> +}
> +sector = blkz[i - 1].start + blkz[i - 1].len;
> +}
> +
> +return 0;
> +}
> +
> +static void update_zones_wp(int fd, BlockZoneWps *wps, int64_t offset,
> +unsigned int nrz) {

QEMU coding style puts the opening curly bracket on a new line:

  static void update_zones_wp(int fd, BlockZoneWps *wps, int64_t offset,
  unsigned int nrz)
  {

> +if (get_zones_wp(fd, wps, offset, nrz, 0) < 0) {
> +error_report("update zone wp failed");
> +}
> +}
> +#endif
> +
>  static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
>  {
>  BDRVRawState *s = bs->opaque;
> @@ -1413,6 +1492,21 @@ static void raw_refresh_limits(BlockDriverState *bs, 
> Error **errp)
>  if (ret >= 0) {
>  bs->bl.max_active_zones = ret;
>  }
> +
> +ret = get_sysfs_long_val(&st, "physical_block_size");
> +if (ret >= 0) {
> +bs->bl.write_granularity = ret;
> +}
> +
> +bs->bl.wps = g_malloc(sizeof(BlockZoneWps) +
> +  

Re: [PATCH v19 13/21] docs/s390x/cpu topology: document s390x cpu topology

2023-04-03 Thread Cédric Le Goater

On 4/3/23 18:28, Pierre Morel wrote:

Add some basic examples for the definition of cpu topology
in s390x.

Signed-off-by: Pierre Morel 
---
  MAINTAINERS|   2 +
  docs/devel/index-internals.rst |   1 +
  docs/devel/s390-cpu-topology.rst   | 161 +++
  docs/system/s390x/cpu-topology.rst | 238 +
  docs/system/target-s390x.rst   |   1 +
  5 files changed, 403 insertions(+)
  create mode 100644 docs/devel/s390-cpu-topology.rst
  create mode 100644 docs/system/s390x/cpu-topology.rst

diff --git a/MAINTAINERS b/MAINTAINERS
index de9052f753..fe5638e31d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1660,6 +1660,8 @@ S: Supported
  F: include/hw/s390x/cpu-topology.h
  F: hw/s390x/cpu-topology.c
  F: target/s390x/kvm/cpu_topology.c
+F: docs/devel/s390-cpu-topology.rst
+F: docs/system/s390x/cpu-topology.rst
  
  X86 Machines

  
diff --git a/docs/devel/index-internals.rst b/docs/devel/index-internals.rst
index e1a93df263..6f81df92bc 100644
--- a/docs/devel/index-internals.rst
+++ b/docs/devel/index-internals.rst
@@ -14,6 +14,7 @@ Details about QEMU's various subsystems including how to add 
features to them.
 migration
 multi-process
 reset
+   s390-cpu-topology
 s390-dasd-ipl
 tracing
 vfio-migration
diff --git a/docs/devel/s390-cpu-topology.rst b/docs/devel/s390-cpu-topology.rst
new file mode 100644
index 00..0b7bb42079
--- /dev/null
+++ b/docs/devel/s390-cpu-topology.rst
@@ -0,0 +1,161 @@
+QAPI interface for S390 CPU topology
+
+
+Let's start QEMU with the following command:
+
+.. code-block:: bash
+
+ qemu-system-s390x \
+-enable-kvm \
+-cpu z14,ctop=on \
+-smp 1,drawers=3,books=3,sockets=2,cores=2,maxcpus=36 \
+\
+-device z14-s390x-cpu,core-id=19,polarization=3 \
+-device z14-s390x-cpu,core-id=11,polarization=1 \
+-device z14-s390x-cpu,core-id=112,polarization=3 \
+   ...
+
+and see the result when using the QAPI interface.
+
+Addons to query-cpus-fast
+-
+
+The command query-cpus-fast allows querying the topology tree and
+modifiers for all configured vCPUs.
+
+.. code-block:: QMP
+
+ { "execute": "query-cpus-fast" }
+ {
+  "return": [
+{
+  "dedicated": false,
+  "thread-id": 536993,
+  "props": {
+"core-id": 0,
+"socket-id": 0,
+"drawer-id": 0,
+"book-id": 0
+  },
+  "cpu-state": "operating",
+  "entitlement": "medium",
+  "qom-path": "/machine/unattached/device[0]",
+  "cpu-index": 0,
+  "target": "s390x"
+},
+{
+  "dedicated": false,
+  "thread-id": 537003,
+  "props": {
+"core-id": 19,
+"socket-id": 1,
+"drawer-id": 0,
+"book-id": 2
+  },
+  "cpu-state": "operating",
+  "entitlement": "high",
+  "qom-path": "/machine/peripheral-anon/device[0]",
+  "cpu-index": 19,
+  "target": "s390x"
+},
+{
+  "dedicated": false,
+  "thread-id": 537004,
+  "props": {
+"core-id": 11,
+"socket-id": 1,
+"drawer-id": 0,
+"book-id": 1
+  },
+  "cpu-state": "operating",
+  "entitlement": "low",
+  "qom-path": "/machine/peripheral-anon/device[1]",
+  "cpu-index": 11,
+  "target": "s390x"
+},
+{
+  "dedicated": true,
+  "thread-id": 537005,
+  "props": {
+"core-id": 112,
+"socket-id": 0,
+"drawer-id": 3,
+"book-id": 2
+  },
+  "cpu-state": "operating",
+  "entitlement": "high",
+  "qom-path": "/machine/peripheral-anon/device[2]",
+  "cpu-index": 112,
+  "target": "s390x"
+}
+  ]
+ }
+
+
+QAPI command: set-cpu-topology
+--
+
+The command set-cpu-topology allows modifying the topology tree
+or the topology modifiers of a vCPU in the configuration.
+
+.. code-block:: QMP
+
+{ "execute": "set-cpu-topology",
+  "arguments": {
+ "core-id": 11,
+ "socket-id": 0,
+ "book-id": 0,
+ "drawer-id": 0,
+ "entitlement": "low",
+ "dedicated": false
+  }
+}
+{"return": {}}
+
+The core-id parameter is the only non-optional parameter, and every
+unspecified parameter keeps its previous value.
+
+QAPI event CPU_POLARIZATION_CHANGE
+--
+
+When a guest requests a modification of the polarization,
+QEMU sends a CPU_POLARIZATION_CHANGE event.
+
+When requesting the change, the guest only specifies horizontal or
+vertical polarization.
+It is the job of the upper layer to set the dedication and fine grained
+vertical entitlement in response to this event.
+
+Note that a vertically polarized dedicated vCPU can only have a high
+entitlement; this gives 6 possibilities for vCPU polarization:
+
+- Horizontal
+- Horizontal dedicated
+- Vertical low
+- Vertical medium
+- Vertical high
+- Vertical high dedicated
+
+Example of the e

Re: an issue for device hot-unplug

2023-04-03 Thread Yu Zhang
Dear Laurent,

Thank you for your quick reply. We used qemu-7.1, but it is reproducible
with qemu from v6.2 to the recent v8.0 release candidates.
I found that it's introduced by commit 9323f892b39 (between v6.2.0-rc2
and v6.2.0-rc3).

If it doesn't break anything else, it suffices to remove the line below
from acpi_pcihp_device_unplug_request_cb():

pdev->qdev.pending_deleted_event = true;

but you may have a reason to keep it. First of all, I'll open a bug in the
bug tracker and let you know.

Best regards,
Yu Zhang

On Mon, Apr 3, 2023 at 6:32 PM Laurent Vivier  wrote:

> Hi Yu,
>
> please open a bug in the bug tracker:
>
> https://gitlab.com/qemu/qemu/-/issues
>
> It's easier to track the problem.
>
> What is the version of QEMU you are using?
> Could you provide QEMU command line?
>
> Thanks,
> Laurent
>
>
> On 4/3/23 15:24, Yu Zhang wrote:
> > Dear Laurent,
> >
> > recently we run into an issue with the following error:
> >
> > command '{ "execute": "device_del", "arguments": { "id": "virtio-diskX"
> } }' for VM "id"
> > failed ({ "return": {"class": "GenericError", "desc": "Device
> virtio-diskX is already in
> > the process of unplug"} }).
> >
> > The issue is reproducible. With a few seconds delay before hot-unplug,
> hot-unplug just
> > works fine.
> >
> > After a few digging, we found that the commit 9323f892b39 may incur the
> issue.
> > --
> >  failover: fix unplug pending detection
> >
> >  Failover needs to detect the end of the PCI unplug to start
> migration
> >  after the VFIO card has been unplugged.
> >
> >  To do that, a flag is set in pcie_cap_slot_unplug_request_cb() and
> reset in
> >  pcie_unplug_device().
> >
> >  But since
> >  17858a169508 ("hw/acpi/ich9: Set ACPI PCI hot-plug as default
> on Q35")
> >  we have switched to ACPI unplug and these functions are not called
> anymore
> >  and the flag not set. So failover migration is not able to detect
> if card
> >  is really unplugged and acts as it's done as soon as it's started.
> So it
> >  doesn't wait the end of the unplug to start the migration. We don't
> see any
> >  problem when we test that because ACPI unplug is faster than PCIe
> native
> >  hotplug and when the migration really starts the unplug operation is
> >  already done.
> >
> >  See c000a9bd06ea ("pci: mark device having guest unplug request
> pending")
> >  a99c4da9fc2a ("pci: mark devices partially unplugged")
> >
> >  Signed-off-by: Laurent Vivier  lviv...@redhat.com>>
> >  Reviewed-by: Ani Sinha mailto:a...@anisinha.ca>>
> >  Message-Id: <2028133225.324937-4-lviv...@redhat.com
> > >
> >  Reviewed-by: Michael S. Tsirkin  m...@redhat.com>>
> >  Signed-off-by: Michael S. Tsirkin  m...@redhat.com>>
> > --
> > The purpose is to detect the end of the PCI device hot-unplug. However,
> > we find the error confusing. How is it possible that a disk "is already
> > in the process of unplug" during the first hot-unplug attempt? As far
> > as I know, the issue was also encountered by libvirt, but they simply
> > ignored it:
> >
> > https://bugzilla.redhat.com/show_bug.cgi?id=1878659
> >
> > Hence, a question is: should we have the line below in
> > acpi_pcihp_device_unplug_request_cb()?
> >
> > pdev->qdev.pending_deleted_event = true;
> >
> > It would be great if you as the author could give us a few hints.
> >
> > Thank you very much for your reply!
> >
> > Sincerely,
> >
> > Yu Zhang @ Compute Platform IONOS
> > 03.04.2023
>
>


Re: [PATCH v2 00/10] Deprecate/rename singlestep command line option, monitor interfaces

2023-04-03 Thread Richard Henderson

On 4/3/23 07:46, Peter Maydell wrote:

  * I have written patch 3 on the assumption that curr_cflags()
is not such a hot codepath that we can't afford to have
a QOM cast macro in it; the alternative would be to
keep it using a global variable but make the global be
restricted to accel/tcg/internals.h. RTH: opinions welcome...


curr_cflags() is quite hot, called from lookup_tb_ptr every time we end a chain of 
directly linked TBs.  You'll see lookup_tb_ptr near the top of any tcg profile.


With a global variable, it might be worth combining with CPU_LOG_TB_NOCHAIN, recomputing 
the global if either option changes.



r~



Re: [PATCH v2 2/5] apic: add support for x2APIC mode

2023-04-03 Thread Bui Quang Minh

On 4/3/23 17:27, David Woodhouse wrote:

On Wed, 2023-03-29 at 22:30 +0700, Bui Quang Minh wrote:




I did some more testing on my hardware; your point is correct. When dest
== 0x, the interrupt is delivered to all APICs regardless of
their mode.


To be precise, it only broadcasts to CPUs in xAPIC mode if the IPI
destination mode is physical. In case the destination mode is logical,
flat model/cluster model rule applies to determine if the xAPIC CPUs
accept the IPI. Wow, this is so complicated :)


So even if you send to *all* of the first 8 CPUs in a cluster (e.g.
cluster 0x0001, giving a destination of 0x000100FF), a CPU in xAPIC mode
doesn't see that as a broadcast because it's logical mode?


I mean that if the destination is 0x, the xAPIC CPU will see the 
destination as 0xff. 0xff is a broadcast in physical destination mode 
only; in logical destination mode, it may not be a broadcast. It may 
depend on whether it is the flat model or the cluster model of logical 
destination mode.


In the flat model, 8 bits are used as a mask, so in theory this model can 
only support 8 CPUs; each CPU reserves its own bit by setting the upper 
8 bits of the APIC LDR register. The Intel SDM says that 0xff can be 
interpreted as a broadcast, which is true in the normal case, but I think 
that if a CPU sets its APIC LDR to 0, the IPI should not be delivered to 
that CPU. This also matches the current flat-model logical-destination 
implementation of the userspace APIC in QEMU before my series. However, 
since this seems like a corner case, I didn't test it on real hardware.


With the cluster model, when writing the above paragraphs I thought that 
0xff would be delivered to all APICs (mask = 0xf) of cluster 15 (0xf). 
However, reading the SDM more carefully, I see that there are only 
15 clusters, with addresses from 0 to 14, so there is no cluster with 
address 15; 0xff is interpreted as a broadcast to all APICs in all 
clusters too.


In conclusion, an IPI with destination 0x can be a broadcast to all 
xAPIC CPUs too, if we ignore the corner case in the flat model of 
logical destination mode (we may need to test more).



I would have assumed that a CPU in xAPIC mode would have looked at the
low byte and interpreted it as xAPIC logical mode, with the cluster in
the high nybble and the 4-bit mask in the low nybble?


Yes, this is the behavior in the cluster model of logical destination 
mode (I try to stick with the terminology of Intel SDM Volume 3A, 
Section 10.6.2.2).



