[PATCH] esp: Handle CMD_BUSRESET by resetting the SCSI bus

2022-08-16 Thread John Millikin
Per investigation on the linked ticket, SunOS issues a SCSI bus reset
to the ESP as part of its boot sequence. If this ESP command doesn't
cause devices to assert sense flag UNIT ATTENTION, SunOS will consider
the CD-ROM device to be non-compliant with Common Command Set (CCS).
In this condition, the SunOS installer's early userspace doesn't set
the installation source location to sr0 and the miniroot copy fails.

Signed-off-by: John Millikin 
Suggested-by: Bill Paul 
Buglink: https://gitlab.com/qemu-project/qemu/-/issues/1127
---
 hw/scsi/esp.c | 6 ++
 1 file changed, 6 insertions(+)

(re-sending because I forgot the `Signed-off-by`; sorry)
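
For background, the guest-visible behaviour the patch relies on is that after
a bus reset each logical unit answers its next command with CHECK CONDITION
carrying UNIT ATTENTION sense (ASC/ASCQ 29h/00h, "power on, reset, or bus
device reset occurred"). A sketch of that fixed-format sense data, for
illustration only (not part of the patch):

  /* Illustrative only: 18-byte fixed-format sense a device returns for the
   * first command after a bus reset. */
  static const unsigned char unit_attention_reset_sense[18] = {
      0x70,                   /* response code: current error, fixed format */
      0x00,
      0x06,                   /* sense key: UNIT ATTENTION */
      0x00, 0x00, 0x00, 0x00, /* information */
      0x0a,                   /* additional sense length */
      0x00, 0x00, 0x00, 0x00, /* command-specific information */
      0x29,                   /* ASC: power on, reset, or bus device reset */
      0x00,                   /* ASCQ */
      0x00, 0x00, 0x00, 0x00
  };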

diff --git a/hw/scsi/esp.c b/hw/scsi/esp.c
index 2d3c649567..c799c19bd4 100644
--- a/hw/scsi/esp.c
+++ b/hw/scsi/esp.c
@@ -939,6 +939,11 @@ static void esp_soft_reset(ESPState *s)
 esp_hard_reset(s);
 }
 
+static void esp_bus_reset(ESPState *s)
+{
+qbus_reset_all(BUS(&s->bus));
+}
+
 static void parent_esp_reset(ESPState *s, int irq, int level)
 {
 if (level) {
@@ -1067,6 +1072,7 @@ void esp_reg_write(ESPState *s, uint32_t saddr, uint64_t val)
 break;
 case CMD_BUSRESET:
 trace_esp_mem_writeb_cmd_bus_reset(val);
+esp_bus_reset(s);
 if (!(s->wregs[ESP_CFG1] & CFG1_RESREPT)) {
 s->rregs[ESP_RINTR] |= INTR_RST;
 esp_raise_irq(s);
-- 
2.25.1




Re: [PATCH 1/4] hw/nvme: avoid unnecessary call to irq (de)assertion functions

2022-08-16 Thread Jinhao Fan
at 11:24 PM, Stefan Hajnoczi  wrote:

> Can the logic be moved into assert()/deassert() so callers don't need
> to duplicate the checks?
> 
> (I assume the optimization is that eventfd syscalls are avoided, not
> that the function call is avoided.)

I guess I can move the eventfd syscall into assert()/deassert().
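
For illustration, a rough standalone sketch of what folding the check into
the helpers could look like; the struct fields and helper names are
assumptions for the sketch, not the actual hw/nvme code:

  #include <stdbool.h>
  #include <stdint.h>
  #include <unistd.h>

  struct sketch_cq {
      bool irq_enabled;
      bool irq_asserted;   /* assumed per-queue bookkeeping bit */
      int  assert_efd;     /* eventfd wired up as an irqfd */
      int  deassert_efd;
  };

  static void sketch_irq_assert(struct sketch_cq *cq)
  {
      uint64_t one = 1;

      if (!cq->irq_enabled || cq->irq_asserted) {
          return;          /* callers no longer duplicate this check */
      }
      cq->irq_asserted = true;
      (void)write(cq->assert_efd, &one, sizeof(one));  /* the only syscall */
  }

  static void sketch_irq_deassert(struct sketch_cq *cq)
  {
      uint64_t one = 1;

      if (!cq->irq_enabled || !cq->irq_asserted) {
          return;
      }
      cq->irq_asserted = false;
      (void)write(cq->deassert_efd, &one, sizeof(one));
  }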




Re: [PATCH 2/4] hw/nvme: add option to (de)assert irq with eventfd

2022-08-16 Thread Jinhao Fan
at 7:20 PM, Klaus Jensen  wrote:

> This option does not seem to change anything - the value is never used
> ;)

What a stupid mistake. I’ll fix this in the next version.



[PATCH 2/2] scsi: Reject commands if the CDB length exceeds buf_len

2022-08-16 Thread John Millikin
In scsi_req_parse_cdb(), if the CDB length implied by the command type
exceeds the initialized portion of the command buffer, reject the request.

Rejected requests are recorded by the `scsi_req_parse_bad` trace event.

One example of a bug detected by this check is SunOS's use of interleaved
DMA and non-DMA commands. This guest behavior currently causes QEMU to
parse uninitialized memory as a SCSI command, with unpredictable
outcomes.

With the new check in place:

  * QEMU consistently creates a trace event and rejects the request.

  * SunOS retries the request(s) and is able to successfully boot from
disk.

Signed-off-by: John Millikin 
Buglink: https://gitlab.com/qemu-project/qemu/-/issues/1127
---
 hw/scsi/scsi-bus.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
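
For context, the length implied by the first CDB byte comes from the opcode's
group code; a standalone sketch of the new check (the group-to-length mapping
here is my reading of scsi_cdb_length() and the SCSI spec, so treat it as an
assumption rather than a copy of the QEMU code):

  #include <stdint.h>
  #include <stdio.h>

  /* CDB length implied by the group code (top 3 bits of the opcode). */
  static int sketch_cdb_length(const uint8_t *buf)
  {
      switch (buf[0] >> 5) {
      case 0:         return 6;   /* group 0: 6-byte CDBs */
      case 1: case 2: return 10;  /* groups 1-2: 10-byte */
      case 4:         return 16;  /* group 4: 16-byte */
      case 5:         return 12;  /* group 5: 12-byte */
      default:        return -1;  /* vendor/reserved groups */
      }
  }

  int main(void)
  {
      /* A READ(10) opcode arriving with only 6 initialized bytes must now
       * be rejected instead of parsing uninitialized memory. */
      uint8_t cdb[6] = { 0x28 };  /* READ(10) */
      int len = sketch_cdb_length(cdb);
      int buf_len = sizeof(cdb);

      if (len < 0 || len > buf_len) {
          printf("reject: implied CDB length %d > initialized %d bytes\n",
                 len, buf_len);
      }
      return 0;
  }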

diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index 288ea12969..1beb1b0cfc 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -712,6 +712,8 @@ SCSIRequest *scsi_req_new(SCSIDevice *d, uint32_t tag, uint32_t lun,
 SCSICommand cmd = { .len = 0 };
 int ret;
 
+assert(buf_len > 0);
+
 if ((d->unit_attention.key == UNIT_ATTENTION ||
  bus->unit_attention.key == UNIT_ATTENTION) &&
 (buf[0] != INQUIRY &&
@@ -1316,7 +1318,7 @@ int scsi_req_parse_cdb(SCSIDevice *dev, SCSICommand *cmd, uint8_t *buf,
 
 cmd->lba = -1;
 len = scsi_cdb_length(buf);
-if (len < 0) {
+if (len < 0 || len > buf_len) {
 return -1;
 }
 
-- 
2.25.1




[PATCH 1/2] scsi: Add buf_len parameter to scsi_req_new()

2022-08-16 Thread John Millikin
When a SCSI command is received from the guest, the CDB length implied
by the first byte might exceed the number of bytes the guest sent. In
this case scsi_req_new() will read uninitialized data, causing
unpredictable behavior.

Adds the buf_len parameter to scsi_req_new() and plumbs it through the
call stack.

The sigil SCSI_CMD_BUF_LEN_TODO() is used to indicate that the buffer
length calculation is a TODO; it should be replaced by a better value,
such as the length of a successful DMA read.

Signed-off-by: John Millikin 
Buglink: https://gitlab.com/qemu-project/qemu/-/issues/1127
---
 hw/scsi/esp.c  |  2 +-
 hw/scsi/lsi53c895a.c   |  2 +-
 hw/scsi/megasas.c  | 13 -
 hw/scsi/mptsas.c   |  3 ++-
 hw/scsi/scsi-bus.c | 19 +++
 hw/scsi/scsi-disk.c|  7 ---
 hw/scsi/scsi-generic.c |  5 +++--
 hw/scsi/spapr_vscsi.c  |  3 ++-
 hw/scsi/virtio-scsi.c  |  6 --
 hw/scsi/vmw_pvscsi.c   |  3 ++-
 hw/usb/dev-storage.c   |  3 ++-
 hw/usb/dev-uas.c   |  3 ++-
 include/hw/scsi/scsi.h | 11 ++-
 include/scsi/utils.h   |  6 ++
 14 files changed, 54 insertions(+), 32 deletions(-)
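
The include/scsi/utils.h hunk is not shown below; as a guess, the sigil could
be as simple as an identity macro that only marks the call sites needing a
real length later. This is an assumption, not the actual definition:

  /* Hypothetical sketch of the marker; the real definition may differ. */
  #define SCSI_CMD_BUF_LEN_TODO(len) (len)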

diff --git a/hw/scsi/esp.c b/hw/scsi/esp.c
index 2d3c649567..19fafad2a3 100644
--- a/hw/scsi/esp.c
+++ b/hw/scsi/esp.c
@@ -292,7 +292,7 @@ static void do_command_phase(ESPState *s)
 esp_fifo_pop_buf(&s->cmdfifo, buf, cmdlen);
 
 current_lun = scsi_device_find(&s->bus, 0, s->current_dev->id, s->lun);
-s->current_req = scsi_req_new(current_lun, 0, s->lun, buf, s);
+s->current_req = scsi_req_new(current_lun, 0, s->lun, buf, cmdlen, s);
 datalen = scsi_req_enqueue(s->current_req);
 s->ti_size = datalen;
 fifo8_reset(&s->cmdfifo);
diff --git a/hw/scsi/lsi53c895a.c b/hw/scsi/lsi53c895a.c
index ad5f5e5f39..e8a4a705e7 100644
--- a/hw/scsi/lsi53c895a.c
+++ b/hw/scsi/lsi53c895a.c
@@ -864,7 +864,7 @@ static void lsi_do_command(LSIState *s)
 s->current = g_new0(lsi_request, 1);
 s->current->tag = s->select_tag;
 s->current->req = scsi_req_new(dev, s->current->tag, s->current_lun, buf,
-   s->current);
+   SCSI_CMD_BUF_LEN_TODO(s->dbc), s->current);
 
 n = scsi_req_enqueue(s->current->req);
 if (n) {
diff --git a/hw/scsi/megasas.c b/hw/scsi/megasas.c
index d5dfb412ba..e887ae8adb 100644
--- a/hw/scsi/megasas.c
+++ b/hw/scsi/megasas.c
@@ -1053,6 +1053,7 @@ static int megasas_pd_get_info_submit(SCSIDevice *sdev, int lun,
 uint64_t pd_size;
 uint16_t pd_id = ((sdev->id & 0xFF) << 8) | (lun & 0xFF);
 uint8_t cmdbuf[6];
+size_t cmdbuf_len = SCSI_CMD_BUF_LEN_TODO(sizeof(cmdbuf));
 size_t len;
 dma_addr_t residual;
 
@@ -1062,7 +1063,7 @@ static int megasas_pd_get_info_submit(SCSIDevice *sdev, int lun,
 info->inquiry_data[0] = 0x7f; /* Force PQual 0x3, PType 0x1f */
 info->vpd_page83[0] = 0x7f;
 megasas_setup_inquiry(cmdbuf, 0, sizeof(info->inquiry_data));
-cmd->req = scsi_req_new(sdev, cmd->index, lun, cmdbuf, cmd);
+cmd->req = scsi_req_new(sdev, cmd->index, lun, cmdbuf, cmdbuf_len, cmd);
 if (!cmd->req) {
 trace_megasas_dcmd_req_alloc_failed(cmd->index,
 "PD get info std inquiry");
@@ -1080,7 +1081,7 @@ static int megasas_pd_get_info_submit(SCSIDevice *sdev, int lun,
 return MFI_STAT_INVALID_STATUS;
 } else if (info->inquiry_data[0] != 0x7f && info->vpd_page83[0] == 0x7f) {
 megasas_setup_inquiry(cmdbuf, 0x83, sizeof(info->vpd_page83));
-cmd->req = scsi_req_new(sdev, cmd->index, lun, cmdbuf, cmd);
+cmd->req = scsi_req_new(sdev, cmd->index, lun, cmdbuf, cmdbuf_len, cmd);
 if (!cmd->req) {
 trace_megasas_dcmd_req_alloc_failed(cmd->index,
 "PD get info vpd inquiry");
@@ -1259,6 +1260,7 @@ static int megasas_ld_get_info_submit(SCSIDevice *sdev, int lun,
 struct mfi_ld_info *info = cmd->iov_buf;
 size_t dcmd_size = sizeof(struct mfi_ld_info);
 uint8_t cdb[6];
+size_t cdb_len;
 ssize_t len;
 dma_addr_t residual;
 uint16_t sdev_id = ((sdev->id & 0xFF) << 8) | (lun & 0xFF);
@@ -1268,7 +1270,8 @@ static int megasas_ld_get_info_submit(SCSIDevice *sdev, int lun,
 cmd->iov_buf = g_malloc0(dcmd_size);
 info = cmd->iov_buf;
 megasas_setup_inquiry(cdb, 0x83, sizeof(info->vpd_page83));
-cmd->req = scsi_req_new(sdev, cmd->index, lun, cdb, cmd);
+cdb_len = SCSI_CMD_BUF_LEN_TODO(sizeof(cdb));
+cmd->req = scsi_req_new(sdev, cmd->index, lun, cdb, cdb_len, cmd);
 if (!cmd->req) {
 trace_megasas_dcmd_req_alloc_failed(cmd->index,
 "LD get info vpd inquiry");
@@ -1748,7 +1751,7 @@ static int megasas_handle_scsi(MegasasState *s, MegasasCmd *cmd,
 return MFI_STAT_SCSI_DONE_WITH_ERROR;
 }
 
-cmd->req = scsi_req_new(sdev, 

Re: [PATCH 4/4] hw/nvme: add MSI-x mask handlers for irqfd

2022-08-16 Thread Jinhao Fan
at 6:46 PM, Klaus Jensen  wrote:

> 
> Did qtest work out for you for testing? If so, it would be nice to add a
> simple test case as well.

I found MSI-x masking harder to test than we imagined. My plan is to only
emulate IO queues in the IOthread and leave admin queue emulation in the
main loop since some admin commands require BQL. So I didn’t enable irqfd on
admin queues. Therefore we can only test MSI-x masking on IO queues. This
makes qtest complicated since we need to handle IO queue creation.

But I’m not sure my understanding is correct. Is it true that the admin
queue does not need irqfd as long as it runs in the main loop thread?



Re: Bluetooth support in QEMU

2022-08-16 Thread vaishu venkat
Hi Team,

Do you have any thoughts or solutions for accessing the wireless adapter
interface inside QEMU?


Thanks and regards,
Vaishnavi

On Tue, Aug 16, 2022 at 11:33 AM vaishu venkat 
wrote:

> Thomas,
>
> Sure, we will try USB passthrough. Do you have any thoughts on
> accessing the wireless interface inside QEMU?
>
> Thanks in Anticipation.
>
>
>
>
> Regards,
> Vaishnavi
>
> On Tue, Aug 16, 2022 at 11:27 AM Thomas Huth  wrote:
>
>> On 16/08/2022 06.22, vaishu venkat wrote:
>> > Hi Thomas,
>> >
>> > Thanks for prompt response.
>> >
>> > We are currently using the QEMU version below:
>> >
>> > qemu-system-aarch64 -version
>> > QEMU emulator version 4.2.1 (Debian 1:4.2-3ubuntu6.23)
>> > Copyright (c) 2003-2019 Fabrice Bellard and the QEMU Project developers
>> >
>> > Could you please guide us on how to access the real bluetooth devices
>> > in the guest?
>>
>> Simply search for "QEMU USB passthrough" with your favourite search
>> engine,
>> there are plenty of help pages out there.
>>
>>   Thomas
>>
>>


Re: [PATCH v13 0/6] Improve PMU support

2022-08-16 Thread Alistair Francis
On Wed, Aug 17, 2022 at 9:23 AM Atish Patra  wrote:
>
> The latest version of the SBI specification includes a Performance Monitoring
> Unit(PMU) extension[1] which allows the supervisor to start/stop/configure
> various PMU events. The Sscofpmf ('Ss' for Privileged arch and 
> Supervisor-level
> extensions, and 'cofpmf' for Count OverFlow and Privilege Mode Filtering)
> extension[2] allows the perf like tool to handle overflow interrupts and
> filtering support.
>
> This series implements remaining PMU infrastructure to support
> PMU in virt machine. The first seven patches from the original series
> have been merged already.
>
> This will allow us to add any PMU events in future.
> Currently, this series enables the following pmu events.
> 1. cycle count
> 2. instruction count
> 3. DTLB load/store miss
> 4. ITLB prefetch miss
>
> The first two are computed using host ticks while last three are counted 
> during
> cpu_tlb_fill. We can do both sampling and count from guest userspace.
> This series has been tested on both RV64 and RV32. Both Linux[3] and 
> Opensbi[4]
> patches are required to get the perf working.
>
> Here is an output of perf stat/report while running hackbench with latest
> OpenSBI & Linux kernel.
>
> Perf stat:
> ==
> [root@fedora-riscv ~]# perf stat -e cycles -e instructions -e 
> dTLB-load-misses -e dTLB-store-misses -e iTLB-load-misses \
> > perf bench sched messaging -g 1 -l 10
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 1 groups == 40 processes run
>
>  Total time: 0.265 [sec]
>
>  Performance counter stats for 'perf bench sched messaging -g 1 -l 10':
>
>  4,167,825,362  cycles
>  4,166,609,256  instructions  #1.00  insn per cycle
>  3,092,026  dTLB-load-misses
>258,280  dTLB-store-misses
>  2,068,966  iTLB-load-misses
>
>0.585791767 seconds time elapsed
>
>0.373802000 seconds user
>1.042359000 seconds sys
>
> Perf record:
> 
> [root@fedora-riscv ~]# perf record -e cycles -e instructions \
> > -e dTLB-load-misses -e dTLB-store-misses -e iTLB-load-misses -c 1 \
> > perf bench sched messaging -g 1 -l 10
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 1 groups == 40 processes run
>
>  Total time: 1.397 [sec]
> [ perf record: Woken up 10 times to write data ]
> Check IO/CPU overload!
> [ perf record: Captured and wrote 8.211 MB perf.data (214486 samples) ]
>
> [root@fedora-riscv riscv]# perf report
> Available samples
> 107K cycles   
>  ◆
> 107K instructions 
>  ▒
> 250 dTLB-load-misses  
>  ▒
> 13 dTLB-store-misses  
>  ▒
> 172 iTLB-load-misses
> ..
>
> Changes from v12->v13:
> 1. Rebased on top of the apply-next.
> 2. Addressed comments about space & comment block.
>
> Changes from v11->v12:
> 1. Rebased on top of the apply-next.
> 2. Aligned the write function & .min_priv to the previous line.
> 3. Fixed the FDT generations for multi-socket scenario.
> 4. Dropped interrupt property from the DT.
> 5. Generate illegal instruction fault instead of virtual instruction fault
>for VS/VU access while mcounteren is not set.
>
> Changes from v10->v11:
> 1. Rebased on top of the master where first 7 patches were already merged.
> 2. Removed unnecessary additional check in ctr predicate function.
> 3. Removed unnecessary priv version checks in mcountinhibit read/write.
> 4. Added Heiko's reviewed-by/tested-by tags.
>
> Changes from v8->v9:
> 1. Added the write_done flags to the vmstate.
> 2. Fixed the hpmcounter read access from M-mode.
>
> Changes from v7->v8:
> 1. Removed ordering constraints for mhpmcounter & mhpmevent.
>
> Changes from v6->v7:
> 1. Fixed all the compilation errors for the usermode.
>
> Changes from v5->v6:
> 1. Fixed compilation issue with PATCH 1.
> 2. Addressed other comments.
>
> Changes from v4->v5:
> 1. Rebased on top of the -next with following patches.
>- isa extension
>- priv 1.12 spec
> 2. Addressed all the comments on v4
> 3. Removed additional isa-ext DT node in favor of riscv,isa string update
>
> Changes from v3->v4:
> 1. Removed the dummy events from pmu DT node.
> 2. Fixed pmu_avail_counters mask generation.
> 3. Added a patch to simplify the predicate function for counters.
>
> Changes from v2->v3:
> 1. Addressed all the comments on PATCH1-4.
> 2. Split patch1 into two separate patches.
> 3. Added explicit comments to explain the event types in DT node.
> 4. Rebased on latest Qemu.
>
> Changes from v1->v2:
> 1. Dropped the ACks from v1 as significant changes happened after v1.
> 2. sscofpmf support.
> 3. A generic counter management framework.
>
> [1] 

Re: [PATCH for-7.1 3/4] target/loongarch: rename the TCG CPU "la464" to "qemu64-v1.00"

2022-08-16 Thread maobibo
A QEMU64 cpu model can be added, however the la464 cpu model should
still be kept here. Actually there is no formal micro-architecture name
for LoongArch; I still prefer la464 :)

Also a host cpu model can be added later, which has the same features as
the host processor. What is the meaning of a QEMU64/KVM64 cpu model?
Does it mean the minimum CPU features required by currently popular
OSVs?

regards
bibo,mao


On 2022/8/14 22:55, WANG Xuerui wrote:
> From: WANG Xuerui 
> 
> The only LoongArch CPU implemented is modeled after the Loongson 3A5000,
> but it is not the real thing, and at least one feature [1] is missing
> that actually made the model incompatible with the real 3A5000. What's
> more, the model is currently named "la464", while none of the
> micro-architecture-specific things are currently present, further making
> things needlessly complex.
> 
> In general, high-fidelity models can and should be named after the real
> hardware model, while generic emulation-oriented models should be named
> after ISA levels. For now, the best reference for LoongArch ISA levels
> is the revision number of the LoongArch ISA Manual, of which v1.00 is
> still the latest. (v1.01 and v1.02 are minor revisions without
> substantive change.)
> 
> As defined by various specs, the vendor and model names are also
> reflected in respective CSRs, and are 8 bytes long. So, rename "la464"
> to "qemu64-v1.00", with "QEMU64" as vendor name and "v1.00" as model
> name.
> 
> As the QEMU 7.1 hasn't been officially released, no downstream is
> expected to depend on the old name, so this change should be safe for
> 7.1.
> 
> [1]: 
> https://lore.kernel.org/loongarch/20220726094049.7200-2-maob...@loongson.cn/
> 
> Signed-off-by: WANG Xuerui 
> ---
>  hw/loongarch/virt.c| 14 ++
>  target/loongarch/cpu.c |  6 +++---
>  2 files changed, 5 insertions(+), 15 deletions(-)
> 
> diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c
> index 5cc0b05538..35e2174a17 100644
> --- a/hw/loongarch/virt.c
> +++ b/hw/loongarch/virt.c
> @@ -626,7 +626,6 @@ static void loongarch_direct_kernel_boot(LoongArchMachineState *lams)
>  static void loongarch_init(MachineState *machine)
>  {
>  LoongArchCPU *lacpu;
> -const char *cpu_model = machine->cpu_type;
>  ram_addr_t offset = 0;
>  ram_addr_t ram_size = machine->ram_size;
>  uint64_t highram_size = 0;
> @@ -634,15 +633,6 @@ static void loongarch_init(MachineState *machine)
>  LoongArchMachineState *lams = LOONGARCH_MACHINE(machine);
>  int i;
>  
> -if (!cpu_model) {
> -cpu_model = LOONGARCH_CPU_TYPE_NAME("la464");
> -}
> -
> -if (!strstr(cpu_model, "la464")) {
> -error_report("LoongArch/TCG needs cpu type la464");
> -exit(1);
> -}
> -
>  if (ram_size < 1 * GiB) {
>  error_report("ram_size must be greater than 1G.");
>  exit(1);
> @@ -749,10 +739,10 @@ static void loongarch_class_init(ObjectClass *oc, void *data)
>  {
>  MachineClass *mc = MACHINE_CLASS(oc);
>  
> -mc->desc = "Loongson-3A5000 LS7A1000 machine";
> +mc->desc = "LoongArch64 v1.00-compatible LS7A1000 machine";
>  mc->init = loongarch_init;
>  mc->default_ram_size = 1 * GiB;
> -mc->default_cpu_type = LOONGARCH_CPU_TYPE_NAME("la464");
> +mc->default_cpu_type = LOONGARCH_CPU_TYPE_NAME("qemu64-v1.00");
>  mc->default_ram_id = "loongarch.ram";
>  mc->max_cpus = LOONGARCH_MAX_VCPUS;
>  mc->is_default = 1;
> diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
> index 4663539443..0a41509a0c 100644
> --- a/target/loongarch/cpu.c
> +++ b/target/loongarch/cpu.c
> @@ -527,9 +527,9 @@ static uint64_t loongarch_qemu_read(void *opaque, hwaddr addr, unsigned size)
>  return 1ULL << IOCSRF_MSI | 1ULL << IOCSRF_EXTIOI |
> 1ULL << IOCSRF_CSRIPI;
>  case VENDOR_REG:
> -return 0x6e6f73676e6f6f4cULL; /* "Loongson" */
> +return 0x3436554d4551ULL; /* "QEMU64" */
>  case CPUNAME_REG:
> -return 0x303030354133ULL; /* "3A5000" */
> +return 0x30302e3176ULL;   /* "v1.00" */
>  case MISC_FUNC_REG:
>  return 1ULL << IOCSRM_EXTIOI_EN;
>  }
> @@ -715,7 +715,7 @@ static const TypeInfo loongarch_cpu_type_infos[] = {
>  .class_size = sizeof(LoongArchCPUClass),
>  .class_init = loongarch_cpu_class_init,
>  },
> -DEFINE_LOONGARCH_CPU_TYPE("la464", loongarch_la464_initfn),
> +DEFINE_LOONGARCH_CPU_TYPE("qemu64-v1.00", loongarch_la464_initfn),
>  };
>  
>  DEFINE_TYPES(loongarch_cpu_type_infos)
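
For reference, the VENDOR_REG/CPUNAME_REG values in the quoted hunk above are
simply the ASCII names packed into a 64-bit word, first character in the
least significant byte. A standalone check, for illustration only (assumes a
little-endian host):

  #include <inttypes.h>
  #include <stdint.h>
  #include <stdio.h>
  #include <string.h>

  static uint64_t pack_name(const char *name)
  {
      uint64_t val = 0;
      memcpy(&val, name, strlen(name));  /* up to 8 chars, LSB first */
      return val;
  }

  int main(void)
  {
      printf("0x%" PRIx64 "\n", pack_name("QEMU64"));  /* 0x3436554d4551 */
      printf("0x%" PRIx64 "\n", pack_name("v1.00"));   /* 0x30302e3176 */
      return 0;
  }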




[PATCH] .gitlab-ci.d/buildtest.yml: Increase the check-gprof-gcov job timeout

2022-08-16 Thread Bin Meng
Current project timeout is 1 hour, but the check-gprof-gcov job never
completes within 1 hour. Increase the job timeout to 90 minutes.

Signed-off-by: Bin Meng 
---

 .gitlab-ci.d/buildtest.yml | 1 +
 1 file changed, 1 insertion(+)

diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
index 1931b77b49..52d45508fb 100644
--- a/.gitlab-ci.d/buildtest.yml
+++ b/.gitlab-ci.d/buildtest.yml
@@ -495,6 +495,7 @@ check-gprof-gcov:
   variables:
 IMAGE: ubuntu2004
 MAKE_CHECK_ARGS: check
+  timeout: 90m
   after_script:
 - ${CI_PROJECT_DIR}/scripts/ci/coverage-summary.sh
 
-- 
2.34.1




Re: [PATCH for-7.1 3/4] target/loongarch: rename the TCG CPU "la464" to "qemu64-v1.00"

2022-08-16 Thread chen huacai
Hi, Richard and Xuerui,

On Mon, Aug 15, 2022 at 4:54 AM Richard Henderson
 wrote:
>
> On 8/14/22 09:55, WANG Xuerui wrote:
> > From: WANG Xuerui 
> >
> > The only LoongArch CPU implemented is modeled after the Loongson 3A5000,
> > but it is not the real thing, ...
>
> The 3A5000 is the SoC, as far as I could find, and the documentation of that 
> says the core
> is named the la464.
>
>
> > In general, high-fidelity models can and should be named after the real
> > hardware model, while generic emulation-oriented models should be named
> > after ISA levels.
>
> This wasn't intended to be a generic emulation model, as far as I know.  
> There are missing
> features, but presumably those would eventually be filled in.
>
>
> > For now, the best reference for LoongArch ISA levels
> > is the revision number of the LoongArch ISA Manual, of which v1.00 is
> > still the latest. (v1.01 and v1.02 are minor revisions without
> > substantive change.)
> >
> > As defined by various specs, the vendor and model names are also
> > reflected in respective CSRs, and are 8 bytes long. So, rename "la464"
> > to "qemu64-v1.00", with "QEMU64" as vendor name and "v1.00" as model
> > name.
>
> Eh, I suppose.  I'm not really keen on this though, as I would presume there 
> will be
> eventual forward progress on completing the real cpu model.  We simply won't 
> give any
> compatibility guarantees for loongarch until we are ready to do so.
In my opinion, real cpu names (Loongson-3A5000, Loongson-3A6000, etc.)
and generic qemu-emulated names (qemu64-v1.00, qemu64-v2.00, where vx.xx
is the ISA level; I found this style is used for x86) are both
acceptable. But la464 is not a good cpu name, because la264 and la464
are at the same ISA level, while la664 will be at a new level.

Huacai

>
>
> r~
>


-- 
Huacai Chen



[PATCH v5 1/3] Update linux headers to 6.0-rc1

2022-08-16 Thread Chenyi Qiang
commit 568035b01cfb107af8d2e4bd2fb9aea22cf5b868

Signed-off-by: Chenyi Qiang 
---
 include/standard-headers/asm-x86/bootparam.h  |   7 +-
 include/standard-headers/drm/drm_fourcc.h |  73 +++-
 include/standard-headers/linux/ethtool.h  |  29 +--
 include/standard-headers/linux/input.h|  12 +-
 include/standard-headers/linux/pci_regs.h |  30 ++-
 include/standard-headers/linux/vhost_types.h  |  17 +-
 include/standard-headers/linux/virtio_9p.h|   2 +-
 .../standard-headers/linux/virtio_config.h|   7 +-
 include/standard-headers/linux/virtio_ids.h   |  14 +-
 include/standard-headers/linux/virtio_net.h   |  34 +++-
 include/standard-headers/linux/virtio_pci.h   |   2 +
 linux-headers/asm-arm64/kvm.h |  27 +++
 linux-headers/asm-generic/unistd.h|   4 +-
 linux-headers/asm-riscv/kvm.h |  22 +++
 linux-headers/asm-riscv/unistd.h  |   3 +-
 linux-headers/asm-s390/kvm.h  |   1 +
 linux-headers/asm-x86/kvm.h   |  33 ++--
 linux-headers/asm-x86/mman.h  |  14 --
 linux-headers/linux/kvm.h | 172 +-
 linux-headers/linux/userfaultfd.h |  10 +-
 linux-headers/linux/vduse.h   |  47 +
 linux-headers/linux/vfio.h|   4 +-
 linux-headers/linux/vfio_zdev.h   |   7 +
 linux-headers/linux/vhost.h   |  35 +++-
 24 files changed, 523 insertions(+), 83 deletions(-)

diff --git a/include/standard-headers/asm-x86/bootparam.h b/include/standard-headers/asm-x86/bootparam.h
index b2aaad10e5..0b06d2bff1 100644
--- a/include/standard-headers/asm-x86/bootparam.h
+++ b/include/standard-headers/asm-x86/bootparam.h
@@ -10,12 +10,13 @@
 #define SETUP_EFI  4
 #define SETUP_APPLE_PROPERTIES 5
 #define SETUP_JAILHOUSE6
+#define SETUP_CC_BLOB  7
+#define SETUP_IMA  8
 #define SETUP_RNG_SEED 9
+#define SETUP_ENUM_MAX SETUP_RNG_SEED
 
 #define SETUP_INDIRECT (1<<31)
-
-/* SETUP_INDIRECT | max(SETUP_*) */
-#define SETUP_TYPE_MAX (SETUP_INDIRECT | SETUP_JAILHOUSE)
+#define SETUP_TYPE_MAX (SETUP_ENUM_MAX | SETUP_INDIRECT)
 
 /* ram_size flags */
 #define RAMDISK_IMAGE_START_MASK   0x07FF
diff --git a/include/standard-headers/drm/drm_fourcc.h b/include/standard-headers/drm/drm_fourcc.h
index 4888f85f69..48b620cbef 100644
--- a/include/standard-headers/drm/drm_fourcc.h
+++ b/include/standard-headers/drm/drm_fourcc.h
@@ -558,7 +558,7 @@ extern "C" {
  *
  * The main surface is Y-tiled and is at plane index 0 whereas CCS is linear
  * and at index 1. The clear color is stored at index 2, and the pitch should
- * be ignored. The clear color structure is 256 bits. The first 128 bits
+ * be 64 bytes aligned. The clear color structure is 256 bits. The first 128 bits
  * represents Raw Clear Color Red, Green, Blue and Alpha color each represented
  * by 32 bits. The raw clear color is consumed by the 3d engine and generates
  * the converted clear color of size 64 bits. The first 32 bits store the Lower
@@ -571,6 +571,53 @@ extern "C" {
  */
 #define I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC fourcc_mod_code(INTEL, 8)
 
+/*
+ * Intel Tile 4 layout
+ *
+ * This is a tiled layout using 4KB tiles in a row-major layout. It has the same
+ * shape as Tile Y at two granularities: 4KB (128B x 32) and 64B (16B x 4). It
+ * only differs from Tile Y at the 256B granularity in between. At this
+ * granularity, Tile Y has a shape of 16B x 32 rows, but this tiling has a shape
+ * of 64B x 8 rows.
+ */
+#define I915_FORMAT_MOD_4_TILED fourcc_mod_code(INTEL, 9)
+
+/*
+ * Intel color control surfaces (CCS) for DG2 render compression.
+ *
+ * The main surface is Tile 4 and at plane index 0. The CCS data is stored
+ * outside of the GEM object in a reserved memory area dedicated for the
+ * storage of the CCS data for all RC/RC_CC/MC compressible GEM objects. The
+ * main surface pitch is required to be a multiple of four Tile 4 widths.
+ */
+#define I915_FORMAT_MOD_4_TILED_DG2_RC_CCS fourcc_mod_code(INTEL, 10)
+
+/*
+ * Intel color control surfaces (CCS) for DG2 media compression.
+ *
+ * The main surface is Tile 4 and at plane index 0. For semi-planar formats
+ * like NV12, the Y and UV planes are Tile 4 and are located at plane indices
+ * 0 and 1, respectively. The CCS for all planes are stored outside of the
+ * GEM object in a reserved memory area dedicated for the storage of the
+ * CCS data for all RC/RC_CC/MC compressible GEM objects. The main surface
+ * pitch is required to be a multiple of four Tile 4 widths.
+ */
+#define I915_FORMAT_MOD_4_TILED_DG2_MC_CCS fourcc_mod_code(INTEL, 11)
+
+/*
+ * Intel Color Control Surface with Clear Color (CCS) for DG2 render compression.
+ *
+ * The main surface is Tile 4 and at plane index 0. The CCS data 

[PATCH v5 0/3] Enable notify VM exit

2022-08-16 Thread Chenyi Qiang
Notify VM exit is introduced to mitigate a potential DoS attack from a
malicious VM. This series is the userspace part to enable this feature
through a new KVM capability KVM_CAP_X86_NOTIFY_VMEXIT. The detailed
info can be seen in Patch 3.

The corresponding KVM support can be found in linux 6.0-rc1:
(2f4073e08f4c KVM: VMX: Enable Notify VM exit)

---
Change logs:
v4 -> v5
- Remove the assert check to avoid the nop in NDEBUG case. (Yuan)
- v4: 
https://lore.kernel.org/qemu-devel/20220524140302.23272-1-chenyi.qi...@intel.com/

v3 -> v4
- Add a new KVM cap KVM_CAP_TRIPLE_FAULT_EVENT to guard the extension of
  triple fault event save
- v3: 
https://lore.kernel.org/qemu-devel/20220421074028.18196-1-chenyi.qi...@intel.com/

v2 -> v3
- Extend the argument to include both the notify window and some flags
  when enabling KVM_CAP_X86_BUS_LOCK_EXIT CAP.
- Change to use KVM_VCPUEVENTS_VALID_TRIPLE_FAULT in flags field and add
  pending_triple_fault field in struct kvm_vcpu_events.
- v2: 
https://lore.kernel.org/qemu-devel/20220318082934.25030-1-chenyi.qi...@intel.com/

v1 -> v2
- Add some commit message to explain why we disable Notify VM exit by default.
- Rename KVM_VCPUEVENT_SHUTDOWN to KVM_VCPUEVENT_TRIPLE_FAULT.
- Do the corresponding change to use the KVM_VCPUEVENTS_TRIPLE_FAULT
  to save/restore the triple fault event to avoid lose some synthesized
  triple fault from KVM.
- v1: 
https://lore.kernel.org/qemu-devel/20220310090205.10645-1-chenyi.qi...@intel.com/

---

Chenyi Qiang (3):
  Update linux headers to 6.0-rc1
  i386: kvm: extend kvm_{get, put}_vcpu_events to support pending triple
fault
  i386: Add notify VM exit support

 hw/i386/x86.c |  45 +
 include/hw/i386/x86.h |   5 +
 include/standard-headers/asm-x86/bootparam.h  |   7 +-
 include/standard-headers/drm/drm_fourcc.h |  73 +++-
 include/standard-headers/linux/ethtool.h  |  29 +--
 include/standard-headers/linux/input.h|  12 +-
 include/standard-headers/linux/pci_regs.h |  30 ++-
 include/standard-headers/linux/vhost_types.h  |  17 +-
 include/standard-headers/linux/virtio_9p.h|   2 +-
 .../standard-headers/linux/virtio_config.h|   7 +-
 include/standard-headers/linux/virtio_ids.h   |  14 +-
 include/standard-headers/linux/virtio_net.h   |  34 +++-
 include/standard-headers/linux/virtio_pci.h   |   2 +
 linux-headers/asm-arm64/kvm.h |  27 +++
 linux-headers/asm-generic/unistd.h|   4 +-
 linux-headers/asm-riscv/kvm.h |  22 +++
 linux-headers/asm-riscv/unistd.h  |   3 +-
 linux-headers/asm-s390/kvm.h  |   1 +
 linux-headers/asm-x86/kvm.h   |  33 ++--
 linux-headers/asm-x86/mman.h  |  14 --
 linux-headers/linux/kvm.h | 172 +-
 linux-headers/linux/userfaultfd.h |  10 +-
 linux-headers/linux/vduse.h   |  47 +
 linux-headers/linux/vfio.h|   4 +-
 linux-headers/linux/vfio_zdev.h   |   7 +
 linux-headers/linux/vhost.h   |  35 +++-
 target/i386/cpu.c |   1 +
 target/i386/cpu.h |   1 +
 target/i386/kvm/kvm.c |  48 +
 29 files changed, 623 insertions(+), 83 deletions(-)

-- 
2.17.1




[PATCH v5 2/3] i386: kvm: extend kvm_{get, put}_vcpu_events to support pending triple fault

2022-08-16 Thread Chenyi Qiang
For the direct triple faults, i.e. hardware detected and KVM morphed
to VM-Exit, KVM will never lose them. But for triple faults synthesized
by KVM, e.g. the RSM path, if KVM exits to userspace before the request
is serviced, userspace could migrate the VM and lose the triple fault.

A new flag KVM_VCPUEVENT_VALID_TRIPLE_FAULT is defined to signal that
the event.triple_fault_pending field contains a valid state if the
KVM_CAP_X86_TRIPLE_FAULT_EVENT capability is enabled.

Signed-off-by: Chenyi Qiang 
---
 target/i386/cpu.c |  1 +
 target/i386/cpu.h |  1 +
 target/i386/kvm/kvm.c | 20 
 3 files changed, 22 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 1db1278a59..6e107466b3 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6017,6 +6017,7 @@ static void x86_cpu_reset(DeviceState *dev)
 env->exception_has_payload = false;
 env->exception_payload = 0;
 env->nmi_injected = false;
+env->triple_fault_pending = false;
 #if !defined(CONFIG_USER_ONLY)
 /* We hard-wire the BSP to the first CPU. */
 apic_designate_bsp(cpu->apic_state, s->cpu_index == 0);
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 82004b65b9..b97d182e28 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1739,6 +1739,7 @@ typedef struct CPUArchState {
 uint8_t has_error_code;
 uint8_t exception_has_payload;
 uint64_t exception_payload;
+bool triple_fault_pending;
 uint32_t ins_len;
 uint32_t sipi_vector;
 bool tsc_valid;
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index f148a6d52f..cb88ba4a00 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -132,6 +132,7 @@ static int has_xcrs;
 static int has_pit_state2;
 static int has_sregs2;
 static int has_exception_payload;
+static int has_triple_fault_event;
 
 static bool has_msr_mcg_ext_ctl;
 
@@ -2466,6 +2467,16 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
 }
 }
 
+has_triple_fault_event = kvm_check_extension(s, KVM_CAP_X86_TRIPLE_FAULT_EVENT);
+if (has_triple_fault_event) {
+ret = kvm_vm_enable_cap(s, KVM_CAP_X86_TRIPLE_FAULT_EVENT, 0, true);
+if (ret < 0) {
+error_report("kvm: Failed to enable triple fault event cap: %s",
+ strerror(-ret));
+return ret;
+}
+}
+
 ret = kvm_get_supported_msrs(s);
 if (ret < 0) {
 return ret;
@@ -4282,6 +4293,11 @@ static int kvm_put_vcpu_events(X86CPU *cpu, int level)
 }
 }
 
+if (has_triple_fault_event) {
+events.flags |= KVM_VCPUEVENT_VALID_TRIPLE_FAULT;
+events.triple_fault.pending = env->triple_fault_pending;
+}
+
 return kvm_vcpu_ioctl(CPU(cpu), KVM_SET_VCPU_EVENTS, &events);
 }
 
@@ -4351,6 +4367,10 @@ static int kvm_get_vcpu_events(X86CPU *cpu)
 }
 }
 
+if (events.flags & KVM_VCPUEVENT_VALID_TRIPLE_FAULT) {
+env->triple_fault_pending = events.triple_fault.pending;
+}
+
 env->sipi_vector = events.sipi_vector;
 
 return 0;
-- 
2.17.1




[PATCH v5 3/3] i386: Add notify VM exit support

2022-08-16 Thread Chenyi Qiang
There are cases where a malicious virtual machine can cause the CPU to
get stuck (because event windows don't open up), e.g., an infinite loop
in microcode when handling a nested #AC (CVE-2015-5307). No event window
means no event (NMI, SMI or IRQ) can be delivered, which leaves the CPU
unavailable to the host or other VMs. Notify VM exit is introduced to
mitigate such attacks: it generates a VM exit if no event window occurs
in VM non-root mode for a specified amount of time (the notify window).

A new KVM capability KVM_CAP_X86_NOTIFY_VMEXIT is exposed to user space
so that the user can query the capability and set the expected notify
window when creating VMs. The format of the argument when enabling this
capability is as follows:
  Bit 63:32 - notify window specified in qemu command
  Bit 31:0  - some flags (e.g. KVM_X86_NOTIFY_VMEXIT_ENABLED is set to
  enable the feature.)

There are some concerns, e.g. a notify VM exit may happen with
VM_CONTEXT_INVALID set in the exit qualification (no cases are
anticipated that would set this bit), which means the VM context is
corrupted. To avoid a false positive that kills a well-behaved guest,
this feature is disabled by default. Users can enable the feature with a
new machine property:
qemu -machine notify_vmexit=on,notify_window=0 ...

A new KVM exit reason KVM_EXIT_NOTIFY is defined for notify VM exit. If
it happens with VM_INVALID_CONTEXT, the hypervisor exits to user space
to report the fatal case. User space can then inject a SHUTDOWN event
into the target vcpu; this is implemented by injecting a synthesized
triple fault event.

Signed-off-by: Chenyi Qiang 
---
 hw/i386/x86.c | 45 +++
 include/hw/i386/x86.h |  5 +
 target/i386/kvm/kvm.c | 28 +++
 3 files changed, 78 insertions(+)
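
The target/i386/kvm/kvm.c part of the diff is truncated below; as a sketch of
how the enable-cap argument described above might be packed (the flag bit
value and the commented call site are assumptions, not a copy of the patch):

  #include <stdint.h>

  #define SKETCH_NOTIFY_VMEXIT_ENABLED (1u << 0)  /* assumed flag bit */

  /* Window in bits 63:32, flags in bits 31:0. */
  static uint64_t sketch_notify_vmexit_arg(uint32_t notify_window)
  {
      return ((uint64_t)notify_window << 32) | SKETCH_NOTIFY_VMEXIT_ENABLED;
  }

  /* e.g. kvm_vm_enable_cap(s, KVM_CAP_X86_NOTIFY_VMEXIT, 0,
   *                        sketch_notify_vmexit_arg(x86ms->notify_window)); */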

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 050eedc0c8..1eccbd3deb 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -1379,6 +1379,37 @@ static void machine_set_sgx_epc(Object *obj, Visitor *v, const char *name,
 qapi_free_SgxEPCList(list);
 }
 
+static bool x86_machine_get_notify_vmexit(Object *obj, Error **errp)
+{
+X86MachineState *x86ms = X86_MACHINE(obj);
+
+return x86ms->notify_vmexit;
+}
+
+static void x86_machine_set_notify_vmexit(Object *obj, bool value, Error **errp)
+{
+X86MachineState *x86ms = X86_MACHINE(obj);
+
+x86ms->notify_vmexit = value;
+}
+
+static void x86_machine_get_notify_window(Object *obj, Visitor *v,
+const char *name, void *opaque, Error **errp)
+{
+X86MachineState *x86ms = X86_MACHINE(obj);
+uint32_t notify_window = x86ms->notify_window;
+
+visit_type_uint32(v, name, &notify_window, errp);
+}
+
+static void x86_machine_set_notify_window(Object *obj, Visitor *v,
+   const char *name, void *opaque, Error **errp)
+{
+X86MachineState *x86ms = X86_MACHINE(obj);
+
+visit_type_uint32(v, name, &x86ms->notify_window, errp);
+}
+
 static void x86_machine_initfn(Object *obj)
 {
 X86MachineState *x86ms = X86_MACHINE(obj);
@@ -1392,6 +1423,8 @@ static void x86_machine_initfn(Object *obj)
 x86ms->oem_table_id = g_strndup(ACPI_BUILD_APPNAME8, 8);
 x86ms->bus_lock_ratelimit = 0;
 x86ms->above_4g_mem_start = 4 * GiB;
+x86ms->notify_vmexit = false;
+x86ms->notify_window = 0;
 }
 
 static void x86_machine_class_init(ObjectClass *oc, void *data)
@@ -1461,6 +1494,18 @@ static void x86_machine_class_init(ObjectClass *oc, void *data)
 NULL, NULL);
 object_class_property_set_description(oc, "sgx-epc",
 "SGX EPC device");
+
+object_class_property_add(oc, X86_MACHINE_NOTIFY_WINDOW, "uint32_t",
+  x86_machine_get_notify_window,
+  x86_machine_set_notify_window, NULL, NULL);
+object_class_property_set_description(oc, X86_MACHINE_NOTIFY_WINDOW,
+"Set the notify window required by notify VM exit");
+
+object_class_property_add_bool(oc, X86_MACHINE_NOTIFY_VMEXIT,
+   x86_machine_get_notify_vmexit,
+   x86_machine_set_notify_vmexit);
+object_class_property_set_description(oc, X86_MACHINE_NOTIFY_VMEXIT,
+"Enable notify VM exit");
 }
 
 static const TypeInfo x86_machine_info = {
diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h
index 62fa5774f8..5707329fa7 100644
--- a/include/hw/i386/x86.h
+++ b/include/hw/i386/x86.h
@@ -85,6 +85,9 @@ struct X86MachineState {
  * which means no limitation on the guest's bus locks.
  */
 uint64_t bus_lock_ratelimit;
+
+bool notify_vmexit;
+uint32_t notify_window;
 };
 
 #define X86_MACHINE_SMM  "smm"
@@ -94,6 +97,8 @@ struct X86MachineState {
 #define X86_MACHINE_OEM_ID   "x-oem-id"
 #define X86_MACHINE_OEM_TABLE_ID "x-oem-table-id"
 #define X86_MACHINE_BUS_LOCK_RATELIMIT  "bus-lock-ratelimit"
+#define X86_MACHINE_NOTIFY_VMEXIT 

Re: [PATCH for-7.2 14/21] accel/tcg: Hoist get_page_addr_code out of tb_lookup

2022-08-16 Thread Richard Henderson

On 8/16/22 18:43, Ilya Leoshkevich wrote:

On Fri, 2022-08-12 at 11:07 -0700, Richard Henderson wrote:

We will want to re-use the result of get_page_addr_code
beyond the scope of tb_lookup.

Signed-off-by: Richard Henderson 
---
  accel/tcg/cpu-exec.c | 34 --
  1 file changed, 24 insertions(+), 10 deletions(-)

diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index a9b7053274..889355b341 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -209,13 +209,12 @@ static bool tb_lookup_cmp(const void *p, const
void *d)
  }
  
  /* Might cause an exception, so have a longjmp destination ready */

-static TranslationBlock *tb_lookup(CPUState *cpu, target_ulong pc,
-   target_ulong cs_base,
+static TranslationBlock *tb_lookup(CPUState *cpu, tb_page_addr_t
phys_pc,
+   target_ulong pc, target_ulong
cs_base,
     uint32_t flags, uint32_t cflags)
  {
  CPUArchState *env = cpu->env_ptr;
  TranslationBlock *tb;
-    tb_page_addr_t phys_pc;
  struct tb_desc desc;
  uint32_t jmp_hash, tb_hash;
  
@@ -240,11 +239,8 @@ static TranslationBlock *tb_lookup(CPUState

*cpu, target_ulong pc,
  desc.cflags = cflags;
  desc.trace_vcpu_dstate = *cpu->trace_dstate;
  desc.pc = pc;
-    phys_pc = get_page_addr_code(desc.env, pc);
-    if (phys_pc == -1) {
-    return NULL;
-    }
  desc.phys_page1 = phys_pc & TARGET_PAGE_MASK;
+
  tb_hash = tb_hash_func(phys_pc, pc, flags, cflags, *cpu->trace_dstate);

  tb = qht_lookup_custom(&tb_ctx.htable, &desc, tb_hash,
tb_lookup_cmp);
  if (tb == NULL) {
@@ -371,6 +367,7 @@ const void *HELPER(lookup_tb_ptr)(CPUArchState
*env)
  TranslationBlock *tb;
  target_ulong cs_base, pc;
  uint32_t flags, cflags;
+    tb_page_addr_t phys_pc;
  
  cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
  
@@ -379,7 +376,12 @@ const void *HELPER(lookup_tb_ptr)(CPUArchState

*env)
  cpu_loop_exit(cpu);
  }
  
-    tb = tb_lookup(cpu, pc, cs_base, flags, cflags);

+    phys_pc = get_page_addr_code(env, pc);
+    if (phys_pc == -1) {
+    return tcg_code_gen_epilogue;
+    }
+
+    tb = tb_lookup(cpu, phys_pc, pc, cs_base, flags, cflags);
  if (tb == NULL) {
  return tcg_code_gen_epilogue;
  }
@@ -482,6 +484,7 @@ void cpu_exec_step_atomic(CPUState *cpu)
  TranslationBlock *tb;
  target_ulong cs_base, pc;
  uint32_t flags, cflags;
+    tb_page_addr_t phys_pc;
  int tb_exit;
  
  if (sigsetjmp(cpu->jmp_env, 0) == 0) {

@@ -504,7 +507,12 @@ void cpu_exec_step_atomic(CPUState *cpu)
   * Any breakpoint for this insn will have been recognized
earlier.
   */
  
-    tb = tb_lookup(cpu, pc, cs_base, flags, cflags);

+    phys_pc = get_page_addr_code(env, pc);
+    if (phys_pc == -1) {
+    tb = NULL;
+    } else {
+    tb = tb_lookup(cpu, phys_pc, pc, cs_base, flags,
cflags);
+    }
  if (tb == NULL) {
  mmap_lock();
  tb = tb_gen_code(cpu, pc, cs_base, flags, cflags);
@@ -949,6 +957,7 @@ int cpu_exec(CPUState *cpu)
  TranslationBlock *tb;
  target_ulong cs_base, pc;
  uint32_t flags, cflags;
+    tb_page_addr_t phys_pc;
  
  cpu_get_tb_cpu_state(cpu->env_ptr, &pc, &cs_base, &flags);
  
@@ -970,7 +979,12 @@ int cpu_exec(CPUState *cpu)

  break;
  }
  
-    tb = tb_lookup(cpu, pc, cs_base, flags, cflags);

+    phys_pc = get_page_addr_code(cpu->env_ptr, pc);
+    if (phys_pc == -1) {
+    tb = NULL;
+    } else {
+    tb = tb_lookup(cpu, phys_pc, pc, cs_base, flags,
cflags);
+    }
  if (tb == NULL) {
  mmap_lock();
  tb = tb_gen_code(cpu, pc, cs_base, flags, cflags);


This patch did not make it into v2, but having get_page_addr_code()
before tb_lookup() in helper_lookup_tb_ptr() helped raise the exception
when trying to execute a no-longer-executable TB.

Was it dropped for performance reasons?


Ah, yes.  I dropped it because I ran into some regression, and started minimizing the 
tree.  Because of the extra lock that needed to be held (next patch, also dropped), I 
couldn't prove this actually helped.


I think the bit that's causing your user-only failure at the moment is the jump cache. 
This patch hoisted the page table check before the jmp_cache.  For system, cputlb.c takes 
care of flushing the jump cache with page table changes; we still don't have anything in 
user-only that takes care of that.



r~




[PATCH] esp: Handle CMD_BUSRESET by resetting the SCSI bus

2022-08-16 Thread John Millikin
Per investigation on the linked ticket, SunOS issues a SCSI bus reset
to the ESP as part of its boot sequence. If this ESP command doesn't
cause devices to assert sense flag UNIT ATTENTION, SunOS will consider
the CD-ROM device to be non-compliant with Common Command Set (CCS).
In this condition, the SunOS installer's early userspace doesn't set
the installation source location to sr0 and the miniroot copy fails.

Suggested-by: Bill Paul 
Buglink: https://gitlab.com/qemu-project/qemu/-/issues/1127
---
 hw/scsi/esp.c | 6 ++
 1 file changed, 6 insertions(+)

With this patch in place, booting the SunOS installation contains the
new output line:

  sr0 at esp0 target 6 lun 0

and the early userspace successfully writes the CD-ROM's device name
to the ramdisk, where the installer script expects it:

  # dd if=/dev/rd0a of=/tmp.bin bs=1 count=3
  3+0 records in
  3+0 records out
  # cat /tmp.bin; echo ''
  sr0
  #

diff --git a/hw/scsi/esp.c b/hw/scsi/esp.c
index 2d3c649567..c799c19bd4 100644
--- a/hw/scsi/esp.c
+++ b/hw/scsi/esp.c
@@ -939,6 +939,11 @@ static void esp_soft_reset(ESPState *s)
 esp_hard_reset(s);
 }
 
+static void esp_bus_reset(ESPState *s)
+{
+qbus_reset_all(BUS(&s->bus));
+}
+
 static void parent_esp_reset(ESPState *s, int irq, int level)
 {
 if (level) {
@@ -1067,6 +1072,7 @@ void esp_reg_write(ESPState *s, uint32_t saddr, uint64_t val)
 break;
 case CMD_BUSRESET:
 trace_esp_mem_writeb_cmd_bus_reset(val);
+esp_bus_reset(s);
 if (!(s->wregs[ESP_CFG1] & CFG1_RESREPT)) {
 s->rregs[ESP_RINTR] |= INTR_RST;
 esp_raise_irq(s);
-- 
2.25.1




Re: [PATCH for-7.2 14/21] accel/tcg: Hoist get_page_addr_code out of tb_lookup

2022-08-16 Thread Ilya Leoshkevich
On Fri, 2022-08-12 at 11:07 -0700, Richard Henderson wrote:
> We will want to re-use the result of get_page_addr_code
> beyond the scope of tb_lookup.
> 
> Signed-off-by: Richard Henderson 
> ---
>  accel/tcg/cpu-exec.c | 34 --
>  1 file changed, 24 insertions(+), 10 deletions(-)
> 
> diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
> index a9b7053274..889355b341 100644
> --- a/accel/tcg/cpu-exec.c
> +++ b/accel/tcg/cpu-exec.c
> @@ -209,13 +209,12 @@ static bool tb_lookup_cmp(const void *p, const
> void *d)
>  }
>  
>  /* Might cause an exception, so have a longjmp destination ready */
> -static TranslationBlock *tb_lookup(CPUState *cpu, target_ulong pc,
> -   target_ulong cs_base,
> +static TranslationBlock *tb_lookup(CPUState *cpu, tb_page_addr_t
> phys_pc,
> +   target_ulong pc, target_ulong
> cs_base,
>     uint32_t flags, uint32_t cflags)
>  {
>  CPUArchState *env = cpu->env_ptr;
>  TranslationBlock *tb;
> -    tb_page_addr_t phys_pc;
>  struct tb_desc desc;
>  uint32_t jmp_hash, tb_hash;
>  
> @@ -240,11 +239,8 @@ static TranslationBlock *tb_lookup(CPUState
> *cpu, target_ulong pc,
>  desc.cflags = cflags;
>  desc.trace_vcpu_dstate = *cpu->trace_dstate;
>  desc.pc = pc;
> -    phys_pc = get_page_addr_code(desc.env, pc);
> -    if (phys_pc == -1) {
> -    return NULL;
> -    }
>  desc.phys_page1 = phys_pc & TARGET_PAGE_MASK;
> +
>  tb_hash = tb_hash_func(phys_pc, pc, flags, cflags, *cpu->trace_dstate);
>  tb = qht_lookup_custom(&tb_ctx.htable, &desc, tb_hash,
> tb_lookup_cmp);
>  if (tb == NULL) {
> @@ -371,6 +367,7 @@ const void *HELPER(lookup_tb_ptr)(CPUArchState
> *env)
>  TranslationBlock *tb;
>  target_ulong cs_base, pc;
>  uint32_t flags, cflags;
> +    tb_page_addr_t phys_pc;
>  
>  cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
>  
> @@ -379,7 +376,12 @@ const void *HELPER(lookup_tb_ptr)(CPUArchState
> *env)
>  cpu_loop_exit(cpu);
>  }
>  
> -    tb = tb_lookup(cpu, pc, cs_base, flags, cflags);
> +    phys_pc = get_page_addr_code(env, pc);
> +    if (phys_pc == -1) {
> +    return tcg_code_gen_epilogue;
> +    }
> +
> +    tb = tb_lookup(cpu, phys_pc, pc, cs_base, flags, cflags);
>  if (tb == NULL) {
>  return tcg_code_gen_epilogue;
>  }
> @@ -482,6 +484,7 @@ void cpu_exec_step_atomic(CPUState *cpu)
>  TranslationBlock *tb;
>  target_ulong cs_base, pc;
>  uint32_t flags, cflags;
> +    tb_page_addr_t phys_pc;
>  int tb_exit;
>  
>  if (sigsetjmp(cpu->jmp_env, 0) == 0) {
> @@ -504,7 +507,12 @@ void cpu_exec_step_atomic(CPUState *cpu)
>   * Any breakpoint for this insn will have been recognized
> earlier.
>   */
>  
> -    tb = tb_lookup(cpu, pc, cs_base, flags, cflags);
> +    phys_pc = get_page_addr_code(env, pc);
> +    if (phys_pc == -1) {
> +    tb = NULL;
> +    } else {
> +    tb = tb_lookup(cpu, phys_pc, pc, cs_base, flags,
> cflags);
> +    }
>  if (tb == NULL) {
>  mmap_lock();
>  tb = tb_gen_code(cpu, pc, cs_base, flags, cflags);
> @@ -949,6 +957,7 @@ int cpu_exec(CPUState *cpu)
>  TranslationBlock *tb;
>  target_ulong cs_base, pc;
>  uint32_t flags, cflags;
> +    tb_page_addr_t phys_pc;
>  
>  cpu_get_tb_cpu_state(cpu->env_ptr, &pc, &cs_base, &flags);
>  
> @@ -970,7 +979,12 @@ int cpu_exec(CPUState *cpu)
>  break;
>  }
>  
> -    tb = tb_lookup(cpu, pc, cs_base, flags, cflags);
> +    phys_pc = get_page_addr_code(cpu->env_ptr, pc);
> +    if (phys_pc == -1) {
> +    tb = NULL;
> +    } else {
> +    tb = tb_lookup(cpu, phys_pc, pc, cs_base, flags,
> cflags);
> +    }
>  if (tb == NULL) {
>  mmap_lock();
>  tb = tb_gen_code(cpu, pc, cs_base, flags, cflags);

This patch did not make it into v2, but having get_page_addr_code()
before tb_lookup() in helper_lookup_tb_ptr() helped raise the exception
when trying to execute a no-longer-executable TB.

Was it dropped for performance reasons?



[PATCH v13 6/6] target/riscv: Remove additional priv version check for mcountinhibit

2022-08-16 Thread Atish Patra
With .min_priv_version, an additional priv version check is unnecessary
for the mcountinhibit read/write functions.

Reviewed-by: Heiko Stuebner 
Tested-by: Heiko Stuebner 
Reviewed-by: Alistair Francis 
Signed-off-by: Atish Patra 
---
 target/riscv/csr.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index 07b8b4eb1768..2dcd4e5b2d40 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -1640,10 +1640,6 @@ static RISCVException write_mtvec(CPURISCVState *env, int csrno,
 static RISCVException read_mcountinhibit(CPURISCVState *env, int csrno,
  target_ulong *val)
 {
-if (env->priv_ver < PRIV_VERSION_1_11_0) {
-return RISCV_EXCP_ILLEGAL_INST;
-}
-
 *val = env->mcountinhibit;
 return RISCV_EXCP_NONE;
 }
@@ -1654,10 +1650,6 @@ static RISCVException write_mcountinhibit(CPURISCVState *env, int csrno,
 int cidx;
 PMUCTRState *counter;
 
-if (env->priv_ver < PRIV_VERSION_1_11_0) {
-return RISCV_EXCP_ILLEGAL_INST;
-}
-
 env->mcountinhibit = val;
 
 /* Check if any other counter is also monitoring cycles/instructions */
-- 
2.25.1




Re: [PATCH v12 0/6] Improve PMU support

2022-08-16 Thread Atish Kumar Patra
On Sun, Aug 14, 2022 at 5:02 PM Alistair Francis 
wrote:

> On Fri, Aug 12, 2022 at 12:05 PM Atish Patra 
> wrote:
> >
> > On Tue, Aug 2, 2022 at 4:33 PM Atish Patra  wrote:
> > >
> > > The latest version of the SBI specification includes a Performance
> Monitoring
> > > Unit(PMU) extension[1] which allows the supervisor to
> start/stop/configure
> > > various PMU events. The Sscofpmf ('Ss' for Privileged arch and
> Supervisor-level
> > > extensions, and 'cofpmf' for Count OverFlow and Privilege Mode
> Filtering)
> > > extension[2] allows the perf like tool to handle overflow interrupts
> and
> > > filtering support.
> > >
> > > This series implements remaining PMU infrastructure to support
> > > PMU in virt machine. The first seven patches from the original series
> > > have been merged already.
> > >
> > > This will allow us to add any PMU events in future.
> > > Currently, this series enables the following pmu events.
> > > 1. cycle count
> > > 2. instruction count
> > > 3. DTLB load/store miss
> > > 4. ITLB prefetch miss
> > >
> > > The first two are computed using host ticks while last three are
> counted during
> > > cpu_tlb_fill. We can do both sampling and count from guest userspace.
> > > This series has been tested on both RV64 and RV32. Both Linux[3] and
> Opensbi[4]
> > > patches are required to get the perf working.
> > >
> > > Here is an output of perf stat/report while running hackbench with
> latest
> > > OpenSBI & Linux kernel.
> > >
> > > Perf stat:
> > > ==
> > > [root@fedora-riscv ~]# perf stat -e cycles -e instructions -e
> dTLB-load-misses -e dTLB-store-misses -e iTLB-load-misses \
> > > > perf bench sched messaging -g 1 -l 10
> > > # Running 'sched/messaging' benchmark:
> > > # 20 sender and receiver processes per group
> > > # 1 groups == 40 processes run
> > >
> > >  Total time: 0.265 [sec]
> > >
> > >  Performance counter stats for 'perf bench sched messaging -g 1 -l 10':
> > >
> > >  4,167,825,362  cycles
> > >  4,166,609,256  instructions  #1.00  insn per
> cycle
> > >  3,092,026  dTLB-load-misses
> > >258,280  dTLB-store-misses
> > >  2,068,966  iTLB-load-misses
> > >
> > >0.585791767 seconds time elapsed
> > >
> > >0.373802000 seconds user
> > >1.042359000 seconds sys
> > >
> > > Perf record:
> > > 
> > > [root@fedora-riscv ~]# perf record -e cycles -e instructions \
> > > > -e dTLB-load-misses -e dTLB-store-misses -e iTLB-load-misses -c
> 1 \
> > > > perf bench sched messaging -g 1 -l 10
> > > # Running 'sched/messaging' benchmark:
> > > # 20 sender and receiver processes per group
> > > # 1 groups == 40 processes run
> > >
> > >  Total time: 1.397 [sec]
> > > [ perf record: Woken up 10 times to write data ]
> > > Check IO/CPU overload!
> > > [ perf record: Captured and wrote 8.211 MB perf.data (214486 samples) ]
> > >
> > > [root@fedora-riscv riscv]# perf report
> > > Available samples
> > > 107K cycles
> ◆
> > > 107K instructions
> ▒
> > > 250 dTLB-load-misses
>  ▒
> > > 13 dTLB-store-misses
>  ▒
> > > 172 iTLB-load-misses
> > > ..
> > >
> > > Changes from v11->v12:
> > > 1. Rebased on top of the apply-next.
> > > 2. Aligned the write function & .min_priv to the previous line.
> > > 3. Fixed the FDT generations for multi-socket scenario.
> > > 4. Dropped interrupt property from the DT.
> > > 5. Generate illegal instruction fault instead of virtual instruction
> fault
> > >for VS/VU access while mcounteren is not set.
> > >
> > > Changes from v10->v11:
> > > 1. Rebased on top of the master where first 7 patches were already
> merged.
> > > 2. Removed unnecessary additional check in ctr predicate function.
> > > 3. Removed unnecessary priv version checks in mcountinhibit read/write.
> > > 4. Added Heiko's reviewed-by/tested-by tags.
> > >
> > > Changes from v8->v9:
> > > 1. Added the write_done flags to the vmstate.
> > > 2. Fixed the hpmcounter read access from M-mode.
> > >
> > > Changes from v7->v8:
> > > 1. Removed ordering constraints for mhpmcounter & mhpmevent.
> > >
> > > Changes from v6->v7:
> > > 1. Fixed all the compilation errors for the usermode.
> > >
> > > Changes from v5->v6:
> > > 1. Fixed compilation issue with PATCH 1.
> > > 2. Addressed other comments.
> > >
> > > Changes from v4->v5:
> > > 1. Rebased on top of the -next with following patches.
> > >- isa extension
> > >- priv 1.12 spec
> > > 2. Addressed all the comments on v4
> > > 3. Removed additional isa-ext DT node in favor of riscv,isa string
> update
> > >
> > > Changes from v3->v4:
> > > 1. Removed the dummy events from pmu DT node.
> > > 2. Fixed pmu_avail_counters mask generation.
> > > 3. Added a patch to simplify the predicate function for counters.
> > >
> > > Changes from v2->v3:
> > > 1. Addressed all the comments on PATCH1-4.
> > > 2. Split patch1 into two separate patches.
> > > 3. Added explicit comments to explain 

[PATCH v13 4/6] hw/riscv: virt: Add PMU DT node to the device tree

2022-08-16 Thread Atish Patra
The QEMU virt machine can support a few cache events and cycle/instret counters.
It also supports counter overflow for these events.

Add a DT node so that OpenSBI/Linux kernel is aware of the virt machine
capabilities. There are some dummy nodes added for testing as well.

Acked-by: Alistair Francis 
Signed-off-by: Atish Patra 
Signed-off-by: Atish Patra 
---
 hw/riscv/virt.c| 16 +
 target/riscv/pmu.c | 57 ++
 target/riscv/pmu.h |  1 +
 3 files changed, 74 insertions(+)
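
For reference, a standalone sketch of the SBI event_idx encoding that the
riscv,event-to-mhpmcounters entries below follow; the field layout is taken
from the comment in the pmu.c hunk, and the perf-style cache/op/result ids
are my reading of the SBI PMU spec, so treat it as illustrative:

  #include <stdint.h>
  #include <stdio.h>

  /* event_idx[19:16] = type; event_idx[15:0] = code.
   * For hardware cache events (type 1):
   *   code[15:3] = cache_id, code[2:1] = op_id, code[0] = result_id. */
  static uint32_t sbi_cache_event_idx(uint32_t cache_id, uint32_t op_id,
                                      uint32_t result_id)
  {
      return (1u << 16) | (cache_id << 3) | (op_id << 1) | result_id;
  }

  int main(void)
  {
      /* DTLB (cache_id 3), read (op 0), miss (result 1) reproduces the
       * 0x00010019 value used for the DTLB read-miss entry in the patch. */
      printf("0x%08x\n", sbi_cache_event_idx(3, 0, 1));
      return 0;
  }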

diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
index c1e8e0fcaf22..e779d399ae7d 100644
--- a/hw/riscv/virt.c
+++ b/hw/riscv/virt.c
@@ -30,6 +30,7 @@
 #include "hw/char/serial.h"
 #include "target/riscv/cpu.h"
 #include "hw/core/sysbus-fdt.h"
+#include "target/riscv/pmu.h"
 #include "hw/riscv/riscv_hart.h"
 #include "hw/riscv/virt.h"
 #include "hw/riscv/boot.h"
@@ -715,6 +716,20 @@ static void create_fdt_socket_aplic(RISCVVirtState *s,
 aplic_phandles[socket] = aplic_s_phandle;
 }
 
+static void create_fdt_pmu(RISCVVirtState *s)
+{
+char *pmu_name;
+MachineState *mc = MACHINE(s);
+RISCVCPU hart = s->soc[0].harts[0];
+
+pmu_name = g_strdup_printf("/soc/pmu");
+qemu_fdt_add_subnode(mc->fdt, pmu_name);
+qemu_fdt_setprop_string(mc->fdt, pmu_name, "compatible", "riscv,pmu");
+riscv_pmu_generate_fdt_node(mc->fdt, hart.cfg.pmu_num, pmu_name);
+
+g_free(pmu_name);
+}
+
 static void create_fdt_sockets(RISCVVirtState *s, const MemMapEntry *memmap,
bool is_32_bit, uint32_t *phandle,
uint32_t *irq_mmio_phandle,
@@ -1043,6 +1058,7 @@ static void create_fdt(RISCVVirtState *s, const MemMapEntry *memmap,
 
 create_fdt_flash(s, memmap);
 create_fdt_fw_cfg(s, memmap);
+create_fdt_pmu(s);
 
 update_bootargs:
 if (cmdline && *cmdline) {
diff --git a/target/riscv/pmu.c b/target/riscv/pmu.c
index f380fd10f1a6..ad33b81b2ea0 100644
--- a/target/riscv/pmu.c
+++ b/target/riscv/pmu.c
@@ -20,11 +20,68 @@
 #include "cpu.h"
 #include "pmu.h"
 #include "sysemu/cpu-timers.h"
+#include "sysemu/device_tree.h"
 
 #define RISCV_TIMEBASE_FREQ 10 /* 1Ghz */
 #define MAKE_32BIT_MASK(shift, length) \
 (((uint32_t)(~0UL) >> (32 - (length))) << (shift))
 
+/*
+ * To keep it simple, any event can be mapped to any programmable counters in
+ * QEMU. The generic cycle & instruction count events can also be monitored
+ * using programmable counters. In that case, mcycle & minstret must continue
+ * to provide the correct value as well. Heterogeneous PMU per hart is not
+ * supported yet. Thus, number of counters are same across all harts.
+ */
+void riscv_pmu_generate_fdt_node(void *fdt, int num_ctrs, char *pmu_name)
+{
+uint32_t fdt_event_ctr_map[20] = {};
+uint32_t cmask;
+
+/* All the programmable counters can map to any event */
+cmask = MAKE_32BIT_MASK(3, num_ctrs);
+
+   /*
+* The event encoding is specified in the SBI specification
+* Event idx is a 20bits wide number encoded as follows:
+* event_idx[19:16] = type
+* event_idx[15:0] = code
+* The code field in cache events are encoded as follows:
+* event_idx.code[15:3] = cache_id
+* event_idx.code[2:1] = op_id
+* event_idx.code[0:0] = result_id
+*/
+
+   /* SBI_PMU_HW_CPU_CYCLES: 0x01 : type(0x00) */
+   fdt_event_ctr_map[0] = cpu_to_be32(0x0001);
+   fdt_event_ctr_map[1] = cpu_to_be32(0x0001);
+   fdt_event_ctr_map[2] = cpu_to_be32(cmask | 1 << 0);
+
+   /* SBI_PMU_HW_INSTRUCTIONS: 0x02 : type(0x00) */
+   fdt_event_ctr_map[3] = cpu_to_be32(0x0002);
+   fdt_event_ctr_map[4] = cpu_to_be32(0x0002);
+   fdt_event_ctr_map[5] = cpu_to_be32(cmask | 1 << 2);
+
+   /* SBI_PMU_HW_CACHE_DTLB : 0x03 READ : 0x00 MISS : 0x00 type(0x01) */
+   fdt_event_ctr_map[6] = cpu_to_be32(0x00010019);
+   fdt_event_ctr_map[7] = cpu_to_be32(0x00010019);
+   fdt_event_ctr_map[8] = cpu_to_be32(cmask);
+
+   /* SBI_PMU_HW_CACHE_DTLB : 0x03 WRITE : 0x01 MISS : 0x00 type(0x01) */
+   fdt_event_ctr_map[9] = cpu_to_be32(0x0001001B);
+   fdt_event_ctr_map[10] = cpu_to_be32(0x0001001B);
+   fdt_event_ctr_map[11] = cpu_to_be32(cmask);
+
+   /* SBI_PMU_HW_CACHE_ITLB : 0x04 READ : 0x00 MISS : 0x00 type(0x01) */
+   fdt_event_ctr_map[12] = cpu_to_be32(0x00010021);
+   fdt_event_ctr_map[13] = cpu_to_be32(0x00010021);
+   fdt_event_ctr_map[14] = cpu_to_be32(cmask);
+
+   /* This is an OpenSBI-specific DT property documented in the OpenSBI docs */
+   qemu_fdt_setprop(fdt, pmu_name, "riscv,event-to-mhpmcounters",
+fdt_event_ctr_map, sizeof(fdt_event_ctr_map));
+}
+
 static bool riscv_pmu_counter_valid(RISCVCPU *cpu, uint32_t ctr_idx)
 {
 if (ctr_idx < 3 || ctr_idx >= RV_MAX_MHPMCOUNTERS ||
diff --git a/target/riscv/pmu.h b/target/riscv/pmu.h
index 036653627f78..3004ce37b636 100644
--- a/target/riscv/pmu.h
+++ b/target/riscv/pmu.h
@@ -31,5 +31,6 @@ int riscv_pmu_init(RISCVCPU *cpu, int num_counters);
 

[PATCH v13 1/6] target/riscv: Add sscofpmf extension support

2022-08-16 Thread Atish Patra
The Sscofpmf ('Ss' for Privileged arch and Supervisor-level extensions,
and 'cofpmf' for Count OverFlow and Privilege Mode Filtering)
extension allows perf to handle overflow interrupts and filtering
support. This patch provides a framework for programmable
counters to leverage the extension. As the extension doesn't have any
provision for an overflow bit for fixed counters, the fixed events
can also be monitored using programmable counters. The underlying
counters for cycle and instruction counts are always running. Thus,
a separate timer device is programmed to handle the overflow.
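
As a rough sketch of that idea (the helper name and the exact arithmetic
below are invented for illustration and are not part of this patch; the
real logic lives in the pmu.c timer callback): with the 1 GHz timebase,
one counter tick corresponds to one nanosecond, so the distance to the
overflow boundary maps directly onto a QEMU_CLOCK_VIRTUAL deadline.

    /* Illustrative sketch only, not taken from the patch. */
    static void pmu_arm_overflow_timer_sketch(RISCVCPU *cpu,
                                              uint64_t ctr_val, bool rv32)
    {
        uint64_t limit = rv32 ? UINT32_MAX : UINT64_MAX;
        uint64_t remaining_ticks = limit - ctr_val + 1; /* ticks to overflow */

        /* 1 tick == 1 ns at 1 GHz: fire when the running counter would wrap. */
        timer_mod(cpu->pmu_timer,
                  qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + remaining_ticks);
    }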

Tested-by: Heiko Stuebner 
Reviewed-by: Alistair Francis 
Signed-off-by: Atish Patra 
Signed-off-by: Atish Patra 
---
 target/riscv/cpu.c  |  12 ++
 target/riscv/cpu.h  |  25 +++
 target/riscv/cpu_bits.h |  55 +++
 target/riscv/csr.c  | 166 ++-
 target/riscv/machine.c  |   1 +
 target/riscv/pmu.c  | 357 +++-
 target/riscv/pmu.h  |   7 +
 7 files changed, 612 insertions(+), 11 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 2498b93105fd..d3fbaae0 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -22,6 +22,7 @@
 #include "qemu/ctype.h"
 #include "qemu/log.h"
 #include "cpu.h"
+#include "pmu.h"
 #include "internals.h"
 #include "time_helper.h"
 #include "exec/exec-all.h"
@@ -100,6 +101,7 @@ static const struct isa_ext_data isa_edata_arr[] = {
 ISA_EXT_DATA_ENTRY(zve64f, true, PRIV_VERSION_1_12_0, ext_zve64f),
 ISA_EXT_DATA_ENTRY(zhinx, true, PRIV_VERSION_1_12_0, ext_zhinx),
 ISA_EXT_DATA_ENTRY(zhinxmin, true, PRIV_VERSION_1_12_0, ext_zhinxmin),
+ISA_EXT_DATA_ENTRY(sscofpmf, true, PRIV_VERSION_1_12_0, ext_sscofpmf),
 ISA_EXT_DATA_ENTRY(sstc, true, PRIV_VERSION_1_12_0, ext_sstc),
 ISA_EXT_DATA_ENTRY(svinval, true, PRIV_VERSION_1_12_0, ext_svinval),
 ISA_EXT_DATA_ENTRY(svnapot, true, PRIV_VERSION_1_12_0, ext_svnapot),
@@ -890,6 +892,15 @@ static void riscv_cpu_realize(DeviceState *dev, Error 
**errp)
 set_misa(env, env->misa_mxl, ext);
 }
 
+#ifndef CONFIG_USER_ONLY
+if (cpu->cfg.pmu_num) {
+if (!riscv_pmu_init(cpu, cpu->cfg.pmu_num) && cpu->cfg.ext_sscofpmf) {
+cpu->pmu_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
+  riscv_pmu_timer_cb, cpu);
+}
+ }
+#endif
+
 riscv_cpu_register_gdb_regs_for_features(cs);
 
 qemu_init_vcpu(cs);
@@ -994,6 +1005,7 @@ static Property riscv_cpu_extensions[] = {
 DEFINE_PROP_BOOL("v", RISCVCPU, cfg.ext_v, false),
 DEFINE_PROP_BOOL("h", RISCVCPU, cfg.ext_h, true),
 DEFINE_PROP_UINT8("pmu-num", RISCVCPU, cfg.pmu_num, 16),
+DEFINE_PROP_BOOL("sscofpmf", RISCVCPU, cfg.ext_sscofpmf, false),
 DEFINE_PROP_BOOL("Zifencei", RISCVCPU, cfg.ext_ifencei, true),
 DEFINE_PROP_BOOL("Zicsr", RISCVCPU, cfg.ext_icsr, true),
 DEFINE_PROP_BOOL("Zihintpause", RISCVCPU, cfg.ext_zihintpause, true),
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 1fd382b2717f..42edfa4558d0 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -137,6 +137,8 @@ typedef struct PMUCTRState {
 /* Snapshort value of a counter in RV32 */
 target_ulong mhpmcounterh_prev;
 bool started;
+/* Value beyond UINT32_MAX/UINT64_MAX before overflow interrupt trigger */
+target_ulong irq_overflow_left;
 } PMUCTRState;
 
 struct CPUArchState {
@@ -302,6 +304,9 @@ struct CPUArchState {
 /* PMU event selector configured values. First three are unused*/
 target_ulong mhpmevent_val[RV_MAX_MHPMEVENTS];
 
+/* PMU event selector configured values for RV32*/
+target_ulong mhpmeventh_val[RV_MAX_MHPMEVENTS];
+
 target_ulong sscratch;
 target_ulong mscratch;
 
@@ -446,6 +451,7 @@ struct RISCVCPUConfig {
 bool ext_zve32f;
 bool ext_zve64f;
 bool ext_zmmul;
+bool ext_sscofpmf;
 bool rvv_ta_all_1s;
 bool rvv_ma_all_1s;
 
@@ -493,6 +499,12 @@ struct ArchCPU {
 
 /* Configuration Settings */
 RISCVCPUConfig cfg;
+
+QEMUTimer *pmu_timer;
+/* A bitmask of Available programmable counters */
+uint32_t pmu_avail_ctrs;
+/* Mapping of events to counters */
+GHashTable *pmu_event_ctr_map;
 };
 
 static inline int riscv_has_ext(CPURISCVState *env, target_ulong ext)
@@ -753,6 +765,19 @@ enum {
 CSR_TABLE_SIZE = 0x1000
 };
 
+/**
+ * The event ids are encoded based on the encoding specified in the
+ * SBI specification v0.3
+ */
+
+enum riscv_pmu_event_idx {
+RISCV_PMU_EVENT_HW_CPU_CYCLES = 0x01,
+RISCV_PMU_EVENT_HW_INSTRUCTIONS = 0x02,
+RISCV_PMU_EVENT_CACHE_DTLB_READ_MISS = 0x10019,
+RISCV_PMU_EVENT_CACHE_DTLB_WRITE_MISS = 0x1001B,
+RISCV_PMU_EVENT_CACHE_ITLB_PREFETCH_MISS = 0x10021,
+};
+
 /* CSR function table */
 extern riscv_csr_operations csr_ops[CSR_TABLE_SIZE];
 
diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h
index 095dab19f512..7be12cac2ee6 100644
--- 

[PATCH v13 3/6] target/riscv: Add few cache related PMU events

2022-08-16 Thread Atish Patra
From: Atish Patra 

QEMU can monitor the following cache-related PMU events through the
tlb_fill functions:

1. DTLB load/store miss
2. ITLB prefetch miss

Increment the PMU counter in the tlb_fill function.

Reviewed-by: Alistair Francis 
Tested-by: Heiko Stuebner 
Signed-off-by: Atish Patra 
Signed-off-by: Atish Patra 
---
 target/riscv/cpu_helper.c | 25 +
 1 file changed, 25 insertions(+)

diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index 1e4faa84e839..81948b37dd9a 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -21,11 +21,13 @@
 #include "qemu/log.h"
 #include "qemu/main-loop.h"
 #include "cpu.h"
+#include "pmu.h"
 #include "exec/exec-all.h"
 #include "instmap.h"
 #include "tcg/tcg-op.h"
 #include "trace.h"
 #include "semihosting/common-semi.h"
+#include "cpu_bits.h"
 
 int riscv_cpu_mmu_index(CPURISCVState *env, bool ifetch)
 {
@@ -1188,6 +1190,28 @@ void riscv_cpu_do_unaligned_access(CPUState *cs, vaddr 
addr,
 cpu_loop_exit_restore(cs, retaddr);
 }
 
+
+static void pmu_tlb_fill_incr_ctr(RISCVCPU *cpu, MMUAccessType access_type)
+{
+enum riscv_pmu_event_idx pmu_event_type;
+
+switch (access_type) {
+case MMU_INST_FETCH:
+pmu_event_type = RISCV_PMU_EVENT_CACHE_ITLB_PREFETCH_MISS;
+break;
+case MMU_DATA_LOAD:
+pmu_event_type = RISCV_PMU_EVENT_CACHE_DTLB_READ_MISS;
+break;
+case MMU_DATA_STORE:
+pmu_event_type = RISCV_PMU_EVENT_CACHE_DTLB_WRITE_MISS;
+break;
+default:
+return;
+}
+
+riscv_pmu_incr_ctr(cpu, pmu_event_type);
+}
+
 bool riscv_cpu_tlb_fill(CPUState *cs, vaddr address, int size,
 MMUAccessType access_type, int mmu_idx,
 bool probe, uintptr_t retaddr)
@@ -1286,6 +1310,7 @@ bool riscv_cpu_tlb_fill(CPUState *cs, vaddr address, int 
size,
 }
 }
 } else {
+pmu_tlb_fill_incr_ctr(cpu, access_type);
 /* Single stage lookup */
 ret = get_physical_address(env, , , address, NULL,
access_type, mmu_idx, true, false, false);
-- 
2.25.1




[PATCH v13 0/6] Improve PMU support

2022-08-16 Thread Atish Patra
The latest version of the SBI specification includes a Performance Monitoring
Unit (PMU) extension[1] which allows the supervisor to start/stop/configure
various PMU events. The Sscofpmf ('Ss' for Privileged arch and Supervisor-level
extensions, and 'cofpmf' for Count OverFlow and Privilege Mode Filtering)
extension[2] allows perf-like tools to handle overflow interrupts and
filtering.

This series implements the remaining PMU infrastructure needed to support
the PMU in the virt machine. The first seven patches from the original series
have been merged already.

This will allow us to add further PMU events in the future.
Currently, this series enables the following PMU events:
1. cycle count
2. instruction count
3. DTLB load/store miss
4. ITLB prefetch miss

The first two are computed using host ticks while the last three are counted
during cpu_tlb_fill. Both sampling and counting work from guest userspace.
This series has been tested on both RV64 and RV32. Both the Linux[3] and
OpenSBI[4] patches are required to get perf working.

Here is an output of perf stat/report while running hackbench with latest
OpenSBI & Linux kernel.

Perf stat:
==
[root@fedora-riscv ~]# perf stat -e cycles -e instructions -e dTLB-load-misses 
-e dTLB-store-misses -e iTLB-load-misses \
> perf bench sched messaging -g 1 -l 10
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 1 groups == 40 processes run

 Total time: 0.265 [sec]

 Performance counter stats for 'perf bench sched messaging -g 1 -l 10':

 4,167,825,362  cycles  

 4,166,609,256  instructions  #1.00  insn per cycle 

 3,092,026  dTLB-load-misses

   258,280  dTLB-store-misses   

 2,068,966  iTLB-load-misses


   0.585791767 seconds time elapsed

   0.373802000 seconds user
   1.042359000 seconds sys

Perf record:

[root@fedora-riscv ~]# perf record -e cycles -e instructions \
> -e dTLB-load-misses -e dTLB-store-misses -e iTLB-load-misses -c 1 \
> perf bench sched messaging -g 1 -l 10
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 1 groups == 40 processes run

 Total time: 1.397 [sec]
[ perf record: Woken up 10 times to write data ]
Check IO/CPU overload!
[ perf record: Captured and wrote 8.211 MB perf.data (214486 samples) ]

[root@fedora-riscv riscv]# perf report
Available samples   
107K cycles◆
107K instructions  ▒
250 dTLB-load-misses   ▒
13 dTLB-store-misses   ▒
172 iTLB-load-misses  
..

Changes from v12->v13:
1. Rebased on top of the apply-next.
2. Addressed comments about space & comment block.

Changes from v11->v12:
1. Rebased on top of the apply-next.
2. Aligned the write function & .min_priv to the previous line.
3. Fixed the FDT generations for multi-socket scenario.
4. Dropped interrupt property from the DT.
5. Generate illegal instruction fault instead of virtual instruction fault
   for VS/VU access while mcounteren is not set.

Changes from v10->v11:
1. Rebased on top of the master where first 7 patches were already merged.
2. Removed unnecessary additional check in ctr predicate function.
3. Removed unnecessary priv version checks in mcountinhibit read/write. 
4. Added Heiko's reviewed-by/tested-by tags.

Changes from v8->v9:
1. Added the write_done flags to the vmstate.
2. Fixed the hpmcounter read access from M-mode.

Changes from v7->v8:
1. Removed ordering constraints for mhpmcounter & mhpmevent.

Changes from v6->v7:
1. Fixed all the compilation errors for the usermode.

Changes from v5->v6:
1. Fixed compilation issue with PATCH 1.
2. Addressed other comments.

Changes from v4->v5:
1. Rebased on top of the -next with following patches.
   - isa extension
   - priv 1.12 spec
2. Addressed all the comments on v4
3. Removed additional isa-ext DT node in favor of riscv,isa string update

Changes from v3->v4:
1. Removed the dummy events from pmu DT node.
2. Fixed pmu_avail_counters mask generation.
3. Added a patch to simplify the predicate function for counters. 

Changes from v2->v3:
1. Addressed all the comments on PATCH1-4.
2. Split patch1 into two separate patches.
3. Added explicit comments to explain the event types in DT node.
4. Rebased on latest Qemu.

Changes from v1->v2:
1. Dropped the ACKs from v1 as significant changes happened after v1.
2. sscofpmf support.
3. A generic counter management framework.

[1] https://github.com/riscv-non-isa/riscv-sbi-doc/blob/master/riscv-sbi.adoc
[2] 

[PATCH v13 5/6] target/riscv: Update the privilege field for sscofpmf CSRs

2022-08-16 Thread Atish Patra
The sscofpmf extension was ratified as a part of priv spec v1.12.
Mark the csr_ops accordingly.

Reviewed-by: Weiwei Li 
Reviewed-by: Alistair Francis 
Signed-off-by: Atish Patra 
---
 target/riscv/csr.c | 90 ++
 1 file changed, 60 insertions(+), 30 deletions(-)

diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index 397803d07727..07b8b4eb1768 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -4063,63 +4063,92 @@ riscv_csr_operations csr_ops[CSR_TABLE_SIZE] = {
  write_mhpmevent   },
 
 [CSR_MHPMEVENT3H]= { "mhpmevent3h",sscofpmf,  read_mhpmeventh,
- write_mhpmeventh   },
+ write_mhpmeventh,
+ .min_priv_ver = PRIV_VERSION_1_12_0},
 [CSR_MHPMEVENT4H]= { "mhpmevent4h",sscofpmf,  read_mhpmeventh,
- write_mhpmeventh   },
+ write_mhpmeventh,
+ .min_priv_ver = PRIV_VERSION_1_12_0},
 [CSR_MHPMEVENT5H]= { "mhpmevent5h",sscofpmf,  read_mhpmeventh,
- write_mhpmeventh   },
+ write_mhpmeventh,
+ .min_priv_ver = PRIV_VERSION_1_12_0},
 [CSR_MHPMEVENT6H]= { "mhpmevent6h",sscofpmf,  read_mhpmeventh,
- write_mhpmeventh   },
+ write_mhpmeventh,
+ .min_priv_ver = PRIV_VERSION_1_12_0},
 [CSR_MHPMEVENT7H]= { "mhpmevent7h",sscofpmf,  read_mhpmeventh,
- write_mhpmeventh   },
+ write_mhpmeventh,
+ .min_priv_ver = PRIV_VERSION_1_12_0},
 [CSR_MHPMEVENT8H]= { "mhpmevent8h",sscofpmf,  read_mhpmeventh,
- write_mhpmeventh   },
+ write_mhpmeventh,
+ .min_priv_ver = PRIV_VERSION_1_12_0},
 [CSR_MHPMEVENT9H]= { "mhpmevent9h",sscofpmf,  read_mhpmeventh,
- write_mhpmeventh   },
+ write_mhpmeventh,
+ .min_priv_ver = PRIV_VERSION_1_12_0},
 [CSR_MHPMEVENT10H]   = { "mhpmevent10h",sscofpmf,  read_mhpmeventh,
- write_mhpmeventh   },
+ write_mhpmeventh,
+ .min_priv_ver = PRIV_VERSION_1_12_0},
 [CSR_MHPMEVENT11H]   = { "mhpmevent11h",sscofpmf,  read_mhpmeventh,
- write_mhpmeventh   },
+ write_mhpmeventh,
+ .min_priv_ver = PRIV_VERSION_1_12_0},
 [CSR_MHPMEVENT12H]   = { "mhpmevent12h",sscofpmf,  read_mhpmeventh,
- write_mhpmeventh   },
+ write_mhpmeventh,
+ .min_priv_ver = PRIV_VERSION_1_12_0},
 [CSR_MHPMEVENT13H]   = { "mhpmevent13h",sscofpmf,  read_mhpmeventh,
- write_mhpmeventh   },
+ write_mhpmeventh,
+ .min_priv_ver = PRIV_VERSION_1_12_0},
 [CSR_MHPMEVENT14H]   = { "mhpmevent14h",sscofpmf,  read_mhpmeventh,
- write_mhpmeventh   },
+ write_mhpmeventh,
+ .min_priv_ver = PRIV_VERSION_1_12_0},
 [CSR_MHPMEVENT15H]   = { "mhpmevent15h",sscofpmf,  read_mhpmeventh,
- write_mhpmeventh   },
+ write_mhpmeventh,
+ .min_priv_ver = PRIV_VERSION_1_12_0},
 [CSR_MHPMEVENT16H]   = { "mhpmevent16h",sscofpmf,  read_mhpmeventh,
- write_mhpmeventh   },
+ write_mhpmeventh,
+ .min_priv_ver = PRIV_VERSION_1_12_0},
 [CSR_MHPMEVENT17H]   = { "mhpmevent17h",sscofpmf,  read_mhpmeventh,
- write_mhpmeventh   },
+ write_mhpmeventh,
+ .min_priv_ver = PRIV_VERSION_1_12_0},
 [CSR_MHPMEVENT18H]   = { "mhpmevent18h",sscofpmf,  read_mhpmeventh,
- write_mhpmeventh   },
+ write_mhpmeventh,
+ .min_priv_ver = 

[PATCH v13 2/6] target/riscv: Simplify counter predicate function

2022-08-16 Thread Atish Patra
All the hpmcounters and the fixed counters (CY, IR, TM) can be represented
as a unified counter. Thus, the predicate function doesn't need to handle each
case separately.

Simplify the predicate function so that we just handle things differently
between RV32/RV64 and S/HS mode.
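
The reason a single mask test is enough (a sketch that merely restates
what the diff below does; the bit layout is the one from the privileged
spec): the counter-enable CSRs and the counter CSR numbers follow the
same ordering, so the enable bit for any counter CSR is simply
BIT(csrno - base_csrno).

    /*
     * mcounteren/scounteren/hcounteren bit layout:
     *   bit 0 = CY, bit 1 = TM, bit 2 = IR, bits 3..31 = HPM3..HPM31
     * CSR numbers run in the same order (CSR_CYCLE, CSR_TIME, CSR_INSTRET,
     * CSR_HPMCOUNTER3, ...), so one mask covers every counter CSR:
     */
    target_ulong ctr_mask = BIT(ctr_index);  /* ctr_index = csrno - base_csrno */
    if (env->priv == PRV_S && !get_field(env->mcounteren, ctr_mask)) {
        return RISCV_EXCP_ILLEGAL_INST;      /* counter not delegated to S-mode */
    }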

Reviewed-by: Bin Meng 
Acked-by: Alistair Francis 
Signed-off-by: Atish Patra 
---
 target/riscv/csr.c | 110 -
 1 file changed, 9 insertions(+), 101 deletions(-)

diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index 886695221956..397803d07727 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -75,6 +75,7 @@ static RISCVException ctr(CPURISCVState *env, int csrno)
 CPUState *cs = env_cpu(env);
 RISCVCPU *cpu = RISCV_CPU(cs);
 int ctr_index;
+target_ulong ctr_mask;
 int base_csrno = CSR_CYCLE;
 bool rv32 = riscv_cpu_mxl(env) == MXL_RV32 ? true : false;
 
@@ -83,122 +84,29 @@ static RISCVException ctr(CPURISCVState *env, int csrno)
 base_csrno += 0x80;
 }
 ctr_index = csrno - base_csrno;
+ctr_mask = BIT(ctr_index);
 
 if ((csrno >= CSR_CYCLE && csrno <= CSR_INSTRET) ||
 (csrno >= CSR_CYCLEH && csrno <= CSR_INSTRETH)) {
 goto skip_ext_pmu_check;
 }
 
-if ((!cpu->cfg.pmu_num || !(cpu->pmu_avail_ctrs & BIT(ctr_index)))) {
+if (!(cpu->pmu_avail_ctrs & ctr_mask)) {
 /* No counter is enabled in PMU or the counter is out of range */
 return RISCV_EXCP_ILLEGAL_INST;
 }
 
 skip_ext_pmu_check:
 
-if (env->priv == PRV_S) {
-switch (csrno) {
-case CSR_CYCLE:
-if (!get_field(env->mcounteren, COUNTEREN_CY)) {
-return RISCV_EXCP_ILLEGAL_INST;
-}
-break;
-case CSR_TIME:
-if (!get_field(env->mcounteren, COUNTEREN_TM)) {
-return RISCV_EXCP_ILLEGAL_INST;
-}
-break;
-case CSR_INSTRET:
-if (!get_field(env->mcounteren, COUNTEREN_IR)) {
-return RISCV_EXCP_ILLEGAL_INST;
-}
-break;
-case CSR_HPMCOUNTER3...CSR_HPMCOUNTER31:
-if (!get_field(env->mcounteren, 1 << ctr_index)) {
-return RISCV_EXCP_ILLEGAL_INST;
-}
-break;
-}
-if (rv32) {
-switch (csrno) {
-case CSR_CYCLEH:
-if (!get_field(env->mcounteren, COUNTEREN_CY)) {
-return RISCV_EXCP_ILLEGAL_INST;
-}
-break;
-case CSR_TIMEH:
-if (!get_field(env->mcounteren, COUNTEREN_TM)) {
-return RISCV_EXCP_ILLEGAL_INST;
-}
-break;
-case CSR_INSTRETH:
-if (!get_field(env->mcounteren, COUNTEREN_IR)) {
-return RISCV_EXCP_ILLEGAL_INST;
-}
-break;
-case CSR_HPMCOUNTER3H...CSR_HPMCOUNTER31H:
-if (!get_field(env->mcounteren, 1 << ctr_index)) {
-return RISCV_EXCP_ILLEGAL_INST;
-}
-break;
-}
-}
+if (((env->priv == PRV_S) && (!get_field(env->mcounteren, ctr_mask))) ||
+((env->priv == PRV_U) && (!get_field(env->scounteren, ctr_mask)))) {
+return RISCV_EXCP_ILLEGAL_INST;
 }
 
 if (riscv_cpu_virt_enabled(env)) {
-switch (csrno) {
-case CSR_CYCLE:
-if (!get_field(env->hcounteren, COUNTEREN_CY) &&
-get_field(env->mcounteren, COUNTEREN_CY)) {
-return RISCV_EXCP_VIRT_INSTRUCTION_FAULT;
-}
-break;
-case CSR_TIME:
-if (!get_field(env->hcounteren, COUNTEREN_TM) &&
-get_field(env->mcounteren, COUNTEREN_TM)) {
-return RISCV_EXCP_VIRT_INSTRUCTION_FAULT;
-}
-break;
-case CSR_INSTRET:
-if (!get_field(env->hcounteren, COUNTEREN_IR) &&
-get_field(env->mcounteren, COUNTEREN_IR)) {
-return RISCV_EXCP_VIRT_INSTRUCTION_FAULT;
-}
-break;
-case CSR_HPMCOUNTER3...CSR_HPMCOUNTER31:
-if (!get_field(env->hcounteren, 1 << ctr_index) &&
- get_field(env->mcounteren, 1 << ctr_index)) {
-return RISCV_EXCP_VIRT_INSTRUCTION_FAULT;
-}
-break;
-}
-if (rv32) {
-switch (csrno) {
-case CSR_CYCLEH:
-if (!get_field(env->hcounteren, COUNTEREN_CY) &&
-get_field(env->mcounteren, COUNTEREN_CY)) {
-return RISCV_EXCP_VIRT_INSTRUCTION_FAULT;
-}
-break;
-case CSR_TIMEH:
-if (!get_field(env->hcounteren, COUNTEREN_TM) &&
-get_field(env->mcounteren, COUNTEREN_TM)) {
-return 

Re: [PATCH for-7.2 00/21] accel/tcg: minimize tlb lookups during translate + user-only PROT_EXEC fixes

2022-08-16 Thread Ilya Leoshkevich
On Fri, 2022-08-12 at 11:07 -0700, Richard Henderson wrote:
> This is part of a larger body of work, but in the process of
> reorganizing I was reminded that PROT_EXEC wasn't being enforced
> properly for user-only.  As this has come up in the context of
> some of Ilya's patches, I thought I'd go ahead and post this part.
> 
> 
> r~
> 
> 
> Ilya Leoshkevich (1):
>   accel/tcg: Introduce is_same_page()
> 
> Richard Henderson (20):
>   linux-user/arm: Mark the commpage executable
>   linux-user/hppa: Allocate page zero as a commpage
>   linux-user/x86_64: Allocate vsyscall page as a commpage
>   linux-user: Honor PT_GNU_STACK
>   tests/tcg/i386: Move smc_code2 to an executable section
>   accel/tcg: Remove PageDesc code_bitmap
>   accel/tcg: Use bool for page_find_alloc
>   accel/tcg: Merge tb_htable_lookup into caller
>   accel/tcg: Move qemu_ram_addr_from_host_nofail to physmem.c
>   accel/tcg: Properly implement get_page_addr_code for user-only
>   accel/tcg: Use probe_access_internal for softmmu
>     get_page_addr_code_hostp
>   accel/tcg: Add nofault parameter to get_page_addr_code_hostp
>   accel/tcg: Unlock mmap_lock after longjmp
>   accel/tcg: Hoist get_page_addr_code out of tb_lookup
>   accel/tcg: Hoist get_page_addr_code out of tb_gen_code
>   accel/tcg: Raise PROT_EXEC exception early
>   accel/tcg: Remove translator_ldsw
>   accel/tcg: Add pc and host_pc params to gen_intermediate_code
>   accel/tcg: Add fast path for translator_ld*
>   accel/tcg: Use DisasContextBase in plugin_gen_tb_start
> 
>  accel/tcg/internal.h  |   7 +-
>  include/elf.h |   1 +
>  include/exec/cpu-common.h |   1 +
>  include/exec/exec-all.h   |  87 +---
>  include/exec/plugin-gen.h |   7 +-
>  include/exec/translator.h |  85 
>  linux-user/arm/target_cpu.h   |   4 +-
>  linux-user/qemu.h |   1 +
>  accel/tcg/cpu-exec.c  | 184 ++--
> --
>  accel/tcg/cputlb.c    |  93 +
>  accel/tcg/plugin-gen.c    |  23 +++--
>  accel/tcg/translate-all.c | 120 --
>  accel/tcg/translator.c    | 122 +-
>  accel/tcg/user-exec.c |  15 +++
>  linux-user/elfload.c  |  80 ++-
>  softmmu/physmem.c |  12 +++
>  target/alpha/translate.c  |   5 +-
>  target/arm/translate.c    |   5 +-
>  target/avr/translate.c    |   5 +-
>  target/cris/translate.c   |   5 +-
>  target/hexagon/translate.c    |   6 +-
>  target/hppa/translate.c   |   5 +-
>  target/i386/tcg/translate.c   |   7 +-
>  target/loongarch/translate.c  |   6 +-
>  target/m68k/translate.c   |   5 +-
>  target/microblaze/translate.c |   5 +-
>  target/mips/tcg/translate.c   |   5 +-
>  target/nios2/translate.c  |   5 +-
>  target/openrisc/translate.c   |   6 +-
>  target/ppc/translate.c    |   5 +-
>  target/riscv/translate.c  |   5 +-
>  target/rx/translate.c |   5 +-
>  target/s390x/tcg/translate.c  |   5 +-
>  target/sh4/translate.c    |   5 +-
>  target/sparc/translate.c  |   5 +-
>  target/tricore/translate.c    |   6 +-
>  target/xtensa/translate.c |   6 +-
>  tests/tcg/i386/test-i386.c    |   2 +-
>  38 files changed, 532 insertions(+), 424 deletions(-)
> 

Hi,

I need the following fixup to make my noexec tests pass with v1:

diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 6a3ca8224f..cc6a43a3bc 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -386,6 +386,10 @@ const void *HELPER(lookup_tb_ptr)(CPUArchState
*env)
 return tcg_code_gen_epilogue;
 }
 
+if (tb->page_addr[1] != -1) {
+get_page_addr_code_hostp(env, tb->page_addr[1], false, NULL);
+}
+
 log_cpu_exec(pc, cpu, tb);
 
 return tb->tc.ptr;
@@ -997,6 +1001,9 @@ int cpu_exec(CPUState *cpu)
  * for the fast lookup
  */
qatomic_set(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)], tb);
+} else if (tb->page_addr[1] != -1) {
+get_page_addr_code_hostp(cpu->env_ptr, tb->page_addr[1], false,
+ NULL);
 }
 mmap_unlock();

With v2, the exception after mprotect(PROT_NONE) no longer happens.
I have not figured out what the problem is yet.

Also, wasmtime tests trigger this assertion:

static void pgb_dynamic(const char *image_name, long align)
{
/*
 * The executable is dynamic and does not require a fixed address.
 * All we need is a commpage that satisfies align.
 * If we do not need a commpage, leave guest_base == 0.
 */
if (HI_COMMPAGE) {
uintptr_t addr, commpage;

/* 64-bit hosts should have used reserved_va. */
assert(sizeof(uintptr_t) == 4);
^^^

Likewise, I also have not figured out why this is happening.

Best regards,
Ilya



[PATCH v2 27/33] target/arm: Change gen_*set_pc_im to gen_*update_pc

2022-08-16 Thread Richard Henderson
In preparation for TARGET_TB_PCREL, reduce reliance on
absolute values by passing in pc difference.

Signed-off-by: Richard Henderson 
---
 target/arm/translate-a32.h |  2 +-
 target/arm/translate.h |  6 ++--
 target/arm/translate-a64.c | 32 +-
 target/arm/translate-vfp.c |  2 +-
 target/arm/translate.c | 68 --
 5 files changed, 56 insertions(+), 54 deletions(-)

diff --git a/target/arm/translate-a32.h b/target/arm/translate-a32.h
index 78a84c1414..09c8f467aa 100644
--- a/target/arm/translate-a32.h
+++ b/target/arm/translate-a32.h
@@ -40,7 +40,7 @@ void write_neon_element64(TCGv_i64 src, int reg, int ele, 
MemOp memop);
 TCGv_i32 add_reg_for_lit(DisasContext *s, int reg, int ofs);
 void gen_set_cpsr(TCGv_i32 var, uint32_t mask);
 void gen_set_condexec(DisasContext *s);
-void gen_set_pc_im(DisasContext *s, target_ulong val);
+void gen_update_pc(DisasContext *s, int diff);
 void gen_lookup_tb(DisasContext *s);
 long vfp_reg_offset(bool dp, unsigned reg);
 long neon_full_reg_offset(unsigned reg);
diff --git a/target/arm/translate.h b/target/arm/translate.h
index 90bf7c57fc..33b94a18bb 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -254,7 +254,7 @@ static inline int curr_insn_len(DisasContext *s)
  * For instructions which want an immediate exit to the main loop, as opposed
  * to attempting to use lookup_and_goto_ptr.  Unlike DISAS_UPDATE_EXIT, this
  * doesn't write the PC on exiting the translation loop so you need to ensure
- * something (gen_a64_set_pc_im or runtime helper) has done so before we reach
+ * something (gen_a64_update_pc or runtime helper) has done so before we reach
  * return from cpu_tb_exec.
  */
 #define DISAS_EXIT  DISAS_TARGET_9
@@ -263,14 +263,14 @@ static inline int curr_insn_len(DisasContext *s)
 
 #ifdef TARGET_AARCH64
 void a64_translate_init(void);
-void gen_a64_set_pc_im(uint64_t val);
+void gen_a64_update_pc(DisasContext *s, int diff);
 extern const TranslatorOps aarch64_translator_ops;
 #else
 static inline void a64_translate_init(void)
 {
 }
 
-static inline void gen_a64_set_pc_im(uint64_t val)
+static inline void gen_a64_update_pc(DisasContext *s, int diff)
 {
 }
 #endif
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 695ccd0723..90f31b1dff 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -148,9 +148,9 @@ static void reset_btype(DisasContext *s)
 }
 }
 
-void gen_a64_set_pc_im(uint64_t val)
+void gen_a64_update_pc(DisasContext *s, int diff)
 {
-tcg_gen_movi_i64(cpu_pc, val);
+tcg_gen_movi_i64(cpu_pc, s->pc_curr + diff);
 }
 
 /*
@@ -342,14 +342,14 @@ static void gen_exception_internal(int excp)
 
 static void gen_exception_internal_insn(DisasContext *s, uint64_t pc, int excp)
 {
-gen_a64_set_pc_im(pc);
+gen_a64_update_pc(s, pc - s->pc_curr);
 gen_exception_internal(excp);
 s->base.is_jmp = DISAS_NORETURN;
 }
 
 static void gen_exception_bkpt_insn(DisasContext *s, uint32_t syndrome)
 {
-gen_a64_set_pc_im(s->pc_curr);
+gen_a64_update_pc(s, 0);
 gen_helper_exception_bkpt_insn(cpu_env, tcg_constant_i32(syndrome));
 s->base.is_jmp = DISAS_NORETURN;
 }
@@ -384,11 +384,11 @@ static void gen_goto_tb(DisasContext *s, int n, int diff)
 
 if (use_goto_tb(s, dest)) {
 tcg_gen_goto_tb(n);
-gen_a64_set_pc_im(dest);
+gen_a64_update_pc(s, diff);
 tcg_gen_exit_tb(s->base.tb, n);
 s->base.is_jmp = DISAS_NORETURN;
 } else {
-gen_a64_set_pc_im(dest);
+gen_a64_update_pc(s, diff);
 if (s->ss_active) {
 gen_step_complete_exception(s);
 } else {
@@ -1960,7 +1960,7 @@ static void handle_sys(DisasContext *s, uint32_t insn, 
bool isread,
 uint32_t syndrome;
 
 syndrome = syn_aa64_sysregtrap(op0, op1, op2, crn, crm, rt, isread);
-gen_a64_set_pc_im(s->pc_curr);
+gen_a64_update_pc(s, 0);
 gen_helper_access_check_cp_reg(cpu_env,
tcg_constant_ptr(ri),
tcg_constant_i32(syndrome),
@@ -1970,7 +1970,7 @@ static void handle_sys(DisasContext *s, uint32_t insn, 
bool isread,
  * The readfn or writefn might raise an exception;
  * synchronize the CPU state in case it does.
  */
-gen_a64_set_pc_im(s->pc_curr);
+gen_a64_update_pc(s, 0);
 }
 
 /* Handle special cases first */
@@ -2180,7 +2180,7 @@ static void disas_exc(DisasContext *s, uint32_t insn)
 /* The pre HVC helper handles cases when HVC gets trapped
  * as an undefined insn by runtime configuration.
  */
-gen_a64_set_pc_im(s->pc_curr);
+gen_a64_update_pc(s, 0);
 gen_helper_pre_hvc(cpu_env);
 gen_ss_advance(s);
 gen_exception_insn_el(s, s->base.pc_next, EXCP_HVC,
@@ -2191,7 +2191,7 @@ static void disas_exc(DisasContext *s, 

[PATCH v2 25/33] target/arm: Introduce curr_insn_len

2022-08-16 Thread Richard Henderson
A simple helper to retrieve the length of the current insn.

Signed-off-by: Richard Henderson 
---
 target/arm/translate.h | 5 +
 target/arm/translate-vfp.c | 2 +-
 target/arm/translate.c | 5 ++---
 3 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/target/arm/translate.h b/target/arm/translate.h
index af5d4a7086..90bf7c57fc 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -226,6 +226,11 @@ static inline void disas_set_insn_syndrome(DisasContext 
*s, uint32_t syn)
 s->insn_start = NULL;
 }
 
+static inline int curr_insn_len(DisasContext *s)
+{
+return s->base.pc_next - s->pc_curr;
+}
+
 /* is_jmp field values */
 #define DISAS_JUMP  DISAS_TARGET_0 /* only pc was modified dynamically */
 /* CPU state was modified dynamically; exit to main loop for interrupts. */
diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
index bd5ae27d09..94cc1e4b77 100644
--- a/target/arm/translate-vfp.c
+++ b/target/arm/translate-vfp.c
@@ -242,7 +242,7 @@ static bool vfp_access_check_a(DisasContext *s, bool 
ignore_vfp_enabled)
 if (s->sme_trap_nonstreaming) {
 gen_exception_insn(s, s->pc_curr, EXCP_UDEF,
syn_smetrap(SME_ET_Streaming,
-   s->base.pc_next - s->pc_curr == 2));
+   curr_insn_len(s) == 2));
 return false;
 }
 
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 9474e4b44b..638a051281 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -6660,7 +6660,7 @@ static ISSInfo make_issinfo(DisasContext *s, int rd, bool 
p, bool w)
 /* ISS not valid if writeback */
 if (p && !w) {
 ret = rd;
-if (s->base.pc_next - s->pc_curr == 2) {
+if (curr_insn_len(s) == 2) {
 ret |= ISSIs16Bit;
 }
 } else {
@@ -9825,8 +9825,7 @@ static void arm_tr_tb_stop(DisasContextBase *dcbase, 
CPUState *cpu)
 /* nothing more to generate */
 break;
 case DISAS_WFI:
-gen_helper_wfi(cpu_env,
-   tcg_constant_i32(dc->base.pc_next - dc->pc_curr));
+gen_helper_wfi(cpu_env, tcg_constant_i32(curr_insn_len(dc)));
 /*
  * The helper doesn't necessarily throw an exception, but we
  * must go back to the main loop to check for interrupts anyway.
-- 
2.34.1




[PATCH v2 32/33] target/arm: Introduce gen_pc_plus_diff for aarch32

2022-08-16 Thread Richard Henderson
In preparation for TARGET_TB_PCREL, reduce reliance on absolute values.

Signed-off-by: Richard Henderson 
---
 target/arm/translate.c | 29 ++---
 1 file changed, 18 insertions(+), 11 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index 4d13e365e2..f01c8df60a 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -276,11 +276,16 @@ static int jmp_diff(DisasContext *s, int diff)
 return diff + (s->thumb ? 4 : 8);
 }
 
+static void gen_pc_plus_diff(DisasContext *s, TCGv_i32 var, int diff)
+{
+tcg_gen_movi_i32(var, s->pc_curr + diff);
+}
+
 /* Set a variable to the value of a CPU register.  */
 void load_reg_var(DisasContext *s, TCGv_i32 var, int reg)
 {
 if (reg == 15) {
-tcg_gen_movi_i32(var, read_pc(s));
+gen_pc_plus_diff(s, var, jmp_diff(s, 0));
 } else {
 tcg_gen_mov_i32(var, cpu_R[reg]);
 }
@@ -296,7 +301,8 @@ TCGv_i32 add_reg_for_lit(DisasContext *s, int reg, int ofs)
 TCGv_i32 tmp = tcg_temp_new_i32();
 
 if (reg == 15) {
-tcg_gen_movi_i32(tmp, (read_pc(s) & ~3) + ofs);
+/* This difference computes a page offset so ok for TARGET_TB_PCREL. */
+gen_pc_plus_diff(s, tmp, (read_pc(s) & ~3) - s->pc_curr + ofs);
 } else {
 tcg_gen_addi_i32(tmp, cpu_R[reg], ofs);
 }
@@ -1158,7 +1164,7 @@ void unallocated_encoding(DisasContext *s)
 /* Force a TB lookup after an instruction that changes the CPU state.  */
 void gen_lookup_tb(DisasContext *s)
 {
-tcg_gen_movi_i32(cpu_R[15], s->base.pc_next);
+gen_pc_plus_diff(s, cpu_R[15], curr_insn_len(s));
 s->base.is_jmp = DISAS_EXIT;
 }
 
@@ -6485,7 +6491,7 @@ static bool trans_BLX_r(DisasContext *s, arg_BLX_r *a)
 return false;
 }
 tmp = load_reg(s, a->rm);
-tcg_gen_movi_i32(cpu_R[14], s->base.pc_next | s->thumb);
+gen_pc_plus_diff(s, cpu_R[14], curr_insn_len(s) | s->thumb);
 gen_bx(s, tmp);
 return true;
 }
@@ -8356,7 +8362,7 @@ static bool trans_B_cond_thumb(DisasContext *s, arg_ci *a)
 
 static bool trans_BL(DisasContext *s, arg_i *a)
 {
-tcg_gen_movi_i32(cpu_R[14], s->base.pc_next | s->thumb);
+gen_pc_plus_diff(s, cpu_R[14], curr_insn_len(s) | s->thumb);
 gen_jmp(s, jmp_diff(s, a->imm));
 return true;
 }
@@ -8375,7 +8381,7 @@ static bool trans_BLX_i(DisasContext *s, arg_BLX_i *a)
 if (s->thumb && (a->imm & 2)) {
 return false;
 }
-tcg_gen_movi_i32(cpu_R[14], s->base.pc_next | s->thumb);
+gen_pc_plus_diff(s, cpu_R[14], curr_insn_len(s) | s->thumb);
 store_cpu_field_constant(!s->thumb, thumb);
 /* This difference computes a page offset so ok for TARGET_TB_PCREL. */
 gen_jmp(s, (read_pc(s) & ~3) - s->pc_curr + a->imm);
@@ -8385,7 +8391,7 @@ static bool trans_BLX_i(DisasContext *s, arg_BLX_i *a)
 static bool trans_BL_BLX_prefix(DisasContext *s, arg_BL_BLX_prefix *a)
 {
 assert(!arm_dc_feature(s, ARM_FEATURE_THUMB2));
-tcg_gen_movi_i32(cpu_R[14], read_pc(s) + (a->imm << 12));
+gen_pc_plus_diff(s, cpu_R[14], jmp_diff(s, a->imm << 12));
 return true;
 }
 
@@ -8395,7 +8401,7 @@ static bool trans_BL_suffix(DisasContext *s, 
arg_BL_suffix *a)
 
 assert(!arm_dc_feature(s, ARM_FEATURE_THUMB2));
 tcg_gen_addi_i32(tmp, cpu_R[14], (a->imm << 1) | 1);
-tcg_gen_movi_i32(cpu_R[14], s->base.pc_next | 1);
+gen_pc_plus_diff(s, cpu_R[14], curr_insn_len(s) | 1);
 gen_bx(s, tmp);
 return true;
 }
@@ -8411,7 +8417,7 @@ static bool trans_BLX_suffix(DisasContext *s, 
arg_BLX_suffix *a)
 tmp = tcg_temp_new_i32();
 tcg_gen_addi_i32(tmp, cpu_R[14], a->imm << 1);
 tcg_gen_andi_i32(tmp, tmp, 0xfffc);
-tcg_gen_movi_i32(cpu_R[14], s->base.pc_next | 1);
+gen_pc_plus_diff(s, cpu_R[14], curr_insn_len(s) | 1);
 gen_bx(s, tmp);
 return true;
 }
@@ -8734,10 +8740,11 @@ static bool op_tbranch(DisasContext *s, arg_tbranch *a, 
bool half)
 tcg_gen_add_i32(addr, addr, tmp);
 
 gen_aa32_ld_i32(s, tmp, addr, get_mem_index(s), half ? MO_UW : MO_UB);
-tcg_temp_free_i32(addr);
 
 tcg_gen_add_i32(tmp, tmp, tmp);
-tcg_gen_addi_i32(tmp, tmp, read_pc(s));
+gen_pc_plus_diff(s, addr, jmp_diff(s, 0));
+tcg_gen_add_i32(tmp, tmp, addr);
+tcg_temp_free_i32(addr);
 store_reg(s, 15, tmp);
 return true;
 }
-- 
2.34.1




[PATCH v2 22/33] accel/tcg: Introduce tb_pc and tb_pc_log

2022-08-16 Thread Richard Henderson
The availability of tb->pc will shortly be conditional.
Introduce accessor functions to minimize ifdefs.
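
For context, a sketch of where this is headed (an assumption about the
follow-up TARGET_TB_PCREL work, not code from this patch): with the
accessor in place, the conditional compilation can live in one spot
instead of at every call site, roughly:

    static inline target_ulong tb_pc(const TranslationBlock *tb)
    {
    #ifdef TARGET_TB_PCREL
        /* Assumed follow-up: no absolute pc is stored for pc-relative TBs. */
        g_assert_not_reached();
    #else
        return tb->pc;
    #endif
    }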

Signed-off-by: Richard Henderson 
---
 include/exec/exec-all.h | 12 ++
 accel/tcg/cpu-exec.c| 20 -
 accel/tcg/translate-all.c   | 29 +
 target/arm/cpu.c|  4 ++--
 target/avr/cpu.c|  2 +-
 target/hexagon/cpu.c|  2 +-
 target/hppa/cpu.c   |  4 ++--
 target/i386/tcg/tcg-cpu.c   |  2 +-
 target/loongarch/cpu.c  |  2 +-
 target/microblaze/cpu.c |  2 +-
 target/mips/tcg/exception.c |  2 +-
 target/mips/tcg/sysemu/special_helper.c |  2 +-
 target/openrisc/cpu.c   |  2 +-
 target/riscv/cpu.c  |  4 ++--
 target/rx/cpu.c |  2 +-
 target/sh4/cpu.c|  4 ++--
 target/sparc/cpu.c  |  2 +-
 target/tricore/cpu.c|  2 +-
 tcg/tcg.c   |  6 ++---
 19 files changed, 59 insertions(+), 46 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 4ad166966b..cec3ef1666 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -533,6 +533,18 @@ struct TranslationBlock {
 uintptr_t jmp_dest[2];
 };
 
+/* Hide the read to avoid ifdefs for TARGET_TB_PCREL. */
+static inline target_ulong tb_pc(const TranslationBlock *tb)
+{
+return tb->pc;
+}
+
+/* Similarly, but for logs. */
+static inline target_ulong tb_pc_log(const TranslationBlock *tb)
+{
+return tb->pc;
+}
+
 /* Hide the qatomic_read to make code a little easier on the eyes */
 static inline uint32_t tb_cflags(const TranslationBlock *tb)
 {
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 3f8e4bbbc8..f146960b7b 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -185,7 +185,7 @@ static bool tb_lookup_cmp(const void *p, const void *d)
 const TranslationBlock *tb = p;
 const struct tb_desc *desc = d;
 
-if (tb->pc == desc->pc &&
+if (tb_pc(tb) == desc->pc &&
 tb->page_addr[0] == desc->page_addr0 &&
 tb->cs_base == desc->cs_base &&
 tb->flags == desc->flags &&
@@ -413,7 +413,7 @@ cpu_tb_exec(CPUState *cpu, TranslationBlock *itb, int 
*tb_exit)
 TranslationBlock *last_tb;
 const void *tb_ptr = itb->tc.ptr;
 
-log_cpu_exec(itb->pc, cpu, itb);
+log_cpu_exec(tb_pc_log(itb), cpu, itb);
 
 qemu_thread_jit_execute();
 ret = tcg_qemu_tb_exec(env, tb_ptr);
@@ -437,16 +437,16 @@ cpu_tb_exec(CPUState *cpu, TranslationBlock *itb, int 
*tb_exit)
  * of the start of the TB.
  */
 CPUClass *cc = CPU_GET_CLASS(cpu);
-qemu_log_mask_and_addr(CPU_LOG_EXEC, last_tb->pc,
+qemu_log_mask_and_addr(CPU_LOG_EXEC, tb_pc_log(last_tb),
"Stopped execution of TB chain before %p ["
TARGET_FMT_lx "] %s\n",
-   last_tb->tc.ptr, last_tb->pc,
-   lookup_symbol(last_tb->pc));
+   last_tb->tc.ptr, tb_pc_log(last_tb),
+   lookup_symbol(tb_pc_log(last_tb)));
 if (cc->tcg_ops->synchronize_from_tb) {
 cc->tcg_ops->synchronize_from_tb(cpu, last_tb);
 } else {
 assert(cc->set_pc);
-cc->set_pc(cpu, last_tb->pc);
+cc->set_pc(cpu, tb_pc(last_tb));
 }
 }
 
@@ -588,11 +588,11 @@ static inline void tb_add_jump(TranslationBlock *tb, int 
n,
 
 qemu_spin_unlock(&tb_next->jmp_lock);
 
-qemu_log_mask_and_addr(CPU_LOG_EXEC, tb->pc,
+qemu_log_mask_and_addr(CPU_LOG_EXEC, tb_pc_log(tb),
"Linking TBs %p [" TARGET_FMT_lx
"] index %d -> %p [" TARGET_FMT_lx "]\n",
-   tb->tc.ptr, tb->pc, n,
-   tb_next->tc.ptr, tb_next->pc);
+   tb->tc.ptr, tb_pc_log(tb), n,
+   tb_next->tc.ptr, tb_pc_log(tb_next));
 return;
 
  out_unlock_next:
@@ -842,7 +842,7 @@ static inline void cpu_loop_exec_tb(CPUState *cpu, 
TranslationBlock *tb,
 {
 int32_t insns_left;
 
-trace_exec_tb(tb, tb->pc);
+trace_exec_tb(tb, tb_pc_log(tb));
 tb = cpu_tb_exec(cpu, tb, tb_exit);
 if (*tb_exit != TB_EXIT_REQUESTED) {
 *last_tb = tb;
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index c2745f14a6..1248ee3433 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -298,7 +298,7 @@ static int encode_search(TranslationBlock *tb, uint8_t 
*block)
 
 for (j = 0; j < TARGET_INSN_START_WORDS; ++j) {
 if (i == 0) {
-prev = (j == 0 ? tb->pc : 0);
+prev = (j == 0 ? tb_pc(tb) : 0);
 } else 

Re: [PATCH v2 33/33] target/arm: Enable TARGET_TB_PCREL

2022-08-16 Thread Richard Henderson

On 8/16/22 15:34, Richard Henderson wrote:

Signed-off-by: Richard Henderson 
---
  target/arm/cpu-param.h |  2 ++
  target/arm/translate.h |  6 
  target/arm/cpu.c   | 23 +++---
  target/arm/translate-a64.c | 37 ++-
  target/arm/translate.c | 62 ++
  5 files changed, 100 insertions(+), 30 deletions(-)

diff --git a/target/arm/cpu-param.h b/target/arm/cpu-param.h
index 68ffb12427..ef62371d8f 100644
--- a/target/arm/cpu-param.h
+++ b/target/arm/cpu-param.h
@@ -34,4 +34,6 @@
  
  #define NB_MMU_MODES 15
  
+#define TARGET_TB_PCREL 1


Oh, this was meant to be system-mode only, as there is no virtual aliasing in user-only. 
But during development it was handy to test the feature in user-only.  And we might find 
that maintaining both mechanisms is more effort than not.



r~



[PATCH v2 30/33] target/arm: Change gen_jmp* to work on displacements

2022-08-16 Thread Richard Henderson
In preparation for TARGET_TB_PCREL, reduce reliance on absolute values.

Signed-off-by: Richard Henderson 
---
 target/arm/translate.c | 35 ---
 1 file changed, 20 insertions(+), 15 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index 63a41ed438..4d13e365e2 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -270,6 +270,12 @@ static uint32_t read_pc(DisasContext *s)
 return s->pc_curr + (s->thumb ? 4 : 8);
 }
 
+/* The pc_curr difference for an architectural jump. */
+static int jmp_diff(DisasContext *s, int diff)
+{
+return diff + (s->thumb ? 4 : 8);
+}
+
 /* Set a variable to the value of a CPU register.  */
 void load_reg_var(DisasContext *s, TCGv_i32 var, int reg)
 {
@@ -2614,10 +2620,8 @@ static void gen_goto_tb(DisasContext *s, int n, int diff)
 }
 
 /* Jump, specifying which TB number to use if we gen_goto_tb() */
-static inline void gen_jmp_tb(DisasContext *s, uint32_t dest, int tbno)
+static void gen_jmp_tb(DisasContext *s, int diff, int tbno)
 {
-int diff = dest - s->pc_curr;
-
 if (unlikely(s->ss_active)) {
 /* An indirect jump so that we still trigger the debug exception.  */
 gen_update_pc(s, diff);
@@ -2659,9 +2663,9 @@ static inline void gen_jmp_tb(DisasContext *s, uint32_t 
dest, int tbno)
 }
 }
 
-static inline void gen_jmp(DisasContext *s, uint32_t dest)
+static inline void gen_jmp(DisasContext *s, int diff)
 {
-gen_jmp_tb(s, dest, 0);
+gen_jmp_tb(s, diff, 0);
 }
 
 static inline void gen_mulxy(TCGv_i32 t0, TCGv_i32 t1, int x, int y)
@@ -8331,7 +8335,7 @@ static bool trans_CLRM(DisasContext *s, arg_CLRM *a)
 
 static bool trans_B(DisasContext *s, arg_i *a)
 {
-gen_jmp(s, read_pc(s) + a->imm);
+gen_jmp(s, jmp_diff(s, a->imm));
 return true;
 }
 
@@ -8346,14 +8350,14 @@ static bool trans_B_cond_thumb(DisasContext *s, arg_ci 
*a)
 return true;
 }
 arm_skip_unless(s, a->cond);
-gen_jmp(s, read_pc(s) + a->imm);
+gen_jmp(s, jmp_diff(s, a->imm));
 return true;
 }
 
 static bool trans_BL(DisasContext *s, arg_i *a)
 {
 tcg_gen_movi_i32(cpu_R[14], s->base.pc_next | s->thumb);
-gen_jmp(s, read_pc(s) + a->imm);
+gen_jmp(s, jmp_diff(s, a->imm));
 return true;
 }
 
@@ -8373,7 +8377,8 @@ static bool trans_BLX_i(DisasContext *s, arg_BLX_i *a)
 }
 tcg_gen_movi_i32(cpu_R[14], s->base.pc_next | s->thumb);
 store_cpu_field_constant(!s->thumb, thumb);
-gen_jmp(s, (read_pc(s) & ~3) + a->imm);
+/* This difference computes a page offset so ok for TARGET_TB_PCREL. */
+gen_jmp(s, (read_pc(s) & ~3) - s->pc_curr + a->imm);
 return true;
 }
 
@@ -8534,10 +8539,10 @@ static bool trans_WLS(DisasContext *s, arg_WLS *a)
  * when we take this upcoming exit from this TB, so gen_jmp_tb() is OK.
  */
 }
-gen_jmp_tb(s, s->base.pc_next, 1);
+gen_jmp_tb(s, curr_insn_len(s), 1);
 
 gen_set_label(nextlabel);
-gen_jmp(s, read_pc(s) + a->imm);
+gen_jmp(s, jmp_diff(s, a->imm));
 return true;
 }
 
@@ -8617,7 +8622,7 @@ static bool trans_LE(DisasContext *s, arg_LE *a)
 
 if (a->f) {
 /* Loop-forever: just jump back to the loop start */
-gen_jmp(s, read_pc(s) - a->imm);
+gen_jmp(s, jmp_diff(s, -a->imm));
 return true;
 }
 
@@ -8648,7 +8653,7 @@ static bool trans_LE(DisasContext *s, arg_LE *a)
 tcg_temp_free_i32(decr);
 }
 /* Jump back to the loop start */
-gen_jmp(s, read_pc(s) - a->imm);
+gen_jmp(s, jmp_diff(s, -a->imm));
 
 gen_set_label(loopend);
 if (a->tp) {
@@ -8656,7 +8661,7 @@ static bool trans_LE(DisasContext *s, arg_LE *a)
 store_cpu_field(tcg_constant_i32(4), v7m.ltpsize);
 }
 /* End TB, continuing to following insn */
-gen_jmp_tb(s, s->base.pc_next, 1);
+gen_jmp_tb(s, curr_insn_len(s), 1);
 return true;
 }
 
@@ -8755,7 +8760,7 @@ static bool trans_CBZ(DisasContext *s, arg_CBZ *a)
 tcg_gen_brcondi_i32(a->nz ? TCG_COND_EQ : TCG_COND_NE,
 tmp, 0, s->condlabel);
 tcg_temp_free_i32(tmp);
-gen_jmp(s, read_pc(s) + a->imm);
+gen_jmp(s, jmp_diff(s, a->imm));
 return true;
 }
 
-- 
2.34.1




[PATCH v2 16/33] accel/tcg: Remove translator_ldsw

2022-08-16 Thread Richard Henderson
The only user can easily use translator_lduw and
adjust the type to signed during the return.

Signed-off-by: Richard Henderson 
---
 include/exec/translator.h   | 1 -
 target/i386/tcg/translate.c | 2 +-
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/include/exec/translator.h b/include/exec/translator.h
index 0d0bf3a31e..45b9268ca4 100644
--- a/include/exec/translator.h
+++ b/include/exec/translator.h
@@ -178,7 +178,6 @@ bool translator_use_goto_tb(DisasContextBase *db, 
target_ulong dest);
 
 #define FOR_EACH_TRANSLATOR_LD(F)   \
 F(translator_ldub, uint8_t, cpu_ldub_code, /* no swap */)   \
-F(translator_ldsw, int16_t, cpu_ldsw_code, bswap16) \
 F(translator_lduw, uint16_t, cpu_lduw_code, bswap16)\
 F(translator_ldl, uint32_t, cpu_ldl_code, bswap32)  \
 F(translator_ldq, uint64_t, cpu_ldq_code, bswap64)
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index b7972f0ff5..a23417d058 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -2033,7 +2033,7 @@ static inline uint8_t x86_ldub_code(CPUX86State *env, 
DisasContext *s)
 
 static inline int16_t x86_ldsw_code(CPUX86State *env, DisasContext *s)
 {
-return translator_ldsw(env, >base, advance_pc(env, s, 2));
+return translator_lduw(env, >base, advance_pc(env, s, 2));
 }
 
 static inline uint16_t x86_lduw_code(CPUX86State *env, DisasContext *s)
-- 
2.34.1




[PATCH v2 19/33] accel/tcg: Use DisasContextBase in plugin_gen_tb_start

2022-08-16 Thread Richard Henderson
Use the pc coming from db->pc_first rather than the TB.

Use the cached host_addr rather than re-computing for the
first page.  We still need a separate lookup for the second
page because it won't be computed for DisasContextBase until
the translator actually performs a read from the page.

Signed-off-by: Richard Henderson 
---
 include/exec/plugin-gen.h |  7 ---
 accel/tcg/plugin-gen.c| 23 ---
 accel/tcg/translator.c|  2 +-
 3 files changed, 17 insertions(+), 15 deletions(-)

diff --git a/include/exec/plugin-gen.h b/include/exec/plugin-gen.h
index f92f169739..5004728c61 100644
--- a/include/exec/plugin-gen.h
+++ b/include/exec/plugin-gen.h
@@ -19,7 +19,8 @@ struct DisasContextBase;
 
 #ifdef CONFIG_PLUGIN
 
-bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb, bool 
supress);
+bool plugin_gen_tb_start(CPUState *cpu, const struct DisasContextBase *db,
+ bool supress);
 void plugin_gen_tb_end(CPUState *cpu);
 void plugin_gen_insn_start(CPUState *cpu, const struct DisasContextBase *db);
 void plugin_gen_insn_end(void);
@@ -48,8 +49,8 @@ static inline void plugin_insn_append(abi_ptr pc, const void 
*from, size_t size)
 
 #else /* !CONFIG_PLUGIN */
 
-static inline
-bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb, bool 
supress)
+static inline bool
+plugin_gen_tb_start(CPUState *cpu, const struct DisasContextBase *db, bool sup)
 {
 return false;
 }
diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-gen.c
index 8377c15383..0f080386af 100644
--- a/accel/tcg/plugin-gen.c
+++ b/accel/tcg/plugin-gen.c
@@ -852,7 +852,8 @@ static void plugin_gen_inject(const struct qemu_plugin_tb 
*plugin_tb)
 pr_ops();
 }
 
-bool plugin_gen_tb_start(CPUState *cpu, const TranslationBlock *tb, bool 
mem_only)
+bool plugin_gen_tb_start(CPUState *cpu, const DisasContextBase *db,
+ bool mem_only)
 {
 bool ret = false;
 
@@ -870,9 +871,9 @@ bool plugin_gen_tb_start(CPUState *cpu, const 
TranslationBlock *tb, bool mem_onl
 
 ret = true;
 
-ptb->vaddr = tb->pc;
+ptb->vaddr = db->pc_first;
 ptb->vaddr2 = -1;
-get_page_addr_code_hostp(cpu->env_ptr, tb->pc, true, &ptb->haddr1);
+ptb->haddr1 = db->host_addr[0];
 ptb->haddr2 = NULL;
 ptb->mem_only = mem_only;
 
@@ -898,16 +899,16 @@ void plugin_gen_insn_start(CPUState *cpu, const 
DisasContextBase *db)
  * Note that we skip this when haddr1 == NULL, e.g. when we're
  * fetching instructions from a region not backed by RAM.
  */
-if (likely(ptb->haddr1 != NULL && ptb->vaddr2 == -1) &&
-unlikely((db->pc_next & TARGET_PAGE_MASK) !=
- (db->pc_first & TARGET_PAGE_MASK))) {
-get_page_addr_code_hostp(cpu->env_ptr, db->pc_next,
- true, &ptb->haddr2);
-ptb->vaddr2 = db->pc_next;
-}
-if (likely(ptb->vaddr2 == -1)) {
+if (ptb->haddr1 == NULL) {
+pinsn->haddr = NULL;
+} else if (is_same_page(db, db->pc_next)) {
 pinsn->haddr = ptb->haddr1 + pinsn->vaddr - ptb->vaddr;
 } else {
+if (ptb->vaddr2 == -1) {
+ptb->vaddr2 = TARGET_PAGE_ALIGN(db->pc_first);
+get_page_addr_code_hostp(cpu->env_ptr, ptb->vaddr2,
+ true, &ptb->haddr2);
+}
 pinsn->haddr = ptb->haddr2 + pinsn->vaddr - ptb->vaddr2;
 }
 }
diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
index a693c17259..3e6fab482e 100644
--- a/accel/tcg/translator.c
+++ b/accel/tcg/translator.c
@@ -81,7 +81,7 @@ void translator_loop(CPUState *cpu, TranslationBlock *tb, int 
max_insns,
 ops->tb_start(db, cpu);
 tcg_debug_assert(db->is_jmp == DISAS_NEXT);  /* no early exit */
 
-plugin_enabled = plugin_gen_tb_start(cpu, tb, cflags & CF_MEMI_ONLY);
+plugin_enabled = plugin_gen_tb_start(cpu, db, cflags & CF_MEMI_ONLY);
 
 while (true) {
 db->num_insns++;
-- 
2.34.1




[PATCH v2 26/33] target/arm: Change gen_goto_tb to work on displacements

2022-08-16 Thread Richard Henderson
In preparation for TARGET_TB_PCREL, reduce reliance on absolute values.

Signed-off-by: Richard Henderson 
---
 target/arm/translate-a64.c | 40 --
 target/arm/translate.c | 10 ++
 2 files changed, 27 insertions(+), 23 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 163df8c615..695ccd0723 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -378,8 +378,10 @@ static inline bool use_goto_tb(DisasContext *s, uint64_t 
dest)
 return translator_use_goto_tb(>base, dest);
 }
 
-static inline void gen_goto_tb(DisasContext *s, int n, uint64_t dest)
+static void gen_goto_tb(DisasContext *s, int n, int diff)
 {
+uint64_t dest = s->pc_curr + diff;
+
 if (use_goto_tb(s, dest)) {
 tcg_gen_goto_tb(n);
 gen_a64_set_pc_im(dest);
@@ -1362,7 +1364,7 @@ static inline AArch64DecodeFn *lookup_disas_fn(const 
AArch64DecodeTable *table,
  */
 static void disas_uncond_b_imm(DisasContext *s, uint32_t insn)
 {
-uint64_t addr = s->pc_curr + sextract32(insn, 0, 26) * 4;
+int diff = sextract32(insn, 0, 26) * 4;
 
 if (insn & (1U << 31)) {
 /* BL Branch with link */
@@ -1371,7 +1373,7 @@ static void disas_uncond_b_imm(DisasContext *s, uint32_t 
insn)
 
 /* B Branch / BL Branch with link */
 reset_btype(s);
-gen_goto_tb(s, 0, addr);
+gen_goto_tb(s, 0, diff);
 }
 
 /* Compare and branch (immediate)
@@ -1383,14 +1385,14 @@ static void disas_uncond_b_imm(DisasContext *s, 
uint32_t insn)
 static void disas_comp_b_imm(DisasContext *s, uint32_t insn)
 {
 unsigned int sf, op, rt;
-uint64_t addr;
+int diff;
 TCGLabel *label_match;
 TCGv_i64 tcg_cmp;
 
 sf = extract32(insn, 31, 1);
 op = extract32(insn, 24, 1); /* 0: CBZ; 1: CBNZ */
 rt = extract32(insn, 0, 5);
-addr = s->pc_curr + sextract32(insn, 5, 19) * 4;
+diff = sextract32(insn, 5, 19) * 4;
 
 tcg_cmp = read_cpu_reg(s, rt, sf);
 label_match = gen_new_label();
@@ -1399,9 +1401,9 @@ static void disas_comp_b_imm(DisasContext *s, uint32_t 
insn)
 tcg_gen_brcondi_i64(op ? TCG_COND_NE : TCG_COND_EQ,
 tcg_cmp, 0, label_match);
 
-gen_goto_tb(s, 0, s->base.pc_next);
+gen_goto_tb(s, 0, 4);
 gen_set_label(label_match);
-gen_goto_tb(s, 1, addr);
+gen_goto_tb(s, 1, diff);
 }
 
 /* Test and branch (immediate)
@@ -1413,13 +1415,13 @@ static void disas_comp_b_imm(DisasContext *s, uint32_t 
insn)
 static void disas_test_b_imm(DisasContext *s, uint32_t insn)
 {
 unsigned int bit_pos, op, rt;
-uint64_t addr;
+int diff;
 TCGLabel *label_match;
 TCGv_i64 tcg_cmp;
 
 bit_pos = (extract32(insn, 31, 1) << 5) | extract32(insn, 19, 5);
 op = extract32(insn, 24, 1); /* 0: TBZ; 1: TBNZ */
-addr = s->pc_curr + sextract32(insn, 5, 14) * 4;
+diff = sextract32(insn, 5, 14) * 4;
 rt = extract32(insn, 0, 5);
 
 tcg_cmp = tcg_temp_new_i64();
@@ -1430,9 +1432,9 @@ static void disas_test_b_imm(DisasContext *s, uint32_t 
insn)
 tcg_gen_brcondi_i64(op ? TCG_COND_NE : TCG_COND_EQ,
 tcg_cmp, 0, label_match);
 tcg_temp_free_i64(tcg_cmp);
-gen_goto_tb(s, 0, s->base.pc_next);
+gen_goto_tb(s, 0, 4);
 gen_set_label(label_match);
-gen_goto_tb(s, 1, addr);
+gen_goto_tb(s, 1, diff);
 }
 
 /* Conditional branch (immediate)
@@ -1444,13 +1446,13 @@ static void disas_test_b_imm(DisasContext *s, uint32_t 
insn)
 static void disas_cond_b_imm(DisasContext *s, uint32_t insn)
 {
 unsigned int cond;
-uint64_t addr;
+int diff;
 
 if ((insn & (1 << 4)) || (insn & (1 << 24))) {
 unallocated_encoding(s);
 return;
 }
-addr = s->pc_curr + sextract32(insn, 5, 19) * 4;
+diff = sextract32(insn, 5, 19) * 4;
 cond = extract32(insn, 0, 4);
 
 reset_btype(s);
@@ -1458,12 +1460,12 @@ static void disas_cond_b_imm(DisasContext *s, uint32_t 
insn)
 /* genuinely conditional branches */
 TCGLabel *label_match = gen_new_label();
 arm_gen_test_cc(cond, label_match);
-gen_goto_tb(s, 0, s->base.pc_next);
+gen_goto_tb(s, 0, 4);
 gen_set_label(label_match);
-gen_goto_tb(s, 1, addr);
+gen_goto_tb(s, 1, diff);
 } else {
 /* 0xe and 0xf are both "always" conditions */
-gen_goto_tb(s, 0, addr);
+gen_goto_tb(s, 0, diff);
 }
 }
 
@@ -1637,7 +1639,7 @@ static void handle_sync(DisasContext *s, uint32_t insn,
  * any pending interrupts immediately.
  */
 reset_btype(s);
-gen_goto_tb(s, 0, s->base.pc_next);
+gen_goto_tb(s, 0, 4);
 return;
 
 case 7: /* SB */
@@ -1649,7 +1651,7 @@ static void handle_sync(DisasContext *s, uint32_t insn,
  * MB and end the TB instead.
  */
 tcg_gen_mb(TCG_MO_ALL | TCG_BAR_SC);
-gen_goto_tb(s, 0, s->base.pc_next);
+gen_goto_tb(s, 0, 4);
 return;
 
 

[PATCH v2 20/33] accel/tcg: Do not align tb->page_addr[0]

2022-08-16 Thread Richard Henderson
Let tb->page_addr[0] contain the offset within the page of the
start of the translation block.  We need to recover this value
anyway at various points, and it is easier to discard the page
offset when it's not needed, which happens naturally via the
existing find_page shift.
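
Before/after, as a quick sketch (the local names are only for
illustration):

    /* Before: page_addr[0] was pre-masked, so the start address had to be
     * rebuilt from the pc's in-page offset: */
    tb_page_addr_t start_old = tb->page_addr[0] + (tb->pc & ~TARGET_PAGE_MASK);

    /* After: page_addr[0] already carries the offset; the page base, where
     * still needed, falls out of the existing shift/mask: */
    tb_page_addr_t start_new = tb->page_addr[0];
    tb_page_addr_t page_base = tb->page_addr[0] & TARGET_PAGE_MASK;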

Signed-off-by: Richard Henderson 
---
 accel/tcg/cpu-exec.c  | 16 
 accel/tcg/cputlb.c|  3 ++-
 accel/tcg/translate-all.c |  9 +
 3 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 7b8977a0a4..b1fd962718 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -174,7 +174,7 @@ struct tb_desc {
 target_ulong pc;
 target_ulong cs_base;
 CPUArchState *env;
-tb_page_addr_t phys_page1;
+tb_page_addr_t page_addr0;
 uint32_t flags;
 uint32_t cflags;
 uint32_t trace_vcpu_dstate;
@@ -186,7 +186,7 @@ static bool tb_lookup_cmp(const void *p, const void *d)
 const struct tb_desc *desc = d;
 
 if (tb->pc == desc->pc &&
-tb->page_addr[0] == desc->phys_page1 &&
+tb->page_addr[0] == desc->page_addr0 &&
 tb->cs_base == desc->cs_base &&
 tb->flags == desc->flags &&
 tb->trace_vcpu_dstate == desc->trace_vcpu_dstate &&
@@ -195,12 +195,12 @@ static bool tb_lookup_cmp(const void *p, const void *d)
 if (tb->page_addr[1] == -1) {
 return true;
 } else {
-tb_page_addr_t phys_page2;
-target_ulong virt_page2;
+tb_page_addr_t phys_page1;
+target_ulong virt_page1;
 
-virt_page2 = (desc->pc & TARGET_PAGE_MASK) + TARGET_PAGE_SIZE;
-phys_page2 = get_page_addr_code(desc->env, virt_page2);
-if (tb->page_addr[1] == phys_page2) {
+virt_page1 = TARGET_PAGE_ALIGN(desc->pc);
+phys_page1 = get_page_addr_code(desc->env, virt_page1);
+if (tb->page_addr[1] == phys_page1) {
 return true;
 }
 }
@@ -226,7 +226,7 @@ static TranslationBlock *tb_htable_lookup(CPUState *cpu, 
target_ulong pc,
 if (phys_pc == -1) {
 return NULL;
 }
-desc.phys_page1 = phys_pc & TARGET_PAGE_MASK;
+desc.page_addr0 = phys_pc;
 h = tb_hash_func(phys_pc, pc, flags, cflags, *cpu->trace_dstate);
 return qht_lookup_custom(&tb_ctx.htable, &desc, h, tb_lookup_cmp);
 }
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index ae7b40dd51..8b81b07b79 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -951,7 +951,8 @@ void tlb_flush_page_bits_by_mmuidx_all_cpus_synced(CPUState 
*src_cpu,
can be detected */
 void tlb_protect_code(ram_addr_t ram_addr)
 {
-cpu_physical_memory_test_and_clear_dirty(ram_addr, TARGET_PAGE_SIZE,
+cpu_physical_memory_test_and_clear_dirty(ram_addr & TARGET_PAGE_MASK,
+ TARGET_PAGE_SIZE,
  DIRTY_MEMORY_CODE);
 }
 
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index a8f1c34c4e..20f00f4335 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1167,7 +1167,7 @@ static void do_tb_phys_invalidate(TranslationBlock *tb, 
bool rm_from_page_list)
 qemu_spin_unlock(>jmp_lock);
 
 /* remove the TB from the hash list */
-phys_pc = tb->page_addr[0] + (tb->pc & ~TARGET_PAGE_MASK);
+phys_pc = tb->page_addr[0];
 h = tb_hash_func(phys_pc, tb->pc, tb->flags, orig_cflags,
  tb->trace_vcpu_dstate);
 if (!qht_remove(_ctx.htable, tb, h)) {
@@ -1291,7 +1291,7 @@ tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
  * we can only insert TBs that are fully initialized.
  */
 page_lock_pair(&p, phys_pc, &p2, phys_page2, true);
-tb_page_add(p, tb, 0, phys_pc & TARGET_PAGE_MASK);
+tb_page_add(p, tb, 0, phys_pc);
 if (p2) {
 tb_page_add(p2, tb, 1, phys_page2);
 } else {
@@ -1644,11 +1644,12 @@ tb_invalidate_phys_page_range__locked(struct 
page_collection *pages,
 if (n == 0) {
 /* NOTE: tb_end may be after the end of the page, but
it is not a problem */
-tb_start = tb->page_addr[0] + (tb->pc & ~TARGET_PAGE_MASK);
+tb_start = tb->page_addr[0];
 tb_end = tb_start + tb->size;
 } else {
 tb_start = tb->page_addr[1];
-tb_end = tb_start + ((tb->pc + tb->size) & ~TARGET_PAGE_MASK);
+tb_end = tb_start + ((tb->page_addr[0] + tb->size)
+ & ~TARGET_PAGE_MASK);
 }
 if (!(tb_end <= start || tb_start >= end)) {
 #ifdef TARGET_HAS_PRECISE_SMC
-- 
2.34.1




[PATCH v2 28/33] target/arm: Change gen_exception_insn* to work on displacements

2022-08-16 Thread Richard Henderson
In preparation for TARGET_TB_PCREL, reduce reliance on absolute values.

Signed-off-by: Richard Henderson 
---
 target/arm/translate.h|  4 ++--
 target/arm/translate-a64.c| 28 +++--
 target/arm/translate-m-nocp.c |  6 +++---
 target/arm/translate-mve.c|  2 +-
 target/arm/translate-vfp.c|  6 +++---
 target/arm/translate.c| 39 +--
 6 files changed, 40 insertions(+), 45 deletions(-)

diff --git a/target/arm/translate.h b/target/arm/translate.h
index 33b94a18bb..d42059aa1d 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -281,9 +281,9 @@ void arm_jump_cc(DisasCompare *cmp, TCGLabel *label);
 void arm_gen_test_cc(int cc, TCGLabel *label);
 MemOp pow2_align(unsigned i);
 void unallocated_encoding(DisasContext *s);
-void gen_exception_insn_el(DisasContext *s, uint64_t pc, int excp,
+void gen_exception_insn_el(DisasContext *s, int pc_diff, int excp,
uint32_t syn, uint32_t target_el);
-void gen_exception_insn(DisasContext *s, uint64_t pc, int excp, uint32_t syn);
+void gen_exception_insn(DisasContext *s, int pc_diff, int excp, uint32_t syn);
 
 /* Return state of Alternate Half-precision flag, caller frees result */
 static inline TCGv_i32 get_ahp_flag(void)
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 90f31b1dff..422ce9288d 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -1163,7 +1163,7 @@ static bool fp_access_check_only(DisasContext *s)
 assert(!s->fp_access_checked);
 s->fp_access_checked = true;
 
-gen_exception_insn_el(s, s->pc_curr, EXCP_UDEF,
+gen_exception_insn_el(s, 0, EXCP_UDEF,
   syn_fp_access_trap(1, 0xe, false, 0),
   s->fp_excp_el);
 return false;
@@ -1178,7 +1178,7 @@ static bool fp_access_check(DisasContext *s)
 return false;
 }
 if (s->sme_trap_nonstreaming && s->is_nonstreaming) {
-gen_exception_insn(s, s->pc_curr, EXCP_UDEF,
+gen_exception_insn(s, 0, EXCP_UDEF,
syn_smetrap(SME_ET_Streaming, false));
 return false;
 }
@@ -1198,7 +1198,7 @@ bool sve_access_check(DisasContext *s)
 goto fail_exit;
 }
 } else if (s->sve_excp_el) {
-gen_exception_insn_el(s, s->pc_curr, EXCP_UDEF,
+gen_exception_insn_el(s, 0, EXCP_UDEF,
   syn_sve_access_trap(), s->sve_excp_el);
 goto fail_exit;
 }
@@ -1220,7 +1220,7 @@ bool sve_access_check(DisasContext *s)
 static bool sme_access_check(DisasContext *s)
 {
 if (s->sme_excp_el) {
-gen_exception_insn_el(s, s->pc_curr, EXCP_UDEF,
+gen_exception_insn_el(s, 0, EXCP_UDEF,
   syn_smetrap(SME_ET_AccessTrap, false),
   s->sme_excp_el);
 return false;
@@ -1250,12 +1250,12 @@ bool sme_enabled_check_with_svcr(DisasContext *s, 
unsigned req)
 return false;
 }
 if (FIELD_EX64(req, SVCR, SM) && !s->pstate_sm) {
-gen_exception_insn(s, s->pc_curr, EXCP_UDEF,
+gen_exception_insn(s, 0, EXCP_UDEF,
syn_smetrap(SME_ET_NotStreaming, false));
 return false;
 }
 if (FIELD_EX64(req, SVCR, ZA) && !s->pstate_za) {
-gen_exception_insn(s, s->pc_curr, EXCP_UDEF,
+gen_exception_insn(s, 0, EXCP_UDEF,
syn_smetrap(SME_ET_InactiveZA, false));
 return false;
 }
@@ -1915,7 +1915,7 @@ static void gen_sysreg_undef(DisasContext *s, bool isread,
 } else {
 syndrome = syn_uncategorized();
 }
-gen_exception_insn(s, s->pc_curr, EXCP_UDEF, syndrome);
+gen_exception_insn(s, 0, EXCP_UDEF, syndrome);
 }
 
 /* MRS - move from system register
@@ -2169,8 +2169,7 @@ static void disas_exc(DisasContext *s, uint32_t insn)
 switch (op2_ll) {
 case 1: /* SVC */
 gen_ss_advance(s);
-gen_exception_insn(s, s->base.pc_next, EXCP_SWI,
-   syn_aa64_svc(imm16));
+gen_exception_insn(s, 4, EXCP_SWI, syn_aa64_svc(imm16));
 break;
 case 2: /* HVC */
 if (s->current_el == 0) {
@@ -2183,8 +2182,7 @@ static void disas_exc(DisasContext *s, uint32_t insn)
 gen_a64_update_pc(s, 0);
 gen_helper_pre_hvc(cpu_env);
 gen_ss_advance(s);
-gen_exception_insn_el(s, s->base.pc_next, EXCP_HVC,
-  syn_aa64_hvc(imm16), 2);
+gen_exception_insn_el(s, 4, EXCP_HVC, syn_aa64_hvc(imm16), 2);
 break;
 case 3: /* SMC */
 if (s->current_el == 0) {
@@ -2194,8 +2192,7 @@ static void disas_exc(DisasContext *s, uint32_t 

[PATCH v2 24/33] accel/tcg: Split log_cpu_exec into inline and slow path

2022-08-16 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 accel/tcg/cpu-exec.c | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index f7c82a8f2c..d758396bcd 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -283,12 +283,11 @@ static inline TranslationBlock *tb_lookup(CPUState *cpu, 
target_ulong pc,
 return tb;
 }
 
-static inline void log_cpu_exec(target_ulong pc, CPUState *cpu,
-const TranslationBlock *tb)
+static void log_cpu_exec1(CPUState *cpu, const TranslationBlock *tb)
 {
-if (unlikely(qemu_loglevel_mask(CPU_LOG_TB_CPU | CPU_LOG_EXEC))
-&& qemu_log_in_addr_range(pc)) {
+target_ulong pc = tb_pc_log(tb);
 
+if (qemu_log_in_addr_range(pc)) {
 qemu_log_mask(CPU_LOG_EXEC,
   "Trace %d: %p [" TARGET_FMT_lx
   "/" TARGET_FMT_lx "/%08x/%08x] %s\n",
@@ -315,6 +314,13 @@ static inline void log_cpu_exec(target_ulong pc, CPUState 
*cpu,
 }
 }
 
+static inline void log_cpu_exec(CPUState *cpu, const TranslationBlock *tb)
+{
+if (unlikely(qemu_loglevel_mask(CPU_LOG_TB_CPU | CPU_LOG_EXEC))) {
+log_cpu_exec1(cpu, tb);
+}
+}
+
 static bool check_for_breakpoints(CPUState *cpu, target_ulong pc,
   uint32_t *cflags)
 {
@@ -412,7 +418,7 @@ const void *HELPER(lookup_tb_ptr)(CPUArchState *env)
 return tcg_code_gen_epilogue;
 }
 
-log_cpu_exec(pc, cpu, tb);
+log_cpu_exec(cpu, tb);
 
 return tb->tc.ptr;
 }
@@ -435,7 +441,7 @@ cpu_tb_exec(CPUState *cpu, TranslationBlock *itb, int 
*tb_exit)
 TranslationBlock *last_tb;
 const void *tb_ptr = itb->tc.ptr;
 
-log_cpu_exec(tb_pc_log(itb), cpu, itb);
+log_cpu_exec(cpu, itb);
 
 qemu_thread_jit_execute();
 ret = tcg_qemu_tb_exec(env, tb_ptr);
-- 
2.34.1




[PATCH v2 21/33] include/hw/core: Create struct CPUJumpCache

2022-08-16 Thread Richard Henderson
Wrap the bare TranslationBlock pointer into a structure.

Signed-off-by: Richard Henderson 
---
 include/hw/core/cpu.h | 8 ++--
 accel/tcg/cpu-exec.c  | 9 ++---
 accel/tcg/cputlb.c| 2 +-
 accel/tcg/translate-all.c | 4 ++--
 4 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index 500503da13..8edef14199 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -233,6 +233,10 @@ struct hvf_vcpu_state;
 #define TB_JMP_CACHE_BITS 12
 #define TB_JMP_CACHE_SIZE (1 << TB_JMP_CACHE_BITS)
 
+typedef struct {
+TranslationBlock *tb;
+} CPUJumpCache;
+
 /* work queue */
 
 /* The union type allows passing of 64 bit target pointers on 32 bit
@@ -362,7 +366,7 @@ struct CPUState {
 IcountDecr *icount_decr_ptr;
 
 /* Accessed in parallel; all accesses must be atomic */
-TranslationBlock *tb_jmp_cache[TB_JMP_CACHE_SIZE];
+CPUJumpCache tb_jmp_cache[TB_JMP_CACHE_SIZE];
 
 struct GDBRegisterState *gdb_regs;
 int gdb_num_regs;
@@ -453,7 +457,7 @@ static inline void cpu_tb_jmp_cache_clear(CPUState *cpu)
 unsigned int i;
 
 for (i = 0; i < TB_JMP_CACHE_SIZE; i++) {
-qatomic_set(&cpu->tb_jmp_cache[i], NULL);
+qatomic_set(&cpu->tb_jmp_cache[i].tb, NULL);
 }
 }
 
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index b1fd962718..3f8e4bbbc8 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -243,7 +243,7 @@ static inline TranslationBlock *tb_lookup(CPUState *cpu, 
target_ulong pc,
 tcg_debug_assert(!(cflags & CF_INVALID));
 
 hash = tb_jmp_cache_hash_func(pc);
-tb = qatomic_rcu_read(&cpu->tb_jmp_cache[hash]);
+tb = qatomic_rcu_read(&cpu->tb_jmp_cache[hash].tb);
 
 if (likely(tb &&
tb->pc == pc &&
@@ -257,7 +257,7 @@ static inline TranslationBlock *tb_lookup(CPUState *cpu, 
target_ulong pc,
 if (tb == NULL) {
 return NULL;
 }
-qatomic_set(&cpu->tb_jmp_cache[hash], tb);
+qatomic_set(&cpu->tb_jmp_cache[hash].tb, tb);
 return tb;
 }
 
@@ -978,6 +978,8 @@ int cpu_exec(CPUState *cpu)
 
 tb = tb_lookup(cpu, pc, cs_base, flags, cflags);
 if (tb == NULL) {
+uint32_t h;
+
 mmap_lock();
 tb = tb_gen_code(cpu, pc, cs_base, flags, cflags);
 mmap_unlock();
@@ -985,7 +987,8 @@ int cpu_exec(CPUState *cpu)
  * We add the TB in the virtual pc hash table
  * for the fast lookup
  */
-qatomic_set(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)], 
tb);
+h = tb_jmp_cache_hash_func(pc);
+qatomic_set(&cpu->tb_jmp_cache[h].tb, tb);
 }
 
 #ifndef CONFIG_USER_ONLY
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 8b81b07b79..a8afe1ab9f 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -103,7 +103,7 @@ static void tb_jmp_cache_clear_page(CPUState *cpu, 
target_ulong page_addr)
 unsigned int i, i0 = tb_jmp_cache_hash_page(page_addr);
 
 for (i = 0; i < TB_JMP_PAGE_SIZE; i++) {
-qatomic_set(&cpu->tb_jmp_cache[i0 + i], NULL);
+qatomic_set(&cpu->tb_jmp_cache[i0 + i].tb, NULL);
 }
 }
 
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 20f00f4335..c2745f14a6 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1187,8 +1187,8 @@ static void do_tb_phys_invalidate(TranslationBlock *tb, 
bool rm_from_page_list)
 /* remove the TB from the hash list */
 h = tb_jmp_cache_hash_func(tb->pc);
 CPU_FOREACH(cpu) {
-if (qatomic_read(&cpu->tb_jmp_cache[h]) == tb) {
-qatomic_set(&cpu->tb_jmp_cache[h], NULL);
+if (qatomic_read(&cpu->tb_jmp_cache[h].tb) == tb) {
+qatomic_set(&cpu->tb_jmp_cache[h].tb, NULL);
 }
 }
 
-- 
2.34.1




[PATCH v2 17/33] accel/tcg: Add pc and host_pc params to gen_intermediate_code

2022-08-16 Thread Richard Henderson
Pass these along to translator_loop -- pc may be used instead
of tb->pc, and host_pc is currently unused.  Adjust all targets
at one time.
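
As an illustration (not part of the patch), a converted target reduces
to a thin wrapper that forwards the two new parameters; the ops
structure and DisasContext names below are placeholders rather than
any particular target's:

    void gen_intermediate_code(CPUState *cpu, TranslationBlock *tb, int max_insns,
                               target_ulong pc, void *host_pc)
    {
        DisasContext dc;

        /* pc and host_pc are simply passed through to the generic loop. */
        translator_loop(cpu, tb, max_insns, pc, host_pc,
                        &mytarget_tr_ops, &dc.base);
    }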

Signed-off-by: Richard Henderson 
---
 include/exec/exec-all.h   |  1 -
 include/exec/translator.h | 24 
 accel/tcg/translate-all.c |  3 ++-
 accel/tcg/translator.c|  9 +
 target/alpha/translate.c  |  5 +++--
 target/arm/translate.c|  5 +++--
 target/avr/translate.c|  5 +++--
 target/cris/translate.c   |  5 +++--
 target/hexagon/translate.c|  6 --
 target/hppa/translate.c   |  5 +++--
 target/i386/tcg/translate.c   |  5 +++--
 target/loongarch/translate.c  |  6 --
 target/m68k/translate.c   |  5 +++--
 target/microblaze/translate.c |  5 +++--
 target/mips/tcg/translate.c   |  5 +++--
 target/nios2/translate.c  |  5 +++--
 target/openrisc/translate.c   |  6 --
 target/ppc/translate.c|  5 +++--
 target/riscv/translate.c  |  5 +++--
 target/rx/translate.c |  5 +++--
 target/s390x/tcg/translate.c  |  5 +++--
 target/sh4/translate.c|  5 +++--
 target/sparc/translate.c  |  5 +++--
 target/tricore/translate.c|  6 --
 target/xtensa/translate.c |  6 --
 25 files changed, 95 insertions(+), 52 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 7a6dc44d86..4ad166966b 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -39,7 +39,6 @@ typedef ram_addr_t tb_page_addr_t;
 #define TB_PAGE_ADDR_FMT RAM_ADDR_FMT
 #endif
 
-void gen_intermediate_code(CPUState *cpu, TranslationBlock *tb, int max_insns);
 void restore_state_to_opc(CPUArchState *env, TranslationBlock *tb,
   target_ulong *data);
 
diff --git a/include/exec/translator.h b/include/exec/translator.h
index 45b9268ca4..69db0f5c21 100644
--- a/include/exec/translator.h
+++ b/include/exec/translator.h
@@ -26,6 +26,19 @@
 #include "exec/translate-all.h"
 #include "tcg/tcg.h"
 
+/**
+ * gen_intermediate_code
+ * @cpu: cpu context
+ * @tb: translation block
+ * @max_insns: max number of instructions to translate
+ * @pc: guest virtual program counter address
+ * @host_pc: host physical program counter address
+ *
+ * This function must be provided by the target, which should create
+ * the target-specific DisasContext, and then invoke translator_loop.
+ */
+void gen_intermediate_code(CPUState *cpu, TranslationBlock *tb, int max_insns,
+   target_ulong pc, void *host_pc);
 
 /**
  * DisasJumpType:
@@ -123,11 +136,13 @@ typedef struct TranslatorOps {
 
 /**
  * translator_loop:
- * @ops: Target-specific operations.
- * @db: Disassembly context.
  * @cpu: Target vCPU.
  * @tb: Translation block.
  * @max_insns: Maximum number of insns to translate.
+ * @pc: guest virtual program counter address
+ * @host_pc: host physical program counter address
+ * @ops: Target-specific operations.
+ * @db: Disassembly context.
  *
  * Generic translator loop.
  *
@@ -141,8 +156,9 @@ typedef struct TranslatorOps {
  * - When single-stepping is enabled (system-wide or on the current vCPU).
  * - When too many instructions have been translated.
  */
-void translator_loop(const TranslatorOps *ops, DisasContextBase *db,
- CPUState *cpu, TranslationBlock *tb, int max_insns);
+void translator_loop(CPUState *cpu, TranslationBlock *tb, int max_insns,
+ target_ulong pc, void *host_pc,
+ const TranslatorOps *ops, DisasContextBase *db);
 
 void translator_loop_temp_check(DisasContextBase *db);
 
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index a5ca424f13..7360ecdb38 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -46,6 +46,7 @@
 
 #include "exec/cputlb.h"
 #include "exec/translate-all.h"
+#include "exec/translator.h"
 #include "qemu/bitmap.h"
 #include "qemu/qemu-print.h"
 #include "qemu/timer.h"
@@ -1391,7 +1392,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
 tcg_func_start(tcg_ctx);
 
 tcg_ctx->cpu = env_cpu(env);
-gen_intermediate_code(cpu, tb, max_insns);
+gen_intermediate_code(cpu, tb, max_insns, pc, host_pc);
 assert(tb->size != 0);
 tcg_ctx->cpu = NULL;
 max_insns = tb->icount;
diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
index fe7af9b943..3eef30d93a 100644
--- a/accel/tcg/translator.c
+++ b/accel/tcg/translator.c
@@ -51,16 +51,17 @@ static inline void translator_page_protect(DisasContextBase 
*dcbase,
 #endif
 }
 
-void translator_loop(const TranslatorOps *ops, DisasContextBase *db,
- CPUState *cpu, TranslationBlock *tb, int max_insns)
+void translator_loop(CPUState *cpu, TranslationBlock *tb, int max_insns,
+ target_ulong pc, void *host_pc,
+ const TranslatorOps *ops, DisasContextBase *db)
 {
 uint32_t cflags = tb_cflags(tb);
 bool plugin_enabled;
 
 /* Initialize DisasContext */
 

[PATCH v2 29/33] target/arm: Change gen_exception_internal to work on displacements

2022-08-16 Thread Richard Henderson
In preparation for TARGET_TB_PCREL, reduce reliance on absolute values.

Signed-off-by: Richard Henderson 
---
 target/arm/translate-a64.c |  6 +++---
 target/arm/translate.c | 10 +-
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 422ce9288d..b42643 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -340,9 +340,9 @@ static void gen_exception_internal(int excp)
 gen_helper_exception_internal(cpu_env, tcg_constant_i32(excp));
 }
 
-static void gen_exception_internal_insn(DisasContext *s, uint64_t pc, int excp)
+static void gen_exception_internal_insn(DisasContext *s, int pc_diff, int excp)
 {
-gen_a64_update_pc(s, pc - s->pc_curr);
+gen_a64_update_pc(s, pc_diff);
 gen_exception_internal(excp);
 s->base.is_jmp = DISAS_NORETURN;
 }
@@ -2229,7 +2229,7 @@ static void disas_exc(DisasContext *s, uint32_t insn)
 break;
 }
 #endif
-gen_exception_internal_insn(s, s->pc_curr, EXCP_SEMIHOST);
+gen_exception_internal_insn(s, 0, EXCP_SEMIHOST);
 } else {
 unallocated_encoding(s);
 }
diff --git a/target/arm/translate.c b/target/arm/translate.c
index d441e31d3a..63a41ed438 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -1078,10 +1078,10 @@ static inline void gen_smc(DisasContext *s)
 s->base.is_jmp = DISAS_SMC;
 }
 
-static void gen_exception_internal_insn(DisasContext *s, uint32_t pc, int excp)
+static void gen_exception_internal_insn(DisasContext *s, int pc_diff, int excp)
 {
 gen_set_condexec(s);
-gen_update_pc(s, pc - s->pc_curr);
+gen_update_pc(s, pc_diff);
 gen_exception_internal(excp);
 s->base.is_jmp = DISAS_NORETURN;
 }
@@ -1175,7 +1175,7 @@ static inline void gen_hlt(DisasContext *s, int imm)
 s->current_el != 0 &&
 #endif
 (imm == (s->thumb ? 0x3c : 0xf000))) {
-gen_exception_internal_insn(s, s->pc_curr, EXCP_SEMIHOST);
+gen_exception_internal_insn(s, 0, EXCP_SEMIHOST);
 return;
 }
 
@@ -6565,7 +6565,7 @@ static bool trans_BKPT(DisasContext *s, arg_BKPT *a)
 !IS_USER(s) &&
 #endif
 (a->imm == 0xab)) {
-gen_exception_internal_insn(s, s->pc_curr, EXCP_SEMIHOST);
+gen_exception_internal_insn(s, 0, EXCP_SEMIHOST);
 } else {
 gen_exception_bkpt_insn(s, syn_aa32_bkpt(a->imm, false));
 }
@@ -8773,7 +8773,7 @@ static bool trans_SVC(DisasContext *s, arg_SVC *a)
 !IS_USER(s) &&
 #endif
 (a->imm == semihost_imm)) {
-gen_exception_internal_insn(s, s->pc_curr, EXCP_SEMIHOST);
+gen_exception_internal_insn(s, 0, EXCP_SEMIHOST);
 } else {
 gen_update_pc(s, curr_insn_len(s));
 s->svc_imm = a->imm;
-- 
2.34.1




[PATCH v2 10/33] accel/tcg: Properly implement get_page_addr_code for user-only

2022-08-16 Thread Richard Henderson
The current implementation is a no-op, simply returning addr.
This is incorrect, because we ought to be checking the page
permissions for execution.

Make get_page_addr_code inline for both implementations.
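
To illustrate the intent, the user-only lookup now has to consult the
page flags instead of blindly returning addr; a simplified sketch of
the idea (the code actually added in accel/tcg/user-exec.c also raises
the fault itself rather than only returning -1):

    tb_page_addr_t get_page_addr_code_hostp(CPUArchState *env, target_ulong addr,
                                            void **hostp)
    {
        int flags = page_get_flags(addr);

        /* Refuse pages that are unmapped or lack execute permission. */
        if (!(flags & PAGE_VALID) || !(flags & PAGE_EXEC)) {
            return -1;
        }
        if (hostp) {
            *hostp = g2h_untagged(addr);   /* user-only: direct guest->host map */
        }
        return addr;
    }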

Signed-off-by: Richard Henderson 
---
 include/exec/exec-all.h | 85 ++---
 accel/tcg/cputlb.c  |  5 ---
 accel/tcg/user-exec.c   | 15 
 3 files changed, 43 insertions(+), 62 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index e7e30d55b8..9f35e3b7a9 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -595,43 +595,44 @@ struct MemoryRegionSection *iotlb_to_section(CPUState 
*cpu,
  hwaddr index, MemTxAttrs attrs);
 #endif
 
-#if defined(CONFIG_USER_ONLY)
-void mmap_lock(void);
-void mmap_unlock(void);
-bool have_mmap_lock(void);
-
 /**
- * get_page_addr_code() - user-mode version
+ * get_page_addr_code_hostp()
  * @env: CPUArchState
  * @addr: guest virtual address of guest code
  *
- * Returns @addr.
+ * See get_page_addr_code() (full-system version) for documentation on the
+ * return value.
+ *
+ * Sets *@hostp (when @hostp is non-NULL) as follows.
+ * If the return value is -1, sets *@hostp to NULL. Otherwise, sets *@hostp
+ * to the host address where @addr's content is kept.
+ *
+ * Note: this function can trigger an exception.
+ */
+tb_page_addr_t get_page_addr_code_hostp(CPUArchState *env, target_ulong addr,
+void **hostp);
+
+/**
+ * get_page_addr_code()
+ * @env: CPUArchState
+ * @addr: guest virtual address of guest code
+ *
+ * If we cannot translate and execute from the entire RAM page, or if
+ * the region is not backed by RAM, returns -1. Otherwise, returns the
+ * ram_addr_t corresponding to the guest code at @addr.
+ *
+ * Note: this function can trigger an exception.
  */
 static inline tb_page_addr_t get_page_addr_code(CPUArchState *env,
 target_ulong addr)
 {
-return addr;
+return get_page_addr_code_hostp(env, addr, NULL);
 }
 
-/**
- * get_page_addr_code_hostp() - user-mode version
- * @env: CPUArchState
- * @addr: guest virtual address of guest code
- *
- * Returns @addr.
- *
- * If @hostp is non-NULL, sets *@hostp to the host address where @addr's 
content
- * is kept.
- */
-static inline tb_page_addr_t get_page_addr_code_hostp(CPUArchState *env,
-  target_ulong addr,
-  void **hostp)
-{
-if (hostp) {
-*hostp = g2h_untagged(addr);
-}
-return addr;
-}
+#if defined(CONFIG_USER_ONLY)
+void mmap_lock(void);
+void mmap_unlock(void);
+bool have_mmap_lock(void);
 
 /**
  * adjust_signal_pc:
@@ -688,36 +689,6 @@ G_NORETURN void cpu_loop_exit_sigbus(CPUState *cpu, 
target_ulong addr,
 static inline void mmap_lock(void) {}
 static inline void mmap_unlock(void) {}
 
-/**
- * get_page_addr_code() - full-system version
- * @env: CPUArchState
- * @addr: guest virtual address of guest code
- *
- * If we cannot translate and execute from the entire RAM page, or if
- * the region is not backed by RAM, returns -1. Otherwise, returns the
- * ram_addr_t corresponding to the guest code at @addr.
- *
- * Note: this function can trigger an exception.
- */
-tb_page_addr_t get_page_addr_code(CPUArchState *env, target_ulong addr);
-
-/**
- * get_page_addr_code_hostp() - full-system version
- * @env: CPUArchState
- * @addr: guest virtual address of guest code
- *
- * See get_page_addr_code() (full-system version) for documentation on the
- * return value.
- *
- * Sets *@hostp (when @hostp is non-NULL) as follows.
- * If the return value is -1, sets *@hostp to NULL. Otherwise, sets *@hostp
- * to the host address where @addr's content is kept.
- *
- * Note: this function can trigger an exception.
- */
-tb_page_addr_t get_page_addr_code_hostp(CPUArchState *env, target_ulong addr,
-void **hostp);
-
 void tlb_reset_dirty(CPUState *cpu, ram_addr_t start1, ram_addr_t length);
 void tlb_set_dirty(CPUState *cpu, target_ulong vaddr);
 
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 5db56bcd1e..80a3eb4f1c 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -1532,11 +1532,6 @@ tb_page_addr_t get_page_addr_code_hostp(CPUArchState 
*env, target_ulong addr,
 return qemu_ram_addr_from_host_nofail(p);
 }
 
-tb_page_addr_t get_page_addr_code(CPUArchState *env, target_ulong addr)
-{
-return get_page_addr_code_hostp(env, addr, NULL);
-}
-
 static void notdirty_write(CPUState *cpu, vaddr mem_vaddr, unsigned size,
CPUIOTLBEntry *iotlbentry, uintptr_t retaddr)
 {
diff --git a/accel/tcg/user-exec.c b/accel/tcg/user-exec.c
index 20ada5472b..a20234fb02 100644
--- a/accel/tcg/user-exec.c
+++ b/accel/tcg/user-exec.c
@@ -199,6 +199,21 @@ void *probe_access(CPUArchState *env, 

[PATCH v2 31/33] target/arm: Introduce gen_pc_plus_diff for aarch64

2022-08-16 Thread Richard Henderson
In preparation for TARGET_TB_PCREL, reduce reliance on absolute values.

Signed-off-by: Richard Henderson 
---
 target/arm/translate-a64.c | 41 +++---
 1 file changed, 29 insertions(+), 12 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index b42643..322a09c503 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -148,9 +148,14 @@ static void reset_btype(DisasContext *s)
 }
 }
 
+static void gen_pc_plus_diff(DisasContext *s, TCGv_i64 dest, int diff)
+{
+tcg_gen_movi_i64(dest, s->pc_curr + diff);
+}
+
 void gen_a64_update_pc(DisasContext *s, int diff)
 {
-tcg_gen_movi_i64(cpu_pc, s->pc_curr + diff);
+gen_pc_plus_diff(s, cpu_pc, diff);
 }
 
 /*
@@ -1368,7 +1373,7 @@ static void disas_uncond_b_imm(DisasContext *s, uint32_t 
insn)
 
 if (insn & (1U << 31)) {
 /* BL Branch with link */
-tcg_gen_movi_i64(cpu_reg(s, 30), s->base.pc_next);
+gen_pc_plus_diff(s, cpu_reg(s, 30), curr_insn_len(s));
 }
 
 /* B Branch / BL Branch with link */
@@ -2319,11 +2324,17 @@ static void disas_uncond_b_reg(DisasContext *s, 
uint32_t insn)
 default:
 goto do_unallocated;
 }
-gen_a64_set_pc(s, dst);
 /* BLR also needs to load return address */
 if (opc == 1) {
-tcg_gen_movi_i64(cpu_reg(s, 30), s->base.pc_next);
+TCGv_i64 lr = cpu_reg(s, 30);
+if (dst == lr) {
+TCGv_i64 tmp = new_tmp_a64(s);
+tcg_gen_mov_i64(tmp, dst);
+dst = tmp;
+}
+gen_pc_plus_diff(s, lr, curr_insn_len(s));
 }
+gen_a64_set_pc(s, dst);
 break;
 
 case 8: /* BRAA */
@@ -2346,11 +2357,17 @@ static void disas_uncond_b_reg(DisasContext *s, 
uint32_t insn)
 } else {
 dst = cpu_reg(s, rn);
 }
-gen_a64_set_pc(s, dst);
 /* BLRAA also needs to load return address */
 if (opc == 9) {
-tcg_gen_movi_i64(cpu_reg(s, 30), s->base.pc_next);
+TCGv_i64 lr = cpu_reg(s, 30);
+if (dst == lr) {
+TCGv_i64 tmp = new_tmp_a64(s);
+tcg_gen_mov_i64(tmp, dst);
+dst = tmp;
+}
+gen_pc_plus_diff(s, lr, curr_insn_len(s));
 }
+gen_a64_set_pc(s, dst);
 break;
 
 case 4: /* ERET */
@@ -2918,7 +2935,8 @@ static void disas_ld_lit(DisasContext *s, uint32_t insn)
 
 tcg_rt = cpu_reg(s, rt);
 
-clean_addr = tcg_constant_i64(s->pc_curr + imm);
+clean_addr = new_tmp_a64(s);
+gen_pc_plus_diff(s, clean_addr, imm);
 if (is_vector) {
 do_fp_ld(s, rt, clean_addr, size);
 } else {
@@ -4262,23 +4280,22 @@ static void disas_ldst(DisasContext *s, uint32_t insn)
 static void disas_pc_rel_adr(DisasContext *s, uint32_t insn)
 {
 unsigned int page, rd;
-uint64_t base;
-uint64_t offset;
+int64_t offset;
 
 page = extract32(insn, 31, 1);
 /* SignExtend(immhi:immlo) -> offset */
 offset = sextract64(insn, 5, 19);
 offset = offset << 2 | extract32(insn, 29, 2);
 rd = extract32(insn, 0, 5);
-base = s->pc_curr;
 
 if (page) {
 /* ADRP (page based) */
-base &= ~0xfff;
 offset <<= 12;
+/* The page offset is ok for TARGET_TB_PCREL. */
+offset -= s->pc_curr & 0xfff;
 }
 
-tcg_gen_movi_i64(cpu_reg(s, rd), base + offset);
+gen_pc_plus_diff(s, cpu_reg(s, rd), offset);
 }
 
 /*
-- 
2.34.1




[PATCH v2 18/33] accel/tcg: Add fast path for translator_ld*

2022-08-16 Thread Richard Henderson
Cache the translation from guest to host address, so we may
use direct loads when we hit on the primary translation page.

Look up the second translation page only once, during translation.
This obviates another lookup of the second page within tb_gen_code
after translation.

Fixes a bug in that plugin_insn_append should be passed the bytes
in the original memory order, not bswapped by pieces.
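
Roughly, each translator_ld* now tries a direct host load from the
cached primary page before falling back to the full guest-code load; a
simplified sketch for the 32-bit case (assumes QEMU-internal headers;
the real helper also copes with loads that straddle the page boundary
and records the second page's mapping):

    uint32_t translator_ldl(CPUArchState *env, DisasContextBase *db, abi_ptr pc)
    {
        uint32_t raw;

        if (db->host_addr[0] &&
            is_same_page(db, pc) && is_same_page(db, pc + sizeof(raw) - 1)) {
            /* Fast path: bytes live on the already-translated first page. */
            memcpy(&raw, (const uint8_t *)db->host_addr[0] + (pc - db->pc_first),
                   sizeof(raw));
            return tswap32(raw);       /* guest memory order -> guest value */
        }
        /* Slow path: full (and possibly faulting) code load. */
        return cpu_ldl_code(env, pc);
    }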

Signed-off-by: Richard Henderson 
---
 include/exec/translator.h |  52 --
 accel/tcg/translate-all.c |  26 -
 accel/tcg/translator.c| 111 +++---
 3 files changed, 138 insertions(+), 51 deletions(-)

diff --git a/include/exec/translator.h b/include/exec/translator.h
index 69db0f5c21..177a001698 100644
--- a/include/exec/translator.h
+++ b/include/exec/translator.h
@@ -81,13 +81,14 @@ typedef enum DisasJumpType {
  * Architecture-agnostic disassembly context.
  */
 typedef struct DisasContextBase {
-const TranslationBlock *tb;
+TranslationBlock *tb;
 target_ulong pc_first;
 target_ulong pc_next;
 DisasJumpType is_jmp;
 int num_insns;
 int max_insns;
 bool singlestep_enabled;
+void *host_addr[2];
 #ifdef CONFIG_USER_ONLY
 /*
  * Guest address of the last byte of the last protected page.
@@ -183,24 +184,43 @@ bool translator_use_goto_tb(DisasContextBase *db, 
target_ulong dest);
  * the relevant information at translation time.
  */
 
-#define GEN_TRANSLATOR_LD(fullname, type, load_fn, swap_fn) \
-type fullname ## _swap(CPUArchState *env, DisasContextBase *dcbase, \
-   abi_ptr pc, bool do_swap);   \
-static inline type fullname(CPUArchState *env,  \
-DisasContextBase *dcbase, abi_ptr pc)   \
-{   \
-return fullname ## _swap(env, dcbase, pc, false);   \
+uint8_t translator_ldub(CPUArchState *env, DisasContextBase *db, abi_ptr pc);
+uint16_t translator_lduw(CPUArchState *env, DisasContextBase *db, abi_ptr pc);
+uint32_t translator_ldl(CPUArchState *env, DisasContextBase *db, abi_ptr pc);
+uint64_t translator_ldq(CPUArchState *env, DisasContextBase *db, abi_ptr pc);
+
+static inline uint16_t
+translator_lduw_swap(CPUArchState *env, DisasContextBase *db,
+ abi_ptr pc, bool do_swap)
+{
+uint16_t ret = translator_lduw(env, db, pc);
+if (do_swap) {
+ret = bswap16(ret);
 }
+return ret;
+}
 
-#define FOR_EACH_TRANSLATOR_LD(F)   \
-F(translator_ldub, uint8_t, cpu_ldub_code, /* no swap */)   \
-F(translator_lduw, uint16_t, cpu_lduw_code, bswap16)\
-F(translator_ldl, uint32_t, cpu_ldl_code, bswap32)  \
-F(translator_ldq, uint64_t, cpu_ldq_code, bswap64)
+static inline uint32_t
+translator_ldl_swap(CPUArchState *env, DisasContextBase *db,
+abi_ptr pc, bool do_swap)
+{
+uint32_t ret = translator_ldl(env, db, pc);
+if (do_swap) {
+ret = bswap32(ret);
+}
+return ret;
+}
 
-FOR_EACH_TRANSLATOR_LD(GEN_TRANSLATOR_LD)
-
-#undef GEN_TRANSLATOR_LD
+static inline uint64_t
+translator_ldq_swap(CPUArchState *env, DisasContextBase *db,
+abi_ptr pc, bool do_swap)
+{
+uint64_t ret = translator_ldq(env, db, pc);
+if (do_swap) {
+ret = bswap64(ret);
+}
+return ret;
+}
 
 /*
  * Return whether addr is on the same page as where disassembly started.
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 7360ecdb38..a8f1c34c4e 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1332,10 +1332,10 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
 {
 CPUArchState *env = cpu->env_ptr;
 TranslationBlock *tb, *existing_tb;
-tb_page_addr_t phys_pc, phys_page2;
-target_ulong virt_page2;
+tb_page_addr_t phys_pc;
 tcg_insn_unit *gen_code_buf;
 int gen_code_size, search_size, max_insns;
+void *host_pc;
 #ifdef CONFIG_PROFILER
 TCGProfile *prof = &tcg_ctx->prof;
 int64_t ti;
@@ -1344,7 +1344,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
 assert_memory_lock();
 qemu_thread_jit_write();
 
-phys_pc = get_page_addr_code_hostp(env, pc, false, NULL);
+phys_pc = get_page_addr_code_hostp(env, pc, false, &host_pc);
 
 if (phys_pc == -1) {
 /* Generate a one-shot TB with 1 insn in it */
@@ -1375,6 +1375,8 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
 tb->flags = flags;
 tb->cflags = cflags;
 tb->trace_vcpu_dstate = *cpu->trace_dstate;
+tb->page_addr[0] = phys_pc;
+tb->page_addr[1] = -1;
 tcg_ctx->tb_cflags = cflags;
  tb_overflow:
 
@@ -1568,13 +1570,11 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
 }
 
 /*
- * If the TB is not associated with a physical RAM page then
- * it must be a temporary 

[PATCH v2 33/33] target/arm: Enable TARGET_TB_PCREL

2022-08-16 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/arm/cpu-param.h |  2 ++
 target/arm/translate.h |  6 
 target/arm/cpu.c   | 23 +++---
 target/arm/translate-a64.c | 37 ++-
 target/arm/translate.c | 62 ++
 5 files changed, 100 insertions(+), 30 deletions(-)

diff --git a/target/arm/cpu-param.h b/target/arm/cpu-param.h
index 68ffb12427..ef62371d8f 100644
--- a/target/arm/cpu-param.h
+++ b/target/arm/cpu-param.h
@@ -34,4 +34,6 @@
 
 #define NB_MMU_MODES 15
 
+#define TARGET_TB_PCREL 1
+
 #endif
diff --git a/target/arm/translate.h b/target/arm/translate.h
index d42059aa1d..7717ea3f45 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -12,6 +12,12 @@ typedef struct DisasContext {
 
 /* The address of the current instruction being translated. */
 target_ulong pc_curr;
+/*
+ * For TARGET_TB_PCREL, the value relative to pc_curr against which
+ * offsets must be computed for cpu_pc.  -1 if unknown due to jump.
+ */
+target_ulong pc_save;
+target_ulong pc_cond_save;
 target_ulong page_start;
 uint32_t insn;
 /* Nonzero if this instruction has been conditionally skipped.  */
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 047bf3f4ab..f5e74b6c3b 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -64,17 +64,18 @@ static void arm_cpu_set_pc(CPUState *cs, vaddr value)
 void arm_cpu_synchronize_from_tb(CPUState *cs,
  const TranslationBlock *tb)
 {
-ARMCPU *cpu = ARM_CPU(cs);
-CPUARMState *env = &cpu->env;
-
-/*
- * It's OK to look at env for the current mode here, because it's
- * never possible for an AArch64 TB to chain to an AArch32 TB.
- */
-if (is_a64(env)) {
-env->pc = tb_pc(tb);
-} else {
-env->regs[15] = tb_pc(tb);
+/* The program counter is always up to date with TARGET_TB_PCREL. */
+if (!TARGET_TB_PCREL) {
+CPUARMState *env = cs->env_ptr;
+/*
+ * It's OK to look at env for the current mode here, because it's
+ * never possible for an AArch64 TB to chain to an AArch32 TB.
+ */
+if (is_a64(env)) {
+env->pc = tb_pc(tb);
+} else {
+env->regs[15] = tb_pc(tb);
+}
 }
 }
 #endif /* CONFIG_TCG */
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 322a09c503..a433189722 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -150,12 +150,18 @@ static void reset_btype(DisasContext *s)
 
 static void gen_pc_plus_diff(DisasContext *s, TCGv_i64 dest, int diff)
 {
-tcg_gen_movi_i64(dest, s->pc_curr + diff);
+assert(s->pc_save != -1);
+if (TARGET_TB_PCREL) {
+tcg_gen_addi_i64(dest, cpu_pc, (s->pc_curr - s->pc_save) + diff);
+} else {
+tcg_gen_movi_i64(dest, s->pc_curr + diff);
+}
 }
 
 void gen_a64_update_pc(DisasContext *s, int diff)
 {
 gen_pc_plus_diff(s, cpu_pc, diff);
+s->pc_save = s->pc_curr + diff;
 }
 
 /*
@@ -209,6 +215,7 @@ static void gen_a64_set_pc(DisasContext *s, TCGv_i64 src)
  * then loading an address into the PC will clear out any tag.
  */
 gen_top_byte_ignore(s, cpu_pc, src, s->tbii);
+s->pc_save = -1;
 }
 
 /*
@@ -347,16 +354,22 @@ static void gen_exception_internal(int excp)
 
 static void gen_exception_internal_insn(DisasContext *s, int pc_diff, int excp)
 {
+target_ulong pc_save = s->pc_save;
+
 gen_a64_update_pc(s, pc_diff);
 gen_exception_internal(excp);
 s->base.is_jmp = DISAS_NORETURN;
+s->pc_save = pc_save;
 }
 
 static void gen_exception_bkpt_insn(DisasContext *s, uint32_t syndrome)
 {
+target_ulong pc_save = s->pc_save;
+
 gen_a64_update_pc(s, 0);
 gen_helper_exception_bkpt_insn(cpu_env, tcg_constant_i32(syndrome));
 s->base.is_jmp = DISAS_NORETURN;
+s->pc_save = pc_save;
 }
 
 static void gen_step_complete_exception(DisasContext *s)
@@ -385,11 +398,16 @@ static inline bool use_goto_tb(DisasContext *s, uint64_t 
dest)
 
 static void gen_goto_tb(DisasContext *s, int n, int diff)
 {
-uint64_t dest = s->pc_curr + diff;
+target_ulong pc_save = s->pc_save;
 
-if (use_goto_tb(s, dest)) {
-tcg_gen_goto_tb(n);
-gen_a64_update_pc(s, diff);
+if (use_goto_tb(s, s->pc_curr + diff)) {
+if (TARGET_TB_PCREL) {
+gen_a64_update_pc(s, diff);
+tcg_gen_goto_tb(n);
+} else {
+tcg_gen_goto_tb(n);
+gen_a64_update_pc(s, diff);
+}
 tcg_gen_exit_tb(s->base.tb, n);
 s->base.is_jmp = DISAS_NORETURN;
 } else {
@@ -401,6 +419,7 @@ static void gen_goto_tb(DisasContext *s, int n, int diff)
 s->base.is_jmp = DISAS_NORETURN;
 }
 }
+s->pc_save = pc_save;
 }
 
 static void init_tmp_a64_array(DisasContext *s)
@@ -14717,7 +14736,7 @@ static void 
aarch64_tr_init_disas_context(DisasContextBase *dcbase,
 
 

[PATCH v2 00/33] accel/tcg + target/arm: pc-relative translation

2022-08-16 Thread Richard Henderson
Supersedes: 20220812180806.2128593-1-richard.hender...@linaro.org
("accel/tcg: minimize tlb lookups during translate + user-only PROT_EXEC fixes")

A few changes to the PROT_EXEC work that I posted last week, and
then continuing to the main event.

My initial goal was to reduce the overhead of TB flushing, which
Alex Bennee identified as a significant issue with respect to
booting AArch64 kernels under avocado.  Our initial guess was that
we need a more efficient data structure for walking TBs associated
with a physical page.

While I was looking at some of those numbers, I noted that we were
seeing up to 16000 TBs attached to a single page, which is well more
than I expected to see, and means that a new data structure isn't
going to help as much as simply reducing the number of translations.

It turns out the retranslation is due to the guest kernel's userland
address space randomization.  Each process gets e.g. libc mapped to
a different virtual address, which caused a new translation.

This, then, introduces some infrastructure for writing "pc-relative"
translation blocks, in which the guest pc is treated as a variable
just like any other guest cpu register.  The hashing for these TBs
is adjusted to compare the physical address.  The target/arm backend
is adjusted to use the new feature.
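
Concretely, the idiom (as introduced for aarch64 later in the series)
is that generated code only ever adds a displacement to the current
cpu_pc value rather than loading an absolute constant, so the TB
contains no absolute guest addresses:

    static void gen_pc_plus_diff(DisasContext *s, TCGv_i64 dest, int diff)
    {
        if (TARGET_TB_PCREL) {
            /* cpu_pc still holds the value it had at s->pc_save. */
            tcg_gen_addi_i64(dest, cpu_pc, (s->pc_curr - s->pc_save) + diff);
        } else {
            tcg_gen_movi_i64(dest, s->pc_curr + diff);
        }
    }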

This does result in a significant reduction in translation.  From the
BootLinuxAarch64.test_virt_tcg_gicv2 test, at the login prompt:

Before:

gen code size   160684739/1073736704
TB count                289808
TB flush count  1
TB invalidate count 235143

After:

gen code size   277992547/1073736704
TB count                503882
TB flush count  0
TB invalidate count 69282

Before TARGET_TB_PCREL, we generate approximately 1.1GB of TBs
(overflow 1GB, flush, and fill 153MB again).  Afterward, we only
generate 265MB of TBs.

Surprisingly, this does not affect wall-clock times nearly as
much as I would have expected:

                                         before    after   change
 BootLinuxAarch64.test_virt_tcg_gicv2:    97.35    85.11     -12%
 BootLinuxAarch64.test_virt_tcg_gicv3:   102.75    96.87      -5%

Change in profile, top 10 entries before, matched up with after:

  before   after
   9.01%  qemu-system-aar  [.] helper_lookup_tb_ptr   10.67%
   4.92%  qemu-system-aar  [.] qht_lookup_custom       5.06%
   4.79%  qemu-system-aar  [.] get_phys_addr_lpae   5.24%
   2.57%  qemu-system-aar  [.] address_space_ldq_le 2.77%
   2.33%  qemu-system-aar  [.] liveness_pass_1  0.60%
   2.24%  qemu-system-aar  [.] cpu_get_tb_cpu_state 2.58%
   1.76%  qemu-system-aar  [.] address_space_translate_internal 1.75%
   1.71%  qemu-system-aar  [.] tb_lookup_cmp           1.92%
   1.65%  qemu-system-aar  [.] tcg_gen_code 0.44%
   1.64%  qemu-system-aar  [.] do_tb_phys_invalidate   0.09%


r~


Ilya Leoshkevich (1):
  accel/tcg: Introduce is_same_page()

Richard Henderson (32):
  linux-user/arm: Mark the commpage executable
  linux-user/hppa: Allocate page zero as a commpage
  linux-user/x86_64: Allocate vsyscall page as a commpage
  linux-user: Honor PT_GNU_STACK
  tests/tcg/i386: Move smc_code2 to an executable section
  accel/tcg: Remove PageDesc code_bitmap
  accel/tcg: Use bool for page_find_alloc
  accel/tcg: Make tb_htable_lookup static
  accel/tcg: Move qemu_ram_addr_from_host_nofail to physmem.c
  accel/tcg: Properly implement get_page_addr_code for user-only
  accel/tcg: Use probe_access_internal for softmmu
get_page_addr_code_hostp
  accel/tcg: Add nofault parameter to get_page_addr_code_hostp
  accel/tcg: Unlock mmap_lock after longjmp
  accel/tcg: Raise PROT_EXEC exception early
  accel/tcg: Remove translator_ldsw
  accel/tcg: Add pc and host_pc params to gen_intermediate_code
  accel/tcg: Add fast path for translator_ld*
  accel/tcg: Use DisasContextBase in plugin_gen_tb_start
  accel/tcg: Do not align tb->page_addr[0]
  include/hw/core: Create struct CPUJumpCache
  accel/tcg: Introduce tb_pc and tb_pc_log
  accel/tcg: Introduce TARGET_TB_PCREL
  accel/tcg: Split log_cpu_exec into inline and slow path
  target/arm: Introduce curr_insn_len
  target/arm: Change gen_goto_tb to work on displacements
  target/arm: Change gen_*set_pc_im to gen_*update_pc
  target/arm: Change gen_exception_insn* to work on displacements
  target/arm: Change gen_exception_internal to work on displacements
  target/arm: Change gen_jmp* to work on displacements
  target/arm: Introduce gen_pc_plus_diff for aarch64
  target/arm: Introduce gen_pc_plus_diff for aarch32
  target/arm: Enable TARGET_TB_PCREL

 include/elf.h   |   1 +
 include/exec/cpu-common.h   |   1 +
 include/exec/cpu-defs.h |   3 +
 include/exec/exec-all.h | 138 

[PATCH v2 15/33] accel/tcg: Introduce is_same_page()

2022-08-16 Thread Richard Henderson
From: Ilya Leoshkevich 

Introduce a function that checks whether a given address is on the same
page as where disassembly started. Having it improves readability of
the following patches.

Signed-off-by: Ilya Leoshkevich 
Message-Id: <20220811095534.241224-3-...@linux.ibm.com>
Reviewed-by: Richard Henderson 
[rth: Make the DisasContextBase parameter const.]
Signed-off-by: Richard Henderson 
---
 include/exec/translator.h | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/include/exec/translator.h b/include/exec/translator.h
index 7db6845535..0d0bf3a31e 100644
--- a/include/exec/translator.h
+++ b/include/exec/translator.h
@@ -187,4 +187,14 @@ FOR_EACH_TRANSLATOR_LD(GEN_TRANSLATOR_LD)
 
 #undef GEN_TRANSLATOR_LD
 
+/*
+ * Return whether addr is on the same page as where disassembly started.
+ * Translators can use this to enforce the rule that only single-insn
+ * translation blocks are allowed to cross page boundaries.
+ */
+static inline bool is_same_page(const DisasContextBase *db, target_ulong addr)
+{
+return ((addr ^ db->pc_first) & TARGET_PAGE_MASK) == 0;
+}
+
 #endif /* EXEC__TRANSLATOR_H */
-- 
2.34.1




[PATCH v2 14/33] accel/tcg: Raise PROT_EXEC exception early

2022-08-16 Thread Richard Henderson
We currently ignore PROT_EXEC on the initial lookup, and
defer raising the exception until cpu_ld*_code().
It makes more sense to raise the exception early.

Signed-off-by: Richard Henderson 
---
 accel/tcg/cpu-exec.c  | 2 +-
 accel/tcg/translate-all.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 7887af6f45..7b8977a0a4 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -222,7 +222,7 @@ static TranslationBlock *tb_htable_lookup(CPUState *cpu, 
target_ulong pc,
 desc.cflags = cflags;
 desc.trace_vcpu_dstate = *cpu->trace_dstate;
 desc.pc = pc;
-phys_pc = get_page_addr_code(desc.env, pc);
+phys_pc = get_page_addr_code_hostp(desc.env, pc, false, NULL);
 if (phys_pc == -1) {
 return NULL;
 }
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 596029b26d..a5ca424f13 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1343,7 +1343,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
 assert_memory_lock();
 qemu_thread_jit_write();
 
-phys_pc = get_page_addr_code(env, pc);
+phys_pc = get_page_addr_code_hostp(env, pc, false, NULL);
 
 if (phys_pc == -1) {
 /* Generate a one-shot TB with 1 insn in it */
-- 
2.34.1




[PATCH v2 11/33] accel/tcg: Use probe_access_internal for softmmu get_page_addr_code_hostp

2022-08-16 Thread Richard Henderson
Simplify the implementation of get_page_addr_code_hostp
by reusing the existing probe_access infrastructure.

Signed-off-by: Richard Henderson 
---
 accel/tcg/cputlb.c | 76 --
 1 file changed, 26 insertions(+), 50 deletions(-)

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 80a3eb4f1c..2dc2affa12 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -1482,56 +1482,6 @@ static bool victim_tlb_hit(CPUArchState *env, size_t 
mmu_idx, size_t index,
   victim_tlb_hit(env, mmu_idx, index, offsetof(CPUTLBEntry, TY), \
  (ADDR) & TARGET_PAGE_MASK)
 
-/*
- * Return a ram_addr_t for the virtual address for execution.
- *
- * Return -1 if we can't translate and execute from an entire page
- * of RAM.  This will force us to execute by loading and translating
- * one insn at a time, without caching.
- *
- * NOTE: This function will trigger an exception if the page is
- * not executable.
- */
-tb_page_addr_t get_page_addr_code_hostp(CPUArchState *env, target_ulong addr,
-void **hostp)
-{
-uintptr_t mmu_idx = cpu_mmu_index(env, true);
-uintptr_t index = tlb_index(env, mmu_idx, addr);
-CPUTLBEntry *entry = tlb_entry(env, mmu_idx, addr);
-void *p;
-
-if (unlikely(!tlb_hit(entry->addr_code, addr))) {
-if (!VICTIM_TLB_HIT(addr_code, addr)) {
-tlb_fill(env_cpu(env), addr, 0, MMU_INST_FETCH, mmu_idx, 0);
-index = tlb_index(env, mmu_idx, addr);
-entry = tlb_entry(env, mmu_idx, addr);
-
-if (unlikely(entry->addr_code & TLB_INVALID_MASK)) {
-/*
- * The MMU protection covers a smaller range than a target
- * page, so we must redo the MMU check for every insn.
- */
-return -1;
-}
-}
-assert(tlb_hit(entry->addr_code, addr));
-}
-
-if (unlikely(entry->addr_code & TLB_MMIO)) {
-/* The region is not backed by RAM.  */
-if (hostp) {
-*hostp = NULL;
-}
-return -1;
-}
-
-p = (void *)((uintptr_t)addr + entry->addend);
-if (hostp) {
-*hostp = p;
-}
-return qemu_ram_addr_from_host_nofail(p);
-}
-
 static void notdirty_write(CPUState *cpu, vaddr mem_vaddr, unsigned size,
CPUIOTLBEntry *iotlbentry, uintptr_t retaddr)
 {
@@ -1687,6 +1637,32 @@ void *tlb_vaddr_to_host(CPUArchState *env, abi_ptr addr,
 return flags ? NULL : host;
 }
 
+/*
+ * Return a ram_addr_t for the virtual address for execution.
+ *
+ * Return -1 if we can't translate and execute from an entire page
+ * of RAM.  This will force us to execute by loading and translating
+ * one insn at a time, without caching.
+ *
+ * NOTE: This function will trigger an exception if the page is
+ * not executable.
+ */
+tb_page_addr_t get_page_addr_code_hostp(CPUArchState *env, target_ulong addr,
+void **hostp)
+{
+void *p;
+
+(void)probe_access_internal(env, addr, 1, MMU_INST_FETCH,
+cpu_mmu_index(env, true), true, &p, 0);
+if (p == NULL) {
+return -1;
+}
+if (hostp) {
+*hostp = p;
+}
+return qemu_ram_addr_from_host_nofail(p);
+}
+
 #ifdef CONFIG_PLUGIN
 /*
  * Perform a TLB lookup and populate the qemu_plugin_hwaddr structure.
-- 
2.34.1




[PATCH v2 13/33] accel/tcg: Unlock mmap_lock after longjmp

2022-08-16 Thread Richard Henderson
The mmap_lock is held around tb_gen_code.  While the comment
is correct that the lock is dropped when tb_gen_code runs out
of memory, the lock is *not* dropped when an exception is
raised reading code for translation.

Signed-off-by: Richard Henderson 
---
 accel/tcg/cpu-exec.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 711859d4d4..7887af6f45 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -523,13 +523,11 @@ void cpu_exec_step_atomic(CPUState *cpu)
 cpu_tb_exec(cpu, tb, &tb_exit);
 cpu_exec_exit(cpu);
 } else {
-/*
- * The mmap_lock is dropped by tb_gen_code if it runs out of
- * memory.
- */
 #ifndef CONFIG_SOFTMMU
 clear_helper_retaddr();
-tcg_debug_assert(!have_mmap_lock());
+if (have_mmap_lock()) {
+mmap_unlock();
+}
 #endif
 if (qemu_mutex_iothread_locked()) {
 qemu_mutex_unlock_iothread();
@@ -936,7 +934,9 @@ int cpu_exec(CPUState *cpu)
 
 #ifndef CONFIG_SOFTMMU
 clear_helper_retaddr();
-tcg_debug_assert(!have_mmap_lock());
+if (have_mmap_lock()) {
+mmap_unlock();
+}
 #endif
 if (qemu_mutex_iothread_locked()) {
 qemu_mutex_unlock_iothread();
-- 
2.34.1




[PATCH v2 23/33] accel/tcg: Introduce TARGET_TB_PCREL

2022-08-16 Thread Richard Henderson
Prepare for targets to be able to produce TBs that can
run in more than one virtual context.

Signed-off-by: Richard Henderson 
---
 include/exec/cpu-defs.h   |  3 +++
 include/exec/exec-all.h   | 41 ++---
 include/hw/core/cpu.h |  1 +
 accel/tcg/cpu-exec.c  | 55 ++-
 accel/tcg/translate-all.c | 48 ++
 5 files changed, 115 insertions(+), 33 deletions(-)

diff --git a/include/exec/cpu-defs.h b/include/exec/cpu-defs.h
index ba3cd32a1e..87e2bc4e59 100644
--- a/include/exec/cpu-defs.h
+++ b/include/exec/cpu-defs.h
@@ -54,6 +54,9 @@
 #  error TARGET_PAGE_BITS must be defined in cpu-param.h
 # endif
 #endif
+#ifndef TARGET_TB_PCREL
+# define TARGET_TB_PCREL 0
+#endif
 
 #define TARGET_LONG_SIZE (TARGET_LONG_BITS / 8)
 
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index cec3ef1666..b41835bb55 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -459,8 +459,32 @@ struct tb_tc {
 };
 
 struct TranslationBlock {
-target_ulong pc;   /* simulated PC corresponding to this block (EIP + CS 
base) */
-target_ulong cs_base; /* CS base for this block */
+#if !TARGET_TB_PCREL
+/*
+ * Guest PC corresponding to this block.  This must be the true
+ * virtual address.  Therefore e.g. x86 stores EIP + CS_BASE, and
+ * targets like Arm, MIPS, HP-PA, which reuse low bits for ISA or
+ * privilege, must store those bits elsewhere.
+ *
+ * If TARGET_TB_PCREL, the opcodes for the TranslationBlock are
+ * written such that the TB is associated only with the physical
+ * page and may be run in any virtual address context.  In this case,
+ * PC must always be taken from ENV in a target-specific manner.
+ * Unwind information is taken as byte offsets from the "current"
+ * value of the PC, as tracked by the translator.
+ */
+target_ulong pc;
+#endif
+
+/*
+ * Target-specific data associated with the TranslationBlock, e.g.:
+ * x86: the original user, the Code Segment virtual base,
+ * arm: an extension of tb->flags,
+ * s390x: instruction data for EXECUTE,
+ * sparc: the next pc of the instruction queue (for delay slots).
+ */
+target_ulong cs_base;
+
 uint32_t flags; /* flags defining in which context the code was generated 
*/
 uint32_t cflags;/* compile flags */
 
@@ -536,13 +560,24 @@ struct TranslationBlock {
 /* Hide the read to avoid ifdefs for TARGET_TB_PCREL. */
 static inline target_ulong tb_pc(const TranslationBlock *tb)
 {
+#if TARGET_TB_PCREL
+qemu_build_not_reached();
+#else
 return tb->pc;
+#endif
 }
 
-/* Similarly, but for logs. */
+/*
+ * Similarly, but for logs. In this case, when the virtual pc
+ * is not available, use the physical address.
+ */
 static inline target_ulong tb_pc_log(const TranslationBlock *tb)
 {
+#if TARGET_TB_PCREL
+return tb->page_addr[0];
+#else
 return tb->pc;
+#endif
 }
 
 /* Hide the qatomic_read to make code a little easier on the eyes */
diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index 8edef14199..7dcfccf6e2 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -235,6 +235,7 @@ struct hvf_vcpu_state;
 
 typedef struct {
 TranslationBlock *tb;
+vaddr pc;
 } CPUJumpCache;
 
 /* work queue */
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index f146960b7b..f7c82a8f2c 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -185,7 +185,7 @@ static bool tb_lookup_cmp(const void *p, const void *d)
 const TranslationBlock *tb = p;
 const struct tb_desc *desc = d;
 
-if (tb_pc(tb) == desc->pc &&
+if ((TARGET_TB_PCREL || tb_pc(tb) == desc->pc) &&
 tb->page_addr[0] == desc->page_addr0 &&
 tb->cs_base == desc->cs_base &&
 tb->flags == desc->flags &&
@@ -227,7 +227,8 @@ static TranslationBlock *tb_htable_lookup(CPUState *cpu, 
target_ulong pc,
 return NULL;
 }
 desc.page_addr0 = phys_pc;
-h = tb_hash_func(phys_pc, pc, flags, cflags, *cpu->trace_dstate);
+h = tb_hash_func(phys_pc, (TARGET_TB_PCREL ? 0 : pc),
+ flags, cflags, *cpu->trace_dstate);
 return qht_lookup_custom(&tb_ctx.htable, &desc, h, tb_lookup_cmp);
 }
 
@@ -243,21 +244,42 @@ static inline TranslationBlock *tb_lookup(CPUState *cpu, 
target_ulong pc,
 tcg_debug_assert(!(cflags & CF_INVALID));
 
 hash = tb_jmp_cache_hash_func(pc);
-tb = qatomic_rcu_read(&cpu->tb_jmp_cache[hash].tb);
-
-if (likely(tb &&
-   tb->pc == pc &&
-   tb->cs_base == cs_base &&
-   tb->flags == flags &&
-   tb->trace_vcpu_dstate == *cpu->trace_dstate &&
-   tb_cflags(tb) == cflags)) {
-return tb;
+if (TARGET_TB_PCREL) {
+/* Use acquire to ensure current load of pc from tb_jmp_cache[]. */
+tb = qatomic_load_acquire(&cpu->tb_jmp_cache[hash].tb);
+} else {
+/* Use rcu_read to ensure 

[PATCH v2 04/33] linux-user: Honor PT_GNU_STACK

2022-08-16 Thread Richard Henderson
Map the stack executable if required by default or on demand.

Signed-off-by: Richard Henderson 
---
 include/elf.h|  1 +
 linux-user/qemu.h|  1 +
 linux-user/elfload.c | 19 ++-
 3 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/include/elf.h b/include/elf.h
index 3a4bcb646a..3d6b9062c0 100644
--- a/include/elf.h
+++ b/include/elf.h
@@ -31,6 +31,7 @@ typedef int64_t  Elf64_Sxword;
 #define PT_LOPROC  0x70000000
 #define PT_HIPROC  0x7fffffff
 
+#define PT_GNU_STACK  (PT_LOOS + 0x474e551)
 #define PT_GNU_PROPERTY   (PT_LOOS + 0x474e553)
 
 #define PT_MIPS_REGINFO   0x70000000
diff --git a/linux-user/qemu.h b/linux-user/qemu.h
index 7d90de1b15..e2e93fbd1d 100644
--- a/linux-user/qemu.h
+++ b/linux-user/qemu.h
@@ -48,6 +48,7 @@ struct image_info {
 uint32_t    elf_flags;
 int personality;
 abi_ulong   alignment;
+bool        exec_stack;
 
 /* Generic semihosting knows about these pointers. */
 abi_ulong   arg_strings;   /* strings for argv */
diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index d783240a36..050cd1fa08 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -233,6 +233,7 @@ static bool init_guest_commpage(void)
 #define ELF_ARCH        EM_386
 
 #define ELF_PLATFORM get_elf_platform()
+#define EXSTACK_DEFAULT true
 
 static const char *get_elf_platform(void)
 {
@@ -309,6 +310,7 @@ static void elf_core_copy_regs(target_elf_gregset_t *regs, 
const CPUX86State *en
 
 #define ELF_ARCH        EM_ARM
 #define ELF_CLASS   ELFCLASS32
+#define EXSTACK_DEFAULT true
 
 static inline void init_thread(struct target_pt_regs *regs,
struct image_info *infop)
@@ -777,6 +779,7 @@ static inline void init_thread(struct target_pt_regs *regs,
 #else
 
 #define ELF_CLASS   ELFCLASS32
+#define EXSTACK_DEFAULT true
 
 #endif
 
@@ -974,6 +977,7 @@ static void elf_core_copy_regs(target_elf_gregset_t *regs, 
const CPUPPCState *en
 
 #define ELF_CLASS   ELFCLASS64
 #define ELF_ARCH        EM_LOONGARCH
+#define EXSTACK_DEFAULT true
 
 #define elf_check_arch(x) ((x) == EM_LOONGARCH)
 
@@ -1069,6 +1073,7 @@ static uint32_t get_elf_hwcap(void)
 #define ELF_CLASS   ELFCLASS32
 #endif
 #define ELF_ARCH        EM_MIPS
+#define EXSTACK_DEFAULT true
 
 #ifdef TARGET_ABI_MIPSN32
 #define elf_check_abi(x) ((x) & EF_MIPS_ABI2)
@@ -1807,6 +1812,10 @@ static inline void init_thread(struct target_pt_regs 
*regs,
 #define bswaptls(ptr) bswap32s(ptr)
 #endif
 
+#ifndef EXSTACK_DEFAULT
+#define EXSTACK_DEFAULT false
+#endif
+
 #include "elf.h"
 
 /* We must delay the following stanzas until after "elf.h". */
@@ -2082,6 +2091,7 @@ static abi_ulong setup_arg_pages(struct linux_binprm 
*bprm,
  struct image_info *info)
 {
 abi_ulong size, error, guard;
+int prot;
 
 size = guest_stack_size;
 if (size < STACK_LOWER_LIMIT) {
@@ -2092,7 +2102,11 @@ static abi_ulong setup_arg_pages(struct linux_binprm 
*bprm,
 guard = qemu_real_host_page_size();
 }
 
-error = target_mmap(0, size + guard, PROT_READ | PROT_WRITE,
+prot = PROT_READ | PROT_WRITE;
+if (info->exec_stack) {
+prot |= PROT_EXEC;
+}
+error = target_mmap(0, size + guard, prot,
 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
 if (error == -1) {
 perror("mmap stack");
@@ -2920,6 +2934,7 @@ static void load_elf_image(const char *image_name, int 
image_fd,
  */
 loaddr = -1, hiaddr = 0;
 info->alignment = 0;
+info->exec_stack = EXSTACK_DEFAULT;
 for (i = 0; i < ehdr->e_phnum; ++i) {
 struct elf_phdr *eppnt = phdr + i;
 if (eppnt->p_type == PT_LOAD) {
@@ -2962,6 +2977,8 @@ static void load_elf_image(const char *image_name, int 
image_fd,
 if (!parse_elf_properties(image_fd, info, eppnt, bprm_buf, &err)) {
 goto exit_errmsg;
 }
+} else if (eppnt->p_type == PT_GNU_STACK) {
+info->exec_stack = eppnt->p_flags & PF_X;
 }
 }
 
-- 
2.34.1




[PATCH v2 08/33] accel/tcg: Make tb_htable_lookup static

2022-08-16 Thread Richard Henderson
The function is not used outside of cpu-exec.c.  Move it and
its subroutines up in the file, before the first use.

Signed-off-by: Richard Henderson 
---
 include/exec/exec-all.h |   3 -
 accel/tcg/cpu-exec.c| 122 
 2 files changed, 61 insertions(+), 64 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 311e5fb422..e7e30d55b8 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -552,9 +552,6 @@ void tb_invalidate_phys_addr(AddressSpace *as, hwaddr addr, 
MemTxAttrs attrs);
 #endif
 void tb_flush(CPUState *cpu);
 void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr);
-TranslationBlock *tb_htable_lookup(CPUState *cpu, target_ulong pc,
-   target_ulong cs_base, uint32_t flags,
-   uint32_t cflags);
 void tb_set_jmp_target(TranslationBlock *tb, int n, uintptr_t addr);
 
 /* GETPC is the true target of the return instruction that we'll execute.  */
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index a565a3f8ec..711859d4d4 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -170,6 +170,67 @@ uint32_t curr_cflags(CPUState *cpu)
 return cflags;
 }
 
+struct tb_desc {
+target_ulong pc;
+target_ulong cs_base;
+CPUArchState *env;
+tb_page_addr_t phys_page1;
+uint32_t flags;
+uint32_t cflags;
+uint32_t trace_vcpu_dstate;
+};
+
+static bool tb_lookup_cmp(const void *p, const void *d)
+{
+const TranslationBlock *tb = p;
+const struct tb_desc *desc = d;
+
+if (tb->pc == desc->pc &&
+tb->page_addr[0] == desc->phys_page1 &&
+tb->cs_base == desc->cs_base &&
+tb->flags == desc->flags &&
+tb->trace_vcpu_dstate == desc->trace_vcpu_dstate &&
+tb_cflags(tb) == desc->cflags) {
+/* check next page if needed */
+if (tb->page_addr[1] == -1) {
+return true;
+} else {
+tb_page_addr_t phys_page2;
+target_ulong virt_page2;
+
+virt_page2 = (desc->pc & TARGET_PAGE_MASK) + TARGET_PAGE_SIZE;
+phys_page2 = get_page_addr_code(desc->env, virt_page2);
+if (tb->page_addr[1] == phys_page2) {
+return true;
+}
+}
+}
+return false;
+}
+
+static TranslationBlock *tb_htable_lookup(CPUState *cpu, target_ulong pc,
+  target_ulong cs_base, uint32_t flags,
+  uint32_t cflags)
+{
+tb_page_addr_t phys_pc;
+struct tb_desc desc;
+uint32_t h;
+
+desc.env = cpu->env_ptr;
+desc.cs_base = cs_base;
+desc.flags = flags;
+desc.cflags = cflags;
+desc.trace_vcpu_dstate = *cpu->trace_dstate;
+desc.pc = pc;
+phys_pc = get_page_addr_code(desc.env, pc);
+if (phys_pc == -1) {
+return NULL;
+}
+desc.phys_page1 = phys_pc & TARGET_PAGE_MASK;
+h = tb_hash_func(phys_pc, pc, flags, cflags, *cpu->trace_dstate);
+return qht_lookup_custom(&tb_ctx.htable, &desc, h, tb_lookup_cmp);
+}
+
 /* Might cause an exception, so have a longjmp destination ready */
 static inline TranslationBlock *tb_lookup(CPUState *cpu, target_ulong pc,
   target_ulong cs_base,
@@ -487,67 +548,6 @@ void cpu_exec_step_atomic(CPUState *cpu)
 end_exclusive();
 }
 
-struct tb_desc {
-target_ulong pc;
-target_ulong cs_base;
-CPUArchState *env;
-tb_page_addr_t phys_page1;
-uint32_t flags;
-uint32_t cflags;
-uint32_t trace_vcpu_dstate;
-};
-
-static bool tb_lookup_cmp(const void *p, const void *d)
-{
-const TranslationBlock *tb = p;
-const struct tb_desc *desc = d;
-
-if (tb->pc == desc->pc &&
-tb->page_addr[0] == desc->phys_page1 &&
-tb->cs_base == desc->cs_base &&
-tb->flags == desc->flags &&
-tb->trace_vcpu_dstate == desc->trace_vcpu_dstate &&
-tb_cflags(tb) == desc->cflags) {
-/* check next page if needed */
-if (tb->page_addr[1] == -1) {
-return true;
-} else {
-tb_page_addr_t phys_page2;
-target_ulong virt_page2;
-
-virt_page2 = (desc->pc & TARGET_PAGE_MASK) + TARGET_PAGE_SIZE;
-phys_page2 = get_page_addr_code(desc->env, virt_page2);
-if (tb->page_addr[1] == phys_page2) {
-return true;
-}
-}
-}
-return false;
-}
-
-TranslationBlock *tb_htable_lookup(CPUState *cpu, target_ulong pc,
-   target_ulong cs_base, uint32_t flags,
-   uint32_t cflags)
-{
-tb_page_addr_t phys_pc;
-struct tb_desc desc;
-uint32_t h;
-
-desc.env = cpu->env_ptr;
-desc.cs_base = cs_base;
-desc.flags = flags;
-desc.cflags = cflags;
-desc.trace_vcpu_dstate = *cpu->trace_dstate;
-desc.pc = pc;
-phys_pc = 

[PATCH v2 12/33] accel/tcg: Add nofault parameter to get_page_addr_code_hostp

2022-08-16 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 include/exec/exec-all.h | 10 +-
 accel/tcg/cputlb.c  |  8 
 accel/tcg/plugin-gen.c  |  4 ++--
 accel/tcg/user-exec.c   |  4 ++--
 4 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 9f35e3b7a9..7a6dc44d86 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -599,6 +599,8 @@ struct MemoryRegionSection *iotlb_to_section(CPUState *cpu,
  * get_page_addr_code_hostp()
  * @env: CPUArchState
  * @addr: guest virtual address of guest code
+ * @nofault: do not raise an exception
+ * @hostp: output for host pointer
  *
  * See get_page_addr_code() (full-system version) for documentation on the
  * return value.
@@ -607,10 +609,10 @@ struct MemoryRegionSection *iotlb_to_section(CPUState 
*cpu,
  * If the return value is -1, sets *@hostp to NULL. Otherwise, sets *@hostp
  * to the host address where @addr's content is kept.
  *
- * Note: this function can trigger an exception.
+ * Note: Unless @nofault, this function can trigger an exception.
  */
 tb_page_addr_t get_page_addr_code_hostp(CPUArchState *env, target_ulong addr,
-void **hostp);
+bool nofault, void **hostp);
 
 /**
  * get_page_addr_code()
@@ -620,13 +622,11 @@ tb_page_addr_t get_page_addr_code_hostp(CPUArchState 
*env, target_ulong addr,
  * If we cannot translate and execute from the entire RAM page, or if
  * the region is not backed by RAM, returns -1. Otherwise, returns the
  * ram_addr_t corresponding to the guest code at @addr.
- *
- * Note: this function can trigger an exception.
  */
 static inline tb_page_addr_t get_page_addr_code(CPUArchState *env,
 target_ulong addr)
 {
-return get_page_addr_code_hostp(env, addr, NULL);
+return get_page_addr_code_hostp(env, addr, true, NULL);
 }
 
 #if defined(CONFIG_USER_ONLY)
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 2dc2affa12..ae7b40dd51 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -1644,16 +1644,16 @@ void *tlb_vaddr_to_host(CPUArchState *env, abi_ptr addr,
  * of RAM.  This will force us to execute by loading and translating
  * one insn at a time, without caching.
  *
- * NOTE: This function will trigger an exception if the page is
- * not executable.
+ * NOTE: Unless @nofault, this function will trigger an exception
+ * if the page is not executable.
  */
 tb_page_addr_t get_page_addr_code_hostp(CPUArchState *env, target_ulong addr,
-void **hostp)
+bool nofault, void **hostp)
 {
 void *p;
 
 (void)probe_access_internal(env, addr, 1, MMU_INST_FETCH,
-cpu_mmu_index(env, true), true, &p, 0);
+cpu_mmu_index(env, true), nofault, &p, 0);
 if (p == NULL) {
 return -1;
 }
diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-gen.c
index 3d0b101e34..8377c15383 100644
--- a/accel/tcg/plugin-gen.c
+++ b/accel/tcg/plugin-gen.c
@@ -872,7 +872,7 @@ bool plugin_gen_tb_start(CPUState *cpu, const 
TranslationBlock *tb, bool mem_onl
 
 ptb->vaddr = tb->pc;
 ptb->vaddr2 = -1;
-get_page_addr_code_hostp(cpu->env_ptr, tb->pc, &ptb->haddr1);
+get_page_addr_code_hostp(cpu->env_ptr, tb->pc, true, &ptb->haddr1);
 ptb->haddr2 = NULL;
 ptb->mem_only = mem_only;
 
@@ -902,7 +902,7 @@ void plugin_gen_insn_start(CPUState *cpu, const 
DisasContextBase *db)
 unlikely((db->pc_next & TARGET_PAGE_MASK) !=
  (db->pc_first & TARGET_PAGE_MASK))) {
 get_page_addr_code_hostp(cpu->env_ptr, db->pc_next,
- &ptb->haddr2);
+ true, &ptb->haddr2);
 ptb->vaddr2 = db->pc_next;
 }
 if (likely(ptb->vaddr2 == -1)) {
diff --git a/accel/tcg/user-exec.c b/accel/tcg/user-exec.c
index a20234fb02..1b3403a064 100644
--- a/accel/tcg/user-exec.c
+++ b/accel/tcg/user-exec.c
@@ -200,11 +200,11 @@ void *probe_access(CPUArchState *env, target_ulong addr, 
int size,
 }
 
 tb_page_addr_t get_page_addr_code_hostp(CPUArchState *env, target_ulong addr,
-void **hostp)
+bool nofault, void **hostp)
 {
 int flags;
 
-flags = probe_access_internal(env, addr, 1, MMU_INST_FETCH, true, 0);
+flags = probe_access_internal(env, addr, 1, MMU_INST_FETCH, nofault, 0);
 if (unlikely(flags)) {
 return -1;
 }
-- 
2.34.1




[PATCH v2 09/33] accel/tcg: Move qemu_ram_addr_from_host_nofail to physmem.c

2022-08-16 Thread Richard Henderson
The base qemu_ram_addr_from_host function is already in
softmmu/physmem.c; move the nofail version to be adjacent.

Signed-off-by: Richard Henderson 
---
 include/exec/cpu-common.h |  1 +
 accel/tcg/cputlb.c| 12 
 softmmu/physmem.c | 12 
 3 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index 2281be4e10..d909429427 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -72,6 +72,7 @@ typedef uintptr_t ram_addr_t;
 void qemu_ram_remap(ram_addr_t addr, ram_addr_t length);
 /* This should not be used by devices.  */
 ram_addr_t qemu_ram_addr_from_host(void *ptr);
+ram_addr_t qemu_ram_addr_from_host_nofail(void *ptr);
 RAMBlock *qemu_ram_block_by_name(const char *name);
 RAMBlock *qemu_ram_block_from_host(void *ptr, bool round_offset,
ram_addr_t *offset);
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index a46f3a654d..5db56bcd1e 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -1283,18 +1283,6 @@ void tlb_set_page(CPUState *cpu, target_ulong vaddr,
 prot, mmu_idx, size);
 }
 
-static inline ram_addr_t qemu_ram_addr_from_host_nofail(void *ptr)
-{
-ram_addr_t ram_addr;
-
-ram_addr = qemu_ram_addr_from_host(ptr);
-if (ram_addr == RAM_ADDR_INVALID) {
-error_report("Bad ram pointer %p", ptr);
-abort();
-}
-return ram_addr;
-}
-
 /*
  * Note: tlb_fill() can trigger a resize of the TLB. This means that all of the
  * caller's prior references to the TLB table (e.g. CPUTLBEntry pointers) must
diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index dc3c3e5f2e..d4c30e99ea 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -2460,6 +2460,18 @@ ram_addr_t qemu_ram_addr_from_host(void *ptr)
 return block->offset + offset;
 }
 
+ram_addr_t qemu_ram_addr_from_host_nofail(void *ptr)
+{
+ram_addr_t ram_addr;
+
+ram_addr = qemu_ram_addr_from_host(ptr);
+if (ram_addr == RAM_ADDR_INVALID) {
+error_report("Bad ram pointer %p", ptr);
+abort();
+}
+return ram_addr;
+}
+
 static MemTxResult flatview_read(FlatView *fv, hwaddr addr,
  MemTxAttrs attrs, void *buf, hwaddr len);
 static MemTxResult flatview_write(FlatView *fv, hwaddr addr, MemTxAttrs attrs,
-- 
2.34.1




[PATCH v2 02/33] linux-user/hppa: Allocate page zero as a commpage

2022-08-16 Thread Richard Henderson
We're about to start validating PAGE_EXEC, which means that we've
got to mark page zero executable.  We had been special casing this
entirely within translate.

Signed-off-by: Richard Henderson 
---
 linux-user/elfload.c | 34 +++---
 1 file changed, 31 insertions(+), 3 deletions(-)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 3e3dc02499..29d910c4cc 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -1646,6 +1646,34 @@ static inline void init_thread(struct target_pt_regs 
*regs,
 regs->gr[31] = infop->entry;
 }
 
+#define LO_COMMPAGE  0
+
+static bool init_guest_commpage(void)
+{
+void *want = g2h_untagged(LO_COMMPAGE);
+void *addr = mmap(want, qemu_host_page_size, PROT_NONE,
+  MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED, -1, 0);
+
+if (addr == MAP_FAILED) {
+perror("Allocating guest commpage");
+exit(EXIT_FAILURE);
+}
+if (addr != want) {
+return false;
+}
+
+/*
+ * On Linux, page zero is normally marked execute only + gateway.
+ * Normal read or write is supposed to fail (thus PROT_NONE above),
+ * but specific offsets have kernel code mapped to raise permissions
+ * and implement syscalls.  Here, simply mark the page executable.
+ * Special case the entry points during translation (see do_page_zero).
+ */
+page_set_flags(LO_COMMPAGE, LO_COMMPAGE + TARGET_PAGE_SIZE,
+   PAGE_EXEC | PAGE_VALID);
+return true;
+}
+
 #endif /* TARGET_HPPA */
 
 #ifdef TARGET_XTENSA
@@ -2326,12 +2354,12 @@ static abi_ulong create_elf_tables(abi_ulong p, int 
argc, int envc,
 }
 
 #if defined(HI_COMMPAGE)
-#define LO_COMMPAGE 0
+#define LO_COMMPAGE -1
 #elif defined(LO_COMMPAGE)
 #define HI_COMMPAGE 0
 #else
 #define HI_COMMPAGE 0
-#define LO_COMMPAGE 0
+#define LO_COMMPAGE -1
 #define init_guest_commpage() true
 #endif
 
@@ -2555,7 +2583,7 @@ static void pgb_static(const char *image_name, abi_ulong 
orig_loaddr,
 } else {
 offset = -(HI_COMMPAGE & -align);
 }
-} else if (LO_COMMPAGE != 0) {
+} else if (LO_COMMPAGE != -1) {
 loaddr = MIN(loaddr, LO_COMMPAGE & -align);
 }
 
-- 
2.34.1




[PATCH v2 07/33] accel/tcg: Use bool for page_find_alloc

2022-08-16 Thread Richard Henderson
Bool is a more appropriate type for the alloc parameter.

Signed-off-by: Richard Henderson 
---
 accel/tcg/translate-all.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 298277a590..596029b26d 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -464,7 +464,7 @@ void page_init(void)
 #endif
 }
 
-static PageDesc *page_find_alloc(tb_page_addr_t index, int alloc)
+static PageDesc *page_find_alloc(tb_page_addr_t index, bool alloc)
 {
 PageDesc *pd;
 void **lp;
@@ -532,11 +532,11 @@ static PageDesc *page_find_alloc(tb_page_addr_t index, 
int alloc)
 
 static inline PageDesc *page_find(tb_page_addr_t index)
 {
-return page_find_alloc(index, 0);
+return page_find_alloc(index, false);
 }
 
 static void page_lock_pair(PageDesc **ret_p1, tb_page_addr_t phys1,
-   PageDesc **ret_p2, tb_page_addr_t phys2, int alloc);
+   PageDesc **ret_p2, tb_page_addr_t phys2, bool 
alloc);
 
 /* In user-mode page locks aren't used; mmap_lock is enough */
 #ifdef CONFIG_USER_ONLY
@@ -650,7 +650,7 @@ static inline void page_unlock(PageDesc *pd)
 /* lock the page(s) of a TB in the correct acquisition order */
 static inline void page_lock_tb(const TranslationBlock *tb)
 {
-page_lock_pair(NULL, tb->page_addr[0], NULL, tb->page_addr[1], 0);
+page_lock_pair(NULL, tb->page_addr[0], NULL, tb->page_addr[1], false);
 }
 
 static inline void page_unlock_tb(const TranslationBlock *tb)
@@ -839,7 +839,7 @@ void page_collection_unlock(struct page_collection *set)
 #endif /* !CONFIG_USER_ONLY */
 
 static void page_lock_pair(PageDesc **ret_p1, tb_page_addr_t phys1,
-   PageDesc **ret_p2, tb_page_addr_t phys2, int alloc)
+   PageDesc **ret_p2, tb_page_addr_t phys2, bool alloc)
 {
 PageDesc *p1, *p2;
 tb_page_addr_t page1;
@@ -1289,7 +1289,7 @@ tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
  * Note that inserting into the hash table first isn't an option, since
  * we can only insert TBs that are fully initialized.
  */
-page_lock_pair(&p, phys_pc, &p2, phys_page2, 1);
+page_lock_pair(&p, phys_pc, &p2, phys_page2, true);
 tb_page_add(p, tb, 0, phys_pc & TARGET_PAGE_MASK);
 if (p2) {
 tb_page_add(p2, tb, 1, phys_page2);
@@ -2224,7 +2224,7 @@ void page_set_flags(target_ulong start, target_ulong end, 
int flags)
 for (addr = start, len = end - start;
  len != 0;
  len -= TARGET_PAGE_SIZE, addr += TARGET_PAGE_SIZE) {
-PageDesc *p = page_find_alloc(addr >> TARGET_PAGE_BITS, 1);
+PageDesc *p = page_find_alloc(addr >> TARGET_PAGE_BITS, true);
 
 /* If the write protection bit is set, then we invalidate
the code inside.  */
-- 
2.34.1




[PATCH v2 06/33] accel/tcg: Remove PageDesc code_bitmap

2022-08-16 Thread Richard Henderson
This bitmap is created and discarded immediately.
We gain nothing by its existence.

Signed-off-by: Richard Henderson 
---
 accel/tcg/translate-all.c | 78 ++-
 1 file changed, 4 insertions(+), 74 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index b83161a081..298277a590 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -101,21 +101,14 @@
 #define assert_memory_lock() tcg_debug_assert(have_mmap_lock())
 #endif
 
-#define SMC_BITMAP_USE_THRESHOLD 10
-
 typedef struct PageDesc {
 /* list of TBs intersecting this ram page */
 uintptr_t first_tb;
-#ifdef CONFIG_SOFTMMU
-/* in order to optimize self modifying code, we count the number
-   of lookups we do to a given page to use a bitmap */
-unsigned long *code_bitmap;
-unsigned int code_write_count;
-#else
+#ifdef CONFIG_USER_ONLY
 unsigned long flags;
 void *target_data;
 #endif
-#ifndef CONFIG_USER_ONLY
+#ifdef CONFIG_SOFTMMU
 QemuSpin lock;
 #endif
 } PageDesc;
@@ -906,17 +899,6 @@ void tb_htable_init(void)
 qht_init(&tb_ctx.htable, tb_cmp, CODE_GEN_HTABLE_SIZE, mode);
 }
 
-/* call with @p->lock held */
-static inline void invalidate_page_bitmap(PageDesc *p)
-{
-assert_page_locked(p);
-#ifdef CONFIG_SOFTMMU
-g_free(p->code_bitmap);
-p->code_bitmap = NULL;
-p->code_write_count = 0;
-#endif
-}
-
 /* Set to NULL all the 'first_tb' fields in all PageDescs. */
 static void page_flush_tb_1(int level, void **lp)
 {
@@ -931,7 +913,6 @@ static void page_flush_tb_1(int level, void **lp)
 for (i = 0; i < V_L2_SIZE; ++i) {
 page_lock(&pd[i]);
 pd[i].first_tb = (uintptr_t)NULL;
-invalidate_page_bitmap(pd + i);
 page_unlock(&pd[i]);
 }
 } else {
@@ -1196,11 +1177,9 @@ static void do_tb_phys_invalidate(TranslationBlock *tb, 
bool rm_from_page_list)
 if (rm_from_page_list) {
 p = page_find(tb->page_addr[0] >> TARGET_PAGE_BITS);
 tb_page_remove(p, tb);
-invalidate_page_bitmap(p);
 if (tb->page_addr[1] != -1) {
 p = page_find(tb->page_addr[1] >> TARGET_PAGE_BITS);
 tb_page_remove(p, tb);
-invalidate_page_bitmap(p);
 }
 }
 
@@ -1245,35 +1224,6 @@ void tb_phys_invalidate(TranslationBlock *tb, 
tb_page_addr_t page_addr)
 }
 }
 
-#ifdef CONFIG_SOFTMMU
-/* call with @p->lock held */
-static void build_page_bitmap(PageDesc *p)
-{
-int n, tb_start, tb_end;
-TranslationBlock *tb;
-
-assert_page_locked(p);
-p->code_bitmap = bitmap_new(TARGET_PAGE_SIZE);
-
-PAGE_FOR_EACH_TB(p, tb, n) {
-/* NOTE: this is subtle as a TB may span two physical pages */
-if (n == 0) {
-/* NOTE: tb_end may be after the end of the page, but
-   it is not a problem */
-tb_start = tb->pc & ~TARGET_PAGE_MASK;
-tb_end = tb_start + tb->size;
-if (tb_end > TARGET_PAGE_SIZE) {
-tb_end = TARGET_PAGE_SIZE;
- }
-} else {
-tb_start = 0;
-tb_end = ((tb->pc + tb->size) & ~TARGET_PAGE_MASK);
-}
-bitmap_set(p->code_bitmap, tb_start, tb_end - tb_start);
-}
-}
-#endif
-
 /* add the tb in the target page and protect it if necessary
  *
  * Called with mmap_lock held for user-mode emulation.
@@ -1294,7 +1244,6 @@ static inline void tb_page_add(PageDesc *p, 
TranslationBlock *tb,
 page_already_protected = p->first_tb != (uintptr_t)NULL;
 #endif
 p->first_tb = (uintptr_t)tb | n;
-invalidate_page_bitmap(p);
 
 #if defined(CONFIG_USER_ONLY)
 /* translator_loop() must have made all TB pages non-writable */
@@ -1356,10 +1305,8 @@ tb_link_page(TranslationBlock *tb, tb_page_addr_t 
phys_pc,
 /* remove TB from the page(s) if we couldn't insert it */
 if (unlikely(existing_tb)) {
 tb_page_remove(p, tb);
-invalidate_page_bitmap(p);
 if (p2) {
 tb_page_remove(p2, tb);
-invalidate_page_bitmap(p2);
 }
 tb = existing_tb;
 }
@@ -1736,7 +1683,6 @@ tb_invalidate_phys_page_range__locked(struct 
page_collection *pages,
 #if !defined(CONFIG_USER_ONLY)
 /* if no code remaining, no need to continue to use slow writes */
 if (!p->first_tb) {
-invalidate_page_bitmap(p);
 tlb_unprotect_code(start);
 }
 #endif
@@ -1832,24 +1778,8 @@ void tb_invalidate_phys_page_fast(struct page_collection 
*pages,
 }
 
 assert_page_locked(p);
-if (!p->code_bitmap &&
-++p->code_write_count >= SMC_BITMAP_USE_THRESHOLD) {
-build_page_bitmap(p);
-}
-if (p->code_bitmap) {
-unsigned int nr;
-unsigned long b;
-
-nr = start & ~TARGET_PAGE_MASK;
-b = p->code_bitmap[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG - 1));
-if (b & ((1 << len) - 1)) {
-goto do_invalidate;
-}
-} else {
-do_invalidate:
-   

[PATCH v2 05/33] tests/tcg/i386: Move smc_code2 to an executable section

2022-08-16 Thread Richard Henderson
We're about to start validating PAGE_EXEC, which means
that we've got to put this code into a section that is
both writable and executable.

Note that this test did not run on hardware beforehand either.

Signed-off-by: Richard Henderson 
---
 tests/tcg/i386/test-i386.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/tcg/i386/test-i386.c b/tests/tcg/i386/test-i386.c
index ac8d5a3c1f..e6b308a2c0 100644
--- a/tests/tcg/i386/test-i386.c
+++ b/tests/tcg/i386/test-i386.c
@@ -1998,7 +1998,7 @@ uint8_t code[] = {
 0xc3, /* ret */
 };
 
-asm(".section \".data\"\n"
+asm(".section \".data_x\",\"awx\"\n"
 "smc_code2:\n"
 "movl 4(%esp), %eax\n"
 "movl %eax, smc_patch_addr2 + 1\n"
-- 
2.34.1




[PATCH v2 03/33] linux-user/x86_64: Allocate vsyscall page as a commpage

2022-08-16 Thread Richard Henderson
We're about to start validating PAGE_EXEC, which means that we've
got to mark the vsyscall page executable.  We had been special casing
this entirely within translate.

Signed-off-by: Richard Henderson 
---
 linux-user/elfload.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 29d910c4cc..d783240a36 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -195,6 +195,28 @@ static void elf_core_copy_regs(target_elf_gregset_t *regs, 
const CPUX86State *en
 (*regs)[26] = tswapreg(env->segs[R_GS].selector & 0xffff);
 }
 
+#if ULONG_MAX >= TARGET_VSYSCALL_PAGE
+#define HI_COMMPAGE  TARGET_VSYSCALL_PAGE
+
+static bool init_guest_commpage(void)
+{
+/*
+ * The vsyscall page is at a high negative address aka kernel space,
+ * which means that we cannot actually allocate it with target_mmap.
+ * We still should be able to use page_set_flags, unless the user
+ * has specified -R reserved_va, which would trigger an assert().
+ */
+if (reserved_va != 0 &&
+TARGET_VSYSCALL_PAGE + TARGET_PAGE_SIZE >= reserved_va) {
+error_report("Cannot allocate vsyscall page");
+exit(EXIT_FAILURE);
+}
+page_set_flags(TARGET_VSYSCALL_PAGE,
+   TARGET_VSYSCALL_PAGE + TARGET_PAGE_SIZE,
+   PAGE_EXEC | PAGE_VALID);
+return true;
+}
+#endif
 #else
 
 #define ELF_START_MMAP 0x80000000
-- 
2.34.1




[PATCH v2 01/33] linux-user/arm: Mark the commpage executable

2022-08-16 Thread Richard Henderson
We're about to start validating PAGE_EXEC, which means
that we've got to mark the commpage executable.  We had
been placing the commpage outside of reserved_va, which
was incorrect and led to an abort.

Signed-off-by: Richard Henderson 
---
 linux-user/arm/target_cpu.h | 4 ++--
 linux-user/elfload.c| 6 +-
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/linux-user/arm/target_cpu.h b/linux-user/arm/target_cpu.h
index 709d19bc9e..89ba274cfc 100644
--- a/linux-user/arm/target_cpu.h
+++ b/linux-user/arm/target_cpu.h
@@ -34,9 +34,9 @@ static inline unsigned long arm_max_reserved_va(CPUState *cs)
 } else {
 /*
  * We need to be able to map the commpage.
- * See validate_guest_space in linux-user/elfload.c.
+ * See init_guest_commpage in linux-user/elfload.c.
  */
-return 0xffff0000ul;
+return 0xfffffffful;
 }
 }
 #define MAX_RESERVED_VA  arm_max_reserved_va
diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index ce902dbd56..3e3dc02499 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -398,7 +398,8 @@ enum {
 
 static bool init_guest_commpage(void)
 {
-void *want = g2h_untagged(HI_COMMPAGE & -qemu_host_page_size);
+abi_ptr commpage = HI_COMMPAGE & -qemu_host_page_size;
+void *want = g2h_untagged(commpage);
 void *addr = mmap(want, qemu_host_page_size, PROT_READ | PROT_WRITE,
   MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED, -1, 0);
 
@@ -417,6 +418,9 @@ static bool init_guest_commpage(void)
 perror("Protecting guest commpage");
 exit(EXIT_FAILURE);
 }
+
+page_set_flags(commpage, commpage + qemu_host_page_size,
+   PAGE_READ | PAGE_EXEC | PAGE_VALID);
 return true;
 }
 
-- 
2.34.1




Re: [PATCH for-7.2 v4 11/11] ppc/pnv: fix QOM parenting of user creatable root ports

2022-08-16 Thread Daniel Henrique Barboza




On 8/12/22 13:13, Frederic Barrat wrote:



On 11/08/2022 18:39, Daniel Henrique Barboza wrote:

User creatable root ports are being parented by the 'peripheral' or the
'peripheral-anon' container. This happens because this is the regular
QOM schema for sysbus devices that are added via the command line.

Let's make this QOM hierarchy similar to what we have with default root
ports, i.e. the root port must be parented by the pnv-root-bus. To do
that we change the qom and bus parent of the root port during
root_port_realize(). The realize() is shared by the default root port
code path, so we can remove the code inside pnv_phb_attach_root_port()
that was adding the root port as a child of the bus as well.

While we're at it, change pnv_phb_attach_root_port() to receive a PCIBus
instead of a PCIHostState to make it clear that the function does not
make use of the PHB.

Signed-off-by: Daniel Henrique Barboza 
---
  hw/pci-host/pnv_phb.c | 35 +++
  1 file changed, 15 insertions(+), 20 deletions(-)

diff --git a/hw/pci-host/pnv_phb.c b/hw/pci-host/pnv_phb.c
index 17d9960aa1..65ed1f9eb4 100644
--- a/hw/pci-host/pnv_phb.c
+++ b/hw/pci-host/pnv_phb.c
@@ -51,27 +51,11 @@ static void pnv_parent_bus_fixup(DeviceState *parent, 
DeviceState *child,
  }
  }
-/*
- * Attach a root port device.
- *
- * 'index' will be used both as a PCIE slot value and to calculate
- * QOM id. 'chip_id' is going to be used as PCIE chassis for the
- * root port.
- */
-static void pnv_phb_attach_root_port(PCIHostState *pci)
+static void pnv_phb_attach_root_port(PCIBus *bus)
  {
  PCIDevice *root = pci_new(PCI_DEVFN(0, 0), TYPE_PNV_PHB_ROOT_PORT);
-    const char *dev_id = DEVICE(root)->id;
-    g_autofree char *default_id = NULL;
-    int index;
-
-    index = object_property_get_int(OBJECT(pci->bus), "phb-id", &error_fatal);
-    default_id = g_strdup_printf("%s[%d]", TYPE_PNV_PHB_ROOT_PORT, index);
-
-    object_property_add_child(OBJECT(pci->bus), dev_id ? dev_id : default_id,
-  OBJECT(root));
-    pci_realize_and_unref(root, pci->bus, _fatal);
-    pci_realize_and_unref(root, pci->bus, &error_fatal);
+    pci_realize_and_unref(root, bus, &error_fatal);
  /*
@@ -171,7 +155,7 @@ static void pnv_phb_realize(DeviceState *dev, Error **errp)
  return;
  }
-    pnv_phb_attach_root_port(pci);
+    pnv_phb_attach_root_port(pci->bus);
  }
  static const char *pnv_phb_root_bus_path(PCIHostState *host_bridge,
@@ -240,12 +224,18 @@ static void pnv_phb_root_port_realize(DeviceState *dev, 
Error **errp)
  {
  PCIERootPortClass *rpc = PCIE_ROOT_PORT_GET_CLASS(dev);
  PnvPHBRootPort *phb_rp = PNV_PHB_ROOT_PORT(dev);
-    PCIBus *bus = PCI_BUS(qdev_get_parent_bus(dev));
+    BusState *qbus = qdev_get_parent_bus(dev);
+    PCIBus *bus = PCI_BUS(qbus);
  PCIDevice *pci = PCI_DEVICE(dev);
  uint16_t device_id = 0;
  Error *local_err = NULL;
  int chip_id, index;
+    /*
+ * 'index' will be used both as a PCIE slot value and to calculate
+ * QOM id. 'chip_id' is going to be used as PCIE chassis for the
+ * root port.
+ */
    chip_id = object_property_get_int(OBJECT(bus), "chip-id", &error_fatal);
    index = object_property_get_int(OBJECT(bus), "phb-id", &error_fatal);
@@ -253,6 +243,11 @@ static void pnv_phb_root_port_realize(DeviceState *dev, 
Error **errp)
  qdev_prop_set_uint8(dev, "chassis", chip_id);
  qdev_prop_set_uint16(dev, "slot", index);
+    pnv_parent_qom_fixup(OBJECT(bus), OBJECT(dev), index);
+    if (!qdev_set_parent_bus(dev, qbus, &error_fatal)) {



That line looks surprising at first, because we got qbus from qdev_get_parent_bus() 
just above, so it looks like a noop. I talked to Daniel about it: the 
device<->bus relationship is correct when entering the function but the call to 
pnv_parent_qom_fixup() interferes with the bus relationship, so it needs to be 
re-established.


The more detailed version: when adding a child property
using object_property_add_child(), object_property_try_add_child() is called.
This function internally calls object_property_try_add() with lots
of parameters, including a release callback named 
'object_finalize_child_property'.

This release callback is implemented like this:

static void object_finalize_child_property(Object *obj, const char *name,
   void *opaque)
{
Object *child = opaque;

if (child->class->unparent) {
(child->class->unparent)(child);
}
child->parent = NULL;
object_unref(child);
}

If you're adding a device as a child, which is our case here,
child->class->unparent is device_unparent(). Note that device_unparent()
removes the device from its parent_bus:

static void device_unparent(Object *obj)
{
DeviceState *dev = DEVICE(obj);
BusState *bus;

if (dev->realized) {
qdev_unrealize(dev);
}
while (dev->num_child_bus) {
 bus = QLIST_FIRST(&dev->child_bus);
object_unparent(OBJECT(bus));
}
if (dev->parent_bus) {

Re: [PATCH v2] ci: Upgrade msys2 release to 20220603

2022-08-16 Thread Yonggang Luo
I have reason to think the msys2 64-bit job failed because it ran out of memory.

I tried to show the memory size of the Windows Docker container; the result
was 6224352 KB, which is less than 6 GB.
https://gitlab.com/lygstate/qemu/-/jobs/2891399652

Can we increase the memory size to 16 GB?


On Sat, Jul 30, 2022 at 3:24 AM Richard Henderson <
richard.hender...@linaro.org> wrote:
>
> On 7/28/22 13:04, Yonggang Luo wrote:
> > Signed-off-by: Yonggang Luo 
> > ---
> >   .cirrus.yml  | 2 +-
> >   .gitlab-ci.d/windows.yml | 2 +-
> >   2 files changed, 2 insertions(+), 2 deletions(-)
>
> Thanks.  Applied to master as a hot-fix.
>
>
> r~



--
 此致
礼
罗勇刚
Yours
sincerely,
Yonggang Luo


Re: [PATCH v10 07/21] blockjob: introduce block_job _locked() APIs

2022-08-16 Thread Stefan Hajnoczi
On Mon, Jul 25, 2022 at 03:38:41AM -0400, Emanuele Giuseppe Esposito wrote:
> Just as done with job.h, create _locked() functions in blockjob.h
> 
> These functions will be later useful when caller has already taken
> the lock. All blockjob _locked functions call job _locked functions.
> 
> Note: at this stage, job_{lock/unlock} and job lock guard macros
> are *nop*.
> 
> Reviewed-by: Vladimir Sementsov-Ogievskiy 
> Signed-off-by: Emanuele Giuseppe Esposito 
> ---
>  blockjob.c   | 52 
>  include/block/blockjob.h | 18 ++
>  2 files changed, 60 insertions(+), 10 deletions(-)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


Re: [PATCH v10 06/21] job: move and update comments from blockjob.c

2022-08-16 Thread Stefan Hajnoczi
On Mon, Jul 25, 2022 at 03:38:40AM -0400, Emanuele Giuseppe Esposito wrote:
> This comment applies more on job, it was left in blockjob as in the past
> the whole job logic was implemented there.
> 
> Note: at this stage, job_{lock/unlock} and job lock guard macros
> are *nop*.
> 
> No functional change intended.
> 
> Reviewed-by: Vladimir Sementsov-Ogievskiy 
> Signed-off-by: Emanuele Giuseppe Esposito 
> ---
>  blockjob.c | 20 
>  job.c  | 14 ++
>  2 files changed, 14 insertions(+), 20 deletions(-)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


Re: [PATCH v10 05/21] job.c: add job_lock/unlock while keeping job.h intact

2022-08-16 Thread Stefan Hajnoczi
On Mon, Jul 25, 2022 at 03:38:39AM -0400, Emanuele Giuseppe Esposito wrote:
> With "intact" we mean that all job.h functions implicitly
> take the lock. Therefore API callers are unmodified.
> 
> This means that:
> - many static functions that will be always called with job lock held
>   become _locked, and call _locked functions
> - all public functions take the lock internally if needed, and call _locked
>   functions
> - all public functions called internally by other functions in job.c will 
> have a
>   _locked counterpart (sometimes public), to avoid deadlocks (job lock 
> already taken).
>   These functions are not used for now.
> - some public functions called only from external files (not job.c) do not
>   have _locked() counterpart and take the lock inside. Others won't need
>   the lock at all because use fields only set at initialization and
>   never modified.
> 
> job_{lock/unlock} is independent from real_job_{lock/unlock}.
> 
> Note: at this stage, job_{lock/unlock} and job lock guard macros
> are *nop*
> 
> Reviewed-by: Vladimir Sementsov-Ogievskiy 
> Signed-off-by: Emanuele Giuseppe Esposito 
> ---
>  include/qemu/job.h | 138 ++-
>  job.c  | 600 +++--
>  2 files changed, 553 insertions(+), 185 deletions(-)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature
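
A minimal sketch of the wrapper pattern the commit message describes, using
hypothetical names rather than the real job.c functions (the lock-guard macro
is the one noted above as still being a nop at this stage):

/* Called with the job lock already held, e.g. from other _locked helpers. */
static void job_do_something_locked(Job *job)
{
    /* ... touch Job fields protected by the job mutex ... */
}

/* Public wrapper: takes the job mutex, then delegates to the _locked variant. */
void job_do_something(Job *job)
{
    JOB_LOCK_GUARD();
    job_do_something_locked(job);
}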


Re: [PATCH v10 02/21] job.h: categorize fields in struct Job

2022-08-16 Thread Stefan Hajnoczi
On Mon, Jul 25, 2022 at 03:38:36AM -0400, Emanuele Giuseppe Esposito wrote:
> Categorize the fields in struct Job to understand which ones
> need to be protected by the job mutex and which don't.
> 
> Signed-off-by: Emanuele Giuseppe Esposito 
> Reviewed-by: Vladimir Sementsov-Ogievskiy 
> ---
>  include/qemu/job.h | 61 +++---
>  1 file changed, 36 insertions(+), 25 deletions(-)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


Re: [PULL 0/2] Two small fixes for QEMU 7.1-rc3

2022-08-16 Thread Richard Henderson

On 8/16/22 04:58, Thomas Huth wrote:

  Hi Richard!

Two minor fixes for rc3. If this is too late for rc3, please feel free
to ignore, I think they are not severe enough to justify an rc4 later.

The following changes since commit d102b8162a1e5fe8288d4d5c01801ce6536ac2d1:

   Merge tag 'pull-la-20220814' of https://gitlab.com/rth7680/qemu into staging 
(2022-08-14 08:48:11 -0500)

are available in the Git repository at:

   https://gitlab.com/thuth/qemu.git tags/pull-request-2022-08-16

for you to fetch changes up to effaf5a240e03020f4ae953e10b764622c3e87cc:

   hw/usb/hcd-xhci: Fix unbounded loop in xhci_ring_chain_length() 
(CVE-2020-14394) (2022-08-16 11:37:19 +0200)


* Fix a possible endless loop in USB XHCI code
* Minor fixes for the new readconfig test


Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/7.1 as 
appropriate.


r~





Daniel P. Berrangé (1):
   tests/qtest: misc tweaks to readconfig

Thomas Huth (1):
   hw/usb/hcd-xhci: Fix unbounded loop in xhci_ring_chain_length() 
(CVE-2020-14394)

  hw/usb/hcd-xhci.c | 23 +++
  tests/qtest/readconfig-test.c | 12 ++--
  2 files changed, 25 insertions(+), 10 deletions(-)






Re: [PATCH v7 4/8] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls

2022-08-16 Thread Damien Le Moal
On 2022/08/15 23:25, Sam Li wrote:
> By adding zone management operations in BlockDriver, storage controller
> emulation can use the new block layer APIs including Report Zone and
> four zone management operations (open, close, finish, reset).
> 
> Add zoned storage commands of the device: zone_report(zrp), zone_open(zo),
> zone_close(zc), zone_reset(zrs), zone_finish(zf).
> 
> For example, to test zone_report, use following command:
> $ ./build/qemu-io --image-opts driver=zoned_host_device, filename=/dev/nullb0
> -c "zrp offset nr_zones"
> 
> Signed-off-by: Sam Li 
> Reviewed-by: Hannes Reinecke 
> ---
>  block/block-backend.c |  50 +
>  block/file-posix.c| 341 +-
>  block/io.c|  41 
>  include/block/block-common.h  |   1 -
>  include/block/block-io.h  |  13 ++
>  include/block/block_int-common.h  |  22 +-
>  include/block/raw-aio.h   |   6 +-
>  include/sysemu/block-backend-io.h |   6 +
>  meson.build   |   1 +
>  qapi/block-core.json  |   8 +-
>  qemu-io-cmds.c| 143 +
>  11 files changed, 625 insertions(+), 7 deletions(-)
> 
> diff --git a/block/block-backend.c b/block/block-backend.c
> index d4a5df2ac2..fc639b0cd7 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -1775,6 +1775,56 @@ int coroutine_fn blk_co_flush(BlockBackend *blk)
>  return ret;
>  }
>  
> +/*
> + * Send a zone_report command.
> + * offset is a byte offset from the start of the device. No alignment
> + * required for offset.
> + * nr_zones represents IN maximum and OUT actual.
> + */
> +int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset,
> +unsigned int *nr_zones,
> +BlockZoneDescriptor *zones)
> +{
> +int ret;
> +IO_CODE();
> +
> +blk_inc_in_flight(blk); /* increase before waiting */
> +blk_wait_while_drained(blk);
> +if (!blk_is_available(blk)) {
> +blk_dec_in_flight(blk);
> +return -ENOMEDIUM;
> +}
> +ret = bdrv_co_zone_report(blk_bs(blk), offset, nr_zones, zones);
> +blk_dec_in_flight(blk);
> +return ret;
> +}
> +
> +/*
> + * Send a zone_management command.
> + * offset is the starting zone specified as a sector offset.
> + * len is the maximum number of sectors the command should operate on.

You should mention that len should be zone size aligned. Also, for completeness,
add a short description of the op argument too?

> + */
> +int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
> +int64_t offset, int64_t len)
> +{
> +int ret;
> +IO_CODE();
> +
> +ret = blk_check_byte_request(blk, offset, len);
> +if (ret < 0) {
> +return ret;
> +}
> +blk_inc_in_flight(blk);
> +blk_wait_while_drained(blk);
> +if (!blk_is_available(blk)) {
> +blk_dec_in_flight(blk);
> +return -ENOMEDIUM;
> +}
> +ret = bdrv_co_zone_mgmt(blk_bs(blk), op, offset, len);
> +blk_dec_in_flight(blk);
> +return ret;
> +}
> +
>  void blk_drain(BlockBackend *blk)
>  {
>  BlockDriverState *bs = blk_bs(blk);
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 727389488c..29f67082d9 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -67,6 +67,9 @@
>  #include 
>  #include 
>  #include 
> +#if defined(CONFIG_BLKZONED)
> +#include <linux/blkzoned.h>
> +#endif
>  #include 
>  #include 
>  #include 
> @@ -216,6 +219,13 @@ typedef struct RawPosixAIOData {
>  PreallocMode prealloc;
>  Error **errp;
>  } truncate;
> +struct {
> +unsigned int *nr_zones;
> +BlockZoneDescriptor *zones;
> +} zone_report;
> +struct {
> +unsigned long ioctl_op;

Maybe clarify this field's usage by calling it zone_op?

> +} zone_mgmt;
>  };
>  } RawPosixAIOData;
>  
> @@ -1328,7 +1338,7 @@ static void raw_refresh_limits(BlockDriverState *bs, 
> Error **errp)
>  #endif
>  
>  if (bs->sg || S_ISBLK(st.st_mode)) {
> -int ret = hdev_get_max_hw_transfer(s->fd, &st);
> +ret = hdev_get_max_hw_transfer(s->fd, &st);
>  
>  if (ret > 0 && ret <= BDRV_REQUEST_MAX_BYTES) {
>  bs->bl.max_hw_transfer = ret;
> @@ -1340,11 +1350,32 @@ static void raw_refresh_limits(BlockDriverState *bs, 
> Error **errp)
>  }
>  }
>  
> -ret = get_sysfs_zoned_model(s->fd, &st, &zoned);
> +ret = get_sysfs_zoned_model(&st, &zoned);
>  if (ret < 0) {
>  zoned = BLK_Z_NONE;
>  }
>  bs->bl.zoned = zoned;
> +if (zoned != BLK_Z_NONE) {
> +ret = get_sysfs_long_val(, "chunk_sectors");
> +if (ret > 0) {
> +bs->bl.zone_sectors = ret;
> +}
> +
> +ret = get_sysfs_long_val(, "zone_append_max_bytes");
> +if (ret > 0) {
> +bs->bl.zone_append_max_bytes = ret;
> +}
> +
> +ret 

Re: [PATCH v7 3/8] file-posix: introduce get_sysfs_long_val for the long sysfs attribute

2022-08-16 Thread Sam Li
Damien Le Moal  于2022年8月17日周三 01:35写道:
>
> On 2022/08/15 23:25, Sam Li wrote:
> > Use sysfs attribute files to get the long value of zoned device
> > information.
> >
> > Signed-off-by: Sam Li 
> > Reviewed-by: Hannes Reinecke 
> > Reviewed-by: Stefan Hajnoczi 
> > ---
> >  block/file-posix.c | 27 +++
> >  1 file changed, 27 insertions(+)
> >
> > diff --git a/block/file-posix.c b/block/file-posix.c
> > index c07ac4c697..727389488c 100644
> > --- a/block/file-posix.c
> > +++ b/block/file-posix.c
> > @@ -1258,6 +1258,33 @@ static int get_sysfs_zoned_model(struct stat *st, 
> > BlockZoneModel *zoned) {
> >  return 0;
> >  }
> >
> > +/*
> > + * Get zoned device information (chunk_sectors, zoned_append_max_bytes,
> > + * max_open_zones, max_active_zones) through sysfs attribute files.
> > + */
>
> The comment here needs to be more generic since this helper is used in patch 2
> in hdev_get_max_segments(). So simply something like:
>
> /*
>  * Get a sysfs attribute value as a long integer.
>  */
>
> And since this helper is used in patch 2, this patch needs to go before patch 
> 2
> (reverse patch 2 and 3 order).

Can I merge patch 2 and patch 3 into one patch? In patch 2 the call chain is
hdev_get_max_segments -> get_sysfs_long_val (-> get_sysfs_str_val),
while in patch 3 it is get_sysfs_long_val -> get_sysfs_str_val, and
hdev_get_max_segments is required for QEMU's setup, I guess, so the
dependency is intertwined here. If we use separate patches, then the
last patch will modify the first patch's code, which I think is messy.

>
> > +static long get_sysfs_long_val(struct stat *st, const char *attribute) {
> > +#ifdef CONFIG_LINUX
> > +g_autofree char *str = NULL;
> > +const char *end;
> > +long val;
> > +int ret;
> > +
> > > +ret = get_sysfs_str_val(st, attribute, &str);
> > +if (ret < 0) {
> > +return ret;
> > +}
> > +
> > +/* The file is ended with '\n', pass 'end' to accept that. */
> > > +ret = qemu_strtol(str, &end, 10, &val);
> > +if (ret == 0 && end && *end == '\n') {
> > +ret = val;
> > +}
> > +return ret;
> > +#else
> > +return -ENOTSUP;
> > +#endif
> > +}
> > +
> >  static int hdev_get_max_segments(int fd, struct stat *st) {
> >  int ret;
> >  if (S_ISCHR(st->st_mode)) {
>
>
> --
> Damien Le Moal
> Western Digital Research



[PATCH for-7.2 v3 18/20] device_tree.c: enable 'info fdt' to print subnodes

2022-08-16 Thread Daniel Henrique Barboza
Printing subnodes of a given node will allow us to show a whole subtree,
with the additional perk of 'info fdt /' being able to print the whole
FDT.

Since we're now printing more than one subnode, change 'info fdt' to
print the full path of the first node. This small tweak helps
identify which node or subnode is being displayed.

To demonstrate this capability without printing the whole FDT, the
'/cpus/cpu-map' node from the ARM 'virt' machine has a lot of subnodes:

(qemu) info fdt /cpus/cpu-map
/cpus/cpu-map {
socket0 {
cluster0 {
core0 {
cpu = <0x8001>
}
}
}
}

Signed-off-by: Daniel Henrique Barboza 
---
 softmmu/device_tree.c | 21 +
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/softmmu/device_tree.c b/softmmu/device_tree.c
index 43f96e371b..a6bfbc0617 100644
--- a/softmmu/device_tree.c
+++ b/softmmu/device_tree.c
@@ -766,17 +766,26 @@ static void fdt_prop_format_val(GString *buf, const char 
*propname,
 g_string_append_printf(buf, "];\n");
 }
 
-static void fdt_format_node(GString *buf, int node, int depth)
+
+static void fdt_format_node(GString *buf, int node, int depth,
+const char *fullpath)
 {
 const struct fdt_property *prop = NULL;
+const char *nodename = NULL;
 const char *propname = NULL;
 void *fdt = current_machine->fdt;
 int padding = depth * 4;
 int property = 0;
+int parent = node;
 int prop_size;
 
-g_string_append_printf(buf, "%*s%s {\n", padding, "",
-   fdt_get_name(fdt, node, NULL));
+if (fullpath != NULL) {
+nodename = fullpath;
+} else {
+nodename = fdt_get_name(fdt, node, NULL);
+}
+
+g_string_append_printf(buf, "%*s%s {\n", padding, "", nodename);
 
 padding += 4;
 
@@ -801,6 +810,10 @@ static void fdt_format_node(GString *buf, int node, int 
depth)
 }
 }
 
+fdt_for_each_subnode(node, fdt, parent) {
+fdt_format_node(buf, node, depth + 1, NULL);
+}
+
 padding -= 4;
 g_string_append_printf(buf, "%*s}\n", padding, "");
 }
@@ -821,7 +834,7 @@ HumanReadableText *qemu_fdt_qmp_query_fdt(const char 
*nodepath, Error **errp)
 return NULL;
 }
 
-fdt_format_node(buf, node, 0);
+fdt_format_node(buf, node, 0, nodepath);
 
 return human_readable_text_from_str(buf);
 }
-- 
2.37.2




[PATCH for-7.2 v3 20/20] hmp, device_tree.c: add 'info fdt ' support

2022-08-16 Thread Daniel Henrique Barboza
'info fdt' is only able to print full nodes so far. It would be good to
be able to also print single properties, since sometimes we just want
to verify a single value from the FDT.

libfdt does not have support to find a property given its full path, but
it does have a way to return a fdt_property given a prop name and its
subnode.

Add a new optional 'propname' parameter to x-query-fdt to specify the
property of a given node. If it's present, we'll proceed to find the
node as usual but, instead of printing the node, we'll attempt to find
the property and print it standalone.

After this change, if an user wants to print just the value of 'cpu' inside
/cpu/cpu-map(...) from an ARM FDT, we can do it:

(qemu) info fdt /cpus/cpu-map/socket0/cluster0/core0 cpu
/cpus/cpu-map/socket0/cluster0/core0/cpu = <0x8001>

Or the 'ibm,my-dma-window' from the v-scsi device inside the pSeries
FDT:

(qemu) info fdt /vdevice/v-scsi@7103 ibm,my-dma-window
/vdevice/v-scsi@7103/ibm,my-dma-window = <0x7103 0x0 0x0 0x0 0x1000>
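
A rough sketch of the lookup the commit message describes (illustrative only;
the actual change is in softmmu/device_tree.c): resolve the node by path, then
ask libfdt for the named property of that node.

    int node = fdt_path_offset(fdt, nodepath);
    if (node >= 0) {
        int prop_size;
        const struct fdt_property *prop =
            fdt_get_property(fdt, node, propname, &prop_size);
        if (prop) {
            /* format prop->data (prop_size bytes) like a node property */
        }
    }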

Cc: Dr. David Alan Gilbert 
Acked-by: Dr. David Alan Gilbert 
Signed-off-by: Daniel Henrique Barboza 
---
 hmp-commands-info.hx |  9 +
 include/sysemu/device_tree.h |  2 ++
 monitor/hmp-cmds.c   |  5 -
 monitor/qmp-cmds.c   |  8 +---
 qapi/machine.json|  4 +++-
 softmmu/device_tree.c| 29 -
 6 files changed, 43 insertions(+), 14 deletions(-)

diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
index 743b48865d..17d6ee4d30 100644
--- a/hmp-commands-info.hx
+++ b/hmp-commands-info.hx
@@ -924,13 +924,14 @@ ERST
 
 {
 .name   = "fdt",
-.args_type  = "nodepath:s",
-.params = "nodepath",
-.help   = "show firmware device tree node given its path",
+.args_type  = "nodepath:s,propname:s?",
+.params = "nodepath [propname]",
+.help   = "show firmware device tree node or property given its 
path",
 .cmd= hmp_info_fdt,
 },
 
 SRST
   ``info fdt``
-Show a firmware device tree node given its path. Requires libfdt.
+Show a firmware device tree node or property given its path.
+Requires libfdt.
 ERST
diff --git a/include/sysemu/device_tree.h b/include/sysemu/device_tree.h
index 057d13e397..551a02dee2 100644
--- a/include/sysemu/device_tree.h
+++ b/include/sysemu/device_tree.h
@@ -140,6 +140,8 @@ int qemu_fdt_add_path(void *fdt, const char *path);
 void qemu_fdt_dumpdtb(void *fdt, int size);
 void qemu_fdt_qmp_dumpdtb(const char *filename, Error **errp);
 HumanReadableText *qemu_fdt_qmp_query_fdt(const char *nodepath,
+  bool has_propname,
+  const char *propname,
   Error **errp);
 
 /**
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index accde90380..df8493adc5 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -2488,8 +2488,11 @@ void hmp_dumpdtb(Monitor *mon, const QDict *qdict)
 void hmp_info_fdt(Monitor *mon, const QDict *qdict)
 {
 const char *nodepath = qdict_get_str(qdict, "nodepath");
+const char *propname = qdict_get_try_str(qdict, "propname");
 Error *err = NULL;
-g_autoptr(HumanReadableText) info = qmp_x_query_fdt(nodepath, &err);
+g_autoptr(HumanReadableText) info = NULL;
+
+info = qmp_x_query_fdt(nodepath, propname != NULL, propname, &err);
 
 if (hmp_handle_error(mon, err)) {
 return;
diff --git a/monitor/qmp-cmds.c b/monitor/qmp-cmds.c
index db2c6aa7da..ca2a96cdf7 100644
--- a/monitor/qmp-cmds.c
+++ b/monitor/qmp-cmds.c
@@ -604,9 +604,10 @@ void qmp_dumpdtb(const char *filename, Error **errp)
 return qemu_fdt_qmp_dumpdtb(filename, errp);
 }
 
-HumanReadableText *qmp_x_query_fdt(const char *nodepath, Error **errp)
+HumanReadableText *qmp_x_query_fdt(const char *nodepath, bool has_propname,
+   const char *propname, Error **errp)
 {
-return qemu_fdt_qmp_query_fdt(nodepath, errp);
+return qemu_fdt_qmp_query_fdt(nodepath, has_propname, propname, errp);
 }
 #else
 void qmp_dumpdtb(const char *filename, Error **errp)
@@ -614,7 +615,8 @@ void qmp_dumpdtb(const char *filename, Error **errp)
 error_setg(errp, "dumpdtb requires libfdt");
 }
 
-HumanReadableText *qmp_x_query_fdt(const char *nodepath, Error **errp)
+HumanReadableText *qmp_x_query_fdt(const char *nodepath, bool has_propname,
+   const char *propname, Error **errp)
 {
 error_setg(errp, "this command requires libfdt");
 
diff --git a/qapi/machine.json b/qapi/machine.json
index 96cff541ca..c15ce60f46 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -1688,6 +1688,7 @@
 # Query for FDT element (node or property). Requires 'libfdt'.
 #
 # @nodepath: the path of the FDT node to be retrieved
+# @propname: name of the property inside the node
 #
 # Features:
 # @unstable: This command is meant 

Re: [RFC PATCH] tests/avocado: push default timeout to QemuBaseTest

2022-08-16 Thread Richard Henderson

On 8/16/22 08:38, Alex Bennée wrote:

All of the QEMU tests eventually end up derived from this class. Move
the default timeout from LinuxTest to ensure we catch them all.

Signed-off-by: Alex Bennée 
---
  tests/avocado/avocado_qemu/__init__.py | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tests/avocado/avocado_qemu/__init__.py 
b/tests/avocado/avocado_qemu/__init__.py
index ed4853c805..9d17a287cf 100644
--- a/tests/avocado/avocado_qemu/__init__.py
+++ b/tests/avocado/avocado_qemu/__init__.py
@@ -227,6 +227,10 @@ def exec_command_and_wait_for_pattern(test, command,
  _console_interaction(test, success_message, failure_message, command + 
'\r')
  
  class QemuBaseTest(avocado.Test):

+
+# default timeout for all tests, can be overridden
+timeout = 900
+
  def _get_unique_tag_val(self, tag_name):
  """
  Gets a tag value, if unique for a key
@@ -512,7 +516,6 @@ class LinuxTest(LinuxSSHMixIn, QemuSystemTest):
  to start with than the more vanilla `QemuSystemTest` class.
  """
  
-timeout = 900


Is 15 minutes really a reasonable default?


r~


  distro = None
  username = 'root'
  password = 'password'





[PATCH for-7.2 v3 17/20] device_tree.c: support remaining FDT prop types

2022-08-16 Thread Daniel Henrique Barboza
When printing a blob with 'dtc' using the '-O dts' option there are 3
distinct data types being printed: strings, arrays of uint32s and
regular byte arrays.

Previous patch added support to print strings. Let's add the remaining
formats. We want to resemble the format that 'dtc -O dts' uses, so every
uint32 array uses angle brackets (<>), and regular byte array uses square
brackets ([]). For properties that has no values we keep printing just
its name.

The /chosen FDT node from the pSeris machine gives an example of all
property types 'info fdt' is now able to display:

(qemu) info fdt /chosen
chosen {
ibm,architecture-vec-5 = [00 00];
rng-seed = <0x9cf5071b 0xf8804213 0xbe797764 0xad3d955 0xe0c9637 0x1f99c61e 
0xe9243741 0xe800f17d>;
ibm,arch-vec-5-platform-support = <0x178018c0 0x19001a40>;
linux,pci-probe-only = <0x0>;
stdout-path = "/vdevice/vty@7100";
linux,stdout-path = "/vdevice/vty@7100";
qemu,graphic-depth = <0x20>;
qemu,graphic-height = <0x258>;
qemu,graphic-width = <0x320>;
}

Signed-off-by: Daniel Henrique Barboza 
---
 softmmu/device_tree.c | 57 ++-
 1 file changed, 56 insertions(+), 1 deletion(-)

diff --git a/softmmu/device_tree.c b/softmmu/device_tree.c
index d32d6856da..43f96e371b 100644
--- a/softmmu/device_tree.c
+++ b/softmmu/device_tree.c
@@ -720,6 +720,52 @@ static void fdt_prop_format_string_array(GString *buf,
 g_string_append_printf(buf, ";\n");
 }
 
+static bool fdt_prop_is_uint32_array(int size)
+{
+return size % 4 == 0;
+}
+
+static void fdt_prop_format_uint32_array(GString *buf,
+ const char *propname,
+ const void *data,
+ int prop_size, int padding)
+{
+const fdt32_t *array = data;
+int array_len = prop_size / 4;
+int i;
+
+g_string_append_printf(buf, "%*s%s = <", padding, "", propname);
+
+for (i = 0; i < array_len; i++) {
+g_string_append_printf(buf, "0x%" PRIx32, fdt32_to_cpu(array[i]));
+
+if (i < array_len - 1) {
+g_string_append_printf(buf, " ");
+}
+}
+
+g_string_append_printf(buf, ">;\n");
+}
+
+static void fdt_prop_format_val(GString *buf, const char *propname,
+const void *data, int prop_size,
+int padding)
+{
+const char *val = data;
+int i;
+
+g_string_append_printf(buf, "%*s%s = [", padding, "", propname);
+
+for (i = 0; i < prop_size; i++) {
+g_string_append_printf(buf, "%02x", val[i]);
+if (i < prop_size - 1) {
+g_string_append_printf(buf, " ");
+}
+}
+
+g_string_append_printf(buf, "];\n");
+}
+
 static void fdt_format_node(GString *buf, int node, int depth)
 {
 const struct fdt_property *prop = NULL;
@@ -738,11 +784,20 @@ static void fdt_format_node(GString *buf, int node, int 
depth)
 prop = fdt_get_property_by_offset(fdt, property, &prop_size);
 propname = fdt_string(fdt, fdt32_to_cpu(prop->nameoff));
 
+if (prop_size == 0) {
+g_string_append_printf(buf, "%*s%s;\n", padding, "", propname);
+continue;
+}
+
 if (fdt_prop_is_string_array(prop->data, prop_size)) {
 fdt_prop_format_string_array(buf, propname, prop->data,
  prop_size, padding);
+} else if (fdt_prop_is_uint32_array(prop_size)) {
+fdt_prop_format_uint32_array(buf, propname, prop->data, prop_size,
+ padding);
 } else {
-g_string_append_printf(buf, "%*s%s;\n", padding, "", propname);
+fdt_prop_format_val(buf, propname, prop->data,
+prop_size, padding);
 }
 }
 
-- 
2.37.2




Re: [PATCH v7 3/8] file-posix: introduce get_sysfs_long_val for the long sysfs attribute

2022-08-16 Thread Damien Le Moal
On 2022/08/15 23:25, Sam Li wrote:
> Use sysfs attribute files to get the long value of zoned device
> information.
> 
> Signed-off-by: Sam Li 
> Reviewed-by: Hannes Reinecke 
> Reviewed-by: Stefan Hajnoczi 
> ---
>  block/file-posix.c | 27 +++
>  1 file changed, 27 insertions(+)
> 
> diff --git a/block/file-posix.c b/block/file-posix.c
> index c07ac4c697..727389488c 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -1258,6 +1258,33 @@ static int get_sysfs_zoned_model(struct stat *st, 
> BlockZoneModel *zoned) {
>  return 0;
>  }
>  
> +/*
> + * Get zoned device information (chunk_sectors, zoned_append_max_bytes,
> + * max_open_zones, max_active_zones) through sysfs attribute files.
> + */

The comment here needs to be more generic since this helper is used in patch 2
in hdev_get_max_segments(). So simply something like:

/*
 * Get a sysfs attribute value as a long integer.
 */

And since this helper is used in patch 2, this patch needs to go before patch 2
(reverse patch 2 and 3 order).

> +static long get_sysfs_long_val(struct stat *st, const char *attribute) {
> +#ifdef CONFIG_LINUX
> +g_autofree char *str = NULL;
> +const char *end;
> +long val;
> +int ret;
> +
> +ret = get_sysfs_str_val(st, attribute, &str);
> +if (ret < 0) {
> +return ret;
> +}
> +
> +/* The file is ended with '\n', pass 'end' to accept that. */
> +ret = qemu_strtol(str, &end, 10, &val);
> +if (ret == 0 && end && *end == '\n') {
> +ret = val;
> +}
> +return ret;
> +#else
> +return -ENOTSUP;
> +#endif
> +}
> +
>  static int hdev_get_max_segments(int fd, struct stat *st) {
>  int ret;
>  if (S_ISCHR(st->st_mode)) {


-- 
Damien Le Moal
Western Digital Research



[PATCH for-7.2 v3 12/20] hw/riscv: set machine->fdt in spike_board_init()

2022-08-16 Thread Daniel Henrique Barboza
This will enable support for 'dumpdtb' and 'info fdt' HMP commands for
the spike machine.

Cc: Palmer Dabbelt 
Cc: Alistair Francis 
Cc: Bin Meng 
Reviewed-by: Alistair Francis 
Signed-off-by: Daniel Henrique Barboza 
---
 hw/riscv/spike.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/hw/riscv/spike.c b/hw/riscv/spike.c
index e41b6aa9f0..17f517bfa6 100644
--- a/hw/riscv/spike.c
+++ b/hw/riscv/spike.c
@@ -40,6 +40,8 @@
 #include "sysemu/device_tree.h"
 #include "sysemu/sysemu.h"
 
+#include <libfdt.h>
+
 static const MemMapEntry spike_memmap[] = {
 [SPIKE_MROM] = { 0x1000, 0xf000 },
 [SPIKE_HTIF] = {  0x100, 0x1000 },
@@ -304,6 +306,13 @@ static void spike_board_init(MachineState *machine)
 /* Compute the fdt load address in dram */
 fdt_load_addr = riscv_load_fdt(memmap[SPIKE_DRAM].base,
machine->ram_size, s->fdt);
+
+/*
+ * Update the machine->fdt pointer to enable support for
+ * 'dumpdtb' and 'info fdt' QMP/HMP commands.
+ */
+machine->fdt = s->fdt;
+
 /* load the reset vector */
 riscv_setup_rom_reset_vec(machine, &s->soc[0], memmap[SPIKE_DRAM].base,
   memmap[SPIKE_MROM].base,
-- 
2.37.2




Re: [PATCH v7 3/8] file-posix: introduce get_sysfs_long_val for the long sysfs attribute

2022-08-16 Thread Damien Le Moal
On 2022/08/16 10:53, Sam Li wrote:
> Damien Le Moal  于2022年8月17日周三 01:35写道:
>>
>> On 2022/08/15 23:25, Sam Li wrote:
>>> Use sysfs attribute files to get the long value of zoned device
>>> information.
>>>
>>> Signed-off-by: Sam Li 
>>> Reviewed-by: Hannes Reinecke 
>>> Reviewed-by: Stefan Hajnoczi 
>>> ---
>>>  block/file-posix.c | 27 +++
>>>  1 file changed, 27 insertions(+)
>>>
>>> diff --git a/block/file-posix.c b/block/file-posix.c
>>> index c07ac4c697..727389488c 100644
>>> --- a/block/file-posix.c
>>> +++ b/block/file-posix.c
>>> @@ -1258,6 +1258,33 @@ static int get_sysfs_zoned_model(struct stat *st, 
>>> BlockZoneModel *zoned) {
>>>  return 0;
>>>  }
>>>
>>> +/*
>>> + * Get zoned device information (chunk_sectors, zoned_append_max_bytes,
>>> + * max_open_zones, max_active_zones) through sysfs attribute files.
>>> + */
>>
>> The comment here needs to be more generic since this helper is used in patch 
>> 2
>> in hdev_get_max_segments(). So simply something like:
>>
>> /*
>>  * Get a sysfs attribute value as a long integer.
>>  */
>>
>> And since this helper is used in patch 2, this patch needs to go before 
>> patch 2
>> (reverse patch 2 and 3 order).
> 
> Can I merge patch2 and patch 3 into one patch? Because in patch 2
> hdev_get_max_segments -> get_sysfs_long_val(-> get_sysfs_str_val)
> while in patch 3 get_sysfs_long_val-> get_sysfs_str_val,
> hdev_get_max_segments is required for qemu setting up I guess so the
> dependency is intertwined here. If we use separate patches, then the
> last patch will modify the first patch's code, which I think is messy.

Indeed. So merge the 2 patches to solve this. Rework the commit message too to
mention the introduction of the get_sysfs_long_val() helper.

> 
>>
>>> +static long get_sysfs_long_val(struct stat *st, const char *attribute) {
>>> +#ifdef CONFIG_LINUX
>>> +g_autofree char *str = NULL;
>>> +const char *end;
>>> +long val;
>>> +int ret;
>>> +
>>> +ret = get_sysfs_str_val(st, attribute, &str);
>>> +if (ret < 0) {
>>> +return ret;
>>> +}
>>> +
>>> +/* The file is ended with '\n', pass 'end' to accept that. */
>>> +ret = qemu_strtol(str, &end, 10, &val);
>>> +if (ret == 0 && end && *end == '\n') {
>>> +ret = val;
>>> +}
>>> +return ret;
>>> +#else
>>> +return -ENOTSUP;
>>> +#endif
>>> +}
>>> +
>>>  static int hdev_get_max_segments(int fd, struct stat *st) {
>>>  int ret;
>>>  if (S_ISCHR(st->st_mode)) {
>>
>>
>> --
>> Damien Le Moal
>> Western Digital Research


-- 
Damien Le Moal
Western Digital Research



[PATCH for-7.2 v3 13/20] hw/xtensa: set machine->fdt in xtfpga_init()

2022-08-16 Thread Daniel Henrique Barboza
This will enable support for 'dumpdtb' and 'info fdt' HMP commands for
all xtensa machines that use an FDT.

Signed-off-by: Daniel Henrique Barboza 
---
 hw/xtensa/meson.build | 2 +-
 hw/xtensa/xtfpga.c| 9 -
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/hw/xtensa/meson.build b/hw/xtensa/meson.build
index 1d5835df4b..ebba51cc74 100644
--- a/hw/xtensa/meson.build
+++ b/hw/xtensa/meson.build
@@ -6,6 +6,6 @@ xtensa_ss.add(files(
 ))
 xtensa_ss.add(when: 'CONFIG_XTENSA_SIM', if_true: files('sim.c'))
 xtensa_ss.add(when: 'CONFIG_XTENSA_VIRT', if_true: files('virt.c'))
-xtensa_ss.add(when: 'CONFIG_XTENSA_XTFPGA', if_true: files('xtfpga.c'))
+xtensa_ss.add(when: 'CONFIG_XTENSA_XTFPGA', if_true: [files('xtfpga.c'), fdt])
 
 hw_arch += {'xtensa': xtensa_ss}
diff --git a/hw/xtensa/xtfpga.c b/hw/xtensa/xtfpga.c
index 2a5556a35f..9e2f911caa 100644
--- a/hw/xtensa/xtfpga.c
+++ b/hw/xtensa/xtfpga.c
@@ -50,6 +50,8 @@
 #include "hw/xtensa/mx_pic.h"
 #include "migration/vmstate.h"
 
+#include <libfdt.h>
+
 typedef struct XtfpgaFlashDesc {
 hwaddr base;
 size_t size;
@@ -377,7 +379,12 @@ static void xtfpga_init(const XtfpgaBoardDesc *board, 
MachineState *machine)
 cur_tagptr = put_tag(cur_tagptr, BP_TAG_FDT,
 sizeof(dtb_addr), &dtb_addr);
 cur_lowmem = QEMU_ALIGN_UP(cur_lowmem + fdt_size, 4 * KiB);
-g_free(fdt);
+
+/*
+ * Update the machine->fdt pointer to enable support for
+ * 'dumpdtb' and 'info fdt' QMP/HMP commands.
+ */
+machine->fdt = fdt;
 }
 #else
 if (dtb_filename) {
-- 
2.37.2




[PATCH for-7.2 v3 11/20] hw/riscv: set machine->fdt in sifive_u_machine_init()

2022-08-16 Thread Daniel Henrique Barboza
This will enable support for 'dumpdtb' and 'info fdt' HMP commands for
the sifive_u machine.

Cc: Alistair Francis 
Cc: Bin Meng 
Cc: Palmer Dabbelt 
Reviewed-by: Alistair Francis 
Signed-off-by: Daniel Henrique Barboza 
---
 hw/riscv/sifive_u.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/hw/riscv/sifive_u.c b/hw/riscv/sifive_u.c
index e4c814a3ea..f14d8411df 100644
--- a/hw/riscv/sifive_u.c
+++ b/hw/riscv/sifive_u.c
@@ -634,6 +634,12 @@ static void sifive_u_machine_init(MachineState *machine)
 start_addr_hi32 = (uint64_t)start_addr >> 32;
 }
 
+/*
+ * Update the machine->fdt pointer to enable support for
+ * 'dumpdtb' and 'info fdt' QMP/HMP commands.
+ */
+machine->fdt = s->fdt;
+
 /* reset vector */
 uint32_t reset_vec[12] = {
 s->msel,   /* MSEL pin state */
-- 
2.37.2




[PATCH for-7.2 v3 19/20] device_tree.c: add fdt_format_property() helper

2022-08-16 Thread Daniel Henrique Barboza
We want to be able to also print properties with 'info fdt'.

Create a helper to format properties based on the already existing code
from fdt_format_node().

Signed-off-by: Daniel Henrique Barboza 
---
 softmmu/device_tree.c | 35 ---
 1 file changed, 20 insertions(+), 15 deletions(-)

diff --git a/softmmu/device_tree.c b/softmmu/device_tree.c
index a6bfbc0617..9e681739bd 100644
--- a/softmmu/device_tree.c
+++ b/softmmu/device_tree.c
@@ -766,6 +766,25 @@ static void fdt_prop_format_val(GString *buf, const char 
*propname,
 g_string_append_printf(buf, "];\n");
 }
 
+static void fdt_format_property(GString *buf, const char *propname,
+const void *data, int prop_size,
+int padding)
+{
+if (prop_size == 0) {
+g_string_append_printf(buf, "%*s%s;\n", padding, "", propname);
+return;
+}
+
+if (fdt_prop_is_string_array(data, prop_size)) {
+fdt_prop_format_string_array(buf, propname, data, prop_size,
+ padding);
+} else if (fdt_prop_is_uint32_array(prop_size)) {
+fdt_prop_format_uint32_array(buf, propname, data, prop_size,
+ padding);
+} else {
+fdt_prop_format_val(buf, propname, data, prop_size, padding);
+}
+}
 
 static void fdt_format_node(GString *buf, int node, int depth,
 const char *fullpath)
@@ -793,21 +812,7 @@ static void fdt_format_node(GString *buf, int node, int 
depth,
 prop = fdt_get_property_by_offset(fdt, property, &prop_size);
 propname = fdt_string(fdt, fdt32_to_cpu(prop->nameoff));
 
-if (prop_size == 0) {
-g_string_append_printf(buf, "%*s%s;\n", padding, "", propname);
-continue;
-}
-
-if (fdt_prop_is_string_array(prop->data, prop_size)) {
-fdt_prop_format_string_array(buf, propname, prop->data,
- prop_size, padding);
-} else if (fdt_prop_is_uint32_array(prop_size)) {
-fdt_prop_format_uint32_array(buf, propname, prop->data, prop_size,
- padding);
-} else {
-fdt_prop_format_val(buf, propname, prop->data,
-prop_size, padding);
-}
+fdt_format_property(buf, propname, prop->data, prop_size, padding);
 }
 
 fdt_for_each_subnode(node, fdt, parent) {
-- 
2.37.2




[PATCH for-7.2 v3 06/20] hw/ppc: set machine->fdt in sam460ex_load_device_tree()

2022-08-16 Thread Daniel Henrique Barboza
This will enable support for 'dumpdtb' and 'info fdt' HMP commands for
the sam460ex machine.

Cc: BALATON Zoltan 
Signed-off-by: Daniel Henrique Barboza 
---
 hw/ppc/sam460ex.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/sam460ex.c b/hw/ppc/sam460ex.c
index 0357ee077f..413a425d37 100644
--- a/hw/ppc/sam460ex.c
+++ b/hw/ppc/sam460ex.c
@@ -138,6 +138,7 @@ static int sam460ex_load_device_tree(hwaddr addr,
  hwaddr initrd_size,
  const char *kernel_cmdline)
 {
+MachineState *machine = MACHINE(qdev_get_machine());
 uint32_t mem_reg_property[] = { 0, 0, cpu_to_be32(ramsize) };
 char *filename;
 int fdt_size;
@@ -209,7 +210,12 @@ static int sam460ex_load_device_tree(hwaddr addr,
   EBC_FREQ);
 
 rom_add_blob_fixed(BINARY_DEVICE_TREE_FILE, fdt, fdt_size, addr);
-g_free(fdt);
+
+/*
+ * Update the machine->fdt pointer to enable support for
+ * 'dumpdtb' and 'info fdt' QMP/HMP commands.
+ */
+machine->fdt = fdt;
 
 return fdt_size;
 }
-- 
2.37.2




[PATCH for-7.2 v3 09/20] hw/ppc: set machine->fdt in pnv_reset()

2022-08-16 Thread Daniel Henrique Barboza
This will enable support for 'dumpdtb' and 'info fdt' HMP commands for
all powernv machines.

Cc: Cédric Le Goater 
Cc: Frederic Barrat 
Signed-off-by: Daniel Henrique Barboza 
---
 hw/ppc/pnv.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index d3f77c8367..296995a600 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -608,7 +608,13 @@ static void pnv_reset(MachineState *machine)
 qemu_fdt_dumpdtb(fdt, fdt_totalsize(fdt));
 cpu_physical_memory_write(PNV_FDT_ADDR, fdt, fdt_totalsize(fdt));
 
-g_free(fdt);
+/*
+ * Update the machine->fdt pointer to enable support for
+ * 'dumpdtb' and 'info fdt' commands. Free the existing
+ * machine->fdt to avoid leaking it during a reset.
+ */
+g_free(machine->fdt);
+machine->fdt = fdt;
 }
 
 static ISABus *pnv_chip_power8_isa_create(PnvChip *chip, Error **errp)
-- 
2.37.2




[PATCH for-7.2 v3 16/20] device_tree.c: support string array prop in fdt_format_node()

2022-08-16 Thread Daniel Henrique Barboza
To support printing string properties in 'info fdt' we need to determine
whether a property's void data might contain a string array.

We do that by casting the void data to a string array and:

- check if the array finishes with a null character
- check if there's no empty string in the middle of the array (i.e.
consecutive \0\0 characters)
- check if all characters of each substring are printable

If all conditions are met, we'll consider it to be a string array data
type and print it accordingly. After this change, 'info fdt' is now able
to print string arrays. Here's an example of string arrays we're able to
print in the /rtas node of the ppc64 pSeries machine:

(qemu) info fdt /rtas
rtas {
(...)
qemu,hypertas-functions = "hcall-memop1";
ibm,hypertas-functions = "hcall-pft","hcall-term","hcall-dabr",
"hcall-interrupt","hcall-tce","hcall-vio","hcall-splpar","hcall-join",
"hcall-bulk","hcall-set-mode","hcall-sprg0","hcall-copy","hcall-debug",
"hcall-vphn","hcall-multi-tce","hcall-hpt-resize","hcall-watchdog";
}

'qemu,hypertas-functions' is a property with a single string while
'ibm,hypertas-functions' is a string array.
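
As a small illustration of the heuristic (hypothetical buffers, not part of
the patch):

    /* two NUL-terminated strings; the literal adds the trailing '\0' */
    const char ok[] = "hcall-pft\0hcall-term";      /* sizeof(ok) == 21 */
    fdt_prop_is_string_array(ok, sizeof(ok));       /* -> true */

    /* a big-endian u32 (42): no trailing '\0', rejected by the first check */
    const char u32[] = { 0x00, 0x00, 0x00, 0x2a };
    fdt_prop_is_string_array(u32, sizeof(u32));     /* -> false */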

Signed-off-by: Daniel Henrique Barboza 
---
 softmmu/device_tree.c | 64 ++-
 1 file changed, 63 insertions(+), 1 deletion(-)

diff --git a/softmmu/device_tree.c b/softmmu/device_tree.c
index 3fb07b537f..d32d6856da 100644
--- a/softmmu/device_tree.c
+++ b/softmmu/device_tree.c
@@ -663,6 +663,63 @@ void qemu_fdt_qmp_dumpdtb(const char *filename, Error 
**errp)
 error_setg(errp, "Error when saving machine FDT to file %s", filename);
 }
 
+static bool fdt_prop_is_string_array(const void *data, int size)
+{
+const char *str_arr, *str;
+int i, str_len;
+
+str_arr = str = data;
+
+if (size <= 0 || str_arr[size - 1] != '\0') {
+return false;
+}
+
+while (str < str_arr + size) {
+str_len = strlen(str);
+
+/*
+ * Do not consider empty strings (consecutives \0\0)
+ * as valid.
+ */
+if (str_len == 0) {
+return false;
+}
+
+for (i = 0; i < str_len; i++) {
+if (!g_ascii_isprint(str[i])) {
+return false;
+}
+}
+
+str += str_len + 1;
+}
+
+return true;
+}
+
+static void fdt_prop_format_string_array(GString *buf,
+ const char *propname,
+ const char *data,
+ int prop_size, int padding)
+{
+const char *str = data;
+
+g_string_append_printf(buf, "%*s%s = ", padding, "", propname);
+
+while (str < data + prop_size) {
+/* appends up to the next '\0' */
+g_string_append_printf(buf, "\"%s\"", str);
+
+str += strlen(str) + 1;
+if (str < data + prop_size) {
+/* add a comma separator for the next string */
+g_string_append_printf(buf, ",");
+}
+}
+
+g_string_append_printf(buf, ";\n");
+}
+
 static void fdt_format_node(GString *buf, int node, int depth)
 {
 const struct fdt_property *prop = NULL;
@@ -681,7 +738,12 @@ static void fdt_format_node(GString *buf, int node, int 
depth)
 prop = fdt_get_property_by_offset(fdt, property, &prop_size);
 propname = fdt_string(fdt, fdt32_to_cpu(prop->nameoff));
 
-g_string_append_printf(buf, "%*s%s;\n", padding, "", propname);
+if (fdt_prop_is_string_array(prop->data, prop_size)) {
+fdt_prop_format_string_array(buf, propname, prop->data,
+ prop_size, padding);
+} else {
+g_string_append_printf(buf, "%*s%s;\n", padding, "", propname);
+}
 }
 
 padding -= 4;
-- 
2.37.2




[PATCH for-7.2 v3 05/20] hw/ppc: set machine->fdt in bamboo_load_device_tree()

2022-08-16 Thread Daniel Henrique Barboza
This will enable support for 'dumpdtb' and 'info fdt' HMP commands for
the bamboo machine.

Cc: Cédric Le Goater 
Signed-off-by: Daniel Henrique Barboza 
---
 hw/ppc/ppc440_bamboo.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/ppc440_bamboo.c b/hw/ppc/ppc440_bamboo.c
index 873f930c77..9c89b23c08 100644
--- a/hw/ppc/ppc440_bamboo.c
+++ b/hw/ppc/ppc440_bamboo.c
@@ -34,6 +34,8 @@
 #include "hw/qdev-properties.h"
 #include "qapi/error.h"
 
+#include 
+
 #define BINARY_DEVICE_TREE_FILE "bamboo.dtb"
 
 /* from u-boot */
@@ -62,6 +64,7 @@ static int bamboo_load_device_tree(hwaddr addr,
  hwaddr initrd_size,
  const char *kernel_cmdline)
 {
+MachineState *machine = MACHINE(qdev_get_machine());
 int ret = -1;
 uint32_t mem_reg_property[] = { 0, 0, cpu_to_be32(ramsize) };
 char *filename;
@@ -116,7 +119,13 @@ static int bamboo_load_device_tree(hwaddr addr,
   tb_freq);
 
 rom_add_blob_fixed(BINARY_DEVICE_TREE_FILE, fdt, fdt_size, addr);
-g_free(fdt);
+
+/*
+ * Update the machine->fdt pointer to enable support for
+ * 'dumpdtb' and 'info fdt' QMP/HMP commands.
+ */
+machine->fdt = fdt;
+
 return 0;
 }
 
-- 
2.37.2




[PATCH for-7.2 v3 15/20] qmp/hmp, device_tree.c: introduce 'info fdt' command

2022-08-16 Thread Daniel Henrique Barboza
Reading the FDT requires that the user saves the fdt_blob and then uses
'dtc' to read the contents. Saving the file and using 'dtc' is a strong
use case when we need to compare two FDTs, but it's a lot of steps if
you want to do a quick check on a certain node or property.

'info fdt' retrieves FDT nodes (and properties, later on) and print it
to the user. This can be used to check the FDT on running machines
without having to save the blob and use 'dtc'.

The implementation is based on the premise that the machine has an FDT
created using libfdt and pointed to by 'machine->fdt'. As long as this
prerequisite is met, the machine should be able to support it.

For now we're going to add the required QMP/HMP boilerplate and the
capability of printing the name of the properties of a given node. Next
patches will extend 'info fdt' to be able to print nodes recursively,
and then individual properties.

This command will always be executed in-band (i.e. holding BQL),
avoiding potential race conditions with machines that might change the
FDT during runtime (e.g. PowerPC 'pseries' machine).

'info fdt' is not something that we expect to be used aside from debugging,
so we're implementing it in QMP as 'x-query-fdt'.

This is an example of 'info fdt' fetching the '/chosen' node of the
pSeries machine:

(qemu) info fdt /chosen
chosen {
ibm,architecture-vec-5;
rng-seed;
ibm,arch-vec-5-platform-support;
linux,pci-probe-only;
stdout-path;
linux,stdout-path;
qemu,graphic-depth;
qemu,graphic-height;
qemu,graphic-width;
}

And the same node for the aarch64 'virt' machine:

(qemu) info fdt /chosen
chosen {
stdout-path;
rng-seed;
kaslr-seed;
}
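
A QMP-side sketch of the same query (the command name 'x-query-fdt' comes
from this patch; the 'nodepath' argument name is assumed from the HMP
handler below):

-> { "execute": "x-query-fdt",
     "arguments": { "nodepath": "/chosen" } }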

Cc: Dr. David Alan Gilbert 
Signed-off-by: Daniel Henrique Barboza 
---
 hmp-commands-info.hx | 13 ++
 include/monitor/hmp.h|  1 +
 include/sysemu/device_tree.h |  4 +++
 monitor/hmp-cmds.c   | 13 ++
 monitor/qmp-cmds.c   | 12 +
 qapi/machine.json| 19 +++
 softmmu/device_tree.c| 47 
 7 files changed, 109 insertions(+)

diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
index 188d9ece3b..743b48865d 100644
--- a/hmp-commands-info.hx
+++ b/hmp-commands-info.hx
@@ -921,3 +921,16 @@ SRST
   ``stats``
 Show runtime-collected statistics
 ERST
+
+{
+.name   = "fdt",
+.args_type  = "nodepath:s",
+.params = "nodepath",
+.help   = "show firmware device tree node given its path",
+.cmd= hmp_info_fdt,
+},
+
+SRST
+  ``info fdt``
+Show a firmware device tree node given its path. Requires libfdt.
+ERST
diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h
index d7f324da59..c0883dd1e3 100644
--- a/include/monitor/hmp.h
+++ b/include/monitor/hmp.h
@@ -135,6 +135,7 @@ void hmp_set_vcpu_dirty_limit(Monitor *mon, const QDict 
*qdict);
 void hmp_cancel_vcpu_dirty_limit(Monitor *mon, const QDict *qdict);
 void hmp_info_vcpu_dirty_limit(Monitor *mon, const QDict *qdict);
 void hmp_dumpdtb(Monitor *mon, const QDict *qdict);
+void hmp_info_fdt(Monitor *mon, const QDict *qdict);
 void hmp_human_readable_text_helper(Monitor *mon,
 HumanReadableText *(*qmp_handler)(Error 
**));
 void hmp_info_stats(Monitor *mon, const QDict *qdict);
diff --git a/include/sysemu/device_tree.h b/include/sysemu/device_tree.h
index bf7684e4ed..057d13e397 100644
--- a/include/sysemu/device_tree.h
+++ b/include/sysemu/device_tree.h
@@ -14,6 +14,8 @@
 #ifndef DEVICE_TREE_H
 #define DEVICE_TREE_H
 
+#include "qapi/qapi-types-common.h"
+
 void *create_device_tree(int *sizep);
 void *load_device_tree(const char *filename_path, int *sizep);
 #ifdef CONFIG_LINUX
@@ -137,6 +139,8 @@ int qemu_fdt_add_path(void *fdt, const char *path);
 
 void qemu_fdt_dumpdtb(void *fdt, int size);
 void qemu_fdt_qmp_dumpdtb(const char *filename, Error **errp);
+HumanReadableText *qemu_fdt_qmp_query_fdt(const char *nodepath,
+  Error **errp);
 
 /**
  * qemu_fdt_setprop_sized_cells_from_array:
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index d23ec85f9d..accde90380 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -2484,3 +2484,16 @@ void hmp_dumpdtb(Monitor *mon, const QDict *qdict)
 error_report_err(local_err);
 }
 }
+
+void hmp_info_fdt(Monitor *mon, const QDict *qdict)
+{
+const char *nodepath = qdict_get_str(qdict, "nodepath");
+Error *err = NULL;
+g_autoptr(HumanReadableText) info = qmp_x_query_fdt(nodepath, &err);
+
+if (hmp_handle_error(mon, err)) {
+return;
+}
+
+monitor_printf(mon, "%s", info->human_readable_text);
+}
diff --git a/monitor/qmp-cmds.c b/monitor/qmp-cmds.c
index 8415aca08c..db2c6aa7da 100644
--- a/monitor/qmp-cmds.c
+++ b/monitor/qmp-cmds.c
@@ -603,9 +603,21 @@ void qmp_dumpdtb(const char *filename, Error **errp)
 {
 return 

[PATCH for-7.2 v3 14/20] qmp/hmp, device_tree.c: introduce dumpdtb

2022-08-16 Thread Daniel Henrique Barboza
To save the FDT blob we have the '-machine dumpdtb=' property.
With this property set, the machine saves the FDT to the given file and exits.
The created file can then be converted to plain text dts format using
'dtc'.

There's nothing particularly sophisticated into saving the FDT that
can't be done with the machine at any state, as long as the machine has
a valid FDT to be saved.

The 'dumpdtb' command receives a 'filename' parameter and, if a valid
FDT is available, it'll save it to the file 'filename'. In short, this is
a '-machine dumpdtb' that can be fired on demand via QMP/HMP.
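
An illustrative HMP session (the 'dumpdtb' command is the one added by this
patch; the dtc invocation is standard dtc usage, not part of the patch):

(qemu) dumpdtb /tmp/fdt.dtb
$ dtc -I dtb -O dts -o /tmp/fdt.dts /tmp/fdt.dtb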

A valid FDT is one that was created using libfdt and can be retrieved
via 'current_machine->fdt' in device_tree.c. This condition is met by
most FDT users in QEMU.

This command will always be executed in-band (i.e. holding BQL),
avoiding potential race conditions with machines that might change the
FDT during runtime (e.g. PowerPC 'pseries' machine).

Cc: Dr. David Alan Gilbert 
Reviewed-by: Alistair Francis 
Signed-off-by: Daniel Henrique Barboza 
---
 hmp-commands.hx  | 13 +
 include/monitor/hmp.h|  1 +
 include/sysemu/device_tree.h |  1 +
 monitor/hmp-cmds.c   | 12 
 monitor/qmp-cmds.c   | 13 +
 qapi/machine.json| 17 +
 softmmu/device_tree.c| 18 ++
 7 files changed, 75 insertions(+)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 182e639d14..d2554e9701 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1800,3 +1800,16 @@ ERST
   "\n\t\t\t\t\t limit on a specified virtual cpu",
 .cmd= hmp_cancel_vcpu_dirty_limit,
 },
+
+SRST
+``dumpdtb`` *filename*
+  Save the FDT in the 'filename' file to be decoded using dtc.
+  Requires 'libfdt' support.
+ERST
+{
+.name   = "dumpdtb",
+.args_type  = "filename:s",
+.params = "[filename] file to save the FDT",
+.help   = "save the FDT in the 'filename' file to be decoded using 
dtc",
+.cmd= hmp_dumpdtb,
+},
diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h
index a618eb1e4e..d7f324da59 100644
--- a/include/monitor/hmp.h
+++ b/include/monitor/hmp.h
@@ -134,6 +134,7 @@ void hmp_calc_dirty_rate(Monitor *mon, const QDict *qdict);
 void hmp_set_vcpu_dirty_limit(Monitor *mon, const QDict *qdict);
 void hmp_cancel_vcpu_dirty_limit(Monitor *mon, const QDict *qdict);
 void hmp_info_vcpu_dirty_limit(Monitor *mon, const QDict *qdict);
+void hmp_dumpdtb(Monitor *mon, const QDict *qdict);
 void hmp_human_readable_text_helper(Monitor *mon,
 HumanReadableText *(*qmp_handler)(Error 
**));
 void hmp_info_stats(Monitor *mon, const QDict *qdict);
diff --git a/include/sysemu/device_tree.h b/include/sysemu/device_tree.h
index ef060a9759..bf7684e4ed 100644
--- a/include/sysemu/device_tree.h
+++ b/include/sysemu/device_tree.h
@@ -136,6 +136,7 @@ int qemu_fdt_add_path(void *fdt, const char *path);
 } while (0)
 
 void qemu_fdt_dumpdtb(void *fdt, int size);
+void qemu_fdt_qmp_dumpdtb(const char *filename, Error **errp);
 
 /**
  * qemu_fdt_setprop_sized_cells_from_array:
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index c6cd6f91dd..d23ec85f9d 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -2472,3 +2472,15 @@ exit:
 exit_no_print:
 error_free(err);
 }
+
+void hmp_dumpdtb(Monitor *mon, const QDict *qdict)
+{
+const char *filename = qdict_get_str(qdict, "filename");
+Error *local_err = NULL;
+
+qmp_dumpdtb(filename, &local_err);
+
+if (local_err) {
+error_report_err(local_err);
+}
+}
diff --git a/monitor/qmp-cmds.c b/monitor/qmp-cmds.c
index 7314cd813d..8415aca08c 100644
--- a/monitor/qmp-cmds.c
+++ b/monitor/qmp-cmds.c
@@ -45,6 +45,7 @@
 #include "hw/intc/intc.h"
 #include "hw/rdma/rdma.h"
 #include "monitor/stats.h"
+#include "sysemu/device_tree.h"
 
 NameInfo *qmp_query_name(Error **errp)
 {
@@ -596,3 +597,15 @@ bool apply_str_list_filter(const char *string, strList 
*list)
 }
 return false;
 }
+
+#ifdef CONFIG_FDT
+void qmp_dumpdtb(const char *filename, Error **errp)
+{
+return qemu_fdt_qmp_dumpdtb(filename, errp);
+}
+#else
+void qmp_dumpdtb(const char *filename, Error **errp)
+{
+error_setg(errp, "dumpdtb requires libfdt");
+}
+#endif
diff --git a/qapi/machine.json b/qapi/machine.json
index 6afd1936b0..aeb013f3dd 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -1664,3 +1664,20 @@
  '*size': 'size',
  '*max-size': 'size',
  '*slots': 'uint64' } }
+
+##
+# @dumpdtb:
+#
+# Save the FDT in dtb format. Requires 'libfdt' support.
+#
+# @filename: name of the FDT file to be created
+#
+# Since: 7.2
+#
+# Example:
+#   {"execute": "dumpdtb"}
+#"arguments": { "filename": "/tmp/fdt.dtb" } }
+#
+##
+{ 'command': 'dumpdtb',
+  'data': { 'filename': 'str' } }
diff --git a/softmmu/device_tree.c b/softmmu/device_tree.c
index 6ca3fad285..cd487ddd4d 100644
--- 

[PATCH for-7.2 v3 10/20] hw/ppc: set machine->fdt in spapr machine

2022-08-16 Thread Daniel Henrique Barboza
The pSeries machine never bothered with the common machine->fdt
attribute. We do all the FDT related work using spapr->fdt_blob.

We're going to introduce HMP commands to read and save the FDT, which
will rely on setting machine->fdt properly to work across all machine
archs/types.

Let's set machine->fdt in the two places where we manipulate the FDT:
spapr_machine_reset() and CAS.

spapr->fdt_blob is left untouched for now. Since we're migrating
spapr->fdt_blob, replacing it with machine->fdt would require migrating
machine->fdt as well. This is something that we would like to do to keep
our code simpler, but it's work we'll do another day.

Cc: Cédric Le Goater 
Cc: qemu-...@nongnu.org
Signed-off-by: Daniel Henrique Barboza 
---
 hw/ppc/spapr.c   | 6 ++
 hw/ppc/spapr_hcall.c | 8 
 2 files changed, 14 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index bc9ba6e6dc..7031cf964a 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1713,6 +1713,12 @@ static void spapr_machine_reset(MachineState *machine)
 spapr->fdt_initial_size = spapr->fdt_size;
 spapr->fdt_blob = fdt;
 
+/*
+ * Set the common machine->fdt pointer to enable support
+ * for 'dumpdtb' and 'info fdt' QMP/HMP commands.
+ */
+machine->fdt = fdt;
+
 /* Set up the entry state */
 first_ppc_cpu->env.gpr[5] = 0;
 
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index a8d4a6bcf0..a53bfd76f4 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -1256,6 +1256,14 @@ target_ulong do_client_architecture_support(PowerPCCPU 
*cpu,
 spapr->fdt_initial_size = spapr->fdt_size;
 spapr->fdt_blob = fdt;
 
+/*
+ * Set the machine->fdt pointer again since we just freed
+ * it above (by freeing spapr->fdt_blob). We set this
+ * pointer to enable support for 'dumpdtb' and 'info fdt'
+ * QMP/HMP commands.
+ */
+MACHINE(spapr)->fdt = fdt;
+
 return H_SUCCESS;
 }
 
-- 
2.37.2




[PATCH for-7.2 v3 08/20] hw/ppc: set machine->fdt in pegasos2_machine_reset()

2022-08-16 Thread Daniel Henrique Barboza
We'll introduce QMP/HMP commands that require machine->fdt to be set
properly.

Cc: BALATON Zoltan 
Cc: qemu-...@nongnu.org
Signed-off-by: Daniel Henrique Barboza 
---
 hw/ppc/pegasos2.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/hw/ppc/pegasos2.c b/hw/ppc/pegasos2.c
index 61f4263953..624036d88b 100644
--- a/hw/ppc/pegasos2.c
+++ b/hw/ppc/pegasos2.c
@@ -331,6 +331,13 @@ static void pegasos2_machine_reset(MachineState *machine)
 
 vof_build_dt(fdt, pm->vof);
 vof_client_open_store(fdt, pm->vof, "/chosen", "stdout", "/failsafe");
+
+/*
+ * Set the common machine->fdt pointer to enable support
+ * for 'dumpdtb' and 'info fdt' QMP/HMP commands.
+ */
+machine->fdt = fdt;
+
 pm->cpu->vhyp = PPC_VIRTUAL_HYPERVISOR(machine);
 }
 
-- 
2.37.2




[PATCH for-7.2 v3 07/20] hw/ppc: set machine->fdt in xilinx_load_device_tree()

2022-08-16 Thread Daniel Henrique Barboza
This will enable support for 'dumpdtb' and 'info fdt' HMP commands for
the virtex_ml507 machine.

Cc: Edgar E. Iglesias 
Signed-off-by: Daniel Henrique Barboza 
---
 hw/ppc/virtex_ml507.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/virtex_ml507.c b/hw/ppc/virtex_ml507.c
index 53b126ff48..9f4c5d85a4 100644
--- a/hw/ppc/virtex_ml507.c
+++ b/hw/ppc/virtex_ml507.c
@@ -45,6 +45,8 @@
 #include "hw/qdev-properties.h"
 #include "ppc405.h"
 
+#include 
+
 #define EPAPR_MAGIC(0x45504150)
 #define FLASH_SIZE (16 * MiB)
 
@@ -153,6 +155,7 @@ static int xilinx_load_device_tree(hwaddr addr,
   hwaddr initrd_size,
   const char *kernel_cmdline)
 {
+MachineState *machine = MACHINE(qdev_get_machine());
 char *path;
 int fdt_size;
 void *fdt = NULL;
@@ -197,7 +200,13 @@ static int xilinx_load_device_tree(hwaddr addr,
 if (r < 0)
 fprintf(stderr, "couldn't set /chosen/bootargs\n");
 cpu_physical_memory_write(addr, fdt, fdt_size);
-g_free(fdt);
+
+/*
+ * Update the machine->fdt pointer to enable support for
+ * 'dumpdtb' and 'info fdt' QMP/HMP commands.
+ */
+machine->fdt = fdt;
+
 return fdt_size;
 }
 
-- 
2.37.2




[PATCH for-7.2 v3 04/20] hw/ppc: set machine->fdt in ppce500_load_device_tree()

2022-08-16 Thread Daniel Henrique Barboza
This will enable support for 'dumpdtb' and 'info fdt' HMP commands for
the e500 machine.

Cc: Cédric Le Goater 
Signed-off-by: Daniel Henrique Barboza 
---
 hw/ppc/e500.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/e500.c b/hw/ppc/e500.c
index 32495d0123..6197a905b8 100644
--- a/hw/ppc/e500.c
+++ b/hw/ppc/e500.c
@@ -47,6 +47,8 @@
 #include "hw/i2c/i2c.h"
 #include "hw/irq.h"
 
+#include 
+
 #define EPAPR_MAGIC(0x45504150)
 #define DTC_LOAD_PAD   0x180
 #define DTC_PAD_MASK   0xF
@@ -600,7 +602,16 @@ done:
 cpu_physical_memory_write(addr, fdt, fdt_size);
 }
 ret = fdt_size;
-g_free(fdt);
+
+/*
+ * Update the machine->fdt pointer to enable support for
+ * 'dumpdtb' and 'info fdt' QMP/HMP commands.
+ *
+ * The FDT is re-created during reset, so free machine->fdt
+ * to avoid leaking the old FDT.
+ */
+g_free(machine->fdt);
+machine->fdt = fdt;
 
 out:
 g_free(pci_map);
-- 
2.37.2




[PATCH for-7.2 v3 02/20] hw/microblaze: set machine->fdt in microblaze_load_dtb()

2022-08-16 Thread Daniel Henrique Barboza
This will enable support for 'dumpdtb' and 'info fdt' HMP commands for
all microblaze machines that use microblaze_load_dtb().

Cc: Edgar E. Iglesias 
Signed-off-by: Daniel Henrique Barboza 
---
 hw/microblaze/boot.c  | 11 ++-
 hw/microblaze/meson.build |  2 +-
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/hw/microblaze/boot.c b/hw/microblaze/boot.c
index 8b92a9801a..e9ebc04381 100644
--- a/hw/microblaze/boot.c
+++ b/hw/microblaze/boot.c
@@ -39,6 +39,8 @@
 
 #include "boot.h"
 
+#include 
+
 static struct
 {
 void (*machine_cpu_reset)(MicroBlazeCPU *);
@@ -72,6 +74,7 @@ static int microblaze_load_dtb(hwaddr addr,
const char *kernel_cmdline,
const char *dtb_filename)
 {
+MachineState *machine = MACHINE(qdev_get_machine());
 int fdt_size;
 void *fdt = NULL;
 int r;
@@ -100,7 +103,13 @@ static int microblaze_load_dtb(hwaddr addr,
 }
 
 cpu_physical_memory_write(addr, fdt, fdt_size);
-g_free(fdt);
+
+/*
+ * Update the machine->fdt pointer to enable support for
+ * 'dumpdtb' and 'info fdt' QMP/HMP commands.
+ */
+machine->fdt = fdt;
+
 return fdt_size;
 }
 
diff --git a/hw/microblaze/meson.build b/hw/microblaze/meson.build
index bb9e4eb8f4..a38a397872 100644
--- a/hw/microblaze/meson.build
+++ b/hw/microblaze/meson.build
@@ -1,5 +1,5 @@
 microblaze_ss = ss.source_set()
-microblaze_ss.add(files('boot.c'))
+microblaze_ss.add(files('boot.c'), fdt)
 microblaze_ss.add(when: 'CONFIG_PETALOGIX_S3ADSP1800', if_true: 
files('petalogix_s3adsp1800_mmu.c'))
 microblaze_ss.add(when: 'CONFIG_PETALOGIX_ML605', if_true: 
files('petalogix_ml605_mmu.c'))
 microblaze_ss.add(when: 'CONFIG_XLNX_ZYNQMP_PMU', if_true: 
files('xlnx-zynqmp-pmu.c'))
-- 
2.37.2




[PATCH for-7.2 v3 03/20] hw/nios2: set machine->fdt in nios2_load_dtb()

2022-08-16 Thread Daniel Henrique Barboza
This will enable support for 'dumpdtb' and 'info fdt' HMP commands for
all nios2 machines that use nios2_load_dtb().

Cc: Chris Wulff 
Cc: Marek Vasut 
Signed-off-by: Daniel Henrique Barboza 
---
 hw/nios2/boot.c  | 11 ++-
 hw/nios2/meson.build |  2 +-
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/hw/nios2/boot.c b/hw/nios2/boot.c
index 21cb47..db3b21fea6 100644
--- a/hw/nios2/boot.c
+++ b/hw/nios2/boot.c
@@ -43,6 +43,8 @@
 
 #include "boot.h"
 
+#include 
+
 #define NIOS2_MAGIC0x534f494e
 
 static struct nios2_boot_info {
@@ -81,6 +83,7 @@ static uint64_t translate_kernel_address(void *opaque, 
uint64_t addr)
 static int nios2_load_dtb(struct nios2_boot_info bi, const uint32_t ramsize,
   const char *kernel_cmdline, const char *dtb_filename)
 {
+MachineState *machine = MACHINE(qdev_get_machine());
 int fdt_size;
 void *fdt = NULL;
 int r;
@@ -113,7 +116,13 @@ static int nios2_load_dtb(struct nios2_boot_info bi, const 
uint32_t ramsize,
 }
 
 cpu_physical_memory_write(bi.fdt, fdt, fdt_size);
-g_free(fdt);
+
+/*
+ * Update the machine->fdt pointer to enable support for
+ * 'dumpdtb' and 'info fdt' QMP/HMP commands.
+ */
+machine->fdt = fdt;
+
 return fdt_size;
 }
 
diff --git a/hw/nios2/meson.build b/hw/nios2/meson.build
index 6c58e8082b..22277bd6c5 100644
--- a/hw/nios2/meson.build
+++ b/hw/nios2/meson.build
@@ -1,5 +1,5 @@
 nios2_ss = ss.source_set()
-nios2_ss.add(files('boot.c'))
+nios2_ss.add(files('boot.c'), fdt)
 nios2_ss.add(when: 'CONFIG_NIOS2_10M50', if_true: files('10m50_devboard.c'))
 nios2_ss.add(when: 'CONFIG_NIOS2_GENERIC_NOMMU', if_true: 
files('generic_nommu.c'))
 
-- 
2.37.2




[PATCH for-7.2 v3 01/20] hw/arm: do not free machine->fdt in arm_load_dtb()

2022-08-16 Thread Daniel Henrique Barboza
At this moment, arm_load_dtb() can free machine->fdt when
binfo->dtb_filename is NULL. If there's no 'dtb_filename', 'fdt' will be
retrieved by binfo->get_dtb(). If get_dtb() returns machine->fdt, as is
the case of machvirt_dtb() from hw/arm/virt.c, fdt now has a pointer to
machine->fdt. And, in that case, the existing g_free(fdt) at the end of
arm_load_dtb() will make machine->fdt point to an invalid memory region.
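
A minimal sketch of the aliasing described above (illustrative, not the
literal arm_load_dtb() code):

    fdt = binfo->get_dtb(binfo, &size);   /* may return ms->fdt itself */
    rom_add_blob_fixed_as("dtb", fdt, size, addr, as);
    g_free(fdt);   /* ms->fdt now dangles if the two pointers alias */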

This is not an issue right now because there's no code that accesses
machine->fdt after arm_load_dtb(), but we're going to add a couple of
FDT HMP commands that will rely on machine->fdt being valid.

Instead of freeing 'fdt' at the end of arm_load_dtb(), assign it to
machine->fdt. This will allow the FDT of ARM machines that rely on
arm_load_dtb() to be accessed later on.

Since all ARM machines allocate the FDT only once, we don't need to
worry about leaking the existing FDT during a machine reset (which is
something that other machines have to look after, e.g. the ppc64 pSeries
machine).

Cc: Peter Maydell 
Cc: qemu-...@nongnu.org
Signed-off-by: Daniel Henrique Barboza 
---
 hw/arm/boot.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index ada2717f76..669a978157 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -684,7 +684,11 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info 
*binfo,
  */
 rom_add_blob_fixed_as("dtb", fdt, size, addr, as);
 
-g_free(fdt);
+/*
+ * Update the ms->fdt pointer to enable support for
+ * 'dumpdtb' and 'info fdt' QMP/HMP commands.
+ */
+ms->fdt = fdt;
 
 return size;
 
-- 
2.37.2




[PATCH for-7.2 v3 00/20] QMP/HMP: add 'dumpdtb' and 'info fdt' commands

2022-08-16 Thread Daniel Henrique Barboza
Hi,

In this new version the most notable changes are:

- removed fdt_pack() from machine specific code. As discussed in the previous
version, the proper use of fdt_pack() would require more work/thought and,
since it's not required for the work we're doing here, it was removed;

- we're now handling string arrays. The previous version was interpreting
all string properties as a single, plain string. We're now dealing with string
arrays instead;

- changed the output format to be more in line with the dts format.

Other small changes were made based on the feedback of the previous version.

Changes from v2:
- patches 1-8:
  - remove fdt_pack() to shrink the FDT before assigning it to ms->fdt
- patch 9:
  - call g_free(ms->fdt) to avoid leaking an old fdt during reset
- patch 10:
  - added a commit msg note about why we're not eliminating spapr->fdt_blob
for machine->fdt at this moment
- patches 11, 12:
  - remove fdt_pack() to shrink the FDT before assigning it to ms->fdt
  - added Alistair's r-b
- patch 13:
  - remove fdt_pack() to shrink the FDT before assigning it to ms->fdt
- patch 14:
  - added a commit msg note about BQL
- patch 15:
  - added a commit msg note about BQL
- patch 16:
  - renamed fdt_prop_is_string to fdt_prop_is_string_array. \0 characters
in the middle of the data array are now legal
  - added a new fdt_prop_format_string_array() to format the string array
  - added a semicolon at the end of the string array
- patch 17:
  - added semicolon at the end of properties
  - use %02x instead of %x to format vals in [] notation

- v2 link: https://lists.gnu.org/archive/html/qemu-devel/2022-08/msg00937.html


Daniel Henrique Barboza (20):
  hw/arm: do not free machine->fdt in arm_load_dtb()
  hw/microblaze: set machine->fdt in microblaze_load_dtb()
  hw/nios2: set machine->fdt in nios2_load_dtb()
  hw/ppc: set machine->fdt in ppce500_load_device_tree()
  hw/ppc: set machine->fdt in bamboo_load_device_tree()
  hw/ppc: set machine->fdt in sam460ex_load_device_tree()
  hw/ppc: set machine->fdt in xilinx_load_device_tree()
  hw/ppc: set machine->fdt in pegasos2_machine_reset()
  hw/ppc: set machine->fdt in pnv_reset()
  hw/ppc: set machine->fdt in spapr machine
  hw/riscv: set machine->fdt in sifive_u_machine_init()
  hw/riscv: set machine->fdt in spike_board_init()
  hw/xtensa: set machine->fdt in xtfpga_init()
  qmp/hmp, device_tree.c: introduce dumpdtb
  qmp/hmp, device_tree.c: introduce 'info fdt' command
  device_tree.c: support string array prop in fdt_format_node()
  device_tree.c: support remaining FDT prop types
  device_node.c: enable 'info fdt' to print subnodes
  device_tree.c: add fdt_format_property() helper
  hmp, device_tree.c: add 'info fdt ' support

 hmp-commands-info.hx |  14 +++
 hmp-commands.hx  |  13 +++
 hw/arm/boot.c|   6 +-
 hw/microblaze/boot.c |  11 +-
 hw/microblaze/meson.build|   2 +-
 hw/nios2/boot.c  |  11 +-
 hw/nios2/meson.build |   2 +-
 hw/ppc/e500.c|  13 ++-
 hw/ppc/pegasos2.c|   7 ++
 hw/ppc/pnv.c |   8 +-
 hw/ppc/ppc440_bamboo.c   |  11 +-
 hw/ppc/sam460ex.c|   8 +-
 hw/ppc/spapr.c   |   6 +
 hw/ppc/spapr_hcall.c |   8 ++
 hw/ppc/virtex_ml507.c|  11 +-
 hw/riscv/sifive_u.c  |   6 +
 hw/riscv/spike.c |   9 ++
 hw/xtensa/meson.build|   2 +-
 hw/xtensa/xtfpga.c   |   9 +-
 include/monitor/hmp.h|   2 +
 include/sysemu/device_tree.h |   7 ++
 monitor/hmp-cmds.c   |  28 +
 monitor/qmp-cmds.c   |  27 +
 qapi/machine.json|  38 ++
 softmmu/device_tree.c| 219 +++
 25 files changed, 466 insertions(+), 12 deletions(-)

-- 
2.37.2




Re: [PATCH v7 2/8] file-posix: introduce get_sysfs_str_val for device zoned model

2022-08-16 Thread Damien Le Moal
On 2022/08/15 23:25, Sam Li wrote:
> Use sysfs attribute files to get the string value of device
> zoned model. Then get_sysfs_zoned_model can convert it to
> BlockZoneModel type in QEMU.
> 
> Signed-off-by: Sam Li 
> Reviewed-by: Hannes Reinecke 
> ---
>  block/file-posix.c   | 93 ++--
>  include/block/block_int-common.h |  3 ++
>  2 files changed, 55 insertions(+), 41 deletions(-)
> 
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 48cd096624..c07ac4c697 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -1210,66 +1210,71 @@ static int hdev_get_max_hw_transfer(int fd, struct 
> stat *st)
>  #endif
>  }
>  
> -static int hdev_get_max_segments(int fd, struct stat *st)
> -{
> +/*
> + * Convert the zoned attribute file in sysfs to internal value.

This function does not convert anything. So this comment should be changed to
something like:

/*
 * Get a sysfs attribute value as a character string.
 */

> + */
> +static int get_sysfs_str_val(struct stat *st, const char *attribute,
> + char **val) {
>  #ifdef CONFIG_LINUX
> -char buf[32];
> -const char *end;
> -char *sysfspath = NULL;
> +g_autofree char *sysfspath = NULL;
>  int ret;
> -int sysfd = -1;
> -long max_segments;
> -
> -if (S_ISCHR(st->st_mode)) {
> -if (ioctl(fd, SG_GET_SG_TABLESIZE, &ret) == 0) {
> -return ret;
> -}
> -return -ENOTSUP;
> -}
> +size_t len;
>  
>  if (!S_ISBLK(st->st_mode)) {
>  return -ENOTSUP;
>  }
>  
> -sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/max_segments",
> -major(st->st_rdev), minor(st->st_rdev));
> -sysfd = open(sysfspath, O_RDONLY);
> -if (sysfd == -1) {
> -ret = -errno;
> -goto out;
> +sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/%s",
> +major(st->st_rdev), minor(st->st_rdev),
> +attribute);
> +ret = g_file_get_contents(sysfspath, val, &len, NULL);
> +if (ret == -1) {
> +return -ENOENT;
>  }
> -do {
> -ret = read(sysfd, buf, sizeof(buf) - 1);
> -} while (ret == -1 && errno == EINTR);
> +return ret;
> +#else
> +return -ENOTSUP;
> +#endif
> +}
> +
> +static int get_sysfs_zoned_model(struct stat *st, BlockZoneModel *zoned) {
> +g_autofree char *val = NULL;
> +int ret;
> +
> +ret = get_sysfs_str_val(st, "zoned", );
>  if (ret < 0) {
> -ret = -errno;
> -goto out;
> -} else if (ret == 0) {
> -ret = -EIO;
> -goto out;
> +return ret;
>  }
> -buf[ret] = 0;
> -/* The file is ended with '\n', pass 'end' to accept that. */
> -ret = qemu_strtol(buf, &end, 10, &max_segments);
> -if (ret == 0 && end && *end == '\n') {
> -ret = max_segments;
> +
> +if (strcmp(val, "host-managed") == 0) {
> +*zoned = BLK_Z_HM;
> +} else if (strcmp(val, "host-aware") == 0) {
> +*zoned = BLK_Z_HA;
> +} else if (strcmp(val, "none") == 0) {
> +*zoned = BLK_Z_NONE;
> +} else {
> +return -ENOTSUP;
>  }
> +return 0;
> +}
>  
> -out:
> -if (sysfd != -1) {
> -close(sysfd);
> +static int hdev_get_max_segments(int fd, struct stat *st) {
> +int ret;

Add a blank line here ? Not sure about the qemu code style convention. But a
blank line after a variable declaration is always nice to clearly separate
declarations and code.

> +if (S_ISCHR(st->st_mode)) {
> +if (ioctl(fd, SG_GET_SG_TABLESIZE, &ret) == 0) {
> +return ret;
> +}
> +return -ENOTSUP;
>  }
> -g_free(sysfspath);
> -return ret;
> -#else
> -return -ENOTSUP;
> -#endif
> +return get_sysfs_long_val(st, "max_segments");
>  }
>  
>  static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
>  {
>  BDRVRawState *s = bs->opaque;
>  struct stat st;
> +int ret;
> +BlockZoneModel zoned;
>  
>  s->needs_alignment = raw_needs_alignment(bs);
>  raw_probe_alignment(bs, s->fd, errp);
> @@ -1307,6 +1312,12 @@ static void raw_refresh_limits(BlockDriverState *bs, 
> Error **errp)
>  bs->bl.max_hw_iov = ret;
>  }
>  }
> +
> +ret = get_sysfs_zoned_model(s->fd, , );
> +if (ret < 0) {
> +zoned = BLK_Z_NONE;
> +}
> +bs->bl.zoned = zoned;
>  }
>  
>  static int check_for_dasd(int fd)
> diff --git a/include/block/block_int-common.h 
> b/include/block/block_int-common.h
> index 8947abab76..7f7863cc9e 100644
> --- a/include/block/block_int-common.h
> +++ b/include/block/block_int-common.h
> @@ -825,6 +825,9 @@ typedef struct BlockLimits {
>  
>  /* maximum number of iovec elements */
>  int max_iov;
> +
> +/* device zone model */
> +BlockZoneModel zoned;
>  } BlockLimits;
>  
>  typedef struct BdrvOpBlocker BdrvOpBlocker;


-- 
Damien Le Moal
Western Digital Research

Re: [PATCH v7 1/8] include: add zoned device structs

2022-08-16 Thread Damien Le Moal
On 2022/08/15 23:25, Sam Li wrote:
> Signed-off-by: Sam Li 
> Reviewed-by: Stefan Hajnoczi 

Looks good.

Reviewed-by: Damien Le Moal 

> ---
>  include/block/block-common.h | 43 
>  1 file changed, 43 insertions(+)
> 
> diff --git a/include/block/block-common.h b/include/block/block-common.h
> index fdb7306e78..36bd0e480e 100644
> --- a/include/block/block-common.h
> +++ b/include/block/block-common.h
> @@ -49,6 +49,49 @@ typedef struct BlockDriver BlockDriver;
>  typedef struct BdrvChild BdrvChild;
>  typedef struct BdrvChildClass BdrvChildClass;
>  
> +typedef enum BlockZoneOp {
> +BLK_ZO_OPEN,
> +BLK_ZO_CLOSE,
> +BLK_ZO_FINISH,
> +BLK_ZO_RESET,
> +} BlockZoneOp;
> +
> +typedef enum BlockZoneModel {
> +BLK_Z_NONE = 0x0, /* Regular block device */
> +BLK_Z_HM = 0x1, /* Host-managed zoned block device */
> +BLK_Z_HA = 0x2, /* Host-aware zoned block device */
> +} BlockZoneModel;
> +
> +typedef enum BlockZoneCondition {
> +BLK_ZS_NOT_WP = 0x0,
> +BLK_ZS_EMPTY = 0x1,
> +BLK_ZS_IOPEN = 0x2,
> +BLK_ZS_EOPEN = 0x3,
> +BLK_ZS_CLOSED = 0x4,
> +BLK_ZS_RDONLY = 0xD,
> +BLK_ZS_FULL = 0xE,
> +BLK_ZS_OFFLINE = 0xF,
> +} BlockZoneCondition;
> +
> +typedef enum BlockZoneType {
> +BLK_ZT_CONV = 0x1, /* Conventional random writes supported */
> +BLK_ZT_SWR = 0x2, /* Sequential writes required */
> +BLK_ZT_SWP = 0x3, /* Sequential writes preferred */
> +} BlockZoneType;
> +
> +/*
> + * Zone descriptor data structure.
> + * Provides information on a zone with all position and size values in bytes.
> + */
> +typedef struct BlockZoneDescriptor {
> +uint64_t start;
> +uint64_t length;
> +uint64_t cap;
> +uint64_t wp;
> +BlockZoneType type;
> +BlockZoneCondition cond;
> +} BlockZoneDescriptor;
> +
>  typedef struct BlockDriverInfo {
>  /* in bytes, 0 if irrelevant */
>  int cluster_size;


-- 
Damien Le Moal
Western Digital Research



Re: [PATCH 00/24] Support VIRTIO_F_RING_RESET for virtio-net, vhost-user, vhost-kernel in virtio pci-modern

2022-08-16 Thread Xuan Zhuo
On Tue, 16 Aug 2022 02:22:16 -0400, "Michael S. Tsirkin"  
wrote:
> On Tue, Aug 16, 2022 at 02:15:57PM +0800, Xuan Zhuo wrote:
> > On Tue, 16 Aug 2022 02:14:10 -0400, "Michael S. Tsirkin"  
> > wrote:
> > > On Tue, Aug 16, 2022 at 09:06:12AM +0800, Kangjie Xu wrote:
> > > > The virtio queue reset function has already been defined in the virtio 
> > > > spec 1.2.
> > > > The relevant virtio spec information is here:
> > > >
> > > > https://github.com/oasis-tcs/virtio-spec/issues/124
> > > > https://github.com/oasis-tcs/virtio-spec/issues/139
> > > >
> > > > This patch set is to support this function in QEMU. It consists of 
> > > > several parts:
> > > > 1. Patches 1-7 are the basic interfaces for vq reset in virtio and 
> > > > virtio-pci.
> > > > 2. Patches 8-12 support vq stop and vq restart for vhost-kernel.
> > > > 3. Patches 13-19 support vq stop and vq restart for vhost-user.
> > > > 4. Patches 20-22 support vq reset and re-enable for virtio-net.
> > > > 5. Patches 23-24 enable the vq reset feature for vhost-kernel and 
> > > > vhost-user.
> > > >
> > > > The process of virtqueue reset can be concluded as:
> > > > 1. The virtqueue is disabled when VIRTIO_PCI_COMMON_Q_RESET is written.
> > > > 2. Then the virtqueue can be optionally restarted(re-enabled).
> > > >
> > > > Since this patch set involves multiple modules and seems a bit messy, 
> > > > we briefly describe the
> > > > calling process for different modes below.
> > > > virtio-net:
> > > > 1. VIRTIO_PCI_COMMON_Q_RESET is written [virtio-pci]
> > > > -> virtio_queue_reset() [virtio]
> > > > -> virtio_net_queue_reset() [virtio-net]
> > > > -> __virtio_queue_reset()
> > > > 2. VIRTIO_PCI_COMMON_Q_ENABLE is written [virtio-pci]
> > > > -> set enabled, reset status of vq.
> > > >
> > > > vhost-kernel:
> > > > 1. VIRTIO_PCI_COMMON_Q_RESET is written [virtio-pci]
> > > > -> virtio_queue_reset() [virtio]
> > > > -> virtio_net_queue_reset() [virtio-net]
> > > > -> vhost_net_virtqueue_stop() [vhost-net]
> > > > -> vhost_net_set_backend() [vhost]
> > > > -> vhost_dev_virtqueue_stop()
> > > > -> vhost_virtqueue_unmap()
> > > > -> __virtio_queue_reset()
> > > > 2. VIRTIO_PCI_COMMON_Q_ENABLE is written [virtio-pci]
> > > > -> virtio_queue_enable() [virtio]
> > > > -> virtio_net_queue_enable() [virtio-net]
> > > > -> vhost_net_virtqueue_restart() [vhost-net]
> > > > -> vhost_dev_virtqueue_restart() [vhost]
> > > > -> vhost_virtqueue_start()
> > > > -> vhost_net_set_backend()
> > > > -> set enabled, reset status of vq.
> > > >
> > > > vhost-user:
> > > > 1. VIRTIO_PCI_COMMON_Q_RESET is written [virtio-pci]
> > > > -> virtio_queue_reset() [virtio]
> > > > -> virtio_net_queue_reset() [virtio-net]
> > > > -> vhost_net_virtqueue_stop() [vhost-net]
> > > > -> vhost_dev_virtqueue_stop() [vhost]
> > > > -> vhost_user_reset_vring() [vhost-user]
> > > > -> send VHOST_USER_RESET_VRING to the device
> > > > -> vhost_virtqueue_unmap()
> > > > -> __virtio_queue_reset()
> > > > 2. VIRTIO_PCI_COMMON_Q_ENABLE is written [virtio-pci]
> > > > -> virtio_queue_enable() [virtio]
> > > > -> virtio_net_queue_enable() [virtio-net]
> > > > -> vhost_net_virtqueue_restart() [vhost-net]
> > > > -> vhost_dev_virtqueue_restart() [vhost]
> > > > -> vhost_virtqueue_start()
> > > > -> vhost_user_set_single_vring_enable [vhost-user]
> > > > -> send VHOST_USER_SET_VRING_ENABLE to the 
> > > > device
> > > > -> set enabled, reset status of vq.
> > > >
> > > >
> > > > Test environment:
> > > > Host: 5.19.0-rc3 (With vq reset support)
> > > > Qemu: QEMU emulator version 7.0.50
> > > > Guest: 5.19.0-rc3 (With vq reset support)
> > > > DPDK: 22.07-rc1 (With vq reset support)
> > > > Test Cmd: ethtool -g eth1; ethtool -G eth1 rx $1 tx $2; ethtool -g 
> > > > eth1;
> > > >
> > > > The driver can resize the virtio queue, then virtio queue reset 
> > > > function should
> > > > be triggered.
> > > >
> > > > The default is split mode, modify Qemu virtio-net to add PACKED 
> > > > feature to
> > > > test packed mode.
> > >
> > > legacy mode testing?
> >
> >
> > legacy does not support vq reset.
> >
> > Thanks.
>
> yes but did it break with all these code changes?

OK, I see, we'll test this.

Thanks.


>
> > >
> > > > Guest Kernel Patch:
> > > > 
> > > > https://lore.kernel.org/bpf/20220801063902.129329-1-xuanz...@linux.alibaba.com/
> > > >
> > > > DPDK Patch:
> > > > 
> > > > https://github.com/middaywords/dpdk/compare/72206323a5dd3182b13f61b25a64abdddfee595c...eabadfac7953da66bc10ffb8284b490d09bb7ec7
> > > >
> > > > Host Kernel Patch:
> > > > 

Re: [PATCH 00/24] Support VIRTIO_F_RING_RESET for virtio-net, vhost-user, vhost-kernel in virtio pci-modern

2022-08-16 Thread Xuan Zhuo
On Tue, 16 Aug 2022 02:14:10 -0400, "Michael S. Tsirkin"  
wrote:
> On Tue, Aug 16, 2022 at 09:06:12AM +0800, Kangjie Xu wrote:
> > The virtio queue reset function has already been defined in the virtio spec 
> > 1.2.
> > The relevant virtio spec information is here:
> >
> > https://github.com/oasis-tcs/virtio-spec/issues/124
> > https://github.com/oasis-tcs/virtio-spec/issues/139
> >
> > This patch set is to support this function in QEMU. It consists of several 
> > parts:
> > 1. Patches 1-7 are the basic interfaces for vq reset in virtio and 
> > virtio-pci.
> > 2. Patches 8-12 support vq stop and vq restart for vhost-kernel.
> > 3. Patches 13-19 support vq stop and vq restart for vhost-user.
> > 4. Patches 20-22 support vq reset and re-enable for virtio-net.
> > 5. Patches 23-24 enable the vq reset feature for vhost-kernel and 
> > vhost-user.
> >
> > The process of virtqueue reset can be concluded as:
> > 1. The virtqueue is disabled when VIRTIO_PCI_COMMON_Q_RESET is written.
> > 2. Then the virtqueue can be optionally restarted(re-enabled).
> >
> > Since this patch set involves multiple modules and seems a bit messy, we 
> > briefly describe the
> > calling process for different modes below.
> > virtio-net:
> > 1. VIRTIO_PCI_COMMON_Q_RESET is written [virtio-pci]
> > -> virtio_queue_reset() [virtio]
> > -> virtio_net_queue_reset() [virtio-net]
> > -> __virtio_queue_reset()
> > 2. VIRTIO_PCI_COMMON_Q_ENABLE is written [virtio-pci]
> > -> set enabled, reset status of vq.
> >
> > vhost-kernel:
> > 1. VIRTIO_PCI_COMMON_Q_RESET is written [virtio-pci]
> > -> virtio_queue_reset() [virtio]
> > -> virtio_net_queue_reset() [virtio-net]
> > -> vhost_net_virtqueue_stop() [vhost-net]
> > -> vhost_net_set_backend() [vhost]
> > -> vhost_dev_virtqueue_stop()
> > -> vhost_virtqueue_unmap()
> > -> __virtio_queue_reset()
> > 2. VIRTIO_PCI_COMMON_Q_ENABLE is written [virtio-pci]
> > -> virtio_queue_enable() [virtio]
> > -> virtio_net_queue_enable() [virtio-net]
> > -> vhost_net_virtqueue_restart() [vhost-net]
> > -> vhost_dev_virtqueue_restart() [vhost]
> > -> vhost_virtqueue_start()
> > -> vhost_net_set_backend()
> > -> set enabled, reset status of vq.
> >
> > vhost-user:
> > 1. VIRTIO_PCI_COMMON_Q_RESET is written [virtio-pci]
> > -> virtio_queue_reset() [virtio]
> > -> virtio_net_queue_reset() [virtio-net]
> > -> vhost_net_virtqueue_stop() [vhost-net]
> > -> vhost_dev_virtqueue_stop() [vhost]
> > -> vhost_user_reset_vring() [vhost-user]
> > -> send VHOST_USER_RESET_VRING to the device
> > -> vhost_virtqueue_unmap()
> > -> __virtio_queue_reset()
> > 2. VIRTIO_PCI_COMMON_Q_ENABLE is written [virtio-pci]
> > -> virtio_queue_enable() [virtio]
> > -> virtio_net_queue_enable() [virtio-net]
> > -> vhost_net_virtqueue_restart() [vhost-net]
> > -> vhost_dev_virtqueue_restart() [vhost]
> > -> vhost_virtqueue_start()
> > -> vhost_user_set_single_vring_enable [vhost-user]
> > -> send VHOST_USER_SET_VRING_ENABLE to the device
> > -> set enabled, reset status of vq.
> >
> >
> > Test environment:
> > Host: 5.19.0-rc3 (With vq reset support)
> > Qemu: QEMU emulator version 7.0.50
> > Guest: 5.19.0-rc3 (With vq reset support)
> > DPDK: 22.07-rc1 (With vq reset support)
> > Test Cmd: ethtool -g eth1; ethtool -G eth1 rx $1 tx $2; ethtool -g eth1;
> >
> > The driver can resize the virtio queue, then virtio queue reset 
> > function should
> > be triggered.
> >
> > The default is split mode, modify Qemu virtio-net to add PACKED feature 
> > to
> > test packed mode.
>
> legacy mode testing?


legacy does not support vq reset.

Thanks.

>
> > Guest Kernel Patch:
> > 
> > https://lore.kernel.org/bpf/20220801063902.129329-1-xuanz...@linux.alibaba.com/
> >
> > DPDK Patch:
> > 
> > https://github.com/middaywords/dpdk/compare/72206323a5dd3182b13f61b25a64abdddfee595c...eabadfac7953da66bc10ffb8284b490d09bb7ec7
> >
> > Host Kernel Patch:
> > 
> > https://github.com/middaywords/linux/commit/19a91e0d7167b2031e46078c6215c213b89cb2c3
> >
> > Looking forward to your review and comments. Thanks.
> >
> > Kangjie Xu (19):
> >   virtio: introduce virtio_queue_enable()
> >   virtio: core: vq reset feature negotation support
> >   virtio-pci: support queue enable
> >   vhost: extract the logic of unmapping the vrings and desc
> >   vhost: introduce vhost_dev_virtqueue_stop()
> >   vhost: introduce vhost_dev_virtqueue_restart()
> >   vhost-net: vhost-kernel: introduce vhost_net_virtqueue_stop()
> >   vhost-net: vhost-kernel: introduce vhost_net_virtqueue_restart()
> >   docs: vhost-user: add 

Re: [PATCH v2 4/4] virt/hw/virt: Add virt_set_high_memmap() helper

2022-08-16 Thread Zhenyu Zhang
commit 49e00c1fe2ab24b73ac16908f3c05ebe88b9186d (HEAD -> master)
Author: Gavin Shan 
Date:   Mon Aug 15 14:29:58 2022 +0800

virt/hw/virt: Add virt_set_high_memmap() helper

The logic to assign high memory region's address in virt_set_memmap()
is independent. Let's move the logic to the virt_set_high_memmap() helper.
"each device" is replaced by "each region" in the comments.

No functional change intended.

Signed-off-by: Gavin Shan 

The patchs works well on my Fujitsu host.

[root@hpe-apollo80-02-n00 qemu]# /home/qemu/build/qemu-system-aarch64 -version
QEMU emulator version 7.0.92 (v7.1.0-rc2-12-gd102b8162a)
[root@hpe-apollo80-02-n00 qemu]# /home/qemu/build/qemu-system-aarch64
-accel kvm -m 4096,maxmem=1023G -machine virt-2.12 -cpu host

[root@hpe-apollo80-02-n00 qemu]# /home/qemu/build/qemu-system-aarch64
-accel kvm -m 4096,maxmem=1024G -machine virt-2.12 -cpu host
qemu-system-aarch64: -accel kvm: Addressing limited to 40 bits, but
memory exceeds it by 1073741824 bytes

[root@hpe-apollo80-02-n00 qemu]# /home/qemu/build/qemu-system-aarch64
-accel kvm -m 4096,maxmem=1023G -machine virt -cpu host

[root@hpe-apollo80-02-n00 qemu]# /home/qemu/build/qemu-system-aarch64
-accel kvm -m 4096,maxmem=1024G -machine virt -cpu host
qemu-system-aarch64: -accel kvm: Addressing limited to 40 bits, but
memory exceeds it by 1073741824 bytes
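
(Worked check of those numbers, assuming the virt machine's default RAM base
of 1 GiB: a 40-bit PA space spans 1 TiB, so 1 GiB of base offset plus
maxmem=1023G ends exactly at the 1 TiB limit, while maxmem=1024G overshoots
it by 1 GiB = 1073741824 bytes, which matches the error above.)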

Tested-by: zheny...@redhat.com


On Mon, Aug 15, 2022 at 2:30 PM Gavin Shan  wrote:
>
> The logic to assign high memory region's address in virt_set_memmap()
> is independent. Let's move the logic to the virt_set_high_memmap() helper.
> "each device" is replaced by "each region" in the comments.
>
> No functional change intended.
>
> Signed-off-by: Gavin Shan 
> ---
>  hw/arm/virt.c | 92 ---
>  1 file changed, 50 insertions(+), 42 deletions(-)
>
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index e38b6919c9..4dde08a924 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -1688,6 +1688,55 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState 
> *vms, int idx)
>  return arm_cpu_mp_affinity(idx, clustersz);
>  }
>
> +static void virt_set_high_memmap(VirtMachineState *vms,
> + hwaddr base, int pa_bits)
> +{
> +hwaddr region_base, region_size;
> +bool *region_enabled, fits;
> +int i;
> +
> +for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) {
> +region_base = ROUND_UP(base, extended_memmap[i].size);
> +region_size = extended_memmap[i].size;
> +
> +switch (i) {
> +case VIRT_HIGH_GIC_REDIST2:
> +region_enabled = &vms->highmem_redists;
> +break;
> +case VIRT_HIGH_PCIE_ECAM:
> +region_enabled = &vms->highmem_ecam;
> +break;
> +case VIRT_HIGH_PCIE_MMIO:
> +region_enabled = &vms->highmem_mmio;
> +break;
> +default:
> +region_enabled = NULL;
> +}
> +
> +/* Skip unknwon or disabled regions */
> +if (!region_enabled || !*region_enabled) {
> +continue;
> +}
> +
> +/*
> + * Check each region to see if they fit in the PA space,
> + * moving highest_gpa as we go.
> + *
> + * For each device that doesn't fit, disable it.
> + */
> +fits = (region_base + region_size) <= BIT_ULL(pa_bits);
> +if (fits) {
> +vms->memmap[i].base = region_base;
> +vms->memmap[i].size = region_size;
> +
> +base = region_base + region_size;
> +vms->highest_gpa = region_base + region_size - 1;
> +} else {
> +*region_enabled = false;
> +}
> +}
> +}
> +
>  static void virt_set_memmap(VirtMachineState *vms, int pa_bits)
>  {
>  MachineState *ms = MACHINE(vms);
> @@ -1742,48 +1791,7 @@ static void virt_set_memmap(VirtMachineState *vms, int 
> pa_bits)
>
>  /* We know for sure that at least the memory fits in the PA space */
>  vms->highest_gpa = memtop - 1;
> -
> -for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) {
> -hwaddr region_base = ROUND_UP(base, extended_memmap[i].size);
> -hwaddr region_size = extended_memmap[i].size;
> -bool *region_enabled, fits;
> -
> -switch (i) {
> -case VIRT_HIGH_GIC_REDIST2:
> -region_enabled = &vms->highmem_redists;
> -break;
> -case VIRT_HIGH_PCIE_ECAM:
> -region_enabled = &vms->highmem_ecam;
> -break;
> -case VIRT_HIGH_PCIE_MMIO:
> -region_enabled = &vms->highmem_mmio;
> -break;
> -default:
> -region_enabled = NULL;
> -}
> -
> -/* Skip unknwon or disabled regions */
> -if (!region_enabled || !*region_enabled) {
> -continue;
> -}
> -
> -/*
> - * Check each device to see if they fit in the PA space,
> - * moving highest_gpa as 
