date:20190822

Re: [Qemu-devel] [RFC PATCH qemu] qapi: Add query-memory-checksum

2019-08-22 Markus Armbruster

Daniel P. Berrangé  writes:

> On Thu, Aug 22, 2019 at 04:16:53PM +0200, Markus Armbruster wrote:
>> Alexey Kardashevskiy  writes:
>> 
>> > This returns MD5 checksum of all RAM blocks for migration debugging
>> > as this is way faster than saving the entire RAM to a file and checking
>> > that.
>> >
>> > Signed-off-by: Alexey Kardashevskiy 
>> 
>> Any particular reason for MD5?  Have you measured the other choices
>> offered by GLib?
>> 
>> I understand you don't need crypto-strength here.  Both MD5 and SHA-1
>> would be bad choices then.
>
> We have a tests/bench-crypto-hash test but it's hardcoded for sha256.
> I hacked it to report all algorithms and got these results for varying
> input chunk sizes:
>
> /crypto/hash/md5/speed-512: 519.12 MB/sec OK
> /crypto/hash/md5/speed-1024: 560.39 MB/sec OK
> /crypto/hash/md5/speed-4096: 591.39 MB/sec OK
> /crypto/hash/md5/speed-16384: 576.46 MB/sec OK
> /crypto/hash/sha1/speed-512: 443.12 MB/sec OK
> /crypto/hash/sha1/speed-1024: 518.82 MB/sec OK
> /crypto/hash/sha1/speed-4096: 555.60 MB/sec OK
> /crypto/hash/sha1/speed-16384: 568.16 MB/sec OK
> /crypto/hash/sha224/speed-512: 221.90 MB/sec OK
> /crypto/hash/sha224/speed-1024: 239.79 MB/sec OK
> /crypto/hash/sha224/speed-4096: 269.37 MB/sec OK
> /crypto/hash/sha224/speed-16384: 274.87 MB/sec OK
> /crypto/hash/sha256/speed-512: 222.75 MB/sec OK
> /crypto/hash/sha256/speed-1024: 253.25 MB/sec OK
> /crypto/hash/sha256/speed-4096: 272.80 MB/sec OK
> /crypto/hash/sha256/speed-16384: 275.59 MB/sec OK
> /crypto/hash/sha384/speed-512: 322.73 MB/sec OK
> /crypto/hash/sha384/speed-1024: 369.84 MB/sec OK
> /crypto/hash/sha384/speed-4096: 406.71 MB/sec OK
> /crypto/hash/sha384/speed-16384: 417.87 MB/sec OK
> /crypto/hash/sha512/speed-512: 320.62 MB/sec OK
> /crypto/hash/sha512/speed-1024: 361.93 MB/sec OK
> /crypto/hash/sha512/speed-4096: 404.91 MB/sec OK
> /crypto/hash/sha512/speed-16384: 418.53 MB/sec OK
> /crypto/hash/ripemd160/speed-512: 226.45 MB/sec OK
> /crypto/hash/ripemd160/speed-1024: 239.25 MB/sec OK
> /crypto/hash/ripemd160/speed-4096: 251.31 MB/sec OK
> /crypto/hash/ripemd160/speed-16384: 255.01 MB/sec OK
>
>
> IOW, md5 is clearly the quickest, by a considerable margin over
> SHA256/512. SHA1 is slightly slower.
>
> Assuming that we document that this command is intentionally
> *not* trying to guarantee collision resistance, we're ok.
>
> In fact we should not document what kind of checksum is
> reported by query-memory-checksum. The impl should be a black
> box from the user's POV.
>
> If we're just aiming for a debugging tool to detect accidental
> corruption, could we even just ignore cryptographic hashes
> entirely and do a crc32? That'd be way faster than even
> md5.

Good points.

The doc strings should spell out "for debugging", like the commit
message does, and both should spell out "weak collision resistance".

I can't find CRC-32 in GLib, but zlib appears to provide it:
http://refspecs.linuxbase.org/LSB_3.0.0/LSB-Core-generic/LSB-Core-generic/zlib-crc32-1.html

Care to compare its speed to MD5?
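
For comparison with MD5, here is a minimal bitwise CRC-32 sketch in C. It assumes the IEEE 802.3 polynomial that zlib's crc32() also uses, and zlib's chaining convention (start from 0 and feed the previous result back in), so per-RAM-block updates fold into one value. sketch_crc32() is a hypothetical name, and this table-free loop is much slower than zlib's own implementation, so it illustrates the checksum itself rather than the speed being asked about.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Bitwise CRC-32 (reflected polynomial 0xEDB88320), the same checksum
 * zlib's crc32() computes. The running value is passed in and returned,
 * so several buffers can be chained: start with 0, feed chunks in order.
 */
static uint32_t sketch_crc32(uint32_t crc, const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    crc = ~crc;                 /* undo the final inversion of the previous call */
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++) {
            /* shift right; xor in the polynomial if the dropped bit was 1 */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;                /* final inversion, as in zlib */
}
```

For example, sketch_crc32(sketch_crc32(0, a, len_a), b, len_b) gives the same result as a single call over the concatenation of a and b, which is what a walk over guest RAM one block at a time would rely on.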

[Qemu-devel] [PATCH v5 30/30] riscv: sifive_u: Update model and compatible strings in device tree

2019-08-22 Bin Meng

This updates the model and compatible strings to match the strings
used in the Linux kernel device tree (hifive-unleashed-a00.dts).

Signed-off-by: Bin Meng 
Reviewed-by: Alistair Francis 

---

Changes in v5: None
Changes in v4: None
Changes in v3: None
Changes in v2: None

 hw/riscv/sifive_u.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/hw/riscv/sifive_u.c b/hw/riscv/sifive_u.c
index 1140c38..fae19fe 100644
--- a/hw/riscv/sifive_u.c
+++ b/hw/riscv/sifive_u.c
@@ -97,8 +97,9 @@ static void create_fdt(SiFiveUState *s, const struct MemmapEntry *memmap,
 exit(1);
 }
 
-qemu_fdt_setprop_string(fdt, "/", "model", "ucbbar,spike-bare,qemu");
-qemu_fdt_setprop_string(fdt, "/", "compatible", "ucbbar,spike-bare-dev");
+qemu_fdt_setprop_string(fdt, "/", "model", "SiFive HiFive Unleashed A00");
+qemu_fdt_setprop_string(fdt, "/", "compatible",
+"sifive,hifive-unleashed-a00");
 qemu_fdt_setprop_cell(fdt, "/", "#size-cells", 0x2);
 qemu_fdt_setprop_cell(fdt, "/", "#address-cells", 0x2);
 
-- 
2.7.4

[Qemu-devel] [PATCH v5 26/30] riscv: sifive: Implement a model for SiFive FU540 OTP

2019-08-22 Bin Meng

This implements a simple model for SiFive FU540 OTP (One-Time
Programmable) Memory interface, primarily for reading out the
stored serial number from the first 1 KiB of the 16 KiB OTP
memory reserved by SiFive for internal use.

Signed-off-by: Bin Meng 

---

Changes in v5:
- change to use defines instead of enums
- change to use qemu_log_mask(LOG_GUEST_ERROR,...) in sifive_u_otp
- creating a 32-bit val variable and using that instead of casting
  everywhere in sifive_u_otp_write()
- move all register initialization to sifive_u_otp_reset() function
- drop sifive_u_otp_create()

Changes in v4:
- prefix all macros/variables/functions with SIFIVE_U/sifive_u
  in the sifive_u_otp driver

Changes in v3: None
Changes in v2: None

 hw/riscv/Makefile.objs  |   1 +
 hw/riscv/sifive_u_otp.c | 190 
 include/hw/riscv/sifive_u_otp.h |  80 +
 3 files changed, 271 insertions(+)
 create mode 100644 hw/riscv/sifive_u_otp.c
 create mode 100644 include/hw/riscv/sifive_u_otp.h

diff --git a/hw/riscv/Makefile.objs b/hw/riscv/Makefile.objs
index b95bbd5..fc3c6dd 100644
--- a/hw/riscv/Makefile.objs
+++ b/hw/riscv/Makefile.objs
@@ -8,6 +8,7 @@ obj-$(CONFIG_SIFIVE) += sifive_gpio.o
 obj-$(CONFIG_SIFIVE) += sifive_plic.o
 obj-$(CONFIG_SIFIVE) += sifive_test.o
 obj-$(CONFIG_SIFIVE_U) += sifive_u.o
+obj-$(CONFIG_SIFIVE_U) += sifive_u_otp.o
 obj-$(CONFIG_SIFIVE_U) += sifive_u_prci.o
 obj-$(CONFIG_SIFIVE) += sifive_uart.o
 obj-$(CONFIG_SPIKE) += spike.o
diff --git a/hw/riscv/sifive_u_otp.c b/hw/riscv/sifive_u_otp.c
new file mode 100644
index 000..7d65a85
--- /dev/null
+++ b/hw/riscv/sifive_u_otp.c
@@ -0,0 +1,190 @@
+/*
+ * QEMU SiFive U OTP (One-Time Programmable) Memory interface
+ *
+ * Copyright (c) 2019 Bin Meng 
+ *
+ * Simple model of the OTP to emulate register reads made by the SDK BSP
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see .
+ */
+
+#include "qemu/osdep.h"
+#include "hw/sysbus.h"
+#include "qemu/log.h"
+#include "qemu/module.h"
+#include "hw/riscv/sifive_u_otp.h"
+
+static uint64_t sifive_u_otp_read(void *opaque, hwaddr addr, unsigned int size)
+{
+SiFiveUOTPState *s = opaque;
+
+switch (addr) {
+case SIFIVE_U_OTP_PA:
+return s->pa;
+case SIFIVE_U_OTP_PAIO:
+return s->paio;
+case SIFIVE_U_OTP_PAS:
+return s->pas;
+case SIFIVE_U_OTP_PCE:
+return s->pce;
+case SIFIVE_U_OTP_PCLK:
+return s->pclk;
+case SIFIVE_U_OTP_PDIN:
+return s->pdin;
+case SIFIVE_U_OTP_PDOUT:
+if ((s->pce & SIFIVE_U_OTP_PCE_EN) &&
+(s->pdstb & SIFIVE_U_OTP_PDSTB_EN) &&
+(s->ptrim & SIFIVE_U_OTP_PTRIM_EN)) {
+return s->fuse[s->pa & SIFIVE_U_OTP_PA_MASK];
+} else {
+return 0xff;
+}
+case SIFIVE_U_OTP_PDSTB:
+return s->pdstb;
+case SIFIVE_U_OTP_PPROG:
+return s->pprog;
+case SIFIVE_U_OTP_PTC:
+return s->ptc;
+case SIFIVE_U_OTP_PTM:
+return s->ptm;
+case SIFIVE_U_OTP_PTM_REP:
+return s->ptm_rep;
+case SIFIVE_U_OTP_PTR:
+return s->ptr;
+case SIFIVE_U_OTP_PTRIM:
+return s->ptrim;
+case SIFIVE_U_OTP_PWE:
+return s->pwe;
+}
+
+qemu_log_mask(LOG_GUEST_ERROR, "%s: read: addr=0x%x\n",
+  __func__, (int)addr);
+return 0;
+}
+
+static void sifive_u_otp_write(void *opaque, hwaddr addr,
+   uint64_t val64, unsigned int size)
+{
+SiFiveUOTPState *s = opaque;
+uint32_t val32 = (uint32_t)val64;
+
+switch (addr) {
+case SIFIVE_U_OTP_PA:
+s->pa = val32 & SIFIVE_U_OTP_PA_MASK;
+break;
+case SIFIVE_U_OTP_PAIO:
+s->paio = val32;
+break;
+case SIFIVE_U_OTP_PAS:
+s->pas = val32;
+break;
+case SIFIVE_U_OTP_PCE:
+s->pce = val32;
+break;
+case SIFIVE_U_OTP_PCLK:
+s->pclk = val32;
+break;
+case SIFIVE_U_OTP_PDIN:
+s->pdin = val32;
+break;
+case SIFIVE_U_OTP_PDOUT:
+/* read-only */
+break;
+case SIFIVE_U_OTP_PDSTB:
+s->pdstb = val32;
+break;
+case SIFIVE_U_OTP_PPROG:
+s->pprog = val32;
+break;
+case SIFIVE_U_OTP_PTC:
+s->ptc = val32;
+break;
+case SIFIVE_U_OTP_PTM:
+s->ptm = val32;
+break;
+case
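
The SIFIVE_U_OTP_PDOUT case in sifive_u_otp_read() only exposes a fuse word when PCE, PDSTB and PTRIM are all enabled, and otherwise reads back 0xff. A minimal QEMU-independent sketch of that gating follows. The struct, the single enable bit, the 4096-entry fuse array (16 KiB of 32-bit words) and the helper names are assumptions for illustration; the contents of include/hw/riscv/sifive_u_otp.h are not shown in this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define OTP_NUM_FUSES   4096u                 /* assumed: 16 KiB of 32-bit words */
#define OTP_PA_MASK     (OTP_NUM_FUSES - 1u)  /* address mask for the fuse array */
#define OTP_EN_BIT      0x1u                  /* assumed enable bit for PCE/PDSTB/PTRIM */

static_assert((OTP_NUM_FUSES & OTP_PA_MASK) == 0,
              "fuse count must be a power of two for the mask to work");

/* Hypothetical subset of the OTP register state. */
typedef struct {
    uint32_t pa;                              /* address register */
    uint32_t pce, pdstb, ptrim;               /* control registers */
    uint32_t fuse[OTP_NUM_FUSES];             /* fuse array contents */
} OtpSketch;

/* Writes to PA are masked, mirroring the SIFIVE_U_OTP_PA write case. */
static void otp_write_pa(OtpSketch *s, uint32_t val)
{
    s->pa = val & OTP_PA_MASK;
}

/* PDOUT returns the addressed fuse only when all three enables are set. */
static uint32_t otp_read_pdout(const OtpSketch *s)
{
    bool enabled = (s->pce & OTP_EN_BIT) &&
                   (s->pdstb & OTP_EN_BIT) &&
                   (s->ptrim & OTP_EN_BIT);

    return enabled ? s->fuse[s->pa & OTP_PA_MASK] : 0xffu;
}
```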

[Qemu-devel] [PATCH v5 20/30] riscv: sifive_u: Generate hfclk and rtcclk nodes

2019-08-22 Bin Meng

To keep in sync with the Linux kernel device tree, generate hfclk and
rtcclk nodes in the device tree, to be referenced by the PRCI node.

Signed-off-by: Bin Meng 
Reviewed-by: Alistair Francis 
---

Changes in v5: None
Changes in v4: None
Changes in v3: None
Changes in v2: None

 hw/riscv/sifive_u.c | 23 +++
 include/hw/riscv/sifive_u.h |  2 ++
 2 files changed, 25 insertions(+)

diff --git a/hw/riscv/sifive_u.c b/hw/riscv/sifive_u.c
index 284f7a5..08db741 100644
--- a/hw/riscv/sifive_u.c
+++ b/hw/riscv/sifive_u.c
@@ -80,6 +80,7 @@ static void create_fdt(SiFiveUState *s, const struct MemmapEntry *memmap,
 char ethclk_names[] = "pclk\0hclk\0tx_clk";
 uint32_t plic_phandle, ethclk_phandle, phandle = 1;
 uint32_t uartclk_phandle;
+uint32_t hfclk_phandle, rtcclk_phandle;
 
 fdt = s->fdt = create_device_tree(&s->fdt_size);
 if (!fdt) {
@@ -98,6 +99,28 @@ static void create_fdt(SiFiveUState *s, const struct MemmapEntry *memmap,
 qemu_fdt_setprop_cell(fdt, "/soc", "#size-cells", 0x2);
 qemu_fdt_setprop_cell(fdt, "/soc", "#address-cells", 0x2);
 
+hfclk_phandle = phandle++;
+nodename = g_strdup_printf("/hfclk");
+qemu_fdt_add_subnode(fdt, nodename);
+qemu_fdt_setprop_cell(fdt, nodename, "phandle", hfclk_phandle);
+qemu_fdt_setprop_string(fdt, nodename, "clock-output-names", "hfclk");
+qemu_fdt_setprop_cell(fdt, nodename, "clock-frequency",
+SIFIVE_U_HFCLK_FREQ);
+qemu_fdt_setprop_string(fdt, nodename, "compatible", "fixed-clock");
+qemu_fdt_setprop_cell(fdt, nodename, "#clock-cells", 0x0);
+g_free(nodename);
+
+rtcclk_phandle = phandle++;
+nodename = g_strdup_printf("/rtcclk");
+qemu_fdt_add_subnode(fdt, nodename);
+qemu_fdt_setprop_cell(fdt, nodename, "phandle", rtcclk_phandle);
+qemu_fdt_setprop_string(fdt, nodename, "clock-output-names", "rtcclk");
+qemu_fdt_setprop_cell(fdt, nodename, "clock-frequency",
+SIFIVE_U_RTCCLK_FREQ);
+qemu_fdt_setprop_string(fdt, nodename, "compatible", "fixed-clock");
+qemu_fdt_setprop_cell(fdt, nodename, "#clock-cells", 0x0);
+g_free(nodename);
+
 nodename = g_strdup_printf("/memory@%lx",
 (long)memmap[SIFIVE_U_DRAM].base);
 qemu_fdt_add_subnode(fdt, nodename);
diff --git a/include/hw/riscv/sifive_u.h b/include/hw/riscv/sifive_u.h
index 7a1a4f3..debbf28 100644
--- a/include/hw/riscv/sifive_u.h
+++ b/include/hw/riscv/sifive_u.h
@@ -68,6 +68,8 @@ enum {
 
 enum {
 SIFIVE_U_CLOCK_FREQ = 10,
+SIFIVE_U_HFCLK_FREQ = ,
+SIFIVE_U_RTCCLK_FREQ = 100,
 SIFIVE_U_GEM_CLOCK_FREQ = 12500
 };
 
-- 
2.7.4

[Qemu-devel] [PATCH v5 29/30] riscv: sifive_u: Remove handcrafted clock nodes for UART and ethernet

2019-08-22 Bin Meng

In the past we did not have a model for PRCI, hence two handcrafted
clock nodes ("/soc/ethclk" and "/soc/uartclk") were created to
supply hard-coded clock frequencies. Now that PRCI support has been
added to QEMU, we don't need them any more.

Signed-off-by: Bin Meng 
Reviewed-by: Alistair Francis 

---

Changes in v5: None
Changes in v4:
- new patch to remove handcrafted clock nodes for UART and ethernet

Changes in v3: None
Changes in v2: None

 hw/riscv/sifive_u.c | 24 +---
 include/hw/riscv/sifive_u.h |  3 +--
 2 files changed, 2 insertions(+), 25 deletions(-)

diff --git a/hw/riscv/sifive_u.c b/hw/riscv/sifive_u.c
index 503db4b..1140c38 100644
--- a/hw/riscv/sifive_u.c
+++ b/hw/riscv/sifive_u.c
@@ -88,8 +88,7 @@ static void create_fdt(SiFiveUState *s, const struct MemmapEntry *memmap,
 uint32_t *cells;
 char *nodename;
 char ethclk_names[] = "pclk\0hclk";
-uint32_t plic_phandle, prci_phandle, ethclk_phandle, phandle = 1;
-uint32_t uartclk_phandle;
+uint32_t plic_phandle, prci_phandle, phandle = 1;
 uint32_t hfclk_phandle, rtcclk_phandle, phy_phandle;
 
 fdt = s->fdt = create_device_tree(&s->fdt_size);
@@ -249,17 +248,6 @@ static void create_fdt(SiFiveUState *s, const struct MemmapEntry *memmap,
 g_free(cells);
 g_free(nodename);
 
-ethclk_phandle = phandle++;
-nodename = g_strdup_printf("/soc/ethclk");
-qemu_fdt_add_subnode(fdt, nodename);
-qemu_fdt_setprop_string(fdt, nodename, "compatible", "fixed-clock");
-qemu_fdt_setprop_cell(fdt, nodename, "#clock-cells", 0x0);
-qemu_fdt_setprop_cell(fdt, nodename, "clock-frequency",
-SIFIVE_U_GEM_CLOCK_FREQ);
-qemu_fdt_setprop_cell(fdt, nodename, "phandle", ethclk_phandle);
-ethclk_phandle = qemu_fdt_get_phandle(fdt, nodename);
-g_free(nodename);
-
 phy_phandle = phandle++;
 nodename = g_strdup_printf("/soc/ethernet@%lx",
 (long)memmap[SIFIVE_U_GEM].base);
@@ -293,16 +281,6 @@ static void create_fdt(SiFiveUState *s, const struct MemmapEntry *memmap,
 qemu_fdt_setprop_cell(fdt, nodename, "reg", 0x0);
 g_free(nodename);
 
-uartclk_phandle = phandle++;
-nodename = g_strdup_printf("/soc/uartclk");
-qemu_fdt_add_subnode(fdt, nodename);
-qemu_fdt_setprop_string(fdt, nodename, "compatible", "fixed-clock");
-qemu_fdt_setprop_cell(fdt, nodename, "#clock-cells", 0x0);
-qemu_fdt_setprop_cell(fdt, nodename, "clock-frequency", 3686400);
-qemu_fdt_setprop_cell(fdt, nodename, "phandle", uartclk_phandle);
-uartclk_phandle = qemu_fdt_get_phandle(fdt, nodename);
-g_free(nodename);
-
 nodename = g_strdup_printf("/soc/serial@%lx",
 (long)memmap[SIFIVE_U_UART0].base);
 qemu_fdt_add_subnode(fdt, nodename);
diff --git a/include/hw/riscv/sifive_u.h b/include/hw/riscv/sifive_u.h
index d2b9d99..3bb87cb 100644
--- a/include/hw/riscv/sifive_u.h
+++ b/include/hw/riscv/sifive_u.h
@@ -76,8 +76,7 @@ enum {
 enum {
 SIFIVE_U_CLOCK_FREQ = 10,
 SIFIVE_U_HFCLK_FREQ = ,
-SIFIVE_U_RTCCLK_FREQ = 100,
-SIFIVE_U_GEM_CLOCK_FREQ = 12500
+SIFIVE_U_RTCCLK_FREQ = 100
 };
 
 #define SIFIVE_U_MANAGEMENT_CPU_COUNT   1
-- 
2.7.4

[Qemu-devel] [PATCH v5 22/30] riscv: sifive_u: Reference PRCI clocks in UART and ethernet nodes

2019-08-22 Bin Meng

Now that we have added a PRCI node, update existing UART and ethernet
nodes to reference PRCI as their clock sources, to keep in sync with
the Linux kernel device tree.

Signed-off-by: Bin Meng 
Reviewed-by: Alistair Francis 
---

Changes in v5: None
Changes in v4: None
Changes in v3: None
Changes in v2: None

 hw/riscv/sifive_u.c  |  7 ---
 include/hw/riscv/sifive_u_prci.h | 10 ++
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/hw/riscv/sifive_u.c b/hw/riscv/sifive_u.c
index c777d41..e0842ad 100644
--- a/hw/riscv/sifive_u.c
+++ b/hw/riscv/sifive_u.c
@@ -79,7 +79,7 @@ static void create_fdt(SiFiveUState *s, const struct MemmapEntry *memmap,
 int cpu;
 uint32_t *cells;
 char *nodename;
-char ethclk_names[] = "pclk\0hclk\0tx_clk";
+char ethclk_names[] = "pclk\0hclk";
 uint32_t plic_phandle, prci_phandle, ethclk_phandle, phandle = 1;
 uint32_t uartclk_phandle;
 uint32_t hfclk_phandle, rtcclk_phandle;
@@ -264,7 +264,7 @@ static void create_fdt(SiFiveUState *s, const struct MemmapEntry *memmap,
 qemu_fdt_setprop_cell(fdt, nodename, "interrupt-parent", plic_phandle);
 qemu_fdt_setprop_cell(fdt, nodename, "interrupts", SIFIVE_U_GEM_IRQ);
 qemu_fdt_setprop_cells(fdt, nodename, "clocks",
-ethclk_phandle, ethclk_phandle, ethclk_phandle);
+prci_phandle, PRCI_CLK_GEMGXLPLL, prci_phandle, PRCI_CLK_GEMGXLPLL);
 qemu_fdt_setprop(fdt, nodename, "clock-names", ethclk_names,
 sizeof(ethclk_names));
 qemu_fdt_setprop_cell(fdt, nodename, "#address-cells", 1);
@@ -294,7 +294,8 @@ static void create_fdt(SiFiveUState *s, const struct MemmapEntry *memmap,
 qemu_fdt_setprop_cells(fdt, nodename, "reg",
 0x0, memmap[SIFIVE_U_UART0].base,
 0x0, memmap[SIFIVE_U_UART0].size);
-qemu_fdt_setprop_cell(fdt, nodename, "clocks", uartclk_phandle);
+qemu_fdt_setprop_cells(fdt, nodename, "clocks",
+prci_phandle, PRCI_CLK_TLCLK);
 qemu_fdt_setprop_cell(fdt, nodename, "interrupt-parent", plic_phandle);
 qemu_fdt_setprop_cell(fdt, nodename, "interrupts", SIFIVE_U_UART0_IRQ);
 
diff --git a/include/hw/riscv/sifive_u_prci.h b/include/hw/riscv/sifive_u_prci.h
index 60a2eab..0a531fd 100644
--- a/include/hw/riscv/sifive_u_prci.h
+++ b/include/hw/riscv/sifive_u_prci.h
@@ -78,4 +78,14 @@ typedef struct SiFiveUPRCIState {
 uint32_t clkmuxstatus;
 } SiFiveUPRCIState;
 
+/*
+ * Clock indexes for use by Device Tree data and the PRCI driver.
+ *
+ * These values are from sifive-fu540-prci.h in the Linux kernel.
+ */
+#define PRCI_CLK_COREPLL    0
+#define PRCI_CLK_DDRPLL 1
+#define PRCI_CLK_GEMGXLPLL  2
+#define PRCI_CLK_TLCLK  3
+
 #endif /* HW_SIFIVE_U_PRCI_H */
-- 
2.7.4
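
With "#clock-cells" set to 1 on the PRCI node, each entry in a consumer's "clocks" property is a (phandle, clock index) pair, so the GEM node now carries two PRCI_CLK_GEMGXLPLL references and the UART one PRCI_CLK_TLCLK reference. The sketch below shows how a consumer could pick the n-th clock out of such a property once its cells are decoded to host byte order. The ClockSpec type, clocks_get_nth() and the phandle value 3 used in the example are hypothetical, and a real consumer would read "#clock-cells" from the provider node instead of assuming 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Clock indexes as defined in sifive_u_prci.h above. */
#define PRCI_CLK_COREPLL    0
#define PRCI_CLK_DDRPLL     1
#define PRCI_CLK_GEMGXLPLL  2
#define PRCI_CLK_TLCLK      3

/* One clock reference: provider phandle plus the clock index cell. */
typedef struct {
    uint32_t phandle;
    uint32_t index;
} ClockSpec;

/*
 * Return the n-th clock of a "clocks" property whose cells are already in
 * host byte order, assuming every provider has #clock-cells = 1, i.e. each
 * entry is exactly two cells. Returns false for a malformed property or an
 * out-of-range n.
 */
static bool clocks_get_nth(const uint32_t *cells, size_t ncells, size_t n,
                           ClockSpec *out)
{
    assert(out != NULL);

    if (ncells % 2 != 0 || n >= ncells / 2) {
        return false;
    }
    out->phandle = cells[2 * n];
    out->index = cells[2 * n + 1];
    return true;
}
```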

[Qemu-devel] [PATCH v5 14/30] riscv: hart: Extract hart realize to a separate routine

2019-08-22 Bin Meng

Currently riscv_harts_realize() creates all harts based on the
same CPU type given in the hart array property, so the current
implementation can only create homogeneous harts. Extract the
hart realize into a separate routine in preparation for supporting
multiple hart arrays.

Note the file header says the RISC-V hart array holds the state
of a heterogeneous array of RISC-V harts, which is not true.
Update the comment to say it is a homogeneous array of RISC-V harts.

Signed-off-by: Bin Meng 
Reviewed-by: Alistair Francis 
---

Changes in v5: None
Changes in v4: None
Changes in v3: None
Changes in v2: None

 hw/riscv/riscv_hart.c | 33 -
 1 file changed, 20 insertions(+), 13 deletions(-)

diff --git a/hw/riscv/riscv_hart.c b/hw/riscv/riscv_hart.c
index ca69a1b..9deef869 100644
--- a/hw/riscv/riscv_hart.c
+++ b/hw/riscv/riscv_hart.c
@@ -3,7 +3,7 @@
  *
  * Copyright (c) 2017 SiFive, Inc.
  *
- * Holds the state of a heterogenous array of RISC-V harts
+ * Holds the state of a homogeneous array of RISC-V harts
  *
  * This program is free software; you can redistribute it and/or modify it
  * under the terms and conditions of the GNU General Public License,
@@ -37,26 +37,33 @@ static void riscv_harts_cpu_reset(void *opaque)
 cpu_reset(CPU(cpu));
 }
 
+static void riscv_hart_realize(RISCVHartArrayState *s, int idx,
+   char *cpu_type, Error **errp)
+{
+Error *err = NULL;
+
+object_initialize_child(OBJECT(s), "harts[*]", &s->harts[idx],
+sizeof(RISCVCPU), cpu_type,
+&error_abort, NULL);
+s->harts[idx].env.mhartid = idx;
+qemu_register_reset(riscv_harts_cpu_reset, &s->harts[idx]);
+object_property_set_bool(OBJECT(&s->harts[idx]), true,
+ "realized", &err);
+if (err) {
+error_propagate(errp, err);
+return;
+}
+}
+
 static void riscv_harts_realize(DeviceState *dev, Error **errp)
 {
 RISCVHartArrayState *s = RISCV_HART_ARRAY(dev);
-Error *err = NULL;
 int n;
 
 s->harts = g_new0(RISCVCPU, s->num_harts);
 
 for (n = 0; n < s->num_harts; n++) {
-object_initialize_child(OBJECT(s), "harts[*]", &s->harts[n],
-sizeof(RISCVCPU), s->cpu_type,
-&error_abort, NULL);
-s->harts[n].env.mhartid = n;
-qemu_register_reset(riscv_harts_cpu_reset, &s->harts[n]);
-object_property_set_bool(OBJECT(&s->harts[n]), true,
- "realized", &err);
-if (err) {
-error_propagate(errp, err);
-return;
-}
+riscv_hart_realize(s, n, s->cpu_type, errp);
 }
 }
 
-- 
2.7.4
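
The extracted riscv_hart_realize() reports failure only through its Error ** argument. When such a helper is called in a loop, a caller that wants to stop at the first failure needs a way to detect it; one common shape is to have the helper also return a success flag. The sketch below shows that pattern outside QEMU. HartSketch, hart_realize_one(), harts_realize_all(), the fail_idx test hook and the plain string error are hypothetical stand-ins, not QEMU's RISCVCPU or Error API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one hart; not QEMU's RISCVCPU. */
typedef struct {
    int mhartid;
    bool realized;
} HartSketch;

/*
 * Realize hart idx. On failure, set *err and return false so the caller
 * can stop early. fail_idx simulates a failing hart (-1 means none).
 */
static bool hart_realize_one(HartSketch *harts, int idx, int fail_idx,
                             const char **err)
{
    assert(harts != NULL && err != NULL);

    harts[idx].mhartid = idx;      /* hart ID equals its array index */
    if (idx == fail_idx) {
        *err = "realize failed";
        return false;
    }
    harts[idx].realized = true;
    return true;
}

/* Realize num harts in order; returns how many succeeded before an error. */
static int harts_realize_all(HartSketch *harts, int num, int fail_idx,
                             const char **err)
{
    for (int n = 0; n < num; n++) {
        if (!hart_realize_one(harts, n, fail_idx, err)) {
            return n;              /* stop at the first error */
        }
    }
    return num;
}
```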

[Qemu-devel] [PATCH v5 24/30] riscv: sifive_u: Change UART node name in device tree

2019-08-22 Bin Meng

OpenSBI for fu540 performs a DT fix-up (see fu540_modify_dt()) that updates
the chosen "stdout-path" to point to "/soc/serial@...", and U-Boot uses
this information to locate the serial node and probe its driver.
However, we currently generate the UART node name as "/soc/uart@...",
which causes U-Boot to fail to find the serial node in the DT.

Signed-off-by: Bin Meng 
Reviewed-by: Alistair Francis 
---

Changes in v5: None
Changes in v4: None
Changes in v3: None
Changes in v2: None

 hw/riscv/sifive_u.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/riscv/sifive_u.c b/hw/riscv/sifive_u.c
index 1a178dc..6cf669c 100644
--- a/hw/riscv/sifive_u.c
+++ b/hw/riscv/sifive_u.c
@@ -287,7 +287,7 @@ static void create_fdt(SiFiveUState *s, const struct MemmapEntry *memmap,
 uartclk_phandle = qemu_fdt_get_phandle(fdt, nodename);
 g_free(nodename);
 
-nodename = g_strdup_printf("/soc/uart@%lx",
+nodename = g_strdup_printf("/soc/serial@%lx",
 (long)memmap[SIFIVE_U_UART0].base);
 qemu_fdt_add_subnode(fdt, nodename);
 qemu_fdt_setprop_string(fdt, nodename, "compatible", "sifive,uart0");
-- 
2.7.4
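
Devicetree node names take the form name@unit-address, with the unit address printed in hex without a 0x prefix, which is what the "%lx" format above produces. Assuming the fixed-up "stdout-path" names the UART by its full node path, the lookup only succeeds when the generated path matches it exactly, so the generic name has to be "serial". The sketch below uses snprintf() as a stand-in for g_strdup_printf(); dt_node_path(), matches_stdout_path() and the base address 0x12340000 are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Build "/soc/<name>@<hex base>", mirroring the "%lx" node-name format. */
static int dt_node_path(char *buf, size_t size, const char *name,
                        unsigned long base)
{
    assert(buf != NULL && name != NULL);
    return snprintf(buf, size, "/soc/%s@%lx", name, base);
}

/* True if the generated node path is exactly what stdout-path refers to. */
static bool matches_stdout_path(const char *node_path, const char *stdout_path)
{
    return strcmp(node_path, stdout_path) == 0;
}
```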

[Qemu-devel] [PATCH v5 25/30] riscv: roms: Update default bios for sifive_u machine

2019-08-22 Bin Meng

With support for heterogeneous harts and the PRCI model, it's now
possible to use the OpenSBI image (PLATFORM=sifive/fu540) built
for the real hardware.

Signed-off-by: Bin Meng 
Reviewed-by: Alistair Francis 
---

Changes in v5: None
Changes in v4: None
Changes in v3: None
Changes in v2: None

 pc-bios/opensbi-riscv64-sifive_u-fw_jump.bin | Bin 40968 -> 45064 bytes
 roms/Makefile|   4 ++--
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/pc-bios/opensbi-riscv64-sifive_u-fw_jump.bin b/pc-bios/opensbi-riscv64-sifive_u-fw_jump.bin
index 5d7a1ef6818994bac4a36818ad36043b592ce309..eb22aefdfb468cfe2804cb4b0bc422d8ebcae93b 100644
GIT binary patch
delta 10830

[Qemu-devel] [PATCH v5 21/30] riscv: sifive_u: Add PRCI block to the SoC

2019-08-22 Bin Meng

Add the PRCI MMIO base address and size mappings to the sifive_u
machine, and generate the corresponding device tree node.

Signed-off-by: Bin Meng 

---

Changes in v5:
- create the sifive_u_prci block directly in the machine code, instead
  of calling sifive_u_prci_create()

Changes in v4: None
Changes in v3: None
Changes in v2: None

 hw/riscv/sifive_u.c | 24 +++-
 include/hw/riscv/sifive_u.h |  3 +++
 2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/hw/riscv/sifive_u.c b/hw/riscv/sifive_u.c
index 08db741..c777d41 100644
--- a/hw/riscv/sifive_u.c
+++ b/hw/riscv/sifive_u.c
@@ -9,6 +9,7 @@
  * 0) UART
  * 1) CLINT (Core Level Interruptor)
  * 2) PLIC (Platform Level Interrupt Controller)
+ * 3) PRCI (Power, Reset, Clock, Interrupt)
  *
  * This board currently generates devicetree dynamically that indicates at least
  * two harts and up to five harts.
@@ -61,6 +62,7 @@ static const struct MemmapEntry {
 [SIFIVE_U_MROM] = { 0x1000,0x11000 },
 [SIFIVE_U_CLINT] ={  0x200,0x1 },
 [SIFIVE_U_PLIC] = {  0xc00,  0x400 },
+[SIFIVE_U_PRCI] = { 0x1000, 0x1000 },
 [SIFIVE_U_UART0] ={ 0x10013000, 0x1000 },
 [SIFIVE_U_UART1] ={ 0x10023000, 0x1000 },
 [SIFIVE_U_DRAM] = { 0x8000,0x0 },
@@ -78,7 +80,7 @@ static void create_fdt(SiFiveUState *s, const struct MemmapEntry *memmap,
 uint32_t *cells;
 char *nodename;
 char ethclk_names[] = "pclk\0hclk\0tx_clk";
-uint32_t plic_phandle, ethclk_phandle, phandle = 1;
+uint32_t plic_phandle, prci_phandle, ethclk_phandle, phandle = 1;
 uint32_t uartclk_phandle;
 uint32_t hfclk_phandle, rtcclk_phandle;
 
@@ -189,6 +191,21 @@ static void create_fdt(SiFiveUState *s, const struct MemmapEntry *memmap,
 g_free(cells);
 g_free(nodename);
 
+prci_phandle = phandle++;
+nodename = g_strdup_printf("/soc/clock-controller@%lx",
+(long)memmap[SIFIVE_U_PRCI].base);
+qemu_fdt_add_subnode(fdt, nodename);
+qemu_fdt_setprop_cell(fdt, nodename, "phandle", prci_phandle);
+qemu_fdt_setprop_cell(fdt, nodename, "#clock-cells", 0x1);
+qemu_fdt_setprop_cells(fdt, nodename, "clocks",
+hfclk_phandle, rtcclk_phandle);
+qemu_fdt_setprop_cells(fdt, nodename, "reg",
+0x0, memmap[SIFIVE_U_PRCI].base,
+0x0, memmap[SIFIVE_U_PRCI].size);
+qemu_fdt_setprop_string(fdt, nodename, "compatible",
+"sifive,fu540-c000-prci");
+g_free(nodename);
+
 plic_phandle = phandle++;
 cells =  g_new0(uint32_t, ms->smp.cpus * 4 - 2);
 for (cpu = 0; cpu < ms->smp.cpus; cpu++) {
@@ -411,6 +428,8 @@ static void riscv_sifive_u_soc_init(Object *obj)
 "cpu-type", &error_abort);
 }
 
+sysbus_init_child_obj(obj, "prci", &s->prci, sizeof(s->prci),
+  TYPE_SIFIVE_U_PRCI);
 sysbus_init_child_obj(obj, "gem", &s->gem, sizeof(s->gem),
   TYPE_CADENCE_GEM);
 }
@@ -484,6 +503,9 @@ static void riscv_sifive_u_soc_realize(DeviceState *dev, Error **errp)
 memmap[SIFIVE_U_CLINT].size, ms->smp.cpus,
 SIFIVE_SIP_BASE, SIFIVE_TIMECMP_BASE, SIFIVE_TIME_BASE);
 
+object_property_set_bool(OBJECT(&s->prci), true, "realized", &err);
+sysbus_mmio_map(SYS_BUS_DEVICE(&s->prci), 0, memmap[SIFIVE_U_PRCI].base);
+
 for (i = 0; i < SIFIVE_U_PLIC_NUM_SOURCES; i++) {
 plic_gpios[i] = qdev_get_gpio_in(DEVICE(s->plic), i);
 }
diff --git a/include/hw/riscv/sifive_u.h b/include/hw/riscv/sifive_u.h
index debbf28..2a023be 100644
--- a/include/hw/riscv/sifive_u.h
+++ b/include/hw/riscv/sifive_u.h
@@ -21,6 +21,7 @@
 
 #include "hw/net/cadence_gem.h"
 #include "hw/riscv/sifive_cpu.h"
+#include "hw/riscv/sifive_u_prci.h"
 
 #define TYPE_RISCV_U_SOC "riscv.sifive.u.soc"
 #define RISCV_U_SOC(obj) \
@@ -36,6 +37,7 @@ typedef struct SiFiveUSoCState {
 RISCVHartArrayState e_cpus;
 RISCVHartArrayState u_cpus;
 DeviceState *plic;
+SiFiveUPRCIState prci;
 CadenceGEMState gem;
 } SiFiveUSoCState;
 
@@ -54,6 +56,7 @@ enum {
 SIFIVE_U_MROM,
 SIFIVE_U_CLINT,
 SIFIVE_U_PLIC,
+SIFIVE_U_PRCI,
 SIFIVE_U_UART0,
 SIFIVE_U_UART1,
 SIFIVE_U_DRAM,
-- 
2.7.4
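
With "#address-cells" and "#size-cells" both set to 2 under /soc (see the hunk context in the hfclk/rtcclk patch), each "reg" entry is four 32-bit big-endian cells: address high, address low, size high, size low. The PRCI node passes 0x0 as each high cell and the memmap value as the low cell, which relies on base and size fitting in 32 bits. The sketch below splits a full 64-bit value into the raw property bytes instead; put_be32(), reg_encode() and the example address 0x12345000 are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store v as one big-endian 32-bit cell at p (FDT cells are big-endian). */
static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/*
 * Encode one "reg" entry for #address-cells = 2 and #size-cells = 2 as
 * <addr_hi addr_lo size_hi size_lo>. Returns the number of bytes written.
 */
static size_t reg_encode(uint8_t out[16], uint64_t base, uint64_t size)
{
    assert(out != NULL);

    put_be32(out + 0, (uint32_t)(base >> 32));
    put_be32(out + 4, (uint32_t)base);
    put_be32(out + 8, (uint32_t)(size >> 32));
    put_be32(out + 12, (uint32_t)size);
    return 16;
}
```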

[Qemu-devel] [PATCH v5 28/30] riscv: sifive_u: Fix broken GEM support

2019-08-22 Bin Meng

At present the GEM support in the sifive_u machine is seriously broken.
The GEM block register base was set to a weird number (0x100900FC),
which could in no way work with the cadence_gem model in QEMU.

Unlike other GEM variants, the FU540-specific GEM has a management
block, mapped at 0x100a, that controls 10/100/1000 Mbps link speed
changes. We can simply map it into the MMIO space with
create_unimplemented_device(), without any special handling.

Update the GEM node compatible string to the official name used
by the upstream Linux kernel, and add the management block reg base
& size to the  property encoding.

Tested with upstream U-Boot and Linux kernel MACB drivers.

Signed-off-by: Bin Meng 
Reviewed-by: Alistair Francis 

---

Changes in v5:
- add the missing "local-mac-address" property in the ethernet node

Changes in v4: None
Changes in v3: None
Changes in v2:
- use create_unimplemented_device() to create the GEM management
  block instead of sifive_mmio_emulate()
- add "phy-handle" property to the ethernet node

 hw/riscv/sifive_u.c | 24 
 include/hw/riscv/sifive_u.h |  3 ++-
 2 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/hw/riscv/sifive_u.c b/hw/riscv/sifive_u.c
index b6ddf5d..503db4b 100644
--- a/hw/riscv/sifive_u.c
+++ b/hw/riscv/sifive_u.c
@@ -3,6 +3,7 @@
  *
  * Copyright (c) 2016-2017 Sagar Karandikar, sag...@eecs.berkeley.edu
  * Copyright (c) 2017 SiFive, Inc.
+ * Copyright (c) 2019 Bin Meng 
  *
  * Provides a board compatible with the SiFive Freedom U SDK:
  *
@@ -11,6 +12,7 @@
  * 2) PLIC (Platform Level Interrupt Controller)
  * 3) PRCI (Power, Reset, Clock, Interrupt)
  * 4) OTP (One-Time Programmable) memory with stored serial number
+ * 5) GEM (Gigabit Ethernet Controller) and management block
  *
  * This board currently generates devicetree dynamically that indicates at least
  * two harts and up to five harts.
@@ -39,6 +41,7 @@
 #include "hw/sysbus.h"
 #include "hw/char/serial.h"
 #include "hw/cpu/cluster.h"
+#include "hw/misc/unimp.h"
 #include "target/riscv/cpu.h"
 #include "hw/riscv/riscv_hart.h"
 #include "hw/riscv/sifive_plic.h"
@@ -47,6 +50,7 @@
 #include "hw/riscv/sifive_u.h"
 #include "hw/riscv/boot.h"
 #include "chardev/char.h"
+#include "net/eth.h"
 #include "sysemu/arch_init.h"
 #include "sysemu/device_tree.h"
 #include "exec/address-spaces.h"
@@ -68,7 +72,8 @@ static const struct MemmapEntry {
 [SIFIVE_U_UART1] ={ 0x10011000, 0x1000 },
 [SIFIVE_U_OTP] =  { 0x10070000, 0x1000 },
 [SIFIVE_U_DRAM] = { 0x80000000, 0x0 },
-[SIFIVE_U_GEM] =  { 0x100900FC, 0x2000 },
+[SIFIVE_U_GEM] =  { 0x10090000, 0x2000 },
+[SIFIVE_U_GEM_MGMT] = { 0x100a0000, 0x1000 },
 };
 
 #define OTP_SERIAL  1
@@ -85,7 +90,7 @@ static void create_fdt(SiFiveUState *s, const struct MemmapEntry *memmap,
 char ethclk_names[] = "pclk\0hclk";
 uint32_t plic_phandle, prci_phandle, ethclk_phandle, phandle = 1;
 uint32_t uartclk_phandle;
-uint32_t hfclk_phandle, rtcclk_phandle;
+uint32_t hfclk_phandle, rtcclk_phandle, phy_phandle;
 
 fdt = s->fdt = create_device_tree(&s->fdt_size);
 if (!fdt) {
@@ -255,21 +260,28 @@ static void create_fdt(SiFiveUState *s, const struct MemmapEntry *memmap,
 ethclk_phandle = qemu_fdt_get_phandle(fdt, nodename);
 g_free(nodename);
 
+phy_phandle = phandle++;
 nodename = g_strdup_printf("/soc/ethernet@%lx",
 (long)memmap[SIFIVE_U_GEM].base);
 qemu_fdt_add_subnode(fdt, nodename);
-qemu_fdt_setprop_string(fdt, nodename, "compatible", "cdns,macb");
+qemu_fdt_setprop_string(fdt, nodename, "compatible",
+"sifive,fu540-c000-gem");
 qemu_fdt_setprop_cells(fdt, nodename, "reg",
 0x0, memmap[SIFIVE_U_GEM].base,
-0x0, memmap[SIFIVE_U_GEM].size);
+0x0, memmap[SIFIVE_U_GEM].size,
+0x0, memmap[SIFIVE_U_GEM_MGMT].base,
+0x0, memmap[SIFIVE_U_GEM_MGMT].size);
 qemu_fdt_setprop_string(fdt, nodename, "reg-names", "control");
 qemu_fdt_setprop_string(fdt, nodename, "phy-mode", "gmii");
+qemu_fdt_setprop_cell(fdt, nodename, "phy-handle", phy_phandle);
 qemu_fdt_setprop_cell(fdt, nodename, "interrupt-parent", plic_phandle);
 qemu_fdt_setprop_cell(fdt, nodename, "interrupts", SIFIVE_U_GEM_IRQ);
 qemu_fdt_setprop_cells(fdt, nodename, "clocks",
 prci_phandle, PRCI_CLK_GEMGXLPLL, prci_phandle, PRCI_CLK_GEMGXLPLL);
 qemu_fdt_setprop(fdt, nodename, "clock-names", ethclk_names,
 sizeof(ethclk_names));
+qemu_fdt_setprop(fdt, nodename, "local-mac-address",
+s->soc.gem.conf.macaddr.a, ETH_ALEN);
 qemu_fdt_setprop_cell(fdt, nodename, "#address-cells", 1);
 qemu_fdt_setprop_cell(fdt, nodename, "#size-cells", 0);
 g_free(nodename);
@@ -277,6 +289,7 @@ static void create_fdt(SiFiveUState *s, const struct MemmapEntry *memmap,
 nodename = g_strdup_printf("/soc/ethernet@%lx/ethernet-phy@0",

[Qemu-devel] [PATCH v5 13/30] riscv: Add a sifive_cpu.h to include both E and U cpu type defines

2019-08-22 Thread Bin Meng

Group SiFive E and U cpu type defines into one header file.

Signed-off-by: Bin Meng 
Reviewed-by: Alistair Francis 
Reviewed-by: Philippe Mathieu-Daudé 
---

Changes in v5: None
Changes in v4: None
Changes in v3: None
Changes in v2: None

 include/hw/riscv/sifive_cpu.h | 31 +++
 include/hw/riscv/sifive_e.h   |  7 +--
 include/hw/riscv/sifive_u.h   |  7 +--
 3 files changed, 33 insertions(+), 12 deletions(-)
 create mode 100644 include/hw/riscv/sifive_cpu.h

diff --git a/include/hw/riscv/sifive_cpu.h b/include/hw/riscv/sifive_cpu.h
new file mode 100644
index 0000000..1367996
--- /dev/null
+++ b/include/hw/riscv/sifive_cpu.h
@@ -0,0 +1,31 @@
+/*
+ * SiFive CPU types
+ *
+ * Copyright (c) 2017 SiFive, Inc.
+ * Copyright (c) 2019 Bin Meng 
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see .
+ */
+
+#ifndef HW_SIFIVE_CPU_H
+#define HW_SIFIVE_CPU_H
+
+#if defined(TARGET_RISCV32)
+#define SIFIVE_E_CPU TYPE_RISCV_CPU_SIFIVE_E31
+#define SIFIVE_U_CPU TYPE_RISCV_CPU_SIFIVE_U34
+#elif defined(TARGET_RISCV64)
+#define SIFIVE_E_CPU TYPE_RISCV_CPU_SIFIVE_E51
+#define SIFIVE_U_CPU TYPE_RISCV_CPU_SIFIVE_U54
+#endif
+
+#endif /* HW_SIFIVE_CPU_H */
diff --git a/include/hw/riscv/sifive_e.h b/include/hw/riscv/sifive_e.h
index d175b24..e17cdfd 100644
--- a/include/hw/riscv/sifive_e.h
+++ b/include/hw/riscv/sifive_e.h
@@ -19,6 +19,7 @@
 #ifndef HW_SIFIVE_E_H
 #define HW_SIFIVE_E_H
 
+#include "hw/riscv/sifive_cpu.h"
 #include "hw/riscv/sifive_gpio.h"
 
 #define TYPE_RISCV_E_SOC "riscv.sifive.e.soc"
@@ -83,10 +84,4 @@ enum {
 #define SIFIVE_E_PLIC_CONTEXT_BASE 0x200000
 #define SIFIVE_E_PLIC_CONTEXT_STRIDE 0x1000
 
-#if defined(TARGET_RISCV32)
-#define SIFIVE_E_CPU TYPE_RISCV_CPU_SIFIVE_E31
-#elif defined(TARGET_RISCV64)
-#define SIFIVE_E_CPU TYPE_RISCV_CPU_SIFIVE_E51
-#endif
-
 #endif
diff --git a/include/hw/riscv/sifive_u.h b/include/hw/riscv/sifive_u.h
index 892f0ee..4abc621 100644
--- a/include/hw/riscv/sifive_u.h
+++ b/include/hw/riscv/sifive_u.h
@@ -20,6 +20,7 @@
 #define HW_SIFIVE_U_H
 
 #include "hw/net/cadence_gem.h"
+#include "hw/riscv/sifive_cpu.h"
 
 #define TYPE_RISCV_U_SOC "riscv.sifive.u.soc"
 #define RISCV_U_SOC(obj) \
@@ -77,10 +78,4 @@ enum {
 #define SIFIVE_U_PLIC_CONTEXT_BASE 0x200000
 #define SIFIVE_U_PLIC_CONTEXT_STRIDE 0x1000
 
-#if defined(TARGET_RISCV32)
-#define SIFIVE_U_CPU TYPE_RISCV_CPU_SIFIVE_U34
-#elif defined(TARGET_RISCV64)
-#define SIFIVE_U_CPU TYPE_RISCV_CPU_SIFIVE_U54
-#endif
-
 #endif
-- 
2.7.4

[Qemu-devel] [PATCH v5 15/30] riscv: hart: Add a "hartid-base" property to RISC-V hart array

2019-08-22 Thread Bin Meng

At present, each hart's hartid in a RISC-V hart array is assigned
the same value as its index in the hart array. But for a system
that has multiple hart arrays, this is no longer the case.

Add a new "hartid-base" property so that hartid numbers can be
assigned relative to the property value.
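With the new property, the mhartid of each hart becomes hartid-base plus its index in the array, matching the `hartid + n` argument that riscv_harts_realize() passes to riscv_hart_realize() in this patch. A minimal sketch of that arithmetic follows; `struct hart_array` and `hart_mhartid()` are hypothetical stand-ins for RISCVHartArrayState and the realize loop, and the usage assumes an FU540-like split of one E51 array (base 0) and one four-hart U54 array (base 1).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a hart array: only the fields that matter. */
struct hart_array {
    uint32_t num_harts;
    uint32_t hartid_base;
};

/* mhartid of the idx-th hart in the array: base plus local index. */
static uint32_t hart_mhartid(const struct hart_array *a, uint32_t idx)
{
    assert(idx < a->num_harts);
    return a->hartid_base + idx;
}
```

Giving the second array a base equal to the size of the first keeps hartids unique across the whole machine without either array knowing about the other.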

Signed-off-by: Bin Meng 

---

Changes in v5: None
Changes in v4:
- new patch to add a "hartid-base" property to RISC-V hart array

Changes in v3: None
Changes in v2: None

 hw/riscv/riscv_hart.c | 8 +---
 include/hw/riscv/riscv_hart.h | 1 +
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/hw/riscv/riscv_hart.c b/hw/riscv/riscv_hart.c
index 9deef869..52ab86a 100644
--- a/hw/riscv/riscv_hart.c
+++ b/hw/riscv/riscv_hart.c
@@ -27,6 +27,7 @@
 
 static Property riscv_harts_props[] = {
 DEFINE_PROP_UINT32("num-harts", RISCVHartArrayState, num_harts, 1),
+DEFINE_PROP_UINT32("hartid-base", RISCVHartArrayState, hartid_base, 0),
 DEFINE_PROP_STRING("cpu-type", RISCVHartArrayState, cpu_type),
 DEFINE_PROP_END_OF_LIST(),
 };
@@ -37,7 +38,7 @@ static void riscv_harts_cpu_reset(void *opaque)
 cpu_reset(CPU(cpu));
 }
 
-static void riscv_hart_realize(RISCVHartArrayState *s, int idx,
+static void riscv_hart_realize(RISCVHartArrayState *s, int idx, uint32_t hartid,
 char *cpu_type, Error **errp)
 {
 Error *err = NULL;
@@ -45,7 +46,7 @@ static void riscv_hart_realize(RISCVHartArrayState *s, int idx,
 object_initialize_child(OBJECT(s), "harts[*]", &s->harts[idx],
 sizeof(RISCVCPU), cpu_type,
 &error_abort, NULL);
-s->harts[idx].env.mhartid = idx;
+s->harts[idx].env.mhartid = hartid;
 qemu_register_reset(riscv_harts_cpu_reset, &s->harts[idx]);
 object_property_set_bool(OBJECT(&s->harts[idx]), true,
  "realized", &err);
@@ -58,12 +59,13 @@ static void riscv_hart_realize(RISCVHartArrayState *s, int idx,
 static void riscv_harts_realize(DeviceState *dev, Error **errp)
 {
 RISCVHartArrayState *s = RISCV_HART_ARRAY(dev);
+uint32_t hartid = s->hartid_base;
 int n;
 
 s->harts = g_new0(RISCVCPU, s->num_harts);
 
 for (n = 0; n < s->num_harts; n++) {
-riscv_hart_realize(s, n, s->cpu_type, errp);
+riscv_hart_realize(s, n, hartid + n, s->cpu_type, errp);
 }
 }
 
diff --git a/include/hw/riscv/riscv_hart.h b/include/hw/riscv/riscv_hart.h
index 0671d88..1984e30 100644
--- a/include/hw/riscv/riscv_hart.h
+++ b/include/hw/riscv/riscv_hart.h
@@ -32,6 +32,7 @@ typedef struct RISCVHartArrayState {
 
 /*< public >*/
 uint32_t num_harts;
+uint32_t hartid_base;
 char *cpu_type;
 RISCVCPU *harts;
 } RISCVHartArrayState;
-- 
2.7.4

[Qemu-devel] [PATCH v5 16/30] riscv: sifive_u: Update hart configuration to reflect the real FU540 SoC

2019-08-22 Thread Bin Meng

The FU540-C000 includes a 64-bit E51 RISC-V core and four 64-bit U54
RISC-V cores. Currently the sifive_u machine only populates four U54
cores. Update the maximum CPU count to 5 to reflect the real hardware,
creating two CPU clusters as containers for the RISC-V hart arrays so
that heterogeneous harts can be populated.

The cpu nodes in the generated DTS have been updated as well.
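In create_fdt() the global CPU number used for /cpus/cpu@N now has to be translated into a hart inside one of the two arrays: CPU 0 comes from e_cpus.harts[0] and gets no "mmu-type" property, while CPU N > 0 comes from u_cpus.harts[N - 1]. A minimal sketch of that mapping, with hypothetical type and function names and assuming exactly one E51 hart in the first cluster:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical identifiers for the two clusters. */
enum cluster { CLUSTER_E = 0, CLUSTER_U = 1 };

struct hart_ref {
    enum cluster cluster; /* which hart array the CPU lives in */
    int index;            /* index inside that array */
    bool has_mmu;         /* only the U54 harts get an "mmu-type" */
};

/*
 * Map a global CPU number to its hart array, assuming CPU 0 is the single
 * E51 management hart and CPUs 1..N-1 are the U54 compute harts.
 */
static struct hart_ref map_cpu(int cpu)
{
    struct hart_ref r;

    assert(cpu >= 0);
    if (cpu == 0) {
        r.cluster = CLUSTER_E;
        r.index = 0;
        r.has_mmu = false;
    } else {
        r.cluster = CLUSTER_U;
        r.index = cpu - 1;
        r.has_mmu = true;
    }
    return r;
}
```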

Signed-off-by: Bin Meng 
Reviewed-by: Alistair Francis 

---

Changes in v5: None
Changes in v4:
- changed to create clusters for each cpu type

Changes in v3:
- changed to use macros for management and compute cpu count

Changes in v2:
- fixed the "interrupts-extended" property size

 hw/riscv/sifive_u.c | 102 +---
 include/hw/riscv/sifive_u.h |   8 +++-
 2 files changed, 84 insertions(+), 26 deletions(-)

diff --git a/hw/riscv/sifive_u.c b/hw/riscv/sifive_u.c
index 3f58f61..0e5bbe7 100644
--- a/hw/riscv/sifive_u.c
+++ b/hw/riscv/sifive_u.c
@@ -10,7 +10,8 @@
  * 1) CLINT (Core Level Interruptor)
  * 2) PLIC (Platform Level Interrupt Controller)
  *
- * This board currently uses a hardcoded devicetree that indicates one hart.
+ * This board currently generates devicetree dynamically that indicates at most
+ * five harts.
  *
  * This program is free software; you can redistribute it and/or modify it
  * under the terms and conditions of the GNU General Public License,
@@ -26,6 +27,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/cutils.h"
 #include "qemu/log.h"
 #include "qemu/error-report.h"
 #include "qapi/error.h"
@@ -34,6 +36,7 @@
 #include "hw/loader.h"
 #include "hw/sysbus.h"
 #include "hw/char/serial.h"
+#include "hw/cpu/cluster.h"
 #include "target/riscv/cpu.h"
 #include "hw/riscv/riscv_hart.h"
 #include "hw/riscv/sifive_plic.h"
@@ -69,6 +72,7 @@ static const struct MemmapEntry {
 static void create_fdt(SiFiveUState *s, const struct MemmapEntry *memmap,
 uint64_t mem_size, const char *cmdline)
 {
+MachineState *ms = MACHINE(qdev_get_machine());
 void *fdt;
 int cpu;
 uint32_t *cells;
@@ -109,15 +113,21 @@ static void create_fdt(SiFiveUState *s, const struct MemmapEntry *memmap,
 qemu_fdt_setprop_cell(fdt, "/cpus", "#size-cells", 0x0);
 qemu_fdt_setprop_cell(fdt, "/cpus", "#address-cells", 0x1);
 
-for (cpu = s->soc.cpus.num_harts - 1; cpu >= 0; cpu--) {
+for (cpu = ms->smp.cpus - 1; cpu >= 0; cpu--) {
 int cpu_phandle = phandle++;
 nodename = g_strdup_printf("/cpus/cpu@%d", cpu);
 char *intc = g_strdup_printf("/cpus/cpu@%d/interrupt-controller", cpu);
-char *isa = riscv_isa_string(&s->soc.cpus.harts[cpu]);
+char *isa;
 qemu_fdt_add_subnode(fdt, nodename);
 qemu_fdt_setprop_cell(fdt, nodename, "clock-frequency",
   SIFIVE_U_CLOCK_FREQ);
-qemu_fdt_setprop_string(fdt, nodename, "mmu-type", "riscv,sv48");
+/* cpu 0 is the management hart that does not have mmu */
+if (cpu != 0) {
+qemu_fdt_setprop_string(fdt, nodename, "mmu-type", "riscv,sv48");
+isa = riscv_isa_string(&s->soc.u_cpus.harts[cpu - 1]);
+} else {
+isa = riscv_isa_string(&s->soc.e_cpus.harts[0]);
+}
 qemu_fdt_setprop_string(fdt, nodename, "riscv,isa", isa);
 qemu_fdt_setprop_string(fdt, nodename, "compatible", "riscv");
 qemu_fdt_setprop_string(fdt, nodename, "status", "okay");
@@ -133,8 +143,8 @@ static void create_fdt(SiFiveUState *s, const struct MemmapEntry *memmap,
 g_free(nodename);
 }
 
-cells =  g_new0(uint32_t, s->soc.cpus.num_harts * 4);
-for (cpu = 0; cpu < s->soc.cpus.num_harts; cpu++) {
+cells =  g_new0(uint32_t, ms->smp.cpus * 4);
+for (cpu = 0; cpu < ms->smp.cpus; cpu++) {
 nodename =
 g_strdup_printf("/cpus/cpu@%d/interrupt-controller", cpu);
 uint32_t intc_phandle = qemu_fdt_get_phandle(fdt, nodename);
@@ -152,20 +162,26 @@ static void create_fdt(SiFiveUState *s, const struct MemmapEntry *memmap,
 0x0, memmap[SIFIVE_U_CLINT].base,
 0x0, memmap[SIFIVE_U_CLINT].size);
 qemu_fdt_setprop(fdt, nodename, "interrupts-extended",
-cells, s->soc.cpus.num_harts * sizeof(uint32_t) * 4);
+cells, ms->smp.cpus * sizeof(uint32_t) * 4);
 g_free(cells);
 g_free(nodename);
 
 plic_phandle = phandle++;
-cells =  g_new0(uint32_t, s->soc.cpus.num_harts * 4);
-for (cpu = 0; cpu < s->soc.cpus.num_harts; cpu++) {
+cells =  g_new0(uint32_t, ms->smp.cpus * 4 - 2);
+for (cpu = 0; cpu < ms->smp.cpus; cpu++) {
 nodename =
 g_strdup_printf("/cpus/cpu@%d/interrupt-controller", cpu);
 uint32_t intc_phandle = qemu_fdt_get_phandle(fdt, nodename);
-cells[cpu * 4 + 0] = cpu_to_be32(intc_phandle);
-cells[cpu * 4 + 1] = cpu_to_be32(IRQ_M_EXT);
-cells[cpu * 4 + 2] = cpu_to_be32(intc_phandle);
-cells[cpu * 4 + 3] = cpu_to_be32(IRQ_S_EXT);
+/* cpu 0 is the

[Qemu-devel] [PATCH v5 11/30] riscv: sifive_e: prci: Update the PRCI register block size

2019-08-22 Thread Bin Meng

Currently the PRCI register block size is set to 0x8000, but 0x1000
is sufficient, which also matches what the manual says.

Signed-off-by: Bin Meng 
Reviewed-by: Chih-Min Chao 
Reviewed-by: Alistair Francis 
---

Changes in v5: None
Changes in v4: None
Changes in v3: None
Changes in v2: None

 hw/riscv/sifive_e_prci.c | 2 +-
 include/hw/riscv/sifive_e_prci.h | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/riscv/sifive_e_prci.c b/hw/riscv/sifive_e_prci.c
index 71de089..ad6c624 100644
--- a/hw/riscv/sifive_e_prci.c
+++ b/hw/riscv/sifive_e_prci.c
@@ -86,7 +86,7 @@ static void sifive_e_prci_init(Object *obj)
 SiFiveEPRCIState *s = SIFIVE_E_PRCI(obj);
 
 memory_region_init_io(&s->mmio, obj, &sifive_e_prci_ops, s,
-  TYPE_SIFIVE_E_PRCI, 0x8000);
+  TYPE_SIFIVE_E_PRCI, SIFIVE_E_PRCI_REG_SIZE);
 sysbus_init_mmio(SYS_BUS_DEVICE(obj), &s->mmio);
 
 s->hfrosccfg = (SIFIVE_E_PRCI_HFROSCCFG_RDY | SIFIVE_E_PRCI_HFROSCCFG_EN);
diff --git a/include/hw/riscv/sifive_e_prci.h b/include/hw/riscv/sifive_e_prci.h
index c4b76aa..698b0b4 100644
--- a/include/hw/riscv/sifive_e_prci.h
+++ b/include/hw/riscv/sifive_e_prci.h
@@ -47,6 +47,8 @@ enum {
 SIFIVE_E_PRCI_PLLOUTDIV_DIV1 = (1 << 8)
 };
 
+#define SIFIVE_E_PRCI_REG_SIZE  0x1000
+
 #define TYPE_SIFIVE_E_PRCI  "riscv.sifive.e.prci"
 
 #define SIFIVE_E_PRCI(obj) \
-- 
2.7.4

[Qemu-devel] [PATCH v5 18/30] riscv: sifive_u: Update PLIC hart topology configuration string

2019-08-22 Thread Bin Meng

With the heterogeneous hart configuration, the PLIC hart topology
configuration string is "M,MS,..." because hart #0 is the monitor
(M-mode only) hart.
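The resulting string has one comma-separated field per hart: "M" for hart 0 and the per-hart config for every other hart. The patch builds it with strncat(); the sketch below swaps in a different technique, appending with an explicit capacity check, to show the expected layout. It assumes the per-hart config for the U54 harts is "MS" (as the "M,MS,..." example implies), and `append()` and `build_plic_hart_config()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Per-hart PLIC config for the U54 compute harts; assumed to be "MS". */
#define U_HART_CONFIG "MS"

/* Append s to buf if it fits (including the NUL); return -1 otherwise. */
static int append(char *buf, size_t len, size_t *used, const char *s)
{
    size_t n = strlen(s);

    if (*used + n + 1 > len) {
        return -1;
    }
    memcpy(buf + *used, s, n + 1);  /* copies the terminating NUL too */
    *used += n;
    return 0;
}

/*
 * Build e.g. "M,MS,MS,MS,MS": hart 0 (the E51 monitor hart) only has an
 * M-mode context, every other hart gets U_HART_CONFIG.
 * Returns 0 on success, -1 if buf is too small.
 */
static int build_plic_hart_config(int ncpus, char *buf, size_t len)
{
    size_t used = 0;

    assert(ncpus > 0 && len > 0);
    buf[0] = '\0';
    for (int i = 0; i < ncpus; i++) {
        if (i != 0) {
            if (append(buf, len, &used, ",") ||
                append(buf, len, &used, U_HART_CONFIG)) {
                return -1;
            }
        } else if (append(buf, len, &used, "M")) {
            return -1;
        }
    }
    return 0;
}
```

For five CPUs this produces "M,MS,MS,MS,MS", i.e. 13 characters plus the terminating NUL.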

Suggested-by: Fabien Chouteau 
Signed-off-by: Bin Meng 
Reviewed-by: Alistair Francis 
---

Changes in v5: None
Changes in v4: None
Changes in v3: None
Changes in v2: None

 hw/riscv/sifive_u.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/hw/riscv/sifive_u.c b/hw/riscv/sifive_u.c
index a36cd77..284f7a5 100644
--- a/hw/riscv/sifive_u.c
+++ b/hw/riscv/sifive_u.c
@@ -433,10 +433,11 @@ static void riscv_sifive_u_soc_realize(DeviceState *dev, Error **errp)
 plic_hart_config = g_malloc0(plic_hart_config_len);
 for (i = 0; i < ms->smp.cpus; i++) {
 if (i != 0) {
-strncat(plic_hart_config, ",", plic_hart_config_len);
+strncat(plic_hart_config, "," SIFIVE_U_PLIC_HART_CONFIG,
+plic_hart_config_len);
+} else {
+strncat(plic_hart_config, "M", plic_hart_config_len);
 }
-strncat(plic_hart_config, SIFIVE_U_PLIC_HART_CONFIG,
-plic_hart_config_len);
 plic_hart_config_len -= (strlen(SIFIVE_U_PLIC_HART_CONFIG) + 1);
 }
 
-- 
2.7.4

[Qemu-devel] [PATCH v5 23/30] riscv: sifive_u: Update UART base addresses and IRQs

2019-08-22 Thread Bin Meng

This updates the UART base addresses and IRQs to match the hardware.

Signed-off-by: Bin Meng 
Reviewed-by: Jonathan Behrens 
Acked-by: Alistair Francis 
Reviewed-by: Chih-Min Chao 

---

Changes in v5: None
Changes in v4: None
Changes in v3:
- update IRQ numbers of both UARTs to match hardware as well

Changes in v2: None

 hw/riscv/sifive_u.c | 4 ++--
 include/hw/riscv/sifive_u.h | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/riscv/sifive_u.c b/hw/riscv/sifive_u.c
index e0842ad..1a178dc 100644
--- a/hw/riscv/sifive_u.c
+++ b/hw/riscv/sifive_u.c
@@ -63,8 +63,8 @@ static const struct MemmapEntry {
 [SIFIVE_U_CLINT] ={  0x2000000, 0x10000 },
 [SIFIVE_U_PLIC] = {  0xc000000, 0x4000000 },
 [SIFIVE_U_PRCI] = { 0x10000000, 0x1000 },
-[SIFIVE_U_UART0] ={ 0x10013000, 0x1000 },
-[SIFIVE_U_UART1] ={ 0x10023000, 0x1000 },
+[SIFIVE_U_UART0] ={ 0x10010000, 0x1000 },
+[SIFIVE_U_UART1] ={ 0x10011000, 0x1000 },
 [SIFIVE_U_DRAM] = { 0x80000000, 0x0 },
 [SIFIVE_U_GEM] =  { 0x100900FC, 0x2000 },
 };
diff --git a/include/hw/riscv/sifive_u.h b/include/hw/riscv/sifive_u.h
index 2a023be..b41e730 100644
--- a/include/hw/riscv/sifive_u.h
+++ b/include/hw/riscv/sifive_u.h
@@ -64,8 +64,8 @@ enum {
 };
 
 enum {
-SIFIVE_U_UART0_IRQ = 3,
-SIFIVE_U_UART1_IRQ = 4,
+SIFIVE_U_UART0_IRQ = 4,
+SIFIVE_U_UART1_IRQ = 5,
 SIFIVE_U_GEM_IRQ = 0x35
 };
 
-- 
2.7.4

[Qemu-devel] [PATCH v5 10/30] riscv: sifive_e: prci: Fix a typo of hfxosccfg register programming

2019-08-22 Thread Bin Meng

For hfxosccfg register programming, SIFIVE_E_PRCI_HFXOSCCFG_RDY and
SIFIVE_E_PRCI_HFXOSCCFG_EN should be used, not the HFROSCCFG bits.

Signed-off-by: Bin Meng 
Acked-by: Alistair Francis 
Reviewed-by: Chih-Min Chao 
Reviewed-by: Philippe Mathieu-Daudé 
---

Changes in v5: None
Changes in v4: None
Changes in v3: None
Changes in v2: None

 hw/riscv/sifive_e_prci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/riscv/sifive_e_prci.c b/hw/riscv/sifive_e_prci.c
index c514032..71de089 100644
--- a/hw/riscv/sifive_e_prci.c
+++ b/hw/riscv/sifive_e_prci.c
@@ -90,7 +90,7 @@ static void sifive_e_prci_init(Object *obj)
 sysbus_init_mmio(SYS_BUS_DEVICE(obj), &s->mmio);
 
 s->hfrosccfg = (SIFIVE_E_PRCI_HFROSCCFG_RDY | SIFIVE_E_PRCI_HFROSCCFG_EN);
-s->hfxosccfg = (SIFIVE_E_PRCI_HFROSCCFG_RDY | SIFIVE_E_PRCI_HFROSCCFG_EN);
+s->hfxosccfg = (SIFIVE_E_PRCI_HFXOSCCFG_RDY | SIFIVE_E_PRCI_HFXOSCCFG_EN);
 s->pllcfg = (SIFIVE_E_PRCI_PLLCFG_REFSEL | SIFIVE_E_PRCI_PLLCFG_BYPASS |
  SIFIVE_E_PRCI_PLLCFG_LOCK);
 s->plloutdiv = SIFIVE_E_PRCI_PLLOUTDIV_DIV1;
-- 
2.7.4

[Qemu-devel] [PATCH v5 05/30] riscv: hw: Change to use qemu_log_mask(LOG_GUEST_ERROR, ...) instead

2019-08-22 Thread Bin Meng

Replace the calls to hw_error() with qemu_log_mask(LOG_GUEST_ERROR, ...)
in various SiFive models.
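hw_error() prints a message and aborts QEMU, so a guest touching an unimplemented register could take down the whole emulator. qemu_log_mask(LOG_GUEST_ERROR, ...) only prints when that log category is enabled (for example with `-d guest_errors`) and then lets the access complete. The standalone sketch below mimics that behaviour; `demo_log_mask()`, `demo_dev_read()` and the `DEMO_LOG_GUEST_ERROR` bit are hypothetical stand-ins, not QEMU's implementation.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical log category bit, standing in for LOG_GUEST_ERROR. */
#define DEMO_LOG_GUEST_ERROR (1u << 0)

static unsigned demo_log_enabled;   /* categories switched on at startup */
static unsigned demo_log_count;     /* how many messages were printed */

/* Print only when the category is enabled; never stops the emulator. */
static void demo_log_mask(unsigned mask, const char *fmt, ...)
{
    va_list ap;

    assert(fmt != NULL);
    if (!(demo_log_enabled & mask)) {
        return;
    }
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    demo_log_count++;
}

/* Read handler: an unknown offset is reported and reads as zero. */
static uint64_t demo_dev_read(uint64_t addr)
{
    if (addr == 0x0) {
        return 0x1234;
    }
    demo_log_mask(DEMO_LOG_GUEST_ERROR, "%s: read: addr=0x%x\n",
                  __func__, (unsigned)addr);
    return 0;
}
```

Either way the bad access returns 0 and emulation continues; the category mask only decides whether a diagnostic is printed.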

Signed-off-by: Bin Meng 

---

Changes in v5:
- new patch to change to use qemu_log_mask(LOG_GUEST_ERROR,...) instead
  in various sifive models

Changes in v4: None
Changes in v3: None
Changes in v2: None

 hw/riscv/sifive_prci.c | 8 +---
 hw/riscv/sifive_test.c | 5 +++--
 hw/riscv/sifive_uart.c | 9 +
 3 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/hw/riscv/sifive_prci.c b/hw/riscv/sifive_prci.c
index f406682..1ab98d4 100644
--- a/hw/riscv/sifive_prci.c
+++ b/hw/riscv/sifive_prci.c
@@ -20,6 +20,7 @@
 
 #include "qemu/osdep.h"
 #include "hw/sysbus.h"
+#include "qemu/log.h"
 #include "qemu/module.h"
 #include "target/riscv/cpu.h"
 #include "hw/riscv/sifive_prci.h"
@@ -37,7 +38,8 @@ static uint64_t sifive_prci_read(void *opaque, hwaddr addr, unsigned int size)
 case SIFIVE_PRCI_PLLOUTDIV:
 return s->plloutdiv;
 }
-hw_error("%s: read: addr=0x%x\n", __func__, (int)addr);
+qemu_log_mask(LOG_GUEST_ERROR, "%s: read: addr=0x%x\n",
+  __func__, (int)addr);
 return 0;
 }
 
@@ -65,8 +67,8 @@ static void sifive_prci_write(void *opaque, hwaddr addr,
 s->plloutdiv = (uint32_t) val64;
 break;
 default:
-hw_error("%s: bad write: addr=0x%x v=0x%x\n",
- __func__, (int)addr, (int)val64);
+qemu_log_mask(LOG_GUEST_ERROR, "%s: bad write: addr=0x%x v=0x%x\n",
+  __func__, (int)addr, (int)val64);
 }
 }
 
diff --git a/hw/riscv/sifive_test.c b/hw/riscv/sifive_test.c
index cd86831..655a3d7 100644
--- a/hw/riscv/sifive_test.c
+++ b/hw/riscv/sifive_test.c
@@ -20,6 +20,7 @@
 
 #include "qemu/osdep.h"
 #include "hw/sysbus.h"
+#include "qemu/log.h"
 #include "qemu/module.h"
 #include "sysemu/sysemu.h"
 #include "target/riscv/cpu.h"
@@ -48,8 +49,8 @@ static void sifive_test_write(void *opaque, hwaddr addr,
 break;
 }
 }
-hw_error("%s: write: addr=0x%x val=0x%016" PRIx64 "\n",
-__func__, (int)addr, val64);
+qemu_log_mask(LOG_GUEST_ERROR, "%s: write: addr=0x%x val=0x%016" PRIx64 "\n",
+  __func__, (int)addr, val64);
 }
 
 static const MemoryRegionOps sifive_test_ops = {
diff --git a/hw/riscv/sifive_uart.c b/hw/riscv/sifive_uart.c
index 3b3f94f..cd74043 100644
--- a/hw/riscv/sifive_uart.c
+++ b/hw/riscv/sifive_uart.c
@@ -18,6 +18,7 @@
 
 #include "qemu/osdep.h"
 #include "qapi/error.h"
+#include "qemu/log.h"
 #include "hw/sysbus.h"
 #include "chardev/char.h"
 #include "chardev/char-fe.h"
@@ -93,8 +94,8 @@ uart_read(void *opaque, hwaddr addr, unsigned int size)
 return s->div;
 }
 
-hw_error("%s: bad read: addr=0x%x\n",
-__func__, (int)addr);
+qemu_log_mask(LOG_GUEST_ERROR, "%s: bad read: addr=0x%x\n",
+  __func__, (int)addr);
 return 0;
 }
 
@@ -125,8 +126,8 @@ uart_write(void *opaque, hwaddr addr,
 s->div = val64;
 return;
 }
-hw_error("%s: bad write: addr=0x%x v=0x%x\n",
-__func__, (int)addr, (int)value);
+qemu_log_mask(LOG_GUEST_ERROR, "%s: bad write: addr=0x%x v=0x%x\n",
+  __func__, (int)addr, (int)value);
 }
 
 static const MemoryRegionOps uart_ops = {
-- 
2.7.4

[Qemu-devel] [PATCH v5 09/30] riscv: sifive: Rename sifive_prci.{c, h} to sifive_e_prci.{c, h}

2019-08-22 Thread Bin Meng

The current SiFive PRCI model only works with the sifive_e machine, as it
only emulates the registers of the PRCI block in the FE310 SoC.

Rename the files to make it clear that they are for sifive_e.
This also prefixes all macros, variables and functions with
"sifive_e"/"SIFIVE_E".

Signed-off-by: Bin Meng 
Reviewed-by: Chih-Min Chao 
Reviewed-by: Alistair Francis 

---

Changes in v5: None
Changes in v4:
- prefix all macros/variables/functions with SIFIVE_E/sifive_e
  in the sifive_e_prci driver

Changes in v3: None
Changes in v2: None

 hw/riscv/Makefile.objs  |  2 +-
 hw/riscv/sifive_e.c |  4 +-
 hw/riscv/{sifive_prci.c => sifive_e_prci.c} | 79 ++---
 include/hw/riscv/sifive_e_prci.h| 69 +
 include/hw/riscv/sifive_prci.h  | 69 -
 5 files changed, 111 insertions(+), 112 deletions(-)
 rename hw/riscv/{sifive_prci.c => sifive_e_prci.c} (51%)
 create mode 100644 include/hw/riscv/sifive_e_prci.h
 delete mode 100644 include/hw/riscv/sifive_prci.h

diff --git a/hw/riscv/Makefile.objs b/hw/riscv/Makefile.objs
index eb9d4f9..c859697 100644
--- a/hw/riscv/Makefile.objs
+++ b/hw/riscv/Makefile.objs
@@ -2,9 +2,9 @@ obj-y += boot.o
 obj-$(CONFIG_SPIKE) += riscv_htif.o
 obj-$(CONFIG_HART) += riscv_hart.o
 obj-$(CONFIG_SIFIVE_E) += sifive_e.o
+obj-$(CONFIG_SIFIVE_E) += sifive_e_prci.o
 obj-$(CONFIG_SIFIVE) += sifive_clint.o
 obj-$(CONFIG_SIFIVE) += sifive_gpio.o
-obj-$(CONFIG_SIFIVE) += sifive_prci.o
 obj-$(CONFIG_SIFIVE) += sifive_plic.o
 obj-$(CONFIG_SIFIVE) += sifive_test.o
 obj-$(CONFIG_SIFIVE_U) += sifive_u.o
diff --git a/hw/riscv/sifive_e.c b/hw/riscv/sifive_e.c
index 2a499d8..2d67670 100644
--- a/hw/riscv/sifive_e.c
+++ b/hw/riscv/sifive_e.c
@@ -41,9 +41,9 @@
 #include "hw/riscv/riscv_hart.h"
 #include "hw/riscv/sifive_plic.h"
 #include "hw/riscv/sifive_clint.h"
-#include "hw/riscv/sifive_prci.h"
 #include "hw/riscv/sifive_uart.h"
 #include "hw/riscv/sifive_e.h"
+#include "hw/riscv/sifive_e_prci.h"
 #include "hw/riscv/boot.h"
 #include "chardev/char.h"
 #include "sysemu/arch_init.h"
@@ -174,7 +174,7 @@ static void riscv_sifive_e_soc_realize(DeviceState *dev, Error **errp)
 SIFIVE_SIP_BASE, SIFIVE_TIMECMP_BASE, SIFIVE_TIME_BASE);
 sifive_mmio_emulate(sys_mem, "riscv.sifive.e.aon",
 memmap[SIFIVE_E_AON].base, memmap[SIFIVE_E_AON].size);
-sifive_prci_create(memmap[SIFIVE_E_PRCI].base);
+sifive_e_prci_create(memmap[SIFIVE_E_PRCI].base);
 
 /* GPIO */
 
diff --git a/hw/riscv/sifive_prci.c b/hw/riscv/sifive_e_prci.c
similarity index 51%
rename from hw/riscv/sifive_prci.c
rename to hw/riscv/sifive_e_prci.c
index 1957dcd..c514032 100644
--- a/hw/riscv/sifive_prci.c
+++ b/hw/riscv/sifive_e_prci.c
@@ -1,5 +1,5 @@
 /*
- * QEMU SiFive PRCI (Power, Reset, Clock, Interrupt)
+ * QEMU SiFive E PRCI (Power, Reset, Clock, Interrupt)
  *
  * Copyright (c) 2017 SiFive, Inc.
  *
@@ -22,19 +22,19 @@
 #include "hw/sysbus.h"
 #include "qemu/log.h"
 #include "qemu/module.h"
-#include "hw/riscv/sifive_prci.h"
+#include "hw/riscv/sifive_e_prci.h"
 
-static uint64_t sifive_prci_read(void *opaque, hwaddr addr, unsigned int size)
+static uint64_t sifive_e_prci_read(void *opaque, hwaddr addr, unsigned int size)
 {
-SiFivePRCIState *s = opaque;
+SiFiveEPRCIState *s = opaque;
 switch (addr) {
-case SIFIVE_PRCI_HFROSCCFG:
+case SIFIVE_E_PRCI_HFROSCCFG:
 return s->hfrosccfg;
-case SIFIVE_PRCI_HFXOSCCFG:
+case SIFIVE_E_PRCI_HFXOSCCFG:
 return s->hfxosccfg;
-case SIFIVE_PRCI_PLLCFG:
+case SIFIVE_E_PRCI_PLLCFG:
 return s->pllcfg;
-case SIFIVE_PRCI_PLLOUTDIV:
+case SIFIVE_E_PRCI_PLLOUTDIV:
 return s->plloutdiv;
 }
 qemu_log_mask(LOG_GUEST_ERROR, "%s: read: addr=0x%x\n",
@@ -42,27 +42,27 @@ static uint64_t sifive_prci_read(void *opaque, hwaddr addr, unsigned int size)
 return 0;
 }
 
-static void sifive_prci_write(void *opaque, hwaddr addr,
-   uint64_t val64, unsigned int size)
+static void sifive_e_prci_write(void *opaque, hwaddr addr,
+uint64_t val64, unsigned int size)
 {
-SiFivePRCIState *s = opaque;
+SiFiveEPRCIState *s = opaque;
 switch (addr) {
-case SIFIVE_PRCI_HFROSCCFG:
+case SIFIVE_E_PRCI_HFROSCCFG:
 s->hfrosccfg = (uint32_t) val64;
 /* OSC stays ready */
-s->hfrosccfg |= SIFIVE_PRCI_HFROSCCFG_RDY;
+s->hfrosccfg |= SIFIVE_E_PRCI_HFROSCCFG_RDY;
 break;
-case SIFIVE_PRCI_HFXOSCCFG:
+case SIFIVE_E_PRCI_HFXOSCCFG:
 s->hfxosccfg = (uint32_t) val64;
 /* OSC stays ready */
-s->hfxosccfg |= SIFIVE_PRCI_HFXOSCCFG_RDY;
+s->hfxosccfg |= SIFIVE_E_PRCI_HFXOSCCFG_RDY;
 break;
-case SIFIVE_PRCI_PLLCFG:
+case SIFIVE_E_PRCI_PLLCFG:
 s->pllcfg = (uint32_t) val64;
 /* PLL stays locked */
-s->pllcfg |= SIFIVE_PRCI_PLLCFG_LOCK;
+

[Qemu-devel] [PATCH v5 19/30] riscv: sifive: Implement PRCI model for FU540

2019-08-22 Thread Bin Meng

This adds a simple PRCI model for FU540 (sifive_u). It has a different
register layout from the existing PRCI model for FE310 (sifive_e).

Signed-off-by: Bin Meng 

---

Changes in v5:
- change to use defines instead of enums
- change to use qemu_log_mask(LOG_GUEST_ERROR,...) in sifive_u_prci
- creating a 32-bit val variable and using that instead of casting
  everywhere in sifive_u_prci_write()
- move all register initialization to sifive_u_prci_reset() function
- drop sifive_u_prci_create()
- s/codes that worked/code that works/g

Changes in v4:
- prefix all macros/variables/functions with SIFIVE_U/sifive_u
  in the sifive_u_prci driver

Changes in v3: None
Changes in v2: None

 hw/riscv/Makefile.objs   |   1 +
 hw/riscv/sifive_u_prci.c | 171 +++
 include/hw/riscv/sifive_u_prci.h |  81 +++
 3 files changed, 253 insertions(+)
 create mode 100644 hw/riscv/sifive_u_prci.c
 create mode 100644 include/hw/riscv/sifive_u_prci.h

diff --git a/hw/riscv/Makefile.objs b/hw/riscv/Makefile.objs
index c859697..b95bbd5 100644
--- a/hw/riscv/Makefile.objs
+++ b/hw/riscv/Makefile.objs
@@ -8,6 +8,7 @@ obj-$(CONFIG_SIFIVE) += sifive_gpio.o
 obj-$(CONFIG_SIFIVE) += sifive_plic.o
 obj-$(CONFIG_SIFIVE) += sifive_test.o
 obj-$(CONFIG_SIFIVE_U) += sifive_u.o
+obj-$(CONFIG_SIFIVE_U) += sifive_u_prci.o
 obj-$(CONFIG_SIFIVE) += sifive_uart.o
 obj-$(CONFIG_SPIKE) += spike.o
 obj-$(CONFIG_RISCV_VIRT) += virt.o
diff --git a/hw/riscv/sifive_u_prci.c b/hw/riscv/sifive_u_prci.c
new file mode 100644
index 0000000..c6438fb
--- /dev/null
+++ b/hw/riscv/sifive_u_prci.c
@@ -0,0 +1,171 @@
+/*
+ * QEMU SiFive U PRCI (Power, Reset, Clock, Interrupt)
+ *
+ * Copyright (c) 2019 Bin Meng 
+ *
+ * Simple model of the PRCI to emulate register reads made by the SDK BSP
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see .
+ */
+
+#include "qemu/osdep.h"
+#include "hw/sysbus.h"
+#include "qemu/log.h"
+#include "qemu/module.h"
+#include "hw/riscv/sifive_u_prci.h"
+
+static uint64_t sifive_u_prci_read(void *opaque, hwaddr addr, unsigned int size)
+{
+SiFiveUPRCIState *s = opaque;
+
+switch (addr) {
+case SIFIVE_U_PRCI_HFXOSCCFG:
+return s->hfxosccfg;
+case SIFIVE_U_PRCI_COREPLLCFG0:
+return s->corepllcfg0;
+case SIFIVE_U_PRCI_DDRPLLCFG0:
+return s->ddrpllcfg0;
+case SIFIVE_U_PRCI_DDRPLLCFG1:
+return s->ddrpllcfg1;
+case SIFIVE_U_PRCI_GEMGXLPLLCFG0:
+return s->gemgxlpllcfg0;
+case SIFIVE_U_PRCI_GEMGXLPLLCFG1:
+return s->gemgxlpllcfg1;
+case SIFIVE_U_PRCI_CORECLKSEL:
+return s->coreclksel;
+case SIFIVE_U_PRCI_DEVICESRESET:
+return s->devicesreset;
+case SIFIVE_U_PRCI_CLKMUXSTATUS:
+return s->clkmuxstatus;
+}
+
+qemu_log_mask(LOG_GUEST_ERROR, "%s: read: addr=0x%x\n",
+  __func__, (int)addr);
+
+return 0;
+}
+
+static void sifive_u_prci_write(void *opaque, hwaddr addr,
+uint64_t val64, unsigned int size)
+{
+SiFiveUPRCIState *s = opaque;
+uint32_t val32 = (uint32_t)val64;
+
+switch (addr) {
+case SIFIVE_U_PRCI_HFXOSCCFG:
+s->hfxosccfg = val32;
+/* OSC stays ready */
+s->hfxosccfg |= SIFIVE_U_PRCI_HFXOSCCFG_RDY;
+break;
+case SIFIVE_U_PRCI_COREPLLCFG0:
+s->corepllcfg0 = val32;
+/* internal feedback */
+s->corepllcfg0 |= SIFIVE_U_PRCI_PLLCFG0_FSE;
+/* PLL stays locked */
+s->corepllcfg0 |= SIFIVE_U_PRCI_PLLCFG0_LOCK;
+break;
+case SIFIVE_U_PRCI_DDRPLLCFG0:
+s->ddrpllcfg0 = val32;
+/* internal feedback */
+s->ddrpllcfg0 |= SIFIVE_U_PRCI_PLLCFG0_FSE;
+/* PLL stays locked */
+s->ddrpllcfg0 |= SIFIVE_U_PRCI_PLLCFG0_LOCK;
+break;
+case SIFIVE_U_PRCI_DDRPLLCFG1:
+s->ddrpllcfg1 = val32;
+break;
+case SIFIVE_U_PRCI_GEMGXLPLLCFG0:
+s->gemgxlpllcfg0 = val32;
+/* internal feedback */
+s->gemgxlpllcfg0 |= SIFIVE_U_PRCI_PLLCFG0_FSE;
+/* PLL stays locked */
+s->gemgxlpllcfg0 |= SIFIVE_U_PRCI_PLLCFG0_LOCK;
+break;
+case SIFIVE_U_PRCI_GEMGXLPLLCFG1:
+s->gemgxlpllcfg1 = val32;
+break;
+case SIFIVE_U_PRCI_CORECLKSEL:
+s->coreclksel = val32;
+break;
+case SIFIVE_U_PRCI_DEVICESRESET:
+s->devicesreset = val32;
+

[Qemu-devel] [PATCH v5 06/30] riscv: hw: Remove the unnecessary include of target/riscv/cpu.h

2019-08-22 Thread Bin Meng

The inclusion of "target/riscv/cpu.h" is unnecessary in various
sifive model drivers.

Signed-off-by: Bin Meng 

---

Changes in v5:
- new patch to remove the unnecessary include of target/riscv/cpu.h

Changes in v4: None
Changes in v3: None
Changes in v2: None

 hw/riscv/sifive_prci.c | 1 -
 hw/riscv/sifive_test.c | 1 -
 hw/riscv/sifive_uart.c | 1 -
 3 files changed, 3 deletions(-)

diff --git a/hw/riscv/sifive_prci.c b/hw/riscv/sifive_prci.c
index 1ab98d4..1957dcd 100644
--- a/hw/riscv/sifive_prci.c
+++ b/hw/riscv/sifive_prci.c
@@ -22,7 +22,6 @@
 #include "hw/sysbus.h"
 #include "qemu/log.h"
 #include "qemu/module.h"
-#include "target/riscv/cpu.h"
 #include "hw/riscv/sifive_prci.h"
 
 static uint64_t sifive_prci_read(void *opaque, hwaddr addr, unsigned int size)
diff --git a/hw/riscv/sifive_test.c b/hw/riscv/sifive_test.c
index 655a3d7..31cad9f 100644
--- a/hw/riscv/sifive_test.c
+++ b/hw/riscv/sifive_test.c
@@ -23,7 +23,6 @@
 #include "qemu/log.h"
 #include "qemu/module.h"
 #include "sysemu/sysemu.h"
-#include "target/riscv/cpu.h"
 #include "hw/riscv/sifive_test.h"
 
 static uint64_t sifive_test_read(void *opaque, hwaddr addr, unsigned int size)
diff --git a/hw/riscv/sifive_uart.c b/hw/riscv/sifive_uart.c
index cd74043..1601bd9 100644
--- a/hw/riscv/sifive_uart.c
+++ b/hw/riscv/sifive_uart.c
@@ -22,7 +22,6 @@
 #include "hw/sysbus.h"
 #include "chardev/char.h"
 #include "chardev/char-fe.h"
-#include "target/riscv/cpu.h"
 #include "hw/riscv/sifive_uart.h"
 
 /*
-- 
2.7.4

[Qemu-devel] [PATCH v5 04/30] riscv: hw: Change create_fdt() to return void

2019-08-22 Thread Bin Meng

There is no need to return fdt at the end of create_fdt() because
it's already saved in s->fdt.

Signed-off-by: Bin Meng 
Reviewed-by: Chih-Min Chao 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Alistair Francis 

---

Changes in v5: None
Changes in v4:
- change create_fdt() to return void in sifive_u.c too, after rebasing
  on Palmer's QEMU RISC-V tree

Changes in v3: None
Changes in v2: None

 hw/riscv/sifive_u.c | 11 ---
 hw/riscv/virt.c | 11 ---
 2 files changed, 8 insertions(+), 14 deletions(-)

diff --git a/hw/riscv/sifive_u.c b/hw/riscv/sifive_u.c
index 5fe0033..e22803b 100644
--- a/hw/riscv/sifive_u.c
+++ b/hw/riscv/sifive_u.c
@@ -67,7 +67,7 @@ static const struct MemmapEntry {
 
 #define GEM_REVISION        0x10070109
 
-static void *create_fdt(SiFiveUState *s, const struct MemmapEntry *memmap,
+static void create_fdt(SiFiveUState *s, const struct MemmapEntry *memmap,
 uint64_t mem_size, const char *cmdline)
 {
 void *fdt;
@@ -253,14 +253,11 @@ static void *create_fdt(SiFiveUState *s, const struct MemmapEntry *memmap,
 qemu_fdt_setprop_string(fdt, "/aliases", "serial0", nodename);
 
 g_free(nodename);
-
-return fdt;
 }
 
 static void riscv_sifive_u_init(MachineState *machine)
 {
 const struct MemmapEntry *memmap = sifive_u_memmap;
-void *fdt;
 
 SiFiveUState *s = g_new0(SiFiveUState, 1);
 MemoryRegion *system_memory = get_system_memory();
@@ -281,7 +278,7 @@ static void riscv_sifive_u_init(MachineState *machine)
 main_mem);
 
 /* create device tree */
-fdt = create_fdt(s, memmap, machine->ram_size, machine->kernel_cmdline);
+create_fdt(s, memmap, machine->ram_size, machine->kernel_cmdline);
 
 riscv_find_and_load_firmware(machine, BIOS_FILENAME,
  memmap[SIFIVE_U_DRAM].base);
@@ -294,9 +291,9 @@ static void riscv_sifive_u_init(MachineState *machine)
 hwaddr end = riscv_load_initrd(machine->initrd_filename,
machine->ram_size, kernel_entry,
&start);
-qemu_fdt_setprop_cell(fdt, "/chosen",
+qemu_fdt_setprop_cell(s->fdt, "/chosen",
   "linux,initrd-start", start);
-qemu_fdt_setprop_cell(fdt, "/chosen", "linux,initrd-end",
+qemu_fdt_setprop_cell(s->fdt, "/chosen", "linux,initrd-end",
   end);
 }
 }
diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
index 2f75195..6bfa721 100644
--- a/hw/riscv/virt.c
+++ b/hw/riscv/virt.c
@@ -112,7 +112,7 @@ static void create_pcie_irq_map(void *fdt, char *nodename,
0x1800, 0, 0, 0x7);
 }
 
-static void *create_fdt(RISCVVirtState *s, const struct MemmapEntry *memmap,
+static void create_fdt(RISCVVirtState *s, const struct MemmapEntry *memmap,
 uint64_t mem_size, const char *cmdline)
 {
 void *fdt;
@@ -316,8 +316,6 @@ static void *create_fdt(RISCVVirtState *s, const struct MemmapEntry *memmap,
 qemu_fdt_setprop_string(fdt, "/chosen", "bootargs", cmdline);
 }
 g_free(nodename);
-
-return fdt;
 }
 
 
@@ -373,7 +371,6 @@ static void riscv_virt_board_init(MachineState *machine)
 size_t plic_hart_config_len;
 int i;
 unsigned int smp_cpus = machine->smp.cpus;
-void *fdt;
 
 /* Initialize SOC */
 object_initialize_child(OBJECT(machine), "soc", &s->soc, sizeof(s->soc),
@@ -392,7 +389,7 @@ static void riscv_virt_board_init(MachineState *machine)
 main_mem);
 
 /* create device tree */
-fdt = create_fdt(s, memmap, machine->ram_size, machine->kernel_cmdline);
+create_fdt(s, memmap, machine->ram_size, machine->kernel_cmdline);
 
 /* boot rom */
 memory_region_init_rom(mask_rom, NULL, "riscv_virt_board.mrom",
@@ -411,9 +408,9 @@ static void riscv_virt_board_init(MachineState *machine)
 hwaddr end = riscv_load_initrd(machine->initrd_filename,
machine->ram_size, kernel_entry,
&start);
-qemu_fdt_setprop_cell(fdt, "/chosen",
+qemu_fdt_setprop_cell(s->fdt, "/chosen",
   "linux,initrd-start", start);
-qemu_fdt_setprop_cell(fdt, "/chosen", "linux,initrd-end",
+qemu_fdt_setprop_cell(s->fdt, "/chosen", "linux,initrd-end",
   end);
 }
 }
-- 
2.7.4

[Qemu-devel] [PATCH v5 08/30] riscv: sifive_u: Remove the unnecessary include of prci header

2019-08-22 Thread Bin Meng

sifive_u machine does not use PRCI as of today. Remove the prci
header inclusion.

Signed-off-by: Bin Meng 
Reviewed-by: Alistair Francis 
---

Changes in v5: None
Changes in v4: None
Changes in v3: None
Changes in v2: None

 hw/riscv/sifive_u.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/hw/riscv/sifive_u.c b/hw/riscv/sifive_u.c
index e22803b..3f58f61 100644
--- a/hw/riscv/sifive_u.c
+++ b/hw/riscv/sifive_u.c
@@ -39,7 +39,6 @@
 #include "hw/riscv/sifive_plic.h"
 #include "hw/riscv/sifive_clint.h"
 #include "hw/riscv/sifive_uart.h"
-#include "hw/riscv/sifive_prci.h"
 #include "hw/riscv/sifive_u.h"
 #include "hw/riscv/boot.h"
 #include "chardev/char.h"
-- 
2.7.4

[Qemu-devel] [PATCH v5 17/30] riscv: sifive_u: Set the minimum number of cpus to 2

2019-08-22 Thread Bin Meng

It is not useful if we only have one management CPU.

Signed-off-by: Bin Meng 
Reviewed-by: Alistair Francis 

---

Changes in v5: None
Changes in v4: None
Changes in v3:
- use management cpu count + 1 for the min_cpus

Changes in v2:
- update the file header to indicate at least 2 harts are created

 hw/riscv/sifive_u.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/hw/riscv/sifive_u.c b/hw/riscv/sifive_u.c
index 0e5bbe7..a36cd77 100644
--- a/hw/riscv/sifive_u.c
+++ b/hw/riscv/sifive_u.c
@@ -10,8 +10,8 @@
  * 1) CLINT (Core Level Interruptor)
  * 2) PLIC (Platform Level Interrupt Controller)
  *
- * This board currently generates devicetree dynamically that indicates at most
- * five harts.
+ * This board currently generates devicetree dynamically that indicates at least
+ * two harts and up to five harts.
  *
  * This program is free software; you can redistribute it and/or modify it
  * under the terms and conditions of the GNU General Public License,
@@ -485,6 +485,7 @@ static void riscv_sifive_u_machine_init(MachineClass *mc)
 mc->desc = "RISC-V Board compatible with SiFive U SDK";
 mc->init = riscv_sifive_u_init;
 mc->max_cpus = SIFIVE_U_MANAGEMENT_CPU_COUNT + SIFIVE_U_COMPUTE_CPU_COUNT;
+mc->min_cpus = SIFIVE_U_MANAGEMENT_CPU_COUNT + 1;
 }
 
 DEFINE_MACHINE("sifive_u", riscv_sifive_u_machine_init)
-- 
2.7.4
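With the management/compute split described in the cover letter (1 E51 + 4 U54), the bounds set by this patch work out to min_cpus = 1 + 1 = 2 and max_cpus = 1 + 4 = 5, matching the updated file header. The sketch below only evaluates that arithmetic. The SKETCH_ constants are hypothetical stand-ins for SIFIVE_U_MANAGEMENT_CPU_COUNT and SIFIVE_U_COMPUTE_CPU_COUNT, with values assumed from the cover letter, and sketch_smp_cpus_valid() merely models the range check that QEMU's generic machine code is expected to apply to -smp.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; values assumed from the cover letter
 * (1 E51 management hart + 4 U54 compute harts). */
#define SKETCH_MANAGEMENT_CPU_COUNT 1u
#define SKETCH_COMPUTE_CPU_COUNT    4u

/* Same expressions as mc->min_cpus and mc->max_cpus in the patch. */
#define SKETCH_MIN_CPUS (SKETCH_MANAGEMENT_CPU_COUNT + 1u)
#define SKETCH_MAX_CPUS (SKETCH_MANAGEMENT_CPU_COUNT + SKETCH_COMPUTE_CPU_COUNT)

/* Models the range check of a requested CPU count against min/max. */
static bool sketch_smp_cpus_valid(unsigned int smp_cpus)
{
    return smp_cpus >= SKETCH_MIN_CPUS && smp_cpus <= SKETCH_MAX_CPUS;
}
```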

[Qemu-devel] [PATCH v5 00/30] riscv: sifive_u: Improve the emulation fidelity of sifive_u machine

2019-08-22 Thread Bin Meng

As of today, the QEMU 'sifive_u' machine is a special target that does
not boot the upstream OpenSBI/U-Boot firmware images built for the real
SiFive HiFive Unleashed board. Hence OpenSBI supports a special platform
"qemu/sifive_u". For U-Boot, the sifive_fu540_defconfig is referenced
in the OpenSBI doc as its payload, but that does not boot at all due
to various issues in the current QEMU 'sifive_u' machine code.

This series aims to improve the emulation fidelity of sifive_u machine,
so that the upstream OpenSBI, U-Boot and kernel images built for the
SiFive HiFive Unleashed board can be used out of the box without any
special hack.

The major changes include:
- Support for heterogeneous harts creation, so that we can create a CPU
  that exactly mirrors the real hardware: 1 E51 + 4 U54.
- Implemented a PRCI model for FU540
- Implemented an OTP model for FU540, primarily used for storing serial
  number of the board
- Fixed GEM support that was seriously broken on sifive_u
- Synced device tree with upstream Linux kernel on sifive_u

OpenSBI v0.4 image built for sifive/fu540 is included as the default
bios image for 'sifive_u' machine.

The series is tested against OpenSBI v0.4 image for sifive/fu540
platform, U-Boot v2019.10-rc1 image for sifive_fu540_defconfig,
and Linux kernel v5.3-rc3 image with the following patch:

macb: Update compatibility string for SiFive FU540-C000 [1]

With OpenSBI + U-Boot, ping/tftpboot using the U-Boot MACB driver works well.
Booting a Linux 64-bit defconfig image, I verified that the system console
on serial 0 and pinging the host work well.

An OpenSBI patch [2] was sent to drop the special "qemu/sifive_u" platform
support in OpenSBI. The original plan was to get the drop patch applied
after this QEMU series is merged. However after discussion in the OpenSBI
mailing list, it seems the best option for us is to let OpenSBI continue
shipping the special "qemu/sifive_u" platform support to work with QEMU
version <= 4.1 and deprecate the support sometime in the future. A patch
will need to be sent to OpenSBI mailing list to update its document.

v4 is now rebased on the "for-master" branch of Palmer's QEMU RISC-V repo.
Dropped the following v3 patches, which were already done by someone else:
- riscv: sifive_u: Generate an aliases node in the device tree
- riscv: sifive_u: Support loading initramfs

The following v3 patch was dropped too, because v4 uses a different cluster
approach suggested by Richard Henderson:
- riscv: hart: Support heterogeneous harts population

[1]: https://patchwork.kernel.org/patch/11050003/
[2]: http://lists.infradead.org/pipermail/opensbi/2019-August/000335.html

Changes in v5:
- new patch to change to use qemu_log_mask(LOG_GUEST_ERROR,...) instead
  in various sifive models
- new patch to remove the unnecessary include of target/riscv/cpu.h
- change to use defines instead of enums
- change to use qemu_log_mask(LOG_GUEST_ERROR,...) in sifive_u_prci
- creating a 32-bit val variable and using that instead of casting
  everywhere in sifive_u_prci_write()
- move all register initialization to sifive_u_prci_reset() function
- drop sifive_u_prci_create()
- s/codes that worked/code that works/g
- create sifive_u_prci block directly in the machine code, instead
  of calling sifive_u_prci_create()
- change to use defines instead of enums
- change to use qemu_log_mask(LOG_GUEST_ERROR,...) in sifive_u_otp
- creating a 32-bit val variable and using that instead of casting
  everywhere in sifive_u_otp_write()
- move all register initialization to sifive_u_otp_reset() function
- drop sifive_u_otp_create()
- create sifive_u_otp block directly in the machine code, instead
  of calling sifive_u_otp_create()
- add the missing "local-mac-address" property in the ethernet node

Changes in v4:
- remove 2 more "linux,phandle" instances in sifive_u.c and spike.c
  after rebasing on Palmer's QEMU RISC-V tree
- change create_fdt() to return void in sifive_u.c too, after rebasing
  on Palmer's QEMU RISC-V tree
- new patch to remove executable attribute of opensbi images
- prefix all macros/variables/functions with SIFIVE_E/sifive_e
  in the sifive_e_prci driver
- new patch to add a "hartid-base" property to RISC-V hart array
- changed to create clusters for each cpu type
- prefix all macros/variables/functions with SIFIVE_U/sifive_u
  in the sifive_u_prci driver
- prefix all macros/variables/functions with SIFIVE_U/sifive_u
  in the sifive_u_otp driver
- new patch to remove handcrafted clock nodes for UART and ethernet

Changes in v3:
- changed to use macros for management and compute cpu count
- use management cpu count + 1 for the min_cpus
- update IRQ numbers of both UARTs to match hardware as well

Changes in v2:
- keep the PLIC compatible string unchanged as OpenSBI uses that
  for DT fix up
- drop patch "riscv: sifive: Move sifive_mmio_emulate() to a common place"
- new patch "riscv: sifive_e: Drop sifive_mmio_emulate()"
- fixed the "interrupts-extended" property size
- update the file header to

[Qemu-devel] [PATCH v5 01/30] riscv: hw: Remove superfluous "linux,phandle" property

2019-08-22 Thread Bin Meng

"linux,phandle" property is optional. Remove all instances in the
sifive_u, virt and spike machine device trees.

Signed-off-by: Bin Meng 
Reviewed-by: Alistair Francis 

---

Changes in v5: None
Changes in v4:
- remove 2 more "linux,phandle" instances in sifive_u.c and spike.c
  after rebasing on Palmer's QEMU RISC-V tree

Changes in v3: None
Changes in v2: None

 hw/riscv/sifive_u.c | 4 
 hw/riscv/spike.c| 1 -
 hw/riscv/virt.c | 3 ---
 3 files changed, 8 deletions(-)

diff --git a/hw/riscv/sifive_u.c b/hw/riscv/sifive_u.c
index 64e233d..afe304f 100644
--- a/hw/riscv/sifive_u.c
+++ b/hw/riscv/sifive_u.c
@@ -126,7 +126,6 @@ static void *create_fdt(SiFiveUState *s, const struct MemmapEntry *memmap,
 qemu_fdt_setprop_string(fdt, nodename, "device_type", "cpu");
 qemu_fdt_add_subnode(fdt, intc);
 qemu_fdt_setprop_cell(fdt, intc, "phandle", cpu_phandle);
-qemu_fdt_setprop_cell(fdt, intc, "linux,phandle", cpu_phandle);
 qemu_fdt_setprop_string(fdt, intc, "compatible", "riscv,cpu-intc");
 qemu_fdt_setprop(fdt, intc, "interrupt-controller", NULL, 0);
 qemu_fdt_setprop_cell(fdt, intc, "#interrupt-cells", 1);
@@ -185,7 +184,6 @@ static void *create_fdt(SiFiveUState *s, const struct MemmapEntry *memmap,
 qemu_fdt_setprop_cell(fdt, nodename, "riscv,max-priority", 7);
 qemu_fdt_setprop_cell(fdt, nodename, "riscv,ndev", 0x35);
 qemu_fdt_setprop_cells(fdt, nodename, "phandle", plic_phandle);
-qemu_fdt_setprop_cells(fdt, nodename, "linux,phandle", plic_phandle);
 plic_phandle = qemu_fdt_get_phandle(fdt, nodename);
 g_free(cells);
 g_free(nodename);
@@ -198,7 +196,6 @@ static void *create_fdt(SiFiveUState *s, const struct MemmapEntry *memmap,
 qemu_fdt_setprop_cell(fdt, nodename, "clock-frequency",
 SIFIVE_U_GEM_CLOCK_FREQ);
 qemu_fdt_setprop_cell(fdt, nodename, "phandle", ethclk_phandle);
-qemu_fdt_setprop_cell(fdt, nodename, "linux,phandle", ethclk_phandle);
 ethclk_phandle = qemu_fdt_get_phandle(fdt, nodename);
 g_free(nodename);
 
@@ -234,7 +231,6 @@ static void *create_fdt(SiFiveUState *s, const struct MemmapEntry *memmap,
 qemu_fdt_setprop_cell(fdt, nodename, "#clock-cells", 0x0);
 qemu_fdt_setprop_cell(fdt, nodename, "clock-frequency", 3686400);
 qemu_fdt_setprop_cell(fdt, nodename, "phandle", uartclk_phandle);
-qemu_fdt_setprop_cell(fdt, nodename, "linux,phandle", uartclk_phandle);
 uartclk_phandle = qemu_fdt_get_phandle(fdt, nodename);
 g_free(nodename);
 
diff --git a/hw/riscv/spike.c b/hw/riscv/spike.c
index 2991b34..14acaef 100644
--- a/hw/riscv/spike.c
+++ b/hw/riscv/spike.c
@@ -112,7 +112,6 @@ static void create_fdt(SpikeState *s, const struct MemmapEntry *memmap,
 qemu_fdt_setprop_string(fdt, nodename, "device_type", "cpu");
 qemu_fdt_add_subnode(fdt, intc);
 qemu_fdt_setprop_cell(fdt, intc, "phandle", 1);
-qemu_fdt_setprop_cell(fdt, intc, "linux,phandle", 1);
 qemu_fdt_setprop_string(fdt, intc, "compatible", "riscv,cpu-intc");
 qemu_fdt_setprop(fdt, intc, "interrupt-controller", NULL, 0);
 qemu_fdt_setprop_cell(fdt, intc, "#interrupt-cells", 1);
diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
index 25faf3b..00be05a 100644
--- a/hw/riscv/virt.c
+++ b/hw/riscv/virt.c
@@ -170,11 +170,9 @@ static void *create_fdt(RISCVVirtState *s, const struct MemmapEntry *memmap,
 qemu_fdt_setprop_cell(fdt, nodename, "reg", cpu);
 qemu_fdt_setprop_string(fdt, nodename, "device_type", "cpu");
 qemu_fdt_setprop_cell(fdt, nodename, "phandle", cpu_phandle);
-qemu_fdt_setprop_cell(fdt, nodename, "linux,phandle", cpu_phandle);
 intc_phandle = phandle++;
 qemu_fdt_add_subnode(fdt, intc);
 qemu_fdt_setprop_cell(fdt, intc, "phandle", intc_phandle);
-qemu_fdt_setprop_cell(fdt, intc, "linux,phandle", intc_phandle);
 qemu_fdt_setprop_string(fdt, intc, "compatible", "riscv,cpu-intc");
 qemu_fdt_setprop(fdt, intc, "interrupt-controller", NULL, 0);
 qemu_fdt_setprop_cell(fdt, intc, "#interrupt-cells", 1);
@@ -250,7 +248,6 @@ static void *create_fdt(RISCVVirtState *s, const struct MemmapEntry *memmap,
 qemu_fdt_setprop_cell(fdt, nodename, "riscv,max-priority", 7);
 qemu_fdt_setprop_cell(fdt, nodename, "riscv,ndev", VIRTIO_NDEV);
 qemu_fdt_setprop_cells(fdt, nodename, "phandle", plic_phandle);
-qemu_fdt_setprop_cells(fdt, nodename, "linux,phandle", plic_phandle);
 plic_phandle = qemu_fdt_get_phandle(fdt, nodename);
 g_free(cells);
 g_free(nodename);
-- 
2.7.4

[Qemu-devel] [PATCH v5 03/30] riscv: hw: Remove not needed PLIC properties in device tree

2019-08-22 Thread Bin Meng

This removes "reg-names" and "riscv,max-priority" properties of the
PLIC node from device tree.

Signed-off-by: Bin Meng 
Reviewed-by: Jonathan Behrens 
Reviewed-by: Alistair Francis 

---

Changes in v5: None
Changes in v4: None
Changes in v3: None
Changes in v2:
- keep the PLIC compatible string unchanged as OpenSBI uses that
  for DT fix up

 hw/riscv/sifive_u.c | 2 --
 hw/riscv/virt.c | 2 --
 2 files changed, 4 deletions(-)

diff --git a/hw/riscv/sifive_u.c b/hw/riscv/sifive_u.c
index 3f9284e..5fe0033 100644
--- a/hw/riscv/sifive_u.c
+++ b/hw/riscv/sifive_u.c
@@ -180,8 +180,6 @@ static void *create_fdt(SiFiveUState *s, const struct MemmapEntry *memmap,
 qemu_fdt_setprop_cells(fdt, nodename, "reg",
 0x0, memmap[SIFIVE_U_PLIC].base,
 0x0, memmap[SIFIVE_U_PLIC].size);
-qemu_fdt_setprop_string(fdt, nodename, "reg-names", "control");
-qemu_fdt_setprop_cell(fdt, nodename, "riscv,max-priority", 7);
 qemu_fdt_setprop_cell(fdt, nodename, "riscv,ndev", 0x35);
 qemu_fdt_setprop_cell(fdt, nodename, "phandle", plic_phandle);
 plic_phandle = qemu_fdt_get_phandle(fdt, nodename);
diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
index 127f005..2f75195 100644
--- a/hw/riscv/virt.c
+++ b/hw/riscv/virt.c
@@ -244,8 +244,6 @@ static void *create_fdt(RISCVVirtState *s, const struct MemmapEntry *memmap,
 qemu_fdt_setprop_cells(fdt, nodename, "reg",
 0x0, memmap[VIRT_PLIC].base,
 0x0, memmap[VIRT_PLIC].size);
-qemu_fdt_setprop_string(fdt, nodename, "reg-names", "control");
-qemu_fdt_setprop_cell(fdt, nodename, "riscv,max-priority", 7);
 qemu_fdt_setprop_cell(fdt, nodename, "riscv,ndev", VIRTIO_NDEV);
 qemu_fdt_setprop_cell(fdt, nodename, "phandle", plic_phandle);
 plic_phandle = qemu_fdt_get_phandle(fdt, nodename);
-- 
2.7.4

[Qemu-devel] [PATCH v5 07/30] riscv: roms: Remove executable attribute of opensbi images

2019-08-22 Thread Bin Meng

Like other binary files, the executable attribute of opensbi images
should not be set.

Signed-off-by: Bin Meng 
Reviewed-by: Alistair Francis 

---

Changes in v5: None
Changes in v4:
- new patch to remove executable attribute of opensbi images

Changes in v3: None
Changes in v2: None

 pc-bios/opensbi-riscv32-virt-fw_jump.bin | Bin
 pc-bios/opensbi-riscv64-sifive_u-fw_jump.bin | Bin
 pc-bios/opensbi-riscv64-virt-fw_jump.bin | Bin
 3 files changed, 0 insertions(+), 0 deletions(-)
 mode change 100755 => 100644 pc-bios/opensbi-riscv32-virt-fw_jump.bin
 mode change 100755 => 100644 pc-bios/opensbi-riscv64-sifive_u-fw_jump.bin
 mode change 100755 => 100644 pc-bios/opensbi-riscv64-virt-fw_jump.bin

diff --git a/pc-bios/opensbi-riscv32-virt-fw_jump.bin b/pc-bios/opensbi-riscv32-virt-fw_jump.bin
old mode 100755
new mode 100644
diff --git a/pc-bios/opensbi-riscv64-sifive_u-fw_jump.bin b/pc-bios/opensbi-riscv64-sifive_u-fw_jump.bin
old mode 100755
new mode 100644
diff --git a/pc-bios/opensbi-riscv64-virt-fw_jump.bin b/pc-bios/opensbi-riscv64-virt-fw_jump.bin
old mode 100755
new mode 100644
-- 
2.7.4

[Qemu-devel] [PATCH v5 02/30] riscv: hw: Use qemu_fdt_setprop_cell() for property with only 1 cell

2019-08-22 Thread Bin Meng

Some of the properties only have 1 cell so we should use
qemu_fdt_setprop_cell() instead of qemu_fdt_setprop_cells().

Signed-off-by: Bin Meng 
Reviewed-by: Alistair Francis 
---

Changes in v5: None
Changes in v4: None
Changes in v3: None
Changes in v2: None

 hw/riscv/sifive_u.c | 18 +-
 hw/riscv/virt.c | 24 
 2 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/hw/riscv/sifive_u.c b/hw/riscv/sifive_u.c
index afe304f..3f9284e 100644
--- a/hw/riscv/sifive_u.c
+++ b/hw/riscv/sifive_u.c
@@ -183,7 +183,7 @@ static void *create_fdt(SiFiveUState *s, const struct MemmapEntry *memmap,
 qemu_fdt_setprop_string(fdt, nodename, "reg-names", "control");
 qemu_fdt_setprop_cell(fdt, nodename, "riscv,max-priority", 7);
 qemu_fdt_setprop_cell(fdt, nodename, "riscv,ndev", 0x35);
-qemu_fdt_setprop_cells(fdt, nodename, "phandle", plic_phandle);
+qemu_fdt_setprop_cell(fdt, nodename, "phandle", plic_phandle);
 plic_phandle = qemu_fdt_get_phandle(fdt, nodename);
 g_free(cells);
 g_free(nodename);
@@ -208,20 +208,20 @@ static void *create_fdt(SiFiveUState *s, const struct MemmapEntry *memmap,
 0x0, memmap[SIFIVE_U_GEM].size);
 qemu_fdt_setprop_string(fdt, nodename, "reg-names", "control");
 qemu_fdt_setprop_string(fdt, nodename, "phy-mode", "gmii");
-qemu_fdt_setprop_cells(fdt, nodename, "interrupt-parent", plic_phandle);
-qemu_fdt_setprop_cells(fdt, nodename, "interrupts", SIFIVE_U_GEM_IRQ);
+qemu_fdt_setprop_cell(fdt, nodename, "interrupt-parent", plic_phandle);
+qemu_fdt_setprop_cell(fdt, nodename, "interrupts", SIFIVE_U_GEM_IRQ);
 qemu_fdt_setprop_cells(fdt, nodename, "clocks",
 ethclk_phandle, ethclk_phandle, ethclk_phandle);
 qemu_fdt_setprop(fdt, nodename, "clock-names", ethclk_names,
 sizeof(ethclk_names));
-qemu_fdt_setprop_cells(fdt, nodename, "#address-cells", 1);
-qemu_fdt_setprop_cells(fdt, nodename, "#size-cells", 0);
+qemu_fdt_setprop_cell(fdt, nodename, "#address-cells", 1);
+qemu_fdt_setprop_cell(fdt, nodename, "#size-cells", 0);
 g_free(nodename);
 
 nodename = g_strdup_printf("/soc/ethernet@%lx/ethernet-phy@0",
 (long)memmap[SIFIVE_U_GEM].base);
 qemu_fdt_add_subnode(fdt, nodename);
-qemu_fdt_setprop_cells(fdt, nodename, "reg", 0x0);
+qemu_fdt_setprop_cell(fdt, nodename, "reg", 0x0);
 g_free(nodename);
 
 uartclk_phandle = phandle++;
@@ -241,9 +241,9 @@ static void *create_fdt(SiFiveUState *s, const struct MemmapEntry *memmap,
 qemu_fdt_setprop_cells(fdt, nodename, "reg",
 0x0, memmap[SIFIVE_U_UART0].base,
 0x0, memmap[SIFIVE_U_UART0].size);
-qemu_fdt_setprop_cells(fdt, nodename, "clocks", uartclk_phandle);
-qemu_fdt_setprop_cells(fdt, nodename, "interrupt-parent", plic_phandle);
-qemu_fdt_setprop_cells(fdt, nodename, "interrupts", SIFIVE_U_UART0_IRQ);
+qemu_fdt_setprop_cell(fdt, nodename, "clocks", uartclk_phandle);
+qemu_fdt_setprop_cell(fdt, nodename, "interrupt-parent", plic_phandle);
+qemu_fdt_setprop_cell(fdt, nodename, "interrupts", SIFIVE_U_UART0_IRQ);
 
 qemu_fdt_add_subnode(fdt, "/chosen");
 qemu_fdt_setprop_string(fdt, "/chosen", "stdout-path", nodename);
diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
index 00be05a..127f005 100644
--- a/hw/riscv/virt.c
+++ b/hw/riscv/virt.c
@@ -233,8 +233,8 @@ static void *create_fdt(RISCVVirtState *s, const struct MemmapEntry *memmap,
 nodename = g_strdup_printf("/soc/interrupt-controller@%lx",
 (long)memmap[VIRT_PLIC].base);
 qemu_fdt_add_subnode(fdt, nodename);
-qemu_fdt_setprop_cells(fdt, nodename, "#address-cells",
-   FDT_PLIC_ADDR_CELLS);
+qemu_fdt_setprop_cell(fdt, nodename, "#address-cells",
+  FDT_PLIC_ADDR_CELLS);
 qemu_fdt_setprop_cell(fdt, nodename, "#interrupt-cells",
   FDT_PLIC_INT_CELLS);
 qemu_fdt_setprop_string(fdt, nodename, "compatible", "riscv,plic0");
@@ -247,7 +247,7 @@ static void *create_fdt(RISCVVirtState *s, const struct MemmapEntry *memmap,
 qemu_fdt_setprop_string(fdt, nodename, "reg-names", "control");
 qemu_fdt_setprop_cell(fdt, nodename, "riscv,max-priority", 7);
 qemu_fdt_setprop_cell(fdt, nodename, "riscv,ndev", VIRTIO_NDEV);
-qemu_fdt_setprop_cells(fdt, nodename, "phandle", plic_phandle);
+qemu_fdt_setprop_cell(fdt, nodename, "phandle", plic_phandle);
 plic_phandle = qemu_fdt_get_phandle(fdt, nodename);
 g_free(cells);
 g_free(nodename);
@@ -260,19 +260,19 @@ static void *create_fdt(RISCVVirtState *s, const struct MemmapEntry *memmap,
 qemu_fdt_setprop_cells(fdt, nodename, "reg",
 0x0, memmap[VIRT_VIRTIO].base + i * memmap[VIRT_VIRTIO].size,
 0x0, memmap[VIRT_VIRTIO].size);
-qemu_fdt_setprop_cells(fdt, nodename, "interrupt-parent", plic_phandle);
-qemu_fdt_setprop_cells(fdt, nodename, "interrupts",

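For a single value, qemu_fdt_setprop_cell() and qemu_fdt_setprop_cells() should produce the same property bytes: one 32-bit cell in big-endian (device-tree) byte order. That is why this patch reads as a cleanup rather than a functional change. The sketch below models both encodings in plain C; sketch_encode_cell() and sketch_encode_cells() are hypothetical stand-ins, not the QEMU helpers themselves.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Store one 32-bit cell in device-tree (big-endian) byte order. */
static void sketch_put_be32(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

/* Hypothetical model of the single-cell setter: one value, four bytes. */
static size_t sketch_encode_cell(uint8_t *buf, uint32_t val)
{
    sketch_put_be32(buf, val);
    return 4;
}

/* Hypothetical model of the multi-cell setter: n values, each converted
 * to big-endian and laid out back to back. */
static size_t sketch_encode_cells(uint8_t *buf, const uint32_t *vals, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        sketch_put_be32(buf + 4 * i, vals[i]);
    }
    return 4 * n;
}
```

With n == 1 both paths emit identical bytes, so the single-cell helper simply states the intent more directly.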
Re: [Qemu-devel] [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

2019-08-22 Thread Yao, Jiewen

Thank you Mike!

That is a good reference on the real hardware behavior. (Glad it is public.)

For the threat model, the unique part of the virtual environment is temp RAM.
The temp RAM on a real platform is the per-CPU cache, while the temp RAM on a
virtual platform is global memory.
That brings one more potential attack surface in the virtual environment, if
the hot-added CPU needs to run code with a stack or heap before the SMI rebase.

Other threats, such as SMRAM or DMA, are the same.

Thank you
Yao Jiewen


> -Original Message-
> From: Kinney, Michael D
> Sent: Friday, August 23, 2019 9:03 AM
> To: Paolo Bonzini ; Laszlo Ersek
> ; r...@edk2.groups.io; Yao, Jiewen
> ; Kinney, Michael D 
> Cc: Alex Williamson ; de...@edk2.groups.io;
> qemu devel list ; Igor Mammedov
> ; Chen, Yingwen ;
> Nakajima, Jun ; Boris Ostrovsky
> ; Joao Marcal Lemos Martins
> ; Phillip Goerl 
> Subject: RE: [edk2-rfc] [edk2-devel] CPU hotplug using SMM with
> QEMU+OVMF
> 
> Paolo,
> 
> I find the following links related to the discussions here
> along with one example feature called GENPROTRANGE.
> 
> https://csrc.nist.gov/CSRC/media/Presentations/The-Whole-is-Greater/images-media/day1_trusted-computing_200-250.pdf
> https://cansecwest.com/slides/2017/CSW2017_Cuauhtemoc-Rene_CPU_Hot-Add_flow.pdf
> https://www.mouser.com/ds/2/612/5520-5500-chipset-ioh-datasheet-1131292.pdf
> 
> Best regards,
> 
> Mike
> 
> > -Original Message-
> > From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> > Sent: Thursday, August 22, 2019 4:12 PM
> > To: Kinney, Michael D ;
> > Laszlo Ersek ; r...@edk2.groups.io;
> > Yao, Jiewen 
> > Cc: Alex Williamson ;
> > de...@edk2.groups.io; qemu devel list  > de...@nongnu.org>; Igor Mammedov ;
> > Chen, Yingwen ; Nakajima, Jun
> > ; Boris Ostrovsky
> > ; Joao Marcal Lemos Martins
> > ; Phillip Goerl
> > 
> > Subject: Re: [edk2-rfc] [edk2-devel] CPU hotplug using
> > SMM with QEMU+OVMF
> >
> > On 23/08/19 00:32, Kinney, Michael D wrote:
> > > Paolo,
> > >
> > > It is my understanding that real HW hot plug uses the SDM defined
> > > methods.  Meaning the initial SMI is to 3000:8000 and they rebase to
> > > TSEG in the first SMI.  They must have chipset specific methods to
> > > protect 3000:8000 from DMA.
> >
> > It would be great if you could check.
> >
> > > Can we add a chipset feature to prevent DMA to 64KB range from
> > > 0x3-0x3 and the UEFI Memory Map and ACPI content can be
> > > updated so the Guest OS knows to not use that range for DMA?
> >
> > If real hardware does it at the chipset level, we will
> > probably use Igor's suggestion of aliasing A-seg to
> > 3000:.  Before starting the new CPU, the SMI handler
> > can prepare the SMBASE relocation trampoline at
> > A000:8000 and the hot-plugged CPU will find it at
> > 3000:8000 when it receives the initial SMI.  Because this
> > is backed by RAM at 0xA-0xA, DMA cannot access it
> > and would still go through to RAM at 0x3.
> >
> > Paolo
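The aliasing idea relies on real-mode style addressing, where segment:offset maps to segment * 16 + offset. A CPU entering SMM with the default SMBASE of 0x30000 starts executing at SMBASE + 0x8000, i.e. 3000:8000 = 0x38000, while the trampoline prepared at A000:8000 sits at 0xA8000. Both are 0x8000 bytes into their 64 KiB windows, so aliasing one window onto the other lines the trampoline up with the entry point. A small C check of that arithmetic; the helper and constant names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Default SMBASE per the Intel SDM; the SMI entry point is SMBASE + 0x8000. */
#define SKETCH_DEFAULT_SMBASE   UINT32_C(0x30000)
#define SKETCH_SMI_ENTRY_OFFSET UINT32_C(0x8000)

/* Real-mode style translation: linear = segment * 16 + offset. */
static uint32_t sketch_real_mode_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```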

Re: [Qemu-devel] [Qemu-ppc] [GIT PULL for qemu-pseries REPOST] pseries: Update SLOF firmware image

2019-08-22 Thread Alexey Kardashevskiy





On 14/08/2019 14:33, Aravinda Prasad wrote:



On Tuesday 13 August 2019 07:47 PM, David Gibson wrote:

On Tue, Aug 13, 2019 at 01:00:24PM +0530, Aravinda Prasad wrote:



On Monday 12 August 2019 03:38 PM, David Gibson wrote:

On Mon, Aug 05, 2019 at 02:14:39PM +0530, Aravinda Prasad wrote:

Alexey/David,

With the SLOF changes, QEMU cannot resize the RTAS blob. Resizing is
required for FWNMI support, which extends the RTAS blob to include an
error log upon a machine check.

The check for a valid RTAS buffer fails in the guest because the rtas-size
updated in QEMU is not reflected in the guest.

Any workaround for this?


Well, we should still be able to do it; it just means fwnmi would need
a SLOF change.  It's an inconvenience, but not really a big deal.


Yes. Alexey and I were discussing the following change to SLOF:

diff --git a/lib/libhvcall/hvcall.S b/lib/libhvcall/hvcall.S
index b19f6dbeff2c..880d29a29122 100644
--- a/lib/libhvcall/hvcall.S
+++ b/lib/libhvcall/hvcall.S
@@ -134,6 +134,7 @@ ENTRY(hv_rtas)
 ori r3,r3,KVMPPC_H_RTAS@l
 HVCALL
 blr
+.space 2048
 .globl hv_rtas_size
  hv_rtas_size:
 .long . - hv_rtas;


But this will statically reserve space for RTAS even when
SPAPR_CAP_FWNMI_MCE is OFF.


Sure.  We could flag that in the DT somehow, and have SLOF reserve the
space conditionally.

Or we could just ignore it. 2 kiB is minuscule compared to our minimum
guest size, and our current RTAS is microscopic compared to PowerVM.


I think so too: 2 kiB is minuscule, so we can allocate it statically.

Alexey,

Can you please include the above one-line fix to SLOF?



I am thinking of:
===
@@ -132,20 +132,22 @@ ENTRY(hv_rtas)
mr  r4,r3
lis r3,KVMPPC_H_RTAS@h
ori r3,r3,KVMPPC_H_RTAS@l
HVCALL
blr
+   .space 2048 - (. - hv_rtas)
.globl hv_rtas_size
 hv_rtas_size:
.long . - hv_rtas;

 ENTRY(hv_rtas_broken_sc1)
mr  r4,r3
lis r3,KVMPPC_H_RTAS@h
ori r3,r3,KVMPPC_H_RTAS@l
.long   0x7c000268
blr
+   .space 2048 - (. - hv_rtas_broken_sc1)
.globl hv_rtas_broken_sc1_size
 hv_rtas_broken_sc1_size:
.long . - hv_rtas_broken_sc1;
===

to pad the RTAS blob to exactly 2 KiB. But the RTAS_ERROR_LOG_OFFSET
hardcoded in QEMU bothers me a bit; I should probably put some sort of
magic marker where the RTAS log can start.


David, any thoughts? The marker could be as simple as a zero, for example.


--
Alexey
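The proposed `.space 2048 - (. - hv_rtas)` pads each stub so that hv_rtas_size always evaluates to 2048, whatever the code length. Since GNU as fills `.space` with zero bytes by default, a zero word could indeed serve as the marker where the error log may start, provided no instruction word in the stub is itself zero. The sketch below models both ideas; SKETCH_RTAS_BLOB_SIZE, sketch_rtas_padding() and sketch_find_marker() are hypothetical, and the blob contents used to exercise it are placeholders, not real instruction encodings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constant mirroring ".space 2048 - (. - hv_rtas)". */
#define SKETCH_RTAS_BLOB_SIZE 2048u

/* Padding needed after code_len bytes of code so the blob is exactly
 * SKETCH_RTAS_BLOB_SIZE bytes; -1 if the code alone does not fit. */
static long sketch_rtas_padding(size_t code_len)
{
    if (code_len > SKETCH_RTAS_BLOB_SIZE) {
        return -1;
    }
    return (long)(SKETCH_RTAS_BLOB_SIZE - code_len);
}

/* Hypothetical marker search: byte offset of the first 32-bit word equal
 * to marker, or -1 if there is none.  With marker == 0 this finds where
 * the (non-zero) instruction words end and the zero fill begins. */
static long sketch_find_marker(const uint32_t *blob, size_t nwords,
                               uint32_t marker)
{
    for (size_t i = 0; i < nwords; i++) {
        if (blob[i] == marker) {
            return (long)(i * sizeof(blob[0]));
        }
    }
    return -1;
}
```

A search like this could let QEMU locate the log area at run time instead of relying on a fixed offset, at the cost of trusting the blob layout.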

Re: [Qemu-devel] [Qemu-block] [PATCH 1/2] vhost-user-blk: prevent using uninitialized vqs

2019-08-22 Thread yuchenlin via Qemu-devel

Raphael Norwitz wrote on 2019-08-23 04:16:
>
> Same rationale as: e6cc11d64fc998c11a4dfcde8fda3fc33a74d844
>
> Of the 3 virtqueues, seabios only sets cmd, leaving ctrl
> and event without a physical address. This can cause
> vhost_verify_ring_part_mapping to return ENOMEM, causing
> the following logs:
>
> qemu-system-x86_64: Unable to map available ring for ring 0
> qemu-system-x86_64: Verify ring failure on region 0
>
> This has already been fixed for vhost scsi devices and was
> recently fixed for vhost-user scsi devices. This commit fixes it for
> vhost-user-blk devices.
>
> Suggested-by: Phillippe Mathieu-Daude
> Signed-off-by: Raphael Norwitz

Reviewed-by: yuchenlin

Thanks.

> ---
>  hw/block/vhost-user-blk.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> index 0b8c5df..63da9bb 100644
> --- a/hw/block/vhost-user-blk.c
> +++ b/hw/block/vhost-user-blk.c
> @@ -421,7 +421,7 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
>  }
>
>  s->inflight = g_new0(struct vhost_inflight, 1);
> -s->vqs = g_new(struct vhost_virtqueue, s->num_queues);
> +s->vqs = g_new0(struct vhost_virtqueue, s->num_queues);
>  s->watch = 0;
>  s->connected = false;
>
> --
> 1.9.4

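The fix works because g_new() allocates without zeroing (like malloc()), while g_new0() zero-fills (like calloc()), so queues that SeaBIOS never programs keep all-zero ring addresses instead of leftover heap contents. The sketch below shows the idea in plain C. struct sketch_vq is a hypothetical, heavily simplified stand-in for struct vhost_virtqueue, and treating an all-zero address set as "not configured" is an assumption for illustration, not a description of what vhost_verify_ring_part_mapping() does internally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for struct vhost_virtqueue:
 * only the ring addresses the firmware may or may not have set. */
struct sketch_vq {
    uint64_t desc_phys;
    uint64_t avail_phys;
    uint64_t used_phys;
};

/* Zero-filled allocation, like g_new0(): every queue the firmware never
 * sets up keeps all-zero ring addresses. */
static struct sketch_vq *sketch_alloc_vqs(size_t n)
{
    return calloc(n, sizeof(struct sketch_vq));
}

/* Assumed convention for this sketch: a queue with no ring address set
 * is unconfigured and should not be mapped or verified. */
static bool sketch_vq_configured(const struct sketch_vq *vq)
{
    return vq->desc_phys != 0 || vq->avail_phys != 0 || vq->used_phys != 0;
}
```

With a plain malloc() the unconfigured queues would contain arbitrary values, and any such check would be meaningless.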
Re: [Qemu-devel] [PATCH v2 2/3] net/filter.c: Add Options to insert filters anywhere in the filter list

2019-08-22 Thread Zhang, Chen




> -Original Message-
> From: Lukas Straub [mailto:lukasstra...@web.de]
> Sent: Friday, August 16, 2019 2:49 AM
> To: qemu-devel 
> Cc: Zhang, Chen ; Jason Wang
> ; Wen Congyang ; Xie
> Changlong 
> Subject: [PATCH v2 2/3] net/filter.c: Add Options to insert filters anywhere in the filter list
> 
> To switch the Secondary to Primary, we need to insert new filters before the
> filter-rewriter.
> 
> Add the options insert= and position= to be able to insert filters anywhere
> in the filter list.
> 
> position should be either "head", "tail" or the id of another filter.
> insert should be either "before" or "after" to specify where to insert the new
> filter relative to the one specified with position.
> 

Hi Lukas,

It looks like there is no need to add "insert = xxx" for this operation.
For example:

We have 3 net-filters, the running order like that:

Filter1 --> Filter2 --> Filter3

If we want to add another filter between filter1 and filter2.
The "Position = head, insert = after" always seam with "position = filter2 id, 
insert = before". It seems the "insert" is a redundant args.
So I think it is enough with the "position", we can make the "insert" always 
equal "after" except the "head".
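The redundancy can be seen in a minimal model that keeps the filter list as an array of ids and inserts a new id relative to a position ("head", "tail" or an existing id), before or after it. The function name and the array model are hypothetical (QEMU keeps filters in a QTAILQ), and in this sketch "head" and "tail" simply ignore the before/after flag. Inserting after Filter1 and inserting before Filter2 produce the same order:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_FILTERS 8

/* Hypothetical model: list[0..*n) holds filter ids in running order.
 * Returns the index where id was placed, or -1 on error. */
static int insert_filter(const char *list[], int *n, const char *id,
                         const char *position, bool insert_before)
{
    int idx;

    if (*n >= MAX_FILTERS) {
        return -1;
    }
    if (strcmp(position, "head") == 0) {
        idx = 0;
    } else if (strcmp(position, "tail") == 0) {
        idx = *n;
    } else {
        int found = -1;
        for (int i = 0; i < *n; i++) {
            if (strcmp(list[i], position) == 0) {
                found = i;
                break;
            }
        }
        if (found < 0) {
            return -1; /* corresponds to "filter '%s' not found" */
        }
        idx = insert_before ? found : found + 1;
    }
    /* Shift everything from idx onward one slot right, then place id. */
    memmove(&list[idx + 1], &list[idx], (size_t)(*n - idx) * sizeof(list[0]));
    list[idx] = id;
    (*n)++;
    return idx;
}
```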


Thanks
Zhang Chen

> Signed-off-by: Lukas Straub 
> ---
>  include/net/filter.h |  2 ++
>  net/filter.c | 71 +++-
>  qemu-options.hx  | 10 +++
>  3 files changed, 77 insertions(+), 6 deletions(-)
> 
> diff --git a/include/net/filter.h b/include/net/filter.h
> index 49da666ac0..355c178f75 100644
> --- a/include/net/filter.h
> +++ b/include/net/filter.h
> @@ -62,6 +62,8 @@ struct NetFilterState {
>      NetClientState *netdev;
>      NetFilterDirection direction;
>      bool on;
> +    char *position;
> +    bool insert_before;
>      QTAILQ_ENTRY(NetFilterState) next;
>  };
> 
> diff --git a/net/filter.c b/net/filter.c
> index 28d1930db7..309fd778df 100644
> --- a/net/filter.c
> +++ b/net/filter.c
> @@ -171,11 +171,47 @@ static void netfilter_set_status(Object *obj, const char *str, Error **errp)
>  }
>  }
> 
> +static char *netfilter_get_position(Object *obj, Error **errp)
> +{
> +    NetFilterState *nf = NETFILTER(obj);
> +
> +    return g_strdup(nf->position);
> +}
> +
> +static void netfilter_set_position(Object *obj, const char *str, Error **errp)
> +{
> +    NetFilterState *nf = NETFILTER(obj);
> +
> +    nf->position = g_strdup(str);
> +}
> +
> +static char *netfilter_get_insert(Object *obj, Error **errp)
> +{
> +    NetFilterState *nf = NETFILTER(obj);
> +
> +    return nf->insert_before ? g_strdup("before") : g_strdup("after");
> +}
> +
> +static void netfilter_set_insert(Object *obj, const char *str, Error **errp)
> +{
> +    NetFilterState *nf = NETFILTER(obj);
> +
> +    if (strcmp(str, "before") && strcmp(str, "after")) {
> +        error_setg(errp, "Invalid value for netfilter insert, "
> +                         "should be 'before' or 'after'");
> +        return;
> +    }
> +
> +    nf->insert_before = !strcmp(str, "before");
> +}
> +
>  static void netfilter_init(Object *obj)
>  {
>      NetFilterState *nf = NETFILTER(obj);
> 
>      nf->on = true;
> +    nf->insert_before = false;
> +    nf->position = g_strdup("tail");
> 
>      object_property_add_str(obj, "netdev",
>                              netfilter_get_netdev_id, netfilter_set_netdev_id,
> @@ -187,11 +223,18 @@ static void netfilter_init(Object *obj)
>      object_property_add_str(obj, "status",
>                              netfilter_get_status, netfilter_set_status,
>                              NULL);
> +    object_property_add_str(obj, "position",
> +                            netfilter_get_position, netfilter_set_position,
> +                            NULL);
> +    object_property_add_str(obj, "insert",
> +                            netfilter_get_insert, netfilter_set_insert,
> +                            NULL);
>  }
> 
>  static void netfilter_complete(UserCreatable *uc, Error **errp)
>  {
>      NetFilterState *nf = NETFILTER(uc);
> +    NetFilterState *position = NULL;
>      NetClientState *ncs[MAX_QUEUE_NUM];
>      NetFilterClass *nfc = NETFILTER_GET_CLASS(uc);
>      int queues;
> @@ -219,6 +262,20 @@ static void netfilter_complete(UserCreatable *uc, Error **errp)
>          return;
>      }
> 
> +    if (strcmp(nf->position, "head") && strcmp(nf->position, "tail")) {
> +        /* Search for the position to insert before/after */
> +        Object *container;
> +        Object *obj;
> +
> +        container = object_get_objects_root();
> +        obj = object_resolve_path_component(container, nf->position);
> +        if (!obj) {
> +            error_setg(errp, "filter '%s' not found", nf->position);
> +            return;
> +        }
> +        position = NETFILTER(obj);
> +    }
> +
>      nf->netdev = ncs[0];
> 
>      if (nfc->setup) {
> @@ -228,7 +285,18 @@ static void netfilter_complete(UserCreatable *uc, Error **errp)
>

Re: [Qemu-devel] [PATCH v9 00/20] Invert Endian bit in SPARCv9 MMU TTE

2019-08-22 Thread Richard Henderson

On 8/22/19 6:19 PM, Tony Nguyen wrote:
> v9:
> - Rebase on master and test with git am... again apologies to all, thanks for
>   the patience =)

So... after an hour only the cover letter has arrived.
I'm thinking that it didn't work.


r~

Re: [Qemu-devel] [PATCH v4 13/28] riscv: hart: Add a "hartid-base" property to RISC-V hart array

2019-08-22 Thread Bin Meng

Hi Alistair,

On Fri, Aug 23, 2019 at 6:44 AM Alistair Francis  wrote:
>
> On Sun, Aug 18, 2019 at 10:27 PM Bin Meng  wrote:
> >
> > At present each hart's hartid in a RISC-V hart array is assigned
> > the same value of its index in the hart array. But for a system
> > that has multiple hart arrays, this is not the case any more.
> >
> > Add a new "hartid-base" property so that hartid number can be
> > assigned based on the property value.
> >
> > Signed-off-by: Bin Meng 
>
> Why do we need this patch?
>

Without this patch, we cannot create two clusters that represent 1 E51
(hartid 0) and 4 U54 (hartid 1-4). The current code would create 1 E51
(hartid 0) and 4 U54 (hartid 0-3).
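A minimal sketch of the numbering a base property enables, assuming each hart's id is simply hartid-base plus its index in the array (the struct and function names are hypothetical, not the patch's code). With a base of 0 for both arrays the U54 harts would collide with the E51 at hartid 0; with bases 0 and 1 they come out as 0 and 1-4:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one RISC-V hart array. */
typedef struct {
    uint32_t hartid_base; /* value of the proposed "hartid-base" property */
    uint32_t num_harts;
} HartArray;

/* Assumed rule: hartid = hartid_base + index within the array. */
static uint32_t hart_array_hartid(const HartArray *a, uint32_t index)
{
    assert(index < a->num_harts);
    return a->hartid_base + index;
}
```

For example, an E51 array of { 0, 1 } and a U54 array of { 1, 4 } give hartids 0 and 1, 2, 3, 4.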

Regards,
Bin

Re: [Qemu-devel] [PATCH v7 06/13] vfio: Add VM state change handler to know state of VM

2019-08-22 Thread Yan Zhao

On Wed, Aug 21, 2019 at 04:33:50AM +0800, Kirti Wankhede wrote:
> 
> 
> On 7/22/2019 2:07 PM, Yan Zhao wrote:
> > On Tue, Jul 09, 2019 at 05:49:13PM +0800, Kirti Wankhede wrote:
> >> VM state change handler gets called on change in VM's state. This is used to set
> >> VFIO device state to _RUNNING.
> >> VM state change handler, migration state change handler and log_sync listener
> >> are called asynchronously, which sometimes leads to data corruption in the migration
> >> region. Initialise a mutex that is used to serialize operations on the migration data
> >> region while saving state.
> >>
> >> Signed-off-by: Kirti Wankhede 
> >> Reviewed-by: Neo Jia 
> >> ---
> >>  hw/vfio/migration.c   | 64 +++
> >>  hw/vfio/trace-events  |  2 ++
> >>  include/hw/vfio/vfio-common.h |  4 +++
> >>  3 files changed, 70 insertions(+)
> >>
> >> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> >> index a2cfbd5af2e1..c01f08b659d0 100644
> >> --- a/hw/vfio/migration.c
> >> +++ b/hw/vfio/migration.c
> >> @@ -78,6 +78,60 @@ err:
> >>  return ret;
> >>  }
> >>  
> >> +static int vfio_migration_set_state(VFIODevice *vbasedev, uint32_t state)
> >> +{
> >> +VFIOMigration *migration = vbasedev->migration;
> >> +VFIORegion *region = &migration->region.buffer;
> >> +uint32_t device_state;
> >> +int ret = 0;
> >> +
> >> +device_state = (state & VFIO_DEVICE_STATE_MASK) |
> >> +   (vbasedev->device_state & ~VFIO_DEVICE_STATE_MASK);
> >> +
> >> +if ((device_state & VFIO_DEVICE_STATE_MASK) == VFIO_DEVICE_STATE_INVALID) {
> >> +return -EINVAL;
> >> +}
> >> +
> >> +ret = pwrite(vbasedev->fd, &device_state, sizeof(device_state),
> >> + region->fd_offset + offsetof(struct vfio_device_migration_info,
> >> +  device_state));
> >> +if (ret < 0) {
> >> +error_report("%s: Failed to set device state %d %s",
> >> + vbasedev->name, ret, strerror(errno));
> >> +return ret;
> >> +}
> >> +
> >> +vbasedev->device_state = device_state;
> >> +trace_vfio_migration_set_state(vbasedev->name, device_state);
> >> +return 0;
> >> +}
> >> +
> >> +static void vfio_vmstate_change(void *opaque, int running, RunState state)
> >> +{
> >> +VFIODevice *vbasedev = opaque;
> >> +
> >> +if ((vbasedev->vm_running != running)) {
> >> +int ret;
> >> +uint32_t dev_state;
> >> +
> >> +if (running) {
> >> +dev_state = VFIO_DEVICE_STATE_RUNNING;
> > should be
> > dev_state |= VFIO_DEVICE_STATE_RUNNING; ?
> > 
> 
> vfio_migration_set_state() takes care of ORing.
>
If the previous device state is VFIO_DEVICE_STATE_SAVING (without RUNNING) and
vfio_migration_set_state(VFIO_DEVICE_STATE_RUNNING) is called here, do
you mean vfio_migration_set_state() will change the device state to
VFIO_DEVICE_STATE_RUNNING | VFIO_DEVICE_STATE_SAVING?
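Going by the combination rule in the quoted vfio_migration_set_state(), only bits outside VFIO_DEVICE_STATE_MASK are carried over from the old state. A minimal sketch, assuming RUNNING, SAVING and RESUMING are bits 0, 1 and 2 and that the mask covers exactly those three bits (as in the migration uAPI proposed with this series; treat the values as an assumption):

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit values; check them against the series' uAPI header. */
#define STATE_RUNNING  (1u << 0)
#define STATE_SAVING   (1u << 1)
#define STATE_RESUMING (1u << 2)
#define STATE_MASK     (STATE_RUNNING | STATE_SAVING | STATE_RESUMING)

/* Same rule as the quoted code: requested bits inside the mask replace the
 * old ones; only bits outside the mask survive from the old state. */
static uint32_t combine_state(uint32_t old_state, uint32_t requested)
{
    return (requested & STATE_MASK) | (old_state & ~STATE_MASK);
}
```

Under these assumptions SAVING is dropped rather than ORed in: combining an old state of SAVING with a request for RUNNING yields plain RUNNING.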

Thanks
Yan


> Thanks,
> Kirti
> 
> >> +} else {
> >> +dev_state = (vbasedev->device_state & VFIO_DEVICE_STATE_MASK) &
> >> + ~VFIO_DEVICE_STATE_RUNNING;
> >> +}
> >> +
> >> +ret = vfio_migration_set_state(vbasedev, dev_state);
> >> +if (ret) {
> >> +error_report("%s: Failed to set device state 0x%x",
> >> + vbasedev->name, dev_state);
> >> +}
> >> +vbasedev->vm_running = running;
> >> +trace_vfio_vmstate_change(vbasedev->name, running, RunState_str(state),
> >> +  dev_state);
> >> +}
> >> +}
> >> +
> >>  static int vfio_migration_init(VFIODevice *vbasedev,
> >> struct vfio_region_info *info)
> >>  {
> >> @@ -93,6 +147,11 @@ static int vfio_migration_init(VFIODevice *vbasedev,
> >>  return ret;
> >>  }
> >>  
> >> +qemu_mutex_init(&vbasedev->migration->lock);
> >> +
> >> +vbasedev->vm_state = qemu_add_vm_change_state_handler(vfio_vmstate_change,
> >> +  vbasedev);
> >> +
> >>  return 0;
> >>  }
> >>  
> >> @@ -135,11 +194,16 @@ void vfio_migration_finalize(VFIODevice *vbasedev)
> >>  return;
> >>  }
> >>  
> >> +if (vbasedev->vm_state) {
> >> +qemu_del_vm_change_state_handler(vbasedev->vm_state);
> >> +}
> >> +
> >>  if (vbasedev->migration_blocker) {
> >>  migrate_del_blocker(vbasedev->migration_blocker);
> >>  error_free(vbasedev->migration_blocker);
> >>  }
> >>  
> >> +qemu_mutex_destroy(&vbasedev->migration->lock);
> >>  vfio_migration_region_exit(vbasedev);
> >>  g_free(vbasedev->migration);
> >>  }
> >> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
> >> index 191a726a1312..3d15bacd031a 100644
> >> --- a/hw/vfio/trace-events
> >> +++ b/hw/vfio/trace-events
> >> @@ -146,3 +146,5 @@ vfio_display_edid_write_error(void) ""

Re: [Qemu-devel] [PATCH v7 08/13] vfio: Register SaveVMHandlers for VFIO device

2019-08-22 Thread Yan Zhao

On Wed, Aug 21, 2019 at 04:33:06AM +0800, Kirti Wankhede wrote:
> 
> 
> On 7/22/2019 2:04 PM, Yan Zhao wrote:
> > On Tue, Jul 09, 2019 at 05:49:15PM +0800, Kirti Wankhede wrote:
> >> Define flags to be used as delimiters in the migration file stream.
> >> Added .save_setup and .save_cleanup functions. Mapped & unmapped migration
> >> region from these functions at source during saving or pre-copy phase.
> >> Set VFIO device state depending on VM's state. During live migration, VM is
> >> running when .save_setup is called, _SAVING | _RUNNING state is set for VFIO
> >> device. During save-restore, VM is paused, _SAVING state is set for VFIO device.
> >>
> >> Signed-off-by: Kirti Wankhede 
> >> Reviewed-by: Neo Jia 
> >> ---
> >>  hw/vfio/migration.c  | 82 +++-
> >>  hw/vfio/trace-events |  2 ++
> >>  2 files changed, 83 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> >> index e4a89a6f9bc7..0597a45fda2d 100644
> >> --- a/hw/vfio/migration.c
> >> +++ b/hw/vfio/migration.c
> >> @@ -23,6 +23,17 @@
> >>  #include "pci.h"
> >>  #include "trace.h"
> >>  
> >> +/*
> >> + * Flags used as delimiter:
> >> + * 0xffffffff => MSB 32-bit all 1s
> >> + * 0xef10     => emulated (virtual) function IO
> >> + * 0x0000     => 16-bits reserved for flags
> >> + */
> >> +#define VFIO_MIG_FLAG_END_OF_STATE      (0xffffffffef100001ULL)
> >> +#define VFIO_MIG_FLAG_DEV_CONFIG_STATE  (0xffffffffef100002ULL)
> >> +#define VFIO_MIG_FLAG_DEV_SETUP_STATE   (0xffffffffef100003ULL)
> >> +#define VFIO_MIG_FLAG_DEV_DATA_STATE    (0xffffffffef100004ULL)
> >> +
> >>  static void vfio_migration_region_exit(VFIODevice *vbasedev)
> >>  {
> >>  VFIOMigration *migration = vbasedev->migration;
> >> @@ -106,6 +117,74 @@ static int vfio_migration_set_state(VFIODevice *vbasedev, uint32_t state)
> >>  return 0;
> >>  }
> >>  
> >> +/* -- 
> >> */
> >> +
> >> +static int vfio_save_setup(QEMUFile *f, void *opaque)
> >> +{
> >> +VFIODevice *vbasedev = opaque;
> >> +VFIOMigration *migration = vbasedev->migration;
> >> +int ret;
> >> +
> >> +qemu_put_be64(f, VFIO_MIG_FLAG_DEV_SETUP_STATE);
> >> +
> >> +if (migration->region.buffer.mmaps) {
> >> +qemu_mutex_lock_iothread();
> >> +ret = vfio_region_mmap(&migration->region.buffer);
> >> +qemu_mutex_unlock_iothread();
> >> +if (ret) {
> >> +error_report("%s: Failed to mmap VFIO migration region %d: %s",
> >> + vbasedev->name, migration->region.index,
> >> + strerror(-ret));
> >> +return ret;
> >> +}
> >> +}
> >> +
> >> +if (vbasedev->vm_running) {
> >> +ret = vfio_migration_set_state(vbasedev,
> >> + VFIO_DEVICE_STATE_RUNNING | VFIO_DEVICE_STATE_SAVING);
> >> +if (ret) {
> >> +error_report("%s: Failed to set state RUNNING and SAVING",
> >> + vbasedev->name);
> >> +return ret;
> >> +}
> >> +} else {
> > hi Kirti
> > May I know in which condition will this "else" case happen?
> > 
> 
> This can happen in the savevm case.

OK, I see it, thanks.
Could we simplify the logic and just OR VFIO_DEVICE_STATE_SAVING into the
current device state here?
The device state was already set to RUNNING or STOP in
vfio_vmstate_change().
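A minimal sketch of the suggested simplification, assuming (as the suggestion does) that vfio_vmstate_change() has already left the RUNNING bit set exactly when the VM is running, and assuming RUNNING and SAVING are bits 0 and 1. The function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit values, as in the proposed migration uAPI. */
#define STATE_RUNNING (1u << 0)
#define STATE_SAVING  (1u << 1)

/* Suggested form: keep whatever RUNNING/STOP state is already set and
 * just add SAVING. */
static uint32_t save_setup_state(uint32_t current)
{
    return current | STATE_SAVING;
}

/* The two-branch form from the patch, for comparison. */
static uint32_t save_setup_state_branches(int vm_running)
{
    return vm_running ? (STATE_RUNNING | STATE_SAVING) : STATE_SAVING;
}
```

Both forms agree as long as the device state tracks the VM run state; the single OR removes the need for the separate savevm branch.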

Thanks
Yan
> 
> Thanks,
> Kirti
> 
> > Thanks
> > Yan
> > 
> >> +ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_SAVING);
> >> +if (ret) {
> >> +error_report("%s: Failed to set state STOP and SAVING",
> >> + vbasedev->name);
> >> +return ret;
> >> +}
> >> +}
> >> +
> >> +qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
> >> +
> >> +ret = qemu_file_get_error(f);
> >> +if (ret) {
> >> +return ret;
> >> +}
> >> +
> >> +trace_vfio_save_setup(vbasedev->name);
> >> +return 0;
> >> +}
> >> +
> >> +static void vfio_save_cleanup(void *opaque)
> >> +{
> >> +VFIODevice *vbasedev = opaque;
> >> +VFIOMigration *migration = vbasedev->migration;
> >> +
> >> +if (migration->region.buffer.mmaps) {
> >> +vfio_region_unmap(&migration->region.buffer);
> >> +}
> >> +trace_vfio_save_cleanup(vbasedev->name);
> >> +}
> >> +
> >> +static SaveVMHandlers savevm_vfio_handlers = {
> >> +.save_setup = vfio_save_setup,
> >> +.save_cleanup = vfio_save_cleanup,
> >> +};
> >> +
> >> +/* -- 
> >> */
> >> +
> >>  static void vfio_vmstate_change(void *opaque, int running, RunState state)
> >>  {
> >>  VFIODevice *vbasedev = opaque;
> >> @@ -195,7 +274,8 @@ static int vfio_migration_init(VFIODevice *vbasedev,
> >>  }
> >>  
> >>  qemu_mutex_init(&vbasedev->migration->lock);
> >> -
> >> +
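The delimiter comment in the patch describes a 64-bit layout: all ones in the upper 32 bits, 0xef10 in bits 31..16, and a 16-bit flag value in the low bits. A minimal sketch of building and recognising such words, assuming the full 64-bit values follow that layout (the helper names are hypothetical, not part of the patch):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Layout from the patch comment:
 *   bits 63..32: all ones
 *   bits 31..16: 0xef10 (emulated function IO)
 *   bits 15..0 : flag value */
#define MIG_FLAG_PREFIX      0xffffffffef100000ULL
#define MIG_FLAG_PREFIX_MASK 0xffffffffffff0000ULL

static uint64_t mig_flag(uint16_t value)
{
    return MIG_FLAG_PREFIX | value;
}

/* True if a 64-bit word read from the stream carries the delimiter prefix. */
static bool is_mig_flag(uint64_t word)
{
    return (word & MIG_FLAG_PREFIX_MASK) == MIG_FLAG_PREFIX;
}
```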

[Qemu-devel] [PATCH v9 00/20] Invert Endian bit in SPARCv9 MMU TTE

2019-08-22 Thread Tony Nguyen

From: Tony Nguyen 

This patchset implements the IE (Invert Endian) bit in SPARCv9 MMU TTE.

It is an attempt to follow the instructions outlined by Richard Henderson to Mark
Cave-Ayland.

Tested with OpenBSD on sun4u. Solaris 10 is my actual goal, but unfortunately a
separate keyboard issue remains in the way.

On 01/11/17 19:15, Mark Cave-Ayland wrote:

>On 15/08/17 19:10, Richard Henderson wrote:
>
>> [CC Peter re MemTxAttrs below]
>>
>> On 08/15/2017 09:38 AM, Mark Cave-Ayland wrote:
>>> Working through an incorrect endian issue on qemu-system-sparc64, it has
>>> become apparent that at least one OS makes use of the IE (Invert Endian)
>>> bit in the SPARCv9 MMU TTE to map PCI memory space without the
>>> programmer having to manually endian-swap accesses.
>>>
>>> In other words, to quote the UltraSPARC specification: "if this bit is
>>> set, accesses to the associated page are processed with inverse
>>> endianness from what is specified by the instruction (big-for-little and
>>> little-for-big)".

A good explanation by Mark of why the IE bit is required.

>>>
>>> Looking through various bits of code, I'm trying to get a feel for the
>>> best way to implement this in an efficient manner. From what I can see
>>> this could be solved using an additional MMU index, however I'm not
>>> overly familiar with the memory and softmmu subsystems.
>>
>> No, it can't be solved with an MMU index.
>>
>>> Can anyone point me in the right direction as to what would be the best
>>> way to implement this feature within QEMU?
>>
>> It's definitely tricky.
>>
>> We definitely need some TLB_FLAGS_MASK bit set so that we're forced through
>> the memory slow path.  There is no other way to bypass the endianness that we've
>> already encoded from the target instruction.
>>
>> Given the tlb_set_page_with_attrs interface, I would think that we need a new
>> bit in MemTxAttrs, so that the target/sparc tlb_fill (and subroutines) can
>> pass along the TTE bit for the given page.
>>
>> We have an existing problem in softmmu_template.h,
>>
>> /* ??? Note that the io helpers always read data in the target
>>byte ordering.  We should push the LE/BE request down into io.  */
>> res = glue(io_read, SUFFIX)(env, mmu_idx, index, addr, retaddr);
>> res = TGT_BE(res);
>>
>> We do not want to add a third(!) byte swap along the i/o path.  We need to
>> collapse the two that we have already before considering this one.
>>
>> This probably takes the form of:
>>
>> (1) Replacing the "int size" argument with "TCGMemOp memop" for
>>   a) io_{read,write}x in accel/tcg/cputlb.c,
>>   b) memory_region_dispatch_{read,write} in memory.c,
>>   c) adjust_endianness in memory.c.
>> This carries size+sign+endianness down to the next level.
>>
>> (2) In memory.c, adjust_endianness,
>>
>>  if (memory_region_wrong_endianness(mr)) {
>> -switch (size) {
>> +memop ^= MO_BSWAP;
>> +}
>> +if (memop & MO_BSWAP) {
>>
>> For extra credit, re-arrange memory_region_wrong_endianness
>> to something more explicit -- "wrong" isn't helpful.
>
>Finally I've had a bit of spare time to experiment with this approach,
>and from what I can see there are currently 2 issues:
>
>
>1) Using TCGMemOp in memory.c means it is no longer accelerator agnostic
>
>For the moment I've defined a separate MemOp in memory.h and provided a
>mapping function in io_{read,write}x to map from TCGMemOp to MemOp and
>then pass that into memory_region_dispatch_{read,write}.
>
>Other than not referencing TCGMemOp in the memory API, another reason
>for doing this was that I wasn't convinced that all the MO_ attributes
>were valid outside of TCG. I do, of course, strongly defer to other
>people's knowledge in this area though.
>
>
>2) The above changes to adjust_endianness() fail when
>memory_region_dispatch_{read,write} are called recursively
>
>Whilst booting qemu-system-sparc64 I see that
>memory_region_dispatch_{read,write} get called recursively - once via
>io_{read,write}x and then again via flatview_read_continue() in exec.c.
>
>The net effect of this is that we perform the bswap correctly at the
>tail of the recursion, but then as we travel back up the stack we hit
>memory_region_dispatch_{read,write} once again causing a second bswap
>which means the value is returned with the incorrect endian again.
>
>
>My understanding from your softmmu_template.h comment above is that the
>memory API should do the endian swapping internally allowing the removal
>of the final TGT_BE/TGT_LE applied to the result, or did I get this wrong?
>
>> (3) In tlb_set_page_with_attrs, notice attrs.byte_swap and set
>> a new TLB_FORCE_SLOW bit within TLB_FLAGS_MASK.
>>
>> (4) In io_{read,write}x, if iotlbentry->attrs.byte_swap is set,
>> then memop ^= MO_BSWAP.
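Steps (2) and (4) express each reason to swap as a toggle of one flag bit, so the requests compose by XOR and the data itself is swapped at most once. A minimal sketch that folds both toggles into one function for illustration; MO_BSWAP_BIT is a hypothetical stand-in for QEMU's MO_BSWAP and dispatch_read() is not a real QEMU function:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bit standing in for MO_BSWAP. */
#define MO_BSWAP_BIT 0x8u

static uint32_t bswap32_portable(uint32_t v)
{
    return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8) |
           ((v & 0x00ff0000u) >> 8)  | ((v & 0xff000000u) >> 24);
}

/* Each condition toggles the swap request; the value is swapped once at
 * the end, so two toggles cancel instead of swapping twice. */
static uint32_t dispatch_read(uint32_t raw, unsigned memop,
                              bool region_wrong_endian, bool tte_invert_endian)
{
    if (region_wrong_endian) {   /* step (2): adjust_endianness */
        memop ^= MO_BSWAP_BIT;
    }
    if (tte_invert_endian) {     /* step (4): attrs.byte_swap from the IE bit */
        memop ^= MO_BSWAP_BIT;
    }
    return (memop & MO_BSWAP_BIT) ? bswap32_portable(raw) : raw;
}
```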

Thanks all for the feedback. Learnt a lot =)

v2:
- Moved size+sign+endianness attributes from TCGMemOp into MemOp.
  In v1 TCGMemOp was re-purposed entirely into MemOp.
- Replaced MemOp

Re: [Qemu-devel] [PATCH 2/6] exec.c: remove an unnecessary assert on PHYS_MAP_NODE_NIL in phys_map_node_alloc()

2019-08-22 Thread Wei Yang

On Thu, Aug 22, 2019 at 12:24:32PM +0200, Paolo Bonzini wrote:
>On 21/03/19 09:25, Wei Yang wrote:
>> PHYS_MAP_NODE_NIL is assigned to PhysPageEntry.ptr in case this is not a
>> leaf entry, while map->nodes_nb range in [0, nodes_nb_alloc).
>> 
>> Seems we are asserting on two different things, just remove it.
>
>The assertion checks that this "if" is not entered incorrectly:
>
>if (lp->skip && lp->ptr == PHYS_MAP_NODE_NIL) {
>lp->ptr = phys_map_node_alloc(map, level == 0);
>}
>

Hmm... I may not be following your point.

phys_map_node_alloc() gets an available PhysPageEntry and returns its
index, which is assigned to its parent's ptr.

The "if" checks the parent's ptr, while the assertion checks the index of
the new child. Am I missing something?

>Paolo
>
>> Signed-off-by: Wei Yang 
>> ---
>>  exec.c | 1 -
>>  1 file changed, 1 deletion(-)
>> 
>> diff --git a/exec.c b/exec.c
>> index 98ebd0dd1d..8e8b6bb1f9 100644
>> --- a/exec.c
>> +++ b/exec.c
>> @@ -242,7 +242,6 @@ static uint32_t phys_map_node_alloc(PhysPageMap *map, 
>> bool leaf)
>>  
>>  ret = map->nodes_nb++;
>>  p = map->nodes[ret];
>> -assert(ret != PHYS_MAP_NODE_NIL);
>>  assert(ret != map->nodes_nb_alloc);
>>  
>>  e.skip = leaf ? 0 : 1;
>> 
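To make Paolo's point concrete: the index returned by phys_map_node_alloc() is stored in the parent's ptr, and the "if" treats ptr == PHYS_MAP_NODE_NIL as "no child yet". If an allocated index could ever equal the sentinel, the parent would look unallocated again. A simplified model, assuming a 26-bit ptr bitfield whose all-ones value is the sentinel; all names here are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PTR_BITS 26
#define NODE_NIL ((UINT32_C(1) << PTR_BITS) - 1) /* all ones in the ptr field */

typedef struct {
    uint32_t skip : 6;
    uint32_t ptr : PTR_BITS;
} Entry;

typedef struct {
    uint32_t nodes_nb;
    uint32_t nodes_nb_alloc;
} NodeMap;

static uint32_t node_alloc(NodeMap *map)
{
    uint32_t ret = map->nodes_nb++;

    /* An index equal to NODE_NIL would be indistinguishable from
     * "not allocated" once stored in a parent's ptr. */
    assert(ret != NODE_NIL);
    assert(ret != map->nodes_nb_alloc);
    return ret;
}

/* Mirrors: if (lp->skip && lp->ptr == PHYS_MAP_NODE_NIL) { allocate } */
static bool needs_child(const Entry *lp)
{
    return lp->skip && lp->ptr == NODE_NIL;
}
```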

-- 
Wei Yang
Help you, Help me

Re: [Qemu-devel] [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

2019-08-22 Thread Kinney, Michael D

Paolo,

I find the following links related to the discussions here
along with one example feature called GENPROTRANGE.

https://csrc.nist.gov/CSRC/media/Presentations/The-Whole-is-Greater/images-media/day1_trusted-computing_200-250.pdf
https://cansecwest.com/slides/2017/CSW2017_Cuauhtemoc-Rene_CPU_Hot-Add_flow.pdf
https://www.mouser.com/ds/2/612/5520-5500-chipset-ioh-datasheet-1131292.pdf

Best regards,

Mike

> -----Original Message-----
> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> Sent: Thursday, August 22, 2019 4:12 PM
> To: Kinney, Michael D ;
> Laszlo Ersek ; r...@edk2.groups.io;
> Yao, Jiewen 
> Cc: Alex Williamson ;
> de...@edk2.groups.io; qemu devel list  de...@nongnu.org>; Igor Mammedov ;
> Chen, Yingwen ; Nakajima, Jun
> ; Boris Ostrovsky
> ; Joao Marcal Lemos Martins
> ; Phillip Goerl
> 
> Subject: Re: [edk2-rfc] [edk2-devel] CPU hotplug using
> SMM with QEMU+OVMF
> 
> On 23/08/19 00:32, Kinney, Michael D wrote:
> > Paolo,
> >
> > It is my understanding that real HW hot plug uses the SDM defined
> > methods.  Meaning the initial SMI is to 3000:8000 and they rebase to
> > TSEG in the first SMI.  They must have chipset specific methods to
> > protect 3000:8000 from DMA.
> 
> It would be great if you could check.
> 
> > Can we add a chipset feature to prevent DMA to 64KB range from
> > 0x30000-0x3FFFF and the UEFI Memory Map and ACPI content can be
> > updated so the Guest OS knows to not use that range for DMA?
> 
> If real hardware does it at the chipset level, we will probably use
> Igor's suggestion of aliasing A-seg to 3000:0000.  Before starting the
> new CPU, the SMI handler can prepare the SMBASE relocation trampoline at
> A000:8000 and the hot-plugged CPU will find it at 3000:8000 when it
> receives the initial SMI.  Because this is backed by RAM at
> 0xA0000-0xAFFFF, DMA cannot access it and would still go through to RAM
> at 0x30000.
> 
> Paolo
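The address arithmetic behind the aliasing idea: per the Intel SDM, the default SMBASE is 0x30000 and the SMI entry point is SMBASE + 0x8000, i.e. real-mode 3000:8000. A minimal sketch of the CPU-side view only, assuming a 64 KiB alias window at 0x30000 redirected into the A-segment; apply_aseg_alias() is hypothetical and says nothing about how QEMU would implement the alias (DMA is meant to keep seeing ordinary RAM):

```c
#include <assert.h>
#include <stdint.h>

/* Real-mode segment:offset to linear address. */
static uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* Hypothetical CPU-side alias: 0x30000-0x3FFFF maps onto 0xA0000-0xAFFFF. */
static uint32_t apply_aseg_alias(uint32_t addr)
{
    if (addr >= 0x30000u && addr <= 0x3FFFFu) {
        return 0xA0000u + (addr - 0x30000u);
    }
    return addr;
}
```

So a trampoline written at A000:8000 (linear 0xA8000) is what a CPU fetching its first SMI instruction at 3000:8000 (linear 0x38000) would see through the alias.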

Re: [Qemu-devel] [PATCH v7 07/13] vfio: Add migration state change notifier

2019-08-22 Thread Yan Zhao

On Wed, Aug 21, 2019 at 04:24:27AM +0800, Kirti Wankhede wrote:
> 
> 
> On 7/17/2019 7:55 AM, Yan Zhao wrote:
> > On Tue, Jul 09, 2019 at 05:49:14PM +0800, Kirti Wankhede wrote:
> >> Added migration state change notifier to get notification on migration state
> >> change. These states are translated to VFIO device state and conveyed to vendor
> >> driver.
> >>
> >> Signed-off-by: Kirti Wankhede 
> >> Reviewed-by: Neo Jia 
> >> ---
> >>  hw/vfio/migration.c   | 54 +++
> >>  hw/vfio/trace-events  |  1 +
> >>  include/hw/vfio/vfio-common.h |  1 +
> >>  3 files changed, 56 insertions(+)
> >>
> >> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> >> index c01f08b659d0..e4a89a6f9bc7 100644
> >> --- a/hw/vfio/migration.c
> >> +++ b/hw/vfio/migration.c
> >> @@ -132,6 +132,53 @@ static void vfio_vmstate_change(void *opaque, int running, RunState state)
> >>  }
> >>  }
> >>  
> >> +static void vfio_migration_state_notifier(Notifier *notifier, void *data)
> >> +{
> >> +MigrationState *s = data;
> >> +VFIODevice *vbasedev = container_of(notifier, VFIODevice, migration_state);
> >> +int ret;
> >> +
> >> +trace_vfio_migration_state_notifier(vbasedev->name, s->state);
> >> +
> >> +switch (s->state) {
> >> +case MIGRATION_STATUS_ACTIVE:
> >> +if (vbasedev->device_state & VFIO_DEVICE_STATE_RUNNING) {
> >> +if (vbasedev->vm_running) {
> >> +ret = vfio_migration_set_state(vbasedev,
> >> +  VFIO_DEVICE_STATE_RUNNING | VFIO_DEVICE_STATE_SAVING);
> >> +if (ret) {
> >> +error_report("%s: Failed to set state RUNNING and SAVING",
> >> +  vbasedev->name);
> >> +}
> >> +} else {
> >> +ret = vfio_migration_set_state(vbasedev,
> >> +   VFIO_DEVICE_STATE_SAVING);
> >> +if (ret) {
> >> +error_report("%s: Failed to set state STOP and SAVING",
> >> + vbasedev->name);
> >> +}
> >> +}
> >> +} else {
> >> +ret = vfio_migration_set_state(vbasedev,
> >> +   VFIO_DEVICE_STATE_RESUMING);
> >> +if (ret) {
> >> +error_report("%s: Failed to set state RESUMING",
> >> + vbasedev->name);
> >> +}
> >> +}
> >> +return;
> >> +
> > hi Kirti
> > Currently, migration state notifiers are only notified from these 3
> > interfaces:
> > migrate_fd_connect, migrate_fd_cleanup, postcopy_start, where
> > MIGRATION_STATUS_ACTIVE is not a valid state.
> > Have you tested the above code? What's the purpose of the code?
> > 
> 
> Sorry for delayed response.
> 
> migration_iteration_finish() -> qemu_bh_schedule(s->cleanup_bh) which is
> migrate_fd_cleanup().
> 
> migration_iteration_finish() can be called with MIGRATION_STATUS_ACTIVE
> state. So migration state notifiers can be called with
> MIGRATION_STATUS_ACTIVE. So handled that case here.
>
hi Kirti

I checked the code: the MIGRATION_STATUS_ACTIVE case you mentioned is
COLO-only, and there's actually an assert in migrate_fd_cleanup

assert((s->state != MIGRATION_STATUS_ACTIVE) &&
(s->state != MIGRATION_STATUS_POSTCOPY_ACTIVE));

before it calls notifier_list_notify(&migration_state_notifiers, s).

Thanks
Yan

> 
> 
> > 
> >> +case MIGRATION_STATUS_CANCELLING:
> >> +case MIGRATION_STATUS_CANCELLED:
> >> +case MIGRATION_STATUS_FAILED:
> >> +ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RUNNING);
> >> +if (ret) {
> >> +error_report("%s: Failed to set state RUNNING", vbasedev->name);
> >> +}
> >> +return;
> >> +}
> >> +}
> >> +
> >>  static int vfio_migration_init(VFIODevice *vbasedev,
> >> struct vfio_region_info *info)
> >>  {
> >> @@ -152,6 +199,9 @@ static int vfio_migration_init(VFIODevice *vbasedev,
> >>  vbasedev->vm_state = qemu_add_vm_change_state_handler(vfio_vmstate_change,
> >>vbasedev);
> >>  
> >> +vbasedev->migration_state.notify = vfio_migration_state_notifier;
> >> +add_migration_state_change_notifier(&vbasedev->migration_state);
> >> +
> >>  return 0;
> >>  }
> >>  
> >> @@ -194,6 +244,10 @@ void vfio_migration_finalize(VFIODevice *vbasedev)
> >>  return;
> >>  }
> >>  
> >> +if (vbasedev->migration_state.notify) {
> >> +remove_migration_state_change_notifier(&vbasedev->migration_state);
> >> +}
> >> +
> >>  if (vbasedev->vm_state) {
> >>  qemu_del_vm_change_state_handler(vbasedev->vm_state);
> >>  }
> >> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
> >> index

Re: [Qemu-devel] [PATCH v7 04/13] vfio: Add save and load functions for VFIO PCI devices

2019-08-22 Thread Tian, Kevin

> From: Dr. David Alan Gilbert [mailto:dgilb...@redhat.com]
> Sent: Friday, August 23, 2019 3:13 AM
> 
> * Kirti Wankhede (kwankh...@nvidia.com) wrote:
> >
> >
> > On 8/22/2019 3:02 PM, Dr. David Alan Gilbert wrote:
> > > * Kirti Wankhede (kwankh...@nvidia.com) wrote:
> > >> Sorry for delay to respond.
> > >>
> > >> On 7/11/2019 5:37 PM, Dr. David Alan Gilbert wrote:
> > >>> * Kirti Wankhede (kwankh...@nvidia.com) wrote:
> >  These functions save and restore PCI device specific data - config
> >  space of PCI device.
> >  Tested save and restore with MSI and MSIX type.
> > 
> >  Signed-off-by: Kirti Wankhede 
> >  Reviewed-by: Neo Jia 
> >  ---
> >   hw/vfio/pci.c | 114 ++
> >   include/hw/vfio/vfio-common.h |   2 +
> >   2 files changed, 116 insertions(+)
> > 
> >  diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> >  index de0d286fc9dd..5fe4f8076cac 100644
> >  --- a/hw/vfio/pci.c
> >  +++ b/hw/vfio/pci.c
> >  @@ -2395,11 +2395,125 @@ static Object *vfio_pci_get_object(VFIODevice *vbasedev)
> >   return OBJECT(vdev);
> >   }
> > 
> >  +static void vfio_pci_save_config(VFIODevice *vbasedev, QEMUFile *f)
> >  +{
> >  +VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
> >  +PCIDevice *pdev = &vdev->pdev;
> >  +uint16_t pci_cmd;
> >  +int i;
> >  +
> >  +for (i = 0; i < PCI_ROM_SLOT; i++) {
> >  +uint32_t bar;
> >  +
> >  +bar = pci_default_read_config(pdev, PCI_BASE_ADDRESS_0 + i * 4, 4);
> >  +qemu_put_be32(f, bar);
> >  +}
> >  +
> >  +qemu_put_be32(f, vdev->interrupt);
> >  +if (vdev->interrupt == VFIO_INT_MSI) {
> >  +uint32_t msi_flags, msi_addr_lo, msi_addr_hi = 0, msi_data;
> >  +bool msi_64bit;
> >  +
> >  +msi_flags = pci_default_read_config(pdev, pdev->msi_cap + PCI_MSI_FLAGS,
> >  +2);
> >  +msi_64bit = (msi_flags & PCI_MSI_FLAGS_64BIT);
> >  +
> >  +msi_addr_lo = pci_default_read_config(pdev,
> >  + pdev->msi_cap + PCI_MSI_ADDRESS_LO, 4);
> >  +qemu_put_be32(f, msi_addr_lo);
> >  +
> >  +if (msi_64bit) {
> >  +msi_addr_hi = pci_default_read_config(pdev,
> >  + pdev->msi_cap + PCI_MSI_ADDRESS_HI,
> >  + 4);
> >  +}
> >  +qemu_put_be32(f, msi_addr_hi);
> >  +
> >  +msi_data = pci_default_read_config(pdev,
> >  +pdev->msi_cap + (msi_64bit ? PCI_MSI_DATA_64 : PCI_MSI_DATA_32),
> >  +2);
> >  +qemu_put_be32(f, msi_data);
> >  +} else if (vdev->interrupt == VFIO_INT_MSIX) {
> >  +uint16_t offset;
> >  +
> >  +/* save enable bit and maskall bit */
> >  +offset = pci_default_read_config(pdev,
> >  +   pdev->msix_cap + PCI_MSIX_FLAGS + 1, 2);
> >  +qemu_put_be16(f, offset);
> >  +msix_save(pdev, f);
> >  +}
> >  +pci_cmd = pci_default_read_config(pdev, PCI_COMMAND, 2);
> >  +qemu_put_be16(f, pci_cmd);
> >  +}
> >  +
> >  +static void vfio_pci_load_config(VFIODevice *vbasedev, QEMUFile *f)
> >  +{
> >  +VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
> >  +PCIDevice *pdev = &vdev->pdev;
> >  +uint32_t interrupt_type;
> >  +uint32_t msi_flags, msi_addr_lo, msi_addr_hi = 0, msi_data;
> >  +uint16_t pci_cmd;
> >  +bool msi_64bit;
> >  +int i;
> >  +
> >  +/* restore pci bar configuration */
> >  +pci_cmd = pci_default_read_config(pdev, PCI_COMMAND, 2);
> >  +vfio_pci_write_config(pdev, PCI_COMMAND,
> >  +pci_cmd & ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY), 2);
> >  +for (i = 0; i < PCI_ROM_SLOT; i++) {
> >  +uint32_t bar = qemu_get_be32(f);
> >  +
> >  +vfio_pci_write_config(pdev, PCI_BASE_ADDRESS_0 + i * 4, bar, 4);
> >  +}
> > >>>
> > >>> Is it possible to validate the bar's at all?  We just had a bug on a
> > >>> virtual device where one version was asking for a larger bar than the
> > >>> other; our validation caught this in some cases so we could tell that
> > >>> the guest had a BAR that was aligned at the wrong alignment.

I'm a bit confused here. Do you mean that src and dest run different
versions of the virtual device, which implement different BAR sizes? If
that is the case, shouldn't the migration fail at the start, when doing
the compatibility check?
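One form the BAR validation Dave mentions could take is checking that each restored memory BAR is naturally aligned to the destination device's BAR size, which would catch a source that used a different (larger or smaller) BAR. A minimal sketch for a 32-bit memory BAR; bar_matches_size() is hypothetical and not part of the patch, and the low 4 bits are masked off because they hold the memory BAR's type and prefetch flags:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BAR_MEM_ADDR_MASK 0xfffffff0u /* low 4 bits of a memory BAR are flags */

/* Hypothetical check: the restored BAR address must be aligned to the
 * destination BAR size, which must be a power of two. */
static bool bar_matches_size(uint32_t bar, uint32_t size)
{
    uint32_t addr = bar & BAR_MEM_ADDR_MASK;

    if (size == 0 || (size & (size - 1)) != 0) {
        return false; /* not a valid BAR size */
    }
    return (addr & (size - 1)) == 0;
}
```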

> > >>>
> > >>
> > >>

Re: [Qemu-devel] [PATCH 0/2] Adding some setsockopt() options

2019-08-22 Thread no-reply

Patchew URL: https://patchew.org/QEMU/20190822231443.172099-1-...@google.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Subject: [Qemu-devel] [PATCH 0/2] Adding some setsockopt() options
Message-id: 20190822231443.172099-1-...@google.com

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Submodule 'capstone' (https://git.qemu.org/git/capstone.git) registered for path 'capstone'
Submodule 'dtc' (https://git.qemu.org/git/dtc.git) registered for path 'dtc'
Submodule 'roms/QemuMacDrivers' (https://git.qemu.org/git/QemuMacDrivers.git) registered for path 'roms/QemuMacDrivers'
Submodule 'roms/SLOF' (https://git.qemu.org/git/SLOF.git) registered for path 'roms/SLOF'
Submodule 'roms/edk2' (https://git.qemu.org/git/edk2.git) registered for path 'roms/edk2'
Submodule 'roms/ipxe' (https://git.qemu.org/git/ipxe.git) registered for path 'roms/ipxe'
Submodule 'roms/openbios' (https://git.qemu.org/git/openbios.git) registered for path 'roms/openbios'
Submodule 'roms/openhackware' (https://git.qemu.org/git/openhackware.git) registered for path 'roms/openhackware'
Submodule 'roms/opensbi' (https://git.qemu.org/git/opensbi.git) registered for path 'roms/opensbi'
Submodule 'roms/qemu-palcode' (https://git.qemu.org/git/qemu-palcode.git) registered for path 'roms/qemu-palcode'
Submodule 'roms/seabios' (https://git.qemu.org/git/seabios.git/) registered for path 'roms/seabios'
Submodule 'roms/seabios-hppa' (https://git.qemu.org/git/seabios-hppa.git) registered for path 'roms/seabios-hppa'
Submodule 'roms/sgabios' (https://git.qemu.org/git/sgabios.git) registered for path 'roms/sgabios'
Submodule 'roms/skiboot' (https://git.qemu.org/git/skiboot.git) registered for path 'roms/skiboot'
Submodule 'roms/u-boot' (https://git.qemu.org/git/u-boot.git) registered for path 'roms/u-boot'
Submodule 'roms/u-boot-sam460ex' (https://git.qemu.org/git/u-boot-sam460ex.git) registered for path 'roms/u-boot-sam460ex'
Submodule 'slirp' (https://git.qemu.org/git/libslirp.git) registered for path 'slirp'
Submodule 'tests/fp/berkeley-softfloat-3' (https://git.qemu.org/git/berkeley-softfloat-3.git) registered for path 'tests/fp/berkeley-softfloat-3'
Submodule 'tests/fp/berkeley-testfloat-3' (https://git.qemu.org/git/berkeley-testfloat-3.git) registered for path 'tests/fp/berkeley-testfloat-3'
Submodule 'ui/keycodemapdb' (https://git.qemu.org/git/keycodemapdb.git) registered for path 'ui/keycodemapdb'
Cloning into 'capstone'...
Submodule path 'capstone': checked out '22ead3e0bfdb87516656453336160e0a37b066bf'
Cloning into 'dtc'...
Submodule path 'dtc': checked out '88f18909db731a627456f26d779445f84e449536'
Cloning into 'roms/QemuMacDrivers'...
Submodule path 'roms/QemuMacDrivers': checked out '90c488d5f4a407342247b9ea869df1c2d9c8e266'
Cloning into 'roms/SLOF'...
Submodule path 'roms/SLOF': checked out '7bfe584e321946771692711ff83ad2b5850daca7'
Cloning into 'roms/edk2'...
Submodule path 'roms/edk2': checked out '20d2e5a125e34fc8501026613a71549b2a1a3e54'
Submodule 'SoftFloat' (https://github.com/ucb-bar/berkeley-softfloat-3.git) registered for path 'ArmPkg/Library/ArmSoftFloatLib/berkeley-softfloat-3'
Submodule 'CryptoPkg/Library/OpensslLib/openssl' (https://github.com/openssl/openssl) registered for path 'CryptoPkg/Library/OpensslLib/openssl'
Cloning into 'ArmPkg/Library/ArmSoftFloatLib/berkeley-softfloat-3'...
Submodule path 'roms/edk2/ArmPkg/Library/ArmSoftFloatLib/berkeley-softfloat-3': 
checked out 'b64af41c3276f97f0e181920400ee056b9c88037'
Cloning into 'CryptoPkg/Library/OpensslLib/openssl'...
Submodule path 'roms/edk2/CryptoPkg/Library/OpensslLib/openssl': checked out 
'50eaac9f3337667259de725451f201e784599687'
Submodule 'boringssl' (https://boringssl.googlesource.com/boringssl) registered 
for path 'boringssl'
Submodule 'krb5' (https://github.com/krb5/krb5) registered for path 'krb5'
Submodule 'pyca.cryptography' (https://github.com/pyca/cryptography.git) 
registered for path 'pyca-cryptography'
Cloning into 'boringssl'...
Submodule path 'roms/edk2/CryptoPkg/Library/OpensslLib/openssl/boringssl': 
checked out '2070f8ad9151dc8f3a73bffaa146b5e6937a583f'
Cloning into 'krb5'...
Submodule path 'roms/edk2/CryptoPkg/Library/OpensslLib/openssl/krb5': checked 
out 'b9ad6c49505c96a088326b62a52568e3484f2168'
Cloning into 'pyca-cryptography'...
Submodule path 
'roms/edk2/CryptoPkg/Library/OpensslLib/openssl/pyca-cryptography': checked out 
'09403100de2f6f1cdd0d484dcb8e620f1c335c8f'
Cloning into 'roms/ipxe'...
Submodule path 'roms/ipxe': checked out 
'de4565cbe76ea9f7913a01f331be3ee901bb6e17'
Cloning into 'roms/openbios'...
Submodule path 'roms/openbios': checked out

[Qemu-devel] [PATCH 1/2] linux-user: add missing UDP and IPv6 setsockopt options

2019-08-22 Thread Shu-Chun Weng via Qemu-devel

UDP: SOL_UDP is the level for options that act at the UDP layer. All six
options currently defined in the Linux source include/uapi/linux/udp.h
take integer values.

IPv6: IPV6_ADDR_PREFERENCES (RFC5014: Source address selection) was not
supported.

Signed-off-by: Shu-Chun Weng 
---
 linux-user/syscall.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 8367cb138d..8dc4255f12 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -49,8 +49,10 @@
 #include 
 #include 
 //#include 
+#include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1837,7 +1839,8 @@ static abi_long do_setsockopt(int sockfd, int level, int optname,
 
 switch(level) {
 case SOL_TCP:
-/* TCP options all take an 'int' value.  */
+case SOL_UDP:
+/* TCP and UDP options all take an 'int' value.  */
 if (optlen < sizeof(uint32_t))
 return -TARGET_EINVAL;
 
@@ -2488,6 +2491,7 @@ static abi_long do_getsockopt(int sockfd, int level, int optname,
 case IPV6_RECVDSTOPTS:
 case IPV6_2292DSTOPTS:
 case IPV6_TCLASS:
+case IPV6_ADDR_PREFERENCES:
 #ifdef IPV6_RECVPATHMTU
 case IPV6_RECVPATHMTU:
 #endif
-- 
2.23.0.187.g17f5b7556c-goog
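The optlen check that SOL_TCP and SOL_UDP now share can be illustrated outside QEMU. This is a minimal sketch under stated assumptions, not the code in linux-user/syscall.c: `decode_int_sockopt()` is a hypothetical helper, and it copies the value with `memcpy()` in host byte order, whereas QEMU reads guest memory through its own accessors, which also deal with guest/host endianness.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/* Fall back to the protocol numbers if the libc headers lack SOL_*. */
#ifndef SOL_TCP
#define SOL_TCP IPPROTO_TCP
#endif
#ifndef SOL_UDP
#define SOL_UDP IPPROTO_UDP
#endif

/*
 * Hypothetical helper: TCP and UDP options all take an int, so both
 * levels share one path that rejects buffers shorter than 32 bits.
 */
static int decode_int_sockopt(int level, const void *optval, size_t optlen,
                              int *out)
{
    uint32_t raw;

    assert(optval != NULL && out != NULL);
    switch (level) {
    case SOL_TCP:
    case SOL_UDP:
        if (optlen < sizeof(uint32_t)) {
            return -EINVAL;
        }
        memcpy(&raw, optval, sizeof(raw)); /* host byte order assumed */
        *out = (int)raw;
        return 0;
    default:
        return -ENOPROTOOPT;
    }
}
```

A 4-byte buffer holding 1 decodes to 1 at either level, while a 2-byte buffer is rejected with -EINVAL, analogous to the -TARGET_EINVAL return in the patch.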

[Qemu-devel] [PATCH 2/2] linux-user: time stamping options for setsockopt()

2019-08-22 Thread Shu-Chun Weng via Qemu-devel

This change supports SO_TIMESTAMPNS and SO_TIMESTAMPING for
setsockopt() with SOL_SOCKET.

The TARGET_SO_TIMESTAMP{NS,ING} constants are already defined for
alpha, hppa, and sparc. In include/uapi/asm-generic/socket.h:

In arch/mips/include/uapi/asm/socket.h:

Signed-off-by: Shu-Chun Weng 
---
 linux-user/generic/sockbits.h |  4 
 linux-user/mips/sockbits.h    |  4 
 linux-user/syscall.c          | 10 --
 3 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/linux-user/generic/sockbits.h b/linux-user/generic/sockbits.h
index e44733c601..5cbafdb49b 100644
--- a/linux-user/generic/sockbits.h
+++ b/linux-user/generic/sockbits.h
@@ -51,6 +51,10 @@
 #define TARGET_SO_PEERNAME 28
 #define TARGET_SO_TIMESTAMP    29
 #define TARGET_SCM_TIMESTAMP   TARGET_SO_TIMESTAMP
+#define TARGET_SO_TIMESTAMPNS  35
+#define TARGET_SCM_TIMESTAMPNS TARGET_SO_TIMESTAMPNS
+#define TARGET_SO_TIMESTAMPING 37
+#define TARGET_SCM_TIMESTAMPING TARGET_SO_TIMESTAMPING
 
 #define TARGET_SO_ACCEPTCONN   30
 
diff --git a/linux-user/mips/sockbits.h b/linux-user/mips/sockbits.h
index 0f022cd598..1246b7d988 100644
--- a/linux-user/mips/sockbits.h
+++ b/linux-user/mips/sockbits.h
@@ -63,6 +63,10 @@
 #define TARGET_SO_PEERNAME 28
 #define TARGET_SO_TIMESTAMP    29
 #define SCM_TIMESTAMP  SO_TIMESTAMP
+#define TARGET_SO_TIMESTAMPNS  35
+#define SCM_TIMESTAMPNS SO_TIMESTAMPNS
+#define TARGET_SO_TIMESTAMPING 37
+#define SCM_TIMESTAMPING SO_TIMESTAMPING
 
 #define TARGET_SO_PEERSEC  30
 #define TARGET_SO_SNDBUFFORCE  31
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 8dc4255f12..bac00d3fd4 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -2230,8 +2230,14 @@ set_timeout:
 optname = SO_PASSSEC;
 break;
 case TARGET_SO_TIMESTAMP:
-   optname = SO_TIMESTAMP;
-   break;
+optname = SO_TIMESTAMP;
+break;
+case TARGET_SO_TIMESTAMPNS:
+optname = SO_TIMESTAMPNS;
+break;
+case TARGET_SO_TIMESTAMPING:
+optname = SO_TIMESTAMPING;
+break;
 case TARGET_SO_RCVLOWAT:
optname = SO_RCVLOWAT;
break;
-- 
2.23.0.187.g17f5b7556c-goog
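The new `case` labels in do_setsockopt() translate a guest SOL_SOCKET option number into the host's own constant. A minimal stand-alone sketch of that translation, assuming hypothetical `EX_TARGET_*` constants that carry the generic values from the patch (29, 35 and 37); the `#ifndef` fallbacks only exist in case the libc headers do not expose these names.

```c
#include <assert.h>
#include <sys/socket.h>

/* Fallbacks with the asm-generic Linux values, only if libc lacks them. */
#ifndef SO_TIMESTAMP
#define SO_TIMESTAMP 29
#endif
#ifndef SO_TIMESTAMPNS
#define SO_TIMESTAMPNS 35
#endif
#ifndef SO_TIMESTAMPING
#define SO_TIMESTAMPING 37
#endif

/* Hypothetical guest-side option numbers (generic sockbits.h values). */
enum {
    EX_TARGET_SO_TIMESTAMP = 29,
    EX_TARGET_SO_TIMESTAMPNS = 35,
    EX_TARGET_SO_TIMESTAMPING = 37,
};

/* Translate a guest SOL_SOCKET option name to the host one; -1 if unknown. */
static int ex_target_to_host_sockopt(int target_optname)
{
    switch (target_optname) {
    case EX_TARGET_SO_TIMESTAMP:
        return SO_TIMESTAMP;
    case EX_TARGET_SO_TIMESTAMPNS:
        return SO_TIMESTAMPNS;
    case EX_TARGET_SO_TIMESTAMPING:
        return SO_TIMESTAMPING;
    default:
        return -1;
    }
}
```

Switching on the guest numbers and returning the host macros keeps the code correct even when the two sets of values differ.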

[Qemu-devel] [PATCH] contrib/gitdm: Add RT-RK to the domain-map

2019-08-22 Thread Philippe Mathieu-Daudé

This company has at least 7 contributors, so add a domain-map entry for it.

Signed-off-by: Philippe Mathieu-Daudé 
---
 contrib/gitdm/domain-map | 1 +
 1 file changed, 1 insertion(+)

diff --git a/contrib/gitdm/domain-map b/contrib/gitdm/domain-map
index fa9d454473..9efe066ec9 100644
--- a/contrib/gitdm/domain-map
+++ b/contrib/gitdm/domain-map
@@ -18,6 +18,7 @@ nokia.com   Nokia
 oracle.com  Oracle
 proxmox.com Proxmox
 redhat.com  Red Hat
+rt-rk.com   RT-RK
 siemens.com Siemens
 sifive.com  SiFive
 suse.de SUSE
-- 
2.20.1
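Each domain-map line pairs an email domain with an employer name. The sketch below shows one plausible way such a table can be consulted: take the part after the last '@' and compare it exactly with each entry. It is an illustration only; gitdm's own matching rules (for example, whether subdomains match) are not described here, and `lookup_employer()` and its table are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct domain_entry {
    const char *domain;
    const char *employer;
};

/* A few entries taken from the domain-map hunk above. */
static const struct domain_entry domain_map[] = {
    { "redhat.com", "Red Hat" },
    { "rt-rk.com", "RT-RK" },
    { "siemens.com", "Siemens" },
};

/* Hypothetical lookup: exact match on the domain part of an address. */
static const char *lookup_employer(const char *email)
{
    const char *at;
    size_t i;

    assert(email != NULL);
    at = strrchr(email, '@');
    if (at == NULL) {
        return NULL;
    }
    for (i = 0; i < sizeof(domain_map) / sizeof(domain_map[0]); i++) {
        if (strcmp(at + 1, domain_map[i].domain) == 0) {
            return domain_map[i].employer;
        }
    }
    return NULL;
}
```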

[Qemu-devel] [PATCH 0/2] Adding some setsockopt() options

2019-08-22 Thread Shu-Chun Weng via Qemu-devel

Shu-Chun Weng (2):
  linux-user: add missing UDP and IPv6 setsockopt options
  linux-user: time stamping options for setsockopt()

 linux-user/generic/sockbits.h |  4 
 linux-user/mips/sockbits.h    |  4 
 linux-user/syscall.c          | 16 +---
 3 files changed, 21 insertions(+), 3 deletions(-)

-- 
2.23.0.187.g17f5b7556c-goog

Re: [Qemu-devel] [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

2019-08-22 Thread Paolo Bonzini

On 23/08/19 00:32, Kinney, Michael D wrote:
> Paolo,
> 
> It is my understanding that real HW hot plug uses the SDM defined
> methods.  Meaning the initial SMI is to 3000:8000 and they rebase
> to TSEG in the first SMI.  They must have chipset specific methods
> to protect 3000:8000 from DMA.

It would be great if you could check.

> Can we add a chipset feature to prevent DMA to 64KB range from
> 0x30000-0x3FFFF and the UEFI Memory Map and ACPI content can be
> updated so the Guest OS knows to not use that range for DMA?

If real hardware does it at the chipset level, we will probably use
Igor's suggestion of aliasing A-seg to 3000:.  Before starting the
new CPU, the SMI handler can prepare the SMBASE relocation trampoline at
A000:8000 and the hot-plugged CPU will find it at 3000:8000 when it
receives the initial SMI.  Because this is backed by RAM at
0xA0000-0xAFFFF, DMA cannot access it and would still go through to RAM
at 0x30000.

Paolo
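The addresses in this discussion use real-mode segment:offset arithmetic, where linear = segment * 16 + offset: 3000:8000 is linear 0x38000 (the SMI entry point for the default SMBASE of 0x30000) and A000:8000 is 0xA8000. The sketch below does that arithmetic and models the aliasing idea with a hypothetical helper that redirects the 64 KiB window at 0x30000 to the same offset in the A-segment at 0xA0000. It illustrates the address mapping only, not how QEMU or a chipset would actually implement the alias.

```c
#include <assert.h>
#include <stdint.h>

#define WINDOW_SIZE   0x10000u  /* 64 KiB */
#define SMBASE_WINDOW 0x30000u  /* segment 0x3000 */
#define ASEG_BASE     0xA0000u  /* segment 0xA000 */

/* Real-mode linear address: segment * 16 + offset. */
static uint32_t seg_off_to_linear(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}

/* Hypothetical alias: 0x30000-0x3FFFF is served from 0xA0000-0xAFFFF. */
static uint32_t alias_to_aseg(uint32_t addr)
{
    if (addr >= SMBASE_WINDOW && addr < SMBASE_WINDOW + WINDOW_SIZE) {
        return addr - SMBASE_WINDOW + ASEG_BASE;
    }
    return addr;
}
```

With the alias in place, a trampoline written at A000:8000 is exactly what a CPU fetching from 3000:8000 would see.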

[Qemu-devel] [PATCH 1/3] mailmap: Reorder by sections

2019-08-22 Thread Philippe Mathieu-Daudé

Our .mailmap currently has 4 sections, only loosely documented.
Move a few entries that are not related to "addresses from the original
git import" into the 3rd section, and add a comment describing that
section.

Signed-off-by: Philippe Mathieu-Daudé 
---
 .mailmap | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/.mailmap b/.mailmap
index b8e08297c9..e1fdc88d25 100644
--- a/.mailmap
+++ b/.mailmap
@@ -4,20 +4,12 @@
 # into proper addresses so that they are counted properly by git shortlog.
 Andrzej Zaborowski  balrog 

 Anthony Liguori  aliguori 

-Anthony Liguori  Anthony Liguori 
 Aurelien Jarno  aurel32 

 Blue Swirl  blueswir1 

 Edgar E. Iglesias  edgar_igl 

 Fabrice Bellard  bellard 

-James Hogan  
 Jocelyn Mayer  j_mayer 

 Paul Brook  pbrook 

-Yongbok Kim  
-Aleksandar Markovic  
-Aleksandar Markovic  
-Paul Burton  
-Paul Burton  
-Paul Burton  
 Thiemo Seufer  ths 

 malc  malc 
 
@@ -32,6 +24,15 @@ Ian McKellar  Ian McKellar via 
Qemu-devel  Julia Suvorova via Qemu-devel 

 Justin Terry (VM)  Justin Terry (VM) via Qemu-devel 

 
+# Next, replace old addresses by a more recent one.
+Anthony Liguori  Anthony Liguori 
+James Hogan  
+Aleksandar Markovic  
+Aleksandar Markovic  
+Paul Burton  
+Paul Burton  
+Paul Burton  
+Yongbok Kim  
 
 # Also list preferred name forms where people have changed their
 # git author config, or had utf8/latin1 encoding issues.
-- 
2.20.1

[Qemu-devel] [PATCH 2/3] mailmap: Update philmd email address

2019-08-22 Thread Philippe Mathieu-Daudé

Use the email address where I spend most of my time.

Suggested-by: Daniel P. Berrangé 
Signed-off-by: Philippe Mathieu-Daudé 
---
 .mailmap | 1 +
 1 file changed, 1 insertion(+)

diff --git a/.mailmap b/.mailmap
index e1fdc88d25..e68ddd26e6 100644
--- a/.mailmap
+++ b/.mailmap
@@ -32,6 +32,7 @@ Aleksandar Markovic  

 Paul Burton  
 Paul Burton  
 Paul Burton  
+Philippe Mathieu-Daudé  
 Yongbok Kim  
 
 # Also list preferred name forms where people have changed their
-- 
2.20.1

[Qemu-devel] [PATCH 0/3] mailmap: Clean up

2019-08-22 Thread Philippe Mathieu-Daudé

Trivial cleanup of .mailmap to get nicer 'git shortlog' output.

Philippe Mathieu-Daudé (3):
  mailmap: Reorder by sections
  mailmap: Update philmd email address
  mailmap: Add many entries to improve 'git shortlog' statistics

 .mailmap | 123 +++
 1 file changed, 115 insertions(+), 8 deletions(-)

-- 
2.20.1

[Qemu-devel] [PATCH 3/3] mailmap: Add many entries to improve 'git shortlog' statistics

2019-08-22 Thread Philippe Mathieu-Daudé

All of these email addresses have at least 1 commit with a utf8/latin1
encoding issue, or one with no author name.
When there are multiple commits, keep the most used author name.

Signed-off-by: Philippe Mathieu-Daudé 
---
 .mailmap | 105 +++
 1 file changed, 105 insertions(+)

diff --git a/.mailmap b/.mailmap
index e68ddd26e6..d0fc1d793c 100644
--- a/.mailmap
+++ b/.mailmap
@@ -37,5 +37,110 @@ Yongbok Kim  
 
 # Also list preferred name forms where people have changed their
 # git author config, or had utf8/latin1 encoding issues.
+Aaron Lindsay 
+Alexey Gerasimenko 
+Alex Ivanov 
+Andreas Färber 
+Bandan Das 
+Benjamin MARSILI 
+Benoît Canet 
+Benoît Canet 
+Benoît Canet 
+Boqun Feng 
+Boqun Feng 
+Brad Smith 
+Brijesh Singh 
+Brilly Wu 
+Cédric Vincent 
+CheneyLin 
+Chen Gang 
+Chen Gang 
+Chen Gang 
+Chen Wei-Ren 
+Christophe Lyon 
+Collin L. Walling 
 Daniel P. Berrangé 
+Eduardo Otubo 
+Fabrice Desclaux 
+Fernando Luis Vázquez Cao 
+Fernando Luis Vázquez Cao 
+Gautham R. Shenoy 
+Gautham R. Shenoy 
+Gonglei (Arei) 
+Guang Wang 
+Hailiang Zhang 
+Hervé Poussineau 
+Jakub Jermář 
+Jakub Jermář 
+Jean-Christophe Dubois 
+Jindřich Makovička 
+John Arbuckle 
+Juha Riihimäki 
+Juha Riihimäki 
+Jun Li 
+Laurent Vivier 
+Leandro Lupori 
+Li Guang 
+Liming Wang 
+linzhecheng 
+Liran Schour 
+Liu Yu 
+Liu Yu 
+Li Zhang 
+Li Zhang 
+Lluís Vilanova 
+Lluís Vilanova 
+Longpeng (Mike) 
+Luc Michel 
+Luc Michel 
+Marc Marí 
+Marc Marí 
+Michael Avdienko 
+Michael S. Tsirkin 
+Munkyu Im 
+Nicholas Bellinger 
+Nicholas Thomas 
+Nikunj A Dadhania 
+Orit Wasserman 
+Paolo Bonzini 
+Pavel Dovgaluk 
+Pavel Dovgaluk 
+Pavel Dovgaluk 
+Peter Crosthwaite 
+Peter Crosthwaite 
+Peter Crosthwaite 
+Prasad J Pandit 
+Prasad J Pandit 
+Qiao Nuohan 
 Reimar Döffinger 
+Remy Noel 
+Roger Pau Monné 
+Shin'ichiro Kawasaki 
+Shin'ichiro Kawasaki 
+Sochin Jiang 
+Takashi Yoshii 
+Thomas Huth 
+Thomas Knych 
+Timothy Baldwin 
+Tony Nguyen 
+Venkateswararao Jujjuri 
+Vibi Sreenivasan 
+Vijaya Kumar K 
+Vijaya Kumar K 
+Vijay Kumar 
+Vijay Kumar 
+Wang Guang 
+Wenchao Xia 
+Wenshuang Ma 
+Xiaoqiang Zhao 
+Xinhua Cao 
+Xiong Zhang 
+Yin Yin 
+yuchenlin 
+YunQiang Su 
+YunQiang Su 
+Yuri Pudgorodskiy 
+Zhengui Li 
+Zhenwei Pi 
+Zhenwei Pi 
+Zhuang Yanying 
-- 
2.20.1
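The rule stated in this commit message (keep the most used author name when an address has several) is a per-address frequency count. A minimal sketch for one address, with a hypothetical `most_used_name()` and ties resolved in favour of the name seen first; collecting the names from `git log` is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the name that occurs most often in names[0..n-1]; on a tie the
 * name that appears first wins. Quadratic, which is fine for a handful
 * of names per address. Returns NULL when n is 0.
 */
static const char *most_used_name(const char *const *names, size_t n)
{
    const char *best = NULL;
    size_t best_count = 0;
    size_t i, j;

    assert(n == 0 || names != NULL);
    for (i = 0; i < n; i++) {
        size_t count = 0;

        for (j = 0; j < n; j++) {
            if (strcmp(names[i], names[j]) == 0) {
                count++;
            }
        }
        if (count > best_count) {   /* strict '>' keeps the first on ties */
            best = names[i];
            best_count = count;
        }
    }
    return best;
}
```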

Re: [Qemu-devel] contrib/gitdm: Add group map for RT-RK?

2019-08-22 Thread Philippe Mathieu-Daudé

On 8/23/19 1:05 AM, Aleksandar Markovic wrote:
> 23.08.2019. 00.47, "Philippe Mathieu-Daudé" wrote:
>>
>> Hi Aleksandar,
>>
>> I noticed this list of contributors:
>>
>> Aleksandar Markovic
>> Dejan Jovicevic
>> Lena Djokic <lena.djo...@rt-rk.com>
>> Mateja Marjanovic
>> Mateja Marjanovic
>> Miloš Stojanović
>> Petar Jovanovic
>> Stefan Brankovic
>>
>> I see most of the commits are MIPS related (a few are PPC).
>>
>> Should we add these emails to contrib/gitdm/group-map-wavecomp or should
>> we rather add a new group-map file for this company?
>>
> 
> The most appropriate and simplest approach would be to add a line in
> qemu/contrib/gitdm/domain-map for company RT-RK.

This was my first thought :)

Thanks!

Phil.

Re: [Qemu-devel] contrib/gitdm: Add group map for RT-RK?

2019-08-22 Thread Aleksandar Markovic

23.08.2019. 00.47, "Philippe Mathieu-Daudé" wrote:
>
> Hi Aleksandar,
>
> I noticed this list of contributors:
>
> Aleksandar Markovic 
> Dejan Jovicevic 
> Lena Djokic 
> Mateja Marjanovic 
> Mateja Marjanovic 
> Miloš Stojanović 
> Petar Jovanovic 
> Stefan Brankovic 
>
> I see most of the commits are MIPS related (a few are PPC).
>
> Should we add these emails to contrib/gitdm/group-map-wavecomp or should
> we rather add a new group-map file for this company?
>

The most appropriate and simplest approach would be to add a line in
qemu/contrib/gitdm/domain-map for company RT-RK.

Thanks, Aleksandar

> Thanks,
>
> Phil.
>

Re: [Qemu-devel] [PATCH] i386: Omit all-zeroes entries from KVM CPUID table

2019-08-22 Thread Paolo Bonzini

On 23/08/19 00:52, Eduardo Habkost wrote:
> KVM has an 80-entry limit at KVM_SET_CPUID2.  With the
> introduction of CPUID[0x1F], it is now possible to hit this limit
> with unusual CPU configurations, e.g.:
> 
>   $ ./x86_64-softmmu/qemu-system-x86_64 \
> -smp 1,dies=2,maxcpus=2 \
> -cpu EPYC,check=off,enforce=off \
> -machine accel=kvm
>   qemu-system-x86_64: kvm_init_vcpu failed: Argument list too long
> 
> This happens because QEMU adds a lot of all-zeroes CPUID entries
> for unused CPUID leaves.  In the example above, we end up
> creating 48 all-zeroes CPUID entries.
> 
> KVM already returns all-zeroes when emulating the CPUID
> instruction if an entry is missing, so the all-zeroes entries are
> redundant.  Skip those entries.  This reduces the CPUID table
> size by half while keeping CPUID output unchanged.
> 
> Reported-by: Yumei Huang 
> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1741508
> Signed-off-by: Eduardo Habkost 
> ---
>  target/i386/kvm.c | 14 ++
>  1 file changed, 14 insertions(+)

Acked-by: Paolo Bonzini 

> diff --git a/target/i386/kvm.c b/target/i386/kvm.c
> index 8023c679ea..4e3df2867d 100644
> --- a/target/i386/kvm.c
> +++ b/target/i386/kvm.c
> @@ -1529,6 +1529,13 @@ int kvm_arch_init_vcpu(CPUState *cs)
>  c->function = i;
>  c->flags = 0;
>  cpu_x86_cpuid(env, i, 0, &c->eax, &c->ebx, &c->ecx, &c->edx);
> +if (!c->eax && !c->ebx && !c->ecx && !c->edx) {
> +/*
> + * KVM already returns all zeroes if a CPUID entry is missing,
> + * so we can omit it and avoid hitting KVM's 80-entry limit.
> + */
> +cpuid_i--;
> +}
>  break;
>  }
>  }
> @@ -1593,6 +1600,13 @@ int kvm_arch_init_vcpu(CPUState *cs)
>  c->function = i;
>  c->flags = 0;
>  cpu_x86_cpuid(env, i, 0, &c->eax, &c->ebx, &c->ecx, &c->edx);
> +if (!c->eax && !c->ebx && !c->ecx && !c->edx) {
> +/*
> + * KVM already returns all zeroes if a CPUID entry is missing,
> + * so we can omit it and avoid hitting KVM's 80-entry limit.
> + */
> +cpuid_i--;
> +}
>  break;
>  }
>  }
>

[Qemu-devel] [PATCH] i386: Omit all-zeroes entries from KVM CPUID table

2019-08-22 Thread Eduardo Habkost

KVM has an 80-entry limit at KVM_SET_CPUID2.  With the
introduction of CPUID[0x1F], it is now possible to hit this limit
with unusual CPU configurations, e.g.:

  $ ./x86_64-softmmu/qemu-system-x86_64 \
-smp 1,dies=2,maxcpus=2 \
-cpu EPYC,check=off,enforce=off \
-machine accel=kvm
  qemu-system-x86_64: kvm_init_vcpu failed: Argument list too long

This happens because QEMU adds a lot of all-zeroes CPUID entries
for unused CPUID leaves.  In the example above, we end up
creating 48 all-zeroes CPUID entries.

KVM already returns all-zeroes when emulating the CPUID
instruction if an entry is missing, so the all-zeroes entries are
redundant.  Skip those entries.  This reduces the CPUID table
size by half while keeping CPUID output unchanged.

Reported-by: Yumei Huang 
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1741508
Signed-off-by: Eduardo Habkost 
---
 target/i386/kvm.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/target/i386/kvm.c b/target/i386/kvm.c
index 8023c679ea..4e3df2867d 100644
--- a/target/i386/kvm.c
+++ b/target/i386/kvm.c
@@ -1529,6 +1529,13 @@ int kvm_arch_init_vcpu(CPUState *cs)
 c->function = i;
 c->flags = 0;
 cpu_x86_cpuid(env, i, 0, &c->eax, &c->ebx, &c->ecx, &c->edx);
+if (!c->eax && !c->ebx && !c->ecx && !c->edx) {
+/*
+ * KVM already returns all zeroes if a CPUID entry is missing,
+ * so we can omit it and avoid hitting KVM's 80-entry limit.
+ */
+cpuid_i--;
+}
 break;
 }
 }
@@ -1593,6 +1600,13 @@ int kvm_arch_init_vcpu(CPUState *cs)
 c->function = i;
 c->flags = 0;
 cpu_x86_cpuid(env, i, 0, &c->eax, &c->ebx, &c->ecx, &c->edx);
+if (!c->eax && !c->ebx && !c->ecx && !c->edx) {
+/*
+ * KVM already returns all zeroes if a CPUID entry is missing,
+ * so we can omit it and avoid hitting KVM's 80-entry limit.
+ */
+cpuid_i--;
+}
 break;
 }
 }
-- 
2.21.0
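The patch drops an all-zeroes entry by decrementing `cpuid_i` right after filling the slot, so the next leaf overwrites it. A self-contained sketch of that compaction, under stated assumptions: the struct, `fill_leaf()` and its leaf values are hypothetical, only the 80-entry limit comes from the commit message, and returning -1 at the limit is a simplification rather than QEMU's actual error handling.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_CPUID_ENTRIES 80  /* KVM_SET_CPUID2 limit cited in the patch */

struct cpuid_entry {
    uint32_t function;
    uint32_t eax, ebx, ecx, edx;
};

/* Hypothetical leaf generator: only leaves 0, 1 and 7 are non-zero. */
static void fill_leaf(uint32_t leaf, struct cpuid_entry *c)
{
    memset(c, 0, sizeof(*c));
    c->function = leaf;
    if (leaf == 0) {
        c->eax = 7;               /* arbitrary non-zero value */
    } else if (leaf == 1 || leaf == 7) {
        c->ebx = 0x1000 | leaf;   /* arbitrary non-zero value */
    }
}

/* Build the table, reusing the slot of any all-zeroes entry. */
static int build_cpuid_table(struct cpuid_entry *table, uint32_t max_leaf)
{
    int cpuid_i = 0;
    uint32_t i;

    for (i = 0; i <= max_leaf; i++) {
        struct cpuid_entry *c;

        if (cpuid_i == MAX_CPUID_ENTRIES) {
            return -1;            /* table full */
        }
        c = &table[cpuid_i++];
        fill_leaf(i, c);
        if (!c->eax && !c->ebx && !c->ecx && !c->edx) {
            cpuid_i--;            /* missing leaves already read as zeroes */
        }
    }
    return cpuid_i;
}
```

With leaves 0 to 200 requested but only leaves 0, 1 and 7 non-zero, the table ends up with 3 entries instead of 201, which is the same mechanism that keeps the EPYC example under the limit.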

Re: [Qemu-devel] [PATCH v4 24/28] riscv: sifive: Implement a model for SiFive FU540 OTP

2019-08-22 Thread Alistair Francis

On Sun, Aug 18, 2019 at 10:19 PM Bin Meng  wrote:
>
> This implements a simple model for SiFive FU540 OTP (One-Time
> Programmable) Memory interface, primarily for reading out the
> stored serial number from the first 1 KiB of the 16 KiB OTP
> memory reserved by SiFive for internal use.
>
> Signed-off-by: Bin Meng 
>
> ---
>
> Changes in v4:
> - prefix all macros/variables/functions with SIFIVE_U/sifive_u
>   in the sifive_u_otp driver
>
> Changes in v3: None
> Changes in v2: None
>
>  hw/riscv/Makefile.objs  |   1 +
>  hw/riscv/sifive_u_otp.c | 194
>  include/hw/riscv/sifive_u_otp.h |  90 +++
>  3 files changed, 285 insertions(+)
>  create mode 100644 hw/riscv/sifive_u_otp.c
>  create mode 100644 include/hw/riscv/sifive_u_otp.h
>
> diff --git a/hw/riscv/Makefile.objs b/hw/riscv/Makefile.objs
> index b95bbd5..fc3c6dd 100644
> --- a/hw/riscv/Makefile.objs
> +++ b/hw/riscv/Makefile.objs
> @@ -8,6 +8,7 @@ obj-$(CONFIG_SIFIVE) += sifive_gpio.o
>  obj-$(CONFIG_SIFIVE) += sifive_plic.o
>  obj-$(CONFIG_SIFIVE) += sifive_test.o
>  obj-$(CONFIG_SIFIVE_U) += sifive_u.o
> +obj-$(CONFIG_SIFIVE_U) += sifive_u_otp.o
>  obj-$(CONFIG_SIFIVE_U) += sifive_u_prci.o
>  obj-$(CONFIG_SIFIVE) += sifive_uart.o
>  obj-$(CONFIG_SPIKE) += spike.o
> diff --git a/hw/riscv/sifive_u_otp.c b/hw/riscv/sifive_u_otp.c
> new file mode 100644
> index 000..de8801c
> --- /dev/null
> +++ b/hw/riscv/sifive_u_otp.c
> @@ -0,0 +1,194 @@
> +/*
> + * QEMU SiFive U OTP (One-Time Programmable) Memory interface
> + *
> + * Copyright (c) 2019 Bin Meng 
> + *
> + * Simple model of the OTP to emulate register reads made by the SDK BSP
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2 or later, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program.  If not, see .
> + */
> +
> +#include "qemu/osdep.h"
> +#include "hw/sysbus.h"
> +#include "qemu/module.h"
> +#include "target/riscv/cpu.h"
> +#include "hw/riscv/sifive_u_otp.h"
> +
> +static uint64_t sifive_u_otp_read(void *opaque, hwaddr addr, unsigned int size)
> +{
> +SiFiveUOTPState *s = opaque;
> +
> +switch (addr) {
> +case SIFIVE_U_OTP_PA:
> +return s->pa;
> +case SIFIVE_U_OTP_PAIO:
> +return s->paio;
> +case SIFIVE_U_OTP_PAS:
> +return s->pas;
> +case SIFIVE_U_OTP_PCE:
> +return s->pce;
> +case SIFIVE_U_OTP_PCLK:
> +return s->pclk;
> +case SIFIVE_U_OTP_PDIN:
> +return s->pdin;
> +case SIFIVE_U_OTP_PDOUT:
> +if ((s->pce & SIFIVE_U_OTP_PCE_EN) &&
> +(s->pdstb & SIFIVE_U_OTP_PDSTB_EN) &&
> +(s->ptrim & SIFIVE_U_OTP_PTRIM_EN)) {
> +return s->fuse[s->pa & SIFIVE_U_OTP_PA_MASK];
> +} else {
> +return 0xff;
> +}
> +case SIFIVE_U_OTP_PDSTB:
> +return s->pdstb;
> +case SIFIVE_U_OTP_PPROG:
> +return s->pprog;
> +case SIFIVE_U_OTP_PTC:
> +return s->ptc;
> +case SIFIVE_U_OTP_PTM:
> +return s->ptm;
> +case SIFIVE_U_OTP_PTM_REP:
> +return s->ptm_rep;
> +case SIFIVE_U_OTP_PTR:
> +return s->ptr;
> +case SIFIVE_U_OTP_PTRIM:
> +return s->ptrim;
> +case SIFIVE_U_OTP_PWE:
> +return s->pwe;
> +}
> +
> +hw_error("%s: read: addr=0x%x\n", __func__, (int)addr);

This should be qemu_log_mask().

> +return 0;
> +}
> +
> +static void sifive_u_otp_write(void *opaque, hwaddr addr,
> +   uint64_t val64, unsigned int size)
> +{
> +SiFiveUOTPState *s = opaque;
> +
> +switch (addr) {
> +case SIFIVE_U_OTP_PA:
> +s->pa = (uint32_t) val64 & SIFIVE_U_OTP_PA_MASK;
> +break;
> +case SIFIVE_U_OTP_PAIO:
> +s->paio = (uint32_t) val64;
> +break;
> +case SIFIVE_U_OTP_PAS:
> +s->pas = (uint32_t) val64;
> +break;
> +case SIFIVE_U_OTP_PCE:
> +s->pce = (uint32_t) val64;
> +break;
> +case SIFIVE_U_OTP_PCLK:
> +s->pclk = (uint32_t) val64;
> +break;
> +case SIFIVE_U_OTP_PDIN:
> +s->pdin = (uint32_t) val64;
> +break;
> +case SIFIVE_U_OTP_PDOUT:
> +/* read-only */
> +break;
> +case SIFIVE_U_OTP_PDSTB:
> +s->pdstb = (uint32_t) val64;
> +break;
> +case SIFIVE_U_OTP_PPROG:
> +s->pprog = (uint32_t) val64;
> +break;
> +case SIFIVE_U_OTP_PTC:
> +s->ptc = (uint32_t) val64;
> +break;
> +

[Qemu-devel] contrib/gitdm: Add group map for RT-RK?

2019-08-22 Thread Philippe Mathieu-Daudé

Hi Aleksandar,

I noticed this list of contributors:

Aleksandar Markovic 
Dejan Jovicevic 
Lena Djokic 
Mateja Marjanovic 
Mateja Marjanovic 
Miloš Stojanović 
Petar Jovanovic 
Stefan Brankovic 

I see most of the commits are MIPS related (a few are PPC).

Should we add these emails to contrib/gitdm/group-map-wavecomp or should
we rather add a new group-map file for this company?

Thanks,

Phil.

Re: [Qemu-devel] [PATCH v4 13/28] riscv: hart: Add a "hartid-base" property to RISC-V hart array

2019-08-22 Thread Alistair Francis

On Sun, Aug 18, 2019 at 10:27 PM Bin Meng  wrote:
>
> At present each hart's hartid in a RISC-V hart array is assigned
> the same value as its index in the hart array. But for a system
> that has multiple hart arrays, this is no longer the case.
>
> Add a new "hartid-base" property so that hartid numbers can be
> assigned based on the property value.
>
> Signed-off-by: Bin Meng 

Why do we need this patch?

Alistair

>
> ---
>
> Changes in v4:
> - new patch to add a "hartid-base" property to RISC-V hart array
>
> Changes in v3: None
> Changes in v2: None
>
>  hw/riscv/riscv_hart.c | 8 +---
>  include/hw/riscv/riscv_hart.h | 1 +
>  2 files changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/hw/riscv/riscv_hart.c b/hw/riscv/riscv_hart.c
> index 9deef869..52ab86a 100644
> --- a/hw/riscv/riscv_hart.c
> +++ b/hw/riscv/riscv_hart.c
> @@ -27,6 +27,7 @@
>
>  static Property riscv_harts_props[] = {
>  DEFINE_PROP_UINT32("num-harts", RISCVHartArrayState, num_harts, 1),
> +DEFINE_PROP_UINT32("hartid-base", RISCVHartArrayState, hartid_base, 0),
>  DEFINE_PROP_STRING("cpu-type", RISCVHartArrayState, cpu_type),
>  DEFINE_PROP_END_OF_LIST(),
>  };
> @@ -37,7 +38,7 @@ static void riscv_harts_cpu_reset(void *opaque)
>  cpu_reset(CPU(cpu));
>  }
>
> -static void riscv_hart_realize(RISCVHartArrayState *s, int idx,
> +static void riscv_hart_realize(RISCVHartArrayState *s, int idx, uint32_t hartid,
> char *cpu_type, Error **errp)
>  {
>  Error *err = NULL;
> @@ -45,7 +46,7 @@ static void riscv_hart_realize(RISCVHartArrayState *s, int idx,
>  object_initialize_child(OBJECT(s), "harts[*]", &s->harts[idx],
>  sizeof(RISCVCPU), cpu_type,
>  &error_abort, NULL);
> -s->harts[idx].env.mhartid = idx;
> +s->harts[idx].env.mhartid = hartid;
>  qemu_register_reset(riscv_harts_cpu_reset, &s->harts[idx]);
>  object_property_set_bool(OBJECT(&s->harts[idx]), true,
>   "realized", &err);
> @@ -58,12 +59,13 @@ static void riscv_hart_realize(RISCVHartArrayState *s, int idx,
>  static void riscv_harts_realize(DeviceState *dev, Error **errp)
>  {
>  RISCVHartArrayState *s = RISCV_HART_ARRAY(dev);
> +uint32_t hartid = s->hartid_base;
>  int n;
>
>  s->harts = g_new0(RISCVCPU, s->num_harts);
>
>  for (n = 0; n < s->num_harts; n++) {
> -riscv_hart_realize(s, n, s->cpu_type, errp);
> +riscv_hart_realize(s, n, hartid + n, s->cpu_type, errp);
>  }
>  }
>
> diff --git a/include/hw/riscv/riscv_hart.h b/include/hw/riscv/riscv_hart.h
> index 0671d88..1984e30 100644
> --- a/include/hw/riscv/riscv_hart.h
> +++ b/include/hw/riscv/riscv_hart.h
> @@ -32,6 +32,7 @@ typedef struct RISCVHartArrayState {
>
>  /*< public >*/
>  uint32_t num_harts;
> +uint32_t hartid_base;
>  char *cpu_type;
>  RISCVCPU *harts;
>  } RISCVHartArrayState;
> --
> 2.7.4
>
>
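With the new property, hart n of an array gets mhartid = hartid-base + n, so several arrays can be given distinct hartid ranges. A minimal sketch using a hypothetical, trimmed-down stand-in for RISCVHartArrayState; the split into a 1-hart array and a 4-hart array is an assumption for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define EX_MAX_HARTS 8

/* Hypothetical, trimmed-down stand-in for RISCVHartArrayState. */
struct ex_hart_array {
    uint32_t num_harts;
    uint32_t hartid_base;
    uint32_t mhartid[EX_MAX_HARTS];
};

/* Assign hartids the way the patch does: base + index. */
static void ex_harts_realize(struct ex_hart_array *s)
{
    uint32_t n;

    assert(s->num_harts <= EX_MAX_HARTS);
    for (n = 0; n < s->num_harts; n++) {
        s->mhartid[n] = s->hartid_base + n;
    }
}
```

Giving the second array a base equal to the first array's hart count yields contiguous, non-overlapping hartids (0, then 1 to 4 in the assumed layout).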

Re: [Qemu-devel] [PATCH v4 27/28] riscv: sifive_u: Remove handcrafted clock nodes for UART and ethernet

2019-08-22 Thread Alistair Francis

On Sun, Aug 18, 2019 at 10:31 PM Bin Meng  wrote:
>
> In the past we did not have a model for PRCI, hence two handcrafted
> clock nodes ("/soc/ethclk" and "/soc/uartclk") were created for the
> purpose of supplying hard-coded clock frequencies. But now that we
> have added PRCI support in QEMU, we don't need them any more.
>
> Signed-off-by: Bin Meng 

Reviewed-by: Alistair Francis 

Alistair

>
> ---
>
> Changes in v4:
> - new patch to remove handcrafted clock nodes for UART and ethernet
>
> Changes in v3: None
> Changes in v2: None
>
>  hw/riscv/sifive_u.c | 24 +---
>  include/hw/riscv/sifive_u.h |  3 +--
>  2 files changed, 2 insertions(+), 25 deletions(-)
>
> diff --git a/hw/riscv/sifive_u.c b/hw/riscv/sifive_u.c
> index 7a370e9..7d9fb3a 100644
> --- a/hw/riscv/sifive_u.c
> +++ b/hw/riscv/sifive_u.c
> @@ -89,8 +89,7 @@ static void create_fdt(SiFiveUState *s, const struct MemmapEntry *memmap,
>  uint32_t *cells;
>  char *nodename;
>  char ethclk_names[] = "pclk\0hclk";
> -uint32_t plic_phandle, prci_phandle, ethclk_phandle, phandle = 1;
> -uint32_t uartclk_phandle;
> +uint32_t plic_phandle, prci_phandle, phandle = 1;
>  uint32_t hfclk_phandle, rtcclk_phandle, phy_phandle;
>
>  fdt = s->fdt = create_device_tree(&s->fdt_size);
> @@ -250,17 +249,6 @@ static void create_fdt(SiFiveUState *s, const struct MemmapEntry *memmap,
>  g_free(cells);
>  g_free(nodename);
>
> -ethclk_phandle = phandle++;
> -nodename = g_strdup_printf("/soc/ethclk");
> -qemu_fdt_add_subnode(fdt, nodename);
> -qemu_fdt_setprop_string(fdt, nodename, "compatible", "fixed-clock");
> -qemu_fdt_setprop_cell(fdt, nodename, "#clock-cells", 0x0);
> -qemu_fdt_setprop_cell(fdt, nodename, "clock-frequency",
> -SIFIVE_U_GEM_CLOCK_FREQ);
> -qemu_fdt_setprop_cell(fdt, nodename, "phandle", ethclk_phandle);
> -ethclk_phandle = qemu_fdt_get_phandle(fdt, nodename);
> -g_free(nodename);
> -
>  phy_phandle = phandle++;
>  nodename = g_strdup_printf("/soc/ethernet@%lx",
>  (long)memmap[SIFIVE_U_GEM].base);
> @@ -292,16 +280,6 @@ static void create_fdt(SiFiveUState *s, const struct MemmapEntry *memmap,
>  qemu_fdt_setprop_cell(fdt, nodename, "reg", 0x0);
>  g_free(nodename);
>
> -uartclk_phandle = phandle++;
> -nodename = g_strdup_printf("/soc/uartclk");
> -qemu_fdt_add_subnode(fdt, nodename);
> -qemu_fdt_setprop_string(fdt, nodename, "compatible", "fixed-clock");
> -qemu_fdt_setprop_cell(fdt, nodename, "#clock-cells", 0x0);
> -qemu_fdt_setprop_cell(fdt, nodename, "clock-frequency", 3686400);
> -qemu_fdt_setprop_cell(fdt, nodename, "phandle", uartclk_phandle);
> -uartclk_phandle = qemu_fdt_get_phandle(fdt, nodename);
> -g_free(nodename);
> -
>  nodename = g_strdup_printf("/soc/serial@%lx",
>  (long)memmap[SIFIVE_U_UART0].base);
>  qemu_fdt_add_subnode(fdt, nodename);
> diff --git a/include/hw/riscv/sifive_u.h b/include/hw/riscv/sifive_u.h
> index cba29e1..8880f9c 100644
> --- a/include/hw/riscv/sifive_u.h
> +++ b/include/hw/riscv/sifive_u.h
> @@ -72,8 +72,7 @@ enum {
>  enum {
>  SIFIVE_U_CLOCK_FREQ = 10,
>  SIFIVE_U_HFCLK_FREQ = ,
> -SIFIVE_U_RTCCLK_FREQ = 100,
> -SIFIVE_U_GEM_CLOCK_FREQ = 12500
> +SIFIVE_U_RTCCLK_FREQ = 100
>  };
>
>  #define SIFIVE_U_MANAGEMENT_CPU_COUNT   1
> --
> 2.7.4
>
>

Re: [Qemu-devel] [Qemu-riscv] RISCV: when will the CLIC be ready?

2019-08-22 Thread Alistair Francis

On Tue, Aug 20, 2019 at 8:38 PM liuzhiwei  wrote:
>
>
> On 2019/8/20 12:38 AM, Chih-Min Chao wrote:
>
>
>
> On Mon, Aug 19, 2019 at 9:47 PM liuzhiwei  wrote:
>>
>>
>> On 2019/8/17 1:29 AM, Alistair Francis wrote:
>> > On Thu, Aug 15, 2019 at 8:39 PM liuzhiwei  wrote:
>> >> Hi, Palmer
>> >>
>> >> When Michael Clark was still the maintainer of RISC-V QEMU, he wrote on
>> >> the mailing list, "the CLIC interrupt controller is under testing,
>> >> and will be included in QEMU 3.1 or 3.2". It is a pity that the CLIC is
>> >> not included even in QEMU 4.1.0.
>> > I see that there is a CLIC branch available here:
>> > https://github.com/riscv/riscv-qemu/pull/157
>> >
>> > It looks like all of the work is in a single commit
>> > (https://github.com/riscv/riscv-qemu/pull/157/commits/206d9ac339feb9ef2c325402a00f0f45f453d019)
>> > and that most of the other commits in the PR have already made it into
>> > master.
>> >
>> > Although the CLIC commit is very large it doesn't seem impossible to
>> > manually pull out the CLIC bits and apply it onto master.
>> >
>> > Do you know the state of the CLIC model? If it's working it shouldn't
>> > be too hard to rebase the work and get the code into mainline.
>> >
>> > Alistair
>> >
>> Hi,  Alistair
>>
>> In my opinion, the CLIC code almost works.
>>
>> Last year, when a colleague ported an RTOS, I read the CLIC
>> specification and used the CLIC model code. It passed all the tests
>> after I fixed two bugs. I also sent the patch to Michael, but got no
>> response (maybe I used a wrong email address).
>>
>> diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
>> index 7bf6cbc..95d80ab 100644
>> --- a/target/riscv/cpu_helper.c
>> +++ b/target/riscv/cpu_helper.c
>> @@ -505,6 +505,9 @@ static target_ulong riscv_intr_pc(CPURISCVState *env,
>>   if (!(async || clic)) {
>>   return tvec & ~0b11;
>>   }
>> +if (clic) {
>> +cause &= 0x3ff;
>> +}
>>
>>   /* bits [1:0] encode mode; 0 = direct, 1 = vectored, 2 >= reserved */
>>   switch (mode1) {
>> @@ -645,6 +648,9 @@ void riscv_cpu_do_interrupt(CPUState *cs)
>>   riscv_cpu_set_mode(env, PRV_M);
>>   }
>>
>> +if (clic) {
>> +env->exccode = 0;
>> +}
>>   /* NOTE: it is not necessary to yield load reservations here. It is only
>>  necessary for an SC from "another hart" to cause a load reservation
>>  to be yielded. Refer to the memory consistency model section of the
>>
>> Since then, the specification has been updated and the code may have
>> changed. I haven't pulled the new code since.
>>
>> If the CLIC model can be merged into mainline and nobody else maintains
>> the code, I'd like to work on it, fixing the bugs and updating the code
>> according to the latest specification.
>>
>> Best Regards,
>> Zhiwei
>>
>> >> As we have CPUs using CLIC, I have had to use the out-of-tree QEMU code
>> >> from SiFive for a long time. Could you tell me when it will be upstreamed?
>> >>
>> >> Best Regards
>> >> Zhiwei
>> >>
>>
>
> Hi Zhiwei,
>
> I think what Alistair pointed out is the latest CLIC version (or
> https://github.com/riscv/riscv-qemu/tree/riscv-qemu-3.1). The two versions,
> in the pull request and on the 3.1 branch, should be similar.
>
> As far as I know, there is no concrete plan on CLIC patch in short term.
> It is good to know that the CLIC patch has been run with a real RTOS.
> It would also be great if you could update the implementation to the
> latest spec and send the patch again.
>
> chihmin
>
> Hi chihmin,
>
> Thanks for the reminder and the approval. I will pull the latest CLIC
> code and send the patch in about two or three weeks.

Great! I'm glad to see more contributions!

Alistair

>
> The RTOS is Rhino,  which is the kernel of 
> AliOS-Things(https://github.com/alibaba/AliOS-Things).
>
> It is also the kernel of YOC(https://cop.c-sky.com).
>
> Best Regards
> Zhiwei

Re: [Qemu-devel] RISC-V: Vector && DSP Extension

2019-08-22 Thread Alistair Francis

On Wed, Aug 21, 2019 at 6:56 PM liuzhiwei  wrote:
>
>
> On 2019/8/22 上午3:31, Palmer Dabbelt wrote:
> > On Thu, 15 Aug 2019 14:37:52 PDT (-0700), alistai...@gmail.com wrote:
> >> On Thu, Aug 15, 2019 at 2:07 AM Peter Maydell
> >>  wrote:
> >>>
> >>> On Thu, 15 Aug 2019 at 09:53, Aleksandar Markovic
> >>>  wrote:
> >>> >
> >>> > > We can accept draft
> >>> > > extensions in QEMU as long as they are disabled by default.
> >>>
> >>> > Hi, Alistair, Palmer,
> >>> >
> >>> > Is this an official stance of the QEMU community, or perhaps Alistair's
> >>> > personal judgement, or maybe a rule within the riscv subcommunity?
> >>>
> >>> Alistair asked on a previous thread; my view was:
> >>> https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg03364.html
> >>> and nobody else spoke up disagreeing (summary: should at least be
> >>> disabled-by-default and only enabled by setting an explicit
> >>> property whose name should start with the 'x-' prefix).
> >>
> >> Agreed!
> >>
> >>>
> >>> In general QEMU does sometimes introduce experimental extensions
> >>> (we've had them in the block layer, for example) and so the 'x-'
> >>> property to enable them is a reasonably established convention.
> >>> I think it's a reasonable compromise to allow this sort of work
> >>> to start and not have to live out-of-tree for a long time, without
> >>> confusing users or getting into a situation where some QEMU
> >>> versions behave differently or to obsolete drafts of a spec
> >>> without it being clear from the command line that experimental
> >>> extensions are being enabled.
> >>>
> >>> There is also an element of "submaintainer judgement" to be applied
> >>> here -- upstream is probably not the place for a draft extension
> >>> to be implemented if it is:
> >>>  * still fast moving or subject to major changes of design direction
> >>>  * major changes to the codebase (especially if it requires
> >>>    changes to core code) that might later need to be redone
> >>>    entirely differently
> >>>  * still experimental
> >>
> >> Yep, agreed. For RISC-V I think this would extend to only allowing
> >> extensions that have backing from the foundation and are under active
> >> discussion.
> >
> > My general philosophy here is that we'll take anything written down in
> > an official RISC-V ISA manual (ie, the ones actually released by the
> > foundation).  This provides a single source of truth for what an
> > extension name / version means, which is important to avoid
> > confusion.  If it's a ratified extension then I see no reason not to
> > support it on my end.  For frozen extensions we should probably just
> > wait the 45 days until they go up for a ratification vote, but I'd be
> > happy to start reviewing patches then (or earlier :)).
> >
> > If the spec is a draft in the ISA manual then we need to worry about
> > the support burden, which I don't have a fixed criteria for --
> > generally there shouldn't be issues here, but early drafts can be in a
> > state where they're going to change extensively and are unlikely to be
> > used by anyone.  There's also the question of "what is an official
> > release of a draft specification?".
> > That's a bit awkward right now: the current ratified ISA manual
> > contains version 0.3 of the hypervisor extension, but I just talked to
> > Andrew and the plan is to remove the draft extensions from the
> > ratified manuals because these drafts are old and the official manuals
> > update slowly.  For now I guess we'll need an ad-hoc way of
> > determining if a draft extension has been officially versioned or not,
> > which is a bit of a headache.
> >
> > We already have examples of supporting draft extensions, including
> > priv-1.9.1.  This does cause some pain for us on the QEMU side (CSR
> > bits have different semantics between the specs), but there's 1.9.1
> > hardware out there and the port continues to be useful so I'd be in
> > favor of keeping it around for now.  I suppose there is an implicit
> > risk that draft extensions will be deprecated, but the "x-" prefix,
> > draft status, and long deprecation period should be sufficient to
> > inform users of the risk.  I wouldn't be opposed to adding a "this is
> > a draft ISA" warning, but I feel like it might be a bit overkill.
> >
> Hi, Palmer
>
> Maybe this is the headache of open source hardware, where everyone
> cooperates to build a better architecture.
>
> In my opinion, we should focus on the future. The code in QEMU mainline
> should evolve toward the ratified extension step by step, and in the end
> support only the ratified version.
>
> At that point, even if many hardware implementations only support a
> deprecated draft extension, the draft code should live outside mainline
> and be maintained by the hardware manufacturers.
>
> But before that, it is better to have a draft implementation, so that
> we can work step by step to accelerate the arrival of the ratified
> extension.
>
> Even if the draft extension implementations are eventually deprecated,
> they are not meaningless. The

Re: [Qemu-devel] [PULL 2/3] tests: Run the iotests during "make check" again

2019-08-22 Thread Paolo Bonzini

On 17/08/19 10:54, Thomas Huth wrote:
> People often forget to run the iotests before submitting patches or pull
> requests - this is likely due to the fact that we do not run the tests
> during our mandatory "make check" tests yet. Now that we've got a proper
> "auto" group of iotests that should be fine to run in every environment,
> we can enable the iotests during "make check" again by running the "auto"
> tests by default from the check-block.sh script.
> 
> Some cases still need to be checked first, though: iotests need bash and
> GNU sed (otherwise they fail), and if gprof is enabled, it spoils the
> output of some test cases causing them to fail. So if we detect that one
> of the required programs is missing or that gprof is enabled, we still
> have to skip the iotests to avoid failures.
> 
> And finally, since we are using check-block.sh now again, this patch also
> removes the qemu-iotests-quick.sh script since we do not need that anymore
> (and having two shell wrapper scripts around the block tests seems more
> confusing than helpful).
> 
> Message-Id: <20190717111947.30356-4-th...@redhat.com>
> Signed-off-by: Thomas Huth 
> [AJB: -makecheck to check-block.sh, move check-block to start and gate it]
> Signed-off-by: Alex Bennée 

This breaks when sanitizers are enabled.  There are leaks reported,
though I'm not sure if they are real, and in addition the warning lines
break qemu-iotests' output comparison.

Paolo

Re: [Qemu-devel] [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

2019-08-22 Thread Kinney, Michael D

Paolo,

It is my understanding that real HW hot plug uses the SDM-defined
methods.  That is, the initial SMI goes to 3000:8000 and they rebase
to TSEG in the first SMI.  They must have chipset-specific methods
to protect 3000:8000 from DMA.
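The 3000:8000 entry point is a real-mode segment:offset pair. As a quick sanity check, here is a minimal sketch (helper and macro names are hypothetical; the constants are the architectural defaults from the Intel SDM SMM chapter: SMBASE resets to 0x30000, the SMI handler starts at SMBASE + 0x8000, and the default SMRAM is 64 KB starting at SMBASE) showing that it resolves to linear address 0x38000, inside the default SMRAM window.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural defaults (Intel SDM, SMM chapter); names are illustrative. */
#define DEFAULT_SMBASE    0x30000u  /* SMBASE after a hardware reset */
#define SMI_ENTRY_OFFSET  0x8000u   /* SMI handler starts at SMBASE + 0x8000 */
#define DEFAULT_SMRAM_SZ  0x10000u  /* default SMRAM size: 64 KB */

/* Real-mode segment:offset -> linear address (segment * 16 + offset). */
static uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* True if addr lies in the 64 KB SMRAM window that starts at smbase. */
static bool in_default_smram(uint32_t smbase, uint32_t addr)
{
    return addr >= smbase && addr - smbase < DEFAULT_SMRAM_SZ;
}
```

For example, real_mode_linear(0x3000, 0x8000) yields 0x38000, which equals DEFAULT_SMBASE + SMI_ENTRY_OFFSET, and in_default_smram(DEFAULT_SMBASE, 0x38000) is true.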

Can we add a chipset feature to prevent DMA to the 64KB range from
0x3-0x3, and update the UEFI Memory Map and ACPI content
so the Guest OS knows not to use that range for DMA?

Thanks,

Mike

> -Original Message-
> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> Sent: Thursday, August 22, 2019 3:18 PM
> To: Kinney, Michael D ;
> Laszlo Ersek ; r...@edk2.groups.io;
> Yao, Jiewen 
> Cc: Alex Williamson ;
> de...@edk2.groups.io; qemu devel list  de...@nongnu.org>; Igor Mammedov ;
> Chen, Yingwen ; Nakajima, Jun
> ; Boris Ostrovsky
> ; Joao Marcal Lemos Martins
> ; Phillip Goerl
> 
> Subject: Re: [edk2-rfc] [edk2-devel] CPU hotplug using
> SMM with QEMU+OVMF
> 
> On 22/08/19 22:06, Kinney, Michael D wrote:
> > The SMBASE register is internal and cannot be directly accessed by any
> > CPU.  There is an SMBASE field that is member of the SMM Save State
> > area and can only be modified from SMM and requires the execution of
> > an RSM instruction from SMM for the SMBASE register to be updated from
> > the current SMBASE field value.  The new SMBASE register value is only
> > used on the next SMI.
> 
> Actually there is also an SMBASE MSR, even though in current silicon
> it's read-only and its use is theoretically limited to SMM-transfer
> monitors.  If that MSR could be made accessible somehow outside SMM,
> that would be great.
> 
> > Once all the CPUs have been initialized for SMM, the CPUs that are not
> > needed can be hot removed.  As noted above, the SMBASE value does not
> > change on an INIT.  So as long as the hot add operation does not do a
> > RESET, the SMBASE value must be preserved.
> 
> IIRC, hot-remove + hot-add will unplug/plug a completely different CPU.
> 
> > Another idea is to emulate this behavior.  If the hot plug controller
> > provides registers (only accessible from SMM) to assign the SMBASE
> > address for every CPU, then when a CPU is hot added, QEMU can set the
> > internal SMBASE register value from the hot plug controller register
> > value.  If the SMM Monarch sends an INIT or an SMI from the Local APIC
> > to the hot added CPU, then the SMBASE register should not be modified
> > and the CPU starts execution within TSEG the first time it receives an
> > SMI.
> 
> Yes, this would work.  But again---if the issue is real on current
> hardware too, I'd rather have a matching solution for virtual platforms.
> 
> If the current hardware for example remembers INIT-preserved state
> across hot-remove/hot-add, we could emulate that.
> 
> I guess the fundamental question is: how do bare metal platforms avoid
> this issue, or plan to avoid this issue?  Once we know that, we can use
> that information to find a way to implement it in KVM.  Only if that is
> impossible will we need a different strategy that is specific to our
> platform.
> 
> Paolo
> 
> > Jiewen and I can collect specific questions on this topic and continue
> > the discussion here.  For example, I do not think there is any method
> > other than what I referenced above to program the SMBASE register, but
> > I can ask if there are any other methods.

Re: [Qemu-devel] [PATCH 0/6] Refine exec

2019-08-22 Thread Wei Yang

On Thu, Aug 22, 2019 at 12:25:44PM +0200, Paolo Bonzini wrote:
>On 19/08/19 05:06, Wei Yang wrote:
>> On Thu, Mar 21, 2019 at 04:25:49PM +0800, Wei Yang wrote:
>>> This series refines exec a little.
>>>
>> 
>> Ping again.
>
>Queued all except 2, thanks!
>

Thanks~

>Paolo
>
>>> Wei Yang (6):
>>>  exec.c: replace hwaddr with uint64_t for better understanding
>>>  exec.c: remove an unnecessary assert on PHYS_MAP_NODE_NIL in
>>>phys_map_node_alloc()
>>>  exec.c: get nodes_nb_alloc with one MAX calculation
>>>  exec.c: subpage->sub_section is already initialized to 0
>>>  exec.c: correct the maximum skip value during compact
>>>  exec.c: add a check between constants to see whether we could skip
>>>
>>> exec.c | 21 ++---
>>> 1 file changed, 10 insertions(+), 11 deletions(-)
>>>
>>> -- 
>>> 2.19.1
>> 
>

-- 
Wei Yang
Help you, Help me

Re: [Qemu-devel] [PATCH v6 0/4] 9p: Fix file ID collisions

2019-08-22 Thread no-reply

Patchew URL: 
https://patchew.org/QEMU/cover.1566503584.git.qemu_...@crudebyte.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Subject: [Qemu-devel] [PATCH v6 0/4] 9p: Fix file ID collisions
Message-id: cover.1566503584.git.qemu_...@crudebyte.com

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag] patchew/cover.1566503584.git.qemu_...@crudebyte.com -> patchew/cover.1566503584.git.qemu_...@crudebyte.com
Submodule 'capstone' (https://git.qemu.org/git/capstone.git) registered for path 'capstone'
Submodule 'dtc' (https://git.qemu.org/git/dtc.git) registered for path 'dtc'
Submodule 'roms/QemuMacDrivers' (https://git.qemu.org/git/QemuMacDrivers.git) registered for path 'roms/QemuMacDrivers'
Submodule 'roms/SLOF' (https://git.qemu.org/git/SLOF.git) registered for path 'roms/SLOF'
Submodule 'roms/edk2' (https://git.qemu.org/git/edk2.git) registered for path 'roms/edk2'
Submodule 'roms/ipxe' (https://git.qemu.org/git/ipxe.git) registered for path 'roms/ipxe'
Submodule 'roms/openbios' (https://git.qemu.org/git/openbios.git) registered for path 'roms/openbios'
Submodule 'roms/openhackware' (https://git.qemu.org/git/openhackware.git) registered for path 'roms/openhackware'
Submodule 'roms/opensbi' (https://git.qemu.org/git/opensbi.git) registered for path 'roms/opensbi'
Submodule 'roms/qemu-palcode' (https://git.qemu.org/git/qemu-palcode.git) registered for path 'roms/qemu-palcode'
Submodule 'roms/seabios' (https://git.qemu.org/git/seabios.git/) registered for path 'roms/seabios'
Submodule 'roms/seabios-hppa' (https://git.qemu.org/git/seabios-hppa.git) registered for path 'roms/seabios-hppa'
Submodule 'roms/sgabios' (https://git.qemu.org/git/sgabios.git) registered for path 'roms/sgabios'
Submodule 'roms/skiboot' (https://git.qemu.org/git/skiboot.git) registered for path 'roms/skiboot'
Submodule 'roms/u-boot' (https://git.qemu.org/git/u-boot.git) registered for path 'roms/u-boot'
Submodule 'roms/u-boot-sam460ex' (https://git.qemu.org/git/u-boot-sam460ex.git) registered for path 'roms/u-boot-sam460ex'
Submodule 'slirp' (https://git.qemu.org/git/libslirp.git) registered for path 'slirp'
Submodule 'tests/fp/berkeley-softfloat-3' (https://git.qemu.org/git/berkeley-softfloat-3.git) registered for path 'tests/fp/berkeley-softfloat-3'
Submodule 'tests/fp/berkeley-testfloat-3' (https://git.qemu.org/git/berkeley-testfloat-3.git) registered for path 'tests/fp/berkeley-testfloat-3'
Submodule 'ui/keycodemapdb' (https://git.qemu.org/git/keycodemapdb.git) registered for path 'ui/keycodemapdb'
Cloning into 'capstone'...
Submodule path 'capstone': checked out '22ead3e0bfdb87516656453336160e0a37b066bf'
Cloning into 'dtc'...
Submodule path 'dtc': checked out '88f18909db731a627456f26d779445f84e449536'
Cloning into 'roms/QemuMacDrivers'...
Submodule path 'roms/QemuMacDrivers': checked out '90c488d5f4a407342247b9ea869df1c2d9c8e266'
Cloning into 'roms/SLOF'...
Submodule path 'roms/SLOF': checked out '7bfe584e321946771692711ff83ad2b5850daca7'
Cloning into 'roms/edk2'...
Submodule path 'roms/edk2': checked out '20d2e5a125e34fc8501026613a71549b2a1a3e54'
Submodule 'SoftFloat' (https://github.com/ucb-bar/berkeley-softfloat-3.git) registered for path 'ArmPkg/Library/ArmSoftFloatLib/berkeley-softfloat-3'
Submodule 'CryptoPkg/Library/OpensslLib/openssl' (https://github.com/openssl/openssl) registered for path 'CryptoPkg/Library/OpensslLib/openssl'
Cloning into 'ArmPkg/Library/ArmSoftFloatLib/berkeley-softfloat-3'...
Submodule path 'roms/edk2/ArmPkg/Library/ArmSoftFloatLib/berkeley-softfloat-3': checked out 'b64af41c3276f97f0e181920400ee056b9c88037'
Cloning into 'CryptoPkg/Library/OpensslLib/openssl'...
Submodule path 'roms/edk2/CryptoPkg/Library/OpensslLib/openssl': checked out '50eaac9f3337667259de725451f201e784599687'
Submodule 'boringssl' (https://boringssl.googlesource.com/boringssl) registered for path 'boringssl'
Submodule 'krb5' (https://github.com/krb5/krb5) registered for path 'krb5'
Submodule 'pyca.cryptography' (https://github.com/pyca/cryptography.git) registered for path 'pyca-cryptography'
Cloning into 'boringssl'...
Submodule path 'roms/edk2/CryptoPkg/Library/OpensslLib/openssl/boringssl': checked out '2070f8ad9151dc8f3a73bffaa146b5e6937a583f'
Cloning into 'krb5'...
Submodule path 'roms/edk2/CryptoPkg/Library/OpensslLib/openssl/krb5': checked out 'b9ad6c49505c96a088326b62a52568e3484f2168'
Cloning into 'pyca-cryptography'...
Submodule path 'roms/edk2/CryptoPkg/Library/OpensslLib/openssl/pyca-cryptography': checked out '09403100de2f6f1cdd0d484dcb8e620f1c335c8f'
Cloning into 'roms/ipxe'...
Submodule

Re: [Qemu-devel] [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

2019-08-22 Thread Paolo Bonzini

On 22/08/19 22:06, Kinney, Michael D wrote:
> The SMBASE register is internal and cannot be directly accessed 
> by any CPU.  There is an SMBASE field that is member of the SMM Save
> State area and can only be modified from SMM and requires the
> execution of an RSM instruction from SMM for the SMBASE register to
> be updated from the current SMBASE field value.  The new SMBASE
> register value is only used on the next SMI.

Actually there is also an SMBASE MSR, even though in current silicon
it's read-only and its use is theoretically limited to SMM-transfer
monitors.  If that MSR could be made accessible somehow outside SMM,
that would be great.

> Once all the CPUs have been initialized for SMM, the CPUs that are not needed
> can be hot removed.  As noted above, the SMBASE value does not change on
> an INIT.  So as long as the hot add operation does not do a RESET, the
> SMBASE value must be preserved.

IIRC, hot-remove + hot-add will unplug/plug a completely different CPU.

> Another idea is to emulate this behavior.  If the hot plug controller
> provides registers (only accessible from SMM) to assign the SMBASE address
> for every CPU, then when a CPU is hot added, QEMU can set the internal SMBASE
> register value from the hot plug controller register value.  If the SMM
> Monarch sends an INIT or an SMI from the Local APIC to the hot added CPU,
> then the SMBASE register should not be modified and the CPU starts execution
> within TSEG the first time it receives an SMI.

Yes, this would work.  But again---if the issue is real on current
hardware too, I'd rather have a matching solution for virtual platforms.

If the current hardware for example remembers INIT-preserved state across
hot-remove/hot-add, we could emulate that.

I guess the fundamental question is: how do bare metal platforms avoid
this issue, or plan to avoid this issue?  Once we know that, we can use
that information to find a way to implement it in KVM.  Only if that is
impossible will we need a different strategy that is specific to our platform.

Paolo

> Jiewen and I can collect specific questions on this topic and continue
> the discussion here.  For example, I do not think there is any method
> other than what I referenced above to program the SMBASE register, but
> I can ask if there are any other methods.

[Qemu-devel] [PATCH v6 4/4] 9p: Use variable length suffixes for inode remapping

2019-08-22 Thread Christian Schoenebeck via Qemu-devel

Use variable length suffixes for inode remapping instead of the fixed
16 bit size prefixes used before. With this change the inode numbers on
the guest will typically be much smaller (e.g. around >2^1 .. >2^7 instead
of >2^48 with the previous fixed size inode remapping).

Additionally, this solution is more efficient: since inode numbers on the
guest can in practice use almost their entire 64 bit range as well, there
is less need to generate and track additional suffixes. This might also be
beneficial for nested virtualization, where each level of virtualization
would shift up the inode bits and increase the chance of expensive
remapping actions.

The "Exponential Golomb" algorithm is used as basis for generating the
variable length suffixes. The algorithm has a parameter k which controls the
distribution of bits on increasing indices (minimum bits at low index vs.
maximum bits at high index). With k=0 the generated suffixes look like:

Index Dec/Bin -> Generated Suffix Bin
1 [1] -> [1] (1 bits)
2 [10] -> [010] (3 bits)
3 [11] -> [110] (3 bits)
4 [100] -> [00100] (5 bits)
5 [101] -> [10100] (5 bits)
6 [110] -> [01100] (5 bits)
7 [111] -> [11100] (5 bits)
8 [1000] -> [0001000] (7 bits)
9 [1001] -> [1001000] (7 bits)
10 [1010] -> [0101000] (7 bits)
11 [1011] -> [1101000] (7 bits)
12 [1100] -> [0011000] (7 bits)
...
65533 [1101] ->  [1011000] (31 bits)
65534 [1110] ->  [0111000] (31 bits)
65535 [] ->  [000] (31 bits)
Hence minBits=1 maxBits=31

And with k=5 they would look like:

Index Dec/Bin -> Generated Suffix Bin
1 [1] -> [01] (6 bits)
2 [10] -> [11] (6 bits)
3 [11] -> [010001] (6 bits)
4 [100] -> [110001] (6 bits)
5 [101] -> [001001] (6 bits)
6 [110] -> [101001] (6 bits)
7 [111] -> [011001] (6 bits)
8 [1000] -> [111001] (6 bits)
9 [1001] -> [000101] (6 bits)
10 [1010] -> [100101] (6 bits)
11 [1011] -> [010101] (6 bits)
12 [1100] -> [110101] (6 bits)
...
65533 [1101] -> [001110001000] (28 bits)
65534 [1110] -> [101110001000] (28 bits)
65535 [] -> [00001000] (28 bits)
Hence minBits=6 maxBits=28
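The two tables can be reproduced with a short sketch. It assumes (inferred from the table rows, not copied from the patch's code) that the suffix for 1-based index n is the order-k Exponential Golomb code of n - 1 with its bits mirrored. It reuses the byte-mirroring formula of mirror8bit() from this patch; Suffix and exp_golomb_suffix() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Byte bit-reversal, same formula as mirror8bit() in the patch. */
static uint8_t mirror8bit(uint8_t byte)
{
    return (byte * 0x0202020202ULL & 0x010884422010ULL) % 1023;
}

/* 64-bit bit-reversal; a loop equivalent of the patch's mirror64bit(). */
static uint64_t mirror64bit(uint64_t value)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        /* byte i of the input becomes byte (7 - i) of the output */
        r |= (uint64_t)mirror8bit((value >> (8 * i)) & 0xff) << (8 * (7 - i));
    }
    return r;
}

/* Hypothetical type: 'bits' suffix bits, value printed MSB-first as above. */
typedef struct {
    uint64_t value;
    int bits;
} Suffix;

/* Number of significant bits in v (0 for v == 0). */
static int bit_length(uint64_t v)
{
    int n = 0;
    while (v) {
        n++;
        v >>= 1;
    }
    return n;
}

/*
 * Order-k Exp-Golomb code of x: x + 2^k in binary (b bits) preceded by
 * b - k - 1 zeros, i.e. 2b - k - 1 bits. The suffix is that code mirrored.
 */
static Suffix exp_golomb_suffix(uint64_t index, int k)
{
    assert(index >= 1 && k >= 0 && k < 32);
    uint64_t v = (index - 1) + (1ULL << k);
    int b = bit_length(v);
    int bits = 2 * b - k - 1;
    assert(bits <= 64);
    /* leading zeros are implicit, so reversing the low 'bits' bits of v
       yields the mirrored code */
    Suffix s = { mirror64bit(v) >> (64 - bits), bits };
    return s;
}
```

For example, exp_golomb_suffix(5, 0) gives value 20 (binary 10100) with 5 bits, and exp_golomb_suffix(65535, 5) is 28 bits long, matching the rows above.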

Signed-off-by: Christian Schoenebeck 
---
 hw/9pfs/9p.c | 247 ---
 hw/9pfs/9p.h |  34 +++-
 2 files changed, 251 insertions(+), 30 deletions(-)

diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
index 728641fb7f..0359469cfa 100644
--- a/hw/9pfs/9p.c
+++ b/hw/9pfs/9p.c
@@ -26,6 +26,7 @@
 #include "migration/blocker.h"
 #include "sysemu/qtest.h"
 #include "qemu/xxhash.h"
+#include 
 
 int open_fd_hw;
 int total_open_fd;
@@ -572,6 +573,107 @@ static void coroutine_fn virtfs_reset(V9fsPDU *pdu)
 P9_STAT_MODE_NAMED_PIPE |   \
 P9_STAT_MODE_SOCKET)
 
+/* Mirrors all bits of a byte. So e.g. binary 1010 would become 0101. */
+static inline uint8_t mirror8bit(uint8_t byte)
+{
+return (byte * 0x0202020202ULL & 0x010884422010ULL) % 1023;
+}
+
+/* Same as mirror8bit() just for a 64 bit data type instead for a byte. */
+static inline uint64_t mirror64bit(uint64_t value)
+{
+return ((uint64_t)mirror8bit( value& 0xff) << 56) |
+   ((uint64_t)mirror8bit((value >> 8)  & 0xff) << 48) |
+   ((uint64_t)mirror8bit((value >> 16) & 0xff) << 40) |
+   ((uint64_t)mirror8bit((value >> 24) & 0xff) << 32) |
+   ((uint64_t)mirror8bit((value >> 32) & 0xff) << 24) |
+   ((uint64_t)mirror8bit((value >> 40) & 0xff) << 16) |
+   ((uint64_t)mirror8bit((value >> 48) & 0xff) << 8 ) |
+   ((uint64_t)mirror8bit((value >> 56) & 0xff)  ) ;
+}
+
+/** @brief Parameter k for the Exponential Golomb algorithm to be used.
+ *
+ * The smaller this value, the smaller the minimum bit count of the Exp.
+ * Golomb generated affixes will be (at lowest index), however at the
+ * price of a higher maximum bit count of generated affixes (at highest
+ * index). Likewise, increasing this parameter yields a smaller maximum bit
+ * count at the price of a higher minimum bit count.
+ *
+ * In practice that means: a good value for k depends on the expected number
+ * of devices to be exposed by one export. For a small number of devices k
+ * should be small, for a large number of devices k might be increased
+ * instead. The default of k=0 should be fine for most users though.
+ *
+ * @b IMPORTANT: If this ever becomes a runtime parameter, the value of
+ * k should not change while the guest is still running, because that would
+ * cause completely different inode numbers to be generated on the guest.
+ */
+#define EXP_GOLOMB_K    0
+
+/** @brief Exponential Golomb algorithm for arbitrary k (including k=0).
+ *
+ * The Exponential Golomb algorithm generates @b prefixes (@b not suffixes!)
+ * with growing length and with the mathematical property of

[Qemu-devel] [PATCH v6 0/4] 9p: Fix file ID collisions

2019-08-22 Thread Christian Schoenebeck via Qemu-devel

This is v6 of a proposed patch set for fixing file ID collisions with 9pfs.

v5->v6:

  * Rebased to https://github.com/gkurz/qemu/commits/9p-next
(SHA1 177fd3b6a8).

  * Replaced the previous boolean option 'remap_inodes' by the three-way
    option 'multidevs=remap|forbid|warn', where 'warn' is the new/old
    default behaviour, so that existing installations do not break:
    https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg07098.html

  * Dropped the incomplete fix in v9fs_do_readdir(), which aimed to prevent
    exposing info outside the export root via the '..' entry. Postponed
    this fix for now for the reasons described:
    https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg01862.html
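A minimal sketch of how the three 'multidevs' modes might be dispatched when building a QID path. The enum, macros and function are hypothetical, and the behaviour is inferred from this cover letter and the patches rather than copied from them: 'remap' always remaps, 'forbid' fails with -ENODEV for a file on another device (as in patch 1/4), and 'warn' reports once and keeps the plain inode number.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical representation of 'multidevs=remap|forbid|warn'. */
typedef enum {
    MULTIDEVS_WARN,   /* default: keep old behaviour, but warn */
    MULTIDEVS_FORBID, /* refuse files from other devices */
    MULTIDEVS_REMAP,  /* remap inode numbers so QID paths stay unique */
} MultidevsMode;

#define QID_USE_INODE 0  /* QID path = host inode number */
#define QID_REMAP     1  /* QID path = remapped (device, inode) pair */

/*
 * Decide how to build the QID path for a file on file_dev when the export
 * root lives on root_dev. Returns QID_USE_INODE, QID_REMAP or -ENODEV.
 * *warned implements a "report only once" policy.
 */
static int multidevs_action(MultidevsMode mode, dev_t root_dev,
                            dev_t file_dev, bool *warned)
{
    if (mode == MULTIDEVS_REMAP) {
        return QID_REMAP;
    }
    if (file_dev == root_dev) {
        return QID_USE_INODE;
    }
    if (mode == MULTIDEVS_FORBID) {
        return -ENODEV;
    }
    if (!*warned) {
        fprintf(stderr, "9p: multiple devices in one export; "
                "QID paths may collide\n");
        *warned = true;
    }
    return QID_USE_INODE;
}
```

Keeping 'warn' as the default means existing setups keep working unchanged, while 'forbid' and 'remap' let users opt into strictness or into collision-free QID paths.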

Christian Schoenebeck (4):
  9p: Treat multiple devices on one export as an error
  9p: Added virtfs option 'multidevs=remap|forbid|warn'
  9p: stat_to_qid: implement slow path
  9p: Use variable length suffixes for inode remapping

 fsdev/file-op-9p.h  |   5 +
 fsdev/qemu-fsdev-opts.c |   7 +-
 fsdev/qemu-fsdev.c  |  11 ++
 hw/9pfs/9p.c| 488 +---
 hw/9pfs/9p.h|  51 +
 qemu-options.hx |  33 +++-
 vl.c|   6 +-
 7 files changed, 565 insertions(+), 36 deletions(-)

-- 
2.11.0

[Qemu-devel] [PATCH v6 1/4] 9p: Treat multiple devices on one export as an error

2019-08-22 Thread Christian Schoenebeck via Qemu-devel

The QID path should uniquely identify a file. However, the
inode of a file is currently used as the QID path, which
on its own only uniquely identifies files within a device.
Here we track the device hosting the 9pfs share, in order
to prevent security issues with QID path collisions from
other devices.

Signed-off-by: Antonios Motakis 
[CS: - Assign dev_id to export root's device already in
   v9fs_device_realize_common(), not postponed in
   stat_to_qid().
 - error_report_once() if more than one device was
   shared by export.
 - Return -ENODEV instead of -ENOSYS in stat_to_qid().
 - Fixed typo in log comment. ]
Signed-off-by: Christian Schoenebeck 
---
 hw/9pfs/9p.c | 69 
 hw/9pfs/9p.h |  1 +
 2 files changed, 56 insertions(+), 14 deletions(-)

diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
index 586a6dccba..8cc65c2c67 100644
--- a/hw/9pfs/9p.c
+++ b/hw/9pfs/9p.c
@@ -572,10 +572,18 @@ static void coroutine_fn virtfs_reset(V9fsPDU *pdu)
 P9_STAT_MODE_SOCKET)
 
 /* This is the algorithm from ufs in spfs */
-static void stat_to_qid(const struct stat *stbuf, V9fsQID *qidp)
+static int stat_to_qid(V9fsPDU *pdu, const struct stat *stbuf, V9fsQID *qidp)
 {
 size_t size;
 
+if (pdu->s->dev_id != stbuf->st_dev) {
+error_report_once(
+"9p: Multiple devices detected in same VirtFS export. "
+"You must use a separate export for each device."
+);
+return -ENODEV;
+}
+
 memset(>path, 0, sizeof(qidp->path));
 size = MIN(sizeof(stbuf->st_ino), sizeof(qidp->path));
 memcpy(>path, >st_ino, size);
@@ -587,6 +595,8 @@ static void stat_to_qid(const struct stat *stbuf, V9fsQID *qidp)
 if (S_ISLNK(stbuf->st_mode)) {
 qidp->type |= P9_QID_TYPE_SYMLINK;
 }
+
+return 0;
 }
 
 static int coroutine_fn fid_to_qid(V9fsPDU *pdu, V9fsFidState *fidp,
@@ -599,7 +609,10 @@ static int coroutine_fn fid_to_qid(V9fsPDU *pdu, V9fsFidState *fidp,
 if (err < 0) {
 return err;
 }
-stat_to_qid(, qidp);
+err = stat_to_qid(pdu, , qidp);
+if (err < 0) {
+return err;
+}
 return 0;
 }
 
@@ -830,7 +843,10 @@ static int coroutine_fn stat_to_v9stat(V9fsPDU *pdu, V9fsPath *path,
 
 memset(v9stat, 0, sizeof(*v9stat));
 
-stat_to_qid(stbuf, >qid);
+err = stat_to_qid(pdu, stbuf, >qid);
+if (err < 0) {
+return err;
+}
 v9stat->mode = stat_to_v9mode(stbuf);
 v9stat->atime = stbuf->st_atime;
 v9stat->mtime = stbuf->st_mtime;
@@ -891,7 +907,7 @@ static int coroutine_fn stat_to_v9stat(V9fsPDU *pdu, V9fsPath *path,
 #define P9_STATS_ALL   0x3fffULL /* Mask for All fields above */
 
 
-static void stat_to_v9stat_dotl(V9fsState *s, const struct stat *stbuf,
+static int stat_to_v9stat_dotl(V9fsPDU *pdu, const struct stat *stbuf,
 V9fsStatDotl *v9lstat)
 {
 memset(v9lstat, 0, sizeof(*v9lstat));
@@ -913,7 +929,7 @@ static void stat_to_v9stat_dotl(V9fsState *s, const struct stat *stbuf,
 /* Currently we only support BASIC fields in stat */
 v9lstat->st_result_mask = P9_STATS_BASIC;
 
-stat_to_qid(stbuf, >qid);
+return stat_to_qid(pdu, stbuf, >qid);
 }
 
 static void print_sg(struct iovec *sg, int cnt)
@@ -1115,7 +1131,6 @@ static void coroutine_fn v9fs_getattr(void *opaque)
 uint64_t request_mask;
 V9fsStatDotl v9stat_dotl;
 V9fsPDU *pdu = opaque;
-V9fsState *s = pdu->s;
 
 retval = pdu_unmarshal(pdu, offset, "dq", , _mask);
 if (retval < 0) {
@@ -1136,7 +1151,10 @@ static void coroutine_fn v9fs_getattr(void *opaque)
 if (retval < 0) {
 goto out;
 }
-stat_to_v9stat_dotl(s, , _dotl);
+retval = stat_to_v9stat_dotl(pdu, , _dotl);
+if (retval < 0) {
+goto out;
+}
 
 /*  fill st_gen if requested and supported by underlying fs */
 if (request_mask & P9_STATS_GEN) {
@@ -1381,7 +1399,10 @@ static void coroutine_fn v9fs_walk(void *opaque)
 if (err < 0) {
 goto out;
 }
-stat_to_qid(, );
+err = stat_to_qid(pdu, , );
+if (err < 0) {
+goto out;
+}
 v9fs_path_copy(, );
 }
 memcpy([name_idx], , sizeof(qid));
@@ -1483,7 +1504,10 @@ static void coroutine_fn v9fs_open(void *opaque)
 if (err < 0) {
 goto out;
 }
-stat_to_qid(, );
+err = stat_to_qid(pdu, , );
+if (err < 0) {
+goto out;
+}
 if (S_ISDIR(stbuf.st_mode)) {
 err = v9fs_co_opendir(pdu, fidp);
 if (err < 0) {
@@ -1593,7 +1617,10 @@ static void coroutine_fn v9fs_lcreate(void *opaque)
 fidp->flags |= FID_NON_RECLAIMABLE;
 }
 iounit =  get_iounit(pdu, >path);
-stat_to_qid(, );
+err = stat_to_qid(pdu, , );
+if (err < 0) {
+goto out;
+}
 err = pdu_marshal(pdu, offset, "Qd", ,

[Qemu-devel] [PATCH v6 3/4] 9p: stat_to_qid: implement slow path

2019-08-22 Thread Christian Schoenebeck via Qemu-devel

stat_to_qid attempts via qid_path_prefixmap to map unique files (which are
identified by 64 bit inode nr and 32 bit device id) to a 64 bit QID path value.
However this implementation makes some assumptions about inode number
generation on the host.

If qid_path_prefixmap fails, we still have 48 bits available in the QID
path to fall back to a less memory efficient full mapping.
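The prefix-map idea can be sketched as follows, assuming the layout implied above: the top 16 bits of the 64 bit QID path hold a small id standing for a (device, top 16 inode bits) pair, and the low 48 bits are copied from the inode. The fixed-size array and all names are hypothetical stand-ins for the qht-based table; when it is full, the function fails, which is where this patch falls back to the full map.

```c
#include <assert.h>
#include <stdint.h>

#define QPATH_INO_BITS 48                              /* inode bits kept as-is */
#define QPATH_INO_MASK ((1ULL << QPATH_INO_BITS) - 1)
#define MAX_PREFIXES   4   /* tiny on purpose; 16 prefix bits allow 65536 */

/* Hypothetical stand-in for a prefix-table entry. */
typedef struct {
    uint32_t dev;         /* host device id */
    uint16_t ino_prefix;  /* top 16 bits of the host inode number */
} PrefixEntry;

static PrefixEntry prefixes[MAX_PREFIXES];
static int prefixes_used;

/*
 * Map (dev, ino) to a QID path: the slot index of the (dev, ino_prefix)
 * pair goes in the top 16 bits, the low 48 inode bits are copied.
 * Returns 0 on success, -1 when no slot is left (full-map fallback).
 */
static int qid_path_prefixmap_sketch(uint32_t dev, uint64_t ino, uint64_t *path)
{
    uint16_t prefix = (uint16_t)(ino >> QPATH_INO_BITS);
    int slot;

    for (slot = 0; slot < prefixes_used; slot++) {
        if (prefixes[slot].dev == dev && prefixes[slot].ino_prefix == prefix) {
            break;
        }
    }
    if (slot == prefixes_used) {
        if (prefixes_used == MAX_PREFIXES) {
            return -1;
        }
        prefixes[prefixes_used++] = (PrefixEntry){ .dev = dev, .ino_prefix = prefix };
    }
    *path = ((uint64_t)slot << QPATH_INO_BITS) | (ino & QPATH_INO_MASK);
    return 0;
}
```

Two files with the same inode number on different devices thus get different QID paths, at the cost of one table entry per distinct (device, prefix) pair.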

Signed-off-by: Antonios Motakis 
[CS: - Rebased to https://github.com/gkurz/qemu/commits/9p-next
   (SHA1 177fd3b6a8).
 - Updated hash calls to new xxhash API.
 - Removed unnecessary parentheses in qpf_lookup_func().
 - Removed unnecessary g_malloc0() result checks.
 - Log error message when running out of prefixes in
   qid_path_fullmap().
 - Log error message about potential degraded performance in
   qid_path_prefixmap().
 - Fixed typo in comment. ]
Signed-off-by: Christian Schoenebeck 
---
 hw/9pfs/9p.c | 70 ++--
 hw/9pfs/9p.h |  9 
 2 files changed, 72 insertions(+), 7 deletions(-)

diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
index c96ea51116..728641fb7f 100644
--- a/hw/9pfs/9p.c
+++ b/hw/9pfs/9p.c
@@ -579,23 +579,73 @@ static uint32_t qpp_hash(QppEntry e)
 return qemu_xxhash7(e.ino_prefix, e.dev, 0, 0, 0);
 }
 
+static uint32_t qpf_hash(QpfEntry e)
+{
+return qemu_xxhash7(e.ino, e.dev, 0, 0, 0);
+}
+
 static bool qpp_lookup_func(const void *obj, const void *userp)
 {
 const QppEntry *e1 = obj, *e2 = userp;
 return e1->dev == e2->dev && e1->ino_prefix == e2->ino_prefix;
 }
 
-static void qpp_table_remove(void *p, uint32_t h, void *up)
+static bool qpf_lookup_func(const void *obj, const void *userp)
+{
+const QpfEntry *e1 = obj, *e2 = userp;
+return e1->dev == e2->dev && e1->ino == e2->ino;
+}
+
+static void qp_table_remove(void *p, uint32_t h, void *up)
 {
 g_free(p);
 }
 
-static void qpp_table_destroy(struct qht *ht)
+static void qp_table_destroy(struct qht *ht)
 {
-qht_iter(ht, qpp_table_remove, NULL);
+qht_iter(ht, qp_table_remove, NULL);
 qht_destroy(ht);
 }
 
+static int qid_path_fullmap(V9fsPDU *pdu, const struct stat *stbuf,
+uint64_t *path)
+{
+QpfEntry lookup = {
+.dev = stbuf->st_dev,
+.ino = stbuf->st_ino
+}, *val;
+uint32_t hash = qpf_hash(lookup);
+
+/* most users won't need the fullmap, so init the table lazily */
+if (!pdu->s->qpf_table.map) {
+qht_init(&pdu->s->qpf_table, qpf_lookup_func, 1 << 16, QHT_MODE_AUTO_RESIZE);
+}
+
+val = qht_lookup(&pdu->s->qpf_table, &lookup, hash);
+
+if (!val) {
+if (pdu->s->qp_fullpath_next == 0) {
+/* no more files can be mapped :'( */
+error_report_once(
+"9p: No more prefixes available for remapping inodes from "
+"host to guest."
+);
+return -ENFILE;
+}
+
+val = g_malloc0(sizeof(QpfEntry));
+*val = lookup;
+
+/* new unique inode and device combo */
+val->path = pdu->s->qp_fullpath_next++;
+pdu->s->qp_fullpath_next &= QPATH_INO_MASK;
+qht_insert(&pdu->s->qpf_table, val, hash, NULL);
+}
+
+*path = val->path;
+return 0;
+}
+
 /* stat_to_qid needs to map inode number (64 bits) and device id (32 bits)
  * to a unique QID path (64 bits). To avoid having to map and keep track
  * of up to 2^64 objects, we map only the 16 highest bits of the inode plus
@@ -621,8 +671,7 @@ static int qid_path_prefixmap(V9fsPDU *pdu, const struct stat *stbuf,
 if (pdu->s->qp_prefix_next == 0) {
 /* we ran out of prefixes */
 error_report_once(
-"9p: No more prefixes available for remapping inodes from "
-"host to guest."
+"9p: Potential degraded performance of inode remapping"
 );
 return -ENFILE;
 }
@@ -647,6 +696,10 @@ static int stat_to_qid(V9fsPDU *pdu, const struct stat *stbuf, V9fsQID *qidp)
 if (pdu->s->ctx.export_flags & V9FS_REMAP_INODES) {
 /* map inode+device to qid path (fast path) */
 err = qid_path_prefixmap(pdu, stbuf, &qidp->path);
+if (err == -ENFILE) {
+/* fast path didn't work, fall back to full map */
+err = qid_path_fullmap(pdu, stbuf, &qidp->path);
+}
 if (err) {
 return err;
 }
@@ -3813,6 +3866,7 @@ int v9fs_device_realize_common(V9fsState *s, const V9fsTransport *t,
 /* QID path hash table. 1 entry ought to be enough for anybody ;) */
 qht_init(&s->qpp_table, qpp_lookup_func, 1, QHT_MODE_AUTO_RESIZE);
 s->qp_prefix_next = 1; /* reserve 0 to detect overflow */
+s->qp_fullpath_next = 1;
 
 s->ctx.fst = >fst;
 fsdev_throttle_init(s->ctx.fst);
@@ -3827,7 +3881,8 @@ out:
 }
 g_free(s->tag);
 g_free(s->ctx.fs_root);
-qpp_table_destroy(&s->qpp_table);
+qp_table_destroy(&s->qpp_table);
+
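
The fast-path/slow-path split in this patch can be illustrated with a minimal, self-contained toy. It is not the QEMU implementation: linear arrays stand in for the `qht` hash tables, the capacities are deliberately tiny, and `toy_prefixmap()`, `toy_fullmap()` and `toy_stat_to_qid_path()` are hypothetical names playing the roles of `qid_path_prefixmap()`, `qid_path_fullmap()` and `stat_to_qid()`. As in the patch, prefix 0 is reserved and the slow path hands out sequential path numbers kept within 48 bits.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Low 48 bits of a QID path carry the inode bits, top 16 bits a prefix. */
#define QPATH_INO_MASK ((UINT64_C(1) << 48) - 1)

/* Hypothetical, deliberately tiny capacities so the fallback is easy to hit. */
#define TOY_MAX_PREFIXES 4
#define TOY_MAX_FULL 16

typedef struct { uint32_t dev; uint64_t ino_prefix; uint64_t prefix; } ToyPrefixEntry;
typedef struct { uint32_t dev; uint64_t ino; uint64_t path; } ToyFullEntry;

static ToyPrefixEntry prefix_tab[TOY_MAX_PREFIXES];
static size_t prefix_used;
static ToyFullEntry full_tab[TOY_MAX_FULL];
static size_t full_used;
static uint64_t fullpath_next = 1; /* 0 is reserved to detect wrap-around */

/* Fast path: one entry per (device, top 16 inode bits) pair. */
static int toy_prefixmap(uint32_t dev, uint64_t ino, uint64_t *path)
{
    uint64_t ino_prefix = ino >> 48;
    size_t i;

    for (i = 0; i < prefix_used; i++) {
        if (prefix_tab[i].dev == dev && prefix_tab[i].ino_prefix == ino_prefix) {
            break;
        }
    }
    if (i == prefix_used) {
        if (prefix_used == TOY_MAX_PREFIXES) {
            return -ENFILE; /* out of prefixes */
        }
        /* Prefixes start at 1; prefix 0 is never handed out here. */
        prefix_tab[i] = (ToyPrefixEntry){ dev, ino_prefix, (uint64_t)i + 1 };
        prefix_used++;
    }
    *path = (prefix_tab[i].prefix << 48) | (ino & QPATH_INO_MASK);
    return 0;
}

/* Slow path: one entry per file, numbered sequentially. */
static int toy_fullmap(uint32_t dev, uint64_t ino, uint64_t *path)
{
    for (size_t i = 0; i < full_used; i++) {
        if (full_tab[i].dev == dev && full_tab[i].ino == ino) {
            *path = full_tab[i].path;
            return 0;
        }
    }
    if (full_used == TOY_MAX_FULL || fullpath_next == 0) {
        return -ENFILE;
    }
    full_tab[full_used++] = (ToyFullEntry){ dev, ino, fullpath_next };
    *path = fullpath_next;
    /* Keep the counter within 48 bits; wrapping to 0 means "exhausted". */
    fullpath_next = (fullpath_next + 1) & QPATH_INO_MASK;
    return 0;
}

static int toy_stat_to_qid_path(uint32_t dev, uint64_t ino, uint64_t *path)
{
    int err = toy_prefixmap(dev, ino, path);
    if (err == -ENFILE) {
        err = toy_fullmap(dev, ino, path); /* fast path exhausted: fall back */
    }
    return err;
}
```

Because the fast path in this toy never uses prefix 0, slow-path paths (whose top 16 bits are zero) cannot collide with fast-path paths.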

[Qemu-devel] [PATCH v6 2/4] 9p: Added virtfs option 'multidevs=remap|forbid|warn'

2019-08-22 Thread Christian Schoenebeck via Qemu-devel

'warn' (default): Only log an error message (once) on the host if more than
one device is shared by the same export, but otherwise ignore this config
error. This is the default behaviour so as not to break existing
installations, on the assumption that their users know what they are doing.

'forbid': Like 'warn', but in addition to logging an error, this also denies
the guest access to additional devices.

'remap': Allows sharing more than one device per export by remapping
inodes from host to guest appropriately. To support multiple devices on the
9p share, and avoid QID path collisions, we take the device id as input to
generate a unique QID path. The lowest 48 bits of the path will be set
equal to the file inode, and the top bits will be uniquely assigned based
on the top 16 bits of the inode and the device id.

Signed-off-by: Antonios Motakis 
[CS: - Rebased to https://github.com/gkurz/qemu/commits/9p-next
   (SHA1 177fd3b6a8).
 - Updated hash calls to new xxhash API.
 - Added virtfs option 'multidevs', original patch simply did the inode
   remapping without being asked.
 - Updated docs for new option 'multidevs'.
 - Capture root_ino in v9fs_device_realize_common() as well, not just
   the device id.
 - Fixed v9fs_do_readdir() not having remapped inodes.
 - Log error message when running out of prefixes in
   qid_path_prefixmap().
 - Fixed definition of QPATH_INO_MASK.
 - Dropped unnecessary parentheses in qpp_lookup_func().
 - Dropped unnecessary g_malloc0() result checks. ]
Signed-off-by: Christian Schoenebeck 
---
 fsdev/file-op-9p.h  |   5 ++
 fsdev/qemu-fsdev-opts.c |   7 +-
 fsdev/qemu-fsdev.c  |  11 +++
 hw/9pfs/9p.c| 182 ++--
 hw/9pfs/9p.h|  13 
 qemu-options.hx |  33 +++--
 vl.c|   6 +-
 7 files changed, 229 insertions(+), 28 deletions(-)

diff --git a/fsdev/file-op-9p.h b/fsdev/file-op-9p.h
index c757c8099f..f2f7772c86 100644
--- a/fsdev/file-op-9p.h
+++ b/fsdev/file-op-9p.h
@@ -59,6 +59,11 @@ typedef struct ExtendedOps {
 #define V9FS_RDONLY 0x0040
 #define V9FS_PROXY_SOCK_FD  0x0080
 #define V9FS_PROXY_SOCK_NAME0x0100
+/*
+ * multidevs option (either one of the two applies exclusively)
+ */
+#define V9FS_REMAP_INODES   0x0200
+#define V9FS_FORBID_MULTIDEVS   0x0400
 
 #define V9FS_SEC_MASK   0x003C
 
diff --git a/fsdev/qemu-fsdev-opts.c b/fsdev/qemu-fsdev-opts.c
index 7c31af..07a18c6e48 100644
--- a/fsdev/qemu-fsdev-opts.c
+++ b/fsdev/qemu-fsdev-opts.c
@@ -31,7 +31,9 @@ static QemuOptsList qemu_fsdev_opts = {
 }, {
 .name = "readonly",
 .type = QEMU_OPT_BOOL,
-
+}, {
+.name = "multidevs",
+.type = QEMU_OPT_STRING,
 }, {
 .name = "socket",
 .type = QEMU_OPT_STRING,
@@ -76,6 +78,9 @@ static QemuOptsList qemu_virtfs_opts = {
 .name = "readonly",
 .type = QEMU_OPT_BOOL,
 }, {
+.name = "multidevs",
+.type = QEMU_OPT_STRING,
+}, {
 .name = "socket",
 .type = QEMU_OPT_STRING,
 }, {
diff --git a/fsdev/qemu-fsdev.c b/fsdev/qemu-fsdev.c
index 077a8c4e2b..ed03d559a9 100644
--- a/fsdev/qemu-fsdev.c
+++ b/fsdev/qemu-fsdev.c
@@ -58,6 +58,7 @@ static FsDriverTable FsDrivers[] = {
 "writeout",
 "fmode",
 "dmode",
+"multidevs",
 "throttling.bps-total",
 "throttling.bps-read",
 "throttling.bps-write",
@@ -121,6 +122,7 @@ int qemu_fsdev_add(QemuOpts *opts, Error **errp)
 const char *fsdev_id = qemu_opts_id(opts);
 const char *fsdriver = qemu_opt_get(opts, "fsdriver");
 const char *writeout = qemu_opt_get(opts, "writeout");
+const char *multidevs = qemu_opt_get(opts, "multidevs");
 bool ro = qemu_opt_get_bool(opts, "readonly", 0);
 
 if (!fsdev_id) {
@@ -161,6 +163,15 @@ int qemu_fsdev_add(QemuOpts *opts, Error **errp)
 } else {
 fsle->fse.export_flags &= ~V9FS_RDONLY;
 }
+if (multidevs) {
+if (!strcmp(multidevs, "remap")) {
+fsle->fse.export_flags &= ~V9FS_FORBID_MULTIDEVS;
+fsle->fse.export_flags |= V9FS_REMAP_INODES;
+} else if (!strcmp(multidevs, "forbid")) {
+fsle->fse.export_flags &= ~V9FS_REMAP_INODES;
+fsle->fse.export_flags |= V9FS_FORBID_MULTIDEVS;
+}
+}
 
 if (fsle->fse.ops->parse_opts) {
 if (fsle->fse.ops->parse_opts(opts, &fsle->fse, errp)) {
diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
index 8cc65c2c67..c96ea51116 100644
--- a/hw/9pfs/9p.c
+++ b/hw/9pfs/9p.c
@@ -25,6 +25,7 @@
 #include "trace.h"
 #include "migration/blocker.h"
 #include "sysemu/qtest.h"
+#include "qemu/xxhash.h"
 
 int open_fd_hw;
 int total_open_fd;
@@ -571,22 +572,109 @@ static void coroutine_fn
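
In the `qemu_fsdev_add()` hunk above, any `multidevs` value other than "remap" or "forbid" (including "warn" and any typo) leaves `export_flags` unchanged, so a misspelt value silently falls back to the default. A minimal sketch of a stricter parser follows, using the flag values this patch adds to fsdev/file-op-9p.h; `parse_multidevs()` is a hypothetical helper, and reporting the error through `errp` is left out.

```c
#include <errno.h>
#include <string.h>

/* Flag values as added to fsdev/file-op-9p.h by this patch. */
#define V9FS_REMAP_INODES     0x0200
#define V9FS_FORBID_MULTIDEVS 0x0400

/* Hypothetical helper: apply a multidevs=... value to export_flags,
 * rejecting unknown values instead of silently ignoring them. */
static int parse_multidevs(const char *value, int *export_flags)
{
    if (value == NULL) {
        return 0; /* option not given: keep the default ('warn') */
    }
    if (strcmp(value, "remap") == 0) {
        *export_flags &= ~V9FS_FORBID_MULTIDEVS;
        *export_flags |= V9FS_REMAP_INODES;
    } else if (strcmp(value, "forbid") == 0) {
        *export_flags &= ~V9FS_REMAP_INODES;
        *export_flags |= V9FS_FORBID_MULTIDEVS;
    } else if (strcmp(value, "warn") == 0) {
        *export_flags &= ~(V9FS_REMAP_INODES | V9FS_FORBID_MULTIDEVS);
    } else {
        return -EINVAL; /* e.g. a typo such as "remapp" */
    }
    return 0;
}
```

Handling "warn" explicitly also lets a later `multidevs=warn` undo an earlier setting, which the hunk above cannot do.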

Re: [Qemu-devel] [PATCH 1/4] configure: keep track of Python version

2019-08-22 Thread Eduardo Habkost

On Thu, Aug 22, 2019 at 05:19:26PM -0400, Cleber Rosa wrote:
> On Thu, Aug 22, 2019 at 05:48:46PM +0100, Peter Maydell wrote:
> > On Fri, 9 Nov 2018 at 15:09, Cleber Rosa  wrote:
> > >
> > > Some functionality is dependent on the Python version
> > > detected/configured on configure.  While it's possible to run the
> > > Python version later and check for the version, doing it once is
> > > preferable.  Also, it's a relevant information to keep in build logs,
> > > as the overall behavior of the build can be affected by it.
> > >
> > > Signed-off-by: Cleber Rosa 
> > > ---
> > >  configure | 6 +-
> > >  1 file changed, 5 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/configure b/configure
> > > index 74e313a810..67fff0290d 100755
> > > --- a/configure
> > > +++ b/configure
> > > @@ -1740,6 +1740,9 @@ if ! $python -c 'import sys; sys.exit(sys.version_info < (2,7))'; then
> > >"Use --python=/path/to/python to specify a supported Python."
> > >  fi
> > >
> > > +# Preserve python version since some functionality is dependent on it
> > > +python_version=$($python -V 2>&1 | sed -e 's/Python\ //')
> > > +
> > 
> > Hi. Somebody on IRC has just fallen over a problem where
> > their python's "-V" output prints multiple lines, which
> > means that "$python_version" here is multiple lines, which
> > means that the eventual config-host.mak has invalid syntax
> > because we assume here:
> >
> 
> We've tried a number of things, and just when I thought we wouldn't be
> able to make any sense out of it, I arrived at a still senseless but
> precise reproducer.  TL;DR: it has to do with interactive shells and
> that exact Python build.
> 
> Reproducer (docker may also do the trick here):
> 
>   $ podman run --rm -ti fedora:29 /bin/bash -c 'dnf -y install http://mirror.siena.edu/fedora/linux/releases/29/Everything/x86_64/os/Packages/p/python3-3.7.0-9.fc29.x86_64.rpm; python3 -V'
>   Python 3.7.0 (default, Aug 30 2018, 14:32:33) 
>   [GCC 8.2.1 20180801 (Red Hat 8.2.1-2)]
> 
> With an interactive shell instead:
> 
>   $ podman run --rm -ti fedora:29 /bin/bash -i -c 'dnf -y install http://mirror.siena.edu/fedora/linux/releases/29/Everything/x86_64/os/Packages/p/python3-3.7.0-9.fc29.x86_64.rpm; python3 -V'
>   Python 3.7.0
> 
> How this behavior came to be baffles me.  But it seems to be fixed
> on newer versions.
> 
> > > @@ -6823,6 +6826,7 @@ echo "INSTALL_DATA=$install -c -m 0644" >> $config_host_mak
> > >  echo "INSTALL_PROG=$install -c -m 0755" >> $config_host_mak
> > >  echo "INSTALL_LIB=$install -c -m 0644" >> $config_host_mak
> > >  echo "PYTHON=$python" >> $config_host_mak
> > > +echo "PYTHON_VERSION=$python_version" >> $config_host_mak
> > >  echo "CC=$cc" >> $config_host_mak
> > >  if $iasl -h > /dev/null 2>&1; then
> > >echo "IASL=$iasl" >> $config_host_mak
> > 
> > that it's only one line, and will generate bogus makefile
> > syntax if it's got an embedded newline. (Problem system
> > seems to be Fedora 29.)
> >
> 
> The assumption could be guaranteed by a "head -1", and while
> it's not a foolproof solution, it would at least not corrupt
> the makefile and the whole build system.
> 
> > I've reread this thread, where there seems to have been
> > some discussion about just running Python itself to
> > get the sys.version value (which is how we check for
> > "is this python too old" earlier in the configure script).
> > But I'm not really clear why trying to parse -V output is better:
> > it's definitely less reliable, as demonstrated by this bug.

Agreed.

> >
> > Given that the only thing as far as I can tell that we
> > do with PYTHON_VERSION is use it in tests/Makefile.inc
> > to suppress a bit of test functionality if we don't have
> > Python 3, could we stop trying to parse -V output and run
> > python to print sys.version_info instead, and/or just
> > have the makefile variable track "is this python 2",
> > since that's what we really care about and would mean we
> > don't have to then search the string for "v2"  ?
> 
> Because I've been bitten way too many times with differences in Python
> minor versions, I see a lot of value in keeping the version
> information in the build system.  But, the same information can
> certainly be obtained in a more resilient way.  Would you object to
> something like:
> 
>   python_version=$($python -c 'import sys; print(sys.version().split()[0])')

Sounds much better, but why sys.version().split() instead of
sys.version_info?

  python_version=$($python -c 'import sys; print(sys.version_info[0])')

> 
> Or an even more paranoid version?  On my side, I understand the
> fragility of the current approach, but I also appreciate the
> information it stores.

We have only one place where $(PYTHON_VERSION) is used, and that
code will be removed once we stop supporting Python 2.  I don't
see the point of trying to store extra information that is not
used anywhere in our makefiles.

-- 
Eduardo
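
In the configure discussion above, note that `sys.version` is a string attribute in Python rather than a function, so the proposed `sys.version().split()[0]` would raise a TypeError; `sys.version.split()[0]`, or `sys.version_info` as Eduardo suggests, avoids that. The "head -1" safeguard Cleber mentions boils down to keeping only the first line of the `-V` output. Below is a C sketch of that sanitisation with a hypothetical helper name: it emulates `head -1` followed by the patch's `sed -e 's/Python\ //'`, except that it only strips a leading "Python " (the sed expression removes the first match anywhere on the line).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy only the first line of `python -V` output into
 * out (outsz must be > 0), dropping a leading "Python ", so that a
 * PYTHON_VERSION=... assignment can never span several lines. */
static void first_line_version(const char *v_output, char *out, size_t outsz)
{
    const char *p = v_output;
    size_t n;

    if (strncmp(p, "Python ", 7) == 0) {
        p += 7;
    }
    n = strcspn(p, "\r\n"); /* length up to the first line break */
    if (n >= outsz) {
        n = outsz - 1;      /* truncate rather than overflow */
    }
    memcpy(out, p, n);
    out[n] = '\0';
}
```

With the two-line output from the Fedora 29 reproducer, only "3.7.0 (default, Aug 30 2018, 14:32:33) " survives, which is still a single, valid makefile line.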

Re: [Qemu-devel] [PATCH 1/4] configure: keep track of Python version

2019-08-22 Thread Cleber Rosa

On Thu, Aug 22, 2019 at 05:48:46PM +0100, Peter Maydell wrote:
> On Fri, 9 Nov 2018 at 15:09, Cleber Rosa  wrote:
> >
> > Some functionality is dependent on the Python version
> > detected/configured on configure.  While it's possible to run the
> > Python version later and check for the version, doing it once is
> > preferable.  Also, it's a relevant information to keep in build logs,
> > as the overall behavior of the build can be affected by it.
> >
> > Signed-off-by: Cleber Rosa 
> > ---
> >  configure | 6 +-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/configure b/configure
> > index 74e313a810..67fff0290d 100755
> > --- a/configure
> > +++ b/configure
> > @@ -1740,6 +1740,9 @@ if ! $python -c 'import sys; sys.exit(sys.version_info < (2,7))'; then
> >"Use --python=/path/to/python to specify a supported Python."
> >  fi
> >
> > +# Preserve python version since some functionality is dependent on it
> > +python_version=$($python -V 2>&1 | sed -e 's/Python\ //')
> > +
> 
> Hi. Somebody on IRC has just fallen over a problem where
> their python's "-V" output prints multiple lines, which
> means that "$python_version" here is multiple lines, which
> means that the eventual config-host.mak has invalid syntax
> because we assume here:
>

We've tried a number of things, and just when I thought we wouldn't be
able to make any sense out of it, I arrived at a still senseless but
precise reproducer.  TL;DR: it has to do with interactive shells and
that exact Python build.

Reproducer (docker may also do the trick here):

  $ podman run --rm -ti fedora:29 /bin/bash -c 'dnf -y install http://mirror.siena.edu/fedora/linux/releases/29/Everything/x86_64/os/Packages/p/python3-3.7.0-9.fc29.x86_64.rpm; python3 -V'
  Python 3.7.0 (default, Aug 30 2018, 14:32:33) 
  [GCC 8.2.1 20180801 (Red Hat 8.2.1-2)]

With an interactive shell instead:

  $ podman run --rm -ti fedora:29 /bin/bash -i -c 'dnf -y install http://mirror.siena.edu/fedora/linux/releases/29/Everything/x86_64/os/Packages/p/python3-3.7.0-9.fc29.x86_64.rpm; python3 -V'
  Python 3.7.0

How this behavior came to be baffles me.  But it seems to be fixed
on newer versions.

> > @@ -6823,6 +6826,7 @@ echo "INSTALL_DATA=$install -c -m 0644" >> $config_host_mak
> >  echo "INSTALL_PROG=$install -c -m 0755" >> $config_host_mak
> >  echo "INSTALL_LIB=$install -c -m 0644" >> $config_host_mak
> >  echo "PYTHON=$python" >> $config_host_mak
> > +echo "PYTHON_VERSION=$python_version" >> $config_host_mak
> >  echo "CC=$cc" >> $config_host_mak
> >  if $iasl -h > /dev/null 2>&1; then
> >echo "IASL=$iasl" >> $config_host_mak
> 
> that it's only one line, and will generate bogus makefile
> syntax if it's got an embedded newline. (Problem system
> seems to be Fedora 29.)
>

The assumption could be guaranteed by a "head -1", and while
it's not a foolproof solution, it would at least not corrupt
the makefile and the whole build system.

> I've reread this thread, where there seems to have been
> some discussion about just running Python itself to
> get the sys.version value (which is how we check for
> "is this python too old" earlier in the configure script).
> But I'm not really clear why trying to parse -V output is better:
> it's definitely less reliable, as demonstrated by this bug.
>
> Given that the only thing as far as I can tell that we
> do with PYTHON_VERSION is use it in tests/Makefile.inc
> to suppress a bit of test functionality if we don't have
> Python 3, could we stop trying to parse -V output and run
> python to print sys.version_info instead, and/or just
> have the makefile variable track "is this python 2",
> since that's what we really care about and would mean we
> don't have to then search the string for "v2"  ?

Because I've been bitten way too many times with differences in Python
minor versions, I see a lot of value in keeping the version
information in the build system.  But, the same information can
certainly be obtained in a more resilient way.  Would you object to
something like:

  python_version=$($python -c 'import sys; print(sys.version().split()[0])')

Or an even more paranoid version?  On my side, I understand the
fragility of the current approach, but I also appreciate the
information it stores.

> 
> thanks
> -- PMM

Thanks!
- Cleber.

Re: [Qemu-devel] [PATCH v1 1/1] spapr_pci: remove all child functions in function zero unplug

2019-08-22 Thread Eric Blake

On 8/22/19 2:59 PM, Daniel Henrique Barboza wrote:
> There is nothing wrong with how sPAPR handles multifunction PCI
> hot unplugs. The problem is that x86 does it simpler. Instead of
> removing each non-zero function and then removing function zero,
> x86 can remove any function of the slot to trigger the hot unplug.
> 

> +++ b/hw/ppc/spapr_pci.c
> @@ -1700,11 +1700,13 @@ static void spapr_pci_unplug_request(HotplugHandler *plug_handler,
>  state = func_drck->dr_entity_sense(func_drc);
>  if (state == SPAPR_DR_ENTITY_SENSE_PRESENT
>  && !spapr_drc_unplug_requested(func_drc)) {
> -error_setg(errp,
> -   "PCI: slot %d, function %d still present. "
> -   "Must unplug all non-0 functions first.",
> -   slotnr, i);
> -return;
> +/*
> + * Attempting to remove function 0 of a multifunction
> + * device will will cascade into removing all child
> + * functions, even if their unplug weren't requested

s/weren't/wasn't/

> + * beforehand.
> + */
> +spapr_drc_detach(func_drc);
>  }
>  }
>  }
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org
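
The cascading behaviour this patch proposes (an unplug request for function 0 also detaches every other function that is still present, instead of failing) can be sketched as a toy model. `ToyFunc` and `toy_unplug_function0()` are hypothetical; setting `unplug_requested` stands in for `spapr_drc_detach()`, and handling function 0 last, after the loop, is an assumption, since the quoted hunk only shows the loop over the other functions.

```c
#include <stdbool.h>

#define TOY_FUNCS_PER_SLOT 8 /* PCI functions 0..7 */

/* Hypothetical per-function state, loosely modelled on the DRC checks in
 * the quoted hunk (entity present, unplug already requested). */
typedef struct {
    bool present;
    bool unplug_requested;
} ToyFunc;

/* Returns how many functions were newly marked for detach. */
static int toy_unplug_function0(ToyFunc slot[TOY_FUNCS_PER_SLOT])
{
    int detached = 0;

    /* Cascade: detach every non-zero function that is still present,
     * even if nobody asked for it beforehand. */
    for (int i = 1; i < TOY_FUNCS_PER_SLOT; i++) {
        if (slot[i].present && !slot[i].unplug_requested) {
            slot[i].unplug_requested = true;
            detached++;
        }
    }
    /* Then function 0 itself. */
    if (slot[0].present && !slot[0].unplug_requested) {
        slot[0].unplug_requested = true;
        detached++;
    }
    return detached;
}
```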




Re: [Qemu-devel] [PATCH] block: workaround for unaligned byte range in fallocate()

2019-08-22 Thread Eric Blake

On 8/22/19 1:31 PM, Andrey Shinkevich wrote:
> Revert the commit 118f99442d 'block/io.c: fix for the allocation failure'
> and make better error handling for the file systems that do not support

s/make/use/

> fallocate() for the unaligned byte range. Allow falling back to pwrite

s/the/an/

> in case fallocate() returns EINVAL.
> 
> Suggested-by: Kevin Wolf 
> Suggested-by: Eric Blake 
> Signed-off-by: Andrey Shinkevich 
> ---
> Discussed in email thread with the message ID
> <1554474244-553661-1-git-send-email-andrey.shinkev...@virtuozzo.com>
> 
>  block/file-posix.c | 7 +++
>  block/io.c | 2 +-
>  2 files changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/block/file-posix.c b/block/file-posix.c
> index fbeb006..2c254ff 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -1588,6 +1588,13 @@ static int handle_aiocb_write_zeroes(void *opaque)
>  if (s->has_write_zeroes) {
>  int ret = do_fallocate(s->fd, FALLOC_FL_ZERO_RANGE,
> aiocb->aio_offset, aiocb->aio_nbytes);
> +if (ret == -EINVAL) {
> +/*
> + * Allow falling back to pwrite for file systems that
> + * do not support fallocate() for unaligned byte range.

s/for/for an/

> + */
> +return -ENOTSUP;
> +}
>  if (ret == 0 || ret != -ENOTSUP) {
>  return ret;
>  }
> diff --git a/block/io.c b/block/io.c
> index 56bbf19..58f08cd 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -1558,7 +1558,7 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
>  assert(!bs->supported_zero_flags);
>  }
>  
> -if (ret < 0 && !(flags & BDRV_REQ_NO_FALLBACK)) {
> +if (ret == -ENOTSUP && !(flags & BDRV_REQ_NO_FALLBACK)) {
>  /* Fall back to bounce buffer if write zeroes is unsupported */
>  BdrvRequestFlags write_flags = flags & ~BDRV_REQ_ZERO_WRITE;
>  
> 

Reviewed-by: Eric Blake 
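
At the syscall level, the idea behind this patch (treat EINVAL from `fallocate()` on an unaligned range as "not supported" and write zeroes instead) can be sketched as a standalone Linux helper. This is not QEMU's code: `zero_range()` is hypothetical, it falls back directly instead of returning -ENOTSUP to the block layer, and it also treats EOPNOTSUPP and ENOSYS as "unsupported", which is an assumption about what a caller would want.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h> /* FALLOC_FL_ZERO_RANGE */
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: zero [offset, offset + len) of fd. Returns 0 or -errno. */
static int zero_range(int fd, off_t offset, off_t len)
{
    char zeros[4096];

    if (fallocate(fd, FALLOC_FL_ZERO_RANGE, offset, len) == 0) {
        return 0;
    }
    /* Some file systems reject unaligned ranges with EINVAL. */
    if (errno != EINVAL && errno != EOPNOTSUPP && errno != ENOSYS) {
        return -errno;
    }
    /* Fallback: write explicit zeroes, like QEMU's bounce-buffer path. */
    memset(zeros, 0, sizeof(zeros));
    while (len > 0) {
        size_t chunk = len < (off_t)sizeof(zeros) ? (size_t)len : sizeof(zeros);
        ssize_t n = pwrite(fd, zeros, chunk, offset);
        if (n < 0) {
            if (errno == EINTR) {
                continue;
            }
            return -errno;
        }
        offset += n;
        len -= n;
    }
    return 0;
}
```

Keeping the fallback narrow (only "unsupported"-style errors) mirrors the review point behind the block/io.c hunk: other failures such as EIO should still be reported rather than masked by a slower write.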


Re: [Qemu-devel] [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

2019-08-22 Thread Kinney, Michael D

Laszlo,

I believe all the code for the AP startup vector
is already in edk2.

It is a combination of the reset vector code in
UefiCpuPkg/ResetVector/Vtf0 and an IA32/X64 specific
feature in the GenFv tool.  It sets up a 4KB aligned
location near 4GB which can be used to start an AP
using INIT-SIPI-SIPI.

DI is set to 'AP' if the processor is not the BSP.
This can be used to choose to put the APs into a
wait loop executing from the protected FLASH region.

The SMM Monarch on a hot add event can use the Local
APIC to send an INIT-SIPI-SIPI to wake the AP at the 4KB
startup vector in FLASH.  Later the SMM Monarch
can use the Local APIC to send an SMI to pull the
hot added CPU into SMM.  It is not clear if we have to
do both the SIPI followed by the SMI or if we can just do
the SMI.

Best regards,

Mike

> -----Original Message-----
> From: de...@edk2.groups.io
> [mailto:de...@edk2.groups.io] On Behalf Of Laszlo Ersek
> Sent: Thursday, August 22, 2019 11:29 AM
> To: Paolo Bonzini ; Kinney,
> Michael D ;
> r...@edk2.groups.io; Yao, Jiewen 
> Cc: Alex Williamson ;
> de...@edk2.groups.io; qemu devel list  de...@nongnu.org>; Igor Mammedov ;
> Chen, Yingwen ; Nakajima, Jun
> ; Boris Ostrovsky
> ; Joao Marcal Lemos Martins
> ; Phillip Goerl
> 
> Subject: Re: [edk2-rfc] [edk2-devel] CPU hotplug using
> SMM with QEMU+OVMF
> 
> On 08/22/19 08:18, Paolo Bonzini wrote:
> > On 21/08/19 22:17, Kinney, Michael D wrote:
> >> Paolo,
> >>
> >> It makes sense to match real HW.
> >
> > Note that it'd also be fine to match some kind of official Intel
> > specification even if no processor (currently?) supports it.
> 
> I agree, because...
> 
> >> That puts us back to the reset vector and handling the initial SMI at
> >> 3000:8000.  That is all workable from a FW implementation
> >> perspective.
> 
> that would suggest that matching reset vector code
> already exists, and it would "only" need to be
> upstreamed to edk2. :)
> 
> >> It looks like the only issue left is DMA.
> >>
> >> DMA protection of memory ranges is a chipset feature. For the current
> >> QEMU implementation, what ranges of memory are guaranteed to be
> >> protected from DMA?  Is it only A/B seg and TSEG?
> >
> > Yes.
> 
> (
> 
> This thread (esp. Jiewen's and Mike's messages) is the
> first time that I've heard about the *existence* of
> such RAM ranges / the chipset feature. :)
> 
> Out of interest (independently of virtualization), how
> is a general purpose OS informed by the firmware,
> "never try to set up DMA to this RAM area"? Is this
> communicated through ACPI _CRS perhaps?
> 
> ... Ah, almost: ACPI 6.2 specifies _DMA, in "6.2.4 _DMA
> (Direct Memory Access)". It writes,
> 
>     For example, if a platform implements a PCI bus that cannot access
>     all of physical memory, it has a _DMA object under that PCI bus that
>     describes the ranges of physical memory that can be accessed by
>     devices on that bus.
> 
> Sorry about the digression, and also about being late
> to this thread, continually -- I'm primarily following
> and learning.
> 
> )
> 
> Thanks!
> Laszlo

Re: [Qemu-devel] [Qemu-ppc] [PULL 1/2] spapr: Reset CAS & IRQ subsystem after devices

2019-08-22 Thread Laurent Vivier

On 13/08/2019 08:59, David Gibson wrote:
> This fixes a nasty regression in qemu-4.1 for the 'pseries' machine,
> caused by the new "dual" interrupt controller model.  Specifically,
> qemu can crash when used with KVM if a 'system_reset' is requested
> while there's active I/O in the guest.
> 
> The problem is that in spapr_machine_reset() we:
> 
> 1. Reset the CAS vector state
>   spapr_ovec_cleanup(spapr->ov5_cas);
> 
> 2. Reset all devices
>   qemu_devices_reset()
> 
> 3. Reset the irq subsystem
>   spapr_irq_reset();
> 
> However (1) implicitly changes the interrupt delivery mode, because
> whether we're using XICS or XIVE depends on the CAS state.  We don't
> properly initialize the new irq mode until (3) though - in particular
> setting up the KVM devices.
> 
> During (2), we can temporarily drop the BQL allowing some irqs to be
> delivered which will go to an irq system that's not properly set up.
> 
> Specifically, if the previous guest was in (KVM) XIVE mode, the CAS
> reset will put us back in XICS mode.  kvm_kernel_irqchip() still
> returns true, because XIVE was using KVM, however XICS doesn't have
> its KVM components initialized and kernel_xics_fd == -1.  When the irq
> is delivered it goes via ics_kvm_set_irq() which assert()s that
> kernel_xics_fd != -1.
> 
> This change addresses the problem by delaying the CAS reset until
> after the devices reset.  The device reset should quiesce all the
> devices so we won't get irqs delivered while we mess around with the
> IRQ.  The CAS reset and irq re-initialize should also now be under the
> same BQL critical section so nothing else should be able to interrupt
> it either.
> 
> We also move the spapr_irq_msi_reset() used in one of the legacy irq
> modes, since it logically makes sense at the same point as the
> spapr_irq_reset() (it's essentially an equivalent operation for older
> machine types).  Since we don't need to switch between different
> interrupt controllers for those old machine types it shouldn't
> actually be broken in those cases though.
> 
> Cc: Cédric Le Goater 
> 
> Fixes: b2e22477 "spapr: add a 'reset' method to the sPAPR IRQ backend"
> Fixes: 13db0cd9 "spapr: introduce a new sPAPR IRQ backend supporting
>  XIVE and XICS"
> Signed-off-by: David Gibson 
> ---
>  hw/ppc/spapr.c | 24 
>  1 file changed, 12 insertions(+), 12 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 821f0d4a49..12ed4b065c 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1726,6 +1726,18 @@ static void spapr_machine_reset(MachineState *machine)
>  spapr_setup_hpt_and_vrma(spapr);
>  }
>  
> +/*
> + * NVLink2-connected GPU RAM needs to be placed on a separate NUMA node.
> + * We assign a new numa ID per GPU in spapr_pci_collect_nvgpu() which is
> + * called from vPHB reset handler so we initialize the counter here.
> + * If no NUMA is configured from the QEMU side, we start from 1 as GPU RAM
> + * must be equally distant from any other node.
> + * The final value of spapr->gpu_numa_id is going to be written to
> + * max-associativity-domains in spapr_build_fdt().
> + */
> +spapr->gpu_numa_id = MAX(1, nb_numa_nodes);
> +qemu_devices_reset();
> +
>  /*
>   * If this reset wasn't generated by CAS, we should reset our
>   * negotiated options and start from scratch
> @@ -1741,18 +1753,6 @@ static void spapr_machine_reset(MachineState *machine)
>  spapr_irq_msi_reset(spapr);
>  }
>  
> -/*
> - * NVLink2-connected GPU RAM needs to be placed on a separate NUMA node.
> - * We assign a new numa ID per GPU in spapr_pci_collect_nvgpu() which is
> - * called from vPHB reset handler so we initialize the counter here.
> - * If no NUMA is configured from the QEMU side, we start from 1 as GPU RAM
> - * must be equally distant from any other node.
> - * The final value of spapr->gpu_numa_id is going to be written to
> - * max-associativity-domains in spapr_build_fdt().
> - */
> -spapr->gpu_numa_id = MAX(1, nb_numa_nodes);
> -qemu_devices_reset();
> -
>  /*
>   * This is fixing some of the default configuration of the XIVE
>   * devices. To be called after the reset of the machine devices.
> 

This commit breaks migration between POWER8 <-> POWER9 hosts:

qemu-system-ppc64: error while loading state for instance 0x1 of device 'cpu'
qemu-system-ppc64: load of migration failed: Operation not permitted

Using a guest with a running 4.18 kernel (RHEL 8) and "-M pseries,max-cpu-compat=power8" on both sides.

There is no problem if both hosts are of the same kind (P8 <-> P8 or P9 <-> P9).

Thanks,
Laurent
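
The ordering problem described in David's quoted commit message (not the POWER8/POWER9 migration regression reported here) can be reduced to a toy model: an irq that arrives while the CAS state already selects a controller whose backend has not been initialised yet is not handled safely. Everything below is hypothetical (`ToyMachine`, `toy_reset_before_fix()`, `toy_reset_after_fix()`); it only mirrors the sequence of steps listed in the commit message, not the QEMU code.

```c
#include <stdbool.h>

typedef enum { TOY_XICS, TOY_XIVE } ToyIrqMode;

typedef struct {
    ToyIrqMode cas_mode;   /* controller selected by the current CAS state */
    ToyIrqMode ready_mode; /* controller whose (KVM) backend is initialised */
} ToyMachine;

/* An irq is only handled safely if the selected controller is initialised. */
static bool toy_irq_delivery_ok(const ToyMachine *m)
{
    return m->cas_mode == m->ready_mode;
}

/* Device reset may drop the BQL, letting an irq be delivered meanwhile. */
static bool toy_devices_reset(const ToyMachine *m)
{
    return toy_irq_delivery_ok(m);
}

static void toy_cas_reset(ToyMachine *m) { m->cas_mode = TOY_XICS; }
static void toy_irq_reset(ToyMachine *m) { m->ready_mode = m->cas_mode; }

/* Old order: CAS reset, devices reset, irq reset. */
static bool toy_reset_before_fix(ToyMachine *m)
{
    toy_cas_reset(m);
    bool ok = toy_devices_reset(m);
    toy_irq_reset(m);
    return ok;
}

/* New order: devices reset first, then CAS and irq reset back to back. */
static bool toy_reset_after_fix(ToyMachine *m)
{
    bool ok = toy_devices_reset(m);
    toy_cas_reset(m);
    toy_irq_reset(m);
    return ok;
}
```

Starting from a guest in XIVE mode, the old order lets the device reset run while CAS already says XICS but only XIVE is set up; the new order keeps the two in step during the device reset.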

Re: [Qemu-devel] [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

2019-08-22 Thread Kinney, Michael D

Paolo,

The SMBASE register is internal and cannot be directly accessed
by any CPU.  There is an SMBASE field that is a member of the SMM Save
State area; it can only be modified from SMM, and an RSM instruction
must be executed from SMM for the SMBASE register to be updated from
the current SMBASE field value.  The new SMBASE register value is only
used on the next SMI.

https://software.intel.com/sites/default/files/managed/39/c5/325462-sdm-vol-1-2abcd-3abcd.pdf

Vol 3C - Section 34.11

  The default base address for the SMRAM is 30000H. This value is contained in an internal processor register called
  the SMBASE register. The operating system or executive can relocate the SMRAM by setting the SMBASE field in the
  saved state map (at offset 7EF8H) to a new value (see Figure 34-4). The RSM instruction reloads the internal
  SMBASE register with the value in the SMBASE field each time it exits SMM. All subsequent SMI requests will use
  the new SMBASE value to find the starting address for the SMI handler (at SMBASE + 8000H) and the SMRAM state
  save area (from SMBASE + FE00H to SMBASE + FFFFH). (The processor resets the value in its internal SMBASE
  register to 30000H on a RESET, but does not change it on an INIT.)

One idea to work around these issues is to start up OVMF with the maximum number of
CPUs.  All the CPUs can then be assigned an SMBASE address at a safe time, using
the initial 3000:8000 SMI vector, because there is a guarantee of no DMA at that
point in the FW init.

Once all the CPUs have been initialized for SMM, the CPUs that are not needed
can be hot removed.  As noted above, the SMBASE value does not change on
an INIT.  So as long as the hot add operation does not do a RESET, the
SMBASE value must be preserved.

Of course, this is not a good idea from a boot performance perspective, 
especially if the max CPUs is a large value.

Another idea is to emulate this behavior.  The hot plug controller could
provide registers (only accessible from SMM) to assign the SMBASE address
for every CPU.  When a CPU is hot added, QEMU can set the internal SMBASE
register value from the hot plug controller register value.  If the SMM
Monarch sends an INIT or an SMI from the Local APIC to the hot added CPU,
then the SMBASE register should not be modified and the CPU starts execution
within TSEG the first time it receives an SMI.

Jiewen and I can collect specific questions on this topic and continue
the discussion here.  For example, I do not think there is any method
other than what I referenced above to program the SMBASE register, but
I can ask if there are any other methods.

Thanks,

Mike

> -----Original Message-----
> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> Sent: Thursday, August 22, 2019 11:43 AM
> To: Laszlo Ersek ; Kinney, Michael D
> ; r...@edk2.groups.io; Yao,
> Jiewen 
> Cc: Alex Williamson ;
> de...@edk2.groups.io; qemu devel list  de...@nongnu.org>; Igor Mammedov ;
> Chen, Yingwen ; Nakajima, Jun
> ; Boris Ostrovsky
> ; Joao Marcal Lemos Martins
> ; Phillip Goerl
> 
> Subject: Re: [edk2-rfc] [edk2-devel] CPU hotplug using
> SMM with QEMU+OVMF
> 
> On 22/08/19 19:59, Laszlo Ersek wrote:
> > The firmware and QEMU could agree on a formula, which would compute
> > the CPU-specific SMBASE from a value pre-programmed by the firmware,
> > and the initial APIC ID of the hot-added CPU.
> >
> > Yes, it would duplicate code -- the calculation -- between QEMU and
> > edk2. While that's not optimal, it wouldn't be a first.
> 
> No, that would be unmaintainable.  The best solution to me seems to be
> to make SMBASE programmable from non-SMM code if some special conditions
> hold.  Michael, would it be possible to get in contact with the Intel
> architects?
> 
> Paolo

[Qemu-devel] [PATCH v1 1/1] spapr_pci: remove all child functions in function zero unplug

2019-08-22 Thread Daniel Henrique Barboza

There is nothing wrong with how sPAPR handles multifunction PCI
hot unplugs. The problem is that x86 does it more simply. Instead of
removing each non-zero function and then removing function zero,
x86 can remove any function of the slot to trigger the hot unplug.

Libvirt will be directly impacted by this difference in the
(hopefully soon) PCI multifunction hot plug/unplug support. For
hot plugs, both x86 and sPAPR will operate the same way: an XML
with all desired functions to be added, then consecutive hotplugs
of all non-zero functions first, zero last. For hot unplugs, at
least in the current state, an XML with the devices to be removed
must also be provided because of how sPAPR operates (x86 does
not need it, since any function unplug will unplug the whole
PCIe slot). This difference puts extra strain on the management
layer, which needs to either handle both archs differently in
the unplug scenario or choose to treat x86 like sPAPR, forcing x86
users to cope with sPAPR internals.

This patch changes spapr_pci_unplug_request to handle the
unplug of function zero differently. When removing function zero,
instead of erroring out if there are any remaining function
DRCs that need detaching, detach those. This has no effect on
any existing scripts that detach the non-zero functions
before function zero, and it can be used by management as a shortcut
to remove the whole PCI multifunction device without specifying
each child function.

Signed-off-by: Daniel Henrique Barboza 
---
 hw/ppc/spapr_pci.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index deb0b0c80c..9f176f463e 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -1700,11 +1700,13 @@ static void spapr_pci_unplug_request(HotplugHandler *plug_handler,
 state = func_drck->dr_entity_sense(func_drc);
 if (state == SPAPR_DR_ENTITY_SENSE_PRESENT
 && !spapr_drc_unplug_requested(func_drc)) {
-error_setg(errp,
-   "PCI: slot %d, function %d still present. "
-   "Must unplug all non-0 functions first.",
-   slotnr, i);
-return;
+/*
+ * Attempting to remove function 0 of a multifunction
+ * device will cascade into removing all child
+ * functions, even if their unplug wasn't requested
+ */
+spapr_drc_detach(func_drc);
 }
 }
 }
-- 
2.21.0
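The behavior change can be modeled in a few lines of C. This is a simplified sketch with hypothetical types; it does not use QEMU's DRC API. The old path refuses to unplug function 0 while a non-zero function is still present and not already scheduled for unplug. The new path detaches such functions instead.

```c
#include <stdbool.h>

#define HYP_FUNCS_PER_SLOT 8  /* PCI functions 0..7 of one slot */

/* Hypothetical per-function state; not QEMU's SpaprDrc types. */
typedef enum {
    HYP_FUNC_ABSENT,
    HYP_FUNC_PRESENT,
    HYP_FUNC_UNPLUG_REQUESTED,
    HYP_FUNC_DETACHED,
} HypFuncState;

/*
 * Unplug function 0 of a slot.  cascade == false models the old check
 * (fail if a non-zero function is present and not already being
 * unplugged); cascade == true models the patch (detach it instead).
 * Returns false only in the old-behavior error case.
 */
static bool hyp_unplug_function_zero(HypFuncState slot[HYP_FUNCS_PER_SLOT],
                                     bool cascade)
{
    for (int i = 1; i < HYP_FUNCS_PER_SLOT; i++) {
        if (slot[i] != HYP_FUNC_PRESENT) {
            continue;  /* absent, or its unplug was already requested */
        }
        if (!cascade) {
            return false;  /* "Must unplug all non-0 functions first." */
        }
        slot[i] = HYP_FUNC_DETACHED;
    }
    slot[0] = HYP_FUNC_DETACHED;
    return true;
}
```

Functions whose unplug was already requested are left alone in both modes, which is why existing scripts that unplug the non-zero functions first behave the same as before.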

Re: [Qemu-devel] [PATCH] block: workaround for unaligned byte range in fallocate()

2019-08-22 Thread Vladimir Sementsov-Ogievskiy

22.08.2019 21:55, Vladimir Sementsov-Ogievskiy wrote:
> 22.08.2019 21:31, Andrey Shinkevich wrote:
>> Revert the commit 118f99442d 'block/io.c: fix for the allocation failure'
>> and make better error handling for the file systems that do not support
>> fallocate() for the unaligned byte range. Allow falling back to pwrite
>> in case fallocate() returns EINVAL.
>>
>> Suggested-by: Kevin Wolf 
>> Suggested-by: Eric Blake 
>> Signed-off-by: Andrey Shinkevich 
>> ---
>> Discussed in email thread with the message ID
>> <1554474244-553661-1-git-send-email-andrey.shinkev...@virtuozzo.com>
>>
>>   block/file-posix.c | 7 +++
>>   block/io.c | 2 +-
>>   2 files changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/block/file-posix.c b/block/file-posix.c
>> index fbeb006..2c254ff 100644
>> --- a/block/file-posix.c
>> +++ b/block/file-posix.c
>> @@ -1588,6 +1588,13 @@ static int handle_aiocb_write_zeroes(void *opaque)
>>   if (s->has_write_zeroes) {
>>   int ret = do_fallocate(s->fd, FALLOC_FL_ZERO_RANGE,
>>  aiocb->aio_offset, aiocb->aio_nbytes);
>> +    if (ret == -EINVAL) {
>> +    /*
>> + * Allow falling back to pwrite for file systems that
>> + * do not support fallocate() for unaligned byte range.
>> + */
>> +    return -ENOTSUP;
>> +    }
>>   if (ret == 0 || ret != -ENOTSUP) {
>>   return ret;
>>   }
> 
> Hmm stop, you've done exactly what Den was afraid of:
> 
> the next line
>    s->has_write_zeroes = false;
> 
> will disable write_zeroes forever.
> 
> Something like
> 
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -1588,10 +1588,12 @@ static int handle_aiocb_write_zeroes(void *opaque)
>   if (s->has_write_zeroes) {
>   int ret = do_fallocate(s->fd, FALLOC_FL_ZERO_RANGE,
>      aiocb->aio_offset, aiocb->aio_nbytes);
> -    if (ret == 0 || ret != -ENOTSUP) {
> +    if (ret == 0 || (ret != -ENOTSUP && ret != -EINVAL)) {
>   return ret;
>   }
> -    s->has_write_zeroes = false;
> +    if (ret == -ENOTSUP) {
> +    s->has_write_zeroes = false;
> +    }
>   }
>   #endif
> 
> 
> will work better. So, handle ENOTSUP as "disable write_zeros forever", and
> EINVAL as "don't disable, but fall back to writing zeros". And we need the
> same handling for the following do_fallocate() calls too (otherwise they
> fail again with EINVAL, which will break the whole thing).
> 

Oops, sorry, I misread your patch, it's OK.

Still, we may want to handle other do_fallocate() calls in the same manner, or
maybe just:

@@ -1558,7 +1558,13 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
  assert(!bs->supported_zero_flags);
  }

-if (ret < 0 && !(flags & BDRV_REQ_NO_FALLBACK)) {
+/*
+ * We are sure that our arguments make sense, so consider "invalid
+ * argument" in same manner as "not supported".
+ */
+if ((ret == -ENOTSUP || ret == -EINVAL) &&
+!(flags & BDRV_REQ_NO_FALLBACK))
+{
  /* Fall back to bounce buffer if write zeroes is unsupported */
  BdrvRequestFlags write_flags = flags & ~BDRV_REQ_ZERO_WRITE;




-- 
Best regards,
Vladimir

Re: [Qemu-devel] [PATCH v7 04/13] vfio: Add save and load functions for VFIO PCI devices

2019-08-22 Thread Dr. David Alan Gilbert

* Kirti Wankhede (kwankh...@nvidia.com) wrote:
> 
> 
> On 8/22/2019 3:02 PM, Dr. David Alan Gilbert wrote:
> > * Kirti Wankhede (kwankh...@nvidia.com) wrote:
> >> Sorry for delay to respond.
> >>
> >> On 7/11/2019 5:37 PM, Dr. David Alan Gilbert wrote:
> >>> * Kirti Wankhede (kwankh...@nvidia.com) wrote:
>  These functions save and restore PCI device specific data - config
>  space of PCI device.
>  Tested save and restore with MSI and MSIX type.
> 
>  Signed-off-by: Kirti Wankhede 
>  Reviewed-by: Neo Jia 
>  ---
>   hw/vfio/pci.c | 114 ++
>   include/hw/vfio/vfio-common.h |   2 +
>   2 files changed, 116 insertions(+)
> 
>  diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>  index de0d286fc9dd..5fe4f8076cac 100644
>  --- a/hw/vfio/pci.c
>  +++ b/hw/vfio/pci.c
>  @@ -2395,11 +2395,125 @@ static Object *vfio_pci_get_object(VFIODevice *vbasedev)
>   return OBJECT(vdev);
>   }
>   
>  +static void vfio_pci_save_config(VFIODevice *vbasedev, QEMUFile *f)
>  +{
>  +VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
>  +PCIDevice *pdev = &vdev->pdev;
>  +uint16_t pci_cmd;
>  +int i;
>  +
>  +for (i = 0; i < PCI_ROM_SLOT; i++) {
>  +uint32_t bar;
>  +
>  +bar = pci_default_read_config(pdev, PCI_BASE_ADDRESS_0 + i * 4, 4);
>  +qemu_put_be32(f, bar);
>  +}
>  +
>  +qemu_put_be32(f, vdev->interrupt);
>  +if (vdev->interrupt == VFIO_INT_MSI) {
>  +uint32_t msi_flags, msi_addr_lo, msi_addr_hi = 0, msi_data;
>  +bool msi_64bit;
>  +
>  +msi_flags = pci_default_read_config(pdev, pdev->msi_cap + PCI_MSI_FLAGS,
>  +2);
>  +msi_64bit = (msi_flags & PCI_MSI_FLAGS_64BIT);
>  +
>  +msi_addr_lo = pci_default_read_config(pdev,
>  + pdev->msi_cap + PCI_MSI_ADDRESS_LO, 4);
>  +qemu_put_be32(f, msi_addr_lo);
>  +
>  +if (msi_64bit) {
>  +msi_addr_hi = pci_default_read_config(pdev,
>  + pdev->msi_cap + PCI_MSI_ADDRESS_HI,
>  + 4);
>  +}
>  +qemu_put_be32(f, msi_addr_hi);
>  +
>  +msi_data = pci_default_read_config(pdev,
>  +pdev->msi_cap + (msi_64bit ? PCI_MSI_DATA_64 : PCI_MSI_DATA_32),
>  +2);
>  +qemu_put_be32(f, msi_data);
>  +} else if (vdev->interrupt == VFIO_INT_MSIX) {
>  +uint16_t offset;
>  +
>  +/* save enable bit and maskall bit */
>  +offset = pci_default_read_config(pdev,
>  +   pdev->msix_cap + PCI_MSIX_FLAGS + 1, 2);
>  +qemu_put_be16(f, offset);
>  +msix_save(pdev, f);
>  +}
>  +pci_cmd = pci_default_read_config(pdev, PCI_COMMAND, 2);
>  +qemu_put_be16(f, pci_cmd);
>  +}
>  +
>  +static void vfio_pci_load_config(VFIODevice *vbasedev, QEMUFile *f)
>  +{
>  +VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
>  +PCIDevice *pdev = &vdev->pdev;
>  +uint32_t interrupt_type;
>  +uint32_t msi_flags, msi_addr_lo, msi_addr_hi = 0, msi_data;
>  +uint16_t pci_cmd;
>  +bool msi_64bit;
>  +int i;
>  +
>  +/* restore pci bar configuration */
>  +pci_cmd = pci_default_read_config(pdev, PCI_COMMAND, 2);
>  +vfio_pci_write_config(pdev, PCI_COMMAND,
>  +pci_cmd & ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY), 2);
>  +for (i = 0; i < PCI_ROM_SLOT; i++) {
>  +uint32_t bar = qemu_get_be32(f);
>  +
>  +vfio_pci_write_config(pdev, PCI_BASE_ADDRESS_0 + i * 4, bar, 4);
>  +}
> >>>
> >>> Is it possible to validate the BARs at all?  We just had a bug on a
> >>> virtual device where one version was asking for a larger bar than the
> >>> other; our validation caught this in some cases so we could tell that
> >>> the guest had a BAR that was aligned at the wrong alignment.
> >>>
> >>
> >> Does "validate the bars" mean validating the size of the BARs?
> > 
> > I meant validate the address programmed into the BAR against the size,
> > assuming you know the size; e.g. if it's a 128MB BAR, then make sure the
> > address programmed in is 128MB aligned.
> > 
> 
> If this validation fails, migration resume should fail, right?

Yes I think so; if you've got a device that wants 128MB alignment and
someone gives you a non-aligned address, who knows what will happen.
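One way to do the check described above, as a C sketch: combine the BAR dwords into an address, mask off the flag bits, and require the address to be a multiple of the (power-of-two) BAR size. The helper names are hypothetical, only memory BARs are handled (I/O BARs use a 2-bit flag field), and how the BAR size is obtained is left out. A false result is where loading the device state would fail.

```c
#include <stdbool.h>
#include <stdint.h>

/* Low 4 bits of a memory BAR are flags (space, type, prefetchable). */
#define HYP_BAR_MEM_FLAGS_MASK 0xfu

/* Hypothetical helper: address of a memory BAR; pass hi = 0 for a 32-bit BAR. */
static uint64_t hyp_bar_mem_address(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | (lo & ~(uint32_t)HYP_BAR_MEM_FLAGS_MASK);
}

/* Hypothetical check: a BAR of 'size' bytes must be naturally aligned. */
static bool hyp_bar_addr_is_valid(uint64_t addr, uint64_t size)
{
    /* PCI BAR sizes are powers of two. */
    if (size == 0 || (size & (size - 1)) != 0) {
        return false;
    }
    return (addr & (size - 1)) == 0;
}
```

For a 128MB BAR, an address of 0xe0000000 passes, while 0xe4000000 (only 64MB aligned) is rejected.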

> 
>  +vfio_pci_write_config(pdev,

Re: [Qemu-devel] [PATCH v7 04/13] vfio: Add save and load functions for VFIO PCI devices

2019-08-22 Thread Kirti Wankhede




On 8/22/2019 3:02 PM, Dr. David Alan Gilbert wrote:
> * Kirti Wankhede (kwankh...@nvidia.com) wrote:
>> Sorry for delay to respond.
>>
>> On 7/11/2019 5:37 PM, Dr. David Alan Gilbert wrote:
>>> * Kirti Wankhede (kwankh...@nvidia.com) wrote:
 These functions save and restore PCI device specific data - config
 space of PCI device.
 Tested save and restore with MSI and MSIX type.

 Signed-off-by: Kirti Wankhede 
 Reviewed-by: Neo Jia 
 ---
  hw/vfio/pci.c | 114 ++
  include/hw/vfio/vfio-common.h |   2 +
  2 files changed, 116 insertions(+)

 diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
 index de0d286fc9dd..5fe4f8076cac 100644
 --- a/hw/vfio/pci.c
 +++ b/hw/vfio/pci.c
 @@ -2395,11 +2395,125 @@ static Object *vfio_pci_get_object(VFIODevice *vbasedev)
  return OBJECT(vdev);
  }
  
 +static void vfio_pci_save_config(VFIODevice *vbasedev, QEMUFile *f)
 +{
 +VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
 +PCIDevice *pdev = &vdev->pdev;
 +uint16_t pci_cmd;
 +int i;
 +
 +for (i = 0; i < PCI_ROM_SLOT; i++) {
 +uint32_t bar;
 +
 +bar = pci_default_read_config(pdev, PCI_BASE_ADDRESS_0 + i * 4, 4);
 +qemu_put_be32(f, bar);
 +}
 +
 +qemu_put_be32(f, vdev->interrupt);
 +if (vdev->interrupt == VFIO_INT_MSI) {
 +uint32_t msi_flags, msi_addr_lo, msi_addr_hi = 0, msi_data;
 +bool msi_64bit;
 +
 +msi_flags = pci_default_read_config(pdev, pdev->msi_cap + PCI_MSI_FLAGS,
 +2);
 +msi_64bit = (msi_flags & PCI_MSI_FLAGS_64BIT);
 +
 +msi_addr_lo = pci_default_read_config(pdev,
 + pdev->msi_cap + PCI_MSI_ADDRESS_LO, 4);
 +qemu_put_be32(f, msi_addr_lo);
 +
 +if (msi_64bit) {
 +msi_addr_hi = pci_default_read_config(pdev,
 + pdev->msi_cap + PCI_MSI_ADDRESS_HI,
 + 4);
 +}
 +qemu_put_be32(f, msi_addr_hi);
 +
 +msi_data = pci_default_read_config(pdev,
 +pdev->msi_cap + (msi_64bit ? PCI_MSI_DATA_64 : PCI_MSI_DATA_32),
 +2);
 +qemu_put_be32(f, msi_data);
 +} else if (vdev->interrupt == VFIO_INT_MSIX) {
 +uint16_t offset;
 +
 +/* save enable bit and maskall bit */
 +offset = pci_default_read_config(pdev,
 +   pdev->msix_cap + PCI_MSIX_FLAGS + 1, 2);
 +qemu_put_be16(f, offset);
 +msix_save(pdev, f);
 +}
 +pci_cmd = pci_default_read_config(pdev, PCI_COMMAND, 2);
 +qemu_put_be16(f, pci_cmd);
 +}
 +
 +static void vfio_pci_load_config(VFIODevice *vbasedev, QEMUFile *f)
 +{
 +VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
 +PCIDevice *pdev = &vdev->pdev;
 +uint32_t interrupt_type;
 +uint32_t msi_flags, msi_addr_lo, msi_addr_hi = 0, msi_data;
 +uint16_t pci_cmd;
 +bool msi_64bit;
 +int i;
 +
 +/* restore pci bar configuration */
 +pci_cmd = pci_default_read_config(pdev, PCI_COMMAND, 2);
 +vfio_pci_write_config(pdev, PCI_COMMAND,
 +pci_cmd & ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY), 2);
 +for (i = 0; i < PCI_ROM_SLOT; i++) {
 +uint32_t bar = qemu_get_be32(f);
 +
 +vfio_pci_write_config(pdev, PCI_BASE_ADDRESS_0 + i * 4, bar, 4);
 +}
>>>
>>> Is it possible to validate the BARs at all?  We just had a bug on a
>>> virtual device where one version was asking for a larger bar than the
>>> other; our validation caught this in some cases so we could tell that
>>> the guest had a BAR that was aligned at the wrong alignment.
>>>
>>
>> Does "validate the bars" mean validating the size of the BARs?
> 
> I meant validate the address programmed into the BAR against the size,
> assuming you know the size; e.g. if it's a 128MB BAR, then make sure the
> address programmed in is 128MB aligned.
> 

If this validation fails, migration resume should fail, right?


 +vfio_pci_write_config(pdev, PCI_COMMAND,
 +  pci_cmd | PCI_COMMAND_IO | PCI_COMMAND_MEMORY, 2);
>>>
>>> Can you explain what this is for?  You write the command register at the
>>> end of the function with the original value; there's no guarantee that
>>> the device is using IO for example, so ORing it seems odd.
>>>
>>
>> IO space and memory space accesses are disabled before writing the BAR
>> addresses; only those are re-enabled here.
> 
> But do

Re: [Qemu-devel] [PATCH] block: gluster: Probe alignment limits

2019-08-22 Thread Nir Soffer

On Thu, Aug 22, 2019 at 10:03 AM Niels de Vos  wrote:

> On Wed, Aug 21, 2019 at 07:04:17PM +0200, Max Reitz wrote:
> > On 17.08.19 23:21, Nir Soffer wrote:
> > > Implement alignment probing similar to file-posix, by reading from the
> > > first 4k of the image.
> > >
> > > Before this change, provisioning a VM on storage with sector size of
> > > 4096 bytes would fail when the installer tries to create filesystems.
> > > Here is an example command that reproduces this issue:
> > >
> > > $ qemu-system-x86_64 -accel kvm -m 2048 -smp 2 \
> > > -drive file=gluster://gluster1/gv0/fedora29.raw,format=raw,cache=none \
> > > -cdrom Fedora-Server-dvd-x86_64-29-1.2.iso
> > >
> > > The installer fails within a few seconds when trying to create a filesystem on
> > > /dev/mapper/fedora-root. In the error report we can see that it failed with
> > > EINVAL (I could not extract the error from the guest).
> > >
> > > Copying disk fails with EINVAL:
> > >
> > > $ qemu-img convert -p -f raw -O raw -t none -T none \
> > > gluster://gluster1/gv0/fedora29.raw \
> > > gluster://gluster1/gv0/fedora29-clone.raw
> > > qemu-img: error while writing sector 4190208: Invalid argument
> > >
> > > This fixes, for gluster:// images, the same issue that commit
> > > a6b257a08e3d (file-posix: Handle undetectable alignment) fixed.
> > >
> > > This fix has the same limitation: the first block of the image must be
> > > allocated; otherwise we cannot detect the alignment and fall back to a
> > > safe value (4096) even when using storage with a sector size of 512
> > > bytes.
> > >
> > > Signed-off-by: Nir Soffer 
> > > ---
> > >  block/gluster.c | 47 +++
> > >  1 file changed, 47 insertions(+)
> > >
> > > diff --git a/block/gluster.c b/block/gluster.c
> > > index f64dc5b01e..d936240b72 100644
> > > --- a/block/gluster.c
> > > +++ b/block/gluster.c
> > > @@ -52,6 +52,9 @@
> > >
> > >  #define GERR_INDEX_HINT "hint: check in 'server' array index '%d'\n"
> > >
> > > +/* The value is known only on the server side. */
> > > +#define MAX_ALIGN 4096
> > > +
> > >  typedef struct GlusterAIOCB {
> > >  int64_t size;
> > >  int ret;
> > > @@ -902,8 +905,52 @@ out:
> > >  return ret;
> > >  }
> > >
> > > +/*
> > > + * Check if read is allowed with given memory buffer and length.
> > > + *
> > > + * This function is used to check O_DIRECT request alignment.
> > > + */
> > > +static bool gluster_is_io_aligned(struct glfs_fd *fd, void *buf, size_t len)
> > > +{
> > > +ssize_t ret = glfs_pread(fd, buf, len, 0, 0, NULL);
> > > +return ret >= 0 || errno != EINVAL;
> >
> > Is glfs_pread() guaranteed to return EINVAL on invalid alignment?
> > file-posix says this is only the case on Linux (for normal files).  Now
> > I also don’t know whether the gluster driver works on anything but Linux
> > anyway.
>
> The behaviour depends on the filesystem used by the Gluster backend. XFS
> is the recommendation, but in the end it is up to the users. The Gluster
> server is known to work on Linux, NetBSD and FreeBSD, the vast majority
> of users runs it on Linux.
>
> I do not think there is a strong guarantee EINVAL is always correct. How
> about only checking 'ret > 0'?
>

Looks like we don't have a choice.
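A C sketch of the probing loop with both review points applied: only a successful, non-empty read counts as aligned (the `ret > 0` check suggested above, instead of relying on EINVAL), and the maximum alignment is derived from the last table entry so it cannot silently diverge from it. The read callback and the fake backend are hypothetical stand-ins for glfs_pread() against real storage.

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical read callback standing in for glfs_pread() at offset 0. */
typedef ssize_t (*hyp_read_fn)(void *opaque, size_t len);

static const size_t hyp_alignments[] = {1, 512, 1024, 2048, 4096};
#define HYP_N_ALIGN (sizeof(hyp_alignments) / sizeof(hyp_alignments[0]))
/* Derived from the table, so the two cannot drift apart. */
#define HYP_MAX_ALIGN (hyp_alignments[HYP_N_ALIGN - 1])

/* Returns the detected request alignment, or 0 if no candidate worked. */
static size_t hyp_probe_alignment(hyp_read_fn read_fn, void *opaque)
{
    for (size_t i = 0; i < HYP_N_ALIGN; i++) {
        size_t align = hyp_alignments[i];

        /* Only a successful, non-empty read counts; errno is not consulted. */
        if (read_fn(opaque, align) > 0) {
            /* A 1-byte read succeeding means "undetectable": use the safe maximum. */
            return align != 1 ? align : HYP_MAX_ALIGN;
        }
    }
    return 0;
}

/* Fake backend for the sketch: reads succeed only in whole sectors. */
static ssize_t hyp_fake_read(void *opaque, size_t len)
{
    size_t sector = *(const size_t *)opaque;

    return (len % sector == 0) ? (ssize_t)len : -1;
}
```

A 0 result corresponds to the "Could not find working O_DIRECT alignment" error path in the patch.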

>
> > > +}
> > > +
> > > +static void gluster_probe_alignment(BlockDriverState *bs, struct glfs_fd *fd,
> > > +Error **errp)
> > > +{
> > > +char *buf;
> > > +size_t alignments[] = {1, 512, 1024, 2048, 4096};
> > > +size_t align;
> > > +int i;
> > > +
> > > +buf = qemu_memalign(MAX_ALIGN, MAX_ALIGN);
> > > +
> > > +for (i = 0; i < ARRAY_SIZE(alignments); i++) {
> > > +align = alignments[i];
> > > +if (gluster_is_io_aligned(fd, buf, align)) {
> > > +/* Fallback to safe value. */
> > > +bs->bl.request_alignment = (align != 1) ? align : MAX_ALIGN;
> > > +break;
> > > +}
> > > +}
> >
> > I don’t like the fact that the last element of alignments[] should be
> > the same as MAX_ALIGN, without that ever having been made explicit
> > anywhere.
> >
> > It’s a bit worse in the file-posix patch, because if getpagesize() is
> > greater than 4k, max_align will be greater than 4k.  But MAX_BLOCKSIZE
> > is 4k, too, so I suppose we wouldn’t support any block size beyond that
> > anyway, which makes the error message appropriate still.
> >
> > > +
> > > +qemu_vfree(buf);
> > > +
> > > +if (!bs->bl.request_alignment) {
> > > +error_setg(errp, "Could not find working O_DIRECT alignment");
> > > +error_append_hint(errp, "Try cache.direct=off\n");
> > > +}
> > > +}
> > > +
> > >  static void qemu_gluster_refresh_limits(BlockDriverState *bs, Error **errp)
> > >  {
> > > +BDRVGlusterState *s = bs->opaque;
> > > +
> > > +gluster_probe_alignment(bs, s->fd, errp);
> > > +
> > > +bs->bl.min_mem_alignment = bs->bl.request_alignment;
> >
> >

Re: [Qemu-devel] [PATCH] block: posix: Always allocate the first block

2019-08-22 Thread Nir Soffer

On Thu, Aug 22, 2019 at 9:11 PM Max Reitz  wrote:

> On 22.08.19 18:39, Nir Soffer wrote:
> > On Thu, Aug 22, 2019 at 5:28 PM Max Reitz wrote:
> >
> > On 16.08.19 23:21, Nir Soffer wrote:
> > > When creating an image with preallocation "off" or "falloc", the first
> > > block of the image is typically not allocated. When using Gluster
> > > storage backed by XFS filesystem, reading this block using direct
> I/O
> > > succeeds regardless of request length, fooling alignment detection.
> > >
> > > In this case we fall back to a safe value (4096) instead of the optimal
> > > value (512), which may lead to unneeded data copying when aligning
> > > requests.  Allocating the first block avoids the fallback.
> > >
> > > When using preallocation=off, we always allocate at least one filesystem
> > > block:
> > >
> > > $ ./qemu-img create -f raw test.raw 1g
> > > Formatting 'test.raw', fmt=raw size=1073741824
> > >
> > > $ ls -lhs test.raw
> > > 4.0K -rw-r--r--. 1 nsoffer nsoffer 1.0G Aug 16 23:48 test.raw
> > >
> > > I did quick performance tests for these flows:
> > > - Provisioning a VM with a new raw image.
> > > - Copying disks with qemu-img convert to new raw target image
> > >
> > > I installed Fedora 29 server on a raw sparse image, measuring the time
> > > from clicking "Begin installation" until the "Reboot" button appears:
> > >
> > > Before(s)  After(s)  Diff(%)
> > > ----------------------------
> > > 356        389       +8.4
> > >
> > > I ran this only once, so we cannot tell much from these results.
> >
> > So you’d expect it to be fast but it was slower?  Well, you only ran it
> > once and it isn’t really a precise benchmark...
> >
> > > The second test was cloning the installation image with qemu-img
> > > convert, doing 10 runs:
> > >
> > > for i in $(seq 10); do
> > > rm -f dst.raw
> > > sleep 10
> > > time ./qemu-img convert -f raw -O raw -t none -T none src.raw dst.raw
> > > done
> > >
> > > Here is a table comparing the total time spent:
> > >
> > > Type   Before(s)   After(s)   Diff(%)
> > > -------------------------------------
> > > real     530.028    469.123     -11.4
> > > user      17.204     10.768     -37.4
> > > sys       17.881      7.011     -60.7
> > >
> > > Here we see very clear improvement in CPU usage.
> > >
> > > Signed-off-by: Nir Soffer
> > > ---
> > >  block/file-posix.c | 25 +
> > >  tests/qemu-iotests/150.out |  1 +
> > >  tests/qemu-iotests/160 |  4 
> > >  tests/qemu-iotests/175 | 19 +--
> > >  tests/qemu-iotests/175.out |  8 
> > >  tests/qemu-iotests/221.out | 12 
> > >  tests/qemu-iotests/253.out | 12 
> > >  7 files changed, 63 insertions(+), 18 deletions(-)
> > >
> > > diff --git a/block/file-posix.c b/block/file-posix.c
> > > index b9c33c8f6c..3964dd2021 100644
> > > --- a/block/file-posix.c
> > > +++ b/block/file-posix.c
> > > @@ -1755,6 +1755,27 @@ static int handle_aiocb_discard(void *opaque)
> > >  return ret;
> > >  }
> > >
> > > +/*
> > > + * Help alignment detection by allocating the first block.
> > > + *
> > > + * When reading with direct I/O from unallocated area on Gluster backed by XFS,
> > > + * reading succeeds regardless of request length. In this case we fallback to
> > > + * safe alignment which is not optimal. Allocating the first block avoids this
> > > + * fallback.
> > > + *
> > > + * Returns: 0 on success, -errno on failure.
> > > + */
> > > +static int allocate_first_block(int fd)
> > > +{
> > > +ssize_t n;
> > > +
> > > +do {
> > > +n = pwrite(fd, "\0", 1, 0);
> >
> > This breaks when fd has been opened with O_DIRECT.
> >
> >
> > It seems that we always open images without O_DIRECT when creating an image
> > in qemu-img create, or when creating a target image in qemu-img convert.
>
> Yes.  But you don’t call this function directly from image creation code
> but instead from the truncation function.  (The former also calls the
> latter, but truncating is also an operation on its own.)
>
> [...]
>
> > (Which happens when you open some file with cache.direct=on, and then
> > use e.g. QMP’s block_resize.)
> >
> >
> > What would be a command triggering this? I can add a test.
>
> block_resize, as I’ve said:
>
> $ ./qemu-img create -f raw empty.img 0
>

This is an extreme edge case; why would someone create such an image?
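A C sketch of an allocation helper that stays valid when the fd was opened with O_DIRECT: it writes a whole block of zeros from an aligned buffer at offset 0 instead of a single byte, retrying on EINTR. This is not the patch's code. The name, the `align` parameter (assumed to be a power of two and a multiple of the storage's logical block size, e.g. 4096), and the choice to skip images smaller than one block (so an empty image is not grown) are assumptions. Like the original, it assumes the first block holds no data worth preserving.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 0 on success, -errno on failure. */
static int hyp_allocate_first_block(int fd, size_t align, off_t image_size)
{
    void *buf;
    ssize_t n;
    int ret;

    /* Too small to hold a full block: leave it alone rather than grow it. */
    if (image_size < (off_t)align) {
        return 0;
    }

    /* O_DIRECT needs an aligned buffer, length, and offset. */
    ret = posix_memalign(&buf, align, align);
    if (ret != 0) {
        return -ret;
    }
    memset(buf, 0, align);

    do {
        n = pwrite(fd, buf, align, 0);
    } while (n == -1 && errno == EINTR);

    ret = (n == -1) ? -errno : 0;  /* capture errno before free() */
    free(buf);
    return ret;
}
```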


> $ x86_64-softmmu/qemu-system-x86_64 \
> -qmp stdio \
> -blockdev

[Qemu-devel] [PATCH 1/2] linux-user: Pass CPUState to MAX_RESERVED_VA

2019-08-22 Thread Richard Henderson

Turn the scalar macro into a functional macro.  Move the creation
of the cpu up a bit within main() so that we can pass it to the
invocation of MAX_RESERVED_VA.  Delay the validation of the -R
parameter until MAX_RESERVED_VA is computed.

So far no changes to any of the MAX_RESERVED_VA macros to actually
use the cpu in any way, but ARM will need it.

Signed-off-by: Richard Henderson 
---
 linux-user/arm/target_cpu.h |  2 +-
 linux-user/main.c   | 43 +
 2 files changed, 26 insertions(+), 19 deletions(-)

diff --git a/linux-user/arm/target_cpu.h b/linux-user/arm/target_cpu.h
index 8a3764919a..279ea532d5 100644
--- a/linux-user/arm/target_cpu.h
+++ b/linux-user/arm/target_cpu.h
@@ -21,7 +21,7 @@
 
 /* We need to be able to map the commpage.
See validate_guest_space in linux-user/elfload.c.  */
-#define MAX_RESERVED_VA  0xffff0000ul
+#define MAX_RESERVED_VA(CPU)  0xffff0000ul
 
 static inline void cpu_clone_regs(CPUARMState *env, target_ulong newsp)
 {
diff --git a/linux-user/main.c b/linux-user/main.c
index 47917bbb20..35da3bf14c 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -78,12 +78,12 @@ int have_guest_base;
   (TARGET_LONG_BITS == 32 || defined(TARGET_ABI32))
 /* There are a number of places where we assign reserved_va to a variable
of type abi_ulong and expect it to fit.  Avoid the last page.  */
-#   define MAX_RESERVED_VA  (0xfffffffful & TARGET_PAGE_MASK)
+#   define MAX_RESERVED_VA(CPU)  (0xfffffffful & TARGET_PAGE_MASK)
 #  else
-#   define MAX_RESERVED_VA  (1ul << TARGET_VIRT_ADDR_SPACE_BITS)
+#   define MAX_RESERVED_VA(CPU)  (1ul << TARGET_VIRT_ADDR_SPACE_BITS)
 #  endif
 # else
-#  define MAX_RESERVED_VA  0
+#  define MAX_RESERVED_VA(CPU)  0
 # endif
 #endif
 
@@ -357,8 +357,7 @@ static void handle_arg_reserved_va(const char *arg)
 unsigned long unshifted = reserved_va;
 p++;
 reserved_va <<= shift;
-if (reserved_va >> shift != unshifted
-|| (MAX_RESERVED_VA && reserved_va > MAX_RESERVED_VA)) {
+if (reserved_va >> shift != unshifted) {
 fprintf(stderr, "Reserved virtual address too big\n");
 exit(EXIT_FAILURE);
 }
@@ -607,6 +606,7 @@ int main(int argc, char **argv, char **envp)
 int i;
 int ret;
 int execfd;
+unsigned long max_reserved_va;
 
 error_init(argv[0]);
 module_call_init(MODULE_INIT_TRACE);
@@ -672,24 +672,31 @@ int main(int argc, char **argv, char **envp)
 /* init tcg before creating CPUs and to get qemu_host_page_size */
 tcg_exec_init(0);
 
-/* Reserving *too* much vm space via mmap can run into problems
-   with rlimits, oom due to page table creation, etc.  We will still try it,
-   if directed by the command-line option, but not by default.  */
-if (HOST_LONG_BITS == 64 &&
-TARGET_VIRT_ADDR_SPACE_BITS <= 32 &&
-reserved_va == 0) {
-/* reserved_va must be aligned with the host page size
- * as it is used with mmap()
- */
-reserved_va = MAX_RESERVED_VA & qemu_host_page_mask;
-}
-
 cpu = cpu_create(cpu_type);
 env = cpu->env_ptr;
 cpu_reset(cpu);
-
 thread_cpu = cpu;
 
+/*
+ * Reserving too much vm space via mmap can run into problems
+ * with rlimits, oom due to page table creation, etc.  We will
+ * still try it, if directed by the command-line option, but
+ * not by default.
+ */
+max_reserved_va = MAX_RESERVED_VA(cpu);
+if (reserved_va != 0) {
+if (max_reserved_va && reserved_va > max_reserved_va) {
+fprintf(stderr, "Reserved virtual address too big\n");
+exit(EXIT_FAILURE);
+}
+} else if (HOST_LONG_BITS == 64 && TARGET_VIRT_ADDR_SPACE_BITS <= 32) {
+/*
+ * reserved_va must be aligned with the host page size
+ * as it is used with mmap()
+ */
+reserved_va = max_reserved_va & qemu_host_page_mask;
+}
+
 if (getenv("QEMU_STRACE")) {
 do_strace = 1;
 }
-- 
2.17.1
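The control flow that this patch moves into main() can be sketched as a standalone C function. This is an illustration only: the CPU type is a hypothetical stand-in, the two limits are the ones from the ARM patch in this series, and `reserve_by_default` stands for the `HOST_LONG_BITS == 64 && TARGET_VIRT_ADDR_SPACE_BITS <= 32` condition.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the CPU object; only the M-profile bit matters. */
typedef struct {
    bool m_profile;
} HypArmCpu;

/* Same limits as the ARM patch in this series. */
static unsigned long hyp_max_reserved_va(const HypArmCpu *cpu)
{
    /* M-profile: 2GB, below the magic high addresses. */
    /* Otherwise: we need to be able to map the commpage at 0xffff0000. */
    return cpu->m_profile ? 0x80000000ul : 0xffff0000ul;
}

/*
 * Validate an explicit -R value (*reserved_va != 0) against the per-CPU
 * maximum, or pick the default when none was given.  Returns false when
 * the requested value is too big.
 */
static bool hyp_resolve_reserved_va(unsigned long *reserved_va,
                                    const HypArmCpu *cpu,
                                    unsigned long host_page_mask,
                                    bool reserve_by_default)
{
    unsigned long max = hyp_max_reserved_va(cpu);

    if (*reserved_va != 0) {
        /* A maximum of 0 means "no limit", as in main.c. */
        return !(max != 0 && *reserved_va > max);
    }
    if (reserve_by_default) {
        /* mmap() needs a host-page-aligned size. */
        *reserved_va = max & host_page_mask;
    }
    return true;
}
```

The point of the reordering is visible here: the limit depends on the CPU, so the -R value can only be checked once the CPU exists.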

[Qemu-devel] [PATCH 2/2] linux-user/arm: Adjust MAX_RESERVED_VA for M-profile

2019-08-22 Thread Richard Henderson

Limit the virtual address space for M-profile cpus to 2GB,
so that we avoid all of the magic addresses in the top half
of the M-profile system map.

Signed-off-by: Richard Henderson 
---
 linux-user/arm/target_cpu.h | 24 +---
 1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/linux-user/arm/target_cpu.h b/linux-user/arm/target_cpu.h
index 279ea532d5..3f79356a07 100644
--- a/linux-user/arm/target_cpu.h
+++ b/linux-user/arm/target_cpu.h
@@ -19,9 +19,27 @@
 #ifndef ARM_TARGET_CPU_H
 #define ARM_TARGET_CPU_H
 
-/* We need to be able to map the commpage.
-   See validate_guest_space in linux-user/elfload.c.  */
-#define MAX_RESERVED_VA(CPU)  0xffff0000ul
+static inline unsigned long arm_max_reserved_va(CPUState *cs)
+{
+ARMCPU *cpu = ARM_CPU(cs);
+
+if (arm_feature(&cpu->env, ARM_FEATURE_M)) {
+/*
+ * There are magic return addresses above 0xfe000000,
+ * and in general a lot of M-profile system stuff in
+ * the high addresses.  Restrict linux-user to the
+ * cached write-back RAM in the system map.
+ */
+return 0x80000000ul;
+} else {
+/*
+ * We need to be able to map the commpage.
+ * See validate_guest_space in linux-user/elfload.c.
+ */
+return 0xffff0000ul;
+}
+}
+#define MAX_RESERVED_VA  arm_max_reserved_va
 
 static inline void cpu_clone_regs(CPUARMState *env, target_ulong newsp)
 {
-- 
2.17.1

[Qemu-devel] [PATCH 0/2] linux-user/arm: Adjust MAX_RESERVED_VA for M-profile

2019-08-22 Thread Richard Henderson

This is inspired by the discussion in

   https://bugs.launchpad.net/qemu/+bug/1840922

Previously I suggested a new CPUClass hook, but when I went
to implement it, that seemed like overkill.


r~


Richard Henderson (2):
  linux-user: Pass CPUState to MAX_RESERVED_VA
  linux-user/arm: Adjust MAX_RESERVED_VA for M-profile

 linux-user/arm/target_cpu.h | 24 ++---
 linux-user/main.c   | 43 +
 2 files changed, 46 insertions(+), 21 deletions(-)

-- 
2.17.1

Re: [Qemu-devel] [PATCH] block: workaround for unaligned byte range in fallocate()

2019-08-22 Thread Vladimir Sementsov-Ogievskiy

22.08.2019 21:31, Andrey Shinkevich wrote:
> Revert the commit 118f99442d 'block/io.c: fix for the allocation failure'
> and make better error handling for the file systems that do not support
> fallocate() for the unaligned byte range. Allow falling back to pwrite
> in case fallocate() returns EINVAL.
> 
> Suggested-by: Kevin Wolf 
> Suggested-by: Eric Blake 
> Signed-off-by: Andrey Shinkevich 
> ---
> Discussed in email thread with the message ID
> <1554474244-553661-1-git-send-email-andrey.shinkev...@virtuozzo.com>
> 
>   block/file-posix.c | 7 +++
>   block/io.c | 2 +-
>   2 files changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/block/file-posix.c b/block/file-posix.c
> index fbeb006..2c254ff 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -1588,6 +1588,13 @@ static int handle_aiocb_write_zeroes(void *opaque)
>   if (s->has_write_zeroes) {
>   int ret = do_fallocate(s->fd, FALLOC_FL_ZERO_RANGE,
>  aiocb->aio_offset, aiocb->aio_nbytes);
> +if (ret == -EINVAL) {
> +/*
> + * Allow falling back to pwrite for file systems that
> + * do not support fallocate() for unaligned byte range.
> + */
> +return -ENOTSUP;
> +}
>   if (ret == 0 || ret != -ENOTSUP) {
>   return ret;
>   }

Hmm stop, you've done exactly what Den was afraid of:

the next line
   s->has_write_zeroes = false;

will disable write_zeroes forever.

Something like

--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1588,10 +1588,12 @@ static int handle_aiocb_write_zeroes(void *opaque)
  if (s->has_write_zeroes) {
  int ret = do_fallocate(s->fd, FALLOC_FL_ZERO_RANGE,
 aiocb->aio_offset, aiocb->aio_nbytes);
-if (ret == 0 || ret != -ENOTSUP) {
+if (ret == 0 || (ret != -ENOTSUP && ret != -EINVAL)) {
  return ret;
  }
-s->has_write_zeroes = false;
+if (ret == -ENOTSUP) {
+s->has_write_zeroes = false;
+}
  }
  #endif


will work better. So, handle ENOTSUP as "disable write_zeroes forever", and
EINVAL as "don't disable, but fall back to writing zeros". And we need the same
handling for the following do_fallocate() calls too (otherwise they fail with
EINVAL again, which will break the whole thing).
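A minimal C sketch of the error classification proposed above, assuming the fallocate() result is passed in as a plain negative errno. `struct zero_state` and `handle_write_zeroes()` are hypothetical names, not QEMU code, and the later zeroing methods in handle_aiocb_write_zeroes() are omitted:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the driver state; not QEMU's BDRVRawState. */
struct zero_state {
    bool has_write_zeroes;   /* cleared only when ZERO_RANGE is unsupported */
};

/*
 * fallocate_ret: result of the FALLOC_FL_ZERO_RANGE attempt (0 or -errno).
 * Returns 0 on success, a hard error to propagate, or -ENOTSUP to ask the
 * generic block layer to fall back to writing a zeroed bounce buffer.
 */
static int handle_write_zeroes(struct zero_state *s, int fallocate_ret)
{
    if (s->has_write_zeroes) {
        int ret = fallocate_ret;

        if (ret == 0 || (ret != -ENOTSUP && ret != -EINVAL)) {
            return ret;                   /* success or a real I/O error */
        }
        if (ret == -ENOTSUP) {
            s->has_write_zeroes = false;  /* never try ZERO_RANGE again */
        }
        /* -EINVAL: keep the flag; only this request falls back */
    }
    /* Other zeroing methods omitted in this sketch. */
    return -ENOTSUP;
}
```

With this split, an unaligned request costs one fallback but does not permanently disable the fast path for later, aligned requests.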

> diff --git a/block/io.c b/block/io.c
> index 56bbf19..58f08cd 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -1558,7 +1558,7 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
>              assert(!bs->supported_zero_flags);
>          }
>  
> -        if (ret < 0 && !(flags & BDRV_REQ_NO_FALLBACK)) {
> +        if (ret == -ENOTSUP && !(flags & BDRV_REQ_NO_FALLBACK)) {
>              /* Fall back to bounce buffer if write zeroes is unsupported */
>              BdrvRequestFlags write_flags = flags & ~BDRV_REQ_ZERO_WRITE;
>  
> 


-- 
Best regards,
Vladimir

Re: [Qemu-devel] [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

2019-08-22 Thread Paolo Bonzini

On 22/08/19 20:29, Laszlo Ersek wrote:
> On 08/22/19 08:18, Paolo Bonzini wrote:
>> On 21/08/19 22:17, Kinney, Michael D wrote:
>>> DMA protection of memory ranges is a chipset feature. For the current
>>> QEMU implementation, what ranges of memory are guaranteed to be
>>> protected from DMA?  Is it only A/B seg and TSEG?
>>
>> Yes.
> 
> This thread (esp. Jiewen's and Mike's messages) is the first time that
> I've heard about the *existence* of such RAM ranges / the chipset
> feature. :)
> 
> Out of interest (independently of virtualization), how is a general
> purpose OS informed by the firmware, "never try to set up DMA to this
> RAM area"? Is this communicated through ACPI _CRS perhaps?
> 
> ... Ah, almost: ACPI 6.2 specifies _DMA, in "6.2.4 _DMA (Direct Memory
> Access)". It writes,
> 
> For example, if a platform implements a PCI bus that cannot access
> all of physical memory, it has a _DMA object under that PCI bus that
> describes the ranges of physical memory that can be accessed by
> devices on that bus.
> 
> Sorry about the digression, and also about being late to this thread,
> continually -- I'm primarily following and learning.

It's much simpler: these ranges are not in e820. For example:

kernel: BIOS-e820: [mem 0x00059000-0x0008bfff] usable
kernel: BIOS-e820: [mem 0x0008c000-0x000f] reserved

The ranges are not special-cased in any way by QEMU.  Simply, AB-segs
and TSEG RAM are not part of the address space except when in SMM.
Therefore, DMA to those ranges ends up in low VGA RAM[1] and in the bit
bucket, respectively.  When AB-segs are open, for example, DMA to that
area becomes possible.
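To illustrate the routing described above, here is a hypothetical C model, not QEMU's MemoryRegion API. `route_dma()`, `struct smram_cfg`, and the TSEG placement used in the usage are illustrative assumptions; the A/B segments are taken as 0xA0000-0xBFFFF:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Where a non-SMM (e.g. DMA) access ends up in this simplified model. */
enum dma_target { DMA_TO_RAM, DMA_TO_VGA, DMA_DISCARDED };

#define ABSEG_START 0xA0000u
#define ABSEG_END   0xBFFFFu      /* inclusive end of the A and B segments */

struct smram_cfg {
    uint64_t tseg_base;           /* hypothetical TSEG placement */
    uint64_t tseg_size;
    bool absegs_open;             /* AB-segs exposed outside SMM */
};

static enum dma_target route_dma(const struct smram_cfg *c, uint64_t addr)
{
    if (addr >= ABSEG_START && addr <= ABSEG_END) {
        /* Closed AB-segs: the legacy VGA window is what DMA sees. */
        return c->absegs_open ? DMA_TO_RAM : DMA_TO_VGA;
    }
    if (addr >= c->tseg_base && addr - c->tseg_base < c->tseg_size) {
        return DMA_DISCARDED;     /* TSEG RAM is absent outside SMM */
    }
    return DMA_TO_RAM;            /* ordinary guest RAM */
}
```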

Paolo

[1] old timers may remember DEF SEG=: BLOAD "foo.img",0.  It still
works with some disk device models.

Re: [Qemu-devel] [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

2019-08-22 Thread Paolo Bonzini

On 22/08/19 19:59, Laszlo Ersek wrote:
> The firmware and QEMU could agree on a formula, which would compute the
> CPU-specific SMBASE from a value pre-programmed by the firmware, and the
> initial APIC ID of the hot-added CPU.
> 
> Yes, it would duplicate code -- the calculation -- between QEMU and
> edk2. While that's not optimal, it wouldn't be a first.

No, that would be unmaintainable.  The best solution to me seems to be
to make SMBASE programmable from non-SMM code if some special conditions
hold.  Michael, would it be possible to get in contact with the Intel
architects?

Paolo

[Qemu-devel] [PATCH 2/2] backends/vhost-user.c: prevent using uninitialized vqs

2019-08-22 Thread Raphael Norwitz

Similar rationale to: e6cc11d64fc998c11a4dfcde8fda3fc33a74d844

For vhost scsi and vhost-user-scsi an issue was observed
where, of the 3 virtqueues, seabios would only set cmd,
leaving ctrl and event without a physical address.
This can cause vhost_verify_ring_part_mapping to return
ENOMEM, causing the following logs:

qemu-system-x86_64: Unable to map available ring for ring 0
qemu-system-x86_64: Verify ring failure on region 0

The issue has already been fixed elsewhere, but it was noted
that in backends/vhost-user.c, the vhost_user_backend_dev_init()
function, which other vdevs use in their realize() to initialize
their vqs, was not properly zeroing out the queues. This
commit ensures hardware modules using the
vhost_user_backend_dev_init() API properly zero out their vqs on
initialization.
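A simplified C model of why zero-initialization matters here. `struct vq_model`, its fields, `alloc_vqs()` and `count_configured()` are hypothetical stand-ins, not the real `struct vhost_virtqueue` or the vhost ring verification code. calloc() plays the role of GLib's g_new0(), which returns zero-filled memory; g_new() does not:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct vhost_virtqueue. */
struct vq_model {
    uint64_t desc_phys;   /* guest-physical ring address, 0 = never set */
    uint64_t avail_phys;
};

/* Like g_new0(): every field of every queue starts at zero. */
static struct vq_model *alloc_vqs(size_t n)
{
    return calloc(n, sizeof(struct vq_model));
}

/*
 * Count queues the guest actually configured, skipping zeroed ones.
 * With malloc()/g_new() instead, unset fields are indeterminate and a
 * check like this could mistake garbage for a real ring address.
 */
static size_t count_configured(const struct vq_model *vqs, size_t n)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (vqs[i].desc_phys != 0) {
            count++;
        }
    }
    return count;
}
```

For example, when firmware sets up only the third queue (cmd), the first two stay recognisably unset.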

Suggested-by: Philippe Mathieu-Daude 
Signed-off-by: Raphael Norwitz 
---
 backends/vhost-user.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/backends/vhost-user.c b/backends/vhost-user.c
index 0a13506..2bf3406 100644
--- a/backends/vhost-user.c
+++ b/backends/vhost-user.c
@@ -46,7 +46,7 @@ vhost_user_backend_dev_init(VhostUserBackend *b, VirtIODevice *vdev,
 
     b->vdev = vdev;
     b->dev.nvqs = nvqs;
-    b->dev.vqs = g_new(struct vhost_virtqueue, nvqs);
+    b->dev.vqs = g_new0(struct vhost_virtqueue, nvqs);
 
     ret = vhost_dev_init(&b->dev, &b->vhost_user, VHOST_BACKEND_TYPE_USER, 0);
     if (ret < 0) {
-- 
1.9.4

Re: [Qemu-devel] [Slirp] [PATCH 1/2] Do not reassemble fragments pointing outside of the original payload

2019-08-22 Thread Samuel Thibault

Hello,

Philippe Mathieu-Daudé, on Thu. 22 Aug 2019 16:41:33 +0200, wrote:
>   Later the newly calculated pointer q is converted into ip structure
>   and values are modified. Due to the wrong calculation of the delta,
>   ip will be pointing to an incorrect location and ip_src and ip_dst can
>   be used to write controlled data onto the calculated location. This
>   may also crash qemu if the calculated ip is located in an unmapped area.

That does not seem to be related to this:

> Do not queue fragments pointing out of the original payload to avoid
> calculating the variable delta.

I don't understand the relation with having to calculate delta.

> diff --git a/src/ip_input.c b/src/ip_input.c
> index 7364ce0..ee52085 100644
> --- a/src/ip_input.c
> +++ b/src/ip_input.c
> @@ -304,6 +304,19 @@ static struct ip *ip_reass(Slirp *slirp, struct ip *ip, struct ipq *fp)
>  ip_deq(q->ipf_prev);
>  }
>  
> +/*
> + * If we received the first fragment, we know the original
> + * payload size.

? We only know the total payload size when receiving the last fragment
(payload = offset*8 + size).
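Samuel's formula written out as a small C helper. It assumes the raw ip_off header layout (13-bit offset in 8-byte units, MF flag 0x2000) in host byte order; `total_payload_from_fragment()` is a hypothetical name, and slirp's own bookkeeping may store the offset differently:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IP_MF_FLAG  0x2000u   /* "more fragments" bit of the ip_off field */
#define IP_OFF_MASK 0x1fffu   /* fragment offset, in 8-byte units */

/*
 * frag_off_field: raw ip_off value in host byte order.
 * data_len: payload bytes carried by this fragment (IP header excluded).
 * Only the last fragment (MF clear) reveals the total payload size.
 */
static bool total_payload_from_fragment(uint16_t frag_off_field,
                                        uint32_t data_len, uint32_t *total)
{
    if (frag_off_field & IP_MF_FLAG) {
        return false;         /* more fragments follow: size still unknown */
    }
    *total = (uint32_t)(frag_off_field & IP_OFF_MASK) * 8u + data_len;
    return true;
}
```

A last fragment at offset field 185 carrying 520 bytes gives 185 * 8 + 520 = 2000 bytes of total payload.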

> Verify fragments are within our payload.

By construction of the protocol, fragments can only be within the
payload, since it's the last fragment which provides the payload size.

> +    for (q = fp->frag_link.next; q != (struct ipasfrag*)&fp->frag_link;
> +         q = q->ipf_next) {
> +        if (!q->ipf_off && q->ipf_len) {
> +            if (ip->ip_off + ip->ip_len >= q->ipf_len) {
> +                goto dropfrag;
> +            }
> +        }
> +    }

Fragments are kept in order, so there is no need to walk the list to
find the fragment with offset zero: if it is there, it is the first one.
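The ordering argument can be shown with a hypothetical sorted singly linked list. slirp actually uses a circular doubly linked list with a sentinel, so `struct frag`, `insert_sorted()` and `first_fragment()` are only a model: insertion keeps fragments ordered by offset, so the offset-zero fragment, if present, is always the head.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fragment record; slirp's struct ipasfrag differs. */
struct frag {
    unsigned off;        /* byte offset of this fragment in the datagram */
    unsigned len;        /* payload bytes in this fragment */
    struct frag *next;
};

/* Insert f so that the list stays sorted by ascending offset. */
static void insert_sorted(struct frag **head, struct frag *f)
{
    struct frag **pp = head;

    while (*pp != NULL && (*pp)->off < f->off) {
        pp = &(*pp)->next;
    }
    f->next = *pp;
    *pp = f;
}

/* Sorted order means only the head can have offset zero: no list walk. */
static const struct frag *first_fragment(const struct frag *head)
{
    return (head != NULL && head->off == 0) ? head : NULL;
}
```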

Did you make your test with commit 126c04acbabd ("Fix heap overflow in
ip_reass on big packet input") applied?

Samuel

[Qemu-devel] [PATCH 1/2] vhost-user-blk: prevent using uninitialized vqs

2019-08-22 Thread Raphael Norwitz

Same rationale as: e6cc11d64fc998c11a4dfcde8fda3fc33a74d844

Of the 3 virtqueues, seabios only sets cmd, leaving ctrl
and event without a physical address. This can cause
vhost_verify_ring_part_mapping to return ENOMEM, causing
the following logs:

qemu-system-x86_64: Unable to map available ring for ring 0
qemu-system-x86_64: Verify ring failure on region 0

This has already been fixed for vhost scsi devices and was
recently fixed for vhost-user scsi devices. This commit fixes it for
vhost-user-blk devices.

Suggested-by: Philippe Mathieu-Daude 
Signed-off-by: Raphael Norwitz 
---
 hw/block/vhost-user-blk.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 0b8c5df..63da9bb 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -421,7 +421,7 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
     }
 
     s->inflight = g_new0(struct vhost_inflight, 1);
-    s->vqs = g_new(struct vhost_virtqueue, s->num_queues);
+    s->vqs = g_new0(struct vhost_virtqueue, s->num_queues);
     s->watch = 0;
     s->connected = false;
 
 
-- 
1.9.4

[Qemu-devel] [PATCH] block: workaround for unaligned byte range in fallocate()

2019-08-22 Thread Andrey Shinkevich

Revert commit 118f99442d 'block/io.c: fix for the allocation failure'
and improve error handling for file systems that do not support
fallocate() on an unaligned byte range: allow falling back to pwrite
when fallocate() returns EINVAL.

Suggested-by: Kevin Wolf 
Suggested-by: Eric Blake 
Signed-off-by: Andrey Shinkevich 
---
Discussed in email thread with the message ID
<1554474244-553661-1-git-send-email-andrey.shinkev...@virtuozzo.com>

 block/file-posix.c | 7 +++
 block/io.c | 2 +-
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index fbeb006..2c254ff 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1588,6 +1588,13 @@ static int handle_aiocb_write_zeroes(void *opaque)
     if (s->has_write_zeroes) {
         int ret = do_fallocate(s->fd, FALLOC_FL_ZERO_RANGE,
                                aiocb->aio_offset, aiocb->aio_nbytes);
+        if (ret == -EINVAL) {
+            /*
+             * Allow falling back to pwrite for file systems that
+             * do not support fallocate() for unaligned byte range.
+             */
+            return -ENOTSUP;
+        }
         if (ret == 0 || ret != -ENOTSUP) {
             return ret;
         }
diff --git a/block/io.c b/block/io.c
index 56bbf19..58f08cd 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1558,7 +1558,7 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
             assert(!bs->supported_zero_flags);
         }
 
-        if (ret < 0 && !(flags & BDRV_REQ_NO_FALLBACK)) {
+        if (ret == -ENOTSUP && !(flags & BDRV_REQ_NO_FALLBACK)) {
             /* Fall back to bounce buffer if write zeroes is unsupported */
             BdrvRequestFlags write_flags = flags & ~BDRV_REQ_ZERO_WRITE;
 
-- 
1.8.3.1

Re: [Qemu-devel] [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

2019-08-22 Thread Laszlo Ersek

On 08/22/19 08:18, Paolo Bonzini wrote:
> On 21/08/19 22:17, Kinney, Michael D wrote:
>> Paolo,
>>
>> It makes sense to match real HW.
>
> Note that it'd also be fine to match some kind of official Intel
> specification even if no processor (currently?) supports it.

I agree, because...

>> That puts us back to the reset vector and handling the initial SMI at
>> 3000:8000.  That is all workable from a FW implementation
>> perspective.

that would suggest that matching reset vector code already exists, and
it would "only" need to be upstreamed to edk2. :)

>> It looks like the only issue left is DMA.
>>
>> DMA protection of memory ranges is a chipset feature. For the current
>> QEMU implementation, what ranges of memory are guaranteed to be
>> protected from DMA?  Is it only A/B seg and TSEG?
>
> Yes.

(

This thread (esp. Jiewen's and Mike's messages) is the first time that
I've heard about the *existence* of such RAM ranges / the chipset
feature. :)

Out of interest (independently of virtualization), how is a general
purpose OS informed by the firmware, "never try to set up DMA to this
RAM area"? Is this communicated through ACPI _CRS perhaps?

... Ah, almost: ACPI 6.2 specifies _DMA, in "6.2.4 _DMA (Direct Memory
Access)". It writes,

For example, if a platform implements a PCI bus that cannot access
all of physical memory, it has a _DMA object under that PCI bus that
describes the ranges of physical memory that can be accessed by
devices on that bus.

Sorry about the digression, and also about being late to this thread,
continually -- I'm primarily following and learning.

)

Thanks!
Laszlo

Re: [Qemu-devel] [PATCH 0/3] target/mips: Convert to do_transaction_failed hook

2019-08-22 Thread Aleksandar Markovic

On 02.08.2019 at 18:05, "Peter Maydell" wrote:
>
> This patchset converts the MIPS target away from the
> old broken do_unassigned_access hook to the new (added in
> 2017...) do_transaction_failed hook.
>

Herve, hello.

As far as I can see these changes are fine. May I ask you for your opinion?
Can you run your Jazz tests without regressions with this change?

Many thanks,
Aleksandar

> The motivation here is:
>  * do_unassigned_access is broken because:
> + it will be called for any kind of access to physical addresses
>   where there is no assigned device, whether that access is by the
>   CPU or by something else (like a DMA controller!), so it can
>   result in spurious guest CPU exceptions.
> + It will also get called even when using KVM, when there's nothing
>   useful it can do.
> + It isn't passed in the return-address within the TCG generated
>   code, so it isn't able to correctly restore the CPU state
>   before generating the exception, and so the exception will
>   often be generated with the wrong faulting guest PC value
>  * there are now only a few targets still using the old hook,
>    so if we can convert them we can delete all the old code
>    and complete this API transition. (Patches for SPARC are on
>    the list; the other user is RISCV, which accidentally
>    implemented the old hook rather than the new one recently.)
>
> The general approach to the conversion is to check the target for
> load/store-by-physical-address operations which were previously
> implicitly causing exceptions, to see if they now need to explicitly
> check for and handle memory access failures. (The 'git grep' regexes
> in docs/devel/loads-stores.rst are useful here: the API families to
> look for are ld*_phys/st*_phys, address_space_ld/st*, and
> cpu_physical_memory*.)
>
> For MIPS, there are none of these (the usual place where targets do
> this is hardware page table walks where the page table entries are
> loaded by physical address, and MIPS doesn't seem to have those).
>
> Code audit out of the way, the actual hook changeover is pretty
> simple.
>
> The complication here is the MIPS Jazz board, which has some rather
> dubious code that intercepts the do_unassigned_access hook to suppress
> generation of exceptions for invalid accesses due to data accesses,
> while leaving exceptions for invalid instruction fetches in place. I'm
> a bit dubious about whether the behaviour we have implemented here is
> really what the hardware does -- it seems pretty odd to me to not
> generate exceptions for d-side accesses but to generate them for
> i-side accesses, and looking back through git and mailing list history
> this code is here mainly as "put back the behaviour we had before a
> previous commit broke it", and that older behaviour in turn I think is
> more historical-accident than because anybody deliberately checked the
> hardware behaviour and made QEMU work that way. However, I don't have
> any real hardware to do comparative tests on, so this series retains
> the same behaviour we had before on this board, by making it intercept
> the new hook in the same way it did the old one. I've beefed up the
> comment somewhat to indicate what we're doing, why, and why it might
> not be right.
>
> The patch series is structured in three parts:
>  * make the Jazz board code support CPUs regardless of which
>    of the two hooks they implement
>  * switch the MIPS CPUs over to implementing the new hook
>  * remove the no-longer-needed Jazz board code for the old
>    hook
> (This seemed cleaner to me than squashing the whole thing into
> a single patch that touched core mips code and the jazz board
> at the same time.)
>
> I have tested this with:
>  * the ARC Multiboot BIOS linked to from the bug
>    https://bugs.launchpad.net/qemu/+bug/1245924 (and which
>    was the test case for needing the hook intercept)
>  * a Linux kernel for the 'mips' mips r4k machine
>  * 'make check'
> Obviously more extensive testing would be useful, but I
> don't have any other test images. I also don't have
> a KVM MIPS host, which would be worth testing to confirm
> that it also still works.
>
> If anybody happens by some chance to still have a working
> real-hardware Magnum or PICA61 board, we could perhaps test
> how it handles accesses to invalid memory, but I suspect that
> nobody does any more :-)
>
> thanks
> -- PMM
>
>
> Peter Maydell (3):
>   hw/mips/mips_jazz: Override do_transaction_failed hook
>   target/mips: Switch to do_transaction_failed() hook
>   hw/mips/mips_jazz: Remove no-longer-necessary override of
> do_unassigned_access
>
>  target/mips/internal.h  |  8 ---
>  hw/mips/mips_jazz.c | 47 +
>  target/mips/cpu.c   |  2 +-
>  target/mips/op_helper.c | 24 +++--
>  4 files changed, 47 insertions(+), 34 deletions(-)
>
> --
> 2.20.1
>
>
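The interception pattern described in the cover letter (the board saves the CPU's own hook and installs a wrapper that only forwards instruction-fetch failures) can be modelled in plain C. The types, the single-argument hook signature, and all names below are hypothetical simplifications, not the real do_transaction_failed() signature or the mips_jazz.c code:

```c
#include <assert.h>

/* Hypothetical, simplified types; the real QEMU hook takes more arguments. */
enum access_type { ACCESS_DATA_LOAD, ACCESS_DATA_STORE, ACCESS_INST_FETCH };

typedef void (*transaction_failed_fn)(enum access_type type);

struct cpu_class_model {
    transaction_failed_fn do_transaction_failed;
};

static int exceptions_raised;   /* number of bus-error exceptions taken */

/* Stand-in for the CPU's own hook: always raises an exception. */
static void cpu_raise_bus_error(enum access_type type)
{
    (void)type;
    exceptions_raised++;
}

/* The board saves the CPU's hook before overriding it. */
static transaction_failed_fn real_do_transaction_failed;

/* Board wrapper: ignore d-side failures, forward i-side ones. */
static void board_do_transaction_failed(enum access_type type)
{
    if (type != ACCESS_INST_FETCH) {
        return;
    }
    real_do_transaction_failed(type);
}

static void board_install_hook(struct cpu_class_model *cc)
{
    real_do_transaction_failed = cc->do_transaction_failed;
    cc->do_transaction_failed = board_do_transaction_failed;
}
```

Chaining through a saved pointer keeps the CPU code unaware of the board quirk, which is what makes it possible to switch the CPU to a different hook underneath without touching the board's filtering logic.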

Re: [Qemu-devel] [PATCH] block: posix: Always allocate the first block

2019-08-22 Thread Max Reitz

On 22.08.19 18:39, Nir Soffer wrote:
> On Thu, Aug 22, 2019 at 5:28 PM Max Reitz wrote:
> 
> On 16.08.19 23:21, Nir Soffer wrote:
> > When creating an image with preallocation "off" or "falloc", the first
> > block of the image is typically not allocated. When using Gluster
> > storage backed by XFS filesystem, reading this block using direct I/O
> > succeeds regardless of request length, fooling alignment detection.
> >
> > In this case we fall back to a safe value (4096) instead of the optimal
> > value (512), which may lead to unneeded data copying when aligning
> > requests.  Allocating the first block avoids the fallback.
> >
> > When using preallocation=off, we always allocate at least one filesystem
> > block:
> >
> >     $ ./qemu-img create -f raw test.raw 1g
> >     Formatting 'test.raw', fmt=raw size=1073741824
> >
> >     $ ls -lhs test.raw
> >     4.0K -rw-r--r--. 1 nsoffer nsoffer 1.0G Aug 16 23:48 test.raw
> >
> > I did quick performance tests for these flows:
> > - Provisioning a VM with a new raw image.
> > - Copying disks with qemu-img convert to new raw target image
> >
> > I installed Fedora 29 server on raw sparse image, measuring the time
> > from clicking "Begin installation" until the "Reboot" button appears:
> >
> > Before(s)  After(s)     Diff(%)
> > ---
> >      356        389        +8.4
> >
> > I ran this only once, so we cannot tell much from these results.
> 
> So you’d expect it to be fast but it was slower?  Well, you only ran it
> once and it isn’t really a precise benchmark...
> 
> > The second test was cloning the installation image with qemu-img
> > convert, doing 10 runs:
> >
> >     for i in $(seq 10); do
> >         rm -f dst.raw
> >         sleep 10
> >         time ./qemu-img convert -f raw -O raw -t none -T none src.raw dst.raw
> >     done
> >
> > Here is a table comparing the total time spent:
> >
> > Type    Before(s)   After(s)    Diff(%)
> > ---
> > real      530.028    469.123      -11.4
> > user       17.204     10.768      -37.4
> > sys        17.881      7.011      -60.7
> >
> > Here we see very clear improvement in CPU usage.
> >
> > Signed-off-by: Nir Soffer
> > ---
> >  block/file-posix.c         | 25 +
> >  tests/qemu-iotests/150.out |  1 +
> >  tests/qemu-iotests/160     |  4 
> >  tests/qemu-iotests/175     | 19 +--
> >  tests/qemu-iotests/175.out |  8 
> >  tests/qemu-iotests/221.out | 12 
> >  tests/qemu-iotests/253.out | 12 
> >  7 files changed, 63 insertions(+), 18 deletions(-)
> >
> > diff --git a/block/file-posix.c b/block/file-posix.c
> > index b9c33c8f6c..3964dd2021 100644
> > --- a/block/file-posix.c
> > +++ b/block/file-posix.c
> > @@ -1755,6 +1755,27 @@ static int handle_aiocb_discard(void *opaque)
> >      return ret;
> >  }
> > 
> > +/*
> > + * Help alignment detection by allocating the first block.
> > + *
> > + * When reading with direct I/O from unallocated area on Gluster backed by XFS,
> > + * reading succeeds regardless of request length. In this case we fall back to
> > + * safe alignment which is not optimal. Allocating the first block avoids this
> > + * fallback.
> > + *
> > + * Returns: 0 on success, -errno on failure.
> > + */
> > +static int allocate_first_block(int fd)
> > +{
> > +    ssize_t n;
> > +
> > +    do {
> > +        n = pwrite(fd, "\0", 1, 0);
> 
> This breaks when fd has been opened with O_DIRECT.
> 
> 
> It seems that we always open images without O_DIRECT when creating an image
> in qemu-img create, or when creating a target image in qemu-img convert.

Yes.  But you don’t call this function directly from image creation code
but instead from the truncation function.  (The former also calls the
latter, but truncating is also an operation on its own.)

[...]

> (Which happens when you open some file with cache.direct=on, and then
> use e.g. QMP’s block_resize.)
> 
> 
> What would be a command triggering this? I can add a test.

block_resize, as I’ve said:

$ ./qemu-img create -f raw empty.img 0
$ x86_64-softmmu/qemu-system-x86_64 \
-qmp stdio \
-blockdev file,node-name=file,filename=empty.img,cache.direct=on \
 < 
> It isn’t that bad because eventually you simply ignore the error.  But
> it still makes me wonder whether we shouldn’t write like the biggest
> power of two that does not exceed the new file length or MAX_BLOCKSIZE.
> 
> 
> It makes sense if there is a way to cause qemu-img to use O_DIRECT when
>
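A sketch of Max's suggestion: write zeros of the largest power of two that does not exceed the file length or MAX_BLOCKSIZE. It assumes MAX_BLOCKSIZE is 4096 (the "safe value" quoted above). `first_block_write_size()` and `allocate_first_block_sketch()` are hypothetical names, not the patch's code. The buffer comes from posix_memalign() so the write has a chance of succeeding on an O_DIRECT descriptor; a write smaller than the device's logical block size can still fail there, which the patch tolerates by ignoring the error:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption: mirrors MAX_BLOCKSIZE (4096) in block/file-posix.c. */
#define SKETCH_MAX_BLOCKSIZE 4096u

/* Largest power of two <= min(file_len, SKETCH_MAX_BLOCKSIZE); 0 if empty. */
static size_t first_block_write_size(off_t file_len)
{
    size_t limit = SKETCH_MAX_BLOCKSIZE;
    size_t size = 1;

    if (file_len <= 0) {
        return 0;
    }
    if ((unsigned long long)file_len < limit) {
        limit = (size_t)file_len;
    }
    while (size * 2 <= limit) {
        size *= 2;
    }
    return size;
}

/*
 * Hypothetical variant of allocate_first_block(): write zeros from an
 * aligned buffer at offset 0. Returns 0 on success, -errno on failure;
 * a short write still allocates the first block, so it counts as success.
 */
static int allocate_first_block_sketch(int fd, off_t file_len)
{
    size_t size = first_block_write_size(file_len);
    void *buf = NULL;
    ssize_t n;
    int ret;

    if (size == 0) {
        return 0;                       /* empty file: nothing to allocate */
    }
    ret = posix_memalign(&buf, SKETCH_MAX_BLOCKSIZE, size);
    if (ret != 0) {
        return -ret;                    /* posix_memalign returns an errno */
    }
    memset(buf, 0, size);
    do {
        n = pwrite(fd, buf, size, 0);
    } while (n == -1 && errno == EINTR);
    ret = (n == -1) ? -errno : 0;       /* capture errno before free() */
    free(buf);
    return ret;
}
```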

Re: [Qemu-devel] [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

2019-08-22 Thread Laszlo Ersek

On 08/21/19 19:05, Paolo Bonzini wrote:
> On 21/08/19 17:48, Kinney, Michael D wrote:
>> Perhaps there is a way to avoid the 3000:8000 startup
>> vector.
>>
>> If a CPU is added after a cold reset, it is already in a
>> different state because one of the active CPUs needs to
>> release it by interacting with the hot plug controller.
>>
>> Can the SMRR for CPUs in that state be pre-programmed to
>> match the SMRR in the rest of the active CPUs?
>>
>> For OVMF we expect all the active CPUs to use the same
>> SMRR value, so a check can be made to verify that all 
>> the active CPUs have the same SMRR value.  If they do,
>> then any CPU released through the hot plug controller 
>> can have its SMRR pre-programmed and the initial SMI
>> will start within TSEG.
>>
>> We just need to decide what to do in the unexpected 
>> case where all the active CPUs do not have the same
>> SMRR value.
>>
>> This should also reduce the total number of steps.
> 
> The problem is not the SMRR but the SMBASE.  If the SMBASE area is
> outside TSEG, it is vulnerable to DMA attacks independent of the SMRR.
> SMBASE is also different for all CPUs, so it cannot be preprogrammed.

The firmware and QEMU could agree on a formula, which would compute the
CPU-specific SMBASE from a value pre-programmed by the firmware, and the
initial APIC ID of the hot-added CPU.

Yes, it would duplicate code -- the calculation -- between QEMU and
edk2. While that's not optimal, it wouldn't be a first.
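A sketch of the kind of formula described above. The linear formula, the per-CPU stride (`tile_size`), and all names are hypothetical; the architectural facts used are that the SMI handler starts at SMBASE + 0x8000 (hence 3000:8000 for the default SMBASE of 0x30000) and that the state save map ends at SMBASE + 0xFFFF. The check reflects Paolo's concern that this area must lie inside TSEG to be protected from DMA:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SMI_ENTRY_OFFSET 0x8000u   /* SMI handler entry: SMBASE + 0x8000 */
#define SMRAM_AREA_END   0x10000u  /* save state map ends at SMBASE + 0xFFFF */

/* Hypothetical firmware/QEMU agreement; not an existing interface. */
struct smbase_formula {
    uint64_t base;       /* value pre-programmed by the firmware */
    uint64_t tile_size;  /* per-CPU stride between SMBASE values */
};

/* Linear formula keyed by the initial APIC ID of the hot-added CPU. */
static uint64_t smbase_for_apic_id(const struct smbase_formula *f,
                                   uint32_t apic_id)
{
    return f->base + (uint64_t)apic_id * f->tile_size;
}

/*
 * True if [SMBASE + 0x8000, SMBASE + 0x10000) lies entirely in TSEG,
 * i.e. the handler entry and save state area are not DMA-reachable.
 */
static bool smram_area_in_tseg(uint64_t smbase, uint64_t tseg_base,
                               uint64_t tseg_size)
{
    uint64_t start = smbase + SMI_ENTRY_OFFSET;
    uint64_t end = smbase + SMRAM_AREA_END;   /* exclusive */

    return start >= tseg_base && end <= tseg_base + tseg_size;
}
```

The duplication Laszlo mentions is exactly `smbase_for_apic_id()`: both QEMU and edk2 would have to carry an identical copy of it.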

Thanks
Laszlo
