Re: [Qemu-devel] [PATCH v4 5/5] optionrom/pvh: load initrd from fw_cfg

2019-01-17 Thread Liam Merwick




On 17/01/2019 14:37, Paolo Bonzini wrote:

On 17/01/19 15:33, Liam Merwick wrote:

#    pc-bios/optionrom/pvh.bin
#    pc-bios/optionrom/pvh.img
#    pc-bios/optionrom/pvh.raw


pvh.bin should not be ignored.



That's part of what I didn't quite understand. 
pc-bios/optionrom/linuxboot.bin (and the other binaries in the 
'build-all' target in pc-bios/optionrom/Makefile) are in .gitignore.


Also, should pvh.bin be added to BLOBS in Makefile like those files?

Regards,
Liam



Re: [Qemu-devel] [PATCH v4 5/5] optionrom/pvh: load initrd from fw_cfg

2019-01-17 Thread Liam Merwick

Hi Stefano,

On 17/01/2019 09:02, Stefano Garzarella wrote:

If we found initrd through fw_cfg, we can load it and use the
first module of hvm_start_info to pass initrd address and size
to the kernel.

Signed-off-by: Stefano Garzarella 
---
  pc-bios/optionrom/pvh_main.c |  21 +++--
  pc-bios/pvh.bin  | Bin 1536 -> 1536 bytes
  2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/pc-bios/optionrom/pvh_main.c b/pc-bios/optionrom/pvh_main.c
index f355476e68..d1b8b4b96e 100644
--- a/pc-bios/optionrom/pvh_main.c
+++ b/pc-bios/optionrom/pvh_main.c
@@ -46,6 +46,7 @@ struct pvh_e820_table {
  struct pvh_e820_table pvh_e820 __attribute__ ((aligned));
  
  static struct hvm_start_info start_info;

+static struct hvm_modlist_entry ramdisk_mod;
  static uint8_t cmdline_buffer[CMDLINE_BUFSIZE];
  
  
@@ -71,8 +72,8 @@ extern void pvh_load_kernel(void) asm("pvh_load_kernel");

  void pvh_load_kernel(void)
  {
  void *cmdline_addr = _buffer;
-void *kernel_entry;
-uint32_t cmdline_size, fw_cfg_version = bios_cfg_version();
+void *kernel_entry, *initrd_addr;
+uint32_t cmdline_size, initrd_size, fw_cfg_version = bios_cfg_version();
  
  start_info.magic = XEN_HVM_START_MAGIC_VALUE;

  start_info.version = 1;
@@ -110,6 +111,22 @@ void pvh_load_kernel(void)
  fw_cfg_version);
  start_info.cmdline_paddr = (uintptr_t)cmdline_addr;
  
+/* Check if we have the initrd to load */

+bios_cfg_read_entry(_size, FW_CFG_INITRD_SIZE, 4, fw_cfg_version);
+if (initrd_size) {
+bios_cfg_read_entry(_addr, FW_CFG_INITRD_ADDR, 4,
+fw_cfg_version);
+bios_cfg_read_entry(initrd_addr, FW_CFG_INITRD_DATA, initrd_size,
+fw_cfg_version);
+
+ramdisk_mod.paddr = (uintptr_t)initrd_addr;
+ramdisk_mod.size = initrd_size;
+
+/* The first module is always ramdisk. */
+start_info.modlist_paddr = (uintptr_t)_mod;
+start_info.nr_modules = 1;
+}
+
  bios_cfg_read_entry(_entry, FW_CFG_KERNEL_ENTRY, 4, 
fw_cfg_version);
  
  asm volatile("jmp *%1" : : "b"(_info), "c"(kernel_entry));

diff --git a/pc-bios/pvh.bin b/pc-bios/pvh.bin


I'm not sure what this binary is doing here but it reminded me that the 
following entries should be added to .gitignore in one of the patches.


#   pc-bios/optionrom/pvh.bin
#   pc-bios/optionrom/pvh.img
#   pc-bios/optionrom/pvh.raw

other than that, the code here LGTM so for that

Reviewed-by: Liam Merwick 



index 
38a41761014957d50eb55d790b6957888cbeee0a..8033080ada2db4c4613fdc3bb5a69d79c7b0c0ca
 100644
GIT binary patch
delta 735
zcmZqRY2cYKndyM&#

Re: [Qemu-devel] [PATCH v2 3/4] optionrom: add new PVH option rom

2019-01-16 Thread Liam Merwick

Hi Stefano,

Code LGTM, just a few minor comments below

On 15/01/2019 10:00, Stefano Garzarella wrote:

The new pvh.bin option rom can be used with SeaBIOS to boot
uncompressed kernel using the x86/HVM direct boot ABI.

pvh.S contains the entry point of the option rom. It runs
in real mode, loads the e820 table querying the BIOS, and
then it switches to 32bit protect mode and jump to the


"protect" -> "protected"
"jump" -> "jumps"


pvh_load_kernel() written in pvh_main.c.
pvh_load_kernel() loads the cmdline and kernel entry_point
using fw_cfg, then it looks for RSDP, fills the
hvm_start_info required by x86/HVM ABI, and finally jumps
to the kernel entry_point.

Signed-off-by: Stefano Garzarella 
Reviewed-by: Stefan Hajnoczi 
---
  pc-bios/optionrom/Makefile   |   5 +-
  pc-bios/optionrom/pvh.S  | 200 +++
  pc-bios/optionrom/pvh_main.c | 117 
  3 files changed, 321 insertions(+), 1 deletion(-)
  create mode 100644 pc-bios/optionrom/pvh.S
  create mode 100644 pc-bios/optionrom/pvh_main.c

diff --git a/pc-bios/optionrom/Makefile b/pc-bios/optionrom/Makefile
index a9a9e5e7eb..92c91d9949 100644
--- a/pc-bios/optionrom/Makefile
+++ b/pc-bios/optionrom/Makefile
@@ -37,7 +37,7 @@ Wa = -Wa,
  ASFLAGS += -32
  QEMU_CFLAGS += $(call cc-c-option, $(QEMU_CFLAGS), $(Wa)-32)
  
-build-all: multiboot.bin linuxboot.bin linuxboot_dma.bin kvmvapic.bin

+build-all: multiboot.bin linuxboot.bin linuxboot_dma.bin kvmvapic.bin pvh.bin
  
  # suppress auto-removal of intermediate files

  .SECONDARY:
@@ -46,6 +46,9 @@ build-all: multiboot.bin linuxboot.bin linuxboot_dma.bin 
kvmvapic.bin
  %.o: %.S
$(call quiet-command,$(CPP) $(QEMU_INCLUDES) $(QEMU_DGFLAGS) -c -o - $< | $(AS) $(ASFLAGS) 
-o $@,"AS","$(TARGET_DIR)$@")
  
+%.img: %.o %_main.o

+   $(call quiet-command,$(LD) $(LDFLAGS_NOPIE) -m $(LD_I386_EMULATION) -T 
$(SRC_PATH)/pc-bios/optionrom/flat.lds -s -o $@ $?,"BUILD","$(TARGET_DIR)$@")
+
  %.img: %.o
$(call quiet-command,$(LD) $(LDFLAGS_NOPIE) -m $(LD_I386_EMULATION) -T 
$(SRC_PATH)/pc-bios/optionrom/flat.lds -s -o $@ $<,"BUILD","$(TARGET_DIR)$@")
  
diff --git a/pc-bios/optionrom/pvh.S b/pc-bios/optionrom/pvh.S

new file mode 100644
index 00..e1d7f4a7a7
--- /dev/null
+++ b/pc-bios/optionrom/pvh.S
@@ -0,0 +1,200 @@
+/*
+ * PVH Option ROM
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see .
+ *
+ * Copyright Novell Inc, 2009
+ *   Authors: Alexander Graf 
+ *
+ * Copyright (c) 2019 Red Hat Inc.
+ *   Authors: Stefano Garzarella 
+ */
+
+#include "optionrom.h"
+
+#define BOOT_ROM_PRODUCT "PVH loader"
+
+#define GS_PROT_JUMP   0
+#define GS_GDT_DESC6
+
+#ifdef OPTION_ROM_START
+#undef OPTION_ROM_START
+#endif
+#ifdef OPTION_ROM_END
+#undef OPTION_ROM_END
+#endif
+
+/*
+ * Redefine OPTION_ROM_START and OPTION_ROM_END, because this rom is produced
+ * linking multiple objects.
+ * signrom.py will add padding.
+ */
+#define OPTION_ROM_START\
+.code16;   \
+.text; \
+   .global _start; \
+_start:;   \
+   .short  0xaa55; \
+   .byte   3; /* desired size in 512 units */
+
+#define OPTION_ROM_END \
+_end:
+
+BOOT_ROM_START
+
+run_pvhboot:
+
+   cli
+   cld
+
+   mov %cs, %eax
+   shl $0x4, %eax
+
+   /* set up a long jump descriptor that is PC relative */
+
+   /* move stack memory to %gs */
+   mov %ss, %ecx
+   shl $0x4, %ecx
+   mov %esp, %ebx
+   add %ebx, %ecx
+   sub $0x20, %ecx
+   sub $0x30, %esp
+   shr $0x4, %ecx
+   mov %cx, %gs
+
+   /* now push the indirect jump descriptor there */
+   mov (prot_jump), %ebx
+   add %eax, %ebx
+   movl%ebx, %gs:GS_PROT_JUMP
+   mov $8, %bx
+   movw%bx, %gs:GS_PROT_JUMP + 4
+
+   /* fix the gdt descriptor to be PC relative */
+   movw(gdt_desc), %bx
+   movw%bx, %gs:GS_GDT_DESC
+   movl

Re: [Qemu-devel] [PATCH v2 0/4] pvh: add new PVH option rom

2019-01-16 Thread Liam Merwick



On 15/01/2019 10:00, Stefano Garzarella wrote:

This patch series is based on "[RFC v2 0/4] QEMU changes to do PVH boot" and
provides a PVH option rom that can be used with SeaBIOS to boot uncompressed
kernel using the x86/HVM direct boot ABI.

Patches 1 and 2 are to prepare the PVH option rom, moving common functions in a
new header.  Patch 3 adds the new PVH option rom and patch 4 uses it when we
are booting an uncompressed kernel using the x86/HVM direct boot ABI.

Changes in v2:
- addressed comments by Stefan and Eric:
   - Patch 2: moved inludes on top of linuxboot_dma.c and add  in
 optrom.h
   - Patch 4: added check of pvh.bin in xen_load_linux()
- modified commit message of patch 2 to explain better the patch

Stefano Garzarella (4):
   linuxboot_dma: remove duplicate definitions of FW_CFG
   linuxboot_dma: move common functions in a new header
   optionrom: add new PVH option rom
   hw/i386/pc: use PVH option rom

  hw/i386/pc.c  |   5 +
  pc-bios/optionrom/Makefile|   5 +-
  pc-bios/optionrom/linuxboot_dma.c | 112 +++--
  pc-bios/optionrom/optrom.h| 110 
  pc-bios/optionrom/optrom_fw_cfg.h |  92 ++
  pc-bios/optionrom/pvh.S   | 200 ++
  pc-bios/optionrom/pvh_main.c  | 117 +
  7 files changed, 544 insertions(+), 97 deletions(-)
  create mode 100644 pc-bios/optionrom/optrom.h
  create mode 100644 pc-bios/optionrom/optrom_fw_cfg.h
  create mode 100644 pc-bios/optionrom/pvh.S
  create mode 100644 pc-bios/optionrom/pvh_main.c



I had a few very minor comments on patch3, but with that, for the series:

Reviewed-by: Liam Merwick 



[Qemu-devel] [PATCH v3 5/5] pvh: load initrd and expose it through fw_cfg

2019-01-15 Thread Liam Merwick
From: Stefano Garzarella 

When initrd is specified, load and expose it to the guest firmware
through fw_cfg. The firmware will fill the hvm_start_info for the
kernel.

Signed-off-by: Stefano Garzarella 
Based-on: <1545422632-2-5-git-send-email-liam.merw...@oracle.com>
Signed-off-by: Liam Merwick 
---
 hw/i386/pc.c | 38 +-
 1 file changed, 29 insertions(+), 9 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 6d549950a044..9ed5063de8f8 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1213,25 +1213,45 @@ static void load_linux(PCMachineState *pcms,
  */
 if (load_elfboot(kernel_filename, kernel_size,
  header, pvh_start_addr, fw_cfg)) {
-struct hvm_modlist_entry ramdisk_mod = { 0 };
-
 fclose(f);
 
 fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE,
 strlen(kernel_cmdline) + 1);
 fw_cfg_add_string(fw_cfg, FW_CFG_CMDLINE_DATA, kernel_cmdline);
 
-assert(machine->device_memory != NULL);
-ramdisk_mod.paddr = machine->device_memory->base;
-ramdisk_mod.size =
-memory_region_size(>device_memory->mr);
-
-fw_cfg_add_bytes(fw_cfg, FW_CFG_KERNEL_DATA, _mod,
- sizeof(ramdisk_mod));
 fw_cfg_add_i32(fw_cfg, FW_CFG_SETUP_SIZE, sizeof(header));
 fw_cfg_add_bytes(fw_cfg, FW_CFG_SETUP_DATA,
  header, sizeof(header));
 
+/* load initrd */
+if (initrd_filename) {
+gsize initrd_size;
+gchar *initrd_data;
+GError *gerr = NULL;
+
+if (!g_file_get_contents(initrd_filename, _data,
+_size, )) {
+fprintf(stderr, "qemu: error reading initrd %s: %s\n",
+initrd_filename, gerr->message);
+exit(1);
+}
+
+initrd_max = pcms->below_4g_mem_size - pcmc->acpi_data_size - 
1;
+if (initrd_size >= initrd_max) {
+fprintf(stderr, "qemu: initrd is too large, cannot 
support."
+"(max: %"PRIu32", need %"PRId64")\n",
+initrd_max, (uint64_t)initrd_size);
+exit(1);
+}
+
+initrd_addr = (initrd_max - initrd_size) & ~4095;
+
+fw_cfg_add_i32(fw_cfg, FW_CFG_INITRD_ADDR, initrd_addr);
+fw_cfg_add_i32(fw_cfg, FW_CFG_INITRD_SIZE, initrd_size);
+fw_cfg_add_bytes(fw_cfg, FW_CFG_INITRD_DATA, initrd_data,
+ initrd_size);
+}
+
 return;
 }
 /* This looks like a multiboot kernel. If it is, let's stop
-- 
1.8.3.1




[Qemu-devel] [PATCH v3 1/5] elf: Add optional function ptr to load_elf() to parse ELF notes

2019-01-15 Thread Liam Merwick
This patch adds an optional function pointer, 'elf_note_fn', to
load_elf() which causes load_elf() to additionally parse any
ELF program headers of type PT_NOTE and check to see if the ELF
Note is of the type specified by the 'translate_opaque' arg.
If a matching ELF Note is found then the specfied function pointer
is called to process the ELF note.

Passing a NULL function pointer results in ELF Notes being skipped.

The first consumer of this functionality is the PVHboot support
which needs to read the XEN_ELFNOTE_PHYS32_ENTRY ELF Note while
loading the uncompressed kernel binary in order to discover the
boot entry address for the x86/HVM direct boot ABI.

Signed-off-by: Liam Merwick 
---
 hw/alpha/dp264.c   |  4 ++--
 hw/arm/armv7m.c|  3 ++-
 hw/arm/boot.c  |  2 +-
 hw/core/generic-loader.c   |  2 +-
 hw/core/loader.c   | 24 
 hw/cris/boot.c |  3 ++-
 hw/hppa/machine.c  |  6 +++---
 hw/i386/multiboot.c|  2 +-
 hw/lm32/lm32_boards.c  |  6 --
 hw/lm32/milkymist.c|  3 ++-
 hw/m68k/an5206.c   |  2 +-
 hw/m68k/mcf5208.c  |  2 +-
 hw/microblaze/boot.c   |  7 ---
 hw/mips/mips_fulong2e.c|  5 +++--
 hw/mips/mips_malta.c   |  5 +++--
 hw/mips/mips_mipssim.c |  5 +++--
 hw/mips/mips_r4k.c |  5 +++--
 hw/moxie/moxiesim.c|  2 +-
 hw/nios2/boot.c|  7 ---
 hw/openrisc/openrisc_sim.c |  2 +-
 hw/pci-host/prep.c |  2 +-
 hw/ppc/e500.c  |  3 ++-
 hw/ppc/mac_newworld.c  |  5 +++--
 hw/ppc/mac_oldworld.c  |  5 +++--
 hw/ppc/ppc440_bamboo.c |  2 +-
 hw/ppc/sam460ex.c  |  3 ++-
 hw/ppc/spapr.c |  7 ---
 hw/ppc/virtex_ml507.c  |  2 +-
 hw/riscv/sifive_e.c|  2 +-
 hw/riscv/sifive_u.c|  2 +-
 hw/riscv/spike.c   |  2 +-
 hw/riscv/virt.c|  2 +-
 hw/s390x/ipl.c |  9 ++---
 hw/sparc/leon3.c   |  3 ++-
 hw/sparc/sun4m.c   |  6 --
 hw/sparc64/sun4u.c |  4 ++--
 hw/tricore/tricore_testboard.c |  2 +-
 hw/xtensa/sim.c| 12 
 hw/xtensa/xtfpga.c |  2 +-
 include/hw/elf_ops.h   |  2 ++
 include/hw/loader.h|  9 -
 41 files changed, 113 insertions(+), 70 deletions(-)

diff --git a/hw/alpha/dp264.c b/hw/alpha/dp264.c
index dd62f2a4050c..0347eb897c8a 100644
--- a/hw/alpha/dp264.c
+++ b/hw/alpha/dp264.c
@@ -114,7 +114,7 @@ static void clipper_init(MachineState *machine)
 error_report("no palcode provided");
 exit(1);
 }
-size = load_elf(palcode_filename, cpu_alpha_superpage_to_phys,
+size = load_elf(palcode_filename, NULL, cpu_alpha_superpage_to_phys,
 NULL, _entry, _low, _high,
 0, EM_ALPHA, 0, 0);
 if (size < 0) {
@@ -133,7 +133,7 @@ static void clipper_init(MachineState *machine)
 if (kernel_filename) {
 uint64_t param_offset;
 
-size = load_elf(kernel_filename, cpu_alpha_superpage_to_phys,
+size = load_elf(kernel_filename, NULL, cpu_alpha_superpage_to_phys,
 NULL, _entry, _low, _high,
 0, EM_ALPHA, 0, 0);
 if (size < 0) {
diff --git a/hw/arm/armv7m.c b/hw/arm/armv7m.c
index f4446528307f..ae68aadef965 100644
--- a/hw/arm/armv7m.c
+++ b/hw/arm/armv7m.c
@@ -293,7 +293,8 @@ void armv7m_load_kernel(ARMCPU *cpu, const char 
*kernel_filename, int mem_size)
 as = cpu_get_address_space(cs, asidx);
 
 if (kernel_filename) {
-image_size = load_elf_as(kernel_filename, NULL, NULL, , ,
+image_size = load_elf_as(kernel_filename, NULL, NULL, NULL,
+ , ,
  NULL, big_endian, EM_ARM, 1, 0, as);
 if (image_size < 0) {
 image_size = load_image_targphys_as(kernel_filename, 0,
diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index c7a67af7a97c..9d8746f7613f 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -885,7 +885,7 @@ static int64_t arm_load_elf(struct arm_boot_info *info, 
uint64_t *pentry,
 }
 }
 
-ret = load_elf_as(info->kernel_filename, NULL, NULL,
+ret = load_elf_as(info->kernel_filename, NULL, NULL, NULL,
   pentry, lowaddr, highaddr, big_endian, elf_machine,
   1, data_swab, as);
 if (ret <= 0) {
diff --git a/hw/core/generic-loader.c b/hw/core/generic-loader.c
index fbae05fb3b64..3695dd439cd0 100644
--- a/hw/core/generic-loader.c
+++ b/hw/core/generic-loader.c
@@ -136,7 +136,7 @@ static void generic_loader_realize(DeviceState *dev, Error 
**errp)
 AddressSpace *as = s->cpu ? s->cpu->as :  NULL;
 
 if (!s->force_raw) {
-size = load_

[Qemu-devel] [PATCH v3 4/5] pvh: Boot uncompressed kernel using direct boot ABI

2019-01-15 Thread Liam Merwick
These changes (along with corresponding Linux kernel and qboot changes)
enable a guest to be booted using the x86/HVM direct boot ABI.

This commit adds a load_elfboot() routine to pass the size and
location of the kernel entry point to qboot (which will fill in
the start_info struct information needed to to boot the guest).
Having loaded the ELF binary, load_linux() will run qboot
which continues the boot.

The address for the kernel entry point is read from an ELF Note
in the uncompressed kernel binary by a helper routine passed
to load_elf().

Co-developed-by: George Kennedy 
Signed-off-by: George Kennedy 
Signed-off-by: Liam Merwick 
---
 hw/i386/pc.c  | 135 ++
 include/elf.h |  10 +
 2 files changed, 145 insertions(+)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 73d688f84239..6d549950a044 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -54,6 +54,7 @@
 #include "sysemu/qtest.h"
 #include "kvm_i386.h"
 #include "hw/xen/xen.h"
+#include "hw/xen/start_info.h"
 #include "ui/qemu-spice.h"
 #include "exec/memory.h"
 #include "exec/address-spaces.h"
@@ -110,6 +111,9 @@ static struct e820_entry *e820_table;
 static unsigned e820_entries;
 struct hpet_fw_config hpet_cfg = {.count = UINT8_MAX};
 
+/* Physical Address of PVH entry point read from kernel ELF NOTE */
+static size_t pvh_start_addr;
+
 GlobalProperty pc_compat_3_1[] = {
 { "intel-iommu", "dma-drain", "off" },
 { "Opteron_G3" "-" TYPE_X86_CPU, "rdtscp", "off" },
@@ -1060,6 +1064,109 @@ struct setup_data {
 uint8_t data[0];
 } __attribute__((packed));
 
+
+/*
+ * The entry point into the kernel for PVH boot is different from
+ * the native entry point.  The PVH entry is defined by the x86/HVM
+ * direct boot ABI and is available in an ELFNOTE in the kernel binary.
+ *
+ * This function is passed to load_elf() when it is called from
+ * load_elfboot() which then additionally checks for an ELF Note of
+ * type XEN_ELFNOTE_PHYS32_ENTRY and passes it to this function to
+ * parse the PVH entry address from the ELF Note.
+ *
+ * Due to trickery in elf_opts.h, load_elf() is actually available as
+ * load_elf32() or load_elf64() and this routine needs to be able
+ * to deal with being called as 32 or 64 bit.
+ *
+ * The address of the PVH entry point is saved to the 'pvh_start_addr'
+ * global variable.  (although the entry point is 32-bit, the kernel
+ * binary can be either 32-bit or 64-bit).
+ */
+static uint64_t read_pvh_start_addr(void *arg1, void *arg2, bool is64)
+{
+size_t *elf_note_data_addr;
+
+/* Check if ELF Note header passed in is valid */
+if (arg1 == NULL) {
+return 0;
+}
+
+if (is64) {
+struct elf64_note *nhdr64 = (struct elf64_note *)arg1;
+uint64_t nhdr_size64 = sizeof(struct elf64_note);
+uint64_t phdr_align = *(uint64_t *)arg2;
+uint64_t nhdr_namesz = nhdr64->n_namesz;
+
+elf_note_data_addr =
+((void *)nhdr64) + nhdr_size64 +
+QEMU_ALIGN_UP(nhdr_namesz, phdr_align);
+} else {
+struct elf32_note *nhdr32 = (struct elf32_note *)arg1;
+uint32_t nhdr_size32 = sizeof(struct elf32_note);
+uint32_t phdr_align = *(uint32_t *)arg2;
+uint32_t nhdr_namesz = nhdr32->n_namesz;
+
+elf_note_data_addr =
+((void *)nhdr32) + nhdr_size32 +
+QEMU_ALIGN_UP(nhdr_namesz, phdr_align);
+}
+
+pvh_start_addr = *elf_note_data_addr;
+
+return pvh_start_addr;
+}
+
+static bool load_elfboot(const char *kernel_filename,
+   int kernel_file_size,
+   uint8_t *header,
+   size_t pvh_xen_start_addr,
+   FWCfgState *fw_cfg)
+{
+uint32_t flags = 0;
+uint32_t mh_load_addr = 0;
+uint32_t elf_kernel_size = 0;
+uint64_t elf_entry;
+uint64_t elf_low, elf_high;
+int kernel_size;
+
+if (ldl_p(header) != 0x464c457f) {
+return false; /* no elfboot */
+}
+
+bool elf_is64 = header[EI_CLASS] == ELFCLASS64;
+flags = elf_is64 ?
+((Elf64_Ehdr *)header)->e_flags : ((Elf32_Ehdr *)header)->e_flags;
+
+if (flags & 0x00010004) { /* LOAD_ELF_HEADER_HAS_ADDR */
+error_report("elfboot unsupported flags = %x", flags);
+exit(1);
+}
+
+uint64_t elf_note_type = XEN_ELFNOTE_PHYS32_ENTRY;
+kernel_size = load_elf(kernel_filename, read_pvh_start_addr,
+   NULL, _note_type, _entry,
+   _low, _high, 0, I386_ELF_MACHINE,
+   0, 0);
+
+if (kernel_size < 0) {
+error_report("Error while loading elf kernel");
+exit(1);
+}
+mh_load_addr = elf_low;
+elf_kernel_size = elf_high - elf_low;
+
+if (pvh_start_addr == 0) {
+error_repo

[Qemu-devel] [PATCH v3 3/5] pvh: Add x86/HVM direct boot ABI header file

2019-01-15 Thread Liam Merwick
From: Liam Merwick 

The x86/HVM direct boot ABI permits Qemu to be able to boot directly
into the uncompressed Linux kernel binary with minimal firmware involvement.

https://xenbits.xen.org/docs/unstable/misc/pvh.html

This commit adds the header file that defines the start_info struct
that needs to be populated in order to use this ABI.

The canonical version of start_info.h is in the Xen codebase.
(like QEMU, the Linux kernel uses a copy as well).

Signed-off-by: Liam Merwick 
Reviewed-by: Konrad Rzeszutek Wilk 
---
 include/hw/xen/start_info.h | 146 
 1 file changed, 146 insertions(+)
 create mode 100644 include/hw/xen/start_info.h

diff --git a/include/hw/xen/start_info.h b/include/hw/xen/start_info.h
new file mode 100644
index ..348779eb10cd
--- /dev/null
+++ b/include/hw/xen/start_info.h
@@ -0,0 +1,146 @@
+/*
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Copyright (c) 2016, Citrix Systems, Inc.
+ */
+
+#ifndef __XEN_PUBLIC_ARCH_X86_HVM_START_INFO_H__
+#define __XEN_PUBLIC_ARCH_X86_HVM_START_INFO_H__
+
+/*
+ * Start of day structure passed to PVH guests and to HVM guests in %ebx.
+ *
+ * NOTE: nothing will be loaded at physical address 0, so a 0 value in any
+ * of the address fields should be treated as not present.
+ *
+ *  0 ++
+ *| magic  | Contains the magic value XEN_HVM_START_MAGIC_VALUE
+ *|| ("xEn3" with the 0x80 bit of the "E" set).
+ *  4 ++
+ *| version| Version of this structure. Current version is 1. New
+ *|| versions are guaranteed to be backwards-compatible.
+ *  8 ++
+ *| flags  | SIF_xxx flags.
+ * 12 ++
+ *| nr_modules | Number of modules passed to the kernel.
+ * 16 ++
+ *| modlist_paddr  | Physical address of an array of modules
+ *|| (layout of the structure below).
+ * 24 ++
+ *| cmdline_paddr  | Physical address of the command line,
+ *|| a zero-terminated ASCII string.
+ * 32 ++
+ *| rsdp_paddr | Physical address of the RSDP ACPI data structure.
+ * 40 ++
+ *| memmap_paddr   | Physical address of the (optional) memory map. Only
+ *|| present in version 1 and newer of the structure.
+ * 48 ++
+ *| memmap_entries | Number of entries in the memory map table. Only
+ *|| present in version 1 and newer of the structure.
+ *|| Zero if there is no memory map being provided.
+ * 52 ++
+ *| reserved   | Version 1 and newer only.
+ * 56 ++
+ *
+ * The layout of each entry in the module structure is the following:
+ *
+ *  0 ++
+ *| paddr  | Physical address of the module.
+ *  8 ++
+ *| size   | Size of the module in bytes.
+ * 16 ++
+ *| cmdline_paddr  | Physical address of the command line,
+ *|| a zero-terminated ASCII string.
+ * 24 ++
+ *| reserved   |
+ * 32 ++
+ *
+ * The layout of each entry in the memory map table is as follows:
+ *
+ *  0 ++
+ *| addr   | Base address
+ *  8 ++
+ *| size   | Size of mapping in bytes
+ * 16 ++
+ *| type   | Type of mapping as defined between the hypervisor
+ *|| and guest it's starting. E820_TYPE_xxx, for example.
+ * 20 +|
+ *| reserved   |
+ * 24 ++
+ *
+ * The address and sizes are always a 64bit little endian unsigned integer.
+ *
+ * NB: Xen on x86 will always try to place all the data below the 4GiB
+ * boundary.
+ *
+ * Version numbers of t

[Qemu-devel] [PATCH v3 0/4] QEMU changes to do PVH boot

2019-01-15 Thread Liam Merwick
For certain applications it is desirable to rapidly boot a KVM virtual
machine. In cases where legacy hardware and software support within the
guest is not needed, QEMU should be able to boot directly into the
uncompressed Linux kernel binary with minimal firmware involvement.

There already exists an ABI to allow this for Xen PVH guests and the ABI
is supported by Linux and FreeBSD:

   https://xenbits.xen.org/docs/unstable/misc/pvh.html

Details on the Linux changes (in 4.21): https://lkml.org/lkml/2018/12/14/1330
qboot pull request integrated: https://github.com/bonzini/qboot/pull/17 

This patch series provides QEMU support to read the ELF header of an
uncompressed kernel binary and get the 32-bit PVH kernel entry point
from an ELF Note.  In load_linux() a call is made to load_elfboot()
so see if the header matches that of an uncompressed kernel binary (ELF)
and if so, loads the binary and determines the kernel entry address
from an ELF Note in the binary.  Then qboot does futher initialisation
of the guest (e820, etc.) and jumps to the kernel entry address and
boots the guest.

changes v1 -> v2
- Based on feedback from Stefan Hajnoczi
- The reading of the PVH entry point is now done in a single pass during
  elf_load() which results in Patch2 in v1 being split into Patches 1&2 in v2
  and considerably reworked.
- Patch1 adds a new optional function pointer to parse the ELF note type
  (the type is passed in via the existing translate_opaque arg - the
  function already had 11 args so I didn't want to add more than one new arg).
- Patch2 adds a function to elf_ops.h to find an ELF note
  matching a specific type 
- Patch3 just has a line added to the commit message to state that the Xen
  repo is the canonical location
- Patch4 (that does the PVH boot) is mainly equivalent to Patch3 in v1
  just minor load_elfboot() changes and the addition of a
  read_pvh_start_addr() helper function for load_elf()

changes v2 -> v3
- Based on feedback from Stefan Hajnoczi
- Fix formatting issues where a few tabs snuck in v2
- Moved code to use ELF Note in load_elf() from Patch1 to Patch2
- In load_elf() set data to NULL after g_free() [now in Patch2 following move]
- Added Patch5 containing changes by Stefano Garzarella to support -initrd

Usіng the method/scripts documented by the NEMU team at

   https://github.com/intel/nemu/wiki/Measuring-Boot-Latency
   https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg00200.html

below are some timings measured (vmlinux and bzImage from the same build)
Time to get to kernel start is almost halved (95ṁs -> 48ms)

QEMU + qboot + vmlinux (PVH + 4.20-rc4)
 qemu_init_end: 41.550521
 fw_start: 41.667139 (+0.116618)
 fw_do_boot: 47.448495 (+5.781356)
 linux_startup_64: 47.720785 (+0.27229)
 linux_start_kernel: 48.399541 (+0.678756)
 linux_start_user: 296.952056 (+248.552515)

QEMU + qboot + bzImage:
 qemu_init_end: 29.209276
 fw_start: 29.317342 (+0.108066)
 linux_start_boot: 36.679362 (+7.36202)
 linux_startup_64: 94.531349 (+57.851987)
 linux_start_kernel: 94.900913 (+0.369564)
 linux_start_user: 401.060971 (+306.160058)

QEMU + bzImage:
 qemu_init_end: 30.424430
 linux_startup_64: 893.770334 (+863.345904)
 linux_start_kernel: 894.17049 (+0.400156)
 linux_start_user: 1208.679768 (+314.509278)


Liam Merwick (4):
  elf: Add optional function ptr to load_elf() to parse ELF notes
  elf-ops.h: Add get_elf_note_type()
  pvh: Add x86/HVM direct boot ABI header file
  pvh: Boot uncompressed kernel using direct boot ABI

 hw/alpha/dp264.c   |   4 +-
 hw/arm/armv7m.c|   3 +-
 hw/arm/boot.c  |   2 +-
 hw/core/generic-loader.c   |   2 +-
 hw/core/loader.c   |  24 ---
 hw/cris/boot.c |   3 +-
 hw/hppa/machine.c  |   6 +-
 hw/i386/multiboot.c|   2 +-
 hw/i386/pc.c   | 135 +
 hw/lm32/lm32_boards.c  |   6 +-
 hw/lm32/milkymist.c|   3 +-
 hw/m68k/an5206.c   |   2 +-
 hw/m68k/mcf5208.c  |   2 +-
 hw/microblaze/boot.c   |   7 +-
 hw/mips/mips_fulong2e.c|   5 +-
 hw/mips/mips_malta.c   |   5 +-
 hw/mips/mips_mipssim.c |   5 +-
 hw/mips/mips_r4k.c |   5 +-
 hw/moxie/moxiesim.c|   2 +-
 hw/nios2/boot.c|   7 +-
 hw/openrisc/openrisc_sim.c |   2 +-
 hw/pci-host/prep.c |   2 +-
 hw/ppc/e500.c  |   3 +-
 hw/ppc/mac_newworld.c  |   5 +-
 hw/ppc/mac_oldworld.c  |   5 +-
 hw/ppc/ppc440_bamboo.c |   2 +-
 hw/ppc/sam460ex.c  |   3 +-
 hw/ppc/spapr.c |   7 +-
 hw/ppc/virtex_ml507.c  |   2 +-
 hw/riscv/sifive_e.c|   2 +-
 hw/riscv/sifive_u.c|   2 +-
 hw/riscv/spike.c   |   2 +-
 hw/riscv/virt.c|   2 +-
 hw/s390x/ipl.c |   9 ++-
 hw/sparc/leon3.c   |   3

[Qemu-devel] [PATCH v3 2/5] elf-ops.h: Add get_elf_note_type()

2019-01-15 Thread Liam Merwick
Introduce a routine which, given a pointer to a range of ELF Notes,
searches through them looking for a note matching the type specified
and returns a pointer to the matching ELF note.

get_elf_note_type() is used by elf_load[32|64]() to find the
specified note type required by the 'elf_note_fn' parameter
added in the previous commit.

Signed-off-by: Liam Merwick 
---
 include/hw/elf_ops.h | 75 
 1 file changed, 75 insertions(+)

diff --git a/include/hw/elf_ops.h b/include/hw/elf_ops.h
index 3438d6f69e8d..690f9238c8cc 100644
--- a/include/hw/elf_ops.h
+++ b/include/hw/elf_ops.h
@@ -265,6 +265,51 @@ fail:
 return ret;
 }
 
+/*
+ * Given 'nhdr', a pointer to a range of ELF Notes, search through them
+ * for a note matching type 'elf_note_type' and return a pointer to
+ * the matching ELF note.
+ */
+static struct elf_note *glue(get_elf_note_type, SZ)(struct elf_note *nhdr,
+elf_word note_size,
+elf_word phdr_align,
+elf_word elf_note_type)
+{
+elf_word nhdr_size = sizeof(struct elf_note);
+elf_word elf_note_entry_offset = 0;
+elf_word note_type;
+elf_word nhdr_namesz;
+elf_word nhdr_descsz;
+
+if (nhdr == NULL) {
+return NULL;
+}
+
+note_type = nhdr->n_type;
+while (note_type != elf_note_type) {
+nhdr_namesz = nhdr->n_namesz;
+nhdr_descsz = nhdr->n_descsz;
+
+elf_note_entry_offset = nhdr_size +
+QEMU_ALIGN_UP(nhdr_namesz, phdr_align) +
+QEMU_ALIGN_UP(nhdr_descsz, phdr_align);
+
+/*
+ * If the offset calculated in this iteration exceeds the
+ * supplied size, we are done and no matching note was found.
+ */
+if (elf_note_entry_offset > note_size) {
+return NULL;
+}
+
+/* skip to the next ELF Note entry */
+nhdr = (void *)nhdr + elf_note_entry_offset;
+note_type = nhdr->n_type;
+}
+
+return nhdr;
+}
+
 static int glue(load_elf, SZ)(const char *name, int fd,
   uint64_t (*elf_note_fn)(void *, void *, bool),
   uint64_t (*translate_fn)(void *, uint64_t),
@@ -497,6 +542,36 @@ static int glue(load_elf, SZ)(const char *name, int fd,
 high = addr + mem_size;
 
 data = NULL;
+
+} else if (ph->p_type == PT_NOTE && elf_note_fn) {
+struct elf_note *nhdr = NULL;
+
+file_size = ph->p_filesz; /* Size of the range of ELF notes */
+data = g_malloc0(file_size);
+if (ph->p_filesz > 0) {
+if (lseek(fd, ph->p_offset, SEEK_SET) < 0) {
+goto fail;
+}
+if (read(fd, data, file_size) != file_size) {
+goto fail;
+}
+}
+
+/*
+ * Search the ELF notes to find one with a type matching the
+ * value passed in via 'translate_opaque'
+ */
+nhdr = (struct elf_note *)data;
+assert(translate_opaque != NULL);
+nhdr = glue(get_elf_note_type, SZ)(nhdr, file_size, ph->p_align,
+   *(uint64_t *)translate_opaque);
+if (nhdr != NULL) {
+bool is64 =
+sizeof(struct elf_note) == sizeof(struct elf64_note);
+elf_note_fn((void *)nhdr, (void *)>p_align, is64);
+}
+g_free(data);
+data = NULL;
 }
 }
 
-- 
1.8.3.1




Re: [Qemu-devel] [RFC v2 0/4] QEMU changes to do PVH boot

2019-01-15 Thread Liam Merwick

Hi Stefano,

On 10/01/2019 15:12, Stefano Garzarella wrote:

On Wed, Jan 09, 2019 at 01:18:12PM -0800, Maran Wilson wrote:

On 1/9/2019 11:53 AM, Boris Ostrovsky wrote:

On 1/9/19 6:53 AM, Stefano Garzarella wrote:

Hi Liam,

On Tue, Jan 8, 2019 at 3:47 PM Liam Merwick  wrote:

QEMU sets the hvm_modlist_entry in load_linux() after the call to
load_elfboot() and then qboot loads it in boot_pvh_from_fw_cfg()

But the current PVH patches don't handle initrd (they have
start_info.nr_modules == 1).

Looking in the linux kernel (arch/x86/platform/pvh/enlighten.c) I saw:
  /* The first module is always ramdisk. */
  if (pvh_start_info.nr_modules) {
  struct hvm_modlist_entry *modaddr =
  __va(pvh_start_info.modlist_paddr);
  pvh_bootparams.hdr.ramdisk_image = modaddr->paddr;
  pvh_bootparams.hdr.ramdisk_size = modaddr->size;
  }

So, putting start_info.nr_modules = 1, means that the first
hvm_modlist_entry should have the ramdisk paddr and size. Is it
correct?


That's my understanding.

I think what's missing, is that we just need Qemu or qboot/seabios to
properly populate the pvh_start_info.modlist_paddr with the address (as
usable by the guest) of the hvm_modlist_entry which correctly defines the
details of the initrd that has already been loaded into memory that is
accessible by the guest.

-Maran



I tried and it works, I modified QEMU to load the initrd and to expose it
through fw_cfg, then qboot loads it and set correctly the hvm_modlist_entry.

You can find the patch of QEMU at the end of this email and the qboot
patch here: 
https://github.com/stefano-garzarella/qboot/commit/41e1fd765c8419e270fd79d9b3af5d53576e88a8

Do you think can be a good approach? If you want, you can add this patch
to your series.


Code looks good to me. I'll include it with v3 of my QEMU patches.

Regards,
Liam




Thanks,
Stefano


 From d5c0d51768f5a8fb214be6c2bb0cb7e86e9917b7 Mon Sep 17 00:00:00 2001
From: Stefano Garzarella 
Date: Thu, 10 Jan 2019 15:16:44 +0100
Subject: [PATCH] pvh: load initrd and expose it through fw_cfg

When initrd is specified, load and expose it to the guest firmware
through fw_cfg. The firmware will fill the hvm_start_info for the
kernel.

Signed-off-by: Stefano Garzarella 
Based-on: <1545422632-2-5-git-send-email-liam.merw...@oracle.com>
---
  hw/i386/pc.c | 38 +-
  1 file changed, 29 insertions(+), 9 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 06bce6a101..f6721f51be 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -986,25 +986,45 @@ static void load_linux(PCMachineState *pcms,
   */
  if (load_elfboot(kernel_filename, kernel_size,
   header, pvh_start_addr, fw_cfg)) {
-struct hvm_modlist_entry ramdisk_mod = { 0 };
-
  fclose(f);
  
  fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE,

  strlen(kernel_cmdline) + 1);
  fw_cfg_add_string(fw_cfg, FW_CFG_CMDLINE_DATA, kernel_cmdline);
  
-assert(machine->device_memory != NULL);

-ramdisk_mod.paddr = machine->device_memory->base;
-ramdisk_mod.size =
-memory_region_size(>device_memory->mr);
-
-fw_cfg_add_bytes(fw_cfg, FW_CFG_KERNEL_DATA, _mod,
- sizeof(ramdisk_mod));
  fw_cfg_add_i32(fw_cfg, FW_CFG_SETUP_SIZE, sizeof(header));
  fw_cfg_add_bytes(fw_cfg, FW_CFG_SETUP_DATA,
   header, sizeof(header));
  
+/* load initrd */

+if (initrd_filename) {
+gsize initrd_size;
+gchar *initrd_data;
+GError *gerr = NULL;
+
+if (!g_file_get_contents(initrd_filename, _data,
+_size, )) {
+fprintf(stderr, "qemu: error reading initrd %s: %s\n",
+initrd_filename, gerr->message);
+exit(1);
+}
+
+initrd_max = pcms->below_4g_mem_size - pcmc->acpi_data_size - 
1;
+if (initrd_size >= initrd_max) {
+fprintf(stderr, "qemu: initrd is too large, cannot 
support."
+"(max: %"PRIu32", need %"PRId64")\n",
+initrd_max, (uint64_t)initrd_size);
+exit(1);
+}
+
+initrd_addr = (initrd_max - initrd_size) & ~4095;
+
+fw_cfg_add_i32(fw_cfg, FW_CFG_INITRD_ADDR, initrd_addr);
+fw_cfg_add_i32(fw_cfg, FW_CFG_INITRD_SIZE, initrd_size);
+fw_cfg_add_bytes(fw_cfg, FW_CFG_INITRD_DATA, initrd_data,
+ initrd_size);
+}
+
  return;
  }
  /* This looks like a multiboot kernel. If it is, let's stop





Re: [Qemu-devel] [RFC v2 2/4] elf-ops.h: Add get_elf_note_type()

2019-01-08 Thread Liam Merwick




On 02/01/2019 13:12, Stefan Hajnoczi wrote:

On Fri, Dec 21, 2018 at 08:03:50PM +, Liam Merwick wrote:

+while (note_type != elf_note_type) {
+nhdr_namesz = nhdr->n_namesz;
+nhdr_descsz = nhdr->n_descsz;
+
+elf_note_entry_offset = nhdr_size +
+QEMU_ALIGN_UP(nhdr_namesz, phdr_align) +
+QEMU_ALIGN_UP(nhdr_descsz, phdr_align);
+
+/* If the offset calculated in this iteration exceeds the
+* supplied size, we are done and no matching note was found.
+*/


Indentation is off here.  QEMU uses 4-space indentation.


+if (elf_note_entry_offset > note_size) {
+return NULL;
+}
+
+/* skip to the next ELF Note entry */
+nhdr = (void *)nhdr + elf_note_entry_offset;
+note_type = nhdr->n_type;
+}
+
+return nhdr;
+}
+
  static int glue(load_elf, SZ)(const char *name, int fd,
uint64_t (*elf_note_fn)(void *, void *, bool),
uint64_t (*translate_fn)(void *, uint64_t),
@@ -512,6 +555,13 @@ static int glue(load_elf, SZ)(const char *name, int fd,
  }
  }
  
+	/* Search the ELF notes to find one with a type matching the

+* value passed in via 'translate_opaque'
+*/
+nhdr = (struct elf_note *)data;


Ah, I see data gets used here!  It would be clearer to move loading of
data into this patch.



Moved.




+   assert(translate_opaque != NULL);
+nhdr = glue(get_elf_note_type, SZ)(nhdr, file_size, ph->p_align,
+   *(uint64_t *)translate_opaque);


Indentation is off in this hunk.  QEMU uses 4-space indentation.



A few stray tabs had snuck in - I've fixed all those.

Regards,
Liam



Re: [Qemu-devel] [RFC v2 4/4] pvh: Boot uncompressed kernel using direct boot ABI

2019-01-08 Thread Liam Merwick




On 02/01/2019 13:18, Stefan Hajnoczi wrote:

On Fri, Dec 21, 2018 at 08:03:52PM +, Liam Merwick wrote:

@@ -1336,7 +1470,7 @@ void pc_memory_init(PCMachineState *pcms,
  int linux_boot, i;
  MemoryRegion *ram, *option_rom_mr;
  MemoryRegion *ram_below_4g, *ram_above_4g;
-FWCfgState *fw_cfg;
+FWCfgState *fw_cfg = NULL;


What is the purpose of this change?



I've removed this. There is no need for it - it dated from when these 
changes used the Clear Containers -nofw patches.


Regards,
Liam



Re: [Qemu-devel] [RFC v2 1/4] elf: Add optional function ptr to load_elf() to parse ELF notes

2019-01-08 Thread Liam Merwick




On 02/01/2019 13:06, Stefan Hajnoczi wrote:

On Fri, Dec 21, 2018 at 08:03:49PM +, Liam Merwick wrote:

diff --git a/include/hw/elf_ops.h b/include/hw/elf_ops.h
index 74679ff8da3a..37d20a3800c1 100644
--- a/include/hw/elf_ops.h
+++ b/include/hw/elf_ops.h
@@ -266,6 +266,7 @@ fail:
  }
  
  static int glue(load_elf, SZ)(const char *name, int fd,

+  uint64_t (*elf_note_fn)(void *, void *, bool),
uint64_t (*translate_fn)(void *, uint64_t),
void *translate_opaque,
int must_swab, uint64_t *pentry,
@@ -496,8 +497,30 @@ static int glue(load_elf, SZ)(const char *name, int fd,
  high = addr + mem_size;
  
  data = NULL;

+
+} else if (ph->p_type == PT_NOTE && elf_note_fn) {
+struct elf_note *nhdr = NULL;
+
+file_size = ph->p_filesz; /* Size of the range of ELF notes */
+data = g_malloc0(file_size);
+if (ph->p_filesz > 0) {
+if (lseek(fd, ph->p_offset, SEEK_SET) < 0) {
+goto fail;
+}
+if (read(fd, data, file_size) != file_size) {
+goto fail;
+}
+}
+
+if (nhdr != NULL) {
+bool is64 =
+sizeof(struct elf_note) == sizeof(struct elf64_note);
+elf_note_fn((void *)nhdr, (void *)>p_align, is64);


How does data get used?


Moved (as suggested in comments for next patch)




+}
+g_free(data);


Missing data = NULL to prevent double free later?



Added explicit assignment.

Regards,
Liam



Re: [Qemu-devel] [RFC v2 0/4] QEMU changes to do PVH boot

2019-01-08 Thread Liam Merwick

Hi Stefano,

[ Catching up on mail after vacation ]

On 03/01/2019 17:22, Stefano Garzarella wrote:

Hi Liam, Hi Maran,
I'm writing the optionrom to do PVH boot also with SeaBIOS.
It is almost complete and I'm testing it, but I have some issue with
QEMU -initrd parameter.
(It works correctly without -initrd and using a kernel with all needed
modules compiled statically)

Linux boots correctly, but it is not able to find the ramdisk. (I have
the same behavior with qboot)
Looking at Linux, QEMU, and qboot patches, I understood that the first
module pointed by 'modlist_paddr' in the 'hvm_start_info' should be
used to pass the ramdisk address and size to the kernel, but I didn't
understand who load it in RAM. (I guess QEMU directly or the firmware
by fw_cfg interface)

Can you give me some suggestions?




QEMU sets the hvm_modlist_entry in load_linux() after the call to 
load_elfboot() and then qboot loads it in boot_pvh_from_fw_cfg()


But the current PVH patches don't handle initrd (they have 
start_info.nr_modules == 1).


During (or after) the call to load_elfboot() it looks like we'd need to 
do something like what load_multiboot() does below (along with the 
associated initialisation)


400 fw_cfg_add_i32(fw_cfg, FW_CFG_INITRD_ADDR, ADDR_MBI);
401 fw_cfg_add_i32(fw_cfg, FW_CFG_INITRD_SIZE, sizeof(bootinfo));
402 fw_cfg_add_bytes(fw_cfg, FW_CFG_INITRD_DATA, mb_bootinfo_data,
403  sizeof(bootinfo));

I'm checking to see if that has any implications for the kernel side.

Regards,
Liam



On Fri, Dec 21, 2018 at 9:07 PM Liam Merwick  wrote:


For certain applications it is desirable to rapidly boot a KVM virtual
machine. In cases where legacy hardware and software support within the
guest is not needed, QEMU should be able to boot directly into the
uncompressed Linux kernel binary with minimal firmware involvement.

There already exists an ABI to allow this for Xen PVH guests and the ABI
is supported by Linux and FreeBSD:

https://xenbits.xen.org/docs/unstable/misc/pvh.html

Details on the Linux changes (v9 staged for 4.21): 
https://lkml.org/lkml/2018/12/14/1330
qboot pull request: https://github.com/bonzini/qboot/pull/17

This patch series provides QEMU support to read the ELF header of an
uncompressed kernel binary and get the 32-bit PVH kernel entry point
from an ELF Note.  In load_linux() a call is made to load_elfboot()
so see if the header matches that of an uncompressed kernel binary (ELF)
and if so, loads the binary and determines the kernel entry address
from an ELF Note in the binary.  Then qboot does futher initialisation
of the guest (e820, etc.) and jumps to the kernel entry address and
boots the guest.

changes v1 -> v2
- Based on feedback from Stefan Hajnoczi
- The reading of the PVH entry point is now done in a single pass during
   elf_load() which results in Patch2 in v1 being split into Patches 1&2 in v2
   and considerably reworked.
- Patch1 adds a new optional function pointer to parse the ELF note type
   (the type is passed in via the existing translate_opaque arg - the
   function already had 11 args so I didn't want to add more than one new arg).
- Patch2 adds a function to elf_ops.h to find an ELF note
   matching a specific type
- Patch3 just has a line added to the commit message to state that the Xen
   repo is the canonical location
- Patch4 (that does the PVH boot) is mainly equivalent to Patch3 in v1
   just minor load_elfboot() changes and the addition of a
   read_pvh_start_addr() helper function for load_elf()


Usіng the method/scripts documented by the NEMU team at

https://github.com/intel/nemu/wiki/Measuring-Boot-Latency
https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg00200.html

below are some timings measured (vmlinux and bzImage from the same build)
Time to get to kernel start is almost halved (95ṁs -> 48ms)

QEMU + qboot + vmlinux (PVH + 4.20-rc4)
  qemu_init_end: 41.550521
  fw_start: 41.667139 (+0.116618)
  fw_do_boot: 47.448495 (+5.781356)
  linux_startup_64: 47.720785 (+0.27229)
  linux_start_kernel: 48.399541 (+0.678756)
  linux_start_user: 296.952056 (+248.552515)

QEMU + qboot + bzImage:
  qemu_init_end: 29.209276
  fw_start: 29.317342 (+0.108066)
  linux_start_boot: 36.679362 (+7.36202)
  linux_startup_64: 94.531349 (+57.851987)
  linux_start_kernel: 94.900913 (+0.369564)
  linux_start_user: 401.060971 (+306.160058)

QEMU + bzImage:
  qemu_init_end: 30.424430
  linux_startup_64: 893.770334 (+863.345904)
  linux_start_kernel: 894.17049 (+0.400156)
  linux_start_user: 1208.679768 (+314.509278)


Liam Merwick (4):
   elf: Add optional function ptr to load_elf() to parse ELF notes
   elf-ops.h: Add get_elf_note_type()
   pvh: Add x86/HVM direct boot ABI header file
   pvh: Boot uncompressed kernel using direct boot ABI

  hw/alpha/dp264.c   |   4 +-
  hw/arm/armv7m.c|   3 +-
  hw/arm/boot.c  |   2 +-
  hw/core/generic-loader.c   |   2 +-
  hw/cor

[Qemu-devel] [RFC v2 2/4] elf-ops.h: Add get_elf_note_type()

2018-12-21 Thread Liam Merwick
Introduce a routine which, given a pointer to a range of ELF Notes,
searches through them looking for a note matching the type specified
and returns a pointer to the matching ELF note.

Signed-off-by: Liam Merwick 
---
 include/hw/elf_ops.h | 50 ++
 1 file changed, 50 insertions(+)

diff --git a/include/hw/elf_ops.h b/include/hw/elf_ops.h
index 37d20a3800c1..ffbdfbe9c2d8 100644
--- a/include/hw/elf_ops.h
+++ b/include/hw/elf_ops.h
@@ -265,6 +265,49 @@ fail:
 return ret;
 }
 
+/* Given 'nhdr', a pointer to a range of ELF Notes, search through them
+ * for a note matching type 'elf_note_type' and return a pointer to
+ * the matching ELF note.
+ */
+static struct elf_note *glue(get_elf_note_type, SZ)(struct elf_note *nhdr,
+elf_word note_size,
+elf_word phdr_align,
+elf_word elf_note_type)
+{
+elf_word nhdr_size = sizeof(struct elf_note);
+elf_word elf_note_entry_offset = 0;
+elf_word note_type;
+elf_word nhdr_namesz;
+elf_word nhdr_descsz;
+
+if (nhdr == NULL) {
+return NULL;
+}
+
+note_type = nhdr->n_type;
+while (note_type != elf_note_type) {
+nhdr_namesz = nhdr->n_namesz;
+nhdr_descsz = nhdr->n_descsz;
+
+elf_note_entry_offset = nhdr_size +
+QEMU_ALIGN_UP(nhdr_namesz, phdr_align) +
+QEMU_ALIGN_UP(nhdr_descsz, phdr_align);
+
+/* If the offset calculated in this iteration exceeds the
+* supplied size, we are done and no matching note was found.
+*/
+if (elf_note_entry_offset > note_size) {
+return NULL;
+}
+
+/* skip to the next ELF Note entry */
+nhdr = (void *)nhdr + elf_note_entry_offset;
+note_type = nhdr->n_type;
+}
+
+return nhdr;
+}
+
 static int glue(load_elf, SZ)(const char *name, int fd,
   uint64_t (*elf_note_fn)(void *, void *, bool),
   uint64_t (*translate_fn)(void *, uint64_t),
@@ -512,6 +555,13 @@ static int glue(load_elf, SZ)(const char *name, int fd,
 }
 }
 
+   /* Search the ELF notes to find one with a type matching the
+* value passed in via 'translate_opaque'
+*/
+nhdr = (struct elf_note *)data;
+   assert(translate_opaque != NULL);
+nhdr = glue(get_elf_note_type, SZ)(nhdr, file_size, ph->p_align,
+   *(uint64_t *)translate_opaque);
 if (nhdr != NULL) {
 bool is64 =
 sizeof(struct elf_note) == sizeof(struct elf64_note);
-- 
1.8.3.1




Re: [Qemu-devel] [RFC 2/3] pc: Read PVH entry point from ELF note in kernel binary

2018-12-21 Thread Liam Merwick

Thanks Stefan for the review - comments inline.

On 11/12/2018 14:17, Stefan Hajnoczi wrote:

On Wed, Dec 05, 2018 at 10:37:25PM +, Liam Merwick wrote:

From: Liam Merwick 

Add support to read the PVH Entry address from an ELF note in the
uncompressed kernel binary (as defined by the x86/HVM direct boot ABI).
This 32-bit entry point will be used by QEMU to load the kernel in the
guest and jump into the kernel entry point.

For now, a call to this function is added in pc_memory_init() to read the
address - a future patch will use the entry point.

Signed-off-by: Liam Merwick 
---
  hw/i386/pc.c  | 272 +-
  include/elf.h |  10 +++
  2 files changed, 281 insertions(+), 1 deletion(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index f095725dbab2..056aa46d99b9 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -109,6 +109,9 @@ static struct e820_entry *e820_table;
  static unsigned e820_entries;
  struct hpet_fw_config hpet_cfg = {.count = UINT8_MAX};
  
+/* Physical Address of PVH entry point read from kernel ELF NOTE */

+static size_t pvh_start_addr;
+
  void gsi_handler(void *opaque, int n, int level)
  {
  GSIState *s = opaque;
@@ -834,6 +837,267 @@ struct setup_data {
  uint8_t data[0];
  } __attribute__((packed));
  
+/*

+ * Search through the ELF Notes for an entry with the given
+ * ELF Note type
+ */
+static void *get_elf_note_type(void *ehdr, void *phdr, bool elf_is64,
+size_t elf_note_type)


Generic ELF code.  Can you put it in hw/core/loader.c?



I've added a modified/slimmed down version to include/hw/elf_ops.h 
(which now handles 32 and 64 bit as you mention below).  I've put this 
in a separate commit.






+{
+void *nhdr = NULL;
+size_t nhdr_size = elf_is64 ? sizeof(Elf64_Nhdr) : sizeof(Elf32_Nhdr);
+size_t elf_note_entry_sz = 0;
+size_t phdr_off;
+size_t phdr_align;
+size_t phdr_memsz;
+size_t nhdr_namesz;
+size_t nhdr_descsz;
+size_t note_type;


The macro tricks used by hw/core/loader.c are nasty, but I think they
get the types right.  Here the Elf64 on 32-bit host case is definitely
broken due to using size_t.  Perhaps 64-on-32 isn't supported, but
getting the types right is worth discussing.


+
+phdr_off = elf_is64 ?
+((Elf64_Phdr *)phdr)->p_offset : ((Elf32_Phdr *)phdr)->p_offset;
+phdr_align = elf_is64 ?
+((Elf64_Phdr *)phdr)->p_align : ((Elf32_Phdr *)phdr)->p_align;
+phdr_memsz = elf_is64 ?
+((Elf64_Phdr *)phdr)->p_memsz : ((Elf32_Phdr *)phdr)->p_memsz;
+
+nhdr = ehdr + phdr_off;


The ELF file is untrusted.  All inputs must be validated.  phdr_off
could be an bogus/malicious value.



Most of the parsing of the ELF binary goes away due to moving to parse 
during elf_load() - more info below.






+note_type = elf_is64 ?
+((Elf64_Nhdr *)nhdr)->n_type : ((Elf32_Nhdr *)nhdr)->n_type;
+nhdr_namesz = elf_is64 ?
+((Elf64_Nhdr *)nhdr)->n_namesz : ((Elf32_Nhdr *)nhdr)->n_namesz;
+nhdr_descsz = elf_is64 ?
+((Elf64_Nhdr *)nhdr)->n_descsz : ((Elf32_Nhdr *)nhdr)->n_descsz;
+
+while (note_type != elf_note_type) {
+elf_note_entry_sz = nhdr_size +
+QEMU_ALIGN_UP(nhdr_namesz, phdr_align) +
+QEMU_ALIGN_UP(nhdr_descsz, phdr_align);
+
+/*
+ * Verify that we haven't exceeded the end of the ELF Note section.
+ * If we have, then there is no note of the given type present
+ * in the ELF Notes.
+ */
+if (phdr_off + phdr_memsz < ((nhdr - ehdr) + elf_note_entry_sz)) {
+error_report("Note type (0x%lx) not found in ELF Note section",
+elf_note_type);
+return NULL;
+}
+
+/* skip to the next ELF Note entry */
+nhdr += elf_note_entry_sz;
+note_type = elf_is64 ?
+((Elf64_Nhdr *)nhdr)->n_type : ((Elf32_Nhdr *)nhdr)->n_type;
+nhdr_namesz = elf_is64 ?
+((Elf64_Nhdr *)nhdr)->n_namesz : ((Elf32_Nhdr *)nhdr)->n_namesz;
+nhdr_descsz = elf_is64 ?
+((Elf64_Nhdr *)nhdr)->n_descsz : ((Elf32_Nhdr *)nhdr)->n_descsz;
+}
+
+return nhdr;
+}
+
+/*
+ * The entry point into the kernel for PVH boot is different from
+ * the native entry point.  The PVH entry is defined by the x86/HVM
+ * direct boot ABI and is available in an ELFNOTE in the kernel binary.
+ * This function reads the ELF headers of the binary specified on the
+ * command line by -kernel (path contained in 'filename') and discovers
+ * the PVH entry address from the appropriate ELF Note.
+ *
+ * The address of the PVH entry point is saved to the 'pvh_start_addr'
+ * global variable. The ELF class of the binary is returned via 'elfclass'
+ * (although the entry point is 32-bit, the kernel binary can be either
+ * 32-bit or 64-bit).
+ */
+static bool read_pvh_start_addr_elf_note(const char *filename,
+un

[Qemu-devel] [RFC v2 4/4] pvh: Boot uncompressed kernel using direct boot ABI

2018-12-21 Thread Liam Merwick
These changes (along with corresponding Linux kernel and qboot changes)
enable a guest to be booted using the x86/HVM direct boot ABI.

This commit adds a load_elfboot() routine to pass the size and
location of the kernel entry point to qboot (which will fill in
the start_info struct information needed to to boot the guest).
Having loaded the ELF binary, load_linux() will run qboot
which continues the boot.

The address for the kernel entry point is read from an ELF Note
in the uncompressed kernel binary by a helper routine passed
to load_elf().

Co-developed-by: George Kennedy 
Signed-off-by: George Kennedy 
Signed-off-by: Liam Merwick 
---
 hw/i386/pc.c  | 136 +-
 include/elf.h |  10 +
 2 files changed, 145 insertions(+), 1 deletion(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 115bc2825ce4..6d44a14da44d 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -54,6 +54,7 @@
 #include "sysemu/qtest.h"
 #include "kvm_i386.h"
 #include "hw/xen/xen.h"
+#include "hw/xen/start_info.h"
 #include "ui/qemu-spice.h"
 #include "exec/memory.h"
 #include "exec/address-spaces.h"
@@ -109,6 +110,9 @@ static struct e820_entry *e820_table;
 static unsigned e820_entries;
 struct hpet_fw_config hpet_cfg = {.count = UINT8_MAX};
 
+/* Physical Address of PVH entry point read from kernel ELF NOTE */
+static size_t pvh_start_addr;
+
 void gsi_handler(void *opaque, int n, int level)
 {
 GSIState *s = opaque;
@@ -834,6 +838,109 @@ struct setup_data {
 uint8_t data[0];
 } __attribute__((packed));
 
+
+/*
+ * The entry point into the kernel for PVH boot is different from
+ * the native entry point.  The PVH entry is defined by the x86/HVM
+ * direct boot ABI and is available in an ELFNOTE in the kernel binary.
+ *
+ * This function is passed to load_elf() when it is called from
+ * load_elfboot() which then additionally checks for an ELF Note of
+ * type XEN_ELFNOTE_PHYS32_ENTRY and passes it to this function to
+ * parse the PVH entry address from the ELF Note.
+ *
+ * Due to trickery in elf_opts.h, load_elf() is actually available as
+ * load_elf32() or load_elf64() and this routine needs to be able
+ * to deal with being called as 32 or 64 bit.
+ *
+ * The address of the PVH entry point is saved to the 'pvh_start_addr'
+ * global variable.  (although the entry point is 32-bit, the kernel
+ * binary can be either 32-bit or 64-bit).
+ */
+static uint64_t read_pvh_start_addr(void *arg1, void *arg2, bool is64)
+{
+size_t *elf_note_data_addr;
+
+/* Check if ELF Note header passed in is valid */
+if (arg1 == NULL) {
+return 0;
+}
+
+if (is64) {
+struct elf64_note *nhdr64 = (struct elf64_note *)arg1;
+uint64_t nhdr_size64 = sizeof(struct elf64_note);
+uint64_t phdr_align = *(uint64_t *)arg2;
+uint64_t nhdr_namesz = nhdr64->n_namesz;
+
+elf_note_data_addr =
+((void *)nhdr64) + nhdr_size64 +
+QEMU_ALIGN_UP(nhdr_namesz, phdr_align);
+} else {
+struct elf32_note *nhdr32 = (struct elf32_note *)arg1;
+uint32_t nhdr_size32 = sizeof(struct elf32_note);
+uint32_t phdr_align = *(uint32_t *)arg2;
+uint32_t nhdr_namesz = nhdr32->n_namesz;
+
+elf_note_data_addr =
+((void *)nhdr32) + nhdr_size32 +
+QEMU_ALIGN_UP(nhdr_namesz, phdr_align);
+}
+
+pvh_start_addr = *elf_note_data_addr;
+
+return pvh_start_addr;
+}
+
+static bool load_elfboot(const char *kernel_filename,
+   int kernel_file_size,
+   uint8_t *header,
+   size_t pvh_xen_start_addr,
+   FWCfgState *fw_cfg)
+{
+uint32_t flags = 0;
+uint32_t mh_load_addr = 0;
+uint32_t elf_kernel_size = 0;
+uint64_t elf_entry;
+uint64_t elf_low, elf_high;
+int kernel_size;
+
+if (ldl_p(header) != 0x464c457f) {
+return false; /* no elfboot */
+}
+
+bool elf_is64 = header[EI_CLASS] == ELFCLASS64;
+flags = elf_is64 ?
+((Elf64_Ehdr *)header)->e_flags : ((Elf32_Ehdr *)header)->e_flags;
+
+if (flags & 0x00010004) { /* LOAD_ELF_HEADER_HAS_ADDR */
+error_report("elfboot unsupported flags = %x", flags);
+exit(1);
+}
+
+uint64_t elf_note_type = XEN_ELFNOTE_PHYS32_ENTRY;
+kernel_size = load_elf(kernel_filename, read_pvh_start_addr,
+   NULL, _note_type, _entry,
+   _low, _high, 0, I386_ELF_MACHINE,
+   0, 0);
+
+if (kernel_size < 0) {
+error_report("Error while loading elf kernel");
+exit(1);
+}
+mh_load_addr = elf_low;
+elf_kernel_size = elf_high - elf_low;
+
+if (pvh_start_addr == 0) {
+error_report("Error loading uncompressed kernel without PVH ELF Note");
+exit(1);
+}
+fw

[Qemu-devel] [RFC v2 0/4] QEMU changes to do PVH boot

2018-12-21 Thread Liam Merwick
For certain applications it is desirable to rapidly boot a KVM virtual
machine. In cases where legacy hardware and software support within the
guest is not needed, QEMU should be able to boot directly into the
uncompressed Linux kernel binary with minimal firmware involvement.

There already exists an ABI to allow this for Xen PVH guests and the ABI
is supported by Linux and FreeBSD:

   https://xenbits.xen.org/docs/unstable/misc/pvh.html

Details on the Linux changes (v9 staged for 4.21): 
https://lkml.org/lkml/2018/12/14/1330
qboot pull request: https://github.com/bonzini/qboot/pull/17 

This patch series provides QEMU support to read the ELF header of an
uncompressed kernel binary and get the 32-bit PVH kernel entry point
from an ELF Note.  In load_linux() a call is made to load_elfboot()
so see if the header matches that of an uncompressed kernel binary (ELF)
and if so, loads the binary and determines the kernel entry address
from an ELF Note in the binary.  Then qboot does futher initialisation
of the guest (e820, etc.) and jumps to the kernel entry address and
boots the guest.

changes v1 -> v2
- Based on feedback from Stefan Hajnoczi
- The reading of the PVH entry point is now done in a single pass during
  elf_load() which results in Patch2 in v1 being split into Patches 1&2 in v2
  and considerably reworked.
- Patch1 adds a new optional function pointer to parse the ELF note type
  (the type is passed in via the existing translate_opaque arg - the
  function already had 11 args so I didn't want to add more than one new arg).
- Patch2 adds a function to elf_ops.h to find an ELF note
  matching a specific type 
- Patch3 just has a line added to the commit message to state that the Xen
  repo is the canonical location
- Patch4 (that does the PVH boot) is mainly equivalent to Patch3 in v1
  just minor load_elfboot() changes and the addition of a
  read_pvh_start_addr() helper function for load_elf()


Usіng the method/scripts documented by the NEMU team at

   https://github.com/intel/nemu/wiki/Measuring-Boot-Latency
   https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg00200.html

below are some timings measured (vmlinux and bzImage from the same build)
Time to get to kernel start is almost halved (95ṁs -> 48ms)

QEMU + qboot + vmlinux (PVH + 4.20-rc4)
 qemu_init_end: 41.550521
 fw_start: 41.667139 (+0.116618)
 fw_do_boot: 47.448495 (+5.781356)
 linux_startup_64: 47.720785 (+0.27229)
 linux_start_kernel: 48.399541 (+0.678756)
 linux_start_user: 296.952056 (+248.552515)

QEMU + qboot + bzImage:
 qemu_init_end: 29.209276
 fw_start: 29.317342 (+0.108066)
 linux_start_boot: 36.679362 (+7.36202)
 linux_startup_64: 94.531349 (+57.851987)
 linux_start_kernel: 94.900913 (+0.369564)
 linux_start_user: 401.060971 (+306.160058)

QEMU + bzImage:
 qemu_init_end: 30.424430
 linux_startup_64: 893.770334 (+863.345904)
 linux_start_kernel: 894.17049 (+0.400156)
 linux_start_user: 1208.679768 (+314.509278)


Liam Merwick (4):
  elf: Add optional function ptr to load_elf() to parse ELF notes
  elf-ops.h: Add get_elf_note_type()
  pvh: Add x86/HVM direct boot ABI header file
  pvh: Boot uncompressed kernel using direct boot ABI

 hw/alpha/dp264.c   |   4 +-
 hw/arm/armv7m.c|   3 +-
 hw/arm/boot.c  |   2 +-
 hw/core/generic-loader.c   |   2 +-
 hw/core/loader.c   |  24 ---
 hw/cris/boot.c |   3 +-
 hw/hppa/machine.c  |   6 +-
 hw/i386/multiboot.c|   2 +-
 hw/i386/pc.c   | 131 +++-
 hw/lm32/lm32_boards.c  |   6 +-
 hw/lm32/milkymist.c|   3 +-
 hw/m68k/an5206.c   |   2 +-
 hw/m68k/mcf5208.c  |   2 +-
 hw/microblaze/boot.c   |   7 +-
 hw/mips/mips_fulong2e.c|   5 +-
 hw/mips/mips_malta.c   |   5 +-
 hw/mips/mips_mipssim.c |   5 +-
 hw/mips/mips_r4k.c |   5 +-
 hw/moxie/moxiesim.c|   2 +-
 hw/nios2/boot.c|   7 +-
 hw/openrisc/openrisc_sim.c |   2 +-
 hw/pci-host/prep.c |   2 +-
 hw/ppc/e500.c  |   3 +-
 hw/ppc/mac_newworld.c  |   5 +-
 hw/ppc/mac_oldworld.c  |   5 +-
 hw/ppc/ppc440_bamboo.c |   2 +-
 hw/ppc/sam460ex.c  |   3 +-
 hw/ppc/spapr.c |   7 +-
 hw/ppc/virtex_ml507.c  |   2 +-
 hw/riscv/sifive_e.c|   2 +-
 hw/riscv/sifive_u.c|   2 +-
 hw/riscv/spike.c   |   2 +-
 hw/riscv/virt.c|   2 +-
 hw/s390x/ipl.c |   9 ++-
 hw/sparc/leon3.c   |   3 +-
 hw/sparc/sun4m.c   |   6 +-
 hw/sparc64/sun4u.c |   4 +-
 hw/tricore/tricore_testboard.c |   2 +-
 hw/xtensa/sim.c|  12 ++--
 hw/xtensa/xtfpga.c |   2 +-
 include/elf.h  |  10 +++
 include/hw/elf_ops.h   |  72 
 include/h

[Qemu-devel] [RFC v2 1/4] elf: Add optional function ptr to load_elf() to parse ELF notes

2018-12-21 Thread Liam Merwick
This patch adds an optional function pointer, 'elf_note_fn', to
load_elf() which causes load_elf() to additionally parse any
ELF program headers of type PT_NOTE and check to see if the ELF
Note is of the type specified by the 'translate_opaque' arg.
If a matching ELF Note is found then the specfied function pointer
is called to process the ELF note.

Passing a NULL function pointer results in ELF Notes being skipped.

The first consumer of this functionality is the PVHboot support
which needs to read the XEN_ELFNOTE_PHYS32_ENTRY ELF Note while
loading the uncompressed kernel binary in order to discover the
boot entry address for the x86/HVM direct boot ABI.

Signed-off-by: Liam Merwick 
---
 hw/alpha/dp264.c   |  4 ++--
 hw/arm/armv7m.c|  3 ++-
 hw/arm/boot.c  |  2 +-
 hw/core/generic-loader.c   |  2 +-
 hw/core/loader.c   | 24 
 hw/cris/boot.c |  3 ++-
 hw/hppa/machine.c  |  6 +++---
 hw/i386/multiboot.c|  2 +-
 hw/lm32/lm32_boards.c  |  6 --
 hw/lm32/milkymist.c|  3 ++-
 hw/m68k/an5206.c   |  2 +-
 hw/m68k/mcf5208.c  |  2 +-
 hw/microblaze/boot.c   |  7 ---
 hw/mips/mips_fulong2e.c|  5 +++--
 hw/mips/mips_malta.c   |  5 +++--
 hw/mips/mips_mipssim.c |  5 +++--
 hw/mips/mips_r4k.c |  5 +++--
 hw/moxie/moxiesim.c|  2 +-
 hw/nios2/boot.c|  7 ---
 hw/openrisc/openrisc_sim.c |  2 +-
 hw/pci-host/prep.c |  2 +-
 hw/ppc/e500.c  |  3 ++-
 hw/ppc/mac_newworld.c  |  5 +++--
 hw/ppc/mac_oldworld.c  |  5 +++--
 hw/ppc/ppc440_bamboo.c |  2 +-
 hw/ppc/sam460ex.c  |  3 ++-
 hw/ppc/spapr.c |  7 ---
 hw/ppc/virtex_ml507.c  |  2 +-
 hw/riscv/sifive_e.c|  2 +-
 hw/riscv/sifive_u.c|  2 +-
 hw/riscv/spike.c   |  2 +-
 hw/riscv/virt.c|  2 +-
 hw/s390x/ipl.c |  9 ++---
 hw/sparc/leon3.c   |  3 ++-
 hw/sparc/sun4m.c   |  6 --
 hw/sparc64/sun4u.c |  4 ++--
 hw/tricore/tricore_testboard.c |  2 +-
 hw/xtensa/sim.c| 12 
 hw/xtensa/xtfpga.c |  2 +-
 include/hw/elf_ops.h   | 23 +++
 include/hw/loader.h|  9 -
 41 files changed, 134 insertions(+), 70 deletions(-)

diff --git a/hw/alpha/dp264.c b/hw/alpha/dp264.c
index dd62f2a4050c..0347eb897c8a 100644
--- a/hw/alpha/dp264.c
+++ b/hw/alpha/dp264.c
@@ -114,7 +114,7 @@ static void clipper_init(MachineState *machine)
 error_report("no palcode provided");
 exit(1);
 }
-size = load_elf(palcode_filename, cpu_alpha_superpage_to_phys,
+size = load_elf(palcode_filename, NULL, cpu_alpha_superpage_to_phys,
 NULL, _entry, _low, _high,
 0, EM_ALPHA, 0, 0);
 if (size < 0) {
@@ -133,7 +133,7 @@ static void clipper_init(MachineState *machine)
 if (kernel_filename) {
 uint64_t param_offset;
 
-size = load_elf(kernel_filename, cpu_alpha_superpage_to_phys,
+size = load_elf(kernel_filename, NULL, cpu_alpha_superpage_to_phys,
 NULL, _entry, _low, _high,
 0, EM_ALPHA, 0, 0);
 if (size < 0) {
diff --git a/hw/arm/armv7m.c b/hw/arm/armv7m.c
index 4bf9131b81e4..a4d528537eb4 100644
--- a/hw/arm/armv7m.c
+++ b/hw/arm/armv7m.c
@@ -298,7 +298,8 @@ void armv7m_load_kernel(ARMCPU *cpu, const char 
*kernel_filename, int mem_size)
 as = cpu_get_address_space(cs, asidx);
 
 if (kernel_filename) {
-image_size = load_elf_as(kernel_filename, NULL, NULL, , ,
+image_size = load_elf_as(kernel_filename, NULL, NULL, NULL,
+ , ,
  NULL, big_endian, EM_ARM, 1, 0, as);
 if (image_size < 0) {
 image_size = load_image_targphys_as(kernel_filename, 0,
diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index 94fce128028c..2b59379be6af 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -884,7 +884,7 @@ static int64_t arm_load_elf(struct arm_boot_info *info, 
uint64_t *pentry,
 }
 }
 
-ret = load_elf_as(info->kernel_filename, NULL, NULL,
+ret = load_elf_as(info->kernel_filename, NULL, NULL, NULL,
   pentry, lowaddr, highaddr, big_endian, elf_machine,
   1, data_swab, as);
 if (ret <= 0) {
diff --git a/hw/core/generic-loader.c b/hw/core/generic-loader.c
index fbae05fb3b64..3695dd439cd0 100644
--- a/hw/core/generic-loader.c
+++ b/hw/core/generic-loader.c
@@ -136,7 +136,7 @@ static void generic_loader_realize(DeviceState *dev, Error 
**errp)
 AddressSpace *as = s->cpu ? s->cpu->as :  NULL;
 
 if (!s->force_raw) {
-size =

Re: [Qemu-devel] [RFC 1/3] pvh: Add x86/HVM direct boot ABI header file

2018-12-21 Thread Liam Merwick

On 11/12/2018 14:57, Liam Merwick wrote:

On 11/12/2018 14:01, Stefan Hajnoczi wrote:

On Wed, Dec 05, 2018 at 10:37:24PM +, Liam Merwick wrote:

From: Liam Merwick 

The x86/HVM direct boot ABI permits Qemu to be able to boot directly
into the uncompressed Linux kernel binary without the need to run 
firmware.


https://xenbits.xen.org/docs/unstable/misc/pvh.html

This commit adds the header file that defines the start_info struct
that needs to be populated in order to use this ABI.

Signed-off-by: Maran Wilson 
Signed-off-by: Liam Merwick 
Reviewed-by: Konrad Rzeszutek Wilk 
---
  include/hw/xen/start_info.h | 146 


  1 file changed, 146 insertions(+)
  create mode 100644 include/hw/xen/start_info.h


Does it make sense to bring in Linux
include/xen/interface/hvm/start_info.h via QEMU's
include/standard-headers/?

QEMU has a script in scripts/update-linux-header.sh for syncing Linux
headers into include/standard-headers/.  This makes it easy to keep
Linux header files up-to-date.  We basically treat files in
include/standard-headers/ as auto-generated.

If you define start_info.h yourself without using
include/standard-headers/, then it won't be synced with Linux.



That does seem better.  I will make that change.


When attempting to implement this, I found the canonical copy of this 
header file is actually in Xen and the Linux copy is kept in sync with 
that.  Also, 'make headers_install' doesn't install those Xen headers.


Instead I updated the commit comment to mention the canonical copy 
location.  This file isn't expected to change much so I think keeping it 
in sync in future shouldn't be onerous.


Regards,
Liam



[Qemu-devel] [RFC v2 3/4] pvh: Add x86/HVM direct boot ABI header file

2018-12-21 Thread Liam Merwick
From: Liam Merwick 

The x86/HVM direct boot ABI permits Qemu to be able to boot directly
into the uncompressed Linux kernel binary with minimal firmware involvement.

https://xenbits.xen.org/docs/unstable/misc/pvh.html

This commit adds the header file that defines the start_info struct
that needs to be populated in order to use this ABI.

The canonical version of start_info.h is in the Xen codebase.
(like QEMU, the Linux kernel uses a copy as well).

Signed-off-by: Liam Merwick 
Reviewed-by: Konrad Rzeszutek Wilk 
---
 include/hw/xen/start_info.h | 146 
 1 file changed, 146 insertions(+)
 create mode 100644 include/hw/xen/start_info.h

diff --git a/include/hw/xen/start_info.h b/include/hw/xen/start_info.h
new file mode 100644
index ..348779eb10cd
--- /dev/null
+++ b/include/hw/xen/start_info.h
@@ -0,0 +1,146 @@
+/*
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Copyright (c) 2016, Citrix Systems, Inc.
+ */
+
+#ifndef __XEN_PUBLIC_ARCH_X86_HVM_START_INFO_H__
+#define __XEN_PUBLIC_ARCH_X86_HVM_START_INFO_H__
+
+/*
+ * Start of day structure passed to PVH guests and to HVM guests in %ebx.
+ *
+ * NOTE: nothing will be loaded at physical address 0, so a 0 value in any
+ * of the address fields should be treated as not present.
+ *
+ *  0 ++
+ *| magic  | Contains the magic value XEN_HVM_START_MAGIC_VALUE
+ *|| ("xEn3" with the 0x80 bit of the "E" set).
+ *  4 ++
+ *| version| Version of this structure. Current version is 1. New
+ *|| versions are guaranteed to be backwards-compatible.
+ *  8 ++
+ *| flags  | SIF_xxx flags.
+ * 12 ++
+ *| nr_modules | Number of modules passed to the kernel.
+ * 16 ++
+ *| modlist_paddr  | Physical address of an array of modules
+ *|| (layout of the structure below).
+ * 24 ++
+ *| cmdline_paddr  | Physical address of the command line,
+ *|| a zero-terminated ASCII string.
+ * 32 ++
+ *| rsdp_paddr | Physical address of the RSDP ACPI data structure.
+ * 40 ++
+ *| memmap_paddr   | Physical address of the (optional) memory map. Only
+ *|| present in version 1 and newer of the structure.
+ * 48 ++
+ *| memmap_entries | Number of entries in the memory map table. Only
+ *|| present in version 1 and newer of the structure.
+ *|| Zero if there is no memory map being provided.
+ * 52 ++
+ *| reserved   | Version 1 and newer only.
+ * 56 ++
+ *
+ * The layout of each entry in the module structure is the following:
+ *
+ *  0 ++
+ *| paddr  | Physical address of the module.
+ *  8 ++
+ *| size   | Size of the module in bytes.
+ * 16 ++
+ *| cmdline_paddr  | Physical address of the command line,
+ *|| a zero-terminated ASCII string.
+ * 24 ++
+ *| reserved   |
+ * 32 ++
+ *
+ * The layout of each entry in the memory map table is as follows:
+ *
+ *  0 ++
+ *| addr   | Base address
+ *  8 ++
+ *| size   | Size of mapping in bytes
+ * 16 ++
+ *| type   | Type of mapping as defined between the hypervisor
+ *|| and guest it's starting. E820_TYPE_xxx, for example.
+ * 20 +|
+ *| reserved   |
+ * 24 ++
+ *
+ * The address and sizes are always a 64bit little endian unsigned integer.
+ *
+ * NB: Xen on x86 will always try to place all the data below the 4GiB
+ * boundary.
+ *
+ * Version numbers of t

Re: [Qemu-devel] [RFC 1/3] pvh: Add x86/HVM direct boot ABI header file

2018-12-11 Thread Liam Merwick




On 11/12/2018 14:01, Stefan Hajnoczi wrote:

On Wed, Dec 05, 2018 at 10:37:24PM +, Liam Merwick wrote:

From: Liam Merwick 

The x86/HVM direct boot ABI permits Qemu to be able to boot directly
into the uncompressed Linux kernel binary without the need to run firmware.

https://xenbits.xen.org/docs/unstable/misc/pvh.html

This commit adds the header file that defines the start_info struct
that needs to be populated in order to use this ABI.

Signed-off-by: Maran Wilson 
Signed-off-by: Liam Merwick 
Reviewed-by: Konrad Rzeszutek Wilk 
---
  include/hw/xen/start_info.h | 146 
  1 file changed, 146 insertions(+)
  create mode 100644 include/hw/xen/start_info.h


Does it make sense to bring in Linux
include/xen/interface/hvm/start_info.h via QEMU's
include/standard-headers/?

QEMU has a script in scripts/update-linux-header.sh for syncing Linux
headers into include/standard-headers/.  This makes it easy to keep
Linux header files up-to-date.  We basically treat files in
include/standard-headers/ as auto-generated.

If you define start_info.h yourself without using
include/standard-headers/, then it won't be synced with Linux.



That does seem better.  I will make that change.

One a related note, I'm trying to fix the mingw compilation errors [1] 
in this series also.  I can fix the format issues with PRIx64, etc but I 
can't seem to find an include file to provide a declaration of mmap() 
et. al. - has this been resolved before? A pointer to something similar 
to investigate would be very welcome.


Regards,
Liam

[1] 
http://patchew.org/logs/1544049446-6359-1-git-send-email-liam.merw...@oracle.com/testing.docker-mingw@fedora/?type=message




Re: [Qemu-devel] [RFC 0/3] qboot changes for PVH boot

2018-12-07 Thread Liam Merwick




On 06/12/2018 20:13, Paolo Bonzini wrote:

On 05/12/18 23:31, Liam Merwick wrote:

For certain applications it is desirable to rapidly boot a KVM virtual
machine. In cases where legacy hardware and software support within the
guest is not needed, QEMU should be able to boot directly into the
uncompressed Linux kernel binary with minimal firmware involvement.

There already exists an ABI to allow this for Xen PVH guests and the ABI
is supported by Linux and FreeBSD:

https://xenbits.xen.org/docs/unstable/misc/pvh.html

Details on the Linux changes: https://lkml.org/lkml/2018/4/16/1002

This patch series provides qboot support to populate the start_info struct
needed by the direct boot ABI and to configure the guest e820 tables to
enable QEMU to use that same entry point for booting KVM guests.

Usіng the methods/scripts documented by the NEMU team at

https://github.com/intel/nemu/wiki/Measuring-Boot-Latency
https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg00200.html

below are some timings measured (vmlinux and bzImage from the same build).
Time to get to kernel start is almost halved (95ṁs -> 48ms)


I had a few comments on patch 2, but overall it looks fine.  Can you
send a pull request on GitHub?



Thanks Paolo.  I have made those changes and submitted a PR:

https://github.com/bonzini/qboot/pull/17

Regards,
Liam





QEMU + qboot + vmlinux (PVH + 4.20-rc4)
  qemu_init_end: 41.550521
  fw_start: 41.667139 (+0.116618)
  fw_do_boot: 47.448495 (+5.781356)
  linux_startup_64: 47.720785 (+0.27229)
  linux_start_kernel: 48.399541 (+0.678756)
  linux_start_user: 296.952056 (+248.552515)

QEMU + qboot + bzImage:
  qemu_init_end: 29.209276
  fw_start: 29.317342 (+0.108066)
  linux_start_boot: 36.679362 (+7.36202)
  linux_startup_64: 94.531349 (+57.851987)
  linux_start_kernel: 94.900913 (+0.369564)
  linux_start_user: 401.060971 (+306.160058)

QEMU + bzImage:
  qemu_init_end: 30.424430
  linux_startup_64: 893.770334 (+863.345904)
  linux_start_kernel: 894.17049 (+0.400156)
  linux_start_user: 1208.679768 (+314.509278)


Liam Merwick (3):
   pvh: Add x86/HVM direct boot ABI header file
   pvh: use x86/HVM direct boot ABI
   pvh: add benchmark exit point

  benchmark.h  |   3 +-
  fw_cfg.c |  79 +++-
  include/start_info.h | 146 +++
  linuxboot.c  |   2 +-
  main.c   |   3 ++
  tables.c |   9 
  6 files changed, 239 insertions(+), 3 deletions(-)
  create mode 100644 include/start_info.h







[Qemu-devel] [RFC 0/3] QEMU changes to do PVH boot

2018-12-05 Thread Liam Merwick
For certain applications it is desirable to rapidly boot a KVM virtual
machine. In cases where legacy hardware and software support within the
guest is not needed, QEMU should be able to boot directly into the
uncompressed Linux kernel binary with minimal firmware involvement.

There already exists an ABI to allow this for Xen PVH guests and the ABI
is supported by Linux and FreeBSD:

   https://xenbits.xen.org/docs/unstable/misc/pvh.html

Details on the Linux changes: https://lkml.org/lkml/2018/4/16/1002
qboot patches: http://patchwork.ozlabs.org/project/qemu-devel/list/?series=80020

This patch series provides QEMU support to read the ELF header of an
uncompressed kernel binary and get the 32-bit PVH kernel entry point
from an ELF Note.  This is called when initialising the machine state
in pc_memory_init().  Later on in load_linux() if the kernel entry
address is present, the uncompressed kernel binary (ELF) is loaded
and qboot does futher initialisation of the guest (e820, etc.) and
jumps to the kernel entry address and boots the guest.


Usіng the method/scripts documented by the NEMU team at

   https://github.com/intel/nemu/wiki/Measuring-Boot-Latency
   https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg00200.html

below are some timings measured (vmlinux and bzImage from the same build)
Time to get to kernel start is almost halved (95ṁs -> 48ms)

QEMU + qboot + vmlinux (PVH + 4.20-rc4)
 qemu_init_end: 41.550521
 fw_start: 41.667139 (+0.116618)
 fw_do_boot: 47.448495 (+5.781356)
 linux_startup_64: 47.720785 (+0.27229)
 linux_start_kernel: 48.399541 (+0.678756)
 linux_start_user: 296.952056 (+248.552515)

QEMU + qboot + bzImage:
 qemu_init_end: 29.209276
 fw_start: 29.317342 (+0.108066)
 linux_start_boot: 36.679362 (+7.36202)
 linux_startup_64: 94.531349 (+57.851987)
 linux_start_kernel: 94.900913 (+0.369564)
 linux_start_user: 401.060971 (+306.160058)

QEMU + bzImage:
 qemu_init_end: 30.424430
 linux_startup_64: 893.770334 (+863.345904)
 linux_start_kernel: 894.17049 (+0.400156)
 linux_start_user: 1208.679768 (+314.509278)


Liam Merwick (3):
  pvh: Add x86/HVM direct boot ABI header file
  pc: Read PVH entry point from ELF note in kernel binary
  pvh: Boot uncompressed kernel using direct boot ABI

 hw/i386/pc.c| 344 +++-
 include/elf.h   |  10 ++
 include/hw/xen/start_info.h | 146 +++
 3 files changed, 499 insertions(+), 1 deletion(-)
 create mode 100644 include/hw/xen/start_info.h

-- 
1.8.3.1




[Qemu-devel] [RFC 3/3] pvh: Boot uncompressed kernel using direct boot ABI

2018-12-05 Thread Liam Merwick
These changes (along with corresponding qboot and Linux kernel changes)
enable a guest to be booted using the x86/HVM direct boot ABI.

This commit adds a load_elfboot() routine to pass the size and
location of the kernel entry point to qboot (which will fill in
the start_info struct information needed to to boot the guest).
Having loaded the ELF binary, load_linux() will run qboot
which continues the boot.

The address for the kernel entry point has already been read
from an ELF Note in the uncompressed kernel binary earlier
in pc_memory_init().

Signed-off-by: George Kennedy 
Signed-off-by: Liam Merwick 
---
 hw/i386/pc.c | 72 
 1 file changed, 72 insertions(+)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 056aa46d99b9..d3012cbd8597 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -54,6 +54,7 @@
 #include "sysemu/qtest.h"
 #include "kvm_i386.h"
 #include "hw/xen/xen.h"
+#include "hw/xen/start_info.h"
 #include "ui/qemu-spice.h"
 #include "exec/memory.h"
 #include "exec/address-spaces.h"
@@ -1098,6 +1099,50 @@ done:
 return pvh_start_addr != 0;
 }
 
+static bool load_elfboot(const char *kernel_filename,
+   int kernel_file_size,
+   uint8_t *header,
+   size_t pvh_xen_start_addr,
+   FWCfgState *fw_cfg)
+{
+uint32_t flags = 0;
+uint32_t mh_load_addr = 0;
+uint32_t elf_kernel_size = 0;
+uint64_t elf_entry;
+uint64_t elf_low, elf_high;
+int kernel_size;
+
+if (ldl_p(header) != 0x464c457f) {
+return false; /* no elfboot */
+}
+
+bool elf_is64 = header[EI_CLASS] == ELFCLASS64;
+flags = elf_is64 ?
+((Elf64_Ehdr *)header)->e_flags : ((Elf32_Ehdr *)header)->e_flags;
+
+if (flags & 0x00010004) { /* LOAD_ELF_HEADER_HAS_ADDR */
+error_report("elfboot unsupported flags = %x", flags);
+exit(1);
+}
+
+kernel_size = load_elf(kernel_filename, NULL, NULL, _entry,
+   _low, _high, 0, I386_ELF_MACHINE,
+   0, 0);
+
+if (kernel_size < 0) {
+error_report("Error while loading elf kernel");
+exit(1);
+}
+mh_load_addr = elf_low;
+elf_kernel_size = elf_high - elf_low;
+
+fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ENTRY, pvh_xen_start_addr);
+fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, mh_load_addr);
+fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_SIZE, elf_kernel_size);
+
+return true;
+}
+
 static void load_linux(PCMachineState *pcms,
FWCfgState *fw_cfg)
 {
@@ -1138,6 +1183,33 @@ static void load_linux(PCMachineState *pcms,
 if (ldl_p(header+0x202) == 0x53726448) {
 protocol = lduw_p(header+0x206);
 } else {
+/* If the kernel address for using the x86/HVM direct boot ABI has
+ * been saved then proceed with booting the uncompressed kernel */
+if (pvh_start_addr) {
+if (load_elfboot(kernel_filename, kernel_size,
+ header, pvh_start_addr, fw_cfg)) {
+struct hvm_modlist_entry ramdisk_mod = { 0 };
+
+fclose(f);
+
+fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE,
+strlen(kernel_cmdline) + 1);
+fw_cfg_add_string(fw_cfg, FW_CFG_CMDLINE_DATA, kernel_cmdline);
+
+assert(machine->device_memory != NULL);
+ramdisk_mod.paddr = machine->device_memory->base;
+ramdisk_mod.size =
+memory_region_size(>device_memory->mr);
+
+fw_cfg_add_bytes(fw_cfg, FW_CFG_KERNEL_DATA, _mod,
+ sizeof(ramdisk_mod));
+fw_cfg_add_i32(fw_cfg, FW_CFG_SETUP_SIZE, sizeof(header));
+fw_cfg_add_bytes(fw_cfg, FW_CFG_SETUP_DATA,
+ header, sizeof(header));
+
+return;
+}
+}
 /* This looks like a multiboot kernel. If it is, let's stop
treating it like a Linux kernel. */
 if (load_multiboot(fw_cfg, f, kernel_filename, initrd_filename,
-- 
1.8.3.1




[Qemu-devel] [RFC 1/3] pvh: Add x86/HVM direct boot ABI header file

2018-12-05 Thread Liam Merwick
From: Liam Merwick 

The x86/HVM direct boot ABI permits Qemu to be able to boot directly
into the uncompressed Linux kernel binary without the need to run firmware.

https://xenbits.xen.org/docs/unstable/misc/pvh.html

This commit adds the header file that defines the start_info struct
that needs to be populated in order to use this ABI.

Signed-off-by: Maran Wilson 
Signed-off-by: Liam Merwick 
Reviewed-by: Konrad Rzeszutek Wilk 
---
 include/hw/xen/start_info.h | 146 
 1 file changed, 146 insertions(+)
 create mode 100644 include/hw/xen/start_info.h

diff --git a/include/hw/xen/start_info.h b/include/hw/xen/start_info.h
new file mode 100644
index ..348779eb10cd
--- /dev/null
+++ b/include/hw/xen/start_info.h
@@ -0,0 +1,146 @@
+/*
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Copyright (c) 2016, Citrix Systems, Inc.
+ */
+
+#ifndef __XEN_PUBLIC_ARCH_X86_HVM_START_INFO_H__
+#define __XEN_PUBLIC_ARCH_X86_HVM_START_INFO_H__
+
+/*
+ * Start of day structure passed to PVH guests and to HVM guests in %ebx.
+ *
+ * NOTE: nothing will be loaded at physical address 0, so a 0 value in any
+ * of the address fields should be treated as not present.
+ *
+ *  0 ++
+ *| magic  | Contains the magic value XEN_HVM_START_MAGIC_VALUE
+ *|| ("xEn3" with the 0x80 bit of the "E" set).
+ *  4 ++
+ *| version| Version of this structure. Current version is 1. New
+ *|| versions are guaranteed to be backwards-compatible.
+ *  8 ++
+ *| flags  | SIF_xxx flags.
+ * 12 ++
+ *| nr_modules | Number of modules passed to the kernel.
+ * 16 ++
+ *| modlist_paddr  | Physical address of an array of modules
+ *|| (layout of the structure below).
+ * 24 ++
+ *| cmdline_paddr  | Physical address of the command line,
+ *|| a zero-terminated ASCII string.
+ * 32 ++
+ *| rsdp_paddr | Physical address of the RSDP ACPI data structure.
+ * 40 ++
+ *| memmap_paddr   | Physical address of the (optional) memory map. Only
+ *|| present in version 1 and newer of the structure.
+ * 48 ++
+ *| memmap_entries | Number of entries in the memory map table. Only
+ *|| present in version 1 and newer of the structure.
+ *|| Zero if there is no memory map being provided.
+ * 52 ++
+ *| reserved   | Version 1 and newer only.
+ * 56 ++
+ *
+ * The layout of each entry in the module structure is the following:
+ *
+ *  0 ++
+ *| paddr  | Physical address of the module.
+ *  8 ++
+ *| size   | Size of the module in bytes.
+ * 16 ++
+ *| cmdline_paddr  | Physical address of the command line,
+ *|| a zero-terminated ASCII string.
+ * 24 ++
+ *| reserved   |
+ * 32 ++
+ *
+ * The layout of each entry in the memory map table is as follows:
+ *
+ *  0 ++
+ *| addr   | Base address
+ *  8 ++
+ *| size   | Size of mapping in bytes
+ * 16 ++
+ *| type   | Type of mapping as defined between the hypervisor
+ *|| and guest it's starting. E820_TYPE_xxx, for example.
+ * 20 +|
+ *| reserved   |
+ * 24 ++
+ *
+ * The address and sizes are always a 64bit little endian unsigned integer.
+ *
+ * NB: Xen on x86 will always try to place all the data below the 4GiB
+ * boundary.
+ *
+ * Version numbers of the hvm_start_info structure have evolved like this:
+ *
+ * Version 0:
+ *
+ * Version 1:

[Qemu-devel] [RFC 2/3] pc: Read PVH entry point from ELF note in kernel binary

2018-12-05 Thread Liam Merwick
From: Liam Merwick 

Add support to read the PVH Entry address from an ELF note in the
uncompressed kernel binary (as defined by the x86/HVM direct boot ABI).
This 32-bit entry point will be used by QEMU to load the kernel in the
guest and jump into the kernel entry point.

For now, a call to this function is added in pc_memory_init() to read the
address - a future patch will use the entry point.

Signed-off-by: Liam Merwick 
---
 hw/i386/pc.c  | 272 +-
 include/elf.h |  10 +++
 2 files changed, 281 insertions(+), 1 deletion(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index f095725dbab2..056aa46d99b9 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -109,6 +109,9 @@ static struct e820_entry *e820_table;
 static unsigned e820_entries;
 struct hpet_fw_config hpet_cfg = {.count = UINT8_MAX};
 
+/* Physical Address of PVH entry point read from kernel ELF NOTE */
+static size_t pvh_start_addr;
+
 void gsi_handler(void *opaque, int n, int level)
 {
 GSIState *s = opaque;
@@ -834,6 +837,267 @@ struct setup_data {
 uint8_t data[0];
 } __attribute__((packed));
 
+/*
+ * Search through the ELF Notes for an entry with the given
+ * ELF Note type
+ */
+static void *get_elf_note_type(void *ehdr, void *phdr, bool elf_is64,
+size_t elf_note_type)
+{
+void *nhdr = NULL;
+size_t nhdr_size = elf_is64 ? sizeof(Elf64_Nhdr) : sizeof(Elf32_Nhdr);
+size_t elf_note_entry_sz = 0;
+size_t phdr_off;
+size_t phdr_align;
+size_t phdr_memsz;
+size_t nhdr_namesz;
+size_t nhdr_descsz;
+size_t note_type;
+
+phdr_off = elf_is64 ?
+((Elf64_Phdr *)phdr)->p_offset : ((Elf32_Phdr *)phdr)->p_offset;
+phdr_align = elf_is64 ?
+((Elf64_Phdr *)phdr)->p_align : ((Elf32_Phdr *)phdr)->p_align;
+phdr_memsz = elf_is64 ?
+((Elf64_Phdr *)phdr)->p_memsz : ((Elf32_Phdr *)phdr)->p_memsz;
+
+nhdr = ehdr + phdr_off;
+note_type = elf_is64 ?
+((Elf64_Nhdr *)nhdr)->n_type : ((Elf32_Nhdr *)nhdr)->n_type;
+nhdr_namesz = elf_is64 ?
+((Elf64_Nhdr *)nhdr)->n_namesz : ((Elf32_Nhdr *)nhdr)->n_namesz;
+nhdr_descsz = elf_is64 ?
+((Elf64_Nhdr *)nhdr)->n_descsz : ((Elf32_Nhdr *)nhdr)->n_descsz;
+
+while (note_type != elf_note_type) {
+elf_note_entry_sz = nhdr_size +
+QEMU_ALIGN_UP(nhdr_namesz, phdr_align) +
+QEMU_ALIGN_UP(nhdr_descsz, phdr_align);
+
+/*
+ * Verify that we haven't exceeded the end of the ELF Note section.
+ * If we have, then there is no note of the given type present
+ * in the ELF Notes.
+ */
+if (phdr_off + phdr_memsz < ((nhdr - ehdr) + elf_note_entry_sz)) {
+error_report("Note type (0x%lx) not found in ELF Note section",
+elf_note_type);
+return NULL;
+}
+
+/* skip to the next ELF Note entry */
+nhdr += elf_note_entry_sz;
+note_type = elf_is64 ?
+((Elf64_Nhdr *)nhdr)->n_type : ((Elf32_Nhdr *)nhdr)->n_type;
+nhdr_namesz = elf_is64 ?
+((Elf64_Nhdr *)nhdr)->n_namesz : ((Elf32_Nhdr *)nhdr)->n_namesz;
+nhdr_descsz = elf_is64 ?
+((Elf64_Nhdr *)nhdr)->n_descsz : ((Elf32_Nhdr *)nhdr)->n_descsz;
+}
+
+return nhdr;
+}
+
+/*
+ * The entry point into the kernel for PVH boot is different from
+ * the native entry point.  The PVH entry is defined by the x86/HVM
+ * direct boot ABI and is available in an ELFNOTE in the kernel binary.
+ * This function reads the ELF headers of the binary specified on the
+ * command line by -kernel (path contained in 'filename') and discovers
+ * the PVH entry address from the appropriate ELF Note.
+ *
+ * The address of the PVH entry point is saved to the 'pvh_start_addr'
+ * global variable. The ELF class of the binary is returned via 'elfclass'
+ * (although the entry point is 32-bit, the kernel binary can be either
+ * 32-bit or 64-bit).
+ */
+static bool read_pvh_start_addr_elf_note(const char *filename,
+unsigned char *elfclass)
+{
+void *ehdr = NULL; /* Cast to Elf64_Ehdr or Elf32_Ehdr */
+void *phdr = NULL; /* Cast to Elf64_Phdr or Elf32_Phdr */
+void *nhdr = NULL; /* Cast to Elf64_Nhdr or Elf32_Nhdr */
+struct stat statbuf;
+size_t ehdr_size;
+size_t phdr_size;
+size_t nhdr_size;
+size_t elf_note_data_addr;
+/* Ehdr fields */
+size_t ehdr_poff;
+/* Phdr fields */
+size_t phdr_off;
+size_t phdr_align;
+size_t phdr_memsz;
+size_t phdr_type;
+/* Nhdr fields */
+size_t nhdr_namesz;
+size_t nhdr_descsz;
+bool elf_is64;
+FILE *file;
+union {
+Elf32_Ehdr h32;
+Elf64_Ehdr h64;
+} elf_header;
+Error *err = NULL;
+
+pvh_start_addr = 0;
+
+if (filename == NULL) {
+return false;
+}
+
+file = fopen(filename, "rb");
+if (file == NULL)

[Qemu-devel] [RFC qboot 2/3] pvh: use x86/HVM direct boot ABI

2018-12-05 Thread Liam Merwick
These changes (along with corresponding QEMU and Linux kernel changes)
enable a guest to be booted using the x86/HVM direct boot ABI.

QEMU parses the uncompressed kernel binary passed to it via -kernel
to read the ELF Note which contains the address to be loaded.  QEMU
then depends on qboot to populate the start_info struct needed by
the direct boot ABI and configure the guest e820 tables before
jumping to the loaded kernel entry.

Signed-off-by: George Kennedy 
Signed-off-by: Liam Merwick 
---
 fw_cfg.c| 72 -
 linuxboot.c |  2 +-
 main.c  |  3 +++
 tables.c|  9 
 4 files changed, 84 insertions(+), 2 deletions(-)

diff --git a/fw_cfg.c b/fw_cfg.c
index f5aac739b921..e13ec20d0e8b 100644
--- a/fw_cfg.c
+++ b/fw_cfg.c
@@ -8,6 +8,10 @@
 #include "linuxboot.h"
 #include "multiboot.h"
 #include "benchmark.h"
+#include "start_info.h"
+
+extern struct hvm_start_info start_info;
+extern inline uint32_t ldl_p(void *p);
 
 struct fw_cfg_file {
uint32_t size;
@@ -184,6 +188,67 @@ static void boot_multiboot_from_fw_cfg(void)
panic();
 }
 
+static void pvh_e820_setup()
+{
+   struct hvm_memmap_table_entry *pvh_e820p;
+   int i, pvh_e820_sz;
+
+   pvh_e820_sz = sizeof(struct hvm_memmap_table_entry) * e820->nr_map;
+
+   pvh_e820p = malloc(pvh_e820_sz);
+   memset(pvh_e820p, 0, pvh_e820_sz);
+
+   for (i = 0; i < e820->nr_map; i++) {
+   pvh_e820p[i].addr = e820->map[i].addr;
+   pvh_e820p[i].size = e820->map[i].size;
+   pvh_e820p[i].type = e820->map[i].type;
+   }
+   start_info.memmap_paddr = (uintptr_t)pvh_e820p;
+   start_info.memmap_entries = e820->nr_map;
+}
+
+void boot_pvh_from_fw_cfg(void)
+{
+   void *kernel_entry;
+   uint32_t sz;
+   struct linuxboot_args args;
+   struct hvm_modlist_entry ramdisk_mod;
+
+   start_info.magic = XEN_HVM_START_MAGIC_VALUE;
+   start_info.version = 1;
+   start_info.flags = 0;
+   start_info.nr_modules = 1;
+   start_info.reserved = 0;
+
+   fw_cfg_select(FW_CFG_CMDLINE_SIZE);
+   args.cmdline_size = fw_cfg_readl_le();
+   args.cmdline_addr = malloc(args.cmdline_size);
+   fw_cfg_read_entry(FW_CFG_CMDLINE_DATA, args.cmdline_addr,
+ args.cmdline_size);
+   start_info.cmdline_paddr = (uintptr_t)args.cmdline_addr;
+
+   /* Use this field for pvhboot. Not used by pvhboot otherwise */
+   fw_cfg_read_entry(FW_CFG_KERNEL_DATA, _mod,
+ sizeof(ramdisk_mod));
+   ramdisk_mod.cmdline_paddr = (uintptr_t)_mod;
+   start_info.modlist_paddr = (uintptr_t)_mod;
+
+   pvh_e820_setup();
+
+   fw_cfg_select(FW_CFG_KERNEL_SIZE);
+   sz = fw_cfg_readl_le();
+   if (!sz)
+   panic();
+
+   fw_cfg_select(FW_CFG_KERNEL_ENTRY);
+   kernel_entry = (void *) fw_cfg_readl_le();
+   asm volatile("movl %0, %%ebx" : : "r"(_info));
+
+   asm volatile("jmp *%2" : : "a" (0x2badb002),
+"b"(_info), "c"(kernel_entry));
+   panic();
+}
+
 void boot_from_fwcfg(void)
 {
struct linuxboot_args args;
@@ -208,8 +273,13 @@ void boot_from_fwcfg(void)
fw_cfg_select(FW_CFG_SETUP_DATA);
fw_cfg_read(args.header, sizeof(args.header));
 
-   if (!parse_bzimage())
+   if (!parse_bzimage()) {
+   uint8_t *header = args.header;
+
+   if (ldl_p(header) == 0x464c457f)  /* ELF magic */
+   boot_pvh_from_fw_cfg();
boot_multiboot_from_fw_cfg();
+   }
 
/* SETUP_DATA already selected */
if (args.setup_size > sizeof(args.header))
diff --git a/linuxboot.c b/linuxboot.c
index a5f1c4fa078d..573052cc0f78 100644
--- a/linuxboot.c
+++ b/linuxboot.c
@@ -12,7 +12,7 @@ static inline uint16_t lduw_p(void *p)
return val;
 }
 
-static inline uint32_t ldl_p(void *p)
+inline uint32_t ldl_p(void *p)
 {
uint32_t val;
memcpy(, p, 4);
diff --git a/main.c b/main.c
index 699cc1a8e6e7..725d50b94e1e 100644
--- a/main.c
+++ b/main.c
@@ -8,6 +8,9 @@
 #include "pflash.h"
 #include "pci.h"
 #include "benchmark.h"
+#include "start_info.h"
+
+struct hvm_start_info start_info = {0};
 
 static void set_realmode_int(int vec, void *p)
 {
diff --git a/tables.c b/tables.c
index 32b2406b9aed..47bef4125d9e 100644
--- a/tables.c
+++ b/tables.c
@@ -2,6 +2,9 @@
 #include "stdio.h"
 #include "fw_cfg.h"
 #include "string.h"
+#include "start_info.h"
+
+extern struct hvm_start_info start_info;
 
 struct loader_cmd {
uint32_t cmd;
@@ -122,6 +125,8 @@ static void do_checksum(char *file, uint32_t offset, 
uint32_t start, uint32_t le
p[offset] -= csum8([start], len);
 }
 

[Qemu-devel] [RFC qboot 3/3] pvh: add benchmark exit point

2018-12-05 Thread Liam Merwick
This commit adds a PVH specific VM exit point for use in benchmarking
boot times using a QEMU specific device that terminates the QEMU process
and thus the VM itself when handling those VM exits. Since the VM
terminates right at those exit points, generic tools like time can
be used to measure the time spent between the QEMU startup
and termination moments.

The QEMU device used for those measurement is called isa-debug-exit
for the PC and Q35 machine types. These devices take 2 arguments:
iobase and iosize.  iobase specifies which IO port we need to write
into to have these devices eventually handle the corresponding VM exit.

If for example, QEMU is started with the following argument:

-device isa-debug-exit,iobase=0xf4

then any IO write to 0xf4 will terminate the QEMU process and the
corresponding VM.

Signed-off-by: Liam Merwick 
---
 benchmark.h | 3 ++-
 fw_cfg.c| 7 +++
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/benchmark.h b/benchmark.h
index 089c549b5803..2be08e7b2cec 100644
--- a/benchmark.h
+++ b/benchmark.h
@@ -9,5 +9,6 @@
 #define FW_START1
 #define LINUX_START_FWCFG 2
 #define LINUX_START_BOOT  3
+#define LINUX_START_PVHBOOT  4
 
-#endif
+#endif /* BENCHMARK_H */
diff --git a/fw_cfg.c b/fw_cfg.c
index e13ec20d0e8b..690ff19e74a0 100644
--- a/fw_cfg.c
+++ b/fw_cfg.c
@@ -240,10 +240,17 @@ void boot_pvh_from_fw_cfg(void)
if (!sz)
panic();
 
+
fw_cfg_select(FW_CFG_KERNEL_ENTRY);
kernel_entry = (void *) fw_cfg_readl_le();
asm volatile("movl %0, %%ebx" : : "r"(_info));
 
+#ifdef BENCHMARK_HACK
+   /* Exit just before jumping to vmlinux, so that it is easy
+* to time/profile the firmware.
+*/
+   outb(LINUX_EXIT_PORT, LINUX_START_PVHBOOT);
+#endif
asm volatile("jmp *%2" : : "a" (0x2badb002),
 "b"(_info), "c"(kernel_entry));
panic();
-- 
1.8.3.1




[Qemu-devel] [RFC qboot 1/3] pvh: Add x86/HVM direct boot ABI header file

2018-12-05 Thread Liam Merwick
The x86/HVM direct boot ABI permits a guest to be able to boot directly
into the uncompressed Linux kernel binary.

https://xenbits.xen.org/docs/unstable/misc/pvh.html

This commit adds the header file that defines the start_info struct
that needs to be populated in order to use this ABI.

Signed-off-by: Maran Wilson 
Signed-off-by: Liam Merwick 
---
 include/start_info.h | 146 +++
 1 file changed, 146 insertions(+)
 create mode 100644 include/start_info.h

diff --git a/include/start_info.h b/include/start_info.h
new file mode 100644
index ..348779eb10cd
--- /dev/null
+++ b/include/start_info.h
@@ -0,0 +1,146 @@
+/*
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Copyright (c) 2016, Citrix Systems, Inc.
+ */
+
+#ifndef __XEN_PUBLIC_ARCH_X86_HVM_START_INFO_H__
+#define __XEN_PUBLIC_ARCH_X86_HVM_START_INFO_H__
+
+/*
+ * Start of day structure passed to PVH guests and to HVM guests in %ebx.
+ *
+ * NOTE: nothing will be loaded at physical address 0, so a 0 value in any
+ * of the address fields should be treated as not present.
+ *
+ *  0 ++
+ *| magic  | Contains the magic value XEN_HVM_START_MAGIC_VALUE
+ *|| ("xEn3" with the 0x80 bit of the "E" set).
+ *  4 ++
+ *| version| Version of this structure. Current version is 1. New
+ *|| versions are guaranteed to be backwards-compatible.
+ *  8 ++
+ *| flags  | SIF_xxx flags.
+ * 12 ++
+ *| nr_modules | Number of modules passed to the kernel.
+ * 16 ++
+ *| modlist_paddr  | Physical address of an array of modules
+ *|| (layout of the structure below).
+ * 24 ++
+ *| cmdline_paddr  | Physical address of the command line,
+ *|| a zero-terminated ASCII string.
+ * 32 ++
+ *| rsdp_paddr | Physical address of the RSDP ACPI data structure.
+ * 40 ++
+ *| memmap_paddr   | Physical address of the (optional) memory map. Only
+ *|| present in version 1 and newer of the structure.
+ * 48 ++
+ *| memmap_entries | Number of entries in the memory map table. Only
+ *|| present in version 1 and newer of the structure.
+ *|| Zero if there is no memory map being provided.
+ * 52 ++
+ *| reserved   | Version 1 and newer only.
+ * 56 ++
+ *
+ * The layout of each entry in the module structure is the following:
+ *
+ *  0 ++
+ *| paddr  | Physical address of the module.
+ *  8 ++
+ *| size   | Size of the module in bytes.
+ * 16 ++
+ *| cmdline_paddr  | Physical address of the command line,
+ *|| a zero-terminated ASCII string.
+ * 24 ++
+ *| reserved   |
+ * 32 ++
+ *
+ * The layout of each entry in the memory map table is as follows:
+ *
+ *  0 ++
+ *| addr   | Base address
+ *  8 ++
+ *| size   | Size of mapping in bytes
+ * 16 ++
+ *| type   | Type of mapping as defined between the hypervisor
+ *|| and guest it's starting. E820_TYPE_xxx, for example.
+ * 20 +|
+ *| reserved   |
+ * 24 ++
+ *
+ * The address and sizes are always a 64bit little endian unsigned integer.
+ *
+ * NB: Xen on x86 will always try to place all the data below the 4GiB
+ * boundary.
+ *
+ * Version numbers of the hvm_start_info structure have evolved like this:
+ *
+ * Version 0:
+ *
+ * Version 1:   Added the memmap_paddr/memmap_entries fields (plus 4 bytes of
+ *  padding) to the end

[Qemu-devel] [RFC 0/3] qboot changes for PVH boot

2018-12-05 Thread Liam Merwick
For certain applications it is desirable to rapidly boot a KVM virtual
machine. In cases where legacy hardware and software support within the
guest is not needed, QEMU should be able to boot directly into the
uncompressed Linux kernel binary with minimal firmware involvement.

There already exists an ABI to allow this for Xen PVH guests and the ABI
is supported by Linux and FreeBSD:

   https://xenbits.xen.org/docs/unstable/misc/pvh.html

Details on the Linux changes: https://lkml.org/lkml/2018/4/16/1002

This patch series provides qboot support to populate the start_info struct
needed by the direct boot ABI and to configure the guest e820 tables to
enable QEMU to use that same entry point for booting KVM guests.

Usіng the methods/scripts documented by the NEMU team at

   https://github.com/intel/nemu/wiki/Measuring-Boot-Latency
   https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg00200.html

below are some timings measured (vmlinux and bzImage from the same build).
Time to get to kernel start is almost halved (95ṁs -> 48ms)

QEMU + qboot + vmlinux (PVH + 4.20-rc4)
 qemu_init_end: 41.550521
 fw_start: 41.667139 (+0.116618)
 fw_do_boot: 47.448495 (+5.781356)
 linux_startup_64: 47.720785 (+0.27229)
 linux_start_kernel: 48.399541 (+0.678756)
 linux_start_user: 296.952056 (+248.552515)

QEMU + qboot + bzImage:
 qemu_init_end: 29.209276
 fw_start: 29.317342 (+0.108066)
 linux_start_boot: 36.679362 (+7.36202)
 linux_startup_64: 94.531349 (+57.851987)
 linux_start_kernel: 94.900913 (+0.369564)
 linux_start_user: 401.060971 (+306.160058)

QEMU + bzImage:
 qemu_init_end: 30.424430
 linux_startup_64: 893.770334 (+863.345904)
 linux_start_kernel: 894.17049 (+0.400156)
 linux_start_user: 1208.679768 (+314.509278)


Liam Merwick (3):
  pvh: Add x86/HVM direct boot ABI header file
  pvh: use x86/HVM direct boot ABI
  pvh: add benchmark exit point

 benchmark.h  |   3 +-
 fw_cfg.c |  79 +++-
 include/start_info.h | 146 +++
 linuxboot.c  |   2 +-
 main.c   |   3 ++
 tables.c |   9 
 6 files changed, 239 insertions(+), 3 deletions(-)
 create mode 100644 include/start_info.h

-- 
1.8.3.1




Re: [Qemu-devel] [PATCH] vvfat: Fix memory leak

2018-11-14 Thread Liam Merwick




On 14/11/2018 12:55, Kevin Wolf wrote:

Don't leak 'cluster' in the mapping == NULL case. Found by Coverity
(CID 1055918).

Fixes: 8d9401c2791ee2d2805b741b1ee3006041edcd3e
Signed-off-by: Kevin Wolf 


Reviewed-by: Liam Merwick 

Thanks.


---
  block/vvfat.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/block/vvfat.c b/block/vvfat.c
index 1de5de1db4..b7b61ea8b7 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -2510,7 +2510,7 @@ static int commit_one_file(BDRVVVFATState* s,
  uint32_t first_cluster = c;
  mapping_t* mapping = find_mapping_for_cluster(s, c);
  uint32_t size = filesize_of_direntry(direntry);
-char* cluster = g_malloc(s->cluster_size);
+char *cluster;
  uint32_t i;
  int fd = 0;
  
@@ -2528,17 +2528,17 @@ static int commit_one_file(BDRVVVFATState* s,

  if (fd < 0) {
  fprintf(stderr, "Could not open %s... (%s, %d)\n", mapping->path,
  strerror(errno), errno);
-g_free(cluster);
  return fd;
  }
  if (offset > 0) {
  if (lseek(fd, offset, SEEK_SET) != offset) {
  qemu_close(fd);
-g_free(cluster);
  return -3;
  }
  }
  
+cluster = g_malloc(s->cluster_size);

+
  while (offset < size) {
  uint32_t c1;
  int rest_size = (size - offset > s->cluster_size ?





Re: [Qemu-devel] [PATCH v2] hw/bt: drop bluetooth keyboard emulation

2018-11-14 Thread Liam Merwick




On 13/11/2018 11:57, Gerd Hoffmann wrote:

Broken (segfaults on first keypress, probably since 2.1) and apparently unused.

Reproducer:
   qemu -usb -device usb-bt-dongle -bt hci,vlan=0 -bt device:keyboard

Stacktrace:
   #0  0x5606841f7363 in bt_hid_send_data (ch=0x0, type=1, data=0x560686986400 
"", len=8)
   at /home/kraxel/projects/qemu/hw/bt/hid.c:178
   #1  0x5606841f7afa in bt_hid_datain (hs=0x560686985eb8)
   at /home/kraxel/projects/qemu/hw/bt/hid.c:387
   #2  0x5606842643d8 in hid_keyboard_event (dev=0x560686985eb8, 
src=0x5606869b9c10, evt=0x5606868772a0)
   at /home/kraxel/projects/qemu/hw/input/hid.c:245
   #3  0x5606843eab39 in qemu_input_event_send_impl (src=0x5606869b9c10, 
evt=0x5606868772a0)
   at /home/kraxel/projects/qemu/ui/input.c:346
   #4  0x5606843caadd in replay_input_event (src=0x5606869b9c10, 
evt=0x5606868772a0)
   at /home/kraxel/projects/qemu/replay/replay-input.c:128
   #5  0x5606843eabf4 in qemu_input_event_send (src=0x5606869b9c10, 
evt=0x5606868772a0)
   at /home/kraxel/projects/qemu/ui/input.c:375
   #6  0x5606843ead43 in qemu_input_event_send_key (src=0x5606869b9c10, 
key=0x560686819e50, down=true) at /home/kraxel/projects/qemu/ui/input.c:419
   #7  0x5606843eae23 in qemu_input_event_send_key_qcode 
(src=0x5606869b9c10, q=Q_KEY_CODE_RET, down=true)
   at /home/kraxel/projects/qemu/ui/input.c:441
   [ ... snip ... ]

Signed-off-by: Gerd Hoffmann 


One question below, otherwise

Reviewed-by: Liam Merwick 
Tested-by: Liam Merwick 


---
  include/hw/bt.h |   3 -
  hw/bt/hid.c | 554 
  vl.c|  34 +---
  hw/bt/Makefile.objs |   3 +-
  qemu-doc.texi   |   6 +-
  qemu-options.hx |   9 -
  6 files changed, 3 insertions(+), 606 deletions(-)
  delete mode 100644 hw/bt/hid.c



[ snip]


diff --git a/qemu-options.hx b/qemu-options.hx
index 38c7a978c1..48885cdca8 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx


Is this extra removal needed too?

@@ -2757,9 +2757,7 @@ DEF("bt", HAS_ARG, QEMU_OPTION_bt, \
 "-bt hci[,vlan=n]\n" \
 "emulate a standard HCI in virtual scatternet 'n'\n" \
 "-bt vhci[,vlan=n]\n" \
-"add host computer to virtual scatternet 'n' using 
VHCI\n" \

-"-bt device:dev[,vlan=n]\n" \
-"emulate a bluetooth device 'dev' in scatternet 'n'\n",
+"add host computer to virtual scatternet 'n' using 
VHCI\n", \

 QEMU_ARCH_ALL)
 STEXI
 @item -bt hci[...]





@@ -2804,15 +2804,6 @@ be used as following:
  qemu-system-i386 [...OPTIONS...] -bt hci,vlan=5 -bt vhci,vlan=5
  @end example
  
-@item -bt device:@var{dev}[,vlan=@var{n}]

-Emulate a bluetooth device @var{dev} and place it in network @var{n}
-(default @code{0}).  QEMU can only emulate one type of bluetooth devices
-currently:
-
-@table @option
-@item keyboard
-Virtual wireless keyboard implementing the HIDP bluetooth profile.
-@end table
  ETEXI
  
  STEXI






Re: [Qemu-devel] [PATCH] slirp: add tftp tracing

2018-11-12 Thread Liam Merwick



On 13/11/2018 07:03, Gerd Hoffmann wrote:

Useful when debugging pxeboot, to see what the guest tries to do.

Signed-off-by: Gerd Hoffmann 


Reviewed-by: Liam Merwick 



---
  Makefile.objs  | 1 +
  slirp/tftp.c   | 3 +++
  slirp/trace-events | 5 +
  3 files changed, 9 insertions(+)
  create mode 100644 slirp/trace-events

diff --git a/Makefile.objs b/Makefile.objs
index 1e1ff387d7..31852eaf8f 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -251,6 +251,7 @@ trace-events-subdirs += net
  trace-events-subdirs += qapi
  trace-events-subdirs += qom
  trace-events-subdirs += scsi
+trace-events-subdirs += slirp
  trace-events-subdirs += target/arm
  trace-events-subdirs += target/i386
  trace-events-subdirs += target/mips
diff --git a/slirp/tftp.c b/slirp/tftp.c
index a9bc4bb1b6..735b57aa55 100644
--- a/slirp/tftp.c
+++ b/slirp/tftp.c
@@ -26,6 +26,7 @@
  #include "slirp.h"
  #include "qemu-common.h"
  #include "qemu/cutils.h"
+#include "trace.h"
  
  static inline int tftp_session_in_use(struct tftp_session *spt)

  {
@@ -204,6 +205,7 @@ static void tftp_send_error(struct tftp_session *spt,
struct mbuf *m;
struct tftp_t *tp;
  
+  trace_slirp_tftp_error(msg);

m = m_get(spt->slirp);
  
if (!m) {

@@ -323,6 +325,7 @@ static void tftp_handle_rrq(Slirp *slirp, struct 
sockaddr_storage *srcsas,
break;
  }
}
+  trace_slirp_tftp_rrq(req_fname);
  
/* check mode */

if ((pktlen - k) < 6) {
diff --git a/slirp/trace-events b/slirp/trace-events
new file mode 100644
index 00..ff8f656e8c
--- /dev/null
+++ b/slirp/trace-events
@@ -0,0 +1,5 @@
+# See docs/devel/tracing.txt for syntax documentation.
+
+# slirp/tftp.c
+slirp_tftp_rrq(const char *file) "file: %s"
+slirp_tftp_error(const char *file) "msg: %s"





Re: [Qemu-devel] [PATCH] hw/bt: drop bluetooth keyboard emulation.

2018-11-09 Thread Liam Merwick




On 09/11/2018 14:14, Gerd Hoffmann wrote:

Broken (segfaultson first keypress) and appearently unused.



s/segfaultson/segfaults on/
s/appearently/apparently/


Signed-off-by: Gerd Hoffmann 


one question at the end, otherwise

Reviewed-by: Liam Merwick 


---
  include/hw/bt.h |   3 -
  hw/bt/hid.c | 554 
  vl.c|  34 +---
  hw/bt/Makefile.objs |   3 +-
  qemu-options.hx |   9 -
  5 files changed, 2 insertions(+), 601 deletions(-)
  delete mode 100644 hw/bt/hid.c

diff --git a/include/hw/bt.h b/include/hw/bt.h
index b5e11d4d43..73fb1911f6 100644
--- a/include/hw/bt.h
+++ b/include/hw/bt.h
@@ -173,9 +173,6 @@ enum bt_l2cap_psm_predef {
  /* bt-sdp.c */
  void bt_l2cap_sdp_init(struct bt_l2cap_device_s *dev);
  
-/* bt-hid.c */

-struct bt_device_s *bt_keyboard_init(struct bt_scatternet_s *net);
-
  /* Link Management Protocol layer defines */
  
  #define LLID_ACLU_CONT		0x1

diff --git a/hw/bt/hid.c b/hw/bt/hid.c
deleted file mode 100644
index 056291f9b5..00
--- a/hw/bt/hid.c
+++ /dev/null
@@ -1,554 +0,0 @@
-/*
- * QEMU Bluetooth HID Profile wrapper for USB HID.
- *
- * Copyright (C) 2007-2008 OpenMoko, Inc.
- * Written by Andrzej Zaborowski 
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License as
- * published by the Free Software Foundation; either version 2 or
- * (at your option) version 3 of the License.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License along
- * with this program; if not, if not, see <http://www.gnu.org/licenses/>.
- */
-
-#include "qemu/osdep.h"
-#include "qemu-common.h"
-#include "qemu/timer.h"
-#include "ui/console.h"
-#include "hw/input/hid.h"
-#include "hw/bt.h"
-
-enum hid_transaction_req {
-BT_HANDSHAKE   = 0x0,
-BT_HID_CONTROL = 0x1,
-BT_GET_REPORT  = 0x4,
-BT_SET_REPORT  = 0x5,
-BT_GET_PROTOCOL= 0x6,
-BT_SET_PROTOCOL= 0x7,
-BT_GET_IDLE= 0x8,
-BT_SET_IDLE= 0x9,
-BT_DATA= 0xa,
-BT_DATC= 0xb,
-};
-
-enum hid_transaction_handshake {
-BT_HS_SUCCESSFUL   = 0x0,
-BT_HS_NOT_READY= 0x1,
-BT_HS_ERR_INVALID_REPORT_ID= 0x2,
-BT_HS_ERR_UNSUPPORTED_REQUEST  = 0x3,
-BT_HS_ERR_INVALID_PARAMETER= 0x4,
-BT_HS_ERR_UNKNOWN  = 0xe,
-BT_HS_ERR_FATAL= 0xf,
-};
-
-enum hid_transaction_control {
-BT_HC_NOP  = 0x0,
-BT_HC_HARD_RESET   = 0x1,
-BT_HC_SOFT_RESET   = 0x2,
-BT_HC_SUSPEND  = 0x3,
-BT_HC_EXIT_SUSPEND = 0x4,
-BT_HC_VIRTUAL_CABLE_UNPLUG = 0x5,
-};
-
-enum hid_protocol {
-BT_HID_PROTO_BOOT  = 0,
-BT_HID_PROTO_REPORT= 1,
-};
-
-enum hid_boot_reportid {
-BT_HID_BOOT_INVALID= 0,
-BT_HID_BOOT_KEYBOARD,
-BT_HID_BOOT_MOUSE,
-};
-
-enum hid_data_pkt {
-BT_DATA_OTHER  = 0,
-BT_DATA_INPUT,
-BT_DATA_OUTPUT,
-BT_DATA_FEATURE,
-};
-
-#define BT_HID_MTU 48
-
-/* HID interface requests */
-#define GET_REPORT 0xa101
-#define GET_IDLE   0xa102
-#define GET_PROTOCOL   0xa103
-#define SET_REPORT 0x2109
-#define SET_IDLE   0x210a
-#define SET_PROTOCOL   0x210b
-
-struct bt_hid_device_s {
-struct bt_l2cap_device_s btdev;
-struct bt_l2cap_conn_params_s *control;
-struct bt_l2cap_conn_params_s *interrupt;
-HIDState hid;
-
-int proto;
-int connected;
-int data_type;
-int intr_state;
-struct {
-int len;
-uint8_t buffer[1024];
-} dataother, datain, dataout, feature, intrdataout;
-enum {
-bt_state_ready,
-bt_state_transaction,
-bt_state_suspend,
-} state;
-};
-
-static void bt_hid_reset(struct bt_hid_device_s *s)
-{
-struct bt_scatternet_s *net = s->btdev.device.net;
-
-/* Go as far as... */
-bt_l2cap_device_done(>btdev);
-bt_l2cap_device_init(>btdev, net);
-
-hid_reset(>hid);
-s->proto = BT_HID_PROTO_REPORT;
-s->state = bt_state_ready;
-s->dataother.len = 0;
-s->datain.len 

[Qemu-devel] [PATCH v5 0/5] off-by-one and NULL pointer accesses detected by static analysis

2018-11-05 Thread Liam Merwick
Below are a number of fixes to some off-by-one, read outside array bounds, and
NULL pointer accesses detected by an internal Oracle static analysis tool 
(Parfait).
https://labs.oracle.com/pls/apex/f?p=labs:49:P49_PROJECT_ID:13

v1 -> v2
Based on feedback from Eric Blake:
patch2: reworded commit message to clarify issue
patch6: Reverted common qlist routines and added assert to qlist_dump instead
patch7: Fixed incorrect logic
patch8: Added QEMU_BUILD_BUG_ON to catch future іnstance at compile-time

v2 -> v3
Based on feedback from Eric Blake:
patch6: removed double space from commit message
patch8: removed unnecessary comment and updated QEMU_BUILD_BUG_ON to use 
ARRAY_SIZE
Added Eric's R-b to patches 6,7,8

v3 -> v4
Based on feedback from Max Reitz:
patch2: Added R-b from John Snow
patch3: fixed blk_get_attached_dev_id() instead of checking return value
patch4: switched to assert()
patch5: numerous changes based on feedback from Max
patch6: updated commit message
patch7: (was patch8): Added Max's R-b
patch8: (new): patch fixing NULL pointer dereference in kvm_arch_init_vcpu()

v4 -> v5
Based on further feedback from Max Reitz:
Dropped v4 patch1 (configure --disable-avx2) as Thomas Huth already pulled it. 
Dropped v4 patch6 (dump_qlist) as it was just an unnecessary assert
Dropped v4 patch8 'patch fixing NULL pointer dereference in 
kvm_arch_init_vcpu()'
  so as to limit this seies to block changes (will send in a separate series).
patch1: no change (v4 patch2)
patch2: Switched to using ?: in return (v4 patch3)
patch3: Added Max's R-b (v4 patch4)
patch4: couple of changes based on feedback from Max (v4 patch5)
patch5: no change (v4 patch7)

Liam Merwick (5):
  job: Fix off-by-one assert checks for JobSTT and JobVerbTable
  block: Null pointer dereference in blk_root_get_parent_desc()
  qemu-img: assert block_job_get() does not return NULL in img_commit()
  block: Fix potential Null pointer dereferences in vvfat.c
  qcow2: Read outside array bounds in qcow2_pre_write_overlap_check()

 block/block-backend.c  |  3 ++-
 block/qcow2-refcount.c | 18 ++
 block/vvfat.c  | 49 +
 job.c  |  4 ++--
 qemu-img.c |  1 +
 5 files changed, 48 insertions(+), 27 deletions(-)

-- 
1.8.3.1




[Qemu-devel] [PATCH v5 3/5] qemu-img: assert block_job_get() does not return NULL in img_commit()

2018-11-05 Thread Liam Merwick
Although the function block_job_get() can return NULL, it would be a
serious bug if it did so (because the job yields before executing anything
(if it started successfully); but otherwise, commit_active_start() would
have returned an error).  However, as a precaution, before dereferencing
the 'job' pointer in img_commit() assert it is not NULL.

Signed-off-by: Liam Merwick 
Reviewed-by: Max Reitz 
---
 qemu-img.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/qemu-img.c b/qemu-img.c
index b12f4cd19b0a..457aa152296b 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1029,6 +1029,7 @@ static int img_commit(int argc, char **argv)
 }
 
 job = block_job_get("commit");
+assert(job);
 run_block_job(job, _err);
 if (local_err) {
 goto unref_backing;
-- 
1.8.3.1




[Qemu-devel] [PATCH v5 4/5] block: Fix potential Null pointer dereferences in vvfat.c

2018-11-05 Thread Liam Merwick
The calls to find_mapping_for_cluster() may return NULL but it
isn't always checked for before dereferencing the value returned.
Additionally, add some asserts to cover cases where NULL can't
be returned but which might not be obvious at first glance.

Signed-off-by: Liam Merwick 
---
 block/vvfat.c | 50 ++
 1 file changed, 34 insertions(+), 16 deletions(-)

diff --git a/block/vvfat.c b/block/vvfat.c
index fc41841a5c3c..263274d9739a 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -100,30 +100,26 @@ static inline void array_free(array_t* array)
 /* does not automatically grow */
 static inline void* array_get(array_t* array,unsigned int index) {
 assert(index < array->next);
+assert(array->pointer);
 return array->pointer + index * array->item_size;
 }
 
-static inline int array_ensure_allocated(array_t* array, int index)
+static inline void array_ensure_allocated(array_t *array, int index)
 {
 if((index + 1) * array->item_size > array->size) {
 int new_size = (index + 32) * array->item_size;
 array->pointer = g_realloc(array->pointer, new_size);
-if (!array->pointer)
-return -1;
+assert(array->pointer);
 memset(array->pointer + array->size, 0, new_size - array->size);
 array->size = new_size;
 array->next = index + 1;
 }
-
-return 0;
 }
 
 static inline void* array_get_next(array_t* array) {
 unsigned int next = array->next;
 
-if (array_ensure_allocated(array, next) < 0)
-return NULL;
-
+array_ensure_allocated(array, next);
 array->next = next + 1;
 return array_get(array, next);
 }
@@ -2428,16 +2424,13 @@ static int commit_direntries(BDRVVVFATState* s,
 direntry_t* direntry = array_get(&(s->directory), dir_index);
 uint32_t first_cluster = dir_index == 0 ? 0 : begin_of_direntry(direntry);
 mapping_t* mapping = find_mapping_for_cluster(s, first_cluster);
-
 int factor = 0x10 * s->sectors_per_cluster;
 int old_cluster_count, new_cluster_count;
-int current_dir_index = mapping->info.dir.first_dir_index;
-int first_dir_index = current_dir_index;
+int current_dir_index;
+int first_dir_index;
 int ret, i;
 uint32_t c;
 
-DLOG(fprintf(stderr, "commit_direntries for %s, parent_mapping_index %d\n", 
mapping->path, parent_mapping_index));
-
 assert(direntry);
 assert(mapping);
 assert(mapping->begin == first_cluster);
@@ -2445,6 +2438,15 @@ DLOG(fprintf(stderr, "commit_direntries for %s, 
parent_mapping_index %d\n", mapp
 assert(mapping->mode & MODE_DIRECTORY);
 assert(dir_index == 0 || is_directory(direntry));
 
+if (mapping == NULL) {
+return -1;
+}
+
+DLOG(fprintf(stderr, "commit_direntries for %s, parent_mapping_index %d\n",
+mapping->path, parent_mapping_index));
+
+current_dir_index = mapping->info.dir.first_dir_index;
+first_dir_index = current_dir_index;
 mapping->info.dir.parent_mapping_index = parent_mapping_index;
 
 if (first_cluster == 0) {
@@ -2494,6 +2496,9 @@ DLOG(fprintf(stderr, "commit_direntries for %s, 
parent_mapping_index %d\n", mapp
 direntry = array_get(&(s->directory), first_dir_index + i);
 if (is_directory(direntry) && !is_dot(direntry)) {
 mapping = find_mapping_for_cluster(s, first_cluster);
+if (mapping == NULL) {
+return -1;
+}
 assert(mapping->mode & MODE_DIRECTORY);
 ret = commit_direntries(s, first_dir_index + i,
 array_index(&(s->mapping), mapping));
@@ -2522,6 +2527,10 @@ static int commit_one_file(BDRVVVFATState* s,
 assert(offset < size);
 assert((offset % s->cluster_size) == 0);
 
+if (mapping == NULL) {
+return -1;
+}
+
 for (i = s->cluster_size; i < offset; i += s->cluster_size)
 c = modified_fat_get(s, c);
 
@@ -2668,8 +2677,12 @@ static int handle_renames_and_mkdirs(BDRVVVFATState* s)
 if (commit->action == ACTION_RENAME) {
 mapping_t* mapping = find_mapping_for_cluster(s,
 commit->param.rename.cluster);
-char* old_path = mapping->path;
+char *old_path;
 
+if (mapping == NULL) {
+return -1;
+}
+old_path = mapping->path;
 assert(commit->path);
 mapping->path = commit->path;
 if (rename(old_path, mapping->path))
@@ -2690,10 +2703,15 @@ static int handle_renames_and_mkdirs(BDRVVVFATState* s)
 direntry_t* d = direntry + i;
 
 if (is_file(d) || (is_directory(d) && !is_dot(d))) {
+int l;
+ 

[Qemu-devel] [PATCH v5 5/5] qcow2: Read outside array bounds in qcow2_pre_write_overlap_check()

2018-11-05 Thread Liam Merwick
The commit for 0e4e4318eaa5 increments QCOW2_OL_MAX_BITNR but does not
add an array entry for QCOW2_OL_BITMAP_DIRECTORY_BITNR to metadata_ol_names[].
As a result, an array dereference of metadata_ol_names[8] in
qcow2_pre_write_overlap_check() could result in a read outside of the array 
bounds.

Fixes: 0e4e4318eaa5 ('qcow2: add overlap check for bitmap directory')

Cc: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Liam Merwick 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
---
 block/qcow2-refcount.c | 18 ++
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 3c539f02e5ec..46082aeac1d6 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -2719,15 +2719,17 @@ int qcow2_check_metadata_overlap(BlockDriverState *bs, 
int ign, int64_t offset,
 }
 
 static const char *metadata_ol_names[] = {
-[QCOW2_OL_MAIN_HEADER_BITNR]= "qcow2_header",
-[QCOW2_OL_ACTIVE_L1_BITNR]  = "active L1 table",
-[QCOW2_OL_ACTIVE_L2_BITNR]  = "active L2 table",
-[QCOW2_OL_REFCOUNT_TABLE_BITNR] = "refcount table",
-[QCOW2_OL_REFCOUNT_BLOCK_BITNR] = "refcount block",
-[QCOW2_OL_SNAPSHOT_TABLE_BITNR] = "snapshot table",
-[QCOW2_OL_INACTIVE_L1_BITNR]= "inactive L1 table",
-[QCOW2_OL_INACTIVE_L2_BITNR]= "inactive L2 table",
+[QCOW2_OL_MAIN_HEADER_BITNR]= "qcow2_header",
+[QCOW2_OL_ACTIVE_L1_BITNR]  = "active L1 table",
+[QCOW2_OL_ACTIVE_L2_BITNR]  = "active L2 table",
+[QCOW2_OL_REFCOUNT_TABLE_BITNR] = "refcount table",
+[QCOW2_OL_REFCOUNT_BLOCK_BITNR] = "refcount block",
+[QCOW2_OL_SNAPSHOT_TABLE_BITNR] = "snapshot table",
+[QCOW2_OL_INACTIVE_L1_BITNR]= "inactive L1 table",
+[QCOW2_OL_INACTIVE_L2_BITNR]= "inactive L2 table",
+[QCOW2_OL_BITMAP_DIRECTORY_BITNR]   = "bitmap directory",
 };
+QEMU_BUILD_BUG_ON(QCOW2_OL_MAX_BITNR != ARRAY_SIZE(metadata_ol_names));
 
 /*
  * First performs a check for metadata overlaps (through
-- 
1.8.3.1




[Qemu-devel] [PATCH v5 2/5] block: Null pointer dereference in blk_root_get_parent_desc()

2018-11-05 Thread Liam Merwick
The dev_id returned by the call to blk_get_attached_dev_id() in
blk_root_get_parent_desc() can be NULL (an internal call to
object_get_canonical_path may have returned NULL).

Instead of just checking this case before before dereferencing,
adjust blk_get_attached_dev_id() to return the empty string if no
object path can be found (similar to the case when blk->dev is NULL
and an empty string is returned).

Signed-off-by: Liam Merwick 
---
 block/block-backend.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index dc0cd5772413..a2061a565024 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -918,7 +918,8 @@ char *blk_get_attached_dev_id(BlockBackend *blk)
 } else if (dev->id) {
 return g_strdup(dev->id);
 }
-return object_get_canonical_path(OBJECT(dev));
+
+return object_get_canonical_path(OBJECT(dev)) ?: g_strdup("");
 }
 
 /*
-- 
1.8.3.1




[Qemu-devel] [PATCH v5 1/5] job: Fix off-by-one assert checks for JobSTT and JobVerbTable

2018-11-05 Thread Liam Merwick
In the assert checking the array dereference of JobVerbTable[verb]
in job_apply_verb() the check of the index, verb, allows an overrun
because an index equal to the array size is permitted.

Similarly, in the assert check of JobSTT[s0][s1] with index s1
in job_state_transition(), an off-by-one overrun is not flagged
either.

This is not a run-time issue as there are no callers actually
passing in the max value.

Signed-off-by: Liam Merwick 
Reviewed-by: Darren Kenny 
Reviewed-by: Mark Kanda 
Reviewed-by: Eric Blake 
Reviewed-by: John Snow 
---
 job.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/job.c b/job.c
index c65e01bbfa34..da8e4b7bf2f3 100644
--- a/job.c
+++ b/job.c
@@ -159,7 +159,7 @@ bool job_is_internal(Job *job)
 static void job_state_transition(Job *job, JobStatus s1)
 {
 JobStatus s0 = job->status;
-assert(s1 >= 0 && s1 <= JOB_STATUS__MAX);
+assert(s1 >= 0 && s1 < JOB_STATUS__MAX);
 trace_job_state_transition(job, job->ret,
JobSTT[s0][s1] ? "allowed" : "disallowed",
JobStatus_str(s0), JobStatus_str(s1));
@@ -174,7 +174,7 @@ static void job_state_transition(Job *job, JobStatus s1)
 int job_apply_verb(Job *job, JobVerb verb, Error **errp)
 {
 JobStatus s0 = job->status;
-assert(verb >= 0 && verb <= JOB_VERB__MAX);
+assert(verb >= 0 && verb < JOB_VERB__MAX);
 trace_job_apply_verb(job, JobStatus_str(s0), JobVerb_str(verb),
  JobVerbTable[verb][s0] ? "allowed" : "prohibited");
 if (JobVerbTable[verb][s0]) {
-- 
1.8.3.1




Re: [Qemu-devel] [PATCH v4 5/8] block: Fix potential Null pointer dereferences in vvfat.c

2018-11-05 Thread Liam Merwick




On 05/11/18 00:19, Max Reitz wrote:

On 19.10.18 22:39, Liam Merwick wrote:

The calls to find_mapping_for_cluster() may return NULL but it
isn't always checked for before dereferencing the value returned.
Additionally, add some asserts to cover cases where NULL can't
be returned but which might not be obvious at first glance.

Signed-off-by: Liam Merwick 
---
  block/vvfat.c | 33 -
  1 file changed, 28 insertions(+), 5 deletions(-)

diff --git a/block/vvfat.c b/block/vvfat.c
index fc41841a5c3c..19f6725054a0 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -100,6 +100,7 @@ static inline void array_free(array_t* array)
  /* does not automatically grow */
  static inline void* array_get(array_t* array,unsigned int index) {
  assert(index < array->next);
+assert(array->pointer);
  return array->pointer + index * array->item_size;
  }
  
@@ -108,8 +109,7 @@ static inline int array_ensure_allocated(array_t* array, int index)

  if((index + 1) * array->item_size > array->size) {
  int new_size = (index + 32) * array->item_size;
  array->pointer = g_realloc(array->pointer, new_size);
-if (!array->pointer)
-return -1;
+assert(array->pointer);


It would make sense to make this function not return any value (because
it just always returns 0 now), but I fully understand if you don't want
to mess around with vvfat more than you have to.  (Neither do I.)


It had occurred to me too but wasn't sure if it'd be preferred to roll 
that change in. 3 of the 4 callers ignored the return value already, so 
I bit the bullet and made the change.






  memset(array->pointer + array->size, 0, new_size - array->size);
  array->size = new_size;
  array->next = index + 1;
@@ -2261,6 +2261,9 @@ static mapping_t* insert_mapping(BDRVVVFATState* s,
  }
  if (index >= s->mapping.next || mapping->begin > begin) {
  mapping = array_insert(&(s->mapping), index, 1);
+if (mapping == NULL) {
+return NULL;
+}


array_insert() will never return NULL.



Removed.




  mapping->path = NULL;
  adjust_mapping_indices(s, index, +1);
  }
@@ -2428,6 +2431,9 @@ static int commit_direntries(BDRVVVFATState* s,
  direntry_t* direntry = array_get(&(s->directory), dir_index);
  uint32_t first_cluster = dir_index == 0 ? 0 : begin_of_direntry(direntry);
  mapping_t* mapping = find_mapping_for_cluster(s, first_cluster);
+if (mapping == NULL) {
+return -1;
+}


This should be moved below the declarations that still follow here.


Done. (It resulted in a bit more code rearranging and I had to fix two 
checkpatch warnings in existing code)




  
  int factor = 0x10 * s->sectors_per_cluster;

  int old_cluster_count, new_cluster_count;


[...]


@@ -3193,6 +3215,7 @@ static int enable_write_target(BlockDriverState *bs, 
Error **errp)
  
  backing = bdrv_new_open_driver(_write_target, NULL, BDRV_O_ALLOW_RDWR,

 _abort);
+assert(backing);
  *(void**) backing->opaque = s;


I personally wouldn't use an assert() here because it's clear that the
value is dereferenced immediately, so that is the assertion that it is
non-NULL, but I won't give too much of a fight.

The thing is, I believe we should write code for humans, not machines.
Fixing machines to understand what we produce is possible -- fixing
humans is more difficult.

On top of that, it would be a bug if NULL is returned and it would be
good if a static analyzer could catch that case.  Just fully silencing
it with assert() is not ideal.  Ideal would be if it would know that
setting _abort to any value crashes the program, and could thus
infer whether this function will actually ever get to return NULL when
_abort has been passed to it.



I'm investigating if the tool's config file syntax can describe that 
error_handle_fatal() exits when particular error_xxx parameters are passed.


I'll drop that assert in any case.

Regards,
Liam



Max

  
  bdrv_set_backing_hd(s->bs, backing, _abort);









Re: [Qemu-devel] [PATCH v4 6/8] block: dump_qlist() may dereference a Null pointer

2018-11-05 Thread Liam Merwick




On 05/11/18 00:07, Max Reitz wrote:

On 19.10.18 22:39, Liam Merwick wrote:

A NULL 'list' passed into function dump_qlist() isn't correctly
validated and can be passed to qlist_first() where it is dereferenced.

Given that dump_qlist() is static, and callers already do the right
thing, just add an assert to catch future potential bugs (plus the
added benefit of suppressing a warning from a static analysis tool
and removing this noise will help us better find real issues).


But can't you fix the tool? 


I don't have access to the tool source but have been filing bugs against 
it as I run it on the QEMU codebase and discover false positives.



My opinion is still that large parts of our
code do not assert that some parameter is not NULL, and I think it isn't
a good idea to make them assert that.  


Yeah, that can be a slippery slope


I don't know what makes this
function special, and I wonder why it is special to your tool -- as I've
said in the last version, dump_qdict() is basically the same in this
regard.  I wonder why your tool doesn't mind that.



I had gone though the code paths to try to see how the tool was happy 
with one and not the other - the implementation differed slightly w.r.t 
macro usage but I couldn't see any obvious reason.



Can you not whitelist something as false positives?  I know we have a
lot of those in Coverity, and we just mark them as such, and that's it.


Yeah, I can flag this as a FP and have it fall off my list.

I'll will drop this patch in v5

Regards,
Liam



Finally, one could argue that the nonnull GCC function attribute would
be a better fit, actually.

But overall, I just don't think it's a good idea to start changing the
code to accommodate for false positives in static analyzers, because in
my experience the number of false positives only rises with time.

Max


Signed-off-by: Liam Merwick 
Reviewed-by: Eric Blake 
---
  block/qapi.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/block/qapi.c b/block/qapi.c
index c66f949db839..e81be604217c 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -740,6 +740,8 @@ static void dump_qlist(fprintf_function func_fprintf, void 
*f, int indentation,
  const QListEntry *entry;
  int i = 0;
  
+assert(list);

+
  for (entry = qlist_first(list); entry; entry = qlist_next(entry), i++) {
  QType type = qobject_type(entry->value);
  bool composite = (type == QTYPE_QDICT || type == QTYPE_QLIST);








Re: [Qemu-devel] [PATCH v4 3/8] block: Null pointer dereference in blk_root_get_parent_desc()

2018-11-05 Thread Liam Merwick




On 04/11/18 23:57, Max Reitz wrote:

On 19.10.18 22:39, Liam Merwick wrote:

The dev_id returned by the call to blk_get_attached_dev_id() in
blk_root_get_parent_desc() can be NULL (an internal call to
object_get_canonical_path may have returned NULL).

Instead of just checking this case before before dereferencing,
adjust blk_get_attached_dev_id() to return the empty string if no
object path can be found (similar to the case when blk->dev is NULL
and an empty string is returned).

Signed-off-by: Liam Merwick 
---
  block/block-backend.c | 6 +-
  dtc   | 2 +-
  2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index dc0cd5772413..e628920f3cd8 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -909,6 +909,7 @@ void *blk_get_attached_dev(BlockBackend *blk)
  char *blk_get_attached_dev_id(BlockBackend *blk)
  {
  DeviceState *dev;
+char *dev_id;
  
  assert(!blk->legacy_dev);

  dev = blk->dev;
@@ -918,7 +919,10 @@ char *blk_get_attached_dev_id(BlockBackend *blk)
  } else if (dev->id) {
  return g_strdup(dev->id);
  }
-return object_get_canonical_path(OBJECT(dev));
+
+dev_id = object_get_canonical_path(OBJECT(dev));
+
+return dev_id ? dev_id : g_strdup("");
  }
  
  /*


Looks good, but since you'll have to respin anyway because of the hunk
below, you may want to consider

 return object_get_canonical_path(OBJECT(dev)) ?: g_strdup("");

instead.  (We have several instances of binary "?:" in the code already,
so it's fine to use it.  Of course you don't have to, though, if you
don't like it.)



I like it. Will make that change in v5



diff --git a/dtc b/dtc
index 88f18909db73..e54388015af1 16
--- a/dtc
+++ b/dtc
@@ -1 +1 @@
-Subproject commit 88f18909db731a627456f26d779445f84e449536
+Subproject commit e54388015af1fb4bf04d0bca99caba1074d9cc42


I don't think this hunk belongs here.



Indeed.

Regards,
Liam



[Qemu-devel] [PATCH v4 5/8] block: Fix potential Null pointer dereferences in vvfat.c

2018-10-19 Thread Liam Merwick
The calls to find_mapping_for_cluster() may return NULL but it
isn't always checked for before dereferencing the value returned.
Additionally, add some asserts to cover cases where NULL can't
be returned but which might not be obvious at first glance.

Signed-off-by: Liam Merwick 
---
 block/vvfat.c | 33 -
 1 file changed, 28 insertions(+), 5 deletions(-)

diff --git a/block/vvfat.c b/block/vvfat.c
index fc41841a5c3c..19f6725054a0 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -100,6 +100,7 @@ static inline void array_free(array_t* array)
 /* does not automatically grow */
 static inline void* array_get(array_t* array,unsigned int index) {
 assert(index < array->next);
+assert(array->pointer);
 return array->pointer + index * array->item_size;
 }
 
@@ -108,8 +109,7 @@ static inline int array_ensure_allocated(array_t* array, 
int index)
 if((index + 1) * array->item_size > array->size) {
 int new_size = (index + 32) * array->item_size;
 array->pointer = g_realloc(array->pointer, new_size);
-if (!array->pointer)
-return -1;
+assert(array->pointer);
 memset(array->pointer + array->size, 0, new_size - array->size);
 array->size = new_size;
 array->next = index + 1;
@@ -2261,6 +2261,9 @@ static mapping_t* insert_mapping(BDRVVVFATState* s,
 }
 if (index >= s->mapping.next || mapping->begin > begin) {
 mapping = array_insert(&(s->mapping), index, 1);
+if (mapping == NULL) {
+return NULL;
+}
 mapping->path = NULL;
 adjust_mapping_indices(s, index, +1);
 }
@@ -2428,6 +2431,9 @@ static int commit_direntries(BDRVVVFATState* s,
 direntry_t* direntry = array_get(&(s->directory), dir_index);
 uint32_t first_cluster = dir_index == 0 ? 0 : begin_of_direntry(direntry);
 mapping_t* mapping = find_mapping_for_cluster(s, first_cluster);
+if (mapping == NULL) {
+return -1;
+}
 
 int factor = 0x10 * s->sectors_per_cluster;
 int old_cluster_count, new_cluster_count;
@@ -2494,6 +2500,9 @@ DLOG(fprintf(stderr, "commit_direntries for %s, 
parent_mapping_index %d\n", mapp
 direntry = array_get(&(s->directory), first_dir_index + i);
 if (is_directory(direntry) && !is_dot(direntry)) {
 mapping = find_mapping_for_cluster(s, first_cluster);
+if (mapping == NULL) {
+return -1;
+}
 assert(mapping->mode & MODE_DIRECTORY);
 ret = commit_direntries(s, first_dir_index + i,
 array_index(&(s->mapping), mapping));
@@ -2522,6 +2531,10 @@ static int commit_one_file(BDRVVVFATState* s,
 assert(offset < size);
 assert((offset % s->cluster_size) == 0);
 
+if (mapping == NULL) {
+return -1;
+}
+
 for (i = s->cluster_size; i < offset; i += s->cluster_size)
 c = modified_fat_get(s, c);
 
@@ -2668,8 +2681,12 @@ static int handle_renames_and_mkdirs(BDRVVVFATState* s)
 if (commit->action == ACTION_RENAME) {
 mapping_t* mapping = find_mapping_for_cluster(s,
 commit->param.rename.cluster);
-char* old_path = mapping->path;
+char *old_path;
 
+if (mapping == NULL) {
+return -1;
+}
+old_path = mapping->path;
 assert(commit->path);
 mapping->path = commit->path;
 if (rename(old_path, mapping->path))
@@ -2690,10 +2707,15 @@ static int handle_renames_and_mkdirs(BDRVVVFATState* s)
 direntry_t* d = direntry + i;
 
 if (is_file(d) || (is_directory(d) && !is_dot(d))) {
+int l;
+char *new_path;
 mapping_t* m = find_mapping_for_cluster(s,
 begin_of_direntry(d));
-int l = strlen(m->path);
-char* new_path = g_malloc(l + diff + 1);
+if (m == NULL) {
+return -1;
+}
+l = strlen(m->path);
+new_path = g_malloc(l + diff + 1);
 
 assert(!strncmp(m->path, mapping->path, l2));
 
@@ -3193,6 +3215,7 @@ static int enable_write_target(BlockDriverState *bs, 
Error **errp)
 
 backing = bdrv_new_open_driver(_write_target, NULL, 
BDRV_O_ALLOW_RDWR,
_abort);
+assert(backing);
 *(void**) backing->opaque = s;
 
 bdrv_set_backing_hd(s->bs, backing, _abort);
-- 
1.8.3.1




[Qemu-devel] [PATCH v4 8/8] kvm: Potential NULL pointer dereference in kvm_arch_init_vcpu()

2018-10-19 Thread Liam Merwick
In kvm_arch_init_vcpu() a call to cpuid_find_entry() can return
NULL so the pointer returned should be checked before dereferencing it.

Signed-off-by: Liam Merwick 
---
 target/i386/kvm.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/target/i386/kvm.c b/target/i386/kvm.c
index dc4047b02fc5..eb19c87a9d25 100644
--- a/target/i386/kvm.c
+++ b/target/i386/kvm.c
@@ -1177,7 +1177,9 @@ int kvm_arch_init_vcpu(CPUState *cs)
 c->ecx = c->edx = 0;
 
 c = cpuid_find_entry(_data.cpuid, kvm_base, 0);
-c->eax = MAX(c->eax, KVM_CPUID_SIGNATURE | 0x10);
+if (c) {
+c->eax = MAX(c->eax, KVM_CPUID_SIGNATURE | 0x10);
+   }
 }
 
 cpuid_data.cpuid.nent = cpuid_i;
-- 
1.8.3.1




[Qemu-devel] [PATCH v4 1/8] configure: Provide option to explicitly disable AVX2

2018-10-19 Thread Liam Merwick
The configure script detects if the compiler has AVX2 support and
automatically sets avx2_opt="yes" which in turn defines CONFIG_AVX2_OPT.
There is no way of explicitly overriding this setting so this commit adds
two command-line options: --enable-avx2 and --disable-avx2.

The default behaviour, when no option is specified, is to maintain the
current behaviour and enable AVX2 if the compiler supports it.

Signed-off-by: Liam Merwick 
Reviewed-by: Darren Kenny 
Reviewed-by: Mark Kanda 
---
 configure | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/configure b/configure
index 9138af37f8a0..3a3e5f7004ce 100755
--- a/configure
+++ b/configure
@@ -428,7 +428,7 @@ usb_redir=""
 opengl=""
 opengl_dmabuf="no"
 cpuid_h="no"
-avx2_opt="no"
+avx2_opt=""
 zlib="yes"
 capstone=""
 lzo=""
@@ -1332,6 +1332,10 @@ for opt do
   ;;
   --disable-glusterfs) glusterfs="no"
   ;;
+  --disable-avx2) avx2_opt="no"
+  ;;
+  --enable-avx2) avx2_opt="yes"
+  ;;
   --enable-glusterfs) glusterfs="yes"
   ;;
   --disable-virtio-blk-data-plane|--enable-virtio-blk-data-plane)
@@ -1706,6 +1710,7 @@ disabled with --disable-FEATURE, default is enabled if 
available:
   libxml2 for Parallels image format
   tcmalloctcmalloc support
   jemallocjemalloc support
+  avx2AVX2 optimization support
   replication replication support
   vhost-vsock virtio sockets device support
   opengl  opengl support
@@ -5094,7 +5099,7 @@ fi
 # There is no point enabling this if cpuid.h is not usable,
 # since we won't be able to select the new routines.
 
-if test $cpuid_h = yes; then
+if test "$cpuid_h" = "yes" -a "$avx2_opt" != "no"; then
   cat > $TMPC << EOF
 #pragma GCC push_options
 #pragma GCC target("avx2")
@@ -5108,6 +5113,8 @@ int main(int argc, char *argv[]) { return bar(argv[0]); }
 EOF
   if compile_object "" ; then
 avx2_opt="yes"
+  else
+avx2_opt="no"
   fi
 fi
 
-- 
1.8.3.1




[Qemu-devel] [PATCH v4 7/8] qcow2: Read outside array bounds in qcow2_pre_write_overlap_check()

2018-10-19 Thread Liam Merwick
The commit for 0e4e4318eaa5 increments QCOW2_OL_MAX_BITNR but does not
add an array entry for QCOW2_OL_BITMAP_DIRECTORY_BITNR to metadata_ol_names[].
As a result, an array dereference of metadata_ol_names[8] in
qcow2_pre_write_overlap_check() could result in a read outside of the array 
bounds.

Fixes: 0e4e4318eaa5 ('qcow2: add overlap check for bitmap directory')

Cc: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Liam Merwick 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
---
 block/qcow2-refcount.c | 18 ++
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 3c539f02e5ec..46082aeac1d6 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -2719,15 +2719,17 @@ int qcow2_check_metadata_overlap(BlockDriverState *bs, 
int ign, int64_t offset,
 }
 
 static const char *metadata_ol_names[] = {
-[QCOW2_OL_MAIN_HEADER_BITNR]= "qcow2_header",
-[QCOW2_OL_ACTIVE_L1_BITNR]  = "active L1 table",
-[QCOW2_OL_ACTIVE_L2_BITNR]  = "active L2 table",
-[QCOW2_OL_REFCOUNT_TABLE_BITNR] = "refcount table",
-[QCOW2_OL_REFCOUNT_BLOCK_BITNR] = "refcount block",
-[QCOW2_OL_SNAPSHOT_TABLE_BITNR] = "snapshot table",
-[QCOW2_OL_INACTIVE_L1_BITNR]= "inactive L1 table",
-[QCOW2_OL_INACTIVE_L2_BITNR]= "inactive L2 table",
+[QCOW2_OL_MAIN_HEADER_BITNR]= "qcow2_header",
+[QCOW2_OL_ACTIVE_L1_BITNR]  = "active L1 table",
+[QCOW2_OL_ACTIVE_L2_BITNR]  = "active L2 table",
+[QCOW2_OL_REFCOUNT_TABLE_BITNR] = "refcount table",
+[QCOW2_OL_REFCOUNT_BLOCK_BITNR] = "refcount block",
+[QCOW2_OL_SNAPSHOT_TABLE_BITNR] = "snapshot table",
+[QCOW2_OL_INACTIVE_L1_BITNR]= "inactive L1 table",
+[QCOW2_OL_INACTIVE_L2_BITNR]= "inactive L2 table",
+[QCOW2_OL_BITMAP_DIRECTORY_BITNR]   = "bitmap directory",
 };
+QEMU_BUILD_BUG_ON(QCOW2_OL_MAX_BITNR != ARRAY_SIZE(metadata_ol_names));
 
 /*
  * First performs a check for metadata overlaps (through
-- 
1.8.3.1




[Qemu-devel] [PATCH v4 4/8] qemu-img: assert block_job_get() does not return NULL in img_commit()

2018-10-19 Thread Liam Merwick
Although the function block_job_get() can return NULL, it would be a
serious bug if it did so (because the job yields before executing anything
(if it started successfully); but otherwise, commit_active_start() would
have returned an error).  However, as a precaution, before dereferencing
the 'job' pointer in img_commit() assert it is not NULL.

Signed-off-by: Liam Merwick 
---
 qemu-img.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/qemu-img.c b/qemu-img.c
index b12f4cd19b0a..457aa152296b 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1029,6 +1029,7 @@ static int img_commit(int argc, char **argv)
 }
 
 job = block_job_get("commit");
+assert(job);
 run_block_job(job, _err);
 if (local_err) {
 goto unref_backing;
-- 
1.8.3.1




[Qemu-devel] [PATCH v4 6/8] block: dump_qlist() may dereference a Null pointer

2018-10-19 Thread Liam Merwick
A NULL 'list' passed into function dump_qlist() isn't correctly
validated and can be passed to qlist_first() where it is dereferenced.

Given that dump_qlist() is static, and callers already do the right
thing, just add an assert to catch future potential bugs (plus the
added benefit of suppressing a warning from a static analysis tool
and removing this noise will help us better find real issues).

Signed-off-by: Liam Merwick 
Reviewed-by: Eric Blake 
---
 block/qapi.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/block/qapi.c b/block/qapi.c
index c66f949db839..e81be604217c 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -740,6 +740,8 @@ static void dump_qlist(fprintf_function func_fprintf, void 
*f, int indentation,
 const QListEntry *entry;
 int i = 0;
 
+assert(list);
+
 for (entry = qlist_first(list); entry; entry = qlist_next(entry), i++) {
 QType type = qobject_type(entry->value);
 bool composite = (type == QTYPE_QDICT || type == QTYPE_QLIST);
-- 
1.8.3.1




[Qemu-devel] [PATCH v4 2/8] job: Fix off-by-one assert checks for JobSTT and JobVerbTable

2018-10-19 Thread Liam Merwick
In the assert checking the array dereference of JobVerbTable[verb]
in job_apply_verb() the check of the index, verb, allows an overrun
because an index equal to the array size is permitted.

Similarly, in the assert check of JobSTT[s0][s1] with index s1
in job_state_transition(), an off-by-one overrun is not flagged
either.

This is not a run-time issue as there are no callers actually
passing in the max value.

Signed-off-by: Liam Merwick 
Reviewed-by: Darren Kenny 
Reviewed-by: Mark Kanda 
Reviewed-by: Eric Blake 
Reviewed-by: John Snow 
---
 job.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/job.c b/job.c
index c65e01bbfa34..da8e4b7bf2f3 100644
--- a/job.c
+++ b/job.c
@@ -159,7 +159,7 @@ bool job_is_internal(Job *job)
 static void job_state_transition(Job *job, JobStatus s1)
 {
 JobStatus s0 = job->status;
-assert(s1 >= 0 && s1 <= JOB_STATUS__MAX);
+assert(s1 >= 0 && s1 < JOB_STATUS__MAX);
 trace_job_state_transition(job, job->ret,
JobSTT[s0][s1] ? "allowed" : "disallowed",
JobStatus_str(s0), JobStatus_str(s1));
@@ -174,7 +174,7 @@ static void job_state_transition(Job *job, JobStatus s1)
 int job_apply_verb(Job *job, JobVerb verb, Error **errp)
 {
 JobStatus s0 = job->status;
-assert(verb >= 0 && verb <= JOB_VERB__MAX);
+assert(verb >= 0 && verb < JOB_VERB__MAX);
 trace_job_apply_verb(job, JobStatus_str(s0), JobVerb_str(verb),
  JobVerbTable[verb][s0] ? "allowed" : "prohibited");
 if (JobVerbTable[verb][s0]) {
-- 
1.8.3.1




[Qemu-devel] [PATCH v4 3/8] block: Null pointer dereference in blk_root_get_parent_desc()

2018-10-19 Thread Liam Merwick
The dev_id returned by the call to blk_get_attached_dev_id() in
blk_root_get_parent_desc() can be NULL (an internal call to
object_get_canonical_path may have returned NULL).

Instead of just checking this case before before dereferencing,
adjust blk_get_attached_dev_id() to return the empty string if no
object path can be found (similar to the case when blk->dev is NULL
and an empty string is returned).

Signed-off-by: Liam Merwick 
---
 block/block-backend.c | 6 +-
 dtc   | 2 +-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index dc0cd5772413..e628920f3cd8 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -909,6 +909,7 @@ void *blk_get_attached_dev(BlockBackend *blk)
 char *blk_get_attached_dev_id(BlockBackend *blk)
 {
 DeviceState *dev;
+char *dev_id;
 
 assert(!blk->legacy_dev);
 dev = blk->dev;
@@ -918,7 +919,10 @@ char *blk_get_attached_dev_id(BlockBackend *blk)
 } else if (dev->id) {
 return g_strdup(dev->id);
 }
-return object_get_canonical_path(OBJECT(dev));
+
+dev_id = object_get_canonical_path(OBJECT(dev));
+
+return dev_id ? dev_id : g_strdup("");
 }
 
 /*
diff --git a/dtc b/dtc
index 88f18909db73..e54388015af1 16
--- a/dtc
+++ b/dtc
@@ -1 +1 @@
-Subproject commit 88f18909db731a627456f26d779445f84e449536
+Subproject commit e54388015af1fb4bf04d0bca99caba1074d9cc42
-- 
1.8.3.1




[Qemu-devel] [PATCH v4 0/8] off-by-one and NULL pointer accesses detected by static analysis

2018-10-19 Thread Liam Merwick
Below are a number of fixes to some off-by-one, read outside array bounds, and
NULL pointer accesses detected by an internal Oracle static analysis tool 
(Parfait).
https://labs.oracle.com/pls/apex/f?p=labs:49:P49_PROJECT_ID:13

I have also included a patch to add a command-line option to configure to
select if AVX2 is used or not (keeping the existing behaviour by default).
My motivation was avoiding an issue with the static analysis tool but NetSpectre
was announced as I was working on this and I felt it may have more general uses.

v1 -> v2
Based on feedback from Eric Blake:
patch2: reworded commit message to clarify issue
patch6: Reverted common qlist routines and added assert to qlist_dump instead
patch7: Fixed incorrect logic
patch8: Added QEMU_BUILD_BUG_ON to catch future іnstance at compile-time

v2 -> v3
Based on feedback from Eric Blake:
patch6: removed double space from commit message
patch8: removed unnecessary comment and updated QEMU_BUILD_BUG_ON to use 
ARRAY_SIZE
Added Eric's R-b to patches 6,7,8

v3 -> v4
Based on feedback from Max Reitz:
patch2: Added R-b from John Snow
patch3: fixed blk_get_attached_dev_id() instead of checking return value
patch4: switched to assert()
patch5: numerous changes based on feedback from Max
patch6: updated commit message
patch7: (was patch8): Added Max's R-b
patch8: (new): patch fixing NULL pointer dereference in kvm_arch_init_vcpu()

I also dropped the 'io: potential unnecessary check in 
qio_channel_command_new_spawn()'
patch from v3 - it was correct but of no benefit to staic analysis checking

Liam Merwick (8):
  configure: Provide option to explicitly disable AVX2
  job: Fix off-by-one assert checks for JobSTT and JobVerbTable
  block: Null pointer dereference in blk_root_get_parent_desc()
  qemu-img: assert block_job_get() does not return NULL in img_commit()
  block: Fix potential Null pointer dereferences in vvfat.c
  block: dump_qlist() may dereference a Null pointer
  qcow2: Read outside array bounds in qcow2_pre_write_overlap_check()
  kvm: Potential NULL pointer dereference in kvm_arch_init_vcpu()

 block/block-backend.c  |  6 +-
 block/qapi.c   |  2 ++
 block/qcow2-refcount.c | 18 ++
 block/vvfat.c  | 33 -
 configure  | 11 +--
 dtc|  2 +-
 job.c  |  4 ++--
 qemu-img.c |  1 +
 target/i386/kvm.c  |  4 +++-
 9 files changed, 61 insertions(+), 20 deletions(-)

-- 
1.8.3.1




Re: [Qemu-devel] [PATCH v3 6/8] block: dump_qlist() may dereference a Null pointer

2018-10-19 Thread Liam Merwick




On 12/10/18 16:22, Max Reitz wrote:

On 31.08.18 20:16, Liam Merwick wrote:

A NULL 'list' passed into function dump_qlist() isn't correctly
validated and can be passed to qlist_first() where it is dereferenced.

Given that dump_qlist() is static, and callers already do the right
thing, just add an assert to catch future potential bugs.

Signed-off-by: Liam Merwick 
Reviewed-by: Eric Blake 
---
  block/qapi.c | 2 ++
  1 file changed, 2 insertions(+)


I don't disagree, but I don't see why the program just wouldn't crash if
someone passed a NULL pointer.  And I don't quite see why anyone would
pass a NULL pointer.

Of course it's reasonable to just add an assert() to reinforce the
contract; but we have so many functions that just take a pointer that
they assume to be non-NULL and then immediately dereference it.  Nearly
every blk_* function takes a BlockBackend that is always assumed to be
non-NULL, for instance, and I don't really want to put assert()s into
all of them.  Or another example: dump_qobject() and dump_qdict() do
exactly the same -- if we added an assertion in dump_qlist(), we would
actually have to add the very same assertions there, too.

So I don't really object this patch (because it's not wrong), but I
don't think it's very useful.



I agree with all the above - however I kept the patch in the series 
given it was helping reduce the static analysis noise (hopefully making 
it easier to spot real issues)


Regards,
Liam




Max


diff --git a/block/qapi.c b/block/qapi.c
index c66f949db839..e81be604217c 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -740,6 +740,8 @@ static void dump_qlist(fprintf_function func_fprintf, void 
*f, int indentation,
  const QListEntry *entry;
  int i = 0;
  
+assert(list);

+
  for (entry = qlist_first(list); entry; entry = qlist_next(entry), i++) {
  QType type = qobject_type(entry->value);
  bool composite = (type == QTYPE_QDICT || type == QTYPE_QLIST);








Re: [Qemu-devel] [PATCH v3 4/8] qemu-img: potential Null pointer deref in img_commit()

2018-10-19 Thread Liam Merwick




On 12/10/18 15:51, Max Reitz wrote:

On 31.08.18 20:16, Liam Merwick wrote:

The function block_job_get() may return NULL so before dereferencing
the 'job' pointer in img_commit() it should be checked.


It may not because the job yields before executing anything (if it
started successfully; but otherwise, commit_active_start() would have
returned an error).  Therefore, I think the better solution is to
assert(job) here.




Switched patch to use assert()

Regards,
Liam



(It would be a serious bug if block_job_get() returned NULL here, so
it's definitely not something we can be quiet about.  But this patch
makes it so the user doesn't even notice.)

Max


Signed-off-by: Liam Merwick 
Reviewed-by: Darren Kenny 
Reviewed-by: Mark Kanda 
---
  qemu-img.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/qemu-img.c b/qemu-img.c
index b12f4cd19b0a..51fe09bd08ed 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1029,6 +1029,9 @@ static int img_commit(int argc, char **argv)
  }
  
  job = block_job_get("commit");

+if (job == NULL) {
+goto unref_backing;
+}
  run_block_job(job, _err);
  if (local_err) {
  goto unref_backing;








Re: [Qemu-devel] [PATCH v3 5/8] block: Fix potential Null pointer dereferences in vvfat.c

2018-10-19 Thread Liam Merwick




On 12/10/18 16:14, Max Reitz wrote:

On 31.08.18 20:16, Liam Merwick wrote:

The calls to bdrv_new_open_driver(), find_mapping_for_cluster(),
and array_get_next() may return NULL but it isn't always checked for
before dereferencing the value returned.

Signed-off-by: Liam Merwick 
Reviewed-by: Darren Kenny 
Reviewed-by: Mark Kanda 
---
  block/vvfat.c | 56 
  1 file changed, 56 insertions(+)

diff --git a/block/vvfat.c b/block/vvfat.c
index fc41841a5c3c..0f1f10a2f94b 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -448,6 +448,9 @@ static direntry_t *create_long_filename(BDRVVVFATState *s, 
const char *filename)
  
  for(i=0;i
  entry=array_get_next(&(s->directory));
+if (entry == NULL) {
+continue;
+}


This is a bug in array_ensure_allocated().  It uses g_realloc() with a
non-zero size, so that function will never return NULL.  It will rather
abort().

Therefore, array_ensure_allocated() cannot fail.  Consequentially,
array_get_next() cannot fail.




I've reverted this (and the rest of the 'As above' comments below)



  entry->attributes=0xf;
  entry->reserved[0]=0;
  entry->begin=0;
@@ -665,6 +668,9 @@ static inline void fat_set(BDRVVVFATState* s,unsigned int 
cluster,uint32_t value
  } else {
  int offset = (cluster*3/2);
  unsigned char* p = array_get(&(s->fat), offset);
+if (p == NULL) {
+return;
+}


This is only reached if array_get_next() was called before.  Therefore,
this cannot return NULL.

However, an assert(array->pointer); in array_get() can't hurt.



Done.




  switch (cluster&1) {
  case 0:
  p[0] = value&0xff;
@@ -730,6 +736,9 @@ static inline direntry_t* 
create_short_and_long_name(BDRVVVFATState* s,
  
  if(is_dot) {

  entry=array_get_next(&(s->directory));
+if (entry == NULL) {
+return NULL;
+}


As above.


  memset(entry->name, 0x20, sizeof(entry->name));
  memcpy(entry->name,filename,strlen(filename));
  return entry;
@@ -844,6 +853,12 @@ static int read_directory(BDRVVVFATState* s, int 
mapping_index)
  /* create mapping for this file */
  if(!is_dot && !is_dotdot && (S_ISDIR(st.st_mode) || st.st_size)) {
  s->current_mapping = array_get_next(&(s->mapping));
+if (s->current_mapping == NULL) {
+fprintf(stderr, "Failed to create mapping for file\n");
+g_free(buffer);
+closedir(dir);
+return -2;
+}


As above.


  s->current_mapping->begin=0;
  s->current_mapping->end=st.st_size;
  /*
@@ -941,6 +956,9 @@ static int init_directories(BDRVVVFATState* s,
  /* add volume label */
  {
  direntry_t* entry=array_get_next(&(s->directory));
+if (entry == NULL) {
+return -1;
+}


As above.


  entry->attributes=0x28; /* archive | volume label */
  memcpy(entry->name, s->volume_label, sizeof(entry->name));
  }
@@ -953,6 +971,9 @@ static int init_directories(BDRVVVFATState* s,
  s->cluster_count=sector2cluster(s, s->sector_count);
  
  mapping = array_get_next(&(s->mapping));

+if (mapping == NULL) {
+return -1;
+}


As above.


  mapping->begin = 0;
  mapping->dir_index = 0;
  mapping->info.dir.parent_mapping_index = -1;
@@ -1630,6 +1651,9 @@ static void schedule_rename(BDRVVVFATState* s,
  uint32_t cluster, char* new_path)
  {
  commit_t* commit = array_get_next(&(s->commits));
+if (commit == NULL) {
+return;
+}


As above.


  commit->path = new_path;
  commit->param.rename.cluster = cluster;
  commit->action = ACTION_RENAME;
@@ -1639,6 +1663,9 @@ static void schedule_writeout(BDRVVVFATState* s,
  int dir_index, uint32_t modified_offset)
  {
  commit_t* commit = array_get_next(&(s->commits));
+if (commit == NULL) {
+return;
+}


As above.


  commit->path = NULL;
  commit->param.writeout.dir_index = dir_index;
  commit->param.writeout.modified_offset = modified_offset;
@@ -1649,6 +1676,9 @@ static void schedule_new_file(BDRVVVFATState* s,
  char* path, uint32_t first_cluster)
  {
  commit_t* commit = array_get_next(&(s->commits));
+if (commit == NULL) {
+return;
+}


As above.


  commit->path = path;
  commit->param.new_file.first_cluster = first_cluster;
  commit->action = ACTION_NEW_FILE;
@@ -1657,6 +1687,9 @@ static void schedule_new_file(BDRVVVFATState* s,
  static void schedule_mkdir(BDRVVVFATState* s, uint32_t cluster, char* path)
  {
  commit_t* commit =

Re: [Qemu-devel] [PATCH v3 3/8] block: Null pointer dereference in blk_root_get_parent_desc()

2018-10-19 Thread Liam Merwick




On 12/10/18 15:48, Max Reitz wrote:

Hi,

On 31.08.18 20:16, Liam Merwick wrote:

The dev_id returned by the call to blk_get_attached_dev_id() in
blk_root_get_parent_desc() can be NULL (an internal call to
object_get_canonical_path may have returned NULL) so it should
be checked before dereferencing.

Signed-off-by: Liam Merwick 
Reviewed-by: Darren Kenny 
Reviewed-by: Mark Kanda 
---
  block/block-backend.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index fa120630be83..210eee75006a 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -136,7 +136,7 @@ static char *blk_root_get_parent_desc(BdrvChild *child)
  }
  
  dev_id = blk_get_attached_dev_id(blk);

-if (*dev_id) {
+if (dev_id && *dev_id) {
  return dev_id;


I rather think that blk_get_attached_dev_id() needs attention first.  It
returns an explicitly empty string if blk->dev is NULL.  If NULL was a
valid return value, it should just return NULL there.

Besides this caller, there are two callers that pass the dev_id to
qapi_event_send_device_tray_moved().  Now in practice that allows the
string to be NULL, but there is a comment in visit_type_str() that says
one should not pass NULL.

So it's either changing blk_get_attached_dev_id() to return NULL when
there is no valid ID (instead of the empty string, and then we could
save ourselves the check "*dev_id" here and elsewhere), but then we have
to fix all callers.

Or we make it return an empty string if object_get_canonical_path()
returned NULL.



I went with the latter and now (in upcoming v4) check the return value 
from object_get_canonical_path() and return an empty string if it's NULL.


Regards,
Liam



Max


  } else {
  /* TODO Callback into the BB owner for something more detailed */








Re: [Qemu-devel] [PATCH] linux-user: Suppress address-of-packed-member warnings in __get/put_user_e

2018-09-28 Thread Liam Merwick



On 28/09/18 15:25, Peter Maydell wrote:

Our __get_user_e() and __put_user_e() macros cause newer versions
of clang to generate false-positive -Waddress-of-packed-member
warnings if they are passed the address of a member of a packed
struct (see https://bugs.llvm.org/show_bug.cgi?id=39113).
Suppress these using the _Pragma() operator.

To put in the pragmas we need to convert the macros from
expressions to statements, but all the callsites effectively
treat them as statements already so this is OK.

Signed-off-by: Peter Maydell 


Reviewed-by: Liam Merwick 


---
  linux-user/qemu.h | 57 +++
  1 file changed, 38 insertions(+), 19 deletions(-)

diff --git a/linux-user/qemu.h b/linux-user/qemu.h
index b4959e41c6e..56c4f2d5374 100644
--- a/linux-user/qemu.h
+++ b/linux-user/qemu.h
@@ -461,27 +461,46 @@ static inline int access_ok(int type, abi_ulong addr, 
abi_ulong size)
 These are usually used to access struct data members once the struct has
 been locked - usually with lock_user_struct.  */
  
-/* Tricky points:

-   - Use __builtin_choose_expr to avoid type promotion from ?:,
-   - Invalid sizes result in a compile time error stemming from
- the fact that abort has no parameters.
-   - It's easier to use the endian-specific unaligned load/store
- functions than host-endian unaligned load/store plus tswapN.  */
+/*
+ * Tricky points:
+ * - Use __builtin_choose_expr to avoid type promotion from ?:,
+ * - Invalid sizes result in a compile time error stemming from
+ *   the fact that abort has no parameters.
+ * - It's easier to use the endian-specific unaligned load/store
+ *   functions than host-endian unaligned load/store plus tswapN.
+ * - The pragmas are necessary only to silence a clang false-positive
+ *   warning: see https://bugs.llvm.org/show_bug.cgi?id=39113 .
+ * - We have to disable -Wpragmas warnings to avoid a complaint about
+ *   an unknown warning type from older compilers that don't know about
+ *   -Waddress-of-packed-member.
+ */
+#define __put_user_e(x, hptr, e)\
+do {\
+_Pragma("GCC diagnostic push"); \
+_Pragma("GCC diagnostic ignored \"-Wpragmas\"");\
+_Pragma("GCC diagnostic ignored \"-Waddress-of-packed-member\"");   \
+(__builtin_choose_expr(sizeof(*(hptr)) == 1, stb_p, \
+__builtin_choose_expr(sizeof(*(hptr)) == 2, stw_##e##_p,\
+__builtin_choose_expr(sizeof(*(hptr)) == 4, stl_##e##_p,\
+__builtin_choose_expr(sizeof(*(hptr)) == 8, stq_##e##_p, abort  \
+((hptr), (x)), (void)0);\
+_Pragma("GCC diagnostic pop");  \
+} while (0)
  
-#define __put_user_e(x, hptr, e)\

-  (__builtin_choose_expr(sizeof(*(hptr)) == 1, stb_p,   \
-   __builtin_choose_expr(sizeof(*(hptr)) == 2, stw_##e##_p, \
-   __builtin_choose_expr(sizeof(*(hptr)) == 4, stl_##e##_p, \
-   __builtin_choose_expr(sizeof(*(hptr)) == 8, stq_##e##_p, abort   \
- ((hptr), (x)), (void)0)
+#define __get_user_e(x, hptr, e)\
+do {\
+_Pragma("GCC diagnostic push"); \
+_Pragma("GCC diagnostic ignored \"-Wpragmas\"");\
+_Pragma("GCC diagnostic ignored \"-Waddress-of-packed-member\"");   \
+((x) = (typeof(*hptr))( \
+__builtin_choose_expr(sizeof(*(hptr)) == 1, ldub_p, \
+__builtin_choose_expr(sizeof(*(hptr)) == 2, lduw_##e##_p,   \
+__builtin_choose_expr(sizeof(*(hptr)) == 4, ldl_##e##_p,\
+__builtin_choose_expr(sizeof(*(hptr)) == 8, ldq_##e##_p, abort  \
+(hptr)), (void)0);  \
+_Pragma("GCC diagnostic pop");  \
+} while (0)
  
-#define __get_user_e(x, hptr, e)\

-  ((x) = (typeof(*hptr))(   \
-   __builtin_choose_expr(sizeof(*(hptr)) == 1, ldub_p,  \
-   __builtin_choose_expr(sizeof(*(hptr)) == 2, lduw_##e##_p,\
-   __builtin_choose_expr(sizeof(*(hptr)) == 4, ldl_##e##_p, \
-   __builtin_choose_expr(sizeof(*(hptr)) == 8, ldq_##e##_p, abort   \
- (hptr)), (void)0)
  
  #ifdef TARGET_WORDS_BIGENDIAN

  # define __put_user(x, hptr)  __put_user_e(x, hptr, be)





[Qemu-devel] [PATCH v3 6/8] block: dump_qlist() may dereference a Null pointer

2018-08-31 Thread Liam Merwick
A NULL 'list' passed into function dump_qlist() isn't correctly
validated and can be passed to qlist_first() where it is dereferenced.

Given that dump_qlist() is static, and callers already do the right
thing, just add an assert to catch future potential bugs.

Signed-off-by: Liam Merwick 
Reviewed-by: Eric Blake 
---
 block/qapi.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/block/qapi.c b/block/qapi.c
index c66f949db839..e81be604217c 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -740,6 +740,8 @@ static void dump_qlist(fprintf_function func_fprintf, void 
*f, int indentation,
 const QListEntry *entry;
 int i = 0;
 
+assert(list);
+
 for (entry = qlist_first(list); entry; entry = qlist_next(entry), i++) {
 QType type = qobject_type(entry->value);
 bool composite = (type == QTYPE_QDICT || type == QTYPE_QLIST);
-- 
1.8.3.1




[Qemu-devel] [PATCH v3 8/8] qcow2: Read outside array bounds in qcow2_pre_write_overlap_check()

2018-08-31 Thread Liam Merwick
The commit for 0e4e4318eaa5 increments QCOW2_OL_MAX_BITNR but does not
add an array entry for QCOW2_OL_BITMAP_DIRECTORY_BITNR to metadata_ol_names[].
As a result, an array dereference of metadata_ol_names[8] in
qcow2_pre_write_overlap_check() could result in a read outside of the array 
bounds.

Fixes: 0e4e4318eaa5 ('qcow2: add overlap check for bitmap directory')

Cc: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Liam Merwick 
Reviewed-by: Eric Blake 
---
 block/qcow2-refcount.c | 18 ++
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 3c539f02e5ec..46082aeac1d6 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -2719,15 +2719,17 @@ int qcow2_check_metadata_overlap(BlockDriverState *bs, 
int ign, int64_t offset,
 }
 
 static const char *metadata_ol_names[] = {
-[QCOW2_OL_MAIN_HEADER_BITNR]= "qcow2_header",
-[QCOW2_OL_ACTIVE_L1_BITNR]  = "active L1 table",
-[QCOW2_OL_ACTIVE_L2_BITNR]  = "active L2 table",
-[QCOW2_OL_REFCOUNT_TABLE_BITNR] = "refcount table",
-[QCOW2_OL_REFCOUNT_BLOCK_BITNR] = "refcount block",
-[QCOW2_OL_SNAPSHOT_TABLE_BITNR] = "snapshot table",
-[QCOW2_OL_INACTIVE_L1_BITNR]= "inactive L1 table",
-[QCOW2_OL_INACTIVE_L2_BITNR]= "inactive L2 table",
+[QCOW2_OL_MAIN_HEADER_BITNR]= "qcow2_header",
+[QCOW2_OL_ACTIVE_L1_BITNR]  = "active L1 table",
+[QCOW2_OL_ACTIVE_L2_BITNR]  = "active L2 table",
+[QCOW2_OL_REFCOUNT_TABLE_BITNR] = "refcount table",
+[QCOW2_OL_REFCOUNT_BLOCK_BITNR] = "refcount block",
+[QCOW2_OL_SNAPSHOT_TABLE_BITNR] = "snapshot table",
+[QCOW2_OL_INACTIVE_L1_BITNR]= "inactive L1 table",
+[QCOW2_OL_INACTIVE_L2_BITNR]= "inactive L2 table",
+[QCOW2_OL_BITMAP_DIRECTORY_BITNR]   = "bitmap directory",
 };
+QEMU_BUILD_BUG_ON(QCOW2_OL_MAX_BITNR != ARRAY_SIZE(metadata_ol_names));
 
 /*
  * First performs a check for metadata overlaps (through
-- 
1.8.3.1




[Qemu-devel] [PATCH v3 5/8] block: Fix potential Null pointer dereferences in vvfat.c

2018-08-31 Thread Liam Merwick
The calls to bdrv_new_open_driver(), find_mapping_for_cluster(),
and array_get_next() may return NULL but it isn't always checked for
before dereferencing the value returned.

Signed-off-by: Liam Merwick 
Reviewed-by: Darren Kenny 
Reviewed-by: Mark Kanda 
---
 block/vvfat.c | 56 
 1 file changed, 56 insertions(+)

diff --git a/block/vvfat.c b/block/vvfat.c
index fc41841a5c3c..0f1f10a2f94b 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -448,6 +448,9 @@ static direntry_t *create_long_filename(BDRVVVFATState *s, 
const char *filename)
 
 for(i=0;idirectory));
+if (entry == NULL) {
+continue;
+}
 entry->attributes=0xf;
 entry->reserved[0]=0;
 entry->begin=0;
@@ -665,6 +668,9 @@ static inline void fat_set(BDRVVVFATState* s,unsigned int 
cluster,uint32_t value
 } else {
 int offset = (cluster*3/2);
 unsigned char* p = array_get(&(s->fat), offset);
+if (p == NULL) {
+return;
+}
 switch (cluster&1) {
 case 0:
 p[0] = value&0xff;
@@ -730,6 +736,9 @@ static inline direntry_t* 
create_short_and_long_name(BDRVVVFATState* s,
 
 if(is_dot) {
 entry=array_get_next(&(s->directory));
+if (entry == NULL) {
+return NULL;
+}
 memset(entry->name, 0x20, sizeof(entry->name));
 memcpy(entry->name,filename,strlen(filename));
 return entry;
@@ -844,6 +853,12 @@ static int read_directory(BDRVVVFATState* s, int 
mapping_index)
 /* create mapping for this file */
 if(!is_dot && !is_dotdot && (S_ISDIR(st.st_mode) || st.st_size)) {
 s->current_mapping = array_get_next(&(s->mapping));
+if (s->current_mapping == NULL) {
+fprintf(stderr, "Failed to create mapping for file\n");
+g_free(buffer);
+closedir(dir);
+return -2;
+}
 s->current_mapping->begin=0;
 s->current_mapping->end=st.st_size;
 /*
@@ -941,6 +956,9 @@ static int init_directories(BDRVVVFATState* s,
 /* add volume label */
 {
 direntry_t* entry=array_get_next(&(s->directory));
+if (entry == NULL) {
+return -1;
+}
 entry->attributes=0x28; /* archive | volume label */
 memcpy(entry->name, s->volume_label, sizeof(entry->name));
 }
@@ -953,6 +971,9 @@ static int init_directories(BDRVVVFATState* s,
 s->cluster_count=sector2cluster(s, s->sector_count);
 
 mapping = array_get_next(&(s->mapping));
+if (mapping == NULL) {
+return -1;
+}
 mapping->begin = 0;
 mapping->dir_index = 0;
 mapping->info.dir.parent_mapping_index = -1;
@@ -1630,6 +1651,9 @@ static void schedule_rename(BDRVVVFATState* s,
 uint32_t cluster, char* new_path)
 {
 commit_t* commit = array_get_next(&(s->commits));
+if (commit == NULL) {
+return;
+}
 commit->path = new_path;
 commit->param.rename.cluster = cluster;
 commit->action = ACTION_RENAME;
@@ -1639,6 +1663,9 @@ static void schedule_writeout(BDRVVVFATState* s,
 int dir_index, uint32_t modified_offset)
 {
 commit_t* commit = array_get_next(&(s->commits));
+if (commit == NULL) {
+return;
+}
 commit->path = NULL;
 commit->param.writeout.dir_index = dir_index;
 commit->param.writeout.modified_offset = modified_offset;
@@ -1649,6 +1676,9 @@ static void schedule_new_file(BDRVVVFATState* s,
 char* path, uint32_t first_cluster)
 {
 commit_t* commit = array_get_next(&(s->commits));
+if (commit == NULL) {
+return;
+}
 commit->path = path;
 commit->param.new_file.first_cluster = first_cluster;
 commit->action = ACTION_NEW_FILE;
@@ -1657,6 +1687,9 @@ static void schedule_new_file(BDRVVVFATState* s,
 static void schedule_mkdir(BDRVVVFATState* s, uint32_t cluster, char* path)
 {
 commit_t* commit = array_get_next(&(s->commits));
+if (commit == NULL) {
+return;
+}
 commit->path = path;
 commit->param.mkdir.cluster = cluster;
 commit->action = ACTION_MKDIR;
@@ -2261,6 +2294,9 @@ static mapping_t* insert_mapping(BDRVVVFATState* s,
 }
 if (index >= s->mapping.next || mapping->begin > begin) {
 mapping = array_insert(&(s->mapping), index, 1);
+if (mapping == NULL) {
+return NULL;
+}
 mapping->path = NULL;
 adjust_mapping_indices(s, index, +1);
 }
@@ -2428,6 +2464,9 @@ static int commit_direntries(BDRVVVFATState* s,
 direntry_t* direntry = array_get(&(s->directory), dir_index);
 uint32_t first_cluster = dir_index == 0 ? 0 : begin_

[Qemu-devel] [PATCH v3 7/8] io: potential unnecessary check in qio_channel_command_new_spawn()

2018-08-31 Thread Liam Merwick
In qio_channel_command_new_spawn() the 'flags' variable is checked
to see if /dev/null should be used for stdin or stdout; first with
O_RDONLY and then O_WRONLY.  However the second check for O_WRONLY
is only needed if flags != O_RDONLY and therefore should be an
else if statement.

This minor optimization has the added benefit of suppressing a warning
from a static analysis tool (Parfait) which incorrectly reported an
incorrect checking of flags in qio_channel_command_new_spawn() could
result in an uninitialized file descriptor being used. Removing this
noise will help us better find real issues.

Signed-off-by: Liam Merwick 
Reviewed-by: Eric Blake 
---
 io/channel-command.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/io/channel-command.c b/io/channel-command.c
index 3e7eb17eff54..82acd3234915 100644
--- a/io/channel-command.c
+++ b/io/channel-command.c
@@ -61,8 +61,7 @@ qio_channel_command_new_spawn(const char *const argv[],
 
 if (flags == O_RDONLY) {
 stdinnull = true;
-}
-if (flags == O_WRONLY) {
+} else if (flags == O_WRONLY) {
 stdoutnull = true;
 }
 
-- 
1.8.3.1




[Qemu-devel] [PATCH v3 4/8] qemu-img: potential Null pointer deref in img_commit()

2018-08-31 Thread Liam Merwick
The function block_job_get() may return NULL so before dereferencing
the 'job' pointer in img_commit() it should be checked.

Signed-off-by: Liam Merwick 
Reviewed-by: Darren Kenny 
Reviewed-by: Mark Kanda 
---
 qemu-img.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/qemu-img.c b/qemu-img.c
index b12f4cd19b0a..51fe09bd08ed 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1029,6 +1029,9 @@ static int img_commit(int argc, char **argv)
 }
 
 job = block_job_get("commit");
+if (job == NULL) {
+goto unref_backing;
+}
 run_block_job(job, _err);
 if (local_err) {
 goto unref_backing;
-- 
1.8.3.1




[Qemu-devel] [PATCH v3 1/8] configure: Provide option to explicitly disable AVX2

2018-08-31 Thread Liam Merwick
The configure script detects if the compiler has AVX2 support and
automatically sets avx2_opt="yes" which in turn defines CONFIG_AVX2_OPT.
There is no way of explicitly overriding this setting so this commit adds
two command-line options: --enable-avx2 and --disable-avx2.

The default behaviour, when no option is specified, is to maintain the
current behaviour and enable AVX2 if the compiler supports it.

Signed-off-by: Liam Merwick 
Reviewed-by: Darren Kenny 
Reviewed-by: Mark Kanda 
---
 configure | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/configure b/configure
index 58862d2ae88a..8904a91715b3 100755
--- a/configure
+++ b/configure
@@ -427,7 +427,7 @@ usb_redir=""
 opengl=""
 opengl_dmabuf="no"
 cpuid_h="no"
-avx2_opt="no"
+avx2_opt=""
 zlib="yes"
 capstone=""
 lzo=""
@@ -1332,6 +1332,10 @@ for opt do
   ;;
   --disable-glusterfs) glusterfs="no"
   ;;
+  --disable-avx2) avx2_opt="no"
+  ;;
+  --enable-avx2) avx2_opt="yes"
+  ;;
   --enable-glusterfs) glusterfs="yes"
   ;;
   --disable-virtio-blk-data-plane|--enable-virtio-blk-data-plane)
@@ -1709,6 +1713,7 @@ disabled with --disable-FEATURE, default is enabled if 
available:
   libxml2 for Parallels image format
   tcmalloctcmalloc support
   jemallocjemalloc support
+  avx2AVX2 optimization support
   replication replication support
   vhost-vsock virtio sockets device support
   opengl  opengl support
@@ -5127,7 +5132,7 @@ fi
 # There is no point enabling this if cpuid.h is not usable,
 # since we won't be able to select the new routines.
 
-if test $cpuid_h = yes; then
+if test "$cpuid_h" = "yes" -a "$avx2_opt" != "no"; then
   cat > $TMPC << EOF
 #pragma GCC push_options
 #pragma GCC target("avx2")
@@ -5141,6 +5146,8 @@ int main(int argc, char *argv[]) { return bar(argv[0]); }
 EOF
   if compile_object "" ; then
 avx2_opt="yes"
+  else
+avx2_opt="no"
   fi
 fi
 
-- 
1.8.3.1




[Qemu-devel] [PATCH v3 0/8] off-by-one and NULL pointer accesses detected by static analysis

2018-08-31 Thread Liam Merwick
Below are a number of fixes to some off-by-one, read outside array bounds, and
NULL pointer accesses detected by an internal Oracle static analysis tool 
(Parfait).
https://labs.oracle.com/pls/apex/f?p=labs:49:P49_PROJECT_ID:13

I have also included a patch to add a command-line option to configure to
select if AVX2 is used or not (keeping the existing behaviour by default).
My motivation was avoiding an issue with the static analysis tool but NetSpectre
was announced as I was working on this and I felt it may have more general uses.

v1 -> v2
Based on feedback from Eric Blake:
patch2: reworded commit message to clarify issue
patch6: Reverted common qlist routines and added assert to qlist_dump instead
patch7: Fixed incorrect logic
patch8: Added QEMU_BUILD_BUG_ON to catch future іnstance at compile-time

v2 -> v3
Based on feedback from Eric Blake:
patch6: removed double space from commit message
patch8: removed unnecessary comment and updated QEMU_BUILD_BUG_ON to use 
ARRAY_SIZE
Added Eric's R-b to patches 6,7,8

Liam Merwick (8):
  configure: Provide option to explicitly disable AVX2
  job: Fix off-by-one assert checks for JobSTT and JobVerbTable
  block: Null pointer dereference in blk_root_get_parent_desc()
  qemu-img: potential Null pointer deref in img_commit()
  block: Fix potential Null pointer dereferences in vvfat.c
  block: dump_qlist() may dereference a Null pointer
  io: potential unnecessary check in qio_channel_command_new_spawn()
  qcow2: Read outside array bounds in qcow2_pre_write_overlap_check()

 block/block-backend.c  |  2 +-
 block/qapi.c   |  2 ++
 block/qcow2-refcount.c | 18 
 block/vvfat.c  | 56 ++
 configure  | 11 --
 io/channel-command.c   |  3 +--
 job.c  |  4 ++--
 qemu-img.c |  3 +++
 8 files changed, 84 insertions(+), 15 deletions(-)

-- 
1.8.3.1




[Qemu-devel] [PATCH v3 2/8] job: Fix off-by-one assert checks for JobSTT and JobVerbTable

2018-08-31 Thread Liam Merwick
In the assert checking the array dereference of JobVerbTable[verb]
in job_apply_verb() the check of the index, verb, allows an overrun
because an index equal to the array size is permitted.

Similarly, in the assert check of JobSTT[s0][s1] with index s1
in job_state_transition(), an off-by-one overrun is not flagged
either.

This is not a run-time issue as there are no callers actually
passing in the max value.

Signed-off-by: Liam Merwick 
Reviewed-by: Darren Kenny 
Reviewed-by: Mark Kanda 
Reviewed-by: Eric Blake 
---
 job.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/job.c b/job.c
index e36ebaafd81c..40320566f43b 100644
--- a/job.c
+++ b/job.c
@@ -166,7 +166,7 @@ bool job_is_internal(Job *job)
 static void job_state_transition(Job *job, JobStatus s1)
 {
 JobStatus s0 = job->status;
-assert(s1 >= 0 && s1 <= JOB_STATUS__MAX);
+assert(s1 >= 0 && s1 < JOB_STATUS__MAX);
 trace_job_state_transition(job, job->ret,
JobSTT[s0][s1] ? "allowed" : "disallowed",
JobStatus_str(s0), JobStatus_str(s1));
@@ -181,7 +181,7 @@ static void job_state_transition(Job *job, JobStatus s1)
 int job_apply_verb(Job *job, JobVerb verb, Error **errp)
 {
 JobStatus s0 = job->status;
-assert(verb >= 0 && verb <= JOB_VERB__MAX);
+assert(verb >= 0 && verb < JOB_VERB__MAX);
 trace_job_apply_verb(job, JobStatus_str(s0), JobVerb_str(verb),
  JobVerbTable[verb][s0] ? "allowed" : "prohibited");
 if (JobVerbTable[verb][s0]) {
-- 
1.8.3.1




[Qemu-devel] [PATCH v3 3/8] block: Null pointer dereference in blk_root_get_parent_desc()

2018-08-31 Thread Liam Merwick
The dev_id returned by the call to blk_get_attached_dev_id() in
blk_root_get_parent_desc() can be NULL (an internal call to
object_get_canonical_path may have returned NULL) so it should
be checked before dereferencing.

Signed-off-by: Liam Merwick 
Reviewed-by: Darren Kenny 
Reviewed-by: Mark Kanda 
---
 block/block-backend.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index fa120630be83..210eee75006a 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -136,7 +136,7 @@ static char *blk_root_get_parent_desc(BdrvChild *child)
 }
 
 dev_id = blk_get_attached_dev_id(blk);
-if (*dev_id) {
+if (dev_id && *dev_id) {
 return dev_id;
 } else {
 /* TODO Callback into the BB owner for something more detailed */
-- 
1.8.3.1




Re: [Qemu-devel] [PATCH v2 8/8] qcow2: Read outside array bounds in qcow2_pre_write_overlap_check()

2018-08-31 Thread Liam Merwick




On 31/08/18 17:53, Eric Blake wrote:

On 08/31/2018 11:36 AM, Liam Merwick wrote:

The commit for 0e4e4318eaa5 increments QCOW2_OL_MAX_BITNR but does not
add an array entry for QCOW2_OL_BITMAP_DIRECTORY_BITNR to 
metadata_ol_names[].

As a result, an array dereference of metadata_ol_names[8] in
qcow2_pre_write_overlap_check() could result in a read outside of the 
array bounds.


Fixes: 0e4e4318eaa5 ('qcow2: add overlap check for bitmap directory')

Cc: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Liam Merwick 
---
  block/qcow2-refcount.c | 26 ++
  1 file changed, 18 insertions(+), 8 deletions(-)




+
+/*
+ * Catch at compile time the case where an overlap detection bit
+ * was added to QCow2MetadataOverlap in block/qcow2.h but a
+ * corresponding entry to metadata_ol_names[] wasn't added.
+ */


I'm not sure the comment adds much value.  I'd be fine with dropping it.


+QEMU_BUILD_BUG_ON(QCOW2_OL_MAX_BITNR !=
+    (sizeof(metadata_ol_names) / sizeof(metadata_ol_names[0])));


We have a macro for that.  Spell this:

QEMU_BUILD_BUG_ON(QCOW2_OL_MAX_BITNR != ARRAY_SIZE(metadata_ol_names));

and then you can have

Reviewed-by: Eric Blake 



Thanks, I've updated those and removed the double space in patch6. Will 
be in upcoming v3


Regards,
Liam



[Qemu-devel] [PATCH v2 5/8] block: Fix potential Null pointer dereferences in vvfat.c

2018-08-31 Thread Liam Merwick
The calls to bdrv_new_open_driver(), find_mapping_for_cluster(),
and array_get_next() may return NULL but it isn't always checked for
before dereferencing the value returned.

Signed-off-by: Liam Merwick 
Reviewed-by: Darren Kenny 
Reviewed-by: Mark Kanda 
---
 block/vvfat.c | 56 
 1 file changed, 56 insertions(+)

diff --git a/block/vvfat.c b/block/vvfat.c
index fc41841a5c3c..0f1f10a2f94b 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -448,6 +448,9 @@ static direntry_t *create_long_filename(BDRVVVFATState *s, 
const char *filename)
 
 for(i=0;idirectory));
+if (entry == NULL) {
+continue;
+}
 entry->attributes=0xf;
 entry->reserved[0]=0;
 entry->begin=0;
@@ -665,6 +668,9 @@ static inline void fat_set(BDRVVVFATState* s,unsigned int 
cluster,uint32_t value
 } else {
 int offset = (cluster*3/2);
 unsigned char* p = array_get(&(s->fat), offset);
+if (p == NULL) {
+return;
+}
 switch (cluster&1) {
 case 0:
 p[0] = value&0xff;
@@ -730,6 +736,9 @@ static inline direntry_t* 
create_short_and_long_name(BDRVVVFATState* s,
 
 if(is_dot) {
 entry=array_get_next(&(s->directory));
+if (entry == NULL) {
+return NULL;
+}
 memset(entry->name, 0x20, sizeof(entry->name));
 memcpy(entry->name,filename,strlen(filename));
 return entry;
@@ -844,6 +853,12 @@ static int read_directory(BDRVVVFATState* s, int 
mapping_index)
 /* create mapping for this file */
 if(!is_dot && !is_dotdot && (S_ISDIR(st.st_mode) || st.st_size)) {
 s->current_mapping = array_get_next(&(s->mapping));
+if (s->current_mapping == NULL) {
+fprintf(stderr, "Failed to create mapping for file\n");
+g_free(buffer);
+closedir(dir);
+return -2;
+}
 s->current_mapping->begin=0;
 s->current_mapping->end=st.st_size;
 /*
@@ -941,6 +956,9 @@ static int init_directories(BDRVVVFATState* s,
 /* add volume label */
 {
 direntry_t* entry=array_get_next(&(s->directory));
+if (entry == NULL) {
+return -1;
+}
 entry->attributes=0x28; /* archive | volume label */
 memcpy(entry->name, s->volume_label, sizeof(entry->name));
 }
@@ -953,6 +971,9 @@ static int init_directories(BDRVVVFATState* s,
 s->cluster_count=sector2cluster(s, s->sector_count);
 
 mapping = array_get_next(&(s->mapping));
+if (mapping == NULL) {
+return -1;
+}
 mapping->begin = 0;
 mapping->dir_index = 0;
 mapping->info.dir.parent_mapping_index = -1;
@@ -1630,6 +1651,9 @@ static void schedule_rename(BDRVVVFATState* s,
 uint32_t cluster, char* new_path)
 {
 commit_t* commit = array_get_next(&(s->commits));
+if (commit == NULL) {
+return;
+}
 commit->path = new_path;
 commit->param.rename.cluster = cluster;
 commit->action = ACTION_RENAME;
@@ -1639,6 +1663,9 @@ static void schedule_writeout(BDRVVVFATState* s,
 int dir_index, uint32_t modified_offset)
 {
 commit_t* commit = array_get_next(&(s->commits));
+if (commit == NULL) {
+return;
+}
 commit->path = NULL;
 commit->param.writeout.dir_index = dir_index;
 commit->param.writeout.modified_offset = modified_offset;
@@ -1649,6 +1676,9 @@ static void schedule_new_file(BDRVVVFATState* s,
 char* path, uint32_t first_cluster)
 {
 commit_t* commit = array_get_next(&(s->commits));
+if (commit == NULL) {
+return;
+}
 commit->path = path;
 commit->param.new_file.first_cluster = first_cluster;
 commit->action = ACTION_NEW_FILE;
@@ -1657,6 +1687,9 @@ static void schedule_new_file(BDRVVVFATState* s,
 static void schedule_mkdir(BDRVVVFATState* s, uint32_t cluster, char* path)
 {
 commit_t* commit = array_get_next(&(s->commits));
+if (commit == NULL) {
+return;
+}
 commit->path = path;
 commit->param.mkdir.cluster = cluster;
 commit->action = ACTION_MKDIR;
@@ -2261,6 +2294,9 @@ static mapping_t* insert_mapping(BDRVVVFATState* s,
 }
 if (index >= s->mapping.next || mapping->begin > begin) {
 mapping = array_insert(&(s->mapping), index, 1);
+if (mapping == NULL) {
+return NULL;
+}
 mapping->path = NULL;
 adjust_mapping_indices(s, index, +1);
 }
@@ -2428,6 +2464,9 @@ static int commit_direntries(BDRVVVFATState* s,
 direntry_t* direntry = array_get(&(s->directory), dir_index);
 uint32_t first_cluster = dir_index == 0 ? 0 : begin_

[Qemu-devel] [PATCH v2 8/8] qcow2: Read outside array bounds in qcow2_pre_write_overlap_check()

2018-08-31 Thread Liam Merwick
The commit for 0e4e4318eaa5 increments QCOW2_OL_MAX_BITNR but does not
add an array entry for QCOW2_OL_BITMAP_DIRECTORY_BITNR to metadata_ol_names[].
As a result, an array dereference of metadata_ol_names[8] in
qcow2_pre_write_overlap_check() could result in a read outside of the array 
bounds.

Fixes: 0e4e4318eaa5 ('qcow2: add overlap check for bitmap directory')

Cc: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Liam Merwick 
---
 block/qcow2-refcount.c | 26 ++
 1 file changed, 18 insertions(+), 8 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 3c539f02e5ec..fb0de187cfd2 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -2719,16 +2719,26 @@ int qcow2_check_metadata_overlap(BlockDriverState *bs, 
int ign, int64_t offset,
 }
 
 static const char *metadata_ol_names[] = {
-[QCOW2_OL_MAIN_HEADER_BITNR]= "qcow2_header",
-[QCOW2_OL_ACTIVE_L1_BITNR]  = "active L1 table",
-[QCOW2_OL_ACTIVE_L2_BITNR]  = "active L2 table",
-[QCOW2_OL_REFCOUNT_TABLE_BITNR] = "refcount table",
-[QCOW2_OL_REFCOUNT_BLOCK_BITNR] = "refcount block",
-[QCOW2_OL_SNAPSHOT_TABLE_BITNR] = "snapshot table",
-[QCOW2_OL_INACTIVE_L1_BITNR]= "inactive L1 table",
-[QCOW2_OL_INACTIVE_L2_BITNR]= "inactive L2 table",
+[QCOW2_OL_MAIN_HEADER_BITNR]= "qcow2_header",
+[QCOW2_OL_ACTIVE_L1_BITNR]  = "active L1 table",
+[QCOW2_OL_ACTIVE_L2_BITNR]  = "active L2 table",
+[QCOW2_OL_REFCOUNT_TABLE_BITNR] = "refcount table",
+[QCOW2_OL_REFCOUNT_BLOCK_BITNR] = "refcount block",
+[QCOW2_OL_SNAPSHOT_TABLE_BITNR] = "snapshot table",
+[QCOW2_OL_INACTIVE_L1_BITNR]= "inactive L1 table",
+[QCOW2_OL_INACTIVE_L2_BITNR]= "inactive L2 table",
+[QCOW2_OL_BITMAP_DIRECTORY_BITNR]   = "bitmap directory",
 };
 
+
+/*
+ * Catch at compile time the case where an overlap detection bit
+ * was added to QCow2MetadataOverlap in block/qcow2.h but a
+ * corresponding entry to metadata_ol_names[] wasn't added.
+ */
+QEMU_BUILD_BUG_ON(QCOW2_OL_MAX_BITNR !=
+(sizeof(metadata_ol_names) / sizeof(metadata_ol_names[0])));
+
 /*
  * First performs a check for metadata overlaps (through
  * qcow2_check_metadata_overlap); if that fails with a negative value (error
-- 
1.8.3.1




[Qemu-devel] [PATCH v2 4/8] qemu-img: potential Null pointer deref in img_commit()

2018-08-31 Thread Liam Merwick
The function block_job_get() may return NULL so before dereferencing
the 'job' pointer in img_commit() it should be checked.

Signed-off-by: Liam Merwick 
Reviewed-by: Darren Kenny 
Reviewed-by: Mark Kanda 
---
 qemu-img.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/qemu-img.c b/qemu-img.c
index b12f4cd19b0a..51fe09bd08ed 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1029,6 +1029,9 @@ static int img_commit(int argc, char **argv)
 }
 
 job = block_job_get("commit");
+if (job == NULL) {
+goto unref_backing;
+}
 run_block_job(job, _err);
 if (local_err) {
 goto unref_backing;
-- 
1.8.3.1




[Qemu-devel] [PATCH v2 1/8] configure: Provide option to explicitly disable AVX2

2018-08-31 Thread Liam Merwick
The configure script detects if the compiler has AVX2 support and
automatically sets avx2_opt="yes" which in turn defines CONFIG_AVX2_OPT.
There is no way of explicitly overriding this setting so this commit adds
two command-line options: --enable-avx2 and --disable-avx2.

The default behaviour, when no option is specified, is to maintain the
current behaviour and enable AVX2 if the compiler supports it.

Signed-off-by: Liam Merwick 
Reviewed-by: Darren Kenny 
Reviewed-by: Mark Kanda 
---
 configure | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/configure b/configure
index 58862d2ae88a..8904a91715b3 100755
--- a/configure
+++ b/configure
@@ -427,7 +427,7 @@ usb_redir=""
 opengl=""
 opengl_dmabuf="no"
 cpuid_h="no"
-avx2_opt="no"
+avx2_opt=""
 zlib="yes"
 capstone=""
 lzo=""
@@ -1332,6 +1332,10 @@ for opt do
   ;;
   --disable-glusterfs) glusterfs="no"
   ;;
+  --disable-avx2) avx2_opt="no"
+  ;;
+  --enable-avx2) avx2_opt="yes"
+  ;;
   --enable-glusterfs) glusterfs="yes"
   ;;
   --disable-virtio-blk-data-plane|--enable-virtio-blk-data-plane)
@@ -1709,6 +1713,7 @@ disabled with --disable-FEATURE, default is enabled if 
available:
   libxml2 for Parallels image format
   tcmalloctcmalloc support
   jemallocjemalloc support
+  avx2AVX2 optimization support
   replication replication support
   vhost-vsock virtio sockets device support
   opengl  opengl support
@@ -5127,7 +5132,7 @@ fi
 # There is no point enabling this if cpuid.h is not usable,
 # since we won't be able to select the new routines.
 
-if test $cpuid_h = yes; then
+if test "$cpuid_h" = "yes" -a "$avx2_opt" != "no"; then
   cat > $TMPC << EOF
 #pragma GCC push_options
 #pragma GCC target("avx2")
@@ -5141,6 +5146,8 @@ int main(int argc, char *argv[]) { return bar(argv[0]); }
 EOF
   if compile_object "" ; then
 avx2_opt="yes"
+  else
+avx2_opt="no"
   fi
 fi
 
-- 
1.8.3.1




[Qemu-devel] [PATCH v2 6/8] block: dump_qlist() may dereference a Null pointer

2018-08-31 Thread Liam Merwick
A NULL 'list' passed into function dump_qlist() isn't correctly
validated and can be passed to qlist_first() where it is dereferenced.

Given that  dump_qlist() is static, and callers already do the right
thing, just add an assert to catch future potential bugs.

Signed-off-by: Liam Merwick 
---
 block/qapi.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/block/qapi.c b/block/qapi.c
index c66f949db839..e81be604217c 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -740,6 +740,8 @@ static void dump_qlist(fprintf_function func_fprintf, void 
*f, int indentation,
 const QListEntry *entry;
 int i = 0;
 
+assert(list);
+
 for (entry = qlist_first(list); entry; entry = qlist_next(entry), i++) {
 QType type = qobject_type(entry->value);
 bool composite = (type == QTYPE_QDICT || type == QTYPE_QLIST);
-- 
1.8.3.1




[Qemu-devel] [PATCH v2 0/8] off-by-one and NULL pointer accesses detected by static analysis

2018-08-31 Thread Liam Merwick
Below are a number of fixes to some off-by-one, read outside array bounds, and
NULL pointer accesses detected by an internal Oracle static analysis tool 
(Parfait).
https://labs.oracle.com/pls/apex/f?p=labs:49:P49_PROJECT_ID:13

I have also included a patch to add a command-line option to configure to
select if AVX2 is used or not (keeping the existing behaviour by default).
My motivation was avoiding an issue with the static analysis tool but NetSpectre
was announced as I was working on this and I felt it may have more general uses.

v1 -> v2
Based on feedback from Eric Blake:
patch2: reworded commit message to clarify issue
patch6: Reverted common qlist routines and added assert to qlist_dump instead
patch7: Fixed incorrect logic
patch8: Added QEMU_BUILD_BUG_ON to catch future іnstance at compile-time

Liam Merwick (8):
  configure: Provide option to explicitly disable AVX2
  job: Fix off-by-one assert checks for JobSTT and JobVerbTable
  block: Null pointer dereference in blk_root_get_parent_desc()
  qemu-img: potential Null pointer deref in img_commit()
  block: Fix potential Null pointer dereferences in vvfat.c
  block: dump_qlist() may dereference a Null pointer
  io: potential unnecessary check in qio_channel_command_new_spawn()
  qcow2: Read outside array bounds in qcow2_pre_write_overlap_check()

 block/block-backend.c  |  2 +-
 block/qapi.c   |  2 ++
 block/qcow2-refcount.c | 26 +++
 block/vvfat.c  | 56 ++
 configure  | 11 --
 io/channel-command.c   |  3 +--
 job.c  |  4 ++--
 qemu-img.c |  3 +++
 8 files changed, 92 insertions(+), 15 deletions(-)

-- 
1.8.3.1




[Qemu-devel] [PATCH v2 7/8] io: potential unnecessary check in qio_channel_command_new_spawn()

2018-08-31 Thread Liam Merwick
In qio_channel_command_new_spawn() the 'flags' variable is checked
to see if /dev/null should be used for stdin or stdout; first with
O_RDONLY and then O_WRONLY.  However the second check for O_WRONLY
is only needed if flags != O_RDONLY and therefore should be an
else if statement.

This minor optimization has the added benefit of suppressing a warning
from a static analysis tool (Parfait) which incorrectly reported an
incorrect checking of flags in qio_channel_command_new_spawn() could
result in an uninitialized file descriptor being used. Removing this
noise will help us better find real issues.

Signed-off-by: Liam Merwick 
---
 io/channel-command.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/io/channel-command.c b/io/channel-command.c
index 3e7eb17eff54..82acd3234915 100644
--- a/io/channel-command.c
+++ b/io/channel-command.c
@@ -61,8 +61,7 @@ qio_channel_command_new_spawn(const char *const argv[],
 
 if (flags == O_RDONLY) {
 stdinnull = true;
-}
-if (flags == O_WRONLY) {
+} else if (flags == O_WRONLY) {
 stdoutnull = true;
 }
 
-- 
1.8.3.1




[Qemu-devel] [PATCH v2 2/8] job: Fix off-by-one assert checks for JobSTT and JobVerbTable

2018-08-31 Thread Liam Merwick
In the assert checking the array dereference of JobVerbTable[verb]
in job_apply_verb() the check of the index, verb, allows an overrun
because an index equal to the array size is permitted.

Similarly, in the assert check of JobSTT[s0][s1] with index s1
in job_state_transition(), an off-by-one overrun is not flagged
either.

This is not a run-time issue as there are no callers actually
passing in the max value.

Signed-off-by: Liam Merwick 
Reviewed-by: Darren Kenny 
Reviewed-by: Mark Kanda 
Reviewed-by: Eric Blake 
---
 job.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/job.c b/job.c
index e36ebaafd81c..40320566f43b 100644
--- a/job.c
+++ b/job.c
@@ -166,7 +166,7 @@ bool job_is_internal(Job *job)
 static void job_state_transition(Job *job, JobStatus s1)
 {
 JobStatus s0 = job->status;
-assert(s1 >= 0 && s1 <= JOB_STATUS__MAX);
+assert(s1 >= 0 && s1 < JOB_STATUS__MAX);
 trace_job_state_transition(job, job->ret,
JobSTT[s0][s1] ? "allowed" : "disallowed",
JobStatus_str(s0), JobStatus_str(s1));
@@ -181,7 +181,7 @@ static void job_state_transition(Job *job, JobStatus s1)
 int job_apply_verb(Job *job, JobVerb verb, Error **errp)
 {
 JobStatus s0 = job->status;
-assert(verb >= 0 && verb <= JOB_VERB__MAX);
+assert(verb >= 0 && verb < JOB_VERB__MAX);
 trace_job_apply_verb(job, JobStatus_str(s0), JobVerb_str(verb),
  JobVerbTable[verb][s0] ? "allowed" : "prohibited");
 if (JobVerbTable[verb][s0]) {
-- 
1.8.3.1




[Qemu-devel] [PATCH v2 3/8] block: Null pointer dereference in blk_root_get_parent_desc()

2018-08-31 Thread Liam Merwick
The dev_id returned by the call to blk_get_attached_dev_id() in
blk_root_get_parent_desc() can be NULL (an internal call to
object_get_canonical_path may have returned NULL) so it should
be checked before dereferencing.

Signed-off-by: Liam Merwick 
Reviewed-by: Darren Kenny 
Reviewed-by: Mark Kanda 
---
 block/block-backend.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index fa120630be83..210eee75006a 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -136,7 +136,7 @@ static char *blk_root_get_parent_desc(BdrvChild *child)
 }
 
 dev_id = blk_get_attached_dev_id(blk);
-if (*dev_id) {
+if (dev_id && *dev_id) {
 return dev_id;
 } else {
 /* TODO Callback into the BB owner for something more detailed */
-- 
1.8.3.1




Re: [Qemu-devel] [PATCH 7/8] io: file descriptor not initialized in qio_channel_command_new_spawn()

2018-08-31 Thread Liam Merwick




On 31/08/18 16:50, Eric Blake wrote:

On 08/31/2018 10:36 AM, Liam Merwick wrote:

On 30/08/2018 17:18, Eric Blake wrote:

On 08/30/2018 10:47 AM, Liam Merwick wrote:

Incorrect checking of flags could result in uninitialized
file descriptor being used.





Looking at it again, the very minor optimisation of converting the 
2nd 'if' to an 'else if' has the useful side-effect of appeasing the 
static analysis tool.


I never figured out what the tool precisely thought was wrong in the 
first place. Can you paste the output of the tool to see exactly what 
it analyzed as the potential flaw?  Perhaps the analyzer was trying to 
see what would happen if a caller submitting the fourth value (3 on 
systems where O_RDONLY is 0, 0 on systems where O_RDONLY is 1) and the 
code not behaving in that setup, even though we know that all callers 
only submit the three valid values and never the fourth invalid 
value?  Or maybe it's a weakness where we have made dependent 
assumptions but in independent branches, in such a way that we know it 
will never fail but the analyzer doesn't?




The specific error it reported was

Error: File Invalid
   File Descriptor not Initialized [file-desc-not-init]:
  The value  is not initialized as a file descriptor.
    at line 91 of io/channel-command.c in function 
'qio_channel_command_new_spawn'.

  resource  not initialized when flags != 0 at line 62
  and flags != 1 at line 65
  and stdinnull is false at line 69
  and stdoutnull is false at line 69.


I've been staring at the code and can see no reason why it isn't a false 
positive (and I'll let the tool authors know)


Regards,
Liam



--- a/io/channel-command.c
+++ b/io/channel-command.c
@@ -61,8 +61,7 @@ qio_channel_command_new_spawn(const char *const 
argv[],


  if (flags == O_RDONLY) {
  stdinnull = true;
-    }
-    if (flags == O_WRONLY) {
+    } else if (flags == O_WRONLY) {


But this sort of change is acceptable, especially if it does shut up 
the analyzer, and done with a commit message mentioning that it is not 
a semantic change but does make it easier to use the static checker to 
find real problems by getting rid of noise.



  stdoutnull = true;
  }

Regards,
Liam








Re: [Qemu-devel] [PATCH 7/8] io: file descriptor not initialized in qio_channel_command_new_spawn()

2018-08-31 Thread Liam Merwick

On 30/08/2018 17:18, Eric Blake wrote:

On 08/30/2018 10:47 AM, Liam Merwick wrote:

Incorrect checking of flags could result in uninitialized
file descriptor being used.

Signed-off-by: Liam Merwick 
Reviewed-by: Darren Kenny 
Reviewed-by: Mark Kanda 
---
  io/channel-command.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/io/channel-command.c b/io/channel-command.c
index 3e7eb17eff54..38deb687da21 100644
--- a/io/channel-command.c
+++ b/io/channel-command.c
@@ -59,10 +59,10 @@ qio_channel_command_new_spawn(const char *const 
argv[],

  flags = flags & O_ACCMODE;
-    if (flags == O_RDONLY) {
+    if ((flags & O_RDONLY) == O_RDONLY) {


NACK.  O_RDONLY and O_WRONLY are subsets of O_ACCMODE, which we already 
masked out above.


On some systems, we have:
O_RDONLY == 0
O_WRONLY == 1
O_RDWR == 2

On other systems, we have:
O_RDONLY == 1
O_WRONLY == 2
O_RDWR == 3

Either way, if the user passed in O_RDWR, (flags & O_RDONLY) == O_RDONLY 
returns true, which is wrong.


O_ACCMODE was historically 0x3, although now that POSIX has O_EXEC and 
O_SEARCH (which can be the same bit pattern), some systems now make 
O_ACCMODE occupy 3 bits instead of 2.




Thanks for catching that and for the explanation.

Looking at it again, the very minor optimisation of converting the 2nd 
'if' to an 'else if' has the useful side-effect of appeasing the static 
analysis tool.


--- a/io/channel-command.c
+++ b/io/channel-command.c
@@ -61,8 +61,7 @@ qio_channel_command_new_spawn(const char *const argv[],

 if (flags == O_RDONLY) {
 stdinnull = true;
-}
-if (flags == O_WRONLY) {
+} else if (flags == O_WRONLY) {
 stdoutnull = true;
 }

Regards,
Liam



Re: [Qemu-devel] [PATCH 8/8] qcow2: Read outside array bounds in qcow2_pre_write_overlap_check()

2018-08-31 Thread Liam Merwick




On 30/08/18 19:43, Eric Blake wrote:

On 08/30/2018 10:47 AM, Liam Merwick wrote:

The commit for 0e4e4318eaa5 increments QCOW2_OL_MAX_BITNR but does not
add an array entry for QCOW2_OL_BITMAP_DIRECTORY_BITNR to 
metadata_ol_names[].

As a result, an array dereference of metadata_ol_names[8] in
qcow2_pre_write_overlap_check() could result in a read outside of the 
array bounds.


Fixes: 0e4e4318eaa5 ('qcow2: add overlap check for bitmap directory')

Cc: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Liam Merwick 
Reviewed-by: Darren Kenny 
Reviewed-by: Mark Kanda 
---
  block/qcow2-refcount.c | 17 +
  1 file changed, 9 insertions(+), 8 deletions(-)


The fix looks correct, but to prevent the problem from happening again, 
I'd suggest you also add a compile-time BUG_ON that fails if the array 
size gets out of sync again due to another addition of another overlap 
detection bit.




Good idea. There is no generic BUG_ON in QEMU (just a few private 
copies) or BUILD_BUG_ON. I can add a commit that introduces a copy of 
include/linux/build_bug.h from the Linux kernel and use BUILD_BUG_ON in 
this commit.  Is there any reason not to do that?


Regards,
Liam



Re: [Qemu-devel] [PATCH 6/8] block: dump_qlist() may dereference a Null pointer

2018-08-31 Thread Liam Merwick




On 30/08/18 19:41, Eric Blake wrote:

On 08/30/2018 10:47 AM, Liam Merwick wrote:

A NULL 'list' passed into function dump_qlist() isn't correctly
validated and can be passed to qlist_first() where it is dereferenced.


But dump_qlist() is static, and it is easy to prove that it will never 
be called with a NULL 'list' parameter (it's lone caller did switch 
(qobject_type(obj)), and calls dump_qlist() only for QTYPE_QLIST, which 
implies that the qobject_to(QList, obj) will succeed and never be NULL).




This could be resolved by checking if the list is NULL in dump_qlist()
and returning immediately. However, the general case can be handled by
adding a NULL arg check to to qlist_first() and qlist_next() and all
the callers to those functions handle that cleanly.


NACK.  If anything, I'd be happier with:

assert(list);



Thank works for me too.

Regards,
Liam

in dump_qlist() to shut up the lint checker, as we do not want to slow 
down the common case of qlist_first() for something that does not 
happen.  That is, the null dereference in qlist_first() is a feature for 
detecting buggy code, and not something we need to change.




Signed-off-by: Liam Merwick 
Reviewed-by: Darren Kenny 
Reviewed-by: Mark Kanda 

---
  include/qapi/qmp/qlist.h | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/include/qapi/qmp/qlist.h b/include/qapi/qmp/qlist.h
index 8d2c32ca2863..1ec716e2eb9e 100644
--- a/include/qapi/qmp/qlist.h
+++ b/include/qapi/qmp/qlist.h
@@ -58,11 +58,17 @@ void qlist_destroy_obj(QObject *obj);
  static inline const QListEntry *qlist_first(const QList *qlist)
  {
+    if (!qlist) {
+    return NULL;
+    }
  return QTAILQ_FIRST(>head);
  }
  static inline const QListEntry *qlist_next(const QListEntry *entry)
  {
+    if (!entry) {
+    return NULL;
+    }
  return QTAILQ_NEXT(entry, next);
  }







Re: [Qemu-devel] [PATCH 2/8] job: Fix off-by-one accesses to JobSTT and JobVerbTable

2018-08-31 Thread Liam Merwick



On 30/08/18 19:34, Eric Blake wrote:

On 08/30/2018 10:47 AM, Liam Merwick wrote:

In the array dereference of JobVerbTable[verb] in job_apply_verb()
the check of the index, verb, allows an overrun because an index
equal to the array size is permitted.

Similarly, in the array dereference of JobSTT[s0][s1] with index s1
in job_state_transition(), an off-by-one overrun is possible.


Fortunately, these are just assertions that are too weak; we don't have 
any actual callers passing the __MAX entry to cause an actual overrun.




True. In v2 I'll clarify that in the commit message



Signed-off-by: Liam Merwick 
Reviewed-by: Darren Kenny 
Reviewed-by: Mark Kanda 
---
  job.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)


Reviewed-by: Eric Blake 



Thanks,
Liam




diff --git a/job.c b/job.c
index e36ebaafd81c..40320566f43b 100644
--- a/job.c
+++ b/job.c
@@ -166,7 +166,7 @@ bool job_is_internal(Job *job)
  static void job_state_transition(Job *job, JobStatus s1)
  {
  JobStatus s0 = job->status;
-    assert(s1 >= 0 && s1 <= JOB_STATUS__MAX);
+    assert(s1 >= 0 && s1 < JOB_STATUS__MAX);
  trace_job_state_transition(job, job->ret,
 JobSTT[s0][s1] ? "allowed" : 
"disallowed",

 JobStatus_str(s0), JobStatus_str(s1));
@@ -181,7 +181,7 @@ static void job_state_transition(Job *job, 
JobStatus s1)

  int job_apply_verb(Job *job, JobVerb verb, Error **errp)
  {
  JobStatus s0 = job->status;
-    assert(verb >= 0 && verb <= JOB_VERB__MAX);
+    assert(verb >= 0 && verb < JOB_VERB__MAX);
  trace_job_apply_verb(job, JobStatus_str(s0), JobVerb_str(verb),
   JobVerbTable[verb][s0] ? "allowed" : 
"prohibited");

  if (JobVerbTable[verb][s0]) {







[Qemu-devel] [PATCH 1/8] configure: Provide option to explicitly disable AVX2

2018-08-30 Thread Liam Merwick
The configure script detects if the compiler has AVX2 support and
automatically sets avx2_opt="yes" which in turn defines CONFIG_AVX2_OPT.
There is no way of explicitly overriding this setting so this commit adds
two command-line options: --enable-avx2 and --disable-avx2.

The default behaviour, when no option is specified, is to maintain the
current behaviour and enable AVX2 if the compiler supports it.

Signed-off-by: Liam Merwick 
Reviewed-by: Darren Kenny 
Reviewed-by: Mark Kanda 
---
 configure | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/configure b/configure
index 58862d2ae88a..8904a91715b3 100755
--- a/configure
+++ b/configure
@@ -427,7 +427,7 @@ usb_redir=""
 opengl=""
 opengl_dmabuf="no"
 cpuid_h="no"
-avx2_opt="no"
+avx2_opt=""
 zlib="yes"
 capstone=""
 lzo=""
@@ -1332,6 +1332,10 @@ for opt do
   ;;
   --disable-glusterfs) glusterfs="no"
   ;;
+  --disable-avx2) avx2_opt="no"
+  ;;
+  --enable-avx2) avx2_opt="yes"
+  ;;
   --enable-glusterfs) glusterfs="yes"
   ;;
   --disable-virtio-blk-data-plane|--enable-virtio-blk-data-plane)
@@ -1709,6 +1713,7 @@ disabled with --disable-FEATURE, default is enabled if 
available:
   libxml2 for Parallels image format
   tcmalloctcmalloc support
   jemallocjemalloc support
+  avx2AVX2 optimization support
   replication replication support
   vhost-vsock virtio sockets device support
   opengl  opengl support
@@ -5127,7 +5132,7 @@ fi
 # There is no point enabling this if cpuid.h is not usable,
 # since we won't be able to select the new routines.
 
-if test $cpuid_h = yes; then
+if test "$cpuid_h" = "yes" -a "$avx2_opt" != "no"; then
   cat > $TMPC << EOF
 #pragma GCC push_options
 #pragma GCC target("avx2")
@@ -5141,6 +5146,8 @@ int main(int argc, char *argv[]) { return bar(argv[0]); }
 EOF
   if compile_object "" ; then
 avx2_opt="yes"
+  else
+avx2_opt="no"
   fi
 fi
 
-- 
1.8.3.1




[Qemu-devel] [PATCH 4/8] qemu-img: potential Null pointer deref in img_commit()

2018-08-30 Thread Liam Merwick
The function block_job_get() may return NULL so before dereferencing
the 'job' pointer in img_commit() it should be checked.

Signed-off-by: Liam Merwick 
Reviewed-by: Darren Kenny 
Reviewed-by: Mark Kanda 
---
 qemu-img.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/qemu-img.c b/qemu-img.c
index b12f4cd19b0a..51fe09bd08ed 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1029,6 +1029,9 @@ static int img_commit(int argc, char **argv)
 }
 
 job = block_job_get("commit");
+if (job == NULL) {
+goto unref_backing;
+}
 run_block_job(job, _err);
 if (local_err) {
 goto unref_backing;
-- 
1.8.3.1




[Qemu-devel] [PATCH 5/8] block: Fix potential Null pointer dereferences in vvfat.c

2018-08-30 Thread Liam Merwick
The calls to bdrv_new_open_driver(), find_mapping_for_cluster(),
and array_get_next() may return NULL but it isn't always checked for
before dereferencing the value returned.

Signed-off-by: Liam Merwick 
Reviewed-by: Darren Kenny 
Reviewed-by: Mark Kanda 
---
 block/vvfat.c | 56 
 1 file changed, 56 insertions(+)

diff --git a/block/vvfat.c b/block/vvfat.c
index fc41841a5c3c..0f1f10a2f94b 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -448,6 +448,9 @@ static direntry_t *create_long_filename(BDRVVVFATState *s, 
const char *filename)
 
 for(i=0;idirectory));
+if (entry == NULL) {
+continue;
+}
 entry->attributes=0xf;
 entry->reserved[0]=0;
 entry->begin=0;
@@ -665,6 +668,9 @@ static inline void fat_set(BDRVVVFATState* s,unsigned int 
cluster,uint32_t value
 } else {
 int offset = (cluster*3/2);
 unsigned char* p = array_get(&(s->fat), offset);
+if (p == NULL) {
+return;
+}
 switch (cluster&1) {
 case 0:
 p[0] = value&0xff;
@@ -730,6 +736,9 @@ static inline direntry_t* 
create_short_and_long_name(BDRVVVFATState* s,
 
 if(is_dot) {
 entry=array_get_next(&(s->directory));
+if (entry == NULL) {
+return NULL;
+}
 memset(entry->name, 0x20, sizeof(entry->name));
 memcpy(entry->name,filename,strlen(filename));
 return entry;
@@ -844,6 +853,12 @@ static int read_directory(BDRVVVFATState* s, int 
mapping_index)
 /* create mapping for this file */
 if(!is_dot && !is_dotdot && (S_ISDIR(st.st_mode) || st.st_size)) {
 s->current_mapping = array_get_next(&(s->mapping));
+if (s->current_mapping == NULL) {
+fprintf(stderr, "Failed to create mapping for file\n");
+g_free(buffer);
+closedir(dir);
+return -2;
+}
 s->current_mapping->begin=0;
 s->current_mapping->end=st.st_size;
 /*
@@ -941,6 +956,9 @@ static int init_directories(BDRVVVFATState* s,
 /* add volume label */
 {
 direntry_t* entry=array_get_next(&(s->directory));
+if (entry == NULL) {
+return -1;
+}
 entry->attributes=0x28; /* archive | volume label */
 memcpy(entry->name, s->volume_label, sizeof(entry->name));
 }
@@ -953,6 +971,9 @@ static int init_directories(BDRVVVFATState* s,
 s->cluster_count=sector2cluster(s, s->sector_count);
 
 mapping = array_get_next(&(s->mapping));
+if (mapping == NULL) {
+return -1;
+}
 mapping->begin = 0;
 mapping->dir_index = 0;
 mapping->info.dir.parent_mapping_index = -1;
@@ -1630,6 +1651,9 @@ static void schedule_rename(BDRVVVFATState* s,
 uint32_t cluster, char* new_path)
 {
 commit_t* commit = array_get_next(&(s->commits));
+if (commit == NULL) {
+return;
+}
 commit->path = new_path;
 commit->param.rename.cluster = cluster;
 commit->action = ACTION_RENAME;
@@ -1639,6 +1663,9 @@ static void schedule_writeout(BDRVVVFATState* s,
 int dir_index, uint32_t modified_offset)
 {
 commit_t* commit = array_get_next(&(s->commits));
+if (commit == NULL) {
+return;
+}
 commit->path = NULL;
 commit->param.writeout.dir_index = dir_index;
 commit->param.writeout.modified_offset = modified_offset;
@@ -1649,6 +1676,9 @@ static void schedule_new_file(BDRVVVFATState* s,
 char* path, uint32_t first_cluster)
 {
 commit_t* commit = array_get_next(&(s->commits));
+if (commit == NULL) {
+return;
+}
 commit->path = path;
 commit->param.new_file.first_cluster = first_cluster;
 commit->action = ACTION_NEW_FILE;
@@ -1657,6 +1687,9 @@ static void schedule_new_file(BDRVVVFATState* s,
 static void schedule_mkdir(BDRVVVFATState* s, uint32_t cluster, char* path)
 {
 commit_t* commit = array_get_next(&(s->commits));
+if (commit == NULL) {
+return;
+}
 commit->path = path;
 commit->param.mkdir.cluster = cluster;
 commit->action = ACTION_MKDIR;
@@ -2261,6 +2294,9 @@ static mapping_t* insert_mapping(BDRVVVFATState* s,
 }
 if (index >= s->mapping.next || mapping->begin > begin) {
 mapping = array_insert(&(s->mapping), index, 1);
+if (mapping == NULL) {
+return NULL;
+}
 mapping->path = NULL;
 adjust_mapping_indices(s, index, +1);
 }
@@ -2428,6 +2464,9 @@ static int commit_direntries(BDRVVVFATState* s,
 direntry_t* direntry = array_get(&(s->directory), dir_index);
 uint32_t first_cluster = dir_index == 0 ? 0 : begin_

[Qemu-devel] [PATCH 2/8] job: Fix off-by-one accesses to JobSTT and JobVerbTable

2018-08-30 Thread Liam Merwick
In the array dereference of JobVerbTable[verb] in job_apply_verb()
the check of the index, verb, allows an overrun because an index
equal to the array size is permitted.

Similarly, in the array dereference of JobSTT[s0][s1] with index s1
in job_state_transition(), an off-by-one overrun is possible.

Signed-off-by: Liam Merwick 
Reviewed-by: Darren Kenny 
Reviewed-by: Mark Kanda 
---
 job.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/job.c b/job.c
index e36ebaafd81c..40320566f43b 100644
--- a/job.c
+++ b/job.c
@@ -166,7 +166,7 @@ bool job_is_internal(Job *job)
 static void job_state_transition(Job *job, JobStatus s1)
 {
 JobStatus s0 = job->status;
-assert(s1 >= 0 && s1 <= JOB_STATUS__MAX);
+assert(s1 >= 0 && s1 < JOB_STATUS__MAX);
 trace_job_state_transition(job, job->ret,
JobSTT[s0][s1] ? "allowed" : "disallowed",
JobStatus_str(s0), JobStatus_str(s1));
@@ -181,7 +181,7 @@ static void job_state_transition(Job *job, JobStatus s1)
 int job_apply_verb(Job *job, JobVerb verb, Error **errp)
 {
 JobStatus s0 = job->status;
-assert(verb >= 0 && verb <= JOB_VERB__MAX);
+assert(verb >= 0 && verb < JOB_VERB__MAX);
 trace_job_apply_verb(job, JobStatus_str(s0), JobVerb_str(verb),
  JobVerbTable[verb][s0] ? "allowed" : "prohibited");
 if (JobVerbTable[verb][s0]) {
-- 
1.8.3.1




[Qemu-devel] [PATCH 3/8] block: Null pointer dereference in blk_root_get_parent_desc()

2018-08-30 Thread Liam Merwick
The dev_id returned by the call to blk_get_attached_dev_id() in
blk_root_get_parent_desc() can be NULL (an internal call to
object_get_canonical_path may have returned NULL) so it should
be checked before dereferencing.

Signed-off-by: Liam Merwick 
Reviewed-by: Darren Kenny 
Reviewed-by: Mark Kanda 
---
 block/block-backend.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index fa120630be83..210eee75006a 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -136,7 +136,7 @@ static char *blk_root_get_parent_desc(BdrvChild *child)
 }
 
 dev_id = blk_get_attached_dev_id(blk);
-if (*dev_id) {
+if (dev_id && *dev_id) {
 return dev_id;
 } else {
 /* TODO Callback into the BB owner for something more detailed */
-- 
1.8.3.1




[Qemu-devel] [PATCH 6/8] block: dump_qlist() may dereference a Null pointer

2018-08-30 Thread Liam Merwick
A NULL 'list' passed into function dump_qlist() isn't correctly
validated and can be passed to qlist_first() where it is dereferenced.

This could be resolved by checking if the list is NULL in dump_qlist()
and returning immediately. However, the general case can be handled by
adding a NULL arg check to to qlist_first() and qlist_next() and all
the callers to those functions handle that cleanly.

Signed-off-by: Liam Merwick 
Reviewed-by: Darren Kenny  
Reviewed-by: Mark Kanda 

---
 include/qapi/qmp/qlist.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/include/qapi/qmp/qlist.h b/include/qapi/qmp/qlist.h
index 8d2c32ca2863..1ec716e2eb9e 100644
--- a/include/qapi/qmp/qlist.h
+++ b/include/qapi/qmp/qlist.h
@@ -58,11 +58,17 @@ void qlist_destroy_obj(QObject *obj);
 
 static inline const QListEntry *qlist_first(const QList *qlist)
 {
+if (!qlist) {
+return NULL;
+}
 return QTAILQ_FIRST(>head);
 }
 
 static inline const QListEntry *qlist_next(const QListEntry *entry)
 {
+if (!entry) {
+return NULL;
+}
 return QTAILQ_NEXT(entry, next);
 }
 
-- 
1.8.3.1




[Qemu-devel] [PATCH 0/8] off-by-one and NULL pointer accesses detected by static analysis

2018-08-30 Thread Liam Merwick
Below are a number of fixes to some off-by-one, read outside array bounds, and
NULL pointer accesses detected by an internal Oracle static analysis tool 
(Parfait).
https://labs.oracle.com/pls/apex/f?p=labs:49:P49_PROJECT_ID:13

I have also included a patch to add a command-line option to configure to
select if AVX2 is used or not (keeping the existing behaviour by default).
My motivation was avoiding an issue with the static analysis tool but NetSpectre
was announced as I was working on this and I felt it may have more general uses.


Liam Merwick (8):
  configure: Provide option to explicitly disable AVX2
  job: Fix off-by-one accesses to JobSTT and JobVerbTable
  block: Null pointer dereference in blk_root_get_parent_desc()
  qemu-img: potential Null pointer deref in img_commit()
  block: Fix potential Null pointer dereferences in vvfat.c
  block: dump_qlist() may dereference a Null pointer
  io: file descriptor not initialized in qio_channel_command_new_spawn()
  qcow2: Read outside array bounds in qcow2_pre_write_overlap_check()

 block/block-backend.c|  2 +-
 block/qcow2-refcount.c   | 17 ---
 block/vvfat.c| 56 
 configure| 11 --
 include/qapi/qmp/qlist.h |  6 ++
 io/channel-command.c |  4 ++--
 job.c|  4 ++--
 qemu-img.c   |  3 +++
 8 files changed, 88 insertions(+), 15 deletions(-)

-- 
1.8.3.1




[Qemu-devel] [PATCH 7/8] io: file descriptor not initialized in qio_channel_command_new_spawn()

2018-08-30 Thread Liam Merwick
Incorrect checking of flags could result in uninitialized
file descriptor being used.

Signed-off-by: Liam Merwick 
Reviewed-by: Darren Kenny 
Reviewed-by: Mark Kanda 
---
 io/channel-command.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/io/channel-command.c b/io/channel-command.c
index 3e7eb17eff54..38deb687da21 100644
--- a/io/channel-command.c
+++ b/io/channel-command.c
@@ -59,10 +59,10 @@ qio_channel_command_new_spawn(const char *const argv[],
 
 flags = flags & O_ACCMODE;
 
-if (flags == O_RDONLY) {
+if ((flags & O_RDONLY) == O_RDONLY) {
 stdinnull = true;
 }
-if (flags == O_WRONLY) {
+if ((flags & O_WRONLY) == O_WRONLY) {
 stdoutnull = true;
 }
 
-- 
1.8.3.1




[Qemu-devel] [PATCH 8/8] qcow2: Read outside array bounds in qcow2_pre_write_overlap_check()

2018-08-30 Thread Liam Merwick
The commit for 0e4e4318eaa5 increments QCOW2_OL_MAX_BITNR but does not
add an array entry for QCOW2_OL_BITMAP_DIRECTORY_BITNR to metadata_ol_names[].
As a result, an array dereference of metadata_ol_names[8] in
qcow2_pre_write_overlap_check() could result in a read outside of the array 
bounds.

Fixes: 0e4e4318eaa5 ('qcow2: add overlap check for bitmap directory')

Cc: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Liam Merwick 
Reviewed-by: Darren Kenny 
Reviewed-by: Mark Kanda 
---
 block/qcow2-refcount.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 3c539f02e5ec..6504e7421324 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -2719,14 +2719,15 @@ int qcow2_check_metadata_overlap(BlockDriverState *bs, 
int ign, int64_t offset,
 }
 
 static const char *metadata_ol_names[] = {
-[QCOW2_OL_MAIN_HEADER_BITNR]= "qcow2_header",
-[QCOW2_OL_ACTIVE_L1_BITNR]  = "active L1 table",
-[QCOW2_OL_ACTIVE_L2_BITNR]  = "active L2 table",
-[QCOW2_OL_REFCOUNT_TABLE_BITNR] = "refcount table",
-[QCOW2_OL_REFCOUNT_BLOCK_BITNR] = "refcount block",
-[QCOW2_OL_SNAPSHOT_TABLE_BITNR] = "snapshot table",
-[QCOW2_OL_INACTIVE_L1_BITNR]= "inactive L1 table",
-[QCOW2_OL_INACTIVE_L2_BITNR]= "inactive L2 table",
+[QCOW2_OL_MAIN_HEADER_BITNR]= "qcow2_header",
+[QCOW2_OL_ACTIVE_L1_BITNR]  = "active L1 table",
+[QCOW2_OL_ACTIVE_L2_BITNR]  = "active L2 table",
+[QCOW2_OL_REFCOUNT_TABLE_BITNR] = "refcount table",
+[QCOW2_OL_REFCOUNT_BLOCK_BITNR] = "refcount block",
+[QCOW2_OL_SNAPSHOT_TABLE_BITNR] = "snapshot table",
+[QCOW2_OL_INACTIVE_L1_BITNR]= "inactive L1 table",
+[QCOW2_OL_INACTIVE_L2_BITNR]= "inactive L2 table",
+[QCOW2_OL_BITMAP_DIRECTORY_BITNR]   = "bitmap directory",
 };
 
 /*
-- 
1.8.3.1




<    1   2